Learning from heterogeneous data
Paper details:
“Learning from heterogeneous data”.
For example, the problem of learning representations of human faces (a very broad problem that is relevant
to face recognition, expression recognition, 3D face modelling, face editing and many other topics). Typically the
pipeline involves:
1. collecting a dataset
2. transforming it into a homogeneous format
3. applying machine learning (e.g. manifold learning)
4. new data can be projected into the representation, so long as it is in the same form as the training data
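The four steps above can be sketched in a few lines of NumPy. This is a minimal illustrative example, not any particular system: the dataset is random, PCA stands in for "manifold learning", and all dimensions (50 samples, 10 features, 3 components) are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Steps 1-2: a collected dataset already transformed into one homogeneous
# format -- here, 50 samples of 10-dimensional feature vectors.
X = rng.normal(size=(50, 10))

# Step 3: learn a linear manifold (PCA) from the training data.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:3]            # keep the top 3 principal directions

# Step 4: a new sample can be projected only if it matches the training
# format exactly (the same 10-dimensional layout).
x_new = rng.normal(size=10)
code = components @ (x_new - mean)   # 3-dimensional representation
print(code.shape)
```

The rigidity the text complains about is visible in step 4: a sample with a different dimensionality or feature layout simply cannot be projected.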
As a concrete example, the famous Active Appearance Model would look like this:
1. data is 2D images of faces
2. face images are labelled with a consistent set of landmark points
3. the representation is learnt by applying PCA to normalised shape and appearance information
4. a new 2D face image can be represented if it is labelled with the same set of landmark points
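The shape half of the AAM pipeline can be sketched as below. This is a simplified illustration under assumptions of my own: random landmark data, 5 landmarks, 4 modes, and a centring/scaling step in place of full Procrustes alignment; the appearance model is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

# Steps 1-2: 40 faces, each labelled with the SAME 5 (x, y) landmarks.
shapes = rng.normal(size=(40, 5, 2))

def normalise(s):
    """Remove translation and scale (a simplified Procrustes step)."""
    s = s - s.mean(axis=0)
    return s / np.linalg.norm(s)

norm_shapes = np.array([normalise(s).ravel() for s in shapes])

# Step 3: PCA on normalised shape vectors gives a statistical shape model.
mean_shape = norm_shapes.mean(axis=0)
_, _, Vt = np.linalg.svd(norm_shapes - mean_shape, full_matrices=False)
modes = Vt[:4]                 # top 4 modes of shape variation

# Step 4: a new face labelled with the same 5 landmarks can be encoded;
# an image annotated with a different landmark scheme could not be.
new_shape = normalise(rng.normal(size=(5, 2))).ravel()
params = modes @ (new_shape - mean_shape)
print(params.shape)
```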
The problem with this sort of pipeline is that every approach needs a dataset that exactly conforms to the format
requirements. In reality there is a huge range of datasets with different properties, e.g. 2D versus 3D, labelled
versus unlabelled, different noise properties, different resolutions, some missing data (e.g. 3D scans with holes),
3D shape only versus shape and texture, same person at different ages versus single age, multiview versus single
view and so on.
The approach is a neural network (probably a CNN) that has multiple inputs, e.g. inputs for a 2D image (perhaps
at different resolutions), labels, landmark positions, 3D data and so on. One or more inputs are provided to the
network and fed forward to the representation. The architecture could probably be tweaked to allow prediction of
some inputs given other ones.
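One way the multi-input idea could work is sketched below. This is a toy with untrained random linear maps in place of learned CNN encoders, and all modality names and dimensions are assumptions: each available modality is encoded into a shared representation, the codes are averaged (so any subset of inputs suffices), and per-modality decoders predict missing inputs from that representation.

```python
import numpy as np

rng = np.random.default_rng(2)

REP_DIM = 8
DIMS = {"image_2d": 64, "landmarks": 10, "shape_3d": 30}

# Untrained random weights stand in for learned encoder/decoder parameters.
encoders = {k: rng.normal(scale=0.1, size=(REP_DIM, d)) for k, d in DIMS.items()}
decoders = {k: rng.normal(scale=0.1, size=(d, REP_DIM)) for k, d in DIMS.items()}

def encode(inputs):
    """Feed forward whichever modalities are present; average their codes."""
    codes = [encoders[k] @ v for k, v in inputs.items()]
    return np.mean(codes, axis=0)

def predict(rep, modality):
    """Predict one input modality from the shared representation."""
    return decoders[modality] @ rep

# Encode a face from a 2D image plus landmarks only, then predict the
# missing 3D shape from the shared representation.
inputs = {"image_2d": rng.normal(size=64), "landmarks": rng.normal(size=10)}
rep = encode(inputs)
shape_pred = predict(rep, "shape_3d")
print(rep.shape, shape_pred.shape)
```

The point of the averaging step is that, unlike the AAM pipeline, the network does not require every training or test sample to carry every modality.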