Learning from heterogeneous data

Paper details: "learning from heterogeneous data". For example, consider the problem of learning representations of human faces (a very broad problem relevant to face recognition, expression recognition, 3D face modelling, face editing and many other topics). Typically the pipeline involves:

1. collecting a dataset
2. transforming it into a homogeneous format
3. applying machine learning (e.g. manifold learning)
4. projecting new data into the representation, so long as it is in the same form as the training data

As a concrete example, the famous Active Appearance Model would look like this:

1. the data is 2D images of faces
2. face images are labelled with a consistent set of landmark points
3. the representation is learnt by applying PCA to normalised shape and appearance information
4. a new 2D face image can be represented if it is labelled with the same set of landmark points

The problem with this sort of pipeline is that every approach needs a dataset that exactly conforms to its format requirements. In reality there is a huge range of datasets with different properties, e.g. 2D versus 3D, labelled versus unlabelled, different noise properties, different resolutions, some with missing data (e.g. 3D scans with holes), 3D shape only versus shape and texture, the same person at different ages versus a single age, multiview versus single view, and so on.

The proposed approach is a neural network (probably a CNN) that takes multiple input modalities.
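The Active Appearance Model pipeline above can be sketched roughly as follows. This is a minimal illustration of the shape half only (PCA on landmark coordinates), using random stand-in data; the array sizes (100 faces, 68 landmarks, 10 modes) are assumptions, not values from the text.

```python
import numpy as np

# Hypothetical stand-in for a landmark-labelled face dataset:
# N training faces, each labelled with the same K landmark points.
rng = np.random.default_rng(0)
N, K = 100, 68                        # assumed sizes
shapes = rng.normal(size=(N, 2 * K))  # each row: flattened (x, y) landmark coords

# Step 3 of the pipeline: PCA on (already normalised) shape vectors.
mean_shape = shapes.mean(axis=0)
X = shapes - mean_shape
U, S, Vt = np.linalg.svd(X, full_matrices=False)
n_modes = 10
P = Vt[:n_modes]                      # (n_modes, 2K) principal modes of shape variation

# Step 4: a new face, labelled with the same landmarks, is projected
# into the learnt representation...
new_shape = rng.normal(size=2 * K)
b = P @ (new_shape - mean_shape)      # low-dimensional shape parameters
# ...and can be reconstructed from those parameters.
recon = mean_shape + P.T @ b
```

The point the text makes is visible here: the projection `P @ (new_shape - mean_shape)` only makes sense if the new data has exactly the same landmark format as the training set.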
Inputs could include 2D images (perhaps at different resolutions), labels, landmark positions, 3D data and so on. One or more inputs are then provided to the network and fed forward to the representation. The architecture could probably be tweaked to allow prediction of some inputs given other ones.
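The multi-input idea can be sketched as below. This is a toy numpy feed-forward model, not the proposed architecture: one encoder per modality, a shared representation formed from whichever inputs happen to be present, and a decoder head that predicts one modality (landmarks) from the representation. All layer sizes and the averaging fusion rule are illustrative assumptions.

```python
import numpy as np

# Assumed dimensions: image features, landmark vector, shared representation.
D_IMG, D_LMK, D_REP = 64, 136, 32
rng = np.random.default_rng(1)

# One (untrained, random) linear encoder per input modality.
W_img = rng.normal(scale=0.1, size=(D_IMG, D_REP))
W_lmk = rng.normal(scale=0.1, size=(D_LMK, D_REP))
# A decoder head predicting landmarks from the representation,
# i.e. "prediction of some inputs given other ones".
W_dec = rng.normal(scale=0.1, size=(D_REP, D_LMK))

def encode(img=None, lmk=None):
    """Feed forward whichever inputs are available; fuse branch codes by averaging."""
    codes = []
    if img is not None:
        codes.append(np.tanh(img @ W_img))
    if lmk is not None:
        codes.append(np.tanh(lmk @ W_lmk))
    return np.mean(codes, axis=0)

# A face with image features but no landmarks still maps to the representation...
img_feat = rng.normal(size=D_IMG)
rep_from_img = encode(img=img_feat)
# ...from which the missing modality can be predicted.
pred_lmk = rep_from_img @ W_dec
```

The design choice this illustrates is the one the text asks for: because each modality has its own encoder into a common space, a training example no longer has to conform to a single homogeneous format, and missing modalities can be predicted rather than required.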