Face and facial expression recognition play a crucial role in many applications, such as biometrics, human-computer interaction and non-verbal communication. The human face provides important cues for identifying people and determining their emotional state, even without their explicit cooperation. However, variations in illumination conditions, facial pose, occlusion and facial expression (the latter for face recognition) can dramatically degrade the performance of face and facial expression recognition systems. To address these challenges, this thesis presents novel feature extraction methods based on hand-engineered global and local features, geared towards the problem of face recognition in still images. Novel feature learning methods are also proposed for the task of video-based face and facial expression recognition. The proposed methods provide robust and distinctive facial features in the presence of variations in illumination, occlusion, pose and image resolution.
The thesis starts by investigating the ability of the Curvelet transform to extract robust global features for the task of 3D face recognition under different facial expressions. The benefits of fusing 3D and 2D Curvelet features to achieve multimodal face identification are also investigated.
While the approach above extracts robust features from semi-rigid facial regions, such regions are often hard to detect automatically across different datasets. Thus, a novel Curvelet-based approach is proposed to extract local rather than global features. The proposed approach relies on a novel multimodal keypoint detector capable of repeatably identifying keypoints on textured 3D face surfaces. Distinctive local surface descriptors are then constructed around each detected keypoint by integrating curvelet elements of different orientations. Unlike previously reported curvelet-based face recognition algorithms, which extract global features from textured faces only, the proposed algorithm extracts both texture and 3D local features.
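The idea of pooling oriented frequency responses around a keypoint can be illustrated with a minimal sketch. Note the hedging: the thesis uses the actual Curvelet transform, whereas the code below substitutes simple frequency-domain wedge filters as a stand-in, purely to show how per-orientation magnitudes around a patch can be aggregated into a unit-norm local descriptor. All function names and parameters here are illustrative, not the thesis's implementation.

```python
import numpy as np

def oriented_bandpass_responses(patch, n_orients=8):
    """Magnitude responses of oriented frequency-domain wedge filters.
    A simple stand-in for curvelet subbands: each wedge keeps only the
    frequencies whose orientation falls in one of n_orients angular bins."""
    h, w = patch.shape
    F = np.fft.fftshift(np.fft.fft2(patch))
    fy, fx = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
    angle = np.arctan2(fy, fx) % np.pi  # orientation of each frequency bin
    responses = []
    for k in range(n_orients):
        lo, hi = k * np.pi / n_orients, (k + 1) * np.pi / n_orients
        mask = (angle >= lo) & (angle < hi)
        band = np.fft.ifft2(np.fft.ifftshift(F * mask))
        responses.append(np.abs(band))  # one magnitude map per orientation
    return responses

def local_descriptor(patch, n_orients=8):
    """Pool the mean magnitude per orientation into a unit-norm vector."""
    vec = np.array([r.mean() for r in
                    oriented_bandpass_responses(patch, n_orients)])
    return vec / (np.linalg.norm(vec) + 1e-8)
```

Applied to a patch extracted around each detected keypoint, this yields a compact orientation signature; the same pooling can be run on both the texture image and the 3D range image to obtain multimodal local features.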
The thesis also addresses the problem of face recognition from low-resolution videos (e.g., security camera footage). This problem introduces new challenges, requiring a method capable of exploiting the temporal information and/or appearance variations within image sequences (videos) during feature extraction. To address these issues, a novel RBM-based feature learning model is proposed to automatically extract the features that best represent the semantic knowledge within videos (image sets). The structure of the proposed model involves two hidden sets used to encode the dominant appearances (facial features) and the temporal information within videos (image sets). To train the proposed model, an extension of the standard Contrastive Divergence algorithm is proposed to facilitate the encoding of the two different feature types (i.e., facial features and temporal information).
For video-based facial expression recognition, the thesis also proposes a novel RBM-based feature learning model that effectively learns the relationships (or transformations) between image pairs associated with different facial expressions. The proposed model has the ability to disentangle these transformations (e.g., pose variations and facial expressions) by encoding them into two different hidden sets: the first hidden set encodes facial-expression morphlets, while the second encodes non-facial-expression morphlets. This is achieved using an algorithm dubbed Quadripartite Contrastive Divergence.
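One common way to make hidden units respond to the transformation between an image pair, rather than to either image alone, is a factored gated (three-way) RBM, where factors multiply projections of both images. The sketch below shows only that inference step; it is an assumption about the general family of models, not the thesis's architecture, and the weight names (`Wf_x`, `Wf_y`, `Wf_h`) are hypothetical.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def transformation_code(x, y, Wf_x, Wf_y, Wf_h, c):
    """Hidden activations of a factored gated RBM for an image pair (x, y).
    Each factor multiplies a projection of x with a projection of y, so the
    hidden units encode the mapping between the two images. In the thesis's
    model the hidden units are further partitioned into expression and
    non-expression sets; that partition is omitted here."""
    f = (x @ Wf_x) * (y @ Wf_y)        # multiplicative factor activations
    return sigmoid(f @ Wf_h.T + c)     # hidden transformation code
```

Given consecutive video frames as pairs, this code vector describes how the face changed between frames, which is the quantity the expression classifier operates on.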
Qualification: Doctor of Philosophy
Publication status: Unpublished - Sept 2015