Learning techniques for multi-modal facial analysis

Munawar Hayat

    Research output: Thesis / Doctoral Thesis



    Face recognition and facial expression recognition are two important facial analysis tasks with numerous real-world applications. This dissertation investigates the suitability of different data modalities for these two tasks. Specifically, it first proposes a method for the automatic analysis of textured 3D videos for facial expression recognition. It then considers face recognition across multiple data modalities: 3D static images and videos, RGB-D images acquired from the low-cost Kinect sensor, and low-quality greyscale images acquired from surveillance cameras. The dissertation is organized as a set of papers that have been published in, or submitted to, journals and internationally refereed conferences.

    The dissertation first evaluates and compares existing spatiotemporal feature description methods for 2D video-based facial expression recognition. It then presents an automatic framework that exploits the dynamics of textured 3D videos to recognize six discrete facial expressions. Specifically, local video patches of variable lengths are extracted from numerous locations in the training videos and represented as points on the Grassmannian manifold. An efficient graph-based spectral clustering algorithm is proposed to cluster these points separately for each expression class. Using a valid Grassmannian kernel function, the resulting cluster centers are embedded into a Reproducing Kernel Hilbert Space (RKHS), where six binary SVM models are learnt for classification.
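    To illustrate the Grassmannian representation described above, the following minimal sketch maps a video patch to a point on the Grassmannian by taking an orthonormal basis of the span of its frames (via a thin SVD), and compares two such points with the projection kernel k(X, Y) = ||X^T Y||_F^2, one standard valid Grassmannian kernel. This is a sketch under stated assumptions, not the dissertation's implementation: the choice of the projection kernel, the subspace dimension `p`, and the helper names `grassmann_point`, `projection_kernel` and `gram_matrix` are hypothetical, and the proposed spectral clustering and SVM training are not reproduced.

```python
import numpy as np


def grassmann_point(patch: np.ndarray, p: int) -> np.ndarray:
    """Map a video patch to a point on the Grassmannian G(p, D).

    Hypothetical helper. `patch` has shape (T, D): T frames, each flattened
    to a D-dimensional vector. The returned D x p matrix has orthonormal
    columns spanning the dominant p-dimensional subspace of the frames.
    """
    frames = np.asarray(patch, dtype=float).T  # (D, T): one column per frame
    if p > min(frames.shape):
        raise ValueError("p must not exceed min(T, D)")
    # Thin SVD: the leading left singular vectors are an orthonormal basis.
    u, _, _ = np.linalg.svd(frames, full_matrices=False)
    return u[:, :p]


def projection_kernel(x: np.ndarray, y: np.ndarray) -> float:
    """Projection kernel k(X, Y) = ||X^T Y||_F^2 (assumed kernel choice).

    It is positive definite, does not depend on which orthonormal basis
    represents each subspace, and takes values in [0, p].
    """
    return float(np.linalg.norm(x.T @ y, ord="fro") ** 2)


def gram_matrix(points: list) -> np.ndarray:
    """Symmetric pairwise kernel matrix, e.g. between cluster centers."""
    n = len(points)
    k = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            k[i, j] = k[j, i] = projection_kernel(points[i], points[j])
    return k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in data: 5 patches of 8 frames, 100-dim features each.
    patches = [rng.standard_normal((8, 100)) for _ in range(5)]
    pts = [grassmann_point(pa, p=3) for pa in patches]
    K = gram_matrix(pts)
    print(np.round(np.diag(K), 6))  # every diagonal entry equals p = 3
```

A Gram matrix of this kind could, for example, serve as an affinity for clustering or as a precomputed kernel for an SVM. The projection kernel is used here only because it is a well-known positive definite kernel on the Grassmannian; the dissertation's own kernel choice may differ.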

    The dissertation then proposes manifold learning, deep learning and discriminative learning techniques for face recognition across multiple data modalities. First, a computationally efficient low-level feature description method is proposed for face recognition from 3D static images. A spatiotemporal method for face recognition from 3D videos is then presented. Face recognition from RGB-D images acquired with the Kinect sensor is then formulated as an image set classification problem, and a method for the compact description of image sets using Riemannian geometry is proposed for this purpose. For classification, SVM models are learnt on the Lie group of the Riemannian manifold. Finally, the dissertation considers face recognition from low-quality imagery acquired from easily installable video surveillance cameras. Face recognition from this data modality is also studied within the framework of image set classification, and two independent, high-performing methods are proposed. The first method learns deep reconstruction models that can automatically discover the underlying complex geometric structure of the images in an image set. The second method empowers well-developed binary classifiers for the task of multi-class image set classification. Compared with existing binary-to-multiclass extension strategies, the proposed method is very efficient, since it trains only a few binary classifiers and uses very few images to train each of them.
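    One common way to describe an image set compactly using Riemannian geometry is shown here as an assumption, not as the dissertation's exact descriptor. The set is summarised by a regularised covariance matrix, which is symmetric positive definite (SPD), and that matrix is mapped through the matrix logarithm. Under the log-Euclidean framework, SPD matrices form a Lie group, and log(C) lies in a flat vector space where standard linear classifiers such as SVMs can be trained. The regulariser `eps` and the function names `set_covariance`, `spd_log`, `log_euclidean_vector` and `log_euclidean_distance` are hypothetical.

```python
import numpy as np


def set_covariance(images: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Regularised covariance of an image set (hypothetical helper).

    `images` has shape (N, d) with N >= 2: N images, each a d-dimensional
    feature vector. Adding eps * I guarantees an SPD result.
    """
    x = np.asarray(images, dtype=float)
    c = np.atleast_2d(np.cov(x, rowvar=False))  # (d, d) sample covariance
    return c + eps * np.eye(x.shape[1])


def spd_log(c: np.ndarray) -> np.ndarray:
    """Matrix logarithm of an SPD matrix: V diag(log w) V^T."""
    w, v = np.linalg.eigh(c)
    return (v * np.log(w)) @ v.T


def log_euclidean_vector(c: np.ndarray) -> np.ndarray:
    """Flatten the upper triangle of log(C) into a d(d+1)/2 vector.

    Off-diagonal entries are scaled by sqrt(2) so that Euclidean distances
    between vectors equal Frobenius distances between the log-matrices.
    """
    log_c = spd_log(c)
    rows, cols = np.triu_indices(log_c.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return log_c[rows, cols] * weights


def log_euclidean_distance(c1: np.ndarray, c2: np.ndarray) -> float:
    """Log-Euclidean distance ||log(C1) - log(C2)||_F between SPD matrices."""
    return float(np.linalg.norm(spd_log(c1) - spd_log(c2), ord="fro"))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in data: two sets of 30 images with 5-dim features.
    set_a = rng.standard_normal((30, 5))
    set_b = 2.0 * rng.standard_normal((30, 5))
    ca, cb = set_covariance(set_a), set_covariance(set_b)
    print(log_euclidean_vector(ca).shape)  # (15,) = 5 * 6 / 2
    print(log_euclidean_distance(ca, cb))
```

Because `log_euclidean_vector(set_covariance(images))` has length d(d+1)/2 no matter how many images a set contains, every set gets a fixed-size representation, and any vector-space classifier can then be applied to it.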

    Original language: English
    Qualification: Doctor of Philosophy
    • An, Senjian, Supervisor
    • Bennamoun, Mohammed, Supervisor
    Publication status: Unpublished, Aug 2015


