3D action recognition from novel viewpoints

Hossein Rahmani, Ajmal Mian

    Research output: Chapter in Book/Conference paper › Conference paper › peer-review

    112 Citations (Scopus)


    We propose a human pose representation model that transfers human poses acquired from different unknown views to a view-invariant high-level space. The model is a deep convolutional neural network (CNN) and requires a large corpus of multiview training data, which is very expensive to acquire. Therefore, we propose a method to generate this data by fitting synthetic 3D human models to real motion capture data and rendering the human poses from numerous viewpoints. While learning the CNN model, we do not use action labels but only the pose labels obtained by clustering all training poses into k clusters. The proposed model is able to generalize to real depth images of unseen poses without the need for re-training or fine-tuning. Real depth videos are passed through the model frame-wise to extract view-invariant features. For spatio-temporal representation, we propose a group sparse Fourier Temporal Pyramid, which robustly encodes the most discriminative, action-specific output features of the proposed human pose model. Experiments on two multiview and three single-view benchmark datasets show that the proposed method dramatically outperforms the existing state of the art in action recognition.
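    The temporal encoding step can be illustrated with a minimal sketch of a plain Fourier Temporal Pyramid over frame-wise features. This is not the authors' implementation: the group sparse feature selection described in the abstract is omitted, and the function name, the `levels` and `n_coeffs` parameters, and the choice to keep low-frequency magnitudes are all assumptions made for illustration.

```python
import numpy as np

def fourier_temporal_pyramid(features, levels=3, n_coeffs=4):
    """Encode a (T, D) sequence of frame-wise features as one fixed-length vector.

    Hypothetical helper: at pyramid level l the sequence is split into 2**l
    temporal segments, and each segment contributes the magnitudes of its
    first n_coeffs Fourier coefficients for every feature dimension.
    """
    features = np.asarray(features, dtype=float)
    _, dim = features.shape
    parts = []
    for level in range(levels):
        # array_split tolerates sequence lengths not divisible by 2**level
        for segment in np.array_split(features, 2 ** level, axis=0):
            low = np.zeros((n_coeffs, dim))
            if segment.shape[0] > 0:
                # Real FFT along the time axis, one spectrum per feature dimension
                spectrum = np.abs(np.fft.rfft(segment, axis=0))
                k = min(n_coeffs, spectrum.shape[0])
                low[:k] = spectrum[:k]  # zero-padded when the segment is short
            parts.append(low.ravel())
    return np.concatenate(parts)

# Usage: 30 frames of 5-dimensional features (random stand-ins for CNN outputs)
descriptor = fourier_temporal_pyramid(np.random.rand(30, 5))
print(descriptor.shape)
```

    Under these assumptions the descriptor length is (2^levels - 1) × n_coeffs × D, so it stays the same for videos of any length, which is what allows sequences of different durations to be compared by a single classifier.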
    Original language: English
    Title of host publication: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
    Editors: Lisa O'Conner
    Place of Publication: United States
    Publisher: IEEE, Institute of Electrical and Electronics Engineers
    Number of pages: 10
    ISBN (Electronic): 1063-6919
    ISBN (Print): 9781467388511
    Publication status: Published - 2016
    Event: 2016 IEEE Conference on Computer Vision and Pattern Recognition - Las Vegas, United States
    Duration: 26 Jun 2016 to 1 Jul 2016


    Conference: 2016 IEEE Conference on Computer Vision and Pattern Recognition
    Country/Territory: United States
    City: Las Vegas

