Viewpoint invariant RGB-D human action recognition

Research output: Chapter in Book/Conference paper › Conference paper › peer-review

5 Citations (Scopus)


Viewpoint variation is a major challenge in video-based human action recognition. We exploit the simultaneous RGB and Depth sensing of RGB-D cameras to address this problem. Our technique capitalizes on the complementary spatio-temporal information in the RGB and Depth frames of RGB-D videos to achieve viewpoint invariant action recognition. We extract view invariant features from the dense trajectories of the RGB stream using a non-linear knowledge transfer model. Simultaneously, view invariant human pose features are extracted from the Depth stream using a CNN model, and Fourier Temporal Pyramids are computed over them. The resulting heterogeneous features are carefully combined and used to train an L1L2 classifier. To establish the effectiveness of the proposed approach, we benchmark our technique on two standard datasets and compare its performance with twelve existing methods. Our approach achieves up to 7.2% improvement in accuracy over the nearest competitor.
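The abstract mentions computing Fourier Temporal Pyramids over the per-frame pose features. A minimal sketch of a standard FTP descriptor is given below; the function name, parameter values, and feature dimensions are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def fourier_temporal_pyramid(seq, levels=3, n_coeffs=4):
    """Compute a Fourier Temporal Pyramid (FTP) descriptor.

    seq: (T, D) array of per-frame features.
    At pyramid level l the sequence is split into 2**l equal segments;
    for every segment we keep the magnitudes of the first n_coeffs
    low-frequency Fourier coefficients of each feature dimension,
    then concatenate everything into one descriptor vector.
    """
    seq = np.asarray(seq, dtype=float)
    parts = []
    for level in range(levels):
        for segment in np.array_split(seq, 2 ** level, axis=0):
            spectrum = np.fft.fft(segment, axis=0)   # FFT along the time axis
            mags = np.abs(spectrum[:n_coeffs])       # low-frequency magnitudes
            if mags.shape[0] < n_coeffs:             # pad very short segments
                pad = np.zeros((n_coeffs - mags.shape[0], seq.shape[1]))
                mags = np.vstack([mags, pad])
            parts.append(mags.ravel())
    return np.concatenate(parts)

# Example: 30 frames of 8-dimensional pose features.
# With levels=3 there are 1 + 2 + 4 = 7 segments, so the descriptor
# has 7 * n_coeffs * 8 = 224 entries.
desc = fourier_temporal_pyramid(np.random.rand(30, 8), levels=3, n_coeffs=4)
```

Keeping only low-frequency coefficients makes the descriptor insensitive to small temporal misalignments, which is the usual motivation for FTP over raw frame concatenation.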
Original language: English
Title of host publication: 2017 International Conference on Digital Image Computing - Techniques and Applications (DICTA)
Editors: Y. Guo, H. Li, W. Cai, M. Murshed, Z. Wang, J. Gao, D.D. Feng
Place of publication: United States
Publisher: IEEE, Institute of Electrical and Electronics Engineers
ISBN (Print): 9781538628393
Publication status: Published - 2017
Event: 2017 International Conference on Digital Image Computing: Techniques and Applications - Sydney, Australia
Duration: 29 Nov 2017 – 1 Dec 2017


Conference: 2017 International Conference on Digital Image Computing: Techniques and Applications
Abbreviated title: DICTA

