Viewpoint invariant RGB-D human action recognition

Research output: Chapter in Book/Conference paper › Conference paper

3 Citations (Scopus)

Abstract

Viewpoint variation is a major challenge in video-based human action recognition. We exploit the simultaneous RGB and Depth sensing of RGB-D cameras to address this problem. Our technique capitalizes on the complementary spatio-temporal information in the RGB and Depth frames of RGB-D videos to achieve viewpoint invariant action recognition. We extract view invariant features from the dense trajectories of the RGB stream using a non-linear knowledge transfer model. Simultaneously, view invariant human pose features are extracted from the Depth stream using a CNN model, and Fourier Temporal Pyramids are computed over them. The resulting heterogeneous features are combined and used to train an L1L2 classifier. To establish the effectiveness of the proposed approach, we benchmark our technique on two standard datasets and compare its performance with twelve existing methods. Our approach achieves up to 7.2% improvement in accuracy over the nearest competitor.
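For readers who want a concrete picture of the temporal encoding step mentioned in the abstract, the sketch below computes a Fourier Temporal Pyramid over a sequence of per-frame pose features. It is a minimal NumPy illustration of the general technique only; the pyramid depth, the number of retained low-frequency coefficients, and all function and variable names are assumptions rather than the configuration used in the paper.

import numpy as np

def fourier_temporal_pyramid(features, levels=3, n_coeffs=4):
    """Illustrative FTP descriptor for a (T, D) feature sequence.

    At pyramid level l the sequence is split along time into 2**l
    equal segments; each segment is summarized by the magnitudes of
    its first `n_coeffs` Fourier coefficients per feature dimension.
    """
    features = np.asarray(features, dtype=np.float64)
    parts = []
    for level in range(levels):
        for segment in np.array_split(features, 2 ** level, axis=0):
            # Real FFT along time; low frequencies capture the coarse
            # temporal evolution and tolerate small misalignments.
            mags = np.abs(np.fft.rfft(segment, axis=0))[:n_coeffs]
            # Zero-pad if a short segment yields fewer frequency bins.
            if mags.shape[0] < n_coeffs:
                mags = np.pad(mags, ((0, n_coeffs - mags.shape[0]), (0, 0)))
            parts.append(mags.ravel())
    # Descriptor length: (2**levels - 1) * n_coeffs * D
    return np.concatenate(parts)

# Example: 60 frames of hypothetical 128-D CNN pose features.
pose_features = np.random.rand(60, 128)
print(fourier_temporal_pyramid(pose_features).shape)  # (3584,)

In the paper's pipeline, a depth-stream descriptor of this kind is combined with the view invariant RGB trajectory features before training the L1L2 classifier; that fusion step and the classifier itself are not shown here.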
Original language: English
Title of host publication: 2017 International Conference on Digital Image Computing - Techniques and Applications (DICTA)
Editors: Y. Guo, H. Li, W. Cai, M. Murshed, Z. Wang, J. Gao, D.D. Feng
Place of publication: United States
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 261-268
ISBN (Print): 9781538628393
DOIs: 10.1109/DICTA.2017.8227505
Publication status: Published - 2017
Event: 2017 International Conference on Digital Image Computing: Techniques and Applications - Sydney, Australia
Duration: 29 Nov 2017 - 1 Dec 2017

Conference

Conference: 2017 International Conference on Digital Image Computing: Techniques and Applications
Abbreviated title: DICTA
Country: Australia
City: Sydney
Period: 29/11/17 - 1/12/17

Fingerprint

Classifiers
Cameras
Trajectories

Cite this

APA
Liu, J., Akhtar, N., & Mian, A. S. (2017). Viewpoint invariant RGB-D human action recognition. In Y. Guo, H. Li, W. Cai, M. Murshed, Z. Wang, J. Gao, & D. D. Feng (Eds.), 2017 International Conference on Digital Image Computing - Techniques and Applications (DICTA) (pp. 261-268). United States: IEEE, Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/DICTA.2017.8227505
Standard
Liu, Jian ; Akhtar, Naveed ; Mian, Ajmal Saeed. / Viewpoint invariant RGB-D human action recognition. 2017 International Conference on Digital Image Computing - Techniques and Applications (DICTA). editor / Y. Guo ; H. Li ; W. Cai ; M. Murshed ; Z. Wang ; J. Gao ; D.D. Feng. United States : IEEE, Institute of Electrical and Electronics Engineers, 2017. pp. 261-268
BibTeX
@inproceedings{37ff6df807474ebf9a74d701e8651d19,
  title = "Viewpoint invariant RGB-D human action recognition",
  abstract = "Viewpoint variation is a major challenge in video-based human action recognition. We exploit the simultaneous RGB and Depth sensing of RGB-D cameras to address this problem. Our technique capitalizes on the complementary spatio-temporal information in the RGB and Depth frames of RGB-D videos to achieve viewpoint invariant action recognition. We extract view invariant features from the dense trajectories of the RGB stream using a non-linear knowledge transfer model. Simultaneously, view invariant human pose features are extracted from the Depth stream using a CNN model, and Fourier Temporal Pyramids are computed over them. The resulting heterogeneous features are combined and used to train an L1L2 classifier. To establish the effectiveness of the proposed approach, we benchmark our technique on two standard datasets and compare its performance with twelve existing methods. Our approach achieves up to 7.2{\%} improvement in accuracy over the nearest competitor.",
  author = "Jian Liu and Naveed Akhtar and Mian, {Ajmal Saeed}",
  year = "2017",
  doi = "10.1109/DICTA.2017.8227505",
  language = "English",
  isbn = "9781538628393",
  pages = "261--268",
  editor = "Y. Guo and H. Li and W. Cai and M. Murshed and Z. Wang and J. Gao and D.D. Feng",
  booktitle = "2017 International Conference on Digital Image Computing - Techniques and Applications (DICTA)",
  publisher = "IEEE, Institute of Electrical and Electronics Engineers",
  address = "United States",
}

Harvard
Liu, J, Akhtar, N & Mian, AS 2017, Viewpoint invariant RGB-D human action recognition. in Y Guo, H Li, W Cai, M Murshed, Z Wang, J Gao & DD Feng (eds), 2017 International Conference on Digital Image Computing - Techniques and Applications (DICTA). IEEE, Institute of Electrical and Electronics Engineers, United States, pp. 261-268, 2017 International Conference on Digital Image Computing: Techniques and Applications, Sydney, Australia, 29/11/17. https://doi.org/10.1109/DICTA.2017.8227505


RIS
TY - GEN
T1 - Viewpoint invariant RGB-D human action recognition
AU - Liu, Jian
AU - Akhtar, Naveed
AU - Mian, Ajmal Saeed
PY - 2017
Y1 - 2017
AB - Viewpoint variation is a major challenge in video-based human action recognition. We exploit the simultaneous RGB and Depth sensing of RGB-D cameras to address this problem. Our technique capitalizes on the complementary spatio-temporal information in the RGB and Depth frames of RGB-D videos to achieve viewpoint invariant action recognition. We extract view invariant features from the dense trajectories of the RGB stream using a non-linear knowledge transfer model. Simultaneously, view invariant human pose features are extracted from the Depth stream using a CNN model, and Fourier Temporal Pyramids are computed over them. The resulting heterogeneous features are combined and used to train an L1L2 classifier. To establish the effectiveness of the proposed approach, we benchmark our technique on two standard datasets and compare its performance with twelve existing methods. Our approach achieves up to 7.2% improvement in accuracy over the nearest competitor.
DO - 10.1109/DICTA.2017.8227505
M3 - Conference paper
SN - 9781538628393
SP - 261
EP - 268
BT - 2017 International Conference on Digital Image Computing - Techniques and Applications (DICTA)
A2 - Guo, Y.
A2 - Li, H.
A2 - Cai, W.
A2 - Murshed, M.
A2 - Wang, Z.
A2 - Gao, J.
A2 - Feng, D.D.
PB - IEEE, Institute of Electrical and Electronics Engineers
CY - United States
ER -

Vancouver
Liu J, Akhtar N, Mian AS. Viewpoint invariant RGB-D human action recognition. In Guo Y, Li H, Cai W, Murshed M, Wang Z, Gao J, Feng DD, editors, 2017 International Conference on Digital Image Computing - Techniques and Applications (DICTA). United States: IEEE, Institute of Electrical and Electronics Engineers. 2017. p. 261-268. https://doi.org/10.1109/DICTA.2017.8227505