Viewpoint Invariant RGB-D Human Action Recognition

Research output: Contribution to journal › Article

Abstract

Viewpoint variation is a major challenge in video-based human action recognition. We exploit the simultaneous RGB and Depth sensing of RGB-D cameras to address this problem. Our technique capitalizes on the complementary spatio-temporal information in the RGB and Depth frames of RGB-D videos to achieve viewpoint invariant action recognition. We extract view invariant features from the dense trajectories of the RGB stream using a non-linear knowledge transfer model. Simultaneously, view invariant human pose features are extracted from the Depth stream using a CNN model, and a Fourier Temporal Pyramid is computed over them. The resulting heterogeneous features are meticulously combined and used for training an L1L2 classifier. To establish the effectiveness of the proposed approach, we benchmark our technique on two standard datasets and compare its performance with twelve existing methods. Our approach achieves up to 7.2% improvement in accuracy over the nearest competitor.
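The record contains no code, but the depth-stream descriptor lends itself to a short sketch. Below is a minimal NumPy illustration of a Fourier Temporal Pyramid computed over per-frame CNN pose features: each pyramid level splits the sequence into equal temporal segments, and for each segment the magnitudes of the first few low-frequency FFT coefficients of every feature dimension are kept. The function name, the pyramid depth (3 levels), and the number of retained coefficients (4) are illustrative assumptions, not values taken from the paper.

import numpy as np

def fourier_temporal_pyramid(seq, levels=3, n_coeffs=4):
    """Fourier Temporal Pyramid over a (T, D) sequence of per-frame features.
    Level l splits the sequence into 2**l equal temporal segments; for each
    segment, the magnitudes of the first n_coeffs low-frequency FFT
    coefficients of every feature dimension are kept, and everything is
    concatenated into one fixed-length descriptor. (Illustrative sketch.)"""
    T, D = seq.shape
    parts = []
    for level in range(levels):
        n_seg = 2 ** level
        for s in range(n_seg):
            lo, hi = s * T // n_seg, (s + 1) * T // n_seg
            spectrum = np.fft.fft(seq[lo:hi], axis=0)   # FFT along the time axis
            mags = np.abs(spectrum[:n_coeffs])          # low-frequency magnitudes
            if mags.shape[0] < n_coeffs:                # pad very short segments
                mags = np.pad(mags, ((0, n_coeffs - mags.shape[0]), (0, 0)))
            parts.append(mags.ravel())
    return np.concatenate(parts)

# e.g. 60 frames of 128-D pose features -> a (7 * 4 * 128,) descriptor
pose_feats = np.random.randn(60, 128)
descriptor = fourier_temporal_pyramid(pose_feats)

With levels=3 the descriptor concatenates 1 + 2 + 4 = 7 segments, so the output length is fixed regardless of the number of frames, which is what lets descriptors from videos of different durations be compared directly.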
Original language: English
Journal: IEEE Access
Publication status: Published - 2018
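The abstract further states that the heterogeneous RGB and Depth features are combined and used to train an L1L2 classifier. As a hedged sketch only, scikit-learn's elastic-net (mixed L1 + L2) regularized SGDClassifier can serve as a stand-in for such a classifier; the feature dimensions, sample counts, and normalization step below are illustrative assumptions, not details from the paper.

import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import normalize

# Illustrative fused descriptors: one row per video (all numbers assumed).
rgb_feats = np.random.randn(200, 4096)   # RGB dense-trajectory transfer features
depth_ftp = np.random.randn(200, 3584)   # depth-stream FTP descriptors (7 * 4 * 128)
labels = np.random.randint(0, 10, 200)   # 10 action classes

# L2-normalize each stream before concatenation so neither dominates the fusion.
fused = np.hstack([normalize(rgb_feats), normalize(depth_ftp)])

# The elastic-net penalty mixes L1 and L2 regularization (l1_ratio balances
# the two), a plausible stand-in for the paper's L1L2 classifier.
clf = SGDClassifier(loss="log_loss", penalty="elasticnet", l1_ratio=0.5, alpha=1e-4)
clf.fit(fused, labels)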

Cite this

@article{ad28978683ec4e79a7a875435ec57e0d,
title = "Viewpoint Invariant RGB-D Human Action Recognition",
abstract = "Viewpoint variation is a major challenge in video- based human action recognition. We exploit the simultaneous RGB and Depth sensing of RGB-D cameras to address this problem. Our technique capitalizes on the complementary spatio-temporal information in RGB and Depth frames of the RGB-D videos to achieve viewpoint invariant action recognition. We extract view invariant features from the dense trajectories of the RGB stream using a non-linear knowledge transfer model. Simultaneously, view invariant human pose features are extracted using a CNN model for the Depth stream, and Fourier Temporal Pyramid are computed over them. The resulting heterogeneous features are meticulously combined and used for training an L 1 L 2 classifier. To establish the effectiveness of the proposed approach, we benchmark our technique using two standard datasets and compare its performance with twelve existing methods. Our approach achieves up to 7.2{\%} improvement in the accuracy over the nearest competitor.",
author = "Jian Liu and Naveed Akhtar and Ajmal Mian",
year = "2018",
language = "English",
journal = "IEEE Access",
issn = "2169-3536",
publisher = "IEEE, Institute of Electrical and Electronics Engineers",

}

Viewpoint Invariant RGB-D Human Action Recognition. / Liu, Jian; Akhtar, Naveed; Mian, Ajmal.

In: IEEE Access, 2018.

Research output: Contribution to journal › Article

TY - JOUR

T1 - Viewpoint Invariant RGB-D Human Action Recognition

AU - Liu, Jian

AU - Akhtar, Naveed

AU - Mian, Ajmal

PY - 2018

Y1 - 2018

N2 - Viewpoint variation is a major challenge in video-based human action recognition. We exploit the simultaneous RGB and Depth sensing of RGB-D cameras to address this problem. Our technique capitalizes on the complementary spatio-temporal information in the RGB and Depth frames of RGB-D videos to achieve viewpoint invariant action recognition. We extract view invariant features from the dense trajectories of the RGB stream using a non-linear knowledge transfer model. Simultaneously, view invariant human pose features are extracted from the Depth stream using a CNN model, and a Fourier Temporal Pyramid is computed over them. The resulting heterogeneous features are meticulously combined and used for training an L1L2 classifier. To establish the effectiveness of the proposed approach, we benchmark our technique on two standard datasets and compare its performance with twelve existing methods. Our approach achieves up to 7.2% improvement in accuracy over the nearest competitor.

AB - Viewpoint variation is a major challenge in video-based human action recognition. We exploit the simultaneous RGB and Depth sensing of RGB-D cameras to address this problem. Our technique capitalizes on the complementary spatio-temporal information in the RGB and Depth frames of RGB-D videos to achieve viewpoint invariant action recognition. We extract view invariant features from the dense trajectories of the RGB stream using a non-linear knowledge transfer model. Simultaneously, view invariant human pose features are extracted from the Depth stream using a CNN model, and a Fourier Temporal Pyramid is computed over them. The resulting heterogeneous features are meticulously combined and used for training an L1L2 classifier. To establish the effectiveness of the proposed approach, we benchmark our technique on two standard datasets and compare its performance with twelve existing methods. Our approach achieves up to 7.2% improvement in accuracy over the nearest competitor.

M3 - Article

JO - IEEE Access

JF - IEEE Access

SN - 2169-3536

ER -