A Comparative Review of Recent Kinect-based Action Recognition Algorithms

Lei Wang, Du Huynh, Piotr Koniusz

Research output: Contribution to journal › Article

Abstract

Video-based human action recognition is currently one of the most active research areas in computer vision. Various research studies indicate that the performance of action recognition is highly dependent on the type of features being extracted and how the actions are represented. Since the release of the Kinect camera, a large number of Kinect-based human action recognition techniques have been proposed in the literature. However, there still does not exist a thorough comparison of these Kinect-based techniques under the grouping of feature types, such as handcrafted versus deep learning features and depth-based versus skeleton-based features. In this paper, we analyze and compare ten recent Kinect-based algorithms for both cross-subject action recognition and cross-view action recognition using six benchmark datasets. In addition, we have implemented and improved some of these techniques and included their variants in the comparison. Our experiments show that the majority of methods perform better on cross-subject action recognition than cross-view action recognition, that skeleton-based features are more robust for cross-view recognition than depth-based features, and that deep learning features are suitable for large datasets.
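The abstract's main finding depends on the difference between the two evaluation protocols. In cross-subject recognition, the actors in the test set never appear in training. In cross-view recognition, the camera viewpoints in the test set never appear in training. The sketch below illustrates that difference. It assumes a hypothetical record layout (`Sample` with `subject` and `view` fields) and hypothetical helper names and IDs. It is not the paper's code, and each benchmark dataset defines its own official subject and view splits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One recorded action clip (hypothetical record layout)."""
    clip_id: str
    subject: int  # performer ID (hypothetical)
    view: int     # camera / viewpoint ID (hypothetical)
    label: str    # action class


def cross_subject_split(samples, train_subjects):
    """Train on clips of some performers; test on clips of unseen performers."""
    train = [s for s in samples if s.subject in train_subjects]
    test = [s for s in samples if s.subject not in train_subjects]
    return train, test


def cross_view_split(samples, train_views):
    """Train on clips from some camera views; test on clips from held-out views."""
    train = [s for s in samples if s.view in train_views]
    test = [s for s in samples if s.view not in train_views]
    return train, test


if __name__ == "__main__":
    # Toy data: 2 subjects x 2 views x 2 actions (all IDs are illustrative)
    samples = [
        Sample(f"s{p}_v{v}_{a}", p, v, a)
        for p in (1, 2) for v in (1, 2) for a in ("wave", "clap")
    ]
    cs_train, cs_test = cross_subject_split(samples, train_subjects={1})
    cv_train, cv_test = cross_view_split(samples, train_views={2})
    print(len(cs_train), len(cs_test), len(cv_train), len(cv_test))  # 4 4 4 4
    # No performer appears on both sides of the cross-subject split
    assert {s.subject for s in cs_train}.isdisjoint({s.subject for s in cs_test})
    # No camera view appears on both sides of the cross-view split
    assert {s.view for s in cv_train}.isdisjoint({s.view for s in cv_test})
```

In the cross-view split, the test set contains only viewpoints that the model never saw during training. This is one plausible reason why most methods score lower in that setting.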
Original language: English
Pages (from-to): 15-28
Number of pages: 14
Journal: IEEE Transactions on Image Processing
Volume: 29
DOI: 10.1109/TIP.2019.2925285
Publication status: Published - 2020

Cite this

@article{695959b8a58d489f89d46a3f8d7d147b,
title = "A Comparative Review of Recent Kinect-based Action Recognition Algorithms",
abstract = "Video-based human action recognition is currently one of the most active research areas in computer vision. Various research studies indicate that the performance of action recognition is highly dependent on the type of features being extracted and how the actions are represented. Since the release of the Kinect camera, a large number of Kinect-based human action recognition techniques have been proposed in the literature. However, there still does not exist a thorough comparison of these Kinect-based techniques under the grouping of feature types, such as handcrafted versus deep learning features and depth-based versus skeleton-based features. In this paper, we analyze and compare ten recent Kinect-based algorithms for both cross-subject action recognition and cross-view action recognition using six benchmark datasets. In addition, we have implemented and improved some of these techniques and included their variants in the comparison. Our experiments show that the majority of methods perform better on cross-subject action recognition than cross-view action recognition, that skeleton-based features are more robust for cross-view recognition than depth-based features, and that deep learning features are suitable for large datasets.",
keywords = "cs.CV",
author = "Lei Wang and Du Huynh and Piotr Koniusz",
year = "2020",
doi = "10.1109/TIP.2019.2925285",
language = "English",
volume = "29",
pages = "15--28",
journal = "IEEE Transactions on Image Processing",
issn = "1057-7149",
publisher = "IEEE, Institute of Electrical and Electronics Engineers",

}

A Comparative Review of Recent Kinect-based Action Recognition Algorithms. / Wang, Lei; Huynh, Du; Koniusz, Piotr.

In: IEEE Transactions on Image Processing, Vol. 29, 2020, p. 15-28.

TY  - JOUR

T1  - A Comparative Review of Recent Kinect-based Action Recognition Algorithms

AU  - Wang, Lei

AU  - Huynh, Du

AU  - Koniusz, Piotr

PY  - 2020

Y1  - 2020

N2  - Video-based human action recognition is currently one of the most active research areas in computer vision. Various research studies indicate that the performance of action recognition is highly dependent on the type of features being extracted and how the actions are represented. Since the release of the Kinect camera, a large number of Kinect-based human action recognition techniques have been proposed in the literature. However, there still does not exist a thorough comparison of these Kinect-based techniques under the grouping of feature types, such as handcrafted versus deep learning features and depth-based versus skeleton-based features. In this paper, we analyze and compare ten recent Kinect-based algorithms for both cross-subject action recognition and cross-view action recognition using six benchmark datasets. In addition, we have implemented and improved some of these techniques and included their variants in the comparison. Our experiments show that the majority of methods perform better on cross-subject action recognition than cross-view action recognition, that skeleton-based features are more robust for cross-view recognition than depth-based features, and that deep learning features are suitable for large datasets.

AB  - Video-based human action recognition is currently one of the most active research areas in computer vision. Various research studies indicate that the performance of action recognition is highly dependent on the type of features being extracted and how the actions are represented. Since the release of the Kinect camera, a large number of Kinect-based human action recognition techniques have been proposed in the literature. However, there still does not exist a thorough comparison of these Kinect-based techniques under the grouping of feature types, such as handcrafted versus deep learning features and depth-based versus skeleton-based features. In this paper, we analyze and compare ten recent Kinect-based algorithms for both cross-subject action recognition and cross-view action recognition using six benchmark datasets. In addition, we have implemented and improved some of these techniques and included their variants in the comparison. Our experiments show that the majority of methods perform better on cross-subject action recognition than cross-view action recognition, that skeleton-based features are more robust for cross-view recognition than depth-based features, and that deep learning features are suitable for large datasets.

KW  - cs.CV

UR  - https://www2.scopus.com/record/display.uri?eid=2-s2.0-85072509273&origin=resultslist&sort=plf-f&src=s&st1=10.1109%2fTIP.2019.2925285&st2=&sid=12cff259703a1dfbe46edd58189a5836&sot=b&sdt=b&sl=29&s=DOI%2810.1109%2fTIP.2019.2925285%29&relpos=0&citeCnt=0&searchTerm=

U2  - 10.1109/TIP.2019.2925285

DO  - 10.1109/TIP.2019.2925285

M3  - Article

VL  - 29

SP  - 15

EP  - 28

JO  - IEEE Transactions on Image Processing

JF  - IEEE Transactions on Image Processing

SN  - 1057-7149

ER  - 