Adversarial Attack on Skeleton-Based Human Action Recognition

Research output: Contribution to journal (Article)

Abstract

Deep learning models achieve impressive performance for skeleton-based human action recognition. Graph convolutional networks (GCNs) are particularly suitable for this task because skeleton data are naturally graph-structured. However, the robustness of these models to adversarial attacks remains largely unexplored, owing to their complex spatiotemporal nature, which must represent sparse and discrete skeleton joints. This work presents the first adversarial attack on skeleton-based action recognition with GCNs. The proposed targeted attack, termed constrained iterative attack for skeleton actions (CIASA), perturbs joint locations in an action sequence such that the resulting adversarial sequence preserves the temporal coherence, spatial integrity, and anthropomorphic plausibility of the skeletons. CIASA achieves this by satisfying multiple physical constraints, spatially realigning the perturbed skeletons, and regularizing the adversarial skeletons with generative networks. We also explore semantically imperceptible localized attacks with CIASA and succeed in fooling state-of-the-art skeleton action recognition models with high confidence. CIASA perturbations show high transferability in black-box settings. We further show that the perturbed skeleton sequences can induce adversarial behavior in RGB videos created with computer graphics. A comprehensive evaluation on the NTU and Kinetics data sets confirms the effectiveness of CIASA against graph-based skeleton action recognition and reveals the imminent threat to spatiotemporal deep learning tasks in general.
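
The abstract does not give CIASA's update rule, so the following is only a generic sketch and not the authors' method. It shows an iterative, targeted sign-gradient attack (in the style of iterative FGSM) on a toy linear classifier, with an L-infinity perturbation budget and a bone-length realignment step that loosely mirrors the idea of spatially realigning perturbed skeletons. The 5-joint tree PARENTS, the linear model W and b, and the parameters eps, alpha and steps are all hypothetical; CIASA's generative-network regularizer, temporal constraints and localized attacks are not modeled.

```python
import numpy as np

# Hypothetical 5-joint kinematic tree: PARENTS[j] is the parent of joint j
# (the root has -1). Parents precede children, so one forward pass over the
# joints can re-impose every bone length.
PARENTS = np.array([-1, 0, 1, 0, 3])


def bone_lengths(x):
    """Per-frame bone lengths for a sequence x of shape (T, J, 3)."""
    child = np.arange(1, x.shape[1])
    return np.linalg.norm(x[:, child] - x[:, PARENTS[child]], axis=-1)


def realign(x_adv, ref_lengths):
    """Rescale every bone of x_adv to its reference length, root outward."""
    out = x_adv.copy()
    for j in range(1, out.shape[1]):
        p = PARENTS[j]
        vec = out[:, j] - out[:, p]                          # (T, 3), parent already realigned
        norm = np.linalg.norm(vec, axis=-1, keepdims=True)   # (T, 1)
        out[:, j] = out[:, p] + vec / np.maximum(norm, 1e-8) * ref_lengths[:, j - 1:j]
    return out


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def targeted_attack(x, W, b, target, eps=0.05, alpha=0.01, steps=20):
    """Iterative targeted sign-gradient attack on a toy linear classifier.

    W has shape (C, T*J*3) and b has shape (C,). Returns a perturbed copy of x.
    """
    ref = bone_lengths(x)
    onehot = np.eye(W.shape[0])[target]
    x_adv = x.copy()
    for _ in range(steps):
        logits = W @ x_adv.ravel() + b
        # Gradient of the cross-entropy loss for the target class w.r.t. the input.
        grad = (W.T @ (softmax(logits) - onehot)).reshape(x.shape)
        x_adv = x_adv - alpha * np.sign(grad)     # descend the loss toward the target class
        x_adv = np.clip(x_adv, x - eps, x + eps)  # L-infinity budget around the clean sequence
        # Bone lengths take priority: realignment may push non-root joints
        # slightly outside the eps box.
        x_adv = realign(x_adv, ref)
    return x_adv


rng = np.random.default_rng(0)
T, J, C = 8, 5, 4
x = rng.normal(size=(T, J, 3))
W = rng.normal(size=(C, T * J * 3))
b = np.zeros(C)
x_adv = targeted_attack(x, W, b, target=2)
# Maximum bone-length change; should be ~0 up to floating-point rounding.
print(np.abs(bone_lengths(x_adv) - bone_lengths(x)).max())
```

Ordering the projection before the realignment is a design choice in this sketch: it keeps bone lengths exact at the cost of a slightly loosened perturbation budget for non-root joints.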

Original language: English
Journal: IEEE Transactions on Neural Networks and Learning Systems
Publication status: E-pub ahead of print, 22 Dec 2020
