Deep learning models achieve impressive performance for skeleton-based human action recognition. Graph convolutional networks (GCNs) are particularly suitable for this task because skeleton data are naturally graph-structured. However, the robustness of these models to adversarial attacks remains largely unexplored, owing to their complex spatiotemporal nature, which must represent sparse and discrete skeleton joints. This work presents the first adversarial attack on skeleton-based action recognition with GCNs. The proposed targeted attack, termed constrained iterative attack for skeleton actions (CIASA), perturbs joint locations in an action sequence such that the resulting adversarial sequence preserves the temporal coherence, spatial integrity, and anthropomorphic plausibility of the skeletons. CIASA achieves this by satisfying multiple physical constraints, spatially realigning the perturbed skeletons, and regularizing the adversarial skeletons with generative networks. We also explore semantically imperceptible localized attacks with CIASA and succeed in fooling state-of-the-art skeleton action recognition models with high confidence. CIASA perturbations show high transferability in black-box settings. We also show that the perturbed skeleton sequences can induce adversarial behavior in RGB videos created with computer graphics. A comprehensive evaluation on the NTU and Kinetics data sets confirms the effectiveness of CIASA for graph-based skeleton action recognition and reveals an imminent threat to spatiotemporal deep learning tasks in general.
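To make the idea of a constrained, iterative, targeted perturbation of joint locations more concrete, the following is a minimal sketch and not the CIASA implementation. Everything in it is an assumption made for illustration: a linear softmax classifier (`W`, `b`) stands in for a GCN, a five-joint chain (`PARENTS`) stands in for a real skeleton, a moving average over time stands in for temporal coherence, and bone-length restoration stands in for spatial integrity. The generative-network regularization and skeleton realignment described in the abstract are not modeled.

```python
import numpy as np

# Hypothetical toy skeleton: a 5-joint chain; PARENTS[j] is the parent of joint j.
PARENTS = [-1, 0, 1, 2, 3]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def smooth_in_time(delta, k=3):
    """Moving average of the perturbation along the time axis (edge-padded)."""
    pad = k // 2
    padded = np.pad(delta, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    return np.stack([padded[t:t + k].mean(axis=0) for t in range(delta.shape[0])])

def restore_bone_lengths(seq, ref):
    """Rescale every bone in `seq` to the length it has in `ref`, root to leaf."""
    out = seq.copy()
    for j, p in enumerate(PARENTS):
        if p < 0:
            continue  # the root joint keeps its perturbed position
        bone = seq[:, j] - seq[:, p]
        length = np.linalg.norm(ref[:, j] - ref[:, p], axis=1, keepdims=True)
        direction = bone / (np.linalg.norm(bone, axis=1, keepdims=True) + 1e-12)
        out[:, j] = out[:, p] + direction * length
    return out

def targeted_attack(x, W, b, target, eps=0.05, step=0.01, iters=50):
    """x: (T, J, 3) joint positions; W: (C, T*J*3) and b: (C,) toy classifier."""
    adv = x.copy()
    onehot = np.eye(W.shape[0])[target]
    for _ in range(iters):
        probs = softmax(W @ adv.ravel() + b)
        # Gradient of the cross-entropy toward `target` with respect to the input.
        grad = (W.T @ (probs - onehot)).reshape(x.shape)
        delta = adv - step * np.sign(grad) - x             # signed descent step
        delta = smooth_in_time(np.clip(delta, -eps, eps))  # bound, then smooth
        adv = restore_bone_lengths(x + delta, x)           # keep bone lengths
    return adv

# Usage on synthetic data (8 frames, 5 joints, 4 classes).
rng = np.random.default_rng(0)
T, J, C = 8, len(PARENTS), 4
t = np.arange(T)[:, None]
j = np.arange(J)[None, :]
x = np.stack([0.1 * np.sin(t / 3 + j), 0.5 * j * np.ones((T, 1)), np.zeros((T, J))], axis=-1)
W, b = rng.normal(size=(C, T * J * 3)), np.zeros(C)
adv = targeted_attack(x, W, b, target=2)
```

In this sketch the per-coordinate bound `eps` is enforced before the bone-length step, so only the root joint is guaranteed to stay within `eps` of its original position; the other joints are instead constrained to keep their original bone lengths.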
|Journal||IEEE Transactions on Neural Networks and Learning Systems|
|Publication status||E-pub ahead of print - 22 Dec 2020|