TY - JOUR
T1 - Variation-aware directed graph convolutional networks for skeleton-based action recognition
AU - Li, Tianchen
AU - Geng, Pei
AU - Cai, Guohui
AU - Hou, Xinran
AU - Lu, Xuequan
AU - Lyu, Lei
N1 - Publisher Copyright:
© 2024 Elsevier B.V.
PY - 2024/10/25
Y1 - 2024/10/25
N2 - Directed graph convolutional networks (DGCNs) have gained attention in skeleton-based action recognition for capturing the hierarchical relationships of the skeleton via directed graph topology. However, they typically pay equal attention to regions of the skeleton that vary and regions that remain relatively static, which leads to low accuracy when recognizing fine-grained actions. To this end, we design an innovative variation-aware directed graph convolutional network (VA-DGCN) that focuses on the regions where variation takes place. VA-DGCN comprises a variation-aware directed spatial convolution (VDSC) module and a multi-scale contrastive temporal convolution (MCTC) module. Specifically, to capture subtle variations in fine-grained actions, VDSC introduces the average posture of the action sequence as a static anchor. In VDSC, a channel-specific topology branch models the distinct kinematic properties of different channels to extract global features, and global-attention graph convolution is added to handle joints that are not naturally connected. Subsequently, we identify regions with variations by comparing the global features acquired from the action sequence and the average posture. In the temporal dimension, MCTC comprises multiple branches that extract temporal features at different scales. Moreover, to maximize the mutual information between branches, we introduce contrastive learning to drive the module to learn more meaningful action representations. We conduct extensive experiments on three public datasets to validate the feasibility and efficacy of our proposed VA-DGCN.
AB - Directed graph convolutional networks (DGCNs) have gained attention in skeleton-based action recognition for capturing the hierarchical relationships of the skeleton via directed graph topology. However, they typically pay equal attention to regions of the skeleton that vary and regions that remain relatively static, which leads to low accuracy when recognizing fine-grained actions. To this end, we design an innovative variation-aware directed graph convolutional network (VA-DGCN) that focuses on the regions where variation takes place. VA-DGCN comprises a variation-aware directed spatial convolution (VDSC) module and a multi-scale contrastive temporal convolution (MCTC) module. Specifically, to capture subtle variations in fine-grained actions, VDSC introduces the average posture of the action sequence as a static anchor. In VDSC, a channel-specific topology branch models the distinct kinematic properties of different channels to extract global features, and global-attention graph convolution is added to handle joints that are not naturally connected. Subsequently, we identify regions with variations by comparing the global features acquired from the action sequence and the average posture. In the temporal dimension, MCTC comprises multiple branches that extract temporal features at different scales. Moreover, to maximize the mutual information between branches, we introduce contrastive learning to drive the module to learn more meaningful action representations. We conduct extensive experiments on three public datasets to validate the feasibility and efficacy of our proposed VA-DGCN.
KW - 3D human skeleton data
KW - Fine-grained action recognition
KW - Graph convolutional network
KW - Self-attention mechanism
UR - https://www.scopus.com/pages/publications/85201154919
U2 - 10.1016/j.knosys.2024.112319
DO - 10.1016/j.knosys.2024.112319
M3 - Article
AN - SCOPUS:85201154919
SN - 0950-7051
VL - 302
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 112319
ER -