TY - JOUR
T1 - Learning Sparse Temporal Video Mapping for Action Quality Assessment in Floor Gymnastics
AU - Zahan, Sania
AU - Hassan, Mubashar
AU - Mian, Ajmal
PY - 2024
Y1 - 2024/5/16
AB - Automated athlete performance measurement from sports videos can significantly enhance sports evaluation. It requires modeling extended sequences, as the intricate spatio-temporal progression significantly influences overall performance. However, the lack of comprehensive datasets with long-duration samples has hindered researchers from focusing on temporal aspects, leading them to primarily concentrate on spatial structures for assessing short-duration videos. Consequently, in-depth analysis of longer videos has received limited attention. This study aims to explore long-term videos and analyze local discriminative spatial dependencies and global semantics for sports action quality assessment (AQA). A new dataset of artistic gymnastics floor routines, coined AGF-Olympics, is presented in this paper. It features highly challenging scenarios with extensive variations in background, viewpoint, and scale, and sample durations of up to 2 min. Finally, a discriminative non-local attention (DNLA) is introduced for score regression that effectively maps dense feature space to a sparse representation by disentangling complex associations in long-duration sports videos. DNLA encodes crucial features by analyzing cross-space–time correlations and filtering out features with lower significance. Thus, it ensures that the model prioritizes significant joints in the spatial domain and frames in the temporal domain. Experimental results demonstrate that the proposed method achieves superior performance and provides a benchmark for the AGF-Olympics dataset. Overall, the proposed method achieves a 7% higher regression rate with a 65.14% reduction in FLOPs and 52.82% faster inference time compared to the current state-of-the-art method.
U2 - 10.1109/TIM.2024.3398072
DO - 10.1109/TIM.2024.3398072
M3 - Article
SN - 0018-9456
VL - 73
SP - 1
EP - 11
JO - Institute of Electrical and Electronics Engineers Transactions on Instrumentation and Measurement
JF - Institute of Electrical and Electronics Engineers Transactions on Instrumentation and Measurement
M1 - 5020311
ER -