Continuous Gesture Segmentation and Recognition using 3DCNN and Convolutional LSTM

Guangming Zhu, Liang Zhang, Peiyi Shen, Juan Song, Syed Afaq Ali Shah, Mohammed Bennamoun

Research output: Contribution to journal › Article › peer-review

67 Citations (Scopus)


Continuous gesture recognition aims to recognize ongoing gestures from continuous gesture sequences, and is particularly meaningful for practical scenarios in which the start and end frames of each gesture instance are generally unknown. This paper presents an effective deep architecture for continuous gesture recognition. First, continuous gesture sequences are segmented into isolated gesture instances using the proposed temporal dilated Res3D network. A balanced squared hinge loss function is proposed to deal with the imbalance between boundary and non-boundary frames. Temporal dilation preserves the temporal information needed for dense, fine-grained boundary detection, and the large temporal receptive field makes the segmentation results more reasonable and effective. Then, the recognition network is constructed from a 3D convolutional neural network (3DCNN), a convolutional Long Short-Term Memory network (ConvLSTM), and a 2D convolutional neural network (2DCNN) for isolated gesture recognition. The "3DCNN-ConvLSTM-2DCNN" architecture is more effective at learning long-term, deep spatiotemporal features. The proposed segmentation and recognition networks achieve a Jaccard index of 0.7163 on the ChaLearn LAP ConGD dataset, which is 0.106 higher than that of the winner of the 2017 ChaLearn LAP Large-scale Continuous Gesture Recognition Challenge.
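The Jaccard index reported above measures, per gesture class, the overlap between the frames labeled with that gesture in the ground truth and in the prediction. As a minimal sketch (not the official ChaLearn evaluation script, which additionally averages over all sequences in the test set), a frame-level Jaccard score for a single sequence could look like this; the function name and the convention that label 0 means "no gesture" are assumptions for illustration:

```python
def jaccard_index(gt, pred):
    """Mean frame-level Jaccard index for one sequence.

    gt, pred: lists of per-frame gesture labels (0 = no gesture).
    For each gesture class present in either list, the score is
    |intersection| / |union| of the frame sets carrying that label;
    the sequence score is the mean over those classes.
    """
    classes = (set(gt) | set(pred)) - {0}  # ignore the background label
    if not classes:
        return 0.0
    scores = []
    for c in classes:
        a = {i for i, label in enumerate(gt) if label == c}
        b = {i for i, label in enumerate(pred) if label == c}
        scores.append(len(a & b) / len(a | b))
    return sum(scores) / len(scores)
```

For example, a prediction that misses one boundary frame of a three-frame gesture scores 2/3 on that class, which is then averaged with the scores of the other gesture classes in the sequence.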

Original language: English
Article number: 8458185
Pages (from-to): 1011-1021
Number of pages: 11
Journal: IEEE Transactions on Multimedia
Issue number: 4
Publication status: Published - Apr 2019

