Attention in Convolutional LSTM for Gesture Recognition

Research output: Contribution to journal › Conference article › peer-review

76 Citations (Scopus)


Convolutional long short-term memory (LSTM) networks have been widely used for action and gesture recognition, and various attention mechanisms have been embedded into LSTM and convolutional LSTM (ConvLSTM) networks. Building on previous gesture recognition architectures that combine a three-dimensional convolutional neural network (3DCNN) with ConvLSTM, this paper explores the effects of attention mechanisms in ConvLSTM. Several ConvLSTM variants are evaluated: (a) removing the convolutional structures of the three gates, (b) applying an attention mechanism to the input of ConvLSTM, and (c) reconstructing the input gate and (d) the output gate, respectively, with a modified channel-wise attention mechanism. The evaluation results show that the spatial convolutions in the three gates contribute little to spatiotemporal feature fusion, and that the attention mechanisms embedded into the input and output gates do not improve feature fusion. In other words, when it takes spatial or spatiotemporal features as input, ConvLSTM contributes mainly to temporal fusion across recurrent steps, learning long-term spatiotemporal features. On this basis, a new variant of LSTM is derived in which the convolutional structures are embedded only into the input-to-state transition of LSTM. The code of the LSTM variants is publicly available.
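The derived variant can be sketched as a single recurrent step. This is a minimal, hypothetical pure-Python sketch, not the authors' released code. It assumes that the three gates are computed by fully connected layers on the global-average-pooled input X_t and hidden state H_{t-1}, so each gate is one scalar per channel broadcast over space. It also assumes that the candidate memory is the only convolutional term, here a 3×3 "same" convolution over the channel-concatenated X_t and H_{t-1}; the paper's exact formulation may differ. The names `cell_step`, `init_params`, `conv3x3_same` and `run`, the kernel size, and the initialization range are all illustrative assumptions.

```python
import math
import random

# Hypothetical sketch (not the authors' code): gates from pooled features,
# convolution only in the candidate (input-to-state) term.
# Feature maps are nested lists shaped [channels][height][width].

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def conv3x3_same(x, w, b):
    """3x3 convolution, stride 1, zero padding (output keeps H x W).
    x: [C_in][H][W], w: [C_out][C_in][3][3], b: [C_out] -> [C_out][H][W]."""
    height, width = len(x[0]), len(x[0][0])
    out = []
    for o in range(len(w)):
        plane = [[b[o]] * width for _ in range(height)]
        for c in range(len(x)):
            for r in range(height):
                for s in range(width):
                    acc = 0.0
                    for dr in (-1, 0, 1):
                        for ds in (-1, 0, 1):
                            rr, ss = r + dr, s + ds
                            if 0 <= rr < height and 0 <= ss < width:
                                acc += w[o][c][dr + 1][ds + 1] * x[c][rr][ss]
                    plane[r][s] += acc
        out.append(plane)
    return out

def global_avg_pool(x):
    """[C][H][W] -> [C]: one scalar per channel."""
    return [sum(sum(row) for row in plane) / (len(plane) * len(plane[0])) for plane in x]

def dense(v, w, b):
    """Fully connected layer: w is [out][in], b is [out]."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + bo for row, bo in zip(w, b)]

def init_params(c_in, c_hid, seed=0):
    rng = random.Random(seed)

    def u():
        return rng.uniform(-0.1, 0.1)  # assumed small uniform init

    p = {}
    for gate in ("i", "f", "o"):
        # Gate weights act on the pooled [x; h] vector: no spatial kernel.
        p["W" + gate] = [[u() for _ in range(c_in + c_hid)] for _ in range(c_hid)]
        p["b" + gate] = [0.0] * c_hid
    # Candidate weights keep a 3x3 spatial kernel over the concatenated [x; h] maps.
    p["Wg"] = [[[[u() for _ in range(3)] for _ in range(3)]
                for _ in range(c_in + c_hid)] for _ in range(c_hid)]
    p["bg"] = [0.0] * c_hid
    return p

def cell_step(x, h_prev, c_prev, p):
    """One recurrent step; returns (h, c), each shaped [C_hid][H][W]."""
    pooled = global_avg_pool(x) + global_avg_pool(h_prev)      # concat -> [C_in + C_hid]
    i = [sigmoid(v) for v in dense(pooled, p["Wi"], p["bi"])]  # channel-wise gates
    f = [sigmoid(v) for v in dense(pooled, p["Wf"], p["bf"])]
    o = [sigmoid(v) for v in dense(pooled, p["Wo"], p["bo"])]
    g = conv3x3_same(x + h_prev, p["Wg"], p["bg"])             # the only convolution
    c = [[[f[k] * cp + i[k] * math.tanh(gv) for cp, gv in zip(crow, grow)]
          for crow, grow in zip(c_prev[k], g[k])] for k in range(len(g))]
    h = [[[o[k] * math.tanh(cv) for cv in row] for row in c[k]] for k in range(len(c))]
    return h, c

def zeros(c, height, width):
    return [[[0.0] * width for _ in range(height)] for _ in range(c)]

def run(frames, c_hid, p):
    """Unroll the cell over a list of [C_in][H][W] frames from a zero state."""
    height, width = len(frames[0][0]), len(frames[0][0][0])
    h, c = zeros(c_hid, height, width), zeros(c_hid, height, width)
    for x_t in frames:
        h, c = cell_step(x_t, h, c, p)
    return h, c
```

Because the gate weight matrices carry no spatial kernel, with C_in = 2 and C_hid = 3 the three gates in this sketch need 3 · 3 · 5 = 45 weights, whereas 3×3 convolutional gates would need 405. Any spatial modeling left in the cell comes from the candidate convolution alone.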
Original language: English
Pages (from-to): 1953-1962
Number of pages: 10
Journal: Advances in Neural Information Processing Systems
Publication status: Published - 1 Oct 2018
Event: 32nd Conference on Neural Information Processing Systems - Montréal, Canada
Duration: 3 Dec 2018 to 8 Dec 2018

