Sign Language Translation with Hierarchical Spatio-Temporal Graph Neural Network

Jichao Kan, Kun Hu, Markus Hagenbuchner, Ah Chung Tsoi, Mohammed Bennamoun, Zhiyong Wang

Research output: Chapter in Book/Conference paper; Conference paper; peer-review

Abstract

Sign language translation (SLT), which generates text in a spoken language from visual content in a sign language, is important for supporting communication in the hard-of-hearing community. Inspired by neural machine translation (NMT), most existing SLT studies adopt a general sequence-to-sequence learning strategy. However, SLT differs significantly from general NMT tasks because sign languages convey messages through multiple visual-manual aspects. Therefore, in this paper, these unique characteristics of sign languages are formulated as hierarchical spatio-temporal graph representations, comprising high-level and fine-level graphs in which each vertex characterizes a specified body part and each edge represents an interaction between body parts. In particular, high-level graphs represent the patterns in regions such as the hands and face, while fine-level graphs consider the joints of the hands and the landmarks of facial regions. To learn these graph patterns, a novel deep learning architecture, the hierarchical spatio-temporal graph neural network (HST-GNN), is proposed. Graph convolutions and graph self-attentions with neighborhood context are proposed to characterize both the local and the global graph properties. Experimental results on benchmark datasets demonstrate the effectiveness of the proposed method.
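
The two graph levels described in the abstract can be pictured with a small sketch. It is an illustration under stated assumptions, not the HST-GNN architecture itself: the vertex names, edges, feature values, and the plain mean-aggregation rule in the hypothetical helper `mean_graph_conv` are all invented for the example, and the temporal dimension and graph self-attention are omitted.

```python
# Illustrative sketch only: vertex names, edges, features, and the
# mean-aggregation rule are assumptions, not the paper's HST-GNN layers.

def mean_graph_conv(features, edges):
    """One propagation step: each vertex averages itself and its neighbours."""
    neighbours = {v: {v} for v in features}  # start with a self-loop
    for a, b in edges:                       # edges are undirected
        neighbours[a].add(b)
        neighbours[b].add(a)
    out = {}
    for v, nbrs in neighbours.items():
        dim = len(features[v])
        out[v] = [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
    return out

# High-level graph: one vertex per body region (hypothetical choice of regions).
high_features = {
    "left_hand": [1.0, 0.0],
    "right_hand": [0.0, 1.0],
    "face": [1.0, 1.0],
}
high_edges = [
    ("left_hand", "right_hand"),
    ("left_hand", "face"),
    ("right_hand", "face"),
]

# Fine-level graph for one region: a toy chain of three hand joints.
fine_features = {"wrist": [0.0], "knuckle": [3.0], "fingertip": [6.0]}
fine_edges = [("wrist", "knuckle"), ("knuckle", "fingertip")]

high_out = mean_graph_conv(high_features, high_edges)
fine_out = mean_graph_conv(fine_features, fine_edges)
```

In this toy setup the high-level graph is fully connected, so every region ends up with the same averaged feature, while the fine-level chain only mixes information between adjacent joints. That contrast is one informal way to read the abstract's distinction between global and local graph properties.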

Original language: English
Title of host publication: Proceedings - 2022 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2022
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 2131-2140
Number of pages: 10
ISBN (Electronic): 9781665409155
Publication status: Published - 2022
Event: 22nd IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2022 - Waikoloa, United States
Duration: 4 Jan 2022 to 8 Jan 2022

Conference

Conference: 22nd IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2022
Country/Territory: United States
City: Waikoloa
Period: 4/01/22 to 8/01/22
