REXUP: I REason, I EXtract, I UPdate with Structured Compositional Reasoning for Visual Question Answering

Siwen Luo, Soyeon Caren Han, Kaiyuan Sun, Josiah Poon

Research output: Chapter in Book/Conference paperConference paperpeer-review

4 Citations (Scopus)

Abstract

Visual Question Answering (VQA) is a challenging multi-modal task that requires not only the semantic understanding of images and questions, but also the sound perception of a step-by-step reasoning process that would lead to the correct answer. So far, most successful attempts in VQA have been focused on only one aspect; either the interaction of visual pixel features of images and word features of questions, or the reasoning process of answering the question of an image with simple objects. In this paper, we propose a deep reasoning VQA model (REXUP- REason, EXtract, and UPdate) with explicit visual structure-aware textual information, and it works well in capturing step-by-step reasoning process and detecting complex object-relationships in photo-realistic images. REXUP consists of two branches, image object-oriented and scene graph-oriented, which jointly works with the super-diagonal fusion compositional attention networks. We evaluate REXUP on the benchmark GQA dataset and conduct extensive ablation studies to explore the reasons behind REXUP’s effectiveness. Our best model significantly outperforms the previous state-of-the-art, which delivers 92.7% on the validation set, and 73.1% on the test-dev set. Our code is available at: https://github.com/usydnlp/REXUP/.

Original languageEnglish
Title of host publicationNeural Information Processing - 27th International Conference, ICONIP 2020, Proceedings
EditorsHaiqin Yang, Kitsuchart Pasupa, Andrew Chi-Sing Leung, James T. Kwok, Jonathan H. Chan, Irwin King
PublisherSpringer
Pages520-532
Number of pages13
ISBN (Print)9783030638290
DOIs
Publication statusPublished - 2020
Externally publishedYes
Event27th International Conference on Neural Information Processing, ICONIP 2020 - Bangkok, Thailand
Duration: 18 Nov 202022 Nov 2020

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume12532 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference27th International Conference on Neural Information Processing, ICONIP 2020
Country/TerritoryThailand
CityBangkok
Period18/11/2022/11/20

Fingerprint

Dive into the research topics of 'REXUP: I REason, I EXtract, I UPdate with Structured Compositional Reasoning for Visual Question Answering'. Together they form a unique fingerprint.

Cite this