Multimodal graph representation for chart question answering

Yue Dai

Research output: Thesis › Master's Thesis


Abstract

Chart Question Answering (ChartQA) involves interpreting charts and answering questions about them. Existing models often lose object-level information because transformer-based architectures encode chart images as flat patch sequences. To address this, we propose a multimodal scene graph that combines a visual graph and a textual graph to capture the structural and semantic relationships in a chart. We further introduce a multimodal graph contrastive learning framework that fuses the two graph representations, and we inject the learned representation into the model's decoder as a soft prompt. Experimental results show significant performance improvements. We also explore chain-of-thought prompting to reduce hallucinations in large language models, and we conclude with future directions for ChartQA research.
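
As a rough illustration of the graph contrastive learning and soft-prompt ideas described in the abstract, the PyTorch sketch below pairs a symmetric InfoNCE loss over pooled visual-graph and textual-graph embeddings with a small module that projects the fused embedding into soft-prompt vectors for a decoder. All names, dimensions, and the specific loss form are assumptions for illustration, not the architecture used in the thesis.

import torch
import torch.nn.functional as F

def info_nce(z_vis, z_txt, temperature=0.07):
    # Symmetric InfoNCE over pooled graph embeddings (batch x dim).
    # Row i of each tensor is assumed to come from the same chart,
    # so the diagonal of the similarity matrix holds the positives.
    z_vis = F.normalize(z_vis, dim=-1)
    z_txt = F.normalize(z_txt, dim=-1)
    logits = z_vis @ z_txt.t() / temperature
    targets = torch.arange(z_vis.size(0), device=z_vis.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

class SoftPromptFusion(torch.nn.Module):
    # Concatenates the two graph embeddings and projects them into a
    # fixed number of soft-prompt vectors that could be prepended to a
    # decoder's input embeddings. Dimensions are illustrative.
    def __init__(self, dim=256, prompt_len=4, dec_dim=512):
        super().__init__()
        self.proj = torch.nn.Linear(2 * dim, prompt_len * dec_dim)
        self.prompt_len, self.dec_dim = prompt_len, dec_dim

    def forward(self, z_vis, z_txt):
        fused = torch.cat([z_vis, z_txt], dim=-1)
        return self.proj(fused).view(-1, self.prompt_len, self.dec_dim)

# Example: 8 charts, 256-dim pooled embeddings from each graph encoder.
z_vis, z_txt = torch.randn(8, 256), torch.randn(8, 256)
loss = info_nce(z_vis, z_txt)
prompts = SoftPromptFusion()(z_vis, z_txt)  # shape (8, 4, 512)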
Original language: English
Qualification: Masters
Awarding Institution:
  • The University of Western Australia
Supervisors/Advisors:
  • Liu, Wei, Supervisor
  • Han, Caren, Supervisor
Award date: 19 Mar 2025
Publication status: Unpublished - 2025

Related research output
  • MSG-Chart: Multimodal Scene Graph for ChartQA

    Dai, Y., Han, S. C. & Liu, W., 21 Oct 2024, CIKM 2024 - Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. Association for Computing Machinery (ACM), p. 3709-3713, 5 p. (International Conference on Information and Knowledge Management, Proceedings).

    Research output: Chapter in Book/Conference paper › Conference paper › peer-review

    Open Access
