Chart Question Answering (ChartQA) involves interpreting charts and answering questions about them. Existing models often lose object-level information because transformer-based encoders consume charts as flat image patches. To address this, we propose a multimodal scene graph that combines a visual graph and a textual graph to capture the structural and semantic relationships in a chart. We further introduce a multimodal graph contrastive learning framework for better feature fusion, whose output is injected into the model's decoder as a soft prompt. Experimental results show significant performance improvements. Additionally, we explore chain-of-thought prompting to reduce hallucinations in large language models. We conclude with future directions to advance ChartQA research.
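The abstract describes the pipeline only at a high level. The minimal PyTorch sketch below illustrates one plausible reading of the "contrastive fusion as soft prompt" idea; every name, dimension, and module here (`ToyGraphEncoder`, `info_nce`, `SoftPromptFusion`) is an illustrative assumption, not the thesis's actual implementation. Two toy graph encoders embed the visual and textual graphs, an InfoNCE-style loss aligns matched pairs, and the fused embedding is projected into a few soft-prompt vectors that could be prepended to a decoder's input embeddings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyGraphEncoder(nn.Module):
    """Stand-in for a GNN encoder: one adjacency-based message-passing step,
    then mean pooling to a single graph-level embedding."""
    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, hid_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, in_dim), adj: (num_nodes, num_nodes)
        h = F.relu(self.lin(adj @ x))  # aggregate neighbours, then transform
        return h.mean(dim=0)           # pool nodes -> (hid_dim,)

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE loss: matched visual/textual graph pairs are positives,
    all other pairs in the batch serve as negatives."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = (z_a @ z_b.t()) / tau               # (batch, batch) similarities
    targets = torch.arange(z_a.size(0))
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

class SoftPromptFusion(nn.Module):
    """Concatenate the two graph embeddings and project them to n_tokens
    vectors of size d_model, usable as a soft prompt for a decoder."""
    def __init__(self, hid_dim: int, d_model: int, n_tokens: int = 4):
        super().__init__()
        self.n_tokens, self.d_model = n_tokens, d_model
        self.proj = nn.Linear(2 * hid_dim, n_tokens * d_model)

    def forward(self, z_vis: torch.Tensor, z_txt: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([z_vis, z_txt], dim=-1)            # (batch, 2*hid_dim)
        return self.proj(fused).view(-1, self.n_tokens, self.d_model)

if __name__ == "__main__":
    torch.manual_seed(0)
    vis_enc, txt_enc = ToyGraphEncoder(16, 32), ToyGraphEncoder(8, 32)
    fusion = SoftPromptFusion(hid_dim=32, d_model=64, n_tokens=4)

    # Toy batch of two charts: random node features, identity adjacency.
    z_vis = torch.stack([vis_enc(torch.randn(5, 16), torch.eye(5)) for _ in range(2)])
    z_txt = torch.stack([txt_enc(torch.randn(7, 8), torch.eye(7)) for _ in range(2)])

    loss = info_nce(z_vis, z_txt)      # contrastive alignment objective
    prompt = fusion(z_vis, z_txt)      # (2, 4, 64) soft-prompt tokens
    print(loss.item(), prompt.shape)
```

In this reading, the soft-prompt tokens would be concatenated in front of the decoder's token embeddings at inference time, so the decoder attends to the fused graph representation without any change to its architecture.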
Original language | English
---|---
Qualification | Masters
Awarding Institution | The University of Western Australia
Supervisors/Advisors | Liu, Wei (Supervisor); Han, Caren (Supervisor)
Award date | 19 Mar 2025
Publication status | Unpublished - 2025