PDF-VQA: A New Dataset for Real-World VQA on PDF Documents

Yihao Ding, Siwen Luo, Hyunsuk Chung, Soyeon Caren Han

Research output: Chapter in Book/Conference paper; Conference paper; peer-review

Abstract

Document-based Visual Question Answering (VQA) examines how well models understand document images when prompted with natural language questions. We propose a new document-based VQA dataset, PDF-VQA, to comprehensively examine document understanding from several aspects, including document element recognition, document layout structural understanding, contextual understanding, and key information extraction. PDF-VQA extends the scope of document understanding from a single document page to questions asked over a full document of multiple pages. We also propose a new graph-based VQA model that explicitly integrates the spatial and hierarchical structural relationships between document elements to improve document structural understanding. Performance is compared with several baselines across different question types and tasks. The full dataset is released at https://github.com/adlnlp/pdfvqa.

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases
Subtitle of host publication: Applied Data Science and Demo Track - European Conference, ECML PKDD 2023, Proceedings Part VI
Editors: Gianmarco De Francisci Morales, Francesco Bonchi, Claudia Perlich, Natali Ruchansky, Nicolas Kourtellis, Elena Baralis
Place of Publication: Switzerland
Publisher: Springer Nature Switzerland AG
Pages: 585-601
Number of pages: 17
Edition: 1
ISBN (Electronic): 9783031434273
ISBN (Print): 9783031434266
Publication status: Published - 18 Sept 2023
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023 - Turin, Italy
Duration: 18 Sept 2023 to 22 Sept 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14174 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023
Country/Territory: Italy
City: Turin
Period: 18/09/23 to 22/09/23
