TY - GEN
T1 - PDF-VQA
T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
AU - Ding, Yihao
AU - Luo, Siwen
AU - Chung, Hyunsuk
AU - Han, Soyeon Caren
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023/9/18
Y1 - 2023/9/18
N2 - Document-based Visual Question Answering examines the understanding of document images conditioned on natural language questions. We propose a new document-based VQA dataset, PDF-VQA, to comprehensively examine document understanding from various aspects, including document element recognition, document layout structure understanding, contextual understanding, and key information extraction. Our PDF-VQA dataset extends the current scale of document understanding, which is limited to a single document page, to a new scale that asks questions over full documents of multiple pages. We also propose a new graph-based VQA model that explicitly integrates the spatial and hierarchical structural relationships between different document elements to boost document structure understanding. Performance is compared with several baselines over different question types and tasks. The full dataset is released at https://github.com/adlnlp/pdfvqa.
KW - Document Information Extraction
KW - Document Understanding
KW - Visual Question Answering
UR - http://www.scopus.com/inward/record.url?scp=85174443406&partnerID=8YFLogxK
UR - https://link.springer.com/book/10.1007/978-3-031-43427-3#bibliographic-information
UR - https://onesearch.library.uwa.edu.au/permalink/61UWA_INST/1vk1d8f/alma991487180402101
U2 - 10.1007/978-3-031-43427-3_35
DO - 10.1007/978-3-031-43427-3_35
M3 - Conference paper
AN - SCOPUS:85174443406
SN - 9783031434266
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 585
EP - 601
BT - Machine Learning and Knowledge Discovery in Databases
A2 - De Francisci Morales, Gianmarco
A2 - Bonchi, Francesco
A2 - Perlich, Claudia
A2 - Ruchansky, Natali
A2 - Kourtellis, Nicolas
A2 - Baralis, Elena
PB - Springer Nature Switzerland AG
CY - Switzerland
Y2 - 18 September 2023 through 22 September 2023
ER -