MaintIE: A Fine-Grained Annotation Schema and Benchmark for Information Extraction from Maintenance Short Texts.

Research output: Chapter in Book/Conference paper › Conference paper › Peer-review


Abstract

Maintenance short texts (MSTs), derived from maintenance work order records, encapsulate crucial information in a concise yet information-rich format. These user-generated technical texts provide critical insights into the state and maintenance activities of machines, infrastructure, and other engineered assets, which are pillars of the modern economy. Despite their importance for asset management decision-making, extracting and leveraging this information at scale remains a significant challenge. This paper presents MaintIE, a multi-level, fine-grained annotation scheme for entity recognition and relation extraction tailored to MSTs. It consists of 5 top-level classes (PhysicalObject, State, Process, Activity and Property), 224 leaf entity types, and 6 relation types. Using MaintIE, we have curated a multi-annotator, high-quality, fine-grained corpus of 1,076 annotated texts. Additionally, we present a coarse-grained corpus of 7,000 texts and evaluate its usefulness for bootstrapping and enhancing fine-grained information extraction. Using these corpora, we report model performance to benchmark automated entity recognition and relation extraction. The MaintIE scheme, corpus, and model are publicly available at https://github.com/nlp-tlp/maintie under the MIT license, encouraging further community exploration and innovation in extracting valuable insights from MSTs.
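In a multi-level scheme like the one described above, every fine-grained leaf type sits under one of the five top-level classes. A fine-grained annotation can therefore be collapsed to a coarse-grained one without changing its token span. The Python sketch below illustrates that idea under explicit assumptions. Labels are written as hypothetical `/`-separated paths from top-level class to leaf. The leaf names (`Activity/Replace`, `State/Leaking`, `PhysicalObject/Pump`) and the example text are invented for illustration, and the helpers `to_coarse` and `coarsen` are hypothetical. None of these are taken from the MaintIE scheme or its released file format.

```python
from dataclasses import dataclass

# The five top-level classes named in the MaintIE abstract.
TOP_LEVEL_CLASSES = {"PhysicalObject", "State", "Process", "Activity", "Property"}


@dataclass(frozen=True)
class Entity:
    start: int  # index of the first token (inclusive)
    end: int    # index after the last token (exclusive)
    label: str  # hypothetical "/"-separated path, e.g. "TopLevel/.../Leaf"


def to_coarse(label: str) -> str:
    """Collapse a multi-level label to its top-level class (hypothetical helper)."""
    top = label.split("/", 1)[0]
    if top not in TOP_LEVEL_CLASSES:
        raise ValueError(f"unknown top-level class: {top!r}")
    return top


def coarsen(entities: list[Entity]) -> list[Entity]:
    """Map fine-grained annotations to coarse-grained ones, keeping the spans."""
    return [Entity(e.start, e.end, to_coarse(e.label)) for e in entities]


# Hypothetical annotated MST; the leaf labels are illustrative only.
tokens = ["replace", "leaking", "pump"]
fine = [
    Entity(0, 1, "Activity/Replace"),
    Entity(1, 2, "State/Leaking"),
    Entity(2, 3, "PhysicalObject/Pump"),
]

for e in coarsen(fine):
    print(" ".join(tokens[e.start:e.end]), "->", e.label)
```

Only the labels change and the spans are preserved, so a collapse like this is one plausible way to align fine- and coarse-grained annotations. It is not necessarily how the MaintIE authors relate their two corpora.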
Original language: English
Title of host publication: LREC/COLING
Subtitle of host publication: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Publisher: Association for Computational Linguistics (ACL)
Pages: 10939-10951
Number of pages: 13
ISBN (Electronic): 9782493814104
Publication status: Published - 2024
Event: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation - Torino, Italy
Duration: 20 May 2024 to 25 May 2024

Publication series

Name: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings

Conference

Conference: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Abbreviated title: LREC/COLING 2024
Country/Territory: Italy
City: Torino
Period: 20/05/24 to 25/05/24
