Causal knowledge extraction from long text maintenance documents

Research output: Contribution to journal › Article › peer-review


Industry creates large numbers of maintenance Work Request Notification (WRN) records as part of standard business workflows. These digital records hold insights that are valuable for best practice in asset management. Of particular interest are the cause–effect relations recorded in the WRN long text field. In this research we develop a two-stage deep learning pipeline to extract cause-and-effect triples and construct a causal graph database. In the first stage, a novel sentence-level noise removal method filters out information extraneous to causal semantics. The second stage uses a joint entity-and-relation extraction model to extract causal relations. To train the noise removal and causality extraction models, we produced an annotated dataset of 1,027 WRN records. For causality extraction, the pipeline achieves F1-scores of 83% and 92% for identifying Cause and Effect entities, respectively, and 78% for correctly identifying the causal relation between these entities. We apply the pipeline to a real-world industrial plant dataset of 98,000 WRN records to produce a graph database. This work provides a framework that lets technical personnel query the causes of equipment failures, answering questions such as “what are the most common, costly, and recent causes of failures at my facility?”.
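The data flow described above (filter noisy sentences, extract cause and effect, store triples in a graph, query the most common causes) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the two trained deep learning models are replaced by hypothetical keyword rules (`toy_is_causal`, `toy_extract`), sentences are split naively on full stops, an in-memory dictionary stands in for the graph database, and all names and sample records are invented for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CausalTriple:
    cause: str
    relation: str
    effect: str
    record_id: str  # provenance: which WRN record the triple came from


def run_pipeline(
    records: dict[str, str],
    is_causal: Callable[[str], bool],
    extract: Callable[[str], list[tuple[str, str]]],
) -> list[CausalTriple]:
    """Two-stage sketch: drop non-causal sentences, then extract (cause, effect) pairs."""
    triples: list[CausalTriple] = []
    for record_id, long_text in records.items():
        # Naive split on full stops (an assumption; real WRN long text is messier).
        sentences = [s.strip() for s in long_text.split(".") if s.strip()]
        # Stage 1: sentence-level noise removal keeps only sentences with causal content.
        for sentence in (s for s in sentences if is_causal(s)):
            # Stage 2: extract cause/effect spans and link them with a causal relation.
            for cause, effect in extract(sentence):
                triples.append(CausalTriple(cause, "CAUSES", effect, record_id))
    return triples


# Hypothetical rule-based stand-ins for the two trained models (illustration only).
CAUSAL_MARKERS = ("due to", "caused by")


def toy_is_causal(sentence: str) -> bool:
    return any(marker in sentence.lower() for marker in CAUSAL_MARKERS)


def toy_extract(sentence: str) -> list[tuple[str, str]]:
    text = sentence.lower()
    for marker in CAUSAL_MARKERS:
        if marker in text:
            # Assumes the pattern "<effect> <marker> <cause>".
            effect, cause = text.split(marker, 1)
            return [(cause.strip(" ,"), effect.strip(" ,"))]
    return []


def build_graph(triples: list[CausalTriple]) -> dict[str, dict[str, list[str]]]:
    """Adjacency map cause -> effect -> supporting record IDs (stand-in for a graph DB)."""
    graph: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for t in triples:
        graph[t.cause][t.effect].append(t.record_id)
    return graph


def most_common_causes(triples: list[CausalTriple], n: int = 3) -> list[tuple[str, int]]:
    """Answer 'what are the most common causes of failures?' by counting cause nodes."""
    return Counter(t.cause for t in triples).most_common(n)


# Invented sample records; sentences without causal content should be filtered out.
records = {
    "WRN-001": "Pump P-101 seal leaking due to worn bearing. Please replace ASAP. Contact shift supervisor.",
    "WRN-002": "Conveyor belt tripped. Motor overheating caused by worn bearing.",
    "WRN-003": "Valve V-7 stuck due to corrosion. Scaffold required.",
}

triples = run_pipeline(records, toy_is_causal, toy_extract)
print(most_common_causes(triples, 2))  # [('worn bearing', 2), ('corrosion', 1)]
```

Passing the filter and extractor in as functions keeps the pipeline shape independent of the models, so the toy rules could in principle be swapped for trained classifiers without changing the aggregation and query steps. Keeping the record ID on each triple preserves provenance, which is what would allow questions about cost or recency to be answered by joining back to the original WRN records.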

Original language: English
Article number: 104110
Number of pages: 15
Journal: Computers in Industry
Early online date: 31 May 2024
Publication status: E-pub ahead of print, 31 May 2024

