TY - JOUR
T1 - Causal knowledge extraction from long text maintenance documents
AU - Hershowitz, Brad
AU - Hodkiewicz, Melinda
AU - Bikaun, Tyler
AU - Stewart, Michael
AU - Liu, Wei
PY - 2024/10
Y1 - 2024/10
N2 - Large numbers of maintenance Work Request Notification (WRN) records are created by industry as part of standard business workflows. These digital records hold invaluable insights crucial to best practice in asset management. Of particular interest are the cause–effect relations in the long text WRN field. In this research we develop a two-stage deep learning pipeline to extract cause-and-effect triples and construct a causal graph database. A novel sentence-level noise removal method in the first stage filters out information extraneous to causal semantics. The second stage leverages a joint entity-and-relation extraction model to extract causal relations. To train the noise removal and causality extraction models we produced an annotated dataset of 1027 WRN records. The results for causality extraction as measured by F1-score are 83% and 92% for the identification of Cause and Effect entities respectively, and 78% for a correct causal relation between these entities. The pipeline is applied to a real-world industrial plant dataset of 98,000 WRN records to produce a graph database. This work provides a framework for technical personnel to query the causes of equipment failures, enabling answers to questions such as “what are the most common, costly, and recent causes of failures at my facility?”.
AB - Large numbers of maintenance Work Request Notification (WRN) records are created by industry as part of standard business workflows. These digital records hold invaluable insights crucial to best practice in asset management. Of particular interest are the cause–effect relations in the long text WRN field. In this research we develop a two-stage deep learning pipeline to extract cause-and-effect triples and construct a causal graph database. A novel sentence-level noise removal method in the first stage filters out information extraneous to causal semantics. The second stage leverages a joint entity-and-relation extraction model to extract causal relations. To train the noise removal and causality extraction models we produced an annotated dataset of 1027 WRN records. The results for causality extraction as measured by F1-score are 83% and 92% for the identification of Cause and Effect entities respectively, and 78% for a correct causal relation between these entities. The pipeline is applied to a real-world industrial plant dataset of 98,000 WRN records to produce a graph database. This work provides a framework for technical personnel to query the causes of equipment failures, enabling answers to questions such as “what are the most common, costly, and recent causes of failures at my facility?”.
KW - Causal
KW - Deep learning
KW - Failure mode
KW - Information extraction
KW - Knowledge graph
KW - Maintenance
KW - Natural language processing
UR - http://www.scopus.com/inward/record.url?scp=85194966625&partnerID=8YFLogxK
U2 - 10.1016/j.compind.2024.104110
DO - 10.1016/j.compind.2024.104110
M3 - Article
AN - SCOPUS:85194966625
SN - 0166-3615
VL - 161
JO - Computers in Industry
JF - Computers in Industry
M1 - 104110
ER -