Using Context-Free Grammar to Generate Synthetic Technical Short Texts

Research output: Chapter in Book/Conference paper › Conference paper › peer-review

1 Citation (Scopus)

Abstract

Valuable technical information is buried in under-utilised, user-generated technical texts in engineering domains such as manufacturing, logistics and maintenance. For maintenance and reliability personnel, the unstructured technical text in maintenance work orders (MWOs) holds crucial information about failures and work performed on physical assets. However, the domain-specific language and the scarcity of shared labelled data sets in these contexts present formidable challenges to contemporary natural language processing (NLP) techniques, which fail to achieve performance similar to that in non-engineering domains. In this work, we explore the structure of language in technical short texts by learning a context-free grammar (CFG) through unsupervised grammar induction on industrial MWO texts. We exploit the grammar’s generative properties for novel sentence generation and corpus construction and assess its viability for developing synthetic MWO data sets. The results demonstrate that a) there exists a grammar in the MWOs, b) the grammar was able to model aspects of the maintenance technical language to produce 12k synthetic MWO texts that are 93% as natural and 87% as correct as real texts, and c) the domain-specific language used in technical short texts remains challenging to parse due to low data quality and sparsity. Contributions of this work include baseline results for grammar-based synthetic technical text generation and an appreciation of the challenges in assessing the engineering correctness and naturalness of the new synthetic texts.
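
The generative use of a CFG described in the abstract can be illustrated with a minimal sketch. It assumes a small hand-written toy grammar; the rules, the vocabulary, and the names `TOY_GRAMMAR`, `generate` and `build_corpus` are hypothetical and are not the grammar induced in the paper, which is learned without supervision from industrial MWO texts. The sketch only shows how randomly expanding production rules from a start symbol yields new short texts that can be collected into a synthetic corpus.

```python
import random

# Hypothetical toy grammar (an assumption for illustration only).
# Each nonterminal maps to a list of alternative right-hand sides.
# Any symbol without an entry is treated as a terminal word.
TOY_GRAMMAR = {
    "S": [["ACTION", "ITEM"], ["ITEM", "STATE"], ["ACTION", "ITEM", "STATE"]],
    "ACTION": [["replace"], ["repair"], ["inspect"]],
    "ITEM": [["pump"], ["MODIFIER", "pump"], ["conveyor", "belt"]],
    "MODIFIER": [["hydraulic"], ["slurry"]],
    "STATE": [["leaking"], ["not", "working"]],
}


def generate(grammar, symbol="S", rng=random):
    """Expand `symbol` by choosing one production uniformly at random."""
    if symbol not in grammar:
        return [symbol]  # terminal: emit the word itself
    tokens = []
    for part in rng.choice(grammar[symbol]):
        tokens.extend(generate(grammar, part, rng))
    return tokens


def build_corpus(grammar, n, seed=0):
    """Sample `n` synthetic short texts, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [" ".join(generate(grammar, "S", rng)) for _ in range(n)]


if __name__ == "__main__":
    for text in build_corpus(TOY_GRAMMAR, 5):
        print(text)
```

The toy grammar is deliberately non-recursive so that every expansion terminates. A grammar with recursive rules would typically need a depth limit or rule probabilities, and judging whether the generated texts are natural or correct from an engineering standpoint is a separate evaluation step that this sketch does not attempt.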

Original language: English
Title of host publication: AI 2022
Subtitle of host publication: Advances in Artificial Intelligence - 35th Australasian Joint Conference, AI 2022, Proceedings
Editors: Haris Aziz, Débora Corrêa, Tim French
Publisher: Springer Science + Business Media
Pages: 325-338
Number of pages: 14
Volume: 13728
ISBN (Electronic): 9783031226953
ISBN (Print): 9783031226946
DOIs
Publication status: Published - 2022
Event: 35th Australasian Joint Conference on Artificial Intelligence, AI 2022 - Perth, Australia
Duration: 5 Dec 2022 → 9 Dec 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13728 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 35th Australasian Joint Conference on Artificial Intelligence, AI 2022
Country/Territory: Australia
City: Perth
Period: 5/12/22 → 9/12/22
