Abstract
NLP systems are often challenged by difficulties arising from noisy, non-standard, and domain-specific corpora. The task of lexical normalisation aims to standardise such corpora, but currently lacks suitable tools to acquire high-quality annotated data to support deep-learning-based approaches. In this paper, we present LexiClean, the first open-source web-based annotation tool for multi-task lexical normalisation. LexiClean’s main contribution is support for simultaneous in situ token-level modification and annotation that can be rapidly applied corpus-wide. We demonstrate the usefulness of our tool through a case study on two sets of noisy corpora derived from the specialised domain of industrial mining. We show that LexiClean allows for the rapid and efficient development of high-quality parallel corpora. A demo of our system is available at: https://youtu.be/P7_ooKrQPDU.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing |
| Subtitle of host publication | System Demonstrations |
| Place of Publication | USA |
| Publisher | Association for Computational Linguistics |
| Pages | 212-219 |
| ISBN (Electronic) | 978-1-955917-11-7 |
| Publication status | Published - Nov 2021 |
| Event | 2021 Conference on Empirical Methods in Natural Language Processing, Dominican Republic. Duration: 7 Nov 2021 → 11 Nov 2021 |
Conference
| Conference | 2021 Conference on Empirical Methods in Natural Language Processing |
|---|---|
| Abbreviated title | EMNLP 2021 |
| Country/Territory | Dominican Republic |
| Period | 7/11/21 → 11/11/21 |
Research output related to 'LexiClean: An annotation tool for rapid multi-task lexical normalisation'
- Automatic knowledge extraction from industrial maintenance short text using deep learning. Bikaun, T., 2024 (Unpublished). Doctoral Thesis.