LexiClean: An annotation tool for rapid multi-task lexical normalisation

Research output: Chapter in Book/Conference paper, Conference paper, peer-review

Abstract

NLP systems are often challenged by difficulties arising from noisy, non-standard, and domain-specific corpora. The task of lexical normalisation aims to standardise such corpora, but currently lacks suitable tools to acquire high-quality annotated data to support deep-learning-based approaches. In this paper, we present LexiClean, the first open-source web-based annotation tool for multi-task lexical normalisation. LexiClean’s main contribution is support for simultaneous in situ token-level modification and annotation that can be rapidly applied corpus-wide. We demonstrate the usefulness of our tool through a case study on two sets of noisy corpora derived from the specialised domain of industrial mining. We show that LexiClean allows for the rapid and efficient development of high-quality parallel corpora. A demo of our system is available at: https://youtu.be/P7_ooKrQPDU.
Original language: English
Title of host publication: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Subtitle of host publication: System Demonstrations
Place of Publication: USA
Publisher: Association for Computational Linguistics
Pages: 212-219
ISBN (Electronic): 978-1-955917-11-7
Publication status: Published - Nov 2021
Event: 2021 Conference on Empirical Methods in Natural Language Processing, Dominican Republic
Duration: 7 Nov 2021 to 11 Nov 2021

Conference

Conference: 2021 Conference on Empirical Methods in Natural Language Processing
Abbreviated title: EMNLP 2021
Country/Territory: Dominican Republic
Period: 7/11/21 to 11/11/21
