LexiClean: An annotation tool for rapid multi-task lexical normalisation

Research output: Chapter in Book/Conference paper › Conference paper › peer-review


NLP systems are often challenged by difficulties arising from noisy, non-standard, and domain-specific corpora. The task of lexical normalisation aims to standardise such corpora, but currently lacks suitable tools to acquire high-quality annotated data to support deep-learning-based approaches. In this paper, we present LexiClean, the first open-source web-based annotation tool for multi-task lexical normalisation. LexiClean’s main contribution is support for simultaneous in situ token-level modification and annotation that can be rapidly applied corpus-wide. We demonstrate the usefulness of our tool through a case study on two sets of noisy corpora derived from the specialised domain of industrial mining. We show that LexiClean allows for the rapid and efficient development of high-quality parallel corpora. A demo of our system is available at: https://youtu.be/P7_ooKrQPDU.
Original language: English
Title of host publication: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Subtitle of host publication: System Demonstrations
Place of Publication: USA
Publisher: Association for Computational Linguistics
ISBN (Electronic): 978-1-955917-11-7
Publication status: Published - Nov 2021

