TY - GEN
T1 - LexiClean
T2 - 2021 Conference on Empirical Methods in Natural Language Processing
AU - Bikaun, Tyler
AU - French, Tim
AU - Hodkiewicz, Melinda
AU - Stewart, Michael
AU - Liu, Wei
PY - 2021/11
Y1 - 2021/11
N2 - NLP systems are often challenged by difficulties arising from noisy, non-standard, and domain-specific corpora. The task of lexical normalisation aims to standardise such corpora, but currently lacks suitable tools to acquire high-quality annotated data to support deep-learning-based approaches. In this paper, we present LexiClean, the first open-source web-based annotation tool for multi-task lexical normalisation. LexiClean’s main contribution is support for simultaneous in situ token-level modification and annotation that can be rapidly applied corpus-wide. We demonstrate the usefulness of our tool through a case study on two sets of noisy corpora derived from the specialised domain of industrial mining. We show that LexiClean allows for the rapid and efficient development of high-quality parallel corpora. A demo of our system is available at: https://youtu.be/P7_ooKrQPDU.
UR - https://aclanthology.org/2021.emnlp-demo.25/
UR - https://aclanthology.org/volumes/2021.emnlp-demo/
M3 - Conference paper
SP - 212
EP - 219
BT - Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
PB - Association for Computational Linguistics
CY - USA
Y2 - 7 November 2021 through 11 November 2021
ER -