Abstract
In this work, we explore whether the recently demonstrated zero-shot abilities of the T0 model extend to Named Entity Recognition for out-of-distribution languages and time periods. Using a historical newspaper corpus in three languages as a test bed, we use prompts to extract possible named entities. Our results show that a naive approach to prompt-based zero-shot multilingual Named Entity Recognition is error-prone, but they highlight the potential of such an approach for historical languages lacking labeled datasets. Moreover, we find that T0-like models can be probed to predict the publication date and language of a document, which could be very relevant for the study of historical texts.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of BigScience Episode #5 |
| Subtitle of host publication | Workshop on Challenges & Perspectives in Creating Large Language Models |
| Editors | Angela Fan, Suzana Ilic, Thomas Wolf, Matthias Gallé |
| Place of Publication | USA |
| Publisher | Association for Computational Linguistics |
| Pages | 75-83 |
| Number of pages | 9 |
| ISBN (Electronic) | 9781955917261 |
| ISBN (Print) | 9781955917261 |
| Publication status | Published - May 2022 |
| Event | BigScience Episode #5 – Challenges & Perspectives in Creating Large Language Models: ACL 2022 Workshop, 27 May 2022 |