E2EET: from pipeline to end-to-end entity typing via transformer-based embeddings

Research output: Contribution to journal › Article › peer-review

Abstract

Entity typing (ET) is the process of identifying the semantic types of every entity within a corpus, labelling each entity mention with one or more class labels. As a multi-class, multi-label task, it is considerably more challenging than named entity recognition. Existing entity typing models operate on pre-identified mentions and cannot process plain text directly, so pipeline-based approaches are used to join a mention extraction model and an entity typing model when working with raw text. Another key limitation is that these mention-level ET models are trained on fixed context windows, which makes the entity typing results sensitive to the choice of window size. In light of these drawbacks, we propose an end-to-end entity typing model (E2EET) that uses a Bi-GRU to remove the dependency on window size. To demonstrate the effectiveness of our E2EET model, we also create a stronger mention-level baseline by incorporating contextualised transformer-based embeddings (BERT). Extensive ablation studies demonstrate the competitiveness and simplicity of our end-to-end model for entity typing.
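To make the end-to-end formulation concrete, the sketch below shows one way such a model could look: a Bi-GRU runs over contextual token embeddings (e.g. from BERT) for the whole sentence, and a per-token sigmoid layer predicts a set of type labels, so no mention spans or fixed context windows are required. This is an illustrative assumption, not the paper's exact architecture; the class name `EndToEndTyper`, the hidden size, and the label count are hypothetical.

```python
# Minimal sketch (not the authors' implementation) of an end-to-end,
# token-level, multi-label entity typer built on contextual embeddings.
import torch
import torch.nn as nn


class EndToEndTyper(nn.Module):
    def __init__(self, emb_dim=768, hidden_dim=256, num_types=89):
        super().__init__()
        # A bidirectional GRU reads the whole sentence, removing the need
        # for a fixed-size context window around each mention.
        self.bigru = nn.GRU(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # One sigmoid output per type label per token: multi-class AND multi-label.
        self.classifier = nn.Linear(2 * hidden_dim, num_types)

    def forward(self, token_embeddings):
        # token_embeddings: (batch, seq_len, emb_dim), e.g. BERT outputs
        hidden, _ = self.bigru(token_embeddings)
        return torch.sigmoid(self.classifier(hidden))  # (batch, seq_len, num_types)


# Toy usage: random tensors stand in for real BERT features.
if __name__ == "__main__":
    model = EndToEndTyper()
    fake_bert_output = torch.randn(2, 12, 768)   # 2 sentences, 12 tokens each
    type_probs = model(fake_bert_output)
    predicted = type_probs > 0.5                 # assign every type above 0.5
    print(type_probs.shape)                      # torch.Size([2, 12, 89])
```

In this reading, the Bi-GRU replaces the window-based context encoder of mention-level models, which is the property the abstract highlights as removing the dependency on window size.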

Original language: English
Pages (from-to): 95-113
Number of pages: 19
Journal: Knowledge and Information Systems
Volume: 64
Issue number: 1
Early online date: 30 Nov 2021
DOIs
Publication status: Published - Jan 2022
