Featureless Domain-Specific Term Extraction with Minimal Labelled Data

Research output: Chapter in Book/Conference paper › Conference paper › peer-review

21 Citations (Scopus)


Supervised domain-specific term extraction often suffers from two common problems: laborious manual feature selection and a lack of labelled data. In this paper, we introduce a weakly supervised bootstrapping approach that uses two deep learning classifiers. Each classifier learns term representations separately by taking word embedding vectors as input, so no manually selected features are required. The two classifiers are first trained on a small set of labelled data and then independently make predictions on a subset of the unlabelled data; the most confident predictions are added to the training set to retrain the classifiers. This co-training process minimises reliance on labelled data. Evaluations on two datasets demonstrate that, with limited training data, the proposed co-training approach achieves performance competitive with a standard supervised learning baseline.
Original language: English
Title of host publication: Proceedings of the Australasian Language Technology Association Workshop 2016
Editors: Trevor Cohn
Place of publication: Australia
Publisher: Australasian Language Technology Association
Number of pages: 10
Publication status: Published - 2016
Event: Australasian Language Technology Association Workshop 2016 - Monash University, Melbourne, Australia
Duration: 5 Dec 2016 - 7 Dec 2016


Conference: Australasian Language Technology Association Workshop 2016
Abbreviated title: ALTA 2016

