Abstract
Semi-supervised learning is an essential approach to classification when the available labeled data is insufficient and unlabeled data must also be used in the learning process. Numerous research efforts have focused on designing algorithms to improve the F1 score, but they do not have any mechanism to control precision or recall individually. However, many applications have precision/recall preferences. For instance, an email spam classifier requires a precision of 0.9 to mitigate the false dismissal of useful emails. In this paper, we propose a method that allows users to specify a precision/recall preference while maximising the F1 score. Our key idea is to divide the semi-supervised learning process into multiple rounds of supervised learning; the classifier learned at each round is calibrated using a subset of the labeled dataset before it is applied to the unlabeled dataset to enlarge the training dataset. Our idea is applicable to a number of learning models such as Support Vector Machines (SVMs), Bayesian networks and neural networks. We focus our research and the implementation of our idea on SVMs. We conduct extensive experiments to validate the effectiveness of our method. The experimental results show that our method can train classifiers with a precision/recall preference, while the popular semi-supervised SVM training algorithm (which we use as the baseline) cannot. When the precision preference and the recall preference are set to be the same, which amounts to maximising only the F1 score as the baseline does, our method achieves F1 scores better than or similar to those of the baseline. An additional advantage of our method is that it converges much faster than the baseline.
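The round-based idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not say how calibration is performed or how unlabeled examples are selected, so the sketch assumes that calibration means choosing a decision-score threshold on a held-out labeled subset, and that only unlabeled points whose score lies at least `margin` away from that threshold are added to the training set. All names (`calibrate_threshold`, `self_train`, `midpoint_trainer`, `margin`) are hypothetical, and the one-dimensional `midpoint_trainer` merely stands in for an SVM trainer.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def calibrate_threshold(scores, labels, min_precision=0.0, min_recall=0.0):
    """Assumed calibration step: among thresholds that satisfy the
    precision/recall preference, return the one with the highest F1.
    Returns None when no threshold satisfies the preference."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        p, r, f1 = precision_recall_f1(labels, preds)
        if p >= min_precision and r >= min_recall and f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


def self_train(train_fn, X_lab, y_lab, X_cal, y_cal, X_unl,
               min_precision=0.0, min_recall=0.0, rounds=5, margin=0.5):
    """Run several rounds of supervised learning, calibrating each round's
    classifier on (X_cal, y_cal) before it labels the unlabeled pool."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unl)
    scorer, threshold = None, None
    for _ in range(rounds):
        candidate = train_fn(X, y)  # one round of supervised learning
        t = calibrate_threshold([candidate(x) for x in X_cal], y_cal,
                                min_precision, min_recall)
        if t is None:
            break  # preference not met on the calibration subset
        scorer, threshold = candidate, t
        # Assumption: trust only points far from the calibrated threshold.
        confident = [x for x in pool if abs(candidate(x) - t) >= margin]
        if not confident:
            break
        for x in confident:
            X.append(x)
            y.append(1 if candidate(x) >= t else 0)
        pool = [x for x in pool if abs(candidate(x) - t) < margin]
    return scorer, threshold


def midpoint_trainer(X, y):
    """Toy 1-D stand-in for an SVM: score is the distance from the midpoint
    of the two class means. Requires at least one example per class."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: x - mid


scorer, threshold = self_train(
    midpoint_trainer,
    X_lab=[0.0, 4.0], y_lab=[0, 1],
    X_cal=[1.0, 1.5, 2.5, 3.0], y_cal=[0, 0, 1, 1],
    X_unl=[0.5, 3.5, 2.1],
    min_precision=0.9,
)
print(threshold, scorer(3.0) >= threshold)
```

In this sketch, setting `min_precision` and `min_recall` to the same low value reduces the calibration step to plain F1 maximisation, which mirrors the equal-preference setting mentioned in the abstract. With a real SVM, `train_fn` would return the model's decision function in place of the toy scorer.
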
Original language | English |
---|---|
Title of host publication | The 23rd ACM Conference on Information and Knowledge Management (CIKM 2014) |
Place of Publication | USA |
Publisher | Association for Computing Machinery (ACM) |
Pages | 421-430 |
Number of pages | 10 |
ISBN (Print) | 978-1-4503-2598-1 |
Publication status | Published - 2014 |
Externally published | Yes |
Event | 23rd ACM International Conference on Information and Knowledge Management, Shanghai, China. Duration: 3 Nov 2014 → 7 Nov 2014 |
Conference
Conference | 23rd ACM International Conference on Information and Knowledge Management |
---|---|
Abbreviated title | CIKM 2014 |
Country/Territory | China |
City | Shanghai |
Period | 3/11/14 → 7/11/14 |