TY - GEN
T1 - Enhancing SVMs with Problem Context Aware Pipeline
AU - Wen, Zeyi
AU - Zhou, Zhishang
AU - Liu, Hanfeng
AU - He, Bingsheng
AU - Li, Xia
AU - Chen, Jian
PY - 2021/8/14
Y1 - 2021/8/14
N2 - In recent years, many data mining practitioners have treated deep neural networks (DNNs) as a standard recipe for creating state-of-the-art solutions. As a result, models like Support Vector Machines (SVMs) have been overlooked. While the results from DNNs are encouraging, DNNs also come with a huge number of model parameters and the overhead of long training and inference times. SVMs have excellent properties such as convexity, good generality and efficiency. In this paper, we propose techniques to enhance SVMs with an automatic pipeline that exploits the context of the learning problem. The pipeline consists of several components, including data-aware subproblem construction, feature customization, data balancing among subproblems with augmentation, and a kernel hyper-parameter tuner. Comprehensive experiments show that our proposed solution is more efficient, while producing better results, than the other SVM-based approaches. Additionally, we conduct a case study of our proposed solution on a popular sentiment analysis problem: the aspect term sentiment analysis (ATSA) task. The study shows that our SVM-based solution can achieve predictive accuracy competitive with DNN-based approaches (and even the majority of BERT-based approaches). Furthermore, our solution is about 40 times faster in inference and has 100 times fewer parameters than the models using BERT. Our findings can encourage more research on conventional machine learning techniques, which may be a good alternative for smaller model size and faster training/inference.
AB - In recent years, many data mining practitioners have treated deep neural networks (DNNs) as a standard recipe for creating state-of-the-art solutions. As a result, models like Support Vector Machines (SVMs) have been overlooked. While the results from DNNs are encouraging, DNNs also come with a huge number of model parameters and the overhead of long training and inference times. SVMs have excellent properties such as convexity, good generality and efficiency. In this paper, we propose techniques to enhance SVMs with an automatic pipeline that exploits the context of the learning problem. The pipeline consists of several components, including data-aware subproblem construction, feature customization, data balancing among subproblems with augmentation, and a kernel hyper-parameter tuner. Comprehensive experiments show that our proposed solution is more efficient, while producing better results, than the other SVM-based approaches. Additionally, we conduct a case study of our proposed solution on a popular sentiment analysis problem: the aspect term sentiment analysis (ATSA) task. The study shows that our SVM-based solution can achieve predictive accuracy competitive with DNN-based approaches (and even the majority of BERT-based approaches). Furthermore, our solution is about 40 times faster in inference and has 100 times fewer parameters than the models using BERT. Our findings can encourage more research on conventional machine learning techniques, which may be a good alternative for smaller model size and faster training/inference.
KW - machine learning
KW - sentiment analysis
KW - support vector machines
UR - http://www.scopus.com/inward/record.url?scp=85114940794&partnerID=8YFLogxK
U2 - 10.1145/3447548.3467291
DO - 10.1145/3447548.3467291
M3 - Conference paper
AN - SCOPUS:85114940794
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 1821
EP - 1829
BT - KDD 2021 - Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery (ACM)
T2 - 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2021
Y2 - 14 August 2021 through 18 August 2021
ER -