TY - JOUR
T1 - A comparative analysis of data preparation algorithms for customer churn prediction
T2 - A case study in the telecommunication industry
AU - Coussement, Kristof
AU - Lessmann, Stefan
AU - Verstraeten, Geert
PY - 2017/3/1
Y1 - 2017/3/1
N2 - Data preparation is a process that aims to convert independent (categorical and continuous) variables into a form appropriate for further analysis. We examine data-preparation alternatives to enhance the prediction performance of the commonly used logit model. This study, conducted in a churn prediction modeling context, benchmarks an optimized logit model against eight state-of-the-art data mining techniques that use standard input data, including real-world cross-sectional data from a large European telecommunication provider. The results lead to the following conclusions. (i) Analysts should acknowledge that the data-preparation technique they choose actually affects churn prediction performance; we find improvements of up to 14.5% in the area under the receiver operating characteristic curve and 34% in the top-decile lift. (ii) The enhanced logistic regression is also competitive with more advanced single and ensemble data mining algorithms. This article concludes with some managerial implications and suggestions for further research, including evidence of the generalizability of the results for other business settings.
AB - Data preparation is a process that aims to convert independent (categorical and continuous) variables into a form appropriate for further analysis. We examine data-preparation alternatives to enhance the prediction performance of the commonly used logit model. This study, conducted in a churn prediction modeling context, benchmarks an optimized logit model against eight state-of-the-art data mining techniques that use standard input data, including real-world cross-sectional data from a large European telecommunication provider. The results lead to the following conclusions. (i) Analysts should acknowledge that the data-preparation technique they choose actually affects churn prediction performance; we find improvements of up to 14.5% in the area under the receiver operating characteristic curve and 34% in the top-decile lift. (ii) The enhanced logistic regression is also competitive with more advanced single and ensemble data mining algorithms. This article concludes with some managerial implications and suggestions for further research, including evidence of the generalizability of the results for other business settings.
KW - Churn prediction
KW - Data preparation techniques
KW - Predictive analytics
UR - http://www.scopus.com/inward/record.url?scp=85028254431&partnerID=8YFLogxK
U2 - 10.1016/j.dss.2016.11.007
DO - 10.1016/j.dss.2016.11.007
M3 - Article
AN - SCOPUS:85028254431
SN - 0167-9236
VL - 95
SP - 27
EP - 36
JO - Decision Support Systems
JF - Decision Support Systems
ER -