A comparative analysis of data preparation algorithms for customer churn prediction: A case study in the telecommunication industry

Kristof Coussement, Stefan Lessmann, Geert Verstraeten

Research output: Contribution to journal (Article, peer-reviewed)

149 Citations (Scopus)

Abstract

Data preparation is a process that aims to convert independent (categorical and continuous) variables into a form appropriate for further analysis. We examine data-preparation alternatives to enhance the prediction performance of the commonly used logit model. This study, conducted in a churn prediction modeling context, benchmarks an optimized logit model against eight state-of-the-art data mining techniques that use standard input data; the benchmark draws on real-world cross-sectional data from a large European telecommunication provider. The results lead to the following conclusions. (i) Analysts should acknowledge that the data-preparation technique they choose actually affects churn prediction performance: we find improvements of up to 14.5% in the area under the receiver operating characteristic curve (AUC) and up to 34% in the top-decile lift. (ii) The enhanced logistic regression is also competitive with more advanced single and ensemble data mining algorithms. The article concludes with managerial implications and suggestions for further research, and it provides evidence that the results generalize to other business settings.
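The two evaluation criteria quoted in the abstract, AUC and top-decile lift, can be computed as in the minimal Python sketch below. The function names `auc` and `top_decile_lift` are hypothetical illustrations, not code from the study. AUC is computed through its Mann-Whitney interpretation (the probability that a randomly chosen churner is scored above a randomly chosen non-churner, with ties counted as one half). Top-decile lift is assumed here to be the churn rate among the 10% highest-scored customers divided by the overall churn rate; the decile size `n // 10` (at least one customer) and the tie-breaking by input order are assumptions of this sketch.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the Mann-Whitney statistic: P(score of churner > score of non-churner).

    Ties between a churner and a non-churner count as 1/2.
    O(n_pos * n_neg) pairwise comparison, which is fine for a small illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]  # churners
    neg = [s for s, y in zip(scores, labels) if y == 0]  # non-churners
    if not pos or not neg:
        raise ValueError("need at least one churner and one non-churner")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_decile_lift(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Churn rate in the top 10% of scores divided by the overall churn rate.

    Assumption: the top decile holds n // 10 customers (minimum 1); equal scores
    keep their input order because Python's sort is stable.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one customer")
    base_rate = sum(labels) / n
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no churners")
    k = max(1, n // 10)
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:k]) / k
    return top_rate / base_rate


# Ten hypothetical customers: model scores and observed churn labels (1 = churned).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(auc(scores, labels))              # 20 of 21 churner/non-churner pairs ranked correctly
print(top_decile_lift(scores, labels))  # top decile churn rate 1.0 vs. base rate 0.3
```

A lift of 1 means the model does no better than random targeting, so a higher top-decile lift indicates that a retention campaign aimed at the highest-scored 10% of customers reaches proportionally more actual churners.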

Original language: English
Pages (from-to): 27-36
Number of pages: 10
Journal: Decision Support Systems
Volume: 95
Publication status: Published, 1 Mar 2017
Externally published: Yes
