A Hybrid Two-Level Support Vector Machine-Based Method for Churn Analysis

Sarac, Ferdi and Şeker, Hüseyin and Lisowski, Marcin and Timothy, Alan (2021) A Hybrid Two-Level Support Vector Machine-Based Method for Churn Analysis. In: 2021 5th International Conference on Cloud and Big Data Computing (ICCBDC 2021), 13-15 August 2021. (In Press)

Text
Churn Analaysis ICCBDC 2021 acm_template.pdf - Accepted Version

Download (629kB)

Abstract

Customer churn is a central problem in almost every sector. Because of the diversity of customers, products and services, and the massive amount of data generated by e-commerce tools and services, (big) data analytics and artificial intelligence-based methods have been developed and used for churn analysis. The aim is to understand the reasons behind customer churn and, subsequently, to develop an effective and profitable customer retention programme. Such analysis focuses mainly on profiling customers, classifying customer churn and identifying the features that affect churn. However, there appear to be few studies that help to understand how much a potential customer is likely (or unlikely) to pay for products or services, whether churned or not, or that predict how much a particular customer or group of customers may have paid for them. Therefore, in this study, a two-level churn analysis is proposed to (1) classify whether or not a customer churns, and (2) predict how much the customer has paid for the service. To achieve this, a machine learning method, namely the support vector machine (SVM), was used for the classification part, whereas the monthly service charge was predicted using the support vector regression (SVR) method. To select the most appropriate feature subset for both analyses, an unsupervised feature selection method, namely multi-cluster feature selection, was utilized. The same feature selection method was used for both analyses for the sake of consistency and to understand its performance across them. The proposed hybrid approach was then applied to IBM’s Telco data set, with over 7000 customers, in order to demonstrate the applicability and generalization ability of the proposed two-level approach.
The SVM-based classification method yielded an AUC of 85.6 and a total classification accuracy of 81.5%, both higher than those of a recent study in which an aggressive set of supervised classification methods was applied. The SVR-based prediction of the monthly charge resulted in an RMSE of 1.27, which is a reasonably acceptable outcome in the sector given the wide range of charges, as evidenced by their standard deviation. The approach presented in the study demonstrates that churn classification and charge prediction can be performed at the same time with a high degree of accuracy. As the approach is open to further improvement, future work will aim to improve the accuracy of both analyses and to apply the approach to other data sets to demonstrate its robustness and generalization ability.
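
For readers unfamiliar with the three metrics reported above, the following is a minimal sketch of how classification accuracy, AUC and RMSE are commonly computed. It is not the authors' code: the function names and the toy labels, scores and charges are hypothetical and chosen only for illustration, and the functions return fractions (for example 0.856) rather than percentages.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of churn labels that were predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen churner (label 1) receives a higher score than a randomly chosen
    non-churner (label 0). Ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted charges."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data, NOT taken from the study.
churn_true = [1, 0, 1, 0, 0, 1]
churn_score = [0.9, 0.2, 0.4, 0.5, 0.1, 0.8]           # classifier scores
churn_pred = [1 if s >= 0.5 else 0 for s in churn_score]

charge_true = [70.0, 20.0, 50.0]                        # monthly charges
charge_pred = [71.0, 19.0, 52.0]                        # regression output

print(accuracy(churn_true, churn_pred))
print(roc_auc(churn_true, churn_score))
print(rmse(charge_true, charge_pred))
```

Accuracy depends on the decision threshold (0.5 in this sketch), whereas AUC depends only on how the scores rank churners against non-churners. RMSE is expressed in the same units as the monthly charge, which is why the abstract relates it to the spread of the charges.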

Item Type: Conference or Workshop Item (Paper)
Date: 13 August 2021
Uncontrolled Keywords: Classification, Prediction, Feature Selection, Support Vector Machine, Support Vector Regression, Multi Clustering Feature Selection, Unsupervised Feature Selection, Churn Analysis, Customer Willingness to Pay, Customer Retention, IBM Telco Customer Churn Data Set.
Subjects: G300 Statistics
G400 Computer Science
G700 Artificial Intelligence
L100 Economics
N300 Finance
Divisions: Faculty of Computing, Engineering and the Built Environment > School of Computing and Digital Technology
Depositing User: Huseyin Seker
Date Deposited: 31 Aug 2021 09:30
Last Modified: 31 Aug 2021 09:30
URI: http://www.open-access.bcu.ac.uk/id/eprint/12122
