Telco Customer Churn/Retention Rate ML/AI Strategies that Work!

Cohort ML analysis
Smart customer retention strategies that work!
Machine learning can predict which customers have a high probability of churning.

Contents:

  1. Motivation
  2. Objectives
    1. Example: Large B2B product company
  3. Approach
  4. Workflow
  5. Implementation
  6. Initial Data Analysis
  7. Churn Analytics
  8. Data Preparation
  9. RF Model
  10. GB Model
  11. AB Model
  12. LGBM Model
  13. Calibration Plots
  14. Feature Engineering
  15. Cluster Analysis
  16. Conclusions
  17. Related Links

Motivation

Cohort analysis [1-10] is a way to understand customer churn (aka attrition), i.e., the number or percentage of customers who don't purchase additional products or services.

Today's most successful companies address churn by leveraging Machine Learning (ML) as part of Artificial Intelligence (AI) to build models that accurately predict churn and take action before a customer leaves [1-3]. Companies looking for a targeted and effective approach to reducing customer churn would do well to make use of the possibilities that ML/AI has to offer [4].

According to Glassbox [5], AI has the potential to boost rates of profitability by an average of 38% by 2035. In fact, we’ll create more data in the next 3 years than during the past 30 years—making data analysis even more difficult. AI can surface trends and patterns, revealing the big picture behind user behavior.

Key Benefits [4-6]:

  • AI helps you reduce customer turnover
  • AI enables you to anticipate negative changes in your customers’ behaviour
  • Your data exploration tells you exactly which experiences/customers are at risk
  • Approach dissatisfied customers in time because AI picks up on alarm signals
  • Put an extra effort into loyal customers based upon the 80/20 principle

Objectives

Customer churn is one of the most important and challenging problems for businesses such as credit card companies, cable service providers, SaaS and telecommunications companies worldwide [7].

This problem can be addressed by maximizing the rate ratio

max [ (Customer Retention Rate)/(Customer Churn Rate) ]

or

max (Customer Retention Rate)

while

min (Customer Churn Rate).

At the end of this study, we’ll be able to answer the following questions [10]:

  • Which customers are churning
  • Why they’re cancelling
  • How to fix the problem

Churn analytics [8] is the process of measuring the rate at which customers quit the product, site, or service. It answers the questions “Are we losing customers?” and “If so, how?” to allow teams to take action. Lower churn rates lead to happier customers, larger margins, and higher profits. To prevent churn, teams must first measure it with analytics.

There are two types of churn. Customer churn is the rate at which you are losing specific customers/accounts. Revenue (aka MRR) churn measures the overall volume of recurring revenue lost in a given period:

Customer churn rate = (customers lost during the period) / (customers at the start of the period)

MRR churn rate = (MRR lost during the period) / (MRR at the start of the period)

Example: Large B2B product company

A large pool toys manufacturer sells its products through 700,000 resellers around the United States. Last month, it added 15,000 resellers and lost 9,000.

Monthly reseller churn rate: 9,000 / 700,000 = 1.3%
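
The same calculation in Python (a minimal sketch using only the numbers from the example above):

resellers_at_start = 700_000
resellers_lost = 9_000

churn_rate = resellers_lost / resellers_at_start
print(f"Monthly reseller churn rate: {churn_rate:.1%}")  # ~1.3%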

Approach

Any churn analysis consists of the following three steps [10]:

  1. Setup Churn Analytics Tools
  2. Find out why customers are churning
  3. Analyze churn by cohorts

The idea is to get a feel for which cohorts—or groups of customers with shared characteristics, such as when they subscribed to your product or where they shop—are leaving. Then use surveys and personal outreach to get insights into what’s driving those cohort members away so you can take proactive steps to retain them.

Workflow

  • Import and install relevant Python libraries and packages
  • Read input dataset as a table and check the content
  • Churn Exploratory Data Analysis (EDA)
  • Feature Engineering and Impact Analysis
  • Train/Test Data Splitting, Sampling and Preparation
  • Run Training Classifiers – RF, GB, Ada, and LGBM
  • Compare ML performance QC metrics
  • Prepare a full classification report
  • Apply cluster PCA analysis (optional)

Implementation

Let's import and/or install the required libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib
import warnings
warnings.filterwarnings('ignore')

!pip install lightgbm

from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.metrics import classification_report, accuracy_score
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_validate
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import plot_confusion_matrix
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import AdaBoostClassifier
from lightgbm import LGBMClassifier

and set the working directory /YOURPATH

import os
os.chdir('YOURPATH')

Let’s read the input dataset

data = pd.read_csv('https://raw.githubusercontent.com/andhikaw789/Telco-Customer-Churn/main/Telco-Customer-Churn.csv')

and check the content

data.head()

table input data

data.shape

(7043, 21)
There are 21 columns (features) and 7043 rows (customers).

data['customerID'].duplicated().value_counts()

False    7043
Name: customerID, dtype: int64
There are no duplicates

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 
 17  PaymentMethod     7043 non-null   object 
 18  MonthlyCharges    7043 non-null   float64
 19  TotalCharges      7043 non-null   object 
 20  Churn             7043 non-null   object 
dtypes: float64(1), int64(2), object(18)
memory usage: 1.1+ MB

Let's coerce the unparsable TotalCharges entries (blank strings) to NaN values

data['TotalCharges'] = pd.to_numeric(data['TotalCharges'], errors='coerce')
data['TotalCharges'].replace(' ', np.nan, inplace=True)
data['TotalCharges'] = data['TotalCharges'].astype(float)

data.isna().sum()

customerID           0
gender               0
SeniorCitizen        0
Partner              0
Dependents           0
tenure               0
PhoneService         0
MultipleLines        0
InternetService      0
OnlineSecurity       0
OnlineBackup         0
DeviceProtection     0
TechSupport          0
StreamingTV          0
StreamingMovies      0
Contract             0
PaperlessBilling     0
PaymentMethod        0
MonthlyCharges       0
TotalCharges        11
Churn                0
dtype: int64

There are only 11 missing values, all in TotalCharges; they come from coercing the blank entries to NaN above.

Initial Data Analysis

Let’s begin the EDA phase and check the overall churn proportion

yy=data['Churn']
plt.figure(figsize= (10,6))
fig = yy.value_counts(normalize = True).plot.pie(autopct='%1.2f%%')
plt.title("Pie-chart showing Churn", fontdict={'fontsize': 20, 'fontweight' : 5, 'color' : 'Green'})
fig.legend(title="Churn",
           loc="center left",
           bbox_to_anchor=(1, 0, 0.5, 1))

#plt.show()

plt.savefig('telco_churn_piechart.png')

piechart churn

We can see that there is an unequal distribution of classes (Churn) in the training dataset. We face the imbalanced classification problem.
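
The same imbalance can be confirmed numerically (a quick check on the dataframe loaded above):

data['Churn'].value_counts(normalize=True)
# roughly 73% 'No' vs 27% 'Yes' in this dataset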

Let’s check the gender factor

yy=data['gender']
plt.figure(figsize= (10,6))
fig = yy.value_counts(normalize = True).plot.pie(autopct='%1.2f%%')
plt.title("Pie-chart showing Gender", fontdict={'fontsize': 20, 'fontweight' : 5, 'color' : 'Green'})
fig.legend(title="Gender",
           loc="center left",
           bbox_to_anchor=(1, 0, 0.5, 1))

#plt.show()

plt.savefig('telco_gender_piechart.png')

piechart gender

The plot shows that there is no gender gap in the dataset – both M and F are equally represented.

Let’s check the Partner feature

yy=data['Partner']
plt.figure(figsize= (10,6))
fig = yy.value_counts(normalize = True).plot.pie(autopct='%1.2f%%')
plt.title("Pie-chart showing Partner", fontdict={'fontsize': 20, 'fontweight' : 5, 'color' : 'Green'})
fig.legend(title="Partner",
           loc="center left",
           bbox_to_anchor=(1, 0, 0.5, 1))

#plt.show()

plt.savefig('telco_partner_piechart.png')

piechart partner

It is clear that the dataset is well balanced in terms of the partnership proportion.

Let’s check the Dependents factor

yy=data['Dependents']
plt.figure(figsize= (10,6))
fig = yy.value_counts(normalize = True).plot.pie(autopct='%1.2f%%')
plt.title("Pie-chart showing Dependents", fontdict={'fontsize': 20, 'fontweight' : 5, 'color' : 'Green'})
fig.legend(title="Dependents",
           loc="center left",
           bbox_to_anchor=(1, 0, 0.5, 1))

#plt.show()

plt.savefig('telco_dependents_piechart.png')

piechart dependents

It is clear that the dependency ratio is not well balanced in the dataset.

Let’s look at the SeniorCitizen factor

yy=data['SeniorCitizen']
plt.figure(figsize= (10,6))
fig = yy.value_counts(normalize = True).plot.pie(autopct='%1.2f%%')
plt.title("Pie-chart showing SeniorCitizen", fontdict={'fontsize': 20, 'fontweight' : 5, 'color' : 'Green'})
fig.legend(title="SeniorCitizen",
           loc="center left",
           bbox_to_anchor=(1, 0, 0.5, 1))

#plt.show()

plt.savefig('telco_seniorcitizen_piechart.png')

piechart seniorcitizen

We can see that senior citizens are underrepresented in the dataset.

Let’s check the PhoneService factor

yy=data['PhoneService']
plt.figure(figsize= (10,6))
fig = yy.value_counts(normalize = True).plot.pie(autopct='%1.2f%%')
plt.title("Pie-chart showing PhoneService", fontdict={'fontsize': 20, 'fontweight' : 5, 'color' : 'Green'})
fig.legend(title="PhoneService",
           loc="center left",
           bbox_to_anchor=(1, 0, 0.5, 1))

#plt.show()

plt.savefig('telco_phoneservice_piechart.png')

piechart phoneservice

We can see that the percentage of customers with PhoneService=No is negligibly small in the dataset.

Let’s look at the PaymentMethod piechart

yy=data['PaymentMethod']
plt.figure(figsize= (10,6))
fig = yy.value_counts(normalize = True).plot.pie(autopct='%1.2f%%')
plt.title("Pie-chart showing PaymentMethod", fontdict={'fontsize': 20, 'fontweight' : 5, 'color' : 'Green'})
fig.legend(title="PaymentMethod",
           loc="center left",
           bbox_to_anchor=(1, 0, 0.5, 1))

#plt.show()

plt.savefig('telco_payment_piechart.png')

piechart paymentmethod

We can see almost equal representation of all available payment options such as electronic/mailed check, bank and credit card transfers.

Let’s consider the PaperlessBilling feature

piechart paperlessbilling

This chart shows that PaperlessBilling=Yes is dominant in the dataset.

Let’s check the Contract (month-to-month, one year and two years contract) feature

yy=data['Contract']
plt.figure(figsize= (10,6))
fig = yy.value_counts(normalize = True).plot.pie(autopct='%1.2f%%')
plt.title("Pie-chart showing Contract", fontdict={'fontsize': 20, 'fontweight' : 5, 'color' : 'Green'})
fig.legend(title="Contract",
           loc="center left",
           bbox_to_anchor=(1, 0, 0.5, 1))

#plt.show()

plt.savefig('telco_contract_piechart.png')

piechart contract

As we can see, the month-to-month contract accounts for more than 55% of the dataset.

Let’s look at the piechart showing MultipleLines (no phone service and yes/no multiple lines)

piechart multiplelines

It is clear that the share of MultipleLines='No phone service' is small compared to the 'Yes' and 'No' categories.

Let’s plot the histograms of tenure, MonthlyCharges, and TotalCharges

data.hist(column='tenure')
plt.savefig('telco_tenure_hist.png')

histogram tenure

data.hist(column='MonthlyCharges')
plt.savefig('telco_monthlycharges_hist.png')

histogram Monthlycharges

data.hist(column='TotalCharges')
plt.savefig('telco_totalcharges_hist.png')

histogram totalcharges

We can see the following dominant trends in our dataset: tenure<10 and tenure>60, MonthlyCharges<30, and TotalCharges<1000.

Churn Analytics

Now, in order to identify churn across the different customer classes, we will group the data by each feature and look at the split between Churn = 'yes' and 'no'. We'll use count plots to estimate how many customers leave the company [1,2].

Let’s begin with Gender

plt.figure(figsize=(8,5), facecolor='white')
sns.set(style='whitegrid')
ax = sns.countplot(data=data, x='gender', hue='Churn', saturation=1, alpha=0.9, palette='bright')
ax.set_title('Churn by Gender')
for p in ax.patches:
    ax.annotate(f'\n{p.get_height()}', (p.get_x()+0.2, p.get_height()), ha='center', va='top', color='white', size=13)

#plt.show()

plt.savefig('telecom_gender.png')

barchart churn by gender

This plot shows that gender does not affect the customer churn.

Let’s look at the impact of Senior Citizen

plt.figure(figsize=(8, 5), facecolor='white')
sns.set(style='whitegrid')
ax = sns.countplot(data=data, x='SeniorCitizen', hue='Churn', saturation=1, alpha=0.9, palette='bright')
ax.set_title('Churn by Senior Citizen')
for p in ax.patches:
    ax.annotate(f'\n{p.get_height()}', (p.get_x()+0.2, p.get_height()), ha='center', va='top', color='white', size=10)
plt.savefig('telecom_seniorcitizen.png')

barchart churn by senior citizen

It appears that senior citizens are likely to leave as customers.

Let’s identify attrition by Partner

plt.figure(figsize=(8, 5), facecolor='white')
sns.set(style='whitegrid')
ax = sns.countplot(data=data, x='Partner', hue='Churn', saturation=1, alpha=0.9, palette='bright')
ax.set_title('Churn by Partner')
for p in ax.patches:
    ax.annotate(f'\n{p.get_height()}', (p.get_x()+0.2, p.get_height()), ha='center', va='top', color='white', size=13)
plt.savefig('telecom_partner.png')

barchart churn by partner

It is clear that there is a weak correlation between the customer churn and the Partner feature.

Let’s explore the Dependents feature

plt.figure(figsize=(8, 5), facecolor='white')
sns.set(style='whitegrid')
ax = sns.countplot(data=data, x='Dependents', hue='Churn', saturation=1, alpha=0.9, palette='bright')
ax.set_title('Churn by Dependents')
for p in ax.patches:
    ax.annotate(f'\n{p.get_height()}', (p.get_x()+0.2, p.get_height()), ha='center', va='top', color='white', size=10)
plt.savefig('telecom_dependents.png')

barchart churn by dependents

Customers without dependents appear somewhat more likely to churn.

Let’s check the churn by Phone Service

sns.set(style='whitegrid')
plt.figure(figsize=(8, 5), facecolor='white')
ax = sns.countplot(data=data, x='PhoneService', hue='Churn', saturation=1, alpha=0.9, palette='bright')
ax.set_title('Churn by Phone Service')
for p in ax.patches:
    number = '{}'.format(p.get_height().astype('int64'))
    ax.annotate(number, (p.get_x() + p.get_width()/2., p.get_height()), ha='center', va='center',
                xytext=(0,5), textcoords='offset points', color='black', fontweight='semibold', fontsize=9)
plt.savefig('telecom_phoneservice.png')

barchart churn by phone service

Since the churn proportions for customers with and without phone service are about the same, this feature doesn't appear to affect the customer churn.

Let’s look at the churn by Multiple Lines

plt.figure(figsize=(8, 5), facecolor='white')
sns.set(style='whitegrid')
ax = sns.countplot(data=data, x='MultipleLines', hue='Churn', saturation=1, alpha=0.9, palette='bright')
ax.set_title('Churn by Multiple Lines')
plt.legend(loc='upper right')
for p in ax.patches:
    #ax.annotate(f'\n{p.get_height()}', (p.get_x()+0.2, p.get_height()), ha='center', va='top', color='white', size=8)
    number = '{}'.format(p.get_height().astype('int64'))
    ax.annotate(number, (p.get_x() + p.get_width()/2., p.get_height()), ha='center', va='center',
                xytext=(0,5), textcoords='offset points', color='black', fontweight='semibold', fontsize=9)
plt.savefig('telecom_multiplelines.png')

barchart churn by multiple lines

We can see that there’s a slim chance that no multiple lines affects the customer churn.

Let’s look at InternetService

plt.figure(figsize=(8, 6), facecolor='white')
sns.set(style='whitegrid')
ax = sns.countplot(data=data, x='InternetService', hue='Churn', saturation=1, alpha=0.9, palette='bright')
ax.set_title('Churn by Internet Service')
for p in ax.patches:
    #ax.annotate(f'\n{p.get_height()}', (p.get_x()+0.2, p.get_height()), ha='center', va='top', color='white', size=10)
    number = '{}'.format(p.get_height().astype('int64'))
    ax.annotate(number, (p.get_x() + p.get_width()/2., p.get_height()), ha='center', va='center',
                xytext=(0,5), textcoords='offset points', color='black', fontweight='semibold', fontsize=9)
plt.savefig('telecom_internetservice.png')

barchart churn by internet service

It turns out that there’s a chance that the fiber optic internet service affects the churn.

Let’s look at the churn by Online Security

plt.figure(figsize=(8, 6), facecolor='white')
sns.set(style='whitegrid')
ax = sns.countplot(data=data, x='OnlineSecurity', hue='Churn', saturation=1, alpha=0.9, palette='bright')
ax.set_title('Churn by Online Security')
for p in ax.patches:
    #ax.annotate(f'\n{p.get_height()}', (p.get_x()+0.2, p.get_height()), ha='center', va='top', color='white', size=8)
    number = '{}'.format(p.get_height().astype('int64'))
    ax.annotate(number, (p.get_x() + p.get_width()/2., p.get_height()), ha='center', va='center',
                xytext=(0,5), textcoords='offset points', color='black', fontweight='semibold', fontsize=9)
plt.savefig('telecom_onlinesecurity.png')

barchart churn by online security

This plot shows that there’s a possibility that no online security affects the customer churn.

Let’s check Online Backup

plt.figure(figsize=(10,7))
sns.set(style='whitegrid')
plt.figure(figsize=(8, 6), facecolor='white')
ax = sns.countplot(data=data, x='OnlineBackup', hue='Churn', saturation=1, alpha=0.9, palette='bright')
ax.set_title('Churn by Online Backup')
for p in ax.patches:
    #ax.annotate(f'\n{p.get_height()}', (p.get_x()+0.2, p.get_height()), ha='center', va='top', color='white', size=8)
    number = '{}'.format(p.get_height().astype('int64'))
    ax.annotate(number, (p.get_x() + p.get_width()/2., p.get_height()), ha='center', va='center',
                xytext=(0,5), textcoords='offset points', color='black', fontweight='semibold', fontsize=9)
plt.savefig('telecom_onlinebackup.png')

barchart churn by online backup

We can see that there’s a chance that no online backup affects the customer churn.

Let’s look at Device Protection

sns.set(style='whitegrid')
plt.figure(figsize=(8, 6), facecolor='white')
ax = sns.countplot(data=data, x='DeviceProtection', hue='Churn', saturation=1, alpha=0.9, palette='bright')
ax.set_title('Churn by Device Protection')
for p in ax.patches:
    #ax.annotate(f'\n{p.get_height()}', (p.get_x()+0.2, p.get_height()), ha='center', va='top', color='white', size=8)
    number = '{}'.format(p.get_height().astype('int64'))
    ax.annotate(number, (p.get_x() + p.get_width()/2., p.get_height()), ha='center', va='center',
                xytext=(0,5), textcoords='offset points', color='black', fontweight='semibold', fontsize=9)
plt.savefig('telecom_deviceprotection.png')

barchart churn by device protection

We can see that there's a chance that no device protection affects the customer churn.

Let’s look at Tech Support

plt.figure(figsize=(8,6), facecolor='white')
sns.set(style='whitegrid')
ax = sns.countplot(data=data, x='TechSupport', hue='Churn', saturation=1, alpha=0.9, palette='bright')
ax.set_title('Churn by Tech Support')
for p in ax.patches:
    #ax.annotate(f'\n{p.get_height()}', (p.get_x()+0.2, p.get_height()), ha='center', va='top', color='white', size=10)
    number = '{}'.format(p.get_height().astype('int64'))
    ax.annotate(number, (p.get_x() + p.get_width()/2., p.get_height()), ha='center', va='center',
                xytext=(0,5), textcoords='offset points', color='black', fontweight='semibold', fontsize=9)
plt.savefig('telecom_techsupport.png')

barchart churn by tech support

There’s a possibility that no tech support is linked to the customer churn.

Let’s examine Streaming TV

plt.figure(figsize=(8,6), facecolor='white')
sns.set(style='whitegrid')
ax = sns.countplot(data=data, x='StreamingTV', hue='Churn', saturation=1, alpha=0.9, palette='bright')
ax.set_title('Churn by Streaming TV')
for p in ax.patches:
    #ax.annotate(f'\n{p.get_height()}', (p.get_x()+0.2, p.get_height()), ha='center', va='top', color='white', size=10)
    number = '{}'.format(p.get_height().astype('int64'))
    ax.annotate(number, (p.get_x() + p.get_width()/2., p.get_height()), ha='center', va='center',
                xytext=(0,5), textcoords='offset points', color='black', fontweight='semibold', fontsize=9)
plt.savefig('telecom_streamingtv.png')

barchart churn by streaming TV

This plot shows that streaming tv may not affect the customer churn.

Let’s examine Streaming Movies

plt.figure(figsize=(9,7), facecolor='white')
sns.set(style='whitegrid')
ax = sns.countplot(data=data, x='StreamingMovies', hue='Churn', saturation=1, alpha=0.9, palette='bright')
ax.set_title('Churn by Streaming Movies')
for p in ax.patches:
    #ax.annotate(f'\n{p.get_height()}', (p.get_x()+0.2, p.get_height()), ha='center', va='top', color='white', size=10)
    number = '{}'.format(p.get_height().astype('int64'))
    ax.annotate(number, (p.get_x() + p.get_width()/2., p.get_height()), ha='center', va='center',
                xytext=(0,5), textcoords='offset points', color='black', fontweight='semibold', fontsize=9)
plt.savefig('telecom_streamingmovies.png')

barchart churn by streaming movies

It appears that streaming movies are weakly related to the customer churn.

Let’s check the churn by Contract

plt.figure(figsize=(8,7), facecolor='white')
sns.set(style='whitegrid')
ax = sns.countplot(data=data, x='Contract', hue='Churn', saturation=1, alpha=0.9, palette='bright')
ax.set_title('Churn by Contract')
for p in ax.patches:
    number = '{}'.format(p.get_height().astype('int64'))
    ax.annotate(number, (p.get_x() + p.get_width()/2., p.get_height()), ha='center', va='center',
                xytext=(0,5), textcoords='offset points', color='black', fontweight='semibold')
plt.savefig('telecom_contract.png')

barchart churn by contract

It is clear that there’s a chance that the month-to-month contract is linked to the customer churn.

Let’s look at Paperless Billing

plt.figure(figsize=(8,6), facecolor='white')
sns.set(style='whitegrid')
ax = sns.countplot(data=data, x='PaperlessBilling', hue='Churn', saturation=1, alpha=0.9, palette='bright')
ax.set_title('Churn by Paperless Billing')
for p in ax.patches:
    #ax.annotate(f'\n{p.get_height()}', (p.get_x()+0.2, p.get_height()), ha='center', va='top', color='white', size=13)
    number = '{}'.format(p.get_height().astype('int64'))
    ax.annotate(number, (p.get_x() + p.get_width()/2., p.get_height()), ha='center', va='center',
                xytext=(0,5), textcoords='offset points', color='black', fontweight='semibold')
plt.savefig('telecom_paperless.png')

barchart churn by paperless billing

We can see that there’s a chance that paperless billing is related to the customer churn.

Let’s now consider Payment Method

plt.figure(figsize=(10,6), facecolor='white')
sns.set(style='whitegrid')
ax = sns.countplot(data=data, x='PaymentMethod', hue='Churn', saturation=1, alpha=0.9, palette='bright')
ax.set_title('Churn by Payment Method')
for p in ax.patches:
    #ax.annotate(f'\n{p.get_height()}', (p.get_x()+0.2, p.get_height()), ha='center', va='top', color='white', size=13)
    number = '{}'.format(p.get_height().astype('int64'))
    ax.annotate(number, (p.get_x() + p.get_width()/2., p.get_height()), ha='center', va='center',
                xytext=(0,5), textcoords='offset points', color='black', fontweight='semibold')
plt.savefig('telecom_payment.png')

barchart churn by payment method

The electronic check payment method appears to be linked to the customer churn.

Let’s explore the remaining features (Tenure and Monthly/Total Charges) by plotting their histograms

plt.figure(figsize=(12,5), facecolor='white')
plt.figure(facecolor='white')
sns.set(style='whitegrid')
sns.histplot(data=data, x='tenure', hue='Churn', binwidth=2, kde=True)
plt.title('Tenure')

#plt.show()

plt.savefig('telecom_tenurehist.png')


histogram churn by tenure

plt.figure(figsize=(11,5), facecolor='white')
plt.figure(facecolor='white')
sns.set(style='whitegrid')
sns.histplot(data=data, x='MonthlyCharges', hue='Churn', binwidth=2, kde=True)
plt.title('Monthly Charges')

#plt.show()

plt.savefig('telecom_chargeshist.png')

histogram churn by monthly charges

plt.figure(figsize=(13,5), facecolor='white')
plt.figure(facecolor='white')
sns.set(style='whitegrid')
sns.histplot(data=data, x='TotalCharges', hue='Churn', binwidth=100, kde=True)
plt.title('Total Charges')

#plt.show()

plt.savefig('telecom_totalchargeshist.png')

histogram churn by total charges

These three plots reveal the following trends typical for churned customers:

  • Tenure = 0-2 months
  • MonthlyCharges = 70-100
  • TotalCharges < 200 (overlap with no-churn)

Key takeaways

  • There is a relationship between the churn rate and the following features: senior citizen, fiber optic internet service, paperless billing, month-to-month contract, electronic check payment, and high monthly charges.
  • There is a weak correlation between the churn rate and certain variables such as dependents, online security/backup, tech support, streaming tv/movies, and device protection.

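These takeaways can be quantified directly from the raw dataframe before any encoding; a minimal sketch (not part of the original notebook) computing the churn rate per contract type:

# churn rate by contract type, using the raw 'Yes'/'No' labels
churn_by_contract = (data.assign(churn_flag=(data['Churn'] == 'Yes').astype(int))
                         .groupby('Contract')['churn_flag']
                         .mean())
print(churn_by_contract)
# month-to-month contracts show a much higher churn rate than one- and two-year contracts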

Data Preparation

Let’s check the missing values

data_null = data[data['TotalCharges'].isnull()]
data_null[['tenure', 'MonthlyCharges', 'TotalCharges', 'Churn']]

table missing values

Let’s impute the missing values with 0 using fillna

data['TotalCharges'].fillna(0, inplace=True)

data_prep = data.copy()

Let’s proceed with encoding to change categorical variables into numerical ones. We will use OneHotEncoder for the “yes/no” variables

from sklearn.preprocessing import OneHotEncoder

ohe = OneHotEncoder(categories=[['Yes', 'No']], handle_unknown='ignore', sparse=False)

cols = ['Partner', 'Dependents', 'PhoneService', 'PaperlessBilling', 'Churn']

for i in cols:
    y = np.array(data_prep[i]).reshape(-1, 1)
    ohe.fit(y)
    # keep only the 'Yes' column of the one-hot output, so Yes -> 1 and No -> 0
    data_prep[i] = ohe.transform(y)[:, 0]

and LabelEncoder for other available categorical variables

from sklearn.preprocessing import LabelEncoder
lenc = LabelEncoder()

cols = ['gender', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 'PaymentMethod']

for i in cols:
    lenc.fit(data_prep[i])
    data_prep[i] = lenc.transform(data_prep[i])

Let’s split the target variable and other variables

data_x = data_prep[['gender', 'SeniorCitizen', 'Partner', 'Dependents', 'tenure', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod', 'MonthlyCharges', 'TotalCharges']].copy()

data_y = data_prep['Churn']

Let’s split the training and test data by setting test_size=0.2, random_state=5

from sklearn.model_selection import train_test_split
from collections import Counter

x_train, x_test, y_train, y_test = train_test_split(data_x, data_y, test_size=0.2, random_state=5)
Counter(y_train)

Counter({1.0: 1483, 0.0: 4151})

Let's scale both the training and test data using MinMaxScaler

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

scaled_train = np.array(x_train[['tenure', 'MonthlyCharges', 'TotalCharges']]).reshape(-1,3)

scaler = scaler.fit(scaled_train)

x_train[['tenure', 'MonthlyCharges', 'TotalCharges']] = scaler.transform(scaled_train)

scaled_test = np.array(x_test[['tenure', 'MonthlyCharges', 'TotalCharges']]).reshape(-1,3)

x_test[['tenure', 'MonthlyCharges', 'TotalCharges']] = scaler.transform(scaled_test)

Let’s resample the training data using SMOTE with the parameters sampling_strategy = 0.8, k_neighbors=5, random_state=5 (resampled data 1)

from imblearn.over_sampling import SMOTE
sm = SMOTE(sampling_strategy = 0.8, k_neighbors=5, random_state=5)
x_resample, y_resample = sm.fit_resample(x_train, y_train)
Counter(y_resample)

Counter({1.0: 3320, 0.0: 4151})

Also, we can define the parameter sampling_strategy = 0.66 (resampled data 2)

sm = SMOTE(sampling_strategy = 0.66, k_neighbors=5, random_state=5)
x_resample_2, y_resample_2 = sm.fit_resample(x_train, y_train)
Counter(y_resample_2)

Counter({1.0: 2739, 0.0: 4151})
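
The minority-class counts above follow directly from sampling_strategy, which sets the desired minority/majority ratio after resampling; a quick check with the numbers reported by Counter:

n_majority = 4151  # non-churn samples in the training split (see Counter(y_train) above)
print(int(0.8 * n_majority))   # 3320 -> resampled data 1
print(int(0.66 * n_majority))  # 2739 -> resampled data 2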

RF Model

Let’s begin with RandomForestClassifier

Original data:

rf = RandomForestClassifier(random_state=5, criterion='entropy', n_estimators=18, max_depth=12)
rf.fit(x_train, y_train)
prediction = rf.predict(x_test)
print(confusion_matrix(y_test, prediction))
print("Accuracy Random Forest: %.2f" % (accuracy_score(y_test, prediction)*100))
print("Recall Random Forest:", recall_score(y_test, prediction)*100)
print("Precision Random Forest:", precision_score(y_test, prediction)*100)

[[915 108]
 [195 191]]
Accuracy Random Forest: 78.50
Recall Random Forest: 49.48186528497409
Precision Random Forest: 63.87959866220736

Resampled data 1:

rf.fit(x_resample, y_resample)
prediction = rf.predict(x_test)
print(confusion_matrix(y_test, prediction))
print("Accuracy Random Forest: %.2f" % (accuracy_score(y_test, prediction)*100))
print("Recall Random Forest:", recall_score(y_test, prediction)*100)
print("Precision Random Forest:", precision_score(y_test, prediction)*100)

[[841 182]
 [130 256]]
Accuracy Random Forest: 77.86
Recall Random Forest: 66.32124352331607
Precision Random Forest: 58.44748858447488

Resampled data 2:

rf.fit(x_resample_2, y_resample_2)
prediction = rf.predict(x_test)
print(confusion_matrix(y_test, prediction))
print("Accuracy Random Forest: %.2f" % (accuracy_score(y_test, prediction)*100))
print("Recall Random Forest:", recall_score(y_test, prediction)*100)
print("Precision Random Forest:", precision_score(y_test, prediction)*100)

[[859 164]
 [135 251]]
Accuracy Random Forest: 78.78
Recall Random Forest: 65.02590673575129
Precision Random Forest: 60.48192771084337

Let’s plot the confusion matrix
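
A sketch of the plotting code, mirroring the heatmap used for the AdaBoost and LightGBM models below (output filename assumed):

cf_matrix = confusion_matrix(y_test, prediction)
sns.heatmap(cf_matrix/np.sum(cf_matrix), annot=True,
            fmt='.2%', cmap='Blues')
plt.savefig('telecom_rf_confusion.png')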

RF confusion matrix

Let’s plot the ROC curve

from sklearn.metrics import roc_curve
y_pred_proba = rf.predict_proba(x_test)[:,1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
plt.plot([0,1], [0,1], 'k-')
plt.plot(fpr, tpr, label='RF')
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.title('RF ROC curve')
plt.savefig('roc_rf_curve.png')

RF ROC curve

and get the score

from sklearn.metrics import roc_auc_score
roc_auc_score(y_test,y_pred_proba)

0.832993988016552

Let’s plot the KS statistic plot

import scikitplot as skplt
Y_test_probs = rf.predict_proba(x_test)
skplt.metrics.plot_ks_statistic(y_test, Y_test_probs, figsize=(10,6));

RF KS Statistic plot

Let’s plot the Lift curve

#import scikitplot as skplt
skplt.metrics.plot_lift_curve(y_test, Y_test_probs, figsize=(10,6));

RF  lift curve

Let’s plot the Learning Curve

#import scikitplot as skplt
skplt.estimators.plot_learning_curve(rf, x_test, y_test,
                                     cv=7, shuffle=True, scoring="accuracy",
                                     n_jobs=-1, figsize=(6,4), title_fontsize="large", text_fontsize="large",
                                     title="RandomForestClassifier Learning Curve");

RF learning curve

We can see that the training score ~1.0 is the indicator of training data overfitting (low bias).

In principle, we can use RandomizedSearchCV to identify the best hyperparameters for the RF classifier and then use them to build an improved RF classifier, as sketched below.
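
A minimal sketch of such a search (the parameter ranges are illustrative, not tuned values from this study):

from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    'n_estimators': [10, 25, 50, 100, 200],
    'max_depth': [4, 8, 12, 16, None],
    'criterion': ['gini', 'entropy'],
    'min_samples_leaf': [1, 2, 4],
}

search = RandomizedSearchCV(RandomForestClassifier(random_state=5),
                            param_distributions=param_dist,
                            n_iter=20, scoring='recall', cv=5,
                            random_state=5, n_jobs=-1)
search.fit(x_resample_2, y_resample_2)
print(search.best_params_)

The best parameters found this way would then replace the hand-picked n_estimators=18, max_depth=12 used above.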

GB Model

Let’s train, test and validate the GradientBoosting Classifier

Original data:

gb_clf = GradientBoostingClassifier(random_state=5, learning_rate=1, loss='exponential', max_depth=1, max_features=1)
gb_clf.fit(x_train, y_train)
predictiongnb = gb_clf.predict(x_test)
print(confusion_matrix(y_test, predictiongnb))
print("Accuracy Gradient Boost: %.2f" % (accuracy_score(y_test, predictiongnb)*100))
print("Recall Gradient Boost:", recall_score(y_test, predictiongnb)*100)
print("Precision Gradient Boost:", precision_score(y_test, predictiongnb)*100)
print("")

[[924  99]
 [173 213]]
Accuracy Gradient Boost: 80.70
Recall Gradient Boost: 55.181347150259064
Precision Gradient Boost: 68.26923076923077

Resampled data 1

gb_clf.fit(x_resample, y_resample)
predictiongnb = gb_clf.predict(x_test)
print(confusion_matrix(y_test, predictiongnb))
print("Accuracy Gradient Boost: %.2f" % (accuracy_score(y_test, predictiongnb)*100))
print("Recall Gradient Boost:", recall_score(y_test, predictiongnb)*100)
print("Precision Gradient Boost:", precision_score(y_test, predictiongnb)*100)
print("")

[[813 210]
 [113 273]]
Accuracy Gradient Boost: 77.08
Recall Gradient Boost: 70.72538860103627
Precision Gradient Boost: 56.52173913043478

Resampled data 2

gb_clf.fit(x_resample_2, y_resample_2)
predictiongnb = gb_clf.predict(x_test)
print(confusion_matrix(y_test, predictiongnb))
print("Accuracy Gradient Boost: %.2f" % (accuracy_score(y_test, predictiongnb)*100))
print("Recall Gradient Boost:", recall_score(y_test, predictiongnb)*100)
print("Precision Gradient Boost:", precision_score(y_test, predictiongnb)*100)
print("")

[[850 173]
 [125 261]]
Accuracy Gradient Boost: 78.85
Recall Gradient Boost: 67.61658031088082
Precision Gradient Boost: 60.13824884792627

Let’s plot the confusion matrix
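
A sketch of the plotting code, mirroring the heatmap used for the AdaBoost and LightGBM models below (output filename assumed):

cf_matrix = confusion_matrix(y_test, predictiongnb)
sns.heatmap(cf_matrix/np.sum(cf_matrix), annot=True,
            fmt='.2%', cmap='Blues')
plt.savefig('telecom_gb_confusion.png')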

GB confusion matrix

Let’s plot the ROC curve

#from sklearn.metrics import roc_curve
y_pred_proba = gb_clf.predict_proba(x_test)[:,1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
plt.plot([0,1], [0,1], 'k-')
plt.plot(fpr, tpr, label='GB')
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.title('Gradient Boost ROC curve')

#plt.show()

plt.savefig('roc_gb_curve.png')

GB ROC curve

and get the score

#from sklearn.metrics import roc_auc_score
roc_auc_score(y_test,y_pred_proba)

0.851618727809602

Let’s look at the KS statistic plot

Y_test_probs = gb_clf.predict_proba(x_test)
skplt.metrics.plot_ks_statistic(y_test, Y_test_probs, figsize=(10,6));

Gb KS statistic plot

and plot the Lift curve
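
This reuses the GB probabilities computed just above, in the same way as for the RF model:

skplt.metrics.plot_lift_curve(y_test, Y_test_probs, figsize=(10,6));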

GB lift curve

Let’s plot the Learning curve

#import scikitplot as skplt
skplt.estimators.plot_learning_curve(gb_clf, x_test, y_test,
                                     cv=7, shuffle=True, scoring="accuracy",
                                     n_jobs=-1, figsize=(6,4), title_fontsize="large", text_fontsize="large",
                                     title="Gradient Boost Classifier Learning Curve");

GB learning curve

That is the best Learning curve we have obtained so far.

AB Model

Let’s look at the AdaBoost Classifier

Original data:

ada = AdaBoostClassifier(random_state=5, learning_rate=0.5, n_estimators=50)
ada.fit(x_train, y_train)
predictionada = ada.predict(x_test)
print(confusion_matrix(y_test, predictionada))
print("Accuracy Ada Boost: %.2f" % (accuracy_score(y_test, predictionada)*100))
print("Recall Ada Boost:", recall_score(y_test, predictionada)*100)
print("Precision Ada Boost:", precision_score(y_test, predictionada)*100)
print("")

[[926  97]
 [173 213]]
Accuracy Ada Boost: 80.84
Recall Ada Boost: 55.181347150259064
Precision Ada Boost: 68.70967741935485

Resampled data 1:

ada.fit(x_resample, y_resample)
predictionada = ada.predict(x_test)
print(confusion_matrix(y_test, predictionada))
print("Accuracy Ada Boost: %.2f" % (accuracy_score(y_test, predictionada)*100))
print("Recall Ada Boost:", recall_score(y_test, predictionada)*100)
print("Precision Ada Boost:", precision_score(y_test, predictionada)*100)
print("")

[[797 226]
 [109 277]]
Accuracy Ada Boost: 76.22
Recall Ada Boost: 71.76165803108809
Precision Ada Boost: 55.069582504970185

Resampled data 2:

ada.fit(x_resample_2, y_resample_2)
predictionada = ada.predict(x_test)
print(confusion_matrix(y_test, predictionada))
print("Accuracy Ada Boost: %.2f" % (accuracy_score(y_test, predictionada)*100))
print("Recall Ada Boost:", recall_score(y_test, predictionada)*100)
print("Precision Ada Boost:", precision_score(y_test, predictionada)*100)
print("")

[[818 205]
 [120 266]]
Accuracy Ada Boost: 76.93
Recall Ada Boost: 68.9119170984456
Precision Ada Boost: 56.475583864118896

Let’s look at the confusion matrix

#from sklearn.metrics import classification_report, confusion_matrix
cf_matrix = confusion_matrix(y_test, predictionada)
sns.heatmap(cf_matrix/np.sum(cf_matrix), annot=True,
            fmt='.2%', cmap='Blues')
plt.savefig('telecom_ada_confusion.png')

Ada confusion matrix

Let’s plot the ROC curve

#from sklearn.metrics import roc_curve
y_pred_proba = ada.predict_proba(x_test)[:,1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
plt.plot([0,1], [0,1], 'k-')
plt.plot(fpr, tpr, label='AdaBoost')
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.title('Ada Boost ROC curve')

#plt.show()

plt.savefig('telecom_ada_roc_curve.png')

Ada ROC curve

and get the score

#from sklearn.metrics import roc_auc_score
roc_auc_score(y_test,y_pred_proba)

0.8481632301622273

Let’s construct the KS statistic plot
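
A sketch mirroring the KS-statistic code used for the other classifiers, this time with the AdaBoost probabilities:

Y_test_probs = ada.predict_proba(x_test)
skplt.metrics.plot_ks_statistic(y_test, Y_test_probs, figsize=(10,6));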

ada KS statistic plot

We also plot the Lift curve

skplt.metrics.plot_lift_curve(y_test, Y_test_probs, figsize=(10,6));

Ada lift curve

Let’s plot the Learning curve

#import scikitplot as skplt
skplt.estimators.plot_learning_curve(ada, x_test, y_test,
                                     cv=7, shuffle=True, scoring="accuracy",
                                     n_jobs=-1, figsize=(6,4), title_fontsize="large", text_fontsize="large",
                                     title="Ada Boost Classifier Learning Curve");

Ada learning curve

We can see that the cross-validation score has relatively large confidence intervals.

LGBM Model

Let’s look at the LGBM Classifier.

Original data:

lgbm = LGBMClassifier(random_state=5, learning_rate=0.05, n_estimators=90, num_leaves=20, boosting_type='dart')
lgbm.fit(x_train, y_train)
predictionlgbm = lgbm.predict(x_test)
print(confusion_matrix(y_test, predictionlgbm))
print("Accuracy LightGBM: %.2f" % (accuracy_score(y_test, predictionlgbm)*100))
print("Recall LightGBM:", recall_score(y_test, predictionlgbm)*100)
print("Precision LightGBM:", precision_score(y_test, predictionlgbm)*100)
print("")

[[936  87]
 [191 195]]
Accuracy LightGBM: 80.27
Recall LightGBM: 50.51813471502591
Precision LightGBM: 69.14893617021278

Resampled data 1:

lgbm.fit(x_resample, y_resample)
predictionlgbm = lgbm.predict(x_test)
print(confusion_matrix(y_test, predictionlgbm))
print("Accuracy LightGBM: %.2f" % (accuracy_score(y_test, predictionlgbm)*100))
print("Recall LightGBM:", recall_score(y_test, predictionlgbm)*100)
print("Precision LightGBM:", precision_score(y_test, predictionlgbm)*100)
print("")

[[826 197]
 [116 270]]
Accuracy LightGBM: 77.79
Recall LightGBM: 69.94818652849742
Precision LightGBM: 57.81584582441114

Resampled data 2:

lgbm.fit(x_resample_2, y_resample_2)
predictionlgbm = lgbm.predict(x_test)
print(confusion_matrix(y_test, predictionlgbm))
print("Accuracy LightGBM: %.2f" % (accuracy_score(y_test, predictionlgbm)*100))
print("Recall LightGBM:", recall_score(y_test, predictionlgbm)*100)
print("Precision LightGBM:", precision_score(y_test, predictionlgbm)*100)
print("")

[[869 154]
 [136 250]]
Accuracy LightGBM: 79.42
Recall LightGBM: 64.76683937823834
Precision LightGBM: 61.88118811881188

Let’s plot the confusion matrix

#from sklearn.metrics import classification_report, confusion_matrix
cf_matrix = confusion_matrix(y_test, predictionlgbm)
sns.heatmap(cf_matrix/np.sum(cf_matrix), annot=True,
            fmt='.2%', cmap='Blues')
plt.savefig('telecom_lgbm_confusion.png')

LGBM confusion matrix

Let’s look at the ROC curve

#from sklearn.metrics import roc_curve
y_pred_proba = lgbm.predict_proba(x_test)[:,1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
plt.plot([0,1], [0,1], 'k-')
plt.plot(fpr, tpr, label='LightGBM')
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.title('LightGBM ROC curve')

#plt.show()

plt.savefig('telecom_lgbm_roc.png')

LGBM ROC curve

and get the score

#from sklearn.metrics import roc_auc_score
roc_auc_score(y_test,y_pred_proba)

0.8485127051899575

Let’s construct the KS Statistic plot

Y_test_probs = lgbm.predict_proba(x_test)
skplt.metrics.plot_ks_statistic(y_test, Y_test_probs, figsize=(10,6));

LGBM KS Statistic plot

Let’s look at the Lift curve

skplt.metrics.plot_lift_curve(y_test, Y_test_probs, figsize=(10,6));

LGBM Lift Curve

Let’s look at the Learning curve

#import scikitplot as skplt
skplt.estimators.plot_learning_curve(lgbm, x_test, y_test,
                                     cv=7, shuffle=True, scoring="accuracy",
                                     n_jobs=-1, figsize=(6,4), title_fontsize="large", text_fontsize="large",
                                     title="LightGBM Classifier Learning Curve");

LGBM learning curve

Calibration Plots

Let’s compare the Calibration curves of our classifiers

rf_probas = RandomForestClassifier().fit(x_train, y_train).predict_proba(x_test)
gbc_probas = GradientBoostingClassifier().fit(x_train, y_train).predict_proba(x_test)
abc_probas = AdaBoostClassifier().fit(x_train, y_train).predict_proba(x_test)
lgbmc_scores = LGBMClassifier().fit(x_train, y_train).predict_proba(x_test)

probas_list = [rf_probas, gbc_probas, abc_probas, lgbmc_scores]
clf_names = ['RandomForest', 'GradientBoosting', 'AdaBoost', 'LGBM']
skplt.metrics.plot_calibration_curve(y_test,
                                     probas_list,
                                     clf_names, n_bins=15,
                                     figsize=(12,6));

calibration curves

We can see that GB (blue curve) is the best classifier in terms of calibration.
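
If a less well-calibrated model such as the RF still had to be deployed, its probabilities could be post-calibrated. A minimal sketch (not part of the original workflow) using scikit-learn's CalibratedClassifierCV:

from sklearn.calibration import CalibratedClassifierCV

# wrap the RF in a sigmoid (Platt) calibrator fitted with 3-fold cross-validation
calibrated_rf = CalibratedClassifierCV(RandomForestClassifier(random_state=5, criterion='entropy',
                                                              n_estimators=18, max_depth=12),
                                       method='sigmoid', cv=3)
calibrated_rf.fit(x_train, y_train)
calibrated_probas = calibrated_rf.predict_proba(x_test)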

Feature Engineering

Let’s compute the correlation heatmap by defining the mask to set the values in the upper triangle to True

plt.figure(figsize=(18, 10))

mask = np.triu(np.ones_like(data_prep.corr(), dtype=bool))
heatmap = sns.heatmap(data_prep.corr(), mask=mask, vmin=-1, vmax=1, annot=True, cmap='BrBG')
heatmap.set_title('Triangle Correlation Heatmap', fontdict={'fontsize':18}, pad=16);
plt.savefig('telcom_corrmatrix.png')

triangle correlation heatmap

Let’s run random forest to get feature importance

from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators = 25).fit(x_train, y_train)

feats = x_train.columns

for feature in zip(feats, rf.feature_importances_):
    print(feature)

('gender', 0.027237348338938393)
('SeniorCitizen', 0.020887518841213648)
('Partner', 0.022358221011906435)
('Dependents', 0.0192192077185847)
('tenure', 0.15460929865409162)
('PhoneService', 0.004699263026114435)
('MultipleLines', 0.023072448889206308)
('InternetService', 0.028156528258148645)
('OnlineSecurity', 0.04750854161214951)
('OnlineBackup', 0.02195234642958269)
('DeviceProtection', 0.03126612890688831)
('TechSupport', 0.04543420250489264)
('StreamingTV', 0.017003835114310445)
('StreamingMovies', 0.016991860057308194)
('Contract', 0.07936673266861258)
('PaperlessBilling', 0.026547872337756764)
('PaymentMethod', 0.051829192705564645)
('MonthlyCharges', 0.176231663037708)
('TotalCharges', 0.1856277898870221)

We can sort these values in the descending order

imp_df = pd.DataFrame({
    "Varname": x_train.columns,
    "Imp": rf.feature_importances_
})
imp_df.sort_values(by="Imp", ascending=False)

table importance factor

Let’s create and plot the list of important features with weights defined above

importances = rf.feature_importances_

weights = pd.Series(importances,
                    index=x_train.columns.values)
weights.sort_values()[-10:].plot(kind='barh')

sorted importance barchart

Cluster Analysis

Let’s look at the Elbow plot

from sklearn.cluster import KMeans
skplt.cluster.plot_elbow_curve(KMeans(random_state=1),
                               x_test,
                               cluster_ranges=range(2, 20),
                               figsize=(8,6));

elbow plot
K-means

and check PCA component explained variances

from sklearn.decomposition import PCA
pca = PCA(random_state=1)
pca.fit(x_test)

skplt.decomposition.plot_pca_component_variance(pca, figsize=(8,6));

PCA clusters explained variances

We can also look at the 2-D PCA projection

skplt.decomposition.plot_pca_2d_projection(pca, x_test, y_test,
                                           figsize=(10,10),
                                           cmap="tab10");

PCA 2-D projection

Let’s perform the Silhouette analysis

kmeans = KMeans(n_clusters=10, random_state=1)
kmeans.fit(x_train, y_train)
cluster_labels = kmeans.predict(x_test)

skplt.metrics.plot_silhouette(x_test, cluster_labels,
                              figsize=(8,6));

Silhouette analysis for 10 K-means clusters

and compare the Silhouette coefficient values assigned to our 10 cluster labels defined above.
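
For a single summary number, the mean Silhouette coefficient over all test samples can also be computed directly (a minimal sketch using the cluster labels obtained above):

from sklearn.metrics import silhouette_score
print(silhouette_score(x_test, cluster_labels))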

Conclusions

Metric (resampled data 2)    RF    GB    AB    LGBM
Accuracy %                   78    79    77    79
Recall %                     65    67    69    65
Precision %                  60    60    56    62
ROC Score %                  83    85    85    85
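
The per-class metrics behind a summary like this can be generated with classification_report (imported earlier); for example, for the LightGBM model fitted on resampled data 2:

print(classification_report(y_test, predictionlgbm, target_names=['No churn', 'Churn']))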

The most dominant features are Total/Monthly Charges, tenure and contract.

We have identified 10 K-means clusters. The PCA explained variance ratio is 0.792 for the first 8 components.

As a company grows, manually evaluating customer churn becomes difficult. Yet it’s important to regularly calculate and track churn metrics over time so you can spot and ameliorate problems.

The proposed Python sequence can help churn analytics teams analyze and understand the company’s churn rate from various perspectives while forecasting future churn for planning purposes. The ML/AI capability offers a strong churn analysis program that can help you model out future capital needs, make a workforce management plan and inform other essential business decisions.

Step one is complete. You know which customers are churning, and why. The next question is what do you do with everything you found? Lucky for you, there are 6 proven strategies to reduce churn.

[1] https://medium.com/@andhikaw.789/telco-customer-churn-machine-learning-prediction-170f16ee2fa6

[2] https://medium.com/mlearning-ai/analyzing-ibm-employee-attrition-ec9b8b9f5b0e

[3] https://wp.me/pdMwZd-kB

[4] https://trendskout.com/

[5] https://discover.glassbox.com/

[6] https://www.netsuite.com/portal/resource/articles/human-resources/customer-churn-analysis.shtml?mc24943=v1

[7] https://towardsdatascience.com/customer-churn-analysis-4f77cc70b3bd

[8] https://mixpanel.com/blog/churn-analytics/

[9] https://www.gainsight.com/glossary/what-is-customer-churn-analysis/

[10] https://baremetrics.com/blog/churn-analysis
