Predicting Customer Churn for Subscription-based Businesses

Hrvoje Smolic
-
06/04/2022

Predicting Customer Churn

Today, most services are digital, and data is increasingly available. Companies can now store and process vast amounts of data, and they have realized that being customer-centric is becoming the main requirement for standing out from the competition. Predicting customer churn is especially important for subscription-based businesses: to become or remain leaders, they must focus on customer retention and churn management, and they need to understand which customers are canceling their subscriptions and why.

Acquiring a new customer can cost as much as 700% more than retaining an existing one, and increasing customer retention rates by a mere 5% could increase profits by 25% to 95%.

In this article, we will perform churn analysis and prediction in Graphite without writing a single line of code. 

What is Customer Churn?

Customer churn, meaning customers canceling or not renewing their subscriptions, happens in Software-as-a-Service businesses much as it does in other subscription-based industries such as telecommunications. Very often, though, companies lack knowledge about the factors that lead to churn. They need customer churn prediction models to respond to churn in time.

Customer Churn Model

The main characteristic of machine learning is the ability to build systems that find patterns in data and learn from them, without the rules being explicitly programmed. In a customer churn prediction model, the Model observes behavior characteristics and other features that signal decreasing customer satisfaction with the company's services or products.

Image by the author - Predicting Customer Churn model idea

First, in the training phase, machine learning algorithms will reveal some shared behavior patterns of those customers who have already left the company. 

Then, once trained, algorithms can check the behavior of future customers against such patterns - and point out potential churners.

Armed with that knowledge, companies can be proactive with these customers to engage with them, understand their pain points, and prevent churn before it happens.

Dataset for Predicting Customer Churn

So, how do we start working with churn rate prediction? Which data is needed? 

For this tutorial, we use a Telecom Customer Churn dataset from Kaggle, which is quite popular for churn modeling.

Each row represents a customer, and each column contains the customer's attributes.

The dataset contains information about:

  • Customers who left – the column is called "Churn", and this will be the target column in our Model (something we want to predict)
  • Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
  • Customer account information – how long they've been a customer, contract, payment method, paperless billing, monthly charges, and total charges
  • Demographic info about customers – gender, age, and if they have partners and dependents

Import Customer Churn Dataset

Let's import and parse a CSV file that we previously downloaded from Kaggle.

Image by the author: churn training dataset in Graphite

We can browse through our dataset rows, filter, or search on the View Data tab.

We have 21 columns and 7032 rows.
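For readers curious what this import step amounts to outside Graphite, here is a minimal Python sketch using only the standard library. The three-row sample, the subset of four columns, and the helper name `load_rows` are our own assumptions for illustration; the real file has the 21 columns and 7032 rows mentioned above.

```python
import csv
import io

# Tiny made-up sample in the shape of a churn CSV
# (4 illustrative columns, 3 rows; not the real Kaggle data).
SAMPLE_CSV = """customerID,tenure,MonthlyCharges,Churn
0001-AAAA,1,29.85,No
0002-BBBB,34,56.95,No
0003-CCCC,2,53.85,Yes
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one dict per customer."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(SAMPLE_CSV)
n_rows = len(rows)
n_columns = len(rows[0])
print(f"{n_columns} columns and {n_rows} rows")
```

To try this on the downloaded file, the same `csv.DictReader` can be pointed at an opened file instead of the in-memory sample.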

Every uploaded dataset in Graphite has a practical Summary tab. It lets you check, at a glance, the distributions of numeric columns, the number of null values, and various statistical measures.

Image by the author: customer churn training dataset summary in Graphite

We can quickly check that our target column, "Churn", explaining if a customer left or not, is not very imbalanced. That means we have enough "yes" and "no" signals to train the model.
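The same balance check can be sketched in a few lines of Python. The ten labels below are made up for illustration, and `class_shares` is a hypothetical helper name, not something Graphite exposes.

```python
from collections import Counter


def class_shares(labels):
    """Return the share of each class label as a fraction of all rows."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Hypothetical target column: 3 churners among 10 customers.
churn_column = ["No", "No", "Yes", "No", "No", "Yes", "No", "No", "Yes", "No"]
shares = class_shares(churn_column)
print(shares)
```

If one class made up only a tiny fraction of the rows, the model would see too few examples of it to learn from.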

It is interesting to see the distribution of some of the columns, like "monthly charges". Most of our customers have monthly charges up to $28. Another customer group is centered around $80 / month.

Image by the author: monthly charges distribution

Dataset Classification Model

Predicting Customer Churn is a great use case for binary machine learning classification.

That is because our target variable, "Churn", can have only two states:

  • NO - not churner
  • YES - churner. 

Run the No-Code Machine Learning Model

Now we have our dataset uploaded. All is set to create a no-code machine learning model in Graphite. We chose the Binary Classification model.

Image by the author: a Model selection in Graphite

In Graphite, to build a binary classification model, you need

  • a binary target column (what are we predicting, with only two distinct states? For us it is a column "Churn")
  • a set of features (other columns from the dataset that have an impact on the target column)

In just a few mouse clicks, we will define a model Scenario in Graphite.

We select our Target column from our dataset:

Image by the author: target column selection in Graphite

We selected all other columns as features. 

Customer Churn Analysis

We will leave all other options on default and run this scenario.

Graphite will take care of several preprocessing steps to achieve the best results, so you don't have to think about them. If you are curious about the technical details, these preprocessing steps occur automatically:

  • null values handling
  • missing values
  • One Hot Encoding
  • fix imbalance
  • normalization
  • constants
  • cardinality
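This article does not describe how Graphite implements these steps, so the following is only a rough Python sketch of two of them, One Hot Encoding and normalization, under our own assumptions. The function names `one_hot` and `min_max` and the sample values are made up for illustration.

```python
def one_hot(values):
    """One-hot encode a categorical column into a list of 0/1 dicts."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def min_max(values):
    """Scale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: it carries no information
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative values only, not rows from the real dataset.
contracts = ["Month-to-month", "Two year", "Month-to-month", "One year"]
tenure = [1, 72, 5, 24]

encoded = one_hot(contracts)
scaled = min_max(tenure)
print(encoded[0])
print(scaled)
```

Encoding turns text categories into numbers a model can use, and scaling keeps columns with large values from dominating columns with small ones.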

Graphite will take a sample of 80% (5625 rows) of our data and train several machine learning models.

Then, it will test those models on the remaining 20% (1407 rows) and calculate relevant model scores. Based on scores, it will select the best performing model for the dataset.
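The 80/20 split can be sketched in Python as follows. How Graphite samples the rows is not documented in this article, so the seeded shuffle and the helper name `train_test_split` are assumptions; only the row counts come from the text above.

```python
import random


def train_test_split(rows, train_share=0.8, seed=42):
    """Shuffle row indices reproducibly and split them into train and test."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(len(rows) * train_share)  # round down to whole rows
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


rows = list(range(7032))  # stand-ins for the 7032 customer rows
train, test = train_test_split(rows)
print(len(train), len(test))  # 5625 1407
```

Keeping the test rows out of training is what makes the scores an honest estimate of how the model behaves on customers it has never seen.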

The best model fit, results, and predictions are available on the Results tab after about 20 seconds of training.

In our case the best Model based on the F1 value score is Logistic Regression. Other models' training metrics are listed below.

Image by the author: customer churn training results in Graphite

Confusion Matrix

Confusion Matrix makes it easy to see whether the Model is confusing two classes (YES and NO in our case). For each class, it summarizes the number of correct and incorrect predictions. The Model predicted column 'Churn' for a test dataset of 1407 rows and compared the predicted outcomes to the historical outcomes.

Image by the author: customer churn confusion matrix in Graphite

Correct Predictions

1129 in total out of 1407 test rows. This defines the Model Accuracy: 80.24%.

True Positives (TP) = 204: a row was Yes and the model predicted a Yes class for it.

True Negatives (TN) = 925: a row was No and the model predicted a No class for it.

Errors

278 in total out of 1407 test rows, 19.76%

False Positives (FP) = 103: a row was No and the model predicted a Yes class for it.

False Negatives (FN) = 175: a row was Yes and the model predicted a No class for it.
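To make the four cells concrete, here is a minimal Python sketch that counts them from a list of actual and predicted labels. The six label pairs are made up for illustration, and `confusion_counts` is a hypothetical helper name.

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Count TP, TN, FP and FN for a binary classification result."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            counts["TP"] += 1  # predicted Yes, was Yes
        elif p == positive:
            counts["FP"] += 1  # predicted Yes, was No
        elif a == positive:
            counts["FN"] += 1  # predicted No, was Yes
        else:
            counts["TN"] += 1  # predicted No, was No
    return counts


# Made-up labels for six customers.
actual = ["Yes", "No", "No", "Yes", "No", "Yes"]
predicted = ["Yes", "No", "Yes", "No", "No", "Yes"]

counts = confusion_counts(actual, predicted)
print(counts)
```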

Other Model Scores

Please note that Positive and Negative describe the class the model predicted, while True and False describe whether that prediction matched the actual value.

Accuracy, (TP + TN) / TOTAL.

Of all test rows (positive and negative), 80.24% were predicted correctly.
Accuracy should be as high as possible.

Precision, TP / (TP + FP).

Of all the rows we predicted as positive, 66.45% are actually positive.
Precision should be as high as possible.

Recall, TP / (TP + FN).

Of all the rows that are actually positive, 53.83% were predicted correctly.
Recall should be as high as possible.

F1 score, 2 * (Precision * Recall)/(Precision + Recall).

F1-score is 59.48%. It is the harmonic mean of Precision and Recall, so it measures both simultaneously.
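As a quick check, the four scores can be recomputed in Python from the confusion matrix counts reported above (TP = 204, TN = 925, FP = 103, FN = 175). The variable names are our own.

```python
# Confusion matrix counts from the 1407-row test set.
TP, TN, FP, FN = 204, 925, 103, 175
total = TP + TN + FP + FN

accuracy = (TP + TN) / total
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * (precision * recall) / (precision + recall)

for name, value in [
    ("Accuracy", accuracy),
    ("Precision", precision),
    ("Recall", recall),
    ("F1 score", f1),
]:
    print(f"{name}: {value:.2%}")
```

The printed values match the scores above: 80.24%, 66.45%, 53.83% and 59.48%.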

Feature importance

Feature importance refers to how much this Model relies upon each column (feature) to make accurate predictions. The more a model relies on a column (feature) to make predictions, the more important it is for the Model overall. Graphite uses a permutation feature importance for this calculation.
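The idea behind permutation feature importance can be sketched in plain Python: shuffle one column, re-score the model, and see how much the score drops. This is a simplified illustration under our own assumptions, not Graphite's implementation; `toy_model`, the eight sample rows, and the choice of accuracy as the score are all made up.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows where the model's prediction equals the label."""
    hits = sum(model(r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Average drop in accuracy after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importance = {}
    for feature in rows[0]:
        drops = []
        for _ in range(n_repeats):
            column = [r[feature] for r in rows]
            rng.shuffle(column)
            # Copy every row, replacing only the shuffled feature.
            permuted = [dict(r, **{feature: v}) for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importance[feature] = sum(drops) / n_repeats
    return importance


def toy_model(row):
    """Hypothetical model: short-tenure customers churn, gender is ignored."""
    return "Yes" if row["tenure"] < 12 else "No"


samples = [(1, "F"), (2, "M"), (3, "F"), (60, "M"),
           (61, "F"), (62, "M"), (4, "M"), (70, "F")]
rows = [{"tenure": t, "gender": g} for t, g in samples]
labels = [toy_model(r) for r in rows]

importance = permutation_importance(toy_model, rows, labels)
print(importance)
```

Shuffling "gender" changes nothing because the toy model never looks at it, so its importance is zero, while shuffling "tenure" breaks the predictions and produces a positive score.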

Image by the author: customer churn feature importance in Graphite

The most important feature is column

  • "tenure" (Number of months the customer has stayed with the company), then
  • "Total Charges",
  • "Contract",
  • "Internet Service" and so on.

For example, "gender" and whether or not the customer is a "Senior citizen" have almost no influence on the model's churn predictions.

In Graphite, it is straightforward to check any feature concerning our target column ("Churn").

Legend:

  • green - customers that churned
  • blue - customers that are still with us

Notice that most churn occurs at a tenure of 0-5 months, and then again at a tenure of 50-55 months. This is already valuable information for the customer success team.

Image by the author: tenure and customer churn

The next insight is that most churn occurs in contracts that are “Month-to-Month”:

Image by the author: contract and customer churn

Regarding Internet Service, customers are more likely to churn if they use "Fiber Optic".

Image by the author: internet service and customer churn
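Charts like these boil down to a churn rate per category. A minimal Python sketch of that calculation, on made-up rows and with a hypothetical helper `churn_rate_by`, could look like this:

```python
from collections import defaultdict


def churn_rate_by(rows, feature):
    """Share of churned customers within each value of the given feature."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        totals[row[feature]] += 1
        if row["Churn"] == "Yes":
            churned[row[feature]] += 1
    return {value: churned[value] / totals[value] for value in totals}


# Made-up rows for illustration, not taken from the real dataset.
rows = [
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "No"},
    {"Contract": "Month-to-month", "Churn": "No"},
    {"Contract": "Two year", "Churn": "No"},
    {"Contract": "Two year", "Churn": "No"},
]

rates = churn_rate_by(rows, "Contract")
print(rates)
```

The same function works for any categorical column, for example an internet service column, by changing the `feature` argument.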

Last Step - Predicting Churn for the new customers!

It is important to say that Graphite automatically deploys the trained Model.

That means it is easy to make predictions on new, unseen customer data. We can get answers to questions like "Who will churn next?" and "What is the probability of that outcome?"

Suppose your team gives you information about new customers after you train the Churn Model with Graphite. 

You can quickly check whether customers will churn - and the probability of churn.

A powerful tool to increase your retention.

Let's check the churn prediction for one of the new customers:

Image by the author: predicting churn in Graphite

The Model claims this one will not churn, with 72% probability. He is not a target for the customer success team.

He is a better candidate for upselling or participating in a case study than a customer who is currently a churn risk.

For another new customer, the model is predicting that she WILL churn:

Image by the author: predicting churn in Graphite

The main drivers, if you recall, are tenure, Contract, and Internet Service. This customer has a Month-to-Month contract and Fiber Optic internet, which signals she is likely to churn.
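Since the best model here was Logistic Regression, a rough Python sketch can show how such a model turns customer features into a churn probability and a Yes/No answer. The coefficients, the bias, the feature names, and the 0.5 threshold below are all hypothetical values chosen for illustration; they are not the values Graphite learned.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
WEIGHTS = {"tenure_months": -0.05, "month_to_month": 1.2, "fiber_optic": 0.8}
BIAS = -1.0


def churn_probability(customer):
    """Logistic regression: squash a weighted sum into the range (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * customer[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def predict(customer, threshold=0.5):
    """Return the predicted label and the churn probability behind it."""
    p = churn_probability(customer)
    return ("Yes" if p >= threshold else "No"), p


# Two made-up customers.
loyal = {"tenure_months": 48, "month_to_month": 0, "fiber_optic": 0}
at_risk = {"tenure_months": 2, "month_to_month": 1, "fiber_optic": 1}

loyal_label, loyal_p = predict(loyal)
risk_label, risk_p = predict(at_risk)
print(loyal_label, round(loyal_p, 2))
print(risk_label, round(risk_p, 2))
```

With these made-up weights, a short tenure, a Month-to-Month contract, and Fiber Optic internet all push the probability up, which mirrors the drivers described above.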

Conclusion

Churn is a natural health indicator for subscription-based companies. Identifying customers who aren't happy with provided solutions allows businesses to learn about operation problems, product or pricing plan weak points, and customer preferences to reduce reasons for churn proactively.
Also, it's essential to define data sources to have a complete picture of customer interaction history. The higher the quality of the dataset, the more precise the forecasts will be.

I hope this helped you understand how easy it is to train models in no-code machine learning software like Graphite. With just a few mouse clicks, we trained the ML model and made predictions.

You can explore all other Graphite Models here. This page may be helpful if you are interested in different machine learning use cases. Feel free to train your machine learning model on any dataset with the same ease or schedule a demo if you need help or have any questions.

I hope you enjoyed it!
