Today, most services are digital, and data is increasingly available. Companies can now store and process vast amounts of data, and they have realized that being customer-centric is becoming the main requirement for standing out from the competition. Predicting customer churn is especially important for subscription-based businesses. To be, or remain, leaders, they must focus on customer retention and churn management, and they need to understand which customers are canceling their subscriptions and why.
By some estimates, acquiring a new customer can cost as much as 700% more than retaining an existing one, and increasing customer retention rates by a mere 5% can increase profits by 25% to 95%.
In this article, we will perform churn analysis and prediction in Graphite without writing a single line of code.
What is Customer Churn?
Customer churn happens in Software-as-a-Service businesses much as it does in other subscription-based industries, such as telecommunications. But very often, companies lack knowledge about the factors that lead to churn. To respond to churn in time, they need customer churn prediction models.
Customer Churn Model
The main characteristic of machine learning is the ability to build systems that find patterns in data and learn from them, without the rules being explicitly programmed. In customer churn prediction, the model observes behavioral characteristics and other features associated with decreasing customer satisfaction with the company's services or products.
First, in the training phase, machine learning algorithms will reveal some shared behavior patterns of those customers who have already left the company.
Then, once trained, algorithms can check the behavior of future customers against such patterns - and point out potential churners.
Armed with that knowledge, companies can be proactive with these customers to engage with them, understand their pain points, and prevent churn before it happens.
Dataset for Predicting Customer Churn
So, how do we start working with churn rate prediction? Which data is needed?
For this tutorial, we use a Telecom Customer Churn dataset from Kaggle, which is quite popular for churn modeling.
Each row represents a customer, and each column contains the customer's attributes.
The dataset contains information about:
Customers who left – the column is called "Churn", and this will be the target column in our Model (something we want to predict)
Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
Customer account information – how long they've been a customer, contract, payment method, paperless billing, monthly charges, and total charges
Demographic info about customers – gender, age, and whether they have partners and dependents
Import Customer Churn Dataset
Let's import and parse a CSV file that we previously downloaded from Kaggle.
We can browse through our dataset rows, filter, or search on the View Data tab.
We have 21 columns and 7032 rows.
Every uploaded dataset in Graphite has a practical Summary tab. It lets you check, at a glance, the distributions of numeric columns, the number of null values, and various statistical measures.
We can quickly check that our target column, "Churn", which indicates whether a customer left, is not heavily imbalanced. That means we have enough "yes" and "no" examples to train the model.
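For readers who want to see what this balance check amounts to outside Graphite, here is a minimal sketch in plain Python. The inline CSV is a made-up five-row stand-in for the Kaggle file, and the helper name `class_shares` is hypothetical; only the "Churn" column name comes from the dataset described above.

```python
import csv
import io
from collections import Counter

# Tiny made-up sample standing in for the Kaggle CSV (not real data).
SAMPLE_CSV = """customerID,tenure,Contract,MonthlyCharges,Churn
0001,1,Month-to-month,29.85,No
0002,34,One year,56.95,No
0003,2,Month-to-month,53.85,Yes
0004,45,One year,42.30,No
0005,2,Month-to-month,70.70,Yes
"""


def class_shares(rows, target="Churn"):
    """Return the share of rows per target value, e.g. {'No': 0.6, 'Yes': 0.4}."""
    counts = Counter(row[target] for row in rows)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
shares = class_shares(rows)
print(len(rows), "rows,", len(rows[0]), "columns")
print(shares)
```

If one class held only a few percent of the rows, the model would see too few examples of it, which is the situation the Summary tab helps rule out.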
It is interesting to see the distribution of some of the columns, like "monthly charges". Most of our customers have monthly charges up to $28. Another customer group is centered around $80 / month.
Note that our target variable, "Churn", can have only two states:
NO - not churner
YES - churner.
Run the No-Code Machine Learning Model
Now that our dataset is uploaded, everything is set to create a no-code machine learning model in Graphite. We choose the Binary Classification model, because our target column has only two states.
To build a binary classification model in Graphite, you need:
a binary target column (what we are predicting, with only two distinct states; for us, it is the "Churn" column)
a set of features (other columns from the dataset that may have an impact on the target column)
In just a few mouse clicks, we will define a model Scenario in Graphite.
We select our Target column from our dataset:
We select all other columns as features.
Customer Churn Analysis
We will leave all other options on default and run this scenario.
Graphite will take care of several preprocessing steps to achieve the best results, so you don't have to think about them. If you are curious about the technical details, these preprocessing steps occur automatically:
null values handling
One Hot Encoding
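To make these two steps concrete, here is a minimal plain-Python sketch. It is not Graphite's implementation: filling nulls with the column median is an assumption (the article does not say which strategy Graphite uses), the function names are hypothetical, and the three rows are made up.

```python
from statistics import median


def fill_missing_with_median(rows, column):
    """Replace None in a numeric column with the median of the known values.

    Median imputation is an assumed strategy, chosen only for illustration.
    """
    fill = median(row[column] for row in rows if row[column] is not None)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 column per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Made-up rows: one customer has no TotalCharges value.
rows = [
    {"Contract": "Month-to-month", "TotalCharges": 30.0},
    {"Contract": "One year", "TotalCharges": None},
    {"Contract": "Two year", "TotalCharges": 110.0},
]
cleaned = fill_missing_with_median(rows, "TotalCharges")
encoded = one_hot_encode(cleaned, "Contract")
print(encoded[1])
```

One Hot Encoding matters because most models work on numbers, so a text column such as "Contract" has to become a set of 0/1 indicator columns first.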
Graphite will take a sample of 80% (5625 rows) of our data and train several machine learning models.
Then, it will test those models on the remaining 20% (1407 rows) and calculate relevant model scores. Based on scores, it will select the best performing model for the dataset.
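The 5625 / 1407 row counts follow from an 80/20 split of 7032 rows. The sketch below reproduces them under two assumptions that the article does not confirm: that rows are shuffled before splitting, and that the test portion is rounded up. The function name and the seed are hypothetical.

```python
import math
import random


def train_test_split_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into (train, test) index lists.

    Rounding the test size up is an assumption; it happens to reproduce
    the row counts quoted in the article.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(7032)
print(len(train_idx), len(test_idx))  # 5625 1407
```

Holding out rows the model never saw during training is what makes the scores on the Results tab an estimate of performance on new customers.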
The best model fit, results, and predictions are available on the Results tab after about 20 seconds of training.
In our case, the best model based on the F1 score is Logistic Regression. Other models' training metrics are listed below.
The Confusion Matrix makes it easy to see whether the model is confusing two classes (YES and NO in our case). For each class, it summarizes the number of correct and incorrect predictions. The model predicted the "Churn" column for a test dataset of 1407 rows, and the predicted outcomes were compared to the historical outcomes.
Correct predictions: 1129 in total out of 1407 test rows. This defines the model Accuracy = 80.24%.
True Positives (TP) = 204: a row was Yes and the model predicted a Yes class for it.
True Negatives (TN) = 925: a row was No and the model predicted a No class for it.
Incorrect predictions: 278 in total out of 1407 test rows, or 19.76%.
False Positives (FP) = 103: a row was No and the model predicted a Yes class for it.
False Negatives (FN) = 175: a row was Yes and the model predicted a No class for it.
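The four cells can be counted directly from pairs of actual and predicted labels. The following is a small illustrative sketch with six made-up rows; the function name `confusion_counts` is hypothetical, and "Yes" is treated as the positive class as in the article.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive="Yes"):
    """Count TP, TN, FP, FN for a binary problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted churn: correct only if the customer really churned.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted no churn: a miss if the customer actually churned.
            counts["FN" if a == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "TN", "FP", "FN")}


# Made-up labels, for illustration only.
actual = ["Yes", "No", "No", "Yes", "No", "Yes"]
predicted = ["Yes", "No", "Yes", "No", "No", "Yes"]
cm = confusion_counts(actual, predicted)
print(cm)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```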
Other Model Scores
Please note that Positive and Negative describe the predicted class, while True and False describe whether that prediction matches the actual value.
Accuracy = (TP + TN) / TOTAL.
Of all test rows (positive and negative), we predicted 80.24% correctly. Accuracy should be as high as possible.
Precision = TP / (TP + FP).
Of all the rows we predicted as positive, 66.45% are actually positive. Precision should be as high as possible.
Recall = TP / (TP + FN).
Of all the actually positive rows, we predicted 53.83% correctly. Recall should be as high as possible.
F1 score = 2 * (Precision * Recall) / (Precision + Recall).
The F1 score is 59.48%. It combines Recall and Precision into a single measure.
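As a quick check, the scores above can be recomputed from the four confusion matrix counts reported in this article. The sketch below uses only those counts and the formulas just listed.

```python
# Confusion matrix counts reported in the article (1407 test rows).
TP, TN, FP, FN = 204, 925, 103, 175

total = TP + TN + FP + FN
accuracy = (TP + TN) / total
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * (precision * recall) / (precision + recall)

print(f"Total rows: {total}")          # 1407
print(f"Accuracy:   {accuracy:.2%}")   # 80.24%
print(f"Precision:  {precision:.2%}")  # 66.45%
print(f"Recall:     {recall:.2%}")     # 53.83%
print(f"F1 score:   {f1:.2%}")         # 59.48%
```

Recall is the lowest of the four here: the model misses 175 of the 379 customers who actually churned, which is worth keeping in mind when acting on its predictions.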
Feature importance refers to how much the model relies upon each column (feature) to make accurate predictions. The more a model relies on a column to make predictions, the more important that column is for the model overall. Graphite uses permutation feature importance for this calculation.
The most important feature is the "tenure" column (the number of months the customer has stayed with the company), followed by "Internet Service", and so on.
In contrast, "gender" and whether the customer is a "Senior citizen" don't have any influence on churn in this model.
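The idea behind permutation feature importance can be sketched in a few lines: shuffle one column across rows, re-score the model, and see how much the score drops. This is an illustration under assumptions, not Graphite's code: accuracy as the score, ten repeats, the toy rule-based model, and the eight made-up rows are all hypothetical.

```python
import random


def accuracy(model, X, y):
    """Share of rows where the model's prediction equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, column, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling one column's values across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        X_perm = [{**row, column: value} for row, value in zip(X, shuffled)]
        drops.append(baseline - accuracy(model, X_perm, y))
    return sum(drops) / n_repeats


def toy_model(row):
    """Hypothetical stand-in model: flags short-tenure customers, ignores gender."""
    return "Yes" if row["tenure"] < 12 else "No"


# Made-up rows; labels follow the toy rule exactly.
X = [
    {"tenure": t, "gender": g}
    for t, g in [(1, "F"), (3, "M"), (5, "F"), (8, "M"),
                 (24, "F"), (36, "M"), (50, "F"), (70, "M")]
]
y = [toy_model(row) for row in X]

tenure_importance = permutation_importance(toy_model, X, y, "tenure")
gender_importance = permutation_importance(toy_model, X, y, "gender")
print("tenure:", tenure_importance, "gender:", gender_importance)
```

A column the model never uses, like "gender" in this toy example, scores exactly zero, because shuffling it cannot change any prediction.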
In Graphite, it is straightforward to check any feature concerning our target column ("Churn").
Notice that most churn occurs at a tenure of 0-5 months, and then again at a tenure of 50-55 months. That is already valuable information for the customer success team.
The next insight is that most churn occurs in “Month-to-Month” contracts:
Regarding Internet Service, customers are more likely to churn if they use "Fiber Optic".
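Insights like these come down to comparing the churn rate across the values of one column. Here is a minimal plain-Python sketch of that grouping; the function name is hypothetical and the six rows are made up, so the resulting rates are not the dataset's.

```python
from collections import defaultdict


def churn_rate_by(rows, column, target="Churn", positive="Yes"):
    """Return the share of churned customers for each value of `column`."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        totals[row[column]] += 1
        if row[target] == positive:
            churned[row[column]] += 1
    return {value: churned[value] / totals[value] for value in totals}


# Made-up rows, for illustration only.
rows = [
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "No"},
    {"Contract": "Month-to-month", "Churn": "No"},
    {"Contract": "One year", "Churn": "No"},
    {"Contract": "One year", "Churn": "No"},
]
rates = churn_rate_by(rows, "Contract")
print(rates)  # {'Month-to-month': 0.5, 'One year': 0.0}
```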
Last Step - Predicting Churn for New Customers!
It is important to note that Graphite automatically deploys the trained model.
That means it is easy to predict churn for new, unseen customer data. We can get answers to questions like "Who will churn next?" and "What is the probability of that outcome?"
Suppose your team gives you information about new customers after you train the Churn Model with Graphite.
You can quickly check whether customers will churn - and the probability of churn.
This is a powerful tool for increasing your retention.
Let's check the churn prediction for one of the new customers:
The model predicts that this customer will not churn, with 72% probability. He is not a target for the customer success team.
He is a better candidate for upselling or participating in a case study than a customer who is currently a churn risk.
For another new customer, the model is predicting that she WILL churn:
The main drivers, if you recall, are tenure, Contract, and Internet Service. This customer has a Month-to-Month contract and Fiber Optic internet, which signals that she is likely to churn.
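Since the best model here is Logistic Regression, a prediction like the two above can be pictured as a weighted sum of features passed through a sigmoid, then turned into a label and a probability. The sketch below shows only the mechanics: the weights, bias, feature names, and the 0.5 threshold are hypothetical values chosen for illustration, not the coefficients Graphite learned.

```python
import math

# Hypothetical coefficients, NOT taken from the trained Graphite model.
WEIGHTS = {"tenure": -0.04, "month_to_month": 0.9, "fiber_optic": 0.7}
BIAS = -0.5


def churn_probability(features, weights=WEIGHTS, bias=BIAS):
    """Logistic regression score: sigmoid of the weighted feature sum."""
    score = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def describe(p_churn, threshold=0.5):
    """Return (label, probability of that label) for a churn probability."""
    if p_churn >= threshold:
        return "Yes", p_churn
    return "No", 1.0 - p_churn


new_customer = {"tenure": 2, "month_to_month": 1, "fiber_optic": 1}
loyal_customer = {"tenure": 60, "month_to_month": 0, "fiber_optic": 0}

for name, customer in [("new", new_customer), ("loyal", loyal_customer)]:
    label, prob = describe(churn_probability(customer))
    print(f"{name}: churn={label} (probability {prob:.0%})")
```

Read this way, "will not churn, with 72% probability" corresponds to a churn probability of 28%, which falls below the threshold.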
Churn is a natural health indicator for subscription-based companies. Identifying customers who aren't happy with the provided solutions allows businesses to learn about operational problems, weak points in the product or pricing plans, and customer preferences, so they can proactively reduce the reasons for churn. It is also essential to define data sources that give a complete picture of the customer interaction history. The higher the quality of the dataset, the more precise the forecasts will be.
I hope this helped you understand how easy it is to train models in no-code machine learning software like Graphite. With just a few mouse clicks, we trained an ML model and made predictions.
You can explore all other Graphite Models here. This page may be helpful if you are interested in different machine learning use cases. Feel free to train your machine learning model on any dataset with the same ease or schedule a demo if you need help or have any questions.