Today, most services are digital, and data is increasingly available. Companies can now store and process vast amounts of data, and they have realized that being customer-centric is the main requirement for standing out from the competition. Predicting customer churn is important for subscription-based businesses. They must focus on customer retention and churn management to become, or remain, leaders. They also need to understand which customers are canceling their subscriptions and why.
Acquiring a new customer could cost as much as 700% more than retaining an existing one, and increasing customer retention rates by a mere 5% could increase profits by 25% to 95%.
In this article, we will perform churn analysis and prediction in Graphite without writing a single line of code.
What is Customer Churn?
Customer churn happens in Software-as-a-Service businesses much as it does in other subscription-based industries, such as telecommunications. But very often, companies lack knowledge about the factors that lead to churn. They need customer churn prediction models to respond to churn in time.
Customer Churn Model
The main characteristic of machine learning is the ability to build systems that find patterns in data and learn from them, without the rules being explicitly programmed. In a customer churn prediction model, the model observes behavioral characteristics and other features associated with declining customer satisfaction with the company's services or products.
Image by the author - Predicting Customer Churn model idea
First, in the training phase, machine learning algorithms will reveal some shared behavior patterns of those customers who have already left the company.
Then, once trained, algorithms can check the behavior of future customers against such patterns - and point out potential churners.
Armed with that knowledge, companies can be proactive with these customers to engage with them, understand their pain points, and prevent churn before it happens.
Dataset for Predicting Customer Churn
So, how do we start working with churn rate prediction? Which data is needed?
For this tutorial, we use a Telecom Customer Churn dataset from Kaggle, which is quite popular for churn modeling.
Each row represents a customer, and each column contains the customer's attributes.
The dataset contains information about:
Customers who left – the column is called "Churn", and this will be the target column in our Model (something we want to predict)
Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
Customer account information – how long they've been a customer, contract, payment method, paperless billing, monthly charges, and total charges
Demographic info about customers – gender, age, and if they have partners and dependents
Import Customer Churn Dataset
Let's import and parse a CSV file that we previously downloaded from Kaggle.
We can browse through our dataset rows, filter, or search on the View Data tab.
We have 21 columns and 7032 rows.
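Graphite does the import for you, but if you are curious what parsing a CSV looks like in code, here is a minimal sketch using only Python's standard library. The two sample rows, the customer IDs, and the shortened column set are made up for illustration; the real Kaggle file has 21 columns.

```python
import csv
import io

# Hypothetical two-row excerpt, not real data from the Kaggle file.
SAMPLE_CSV = """customerID,tenure,Contract,MonthlyCharges,Churn
0001-DEMO,1,Month-to-month,29.85,No
0002-DEMO,34,One year,56.95,Yes
"""

def parse_customers(text):
    """Parse CSV text into a list of dicts, one dict per customer row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = parse_customers(SAMPLE_CSV)
print(len(rows), "rows,", len(rows[0]), "columns")
```

Every value comes back as a string, so numeric columns such as "tenure" would still need converting before any modeling.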
Every uploaded dataset in Graphite has a practical Summary tab. It lets you check, at a glance, the distributions of numeric columns, the number of null values, and various statistical measures.
Image by the author: customer churn training dataset summary in Graphite
We can quickly check that our target column, "Churn", which indicates whether a customer left, is not heavily imbalanced. That means we have enough "yes" and "no" signals to train the model.
It is interesting to see the distribution of some of the columns, like "monthly charges". Most of our customers have monthly charges up to $28. Another customer group is centered around $80 / month.
Our target variable, "Churn", can have only two states:
NO - not churner
YES - churner.
We say this dataset is labeled correctly (with target variable Churn), and we are ready to train a model.
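The two checks described above (a target with exactly two states, and a reasonable balance between them) can be sketched in a few lines of Python. The label list below is a made-up example, not the real "Churn" column, and the function names are only illustrative.

```python
from collections import Counter

def class_shares(labels):
    """Return the share of each label in the target column."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

def is_binary_target(labels):
    """True when the column has exactly two distinct states."""
    return len(set(labels)) == 2

# Hypothetical labels for illustration only.
churn_column = ["No", "No", "Yes", "No", "Yes", "No", "No", "No"]
shares = class_shares(churn_column)
print(is_binary_target(churn_column), shares)
```

If one class had only a tiny share, the model would see too few examples of it, which is what the Summary tab helps you spot.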
Run the No-Code Machine Learning Model
Now that our dataset is uploaded, everything is set to create a no-code machine learning model in Graphite. We choose the Binary Classification model.
Image by the author: a Model selection in Graphite
In Graphite, to build a binary classification model, you need:
a binary target column (the column we are predicting, with only two distinct states; for us, it is the column "Churn")
a set of features (other columns from the dataset that have an impact on the target column)
In just a few mouse clicks, we will define a model Scenario in Graphite.
We select our Target column from our dataset:
Image by the author: target column selection in Graphite
We selected all other columns as features.
Customer Churn Analysis
We will leave all other options on default and run this scenario.
Graphite will take care of several preprocessing steps to achieve the best results, so you don't have to think about them. If you are curious about technical stuff, all these preprocessing steps will occur automatically:
null values handling
missing values
One Hot Encoding
fix imbalance
normalization
constants
cardinality
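To give a feel for two of these steps, here is a rough sketch of one-hot encoding and normalization in plain Python. This is not Graphite's implementation, which is not described in this article; the function names and sample values are assumptions for illustration, and min-max scaling is just one common way to normalize.

```python
def one_hot(values):
    """Turn a categorical column into 0/1 indicator rows (categories sorted)."""
    categories = sorted(set(values))
    encoded = [[1 if value == category else 0 for category in categories]
               for value in values]
    return encoded, categories

def min_max(values):
    """Rescale numbers to the range [0, 1]; a constant column maps to zeros."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]

# Hypothetical sample values.
contracts = ["Month-to-month", "Two year", "One year", "Month-to-month"]
encoded, categories = one_hot(contracts)
scaled = min_max([20.0, 80.0, 50.0])
print(categories, encoded, scaled)
```

The constant-column branch also hints at why constant columns get their own preprocessing step: a column with a single value carries no signal for the model.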
Graphite will take a sample of 80% (5625 rows) of our data and train several machine learning models.
Then, it will test those models on the remaining 20% (1407 rows) and calculate relevant model scores. Based on scores, it will select the best performing model for the dataset.
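The 80/20 split can be sketched as follows. How Graphite shuffles and rounds internally is not documented here, so treat this as an assumption: rounding the test share up happens to reproduce the 5625 and 1407 row counts quoted above.

```python
import random

def train_test_split(rows, test_percent=20, seed=42):
    """Shuffle rows and hold out test_percent of them (rounded up) for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Ceiling division with integers avoids floating-point rounding surprises.
    n_test = -(-len(shuffled) * test_percent // 100)
    return shuffled[n_test:], shuffled[:n_test]

# Row indices stand in for the 7032 customer rows.
train, test = train_test_split(range(7032))
print(len(train), len(test))
```

Holding out rows the model never saw during training is what makes the scores in the next sections a fair estimate of how it behaves on new customers.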
The best model fit, results, and predictions are available on the Results tab after about 20 seconds of training.
In our case, the best model based on the F1 score is Logistic Regression. Training metrics for the other models are listed below it.
Image by the author: customer churn training results in Graphite
Confusion Matrix
The confusion matrix makes it easy to see whether the model is confusing two classes (YES and NO in our case). For each class, it summarizes the number of correct and incorrect predictions. The model predicted the "Churn" column for a test dataset of 1407 rows, and the predicted outcomes were compared to the historical outcomes.
Image by the author: customer churn confusion matrix in Graphite
Correct Predictions
1129 in total out of 1407 test rows. This gives a model accuracy of 80.24%.
True Positives (TP) = 204: a row was Yes and the model predicted a Yes class for it.
True Negatives (TN) = 925: a row was No and the model predicted a No class for it.
Errors
278 in total out of 1407 test rows, 19.76%
False Positives (FP) = 103: a row was No and the model predicted a Yes class for it.
False Negatives (FN) = 175: a row was Yes and the model predicted a No class for it.
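The four counts come from comparing each predicted label with the actual one. A minimal sketch of that tally, using a handful of toy labels rather than the real test set:

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Count TP, TN, FP and FN for a binary target."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual_label, predicted_label in zip(actual, predicted):
        if predicted_label == positive:
            # Predicted churn: correct if the customer really churned.
            counts["TP" if actual_label == positive else "FP"] += 1
        else:
            # Predicted no churn: a miss if the customer really churned.
            counts["FN" if actual_label == positive else "TN"] += 1
    return counts

# Toy labels for illustration only.
actual = ["Yes", "No", "No", "Yes", "No", "Yes"]
predicted = ["Yes", "No", "Yes", "No", "No", "Yes"]
counts = confusion_counts(actual, predicted)
print(counts)
```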
Other Model Scores
Please note that Positive and Negative describe the predicted class, while True and False describe whether that prediction matched the actual value.
Accuracy, (TP + TN) / TOTAL.
Of all the test rows (positive and negative), we predicted 80.24% correctly. Accuracy should be as high as possible.
Precision, TP / (TP + FP).
Of all the rows we predicted as positive, 66.45% are actually positive. Precision should be as high as possible.
Recall, TP / (TP + FN).
Of all the actually positive rows, we predicted 53.83% correctly. Recall should be as high as possible.
F1 score, 2 * (Precision * Recall)/(Precision + Recall).
The F1 score is 59.48%. It is the harmonic mean of Precision and Recall, so it summarizes both in a single number.
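As a quick check, the scores can be recomputed from the four confusion matrix counts reported above (TP = 204, TN = 925, FP = 103, FN = 175). The function name is only illustrative, and this sketch does not guard against division by zero, which would occur if the model never predicted the positive class.

```python
def classification_scores(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

scores = classification_scores(tp=204, tn=925, fp=103, fn=175)
for name, value in scores.items():
    print(f"{name}: {value:.2%}")
# accuracy: 80.24%
# precision: 66.45%
# recall: 53.83%
# f1: 59.48%
```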
Feature importance
Feature importance refers to how much this Model relies upon each column (feature) to make accurate predictions. The more a model relies on a column (feature) to make predictions, the more important it is for the Model overall. Graphite uses a permutation feature importance for this calculation.
Image by the author: customer churn feature importance in Graphite
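The idea behind permutation feature importance is simple: shuffle one column, re-score the model, and see how much the score drops. Below is a rough sketch of that idea, not Graphite's implementation. The toy model, the sample rows, and the choice of accuracy as the score are all assumptions made for illustration.

```python
import random

def accuracy(model, rows, labels):
    """Share of rows where the model's prediction matches the label."""
    hits = sum(model(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)

def permutation_importance(model, rows, labels, feature, repeats=20, seed=0):
    """Average drop in accuracy after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(repeats):
        column = [row[feature] for row in rows]
        rng.shuffle(column)
        shuffled_rows = [{**row, feature: value}
                         for row, value in zip(rows, column)]
        drops.append(baseline - accuracy(model, shuffled_rows, labels))
    return sum(drops) / repeats

# Hypothetical stand-in model: flags short-tenure customers as churners.
def toy_model(row):
    return "Yes" if row["tenure"] < 12 else "No"

rows = [{"tenure": months, "gender": gender} for months, gender in
        [(1, "F"), (3, "M"), (5, "F"), (8, "M"),
         (24, "F"), (36, "M"), (48, "F"), (60, "M")]]
labels = [toy_model(row) for row in rows]  # labels the toy model fits perfectly

tenure_importance = permutation_importance(toy_model, rows, labels, "tenure")
gender_importance = permutation_importance(toy_model, rows, labels, "gender")
print(tenure_importance, gender_importance)
```

Because the toy model never looks at "gender", shuffling that column changes nothing and its importance is exactly zero, while shuffling "tenure" breaks the predictions and produces a positive drop.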
The most important feature is column
"tenure" (Number of months the customer has stayed with the company), then
"Total Charges",
"Contract",
"Internet Service" and so on.
For example, "gender" and whether the customer is a "Senior citizen" have almost no influence on churn.
In Graphite, it is straightforward to check any feature concerning our target column ("Churn").
Notice that most churn occurs at a tenure of 0-5 months, and then again at a tenure of 50-55 months. That is already valuable information for the customer success team.
Image by the author: tenure and customer churn
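A view like this boils down to grouping customers into tenure buckets and looking at churn per bucket. Here is a small sketch with made-up values. It reports the churn rate per bucket, which is an assumption, since the chart may show raw counts instead, and each bucket includes its lower bound but not its upper bound.

```python
from collections import defaultdict

def churn_rate_by_bucket(tenures, churned, width=5):
    """Share of churners per tenure bucket of `width` months."""
    totals = defaultdict(int)
    churners = defaultdict(int)
    for months, left in zip(tenures, churned):
        start = (months // width) * width  # lower bound of the bucket
        totals[start] += 1
        churners[start] += int(left)
    return {f"{start}-{start + width}": churners[start] / totals[start]
            for start in sorted(totals)}

# Made-up values for illustration only.
tenures = [1, 2, 4, 7, 23, 51, 53, 70]
churned = [True, True, False, False, False, True, False, False]
rates = churn_rate_by_bucket(tenures, churned)
print(rates)
```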
The next insight is that most churn occurs in “Month-to-Month” contracts:
Image by the author: contract and customer churn
Regarding Internet Service, customers are more likely to churn if they use "Fiber Optic".
Image by the author: internet service and customer churn
Last Step - Predicting Churn for the new customers!
It is worth noting that Graphite automatically deploys the trained model.
That means it is easy to predict churn for new, unseen data. We can get answers to questions like "Who will churn next?" and "What is the probability of that outcome?"
Suppose your team gives you information about new customers after you train the Churn Model with Graphite.
You can quickly check whether customers will churn - and the probability of churn.
This is a powerful tool for increasing your retention.
Let's check the churn prediction for one of the new customers:
Image by the author: predicting churn in Graphite
The model predicts that this customer will not churn, with 72% probability, so he is not a priority for the customer success team.
He is a better candidate for upselling or participating in a case study than a customer who is currently a churn risk.
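Since the best model in our run was Logistic Regression, a prediction like this is essentially a weighted sum of features passed through a sigmoid and compared with a threshold. The sketch below shows the mechanics only. The weights, bias, feature names, and the 0.5 threshold are invented for illustration and are not the coefficients Graphite learned.

```python
import math

def churn_probability(features, weights, bias):
    """Logistic regression: sigmoid of a weighted sum of the features."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def decide(p_churn, threshold=0.5):
    """Return the predicted class and the probability of that class."""
    if p_churn >= threshold:
        return "Yes", p_churn
    return "No", 1.0 - p_churn

# Hypothetical weights: long tenure lowers risk, the other two raise it.
weights = {"tenure_scaled": -2.0, "month_to_month": 1.5, "fiber_optic": 1.0}
bias = -1.0

loyal = {"tenure_scaled": 0.8, "month_to_month": 0, "fiber_optic": 0}
at_risk = {"tenure_scaled": 0.05, "month_to_month": 1, "fiber_optic": 1}

print(decide(churn_probability(loyal, weights, bias)))
print(decide(churn_probability(at_risk, weights, bias)))
```

A "will not churn, with 72% probability" result corresponds to a churn probability of 28%, which is below the threshold.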
For another new customer, the model is predicting that she WILL churn:
Image by the author: predicting churn in Graphite
The main drivers, if you recall, are tenure, Contract, and Internet Service. This customer has a Month-to-Month contract and Fiber Optic internet, which signals that she is likely to churn.
Conclusion
Churn is a natural health indicator for subscription-based companies. Identifying customers who aren't happy with the provided solutions allows businesses to learn about operational problems, weak points in the product or pricing plan, and customer preferences, so they can proactively reduce the reasons for churn. It's also essential to define data sources that give a complete picture of the customer interaction history. The higher the quality of the dataset, the more precise the forecasts will be.
I hope this helped you understand how easy it is to train models in no-code predictive analytics software like Graphite. With just a few mouse clicks, we trained an ML model and made predictions.
You can explore all other Graphite Models here. This page may be helpful if you are interested in different machine learning use cases. Feel free to train your machine learning model on any dataset with the same ease or schedule a demo if you need help or have any questions.
This blog post provides insights based on the current research and understanding of AI, machine learning and predictive analytics applications for companies. Businesses should use this information as a guide and seek professional advice when developing and implementing new strategies.
Note
At Graphite Note, we are committed to providing our readers with accurate and up-to-date information. Our content is regularly reviewed and updated to reflect the latest advancements in the field of predictive analytics and AI.
Author Bio
Hrvoje Smolic is the Founder and CEO of Graphite Note. He holds a Master's degree in Physics from the University of Zagreb. In 2010, Hrvoje founded Qualia, a company that created BusinessQ, a SaaS data visualization software used by over 15,000 companies worldwide. Continuing his entrepreneurial journey, Hrvoje founded Graphite Note in 2020, a company that seeks to redefine the business intelligence landscape by integrating data analytics, predictive analytics algorithms, and effective human communication.
Graphite Note simplifies the use of Machine Learning in analytics by helping business users to generate no-code machine learning models - without writing a single line of code.