A lead represents a potential customer interested in buying your products or services. In this era of advanced analytics and machine learning, every organization wants to make the process of identifying promising leads, from a long list of people or companies with some interest in their product or service, faster and more efficient. That is where the lead scoring model comes into the picture.
Companies are in business to profit and can only profit if they provide the products or services their customers demand. Meeting these demands means that they can generate revenue to keep the business going and expand it. That is where generating sales leads comes in because if companies are not bringing in new customers, then they will not be able to grow and will begin to stagnate.
But is just generating leads enough?
Let's dive in.
What is a leads scoring model?
A lead scoring model is simply a methodology where we train a machine learning model to learn from historical data. In our example, the model will learn to classify leads into two states - "will convert" and "will not convert" - and to understand what influences a lead to convert.
If your team has many leads but not enough resources to pursue them all, you must prioritize your sales team's time and give them the best possible leads - those with the highest probability of converting.
Maybe it will find out that leads who had more than two phone calls are very likely to convert, but only if they have spent more than 10 minutes on your website.
I am sure you will agree that this knowledge is extremely powerful in the hands of any marketing and sales team.
Dataset for leads scoring model
When you acquire the lead, it commonly includes information like:
tags / comments
contact details of the lead
source of origin
time spent on the website
number of clicks
number of emails sent
number of phone calls/demos
For this post, we are using a popular Lead Conversion dataset from Kaggle. It contains 9240 past leads described by 37 columns.
The dataset consists of various attributes such as Lead Source, Total Time Spent on Website, Total Visits, or Last Activity. These may or may not help decide whether a lead will be converted.
The most important variable is the column 'Converted.' It tells whether a past lead was converted into a customer:
'1' means it was converted
'0' means it wasn't converted.
Our goal: the company wants to identify the most promising leads, also known as 'Hot Leads.'
Suppose they successfully identify this set of leads. In that case, the lead conversion rate should go up as the sales team will now be focusing more on communicating with the potential leads rather than making calls to everyone.
In a few mouse clicks, we imported and parsed a CSV file that we previously downloaded from Kaggle.
We can browse through our dataset rows, filter, or search on the View Data tab. We have 37 columns and 9240 rows.
Every uploaded dataset in Graphite has a handy Summary tab. It lets you check, at a glance, the distributions of numeric columns, the number of null values, and various statistical measures.
As part of quick exploratory data analysis (EDA), it is always good to check correlations (ready for you on the Correlation tab) in the dataset to understand and "feel" the data better.
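Outside of Graphite, the same quick EDA steps can be sketched in a few lines of pandas. The tiny in-memory frame below is a hypothetical stand-in for the Kaggle CSV (you would normally call pd.read_csv on the downloaded file):

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle lead-conversion CSV.
# In practice you would load it with: df = pd.read_csv("Leads.csv")
df = pd.DataFrame({
    "Total Time Spent on Website": [120, 0, 640, 300],
    "TotalVisits": [3, 0, 7, 5],
    "Converted": [0, 0, 1, 1],
})

print(df.shape)                  # (rows, columns)
print(df.isnull().sum())         # null counts per column
print(df.describe())             # basic statistics for numeric columns

# Correlations with the target, mirroring Graphite's Correlation tab
print(df.corr(numeric_only=True)["Converted"])
```

Even on toy data, the correlation against 'Converted' quickly shows which numeric columns move with the target.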
Binary Classification in Machine Learning
Predicting lead conversion is a great use case for binary machine learning classification. Binary, because the target variable we will train the model on can have only two states - '0 - not converted' and '1 - converted'.
Run the no-code machine learning model in Graphite
Now that we have our dataset uploaded, we are ready to create a no-code machine learning model in Graphite. We choose the Binary Classification model.
In Graphite, to build a binary classification model, you need:
a binary target column (what are we predicting, with only two distinct states?)
a set of features (other columns that may influence the target column)
In just a few mouse clicks, we will define a model Scenario.
Our Target column from our dataset:
We will select all other columns as features.
Note how Graphite immediately excluded columns that are not appropriate for modeling. Examples:
Prospect ID: it contains 9240 unique values. The column will not be used because it is a categorical column with more than 90% unique values.
Magazine: The column will not be used because it is a constant. It cannot influence the target variable.
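These two screening rules (drop constant columns, drop ID-like categorical columns) are easy to reproduce by hand. Below is a minimal sketch on made-up data; the 90% threshold mirrors the rule Graphite reported, but the helper `usable` is our own illustrative function, not Graphite's API:

```python
import pandas as pd

# Illustrative data: one ID-like column, one constant, one usable feature
df = pd.DataFrame({
    "Prospect ID": ["a1", "b2", "c3", "d4"],   # all unique -> should be dropped
    "Magazine": ["No", "No", "No", "No"],      # constant -> should be dropped
    "Lead Source": ["Google", "Direct", "Google", "Referral"],
})

def usable(col: pd.Series) -> bool:
    """Return False for columns a model cannot learn from."""
    if col.nunique() <= 1:                     # constant column
        return False
    if col.dtype == object and col.nunique() / len(col) > 0.9:
        return False                           # ID-like: >90% unique categories
    return True

kept = [c for c in df.columns if usable(df[c])]
print(kept)
```

Only 'Lead Source' survives the screen, matching the exclusions Graphite made automatically.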
Binary Classification Model Results
We will leave all other options on default and run this scenario.
Graphite will take care of several preprocessing steps to achieve the best results, so you don't have to think about them. All these preprocessing steps will occur automatically:
null values handling
One Hot Encoding
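For readers curious what those two preprocessing steps do under the hood, here is a minimal pandas sketch on illustrative data (the fill strategies shown are common defaults, assumed rather than confirmed to be what Graphite uses):

```python
import pandas as pd

# Illustrative columns with missing values of both kinds
df = pd.DataFrame({
    "Lead Source": ["Google", "Direct Traffic", None, "Google"],
    "TotalVisits": [3.0, None, 7.0, 5.0],
})

# Null handling: fill numeric nulls with the median,
# and categorical nulls with an explicit "Unknown" marker
df["TotalVisits"] = df["TotalVisits"].fillna(df["TotalVisits"].median())
df["Lead Source"] = df["Lead Source"].fillna("Unknown")

# One Hot Encoding: expand each category into its own 0/1 column
encoded = pd.get_dummies(df, columns=["Lead Source"])
print(encoded.columns.tolist())
```

After encoding, every column is numeric and null-free, which is what most classification algorithms require.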
Graphite will take a sample of 80% of our data and train several machine learning models. Then, it will test those models on the remaining 20% and calculate relevant model scores. The final best model fit, results, and predictions will be available on the Results tab.
After about 30 seconds, we have our results.
Graphite runs several Machine learning algorithms that work best with binary classification problems by using
80% of the data (7392 rows) for training and
20% (1848 rows) for a test dataset.
The total training time was 36.46 seconds.
The best model based on the F1 value is Light Gradient Boosting Machine. Other models' training metrics are listed below.
The Model Fit tab shows how well the model performs. For the 1848 rows in the test dataset, we compare the model's predictions for the column 'Converted' to the historical, known outcomes. The fit is better when the historical and predicted bars are closer.
Confusion Matrix - How Did the Model Perform?
The Confusion Matrix reveals classification errors. It makes it easy to see whether the model is confusing two classes: for each class, it summarizes the number of correct and incorrect predictions. The model predicted the column 'Converted' for the test dataset of 1848 rows, and we compare the predicted outcomes to the historical outcomes.
Correct predictions: 1762 out of 1848 test rows, which gives a model accuracy of 95.35%.
True Positives (TP) = 634: a row was 1 and the model predicted a 1 class for it.
True Negatives (TN) = 1128: a row was 0 and the model predicted a 0 class for it.
Incorrect predictions: 86 out of 1848 test rows (4.65%).
False Positives (FP) = 35: a row was 0 and the model predicted a 1 class for it.
False Negatives (FN) = 51: a row was 1 and the model predicted a 0 class for it.
Other Model Scores
Please note that Positive and Negative refer to the predicted class, while True and False indicate whether the prediction matched the actual value.
Accuracy, (TP + TN) / TOTAL.
Of all the rows (positive and negative), we predicted 95.35% correctly. Accuracy should be as high as possible.
Precision, TP / (TP + FP).
Of all the rows we predicted as positive, 94.77% are actually positive. Precision should be as high as possible.
Recall, TP / (TP + FN).
Of all the actual positive rows, we predicted 92.55% correctly. Recall should be as high as possible.
F1 score, 2 * (Precision * Recall)/(Precision + Recall).
The F1-score is 93.65%. It measures Recall and Precision at the same time: you cannot have a high F1 score without a strong model underneath.
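All four scores follow directly from the four confusion-matrix counts reported above, as this short check confirms:

```python
# Counts from the confusion matrix above
TP, TN, FP, FN = 634, 1128, 35, 51
total = TP + TN + FP + FN            # 1848 test rows

accuracy = (TP + TN) / total         # correct predictions / all rows
precision = TP / (TP + FP)           # of predicted positives, how many were right
recall = TP / (TP + FN)              # of actual positives, how many we caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"{accuracy:.2%} {precision:.2%} {recall:.2%} {f1:.2%}")
# -> 95.35% 94.77% 92.55% 93.65%
```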
Feature importance measures how much the model relies on each column (feature) to make accurate predictions: the more the model relies on a column, the more important that column is. Graphite uses permutation feature importance for this calculation.
The most important feature is column "Tags", then "Last Notable Activity", "Total Time Spent on Website", and so on.
In Graphite, it is very easy to inspect a column like "Tags" with respect to our target column ("Converted"). Most leads that converted have the tag value "Will revert after reading the email":
Predictions for the New Leads
Graphite automatically deployed the trained model. That makes it straightforward to predict, for new, unseen leads, whether they will convert and the probability of that outcome.
Imagine that your marketing team informs you about their new lead after you trained the Leads Scoring Model with Graphite.
You can check whether that lead will convert and the probability. A powerful tool to keep you focused only on high-quality leads!
For this particular new lead, Graphite predicted that it would convert (Converted = 1), with a 97% probability of that outcome.