A Lead represents a potential customer interested in buying your products or services. In this era of advanced analytics and machine learning, every organization wants to improve the process of identifying leads, turning a long list of people or companies with some interest in your product or service into a faster, more efficient pipeline. That is where the Lead Scoring Model comes into the picture.
Companies are in business to profit, and they can only profit by providing the products or services their customers demand. Meeting those demands generates the revenue needed to sustain and expand the business. That is where generating sales leads comes in: if companies are not bringing in new customers, they will not grow and will begin to stagnate.
But is just generating leads enough?
Let's dive in.
If your team has many leads but not enough resources to pursue them all, you must prioritize your sales team's time and give them the best possible leads, meaning the leads with the highest probability of converting.
A Lead Scoring Model is simply a machine learning model trained on historical data. In our example, the model will learn to classify leads into two states, "will convert" and "will not convert." It will also learn what influences leads to convert.
Maybe it will find out that leads who had more than two phone calls are very likely to convert, but only if they have spent more than 10 minutes on your website.
I am sure you will agree that this knowledge is extremely powerful in the hands of any marketing and sales team.
When you acquire the lead, it commonly includes information like:
For this post, we are using a popular Lead Conversion dataset from Kaggle. It contains 9240 past leads described by 37 columns.
The dataset contains attributes such as Lead Source, Total Time Spent on Website, Total Visits, and Last Activity. Each of these may or may not help predict whether a lead will convert.
The most important variable is the column 'Converted.' It tells whether a past lead was converted to a customer or not.
Our goal: the company wants to identify the most promising leads, also known as "Hot Leads."
If they successfully identify this set of leads, the lead conversion rate should go up, as the sales team will focus on communicating with the most promising leads rather than calling everyone.
In a few mouse clicks, we imported and parsed a CSV file that we previously downloaded from Kaggle.
We can browse through our dataset rows, filter, or search on the View Data tab. We have 37 columns and 9240 rows.
Every uploaded dataset in Graphite has a practical Summary tab. At a glance, it lets you check the distributions of numeric columns, the number of null values, and various statistical measures.
As part of quick exploratory data analysis (EDA), it is always good to check correlations (ready for you on the Correlation tab) in the dataset to understand and "feel" the data better.
Predicting lead conversion is a great use case for binary machine learning classification: binary, because the target variable we will train the model on can take only two states, '0 - not converted' and '1 - converted'.
Now we have our dataset uploaded, and we are ready to create a no-code machine learning model in Graphite. We chose the Binary Classification model.
In Graphite, to build a binary classification model, you need
In just a few mouse clicks, we will define a model Scenario.
Our Target column from our dataset:
We will select all other columns as features.
Note how Graphite immediately excluded columns that are not appropriate for modeling. Examples:
We will leave all other options on default and run this scenario.
Graphite will take care of several preprocessing steps to achieve the best results, so you don't have to think about them. All these preprocessing steps will occur automatically:
Graphite will take a sample of 80% of our data and train several machine learning models. Then, it will test those models on the remaining 20% and calculate relevant model scores. The final best model fit, results, and predictions will be available on the Results tab.
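This train-and-compare loop is something you could sketch by hand. Below is a minimal scikit-learn sketch of the same 80/20 workflow. Everything here is a stand-in: the synthetic data, the two candidate models, and the scoring choice are assumptions for illustration, since Graphite's internals and the exact preprocessing are not exposed.

```python
# A hand-rolled sketch of the 80/20 train-and-compare workflow that
# Graphite automates. Synthetic data stands in for the Kaggle CSV.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Stand-in for the lead dataset: 9240 rows, binary 'Converted' target.
X, y = make_classification(n_samples=9240, n_features=20, random_state=42)

# 80% of the data for training, the remaining 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a few candidate models and score each on the held-out 20%.
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}
scores = {
    name: f1_score(y_test, model.fit(X_train, y_train).predict(X_test))
    for name, model in candidates.items()
}

# Keep the model with the best F1, mirroring how Graphite picks a winner.
best = max(scores, key=scores.get)
print(best, round(scores[best], 4))
```

Note that 20% of 9240 rows is exactly the 1848-row test set referenced in the results below.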
After about 30 seconds, we have our results.
Graphite runs several Machine learning algorithms that work best with binary classification problems by using
The total training time was 36.46 seconds.
The best model based on the F1 value is Light Gradient Boosting Machine. Other models' training metrics are listed below.
The Model Fit tab shows how well the model performs. For the 1848 rows in the test dataset, we compare the model's predictions for the column 'Converted' to the known historical outcomes of that column. The model fit is better when the historical and predicted bars are closer together.
Confusion Matrix reveals classification errors.
It makes it easy to see whether the model confuses the two classes: for each class, it summarizes the number of correct and incorrect predictions. The model predicted the column 'Converted' for the 1848-row test dataset, and the predicted outcomes were compared to the historical outcomes.
Correct predictions: 1762 out of 1848 test rows, which gives a Model Accuracy of 95.35%.
True Positives (TP) = 634: a row was 1 and the model predicted a 1 class for it.
True Negatives (TN) = 1128: a row was 0 and the model predicted a 0 class for it.
Incorrect predictions: 86 out of 1848 test rows, or 4.65%.
False Positives (FP) = 35: a row was 0 and the model predicted a 1 class for it.
False Negatives (FN) = 51: a row was 1 and the model predicted a 0 class for it.
Other Model Scores
Please note the naming convention: Positive and Negative refer to the model's prediction, while True and False indicate whether that prediction matched the actual value.
Accuracy, (TP + TN) / TOTAL.
Of all the rows (positive and negative), we predicted 95.35% correctly.
Accuracy should be as high as possible.
Precision, TP / (TP + FP).
Of all the rows we predicted as positive, 94.77% are actually positive.
Precision should be as high as possible.
Recall, TP / (TP + FN).
Of all the actually positive rows, we predicted 92.55% correctly.
Recall should be as high as possible.
F1 score, 2 * (Precision * Recall)/(Precision + Recall).
The F1 score is 93.65%. It measures Precision and Recall at the same time: you cannot have a high F1 score without a strong model underneath.
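All four scores follow directly from the confusion-matrix counts above. The short snippet below recomputes each one from the reported TP, TN, FP, and FN values, so you can verify the percentages yourself:

```python
# Recompute the model scores from the confusion-matrix counts
# reported for the 1848-row test set.
TP, TN, FP, FN = 634, 1128, 35, 51
total = TP + TN + FP + FN                        # 1848 test rows

accuracy = (TP + TN) / total                     # correct / all
precision = TP / (TP + FP)                       # of predicted positives
recall = TP / (TP + FN)                          # of actual positives
f1 = 2 * (precision * recall) / (precision + recall)

print(f"Accuracy:  {accuracy:.2%}")   # 95.35%
print(f"Precision: {precision:.2%}")  # 94.77%
print(f"Recall:    {recall:.2%}")     # 92.55%
print(f"F1 score:  {f1:.2%}")         # 93.65%
```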
Feature importance refers to how much the model relies on each column (feature) to make accurate predictions: the more a model relies on a column, the more important that column is to the model. Graphite uses permutation feature importance for this calculation.
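The idea behind permutation importance is simple: shuffle one feature at a time and measure how much the model's test score drops. Here is a sketch using scikit-learn's implementation on synthetic data; Graphite's exact procedure and parameters are not documented here, so treat this as an illustration of the technique rather than what Graphite runs:

```python
# Permutation feature importance: shuffle one column at a time and
# record the mean drop in the held-out score across repeats.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# importances_mean[i] = average score drop when feature i is shuffled.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
ranking = np.argsort(result.importances_mean)[::-1]
print("Features ranked by importance:", ranking)
```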
The most important feature is column "Tags", then "Last Notable Activity", "Total Time Spent on Website", and so on.
In Graphite, it is very easy to examine a column like "Tags" with respect to our target column ("Converted"). Most of the leads that converted have the tag value "Will revert after reading the email":
Graphite automatically deployed the trained model. That makes it straightforward to predict, for new, unseen leads, whether they will convert and the probability of that outcome.
Imagine that after you trained the Lead Scoring Model with Graphite, your marketing team informs you about their new lead.
You can check whether that lead will convert and the probability. A powerful tool to keep you focused only on high-quality leads!
For this particular new lead, Graphite predicted that it would convert (Converted = 1), with a 97% probability of that outcome.
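Scoring a new lead in code would look roughly like the sketch below. This is a generic scikit-learn stand-in with made-up feature values, not Graphite's API: the deployed model returns both the predicted class and the probability behind it.

```python
# Score a single new lead: predict the class and the probability
# of conversion. The feature values here are made up for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=1)
model = GradientBoostingClassifier(random_state=1).fit(X, y)

new_lead = X[:1]                                 # stand-in for a fresh lead
predicted = model.predict(new_lead)[0]           # 0 or 1
prob = model.predict_proba(new_lead)[0][1]       # P(Converted = 1)
print(predicted, round(prob, 2))
```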
I hope this helped you understand how easy it is to train models in a no-code machine learning software like Graphite Note. With just a few mouse clicks, we could predict the lead conversion.
I hope you guys enjoyed it!