How to Create a Lead Scoring Model with Graphite

Hrvoje Smolic
-
29/03/2022

Introduction

A lead is a potential customer interested in buying your products or services. In this era of advanced analytics and machine learning, every organization wants to make the process of identifying promising leads, out of a long list of people or companies that have shown some interest in its product or service, better and more efficient. This is where a lead scoring model comes in.

Image by the author: Lead Scoring model training and predictions process

Companies are in business to profit and can only profit if they provide the products or services their customers demand. Meeting these demands means that they can generate revenue to keep the business going and expand it. That is where generating sales leads comes in because if companies are not bringing in new customers, then they will not be able to grow and will begin to stagnate.

But is just generating leads enough?

Let's dive in. 

What is a lead scoring model?

If your team has many leads but not enough resources to pursue them all, you must prioritize your sales team's time and give them the best possible leads. That means the leads with the highest probability of converting.

A lead scoring model is simply a methodology in which we train machine learning models on historical data. In our example, the model will learn to classify leads into two classes: "will convert" and "will not convert." It will also learn what influences leads to convert.

For example, it may find that leads who had more than two phone calls are very likely to convert, but only if they also spent more than 10 minutes on your website.

I am sure you will agree that this knowledge is extremely powerful in the hands of any marketing and sales team.

Dataset

When you acquire the lead, it commonly includes information like:

  • name 
  • demographic
  • tags / comments
  • contact details of the lead
  • Source of origin 
  • Time spent on the website
  • the number of clicks
  • number of emails sent
  • number of phone calls/demos
  • ...

For this post, we are using a popular Lead Conversion dataset from Kaggle. It contains 9240 past leads with 37 columns.

The dataset consists of various attributes such as Lead Source, Total Time Spent on Website, Total Visits, or Last Activity. These may or may not help decide whether a lead will be converted. 

The most important variable is the column 'Converted.' It tells whether a past lead was converted to a customer or not.  

  •  '1' means it was converted 
  •  '0' means it wasn't converted.

Our goal: the company wants to identify its most promising leads, also known as 'Hot Leads.'

If it identifies this set of leads successfully, the lead conversion rate should go up, because the sales team will focus on communicating with promising leads rather than calling everyone.

Import dataset

In a few mouse clicks, we imported and parsed a CSV file that we previously downloaded from Kaggle.

Image by the author: lead scoring training dataset in Graphite

We can browse through our dataset rows, filter, or search on the View Data tab. We have 37 columns and 9240 rows.

Every uploaded dataset in Graphite has a practical Summary tab. It lets you check, at a glance, the distributions of numeric columns, the number of null values, and various statistical measures.

Image by the author: lead scoring training dataset summary in Graphite

As part of quick exploratory data analysis (EDA), it is always good to check correlations (ready for you on the Correlation tab) in the dataset to understand and "feel" the data better.

Image by the author: lead scoring training dataset correlations in Graphite

 

Binary Classification

Predicting lead conversion is a great use case for binary classification in machine learning. Binary, because the target variable we are training the model to predict can have only two states: '0 - not converted' and '1 - converted'.

Run the no-code machine learning model in Graphite

Now we have our dataset uploaded, and we are ready to create a no-code machine learning model in Graphite. We chose the Binary Classification model.

In Graphite, to build a binary classification model, you need

  • a binary target column (what are we predicting, with only two distinct states?)
  • a set of features (other columns that have an impact on the target column)

In just a few mouse clicks, we will define a model Scenario.

Our Target column from our dataset:

Image by the author: target column selection in Graphite

We will select all other columns as features. 

Image by the author: feature columns selection in Graphite

Note how Graphite immediately excluded columns that are not appropriate for modeling. Examples:

  • Prospect ID: it contains 9240 unique values. The column will not be used because it is a categorical value that contains more than 90% unique values.
  • Magazine: The column will not be used because it is a constant. It cannot influence the target variable.
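
The two exclusion rules in these examples can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Graphite's implementation: the function name `columns_to_exclude`, the `max_unique_ratio` parameter, and the toy rows are all assumptions made for this example.

```python
def columns_to_exclude(rows, categorical, max_unique_ratio=0.9):
    """Return {column: reason} for columns that look unsuitable as features.

    rows: list of dicts, one per lead.
    categorical: set of column names treated as categorical.
    """
    excluded = {}
    if not rows:
        return excluded
    for col in rows[0]:
        values = [row[col] for row in rows]
        n_unique = len(set(values))
        if n_unique == 1:
            # A constant column carries no information about the target.
            excluded[col] = "constant"
        elif col in categorical and n_unique / len(values) > max_unique_ratio:
            # Nearly every row has its own value, as with an ID column.
            excluded[col] = "high cardinality"
    return excluded


# Toy data with hypothetical values (not taken from the Kaggle dataset).
toy_rows = [
    {"Prospect ID": f"id-{i}", "Magazine": "No", "Lead Source": source}
    for i, source in enumerate(["Google", "Direct", "Google", "Referral"] * 5)
]
result = columns_to_exclude(toy_rows, categorical={"Prospect ID", "Lead Source"})
print(result)
```

In this toy example, "Prospect ID" is flagged because every value is unique, "Magazine" because it never changes, and "Lead Source" is kept.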

Binary classification model results

We will leave all other options on default and run this scenario.

Graphite will take care of several preprocessing steps to achieve the best results, so you don't have to think about them. All these preprocessing steps will occur automatically:

  • null values handling
  • missing values
  • One Hot Encoding
  • fix imbalance
  • normalization
  • constants
  • cardinality
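
To give a feel for two of these steps, here is a minimal sketch of one-hot encoding and normalization in plain Python. The post does not describe how Graphite implements them, so the helper names `one_hot` and `min_max`, the choice of min-max scaling, and the sample values are assumptions for illustration only.

```python
def one_hot(values):
    """Turn each categorical value into a dict of 0/1 indicator columns."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def min_max(values):
    """Rescale numbers to the [0, 1] range (one common way to normalize)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample values, not taken from the dataset.
sources = ["Google", "Direct", "Google"]
minutes = [0, 5, 20]

encoded = one_hot(sources)
scaled = min_max(minutes)
print(encoded[0])  # {'is_Direct': 0, 'is_Google': 1}
print(scaled)      # [0.0, 0.25, 1.0]
```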

Graphite will take a sample of 80% of our data and train several machine learning models. Then, it will test those models on the remaining 20% and calculate relevant model scores. The final best model fit, results, and predictions will be available on the Results tab.

After about 30 seconds, we have our results.

Graphite runs several machine learning algorithms suited to binary classification problems, using

  • 80% of the data (7392 rows) for training and
  • 20% (1848 rows) for a test dataset.
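
The row counts follow directly from the 80/20 split of 9240 rows. A shuffled split like this can be sketched as follows; how Graphite actually samples the rows is not described here, so the function name `train_test_split_indices` and the fixed seed are assumptions.

```python
import random


def train_test_split_indices(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices and split them into train and test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(9240)
print(len(train_idx), len(test_idx))  # 7392 1848
```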

The total training time was 36.46 seconds.

The best model based on the F1 value is Light Gradient Boosting Machine. Other models' training metrics are listed below.

Image by the author: lead scoring training results in Graphite

The Model Fit tab shows how well the model performs. For the 1848 rows in the test dataset, we compare the model's predictions of the column 'Converted' to the historical, known outcomes for that column. Model fit is better when the historical and predicted bars are closer.

Image by the author: lead scoring model fit in Graphite

Confusion Matrix

A confusion matrix reveals classification errors.
It makes it easy to see whether the model is confusing the two classes. For each class, it summarizes the number of correct and incorrect predictions. The model predicted the column 'Converted' for a test dataset of 1848 rows, and the predicted outcomes were compared to the historical outcomes.

Image by the author: lead scoring model accuracy in Graphite

Correct Predictions

1762 in total out of 1848 test rows, which gives a model accuracy of 95.35%.

True Positives (TP) = 634: a row was 1 and the model predicted a 1 class for it.

True Negatives (TN) = 1128: a row was 0 and the model predicted a 0 class for it.

Errors

86 in total out of 1848 test rows, 4.65%

False Positives (FP) = 35: a row was 0 and the model predicted a 1 class for it.

False Negatives (FN) = 51: a row was 1 and the model predicted a 0 class for it.
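
The four counts come from comparing each actual value with the corresponding prediction. A minimal sketch of that bookkeeping, using a made-up six-row example rather than the real test set (the function name `confusion_counts` is an assumption):

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP and FN for binary labels encoded as 0 and 1."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1  # converted, predicted converted
        elif a == 0 and p == 0:
            counts["TN"] += 1  # not converted, predicted not converted
        elif a == 0 and p == 1:
            counts["FP"] += 1  # not converted, but predicted converted
        else:
            counts["FN"] += 1  # converted, but predicted not converted
    return counts


# Hypothetical labels for six leads.
actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```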

Other Model Scores

Please note that Positive and Negative describe the predicted class, while True and False indicate whether that prediction matches the actual value.

Accuracy, (TP + TN) / TOTAL.

Of all the rows (positive and negative), we predicted 95.35% correctly.
Accuracy should be as high as possible.

Precision, TP / (TP + FP).

Of all the rows we predicted as positive, 94.77% are actually positive.
Precision should be as high as possible.

Recall, TP / (TP + FN).

Of all the rows that are actually positive, we predicted 92.55% correctly.
Recall should be as high as possible.

F1 score, 2 * (Precision * Recall)/(Precision + Recall).

The F1 score is 93.65%. It is the harmonic mean of Precision and Recall, so it measures both at the same time. You cannot have a high F1 score without a strong model underneath.
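
As a quick check, the scores can be recomputed from the four confusion matrix counts reported earlier, using exactly the formulas given in this section. The variable names are my own.

```python
# Counts from the confusion matrix on the 1848 test rows.
tp, tn, fp, fn = 634, 1128, 35, 51
total = tp + tn + fp + fn

accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * (precision * recall) / (precision + recall)

print(total)                     # 1848
print(round(accuracy * 100, 2))  # 95.35
print(round(precision * 100, 2)) # 94.77
print(round(recall * 100, 2))    # 92.55
print(round(f1 * 100, 2))        # 93.65
```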

Feature importance

Feature importance refers to how much this Model relies upon each column (feature) to make accurate predictions. The more a model relies on a column (feature) to make predictions, the more important it is for the Model. Graphite uses a permutation feature importance for this calculation.

Image by the author: lead scoring feature importance in Graphite
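
The idea behind permutation feature importance is simple: shuffle the values of one column, score the model again, and see how much the score drops. Here is a minimal sketch of that idea. It is not Graphite's code; the toy model, the column names `minutes_on_site` and `noise`, and the use of accuracy as the score are assumptions for illustration.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows for which the model's prediction equals the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, feature, n_repeats=10, seed=0):
    """Average drop in accuracy after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        # Copy each row, replacing only the shuffled feature.
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats


def toy_model(row):
    """Hypothetical model: predicts conversion from time on site only."""
    return int(row["minutes_on_site"] > 10)


rows = [
    {"minutes_on_site": m, "noise": n}
    for m, n in [(2, 5), (15, 1), (30, 7), (4, 3), (12, 9), (1, 2)]
]
labels = [toy_model(r) for r in rows]

imp_minutes = permutation_importance(toy_model, rows, labels, "minutes_on_site")
imp_noise = permutation_importance(toy_model, rows, labels, "noise")
print(imp_minutes, imp_noise)
```

Shuffling `noise` changes nothing because the toy model never looks at it, so its importance is zero, while shuffling `minutes_on_site` hurts accuracy.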

The most important feature is column "Tags", then "Last Notable Activity", "Total Time Spent on Website", and so on.

In Graphite, it is very easy to examine a column like "Tags" in relation to our target column ("Converted"). Most leads that converted have a tag value of "Will revert after reading the email":


Predictions for the new leads

Graphite automatically deployed the trained model. That means it is straightforward to predict, for new, unseen leads, whether they will convert and the probability of that outcome.

Imagine that after you trained the Lead Scoring Model with Graphite, your marketing team informs you about their new lead. 

You can check whether that lead will convert, and with what probability. A powerful tool to keep you focused only on high-quality leads!

Prediction in Graphite Note

For this particular new lead, Graphite predicted that it would convert (Converted = 1), with a 97% probability of such an outcome.
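
Once you have a probability for every new lead, turning that into a priority list for the sales team is straightforward. A minimal sketch, assuming the predictions are available as a list of records; the field names, the `rank_hot_leads` helper, and the 0.6 threshold are hypothetical and not part of Graphite:

```python
def rank_hot_leads(scored_leads, threshold=0.5):
    """Keep leads at or above the threshold, most likely to convert first."""
    hot = [lead for lead in scored_leads if lead["probability"] >= threshold]
    return sorted(hot, key=lambda lead: lead["probability"], reverse=True)


# Hypothetical predictions for three new leads.
scored = [
    {"lead": "C", "probability": 0.64},
    {"lead": "B", "probability": 0.12},
    {"lead": "A", "probability": 0.97},
]
hot_leads = rank_hot_leads(scored, threshold=0.6)
print([lead["lead"] for lead in hot_leads])  # ['A', 'C']
```

The threshold is a business decision: a lower value gives the sales team more leads to call, a higher one keeps them focused on the most likely conversions.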

Conclusion

I hope this helped you understand how easy it is to train models in no-code machine learning software such as Graphite Note. With just a few mouse clicks, we were able to predict lead conversion.

You can explore all other Graphite Models here. Feel free to train your machine learning model on any dataset with the same ease or schedule a demo if you need any help or have any questions.

I hope you guys enjoyed it!
