How to Create a Leads Scoring Model: Definitive Guide

Hrvoje Smolic - 29/03/2022


A lead is a potential customer interested in buying your products or services. In this era of advanced analytics and machine learning, every organization wants to make the process of picking the best leads, out of a long list of people or companies that have shown some interest in its product or service, better and more efficient. This is where a leads scoring model comes into the picture.

Image by the author: Leads Scoring model training and predictions process

Companies are in business to profit, and they can only profit if they provide the products or services their customers demand. Meeting these demands generates the revenue that keeps the business going and lets it expand. That is where generating sales leads comes in: if companies are not bringing in new customers, they cannot grow and will begin to stagnate.

But is just generating leads enough?

Let's dive in. 

What is a leads scoring model?

A leads scoring model is a methodology in which we train machine learning models to learn from historical data. In our example, the model will learn to classify leads into two states - "will convert" and "will not convert." It will also learn what influences leads to convert. For example, it may find that leads who had more than two phone calls are very likely to convert, but only if they have spent more than 10 minutes on your website.

If your team has many leads but not enough resources to pursue them all, you must prioritize your sales team's time and give them the best possible leads. That means the leads with the highest probability of converting.

I am sure you will agree that this knowledge is extremely powerful in the hands of any marketing and sales team.
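The prioritization described above can be sketched in a few lines of Python. This is only an illustration: the `ScoredLead` type, the lead identifiers, and the probabilities are hypothetical, and the sketch assumes a trained model has already produced a conversion probability for each lead.

```python
from typing import NamedTuple


class ScoredLead(NamedTuple):
    lead_id: str        # hypothetical identifier
    probability: float  # model's estimated probability of conversion


def top_leads(leads: list[ScoredLead], capacity: int) -> list[ScoredLead]:
    """Return the `capacity` leads with the highest conversion probability."""
    ranked = sorted(leads, key=lambda lead: lead.probability, reverse=True)
    return ranked[:capacity]


# Made-up scores for four leads; the sales team can only pursue two.
scored = [
    ScoredLead("A", 0.12),
    ScoredLead("B", 0.97),
    ScoredLead("C", 0.55),
    ScoredLead("D", 0.81),
]
shortlist = top_leads(scored, capacity=2)
print([lead.lead_id for lead in shortlist])  # ['B', 'D']
```

The `capacity` parameter stands in for however many leads the team can realistically follow up on.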

Dataset for leads scoring model

When you acquire a lead, it commonly comes with information like:

  • name 
  • demographic
  • tags / comments
  • contact details of the lead
  • Source of origin 
  • Time spent on the website
  • the number of clicks
  • number of emails sent
  • number of phone calls/demos
  • ...

For this post, we are using a popular Lead Conversion dataset from Kaggle. It contains 9240 past leads with 37 columns.

The dataset consists of various attributes such as Lead Source, Total Time Spent on Website, Total Visits, or Last Activity. These may or may not help decide whether a lead will be converted. 

The most important variable is the column 'Converted.' It tells whether a past lead was converted to a customer or not.  

  •  '1' means it was converted 
  •  '0' means it wasn't converted.

This is a great example of a properly labeled dataset: the column 'Converted' records the known outcome for every past lead.
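As a rough illustration of what the label gives us, the snippet below computes the overall conversion rate from a handful of rows. The column names come from the dataset described above, but the rows and their values are made up for this sketch.

```python
# Hypothetical sample rows; only the column names come from the dataset.
past_leads = [
    {"Lead Source": "Google", "Total Time Spent on Website": 1532, "Converted": 1},
    {"Lead Source": "Direct Traffic", "Total Time Spent on Website": 0, "Converted": 0},
    {"Lead Source": "Google", "Total Time Spent on Website": 305, "Converted": 0},
    {"Lead Source": "Direct Traffic", "Total Time Spent on Website": 1428, "Converted": 1},
    {"Lead Source": "Google", "Total Time Spent on Website": 71, "Converted": 0},
]


def conversion_rate(rows):
    """Share of rows whose 'Converted' label is 1."""
    return sum(row["Converted"] for row in rows) / len(rows)


print(f"{conversion_rate(past_leads):.0%} of the sample leads converted")
```

Because every row carries a known outcome, a model can later be checked against it.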

Our goal: the company wants to identify its most promising leads, also known as 'Hot Leads.'

If it successfully identifies this set of leads, the lead conversion rate should go up, because the sales team can focus on communicating with the most promising leads rather than calling everyone.

Import dataset

In a few mouse clicks, we imported and parsed a CSV file that we previously downloaded from Kaggle.

Image by the author: leads scoring training dataset in Graphite

We can browse through our dataset rows, filter, or search on the View Data tab. We have 37 columns and 9240 rows.

Every uploaded dataset in Graphite has a practical Summary tab. It lets you check, at a glance, the distributions of numeric columns, the number of null values, and various statistical measures.

Image by the author: leads scoring training dataset summary in Graphite

As part of quick exploratory data analysis (EDA), it is always good to check correlations (ready for you on the Correlation tab) in the dataset to understand and "feel" the data better.

Image by the author: leads scoring training dataset correlations in Graphite
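For readers who want to see what a correlation check does, here is a minimal sketch of the Pearson correlation coefficient in plain Python. It is an assumption that this is the measure of interest; the post does not say which correlation measure Graphite's Correlation tab uses, and the sample values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Made-up values: seconds spent on the website and the 'Converted' label.
time_on_site = [30, 120, 600, 900, 1500]
converted = [0, 0, 1, 1, 1]
print(round(pearson(time_on_site, converted), 2))
```

A value close to 1 or -1 suggests a strong linear relationship with the target; a value near 0 suggests little linear relationship.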

 

Binary Classification in Machine Learning

Predicting lead conversion is a great use case for binary classification in machine learning. Binary, because the target variable we are training the model to predict can have only two states - '0 - not converted' and '1 - converted'.

Run the no-code machine learning model in Graphite

Now we have our dataset uploaded, and we are ready to create a no-code machine learning model in Graphite. We chose the Binary Classification model.

In Graphite, to build a binary classification model, you need

  • a binary target column (what are we predicting, with only two distinct states?)
  • a set of features (other columns that have an impact on the target column)

In just a few mouse clicks, we will define a model Scenario.

Our Target column from our dataset:

Image by the author: target column selection in Graphite

We will select all other columns as features. 

Image by the author: feature columns selection in Graphite

Note how Graphite immediately excluded columns that are not appropriate for modeling. Examples:

  • Prospect ID: it contains 9240 unique values, one per row. The column will not be used because it is a categorical column in which more than 90% of the values are unique.
  • Magazine: the column will not be used because it is a constant. It cannot influence the target variable.
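The two exclusion rules above can be approximated with a short Python function. This is a sketch under assumptions, not Graphite's implementation: the 90% threshold comes from the message quoted above, while the function name `columns_to_exclude`, the use of string type as a stand-in for "categorical", and the sample rows are all hypothetical.

```python
def columns_to_exclude(rows, max_unique_ratio=0.9):
    """Map each column that should be skipped to the reason for skipping it."""
    reasons = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        distinct = set(values)
        if len(distinct) == 1:
            # A constant column carries no information about the target.
            reasons[column] = "constant"
        elif isinstance(values[0], str) and len(distinct) / len(values) > max_unique_ratio:
            # Treat string columns as categorical (an assumption of this sketch).
            reasons[column] = "high cardinality"
    return reasons


# Made-up rows that mimic the two excluded columns.
sample = [
    {"Prospect ID": "id-001", "Lead Source": "Google", "Magazine": "No", "Converted": 1},
    {"Prospect ID": "id-002", "Lead Source": "Google", "Magazine": "No", "Converted": 0},
    {"Prospect ID": "id-003", "Lead Source": "Direct Traffic", "Magazine": "No", "Converted": 0},
    {"Prospect ID": "id-004", "Lead Source": "Google", "Magazine": "No", "Converted": 1},
]
print(columns_to_exclude(sample))
```

On this sample, 'Prospect ID' is flagged for high cardinality and 'Magazine' as a constant, while 'Lead Source' and 'Converted' are kept.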

Binary classification Model Results

We will leave all other options on default and run this scenario.

Graphite will take care of several preprocessing steps to achieve the best results, so you don't have to think about them. All these preprocessing steps will occur automatically:

  • null values handling
  • missing values
  • One Hot Encoding
  • fix imbalance
  • normalization
  • constants
  • cardinality
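To give a feel for two of these steps, here is a minimal sketch of one-hot encoding and normalization in plain Python. It is not Graphite's code: the post does not specify which normalization method is used, so min-max scaling is an assumption, and the helper names and sample values are hypothetical.

```python
def one_hot(values):
    """Turn a categorical column into one 0/1 indicator per category."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def min_max(values):
    """Rescale a numeric column to the range [0, 1] (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant column has no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


print(one_hot(["Google", "Direct Traffic", "Google"]))
print(min_max([0, 5, 10]))  # [0.0, 0.5, 1.0]
```

One-hot encoding lets algorithms that expect numbers work with categorical columns, and normalization puts numeric columns on a comparable scale.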

Graphite will take a sample of 80% of our data and train several machine learning models. Then, it will test those models on the remaining 20% and calculate relevant model scores. The final best model fit, results, and predictions will be available on the Results tab.

After about 30 seconds, we have our results.

Graphite ran several machine learning algorithms suited to binary classification problems, using

  • 80% of the data (7392 rows) for training and
  • 20% (1848 rows) for a test dataset.
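The 80/20 split can be sketched as follows. This assumes a simple random shuffle with a fixed seed; the post does not describe how Graphite actually samples the rows (it may, for example, stratify by the target), so treat the function below as illustrative only.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows and hold out `test_fraction` of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(9240))  # stand-in for the 9240 leads
train, test = train_test_split(rows)
print(len(train), len(test))  # 7392 1848
```

Holding out rows the model never saw during training is what makes the scores in the following sections meaningful.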

The total training time was 36.46 seconds.

The best model based on the F1 value is Light Gradient Boosting Machine. Other models' training metrics are listed below.

Image by the author: leads scoring training results in Graphite

The Model Fit tab shows how well the model performs. For the 1848 rows in the test dataset, we compare the model's predictions of the column 'Converted' to the historical, known outcomes for the column 'Converted.' Model fit is better when the historical and predicted bars are closer.

Image by the author: leads scoring model fit in Graphite

Confusion Matrix - How Did the Model Perform?

A confusion matrix reveals classification errors. It makes it easy to see whether the model is confusing two classes. For each class, it summarizes the number of correct and incorrect predictions. Here, the model predicted the column 'Converted' for a test dataset of 1848 rows, and the predicted outcomes were compared to the historical outcomes.

Image by the author: leads scoring model accuracy in Graphite

Correct Predictions

1762 in total out of 1848 test rows. This gives a model accuracy of 95.35%.

True Positives (TP) = 634: a row was 1 and the model predicted a 1 class for it.

True Negatives (TN) = 1128: a row was 0 and the model predicted a 0 class for it.

Errors

86 in total out of 1848 test rows, 4.65%

False Positives (FP) = 35: a row was 0 and the model predicted a 1 class for it.

False Negatives (FN) = 51: a row was 1 and the model predicted a 0 class for it.
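The four counts above follow directly from comparing each actual label with the predicted one. Here is a minimal sketch of that bookkeeping; the function name and the six example labels are made up for illustration.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP and FN for binary labels (1 = converted, 0 = not)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1  # was 1, predicted 1
        elif a == 0 and p == 0:
            counts["TN"] += 1  # was 0, predicted 0
        elif a == 0 and p == 1:
            counts["FP"] += 1  # was 0, predicted 1
        else:
            counts["FN"] += 1  # was 1, predicted 0
    return counts


actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
print(confusion_counts(actual, predicted))
# {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```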

Other Model Scores

Please note that Positive and Negative describe the predicted class, while True and False describe whether that prediction matches the actual value.

Accuracy, (TP + TN) / TOTAL.

Of all the test rows (positive and negative), we predicted 95.35% correctly.
Accuracy should be as high as possible.

Precision, TP / (TP + FP).

Of all the rows we predicted as positive, 94.77% are actually positive.
Precision should be as high as possible.

Recall, TP / (TP + FN).

Of all the rows that are actually positive, we predicted 92.55% correctly.
Recall should be as high as possible.

F1 score, 2 * (Precision * Recall)/(Precision + Recall).

The F1 score is 93.65%. It combines Recall and Precision into a single number (their harmonic mean), so you cannot have a high F1 score without a strong model underneath.
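As a quick check, the four scores can be recomputed from the confusion matrix counts reported above using the formulas in this section:

```python
# Counts taken from the confusion matrix above.
tp, tn, fp, fn = 634, 1128, 35, 51
total = tp + tn + fp + fn  # 1848 test rows

accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * (precision * recall) / (precision + recall)

print(f"Accuracy:  {accuracy:.2%}")   # 95.35%
print(f"Precision: {precision:.2%}")  # 94.77%
print(f"Recall:    {recall:.2%}")     # 92.55%
print(f"F1 score:  {f1:.2%}")         # 93.65%
```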

Feature importance

Feature importance refers to how much this Model relies upon each column (feature) to make accurate predictions. The more a model relies on a column (feature) to make predictions, the more important it is for the Model. Graphite uses a permutation feature importance for this calculation.

Image by the author: leads scoring feature importance in Graphite

The most important feature is column "Tags", then "Last Notable Activity", "Total Time Spent on Website", and so on.
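The idea behind permutation feature importance can be sketched in plain Python: shuffle one column, and measure how much the model's accuracy drops. This is a simplified illustration, not Graphite's implementation. The toy model, the feature names `minutes_on_site` and `favorite_color`, and the data are all hypothetical, and accuracy is used as the score only for simplicity.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction equals the label."""
    correct = sum(model(row) == label for row, label in zip(rows, labels))
    return correct / len(rows)


def permutation_importance(model, rows, labels, feature, n_repeats=5, seed=0):
    """Average drop in accuracy when the values of `feature` are shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in rows]
        rng.shuffle(column)
        permuted = [{**row, feature: value} for row, value in zip(rows, column)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats


def toy_model(row):
    # Hypothetical model: it only looks at time spent on the website.
    return 1 if row["minutes_on_site"] > 10 else 0


minutes = [2, 4, 6, 8, 9, 12, 15, 20, 25, 30]
rows = [{"minutes_on_site": m, "favorite_color": "blue" if m % 2 else "red"} for m in minutes]
labels = [toy_model(row) for row in rows]

for feature in ("minutes_on_site", "favorite_color"):
    print(feature, permutation_importance(toy_model, rows, labels, feature))
```

Shuffling a column the model ignores leaves accuracy unchanged, so its importance is zero, while shuffling a column the model relies on makes predictions worse.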

In Graphite, it is very easy to inspect a column such as "Tags" against our target column ("Converted"). Most of the leads that converted have a tag value of "Will revert after reading the email".

Predictions for the New Leads

Graphite automatically deploys the trained model. That means it is straightforward to predict, for new, unseen leads, whether they will convert or not, along with the probability of that outcome.

Imagine that your marketing team informs you about their new lead after you trained the Leads Scoring Model with Graphite. 

You can check whether that lead will convert, and with what probability. A powerful tool to keep you focused only on high-quality leads!

Prediction in Graphite Note

For this particular new lead, Graphite predicted that it would convert (Converted = 1), with a 97% probability of that outcome.
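The relationship between a predicted probability and the predicted class can be sketched like this. The 0.5 cut-off is a common default and an assumption here; the post does not state which threshold Graphite applies.

```python
def predicted_class(probability, threshold=0.5):
    """Map a conversion probability to the label 1 (convert) or 0 (not)."""
    return 1 if probability >= threshold else 0


print(predicted_class(0.97))  # 1
```

Raising the threshold makes the model more selective about which leads it flags as likely to convert.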


Conclusion

I hope this helped you understand how easy it is to train models in no-code machine learning software like Graphite Note. With just a few mouse clicks, we could predict lead conversion.

You can explore all other Graphite Models here. Feel free to train your machine learning model on any dataset with the same ease or schedule a demo if you need any help or have any questions.

I hope you guys enjoyed it!

Disclaimer

This blog post provides insights based on the current research and understanding of AI, machine learning and predictive analytics applications for companies.  Businesses should use this information as a guide and seek professional advice when developing and implementing new strategies.

Note

At Graphite Note, we are committed to providing our readers with accurate and up-to-date information. Our content is regularly reviewed and updated to reflect the latest advancements in the field of predictive analytics and AI.

Author Bio

Hrvoje Smolic, born in 1976 in Zagreb, Croatia, is the accomplished Founder and CEO of Graphite Note. He holds a Master's degree in Physics from the University of Zagreb. In 2010 Hrvoje founded Qualia, a company that created BusinessQ, an innovative SaaS data visualization software utilized by over 15,000 companies worldwide. Continuing his entrepreneurial journey, Hrvoje founded Graphite Note in 2020, a visionary company that seeks to redefine the business intelligence landscape by seamlessly integrating data analytics, predictive analytics algorithms, and effective human communication.

