ABC Analysis Model - Model Results

Since ABC analysis divides items into three categories, let's analyze these categories in the Model Results. The results consist of three tabs: ABC Summary, Pareto Chart, and Details.

Graphite Note - ABC Summary Tab

On the ABC Summary Tab, you can see two pie charts: the first shows the percentage of items in each category, and the second shows the total value (revenue) of each category. In the picture above, 22.65% of the items belong to category A, and they represent 78.58% of the total value, meaning that most of the revenue comes from the items in category A!

Graphite Note - Pareto Chart Tab

ABC analysis, also called Pareto analysis, is based on the Pareto principle, which says that roughly 80% of the results (output) come from 20% of the efforts (input). The Pareto Chart combines a bar chart and a line graph: each bar represents an item/entity, the bars are sorted in descending order, and the height of each bar represents the value of the item/entity. The curved orange line represents the cumulative percentage of the total value, accumulated from the most valuable item onward.
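To make the chart's construction concrete, here is a minimal Python sketch of the data behind a Pareto chart. The items are sorted by value in descending order (the bars), and the running cumulative percentage of the total is attached to each one (the line). The product names and revenue figures are hypothetical, and this is not Graphite's implementation.

```python
def pareto_series(values):
    """Return (item, value, cumulative %) rows sorted by value, largest first."""
    ordered = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(value for _, value in ordered)
    running = 0.0
    rows = []
    for item, value in ordered:
        running += value  # running total of value so far
        rows.append((item, value, 100.0 * running / total))
    return rows


# Hypothetical revenue per product
revenue = {"P3": 120.0, "P1": 500.0, "P4": 80.0, "P2": 300.0}
for item, value, cum_pct in pareto_series(revenue):
    print(f"{item}: bar height {value:.0f}, cumulative line {cum_pct:.1f}%")
```

In this toy data, the top two products (half of the items) already reach 80% of the total revenue.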

There is a long list of benefits to including ABC analysis in your business, such as improved inventory optimization and forecasting, reduced storage expenses, and strategic pricing of products. With Graphite, all you have to do is upload your data, create the desired model, and explore the results. Enjoy! 🙂

ABC Analysis Model - Model Scenario

Companies often spend a lot of time managing items/entities that contribute little to the profit margin. Not every item/entity in your shop has equal value: some cost more, some are used more frequently, and some are both. This is where ABC analysis steps in, helping companies focus on the right items/entities.

ABC analysis is a classification method in which items/entities are divided into three categories: A, B, and C. Category A is typically the smallest category and consists of the most important items/entities ('the vital few'), while category C is the largest category and consists of the least valuable items/entities ('the trivial many').

This is the simplest model so far, i.e., only two columns are needed in your dataset. First, identify an ID column in your dataset, such as Product ID or name, SKU, etc.; the data will be grouped by the values of that column. Then select the numeric column (feature) that represents the value of each ID (for example, product/customer revenue or the number of units sold).
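The grouping and classification can be sketched in Python as below. This is an illustration under stated assumptions, not Graphite's algorithm. It uses the common convention of cutting category A at 80% and category B at 95% of cumulative value; the thresholds `a_cut` and `b_cut` are hypothetical parameters, and the product IDs and revenues are made up. `abc_summary` returns two numbers per category, the share of items and the share of total value, which are the quantities shown in the ABC Summary pie charts.

```python
from collections import defaultdict


def abc_classify(rows, a_cut=80.0, b_cut=95.0):
    """rows: (item_id, value) pairs. Returns ({item_id: class}, {item_id: total value})."""
    totals = defaultdict(float)
    for item_id, value in rows:  # group by the ID column, sum the numeric column
        totals[item_id] += value
    grand_total = sum(totals.values())
    classes, running = {}, 0.0
    # Walk items from most to least valuable, tracking cumulative % of total value.
    for item_id, value in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        running += value
        cum_pct = 100.0 * running / grand_total
        classes[item_id] = "A" if cum_pct <= a_cut else "B" if cum_pct <= b_cut else "C"
    return classes, dict(totals)


def abc_summary(classes, totals):
    """Per class: (% of items, % of total value), as in the two pie charts."""
    grand_total = sum(totals.values())
    summary = {}
    for cls in "ABC":
        members = [item for item, c in classes.items() if c == cls]
        summary[cls] = (100.0 * len(members) / len(classes),
                        100.0 * sum(totals[m] for m in members) / grand_total)
    return summary


# Hypothetical order lines: (Product ID, revenue)
sales = [("P1", 300.0), ("P2", 300.0), ("P1", 200.0), ("P3", 120.0), ("P4", 80.0)]
classes, totals = abc_classify(sales)
print(classes)
print(abc_summary(classes, totals))
```

One design choice to note: in this sketch, an item whose cumulative share crosses a cut-off lands in the lower category. Other tools may include the crossing item in the higher category, so category borders can differ slightly.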

And that's it! In the next post, we will go through the Model Results, so stay tuned! 🙂

General Segmentation Model - Model Results

Let's see how to interpret the results after we have run our model. The results consist of 4 tabs: Cluster Summary, By Cluster, By Numeric Value, and Details Tabs.

Graphite Note - average values of numeric values by cluster

The model divides your data into clusters, i.e., groups of objects in which objects in the same cluster are more similar to each other than to those in other clusters, so it is essential to compare the average values of the variables across all clusters. That's why the Cluster Summary Tab shows the differences between the clusters in a graph. For example, in the picture above, customers in Cluster0 have the highest average Spending Score, unlike the customers in Cluster4.

Wouldn't it be interesting to explore each cluster by numeric value, or each numeric value by cluster? That's why we have the By Cluster and By Numeric Value Tabs: each variable and cluster is analyzed by its minimum and maximum, first and third quartiles, etc. The devil is in the details, so be conscientious and pay attention to the small things. Last but not least, on the Details Tab, you can find a detailed table with all the relevant values used for the above results.
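Per-cluster statistics of the kind shown in the By Cluster and By Numeric Value Tabs can be computed with Python's standard `statistics` module, as in the sketch below. The cluster labels and Spending Score values are hypothetical. Graphite's exact quartile method is not documented here; `statistics.quantiles` defaults to the "exclusive" method.

```python
from statistics import mean, quantiles


def describe_by_cluster(records, feature):
    """records: dicts with a 'cluster' key; returns summary stats of `feature` per cluster."""
    groups = {}
    for rec in records:
        groups.setdefault(rec["cluster"], []).append(rec[feature])
    stats = {}
    for cluster, values in sorted(groups.items()):
        q1, median, q3 = quantiles(values, n=4)  # needs at least 2 values per cluster
        stats[cluster] = {"min": min(values), "q1": q1, "median": median,
                          "q3": q3, "max": max(values), "mean": mean(values)}
    return stats


# Hypothetical segmented customers
customers = (
    [{"cluster": "Cluster0", "spending_score": s} for s in (75, 80, 85, 90, 95)]
    + [{"cluster": "Cluster4", "spending_score": s} for s in (5, 10, 15, 20, 25)]
)
for cluster, row in describe_by_cluster(customers, "spending_score").items():
    print(cluster, row)
```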

With the right dataset and a few clicks, you will get results that can considerably help your business: general segmentation helps you create marketing and business strategies for each detected group. It's all up to you now: collect your data and start modeling. 🙂

General Segmentation Model - Model Scenario

With General Segmentation, you can find hidden similarities in your data, such as similarities in product prices, in the services provided, or in customers' purchasing histories. It's an unsupervised algorithm that segments the data into groups based on the similarity of the numeric variables.


So let's see how to run this model in Graphite. First, identify an ID column so that you can identify each customer or product within the groups. Then select the numeric columns (features) from your dataset on which the segmentation will be based.
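Graphite does not say which clustering algorithm it uses. As an illustration only, the sketch below implements plain k-means, one common unsupervised method, in Python. Each ID keeps its feature tuple, and the algorithm alternates between two steps: assigning IDs to the nearest centroid and moving each centroid to the mean of its members. The customer IDs, the feature values, and the first-k-points initialization are assumptions chosen to keep the example small and deterministic. Features on very different scales should be scaled first.

```python
import math


def kmeans(points, k, iterations=100):
    """points: {id: feature tuple}. Returns ({id: cluster index}, centroids)."""
    ids = list(points)
    centroids = [points[i] for i in ids[:k]]  # simple deterministic init: first k points
    labels = {}
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = {
            i: min(range(k), key=lambda c: math.dist(points[i], centroids[c]))
            for i in ids
        }
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [points[i] for i in ids if labels[i] == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical customers with two (already scaled) numeric features
customers = {
    "C1": (1.0, 1.0), "C2": (9.0, 9.0), "C3": (1.2, 0.8),
    "C4": (8.8, 9.1), "C5": (0.9, 1.1), "C6": (9.2, 8.9),
}
labels, centroids = kmeans(customers, k=2)
print(labels)
```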

Now we move on to the tricky part: data preprocessing! We rarely come across high-quality data, so for the model to give the best possible results, we must do some data cleaning and transformation. What should you do with missing values? You can either remove them or replace them with a suitable value, such as the mean or a predicted value. Scaling matters too. For example, suppose you have chosen Age and Height as numeric columns. The values of Age range between 10 and 80, while Height ranges between 100 and 210. The algorithm can give more importance to Height simply because its values are larger than those of Age. To prevent this, you can transform (scale) your data by either standardizing or normalizing it.
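The preprocessing steps mentioned above can be sketched with the Python standard library: mean imputation of missing values, standardization (z-scores), and normalization (min-max scaling to [0, 1]). The Age and Height values are hypothetical, and Graphite's own preprocessing options may differ.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace missing values (None) with the mean of the observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def standardize(values):
    """z-scores: mean 0, (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def normalize(values):
    """Min-max scaling to [0, 1]; assumes at least two distinct values."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical columns: Age has one missing value; Height uses a larger range
age = impute_mean([10, 35, None, 80])
height = [100, 170, 185, 210]
print(normalize(age), normalize(height))
print(standardize(age), standardize(height))
```

After either transformation, both columns live on comparable scales, so neither dominates a distance calculation just because of its units.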

Finally, you need to determine the number of groups you want. If you are not sure, Graphite will try to determine the best number of groups. But what about the model results? More about that in the next post! 🙂

New vs Returning Customers Model - Model Results

The model results consist of 5 tabs: New vs Returning, Retention %, Revenue New vs Returning, Number of orders New vs Returning, and Detail Tab (in case you are not familiar with the basics of the model, take a look at this link).

Graphite Note - New vs Returning Customers Tab

Depending on the aggregation level, the New vs Returning Tab shows the number of distinct customers detected in each time period, split into new and returning. For example, in December 2020, there were a total of 2.88K customers, of which 1.84K were new and 1.05K were returning.

Graphite Note - Retention % Tab

If you are interested in retention (the percentage of returning customers) over time, switch to the Retention % Tab. The results in the Revenue New vs Returning Tab depend on the Model Scenario: if you selected a monetary variable in the Model Scenario, you can observe how it behaves for new and returning customers.

Graphite Note - No. of orders New vs Returning Tab

In the Model Scenario, you had to select the Order ID variable. That was necessary so that you can track the number of orders over time for your customers in the Number of orders New vs Returning Tab. For example, we can see that returning customers are consistent throughout the period and place more orders each month than new customers. Last but not least, on the Detail Tab, you can find a detailed table with all the relevant values used for the above results.

New vs Returning Customers Model - Model Scenario

In this report, we want to divide customers into new and returning customers (the most fundamental type of customer segmentation). New customers have made only one purchase from your business, while returning customers have made more than one. Let's go through their basic characteristics.

New customers are

while returning customers are

Let's go through the New vs Returning customer analysis inside Graphite. The dataset on which you run your model must contain a time-related column. Since the dataset covers a certain period of time, it's important to choose the aggregation level. For example, if weekly aggregation is selected, Graphite will generate a new vs returning customers dataset with a weekly frequency. The dataset must also contain Customer ID and Order ID columns. Optionally, you can choose a Monetary (amount spent) variable.
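Graphite does not spell out its exact rule for labeling a customer new or returning within each period. The Python sketch below assumes a customer counts as new in the period of their first order and as returning in any later period in which they order, which is a per-period reading of the definition above. For each group, it counts customers, distinct Order IDs, and the optional Monetary amount. It then computes retention % as returning customers divided by all customers in the period. All data is hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly(d):
    return (d.year, d.month)


def new_vs_returning(orders, period=monthly):
    """orders: (order_id, customer_id, order_date, amount) tuples."""
    # Period of each customer's first order (assumption: that is when they are "new").
    first_period = {}
    for _, customer, day, _ in sorted(orders, key=lambda o: o[2]):
        first_period.setdefault(customer, period(day))

    groups = defaultdict(lambda: {kind: {"customers": set(), "orders": set(), "revenue": 0.0}
                                  for kind in ("new", "returning")})
    for order_id, customer, day, amount in orders:
        p = period(day)
        kind = "new" if first_period[customer] == p else "returning"
        g = groups[p][kind]
        g["customers"].add(customer)
        g["orders"].add(order_id)  # distinct Order IDs
        g["revenue"] += amount     # optional Monetary column

    report = {}
    for p in sorted(groups):
        new, ret = groups[p]["new"], groups[p]["returning"]
        n_new, n_ret = len(new["customers"]), len(ret["customers"])
        report[p] = {
            "new": n_new, "returning": n_ret,
            "retention_pct": 100.0 * n_ret / (n_new + n_ret),
            "orders_new": len(new["orders"]), "orders_returning": len(ret["orders"]),
            "revenue_new": new["revenue"], "revenue_returning": ret["revenue"],
        }
    return report


orders = [  # hypothetical
    ("O1", "C1", date(2020, 11, 3), 50.0),
    ("O2", "C2", date(2020, 11, 20), 30.0),
    ("O3", "C1", date(2020, 12, 1), 70.0),
    ("O4", "C3", date(2020, 12, 5), 20.0),
    ("O5", "C1", date(2020, 12, 18), 40.0),
]
for month, row in new_vs_returning(orders).items():
    print(month, row)
```

You can change the aggregation level by passing a different `period` function, for example `lambda d: d.isocalendar()[:2]` for ISO weeks.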

With Graphite, you can compare absolute figures and percentages and learn how many customers you are retaining on a daily, weekly, or monthly basis. In the next post, the focus will be on the Model Results. Stay tuned! 🙂

RFM Customer Segmentation Model - Model Results

Now that we know how to run RFM analysis in Graphite, let's go through the Model Results. The results consist of six tabs: Recency, Frequency, Monetary, RFM Analysis, RFM Matrix, and Details. All results are visualized, because a visual summary makes it easier to identify patterns than looking through thousands of rows.

According to the Recency factor, which is defined as the number of days since the last purchase, we divide customers into 5 groups:

In the Recency Tab, we observe the behavior of the above groups, such as the number of customers, average monetary, average frequency, and average recency per group.

As Frequency is defined as the total number of purchases, customers can buy

Monetary is defined as the amount of money the customer spent, so the customer can be a

In the Frequency and Monetary Tabs, you can track the same behavior for the related groups as in the Recency Tab.

RFM analysis ranks every customer in each of these three categories on a scale from 0 (worst) to 4 (best). After that, we assign an RFM score to each customer by concatenating their numbers for Recency, Frequency, and Monetary value. Depending on their RFM score, customers can be segregated into the following categories:

All information related to the above groups of customers, such as the number of customers, average monetary, average frequency, and average recency per group, can be found in the RFM Analysis Tab.
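The 0 to 4 ranking and concatenation described above can be sketched in Python as follows. How Graphite bins customers is not stated. This sketch assumes rank-based quintiles, where the best fifth gets 4 and the worst fifth gets 0. It treats a lower recency as better and breaks ties by input order. The customers and their (recency, frequency, monetary) values are hypothetical.

```python
def quintile_scores(values, higher_is_better=True):
    """Map {customer: metric} to 0 (worst) .. 4 (best) using rank-based quintiles."""
    # Sort worst -> best; ties keep input order (Python's sort is stable).
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ordered)
    return {customer: (5 * rank) // n for rank, customer in enumerate(ordered)}


def rfm_scores(rfm):
    """rfm: {customer: (recency_days, frequency, monetary)} -> {customer: 'RFM' string}."""
    r = quintile_scores({c: v[0] for c, v in rfm.items()}, higher_is_better=False)
    f = quintile_scores({c: v[1] for c, v in rfm.items()})
    m = quintile_scores({c: v[2] for c, v in rfm.items()})
    return {c: f"{r[c]}{f[c]}{m[c]}" for c in rfm}


# Hypothetical customers: (days since last purchase, number of purchases, amount spent)
rfm = {
    "C1": (5, 12, 900.0), "C2": (40, 6, 300.0), "C3": (90, 3, 120.0),
    "C4": (200, 1, 30.0), "C5": (15, 8, 450.0),
}
print(rfm_scores(rfm))
```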

The RFM Matrix Tab shows a matrix of the number of customers, monetary sum and average, average frequency, and average recency (broken down by Recency, Frequency, and Monetary segments). All the values from the first five tabs, and much more, can be found in table form on the Details Tab.

Collect your data and start exploring your customers' behavior: finding the right balance between focusing on new and existing customers leads to brand trust and loyalty. 🙂

RFM Customer Segmentation Model - Model Scenario

Wouldn't it be great to tailor your marketing strategy to the identified groups of customers? That way, you can target each group with personalized offers, increase profit, improve unit economics, etc.

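The RFM Customer Segmentation Model segments customers based on three key factors: Recency (the number of days since the last purchase), Frequency (the total number of purchases), and Monetary (the amount of money spent).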

Let's go through RFM analysis inside Graphite. The dataset on which you run your RFM Model must contain a time-related column, since this report studies customer behavior over a period of time. We need to distinguish all customers, so we need an identifier variable such as Customer ID. If you have data about Customer Names, great; if not, don't worry, just select the same column as in the Customer ID field. Finally, choose the numeric variable with regard to which we will observe customer behavior, called Monetary (amount spent).
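Given the three required columns (a date, a customer identifier, and the Monetary amount), the raw RFM values can be computed as in this Python sketch. It uses the definitions given in the Model Results post: Recency as days since the last purchase, Frequency as the number of purchases, and Monetary as the total amount spent. The `as_of` reference date and the sample orders are assumptions, and each row is treated as one purchase.

```python
from datetime import date


def rfm_metrics(orders, as_of):
    """orders: (customer_id, order_date, amount). Returns {customer: (recency, frequency, monetary)}."""
    last_purchase, frequency, monetary = {}, {}, {}
    for customer, day, amount in orders:
        last_purchase[customer] = max(last_purchase.get(customer, day), day)
        frequency[customer] = frequency.get(customer, 0) + 1      # one row = one purchase
        monetary[customer] = monetary.get(customer, 0.0) + amount  # total amount spent
    return {
        c: ((as_of - last_purchase[c]).days, frequency[c], monetary[c])
        for c in last_purchase
    }


orders = [  # hypothetical
    ("C1", date(2021, 1, 10), 100.0),
    ("C1", date(2021, 3, 1), 50.0),
    ("C2", date(2020, 12, 5), 20.0),
]
print(rfm_metrics(orders, as_of=date(2021, 3, 31)))
```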

That's it, you are ready to run your first RFM Model. With the RFM Model, you can develop, sustain, or improve customer relationships, test your product pricing, and strengthen your profitability or customer retention. In the next post, we will talk more about the model results, so stay tuned. 🙂


Time-series Forecast Model - Remove data points

Last but not least, besides adding country holidays and special events to your model, you can also delete some data points from your dataset.

In the second example in the previous post, we handled the beginning of June 2020, the start of a promotion, as a special event. But if we don't know anything about that period, we run into a problem: the model carries the influence of that period over to the same dates in the past and in the future.

In Graphite, all you have to do is enter the start and end time of the period you want to remove. If the period lasts just one day, the start and end dates should be the same. You can also remove multiple periods. It is important to give the model as much information as possible to get the most accurate prediction. To do that, you have to know your data: what type of data you are managing, which dates you need to pay attention to, which days are outliers, etc.
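Conceptually, removing data points means filtering out every observation whose date falls inside one of the entered periods, with a single day expressed as start = end. Here is a minimal Python sketch with hypothetical dates and values. How Graphite treats the resulting gaps internally is not described here.

```python
from datetime import date


def remove_periods(series, periods):
    """series: (date, value) pairs; periods: inclusive (start, end) date pairs to drop."""
    def excluded(day):
        return any(start <= day <= end for start, end in periods)

    return [(day, value) for day, value in series if not excluded(day)]


series = [  # hypothetical daily sales
    (date(2020, 6, 1), 120.0), (date(2020, 6, 2), 950.0),
    (date(2020, 6, 3), 900.0), (date(2020, 6, 4), 130.0),
    (date(2020, 12, 25), 10.0),
]
periods = [
    (date(2020, 6, 2), date(2020, 6, 3)),      # multi-day period
    (date(2020, 12, 25), date(2020, 12, 25)),  # single day: start == end
]
print(remove_periods(series, periods))
```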

Time-series Forecast Model - Country holidays and special dates

Besides the target prediction limitation, we added two new parameters related to country holidays and special dates. Make yourself comfortable, because we will go through these parameters, which can significantly improve model accuracy.

Sometimes you may notice large deviations on certain days, either in the data or in the model results. For example, stores record more customers around holidays than during the rest of the year, but the model gives too much importance to those days, so the predicted values are much higher than expected. If the model is "informed" about these holidays, we get much better results, because the holiday effect is accounted for separately instead of distorting the rest of the forecast. In Graphite, we added a new parameter, Country Holidays: all you have to do is go to the advanced part of the Model Scenario and select the country or countries whose holidays you want to add. As you can see above, by adding Norway's holidays, we improved our evaluation metrics (MAPE, MAE, and RMSE are lower, R-squared is higher).
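For reference, the four evaluation metrics mentioned above can be computed as in this Python sketch, using their standard definitions. Graphite's exact variants are not documented here, for example how its MAPE handles zero actual values; this sketch simply skips zero actuals in MAPE. The actual and predicted values are hypothetical.

```python
import math


def evaluate(actual, predicted):
    """Standard MAE, RMSE, MAPE (%) and R-squared for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined for zero actuals; this sketch skips them (assumption).
    ratios = [abs(e / a) for e, a in zip(errors, actual) if a != 0]
    mape = 100.0 * sum(ratios) / len(ratios)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # assumes actuals are not all equal
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


actual = [100, 200, 300, 400]     # hypothetical observed values
predicted = [110, 190, 310, 390]  # hypothetical forecasts
print(evaluate(actual, predicted))
```

Lower MAE, RMSE, and MAPE and a higher R-squared indicate a better fit. That is how the comparison with and without Norway's holidays should be read.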

On the other hand, if you have various promotions or events during the year that affect your data, you can add them to the model. However, you need to know when these promotions or events will occur in the future. For instance, at the beginning of June 2020, you can see a huge jump in the data (that week, a larger share of products was on promotion across various stores, so the number of products sold was huge). The jump was so big that the model carried its influence over to the same dates in the past and in the future. But when that period is entered as a special event, much better results are obtained, as you can see above: the effect stays focused on the entered promotion days. To do that in Graphite, enter the name of the promotion, its start date, how many days it lasted (or will last), and all of its future dates.
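One way to represent a special event for a forecasting model is a 0/1 indicator per day, set during every occurrence of the event, past and future. The Python sketch below expands an event's start dates and duration into such a flag column. This is an assumption about how such parameters can be used, not a description of Graphite's internals. The event name, the dates, and the seven-day duration are hypothetical.

```python
from datetime import date, timedelta


def event_days(start_dates, duration_days):
    """All individual days covered by an event, given each occurrence's start date."""
    return {start + timedelta(days=offset)
            for start in start_dates for offset in range(duration_days)}


def event_flags(days, name, start_dates, duration_days):
    """0/1 indicator column: 1 on days the event is active, else 0."""
    active = event_days(start_dates, duration_days)
    return {name: [1 if day in active else 0 for day in days]}


# Hypothetical promotion: one past occurrence and one known future occurrence
starts = [date(2020, 6, 1), date(2021, 6, 7)]
calendar = [date(2020, 5, 31) + timedelta(days=i) for i in range(10)]
print(event_flags(calendar, "June promotion", starts, duration_days=7))
```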

By combining these two parameters, you can get much better results: the more information the model receives, the more accurate the prediction will be. There is only one new parameter left, removing data points, but we will go through it in the next post. 🙂