Tips and tricks for your datasets

Before you run a model, get familiar with its essential parameters. That tells you what the dataset should look like, i.e. which columns it needs. Pay attention to the format of each variable as well.

Graphite Note - example of monthly data (each month represented by its first day)

To run the Timeseries Forecast Model, you select a time-related column and a target column. The target column must be numeric; it is the measurement you want to predict. The data can be daily, weekly, or monthly, but weekly and monthly data need a consistent date format. Each month must be represented by its first or last date. For weekly data, choose one day of the week to represent each week: if you choose Wednesday, the time-related column must contain the date of the Wednesday of each week.
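The date rules above can be checked before uploading a dataset. The following is a minimal sketch using only the Python standard library; the function names (`check_monthly`, `check_weekly`) are hypothetical helpers written for illustration and are not part of Graphite Note.

```python
import calendar
from datetime import date


def is_month_start(d: date) -> bool:
    # True when the date is the first day of its month.
    return d.day == 1


def is_month_end(d: date) -> bool:
    # monthrange returns (weekday of day 1, number of days in the month).
    return d.day == calendar.monthrange(d.year, d.month)[1]


def check_monthly(dates) -> bool:
    # Monthly data is consistent when every date is a month start,
    # or every date is a month end (not a mix of the two).
    return all(is_month_start(d) for d in dates) or all(
        is_month_end(d) for d in dates
    )


def check_weekly(dates, weekday: int) -> bool:
    # Weekly data is consistent when every date falls on the chosen
    # weekday. Python numbers weekdays Monday=0 ... Sunday=6.
    return all(d.weekday() == weekday for d in dates)


# Illustrative values only.
monthly = [date(2023, 1, 1), date(2023, 2, 1), date(2023, 3, 1)]
mixed = [date(2023, 1, 1), date(2023, 2, 28)]
wednesdays = [date(2023, 1, 4), date(2023, 1, 11), date(2023, 1, 18)]

print(check_monthly(monthly))       # all first-of-month dates
print(check_monthly(mixed))         # first-of-month mixed with end-of-month
print(check_weekly(wednesdays, 2))  # 2 = Wednesday
```

A check like this catches a mix of first-of-month and end-of-month dates, or a weekly series that drifts to a different weekday.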

The next models all relate to customer segmentation, so their parameters are similar. Unlike the Timeseries model, they also have optional parameters. The RFM Customer Segmentation Model requires three columns: a time-related column, a customer ID column, and a monetary column. Say your dataset contains sales data for a certain period, as a list of transactions for all customers. You can select the transaction date as the time-related column and the sales amount as the monetary column (the monetary column must be numeric). Each customer is identified by their ID, so the customer name is an optional parameter. The Customer Cohort Analysis Model has the same required parameters as the RFM Model, plus the transaction/order ID. The New vs Returning Customers Model is slightly different: the time-related, customer ID, and order ID columns are required, while the monetary column (such as sales amount) is optional.
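The required columns described above can be summarized as a small lookup table. This is a sketch for planning a dataset only: the model keys and role names (`time`, `target`, `customer_id`, `monetary`, `order_id`) are labels assumed for illustration, not identifiers from Graphite Note.

```python
# Required column roles per model, as described in the text above.
# Keys and role names are hypothetical labels for illustration.
REQUIRED_ROLES = {
    "timeseries_forecast": {"time", "target"},
    "rfm_segmentation": {"time", "customer_id", "monetary"},
    "cohort_analysis": {"time", "customer_id", "monetary", "order_id"},
    "new_vs_returning": {"time", "customer_id", "order_id"},
}


def missing_roles(model: str, mapping: dict) -> list:
    # mapping assigns a role to a column name in your dataset,
    # e.g. {"time": "order_date"}. Returns the required roles
    # that have no column assigned, in alphabetical order.
    return sorted(REQUIRED_ROLES[model] - set(mapping))


# Example: a sales dataset without an order ID column.
sales_columns = {
    "time": "transaction_date",
    "customer_id": "customer_id",
    "monetary": "sales_amount",
}

print(missing_roles("rfm_segmentation", sales_columns))
print(missing_roles("cohort_analysis", sales_columns))
```

With this mapping, the dataset has everything the RFM model needs but lacks the order ID that the Cohort Analysis and New vs Returning Customers models require.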

Of course, each dataset may have additional variables. New models are coming soon, and the existing ones will be upgraded. Stay tuned! 🙂

Dataset examples

What data do I need for modeling? Are data types important? What if I have too many columns? We have highlighted a few popular datasets so you can get to know Graphite better. After that, it's all up to you: collect your data and start having insights and fun!

Car Sales (source: GitHub) - The dataset contains monthly data on car sales from 1960 to 1968. It is a great fit for our Timeseries Forecast Model, which you can use to predict sales for the upcoming months.

Airline passengers (source: GitHub) - Every data scientist runs into this dataset at some point in their career. It's a great test example for various time series forecast models, including ours.

Daily admissions (source: - Machine Learning models can be applied in various fields. For example, this dataset contains daily admissions from a respiratory center. It can be used to predict the number of patients for future days with our Timeseries Forecast Model.

eCommerce orders example (source: - This is a demo CSV with orders for an imaginary eCommerce shop. You can use it for Timeseries forecasting, RFM model, General Segmentation, or New vs Returning Customers model in Graphite.

Mall Customers (source: Kaggle) - A demo Mall Customers dataset from Kaggle. Ideal for General Segmentation in Graphite.