With General segmentation, you can uncover hidden similarities in your data, such as a relationship between the price of the products or services provided and the purchasing history of your customers. It's an unsupervised algorithm that splits the data into groups, based on how similar the records are across the numeric variables you choose.
So let's see how you can run this model in Graphite. First, you have to choose an ID column, so that you can identify each customer or product within the groups. After that, select the numeric columns (features) from your dataset on which the segmentation will be based.
Now we move to the tricky part, data preprocessing! We rarely come across high-quality data, so for the model to give the best possible results, we must do some data cleaning and transformation. What to do with the missing values? You can either remove the affected rows or replace the missing values with a suitable substitute, such as the column mean or a predicted value. Scaling matters too. For example, let's suppose you have chosen Age and Height as numeric columns. The values of Age range between 10 and 80, while the values of Height range between 100 and 210. The algorithm can give more importance to Height simply because its values are larger than those of Age. To avoid that, you can transform/scale your data by either standardizing it (typically rescaling each column to zero mean and unit standard deviation) or normalizing it (typically rescaling each column to a fixed range such as 0 to 1).
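To make the preprocessing options a bit more concrete, here is a minimal Python sketch of mean imputation, standardization and normalization. It is only an illustration of the general ideas, not how Graphite implements them: the helper names (`impute_mean`, `standardize`, `normalize`) and the sample Age/Height values are made up for this example.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation.
    Assumes the column is not constant, otherwise sigma would be 0."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def normalize(values):
    """Rescale linearly to the 0..1 range (min-max normalization).
    Assumes the column is not constant, otherwise hi - lo would be 0."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical sample data: Age has one missing value.
age = [10, 25, None, 60, 80]
height = [100, 150, 170, 180, 210]

age_filled = impute_mean(age)        # the None is replaced by the mean of the rest
age_norm = normalize(age_filled)     # now between 0 and 1
height_norm = normalize(height)      # now between 0 and 1, same scale as Age
age_std = standardize(age_filled)    # mean 0, standard deviation 1
```

After either transformation, Age and Height live on a comparable scale, so neither dominates the similarity calculation just because of its units.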
Finally, you need to decide how many groups you want to get. If you are not sure, Graphite will try to determine the best number of groups for you. But what about the model result? More about that in the next post! 🙂
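One footnote on picking the number of groups automatically: this post does not say how Graphite does it, so the sketch below only shows one common approach, under our own assumptions. It runs a small k-means clustering for several candidate group counts and keeps the count with the highest average silhouette score. The functions `kmeans` and `silhouette`, the farthest-point initialization and the sample points are all hypothetical choices made for illustration.

```python
from math import dist

def kmeans(points, k, iterations=50):
    """Tiny k-means. Uses deterministic farthest-point initialization
    (an assumption made here so the example is reproducible)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Move each center to the mean of its members (keep it if it has none).
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Average silhouette score; assumes at least two non-empty groups."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # usual convention for a single-member group
            continue
        a = sum(same) / len(same)  # mean distance within the point's own group
        b = min(                   # mean distance to the nearest other group
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical, already-scaled data with three visible groups.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (1.0, 1.0), (1.1, 1.0), (1.0, 1.1),
          (0.0, 1.0), (0.1, 1.0), (0.0, 1.1)]

# Try 2 to 5 groups and keep the count with the best silhouette score.
best_k = max(range(2, 6), key=lambda k: silhouette(points, kmeans(points, k)))
```

A higher silhouette score means points sit close to their own group and far from the others, which is why it is a popular yardstick when you do not know the number of groups in advance.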