Identify a Segmentation Variable

A large data set often contains several heterogeneous subgroups. In that case, it can be beneficial to segment the data into these subgroups and create a separate model for each one. Sometimes your data set already contains a category variable that is suitable for segmentation. If no predefined segmentation variable exists, you can derive segmentation information from a decision tree or a cluster. This example shows both cases.
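To illustrate the idea of fitting a separate model for each subgroup outside the application, here is a minimal Python sketch. It is not how the product fits its models. The function names (fit_line, fit_by_segment), the column names, and the data rows are all hypothetical, and a simple one-predictor least-squares line stands in for whatever model you would actually build.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_by_segment(rows, segment_key, x_key, y_key):
    """Split rows by the segmentation variable and fit one line per segment."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[segment_key]].append((row[x_key], row[y_key]))
    return {segment: fit_line(points) for segment, points in groups.items()}


# Hypothetical data: two segments with opposite trends.
rows = [
    {"segment": "A", "x": 1.0, "y": 3.0},
    {"segment": "A", "x": 2.0, "y": 5.0},
    {"segment": "A", "x": 3.0, "y": 7.0},
    {"segment": "B", "x": 1.0, "y": 10.0},
    {"segment": "B", "x": 2.0, "y": 9.0},
    {"segment": "B", "x": 3.0, "y": 8.0},
]

models = fit_by_segment(rows, "segment", "x", "y")
for segment, (intercept, slope) in sorted(models.items()):
    print(f"{segment}: y = {intercept:.2f} + {slope:.2f} * x")
```

In this made-up data the two segments trend in opposite directions, so a single line fitted to all six rows would describe neither group well. That is the situation in which segmenting is worth considering.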
In the Data pane, find the category variable Vehicle Type. Notice that this variable contains three distinct values. If you visualize this variable, you can see that most vehicles are classified as cars, some are classified as trucks, and a smaller portion are classified as both a car and a truck. In the next section, you use the Vehicle Type variable as the segmentation variable for the linear regression model and the generalized linear model (GLM) that you create.
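If you want to run the same kind of check on a category variable outside the application, a frequency count is enough. The following Python sketch uses hypothetical values and labels; the actual labels and counts in the example data set are not reproduced here.

```python
from collections import Counter

# Hypothetical values for a vehicle-type column; labels and counts are
# placeholders, not the contents of the example data set.
vehicle_types = ["Car"] * 6 + ["Truck"] * 3 + ["Car and Truck"] * 1

counts = Counter(vehicle_types)
total = sum(counts.values())

print(f"Distinct values: {len(counts)}")
for value, n in counts.most_common():
    print(f"{value}: {n} ({n / total:.0%})")
```

A category variable with a small number of distinct values, each with enough observations to support its own model, is generally a more practical segmentation candidate than one with many sparse levels.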
Last updated: August 16, 2017