A large data set often contains several heterogeneous subgroups.
In that case, it can be beneficial to segment the data into these subgroups
and create a separate model for each subgroup. Sometimes your data set
already contains a category variable that is suitable for segmentation.
If no predefined segmentation variable exists, you can derive
segmentation information from a decision tree or a cluster analysis. This example
shows both cases.
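When no predefined segmentation variable exists, cluster membership can serve as one. The following is a minimal sketch of that idea in Python, assuming numeric input features and a user-chosen number of segments k. It uses plain k-means seeded with the first k rows, a simplification chosen for readability; it is not necessarily the algorithm or the settings that the product's cluster object uses. The function name `kmeans_segments` and the sample points are hypothetical.

```python
from math import dist


def kmeans_segments(points, k, iterations=20):
    """Assign each point a segment label 0..k-1 using plain k-means.

    Assumption: centroids are seeded with the first k points. Real tools
    typically use smarter or randomized seeding.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical two-feature data with two obvious groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.9, 8.2), (0.9, 1.1), (8.1, 7.8)]
print(kmeans_segments(points, k=2))  # → [0, 1, 0, 1, 0, 1]
```

The returned labels can then play the role of a category variable: each label identifies one subgroup, and a separate model is fitted to the rows that carry it.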
In the Data pane,
find the category variable Vehicle Type.
Notice that this variable contains three distinct values. If you visualize
this variable, you can see that most vehicles are classified as cars,
some are classified as trucks, and a smaller portion are classified
as both a car and a truck. You use the Vehicle Type variable
as the segmentation variable for the linear regression model and the
generalized linear model (GLM) that you create in the next section.
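To make the role of a segmentation variable concrete, the following minimal Python sketch fits one separate simple linear regression for each distinct Vehicle Type value. The column names (`vehicle_type`, `x`, `y`), the category labels, and the toy rows are hypothetical; the real data set, its variables, and the model settings in the next section may differ. Only ordinary least squares with a single predictor is shown. A GLM would replace `fit_line` with a different fitting routine.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor).

    Assumes the segment has at least two distinct x values; otherwise the
    slope is undefined and this raises ZeroDivisionError.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_by_segment(rows, segment_key, x_key, y_key):
    """Fit one separate linear regression per distinct value of segment_key."""
    groups = defaultdict(lambda: ([], []))
    for row in rows:
        xs, ys = groups[row[segment_key]]
        xs.append(row[x_key])
        ys.append(row[y_key])
    return {segment: fit_line(xs, ys) for segment, (xs, ys) in groups.items()}


# Hypothetical toy rows; the three labels stand in for the three Vehicle Type values.
rows = [
    {"vehicle_type": "Car", "x": 1.0, "y": 3.0},
    {"vehicle_type": "Car", "x": 2.0, "y": 4.0},
    {"vehicle_type": "Car", "x": 3.0, "y": 5.0},
    {"vehicle_type": "Truck", "x": 1.0, "y": 13.0},
    {"vehicle_type": "Truck", "x": 2.0, "y": 16.0},
    {"vehicle_type": "Truck", "x": 3.0, "y": 19.0},
    {"vehicle_type": "Car/Truck", "x": 1.0, "y": 7.0},
    {"vehicle_type": "Car/Truck", "x": 3.0, "y": 11.0},
]
models = fit_by_segment(rows, "vehicle_type", "x", "y")
print(models["Truck"])  # → (10.0, 3.0), i.e. intercept 10, slope 3
```

Because each segment gets its own intercept and slope, subgroups that behave differently (here, trucks with a steeper slope than cars) are not forced to share one set of coefficients, which is the motivation for segmenting described above.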