Create a Segmentation Variable from a Cluster

You can segment data by clustering it. A cluster is a group of observations that are similar in some way that is suggested by the data. SAS Visual Statistics includes a cluster visualization that automatically segments the data based on the properties and variables that you specify. After clustering the data, you can derive a cluster ID variable that specifies which cluster each observation belongs to. You can then use this cluster ID variable in your models. In the following example, you cluster the data on several measures.

Click to create a new visualization.
Click to specify that this visualization is a cluster.
Drag and drop the variables Vehicle Cylinders, Vehicle EngineSize (l), and Vehicle Horsepower onto the visualization. By default, five clusters are created.
Click in the visualization title bar, and select Derive a Cluster ID Variable. Enter Vehicle Clusters in the Name field. Click OK.

This new cluster ID variable contains the cluster assignment for each observation in the data. Observations with missing values are assigned to their own cluster. You will use this variable in the models that you create in the next sections.

Note: Even though five clusters are created, this variable contains six measurement levels (distinct values), because an additional measurement level is created for observations with missing values.
Save the exploration.
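
The derivation described above happens inside the SAS Visual Statistics interface, so the following Python sketch is only an illustration of the idea, not the product's implementation. It assumes a basic k-means style procedure (the source does not state which algorithm is used), and the function name derive_cluster_ids, the vehicles list, and its values are hypothetical. It shows how five clusters plus one extra level for observations with missing values can lead to six possible values in the derived variable.

```python
import math

def derive_cluster_ids(rows, k=5, iterations=20):
    """Return one cluster ID per row; rows with a missing value get ID k."""
    # Only complete rows take part in the clustering itself.
    complete = [r for r in rows if None not in r]
    # Deterministic start (an assumption): spread initial centroids over the rows.
    step = max(1, len(complete) // k)
    centroids = [complete[min(i * step, len(complete) - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for r in complete:
            nearest = min(range(k), key=lambda c: math.dist(r, centroids[c]))
            groups[nearest].append(r)
        # Move each centroid to the mean of its members; keep it if it has none.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    ids = []
    for r in rows:
        if None in r:
            ids.append(k)  # extra level reserved for observations with missing values
        else:
            ids.append(min(range(k), key=lambda c: math.dist(r, centroids[c])))
    return ids

# Hypothetical rows: (cylinders, engine size in liters, horsepower).
vehicles = [
    (4, 1.6, 110), (4, 1.8, 130), (4, 2.0, 150),
    (6, 3.0, 220), (6, 3.5, 260), (8, 4.6, 300),
    (8, 5.7, 390), (12, 6.0, 500),
    (None, 2.4, 160), (6, None, 210),  # rows with missing values
]
vehicle_clusters = derive_cluster_ids(vehicles)
```

In this sketch the two incomplete rows receive the ID 5, while complete rows receive an ID from 0 through 4. The inputs are left unscaled for brevity, so horsepower dominates the distances; a real analysis would usually standardize the measures first.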

You can also use the decision tree visualization to segment the data. After creating a decision tree, you can derive a leaf ID variable that contains the leaf assignment information for each observation.

The cluster ID variable and the leaf ID variable can be used in subsequent visualizations as either an effect or a group by variable. The cluster ID and leaf ID variables persist even if you delete the visualization that created them.
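
As a rough illustration of using a derived ID as a group by variable, the following Python sketch summarizes a measure within each segment. It is not SAS code, and the records list, its values, and the mean_by_group helper are hypothetical names introduced only for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: (cluster ID, value of some measure) pairs.
records = [(0, 34.0), (0, 36.0), (1, 28.0), (1, 26.0), (2, 19.0)]

def mean_by_group(pairs):
    """Average the measure separately for each group ID."""
    groups = defaultdict(list)
    for group_id, value in pairs:
        groups[group_id].append(value)
    return {g: mean(values) for g, values in sorted(groups.items())}

summary = mean_by_group(records)
print(summary)
```

The same pattern would apply to a leaf ID variable, because both variables simply label each observation with the segment it belongs to.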

Last updated: August 16, 2017