You can segment data by clustering the data. A cluster is a group of
observations that
are similar in some way that is suggested by the data. SAS Visual
Statistics includes a cluster visualization that automatically segments
the data based on the properties and variables that you specify. After
clustering the data, you can derive a cluster ID variable that specifies
which cluster each observation belongs to. This cluster ID variable
is then used in your models. In the following example, you cluster the
data on three vehicle measures: cylinders, engine size, and horsepower.
- Click to create a new visualization.
- Click to specify that this visualization is a cluster.
- Drag and drop the variables Vehicle Cylinders, Vehicle EngineSize (l),
  and Vehicle Horsepower onto the visualization. By default, five
  clusters are created.
- Click in the visualization title bar, and select Derive a Cluster ID
  Variable. Enter Vehicle Clusters in the Name field. Click OK.
This new cluster ID
variable contains the cluster assignment for each observation in the
data. Observations with missing values are assigned to their own cluster.
You will use this variable in the models that you create in the next
sections.
Note: Although five clusters are created, this variable contains six
measurement levels (distinct values), because an additional measurement
level is created for observations with missing values.
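
To make the six measurement levels concrete, the following Python sketch derives a cluster ID outside of SAS. It is not how SAS Visual Statistics is implemented: it assumes a plain k-means-style procedure with unscaled inputs, and the function names (nearest, assign_cluster_ids) and the vehicle rows are hypothetical. Each row is (cylinders, engine size, horsepower); any row that contains a missing value (None) is given the extra ID 5 instead of being clustered.

```python
import math

def nearest(row, centroids):
    """Index of the centroid closest to row (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(row, centroids[i]))

def assign_cluster_ids(rows, k=5, iterations=20):
    """Return one cluster ID per row; rows with a missing value get ID k."""
    # Only complete rows take part in the clustering itself.
    complete = sorted(r for r in rows if None not in r)
    if len(complete) < k:
        raise ValueError("need at least k complete rows")
    # Deterministic start (an assumption): evenly spaced rows of the sorted data.
    step = len(complete) / k
    centroids = [complete[int(i * step)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for r in complete:
            groups[nearest(r, centroids)].append(r)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return [k if None in r else nearest(r, centroids) for r in rows]

# Hypothetical data: (cylinders, engine size in liters, horsepower).
vehicles = [
    (4, 1.6, 100), (4, 1.8, 105),
    (4, 2.4, 150), (4, 2.5, 155),
    (6, 3.0, 200), (6, 3.2, 205),
    (8, 4.6, 300), (8, 5.0, 305),
    (12, 6.0, 450), (12, 6.5, 460),
    (6, None, 210),  # engine size is missing
]

ids = assign_cluster_ids(vehicles)
print(sorted(set(ids)))  # five cluster IDs plus one extra level for the incomplete row
```

Because the inputs are not standardized in this sketch, horsepower dominates the distance calculation. A real analysis would normally scale the measures first.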
You can also use the decision tree visualization to segment the data.
After creating a decision tree, you can derive a leaf ID variable that
contains the leaf assignment information for each observation.
The cluster ID variable and the leaf ID variable can be used in
subsequent visualizations as either an effect or a group-by variable.
Both variables persist even if you delete the visualization that
created them.
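
As an illustration of a leaf ID used as a group-by variable, the following Python sketch assigns each observation to a leaf of a small hand-written tree and then averages a measure within each leaf. Everything here is an assumption made for the example: the tree, its split thresholds, the mpg column, and the data are hypothetical, and the code does not reflect how SAS Visual Statistics builds or stores a decision tree.

```python
from collections import defaultdict

# Hypothetical tree. An internal node is (column, threshold, left, right);
# a leaf is an integer leaf ID.
TREE = ("horsepower", 200, ("cylinders", 4, 0, 1), 2)

def leaf_id(row, node=TREE):
    """Walk the tree until a leaf is reached and return its ID."""
    while not isinstance(node, int):
        column, threshold, left, right = node
        node = left if row[column] <= threshold else right
    return node

# Hypothetical observations.
vehicles = [
    {"cylinders": 4, "horsepower": 110, "mpg": 34},
    {"cylinders": 6, "horsepower": 180, "mpg": 25},
    {"cylinders": 8, "horsepower": 300, "mpg": 17},
    {"cylinders": 4, "horsepower": 140, "mpg": 30},
]

# Derive the leaf ID variable: one value per observation.
for v in vehicles:
    v["leaf_id"] = leaf_id(v)

# Use the derived variable as a group-by variable: mean mpg per leaf.
by_leaf = defaultdict(list)
for v in vehicles:
    by_leaf[v["leaf_id"]].append(v["mpg"])
mean_mpg = {leaf: sum(vals) / len(vals) for leaf, vals in by_leaf.items()}
print(mean_mpg)
```

The derived column lives on the observations themselves, not on the tree, which mirrors the point that the variable remains available after the visualization that produced it is deleted.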