In this section, you will examine segmented or clustered data using the Segment Profile node. A segment is identified by a cluster number that you derive analytically by using SAS Text Miner clustering techniques. The Segment Profile node enables you to get a better idea of what makes each segment unique or at least different from the population. The node generates various reports that aid in exploring and comparing the distribution of the distinguishing factors within the segments and the population. For more information about the Segment Profile node, see the SAS Enterprise Miner Help.
To examine data segments, complete the following steps:
-
Select the Assess tab on the node toolbar, and drag a Segment Profile node into the diagram workspace.
-
Connect the Text Cluster node to the Segment Profile node.
-
Select the Segment Profile node.
-
Click the ellipsis button for the Variables property. The Variables window appears.
-
Select all the “_prob” variables and set their Use value to No.
Note: To select all the “_prob” variables at once, hold down Shift, click the first “_prob” variable, and drag the pointer to the last one. With all of them selected, changing the Use value of any one selected variable applies the same value to the others.
-
Select all the “_SVD” variables and set their Use value to No.
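As an illustration only, the effect of the two preceding steps can be sketched in Python. This is not something the Segment Profile node runs. The variable names below are hypothetical, and matching the “_prob” and “_SVD” markers case-insensitively is an assumption made for the sketch.

```python
def split_by_use(variable_names, excluded_markers=("_prob", "_SVD")):
    """Split names into (kept, dropped); a name is dropped if it contains a marker.

    Matching is case-insensitive here, which is an assumption of this sketch.
    """
    kept, dropped = [], []
    for name in variable_names:
        lowered = name.lower()
        if any(marker.lower() in lowered for marker in excluded_markers):
            dropped.append(name)   # corresponds to Use = No
        else:
            kept.append(name)      # left at its existing Use value
    return kept, dropped


# Hypothetical variable list; real names depend on your project.
variables = [
    "TextCluster_prob1", "TextCluster_prob2",
    "TextCluster_SVD1", "TextCluster_SVD2",
    "category", "doc_length",
]
kept, dropped = split_by_use(variables)
print("Kept:   ", kept)
print("Dropped:", dropped)
```

Listing the dropped names this way can serve as a quick check that nothing other than the “_prob” and “_SVD” variables was set to No.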
-
Select the Segment Profile node in the diagram workspace.
-
Enter 0.0010 as the value for the Minimum Worth property.
-
Right-click the Segment Profile node, and select Run.
-
Click Yes in the Confirmation dialog box when you are asked whether you want to run the path.
-
After the node finishes running, click Results in the Run Status dialog box.
-
Maximize the Profile window.
The following shows a portion of this window.
The Profile window displays a lattice, or grid, of plots that compare the distribution of the identified variables and report variables in each segment with their distribution in the population. The graphs shown in this window illustrate variables that have been identified as factors that distinguish the segment from the population that it represents. Each row represents a single segment. The far-left margin identifies the segment, its count, and its percentage of the total population.
The columns are organized from left to right according to their ability to discriminate that segment from the population. Report variables, if specified, appear on the right in alphabetical order after the selected inputs. The lattice graph has the following features:
-
Class variable: displayed as two nested pie charts that form two concentric rings. The inner ring represents the distribution of the total population. The outer ring represents the distribution for the given segment.
-
Interval variable: displayed as a histogram. The blue shaded region represents the within-segment distribution. The red outline represents the population distribution. The height of the histogram bars can be scaled by count or by percentage of the segment population. When you use the percentage, the view shows the relative difference between the segment and the population. When you use the count, the view shows the absolute difference between the segment and the population.
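The comparisons that these plots draw can be approximated outside the tool. The following Python sketch uses made-up records and hypothetical helper names (`class_shares`, `histogram`); it is not the node's implementation. It only shows what "segment versus population" means for a class variable and how count scaling differs from percentage scaling for an interval variable.

```python
from collections import Counter


def class_shares(values):
    """Proportion of each class level (what one ring of the nested pie shows)."""
    counts = Counter(values)
    total = len(values)
    return {level: counts[level] / total for level in sorted(counts)}


def histogram(values, edges, as_percent=False):
    """Bin values over shared edges; optionally scale to percent of the group."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            # The last bin is closed on the right so the maximum edge is counted.
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    if not as_percent:
        return counts
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [100.0 * c / total for c in counts]


# Made-up records: (segment id, class variable, interval variable).
records = [
    (1, "A", 2.0), (1, "A", 3.0), (1, "B", 8.0), (1, "A", 1.0),
    (2, "B", 7.0), (2, "B", 9.0), (2, "A", 6.0), (2, "B", 5.0),
    (2, "B", 8.5), (2, "B", 4.0),
]
segment = [r for r in records if r[0] == 1]
edges = [0, 5, 10]

# Class variable: inner ring (population) versus outer ring (segment).
pop_shares = class_shares([r[1] for r in records])
seg_shares = class_shares([r[1] for r in segment])

# Interval variable: absolute view (counts) versus relative view (percent).
pop_counts = histogram([r[2] for r in records], edges)
seg_counts = histogram([r[2] for r in segment], edges)
pop_pct = histogram([r[2] for r in records], edges, as_percent=True)
seg_pct = histogram([r[2] for r in segment], edges, as_percent=True)

print("Class shares, population:", pop_shares)
print("Class shares, segment 1: ", seg_shares)
print("Counts  (population, segment):", pop_counts, seg_counts)
print("Percent (population, segment):", pop_pct, seg_pct)
```

With counts, the segment bars are always no taller than the population bars, because the segment is a subset of the population. With percentages, each group is scaled to 100, so the shapes can be compared directly regardless of segment size.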
-
Maximize the Segment Size chart.
-
Maximize the Variable Worth window.
The following shows a portion of this window.
-
Close the Results window.