In this section, you will examine segmented (clustered) data using the
Segment Profile node. A segment is a cluster number derived analytically
using SAS Text Miner clustering techniques. The Segment Profile node helps
you see what makes each segment unique, or at least different from the
population. The node generates reports that help you explore and compare
the distribution of the distinguishing factors within the segments and
the population.
To examine data segments, complete the following steps:
-
From the Assess tab, drag and drop a Segment Profile node into the
diagram workspace, and connect the Text Miner node to the Segment
Profile node.
-
Select the Segment Profile node. Select the ellipsis (...) button for the
Variables property. The Variables — Prof window opens.
-
Select all the PROB variables and set their Use value to No.
Note: To select all the PROB variables at once, hold down Shift, click
the first PROB variable, and drag the pointer over the remaining PROB
variables. With all PROB variables selected, change the Use value of any
one of them. The Use values of the other selected PROB variables change
to the same value.
-
Select all the _SVD_ variables and set their Use value to No.
Note: To select all the _SVD_ variables at once, hold down Shift, click
the first _SVD_ variable, and drag the pointer over the remaining _SVD_
variables. With all _SVD_ variables selected, change the Use value of any
one of them. The Use values of the other selected _SVD_ variables change
to the same value.
-
Select the Segment Profile node in the diagram workspace. In the
Properties panel, set the Minimum Worth property to 0.0010.
-
Right-click the Segment Profile node, and select Run.
-
Click Yes in the Confirmation dialog box. After the node finishes
running, click Results in the Run Status dialog box.
-
Maximize the Profile: _CLUSTER_ window. The following shows a portion
of this window.
The Profile: _CLUSTER_ window displays a lattice, or grid, of plots that
compare the distributions of the identified and report variables in the
segment and in the population. The graphs in this window show the
variables that have been identified as factors that distinguish the
segment from the population that it belongs to. Each row represents
a single segment. The far-left margin identifies the segment, its
count, and its percentage of the total population.
The columns are organized from left to right according to their ability
to discriminate that segment from the population. Report variables, if
specified, appear on the right in alphabetical order after the selected
inputs.
The lattice graph has the following features:
-
Class variable: displayed as two nested pie charts drawn as concentric
rings. The inner ring represents the distribution of the total
population. The outer ring represents the distribution for the given
segment.
-
Interval variable: displayed as a histogram. The blue shaded region
represents the within-segment distribution. The red outline represents
the population distribution. The height of the histogram bars can be
scaled by count or by percentage of the segment population. When scaled
by percentage, the view shows the relative difference between the segment
and the population. When scaled by count, the view shows the absolute
difference between the segment and the population.
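The comparisons that the lattice plots draw can be approximated outside the tool. The following Python sketch is only an illustration, not the Segment Profile node's actual computation: the toy rows, the bin edges, and the helper names (`class_distribution`, `histogram`, `as_percent`) are all hypothetical. It computes the population and segment distributions of a class variable (the inner and outer rings) and bins an interval variable by count and by percentage.

```python
from collections import Counter


def class_distribution(values):
    """Return each level's share of the given values as a fraction."""
    counts = Counter(values)
    total = len(values)
    return {level: count / total for level, count in counts.items()}


def histogram(values, edges):
    """Count values per bin [edges[i], edges[i+1]); the last bin also includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


def as_percent(counts):
    """Rescale bin counts to percentages of their own total."""
    total = sum(counts)
    return [100.0 * c / total for c in counts] if total else [0.0] * len(counts)


# Hypothetical toy data: (cluster id, class variable, interval variable).
rows = [
    (1, "A", 2.0), (1, "A", 3.0), (1, "B", 4.0),
    (2, "B", 7.0), (2, "B", 8.0), (2, "B", 9.0), (2, "A", 6.0), (2, "B", 10.0),
]
segment = [r for r in rows if r[0] == 1]

# Class variable: inner ring (population) versus outer ring (segment).
population_ring = class_distribution([r[1] for r in rows])
segment_ring = class_distribution([r[1] for r in segment])

# Interval variable: the same bins for population and segment.
edges = [0.0, 5.0, 10.0]  # assumed bin edges, chosen only for this example
pop_counts = histogram([r[2] for r in rows], edges)
seg_counts = histogram([r[2] for r in segment], edges)

print("population ring:", population_ring)
print("segment ring:   ", segment_ring)
print("by count:       ", seg_counts, "vs", pop_counts)
print("by percentage:  ", as_percent(seg_counts), "vs", as_percent(pop_counts))
```

Scaling by count keeps the segment bars no taller than the population bars, because the segment is a subset. Scaling by percentage puts both on the same footing, so differences in shape stand out.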
-
Maximize the Segment Size: _CLUSTER_ window. The following shows a
portion of this window.
-
Maximize the Variable Worth: _CLUSTER_ window. The following shows a
portion of this window.
-
Note the strong relationships between some of the vaccinations given and
the clustered categories. You can think of the "wheels," or concentric
rings, as follows: the inner ring represents all the adverse events,
while the outer ring contains only the adverse events in that cluster.
-
Close the Results window.
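The variable-selection steps above set Use to No for every PROB and _SVD_ variable by hand. As a rough illustration of the same selection rule, the Python sketch below picks out variable names by prefix. It assumes that the names begin with `PROB` or `_SVD_`. The example names and the helper `names_to_exclude` are hypothetical and are not part of SAS Enterprise Miner or SAS Text Miner.

```python
# Prefixes of the variables to exclude from profiling (an assumption based on the steps above).
EXCLUDED_PREFIXES = ("PROB", "_SVD_")


def names_to_exclude(names, prefixes=EXCLUDED_PREFIXES):
    """Return the variable names that start with any excluded prefix (case-insensitive)."""
    return [name for name in names if name.upper().startswith(prefixes)]


# Hypothetical variable list; a real list would come from the Variables window.
names = ["_CLUSTER_", "AGE", "PROB1", "PROB2", "_SVD_1", "_SVD_2"]

excluded = names_to_exclude(names)
kept = [name for name in names if name not in excluded]

print("set Use to No:", excluded)
print("left unchanged:", kept)
```

Matching on a prefix rather than on a substring avoids accidentally excluding an unrelated variable whose name merely contains one of these strings.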