Examine Data Segments

In this section, you will examine segmented or clustered data using the Segment Profile node. A segment is a cluster, identified by a cluster number that is derived analytically using SAS Text Miner clustering techniques. The Segment Profile node helps you see what makes each segment unique, or at least different from the population. The node generates reports that help you explore and compare the distribution of the distinguishing variables within each segment and in the population.

To examine data segments, complete the following steps:

From the Assess tab, drag and drop a Segment Profile node into the diagram workspace and connect the Text Miner node to the Segment Profile node.
Select the Segment Profile node. Select the button for the Variables property. The Variables — Prof window opens.
Select all the PROB variables and set their Use value to No.

Note: To select all the PROB variables at once, hold down Shift, click the first PROB variable, and drag the pointer over the remaining PROB variables. With all of them selected, change the Use value of any one PROB variable. The same value is applied to every selected PROB variable.
Select all the _SVD_ variables and set their Use value to No.

Note: To select all the _SVD_ variables at once, hold down Shift, click the first _SVD_ variable, and drag the pointer over the remaining _SVD_ variables. With all of them selected, change the Use value of any one _SVD_ variable. The same value is applied to every selected _SVD_ variable.
Click OK.
Select the Segment Profile node in the diagram workspace. In the Properties panel, set the Minimum Worth property to 0.0010.
Right-click the Segment Profile node, and select Run.
Click Yes in the Confirmation dialog box. After the node finishes running, click Results in the Run Status dialog box.
Maximize the Profile: _CLUSTER_ window. The following shows a portion of this window.

The Profile: _CLUSTER_ window displays a lattice, or grid, of plots that compare the distribution of the identified variables and report variables in each segment with their distribution in the population. The graphs shown in this window illustrate variables that have been identified as factors that distinguish the segment from the population that it represents. Each row represents a single segment. The far-left margin identifies the segment, its count, and the percentage of the total population.
The columns are organized from left to right according to their ability to discriminate that segment from the population. Report variables, if specified, appear on the right in alphabetical order after the selected inputs. The lattice graph has the following features:
- Class variable — displays as two nested pie charts that consist of two concentric rings. The inner ring represents the distribution of the total population. The outer ring represents the distribution for the given segment.
- Interval variable — displays as a histogram. The blue shaded region represents the within-segment distribution. The red outline represents the population distribution. The height of the histogram bars can be scaled by count or by percentage of the segment population. When you are using the percentage, the view shows the relative difference between the segment and the population. When you are using the count, the view shows the absolute difference between the segment and the population.
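The difference between the count and percentage scalings of an interval-variable histogram can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Segment Profile node is implemented; the data values, the bin edges, and the helper names `histogram` and `to_percent` are all hypothetical.

```python
def histogram(values, edges):
    """Count values per bin. Bins are [edges[i], edges[i+1]);
    the last bin also includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


def to_percent(counts):
    """Rescale bin counts to percentages of their own total."""
    total = sum(counts)
    return [100.0 * c / total if total else 0.0 for c in counts]


# Hypothetical interval variable for the whole population
population = [2, 5, 8, 15, 22, 31, 38, 45, 52, 67]
# Hypothetical segment: the rows assigned to one cluster
segment = [2, 5, 8, 15]
edges = [0, 20, 40, 60, 80]

pop_counts = histogram(population, edges)
seg_counts = histogram(segment, edges)

# Count scaling: bars show absolute sizes, so a small segment looks small.
print("counts   segment:", seg_counts, "population:", pop_counts)

# Percentage scaling: each distribution sums to 100, so shapes are comparable.
print("percent  segment:", to_percent(seg_counts),
      "population:", to_percent(pop_counts))
```

With count scaling, the segment bars can never exceed the population bars, which makes the absolute size of the segment visible. With percentage scaling, both distributions are normalized, so a segment that is concentrated in one bin stands out against a flatter population.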
Maximize the Segment Size: _CLUSTER_ window. The following shows a portion of this window.
Maximize the Variable Worth: _CLUSTER_ window. The following shows a portion of this window.
Note the strong relationships between some of the vaccinations given and the clustered categories. You can think of the "wheels" or concentric rings as follows: the inner circle represents all the adverse events, while the outer circle contains only the adverse events in that cluster.
Close the Results window.
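To make the reading of the concentric rings concrete, the following minimal Python sketch computes the two sets of shares that such a chart compares for a class variable: the inner ring uses every record, and the outer ring uses only the records in one cluster. The records, the vaccine codes, and the helper name `shares` are hypothetical and are not taken from the example data set.

```python
from collections import Counter


def shares(values):
    """Return each category's share of the given values (shares sum to 1)."""
    counts = Counter(values)
    total = len(values)
    return {category: counts[category] / total for category in sorted(counts)}


# Hypothetical records: (cluster number, vaccine code)
records = [
    (1, "FLU"), (1, "FLU"), (1, "FLU"), (1, "HEP"),
    (2, "HEP"), (2, "HEP"), (2, "MMR"), (2, "FLU"),
]

# Inner ring: distribution over all adverse events
inner_ring = shares([vaccine for _, vaccine in records])

# Outer ring: distribution over the adverse events in cluster 1 only
outer_ring = shares([vaccine for cluster, vaccine in records if cluster == 1])

print("inner (population):", inner_ring)
print("outer (cluster 1): ", outer_ring)
```

A category whose share in the outer ring is much larger than its share in the inner ring, such as the hypothetical FLU code here, is one that helps distinguish that cluster from the population.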