Examine Data Segments

In this section, you will examine segmented or clustered data using the Segment Profile node. A segment is a cluster number that you derive analytically by using SAS Text Miner clustering techniques. The Segment Profile node helps you see what makes each segment unique, or at least different from the population. The node generates reports that help you explore and compare the distribution of the distinguishing factors within each segment and within the population. For more information about the Segment Profile node, see the SAS Enterprise Miner Help.

To examine data segments, complete the following steps:

Select the Assess tab on the node toolbar, and drag a Segment Profile node into the diagram workspace.
Connect the Text Cluster node to the Segment Profile node.
Select the Segment Profile node.
Click the ellipsis button for the Variables property.

The Variables window appears.
Select all the “_prob” variables and set their Use value to No.

Note: To select all the “_prob” variables at once, hold down Shift, click the first “_prob” variable, and drag the pointer to the last one. After all the “_prob” variables are selected, change the Use value of any one of them. The Use value of every other selected variable changes to match.
Select all the “_SVD” variables and set their Use value to No.
Click OK.
Select the Segment Profile node in the diagram workspace.
Enter 0.0010 as the value for the Minimum Worth property.
Right-click the Segment Profile node, and select Run.
Click Yes in the Confirmation dialog box when you are asked whether you want to run the path.
After the node finishes running, click Results in the Run Status dialog box.
Maximize the Profile window.

The following shows a portion of this window.

The Profile window displays a lattice, or grid, of plots that compare the distribution of each identified variable and report variable in the segment with its distribution in the population. The graphs in this window show the variables that have been identified as factors that distinguish the segment from the population that it represents. Each row represents a single segment. The far-left margin identifies the segment, its count, and its percentage of the total population.
The columns are organized from left to right according to their ability to discriminate that segment from the population. Report variables, if specified, appear on the right in alphabetical order after the selected inputs. The lattice graph has the following features:
- Class variable: displayed as two nested pie charts, drawn as two concentric rings. The inner ring represents the distribution of the total population. The outer ring represents the distribution for the given segment.
- Interval variable: displayed as a histogram. The blue shaded region represents the within-segment distribution. The red outline represents the population distribution. The height of the histogram bars can be scaled by count or by percentage of the segment population. When you use the percentage, the view shows the relative difference between the segment and the population. When you use the count, the view shows the absolute difference between the segment and the population.
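The two kinds of plot described above can be illustrated with a small Python sketch. This is not how the Segment Profile node is implemented. It only shows the comparison that each plot encodes. The toy rows, the bin edges, and the helper names `class_shares` and `histogram` are all hypothetical.

```python
from collections import Counter


def class_shares(values):
    """Return the share of each category as a fraction of all values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def histogram(values, edges, as_percent=True):
    """Count values per bin [edges[i], edges[i+1]); the last bin also
    includes its right edge. Optionally scale the bars to percentages."""
    bars = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(bars)):
            is_last = i == len(bars) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                bars[i] += 1
                break
    if as_percent:
        return [100.0 * b / len(values) for b in bars]
    return bars


# Hypothetical toy data: (segment id, class variable, interval variable)
rows = [
    (1, "A", 2.0), (1, "A", 3.0), (1, "B", 2.5), (1, "A", 1.0),
    (2, "B", 8.0), (2, "B", 9.0), (2, "A", 7.5), (2, "B", 6.0),
]
segment_1 = [r for r in rows if r[0] == 1]

# Class variable: inner ring (population) versus outer ring (segment).
population_shares = class_shares([r[1] for r in rows])
segment_shares = class_shares([r[1] for r in segment_1])

# Interval variable: red outline (population) versus blue bars (segment).
edges = [0.0, 5.0, 10.0]
population_pct = histogram([r[2] for r in rows], edges)
segment_pct = histogram([r[2] for r in segment_1], edges)
segment_count = histogram([r[2] for r in segment_1], edges, as_percent=False)

print(population_shares, segment_shares)
print(population_pct, segment_pct, segment_count)
```

In this toy data, category A makes up half of the population but three quarters of segment 1, which is the kind of difference the concentric rings make visible. Scaling the histogram by percentage puts the segment and the population on the same footing, while scaling by count shows how many observations are actually involved.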
Maximize the Segment Size chart.
Maximize the Variable Worth window.

The following shows a portion of this window.
Close the Results window.
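The variable settings used in this procedure (setting Use to No for the “_prob” and “_SVD” variables, and entering a Minimum Worth value of 0.0010) can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the node's actual logic: the variable names and worth values are invented, the name matching is assumed to be a simple case-insensitive substring test, and the cutoff is assumed to keep variables whose worth is greater than or equal to the minimum.

```python
def select_inputs(variable_names, excluded_markers=("_prob", "_SVD")):
    """Map each variable name to a Use value of 'Yes' or 'No'.

    Assumption: a variable is excluded when its name contains one of the
    markers, ignoring case.
    """
    use = {}
    for name in variable_names:
        excluded = any(m.lower() in name.lower() for m in excluded_markers)
        use[name] = "No" if excluded else "Yes"
    return use


def apply_minimum_worth(worth_by_variable, minimum_worth=0.0010):
    """Keep variables whose worth meets the cutoff, ranked from most to
    least worth (assumption: the comparison is inclusive)."""
    kept = {n: w for n, w in worth_by_variable.items() if w >= minimum_worth}
    return sorted(kept, key=kept.get, reverse=True)


# Hypothetical variable names, chosen only to resemble the patterns above.
names = ["cluster_prob1", "cluster_prob2", "doc_SVD1", "term_count", "source"]
use_values = select_inputs(names)

# Hypothetical worth values; real values come from the Variable Worth window.
worth = {"term_count": 0.0420, "source": 0.0150, "doc_length": 0.0004}
ranked = apply_minimum_worth(worth)

print(use_values)
print(ranked)
```

Here `doc_length` falls below the 0.0010 cutoff and is dropped, and the remaining variables are ranked by worth, which mirrors the left-to-right ordering of the columns in the Profile window.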