Consider the following
scenario. A baseball manager wants to identify and group players on
the team who are very similar with respect to several statistics of
interest. Note that there is no response variable in this example.
The manager simply wants to identify different groups of players.
The manager also wants to learn what differentiates players in one
group from players in a different group.
The data set for this
example is located in SAMPSIO.DMABASE. The following table contains
a description of the important variables.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Positions played at
the end of 1986
|
|
|
|
League at the end of
1986
|
|
|
|
Division at the end
of 1986
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Years in the major leagues
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
For this example, you
set the model role for TEAM, POSITION, LEAGUE, DIVISION, and SALARY
to
Rejected. SALARY is rejected because this
information is contained in the LOGSALAR variable. No target variables
are used in a cluster analysis or self-organizing map (SOM). If you
want to identify groups based on a target variable, consider a predictive
modeling technique and specify a categorical target variable.