The PROC MODECLUS statement invokes the MODECLUS procedure. Table 66.1 summarizes the options available in the PROC MODECLUS statement. These options are discussed in the following sections.
Table 66.1: Summary of PROC MODECLUS Statement Options
Option | Description |
---|---|
Specify input and output data sets | |
DATA= | Specifies input data set name |
OUT= | Specifies output data set name for observations |
OUTCLUS= | Specifies output data set name for clusters |
OUTSUM= | Specifies output data set name for cluster solutions |
Specify variables in output data sets | |
CLUSTER= | Specifies variable in the OUT= and OUTCLUS= data sets identifying clusters |
DENSITY= | Specifies variable in the OUT= data set containing density estimates |
OUTLENGTH= | Specifies length of variables in the output data sets |
Summarize and process coordinate data before clustering | |
SIMPLE | Requests simple statistics |
STANDARD | Standardizes the variables to mean 0 and standard deviation 1 |
Specify smoothing parameters | |
DK= | Specifies number of neighbors to use for kth-nearest-neighbor density estimation |
CK= | Specifies number of neighbors to use for clustering |
K= | Specifies number of neighbors to use for kth-nearest-neighbor density estimation and clustering |
DR= | Specifies radius of the sphere of support for uniform-kernel density estimation |
CR= | Specifies radius of the neighborhood for clustering |
R= | Specifies radius of the sphere of support for uniform-kernel density estimation and of the neighborhood for clustering |
Specify density estimation options | |
CASCADE= | Specifies number of times the density estimates are to be cascaded |
DIMENSION= | Specifies dimensionality to be used when computing density estimates |
AM | Uses arithmetic means for cascading density estimates |
HM | Uses harmonic means for cascading density estimates |
SUM | Uses sums for cascading density estimates |
Specify clustering methods and options | |
DOCK= | Dissolves clusters with n or fewer members |
EARLY | Stops the analysis after obtaining a solution with either no cluster or a single cluster |
JOIN | Requests that nonsignificant clusters be hierarchically joined |
MAXCLUSTERS= | Specifies maximum number of clusters to be obtained with METHOD=6 |
METHOD= | Specifies clustering method to use |
MODE= | Specifies minimum members for either cluster to be designated a modal cluster when two clusters are joined using METHOD=5 |
POWER= | Specifies power of the density used with METHOD=6 |
TEST | Specifies approximate significance tests for the number of clusters |
THRESHOLD= | Specifies assignment threshold used with METHOD=6 |
Specify the output display options | |
ALL | Produces all optional output |
BOUNDARY | Displays the density and cluster membership of observations with neighbors belonging to a different cluster |
CORE | Retains the neighbor lists for each observation in memory |
CROSSLIST | Displays the estimated cross-validated log density of each observation |
LIST | Displays the estimated density and cluster membership of each observation |
LOCAL | Displays estimates of local dimensionality and writes them to the OUT= data set |
NEIGHBOR | Displays the neighbors of each observation |
NOPRINT | Suppresses the display of the output |
NOSUMMARY | Suppresses the display of the summary of the number of clusters, number of unassigned observations, and maximum p-value for each analysis |
SHORT | Suppresses the display of statistics for each cluster |
TRACE | Traces the cluster assignments when METHOD=6 |
You can specify at least one of the following options for smoothing parameters for density estimation: DK=, K=, DR=, or R=. To obtain a cluster analysis, you can specify the METHOD= option and at least one of the following smoothing parameters for clustering: CK=, K=, CR=, or R=. If you want significance tests for the number of clusters, you should specify either the DR= or R= option. If none of the smoothing parameters is specified, the MODECLUS procedure provides a default value for the R= option. See the section Density Estimation for the formula of a reasonable first guess for R= and a discussion of smoothing parameters.
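The rules in the preceding paragraph amount to a small decision procedure. The following Python sketch is a hypothetical model of those rules only; it is not SAS code and not how the MODECLUS procedure is implemented. The function name summarize_smoothing and its result keys are invented for illustration, and the default R= value is represented by a placeholder because its formula is given in the section Density Estimation. Behavior the text does not state (for example, precedence when both DK= and K= are given) is not modeled.

```python
# Hypothetical Python model of the smoothing-parameter rules for PROC MODECLUS.
# This is not SAS code; option names are plain dictionary keys.

DENSITY_PARAMS = ("DK", "K", "DR", "R")  # smoothing parameters for density estimation
CLUSTER_PARAMS = ("CK", "K", "CR", "R")  # smoothing parameters for clustering
RADIUS_PARAMS = ("DR", "R")              # significance tests need one of these


def summarize_smoothing(options):
    """Report which of the documented requirements a set of options satisfies.

    options maps option names such as "K", "DR", or "METHOD" to values.
    Only the rules stated in the text are modeled.
    """
    opts = dict(options)  # copy so the caller's dictionary is unchanged
    all_smoothing = set(DENSITY_PARAMS) | set(CLUSTER_PARAMS)
    if not all_smoothing.intersection(opts):
        # No smoothing parameter given: the procedure supplies a default R=.
        # The placeholder string stands in for that value; its formula is
        # not computed here.
        opts["R"] = "default"
    return {
        "has_density_param": any(name in opts for name in DENSITY_PARAMS),
        "cluster_analysis": "METHOD" in opts
        and any(name in opts for name in CLUSTER_PARAMS),
        "tests_possible": any(name in opts for name in RADIUS_PARAMS),
    }


print(summarize_smoothing({"METHOD": 1, "K": 10}))
# prints {'has_density_param': True, 'cluster_analysis': True, 'tests_possible': False}
print(summarize_smoothing({"METHOD": 1}))
# prints {'has_density_param': True, 'cluster_analysis': True, 'tests_possible': True}
```

In this model, K=10 with METHOD= is enough for density estimation and clustering, but not for significance tests, which require a radius (DR= or R=). Omitting every smoothing parameter triggers the default R=, which makes tests possible.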
You can specify lists of values for the DK=, CK=, K=, DR=, CR=, and R= options. Numbers in the lists can be separated by blanks or commas. You can include in the lists one or more items of the form start TO stop BY increment. Each list can contain either one value or the same number of values as in every other list that contains more than one value. If a list has only one value, that value is used in combination with all the values in longer lists. If two or more lists have more than one value, then one analysis is done by using the first value in each list, another analysis is done by using the second value in each list, and so on.
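To make the list-pairing rule concrete, the following Python sketch expands option value lists and pairs them into one set of settings per analysis. It is a hypothetical illustration, not the procedure's implementation. The names parse_value_list and plan_analyses are invented, and two details are assumptions: that a start TO stop BY increment range includes stop when it is reached, as a SAS DO loop does, and that the increment is positive.

```python
import re
from decimal import Decimal


def parse_value_list(text):
    """Expand an option value list such as "10 20, 30" or "1 TO 2 BY 0.5"."""
    # Numbers may be separated by blanks or commas.
    tokens = [t for t in re.split(r"[\s,]+", text.strip()) if t]
    values = []
    i = 0
    while i < len(tokens):
        # Recognize the five-token form: start TO stop BY increment.
        if (i + 4 < len(tokens) and tokens[i + 1].upper() == "TO"
                and tokens[i + 3].upper() == "BY"):
            start, stop, step = (Decimal(tokens[i]), Decimal(tokens[i + 2]),
                                 Decimal(tokens[i + 4]))
            if step <= 0:
                raise ValueError("this sketch handles only positive increments")
            value = start
            while value <= stop:  # stop is included when reached (assumption)
                values.append(value)
                value += step
            i += 5
        else:
            values.append(Decimal(tokens[i]))  # a single number
            i += 1
    if not values:
        raise ValueError("empty value list")
    return values


def plan_analyses(**lists):
    """Pair the expanded lists into one dictionary of settings per analysis."""
    expanded = {name: parse_value_list(text) for name, text in lists.items()}
    lengths = {len(vals) for vals in expanded.values() if len(vals) > 1}
    if len(lengths) > 1:
        raise ValueError("lists with more than one value must have the same length")
    n_analyses = lengths.pop() if lengths else 1
    # A one-value list is reused in every analysis; longer lists are read
    # position by position.
    return [
        {name: vals[i] if len(vals) > 1 else vals[0] for name, vals in expanded.items()}
        for i in range(n_analyses)
    ]


for run in plan_analyses(DK="5 TO 15 BY 5", CK="10"):
    print(run)
```

For example, DK=5 TO 15 BY 5 with CK=10 yields three analyses, each using CK=10, while DK=5 10 with CK=5 10 15 is rejected because the two multi-value lists differ in length. Decimal arithmetic is used so that ranges with fractional increments such as 0.5 do not accumulate floating-point error.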
You can specify the following options in the PROC MODECLUS statement.