The STEPDISC Procedure

Input Data Sets

The input data set can be an ordinary SAS data set or one of several specially structured data sets created by statistical procedures available with SAS/STAT software. For more information about these data sets, see Appendix A: Special SAS Data Sets. The BY variable used to create these data sets serves as the CLASS variable in PROC STEPDISC. These specially structured data sets include the following:

  • TYPE=CORR data sets created by PROC CORR by using a BY statement

  • TYPE=COV data sets created by PROC PRINCOMP by using both the COV option and a BY statement

  • TYPE=CSSCP data sets created by PROC CORR by using the CSSCP option and a BY statement, where the OUT= data set is assigned TYPE=CSSCP with the TYPE= data set option

  • TYPE=SSCP data sets created by PROC REG by using both the OUTSSCP= option and a BY statement
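To make the layout concrete, the following Python sketch builds rows shaped like a TYPE=CORR data set created with a BY statement. Each class level gets one MEAN, one STD, and one N observation, followed by one CORR observation per analysis variable, identified by _NAME_. The helper names (`pearson`, `stat_row`, `corr_style_rows`), the list-of-dictionaries stand-in for a SAS data set, and the exact row order are assumptions for illustration only. This is not PROC CORR output. Standard deviations use the n - 1 divisor.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Hypothetical helper: Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def stat_row(by_var, level, type_, name, values):
    """Hypothetical helper: one observation with BY value, _TYPE_, _NAME_, variables."""
    return {by_var: level, "_TYPE_": type_, "_NAME_": name, **values}


def corr_style_rows(records, by_var, var_names):
    """Hypothetical helper: rows shaped like a BY-group TYPE=CORR data set."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[by_var], []).append(rec)

    rows = []
    for level in sorted(groups):
        cols = {v: [r[v] for r in groups[level]] for v in var_names}
        n = len(groups[level])
        # Class means, standard deviations (n - 1 divisor), and counts.
        rows.append(stat_row(by_var, level, "MEAN", "", {v: mean(cols[v]) for v in var_names}))
        rows.append(stat_row(by_var, level, "STD", "", {v: stdev(cols[v]) for v in var_names}))
        rows.append(stat_row(by_var, level, "N", "", {v: n for v in var_names}))
        # One CORR row per analysis variable, labeled by _NAME_.
        for row_var in var_names:
            corr = {v: pearson(cols[row_var], cols[v]) for v in var_names}
            rows.append(stat_row(by_var, level, "CORR", row_var, corr))
    return rows


records = [
    {"group": "A", "x": 1.0, "y": 2.0},
    {"group": "A", "x": 2.0, "y": 1.0},
    {"group": "A", "x": 3.0, "y": 3.0},
    {"group": "B", "x": 5.0, "y": 6.0},
    {"group": "B", "x": 6.0, "y": 5.0},
    {"group": "B", "x": 7.0, "y": 7.0},
]
for row in corr_style_rows(records, "group", ["x", "y"]):
    print(row)
```

Here the `group` value plays the role of the BY variable, which PROC STEPDISC would then use as the CLASS variable.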

When the input data set is TYPE=CORR, TYPE=COV, or TYPE=CSSCP, the STEPDISC procedure reads the number of observations for each class from the observations with _TYPE_='N' and the variable means in each class from the observations with _TYPE_='MEAN'. It then reads the within-class dispersion information according to the data set type:

  • TYPE=CORR: the within-class correlations from the observations with _TYPE_='CORR' and the standard deviations from the observations with _TYPE_='STD'

  • TYPE=COV: the within-class covariances from the observations with _TYPE_='COV'

  • TYPE=CSSCP: the within-class corrected sums of squares and crossproducts from the observations with _TYPE_='CSSCP'
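Whichever representation is supplied, the per-class information determines each class's corrected SSCP matrix, and summing those matrices gives the pooled within-class SSCP matrix. The following Python sketch shows that arithmetic under stated assumptions. Covariances and standard deviations are taken to use the n - 1 divisor. Correlations are converted with S = DRD, where D is the diagonal matrix of standard deviations. The pooled covariance uses the usual N - c divisor for N observations in c classes. The function names and the dictionary layout are hypothetical. The sketch illustrates the algebra and is not STEPDISC's internal code.

```python
def cov_from_corr(corr, std):
    """Hypothetical helper: covariance matrix S = D R D from correlations and std devs."""
    p = len(std)
    return [[corr[i][j] * std[i] * std[j] for j in range(p)] for i in range(p)]


def csscp_from_cov(cov, n):
    """Hypothetical helper: corrected SSCP from a covariance matrix (n - 1 divisor assumed)."""
    return [[(n - 1) * c for c in row] for row in cov]


def pooled_within(classes):
    """Hypothetical helper: pool per-class statistics.

    `classes` maps each class level to a dict holding "N" plus either
    "CORR" and "STD", or "COV", or "CSSCP". Returns the pooled
    within-class SSCP matrix W and the pooled covariance W / (N - c).
    """
    total_n = 0
    w = None
    for stats in classes.values():
        n = stats["N"]
        if "CSSCP" in stats:
            csscp = stats["CSSCP"]
        elif "COV" in stats:
            csscp = csscp_from_cov(stats["COV"], n)
        else:
            csscp = csscp_from_cov(cov_from_corr(stats["CORR"], stats["STD"]), n)
        if w is None:
            w = [[0.0] * len(csscp) for _ in csscp]
        # Accumulate this class's corrected SSCP into W.
        for i, row in enumerate(csscp):
            for j, value in enumerate(row):
                w[i][j] += value
        total_n += n
    df = total_n - len(classes)  # N - c degrees of freedom
    pooled_cov = [[value / df for value in row] for row in w]
    return w, pooled_cov


# Class A is given as a CSSCP matrix and class B as a covariance matrix;
# both describe the same dispersion for 3 observations.
classes = {
    "A": {"N": 3, "CSSCP": [[2.0, 1.0], [1.0, 2.0]]},
    "B": {"N": 3, "COV": [[1.0, 0.5], [0.5, 1.0]]},
}
w, pooled = pooled_within(classes)
print(pooled)  # [[1.0, 0.5], [0.5, 1.0]]
```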

When the data set does not include observations with _TYPE_='CORR' (data set TYPE=CORR), _TYPE_='COV' (data set TYPE=COV), or _TYPE_='CSSCP' (data set TYPE=CSSCP) for each class, PROC STEPDISC reads pooled within-class information from the data set instead:

  • TYPE=CORR: the pooled within-class correlations from the observations with _TYPE_='PCORR' and the pooled within-class standard deviations from the observations with _TYPE_='PSTD'

  • TYPE=COV: the pooled within-class covariances from the observations with _TYPE_='PCOV'

  • TYPE=CSSCP: the pooled within-class corrected SSCP matrix from the observations with _TYPE_='PSSCP'
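Pooled within-class information, together with the class counts and class means from the _TYPE_='N' and _TYPE_='MEAN' observations, is enough to recover the total corrected SSCP matrix through the identity T = W + sum over classes k of n_k (m_k - m)(m_k - m)'. Here W is the pooled within-class SSCP matrix, n_k and m_k are the count and mean vector of class k, and m is the count-weighted overall mean. Stepwise discriminant criteria such as Wilks' lambda compare within-class and total variation, so this identity suggests why the pooled matrix can stand in for the per-class matrices. The following Python sketch shows the reconstruction and the single-variable Wilks' lambda W_ii / T_ii. The function names are hypothetical, and the sketch illustrates the algebra rather than STEPDISC's implementation.

```python
def total_sscp(w, counts, means):
    """Hypothetical helper: total corrected SSCP from pooled within-class SSCP.

    Implements T = W + sum_k n_k (m_k - m)(m_k - m)', where m is the
    count-weighted mean of the class means.
    """
    levels = list(counts)
    n_total = sum(counts[k] for k in levels)
    p = len(w)
    grand = [sum(counts[k] * means[k][i] for k in levels) / n_total for i in range(p)]
    t = [row[:] for row in w]  # start from a copy of W
    for k in levels:
        d = [means[k][i] - grand[i] for i in range(p)]
        # Add this class's between-class contribution n_k * d d'.
        for i in range(p):
            for j in range(p):
                t[i][j] += counts[k] * d[i] * d[j]
    return t


def wilks_lambda_single(w, t, i):
    """Hypothetical helper: univariate Wilks' lambda, within SS over total SS."""
    return w[i][i] / t[i][i]


# One variable: class A = [1, 2, 3] (mean 2) and class B = [5, 6, 7] (mean 6).
w = [[4.0]]  # each class contributes a corrected SS of 2
t = total_sscp(w, {"A": 3, "B": 3}, {"A": [2.0], "B": [6.0]})
print(t)  # [[28.0]]
print(wilks_lambda_single(w, t, 0))
```

A value of lambda near 0 means most of the variable's total variation lies between classes, which is what makes a variable a good discriminator.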

When the input data set is TYPE=SSCP, the STEPDISC procedure reads the following information:

  • the number of observations for each class from the observations with _TYPE_='N'

  • the sum of weights of observations from the variable INTERCEPT in the observations with _TYPE_='SSCP' and _NAME_='INTERCEPT'

  • the variable sums from the analysis variables in the observations with _TYPE_='SSCP' and _NAME_='INTERCEPT'

  • the uncorrected sums of squares and crossproducts from the analysis variables in the observations with _TYPE_='SSCP' whose _NAME_ value is the name of an analysis variable
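The corrected statistics follow directly from these uncorrected quantities. With w the sum of weights, s the vector of variable sums, and U the uncorrected SSCP matrix, the class means are s / w and the corrected SSCP matrix is U - s s' / w. The following minimal Python sketch performs that conversion for one class. It assumes the three inputs have already been pulled from the INTERCEPT observation and the variable-named SSCP observations. The function and argument names are hypothetical.

```python
def from_uncorrected_sscp(sumwt, sums, usscp):
    """Hypothetical helper: class means and corrected SSCP from uncorrected sums.

    sumwt: sum of weights (INTERCEPT variable in the _NAME_='INTERCEPT' row)
    sums:  variable sums (analysis variables in the _NAME_='INTERCEPT' row)
    usscp: uncorrected SSCP (analysis variables in the variable-named rows)
    """
    p = len(sums)
    means = [s / sumwt for s in sums]
    # Corrected SSCP: U[i][j] - sums[i] * sums[j] / sumwt
    csscp = [[usscp[i][j] - sums[i] * sums[j] / sumwt for j in range(p)]
             for i in range(p)]
    return means, csscp


# Class with unit weights and observations (1, 2), (2, 1), (3, 3).
means, csscp = from_uncorrected_sscp(
    3.0, [6.0, 6.0], [[14.0, 13.0], [13.0, 14.0]])
print(means)  # [2.0, 2.0]
print(csscp)  # [[2.0, 1.0], [1.0, 2.0]]
```

Because the sums and crossproducts are weighted, the same formula gives the weighted corrected SSCP when the observations carry unequal weights.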