SAS/STAT Software

Discriminant Analysis

The SAS/STAT procedures for discriminant analysis fit data that have one classification variable and several quantitative variables. Discriminant analysis can serve one or more of the following purposes:

  • finding a mathematical rule for predicting the class to which an observation belongs
  • finding a set of linear combinations of the quantitative variables that best reveals the differences among the classes
  • finding a subset of the quantitative variables that best reveals the differences among the classes

The SAS/STAT discriminant analysis procedures include the following:

CANDISC Procedure


The CANDISC procedure performs a canonical discriminant analysis, computes squared Mahalanobis distances between class means, and performs both univariate and multivariate one-way analyses of variance. The procedure enables you to do the following:

  • display both standardized and unstandardized canonical coefficients
  • display correlations between the canonical variables and the original variables as well as the class means for the canonical variables
  • test the hypothesis that each canonical correlation and all smaller canonical correlations are zero in the population
  • create a data set that contains the canonical coefficients
  • create a data set that contains scored canonical variables
  • create a data set that corresponds to any output table
  • perform BY group processing, which enables you to obtain separate analyses on grouped observations
  • perform weighted analysis
For further details, see the CANDISC Procedure documentation.
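To show what the squared Mahalanobis distance between class means involves, the following Python sketch computes it from raw data as D² = d' S⁻¹ d, where d is the difference between two class mean vectors and S is the pooled within-class covariance matrix. It assumes the usual pooled estimate with divisor n - k (n observations, k classes). The function names and the small two-class data set are illustrative only; the sketch does not reproduce PROC CANDISC's output tables.

```python
from itertools import combinations


def pooled_covariance(groups):
    """Class mean vectors and pooled within-class covariance (divisor n - k).

    groups maps each class label to a list of observation vectors.
    """
    p = len(next(iter(groups.values()))[0])
    means = {t: [sum(r[j] for r in rows) / len(rows) for j in range(p)]
             for t, rows in groups.items()}
    n = sum(len(rows) for rows in groups.values())
    cov = [[0.0] * p for _ in range(p)]
    for t, rows in groups.items():
        for r in rows:
            d = [r[j] - means[t][j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    cov[a][b] += d[a] * d[b] / (n - len(groups))
    return means, cov


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis_between_means(groups):
    """Squared Mahalanobis distance D^2 = d' S^-1 d for every pair of class means."""
    means, cov = pooled_covariance(groups)
    out = {}
    for s, t in combinations(sorted(means), 2):
        d = [x - y for x, y in zip(means[s], means[t])]
        out[(s, t)] = sum(di * zi for di, zi in zip(d, solve(cov, d)))
    return out


# Hypothetical two-class, two-variable example
groups = {"A": [[1, 2], [2, 3], [3, 3]], "B": [[6, 5], [7, 7], [8, 6]]}
distances = mahalanobis_between_means(groups)
```

Solving S z = d instead of inverting S keeps the sketch short and avoids forming the inverse explicitly; for the example data the distance between the two class means works out to 80/3.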

DISCRIM Procedure


Given a set of observations that contains one or more quantitative variables and a classification variable that indexes groups of observations, the DISCRIM procedure develops a discriminant criterion to classify each observation into one of the groups. The discriminant criterion derived from this data set can be applied to a second data set during the same execution of PROC DISCRIM. The following are highlights of the DISCRIM procedure's features:
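The fit-then-score workflow described above can be sketched in Python for the parametric case. This sketch assumes a pooled covariance matrix S (a linear discriminant criterion) and classifies x by the generalized squared distance D_t²(x) = (x - m_t)' S⁻¹ (x - m_t) - 2 ln q_t, where m_t is the mean and q_t the prior probability of class t; posterior probabilities are exp(-D_t²/2) normalized over the classes. The function names, the equal default priors, and the data are hypothetical, and PROC DISCRIM's own options and output are not modeled.

```python
import math


def fit(groups, priors=None):
    """Estimate class means, pooled within-class covariance, and priors from training data."""
    labels = sorted(groups)
    p = len(groups[labels[0]][0])
    n = sum(len(groups[t]) for t in labels)
    means = {t: [sum(r[j] for r in groups[t]) / len(groups[t]) for j in range(p)]
             for t in labels}
    cov = [[0.0] * p for _ in range(p)]
    for t in labels:
        for r in groups[t]:
            d = [r[j] - means[t][j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    cov[a][b] += d[a] * d[b] / (n - len(labels))
    if priors is None:  # assumption: equal prior probabilities
        priors = {t: 1.0 / len(labels) for t in labels}
    return means, cov, priors


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def classify(x, means, cov, priors):
    """Return (predicted class, posterior probabilities) for one observation x."""
    gdist = {}
    for t, m in means.items():
        d = [xi - mi for xi, mi in zip(x, m)]
        # squared Mahalanobis distance plus the prior term -2 ln q_t
        gdist[t] = sum(di * zi for di, zi in zip(d, solve(cov, d))) - 2 * math.log(priors[t])
    # posterior p(t|x) = exp(-D_t^2 / 2) / sum_u exp(-D_u^2 / 2);
    # shifting by the smallest distance avoids underflow without changing the ratios
    base = min(gdist.values())
    w = {t: math.exp(-0.5 * (v - base)) for t, v in gdist.items()}
    total = sum(w.values())
    post = {t: v / total for t, v in w.items()}
    return max(post, key=post.get), post


# Fit on a hypothetical training set, then score a second data set
train = {"A": [[1, 2], [2, 3], [3, 3]], "B": [[6, 5], [7, 7], [8, 6]]}
model = fit(train)
second_data_set = [[2, 3], [7, 6]]
predictions = [classify(x, *model)[0] for x in second_data_set]
```

With a pooled covariance matrix the log-determinant term is the same for every class and drops out, which is why it does not appear in the distance here; a quadratic rule with separate within-class covariance matrices would need it.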

  • when the distribution within each group is assumed to be multivariate normal, the discriminant function is determined by a parametric method (a measure of generalized squared distance)
  • when no assumptions can be made about the distribution within each group, or when the distribution is assumed not to be multivariate normal, nonparametric methods are used to estimate the group-specific densities
  • nonparametric methods include the kernel and k-nearest-neighbor methods
  • uniform, normal, Epanechnikov, biweight, or triweight kernels can be used for density estimation
  • Mahalanobis or Euclidean distance can be used to determine proximity
  • Mahalanobis distance can be based on either the full covariance matrix or the diagonal matrix of variances
  • the pooled covariance matrix is used to calculate the Mahalanobis distances with a k-nearest-neighbor method
  • individual within-group covariance matrices or the pooled covariance matrix can be used to calculate the Mahalanobis distances with a kernel method
  • posterior probability estimates of group membership for each class can be evaluated
  • the performance of a discriminant criterion is evaluated by estimating error rates (probabilities of misclassification) in the classification of future observations
  • BY group processing enables you to obtain separate analyses on grouped observations
  • weighted analysis can be performed
  • a SAS data set can be created that corresponds to any output table
For further details, see the DISCRIM Procedure documentation.
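The nonparametric k-nearest-neighbor approach and error-rate estimation listed above can be sketched in Python. The sketch assumes Euclidean distance, equal default priors, and the common kNN density estimate in which the density of class t at x is proportional to k_t / n_t (k_t of the k nearest neighbors belong to class t, which has n_t training observations), so the posterior is proportional to q_t k_t / n_t. The error rate is estimated by leave-one-out cross validation, classifying each observation from all the others. Ties are broken arbitrarily, and the names and data are hypothetical rather than PROC DISCRIM's implementation.

```python
import math


def knn_posterior(x, data, k, priors=None):
    """Posterior class probabilities from k-nearest-neighbor density estimates.

    data is a list of (vector, label) pairs; Euclidean distance is used.
    """
    counts = {}
    for _, t in data:
        counts[t] = counts.get(t, 0) + 1
    if priors is None:  # assumption: equal prior probabilities
        priors = {t: 1.0 / len(counts) for t in counts}
    nearest = sorted(data, key=lambda item: math.dist(x, item[0]))[:k]
    k_t = {t: 0 for t in counts}
    for _, t in nearest:
        k_t[t] += 1
    # density of class t is proportional to k_t / n_t; the shared neighborhood volume cancels
    score = {t: priors[t] * k_t[t] / counts[t] for t in counts}
    total = sum(score.values())
    return {t: s / total for t, s in score.items()}


def leave_one_out_error(data, k):
    """Fraction of observations misclassified when each is held out in turn."""
    errors = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        post = knn_posterior(x, rest, k)
        if max(post, key=post.get) != label:
            errors += 1
    return errors / len(data)


# Hypothetical training data: (observation, class) pairs
data = [([1, 2], "A"), ([2, 3], "A"), ([3, 3], "A"),
        ([6, 5], "B"), ([7, 7], "B"), ([8, 6], "B")]
error_rate = leave_one_out_error(data, k=3)
```

Leave-one-out estimates are less optimistic than classifying the training data with a criterion built from those same observations, which is why they are a common choice for estimating error rates on future observations.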

STEPDISC Procedure


Given a classification variable and several quantitative variables, the STEPDISC procedure performs a stepwise discriminant analysis to select a subset of the quantitative variables for use in discriminating among the classes. Within each class, the quantitative variables are assumed to be multivariate normal, with a covariance matrix that is common to all classes. The following are highlights of the STEPDISC procedure's features:

  • selection methods include forward selection, backward elimination, and stepwise selection
  • variables are chosen to enter or leave the model according to one of two criteria:
    • the significance level of an F test from an analysis of covariance, where the variables already chosen act as covariates and the variable under consideration is the dependent variable
    • the squared partial correlation for predicting the variable under consideration from the CLASS variable, controlling for the effects of the variables already selected for the model
  • BY group processing enables you to obtain separate analyses on grouped observations
  • weighted analysis can be performed
  • a SAS data set can be created that corresponds to any output table
For further details, see the STEPDISC Procedure documentation.
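The first step of forward selection can be sketched in Python. At that step no variables have been selected yet, so the squared partial correlation for each candidate reduces to the R-square of a one-way analysis of variance on the class variable, R² = SSB/SST, and its F statistic is F = (SSB/(k - 1)) / (SSW/(n - k)). Because every candidate shares the same degrees of freedom at this step, ranking by R-square, by F, or by the F test's p-value picks the same variable. Later steps, where already-selected variables act as covariates, are not shown; the function names and data are illustrative.

```python
def anova_r2_f(values_by_class):
    """One-way ANOVA R-square and F statistic for a single variable.

    values_by_class maps each class label to that class's values of the variable.
    """
    all_vals = [v for vals in values_by_class.values() for v in vals]
    n, k = len(all_vals), len(values_by_class)
    grand = sum(all_vals) / n
    # between-class and total sums of squares
    ssb = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in values_by_class.values())
    sst = sum((v - grand) ** 2 for v in all_vals)
    ssw = sst - ssb  # assumes within-class variation is nonzero
    return ssb / sst, (ssb / (k - 1)) / (ssw / (n - k))


def first_forward_step(groups, names):
    """Choose the variable with the largest R-square (equivalently largest F) at step 1."""
    stats = {}
    for j, name in enumerate(names):
        by_class = {t: [row[j] for row in rows] for t, rows in groups.items()}
        stats[name] = anova_r2_f(by_class)
    best = max(stats, key=lambda nm: stats[nm][0])
    return best, stats


# Hypothetical data: two classes measured on variables x1 and x2
groups = {"A": [[1, 2], [2, 3], [3, 3]], "B": [[6, 5], [7, 7], [8, 6]]}
best, stats = first_forward_step(groups, ["x1", "x2"])
```

In the example, x1 separates the classes more sharply relative to its within-class spread (F = 37.5 versus F = 25 for x2), so it would enter first.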