SAS/STAT Software

DISCRIM Procedure

Given a set of observations that contains one or more quantitative variables and a classification variable that defines groups of observations, the DISCRIM procedure develops a discriminant criterion to classify each observation into one of the groups. The discriminant criterion derived from this data set can be applied to a second data set during the same execution of PROC DISCRIM. The following are highlights of the DISCRIM procedure's features:

  • when the distribution within each group is assumed to be multivariate normal, a parametric method is used to develop the discriminant function, which is determined by a measure of generalized squared distance
  • when no assumptions can be made about the distribution within each group, or when the distribution is assumed not to be multivariate normal, nonparametric methods are used to estimate the group-specific densities
  • nonparametric methods include the kernel and k-nearest-neighbor methods
  • uniform, normal, Epanechnikov, biweight, or triweight kernels can be used for density estimation
  • Mahalanobis or Euclidean distance can be used to determine proximity
  • Mahalanobis distance can be based on either the full covariance matrix or the diagonal matrix of variances
  • the pooled covariance matrix is used to calculate the Mahalanobis distances with a k-nearest-neighbor method
  • individual within-group covariance matrices or the pooled covariance matrix can be used to calculate the Mahalanobis distances with a kernel method
  • posterior probability estimates of group membership for each class can be evaluated
  • the performance of a discriminant criterion is evaluated by estimating error rates (probabilities of misclassification) in the classification of future observations
  • BY group processing is supported, which enables you to obtain separate analyses on grouped observations
  • weighted analysis is supported
  • a SAS data set can be created that corresponds to any output table
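
The parametric case in the list above can be illustrated with a short sketch. It is written in Python rather than SAS, and it is not how PROC DISCRIM is implemented. It assumes two quantitative variables, a pooled covariance matrix, and a posterior probability proportional to exp(-0.5 * D^2), where D^2 is the generalized squared distance to a group. With a pooled covariance matrix the log-determinant term is the same for every group, so it is omitted here. All function names and the sample data are hypothetical.

```python
import math

# Hypothetical training data: two groups, two quantitative variables each.
groups = {
    "A": [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (2.0, 2.5)],
    "B": [(6.0, 5.0), (7.0, 6.5), (6.5, 6.0), (7.5, 5.5)],
}

def mean_vector(rows):
    """Mean of each variable within one group."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def pooled_covariance(groups):
    """Pooled within-group covariance (2x2), divisor N minus number of groups."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    n_total = 0
    for rows in groups.values():
        m = mean_vector(rows)
        n_total += len(rows)
        for r in rows:
            for i in range(2):
                for j in range(2):
                    s[i][j] += (r[i] - m[i]) * (r[j] - m[j])
    df = n_total - len(groups)
    return [[s[i][j] / df for j in range(2)] for i in range(2)]

def inverse_2x2(a):
    """Inverse of a 2x2 matrix (assumes a nonzero determinant)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def generalized_sq_distance(x, mean, cov_inv, prior=None):
    """Mahalanobis distance squared, minus 2*ln(prior) when a prior is given."""
    d = [x[0] - mean[0], x[1] - mean[1]]
    dist = sum(d[i] * cov_inv[i][j] * d[j] for i in range(2) for j in range(2))
    if prior is not None:
        dist -= 2.0 * math.log(prior)
    return dist

def posteriors(x, groups, priors=None):
    """Posterior probability of each group for observation x."""
    cov_inv = inverse_2x2(pooled_covariance(groups))
    weights = {}
    for g, rows in groups.items():
        prior = None if priors is None else priors[g]
        d2 = generalized_sq_distance(x, mean_vector(rows), cov_inv, prior)
        weights[g] = math.exp(-0.5 * d2)
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

def classify(x, groups, priors=None):
    """Assign x to the group with the largest posterior probability."""
    post = posteriors(x, groups, priors)
    return max(post, key=post.get)

# Usage: classify two new observations with equal priors.
label_near_a = classify((1.8, 2.0), groups)
label_near_b = classify((6.8, 6.0), groups)
```

Passing a dictionary of prior probabilities, for example `{"A": 0.8, "B": 0.2}`, shifts the generalized squared distances and therefore the posterior probabilities. Using individual within-group covariance matrices instead of the pooled matrix would additionally require the log-determinant term for each group.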

For further details, see the DISCRIM Procedure documentation.