The CANDISC Procedure

Computational Details

General Formulas

Canonical discriminant analysis is equivalent to canonical correlation analysis between the quantitative variables and a set of dummy variables coded from the CLASS variable. In the following notation, the dummy variables are denoted by $\mb{y}$ and the quantitative variables are denoted by $\mb{x}$. The total sample covariance matrix for the $\mb{x}$ and $\mb{y}$ variables is

$\mb{S} = \left[\begin{matrix} \mb{S}_{xx} & \mb{S}_{xy} \cr \mb{S}_{yx} & \mb{S}_{yy} \end{matrix}\right]$

When $c$ is the number of groups, $n_t$ is the number of observations in group $t$, and $\mb{S}_t$ is the sample covariance matrix for the $\mb{x}$ variables in group $t$, the within-class pooled covariance matrix for the $\mb{x}$ variables is

$\mb{S}_p = \frac{1}{\sum_{t=1}^{c} n_t - c} \sum_{t=1}^{c} (n_t-1)\mb{S}_t$
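The pooled covariance formula above can be sketched in NumPy as follows. This is a minimal illustration, not the procedure's implementation: the function name `pooled_covariance`, the simulated data, and the integer class labels are all assumptions made for the example, and every group is assumed to contain at least two observations.

```python
import numpy as np

def pooled_covariance(X, labels):
    """Pooled within-class covariance S_p of the columns of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n, p = X.shape
    c = classes.size
    weighted_sum = np.zeros((p, p))
    for t in classes:
        X_t = X[labels == t]
        n_t = X_t.shape[0]
        # np.cov with rowvar=False uses the divisor n_t - 1, matching S_t
        S_t = np.atleast_2d(np.cov(X_t, rowvar=False))
        weighted_sum += (n_t - 1) * S_t
    # Divide by (sum of n_t) - c
    return weighted_sum / (n - c)

# Simulated data: 3 groups of 10 observations on 3 variables (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
labels = np.repeat([0, 1, 2], 10)
S_p = pooled_covariance(X, labels)
```

Equivalently, $\mb{S}_p$ is the cross-product matrix of the observations centered at their own group means, divided by the total number of observations minus $c$.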

The canonical correlations, $\rho_i$, are the square roots of the eigenvalues, $\lambda_i$, of the following matrix. The corresponding eigenvectors are $\mb{v}_i$.

${\mb{S}_ p}^{-1/2}\mb{S}_{xy}{\mb{S}_{yy}}^{-1}\mb{S}_{yx}{\mb{S}_ p}^{-1/2}$

Let $\mb{V}$ be the matrix that contains the eigenvectors $\mb{v}_ i$ that correspond to nonzero eigenvalues as columns. The raw canonical coefficients are calculated as follows:

$\mb{R} = {\mb{S}_ p}^{-1/2}\mb{V}$
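As a rough sketch of the eigenvector step and the raw coefficients, the NumPy code below follows the matrix expressions as written. Several things here are assumptions for illustration: the helper `inv_sqrt_sym`, the simulated data, the tolerance used to decide which eigenvalues count as nonzero, and in particular the coding of the CLASS variable as $c-1$ dummy variables (dropping the last class) so that $\mb{S}_{yy}$ is invertible.

```python
import numpy as np

def inv_sqrt_sym(A):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    w, Q = np.linalg.eigh(A)
    return (Q / np.sqrt(w)) @ Q.T

# Simulated data: c = 3 groups with shifted means (illustrative only)
rng = np.random.default_rng(1)
n_per, c, p = 20, 3, 4
labels = np.repeat(np.arange(c), n_per)
X = rng.normal(size=(c * n_per, p)) + labels[:, None] * np.array([1.0, 0.5, 0.0, -0.5])
n = X.shape[0]

# Assumption: c - 1 dummy variables, so that S_yy can be inverted directly
Y = (labels[:, None] == np.arange(c - 1)).astype(float)

# Total sample covariance matrix of (x, y), partitioned into blocks
S = np.cov(np.hstack([X, Y]), rowvar=False)
S_xy = S[:p, p:]
S_yy = S[p:, p:]

# Pooled within-class covariance from group-centered observations
group_means = np.vstack([X[labels == t].mean(axis=0) for t in range(c)])
resid = X - group_means[labels]
S_p = resid.T @ resid / (n - c)

# S_p^{-1/2} S_xy S_yy^{-1} S_yx S_p^{-1/2}, symmetrized against rounding error
S_p_inv_half = inv_sqrt_sym(S_p)
M = S_p_inv_half @ S_xy @ np.linalg.solve(S_yy, S_xy.T) @ S_p_inv_half
M = (M + M.T) / 2

eigvals, eigvecs = np.linalg.eigh(M)
order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
keep = eigvals > 1e-10 * eigvals[0]          # assumed tolerance for "nonzero"
V = eigvecs[:, keep]

R = S_p_inv_half @ V                         # raw canonical coefficients
```

Because the columns of $\mb{V}$ are orthonormal, $\mb{R}'\mb{S}_p\mb{R}$ is an identity matrix, so canonical variables built from $\mb{R}$ have unit pooled within-class variance. The sign of each column of `R` is arbitrary in this sketch.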

The pooled within-class standardized canonical coefficients are

$\mb{P} = \mr{diag}(\mb{S}_ p)^{1/2}\mb{R}$

The total sample standardized canonical coefficients are

$\mb{T} = \mr{diag}(\mb{S}_{xx})^{1/2}\mb{R}$

Let $\mb{X}_ c$ be the matrix that contains the centered $\mb{x}$ variables as columns. The canonical scores can be calculated by any of the following:

$\mb{X}_ c \, \mb{R}$

$\mb{X}_ c \, \mr{diag}(\mb{S}_ p)^{-1/2}\mb{P}$

$\mb{X}_ c \, \mr{diag}(\mb{S}_{xx})^{-1/2}\mb{T}$
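The three expressions agree because each standardization is undone by the matching rescaling of the centered variables. The short NumPy check below illustrates this algebra only; the matrices `S_p` and `R` are arbitrary placeholders (not the result of a real analysis), and $\mr{diag}(\cdot)^{1/2}$ is read as the diagonal matrix of square roots of the diagonal elements.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 12, 3, 2
X = rng.normal(size=(n, p))
X_c = X - X.mean(axis=0)                 # centered x variables

# Placeholder inputs (assumptions): any positive definite S_p and any R work here
A = rng.normal(size=(p, p))
S_p = A @ A.T + p * np.eye(p)
S_xx = np.cov(X, rowvar=False)
R = rng.normal(size=(p, k))

d_p = np.sqrt(np.diag(S_p))              # pooled within-class standard deviations
d_t = np.sqrt(np.diag(S_xx))             # total sample standard deviations

P = d_p[:, None] * R                     # diag(S_p)^{1/2} R
T = d_t[:, None] * R                     # diag(S_xx)^{1/2} R

scores_raw = X_c @ R
scores_pooled = (X_c / d_p) @ P          # X_c diag(S_p)^{-1/2} P
scores_total = (X_c / d_t) @ T           # X_c diag(S_xx)^{-1/2} T
```

All three `scores_*` arrays are equal up to floating-point rounding, so the choice among them depends only on whether the input variables have already been standardized.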

For the multivariate tests based on $\mb{E}^{-1}\mb{H}$,

$\mb{E} = (n-1)(\mb{S}_{yy} - \mb{S}_{yx}\mb{S}_{xx}^{-1}\mb{S}_{xy})$

$\mb{H} = (n-1)\mb{S}_{yx}\mb{S}_{xx}^{-1}\mb{S}_{xy}$

where n is the total number of observations.
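A small NumPy sketch of these two matrices follows. It assumes simulated data and, as a labeled assumption, codes the CLASS variable as $c-1$ dummy variables so that the matrices are of full rank; the variable names are hypothetical. The eigenvalues of $\mb{E}^{-1}\mb{H}$ are obtained from an equivalent symmetric problem through a Cholesky factor of $\mb{E}$.

```python
import numpy as np

# Simulated data: c = 3 groups with shifted means (illustrative only)
rng = np.random.default_rng(3)
c, n_per, p = 3, 15, 2
labels = np.repeat(np.arange(c), n_per)
X = rng.normal(size=(c * n_per, p)) + labels[:, None] * np.array([1.0, -0.5])
n = X.shape[0]

# Assumption: c - 1 dummy variables (last class dropped)
Y = (labels[:, None] == np.arange(c - 1)).astype(float)

# Partition the total sample covariance matrix of (x, y)
S = np.cov(np.hstack([X, Y]), rowvar=False)
S_xx, S_xy = S[:p, :p], S[:p, p:]
S_yx, S_yy = S[p:, :p], S[p:, p:]

explained = S_yx @ np.linalg.solve(S_xx, S_xy)   # S_yx S_xx^{-1} S_xy
H = (n - 1) * explained
E = (n - 1) * (S_yy - explained)

# Eigenvalues of E^{-1} H via the symmetric matrix L^{-1} H L^{-T}, where E = L L'
L = np.linalg.cholesky(E)
Z = np.linalg.solve(L, H)
sym = np.linalg.solve(L, Z.T)
eigenvalues = np.linalg.eigvalsh((sym + sym.T) / 2)[::-1]   # largest first
```

By construction, `E + H` equals $(n-1)\mb{S}_{yy}$, which gives a quick consistency check on the two matrices.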