The CANDISC Procedure

Computational Details


General Formulas

Canonical discriminant analysis is equivalent to canonical correlation analysis between the quantitative variables and a set of dummy variables coded from the CLASS variable. In the following notation, the dummy variables are denoted by $\mb{y}$ and the quantitative variables are denoted by $\mb{x}$. The total sample covariance matrix for the $\mb{x}$ and $\mb{y}$ variables is

\[ \mb{S} = \left[\begin{matrix} \mb{S}_{xx} & \mb{S}_{xy} \cr \mb{S}_{yx} & \mb{S}_{yy} \end{matrix}\right] \]

When $c$ is the number of groups, $n_ t$ is the number of observations in group $t$, and $\mb{S}_ t$ is the sample covariance matrix for the $\mb{x}$ variables in group $t$, the within-class pooled covariance matrix for the $\mb{x}$ variables is

\[ \mb{S}_ p = \frac{1}{\sum_{t=1}^{c} n_ t-c} \sum_{t=1}^{c} (n_ t-1)\mb{S}_ t \]
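
The pooled covariance formula can be sketched in a few lines of NumPy. This is an illustration, not the procedure's own implementation: the function name `pooled_covariance`, the simulated data, and the requirement that every group has at least two observations are assumptions made for the example.

```python
import numpy as np

def pooled_covariance(X, labels):
    """Pooled within-class covariance: sum((n_t - 1) * S_t) / (sum(n_t) - c).

    Assumes every group has at least two observations (hypothetical helper).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    p = X.shape[1]
    weighted_sum = np.zeros((p, p))
    for t in classes:
        X_t = X[labels == t]
        n_t = X_t.shape[0]
        # np.cov uses the divisor n_t - 1, matching the sample covariance S_t
        S_t = np.atleast_2d(np.cov(X_t, rowvar=False))
        weighted_sum += (n_t - 1) * S_t
    return weighted_sum / (X.shape[0] - classes.size)

# Simulated data: 3 groups of 10 observations on 3 quantitative variables
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
labels = np.repeat([0, 1, 2], 10)
S_p = pooled_covariance(X, labels)
```

Because $(n_ t-1)\mb{S}_ t$ is the within-group sum of squares and crossproducts, the same matrix results from centering each group at its own mean and dividing the combined crossproducts by $\sum n_ t - c$.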

The canonical correlations, $\rho _ i$, are the square roots of the eigenvalues, $\lambda _ i$, of the following matrix. The corresponding eigenvectors are $\mb{v}_ i$.

\[ {\mb{S}_ p}^{-1/2}\mb{S}_{xy}{\mb{S}_{yy}}^{-1}\mb{S}_{yx}{\mb{S}_ p}^{-1/2} \]

Let $\mb{V}$ be the matrix that contains the eigenvectors $\mb{v}_ i$ that correspond to nonzero eigenvalues as columns. The raw canonical coefficients are calculated as follows:

\[ \mb{R} = {\mb{S}_ p}^{-1/2}\mb{V} \]

The pooled within-class standardized canonical coefficients are

\[ \mb{P} = \mr{diag}(\mb{S}_ p)^{1/2}\mb{R} \]

The total sample standardized canonical coefficients are

\[ \mb{T} = \mr{diag}(\mb{S}_{xx})^{1/2}\mb{R} \]
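
The steps from the eigendecomposition through the three coefficient matrices can be sketched with NumPy as follows. Several choices here are assumptions rather than anything stated above: the helper names (`inv_sqrt_sym`, `pooled_cov`, `candisc_coefficients`), the use of one dummy column per class together with a pseudoinverse for $\mb{S}_{yy}$ (which is singular under that coding because the dummies sum to one), the symmetric inverse square root for ${\mb{S}_ p}^{-1/2}$, and the relative tolerance used to decide which eigenvalues count as nonzero.

```python
import numpy as np

def inv_sqrt_sym(A):
    """Symmetric inverse square root of a positive definite matrix."""
    w, Q = np.linalg.eigh(A)
    return (Q / np.sqrt(w)) @ Q.T

def pooled_cov(X, labels):
    """Pooled within-class covariance matrix S_p."""
    classes = np.unique(labels)
    p = X.shape[1]
    W = np.zeros((p, p))
    for t in classes:
        D = X[labels == t] - X[labels == t].mean(axis=0)
        W += D.T @ D  # equals (n_t - 1) * S_t
    return W / (X.shape[0] - classes.size)

def candisc_coefficients(X, labels, tol=1e-10):
    """Return (eigenvalues, R, P, T) following the formulas in the text."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    p = X.shape[1]
    # Dummy variables y: one 0/1 column per class (assumed coding)
    Y = (labels[:, None] == classes[None, :]).astype(float)
    S = np.cov(np.hstack([X, Y]), rowvar=False)
    S_xx, S_xy, S_yy = S[:p, :p], S[:p, p:], S[p:, p:]
    S_p = pooled_cov(X, labels)
    S_p_ih = inv_sqrt_sym(S_p)
    # pinv stands in for the inverse because S_yy is singular here (assumption)
    M = S_p_ih @ S_xy @ np.linalg.pinv(S_yy) @ S_xy.T @ S_p_ih
    lam, V = np.linalg.eigh((M + M.T) / 2)  # symmetrize against round-off
    order = np.argsort(lam)[::-1]
    lam, V = lam[order], V[:, order]
    keep = lam > tol * lam[0]               # eigenvectors for nonzero eigenvalues
    lam, V = lam[keep], V[:, keep]
    R = S_p_ih @ V                          # raw coefficients
    P = np.sqrt(np.diag(S_p))[:, None] * R  # pooled within-class standardized
    T = np.sqrt(np.diag(S_xx))[:, None] * R # total sample standardized
    return lam, R, P, T

# Simulated data: 3 groups of 15 observations on 4 variables
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 15)
X = rng.normal(size=(45, 4)) + labels[:, None] * np.array([1.0, 0.5, 0.0, -0.5])
lam, R, P, T = candisc_coefficients(X, labels)
```

With this construction the columns of $\mb{V}$ are orthonormal, so $\mb{R}'\mb{S}_ p\mb{R}$ is an identity matrix, which gives a convenient numerical check. The number of retained columns is at most the smaller of the number of variables and $c-1$.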

Let $\mb{X}_ c$ be the matrix that contains the centered $\mb{x}$ variables as columns. The canonical scores can be calculated by any of the following:

\[ \mb{X}_ c \, \mb{R} \]
\[ \mb{X}_ c \, \mr{diag}(\mb{S}_ p)^{-1/2}\mb{P} \]
\[ \mb{X}_ c \, \mr{diag}(\mb{S}_{xx})^{-1/2}\mb{T} \]
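
The three expressions agree because the diagonal scaling applied to the centered data cancels the scaling built into $\mb{P}$ and $\mb{T}$. The NumPy sketch below checks this numerically. Since the identity holds for any coefficient matrix, `R` here is a random stand-in, not a fitted set of canonical coefficients; the data and all variable names are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 20, 3, 2
X = rng.normal(size=(n, p))
labels = np.repeat([0, 1], n // 2)

X_c = X - X.mean(axis=0)             # centered x variables
S_xx = np.cov(X, rowvar=False)       # total sample covariance

# Pooled within-class covariance S_p for the two groups
W = np.zeros((p, p))
for t in np.unique(labels):
    D = X[labels == t] - X[labels == t].mean(axis=0)
    W += D.T @ D
S_p = W / (n - 2)

R = rng.normal(size=(p, k))          # stand-in for raw coefficients (assumption)
d_p = np.sqrt(np.diag(S_p))          # pooled within-class standard deviations
d_t = np.sqrt(np.diag(S_xx))         # total sample standard deviations
P = d_p[:, None] * R                 # diag(S_p)^{1/2} R
T = d_t[:, None] * R                 # diag(S_xx)^{1/2} R

scores_raw = X_c @ R
scores_pooled = (X_c / d_p) @ P      # X_c diag(S_p)^{-1/2} P
scores_total = (X_c / d_t) @ T       # X_c diag(S_xx)^{-1/2} T
```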

For the multivariate tests based on $\mb{E}^{-1}\mb{H}$,

\[ \mb{E} = (n-1)(\mb{S}_{yy} - \mb{S}_{yx}\mb{S}_{xx}^{-1}\mb{S}_{xy}) \]
\[ \mb{H} = (n-1)\mb{S}_{yx}\mb{S}_{xx}^{-1}\mb{S}_{xy} \]

where $n$ is the total number of observations.
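
As a rough numerical illustration, $\mb{E}$ and $\mb{H}$ can be formed directly from the partitioned covariance matrix. The sketch below codes the CLASS variable with $c-1$ dummy columns so that $\mb{E}$ is invertible; that coding, the simulated data, and the variable names are assumptions made for the example. Wilks' lambda is shown as one statistic based on $\mb{E}^{-1}\mb{H}$, using the standard identity $\det(\mb{E})/\det(\mb{E}+\mb{H})$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, c = 30, 3, 3
labels = np.repeat(np.arange(c), n // c)
X = rng.normal(size=(n, p)) + labels[:, None]   # groups differ in mean

# c - 1 dummy columns (last class is the reference level; assumed coding)
Y = (labels[:, None] == np.arange(c - 1)[None, :]).astype(float)

S = np.cov(np.hstack([X, Y]), rowvar=False)
S_xx, S_xy = S[:p, :p], S[:p, p:]
S_yx, S_yy = S[p:, :p], S[p:, p:]

explained = S_yx @ np.linalg.solve(S_xx, S_xy)  # S_yx S_xx^{-1} S_xy
H = (n - 1) * explained
E = (n - 1) * (S_yy - explained)

# Eigenvalues of E^{-1} H and Wilks' lambda
eigvals = np.linalg.eigvals(np.linalg.solve(E, H)).real
wilks = np.linalg.det(E) / np.linalg.det(E + H)
```

By construction $\mb{E} + \mb{H} = (n-1)\mb{S}_{yy}$, and Wilks' lambda equals the product of $1/(1+\xi _ i)$ over the eigenvalues $\xi _ i$ of $\mb{E}^{-1}\mb{H}$.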