Consider an artificial data set with two classes of observations indicated by 'H' and 'O'. The following statements generate and plot the data:

    data random;
       drop n;
       Group = 'H';
       do n = 1 to 20;
          x = 4.5 + 2 * normal(57391);
          y = x + .5 + normal(57391);
          output;
       end;
       Group = 'O';
       do n = 1 to 20;
          x = 6.25 + 2 * normal(57391);
          y = x - 1 + normal(57391);
          output;
       end;
    run;

    proc sgplot noautolegend;
       scatter y=y x=x / markerchar=group group=group;
    run;

The plot is shown in Figure 10.1.

The following statements perform a canonical discriminant analysis and display the results in Figure 10.2:

    proc candisc anova;
       class Group;
       var x y;
    run;

Figure 10.2: Contrasting Univariate and Multivariate Analyses

The CANDISC Procedure

Univariate Test Statistics

F Statistics, Num DF=1, Den DF=38

| Variable | Total Standard Deviation | Pooled Standard Deviation | Between Standard Deviation | R-Square | R-Square / (1-RSq) | F Value | Pr > F |
|---|---|---|---|---|---|---|---|
| x | 2.1776 | 2.1498 | 0.6820 | 0.0503 | 0.0530 | 2.01 | 0.1641 |
| y | 2.4215 | 2.4486 | 0.2047 | 0.0037 | 0.0037 | 0.14 | 0.7105 |


| | Canonical Correlation | Adjusted Canonical Correlation | Approximate Standard Error | Squared Canonical Correlation |
|---|---|---|---|---|
| 1 | 0.598300 | 0.589467 | 0.102808 | 0.357963 |

Eigenvalues of Inv(E)*H = CanRsq/(1-CanRsq)

| | Eigenvalue | Difference | Proportion | Cumulative |
|---|---|---|---|---|
| 1 | 0.5575 | | 1.0000 | 1.0000 |

Test of H0: The canonical correlations in the current row and all that follow are zero

| | Likelihood Ratio | Approximate F Value | Num DF | Den DF | Pr > F |
|---|---|---|---|---|---|
| 1 | 0.64203704 | 10.31 | 2 | 37 | 0.0003 |

Note: The F statistic is exact.
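The entries in the canonical correlation output are tied together by simple identities, which can serve as a quick consistency check. With two classes there is only one canonical correlation, so the likelihood ratio reduces to one minus the squared canonical correlation. The symbols $r_c^2$ (squared canonical correlation), $\lambda$ (eigenvalue), $\Lambda$ (likelihood ratio), $N$ (number of observations), and $p$ (number of variables) are introduced here for illustration and do not appear in the output:

$$
\begin{aligned}
\lambda &= \frac{r_c^2}{1 - r_c^2} = \frac{0.357963}{0.642037} \approx 0.5575, \\
\Lambda &= 1 - r_c^2 \approx 0.642037, \\
F &= \frac{1 - \Lambda}{\Lambda} \cdot \frac{N - p - 1}{p} \approx 0.5575 \times \frac{37}{2} \approx 10.31,
\end{aligned}
$$

with $N = 40$ and $p = 2$, which gives the 2 and 37 degrees of freedom reported for the test. The univariate F values follow the same pattern with 1 and 38 degrees of freedom: $0.0530 \times 38 \approx 2.01$ for `x` and $0.0037 \times 38 \approx 0.14$ for `y`.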

The univariate R squares are very small, 0.0503 for `x` and 0.0037 for `y`, and neither variable shows a significant difference between the classes at the 0.10 level.

The multivariate test for differences between the classes is significant at the 0.0003 level. Thus, the multivariate analysis has found a highly significant difference, whereas the univariate analyses failed to achieve even the 0.10 level. The raw canonical coefficients for the first canonical variable, `Can1`, show that the classes differ most widely on the linear combination -1.205756217 `x` + 1.010412967 `y`, or approximately `y` - 1.2 `x`. The R square between `Can1` and the CLASS variable is 0.357963 as given by the squared canonical correlation, which is much higher than either univariate R square.
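One way to see the R square between `Can1` and the CLASS variable directly is to save the canonical variable scores and analyze `Can1` as an ordinary response. The following is a minimal sketch, not part of the original example; the output data set name `Scores` is a placeholder chosen here.

```sas
/* Sketch: save canonical variable scores, then treat Can1 as a single response. */
/* "Scores" is a hypothetical data set name introduced for this illustration.    */
proc candisc data=random out=Scores noprint;
   class Group;
   var x y;
run;

/* One-way ANOVA of Can1 on Group. The R-Square reported here should agree  */
/* with the squared canonical correlation from PROC CANDISC.                */
proc glm data=Scores;
   class Group;
   model Can1 = Group;
run;
quit;
```

Only the R square from this step is comparable with the PROC CANDISC output. The F test that PROC GLM prints treats `Can1` as a variable fixed in advance and ignores that its coefficients were chosen to maximize the separation between classes, so the multivariate test above remains the appropriate one.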

In this example, the variables are highly correlated within classes. If the within-class correlation were smaller, there would be greater agreement between the univariate and multivariate analyses.
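The within-class correlation can be inspected with the PCORR option, and the claim about agreement can be explored with a modified data set. The sketch below is an illustration under stated assumptions: the data set `random2` is hypothetical, and it generates `y` independently of `x` while keeping approximately the same class means (5 and 5.25) and standard deviation (about 2.24) that `y` has in the original data step. No results are shown because they are not part of the original example.

```sas
/* Print the pooled within-class correlation of x and y for the original data. */
proc candisc data=random pcorr;
   class Group;
   var x y;
run;

/* Hypothetical variant: y no longer depends on x, so the within-class */
/* correlation is close to zero. Class means and spread of y are kept  */
/* approximately equal to those implied by the original data step.     */
data random2;
   drop n;
   Group = 'H';
   do n = 1 to 20;
      x = 4.5 + 2 * normal(57391);
      y = 5 + 2.24 * normal(57391);
      output;
   end;
   Group = 'O';
   do n = 1 to 20;
      x = 6.25 + 2 * normal(57391);
      y = 5.25 + 2.24 * normal(57391);
      output;
   end;
run;

/* Compare the univariate and multivariate tests for the variant. */
proc candisc data=random2 anova pcorr;
   class Group;
   var x y;
run;
```

Comparing the univariate and multivariate p-values from the two runs gives a concrete sense of how much of the original discrepancy is attributable to the within-class correlation.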