Introduction to Discriminant Procedures

Example: Contrasting Univariate and Multivariate Analyses

Consider an artificial data set with two classes of observations indicated by 'H' and 'O'. The following statements generate and plot the data:

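data random;
   drop n;

   /* NORMAL(seed) returns standard normal deviates; the seed makes the stream reproducible */

   /* Class H: 20 observations; y is on average 0.5 above x */
   Group = 'H';
   do n = 1 to 20;
      x = 4.5 + 2 * normal(57391);
      y = x + .5 + normal(57391);
      output;
   end;

   /* Class O: 20 observations with larger x; y is on average 1 below x */
   Group = 'O';
   do n = 1 to 20;
      x = 6.25 + 2 * normal(57391);
      y = x - 1 + normal(57391);
      output;
   end;

run;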

proc sgplot noautolegend;
   scatter y=y x=x / markerchar=group group=group;
run;

The plot is shown in Figure 10.1.

Figure 10.1: Groups for Contrasting Univariate and Multivariate Analyses

The following statements perform a canonical discriminant analysis and display the results in Figure 10.2:

proc candisc anova;   /* ANOVA also requests univariate tests for each variable */
   class Group;
   var x y;
run;

Figure 10.2: Contrasting Univariate and Multivariate Analyses

The CANDISC Procedure

Total Sample Size	40	DF Total	39
Variables	2	DF Within Classes	38
Classes	2	DF Between Classes	1

Number of Observations Read	40
Number of Observations Used	40

Class Level Information
Group	Variable Name	Frequency	Weight	Proportion
H	H	20	20.0000	0.500000
O	O	20	20.0000	0.500000

Univariate Test Statistics
F Statistics, Num DF=1, Den DF=38
Variable	Total Standard Deviation	Pooled Standard Deviation	Between Standard Deviation	R-Square	R-Square / (1-RSq)	F Value	Pr > F
x	2.1776	2.1498	0.6820	0.0503	0.0530	2.01	0.1641
y	2.4215	2.4486	0.2047	0.0037	0.0037	0.14	0.7105

Average R-Square
Unweighted	0.0269868
Weighted by Variance	0.0245201

Multivariate Statistics and Exact F Statistics
S=1 M=0 N=17.5
Statistic	Value	F Value	Num DF	Den DF	Pr > F
Wilks' Lambda	0.64203704	10.31	2	37	0.0003
Pillai's Trace	0.35796296	10.31	2	37	0.0003
Hotelling-Lawley Trace	0.55754252	10.31	2	37	0.0003
Roy's Greatest Root	0.55754252	10.31	2	37	0.0003

Eigenvalues of Inv(E)*H = CanRsq/(1-CanRsq); Test of H0: The canonical correlations in the current row and all that follow are zero
	Canonical Correlation	Adjusted Canonical Correlation	Approximate Standard Error	Squared Canonical Correlation	Eigenvalue	Difference	Proportion	Cumulative	Likelihood Ratio	Approximate F Value	Num DF	Den DF	Pr > F
1	0.598300	0.589467	0.102808	0.357963	0.5575		1.0000	1.0000	0.64203704	10.31	2	37	0.0003

Note: The F statistic is exact.

Total Canonical Structure
Variable	Can1
x	-0.374883
y	0.101206

Between Canonical Structure
Variable	Can1
x	-1.000000
y	1.000000

Pooled Within Canonical Structure
Variable	Can1
x	-0.308237
y	0.081243

Total-Sample Standardized Canonical Coefficients
Variable	Can1
x	-2.625596855
y	2.446680169

Pooled Within-Class Standardized Canonical Coefficients
Variable	Can1
x	-2.592150014
y	2.474116072

Raw Canonical Coefficients
Variable	Can1
x	-1.205756217
y	1.010412967

Class Means on Canonical Variables
Group	Can1
H	0.7277811475
O	-.7277811475

The univariate R squares are very small, 0.0503 for x and 0.0037 for y, and neither variable shows a significant difference between the classes at the 0.10 level (p = 0.1641 for x and p = 0.7105 for y).
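With two classes, each univariate F value in the table is R-Square/(1-RSq) multiplied by the ratio of within-class to between-class degrees of freedom (38/1 here). The short Python sketch below is only an arithmetic check of that relationship using the printed R-square values; the function name `univariate_f` is hypothetical and not part of SAS.

```python
def univariate_f(r_square, df_between, df_within):
    """One-way ANOVA F value written in terms of R-square."""
    return r_square / (1.0 - r_square) * df_within / df_between


# R-square values copied from the Univariate Test Statistics table;
# DF Between Classes = 1 and DF Within Classes = 38 for these data.
for name, rsq in (("x", 0.0503), ("y", 0.0037)):
    print(name, round(univariate_f(rsq, df_between=1, df_within=38), 2))
```

This prints 2.01 for x and 0.14 for y, matching the F Value column, so the small R squares translate directly into small F statistics.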

The multivariate test for differences between the classes is significant at the 0.0003 level. Thus, the multivariate analysis has found a highly significant difference, whereas the univariate analyses failed to achieve even the 0.10 level. The raw canonical coefficients for the first canonical variable, Can1, show that the classes differ most widely on the linear combination -1.205756217 x + 1.010412967 y or approximately y - 1.2 x. The R square between Can1 and the CLASS variable is 0.357963 as given by the squared canonical correlation, which is much higher than either univariate R square.
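With only two classes there is a single canonical variable (S=1 in the output), so all four multivariate statistics are functions of the squared canonical correlation and share one exact F value. The Python sketch below is an illustrative check, not part of PROC CANDISC: it recomputes those statistics from the printed value 0.357963, and then recovers the same eigenvalue from the class means on Can1, which works because the raw canonical coefficients scale Can1 to unit pooled within-class variance. The function names are hypothetical.

```python
def two_class_statistics(can_rsq, n_vars, df_within):
    """Multivariate statistics when there is one canonical variable (S=1, two classes)."""
    eigenvalue = can_rsq / (1.0 - can_rsq)  # eigenvalue of Inv(E)*H
    stats = {
        "Wilks' Lambda": 1.0 - can_rsq,
        "Pillai's Trace": can_rsq,
        "Hotelling-Lawley Trace": eigenvalue,
        "Roy's Greatest Root": eigenvalue,
    }
    num_df = n_vars                  # (number of classes - 1) * n_vars, with 2 classes
    den_df = df_within - n_vars + 1
    f_value = eigenvalue * den_df / num_df  # the common exact F value
    return stats, f_value, num_df, den_df


def eigenvalue_from_class_means(mean_1, mean_2, n_1, n_2, df_within):
    """Between/within sum-of-squares ratio for a canonical variable with unit pooled within-class variance."""
    between_ss = n_1 * n_2 / (n_1 + n_2) * (mean_1 - mean_2) ** 2
    within_ss = df_within * 1.0  # pooled within-class variance of Can1 is 1
    return between_ss / within_ss


stats, f_value, num_df, den_df = two_class_statistics(0.357963, n_vars=2, df_within=38)
for name, value in stats.items():
    print(f"{name:24s}{value:.6f}")
print(f"F = {f_value:.2f} with {num_df} and {den_df} DF")

# Class means on Can1 copied from the output table
print(round(eigenvalue_from_class_means(0.7277811475, -0.7277811475, 20, 20, 38), 6))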

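In this example, the variables are highly correlated within classes. Because of this correlation, much of the within-class variation in x and y cancels in the combination y - 1.2 x, while the difference between the class means does not, so a difference that is weak on each variable separately is strong along Can1. If the within-class correlation were smaller, there would be greater agreement between the univariate and multivariate analyses.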
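To see why the within-class correlation matters, the following Python sketch generates data shaped like the DATA step above, once with y tied to x within each class and once with y independent of x (same class means and variances), and runs a two-class analysis on each. It is an illustrative analog, not PROC CANDISC: it uses Python's `random` module instead of SAS's NORMAL function, so its numbers will not match Figure 10.2, and `simulate` and `two_group_analysis` are hypothetical helper names. For two classes the multivariate test can be computed from Hotelling's T² with the pooled within-class covariance matrix W, and the discriminant direction is proportional to W^-1 times the difference of the class mean vectors; the sketch rescales that direction to unit pooled within-class variance, the same normalization as the raw canonical coefficients, although the sign may differ.

```python
import random
import statistics


def simulate(correlated, seed=57391, n=20):
    """Hypothetical Python analog of the DATA step; it does not reproduce SAS's NORMAL stream."""
    rng = random.Random(seed)
    data = {}
    for group, mean_x, shift in (("H", 4.5, 0.5), ("O", 6.25, -1.0)):
        rows = []
        for _ in range(n):
            x = mean_x + 2 * rng.gauss(0, 1)
            if correlated:
                y = x + shift + rng.gauss(0, 1)  # y follows x within the class
            else:
                # same class mean and variance for y, but independent of x
                y = mean_x + shift + 5 ** 0.5 * rng.gauss(0, 1)
            rows.append((x, y))
        data[group] = rows
    return data


def two_group_analysis(g1, g2):
    """Univariate F tests, Hotelling's T^2, its exact F, and raw discriminant coefficients."""
    n1, n2 = len(g1), len(g2)
    df_within = n1 + n2 - 2
    m1 = [statistics.fmean(col) for col in zip(*g1)]
    m2 = [statistics.fmean(col) for col in zip(*g2)]
    # pooled within-class covariance matrix W
    w = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((g1, m1), (g2, m2)):
        for row in rows:
            dev = [row[k] - m[k] for k in range(2)]
            for i in range(2):
                for j in range(2):
                    w[i][j] += dev[i] * dev[j] / df_within
    d = [m1[k] - m2[k] for k in range(2)]
    scale = n1 * n2 / (n1 + n2)
    univariate_f = [scale * d[k] ** 2 / w[k][k] for k in range(2)]  # squared pooled t
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    w_inv = [[w[1][1] / det, -w[0][1] / det], [-w[1][0] / det, w[0][0] / det]]
    a = [w_inv[i][0] * d[0] + w_inv[i][1] * d[1] for i in range(2)]  # W^-1 d
    t2 = scale * (d[0] * a[0] + d[1] * a[1])  # Hotelling's T^2
    p = 2
    multivariate_f = (df_within - p + 1) / (p * df_within) * t2
    # scale W^-1 d so the discriminant score has pooled within-class variance 1
    within_var = sum(a[i] * w[i][j] * a[j] for i in range(2) for j in range(2))
    raw_coef = [ak / within_var ** 0.5 for ak in a]
    r_within = w[0][1] / (w[0][0] * w[1][1]) ** 0.5
    return univariate_f, t2, multivariate_f, raw_coef, r_within


for correlated in (True, False):
    data = simulate(correlated)
    uni_f, t2, multi_f, coef, r = two_group_analysis(data["H"], data["O"])
    print(f"correlated={correlated}: within-class r={r:.2f}, "
          f"univariate F={uni_f[0]:.2f} (x), {uni_f[1]:.2f} (y), "
          f"multivariate F={multi_f:.2f}, raw coefficients={coef[0]:.3f}, {coef[1]:.3f}")
```

Whatever the seed, T² can never be smaller than either univariate F value, because each univariate test is the T² criterion restricted to a single variable; how much larger it is depends on how the class difference lines up with the within-class correlation.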