The CANDISC Procedure

Output Data Sets

OUT= Data Set

The OUT= data set contains all the variables in the original data set plus new variables that contain the canonical variable scores. You determine the number of new variables by using the NCAN= option. The names of the new variables are formed as described for the PREFIX= option. The new variables have means equal to 0 and pooled within-class variances equal to 1. You cannot create an OUT= data set if the DATA= data set is not an ordinary SAS data set (for example, if it is a TYPE=CORR data set), because such a data set contains summary statistics rather than individual observations to score.
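The following sketch shows one way to request an OUT= data set. It assumes the SASHELP.IRIS sample data set that ships with SAS, and the output name CanScores is illustrative. Because NCAN=2 and PREFIX=Can are specified, the new score variables should be named Can1 and Can2.

```sas
/* Sketch: assumes SASHELP.IRIS; CanScores is an illustrative name */
proc candisc data=sashelp.iris out=CanScores ncan=2 prefix=Can noprint;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* The overall means of Can1 and Can2 should be 0 (up to rounding). */
/* PROC MEANS reports total-sample standard deviations, which are   */
/* not the pooled within-class values that are scaled to 1.         */
proc means data=CanScores mean std;
   var Can1 Can2;
run;
```

Note that a plain PROC MEANS run cannot confirm the unit pooled within-class variance; it only shows the zero means and the total-sample spread.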

OUTSTAT= Data Set

The OUTSTAT= data set is similar to the TYPE=CORR data set that the CORR procedure produces, but it contains many statistics that the CORR procedure does not produce.

The OUTSTAT= data set is TYPE=CORR, and it contains the following variables:

the BY variables, if any
the CLASS variable
_TYPE_, a character variable of length 8 that identifies the type of statistic
_NAME_, a character variable of length 32 that identifies the row of the matrix or the name of the canonical variable
the quantitative variables (those in the VAR statement, or if there is no VAR statement, all numeric variables not listed in any other statement)

Observations that are identified by the _TYPE_ variable alone have the following _TYPE_ values:

_TYPE_: Contents
N: number of observations both for the total sample (CLASS variable missing) and within each class (CLASS variable present)
SUMWGT: sum of weights both for the total sample (CLASS variable missing) and within each class (CLASS variable present) if a WEIGHT statement is specified
MEAN: means both for the total sample (CLASS variable missing) and within each class (CLASS variable present)
STDMEAN: total-standardized class means
PSTDMEAN: pooled within-class standardized class means
STD: standard deviations both for the total sample (CLASS variable missing) and within each class (CLASS variable present)
PSTD: pooled within-class standard deviations
BSTD: between-class standard deviations
RSQUARED: univariate R squares
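For example, the following sketch writes an OUTSTAT= data set and prints the N, MEAN, and STD observations, so that the total-sample rows (Species missing) appear next to the per-class rows. It assumes the SASHELP.IRIS sample data set; the name Coef is illustrative.

```sas
/* Sketch: assumes SASHELP.IRIS; Coef is an illustrative name */
proc candisc data=sashelp.iris outstat=Coef noprint;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Rows with a blank Species are total-sample statistics; */
/* rows with a Species value are within-class statistics. */
proc print data=Coef;
   where _TYPE_ in ('N' 'MEAN' 'STD');
   var Species _TYPE_ SepalLength SepalWidth PetalLength PetalWidth;
run;
```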

The following kinds of observations are identified by the combination of the variables _TYPE_ and _NAME_. When the _TYPE_ variable has one of the following values, the _NAME_ variable identifies the row of the matrix:

_TYPE_: Contents
CSSCP: corrected SSCP matrix for the total sample (CLASS variable missing) and within each class (CLASS variable present)
PSSCP: pooled within-class corrected SSCP matrix
BSSCP: between-class SSCP matrix
COV: covariance matrix for the total sample (CLASS variable missing) and within each class (CLASS variable present)
PCOV: pooled within-class covariance matrix
BCOV: between-class covariance matrix
CORR: correlation matrix for the total sample (CLASS variable missing) and within each class (CLASS variable present)
PCORR: pooled within-class correlation matrix
BCORR: between-class correlation matrix
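To extract one of these matrices, you can subset on _TYPE_ and keep _NAME_ as the row label. The sketch below prints the pooled within-class correlation matrix; it assumes the SASHELP.IRIS sample data set, and Coef is an illustrative name.

```sas
/* Sketch: assumes SASHELP.IRIS; Coef is an illustrative name */
proc candisc data=sashelp.iris outstat=Coef noprint;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Each PCORR observation is one row of the matrix; _NAME_ names that row. */
proc print data=Coef noobs;
   where _TYPE_ = 'PCORR';
   var _NAME_ SepalLength SepalWidth PetalLength PetalWidth;
run;
```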

When the _TYPE_ variable has one of the following values, the _NAME_ variable identifies the canonical variable:

_TYPE_: Contents
CANCORR: canonical correlations
STRUCTUR: canonical structure
BSTRUCT: between-class canonical structure
PSTRUCT: pooled within-class canonical structure
SCORE: total-sample standardized canonical coefficients
PSCORE: pooled within-class standardized canonical coefficients
RAWSCORE: raw canonical coefficients
CANMEAN: means of the canonical variables for each class
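The canonical rows can be listed the same way. The sketch below prints the canonical correlations, the raw canonical coefficients, and the class means of the canonical variables; it assumes the SASHELP.IRIS sample data set, and Coef is an illustrative name. With three classes and four variables there should be two canonical variables, so _NAME_ takes the values Can1 and Can2 under the default prefix.

```sas
/* Sketch: assumes SASHELP.IRIS; Coef is an illustrative name */
proc candisc data=sashelp.iris outstat=Coef noprint;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* In these rows, _NAME_ identifies the canonical variable. */
proc print data=Coef;
   where _TYPE_ in ('CANCORR' 'RAWSCORE' 'CANMEAN');
run;
```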

You can use this data set with PROC SCORE to compute canonical variable scores for new data. Use one of the following forms, depending on whether the CLASS variable (here, C) is numeric or character:

* The CLASS variable C is numeric;
proc score data=NewData score=Coef(where=(c = .  )) out=Scores;
run;

* The CLASS variable C is character;
proc score data=NewData score=Coef(where=(c = ' ')) out=Scores;
run;

The WHERE= data set option keeps only the observations in which the CLASS variable is missing, which excludes the within-class means and standard deviations. PROC SCORE standardizes the new data by subtracting the total-sample variable means stored in the _TYPE_=MEAN observations and dividing by the total-sample standard deviations stored in the _TYPE_=STD observations. It then multiplies the standardized variables by the coefficients in the _TYPE_=SCORE observations to obtain the canonical scores.
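The following end-to-end sketch fits the model, builds a small stand-in for new data, and scores it. It assumes the SASHELP.IRIS sample data set; NewData is simulated here by taking the first five observations and dropping Species, and the names Coef, NewData, and Scores are illustrative. Because Species is character, the WHERE= option compares it with a blank.

```sas
/* Sketch: assumes SASHELP.IRIS; Coef, NewData, Scores are illustrative */
proc candisc data=sashelp.iris outstat=Coef noprint;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Stand-in for new, unlabeled observations */
data NewData;
   set sashelp.iris(obs=5 drop=Species);
run;

/* Keep only total-sample rows (blank Species), then score */
proc score data=NewData score=Coef(where=(Species = ' ')) out=Scores;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

proc print data=Scores;
run;
```

The Can1 and Can2 values in Scores should agree, up to rounding, with the values that an OUT= data set gives for the same observations, because standardizing by the total-sample statistics and applying the SCORE coefficients is equivalent to applying the raw coefficients to the centered data.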