The CANDISC Procedure
Output Data Sets
The OUT= data set contains all the variables in the original data set plus new variables containing the canonical variable scores. You determine the number of new variables by using the NCAN= option. The names of the new variables are formed as described in the PREFIX= option. The new variables have means equal to zero and pooled within-class variances equal to one. An OUT= data set cannot be created if the DATA= data set is not an ordinary SAS data set.
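As an illustration of the OUT= data set, the following sketch requests two canonical variables and then checks their overall means. It assumes the SASHELP.IRIS sample data set is available at your site; the output data set name IrisScores is a hypothetical name chosen for this example.

```sas
/* Sketch only: SASHELP.IRIS and the name IrisScores are assumptions. */
proc candisc data=sashelp.iris out=IrisScores ncan=2 prefix=Can;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* The new variables Can1 and Can2 should have means of (about) zero. */
proc means data=IrisScores mean;
   var Can1 Can2;
run;
```

With NCAN=2 and PREFIX=Can, the OUT= data set should contain the original variables plus Can1 and Can2.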
The OUTSTAT= data set is similar to the TYPE=CORR data set produced by the CORR procedure but contains many results in addition to those produced by the CORR procedure.
The OUTSTAT= data set is TYPE=CORR, and it contains the following variables:
the BY variables, if any
the CLASS variable
_TYPE_, a character variable of length 8 that identifies the type of statistic
_NAME_, a character variable of length 32 that identifies the row of the matrix or the name of the canonical variable
the quantitative variables (those in the VAR statement, or if there is no VAR statement, all numeric variables not listed in any other statement)
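To see these variables in practice, one option is to create an OUTSTAT= data set and list its contents. This is a minimal sketch that assumes the SASHELP.IRIS sample data set is available; the data set name Coef is chosen to match the PROC SCORE forms shown later in this section.

```sas
/* Sketch only: SASHELP.IRIS is an assumed sample data set. */
proc candisc data=sashelp.iris outstat=Coef noprint;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Lists the CLASS variable, _TYPE_, _NAME_, and the quantitative variables. */
proc contents data=Coef;
run;

/* Shows which _TYPE_ values occur and how many observations each has. */
proc freq data=Coef;
   tables _TYPE_;
run;
```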
The observations, as identified by the variable _TYPE_, have the following _TYPE_ values:
_TYPE_      Contents
N           number of observations both for the total sample (CLASS variable missing) and within each class (CLASS variable present)
SUMWGT      sum of weights both for the total sample (CLASS variable missing) and within each class (CLASS variable present) if a WEIGHT statement is specified
MEAN        means both for the total sample (CLASS variable missing) and within each class (CLASS variable present)
STDMEAN     total-standardized class means
PSTDMEAN    pooled within-class standardized class means
STD         standard deviations both for the total sample (CLASS variable missing) and within each class (CLASS variable present)
PSTD        pooled within-class standard deviations
BSTD        between-class standard deviations
RSQUARED    univariate R squares
The following kinds of observations are identified by the combination of the variables _TYPE_ and _NAME_. When the _TYPE_ variable has one of the following values, the _NAME_ variable identifies the row of the matrix:
_TYPE_      Contents
CSSCP       corrected SSCP matrix for the total sample (CLASS variable missing) and within each class (CLASS variable present)
PSSCP       pooled within-class corrected SSCP matrix
BSSCP       between-class SSCP matrix
COV         covariance matrix for the total sample (CLASS variable missing) and within each class (CLASS variable present)
PCOV        pooled within-class covariance matrix
BCOV        between-class covariance matrix
CORR        correlation matrix for the total sample (CLASS variable missing) and within each class (CLASS variable present)
PCORR       pooled within-class correlation matrix
BCORR       between-class correlation matrix
When the _TYPE_ variable has one of the following values, the _NAME_ variable identifies the canonical variable:
_TYPE_      Contents
CANCORR     canonical correlations
STRUCTUR    canonical structure
BSTRUCT     between canonical structure
PSTRUCT     pooled within-class canonical structure
SCORE       total-sample standardized canonical coefficients
PSCORE      pooled within-class standardized canonical coefficients
RAWSCORE    raw canonical coefficients
CANMEAN     means of the canonical variables for each class
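The following sketch shows one way to look at observations in which _NAME_ identifies the canonical variable. It assumes the SASHELP.IRIS sample data set is available, and it assumes that the canonical correlations and the total-sample standardized coefficients are stored under the _TYPE_ values 'CANCORR' and 'SCORE'. If the WHERE statement selects nothing, check the actual _TYPE_ values in your OUTSTAT= data set.

```sas
/* Sketch only: SASHELP.IRIS and the _TYPE_ values in the WHERE  */
/* statement are assumptions to verify against your own output.  */
proc candisc data=sashelp.iris outstat=Coef noprint;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* In these rows, _NAME_ holds the canonical variable name (Can1, Can2, ...). */
proc print data=Coef noobs;
   where _TYPE_ in ('CANCORR', 'SCORE');
   var _TYPE_ _NAME_ SepalLength SepalWidth PetalLength PetalWidth;
run;
```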
You can use this data set with PROC SCORE to get scores on the canonical variables for new data by using one of the following forms:
   * The CLASS variable C is numeric;
   proc score data=NewData score=Coef(where=(c = .)) out=Scores;
   run;

   * The CLASS variable C is character;
   proc score data=NewData score=Coef(where=(c = ' ')) out=Scores;
   run;
The WHERE clause is used to exclude the within-class means and standard deviations. PROC SCORE standardizes the new data by subtracting the original variable means that are stored in the _TYPE_='MEAN' observations, and dividing by the original variable standard deviations from the _TYPE_='STD' observations. Then PROC SCORE multiplies the standardized variables by the coefficients from the _TYPE_='SCORE' observations to get the canonical scores.
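The computation described above can be written compactly as follows. The notation is introduced here for illustration and is not part of the procedure's output: x_j is the value of the j-th quantitative variable for a new observation, \bar{x}_j and s_j are the total-sample mean and standard deviation from the _TYPE_='MEAN' and _TYPE_='STD' observations, b_{jk} is the coefficient for variable j in the _TYPE_='SCORE' observation for the k-th canonical variable, and p is the number of quantitative variables.

```latex
\[
  \mathrm{Can}_k \;=\; \sum_{j=1}^{p} b_{jk}\,\frac{x_j - \bar{x}_j}{s_j}
\]
```

Because the WHERE clause keeps only the observations in which the CLASS variable is missing, the means and standard deviations in this formula are those of the total sample, not of any single class.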
Copyright © SAS Institute, Inc. All Rights Reserved.