The HPCANDISC Procedure

PROC HPCANDISC Statement

PROC HPCANDISC <options> ;

The PROC HPCANDISC statement invokes the HPCANDISC procedure. Optionally, it also identifies input and output data sets, specifies the analyses performed, and controls displayed output. Table 5.1 summarizes the options available in the PROC HPCANDISC statement.

Table 5.1: PROC HPCANDISC Statement Options

Option        Description

Specify Data Sets
DATA=         Specifies the input data set
OUT=          Specifies the output data set that contains canonical scores
OUTSTAT=      Specifies the output statistics data set

Specify Details of Analysis
NCAN=         Specifies the number of canonical variables
PREFIX=       Specifies a prefix for naming the canonical variables
SINGULAR=     Specifies the singularity criterion

Control Displayed Output
ALL           Displays all output
ANOVA         Displays univariate statistics
BCORR         Displays between correlations
BCOV          Displays between covariances
BSSCP         Displays between SSCPs
DISTANCE      Displays squared Mahalanobis distances
NOPRINT       Suppresses all displayed output
PCORR         Displays pooled correlations
PCOV          Displays pooled covariances
PSSCP         Displays pooled SSCPs
SHORT         Suppresses some displayed output
SIMPLE        Displays simple descriptive statistics
STDMEAN       Displays standardized class means
TCORR         Displays total correlations
TCOV          Displays total covariances
TSSCP         Displays total SSCPs
WCORR         Displays within correlations
WCOV          Displays within covariances
WSSCP         Displays within SSCPs


The following list provides details about these options.

ALL

activates all the display options.

ANOVA

displays univariate statistics for testing the hypothesis that the class means are equal in the population for each variable.

BCORR

displays between-class correlations.

BCOV

displays between-class covariances. The between-class covariance matrix equals the between-class SSCP matrix divided by $n(c-1)/c$, where n is the number of observations and c is the number of classes. The between-class covariances should be interpreted in comparison with the total-sample and within-class covariances, not as formal estimates of population parameters.
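As a worked illustration of the divisor, the following sketch writes the rule as a formula and evaluates it for one case. The symbol $\mathbf{B}$ for the between-class SSCP matrix and the values n = 150 and c = 3 are assumptions chosen for the example, not values taken from this guide.

```latex
\[
\text{between-class covariance matrix} \;=\; \frac{c}{n(c-1)}\,\mathbf{B},
\qquad
n = 150,\; c = 3
\;\Rightarrow\;
\frac{n(c-1)}{c} = \frac{150 \cdot 2}{3} = 100 .
\]
```

In this case each element of the between-class SSCP matrix would be divided by 100 to obtain the corresponding between-class covariance.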

BSSCP

displays the between-class SSCP matrix.

DATA=SAS-data-set

specifies the data set to be analyzed. The data set must be an ordinary SAS data set (raw data). If you omit the DATA= option, PROC HPCANDISC uses the most recently created SAS data set.

If PROC HPCANDISC executes in distributed mode, the input data are distributed to memory on the appliance nodes and analyzed in parallel, unless the data are already distributed in the appliance database. In that case the procedure reads the data alongside the distributed database. For more information about the execution modes, see the section Processing Modes; for the alongside-the-database model, see the section Alongside-the-Database Execution. Both sections are in Chapter 3: Shared Concepts and Topics.

DISTANCE
MAHALANOBIS

displays squared Mahalanobis distances between the group means, the F statistics, and the corresponding probabilities of greater squared Mahalanobis distances between the group means.

NCAN=n

specifies the number of canonical variables to be computed. The value of n must be less than or equal to the number of variables. If you specify NCAN=0, PROC HPCANDISC displays the canonical correlations but not the canonical coefficients, structures, or means. A negative value suppresses the canonical analysis entirely.

Let v be the number of variables in the VAR statement, and let c be the number of classes. If you omit the NCAN= option, only $\min (v, c-1)$ canonical variables are generated. If you omit the NCAN= option and also specify an OUT= output data set, v canonical variables are generated; when $v > c-1$, the last $v-(c-1)$ canonical variables have missing values.
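The following minimal sketch shows one way the NCAN= option might be used. It assumes the Sashelp.Iris sample data set is available; that data set has three species (c = 3) and four measurement variables (v = 4), so min(v, c-1) = 2. The choice of NCAN=1 is purely illustrative.

```sas
/* Request a single canonical variable instead of the default of  */
/* min(v, c-1) = 2 for this data set (illustrative choice only).  */
proc hpcandisc data=sashelp.iris ncan=1;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```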

NOPRINT

suppresses the normal display of results. This option temporarily disables the Output Delivery System (ODS). For more information about ODS, see Chapter 20: Using the Output Delivery System in SAS/STAT User's Guide.

OUT=SAS-data-set

creates an output SAS data set to contain observationwise canonical variable scores. The variables in the input data set are not included in the output data set to avoid data duplication for large data sets; however, variables that are specified in the ID statement are included.

If the input data are in distributed form, in which access to the data in a particular order cannot be guaranteed, the HPCANDISC procedure copies the distribution or partition key to the output data set so that its contents can be joined with the input data.

If you want to create a SAS data set in a permanent library, you must specify a two-level name. For more information about permanent libraries and SAS data sets, see SAS Language Reference: Concepts. For more information about OUT= data sets, see the section Output Data Sets.
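Because the input variables are not copied to the OUT= data set, an identifier is usually needed to match scores back to the original rows. The following sketch assumes the Sashelp.Iris sample data set; the data set names Work.Iris_Id and Work.Iris_Scores and the variable ObsId are hypothetical names introduced for illustration.

```sas
/* Add a hypothetical row identifier to the input data */
data work.iris_id;
   set sashelp.iris;
   ObsId = _n_;
run;

/* Write canonical scores to OUT=; the ID statement carries ObsId */
/* into the output data set so that scores can be joined back.    */
proc hpcandisc data=work.iris_id out=work.iris_scores;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
   id ObsId;
run;

/* Inspect the first few rows of the score data set */
proc print data=work.iris_scores(obs=5);
run;
```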

OUTSTAT=SAS-data-set

creates a TYPE=CORR output SAS data set to contain various statistics, including class means, standard deviations, correlations, canonical correlations, canonical structures, canonical coefficients, and means of canonical variables for each class level.

If you want to create a SAS data set in a permanent library, you must specify a two-level name. For more information about permanent libraries and SAS data sets, see SAS Language Reference: Concepts.
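As a sketch of saving statistics to a permanent library, the following example combines OUTSTAT= with NOPRINT. The libref MyLib, its path, and the data set name Iris_Stats are hypothetical; substitute a location that exists on your system.

```sas
/* Hypothetical permanent library; replace the path with a real one */
libname mylib '/path/to/project';

/* Two-level name MyLib.Iris_Stats stores the statistics permanently; */
/* NOPRINT suppresses the displayed tables for this step only.        */
proc hpcandisc data=sashelp.iris outstat=mylib.iris_stats noprint;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Review the saved statistics */
proc print data=mylib.iris_stats;
run;
```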

PCORR

displays pooled within-class correlations (partial correlations based on the pooled within-class covariances).

PCOV

displays pooled within-class covariances.

PREFIX=name

specifies a prefix for naming the canonical variables. By default, the names are Can1, Can2, Can3, and so on. If you specify PREFIX=Abc, the canonical variables are named Abc1, Abc2, and so on. The number of characters in the prefix plus the number of digits required to designate the canonical variables should not exceed 32. The prefix is truncated if the combined length exceeds 32.
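A minimal sketch of the PREFIX= option follows, assuming the Sashelp.Iris sample data set. The prefix Disc and the output data set name Work.Iris_Scores are hypothetical choices; with this prefix the canonical variables would be named Disc1, Disc2, and so on.

```sas
/* Name the canonical variables Disc1, Disc2, ... instead of Can1, Can2, ... */
proc hpcandisc data=sashelp.iris out=work.iris_scores prefix=Disc;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* List the variable names in the score data set */
proc contents data=work.iris_scores;
run;
```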

PSSCP

displays the pooled within-class corrected SSCP matrix.

SHORT

suppresses the display of canonical structures, canonical coefficients, and class means on canonical variables; only tables of canonical correlations and multivariate test statistics are displayed.

SIMPLE

displays simple descriptive statistics for the total sample and within each class.

SINGULAR=p

specifies the criterion for determining the singularity of the total-sample correlation matrix and the pooled within-class covariance matrix, where 0 < p < 1. The default is SINGULAR=1E-8.

Let $\mathbf{S}$ be the total-sample correlation matrix. If the R square for predicting a quantitative variable in the VAR statement from the variables that precede it exceeds $1 - p$, then $\mathbf{S}$ is considered singular. If $\mathbf{S}$ is singular, the probability levels for the multivariate test statistics and canonical correlations are adjusted for the number of variables whose R square exceeds $1 - p$.

If $\mathbf{S}$ is considered singular and the inverse of $\mathbf{S}$ (squared Mahalanobis distances) is required, a quasi inverse is used instead. For more information, see the section Quasi-inverse in SAS/STAT User's Guide.
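To see when the singularity criterion might come into play, the following sketch builds a variable that is an exact linear combination of two variables that precede it in the VAR statement, so its R square is 1 and exceeds 1 - p for any valid p. It assumes the Sashelp.Iris sample data set; the data set name Work.Iris_Collinear and the variable SepalSum are hypothetical.

```sas
/* Create a variable that is perfectly predicted by two others */
data work.iris_collinear;
   set sashelp.iris;
   SepalSum = SepalLength + SepalWidth;
run;

/* SepalSum follows SepalLength and SepalWidth in the VAR statement, */
/* so the check described above applies to it. SINGULAR=1E-8 is the  */
/* documented default and is written out here only for clarity.      */
proc hpcandisc data=work.iris_collinear singular=1e-8;
   class Species;
   var SepalLength SepalWidth SepalSum PetalLength PetalWidth;
run;
```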

STDMEAN

displays total-sample and pooled within-class standardized class means.

TCORR

displays total-sample correlations.

TCOV

displays total-sample covariances.

TSSCP

displays the total-sample corrected SSCP matrix.

WCORR

displays within-class correlations for each class level.

WCOV

displays within-class covariances for each class level.

WSSCP

displays the within-class corrected SSCP matrix for each class level.
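The display options can be combined in a single PROC HPCANDISC statement. The following sketch, which assumes the Sashelp.Iris sample data set, requests a few additional tables while trimming the canonical output; the particular combination of options is only an illustration.

```sas
/* SIMPLE, ANOVA, and DISTANCE add descriptive, univariate, and      */
/* Mahalanobis distance tables; SHORT suppresses the canonical       */
/* structures, coefficients, and class means on canonical variables. */
proc hpcandisc data=sashelp.iris simple anova distance short;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```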