The GLM Procedure

Output Data Sets

OUT= Data Set Created by the OUTPUT Statement

The OUTPUT statement produces an output data set that contains the following:

  • all original data from the SAS data set input to PROC GLM

  • the new variables corresponding to the diagnostic measures specified with statistics keywords in the OUTPUT statement (PREDICTED=, RESIDUAL=, and so on)

With multiple dependent variables, you can specify a name for any of the diagnostic measures for each of the dependent variables. The names are matched to the dependent variables in the order in which those variables occur in the MODEL statement.

For example, suppose that the input data set A contains the variables y1, y2, y3, x1, and x2. Then you can use the following statements:

proc glm data=A;
   model y1 y2 y3=x1;
   output out=out p=y1hat y2hat y3hat
                  r=y1resid lclm=y1lcl uclm=y1ucl;
run;

The output data set out contains y1, y2, y3, x1, x2, y1hat, y2hat, y3hat, y1resid, y1lcl, and y1ucl. The variable x2 is output even though it is not used by PROC GLM. Although predicted values are generated for all three dependent variables, residuals and confidence limits for the mean are output for only the first dependent variable, because only one name is given for each of R=, LCLM=, and UCLM=.

When any independent variable in the analysis (including all class variables) is missing for an observation, all new variables that correspond to diagnostic measures are missing for that observation in the output data set.

When a dependent variable in the analysis is missing for an observation, then some new variables that correspond to diagnostic measures are missing for the observation in the output data set, and some are still available. Specifically, in this case, the new variables that correspond to COOKD, COVRATIO, DFFITS, PRESS, R, RSTUDENT, STDR, and STUDENT are missing in the output data set. The variables corresponding to H, LCL, LCLM, P, STDI, STDP, UCL, and UCLM are not missing.
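One practical use of this behavior is scoring new observations: rows whose dependent variable is missing do not contribute to the fit, but they still receive predicted values and confidence limits. The following statements are a minimal sketch of that idea. The data sets Train, New, Both, and Scored and the variable names yhat, yresid, ylcl, and yucl are hypothetical and chosen only for illustration.

```sas
/* Hypothetical training data: response y and regressor x1 */
data Train;
   input y x1 @@;
   datalines;
1.2 1  2.1 2  2.9 3  4.2 4  4.8 5
;

/* Hypothetical new data: x1 only, so y is missing after stacking */
data New;
   input x1 @@;
   datalines;
6 7
;

/* Stack the two data sets; rows from New have y = . */
data Both;
   set Train New;
run;

proc glm data=Both;
   model y = x1;
   /* P=, LCLM=, and UCLM= are filled in for every row with x1 present; */
   /* R= is missing for the rows where y is missing                     */
   output out=Scored p=yhat r=yresid lclm=ylcl uclm=yucl;
run;
quit;

proc print data=Scored;
run;
```

In the printed data set, the two rows that come from New should show values for yhat, ylcl, and yucl and a missing value for yresid, in line with the lists of keywords given above.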

OUT= Data Set Created by the LSMEANS Statement

The OUT= option in the LSMEANS statement produces an output data set that contains the following:

  • the unformatted values of each classification variable specified in any effect in the LSMEANS statement

  • a new variable, LSMEAN, which contains the LS-mean for the specified levels of the classification variables

  • a new variable, STDERR, which contains the standard error of the LS-mean

The variances and covariances among the LS-means are also output when the COV option is specified along with the OUT= option. In this case, only one effect can be specified in the LSMEANS statement, and the following variables are included in the output data set:

  • new variables, COV1, COV2, …, COVn, where n is the number of levels of the effect specified in the LSMEANS statement. These variables contain the covariances of each LS-mean with every other LS-mean.

  • a new variable, NUMBER, which provides an index for each observation to identify the covariances that correspond to that observation. The covariances for the observation with NUMBER equal to i can be found in the variable COVi.
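The following statements sketch how the OUT= and COV options might be combined in practice. The data set B, the class variable trt, the response y, and the output data set name LSOut are hypothetical. Because trt has three levels in this example, the covariance variables would be COV1, COV2, and COV3.

```sas
/* Hypothetical data: response y and a three-level class variable trt */
data B;
   input trt $ y @@;
   datalines;
a 10 a 12 a 11 b 14 b 15 b 13 c 20 c 19 c 21
;

proc glm data=B;
   class trt;
   model y = trt;
   /* Only one effect is listed because COV is specified with OUT= */
   lsmeans trt / out=LSOut cov;
run;
quit;

/* Inspect the LS-means, their standard errors, and the covariances */
proc print data=LSOut;
run;
```

Reading across a row of LSOut, the covariance between the LS-mean in the observation with NUMBER equal to 1 and the LS-mean in the observation with NUMBER equal to 2 appears in COV2 of the first observation and in COV1 of the second.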

OUTSTAT= Data Set

The OUTSTAT= option in the PROC GLM statement produces an output data set that contains the following:

  • the BY variables, if any

  • _TYPE_, a new character variable. _TYPE_ can take the values ‘SS1’, ‘SS2’, ‘SS3’, ‘SS4’, or ‘CONTRAST’, corresponding to the various types of sums of squares generated, or the values ‘CANCORR’, ‘STRUCTUR’, or ‘SCORE’, if a canonical analysis is performed through the MANOVA statement and no M= matrix is specified.

  • _SOURCE_, a new character variable. For each observation in the data set, _SOURCE_ contains the name of the model effect or contrast label from which the corresponding statistics are generated.

  • _NAME_, a new character variable. For each observation in the data set, _NAME_ contains the name of one of the dependent variables in the model or, in the case of canonical statistics, the name of one of the canonical variables (CAN1, CAN2, and so forth).

  • four new numeric variables: SS, DF, F, and PROB, containing sums of squares, degrees of freedom, F values, and probabilities, respectively, for each model or contrast sum of squares generated in the analysis. For observations resulting from canonical analyses, these variables have missing values.

  • if there is more than one dependent variable, then variables with the same names as the dependent variables represent the following:

    • for _TYPE_=SS1, SS2, SS3, SS4, or CONTRAST, the crossproducts of the hypothesis matrices

    • for _TYPE_=CANCORR, canonical correlations for each variable

    • for _TYPE_=STRUCTUR, coefficients of the total structure matrix

    • for _TYPE_=SCORE, raw canonical score coefficients

The output data set can be used to perform special hypothesis tests (for example, with the IML procedure in SAS/IML software), to reformat output, to produce canonical variates (through the SCORE procedure), or to rotate structure matrices (through the FACTOR procedure).
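As an illustration of reformatting output, the following statements are a minimal sketch that writes an OUTSTAT= data set and then prints only the Type III rows. The simulated contents of the data set A, the seed value, and the output data set name Stats are assumptions made so that the example is self-contained; the variable names follow the earlier example.

```sas
/* Hypothetical data set A with the variables used earlier */
data A;
   call streaminit(1);
   do i = 1 to 20;
      x1 = rand('normal');
      x2 = rand('normal');
      y1 = 1 + x1 + rand('normal');
      y2 = 2 - x2 + rand('normal');
      y3 = x1 + x2 + rand('normal');
      output;
   end;
   drop i;
run;

/* Request Type I and Type III sums of squares and save them */
proc glm data=A outstat=Stats;
   model y1 y2 y3 = x1 x2 / ss1 ss3;
run;
quit;

/* Keep only the Type III rows and the summary statistics */
proc print data=Stats;
   where _TYPE_ = 'SS3';
   var _NAME_ _SOURCE_ _TYPE_ DF SS F PROB;
run;
```

Each printed row pairs a dependent variable (_NAME_) with a model effect (_SOURCE_), so the same WHERE and VAR pattern can be adapted to pull contrast rows or, after a canonical analysis, the CANCORR, STRUCTUR, or SCORE rows.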