When a variable is specified in both the CLASS and MODEL statements in PROC GLM, the procedure uses the GLM parameterization. This is a less-than-full-rank parameterization in which a CLASS variable with k levels is represented in the design matrix by a set of k 0,1-coded indicator (or "dummy") variables. If the SOLUTION option is also specified in the MODEL statement, the following note is displayed below the parameter estimates table:
NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.
Note that there are many possible parameterizations, each of which imposes a different interpretation on the model parameters. However, only the GLM parameterization is available in PROC GLM. For more on the parameterizations available in other procedures, see "Parameterization of Model Effects: Other Parameterizations" in the Shared Concepts and Topics chapter of the SAS/STAT User's Guide and this usage note.
The GLM parameterization provides easily interpretable hypotheses about the model parameters but results in an "overparameterized" model, that is, a model with more parameters than degrees of freedom. The NOTE shown above is displayed to make you aware that the GLM parameterization produces an overparameterized model. It does not indicate a problem with the fitted model. Interpretation of the PROC GLM parameter estimates is discussed and illustrated in this usage note.
In PROC GLM, the GLM parameterization used by the CLASS statement always induces a linear dependency between the intercept and the indicator variables it creates for an effect, because the indicator columns for a CLASS variable sum to the intercept column. Linear dependencies cause the X'X matrix to be singular (less than full rank), so no unique inverse exists. A generalized inverse is computed instead, and the resulting parameter estimates are only one of an infinite number of possible solutions. These estimates are regarded as biased, or not uniquely estimable. With this solution, an estimate is set to zero whenever the design column for that parameter is a linear combination of previous columns. This technique is used in other SAS procedures that use least squares estimation and GLM parameterization, and it is presented in most statistical texts as a way of solving for the parameter estimates of the model. See "Parameterization of PROC GLM Models: Degrees of Freedom" in the Details section of the PROC GLM documentation.
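As a small worked illustration of the linear dependency, consider a single CLASS variable with k = 3 levels. The symbols below (x_0 for the intercept column and x_1, x_2, x_3 for the three indicator columns) are notation assumed here for illustration and do not come from the SAS documentation. Listing one design row per level gives:

```latex
\[
X =
\begin{pmatrix}
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1
\end{pmatrix},
\qquad
x_0 = x_1 + x_2 + x_3 .
\]
\[
\operatorname{rank}(X) = 3 < 4
\quad\Longrightarrow\quad
X'X \ \text{is a singular } 4 \times 4 \text{ matrix.}
\]
```

Because x_3 = x_0 - x_1 - x_2 is a linear combination of the columns that precede it, the parameter for the last level is the one whose estimate is set to zero and flagged with the letter 'B'.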
For more details, see the "Statistical Background" section in the "Introduction to Statistical Modeling with SAS/STAT Software" chapter of the SAS/STAT User's Guide. Also see the Searle (1971) and Milliken & Johnson references cited in this usage note as well as "Using the Generalized Inverse" in SAS for Linear Models, Fourth Edition.
To obtain unbiased, unique parameter estimates, a full-rank parameterization is required. One way to do this is by simply dropping the dummy variable associated with the last level of each CLASS variable, and generating any interaction and nested effects using only the remaining dummy variables. The resulting reference parameterization is a full-rank parameterization which carries the same interpretation of the model parameters as does GLM parameterization.
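To see why the interpretation is unchanged, it may help to write out the simplest case: one CLASS variable with three levels and the indicator for level 3 dropped. The notation (beta_0, beta_1, beta_2 for the parameters and mu_i for the mean of level i) is assumed here for illustration only.

```latex
\[
E[y \mid \text{level } i] = \beta_0 + \beta_i \quad (i = 1, 2),
\qquad
E[y \mid \text{level } 3] = \beta_0 ,
\]
\[
\text{so that}\qquad
\beta_0 = \mu_3 ,
\qquad
\beta_i = \mu_i - \mu_3 \quad (i = 1, 2).
\]
```

In this sketch the intercept is the mean of the last (reference) level and each remaining parameter is a difference from that level. This is the same meaning that the nonzero estimates carry under the GLM parameterization when the last level's estimate is set to zero.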
A full-rank analysis using reference parameterization can be conducted using PROC GLMSELECT as shown in the following example.
data a;
   input drug disease @;
   do i=1 to 6;
      input y @;
      output;
   end;
   datalines;
1 1 42 44 36 13 19 22
1 2 33  . 26  . 33 21
1 3 31 -3  . 25 25 24
2 1 28  . 23 34 42 13
2 2  . 34 33 31  . 36
2 3  3 26 28 32  4 16
3 1  .  .  1 29  . 19
3 2  . 11  9  7  1 -6
3 3 21  1  .  9  3  .
;
The following PROC GLM analysis produces nonunique parameter estimates and displays the NOTE mentioned above.
proc glm data=a;
   ods select parameterestimates;
   class drug disease;
   model y = drug disease drug*disease / solution;
run;
The GLMSELECT procedure allows you to use any of several parameterizations. You can use the full-rank reference parameterization by specifying the PARAM=REF option in the CLASS statement. The SELECTION=NONE option in the MODEL statement fits the model as specified rather than allowing PROC GLMSELECT to select the effects to include in the model. If desired, you can specify the reference level for each CLASS variable as discussed in the final section (Use a procedure offering the PARAM=REF and REF= options in the CLASS statement) of this usage note.
proc glmselect data=a;
   ods select parameterestimates;
   class drug disease / param=ref;
   model y = drug disease drug*disease / selection=none;
run;
Notice that the parameters that are set to zero in the GLM analysis are omitted from the GLMSELECT analysis. Otherwise, the parameter estimates from the GLMSELECT analysis of the reference coded design are identical to the estimates from the initial, GLM coded design. Since the results of the two analyses are identical, you can see that repeating the analysis using reference coding isn't really necessary.
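The dropping of last-level dummy variables described earlier can also be done by hand as a check. The following is a minimal sketch, not part of the original note: the data set name a_ref and the variable names d1, d2, s1, s2, d1s1, d1s2, d2s1, and d2s2 are hypothetical, and it assumes the data set a created above, in which drug and disease are never missing.

```sas
/* Hypothetical names: a_ref, d1, d2, s1, s2, d1s1, d1s2, d2s1, d2s2 */
data a_ref;
   set a;
   /* Indicators for all but the last level of each CLASS variable */
   d1 = (drug = 1);
   d2 = (drug = 2);
   s1 = (disease = 1);
   s2 = (disease = 2);
   /* Interaction columns built only from the retained indicators */
   d1s1 = d1*s1;
   d1s2 = d1*s2;
   d2s1 = d2*s1;
   d2s2 = d2*s2;
run;

/* No CLASS statement: the hand-coded columns form a full-rank design */
proc glm data=a_ref;
   ods select parameterestimates;
   model y = d1 d2 s1 s2 d1s1 d1s2 d2s1 d2s2 / solution;
run;
```

Because every drug-by-disease cell in these data contains at least one nonmissing response, the nine columns (intercept plus eight indicators) should be linearly independent, so this fit would be expected to reproduce the nonzero estimates from the GLM-coded analysis without the singularity NOTE.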
Product Family | Product | System | SAS Release Reported | SAS Release Fixed*
SAS System | SAS/STAT | All | n/a |
Type: Usage Note
Priority: low
Topic: Analytics ==> Analysis of Variance; SAS Reference ==> Procedures ==> GLM
Date Modified: 2010-01-29 11:56:29
Date Created: 2002-12-16 10:56:38