The GLMSELECT Procedure
CLASS Variable Parameterization
Consider a model with one classification variable A with four levels, 1, 2, 5, and 7. Details of the possible choices for the PARAM= option follow.
With PARAM=EFFECT, three columns are created to indicate group membership of the nonreference levels. For the reference level, all three dummy variables have a value of -1. For instance, if the reference level is 7 (REF=7), the design matrix columns for A are as follows.
Effect Coding: Design Matrix | |||
---|---|---|---|
A | A1 | A2 | A5 |
1 | 1 | 0 | 0 |
2 | 0 | 1 | 0 |
5 | 0 | 0 | 1 |
7 | -1 | -1 | -1 |
Parameter estimates of classification main effects that use the effect coding scheme estimate the difference in the effect of each nonreference level compared to the average effect over all four levels.
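To see this coding in practice, the following sketch fits a one-factor model with effect coding and requests a display of the coded columns. The data set name `toy` and its response values are made up purely for illustration, and SELECTION=NONE is used here only so that every column for A stays in the model.

```sas
/* Toy data (made up for illustration): one CLASS variable A with */
/* levels 1, 2, 5, and 7, and an arbitrary response y.            */
data toy;
   input A y @@;
   datalines;
1 2.1  2 3.4  5 5.0  7 4.2
1 1.9  2 3.1  5 5.3  7 4.4
;

/* Effect coding with level 7 as the reference level.             */
/* SHOWCODING requests the "Class Level Coding" table.            */
proc glmselect data=toy;
   class A(param=effect ref='7') / showcoding;
   model y = A / selection=none;
run;
```

Changing `param=effect` to another PARAM= value in the same CLASS statement should show the corresponding coding for the same four levels.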
With PARAM=GLM, as in PROC GLM, four columns are created to indicate group membership. The design matrix columns for A are as follows.
GLM Coding: Design Matrix | ||||
---|---|---|---|---|
A | A1 | A2 | A5 | A7 |
1 | 1 | 0 | 0 | 0 |
2 | 0 | 1 | 0 | 0 |
5 | 0 | 0 | 1 | 0 |
7 | 0 | 0 | 0 | 1 |
Parameter estimates of classification main effects that use the GLM coding scheme estimate the difference in the effects of each level compared to the last level.
With PARAM=ORDINAL, three columns are created to indicate group membership of the higher levels of the effect. For the first level of the effect (which for A is 1), all three dummy variables have a value of 0. The design matrix columns for A are as follows.
Ordinal Coding: Design Matrix | |||
---|---|---|---|
A | A2 | A5 | A7 |
1 | 0 | 0 | 0 |
2 | 1 | 0 | 0 |
5 | 1 | 1 | 0 |
7 | 1 | 1 | 1 |
The first level of the effect is a control or baseline level. Parameter estimates of classification main effects that use the ORDINAL coding scheme estimate the effect on the response as the ordinal factor is set to each succeeding level. When the parameters for an ordinal main effect have the same sign, the response effect is monotonic across the levels.
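The cumulative nature of ordinal coding can be written out explicitly. As a sketch, assume a model that contains only an intercept and the main effect of A, and write $\beta_0$ for the intercept and $\beta_{A2}$, $\beta_{A5}$, $\beta_{A7}$ for the parameters of the three ordinal columns (this notation is introduced here for illustration only). The expected response at each level is then:

$$
\begin{aligned}
E[y \mid A=1] &= \beta_0 \\
E[y \mid A=2] &= \beta_0 + \beta_{A2} \\
E[y \mid A=5] &= \beta_0 + \beta_{A2} + \beta_{A5} \\
E[y \mid A=7] &= \beta_0 + \beta_{A2} + \beta_{A5} + \beta_{A7}
\end{aligned}
$$

Each parameter is therefore the change in expected response from one level to the next, which is why parameters of the same sign imply a monotonic response across the levels.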
With PARAM=POLY, three columns are created. The first represents the linear term ($x$), the second represents the quadratic term ($x^2$), and the third represents the cubic term ($x^3$), where $x$ is the level value. If the classification levels are not numeric, they are translated into 1, 2, 3, ... according to their sorting order. The design matrix columns for A are as follows.
Polynomial Coding: Design Matrix | |||
---|---|---|---|
A | APOLY1 | APOLY2 | APOLY3 |
1 | 1 | 1 | 1 |
2 | 2 | 4 | 8 |
5 | 5 | 25 | 125 |
7 | 7 | 49 | 343 |
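Put differently, polynomial coding fits a cubic in the level value. As a sketch, assume a model with only an intercept $\beta_0$ and the main effect of A, with $\beta_1$, $\beta_2$, $\beta_3$ denoting the parameters of APOLY1, APOLY2, and APOLY3 (notation introduced here for illustration):

$$
E[y \mid A = x] = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3,
\qquad\text{for example}\qquad
E[y \mid A = 5] = \beta_0 + 5\beta_1 + 25\beta_2 + 125\beta_3 .
$$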
With PARAM=REFERENCE, three columns are created to indicate group membership of the nonreference levels. For the reference level, all three dummy variables have a value of 0. For instance, if the reference level is 7 (REF=7), the design matrix columns for A are as follows.
Reference Coding: Design Matrix | |||
---|---|---|---|
A | A1 | A2 | A5 |
1 | 1 | 0 | 0 |
2 | 0 | 1 | 0 |
5 | 0 | 0 | 1 |
7 | 0 | 0 | 0 |
Parameter estimates of CLASS main effects that use the reference coding scheme estimate the difference in the effect of each nonreference level compared to the effect of the reference level.
With PARAM=ORTHEFFECT, the columns are obtained by applying the Gram-Schmidt orthogonalization to the columns for PARAM=EFFECT. The design matrix columns for A are as follows.
Orthogonal Effect Coding: Design Matrix | |||
---|---|---|---|
A | AOEFF1 | AOEFF2 | AOEFF3 |
1 | 1.41421 | -0.81650 | -0.57735 |
2 | 0.00000 | 1.63299 | -0.57735 |
5 | 0.00000 | 0.00000 | 1.73205 |
7 | -1.41421 | -0.81650 | -0.57735 |
With PARAM=ORTHORDINAL, the columns are obtained by applying the Gram-Schmidt orthogonalization to the columns for PARAM=ORDINAL. The design matrix columns for A are as follows.
Orthogonal Ordinal Coding: Design Matrix | |||
---|---|---|---|
A | AOORD1 | AOORD2 | AOORD3 |
1 | -1.73205 | 0.00000 | 0.00000 |
2 | 0.57735 | -1.63299 | 0.00000 |
5 | 0.57735 | 0.81650 | -1.41421 |
7 | 0.57735 | 0.81650 | 1.41421 |
With PARAM=ORTHPOLY, the columns are obtained by applying the Gram-Schmidt orthogonalization to the columns for PARAM=POLY. The design matrix columns for A are as follows.
Orthogonal Polynomial Coding: Design Matrix | |||
---|---|---|---|
A | AOPOLY1 | AOPOLY2 | AOPOLY3 |
1 | -1.153 | 0.907 | -0.921 |
2 | -0.734 | -0.540 | 1.473 |
5 | 0.524 | -1.370 | -0.921 |
7 | 1.363 | 1.004 | 0.368 |
With PARAM=ORTHREF, the columns are obtained by applying the Gram-Schmidt orthogonalization to the columns for PARAM=REFERENCE. The design matrix columns for A are as follows.
Orthogonal Reference Coding: Design Matrix | |||
---|---|---|---|
A | AOREF1 | AOREF2 | AOREF3 |
1 | 1.73205 | 0.00000 | 0.00000 |
2 | -0.57735 | 1.63299 | 0.00000 |
5 | -0.57735 | -0.81650 | 1.41421 |
7 | -0.57735 | -0.81650 | -1.41421 |
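As a worked illustration of the Gram-Schmidt step, take the first reference-coded column for A (the indicator of level 1). This sketch assumes one row per level, that the column is first orthogonalized against the intercept column, and that the result is rescaled to have squared length equal to the number of levels (four). The normalization is an assumption made for this illustration and is not stated above; the symbols $a_1$, $u_1$, and $\tilde{u}_1$ are likewise introduced here.

$$
\begin{aligned}
a_1 &= (1, 0, 0, 0)^\top, \qquad \mathbf{1} = (1, 1, 1, 1)^\top \\
u_1 &= a_1 - \frac{a_1^\top \mathbf{1}}{\mathbf{1}^\top \mathbf{1}}\,\mathbf{1}
     = \left(\tfrac{3}{4}, -\tfrac{1}{4}, -\tfrac{1}{4}, -\tfrac{1}{4}\right)^\top,
     \qquad \lVert u_1 \rVert^2 = \tfrac{3}{4} \\
\tilde{u}_1 &= \frac{2}{\lVert u_1 \rVert}\,u_1
     = \left(\sqrt{3}, -\tfrac{1}{\sqrt{3}}, -\tfrac{1}{\sqrt{3}}, -\tfrac{1}{\sqrt{3}}\right)^\top
     \approx (1.73205, -0.57735, -0.57735, -0.57735)^\top
\end{aligned}
$$

Later columns are handled the same way, except that their projections onto the previously orthogonalized columns are also subtracted before rescaling.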
The following example illustrates several features of the CLASS statement.
    data codingExample;
       drop i;
       do i=1 to 1000;
          c1 = 1 + mod(i,6);
          /* the trailing blank in 'very low ' gives c2 a length of 9, */
          /* so that 'very high' is not truncated                      */
          if      i < 50  then c2 = 'very low ';
          else if i < 250 then c2 = 'low';
          else if i < 500 then c2 = 'medium';
          else if i < 800 then c2 = 'high';
          else                 c2 = 'very high';
          x1 = ranuni(1);
          x2 = ranuni(1);
          /* only levels 3 and 5 of c1 affect the response */
          y  = x1 + 10*(c1=3) + 5*(c1=5) + rannor(1);
          output;
       end;
    run;

    proc glmselect data=codingExample;
       class c1(param=ref split) c2(param=ordinal order=data) /
             delimiter = ',' showcoding;
       model y = c1 c2 x1 x2 / orderselect;
    run;
Class Level Information | |||
---|---|---|---|
Class | Levels | | Values |
c1 | 6 | * | 1,2,3,4,5,6 |
c2 | 5 | | very low,low,medium,high,very high |

\* Associated Parameters Split
The "Class Level Information" table shown in Figure 42.11 is produced by default whenever you specify a CLASS statement. Because the levels of the variable "c2" contain embedded blanks, the DELIMITER=',' option is specified so that the levels are separated by commas in the list of values. The SHOWCODING option requests the display of the "Class Level Coding" table shown in Figure 42.12. An ordinal parameterization is used for "c2" because its levels have a natural order. Furthermore, because these levels appear in their natural order in the data, you can preserve this order by specifying the ORDER=DATA option.
The SPLIT option has been specified for the classification variable "c1." This permits the parameters associated with the effect "c1" to enter or leave the model individually. The "Parameter Estimates" table in Figure 42.13 shows that for this example the parameters corresponding to only levels 3 and 5 of "c1" are in the selected model. Finally, note that the ORDERSELECT option in the MODEL statement specifies that the parameters are displayed in the order in which they first entered the model.
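If you want to inspect the coded columns themselves rather than only the coding table, one possibility is to write the design matrix to a data set. The following is a minimal sketch that assumes the codingExample data set created above; the output data set name `designOut` is hypothetical, and SELECTION=NONE is used here only so that every coded column is kept.

```sas
/* Write the coded design columns, together with the input        */
/* variables, to a data set (designOut is a hypothetical name).    */
proc glmselect data=codingExample outdesign(addinputvars)=designOut;
   class c1(param=ref split) c2(param=ordinal order=data) / delimiter=',';
   model y = c1 c2 x1 x2 / selection=none;
run;

/* Look at the first few rows of the design matrix. */
proc print data=designOut(obs=6);
run;
```

Comparing the printed columns for "c2" with the values of "c2" in the same rows should show the cumulative pattern of the ordinal coding.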
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.