
The GLMSELECT Procedure

CLASS Variable Parameterization

Consider a model with one classification variable A with four levels, 1, 2, 5, and 7. Details of the possible choices for the PARAM= option follow.
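As a rough sketch of how a parameterization might be requested, the following steps build a small data set in which A takes the four levels 1, 2, 5, and 7 and ask for the coding to be displayed. The data set name paramDemo, the response y, and the simulated values are assumptions made only for illustration; PARAM=, REF=, and SHOWCODING are the CLASS statement options discussed in this section.

```sas
/* Hypothetical data: A has levels 1, 2, 5, 7; y is an arbitrary response */
data paramDemo;
   do A = 1, 2, 5, 7;
      do rep = 1 to 5;
         y = A + rannor(1);
         output;
      end;
   end;
   drop rep;
run;

/* Request effect coding with reference level 7 and display the coding */
proc glmselect data=paramDemo;
   class A(param=effect ref='7') / showcoding;
   model y = A / selection=none;
run;
```

Replacing PARAM=EFFECT with one of the other values described in this section changes the columns reported in the "Class Level Coding" table.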

EFFECT

Three columns are created to indicate group membership of the nonreference levels. For the reference level, all three dummy variables have a value of -1. For instance, if the reference level is 7 (REF=7), the design matrix columns for A are as follows.

Effect Coding

A    Design Matrix
     A1   A2   A5
1     1    0    0
2     0    1    0
5     0    0    1
7    -1   -1   -1

Parameter estimates of classification main effects that use the effect coding scheme estimate the difference in the effect of each nonreference level compared to the average effect over all four levels.
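One way to see this is to write out the level means implied by effect coding. The symbols below ($\mu_a$ for the expected response at level $a$, $\beta_0$ for the intercept, and $\beta_{A1}$, $\beta_{A2}$, $\beta_{A5}$ for the three columns) are notation introduced here for illustration, and the sketch assumes a model with an intercept and A as the only effect, with REF=7.

```latex
\begin{aligned}
\mu_1 &= \beta_0 + \beta_{A1}, \qquad
\mu_2 = \beta_0 + \beta_{A2}, \qquad
\mu_5 = \beta_0 + \beta_{A5}, \\
\mu_7 &= \beta_0 - \beta_{A1} - \beta_{A2} - \beta_{A5}, \\
\beta_0 &= \tfrac{1}{4}\,(\mu_1 + \mu_2 + \mu_5 + \mu_7) = \bar{\mu}, \qquad
\beta_{Aa} = \mu_a - \bar{\mu} \quad (a = 1, 2, 5).
\end{aligned}
```

Adding the four equations shows that the intercept equals the unweighted average of the level means, so each remaining parameter is a deviation from that average.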

GLM

As in PROC GLM, four columns are created to indicate group membership. The design matrix columns for A are as follows.

GLM Coding

A    Design Matrix
     A1   A2   A5   A7
1     1    0    0    0
2     0    1    0    0
5     0    0    1    0
7     0    0    0    1

Parameter estimates of classification main effects that use the GLM coding scheme estimate the difference in the effects of each level compared to the last level.


ORDINAL
THERMOMETER

Three columns are created to indicate group membership of the higher levels of the effect. For the first level of the effect (which for A is 1), all three dummy variables have a value of 0. The design matrix columns for A are as follows.

Ordinal Coding

A    Design Matrix
     A2   A5   A7
1     0    0    0
2     1    0    0
5     1    1    0
7     1    1    1

The first level of the effect is a control or baseline level. Parameter estimates of classification main effects that use the ORDINAL coding scheme estimate the effect on the response as the ordinal factor is set to each succeeding level. When the parameters for an ordinal main effect have the same sign, the response effect is monotonic across the levels.
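The cumulative structure of this coding can be sketched by writing out the level means. The symbols $\mu_a$ (expected response at level $a$), $\beta_0$ (intercept), and $\beta_{A2}$, $\beta_{A5}$, $\beta_{A7}$ are notation assumed here for illustration, for a model with an intercept and A as the only effect.

```latex
\begin{aligned}
\mu_1 &= \beta_0, \\
\mu_2 &= \beta_0 + \beta_{A2}, \\
\mu_5 &= \beta_0 + \beta_{A2} + \beta_{A5}, \\
\mu_7 &= \beta_0 + \beta_{A2} + \beta_{A5} + \beta_{A7}, \\
\beta_{A2} &= \mu_2 - \mu_1, \qquad
\beta_{A5} = \mu_5 - \mu_2, \qquad
\beta_{A7} = \mu_7 - \mu_5 .
\end{aligned}
```

Each parameter is the step from one level to the next, so three parameters of the same sign correspond to level means that rise or fall steadily from level 1 to level 7.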

POLYNOMIAL
POLY

Three columns are created. The first represents the linear term ($x$), the second represents the quadratic term ($x^2$), and the third represents the cubic term ($x^3$), where $x$ is the level value. If the classification levels are not numeric, they are translated into 1, 2, 3, and so on, according to their sorting order. The design matrix columns for A are as follows.

Polynomial Coding

A    Design Matrix
     APOLY1   APOLY2   APOLY3
1         1        1        1
2         2        4        8
5         5       25      125
7         7       49      343

REFERENCE
REF

Three columns are created to indicate group membership of the nonreference levels. For the reference level, all three dummy variables have a value of 0. For instance, if the reference level is 7 (REF=7), the design matrix columns for A are as follows.

Reference Coding

A    Design Matrix
     A1   A2   A5
1     1    0    0
2     0    1    0
5     0    0    1
7     0    0    0

Parameter estimates of CLASS main effects that use the reference coding scheme estimate the difference in the effect of each nonreference level compared to the effect of the reference level.
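Written out for REF=7, the relationship looks as follows. The symbols $\mu_a$ (expected response at level $a$), $\beta_0$, and $\beta_{Aa}$ are notation assumed here for illustration, for a model with an intercept and A as the only effect.

```latex
\begin{aligned}
\mu_7 &= \beta_0, \\
\mu_a &= \beta_0 + \beta_{Aa} \quad (a = 1, 2, 5), \\
\beta_{Aa} &= \mu_a - \mu_7 .
\end{aligned}
```

In this sketch the intercept carries the mean of the reference level, and each remaining parameter is a direct contrast with it.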

ORTHEFFECT

The columns are obtained by applying the Gram-Schmidt orthogonalization to the columns for PARAM=EFFECT. The design matrix columns for A are as follows.

Orthogonal Effect Coding

A    Design Matrix
     AOEFF1   AOEFF2   AOEFF3
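As a worked sketch of this step, the derivation below orthogonalizes the three effect-coded columns for A (levels ordered 1, 2, 5, 7, with REF=7) one after another. Two points are assumptions made here rather than statements from this section: the columns are also kept orthogonal to the intercept column, and each result is rescaled to squared length 4, the number of levels. The vectors $a_k$, $u_k$, and $q_k$ are notation introduced for the illustration.

```latex
\begin{aligned}
a_1 &= (1, 0, 0, -1)^{\top}, \quad
a_2 = (0, 1, 0, -1)^{\top}, \quad
a_3 = (0, 0, 1, -1)^{\top}
  \quad \text{(each already sums to zero)} \\
u_1 &= a_1 \\
u_2 &= a_2 - \frac{a_2^{\top} u_1}{u_1^{\top} u_1}\, u_1
     = a_2 - \tfrac{1}{2}\, u_1
     = \left(-\tfrac{1}{2},\ 1,\ 0,\ -\tfrac{1}{2}\right)^{\top} \\
u_3 &= a_3 - \frac{a_3^{\top} u_1}{u_1^{\top} u_1}\, u_1
           - \frac{a_3^{\top} u_2}{u_2^{\top} u_2}\, u_2
     = a_3 - \tfrac{1}{2}\, u_1 - \tfrac{1}{3}\, u_2
     = \left(-\tfrac{1}{3},\ -\tfrac{1}{3},\ 1,\ -\tfrac{1}{3}\right)^{\top} \\
q_k &= \frac{2\, u_k}{\lVert u_k \rVert}: \quad
q_1 = \sqrt{2}\,(1, 0, 0, -1)^{\top}, \quad
q_2 = \sqrt{\tfrac{2}{3}}\,(-1, 2, 0, -1)^{\top}, \quad
q_3 = \sqrt{\tfrac{1}{3}}\,(-1, -1, 3, -1)^{\top}
\end{aligned}
```

Under these assumptions the three resulting columns are mutually orthogonal, each sums to zero, and the nonzero entries are approximately 1.41421 and -1.41421 in the first column, 1.63299 and -0.81650 in the second, and 1.73205 and -0.57735 in the third.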

ORTHORDINAL
ORTHOTHERM

The columns are obtained by applying the Gram-Schmidt orthogonalization to the columns for PARAM=ORDINAL. The design matrix columns for A are as follows.

Orthogonal Ordinal Coding

A    Design Matrix
     AOORD1   AOORD2   AOORD3

ORTHPOLY

The columns are obtained by applying the Gram-Schmidt orthogonalization to the columns for PARAM=POLY. The design matrix columns for A are as follows.

Orthogonal Polynomial Coding

A    Design Matrix
     AOPOLY1   AOPOLY2   AOPOLY3

ORTHREF

The columns are obtained by applying the Gram-Schmidt orthogonalization to the columns for PARAM=REFERENCE. The design matrix columns for A are as follows.

Orthogonal Reference Coding

A    Design Matrix
     AOREF1   AOREF2   AOREF3

The following example illustrates several features of the CLASS statement.

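   /* Simulate 1,000 observations with two classification variables */
   data codingExample;
      drop i;
      do i=1 to 1000;
        c1 = 1 + mod(i,6);    /* numeric levels 1 through 6 */
        /* The trailing blank in 'very low ' gives c2 a length of 9, */
        /* so 'very high' is not truncated.                          */
        if      i < 50  then c2 = 'very low ';
        else if i < 250 then c2 = 'low';
        else if i < 500 then c2 = 'medium';
        else if i < 800 then c2 = 'high';
        else                 c2 = 'very high';
        x1 = ranuni(1);
        x2 = ranuni(1);
        /* y depends only on x1 and on levels 3 and 5 of c1 */
        y = x1 + 10*(c1=3) + 5*(c1=5) + rannor(1);
        output;
      end;
   run;

   /* Select a model; SPLIT lets the c1 parameters enter individually */
   proc glmselect data=codingExample;
      class c1(param=ref split) c2(param=ordinal order=data) /
             delimiter = ',' showcoding;
      model y = c1 c2 x1 x2 / orderselect;
   run;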

Figure 42.11 Class Level Information
The GLMSELECT Procedure

Class Level Information
Class   Levels      Values
c1           6   *  1,2,3,4,5,6
c2           5      very low,low,medium,high,very high
* Associated Parameters Split

The "Class Level Information" table shown in Figure 42.11 is produced by default whenever you specify a CLASS statement. Because the levels of the variable "c2" contain embedded blanks, the DELIMITER=',' option is specified so that the levels are separated by commas in the table. The SHOWCODING option requests the "Class Level Coding" table shown in Figure 42.12. An ordinal parameterization is used for "c2" because its levels have a natural order. Because these levels also appear in that order in the data, specifying the ORDER=DATA option preserves it.

Figure 42.12 Class Level Coding
Class Level Coding
c1       Design Variables
Level    1   2   3   4   5
1        1   0   0   0   0
2        0   1   0   0   0
3        0   0   1   0   0
4        0   0   0   1   0
5        0   0   0   0   1
6        0   0   0   0   0

The SPLIT option has been specified for the classification variable "c1." This permits the parameters associated with the effect "c1" to enter or leave the model individually. The "Parameter Estimates" table in Figure 42.13 shows that for this example the parameters corresponding to only levels 3 and 5 of "c1" are in the selected model. Finally, note that the ORDERSELECT option in the MODEL statement specifies that the parameters are displayed in the order in which they first entered the model.

Figure 42.13 Parameter Estimates
Parameter Estimates
Parameter   DF    Estimate   Standard Error   t Value
Intercept    1   -0.216680         0.068650     -3.16
c1_3         1   10.160900         0.087898    115.60
c1_5         1    5.018015         0.087885     57.10
x1           1    1.315468         0.109772     11.98
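For contrast, a hedged sketch of the same analysis without the SPLIT option is shown below. It assumes the codingExample data set created by the DATA step in this example. In this variant the five design columns of c1 are treated as a single effect that enters or leaves the model as a whole, so the selected model and its estimates may differ from Figure 42.13.

```sas
/* Same model as in the example, but c1 is not split: its five     */
/* reference-coded columns enter or leave the model as one effect. */
proc glmselect data=codingExample;
   class c1(param=ref) c2(param=ordinal order=data) /
          delimiter = ',' showcoding;
   model y = c1 c2 x1 x2 / orderselect;
run;
```

Comparing the two runs is one way to judge whether letting individual levels enter separately is worthwhile for a given classification variable.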
