PROC GLMSELECT: CLASS Variable Parameterization :: SAS/STAT(R) 9.2 User's Guide, Second Edition

The GLMSELECT Procedure

CLASS Variable Parameterization

Consider a model with one classification variable A with four levels, 1, 2, 5, and 7. Details of the possible choices for the PARAM= option follow.

EFFECT

Three columns are created to indicate group membership of the nonreference levels. For the reference level, all three dummy variables have a value of $-1$. For instance, if the reference level is 7 (REF=7), the design matrix columns for A are as follows.

Effect Coding
	Design Matrix
A	A1	A2	A5
1	1	0	0
2	0	1	0
5	0	0	1
7	-1	-1	-1

Parameter estimates of classification main effects that use the effect coding scheme estimate the difference in the effect of each nonreference level compared to the average effect over all four levels.
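The effect coding rule can be sketched by hand in a DATA step. This is an illustration only, assuming REF=7 as in the text; the data set name effectCoding is hypothetical, and PROC GLMSELECT builds these columns internally rather than through a step like this.

```sas
/* Hypothetical illustration: effect-coded columns for A with REF=7 */
data effectCoding;
   do A = 1, 2, 5, 7;
      if A = 7 then do;        /* reference level: every column is -1 */
         A1 = -1; A2 = -1; A5 = -1;
      end;
      else do;                 /* nonreference level: 0/1 indicators  */
         A1 = (A = 1);
         A2 = (A = 2);
         A5 = (A = 5);
      end;
      output;
   end;
run;

proc print data=effectCoding noobs;
run;
```

Because each column sums to zero over the four levels, the intercept in such a model corresponds to the average effect over all levels, which is why each parameter is read as a deviation from that average.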

GLM

As in PROC GLM, four columns are created to indicate group membership. The design matrix columns for A are as follows.

GLM Coding
	Design Matrix
A	A1	A2	A5	A7
1	1	0	0	0
2	0	1	0	0
5	0	0	1	0
7	0	0	0	1

Parameter estimates of classification main effects that use the GLM coding scheme estimate the difference in the effects of each level compared to the last level.

ORDINAL

THERMOMETER

Three columns are created to indicate group membership of the higher levels of the effect. For the first level of the effect (which for A is 1), all three dummy variables have a value of 0. The design matrix columns for A are as follows.

Ordinal Coding
	Design Matrix
A	A2	A5	A7
1	0	0	0
2	1	0	0
5	1	1	0
7	1	1	1

The first level of the effect is a control or baseline level. Parameter estimates of classification main effects that use the ORDINAL coding scheme estimate the effect on the response as the ordinal factor is set to each succeeding level. When the parameters for an ordinal main effect have the same sign, the response effect is monotonic across the levels.
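The cumulative ("thermometer") pattern of this coding can be sketched with comparison expressions in a DATA step. This is a hand-built illustration, not the procedure's internal code, and the data set name ordinalCoding is hypothetical.

```sas
/* Hypothetical illustration: ordinal (thermometer) columns for A */
data ordinalCoding;
   do A = 1, 2, 5, 7;
      A2 = (A >= 2);   /* 1 once level 2 has been reached */
      A5 = (A >= 5);   /* 1 once level 5 has been reached */
      A7 = (A >= 7);   /* 1 only at the highest level     */
      output;
   end;
run;

proc print data=ordinalCoding noobs;
run;
```

Each column switches on at its level and stays on for all higher levels, so each parameter measures the change in response when moving up one level from the preceding one.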

POLYNOMIAL

POLY

Three columns are created. The first represents the linear term ($x$), the second represents the quadratic term ($x^2$), and the third represents the cubic term ($x^3$), where $x$ is the level value. If the classification levels are not numeric, they are translated into 1, 2, 3, $\ldots$ according to their sorting order. The design matrix columns for A are as follows.

Polynomial Coding
	Design Matrix
A	APOLY1	APOLY2	APOLY3
1	1	1	1
2	2	4	8
5	5	25	125
7	7	49	343
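For numeric levels, the polynomial columns are simply powers of the level value. The following DATA step is a minimal sketch of that rule; the data set name polyCoding is hypothetical.

```sas
/* Hypothetical illustration: polynomial columns for A */
data polyCoding;
   do A = 1, 2, 5, 7;
      APOLY1 = A;       /* linear term    */
      APOLY2 = A**2;    /* quadratic term */
      APOLY3 = A**3;    /* cubic term     */
      output;
   end;
run;

proc print data=polyCoding noobs;
run;
```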

REFERENCE

REF

Three columns are created to indicate group membership of the nonreference levels. For the reference level, all three dummy variables have a value of 0. For instance, if the reference level is 7 (REF=7), the design matrix columns for A are as follows.

Reference Coding
	Design Matrix
A	A1	A2	A5
1	1	0	0
2	0	1	0
5	0	0	1
7	0	0	0

Parameter estimates of CLASS main effects that use the reference coding scheme estimate the difference in the effect of each nonreference level compared to the effect of the reference level.
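One way to inspect this coding is to let the procedure display it with the SHOWCODING option. The sketch below assumes a small made-up data set (refExample, with an arbitrary simulated response y) and uses SELECTION=NONE so that no selection is performed. The data and names are illustrative only.

```sas
/* Hypothetical four-level example; the response y is simulated for illustration */
data refExample;
   do A = 1, 2, 5, 7;
      do rep = 1 to 5;
         y = A + rannor(1);
         output;
      end;
   end;
   drop rep;
run;

proc glmselect data=refExample;
   class A(param=ref ref='7') / showcoding;   /* level 7 is the reference level */
   model y = A / selection=none;              /* fit the full model, no selection */
run;
```

The "Class Level Coding" table in the output should show three design columns, with the row for level 7 containing only zeros.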

ORTHEFFECT

The columns are obtained by applying the Gram-Schmidt orthogonalization to the columns for PARAM=EFFECT. The design matrix columns for A are as follows.

Orthogonal Effect Coding
	Design Matrix
A	AOEFF1	AOEFF2	AOEFF3
1	1.41421	-0.81650	-0.57735
2	0.00000	1.63299	-0.57735
5	0.00000	0.00000	1.73205
7	-1.41421	-0.81650	-0.57735
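The orthogonalization can be sketched in SAS/IML with the GSORTH call. This sketch rests on two assumptions that the text above does not state: that the effect-coded columns are orthogonalized after an intercept column, and that each resulting column is rescaled so that its sum of squares equals the number of levels. It also requires a SAS/IML license.

```sas
/* Sketch only: Gram-Schmidt on the PARAM=EFFECT columns (REF=7).          */
/* Assumed conventions: intercept column first, columns rescaled by sqrt(n) */
proc iml;
   X = { 1  1  0  0,      /* A = 1 */
         1  0  1  0,      /* A = 2 */
         1  0  0  1,      /* A = 5 */
         1 -1 -1 -1 };    /* A = 7 (reference level) */

   call gsorth(P, T, lindep, X);      /* X = P*T with orthonormal columns in P */

   n = nrow(X);
   Z = sqrt(n) # P[, 2:ncol(P)];      /* drop the intercept column and rescale */

   print Z[format=9.5 colname={"AOEFF1" "AOEFF2" "AOEFF3"}];
quit;
```

Replacing the last three columns of X with the ordinal, polynomial, or reference columns gives the corresponding sketch for the other orthogonal parameterizations.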

ORTHORDINAL

ORTHOTHERM

The columns are obtained by applying the Gram-Schmidt orthogonalization to the columns for PARAM=ORDINAL. The design matrix columns for A are as follows.

Orthogonal Ordinal Coding
	Design Matrix
A	AOORD1	AOORD2	AOORD3
1	-1.73205	0.00000	0.00000
2	0.57735	-1.63299	0.00000
5	0.57735	0.81650	-1.41421
7	0.57735	0.81650	1.41421

ORTHPOLY

The columns are obtained by applying the Gram-Schmidt orthogonalization to the columns for PARAM=POLY. The design matrix columns for A are as follows.

Orthogonal Polynomial Coding
	Design Matrix
A	AOPOLY1	AOPOLY2	AOPOLY3
1	-1.153	0.907	-0.921
2	-0.734	-0.540	1.473
5	0.524	-1.370	-0.921
7	1.363	1.004	0.368

ORTHREF

The columns are obtained by applying the Gram-Schmidt orthogonalization to the columns for PARAM=REFERENCE. The design matrix columns for A are as follows.

Orthogonal Reference Coding
	Design Matrix
A	AOREF1	AOREF2	AOREF3
1	1.73205	0.00000	0.00000
2	-0.57735	1.63299	0.00000
5	-0.57735	-0.81650	1.41421
7	-0.57735	-0.81650	-1.41421

The following example illustrates several features of the CLASS statement.

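   /* Simulate 1,000 observations with two classification variables */
   data codingExample;
      drop i;
      do i=1 to 1000;
        c1 = 1 + mod(i,6);                 /* six levels: 1 through 6 */
        /* The trailing blank in 'very low ' sets the length of c2 to 9, */
        /* so 'very high' is not truncated                               */
        if      i < 50  then c2 = 'very low ';
        else if i < 250 then c2 = 'low';
        else if i < 500 then c2 = 'medium';
        else if i < 800 then c2 = 'high';
        else                 c2 = 'very high';
        x1 = ranuni(1);
        x2 = ranuni(1);
        /* Only x1 and levels 3 and 5 of c1 affect the response */
        y = x1 + 10*(c1=3) +5*(c1=5) +rannor(1);
        output;
      end;
   run;

   /* Reference coding with split parameters for c1; ordinal coding in data order for c2 */
   proc glmselect data=codingExample;
      class c1(param=ref split) c2(param=ordinal order=data) /
             delimiter = ',' showcoding;
      model y = c1 c2 x1 x2/orderselect;
   run;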

Figure 42.11 Class Level Information

The GLMSELECT Procedure

Class Level Information
Class	Levels		Values
c1	6	*	1,2,3,4,5,6
c2	5		very low,low,medium,high,very high
* Associated Parameters Split

The "Class Level Information" table shown in Figure 42.11 is produced by default whenever you specify a CLASS statement. Because the levels of the variable "c2" contain embedded blanks, the DELIMITER=',' option has been specified so that the levels are separated by commas in the table. The SHOWCODING option requests the display of the "Class Level Coding" table shown in Figure 42.12. An ordinal parameterization is used for "c2" because its levels have a natural order. Furthermore, because these levels appear in their natural order in the data, you can preserve this order by specifying the ORDER=DATA option.

Figure 42.12 Class Level Coding

Class Level Coding
	Design Variables
c1 Level	1	2	3	4	5
1	1	0	0	0	0
2	0	1	0	0	0
3	0	0	1	0	0
4	0	0	0	1	0
5	0	0	0	0	1
6	0	0	0	0	0

The SPLIT option has been specified for the classification variable "c1." This permits the parameters associated with the effect "c1" to enter or leave the model individually. The "Parameter Estimates" table in Figure 42.13 shows that for this example the parameters corresponding to only levels 3 and 5 of "c1" are in the selected model. Finally, note that the ORDERSELECT option in the MODEL statement specifies that the parameters are displayed in the order in which they first entered the model.

Figure 42.13 Parameter Estimates

Parameter Estimates
Parameter	DF	Estimate	Standard Error	t Value
Intercept	1	-0.216680	0.068650	-3.16
c1_3	1	10.160900	0.087898	115.60
c1_5	1	5.018015	0.087885	57.10
x1	1	1.315468	0.109772	11.98
