You use the CLASS statement to identify classification (qualitative) variables, which are factors that separate the observations into groups. For example, a completely randomized design has a single class-variable that identifies the groups of observations. A randomized complete block design has two class-variables; one identifies the blocks and one identifies the treatments.
You can specify various v-options for each variable by enclosing them in parentheses after the variable name. You can also specify global v-options for the CLASS statement by placing them after a slash (/). Global v-options are applied to all the variables specified in the CLASS statement. However, individual CLASS variable v-options override the global v-options.
Class-variables can be either numeric or character. The OPTEX procedure uses the formatted values of class-variables in forming model effects. Any variable in the model that is not listed in the CLASS statement is assumed to be continuous (quantitative). Continuous variables must be numeric.
Note: If you specify a data set containing fixed covariate effects with a DESIGN= data set in the BLOCKS statement, then a CLASS or MODEL statement that follows the BLOCKS statement refers to the model for the fixed covariates. A CLASS or MODEL statement that defines the model for the candidate points (treatment model) should be specified before the BLOCKS statement.
ORDER=DATA | FORMATTED | FREQ | INTERNAL
specifies the sorting order for the levels of classification variables. This ordering determines which parameters in the model correspond to each level in the data, so the ORDER= option may be useful when you use the CONTRAST statement. When ORDER=FORMATTED (the default) applies to a numeric variable for which you have supplied no explicit format (that is, for which there is no corresponding FORMAT statement in the current PROC OPTEX run or in the DATA step that created the data set), the levels are ordered by their internal (numeric) value. This represents a change from previous releases in how class levels are ordered. Before SAS 8, numeric class levels with no explicit format were ordered by their BEST12. formatted values; to revert to the previous ordering, specify this format explicitly for the affected classification variables. The change was implemented because the former default behavior for ORDER=FORMATTED often resulted in levels not being ordered numerically. The following table shows how PROC OPTEX interprets values of the ORDER= option.
| Value of ORDER= | Levels Sorted By |
|---|---|
| DATA | order of appearance in the input data set |
| FORMATTED | external formatted value, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) value |
| FREQ | descending frequency count; levels with the most observations come first in the order |
| INTERNAL | unformatted value |
By default, ORDER=FORMATTED. For FORMATTED and INTERNAL, the sort order is machine dependent.
For more information on sorting order, see the chapter on the SORT procedure in the Base SAS Procedures Guide and the discussion of BY-group processing in SAS Language Reference: Concepts.
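The four orderings can be mimicked outside SAS. The following Python sketch is only an analogue and not how PROC OPTEX is implemented: the `levels_in_order` helper, the `labels` mapping, and the sample data are hypothetical, a format is modeled as a function from an internal value to a string, and ties under FREQ keep their order of appearance, which is an assumption.

```python
from collections import Counter

def levels_in_order(values, order="FORMATTED", fmt=None):
    """Return the distinct levels of `values` in the requested order.

    fmt maps an internal value to its formatted string;
    fmt=None models a numeric variable with no explicit format.
    """
    distinct = list(dict.fromkeys(values))      # order of first appearance
    if order == "DATA":
        return distinct
    if order == "INTERNAL":
        return sorted(distinct)
    if order == "FREQ":
        counts = Counter(values)
        # sorted() is stable, so tied levels keep their order of appearance (assumption)
        return sorted(distinct, key=lambda v: -counts[v])
    if order == "FORMATTED":
        if fmt is None:
            return sorted(distinct)             # no explicit format: internal value
        return sorted(distinct, key=fmt)        # explicit format: formatted value
    raise ValueError(f"unknown order: {order}")

data = [2, 10, 10, 1, 2, 10]                    # hypothetical numeric class variable
labels = {1: "low", 2: "medium", 10: "high"}    # hypothetical user-defined format

print(levels_in_order(data, "DATA"))                        # [2, 10, 1]
print(levels_in_order(data, "FREQ"))                        # [10, 2, 1]
print(levels_in_order(data, "INTERNAL"))                    # [1, 2, 10]
print(levels_in_order(data, "FORMATTED"))                   # [1, 2, 10]
print(levels_in_order(data, "FORMATTED", fmt=labels.get))   # [10, 1, 2]
```

With the hypothetical format, the levels sort alphabetically by label (high, low, medium), which shows how an explicit format can change which parameter corresponds to which level.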
PARAM=keyword
specifies the parameterization method for the classification variables. Design matrix columns are created from CLASS variables according to the following coding schemes. The default is PARAM=ORTHEFFECT. This represents a change from previous releases in how classification variables are parameterized. Before SAS 9, the default was PARAM=EFFECT; to revert to the previous parameterization, specify PARAM=EFFECT explicitly for the affected classification variables. The change was implemented because an orthogonal parameterization leads to D- and A-efficiency values that more realistically reflect the true efficiency of the design. If PARAM=ORTHPOLY or PARAM=POLY, and the CLASS levels are numeric, then the ORDER= option in the CLASS statement is ignored, and the internal, unformatted values are used.
EFFECT specifies effect coding
POLY specifies polynomial coding
REFERENCE specifies reference cell coding
ORDINAL specifies ordinal, or "thermometer," coding
ORTHEFFECT specifies orthogonal effect coding
ORTHPOLY specifies orthogonal polynomial coding
ORTHREF specifies orthogonal reference cell coding
ORTHORDINAL specifies orthogonal ordinal coding
All of these parameterizations are full rank. The orthogonal versions perform a scaled, intercept-augmented Gram-Schmidt orthogonalization on the columns of the corresponding nonorthogonal parameterizations. For the EFFECT and REFERENCE parameterizations, the REF= option in the CLASS statement determines the reference level.
Consider a model with one CLASS variable A with four levels, 1, 2, 5, and 7. Details of the possible choices for the PARAM= option follow.
PARAM=EFFECT
Three columns are created to indicate group membership of the nonreference levels. For the reference level, all three dummy variables have a value of –1. For instance, if the reference level is 7 (REF=7), the design matrix columns for A are as follows.

Effect Coding

| A | Design Matrix | | |
|---|---|---|---|
| 1 | 1 | 0 | 0 |
| 2 | 0 | 1 | 0 |
| 5 | 0 | 0 | 1 |
| 7 | –1 | –1 | –1 |

Parameter estimates of CLASS main effects that use the effect coding scheme estimate the difference in the effect of each nonreference level compared to the average effect over all four levels.
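The effect coding rule can be written out in a few lines. This is a minimal Python sketch for illustration only; the `effect_coding` helper is hypothetical and is not part of SAS. It rebuilds the rows for A with REF=7 and checks that every column sums to zero over the levels, which is consistent with reading the estimates as deviations from the average over all levels.

```python
def effect_coding(levels, ref):
    """Map each level to its row of effect-coded design columns."""
    nonref = [lv for lv in levels if lv != ref]
    rows = {}
    for lv in levels:
        if lv == ref:
            rows[lv] = [-1] * len(nonref)       # reference level: all -1
        else:
            # 1 in the column that belongs to this level, 0 elsewhere
            rows[lv] = [int(lv == other) for other in nonref]
    return rows

rows = effect_coding([1, 2, 5, 7], ref=7)
for level, row in rows.items():
    print(level, row)

# Each column sums to zero across the four levels
column_sums = [sum(col) for col in zip(*rows.values())]
print(column_sums)   # [0, 0, 0]
```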
PARAM=POLY
Three columns are created. The first represents the linear term (x), the second represents the quadratic term (x²), and the third represents the cubic term (x³), where x is the level value. If the CLASS levels are not numeric, they are translated into 1, 2, 3, ... according to their sorting order. The design matrix columns for A are as follows.

Polynomial Coding

| A | Design Matrix | | |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 2 | 2 | 4 | 8 |
| 5 | 5 | 25 | 125 |
| 7 | 7 | 49 | 343 |
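The polynomial columns are simply successive powers of the level value. The following Python sketch reproduces the table for A; the `poly_coding` helper is hypothetical, and it assumes that nonnumeric levels are passed in already sorted so that they can be replaced by 1, 2, 3, and so on.

```python
def poly_coding(levels):
    """Map each level to [x, x**2, ..., x**(k-1)] for k levels."""
    numeric = all(isinstance(lv, (int, float)) for lv in levels)
    # Nonnumeric levels are replaced by their position in the sort order
    xs = list(levels) if numeric else list(range(1, len(levels) + 1))
    degree = len(levels) - 1
    return {lv: [x ** p for p in range(1, degree + 1)]
            for lv, x in zip(levels, xs)}

for level, row in poly_coding([1, 2, 5, 7]).items():
    print(level, row)

# Character levels (assumed already sorted) are coded as 1, 2, 3
print(poly_coding(["high", "low", "medium"]))
```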
PARAM=REFERENCE
Three columns are created to indicate group membership of the nonreference levels. For the reference level, all three dummy variables have a value of 0. For instance, if the reference level is 7 (REF=7), the design matrix columns for A are as follows.

Reference Coding

| A | Design Matrix | | |
|---|---|---|---|
| 1 | 1 | 0 | 0 |
| 2 | 0 | 1 | 0 |
| 5 | 0 | 0 | 1 |
| 7 | 0 | 0 | 0 |

Parameter estimates of CLASS main effects that use the reference coding scheme estimate the difference in the effect of each nonreference level compared to the effect of the reference level.
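The interpretation of the estimates can be checked numerically. In this Python sketch the `reference_coding` helper and the level means are hypothetical and chosen only for illustration. With an intercept equal to the mean of the reference level and coefficients equal to differences from that mean, the reference-coded rows reproduce every level mean.

```python
def reference_coding(levels, ref):
    """Map each level to its row of reference-coded design columns."""
    nonref = [lv for lv in levels if lv != ref]
    return {lv: [int(lv == other) for other in nonref] for lv in levels}

means = {1: 10.0, 2: 12.0, 5: 9.0, 7: 11.0}    # hypothetical level means
ref = 7
rows = reference_coding(list(means), ref)

intercept = means[ref]                                       # effect of the reference level
coefs = [means[lv] - means[ref] for lv in means if lv != ref]  # differences from it

# intercept + row . coefs gives back the mean of each level
fitted = {lv: intercept + sum(x * b for x, b in zip(row, coefs))
          for lv, row in rows.items()}
print(coefs)    # [-1.0, 1.0, -2.0]
print(fitted)
```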
PARAM=ORDINAL
Three columns are created to indicate group membership in successive collections of levels after the first. For instance, the design matrix columns for A are as follows.

Ordinal Coding

| A | Design Matrix | | |
|---|---|---|---|
| 1 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 |
| 5 | 1 | 1 | 0 |
| 7 | 1 | 1 | 1 |

Parameter estimates of CLASS main effects that use the ordinal coding scheme estimate the difference in the average effect of each successive collection of levels compared to the effect of the first level.
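The "thermometer" pattern in the ordinal table can be generated by position: column j is 1 for every level that comes after the j-th level in the sort order. The following Python sketch is illustrative only, and the `ordinal_coding` helper is hypothetical.

```python
def ordinal_coding(levels):
    """Map each level (given in sorted order) to its thermometer-coded row."""
    k = len(levels)
    # Column j switches on for all levels positioned after level j
    return {lv: [int(i > j) for j in range(k - 1)]
            for i, lv in enumerate(levels)}

for level, row in ordinal_coding([1, 2, 5, 7]).items():
    print(level, row)
```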
PARAM=ORTHEFFECT
The columns are obtained by applying the Gram-Schmidt orthogonalization to the mean-centered columns for PARAM=EFFECT, and then scaling so that the sum of squares for each column equals the number of levels. The design matrix columns for A are as follows.

Orthogonal Effects Coding

| A | Design Matrix | | |
|---|---|---|---|
| 1 | 1.414 | –0.816 | –0.577 |
| 2 | 0 | 1.633 | –0.577 |
| 5 | 0 | 0 | 1.732 |
| 7 | –1.414 | –0.816 | –0.577 |
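The orthogonal effect columns can be reproduced from the effect-coded columns. The following Python sketch follows the description in the text (mean-center, apply Gram-Schmidt, scale each column to a sum of squares equal to the number of levels); the `orthogonalize` helper is hypothetical and uses classical Gram-Schmidt in plain floating point, which may differ in detail from the procedure's own numerical method.

```python
import math

def orthogonalize(columns):
    """Mean-center, Gram-Schmidt, then scale each column to sum of squares n."""
    n = len(columns[0])
    result = []
    for col in columns:
        mean = sum(col) / n
        v = [x - mean for x in col]                      # mean-center
        for q in result:                                 # remove earlier directions
            coef = sum(a * b for a, b in zip(v, q)) / sum(b * b for b in q)
            v = [a - coef * b for a, b in zip(v, q)]
        scale = math.sqrt(n / sum(a * a for a in v))     # sum of squares becomes n
        result.append([a * scale for a in v])
    return result

# Effect-coded columns for levels 1, 2, 5, 7 with REF=7
effect_columns = [[1, 0, 0, -1], [0, 1, 0, -1], [0, 0, 1, -1]]
rows = [list(r) for r in zip(*orthogonalize(effect_columns))]
for level, row in zip([1, 2, 5, 7], rows):
    print(level, [round(x, 3) for x in row])
```

The printed rows agree with the table to three decimal places. Passing the polynomial, reference, or ordinal columns to the same function gives the corresponding orthogonal coding.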
PARAM=ORTHPOLY
The columns are obtained by applying the Gram-Schmidt orthogonalization to the mean-centered columns for PARAM=POLY, and then scaling so that the sum of squares for each column equals the number of levels. The design matrix columns for A are as follows.

Orthogonal Polynomial Coding

| A | Design Matrix | | |
|---|---|---|---|
| 1 | –1.153 | 0.907 | –0.921 |
| 2 | –0.734 | –0.540 | 1.473 |
| 5 | 0.524 | –1.370 | –0.921 |
| 7 | 1.363 | 1.004 | 0.368 |
PARAM=ORTHREF
The columns are obtained by applying the Gram-Schmidt orthogonalization to the mean-centered columns for PARAM=REFERENCE, and then scaling so that the sum of squares for each column equals the number of levels. The design matrix columns for A are as follows.

Orthogonal Reference Coding

| A | Design Matrix | | |
|---|---|---|---|
| 1 | 1.732 | 0 | 0 |
| 2 | –0.577 | 1.633 | 0 |
| 5 | –0.577 | –0.816 | 1.414 |
| 7 | –0.577 | –0.816 | –1.414 |
PARAM=ORTHORDINAL
The columns are obtained by applying the Gram-Schmidt orthogonalization to the mean-centered columns for PARAM=ORDINAL, and then scaling so that the sum of squares for each column equals the number of levels. The design matrix columns for A are as follows.

Orthogonal Ordinal Coding

| A | Design Matrix | | |
|---|---|---|---|
| 1 | –1.732 | 0 | 0 |
| 2 | 0.577 | –1.633 | 0 |
| 5 | 0.577 | 0.816 | –1.414 |
| 7 | 0.577 | 0.816 | 1.414 |
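A quick way to check the properties claimed for the orthogonal codings is to form the cross-product matrix of the intercept column together with the tabulated columns. The Python sketch below does this for the Orthogonal Ordinal Coding values, copied as printed to three decimals; the variable names are illustrative. Up to rounding, the result is the number of levels (4) times the identity matrix, so the columns are orthogonal to the intercept and to each other, and each has a sum of squares of 4.

```python
# Orthogonal Ordinal Coding rows for A, as tabulated (three decimals)
table = {
    1: [-1.732, 0.0, 0.0],
    2: [0.577, -1.633, 0.0],
    5: [0.577, 0.816, -1.414],
    7: [0.577, 0.816, 1.414],
}

X = [[1.0] + row for row in table.values()]    # prepend the intercept column
n, p = len(X), len(X[0])

# Cross-product matrix X'X
gram = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
        for i in range(p)]

# Largest departure from n * I; small because the table is rounded
max_dev = max(abs(gram[i][j] - (n if i == j else 0.0))
              for i in range(p) for j in range(p))
print(max_dev < 0.01)   # True
```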
REF=level | keyword
specifies the reference level for PARAM=EFFECT or PARAM=REFERENCE. For an individual (but not a global) variable REF= option, you can specify the level of the variable to use as the reference level. For a global or individual variable REF= option, you can use one of the following keywords. The default is REF=LAST.
FIRST designates the first ordered level as reference
LAST designates the last ordered level as reference
TRUNCATE
specifies that class levels should be determined by using only up to the first 16 characters of the formatted values of CLASS variables. When formatted values are longer than 16 characters, you can use this option to revert to the levels as determined in releases before SAS 9.
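The practical effect of the 16-character rule is that formatted values sharing their first 16 characters collapse into a single level. The following Python sketch illustrates this with plain string slicing; the `class_levels` helper and the sample values are hypothetical, and details such as how SAS treats trailing blanks are not modeled.

```python
def class_levels(formatted_values, truncate=False, width=16):
    """Return the distinct class levels, optionally using only the first `width` characters."""
    keys = (v[:width] if truncate else v for v in formatted_values)
    return sorted(set(keys))

values = ["Treatment group A1", "Treatment group A2", "Control"]

print(class_levels(values))                  # three distinct levels
print(class_levels(values, truncate=True))   # the two treatment values merge
```

Here both treatment labels begin with the same 16 characters ("Treatment group " including the trailing space), so truncation leaves two levels instead of three.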