A classification variable enters the statistical analysis or model not through its values but through its levels. The process of associating values of a variable with levels is called levelization.
During levelization, observations that share the same value are assigned to the same level. Applying a format can change how values are grouped, because values that differ internally but share a formatted value fall into one level. You can determine the sort order of the levels by specifying the ORDER= option in the CLASS statement, and you can also control the sort order separately for each variable in the CLASS statement.
Consider the data on nine observations in Table 23.4. The variable A is integer-valued, and the variable X is a continuous variable that has a missing value for the fourth observation. The fourth and fifth columns of Table 23.4 apply two different formats to the variable X.
Table 23.4: Example Data for Levelization

| Obs | A | X | FORMAT X 3.0 | FORMAT X 3.1 |
|---|---|---|---|---|
| 1 | 2 | 1.09 | 1 | 1.1 |
| 2 | 2 | 1.13 | 1 | 1.1 |
| 3 | 2 | 1.27 | 1 | 1.3 |
| 4 | 3 | . | . | . |
| 5 | 3 | 2.26 | 2 | 2.3 |
| 6 | 3 | 2.48 | 2 | 2.5 |
| 7 | 4 | 3.34 | 3 | 3.3 |
| 8 | 4 | 3.34 | 3 | 3.3 |
| 9 | 4 | 3.14 | 3 | 3.1 |

By default, levelization groups observations by the formatted value of the variable, except for numeric variables that have no explicit format; the levels of those variables are sorted by internal value. Levelizing the four variable columns of Table 23.4 (A, X, and X under each of the two formats) produces the level assignments in Table 23.5.
Table 23.5: Values and Levels

| Obs | A: Value | A: Level | X: Value | X: Level | FORMAT X 3.0: Value | FORMAT X 3.0: Level | FORMAT X 3.1: Value | FORMAT X 3.1: Level |
|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1.09 | 1 | 1 | 1 | 1.1 | 1 |
| 2 | 2 | 1 | 1.13 | 2 | 1 | 1 | 1.1 | 1 |
| 3 | 2 | 1 | 1.27 | 3 | 1 | 1 | 1.3 | 2 |
| 4 | 3 | 2 | . | . | . | . | . | . |
| 5 | 3 | 2 | 2.26 | 4 | 2 | 2 | 2.3 | 3 |
| 6 | 3 | 2 | 2.48 | 5 | 2 | 2 | 2.5 | 4 |
| 7 | 4 | 3 | 3.34 | 7 | 3 | 3 | 3.3 | 6 |
| 8 | 4 | 3 | 3.34 | 7 | 3 | 3 | 3.3 | 6 |
| 9 | 4 | 3 | 3.14 | 6 | 3 | 3 | 3.1 | 5 |

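The default rules can be checked with a small Python model. This is an illustrative sketch under stated assumptions, not PROC HPSEVERITY's implementation: None stands for the SAS missing value '.', the hypothetical helper `w_d` imitates the SAS w.d formats with Python fixed-point formatting (which could round differently from SAS in exact tie cases, though none occur in this data), and formatted values are sorted as plain strings. The hypothetical function `levelize` reproduces every Level column of Table 23.5.

```python
from typing import Callable, List, Optional, Sequence

# Data from Table 23.4; None stands for the SAS missing value '.'
A = [2, 2, 2, 3, 3, 3, 4, 4, 4]
X = [1.09, 1.13, 1.27, None, 2.26, 2.48, 3.34, 3.34, 3.14]


def w_d(width: int, decimals: int) -> Callable[[float], str]:
    """Hypothetical stand-in for a SAS w.d format: right-aligned, rounded text."""
    return lambda v: f"{v:{width}.{decimals}f}"


def levelize(values: Sequence[Optional[float]],
             fmt: Optional[Callable[[float], str]] = None) -> List[Optional[int]]:
    """Assign 1-based levels in the default (ORDER=FORMATTED) order.

    With a format, observations are grouped and sorted by formatted text;
    without one, numeric values are grouped and sorted by internal value.
    Missing values receive no level (None).
    """
    key = fmt if fmt is not None else (lambda v: v)
    sorted_keys = sorted({key(v) for v in values if v is not None})
    level_of = {k: i + 1 for i, k in enumerate(sorted_keys)}
    return [None if v is None else level_of[key(v)] for v in values]


print(levelize(A))             # [1, 1, 1, 2, 2, 2, 3, 3, 3]
print(levelize(X))             # [1, 2, 3, None, 4, 5, 7, 7, 6]
print(levelize(X, w_d(3, 0)))  # [1, 1, 1, None, 2, 2, 3, 3, 3]
print(levelize(X, w_d(3, 1)))  # [1, 1, 2, None, 3, 4, 6, 6, 5]
```

The 3.0 format collapses the seven distinct nonmissing values of X into three levels and the 3.1 format collapses them into six, which is the grouping effect of formats in action.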
You can specify the sort order for the levels of CLASS variables by using the ORDER= option in the CLASS statement.
When ORDER=FORMATTED (which is the default) is in effect for numeric variables for which you have supplied no explicit format, the levels are ordered by their internal values. To order numeric class levels that have no explicit format by their BEST12. formatted values, you can specify the BEST12. format explicitly for the CLASS variables.
Table 23.6 shows how values of the ORDER= option are interpreted.
Table 23.6: Interpretation of Values of ORDER= Option

| Value of ORDER= | Levels Sorted By |
|---|---|
| DATA | Order of appearance in the input data set |
| FORMATTED | External formatted value, except for numeric variables that have no explicit format, which are sorted by their unformatted (internal) value |
| FREQ | Descending frequency count (levels that have the most observations come first in the order) |
| INTERNAL | Unformatted value |
| FREQDATA | Order of descending frequency count, and within counts by order of appearance in the input data set when counts are tied |
| FREQFORMATTED | Order of descending frequency count, and within counts by formatted value when counts are tied |
| FREQINTERNAL | Order of descending frequency count, and within counts by unformatted (internal) value when counts are tied |

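To make the differences among these ORDER= values concrete, the following Python sketch orders the distinct nonmissing values of a variable under each rule in Table 23.6. It is a model, not SAS code: the data are hypothetical, `str` plays the role of a hypothetical left-aligned format, the format is assumed to map distinct values to distinct strings (so each level has exactly one internal value), Python comparison stands in for the host collating sequence, and plain ORDER=FREQ is omitted because the table does not state how it breaks ties.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence


def order_levels(values: Sequence, order: str = "FORMATTED",
                 fmt: Optional[Callable] = None) -> List:
    """Return the distinct nonmissing values in level order for ORDER=order.

    Assumes fmt (if given) is one-to-one, so each level has one internal value.
    """
    present = [v for v in values if v is not None]  # None stands for '.'
    distinct = list(dict.fromkeys(present))         # order of first appearance
    rank = {v: i for i, v in enumerate(distinct)}
    counts = Counter(present)
    # FORMATTED falls back to the internal value when there is no explicit format
    fkey = fmt if fmt is not None else (lambda v: v)
    sort_keys = {
        "DATA": lambda v: rank[v],
        "INTERNAL": lambda v: v,
        "FORMATTED": fkey,
        "FREQDATA": lambda v: (-counts[v], rank[v]),
        "FREQINTERNAL": lambda v: (-counts[v], v),
        "FREQFORMATTED": lambda v: (-counts[v], fkey(v)),
    }
    return sorted(distinct, key=sort_keys[order.upper()])


# Hypothetical data: 20 occurs three times; 7, 5, and 10 occur twice each
x = [20, 7, 5, 20, 10, 7, 5, 10, 20]
for opt in ["DATA", "INTERNAL", "FORMATTED",
            "FREQDATA", "FREQINTERNAL", "FREQFORMATTED"]:
    print(f"{opt} -> {order_levels(x, opt, fmt=str)}")
# DATA -> [20, 7, 5, 10]
# INTERNAL -> [5, 7, 10, 20]
# FORMATTED -> [10, 20, 5, 7]
# FREQDATA -> [20, 7, 5, 10]
# FREQINTERNAL -> [20, 5, 7, 10]
# FREQFORMATTED -> [20, 10, 5, 7]

print(order_levels(x, "FORMATTED"))  # no format: [5, 7, 10, 20]
```

Without `fmt`, FORMATTED and INTERNAL agree, which mirrors the rule that numeric variables with no explicit format are ordered by internal value; with a character format, string comparison puts "10" before "5".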
For FORMATTED, FREQFORMATTED, FREQINTERNAL, and INTERNAL values, the sort order is machine-dependent. For more information about sort order, see the chapter about the SORT procedure in the Base SAS Procedures Guide and the discussion of BY-group processing in SAS Language Reference: Concepts.
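When you specify the MISSING option in the CLASS statement, missing values ('.' for a numeric variable and blanks for a character variable) are included in the levelization and are assigned a level. Table 23.7 displays the results of levelizing the values in Table 23.4 when the MISSING option is in effect. Because the missing value sorts before every nonmissing value, it receives level 1 and each nonmissing level of X shifts up by one; the levels of A, which has no missing values, are unchanged.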
Table 23.7: Values and Levels with the MISSING Option

| Obs | A: Value | A: Level | X: Value | X: Level | FORMAT X 3.0: Value | FORMAT X 3.0: Level | FORMAT X 3.1: Value | FORMAT X 3.1: Level |
|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1.09 | 2 | 1 | 2 | 1.1 | 2 |
| 2 | 2 | 1 | 1.13 | 3 | 1 | 2 | 1.1 | 2 |
| 3 | 2 | 1 | 1.27 | 4 | 1 | 2 | 1.3 | 3 |
| 4 | 3 | 2 | . | 1 | . | 1 | . | 1 |
| 5 | 3 | 2 | 2.26 | 5 | 2 | 3 | 2.3 | 4 |
| 6 | 3 | 2 | 2.48 | 6 | 2 | 3 | 2.5 | 5 |
| 7 | 4 | 3 | 3.34 | 8 | 3 | 4 | 3.3 | 7 |
| 8 | 4 | 3 | 3.34 | 8 | 3 | 4 | 3.3 | 7 |
| 9 | 4 | 3 | 3.14 | 7 | 3 | 4 | 3.1 | 6 |

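A small Python model with a `missing` flag reproduces Table 23.7. It is an illustration rather than the procedure's implementation and assumes, as Table 23.7 shows for this data, that the missing level sorts before every nonmissing level. None stands for '.', and the hypothetical helper `w_d` imitates the SAS w.d formats with Python fixed-point formatting.

```python
from typing import Callable, List, Optional, Sequence

# Data from Table 23.4; None stands for the SAS missing value '.'
A = [2, 2, 2, 3, 3, 3, 4, 4, 4]
X = [1.09, 1.13, 1.27, None, 2.26, 2.48, 3.34, 3.34, 3.14]


def w_d(width: int, decimals: int) -> Callable[[float], str]:
    """Hypothetical stand-in for a SAS w.d format (right-aligned, rounded)."""
    return lambda v: f"{v:{width}.{decimals}f}"


def levelize(values: Sequence[Optional[float]],
             fmt: Optional[Callable[[float], str]] = None,
             missing: bool = False) -> List[Optional[int]]:
    """Assign 1-based levels; with missing=True, '.' becomes its own level."""
    key = fmt if fmt is not None else (lambda v: v)
    sorted_keys = sorted({key(v) for v in values if v is not None})
    # Assumption: the missing level sorts first, so nonmissing levels shift by one
    has_missing_level = missing and any(v is None for v in values)
    offset = 1 if has_missing_level else 0
    level_of = {k: i + 1 + offset for i, k in enumerate(sorted_keys)}

    def level(v: Optional[float]) -> Optional[int]:
        if v is None:
            return 1 if missing else None  # no level unless MISSING is in effect
        return level_of[key(v)]

    return [level(v) for v in values]


print(levelize(A, missing=True))             # [1, 1, 1, 2, 2, 2, 3, 3, 3]
print(levelize(X, missing=True))             # [2, 3, 4, 1, 5, 6, 8, 8, 7]
print(levelize(X, w_d(3, 0), missing=True))  # [2, 2, 2, 1, 3, 3, 4, 4, 4]
print(levelize(X, w_d(3, 1), missing=True))  # [2, 2, 3, 1, 4, 5, 7, 7, 6]
```

Setting `missing=False` gives back the levels of Table 23.5, with None for the fourth observation of X.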
When you do not specify the MISSING option, it is important to understand the implications of missing values for your statistical analysis. When PROC HPSEVERITY levelizes the CLASS variables, any observations for which a CLASS variable has a missing value are excluded from the analysis. This is true regardless of whether the variable is used to form the statistical model. For example, consider the case in which some observations contain missing values for variable A but the records for these observations are otherwise complete with respect to all other variables in the model. The analysis results that come from the following statements do not include any observations for which variable A contains missing values, even though A is not specified in the SCALEMODEL statement:
```sas
class A B;
scalemodel B x B*x;
```
You can request that PROC HPSEVERITY print the "Descriptive Statistics" table, which shows the number of observations that are read from the data set and the number of observations that are used in the analysis. Pay careful attention to this table, especially when your data set contains missing values, to ensure that no observations are unintentionally excluded from the analysis.
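The exclusion rule amounts to listwise deletion on the CLASS variables. The Python sketch below models it with hypothetical records in which A is missing in two observations; the function name `usable_rows` and the data are illustrative only, and the sketch considers nothing but missing CLASS values.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def usable_rows(rows: Sequence[Row], class_vars: Sequence[str],
                missing: bool = False) -> List[Row]:
    """Keep rows that have no missing (None) value in any CLASS variable.

    With missing=True, missing values form their own level, so no row is
    dropped because of its CLASS variables.
    """
    if missing:
        return list(rows)
    return [r for r in rows if all(r[c] is not None for c in class_vars)]


# Hypothetical records for "class A B; scalemodel B x B*x;": A is not in the
# model, but it is still listed as a CLASS variable.
rows = [
    {"A": 1, "B": "u", "x": 0.5},
    {"A": None, "B": "v", "x": 1.5},
    {"A": 2, "B": "u", "x": 2.5},
    {"A": None, "B": "v", "x": 3.5},
]
used = usable_rows(rows, class_vars=["A", "B"])
print(f"read {len(rows)}, used {len(used)}")            # read 4, used 2
print(len(usable_rows(rows, ["A", "B"], missing=True)))  # 4
```

A gap between the number of observations read and the number used is the signal to look for: here, dropping A from the CLASS statement or adding MISSING would keep all four records.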