The HPSEVERITY Procedure

Specification and Parameterization of Model Effects

PROC HPSEVERITY supports formation of regression effects in the SCALEMODEL statement. A regression effect is formed from one or more regressor variables according to effect construction rules (parameterization). Each regression effect forms one element of $\mathbf{X}$ in the linear model structure $\mathbf{X}\boldsymbol{\beta}$ that affects the scale parameter. The SCALEMODEL statement in conjunction with the CLASS statement supports a rich set of effects. To interpret the results correctly, you need to understand the specification and parameterization of effects that are discussed in this section.

Effects are specified by a special notation that uses variable names and operators. There are two types of regressor variables: classification (or CLASS) variables and continuous variables. Classification variables can be either numeric or character and are specified in a CLASS statement. For more information, see the section Levelization of Classification Variables. A regressor variable that is not declared in the CLASS statement is assumed to be continuous.

Two primary operators (crossing and nesting) are used for combining the variables, and several additional operators are used to simplify effect specification. Operators are discussed in the section Effect Operators.

If you specify the CLASS statement, then PROC HPSEVERITY supports a general linear model (GLM) parameterization and a reference parameterization for the classification variables. The GLM parameterization is the default. For more information, see the sections GLM Parameterization of Classification Variables and Effects and Reference Parameterization.

Effect Operators

Table 9.8 summarizes the operators that are available for selecting and constructing effects. These operators are discussed in the following sections.

Table 9.8: Available Effect Operators

Operator              Example        Description
Interaction           A*B            Crosses the levels of the effects
Nesting               A(B)           Nests A levels within B levels
Bar operator          A | B | C      Specifies all interactions
At sign operator      A | B | C@2    Reduces interactions in bar effects
Dash operator         A1-A10         Specifies sequentially numbered variables
Colon operator        A:             Specifies variables that have a common prefix
Double dash operator  A--C           Specifies sequential variables in data set order


Bar and At Sign Operators

You can shorten the specification of a large factorial model by using the bar operator. For example, the following two statements specify the same full three-way factorial model:

scalemodel A B C   A*B A*C B*C   A*B*C;

scalemodel A|B|C;

When you use the bar (|), the right and left sides become effects, and the cross of them becomes an effect. Multiple bars are permitted. The expressions are expanded from left to right, using rules 2–4 from Searle (1971, p. 390).

  • Multiple bars are evaluated from left to right. For example, A | B | C is evaluated as follows:

    A | B | C  $\rightarrow$  $\{$ A | B $\}$ | C
               $\rightarrow$  $\{$ A  B  A*B $\}$ | C
               $\rightarrow$  A  B  A*B  C  A*C  B*C  A*B*C

  • Crossed and nested groups of variables are combined. For example, A(B) | C(D) generates A*C(B D), among other terms.

  • Duplicate variables are removed. For example, A(C) | B(C) generates A*B(C C), among other terms, and the extra C is removed.

  • Effects are discarded if a variable occurs on both the crossed and nested parts of an effect. For example, A(B) | B(D E) generates A*B(B D E), but this effect is eliminated immediately.
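The four expansion rules above can be sketched in Python. This is an illustrative model only, not PROC HPSEVERITY code: the function names (`parse_effect`, `cross`, `bar_expand`) and the tuple representation of an effect are assumptions made for this example, and the sketch handles only distinct classification variables (it does not model polynomial terms such as X*X).

```python
import re

def parse_effect(text):
    """Parse 'A*B(C D)' into (crossed, nested) tuples of variable names."""
    m = re.fullmatch(r"\s*([^()]+?)\s*(?:\(([^()]*)\))?\s*", text)
    if m is None:
        raise ValueError(f"cannot parse effect: {text!r}")
    crossed = tuple(v.strip() for v in m.group(1).split("*"))
    nested = tuple((m.group(2) or "").split())
    return crossed, nested

def fmt(effect):
    """Format (crossed, nested) back into the 'A*B(C D)' notation."""
    crossed, nested = effect
    text = "*".join(crossed)
    return f"{text}({' '.join(nested)})" if nested else text

def cross(e1, e2, class_order):
    """Cross two effects; return None if the result must be discarded."""
    # Crossed and nested groups are combined; sets remove duplicate variables.
    crossed = sorted(set(e1[0]) | set(e2[0]), key=class_order.index)
    nested = sorted(set(e1[1]) | set(e2[1]), key=class_order.index)
    # Discard effects in which a variable is both crossed and nested.
    if set(crossed) & set(nested):
        return None
    return tuple(crossed), tuple(nested)

def bar_expand(spec, class_order):
    """Expand a bar specification from left to right."""
    operands = [parse_effect(part) for part in spec.split("|")]
    effects = [operands[0]]
    for operand in operands[1:]:
        new = [operand]
        for effect in effects:
            combined = cross(effect, operand, class_order)
            if combined is not None and combined not in effects + new:
                new.append(combined)
        effects.extend(new)
    return [fmt(e) for e in effects]

print(bar_expand("A | B | C", list("ABCDE")))
print(bar_expand("A(B) | B(D E)", list("ABCDE")))
```

The second argument plays the role of the CLASS statement order, which the sketch uses only to sort variable names within an effect.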

To limit the number of variables in any effect that results from bar evaluation, append an at sign (@) followed by the maximum number of variables to the end of the bar effect. For example, the following specification selects only those effects that contain two or fewer variables:

scalemodel A|B|C@2;

The preceding example is equivalent to the following SCALEMODEL statement:

scalemodel A B C   A*B A*C B*C;
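For bar effects built from distinct, non-nested variables, the at sign result can be viewed as the set of all crossings that involve at most the stated number of variables. The following Python sketch illustrates that view; the function name `bar_at` is hypothetical, and the sketch returns a set, so it does not reproduce the order in which the procedure lists the effects.

```python
from itertools import combinations

def bar_at(variables, max_vars):
    """Crossings of distinct variables that involve at most max_vars variables."""
    return {
        "*".join(combo)
        for size in range(1, max_vars + 1)
        for combo in combinations(variables, size)
    }

# A|B|C@2 keeps the main effects and the two-way interactions only.
print(sorted(bar_at(["A", "B", "C"], 2)))
```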

More examples of using the bar and at sign operators follow:

A | C(B)           is equivalent to   A  C(B)  A*C(B)
A(B) | C(B)        is equivalent to   A(B)  C(B)  A*C(B)
A(B) | B(D E)      is equivalent to   A(B)  B(D E)
A | B(A) | C       is equivalent to   A  B(A)  C  A*C  B*C(A)
A | B(A) | C@2     is equivalent to   A  B(A)  C  A*C
A | B | C | D@2    is equivalent to   A  B  A*B  C  A*C  B*C  D  A*D  B*D  C*D
A*B(C*D)           is equivalent to   A*B(C D)

Note: The preceding examples assume the following CLASS statement specification:

class A B C D;

Colon, Dash, and Double Dash Operators

You can simplify the specification of a large model when some of your variables have a common prefix by using the colon (:) operator and the dash (-) operator. The colon operator selects all variables that have a particular prefix, and the dash operator enables you to list variables that are numbered sequentially. For example, if your data set contains the variables X1 through X9, the following SCALEMODEL statements are equivalent:

scalemodel X1 X2 X3 X4 X5 X6 X7 X8 X9;

scalemodel X1-X9;

scalemodel X:;

If your data set contains only the three covariates X1, X2, and X9, then the colon operator selects all three variables:

scalemodel X:;

However, the following specification returns an error because X3 through X8 are not in the data set:

scalemodel X1-X9;

The double dash (--) operator enables you to select variables that are stored sequentially in the SAS data set, whether or not they have a common prefix. You can use the CONTENTS procedure (see Base SAS Procedures Guide) to determine your variable ordering. For example, if you replace the dash in the preceding SCALEMODEL statement with a double dash, as follows, then all three variables are selected, provided that they are stored consecutively in the data set:

scalemodel X1--X9;

If your data set contains the variables A, B, and C, then you can use the double dash operator to select these variables by specifying the following:

scalemodel A--C;
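The difference among the three name-list operators can be sketched in Python by treating the data set as an ordered list of variable names. The function names (`colon`, `dash`, `double_dash`) are hypothetical, and the sketch compares names exactly as typed; it is a simplified model of the selection rules described above, not SAS code.

```python
import re

def colon(prefix, names):
    """X: selects every variable whose name starts with the prefix."""
    return [n for n in names if n.startswith(prefix)]

def dash(first, last, names):
    """X1-X9 lists sequentially numbered variables; each one must exist."""
    m1 = re.fullmatch(r"(.*?)(\d+)", first)
    m2 = re.fullmatch(r"(.*?)(\d+)", last)
    if not m1 or not m2 or m1.group(1) != m2.group(1):
        raise ValueError("both names need the same prefix and a numeric suffix")
    wanted = [f"{m1.group(1)}{i}"
              for i in range(int(m1.group(2)), int(m2.group(2)) + 1)]
    missing = [w for w in wanted if w not in names]
    if missing:
        raise KeyError(f"variables not in data set: {missing}")
    return wanted

def double_dash(first, last, names):
    """X1--X9 selects by storage position in the data set, inclusive."""
    i, j = names.index(first), names.index(last)
    return names[i:j + 1]

data_set = ["Y", "X1", "X2", "X9", "Z"]   # storage order of the variables
print(colon("X", data_set))
print(double_dash("X1", "X9", data_set))
```

Because the double dash works by position, any variable stored between the two endpoints is selected as well, whatever its name.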

GLM Parameterization of Classification Variables and Effects

Table 9.9 shows the types of effects that are available in the HPSEVERITY procedure; they are discussed in more detail in the following sections. Let A, B, and C represent classification variables, and let X and Z represent continuous variables.

Table 9.9: Available Types of Effects

Effect                     Example     Description
Singleton continuous       X Z         Continuous variables
Polynomial continuous      X*Z         Interaction of continuous variables
Main                       A B         CLASS variables
Interaction                A*B         Crossing of CLASS variables
Nested                     A(B)        Main effect A nested within CLASS effect B
Continuous-by-class        X*A         Crossing of continuous and CLASS variables
Continuous-nesting-class   X(A)        Continuous variable X nested within CLASS variable A
General                    X*Z*A(B)    Combinations of different types of effects


Continuous Effects

Continuous variables or polynomial terms that involve them can be included in the model as continuous effects. An effect that contains a single continuous variable is referred to as a singleton continuous effect, and an effect that contains an interaction of only continuous variables is referred to as a polynomial continuous effect. The actual values of such terms are included as columns of the relevant model matrices. You can use the bar operator along with a continuous variable to generate polynomial effects. For example, X | X | X expands to X X*X X*X*X, which is a cubic model.
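As a small illustration of how polynomial continuous effects become model-matrix columns, the following Python sketch builds the columns that X | X | X stands for. The function name `polynomial_columns` is hypothetical; the point is only that each column holds the actual product of the continuous values.

```python
def polynomial_columns(x, degree):
    """Columns for X, X*X, ..., up to the given degree."""
    columns = {}
    for d in range(1, degree + 1):
        name = "*".join(["X"] * d)          # "X", "X*X", "X*X*X", ...
        columns[name] = [value ** d for value in x]
    return columns

# A cubic model in X needs three columns.
for name, column in polynomial_columns([2, 3, 5], 3).items():
    print(name, column)
```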

Main Effects

If a classification variable has m levels, the GLM parameterization generates m columns for its main effect in the model matrix. Each column is an indicator variable for a given level. The order of the columns is the sort order of the values of their levels and can be controlled by the ORDER= option in the CLASS statement.

Table 9.10 is an example where $\beta _0$ denotes the intercept and A and B are classification variables that have two and three levels, respectively.

Table 9.10: Example of Main Effects

Data        I            A           B
A    B      $\beta_0$    A1   A2     B1   B2   B3
1    1      1            1    0      1    0    0
1    2      1            1    0      0    1    0
1    3      1            1    0      0    0    1
2    1      1            0    1      1    0    0
2    2      1            0    1      0    1    0
2    3      1            0    1      0    0    1


There are usually more columns for these effects than there are degrees of freedom to estimate them. In other words, the GLM parameterization of main effects is singular.
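The indicator columns and the resulting singularity can be checked with a short Python sketch that uses the data from Table 9.10. The function name `main_effect_columns` is hypothetical, and levels are ordered by their sorted values, which is an assumption that stands in for the ORDER= option.

```python
def main_effect_columns(name, values):
    """One 0/1 indicator column per level, in sorted level order."""
    return {
        f"{name}{level}": [1 if v == level else 0 for v in values]
        for level in sorted(set(values))
    }

A = [1, 1, 1, 2, 2, 2]
B = [1, 2, 3, 1, 2, 3]
a_cols = main_effect_columns("A", A)
b_cols = main_effect_columns("B", B)
intercept = [1] * len(A)

# The indicator columns of each main effect add up to the intercept column,
# so the columns are linearly dependent (the parameterization is singular).
a_sum = [sum(col[i] for col in a_cols.values()) for i in range(len(A))]
b_sum = [sum(col[i] for col in b_cols.values()) for i in range(len(B))]
print(a_sum == intercept, b_sum == intercept)
```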

Interaction Effects

Often a regression model includes interaction (crossed) effects to account for how the effect of a variable changes along with the values of other variables. In an interaction, the terms are first reordered to correspond to the order of the variables in the CLASS statement. Thus, B*A becomes A*B if A precedes B in the CLASS statement. Then, the GLM parameterization generates columns for all combinations of levels that occur in the data. The order of the columns is such that the rightmost variables in the interaction change faster than the leftmost variables, as illustrated in Table 9.11.

Table 9.11: Example of Interaction Effects

Data        I            A           B                A*B
A    B      $\beta_0$    A1   A2     B1   B2   B3     A1B1  A1B2  A1B3  A2B1  A2B2  A2B3
1    1      1            1    0      1    0    0      1     0     0     0     0     0
1    2      1            1    0      0    1    0      0     1     0     0     0     0
1    3      1            1    0      0    0    1      0     0     1     0     0     0
2    1      1            0    1      1    0    0      0     0     0     1     0     0
2    2      1            0    1      0    1    0      0     0     0     0     1     0
2    3      1            0    1      0    0    1      0     0     0     0     0     1


In the matrix in Table 9.11, main-effects columns are not linearly independent of crossed-effects columns. In fact, the column space for the crossed effects contains the space of the main effect.
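The column ordering and the containment of the main-effect space can be illustrated with a Python sketch based on the data in Table 9.11. The function name `interaction_columns` is hypothetical, and sorted level order is assumed.

```python
from itertools import product

def interaction_columns(a, b):
    """Indicator columns for A*B; the rightmost variable (B) changes fastest."""
    observed = set(zip(a, b))
    columns = {}
    # product() varies its last argument fastest, matching the column order.
    for la, lb in product(sorted(set(a)), sorted(set(b))):
        if (la, lb) in observed:   # only combinations that occur in the data
            columns[f"A{la}B{lb}"] = [
                1 if (x, y) == (la, lb) else 0 for x, y in zip(a, b)
            ]
    return columns

A = [1, 1, 1, 2, 2, 2]
B = [1, 2, 3, 1, 2, 3]
ab = interaction_columns(A, B)
print(list(ab))

# The main-effect column A1 is the sum of the crossed columns A1B1, A1B2, A1B3.
a1 = [1 if v == 1 else 0 for v in A]
a1_from_cross = [ab["A1B1"][i] + ab["A1B2"][i] + ab["A1B3"][i] for i in range(len(A))]
print(a1 == a1_from_cross)
```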

When your regression model contains many interaction effects, you might be able to code them more parsimoniously by using the bar operator ( | ). The bar operator generates all possible interaction effects. For example, A | B | C expands to A B A*B C A*C B*C A*B*C. To eliminate higher-order interaction effects, use the at sign (@) in conjunction with the bar operator. For example, A | B | C | D@2 expands to A B A*B C A*C B*C D A*D B*D C*D.

Nested Effects

Nested effects are generated in the same manner as crossed effects. Hence, the design columns that are generated by the following two statements are the same (but the ordering of the columns is different):

scalemodel A B(A);

scalemodel A A*B;

The nesting operator in PROC HPSEVERITY is more of a notational convenience than an operation that is distinct from crossing. Nested effects are usually characterized by the property that the nested variables do not appear as main effects. The order of the variables within nesting parentheses is made to correspond to the order of these variables in the CLASS statement. The order of the columns is such that variables outside the parentheses index faster than those inside the parentheses, and the rightmost nested variables index faster than the leftmost variables, as illustrated in Table 9.12.

Table 9.12: Example of Nested Effects

Data        I            A           B(A)
A    B      $\beta_0$    A1   A2     B1A1  B2A1  B3A1  B1A2  B2A2  B3A2
1    1      1            1    0      1     0     0     0     0     0
1    2      1            1    0      0     1     0     0     0     0
1    3      1            1    0      0     0     1     0     0     0
2    1      1            0    1      0     0     0     1     0     0
2    2      1            0    1      0     0     0     0     1     0
2    3      1            0    1      0     0     0     0     0     1
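The following Python sketch reproduces the B(A) columns of Table 9.12 and checks that, as a set, they match the columns that crossing A and B would generate. The function name `nested_columns` is hypothetical, and sorted level order is assumed.

```python
def nested_columns(b, a):
    """Indicator columns for B(A); B (outside the parentheses) changes fastest."""
    observed = set(zip(b, a))
    columns = {}
    for la in sorted(set(a)):          # nested variable changes slowest
        for lb in sorted(set(b)):      # variable outside parentheses changes fastest
            if (lb, la) in observed:
                columns[f"B{lb}A{la}"] = [
                    1 if (y, x) == (lb, la) else 0 for y, x in zip(b, a)
                ]
    return columns

A = [1, 1, 1, 2, 2, 2]
B = [1, 2, 3, 1, 2, 3]
nested = nested_columns(B, A)
print(list(nested))

# Crossing A and B yields one indicator per observed (A, B) pair; compare as sets.
crossed = {
    tuple(1 if (x, y) == pair else 0 for x, y in zip(A, B))
    for pair in set(zip(A, B))
}
print({tuple(col) for col in nested.values()} == crossed)
```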


Continuous-Nesting-Class Effects

When a continuous variable nests or crosses with a classification variable, the design columns are constructed by multiplying the continuous values into the design columns for the classification effect, as illustrated in Table 9.13.

Table 9.13: Example of Continuous-Nesting-Class Effects

Data        I            A           X(A)
X    A      $\beta_0$    A1   A2     X(A1)   X(A2)
21   1      1            1    0      21      0
24   1      1            1    0      24      0
22   1      1            1    0      22      0
28   2      1            0    1      0       28
19   2      1            0    1      0       19
23   2      1            0    1      0       23
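The multiplication of continuous values into the classification indicator columns can be sketched in Python with the data from Table 9.13. The function name `continuous_within_class` is hypothetical.

```python
def continuous_within_class(x, a):
    """Columns for X(A): the value of X where A equals the level, 0 elsewhere."""
    return {
        f"X(A{level})": [xi if ai == level else 0 for xi, ai in zip(x, a)]
        for level in sorted(set(a))
    }

X = [21, 24, 22, 28, 19, 23]
A = [1, 1, 1, 2, 2, 2]
for name, column in continuous_within_class(X, A).items():
    print(name, column)
```

Each column equals the elementwise product of X and the corresponding indicator column of A.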


Continuous-by-Class Effects

Continuous-by-class effects generate the same design columns as continuous-nesting-class effects. Table 9.14 shows the construction of the X*A effect. The two columns for this effect are the same as the columns for the X(A) effect in Table 9.13.

Table 9.14: Example of Continuous-by-Class Effects

Data        I            X     A           X*A
X    A      $\beta_0$    X     A1   A2     X*A1   X*A2
21   1      1            21    1    0      21     0
24   1      1            24    1    0      24     0
22   1      1            22    1    0      22     0
28   2      1            28    0    1      0      28
19   2      1            19    0    1      0      19
23   2      1            23    0    1      0      23


General Effects

An example that combines all the effects is X1*X2*A*B*C(D E). The continuous list comes first, followed by the crossed list, followed by the nested list in parentheses. PROC HPSEVERITY might rename effects to correspond to ordering rules. For example, B*A(E D) might be renamed A*B(D E) to satisfy the following:

  • Classification variables that occur outside parentheses (crossed effects) are sorted in the order in which they appear in the CLASS statement.

  • Variables within parentheses (nested effects) are sorted in the order in which they appear in the CLASS statement.

The sequencing of the parameters that are generated by an effect is determined by the variables whose levels are indexed faster:

  • Variables in the crossed list index faster than variables in the nested list.

  • Within a crossed or nested list, variables to the right index faster than variables to the left.

For example, suppose a model includes four classification variables (A, B, C, and D), each of which has two levels, 1 and 2. Assume the CLASS statement is

class A B C D;

Then the order of the parameters for the effect B*A(C D), which is renamed A*B(C D), is

\[
\begin{array}{llll}
A_1 B_1 C_1 D_1 \rightarrow & A_1 B_2 C_1 D_1 \rightarrow & A_2 B_1 C_1 D_1 \rightarrow & A_2 B_2 C_1 D_1 \rightarrow \\
A_1 B_1 C_1 D_2 \rightarrow & A_1 B_2 C_1 D_2 \rightarrow & A_2 B_1 C_1 D_2 \rightarrow & A_2 B_2 C_1 D_2 \rightarrow \\
A_1 B_1 C_2 D_1 \rightarrow & A_1 B_2 C_2 D_1 \rightarrow & A_2 B_1 C_2 D_1 \rightarrow & A_2 B_2 C_2 D_1 \rightarrow \\
A_1 B_1 C_2 D_2 \rightarrow & A_1 B_2 C_2 D_2 \rightarrow & A_2 B_1 C_2 D_2 \rightarrow & A_2 B_2 C_2 D_2 \phantom{\rightarrow}
\end{array}
\]

Note that first the crossed effects B and A are sorted in the order in which they appear in the CLASS statement so that A precedes B in the parameter list. Then, for each combination of the nested effects in turn, combinations of A and B appear. The B effect changes fastest because it is rightmost in the cross list. Then A changes next fastest, and D changes next fastest after that. The C effect changes most slowly because it is leftmost in the nested list.
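The renaming and indexing rules can be sketched in Python. The function name `parameter_order` and its arguments are hypothetical; the sketch sorts the crossed and nested lists by CLASS statement order and then enumerates level combinations so that crossed variables change faster than nested ones and rightmost variables change faster than leftmost ones.

```python
from itertools import product

def parameter_order(crossed, nested, levels, class_order):
    """Parameter labels for an effect, in the order the rules above imply."""
    crossed = sorted(crossed, key=class_order.index)
    nested = sorted(nested, key=class_order.index)
    # product() varies its last factor fastest, so list the slowest variables
    # first: the nested list (left to right), then the crossed list.
    slow_to_fast = nested + crossed
    labels = []
    for combo in product(*(levels[v] for v in slow_to_fast)):
        chosen = dict(zip(slow_to_fast, combo))
        labels.append(" ".join(f"{v}{chosen[v]}" for v in crossed + nested))
    return labels

levels = {v: [1, 2] for v in "ABCD"}
order = parameter_order(["B", "A"], ["C", "D"], levels, list("ABCD"))
for start in range(0, len(order), 4):
    print("  ->  ".join(order[start:start + 4]))
```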

Reference Parameterization

Classification variables can be represented in the reference parameterization. Consider the classification variable A that has four values, 1, 2, 5, and 7. The reference parameterization generates three columns (one fewer than the number of variable levels). The columns indicate group membership of the nonreference levels. For the reference level, all three dummy variables have a value of 0. If the reference level is 7 (REF='7'), the design columns for variable A are as shown in Table 9.15.

Table 9.15: Reference Coding

 

      Design Matrix
A     A1   A2   A5
1     1    0    0
2     0    1    0
5     0    0    1
7     0    0    0


Parameter estimates of CLASS main effects that use the reference coding scheme estimate the difference in the effect of each nonreference level compared to the effect of the reference level.
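The reference coding in Table 9.15 can be reproduced with a short Python sketch. The function name `reference_columns` is hypothetical, and sorted level order is assumed.

```python
def reference_columns(name, values, ref):
    """Indicator columns for every level except the reference level."""
    nonreference = [level for level in sorted(set(values)) if level != ref]
    return {
        f"{name}{level}": [1 if v == level else 0 for v in values]
        for level in nonreference
    }

A = [1, 2, 5, 7]
columns = reference_columns("A", A, ref=7)
for i, value in enumerate(A):
    # The row for the reference level (7) is all zeros.
    print(value, [columns[c][i] for c in columns])
```

Because the reference level has no column of its own, a model with an intercept absorbs that level into the intercept, which is why each remaining coefficient is read as a difference from the reference level.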