Each term in a model, called an effect, is a variable or combination of variables. Effects are specified with a special notation that uses variable names and operators. There are two kinds of variables: classification (or CLASS) variables and continuous variables. There are two primary operators: crossing and nesting. A third operator, the bar operator, is used to simplify effect specification.
In an analysis-of-variance model, independent variables must be variables that identify classification levels. In the SAS System, these are called classification (or class) variables and are declared in the CLASS statement. (They can also be called categorical, qualitative, discrete, or nominal variables.) Classification variables can be either numeric or character. The values of a classification variable are called levels. For example, the classification variable Sex has the levels "male" and "female."
In a model, an independent variable that is not declared in the CLASS statement is assumed to be continuous. Continuous variables, which must be numeric, are used for response variables and covariates. For example, the heights and weights of subjects are continuous variables.
There are seven different types of effects used in the GLM procedure. In the following list, assume that A, B, C, D, and E are CLASS variables and that X1, X2, and Y are continuous variables:
Regressor effects are specified by writing continuous variables by themselves: X1 X2.
Polynomial effects are specified by joining two or more continuous variables with asterisks: X1*X1 X1*X2.
Main effects are specified by writing CLASS variables by themselves: A B C.
Crossed effects (interactions) are specified by joining classification variables with asterisks: A*B B*C A*B*C.
Nested effects are specified by following a main effect or crossed effect with a classification variable or list of classification variables enclosed in parentheses. The main effect or crossed effect is nested within the effects listed in parentheses: B(A) C(B*A) D*E(C*B*A). In this example, B(A) is read "B nested within A."
Continuous-by-class effects are written by joining continuous variables and classification variables with asterisks: X1*A.
Continuous-nesting-class effects consist of continuous variables followed by a classification variable interaction enclosed in parentheses: X1(A) X1*X2(A*B).
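The seven effect types in the list above can be told apart mechanically from which variables are CLASS variables and whether a parenthesized list is present. The following Python sketch illustrates that reading of the list; it is not part of SAS, the function name classify_effect is hypothetical, and the variable roles are the ones assumed in the list (A through E as CLASS variables; X1, X2, and Y as continuous variables).

```python
import re

# Assumed variable roles, taken from the list of effect types.
CLASS_VARS = {"A", "B", "C", "D", "E"}
CONTINUOUS_VARS = {"X1", "X2", "Y"}


def classify_effect(effect):
    """Return the effect type name for one effect string (illustrative only)."""
    match = re.fullmatch(r"\s*([^()]+?)\s*(?:\(([^()]*)\))?\s*", effect)
    if match is None:
        raise ValueError(f"cannot parse effect: {effect!r}")
    # Variables joined by asterisks before any parenthesis.
    crossed = [name.strip() for name in match.group(1).split("*")]
    # Variables listed inside the parentheses, if any.
    nested = re.findall(r"\w+", match.group(2) or "")

    unknown = set(crossed + nested) - CLASS_VARS - CONTINUOUS_VARS
    if unknown:
        raise ValueError(f"undeclared variables: {sorted(unknown)}")
    if any(name in CONTINUOUS_VARS for name in nested):
        raise ValueError("only classification variables can appear in parentheses")

    has_continuous = any(name in CONTINUOUS_VARS for name in crossed)
    has_class = any(name in CLASS_VARS for name in crossed)

    if nested:
        if has_continuous and has_class:
            return "general (continuous by class, nested)"
        return "continuous-nesting-class" if has_continuous else "nested"
    if has_continuous and has_class:
        return "continuous-by-class"
    if has_continuous:
        return "regressor" if len(crossed) == 1 else "polynomial"
    return "main" if len(crossed) == 1 else "crossed"


for text in ["X1", "X1*X1", "A", "A*B*C", "D*E(C*B*A)", "X1*A", "X1*X2(A*B)"]:
    print(text, "->", classify_effect(text))
```

The sketch only labels an effect string; it says nothing about how PROC GLM builds the corresponding design-matrix columns.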
One example of the general form of an effect involving several variables is
X1*X2*A*B*C(D*E)
This example contains crossed continuous terms by crossed classification terms nested within more than one classification variable. The continuous list comes first, followed by the crossed list, followed by the nesting list in parentheses. Note that asterisks can appear within the nested list but not immediately before the left parenthesis. For details on how the design matrix and parameters are defined with respect to the effects specified in this section, see the section Parameterization of PROC GLM Models.
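The three-part layout of the general form (continuous list, crossed list, nesting list) can be illustrated with a small Python sketch. It is an assumption-laden illustration rather than SAS's parser: the function name split_effect is hypothetical, any variable not passed in class_vars is treated as continuous, and the only syntax rule checked is the one stated above, that an asterisk cannot come immediately before the left parenthesis.

```python
import re


def split_effect(effect, class_vars):
    """Split an effect into (continuous, crossed, nested) lists of names."""
    text = effect.strip()
    # Rule from the text: no asterisk immediately before the left parenthesis.
    if re.search(r"\*\s*\(", text):
        raise ValueError("an asterisk cannot come immediately before '('")
    head, paren, tail = text.partition("(")
    if paren and not tail.endswith(")"):
        raise ValueError("unbalanced parentheses")
    names = [name.strip() for name in head.split("*")]
    # Variables not declared as CLASS variables are assumed to be continuous.
    continuous = [name for name in names if name not in class_vars]
    crossed = [name for name in names if name in class_vars]
    # Asterisks (or spaces) may separate the variables in the nesting list.
    nested = re.findall(r"\w+", tail)
    return continuous, crossed, nested


parts = split_effect("X1*X2*A*B*C(D*E)", {"A", "B", "C", "D", "E"})
print(parts)  # (['X1', 'X2'], ['A', 'B', 'C'], ['D', 'E'])
```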
The MODEL statement and several other statements use these effects. Some examples of MODEL statements that use various kinds of effects are shown in the following table; a, b, and c represent classification variables, and y, y1, y2, x, and z represent continuous variables.
Specification | Type of Model |
---|---|
model y=x; | Simple regression |
model y=x z; | Multiple regression |
model y=x x*x; | Polynomial regression |
model y1 y2=x z; | Multivariate regression |
model y=a; | One-way ANOVA |
model y=a b c; | Main-effects ANOVA |
model y=a b a*b; | Factorial ANOVA with interaction |
model y=a b(a) c(b a); | Nested ANOVA |
model y1 y2=a b; | Multivariate analysis of variance (MANOVA) |
model y=a x; | Analysis of covariance |
model y=a x(a); | Separate-slopes regression |
model y=a x x*a; | Homogeneity-of-slopes regression |
You can shorten the specification of a large factorial model by using the bar operator. For example, two ways of writing the model for a full three-way factorial model follow:
model Y = A B C A*B A*C B*C A*B*C;
model Y = A|B|C;
When the bar (|) is used, the effects on its left and right sides are kept as effects, and the cross of the two sides is added as a further effect. Multiple bars are permitted. The expressions are expanded from left to right, using rules 2–4 given in Searle (1971, p. 390).
Multiple bars are evaluated from left to right. For instance, A|B|C is evaluated as follows:
A | B | C → { A | B } | C
→ { A B A*B } | C
→ A B A*B C A*C B*C A*B*C
Crossed and nested groups of variables are combined. For example, A(B) | C(D) generates A*C(B D), among other terms.
Duplicate variables are removed. For example, A(C) | B(C) generates A*B(C C), among other terms, and the extra C is removed.
Effects are discarded if a variable occurs on both the crossed and nested parts of an effect. For instance, A(B) | B(D E) generates A*B(B D E), but this effect is eliminated immediately.
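The four rules above can be sketched in Python as follows. This is only an illustration of the rules as stated here, not PROC GLM's actual implementation: the function names (parse_effect, format_effect, cross, expand_bar) are hypothetical, and the sketch assumes that each operand between bars is a single effect with no grouping parentheses and no @ limit.

```python
import re


def parse_effect(text):
    """Parse an effect such as 'A*B(C D)' into (crossed, nested) tuples."""
    head, _, tail = text.partition("(")
    crossed = tuple(re.findall(r"\w+", head))
    nested = tuple(re.findall(r"\w+", tail))
    return crossed, nested


def format_effect(effect):
    """Turn a (crossed, nested) pair back into effect notation."""
    crossed, nested = effect
    text = "*".join(crossed)
    return f"{text}({' '.join(nested)})" if nested else text


def cross(left, right):
    """Cross two effects; return None when the result must be discarded."""
    crossed = left[0] + right[0]
    # Combine the nested groups and drop duplicate variables, keeping order.
    nested = tuple(dict.fromkeys(left[1] + right[1]))
    # Discard the effect if a variable is both crossed and nested.
    if set(crossed) & set(nested):
        return None
    return crossed, nested


def expand_bar(expression):
    """Expand a bar expression such as 'A | B(A) | C' from left to right."""
    operands = [parse_effect(part) for part in expression.split("|")]
    effects = []
    for operand in operands:
        # Keep the effects so far, add the new operand, then add the crosses.
        products = [cross(effect, operand) for effect in effects]
        effects = effects + [operand] + [p for p in products if p is not None]
    return [format_effect(effect) for effect in effects]


print(" ".join(expand_bar("A|B|C")))  # A B A*B C A*C B*C A*B*C
print(expand_bar("A(B) | B(D E)"))    # ['A(B)', 'B(D E)']
```

For A|B|C the sketch reproduces the effects of the explicit three-way factorial MODEL statement, in the same order. Duplicate removal is applied only to the nesting list, which is the case the text describes.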
You can also specify the maximum number of variables involved in any effect that results from bar evaluation by specifying that maximum number, preceded by an @ sign, at the end of the bar effect. For example, the specification A | B | C@2 would result in only those effects that contain 2 or fewer variables: in this case, A, B, A*B, C, A*C, and B*C.
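The effect of the @ limit can be sketched as a filter applied to an already expanded list of effects. The Python below is a minimal illustration, not SAS code; the function name limit_effects is hypothetical, and it assumes that every variable named in an effect, whether crossed or nested, counts toward the maximum.

```python
import re


def limit_effects(effects, max_vars):
    """Keep the effects that involve at most max_vars variables."""
    kept = []
    for effect in effects:
        # Count every variable name in the effect (assumption: nested
        # variables count as well as crossed ones).
        if len(re.findall(r"\w+", effect)) <= max_vars:
            kept.append(effect)
    return kept


# The full expansion of A | B | C, as given by the three-way factorial model.
full_factorial = ["A", "B", "A*B", "C", "A*C", "B*C", "A*B*C"]
print(limit_effects(full_factorial, 2))
# ['A', 'B', 'A*B', 'C', 'A*C', 'B*C']
```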
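More examples of using the bar and at operators follow:
A | C(B) is equivalent to A C(B) A*C(B)
A(B) | C(B) is equivalent to A(B) C(B) A*C(B)
A(B) | B(D E) is equivalent to A(B) B(D E)
A | B(A) | C is equivalent to A B(A) C A*C B*C(A)
A | B(A) | C@2 is equivalent to A B(A) C A*C
A | B | C | D@2 is equivalent to A B A*B C A*C B*C D A*D B*D C*D
A*B(C*D) is equivalent to A*B(C D)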