Shared Concepts and Topics

Lag Effects

EFFECT name=LAG (variable / WITHIN=variable PERIOD=variable lag-options);

EFFECT name=LAG (variable / WITHIN=(var-list) PERIOD=variable lag-options);

A lag effect is a classification effect for the variable that is specified after the keyword LAG. A lag effect represents the effect of a previous value of the lagged variable when the observations of this variable are inherently ordered. A typical example where lag effects are useful is a study in which different subjects are given sequences of treatments and you want to investigate whether the treatment in the previous period is important in understanding the outcome in the current period. You can do this by including a lagged treatment effect in your model.

The precise definition of a lag effect depends on a subdivision of the data into disjoint subsets and on an ordering of the observations within each subset into units. The subsets are often called subjects, and they are specified in the required WITHIN= option. The units are often called periods, and they are specified in the required PERIOD= option. For an observation that belongs to a particular subject at a particular period, the design matrix columns of the lag effect are the usual design matrix columns of the lagged variable, but taken from the observation at the preceding period for that subject. Observations at the initial period do not have a preceding value, and so the design matrix columns of the lag effect for these observations are set to 0. You can also define lag effects where the number of periods that are lagged is greater than 1. If the number of periods that are lagged is n, then the design matrix columns of observations in periods less than or equal to n are set to 0. The design matrix columns that correspond to a subject at period p, where p > n, are the usual design matrix columns of the lagged variable for that subject at period p – n.
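The rule for an n-period lag can be sketched for a single subject as follows. This is a Python illustration of the definition above, not SAS code and not a description of how SAS implements lag effects. The function name lagged_levels, the ALL_ZERO marker, and the four-period sequence are hypothetical and are used only for illustration.

```python
ALL_ZERO = "all-zero"  # marker: the lag columns for this period are all 0


def lagged_levels(levels_by_period, nlag=1):
    """For one subject, return the level whose design matrix columns the
    lag effect uses at each period.

    levels_by_period[i] is the level observed at period i + 1.
    Periods 1..nlag have no lagged value, so they get ALL_ZERO.
    """
    if nlag < 1:
        raise ValueError("nlag must be a positive integer")
    result = []
    for p in range(1, len(levels_by_period) + 1):
        if p <= nlag:
            result.append(ALL_ZERO)
        else:
            # Period p uses the level observed at period p - nlag.
            result.append(levels_by_period[p - nlag - 1])
    return result


sequence = ["A", "B", "C", "D"]  # hypothetical levels at periods 1 to 4
print(lagged_levels(sequence, nlag=1))  # ['all-zero', 'A', 'B', 'C']
print(lagged_levels(sequence, nlag=2))  # ['all-zero', 'all-zero', 'A', 'B']
```

With a lag of 2, the first two periods have no lagged value, and period 3 picks up the level from period 1.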

In a valid lag design there is at most one observation for a particular period and subject. For example, the following set of treatments by subject and period forms a valid lag design (a dot marks a missing treatment):

Subject	Period	Treatment
Sheila	1	B
Joey	1	A
Athena	1	A
Gelindo	1	A
Sheila	2	C
Joey	2	A
Athena	2	.
Gelindo	2	B
Sheila	3	B
Joey	3	C
Athena	3	A
Gelindo	3	B

A convenient way to represent the organization of observations into subjects and periods is to form the lag design matrix. The rows and columns of this matrix correspond to the subjects and periods, respectively. The lag design matrix entry is the treatment for the corresponding subject and period.

The associated lag design matrix is

	Period
Subject	1	2	3
Athena	A	.	A
Gelindo	A	B	B
Joey	A	A	C
Sheila	B	C	B
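As a rough illustration, the lag design matrix above can be assembled from the long-form listing of subjects, periods, and treatments. The following Python sketch is not SAS code. The function name lag_design_matrix is hypothetical, None stands for a missing entry, and ordinary Python sorting stands in for the ordering of subjects and periods. The sketch also rejects data that violate the rule of at most one observation per subject and period.

```python
# Treatments by subject and period, in the order listed above.
# None marks Athena's missing treatment at period 2.
ROWS = [
    ("Sheila", 1, "B"), ("Joey", 1, "A"), ("Athena", 1, "A"), ("Gelindo", 1, "A"),
    ("Sheila", 2, "C"), ("Joey", 2, "A"), ("Athena", 2, None), ("Gelindo", 2, "B"),
    ("Sheila", 3, "B"), ("Joey", 3, "C"), ("Athena", 3, "A"), ("Gelindo", 3, "B"),
]


def lag_design_matrix(rows):
    """Return (subjects, periods, matrix), where matrix[i][j] is the
    treatment of subjects[i] at periods[j], or None if it is missing."""
    cells = {}
    for subject, period, treatment in rows:
        key = (subject, period)
        if key in cells:
            # A valid lag design has at most one observation per cell.
            raise ValueError(f"duplicate observation for {key}")
        cells[key] = treatment
    subjects = sorted({s for s, _, _ in rows})
    periods = sorted({p for _, p, _ in rows})
    matrix = [[cells.get((s, p)) for p in periods] for s in subjects]
    return subjects, periods, matrix


subjects, periods, matrix = lag_design_matrix(ROWS)
for subject, row in zip(subjects, matrix):
    print(subject, row)
# Athena ['A', None, 'A']
# Gelindo ['A', 'B', 'B']
# Joey ['A', 'A', 'C']
# Sheila ['B', 'C', 'B']
```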

Note that the subject Athena did not receive a treatment at period 2, and so the corresponding entry in the lag design matrix is missing. The following statements define a lag effect for this lag design:

CLASS treatment;
EFFECT Lag = LAG( treatment / WITHIN=subject PERIOD=period);

When GLM coding is used for the variable treatment, the design matrix columns Lag_A, Lag_B, and Lag_C for the constructed effect Lag are as follows:

Subject	Period	Treatment	Lag_A	Lag_B	Lag_C
Athena	1	A	0	0	0
Athena	2	.	1	0	0
Athena	3	A	.	.	.
Gelindo	1	A	0	0	0
Gelindo	2	B	1	0	0
Gelindo	3	B	0	1	0
Joey	1	A	0	0	0
Joey	2	A	1	0	0
Joey	3	C	1	0	0
Sheila	1	B	0	0	0
Sheila	2	C	0	1	0
Sheila	3	B	0	0	1

The design matrix columns for each subject at period 1 are all 0 because there are no lagged observations for period 1. You can also see that the design matrix columns at period 3 for subject Athena are missing because Athena did not receive a treatment at period 2. Nevertheless, the design matrix columns for Athena at period 2 are nonmissing and correspond to the treatment "A" that she received in period 1.
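The columns in the preceding table can be reproduced with a short sketch. The following Python code is an illustration of the one-period lag with GLM (indicator) coding, not SAS code. The function name one_period_lag_columns is hypothetical, and None stands for a missing value. The sketch also assumes that an absent lagged observation is treated the same way as a missing lagged treatment, which the text above does not state.

```python
LEVELS = ["A", "B", "C"]  # levels of treatment, in column order

# (subject, period, treatment); None marks a missing treatment.
ROWS = [
    ("Athena", 1, "A"), ("Athena", 2, None), ("Athena", 3, "A"),
    ("Gelindo", 1, "A"), ("Gelindo", 2, "B"), ("Gelindo", 3, "B"),
    ("Joey", 1, "A"), ("Joey", 2, "A"), ("Joey", 3, "C"),
    ("Sheila", 1, "B"), ("Sheila", 2, "C"), ("Sheila", 3, "B"),
]


def one_period_lag_columns(rows, levels):
    """Map (subject, period) to the indicator columns of the lag effect."""
    treatment_at = {(s, p): t for s, p, t in rows}
    periods = sorted({p for _, p, _ in rows})
    rank = {p: i for i, p in enumerate(periods)}
    columns = {}
    for subject, period, _ in rows:
        i = rank[period]
        if i == 0:
            # Initial period: no preceding value, so all columns are 0.
            columns[(subject, period)] = [0] * len(levels)
            continue
        previous = treatment_at.get((subject, periods[i - 1]))
        if previous is None:
            # Assumption: a missing or absent lagged treatment gives
            # missing lag columns.
            columns[(subject, period)] = [None] * len(levels)
        else:
            columns[(subject, period)] = [int(previous == lev) for lev in levels]
    return columns


lag = one_period_lag_columns(ROWS, LEVELS)
print(lag[("Athena", 2)])  # [1, 0, 0]
print(lag[("Athena", 3)])  # [None, None, None]
print(lag[("Sheila", 3)])  # [0, 0, 1]
```

The current treatment of an observation plays no role in its own lag columns. This is why Athena's row at period 2 is complete even though her treatment in that period is missing.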

You must specify the following required options:

PERIOD=variable: specifies the period variable of the lag design. The number of periods is the number of unique formatted values of the variable, and the periods are ordered by sorting these formatted values in ascending order.
WITHIN=(var-list) | variable: specifies a variable (or a list of variables within parentheses) that defines the subject grouping of the lag design. If there is only one WITHIN= variable, then the parentheses are not required. Each subject is defined by a unique combination of formatted values of the variables in the WITHIN= list. The subjects are sorted in ascending lexicographic order.
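When the WITHIN= list contains several variables, each distinct combination of their values defines one subject. The following Python sketch illustrates that grouping; it is not SAS code. The variable names site and patient and the records are hypothetical. The str function is only a stand-in for SAS formatted values, and Python string ordering is only a stand-in for the SAS sort order, which can differ.

```python
def subject_keys(records, within):
    """Return the distinct subjects, one tuple of value strings per
    subject, sorted in ascending lexicographic order."""
    keys = {tuple(str(record[name]) for name in within) for record in records}
    return sorted(keys)


# Hypothetical observations with two WITHIN variables.
records = [
    {"site": "North", "patient": "P2", "period": 1},
    {"site": "South", "patient": "P1", "period": 1},
    {"site": "North", "patient": "P1", "period": 1},
    {"site": "North", "patient": "P1", "period": 2},
]

print(subject_keys(records, ["site", "patient"]))
# [('North', 'P1'), ('North', 'P2'), ('South', 'P1')]
```

Patient P1 at site North and patient P1 at site South count as different subjects because the combination of values, not any single variable, identifies the subject.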

You can also specify the following optional lag-options:

DESIGNROLE=variable: specifies a numeric variable that subsets observations into two groups: a group in which the value of variable is nonzero and a group in which the value of variable is zero. The observations in the first group are used to form the lag design matrix that is used in fitting the model. The lag design that corresponds to the second group is used when observations in the input data set that do not belong to the first group are scored. This option is useful when you want to obtain predicted values in an output data set for observations that are not used in fitting the model. If you do not specify this option, then all observations are assigned to the first group.
DETAILS: requests a table that shows the lag design matrix of the lag effect.
NLAG=n: specifies the number of lags. By default NLAG=1.