This example of a repeated measures study is taken from Diggle, Liang, and Zeger (1994, p. 100). The data consist of body weights of 27 cows, measured at 23 unequally spaced time points over a period of approximately 22 months. Following Diggle, Liang, and Zeger (1994), one animal is removed from the analysis, one observation is removed according to their Figure 5.7, and the time is shifted to start at 0 and is measured in 10-day increments. The design is a factorial, and the factors are the infection of an animal with M. paratuberculosis and whether the animal is receiving iron dosing. The data set contains five variables: cow assigns a unique identification number—from 1 to 26—to each cow in the study, tpoint denotes the time of the growth measurement, weight denotes the growth measurement, iron is a dummy variable that indicates whether the animal is receiving iron or not, and infection is a dummy variable that indicates whether the animal is infected or not. The goal of the study is to assess the effect of iron and infection—and their possible interaction—on weight. The following DATA steps create this data set:
data times; input time1-time23; datalines; 122 150 166 179 219 247 276 296 324 354 380 445 478 508 536 569 599 627 655 668 723 751 781 ; data cows; if _n_ = 1 then merge times; array t{23} time1 - time23; array w{23} weight1 - weight23; input cow iron infection weight1-weight23 @@; do i=1 to 23; weight = w{i}; tpoint = (t{i}-t{1})/10; output; end; keep cow iron infection tpoint weight; datalines; 1 0 0 4.7 4.905 5.011 5.075 5.136 5.165 5.298 5.323 ... more lines ...
The following DATA step adds ironInf, a grouping variable that is used later during the plotting of the results. In the next step, the data are sorted by the index variable, tpoint.
data cows; set cows; ironInf = "No Iron and No Infection"; if iron=1 and infection=1 then ironInf = "Iron and Infection"; else if iron=1 and infection=0 then ironInf = "No Iron and Infection"; else if iron=0 and infection=1 then ironInf = "Iron and No Infection"; else ironInf = "No Iron and No Infection"; run;
proc sort data=cows; by tpoint ; run;
To assess the effect of iron and infection on weight, the natural growth profile of the animals must also be accounted for. Here two alternate models for this problem are considered. The first model assumes that the observed weight of an animal is the sum of a common growth profile, which is modeled by a polynomial spline trend of order 2, the regression effects of iron and infection, and the observation error—modeled as white noise. An interaction term, for interaction between iron and infection, was found to be insignificant and is not included. In the second model, the common growth profile and the regression variables of the first model are replaced by four environment specific growth profiles.
The following statements fit the first model:
proc ssm data=cows; id tpoint; trend growth(ps(2)); irregular wn; model weight = iron infection growth wn; eval pattern = iron + infection + growth; forecast out=for; quit;
Output 27.3.1 shows that the state dimension of this model is 2 (corresponding to the polynomial trend specification of order 2), the number of diffuse elements in the initial condition is 4 (corresponding to the trend and the two regressors iron and infection), and the number of unknown parameters is 2 (corresponding to the variance parameters of trend and irregular).
Model Summary | |
---|---|
Model Property | Value |
Number of Model Equations | 1 |
State Dimension | 2 |
Dimension of the Diffuse Initial Condition | 4 |
Number of Parameters | 2 |
ID Variable Information | ||||
---|---|---|---|---|
Name | Start | End | MaxDelta | Type |
tpoint | 0 | 65.9000 | 6.50 | Irregular with replication |
Estimates of the Regression Parameters | |||||
---|---|---|---|---|---|
Response Variable | Regression Variable | Estimate | Approx Std Error |
t Value | Approx Pr > |t| |
weight | iron | -0.07476 | 0.0076114 | -9.82 | <.0001 |
weight | infection | -0.12921 | 0.0085924 | -15.04 | <.0001 |
Maximum Likelihood Estimates of the Unknown System Parameters (Parameters Not Specified in the PARMS Statement) | ||||
---|---|---|---|---|
Component | Type | Parameter | Estimate | Approx Std Error |
growth | PS(2) Trend | Level Variance | 0.00001625 | 9.01304E-6 |
wn | Irregular | Variance | 0.00858 | 0.0005034 |
After examining the model fit, it is useful to study how well the patterns implied by the model follow the data. pattern, defined by the EVAL statement, is a sum of the trend component and the regression effects. A graphical examination of the smoothed estimate of pattern is done next. The following DATA step merges the output data set specified in the FORECAST statement, for, with the input data set, cows. In particular, this adds ironInf (a grouping variable from cows) to for.
data for; merge for cows; by tpoint; run;
The following statements produce the graphs of smoothed_pattern, grouped according to the environment condition (see Output 27.3.5). The plot clearly shows that the control group "No Iron and No Infection" has the best growth profile, while the worst growth profile is for the group "Iron and Infection."
proc sgplot data=for noautolegend; title 'Common Growth Profile Adjusted by Iron and Infection Status'; band x=tpoint lower=smoothed_lower_pattern upper=smoothed_upper_pattern / group=ironInf name="band"; series x=tpoint y=smoothed_pattern / group=ironInf name="series"; keylegend "series"; run;
The following statements produce a panel of plots that show how well smoothed_pattern follows the observed data:
proc sgpanel data=for noautolegend; title 'Growth Plots Grouped by Iron and Infection'; label tpoint='Time' ; panelby iron infection / columns=2; band x=tpoint lower=smoothed_lower_pattern upper=smoothed_upper_pattern ; scatter x=tpoint y=weight; series x=tpoint y=smoothed_pattern ; run;
Output 27.3.6 shows that the model fits the data reasonably well.
The following statements fit the second model. In this model separate polynomial trends are fit according to different settings of iron and infection by specifying an appropriate list of (dummy) variables in the CROSS= option of the trend specification.
proc ssm data=cows; id tpoint; a1 = (iron=1 and infection=1); a2 = (iron=1 and infection=0); a3 = (iron=0 and infection=1); a4 = (iron=0 and infection=0); trend growth(ps(2)) cross=(a1-a4); irregular wn; model weight = growth wn; forecast out=for1; quit;
As a result of the CROSS= option, the trend component growth is actually a sum of four separate trends that correspond to the different iron-infection settings. Denoting growth by and the four independent trends by and ,
where a1, a2, a3, and a4 are the dummy variables specified in the CROSS= option. This shows that, for any given setting (say, the one for a4) is simply the corresponding trend . The model summary, shown in Output 27.3.7, reflects the increased state dimension and the increased number of parameters.
Model Summary | |
---|---|
Model Property | Value |
Number of Model Equations | 1 |
State Dimension | 8 |
Dimension of the Diffuse Initial Condition | 8 |
Number of Parameters | 5 |
Component | Type | Parameter | Estimate |
---|---|---|---|
growth(Cross = a1) | PS(2) Trend | Level Variance | 0.00001279 |
growth(Cross = a2) | PS(2) Trend | Level Variance | 0.00000872 |
growth(Cross = a3) | PS(2) Trend | Level Variance | 0.00000907 |
growth(Cross = a4) | PS(2) Trend | Level Variance | 0.00000845 |
wn | Irregular | Variance | 0.00839 |
data for1; merge for1 cows; by tpoint; run;
The following statements produce the graphs of smoothed for the desired settings (since the grouping variable ironInf exactly corresponds to these settings). Once again, the plot in Output 27.3.9 clearly shows that the control group "No Iron and No Infection" has the best growth profile, while the worst growth profile is for the group "Iron and Infection." However, unlike the first model, the profile curves are not merely shifted versions of a common profile.
proc sgplot data=for1 noautolegend; title 'Iron and Infection Status-Specific Growth Profiles'; band x=tpoint lower=smoothed_lower_growth upper=smoothed_upper_growth / group=ironInf name="band"; series x=tpoint y=smoothed_growth / group=ironInf name="series"; keylegend "series"; run;
Note: This procedure is experimental.