The SSM Procedure

Example 34.4 Longitudinal Data: Smoothing of Repeated Measures

This example of a repeated measures study is taken from Diggle, Liang, and Zeger (1994, p. 100). The data consist of body weights of 27 cows, measured at 23 unequally spaced time points over a period of approximately 22 months. Following Diggle, Liang, and Zeger (1994), one animal is removed from the analysis, one observation is removed according to their Figure 5.7, and the time is shifted to start at 0 and is measured in 10-day increments. The design is a $2 \times 2$ factorial, and the factors are the infection of an animal with M. paratuberculosis and whether the animal is receiving iron dosing. The data set contains five variables: cow assigns a unique identification number—from 1 to 26—to each cow in the study, tpoint denotes the time of the growth measurement, weight denotes the growth measurement, iron is a dummy variable that indicates whether the animal is receiving iron or not, and infection is a dummy variable that indicates whether the animal is infected or not. The goal of the study is to assess the effect of iron and infection—and their possible interaction—on weight. The following DATA steps create this data set:

data times;
  input time1-time23;
  datalines;
 122  150  166  179  219  247  276  296  324  354  380  445
 478  508  536  569  599  627  655  668  723  751  781
;

data Cows;
  if _n_ = 1 then merge times;
  array t{23} time1   - time23;
  array w{23} weight1 - weight23;
  input cow iron infection weight1-weight23 @@;
  do i=1 to 23;
     weight = w{i};
     tpoint = (t{i}-t{1})/10;
     output;
  end;
  keep cow iron infection tpoint weight;
  datalines;
  1 0 0  4.7    4.905  5.011  5.075  5.136  5.165  5.298  5.323
         5.416  5.438  5.541  5.652  5.687  5.737  5.814  5.799

   ... more lines ...   

The following DATA step adds ironInf, a grouping variable that is used later during the plotting of the results. In the next step, the data are sorted by the index variable, tpoint.

data Cows;
   set Cows;
   ironInf = "No Iron and No Infection";
   if iron=1 and infection=1 then ironInf = "Iron and Infection";
   else if iron=1 and infection=0 then ironInf = "Iron and No Infection";
   else if iron=0 and infection=1 then ironInf = "No Iron and Infection";
   else ironInf = "No Iron and No Infection";
   run;
 proc sort data=Cows;
     by tpoint ;
 run;

To assess the effect of iron and infection on weight, the natural growth profile of the animals must also be accounted for. Here two alternate models for this problem are considered. The first model assumes that the observed weight of an animal is the sum of a common growth profile, which is modeled by a polynomial spline trend of order 2, the regression effects of iron and infection, and the observation error—modeled as white noise. An interaction term, for interaction between iron and infection, was found to be insignificant and is not included. In the second model, the common growth profile and the regression variables of the first model are replaced by four environment specific growth profiles.

The following statements fit the first model:

 proc ssm data=Cows;
    id tpoint;
    trend growth(ps(2));
    irregular wn;
    model weight = iron infection growth wn;
    eval pattern = iron + infection + growth;
    output out=For;
 quit;

Output 34.4.1 shows that the state dimension of this model is 2 (corresponding to the polynomial trend specification of order 2), the number of diffuse elements in the initial condition is 4 (corresponding to the trend and the two regressors iron and infection), and the number of unknown parameters is 2 (corresponding to the variance parameters of trend and irregular).

Output 34.4.1: Model1: Model Summary Information

The SSM Procedure

Model Summary
Model Property Value
Number of Model Equations 1
State Dimension 2
Dimension of the Diffuse Initial Condition 4
Number of Parameters 2



Output 34.4.2 shows that the ID variable is irregularly spaced with replication.

Output 34.4.2: ID Variable Information

ID Variable Information
Name Start End Max Delta NDistinct Type
tpoint 0 65.9 6.5 23 Irregular with Replication



The estimated regression coefficients of iron and infection, shown in Output 34.4.3, are significant and negative. This implies that both iron and infection adversely affect the response variable, weight.

Output 34.4.3: Model 1: Regression Estimates

Regression Parameter Estimates
Response Variable Regression Variable Estimate Standard
Error
t Value Pr > |t|
weight iron -0.0748 0.00761 -9.82 <.0001
weight infection -0.1292 0.00859 -15.04 <.0001



The variance estimates of the trend component and the irregular component are shown in Output 34.4.4.

Output 34.4.4: Model 1: Estimates of Unnamed Parameters

Model Parameter Estimates
Component Type Parameter Estimate Standard
Error
t Value
growth PS(2) Trend Level Variance 0.0000162 9.01E-06 1.80
wn Irregular Variance 0.0085849 5.03E-04 17.06



After examining the model fit, it is useful to study how well the patterns implied by the model follow the data. pattern, defined by the EVAL statement, is a sum of the trend component and the regression effects. A graphical examination of the smoothed estimate of pattern is done next. The following DATA step merges the output data set specified in the OUTPUT statement, For, with the input data set, Cows. In particular, this adds ironInf (a grouping variable from Cows) to For.

data For;
    merge for Cows;
    by tpoint;
run;

The following statements produce the graphs of smoothed_pattern, grouped according to the environment condition (see Output 34.4.5). The plot clearly shows that the control group "No Iron and No Infection" has the best growth profile, while the worst growth profile is for the group "Iron and Infection."

 proc sgplot data=For noautolegend;
     title 'Common Growth Profile Adjusted by Iron and Infection Status';
     band x=tpoint lower=smoothed_lower_pattern
        upper=smoothed_upper_pattern / group=ironInf name="band";
     series  x=tpoint y=smoothed_pattern / group=ironInf name="series";
     keylegend "series";
 run;

Output 34.4.5: Model 1: Growth Profile Comparison with 95% Confidence Bands

Model 1: Growth Profile Comparison with 95% Confidence Bands


The following statements produce a panel of plots that show how well smoothed_pattern follows the observed data:

 proc sgpanel data=For noautolegend;
   title 'Growth Plots Grouped by Iron and Infection';
    label tpoint='Time' ;
    panelby iron infection / columns=2;
    band x=tpoint lower=smoothed_lower_pattern
       upper=smoothed_upper_pattern ;
    scatter x=tpoint y=weight;
    series  x=tpoint y=smoothed_pattern ;
 run;

Output 34.4.6 shows that the model fits the data reasonably well.

Output 34.4.6: Model 1: Smoothed Model Fit Lines

Model 1: Smoothed Model Fit Lines


The following statements fit the second model. In this model separate polynomial trends are fit according to different settings of iron and infection by specifying an appropriate list of (dummy) variables in the CROSS= option of the trend specification.

 proc ssm data=Cows;
     id tpoint;
     a1 = (iron=1 and infection=1);
     a2 = (iron=1 and infection=0);
     a3 = (iron=0 and infection=1);
     a4 = (iron=0 and infection=0);
     trend growth(ps(2)) cross=(a1-a4);
     irregular wn;
     model weight = growth wn;
     /* Define contrasts between a1 and other treatments  */
     comp a1Curve = growth_state_[1];
     comp a2Curve = growth_state_[2];
     comp a3Curve = growth_state_[3];
     comp a4Curve = growth_state_[4];
     eval contrast21 = a2Curve - a1Curve;
     eval contrast31 = a3Curve - a1Curve;
     eval contrast41 = a4Curve - a1Curve;
     output out=for1;
 quit;

As a result of the CROSS= option, the trend component growth is actually a sum of four separate trends that correspond to the different iron-infection settings. Denoting growth by $\mu _{t}$ and the four independent trends by $\mu _{1,t}, \;  \mu _{2,t}, \;  \mu _{3,t},$ and $\mu _{4,t}$,

\[ \mu _{t} = a1 *\mu _{1,t} + a2* \mu _{2,t} + a3 *\mu _{3,t} + a4* \mu _{4,t} \]

where a1, a2, a3, and a4 are the dummy variables specified in the CROSS= option. This shows that, for any given setting (say, the one for a4) $\mu _{t}$ is simply the corresponding trend $ \mu _{4,t} $. In addition, note the form of the COMPONENT statements that define the components a1Curve, a2Curve, a3Curve, and a4Curve. This form of the COMPONENT statement treats the state that is associated with growth, named growth_state_ by convention, as a state of nominal dimension 4—the number of variables in the CROSS= list. This, in turn, implies that a1Curve, which is defined as growth_state_[1], refers to $\mu _{1,t}$. These components are subsequently used in the EVAL statements to define contrasts between the trends—for example, contrast21 corresponds to the difference between the trends $\mu _{2,t}$ and $\mu _{1,t}$. The estimates of these components (a1Curve, a2Curve, . . ., contrast41) are output to the data set For1 named in the OUT= option of the OUTPUT data set.

The model summary, shown in Output 34.4.7, reflects the increased state dimension and the increased number of parameters.

Output 34.4.7: Model2: Model Summary Information

The SSM Procedure

Model Summary
Model Property Value
Number of Model Equations 1
State Dimension 8
Dimension of the Diffuse Initial Condition 8
Number of Parameters 5



Output 34.4.8 shows the parameter estimates for this model.

Output 34.4.8: Model2: Estimates of Unnamed Parameters (Partial Output)

Component Parameter Estimate StdErr tValue
growth(Cross = a1) Level Variance 1.28E-05 6.83E-06 1.87
growth(Cross = a2) Level Variance 8.72E-06 3.81E-06 2.29
growth(Cross = a3) Level Variance 9.07E-06 4.23E-06 2.14
growth(Cross = a4) Level Variance 8.45E-06 3.40E-06 2.49
wn Variance 8.39E-03 4.98E-04 16.84



Next, the smoothed estimate of trend (growth) is graphically studied. The following DATA step prepares the data for the grouped plots of smoothed_growth by merging For1 with the input data set Cows. As before, the reason is merely to include ironInf (the grouping variable).

data For1;
    merge For1 Cows;
    by tpoint;
run;

The following statements produce the graphs of smoothed $\mu _{t}$ for the desired settings (since the grouping variable ironInf exactly corresponds to these settings). Once again, the plot in Output 34.4.9 clearly shows that the control group "No Iron and No Infection" has the best growth profile, while the worst growth profile is for the group "Iron and Infection." However, unlike the first model, the profile curves are not merely shifted versions of a common profile.

 proc sgplot data=For1 noautolegend;
    title 'Iron and Infection Status-Specific Growth Profiles';
     band x=tpoint lower=smoothed_lower_growth
       upper=smoothed_upper_growth / group=ironInf name="band";
     series  x=tpoint y=smoothed_growth / group=ironInf name="series";
     keylegend "series";
 run;

Output 34.4.9: Model 2: Growth Profile Comparison with 95% Confidence Bands

Model 2: Growth Profile Comparison with 95% Confidence Bands


The following statements produce the plot of smoothed $(\mu _{4,t} - \mu _{1,t})$—contrast between the best and the worst growth profiles.

 proc sgplot data=For1;
    title "Estimated Contrast Between the Treatments 4 and 1 ";
     band x=tpoint lower=smoothed_lower_contrast41
        upper=smoothed_upper_contrast41;
     series x=tpoint y=smoothed_contrast41;
 run;

Output 34.4.10 shows that the growth pattern of the control group "No Iron and No Infection" consistently remains above the growth pattern of the treatment group "Iron and Infection."

Output 34.4.10: Estimated Contrast Between the Treatments 4 and 1 with 95% Confidence Bands

Estimated Contrast Between the Treatments 4 and 1 with 95% Confidence Bands