The SSM Procedure (Experimental)

Example 27.4 Smoothing of Repeated Measures Data

This example of a repeated measures study is taken from Diggle, Liang, and Zeger (1994, p. 100). The data consist of body weights of 27 cows, measured at 23 unequally spaced time points over a period of approximately 22 months. Following Diggle, Liang, and Zeger (1994), one animal is removed from the analysis, one observation is removed according to their Figure 5.7, and the time is shifted to start at 0 and is measured in 10-day increments. The design is a $2 \times 2$ factorial, and the factors are the infection of an animal with M. paratuberculosis and whether the animal is receiving iron dosing. The data set contains five variables: cow assigns a unique identification number—from 1 to 26—to each cow in the study, tpoint denotes the time of the growth measurement, weight denotes the growth measurement, iron is a dummy variable that indicates whether the animal is receiving iron or not, and infection is a dummy variable that indicates whether the animal is infected or not. The goal of the study is to assess the effect of iron and infection—and their possible interaction—on weight. The following DATA steps create this data set:

data times;
  input time1-time23; 
  datalines; 
 122  150  166  179  219  247  276  296  324  354  380  445 
 478  508  536  569  599  627  655  668  723  751  781  
; 
 
data cows; 
  if _n_ = 1 then merge times; 
  array t{23} time1   - time23; 
  array w{23} weight1 - weight23; 
  input cow iron infection weight1-weight23 @@; 
  do i=1 to 23; 
     weight = w{i}; 
     tpoint = (t{i}-t{1})/10; 
     output; 
  end; 
  keep cow iron infection tpoint weight; 
  datalines; 
  1 0 0  4.7    4.905  5.011  5.075  5.136  5.165  5.298  5.323 
         5.416  5.438  5.541  5.652  5.687  5.737  5.814  5.799 

   ... more lines ...   

The following DATA step adds ironInf, a grouping variable that is used later during the plotting of the results. In the next step, the data are sorted by the index variable, tpoint.

data cows;
   set cows;
   ironInf = "No Iron and No Infection";
   if iron=1 and infection=1 then ironInf = "Iron and Infection";
   else if iron=1 and infection=0 then ironInf = "No Iron and Infection";
   else if iron=0 and infection=1 then ironInf = "Iron and No Infection";
   else ironInf = "No Iron and No Infection";
   run;
 proc sort data=cows; 
     by tpoint ; 
 run; 

To assess the effect of iron and infection on weight, the natural growth profile of the animals must also be accounted for. Here two alternate models for this problem are considered. The first model assumes that the observed weight of an animal is the sum of a common growth profile, which is modeled by a polynomial spline trend of order 2, the regression effects of iron and infection, and the observation error—modeled as white noise. An interaction term, for interaction between iron and infection, was found to be insignificant and is not included. In the second model, the common growth profile and the regression variables of the first model are replaced by four environment specific growth profiles.

The following statements fit the first model:

 proc ssm data=cows; 
    id tpoint; 
    trend growth(ps(2));  
    irregular wn; 
    model weight = iron infection growth wn; 
    eval pattern = iron + infection + growth;
    output out=for; 
 quit;  

Output 27.4.1 shows that the state dimension of this model is 2 (corresponding to the polynomial trend specification of order 2), the number of diffuse elements in the initial condition is 4 (corresponding to the trend and the two regressors iron and infection), and the number of unknown parameters is 2 (corresponding to the variance parameters of trend and irregular).

Output 27.4.1: Model1: Model Summary Information

The SSM Procedure

Model Summary
Model Property Value
Number of Model Equations 1
State Dimension 2
Dimension of the Diffuse Initial Condition 4
Number of Parameters 2


Output 27.4.2 shows that the ID variable is irregularly spaced with replication.

Output 27.4.2: ID Variable Information

ID Variable Information
Name Start End Max Delta Type
tpoint 0 65.9000 6.5 Irregular with Replication


The estimated regression coefficients of iron and infection, shown in Output 27.4.3, are significant and negative. This implies that both iron and infection adversely affect the response variable, weight.

Output 27.4.3: Model 1: Regression Estimates

Regression Parameter Estimates
Response Variable Regression Variable Estimate Standard Error t Value Pr > |t|
weight iron -0.0748 0.00761 -9.82 <.0001
weight infection -0.1292 0.00859 -15.04 <.0001


The variance estimates of the trend component and the irregular component are shown in Output 27.4.4.

Output 27.4.4: Model 1: Estimates of Unnamed Parameters

Model Parameter Estimates
Component Type Parameter Estimate Standard Error
growth PS(2) Trend Level Variance 0.0000162 9.01E-06
wn Irregular Variance 0.0085849 5.03E-04


After examining the model fit, it is useful to study how well the patterns implied by the model follow the data. pattern, defined by the EVAL statement, is a sum of the trend component and the regression effects. A graphical examination of the smoothed estimate of pattern is done next. The following DATA step merges the output data set specified in the OUTPUT statement, for, with the input data set, cows. In particular, this adds ironInf (a grouping variable from cows) to for.

data for;
    merge for cows;
    by tpoint;
run;

The following statements produce the graphs of smoothed_pattern, grouped according to the environment condition (see Output 27.4.5). The plot clearly shows that the control group No Iron and No Infection has the best growth profile, while the worst growth profile is for the group Iron and Infection.

 proc sgplot data=for noautolegend;
     title 'Common Growth Profile Adjusted by Iron and Infection Status';
     band x=tpoint lower=smoothed_lower_pattern 
        upper=smoothed_upper_pattern / group=ironInf name="band";
     series  x=tpoint y=smoothed_pattern / group=ironInf name="series";
     keylegend "series";
 run;

Output 27.4.5: Model 1: Growth Profile Comparison with 95% Confidence Bands

Model 1: Growth Profile Comparison with 95% Confidence Bands


The following statements produce a panel of plots that show how well smoothed_pattern follows the observed data:

 proc sgpanel data=for noautolegend;
   title 'Growth Plots Grouped by Iron and Infection';
    label tpoint='Time' ;
    panelby iron infection / columns=2;
    band x=tpoint lower=smoothed_lower_pattern 
       upper=smoothed_upper_pattern ;
    scatter x=tpoint y=weight;
    series  x=tpoint y=smoothed_pattern ;
 run;

Output 27.4.6 shows that the model fits the data reasonably well.

Output 27.4.6: Model 1: Smoothed Model Fit Lines

Model 1: Smoothed Model Fit Lines


The following statements fit the second model. In this model separate polynomial trends are fit according to different settings of iron and infection by specifying an appropriate list of (dummy) variables in the CROSS= option of the trend specification.

 proc ssm data=cows; 
     id tpoint; 
     a1 = (iron=1 and infection=1);
     a2 = (iron=1 and infection=0);
     a3 = (iron=0 and infection=1);
     a4 = (iron=0 and infection=0);
     trend growth(ps(2)) cross=(a1-a4);  
     irregular wn; 
     model weight = growth wn; 
     output out=for1; 
 quit; 

As a result of the CROSS= option, the trend component growth is actually a sum of four separate trends that correspond to the different iron-infection settings. Denoting growth by $\mu _{t}$ and the four independent trends by $\mu _{1,t}, \;  \mu _{2,t}, \;  \mu _{3,t},$ and $\mu _{4,t}$,

\[  \mu _{t} = a1 *\mu _{1,t} + a2* \mu _{2,t} + a3 *\mu _{3,t} + a4* \mu _{4,t}  \]

where a1, a2, a3, and a4 are the dummy variables specified in the CROSS= option. This shows that, for any given setting (say, the one for a4) $\mu _{t}$ is simply the corresponding trend $ \mu _{4,t} $. The model summary, shown in Output 27.4.7, reflects the increased state dimension and the increased number of parameters.

Output 27.4.7: Model2: Model Summary Information

The SSM Procedure

Model Summary
Model Property Value
Number of Model Equations 1
State Dimension 8
Dimension of the Diffuse Initial Condition 8
Number of Parameters 5


Output 27.4.8 shows the parameter estimates for this model.

Output 27.4.8: Model2: Estimates of Unnamed Parameters (Partial Output)

Component Parameter Estimate StdErr
growth(Cross = a1) Level Variance 1.28E-05 6.83E-06
growth(Cross = a2) Level Variance 8.72E-06 3.81E-06
growth(Cross = a3) Level Variance 9.07E-06 4.23E-06
growth(Cross = a4) Level Variance 8.45E-06 3.40E-06
wn Variance 8.39E-03 4.98E-04


Next, the smoothed estimate of trend (growth) is graphically studied. The following DATA step prepares the data for the grouped plots of smoothed_growth by merging for1 with the input data set cows. As before, the reason is merely to include ironInf (the grouping variable).

data for1;
    merge for1 cows;
    by tpoint;
run;

The following statements produce the graphs of smoothed $\mu _{t}$ for the desired settings (since the grouping variable ironInf exactly corresponds to these settings). Once again, the plot in Output 27.4.9 clearly shows that the control group No Iron and No Infection has the best growth profile, while the worst growth profile is for the group Iron and Infection. However, unlike the first model, the profile curves are not merely shifted versions of a common profile.

 proc sgplot data=for1 noautolegend;
    title 'Iron and Infection Status-Specific Growth Profiles';
     band x=tpoint lower=smoothed_lower_growth 
       upper=smoothed_upper_growth / group=ironInf name="band";
     series  x=tpoint y=smoothed_growth / group=ironInf name="series";
     keylegend "series";
 run;

Output 27.4.9: Model 2: Growth Profile Comparison with 95% Confidence Bands

Model 2: Growth Profile Comparison with 95% Confidence Bands