Example 89.1 Analysis of Clustered Data

When experimental units are naturally or artificially clustered, failure times of experimental units within a cluster are correlated. Lee, Wei, and Amato (1992) estimate the regression parameters in the Cox model by maximizing a partial likelihood function under an independent working correlation assumption and estimate the variance of the estimated regression coefficients by using a robust sandwich variance estimator that accounts for the intracluster dependence.

The Diabetic Retinopathy Study (DRS) is a randomized, controlled clinical trial of more than 1,700 patients across 15 medical centers. One objective of this study was to determine if photocoagulation treatment delays the occurrence of blindness. One eye of each patient was randomly assigned to treatment and the other eye to control. See Example 66.11 in Chapter 66, The PHREG Procedure, for more information about the data set and a similar analysis; see http://www.nei.nih.gov/neitrials/static/study62.asp for more information about the DRS.

Each patient is a cluster that contributes two observations to the input data set, one for each eye. The following variables are available:

  • ID, patient’s identification

  • Time, failure time

  • Status, event indicator (0=censored, and 1=uncensored)

  • Treatment, treatment received (1=laser photocoagulation, and 0=otherwise)

  • DiabeticType, type of diabetes (0=juvenile onset with age of onset at 20 or under, and 1= adult onset with age of onset over 20)

The following DATA step creates the data set Blind, which represents 197 diabetic patients from the DRS:

data Blind;
   input ID Time Status DiabeticType Treatment @@;
   datalines;
   5 46.23 0 1 1    5 46.23 0 1 0   14 42.50 0 0 1   14 31.30 1 0 0
  16 42.27 0 0 1   16 42.27 0 0 0   25 20.60 0 0 1   25 20.60 0 0 0
  29 38.77 0 0 1   29  0.30 1 0 0   46 65.23 0 0 1   46 54.27 1 0 0
  49 63.50 0 0 1   49 10.80 1 0 0   56 23.17 0 0 1   56 23.17 0 0 0
  61  1.47 0 0 1   61  1.47 0 0 0   71 58.07 0 1 1   71 13.83 1 1 0
 100 46.43 1 1 1  100 48.53 0 1 0  112 44.40 0 1 1  112  7.90 1 1 0
 120 39.57 0 1 1  120 39.57 0 1 0  127 30.83 1 1 1  127 38.57 1 1 0
 133 66.27 0 1 1  133 14.10 1 1 0  150 20.17 1 0 1  150  6.90 1 0 0
 167 58.43 0 1 1  167 41.40 1 1 0  176 58.20 0 0 1  176 58.20 0 0 0

   ... more lines ...   

1705  8.00 0 0 1 1705  8.00 0 0 0 1717 51.60 0 1 1 1717 42.33 1 1 0
1727 49.97 0 1 1 1727  2.90 1 1 0 1746 45.90 0 0 1 1746  1.43 1 0 0
1749 41.93 0 1 1 1749 41.93 0 1 0
;

The following SAS statements request a proportional hazards regression of Time on Treatment, DiabeticType, and the Treatment DiabeticType interaction, with Status as the censoring indicator. The CLUSTER statement indicates the observations that came from the same patient.

proc surveyphreg data=Blind;
   model Time*Status(0) = Treatment DiabeticType Treatment*DiabeticType;
   cluster id;
run;

Output 89.1.1 displays some summary information. There are 394 observations and 197 patients (clusters). Almost 61% of the observations are censored. The p-values for the null model are less than 0.0001 for both the likelihood ratio test and the Wald test (Output 89.1.2), which indicates that the survival time is highly dependent on Treatment and DiabeticType. In this example, the likelihood ratio statistic has a chi-square distribution with 3 degrees of freedom and the Wald statistics has the F distribution with the numerator degrees of freedom 3 and the denominator degrees of freedom 196. The denominator degrees of freedom are calculated as the number of clusters (197) minus one.

Output 89.1.1 Summary Information
The SURVEYPHREG Procedure

Number of Observations Read 394
Number of Observations Used 394

Design Summary
Number of Clusters 197

Summary of the Number of Event and Censored
Values
Total Event Censored Percent
Censored
394 155 239 60.66

Variance Estimation
Method Taylor Series

Output 89.1.2 Global Test Results
Testing Global Null Hypothesis: BETA=0
Test Test
Statistic
Num DF Den DF p-Value
Likelihood Ratio 28.4556 3 Infty <.0001
Wald 11.3872 3 196 <.0001

Output 89.1.3 displays parameter estimates, standard errors, statistics, denominator degrees of freedom, p-values, and hazard ratios. In this example data set, Treatment and Treatment DiabeticType interaction are significant with p-values 0.023 and 0.006, respectively. Since the model contains Treatment DiabeticType interaction, the exponential of the estimated regression coefficient is not the hazard ratio. Use the ESTIMATE statement to calculate the hazard ratios.

Output 89.1.3 Parameter Estimates
Analysis of Maximum Likelihood Estimates
Parameter DF Estimate Standard Error t Value Pr > |t| Hazard
Ratio
Treatment 196 -0.424672 0.185912 -2.28 0.0234 0.654
DiabeticType 196 0.340841 0.196577 1.73 0.0845 1.406
Treatment*DiabeticTy 196 -0.845665 0.305081 -2.77 0.0061 0.429