The SURVEYPHREG Procedure

Example 113.1 Analysis of Clustered Data

When experimental units are naturally or artificially clustered, failure times of experimental units within a cluster are correlated. Lee, Wei, and Amato (1992) estimate the regression parameters in the Cox model by maximizing a partial likelihood function under an independent working correlation assumption and estimate the variance of the estimated regression coefficients by using a robust sandwich variance estimator that accounts for the intracluster dependence.

The Diabetic Retinopathy Study (DRS) is a randomized, controlled clinical trial of more than 1,700 patients across 15 medical centers. One objective of this study was to determine if photocoagulation treatment delays the occurrence of blindness. One eye of each patient was randomly assigned to treatment and the other eye to control. For more information about the data set and a similar analysis, see Example 85.11 in Chapter 85: The PHREG Procedure.

Each patient is a cluster that contributes two observations to the input data set, one for each eye. The following variables are available:

ID, patient’s identification
Time, failure time
Status, event indicator (0=censored, and 1=uncensored)
Treatment, treatment received (1=laser photocoagulation, and 0=otherwise)
DiabeticType, type of diabetes (0=juvenile onset with age of onset at 20 or under, and 1= adult onset with age of onset over 20)

The following DATA step creates the data set Blind, which represents 197 diabetic patients from the DRS:

data Blind;
   input ID Time Status DiabeticType Treatment @@;
   datalines;
   5 46.23 0 1 1    5 46.23 0 1 0   14 42.50 0 0 1   14 31.30 1 0 0
  16 42.27 0 0 1   16 42.27 0 0 0   25 20.60 0 0 1   25 20.60 0 0 0
  29 38.77 0 0 1   29  0.30 1 0 0   46 65.23 0 0 1   46 54.27 1 0 0
  49 63.50 0 0 1   49 10.80 1 0 0   56 23.17 0 0 1   56 23.17 0 0 0
  61  1.47 0 0 1   61  1.47 0 0 0   71 58.07 0 1 1   71 13.83 1 1 0
 100 46.43 1 1 1  100 48.53 0 1 0  112 44.40 0 1 1  112  7.90 1 1 0
 120 39.57 0 1 1  120 39.57 0 1 0  127 30.83 1 1 1  127 38.57 1 1 0
 133 66.27 0 1 1  133 14.10 1 1 0  150 20.17 1 0 1  150  6.90 1 0 0
 167 58.43 0 1 1  167 41.40 1 1 0  176 58.20 0 0 1  176 58.20 0 0 0
 185 57.43 0 1 1  185 57.43 0 1 0  190 56.03 0 0 1  190 56.03 0 0 0

   ... more lines ...   

1705  8.00 0 0 1 1705  8.00 0 0 0 1717 51.60 0 1 1 1717 42.33 1 1 0
1727 49.97 0 1 1 1727  2.90 1 1 0 1746 45.90 0 0 1 1746  1.43 1 0 0
1749 41.93 0 1 1 1749 41.93 0 1 0
;

The following statements request a proportional hazards regression of Time on Treatment, DiabeticType, and the Treatment $\times$ DiabeticType interaction, with Status as the censoring indicator. The CLUSTER statement indicates the observations that came from the same patient.

proc surveyphreg data=Blind;
   model Time*Status(0) = Treatment DiabeticType Treatment*DiabeticType;
   cluster id;
run;

Output 113.1.1 displays some summary information. There are 394 observations and 197 patients (clusters). Almost 61% of the observations are censored. The p-values for the null model are less than 0.0001 for both the likelihood ratio test and the Wald test (Output 113.1.2), indicating that the survival time is highly dependent on Treatment and DiabeticType. In this example, the likelihood ratio statistic has an approximate chi-square distribution with 3 degrees of freedom, and the Wald statistic has an approximate F distribution with 3 numerator degrees of freedom and 194 denominator degrees of freedom. The denominator degrees of freedom are calculated as the number of clusters (197) minus the number of estimable parameters (3).

Output 113.1.1: Summary Information

The SURVEYPHREG Procedure

Number of Observations Read	394
Number of Observations Used	394

Design Summary
Number of Clusters	197

Summary of the Number of Event and Censored Values
Total	Event	Censored	Percent Censored
394	155	239	60.66

Variance Estimation
Method	Taylor Series

Output 113.1.2: Global Test Results

Testing Global Null Hypothesis: BETA=0
Test	Test Statistic	Num DF	Den DF	p-Value
Likelihood Ratio	28.4556	3	Infty	<.0001
Wald	11.4455	3	194	<.0001

Output 113.1.3 displays parameter estimates, standard errors, t statistics, denominator degrees of freedom, p-values, and hazard ratios. In this example data set, Treatment and Treatment $\times$ DiabeticType interaction are significant with p-values 0.023 and 0.006, respectively. Since the model contains Treatment $\times$ DiabeticType interaction, the exponential of the estimated regression coefficient is not the hazard ratio. Use the ESTIMATE statement to calculate the hazard ratios.

Output 113.1.3: Parameter Estimates

Analysis of Maximum Likelihood Estimates
Parameter	DF	Estimate	Standard Error	t Value	Pr > \|t\|	Hazard Ratio
Treatment	196	-0.424672	0.185438	-2.29	0.0231	0.654
DiabeticType	196	0.340841	0.196076	1.74	0.0837	1.406
Treatment*DiabeticTy	196	-0.845665	0.304303	-2.78	0.0060	0.429