The NLMIXED Procedure

Example 82.5 Failure Time and Frailty Model

In this example an accelerated failure time model with proportional hazard is fitted with and without random effects. The data are from the "Getting Started" example of PROC LIFEREG; see Chapter 69: The LIFEREG Procedure. Thirty-eight patients are divided into two groups of equal size, and different pain relievers are assigned to each group. The outcome reported is the time in minutes until headache relief. The variable censor indicates whether relief was observed during the course of the observation period (censor = 0) or whether the observation is censored (censor = 1). The SAS DATA step for these data is as follows:

data headache;
   input minutes group censor @@;
   patient = _n_;
   datalines;
11  1  0    12  1  0    19  1  0    19  1  0
19  1  0    19  1  0    21  1  0    20  1  0
21  1  0    21  1  0    20  1  0    21  1  0
20  1  0    21  1  0    25  1  0    27  1  0
30  1  0    21  1  1    24  1  1    14  2  0
16  2  0    16  2  0    21  2  0    21  2  0
23  2  0    23  2  0    23  2  0    23  2  0
25  2  1    23  2  0    24  2  0    24  2  0
26  2  1    32  2  1    30  2  1    30  2  0
32  2  1    20  2  1
;

In modeling survival data, censoring of observations must be taken into account carefully. In this example, only right censoring occurs. If $g(t,\bbeta )$, $h(t,\bbeta )$, and $G(t,\bbeta )$ denote the density of failure, the hazard function, and the survival distribution function at time t, respectively, then the log likelihood can be written as

\begin{align*} l(\bbeta ;\mb{t}) & = \sum _{i \in U_ u} \log g(t_ i,\bbeta ) + \sum _{i \in U_ c} \log G(t_ i,\bbeta ) \\ & = \sum _{i \in U_ u} \log h(t_ i,\bbeta ) + \sum _{i=1}^ n \log G(t_ i,\bbeta ) \end{align*}

(See Cox and Oakes 1984, Ch. 3.) In these expressions $U_ u$ is the set of uncensored observations, $U_ c$ is the set of censored observations, and n denotes the total sample size.

The proportional hazards specification expresses the hazard in terms of a baseline hazard, multiplied by a constant. In this example the hazard is that of a Weibull model and is parameterized as $h(t,\bbeta ) = \gamma \alpha (\alpha t)^{\gamma -1}$ and $\alpha = \exp \{ - \mb{x}’\bbeta \} $.

The linear predictor is set equal to the intercept in the reference group (group = 2); this defines the baseline hazard. The corresponding distribution of survival past time t is $G(t,\bbeta ) = \exp \{ - (\alpha t)^{\gamma }\} $. See Cox and Oakes (1984, Table 2.1) and the section "Supported Distributions" in Chapter 69: The LIFEREG Procedure, for this and other survival distribution models and various parameterizations.

The following NLMIXED statements fit this accelerated failure time model and estimate the cumulative distribution function of time to headache relief:

proc nlmixed data=headache;
   bounds gamma > 0;
   linp  = b0 - b1*(group-2);
   alpha = exp(-linp);
   G_t   = exp(-(alpha*minutes)**gamma);
   g     = gamma*alpha*((alpha*minutes)**(gamma-1))*G_t;
   ll    = (censor=0)*log(g) + (censor=1)*log(G_t);
   model minutes ~ general(ll);
   predict 1-G_t out=cdf;
run;

Output 82.5.1: Specifications Table for Fixed-Effects Failure Time Model

The NLMIXED Procedure

Specifications
Data Set WORK.HEADACHE
Dependent Variable minutes
Distribution for Dependent Variable General
Optimization Technique Dual Quasi-Newton
Integration Method None



The "Specifications" table shows that no integration is required, since the model does not contain random effects (Output 82.5.1).

Output 82.5.2: Negative Log Likelihood with Default Starting Values

Initial Parameters
gamma b0 b1 Negative
Log
Likelihood
1 1 1 263.990327



No starting values were given for the three parameters. The NLMIXED procedure assigns the default value of 1.0 in this case. The negative log likelihood based on these starting values is shown in Output 82.5.2.

Output 82.5.3: Iteration History for Fixed-Effects Failure Time Model

Iteration History
Iteration Calls Negative
Log
Likelihood
Difference Maximum
Gradient
Slope
1 4 169.2443 94.74602 22.5599 -2230.83
2 8 142.8735 26.3708 14.8863 -3.64643
3 11 140.6337 2.239814 11.2523 -9.49454
4 15 122.8907 17.74304 19.4496 -2.50807
5 17 121.3970 1.493699 13.8558 -4.55427
6 20 120.6238 0.773116 13.6706 -1.38064
7 22 119.2782 1.345647 15.7801 -1.69072
8 26 116.2713 3.006871 26.9403 -3.25290
9 30 109.4274 6.843925 19.8838 -6.92890
10 35 103.2981 6.129298 12.1565 -4.96054
11 40 101.6862 1.611863 14.2487 -4.34059
12 42 100.0279 1.658364 11.6985 -13.2049
13 46 99.9189048 0.108971 3.60255 -0.55176
14 49 99.8738836 0.045021 0.17071 -0.16645
15 52 99.8736392 0.000244 0.050822 -0.00041
16 55 99.8736351 4.071E-6 0.000705 -6.9E-6
17 58 99.8736351 6.1E-10 4.768E-6 -1.23E-9

NOTE: GCONV convergence criterion satisfied.



The "Iteration History" table shows that the procedure converges after 17 iterations and 34 evaluations of the objective function (Output 82.5.3).

Output 82.5.4: Fit Statistics and Parameter Estimates

Fit Statistics
-2 Log Likelihood 199.7
AIC (smaller is better) 205.7
AICC (smaller is better) 206.5
BIC (smaller is better) 210.7

Parameter Estimates
Parameter Estimate Standard
Error
DF t Value Pr > |t| 95% Confidence Limits Gradient
gamma 4.7128 0.6742 38 6.99 <.0001 3.3479 6.0777 5.327E-8
b0 3.3091 0.05885 38 56.23 <.0001 3.1900 3.4283 -4.77E-6
b1 -0.1933 0.07856 38 -2.46 0.0185 -0.3523 -0.03426 -1.22E-6



The parameter estimates and their standard errors shown in Output 82.5.4 are identical to those obtained with the LIFEREG procedure and the following statements:

proc lifereg data=headache;
   class group;
   model minutes*censor(1) = group / dist=weibull;
   output out=new cdf=prob;
run;

The t statistic and confidence limits are based on 38 degrees of freedom. The LIFEREG procedure computes z intervals for the parameter estimates.

For the two groups you obtain

\begin{align*} \widehat{\alpha }(\mbox{group}=1) & = \exp \{ -3.3091 + 0.1933\} = 0.04434 \\ \widehat{\alpha }(\mbox{group}=2) & = \exp \{ -3.3091\} = 0.03655 \end{align*}

The probabilities of headache relief by t minutes are estimated as

\begin{align*} 1-G(t,\mbox{group}=1) & = 1 - \exp \{ - (0.04434*t)^{4.7128} \} \\ 1-G(t,\mbox{group}=2) & = 1 - \exp \{ - (0.03655*t)^{4.7128} \} \\ \end{align*}

These probabilities, calculated at the observed times, are shown for the two groups in Output 82.5.5 and printed with the following statements:

proc print data=cdf;
   var group censor patient minutes pred;
run;

Since the slope estimate is negative with p-value of 0.0185, you can infer that pain reliever 1 leads to overall significantly faster relief, but the estimated probabilities give no information about patient-to-patient variation within and between groups. For example, while pain reliever 1 provides faster relief overall, some patients in group 2 might respond more quickly than some patients in group 1. A frailty model enables you to accommodate and estimate patient-to-patient variation in health status by introducing random effects into a subject’s hazard function.

Output 82.5.5: Estimated Cumulative Distribution Function

Obs group censor patient minutes Pred
1 1 0 1 11 0.03336
2 1 0 2 12 0.04985
3 1 0 3 19 0.35975
4 1 0 4 19 0.35975
5 1 0 5 19 0.35975
6 1 0 6 19 0.35975
7 1 0 7 21 0.51063
8 1 0 8 20 0.43325
9 1 0 9 21 0.51063
10 1 0 10 21 0.51063
11 1 0 11 20 0.43325
12 1 0 12 21 0.51063
13 1 0 13 20 0.43325
14 1 0 14 21 0.51063
15 1 0 15 25 0.80315
16 1 0 16 27 0.90328
17 1 0 17 30 0.97846
18 1 1 18 21 0.51063
19 1 1 19 24 0.73838
20 2 0 20 14 0.04163
21 2 0 21 16 0.07667
22 2 0 22 16 0.07667
23 2 0 23 21 0.24976
24 2 0 24 21 0.24976
25 2 0 25 23 0.35674
26 2 0 26 23 0.35674
27 2 0 27 23 0.35674
28 2 0 28 23 0.35674
29 2 1 29 25 0.47982
30 2 0 30 23 0.35674
31 2 0 31 24 0.41678
32 2 0 32 24 0.41678
33 2 1 33 26 0.54446
34 2 1 34 32 0.87656
35 2 1 35 30 0.78633
36 2 0 36 30 0.78633
37 2 1 37 32 0.87656
38 2 1 38 20 0.20414



The following statements model the hazard for patient i in terms of $ \alpha _ i = \exp \{ - \mb{x}_ i’\bbeta - z_ i \} $, where $z_ i$ is a (normal) random patient effect. Notice that the only difference from the previous NLMIXED statements are the RANDOM statement and the addition of z in the linear predictor. The empirical Bayes estimates of the random effect (RANDOM statement), the parameter estimates (ODS OUTPUT statement), and the estimated cumulative distribution function (PREDICT statement) are saved to subsequently graph the patient-specific distribution functions.

ods output ParameterEstimates=est;
proc nlmixed data=headache;
   bounds gamma > 0;
   linp  = b0 - b1*(group-2) + z;
   alpha = exp(-linp);
   G_t   = exp(-(alpha*minutes)**gamma);
   g     = gamma*alpha*((alpha*minutes)**(gamma-1))*G_t;
   ll = (censor=0)*log(g) + (censor=1)*log(G_t);
   model minutes ~ general(ll);
   random z ~ normal(0,exp(2*logsig)) subject=patient out=EB;
   predict 1-G_t out=cdf;
run;

Output 82.5.6: Specifications for Random Frailty Model

The NLMIXED Procedure

Specifications
Data Set WORK.HEADACHE
Dependent Variable minutes
Distribution for Dependent Variable General
Random Effects z
Distribution for Random Effects Normal
Subject Variable patient
Optimization Technique Dual Quasi-Newton
Integration Method Adaptive Gaussian Quadrature



The "Specifications" table shows that the objective function is computed by adaptive Gaussian quadrature because of the presence of random effects (compare Output 82.5.6 and Output 82.5.1). The "Dimensions" table reports that nine quadrature points are being used to integrate over the random effects (Output 82.5.7).

Output 82.5.7: Dimensions Table for Random Frailty Model

Dimensions
Observations Used 38
Observations Not Used 0
Total Observations 38
Subjects 38
Max Obs per Subject 1
Parameters 4
Quadrature Points 9



Output 82.5.8: Iteration History for Random Frailty Model

Iteration History
Iteration Calls Negative
Log
Likelihood
Difference Maximum
Gradient
Slope
1 9 142.1214 28.82225 12.1448 -88.8664
2 12 136.4404 5.681042 25.9310 -65.7217
3 16 122.9720 13.46833 46.5655 -146.887
4 19 120.9048 2.067216 23.7794 -94.2862
5 23 109.2241 11.68068 57.6549 -92.4075
6 26 105.0647 4.159411 4.82465 -19.5879
7 28 101.9022 3.162526 14.1287 -6.33767
8 31 99.6907395 2.211468 7.67682 -3.42364
9 34 99.3654033 0.325336 5.68920 -0.93978
10 37 99.2602178 0.105185 0.31764 -0.23408
11 40 99.2544340 0.005784 1.17351 -0.00556
12 42 99.2456973 0.008737 0.24741 -0.00871
13 45 99.2445445 0.001153 0.10494 -0.00218
14 48 99.2444958 0.000049 0.005646 -0.00010
15 51 99.2444957 9.147E-8 0.000271 -1.84E-7

NOTE: GCONV convergence criterion satisfied.



The procedure converges after 15 iterations (Output 82.5.8). The achieved –2 log likelihood is only 1.2 less than that in the model without random effects (compare Output 82.5.9 and Output 82.5.4). Compared to a chi-square distribution with one degree of freedom, the addition of the random effect appears not to improve the model significantly. You must exercise care, however, in interpreting likelihood ratio tests when the value under the null hypothesis falls on the boundary of the parameter space (see, for example, Self and Liang 1987).

Output 82.5.9: Fit Statistics for Random Frailty Model

Fit Statistics
-2 Log Likelihood 198.5
AIC (smaller is better) 206.5
AICC (smaller is better) 207.7
BIC (smaller is better) 213.0



Output 82.5.10: Parameter Estimates

Parameter Estimates
Parameter Estimate Standard
Error
DF t Value Pr > |t| 95% Confidence Limits Gradient
gamma 6.2867 2.1334 37 2.95 0.0055 1.9640 10.6093 -1.89E-7
b0 3.2786 0.06576 37 49.86 <.0001 3.1453 3.4118 0.000271
b1 -0.1761 0.08264 37 -2.13 0.0398 -0.3436 -0.00867 0.000111
logsig -1.9027 0.5273 37 -3.61 0.0009 -2.9710 -0.8343 0.000027



The estimate of the Weibull parameter has changed drastically from the model without random effects (compare Output 82.5.10 and Output 82.5.4). The variance of the patient random effect is $\exp \{ -2 \times 1.9027\}  = 0.02225$. The listing in Output 82.5.11 shows the empirical Bayes estimates of the random effects. These are the adjustments made to the linear predictor in order to obtain a patient’s survival distribution. The listing is produced with the following statements:

proc print data=eb;
   var Patient Effect Estimate StdErrPred;
run;

Output 82.5.11: Empirical Bayes Estimates of Random Effects

Obs patient Effect Estimate StdErrPred
1 1 z -0.13597 0.23249
2 2 z -0.13323 0.22793
3 3 z -0.06294 0.13813
4 4 z -0.06294 0.13813
5 5 z -0.06294 0.13813
6 6 z -0.06294 0.13813
7 7 z -0.02568 0.11759
8 8 z -0.04499 0.12618
9 9 z -0.02568 0.11759
10 10 z -0.02568 0.11759
11 11 z -0.04499 0.12618
12 12 z -0.02568 0.11759
13 13 z -0.04499 0.12618
14 14 z -0.02568 0.11759
15 15 z 0.05980 0.11618
16 16 z 0.10458 0.12684
17 17 z 0.17147 0.14550
18 18 z 0.06471 0.13807
19 19 z 0.11157 0.14604
20 20 z -0.13406 0.22899
21 21 z -0.12698 0.21666
22 22 z -0.12698 0.21666
23 23 z -0.08506 0.15701
24 24 z -0.08506 0.15701
25 25 z -0.05797 0.13294
26 26 z -0.05797 0.13294
27 27 z -0.05797 0.13294
28 28 z -0.05797 0.13294
29 29 z 0.06420 0.13956
30 30 z -0.05797 0.13294
31 31 z -0.04266 0.12390
32 32 z -0.04266 0.12390
33 33 z 0.07618 0.14132
34 34 z 0.16292 0.16459
35 35 z 0.13193 0.15528
36 36 z 0.06327 0.12124
37 37 z 0.16292 0.16459
38 38 z 0.02074 0.14160



The predicted values and patient-specific survival distributions can be plotted with the SAS code that follows:

proc transpose data=est(keep=estimate)
    out=trest(rename=(col1=gamma col2=b0 col3=b1));
run;

data pred;
   merge eb(keep=estimate) headache(keep=patient group);
   array pp{2} pred1-pred2;
   if _n_ = 1 then set trest(keep=gamma b0 b1);
   do time=11 to 32;
      linp      = b0 - b1*(group-2) + estimate;
      pp{group} = 1-exp(- (exp(-linp)*time)**gamma);
      symbolid  = patient+1;
      output;
   end;
   keep pred1 pred2 time patient;
run;

data pred;
   merge pred
         cdf(where  = (group=1)
             rename = (pred=pcdf1 minutes=minutes1)
             keep   = pred minutes group)
         cdf(where  = (group=2)
             rename = (pred=pcdf2 minutes=minutes2)
             keep   = pred minutes group);
   drop group;
run;

proc sgplot data=pred noautolegend;
   label minutes1='Minutes to Headache Relief'
         pcdf1   ='Estimated Patient-specific CDF';
   series  x=time     y=pred1  /
           group=patient
           lineattrs=(pattern=solid color=black);
   series  x=time     y=pred2  /
           group=patient
           lineattrs=(pattern=dash color=black);
   scatter x=minutes1 y=pcdf1  /
           markerattrs=(symbol=CircleFilled size=9);
   scatter x=minutes2 y=pcdf2  /
           markerattrs=(symbol=Circle       size=9);
run;

The separation of the distribution functions by groups is evident in Output 82.5.12. Most of the distributions of patients in the first group are to the left of the distributions in the second group. The separation is not complete, however. Several patients who are assigned the second pain reliever experience headache relief more quickly than patients assigned to the first group.

Output 82.5.12: Patient-Specific CDFs and Predicted Values. Pain Reliever 1: Solid Lines, Closed Circles; Pain Reliever 2: Dashed Lines, Open Circles.

Patient-Specific CDFs and Predicted Values. Pain Reliever 1: Solid Lines, Closed Circles; Pain Reliever 2: Dashed Lines, Open Circles.