The PHREG Procedure

Proportional Subdistribution Hazards Model for Competing-Risks Data

Competing risks arise in the analysis of time-to-event data when the event of interest can be precluded by a prior event of a different type. For example, a leukemia patient’s relapse might be unobservable because the patient dies before relapse is diagnosed. In the presence of competing risks, the Kaplan-Meier method of estimating the survivor function is biased, because you can no longer assume that a subject will experience the event of interest if the follow-up period is long enough. The cumulative incidence function (CIF), which is the marginal failure subdistribution of a given cause, is widely used in competing-risks analysis.

The proportional hazards model for the subdistribution that Fine and Gray (1999) propose models the cumulative incidence of an event of interest directly. They define a subdistribution hazard,

\[  \bar{\lambda }_ k(t) = - \frac{d}{dt} \log \left(1 - F_ k(t) \right)  \]

where $F_ k(t)$ is the cumulative incidence function for the failure of cause k, and they impose a proportional hazards assumption on the subdistribution hazards:

\[  \bar{\lambda }_ k(t|\bZ ) = \bar{\lambda }_{k,0}(t) \exp (\bbeta _ k'\bZ )  \]

The estimation of the regression coefficients is based on modified risk sets, where subjects that experience a competing event are retained after their event. The weight of those subjects that are artificially retained in the risk sets is gradually reduced according to the conditional probability of being under follow-up had the competing event not occurred.

You use PROC PHREG to fit the Fine and Gray (1999) model by specifying the EVENTCODE= option in the MODEL statement to indicate the event of interest. Maximum likelihood estimates of the regression coefficients are obtained by the Newton-Raphson algorithm. The covariance matrix of the parameter estimator is computed as a sandwich estimate. You can request the CIF curves for a given set of covariates by using the BASELINE statement. The PLOTS=CIF option in the PROC PHREG statement displays a plot of the curves. You can obtain Schoenfeld residuals and score residuals by using the OUTPUT statement.
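For example, the following statements sketch a minimal Fine and Gray fit; the data set Bmt, the time variable T, the status variable Status (0 = censored, 1 = event of interest, 2 = competing event), and the covariate Group are assumed names for illustration:

```sas
/* Fit the Fine and Gray model; EVENTCODE=1 designates the event of interest */
proc phreg data=Bmt plots=cif;
   class Group;
   model T*Status(0) = Group / eventcode=1;
run;
```

The value in the EVENTCODE= option identifies the event of interest; any other nonzero status value is treated as a competing event, and the value in parentheses after Status marks censored observations.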

To model the subdistribution hazards for clustered data (Zhou et al., 2012), you use the COVS(AGGREGATE) option in the PROC PHREG statement. You also have to specify the ID statement to identify the clusters. To model the subdistribution hazards for stratified data (Zhou et al., 2011), you use the STRATA statement. PROC PHREG handles only regular stratified data that have a small number of large subject groups.
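A sketch for the clustered-data case follows, where the data set Crs, the covariates Z1 and Z2, and the cluster identifier Center are assumed names:

```sas
/* Marginal Fine and Gray model for clustered data:
   COVS(AGGREGATE) requests the robust sandwich variance aggregated over
   clusters, and the ID statement identifies the clusters */
proc phreg data=Crs covs(aggregate);
   model T*Status(0) = Z1 Z2 / eventcode=1;
   id Center;
run;
```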

When you specify the EVENTCODE= option in the MODEL statement, the ASSESS, BAYES, and RANDOM statements are ignored. The ATRISK and COVM options in the PROC PHREG statement are also ignored, as are the following options in the MODEL statement: BEST=, DETAILS, HIERARCHY=, INCLUDE=, NOFIT, PLCONV=, RISKLIMITS=PL, SELECTION=, SEQUENTIAL, SLENTRY=, SLSTAY=, TYPE1, and TYPE3(LR, SCORE). Profile likelihood confidence intervals for the hazard ratios are not available for the Fine and Gray competing-risks analysis.

Parameter Estimation

For the ith subject, $i=1,\ldots ,n$, let $X_ i$, $\Delta _ i$, $\epsilon _ i$, and $\bZ _ i(t)$ be the observed time, event indicator, cause of failure, and covariate vector at time t, respectively. Assume that K causes of failure are observable ($\epsilon _ i \in \{ 1, \ldots , K\} $). Consider failure from cause 1 to be the failure of interest, with failures of other causes as competing events. Let

\begin{eqnarray*}  N_ i(t) & =&  I(X_ i\leq t, \epsilon _ i=1) \\ Y_ i(t) & =&  1 - N_ i(t-) \end{eqnarray*}

Note that if $\epsilon _ i=1$, then $N_ i(t)=I(X_ i\leq t)$ and $Y_ i(t)=I(X_ i\geq t)$; if $\epsilon _ i \neq 1$, then $N_ i(t)=0$ and $Y_ i(t)=1$. Let

\begin{eqnarray*}  r_ i(t) & =&  I(C_ i \geq T_ i \wedge t) \\ w_ i(t) & =&  r_ i(t) \frac{G(t)}{G(X_ i \wedge t)} \end{eqnarray*}

where $T_ i$ and $C_ i$ denote the failure time and the censoring time, respectively, and $G(t)$ is the Kaplan-Meier estimate of the survivor function of the censoring variable, which is calculated using $\{ X_ i, 1-\Delta _ i, i=1,2,\ldots ,n\} $. If $\Delta _ i=0$, then $r_ i(t)=1$ when $t\leq X _ i$ and 0 otherwise; and if $\Delta _ i=1$, then $r_ i(t)=1$. Table 73.12 displays the weight of a subject as a function of time.

Table 73.12: Weight for the ith Subject

| Time | Status | $r_ i(t)$ | $Y_ i(t)$ | $w_ i(t)$ |
|---|---|---|---|---|
| $t \leq X_ i$ | $\Delta _ i=0$ | 1 | 1 | 1 |
| $t \leq X_ i$ | $\Delta _ i\epsilon _ i=1$ | 1 | 1 | 1 |
| $t \leq X_ i$ | $\Delta _ i\epsilon _ i \neq 1$ | 1 | 1 | 1 |
| $t > X_ i$ | $\Delta _ i=0$ | 0 | 1 | 0 |
| $t > X_ i$ | $\Delta _ i\epsilon _ i=1$ | 1 | 0 | $G(t)/G(X_ i)$ |
| $t > X_ i$ | $\Delta _ i\epsilon _ i \neq 1$ | 1 | 1 | $G(t)/G(X_ i)$ |
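As a concrete illustration (with hypothetical values), suppose a subject experiences a competing event at time $X_ i=3$, and the censoring survivor estimates are $G(3)=0.9$ and $G(5)=0.6$. The subject is artificially retained in the risk set after $t=3$, but with the diminishing weight

\[  w_ i(5) = \frac{G(5)}{G(3)} = \frac{0.6}{0.9} \approx 0.67  \]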

The regression coefficients $\bbeta $ are estimated by maximizing the pseudo-likelihood $L(\bbeta )$ with respect to $\bbeta $:

\[  L(\bbeta ) = \prod _{i=1}^ n \left( \frac{\exp (\bbeta '\bZ _ i(X_ i))}{ \sum _{j=1}^ n Y_ j(X_ i)w_ j(X_ i) \exp (\bbeta '\bZ _ j(X_ i))}\right)^{I(\Delta _ i\epsilon _ i=1)}  \]

The variance-covariance matrix of the maximum likelihood estimator $\hat{\bbeta }$ is approximated by a sandwich estimate.

With $\mb{a}^{(0)}=1$, $\mb{a}^{(1)}=\mb{a}$, and $\mb{a}^{(2)}=\mb{a}\mb{a}'$, let

\begin{eqnarray*}  \bS _2^{(r)}(\bbeta ,u) & =&  \sum _{j=1}^ n w_ j(u)Y_ j(u)\bZ _ j(u)^{\otimes r}\exp (\bbeta '\bZ _ j(u)), \mbox{~ ~ } r=0,1,2 \\ \bar{\bZ }(\bbeta ,u) & =&  \frac{\bS _2^{(1)}(\bbeta ,u)}{S_2^{(0)}(\bbeta ,u)} \end{eqnarray*}

The score function $\bU _2(\bbeta )$ and the observed information matrix $\hat{\bOmega }$ are given by

\begin{eqnarray*}  \bU _2(\hat{\bbeta }) & =&  \sum _{i=1}^ n \biggl (\bZ _ i(X_ i) - \bar{\bZ }(\hat{\bbeta },X_ i) \biggr ) I(\Delta _ i\epsilon _ i=1) \\ \hat{\bOmega } & =&  -\frac{\partial \bU _2(\hat{\bbeta })}{\partial \bbeta } = \sum _{i=1}^ n \left( \frac{\bS _2^{(2)}(\hat{\bbeta }, X_ i)}{S_2^{(0)}(\hat{\bbeta },X_ i)} - \bar{\bZ }^{\otimes 2}(\hat{\bbeta },X_ i) \right) I(\Delta _ i\epsilon _ i=1) \end{eqnarray*}

The sandwich variance estimate of $\hat{\bbeta }$ is

\[  \widehat{\mr{var}}(\hat{\bbeta }) = \hat{\bOmega }^{-1} \hat{\bSigma } \hat{\bOmega }^{-1}  \]

where $\hat{\bSigma }$ is the estimate of the variance-covariance matrix of $\bU _2(\hat{\bbeta })$ that is given by

\[  \hat{\bSigma } = \sum _{i=1}^ n (\hat{\bm {\eta }}_ i + \hat{\bpsi }_ i)^{\otimes 2}  \]

where
\[  \hat{\bm {\eta }}_ i = \int _0^\infty \biggl (\bZ _ i(u) - \bar{\bZ }(\hat{\bbeta },u) \biggr ) w_ i(u) d\hat{M}_ i^1(u)  \]
\[  \hat{\bpsi }_ i =\int _0^\infty \frac{\hat{\mb{q}}(u)}{\pi (u)} d\hat{M}_ i^ c(u)  \]
\[  \hat{\mb{q}}(u) = - \sum _{i=1}^{n} \int _0^\infty \biggl (\bZ _ i(s) - \bar{\bZ }(\hat{\bbeta }, s) \biggr ) w_ i(s) d\hat{M}_ i^1(s) I(s\geq u > X_ i)  \]
\[  \pi (u) = \sum _ j I(X_ j \geq u)  \]
\[  \hat{M}_ i^1(t) = N_ i(t) - \int _0^ t Y_ i(s) \exp (\hat{\bbeta }'\bZ _ i(s)) d\hat{\Lambda }_{10}(s)  \]
\[  \hat{M}_ i^ c(t) = I(X_ i \leq t, \Delta _ i=0) - \int _0^ t I(X_ i\geq u) d\hat{\Lambda }^ c(u)  \]
\[  \hat{\Lambda }_{10}(t) = \sum _{i=1}^ n \int _0^ t \frac{1}{S_2^{(0)}(\hat{\bbeta },u)} w_ i(u) dN_ i(u)  \]
\[  \hat{\Lambda }^ c(t) = \int _0^ t \frac{\sum _ i d\{ I(X_ i \leq u, \Delta _ i=0) \} }{\sum _ i I(X_ i \geq u)}  \]


You can use the OUTPUT statement to output Schoenfeld residuals and score residuals to a SAS data set.

Schoenfeld residuals: $\bZ _ i(X_ i) - \bar{\bZ }(\hat{\bbeta },X_ i)$, defined for each subject with $\Delta _ i\epsilon _ i=1$

Score residuals: $\hat{\bm {\eta }}_ i + \hat{\bpsi }_ i$, $1\leq i \leq n$
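The following statements sketch this usage (the data set and variable names are hypothetical); the RESSCH= and RESSCO= keywords each list one output variable name per model effect:

```sas
/* Output Schoenfeld and score residuals for a two-covariate model */
proc phreg data=Bmt;
   model T*Status(0) = Z1 Z2 / eventcode=1;
   output out=Resids ressch=schZ1 schZ2 ressco=scoZ1 scoZ2;
run;
```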

Cumulative Incidence Prediction

For an individual with covariates $\bZ =\mb{z}_0$, the cumulative subdistribution hazard is estimated by

\[  \hat{\Lambda }_1(t;\mb{z}_0) = \int _0^ t \exp [\hat{\bbeta }'\mb{z}_0]d\hat{\Lambda }_{10}(u)  \]

and the predicted cumulative incidence is

\[  \hat{F}_1(t;\mb{z}_0) = 1 - \exp [-\hat{\Lambda }_1(t;\mb{z}_0)]  \]
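As a numeric illustration, if the estimated cumulative subdistribution hazard at some time t is $\hat{\Lambda }_1(t;\mb{z}_0)=0.5$ (a hypothetical value), then the predicted cumulative incidence is

\[  \hat{F}_1(t;\mb{z}_0) = 1 - \exp (-0.5) \approx 0.393  \]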

To compute the confidence interval for the cumulative incidence, consider a monotone transformation $m(p)$ with first derivative $\dot{m}(p)$. Fine and Gray (1999, Section 5) give the following procedure to calculate pointwise confidence intervals. First, you generate B samples of normal random deviates $\{  (A_{k1}, \ldots , A_{kn}), 1\leq k \leq B\} $. You can specify the value of B by using the NORMALSAMPLE= option in the BASELINE statement. Then, you compute the estimate of var$\{ m[\hat{F}_1(t;\bm {z}_0)] - m[F_1(t;\bm {z}_0)]\} $ as

\[  \hat{\sigma }^2(t;\bm {z}_0) = \frac{1}{B} \sum _{k=1}^ B \hat{J}_{1k}^2(t;\bm {z}_0)  \]


\begin{eqnarray*}  \hat{J}_{1k}(t;\bm {z}_0) &  = &  \dot{m}[\hat{F}_1(t;\mb{z}_0)]\exp [- \hat{\Lambda }_1(t;\mb{z}_0)] \sum _{i=1}^ n A_{ki} \biggl \{  \int _0^ t \frac{\exp (\hat{\bbeta }'\bm {z}_0)}{S_2^{(0)}(\hat{\bbeta },u)} w_ i(u) d\hat{M}^1_ i(u) \\ & &  + ~  \hat{\bm {h}}'(t;\bm {z}_0) \hat{\bOmega }^{-1}(\hat{\bm {\eta }}_ i + \hat{\bpsi }_ i) + \int _0^\infty \frac{\hat{\bm {v}}(u,t,\bm {z}_0)}{\hat{\pi }(u)} d\hat{M}_ i^ c(u) \biggr \}  \end{eqnarray*}
\[  \hat{\mb{h}}(t;\mb{z}_0) = \exp (\hat{\bbeta }'\mb{z}_0) \biggl \{  \hat{\Lambda }_{10}(t)\mb{z}_0 - \int _0^ t \bar{\bZ }(\hat{\bbeta },u)d\hat{\Lambda }_{10}(u) \biggr \}   \]
\[  \hat{\bm {v}}(u,t,\mb{z}_0) = -\exp (\hat{\bbeta }'\mb{z}_0) \sum _{i=1}^ n \int _0^ t \frac{1}{S_2^{(0)}(\hat{\bbeta },s)} w_ i(s) d\hat{M}_ i^1(s) I(s \geq u > X_ i)  \]

A 100(1–$\alpha $)% confidence interval for $F_1(t;\mb{z}_0)$ is given by

\[  m^{-1} \biggl ( m[\hat{F}_1(t;\mb{z}_0)] \pm z_{\alpha } \hat{\sigma }(t;\mb{z}_0) \biggr )  \]

where $z_\alpha $ is the 100(1–$\frac{\alpha }{2}$)th percentile of the standard normal distribution.

The CLTYPE option in the BASELINE statement enables you to choose the LOG transformation, the LOGLOG (log of negative log) transformation, or the IDENTITY transformation. You can also output the standard error of the cumulative incidence, which is approximated by the delta method as follows:

\[  \hat{\sigma }^2(\hat{F}(t;\mb{z}_0)) = \left(\dot{m}[\hat{F}(t;\mb{z}_0)] \right)^{-2} \hat{\sigma }^2(t;\mb{z}_0)  \]

Table 73.13 displays the variance estimator for each transformation that is available in PROC PHREG.

Table 73.13: Variance Estimate of the CIF Predictor

| CLTYPE= Keyword | $m(p)$ | $\hat{\sigma }^2(\hat{F}(t;\mb{z}_0))$ |
|---|---|---|
| IDENTITY | $m(p)=p$ | $\hat{\sigma }^2(t;\mb{z}_0)$ |
| LOG | $m(p)=\log (p)$ | $\left( \hat{F}_1(t;\mb{z}_0) \right)^2 \hat{\sigma }^2(t;\mb{z}_0)$ |
| LOGLOG | $m(p)=\log (-\log (p))$ | $\left( \hat{F}(t;\mb{z}_0)\log (\hat{F}(t;\mb{z}_0)) \right)^2 \hat{\sigma }^2(t;\mb{z}_0)$ |
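The statements below sketch how to request CIF predictions with confidence limits; the data set Bmt, the covariate data set Risk, and the variable names are hypothetical:

```sas
/* Predict the CIF for each covariate row in Risk, with confidence
   limits computed on the log(-log) scale from 500 normal samples */
proc phreg data=Bmt;
   model T*Status(0) = Z1 Z2 / eventcode=1;
   baseline covariates=Risk out=Pred cif=_all_ /
            cltype=loglog normalsample=500;
run;
```

The CIF= keyword writes the predicted cumulative incidence and its confidence limits for each observation in the Risk data set to the Pred data set.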