The PHREG Procedure

Assessment of the Proportional Hazards Model

The proportional hazards model specifies that the hazard function for the failure time T associated with a $p\times 1$ column covariate vector $\bZ $ takes the form

\[ \lambda (t;\bZ )=\lambda _0(t)\mr{e}^{\bbeta '\bZ } \]

where $\lambda _0(.)$ is an unspecified baseline hazard function and $\bbeta $ is a $p\times 1$ column vector of regression parameters. Lin, Wei, and Ying (1993) present graphical and numerical methods for model assessment based on the cumulative sums of martingale residuals and their transforms over certain coordinates (such as covariate values or follow-up times). The distributions of these stochastic processes under the assumed model can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be generated by simulation. Each observed residual pattern can then be compared, both graphically and numerically, with a number of realizations from the null distribution. Such comparisons enable you to assess objectively whether the observed residual pattern reflects anything beyond random fluctuation. These procedures are useful in determining appropriate functional forms of covariates and assessing the proportional hazards assumption. You use the ASSESS statement to carry out these model-checking procedures.
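
For example, the following sketch shows a typical way to request both types of checks. The input data set Surv, the failure time variable Time, the censoring indicator Status (with 0 identifying censored times), and the covariates Age and Bili are hypothetical names used only for illustration; ODS Graphics is enabled so that the diagnostic plots are produced.

   ods graphics on;

   proc phreg data=Surv;
      model Time*Status(0) = Age Bili;
      /* VAR= requests functional-form checks for the listed covariates,
         PH requests the proportional hazards check, RESAMPLE requests the
         supremum tests, and SEED= makes the simulated paths reproducible */
      assess var=(Age Bili) ph / resample seed=1234;
   run;

   ods graphics off;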

For a sample of n subjects, let $(X_ i,\Delta _ i,\bZ _ i)$ be the data of the ith subject; that is, $X_ i$ represents the observed failure time, $\Delta _ i$ has a value of 1 if $X_ i$ is an uncensored time and 0 otherwise, and $\bZ _ i=(Z_{1i},\ldots ,Z_{pi})'$ is a p-vector of covariates. Let $N_ i(t)=\Delta _ iI(X_ i \le t)$ and $Y_ i(t)=I(X_ i\ge t)$. Let

\[ S^{(0)}(\bbeta ,t) = \sum _{i=1}^ n Y_ i(t)\mr{e}^{\bbeta '\bZ _ i} ~ ~ \mr{and}~ ~ \bar{\bZ }(\bbeta ,t) = \frac{\sum _{i=1}^ n Y_ i(t)\mr{e}^{\bbeta '\bZ _ i}\bZ _ i}{S^{(0)}(\bbeta ,t)} \]

Let $\hat{\bbeta }$ be the maximum partial likelihood estimate of $\bbeta $, and let $\mc{I}(\hat{\bbeta })$ be the observed information matrix.

The martingale residuals are defined as

\[ \hat{M}_ i(t) = N_ i(t) -\int _0^{t} Y_ i(u)\mr{e}^{\hat{\bbeta }'\bZ _ i} d\hat{\Lambda }_0(u), i=1,\ldots ,n \]

where $ \hat{\Lambda }_0(t) = \int _0^ t \frac{\sum _{i=1}^ n dN_ i(u)}{S^{(0)}(\hat{\bbeta },u)}$ is the Breslow estimator of the cumulative baseline hazard function.

The empirical score process $\bU (\hat{\bbeta },t)= (U_1(\hat{\bbeta },t),\ldots ,U_ p(\hat{\bbeta },t))'$ is a transform of the martingale residuals:

\[ \bU (\hat{\bbeta },t) = \sum _{i=1}^ n \bZ _ i\hat{M}_ i(t) \]
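
Substituting the expression for $\hat{\Lambda }_0$ into the definition of $\hat{M}_ i(t)$ and rearranging shows that this transform can also be written in terms of the observed event times as

\[ \bU (\hat{\bbeta },t) = \sum _{l=1}^ n I(X_ l\le t)\Delta _ l [\bZ _ l - \bar{\bZ }(\hat{\bbeta },X_ l)] \]

so that $\bU (\hat{\bbeta },\infty )$ is the partial likelihood score function evaluated at $\hat{\bbeta }$ and is therefore zero. The same centered terms $\Delta _ l [\bZ _ l - \bar{\bZ }(\hat{\bbeta },X_ l)]$ reappear in the simulated processes given below.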

Checking the Functional Form of a Covariate

To check the functional form of the jth covariate, consider the partial-sum process of $\hat{M}_ i=\hat{M}_ i(\infty )$:

\[ W_ j(z)= \sum _{i=1}^ n I(Z_{ji} \le z)\hat{M}_ i \]

Under the null hypothesis that the model holds, $W_ j(z)$ can be approximated by the zero-mean Gaussian process

\begin{eqnarray*} \hat{W}_{j}(z) & = & \sum _{l=1}^ n \Delta _ l \biggl \{ I(Z_{jl} \le z) - \frac{\sum _{i=1}^ n Y_ i(X_ l)\mr{e}^{\hat{\bbeta }'\bZ _ i}I(Z_{ji}\le z)}{S^{(0)}(\hat{\bbeta },X_ l)}\biggr \} G_ l - \\ & & \sum _{k=1}^ n \int _0^\infty Y_ k(s) \mr{e}^{\hat{\bbeta }'\bZ _ k} I(Z_{jk} \le z) [\bZ _ k - \bar{\bZ }(\hat{\bbeta },s)]'d\hat{\Lambda }_0(s) \\ & & \times \mc{I}^{-1}(\hat{\bbeta }) \sum _{l=1}^ n \Delta _ l [\bZ _ l -\bar{\bZ }(\hat{\bbeta },X_ l) ] G_ l \end{eqnarray*}

where $(G_1,\ldots ,G_ n)$ are independent standard normal variables that are independent of $(X_ i,\Delta _ i,\bZ _ i)$, $i=1,\ldots ,n$.

You can assess the functional form of the jth covariate by plotting a small number of realizations (the default is 20) of $\hat{W}_{j}(z)$ on the same graph as the observed $W_ j(z)$ and visually comparing them to see how typical the observed pattern of $W_ j(z)$ is of the null distribution samples. You can supplement the graphical inspection method with a Kolmogorov-type supremum test. Let $s_ j$ be the observed value of $S_ j = \sup _ z|W_ j(z)|$ and let $\hat{S}_ j = \sup _ z|\hat{W}_ j(z)|$. The p-value $\mr{Pr}(S_ j \ge s_ j)$ is approximated by $\mr{Pr}(\hat{S}_ j \ge s_ j)$, which in turn is approximated by generating a large number of realizations (1000 is the default) of $\hat{W}_ j(.)$.
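
The following sketch (again with hypothetical data set and variable names) shows how these quantities are typically controlled in the ASSESS statement: NPATHS= sets the number of simulated realizations that are overlaid on the plot, and RESAMPLE= sets the number of realizations used to approximate the p-value of the supremum test.

   proc phreg data=Surv;
      model Time*Status(0) = Age Bili;
      /* cumulative martingale residuals plotted against Bili, with 20
         simulated paths overlaid and a supremum test based on 10000
         simulated realizations */
      assess var=(Bili) / npaths=20 resample=10000 seed=7548;
   run;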

Checking the Proportional Hazards Assumption

Consider the standardized empirical score process for the jth component of $\bZ $:

\[ U_ j^*(t)=[\mc{I}^{-1}(\hat{\bbeta })_{jj}]^{\frac{1}{2}} U_ j(\hat{\bbeta },t) \]

Under the null hypothesis that the model holds, $U^*_ j(t)$ can be approximated by

\begin{eqnarray*} \hat{U}^*_ j(t) & = & [\mc{I}^{-1}(\hat{\bbeta })_{jj}]^{\frac{1}{2}} \biggl \{ \sum _{l=1}^ n I(X_ l\le t)\Delta _ l [Z_{jl} - \bar{Z}_ j(\hat{\bbeta },X_ l)]G_ l - \\ & & \sum _{k=1}^ n \int _0^ t Y_ k(s)\mr{e}^{\hat{\bbeta }'\bZ _ k} Z_{jk} [ \bZ _ k - \bar{\bZ }(\hat{\bbeta },s) ]'d\hat{\Lambda }_0(s) \\ & & \times \mc{I}^{-1}(\hat{\bbeta }) \sum _{l=1}^ n \Delta _ l [\bZ _ l -\bar{\bZ }(\hat{\bbeta },X_ l)]G_ l \biggr \} \end{eqnarray*}

where $\bar{Z}_ j(\hat{\bbeta },t)$ is the jth component of $\bar{\bZ }(\hat{\bbeta },t)$, and $(G_1,\ldots ,G_ n)$ are independent standard normal variables that are independent of $(X_ i,\Delta _ i,\bZ _ i)$, $i=1,\ldots ,n$.

You can assess the proportional hazards assumption for the jth covariate by plotting a few realizations of $\hat{U}^*_{j}(t)$ on the same graph as the observed $U^*_ j(t)$ and visually comparing them to see how typical the observed pattern of $U^*_ j(t)$ is of the null distribution samples. Again you can supplement the graphical inspection method with a Kolmogorov-type supremum test. Let $s^*_ j$ be the observed value of $S^*_ j = \sup _ t|U^*_ j(t)|$ and let $\hat{S}^*_ j = \sup _ t|\hat{U}^*_ j(t)|$. The p-value $\mr{Pr}[S^*_ j \ge s^*_ j]$ is approximated by $\mr{Pr}[\hat{S}^*_ j \ge s_ j^*]$, which in turn is approximated by generating a large number of realizations (1000 is the default) of $\hat{U}^*_ j(.)$.
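
A corresponding sketch (same hypothetical names) for the proportional hazards check follows; the PH option plots the observed standardized score process for each covariate against follow-up time together with simulated realizations and computes the supremum test p-values.

   proc phreg data=Surv;
      model Time*Status(0) = Age Bili;
      /* standardized score processes versus follow-up time, with simulated
         paths overlaid and Kolmogorov-type supremum tests based on 10000
         simulated realizations */
      assess ph / npaths=20 resample=10000 seed=7548;
   run;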