The LIFETEST Procedure

Breslow, Fleming-Harrington, and Kaplan-Meier Methods

Let $t_1 < t_2 < \cdots < t_ D$ represent the distinct event times. For each $i=1,\ldots ,D$ , let $Y_ i$ be the number of surviving units (the size of the risk set) just prior to $t_ i$ and let $d_ i$ be the number of units that fail at $t_ i$ . If the NOTRUNCATE option is specified in the FREQ statement, $Y_ i$ and $d_ i$ can be nonintegers.

The Breslow estimate of the survivor function is

$\hat{S}(t_ i) = \exp \biggl (-\sum _{j=1}^ i \frac{d_ j}{Y_ j} \biggr )$

Note that the Breslow estimate is the exponentiation of the negative Nelson-Aalen estimate of the cumulative hazard function.

The Fleming-Harrington estimate (Fleming and Harrington 1984) of the survivor function is

$\hat{S}(t_ i) = \exp \biggl (-\sum _{k=1}^ i\sum _{j=0}^{d_ k-1} \frac{1}{Y_ k-j} \biggr )$

If the frequency values are not integers, the Fleming-Harrington estimate cannot be computed.

The Kaplan-Meier (product-limit) estimate of the survivor function at $t_ i$ is the cumulative product

$\hat{S}(t_ i) = \prod _{j=1}^ i \left( 1 - \frac{d_ j}{Y_ j} \right)$

Notice that all the estimators are defined to be right continuous; that is, the events at $t_ i$ are included in the estimate of $S(t_ i)$ . The corresponding estimate of the standard error is computed using Greenwood’s formula (Kalbfleisch and Prentice 1980) as

$\hat{\sigma } \left( \hat{S}(t_ i) \right) = \hat{S}(t_ i) \sqrt { \sum _{j=1}^ i \frac{d_ j}{Y_ j(Y_ j-d_ j)} } ~$
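The three estimators and Greenwood's formula can be sketched in a few lines of Python. This is only an illustration of the formulas above, not the procedure's implementation; the function name `survivor_estimates` and its return layout are hypothetical, and the sketch assumes integer frequencies with $d_i < Y_i$ at every event time so that Greenwood's sum is defined.

```python
import math

def survivor_estimates(Y, d):
    """Illustrative sketch: Y[i] is the risk-set size just before the i-th
    distinct event time, d[i] the number of failures at that time."""
    out = {"breslow": [], "fh": [], "km": [], "km_se": []}
    h_breslow = 0.0   # running sum of d_j / Y_j
    h_fh = 0.0        # running Fleming-Harrington sum
    km = 1.0          # running Kaplan-Meier product
    greenwood = 0.0   # running sum of d_j / (Y_j (Y_j - d_j))
    for Yi, di in zip(Y, d):
        h_breslow += di / Yi
        # Inner sum over j = 0, ..., d_i - 1; needs integer d_i.
        h_fh += sum(1.0 / (Yi - j) for j in range(di))
        km *= 1.0 - di / Yi
        greenwood += di / (Yi * (Yi - di))  # assumes d_i < Y_i
        out["breslow"].append(math.exp(-h_breslow))
        out["fh"].append(math.exp(-h_fh))
        out["km"].append(km)
        out["km_se"].append(km * math.sqrt(greenwood))
    return out

# Made-up data: 5 units at risk with 2 failures, then 3 with 1, then 2 with 1.
est = survivor_estimates([5, 3, 2], [2, 1, 1])
print(est["km"], est["km_se"])
```

When every $d_i$ equals 1, the Breslow and Fleming-Harrington sums coincide; they differ only in how tied failures are handled.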

The first quartile (or the 25th percentile) of the survival time is the time beyond which 75% of the subjects in the population under study are expected to survive. It is estimated by

$q_{.25} = \min \{ t_ j | \hat{S}(t_ j) < 0.75\}$

If $\hat{S}(t)$ is exactly equal to 0.75 from $t_ j$ to $t_{j+1}$ , the first quartile is taken to be $(t_ j + t_{j+1})/2$ . If it happens that $\hat{S}(t)$ is greater than 0.75 for all values of t, the first quartile cannot be estimated and is represented by a missing value in the printed output.

The general formula for estimating the 100pth percentile point is

$q_{p} = \min \{ t_ j | \hat{S}(t_ j) < 1-p\}$

The second quartile (the median) and the third quartile of survival times correspond to p = 0.5 and p = 0.75, respectively.
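The percentile rule, including the midpoint convention for a tie, might be sketched as follows. The function name `percentile_time` is hypothetical, the use of a floating-point tolerance to detect "exactly equal" is an assumption of this sketch, and `None` stands in for the missing value that is printed when the percentile cannot be estimated.

```python
import math

def percentile_time(times, S, p):
    """Illustrative sketch: times are the distinct event times t_1 < ... < t_D
    and S[j] is the estimated survivor function at times[j]."""
    target = 1.0 - p
    for j, (t, s) in enumerate(zip(times, S)):
        if math.isclose(s, target, abs_tol=1e-12):
            # S(t) equals 1 - p on [t_j, t_{j+1}): take the midpoint,
            # provided a later event time exists.
            if j + 1 < len(times):
                return (t + times[j + 1]) / 2.0
            return None
        if s < target:
            return t  # first event time where S drops below 1 - p
    return None  # S never falls below 1 - p: not estimable

# Made-up survivor estimates at four event times.
times = [1, 2, 3, 4]
S = [0.9, 0.75, 0.5, 0.25]
print(percentile_time(times, S, 0.25))  # tie at 0.75, midpoint of 2 and 3
print(percentile_time(times, S, 0.40))  # first time with S < 0.6
```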

Brookmeyer and Crowley (1982) constructed a confidence interval for the median survival time based on the confidence interval for $S(t)$ . The methodology is generalized to construct the confidence interval for the 100pth percentile based on a g-transformed confidence interval for $S(t)$ (Klein and Moeschberger 1997). You can use the CONFTYPE= option to specify the g-transformation. The $100(1-\alpha )$ % confidence interval for the first quartile survival time is the set of all points t that satisfy

$\biggl | \frac{ g(\hat{S}(t)) - g(1 - 0.25)}{g'(\hat{S}(t)) \hat{\sigma }(\hat{S}(t))} \biggr | \leq z_{1-\frac{\alpha }{2}}$

where $g'(x)$ is the first derivative of $g(x)$ and $z_{1-\frac{\alpha }{2}}$ is the $(100(1-\frac{\alpha }{2}))$ th percentile of the standard normal distribution.

Consider the bone marrow transplant data described in Example 70.2. The following table illustrates the construction of the confidence limits for the first quartile in the ALL group. Event times whose value of $\frac{ g(\hat{S}(t)) - g(1 - 0.25)}{g'(\hat{S}(t)) \hat{\sigma }(\hat{S}(t))}$ lies between $\pm z_{1-\frac{0.05}{2}} = \pm 1.960$ belong to the confidence set.

Constructing 95% Confidence Limits for the 25th Percentile
			$\frac{ g(\hat{S}(t)) - g(1 -0.25)}{g'(\hat{S}(t)) \hat{\sigma }(\hat{S}(t))}$
t	$\hat{S}(t)$	$\hat{\sigma }(\hat{S}(t))$	LINEAR	LOGLOG	LOG	ASINSQRT	LOGIT
1	0.97368	0.025967	8.6141	2.37831	9.7871	4.44648	2.47903
55	0.94737	0.036224	5.4486	2.36375	6.1098	3.60151	2.46635
74	0.92105	0.043744	3.9103	2.16833	4.3257	2.94398	2.25757
86	0.89474	0.049784	2.9073	1.89961	3.1713	2.38164	1.97023
104	0.86842	0.054836	2.1595	1.59196	2.3217	1.87884	1.64297
107	0.84211	0.059153	1.5571	1.26050	1.6490	1.41733	1.29331
109	0.81579	0.062886	1.0462	0.91307	1.0908	0.98624	0.93069
110	0.78947	0.066135	0.5969	0.55415	0.6123	0.57846	0.56079
122	0.73684	0.071434	-0.1842	-0.18808	-0.1826	-0.18573	-0.18728
129	0.71053	0.073570	-0.5365	-0.56842	-0.5222	-0.54859	-0.56101
172	0.68421	0.075405	-0.8725	-0.95372	-0.8330	-0.90178	-0.93247
192	0.65789	0.076960	-1.1968	-1.34341	-1.1201	-1.24712	-1.30048
194	0.63158	0.078252	-1.5133	-1.73709	-1.3870	-1.58613	-1.66406
230	0.60412	0.079522	-1.8345	-2.14672	-1.6432	-1.92995	-2.03291
276	0.57666	0.080509	-2.1531	-2.55898	-1.8825	-2.26871	-2.39408
332	0.54920	0.081223	-2.4722	-2.97389	-2.1070	-2.60380	-2.74691
383	0.52174	0.081672	-2.7948	-3.39146	-2.3183	-2.93646	-3.09068
418	0.49428	0.081860	-3.1239	-3.81166	-2.5177	-3.26782	-3.42460
466	0.46682	0.081788	-3.4624	-4.23445	-2.7062	-3.59898	-3.74781
487	0.43936	0.081457	-3.8136	-4.65971	-2.8844	-3.93103	-4.05931
526	0.41190	0.080862	-4.1812	-5.08726	-3.0527	-4.26507	-4.35795
609	0.38248	0.080260	-4.5791	-5.52446	-3.2091	-4.60719	-4.64271
662	0.35306	0.079296	-5.0059	-5.96222	-3.3546	-4.95358	-4.90900

Consider the LINEAR transformation where $g(x)=x$ . The event times that satisfy $\biggl | \frac{ g(\hat{S}(t)) - g(1 - 0.25)}{g'(\hat{S}(t)) \hat{\sigma }(\hat{S}(t))} \biggr | \leq 1.960$ are 107, 109, 110, 122, 129, 172, 192, 194, and 230. The confidence level of the interval [107, 230] is less than 95%. Brookmeyer and Crowley (1982) suggest extending the confidence interval up to, but not including, the next event time. As such, the 95% confidence interval for the first quartile based on the linear transform is [107, 276). The following table lists the confidence intervals for the various transforms.

95% Confidence Intervals for the 25th Percentile
CONFTYPE	[Lower	Upper)
LINEAR	107	276
LOGLOG	86	230
LOG	107	332
ASINSQRT	104	276
LOGIT	104	230
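As a cross-check, the intervals in this table can be recomputed from the $\hat{S}(t)$ and $\hat{\sigma }(\hat{S}(t))$ columns of the earlier table. The Python sketch below is illustrative only: the function name `percentile_ci` is hypothetical, the transform formulas are the standard forms assumed for each CONFTYPE= keyword, the statistic is evaluated at event times only, and the upper limit is extended to the next event time as described above.

```python
import math

# (t, S_hat, sigma_hat) copied from the table of the ALL group above.
ROWS = [
    (1, 0.97368, 0.025967), (55, 0.94737, 0.036224), (74, 0.92105, 0.043744),
    (86, 0.89474, 0.049784), (104, 0.86842, 0.054836), (107, 0.84211, 0.059153),
    (109, 0.81579, 0.062886), (110, 0.78947, 0.066135), (122, 0.73684, 0.071434),
    (129, 0.71053, 0.073570), (172, 0.68421, 0.075405), (192, 0.65789, 0.076960),
    (194, 0.63158, 0.078252), (230, 0.60412, 0.079522), (276, 0.57666, 0.080509),
    (332, 0.54920, 0.081223), (383, 0.52174, 0.081672), (418, 0.49428, 0.081860),
    (466, 0.46682, 0.081788), (487, 0.43936, 0.081457), (526, 0.41190, 0.080862),
    (609, 0.38248, 0.080260), (662, 0.35306, 0.079296),
]

# Each entry is (g, g'); these forms are assumptions of this sketch.
TRANSFORMS = {
    "LINEAR": (lambda x: x, lambda x: 1.0),
    "LOGLOG": (lambda x: math.log(-math.log(x)),
               lambda x: 1.0 / (x * math.log(x))),
    "LOG": (lambda x: math.log(x), lambda x: 1.0 / x),
    "ASINSQRT": (lambda x: math.asin(math.sqrt(x)),
                 lambda x: 0.5 / math.sqrt(x * (1.0 - x))),
    "LOGIT": (lambda x: math.log(x / (1.0 - x)),
              lambda x: 1.0 / (x * (1.0 - x))),
}

def percentile_ci(rows, p, conftype, z=1.959964):
    """Return (lower, upper) for the interval [lower, upper); upper is None
    when no later event time exists, and the result is None when no event
    time satisfies the inequality."""
    g, dg = TRANSFORMS[conftype]
    inside = [i for i, (t, s, se) in enumerate(rows)
              if abs((g(s) - g(1.0 - p)) / (dg(s) * se)) <= z]
    if not inside:
        return None
    nxt = inside[-1] + 1  # extend to, but exclude, the next event time
    upper = rows[nxt][0] if nxt < len(rows) else None
    return rows[inside[0]][0], upper

for name in TRANSFORMS:
    print(name, percentile_ci(ROWS, 0.25, name))
```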

Sometimes, the confidence limits for the quartiles cannot be estimated. For convenience of explanation, consider the linear transform $g(x)=x$ . If the curve that represents the upper confidence limits for the survivor function lies above 0.75, the upper confidence limit for the first quartile cannot be estimated. On the other hand, if the curve that represents the lower confidence limits for the survivor function lies above 0.75, the lower confidence limit for the first quartile cannot be estimated.

The estimated mean survival time is

$\hat{\mu } = \sum _{i=1}^ D \hat{S}(t_{i-1})(t_ i - t_{i-1})$

where $t_0$ is defined to be zero. When the largest observed time is censored, this sum underestimates the mean. The standard error of $\hat{\mu }$ is estimated as

$\hat{\sigma }(\hat{\mu }) = \sqrt {\frac{m}{m-1} \sum _{i=1}^{D-1} \frac{d_ i A_ i^2}{Y_ i (Y_ i - d_ i)} }$

where

$\begin{eqnarray*} A_ i & = & \sum _{j=i}^{D-1} \hat{S}(t_ j)(t_{j+1} - t_ j) \\[0.05in] m & = & \sum _{j=1}^ D d_ j ~ \\ \end{eqnarray*}$

If the largest observed time is not an event, you can use the TIMELIM= option to specify a time limit L and estimate the mean survival time limited to the time L and its standard error by replacing D by D + 1 with $t_{D+1}=L$ .
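The mean and its standard error can be traced through a short Python sketch. It is a direct transcription of the two formulas above under the assumption that $\hat{S}$ is the Kaplan-Meier estimate; the function name `mean_survival` is hypothetical, and no time limit is applied.

```python
import math

def mean_survival(times, Y, d):
    """Illustrative sketch: times are the distinct event times t_1 < ... < t_D,
    Y the risk-set sizes, d the failure counts. Requires at least two events
    in total so that m - 1 > 0."""
    D = len(times)
    t = [0.0] + list(times)          # t[0] plays the role of t_0 = 0
    S = [1.0]                        # S[i] = Kaplan-Meier estimate at t_i
    for Yi, di in zip(Y, d):
        S.append(S[-1] * (1.0 - di / Yi))
    # Area under the step function up to the largest event time.
    mu = sum(S[i - 1] * (t[i] - t[i - 1]) for i in range(1, D + 1))
    m = sum(d)
    total = 0.0
    for i in range(1, D):            # i = 1, ..., D - 1
        A_i = sum(S[j] * (t[j + 1] - t[j]) for j in range(i, D))
        total += d[i - 1] * A_i ** 2 / (Y[i - 1] * (Y[i - 1] - d[i - 1]))
    return mu, math.sqrt(m / (m - 1) * total)

# Three uncensored observations at times 1, 2, 3 (made-up data).
mu, se = mean_survival([1, 2, 3], [3, 2, 1], [1, 1, 1])
print(mu, se)
```

With no censoring, as in this small example, the estimate reduces to the sample mean and the standard error to the usual $s/\sqrt{n}$, which gives a convenient sanity check.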

Nelson-Aalen Estimate of the Cumulative Hazard Function

The Nelson-Aalen cumulative hazard estimator, defined up to the largest observed time on study, is

$\tilde{H}(t) = \sum _{t_ i\leq t} \frac{d_ i}{Y_ i}$

and its estimated variance is

$\hat{\sigma }^2 \left( \tilde{H}(t) \right) = \sum _{t_ i\leq t} \frac{d_ i}{Y_ i^2}$
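Both running sums are simple to compute. A minimal Python sketch, with the hypothetical function name `nelson_aalen` and made-up input data, might look like this:

```python
def nelson_aalen(Y, d):
    """Illustrative sketch: return the cumulative hazard estimate and its
    estimated variance at each distinct event time."""
    H, var = [], []
    h = v = 0.0
    for Yi, di in zip(Y, d):
        h += di / Yi          # increment d_i / Y_i
        v += di / Yi ** 2     # increment d_i / Y_i^2
        H.append(h)
        var.append(v)
    return H, var

H, var = nelson_aalen([5, 3, 2], [2, 1, 1])
print(H, var)
```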

Adjusted Kaplan-Meier Estimate

PROC LIFETEST computes the adjusted Kaplan-Meier estimate (AKME) of the survivor function if you specify both METHOD=KM and the WEIGHT statement. Let $(T_ i,\delta _ i,w_ i)$ , $i=1,\ldots ,n$ , denote an independent sample of right-censored survival data, where $T_ i$ is the possibly right-censored time, $\delta _ i$ is the censoring indicator ( $\delta _ i=0$ if $T_ i$ is censored and $\delta _ i=1$ if $T_ i$ is an event time), and $w_ i$ is the weight (from the WEIGHT statement). Let $t_1<t_2<\cdots <t_ D$ be the D distinct event times in the sample. At time $t_ j$ , $j=1,\ldots ,D$ , there are $d_{j}=\sum _{i}\delta _ iI(T_ i=t_ j)$ events out of $Y_{j}=\sum _{i}I(T_ i \geq t_ j)$ subjects. The weighted number of events and the weighted number at risk are $d^ w_{j} = \sum _{i} w_ i\delta _ i I(T_ i=t_ j)$ and $Y^ w_{j} = \sum _{i} w_ iI(T_ i \geq t_ j)$ , respectively. The AKME (Xie and Liu 2005) is

$\hat{S}(t) = \left\{ \begin{array}{ll} 1 & \mbox{if } t<t_1 \\ \prod _{t_ j \leq t} \left[ 1- \frac{d^ w_{j}}{Y^ w_{j}}\right] & \mbox{if } t \geq t_1 \end{array} \right.$

The estimated variance of $\hat{S}(t)$ is

$\hat{\sigma }^2\left(\hat{S}(t) \right) = \left(\hat{S}(t)\right)^2 \sum _{j:t_ j \leq t} \frac{d^ w_{j}/Y^ w_{j}}{M_{j} (1-d^ w_{j}/Y^ w_{j})}$

where

$M_{j} = \frac{\left(\sum _{i:T_ i \geq t_ j} w_ i \right)^2}{ \sum _{i:T_ i \geq t_ j} w_ i^2}$
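The weighted estimate and its variance can be sketched in Python as follows. The function name `akme` is hypothetical, and returning NaN for the variance when all weighted subjects at risk fail (so that $1-d^ w_{j}/Y^ w_{j}=0$ ) is a choice made for this sketch rather than documented behavior.

```python
import math

def akme(T, delta, w):
    """Illustrative sketch: return (t_j, S_hat(t_j), variance) for each
    distinct event time t_j."""
    event_times = sorted({t for t, dl in zip(T, delta) if dl == 1})
    result = []
    s, acc = 1.0, 0.0
    for tj in event_times:
        risk_w = [wi for ti, wi in zip(T, w) if ti >= tj]
        Yw = sum(risk_w)                                  # weighted at risk
        dw = sum(wi for ti, dl, wi in zip(T, delta, w)
                 if dl == 1 and ti == tj)                 # weighted events
        M = Yw ** 2 / sum(wi ** 2 for wi in risk_w)
        ratio = dw / Yw
        s *= 1.0 - ratio
        # Sketch choice: the variance term is undefined when ratio == 1.
        acc = acc + ratio / (M * (1.0 - ratio)) if ratio < 1.0 else math.nan
        result.append((tj, s, s * s * acc))
    return result

# Made-up data: events at times 1 and 3, censoring at times 2 and 4.
print(akme([1, 2, 3, 4], [1, 0, 1, 0], [2.0, 1.0, 1.0, 2.0]))
```

With unit weights, $M_ j$ equals $Y_ j$ and the variance reduces to the square of Greenwood's formula, which is a useful check on an implementation.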