The LIFEREG Procedure
Probability plots are useful tools for the display and analysis of lifetime data. A probability plot uses an inverse distribution scale so that a cumulative distribution function (CDF) plots as a straight line. If the data come from the assumed distribution, a nonparametric estimate of the CDF of the lifetime data plots approximately as a straight line, which provides a visual assessment of goodness of fit.
You can use the PROBPLOT statement in PROC LIFEREG to create probability plots of data that are complete, right-censored, interval-censored, or a combination of censoring types (arbitrarily censored). The plot also includes a line representing the maximum likelihood fit from the MODEL statement and pointwise parametric confidence bands for the cumulative probabilities.
A random variable $Y$ belongs to a location-scale family of distributions if its CDF $F$ is of the form

$$F(y) = \Pr\{Y \le y\} = G\left(\frac{y-\mu}{\sigma}\right)$$

where $\mu$ is the location parameter and $\sigma$ is the scale parameter. Here, $G$ is a CDF that cannot depend on any unknown parameters, and $G$ is the CDF of $Y$ if $\mu = 0$ and $\sigma = 1$.
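For example, if $Y$ is a normal random variable with mean $\mu$ and standard deviation $\sigma$,

$$G(u) = \Phi(u) = \int_{-\infty}^{u} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{w^2}{2}\right) dw$$

and

$$F(y) = \Phi\left(\frac{y-\mu}{\sigma}\right)$$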
The normal, extreme value,
and logistic distributions are location-scale models. The 3-parameter
gamma distribution is a location-scale model if the shape parameter
is fixed.
If T has a lognormal, Weibull, or log-logistic distribution, then
log(T) has a distribution that is a location-scale model.
Probability plots are constructed for lognormal, Weibull,
and log-logistic distributions by using log(T) instead of T
in the plots.
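The log-transform statement can be checked numerically for the Weibull case. The sketch below assumes the common Weibull parameterization $F(t) = 1 - \exp[-(t/\eta)^\beta]$. The names `eta` and `beta` are illustrative. It compares that CDF with the extreme value location-scale form $G((\log t - \mu)/\sigma)$, where $G(u) = 1 - \exp(-e^{u})$, $\mu = \log \eta$, and $\sigma = 1/\beta$.

```python
import math


def weibull_cdf(t, eta, beta):
    """Weibull CDF with scale eta and shape beta (assumed parameterization)."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def extreme_value_G(u):
    """Standard extreme value CDF: G(u) = 1 - exp(-exp(u))."""
    return 1.0 - math.exp(-math.exp(u))


def weibull_cdf_via_log(t, eta, beta):
    """The same probability, computed as a location-scale model in log(t)."""
    mu = math.log(eta)   # location parameter of log(T)
    sigma = 1.0 / beta   # scale parameter of log(T)
    return extreme_value_G((math.log(t) - mu) / sigma)
```

For any positive `t`, the two functions agree up to rounding. This is why a Weibull probability plot can be drawn as an extreme value plot of log(T).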
Let $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$ be ordered observations of a random sample with distribution function $F(y)$.
A probability plot is a plot of the points $y_{(i)}$ against $m_i = G^{-1}(a_i)$, where $a_i = \hat{F}(y_{(i)})$ is an estimate of the CDF $F(y_{(i)}) = G\left(\frac{y_{(i)}-\mu}{\sigma}\right)$. The nonparametric CDF estimates $a_i$ are sometimes called plotting positions. The axis on which the points $m_i$ are plotted is usually labeled with a probability scale (the scale of $a_i$).
If F is one of the location-scale distributions, then y is the lifetime; otherwise, the log of the lifetime is used to transform the distribution to a location-scale model.
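If the data actually have the stated distribution, then $\hat{F} \approx F$ and

$$m_i = G^{-1}\left(\hat{F}(y_{(i)})\right) \approx G^{-1}\left(G\left(\frac{y_{(i)}-\mu}{\sigma}\right)\right) = \frac{y_{(i)}-\mu}{\sigma}$$

so the points $(y_{(i)}, m_i)$ should fall approximately on a straight line with slope $1/\sigma$ and intercept $-\mu/\sigma$.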
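The straight-line property can be illustrated for a complete normal sample. The sketch assumes the plotting position $a_i = (i - 0.5)/n$, which is one common choice for complete samples. It uses Python's `statistics.NormalDist` for $G^{-1}$, and the helper names are illustrative. Regressing $y_{(i)}$ on $m_i$ inverts the line above, so the fitted intercept estimates $\mu$ and the slope estimates $\sigma$.

```python
from statistics import NormalDist


def normal_probability_points(sample):
    """Return (y_(i), m_i) pairs with m_i = G^{-1}(a_i) and a_i = (i - 0.5)/n."""
    ys = sorted(sample)
    n = len(ys)
    G_inv = NormalDist().inv_cdf  # standard normal quantile function
    return [(y, G_inv((i - 0.5) / n)) for i, y in enumerate(ys, start=1)]


def fit_line(points):
    """Least-squares fit of y = intercept + slope * m; returns (mu, sigma) estimates."""
    n = len(points)
    mbar = sum(m for _, m in points) / n
    ybar = sum(y for y, _ in points) / n
    sxy = sum((m - mbar) * (y - ybar) for y, m in points)
    sxx = sum((m - mbar) ** 2 for _, m in points)
    slope = sxy / sxx
    return ybar - slope * mbar, slope
```

For real normal data the points scatter around the line. Systematic curvature suggests that the assumed $G$ is wrong.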
There are several ways to compute the nonparametric CDF estimates used in probability plots from lifetime data. These are discussed in the next two sections.
Let $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$ be the ordered observations of a random sample that includes both failure times and censoring times. Label all the data with reverse ranks $r_i$, with $r_1 = n, \ldots, r_n = 1$.
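For the lifetime (not censoring time) corresponding to reverse rank $r_i$, compute the survival function estimate

$$S_i = \left[\frac{r_i}{r_i + 1}\right] S_{i-1}$$

with $S_0 = 1$. The expected rank plotting position is then $a_i = 1 - S_i$.

For the Kaplan-Meier method,

$$S_i = \left[\frac{r_i - 1}{r_i}\right] S_{i-1}$$

and the Kaplan-Meier plotting position is $a'_i = 1 - S_i$.

For the modified Kaplan-Meier method, use

$$S'_i = \frac{S_i + S_{i-1}}{2}$$

where $S_i$ is computed from the Kaplan-Meier formula with $S_0 = 1$. The modified Kaplan-Meier plotting position is $a''_i = 1 - S'_i$.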
For complete samples, $a_i = i/(n+1)$ for the expected rank method, $a'_i = i/n$ for the Kaplan-Meier method, and $a''_i = (i - 0.5)/n$ for the modified Kaplan-Meier method. For the Kaplan-Meier estimator, if the largest observation is a failure, then $F_n = 1$. Because $G^{-1}(1)$ is infinite, that point is not plotted.
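The three recursions can be sketched for right-censored data as follows. The sketch assumes expected rank $S_i = [r_i/(r_i+1)]S_{i-1}$ and Kaplan-Meier $S_i = [(r_i-1)/r_i]S_{i-1}$, both with $S_0 = 1$, and modified Kaplan-Meier $S'_i = (S_i + S_{i-1})/2$. The function name and the tie-breaking rule (failures before censored values at equal times) are illustrative assumptions.

```python
def plotting_positions(times, failed):
    """Expected rank, Kaplan-Meier, and modified Kaplan-Meier positions.

    times:  lifetimes and censoring times
    failed: True for a failure, False for a right-censored time
    Returns (time, a_exprank, a_km, a_mkm) for each failure.
    """
    # Sort by time; at tied times failures come before censored values
    # (an assumed convention for this sketch).
    data = sorted(zip(times, failed), key=lambda d: (d[0], not d[1]))
    n = len(data)
    s_exp = 1.0  # expected rank survival estimate, S_0 = 1
    s_km = 1.0   # Kaplan-Meier survival estimate, S_0 = 1
    out = []
    for idx, (t, is_failure) in enumerate(data):
        r = n - idx  # reverse rank: n for the smallest value, 1 for the largest
        if not is_failure:
            continue  # censored values are not plotted; they only lower later ranks
        s_exp *= r / (r + 1)
        s_km_prev = s_km
        s_km *= (r - 1) / r
        s_mkm = (s_km + s_km_prev) / 2
        out.append((t, 1 - s_exp, 1 - s_km, 1 - s_mkm))
    return out
```

For a complete sample, the last Kaplan-Meier position returned is 1. A plotting routine would skip that point.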
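For the median rank method, compute the failure order number $j_i$ and the plotting position $a_i$ for the lifetime (not censoring time) corresponding to reverse rank $r_i$:

$$j_i = j_{i-1} + \frac{n + 1 - j_{i-1}}{1 + r_i}, \qquad a_i = \frac{j_i - 0.3}{n + 0.4}$$

with $j_0 = 0$.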
For complete samples, the failure order number $j_i$ is equal to $i$, the order of the failure in the sample. In this case, the preceding equation for $a_i$ is an approximation to the median plotting position, computed as the median of the $i$th order statistic from the uniform distribution on (0, 1). In the censored case, $j_i$ is not necessarily an integer, but the preceding equation still provides an approximation to the median plotting position. The PPOS=MEDRANK option specifies the median rank plotting position.
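The median rank positions can be sketched the same way. This sketch assumes the standard adjusted-rank recursion $j_i = j_{i-1} + (n + 1 - j_{i-1})/(1 + r_i)$ with $j_0 = 0$, together with Benard's approximation $a_i = (j_i - 0.3)/(n + 0.4)$. These are taken as assumptions rather than as the procedure's internal code, and the function name is hypothetical.

```python
def median_rank_positions(times, failed):
    """Median rank plotting positions via failure order numbers (a sketch).

    Assumes j_i = j_{i-1} + (n + 1 - j_{i-1}) / (1 + r_i) with j_0 = 0
    and a_i = (j_i - 0.3) / (n + 0.4).
    Returns (time, j_i, a_i) for each failure.
    """
    # Failures are placed before censored values at tied times (assumption).
    data = sorted(zip(times, failed), key=lambda d: (d[0], not d[1]))
    n = len(data)
    j = 0.0  # failure order number, j_0 = 0
    out = []
    for idx, (t, is_failure) in enumerate(data):
        r = n - idx  # reverse rank
        if not is_failure:
            continue  # censoring changes later ranks, not j itself
        j += (n + 1 - j) / (1 + r)
        out.append((t, j, (j - 0.3) / (n + 0.4)))
    return out
```

With no censoring the increments are all 1, so $j_i = i$. A censored value makes later increments exceed 1, which is how $j_i$ becomes non-integer.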
For arbitrarily censored data, the nonparametric estimate of the CDF is the Turnbull estimate, which assigns probabilities to a set of intervals and is computed iteratively. If an interval probability is smaller than a tolerance ($10^{-6}$ by default) after convergence, the probability is set to zero, the interval probabilities are renormalized so that they sum to one, and the iterations are restarted. Usually the algorithm converges in just a few more iterations. You can change the default tolerance with the TOLPROB= option. You can specify the NOPOLISH option to avoid setting small probabilities to zero and restarting the algorithm.
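The polishing step can be sketched inside a basic Turnbull self-consistency (EM) iteration. This is a simplified illustration, not the PROC LIFEREG implementation. The candidate intervals are assumed to be supplied by the caller, and an observation is assumed to cover a candidate when the candidate lies inside it. The parameter names `tol_prob` and `polish` are hypothetical; they only mirror the roles of the TOLPROB= and NOPOLISH options.

```python
def turnbull_em(obs, candidates, tol_prob=1e-6, polish=True,
                eps=1e-10, max_iter=100000):
    """Interval probabilities for arbitrarily censored data (simplified sketch).

    obs:        list of (left, right) observation intervals
    candidates: list of (lo, hi) candidate intervals, assumed precomputed;
                every observation must contain at least one candidate
    """
    n, m = len(obs), len(candidates)
    # cover[i][j] is 1.0 when candidate j lies inside observation i
    cover = [[1.0 if L <= lo and hi <= R else 0.0 for lo, hi in candidates]
             for L, R in obs]
    p = [1.0 / m] * m
    while True:
        for _ in range(max_iter):
            new = [0.0] * m
            for i in range(n):
                denom = sum(cover[i][j] * p[j] for j in range(m))
                for j in range(m):
                    # expected share of observation i assigned to candidate j
                    new[j] += cover[i][j] * p[j] / (denom * n)
            change = max(abs(a - b) for a, b in zip(new, p))
            p = new
            if change < eps:
                break
        small = [j for j in range(m) if 0.0 < p[j] < tol_prob]
        if not polish or not small:
            return p
        # Polish: zero the small probabilities, renormalize, and restart
        for j in small:
            p[j] = 0.0
        total = sum(p)
        p = [x / total for x in p]
```

Without polishing, a probability that should be zero only decays toward zero. Polishing removes it exactly, after which the restarted iterations converge almost immediately.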
If you specify the ITPRINTEM option, a table summarizing the Turnbull estimate of the interval probabilities is displayed. The columns labeled "Reduced Gradient" and "Lagrange Multiplier" are used in checking final convergence of the maximum likelihood estimate. The Lagrange multipliers must all be greater than or equal to zero, or the solution is not maximum likelihood. Refer to Gentleman and Geyer (1994) for more details of the convergence checking. Also refer to Meeker and Escobar (1998, chap. 3) for more information.
See Example 6.3 for an illustration.
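Pointwise confidence intervals for the nonparametric CDF estimate $\hat{F}$ are computed as

$$[F_L,\; F_U] = \left[\frac{\hat{F}}{\hat{F} + (1-\hat{F})w},\; \frac{\hat{F}}{\hat{F} + (1-\hat{F})/w}\right]$$

where

$$w = \exp\left[\frac{z_{1-\alpha/2}\, se_{\hat{F}}}{\hat{F}(1-\hat{F})}\right]$$

Here $se_{\hat{F}}$ is the standard error of $\hat{F}$, and $z_p$ is the $p$th quantile of the standard normal distribution. The limits are a normal-theory interval for $\log[\hat{F}/(1-\hat{F})]$, with delta-method standard error $se_{\hat{F}}/[\hat{F}(1-\hat{F})]$, transformed back to the probability scale. As a result, they always lie between 0 and 1.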
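A direct transcription of these limits is sketched below. It assumes that $\hat{F}$ and its standard error $se_{\hat{F}}$ have already been computed, since the standard error formula is not covered here. The function names are illustrative, and `statistics.NormalDist` supplies $z_{1-\alpha/2}$.

```python
import math
from statistics import NormalDist


def logit_limits(F_hat, se, crit):
    """[F_L, F_U] = [F/(F + (1-F)w), F/(F + (1-F)/w)],
    with w = exp(crit * se / (F(1-F))); requires 0 < F_hat < 1."""
    w = math.exp(crit * se / (F_hat * (1.0 - F_hat)))
    lower = F_hat / (F_hat + (1.0 - F_hat) * w)
    upper = F_hat / (F_hat + (1.0 - F_hat) / w)
    return lower, upper


def pointwise_interval(F_hat, se, alpha=0.05):
    """Pointwise limits: crit is the 1 - alpha/2 standard normal quantile."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return logit_limits(F_hat, se, z)
```

For example, `pointwise_interval(0.02, 0.05)` gives a lower limit that is small but positive. A symmetric interval $\hat{F} \pm z\,se_{\hat{F}}$ would go negative in that case.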
Simultaneous confidence bands for the CDF (Nair 1984) are computed as

$$[F_L,\; F_U] = \left[\frac{\hat{F}}{\hat{F} + (1-\hat{F})w},\; \frac{\hat{F}}{\hat{F} + (1-\hat{F})/w}\right]$$

with

$$w = \exp\left[\frac{e_{a,b,1-\alpha/2}\, se_{\hat{F}}}{\hat{F}(1-\hat{F})}\right]$$
where the factor $x = e_{a,b,1-\alpha/2}$ is the solution of

$$x \exp(-x^2/2)\, \log\left[\frac{(1-a)b}{(1-b)a}\right] \Big/ \sqrt{8\pi} = \alpha/2$$
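The factor $e_{a,b,1-\alpha/2}$ has no closed form, but the equation can be solved numerically. This minimal bisection sketch assumes the root greater than 1 is the one wanted. The left side rises up to $x = 1$ and falls after it. For example, $a = 0.1$, $b = 0.9$, $\alpha = 0.05$ gives a value near 3.06, wider than the pointwise $z_{0.975} \approx 1.96$. The function name `nair_factor` is hypothetical.

```python
import math


def nair_factor(a, b, alpha=0.05):
    """Solve x*exp(-x^2/2)*log[(1-a)b/((1-b)a)]/sqrt(8*pi) = alpha/2 for x > 1.

    The left side decreases for x > 1, so bisection on [1, 40] finds the
    root above 1 (assumed to be the one wanted).
    """
    if not 0.0 < a < b < 1.0:
        raise ValueError("need 0 < a < b < 1")
    c = math.log((1.0 - a) * b / ((1.0 - b) * a)) / math.sqrt(8.0 * math.pi)

    def f(x):
        return x * math.exp(-x * x / 2.0) * c - alpha / 2.0

    lo, hi = 1.0, 40.0
    if f(lo) <= 0.0:
        raise ValueError("no root above 1 for these a, b, alpha")
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(mid) > 0.0:
            lo = mid  # still above alpha/2: root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Widening $(a, b)$ increases the logarithm and therefore the factor. This matches the intuition that a band valid over a longer time interval must be wider.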
The time interval $(t_a, t_b)$ over which the bands are valid depends in a complicated way on the constants $a$ and $b$ defined in Nair (1984), where $0 < a < b < 1$. By default, $a$ and $b$ are chosen so that the confidence bands are valid between the lowest and highest times corresponding to failures for multiply censored data. For arbitrarily censored data, they are valid between the lowest and highest intervals for which probabilities are computed. You can specify $a$ and $b$ directly with the NPINTERVALS=SIMULTANEOUS(a, b) option in the PROBPLOT statement.
Copyright © 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.