Introduction to Survival Analysis Procedures

Survival Analysis with SAS/STAT Procedures

The typical goal in survival analysis is to characterize the distribution of the survival time for a given population, to compare the survival distributions among different groups, or to study the relationship between the survival time and some concomitant variables.

A first step in analyzing a set of survival data is to use the LIFETEST or ICLIFETEST procedure to compute and plot the estimate of the distribution of the survival time. In many applications, you have several survival curves to compare. For example, you might want to compare the survival experiences of patients who receive different treatments for their disease. You can investigate the relationship between covariates and the survival time by computing estimates of the survival distribution function within strata that are defined by the covariates. In particular, if the proportional hazards model is appropriate, the estimates of log(-log(SURVIVAL)) plotted against log(TIME) should give approximately parallel lines, where SURVIVAL is the survival distribution estimate and TIME is the failure time variable. In addition, these lines should be approximately straight if the Weibull model is appropriate.
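
As a minimal sketch of this first step, the following statements request product-limit survival estimates by group together with the plot of log(-log(SURVIVAL)) against log(TIME). The sketch assumes that the Sashelp.BMT sample data set (variables T, Status, and Group, with Status=0 marking censored times) is available in your installation; substitute your own data set and variable names otherwise.

```sas
ods graphics on;

/* Assumption: Sashelp.BMT is installed.                      */
/* T = time, Status = 0 for censored times, Group = stratum.  */
proc lifetest data=sashelp.bmt plots=(survival loglogs);
   time T*Status(0);   /* 0 identifies right-censored observations   */
   strata Group;       /* one estimated survival curve per group     */
run;
```

The parallel-lines check follows from the model itself: under proportional hazards, S(t|x) = S0(t)^exp(x'b), so log(-log S(t|x)) = x'b + log(-log S0(t)), and the covariates only shift the curve vertically. Under a Weibull baseline, log(-log S0(t)) is additionally linear in log(t).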

You can use knowledge of the association between failure time and the concomitant variables to select covariates for further investigation. The LIFETEST procedure computes linear rank statistics by using Wilcoxon or log-rank scores. These statistics and their estimated covariance matrix can be used with the REG procedure and the METHOD=RSQUARE option to find the subset of variables that produce the largest joint test statistic for association. An illustration of this methodology is given in Example 58.1 of Chapter 58: The LIFETEST Procedure.
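
The rank tests of association can be requested with the TEST statement of PROC LIFETEST. The following sketch uses a small made-up data set purely for illustration; the data set name work.toy, the covariates Age and Dose, and all values are hypothetical. It shows only the rank tests, not the subsequent subset selection with PROC REG.

```sas
/* Made-up illustrative data (not from any study).            */
/* T = time, Status = 0 if censored; Age and Dose are         */
/* hypothetical numeric covariates.                           */
data work.toy;
   input T Status Age Dose;
   datalines;
 5 1 62 10
 8 1 55 20
12 0 48 20
14 1 67 10
20 1 51 30
23 0 45 30
27 1 59 10
31 1 43 20
36 0 40 30
42 1 50 30
45 0 38 20
50 1 47 10
;
run;

proc lifetest data=work.toy;
   time T*Status(0);
   test Age Dose;   /* rank tests of association with the survival time */
run;
```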

Another approach to examining the relationship between survival time and the concomitant variables is through a regression model in which the survival time has a distribution that depends on the concomitant variables. The regression coefficients can be interpreted as describing the direction and strength of the effect of each explanatory variable on the survival time.

In many biological systems, the Cox model might be a reasonable description of the relationship between the distribution of the survival time and the prognostic factors. You use PROC PHREG to fit the Cox regression model. The regression coefficient is interpreted as the increase in the log-hazard ratio that results from an increase of one unit in the covariate. However, the underlying hazard function is left unspecified, and, as in any other model, the results can be misleading if the proportional hazards assumption does not hold.
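
A Cox fit along these lines might look like the following sketch, which again assumes that the Sashelp.BMT sample data set is available. Here the only covariate is the classification variable Group, so each coefficient is a log-hazard ratio relative to the reference group rather than a per-unit effect. The ASSESS statement is included as one possible check of the proportional hazards assumption; the seed value is arbitrary.

```sas
ods graphics on;

/* Assumption: Sashelp.BMT is installed (T, Status, Group). */
proc phreg data=sashelp.bmt;
   class Group / param=ref;            /* reference-cell coding                 */
   model T*Status(0) = Group / ties=efron;
   hazardratio Group;                  /* hazard ratios with confidence limits  */
   assess ph / resample seed=1;        /* check of proportional hazards         */
run;
```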

When you have interval-censored data, it is difficult to fit the semiparametric Cox regression model. However, you can use the ICPHREG procedure to fit a different proportional hazards model, and the regression coefficients of such a model can still be interpreted as log-hazard ratios.
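
The following sketch shows the general shape of an ICPHREG call. The data set work.toy_ic, the variables Left, Right, and Trt, and all values are made up for illustration; with so few observations the fit itself is not meaningful. Each event time is known only to lie between Left and Right, and a missing Right value marks a right-censored subject.

```sas
/* Made-up illustrative data (not from any study).            */
/* Left, Right = interval endpoints; missing Right means      */
/* right-censored. Trt is a hypothetical treatment indicator. */
data work.toy_ic;
   input Left Right Trt;
   datalines;
 0  5 1
 2  7 1
 3  9 1
 4  . 1
 6 11 1
 8 14 1
 5 12 0
 7 15 0
 9  . 0
10 18 0
12 20 0
15  . 0
;
run;

proc icphreg data=work.toy_ic;
   model (Left, Right) = Trt;   /* default baseline specification */
run;
```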

Accelerated failure time models are popular for fitting survival data from physical systems. In many cases, the underlying survival distribution is known empirically. You use PROC LIFEREG to fit these parametric models. Also, PROC LIFEREG can accommodate data that contain left-censored or interval-censored observations, which PROC PHREG does not allow.
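
To illustrate how PROC LIFEREG accepts mixed censoring types, the following sketch fits a Weibull accelerated failure time model by using the two-variable response syntax. The data set work.toy_aft, the covariate Load, and all values are hypothetical. A missing Lower value marks a left-censored observation, a missing Upper value marks a right-censored one, equal values denote an exact failure time, and Lower less than Upper denotes an interval-censored time.

```sas
/* Made-up illustrative data (not from any experiment).       */
/* Lower/Upper encode the censoring type as described above;  */
/* Load is a hypothetical stress covariate.                   */
data work.toy_aft;
   input Lower Upper Load;
   datalines;
 .  4 30
 3  3 30
 5  5 25
 6  9 25
 7  7 25
 8  8 20
10 14 20
11 11 20
12 12 15
15  . 15
18 18 10
20  . 10
;
run;

proc lifereg data=work.toy_aft;
   model (Lower, Upper) = Load / dist=weibull;
run;
```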

A common technique for checking the validity of a regression model is to embed it in a larger model and use the likelihood ratio test to check whether the reduction to the actual model is valid. Other techniques include examining the residuals. Both PROC LIFEREG and PROC PHREG produce predicted values, residuals, and other computed values that can be used to assess the model’s adequacy.
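
As a sketch of both techniques, the following statements fit a Weibull model and a larger generalized gamma model (which contains the Weibull as the special case in which the gamma shape parameter equals 1), and save residuals from PROC LIFEREG and PROC PHREG for further inspection. The sketch assumes that the Sashelp.BMT sample data set is available; the output data set names and output variable names are arbitrary choices.

```sas
/* Assumption: Sashelp.BMT is installed (T, Status, Group). */

/* Smaller model: Weibull, with residuals saved */
proc lifereg data=sashelp.bmt;
   class Group;
   model T*Status(0) = Group / dist=weibull;
   output out=work.aft_out xbeta=xb cdf=cdf sres=sres cres=cres;
run;

/* Larger model: generalized gamma */
proc lifereg data=sashelp.bmt;
   class Group;
   model T*Status(0) = Group / dist=gamma;
run;

/* Cox model with martingale and deviance residuals saved */
proc phreg data=sashelp.bmt;
   class Group;
   model T*Status(0) = Group;
   output out=work.cox_out xbeta=xb resmart=mart resdev=dev;
run;
```

Twice the difference between the maximized log likelihoods of the two LIFEREG fits can be compared with a chi-square distribution with one degree of freedom; a small value supports the reduction to the Weibull model.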