The PHREG Procedure

Specifics for Bayesian Analysis

To request a Bayesian analysis, specify the BAYES statement in addition to the PROC PHREG statement and the MODEL statement. Include a CLASS statement if you have effects that involve categorical variables. Include a FREQ or WEIGHT statement if the input data contain a frequency or weight variable, respectively. The STRATA statement can be used to carry out a stratified analysis for the Cox model, but it is not allowed in the piecewise constant baseline hazard model. Likewise, programming statements can be used to create time-dependent covariates for the Cox model, but they are not allowed in the piecewise constant baseline hazard model. For both models, however, you can use the counting process style of input to accommodate time-dependent covariates that do not change continuously with time. The HAZARDRATIO statement enables you to request a hazard ratio analysis based on the posterior samples. The ASSESS, CONTRAST, ID, OUTPUT, and TEST statements are ignored if specified. Also ignored are the COVM and COVS options in the PROC PHREG statement and the following options in the MODEL statement: BEST=, CORRB, COVB, DETAILS, HIERARCHY=, INCLUDE=, MAXSTEP=, NOFIT, PLCONV=, SELECTION=, SEQUENTIAL, SLENTRY=, and SLSTAY=.

Piecewise Constant Baseline Hazard Model

Single Failure Time Variable

Let $\text{[math]}$ be the observed data. Let $\text{[math]}$ be a partition of the time axis.

Hazards in Original Scale

The hazard function for subject $\text{[math]}$ is

$\text{[math]}$

where

$\text{[math]}$

The baseline cumulative hazard function is

$\text{[math]}$

where

$\text{[math]}$
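The baseline cumulative hazard can be sketched in Python as follows. This is a minimal illustration, not PROC PHREG code. It assumes the usual piecewise exponential notation: cut points $0 = a_0 < a_1 < \cdots < a_J = \infty$, with a constant hazard $\lambda_j$ on $[a_{j-1}, a_j)$. Under that assumption, $H_0(t)$ is the sum of $\lambda_j$ times the time spent in interval $j$. The function names are hypothetical.

```python
import math

def interval_exposure(t, lower, upper):
    """Time spent in [lower, upper) by a subject observed up to time t."""
    if t < lower:
        return 0.0
    return min(t, upper) - lower

def baseline_cum_hazard(t, cuts, hazards):
    """Assumed form H0(t) = sum_j lambda_j * Delta_j(t).

    cuts    -- interior cut points a_1 < ... < a_{J-1} (a_0 = 0, a_J = inf)
    hazards -- constant hazards lambda_1, ..., lambda_J
    """
    bounds = [0.0] + list(cuts) + [math.inf]
    return sum(
        lam * interval_exposure(t, bounds[j], bounds[j + 1])
        for j, lam in enumerate(hazards)
    )
```

For example, `baseline_cum_hazard(2.5, [1.0, 2.0], [0.1, 0.2, 0.3])` returns approximately 0.1 + 0.2 + 0.3 × 0.5 = 0.45.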

The log likelihood is given by

	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$

where $\text{[math]}$ .
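A direct transcription of the log likelihood for a single covariate $x$ may help. This sketch assumes the standard piecewise exponential form

$l(\lambda,\beta) = \sum_j d_j \log\lambda_j + \sum_i \delta_i \beta x_i - \sum_j \lambda_j \sum_i \Delta_j(t_i) e^{\beta x_i}$

In this form, $d_j$ is the number of events in interval $j$ and $\Delta_j(t_i)$ is the time subject $i$ spends in that interval. These symbols are assumptions, because the displayed formula is not reproduced here. The function names are hypothetical.

```python
import math

def exposure(t, lower, upper):
    # Delta_j(t): time spent in [lower, upper) by a subject with exit time t
    return 0.0 if t < lower else min(t, upper) - lower

def pw_exp_loglik(hazards, beta, cuts, times, events, x):
    """Assumed piecewise exponential log likelihood, single covariate x."""
    bounds = [0.0] + list(cuts) + [math.inf]
    ll = 0.0
    for j, lam in enumerate(hazards):
        lo, hi = bounds[j], bounds[j + 1]
        # d_j: number of events whose time falls in [lo, hi)
        d_j = sum(1 for t, d in zip(times, events) if d and lo <= t < hi)
        # sum_i Delta_j(t_i) exp(beta x_i)
        risk = sum(exposure(t, lo, hi) * math.exp(beta * xi)
                   for t, xi in zip(times, x))
        ll += (d_j * math.log(lam) if d_j else 0.0) - lam * risk
    # sum_i delta_i beta x_i
    ll += sum(beta * xi for d, xi in zip(events, x) if d)
    return ll
```

With a single interval (no cut points), this reduces to the ordinary exponential regression log likelihood.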

Note that for $\text{[math]}$ , the full conditional for $\text{[math]}$ is log-concave only when $\text{[math]}$ , but the full conditionals for the $\text{[math]}$ ’s are always log-concave.

For a given $\text{[math]}$ , $\text{[math]}$ gives

$\text{[math]}$

Substituting these values into $\text{[math]}$ gives the profile log likelihood for $\text{[math]}$

$\text{[math]}$

where $\text{[math]}$ . Since the constant $\text{[math]}$ does not depend on $\text{[math]}$ , it can be discarded from $\text{[math]}$ in the optimization.
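Under the standard piecewise exponential log likelihood, setting $\partial l / \partial \lambda_j = 0$ for fixed $\beta$ gives a closed form:

$\tilde\lambda_j(\beta) = d_j / \sum_i \Delta_j(t_i) e^{\beta x_i}$

That this matches the elided formula above is an assumption. The sketch below computes it for a single covariate; the function names are hypothetical. An interval with no events yields $\tilde\lambda_j = 0$.

```python
import math

def exposure(t, lower, upper):
    # Delta_j(t): time spent in [lower, upper) by a subject with exit time t
    return 0.0 if t < lower else min(t, upper) - lower

def profile_hazards(beta, cuts, times, events, x):
    """Assumed closed form lambda_j(beta) = d_j / sum_i Delta_j(t_i) exp(beta x_i)."""
    bounds = [0.0] + list(cuts) + [math.inf]
    out = []
    for j in range(len(bounds) - 1):
        lo, hi = bounds[j], bounds[j + 1]
        d_j = sum(1 for t, d in zip(times, events) if d and lo <= t < hi)
        risk = sum(exposure(t, lo, hi) * math.exp(beta * xi)
                   for t, xi in zip(times, x))
        # guard against intervals that no subject reaches (then d_j is 0 too)
        out.append(d_j / risk if risk > 0 else 0.0)
    return out
```

Evaluating these hazards at the maximizer of the profile log likelihood gives the hazard MLEs.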

The MLE $\text{[math]}$ of $\text{[math]}$ is obtained by maximizing

$\text{[math]}$

with respect to $\text{[math]}$ , and the MLE $\text{[math]}$ of $\text{[math]}$ is given by

$\text{[math]}$

For $\text{[math]}$ , let

	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$

The partial derivatives of $\text{[math]}$ are

	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
	$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
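For a scalar covariate, the profile log likelihood can be maximized by Newton-Raphson. The sketch assumes the standard derivatives of the piecewise exponential profile likelihood. Define $S_j^{(r)}(\beta) = \sum_i \Delta_j(t_i) e^{\beta x_i} x_i^r$ and $E_j = S_j^{(1)}/S_j^{(0)}$. Then the score is $\sum_i \delta_i x_i - \sum_j d_j E_j$ and the observed information is $\sum_j d_j (S_j^{(2)}/S_j^{(0)} - E_j^2)$.

These names and formulas are assumptions, since the displayed equations are not reproduced here. The sketch does not describe PROC PHREG's own optimizer.

```python
import math

def exposure(t, lower, upper):
    # Delta_j(t): time spent in [lower, upper) by a subject with exit time t
    return 0.0 if t < lower else min(t, upper) - lower

def profile_newton(cuts, times, events, x, beta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson on the (assumed) profile log likelihood, scalar beta."""
    bounds = [0.0] + list(cuts) + [math.inf]
    for _ in range(max_iter):
        score = sum(xi for d, xi in zip(events, x) if d)  # sum_i delta_i x_i
        info = 0.0
        for j in range(len(bounds) - 1):
            lo, hi = bounds[j], bounds[j + 1]
            d_j = sum(1 for t, d in zip(times, events) if d and lo <= t < hi)
            if d_j == 0:
                continue  # intervals without events do not contribute
            w = [exposure(t, lo, hi) * math.exp(beta * xi)
                 for t, xi in zip(times, x)]
            s0 = sum(w)
            s1 = sum(wi * xi for wi, xi in zip(w, x))
            s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
            e_j = s1 / s0
            score -= d_j * e_j
            info += d_j * (s2 / s0 - e_j * e_j)
        step = score / info  # assumes info > 0 (x varies within risk sets)
        beta += step
        if abs(step) < tol:
            break
    return beta
```

Consider one interval where the $x = 0$ group has events at times 1, 2, and 3 and the $x = 1$ group has events at times 1 and 1. The group event rates are 3/6 and 2/2, so the iteration converges to $\log 2$.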

The asymptotic covariance matrix for $\text{[math]}$ is obtained as the inverse of the information matrix given by

$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$

See Example 6.5.1 in Lawless (2003) for details.

Hazards in Log Scale

By letting

$\text{[math]}$

you can build a prior correlation among the $\text{[math]}$ ’s by using a correlated prior $\text{[math]}$ , where $\text{[math]}$ .

The log likelihood is given by

$\text{[math]}$

Then the MLE of $\text{[math]}$ is given by

$\text{[math]}$

Note that the full conditionals for $\text{[math]}$ ’s and $\text{[math]}$ ’s are always log-concave.

The asymptotic covariance matrix for $\text{[math]}$ is obtained as the inverse of the information matrix formed by

$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$

Counting Process Style of Input

Let $\text{[math]}$ be the observed data. Let $\text{[math]}$ be a partition of the time axis, where $\text{[math]}$ for all $\text{[math]}$ .

Replacing $\text{[math]}$ with

$\text{[math]}$

the formulation for the single failure time variable applies.
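In counting process form, a record contributes exposure only over its risk interval $(s_i, t_i]$. A minimal sketch follows. It assumes that the replacement for $\Delta_j(t_i)$ is the length of the overlap between $(s_i, t_i]$ and $[a_{j-1}, a_j)$; this is an assumption, because the displayed formula is not reproduced. The function names are hypothetical.

```python
import math

def interval_overlap(start, stop, lower, upper):
    """Length of the overlap between (start, stop] and [lower, upper)."""
    return max(0.0, min(stop, upper) - max(start, lower))

def subject_exposures(start, stop, cuts):
    """Exposure of one (start, stop] record in every interval of the partition."""
    bounds = [0.0] + list(cuts) + [math.inf]
    return [interval_overlap(start, stop, bounds[j], bounds[j + 1])
            for j in range(len(bounds) - 1)]
```

The exposures of a record always sum to `stop - start`. Setting `start` to 0 recovers the single failure time case.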

Priors for Model Parameters

For a Cox model, the model parameters are the regression coefficients. For a piecewise exponential model, the model parameters consist of the regression coefficients and the hazards or log-hazards. The priors for the hazards and the priors for the regression coefficients are assumed to be independent. For the log-hazards, however, you can also specify a joint multivariate normal prior for the log-hazards and the regression coefficients.

Hazard Parameters

Let $\text{[math]}$ be the constant baseline hazards.

Improper Prior

The joint prior density is given by

$\text{[math]}$

This prior is improper (nonintegrable), but the posterior distribution is proper as long as there is at least one event time in each of the constant hazard intervals.

Uniform Prior

The joint prior density is given by

$\text{[math]}$

This prior is improper (nonintegrable), but the posterior distribution is proper as long as there is at least one event time in each of the constant hazard intervals.

Gamma Prior

The gamma distribution $\text{[math]}$ has a pdf

$\text{[math]}$

where $\text{[math]}$ is the shape parameter and $\text{[math]}$ is the scale parameter. The mean is $\text{[math]}$ and the variance is $\text{[math]}$ .
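The gamma prior can be evaluated and sampled with Python's standard library. The sketch assumes the rate parameterization $f(u) = b(bu)^{a-1}e^{-bu}/\Gamma(a)$, so the mean is $a/b$, the variance is $a/b^2$, and $1/b$ plays the role of the scale. Whether this matches the elided pdf above is an assumption. Note that `random.gammavariate(alpha, beta)` takes a scale, so the rate is inverted before sampling.

```python
import math
import random

def gamma_logpdf(u, a, b):
    """log of b (b u)^(a-1) exp(-b u) / Gamma(a), for u > 0 (shape a, rate b)."""
    return math.log(b) + (a - 1.0) * math.log(b * u) - b * u - math.lgamma(a)

def gamma_draws(a, b, n, seed=0):
    rng = random.Random(seed)
    # random.gammavariate expects (shape, scale), so pass scale = 1 / rate
    return [rng.gammavariate(a, 1.0 / b) for _ in range(n)]

draws = gamma_draws(2.0, 4.0, 100_000, seed=42)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(mean, var)  # close to a/b = 0.5 and a/b**2 = 0.125
```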

Independent Gamma Prior

Suppose for $\text{[math]}$ , $\text{[math]}$ has an independent $\text{[math]}$ prior. The joint prior density is given by

$\text{[math]}$

AR1 Prior

$\text{[math]}$ are correlated as follows:

$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$	$\text{[math]}$

The joint prior density is given by

$\text{[math]}$
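One way to build such a chain is used here purely as a hypothetical illustration, because the displayed formulas are not reproduced. Let $\lambda_1 \sim G(a_1, b_1)$ and $\lambda_j \mid \lambda_{j-1} \sim G(a_j, b_j/\lambda_{j-1})$ in rate form, so that $\lambda_j$ has conditional mean $a_j\lambda_{j-1}/b_j$. The joint log density is then the sum of the conditional gamma log densities. The function names are hypothetical.

```python
import math
import random

def gamma_logpdf(u, a, b):
    # rate parameterization: b (b u)^(a-1) exp(-b u) / Gamma(a)
    return math.log(b) + (a - 1.0) * math.log(b * u) - b * u - math.lgamma(a)

def ar1_gamma_logprior(lams, shapes, rates):
    """log p(lambda_1, ..., lambda_J) for the hypothetical gamma AR(1) chain."""
    logp = gamma_logpdf(lams[0], shapes[0], rates[0])
    for j in range(1, len(lams)):
        # the rate of lambda_j depends on the previous hazard
        logp += gamma_logpdf(lams[j], shapes[j], rates[j] / lams[j - 1])
    return logp

def ar1_gamma_draw(shapes, rates, rng):
    """Draw lambda_1, ..., lambda_J sequentially from the same chain."""
    lams = [rng.gammavariate(shapes[0], 1.0 / rates[0])]  # (shape, scale)
    for j in range(1, len(shapes)):
        rate_j = rates[j] / lams[j - 1]
        lams.append(rng.gammavariate(shapes[j], 1.0 / rate_j))
    return lams
```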

Log-Hazard Parameters

Write $\text{[math]}$ .

Uniform Prior

The joint prior density is given by

$\text{[math]}$

Note that the uniform prior for the log-hazards is the same as the improper prior for the hazards.

Normal Prior

Assume $\text{[math]}$ has a multivariate normal prior with mean vector $\text{[math]}$ and covariance matrix $\text{[math]}$ . The joint prior density is given by

$\text{[math]}$

Regression Coefficients

Let $\text{[math]}$ be the vector of regression coefficients.

Uniform Prior

The joint prior density is given by

$\text{[math]}$

This prior is improper, but the posterior distributions for $\text{[math]}$ are proper.

Normal Prior

Assume $\text{[math]}$ has a multivariate normal prior with mean vector $\text{[math]}$ and covariance matrix $\text{[math]}$ . The joint prior density is given by

$\text{[math]}$

Joint Multivariate Normal Prior for Log-Hazards and Regression Coefficients

Assume $\text{[math]}$ has a multivariate normal prior with mean vector $\text{[math]}$ and covariance matrix $\text{[math]}$ . The joint prior density is given by

$\text{[math]}$
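The normal priors for the log-hazards, for the regression coefficients, and for the stacked vector of both share the same kernel, $-\tfrac12 (\theta-\mu)^\top \Sigma^{-1} (\theta-\mu)$, up to an additive constant. The dependency-free sketch below evaluates this kernel through a Cholesky factor instead of an explicit inverse. The function names are hypothetical, and the covariance matrix is assumed to be symmetric positive definite.

```python
import math

def cholesky(a):
    """Lower-triangular L with a = L L' (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def mvn_log_prior(x, mean, cov):
    """Unnormalized log density -0.5 (x - mean)' cov^{-1} (x - mean)."""
    L = cholesky(cov)
    r = [xi - mi for xi, mi in zip(x, mean)]
    # forward substitution solves L z = r, so z'z = r' cov^{-1} r
    z = []
    for i in range(len(r)):
        z.append((r[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return -0.5 * sum(zi * zi for zi in z)
```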

Zellner’s g-Prior

Assume $\text{[math]}$ has a multivariate normal prior with mean vector $\text{[math]}$ and covariance matrix $\text{[math]}$ , where $\text{[math]}$ is the design matrix and $\text{[math]}$ is either a constant or it follows a gamma prior with density $\text{[math]}$ where $\text{[math]}$ and $\text{[math]}$ are the SHAPE= and ISCALE= parameters. Let $\text{[math]}$ be the rank of $\text{[math]}$ . The joint prior density with g being a constant c is given by

$\text{[math]}$

The joint prior density with g having a gamma prior is given by

$\text{[math]}$
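The sketch assumes a zero prior mean (the mean vector is not shown above) and covariance $(gX^\top X)^{-1}$. The quadratic form $\beta^\top X^\top X \beta$ then equals $\lVert X\beta \rVert^2$, so neither $X^\top X$ nor its inverse is needed. The sketch also assumes full column rank and the rate form $g^{a-1}e^{-bg}$ for the gamma prior on $g$, and it drops additive constants. The function names are hypothetical.

```python
import math

def xb_sq_norm(X, beta):
    """beta' X'X beta computed as ||X beta||^2 (X is a list of rows)."""
    return sum(sum(xij * bj for xij, bj in zip(row, beta)) ** 2 for row in X)

def g_prior_logpdf_fixed(beta, X, c, k):
    """log p(beta) for constant g = c and rank(X) = k, up to a constant."""
    return 0.5 * k * math.log(c) - 0.5 * c * xb_sq_norm(X, beta)

def g_prior_logpdf_gamma(beta, g, X, k, shape, rate):
    """log p(beta, g) when g has a gamma(shape, rate) prior, up to a constant."""
    return ((shape - 1.0 + 0.5 * k) * math.log(g)
            - 0.5 * g * xb_sq_norm(X, beta) - rate * g)
```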

Posterior Distribution

Denote the observed data by $\text{[math]}$ .

Cox Model

$\text{[math]}$

where $\text{[math]}$ is the partial likelihood function with regression coefficients $\text{[math]}$ as parameters.

Piecewise Exponential Model

Hazard Parameters

$\text{[math]}$

where $\text{[math]}$ is the likelihood function with hazards $\text{[math]}$ and regression coefficients $\text{[math]}$ as parameters.

Log-Hazard Parameters

$\text{[math]}$

where $\text{[math]}$ is the likelihood function with log-hazards $\text{[math]}$ and regression coefficients $\text{[math]}$ as parameters.

Sampling from the Posterior Distribution

For the Gibbs sampler, PROC PHREG uses the ARMS (adaptive rejection Metropolis sampling) algorithm of Gilks, Best, and Tan (1995) to sample from the full conditionals. This is the default sampling scheme. Alternatively, you can request the random walk Metropolis (RWM) algorithm to sample an entire parameter vector from the posterior distribution. For a general discussion of these algorithms, see the section Markov Chain Monte Carlo Method.

You can output these posterior samples into a SAS data set by using the OUTPOST= option in the BAYES statement, or you can use the following SAS statement to output the posterior samples into the SAS data set Post:

 ods output PosteriorSample=Post;

The output data set also includes the variables LogLike and LogPost, which contain the log of the likelihood and the log of the posterior density, respectively.

Let $\text{[math]}$ be the parameter vector. For the Cox model, the $\text{[math]}$ ’s are the regression coefficients $\text{[math]}$ ’s, and for the piecewise constant baseline hazard model, the $\text{[math]}$ ’s consist of the baseline hazards $\text{[math]}$ ’s (or log baseline hazards $\text{[math]}$ ’s) and the regression coefficients $\text{[math]}$ ’s. Let $\text{[math]}$ be the likelihood function, where $\text{[math]}$ is the observed data. Note that for the Cox model, the likelihood contains the infinite-dimensional baseline hazard function, and the gamma process is perhaps the most commonly used prior process (Ibrahim, Chen, and Sinha 2001). However, Sinha, Ibrahim, and Chen (2003) justify using the partial likelihood as the likelihood function for the Bayesian analysis. Let $\text{[math]}$ be the prior distribution. The posterior $\text{[math]}$ is proportional to the joint distribution $\text{[math]}$ .

Gibbs Sampler

The full conditional distribution of $\text{[math]}$ is proportional to the joint distribution; that is,

$\text{[math]}$

For example, the one-dimensional conditional distribution of $\text{[math]}$ , given $\text{[math]}$ , is computed as

$\text{[math]}$

Suppose you have a set of arbitrary starting values $\text{[math]}$ . Using the ARMS algorithm, an iteration of the Gibbs sampler consists of the following:

draw $\text{[math]}$ from $\text{[math]}$
draw $\text{[math]}$ from $\text{[math]}$
$\text{[math]}$
draw $\text{[math]}$ from $\text{[math]}$

After one iteration, you have $\text{[math]}$ . After $\text{[math]}$ iterations, you have $\text{[math]}$ . Cumulatively, a chain of $\text{[math]}$ samples is obtained.
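The Gibbs sweep can be sketched as follows. PROC PHREG draws each full conditional with ARMS. This sketch substitutes univariate slice sampling (Neal's stepping-out and shrinkage procedure) because it is much shorter. It therefore illustrates the structure of the sweep rather than the sampler PROC PHREG uses. As in the text, each full conditional is evaluated through the joint log posterior with the other coordinates held fixed. All names are hypothetical.

```python
import math
import random

def slice_sample_1d(logf, x0, rng, width=1.0, max_steps=50):
    """One univariate slice-sampling update (stepping out, then shrinkage)."""
    log_y = logf(x0) + math.log(1.0 - rng.random())  # level under f(x0)
    left = x0 - width * rng.random()
    right = left + width
    # split the step budget at random between the two sides
    j = int(max_steps * rng.random())
    k = max_steps - 1 - j
    while j > 0 and logf(left) > log_y:
        left -= width
        j -= 1
    while k > 0 and logf(right) > log_y:
        right += width
        k -= 1
    while True:  # shrink the bracket toward x0 until a point in the slice is drawn
        x1 = rng.uniform(left, right)
        if logf(x1) > log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1

def gibbs(logpost, theta0, n_iter, seed=0):
    """Systematic-scan Gibbs sampler; each coordinate is updated in turn."""
    rng = random.Random(seed)
    theta = list(theta0)
    chain = []
    for _ in range(n_iter):
        for i in range(len(theta)):
            # full conditional of theta_i is proportional to the joint posterior
            def cond(v, i=i):
                return logpost(theta[:i] + [v] + theta[i + 1:])
            theta[i] = slice_sample_1d(cond, theta[i], rng)
        chain.append(list(theta))
    return chain
```

For example, running `gibbs` on a standard bivariate normal log density with correlation 0.5 produces a chain whose sample means settle near 0.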

Random Walk Metropolis Algorithm

PROC PHREG uses a multivariate normal proposal distribution $\text{[math]}$ centered at $\text{[math]}$ . With an initial parameter vector $\text{[math]}$ , a new sample $\text{[math]}$ is obtained as follows:

sample $\text{[math]}$ from $\text{[math]}$
calculate the quantity $\text{[math]}$
sample $\text{[math]}$ from the uniform distribution $\text{[math]}$
set $\text{[math]}$ if $\text{[math]}$ ; otherwise set $\text{[math]}$

With $\text{[math]}$ taking the role of $\text{[math]}$ , the previous steps are repeated to generate the next sample $\text{[math]}$ . After $\text{[math]}$ iterations, a chain of $\text{[math]}$ samples $\text{[math]}$ is obtained.
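The RWM steps above can be sketched compactly as follows. The sketch assumes a proposal that adds independent normal noise with a common standard deviation to each coordinate. This is a special case of the multivariate normal proposal; PROC PHREG's choice of proposal covariance is not reproduced here. Because the proposal is symmetric, the acceptance ratio involves only the posterior, and the comparison is done on the log scale to avoid overflow. The names are hypothetical.

```python
import math
import random

def rwm(logpost, theta0, n_iter, scale=1.0, seed=0):
    """Random walk Metropolis with proposal theta + N(0, scale^2 I)."""
    rng = random.Random(seed)
    theta = list(theta0)
    current = logpost(theta)
    chain, accepted = [], 0
    for _ in range(n_iter):
        # sample theta* from the normal proposal centered at the current value
        proposal = [v + rng.gauss(0.0, scale) for v in theta]
        cand = logpost(proposal)
        # accept when u < r, with r = pi(theta*|D) / pi(theta|D), on the log scale
        if math.log(1.0 - rng.random()) < cand - current:
            theta, current = proposal, cand
            accepted += 1
        chain.append(list(theta))
    return chain, accepted / n_iter
```

The returned acceptance rate is a quick check on the proposal scale. A very high rate usually means the steps are too small, and a very low rate means they are too large.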

Starting Values of the Markov Chains

When the BAYES statement is specified, PROC PHREG generates one Markov chain that contains the approximate posterior samples of the model parameters. Additional chains are produced when the Gelman-Rubin diagnostics are requested. Starting values (initial values) can be specified in the INITIAL= data set in the BAYES statement. If the INITIAL= option is not specified, PROC PHREG picks its own initial values for the chains based on the maximum likelihood estimates of $\text{[math]}$ and the prior information of $\text{[math]}$ .

Denote $\text{[math]}$ as the integer part of $\text{[math]}$ .

Constant Baseline Hazards $\text{[math]}$ ’s

For the first chain, on which the summary statistics and diagnostics are based, the initial values are

$\text{[math]}$

For subsequent chains, the starting values are picked in two different ways according to the total number of chains specified. If the total number of chains specified is less than or equal to 10, initial values of the $\text{[math]}$ th chain ( $\text{[math]}$ ) are given by

$\text{[math]}$

with the plus sign for odd $\text{[math]}$ and minus sign for even $\text{[math]}$ . If the total number of chains is greater than 10, initial values are picked at random over a wide range of values. Let $\text{[math]}$ be a uniform random number between 0 and 1; the initial value for $\text{[math]}$ is given by

$\text{[math]}$

Regression Coefficients and Log-Hazard Parameters $\text{[math]}$ ’s

The $\text{[math]}$ ’s are the regression coefficients $\text{[math]}$ ’s and, in the piecewise exponential model, also include the log-hazard parameters $\text{[math]}$ ’s. For the first chain, on which the summary statistics and regression diagnostics are based, the initial values are

$\text{[math]}$

If the number of chains requested is less than or equal to 10, initial values for the $\text{[math]}$ th chain ( $\text{[math]}$ ) are given by

$\text{[math]}$

with the plus sign for odd $\text{[math]}$ and minus sign for even $\text{[math]}$ . When there are more than 10 chains, the initial value for the $\text{[math]}$ is picked at random over the range $\text{[math]}$ ; that is,

$\text{[math]}$

where $\text{[math]}$ is a uniform random number between 0 and 1.

Fit Statistics

Denote the observed data by $\text{[math]}$ . Let $\text{[math]}$ be the vector of parameters of length $\text{[math]}$ . Let $\text{[math]}$ be the likelihood. The deviance information criterion (DIC) proposed in Spiegelhalter et al. (2002) is a Bayesian model assessment tool. Let Dev $\text{[math]}$ . Let $\text{[math]}$ and $\text{[math]}$ be the corresponding posterior means of $\text{[math]}$ and $\text{[math]}$ , respectively. The deviance information criterion is computed as

$\text{[math]}$

Also computed is

$\text{[math]}$

where $\text{[math]}$ is interpreted as the effective number of parameters.
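Given posterior draws and a log-likelihood function, the DIC quantities can be computed as shown below. The sketch assumes the usual definitions of Spiegelhalter et al. (2002), with no standardizing term:

- $\mathrm{Dev}(\theta) = -2\log L(D\mid\theta)$
- $p_D = \overline{\mathrm{Dev}} - \mathrm{Dev}(\bar\theta)$
- $\mathrm{DIC} = \overline{\mathrm{Dev}} + p_D$

The function name is hypothetical.

```python
def dic(draws, loglik):
    """Return (DIC, pD, mean deviance, deviance at the posterior mean)."""
    n = len(draws)
    # posterior mean of Dev(theta) = -2 log L(D | theta)
    dev_bar = sum(-2.0 * loglik(theta) for theta in draws) / n
    # posterior mean of the parameter vector, coordinate by coordinate
    theta_bar = [sum(col) / n for col in zip(*draws)]
    dev_at_mean = -2.0 * loglik(theta_bar)
    p_d = dev_bar - dev_at_mean  # effective number of parameters
    return dev_bar + p_d, p_d, dev_bar, dev_at_mean
```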

Note that the $\text{[math]}$ defined here does not include the standardizing term used in the section Deviance Information Criterion (DIC). Nevertheless, the DIC calculated here is still useful for variable selection.

Posterior Distribution for Quantities of Interest

Let $\text{[math]}$ be the parameter vector. For the Cox model, the $\text{[math]}$ ’s are the regression coefficients $\text{[math]}$ ’s; for the piecewise constant baseline hazard model, the $\text{[math]}$ ’s consist of the baseline hazards $\text{[math]}$ ’s (or log baseline hazards $\text{[math]}$ ’s) and the regression coefficients $\text{[math]}$ ’s. Let $\text{[math]}$ be the chain that represents the posterior distribution for $\text{[math]}$ .

Consider a quantity of interest $\text{[math]}$ that can be expressed as a function $\text{[math]}$ of the parameter vector $\text{[math]}$ . You can construct the posterior distribution of $\text{[math]}$ by evaluating the function $\text{[math]}$ for each $\text{[math]}$ in $\text{[math]}$ . The posterior chain for $\text{[math]}$ is $\text{[math]}$ . Summary statistics such as the mean, standard deviation, percentiles, and credible intervals are used to describe the posterior distribution of $\text{[math]}$ .

Hazard Ratio

As shown in the section Hazard Ratios, a log-hazard ratio is a linear combination of the regression coefficients. Let $\text{[math]}$ be the vector of linear coefficients. The posterior sample for this hazard ratio is the set $\text{[math]}$ .
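Because the transformation is applied draw by draw, posterior summaries of a hazard ratio need nothing beyond the stored chain. The sketch assumes the hazard ratio is $\exp(h^\top\beta)$ for a vector $h$ of linear coefficients. It uses an order-statistic approximation to the equal-tailed credible interval and requires at least two draws. The names are hypothetical.

```python
import math

def hazard_ratio_sample(beta_draws, h):
    """exp(h' beta) evaluated at every posterior draw of beta."""
    return [math.exp(sum(hk * bk for hk, bk in zip(h, b))) for b in beta_draws]

def summarize(sample, level=0.95):
    """Mean, standard deviation, and an approximate equal-tailed interval."""
    s = sorted(sample)
    n = len(s)
    mean = sum(s) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in s) / (n - 1))
    lower = s[math.floor((1.0 - level) / 2.0 * (n - 1))]
    upper = s[math.ceil((1.0 + level) / 2.0 * (n - 1))]
    return mean, sd, (lower, upper)
```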

Survival Distribution

Let $\text{[math]}$ be a covariate vector of interest.

Cox Model

Let $\text{[math]}$ be the observed data. Define

$\text{[math]}$

Consider the $\text{[math]}$ th draw $\text{[math]}$ of $\text{[math]}$ . The baseline cumulative hazard function at time $\text{[math]}$ is given by

$\text{[math]}$

For the given covariate vector $\text{[math]}$ , the cumulative hazard function at time $\text{[math]}$ is

$\text{[math]}$

and the survival function at time $\text{[math]}$ is

$\text{[math]}$
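As a minimal sketch, assume right-censored data, a single covariate, and a Breslow-type baseline cumulative hazard for each draw $\beta$:

$\hat H_0(t\mid\beta) = \sum_{i:\, t_i \le t,\ \delta_i = 1} 1 / \sum_{l:\, t_l \ge t_i} e^{\beta x_l}$

This form is an assumption, because the definitions above are not reproduced. Applying it to every posterior draw yields a posterior sample of $S(t\mid z)$. The names are hypothetical.

```python
import math

def breslow_cum_hazard(t, beta, times, events, x):
    """Breslow-type H0(t | beta) for a single covariate x."""
    total = 0.0
    for ti, di in zip(times, events):
        if di and ti <= t:
            # risk set at ti: subjects still under observation (t_l >= ti)
            denom = sum(math.exp(beta * xl)
                        for tl, xl in zip(times, x) if tl >= ti)
            total += 1.0 / denom
    return total

def survival_draws(t, z, beta_draws, times, events, x):
    """Posterior sample of S(t | z) = exp(-H0(t | beta) exp(beta z))."""
    return [math.exp(-breslow_cum_hazard(t, b, times, events, x) * math.exp(b * z))
            for b in beta_draws]
```

With $\beta = 0$, the baseline cumulative hazard reduces to the Nelson-Aalen estimate.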

Piecewise Exponential Model

Let $\text{[math]}$ be a partition of the time axis. Consider the $\text{[math]}$ th draw $\text{[math]}$ in $\text{[math]}$ , where $\text{[math]}$ consists of $\text{[math]}$ and $\text{[math]}$ . The baseline cumulative hazard function at time $\text{[math]}$ is

$\text{[math]}$

where

$\text{[math]}$

For the given covariate vector $\text{[math]}$ , the cumulative hazard function at time $\text{[math]}$ is

$\text{[math]}$

and the survival function at time $\text{[math]}$ is

$\text{[math]}$
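For the piecewise exponential model, each posterior draw gives the survival curve in closed form. This sketch assumes $S(t\mid z) = \exp\{-\sum_j \lambda_j \Delta_j(t)\, e^{\beta z}\}$ with a single covariate and a constant hazard $\lambda_j$ on $[a_{j-1}, a_j)$. Draws on the log-hazard scale would be exponentiated before being passed in. The function name is hypothetical.

```python
import math

def piecewise_survival_draws(t, z, draws, cuts):
    """Posterior sample of S(t | z) for a piecewise exponential model.

    draws -- list of (hazards, beta) pairs, one per posterior draw
    cuts  -- interior cut points a_1 < ... < a_{J-1}
    """
    bounds = [0.0] + list(cuts) + [math.inf]
    out = []
    for hazards, beta in draws:
        # baseline cumulative hazard: lambda_j times time spent in interval j
        h0 = sum(lam * max(0.0, min(t, bounds[j + 1]) - bounds[j])
                 for j, lam in enumerate(hazards))
        out.append(math.exp(-h0 * math.exp(beta * z)))
    return out
```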
