The PHREG Procedure

Specifics for Bayesian Analysis

To request a Bayesian analysis, you specify the new BAYES statement in addition to the PROC PHREG statement and the MODEL statement. You include a CLASS statement if you have effects that involve categorical variables. The FREQ or WEIGHT statement can be included if you have a frequency or weight variable, respectively, in the input data. The STRATA statement can be used to carry out a stratified analysis for the Cox model, but it is not allowed in the piecewise constant baseline hazard model. Programming statements can be used to create time-dependent covariates for the Cox model, but they are not allowed in the piecewise constant baseline hazard model. However, for both the Cox model and the piecewise constant baseline hazard model, you can use the counting process style of input to accommodate time-dependent covariates that do not change continuously with time. The HAZARDRATIO statement enables you to request a hazard ratio analysis based on the posterior samples. The ASSESS, CONTRAST, ID, OUTPUT, and TEST statements, if specified, are ignored. Also ignored are the COVM and COVS options in the PROC PHREG statement and the following options in the MODEL statement: BEST=, CORRB, COVB, DETAILS, HIERARCHY=, INCLUDE=, MAXSTEP=, NOFIT, PLCONV=, SELECTION=, SEQUENTIAL, SLENTRY=, and SLSTAY=.

Piecewise Constant Baseline Hazard Model

Single Failure Time Variable

Let $\{ (t_ i,\mb {x}_ i,\delta _ i), i=1,2,\ldots ,n\} $ be the observed data. Let $a_0=0 < a_1 < \ldots < a_{J-1} <a_ J=\infty $ be a partition of the time axis.

Hazards in Original Scale

The hazard function for subject i is

\[  h(t|\mb {x}_ i;\btheta )= h_0(t) \exp (\bbeta ’\mb {x}_ i)  \]

where

\[  h_0(t)= \lambda _ j ~ ~ \mr {if}~ a_{j-1} \le t < a_ j, j=1,\ldots ,J  \]

The baseline cumulative hazard function is

\[  H_0(t)= \sum _{j=1}^ J \lambda _ j \Delta _ j(t)  \]

where

$\displaystyle  \Delta _ j(t) = \left\{  \begin{array}{ll} 0 &  t<a_{j-1} \\ t-a_{j-1} &  a_{j-1}\le t < a_ j \\ a_ j - a_{j-1} &  t \ge a_ j \end{array} \right.  $

The log likelihood is given by

\begin{equation*}\begin{split}
l(\blambda ,\bbeta ) &= \sum _{i=1}^ n \delta _ i \biggl [ \sum _{j=1}^ J I(a_{j-1} \le t_ i <a_ j) \log \lambda _ j + \bbeta ’\mb {x}_ i \biggr ] - \sum _{i=1}^ n \biggl [ \sum _{j=1}^ J \Delta _ j(t_ i)\lambda _ j \biggr ] \exp (\bbeta ’\mb {x}_ i) \\
&= \sum _{j=1}^ J d_ j \log \lambda _ j + \sum _{i=1}^ n \delta _ i \bbeta ’\mb {x}_ i -\sum _{j=1}^ J \lambda _ j \biggl [ \sum _{i=1}^ n \Delta _ j(t_ i)\exp (\bbeta ’\mb {x}_ i) \biggr ]
\end{split}\end{equation*}

where $d_ j = \sum _{i=1}^ n \delta _ i I(a_{j-1} \le t_ i <a_ j)$.
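
As a concrete check of this formula, the exposure time $\Delta _ j(t)$ and the log likelihood can be evaluated directly. The following is a minimal Python sketch with made-up function names, not PROC PHREG code:

```python
import math

def exposure(j, t, a):
    # Delta_j(t): time that t spends in the j-th interval [a[j-1], a[j]); a[0] = 0
    return max(0.0, min(t, a[j]) - a[j - 1])

def log_likelihood(lam, beta, times, x, events, a):
    """l(lambda, beta) for the piecewise constant baseline hazard model."""
    J = len(lam)
    ll = 0.0
    for t_i, x_i, d_i in zip(times, x, events):
        eta = sum(b * v for b, v in zip(beta, x_i))   # beta' x_i
        if d_i == 1:
            # the interval j with a[j-1] <= t_i < a[j] contributes log(lambda_j)
            j = next(k for k in range(1, J + 1) if a[k - 1] <= t_i < a[k])
            ll += math.log(lam[j - 1]) + eta
        # cumulative hazard term: sum_j lambda_j Delta_j(t_i) * exp(beta' x_i)
        ll -= sum(lam[j - 1] * exposure(j, t_i, a) for j in range(1, J + 1)) * math.exp(eta)
    return ll
```

For a single subject with an event at t = 1.5, no covariates, $\lambda = (0.5, 1)$, and one cut point at 1, the log likelihood is $\log 1 - (0.5 \cdot 1 + 1 \cdot 0.5) = -1$.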

Note that for $1\leq j \leq J$, the full conditional for $\lambda _ j$ is log-concave only when $d_ j>0$, but the full conditionals for the $\beta $’s are always log-concave.

For a given $\bbeta $, $\frac{\partial l}{\partial \blambda } = 0$ gives

\[  \tilde{\lambda }_ j(\bbeta )= \frac{d_ j}{\sum _{i=1}^ n\Delta _ j(t_ i) \exp (\bbeta ’\mb {x}_ i)}, ~ ~  j=1,\ldots ,J  \]

Substituting these values into $l(\blambda ,\bbeta )$ gives the profile log likelihood for $\bbeta $

\[  l_ p(\bbeta )= \sum _{i=1}^ n \delta _ i\bbeta ’\mb {x}_ i - \sum _{j=1}^ J d_ j \log \biggl [\sum _{l=1}^ n \Delta _ j(t_ l) \exp (\bbeta ’\mb {x}_ l) \biggr ] + c  \]

where $c=\sum _ j (d_ j \log d_ j - d_ j)$. Since the constant c does not depend on $\bbeta $, it can be discarded from $l_ p(\bbeta )$ in the optimization.
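
Given $\bbeta $, the closed-form $\tilde{\lambda }_ j(\bbeta )$ is a simple ratio of the event count to the weighted exposure. A small sketch (hypothetical names, not PROC PHREG code):

```python
import math

def profile_lambda(beta, times, x, events, a):
    """lambda_tilde_j(beta) = d_j / sum_i Delta_j(t_i) exp(beta' x_i)."""
    J = len(a) - 1
    lam = []
    for j in range(1, J + 1):
        # d_j: number of events falling in [a[j-1], a[j])
        d_j = sum(1 for t, d in zip(times, events) if d == 1 and a[j - 1] <= t < a[j])
        denom = sum(max(0.0, min(t, a[j]) - a[j - 1])          # Delta_j(t_i)
                    * math.exp(sum(b * v for b, v in zip(beta, xi)))
                    for t, xi in zip(times, x))
        lam.append(d_j / denom)
    return lam
```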

The MLE $\hat{\bbeta }$ of $\bbeta $ is obtained by maximizing

\[  l_ p(\bbeta )= \sum _{i=1}^ n \delta _ i\bbeta ’\mb {x}_ i - \sum _{j=1}^ J d_ j \log \biggl [\sum _{l=1}^ n \Delta _ j(t_ l) \exp (\bbeta ’\mb {x}_ l) \biggr ]  \]

with respect to $\bbeta $, and the MLE $\hat{\blambda }$ of $\blambda $ is given by

\[ \hat{\blambda }=\tilde{\blambda }(\hat{\bbeta })  \]

For $j=1,\ldots ,J$, let

\begin{equation*}\begin{split}
\bS _ j^{(r)}(\bbeta ) &= \sum _{l=1}^ n \Delta _ j(t_ l)\mr {e}^{\bbeta ’\mb {x}_ l} \mb {x}_ l^{\otimes r}, \mbox{~ ~ } r=0,1,2 \\
\bE _ j(\bbeta ) &= \frac{\bS _ j^{(1)}(\bbeta )}{S_ j^{(0)}(\bbeta )}
\end{split}\end{equation*}

The partial derivatives of $l_ p(\bbeta )$ are

\begin{equation*}\begin{split}
\frac{\partial l_ p(\bbeta )}{\partial \bbeta } &= \sum _{i=1}^ n \delta _ i \mb {x}_ i - \sum _{j=1}^ J d_ j \bE _ j(\bbeta ) \\
- \frac{\partial ^2 l_ p(\bbeta )}{\partial \bbeta ^2} &= \sum _{j=1}^ J d_ j \biggl \{  \frac{\bS _ j^{(2)}(\bbeta )}{S_ j^{(0)}(\bbeta )} - \bigl [\bE _ j(\bbeta )\bigr ] \bigl [ \bE _ j(\bbeta )\bigr ]’ \biggr \}
\end{split}\end{equation*}

The asymptotic covariance matrix for $(\hat{\blambda },\hat{\bbeta })$ is obtained as the inverse of the information matrix given by

\begin{equation*}\begin{split}
-\frac{\partial ^2l(\hat{\blambda },\hat{\bbeta })}{\partial \blambda ^2} &= \mc {D} \biggl (\frac{d_1}{\hat{\lambda }_1^2},\ldots ,\frac{d_ J}{\hat{\lambda }_ J^2} \biggr ) \\
-\frac{\partial ^2l(\hat{\blambda },\hat{\bbeta })}{\partial \bbeta ^2} &= \sum _{j=1}^ J \hat{\lambda }_ j \bS _ j^{(2)}(\hat{\bbeta }) \\
-\frac{\partial ^2l(\hat{\blambda },\hat{\bbeta })}{\partial \blambda \partial \bbeta } &= (\bS ^{(1)}_1(\hat{\bbeta }), \ldots , \bS ^{(1)}_ J(\hat{\bbeta }))
\end{split}\end{equation*}

See Example 6.5.1 in Lawless (2003) for details.

Hazards in Log Scale

By letting

\[  \alpha _ j=\log (\lambda _ j), \mbox{~ ~ } j=1,\ldots ,J  \]

you can build a prior correlation among the $\lambda _ j$’s by using a correlated prior $\balpha \sim N(\balpha _0,\Sigma _\alpha )$, where $\balpha =(\alpha _1,\ldots ,\alpha _ J)’$.

The log likelihood is given by

\[  l(\balpha ,\bbeta ) = \sum _{j=1}^ J d_ j \alpha _ j + \sum _{i=1}^ n \delta _ i \bbeta ’\mb {x}_ i -\sum _{j=1}^ J \mr {e}^{\alpha _ j} S_ j^{(0)}(\bbeta )  \]

Then the MLE of $\lambda _ j$ is given by

\[  \mr {e}^{\hat{\alpha }_ j} = \hat{\lambda }_ j = \frac{d_ j}{S_ j^{(0)}(\hat{\bbeta })}  \]

Note that the full conditionals for $\alpha $’s and $\beta $’s are always log-concave.

The asymptotic covariance matrix for $(\hat{\balpha },\hat{\bbeta })$ is obtained as the inverse of the information matrix formed by

\begin{equation*}\begin{split}
-\frac{\partial ^2l(\hat{\balpha },\hat{\bbeta })}{\partial \balpha ^2} &= \mc {D} \bigl (\mr {e}^{\hat{\alpha }_1}S_1^{(0)}(\hat{\bbeta }),\ldots , \mr {e}^{\hat{\alpha }_ J}S_ J^{(0)}(\hat{\bbeta }) \bigr ) \\
-\frac{\partial ^2l(\hat{\balpha },\hat{\bbeta })}{\partial \bbeta ^2} &= \sum _{j=1}^ J \mr {e}^{\hat{\alpha }_ j} \bS _ j^{(2)}(\hat{\bbeta }) \\
-\frac{\partial ^2l(\hat{\balpha },\hat{\bbeta })}{\partial \balpha \partial \bbeta } &= (\mr {e}^{\hat{\alpha }_1}\bS ^{(1)}_1(\hat{\bbeta }), \ldots , \mr {e}^{\hat{\alpha }_ J}\bS ^{(1)}_ J(\hat{\bbeta }))
\end{split}\end{equation*}

Counting Process Style of Input

Let $\{ ((s_ i,t_ i],\mb {x}_ i,\delta _ i), i=1,2,\ldots ,n\} $ be the observed data. Let $a_0=0 < a_1 < \ldots <a_ k$ be a partition of the time axis, where $a_ k > t_ i$ for all $i=1,2,\ldots ,n$.

Replacing $\Delta _ j(t_ i)$ with

$\displaystyle  \Delta _ j((s_ i,t_ i]) = \left\{  \begin{array}{ll} 0 &  t_ i<a_{j-1} \vee s_ i> a_ j \\ t_ i-\max (s_ i,a_{j-1}) &  a_{j-1}\le t_ i < a_ j \\ a_ j - \max (s_ i,a_{j-1}) &  t_ i \ge a_ j \end{array} \right.  $

the formulation for the single failure time variable applies.
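
The three cases in the definition of $\Delta _ j((s_ i,t_ i])$ collapse to a single interval-overlap expression; a one-line sketch (hypothetical name), of which the single-failure-time $\Delta _ j(t)$ is the special case $s = 0$:

```python
def exposure_cp(j, s, t, a):
    # Delta_j((s, t]): length of the overlap of (s, t] with [a[j-1], a[j])
    return max(0.0, min(t, a[j]) - max(s, a[j - 1]))
```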

Priors for Model Parameters

For a Cox model, the model parameters are the regression coefficients. For a piecewise exponential model, the model parameters consist of the regression coefficients and the hazards or log-hazards. The priors for the hazards and the priors for the regression coefficients are assumed to be independent, while you can have a joint multivariate normal prior for the log-hazards and the regression coefficients.

Hazard Parameters

Let $\lambda _1,\ldots ,\lambda _ J$ be the constant baseline hazards.

Improper Prior

The joint prior density is given by

\[  p(\lambda _1,\ldots ,\lambda _ J) = \prod _{j=1}^ J \frac{1}{\lambda _ j}, \forall \lambda _ j > 0  \]

This prior is improper (nonintegrable), but the posterior distribution is proper as long as there is at least one event time in each of the constant hazard intervals.

Uniform Prior

The joint prior density is given by

\[  p(\lambda _1,\ldots ,\lambda _ J) \propto 1, \forall \lambda _ j > 0  \]

This prior is improper (nonintegrable), but the posterior distribution is proper as long as there is at least one event time in each of the constant hazard intervals.

Gamma Prior

The gamma distribution $G(a,b)$ has a PDF

\[  f_{a,b}(t) = \frac{b (b t)^{a-1}\mr {e}^{-b t}}{\Gamma (a)}, t>0  \]

where a is the shape parameter and $b^{-1}$ is the scale parameter. The mean is $\frac{a}{b}$ and the variance is $\frac{a}{b^2}$.

Independent Gamma Prior

Suppose for $j=1,\ldots ,J$, $\lambda _ j$ has an independent $G(a_ j,b_ j)$ prior. The joint prior density is given by

\[  p(\lambda _1,\ldots ,\lambda _ J) \propto \prod _{j=1}^ J \biggl \{  \lambda _ j^{a_ j-1} \mr {e}^{-b_ j \lambda _ j} \biggr \} , \forall \lambda _ j > 0  \]

AR1 Prior

$\lambda _1, \ldots , \lambda _ J$ are correlated as follows:

\begin{equation*}\begin{split}
\lambda _1 &\sim G(a_1,b_1) \\
\lambda _2 &\sim G \biggl (a_2, \frac{b_2}{\lambda _1} \biggr ) \\
&~ ~ \vdots \\
\lambda _ J &\sim G \biggl (a_{J}, \frac{b_{J}}{\lambda _{J-1}} \biggr )
\end{split}\end{equation*}

The joint prior density is given by

\[  p(\lambda _1,\ldots ,\lambda _ J) \propto \lambda _1^{a_1 -1} \mr {e}^{-b_1 \lambda _1} \prod _{j=2}^ J \biggl (\frac{b_ j}{\lambda _{j-1}} \biggr )^{a_ j} \lambda _ j^{a_ j-1} \mr {e}^{- \frac{b_ j}{\lambda _{j-1}} \lambda _ j}  \]
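
On the log scale this joint density is a short running product; a minimal sketch (hypothetical name, constant terms in $b_1^{a_1}$ and $\Gamma (a_ j)$ dropped):

```python
import math

def ar1_log_prior(lam, a, b):
    """Log of the AR1 gamma prior density above, up to an additive constant."""
    lp = (a[0] - 1.0) * math.log(lam[0]) - b[0] * lam[0]   # lambda_1 ~ G(a_1, b_1)
    for j in range(1, len(lam)):
        rate = b[j] / lam[j - 1]                           # lambda_j ~ G(a_j, b_j / lambda_{j-1})
        lp += a[j] * math.log(rate) + (a[j] - 1.0) * math.log(lam[j]) - rate * lam[j]
    return lp
```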
Log-Hazard Parameters

Write $\balpha = (\alpha _1, \ldots , \alpha _ J)’ \equiv (\log \lambda _1, \ldots , \log \lambda _ J)’$.

Uniform Prior

The joint prior density is given by

\[  p(\alpha _1,\ldots ,\alpha _ J) \propto 1, \forall -\infty <\alpha _ i<\infty  \]

Note that the uniform prior for the log-hazards is the same as the improper prior for the hazards.

Normal Prior

Assume $\balpha $ has a multivariate normal prior with mean vector $\balpha _0$ and covariance matrix $\bPsi _0$. The joint prior density is given by

\[  p(\balpha ) \propto \mr {e}^{-\frac{1}{2} (\balpha -\balpha _0)’\bPsi _0^{-1}(\balpha -\balpha _0)}  \]

Regression Coefficients

Let $\bbeta =(\beta _1, \ldots , \beta _ k)’$ be the vector of regression coefficients.

Uniform Prior

The joint prior density is given by

\[  p(\beta _1,\ldots ,\beta _ k) \propto 1, \forall -\infty <\beta _ i<\infty  \]

This prior is improper, but the posterior distributions for $\bbeta $ are proper.

Normal Prior

Assume $\bbeta $ has a multivariate normal prior with mean vector $\bbeta _0$ and covariance matrix $\bSigma _0$. The joint prior density is given by

\[  p(\bbeta ) \propto \mr {e}^{-\frac{1}{2} (\bbeta -\bbeta _0)’\bSigma _0^{-1}(\bbeta -\bbeta _0)}  \]

Joint Multivariate Normal Prior for Log-Hazards and Regression Coefficients

Assume $(\balpha ’,\bbeta ’)’$ has a multivariate normal prior with mean vector $(\balpha _0’,\bbeta _0’)’$ and covariance matrix $\bPhi _0$. The joint prior density is given by

\[  p(\balpha ,\bbeta ) \propto \mr {e}^{-\frac{1}{2} [(\balpha -\balpha _0)’,(\bbeta -\bbeta _0)’]\bPhi _0^{-1} [(\balpha -\balpha _0)’,(\bbeta -\bbeta _0)’]’}  \]

Zellner’s g-Prior

Assume $\bbeta $ has a multivariate normal prior with mean vector $\mb {0}$ and covariance matrix $(g\bX ^\prime \bX )^{-1}$, where $\bX $ is the design matrix and g is either a constant or follows a gamma prior with density $ f(\tau )=\frac{b (b\tau )^{a-1}{\mr {e}}^{-b\tau }}{\Gamma (a)} $, where a and b are the SHAPE= and ISCALE= parameters. Let k be the rank of $\bX $. The joint prior density with g being a constant c is given by

\[  p(\bbeta ) \propto c^{\frac{k}{2}} \mr {e}^{-\frac{1}{2} \bbeta ’(c\bX ^\prime \bX )\bbeta }  \]

The joint prior density with g having a gamma prior is given by

\[  p(\bbeta ,\tau ) \propto \tau ^{\frac{k}{2}} \mr {e}^{-\frac{1}{2} \bbeta ’(\tau \bX ^\prime \bX )\bbeta } \frac{b(b\tau )^{a-1}{\mr {e}}^{-b\tau }}{\Gamma (a)}  \]

Dispersion Parameter for Frailty Model

Improper Prior

The density is

\[  p(\theta )= \frac{1}{\theta }  \]

Inverse Gamma Prior

The inverse gamma distribution $IG(a,b)$ has a density

\[  p(\theta |a,b)=\frac{b^ a \theta ^{-(a+1)} \mr {e}^{-\frac{b}{\theta }}}{\Gamma (a)}  \]

where a and b are the SHAPE= and SCALE= parameters.

Gamma Prior

The gamma distribution $G(a,b)$ has a density

\[  p(\theta |a,b)= \frac{b^ a \theta ^{a-1}\mr {e}^{-b\theta }}{\Gamma (a)}  \]

where a and b are the SHAPE= and ISCALE= parameters.

Posterior Distribution

Denote the observed data by D.

Cox Model
\[  \pi (\bbeta |D) \propto \underbrace{L(D|\bbeta )}_{\mr {partial~ likelihood}} \overbrace{p(\bbeta )}^{\mr {prior}}  \]

Frailty Model

Based on the framework of Sargent (1998),

\[  \pi (\bbeta ,\bgamma ,\theta |D) \propto \underbrace{L(D|\bbeta ,\bgamma )}_{\mr {partial~ likelihood}} \overbrace{g(\bgamma |\theta )}^{\mr {random~ effects}} \underbrace{p(\bbeta ) p(\theta )}_{\mr {priors}}  \]

where the joint density of the random effects $\bgamma =(\gamma _1,\dots ,\gamma _ s)’$ is given by

\begin{equation*}  g(\bgamma |\theta ) \propto \left\{  \begin{array}{ll} \prod _ i \exp \left(\frac{\gamma _ i}{\theta }\right) \exp \left(-\exp \left(\frac{\gamma _ i}{\theta }\right)\right) &  \textrm{ gamma frailty} \\ \prod _ i \exp \left(- \frac{\gamma _ i^2}{2\theta }\right) &  \textrm{ lognormal frailty} \\ \end{array} \right. \end{equation*}

Piecewise Exponential Model

Hazard Parameters
\[  \pi (\blambda ,\bbeta |D) \propto L_ H(D|\blambda ,\bbeta ) p(\blambda ) p(\bbeta )  \]

where $L_{H}(D|\blambda ,\bbeta )$ is the likelihood function with hazards $\blambda $ and regression coefficients $\bbeta $ as parameters.

Log-Hazard Parameters
\begin{equation*} \begin{split}  \pi (\balpha ,\bbeta |D) \propto \left\{  \begin{array}{ll} L_{\mr {LH}}(D|\balpha ,\bbeta ) p(\balpha ,\bbeta ) &  \textrm{if} \quad (\balpha ’,\bbeta ’)’ \sim \textrm{MVN} \\ L_{\mr {LH}}(D|\balpha ,\bbeta ) p(\balpha ) p(\bbeta ) &  \textrm{otherwise} \end{array} \right. \end{split}\end{equation*}

where $L_{\mr {LH}}(D|\balpha ,\bbeta )$ is the likelihood function with log-hazards $\balpha $ and regression coefficients $\bbeta $ as parameters.

Sampling from the Posterior Distribution

For the Gibbs sampler, PROC PHREG uses the ARMS (adaptive rejection Metropolis sampling) algorithm of Gilks, Best, and Tan (1995) to sample from the full conditionals. This is the default sampling scheme. Alternatively, you can request the random walk Metropolis (RWM) algorithm to sample an entire parameter vector from the posterior distribution. For a general discussion of these algorithms, see the section Markov Chain Monte Carlo Method.

You can output these posterior samples into a SAS data set by using the OUTPOST= option in the BAYES statement, or you can use the following SAS statement to output the posterior samples into the SAS data set Post:

 ods output PosteriorSample=Post;

The output data set also includes the variables LogLike and LogPost, which represent the log of the likelihood and the log of the posterior density, respectively.

Let $\btheta = (\theta _1, \ldots , \theta _ k)’$ be the parameter vector. For the Cox model, the $\theta _ i$’s are the regression coefficients $\beta _ i$’s, and for the piecewise constant baseline hazard model, the $\theta _ i$’s consist of the baseline hazards $\lambda _ i$’s (or log baseline hazards $\alpha _ i$’s) and the regression coefficients $\beta _ j$’s. Let $L(D|\btheta )$ be the likelihood function, where D is the observed data. Note that for the Cox model, the likelihood contains the infinite-dimensional baseline hazard function, and the gamma process is perhaps the most commonly used prior process (Ibrahim, Chen, and Sinha, 2001). However, Sinha, Ibrahim, and Chen (2003) justify using the partial likelihood as the likelihood function for the Bayesian analysis. Let $p(\btheta )$ be the prior distribution. The posterior $\pi (\btheta |D)$ is proportional to the joint distribution $L(D|\btheta ) p(\btheta )$.

Gibbs Sampler

The full conditional distribution of $\theta _ i$ is proportional to the joint distribution; that is,

\[  \pi (\theta _ i | \theta _ j, i \neq j, D) \propto L(D|\btheta ) p(\btheta )  \]

For example, the one-dimensional conditional distribution of $\theta _1$, given $\theta _ j=\theta _ j^*, 2\leq j \leq k$, is given by

\[  \pi (\theta _1 | \theta _ j=\theta _ j^*, 2\leq j \leq k, D) \propto L(D|\btheta =(\theta _1, \theta ^*_2,\ldots ,\theta ^*_ k)’) p(\btheta =(\theta _1, \theta ^*_2,\ldots ,\theta ^*_ k)’)  \]

Suppose you have a set of arbitrary starting values $\{ \theta _1^{(0)}, \ldots , \theta _ k^{(0)}\} $. Using the ARMS algorithm, an iteration of the Gibbs sampler consists of the following:

  • draw $\theta _1^{(1)}$ from $\pi (\theta _1|\theta _2^{(0)},\ldots ,\theta _ k^{(0)}, D)$

  • draw $\theta _2^{(1)}$ from $\pi (\theta _2|\theta _1^{(1)}, \theta _3^{(0)},\ldots ,\theta _ k^{(0)}, D)$

  • $\vdots $

  • draw $\theta _ k^{(1)}$ from $\pi (\theta _ k|\theta _1^{(1)},\ldots ,\theta _{k-1}^{(1)}, D)$

After one iteration, you have $\{ \theta _1^{(1)}, \ldots , \theta _ k^{(1)}\} $. After n iterations, you have $\{ \theta _1^{(n)}, \ldots , \theta _ k^{(n)}\} $. Cumulatively, a chain of n samples is obtained.
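
The cycle above can be sketched with toy normal full conditionals standing in for the ARMS draws (PROC PHREG’s actual full conditionals come from the likelihood and prior; all names here are ours):

```python
import random

def gibbs_bivariate_normal(rho, n, seed=1):
    """One full Gibbs cycle per iteration: draw theta1 | theta2, then theta2 | theta1.
    Toy target: a bivariate normal with correlation rho, whose full conditionals
    are N(rho * other, 1 - rho**2)."""
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5
    t1 = t2 = 0.0                      # arbitrary starting values
    chain = []
    for _ in range(n):
        t1 = rng.gauss(rho * t2, sd)   # draw theta1 from its full conditional
        t2 = rng.gauss(rho * t1, sd)   # draw theta2 from its full conditional
        chain.append((t1, t2))
    return chain
```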

Random Walk Metropolis Algorithm

PROC PHREG uses a multivariate normal proposal distribution $q(.|\btheta )$ centered at $\btheta $. With an initial parameter vector $\btheta ^{(0)}$, a new sample $\btheta ^{(1)}$ is obtained as follows:

  • sample $\btheta ^*$ from $q(.|\btheta ^{(0)})$

  • calculate the quantity $ r = \min \left\{  \frac{\pi (\btheta ^* |D)}{\pi (\btheta ^{(0)}|D)}, 1 \right\}  $

  • sample u from the uniform distribution $\mi {U}(0,1)$

  • set $\btheta ^{(1)} = \btheta ^*$ if $u < r$; otherwise set $\btheta ^{(1)} = \btheta ^{(0)}$

With $\btheta ^{(1)}$ taking the role of $\btheta ^{(0)}$, the previous steps are repeated to generate the next sample $\btheta ^{(2)}$. After n iterations, a chain of n samples $\{ \btheta ^{(1)},\ldots ,\btheta ^{(n)}\} $ is obtained.
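
The four steps above can be sketched as follows, with a spherical normal proposal and a toy one-dimensional target standing in for the posterior (PROC PHREG uses a multivariate normal proposal; the names are ours):

```python
import math
import random

def random_walk_metropolis(log_post, theta0, scale, n, seed=1):
    """RWM sketch: propose theta* ~ N(theta, scale**2 I), accept with
    probability r = min(pi(theta*|D) / pi(theta|D), 1)."""
    rng = random.Random(seed)
    theta, lp = list(theta0), log_post(theta0)
    chain, accepted = [], 0
    for _ in range(n):
        proposal = [t + rng.gauss(0.0, scale) for t in theta]
        lp_prop = log_post(proposal)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):   # u < r
            theta, lp = proposal, lp_prop
            accepted += 1
        chain.append(list(theta))
    return chain, accepted / n

# toy posterior: standard normal log density (up to a constant)
chain, acc_rate = random_walk_metropolis(lambda th: -0.5 * th[0] ** 2, [0.0], 1.0, 2000)
```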

Starting Values of the Markov Chains

When the BAYES statement is specified, PROC PHREG generates one Markov chain that contains the approximate posterior samples of the model parameters. Additional chains are produced when the Gelman-Rubin diagnostics are requested. Starting values (initial values) can be specified in the INITIAL= data set in the BAYES statement. If the INITIAL= option is not specified, PROC PHREG picks its own initial values for the chains based on the maximum likelihood estimates of $\btheta $ and the prior information of $\btheta $.

Let $[x]$ denote the integer part of x.

Constant Baseline Hazards $\lambda _ i$’s

For the first chain, on which the summary statistics and diagnostics are based, the initial values are

\[  \lambda _ i^{(0)} = \hat{\lambda }_ i  \]

For subsequent chains, the starting values are picked in two different ways according to the total number of chains specified. If the total number of chains specified is less than or equal to 10, initial values of the rth chain ($2 \leq r\leq 10$) are given by

\[  \lambda _ i^{(0)} = \hat{\lambda }_ i \mr {e}^{\pm \biggl ([\frac{r}{2}]+2 \biggr ) \hat{s}(\hat{\lambda }_ i)}  \]

with the plus sign for odd r and minus sign for even r. If the total number of chains is greater than 10, initial values are picked at random over a wide range of values. Let $u_ i$ be a uniform random number between 0 and 1; the initial value for $\lambda _ i$ is given by

\[  \lambda _ i^{(0)} = \hat{\lambda }_ i \mr {e}^{ 16(u_ i-0.5)\hat{s}(\hat{\lambda }_ i)}  \]
Regression Coefficients and Log-Hazard Parameters $\theta _ i$’s

The $\theta _ i$’s are the regression coefficients $\beta _ i$’s and, in the piecewise exponential model, also the log-hazard parameters $\alpha _ i$’s. For the first chain, on which the summary statistics and diagnostics are based, the initial values are

\[  \theta _ i^{(0)} = \hat{\theta }_ i  \]

If the number of chains requested is less than or equal to 10, initial values for the rth chain ($2 \leq r \leq 10$) are given by

\[  \theta _ i^{(0)} = \hat{\theta }_ i \pm \biggl (2+ \biggl [\frac{r}{2} \biggr ] \biggr ) \hat{s}(\hat{\theta }_ i)  \]

with the plus sign for odd r and minus sign for even r. When there are more than 10 chains, the initial value for $\theta _ i$ is picked at random over the range $(\hat{\theta }_ i - 8\hat{s}(\hat{\theta }_ i), \hat{\theta }_ i + 8\hat{s}(\hat{\theta }_ i))$; that is,

\[  \theta _ i^{(0)} = \hat{\theta }_ i + 16(u_ i-0.5)\hat{s}(\hat{\theta }_ i)  \]

where $u_ i$ is a uniform random number between 0 and 1.
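
The dispersal rules for one parameter can be sketched as follows (hypothetical names; the constant baseline hazards follow the analogous rule multiplicatively, on the log scale):

```python
def initial_value(theta_hat, se, r, n_chains, u=None):
    """Starting value of one parameter for chain r (r = 1, 2, ...);
    u is a Uniform(0,1) draw, used only when n_chains > 10."""
    if r == 1:
        return theta_hat                     # first chain starts at the MLE
    if n_chains <= 10:
        sign = 1 if r % 2 == 1 else -1       # plus for odd r, minus for even r
        return theta_hat + sign * (2 + r // 2) * se
    return theta_hat + 16.0 * (u - 0.5) * se
```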

Fit Statistics

Denote the observed data by D. Let $\btheta $ be the vector of parameters of length k. Let $L(D|\btheta )$ be the likelihood. The deviance information criterion (DIC) proposed in Spiegelhalter et al. (2002) is a Bayesian model assessment tool. Let $\mr {Dev}(\btheta ) = -2\log L(D|\btheta )$. Let $\overline{\mr {Dev}(\btheta )}$ and $\bar{\btheta }$ be the corresponding posterior means of $\mr {Dev}(\btheta )$ and $\btheta $, respectively. The deviance information criterion is computed as

\[  \mr {DIC} = 2\overline{\mr {Dev}(\btheta )} - \mr {Dev}(\bar{\btheta })  \]

Also computed is

\[  pD= \overline{\mr {Dev}(\btheta )} - \mr {Dev}(\bar{\btheta })  \]

where pD is interpreted as the effective number of parameters.

Note that $\mr {Dev}(\btheta )$ defined here does not have the standardizing term as in the section Deviance Information Criterion (DIC). Nevertheless, the DIC calculated here is still useful for variable selection.
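
Given posterior samples and a log-likelihood function, DIC and pD follow directly from the two formulas above; a minimal sketch (hypothetical names):

```python
def dic(samples, log_like):
    """DIC and pD from posterior samples.
    samples: list of parameter vectors theta^(r); log_like(theta) = log L(D|theta)."""
    devs = [-2.0 * log_like(th) for th in samples]            # Dev(theta^(r))
    dev_bar = sum(devs) / len(devs)                           # posterior mean deviance
    k = len(samples[0])
    theta_bar = [sum(th[i] for th in samples) / len(samples) for i in range(k)]
    dev_at_mean = -2.0 * log_like(theta_bar)                  # Dev(theta_bar)
    return 2.0 * dev_bar - dev_at_mean, dev_bar - dev_at_mean
```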

Posterior Distribution for Quantities of Interest

Let $\btheta = (\theta _1, \ldots , \theta _ k)’$ be the parameter vector. For the Cox model, the $\theta _ i$’s are the regression coefficients $\beta _ i$’s; for the piecewise constant baseline hazard model, the $\theta _ i$’s consist of the baseline hazards $\lambda _ i$’s (or log baseline hazards $\alpha _ i$’s) and the regression coefficients $\beta _ j$’s. Let $\mc {S}=\{ \btheta ^{(r)},r=1,\ldots ,N\} $ be the chain that represents the posterior distribution for $\btheta $.

Consider a quantity of interest $\tau $ that can be expressed as a function $f(\btheta )$ of the parameter vector $\btheta $. You can construct the posterior distribution of $\tau $ by evaluating the function $f(\btheta ^{(r)})$ for each $\btheta ^{(r)}$ in $\mc {S}$. The posterior chain for $\tau $ is $\{ f(\btheta ^{(r)}), r=1,\ldots ,N\} .$ Summary statistics such as mean, standard deviation, percentiles, and credible intervals are used to describe the posterior distribution of $\tau $.

Hazard Ratio

As shown in the section Hazard Ratios, a log-hazard ratio is a linear combination of the regression coefficients. Let $\mb {h}$ be the vector of linear coefficients. The posterior sample for this hazard ratio is the set $\{ \exp (\mb {h}’\bbeta ^{(r)}), r=1,\ldots , N\} $.
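
Mapping the posterior chain through this transformation is a one-liner; a sketch (hypothetical name):

```python
import math

def hazard_ratio_samples(h, beta_samples):
    """Posterior draws of the hazard ratio exp(h' beta^(r))."""
    return [math.exp(sum(hi * bi for hi, bi in zip(h, b))) for b in beta_samples]
```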

Survival Distribution

Let $\mb {x}$ be a covariate vector of interest.

Cox Model

Let $\{ (t_ i,\mb {z}_ i,\delta _ i), i=1,2,\ldots ,n\} $ be the observed data. Define

$\displaystyle  Y_ i(t)= \left\{  \begin{array}{ll} 1 &  t \le t_ i\\ 0 &  \mr {otherwise} \end{array} \right.  $

Consider the rth draw $\bbeta ^{(r)}$ of $\mc {S}$. The baseline cumulative hazard function at time t is given by

\[  H_0(t|\bbeta ^{(r)}) = \sum _{i:t_ i\leq t} \frac{\delta _ i}{\sum _{l=1}^ n Y_ l(t_ i)\exp (\mb {z}_ l’\bbeta ^{(r)})}  \]

For the given covariate vector $\mb {x}$, the cumulative hazard function at time t is

\[  H(t;\mb {x}|\bbeta ^{(r)}) = H_0(t|\bbeta ^{(r)})\exp (\mb {x}’\bbeta ^{(r)})  \]

and the survival function at time t is

\[  S(t;\mb {x}|\bbeta ^{(r)}) = \exp [-H(t;\mb {x}|\bbeta ^{(r)})]  \]
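
The per-draw computation above can be sketched end to end in Python (hypothetical names, not PROC PHREG code; the risk set at an event time $t_ i$ is taken as the subjects with $t_ l \ge t_ i$):

```python
import math

def breslow_cumhaz(t, times, x, events, beta):
    """H_0(t | beta^(r)): baseline cumulative hazard for one posterior draw beta."""
    def eta(z):
        return sum(b * v for b, v in zip(beta, z))   # beta' z
    H = 0.0
    for t_i, d_i in zip(times, events):
        if d_i == 1 and t_i <= t:
            # risk set at t_i: subjects still under observation
            risk = sum(math.exp(eta(z)) for t_l, z in zip(times, x) if t_l >= t_i)
            H += 1.0 / risk
    return H

def survival(t, cov, times, x, events, beta):
    """S(t; x | beta^(r)) = exp(-H_0(t | beta^(r)) * exp(x' beta^(r)))."""
    lin = sum(b * v for b, v in zip(beta, cov))
    return math.exp(-breslow_cumhaz(t, times, x, events, beta) * math.exp(lin))
```

Evaluating these two functions for every draw $\bbeta ^{(r)}$ in $\mc {S}$ yields the posterior chain of the survival probability at (t, x).
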
Piecewise Exponential Model

Let $0=a_0 < a_1 <\ldots <a_ J<\infty $ be a partition of the time axis. Consider the rth draw $\btheta ^{(r)}$ in $\mc {S}$, where $\btheta ^{(r)}$ consists of $\blambda ^{(r)}=(\lambda ^{(r)}_1,\ldots ,\lambda ^{(r)}_ J)’$ and $\bbeta ^{(r)}$. The baseline cumulative hazard function at time t is

\[  H_0(t|\blambda ^{(r)})= \sum _{j=1}^ J \lambda ^{(r)}_ j \Delta _ j(t)  \]

where

$\displaystyle  \Delta _ j(t) = \left\{  \begin{array}{ll} 0 &  t<a_{j-1} \\ t-a_{j-1} &  a_{j-1}\le t < a_ j \\ a_ j - a_{j-1} &  t \ge a_ j \end{array} \right.  $

For the given covariate vector $\mb {x}$, the cumulative hazard function at time t is

\[  H(t;\mb {x}|\blambda ^{(r)},\bbeta ^{(r)})=H_0(t|\blambda ^{(r)})\exp (\mb {x}’\bbeta ^{(r)})  \]

and the survival function at time t is

\[  S(t;\mb {x}|\blambda ^{(r)},\bbeta ^{(r)}) = \exp [-H(t;\mb {x}|\blambda ^{(r)},\bbeta ^{(r)})]  \]