Introduction to Bayesian Analysis Procedures

Summary Statistics

Subsections:

Mean
Standard Deviation
Standard Error of the Mean Estimate
Percentiles
Correlation
Covariance
Equal-Tail Credible Interval
Highest Posterior Density (HPD) Interval
Deviance Information Criterion (DIC)

Let $\btheta$ be a p-dimensional parameter vector of interest: $\btheta =\left\{ \theta _{1},\dots ,\theta _{p}\right\}$ . For each $i\in \left\{ 1,\dots , p\right\} \,$ , there are n observations: $\theta _{i}=\left\{ \theta _{i}^ t\, ,\ t=1,\dots ,n\right\}$ .

Mean

The posterior mean is calculated by using the following formula:

$E\left( \theta _{i}|\mb{y} \right) \approx \bar{\theta }_{i} = \frac{1}{n} \sum _{t=1}^{n}\theta _{i}^ t,\ \mbox{for } i = 1, \dots , p$

Standard Deviation

The sample standard deviation $s_ i$ is the square root of the sample variance, which is calculated by using the following formula:

$\mr{Var}(\theta _ i | \mb{y}) \approx s_ i^2 = \frac{1}{n-1} \sum _{t=1}^{n}\left( \theta _{i}^ t-\bar{\theta }_{i}\right) ^{2}$
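As a minimal sketch of the mean and variance formulas above, the following Python (NumPy) snippet computes both from a matrix of draws. The function name `posterior_mean_and_sd` and the array layout (one row per draw, one column per parameter) are assumptions made for illustration; they are not part of the SAS procedures.

```python
import numpy as np

def posterior_mean_and_sd(draws):
    """Return (mean, sd) per parameter.

    draws: array of shape (n, p), one row per draw (assumed layout).
    """
    draws = np.asarray(draws, dtype=float)
    mean = draws.mean(axis=0)            # (1/n) * sum_t theta_i^t
    var = draws.var(axis=0, ddof=1)      # divisor n - 1, as in the formula
    return mean, np.sqrt(var)

# Toy input: n = 4 draws of p = 2 parameters (made-up numbers).
draws = np.array([[1.0, 10.0],
                  [2.0, 12.0],
                  [3.0, 14.0],
                  [4.0, 16.0]])
mean, sd = posterior_mean_and_sd(draws)
print(mean, sd)
```

The `ddof=1` argument selects the $n-1$ divisor; NumPy's default (`ddof=0`) would divide by $n$ instead.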

Standard Error of the Mean Estimate

Suppose you have n iid samples, with mean estimate $\bar{\theta }_ i$ and sample standard deviation $s_ i$ . The standard error of the estimate is $s_ i / \sqrt {n}$ . However, positive autocorrelation (see the section Autocorrelations for a definition) in the MCMC samples makes this an underestimate. To account for the autocorrelation, the Bayesian procedures correct the standard error by using the effective sample size (see the section Effective Sample Size).

Given an effective sample size of m, the standard error for $\bar{\theta }_ i$ is $s_ i / \sqrt {m}$ . The procedures use the following formula (expressed as a variance):

${\widehat{\mr{Var}}(\bar{\theta }_{i})} = \frac{ 1 + 2 \sum _{k=1}^\infty \rho _ k(\theta _ i) }{n} \cdot \frac{\sum _{t=1}^{n}\left( \theta _{i}^ t-\bar{\theta }_{i}\right) ^{2}}{ (n-1) }$

The standard error of the mean is also known as the Monte Carlo standard error (MCSE). The MCSE measures the accuracy of the posterior estimates, but small values do not necessarily indicate that you have recovered the true posterior mean.
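The autocorrelation-corrected variance formula can be sketched in Python as follows. The infinite sum over lags has to be truncated in practice; this sketch stops at the first nonpositive sample autocorrelation, capped at 50 lags. That truncation rule, the cap, and the function names `autocorr` and `mcse` are assumptions for illustration and may differ from what the procedures actually do.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelations rho_1, ..., rho_max_lag of a 1-D chain."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    return np.array([np.dot(xc[:n - k], xc[k:]) / denom
                     for k in range(1, max_lag + 1)])

def mcse(x, max_lag=50):
    """Monte Carlo standard error of the mean of a 1-D chain."""
    x = np.asarray(x, dtype=float)
    n = x.size
    rho = autocorr(x, min(max_lag, n - 1))
    # Assumed truncation rule: keep lags before the first nonpositive rho_k.
    nonpos = np.flatnonzero(rho <= 0)
    if nonpos.size:
        rho = rho[:nonpos[0]]
    inflation = 1.0 + 2.0 * rho.sum()          # 1 + 2 * sum_k rho_k
    return np.sqrt(inflation / n * x.var(ddof=1))

# Simulated AR(1) chain with strong positive autocorrelation (toy data).
rng = np.random.default_rng(0)
eps = rng.normal(size=5000)
chain = np.empty(5000)
chain[0] = eps[0]
for t in range(1, chain.size):
    chain[t] = 0.9 * chain[t - 1] + eps[t]

naive_se = chain.std(ddof=1) / np.sqrt(chain.size)
corrected_se = mcse(chain)
print(naive_se, corrected_se)
```

For a positively autocorrelated chain such as this one, the corrected standard error is larger than the naive $s_ i / \sqrt {n}$ value.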

Percentiles

Sample percentiles are calculated using Definition 5 (see Chapter 4: The UNIVARIATE Procedure in Base SAS 9.4 Procedures Guide: Statistical Procedures, Fourth Edition).

Correlation

Correlation between $\theta _{i}$ and $\theta _{j}$ is calculated as

$r_{ij}=\frac{\sum _{t=1}^{n}\left( \theta _{i}^ t-\bar{ \theta }_{i}\right) \left( \theta _{j}^ t-\bar{\theta }_{j}\right) }{\sqrt { \sum _{t}\left( \theta _{i}^ t-\bar{\theta }_{i}\right) ^{2}\sum _{t}\left( \theta _{j}^ t-\bar{\theta }_{j}\right) ^{2}}}$

Covariance

Covariance between $\theta _{i}$ and $\theta _{j}$ is calculated as

$s_{ij} = \sum _{t=1}^{n}\left( \theta _{i}^ t-\bar{ \theta }_{i}\right) \left( \theta _{j}^ t-\bar{\theta }_{j}\right) \left/ \left( n - 1 \right) \right.$
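The correlation and covariance formulas share the same sums of cross products, which the following Python sketch makes explicit. The function name `sample_cov_corr` and the (n, p) array layout are assumptions for illustration; the simulated draws are made-up data.

```python
import numpy as np

def sample_cov_corr(draws):
    """Return (covariance matrix, correlation matrix) of the draws.

    draws: array of shape (n, p), one row per draw (assumed layout).
    """
    draws = np.asarray(draws, dtype=float)
    n = draws.shape[0]
    centered = draws - draws.mean(axis=0)
    cross = centered.T @ centered              # sums of cross products
    cov = cross / (n - 1)                      # s_ij
    norm = np.sqrt(np.diag(cross))
    corr = cross / np.outer(norm, norm)        # r_ij
    return cov, corr

rng = np.random.default_rng(42)
draws = rng.normal(size=(200, 3))
draws[:, 1] += 0.5 * draws[:, 0]               # induce some correlation
cov, corr = sample_cov_corr(draws)
print(np.round(corr, 3))
```

The results agree with `np.cov(draws, rowvar=False)` and `np.corrcoef(draws, rowvar=False)`, which can be used directly in practice.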

Equal-Tail Credible Interval

Let $\pi \left( \theta _{i}| \mb{y}\right)$ denote the marginal posterior cumulative distribution function of $\theta _{i}$ . A $100\left( 1-\alpha \right) \%$ Bayesian equal-tail credible interval for $\theta _{i}$ is $\left( \theta _{i}^{\alpha /2},\theta _{i}^{1-\alpha / 2 }\right)$ , where $\pi \left( \theta _{i}^{\alpha /2}|\mb{y}\right) =\frac{\alpha }{2}$ and $\pi \left( \theta _{i}^{ 1-\alpha /2}|\mb{y}\right) =1-\frac{\alpha }{2}$ . The interval is obtained by using the empirical $100\left( \frac{ \alpha }{2}\right)$ th and $100\left( 1-\frac{\alpha }{2}\right)$ th percentiles of $\left\{ \theta _{i}^ t \right\}$ .
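A minimal Python sketch of the equal-tail interval follows. It uses `np.quantile` with its default linear interpolation, which is an assumption made for brevity: that rule is not the same as the Definition 5 percentile rule mentioned in the Percentiles section, so the endpoints can differ slightly from what the procedures report. The function name `equal_tail_interval` is hypothetical.

```python
import numpy as np

def equal_tail_interval(x, alpha=0.05):
    """100(1 - alpha)% equal-tail interval from a 1-D array of draws."""
    x = np.asarray(x, dtype=float)
    # Default np.quantile interpolation (linear), not SAS Definition 5.
    lower, upper = np.quantile(x, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)

# Toy input: the values 1, 2, ..., 101.
values = np.arange(1, 102)
lower, upper = equal_tail_interval(values, alpha=0.05)
print(lower, upper)
```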

Highest Posterior Density (HPD) Interval

For a definition of an HPD interval, see the section Interval Estimation. The procedures use the Chen-Shao algorithm (Chen and Shao 1999; Chen, Shao, and Ibrahim 2000) to estimate an empirical HPD interval of $\theta _{i}$ :

1. Sort $\left\{ \theta _{i}^ t\right\}$ to obtain the ordered values:

$\theta _{i\left( 1\right) }\leq \theta _{i\left( 2\right) }\leq \cdots \leq \theta _{i\left( n\right) }$

2. Compute the $100\left( 1-\alpha \right) \%$ credible intervals:

$R_{j}\left( n\right) =\left( \theta _{i\left( j\right) },\theta _{i\left( j+ \left[ \left( 1-\alpha \right) n\right] \right) }\right)$

for $j=1,2,\dots ,n-\left[ \left( 1-\alpha \right) n\right]$ .

3. The $100\left( 1-\alpha \right) \%$ HPD interval, denoted by $R_{j^{\ast }}\left( n\right)$ , is the one with the smallest interval width among all credible intervals.
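The sort, enumerate, and pick-the-narrowest steps above can be sketched in Python as follows. The sketch assumes that $\left[ x\right]$ denotes the integer part of $x$; the function name `hpd_interval` and the simulated sample are hypothetical, chosen for illustration only.

```python
import numpy as np

def hpd_interval(x, alpha=0.05):
    """Empirical 100(1 - alpha)% HPD interval from a 1-D array of draws."""
    x = np.sort(np.asarray(x, dtype=float))     # ordered values
    n = x.size
    k = int(np.floor((1 - alpha) * n))          # [(1 - alpha) n], assumed integer part
    if k < 1 or k >= n:
        raise ValueError("need 1 <= [(1 - alpha) n] < n")
    # Candidate intervals R_j = (x_(j), x_(j + k)) for j = 1, ..., n - k
    # (1-based); in 0-based indexing the widths are x[j + k] - x[j].
    widths = x[k:] - x[:n - k]
    j = int(np.argmin(widths))                  # narrowest candidate
    return float(x[j]), float(x[j + k])

# Right-skewed toy sample: the HPD interval shifts toward the mode.
rng = np.random.default_rng(7)
sample = rng.exponential(scale=1.0, size=2000)
print(hpd_interval(sample, alpha=0.05))
```

When several candidates tie for the smallest width, `np.argmin` returns the first one; the source does not say how ties are broken, so that choice is an assumption.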

Deviance Information Criterion (DIC)

The deviance information criterion (DIC) (Spiegelhalter et al. 2002) is a model assessment tool, and it is a Bayesian alternative to Akaike’s information criterion (AIC) and the Bayesian information criterion (BIC, also known as the Schwarz criterion). The DIC uses the posterior densities, which means that it takes the prior information into account. The criterion can be applied to nonnested models and models that have non-iid data. Calculation of the DIC in MCMC is trivial: unlike the AIC and BIC, it does not require maximization over the parameter space. A smaller DIC indicates a better fit to the data set.

Letting $\btheta$ be the parameters of the model, the DIC is defined as

$\mr{DIC} = \overline{D(\btheta )} + p_ D = D(\overline{\btheta }) + 2p_ D$

where

$D(\btheta ) = 2 \left( \log (f(\mb{y})) - \log (p(\mb{y} | \btheta )) \right)$ : deviance

where

$p(\mb{y}|\btheta )$ : likelihood function with the normalizing constants.

$f(\mb{y})$ : a standardizing term that is a function of the data alone. This term is constant with respect to the parameter and is irrelevant when you compare different models that have the same likelihood function. Since the term cancels out in DIC comparisons, its calculation is often omitted.

Note: You can think of the deviance as the difference in twice the log likelihood between the saturated, $f(\mb{y})$ , and fitted, $p(\mb{y}|\btheta )$ , models.

$\overline{\btheta }$ : posterior mean, approximated by $\frac{1}{n} \sum _{t=1}^{n}\btheta ^ t$

$\overline{D(\btheta )}$ : posterior mean of the deviance, approximated by $\frac{1}{n} \sum _{t=1}^{n} D(\btheta ^ t)$ . The expected deviance measures how well the model fits the data.

$D(\overline{\btheta })$ : deviance evaluated at $\bar{\btheta }$ , equal to $-2 \log (p(\mb{y} | \bar{\btheta }))$ when the standardizing term $f(\mb{y})$ is omitted. It is the deviance evaluated at your "best" posterior estimate.

$p_ D$ : effective number of parameters. It is the difference between the measure of fit and the deviance at the estimates: $\overline{D(\btheta )} - D(\overline{\btheta })$ . This term describes the complexity of the model, and it serves as a penalization term that corrects deviance’s propensity toward models with more parameters.
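To make the pieces of the DIC concrete, here is a minimal Python sketch for a toy model: normal data with unknown mean and known unit variance. Everything specific to the example is an assumption for illustration: the model, the function names `log_lik` and `dic`, the omission of the standardizing term $f(\mb{y})$ , and the posterior draws, which are simulated directly from the known flat-prior posterior rather than produced by an MCMC run.

```python
import numpy as np

def log_lik(y, mu, sigma=1.0):
    """Normal log likelihood, including the normalizing constants."""
    y = np.asarray(y, dtype=float)
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2)
                        - 0.5 * ((y - mu) / sigma) ** 2))

def dic(y, mu_draws):
    """DIC components; the standardizing term f(y) is omitted (assumption)."""
    mu_draws = np.asarray(mu_draws, dtype=float)
    deviances = np.array([-2.0 * log_lik(y, m) for m in mu_draws])
    d_bar = deviances.mean()                       # posterior mean deviance
    d_at_mean = -2.0 * log_lik(y, mu_draws.mean()) # deviance at posterior mean
    p_d = d_bar - d_at_mean                        # effective number of parameters
    return {"DIC": d_bar + p_d, "p_D": p_d,
            "D_bar": d_bar, "D_at_mean": d_at_mean}

rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, scale=1.0, size=50)        # toy data
# With a flat prior and known variance 1, the posterior of mu is
# N(mean(y), 1/n); draw from it directly in place of MCMC output.
mu_draws = rng.normal(loc=y.mean(), scale=1 / np.sqrt(y.size), size=4000)

result = dic(y, mu_draws)
print(result)
```

In this one-parameter model $p_ D$ comes out close to 1, and the two forms of the DIC, $\overline{D(\btheta )} + p_ D$ and $D(\overline{\btheta }) + 2p_ D$ , agree by construction.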