Bayesian Inference

Bayesian inference about $\theta $ is primarily based on the posterior distribution of $\theta $. There are various ways in which you can summarize this distribution. For example, you can report your findings through point estimates. You can also use the posterior distribution to construct hypothesis tests or probability statements.

Point Estimation and Estimation Error

Classical methods often report the maximum likelihood estimator (MLE) or the method of moments estimator (MOME) of a parameter. In contrast, Bayesian approaches often use the posterior mean. The definition of the posterior mean is given by

\[  E(\theta |\mb {y} ) = \int \theta ~ p(\theta |\mb {y})~ d \theta  \]

Other commonly used posterior estimators include the posterior median, defined as

\[  \theta \colon P(\theta \geq \mr {median}|\mb {y}) = P(\theta \leq \mr {median}|\mb {y}) = \frac{1}{2} \]

and the posterior mode, defined as the value of $\theta $ that maximizes $p(\theta |\mb {y})$.
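As an illustration of these three definitions, the following minimal sketch approximates the posterior mean, median, and mode on a grid. It assumes a hypothetical example that is not part of the original text: 7 successes in 20 binomial trials with a uniform prior, which gives a Beta(8, 14) posterior. The function names `beta_pdf` and `grid_summaries` are made up for this sketch.

```python
import math

def beta_pdf(theta, a, b):
    """Density of a Beta(a, b) distribution at theta in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta)
                    + (b - 1) * math.log(1 - theta))

def grid_summaries(a, b, n_grid=20000):
    """Approximate the posterior mean, median, and mode on a midpoint grid."""
    width = 1.0 / n_grid
    grid = [(i + 0.5) * width for i in range(n_grid)]
    # Each entry approximates p(theta | y) d(theta) on one grid cell.
    mass = [beta_pdf(t, a, b) * width for t in grid]
    total = sum(mass)
    # Mean: integral of theta * p(theta | y).
    mean = sum(t * m for t, m in zip(grid, mass)) / total
    # Mode: grid point where the density is largest.
    mode = grid[max(range(n_grid), key=lambda i: mass[i])]
    # Median: first grid point where the cumulative probability reaches 1/2.
    cumulative = 0.0
    median = grid[-1]
    for t, m in zip(grid, mass):
        cumulative += m
        if cumulative >= 0.5 * total:
            median = t
            break
    return mean, median, mode

# Assumed example: Beta(1 + 7, 1 + 13) posterior.
post_mean, post_median, post_mode = grid_summaries(8, 14)
print(post_mean, post_median, post_mode)
```

For this right-skewed posterior the three estimates differ, with the mode smallest and the mean largest. The exact mean is 8/22 and the exact mode is 7/20, which gives a check on the grid approximation.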

The variance of the posterior density (simply referred to as the posterior variance) describes the uncertainty in the parameter, which is a random variable in the Bayesian paradigm. A Bayesian analysis typically uses the posterior variance, or the posterior standard deviation, to characterize the dispersion of the parameter. In multidimensional models, covariance or correlation matrices are used.

If you know the distributional form of the posterior density of interest, you can report the exact posterior point estimates. When models become too difficult to analyze analytically, you have to use simulation algorithms, such as the Markov chain Monte Carlo (MCMC) method, to obtain posterior estimates (see the section Markov Chain Monte Carlo Method). All of the Bayesian procedures rely on MCMC to obtain all posterior estimates. Because a simulation uses only a finite number of samples, it introduces an additional level of uncertainty into the estimates. The Monte Carlo standard error (MCSE), which is the standard error of the posterior mean estimate, measures the simulation accuracy. See the section Standard Error of the Mean Estimate for more information.

The posterior standard deviation and the MCSE are two completely different concepts: the posterior standard deviation describes the uncertainty in the parameter, while the MCSE describes only the uncertainty in the parameter estimate as a result of MCMC simulation. The posterior standard deviation is a function of the sample size in the data set, and the MCSE is a function of the number of iterations in the simulation.
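The difference between the two quantities can be sketched with a small simulation. The example below assumes independent draws from a hypothetical Beta(8, 14) posterior, so the MCSE is simply the sample standard deviation divided by the square root of the number of draws. Real MCMC output is usually autocorrelated, in which case this naive formula understates the MCSE and an adjustment (for example, one based on batch means or the effective sample size) is needed. The helper name `summarize_draws` is made up for this sketch.

```python
import math
import random
import statistics

def summarize_draws(draws):
    """Return (posterior mean, posterior SD, naive MCSE) from posterior draws."""
    n = len(draws)
    post_mean = statistics.fmean(draws)
    post_sd = statistics.stdev(draws)
    # Naive MCSE; valid only when the draws are independent (an assumption here).
    mcse = post_sd / math.sqrt(n)
    return post_mean, post_sd, mcse

rng = random.Random(2024)
small = [rng.betavariate(8, 14) for _ in range(1_000)]
large = [rng.betavariate(8, 14) for _ in range(100_000)]

mean_small, sd_small, mcse_small = summarize_draws(small)
mean_large, sd_large, mcse_large = summarize_draws(large)
print(sd_small, mcse_small)
print(sd_large, mcse_large)
```

Increasing the number of draws leaves the posterior standard deviation essentially unchanged but shrinks the MCSE, because only the MCSE depends on the length of the simulation.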

Hypothesis Testing

Suppose you have the following null and alternative hypotheses: $H_{0}$ is $\theta \in \Theta _0$ and $H_{1}$ is $\theta \in \Theta _0^{c}$, where $\Theta _0$ is a subset of the parameter space and $\Theta _0^{c}$ is its complement. Using the posterior distribution $\pi (\theta | \mb {y})$, you can compute the posterior probabilities $P(\theta \in \Theta _0 | \mb {y})$ and $P(\theta \in \Theta _0^{c} | \mb {y})$, or the probabilities that $H_0$ and $H_1$ are true, respectively. One way to perform a Bayesian hypothesis test is to accept the null hypothesis if $P(\theta \in \Theta _0 | \mb {y}) \geq P(\theta \in \Theta _0^{c} | \mb {y})$ and to reject it otherwise. Another way is to accept the null hypothesis only if $P(\theta \in \Theta _0 | \mb {y})$ is greater than a predefined threshold, such as 0.75, to guard against falsely accepting the null hypothesis.
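When only posterior draws are available, these posterior probabilities can be estimated as the fraction of draws that fall in each region. The following minimal sketch assumes a hypothetical Beta(8, 14) posterior and the hypotheses $H_0\colon \theta \leq 0.5$ versus $H_1\colon \theta > 0.5$; neither comes from the original text, and the helper name `posterior_prob` is made up.

```python
import random

def posterior_prob(draws, in_region):
    """Estimate P(theta in region | y) as the fraction of draws in the region."""
    return sum(1 for t in draws if in_region(t)) / len(draws)

rng = random.Random(7)
# Assumed posterior: independent draws from Beta(8, 14).
draws = [rng.betavariate(8, 14) for _ in range(50_000)]

p_null = posterior_prob(draws, lambda t: t <= 0.5)  # P(H0 | y)
p_alt = 1.0 - p_null                                # P(H1 | y)

# Decision rule 1: accept H0 if it is at least as probable as H1.
accept_by_comparison = p_null >= p_alt
# Decision rule 2: accept H0 only if its probability exceeds a threshold.
accept_by_threshold = p_null > 0.75
print(p_null, accept_by_comparison, accept_by_threshold)
```

The two rules can disagree: a null hypothesis with posterior probability 0.6 would be accepted by the comparison rule but not by the 0.75 threshold rule.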

It is more difficult to carry out a point null hypothesis test in a Bayesian analysis. A point null hypothesis is a test of $H_{0}\colon \theta = \theta _0 $ versus $H_{1}\colon \theta \neq \theta _0$. If the prior distribution $\pi (\theta )$ is a continuous density, then the posterior probability of the null hypothesis being true is 0, and there is no point in carrying out the test. One alternative is to restate the null to be a small interval hypothesis: $\theta \in \Theta _0 = (\theta _0 - a, \theta _0 + a)$, where $a$ is a very small constant. The Bayesian paradigm can deal with an interval hypothesis more easily. Another approach is to give a mixture prior distribution to $\theta $ with a positive probability of $p_0$ on $\theta _0$ and the density $(1-p_0) \pi (\theta )$ on $\theta \neq \theta _0$. This prior ensures a nonzero posterior probability on $\theta _0$, and you can then make realistic probabilistic comparisons. For more detailed treatment of Bayesian hypothesis testing, see Berger (1985).
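The mixture-prior approach can be sketched for a case where everything is available in closed form. The example assumes binomial data ($y$ successes in $n$ trials) and takes $\pi (\theta )$ to be the uniform density on (0, 1); both choices are illustrative assumptions, not part of the original text. Under the uniform density, the marginal likelihood of the data under $H_1$ is $1/(n+1)$, and Bayes' theorem gives the posterior probability of $H_0$ as the prior-weighted likelihood at $\theta _0$ divided by the total marginal likelihood. The function name `posterior_prob_point_null` is made up.

```python
import math

def posterior_prob_point_null(y, n, theta0, p0):
    """P(theta = theta0 | y) under a mixture prior: point mass p0 at theta0
    and (1 - p0) times a uniform density on (0, 1) elsewhere (assumed)."""
    # Binomial likelihood of the data at the null value theta0.
    like_null = math.comb(n, y) * theta0 ** y * (1 - theta0) ** (n - y)
    # Marginal likelihood under H1: the binomial likelihood integrated
    # against the uniform density equals 1 / (n + 1).
    marginal_alt = 1.0 / (n + 1)
    numerator = p0 * like_null
    return numerator / (numerator + (1 - p0) * marginal_alt)

# Assumed data: 7 successes in 20 trials, H0: theta = 0.5, prior mass 0.5.
prob_h0 = posterior_prob_point_null(y=7, n=20, theta0=0.5, p0=0.5)
# Data that agree better with theta0 give H0 a higher posterior probability.
prob_h0_centered = posterior_prob_point_null(y=10, n=20, theta0=0.5, p0=0.5)
print(prob_h0, prob_h0_centered)
```

Because the prior places positive mass on $\theta _0$, the posterior probability of the point null is strictly between 0 and 1 and can be compared with the posterior probability of the alternative.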

Interval Estimation

Bayesian set estimates are called credible sets, which are also known as credible intervals. They are analogous to the confidence intervals used in classical statistics. Given a posterior distribution $p(\theta | \mb {y})$, $A$ is a $100(1-\alpha )\% $ credible set for $\theta $ if

\[  P(\theta \in A | \mb {y}) = \int _ A p(\theta |\mb {y})~ d\theta = 1 - \alpha  \]

For example, you can construct a 95% credible set for $\theta $ by finding an interval, $A$, over which $\int _ A p(\theta |\mb {y})~ d\theta = 0.95$.

You can construct credible sets that have equal tails. A $100(1-\alpha )\% $ equal-tail interval is bounded by the $100(\alpha /2)$ and $100(1-\alpha /2)$ percentiles of the posterior distribution. Some statisticians prefer this interval because it is invariant under monotone transformations of the parameter. Another frequently used Bayesian credible set is called the highest posterior density (HPD) interval.
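An equal-tail interval can be estimated directly from posterior draws by taking empirical percentiles. The following minimal sketch assumes independent draws from a hypothetical Beta(8, 14) posterior and uses a nearest-rank percentile rule; both are illustrative choices. It also checks the invariance property by transforming the draws to the log-odds scale. The names `equal_tail_interval` and `logit` are made up for this sketch.

```python
import math
import random

def equal_tail_interval(draws, alpha=0.05):
    """Equal-tail credible interval from draws (nearest-rank percentiles)."""
    ordered = sorted(draws)
    n = len(ordered)
    lo_idx = max(math.ceil(n * alpha / 2) - 1, 0)
    hi_idx = min(math.ceil(n * (1 - alpha / 2)) - 1, n - 1)
    return ordered[lo_idx], ordered[hi_idx]

def logit(t):
    """Log-odds, a monotone increasing transformation of t in (0, 1)."""
    return math.log(t / (1 - t))

rng = random.Random(11)
# Assumed posterior: independent draws from Beta(8, 14).
draws = [rng.betavariate(8, 14) for _ in range(20_000)]

lower, upper = equal_tail_interval(draws)
coverage = sum(1 for t in draws if lower <= t <= upper) / len(draws)

# Interval computed on the transformed draws ...
logit_lower, logit_upper = equal_tail_interval([logit(t) for t in draws])
# ... matches the transformed endpoints of the original interval.
print(lower, upper, coverage)
print(logit_lower - logit(lower), logit_upper - logit(upper))
```

Because a monotone transformation preserves the ordering of the draws, the percentiles of the transformed draws are the transformed percentiles, which is why the two intervals agree.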

A $100(1-\alpha )\% $ HPD interval is a region that satisfies the following two conditions:

  1. The posterior probability of that region is $100(1-\alpha )\% $.

  2. The minimum density of any point within that region is equal to or larger than the density of any point outside that region.

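The HPD interval contains the values of $\theta $ that have the highest posterior density. Some statisticians prefer this interval because, for a unimodal posterior distribution, it is the shortest interval that has the specified posterior probability.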
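One common way to approximate an HPD interval from posterior draws is to search for the shortest window that contains the required fraction of the sorted draws. The sketch below assumes a unimodal posterior (otherwise the HPD region need not be a single interval) and uses independent draws from a hypothetical right-skewed Beta(2, 12) distribution; it is not necessarily the method that any particular procedure uses. The function name `hpd_interval` is made up.

```python
import math
import random

def hpd_interval(draws, alpha=0.05):
    """Shortest interval containing about 100(1 - alpha)% of the draws.
    Approximates the HPD interval when the posterior is unimodal."""
    ordered = sorted(draws)
    n = len(ordered)
    n_inside = math.ceil((1 - alpha) * n)
    # Slide a window of n_inside consecutive draws and keep the narrowest one.
    best_start = min(range(n - n_inside + 1),
                     key=lambda i: ordered[i + n_inside - 1] - ordered[i])
    return ordered[best_start], ordered[best_start + n_inside - 1]

rng = random.Random(5)
# Assumed posterior: independent draws from a right-skewed Beta(2, 12).
draws = [rng.betavariate(2, 12) for _ in range(20_000)]

hpd_lo, hpd_hi = hpd_interval(draws)
hpd_coverage = sum(1 for t in draws if hpd_lo <= t <= hpd_hi) / len(draws)

# Equal-tail interval from the same draws, for comparison.
ordered = sorted(draws)
eq_lo = ordered[int(0.025 * len(ordered))]
eq_hi = ordered[int(0.975 * len(ordered)) - 1]
print(hpd_lo, hpd_hi, hpd_hi - hpd_lo)
print(eq_lo, eq_hi, eq_hi - eq_lo)
```

For a skewed posterior such as this one, the HPD interval is shorter than the equal-tail interval and is shifted toward the region of highest density. For a symmetric unimodal posterior, the two intervals coincide.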

One major distinction between Bayesian and classical sets is their interpretation. The Bayesian probability reflects a person’s subjective beliefs. Following this approach, a statistician can make the claim that $\theta $ is inside a credible interval with measurable probability. This property is appealing because it enables you to make a direct probability statement about parameters. Many people find this concept to be a more natural way of understanding a probability interval, which is also easier to explain to nonstatisticians. A confidence interval, on the other hand, enables you to make a claim that the interval covers the true parameter. The interpretation reflects the uncertainty in the sampling procedure; a confidence interval of $100(1-\alpha )\% $ asserts that, in the long run, $100(1-\alpha )\% $ of the realized confidence intervals cover the true parameter.