The most frequently used statistical methods are known as frequentist (or classical) methods. These methods assume that unknown parameters are fixed constants, and they define probability as a limiting relative frequency. It follows from these assumptions that probabilities are objective and that you cannot make probabilistic statements about parameters because they are fixed. Bayesian methods offer an alternative approach; they treat parameters as random variables and define probability as a degree of belief (that is, the probability of an event is the degree to which you believe the event is true). It follows from these postulates that probabilities are subjective and that you can make probability statements about parameters. The term Bayesian comes from the prevalent usage of Bayes’ theorem, which was named after the Reverend Thomas Bayes, an eighteenth-century Presbyterian minister. Bayes was interested in solving the question of inverse probability: after observing a collection of events, what is the probability of one event?

Suppose you are interested in estimating $\theta $ from data $\mb {y} = \{ y_1, \ldots , y_n \} $ by using a statistical model described by a density $p( \mb {y}|\theta )$. Bayesian philosophy states that $\theta $ cannot be determined exactly, and uncertainty about the parameter is expressed through probability statements and distributions. For example, you can say that $\theta $ follows a normal distribution with mean 0 and variance 1 if you believe that this distribution best describes the uncertainty associated with the parameter. The following steps describe the essential elements of Bayesian inference:

  1. A probability distribution for $\theta $ is formulated as $\pi (\theta )$, which is known as the prior distribution, or just the prior. The prior distribution expresses your beliefs (for example, on the mean, the spread, the skewness, and so forth) about the parameter before you examine the data.

  2. Given the observed data $\mb {y}$, you choose a statistical model $p(\mb {y}|\theta )$ to describe the distribution of $\mb {y}$ given $\theta $.

  3. You update your beliefs about $\theta $ by combining information from the prior distribution and the data through the calculation of the posterior distribution, $p(\theta | \mb {y})$.

The third step is carried out by using Bayes’ theorem, which enables you to combine the prior distribution and the model in the following way:

\[  p(\theta | \mb {y}) = \frac{p(\theta , \mb {y})}{p(\mb {y})} = \frac{p(\mb {y} | \theta )\pi (\theta )}{p(\mb {y})}= \frac{p(\mb {y} | \theta )\pi (\theta )}{\int p(\mb {y} | \theta ) \pi (\theta )d\theta }  \]

The quantity

\[ p(\mb {y}) = \int p(\mb {y}|\theta ) \pi (\theta ) d\theta  \]

is the normalizing constant of the posterior distribution. This quantity $p(\mb {y})$ is also the marginal distribution of $\mb {y}$, and it is sometimes called the marginal distribution of the data. The likelihood function of $\theta $ is any function proportional to $p(\mb {y}|\theta )$; that is, $L(\theta ) \propto p(\mb {y}|\theta ) $. Another way of writing Bayes’ theorem is as follows:

\[  p(\theta | \mb {y}) = \frac{L (\theta ) \pi (\theta )}{\int L (\theta ) \pi (\theta ) d \theta }  \]

The marginal distribution $p(\mb {y})$ is an integral. As long as the integral is finite, the particular value of the integral does not provide any additional information about the posterior distribution. Hence, $p(\theta |\mb {y})$ can be written up to an arbitrary constant, presented here in proportional form as:

\[  p(\theta | \mb {y}) \propto L(\theta ) \pi (\theta )  \]
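
As a worked illustration of the proportional form, consider a setup that is assumed here and does not come from the text above: $\mb {y}$ consists of $n$ independent Bernoulli trials with $s$ successes and success probability $\theta $, and the prior is a beta distribution with hypothetical parameters $a$ and $b$. Dropping every factor that does not involve $\theta $ gives

\[  L(\theta ) \propto \theta ^{s} (1-\theta )^{n-s}, \qquad \pi (\theta ) \propto \theta ^{a-1} (1-\theta )^{b-1}  \]

\[  p(\theta | \mb {y}) \propto L(\theta ) \pi (\theta ) \propto \theta ^{a+s-1} (1-\theta )^{b+n-s-1}  \]

The right-hand side is the kernel of a beta distribution with parameters $a+s$ and $b+n-s$, so the posterior is identified without ever evaluating $p(\mb {y})$. For instance, with $a = b = 1$ (a uniform prior), $s = 7$, and $n = 10$, the posterior is a beta distribution with parameters 8 and 4, whose mean is $8/12 \approx 0.667$.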

Simply put, Bayes’ theorem tells you how to update existing knowledge with new information. You begin with a prior belief $\pi (\theta )$, and after learning information from data $\mb {y}$, you change or update your belief about $\theta $ and obtain $p(\theta | \mb {y})$. These are the essential elements of the Bayesian approach to data analysis.
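
The three steps can be sketched numerically for a one-parameter problem. The following Python code is only an illustration under assumed inputs: binomial data (7 successes in 10 trials), a uniform prior on $(0, 1)$, and a simple midpoint-rule grid in place of the integral for $p(\mb {y})$. The function names `grid_posterior` and `uniform_prior` are hypothetical and are not part of any SAS procedure.

```python
def grid_posterior(successes, trials, prior_pdf, grid_size=1001):
    """Approximate p(theta | y) on an evenly spaced grid over (0, 1)."""
    # Midpoints of grid_size equal-width cells, so 0 and 1 are never evaluated.
    step = 1.0 / grid_size
    thetas = [(i + 0.5) * step for i in range(grid_size)]

    # Step 1: prior density pi(theta) at each grid point.
    prior = [prior_pdf(t) for t in thetas]

    # Step 2: binomial likelihood L(theta), known only up to a constant.
    like = [t**successes * (1.0 - t) ** (trials - successes) for t in thetas]

    # Step 3: multiply, then divide by the midpoint-rule estimate of the
    # normalizing integral so that the result integrates to 1.
    unnormalized = [l * p for l, p in zip(like, prior)]
    normalizer = sum(unnormalized) * step
    posterior = [u / normalizer for u in unnormalized]
    return thetas, posterior, step


def uniform_prior(theta):
    """Assumed prior for this sketch: uniform density on (0, 1)."""
    return 1.0


thetas, post, step = grid_posterior(successes=7, trials=10, prior_pdf=uniform_prior)
post_mean = sum(t * p for t, p in zip(thetas, post)) * step
print(f"approximate posterior mean: {post_mean:.4f}")
```

Under these assumptions the posterior is a beta distribution with parameters 8 and 4, so the grid estimate of the posterior mean should be close to $8/12$. Because the likelihood enters only up to a constant, rescaling `like` by any positive factor leaves `post` unchanged. A grid like this is practical only in very low dimensions.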

In theory, Bayesian methods offer a conceptually simple approach to statistical inference: all inferences follow from the posterior distribution $p(\theta |\mb {y})$. In practice, however, you can obtain the posterior distribution with straightforward analytical solutions only in the most rudimentary problems. Most Bayesian analyses require sophisticated computations, including the use of simulation methods. You generate samples from the posterior distribution and use these samples to estimate the quantities of interest. PROC MCMC uses a self-tuning Metropolis algorithm (see the section Metropolis and Metropolis-Hastings Algorithms). The GENMOD, LIFEREG, and PHREG procedures use the Gibbs sampler (see the section Gibbs Sampler). The BCHOICE and FMM procedures use a combination of the Gibbs sampler and a latent variable sampler. An important aspect of any analysis is assessing the convergence of the Markov chains. Inferences based on nonconverged Markov chains can be both inaccurate and misleading.
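
To make the idea of sampling from the posterior concrete, the following Python sketch implements a plain random-walk Metropolis sampler with a fixed proposal scale. It does not reproduce the self-tuning algorithm in PROC MCMC or the samplers in any other procedure. The target is an assumed example (binomial data with 7 successes in 10 trials and a uniform prior on $(0, 1)$), and the function names, the proposal standard deviation, the burn-in length, and the seed are all hypothetical choices made for illustration.

```python
import math
import random


def log_unnormalized_posterior(theta, successes, trials):
    """log of L(theta) * pi(theta) for binomial data and a uniform prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # outside the support of the assumed prior
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)


def metropolis(log_target, n_samples, start=0.5, proposal_sd=0.2,
               burn_in=1000, seed=0):
    """Random-walk Metropolis with a symmetric normal proposal."""
    rng = random.Random(seed)
    current = start
    current_lp = log_target(current)
    draws = []
    for i in range(n_samples + burn_in):
        proposal = current + rng.gauss(0.0, proposal_sd)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        # Only this ratio is needed, so p(y) never has to be computed.
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:
            draws.append(current)  # keep draws only after the burn-in period
    return draws


draws = metropolis(lambda t: log_unnormalized_posterior(t, 7, 10), n_samples=20000)
mcmc_mean = sum(draws) / len(draws)
print(f"posterior mean estimated from draws: {mcmc_mean:.3f}")
```

Under the assumed setup the exact posterior mean is $8/12$, which gives a rough check on the simulated draws. Agreement with a known answer is not a substitute for convergence diagnostics in problems where no closed form is available.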

Both Bayesian and classical methods have their advantages and disadvantages. From a practical point of view, your choice of method depends on what you want to accomplish with your data analysis. If you have prior information (either expert opinion or historical knowledge) that you want to incorporate into the analysis, then you should consider Bayesian methods. In addition, if you want to communicate your findings in terms of probability notions that can be more easily understood by nonstatisticians, Bayesian methods might be appropriate. The Bayesian paradigm can often provide a framework for answering specific scientific questions that a single point estimate cannot sufficiently address. Alternatively, if you are interested only in estimating parameters based on the likelihood, then numerical optimization methods, such as the Newton-Raphson method, can give you very precise estimates and there is no need to use a Bayesian analysis. For further discussions of the relative advantages and disadvantages of Bayesian analysis, see the section Bayesian Analysis: Advantages and Disadvantages.