The QLIM Procedure

Marginal Likelihood

Bayes' theorem states that

\begin{equation*} p(\theta | \mb{y})\propto \pi (\theta ) L(y|\theta ) \end{equation*}

where $\theta $ is a vector of parameters and $\pi (\theta )$ is the product of the prior densities that are specified in the PRIOR statement. The term $L(y|\theta )$ is the likelihood that is associated with the MODEL statement. The function $\pi (\theta ) L(y|\theta )$ is the nonnormalized posterior distribution over the parameter vector $\theta $. The normalized posterior distribution (simply, the posterior distribution) is

\begin{equation*} p(\theta | \mb{y})= \frac{\pi (\theta ) L(y|\theta )}{\int _{\theta }\pi (\theta ) L(y|\theta )d\theta } \end{equation*}

The denominator $m(y)=\int _{\theta }\pi (\theta ) L(y|\theta )d\theta $ (also called the “marginal likelihood”) is a quantity of interest because it represents the probability of the data after the effect of the parameter vector has been averaged out. Because of its interpretation, the marginal likelihood can be used in various applications, including model averaging, variable selection, and model selection.

A natural estimate of the marginal likelihood is provided by the harmonic mean of the likelihood values that are evaluated at the posterior draws,

\begin{equation*} m(y)=\left\{ \frac{1}{n}\sum \limits _{i=1}^{n}\frac{1}{L(y|\theta _ i)}\right\} ^{-1} \end{equation*}

where $\theta _ i$ is a draw from the posterior distribution. In practical applications, this estimator has proven to be unstable: the sum is dominated by the few draws that have the smallest likelihood values, and the variance of the estimator can be very large or even infinite.
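
As an illustration, the following Python sketch (the function name is hypothetical) computes the harmonic mean estimate on the log scale, which avoids overflow when some likelihood values are very small; it assumes that $\log L(y|\theta _ i)$ has already been evaluated at each posterior draw:

    import numpy as np

    def log_harmonic_mean(loglik):
        """Harmonic mean estimate of log m(y) from log L(y|theta_i) at n posterior draws.

        This is the formula above on the log scale:
        log m(y) = log n - logsumexp(-loglik).
        """
        neg = -np.asarray(loglik)
        c = neg.max()  # shift for a numerically stable log-sum-exp
        return np.log(len(neg)) - (c + np.log(np.exp(neg - c).sum()))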

An alternative and more stable estimator can be obtained with an importance sampling scheme. The auxiliary distribution for the importance sampler can be chosen by the cross-entropy method (Chan and Eisenstat 2015). In particular, given a parametric family of distributions, the auxiliary density function is chosen to be the one closest, in terms of the Kullback-Leibler divergence, to the probability density that would give a zero-variance estimate of the marginal likelihood. In practical terms, this is equivalent to the following algorithm (a code sketch of the steps appears after the list):

  1. Choose a parametric family, $f(\cdot |\beta )$, for the parameters of the model: $f(\theta |\beta )$.

  2. Compute the maximum likelihood estimate $\hat{\beta }_{mle}$ of $\beta $ by using the posterior samples $\theta _1,\ldots ,\theta _ n$ as data.

  3. Use $f(\theta ^{*}|\hat{\beta }_{mle})$ to generate the importance samples $\theta ^{*}_1,\ldots ,\theta ^{*}_{n^{*}}$.

  4. Estimate the marginal likelihood:

    \begin{equation*} m(y)=\frac{1}{n^{*}}\sum \limits _{j=1}^{n^{*}}\frac{L(y|\theta ^{*}_ j)\pi (\theta ^{*}_ j)}{f(\theta ^{*}_ j|\hat{\beta }_{mle})} \end{equation*}
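
As a concrete illustration of steps 1 through 4, the following sketch uses a multivariate Gaussian family directly on an unbounded parameter vector (the treatment of bounded parameters is described next). The callables `log_lik` and `log_prior`, which return $\log L(y|\theta )$ and $\log \pi (\theta )$, are assumptions, and the estimate is returned on the log scale:

    import numpy as np
    from scipy.stats import multivariate_normal

    def log_marginal_likelihood(theta_draws, log_lik, log_prior, n_star=10000, seed=1):
        """Cross-entropy importance sampling estimate of log m(y).

        theta_draws : (n, d) array of posterior draws, d >= 2 (the step 2 data).
        """
        # Step 2: the Gaussian MLE is the sample mean and covariance of the draws
        # (np.cov's n-1 divisor differs from the exact MLE by a negligible factor).
        mu = theta_draws.mean(axis=0)
        cov = np.cov(theta_draws, rowvar=False)
        aux = multivariate_normal(mean=mu, cov=cov)
        # Step 3: generate importance samples from the fitted auxiliary density.
        samples = aux.rvs(size=n_star, random_state=seed)
        # Step 4: average L(y|theta*) pi(theta*) / f(theta*) on the log scale.
        log_w = np.array([log_lik(t) + log_prior(t) for t in samples]) - aux.logpdf(samples)
        c = log_w.max()
        return c + np.log(np.exp(log_w - c).mean())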

The parametric family for the auxiliary distribution is chosen to be Gaussian. The parameters that are subject to bounds are transformed as follows (a code sketch of the transformations appears after the list):

  • If $-\infty <\theta <\infty $, then $p=\theta $.

  • If $m\leq \theta <\infty $, then $q=\log (\theta -m)$.

  • If $-\infty <\theta \leq M$, then $r=\log (M-\theta )$.

  • If $m\leq \theta \leq M$, then $s=\log (\theta -m)-\log (M-\theta )$.
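
The following sketch collects the four cases in one hypothetical helper, in which `lower` and `upper` play the roles of $m$ and $M$:

    import numpy as np

    def to_unbounded(theta, lower=-np.inf, upper=np.inf):
        """Map a bounded parameter to the real line (cases p, q, r, and s above)."""
        if np.isinf(lower) and np.isinf(upper):
            return theta                                         # p = theta
        if np.isinf(upper):
            return np.log(theta - lower)                         # q = log(theta - m)
        if np.isinf(lower):
            return np.log(upper - theta)                         # r = log(M - theta)
        return np.log(theta - lower) - np.log(upper - theta)     # s: bounded on both sides

Note that when the Gaussian auxiliary distribution is specified on the transformed scale, the density $f(\theta ^{*}|\hat{\beta }_{mle})$ that appears in step 4 is the implied density on the original scale, which includes the Jacobian of the transformation.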

Assuming independence for the parameters that are subject to bounds, the auxiliary distribution to generate importance samples is

\begin{equation*} \begin{pmatrix} \mb{p} \\ \mb{q} \\ \mb{r} \\ \mb{s} \end{pmatrix} \sim \mb{N} \left[ \begin{pmatrix} \mu _ p \\ \mu _ q \\ \mu _ r \\ \mu _ s \end{pmatrix}, \begin{pmatrix} \Sigma _ p & 0 & 0 & 0 \\ 0 & \Sigma _ q & 0 & 0 \\ 0 & 0 & \Sigma _ r & 0 \\ 0 & 0 & 0 & \Sigma _ s \end{pmatrix} \right] \end{equation*}

where $\mb{p}$, $\mb{q}$, $\mb{r}$, and $\mb{s}$ are vectors that contain the transformations of the unbounded, bounded-below, bounded-above, and bounded-above-and-below parameters, respectively. The independence assumption applies only to the parameters that are subject to bounds: $\Sigma _ p$ can be a nondiagonal matrix, whereas $\Sigma _ q$, $\Sigma _ r$, and $\Sigma _ s$ are assumed to be diagonal matrices.
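
Given fitted values for each block (for example, from step 2 of the algorithm), the stacked mean vector and the block-diagonal covariance matrix can be assembled as in the following sketch. The names are hypothetical: `sigma_p` is the possibly nondiagonal covariance of the unbounded block, and `var_q`, `var_r`, and `var_s` are the per-parameter variances of the bounded blocks:

    import numpy as np
    from scipy.linalg import block_diag

    def auxiliary_params(mu_p, mu_q, mu_r, mu_s, sigma_p, var_q, var_r, var_s):
        """Stack the block means and build the block-diagonal covariance shown above."""
        mu = np.concatenate([mu_p, mu_q, mu_r, mu_s])
        cov = block_diag(sigma_p, np.diag(var_q), np.diag(var_r), np.diag(var_s))
        return mu, cov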