The COUNTREG Procedure

Marginal Likelihood

Bayes’ theorem states that

\begin{equation*} p(\theta | \mb{y})\propto \pi (\theta ) L(y|\theta ) \end{equation*}

where $\theta $ is a vector of parameters and $\pi (\theta )$ is the product of the prior densities, which are specified in the PRIOR statement. The term $L(y|\theta )$ is the likelihood associated with the MODEL statement. The function $\pi (\theta ) L(y|\theta )$ is the nonnormalized posterior distribution over the parameter vector $\theta $. The normalized posterior distribution, or simply the posterior distribution, is

\begin{equation*} p(\theta | \mb{y})= \frac{\pi (\theta ) L(y|\theta )}{\int _{\theta }\pi (\theta ) L(y|\theta )d\theta } \end{equation*}

The denominator $m(y)=\int _{\theta }\pi (\theta ) L(y|\theta )d\theta $, also called the “marginal likelihood,” is a quantity of interest because it represents the probability of the data after the effect of the parameter vector has been averaged out. Because of this interpretation, the marginal likelihood is used in various applications, including model averaging and variable or model selection.
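
For illustration only (this example is not part of the procedure’s computations), suppose the data are $N$ independent Poisson counts with rate $\theta $ and the prior is the Gamma distribution of Table 12.5 with shape $a$ and scale $b$. The integral then has a closed form,

\begin{equation*} m(y)=\int _{0}^{\infty }\frac{\theta ^{a-1}e^{-\theta /b}}{b^{a}\Gamma (a)}\prod _{i=1}^{N}\frac{e^{-\theta }\theta ^{y_ i}}{y_ i!}\, d\theta =\frac{\Gamma \left(a+\sum _ i y_ i\right)}{b^{a}\Gamma (a)\prod _ i y_ i!}\left(N+\frac{1}{b}\right)^{-\left(a+\sum _ i y_ i\right)} \end{equation*}

For the models that PROC COUNTREG fits, no such closed form is available in general, so the marginal likelihood must be estimated from the posterior sample.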

A natural estimate of the marginal likelihood is provided by the harmonic mean,

\begin{equation*} m(y)=\left\{ \frac{1}{n}\sum \limits _{i=1}^{n}\frac{1}{L(y|\theta _ i)}\right\} ^{-1} \end{equation*}

where $\theta _ i$ is a sample draw from the posterior distribution. This estimator has proven to be unstable in practical applications.
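
As a sketch only (Python rather than SAS syntax, with a hypothetical function name), the harmonic mean estimate can be computed on the log scale from the log-likelihood values of the posterior draws; working with logsumexp avoids overflow when individual likelihood values are very small:

```python
import numpy as np
from scipy.special import logsumexp

def harmonic_mean_log_ml(loglik):
    """Harmonic-mean estimate of log m(y).

    loglik : array of log L(y | theta_i) evaluated at the posterior draws theta_i.
    Returns the log of { (1/n) * sum_i 1/L(y|theta_i) }^{-1}.
    """
    loglik = np.asarray(loglik, dtype=float)
    n = loglik.size
    # log( (1/n) * sum_i exp(-loglik_i) ) = logsumexp(-loglik) - log n
    return -(logsumexp(-loglik) - np.log(n))
```

The instability arises because the sum is dominated by the few draws that have the smallest likelihood values, so the estimate can change drastically as more draws are collected.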

An alternative and more stable estimator can be obtained by using an importance sampling scheme. The auxiliary distribution for the importance sampler can be chosen through cross-entropy theory (Chan and Eisenstat 2015). In particular, given a parametric family of distributions, the auxiliary density function is chosen to be the one closest, in terms of the Kullback-Leibler divergence, to the probability density that would give a zero-variance estimate of the marginal likelihood. In practical terms, this is equivalent to the following algorithm (a code sketch follows the list):

  1. Choose a parametric family, $f(.,\beta )$, for the parameters of the model: $f(\theta |\beta )$

  2. Evaluate the maximum likelihood estimator of $\beta $ by using the posterior samples $\theta _1,\ldots ,\theta _ n$ as data

  3. Use $f(\theta ^{*}|\hat{\beta }_{mle})$ to generate the importance samples: $\theta ^{*}_1,\ldots ,\theta ^{*}_{n^{*}}$

  4. Estimate the marginal likelihood:

    \begin{equation*} m(y)=\frac{1}{n^{*}}\sum \limits _{j=1}^{n^{*}}\frac{L(y|\theta ^{*}_ j)\pi (\theta ^{*}_ j)}{f(\theta ^{*}_ j|\hat{\beta }_{mle})} \end{equation*}
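
The following is a minimal sketch of steps 1 through 4 in Python (not SAS), under two assumptions that are not in the text above: the posterior draws are already expressed on an unconstrained scale, and log_lik and log_prior are user-supplied functions that return $\log L(y|\theta )$ and $\log \pi (\theta )$. For a Gaussian family, step 2 reduces to the sample mean and (maximum likelihood) covariance of the draws.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def cross_entropy_log_ml(post_draws, log_lik, log_prior, n_star=10_000, seed=1):
    """Importance-sampling estimate of log m(y) with a Gaussian auxiliary density.

    post_draws : (n, d) array of posterior draws, d >= 2, on an unconstrained scale
    log_lik, log_prior : callables that return log L(y|theta) and log pi(theta)
    """
    rng = np.random.default_rng(seed)
    # Step 2: Gaussian MLE fitted to the posterior draws (sample mean and covariance).
    mu = post_draws.mean(axis=0)
    sigma = np.cov(post_draws, rowvar=False, bias=True)
    aux = multivariate_normal(mean=mu, cov=sigma)
    # Step 3: generate the importance samples theta*_1, ..., theta*_{n*}.
    theta_star = aux.rvs(size=n_star, random_state=rng)
    # Step 4: average the weights L(y|theta*) pi(theta*) / f(theta*) in log space.
    log_w = np.array([log_lik(t) + log_prior(t) for t in theta_star]) - aux.logpdf(theta_star)
    return logsumexp(log_w) - np.log(n_star)
```

The estimate is returned on the log scale; averaging the importance weights through logsumexp rather than exponentiating them individually keeps the computation numerically stable.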

The parametric family for the auxiliary distribution is chosen to be Gaussian. The parameters that are subject to bounds are transformed as follows:

  • If $-\infty <\theta <\infty $, then $p=\theta $.

  • If $m\leq \theta <\infty $, then $q=\log (\theta -m)$.

  • If $-\infty <\theta \leq M$, then $r=\log (M-\theta )$.

  • If $m\leq \theta \leq M$, then $s=\log (\theta -m)-\log (M-\theta )$.

Assuming independence among the parameters that are subject to bounds, the auxiliary distribution that is used to generate the importance samples is

\begin{equation*} \begin{pmatrix} \mb{p} \\ \mb{q} \\ \mb{r} \\ \mb{s} \end{pmatrix} \sim \mb{N} \left[ \begin{pmatrix} \mu _ p \\ \mu _{q} \\ \mu _{r} \\ \mu _{s} \\ \end{pmatrix}, \begin{pmatrix} \Sigma _ p & 0 & 0 & 0 \\ 0 & \Sigma _ q & 0 & 0 \\ 0 & 0 & \Sigma _ r & 0 \\ 0 & 0 & 0 & \Sigma _ s \\ \end{pmatrix} \right] \end{equation*}

where $\mb{p}$, $\mb{q}$, $\mb{r}$, and $\mb{s}$ are vectors that contain the transformations of the unbounded, bounded-below, bounded-above, and bounded-above-and-below parameters, respectively. Given this independence structure, $\Sigma _ p$ can be a nondiagonal matrix, whereas $\Sigma _ q$, $\Sigma _ r$, and $\Sigma _ s$ are constrained to be diagonal matrices.
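
A sketch of these transformations and their inverses (Python, with hypothetical helper names); the inverse maps a value on the real line back to the original bounded range:

```python
import numpy as np

def to_unbounded(theta, m=-np.inf, M=np.inf):
    """Map a parameter with bounds m <= theta <= M to the real line."""
    if np.isinf(m) and np.isinf(M):
        return theta                                   # p = theta
    if np.isinf(M):
        return np.log(theta - m)                       # q = log(theta - m)
    if np.isinf(m):
        return np.log(M - theta)                       # r = log(M - theta)
    return np.log(theta - m) - np.log(M - theta)       # s = log(theta-m) - log(M-theta)

def to_bounded(z, m=-np.inf, M=np.inf):
    """Inverse mapping, from the real line back to the bounded range."""
    if np.isinf(m) and np.isinf(M):
        return z
    if np.isinf(M):
        return m + np.exp(z)
    if np.isinf(m):
        return M - np.exp(z)
    return m + (M - m) / (1.0 + np.exp(-z))            # logistic inverse of s
```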

Standard Distributions

Table 12.4 through Table 12.9 show all the distribution density functions that PROC COUNTREG recognizes. You specify these distribution densities in the PRIOR statement.

Table 12.4: Beta Distribution

PRIOR statement: BETA(SHAPE1=a, SHAPE2=b, MIN=m, MAX=M)
Note: Commonly $m=0$ and $M=1$.
Density: $\frac{(\theta -m)^{a-1} (M-\theta )^{b-1}}{B(a,b)(M-m)^{a+b-1}}$
Parameter restriction: $a>0$, $b>0$, $-\infty <m<M<\infty $
Range: $ \left\{  \begin{array}{ll} \left[ m, M \right] &  \mbox{when } a = 1, b = 1 \\ \left[ m, M \right) &  \mbox{when } a = 1, b \neq 1 \\ \left( m, M \right] &  \mbox{when } a \neq 1, b = 1 \\ \left( m, M \right) &  \mbox{otherwise} \end{array} \right. $
Mean: $ \frac{a}{a+b}\times (M-m)+m$
Variance: $ \frac{ab}{(a+b)^2(a+b+1)}\times (M-m)^2$
Mode: $ \left\{  \begin{array}{ll} \frac{a-1}{a+b-2}\times M+\frac{b-1}{a+b-2}\times m &  a > 1, b > 1 \\ m \mbox{ and } M &  a < 1, b < 1 \\ m &  \left\{  \begin{array}{l} a < 1, b \geq 1 \\ a = 1, b > 1 \\ \end{array} \right. \\ M &  \left\{  \begin{array}{l} a \geq 1, b < 1 \\ a > 1, b = 1 \\ \end{array} \right. \\ \mbox{not unique} &  a = b = 1 \end{array} \right. $
Defaults: SHAPE1=SHAPE2=1, MIN $\rightarrow -\infty $, MAX $\rightarrow \infty $
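
As an aside that is not part of the procedure documentation: the density in Table 12.4 is a beta distribution rescaled to $(m, M)$, which corresponds to scipy.stats.beta with loc=m and scale=M-m, so the Mean and Variance formulas can be checked numerically:

```python
from scipy.stats import beta

a, b, m, M = 2.0, 5.0, -1.0, 3.0        # hypothetical parameter values
dist = beta(a, b, loc=m, scale=M - m)   # density supported on (m, M)

print(dist.mean(), a / (a + b) * (M - m) + m)                            # both 0.142857...
print(dist.var(), a * b / ((a + b) ** 2 * (a + b + 1)) * (M - m) ** 2)   # both 0.408163...
```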


Table 12.5: Gamma Distribution

PRIOR statement: GAMMA(SHAPE=a, SCALE=b)
Density: $\frac{1}{b^ a\Gamma (a)} \theta ^{a-1} e^{-\theta /b} $
Parameter restriction: $ a > 0, b > 0 $
Range: $[0,\infty )$
Mean: $ab$
Variance: $ab^2$
Mode: $(a-1)b$
Defaults: SHAPE=SCALE=1


Table 12.6: Inverse Gamma Distribution

PRIOR statement: IGAMMA(SHAPE=a, SCALE=b)
Density: $ \frac{b^ a}{\Gamma (a)} \theta ^{-(a+1)}e^{-b/\theta } $
Parameter restriction: $ a > 0, b > 0$
Range: $ 0<\theta <\infty $
Mean: $\frac{b}{a-1},\qquad a > 1$
Variance: $\frac{b^2}{(a-1)^2(a-2)},\qquad a>2$
Mode: $ \frac{b}{a+1}$
Defaults: SHAPE=2.000001, SCALE=1


Table 12.7: Normal Distribution

PRIOR statement: NORMAL(MEAN=$\mu $, VAR=$\sigma ^2$)
Density: $ \frac{1}{\sigma \sqrt {2\pi }} \exp \left( - \frac{(\theta - \mu )^2}{2\sigma ^2}\right) $
Parameter restriction: $ \sigma ^2 > 0 $
Range: $ -\infty <\theta <\infty $
Mean: $\mu $
Variance: $\sigma ^2$
Mode: $\mu $
Defaults: MEAN=0, VAR=1000000


Table 12.8: t Distribution

PRIOR statement: T(LOCATION=$\mu $, DF=$\nu $)
Density: $\frac{\Gamma \left(\frac{\nu +1}{2}\right)}{\Gamma \left(\frac{\nu }{2}\right)\sqrt {\pi \nu }}\left[1+\frac{(\theta -\mu )^2}{\nu }\right]^{-\frac{\nu +1}{2}} $
Parameter restriction: $ \nu > 0 $
Range: $ -\infty <\theta <\infty $
Mean: $\mu , \text { for }\nu >1$
Variance: $\frac{\nu }{\nu -2}, \text { for }\nu >2$
Mode: $\mu $
Defaults: LOCATION=0, DF=3


Table 12.9: Uniform Distribution

PRIOR statement: UNIFORM(MIN=m, MAX=M)
Density: $ \frac{1}{M-m}$
Parameter restriction: $-\infty <m<M<\infty $
Range: $ \theta \in [m, M]$
Mean: $ \frac{m+M}{2} $
Variance: $\frac{(M-m)^2}{12}$
Mode: Not unique
Defaults: MIN $\rightarrow -\infty $, MAX $\rightarrow \infty $