The Bayes theorem states that

$$p(\theta \mid y) \propto \pi(\theta)\, L(y \mid \theta)$$

where $\theta$ is a vector of parameters and $\pi(\theta)$ is the product of the prior densities, which are specified in the PRIOR statement. The term $L(y \mid \theta)$ is the likelihood associated with the MODEL statement. The function $\pi(\theta)\, L(y \mid \theta)$ is the nonnormalized posterior distribution over the parameter vector $\theta$. The normalized posterior distribution, or simply the posterior distribution, is

$$p(\theta \mid y) = \frac{\pi(\theta)\, L(y \mid \theta)}{\int \pi(\theta)\, L(y \mid \theta)\, d\theta}$$

The denominator $m(y) = \int \pi(\theta)\, L(y \mid \theta)\, d\theta$, also called the "marginal likelihood," is a quantity of interest because it represents the probability of the data after the effect of the parameter vector has been averaged out. Due to its interpretation, the marginal likelihood can be used in various applications, including model averaging and variable or model selection.
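For intuition, the following toy example (a hypothetical one-parameter Poisson count model with a normal prior on the log mean; it is an illustration, not PROC COUNTREG code) computes the marginal likelihood by quadrature, showing that $m(y)$ is just the likelihood averaged over the prior:

```python
# Hypothetical one-parameter example: Poisson counts with a normal prior
# on theta = log(mean). The marginal likelihood m(y) is the integral of
# pi(theta) * L(y | theta) over theta, approximated here by quadrature.
import numpy as np
from scipy import stats, integrate

y = np.array([0, 1, 1, 2, 3, 1, 0, 2])      # toy count data (assumed)
prior = stats.norm(loc=0.0, scale=1.0)       # prior on theta = log(mean)

def unnormalized_posterior(theta):
    lam = np.exp(theta)
    loglik = stats.poisson.logpmf(y, lam).sum()  # log L(y | theta)
    return np.exp(loglik) * prior.pdf(theta)     # pi(theta) * L(y | theta)

# Marginal likelihood: the denominator that normalizes the posterior
m_y, _ = integrate.quad(unnormalized_posterior, -10.0, 10.0)
print("marginal likelihood m(y) =", m_y)
```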
A natural estimate of the marginal likelihood is provided by the harmonic mean,

$$\hat{m}(y) = \left[ \frac{1}{S} \sum_{s=1}^{S} \frac{1}{L(y \mid \theta_s)} \right]^{-1}$$

where $\theta_s$ is a sample draw from the posterior distribution. This estimator has proven to be unstable in practical applications.
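As a minimal sketch (not PROC COUNTREG code), the harmonic mean estimate can be computed on the log scale from a vector of posterior log-likelihood values; the input loglik is assumed to hold $\log L(y \mid \theta_s)$ for each posterior draw:

```python
# Harmonic mean estimate of the marginal likelihood from posterior draws.
import numpy as np
from scipy.special import logsumexp

def log_marginal_harmonic_mean(loglik):
    """log m(y) ~= -log( (1/S) * sum_s exp(-loglik[s]) ), evaluated with
    logsumexp on the log scale for numerical stability."""
    loglik = np.asarray(loglik, dtype=float)
    S = loglik.size
    return -(logsumexp(-loglik) - np.log(S))

# Toy usage with simulated log-likelihood values (illustration only):
rng = np.random.default_rng(1)
loglik = -50.0 + rng.normal(scale=2.0, size=5_000)
print("log m(y) estimate:", log_marginal_harmonic_mean(loglik))
```

The instability mentioned above arises because the sum is dominated by the posterior draws that have the smallest likelihood values, so the estimate can have a very large (even infinite) variance.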
An alternative and more stable estimator can be obtained by using an importance sampling scheme. The auxiliary distribution for the importance sampler can be chosen through the cross-entropy theory (Chan and Eisenstat 2015). In particular, given a parametric family of distributions, the auxiliary density function is chosen to be the one closest, in terms of the Kullback-Leibler divergence, to the probability density that would give a zero-variance estimate of the marginal likelihood. In practical terms, this is equivalent to the following algorithm (see the sketch after the list):
1. Choose a parametric family of densities, $f(\theta; v)$, indexed by the parameter $v$, for the parameters of the model $\theta$.
2. Evaluate the maximum likelihood estimator of $v$ by using the posterior samples as data: $$\hat{v} = \arg\max_{v} \frac{1}{S} \sum_{s=1}^{S} \log f(\theta_s; v)$$
3. Use $f(\theta; \hat{v})$ to generate the importance samples $\tilde{\theta}_1, \dots, \tilde{\theta}_M$.
4. Estimate the marginal likelihood: $$\hat{m}(y) = \frac{1}{M} \sum_{i=1}^{M} \frac{\pi(\tilde{\theta}_i)\, L(y \mid \tilde{\theta}_i)}{f(\tilde{\theta}_i; \hat{v})}$$
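The following sketch illustrates the four steps under simplifying assumptions: the posterior draws are already available, all parameters are unbounded, and the parametric family is multivariate Gaussian, so step 2 reduces to the sample mean and covariance of the draws. The function name log_marginal_ce_is and the callback log_posterior_kernel (which returns $\log \pi(\theta) + \log L(y \mid \theta)$) are hypothetical placeholders, not PROC COUNTREG interfaces:

```python
# Cross-entropy importance-sampling estimate of the log marginal likelihood.
import numpy as np
from scipy import stats
from scipy.special import logsumexp

def log_marginal_ce_is(posterior_draws, log_posterior_kernel, n_is=10_000, seed=0):
    """posterior_draws: (S, d) array of posterior samples of theta.
    log_posterior_kernel(theta): log pi(theta) + log L(y | theta) for one theta.
    Returns an estimate of log m(y)."""
    draws = np.asarray(posterior_draws, dtype=float).reshape(len(posterior_draws), -1)
    # Steps 1-2: Gaussian parametric family; its MLE given the posterior
    # draws is the sample mean and sample covariance.
    mu = draws.mean(axis=0)
    Sigma = np.cov(draws, rowvar=False)
    aux = stats.multivariate_normal(mean=mu, cov=Sigma)
    # Step 3: draw importance samples from the fitted auxiliary density.
    rng = np.random.default_rng(seed)
    theta_is = aux.rvs(size=n_is, random_state=rng).reshape(n_is, -1)
    # Step 4: average the ratios [prior * likelihood] / auxiliary density.
    log_num = np.array([log_posterior_kernel(t) for t in theta_is])
    log_den = aux.logpdf(theta_is)
    return logsumexp(log_num - log_den) - np.log(n_is)
```

Bounded parameters reduce to this unbounded case once they are mapped through the transformations described next.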
The parametric family for the auxiliary distribution is chosen to be Gaussian. The parameters that are subject to bounds are transformed accordingly (see the sketch after this list):

- If $-\infty < \theta_i < \infty$, then $\tau_i = \theta_i$.
- If $a_i < \theta_i < \infty$, then $\tau_i = \log(\theta_i - a_i)$.
- If $-\infty < \theta_i < b_i$, then $\tau_i = \log(b_i - \theta_i)$.
- If $a_i < \theta_i < b_i$, then $\tau_i = \log\!\left( \dfrac{\theta_i - a_i}{b_i - \theta_i} \right)$.
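The four cases can be written compactly as follows (an illustrative Python helper; the name to_unbounded is hypothetical, and the bounds $a_i$ and $b_i$ are passed as lower and upper):

```python
import numpy as np

def to_unbounded(theta, lower=-np.inf, upper=np.inf):
    """Map a bounded parameter value to the unbounded scale tau used by the
    Gaussian auxiliary family (the four cases listed above)."""
    if np.isinf(lower) and np.isinf(upper):
        return theta                                   # unbounded
    if np.isinf(upper):
        return np.log(theta - lower)                   # bounded below
    if np.isinf(lower):
        return np.log(upper - theta)                   # bounded above
    return np.log((theta - lower) / (upper - theta))   # bounded below and above

# Example: a variance-type parameter bounded below at 0
print(to_unbounded(2.5, lower=0.0))              # log(2.5)
# Example: a probability-type parameter bounded in (0, 1)
print(to_unbounded(0.8, lower=0.0, upper=1.0))   # logit(0.8)
```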
Assuming independence for the parameters that are subject to bounds, the auxiliary distribution that is used to generate the importance samples is

$$f(\theta; \hat{v}) = N\!\left(\tau^{u}; \mu^{u}, \Sigma^{u}\right)\, N\!\left(\tau^{lb}; \mu^{lb}, \Sigma^{lb}\right)\, N\!\left(\tau^{ub}; \mu^{ub}, \Sigma^{ub}\right)\, N\!\left(\tau^{lub}; \mu^{lub}, \Sigma^{lub}\right)$$

where $\tau^{u}$, $\tau^{lb}$, $\tau^{ub}$, and $\tau^{lub}$ are vectors that contain the transformations of the unbounded, bounded-below, bounded-above, and bounded-above-and-below parameters, respectively. Also, given the imposed independence structure, $\Sigma^{u}$ can be a nondiagonal matrix, whereas $\Sigma^{lb}$, $\Sigma^{ub}$, and $\Sigma^{lub}$ are constrained to be diagonal matrices.
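Under this independence structure, the auxiliary log density is a sum of Gaussian log densities over the blocks, with a full covariance matrix only for the unbounded block. The sketch below is a hypothetical illustration of that structure; the function name, argument layout, and block shapes are assumptions:

```python
# Log density of a blockwise Gaussian auxiliary distribution: a
# full-covariance Gaussian for the unbounded block and independent
# (diagonal-covariance) Gaussians for each bounded block.
import numpy as np
from scipy import stats

def aux_logpdf(tau_u, tau_blocks, mu_u, Sigma_u, block_params):
    """tau_u: transformed unbounded parameters (full covariance Sigma_u).
    tau_blocks / block_params: lists of transformed bounded blocks and their
    (mean vector, variance vector) pairs, treated as independent."""
    logp = stats.multivariate_normal(mean=mu_u, cov=Sigma_u).logpdf(tau_u)
    for tau, (mu, var) in zip(tau_blocks, block_params):
        logp += stats.norm(loc=mu, scale=np.sqrt(var)).logpdf(tau).sum()
    return logp

# Toy usage: two unbounded parameters and one bounded-below parameter
mu_u = np.array([0.0, 1.0])
Sigma_u = np.array([[1.0, 0.3], [0.3, 2.0]])
print(aux_logpdf(np.array([0.1, 0.9]),
                 [np.array([0.2])],          # tau for the bounded-below block
                 mu_u, Sigma_u,
                 [(np.array([0.0]), np.array([1.5]))]))
```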
Table 12.4 through Table 12.9 show all the distribution density functions that PROC COUNTREG recognizes. You specify these distribution densities in the PRIOR statement.
Table 12.4: Beta Distribution
| PRIOR statement | BETA(SHAPE1=a, SHAPE2=b, MIN=m, MAX=M) |
|---|---|
| Note | Commonly $m = 0$ and $M = 1$. |
| Density | $\dfrac{(\theta - m)^{a-1}\, (M - \theta)^{b-1}}{B(a, b)\, (M - m)^{a+b-1}}$ |
| Parameter restriction | $a > 0$, $b > 0$, $m < M$ |
| Range | $m \le \theta \le M$ |
| Mean | $\dfrac{a}{a + b}(M - m) + m$ |
| Variance | $\dfrac{ab\, (M - m)^2}{(a + b)^2 (a + b + 1)}$ |
| Mode | $\dfrac{a - 1}{a + b - 2}(M - m) + m$ for $a > 1$ and $b > 1$ |
| Defaults | SHAPE1=SHAPE2=1, MIN=, MAX= |
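The density in Table 12.4 is the standard beta density rescaled to the interval $[m, M]$; assuming the parameterization shown in the table, it corresponds to scipy.stats.beta with loc=m and scale=M-m, as this illustrative check shows:

```python
# The beta prior on [m, M] as a shifted and scaled standard beta.
import numpy as np
from scipy import stats
from scipy.special import beta as beta_fn

a, b, m, M = 2.0, 3.0, -1.0, 4.0
theta = 1.2

table_density = ((theta - m)**(a - 1) * (M - theta)**(b - 1)
                 / (beta_fn(a, b) * (M - m)**(a + b - 1)))
dist = stats.beta(a, b, loc=m, scale=M - m)
print(np.isclose(table_density, dist.pdf(theta)))                        # True
print(np.isclose(a / (a + b) * (M - m) + m, dist.mean()))                # mean
print(np.isclose(a * b * (M - m)**2
                 / ((a + b)**2 * (a + b + 1)), dist.var()))              # variance
```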
Table 12.5: Gamma Distribution
| PRIOR statement | GAMMA(SHAPE=a, SCALE=b) |
|---|---|
| Density | $\dfrac{1}{b^{a}\, \Gamma(a)}\, \theta^{a-1} e^{-\theta / b}$ |
| Parameter restriction | $a > 0$, $b > 0$ |
| Range | $0 < \theta < \infty$ |
| Mean | $ab$ |
| Variance | $ab^{2}$ |
| Mode | $(a - 1)\, b$ for $a \ge 1$ |
| Defaults | SHAPE=SCALE=1 |
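In the parameterization of Table 12.5, SCALE=b is a scale (not a rate) parameter; under that assumption it corresponds to scipy.stats.gamma(a, scale=b), as this illustrative check shows:

```python
# Gamma prior with shape a and scale b: mean a*b, variance a*b**2.
import numpy as np
from scipy import stats

a, b = 3.0, 2.0
dist = stats.gamma(a, scale=b)
print(np.isclose(dist.mean(), a * b))       # True
print(np.isclose(dist.var(), a * b**2))     # True
# Mode (a - 1) * b for a >= 1, located numerically from the density:
grid = np.linspace(1e-6, 30.0, 200_001)
print(np.isclose(grid[np.argmax(dist.pdf(grid))], (a - 1) * b, atol=1e-3))
```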
Table 12.6: Inverse Gamma Distribution
| PRIOR statement | IGAMMA(SHAPE=a, SCALE=b) |
|---|---|
| Density | $\dfrac{b^{a}}{\Gamma(a)}\, \theta^{-(a+1)} e^{-b / \theta}$ |
| Parameter restriction | $a > 0$, $b > 0$ |
| Range | $0 < \theta < \infty$ |
| Mean | $\dfrac{b}{a - 1}$ for $a > 1$ |
| Variance | $\dfrac{b^{2}}{(a - 1)^{2} (a - 2)}$ for $a > 2$ |
| Mode | $\dfrac{b}{a + 1}$ |
| Defaults | SHAPE=2.000001, SCALE=1 |
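Assuming the parameterization shown in Table 12.6, the inverse gamma density matches scipy.stats.invgamma(a, scale=b); note that the default SHAPE=2.000001 is just above 2, so the prior mean and variance both exist and the variance is very large (a diffuse prior). An illustrative check:

```python
# Inverse gamma prior with shape a and scale b.
import numpy as np
from scipy import stats

a, b = 2.000001, 1.0                   # the defaults shown in Table 12.6
dist = stats.invgamma(a, scale=b)
print(np.isclose(dist.mean(), b / (a - 1)))                             # ~1
print(np.isclose(dist.var(), b**2 / ((a - 1)**2 * (a - 2)), rtol=1e-3)) # ~1e6
# Mode b / (a + 1), located numerically from the density:
grid = np.linspace(1e-4, 5.0, 500_001)
print(np.isclose(grid[np.argmax(dist.pdf(grid))], b / (a + 1), atol=1e-4))
```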
Table 12.7: Normal Distribution
| PRIOR statement | NORMAL(MEAN=$\mu$, VAR=$\sigma^2$) |
|---|---|
| Density | $\dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\dfrac{(\theta - \mu)^2}{2\sigma^2} \right)$ |
| Parameter restriction | $\sigma^2 > 0$ |
| Range | $-\infty < \theta < \infty$ |
| Mean | $\mu$ |
| Variance | $\sigma^2$ |
| Mode | $\mu$ |
| Defaults | MEAN=0, VAR=1000000 |
Table 12.8: t Distribution
| PRIOR statement | T(LOCATION=$\mu$, DF=$\nu$) |
|---|---|
| Density | $\dfrac{\Gamma\!\left(\frac{\nu + 1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right) \sqrt{\nu \pi}} \left[ 1 + \dfrac{(\theta - \mu)^2}{\nu} \right]^{-\frac{\nu + 1}{2}}$ |
| Parameter restriction | $\nu > 0$ |
| Range | $-\infty < \theta < \infty$ |
| Mean | $\mu$ for $\nu > 1$ |
| Variance | $\dfrac{\nu}{\nu - 2}$ for $\nu > 2$ |
| Mode | $\mu$ |
| Defaults | LOCATION=0, DF=3 |
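The t prior in Table 12.8 has a location parameter but no scale parameter, so its spread is controlled only by DF; assuming that parameterization, it corresponds to scipy.stats.t(df, loc=location), as this illustrative check shows:

```python
# t prior with location mu and degrees of freedom nu (no scale parameter).
import numpy as np
from scipy import stats

mu, nu = 1.5, 5.0
dist = stats.t(df=nu, loc=mu)
print(np.isclose(dist.mean(), mu))               # mean = mu for nu > 1
print(np.isclose(dist.var(), nu / (nu - 2)))     # variance = nu/(nu-2) for nu > 2
# The density peaks at the location parameter:
grid = np.linspace(mu - 5.0, mu + 5.0, 100_001)
print(np.isclose(grid[np.argmax(dist.pdf(grid))], mu, atol=1e-4))
```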
Table 12.9: Uniform Distribution
| PRIOR statement | UNIFORM(MIN=m, MAX=M) |
|---|---|
| Density | $\dfrac{1}{M - m}$ |
| Parameter restriction | $m < M$ |
| Range | $m \le \theta \le M$ |
| Mean | $\dfrac{m + M}{2}$ |
| Variance | $\dfrac{(M - m)^2}{12}$ |
| Mode | Not unique |
| Defaults | MIN=, MAX= |