The MCMC Procedure

Blocking of Parameters

In a multivariate parameter model, if all $k$ parameters are proposed with one joint distribution $q(\cdot \mid \cdot)$, they are all accepted or rejected together. This can be rather inefficient, especially when parameters have vastly different scales. A way to avoid this difficulty is to allocate the $k$ parameters into $d$ blocks and update them separately. The PARMS statement specifies model parameters. It also assigns parameters to blocks: each PARMS statement defines one block, and the blocks are updated sequentially in the procedure.

Suppose that you want to sample from a multivariate distribution with probability density function $p(\theta \mid y)$, where $\theta = \{\theta^1, \theta^2, \ldots, \theta^k\}$. Now suppose that these $k$ parameters are separated into $d$ blocks, for example, $p(\theta \mid y) = f_d(z)$ where $z = \{z_1, z_2, \ldots, z_d\}$, where each $z_j$ contains a nonempty subset of the $\{\theta^i\}$, and where each $\theta^i$ is contained in one and only one $z_j$. In the MCMC context, the $z_j$'s are blocks of parameters. In the blocked algorithm, a proposal is composed of several parts. Instead of proposing a simultaneous move for all the $\theta^i$'s, a proposal is made for the $\theta^i$'s in $z_1$ only, then for the $\theta^i$'s in $z_2$, and so on for $d$ subproposals. Any accepted proposal can involve any number of the blocks moving; the parameters do not necessarily all move at once, as they do in the all-at-once Metropolis algorithm.

Formally, the blocked Metropolis algorithm is as follows. Let $w_j$ be the collection of $\theta^i$ that are in block $z_j$, and let $q_j(\cdot \mid w_j)$ be a symmetric multivariate distribution centered at the current values of $w_j$.

1. Let $t = 0$. Choose a starting point $w_j^t$ for every block $j$. This can be an arbitrary point as long as $p(w_j^t \mid y) > 0$.
2. For $j = 1, \ldots, d$:
   a. Generate a new sample, $w_{j,\text{new}}$, using the proposal distribution $q_j(\cdot \mid w_j^t)$.
   b. Calculate the following quantity:

      $r = \min\left\{ \frac{p(w_{j,\text{new}} \mid w_1^{t+1}, \ldots, w_{j-1}^{t+1}, w_{j+1}^{t}, \ldots, w_d^{t}, y)}{p(w_j^{t} \mid w_1^{t+1}, \ldots, w_{j-1}^{t+1}, w_{j+1}^{t}, \ldots, w_d^{t}, y)}, 1 \right\}$
   c. Sample $u$ from the uniform distribution $U(0, 1)$.
   d. Set $w_j^{t+1} = w_{j,\text{new}}$ if $u < r$; set $w_j^{t+1} = w_j^t$ otherwise.
3. Set $t = t + 1$. If $t < T$, where $T$ is the number of desired samples, go back to Step 2; otherwise, stop.
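
As a rough illustration of the steps above, the following Python sketch runs a blocked random-walk Metropolis sampler using only the standard library. It is not PROC MCMC's implementation: the function name `blocked_metropolis`, its arguments, the Gaussian proposal with a single fixed `scale`, and the three-parameter standard normal target are all assumptions made for the example. Because the other blocks are held fixed during a subproposal, the ratio of conditional densities equals the ratio of joint densities, so the sketch evaluates only the joint log density.

```python
import math
import random


def blocked_metropolis(log_density, init, blocks, scale, n_samples, seed=0):
    """Blocked random-walk Metropolis (illustrative sketch, not PROC MCMC).

    log_density -- log of the (possibly unnormalized) target density
    init        -- starting point; its density must be positive
    blocks      -- lists of parameter indices; each index in exactly one block
    scale       -- standard deviation of the symmetric Gaussian proposal
    """
    rng = random.Random(seed)
    current = list(init)
    current_lp = log_density(current)
    accepted = [0] * len(blocks)
    samples = []
    for _ in range(n_samples):
        for j, block in enumerate(blocks):
            # Propose new values for the parameters in block j only.
            proposal = list(current)
            for i in block:
                proposal[i] = current[i] + rng.gauss(0.0, scale)
            proposal_lp = log_density(proposal)
            # Acceptance ratio r = min(density ratio, 1), computed on the log scale.
            r = math.exp(min(proposal_lp - current_lp, 0.0))
            # Accept the subproposal if a uniform draw falls below r.
            if rng.random() < r:
                current, current_lp = proposal, proposal_lp
                accepted[j] += 1
        samples.append(list(current))
    return samples, [count / n_samples for count in accepted]


def log_target(theta):
    # Hypothetical target: three independent standard normal parameters.
    return -0.5 * sum(x * x for x in theta)


# Two blocks: parameters 0 and 1 move together, parameter 2 moves alone.
samples, rates = blocked_metropolis(
    log_target, init=[0.0, 0.0, 0.0], blocks=[[0, 1], [2]], scale=1.0, n_samples=5000
)
print("acceptance rate per block:", rates)
```

Each pass of the outer loop corresponds to one sweep over all blocks, so a single stored sample can reflect moves in some blocks and rejections in others.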

With PROC MCMC, you can sample all parameters simultaneously by putting them all in a single PARMS statement. You can sample parameters individually by putting each parameter in its own PARMS statement. Or you can sample certain subsets of parameters together by grouping each subset in its own PARMS statement. For example, if the model you are interested in has five parameters, alpha, beta, gamma, phi, and sigma, the all-at-once strategy is as follows:

parms alpha beta gamma phi sigma;

The one-at-a-time strategy is as follows:

parms alpha;
parms beta;
parms gamma;
parms phi;
parms sigma;

A two-block strategy could be as follows:

parms alpha beta gamma;
parms phi sigma;
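
Each of these strategies is a partition of the parameter set: every block is nonempty and every parameter belongs to exactly one block. The Python sketch below makes that requirement explicit. The helper `check_blocking` is hypothetical and exists only for this illustration; it is not part of PROC MCMC, and the lists simply mirror the PARMS statements shown above.

```python
def check_blocking(parameters, blocks):
    """Return True if blocks splits parameters into nonempty, disjoint blocks."""
    if any(len(block) == 0 for block in blocks):
        return False
    flat = [name for block in blocks for name in block]
    # Every parameter must appear exactly once across all blocks.
    return sorted(flat) == sorted(parameters)


params = ["alpha", "beta", "gamma", "phi", "sigma"]

# One inner list per PARMS statement.
all_at_once = [["alpha", "beta", "gamma", "phi", "sigma"]]
one_at_a_time = [[name] for name in params]
two_block = [["alpha", "beta", "gamma"], ["phi", "sigma"]]

for blocks in (all_at_once, one_at_a_time, two_block):
    print(len(blocks), "block(s), valid:", check_blocking(params, blocks))
```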

The exceptions to the previously described blocking strategies are parameters that use a conjugate sampler and array-based parameters (parameters that have multivariate prior distributions). In these cases, each parameter is updated by itself, regardless of which PARMS statement block it belongs to.

One of the greatest challenges in MCMC sampling is achieving good mixing of the chains: the chains should quickly traverse the support of the stationary distribution. A number of factors determine the behavior of a Metropolis sampler. Blocking is one of them, so choose the blocking design carefully. Generally speaking, forming blocks of parameters has its advantages, but a larger block does not necessarily lead to faster convergence.

When simultaneously sampling a large number of parameters, the algorithm might find it difficult to achieve good mixing. As the number of parameters gets large, proposed samples are much more likely to fall well into the tails of the target distribution, producing a test ratio that is too small. As a result, few proposed values are accepted and convergence is slow. On the other hand, when sampling each parameter individually, the chain might mix far too slowly because the conditional distributions (of $\theta^i$ given all other parameters) might be very "narrow." Hence, it takes a long time for the chain to fully explore that dimension alone. There are no theoretical results that can help determine an optimal "blocking" for an arbitrary parametric model. A rule followed in practice is to form small groups of correlated parameters that belong to the same context in the formulation of the model. The best mixing is usually obtained with a blocking strategy somewhere between the all-at-once and one-at-a-time strategies.
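
The first effect described above, acceptance rates falling as more parameters are proposed at once, can be seen in a small simulation. The Python sketch below is a toy under stated assumptions: the helper `acceptance_rate`, the standard normal target, and the fixed proposal scale of 1 are chosen for illustration and do not reflect how PROC MCMC tunes its proposals. It proposes all parameters in a block jointly and reports the fraction of proposals accepted.

```python
import math
import random


def acceptance_rate(k, scale=1.0, n_iter=4000, seed=1):
    """Fraction of accepted proposals when all k parameters move at once.

    Toy setting: the target is k independent standard normals and the
    proposal is a Gaussian random walk with a fixed standard deviation.
    """
    rng = random.Random(seed)
    # Start from a draw from the target so the chain begins in a typical region.
    current = [rng.gauss(0.0, 1.0) for _ in range(k)]
    current_lp = -0.5 * sum(x * x for x in current)
    accepted = 0
    for _ in range(n_iter):
        proposal = [x + rng.gauss(0.0, scale) for x in current]
        proposal_lp = -0.5 * sum(x * x for x in proposal)
        # Test ratio on the log scale, capped at log(1) = 0.
        if rng.random() < math.exp(min(proposal_lp - current_lp, 0.0)):
            current, current_lp = proposal, proposal_lp
            accepted += 1
    return accepted / n_iter


block_sizes = [1, 5, 20]
rates = [acceptance_rate(k) for k in block_sizes]
for k, rate in zip(block_sizes, rates):
    print(f"block size {k:2d}: acceptance rate {rate:.3f}")
```

With the scale held fixed, the acceptance rate should drop noticeably as the block grows. Shrinking the scale restores acceptance, but each accepted move is then smaller, which is the trade-off that motivates intermediate block sizes.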