SAS Institute. The Power to Know

FOCUS AREAS

SAS/STAT Examples

Bayesian Hierarchical Poisson Regression Model for Overdispersed Count Data


Contents | SAS Program

Overview

Overdispersion occurs when count data appear more dispersed than expected under a reference model. Overdispersion can be caused by positive correlation among the observations, an incorrect model, an incorrect distributional specification, or incorrect variance functions. This example uses the MCMC procedure to fit a Bayesian hierarchical Poisson regression model to overdispersed count data. It displays how Bayesian hierarchical Poisson regression models are effective in capturing overdispersion and providing a better fit.

Analysis

Count data frequently display overdispersion (more variation than expected from a standard parametric model). Breslow ( 1984 ) and others discuss these types of models and suggest several different ways to model them. Hierarchical Poisson models have been found effective in capturing the overdispersion in data sets with extra Poisson variation.

Hierarchical Poisson regression models are expressed as Poisson models with a log link and a normal variance on the mean parameter. More formally, a hierarchical Poisson regression model is written as

for , , and .

Consider data collected by Margolin, Kaplan, and Zeiger ( 1981 ) and studied in Breslow ( 1984 ) from an Ames salmonella mutagenicity assay. Table 1 reports the number of revertant colonies of TA98 salmonella (SALM) tested at six dose levels of quinoline (DOSE) with three replicate plates.

Table 1 Salmonella Data Set

Doses of Quinoline ( g per plate)

0

10

33

100

333

1000

15

16

16

27

33

20

21

18

26

41

38

27

29

21

33

60

41

42

The following statements read the data into SAS and create the ASSAY data set. The variable models the mutagenic effect of quinoline. The value 10 is chosen because it is the smallest nonzero treatment level.

          data assay;
      input dose salm @@;
      l_dose10 = log(dose+10);
      plate = _n_;
      datalines;
      0 15    0 21    0 29 
     10 16   10 18   10 21 
     33 16   33 26   33 33 
    100 27  100 41  100 60 
    333 33  333 38  333 41 
   1000 20 1000 27 1000 42 
   ;
         

The data appear to be a likely candidate for a Bayesian hierarchical Poisson regression model with replicates. However, you usually begin with a standard analysis of a Poisson regression and evaluate any evidence for overdispersion.

Bayesian Poisson Regression Model

Suppose you want to fit a Bayesian Poisson regression model for the frequency of revertant colonies of TA98 Salmonella with density

( 1 )

for the plates, where represents the regression parameters and is the vector of covariates written as .

The likelihood function for each of the revertant colonies of salmonella and corresponding covariates is

( 2 )

where denotes a conditional probability mass function. The Poisson density is evaluated at the specified value of with a corresponding mean parameter . The three parameters, , , and , correspond to an intercept, the toxic effect for dose, and the mutagenic effect for dose, respectively.

Suppose the following prior distributions are placed on the parameters, where indicates a prior distribution:

The diffuse prior expresses your lack of knowledge about the regression parameters.

Using Bayes’ theorem, the likelihood function and prior distributions determine the posterior distribution of , and as follows:

PROC MCMC obtains samples from the desired posterior distribution. You do not need to specify the exact form of the posterior distribution.

The goodness-of-fit Pearson chi-square statistic as given in McCullagh and Nelder ( 1989 ) is calculated to assess model fit:

( 3 )

First, let and represent an expectation and a variance, respectively, for a Poisson likelihood where is defined in Equation 1 . If there is no overdispersion, the Pearson statistic will roughly equal the number of observations in the data set minus the number of parameters in the model.

The following SAS statements use the likelihood function and diffuse prior distributions to fit the Bayesian Poisson regression model and calculate the Pearson chi-square statistic.

           ods graphics on;
   proc mcmc data=assay seed=1181 nbi=10000 nmc=10000
      propcov=quanew monitor=(_parms_ Pearson) dic;
      parms beta1 2 beta2 0 beta3 0;
      prior beta: ~ normal(0,var=1000);   
      lambda = exp(beta1 + beta2*dose + beta3*l_dose10);
      model salm ~ poisson(lambda);
      if plate eq 1 then Pearson = 0;
      Pearson =  Pearson + ((salm - lambda)**2/lambda);
   run;
   ods graphics off;
          

The PROC MCMC statement invokes the procedure and specifies the input data set. The SEED= option specifies a seed for the random number generator (the seed guarantees the reproducibility of the random stream). The NBI= option specifies the number of burn-in iterations. The NMC= option specifies the number of posterior simulation iterations. The PROPCOV=QUANEW option uses the estimated inverse Hessian matrix as the initial proposal covariance matrix. The MONITOR=(_PARMS_ PEARSON) option outputs analysis on the _PARMS_ (which is shorthand for all model parameters) and the Pearson chi-square statistic. The DIC option specifies that the deviance information criterion (DIC) is requested.

The PARMS statement specifies the parameters in the model and assigns initial values to each of them. The PRIOR statements specify priors for all the parameters. The notation beta: in the PRIOR statement is shorthand for all variables that start with beta . The shorthand notation is not necessary, but it keeps your code succinct. The LAMBDA assignment statement calculates for each observation in the Poisson model, as given in Equation 1 . The MODEL statement specifies the likelihood function for SALM, as given in Equation 2 .

The next two lines of statements use SALM and LAMBDA (which is a function of sampled parameters) to calculate the Pearson chi-square statistic. The IF statement resets the value of PEARSON to zero at the top of the data set (that is, when the value of the data set variable PLATE is 1). As PROC MCMC cycles through the data set at each iteration, the procedure cumulatively adds the Pearson chi-square statistic over each value of SALM. By the end of the data set, you obtain the Pearson chi-square statistic, as defined in Equation 3 .

Figure 1 displays diagnostic plots to assess whether the Markov chains have converged.

Figure 1 Bayesian Poisson Model Diagnostic Plots for
Bayesian Poisson Model Diagnostic Plots for 1

The trace plot in Figure 1 indicates that the chain appears to have reached a stationary distribution. It also has good mixing and is dense. The autocorrelation plot indicates low autocorrelation and efficient sampling. Finally, the kernel density plot shows the smooth, unimodal shape of posterior marginal distribution for . The remaining diagnostic plots (not shown here) similarly indicate good convergence in the other parameters.

PROC MCMC produces formal diagnostic tests by default. They are omitted here since informal checks on the chains, autocorrelation, and posterior density plots displays stabilization and convergence.

Figure 2 reports summary and interval statistics for each parameter’s posterior distribution. PROC MCMC calculates the sampled value of the Pearson chi-square statistic at each iteration and produces its corresponding posterior summary statistics.

Figure 2 Posterior Model Summary of Poisson Regression
The MCMC Procedure

Posterior Summaries
Parameter N Mean Standard
Deviation
Percentiles
25% 50% 75%
beta1 10000 2.1779 0.2122 2.0384 2.1866 2.3162
beta2 10000 -0.00101 0.000236 -0.00117 -0.00101 -0.00085
beta3 10000 0.3183 0.0554 0.2812 0.3183 0.3550
Pearson 10000 49.4333 3.4943 46.8652 48.4389 50.9847

Posterior Intervals
Parameter Alpha Equal-Tail Interval HPD Interval
beta1 0.050 1.7487 2.5922 1.7558 2.5957
beta2 0.050 -0.00149 -0.00055 -0.00150 -0.00057
beta3 0.050 0.2100 0.4286 0.2165 0.4322
Pearson 0.050 45.5640 58.8233 45.3203 56.2570

With and three model parameters, the sampled value 49.4333 of the Pearson chi-square statistic is much greater than the desired , providing evidence of overdispersion. Although you have already captured the mutagenic and toxic effects, a Bayesian hierarchical Poisson model can also capture the observed excess variation due to the replicate plates.

Figure 3 reports the calculated DIC (Spiegelhalter et al.; 2002 ) for the Bayesian Poisson regression model. The DIC is a model assessment tool, and a Bayesian alternative to Akaike’s or Bayesian information criterion. The DIC can be applied to non-nested models and models that have non-iid data. When models with different DIC values are compared, a smaller DIC indicates a better fit to the data set. The DIC for this model is 141.961.

Figure 3 DIC of Poisson Regression
Deviance Information Criterion
Dbar (posterior mean of deviance) 139.110
Dmean (deviance evaluated at posterior mean) 136.260
pD (effective number of parameters) 2.851
DIC (smaller is better) 141.961

Bayesian Hierarchical Poisson Regression Model

In overdispersed Poisson regression, the parameter estimates do not vary much from the Poisson model, but the estimated variance is inflated. Draper ( 1996 ) considers Bayesian hierarchical Poisson regression models for this type of data with density

( 4 )

for the treatment levels and replicates. The likelihood function is given in Equation 2 , but now there are additional parameters in the model. You add random effects, which are equivalent to modeling with a normal distribution with mean and variance .

Suppose the priors are placed on the regression and variance parameters as follows:

The mean and variance, and , respectively, for the Pearson chi-square statistic in the hierarchical Poisson regression model are calculated using the iterated expectation and variance functions as:

( 5 )
( 6 )

The following SAS statements use the likelihood function and prior distributions to fit the Bayesian Poisson hierarchical regression model and calculate the Pearson chi-square statistic.

           proc mcmc data=assay seed=248601 nbi=10000 nmc=100000
      monitor=(beta1-beta3 Pearson) dic;
      array llambda[18];
    
      parms beta1 2 beta2 0.01 beta3 0.5;
      parms s2 1;
      parms (llambda:) 1;
     
      prior beta: ~ normal(0,var=1000);
      prior s2 ~ general(-log(s2));
      w = beta1 + beta2*dose + beta3*l_dose10;
      if plate eq 1 then lp = 0;
      lp = lp +  lpdfnorm(llambda[plate], w, sqrt(s2));
      prior llambda: ~ general(lp);
      lambda = exp(llambda[plate]);      
      model salm ~ poisson(lambda);
      
      mu = exp(w + s2/2);
      sigma2 = mu + (exp(s2) - 1)*(mu**2);
      if plate eq 1 then Pearson = 0;
      Pearson =  Pearson + ((salm - mu)**2/(sigma2));
   run;    
   ods graphics off;
          

The ARRAY statement associates a name with a list of variables and constants. The ARRAY statement enables easy referencing of the variables and corresponding values in the procedure. In this example, the ARRAY statement is used to create an array of parameters for the values of all 18 observations.

The first PARMS statement places all regression parameters in a single block and assigns them initial values. The second PARMS statement places the variance parameter in a separate block and assigns it an initial value of 1. The third PARMS statement places all the parameters in a separate block and assigns them an initial value of 1.

The PRIOR statement on the regression parameters does not change from the Poisson regression model. The PRIOR statement for uses the GENERAL function. The GENERAL function indicates that you are using a SAS statement to construct a nonstandard density or distribution. The argument is an expression which takes the value of the logarithm function of the prior or likelihood distribution. The noninformative prior on is not a standard distribution; therefore the GENERAL function is used here.

The W assignment statement calculates for each observation. The IF statement resets the value of LP to be zero at the top of the data set (that is, when the value of the data set variable PLATE is 1). As PROC MCMC cycles through the data set at each iteration, the procedure cumulatively adds the log density of the normally distributed with mean W and variance . The LP expression takes the value of the logarithm of the joint prior distribution of the s. The next PRIOR statement indicates that the is jointly distributed with LP as the value of the logarithm of the density. By the end of the data set, you obtain the sum of the log densities stored in the LP variable. The sum of the log densities is used as the input parameter in the GENERAL function in the PRIOR statement for the parameters.

The LAMBDA assignment statement calculates in the hierarchical Poisson regression model, as given in Equation 4 . The MODEL statement specifies the likelihood function for SALM, as given in Equation 2 .

The next four lines of statements use SALM and the sampled values of and to calculate the Pearson chi-square statistic according to Equation 3 . The moments are evaluated in the MU and SIGMA2 assignment variables according to Equation 5 and 6 , respectively.

The diagnostic plot for is illustrated in Figure 4 . It displays the desired convergence, low autocorrelation, and smooth unimodal marginal posterior density for the parameter. The remaining diagnostic plots (not shown here) similarly indicate good convergence in the other parameters.

Figure 4 Bayesian Hierarchical Poisson Regression Model Diagnostic Plot for
Bayesian Hierarchical Poisson Regression Model Diagnostic Plot for 1

Figure 5 reports summary and interval statistics for the parameters and for the Pearson chi-square statistic. The fit statistic in the Bayesian hierarchical Poisson regression model is greatly reduced with a value of 20.2052, which suggests a much better fit compared to its value of 49.4333 in the Poisson regression model. The value of the fit statistic is now much closer to the desired number of observations minus the number of parameters.

Compared to the results in Figure 2 , the parameter estimates for intercept and treatment change a small amount, their standard errors are inflated, and the confidence intervals are wider. Thus the treatment effect of quinoline does not decrease with the Bayesian hierarchical Poisson regression model.

Figure 5 Posterior Summary of Bayesian Hierarchical Poisson Regression Model
The MCMC Procedure

Posterior Summaries
Parameter N Mean Standard
Deviation
Percentiles
25% 50% 75%
beta1 100000 2.1561 0.3602 1.9293 2.1615 2.3909
beta2 100000 -0.00099 0.000431 -0.00126 -0.00099 -0.00071
beta3 100000 0.3150 0.0982 0.2514 0.3141 0.3778
Pearson 100000 20.2052 7.6027 14.6394 19.3144 24.8984

Posterior Intervals
Parameter Alpha Equal-Tail Interval HPD Interval
beta1 0.050 1.4256 2.8607 1.4468 2.8719
beta2 0.050 -0.00183 -0.00013 -0.00183 -0.00014
beta3 0.050 0.1225 0.5095 0.1210 0.5070
Pearson 0.050 7.9802 37.3088 6.8023 35.5336

Figure 3 reports the computed DIC for the Bayesian hierarchical Poisson regression model. The DIC for the hierarchical model is 123.329 and is smaller than the DIC for the Poisson regression model shown in Figure 3 . A smaller value of DIC suggests a better fit; you see that the hierarchical model provides a better fit to the data.

Figure 6 DIC of Poisson Regression
Deviance Information Criterion
Dbar (posterior mean of deviance) 110.733
Dmean (deviance evaluated at posterior mean) 98.136
pD (effective number of parameters) 12.596
DIC (smaller is better) 123.329

References

Breslow, N. E. (1984), “Extra Poisson Variation in Log-Linear Models,” Applied Statistics , 33, 38–44.

Draper, D. (1996), “Discussion of the Paper by Lee and Nelder,” Journal of the Royal Statistical Society, Series B , 58, 662–663.

Margolin, B. H., Kaplan, N., and Zeiger, E. (1981), “Statistical Analysis of the Ames Salmonella Microsome Test,” Proceedings of the National Academy of Science , 76, 3779–3783.

McCullagh, P. and Nelder, J. A. (1989), Generalized Linear Models , Second Edition, London: Chapman & Hall.

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van der Linde, A. (2002), “Bayesian Measures of Model Complexity and Fit,” Journal of the Royal Statistical Society, Series B , 64(4), 583–616, with discussion.