PROC PLM: Posterior Inference for Binomial Data :: SAS/STAT(R) 9.3 User's Guide

Example 68.4 Posterior Inference for Binomial Data

This example demonstrates how you can use PROC PLM to perform posterior inference from a Bayesian analysis. The data for this example are taken from Weisberg (1985) and concern the effect of small electrical currents on farm animals. The ultimate goal of the experiment was to understand the effects of high-voltage power lines on livestock and to better protect farm animals. Seven cows and six shock intensities were used in two experiments. In one experiment, each cow was given 30 electrical shocks with five at each shock intensity in random order. The number of shock responses was recorded for each cow at each shock level. The experiment was then repeated to investigate whether the response diminished due to fatigue of cows, or due to learning. So each cow received a total of 60 shocks. For the following analysis, the cow difference is ignored. The following DATA step lists the data where the variable current represents the shock level, the variable response represents the number of shock responses, the variable trial represents the total number of trials at each shock level, and the variable experiment represents the experiment number (1 for the initial experiment and 2 for the repeated one):

data cow;
   input current response trial experiment;
   datalines;
0  0 35 1
0  0 35 2
1  6 35 1
1  3 35 2
2 13 35 1
2  8 35 2
3 26 35 1
3 21 35 2
4 33 35 1
4 27 35 2
5 34 35 1
5 29 35 2
;

Suppose you are interested in modeling the distribution of the shock response based on the level of the current and the experiment number. You can use the GENMOD procedure to fit a frequentist logistic model for the data. However, if you have some prior information about parameter estimates, you can fit a Bayesian logistic regression model to take this prior information into account. In this case, suppose you believe the logit of response has a positive association with the shock level but you are uncertain about the ranges of other regression coefficients. To incorporate this prior information in the regression model, you can use the following statements:

data prior;
    input _type_$ current;
    datalines;
mean 100
var   50
;

proc genmod data=cow;
   class experiment;
   bayes coeffprior=normal(input=prior) seed=1;
   model response/trial = current|experiment / dist=binomial;
   store cowgmd;
   title 'Bayesian Logistic Model on Cow';
run;

The DATA step before the GENMOD procedure creates a data set prior that specifies the prior distribution information for current, which in this case is a normal distribution with mean $\text{[math]}$ and variance $\text{[math]}$ . This reflects a rough belief in a positive coefficient in a moderate range for current. The prior distribution parameters are not specified for experiment and the interaction between experiment and current, and so PROC GENMOD assigns a default prior for them, which is a normal distribution with mean $\text{[math]}$ and variance 1E6.

After the DATA step, the BAYES statement in PROC GENMOD specifies that the regression coefficients follow a normal distribution with mean and variance specified in the input data set named prior. It also specifies 1 as the seed for the random number generator in the simulation of the posterior sample. The MODEL statement requests a logistic regression model with a logit link. The STORE statement requests that the fitted results be saved into an item store named cowgmd.

The convergence diagnostics in the output of PROC GENMOD indicate that the Markov chain has converged. Output 68.4.1 displays summaries on the posterior sample of the regression coefficients. The posterior mean for the intercept is $\text{[math]}$ with a 95% HPD interval $\text{[math]}$ . The posterior mean of the coefficient for current is $\text{[math]}$ with a 95% HPD interval $\text{[math]}$ , which indicates a positive association between the logit of response and the shock level. Further investigation about whether shock reaction was different between two experiment is warranted.

Output 68.4.1 Posterior Summaries on the Bayesian Logistic Model

Bayesian Logistic Model on Cow

The GENMOD Procedure

Bayesian Analysis

Posterior Summaries
Parameter	N	Mean	Standard Deviation	Percentiles
Parameter	N	Mean	Standard Deviation	25%	50%	75%
Intercept	10000	-3.5857	0.4822	-3.9014	-3.5704	-3.2553
current	10000	1.1893	0.1536	1.0833	1.1843	1.2893
experiment1	10000	0.00727	0.7025	-0.4483	0.00849	0.4879
experiment1current	10000	0.3695	0.2529	0.1977	0.3651	0.5332

Posterior Intervals
Parameter	Alpha	Equal-Tail Interval		HPD Interval
Intercept	0.050	-4.5814	-2.6799	-4.5226	-2.6303
current	0.050	0.9016	1.5047	0.8950	1.4946
experiment1	0.050	-1.4347	1.3517	-1.4390	1.3439
experiment1current	0.050	-0.1134	0.8802	-0.1105	0.8809

Bayesian model fitting typically involves a large amount of simulation. Using the item store and PROC PLM, you do not need to refit the model to perform further posterior inference. Suppose you want to determine whether the shock reaction for the current level is different between the two experiments. You can use PROC PLM with the ESTIMATE statement in the following statements:

proc plm source=cowgmd;
   estimate
   'Diff at current 0' experiment 1 -1 current*experiment [1, 0 1] [-1, 0 2],
   'Diff at current 1' experiment 1 -1 current*experiment [1, 1 1] [-1, 1 2],
   'Diff at current 2' experiment 1 -1 current*experiment [1, 2 1] [-1, 2 2],
   'Diff at current 3' experiment 1 -1 current*experiment [1, 3 1] [-1, 3 2],
   'Diff at current 4' experiment 1 -1 current*experiment [1, 4 1] [-1, 4 2],
   'Diff at current 5' experiment 1 -1 current*experiment [1, 5 1] [-1, 5 2]
    / exp cl;
run;

Each line in the ESTIMATE statement compares the fits between the two groups at each current level. The nonpositional syntax is used for the interaction effect current*experiment. For example, the first line requests coefficient 1 for the interaction effect at current level 0 for the initial experiment, and coefficient –1 for the effect at current level 0 for the repeated experiment. The two terms are then added to derive the difference. For more details about the nonpositional syntax, see Positional and Nonpositional Syntax for Coefficients in Linear Functions of Chapter 19, Shared Concepts and Topics.

The EXP option exponentiates log odds ratios to produce odds ratios. The CL option requests that confidence limits be constructed for both log odds ratios and odds ratios. Output 68.4.2 lists the posterior sample estimates for differences between experiments at different current levels.

Output 68.4.2 Comparisons between Experiments at Different Current Levels

Bayesian Logistic Model on Cow

The PLM Procedure

Sample Estimates
Label	N	Estimate	Standard Deviation	Percentiles			Alpha	Lower HPD	Upper HPD	Exponentiated	Standard Deviation of Exponentiated	Percentiles for Exponentiated			Lower HPD of Exponentiated	Upper HPD of Exponentiated
Label	N	Estimate	Standard Deviation	25th	50th	75th	Alpha	Lower HPD	Upper HPD	Exponentiated	Standard Deviation of Exponentiated	25th	50th	75th	Lower HPD of Exponentiated	Upper HPD of Exponentiated
Diff at current 0	10000	0.007272	0.7025	-0.4483	0.00849	0.4879	0.05	-1.4390	1.3439	1.2811	0.974564	0.6387	1.0085	1.6289	0.07387	3.1001
Diff at current 1	10000	0.3767	0.4840	0.0590	0.3802	0.7051	0.05	-0.6113	1.3007	1.6362	0.824141	1.0608	1.4626	2.0240	0.4184	3.2783
Diff at current 2	10000	0.7462	0.3207	0.5316	0.7500	0.9581	0.05	0.1091	1.3665	2.2202	0.730518	1.7017	2.1170	2.6066	0.9713	3.6418
Diff at current 3	10000	1.1156	0.3151	0.9023	1.1113	1.3253	0.05	0.5205	1.7407	3.2082	1.052458	2.4652	3.0383	3.7632	1.4772	5.3149
Diff at current 4	10000	1.4851	0.4729	1.1681	1.4739	1.7943	0.05	0.5601	2.4287	4.9514	2.602392	3.2157	4.3661	6.0152	1.3250	9.9922
Diff at current 5	10000	1.8546	0.6899	1.3925	1.8382	2.3004	0.05	0.4712	3.1885	8.1917	7.099208	4.0250	6.2849	9.9777	0.8604	20.2432

The sample statistics are constructed from the posterior sample saved in the item store cowgmd. From the output, the odds of a cow showing shock reaction at level 0 in the initial experiment is 1.2811 (with a 95% HPD interval ( $\text{[math]}$ , $\text{[math]}$ )) times the odds in the repeated experiment. The HPD interval for the odds ratio is constructed based on the mean and variance of the sample of the exponentiated log odds ratios, instead of based on the exponentiated mean and variance of the posterior sample of log odds ratios. The HPD interval suggests that there is not much evidence that the cows responded differently at current level 0 between the two experiments. Similar conclusions can be drawn for current level 1, 2, and 5. However, there is strong evidence that cows responded differently at current level 3 and 4 between the two experiments. The possible explanation is that, if the current level is so small that cows could hardly feel it or the current level is so strong that cows could hardly bear it, cows would respond consistently in the two experiment. If the current level is moderate, cows might get used to it and their response diminished in the repeated experiment.

You can visualize the distribution of the posterior sample of log odds ratios by specifying the PLOTS= option in the ESTIMATE statement. In the following statements, ODS Graphics is enabled by the ODS GRAPHICS ON statement, the PLOTS=BOXPLOT option requests a box plot of posterior distribution of log odds ratios. The suboption ORIENT=HORIZONTAL specifies a horizontal orientation of the boxes.

ods graphics on;
proc plm source=cowgmd;
   estimate
   'Diff at current 0' experiment 1 -1 current*experiment [1, 0 1] [-1, 0 2],
   'Diff at current 1' experiment 1 -1 current*experiment [1, 1 1] [-1, 1 2],
   'Diff at current 2' experiment 1 -1 current*experiment [1, 2 1] [-1, 2 2],
   'Diff at current 3' experiment 1 -1 current*experiment [1, 3 1] [-1, 3 2],
   'Diff at current 4' experiment 1 -1 current*experiment [1, 4 1] [-1, 4 2],
   'Diff at current 5' experiment 1 -1 current*experiment [1, 5 1] [-1, 5 2]
    / plots=boxplot(orient=horizontal);
run;
ods graphics off;

Output 68.4.3 displays the box plot of the posterior sample of log odds ratios. The two boxes for differences at current level 3 and 4 show that the corresponding log odds ratios are significantly larger than the reference value $\text{[math]}$ . This indicate that there is obvious evidence that the probability of cow response is larger in the initial experiment than in the repeated one at the two current levels. The other four boxes show that the corresponding log odds ratios are not significantly different from 0, which suggests that there is no obvious reaction difference at current level 0, 1, 2, and 5 between the two experiments.

Output 68.4.3 Box Plot of Difference between Two Experiments