Usage Note 52285: Fitting the beta binomial model to overdispersed binomial data

The example titled "Overdispersion" in the LOGISTIC procedure documentation gives an example of overdispersed data. The data are the proportions (R out of N) of germinating seeds from two cultivars (CULT) that were planted in pots with two soil conditions (SOIL).

      data seeds;
         input pot n r cult soil;
         datalines;
       1 16     8      0       0
       2 51    26      0       0
       3 45    23      0       0
       4 39    10      0       0
       5 36     9      0       0
       6 81    23      1       0
       7 30    10      1       0
       8 39    17      1       0
       9 28     8      1       0
      10 62    23      1       0
      11 51    32      0       1
      12 72    55      0       1
      13 41    22      0       1
      14 12     3      0       1
      15 13    10      0       1
      16 79    46      1       1
      17 30    15      1       1
      18 51    32      1       1
      19 74    53      1       1
      20 56    12      1       1
      ;

These statements fit an ordinary logistic model to the binomial data using events/trials syntax. The SCALE=NONE option displays the Pearson and deviance goodness-of-fit statistics. Because the AGGREGATE= option is not specified, each observation is treated as a separate population.

      proc logistic data=seeds;
         model r/n = cult soil cult*soil / scale=none;
         run;

The similarity of the Pearson and deviance statistics suggests that there is sufficient replication for these statistics to be chi-square distributed. As a result, they can be used as tests of fit. The significance of both statistics (p<0.0001) indicates inadequate fit of the model. Since there are no additional variables or higher-order terms that can be added to the model to improve the fit, the significance of the statistics is taken as evidence of overdispersion.

Deviance and Pearson Goodness-of-Fit Statistics
Criterion      Value    DF    Value/DF    Pr > ChiSq
Deviance     68.3465    16      4.2717        <.0001
Pearson      66.7617    16      4.1726        <.0001

Analysis of Maximum Likelihood Estimates
Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1     -0.3788            0.1489             6.4730        0.0110
cult          1     -0.2956            0.2020             2.1412        0.1434
soil          1      0.9781            0.2128            21.1234        <.0001
cult*soil     1     -0.1239            0.2790             0.1973        0.6569
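
The Value/DF ratios and p-values in the goodness-of-fit table above can be checked by hand. The following DATA step is a minimal sketch, not part of the original note. It copies the two statistics from the table, takes 16 degrees of freedom (20 pots minus 4 model parameters), and recomputes the ratios and the upper-tail chi-square probabilities. The variable names are chosen here for illustration only.

      data _null_;
         /* values copied from the goodness-of-fit table */
         deviance = 68.3465;
         pearson  = 66.7617;
         df       = 16;                /* 20 observations - 4 parameters */

         /* dispersion ratios: roughly 4.27 and 4.17 */
         dev_ratio  = deviance / df;
         pear_ratio = pearson / df;

         /* upper-tail chi-square probabilities */
         p_dev  = sdf('chisq', deviance, df);
         p_pear = sdf('chisq', pearson, df);

         put dev_ratio= 8.4 pear_ratio= 8.4 p_dev= pvalue6.4 p_pear= pvalue6.4;
         run;

Ratios well above 1, as seen here, are consistent with data that are overdispersed relative to the binomial model.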

One approach to handling overdispersion is Williams' method, which is shown in the documentation example. An alternative is to fit a beta-binomial model. You can do this with very similar syntax in PROC FMM by specifying DIST=BETABINOMIAL in the MODEL statement. As in PROC LOGISTIC, you can use the OUTPUT statement to produce a data set of predicted values.

      proc fmm data=seeds;
         model r/n = cult soil cult*soil / dist=betabinomial;
         output out=preds pred=p_bb;
         run;

Accounting for the overdispersion in the data results in larger standard errors for the parameter estimates and therefore larger p-values. Notice the substantial increase in the p-value for SOIL, from <.0001 in the ordinary logistic model to 0.0223 here. Because neither CULT nor the CULT*SOIL interaction is significant, you might decide to remove both terms from the model.

Parameter Estimates for Beta-Binomial Model
Effect             Estimate    Standard Error    z Value    Pr > |z|
Intercept           -0.3922            0.2561      -1.53      0.1256
cult                -0.2322            0.3594      -0.65      0.5182
soil                 0.8498            0.3718       2.29      0.0223
cult*soil           -0.1088            0.5066      -0.21      0.8299
Scale Parameter     18.3264            8.4664
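
If you do decide to drop CULT and the CULT*SOIL interaction, the reduced beta-binomial model could be fit along the following lines. This is a sketch that reuses the syntax shown earlier. The output data set name PREDS_SOIL and the variable name P_BB_SOIL are hypothetical, and the results of this fit are not shown in this note.

      proc fmm data=seeds;
         /* SOIL is the only remaining effect */
         model r/n = soil / dist=betabinomial;
         output out=preds_soil pred=p_bb_soil;
         run;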

See this note for more discussion of overdispersion.
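
For comparison with the beta-binomial fit, Williams' method mentioned above is requested in PROC LOGISTIC with the SCALE=WILLIAMS option in the MODEL statement. The following is a minimal sketch that uses the same SEEDS data and model effects. Its output is not reproduced here.

      proc logistic data=seeds;
         /* SCALE=WILLIAMS requests Williams' method for overdispersion */
         model r/n = cult soil cult*soil / scale=williams;
         run;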



Operating System and Release Information

Product Family    Product     System    SAS Release (Reported, Fixed*)
SAS System        SAS/STAT    z/OS
Z64
OpenVMS VAX
Microsoft® Windows® for 64-Bit Itanium-based Systems
Microsoft Windows Server 2003 Datacenter 64-bit Edition
Microsoft Windows Server 2003 Enterprise 64-bit Edition
Microsoft Windows XP 64-bit Edition
Microsoft® Windows® for x64
OS/2
Microsoft Windows 8 Enterprise 32-bit
Microsoft Windows 8 Enterprise x64
Microsoft Windows 8 Pro 32-bit
Microsoft Windows 8 Pro x64
Microsoft Windows 8.1 Enterprise 32-bit
Microsoft Windows 8.1 Enterprise x64
Microsoft Windows 8.1 Pro
Microsoft Windows 8.1 Pro 32-bit
Microsoft Windows 95/98
Microsoft Windows 2000 Advanced Server
Microsoft Windows 2000 Datacenter Server
Microsoft Windows 2000 Server
Microsoft Windows 2000 Professional
Microsoft Windows NT Workstation
Microsoft Windows Server 2003 Datacenter Edition
Microsoft Windows Server 2003 Enterprise Edition
Microsoft Windows Server 2003 Standard Edition
Microsoft Windows Server 2003 for x64
Microsoft Windows Server 2008
Microsoft Windows Server 2008 R2
Microsoft Windows Server 2008 for x64
Microsoft Windows Server 2012 Datacenter
Microsoft Windows Server 2012 R2 Datacenter
Microsoft Windows Server 2012 R2 Std
Microsoft Windows Server 2012 Std
Microsoft Windows XP Professional
Windows 7 Enterprise 32 bit
Windows 7 Enterprise x64
Windows 7 Home Premium 32 bit
Windows 7 Home Premium x64
Windows 7 Professional 32 bit
Windows 7 Professional x64
Windows 7 Ultimate 32 bit
Windows 7 Ultimate x64
Windows Millennium Edition (Me)
Windows Vista
Windows Vista for x64
64-bit Enabled AIX
64-bit Enabled HP-UX
64-bit Enabled Solaris
ABI+ for Intel Architecture
AIX
HP-UX
HP-UX IPF
IRIX
Linux
Linux for x64
Linux on Itanium
OpenVMS Alpha
OpenVMS on HP Integrity
Solaris
Solaris for x64
Tru64 UNIX
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.