The following example applies the Pearson goodness-of-fit test to assess the fit of the negative binomial distribution to a set of count data, after the parameters of the distribution have been estimated. For general information on testing the fit of distributions, see this note.
This example appears in Morel and Neerchal (2012). The data are counts of eggs produced by a species of snail, found in samples taken from 926 individuals. The following statements create data set EGGS containing the data.
data eggs;
   input y freq;   /* y = egg count, freq = number of individuals with that count */
   datalines;
0 603
1 112
2 93
3 53
4 19
5 21
6 7
7 6
8 5
9 2
10 1
11 2
14 2
;
The following statements fit the negative binomial model to the data by using PROC COUNTREG in SAS/ETS software. Because the model contains only an intercept (no covariates), the data are treated as a sample from a single population. The intercept estimates the link-transformed (log) negative binomial mean. If desired, the PRED= option in the OUTPUT statement can be specified to apply the inverse link function and provide an estimate of the negative binomial mean. The negative binomial dispersion parameter is also estimated. The PROB= option saves the expected probability of each observed count. The model can also be fit by using PROC GENMOD in SAS/STAT software, but GENMOD does not provide an option to save the expected probabilities of the observed counts.
proc countreg data=eggs;
   model y = / dist=negbin;           /* intercept-only negative binomial model */
   output out=exp_nb prob=prob_nb;    /* expected probability of each observed count */
   freq freq;
run;
Notice that PROC COUNTREG estimates two parameters, as described above: the intercept and the dispersion parameter.
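For reference, the probabilities that the PROB= option saves can be written out explicitly. The following is a sketch that assumes the default NB2 parameterization requested by DIST=NEGBIN in PROC COUNTREG, with dispersion parameter α and mean μ = exp(β0), where β0 is the intercept; these symbols are introduced here for illustration only.

```latex
\begin{aligned}
P(Y = y) &= \frac{\Gamma(y + \alpha^{-1})}{y!\,\Gamma(\alpha^{-1})}
            \left(\frac{\alpha^{-1}}{\alpha^{-1} + \mu}\right)^{\alpha^{-1}}
            \left(\frac{\mu}{\alpha^{-1} + \mu}\right)^{y},
            \qquad y = 0, 1, 2, \ldots \\
\mu &= \exp(\beta_0), \qquad E(Y) = \mu, \qquad \operatorname{Var}(Y) = \mu + \alpha\mu^{2}
\end{aligned}
```

Under this form, α > 0 allows more variability than the Poisson distribution, which is the limiting case as α approaches 0.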

Because the expected probabilities for the larger counts are small, some counts are combined before the test is conducted. The following FORMAT and MEANS steps combine counts 8 and 9 into one category and counts of 10 and larger into another category, which results in a total of 10 categories of Y. The summed expected probabilities are stored in a variable named _TESTP_, for the reason explained below.
proc format;
   value yfmt 8-9     = "8 or 9"
              10-high = ">=10";
run;

/* Sum the expected probabilities within each formatted category of Y */
proc means sum nway data=exp_nb;
   class y;
   var prob_nb;
   format y yfmt.;
   output out=exp_nb sum=_testp_;
run;
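In terms of the per-count probabilities f(y) saved in PROB_NB (notation introduced here for illustration), the PROC MEANS step above produces the following category probabilities. Because EXP_NB contains only the counts that were actually observed, the last category at this point includes only f(10), f(11), and f(14).

```latex
\begin{aligned}
\pi_y &= f(y), \qquad y = 0, 1, \ldots, 7 \\
\pi_{8\text{ or }9} &= f(8) + f(9) \\
\pi_{\ge 10} &= f(10) + f(11) + f(14) \qquad \text{(observed counts only)} \\
k &= 8 + 1 + 1 = 10 \text{ categories}
\end{aligned}
```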
Note that counts of 12, 13, 15, and higher did not occur in this data set. However, they would be expected to occur in larger or repeated samples. Consequently, the frequencies of these counts can be regarded as sampling zeros, and their expected probabilities should be included when computing the goodness-of-fit statistics. The expected probabilities of all possible counts sum to one, but not every possible count is observed in a given sample. The expected probability of all counts of 10 or greater equals one minus the total probability of all counts less than 10. The following DATA step accumulates the expected probabilities (in SUMEXP) and sets the expected probability of the last category (Y=10, which represents all counts of 10 or greater) to this difference.
data exp_nb;
   set exp_nb;
   sumexp + _testp_;            /* running total of expected probabilities */
   sumtolast = lag(sumexp);     /* total for all categories before this one */
   if y=10 then _testp_ = 1 - sumtolast;
run;
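The correction made by the DATA step above can be summarized as follows, where f(y) denotes the fitted negative binomial probability of count y (notation introduced here for illustration). It replaces the partial sum for the last category with the full upper tail, which adds the probabilities of the unobserved counts 12, 13, 15, and higher, so that the 10 category probabilities sum to one.

```latex
\pi_{\ge 10} = 1 - \sum_{y=0}^{9} f(y) = \sum_{y=10}^{\infty} f(y),
\qquad \sum_{c=1}^{10} \pi_c = 1
```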
The CHISQ option in the following PROC FREQ step requests the chi-square goodness-of-fit test for the one-way table of Y counts. The WEIGHT statement provides the observed frequencies. When the TESTP= suboption specifies a data set name (EXP_NB in this case), PROC FREQ looks for a variable named _TESTP_ in that data set to supply the expected probabilities. With the observed frequencies and expected probabilities, the CHISQ option computes the goodness-of-fit statistics. The likelihood ratio chi-square test is included by specifying the LRCHISQ suboption. However, to provide proper tests, the usual degrees of freedom for the tests (k-1, where k is the number of Y categories) must be reduced by the number of parameters that were estimated. Because PROC COUNTREG was used to estimate the two parameters of the negative binomial distribution (mean and dispersion), the DF=-2 suboption is specified to reduce the degrees of freedom by two. Alternatively, you could provide the correct degrees of freedom directly by specifying DF=7, since k=10 and two parameters are estimated (10-1-2 = 7). The ability to specify a data set of expected probabilities or frequencies and to adjust the degrees of freedom was added in the SAS 9.3 TS1M2 release.
proc freq data=eggs;
   table y / chisq(testp=exp_nb df=-2 lrchisq);   /* df = (10-1) - 2 = 7 */
   format y yfmt.;
   weight freq;
run;
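For reference, the two statistics requested by CHISQ(LRCHISQ) are the standard Pearson and likelihood ratio chi-square statistics for a one-way table. The sketch below writes them for this example, where O_c is the observed frequency in category c, π_c is the expected probability read from _TESTP_, n = 926, and p is the number of estimated parameters; these symbols are introduced here for illustration. The observed frequencies for the 10 categories are 603, 112, 93, 53, 19, 21, 7, 6, 7 (counts 8 or 9), and 5 (counts of 10 or more).

```latex
\begin{aligned}
E_c &= n\,\pi_c, \qquad n = 926 \\
X^{2} &= \sum_{c=1}^{k} \frac{(O_c - E_c)^{2}}{E_c} \\
G^{2} &= 2 \sum_{c=1}^{k} O_c \ln\!\left(\frac{O_c}{E_c}\right) \\
\text{df} &= k - 1 - p = 10 - 1 - 2 = 7
\end{aligned}
```

Each statistic is compared with a chi-square distribution on 7 degrees of freedom, so the reported p-value is the upper-tail probability of that distribution at the observed statistic.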
The results of the tests indicate that the negative binomial model does not fit the data well (p=0.0016).

Product Family: SAS System
Product: SAS/STAT
Systems:
z/OS
OpenVMS VAX
Microsoft® Windows® for 64-Bit Itanium-based Systems
Microsoft Windows Server 2003 Datacenter 64-bit Edition
Microsoft Windows Server 2003 Enterprise 64-bit Edition
Microsoft Windows XP 64-bit Edition
Microsoft® Windows® for x64
OS/2
Microsoft Windows 8
Microsoft Windows 95/98
Microsoft Windows 2000 Advanced Server
Microsoft Windows 2000 Datacenter Server
Microsoft Windows 2000 Server
Microsoft Windows 2000 Professional
Microsoft Windows 2012
Microsoft Windows NT Workstation
Microsoft Windows Server 2003 Datacenter Edition
Microsoft Windows Server 2003 Enterprise Edition
Microsoft Windows Server 2003 Standard Edition
Microsoft Windows Server 2003 for x64
Microsoft Windows Server 2008
Microsoft Windows Server 2008 for x64
Microsoft Windows XP Professional
Windows 7 Enterprise 32 bit
Windows 7 Enterprise x64
Windows 7 Home Premium 32 bit
Windows 7 Home Premium x64
Windows 7 Professional 32 bit
Windows 7 Professional x64
Windows 7 Ultimate 32 bit
Windows 7 Ultimate x64
Windows Millennium Edition (Me)
Windows Vista
Windows Vista for x64
64-bit Enabled AIX
64-bit Enabled HP-UX
64-bit Enabled Solaris
ABI+ for Intel Architecture
AIX
HP-UX
HP-UX IPF
IRIX
Linux
Linux for x64
Linux on Itanium
OpenVMS Alpha
OpenVMS on HP Integrity
Solaris
Solaris for x64
Tru64 UNIX
Type: Usage Note
Topic: Analytics ==> Categorical Data Analysis; Analytics ==> Distribution Analysis; SAS Reference ==> Procedures ==> FREQ
Date Modified: 2012-09-21 14:43:00
Date Created: 2012-09-21 14:38:29