This example demonstrates how you can model the means and mixture proportions separately in a binomial cluster model. It also compares the binomial cluster model to the betabinomial model.
In a typical teratological experiment, the offspring of animals that were exposed to a toxin during pregnancy are studied for malformation. If you count the number of malformed offspring in a litter of size n, then this count is typically not binomially distributed. The responses of the offspring from the same litter are not independent; hence their sum does not constitute a binomial random variable. Relative to a binomial model, data from teratological experiments exhibit overdispersion because ignoring positive correlation among the responses tends to overstate the precision of the parameter estimates. Overdispersion mechanisms are briefly discussed in the section Overdispersion.
In this application, the focus is on mixtures and models that involve a mixing mechanism. The mixing approach (Williams, 1975; Haseman and Kupper, 1979) supposes that the binomial success probability is a random variable that follows a distribution:










If , then the betabinomial distribution reduces to a standard binomial model with success probability . The parameterization of the betabinomial distribution used by the FMM procedure is based on Neerchal and Morel (1998), see the section LogLikelihood Functions for Response Distributions for details.
Morel and Nagaraj (1993); Morel and Neerchal (1997); Neerchal and Morel (1998) propose a different model to capture dependency within binomial clusters. Their model is a twocomponent mixture that gives rise to the same mean and variance function as the betabinomial model. The genesis is different, however. In the binomial cluster model of Morel and Neerchal, suppose there is a cluster of n Bernoulli outcomes with success probability . The number of responses in the cluster decomposes into outcomes that all respond with either “success” or “failure”; the important aspect is that they all respond identically. The remaining n – N Bernoulli outcomes respond independently, so the sum of successes in this group is a binomial random variable. Denote the probability with which cluster members fall into the group of identical respondents as . Then is the probability that a response belongs to the group of independent Bernoulli outcomes.
It is easy to see how this process of dividing the individual Bernoulli outcomes creates clustering. The binomial cluster model can be written as the twocomponent mixture

where , , and . This mixture model is somewhat unusual because the mixing probability appears as a parameter in the component distributions. The two probabilities involved, and , have the following interpretation: is the unconditional probability of success for any observation, and is the probability with which the Bernoulli observations respond identically. The complement of this probability, , is the probability with which the Bernoulli outcomes respond independently. If , then the twocomponent mixture reduces to a standard Binomial model with success probability . Since both and are involved in the success probabilities of the two Binomial variables in the mixture, you can affect these binomial means by specifying effects in the PROBMODEL statement (for the s) or the MODEL statement (for the s). In a “straight” twocomponent Binomial mixture,

you would vary the success probabilities and through the MODEL statement.
With the FMM procedure, you can fit the betabinomial model by specifying DIST=BETABIN and the binomial cluster model by specifying DIST=BINOMCLUS in the MODEL statement.
Morel and Neerchal (1997) report data from a completely randomized design that studies the teratogenicity of phenytoin in 81 pregnant mice. The treatment structure of the experiment is an augmented factorial. In addition to an untreated control, mice received 60 mg/kg of phenytoin (PHT), 100 mg/kg of trichloropropene oxide (TCPO), and their combination. The design was augmented with a control group that was treated with water. As in Morel and Neerchal (1997), the two control groups are combined here into a single group.
The following DATA step creates the data for this analysis as displayed in Table 1 of Morel and Neerchal (1997). The second DATA step creates continuous variables x1
–x3
to match the parameterization of these authors.
data ossi; length tx $8; input tx$ n @@; do i=1 to n; input y m @@; output; end; drop i; datalines; Control 18 8 8 9 9 7 9 0 5 3 3 5 8 9 10 5 8 5 8 1 6 0 5 8 8 9 10 5 5 4 7 9 10 6 6 3 5 Control 17 8 9 7 10 10 10 1 6 6 6 1 9 8 9 6 7 5 5 7 9 2 5 5 6 2 8 1 8 0 2 7 8 5 7 PHT 19 1 9 4 9 3 7 4 7 0 7 0 4 1 8 1 7 2 7 2 8 1 7 0 2 3 10 3 7 2 7 0 8 0 8 1 10 1 1 TCPO 16 0 5 7 10 4 4 8 11 6 10 6 9 3 4 2 8 0 6 0 9 3 6 2 9 7 9 1 10 8 8 6 9 PHT+TCPO 11 2 2 0 7 1 8 7 8 0 10 0 4 0 6 0 7 6 6 1 6 1 7 ; data ossi; set ossi; array xx{3} x1x3; do i=1 to 3; xx{i}=0; end; pht = 0; tcpo = 0; if (tx='TCPO') then do; xx{1} = 1; tcpo = 100; end; else if (tx='PHT') then do; xx{2} = 1; pht = 60; end; else if (tx='PHT+TCPO') then do; pht = 60; tcpo = 100; xx{1} = 1; xx{2} = 1; xx{3}=1; end; run;
The FMM procedure models the mean parameters through the MODEL statement and the mixing proportions through the PROBMODEL statement. In the binomial cluster model, you can place a regression structure on either set of probabilities, and the regression structure does not need to be the same. In the following statements, the unconditional probability of ossification is modeled as a twoway factorial, whereas the intralitter effect—the propensity to group within a cluster—is assumed to be constant:
proc fmm data=ossi; class pht tcpo; model y/m = / dist=binomcluster; probmodel pht tcpo pht*tcpo; run;
The CLASS statement declares the PHT
and TCPO
variables as classification variables. They affect the analysis through their levels, not through their numeric values. The
MODEL statement declares the distribution of the data to follow a binomial cluster model. The FMM procedure then automatically
assumes that the model is a twocomponent mixture. An intercept is included by default. The PROBMODEL statement declares the effect structure for the mixing probabilities. The unconditional probability of ossification of a
fetus depends on the main effects and the interaction in the factorial.
The “Model Information” table displays important details about the model fit with the FMM procedure (Output 37.1.1). Although no K= option was specified in the MODEL statement, the FMM procedure recognizes the model as a twocomponent model. The “Class Level Information” table displays the levels and values of the PHT
and TCPO
variables. Eightyone observations are read from the data and are used in the analysis. These observations comprise 287 events
and 585 total outcomes.
Output 37.1.1: Model Information in Binomial Cluster Model with Constant Clustering Probability
Model Information  

Data Set  WORK.OSSI 
Response Variable (Events)  y 
Response Variable (Trials)  m 
Type of Model  Binomial Cluster 
Distribution  Binomial Cluster 
Components  2 
Link Function  Logit 
Estimation Method  Maximum Likelihood 
Class Level Information  

Class  Levels  Values 
pht  2  0 60 
tcpo  2  0 100 
Number of Observations Read  81 

Number of Observations Used  81 
Number of Events  287 
Number of Trials  585 
The “Optimization Information” table in Output 37.1.2 gives details about the maximum likelihood optimization. By default, the FMM procedure uses a quasiNewton algorithm. The model contains five parameters, four of which are part of the model for the mixing probabilities. The fifth parameter is the intercept in the model for .
Output 37.1.2: Optimization in Binomial Cluster Model with Constant Clustering Probability
Optimization Information  

Optimization Technique  Dual QuasiNewton 
Parameters in Optimization  5 
Mean Function Parameters  1 
Scale Parameters  0 
Mixing Prob Parameters  4 
Number of Threads  2 
Iteration History  

Iteration  Evaluations  Objective Function 
Change  Max Gradient 
0  5  174.92723892  .  43.78769 
1  2  154.13180744  20.79543149  11.2346 
2  3  153.26693611  0.86487133  6.888215 
3  2  152.84974281  0.41719329  3.541977 
4  3  152.61756033  0.23218248  2.783556 
5  3  152.54795303  0.06960730  1.146807 
6  3  152.52684929  0.02110374  0.034367 
7  3  152.52671214  0.00013715  0.011511 
8  3  152.52670799  0.00000415  0.000202 
9  3  152.52670799  0.00000000  4.001E6 
Convergence criterion (GCONV=1E8) satisfied. 
Fit Statistics  

2 Log Likelihood  305.1 
AIC (smaller is better)  315.1 
AICC (smaller is better)  315.9 
BIC (smaller is better)  327.0 
Pearson Statistic  89.2077 
Effective Parameters  5 
Effective Components  2 
After nine iterations, the iterative optimization converges. The –2 log likelihood at the converged solution is 305.1, and the Pearson statistic is 89.2077. The FMM procedure computes the Pearson statistic as a general goodnessoffit measure that expresses the closeness of the fitted model to the data.
The estimates of the parameters in the conditional probability and in the unconditional probability are given in Output 37.1.3. The intercept estimate in the model for is 0.3356. Since the default link in the binomial cluster model is the logit link, the estimate of the conditional probability is

This value is displayed in the “Inverse Linked Estimate” column. There is greater than a 50% chance that the individual fetuses in a litter provide the same response. The clustering tendency is substantial.
Output 37.1.3: Parameter Estimates in Binomial Cluster Model with Constant Clustering Probability
Parameter Estimates for 'Binomial Cluster' Model  

Component  Effect  Estimate  Standard Error  z Value  Pr > z  Inverse Linked Estimate 
1  Intercept  0.3356  0.1714  1.96  0.0503  0.5831 
Parameter Estimates for Mixing Probabilities  

Effect  pht  tcpo  Estimate  Standard Error  z Value  Pr > z 
Intercept  1.2194  0.4690  2.60  0.0093  
pht  0  0.9129  0.5608  1.63  0.1036  
pht  60  0  .  .  .  
tcpo  0  0.3295  0.5534  0.60  0.5516  
tcpo  100  0  .  .  .  
pht*tcpo  0  0  0.6162  0.6678  0.92  0.3561 
pht*tcpo  0  100  0  .  .  . 
pht*tcpo  60  0  0  .  .  . 
pht*tcpo  60  100  0  .  .  . 
The “Mixing Probabilities” table displays the estimates of the parameters in the model for on the logit scale (Output 37.1.3). Table 37.11 constructs the estimates of the unconditional probabilities of ossification.
Table 37.11: Estimates of Ossification Probabilities
PHT 
TCPO 



0 
0 
–1.2194 + 0.9129 + 0.3295 + 0.6162 = 0.6392 
0.6546 
60 
0 
–1.2194 + 0.3295 = –0.8899 
0.2911 
0 
100 
–1.2194 + 0.9129 = –0.3065 
0.4240 
60 
100 
–1.2194 
0.2280 
Morel and Neerchal (1997) considered a model in which the intralitter effects also depend on the treatments. This model is fit with the FMM procedure with the following statements:
proc fmm data=ossi; class pht tcpo; model y/m = pht tcpo pht*tcpo / dist=binomcluster; probmodel pht tcpo pht*tcpo; run;
The –2 log likelihood of this model is much reduced compared to the previous model with constant conditional probability (compare 287.8 in Output 37.1.4 with 305.1 in Output 37.1.2). The likelihoodratio statistic of 17.3 is significant, ). Varying the conditional probabilities by treatment improved the model fit significantly.
Output 37.1.4: Fit Statistics and Parameter Estimates in Binomial Cluster Model
Fit Statistics  

2 Log Likelihood  287.8 
AIC (smaller is better)  303.8 
AICC (smaller is better)  305.8 
BIC (smaller is better)  323.0 
Pearson Statistic  85.5998 
Effective Parameters  8 
Effective Components  2 
Parameter Estimates for 'Binomial Cluster' Model  

Component  Effect  pht  tcpo  Estimate  Standard Error  z Value  Pr > z 
1  Intercept  1.8213  0.5889  3.09  0.0020  
1  pht  0  1.4962  0.6630  2.26  0.0240  
1  pht  60  0  .  .  .  
1  tcpo  0  3.1828  1.1261  2.83  0.0047  
1  tcpo  100  0  .  .  .  
1  pht*tcpo  0  0  3.3736  1.1953  2.82  0.0048 
1  pht*tcpo  0  100  0  .  .  . 
1  pht*tcpo  60  0  0  .  .  . 
1  pht*tcpo  60  100  0  .  .  . 
Parameter Estimates for Mixing Probabilities  

Effect  pht  tcpo  Estimate  Standard Error  z Value  Pr > z 
Intercept  0.7394  0.5395  1.37  0.1705  
pht  0  0.4351  0.6203  0.70  0.4830  
pht  60  0  .  .  .  
tcpo  0  0.5342  0.5893  0.91  0.3646  
tcpo  100  0  .  .  .  
pht*tcpo  0  0  1.4055  0.7080  1.99  0.0471 
pht*tcpo  0  100  0  .  .  . 
pht*tcpo  60  0  0  .  .  . 
pht*tcpo  60  100  0  .  .  . 
Table 37.12 computes the conditional probabilities in the four treatment groups. Recall that the previous model estimated a constant clustering probability of 0.5831.
Table 37.12: Estimates of Clustering Probabilities
PHT 
TCPO 



0 
0 
1.8213 – 1.4962 – 3.1828 + 3.3736 = 0.5159 
0.6262 
60 
0 
1.8213 – 3.1828 = –1.3615 
0.2040 
0 
100 
1.8213 – 1.4962 = 0.3251 
0.5806 
60 
100 
1.8213 
0.8607 
The presence of phenytoin alone reduces the probability of response clustering within the litter. The presence of trichloropropene oxide alone does not have a strong effect on the clustering. The simultaneous presence of both agents substantially increases the probability of clustering.
The following statements fit the binomial cluster model in the parameterization of Morel and Neerchal (1997).
proc fmm data=ossi; model y/m = x1x3 / dist=binomcluster; probmodel x1x3; run;
The model fit is the same as in the previous model (compare the “Fit Statistics” tables in Output 37.1.5 and Output 37.1.4). The parameter estimates change due to the reparameterization of the treatment effects and match the results in Table III of Morel and Neerchal (1997).
Output 37.1.5: Fit Statistics and Estimates (Morel and Neerchal Parameterization)
Fit Statistics  

2 Log Likelihood  287.8 
AIC (smaller is better)  303.8 
AICC (smaller is better)  305.8 
BIC (smaller is better)  323.0 
Pearson Statistic  85.5999 
Effective Parameters  8 
Effective Components  2 
Parameter Estimates for 'Binomial Cluster' Model  

Component  Effect  Estimate  Standard Error  z Value  Pr > z 
1  Intercept  0.5159  0.2603  1.98  0.0475 
1  x1  0.1908  0.4006  0.48  0.6339 
1  x2  1.8774  0.9946  1.89  0.0591 
1  x3  3.3736  1.1953  2.82  0.0048 
Parameter Estimates for Mixing Probabilities  

Effect  Estimate  Standard Error  z Value  Pr > z 
Intercept  0.5669  0.2455  2.31  0.0209 
x1  0.8712  0.3924  2.22  0.0264 
x2  1.8405  0.3413  5.39  <.0001 
x3  1.4055  0.7080  1.99  0.0471 
The following sets of statements fit the binomial and betabinomial models, respectively, as singlecomponent mixtures in the parameterization akin to the first binomial cluster model. Note that the model effects that affect the underlying Bernoulli success probabilities are specified in the MODEL statement, in contrast to the binomial cluster model.
proc fmm data=ossi; model y/m = x1x3 / dist=binomial; run;
proc fmm data=ossi; model y/m = x1x3 / dist=betabinomial; run;
The Pearson statistic for the betabinomial model (Output 37.1.6) indicates a much better fit compared to the singlecomponent binomial model (Output 37.1.7). This is not surprising since these data are obviously overdispersed relative to a binomial model because the Bernoulli outcomes are not independent. The difference between the binomial cluster and the betabinomial model lies in the mechanism by which the correlations are induced:
a mixing mechanism in the betabinomial model that leads to a common shared random effect among all offspring in a cluster
a mixture specification in the binomial cluster model that divides the offspring in a litter into identical and independent responders
Output 37.1.6: Fit Statistics in Binomial Model
Fit Statistics  

2 Log Likelihood  401.8 
AIC (smaller is better)  409.8 
AICC (smaller is better)  410.3 
BIC (smaller is better)  419.4 
Pearson Statistic  252.1 
Output 37.1.7: Fit Statistics in BetaBinomial Model
Fit Statistics  

2 Log Likelihood  306.6 
AIC (smaller is better)  316.6 
AICC (smaller is better)  317.4 
BIC (smaller is better)  328.5 
Pearson Statistic  87.5379 