The AGGREGATE= option is available in the LOGISTIC, GENMOD, and PROBIT procedures and applies to binomial and multinomial response variables. The following information applies to all three procedures. However, the MODEL statements used for illustration here are from PROC GENMOD and therefore include the DIST= option. In PROC LOGISTIC, you can request the Pearson and deviance goodness-of-fit statistics by specifying both the SCALE=NONE and AGGREGATE options. In PROC GENMOD, the SCALE= option is not needed to display these statistics, but, as described below, the AGGREGATE option is needed. In PROC PROBIT, you must specify both the LACKFIT and AGGREGATE options.
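The following sketch shows where these options go in each procedure. It assumes a hypothetical data set named MyData with a binary response Y coded 0/1 and classification predictors A, B, and C, and it uses a main-effects model for simplicity. Check each procedure's MODEL statement documentation for your release before relying on the exact option forms.

```sas
/* Hypothetical data set MyData: binary response Y (0/1) and   */
/* classification predictors A, B, and C.                      */

/* PROC LOGISTIC: SCALE=NONE together with AGGREGATE= displays */
/* the deviance and Pearson goodness-of-fit statistics.        */
proc logistic data=MyData;
   class a b c;
   model y(event='1') = a b c / scale=none aggregate=(a b c);
run;

/* PROC GENMOD: no SCALE= option is needed; AGGREGATE= defines */
/* the subpopulations used for the statistics.                 */
proc genmod data=MyData;
   class a b c;
   model y = a b c / dist=binomial aggregate=(a b c);
run;

/* PROC PROBIT: both LACKFIT and AGGREGATE= are specified.     */
/* With single-trial syntax, the response is listed in CLASS.  */
proc probit data=MyData;
   class y a b c;
   model y = a b c / lackfit aggregate=(a b c);
run;
```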
The deviance and Pearson chi-square statistics are sums, over subpopulations, of the discrepancies between the observed and predicted response probabilities. To compute these statistics, the procedure has to know how the data was sampled, that is, what the subpopulations in the data are. The model must also produce only one predicted probability within each subpopulation; otherwise, the procedure cannot determine that subpopulation's contribution to the statistic. Typically, the subpopulations are defined by the observed settings of all the predictors in the model, and you indicate this by listing all of the predictor variables in the AGGREGATE= option. In this example, the predictor variables are A, B, and C, and each unique combination of their values represents a subpopulation:
model y = a b c a*b a*c / dist=binomial aggregate=(a b c);
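For a binomial response, the two sums take the standard forms below. The notation is introduced here for illustration: subpopulation j (one for each distinct setting of the AGGREGATE= variables, j = 1, ..., m) contains n_j trials with y_j events, and \hat{p}_j is the single predicted event probability the model assigns to that subpopulation, with 0 ln 0 taken as 0. Each term requires exactly one \hat{p}_j per subpopulation, which is why the model cannot produce several predicted values inside one subpopulation.

```latex
D = 2 \sum_{j=1}^{m} \left[ y_j \ln\!\left(\frac{y_j}{n_j \hat{p}_j}\right)
    + (n_j - y_j) \ln\!\left(\frac{n_j - y_j}{n_j (1 - \hat{p}_j)}\right) \right]

\chi^2_P = \sum_{j=1}^{m} \frac{(y_j - n_j \hat{p}_j)^2}{n_j \hat{p}_j (1 - \hat{p}_j)}
```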
However, if you fit the model and then decide to remove a variable, the data was still sampled from subpopulations defined by that variable. Therefore, the variables listed in AGGREGATE= can define subpopulations more finely than the variables in the model. For example, the following MODEL statement removes C from the model but still uses A, B, and C to define the subpopulations for the deviance statistic:
model y = a b a*b / dist=binomial aggregate=(a b c);
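One way to confirm that this reduced model yields a single predicted value for every A-B-C subpopulation is to save the predicted probabilities and count them within each A-B cell. This is a sketch that assumes the same hypothetical data set MyData and a hypothetical output data set named Pred; you would expect N_PREDICTED to equal 1 in every row, even where N_C_LEVELS is greater than 1.

```sas
/* Fit the reduced model (C removed) while keeping A, B, and C */
/* in AGGREGATE=, and save the predicted probabilities.        */
proc genmod data=MyData;
   class a b c;
   model y = a b a*b / dist=binomial aggregate=(a b c);
   output out=Pred p=phat;
run;

/* Within each A-B cell, compare the number of C levels with   */
/* the number of distinct predicted probabilities.             */
proc sql;
   select a, b,
          count(distinct c)    as n_c_levels,
          count(distinct phat) as n_predicted
   from Pred
   group by a, b;
quit;
```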
In this model, there is only one predicted value for each A-B-C subpopulation. In fact, the predicted value for a particular setting of A and B applies to every A-B-C subpopulation that has that setting of A and B, regardless of the level of C.
However, you cannot define the subpopulations more coarsely than the variables in the model, because the model would then produce more than one predicted value within a subpopulation defined by AGGREGATE=. For example, consider the following MODEL statement:
model y = a b c a*b a*c / dist=binomial aggregate=(a b);
This statement would produce a log message similar to the following:
NOTE: The SCALE= option is ignored because there is more than one profile of the
explanatory variables within the same profile of the aggregate variables.
This message appears because, within an A-B subpopulation, there would be different predicted values for different levels of C, so a single contribution to the deviance statistic could not be determined.
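Before fitting, you can check whether a proposed AGGREGATE= list is too coarse for a model that contains C. The following query is a sketch against the hypothetical data set MyData: it lists every A-B cell that contains more than one level of C. If it returns any rows, AGGREGATE=(A B) would leave more than one explanatory profile within a subpopulation for the model above, and listing A, B, and C in AGGREGATE= avoids the problem.

```sas
/* A-B cells that contain several levels of C. Any row here    */
/* means AGGREGATE=(a b) is coarser than a model including C.  */
proc sql;
   select a, b, count(distinct c) as n_c_levels
   from MyData
   group by a, b
   having count(distinct c) > 1;
quit;
```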
When you use events/trials syntax to analyze summarized binomial data, note that PROC GENMOD currently does not allow aggregation beyond the observation level; that is, PROC GENMOD always treats each observation as a subpopulation. In the LOGISTIC or PROBIT procedure, you can use the AGGREGATE= option to further aggregate the binomial observations.
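The contrast can be sketched as follows. The example assumes a hypothetical summarized data set named Summary in which each row holds R events out of N trials, and in which several rows can share the same setting of A and B (for example, because the rows were also split by some other variable that is not in the model).

```sas
/* PROC GENMOD: each row of Summary is its own subpopulation,  */
/* even when several rows share the same A-B setting.          */
proc genmod data=Summary;
   class a b;
   model r/n = a b / dist=binomial;
run;

/* PROC LOGISTIC: rows with the same A-B setting are pooled,   */
/* so the goodness-of-fit statistics use A-B subpopulations.   */
proc logistic data=Summary;
   class a b;
   model r/n = a b / scale=none aggregate=(a b);
run;
```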