The GENMOD Procedure

Multinomial Models

This type of model applies to cases where an observation can fall into one of k categories. Binary data occur in the special case where k = 2. If there are $m_ i$ observations in a subpopulation i, then the distribution of the counts falling into the k categories $\mb {y}_ i = (y_{i1}, y_{i2}, \ldots , y_{ik})$ can be modeled by the multinomial distribution, defined in the section Response Probability Distributions, with $\sum _ j y_{ij} = m_ i$. The multinomial model is an ordinal model if the categories have a natural order.

Residuals are not available in the OBSTATS table or the output data set for multinomial models.

By default, and consistently with binomial models, the GENMOD procedure orders the response categories for ordinal multinomial models from lowest to highest and models the probabilities of the lower response levels. You can change the way PROC GENMOD orders the response levels with the RORDER= option in the PROC GENMOD statement. The order that PROC GENMOD uses is shown in the “Response Profiles” output table described in the section Response Profile.

The GENMOD procedure supports only the ordinal multinomial model. If $(p_{i1}, p_{i2}, \ldots ,p_{ik})$ are the category probabilities, the cumulative category probabilities are modeled with the same link functions used for binomial data. Let $P_{ir} = \sum _{j=1}^ r p_{ij}$, $r=1, 2, \ldots , k\! -\! 1$, be the cumulative category probabilities (note that $P_{ik} = 1$). The ordinal model is

$g(P_{ir}) = \mu _ r + \mb {x}_ i^\prime \bbeta ~ ~ ~ \mbox{for}~ ~ ~ r = 1, 2, \ldots , k\! -\! 1$

where $\mu _1, \mu _2, \ldots , \mu _{k-1}$ are intercept terms that depend only on the categories and $\mb {x}_ i$ is a vector of covariates that does not include an intercept term. The logit, probit, and complementary log-log link functions g are available. You request them by specifying the MODEL statement option DIST=MULTINOMIAL together with LINK=CUMLOGIT (cumulative logit), LINK=CUMPROBIT (cumulative probit), or LINK=CUMCLL (cumulative complementary log-log). Equivalently, in terms of the inverse link,

$P_{ir} = \mr {F}(\mu _ r + \mb {x}_ i^\prime \bbeta ) ~ ~ ~ \mbox{for}~ ~ ~ r = 1, 2, \ldots , k\! -\! 1$

where $\mr {F}=g^{-1}$ is a cumulative distribution function for the logistic, normal, or extreme-value distribution.
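
The inverse-link form can be checked numerically outside SAS. The following Python sketch is an illustration only, not PROC GENMOD code: the function names, the intercepts `mu`, the covariate vector `x`, and the coefficients `beta` are all hypothetical values chosen for the example. It evaluates the cumulative probabilities for each of the three links and differences them to recover the category probabilities, using $P_{ik} = 1$ for the last category.

```python
import math

def inverse_link(eta, link="cumlogit"):
    """Return F(eta), the inverse of the chosen cumulative link."""
    if link == "cumlogit":   # logistic CDF
        return 1.0 / (1.0 + math.exp(-eta))
    if link == "cumprobit":  # standard normal CDF
        return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))
    if link == "cumcll":     # extreme-value CDF, inverse of log(-log(1 - P))
        return 1.0 - math.exp(-math.exp(eta))
    raise ValueError("unknown link: " + link)

def category_probabilities(mu, x, beta, link="cumlogit"):
    """Cumulative and per-category probabilities for one covariate vector.

    mu holds the k-1 intercepts (assumed increasing); x excludes an intercept.
    """
    xb = sum(xj * bj for xj, bj in zip(x, beta))
    # P_r = F(mu_r + x'beta) for r = 1..k-1, and P_k = 1 by definition.
    cumulative = [inverse_link(m + xb, link) for m in mu] + [1.0]
    # p_1 = P_1 and p_r = P_r - P_{r-1} for r > 1.
    probs = [cumulative[0]] + [
        cumulative[r] - cumulative[r - 1] for r in range(1, len(cumulative))
    ]
    return cumulative, probs

# Hypothetical values: k = 4 categories, two covariates.
mu = [-1.0, 0.5, 2.0]
x = [1.0, 0.0]
beta = [0.4, -0.3]

for link in ("cumlogit", "cumprobit", "cumcll"):
    cumulative, probs = category_probabilities(mu, x, beta, link)
    print(link, [round(p, 4) for p in probs])
```

Because the same linear term is added to every intercept, the category probabilities are positive only when the intercepts are increasing, which is why the sketch assumes an ordered `mu`.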

PROC GENMOD estimates the intercept parameters $\mu _1, \mu _2, \ldots , \mu _{k-1}$ and regression parameters $\bbeta$ by maximum likelihood.
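
To make the maximum likelihood criterion concrete, the Python sketch below evaluates the multinomial log likelihood, up to the additive constant from the multinomial coefficients, for a cumulative logit model. It is a hand-rolled illustration under stated assumptions, not the procedure's fitting algorithm: the helper names, the two-subpopulation data set, and the trial parameter values are all hypothetical.

```python
import math

def logistic_cdf(t):
    return 1.0 / (1.0 + math.exp(-t))

def category_probs(mu, xb):
    """Category probabilities from k-1 intercepts and the linear term x'beta."""
    cumulative = [logistic_cdf(m + xb) for m in mu] + [1.0]
    return [cumulative[0]] + [
        cumulative[r] - cumulative[r - 1] for r in range(1, len(cumulative))
    ]

def log_likelihood_kernel(mu, beta, data):
    """Sum of y_ij * log(p_ij) over subpopulations i and categories j."""
    total = 0.0
    for x, counts in data:
        xb = sum(xj * bj for xj, bj in zip(x, beta))
        probs = category_probs(mu, xb)
        # Categories with zero counts contribute nothing to the sum.
        total += sum(y * math.log(p) for y, p in zip(counts, probs) if y > 0)
    return total

# Hypothetical data: (covariate vector, counts in k = 3 ordered categories).
data = [
    ([0.0], [10, 5, 2]),  # counts concentrated in the lower categories
    ([1.0], [4, 6, 9]),   # counts concentrated in the higher categories
]

# Compare two trial slopes with the same intercepts.
ll_negative = log_likelihood_kernel([0.0, 1.5], [-1.0], data)
ll_positive = log_likelihood_kernel([0.0, 1.5], [1.0], data)
print(ll_negative > ll_positive)
```

Because the model describes the probabilities of the lower response levels, a negative slope lowers those probabilities as the covariate increases, so it yields the larger log likelihood for these counts. A maximum likelihood fit searches over the intercepts and slopes together for the values that maximize this quantity.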

The subpopulations i are defined by constant values of the AGGREGATE= variable. This has no effect on the parameter estimates, but it does affect the deviance and Pearson chi-square statistics; it also affects parameter estimate standard errors if you specify the SCALE=DEVIANCE or SCALE=PEARSON option.
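
The effect of the subpopulation definition on the deviance can be seen in a small numeric sketch. The Python code below assumes the usual multinomial deviance, $2 \sum _ i \sum _ j y_{ij} \log (y_{ij} / (m_ i p_{ij}))$, which is not stated in this section; the fitted probabilities and observations are hypothetical. Four observations that share the same covariate values, and therefore the same fitted probabilities, are treated first as four subpopulations of size 1 and then as one subpopulation of size 4.

```python
import math

def multinomial_deviance(subpopulations):
    """Deviance 2 * sum_i sum_j y_ij * log(y_ij / (m_i * p_ij)).

    Each entry is (counts, fitted_probs); terms with y_ij = 0 are taken as 0.
    """
    total = 0.0
    for counts, probs in subpopulations:
        m = sum(counts)
        total += sum(
            y * math.log(y / (m * p)) for y, p in zip(counts, probs) if y > 0
        )
    return 2.0 * total

# Hypothetical fitted probabilities shared by all four observations.
probs = [0.5, 0.25, 0.25]

# Each observation is its own subpopulation (m_i = 1).
separate = [
    ([1, 0, 0], probs),
    ([1, 0, 0], probs),
    ([0, 1, 0], probs),
    ([0, 0, 1], probs),
]

# The same observations aggregated into one subpopulation (m_i = 4).
aggregated = [([2, 1, 1], probs)]

dev_separate = multinomial_deviance(separate)
dev_aggregated = multinomial_deviance(aggregated)
print(dev_separate, dev_aggregated)
```

The fitted probabilities are identical in both cases, yet the aggregated deviance is zero because the observed proportions match the fitted probabilities exactly, while the ungrouped deviance is positive.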