Logistic modeling covers many types of models. The following are some common ones. Note that the words *logistic* and *logit* are used interchangeably.

The three basic categories of logistic models are the binary, ordinal, and nominal models. They are discussed here along with related, special models and estimation methods. Examples of many of these models can be found in the documentation of the procedures that are mentioned or at the links that are provided. Many of these models are discussed and illustrated in more detail by Stokes et al. and Allison. All procedures that are mentioned are part of SAS/STAT® software unless otherwise indicated.

**Binary logistic regression model** - Used to model a *binary* (two-level) response, for example, *yes* or *no*.
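As an illustration of what a binary logistic model looks like in practice, here is a minimal sketch using PROC LOGISTIC, one of the procedures that can fit it. The data set `work.trial` and the variables `remission`, `dose`, and `age` are hypothetical names chosen for this example.

```sas
/* Hypothetical data: WORK.TRIAL has a 0/1 response REMISSION
   and numeric predictors DOSE and AGE (all names are assumptions). */
proc logistic data=work.trial;
   /* EVENT='1' requests a model for Pr(remission = 1) */
   model remission(event='1') = dose age;
run;
```

Naming the event level explicitly avoids relying on the default ordering of the response levels.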

*How to fit it:* This model can be fit by many procedures, including the SAS/STAT procedures LOGISTIC and GENMOD (using asymptotic or exact conditional methods), CATMOD (using weighted least squares or maximum likelihood (ML)), PROBIT, GAM, GLIMMIX, SURVEYLOGISTIC, FMM (using ML or Bayesian estimation), GEE (beginning in SAS 9.4 TS1M2; uses generalized estimating equations and provides robust standard error estimates), MCMC (using Bayesian estimation), NLMIXED, and HPGENSELECT; and the SAS/ETS® procedures MDC and QLIM. The GAM and ADAPTIVEREG procedures can fit more flexible logistic models by using spline or loess smoothers. The GLIMMIX, NLMIXED, and MCMC procedures allow the inclusion of random effects in the model. Longitudinal or repeated measures data can be modeled using the GENMOD (REPEATED statement), GLIMMIX (RANDOM statement), GEE (REPEATED statement), MCMC (RANDOM statement), or CATMOD (REPEATED statement) procedures.

**Case-control model using matched pairs (or sets)** - Used to model binary response data from pairs of subjects or sets of subjects that are matched on certain characteristics. In 1:1 matching, one case is matched to one control. In 1:m or m:n matching, one or more cases are matched to two or more controls. Conditional methods are used to fit this model.

*How to fit it:* Use the STRATA statement in PROC LOGISTIC. With a little additional work, the model can also be fit using the STRATA statement in PROC PHREG. Exact estimation can also be done by adding the EXACT statement in PROC LOGISTIC. Alternatively, use the STRATA and EXACT statements in PROC GENMOD beginning with SAS/STAT 9.22 in SAS 9.2 TS2M3.

**Bradley-Terry model** - Used when subjects are asked to compare many items, two at a time. That is, a subject indicates preference for one item in each of several pairs of items presented. Preference probabilities and ratios of preference probabilities for items can be determined from the model parameters. Conditional methods are used to fit this model.

*How to fit it:* Details and an example are given in this sample program.

**Generalized Additive Model (GAM) for binary response data** - This is a very flexible nonparametric model that relaxes assumptions of linearity and is useful for data exploration. Nonlinearity is allowed via smoothers such as cubic splines, local regression (*loess*), and bivariate thin-plate splines. It is available only for binary responses.

*How to fit it:* In PROC GAM, specify the DIST=BINOMIAL option in the MODEL statement to fit a logistic model.

**Nonparametric logistic model using adaptive splines** - Includes nonadditive models, which allow interactions, and additive models with no interactions. The method combines regression splines and model selection methods. An overfitted model is grown using a fast forward selection technique and then pruned back using backward selection before selecting a final model.

*How to fit it:* PROC ADAPTIVEREG (beginning in SAS 9.3 TS1M2) with the DIST=BINOMIAL option in the MODEL statement fits the nonadditive model allowing two-way interactions. Specify the ADDITIVE option in the MODEL statement if a model that does not allow interactions is desired.
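The two flexible approaches described above (PROC GAM and PROC ADAPTIVEREG) can be sketched as follows. This is an illustration only: the data set `work.trial` and the variables `remission`, `dose`, and `age` are hypothetical, and the choice of a spline for one predictor and loess for the other is arbitrary.

```sas
/* Generalized additive logistic model: each predictor enters
   through a smoother instead of a straight line. */
proc gam data=work.trial;
   model remission(event='1') = spline(dose) loess(age) / dist=binomial;
run;

/* Adaptive-spline logistic model. ADDITIVE excludes interactions;
   drop it to allow two-way interactions. */
proc adaptivereg data=work.trial;
   model remission(event='1') = dose age / dist=binomial additive;
run;
```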

**Ordinal (ordered) logistic regression model** - Used to model an ordered response, for example, *low*, *medium*, or *high*. Might also be called the *ordinal multinomial logit model*. Ordinal logistic models take into account the ordered nature of the response, which can result in simpler, more powerful models. Typical response functions that are modeled are *cumulative logits*, *adjacent-category logits*, or *continuation-ratio logits*, resulting in ordinal logistic models known as the cumulative logit model, the adjacent-category logit model, and the continuation-ratio logit model. The proportional odds model and the partial proportional odds model are special cases of the cumulative logit model. If the spacing between levels of the ordinal response scale is known, so that numerical scores can reasonably be assigned to the response levels, then a mean response model can be fit.

**Cumulative logit model** - This is one type of ordinal logistic model that models cumulative logits: CLogit_{i} = log((1-Pr[Y≤i])/Pr[Y≤i]), where Y takes the ordered levels 1, 2, ... . Notice that each cumulative logit involves all levels of the response and dichotomizes the response scale. The unrestricted cumulative logit model has a complete set of parameter estimates for each cumulative logit, that is, multiple intercepts and multiple estimates for each predictor. But such a complex model can result in probabilities that do not accumulate properly. Typically, simpler models that are special cases of the general cumulative logit model are used. Such models are the proportional odds model and the partial proportional odds model.

*How to fit it:* PROC CATMOD uses the method of weighted least squares to fit the unrestricted cumulative logit model when the RESPONSE CLOGITS; statement is specified. The model, with or without proportional odds restrictions on some or all predictors, can also be fit in PROC NLMIXED as discussed in this note.

**Proportional odds model** - This is a cumulative logit model that assumes that the effect of each predictor on the odds of a response at or below a given level is the same regardless of which level you pick. This model allows a separate intercept for each cumulative logit, but restricts the parameter sets for the predictors to be the same across all logits. A model that constrains some predictors to have common parameters and leaves other predictors free to have separate parameters is called a partial proportional odds model.
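A minimal sketch of a proportional odds fit, assuming a hypothetical data set `work.ratings` with an ordered response `severity` (for example, coded 1, 2, 3) and numeric predictors `dose` and `age`. Both calls rely on the documented defaults of the procedures for a response with more than two levels.

```sas
/* PROC LOGISTIC fits cumulative logits with common slopes by default
   when the response has more than two levels. */
proc logistic data=work.ratings;
   model severity = dose age;
run;

/* The same kind of model requested through the multinomial
   distribution in PROC GENMOD. */
proc genmod data=work.ratings;
   model severity = dose age / dist=mult;
run;
```

Check the response profile in the output to confirm the order in which the levels are accumulated, since that determines the sign of the estimates.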

*How to fit it:* PROC LOGISTIC fits the proportional odds model by default when the response has more than two levels. PROC GENMOD, PROC GLIMMIX, and PROC HPGENSELECT fit this model by default when the DIST=MULT option is specified in the MODEL statement. PROC PROBIT can fit the model if you specify DIST=LOGISTIC in the MODEL statement. It can also be fit in the SAS/ETS procedure QLIM if you specify the DISCRETE(DIST=LOGISTIC) option in the MODEL statement. PROC NLMIXED can fit the full or partial proportional odds model as discussed in this note. PROC GLIMMIX and PROC NLMIXED allow for the inclusion of random effects in the model. All procedures use maximum likelihood estimation.

**Adjacent-category logit model** - This is a type of ordinal logistic model that models adjacent-category logits: ALogit_{i} = log(Pr[Y=i+1]/Pr[Y=i]), where Y takes the ordered levels 1, 2, ... . Notice that each adjacent-category logit contrasts two adjacent response categories rather than involving the entire response scale as with cumulative logits.

*How to fit it:* PROC CATMOD uses the method of weighted least squares to fit the adjacent-category logit model when the RESPONSE ALOGITS; statement is specified. The model has a parameter vector for each logit, that is, multiple intercepts and multiple parameters for each predictor. A simpler, restricted model with an intercept for each logit but a single parameter vector for the predictors across all logits can be fit by including the _RESPONSE_ keyword in the MODEL statement. This simpler model can also be fit by maximum likelihood in PROC CATMOD because it can be written as a generalized logit model. See the text by Agresti for discussion and examples using PROC CATMOD. This restricted model can also be fit via maximum likelihood in PROC GENMOD as a loglinear model as discussed by Allison.

**Continuation-ratio logit model** - This is a type of ordinal logistic model that models continuation-ratio logits: CRLogit_{i} = log(Pr[Y=i+1]/Pr[Y≤i]), where Y takes the ordered levels 1, 2, ... .

*How to fit it:* This model, which has a parameter vector for each logit, can be fit by weighted least squares (WLS) in PROC CATMOD by using the capabilities in the RESPONSE statement to define custom response functions. It can also be fit by maximum likelihood by taking advantage of the fact that the parameters for each logit can be estimated using a separate binary logistic model. A simpler model with an intercept for each logit but a single parameter vector for the predictors across all logits can be fit by WLS in PROC CATMOD by including the _RESPONSE_ keyword in the MODEL statement. Examples are provided in this note.

**Mean response model** - This is not really a logistic model because no type of logit response function is modeled. As a result, the model does not guarantee that predicted values correspond to a valid set of probabilities across the response levels. However, it is an alternative to the ordinal logistic models that were discussed earlier when numerical scores can be assigned to the response levels, implying that the spacing among the levels is known. This model is most natural when the ordinal categorical response represents a continuous response that is coarsely measured. Note that this model provides an estimated mean response rather than assigning estimated probabilities to each response level.

*How to fit it:* PROC CATMOD fits the mean response model by weighted least squares when the RESPONSE MEANS; statement is specified.
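The PROC CATMOD approach can be sketched as follows. The data set `work.ratings`, the numeric response `severity` (whose values serve as the scores), and the classification variable `treatment` are hypothetical names used only for illustration.

```sas
/* Mean response model by weighted least squares.
   The numeric values of SEVERITY act as the response scores. */
proc catmod data=work.ratings;
   response means;
   model severity = treatment;
run;
quit;
```

Replacing `response means;` with `response clogits;` or `response alogits;` requests the cumulative or adjacent-category logit response functions mentioned earlier for the same procedure.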

**Nominal (unordered) logistic regression model** - Used to model a multilevel response with no ordering, for example, eye color with levels *brown*, *green*, and *blue*. Such a response is also called **polytomous**, **polychotomous**, or **multinomial**.

**Multinomial logit model** - Strictly speaking, the term *multinomial* indicates only that the response has more than two levels; it does not specify whether they are ordered or unordered. Nevertheless, the term *multinomial logit model* is often used when the response is a set of unordered choices and refers to the discrete choice model. But the term can also refer to the ordinal model or to the generalized logit model.

**Generalized logit model** - Also known as the **baseline logit** model. It is a type of unconditional, nominal logistic model in which the response functions that are modeled are known as *generalized logits* or *baseline logits*.

*How to fit it:* The LOGISTIC and SURVEYLOGISTIC procedures fit this model when you specify the LINK=GLOGIT option in the MODEL statement. The GLIMMIX and HPGENSELECT procedures fit this model when you specify the DIST=MULT and LINK=GLOGIT options in the MODEL statement. Beginning in SAS 9.4 TS1M2, the FMM procedure fits this model when you specify the DIST=MULT option in the MODEL statement. All of the above procedures fit the model using maximum likelihood estimation. Exact estimation of the model is also available in PROC LOGISTIC. This is the default model fit by PROC CATMOD when the response has more than two levels. PROC CATMOD can fit the model using maximum likelihood (the default) or weighted least squares (specify the WLS option).

**Discrete choice models** - Used to model a response that is the choice of individuals, for example, among transportation modes (car, bus, train, plane). Some or all predictors can be properties of the choices (cost, speed, and so on) rather than properties of the choosers as in models such as the generalized logit model. McFadden's conditional logit model, the nested logit model, the mixed logit model, and the exploded logit model are discrete choice models. The generalized logit model is often used as a discrete choice model too when the predictors are all properties of the choosers (subjects) and not of the choices.

**McFadden's conditional logit model** - A discrete choice model in which the predictors are properties of the choices (response levels). This model assumes independence from irrelevant alternatives (IIA). More information about this model can be found in this note.
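One way to set up a conditional logit fit is with the STRATA statement in PROC LOGISTIC, sketched here under assumed names. The data set `work.choices` is hypothetical and is assumed to hold one row per subject per alternative, with `chosen` equal to 1 for the selected alternative and 0 otherwise, and with `cost` and `travel_time` describing each alternative.

```sas
/* Conditional logit: each subject's choice set forms a stratum,
   and the predictors are attributes of the alternatives. */
proc logistic data=work.choices;
   strata subject;
   model chosen(event='1') = cost travel_time;
run;
```

Because the model is conditional on the stratum, no intercept is estimated and predictors that are constant within a subject drop out.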

*How to fit it:* The discrete choice model can be fit by the SAS/STAT procedure BCHOICE using Bayesian methods (beginning in SAS 9.4 TS1M1), the SAS/STAT procedure PHREG using the STRATA statement and the TIES=BRESLOW option, the SAS/STAT procedure LOGISTIC using the STRATA statement, the SAS/ETS procedure MDC, and in the SAS Market Research Application.

**Nested logit model** - A generalization of the conditional logit model that relaxes the IIA assumption to allow for particular patterns of correlation in unobserved utility. It can be used if the set of alternatives that are faced by an individual can be partitioned into subsets such that the IIA property holds within subsets but not across subsets.

*How to fit it:* The nested logit model can be fit by the SAS/STAT procedure BCHOICE using Bayesian methods (beginning in SAS 9.4 TS1M1), or the SAS/ETS procedure MDC.

**Mixed logit model** - A generalization of the conditional logit model that can represent very general patterns of substitution among alternatives. In this model, the utility function of each decision maker can be decomposed into a deterministic component (a linear combination of observed variables) and a stochastic error component. The choice probability is a mixture of logits. The model for the error component involves random coefficients.

*How to fit it:* The mixed logit model can be fit using the SAS/STAT procedure BCHOICE (beginning in SAS 9.4 TS1M1), or the SAS/ETS procedure MDC.

**Exploded logit model** - A model used when subjects rank all or some of the choices. The likelihood is identical to that for a stratified Cox model typically used in survival analysis to model the time to an event. Discussion and an example can be found in the text by Allison.

*How to fit it:* The data are arranged as one observation per rank from a subject. Use PROC PHREG with the variable containing the ranks as the response variable. Specify a variable in the STRATA statement identifying the subjects. If tied ranks occur in the data, use the TIES=DISCRETE option in the MODEL statement.
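The PROC PHREG setup just described can be sketched as follows. The data set `work.ranks` and the variables `subject`, `rank`, `price`, and `quality` are hypothetical; the sketch assumes every subject ranks every item, so no censoring is specified.

```sas
/* Exploded (rank-ordered) logit: one row per ranked item,
   with RANK playing the role of the event time within each subject. */
proc phreg data=work.ranks;
   strata subject;
   model rank = price quality / ties=discrete;
run;
```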

**Logistic model for longitudinal (or repeated measures) data** - These models are for a response that is observed more than once on each subject (or item), either at multiple times or under multiple conditions. The response can be binary, ordinal, or nominal. There are three primary types of models: marginal (or population-averaged), subject-specific (which includes fixed-effects and random-effects models), and transitional.

**Generalized Estimating Equations (GEE)** - Available for binary and ordinal responses, the GEE method allows missing values within a subject without losing all data from the subject, and time-varying predictors can appear in the model. The method requires a large number of subjects. For binary responses, a variation on the GEE model that models the association among responses with odds ratios rather than correlations is Alternating Logistic Regression (ALR). The ALR method does not restrict the correlation among the measurements as the usual GEE method does when applied to a binary response. Like the GEE model, ALR provides estimates of the marginal model parameters. But ALR also estimates parameters of the model on the log odds ratios among the measurements. The GEE and ALR models are marginal models. Note that the GEE estimation method is not a maximum likelihood method.
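A minimal sketch of GEE and ALR fits for a binary response in PROC GENMOD. The data set `work.visits` (one row per subject per visit) and the variables `id`, `symptom`, `treatment`, and `visit` are hypothetical, and the exchangeable structure is just one possible choice.

```sas
/* GEE with an exchangeable working correlation within subject */
proc genmod data=work.visits;
   class id;
   model symptom(event='1') = treatment visit / dist=binomial;
   repeated subject=id / type=exch;
run;

/* ALR: the within-subject association is modeled with
   log odds ratios instead of correlations. */
proc genmod data=work.visits;
   class id;
   model symptom(event='1') = treatment visit / dist=binomial;
   repeated subject=id / logor=exch;
run;
```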

*How to fit it:* To fit the GEE model, specify the REPEATED statement in PROC GENMOD or (beginning in SAS 9.4 TS1M2) PROC GEE. In PROC GENMOD, the DIST=BINOMIAL or DIST=MULT option must appear in the MODEL statement to request binary or ordinal multinomial logistic models, respectively. The nominal multinomial model is not available. Use the TYPE= option in the REPEATED statement to specify the correlation structure among the repeated measurements within a subject. To fit the ALR model, specify LOGOR= rather than TYPE= in the REPEATED statement of PROC GENMOD. PROC GEE fits only the binary model but also implements the weighted GEE method when missing responses depend on previous responses. Weighted GEE is not available in PROC GENMOD. As with PROC GENMOD, use the TYPE= option to specify the correlation structure. A GEE model, estimated by residual pseudo-likelihood, can also be fit using PROC GLIMMIX. Specify the EMPIRICAL option in the PROC GLIMMIX statement. Also, specify the RANDOM _RESIDUAL_ statement with the subject variable in the SUBJECT= option. An example is given in the GLIMMIX documentation.

**Cluster model with variance adjustment** - This model is fit by maximum likelihood, but variances are adjusted using the cluster structure of the data. For a small number of clusters, this model with the Morel adjustment (VARADJUST=MOREL) can provide a better fit than the GEE model. It can be used for binary, ordinal, or nominal responses. It is a marginal model.

*How to fit it:* Use PROC SURVEYLOGISTIC with the CLUSTER statement and optionally use the VARADJUST= option in the MODEL statement.

**Binomial and multinomial cluster models** - These models, proposed by Morel and Nagaraj (1993), account for the overdispersion that results when some proportion of the observed population responds in the same way while the remaining proportion responds according to a binomial or multinomial distribution.

*How to fit it:* To fit the binomial cluster model in PROC FMM, specify the DIST=BINOMCLUS option in the MODEL statement. Beginning in SAS 9.4 TS1M2, you can fit the multinomial cluster model in PROC FMM by specifying the DIST=MCLUS option in the MODEL statement.

**Weighted Least Squares (WLS)** - Available for binary, ordinal, and nominal responses, the WLS method requires complete data for each subject (otherwise the subject is ignored) and does not allow time-varying predictors in the model. It is a marginal model.

*How to fit it:* Specify the REPEATED statement in PROC CATMOD.

**Fixed-effects logistic model** - This model treats each measurement on each subject as a separate observation, and the set of subject coefficients that would appear in an unconditional model is eliminated by conditional methods. This is a conditional, subject-specific model (as opposed to a population-averaged model like the GEE model). See the extensive discussion and examples in the text on Fixed Effects Methods by Allison.

*How to fit it:* For binary response data, use the STRATA statement in PROC LOGISTIC. See the Allison text on Fixed Effects Methods concerning fitting a related model for multinomial responses.

**Transition models for discrete state space stochastic processes** - Discrete time Markov chains can be represented as log-linear models. Predictor variables can be incorporated by fitting a logistic model and treating previous response values as additional predictors.

*How to fit it:* Log-linear models can be fit in PROC GENMOD by specifying the cell counts of the table as the response and specifying DIST=POISSON in the MODEL statement. The first-order Markov chain for a process over four discrete times is `MODEL COUNT = T1|T2 T2|T3 T3|T4 / DIST=POISSON;` The second-order model is `MODEL COUNT = T1|T2|T3 T2|T3|T4 / DIST=POISSON;` A transitional model incorporating predictor variables is fit as a logistic model, but the input data set should have separate observations for the responses at each time for each subject. Variables representing lags of the response variable can then be used as predictors in the model. Examples are provided in this note.
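The two approaches in this paragraph can be sketched as follows. All data set and variable names are hypothetical: `work.counts` is assumed to hold one row per cell of the T1 x T2 x T3 x T4 table with the cell frequency in `count`, and `work.long` is assumed to hold one row per subject per time with a 0/1 response `y` and a predictor `x`.

```sas
/* First-order Markov chain as a log-linear (Poisson) model */
proc genmod data=work.counts;
   class t1 t2 t3 t4;
   model count = t1|t2 t2|t3 t3|t4 / dist=poisson;
run;

/* Transitional logistic model: build the lagged response first */
proc sort data=work.long;
   by id time;
run;

data work.lagged;
   set work.long;
   by id;
   prev_y = lag(y);                  /* LAG is called on every row */
   if first.id then prev_y = .;      /* no previous response at a subject's first time */
run;

proc logistic data=work.lagged;
   model y(event='1') = prev_y x;
run;
```

Calling LAG unconditionally and then blanking the first row of each subject keeps the lag queue aligned, so that one subject's last response is never carried into the next subject. Rows with a missing `prev_y` are excluded from the fit.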

**Logistic model for survey data** - Includes models for binary, ordinal, and nominal responses. For proper inferences, the analysis of survey data must incorporate properties of the survey sample design, including stratification, clustering, and unequal weighting.
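A minimal sketch of a design-based fit with PROC SURVEYLOGISTIC. The data set `work.survey` and the variables `region` (stratum), `psu` (cluster), `sampwt` (sampling weight), `insured`, `age`, and `income` are hypothetical, and no finite population correction is requested here.

```sas
/* Binary logistic model that accounts for the sample design */
proc surveylogistic data=work.survey;
   strata region;      /* design strata */
   cluster psu;        /* primary sampling units */
   weight sampwt;      /* sampling weights */
   model insured(event='1') = age income;
run;
```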

*How to fit it:* In PROC SURVEYLOGISTIC, use the RATE= and TOTAL= options in the PROC statement, and the CLUSTER, STRATA, and WEIGHT statements to specify the sampling design and sampling weights. Note that simply using the WEIGHT statement in PROC LOGISTIC is not sufficient because special variance estimators are required.

**Random effects logistic model (as a generalized linear mixed model or nonlinear mixed model)** - Allows random effects in a logistic model, resulting in a subject-specific model. This is a conditional model that can also be used to model longitudinal or repeated measures data.
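As an illustration, a random-intercept logistic model in PROC GLIMMIX might be set up as follows. The data set `work.visits` and the variables `id`, `symptom`, `treatment`, and `visit` are hypothetical. The sketch uses DIST=BINARY for a single 0/1 response variable, which is an assumption of this example, and leaves the estimation method at the procedure default.

```sas
/* Subject-specific logistic model with a random intercept per subject */
proc glimmix data=work.visits;
   class id;
   model symptom(event='1') = treatment visit / dist=binary link=logit solution;
   random intercept / subject=id;
run;
```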

*How to fit it:* Use PROC GLIMMIX and in the MODEL statement specify DIST=BINOMIAL LINK=LOGIT (for a binary logit model), DIST=MULT LINK=CLOGIT (for an ordinal logit model), or DIST=MULT LINK=GLOGIT (for a nominal logit model). Use RANDOM statements to define random effects. The model can also be fit in PROC NLMIXED by using a different methodology that typically limits the number of random effects to one or two. Only binary responses are directly supported (specify BINARY(p) or BINOMIAL(n,p) in the MODEL statement), though multinomial models can be accommodated by defining the multinomial log likelihood (via the GENERAL distribution type in the MODEL statement). For Bayesian estimation, use PROC MCMC and specify BINARY or BINOMIAL in the MODEL statement, a PRIOR statement to specify prior distributions for the parameters, and RANDOM statements to define random effects.

**Heteroscedastic logistic model** - This model allows the dispersion to be modeled as well as the mean. Binary and ordinal responses can be modeled.

*How to fit it:* Use the MODEL statement in the SAS/ETS procedure QLIM to specify the model for the mean, and use the HETERO statement to specify the dispersion model. Specify the DISCRETE(DIST=LOGISTIC) option in the MODEL statement to fit a binary or ordinal logistic model (depending on the number of levels that are detected in the response variable).

**Multivariate logistic model** - Simultaneously models multiple responses, taking into account the correlations among all response functions.

*How to fit it:* In PROC CATMOD, specify the RESPONSE LOGITS; statement and multiple response variables in the MODEL statement to fit the model by using weighted least squares estimation. For example, these statements simultaneously model logits that are defined separately on three response variables: `response logits; model x1*x2*x3 = group;` The bivariate probit model can be fit in SAS/ETS PROC QLIM: `model y1 y2 = x / discrete;`

**Conditional and exact conditional logistic regression** - These are estimation methods that use conditioning to eliminate the need to estimate a set of parameters that can be considered nuisance parameters. A conditional likelihood is maximized to estimate the remaining parameters. Alternatively, exact estimation can be done using permutation methods. Exact methods are appropriate for small-sample or sparse data situations that often result in the failure (nonconvergence or *separation*) of the usual unconditional maximum likelihood estimation method. However, exact methods can take a great deal of time and memory as sample or model sizes increase. For sample sizes too large for the default exact method, a Monte Carlo method is provided.
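A minimal sketch of an exact conditional analysis in PROC LOGISTIC for a small or sparse data set. The data set `work.small` and the variables `response` and `exposure` are hypothetical.

```sas
/* Exact conditional inference for the EXPOSURE effect.
   ESTIMATE=BOTH requests exact parameter and odds ratio estimates. */
proc logistic data=work.small;
   model response(event='1') = exposure;
   exact exposure / estimate=both;
run;
```

The usual asymptotic results are still printed alongside the exact ones, which makes it easy to see where the two disagree.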

*How to fit it:* Specify the STRATA statement in PROC LOGISTIC to fit the conditional model. Also specify the EXACT statement in PROC LOGISTIC or (beginning with SAS/STAT 9.22 in SAS 9.2 TS2M3) in PROC GENMOD to fit the exact conditional model. Use the EXACTOPTIONS option in the PROC LOGISTIC statement (or the EXACTOPTIONS statement in PROC GENMOD) to select the exact method (METHOD=NETWORKMC requests the Monte Carlo method) and to control other aspects of the exact analysis.

**Bayesian logistic regression** - This is a method for estimating the parameters of a binary logistic model. The method assumes that the parameters of the model are random variables and allows you to specify prior distributions for them. The method combines the data and the prior distribution to arrive at a posterior distribution for each parameter. For more information, see "Introduction to Bayesian Analysis Procedures" in the Introductions section of the *SAS/STAT User's Guide*.

*How to fit it:* In PROC GENMOD or PROC FMM, add the BAYES statement to specify prior distributions for the parameters. In PROC MCMC, specify BINARY or BINOMIAL in the MODEL statement, and a PRIOR statement to specify prior distributions for the parameters.
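The PROC GENMOD route can be sketched as follows. The data set `work.trial` and the variables `remission`, `dose`, and `age` are hypothetical, and the seed value and the normal prior are arbitrary choices made for this example.

```sas
/* Bayesian binary logistic regression via the BAYES statement.
   SEED= makes the simulated posterior sample reproducible. */
proc genmod data=work.trial;
   model remission(event='1') = dose age / dist=binomial link=logit;
   bayes seed=20140826 coeffprior=normal;
run;
```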



Date Modified: | 2014-08-26 10:51:15 |

Date Created: | 2002-12-16 10:56:39 |