Usage Note 22871: Types of logistic (or logit) models that can be fit using SAS®
There are many types of models in the area of logistic modeling. Following are some common logistic models. Note that the words logistic and logit are used interchangeably.
The three basic categories of logistic models are the binary, ordinal, and nominal models. They are discussed here along with related, special models and estimation methods. Examples of many of these models can be found in the documentation of the procedures that are mentioned or at the links that are provided. Many of these models are discussed and illustrated in more detail by Stokes et al. and Allison. All procedures that are mentioned are part of SAS/STAT® software unless otherwise indicated.
- Binary logistic regression model
- Used to model a binary (two-level) response — for example, yes or no.
How to fit it: This model can be fit by many procedures, including the SAS/STAT procedures LOGISTIC and GENMOD (using asymptotic or exact conditional methods), CATMOD (using weighted least squares or maximum likelihood (ML)), HPLOGISTIC, PROBIT, GAM, GAMPL, GLIMMIX, SURVEYLOGISTIC, FMM and HPFMM (using ML or Bayesian estimation), GEE (beginning in SAS 9.4 TS1M2; uses generalized estimating equations and provides robust standard error estimates), MCMC (using Bayesian estimation), ADAPTIVEREG, NLMIXED, HPNLMOD, and HPGENSELECT; and the SAS/ETS® procedures MDC and QLIM. The GAM, GAMPL, and ADAPTIVEREG procedures can fit more flexible logistic models by using spline or loess smoothers. The GLIMMIX, NLMIXED, MCMC, and (beginning in SAS 9.4 TS1M4) QLIM procedures allow the inclusion of random effects in the model. Longitudinal or repeated measures data can be modeled by using the REPEATED statement in the GENMOD, GEE, or CATMOD procedure, or the RANDOM statement in the GLIMMIX, MCMC, NLMIXED, or QLIM procedure.
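To make concrete what these procedures estimate, here is a minimal Python sketch (not SAS code) that fits a binary logistic model with one predictor, logit(Pr[y=1]) = b0 + b1*x, by Newton-Raphson maximum likelihood. The function names and the toy data are illustrative assumptions; the SAS procedures use their own optimizers, convergence criteria, and safeguards.

```python
import math

def logistic(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def fit_binary_logit(x, y, iters=25):
    """Fit logit(Pr[y=1]) = b0 + b1*x by Newton-Raphson maximum likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score vector (gradient of the log likelihood) and information matrix.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = logistic(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} * score, with the 2x2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

# Hypothetical toy data with overlapping responses (no separation).
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_binary_logit(x, y)
```

At convergence the score equations sum(y - p) = 0 and sum(x*(y - p)) = 0 hold. For the logit link, Newton-Raphson and Fisher scoring coincide because the observed and expected information are equal.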
- Case-control model using matched pairs (or sets)
- Used to model binary response data from pairs of subjects or sets of subjects that are matched on certain characteristics. In 1:1 matching, one case is matched to one control. In 1:m or m:n matching, one or more cases are matched to two or more controls. Conditional methods are used to fit this model.
How to fit it: Use the STRATA statement in PROC LOGISTIC. With a little additional work, the model can also be fit by using the STRATA statement in PROC PHREG. For exact estimation, add the EXACT statement in PROC LOGISTIC or, beginning with SAS/STAT 9.22 in SAS 9.2 TS2M3, use the STRATA and EXACT statements in PROC GENMOD.
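For 1:1 matching, the conditional likelihood has a simple form: given that exactly one member of a pair is a case, the probability that it is the observed case is exp(a + b*x_case) / (exp(a + b*x_case) + exp(a + b*x_control)), and the pair-specific intercept a cancels. The Python sketch below (hypothetical function names and data, one covariate only) illustrates this and maximizes the conditional likelihood by Newton-Raphson.

```python
import math

def pair_conditional_prob(beta, x_case, x_control, alpha=0.0):
    """Pr(observed case is the case | exactly one member of the pair is a case).
    alpha is the pair (stratum) intercept; it cancels in the ratio."""
    num = math.exp(alpha + beta * x_case)
    return num / (num + math.exp(alpha + beta * x_control))

def fit_matched_pairs(pairs, iters=50):
    """Newton-Raphson on the conditional log likelihood for one covariate.
    Each pair contributes log(logistic(beta * d)) with d = x_case - x_control,
    so concordant pairs (d = 0) carry no information about beta."""
    beta = 0.0
    for _ in range(iters):
        score = info = 0.0
        for x_case, x_control in pairs:
            d = x_case - x_control
            p = 1.0 / (1.0 + math.exp(-beta * d))
            score += d * (1.0 - p)
            info += d * d * p * (1.0 - p)
        beta += score / info
    return beta

# Hypothetical binary exposure: (case exposure, control exposure) per pair.
pairs = [(1, 0)] * 6 + [(0, 1)] * 2 + [(1, 1)] * 3 + [(0, 0)] * 4
beta_hat = fit_matched_pairs(pairs)
```

For a binary exposure, the conditional estimate equals the log of the ratio of discordant pairs, here log(6/2) = log 3.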
- Bradley-Terry model
- Used when subjects are asked to compare many items, two at a time. That is, a subject indicates preference for one item in each of several pairs of items presented. Preference probabilities and ratios of preference probabilities for items can be determined from the model parameters. Conditional methods are used to fit this model.
How to fit it: Details and an example are given in this sample program.
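In the Bradley-Terry model, each item i has a worth pi_i and Pr(i preferred to j) = pi_i / (pi_i + pi_j), so logit Pr(i preferred to j) = log(pi_i) - log(pi_j). The Python sketch below maximizes this likelihood with the classic iterative (Zermelo, or minorization-maximization) update rather than the conditional approach used in the SAS sample; the function name and data are hypothetical.

```python
def bradley_terry(wins, iters=500):
    """wins[i][j] = number of times item i was preferred to item j.
    Returns worths pi (normalized to sum to 1) such that
    Pr(i preferred to j) = pi[i] / (pi[i] + pi[j])."""
    n = len(wins)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            w_i = sum(wins[i])  # total number of times item i was preferred
            denom = sum((wins[i][j] + wins[j][i]) / (pi[i] + pi[j])
                        for j in range(n) if j != i)
            new.append(w_i / denom)
        total = sum(new)
        pi = [v / total for v in new]
    return pi

# Hypothetical round robin: each pair of items compared 10 times.
wins = [[0, 7, 8],
        [3, 0, 6],
        [2, 4, 0]]
pi = bradley_terry(wins)
```

The iteration needs every item to win and lose at least once (a connected comparison graph); otherwise the ML estimates do not exist. At convergence, each item's expected number of wins equals its observed number.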
- Generalized Additive Model (GAM) for binary response data
- This is a very flexible nonparametric model that relaxes assumptions of linearity and is useful for data exploration. Nonlinearity is allowed via smoothers such as cubic splines, local regression (loess), and bivariate thin-plate splines. It is available only for binary responses.
How to fit it: In PROC GAM or (beginning in SAS 9.4 TS1M3) PROC GAMPL, specify the DIST=BINOMIAL option in the MODEL statement to fit a logistic model.
- Nonparametric logistic model using adaptive splines
- Includes nonadditive models, which allow interactions, and additive models, which do not. The method combines regression splines with model selection: an overfitted model is grown by a fast forward selection technique and then pruned back by backward selection to arrive at a final model.
How to fit it: PROC ADAPTIVEREG (beginning in SAS 9.3 TS1M2) with the DIST=BINOMIAL option in the MODEL statement fits the nonadditive model allowing two-way interactions. Specify the ADDITIVE option in the MODEL statement if a model not allowing interaction is desired.
- Ordinal (ordered) logistic regression model
- Used to model an ordered response, for example, low, medium, or high. It might also be called the ordinal multinomial logit model. Ordinal logistic models take into account the ordered nature of the response, which can result in simpler and more powerful models. Typical response functions that are modeled are cumulative logits, adjacent-category logits, and continuation-ratio logits, resulting in ordinal logistic models known as the cumulative logit model, the adjacent-category logit model, and the continuation-ratio logit model. The proportional odds model and the partial proportional odds model are special cases of the cumulative logit model. If the spacing between levels of the ordinal response scale is known, so that numerical scores can reasonably be assigned to the response levels, then a mean response model can be fit.
- Cumulative logit model
- This is one type of ordinal logistic model that models cumulative logits, $\mathrm{CLogit}_i = \log\left(\frac{\Pr(Y \le i)}{1 - \Pr(Y \le i)}\right)$, for $i = 1, 2, \ldots, k$, where the response has $k+1$ ordered levels. Notice that each cumulative logit involves all levels of the response and dichotomizes the response scale. The unrestricted cumulative logit model has a complete set of parameter estimates for each cumulative logit, that is, multiple intercepts and multiple estimates for each predictor. But such a complex model can produce probabilities that do not accumulate properly: for some predictor values, $\Pr(Y \le i)$ can exceed $\Pr(Y \le i+1)$, which implies a negative probability for level $i+1$. Typically, simpler models that are special cases of the general cumulative logit model are used, namely the proportional odds model and the partial proportional odds model.
How to fit it: See proportional odds model.
- Proportional odds model
- This is a cumulative logit model that assumes that the effect of each predictor on the odds of response at or below a given response level (that is, the odds ratio) is the same regardless of which level you pick. This model allows separate intercepts for the cumulative logits, but restricts the parameter sets for the predictors to be the same across all logits. A cumulative logit model that constrains some predictors to have common parameters and leaves other predictors free to have separate parameters is called a partial proportional odds model.
How to fit it: PROC LOGISTIC fits the proportional odds model by default when the response has more than two levels. PROC LOGISTIC can also fit the partial proportional odds model by specifying (beginning in SAS 9.3 TS1M2) the UNEQUALSLOPES option and optionally (beginning in SAS 9.4 TS1M2) the EQUALSLOPES option in the MODEL statement. PROC GENMOD, PROC GLIMMIX, and PROC HPGENSELECT fit this model by default when the DIST=MULT option is specified in the MODEL statement. PROC PROBIT can fit the model if you specify DIST=LOGISTIC in the MODEL statement. It can also be fit in the SAS/ETS procedure QLIM if you specify the DISCRETE(DIST=LOGISTIC) option in the MODEL statement. PROC NLMIXED can fit the full or partial proportional odds model as discussed in this note. The GLIMMIX and NLMIXED procedures in SAS/STAT and (beginning in SAS 9.4 TS1M4) QLIM in SAS/ETS allow for the inclusion of random effects in the model. The preceding procedures use maximum likelihood estimation. PROC CATMOD uses weighted least squares to fit the unrestricted cumulative logit model when the RESPONSE CLOGITS; statement is specified. The restricted or partial proportional odds model can also be specified.
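The following Python sketch computes category probabilities from cumulative logits of the form logit Pr(Y ≤ i) = alpha_i + beta_i*x. With one common slope (the proportional odds model), the cumulative odds ratio for a one-unit change in x is the same at every cutpoint; with unequal slopes (the unrestricted model), the cumulative probabilities can cross and yield a negative category probability. The parameter values and function names are hypothetical; note that under this parameterization a positive slope shifts probability toward the lower response levels.

```python
import math

def logistic(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))

def cumulative_logit_probs(intercepts, slopes, x):
    """logit Pr(Y <= i) = intercepts[i] + slopes[i] * x for i = 1..k (k+1 levels).
    Returns (cumulative probabilities, category probabilities).
    Equal slopes give the proportional odds model."""
    cum = [logistic(a + b * x) for a, b in zip(intercepts, slopes)] + [1.0]
    probs = [cum[0]] + [cum[i] - cum[i - 1] for i in range(1, len(cum))]
    return cum, probs

def cumulative_odds(p):
    return p / (1.0 - p)

# Proportional odds: increasing intercepts, one common slope of 0.8.
alphas = [-1.0, 0.5, 2.0]
po_slopes = [0.8, 0.8, 0.8]
cum0, _ = cumulative_logit_probs(alphas, po_slopes, 0.0)
cum1, probs1 = cumulative_logit_probs(alphas, po_slopes, 1.0)
# Cumulative odds ratio for x = 1 versus x = 0 at each cutpoint (all exp(0.8)).
odds_ratios = [cumulative_odds(c1) / cumulative_odds(c0)
               for c0, c1 in zip(cum0[:-1], cum1[:-1])]

# Unrestricted model with unequal slopes: at x = 5 the cumulative probabilities
# cross, so one "category probability" is negative.
_, bad = cumulative_logit_probs([-1.0, 0.5], [1.5, 0.2], 5.0)
```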
- Adjacent-category logit model
- This is a type of ordinal logistic model that models adjacent-category logits, $\mathrm{ALogit}_i = \log\left(\frac{\Pr(Y = i)}{\Pr(Y = i+1)}\right)$, for $i = 1, 2, \ldots, k$, where the response has $k+1$ ordered levels. Notice that each adjacent-category logit contrasts two adjacent response categories rather than involving the entire response scale as cumulative logits do. The model can allow separate parameter vectors for each logit (unequal slopes model) or be restricted so that only the intercepts vary (equal slopes model).
How to fit it: Beginning in SAS 9.4 TS1M3, specify the LINK=ALOGIT option in the MODEL statement of PROC LOGISTIC to fit the model using maximum likelihood estimation. The equal slopes model is fitted by default. Specify the UNEQUALSLOPES option to fit the unequal slopes model. PROC CATMOD uses the method of weighted least squares to fit the adjacent-category logit model when the RESPONSE ALOGITS; statement is specified. The unequal slopes model is fitted by default. The equal slopes model can be fitted by including the _RESPONSE_ keyword in the MODEL statement. This simpler model can be fitted by maximum likelihood in PROC CATMOD by writing this model as a generalized logit model. See the text by Agresti for discussion and examples using PROC CATMOD. The equal slopes model can also be fitted by maximum likelihood in PROC GENMOD as a loglinear model as discussed by Allison.
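As a sketch of how adjacent-category logits relate to the category probabilities, the Python code below converts in both directions. Summing ALogit_i through ALogit_k telescopes to log(Pr[Y=i]/Pr[Y=k+1]), which is how the probabilities are recovered. In the equal slopes model each adjacent logit is alpha_i + beta*x, so a one-unit change in x shifts every adjacent logit by beta. Function names and values are hypothetical.

```python
import math

def adjacent_logits(probs):
    """ALogit_i = log(Pr[Y=i] / Pr[Y=i+1]) for i = 1..k."""
    return [math.log(probs[i] / probs[i + 1]) for i in range(len(probs) - 1)]

def probs_from_adjacent(alogits):
    """Invert adjacent-category logits. The sum ALogit_i + ... + ALogit_k equals
    log(Pr[Y=i] / Pr[Y=k+1]), the log weight of level i relative to the last level."""
    k = len(alogits)
    log_w = [sum(alogits[i:]) for i in range(k)] + [0.0]
    m = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

def equal_slopes_probs(intercepts, beta, x):
    """Equal slopes adjacent-category model: ALogit_i = intercepts[i] + beta * x."""
    return probs_from_adjacent([a + beta * x for a in intercepts])

probs = [0.1, 0.2, 0.3, 0.4]
round_trip = probs_from_adjacent(adjacent_logits(probs))
```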
- Continuation-ratio logit model
- This is a type of ordinal logistic model that models continuation-ratio logits, $\mathrm{CRLogit}_i = \log\left(\frac{\Pr(Y = i+1)}{\Pr(Y \le i)}\right)$, for $i = 1, 2, \ldots, k$, where the response has $k+1$ ordered levels.
How to fit it: This model, which has a parameter vector for each logit, can be fit by weighted least squares (WLS) in PROC CATMOD by using the capabilities in the RESPONSE statement to define custom response functions. It can also be fit by maximum likelihood by taking advantage of the fact that the parameters for each logit can be estimated using a separate binary logistic model. A simpler model with an intercept for each logit but a single parameter vector for the predictors across all logits can be fit by WLS in PROC CATMOD by including the _RESPONSE_ keyword in the MODEL statement. Examples are provided in this note.
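The separate binary models can be built by restricting the data, for logit i, to observations with Y ≤ i+1 and coding the binary response as 1 when Y = i+1. Conditional on Y ≤ i+1, the log odds of Y = i+1 is exactly log(Pr[Y=i+1]/Pr[Y≤i]). The Python sketch below shows only this data-preparation step; the function name and records are hypothetical, and each resulting data set would then be fit with an ordinary binary logistic model.

```python
def continuation_ratio_datasets(records, k):
    """records: list of (y, x) pairs with ordinal y in 1..k+1.
    Returns {i: [(binary_response, x), ...]} for i = 1..k. The data set for
    logit i keeps observations with y <= i+1 and codes the response 1 when
    y == i+1, so its logit is CRLogit_i = log(Pr[Y=i+1] / Pr[Y<=i])."""
    out = {}
    for i in range(1, k + 1):
        out[i] = [(1 if y == i + 1 else 0, x) for y, x in records if y <= i + 1]
    return out

# Hypothetical three-level response (k = 2) with one predictor value per record.
records = [(1, 0.5), (2, 1.2), (3, 2.0), (2, 0.7)]
datasets = continuation_ratio_datasets(records, k=2)
```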
- Mean response model
- This is not really a logistic model because no type of logit response function is modeled. As a result, the model does not guarantee that predicted values correspond to a valid set of probabilities across the response levels. However, it is an alternative to the ordinal logistic models that were discussed earlier when numerical scores can be assigned to the response levels, implying that the spacing among the levels is known. This model is most natural when the ordinal categorical response represents a continuous response that is coarsely measured. Note that this model provides an estimated mean response rather than assigning estimated probabilities to each response level.
How to fit it: PROC CATMOD fits the mean response model by weighted least squares when the RESPONSE MEANS; statement is specified.
- Nominal (unordered) logistic regression model
- Used to model a multilevel response with no ordering — for example, eye color with levels brown, green, and blue. Such a response is also called polytomous, polychotomous, or multinomial.
- Multinomial logit model
- Strictly speaking, the term multinomial indicates only that the response has more than two levels; it does not specify whether they are ordered or unordered. Nevertheless, the term multinomial logit model is often used when the response is a set of unordered choices and refers to the discrete choice model. But the term can also refer to the ordinal model or to the generalized logit model.
- Generalized logit model
- This is an unconditional, nominal logistic model for a response with $k+1$ levels, in which a set of $k$ response functions, known as generalized or baseline logits, is modeled. Each generalized logit contrasts one level with the last level: $\mathrm{GLogit}_i = \log\left(\frac{\Pr(Y = i)}{\Pr(Y = k+1)}\right)$, for $i = 1, 2, \ldots, k$.
How to fit it: PROC LOGISTIC fits this model when you specify the LINK=GLOGIT option in the MODEL statement. By default, the parameter vectors for the logits are allowed to differ, but beginning in SAS 9.4 TS1M2, you can use the EQUALSLOPES option to constrain some or all parameters to be equal across the logits. PROC SURVEYLOGISTIC also fits this model with the LINK=GLOGIT option. The GLIMMIX and HPGENSELECT procedures fit this model when you specify the DIST=MULT and LINK=GLOGIT options in the MODEL statement. Beginning in SAS 9.4 TS1M2, the FMM procedure fits this model when you specify the DIST=MULT option in the MODEL statement. All of the above procedures fit the model using maximum likelihood estimation. Exact estimation of the model is also available in PROC LOGISTIC. This is the default model fit by PROC CATMOD when the response has more than two levels. PROC CATMOD can fit the model using maximum likelihood (the default) or weighted least squares (specify the WLS option).
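A small Python sketch of the generalized logit transform and its inverse, with the last level as the reference as in the definition above. In the default (unequal slopes) model each logit has its own intercept and slope; the log odds between any two non-reference levels is the difference of their generalized logits. Names and values are hypothetical.

```python
import math

def generalized_logits(probs):
    """GLogit_i = log(Pr[Y=i] / Pr[Y=k+1]) for i = 1..k (last level is the reference)."""
    ref = probs[-1]
    return [math.log(p / ref) for p in probs[:-1]]

def probs_from_generalized(glogits):
    """Pr[Y=i] = exp(GLogit_i) / (1 + sum_j exp(GLogit_j)); the reference level has weight 1."""
    w = [math.exp(g) for g in glogits] + [1.0]
    total = sum(w)
    return [v / total for v in w]

def glogit_model_probs(intercepts, slopes, x):
    """Each generalized logit has its own intercept and slope."""
    return probs_from_generalized([a + b * x for a, b in zip(intercepts, slopes)])

# Hypothetical three-level nominal response (for example, brown, green, blue eyes).
p = glogit_model_probs([0.3, -0.5], [1.2, 0.4], 0.8)
g = generalized_logits(p)
```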
- Discrete choice models
- Used to model a response that is the choice of individuals—for example, among transportation modes (car, bus, train, plane). Some or all predictors can be properties of the choices (cost, speed, and so on) rather than properties of the choosers as in models such as the generalized logit model. McFadden's conditional logit model, the nested logit model, the mixed logit model, and the exploded logit model are discrete choice models. The generalized logit model is often used as a discrete choice model too when the predictors are all properties of the choosers (subjects) and not of the choices.
- McFadden's conditional logit model
- Is a discrete choice model in which the predictors are properties of the choices (response levels). This model assumes independence from irrelevant alternatives (IIA). More information about this model can be found in this note.
How to fit it: The discrete choice model can be fit by the SAS/STAT procedure BCHOICE using Bayesian methods (beginning in SAS 9.4 TS1M1), the SAS/STAT procedure PHREG using the STRATA statement and the TIES=BRESLOW option, the SAS/STAT procedure LOGISTIC using the STRATA statement, and the SAS/ETS procedure MDC.
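In McFadden's conditional logit model, each alternative j has utility V_j = beta'z_j built from its own attributes z_j, and Pr(choose j) = exp(V_j) / sum over l of exp(V_l). The Python sketch below computes these probabilities and checks the IIA property: the ratio of two alternatives' probabilities does not change when a third alternative is removed. The coefficients and attribute values are hypothetical.

```python
import math

def choice_probs(beta, attributes):
    """Conditional logit choice probabilities.
    attributes[j] holds the attribute values of alternative j; V_j = beta . attributes[j]."""
    v = [sum(b * a for b, a in zip(beta, attrs)) for attrs in attributes]
    m = max(v)  # subtract the maximum utility for numerical stability
    w = [math.exp(u - m) for u in v]
    total = sum(w)
    return [x / total for x in w]

beta = [-0.5, -0.1]  # hypothetical coefficients for cost and travel time
car, bus, train = (4.0, 20.0), (1.5, 45.0), (3.0, 30.0)
p3 = choice_probs(beta, [car, bus, train])
p2 = choice_probs(beta, [car, train])  # same choice with the bus removed
```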
- Nested logit model
- Is a generalization of the conditional logit model that relaxes the IIA assumption to allow for particular patterns of correlation in unobserved utility. It can be used if the set of alternatives that are faced by an individual can be partitioned into subsets such that the IIA property holds within subsets but not across subsets.
How to fit it: The nested logit model can be fit by the SAS/STAT procedure BCHOICE using Bayesian methods (beginning in SAS 9.4 TS1M1), or the SAS/ETS procedure MDC.
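The Python sketch below computes nested logit choice probabilities in one common normalization: within a nest with dissimilarity (log-sum) parameter lambda, Pr(j | nest) is a logit in V_j/lambda, and the nest is chosen with a logit in lambda times its inclusive value. When every lambda is 1, the model reduces to the conditional logit model. The nest structure, utilities, and parameter values are hypothetical, and BCHOICE or MDC may parameterize the model differently.

```python
import math

def nested_logit_probs(utilities, nests, lambdas):
    """utilities: {alternative: V_j}; nests: {nest: [alternatives]};
    lambdas: {nest: dissimilarity parameter, typically in (0, 1]}.
      Pr(j | nest) = exp(V_j / lam) / sum_{l in nest} exp(V_l / lam)
      IV_nest      = log sum_{l in nest} exp(V_l / lam)   (inclusive value)
      Pr(nest)     = exp(lam * IV_nest) / sum_m exp(lam_m * IV_m)"""
    iv = {n: math.log(sum(math.exp(utilities[a] / lambdas[n]) for a in alts))
          for n, alts in nests.items()}
    denom = sum(math.exp(lambdas[n] * iv[n]) for n in nests)
    probs = {}
    for n, alts in nests.items():
        p_nest = math.exp(lambdas[n] * iv[n]) / denom
        within = sum(math.exp(utilities[a] / lambdas[n]) for a in alts)
        for a in alts:
            probs[a] = p_nest * math.exp(utilities[a] / lambdas[n]) / within
    return probs

V = {"car": 1.0, "bus": 0.2, "train": 0.5}
nests = {"auto": ["car"], "transit": ["bus", "train"]}
p_nested = nested_logit_probs(V, nests, {"auto": 1.0, "transit": 0.5})
```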
- Mixed logit model
- Is a generalization of the conditional logit model that can represent very general patterns of substitution among alternatives. In this model, the utility function of each decision maker can be decomposed into a deterministic component (linear combination of observed variables) and a stochastic error component. The choice probability is a mixture of logits. The model for the error component involves random coefficients.
How to fit it: The mixed logit model can be fit using the SAS/STAT procedure BCHOICE (beginning in SAS 9.4 TS1M1), or the SAS/ETS procedure MDC.
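Because the mixed logit probability is a mixture of logits over random coefficients, it is usually approximated by simulation. The Python sketch below draws each coefficient from an assumed normal distribution and averages the conditional logit probabilities over the draws; with zero standard deviations it reduces to the conditional logit model. The normal mixing distribution, number of draws, and all values are illustrative assumptions, not the estimation method of any particular procedure.

```python
import math
import random

def logit_probs(utilities):
    """Conditional logit probabilities for a list of utilities."""
    m = max(utilities)
    w = [math.exp(u - m) for u in utilities]
    total = sum(w)
    return [x / total for x in w]

def mixed_logit_probs(mean_beta, sd_beta, attributes, draws=2000, seed=1):
    """Simulated mixed logit choice probabilities for one decision maker.
    Each coefficient is random: beta_m ~ Normal(mean_beta[m], sd_beta[m]**2)."""
    rng = random.Random(seed)
    acc = [0.0] * len(attributes)
    for _ in range(draws):
        beta = [rng.gauss(mu, sd) for mu, sd in zip(mean_beta, sd_beta)]
        p = logit_probs([sum(b * a for b, a in zip(beta, attrs)) for attrs in attributes])
        acc = [s + q for s, q in zip(acc, p)]
    return [s / draws for s in acc]

attrs = [(4.0, 20.0), (1.5, 45.0), (3.0, 30.0)]  # hypothetical (cost, time) per mode
p_mixed = mixed_logit_probs([-0.5, -0.1], [0.3, 0.05], attrs)
```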
- Exploded logit model
- Is a model used when subjects rank all or some of the choices. The likelihood is identical to that for a stratified Cox model typically used in survival analysis to model the time to an event. Discussion and an example can be found in the text by Allison.
How to fit it: The data are arranged as one observation per rank from a subject. Use PROC PHREG with the variable containing the ranks as the response variable. Specify a variable in the STRATA statement identifying the subjects. If tied ranks occur in the data, use the TIES=DISCRETE option in the MODEL statement.
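The exploded logit likelihood treats a ranking as a sequence of choices: the top-ranked item is chosen from all items, the second from those remaining, and so on, each stage being a conditional logit over a shrinking choice set (the analogue of a risk set in the stratified Cox model). The Python sketch below computes the probability of a full or partial ranking under this model; the utilities are hypothetical and tied ranks are not handled.

```python
import math
from itertools import permutations

def ranking_prob(utilities, ranking):
    """Exploded (rank-ordered) logit probability of a ranking.
    utilities: {item: V}; ranking: items from best to worst (may be partial).
    Each stage is a logit choice among the items not yet ranked."""
    remaining = set(utilities)
    prob = 1.0
    for item in ranking:
        denom = sum(math.exp(utilities[r]) for r in remaining)
        prob *= math.exp(utilities[item]) / denom
        remaining.discard(item)
    return prob

U = {"A": 0.5, "B": 0.0, "C": -0.7}
total_prob = sum(ranking_prob(U, list(order)) for order in permutations(U))
```

The probabilities of all complete rankings sum to 1, and the probability of a top-1 "ranking" is just the conditional logit choice probability.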
Special Logistic Models
- Logistic model for longitudinal (or repeated measures) data
- These models are for a response that is observed more than once on each subject (or item), either at multiple times or under multiple conditions. The response can be binary, ordinal, or nominal. There are three primary types of models: marginal (or population-averaged), subject-specific (including fixed-effects and random-effects models), and transitional.
- Generalized Estimating Equations (GEE)
- The GEE method allows missing values within a subject without losing all data from the subject, and time-varying predictors can appear in the model. The method requires a large number of subjects. A variation on the GEE model that models the association among responses with odds ratios rather than correlations is Alternating Logistic Regression (ALR). The ALR method does not restrict the correlation among the measurements as the usual GEE method does. Like the GEE model, ALR provides estimates of the marginal model parameters. But ALR also estimates parameters of the model on the log odds ratios among the measurements. The GEE and ALR models are marginal models. Note that the GEE estimation method is not a maximum likelihood method.
How to fit it: To fit the GEE model, specify the REPEATED statement in PROC GENMOD or (beginning in SAS 9.4 TS1M2) PROC GEE. In PROC GENMOD, the DIST=BINOMIAL or DIST=MULT option must appear in the MODEL statement to request binary or ordinal multinomial logistic models, respectively. The nominal multinomial model is available in PROC GEE beginning in SAS 9.4 TS1M3. Use the TYPE= option in the REPEATED statement to specify the correlation structure among the repeated measurements within a subject. To fit the ALR model, specify LOGOR= rather than TYPE= in the REPEATED statement of PROC GENMOD or (beginning in SAS 9.4 TS1M3) in PROC GEE. The ALR model is available only for binary responses in PROC GENMOD but can also be used with ordinal multinomial responses in PROC GEE. PROC GEE also implements the weighted GEE method when missing responses depend on previous responses. Weighted GEE is not available in PROC GENMOD. A GEE model, estimated by residual pseudo-likelihood, can also be fit using PROC GLIMMIX. Specify the EMPIRICAL option in the PROC GLIMMIX statement. Also, specify the RANDOM _RESIDUAL_ statement with the subject variable in the SUBJECT= option. An example is given in the GLIMMIX documentation.
- Cluster model with variance adjustment
- This model is fit by maximum likelihood, but variances are adjusted using the cluster structure of the data. For a small number of clusters, this model with the Morel adjustment (VADJUST=MOREL) can provide better variance estimates than the GEE model. It can be used for binary, ordinal, or nominal responses. It is a marginal model.
How to fit it: Use PROC SURVEYLOGISTIC with the CLUSTER statement and optionally use the VADJUST= option in the MODEL statement.
- Binomial and multinomial cluster models
- These models, proposed by Morel and Nagaraj (1993), account for the overdispersion that results when some proportion of the observed population responds in the same way while the remaining proportion responds according to a binomial or multinomial distribution.
How to fit it: To fit the binomial cluster model in PROC FMM, specify the DIST=BINOMCLUS option in the MODEL statement. Beginning in SAS 9.4 TS1M2, you can fit the multinomial cluster model in PROC FMM by specifying the DIST=MCLUS option in the MODEL statement.
- Weighted Least Squares (WLS)
- Available for binary, ordinal, and nominal responses, the WLS method requires complete data for each subject (otherwise the subject is ignored) and does not allow time-varying predictors in the model. It is a marginal model.
How to fit it: Specify the REPEATED statement in PROC CATMOD.
- Fixed-effects logistic model
- This model treats each measurement on each subject as a separate observation, and the set of subject coefficients that would appear in an unconditional model are eliminated by conditional methods. This is a conditional, subject-specific model (as opposed to a population-averaged model like the GEE model). See the extensive discussion and examples in the text on Fixed Effects Methods by Allison.
How to fit it: For binary response data, use the STRATA statement in PROC LOGISTIC. See the Allison text on Fixed Effects Methods concerning fitting a related model for multinomial responses.
- Transition models for discrete state space stochastic processes
- Discrete time Markov chains can be represented as log-linear models. Predictor variables can be incorporated by fitting a logistic model and treating previous response values as additional predictors.
How to fit it: Log-linear models can be fit in PROC GENMOD by specifying the cell counts of the table as the response and specifying DIST=POISSON in the MODEL statement. The first-order Markov chain for a process over four discrete times is MODEL COUNT = T1|T2 T2|T3 T3|T4 / DIST=POISSON; The second-order model is MODEL COUNT = T1|T2|T3 T2|T3|T4 / DIST=POISSON; A transitional model incorporating predictor variables is fit as a logistic model, but the input data set should have separate observations for the responses at each time for each subject. Variables representing lags of the response variable can then be used as predictors in the model. Examples are provided in this note.
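The data preparation for a first-order transition model can be sketched as follows: sort each subject's observations by time and attach the previous response as a lagged predictor (missing at the first time point). The Python code below does this and also tabulates (previous, current) pairs, which form the first-order transition table. The field names `id`, `time`, `y`, and `y_lag1` are hypothetical.

```python
from collections import Counter, defaultdict

def add_lagged_response(rows):
    """rows: dicts with keys 'id', 'time', 'y' (one observation per subject per time).
    Adds 'y_lag1', the subject's response at the previous time (None at the first time)."""
    by_subject = defaultdict(list)
    for r in rows:
        by_subject[r["id"]].append(r)
    out = []
    for sid in sorted(by_subject):
        prev = None
        for r in sorted(by_subject[sid], key=lambda row: row["time"]):
            out.append({**r, "y_lag1": prev})
            prev = r["y"]
    return out

def transition_counts(rows_with_lag):
    """Counts of (previous response, current response) pairs, skipping first times."""
    return Counter((r["y_lag1"], r["y"]) for r in rows_with_lag if r["y_lag1"] is not None)

rows = [{"id": 1, "time": 1, "y": 0}, {"id": 1, "time": 2, "y": 1}, {"id": 1, "time": 3, "y": 1},
        {"id": 2, "time": 1, "y": 1}, {"id": 2, "time": 2, "y": 0}, {"id": 2, "time": 3, "y": 0}]
lagged = add_lagged_response(rows)
counts = transition_counts(lagged)
```

The first observation for each subject has no lagged value, so a first-order transition model effectively conditions on each subject's initial response.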
- Fractional logistic model
- When the response is continuous and bounded between 0 and 1 (or 0% and 100%) and does not represent a set of binary Bernoulli trials, the response distribution is generally not known. A quasi-likelihood method can be used to fit the model as discussed in McCullagh and Nelder (1989). The 4- or 5-parameter logistic models, which have particular nonlinear forms, are also used for this situation.
How to fit it: Use PROC GLIMMIX and specify dist=binomial link=logit in the MODEL statement. Also specify random _residual_; to estimate a scale (dispersion) parameter. For examples, see this note and "Quasi-likelihood Estimation for Proportions with Unknown Distribution" in the Examples section of the GLIMMIX documentation.
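The quasi-likelihood approach assumes only the mean model logit(E[y]) = b0 + b1*x and the variance function Var(y) = phi*mu*(1 - mu), so the estimating equations are the same as the binary logistic score equations with y allowed anywhere in [0, 1]. The Python sketch below solves them by Fisher scoring and then estimates the scale parameter phi by the Pearson statistic divided by its degrees of freedom. The data are hypothetical, and the Pearson-based scale estimate is one common choice that may not match the estimate PROC GLIMMIX reports.

```python
import math

def logistic(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))

def fit_fractional_logit(x, y, iters=50):
    """Quasi-likelihood fit of logit(E[y]) = b0 + b1*x for responses 0 <= y <= 1.
    Solves sum((y - mu) * [1, x]) = 0 by Fisher scoring, then returns the
    Pearson estimate of the scale parameter phi with n - 2 degrees of freedom."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = logistic(b0 + b1 * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    pearson = 0.0
    for xi, yi in zip(x, y):
        mu = logistic(b0 + b1 * xi)
        pearson += (yi - mu) ** 2 / (mu * (1.0 - mu))
    phi = pearson / (len(y) - 2)
    return b0, b1, phi

x = [0, 1, 2, 3, 4, 5]
y = [0.10, 0.18, 0.35, 0.42, 0.66, 0.80]  # hypothetical proportions
b0, b1, phi = fit_fractional_logit(x, y)
```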
- Four- or 5-parameter logistic model
- Like the fractional logistic model, these models (also called Emax or Hill models) are for a continuous response bounded between 0 and 1. They can be fit assuming a specified distribution or using quasi-likelihood for a more distribution-free approach. These models have particular nonlinear forms.
How to fit it: Use PROC NLMIXED to define the model form. Specify the desired distribution or define a quasi-likelihood function. See this note for an example.
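One widely used parameterization of these curves is shown below as a Python sketch; in PROC NLMIXED the same expression would be written in programming statements that define the predicted mean. The parameter names a, b, c, d, e are conventional labels assumed here, not SAS syntax: a is the response as x approaches 0 (for b > 0), d is the response as x grows large, c is the x value at the midpoint (a + d)/2 of the 4-parameter curve, b controls the steepness (Hill slope), and e adds asymmetry.

```python
def four_pl(x, a, b, c, d):
    """Four-parameter logistic for x >= 0: moves from a (x -> 0, with b > 0)
    to d (x -> infinity); equals the midpoint (a + d) / 2 at x = c."""
    return d + (a - d) / (1.0 + (x / c) ** b)

def five_pl(x, a, b, c, d, e):
    """Five-parameter logistic: the asymmetry parameter e = 1 gives the 4PL."""
    return d + (a - d) / (1.0 + (x / c) ** b) ** e

midpoint = four_pl(2.0, 0.0, 1.5, 2.0, 1.0)
```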
- Logistic model for survey data
- Includes models for binary, ordinal, and nominal responses. For proper inferences, the analysis of survey data must incorporate properties of the survey sample design, including stratification, clustering, and unequal weighting.
How to fit it: In PROC SURVEYLOGISTIC use the RATE= and TOTAL= options in the PROC statement, and the CLUSTER, STRATA, and WEIGHT statements to specify the sampling design and sampling weights. Note that simply using the WEIGHT statement in PROC LOGISTIC is not sufficient since special variance estimators are required.
- Random effects logistic model (as a generalized linear mixed model or nonlinear mixed model)
- Allows random effects in a logistic model resulting in a subject-specific model. This is a conditional model that can also be used to model longitudinal or repeated measures data.
How to fit it: Use PROC GLIMMIX and in the MODEL statement specify DIST=BINOMIAL LINK=LOGIT (for a binary logit model), DIST=MULT LINK=CLOGIT (for an ordinal logit model), or DIST=MULT LINK=GLOGIT (for a nominal logit model). Use RANDOM statements to define random effects. The model can also be fit in PROC NLMIXED by using a different methodology that typically limits the number of random effects to one or two. Only binary responses are directly supported (specify BINARY(p) or BINOMIAL(n,p) in the MODEL statement), though multinomial models can be accommodated by defining the multinomial log likelihood (via the GENERAL distribution type in the MODEL statement). Beginning in SAS 9.4 TS1M4, random effects can be added to binary or ordinal logit and probit models with the RANDOM statement in PROC QLIM in SAS/ETS. For Bayesian estimation, use PROC MCMC and specify BINARY or BINOMIAL in the MODEL statement, a PRIOR statement to specify prior distributions for the parameters, and RANDOM statements to define random effects.
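Fitting these models requires integrating the random effects out of the likelihood. The Python sketch below does this for a single normal random intercept with a simple midpoint rule (a stand-in for the quadrature or approximation methods the procedures actually use) and illustrates a consequence of the subject-specific formulation: the population-averaged probability is pulled toward 0.5 relative to the subject-specific one, so marginal (for example, GEE) effects are typically smaller in magnitude. The grid settings and parameter values are hypothetical.

```python
import math

def logistic(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))

def marginal_prob(eta, sigma, n=2001, width=8.0):
    """Population-averaged Pr(y=1) for the random-intercept model
    logit Pr(y=1 | u) = eta + u, u ~ Normal(0, sigma**2),
    integrated over u with a midpoint rule on [-width*sigma, width*sigma]."""
    if sigma == 0:
        return logistic(eta)
    h = 2.0 * width * sigma / n
    total = 0.0
    for k in range(n):
        u = -width * sigma + (k + 0.5) * h
        dens = math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += logistic(eta + u) * dens * h
    return total

subject_specific = logistic(1.0)          # for a subject with u = 0
population_averaged = marginal_prob(1.0, 2.0)
```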
- Heteroscedastic logistic model
- This model allows the dispersion to be modeled as well as the mean. Binary and ordinal responses can be modeled.
How to fit it: Use the MODEL statement in the SAS/ETS procedure QLIM to specify the model for the mean, and use the HETERO statement to specify the dispersion model. Specify the DISCRETE(DIST=LOGISTIC) option in the MODEL statement to fit a binary or ordinal logistic model (depending on the number of levels that are detected in the response variable).
- Multivariate logistic model
- Simultaneously models multiple responses, taking into account the correlations among all response functions.
How to fit it: In PROC CATMOD, specify the RESPONSE LOGITS; statement and multiple response variables in the MODEL statement to fit the model by using weighted least squares estimation. For example, these statements simultaneously model logits that are defined separately on three response variables: response logits; model x1*x2*x3 = group; The bivariate probit model can be fit in SAS/ETS PROC QLIM: model y1 y2 = x / discrete;
- Conditional and exact conditional logistic regression
- These estimation methods use conditioning to eliminate the need to estimate a set of parameters that can be considered nuisance parameters. The conditional model is available only with a binary response. A conditional likelihood is maximized to estimate the remaining parameters. Alternatively, exact estimation can be done using permutation methods. Exact methods are appropriate for small-sample or sparse data situations that often result in the failure (nonconvergence or separation) of the usual unconditional maximum likelihood estimation method. However, exact methods can take a great deal of time and memory as sample or model sizes increase. For sample sizes too large for the default exact method, a Monte Carlo method is provided.
How to fit it: Specify the STRATA statement in PROC LOGISTIC to fit the conditional model. Also specify the EXACT statement in PROC LOGISTIC or (beginning with SAS/STAT 9.22 in SAS 9.2 TS2M3) in PROC GENMOD to fit the exact conditional model. Use the EXACTOPTIONS option in the PROC LOGISTIC statement (or the EXACTOPTIONS statement in PROC GENMOD) to select the exact method (METHOD=NETWORKMC requests the Monte Carlo method) and to control other aspects of the exact analysis.
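In the simplest case (one stratum, one binary covariate), the exact conditional distribution can be written down directly. Conditioning on the total number of events m removes the intercept, and the sufficient statistic T = sum(x*y) then has Pr(T = t | m) proportional to C(n1, t)*C(n0, m - t)*exp(beta*t), where n1 and n0 are the numbers of subjects with x = 1 and x = 0. Under beta = 0 this is the hypergeometric distribution, so the exact test reduces to Fisher's exact test. The Python sketch below (hypothetical names; SAS reports several exact statistics and the one here is only the "probability" test) uses a completely separated data set, where unconditional ML estimation would fail.

```python
import math

def exact_conditional_dist(n1, n0, m, beta=0.0):
    """Pr(T = t | m) for T = sum(x*y) with a binary covariate in one stratum:
    proportional to C(n1, t) * C(n0, m - t) * exp(beta * t)."""
    lo, hi = max(0, m - n0), min(n1, m)
    weights = {t: math.comb(n1, t) * math.comb(n0, m - t) * math.exp(beta * t)
               for t in range(lo, hi + 1)}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

def exact_p_value(n1, n0, m, t_obs):
    """Two-sided exact p-value for H0: beta = 0: the total null probability of
    values of T no more likely than t_obs (small relative tolerance for ties)."""
    dist = exact_conditional_dist(n1, n0, m)
    p_obs = dist[t_obs]
    return sum(p for p in dist.values() if p <= p_obs * (1 + 1e-7))

# All 5 events occur among the 5 subjects with x = 1 (complete separation).
p_value = exact_p_value(n1=5, n0=5, m=5, t_obs=5)
```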
- Bayesian logistic regression
- This is a method for estimating the parameters of a binary logistic model. The method assumes that the parameters of the model are random variables and allows you to specify prior distributions for them. The method combines the data and the prior distribution to arrive at a posterior distribution for each parameter. For more information, see "Introduction to Bayesian Analysis Procedures" in the Introductions section of the SAS/STAT User's Guide.
How to fit it: In PROC GENMOD or PROC FMM, add the BAYES statement to specify prior distributions for the parameters. In PROC MCMC, specify BINARY or BINOMIAL in the MODEL statement, and a PRIOR statement to specify prior distributions for the parameters.
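The Python sketch below shows the idea behind Bayesian logistic regression with a random-walk Metropolis sampler: independent normal priors on the intercept and slope are combined with the binary logistic likelihood, and the retained draws approximate the posterior distribution. The prior standard deviation, step size, chain length, burn-in, and data are illustrative assumptions; PROC MCMC, GENMOD, and FMM use their own samplers and defaults.

```python
import math
import random

def log_posterior(b0, b1, x, y, prior_sd=10.0):
    """Binary logistic log likelihood plus Normal(0, prior_sd**2) log priors
    on b0 and b1 (up to an additive constant)."""
    lp = -(b0 ** 2 + b1 ** 2) / (2.0 * prior_sd ** 2)
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        # y*eta - log(1 + exp(eta)), with log(1 + exp(eta)) computed stably.
        lp += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return lp

def metropolis(x, y, n_iter=20000, step=0.4, seed=2):
    """Random-walk Metropolis for (b0, b1); discards the first quarter as burn-in.
    Returns (posterior draws, acceptance rate)."""
    rng = random.Random(seed)
    b = (0.0, 0.0)
    cur = log_posterior(b[0], b[1], x, y)
    draws, accepted = [], 0
    for it in range(n_iter):
        prop = (b[0] + rng.gauss(0.0, step), b[1] + rng.gauss(0.0, step))
        new = log_posterior(prop[0], prop[1], x, y)
        # Accept with probability min(1, exp(new - cur)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < new - cur:
            b, cur = prop, new
            accepted += 1
        if it >= n_iter // 4:
            draws.append(b)
    return draws, accepted / n_iter

x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 0, 1, 1]  # hypothetical binary responses
draws, acc_rate = metropolis(x, y)
posterior_mean_b1 = sum(d[1] for d in draws) / len(draws)
```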
Operating System and Release Information
SAS System | SAS/STAT | All | n/a | |
SAS System | SAS/ETS | All | n/a | |
*
For software releases that are not yet generally available, the Fixed
Release is the software release in which the problem is planned to be
fixed.
Type: | Usage Note |
Priority: | low |
Topic: | SAS Reference ==> Procedures ==> SURVEYLOGISTIC SAS Reference ==> Procedures ==> PROBIT Analytics ==> Regression SAS Reference ==> Procedures ==> PHREG SAS Reference ==> Procedures ==> NLMIXED Analytics ==> Mixed Models SAS Reference ==> Procedures ==> LOGISTIC SAS Reference ==> Procedures ==> MDC Analytics ==> Market Research SAS Reference ==> Procedures ==> QLIM Analytics ==> Longitudinal Analysis Analytics ==> Categorical Data Analysis SAS Reference ==> Procedures ==> CATMOD Analytics ==> Econometrics SAS Reference ==> Procedures ==> GAM SAS Reference ==> Procedures ==> GENMOD SAS Reference ==> Procedures ==> GLIMMIX SAS Reference ==> Procedures ==> FMM SAS Reference ==> Procedures ==> MCMC Analytics ==> Psychometrics Analytics ==> Bayesian Analysis SAS Reference ==> Procedures ==> ADAPTIVEREG SAS Reference ==> Procedures ==> BCHOICE Analytics ==> Survey Sampling and Analysis SAS Reference ==> Procedures ==> GEE
|
Date Modified: | 2018-09-18 11:31:00 |
Date Created: | 2002-12-16 10:56:39 |