Generalized Linear Regression

As outlined in the section Generalized Linear Models in Chapter 3: Introduction to Statistical Modeling with SAS/STAT Software, the class of generalized linear models generalizes the linear regression models in two ways:

  • by allowing the data to come from a distribution that is a member of the exponential family of distributions

  • by introducing a link function, $g(\cdot )$, that provides a mapping between the linear predictor $\eta = \mb {x}’\bbeta $ and the mean of the data, $g(\mr {E}[Y]) = \eta $. The link function is monotonic, so that $\mr {E}[Y] = g^{-1}(\eta )$; $g^{-1}(\cdot )$ is called the inverse link function.
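
The relationship between the linear predictor and the mean can be illustrated with a short Python sketch. It is not SAS code and does not reflect how any SAS/STAT procedure is implemented; the names `LINKS`, `linear_predictor`, and `mean_response`, as well as the regressor and coefficient values, are hypothetical and chosen only for illustration. Each link $g(\cdot )$ is paired with its inverse, and the mean is obtained as $g^{-1}(\eta )$.

```python
import math

# Each entry pairs a link function g with its inverse g^{-1}.
# These three links are common choices; the dictionary itself is illustrative.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
}


def linear_predictor(x, beta):
    """Return eta = x'beta for one row of regressors."""
    return sum(xi * bi for xi, bi in zip(x, beta))


def mean_response(x, beta, link="identity"):
    """Return E[Y] = g^{-1}(eta) for the named link."""
    _, inverse_link = LINKS[link]
    return inverse_link(linear_predictor(x, beta))


# Hypothetical values: a leading 1 for the intercept and one regressor.
x = [1.0, 2.0]
beta = [0.5, -0.25]

eta = linear_predictor(x, beta)
for name in LINKS:
    print(name, mean_response(x, beta, link=name))
```

Because each link is monotonic, applying $g(\cdot )$ to the returned mean recovers the same value of $\eta $, which is why the model can be stated equivalently on either scale.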

One of the most commonly used generalized linear regression models is the logistic model for binary or binomial data. Suppose that Y denotes a binary outcome variable that takes on the values 1 and 0 with the probabilities $\pi $ and $1-\pi $, respectively. The probability $\pi $ is also referred to as the success probability, supposing that the coding $Y=1$ corresponds to a success in a Bernoulli experiment. The success probability is also the mean of Y, and one of the aims of logistic regression analysis is to study how regressor variables affect the outcome probabilities or functions thereof, such as odds ratios.

The logistic regression model for $\pi $ is defined by the linear predictor $\eta = \mb {x}’\bbeta $ and the logit link function:

\[  \mr {logit}(\mr {Pr}(Y=1)) = \log \left( \frac{\pi }{1-\pi } \right) = \mb {x}’\bbeta  \]

Applying the inverse link function to the linear predictor gives the success probability:

\[  \mr {Pr}(Y = 1) = \pi = \frac{1}{1+\exp (-\eta )}  \]
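
As a numerical illustration, the following Python sketch evaluates the logit link and its inverse for the success probability $\pi $ and shows how a coefficient translates into an odds ratio. The sketch makes several assumptions: the function names `logit`, `inverse_logit`, and `odds` are hypothetical, the coefficient values are invented, and the code is not taken from any SAS/STAT procedure.

```python
import math


def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Success probability pi = 1 / (1 + exp(-eta)).

    The two branches are algebraically identical; the split only avoids
    overflow in exp() for large negative eta.
    """
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def odds(p):
    """Odds of success, pi / (1 - pi)."""
    return p / (1.0 - p)


# Hypothetical coefficients: intercept and one regressor.
beta = [-1.0, 0.8]

# Two observations that differ by one unit in the regressor.
eta_a = beta[0] + beta[1] * 1.0
eta_b = beta[0] + beta[1] * 2.0

pi_a = inverse_logit(eta_a)
pi_b = inverse_logit(eta_b)

# On the logit scale a one-unit increase adds beta[1], so the
# odds ratio equals exp(beta[1]).
odds_ratio = odds(pi_b) / odds(pi_a)
print(pi_a, pi_b, odds_ratio, math.exp(beta[1]))
```

The last line prints the odds ratio computed from the two probabilities next to $\exp (\beta _1)$; the two agree up to floating-point rounding, which is the sense in which logistic regression coefficients are interpreted as log odds ratios.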

The dichotomous logistic regression model can be extended to multinomial (polychotomous) data. Two classes of models for multinomial data can be fit by using procedures in SAS/STAT software: models for ordinal data that rely on cumulative link functions, and models for nominal (unordered) outcomes that rely on generalized logits. The next sections briefly discuss SAS/STAT procedures for logistic regression. For a comparison of these procedures with respect to the analysis of categorical responses, see Chapter 8: Introduction to Categorical Data Analysis Procedures.
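
The two classes of multinomial models can be contrasted with a small Python sketch that converts linear predictors into category probabilities. The parameterizations here are assumptions made for illustration: the generalized logit version takes the last category as the reference, and the cumulative logit version models $\mr {logit}(\mr {Pr}(Y \le j)) = \alpha _ j + \eta $ with increasing intercepts. Individual procedures can differ in sign and reference-category conventions, and the function names and numeric values are hypothetical.

```python
import math


def expit(eta):
    """Inverse logit, 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def generalized_logit_probs(etas):
    """Category probabilities for a nominal response with J categories.

    etas holds J-1 linear predictors, each modeling log(pi_j / pi_J),
    where the last category J is the (assumed) reference.
    """
    expo = [math.exp(e) for e in etas]
    denom = 1.0 + sum(expo)
    return [e / denom for e in expo] + [1.0 / denom]


def cumulative_logit_probs(intercepts, eta):
    """Category probabilities for an ordinal response with J categories.

    intercepts holds J-1 increasing values alpha_j, and the model is
    assumed to be logit(Pr(Y <= j)) = alpha_j + eta.
    """
    cumulative = [expit(a + eta) for a in intercepts] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        # Pr(Y = j) is the difference of adjacent cumulative probabilities.
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs


# Hypothetical three-category examples.
nominal = generalized_logit_probs([0.4, -0.3])
ordinal = cumulative_logit_probs([-1.0, 0.5], 0.2)
print(nominal, sum(nominal))
print(ordinal, sum(ordinal))
```

The generalized logit model uses a separate linear predictor for each nonreference category, whereas the cumulative logit model in this sketch shares a single $\eta $ across categories and distinguishes them only through the intercepts.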

The SAS/STAT procedures CATMOD, GENMOD, GLIMMIX, LOGISTIC, and PROBIT can fit generalized linear models for binary, binomial, and multinomial outcomes.