Assigning Data to Roles

To run the Generalized Linear Models task, you must select an input data source. To filter the input data source, click the Filter icon.
You must assign a column to the Response variable role.
Option Name
Description
Roles
Response
Distribution
specifies the distribution for your model. You can choose from these distributions:
  • Binomial
  • Gamma
  • Inverse Gaussian
  • Multinomial
  • Negative binomial
  • Normal
  • Poisson
  • Tweedie
Options for Binomial Distribution
Note: These options are available if you select Binomial from the Distribution drop-down list.
Response data consists of numbers of events and trials
specifies whether the response data consists of two variables: one that contains the number of positive responses (events) and another that contains the number of trials.
Number of events
specifies the column that contains the number of events.
Number of trials
specifies the column that contains the number of trials.
Response
specifies the variable that contains response values.
If you create a binomial response model, you can specify the first or last ordered category as the reference category by using the Event of interest option. You can also select a custom category.
Note: This option is available only if you do not select the Response data consists of numbers of events and trials check box.
Options for All Distribution Types
Response
specifies the variable that contains response values.
If you create a binomial response model or a nominal multinomial model, you can specify the first or last ordered category as the reference category by using the Event of interest option. You can also select a custom category.
  • To create a binomial response model, select Binomial as the distribution. For the binomial response model, specifying one response category as the reference is the same as specifying the other response category as the event category.
  • To create a nominal multinomial model, select Multinomial as the distribution and select Generalized logit as the link function. For the generalized logit model, each logit contrasts a nonreference category with the reference category.
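As an illustration only (this is not code that the task generates), the following Python sketch computes generalized logits from a set of category probabilities. The function name `generalized_logits` and the example probabilities are hypothetical.

```python
import math

def generalized_logits(probs, reference):
    """Return log(p_category / p_reference) for every nonreference category."""
    p_ref = probs[reference]
    return {
        category: math.log(p / p_ref)
        for category, p in probs.items()
        if category != reference
    }

# Three response categories with "c" as the reference category:
# one logit for "a" versus "c" and one for "b" versus "c".
multinomial_logits = generalized_logits({"a": 0.2, "b": 0.3, "c": 0.5}, "c")

# Two response categories: choosing "no" as the reference models "yes"
# as the event, and swapping the reference only flips the sign of the logit.
yes_logit = generalized_logits({"yes": 0.8, "no": 0.2}, "no")["yes"]
no_logit = generalized_logits({"yes": 0.8, "no": 0.2}, "yes")["no"]
```

With two categories, the sign flip shows why choosing one category as the reference is equivalent to choosing the other category as the event.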
Link function
specifies the link function for your model. The functions that are available depend on the selected distribution.
If you select Default for the link function, then the default link function for the model distribution is used.
Here is the list of distributions with the corresponding default link function:
  • Binomial distribution uses the logit link function.
  • Gamma distribution uses the reciprocal link function.
  • Inverse Gaussian distribution uses the reciprocal square link function.
  • Multinomial distribution uses the cumulative logit link function.
  • Negative binomial distribution uses the log link function.
  • Normal distribution uses the identity link function.
  • Poisson distribution uses the log link function.
  • Tweedie distribution uses the log link function.
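The following Python sketch restates the list of defaults as a lookup table and shows how a Default selection could be resolved. It is an illustration, not the task's implementation: the names `DEFAULT_LINK`, `LINK_FUNCTIONS`, `resolve_link`, and `to_linear_predictor` are hypothetical, and the formulas are the usual textbook definitions of each link.

```python
import math

# Default link for each distribution, transcribed from the list above.
DEFAULT_LINK = {
    "Binomial": "logit",
    "Gamma": "reciprocal",
    "Inverse Gaussian": "reciprocal square",
    "Multinomial": "cumulative logit",
    "Negative binomial": "log",
    "Normal": "identity",
    "Poisson": "log",
    "Tweedie": "log",
}

# Textbook forms of the links that act on a single mean value mu.
# The cumulative logit is omitted because it applies the logit to
# cumulative category probabilities rather than to one mean.
LINK_FUNCTIONS = {
    "logit": lambda mu: math.log(mu / (1.0 - mu)),
    "reciprocal": lambda mu: 1.0 / mu,
    "reciprocal square": lambda mu: 1.0 / mu ** 2,
    "log": lambda mu: math.log(mu),
    "identity": lambda mu: mu,
}

def resolve_link(distribution, link="Default"):
    """Return the link name in effect for the selected distribution."""
    return DEFAULT_LINK[distribution] if link == "Default" else link

def to_linear_predictor(distribution, mu, link="Default"):
    """Map a mean mu to the linear-predictor scale through the resolved link."""
    return LINK_FUNCTIONS[resolve_link(distribution, link)](mu)
```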
Explanatory Variables
Classification variables
specifies the variables to use to group (classify) data in the analysis. Classification variables can be either character or numeric.
Parameterization of Effects
Coding
specifies the parameterization method for the classification variable. Design matrix columns are created from the classification variables according to the selected coding scheme.
You can select from these coding schemes:
  • GLM coding specifies less-than-full-rank, reference-cell coding. This coding scheme is the default.
  • Reference coding specifies reference-cell coding.
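To make the difference between the two coding schemes concrete, here is a minimal Python sketch that builds the design matrix columns for one classification variable. The function names are hypothetical, and the sketch assumes that the last ordered level is the reference level.

```python
def glm_coding(values, levels):
    """One indicator column per level. Together with an intercept,
    these columns are linearly dependent (less than full rank)."""
    return [[1 if v == level else 0 for level in levels] for v in values]

def reference_coding(values, levels, reference=None):
    """One indicator column per nonreference level. The reference
    level is represented by a row of zeros."""
    # Assumption for this sketch: default to the last ordered level.
    reference = levels[-1] if reference is None else reference
    kept = [level for level in levels if level != reference]
    return [[1 if v == level else 0 for level in kept] for v in values]

levels = ["A", "B", "C"]
values = ["A", "B", "C", "B"]

glm_columns = glm_coding(values, levels)
# [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]

reference_columns = reference_coding(values, levels)
# [[1, 0], [0, 1], [0, 0], [0, 1]]
```

In the GLM-coded matrix every row sums to 1, which duplicates the intercept column. Reference coding drops one column to remove that redundancy.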
Treatment of Missing Values
An observation is excluded from the analysis when either of these conditions is met:
  • any variable in the model contains a missing value
  • any classification variable contains a missing value (regardless of whether the classification variable is used in the model)
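The exclusion rule above can be sketched as a row filter. This Python example is illustrative only: it represents a missing value as `None`, and the function name, column names, and data are hypothetical.

```python
def is_excluded(row, model_vars, class_vars):
    """Return True if the observation would be dropped under the rule above.

    Every model variable is checked, and every classification variable is
    checked even when it does not appear in the model.
    """
    checked = set(model_vars) | set(class_vars)
    return any(row.get(name) is None for name in checked)

rows = [
    {"y": 1, "x": 2.0, "region": "East", "tier": "Gold"},
    {"y": 0, "x": None, "region": "West", "tier": "Gold"},  # model variable missing
    {"y": 1, "x": 3.5, "region": "East", "tier": None},     # unused class variable missing
]

model_vars = ["y", "x", "region"]
class_vars = ["region", "tier"]  # "tier" is assigned as a role but not used in the model

kept = [row for row in rows if not is_excluded(row, model_vars, class_vars)]
```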
Continuous variables
specifies the independent covariates (regressors) for the regression model. If you do not specify a continuous variable, the task fits a model that contains only an intercept.
Offset variable
specifies a variable to be used as an offset to the linear predictor. An offset plays the role of an effect whose coefficient is known to be 1. Observations that have missing values for the offset variable are excluded from the analysis.
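As a rough sketch of what "coefficient known to be 1" means, the following Python example adds the offset directly to the linear predictor without estimating a coefficient for it. The function names are hypothetical. Using the log of an exposure as the offset in a log-link Poisson model is a common use and is shown here as an assumption, not as a requirement of the task.

```python
import math

def eta_with_offset(x, beta, intercept, offset):
    """Linear predictor: intercept + sum(x_j * beta_j) + 1 * offset."""
    return intercept + sum(xj * bj for xj, bj in zip(x, beta)) + offset

def poisson_mean(x, beta, intercept, exposure):
    """Expected count under a log link with log(exposure) as the offset."""
    return math.exp(eta_with_offset(x, beta, intercept, math.log(exposure)))

# The offset enters unchanged: 1.0 + 2.0 * 0.5 + 3.0
eta = eta_with_offset([2.0], [0.5], 1.0, 3.0)

# With log(exposure) as the offset, the expected count scales in
# proportion to the exposure for the same covariate values.
mean_short = poisson_mean([1.0], [0.2], 0.1, exposure=10.0)
mean_long = poisson_mean([1.0], [0.2], 0.1, exposure=20.0)
```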
Additional Roles
Frequency count
specifies a numeric variable whose value represents the frequency of the observation. If you assign a variable to this role, the task assumes that each observation represents n observations, where n is the value of the frequency variable. If n is not an integer, SAS truncates it to an integer. If n is less than 1 or is missing, the observation is excluded from the analysis. The sum of the frequency variable's values represents the total number of observations.
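The frequency rule can be expressed as a small function. This Python sketch is illustrative only: the function name is hypothetical, and a missing value is represented as `None` or NaN.

```python
import math

def effective_frequency(n):
    """Return the integer frequency used for an observation,
    or None if the observation is excluded from the analysis."""
    if n is None or (isinstance(n, float) and math.isnan(n)):
        return None           # missing frequency: excluded
    if n < 1:
        return None           # less than 1: excluded
    return math.trunc(n)      # noninteger values are truncated

frequencies = [3, 2.9, 0.5, None, 1.7]
used = [effective_frequency(n) for n in frequencies]

# Total number of observations that the retained rows represent.
total_observations = sum(f for f in used if f is not None)
```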
Weight variable
specifies the column to use as a weight to perform a weighted analysis of the data.