Assigning Data to Roles

To run the Generalized Linear Models task, you must select an input data source. To filter the input data source, click the filter icon.
You must also assign a column to the Response role for all distribution types except binomial. If you select a binomial distribution, you must assign either a single variable to the Response role or a pair of variables to the Number of events and Number of trials roles.
Roles
Response
Distribution
specifies the distribution for your model. You can choose from these distributions:
  • Binomial
  • Gamma
  • Inverse Gaussian
  • Multinomial
  • Negative binomial
  • Normal
  • Poisson
  • Tweedie. If you select a Tweedie distribution, you can specify the Tweedie power parameter. This value must be greater than 1.1 and less than or equal to 3.0.
  • Zero-inflated negative binomial
  • Zero-inflated Poisson
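A Tweedie distribution has variance Var(Y) = φμ^p, where p is the Tweedie power parameter. The Python sketch below is an illustration, not the task's implementation. It checks p against the range stated for the task before computing the variance. The function name and the dispersion argument phi are assumptions made for this example.

```python
def tweedie_variance(mu: float, p: float, phi: float = 1.0) -> float:
    """Variance of a Tweedie response with mean mu, power p, and dispersion phi.

    Raises ValueError when p is outside the range the task accepts (1.1 < p <= 3.0).
    """
    if not (1.1 < p <= 3.0):
        raise ValueError(f"Tweedie power must satisfy 1.1 < p <= 3.0, got {p}")
    if mu <= 0:
        raise ValueError("the mean must be positive")
    # p = 2 gives the gamma variance function; p = 3 gives the inverse Gaussian one.
    return phi * mu ** p

print(tweedie_variance(2.0, 1.5))  # a power between 1 and 2 (compound Poisson-gamma)
```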
Options for Binomial Distribution
Response data consists of numbers of events and trials
specifies that the response data is supplied as a pair of variables: one variable contains the number of events, and the other contains the number of trials.
Number of events
specifies the column that contains the number of events.
Number of trials
specifies the column that contains the number of trials.
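For a binomial model, a record with e events in t trials carries the same information about the event probability as t individual 0/1 responses that contain e ones. The following Python sketch illustrates this equivalence. The function name and the sample (events, trials) pairs are hypothetical.

```python
def expand_events_trials(events: int, trials: int) -> list[int]:
    """Expand one events/trials record into individual 0/1 responses."""
    if not 0 <= events <= trials:
        raise ValueError("events must be between 0 and trials")
    return [1] * events + [0] * (trials - events)

rows = [(3, 5), (0, 2)]  # hypothetical (events, trials) pairs
binary = [y for e, t in rows for y in expand_events_trials(e, t)]
print(binary)                     # [1, 1, 1, 0, 0, 0, 0]
print(sum(binary) / len(binary))  # overall proportion of events (3 of 7)
```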
Response
specifies the single variable that contains response values.
Use the Event of interest option to select a value of the response variable that represents the event that you want to model.
Note: The Response role and the Event of interest option are available only if you do not select the Response data consists of numbers of events and trials check box.
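Choosing an event of interest amounts to coding the response as 1 when it matches the selected value and 0 otherwise. The following Python sketch shows that recoding. The function name and the "Yes"/"No" response column are assumptions for illustration only.

```python
def to_event_indicator(values, event_of_interest):
    """Code each response value as 1 if it equals the event of interest, else 0."""
    return [1 if v == event_of_interest else 0 for v in values]

responses = ["Yes", "No", "Yes", "No", "No"]  # hypothetical response column
print(to_event_indicator(responses, "Yes"))   # [1, 0, 1, 0, 0]
```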
Options for All Distribution Types
Response
specifies the variable that contains the response data. For most distribution types, you specify a single numeric variable.
Link function
specifies the link function for your model. The functions that are available depend on the selected distribution.
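A link function g relates the mean μ of the response to the linear predictor η through g(μ) = η, and its inverse maps η back to μ. The Python sketch below pairs several common links with their inverses. It lists the canonical link from standard GLM theory for a few distributions. These canonical links are not necessarily the defaults that the task selects, and the dictionary names are hypothetical.

```python
import math

# Each entry: (link g(mu), inverse link g^-1(eta)).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1 - mu)), lambda eta: 1 / (1 + math.exp(-eta))),
    "reciprocal": (lambda mu: 1 / mu, lambda eta: 1 / eta),
}

# Canonical links from standard GLM theory (gamma's is the reciprocal up to sign);
# these are not necessarily the task's default link functions.
CANONICAL_LINK = {"Normal": "identity", "Poisson": "log", "Binomial": "logit", "Gamma": "reciprocal"}

mu = 0.25
for dist, name in CANONICAL_LINK.items():
    g, g_inv = LINKS[name]
    print(dist, name, g_inv(g(mu)))  # the round trip recovers mu, up to rounding
```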
Explanatory Variables
Classification variables
specifies the variables to use to group (classify) data in the analysis. Classification variables can be either character or numeric. A classification variable is a variable that enters the statistical analysis or model through its levels, not through its values. The process of associating values of a variable with levels is termed levelization.
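Levelization replaces each distinct value of a classification variable with a level, so numeric values such as 1 and 2 act as labels rather than magnitudes. The following Python sketch maps distinct values to level indices. It sorts the raw values. That ordering is an assumption of the sketch, and the task's own level ordering might differ.

```python
def levelize(values):
    """Map each distinct value to a level index (levels sorted by raw value)."""
    levels = sorted(set(values))
    index = {v: i for i, v in enumerate(levels)}
    return levels, [index[v] for v in values]

levels, codes = levelize(["M", "F", "F", "M", "F"])
print(levels)  # ['F', 'M']
print(codes)   # [1, 0, 0, 1, 0]
```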
Parameterization of Effects
Coding
specifies the parameterization method for the classification variable. Design matrix columns are created from the classification variables according to the selected coding scheme.
You can select from these coding schemes:
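  • Effect coding specifies effect coding. Each nonreference level gets a design column, and the reference level is coded -1 in every column, so each parameter estimates the difference between the effect of that level and the average effect across all levels.
  • GLM coding specifies less-than-full-rank, reference-cell coding, with one indicator column for each level. This coding scheme is the default.
  • Reference coding specifies reference-cell coding. Each nonreference level gets an indicator column, and the reference level is coded 0 in every column, so each parameter estimates the difference between that level and the reference level.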
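The following Python sketch builds the design-matrix columns that each coding scheme produces for one value of a three-level classification variable. It treats the last level in the list as the reference level. That choice, the function name, and the level names are assumptions for illustration.

```python
def design_columns(value, levels, coding):
    """Return the design-matrix columns for one value of a classification variable.

    The last entry in `levels` is treated as the reference level
    (an assumption for this sketch).
    """
    k = len(levels)
    i = levels.index(value)
    if coding == "glm":        # one indicator per level (less than full rank)
        return [1 if j == i else 0 for j in range(k)]
    if coding == "reference":  # indicators for nonreference levels; reference is all 0
        return [1 if j == i else 0 for j in range(k - 1)]
    if coding == "effect":     # like reference coding, but reference is all -1
        if i == k - 1:
            return [-1] * (k - 1)
        return [1 if j == i else 0 for j in range(k - 1)]
    raise ValueError(f"unknown coding: {coding}")

levels = ["low", "mid", "high"]
for coding in ("glm", "reference", "effect"):
    print(coding, [design_columns(v, levels, coding) for v in levels])
```

GLM coding produces k columns for k levels, so the columns are linearly dependent with the intercept. The other two schemes produce k - 1 columns.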
Treatment of Missing Values
An observation is excluded from the analysis when either of these conditions is met:
  • Any variable in the model contains a missing value.
  • Any classification variable contains a missing value, regardless of whether the classification variable is used in the model.
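The following Python sketch applies the two exclusion rules to a small set of rows, with None standing in for a missing value. The function name, the column names, and the use of None are assumptions. The third row is dropped even though its missing classification variable, region, is not in the model.

```python
def complete_cases(rows, model_vars, class_vars):
    """Keep only rows with no missing (None) value in any model or classification variable."""
    needed = set(model_vars) | set(class_vars)
    return [r for r in rows if all(r.get(v) is not None for v in needed)]

rows = [
    {"y": 1.0, "x": 2.0, "grp": "A", "region": "N"},
    {"y": 0.5, "x": None, "grp": "B", "region": "S"},  # missing model variable
    {"y": 2.0, "x": 1.0, "grp": "A", "region": None},  # missing class variable not in the model
]
kept = complete_cases(rows, model_vars=["y", "x", "grp"], class_vars=["grp", "region"])
print(len(kept))  # 1
```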
Continuous variables
specifies the independent covariates (regressors) for the regression model. If you do not specify a continuous variable, the task fits a model that contains only an intercept.
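As an example of an intercept-only model, consider a Poisson distribution with a log link. The maximum likelihood estimate of the intercept is log(ȳ), so the fitted mean equals the sample mean. The following Python sketch illustrates this case only. The function name and the data are hypothetical.

```python
import math

def intercept_only_poisson(y):
    """MLE of the intercept for an intercept-only Poisson model with a log link."""
    ybar = sum(y) / len(y)
    return math.log(ybar)  # the fitted mean exp(b0) equals the sample mean

y = [0, 2, 3, 1, 4]
b0 = intercept_only_poisson(y)
print(math.exp(b0))  # approximately 2.0, the sample mean
```

At this estimate the score equation, sum(y_i - exp(b0)) = 0, holds, which confirms that it is the maximum.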
Offset variable
specifies a variable to be used as an offset to the linear predictor. An offset plays the role of an effect whose coefficient is known to be 1. Observations that have missing values for the offset variable are excluded from the analysis.
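A common use of an offset is in a Poisson model with a log link, where log(exposure) is added to the linear predictor so that the model describes a rate per unit of exposure. The following Python sketch shows the offset entering η with a fixed coefficient of 1. The function name, the rate, and the exposure value are hypothetical.

```python
import math

def poisson_mean(x_beta: float, offset: float) -> float:
    """Mean under a log link when the offset enters the linear predictor with coefficient 1."""
    eta = x_beta + offset  # the offset is added as is; no coefficient is estimated for it
    return math.exp(eta)

# Hypothetical: a rate of exp(-2) events per unit of exposure, with 50 units of exposure.
rate_part = -2.0
exposure = 50.0
mu = poisson_mean(rate_part, math.log(exposure))
print(mu, exposure * math.exp(rate_part))  # the two values agree, up to rounding
```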
Additional Roles
Frequency count
specifies a numeric variable whose value represents the frequency of the observation. If you assign a variable to this role, the task assumes that each observation represents n observations, where n is the value of the frequency variable. If n is not an integer, SAS truncates it. If n is less than 1 or is missing, the observation is excluded from the analysis. The sum of the frequency variable represents the total number of observations.
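The following Python sketch applies the frequency rules described above to a handful of values, with None standing in for a missing value. The function name and the sample frequencies are hypothetical. An excluded observation contributes a frequency of 0 here, so the sum of the effective frequencies is the total number of observations.

```python
import math

def effective_frequencies(freqs):
    """Apply the frequency rules: drop missing values or values < 1; truncate the rest."""
    result = []
    for n in freqs:
        if n is None or n < 1:
            result.append(0)              # observation excluded
        else:
            result.append(math.trunc(n))  # e.g. 2.7 -> 2
    return result

freqs = [3, 2.7, 0.5, None, 1]
eff = effective_frequencies(freqs)
print(eff)       # [3, 2, 0, 0, 1]
print(sum(eff))  # 6
```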
Weight variable
specifies the numeric column to use as a weight to perform a weighted analysis of the data.
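For an intercept-only model with a normal distribution and an identity link, a weighted analysis estimates the intercept as the weighted mean of the response. Unlike a frequency count in the sketch above, a weight here scales each observation's contribution without replicating the observation. The following Python sketch is an illustration under that assumption, and the function name and data are hypothetical.

```python
def weighted_mean(y, w):
    """Weighted least-squares intercept for an intercept-only normal model."""
    total_w = sum(w)
    return sum(wi * yi for wi, yi in zip(w, y)) / total_w

print(weighted_mean([1.0, 2.0, 4.0], [1.0, 1.0, 2.0]))  # 2.75
```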
Group analysis by
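enables you to obtain a separate analysis of the observations in each group, where a group is defined by a unique value (or combination of values) of the variables that you assign to this role.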
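The following Python sketch shows the idea behind a group analysis. It partitions the rows by the value of a group variable and runs the same analysis on each partition independently. The function name, the column names, and the use of a simple mean as a stand-in for the model fit are assumptions.

```python
from collections import defaultdict

def analyze_by_group(rows, group_var, analyze):
    """Run `analyze` separately on the rows for each unique value of group_var."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_var]].append(r)
    return {g: analyze(members) for g, members in sorted(groups.items())}

rows = [
    {"region": "East", "y": 2.0},
    {"region": "West", "y": 5.0},
    {"region": "East", "y": 4.0},
]
means = analyze_by_group(rows, "region", lambda rs: sum(r["y"] for r in rs) / len(rs))
print(means)  # {'East': 3.0, 'West': 5.0}
```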