Assigning Data to Roles

To run the Generalized Linear Models task, you must select an input data source. To filter the input data source, click the Filter icon.

You must also assign a column to the Response role for all distribution types except binomial. If you select a binomial distribution, you must either assign a single variable to the Response role or assign a pair of variables to the Number of events and Number of trials roles.

Option Name	Description
Roles
Response
Distribution	specifies the distribution for your model. You can choose from these distributions: Binomial, Gamma, Inverse Gaussian, Multinomial, Negative binomial, Normal, Poisson, Tweedie, Zero-inflated negative binomial, and Zero-inflated Poisson. If you select a Tweedie distribution, you can specify the Tweedie power parameter. This value must be greater than 1.1 and less than or equal to 3.0.
Options for Binomial Distribution
Response data consists of numbers of events and trials	specifies that the response is supplied as a pair of variables: one that contains the number of events and one that contains the number of trials.
Number of events	specifies the column that contains the number of events.
Number of trials	specifies the column that contains the number of trials.
Response	specifies the single variable that contains response values. Use the Event of interest option to select a value of the response variable that represents the event that you want to model. Note: The Response role and the Event of interest option are available only if you do not select the Response data consists of numbers of events and trials check box.
Options for All Distribution Types
Response	specifies the variable that contains the response data. For most distribution types, you specify a single numeric variable.
Link function	specifies the link function for your model. The functions that are available depend on the selected distribution.
Explanatory Variables
Classification variables	specifies the variables to use to group (classify) data in the analysis. Classification variables can be either character or numeric. A classification variable is a variable that enters the statistical analysis or model through its levels, not through its values. The process of associating values of a variable with levels is termed levelization.
Parameterization of Effects
Coding	specifies the parameterization method for the classification variables. Design matrix columns are created from the classification variables according to the selected coding scheme. You can select from these coding schemes: Effect coding, which specifies effect coding; GLM coding, which specifies less-than-full-rank, reference-cell coding (this coding scheme is the default); and Reference coding, which specifies reference-cell coding.
Treatment of Missing Values
An observation is excluded from the analysis when either of these conditions is met: any variable in the model contains a missing value, or any classification variable contains a missing value (regardless of whether the classification variable is used in the model).
Continuous variables	specifies the independent covariates (regressors) for the regression model. If you do not specify a continuous variable, the task fits a model that contains only an intercept.
Offset variable	specifies a variable to be used as an offset to the linear predictor. An offset plays the role of an effect whose coefficient is known to be 1. Observations that have missing values for the offset variable are excluded from the analysis.
Additional Roles
Frequency count	specifies a numeric variable whose value represents the frequency of the observation. If you assign a variable to this role, the task assumes that each observation represents n observations, where n is the value of the frequency variable. If n is not an integer, SAS truncates it. If n is less than 1 or is missing, the observation is excluded from the analysis. The sum of the values of the frequency variable represents the total number of observations.
Weight variable	specifies a numeric column whose values are used as weights in a weighted analysis of the data.
Group analysis by	performs a separate analysis for each unique group of observations, as defined by the values of the variables that are assigned to this role.
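For the binomial options, a single response column and an events/trials pair can describe the same data when several observations share the same covariate values. The following Python sketch is an illustration only and is not part of the task: it collapses one-row-per-subject data into events and trials counts for each group. The function `to_events_trials` and the columns `dose` and `outcome` are hypothetical.

```python
from collections import defaultdict


def to_events_trials(rows, group_key, response_key, event_value):
    """Collapse one-row-per-subject binary data into (events, trials) per group.

    Hypothetical helper: rows is a list of dicts, group_key names the column
    that defines each covariate pattern, response_key names the single
    response column, and event_value is the event of interest.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [events, trials]
    for row in rows:
        tally = counts[row[group_key]]
        tally[1] += 1  # every row is one trial
        if row[response_key] == event_value:
            tally[0] += 1  # this trial is an event
    return {group: (events, trials) for group, (events, trials) in counts.items()}


rows = [
    {"dose": "low", "outcome": "died"},
    {"dose": "low", "outcome": "survived"},
    {"dose": "high", "outcome": "died"},
    {"dose": "high", "outcome": "died"},
]
print(to_events_trials(rows, "dose", "outcome", "died"))
# {'low': (1, 2), 'high': (2, 2)}
```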
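Levelization, described under Classification variables, can be sketched as follows. This Python example is a simplified assumption of how levels might be assigned: it sorts the distinct nonmissing raw values, whereas the task may order levels differently (for example, by formatted values). The function `levelize` is hypothetical.

```python
def levelize(values):
    """Map each value of a classification variable to a level index.

    Hypothetical helper: levels are the sorted distinct nonmissing values;
    None stands in for a missing value and receives no level.
    """
    levels = sorted({v for v in values if v is not None})
    index = {level: i for i, level in enumerate(levels)}
    return levels, [index.get(v) for v in values]


# A numeric variable used as a classification variable: 10, 20, and 30
# enter the analysis as three levels, not as magnitudes.
levels, coded = levelize([20, 10, 30, 10, None])
print(levels)  # [10, 20, 30]
print(coded)   # [1, 0, 2, 0, None]
```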
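To show how the Coding choices differ, this Python sketch builds the design-matrix columns that each scheme produces for a three-level classification variable. It assumes, for illustration, that the last level is the reference level; the function `design_columns` and the labels `glm`, `reference`, and `effect` are hypothetical, not task settings.

```python
def design_columns(levels, value, scheme):
    """Return the design-matrix columns for one value of a classification variable.

    Hypothetical helper: levels is the ordered list of levels, and the last
    level is assumed to be the reference level.
    """
    k = len(levels)
    pos = levels.index(value)
    if scheme == "glm":
        # Less than full rank: one 0/1 indicator column per level.
        return [1 if i == pos else 0 for i in range(k)]
    if scheme == "reference":
        # Full rank: k-1 indicator columns; the reference level is all zeros.
        return [1 if i == pos else 0 for i in range(k - 1)]
    if scheme == "effect":
        # Full rank: k-1 columns; the reference level is coded -1 in every column.
        if pos == k - 1:
            return [-1] * (k - 1)
        return [1 if i == pos else 0 for i in range(k - 1)]
    raise ValueError(f"unknown scheme: {scheme}")


levels = ["A", "B", "C"]
for scheme in ("glm", "reference", "effect"):
    print(scheme, [design_columns(levels, v, scheme) for v in levels])
# glm [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# reference [[1, 0], [0, 1], [0, 0]]
# effect [[1, 0], [0, 1], [-1, -1]]
```

Because the GLM columns for every observation sum to 1, they duplicate the intercept column, which is why that scheme is described as less than full rank.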
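The rule under Treatment of Missing Values can be expressed as a row filter. In this Python sketch, `None` stands in for a missing value, and the function `usable_rows` and the column names are hypothetical. Note that the classification variable `h` is not in the model, yet a missing value in it still excludes the row.

```python
def usable_rows(rows, model_vars, class_vars):
    """Keep rows with no missing (None) value in any model variable or any
    classification variable, whether or not that classification variable
    is used in the model. Hypothetical helper."""
    required = set(model_vars) | set(class_vars)
    return [row for row in rows if all(row.get(v) is not None for v in required)]


rows = [
    {"y": 1.0, "x": 2.0, "g": "a", "h": "u"},
    {"y": 2.0, "x": None, "g": "b", "h": "v"},  # missing model variable x
    {"y": 3.0, "x": 1.0, "g": "a", "h": None},  # missing unused classification variable h
]
kept = usable_rows(rows, model_vars=["y", "x", "g"], class_vars=["g", "h"])
print(len(kept))  # 1
```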
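A common use of the Offset variable is modeling counts per unit of exposure: in a Poisson model with a log link, an offset equal to the log of the exposure makes the expected count proportional to the exposure. The Python sketch below computes that mean for one observation; the function `poisson_mean` and the coefficient values are hypothetical.

```python
import math


def poisson_mean(beta, x, exposure):
    """Mean of a Poisson model with a log link and offset log(exposure).

    Hypothetical helper: beta and x are equal-length coefficient and
    covariate lists (x includes 1.0 for the intercept).
    """
    linear = sum(b * xi for b, xi in zip(beta, x))
    eta = linear + math.log(exposure)  # the offset enters with a fixed coefficient of 1
    return math.exp(eta)  # equals exposure * exp(linear)


# Intercept 0.5, slope 0.2, covariate value 3.0, exposure of 10 units.
mu = poisson_mean([0.5, 0.2], [1.0, 3.0], 10.0)
print(round(mu, 4))  # 30.0417
```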
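The Frequency count rules can be sketched as an expansion step. In this Python example, `None` represents a missing frequency, and the function `expand_frequencies` is hypothetical; it only illustrates the truncation and exclusion rules stated in the table.

```python
import math


def expand_frequencies(values, freqs):
    """Repeat each value n times, where n is its frequency.

    Hypothetical helper: a missing (None) frequency or a frequency less
    than 1 excludes the observation; a non-integer frequency is truncated.
    """
    kept = []
    for value, n in zip(values, freqs):
        if n is None or n < 1:
            continue  # excluded from the analysis
        kept.extend([value] * math.trunc(n))  # 1.9 counts as 1
    return kept


expanded = expand_frequencies([10, 20, 30, 40], [2, 1.9, 0.5, None])
print(expanded)       # [10, 10, 20]
print(len(expanded))  # 3 observations in total
```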
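Group analysis by can be pictured as splitting the data and repeating the analysis for each group. In this Python sketch, the function `analyze_by_group` is hypothetical, and a simple group mean stands in for the model that the task would fit.

```python
from collections import defaultdict
from statistics import mean


def analyze_by_group(rows, by_vars, analysis):
    """Run `analysis` separately on the rows of each unique group.

    Hypothetical helper: groups are keyed by the tuple of values of by_vars.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[v] for v in by_vars)].append(row)
    return {key: analysis(members) for key, members in groups.items()}


rows = [
    {"region": "east", "y": 1.0},
    {"region": "west", "y": 4.0},
    {"region": "east", "y": 3.0},
    {"region": "west", "y": 6.0},
]
# Stand-in for a model fit: the mean response in each group.
results = analyze_by_group(rows, ["region"], lambda g: mean(r["y"] for r in g))
print(results)  # {('east',): 2.0, ('west',): 5.0}
```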