Assigning Data to Roles

To run the Predictive Regression Models task, you must select an input data source. To filter the input data source, click the filter icon.
You must also assign a column to the Dependent variable role and a column to either the Classification variables role or the Continuous variables role.
Role
Description
Roles
Dependent variable
specifies the numeric variable to use as the dependent variable for the regression analysis.
Classification variables
specifies the variables to use to group (classify) data in the analysis. A classification variable is a variable that enters the statistical analysis or model through its levels, not through its values. The process of associating values of a variable with levels is termed levelization.
Parameterization of Effects
Coding
specifies the parameterization method for the classification variable. Design matrix columns are created from the classification variables according to the selected coding scheme.
You can select from these coding schemes:
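  • Effects coding specifies effect coding, in which the reference level is coded as -1 in every design column.
  • GLM coding specifies less-than-full-rank, reference-cell coding, which creates one indicator column for each level. This coding scheme is the default.
  • Reference coding specifies reference-cell coding, in which the reference level is coded as 0 in every design column.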
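As a rough illustration of how the three coding schemes differ, the following Python sketch builds design matrix columns for a single classification variable. It is not the task's implementation: the function name `design_columns` is hypothetical, and the choice of the last sorted level as the reference level is an assumption made for the example.

```python
def design_columns(values, scheme):
    """Return (column_levels, rows) for one classification variable.

    scheme is "effect", "glm", or "reference" (hypothetical labels).
    """
    levels = sorted(set(values))   # levelization: distinct values become levels
    ref = levels[-1]               # assumption: last sorted level is the reference
    if scheme == "glm":
        cols = levels              # one indicator column per level (less than full rank)
    else:
        cols = levels[:-1]         # the reference level gets no column of its own
    rows = []
    for v in values:
        if scheme == "effect" and v == ref:
            rows.append([-1] * len(cols))   # effect coding: reference level is all -1
        else:
            # indicator columns; under reference coding the reference level is all 0
            rows.append([1 if v == c else 0 for c in cols])
    return cols, rows


values = ["a", "b", "c"]
for scheme in ("effect", "glm", "reference"):
    cols, rows = design_columns(values, scheme)
    print(scheme, cols, rows)
```

With three levels, GLM coding yields three columns, while effect and reference coding yield two columns and differ only in the row for the reference level.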
Treatment of Missing Values
An observation is excluded from the analysis if any variable in the model contains a missing value. In addition, an observation is excluded if any classification variable specified earlier in this table contains a missing value, regardless of whether that variable is used in the model.
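The exclusion rule can be sketched in Python as a simple filter. This is only an illustration under stated assumptions: observations are represented as dictionaries, a missing value is represented as `None`, and the names `usable_rows`, `model_vars`, and `class_vars` are hypothetical.

```python
def usable_rows(rows, model_vars, class_vars):
    """Keep observations with no missing value in any model variable
    or in any assigned classification variable (used in the model or not)."""
    required = set(model_vars) | set(class_vars)
    return [r for r in rows if all(r.get(v) is not None for v in required)]


rows = [
    {"y": 1.0, "x": 2.0, "region": "east"},
    {"y": 2.0, "x": None, "region": "west"},   # model variable missing: excluded
    {"y": 3.0, "x": 4.0, "region": None},      # classification variable missing: excluded
]
# "region" is assigned as a classification variable but is not in the model
kept = usable_rows(rows, model_vars=["y", "x"], class_vars=["region"])
print(len(kept))
```

In this example the third observation is dropped even though `region` is not part of the model.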
Continuous variables
specifies the independent covariates (regressors) for the regression model. If you do not specify a continuous variable, the task fits a model that contains only an intercept.
Additional Roles
Frequency count
specifies a numeric variable whose value represents the frequency of the observation. If you assign a variable to this role, the task assumes that each observation represents n observations, where n is the value of the frequency variable. If n is not an integer, SAS truncates it. If n is less than 1 or is missing, the observation is excluded from the analysis. The sum of the frequency variable's values represents the total number of observations.
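The frequency rules above can be sketched in Python as follows. This is a minimal illustration, not SAS code: the function names are hypothetical, and a missing value is assumed to be represented as `None` or NaN.

```python
import math


def effective_frequency(n):
    """Return the integer frequency used for an observation, or None if excluded."""
    if n is None or (isinstance(n, float) and math.isnan(n)):
        return None              # missing frequency: observation is excluded
    n = math.trunc(n)            # non-integer frequencies are truncated
    return n if n >= 1 else None  # frequencies below 1 exclude the observation


def total_observations(freqs):
    """Sum the effective frequencies of the observations that are kept."""
    return sum(f for f in map(effective_frequency, freqs) if f is not None)


freqs = [3, 2.7, 0.5, None, -1]
print([effective_frequency(n) for n in freqs])
print(total_observations(freqs))
```

Here only the first two observations contribute, with frequencies 3 and 2.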
Weight
specifies the numeric column to use as a weight to perform a weighted analysis of the data.
Group analysis by
enables you to obtain separate analyses of observations for each unique group.