Assigning Data to Roles

To run the Rapid Predictive Modeler, you must select an input data source. To filter the input data source, click the Filter icon.
You must assign a variable to the Dependent variable role.
Role
Description
Roles
Dependent variable
specifies the value that you want to predict or classify. The dependent variable is also known as the target variable.
Note: The dependent variable must have 10 or fewer nonmissing levels. If the number of nonmissing levels is greater than 10, you cannot run the task until you select a different dependent variable.
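Before you assign a variable to the Dependent variable role, it can help to count its distinct nonmissing values. The following Python sketch illustrates such a check outside of SAS. The function names and the treatment of None and NaN as missing are assumptions made for illustration, not part of the task.

```python
import math

MAX_LEVELS = 10  # limit stated in the note for the Dependent variable role


def nonmissing_levels(values):
    """Return the set of distinct values, ignoring None and NaN."""
    levels = set()
    for value in values:
        if value is None:
            continue
        if isinstance(value, float) and math.isnan(value):
            continue
        levels.add(value)
    return levels


def can_be_dependent(values, max_levels=MAX_LEVELS):
    """True if the column has no more than max_levels nonmissing levels."""
    return len(nonmissing_levels(values)) <= max_levels


# Hypothetical target column with two levels and one missing value.
purchase = ["yes", "no", None, "yes", "no"]
print(can_be_dependent(purchase))
```

A column that fails this check needs to be replaced by a different dependent variable before the task can run.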
Decisions and Priors
specifies this information:
  • Event level specifies the class target value that you want to model. The SAS Rapid Predictive Modeler automatically builds a model that provides the probabilities for each target event, but reporting improves when the desired target level is known.
  • Prior probabilities displays the counts and proportions of the target variable levels that occur in the model training data. You can adjust these values when your target variable is categorical and the training data and the population data have different target distributions.
    For example, consider a model that was trained on oversampled data, where 50% of observations are responders and 50% of observations are non-responders. However, the population data that the model scores historically contains only 10% responders and 90% non-responders. You can use prior probability settings to inform the model of the historically expected proportions of responders to non-responders.
    • If you do not want to specify prior probabilities, select None (which is the default).
    • To specify equal probabilities for all levels of the target variable, select Equal.
    • To specify your own custom prior probabilities for target variable levels in the scored data, select User-defined and specify the probabilities. The prior probabilities that you specify must sum to 1.
    Note: Prior probabilities are supported only if the dependent variable has 10 or fewer values.
  • Decision function specifies the costs, profits, or weights that you want to associate with the predicted results. The table of values is called a decision matrix. You use a decision matrix to associate a value with each possible decision outcome.
    • If your model does not require a decision matrix, select None.
    • To use your model to maximize profit, select Maximum, and if desired, enter a higher weight in the true positive cell of the matrix.
    • To use your model to minimize cost, select Minimum, and if desired, enter a higher weight in the true negative cell of the matrix.
    • To use your model to predict rare events, select Inverse to identify true positive and true negative predictions, at the risk of misestimating false positive and false negative predictions. Inverse is the default value.
    Note: The decision matrix is supported only if the dependent variable has 10 or fewer values.
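To make the effect of prior probabilities and a decision matrix concrete, the following Python sketch rescales a predicted probability from the oversampled 50/50 training distribution to the 10/90 population distribution in the example above, and then selects the decision with the highest expected profit. This is a common textbook adjustment and not necessarily the computation that the Rapid Predictive Modeler performs internally. The function names, the model output, and the profit values are all hypothetical.

```python
def adjust_for_priors(posteriors, train_props, priors):
    """Rescale posteriors from the training distribution to the given priors."""
    if abs(sum(priors.values()) - 1.0) > 1e-9:
        raise ValueError("prior probabilities must sum to 1")
    weighted = {
        level: posteriors[level] * priors[level] / train_props[level]
        for level in posteriors
    }
    total = sum(weighted.values())
    return {level: w / total for level, w in weighted.items()}


def best_decision(posteriors, matrix, objective="max"):
    """Pick the decision with the largest (or smallest) expected value.

    matrix[level][decision] is the profit or cost of making `decision`
    when the true target level is `level`.
    """
    decisions = list(next(iter(matrix.values())))
    expected = {
        d: sum(posteriors[level] * matrix[level][d] for level in posteriors)
        for d in decisions
    }
    choose = max if objective == "max" else min
    return choose(expected, key=expected.get), expected


# Oversampled training data (50/50) versus the expected population (10/90).
train_props = {"responder": 0.5, "non-responder": 0.5}
priors = {"responder": 0.1, "non-responder": 0.9}

# Hypothetical model output for one observation.
raw = {"responder": 0.8, "non-responder": 0.2}
adjusted = adjust_for_priors(raw, train_props, priors)

# Hypothetical profit matrix: outer keys are true levels, inner keys are decisions.
profit = {
    "responder": {"contact": 10.0, "skip": 0.0},
    "non-responder": {"contact": -1.0, "skip": 0.0},
}
decision, expected = best_decision(adjusted, profit, objective="max")
print(adjusted, decision, expected)
```

Passing objective="min" treats the same matrix as costs and returns the decision with the lowest expected value, which mirrors the idea of minimizing cost rather than maximizing profit.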
Additional Roles
Variables to exclude from the model
specifies the variables that you do not want to include in your analysis.
Frequency count
specifies the variable to use to represent the frequency value. The data is treated as if each case is replicated as many times as the value of the frequency variable.
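The replication behavior of a frequency variable can be illustrated with a short Python sketch. It expands each case as many times as its frequency value and shows that a statistic computed on the expanded data matches the frequency-weighted statistic. The column names are hypothetical, and the sketch assumes nonnegative integer frequencies; it does not describe how SAS handles other values.

```python
def expand_by_frequency(rows, freq_key="freq"):
    """Replicate each case as many times as its frequency value."""
    expanded = []
    for row in rows:
        count = row[freq_key]
        if not isinstance(count, int) or count < 0:
            raise ValueError("this sketch assumes nonnegative integer frequencies")
        case = {k: v for k, v in row.items() if k != freq_key}
        expanded.extend(dict(case) for _ in range(count))
    return expanded


def weighted_mean(rows, value_key, freq_key="freq"):
    """Mean of value_key where each case counts freq_key times."""
    total = sum(row[freq_key] for row in rows)
    return sum(row[value_key] * row[freq_key] for row in rows) / total


# Hypothetical data: the first case stands for three identical observations.
rows = [{"x": 1.0, "freq": 3}, {"x": 5.0, "freq": 1}]
expanded = expand_by_frequency(rows)
plain_mean = sum(r["x"] for r in expanded) / len(expanded)
print(len(expanded), plain_mean, weighted_mean(rows, "x"))
```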
ID variables
specifies variables that are useful for reporting and scoring selection functions. These variables are not included in the analysis.