Example Process Flow Diagram

Task 4. Defining a Target Profile for the GOOD_BAD Target Variable

A target profile contains information for the target, such as the event level for binary targets, decision matrices, and prior probabilities. The active target profile information is read downstream in the process flow by the modeling nodes and the Assessment node. Although you can define and edit a target profile in any of the modeling nodes, it is often convenient to define this information early in the process flow when you set the target variable in the Input Data Source node.

In this section, you will set the target event level, specify the loss matrix, and define the prior vector for the target GOOD_BAD. Follow these steps to define the target profile for GOOD_BAD:

  1. Open the Target Profiler by right-clicking in any cell of the GOOD_BAD target variable row of the Variables tab and selecting Edit target profile. The Target Profiles for the GOOD_BAD window opens.

    [Profiles tab of the Target Profiles for GOOD_BAD window showing the profile associated with the SAMPSIO.DMAGECR dataset.]

    By default, the Target Profiles for the GOOD_BAD window contains a predefined profile. The asterisk beside the profile name indicates that it is the active target profile. You can create new target profiles, but for this example, you will modify the existing profile.

  2. Set the Target Event Level:

    Select the Target tab. By default, the Input Data Source node sets the Order value to Descending for binary targets, so good is the target event level and you will model the probability that a customer has good credit. The Assessment node depends on the event level when it calculates assessment statistics, such as expected profit or loss. If you wanted to model the probability that a customer has bad credit, you would need to set the event level to bad. You can accomplish this by setting the Order value for GOOD_BAD to Ascending in the Class Variables tab of the Input Data Source window.

  3. Define the Loss Matrix for the Target GOOD_BAD:

    1. Use the Assessment Information tab to define decision matrices, and set the assessment objective for each matrix to either maximize profit, maximize profit with costs (revenue), or minimize loss. For binary targets, the Assessment Information tab contains four predefined decision matrices. These matrices cannot be redefined and are not suitable for this example. Here, you want to define the realistic loss matrix that was described in the overview of the process flow.

      [Assessment Information tab of the Target Profiles for GOOD_BAD window showing levels for Profit vector]

    2. Add a new loss matrix that you can modify by copying the Default Loss matrix (right-click the Default Loss matrix and then select Copy). A new decision matrix that is named Profit matrix is added to the list box. Alternatively, you can add a new decision matrix by right-clicking an open area of the list box and selecting Add. A matrix that is added this way is always a copy of the Default profit matrix.

    3. Only one matrix can have a status of use. To set the status of the new matrix to use, select the Profit matrix entry, right-click the entry, and select the Set to use menu item. An asterisk appears beside the matrix name, indicating that it is now the active decision matrix that will be read downstream in the process flow by the modeling and Assessment nodes.

      [Assessment Information tab of the Target Profiles for GOOD_BAD window showing a Profit matrix.]

    4. To rename the matrix, delete the existing name in the Name text box, type a new name, and then press the ENTER key. In this example, the matrix has been renamed Realistic Loss.

      [Assessment Information tab of the Target Profiles for GOOD_BAD window showing the Profit matrix renamed to the Realistic Loss matrix.]

      Note: The values in the matrix are still the same as the predefined Default Loss matrix that you copied. You use this matrix to obtain the correct misclassification rate for the good and bad credit risk applicants.

    5. By default, the decision column names are set to the target levels (good and bad). To rename the decision column names to accept and reject, click Edit Decisions. Then type accept in place of the decision that is named good, and reject in place of the decision that is named bad.

      [Decisions and Utilities tab of the Editing Decisions and Utilities: Realistic Loss window configured to Minimize loss.]

      Note: You can also use the Decisions and Utilities tab to set the assessment objective for the matrix to either Maximize profit, Maximize profit with costs (revenue), or Minimize loss. Because you copied the predefined Default Loss matrix, the assessment objective is already correctly set to Minimize loss. If you set the assessment objective to Maximize profit with costs, you can assign a cost variable or a constant cost to each decision. To assign a cost variable to a decision, the cost model role must have been assigned to the appropriate variables in the Variables tab of the Input Data Source node.

    6. Close the Editing Decisions and Utilities window and follow the prompts to save your changes. Click Yes to return to the Assessment Information tab.

    7. Type the following values in the loss matrix:

      [Assessment Information tab of the Target Profiles for GOOD_BAD showing updated matrix values for the Realistic Loss Matrix]

      Each of the modeling nodes will use this loss matrix to calculate the expected losses.

    8. To specify the true operational priors for the data that you intend to score, select the Prior tab.

      [The Prior tab of the Target Profiles for GOOD_BAD window showing the Proportional to Data vector choice selected.]

      By default, there are three predefined prior vectors. The Equal probability vector contains equal probabilities for the target levels. The Proportional to data vector contains priors that are proportional to those in the input data set. Note that the input data set contains 70% good risk applicants and 30% bad risk applicants. The actual probabilities in the score data set are believed to be 90% and 10% for good and bad credit risk applicants, respectively. The active None prior vector computes the posterior probabilities under the assumption that the prior probabilities are proportional to the frequencies of the classes in the training data set.

    9. To add a new prior vector, right-click in an open area of the list box and select Add. A new Prior vector is added that contains prior values that are also proportional to those in the input data set.

    10. To set the status of the vector to use, right-click on the Prior vector and select Set to Use.

    11. Type the following values in the vector matrix (0.9 for good and 0.1 for bad, the probabilities that are believed to hold in the score data set):

      [Prior tab of the Target Profiles for GOOD_BAD window showing Prior vector matrix with user-input Prior Probability matrix.]

      The prior probabilities will be used to adjust the relative contribution of each class when computing the total and average loss.

    12. Close the window and follow the prompts to save your changes to the target profile. You are returned to the Variables tab of the Input Data Source node. Do not close the Input Data Source node.
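
The three settings made in the steps above (event level, loss matrix, and prior vector) can be illustrated outside of Enterprise Miner with a short Python sketch. It is not the Enterprise Miner implementation. The loss values are hypothetical placeholders rather than the values shown in the screenshot, and all function names are invented for illustration. Treating the first level in the sorted order as the event level is an assumption that matches the description in step 2. The prior adjustment uses the standard reweighting of posterior probabilities by the ratio of new to old priors, which is assumed here rather than taken from the product. Only the 70%/30% and 90%/10% proportions come from the text.

```python
from typing import Dict, List

# HYPOTHETICAL loss matrix (placeholder values, not those in the screenshot).
# LOSS[actual_level][decision] is the loss incurred by that decision.
LOSS: Dict[str, Dict[str, float]] = {
    "good": {"accept": 0.0, "reject": 1.0},
    "bad": {"accept": 5.0, "reject": 0.0},
}

TRAIN_PRIORS = {"good": 0.7, "bad": 0.3}  # proportions in the input data set
SCORE_PRIORS = {"good": 0.9, "bad": 0.1}  # believed proportions in the score data


def event_level(levels: List[str], order: str = "descending") -> str:
    """Return the first level in the sorted order (assumed to be the event)."""
    return sorted(levels, reverse=(order == "descending"))[0]


def adjust_posteriors(
    posteriors: Dict[str, float],
    old_priors: Dict[str, float],
    new_priors: Dict[str, float],
) -> Dict[str, float]:
    """Reweight each posterior by new_prior / old_prior, then renormalize."""
    weighted = {
        level: p * new_priors[level] / old_priors[level]
        for level, p in posteriors.items()
    }
    total = sum(weighted.values())
    return {level: w / total for level, w in weighted.items()}


def expected_losses(
    posteriors: Dict[str, float], loss: Dict[str, Dict[str, float]]
) -> Dict[str, float]:
    """Expected loss of each decision, averaged over the target levels."""
    decisions = list(next(iter(loss.values())))
    return {
        d: sum(posteriors[level] * loss[level][d] for level in posteriors)
        for d in decisions
    }


def best_decision(
    posteriors: Dict[str, float], loss: Dict[str, Dict[str, float]]
) -> str:
    """Pick the decision with the smallest expected loss."""
    losses = expected_losses(posteriors, loss)
    return min(losses, key=losses.get)


if __name__ == "__main__":
    raw = {"good": 0.8, "bad": 0.2}  # hypothetical model output for one applicant
    adjusted = adjust_posteriors(raw, TRAIN_PRIORS, SCORE_PRIORS)
    print(event_level(["good", "bad"]))
    print(best_decision(raw, LOSS), best_decision(adjusted, LOSS))
```

With these placeholder losses, an applicant whose unadjusted posterior probability of good is 0.8 would be rejected (expected loss 0.8 for reject versus 1.0 for accept), but accepted once the posteriors are adjusted to the 90%/10% priors. This shows why both the loss matrix and the prior vector need to be set before the modeling nodes run.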
