Example Process Flow Diagram

Task 7. Creating Variable Transformations

Now that you have partitioned the data, you might want to explore the data with exploratory nodes and perhaps modify the data with modification nodes. For brevity, in this section you will use the Transform Variables node to create transformations of existing variables. The data is often useful in its original form, but transformations may help to maximize the information content that you can retrieve. Transformations are useful when you want to improve the fit of a model to the data. For example, transformations can be used to stabilize variance, remove nonlinearity, improve additivity, and correct nonnormality.
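
To make the idea of correcting nonnormality concrete, the following is a minimal Python sketch, outside of Enterprise Miner, that measures skewness before and after a log transformation. The sample values and the `skewness` helper are hypothetical illustrations; they are not the tutorial's AMOUNT data and not the statistic exactly as the node computes it.

```python
import math

def skewness(values):
    """Moment-based (population) skewness: the mean of the cubed z-scores."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return sum((v - mean) ** 3 for v in values) / (n * var ** 1.5)

# Hypothetical right-skewed sample (assumption: all values are positive,
# which the log transformation requires).
amounts = [250, 300, 400, 500, 650, 800, 1000, 1500, 2500, 6000, 15000]
logged = [math.log(a) for a in amounts]

raw_skew = skewness(amounts)
log_skew = skewness(logged)
print(f"skewness before: {raw_skew:.2f}, after log: {log_skew:.2f}")
```

For a sample with a long right tail such as this one, the log pulls in the extreme values, so the skewness statistic moves closer to zero.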

  1. Add a Transform Variables node to the Diagram Workspace.

  2. Connect the Data Partition node to the Transform Variables node.

  3. Open the configuration interface for the Transform Variables node. The Variables tab is displayed.

  4. View the distribution of AMOUNT:

    1. Right-click in any cell of the AMOUNT variable row and select View Distribution of AMOUNT. A histogram of AMOUNT is displayed in the Variable Histogram window.

      [Variable Histogram window showing bar chart for Amount.]

      Note: Notice that the distribution for AMOUNT is skewed heavily to one side. The extreme values may cause imprecision in the parameter estimates.

    2. Close the window to return to the Variables tab of the Transform Variables window.

  5. Create a new input variable that maximizes the normality of AMOUNT:

    1. Right-click in any cell of the row that contains the variable AMOUNT and select Transform. Another pop-up menu opens.

    2. Select the Maximize Normality power transformation to create the transformation and return to the Variables tab of the Transform Variables window. From a set of power transformations, Maximize Normality chooses the one that yields sample quantiles closest to the theoretical quantiles of a normal distribution.

      [Variables Tab of Transform Variables configuration window showing new variable AMOU_ONV]

      A new variable has been created (for this example, AMOU_ONV), which is the log of AMOUNT. If you scroll to the right, the Formula column lists the formula for the transformation. The skewness statistic has been reduced from 1.95 for AMOUNT to 0.13 for the transformed variable AMOU_ONV. The Keep status column identifies variables that will be kept and passed to subsequent modeling nodes (Yes) and those that will not be (No). The keep status for the original input AMOUNT is automatically set to No when you apply a transformation.

      [Variables tab of Transform Variables configuration window showing logarithmic values associated with newly created variable AMOU_ONV.]

    3. View the distribution for the log of AMOUNT (for this example, AMOU_ONV).

      [Variable Histogram window showing bar chart of logarithmic values of the variable AMOUNT]

      Note: Notice that the distribution for the log of AMOUNT is fairly symmetrical.

    4. Close the window to return to the Variables tab of the Transform Variables window.

  6. Create an ordinal grouping variable from an interval variable:

    Using the Transform Variables node, you can easily transform an interval variable into a group variable. Because you are interested in the creditworthiness of particular age groups, create an ordinal grouping variable from the interval input variable AGE.

    1. Right-click in any cell of the AGE variable row and select Transform. Another pop-up menu opens.

    2. Select Bucket. This selection opens the Input Number window, which is used to define the number of buckets (groups) that you want to create. By default, the node creates four buckets.

      [Input Number window queries user How Many Buckets? with numeric input spinner displaying the number 4]

    3. Select Close to create the default four buckets. The Select Values window opens. It displays the position of each bucket.

      [Select Values window displaying bucket positions within the bar chart values.]

      To reposition a bin (bucket), drag the slider bar for the bin to a new location. You can also set the bin position by selecting the bin number that you want to modify from the Bin drop-down arrow and entering a new bin value in the Value text box.

    4. For this example, close the Select Values window to use the default bin positions and to return to the Variables tab. The new bucket variable is added as a variable entry in the Variables tab. The keep status of the original input variable AGE is set to No.

  7. Close the Transform Variables node. Click Yes in the Message window to save your changes.
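
The two transformations applied in steps 5 and 6 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the node's actual implementation: the candidate set in `CANDIDATES`, the use of a correlation against normal quantiles as the closeness measure, the equal-width bucket rule, and all function names and sample values are hypothetical.

```python
import math
from statistics import NormalDist

# Candidate power transformations (assumed set; inputs must be positive).
CANDIDATES = {
    "x": lambda x: x,
    "sqrt(x)": math.sqrt,
    "log(x)": math.log,
    "x**2": lambda x: x ** 2,
}

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = math.sqrt(sum((p - ma) ** 2 for p in a))
    sb = math.sqrt(sum((q - mb) ** 2 for q in b))
    return cov / (sa * sb)

def most_normal_transform(values):
    """Pick the candidate whose sorted values track normal quantiles best."""
    n = len(values)
    normal_q = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    scores = {
        name: pearson(sorted(f(v) for v in values), normal_q)
        for name, f in CANDIDATES.items()
    }
    return max(scores, key=scores.get), scores

def bucket_edges(values, n_buckets=4):
    """Interior cutoffs that split the observed range into equal-width buckets."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets
    return [lo + i * width for i in range(1, n_buckets)]

def assign_bucket(value, edges):
    """Ordinal bucket number, starting at 1, for a value given the cutoffs."""
    return 1 + sum(value >= e for e in edges)

# Hypothetical samples standing in for AMOUNT and AGE.
amounts = [250, 300, 400, 500, 650, 800, 1000, 1500, 2500, 6000, 15000]
ages = [19, 23, 27, 31, 36, 42, 48, 55, 63, 75]

best, scores = most_normal_transform(amounts)
edges = bucket_edges(ages)
print("chosen transformation:", best)
print("bucket cutoffs:", edges)
print("age buckets:", [assign_bucket(a, edges) for a in ages])
```

Moving a slider in the Select Values window corresponds to changing one entry of `edges` in this sketch; the bucket numbers keep their order, which is why the resulting variable is ordinal.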
