Working with Nodes That Modify, Model, and Explore

Create Variable Transformations


Overview

Some data can be mined more effectively after a transformation function is applied to the variable values. The data is often useful in its original form, but transforming it might increase the amount of information that you can retrieve. Transformations are especially useful when you want to improve the fit of a model to the data. For example, transformations can be used to stabilize variance, remove nonlinearity, improve additivity, and correct non-normality.
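
As an illustration of correcting non-normality, the following Python sketch compares the skewness of a right-skewed sample before and after a log transformation. It is not part of Enterprise Miner. The sample is synthetic, and the `skewness` helper and the variable names are assumptions made for this example.

```python
import math
import random

def skewness(values):
    """Return the moment-based (population) skewness of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed sample (log-normal), standing in for a gift amount.
rng = random.Random(42)
amounts = [rng.lognormvariate(3.0, 1.0) for _ in range(5000)]

# The log of a log-normal variable is normally distributed.
logged = [math.log(a) for a in amounts]

raw_skew = skewness(amounts)
log_skew = skewness(logged)
print(f"skewness before: {raw_skew:.2f}, after log: {log_skew:.2f}")
```

A skewness near zero after the transformation suggests that the transformed values are roughly symmetric, which is the effect that the distribution plots in this section let you check visually.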

You can use the Formula Builder and Expression Builder windows in the Transform Variables node to create variable transformations. You can also view distribution plots of variables before and after the transformation to assess how effective the data transformation is.


View Variable Distribution Plots

  1. Drag a Transform Variables node from the Modify tab of the node toolbar into the Diagram Workspace.

  2. Connect the Impute node to the Transform Variables node.

  3. Select the Transform Variables node in the Diagram Workspace to view its settings in the Properties panel. The default transformation method for all variables is None. You can use the Variables property to configure variable transformation on a case-by-case basis, or you can use the Default Methods section of the Properties panel to set transformation methods according to variable type.

    The variable distribution plots that you view in the Transform Variables node are generated using sampled data. You can configure how the data is sampled in the Sample Properties section of the Transform Variables node Properties panel.

  4. In the Properties panel for the Transform Variables node, click the ellipsis button to the right of the Formulas property. This action opens the Formulas window.

    In the Formulas window, the Outputs table is empty, because you have not created any variables yet.

  5. Examine the distributions of the current variables, and note which variables might benefit from transformation. A good variable transformation modifies the distribution of the variable so that it more closely resembles a normal distribution (a bell-shaped curve).

  6. View the distribution plots for the variables SES and URBANICITY to see the data before the missing values were replaced with imputed values. Distribution plots for the variables IMP_REPL_SES and IMP_REPL_URBANICITY show the data after the missing values were imputed and replaced.
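
The idea of plotting a distribution from sampled data can be sketched outside Enterprise Miner as follows. This Python example draws a reproducible random sample and counts the sampled values in equal-width bins. The function names, the seed, the sample size, and the synthetic data are all assumptions made for illustration. They do not reflect the node's actual sampling settings.

```python
import random

def sample_rows(values, sample_size, seed=12345):
    """Draw a reproducible simple random sample without replacement."""
    if sample_size >= len(values):
        return list(values)
    return random.Random(seed).sample(values, sample_size)

def histogram(values, bins=10):
    """Count values in equal-width bins and return (bin_edges, counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant data
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

# Hypothetical skewed variable with 20,000 rows.
rng = random.Random(7)
population = [rng.expovariate(1 / 20.0) for _ in range(20000)]

sample = sample_rows(population, 2000)
edges, counts = histogram(sample, bins=8)

# Print a rough text histogram: left bin edge, then one mark per 25 rows.
for left, count in zip(edges, counts):
    print(f"{left:8.1f} | {'#' * (count // 25)}")
```

A long tail on one side of such a plot is a sign that the variable might benefit from transformation.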


Add a Variable Transformation

  1. Click the Create icon [Add a Variable Transformation] on the left side of the toolbar to start creating a variable transformation.

    The Add Transformation window opens.

    [Add Transformation Window]

  2. Edit the following Value columns to configure the new variable that you are creating:

    • Change the Name from TRANS_0 to OVERALL_RESP_RATE.

    • Set Format to PERCENT6. (the trailing period is part of the SAS format name).

    • Set Label to Overall Response Rate.

    [Add Transformation Window]

  3. Click Build in the Add Transformation window. The Expression Builder window opens.

    [Expression Builder Window]

  4. Click on All Functions to see the comprehensive list of pre-built SAS functions that are available for variable transformations.

  5. Select the Variables List tab in the Expression Builder window. Scroll down the list of variables to REP_LIFETIME_GIFT_COUNT, select it, and click Insert. The REP_LIFETIME_GIFT_COUNT variable appears in the Expression Text box.

  6. Click the division operator button (/). Return to the Variables List tab and select the variable REP_LIFETIME_PROM.

  7. Click Insert. The REP_LIFETIME_GIFT_COUNT/REP_LIFETIME_PROM expression appears in the Expression Text box.

  8. Click OK in the Expression Builder window.

  9. Click OK in the Add Transformation window.

  10. In the Formulas window, click Preview to see a plot of the new variable.

  11. Note that because the distribution of OVERALL_RESP_RATE is skewed, you should transform it further.

  12. Click the Edit Expression button on the left side of the Formulas window.

  13. Select the REP_LIFETIME_GIFT_COUNT/REP_LIFETIME_PROM expression in the Expression Text box.

  14. On the Functions tab, select the Mathematical folder and then select LOG(argument) from the panel on the right.

  15. Click Insert. The expression text is updated so that the LOG function wraps the selected expression: LOG(REP_LIFETIME_GIFT_COUNT/REP_LIFETIME_PROM).

  16. Click OK in the Expression Builder window.

  17. Click Refresh Plot at the bottom left of the Formulas window.

    The distribution is now much closer to a normal distribution.

  18. Because the Overall Response Rate variable has been mathematically transformed, its PERCENT6. format is no longer accurate and must be updated. To change the variable format, click the Edit Properties icon on the left side of the Formulas window.

  19. In the Edit Transformation window, select Format and then press the Backspace key to clear the text box. Leave the Format value blank in order to use the default format for numeric values.

  20. Click OK in the Edit Transformation window.

  21. Click OK to exit the Formulas window.
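
The arithmetic behind the OVERALL_RESP_RATE variable can be sketched in Python as follows. This is an illustration only, not the code that the Transform Variables node generates. The function names and the donor records are hypothetical, and `None` stands in for a missing value. The guards for a zero promotion count and a non-positive ratio are assumptions added so that the sketch runs on any input. How the node itself treats such rows is not covered here.

```python
import math

def overall_resp_rate(gift_count, prom_count):
    """Ratio of lifetime gifts to lifetime promotions, or None when undefined."""
    if gift_count is None or prom_count is None or prom_count == 0:
        return None
    return gift_count / prom_count

def log_resp_rate(gift_count, prom_count):
    """Natural log of the ratio, or None when the ratio is missing or not positive."""
    rate = overall_resp_rate(gift_count, prom_count)
    if rate is None or rate <= 0:
        return None
    return math.log(rate)

# Hypothetical donor records: (REP_LIFETIME_GIFT_COUNT, REP_LIFETIME_PROM)
donors = [(4, 40), (12, 60), (0, 25), (3, None)]
for gifts, proms in donors:
    rate = overall_resp_rate(gifts, proms)
    shown = "missing" if rate is None else f"{rate:.0%}"
    print(gifts, proms, shown, log_resp_rate(gifts, proms))
```

The untransformed ratio reads naturally as a percentage, but its logarithm does not, which is why the PERCENT format is cleared after the LOG function is applied.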


Apply Standard Variable Transformations

You can now apply standard transformations to some of the original variables so that their distributions more closely resemble a normal distribution. Typical transformations include logarithmic, square root, and inverse functions, as well as binning. As the Properties panel shows, the default transformation method for all target and input measurement levels is None.
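
The typical transformations named above can be written as simple functions. The following Python sketch shows one plausible form of each. It does not reproduce the formulas that Enterprise Miner uses. In particular, the offset in `log_transform` and the equal-width scheme in `bucket` are assumptions chosen for illustration, and the gift amounts are made up.

```python
import math

def log_transform(x, offset=1.0):
    """Natural log with an offset so that zero stays defined (offset is an assumption)."""
    return math.log(x + offset)

def sqrt_transform(x):
    """Square root, for non-negative values."""
    return math.sqrt(x)

def inverse_transform(x):
    """Reciprocal, for nonzero values."""
    return 1.0 / x

def bucket(x, lo, hi, bins=4):
    """Equal-width binning: return the 1-based bin index of x within [lo, hi]."""
    if hi == lo:
        return 1
    idx = int((x - lo) / (hi - lo) * bins) + 1
    return max(1, min(idx, bins))

# Hypothetical gift amounts with one very large value.
gift_amounts = [5.0, 10.0, 25.0, 100.0, 1000.0]
lo, hi = min(gift_amounts), max(gift_amounts)
for x in gift_amounts:
    print(x, round(log_transform(x), 3), round(sqrt_transform(x), 3),
          round(inverse_transform(x), 4), bucket(x, lo, hi))
```

The log and square root functions compress large values more than small ones, which is why they are common choices for right-skewed amounts.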

  1. To apply transformations to selected variables, click the ellipsis button to the right of the Variables property in the Transform Variables Properties panel.

    The Variables - Trans window opens.

  2. You can transform individual variables in the Variables - Trans window. In the Method column, select Log for each of the following variables:

    • REP_FILE_AVG_GIFT

    • REP_LAST_GIFT_AMT

    • REP_LIFETIME_AVG_GIFT_AMT

    • REP_LIFETIME_GIFT_AMOUNT

    You can highlight adjacent variable rows, or you can hold down the CTRL key and select non-contiguous variables, and then apply the same transformation to all of the highlighted variables.

  3. Apply the Optimal method to the following variables:

    • REP_LIFETIME_CARD_PROM

    • REP_LIFETIME_GIFT_COUNT

    • REP_MEDIAN_HOME_VALUE

    • REP_MEDIAN_HOUSEHOLD_INCOME

    • REP_PER_CAPITA_INCOME

    • REP_RECENT_RESPONSE_PROP

    • REP_RECENT_STAR_STATUS

    Note that you can hold down the CTRL key and select multiple variables to change their settings at one time instead of changing each one individually.

  4. Select the Method column heading to sort the variable rows by the transformation method.

  5. Click OK to close the Variables - Trans window.

    Note: When Enterprise Miner creates imputed variable values in a data set, the original variables remain in the data set but are automatically assigned the Rejected status. Rejected variables are excluded from the data mining algorithms that follow the data imputation step.

  6. Run the Transform Variables node.
