Working with Nodes That Sample, Explore, and Modify

Replace Missing Data

You use the Replacement node to generate score code to process unknown variable levels when you are scoring data, and to interactively specify replacement values for class levels.

In this task, you add and configure a Replacement node in your process flow diagram.

  1. From the Modify tab of the node toolbar, drag a Replacement node into the Diagram Workspace and connect it to the Data Partition node.

  2. Select the Data Partition node. On the Properties panel, select the ellipsis button to the right of the Variables property to explore any of the variables in the input data set. The Variables window opens.

  3. In the Variables window, sort by Level, select the variables SES and URBANICITY, and then click Explore. The Explore window opens.

    Note:   If Explore is dimmed and unavailable, right-click the Data Partition node and select Run.

  4. In the Explore window, notice that both the SES and URBANICITY variables contain observations that have missing values. The observations are represented by question marks. Later, you will use the Impute node to replace the missing values with imputed values that have more predictive power.

  5. Double-click the bar that corresponds to missing values (SES = "?") in the SES histogram. Because the graphs are linked, the corresponding observations are also highlighted in the URBANICITY histogram. Notice that observations with missing values for SES also have missing values for URBANICITY.

  6. Close the Explore window.

  7. Click OK to close the Variables window.

  8. In the Replacement node Properties panel, select the ellipsis button to the right of the Class Variables Replacement Editor property.

  9. The Replacement Editor window opens.

    Note:   By default, Enterprise Miner replaces unknown levels using the Unknown Levels property in the Properties panel. The choices are Ignore, Missing, and Mode (the most frequent value). Ensure that the Unknown Levels property is set to Ignore.

  10. Scroll through the data table in the Replacement Editor window and observe the levels of the SES and URBANICITY variables. For each row where one of these variables shows a question mark (?) in the Char Raw value column, enter _MISSING_ in the Replacement Value column. This causes the Replacement node to replace that level with a SAS missing value.

  11. Click OK.

  12. Right-click the Replacement node and select Run.
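
For readers who want to see the effect of steps 5 and 10 outside the Enterprise Miner interface, the following is a minimal Python sketch, not the score code that the Replacement node generates. The sample rows, the function names, and the use of `None` as a stand-in for a SAS missing value are all assumptions made for illustration; only the variable names SES and URBANICITY and the "?" level come from this task.

```python
# Hypothetical sample rows; the actual values in the input data set are not shown in this task.
rows = [
    {"SES": "1", "URBANICITY": "S"},
    {"SES": "?", "URBANICITY": "?"},
    {"SES": "2", "URBANICITY": "C"},
    {"SES": "?", "URBANICITY": "?"},
]

MISSING_MARKER = "?"  # raw level displayed in the Char Raw value column
REPLACEMENT = None    # assumed stand-in for a SAS missing value


def count_joint_missing(data, first, second, marker=MISSING_MARKER):
    """Count rows where both variables carry the marker (the pattern seen in step 5)."""
    return sum(1 for r in data if r[first] == marker and r[second] == marker)


def replace_level(data, variables, marker=MISSING_MARKER, replacement=REPLACEMENT):
    """Return new rows in which the marker level is replaced for the given variables."""
    return [
        {k: (replacement if k in variables and v == marker else v) for k, v in r.items()}
        for r in data
    ]


joint_missing = count_joint_missing(rows, "SES", "URBANICITY")
cleaned = replace_level(rows, {"SES", "URBANICITY"})
```

The sketch returns new rows rather than editing the originals, so the raw "?" levels remain available for comparison.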
