In this example, the variables SES and URBANICITY are class variables
for which the value ? denotes a missing value. Because a question mark
is not a missing value as SAS defines one (that is, a blank or a
period), SAS Enterprise Miner treats it as an additional level of the
class variable. However, knowing that these values are missing will be
useful later in the model-building process.
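
The effect described above can be illustrated with a short Python sketch. It is not SAS code and does not reproduce how SAS Enterprise Miner is implemented; the sample values and the helper names `is_sas_style_missing` and `level_counts` are hypothetical. It only shows why a ? code ends up counted as a level when the missing-value rule recognizes nothing but a blank or a period.

```python
from collections import Counter

# Hypothetical sample values for the class variable SES (not from the real data set).
ses_values = ["1", "2", "?", "3", "?", "", "2", "."]


def is_sas_style_missing(value):
    """Treat only a blank or a period as missing, mirroring the rule in the text."""
    return value.strip() in ("", ".")


def level_counts(values):
    """Count observations per level, skipping values recognized as missing."""
    return Counter(v for v in values if not is_sas_style_missing(v))


levels = level_counts(ses_values)

# The blank and the period are skipped, but "?" survives as a level of its own.
print(sorted(levels))
print(levels["?"])
```

Under this rule the question mark is indistinguishable from a legitimate category, which is why it has to be recoded explicitly.
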

To use the Replacement node to interactively specify that these values
of SES and URBANICITY are missing:
- Select the Modify tab on the Toolbar.
- Select the Replacement node icon. Drag the node into the Diagram
  Workspace.
- Connect the Data Partition node to the Replacement node.
- Select the Replacement node. In the Properties Panel, scroll down to
  view the Train properties.
- For interval variables, click the value of Default Limits Method, and
  select None from the drop-down menu that appears. This selection
  indicates that no values of interval variables should be replaced.
  With the default selection, a particular range for the values of each
  interval variable would have been enforced. In this example, you do
  not want to enforce such a range.
  Note: In this data set, all missing interval variable values are
  correctly coded as SAS missing values (a blank or a period).
- For class variables, click the ellipsis button that represents the
  value of Replacement Editor. The Replacement Editor opens.
- Notice that SES and URBANICITY both have a level that contains
  observations with the value ?. In the case of these two variables,
  this level represents observations with missing values. Enter
  _MISSING_ as the Replacement Value for the two rows, as shown in the
  image below. This action enables SAS Enterprise Miner to see that the
  question marks indicate missing values for these two variables. Later,
  you will impute values for observations with missing values.
- Enter _UNKNOWN_ as the Replacement Value for the level of DONOR_GENDER
  that has the value A. This value is the result of a data entry error,
  and you do not know whether the intention was to code it as an F or
  an M.
- In the Diagram Workspace, right-click the Replacement node, and select
  Run from the resulting menu. Click Yes in the Confirmation window that
  opens.
- In the window that appears when processing completes, click OK.
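
The Default Limits Method step above can be pictured with a small Python sketch. It is only an illustration: this section does not say which range the default selection enforces, so the `stddev` method, its cutoff of 3, the clipping behavior, and the sample values below are all assumptions. The point is the contrast between enforcing some range and selecting None.

```python
from statistics import mean, pstdev


def apply_limits(values, method="none", cutoff=3.0):
    """Return a copy of values with a range enforced, or unchanged for "none".

    The "stddev" method and its cutoff are illustrative assumptions, not the
    documented behavior of the Replacement node. None marks a missing value.
    """
    if method == "none":
        return list(values)
    if method != "stddev":
        raise ValueError(f"unknown method: {method}")
    present = [v for v in values if v is not None]
    center, spread = mean(present), pstdev(present)
    low, high = center - cutoff * spread, center + cutoff * spread
    # Clip non-missing values into [low, high]; leave missing values alone.
    return [v if v is None else min(max(v, low), high) for v in values]


# Hypothetical interval values: one extreme observation and one missing value.
incomes = [10.0] * 20 + [1000.0, None]

unchanged = apply_limits(incomes, method="none")
limited = apply_limits(incomes, method="stddev")
```

With `method="none"` every value, including the extreme one, passes through untouched, which corresponds to the choice made in this example.
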

In the data that is exported from the Replacement node, a new variable
is created for each variable that is replaced (in this example, SES,
URBANICITY, and DONOR_GENDER). The original variable is not
overwritten. Instead, the new variable has the same name as the
original variable but is prefixed with REP_. The original version of
each variable also exists in the exported data and has the role
Rejected.
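
As a rough picture of the exported data, the following Python sketch applies the replacement values entered in the steps above and writes the results to REP_-prefixed variables while keeping the originals. It is a simplified model, not the node's implementation: the sample rows, the function name `replace_levels`, the use of `None` to stand for a missing value, and the treatment of _UNKNOWN_ as a literal label are assumptions made for illustration.

```python
# Replacement values entered in the Replacement Editor, keyed by variable and level.
REPLACEMENTS = {
    "SES": {"?": "_MISSING_"},
    "URBANICITY": {"?": "_MISSING_"},
    "DONOR_GENDER": {"A": "_UNKNOWN_"},
}


def replace_levels(rows, replacements):
    """Add a REP_-prefixed copy of each replaced variable; keep the original."""
    exported = []
    for row in rows:
        new_row = dict(row)
        for name, mapping in replacements.items():
            value = row[name]
            replaced = mapping.get(value, value)
            # Assumption: model _MISSING_ as None; keep other values as labels.
            new_row["REP_" + name] = None if replaced == "_MISSING_" else replaced
        exported.append(new_row)
    return exported


# Hypothetical observations (not taken from the real data set).
rows = [
    {"SES": "2", "URBANICITY": "S", "DONOR_GENDER": "F"},
    {"SES": "?", "URBANICITY": "?", "DONOR_GENDER": "A"},
]

exported = replace_levels(rows, REPLACEMENTS)

# The original variables stay in the exported data with the role Rejected.
rejected_roles = {name: "Rejected" for name in REPLACEMENTS}
```

Keeping the original columns next to their REP_ counterparts makes it possible to check what each replacement changed.
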

To view the data that is exported by a node, click the ellipsis button
that represents the value of the General property Exported Data in the
Properties Panel. To view the exported variables, click Properties in
the window that opens, and then view the Variables tab. Similarly, you
can view the data that is imported and used by a node by clicking the
ellipsis button that represents the value of the General property
Imported Data in the Properties Panel.