Integrating a modeling
node into the Enterprise Miner environment requires that you write
scoring code that generates predicted or posterior variables with
appropriate names. The attributes of the variables and assessment
variables for each target variable are stored in SAS data sets. The
names of the data sets can be found in WORK.EM_TARGETDECINFO. Consider
the following process flow diagram:
The variable BAD is
the single target variable and has the following decisions profile:
Suppose that you add the
following code to the Train code of the node:
proc print data=work.em_targetdecinfo;
run;
Running the node then produces the
following output:
By default, the output
lists the names of the variables that your scoring code needs to create.
For example, after you train your model, you need to generate two variables
that represent the predictions for the target variable, BAD. The output
above shows that, in this example, the variables should be named
P_BAD1 and P_BAD0: P_BAD1 is the probability that BAD = 1, and P_BAD0
is the probability that BAD = 0. The source of that information
is the DECMETA data set for the target, BAD. The PROC PRINT result
that is displayed at the bottom of the output shows that
the name of the DECMETA data set is EMWS8.Ids_BAD_DM. Using
Explorer, you can view the data set:
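The same rows can also be listed with PROC PRINT instead of Explorer. The following is a minimal sketch, assuming the data set name reported above (the EMWS8 libref differs between workspaces) and using only the _TYPE_, Variable, and Label columns that appear elsewhere in this section:

```sas
/* Sketch: list the predicted-variable rows of the DECMETA data set. */
/* EMWS8.Ids_BAD_DM is the name reported in this example; substitute */
/* the name that WORK.EM_TARGETDECINFO reports in your own project.  */
proc print data=EMWS8.Ids_BAD_DM;
    where _TYPE_ = 'PREDICTED';
    var Variable Label;
run;
```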
At run time, when there
is only one target variable, the &EM_DEC_DECMETA macro variable
is assigned the name of the decision metadata data set for the target
variable. In this example, &EM_DEC_DECMETA resolves to EMWS8.Ids_BAD_DM.
Using &EM_DEC_DECMETA enables you to retrieve the information
programmatically. For example, the code below creates two macro arrays,
pred_vars and pred_labels, that contain the names and labels, respectively,
of the posterior or predicted variables. The numLevels macro variable
identifies the number of levels for a class target variable.
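data _null_;
    /* Read only the rows that describe predicted (posterior) variables. */
    set &em_dec_decmeta end=eof;
    where _TYPE_='PREDICTED';
    /* Store each name and label as pred_vars1, pred_labels1, and so on. */
    call symput('pred_vars'!!strip(put(_N_,BEST.)),
                strip(Variable));
    /* Double any single quotation marks in the label. */
    call symput('pred_labels'!!strip(put(_N_,BEST.)),
                strip(tranwrd(Label,"'","''")));
    /* On the last row, _N_ is the number of predicted variables. */
    if eof then
        call symput('numLevels', strip(put(_N_,BEST.)));
run;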
You can loop through
the macro arrays using the numLevels macro variable as the terminal
value for the loop.
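Such a loop might look like the following sketch. It assumes that pred_vars1 through pred_vars&numLevels and numLevels were created by the DATA step above and that at least one PREDICTED row exists (otherwise numLevels is never assigned). The macro name list_pred_vars is hypothetical.

```sas
/* Sketch: write each predicted variable name to the log.            */
/* list_pred_vars is a hypothetical name, not an Enterprise Miner    */
/* macro. &&pred_vars&i resolves in two passes: first to             */
/* &pred_vars1, then to the stored name (for example, P_BAD1).       */
%macro list_pred_vars;
    %local i;
    %do i = 1 %to &numLevels;
        %put NOTE: Predicted variable &i is &&pred_vars&i;
    %end;
%mend list_pred_vars;

%list_pred_vars;
```

The same %DO loop can wrap any statement that must be repeated once per predicted variable. The labels in pred_labels are available in the same way through &&pred_labels&i.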
If more than one target
variable is used, then &EM_DEC_DECMETA is blank. In that case,
you need to retrieve the names of the decisions data sets (one per
target) from the WORK.EM_TARGETDECINFO data set. The code below demonstrates
how this can be accomplished:
data _null_;
    set WORK.EM_TARGETDECINFO;
    where TARGET = 'target-name';
    call symput('EM_DEC_DECMETA', decmeta);
run;
For example, suppose
you modify the attributes of the Home Equity data set to make JOB a
target variable in addition to the variable BAD, and you give
JOB the following decision profile:
Note: The profile above is for
demonstration purposes only; the values are not intended to represent
a realistic decision profile for business purposes.
Suppose you add this
code:
data _null_;
    set work.em_targetdecinfo;
    where TARGET = "JOB";
    call symput("em_dec_decmeta", decmeta);
run;
This code causes
the macro variable &EM_DEC_DECMETA to resolve to
EMWS8.Ids_JOB_DM. Using Explorer once again, you can view the DECMETA
data set for the target variable, JOB:
You would use this code
once for each target variable, making the appropriate substitution
for the target-name in the
WHERE statement.
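As an alternative to repeating the step for each target, a single pass over WORK.EM_TARGETDECINFO can store every data set name at once. The sketch below is not part of Enterprise Miner: the decmeta_ prefix is a hypothetical naming scheme, and it assumes one row per target and target names short enough that the prefixed name stays within the 32-character limit for macro variable names.

```sas
/* Sketch: create one macro variable per target, for example         */
/* decmeta_BAD and decmeta_JOB. The decmeta_ prefix is hypothetical. */
data _null_;
    set work.em_targetdecinfo;
    /* CALL SYMPUTX removes leading and trailing blanks from the value. */
    call symputx(cats('decmeta_', TARGET), decmeta);
run;

/* With the targets BAD and JOB used in this example: */
%put NOTE: decmeta_BAD=&decmeta_BAD decmeta_JOB=&decmeta_JOB;
```

Each name can then be used wherever &EM_DEC_DECMETA would be used for a single target, for example in the SET statement of the DATA step that builds the macro arrays.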
If the data sets exported
by the node contain the appropriate predicted variables, the %EM_MODEL
macro can be used to notify the Enterprise Miner environment to compute
fit statistics. It can also generate scoring code that computes classification
variables (I_, F_, and U_ variables), decision variables, and residual
variables (R_ variables).
Assessment statistics are produced by default, provided those variables
are available.
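A call to %EM_MODEL might look like the sketch below. The argument names and Y/N values are recalled from the Enterprise Miner extension-node documentation and are not confirmed by this section, so treat them as assumptions and check the macro reference for your release. BAD is the target from the example above.

```sas
/* Sketch only: the argument names below are assumptions recalled    */
/* from the extension-node documentation. Verify them before use.    */
%em_model(
    target=BAD,
    assess=Y,
    decscorecode=Y,
    fitstatistics=Y,
    classification=Y,
    residuals=Y,
    predicted=Y);
```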