Server Code

Overview

Throughout the example, the following process flow diagram is used to illustrate the results generated by the node:
Regression Node Example PFD
  • The target variable is AMOUNT.
  • The Linear Regression extension node has its Method property set to Stepwise.
  • The Linear Regression extension node has its Excluded Variables property set to Reject.
The extension node's server code consists of the following four files:
  • The reg.source entry contains the macro %main; it is the entry source for the node.
  • The reg_create.source entry contains the macro %create and is associated with the CREATE action. The macro %create initializes the macro variables associated with the node's properties and registers the data sets created by the node.
  • The reg_train.source entry contains the macro %train and is associated with the TRAIN action. The macro %train relies on three additional macros: %procreg and %makeScoreCode, which it calls directly, and %fillFile, which %makeScoreCode calls. The code for these three macros is therefore included in reg_train.source. The code generates and submits the PROC REG step that produces the parameter estimates, and then generates the FLOW and PUBLISH scoring code.
  • The reg_score.source entry contains the macro %score and is associated with the SCORE action. The macro %score controls how variables that are excluded from the final model are exported from the node.

The reg.source Entry

%macro main;
	%if %upcase(&EM_ACTION) = CREATE %then %do;
		filename temp catalog 'sashelp.em61ext.reg_create.source';
		%include temp;
		filename temp;
		%create;
	%end;

	%else
	%if %upcase(&EM_ACTION) = TRAIN %then %do;
		filename temp catalog 'sashelp.em61ext.reg_train.source';
		%include temp;
		filename temp;
		%train;
	%end;

	%else
	%if %upcase(&EM_ACTION) = SCORE %then %do;
		filename temp catalog 'sashelp.em61ext.reg_score.source';
		%include temp;
		filename temp;
		%score;
	%end;

%mend main;
%main;

The CREATE Action

When the CREATE action is called, the following code stored in the reg_create.source entry is submitted:
%macro create;
	/* Training Properties */
	%em_property(name=Method, value=NONE);
	%em_property(name=Details, value=N);

	/* Scoring Properties */
	%em_property(name=ExcludedVariable, value=REJECT, action=SCORE);

	/* Register Data Sets */
	%EM_REGISTER(key=OUTEST, type=DATA);
	%EM_REGISTER(key=EFFECTS, type=DATA);
%mend create;
Using the %EM_PROPERTY macro, we define two Train properties and one Score property:
  • Method is a String Property with a ChoiceList Control. The property indicates the model selection method that is used to obtain the final model. The initial value of the Method property is NONE, so by default, no selection method is used. The property has no action associated with it, so it is assumed to be a Train property.
  • Details is a Boolean Property. When set to Y, it indicates that statistics are to be listed in the output at the end of each step when a model selection method is used.
  • ExcludedVariable is a String Property with a ChoiceList Control. The property indicates how the node exports variables that are not selected in the final model when using a model selection technique. By default, the value is REJECT, which means that such variables have their role set to REJECTED. This is a Score property because it does not affect the model or results produced by PROC REG. For performance reasons, we do not need to refit the linear regression model if the user changes the property to NONE or HIDE. By associating the property with a SCORE action, the node skips over the TRAIN action and simply rescores and regenerates the exported metadata.
The %EM_REGISTER macro is used to register the EFFECTS and the OUTEST data sets, which contain the parameter estimates from the linear regression model.

The TRAIN Action

When the &EM_ACTION macro variable is set to TRAIN, the reg_train.source entry is executed. This extension node simply executes the REG procedure. The extension node has two data requirements:
  • There must be a training data set imported by the node. If not, an exception is thrown indicating that the user must specify a training data set.
    Note: In this example, the exception string has been set to an encoding string that is recognized by the SAS Enterprise Miner client.
  • There must be an interval target variable. If not, an exception is thrown indicating that the user must specify an interval target variable.
The %EM_GETNAME macro is called to initialize the &EM_USER_OUTEST and &EM_USER_EFFECTS macro variables. These data sets are used to store the parameter estimates.
%macro train;
	%if %sysfunc(index(&EM_DEBUG, SOURCE))>0 or
		%sysfunc(index(&EM_DEBUG, ALL))>0 %then %do;
		options mprint;
	%end;

	%if (^%sysfunc(exist(&EM_IMPORT_DATA)) and
		^%sysfunc(exist(&EM_IMPORT_DATA, VIEW)))
		or "&EM_IMPORT_DATA" eq "" %then %do;
		%let EMEXCEPTIONSTRING = exception.server.IMPORT.NOTRAIN,1;
		%goto doenda;
	%end;

	%if (%EM_INTERVAL_TARGET eq ) %then %do;
		%let EMEXCEPTIONSTRING = exception.server.METADATA.USE1INTERVALTARGET;
		%goto doenda;
	%end;

	%em_getname(key=OUTEST, TYPE=DATA);
	%em_getname(key=EFFECTS, type=DATA);
	%procreg;
	%makeScoreCode;

	%em_model(TARGET=&targetvar,
		ASSESS=Y,
		DECSCORECODE=Y,
		FITSTATISTICS=Y,
		CLASSIFICATION=N,
		RESIDUALS=Y);
	%em_report(key=EFFECTS,
		viewtype=BAR,
		TIPTEXT=VARIABLE,
		X=VARIABLE,
		Freq=TVALUE,
		Autodisplay=Y,
		description=%nrbquote(Effects Plot),
		block=MODEL);
	%doenda:
%mend train;
In the %procreg macro, we fit a linear regression model using the REG procedure:
  • The ODS OUTPUT statement creates the EFFECTS data set, which contains the parameter estimates.
  • If the Details property is set to Yes (which corresponds to the &EM_PROPERTY_DETAILS macro variable), then the DETAILS option of the MODEL statement is used.
  • The model uses all interval input and rejected variables that have the “Use” attribute set to “Yes”. These variables are returned by the %EM_INTERVAL_INPUT and %EM_INTERVAL_REJECTED macros.
  • If a frequency variable is defined, the FREQ statement is used.
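%macro procreg;
	%global targetVar;
	%let targetVar = %scan(%EM_INTERVAL_TARGET, 1, );
	/* Capture the parameter estimates table in the EFFECTS data set */
	ods output parameterestimates= &EM_USER_EFFECTS;
	/* MODEL statement options must follow a slash; DETAILS is added */
	/* only when the Details property is Y and a method is selected  */
	proc reg data=&EM_IMPORT_DATA OUTEST=&EM_USER_OUTEST;
		model &targetVar = %EM_INTERVAL_INPUT %EM_INTERVAL_REJECTED
		%if %upcase(&EM_PROPERTY_METHOD) ne NONE %then %do;
			/ selection= &EM_PROPERTY_METHOD
			%if %upcase(&EM_PROPERTY_DETAILS) eq Y %then %do;
				details
			%end;
		%end;
		;
		%if %EM_FREQ ne %then %do;
			freq %EM_FREQ;
		%end;
	run;
	ods _all_ close;
	ods listing;
%mend procreg;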
The EFFECTS data set has the following structure:
Model   Dependent  Variable   DF  Estimate     StdErr     tValue  Probt
MODEL1  amount     Intercept  1   -1130.54625  534.48857  -2.12   0.0347
MODEL1  amount     age        1   14.12780     5.53920    2.55    0.0109
MODEL1  amount     duration   1   136.22034    5.32411    25.59   <.0001
MODEL1  amount     employed   1   -108.10434   52.16738   -2.07   0.0385
MODEL1  amount     foreign    1   567.01572    323.58225  1.75    0.0800
MODEL1  amount     installp   1   -830.99671   54.44354   -15.26  <.0001
MODEL1  amount     job        1   570.83009    103.14025  5.53    <.0001
MODEL1  amount     property   1   263.71329    62.04117   4.25    <.0001
MODEL1  amount     savings    1   56.29680     38.38939   1.47    0.1428
MODEL1  amount     telephon   1   642.84575    135.33767  4.75    <.0001
You can easily generate the scoring code using this data set.
The OUTEST data set contains the parameter estimates for variables in the final model, but also identifies variables that are excluded from the model. It has the following structure:
_MODEL_  _TYPE_  _DEPVAR_  _RMSE_   Intercept  age
MODEL1   PARMS   amount    1892.16  -1130.55   14.1278

checking   coapp   depends
.          .       .
Note that the above output has been separated onto multiple rows for display purposes only.
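The shape of both data sets can be inspected outside SAS Enterprise Miner. The following is a minimal sketch, not the node's code: it assumes that SASHELP.CLASS stands in for &EM_IMPORT_DATA and that WORK.EFFECTS and WORK.OUTEST stand in for the registered EFFECTS and OUTEST data sets. No selection method is requested, so every input keeps an estimate.

```sas
/* Standalone sketch: SASHELP.CLASS stands in for &EM_IMPORT_DATA, and */
/* WORK.EFFECTS / WORK.OUTEST stand in for the registered data sets.   */
ods output ParameterEstimates=work.effects;
proc reg data=sashelp.class outest=work.outest;
	model weight = height age;
run;
quit;

/* One row per model term: Variable, DF, Estimate, StdErr, tValue, Probt */
proc print data=work.effects;
run;

/* One row per model, with one column per term holding its estimate */
proc print data=work.outest;
run;
```

EFFECTS is the long form (one row per term), which is convenient for writing scoring code line by line. OUTEST is the wide form (one column per term), which is convenient for spotting excluded variables through their missing estimates.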
The %makeScoreCode macro retrieves the name of the predicted variable using the decision metadata data set. If only one target variable is defined, that data set corresponds to the &EM_DEC_DECMETA macro variable. If multiple target variables are defined, you can retrieve the decision metadata data set from the &EM_TARGETDECINFO data set.
The %fillFile macro processes the EFFECTS data set, generates the scoring code, and saves it in the &EM_FILE_EMPUBLISHSCORECODE and &EM_FILE_EMFLOWSCORECODE files, which correspond to the Publish and Flow scoring code, respectively.
%macro fillFile(type=, predVar=, file=);
	filename tempf "&file";
	data _null_;
		file tempf;
		set &EM_USER_EFFECTS end=eof;
		if _N_=1 then do;
			put "&predVar = ";
			if Variable = 'Intercept' then
				put Estimate;
			else
				put Estimate '*' Variable;
		end;
		else do;
			put '+' Estimate '*' Variable;
		end;
		if eof then do;
			put ";";
		end;
	run;
	filename tempf;
%mend fillFile;

%macro makeScoreCode;
	%let predvar=;
	%if &em_dec_decmeta eq %then %do;
		%if %sysfunc(exist(&EM_TARGETDECINFO)) %then %do;
			data _null_;
				set &EM_TARGETDECINFO;
				where TARGET="&targetVar";
				call symput('em_dec_decmeta', DECMETA);
			run;
		%end;
	%end;
	%if (&em_dec_decmeta ne ) and %sysfunc(exist(&em_dec_decmeta)) %then %do;
		data _null_;
			set &em_dec_decmeta;
			where _TYPE_ = 'PREDICTED';
			call symput('predVar', strip(VARIABLE));
			call symput('predLabel', strip(LABEL));
		run;
	%end;

	%if &predVar eq %then %goto doendm;

	%fillFile(type=publish, predvar=&predVar, file=&EM_FILE_EMPUBLISHSCORECODE);
	%fillFile(type=flow, predvar=&predVar, file=&EM_FILE_EMFLOWSCORECODE);
	%doendm:

%mend makeScoreCode;
The generated scoring code has the following form:
P_amount =
-1130.54625
+14.12780 *age
+136.22034 *duration
+-108.10434 *employed
+567.01572 *foreign
+-830.99671 *installp
+570.83009 *job
+263.71329 *property
+56.29680 *savings
+642.84575 *telephon
;
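The PUT logic in %fillFile can be tried without Enterprise Miner. The sketch below assumes a hypothetical WORK.EFFECTS data set holding only the first three rows of the EFFECTS table, hard-codes P_amount in place of &predVar, and writes to the SAS log instead of a score code file.

```sas
/* Hypothetical stand-in for &EM_USER_EFFECTS: three rows of the table above */
data work.effects;
	length Variable $32;
	input Variable $ Estimate;
	datalines;
Intercept -1130.54625
age 14.12780
duration 136.22034
;
run;

/* Same PUT logic as %fillFile; with no FILE statement, PUT writes to the log */
data _null_;
	set work.effects end=eof;
	if _N_=1 then do;
		put "P_amount = ";
		if Variable = 'Intercept' then put Estimate;
		else put Estimate '*' Variable;
	end;
	else put '+' Estimate '*' Variable;
	if eof then put ";";
run;
```

Because every term after the first is written with a leading plus sign, negative estimates appear as "+-", as in the generated code above. This is still a valid SAS expression.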
The %EM_MODEL macro is used to generate additional scoring code and to produce assessment reports.
%em_model(TARGET=&targetvar,
	ASSESS=Y,
	DECSCORECODE=Y,
	FITSTATISTICS=Y,
	CLASSIFICATION=N,
	RESIDUALS=Y);
  • ASSESS=Y: generates the assessment reports (Score Rankings and Score Distribution).
  • DECSCORECODE=Y: appends the score code that generates decision variables when a profit matrix is defined.
  • FITSTATISTICS=Y: computes the fit statistics associated with the model. They are computed for the training data set and, when applicable, for the validation and test data sets.
  • CLASSIFICATION=N: suppresses the reports and score code associated with the classification variables (I_).
  • RESIDUALS=Y: appends the code that generates the residual variable (R_) to the flow score code and produces the residual report.
For example, the Flow scoring code would now appear as follows:
P_amount =
-1130.54625
+14.12780 *age
+136.22034 *duration
+-108.10434 *employed
+567.01572 *foreign
+-830.99671 *installp
+570.83009 *job
+263.71329 *property
+56.29680 *savings
+642.84575 *telephon
;
*------------------------------------------------------------*;
*Computing Residual Vars: amount;
*------------------------------------------------------------*;
Label R_amount = 'Residual: amount';
R_amount = amount - P_amount;
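Flow scoring code of this kind is ordinary DATA step code that runs once per observation. As a rough illustration only, the sketch below applies a truncated version of the model (the intercept plus the age and duration terms) to two made-up observations; the input values and the WORK.SCORED data set name are hypothetical.

```sas
/* Hypothetical input rows; the model is truncated to three terms for brevity */
data work.scored;
	input amount age duration;
	/* Prediction: truncated form of the generated scoring code */
	P_amount = -1130.54625 + 14.12780*age + 136.22034*duration;
	/* Residual code in the form appended when RESIDUALS=Y */
	label R_amount = 'Residual: amount';
	R_amount = amount - P_amount;
	datalines;
5000 35 24
1200 22 12
;
run;

proc print data=work.scored label;
run;
```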
The %EM_REPORT macro generates a graph of the parameter estimates:
%em_report(key=EFFECTS,
	viewtype=BAR,
	TIPTEXT=VARIABLE,
	X=VARIABLE,
	Freq=TVALUE,
	Autodisplay=Y,
	description=%nrbquote(Effects Plot),
	block=MODEL);
  • Key=EFFECTS: identifies the data set used to produce the chart.
  • Viewtype=BAR: generates a bar chart.
  • TIPTEXT=VARIABLE: uses the variable named VARIABLE to identify a bar when the user clicks it.
  • X=VARIABLE: gives the bar chart one bar for each value of VARIABLE, that is, one bar per model term.
  • FREQ=TVALUE: uses the variable TVALUE to control the height of the bars.
  • AutoDisplay=Y: displays the report whenever the Results viewer of the node is opened.
  • Description=%nrbquote(Effects Plot): specifies the text in the title bar of the report.
  • Block=MODEL: places the report under the “Model” menu item.
Effects Plot

The SCORE Action

When the &EM_ACTION macro variable is set to SCORE, the reg_score.source entry is executed.
The %em_getname macro is used again to retrieve the &em_user_outest macro variable. This is done because the training code might not have run before the SCORE action executes. For example, if ExcludedVariable is the only modified property, the TRAIN action is bypassed.
If the user specifies a model selection method using the Method property and sets the ExcludedVariable property to either HIDE or REJECT, the node generates DATA step code that modifies the metadata that is exported to successor nodes. The DATA step code is saved in the &EM_FILE_CDELTA_TRAIN file.
Using PROC TRANSPOSE of Base SAS, the node identifies all the variables with missing parameter estimates. Those are variables excluded from the final model. If the ExcludedVariable property is set to REJECT, then the role of the variables with missing parameter estimates is set to REJECTED. If the ExcludedVariable property is set to HIDE, variables with missing parameter estimates are deleted from the exported metadata so that successor nodes are not exposed to those variables.
%macro score;
	/* Delete Code Modifying Exported Metadata */
	filename tempd "&EM_FILE_CDELTA_TRAIN";
	data _null_;
		if fexist('tempd') then
			rc=fdelete('tempd');
	run;

	%if (%upcase("&EM_PROPERTY_METHOD") ne "NONE") and
		(%upcase("&EM_PROPERTY_EXCLUDEDVARIABLE") ne "NONE")
		%then %do;

		%em_getname(key=OUTEST, type=DATA);
		proc transpose data=&EM_USER_OUTEST
			out=temp(where=(Col1 eq .));
		run;

		data _null_;
			file tempd;
			length String $200;
			set temp end=eof;
			if _N_=1 then put 'if upcase(NAME) in(';
			string = quote(strip(upcase(_NAME_)));
			put string;
			if eof then do;
				%if %upcase("&EM_PROPERTY_EXCLUDEDVARIABLE") eq "REJECT"
					%then %do;
					put ') then ROLE="REJECTED";';
				%end;
				%else %do;
					put ') then delete;';
				%end;
			end;
		run;
	%end;

	filename tempd;
%mend score;
For example, the generated “delta code” could have the following form:
if upcase(NAME) in(
"CHECKING"
"COAPP"
"DEPENDS"
"EXISTCR"
"HISTORY"
"HOUSING"
"MARITAL"
"OTHER"
"RESIDENT"
) then ROLE="REJECTED";
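The whole path from missing estimates to modified metadata can be sketched in a standalone program. Everything named here is an assumption for illustration: WORK.OUTEST is a hypothetical one-row stand-in for &EM_USER_OUTEST, WORK.CMETA is a hypothetical stand-in for the exported variables metadata (only NAME and ROLE are modeled), and a temporary fileref replaces &EM_FILE_CDELTA_TRAIN.

```sas
/* Hypothetical one-row stand-in for &EM_USER_OUTEST */
data work.outest;
	Intercept = -1130.55; age = 14.1278; checking = .; coapp = .;
run;

/* Keep only the variables whose estimate is missing */
proc transpose data=work.outest out=work.excluded(where=(Col1 eq .));
run;

/* Hypothetical stand-in for the exported variables metadata */
data work.cmeta;
	length NAME $32 ROLE $32;
	input NAME $ ROLE $;
	datalines;
age INPUT
checking INPUT
coapp INPUT
;
run;

/* Write the delta code to a temporary file (REJECT branch only) */
filename delta temp;
data _null_;
	file delta;
	length string $200;
	set work.excluded end=eof;
	if _N_=1 then put 'if upcase(NAME) in(';
	string = quote(strip(upcase(_NAME_)));
	put string;
	if eof then put ') then ROLE="REJECTED";';
run;

/* Apply the delta code to the metadata inside a DATA step */
data work.cmeta_out;
	set work.cmeta;
	%include delta;
run;
filename delta;
```

In this sketch the delta code is included explicitly. In the node, Enterprise Miner picks up the file written by the SCORE action, so the %include step is shown only to make the effect on ROLE visible.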