The specific function
of each node is performed by a SAS program that is associated with
the node. Thus, when a node is placed in a process flow diagram, it
is a graphical representation of a SAS program. An extension node's
SAS program consists of one or more SAS source code files residing
on the SAS Enterprise Miner server. The source code can be stored
in a SAS library or in external files. Any valid SAS statement can
be used in an extension node's SAS program. However, you cannot issue
statements that generate a SAS windowing environment. The SAS windowing
environment from Base SAS is not compatible with SAS Enterprise Miner.
For example, you cannot execute
SAS/LAB software from within an extension
node.
As you begin to design
your node's SAS program, ask yourself these five questions:
- What needs to occur when the extension node's icon is initially placed in a process flow diagram?
- What is the node going to accomplish at run time?
- Will the node generate Publish or Flow code?
- What types of reports should be displayed in the node's Results window?
- What program options or arguments should the user be able to modify; what should the default values be; and should the choices, or range of values, be restricted?
SAS Enterprise Miner
5.3 introduced two new features that can significantly enhance the
performance of extension nodes: the EM6 server class and the &EM_ACTION
macro variable. With these features, a node's code can be separated
into the following actions that identify the type of code that is
running:
- Create: executes only when the node is first placed in a process flow diagram.
- Train: executes the first time the node is run. Subsequently, it executes when one of the following occurs:
  - A user runs the node and an input data set has changed.
  - A user runs the node and the variables table has changed.
  - A user runs the node and one of the node's Train properties has been changed.
- Score: executes the first time the node is run. Subsequently, it executes when one of the following occurs:
  - A user runs the node and an input data set has changed.
  - A user runs the node and one of the node's Score properties has been changed.
  - The Train action has executed.
- Report: executes the first time the node is run. Subsequently, it executes when one of the following occurs:
  - A user runs the node and one of the node's Report properties has been changed.
  - The Train or Score action has executed.
To take advantage of
this feature, write your code as separate SAS macros. SAS Enterprise
Miner executes the macros sequentially, each triggered by an internally
generated &EM_ACTION macro variable. That is, the &EM_ACTION
macro variable initially resolves to a value of CREATE. When all code
associated with that action has completed, the &EM_ACTION macro
variable is updated to a value of TRAIN. When all code associated
with the TRAIN action has executed, the &EM_ACTION macro variable
is updated to a value of SCORE. After all code associated with the
SCORE action has executed, the &EM_ACTION macro variable is updated
to a value of REPORT; all code associated with the REPORT action is
then executed.
Each property that you
define in the node's XML properties file can be assigned an action
value. When a node is placed in a process flow diagram and the process
flow diagram is run for the first time, all of the node's code executes and
all executed actions are recorded. When the process flow diagram is
run subsequently, the code does not have to execute again unless a
property setting, the variables table, or data imported from a predecessor
node has changed. If a user has changed a property setting, SAS Enterprise
Miner can determine which action is associated with that property
and begin the new execution sequence with that action.
For example, suppose that a user changes a REPORT property setting.
Only the REPORT code executes; the TRAIN and SCORE code does not have to execute again. This can
save significant computing time, particularly when you have large
data sets, complex algorithms, or many nodes in a process flow diagram.
You are not required
to take advantage of actions, and your code is not required to conform
to any particular structure. However, to take full advantage of the
actions mechanism, write your SAS code so that it conforms to the
following structure:
%macro main;
   %if %upcase(&EM_ACTION) = CREATE %then %do;
      /* add CREATE code */
   %end;
   %else %if %upcase(&EM_ACTION) = TRAIN %then %do;
      /* add TRAIN code */
   %end;
   %else %if %upcase(&EM_ACTION) = SCORE %then %do;
      /* add SCORE code */
   %end;
   %else %if %upcase(&EM_ACTION) = REPORT %then %do;
      /* add REPORT code */
   %end;
%mend main;
%main;
Typically, the code
associated with the CREATE, TRAIN, SCORE, and REPORT actions consists
of four separate macros — %Create, %Train, %Score, and %Report.
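As a minimal sketch, the four macros can start out as skeletons like the ones below. The macro names follow the convention just described; the bodies are placeholders, and the %PUT statements only write a note to the SAS log so that you can see which action ran. The comments paraphrase the execution rules listed earlier in this section.

```sas
/* Hypothetical skeletons for the four action macros.
   Replace each %put with the node's real code.           */
%macro create;
   /* Runs only when the node is first placed in a diagram */
   %put NOTE: CREATE action running.;
%mend create;

%macro train;
   /* First run, then when input data, the variables table,
      or a Train property changes                           */
   %put NOTE: TRAIN action running.;
%mend train;

%macro score;
   /* First run, then when input data or a Score property
      changes, or after TRAIN has executed                   */
   %put NOTE: SCORE action running.;
%mend score;

%macro report;
   /* First run, then when a Report property changes,
      or after TRAIN or SCORE has executed                   */
   %put NOTE: REPORT action running.;
%mend report;
```

Each macro is typically kept in its own file so that the entry point includes only the one that the current action needs.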
Not every node has
code associated with all four actions. This poses no problem. SAS
Enterprise Miner recognizes only the entry point that you declare
in the node's XML properties file. It initializes the &EM_ACTION
macro variable and submits the main program. If the main program does
not include any code that is triggered by a particular action, the
&EM_ACTION macro variable is updated to the next action in the
sequence. Therefore, if you do not separate your code by actions,
all code is treated like TRAIN code; the entire main program must
execute completely every time the node is run.
A common practice used
for SAS Enterprise Miner nodes is to place the %Main macro in a
separate file named name.source, where name is the name of the node and
typically corresponds to the value of the name attribute of the Components
element in the XML properties file. name.source serves as the entry
point for the extension node's SAS program. It is also common practice
to place the source code for the %Create, %Train, %Score, and %Report
macros in separate files with names like name_create.source, name_train.source,
name_score.source, and name_report.source. There might also be additional
files containing other macros or actions with names like name_macros.source
and name_actions.source (these types of actions are discussed in
Appendix 2: Controls That Require Server Code). To implement this strategy, use
FILENAME and %INCLUDE statements in the %Main macro to access the
other files. For example, assume that your extension node's SAS program
is stored in the Sashelp library in a SAS catalog named Sashelp.Emext
and that the catalog contains these five files:
- example.source
- example_create.source
- example_train.source
- example_score.source
- example_report.source
Example.source would
contain the %Main macro, and it would appear as follows:
/* example.source */
%macro main;
%if %upcase(&EM_ACTION) = CREATE %then %do;
filename temp catalog 'sashelp.emext.example_create.source';
%include temp;
filename temp;
%create;
%end;
%else
%if %upcase(&EM_ACTION) = TRAIN %then %do;
filename temp catalog 'sashelp.emext.example_train.source';
%include temp;
filename temp;
%train;
%end;
%else
%if %upcase(&EM_ACTION) = SCORE %then %do;
filename temp catalog 'sashelp.emext.example_score.source';
%include temp;
filename temp;
%score;
%end;
%else
%if %upcase(&EM_ACTION) = REPORT %then %do;
filename temp catalog 'sashelp.emext.example_report.source';
%include temp;
filename temp;
%report;
%end;
%mend main;
%main;
The other four files
would contain their respective macros. There is more to this strategy
than simple organizational efficiency; it can actually enhance performance.
To illustrate, consider the following scenario. When a node is first
placed in a process flow diagram, the entire main program is read
and processed. Suppose your TRAIN code contains a thousand lines of
code. If the code is contained in the main program, all thousand lines
of TRAIN code must be read and processed. However, if the TRAIN code
is in a separate file, that code is not processed until the first
time the node is run.
A similar situation
can occur at run time, when the entire main program is processed again.
Suppose the node has already been run once and the user has changed
a Report property. The actions mechanism prevents the TRAIN code from
executing again, but if the TRAIN code is part of the main program, it is
still read and processed. If your TRAIN code is stored in a separate
file, it does not have to be read and processed at all. This
is the recommended strategy.
To store your code in
external files rather than in a SAS catalog, alter the FILENAME
statements accordingly. However, you must store the entry point file
(for example, example.source) in a catalog and place it in a SAS library
that is accessible to SAS Enterprise Miner. The simplest way to do this
is to include your catalog in the Sashelp library by placing the catalog
in the SASCFG folder. The exact location of this folder depends on
your operating system and your installation configuration, but it
is always found under the root SAS directory and has a path resembling
...\SAS\SASFoundation\9.2\nls\en\SASCFG.
For example, on a typical Windows installation, the path is
C:\Program Files\SAS\SASFoundation\9.2\nls\en\SASCFG.
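Moving the action files out of the catalog changes only the FILENAME statements in the entry point. The sketch below assumes a hypothetical folder, C:\EMExtensions\example, holding external files named example_create.sas, example_train.sas, and so on; example.source itself stays in a catalog as required. Double quotes are used so that the macro variable in the path resolves, which the single-quoted catalog references in the original example did not need.

```sas
/* example.source: entry point, still stored as a catalog entry.
   Assumption: the action files are external files in the
   hypothetical folder C:\EMExtensions\example.                   */
%macro main;
   %local codedir;
   %let codedir = C:\EMExtensions\example;   /* hypothetical path */

   %if %upcase(&EM_ACTION) = CREATE %then %do;
      /* Physical file path instead of a CATALOG reference */
      filename temp "&codedir\example_create.sas";
      %include temp;
      filename temp;
      %create;
   %end;
   %else %if %upcase(&EM_ACTION) = TRAIN %then %do;
      filename temp "&codedir\example_train.sas";
      %include temp;
      filename temp;
      %train;
   %end;
   %else %if %upcase(&EM_ACTION) = SCORE %then %do;
      filename temp "&codedir\example_score.sas";
      %include temp;
      filename temp;
      %score;
   %end;
   %else %if %upcase(&EM_ACTION) = REPORT %then %do;
      filename temp "&codedir\example_report.sas";
      %include temp;
      filename temp;
      %report;
   %end;
%mend main;
%main;
```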
You can also store the
catalog in another folder and then modify the SAS system configuration
file Sasv9.cfg so that this folder is included in the Sashelp search
path. The Sasv9.cfg file is located under the root SAS directory in
...\SAS\SASFoundation\9.2\nls\en.
Putting your code in the Sashelp library enables anyone using that
server to access it.
An alternative is to
place your code in a separate folder and issue a LIBNAME statement.
The library needs to be accessible when a project is opened because
a node's main program is read and processed when the node is first
placed in a process flow diagram (only the CREATE action is executed).
If a LIBNAME statement has not been issued when a project opens and
you drop a node in a process flow diagram, the node's main program
will not be accessible to SAS Enterprise Miner. See
Appendix 4: Allocating Libraries for SAS Enterprise
Miner for details.
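A minimal sketch of the LIBNAME alternative follows, assuming a hypothetical folder C:\EMExtensions that contains the catalog file emext.sas7bcat and a hypothetical libref MYEXT. It shows only the statements themselves; making sure the library is allocated before a project opens is the subject of the appendix cited above. The CEXIST function returns 1 if the named catalog entry exists, which gives a quick check that the entry point is visible through the new libref.

```sas
/* Hypothetical folder containing the catalog file emext.sas7bcat */
libname myext 'C:\EMExtensions';

/* Writes 1 to the log if the entry point exists, 0 otherwise */
%put NOTE: example.source found: %sysfunc(cexist(myext.emext.example.source));

/* Catalog references in %main then use the new libref instead of Sashelp */
filename temp catalog 'myext.emext.example_train.source';
%include temp;
filename temp;
```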