XML Property Files

Basic Structure

An extension node's XML properties file provides a facility for managing information about the node. The XML file for an extension node is stored under the SAS configuration directory: ...\SAS\Config\Levn\analyticsPlatform\apps\EnterpriseMiner\ext.
The basic structure and minimal features of an XML properties file are as follows:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Component PUBLIC
	"-//SAS//EnterpriseMiner DTD Components 1.3//EN"
	"Components.dtd">

<Component
	type="AF"
	resource="com.sas.analytics.eminer.visuals.PropertyBundle"
	serverclass="EM6"
	name=" "
	displayName=" "
	description=" "
	group=" "
	icon=" "
	prefix=" " >

<PropertyDescriptors>
</PropertyDescriptors>

<Views>
</Views>

</Component>
The preceding XML code can be copied verbatim and used as a template for an extension node's XML properties file. XML is case-sensitive, so it is important that the element tags are written as specified in the example. The values for all of the elements' attributes must be quoted strings.

Server Code

The specific function of each node is performed by a SAS program that is associated with the node. Thus, when a node is placed in a process flow diagram, it is a graphical representation of a SAS program. An extension node's SAS program consists of one or more SAS source code files residing on the SAS Enterprise Miner server. The source code can be stored in a SAS library or in external files. Any valid SAS statement can be used in an extension node's SAS program, with one exception: you cannot issue statements that generate a SAS windowing environment, because the SAS windowing environment from Base SAS is not compatible with SAS Enterprise Miner. For example, you cannot execute SAS/LAB software from within an extension node.
As you begin to design your node's SAS program, ask yourself these five questions:
  • What needs to occur when the extension node's icon is initially placed in a process flow diagram?
  • What is the node going to accomplish at run time?
  • Will the node generate Publish or Flow code?
  • What types of reports should be displayed in the node's Results window?
  • What program options or arguments should the user be able to modify; what should the default values be; and should the choices, or range of values, be restricted?
SAS Enterprise Miner 5.3 introduced two new features that can significantly enhance the performance of extension nodes: the EM6 server class and the &EM_ACTION macro variable. With these features, a node's code can be separated into the following actions that identify the type of code that is running:
  • Create — executes only when the node is first placed on a process flow diagram.
  • Train — executes the first time the node is run. Subsequently, it executes when one of the following occurs:
    • A user runs the node and an input data set has changed.
    • A user runs the node and the variables table has changed.
    • A user runs the node and one of the node's Train properties has been changed.
  • Score — executes the first time the node is run. Subsequently, it executes when one of the following occurs:
    • A user runs the node and an input data set has changed.
    • A user runs the node and one of the node's Score properties has been changed.
    • The Train action has executed.
  • Report — executes the first time the node is run. Subsequently, it executes when one of the following occurs:
    • A user runs the node and one of the node's Report properties has been changed.
    • The Train or Score action has executed.
To take advantage of this feature, write your code as separate SAS macros. SAS Enterprise Miner executes the macros sequentially, each triggered by an internally generated &EM_ACTION macro variable. That is, the &EM_ACTION macro variable initially resolves to a value of CREATE. When all code associated with that action has completed, the &EM_ACTION macro variable is updated to a value of TRAIN. When all code associated with the TRAIN action has executed, the &EM_ACTION macro variable is updated to a value of SCORE. After all code associated with the SCORE action has executed, the &EM_ACTION macro variable is updated to a value of REPORT; all code associated with the REPORT action is then executed.
Each Property that you define in the node's XML properties file can be assigned an action value. When a node is placed in a process flow diagram and the process flow diagram is run initially, all of the node's code executes and all executed actions are recorded. When the process flow diagram is run subsequently, the code doesn't have to execute again unless a property setting, the variables table, or data imported from a predecessor node has changed. If a user has changed a property setting, SAS Enterprise Miner can determine what action is associated with that property. Thus, it can begin the new execution sequence with that action value. For example, suppose that a user changes a REPORT property setting. The TRAIN and SCORE code does not have to execute again. This can save significant computing time, particularly when you have large data sets, complex algorithms, or many nodes in a process flow diagram.
You are not required to take advantage of actions, and your code is not required to conform to any particular structure. However, to take full advantage of the actions mechanism, write your SAS code so that it conforms to the following structure:
%macro main;
	%if %upcase(&EM_ACTION) = CREATE %then %do;
		/* add CREATE code */
	%end;
	%else
	%if %upcase(&EM_ACTION) = TRAIN %then %do;
		/* add TRAIN code */
	%end;
	%else
	%if %upcase(&EM_ACTION) = SCORE %then %do;
		/* add SCORE code */
	%end;
	%else
	%if %upcase(&EM_ACTION) = REPORT %then %do;
		/* add REPORT code */
	%end;
%mend main;
%main;
Typically, the code associated with the CREATE, TRAIN, SCORE, and REPORT actions consists of four separate macros — %Create, %Train, %Score, and %Report.
Not every node has code associated with all four actions. This poses no problem. SAS Enterprise Miner recognizes only the entry point that you declare in the node's XML properties file. It initializes the &EM_ACTION macro variable and submits the main program. If the main program does not include any code that is triggered by a particular action, the &EM_ACTION macro variable is updated to the next action in the sequence. It follows that if you do not separate your code by actions, all code is treated like TRAIN code; the entire main program must execute completely every time the node is run.
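As a minimal sketch of a node that supplies code for only some actions, the main program below handles TRAIN and REPORT and nothing else. The %LET statement is an assumption added only so that the sketch can be submitted outside SAS Enterprise Miner; inside SAS Enterprise Miner the &EM_ACTION macro variable is set for you and that line should be omitted. The %PUT messages are placeholders for real code.

```sas
/* Stand-in for standalone testing only: SAS Enterprise Miner */
/* normally initializes EM_ACTION before submitting %main.    */
%let EM_ACTION = SCORE;

%macro main;
	%if %upcase(&EM_ACTION) = TRAIN %then %do;
		/* Placeholder for TRAIN code */
		%put NOTE: running TRAIN code;
	%end;
	%else
	%if %upcase(&EM_ACTION) = REPORT %then %do;
		/* Placeholder for REPORT code */
		%put NOTE: running REPORT code;
	%end;
	/* No CREATE or SCORE branch: for those actions no condition */
	/* matches, so the macro simply does nothing.                */
%mend main;
%main;
```

With the stand-in value of SCORE, neither branch matches and the macro generates no statements, which mirrors the case in which a node has no code for a particular action.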
A common practice for SAS Enterprise Miner nodes is to place the %Main macro in a separate file named name.source, where name is the name of the node and typically corresponds to the value of the name attribute of the Component element in the XML properties file. name.source serves as the entry point for the extension node's SAS program. It is also common practice to place the source code for the %Create, %Train, %Score, and %Report macros in separate files with names like name_create.source, name_train.source, name_score.source, and name_report.source. There might also be additional files containing other macros or actions with names like name_macros.source and name_actions.source (these types of actions are discussed in Appendix 2: Controls That Require Server Code). To implement this strategy, use FILENAME and %INCLUDE statements in the %Main macro to access the other files. For example, assume that your extension node's SAS program is stored in the Sashelp library in a SAS catalog named Sashelp.Emext and that the catalog contains these five files:
  • example.source
  • example_create.source
  • example_train.source
  • example_score.source
  • example_report.source
Example.source would contain the %Main macro, and it would appear as follows:
/* example.source */
%macro main;
	%if %upcase(&EM_ACTION) = CREATE %then %do;
		filename temp catalog 'sashelp.emext.example_create.source';
		%include temp;
		filename temp;
		%create;
	%end;
	%else
	%if %upcase(&EM_ACTION) = TRAIN %then %do;
		filename temp catalog 'sashelp.emext.example_train.source';
		%include temp;
		filename temp;
		%train;
	%end;
	%else
	%if %upcase(&EM_ACTION) = SCORE %then %do;
		filename temp catalog 'sashelp.emext.example_score.source';
		%include temp;
		filename temp;
		%score;
	%end;
	%else
	%if %upcase(&EM_ACTION) = REPORT %then %do;
		filename temp catalog 'sashelp.emext.example_report.source';
		%include temp;
		filename temp;
		%report;
	%end;
%mend main;
%main;
The other four files would contain their respective macros. There is more to this strategy than simple organizational efficiency; it can actually enhance performance. To illustrate, consider the following scenario. When a node is first placed in a process flow diagram, the entire main program is read and processed. Suppose your TRAIN code contains a thousand lines of code. If the code is contained in the main program, all thousand lines of TRAIN code must be read and processed. However, if the TRAIN code is in a separate file, that code is not processed until the first time the node is run.
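To make the layout of the companion files concrete, here is a rough sketch of what example_train.source might contain. The body of the macro is a placeholder, not real training logic; the only point is that the file defines %Train and leaves the invocation to %Main.

```sas
/* example_train.source (illustrative sketch) */
%macro train;
	/* Placeholder: the node's real TRAIN logic would go here. */
	%put NOTE: TRAIN action running for this node;
%mend train;
```

Because the file only defines the macro, nothing in it is compiled until %Main includes it, which is what defers the cost of a long TRAIN program until the TRAIN action is actually requested.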
A similar situation can occur at run time. At run time, the entire main program is processed. Suppose the node has already been run once and the user has changed a Report property. The actions mechanism prevents the TRAIN code from executing again, but if the TRAIN code is part of the main program, it must still be read and processed. If your TRAIN code is stored in a separate file, it does not have to be read and processed at all. This is the recommended strategy.
To store your code in external files rather than in a SAS catalog, simply alter the FILENAME statements accordingly. However, you must store the entry point file (for example, example.source) in a catalog and place it in a SAS library that is accessible by Enterprise Miner. The simplest way to do this is to include your catalog in the Sashelp library by placing the catalog in the SASCFG folder. The exact location of this folder depends on your operating system and your installation configuration, but it is always found under the root SAS directory and has a path resembling ...\SAS\SASFoundation\9.2\nls\en\SASCFG. For example, on a typical Windows installation, the path is C:\Program Files\SAS\SASFoundation\9.2\nls\en\SASCFG.
You can also store the catalog in another folder and then modify the SAS system configuration file Sasv9.cfg so that this folder is included in the Sashelp search path. The Sasv9.cfg file is located under the root SAS directory in ...\SAS\SASFoundation\9.2\nls\en. Putting your code in the Sashelp library enables anyone using that server to access it.
An alternative is to place your code in a separate folder and issue a LIBNAME statement. The library needs to be accessible when a project is opened because a node's main program is read and processed when the node is first placed in a process flow diagram (only the CREATE action is executed). If a LIBNAME statement has not been issued when a project opens and you drop a node in a process flow diagram, the node's main program will not be accessible by Enterprise Miner. See Appendix 4: Allocating Libraries for SAS Enterprise Miner for details.
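The sketch below illustrates how the FILENAME statements might change for the two storage alternatives described above. The libref mylib, the catalog name emext, and both Windows paths are hypothetical placeholders, and the LIBNAME statement is assumed to have been issued when the project opens, as described in Appendix 4. It is an illustration under those assumptions, not a prescribed layout.

```sas
/* Hypothetical libref and folder; this allocation must already   */
/* be in effect when the project opens (see Appendix 4).          */
libname mylib 'C:\emext\catalogs';

%macro main;
	%if %upcase(&EM_ACTION) = TRAIN %then %do;
		/* Variant 1: source entry in a catalog in a user library */
		filename temp catalog 'mylib.emext.example_train.source';
		%include temp;
		filename temp;
		%train;
	%end;
	%else
	%if %upcase(&EM_ACTION) = REPORT %then %do;
		/* Variant 2: source stored in an external file           */
		/* (hypothetical path)                                    */
		filename temp 'C:\emext\source\example_report.source';
		%include temp;
		filename temp;
		%report;
	%end;
%mend main;
%main;
```

The sketch runs as intended only inside SAS Enterprise Miner, where &EM_ACTION is set, and only if the referenced catalog entry and file exist. Only the companion files are shown moving to external storage here, because the entry point file itself must remain in a catalog.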