When deciding how to take advantage of the actions mechanism, it can
help to picture a node's code as a small process flow in which your
Train, Score, and Report code are separate nodes that always keep the
same relative positions.

If you do not take advantage of actions, all of your code is Train code,
so that is your default. The question then becomes: what functionality
can you move out of your Train code and into Score or Report code in
order to best take advantage of actions? A node's Train action is
typically the most time-consuming. Your objective, therefore, is to
separate your code so that user actions do not cause the Train action
to be executed unnecessarily. Keep in mind that the actions mechanism
has an impact only if at least one of the following is true:
- a user runs the node and an input data set has changed
- a user runs the node and the variables table has changed
- a user runs the node and one of the node's properties has been
  changed. This can include changing the data in a registered file
  that has its Property attribute set to Y and its Action attribute
  set to TRAIN, SCORE, or REPORT.
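
The conditions above can be sketched as a small decision function. This is a minimal illustration in Python, not the product's actual logic: the function name `actions_to_run` and its parameters are hypothetical, and two assumptions go beyond the text. First, a changed input data set or variables table is assumed to affect the Train action. Second, rerunning an action is assumed to rerun every action that follows it in the fixed Train, Score, Report order.

```python
# Actions in their fixed relative order (Train, then Score, then Report).
ACTION_ORDER = ("TRAIN", "SCORE", "REPORT")


def actions_to_run(input_data_changed=False,
                   variables_table_changed=False,
                   changed_property_actions=()):
    """Hypothetical model: list the actions that a run would execute.

    changed_property_actions holds the Action attribute value of each
    property that the user changed since the last run.
    """
    triggered = set()
    # Assumption: a changed input data set or variables table affects Train.
    if input_data_changed or variables_table_changed:
        triggered.add("TRAIN")
    for action in changed_property_actions:
        name = action.upper()
        if name not in ACTION_ORDER:
            raise ValueError(f"unknown action: {action!r}")
        triggered.add(name)
    if not triggered:
        return []  # nothing changed, so nothing needs to rerun
    # Assumption: rerunning an action also reruns every later action.
    first = min(ACTION_ORDER.index(name) for name in triggered)
    return list(ACTION_ORDER[first:])


print(actions_to_run(changed_property_actions=["REPORT"]))
print(actions_to_run(changed_property_actions=["SCORE"]))
print(actions_to_run(input_data_changed=True))
```

Under these assumptions, a change to a REPORT property reruns only Report, a change to a SCORE property reruns Score and Report, and a change to the input data reruns all three. This is why moving work out of Train pays off: the later an action sits in the order, the less work a property change tied to it triggers.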
An extension node's program typically performs three kinds of
processing: input processing, output processing, and reporting.
Input processing refers to processes like scanning the training data
to fit statistical models, performing data transformations, generating
descriptive statistics, and so on. This is typically the main function
of a node. Input processing is almost always performed in the node's
Train action.

Output processing refers to processes that prepare the data that is
passed to subsequent nodes in a process flow. Typically this involves
scoring data or modifying metadata. When possible, include output
processing in the Score action. However, some output processes induce
feedback into an input process, and such output processes should
therefore be performed in the Train action. For example, suppose your
node generates a decision tree (input process) and then allows the
user to modify the metadata (output process) by manually rejecting
input variables. In most situations like this, you would want to
regenerate the tree (feedback), so the metadata modification belongs
in the Train action.

Finally, the input process often generates information that you want
to report to the user, typically in the form of tables or graphs. This
reporting process rarely induces feedback into either the input or
output processes and is typically performed in the node's Report action.
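
The placement rules described above can be condensed into a short sketch. The Python function below is a hypothetical illustration only: the name `assign_action`, the `kind` labels, and the `feeds_back` flag are invented for this example and are not part of any product API. It covers only the typical cases stated in the text.

```python
def assign_action(kind, feeds_back=False):
    """Suggest the action in which a piece of node code would typically go.

    kind       -- "input", "output", or "report" (hypothetical labels)
    feeds_back -- True if an output process induces feedback into an
                  input process; only consulted for output processes
    """
    if kind == "input":
        # Model fitting, transformations, descriptive statistics.
        return "TRAIN"
    if kind == "output":
        # Scoring or metadata changes go in Score unless they force the
        # input process to be redone, in which case they go in Train.
        return "TRAIN" if feeds_back else "SCORE"
    if kind == "report":
        # Tables and graphs shown to the user.
        return "REPORT"
    raise ValueError(f"unknown kind of processing: {kind!r}")


# The decision tree example: rejecting input variables is an output
# process that requires regenerating the tree, so it lands in Train.
print(assign_action("output", feeds_back=True))
print(assign_action("output"))
```

The first call prints TRAIN and the second prints SCORE, mirroring the decision tree example: the same kind of process moves to a different action depending on whether it induces feedback.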