When thinking about how to take advantage of the actions mechanism,
you might find it useful to think of a node's code as analogous to a
process flow in which your Train, Score, and Report code are separate
nodes that always occupy fixed relative positions.

If you do not take advantage of actions, all of your code is Train
code, so that is your default. The question then becomes: what
functionality can you remove from your Train code and put in Score or
Report code in order to best take advantage of actions? A node's Train
action is typically the most time-consuming. Therefore, your objective
is to separate your code so that user actions do not cause the Train
action to be executed unnecessarily. Keep in mind that the actions
mechanism has an impact only if at least one of the following is true:
- a user runs the node and an input data set has changed
- a user runs the node and the variables table has changed
- a user runs the node and one of the node's properties has been
  changed. This can include changing the data in a registered file
  that has its Property attribute set to Y and its Action attribute
  set to either TRAIN, SCORE, or REPORT.
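
As a rough illustration of the conditions above, the following Python
sketch models which actions would rerun after a change. It is a
conceptual model, not SAS Enterprise Miner code: the names ACTIONS and
actions_to_run are hypothetical, and it assumes that (a) a changed
input data set or variables table invalidates the Train action and
(b) rerunning an action also reruns the actions that follow it, as in
a process flow.

```python
# Conceptual model only; this is not SAS Enterprise Miner code.
# Actions sit at fixed relative positions, like nodes in a process flow.
ACTIONS = ["TRAIN", "SCORE", "REPORT"]


def actions_to_run(input_data_changed=False,
                   variables_changed=False,
                   changed_property_actions=()):
    """Return the actions that would rerun, in order.

    changed_property_actions holds the Action attribute (TRAIN, SCORE,
    or REPORT) of each property that the user changed.
    """
    starts = []
    # Assumption: changed inputs or a changed variables table
    # invalidate the results of the Train action.
    if input_data_changed or variables_changed:
        starts.append(0)
    for action in changed_property_actions:
        starts.append(ACTIONS.index(action.upper()))
    if not starts:
        # Nothing changed, so the actions mechanism has no effect.
        return []
    # Assumption: rerun from the earliest affected action onward.
    return ACTIONS[min(starts):]
```

Under these assumptions, a change to a property whose Action attribute
is REPORT would rerun only the Report action, which is the kind of
saving the actions mechanism is meant to provide.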
An extension node's program typically performs the following:
- input processing
- output processing
- reporting
Input processing refers to processes like scanning the training data
to fit statistical models, performing data transformations, generating
descriptive statistics, and so on. This is typically the main function
of a node. Input processing is almost always performed in the node's
Train action.

Output processing refers to processes that prepare the data that is
passed to subsequent nodes in a process flow. Typically, this involves
scoring data or modifying metadata. When possible, include output
processing in the Score action. However, some output processes induce
feedback into an input process, and such output processes must
therefore be performed in the Train action. For example, suppose your
node generates a decision tree (input process) and then allows the
user to modify the metadata (output process) by manually rejecting
input variables. In most situations like this, you would want to
regenerate the tree (feedback).

Finally, the input process often generates information that you want
to report to the user. This information is typically reported in the
form of tables or graphs. This reporting process rarely induces
feedback into either the input or output processes and is typically
performed in the node's Report action.
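
The placement guidance above can be summarized as a small decision
rule. The Python sketch below is only a restatement of that guidance
for illustration; the function assign_action and its parameters are
hypothetical and are not part of any SAS interface.

```python
# Conceptual restatement of the placement guidance; not SAS code.
def assign_action(process_kind, induces_feedback=False):
    """Suggest the action (TRAIN, SCORE, or REPORT) for a process.

    process_kind is "input", "output", or "report". induces_feedback
    marks an output process whose result should trigger the input
    process again (for example, rejecting variables used by a tree).
    """
    kind = process_kind.lower()
    if kind == "input":
        # Input processing is almost always done in the Train action.
        return "TRAIN"
    if kind == "output":
        # Output processing goes in Score unless it feeds back
        # into an input process.
        return "TRAIN" if induces_feedback else "SCORE"
    if kind == "report":
        # Reporting rarely induces feedback, so it goes in Report.
        return "REPORT"
    raise ValueError("unknown process kind: " + process_kind)
```

For the decision tree example, assign_action("output",
induces_feedback=True) suggests TRAIN, because rejecting input
variables should cause the tree to be regenerated.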