Train, Score, and Report Actions

When thinking about how to take advantage of the actions mechanism, you might find it useful to think of a node's code as being analogous to a process flow, where your Train, Score, and Report code are separate nodes that are always at fixed relative positions.
Code Flow graphic
If you don't take advantage of actions, all of your code would be Train code, so that is your default. The question then becomes: what functionality can you remove from your Train code and put in Score or Report code in order to best take advantage of actions? A node's Train action is typically the most time consuming. Therefore, your objective is to separate your code so that user actions do not cause the Train action to be executed unnecessarily. Keep in mind that the actions mechanism has an impact only if at least one of the following is true:
  • a user runs the node and an input data set has changed
  • a user runs the node and the variables table has changed
  • a user runs the node and one of the node's properties has been changed. This can include changing the data in a registered file that has its Property attribute set to Y and its Action attribute set to either TRAIN, SCORE, or REPORT.
An extension node's program typically performs the following:
  • input processing
  • output processing
  • report processing
Input processing refers to processes like scanning the training data to fit statistical models, performing data transformations, generating descriptive statistics, and so on. This is typically the main function of a node. Input processing is almost always performed in the node's Train action. Output processing refers to processes that prepare the data that is passed to subsequent nodes in a process flow. Typically, this involves data scoring or modifying metadata. When possible, you include output processing in the Score action. However, some output processes induce feedback into an input process. Such output processes would, therefore, be performed in the Train action. For example, suppose your node generates a decision tree (input process). You then allow the user to modify the metadata (output process); in this case, suppose the user is allowed to manually reject input variables. In most situations like this, you would want to regenerate the tree (feedback). Finally, the input process often generates information that you want to report to the user. This information is typically reported in the form of tables or graphs. This reporting process rarely induces feedback into either the input or output processes and is typically performed in the node's Report action.