Partition Input Data

The Data Partition node enables you to partition your input data into the following data sets:
  • Train — used for preliminary model fitting. The analyst attempts to find the best model weights by using this data set.
  • Validation — used to assess the adequacy of the model in the Model Comparison node. The validation data set is also used for model fine-tuning in the Decision Tree model node to create the best subtree.
  • Test — used to obtain a final, unbiased estimate of the generalization error of the model.
For more information about the Data Partition node, see the SAS Enterprise Miner Help.
Perform the following steps to add a Data Partition node to the analysis:
  1. Select the Sample tab on the node toolbar and drag a Data Partition node into the diagram workspace.
  2. Connect the VAEREXT_SERIOUS input data node to the Data Partition node.
    Note: To connect one node to another node in the default horizontal view, position the mouse pointer at the right edge of a node. A pencil icon appears. Hold the left mouse button down, drag the line to the left edge of the node that you want to connect to, and then release the left mouse button. To change your view of connected nodes to a vertical view, right-click in the diagram workspace, select Layout, and then select Vertically in the menu that appears.
    Process flow diagram
  3. Select the Data Partition node to view its properties.
    Details about the node appear in the Properties Panel.
  4. Set the Data Set Allocations properties as follows:
    • Set the Training property to 60.0.
    • Set the Validation property to 20.0.
    • Set the Test property to 20.0.
    These settings allocate 60% of the VAEREXT_SERIOUS observations to training, 20% to validation, and 20% to testing, so that each partition has adequate data when you build prediction models.
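
For readers who want to see what a 60/20/20 allocation amounts to outside the Enterprise Miner interface, the following is a minimal Python sketch of a simple random partition. It is an illustration only and is not how the Data Partition node is implemented. The function name partition, its parameters, the seed value, and the stand-in records list are all hypothetical.

```python
import random


def partition(rows, train=60.0, validation=20.0, test=20.0, seed=12345):
    """Randomly split rows into train, validation, and test lists.

    Allocations are percentages and must sum to 100. This is a plain
    random split; the names and the seed are illustrative assumptions.
    """
    if abs(train + validation + test - 100.0) > 1e-9:
        raise ValueError("allocations must sum to 100")

    # Shuffle a copy so that the caller's data is left untouched.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    n_train = round(n * train / 100.0)
    n_valid = round(n * validation / 100.0)

    # The test partition takes whatever remains after train and validation.
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


# Stand-in for the input observations (hypothetical data, 1000 rows).
records = list(range(1000))
train_set, valid_set, test_set = partition(records)
print(len(train_set), len(valid_set), len(test_set))  # 600 200 200
```

Fixing the seed makes the split reproducible, so that models fitted at different times are trained and assessed on the same rows.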