The HPSPLIT Procedure

PROC HPSPLIT Features

The main features of the HPSPLIT procedure are as follows:

  • Model creation

    • supports interval and nominal inputs

    • supports nominal targets

    • provides the entropy, Gini, and FastCHAID methods for tree growth

    • provides multiple statistical metrics for tree pruning

    • provides C4.5-style pruning

    • partitions the input data set into training and validation sets

  • Score output data set

    • saves scored results for the training data

    • provides predicted levels and posterior probabilities

  • Score code file

    • saves SAS DATA step code, which can be used for scoring new data with the tree model

  • Rules file

    • saves English rules that describe the leaves of the tree

  • Node output data set

    • saves statistics and descriptive information for the nodes in the tree

  • Variable importance output data set

    • saves the importance of the input variables in creating the pruned decision tree

    • provides variable importance for the validation set

  • Subtree monitoring output data sets

    • save statistical metrics for each subtree that is created during growth

    • save statistical metrics for each subtree that is created during pruning

Because the HPSPLIT procedure is a high-performance analytical procedure, it also does the following:

  • enables you to run in distributed mode on a cluster of machines that distribute the data and the computations

  • enables you to run in single-machine mode on the server where SAS is installed

  • exploits all of the available cores and concurrent threads, regardless of execution mode

For more information, see the section "Processing Modes" in Chapter 2, "Shared Concepts and Topics."
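
To make the entropy and Gini growth methods listed under Model creation more concrete, the following Python sketch computes the textbook forms of both impurity measures for a nominal target and the reduction in impurity that a candidate split achieves. It is an illustration only and is not taken from the HPSPLIT procedure: the function names (entropy, gini, impurity_reduction) and the sample data are hypothetical, and the procedure's exact computations (for example, any normalization or handling of missing values) might differ.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of nominal target levels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of nominal target levels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_reduction(parent, children, impurity):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical node with a binary nominal target, split into two branches.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

entropy_gain = impurity_reduction(parent, [left, right], entropy)
gini_gain = impurity_reduction(parent, [left, right], gini)
print(f"entropy reduction: {entropy_gain:.4f}")
print(f"Gini reduction:    {gini_gain:.4f}")
```

In this sketch, a growth method that is based on entropy or Gini would prefer the candidate split that yields the largest reduction. FastCHAID is not shown because it relies on a different, test-based approach.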