The HPSPLIT Procedure

PROC HPSPLIT Features

The main features of the HPSPLIT procedure are as follows:

  • Model creation

    • supports interval and nominal inputs

    • supports nominal targets (decision trees)

    • supports interval targets (regression trees)

    • provides the entropy, Gini, FastCHAID, CHAID, information gain ratio (IGR), and chi-square methods for decision tree growth (for nominal targets)

    • provides the variance, CHAID, and F test methods for regression tree growth (for interval targets)

    • provides multiple statistical metrics for decision tree pruning

    • provides C4.5-style decision tree pruning

    • provides ASE-based regression tree pruning

    • provides cost-complexity pruning

    • provides minimum metric subtree selection

    • partitions the input data set into training and validation sets

    • provides surrogate rules for missing values, in addition to assigning them by popularity, by similarity, or to a dedicated branch

  • Score output data set

    • saves scored results for the training data

    • provides predicted levels and posterior probabilities

  • Score code file

    • saves SAS DATA step code, which can be used for scoring new data with the tree model

  • Rules file

    • saves node rules that describe the leaves of the tree

  • Node output data set

    • saves statistics and descriptive information for the nodes in the tree

  • Variable importance output data set

    • saves the importance of the input variables in creating the pruned decision tree

    • provides variable importance for the validation set

  • Subtree monitoring output data sets

    • save statistical metrics for each subtree that is created during growth

    • save statistical metrics for each subtree that is created during pruning

Because the HPSPLIT procedure is a high-performance analytical procedure, it also does the following:

  • enables you to run in distributed mode on a cluster of machines that distribute the data and the computations

  • enables you to run in single-machine mode on the server where SAS is installed

  • exploits all of the available cores and concurrent threads, regardless of execution mode

For more information, see the section Processing Modes in Chapter 3: Shared Concepts and Topics.
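
The feature list names entropy and Gini among the growth methods for nominal targets and variance among those for interval targets. As a rough illustration of what such criteria measure, the following Python sketch computes the textbook impurity reduction for one candidate split. It is not PROC HPSPLIT code and does not claim to reproduce the procedure's internal computations; the function names and the toy data are assumptions made for this example only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def variance(values):
    """Population variance of a list of interval target values."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def impurity_reduction(impurity, parent, children):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical toy data: a split that separates the two classes perfectly.
parent_labels = ["yes", "yes", "no", "no"]
child_labels = [["yes", "yes"], ["no", "no"]]
entropy_gain = impurity_reduction(entropy, parent_labels, child_labels)
gini_gain = impurity_reduction(gini, parent_labels, child_labels)

# Hypothetical interval target: the same idea with variance as the impurity.
parent_values = [1.0, 1.0, 3.0, 3.0]
child_values = [[1.0, 1.0], [3.0, 3.0]]
variance_gain = impurity_reduction(variance, parent_values, child_values)
```

A larger reduction indicates a split that yields purer child nodes. The chi-square, CHAID, FastCHAID, IGR, and F test methods rank candidate splits by other statistics and are not covered by this sketch.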