PROC HPSPLIT Features :: SAS/STAT(R) 13.1 User's Guide: High-Performance Procedures

The main features of the HPSPLIT procedure are as follows:

Model creation
- supports interval and nominal inputs
- supports nominal targets (decision trees)
- supports interval targets (regression trees)
- provides the entropy, Gini, FastCHAID, and chi-square methods for decision tree growth (for nominal targets)
- provides the variance and F test methods for regression tree growth (for interval targets)
- provides multiple statistical metrics for decision tree pruning
- provides C4.5-style decision tree pruning
- provides ASE-based regression tree pruning
- partitions the input data set into training and validation sets
- handles missing values by using surrogate rules or by assigning them to the most popular branch, the most similar branch, or a dedicated branch
Score output data set
- saves scored results for the training data
- provides predicted levels and posterior probabilities
Score code file
- saves SAS DATA step code, which can be used for scoring new data with the tree model
Rules file
- saves node rules that describe the leaves of the tree
Node output data set
- saves statistics and descriptive information for the nodes in the tree
Variable importance output data set
- saves the importance of the input variables in creating the pruned decision tree
- provides variable importance for the validation set
Subtree monitoring output data sets
- save statistical metrics for each subtree that is created during growth
- save statistical metrics for each subtree that is created during pruning
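
The entropy, Gini, and variance growth criteria named in the list above can be illustrated with their textbook definitions. The following Python sketch is not PROC HPSPLIT code and makes no claim about how the procedure computes these quantities internally. The function names (`entropy`, `gini`, `variance`, `split_gain`) and the toy data are hypothetical, chosen only for this illustration. It assumes that a candidate split is scored by the impurity of the parent node minus the size-weighted impurity of its child nodes.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a non-empty list of nominal target levels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a non-empty list of nominal target levels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def variance(values):
    """Population variance of a non-empty list of interval target values."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def split_gain(impurity, parent, children):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical nominal target: a split that separates the two levels perfectly.
parent = ["yes", "yes", "yes", "no", "no", "no"]
left, right = ["yes", "yes", "yes"], ["no", "no", "no"]
entropy_gain = split_gain(entropy, parent, [left, right])
gini_gain = split_gain(gini, parent, [left, right])

# Hypothetical interval target: the same idea with variance as the impurity.
y_parent = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
y_left, y_right = [1.0, 1.0, 1.0], [5.0, 5.0, 5.0]
variance_gain = split_gain(variance, y_parent, [y_left, y_right])

print(entropy_gain, gini_gain, variance_gain)
```

In this toy case every child node is pure, so each gain equals the impurity of the parent node. A split that leaves the child nodes mixed yields a smaller gain, which is why such criteria can be used to rank candidate splits during tree growth.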

Because the HPSPLIT procedure is a high-performance analytical procedure, it also does the following:

- enables you to run in distributed mode on a cluster of machines that distribute the data and the computations
- enables you to run in single-machine mode on the server where SAS is installed
- exploits all of the available cores and concurrent threads, regardless of execution mode
