PROC HPSPLIT Features :: SAS/STAT(R) 12.3 User's Guide: High-Performance Procedures

The main features of the HPSPLIT procedure are as follows:

Model creation
- supports interval and nominal inputs
- supports nominal targets
- provides the entropy, Gini, and FastCHAID methods for tree growth
- provides multiple statistical metrics for tree pruning
- provides C4.5-style pruning
- partitions the input data set into training and validation sets
Score output data set
- saves scored results for the training data
- provides predicted levels and posterior probabilities
Score code file
- saves SAS DATA step code, which can be used for scoring new data with the tree model
Rules file
- saves English rules that describe the leaves of the tree
Node output data set
- saves statistics and descriptive information for the nodes in the tree
Variable importance output data set
- saves the importance of the input variables in creating the pruned decision tree
- provides variable importance for the validation set
Subtree monitoring output data sets
- save statistical metrics for each subtree that is created during growth
- save statistical metrics for each subtree that is created during pruning
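
To make the entropy and Gini growth criteria listed under model creation more concrete, the following minimal Python sketch computes both impurity measures for a node with a nominal target, along with the impurity reduction that a candidate split achieves. It illustrates the textbook definitions only and is not the HPSPLIT implementation. The function names and sample labels are hypothetical, FastCHAID is not covered, and each node is assumed to be non-empty.

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Return the proportion of each nominal target level in a non-empty node."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def entropy(labels):
    """Shannon entropy (in bits) of the target levels in a node."""
    return -sum(p * log2(p) for p in class_proportions(labels))


def gini(labels):
    """Gini impurity of the target levels in a node."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def impurity_reduction(parent, children, impurity):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical node with a binary nominal target, split into two pure branches.
parent = ["yes", "yes", "yes", "no", "no", "no"]
left = ["yes", "yes", "yes"]
right = ["no", "no", "no"]

entropy_gain = impurity_reduction(parent, [left, right], entropy)
gini_gain = impurity_reduction(parent, [left, right], gini)
```

In this sketch, a split that separates the target levels perfectly removes all of the parent node's impurity under either measure. A tree-growing algorithm of this kind would typically compare such reductions across candidate splits and keep the largest.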

Because the HPSPLIT procedure is a high-performance analytical procedure, it also does the following:

- enables you to run in distributed mode on a cluster of machines that distribute the data and the computations
- enables you to run in single-machine mode on the server where SAS is installed
- exploits all of the available cores and concurrent threads, regardless of execution mode
