The main features of the HPSPLIT procedure are as follows:
Model creation
supports interval and nominal inputs
supports nominal targets (decision trees)
supports interval targets (regression trees)
provides the entropy, Gini, FastCHAID, and chi-square methods for decision tree growth (for nominal targets)
provides the variance and F test methods for regression tree growth (for interval targets)
provides multiple statistical metrics for decision tree pruning
provides C4.5-style decision tree pruning
provides ASE-based regression tree pruning
partitions the input data set into training and validation sets
provides surrogate rules for missing value assignments, in addition to assignment by popularity, by similarity, or to a dedicated branch
Score output data set
saves scored results for the training data
provides predicted levels and posterior probabilities
Score code file
saves SAS DATA step code, which can be used for scoring new data with the tree model
Rules file
saves node rules that describe the leaves of the tree
Node output data set
saves statistics and descriptive information for the nodes in the tree
Variable importance output data set
saves the importance of the input variables in creating the pruned decision tree
provides variable importance for the validation set
Subtree monitoring output data sets
save statistical metrics for each subtree that is created during growth
save statistical metrics for each subtree that is created during pruning
Because the HPSPLIT procedure is a high-performance analytical procedure, it also does the following:
enables you to run in distributed mode on a cluster of machines that distribute the data and the computations
enables you to run in single-machine mode on the server where SAS is installed
exploits all of the available cores and concurrent threads, regardless of execution mode
For more information, see the section "Processing Modes" in Chapter 3, "Shared Concepts and Topics."
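As a rough illustration of what the entropy and Gini growth criteria listed under model creation measure, the following Python sketch computes both impurity values for a set of nominal target labels, along with the impurity reduction that a candidate split achieves. It is a minimal sketch of the general textbook definitions, not the HPSPLIT implementation; the function names (class_proportions, entropy, gini, impurity_gain) are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return the fraction of observations in each target level."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def entropy(labels):
    """Shannon entropy (base 2) of the target levels in a node."""
    return sum(-p * math.log2(p) for p in class_proportions(labels))


def gini(labels):
    """Gini impurity of the target levels in a node."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def impurity_gain(parent, children, impurity):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical data: a balanced two-level target that one split separates perfectly.
parent = ["a"] * 4 + ["b"] * 4
children = [["a"] * 4, ["b"] * 4]

print(impurity_gain(parent, children, entropy))
print(impurity_gain(parent, children, gini))
```

A tree-growing algorithm that is based on either criterion would typically evaluate candidate splits in this way and prefer the one with the largest impurity reduction.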