The main features of the HPSPLIT procedure are as follows:
Model creation
supports interval and nominal inputs
supports nominal targets
provides the entropy, Gini, and FastCHAID methods for tree growth
provides multiple statistical metrics for tree pruning
provides C4.5-style pruning
partitions the input data set into training and validation sets
Score output data set
saves scored results for the training data
provides predicted levels and posterior probabilities
Score code file
saves SAS DATA step code, which can be used for scoring new data with the tree model
Rules file
saves English rules that describe the leaves of the tree
Node output data set
saves statistics and descriptive information for the nodes in the tree
Variable importance output data set
saves the importance of the input variables in creating the pruned decision tree
provides variable importance for the validation set
Subtree monitoring output data sets
save statistical metrics for each subtree that is created during growth
save statistical metrics for each subtree that is created during pruning
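The entropy and Gini methods listed under model creation both score a candidate split by how much it reduces the impurity of the target levels. The following is a minimal sketch of that idea in Python, not the HPSPLIT implementation. The function names, the two-level target, and the sample counts are all hypothetical and chosen only for illustration. The definitions used are the standard textbook ones: entropy is -sum(p * log2(p)) and Gini impurity is 1 - sum(p^2), where p is the proportion of each target level in a node.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return the fraction of observations in each target level."""
    counts = Counter(labels)
    total = len(labels)
    return [n / total for n in counts.values()]


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the target levels."""
    return sum(-p * math.log2(p) for p in class_proportions(labels))


def gini(labels):
    """Gini impurity: 1 - sum(p^2) over the target levels."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def impurity_reduction(parent, children, impurity):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical nominal target with two levels, split into two branches.
parent = ["good"] * 6 + ["bad"] * 4
left = ["good"] * 5 + ["bad"] * 1
right = ["good"] * 1 + ["bad"] * 3

entropy_gain = impurity_reduction(parent, [left, right], entropy)
gini_gain = impurity_reduction(parent, [left, right], gini)
print(f"entropy reduction: {entropy_gain:.4f}")
print(f"Gini reduction:    {gini_gain:.4f}")
```

Under either metric, a split that produces purer child nodes yields a larger reduction. How the procedure itself searches for and ranks candidate splits is not described in this section, so the sketch makes no claim about that.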
Because the HPSPLIT procedure is a high-performance analytical procedure, it also does the following:
enables you to run in distributed mode on a cluster of machines that distribute the data and the computations
enables you to run in single-machine mode on the server where SAS is installed
exploits all of the available cores and concurrent threads, regardless of execution mode
For more information, see the section "Processing Modes" in Chapter 2: Shared Concepts and Topics.