SAS/STAT Software

HPSPLIT Procedure

The HPSPLIT procedure is a high-performance procedure that builds tree-based statistical models for classification and regression. The procedure constructs two types of decision trees: classification trees for modeling categorical responses, and regression trees for modeling continuous responses. The following are highlights of the HPSPLIT procedure's features:

  • splits nodes using criteria based on impurity (entropy, Gini index, residual sum of squares) or on statistical tests (chi-square, F test, CHAID, FastCHAID)
  • prunes trees by cost-complexity, C4.5, and reduced-error methods
  • supports cross validation and validation data for selecting the best subtree
  • handles missing values by various methods, including surrogate rules
  • produces tree diagrams, plots for cost-complexity analysis, and plots of ROC curves
  • computes statistics for assessing model fit, including model-based (resubstitution) statistics and cross validation statistics
  • computes measures of variable importance
  • generates SAS DATA step code for scoring new data
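
As a rough illustration of how several of these features fit together, a call might look like the sketch below. It assumes that the Sampsio.Hmeq sample data set (home equity loan data with the binary response BAD) is available in your installation. The maximum depth, the 30% validation fraction, the seed, and the output file name hpsplit_score.sas are arbitrary choices made for this example, not recommended settings.

```sas
/* Sketch only: assumes the Sampsio.Hmeq sample data set is installed. */
proc hpsplit data=sampsio.hmeq maxdepth=5;
   /* Listing BAD in CLASS makes this a classification tree */
   class bad reason job;
   model bad = loan mortdue value yoj derog delinq clage ninq clno debtinc
               reason job;
   grow entropy;                               /* impurity-based splitting criterion */
   prune costcomplexity;                       /* cost-complexity pruning */
   partition fraction(validate=0.3 seed=123);  /* hold out 30% as validation data */
   code file='hpsplit_score.sas';              /* hypothetical file for DATA step scoring code */
run;
```

With a validation partition present, the pruning step can use the held-out observations to select the final subtree, and the file written by the CODE statement can be included in a later DATA step to score new observations.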

For further details, see the SAS/STAT User's Guide: The HPSPLIT Procedure.