The HPSPLIT Procedure

Handling Missing Values

When PROC HPSPLIT builds and prunes a tree, it ignores observations that have a missing value in the target. It does include these observations when it scores the data by using the SCORE statement, and the generated SAS DATA step code scores them as well.

PROC HPSPLIT always includes observations that have missing values in input variables. It places them in a special level or bin that is not used in per-variable split determination. After the splitter has determined the per-variable split, the observations that have a missing value in that variable are assigned to the resulting leaf (child node) that has the largest number of observations.

Each split handles missing values by assigning them to one of the children. This ensures that the SAS DATA step score code can always assign a predicted target value to any record, including a record that has missing input values.
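
The rules in this section can be illustrated with a small sketch. The following Python code is not PROC HPSPLIT and does not reproduce its internals; it is a minimal illustration under stated assumptions. The names fit_stump, score, and is_missing are hypothetical, the split threshold is supplied by the caller instead of being searched for, and each leaf predicts the mean target of its observations.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fit_stump(rows, threshold):
    """Build a one-split tree from (input, target) pairs.

    The threshold is given by the caller; a real splitter would choose it
    by using only the nonmissing input values.
    """
    # Observations that have a missing target take no part in training.
    train = [(x, y) for x, y in rows if not is_missing(y)]

    # Missing inputs are held in their own bin and ignored by the split.
    known = [(x, y) for x, y in train if not is_missing(x)]
    held = [(x, y) for x, y in train if is_missing(x)]

    children = {
        "left": [y for x, y in known if x < threshold],
        "right": [y for x, y in known if x >= threshold],
    }

    # After the split is fixed, missing inputs go to the larger child.
    # On a tie, max() keeps the first key ("left").
    missing_to = max(children, key=lambda side: len(children[side]))
    children[missing_to].extend(y for _, y in held)

    # Assumption: each leaf predicts the mean of its targets.
    means = {
        side: (sum(ys) / len(ys) if ys else None)
        for side, ys in children.items()
    }
    return {"threshold": threshold, "missing_to": missing_to, "means": means}


def score(model, x):
    """Return a prediction for any input, missing or not."""
    if is_missing(x):
        side = model["missing_to"]
    else:
        side = "left" if x < model["threshold"] else "right"
    return model["means"][side]


rows = [
    (1.0, 10.0), (2.0, 12.0),              # fall in the left child
    (8.0, 30.0), (9.0, 32.0), (7.0, 28.0), # fall in the right child
    (None, 31.0),                          # missing input, kept for training
    (3.0, None),                           # missing target, ignored in training
]
model = fit_stump(rows, threshold=5.0)
print(model["missing_to"])   # right
print(score(model, None))    # 30.25
print(score(model, 3.0))     # 11.0
```

In this sketch the observation with the missing target is excluded from training but can still be scored through its input value, and the observation with the missing input follows the larger child, so score returns a value for every record.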