The HPSPLIT Procedure

SAS Enterprise Miner Syntax and Notes

In addition to the syntax that is described in the CLASS and MODEL statement sections, PROC HPSPLIT supports the SAS Enterprise Miner INPUT/TARGET syntax, which is familiar to many Enterprise Miner users. The INPUT/TARGET syntax cannot be combined with the SAS/STAT CLASS/MODEL syntax; using both in the same PROC HPSPLIT step is an error.

Enterprise Miner-style syntax has one TARGET statement and one or more INPUT statements. If you use the Enterprise Miner syntax, then the PROC HPSPLIT statement, the TARGET statement, and at least one INPUT statement are required. Depending on the options in those statements, the specified variables can be interval or nominal. By default, numeric INPUT variables are treated as interval (or continuous) predictors, and character INPUT variables are treated as nominal (or categorical) predictors.

  • INPUT variables </ option>;

  • TARGET variable </ options>;

INPUT Statement

  • INPUT variables </ option>;

The INPUT statement specifies predictor variables for the decision tree or regression tree. The value of variables can be a list of names, a range such as "var_1-var_1000", or the special "_ALL_" value to include all variables in the data set. As with CLASS variables, all nominal INPUT variables are padded or truncated to 32 characters.

It is an error to use an INPUT statement with a MODEL or CLASS statement.
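
As an illustration of a variable range in the INPUT statement, the following sketch builds a small, hypothetical data set and then fits a tree to it. The data set name work.demo, the variables x_1 through x_5 and y, and the simulated values are assumptions made for this example only; they do not come from this documentation.

```sas
/* Hypothetical data: five numeric predictors x_1-x_5 and a 0/1 target y */
data work.demo;
  call streaminit(1);              /* fix the random seed for repeatability */
  array x{5} x_1-x_5;
  do i = 1 to 200;
    do j = 1 to 5;
      x{j} = rand('uniform');
    end;
    y = (x_1 + x_2 > 1);           /* target depends on the first two inputs */
    output;
  end;
  drop i j;
run;

/* Enterprise Miner syntax with a numbered range of predictors */
proc hpsplit data=work.demo;
  target y;                        /* default LEVEL=NOM: decision tree */
  input x_1-x_5;                   /* numeric inputs: interval by default */
run;
```

Because the inputs are numeric and no LEVEL= option is given, they should be treated as interval predictors under the defaults described in this section.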

You can specify the following option:

LEVEL=INT | NOM

specifies whether the specified predictor variables are interval or nominal.

INT

treats all numeric variables as interval predictors.

NOM

treats all variables as nominal predictors.

By default, numeric variables are treated as interval predictors, and character variables are treated as nominal predictors. Specifying LEVEL=NOM forces all variables in that statement to be treated as nominal. PROC HPSPLIT ignores the LEVEL=INT option for character variables.
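
The following sketch shows one way the LEVEL= option might be combined across several INPUT statements. The choice of Sashelp.Cars variables here is only illustrative; treating Cylinders as nominal is an assumption made for the example, not a recommendation.

```sas
proc hpsplit data=sashelp.cars;
  target origin;                   /* character target: nominal by default */
  input mpg_highway horsepower;    /* numeric: interval by default */
  input cylinders / level=nom;     /* numeric, forced to nominal */
  input type drivetrain;           /* character: nominal by default */
run;
```

Because the LEVEL= option applies to every variable in its statement, variables that need different treatment are placed in separate INPUT statements.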

TARGET Statement

  • TARGET variable </ options>;

The TARGET statement names the variable whose values PROC HPSPLIT tries to predict. Observations that have a missing target value are ignored except during scoring.

It is an error to use a TARGET statement with a MODEL or CLASS statement.

You can specify the following options:

LEVEL=INT | NOM

specifies whether the specified response variable is interval or nominal.

INT

treats the response as an interval variable and creates a regression tree.

NOM

treats the response as a nominal variable and creates a decision tree.

By default, LEVEL=NOM, and PROC HPSPLIT creates a decision tree (nominal response).

ORDER=ordering

ensures that the response values are levelized in the specified order. You can specify the following values:

ASC | ASCENDING

levelizes response values in ascending order.

DESC | DESCENDING

levelizes response values in descending order.

FMTASC | ASCFORMATTED

levelizes response values in ascending order of the formatted value.

FMTDESC | DESFORMATTED

levelizes response values in descending order of the formatted value.

By default, ORDER=DESC.
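
A minimal sketch of a TARGET statement that sets both options follows. The variables are chosen from Sashelp.Cars only for illustration, and LEVEL=NOM is written out even though it is the default.

```sas
proc hpsplit data=sashelp.cars;
  /* nominal target, levelized in ascending order instead of the default DESC */
  target origin / level=nom order=asc;
  input mpg_highway horsepower weight;
run;
```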

Example Classification Tree Syntax for SAS/STAT and SAS Enterprise Miner

The following two programs are equivalent. The first is based on the syntax in the section Syntax: HPSPLIT Procedure, and the second is SAS Enterprise Miner syntax.

/* SAS/STAT syntax: classification tree */
proc hpsplit data=sashelp.cars;
  class enginesize model;
  model enginesize = mpg_highway model;
run;

/* SAS Enterprise Miner syntax: classification tree */
proc hpsplit data=sashelp.cars;
  target enginesize;
  input mpg_highway model;
run;

Example Regression Tree Syntax for SAS/STAT and SAS Enterprise Miner

The following two programs are equivalent. The first is based on the syntax in the section Syntax: HPSPLIT Procedure, and the second is SAS Enterprise Miner syntax.

/* SAS/STAT syntax: regression tree */
proc hpsplit data=sashelp.cars;
  class model;
  model enginesize = mpg_highway model;
run;

/* SAS Enterprise Miner syntax: regression tree */
proc hpsplit data=sashelp.cars;
  target enginesize / level=int;
  input mpg_highway model;
run;

Note for SAS Enterprise Miner Users

Note: The RSS splitting criterion is also known as the variance splitting criterion.