The HPREG Procedure

SELECTION Statement

  • SELECTION <options>;

The SELECTION statement performs variable selection. All options except the SCREEN option are fully documented in the section SELECTION Statement in SAS/STAT 14.1 User's Guide: High-Performance Procedures. The SCREEN option is described later in this section. The remainder of this section describes how PROC HPREG implements the METHOD= option and the DETAILS= option.

The HPREG procedure supports the following values of the METHOD= option in the SELECTION statement:

NONE

specifies no model selection.

FORWARD

specifies the forward selection method, which starts with no effects in the model and adds effects.

BACKWARD

specifies the backward elimination method, which starts with all effects in the model and deletes effects.

STEPWISE

specifies the stepwise regression method, which is similar to the forward selection method except that effects already in the model do not necessarily stay there.

FORWARDSWAP

specifies the forward-swap selection method, which is an extension of the forward selection method. Before any addition step, PROC HPREG makes all pairwise swaps of effects in and out of the current model that improve the selection criterion. When the selection criterion is R-square, this method is the same as the MAXR method in the REG procedure in SAS/STAT software.

LAR

specifies the least angle regression method. Like forward selection, this method starts with no effects in the model and adds effects. The parameter estimates at any step are "shrunk" when compared to the corresponding least squares estimates. If the model contains classification variables, then these classification variables are split. For more information, see the SPLIT option in the CLASS statement.

LASSO

specifies the lasso method, which adds and deletes parameters based on a version of ordinary least squares in which the sum of the absolute regression coefficients is constrained. If the model contains classification variables, then these classification variables are split. For more information, see the SPLIT option in the CLASS statement.
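The constraint that the LASSO entry describes can be written out as an optimization problem. The notation here is an assumption for illustration and does not come from this section: n observations, m parameters, response values y_i, regressor values x_ij, coefficients beta_j, and a bound t on the sum of absolute coefficients.

```latex
\min_{\beta_0,\,\beta_1,\ldots,\beta_m}\;
  \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{m} x_{ij}\,\beta_j\Bigr)^{2}
\quad\text{subject to}\quad
  \sum_{j=1}^{m}\lvert\beta_j\rvert \le t
```

When t is small, many coefficients are forced to exactly zero. As t increases, parameters enter (and can leave) the model, which corresponds to the adding and deleting of parameters described for this method.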

The DETAILS=ALL and DETAILS=STEPS options produce the "ANOVA," "Fit Statistics," and "Parameter Estimates" tables, which provide information about the model that is selected at each step of the selection process.

In addition to other options, which are fully documented in the section SELECTION Statement in SAS/STAT 14.1 User's Guide: High-Performance Procedures, PROC HPREG also supports a SCREEN option, which has the following syntax:

SCREEN <(global-screen-options)> <=screen-options>

You can specify the following global-screen-options:

DETAILS=NONE | SUMMARY | ALL

specifies the level of detail to be produced about the screening process. You can specify the following values:

NONE

suppresses all tables that provide details of the screening process.

ALL

produces the following output and shows model selection details at each stage of the screening process:

  • a screening table that shows the correlations that are used to obtain the screened effects for the first two stages of the screening process

  • a screened effects table that lists the effects that are chosen at each stage of the screening process

SUMMARY

produces the following output and shows details about the model selection only for the final stage of the screening process:

  • a screening table that shows the correlations that are used to obtain the screened effects for the first two stages of the screening process

  • a screened effects table that lists the effects that are chosen at each stage of the screening process

By default, DETAILS=SUMMARY.

SINGLESTAGE

screens effects and selects a model only once.

MULTISTAGE

performs multiple stages, each of which contains a screening and a model selection step.

You can specify the following screen-options after an = sign:

SCREEN=n1 <n2>

specifies the number of effects to be chosen at the first two stages of the screening process. If you specify only n1, then n1 is used for both the first and second stages. If you specify both n1 and n2, then n1 is used at the first stage and n2 is used at the second stage. At the first stage, effects are ranked in decreasing order of the magnitude of their pairwise correlations with the response, and the first n1 effects are used in the selection process at that stage. At the second stage, effects are ranked in decreasing order of the magnitude of their pairwise correlations with the residuals obtained at the first stage, and the first n2 effects are used in the selection process at that stage.
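The ranking rule described for SCREEN=n1 <n2> can be illustrated outside SAS. The following Python sketch is a minimal illustration and is not PROC HPREG's implementation. The function names pearson and screen_top_n and the sample data are hypothetical. It ranks effects by the magnitude of their pairwise correlation with a target and keeps the first n.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant column counts as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def screen_top_n(effects, target, n):
    """Names of the n effects with the largest |correlation| with target."""
    ranked = sorted(
        effects,
        key=lambda name: abs(pearson(effects[name], target)),
        reverse=True,  # decreasing order of magnitude
    )
    return ranked[:n]

# Hypothetical data: x3 is negatively correlated with the response but
# still ranks above x2 because only the magnitude matters.
effects = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 1, 4, 3, 5],
    "x3": [5, 4, 3, 1, 2],
}
response = [1.1, 1.9, 3.2, 3.9, 5.1]

print(screen_top_n(effects, response, 2))  # prints ['x1', 'x3']
```

For the second stage, the same function would be called with the first-stage residuals as the target and n2 as the count.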

SCREEN=PERCENT(p1 <p2>)

specifies the percentage of effects in the MODEL statement to be chosen at the first two stages of the screening process. If you specify only p1, then p1 is used for both the first and second stages. If you specify both p1 and p2, then p1 is used at the first stage and p2 is used at the second stage.
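To see how a percentage translates into a number of effects, consider the following Python sketch. It is only an illustration: the helper name percent_to_count is hypothetical, and the rounding rule (round up, keep at least one effect) is an assumption, because this section does not state how PROC HPREG rounds fractional counts.

```python
import math

def percent_to_count(total_effects, percent):
    """Number of effects kept when screening by percentage.

    Assumption: fractional counts are rounded up and at least one
    effect is kept. The actual rounding rule is not documented here.
    """
    return max(1, math.ceil(total_effects * percent / 100))

# Hypothetical model with 200 effects and SCREEN=PERCENT(10 5):
stage1 = percent_to_count(200, 10)  # 20 effects at the first stage
stage2 = percent_to_count(200, 5)   # 10 effects at the second stage
print(stage1, stage2)
```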

SCREEN=CUTOFF(c1 <c2>)

specifies the minimum value of the screening statistic that effects must have in order to be chosen at the first two stages of the screening process. If you specify only c1, then c1 is used for both the first and second stages. If you specify both c1 and c2, then c1 is used at the first stage and c2 is used at the second stage. At the first stage, any effect whose absolute pairwise correlation with the response is less than the first-stage cutoff is not used in the selection process at that stage. At the second stage, any effect whose absolute pairwise correlation with the residuals obtained from the first stage is less than the second-stage cutoff is not used in the selection process at that stage.
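The cutoff rule can be sketched in a few lines of Python. This is an illustration of the rule as stated above, not PROC HPREG's code. The function name screen_by_cutoff and the correlation values are hypothetical, and the correlations are assumed to have been computed already.

```python
def screen_by_cutoff(correlations, cutoff):
    """Keep effects whose |correlation| is not less than the cutoff."""
    return [name for name, r in correlations.items() if abs(r) >= cutoff]

# Hypothetical first-stage correlations with the response.
stage1_corr = {"x1": 0.82, "x2": -0.35, "x3": 0.10, "x4": -0.05}
# Hypothetical second-stage correlations with the first-stage residuals.
stage2_corr = {"x1": 0.02, "x2": 0.04, "x3": 0.27, "x4": -0.22}

# Corresponds to SCREEN=CUTOFF(0.3 0.2): c1 at stage one, c2 at stage two.
print(screen_by_cutoff(stage1_corr, 0.3))  # prints ['x1', 'x2']
print(screen_by_cutoff(stage2_corr, 0.2))  # prints ['x3', 'x4']
```

Unlike SCREEN=n1 <n2>, the number of effects that pass is not fixed in advance. It depends on how many correlations reach the cutoff.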

If you do not specify any screen-options, the default is SCREEN=PERCENT(10).

For a classification effect that has multiple degrees of freedom, pairwise correlations are computed separately for each dummy variable that corresponds to the levels of the classification variables in the effect. These correlations are computed with the response at the first stage and with the first-stage residuals at the second stage. The largest magnitude among these correlations is used as a proxy for the correlation statistic for that effect.
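The proxy statistic for a classification effect can be sketched as follows. This Python example is an illustration under stated assumptions and is not PROC HPREG's implementation. The names pearson and class_effect_statistic are hypothetical, the effect is assumed to consist of a single classification variable, and one 0/1 dummy variable is built for every level.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant column counts as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def class_effect_statistic(levels, target):
    """Largest |correlation| between target and any level's dummy column."""
    best = 0.0
    for level in sorted(set(levels)):
        # 0/1 indicator column for this level
        dummy = [1.0 if v == level else 0.0 for v in levels]
        best = max(best, abs(pearson(dummy, target)))
    return best

# Hypothetical data: level "c" is associated with large response values,
# so its dummy variable supplies the proxy statistic for the effect.
levels = ["a", "a", "b", "b", "c", "c"]
target = [1, 2, 1, 2, 9, 10]

stat = class_effect_statistic(levels, target)
print(round(stat, 3))  # prints 0.991
```

At the second stage, the same computation would use the first-stage residuals as the target.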