The HPREG Procedure

Displayed Output

The following sections describe the output produced by PROC HPREG. The output is organized into various tables, which are discussed in the order of appearance.

Performance Information

The Performance Information table is produced by default. It displays information about the execution mode. For single-machine mode, the table displays the number of threads used. For distributed mode, the table displays the grid mode (symmetric or asymmetric), the number of compute nodes, and the number of threads per node.

Model Information

The Model Information table displays basic information about the model, such as the response variable, frequency variable, weight variable, and the type of parameterization used for classification variables named in the CLASS statement.

Selection Information

When you specify the SELECTION statement, the HPREG procedure produces by default a series of tables with information about the model selection. The Selection Information table informs you about the model selection method; select, stop, and choose criteria; and other parameters that govern the selection. You can suppress this table by specifying DETAILS=NONE in the SELECTION statement.

Number of Observations

The Number of Observations table displays the number of observations read from the input data set and the number of observations used in the analysis. If you specify a FREQ statement, the table also displays the sum of frequencies read and used. If you use a PARTITION statement, the table also displays the number of observations used for each data role.

Class Level Information

The Class Level Information table lists the levels of every variable specified in the CLASS statement. You should check this information to make sure that the data are correct. You can adjust the order of the CLASS variable levels with the ORDER= option in the CLASS statement. You can suppress the Class Level Information table completely or partially with the NOCLPRINT= option in the PROC HPREG statement.

If the classification variables are in the reference parameterization, the Class Level Information table also displays the reference value for each variable. The Class Level Information table also indicates which, if any, of the classification variables are split by using the SPLIT option in the CLASS statement.

Dimensions

The Dimensions table displays information about the number of effects and the number of parameters from which the selected model is chosen. If you use split classification variables, then this table also includes the number of effects after splitting is taken into account.

Entry and Removal Candidates

When you specify the DETAILS=ALL or DETAILS=STEPS option in the SELECTION statement, the HPREG procedure produces Entry Candidates and Removal Candidates tables that display the effect names and values of the criterion used to select entering or departing effects at each step of the selection process. The effects are displayed in sorted order from best to worst of the selection criterion.

Selection Summary

When you specify the SELECTION statement, the HPREG procedure produces the Selection Summary table with information about the sequence of steps of the selection process. For each step, the effect that was entered or dropped is displayed along with the statistics used to select the effect, stop the selection, and choose the selected model. For all criteria that you can use for model selection, the steps at which the optimal values of these criteria occur are also indicated.

You can suppress the Selection Summary table by specifying DETAILS=NONE in the SELECTION statement.

Stop Reason

The Stop Reason table displays the reason why the selection stopped. To facilitate programmatic use of this table, an integer code is assigned to each reason and is included if you output this table by using an ODS OUTPUT statement. The reasons and their associated codes follow:

Code   Stop Reason

1      All eligible effects are in the model.
2      All eligible effects have been removed.
3      Specified maximum number of steps done.
4      The model contains the specified maximum number of effects.
5      The model contains the specified minimum number of effects (for backward selection).
6      The stopping criterion is at a local optimum.
7      No suitable add or drop candidate could be found.
8      Adding or dropping any effect does not improve the selection criterion.
9      No candidate meets the appropriate SLE or SLS significance level.
10     Stepwise selection is cycling.
11     The model is an exact fit.
12     Dropping an effect would result in an empty model.

You can suppress the Stop Reason table by specifying DETAILS=NONE in the SELECTION statement.
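Because each stop reason carries an integer code, a downstream program can branch on the code rather than on the message text. The following Python sketch transcribes the codes listed above into a lookup. It assumes the code has already been read from the data set that the ODS OUTPUT statement creates; the names STOP_REASONS and describe_stop_reason are hypothetical and are not part of SAS.

```python
# Stop-reason codes and messages, transcribed from the Stop Reason list above.
STOP_REASONS = {
    1: "All eligible effects are in the model.",
    2: "All eligible effects have been removed.",
    3: "Specified maximum number of steps done.",
    4: "The model contains the specified maximum number of effects.",
    5: "The model contains the specified minimum number of effects "
       "(for backward selection).",
    6: "The stopping criterion is at a local optimum.",
    7: "No suitable add or drop candidate could be found.",
    8: "Adding or dropping any effect does not improve the selection criterion.",
    9: "No candidate meets the appropriate SLE or SLS significance level.",
    10: "Stepwise selection is cycling.",
    11: "The model is an exact fit.",
    12: "Dropping an effect would result in an empty model.",
}


def describe_stop_reason(code: int) -> str:
    """Return the documented message for a stop-reason code (hypothetical helper)."""
    try:
        return STOP_REASONS[code]
    except KeyError:
        raise ValueError(f"unknown stop-reason code: {code}") from None


if __name__ == "__main__":
    print(describe_stop_reason(6))
```

Keying on the integer keeps the check stable even if the wording of a message changes.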

Selection Reason

When you specify the SELECTION statement, the HPREG procedure produces a simple table that contains text informing you about the reason why the final model was selected.

You can suppress the Selection Reason table by specifying DETAILS=NONE in the SELECTION statement.

Selected Effects

When you specify the SELECTION statement, the HPREG procedure produces a simple table that contains text informing you about which effects were selected into the final model.

ANOVA

The ANOVA table displays an analysis of variance for the selected model. This table includes the following:

  • the Source of the variation: Model for the fitted regression, Error for the residual error, and C Total for the total variation after correcting for the mean. The Uncorrected Total Variation is produced when the NOINT option is used.

  • the degrees of freedom (DF) associated with the source

  • the Sum of Squares for the term

  • the Mean Square, the sum of squares divided by the degrees of freedom

  • the F Value for testing the hypothesis that all parameters are 0 except for the intercept. This is formed by dividing the mean square for Model by the mean square for Error.

  • the Prob>F, the probability of getting a greater F statistic than that observed if the hypothesis is true. When you do model selection, these p-values are generally liberal because they are not adjusted for the fact that the terms in the model have been selected.

You can request ANOVA tables for the model at each step of the selection process with the DETAILS= option in the SELECTION statement.
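The arithmetic that links the ANOVA columns can be sketched in a few lines of Python. This is an illustration of the definitions above, not the procedure's implementation; the function name and the input values are made up. Prob>F is not computed here because it requires the F distribution, which the Python standard library does not provide.

```python
def anova_f_value(ss_model, df_model, ss_error, df_error):
    """Return (MS Model, MS Error, F Value) from sums of squares and DF.

    Each mean square is the sum of squares divided by its degrees of
    freedom; the F Value is MS Model divided by MS Error.
    """
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return ms_model, ms_error, ms_model / ms_error


if __name__ == "__main__":
    # Hypothetical sums of squares and degrees of freedom.
    ms_model, ms_error, f_value = anova_f_value(80.0, 2, 20.0, 10)
    print(ms_model, ms_error, f_value)
```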

Fit Statistics

The Fit Statistics table displays fit statistics for the selected model. The statistics displayed include the following:

  • Root MSE, an estimate of the standard deviation of the error term. It is calculated as the square root of the mean square error.

  • R-square, a measure between 0 and 1 that indicates the portion of the (corrected) total variation attributed to the fit rather than left to residual error. It is calculated as SS(Model) divided by SS(Total). It is also called the coefficient of determination. It is the square of the multiple correlation—in other words, the square of the correlation between the dependent variable and the predicted values.

  • Adj R-Sq, the adjusted R-square, a version of R-square that has been adjusted for degrees of freedom. It is calculated as

    \[  \bar{R}^2 = 1 - \frac{(n-i)(1-R^2)}{n-p}  \]

    where i is equal to 1 if there is an intercept and 0 otherwise, n is the number of observations used to fit the model, and p is the number of parameters in the model.

  • fit criteria AIC, AICC, BIC, CP, and PRESS if they are used in the selection process. See Table 8.5 for the formulas for evaluating these criteria.

  • the average square errors (ASE) on the training, validation, and test data.

You can request Fit Statistics tables for the model at each step of the selection process with the DETAILS= option in the SELECTION statement.
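As a rough check on the first three statistics, the following Python sketch evaluates Root MSE, R-square, and Adj R-Sq from the definitions above. It rests on two assumptions that are labeled in the code: the model is full rank, so the error degrees of freedom equal n - p, and SS(Total) equals SS(Model) plus SS(Error). The function name and the input values are hypothetical.

```python
import math


def fit_statistics(ss_model, ss_error, n, p, intercept=True):
    """Return (Root MSE, R-square, Adj R-Sq) for a fitted linear model.

    n is the number of observations used and p the number of parameters.
    Assumption: the model is full rank, so the error DF is n - p.
    Assumption: SS(Total) = SS(Model) + SS(Error).
    """
    ss_total = ss_model + ss_error
    root_mse = math.sqrt(ss_error / (n - p))   # square root of the mean square error
    r_square = ss_model / ss_total             # SS(Model) / SS(Total)
    i = 1 if intercept else 0                  # i = 1 with an intercept, else 0
    adj_r_square = 1 - (n - i) * (1 - r_square) / (n - p)
    return root_mse, r_square, adj_r_square


if __name__ == "__main__":
    # Hypothetical values: 13 observations, 3 parameters including the intercept.
    root_mse, r_square, adj_r_square = fit_statistics(80.0, 20.0, n=13, p=3)
    print(round(root_mse, 4), round(r_square, 4), round(adj_r_square, 4))
```

With these inputs R-square is 0.8 and Adj R-Sq is about 0.76, which shows how the adjustment penalizes the two non-intercept parameters.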

Parameter Estimates

The Parameter Estimates table displays the parameters in the selected model and their estimates. The information displayed for each parameter in the selected model includes the following:

  • the parameter label that includes the effect name and level information for effects that contain classification variables

  • the degrees of freedom (DF) for the parameter. There is one degree of freedom unless the model is not full rank.

  • the parameter estimate

  • the standard error, which is the estimate of the standard deviation of the parameter estimate

  • the t Value, the t statistic for testing the hypothesis that the parameter is 0. It is computed as the parameter estimate divided by the standard error.

  • the Pr > |t|, the probability that a t statistic would obtain a greater absolute value than that observed given that the true parameter is 0. This is the two-tailed significance probability.

    When you do model selection, these p-values are generally liberal because they are not adjusted for the fact that the terms in the model have been selected.

You can request Parameter Estimates tables for the model at each step of the selection process with the DETAILS= option in the SELECTION statement.
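The t Value column can be reproduced from the estimate and standard error columns, as the following Python sketch shows. The parameter labels and numbers are invented for illustration, and the helper name is hypothetical. Pr > |t| is omitted because it requires the t distribution, which the Python standard library does not provide.

```python
def t_value(estimate, std_error):
    """Return the t statistic: the parameter estimate divided by its standard error."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return estimate / std_error


if __name__ == "__main__":
    # Hypothetical rows: (parameter label, estimate, standard error).
    rows = [("Intercept", 2.0, 0.5), ("x1", 1.5, 0.5), ("x2", -0.25, 0.125)]
    for label, estimate, std_error in rows:
        print(label, t_value(estimate, std_error))
```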

Timing Information

If you specify the DETAILS option in the PERFORMANCE statement, the procedure also produces a Timing table that displays the elapsed times (absolute and relative) for the main tasks of the procedure.