The following HPGENSELECT statements examine the same data that is used in the section Getting Started: HPGENSELECT Procedure, but they request model selection via the forward selection technique. Model effects are added in order of their significance until no remaining effect significantly improves the current model. The DETAILS=ALL option in the SELECTION statement requests that all tables that are related to model selection be produced.
The data set getStarted is shown in the section Getting Started: HPGENSELECT Procedure. It contains 100 observations on a count response variable (Y), a continuous variable (Total) to be used in Example 4.3, and five categorical variables (C1–C5), each of which has four numerical levels.
A log-linked Poisson regression model is specified by using classification effects for variables C1–C5. The following statements request model selection by using the forward selection method:
    proc hpgenselect data=getStarted;
       class C1-C5;
       model Y = C1-C5 / Distribution=Poisson;
       selection method=forward details=all;
    run;
The model selection tables are shown in Output 4.1.1 through Output 4.1.3.
The “Selection Information” table in Output 4.1.1 summarizes the settings for the model selection. Effects are added to the model only if they produce a significant improvement as judged by comparing the p-value of a score test to the entry significance level (SLE), which is 0.05 by default. The forward selection stops when no effect outside the model meets this criterion.
Output 4.1.1: Selection Information
Selection Information | |
---|---|
Selection Method | Forward |
Select Criterion | Significance Level |
Stop Criterion | Significance Level |
Effect Hierarchy Enforced | None |
Entry Significance Level (SLE) | 0.05 |
Stop Horizon | 1 |
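As a minimal sketch of how the entry threshold might be changed, the following statements rerun the same selection with a looser entry level of 0.10. The SLE= suboption in parentheses after METHOD=FORWARD is an assumption here: it does not appear in this example, so confirm its spelling and placement against the SELECTION statement syntax for your release.

```sas
/* Sketch only: same model as above, but with a looser entry threshold. */
/* SLE=0.10 is an assumed suboption of METHOD=FORWARD; verify it in the */
/* SELECTION statement documentation before relying on it.              */
proc hpgenselect data=getStarted;
   class C1-C5;
   model Y = C1-C5 / distribution=Poisson;
   selection method=forward(sle=0.10) details=all;
run;
```

With a looser threshold, additional effects may enter the model. The tables discussed in the rest of this example apply to the default level of 0.05 only.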
The “Selection Summary” table in Output 4.1.2 shows the effects that were added to the model and their significance level. Step 0 refers to the null model that contains only an intercept. In the next step, effect C2 made the most significant contribution to the model among the candidate effects (p < 0.0001). In step 2, the most significant contribution when adding an effect to a model that contains the intercept and C2 was made by C5. In step 3, the variable C1 (p = 0.0496) was added. In the subsequent step, no effect could be added to the model that would produce a p-value less than 0.05, so variable selection stops.
Output 4.1.2: Selection Summary Information
Step | Effect Entered | Number Effects In | p Value |
---|---|---|---|
0 | Intercept | 1 | . |
1 | C2 | 2 | <.0001 |
2 | C5 | 3 | <.0001 |
3 | C1 | 4 | 0.0496 |

Selection stopped because no candidate for entry is significant at the 0.05 level.

Selected Effects: Intercept C1 C2 C5
The DETAILS=ALL option produces the “Selection Details” table, which provides fit statistics and the value of the score test chi-square statistic at each step.
Output 4.1.3: Selection Details
Step | Description | Effects In Model | Chi-Square | Pr > ChiSq | -2 LogL | AIC | AICC | BIC |
---|---|---|---|---|---|---|---|---|
0 | Initial Model | 1 | | | 350.193 | 352.193 | 352.234 | 354.798 |
1 | C2 entered | 2 | 25.7340 | <.0001 | 324.611 | 332.611 | 333.032 | 343.032 |
2 | C5 entered | 3 | 23.0291 | <.0001 | 303.580 | 317.580 | 318.798 | 335.817 |
3 | C1 entered | 4 | 7.8328 | 0.0496 | 295.263 | 315.263 | 317.735 | 341.315 |
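The p-values reported for each step can be checked by hand from the score chi-square statistics. The following DATA step is a sketch that assumes each score test has 3 degrees of freedom, one fewer than the four levels of the entering classification variable. The variable names are illustrative and are not produced by the procedure.

```sas
data _null_;
   /* Score chi-square statistics copied from Output 4.1.3 */
   array chisq[3] _temporary_ (25.7340 23.0291 7.8328);
   df = 3;   /* assumed: 4 levels - 1 for each CLASS effect */
   do step = 1 to 3;
      /* Upper-tail probability of the chi-square distribution */
      p = sdf('CHISQUARE', chisq[step], df);
      put step= p= pvalue6.4;
   end;
run;
```

Under this assumption, the step 3 value should come out just under 0.05, close to the reported 0.0496, which is why C1 narrowly qualifies for entry.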
Output 4.1.4 displays information about the selected model. Notice that the –2 log likelihood value in the “Fit Statistics” table is larger than the value for the full model in Figure 4.7. This is expected because the selected model contains only a subset of the parameters. However, because the selected model is more parsimonious than the full model, the information criteria AIC, AICC, and BIC, which penalize the number of parameters, are smaller than in the full model. This indicates a better balance between fit and model complexity.
Output 4.1.4: Fit Statistics
Fit Statistics | |
---|---|
-2 Log Likelihood | 295.26316 |
AIC (smaller is better) | 315.26316 |
AICC (smaller is better) | 317.73507 |
BIC (smaller is better) | 341.31486 |
Pearson Chi-Square | 85.06563 |
Pearson Chi-Square/DF | 0.94517 |
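The information criteria in this table can be reproduced from the –2 log likelihood. The following DATA step is a sketch that uses the standard definitions of AIC, AICC, and BIC and assumes p = 10 estimated parameters (the intercept plus three nonzero parameters each for C1, C2, and C5) and n = 100 observations. The variable names are illustrative.

```sas
data _null_;
   m2logl = 295.26316;  /* -2 log likelihood from Output 4.1.4 */
   n = 100;             /* observations in getStarted */
   p = 10;              /* assumed: intercept + 3 parameters each for C1, C2, C5 */
   aic  = m2logl + 2*p;                /* penalty of 2 per parameter */
   aicc = m2logl + 2*p*n/(n - p - 1);  /* small-sample corrected AIC */
   bic  = m2logl + p*log(n);           /* penalty of log(n) per parameter */
   put aic= 10.5 aicc= 10.5 bic= 10.5;
run;
```

Under these assumptions the computed values agree with the AIC, AICC, and BIC entries in Output 4.1.4. The same formulas with p = 1 reproduce the step 0 row of the “Selection Details” table.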
The parameter estimates of the selected model are given in Output 4.1.5. Notice that the effects are listed in the “Parameter Estimates” table in the order in which they were specified in the MODEL statement and not in the order in which they were added to the model.
Output 4.1.5: Parameter Estimates
Parameter | DF | Estimate | Standard Error | Chi-Square | Pr > ChiSq |
---|---|---|---|---|---|
Intercept | 1 | 0.775498 | 0.242561 | 10.2216 | 0.0014 |
C1 0 | 1 | -0.211240 | 0.207209 | 1.0393 | 0.3080 |
C1 1 | 1 | -0.685575 | 0.255713 | 7.1879 | 0.0073 |
C1 2 | 1 | -0.127612 | 0.203663 | 0.3926 | 0.5309 |
C1 3 | 0 | 0 | . | . | . |
C2 0 | 1 | 0.958378 | 0.239731 | 15.9817 | <.0001 |
C2 1 | 1 | 0.738529 | 0.237098 | 9.7024 | 0.0018 |
C2 2 | 1 | 0.211075 | 0.255791 | 0.6809 | 0.4093 |
C2 3 | 0 | 0 | . | . | . |
C5 0 | 1 | -0.825545 | 0.214054 | 14.8743 | 0.0001 |
C5 1 | 1 | -0.697611 | 0.202607 | 11.8555 | 0.0006 |
C5 2 | 1 | -0.566706 | 0.213961 | 7.0153 | 0.0081 |
C5 3 | 0 | 0 | . | . | . |
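Because the model uses a log link, the estimates are on the log scale, and the last level of each classification variable is the reference level with an estimate of 0. The following DATA step is a sketch of how a predicted mean count could be computed by hand for one hypothetical covariate pattern (C1 = 1, C2 = 0, C5 = 3). The pattern and the variable names are chosen only for illustration.

```sas
data _null_;
   /* Estimates copied from Output 4.1.5 */
   intercept = 0.775498;
   c1_1 = -0.685575;   /* C1 = 1 */
   c2_0 =  0.958378;   /* C2 = 0 */
   /* C5 = 3 is the reference level, so it contributes 0 */
   eta = intercept + c1_1 + c2_0;   /* linear predictor on the log scale */
   mu  = exp(eta);                  /* predicted mean count */
   rate_ratio_c2 = exp(c2_0);       /* C2 = 0 versus C2 = 3, other effects held fixed */
   put eta= mu= rate_ratio_c2=;
run;
```

Exponentiating a single estimate, as in the last assignment, gives the multiplicative change in the expected count relative to the reference level of that variable.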