Example 66.2 Best Subset Selection

An alternative to stepwise selection of variables is best subset selection. This method uses the branch-and-bound algorithm of Furnival and Wilson (1974) to find a specified number of best models containing one, two, three variables, and so on, up to the single model that contains all of the explanatory variables. The criterion used to determine the "best" subset is the global score chi-square statistic. For two models A and B with the same number of explanatory variables, model A is considered better than model B if the global score chi-square statistic for A exceeds that for B.
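The comparison rule can be written out explicitly. The following is a summary in standard notation. The symbols are introduced here for illustration and do not appear in the PHREG output, and the exact form of the partial likelihood depends on how tied event times are handled. Let U(β) be the score vector (first derivatives of the log partial likelihood) and I(β) the observed information matrix (negative second derivatives) for a model with p explanatory variables. The global score chi-square statistic tests H0: β = 0 using both quantities evaluated at β = 0. Best subset selection then ranks models of equal size by this value:

$$
\chi^2_{S} = U(0)^{\top}\,\mathcal{I}(0)^{-1}\,U(0),
\qquad
A \text{ better than } B \iff p_A = p_B \ \text{and}\ \chi^2_{S}(A) > \chi^2_{S}(B).
$$

Under H0 the statistic is asymptotically chi-square with p degrees of freedom.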

In the following statements, best subset selection is requested by specifying the SELECTION=SCORE option in the MODEL statement. The BEST=3 option requests that the procedure identify only the three best models for each model size. In other words, for each number of covariates, PROC PHREG lists the three models with the highest score chi-square statistics among all possible models of that size.

proc phreg data=Myeloma;
   model Time*VStatus(0)=LogBUN HGB Platelet Age LogWBC
                         Frac LogPBM Protein SCalc
                         / selection=score best=3;
run;
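Counting the candidate models shows how much BEST=3 filters. The MODEL statement lists p = 9 explanatory variables. The number of distinct models that contain k of them, and the total over all sizes, are:

$$
\binom{9}{k}\ \text{models of size } k,
\qquad
\sum_{k=1}^{9}\binom{9}{k} = 2^{9} - 1 = 511.
$$

For example, there are C(9,2) = 36 two-variable models, and BEST=3 reports only three of them. For k = 9 there is exactly one model, so at most one line can appear for that size. The branch-and-bound algorithm is intended to identify the best subsets without evaluating each of these 511 models individually.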

Output 66.2.1 displays the results of this analysis. The number of explanatory variables in the model is given in the first column, the score chi-square statistic in the second, and the names of the variables in each model are listed on the right. Within each model size, the models are listed in descending order of their score chi-square values. For example, among all models containing two explanatory variables, the model that contains LogBUN and HGB has the largest score value (12.7252). The model that contains LogBUN and Platelet has the second-largest score value (11.1842), and the model that contains LogBUN and SCalc has the third-largest (9.9962).

Output 66.2.1 Best Variable Combinations
The PHREG Procedure

Regression Models Selected by Score Criterion

Number of  Score
Variables  Chi-Square  Variables Included in Model
1          8.5164      LogBUN
1          5.0664      HGB
1          3.1816      Platelet
2          12.7252     LogBUN HGB
2          11.1842     LogBUN Platelet
2          9.9962      LogBUN SCalc
3          15.3053     LogBUN HGB SCalc
3          13.9911     LogBUN HGB Age
3          13.5788     LogBUN HGB Frac
4          16.9873     LogBUN HGB Age SCalc
4          16.0457     LogBUN HGB Frac SCalc
4          15.7619     LogBUN HGB LogPBM SCalc
5          17.6291     LogBUN HGB Age Frac SCalc
5          17.3519     LogBUN HGB Age LogPBM SCalc
5          17.1922     LogBUN HGB Age LogWBC SCalc
6          17.9120     LogBUN HGB Age Frac LogPBM SCalc
6          17.7947     LogBUN HGB Age LogWBC Frac SCalc
6          17.7744     LogBUN HGB Platelet Age Frac SCalc
7          18.1517     LogBUN HGB Platelet Age Frac LogPBM SCalc
7          18.0568     LogBUN HGB Age LogWBC Frac LogPBM SCalc
7          18.0223     LogBUN HGB Platelet Age LogWBC Frac SCalc
8          18.3925     LogBUN HGB Platelet Age LogWBC Frac LogPBM SCalc
8          18.1636     LogBUN HGB Platelet Age Frac LogPBM Protein SCalc
8          18.1309     LogBUN HGB Platelet Age LogWBC Frac Protein SCalc
9          18.4550     LogBUN HGB Platelet Age LogWBC Frac LogPBM Protein SCalc