The QUANTSELECT Procedure (Experimental)

Getting Started: QUANTSELECT Procedure

This example demonstrates how you can use the QUANTSELECT procedure to select covariate effects for quantile regression. The data set under investigation contains salary and performance information for Major League Baseball (MLB) players, excluding pitchers, who played at least one game in both the 1986 and 1987 seasons. The salaries (Time Inc., 1987) are for the 1987 season, and the performance measures are from 1986 (Reichler, 1987).

data baseball;
   length team $ 12;
   input name $ 1-18 nAtBat nHits nHome nRuns nRBI nBB
         yrMajor crAtBat crHits crHome crRuns crRbi crBB
         League $ Division $ Team $ Position $ nOuts nAssts
         nError Salary;
   label name="Player's Name"
      nAtBat="Times at Bat in 1986"
      nHits="Hits in 1986"
      nHome="Home Runs in 1986"
      nRuns="Runs in 1986"
      nRBI="RBIs in 1986"
      nBB="Walks in 1986"
      yrMajor="Years in the Major Leagues"
      crAtBat="Career times at bat"
      crHits="Career Hits"
      crHome="Career Home Runs"
      crRuns="Career Runs"
      crRbi="Career RBIs"
      crBB="Career Walks"
      League="League at the end of 1986"
      Division="Division at the end of 1986"
      Team="Team at the end of 1986"
      Position="Position(s) in 1986"
      nOuts="Put Outs in 1986"
      nAssts="Assists in 1986"
      nError="Errors in 1986"
      Salary="1987 Salary in $ Thousands";
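   /* Combine league and division into a single four-level variable */
   if League='American' and Division='East' then Div='AE';
   if League='American' and Division='West' then Div='AW';
   if League='National' and Division='East' then Div='NE';
   if League='National' and Division='West' then Div='NW';
   /* Log salary (not used by the models in this example) */
   logSalary = log(Salary);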
   datalines;
Allanson, Andy       293    66     1    30    29    14
                       1   293    66     1    30    29    14
                    American East Cleveland C 446 33 20 .
Ashby, Alan          315    81     7    24    38    39

   ... more lines ...   

Wilson, Willie       631   170     9    77    44    31
                      11  4908  1457    30   775   357   249
                    American West KansasCity CF 408 4 3 1000
;

Suppose you want to investigate how the MLB players’ salaries for the 1987 season depend on performance measures for the players’ previous season and MLB careers. As a starting point for such an analysis, you can use the following statements to obtain a parsimonious conditional median model at $\tau =0.5$:

proc quantselect data=baseball;
   class Div;
   model Salary = nAtBat nHits nHome nRuns nRBI nBB yrMajor crAtBat
                  crHits crHome crRuns crRbi crBB nAssts nError nOuts
                  Div
         / selection=lasso(adaptive stop=aic choose=sbc sh=7);
run;

The SELECTION=LASSO(ADAPTIVE) option in the MODEL statement specifies the adaptive LASSO method (Zou, 2006), which controls the effect selection process. The STOP=AIC option specifies that Akaike’s information criterion (AIC) be used to determine the stopping condition. The CHOOSE=SBC option specifies that the Schwarz Bayesian information criterion (SBC) be used to determine the final selected model. The SH= option specifies the stop horizon: the selection process stops as soon as, for some step $s\in \{ 0,1,\ldots \} $, the STOP= criterion values at steps $s+1, \ldots , s+\mbox{SH}$ are all worse than the value at step $s$.

Figure 78.1 shows the Model Information table, which lists the effect selection settings. The default quantile type is single level, so effects are selected only for $\tau =0.5$.

Figure 78.1: Model Information

The QUANTSELECT Procedure

Model Information
Data Set WORK.BASEBALL
Dependent Variable Salary
Selection Method Adaptive LASSO
Quantile Type Single Level
Stop Criterion AIC
Choose Criterion SBC


Figure 78.2 summarizes the effect selection process, which starts with an intercept-only model at step 0. At step 1, the effect that corresponds to career runs (crRuns) is added to the model, which reduces the AIC value from 2691.6511 to 2510.7297. Step 10 has the minimum AIC, and step 7 has the minimum SBC. This is expected: the SBC penalizes each additional effect more heavily than the AIC does, so it favors a smaller model.

Figure 78.2: Selection Summary

The QUANTSELECT Procedure
 
Quantile = 0.5

Selection Summary
Step  Effect Entered  Number Effects In  AIC         SBC
   0  Intercept                       1  2691.6511   2695.2232
   1  crRuns                          2  2510.7297   2517.8740
   2  nHits                           3  2470.4807   2481.1971
   3  crHome                          4  2463.5953   2477.8839
   4  nBB                             5  2463.7806   2481.6414
   5  nOuts                           6  2455.6212   2477.0541
   6  Div AW                          7  2451.4609   2476.4660
   7  nAtBat                          8  2445.0446   2473.6218*
   8  crBB                            9  2445.5432   2477.6926
   9  nHome                          10  2443.4818   2479.2033
  10  nRuns                          11  2442.6036*  2481.8973
  11  crAtBat                        12  2442.8387   2485.7046
  12  yrMajor                        13  2443.5374   2489.9754
  13  nError                         14  2445.2085   2495.2187
  14  Div NE                         15  2446.4042   2499.9865
  15  nRBI                           16  2448.2003   2505.3547
  16  nAssts                         17  2449.9731   2510.6997
* Optimal Value Of Criterion
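
The gap between the two criteria in Figure 78.2 can be checked by hand. The following sketch assumes the usual definitions, in which the AIC and the SBC share the same goodness-of-fit term and differ only in the penalty on the number of effects $p$ in the model ($2p$ for the AIC, $p\ln n$ for the SBC). The number of observations $n$ is not stated in this example, so the value inferred here is an assumption.

$$
\mathrm{SBC}-\mathrm{AIC} = p\,(\ln n-2)
$$

At step 0 ($p=1$) the gap is $2695.2232-2691.6511=3.5721$, and at step 7 ($p=8$) it is $2473.6218-2445.0446=28.5772\approx 8\times 3.5721$. A per-effect gap of 3.5721 implies $\ln n\approx 5.5721$, or roughly $n\approx 263$ observations used in the fit. Each additional effect therefore costs about 3.57 more under the SBC than under the AIC, which is consistent with the SBC reaching its minimum at an earlier step.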


Figure 78.3 shows that the selection process stopped at a local minimum of the STOP= criterion, which occurs at step 10. Because SH=7 is specified, the process stops there: none of the AIC values for step 11 through step 17 is less than the AIC at step 10. Step 17 is omitted from the selection summary table because it is the last step.

Figure 78.3: Stop Reason

Selection stopped at a local minimum of the AIC criterion.
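
The stop-horizon rule can be reproduced from the AIC column of Figure 78.2. The following DATA step is only an illustrative sketch, not the procedure's internal algorithm: it looks for the first step whose AIC is not improved on by any of the next SH steps. Because step 17 does not appear in the table, the sketch truncates the horizon at step 16, which is an assumption of this example.

```sas
data _null_;
   /* AIC values for steps 0-16, copied from Figure 78.2 */
   array aic[0:16] _temporary_
      (2691.6511 2510.7297 2470.4807 2463.5953 2463.7806 2455.6212
       2451.4609 2445.0446 2445.5432 2443.4818 2442.6036 2442.8387
       2443.5374 2445.2085 2446.4042 2448.2003 2449.9731);
   sh = 7;                               /* stop horizon, as in SH=7 */
   do s = 0 to 16;
      worse = 1;                         /* assume no later step improves on step s */
      do k = s + 1 to min(s + sh, 16);   /* horizon truncated at the last listed step */
         if aic[k] < aic[s] then worse = 0;
      end;
      if worse then leave;               /* first step that satisfies the rule */
   end;
   put "Candidate stopping step: " s;
run;
```

With the values above, earlier local minima such as step 3 and step 7 fail the test because a smaller AIC appears within the horizon, whereas every listed step after step 10 has a larger AIC.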


Figure 78.4 shows how the final selected model is determined. CHOOSE=SBC is specified in this example, so the model at step 7 is chosen as the final selected model.

Figure 78.4: Selection Reason

The model at step 7 is selected where SBC is 2473.62.
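
The CHOOSE=SBC decision amounts to picking the step that has the smallest SBC. The following DATA step is a minimal sketch of that comparison that uses the SBC column of Figure 78.2; it scans all listed steps, and the variable names best and bestSBC are introduced only for this illustration.

```sas
data _null_;
   /* SBC values for steps 0-16, copied from Figure 78.2 */
   array sbc[0:16] _temporary_
      (2695.2232 2517.8740 2481.1971 2477.8839 2481.6414 2477.0541
       2476.4660 2473.6218 2477.6926 2479.2033 2481.8973 2485.7046
       2489.9754 2495.2187 2499.9865 2505.3547 2510.6997);
   best = 0;                             /* index of the smallest SBC seen so far */
   do s = 1 to 16;
      if sbc[s] < sbc[best] then best = s;
   end;
   bestSBC = sbc[best];
   put best= bestSBC=;
run;
```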


Figure 78.5 shows the final selected effects, and Figure 78.6 shows the parameter estimates for the final selected model.

Figure 78.5: Selected Effects

The QUANTSELECT Procedure
 
Quantile = 0.5

Selected Effects: Intercept nAtBat nHits nBB crHome crRuns nOuts Div AW


Figure 78.6: Parameter Estimates

Parameter Estimates
Parameter  DF      Estimate  Standardized Estimate
Intercept   1    -18.187539                      0
nAtBat      1     -1.582714              -0.500417
nHits       1      7.044354               0.686968
nBB         1      2.053726               0.097911
crHome      1      1.429926               0.272726
crRuns      1      0.425955               0.316167
nOuts       1      0.282803               0.175489
Div AW      1    -57.671778              -0.056862


Quantile regression can fit a conditional quantile model at any quantile level $\tau \in (0,1)$, so it can describe the entire distribution of a response variable conditional on covariate effects. To further investigate the effects that might influence the MLB players’ salaries, you can also conduct effect selection at $\tau =0.1$ and $\tau =0.9$, which correspond to low-end and high-end salaries, respectively. The following statements use the same selection settings that are used in the previous program:


proc quantselect data=baseball;
   class Div;
   model Salary = nAtBat nHits nHome nRuns nRBI nBB yrMajor crAtBat
                  crHits crHome crRuns crRbi crBB nAssts nError nOuts
                  Div
         / quantiles=0.1 0.9 selection=lasso(adaptive stop=aic choose=sbc sh=7);
run;

Figure 78.7 shows the effect selection summary with $\tau =0.1$.

Figure 78.7: Selection Summary: $\tau =0.1$

The QUANTSELECT Procedure
 
Quantile = 0.1

Selection Summary
Step  Effect Entered  Effect Removed  Number Effects In  AIC         SBC
   0  Intercept                                       1  2008.3489   2011.9211
   1  crRuns                                          2  1918.7675   1925.9118
   2  nHits                                           3  1897.2425   1907.9590*
   3  yrMajor                                         4  1897.2476   1911.5362
   4  crBB                                            5  1896.1765   1914.0373
   5  nBB                                             6  1894.1257   1915.5587
   6  Div AW                                          7  1891.0577   1916.0628
   7  crHome                                          8  1892.2575   1920.8348
   8                  crHome                          7  1891.0577   1916.0628
   9  nAtBat                                          8  1889.6365*  1918.2137
  10  nHome                                           9  1891.2595   1923.4089
  11  crHome                                         10  1890.7463   1926.4679
  12  nRBI                                           11  1892.7421   1932.0358
  13  Div NW                                         12  1892.9745   1935.8404
  14                  nRBI                           11  1891.3217   1930.6154
  15  crAtBat                                        12  1891.8810   1934.7468
* Optimal Value Of Criterion


Figure 78.8 shows the parameter estimates for the final selected model with $\tau =0.1$. You can see from Figure 78.8 that low-end salaries for MLB players depend mainly on career runs and hits in 1986.

Figure 78.8: Parameter Estimates: $\tau =0.1$

The QUANTSELECT Procedure
 
Quantile = 0.1

Parameter Estimates
Parameter  DF      Estimate  Standardized Estimate
Intercept   1     -4.397043                      0
nHits       1      0.878564               0.085678
crRuns      1      0.327350               0.242977


Figure 78.9 shows the effect selection summary with $\tau =0.9$.

Figure 78.9: Selection Summary: $\tau =0.9$

The QUANTSELECT Procedure
 
Quantile = 0.9

Selection Summary
Step  Effect Entered  Effect Removed  Number Effects In  AIC         SBC
   0  Intercept                                       1  2436.7289   2440.3011
   1  crHits                                          2  2197.4349   2204.5792
   2  crRbi                                           3  2183.6148   2194.3313
   3  nHits                                           4  2113.2757   2127.5643
   4                  crRbi                           3  2127.8632   2138.5797
   5  crRbi                                           4  2113.2757   2127.5643
   6                  crRbi                           3  2127.8632   2138.5797
   7  crRbi                                           4  2113.2757   2127.5643
   8  crHome                                          5  2099.2203   2117.0811
   9                  crRbi                           4  2099.3891   2113.6777
  10  crRbi                                           5  2099.2203   2117.0811
  11                  crRbi                           4  2099.3891   2113.6777
  12  nOuts                                           5  2067.1926   2085.0533
  13  Div AW                                          6  2048.2393   2069.6723
  14  crRuns                                          7  2028.8040   2053.8090
  15  nAtBat                                          8  2012.8195   2041.3968
  16                  crHits                          7  2017.0290   2042.0341
  17  crRbi                                           8  2009.3551   2037.9324
  18  crAtBat                                         9  2011.2415   2043.3908
  19                  crRbi                           8  2011.4053   2039.9825
  20  crRbi                                           9  2011.2415   2043.3908
  21                  crAtBat                         8  2009.3551   2037.9324
  22  crAtBat                                         9  2011.2415   2043.3908
  23  nBB                                            10  2004.5033   2040.2249
  24                  crAtBat                         9  2003.1023   2035.2517*
  25  crAtBat                                        10  2004.5033   2040.2249
  26                  crAtBat                         9  2003.1023   2035.2517
  27  crAtBat                                        10  2004.5033   2040.2249
  28  nError                                         11  2004.2230   2043.5167
  29  crHits                                         12  2003.0544   2045.9203
  30  Div NE                                         13  2001.9603   2048.3983
  31  Div AE                                         14  2001.8349*  2051.8451
  32  nRuns                                          15  2003.5961   2057.1784
  33  nHome                                          16  2004.2721   2061.4266
  34  nRBI                                           17  2006.0023   2066.7289
  35  yrMajor                                        18  2007.9975   2072.2963
  36  crBB                                           19  2009.9514   2077.8223
  37  nAssts                                         20  2011.9095   2083.3525
* Optimal Value Of Criterion


Figure 78.10 shows the parameter estimates for the final selected model with $\tau =0.9$.

Figure 78.10: Parameter Estimates: $\tau =0.9$

Parameter Estimates
Parameter  DF      Estimate  Standardized Estimate
Intercept   1     92.893875                      0
nAtBat      1     -1.858170              -0.587509
nHits       1      8.155573               0.795335
nBB         1      3.392794               0.161751
crHome      1      3.191472               0.608700
crRuns      1      1.394317               1.034939
crRbi       1     -0.913371              -0.664951
nOuts       1      0.437241               0.271323
Div AW      1   -167.110005              -0.164764
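
To see how the three fitted models describe different parts of the salary distribution, you can evaluate them for a single player. The following DATA step is a sketch under stated assumptions: the covariate values describe a hypothetical player who is invented for illustration, and the Div AW parameter is assumed to multiply a 0/1 indicator for membership in the American League West. The coefficients are copied from Figure 78.8, Figure 78.6, and Figure 78.10.

```sas
data _null_;
   /* Hypothetical player: every covariate value below is invented for illustration */
   nAtBat = 500;  nHits  = 140;  nBB   = 50;  nOuts = 300;
   crHome = 100;  crRuns = 600;  crRbi = 500;
   DivAW  = 0;    /* assumed 0/1 indicator for the "Div AW" parameter */

   /* tau = 0.1: estimates copied from Figure 78.8 */
   q10 = -4.397043 + 0.878564*nHits + 0.327350*crRuns;

   /* tau = 0.5: estimates copied from Figure 78.6 */
   q50 = -18.187539 - 1.582714*nAtBat + 7.044354*nHits + 2.053726*nBB
         + 1.429926*crHome + 0.425955*crRuns + 0.282803*nOuts
         - 57.671778*DivAW;

   /* tau = 0.9: estimates copied from Figure 78.10 */
   q90 = 92.893875 - 1.858170*nAtBat + 8.155573*nHits + 3.392794*nBB
         + 3.191472*crHome + 1.394317*crRuns - 0.913371*crRbi
         + 0.437241*nOuts - 167.110005*DivAW;

   /* Predicted 1987 salary quantiles, in $ thousands */
   put q10= q50= q90=;
run;
```

The three values written to the log are the predicted 10th, 50th, and 90th percentiles of salary for that player. Because each quantile level has its own selected effects, a covariate such as crRbi contributes only to the $\tau =0.9$ prediction.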


To visually illustrate how the model evolves through the selection process, the QUANTSELECT procedure provides the coefficient plot, the average check loss plot, and several criterion plots in either packed or unpacked forms. You can request these plots by using the PLOTS= option. The following statements request all the plots for the baseball data at $\tau =0.1$; they also use the STOP=AIC criterion, the CHOOSE=SBC criterion, and the SH=7 option:

ods graphics on;
proc quantselect data=baseball plot=all;
   class Div;
   model Salary = nAtBat nHits nHome nRuns nRBI nBB yrMajor crAtBat
                  crHits crHome crRuns crRbi crBB nAssts nError nOuts
                  Div
         / quantiles=0.1 selection=lasso(adaptive stop=aic choose=sbc sh=7);
run;

Figure 78.11 shows the progression of the parameter estimates as the selection process proceeds.

Figure 78.11: Coefficient Panel: $\tau =0.1$

Coefficient Panel: $\tau =0.1$


Figure 78.12 shows the progression of the average check losses as the selection process proceeds.

Figure 78.12: Average Check Loss Plot: $\tau =0.1$

Average Check Loss Plot: $\tau =0.1$
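
For reference, the check loss that underlies this plot can be written as follows. This is the standard definition from quantile regression; the notation ($r$ for a residual, $\hat{\boldsymbol{\beta}}(\tau)$ for the estimates at a given step, ACL for the average check loss) is assumed here rather than taken from this example.

$$
\rho_\tau(r) = r\,\bigl(\tau - I(r<0)\bigr), \qquad
\mathrm{ACL}(\tau) = \frac{1}{n}\sum_{i=1}^{n}\rho_\tau\bigl(y_i-\mathbf{x}_i'\hat{\boldsymbol{\beta}}(\tau)\bigr)
$$

For example, at $\tau =0.1$ a residual of $+10$ contributes $10\times 0.1=1$ to the sum, whereas a residual of $-10$ contributes $-10\times (0.1-1)=9$. Overpredictions are therefore penalized nine times as heavily as underpredictions, which pulls the fitted values toward the low end of the salary distribution.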


Figure 78.13 shows the progression of four effect selection criteria as the selection process proceeds.

Figure 78.13: Criterion Panel: $\tau =0.1$

Criterion Panel: $\tau =0.1$