# The QUANTSELECT Procedure (Experimental)

## Getting Started: QUANTSELECT Procedure

This example demonstrates how you can use the QUANTSELECT procedure to select covariate effects for quantile regression. The data set under investigation contains salary and performance information for Major League Baseball (MLB) players, excluding pitchers, who played at least one game in both the 1986 and 1987 seasons. The salaries (Time Inc., 1987) are for the 1987 season, and the performance measures are from 1986 (Reichler, 1987).

```sas
data baseball;
   length team $12;
   input name $ 1-18 nAtBat nHits nHome nRuns nRBI nBB
         yrMajor crAtBat crHits crHome crRuns crRbi crBB
         League $ Division $ Team $ Position $ nOuts nAssts
         nError Salary;
   label name="Player's Name"
         nAtBat="Times at Bat in 1986"
         nHits="Hits in 1986"
         nHome="Home Runs in 1986"
         nRuns="Runs in 1986"
         nRBI="RBIs in 1986"
         nBB="Walks in 1986"
         yrMajor="Years in the Major Leagues"
         crAtBat="Career times at bat"
         crHits="Career Hits"
         crHome="Career Home Runs"
         crRuns="Career Runs"
         crRbi="Career RBIs"
         crBB="Career Walks"
         League="League at the end of 1986"
         Division="Division at the end of 1986"
         Team="Team at the end of 1986"
         Position="Position(s) in 1986"
         nOuts="Put Outs in 1986"
         nAssts="Assists in 1986"
         nError="Errors in 1986"
         Salary="1987 Salary in $ Thousands";
   if League='American' and Division='East' then Div='AE';
   if League='American' and Division='West' then Div='AW';
   if League='National' and Division='East' then Div='NE';
   if League='National' and Division='West' then Div='NW';
   logSalary = log(Salary);
   datalines;
Allanson, Andy       293    66     1    30    29    14
1   293    66     1    30    29    14
American East Cleveland C 446 33 20 .
Ashby, Alan          315    81     7    24    38    39

... more lines ...

Wilson, Willie       631   170     9    77    44    31
11  4908  1457    30   775   357   249
American West KansasCity CF 408 4 3 1000
;
```
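Each player occupies three data lines. The first holds the name in columns 1 through 18, followed by six 1986 batting counts. The second holds the seven career measures. The third holds league, division, team, position, put-outs, assists, errors, and salary.

The following Python sketch mirrors that record layout and the `Div` and `logSalary` derivations, so the structure can be checked outside SAS. It makes two assumptions. First, `parse_player` is a hypothetical helper, not part of SAS. Second, a period (the SAS missing value) is mapped to `None`. SAS `LOG` is the natural logarithm, so `math.log` is used.

```python
import math

# Variable order of the INPUT statement after the name (columns 1-18).
FIELDS = ["nAtBat", "nHits", "nHome", "nRuns", "nRBI", "nBB",
          "yrMajor", "crAtBat", "crHits", "crHome", "crRuns", "crRbi", "crBB",
          "League", "Division", "Team", "Position", "nOuts", "nAssts",
          "nError", "Salary"]
CHAR_FIELDS = {"League", "Division", "Team", "Position"}
# Same mapping as the four IF-THEN statements in the DATA step.
DIV_CODES = {("American", "East"): "AE", ("American", "West"): "AW",
             ("National", "East"): "NE", ("National", "West"): "NW"}


def parse_player(lines):
    """Parse one three-line player record (hypothetical helper)."""
    name = lines[0][:18].strip()  # column input: name $ 1-18
    tokens = lines[0][18:].split() + lines[1].split() + lines[2].split()
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(tokens)}")
    rec = {"name": name}
    for field, tok in zip(FIELDS, tokens):
        if field in CHAR_FIELDS:
            rec[field] = tok
        else:
            rec[field] = None if tok == "." else float(tok)  # "." is missing
    rec["Div"] = DIV_CODES.get((rec["League"], rec["Division"]))
    salary = rec["Salary"]
    rec["logSalary"] = math.log(salary) if salary is not None else None
    return rec


wilson = parse_player([
    "Wilson, Willie       631   170     9    77    44    31",
    "11  4908  1457    30   775   357   249",
    "American West KansasCity CF 408 4 3 1000",
])
print(wilson["name"], wilson["Div"], round(wilson["logSalary"], 4))
```

Records with a missing salary, such as the first one in the listing, get `logSalary = None`. This corresponds to the missing value that SAS produces for `log(.)`.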


Suppose you want to investigate how the MLB players’ salaries for the 1987 season depend on performance measures for the players’ previous season and MLB careers. As a starting point for such an analysis, you can use the following statements to obtain a parsimonious conditional median model at $\tau = 0.5$:

```sas
proc quantselect data=baseball;
   class Div;
   model Salary = nAtBat nHits nHome nRuns nRBI nBB yrMajor crAtBat
                  crHits crHome crRuns crRbi crBB nAssts nError nOuts
                  Div
         / selection=lasso(adaptive stop=aic choose=sbc sh=7);
run;
```


The SELECTION=LASSO(ADAPTIVE) option in the MODEL statement specifies the adaptive LASSO method (Zou, 2006) for the effect selection process. The STOP=AIC option specifies that Akaike’s information criterion (AIC) be used to determine when the selection stops. The CHOOSE=SBC option specifies that the Schwarz Bayesian information criterion (SBC) be used to determine the final selected model. The SH= option specifies the number of stop horizons, $h$. The selection process stops at step $s$ when the STOP= criterion values at each of the next $h$ steps, $s+1, \ldots, s+h$, are all no better than the value at step $s$. This example uses SH=7.
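The stop-horizon rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the QUANTSELECT implementation. The hypothetical function `stop_step` scans criterion values in step order, where smaller is better. It returns the first step $s$ whose next $h$ values are all no better than the value at step $s$. The criterion values are made up for illustration.

```python
def stop_step(values, horizon):
    """Return the first step s such that values[s+1], ..., values[s+horizon]
    are all >= values[s] (no improvement), or None if no step qualifies
    within the available values. Hypothetical sketch of a stop-horizon rule."""
    for s in range(len(values) - horizon):
        window = values[s + 1 : s + horizon + 1]
        if all(v >= values[s] for v in window):
            return s
    return None


# Hypothetical criterion values by step (smaller is better).
toy = [10.0, 8.0, 7.0, 7.5, 6.0, 6.2, 6.3, 6.1, 6.4]
print(stop_step(toy, horizon=1))
print(stop_step(toy, horizon=3))
```

With a horizon of 1, the rule stops at the first uptick (step 2 here). With a horizon of 3, it looks past that temporary increase and continues to step 4, which has a smaller criterion value. This is why a larger SH= value can lead the process past a shallow local minimum.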

Figure 78.1 shows the Model Information table, which displays the effect selection settings. You can see that the default quantile type is single level, so this effect selection applies only to $\tau = 0.5$.

Figure 78.1: Model Information

The QUANTSELECT Procedure

| Model Information | |
|---|---|
| Data Set | WORK.BASEBALL |
| Dependent Variable | Salary |
| Quantile Type | Single Level |
| Stop Criterion | AIC |
| Choose Criterion | SBC |

Figure 78.2 summarizes the effect selection process, which starts with an intercept-only model at step 0. At step 1, the effect for career runs (crRuns) is added to the model, which reduces the AIC value from 2691.6511 to 2510.7297. You can see that step 10 has the minimum AIC and that step 7 has the minimum SBC. This is expected. The SBC penalizes each additional parameter by $\ln n$ instead of the AIC's 2, which is a heavier penalty whenever $n > e^2 \approx 7.4$, so the SBC tends to favor a smaller model than the AIC.

Figure 78.2: Selection Summary


Quantile = 0.5

Selection Summary

| Step | Effect Entered | Number Effects In | AIC | SBC |
|---:|---|---:|---:|---:|
| 0 | Intercept | 1 | 2691.6511 | 2695.2232 |
| 1 | crRuns | 2 | 2510.7297 | 2517.8740 |
| 2 | nHits | 3 | 2470.4807 | 2481.1971 |
| 3 | crHome | 4 | 2463.5953 | 2477.8839 |
| 4 | nBB | 5 | 2463.7806 | 2481.6414 |
| 5 | nOuts | 6 | 2455.6212 | 2477.0541 |
| 6 | Div AW | 7 | 2451.4609 | 2476.4660 |
| 7 | nAtBat | 8 | 2445.0446 | 2473.6218* |
| 8 | crBB | 9 | 2445.5432 | 2477.6926 |
| 9 | nHome | 10 | 2443.4818 | 2479.2033 |
| 10 | nRuns | 11 | 2442.6036* | 2481.8973 |
| 11 | crAtBat | 12 | 2442.8387 | 2485.7046 |
| 12 | yrMajor | 13 | 2443.5374 | 2489.9754 |
| 13 | nError | 14 | 2445.2085 | 2495.2187 |
| 14 | Div NE | 15 | 2446.4042 | 2499.9865 |
| 15 | nRBI | 16 | 2448.2003 | 2505.3547 |
| 16 | nAssts | 17 | 2449.9731 | 2510.6997 |
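The AIC and SBC columns can be used to check why the SBC prefers a smaller model. The check rests on an assumption that the table supports: the two criteria share the same goodness-of-fit term and differ only in their penalties. Those penalties are $2p$ for AIC and $p \ln n$ for SBC, where $p$ is the Number Effects In value. Under that assumption, $\mathrm{SBC} - \mathrm{AIC} = p(\ln n - 2)$ at every step. This Python sketch recovers the implied sample size $n$ from each row of Figure 78.2:

```python
import math

# AIC and SBC for steps 0-16, copied from the Selection Summary (Figure 78.2).
aic = [2691.6511, 2510.7297, 2470.4807, 2463.5953, 2463.7806, 2455.6212,
       2451.4609, 2445.0446, 2445.5432, 2443.4818, 2442.6036, 2442.8387,
       2443.5374, 2445.2085, 2446.4042, 2448.2003, 2449.9731]
sbc = [2695.2232, 2517.8740, 2481.1971, 2477.8839, 2481.6414, 2477.0541,
       2476.4660, 2473.6218, 2477.6926, 2479.2033, 2481.8973, 2485.7046,
       2489.9754, 2495.2187, 2499.9865, 2505.3547, 2510.6997]
n_effects = list(range(1, 18))  # "Number Effects In" column: 1, 2, ..., 17

# Assumed: SBC - AIC = p * (ln n - 2), so ln n = 2 + (SBC - AIC) / p.
implied_n = [math.exp(2 + (s - a) / p) for a, s, p in zip(aic, sbc, n_effects)]
print([round(x) for x in implied_n])
```

Every row implies $n \approx 263$. Each additional effect therefore costs about $\ln 263 \approx 5.57$ under the SBC, compared with 2 under the AIC, which is consistent with the SBC reaching its minimum at an earlier step.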

Figure 78.3 shows that the selection process stopped at a local minimum of the STOP= criterion, which occurs at step 10. Under the SH=7 option, the effect selection process stops at step 10 because the AIC values at steps 11 through 17 are all no less than the AIC at step 10. Step 17 is omitted from the selection summary table because it is the last step.
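The stopping claim can be checked against the AIC column of Figure 78.2 with a short Python sketch. Step 17 is not displayed in the table, so this check covers only steps 11 through 16. It confirms that step 10 has the smallest displayed AIC and that no later displayed step improves on it.

```python
# AIC values for steps 0-16, copied from the Selection Summary (Figure 78.2).
aic = [2691.6511, 2510.7297, 2470.4807, 2463.5953, 2463.7806, 2455.6212,
       2451.4609, 2445.0446, 2445.5432, 2443.4818, 2442.6036, 2442.8387,
       2443.5374, 2445.2085, 2446.4042, 2448.2003, 2449.9731]

stop_at = 10
# Step with the smallest displayed AIC.
best_step = min(range(len(aic)), key=aic.__getitem__)
# Every displayed step after the stopping step is no better (no smaller).
later_no_better = all(aic[k] >= aic[stop_at] for k in range(stop_at + 1, len(aic)))
print(best_step, later_no_better)
```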

Figure 78.3: Stop Reason

 Selection stopped at a local minimum of the AIC criterion.

Figure 78.4 shows how the final selected model is determined. Because CHOOSE=SBC is specified in this example, the model at step 7, which has the minimum SBC, is chosen as the final selected model.
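The CHOOSE=SBC rule amounts to picking the step with the smallest SBC. This Python sketch applies that rule to the SBC column of Figure 78.2 (steps 0 through 16, as displayed). It illustrates the rule and is not the procedure's code.

```python
# SBC values for steps 0-16, copied from the Selection Summary (Figure 78.2).
sbc = [2695.2232, 2517.8740, 2481.1971, 2477.8839, 2481.6414, 2477.0541,
       2476.4660, 2473.6218, 2477.6926, 2479.2033, 2481.8973, 2485.7046,
       2489.9754, 2495.2187, 2499.9865, 2505.3547, 2510.6997]

# The chosen model is the step whose SBC is smallest.
chosen = min(range(len(sbc)), key=sbc.__getitem__)
print(chosen, sbc[chosen])
```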

Figure 78.4: Selection Reason

 The model at step 7 is selected where SBC is 2473.62.

Figure 78.5 shows the final selected effects, and Figure 78.6 shows the parameter estimates for the final selected model.

Figure 78.5: Selected Effects


Quantile = 0.5

Selected Effects: Intercept nAtBat nHits nBB crHome crRuns nOuts Div AW

Figure 78.6: Parameter Estimates

Parameter Estimates

| Parameter | DF | Estimate | Standardized Estimate |
|---|---:|---:|---:|
| Intercept | 1 | -18.187539 | 0 |
| nAtBat | 1 | -1.582714 | -0.500417 |
| nHits | 1 | 7.044354 | 0.686968 |
| nBB | 1 | 2.053726 | 0.097911 |
| crHome | 1 | 1.429926 | 0.272726 |
| crRuns | 1 | 0.425955 | 0.316167 |
| nOuts | 1 | 0.282803 | 0.175489 |
| Div AW | 1 | -57.671778 | -0.056862 |
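The following Python sketch shows how the estimates in Figure 78.6 combine. It computes the fitted conditional median salary for one player from the DATA step listing, Wilson, Willie, who played in the American League West (so Div = 'AW'). The sketch assumes that the selected `Div AW` effect is a 0/1 indicator for Div = 'AW'. It also uses the rounded estimates shown in the table, so the result is approximate.

```python
# Parameter estimates for the tau = 0.5 model, copied from Figure 78.6.
estimates = {
    "Intercept": -18.187539,
    "nAtBat": -1.582714,
    "nHits": 7.044354,
    "nBB": 2.053726,
    "crHome": 1.429926,
    "crRuns": 0.425955,
    "nOuts": 0.282803,
    "Div AW": -57.671778,
}

# Covariates for Wilson, Willie, from the DATA step listing.
# Assumed: "Div AW" is an indicator equal to 1 for an American West player.
wilson = {"Intercept": 1, "nAtBat": 631, "nHits": 170, "nBB": 31,
          "crHome": 30, "crRuns": 775, "nOuts": 408, "Div AW": 1}

# Linear predictor: sum of estimate * covariate value over selected effects.
predicted_median = sum(estimates[k] * wilson[k] for k in estimates)
print(round(predicted_median, 2))
```

The fitted median is about 675 (in thousands of dollars), compared with Wilson's observed 1987 salary of 1000.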

Quantile regression can fit a conditional quantile model at any quantile level $\tau \in (0, 1)$, so it can describe the entire distribution of a response variable conditional on covariate effects. To further investigate the effects that might affect the MLB players’ salaries, you can also conduct effect selection at $\tau = 0.1$ and $\tau = 0.9$, which correspond to low-end and high-end salaries, respectively. The following statements use the same selection settings as the previous program:


```sas
proc quantselect data=baseball;
   class Div;
   model Salary = nAtBat nHits nHome nRuns nRBI nBB yrMajor crAtBat
                  crHits crHome crRuns crRbi crBB nAssts nError nOuts
                  Div
         / quantiles=0.1 0.9 selection=lasso(adaptive stop=aic choose=sbc sh=7);
run;
```


Figure 78.7 shows the effect selection summary for $\tau = 0.1$.

Figure 78.8 shows the parameter estimates for the final selected model for $\tau = 0.1$. You can see from Figure 78.8 that low-end salaries for MLB players depend mainly on career runs and hits in 1986.

Figure 78.9 shows the effect selection summary for $\tau = 0.9$.

Figure 78.10 shows the parameter estimates for the final selected model for $\tau = 0.9$.

To illustrate visually how the model evolves through the selection process, the QUANTSELECT procedure provides the coefficient plot, the average check loss plot, and several criterion plots, in either packed or unpacked form. You can request these plots by using the PLOTS= option. The following statements request all the plots for the baseball data at $\tau = 0.1$. They also use the STOP=AIC criterion, the CHOOSE=SBC criterion, and the SH=7 option:

```sas
ods graphics on;
proc quantselect data=baseball plots=all;
   class Div;
   model Salary = nAtBat nHits nHome nRuns nRBI nBB yrMajor crAtBat
                  crHits crHome crRuns crRbi crBB nAssts nError nOuts
                  Div
         / quantiles=0.1 selection=lasso(adaptive stop=aic choose=sbc sh=7);
run;
```


Figure 78.11 shows the progression of the parameter estimates as the selection process proceeds.

Figure 78.12 shows the progression of the average check losses as the selection process proceeds.
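The check loss behind Figure 78.12 can be sketched as follows. For a residual $r$, the check loss at quantile level $\tau$ is $\rho_\tau(r) = r\,(\tau - I(r < 0))$, and quantile regression minimizes its sum over the observations. This Python sketch makes two assumptions. First, the plotted quantity is taken to be the mean check loss of the fitted model's residuals. Second, the response values are hypothetical. The sketch also shows that, for a constant-only fit, minimizing the average check loss picks out the sample $\tau$ quantile.

```python
def check_loss(r, tau):
    """Check (pinball) loss: rho_tau(r) = r * (tau - 1{r < 0})."""
    return r * (tau - (1.0 if r < 0 else 0.0))


def average_check_loss(y, q, tau):
    """Mean check loss of the residuals y_i - q for a constant fit q."""
    return sum(check_loss(yi - q, tau) for yi in y) / len(y)


y = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical responses
for tau in (0.1, 0.5, 0.9):
    # For a constant fit, a minimizer can be found among the data points.
    best = min(y, key=lambda q: average_check_loss(y, q, tau))
    print(tau, best, average_check_loss(y, best, tau))
```

At $\tau = 0.5$ the minimizer is the median, 3, and the outlying value 100 does not move it. At $\tau = 0.9$ the minimizer moves to the upper tail of the data.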

Figure 78.13 shows the progression of four effect selection criteria as the selection process proceeds.