Quantile Regression

Quantile regression generalizes the concept of a univariate quantile to a conditional quantile given one or more covariates. Recall that a student's score on a test is at the $\tau$th quantile if his or her score is better than that of $100\tau\%$ of the students who took the test. The score is also said to be at the $100\tau$th percentile.

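For a random variable $Y$ with probability distribution function

$$F(y) = \operatorname{Prob}(Y \le y)$$

the $\tau$th quantile of $Y$ is defined as the inverse function

$$Q(\tau) = \inf\{\, y : F(y) \ge \tau \,\}$$

where $0 < \tau < 1$. In particular, the median is $Q(1/2)$.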
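As a small illustration of the inverse-function definition, the sketch below applies $Q(\tau) = \inf\{y : F(y) \ge \tau\}$ to the empirical distribution of a sample. The function names and the test scores are hypothetical and chosen only for this example; other quantile conventions (for instance, ones that interpolate between order statistics) can give slightly different values.

```python
def empirical_cdf(sample, y):
    """Fraction of observations that are less than or equal to y."""
    return sum(1 for v in sample if v <= y) / len(sample)


def empirical_quantile(sample, tau):
    """Smallest observed y whose empirical CDF value is at least tau."""
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    for y in sorted(sample):
        if empirical_cdf(sample, y) >= tau:
            return y


# Hypothetical test scores, used only for illustration.
scores = [55, 62, 70, 71, 78, 84, 90, 93]
print(empirical_quantile(scores, 0.5))  # 71
print(empirical_quantile(scores, 0.9))  # 93
```

Scanning the sorted sample and stopping at the first value with $F(y) \ge \tau$ is a direct transcription of the infimum, which is why the result is always one of the observed values.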

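For a random sample $\{y_1, \ldots, y_n\}$ of $Y$, it is well known that the sample median minimizes the sum of absolute deviations:

$$\text{median} = \operatorname*{arg\,min}_{\xi \in \mathbb{R}} \sum_{i=1}^{n} |y_i - \xi|$$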
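Likewise, the general $\tau$th sample quantile $\xi(\tau)$, which is the analog of $Q(\tau)$, is formulated as the minimizer

$$\xi(\tau) = \operatorname*{arg\,min}_{\xi \in \mathbb{R}} \sum_{i=1}^{n} \rho_\tau(y_i - \xi)$$

where $\rho_\tau(z) = z\,(\tau - I(z < 0))$, $0 < \tau < 1$, and where $I(\cdot)$ denotes the indicator function. The loss function $\rho_\tau$ assigns a weight of $\tau$ to positive residuals $y_i - \xi$ and a weight of $1 - \tau$ to negative residuals.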
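The asymmetric weighting can be checked numerically. The following sketch evaluates the check loss $\rho_\tau(z) = z(\tau - I(z < 0))$ and finds a sample quantile by minimizing the total loss over the observed values. Restricting the search to observed values is an assumption of this sketch; it relies on the objective being piecewise linear and convex with kinks only at the data points, so a minimizer can always be found among them. The names `check_loss`, `total_loss`, and `sample_quantile` and the data are hypothetical.

```python
def check_loss(z, tau):
    """rho_tau(z) = z * (tau - I(z < 0))."""
    return z * (tau - (1.0 if z < 0 else 0.0))


def total_loss(sample, xi, tau):
    """Sum of check losses of the residuals y_i - xi."""
    return sum(check_loss(y - xi, tau) for y in sample)


def sample_quantile(sample, tau):
    """An observed value that minimizes the total check loss."""
    return min(sorted(sample), key=lambda xi: total_loss(sample, xi, tau))


# A positive residual is weighted by tau, a negative one by 1 - tau.
print(check_loss(2.0, 0.25))   # 0.5
print(check_loss(-2.0, 0.25))  # 1.5

# Hypothetical data, used only for illustration.
data = [2.0, 3.5, 1.0, 7.0, 4.0]
print(sample_quantile(data, 0.5))  # 3.5
print(sample_quantile(data, 0.9))  # 7.0
```

When $n\tau$ is an integer the minimizer is not unique (the objective is flat between two adjacent order statistics), and this sketch simply returns one of the minimizing observations.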

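Using this loss function, the linear conditional quantile function extends the $\tau$th sample quantile $\xi(\tau)$ to the regression setting in the same way that the linear conditional mean function extends the sample mean. Recall that OLS regression estimates the linear conditional mean function $E(Y \mid X = \mathbf{x}) = \mathbf{x}^{\prime}\boldsymbol{\beta}$ by solving for

$$\hat{\boldsymbol{\beta}} = \operatorname*{arg\,min}_{\boldsymbol{\beta} \in \mathbb{R}^p} \sum_{i=1}^{n} (y_i - \mathbf{x}_i^{\prime}\boldsymbol{\beta})^2$$

The estimated parameter $\hat{\boldsymbol{\beta}}$ minimizes the sum of squared residuals in the same way that the sample mean $\hat{\mu}$ minimizes the sum of squares:

$$\hat{\mu} = \operatorname*{arg\,min}_{\mu \in \mathbb{R}} \sum_{i=1}^{n} (y_i - \mu)^2$$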
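Likewise, quantile regression estimates the linear conditional quantile function, $Q_Y(\tau \mid X = \mathbf{x}) = \mathbf{x}^{\prime}\boldsymbol{\beta}(\tau)$, by solving

$$\hat{\boldsymbol{\beta}}(\tau) = \operatorname*{arg\,min}_{\boldsymbol{\beta} \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_\tau(y_i - \mathbf{x}_i^{\prime}\boldsymbol{\beta})$$

for $\tau \in (0, 1)$. The quantity $\hat{\boldsymbol{\beta}}(\tau)$ is called the $\tau$th regression quantile. The case $\tau = 0.5$, which minimizes the sum of absolute residuals, corresponds to median regression, which is also known as $L_1$ regression.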
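To make the minimization concrete, here is a brute-force sketch for the simplest regression case: one covariate plus an intercept. It rests on an assumption that is not stated in this section, namely that the problem is a linear program and therefore has an optimal line passing through two data points with distinct covariate values, so it is enough to enumerate such lines. This is meant only as an illustration and is not a description of the algorithms the QUANTREG procedure uses; the function name `fit_quantile_line` and the data are hypothetical.

```python
from itertools import combinations


def check_loss(z, tau):
    """rho_tau(z) = z * (tau - I(z < 0))."""
    return z * (tau - (1.0 if z < 0 else 0.0))


def fit_quantile_line(xs, ys, tau):
    """Return (intercept, slope) minimizing the total check loss.

    Enumerates every line through two data points with distinct x
    values. Cost grows as O(n^3), so this suits tiny examples only.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a valid fit
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(
            check_loss(y - (intercept + slope * x), tau)
            for x, y in zip(xs, ys)
        )
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Hypothetical data: four points on y = 1 + 2x and one outlier.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 30]
intercept, slope = fit_quantile_line(xs, ys, 0.5)
print(intercept, slope)  # 1.0 2.0
```

In this example the median regression line ignores the outlier at $x = 4$ and recovers $y = 1 + 2x$, which illustrates why minimizing absolute rather than squared residuals is less sensitive to extreme responses.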

The set of regression quantiles

$$\{\, \hat{\boldsymbol{\beta}}(\tau) : \tau \in (0, 1) \,\}$$

is referred to as the quantile process.

The QUANTREG procedure computes the quantile function $Q_Y(\tau \mid X = \mathbf{x})$ and conducts statistical inference on the estimated parameters $\hat{\boldsymbol{\beta}}(\tau)$.