Selection Models

In sample selection models, one or several dependent variables are observed when another variable takes certain values. For example, the standard Heckman selection model can be defined as

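$$
\begin{aligned}
z_i^* &= w_i'\gamma + u_i \\
z_i &= \begin{cases} 1 & \text{if } z_i^* > 0 \\ 0 & \text{if } z_i^* \le 0 \end{cases} \\
y_i &= x_i'\beta + \varepsilon_i \quad \text{if } z_i = 1
\end{aligned}
$$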

where $u_i$ and $\varepsilon_i$ are jointly normal with zero mean, standard deviations of 1 and $\sigma$, and correlation $\rho$. $z_i$ is the variable on which selection is based, and $y_i$ is observed only when $z_i = 1$. Least squares regression of $y_i$ on $x_i$ using only the observed data produces inconsistent estimates of $\beta$: when $\rho \neq 0$, the errors of the selected observations have a nonzero conditional mean that depends on $w_i'\gamma$ and is therefore generally correlated with $x_i$. The maximum likelihood method is used to estimate selection models. These models can also be estimated with Heckman's two-step method, which is computationally cheaper. However, it can be shown that the resulting estimates, although consistent, are not asymptotically efficient under the normality assumption. Moreover, this method often violates the constraint $|\rho| \le 1$ on the correlation coefficient.
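
The inconsistency of least squares on the selected sample shows up clearly in a small simulation. The Python sketch below (standard library only) draws data in which one regressor $x_{1i}$ enters both the outcome and the selection equation and a second regressor enters only the selection equation; the parameter values, the seed, and the helper names `simulate_heckman` and `ols_slope` are illustrative assumptions, not part of the source. It compares the least squares slope computed on the latent outcomes of all draws (never available in practice) with the slope computed only on draws with $z_i = 1$.

```python
import math
import random


def simulate_heckman(n, beta, gamma, sigma, rho, seed=0):
    """Simulate n draws from a Heckman selection model (illustrative setup).

    Outcome:   y_i  = beta[0] + beta[1]*x1 + eps_i
    Selection: z*_i = gamma[0] + gamma[1]*x1 + gamma[2]*w2 + u_i
    with sd(u) = 1, sd(eps) = sigma, corr(u, eps) = rho.
    Returns x1, the latent y for every draw, and the 0/1 selection indicator z.
    """
    rng = random.Random(seed)
    x1s, ys, zs = [], [], []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        w2 = rng.gauss(0.0, 1.0)  # appears only in the selection equation
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        u = e1
        eps = sigma * (rho * e1 + math.sqrt(1.0 - rho ** 2) * e2)
        z_star = gamma[0] + gamma[1] * x1 + gamma[2] * w2 + u
        x1s.append(x1)
        ys.append(beta[0] + beta[1] * x1 + eps)
        zs.append(1 if z_star > 0 else 0)
    return x1s, ys, zs


def ols_slope(x, y):
    """Slope from a least squares regression of y on x with an intercept."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


x1, y, z = simulate_heckman(20000, beta=(1.0, 1.0), gamma=(0.5, 1.0, 1.0),
                            sigma=1.0, rho=0.8, seed=42)
selected = [i for i, zi in enumerate(z) if zi == 1]
slope_all = ols_slope(x1, y)  # infeasible: uses y even where z == 0
slope_selected = ols_slope([x1[i] for i in selected], [y[i] for i in selected])
print(f"true slope 1.0 | all draws {slope_all:.3f} | selected only {slope_selected:.3f}")
```

With $\rho = 0.8$ the selected-sample slope falls noticeably below the true value of 1 while the all-draws slope does not; rerunning with `rho=0.0` removes the gap, because the selected errors then have zero conditional mean.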

The log-likelihood function of the Heckman selection model is written as

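$$
\begin{aligned}
\ell = {} & \sum_{\{i:\, z_i = 0\}} \ln\left[1 - \Phi(w_i'\gamma)\right] \\
& + \sum_{\{i:\, z_i = 1\}} \left\{ \ln \phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right) - \ln \sigma + \ln \Phi\!\left(\frac{w_i'\gamma + \rho\,(y_i - x_i'\beta)/\sigma}{\sqrt{1-\rho^2}}\right) \right\}
\end{aligned}
$$
where $\phi(\cdot)$ and $\Phi(\cdot)$ are the standard normal density and cumulative distribution functions.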
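
The log-likelihood can be evaluated directly: an observation with $z_i = 0$ contributes $\ln[1 - \Phi(w_i'\gamma)]$, and an observation with $z_i = 1$ contributes the normal log density of $y_i$ plus $\ln\Phi\big((w_i'\gamma + \rho e_i)/\sqrt{1-\rho^2}\big)$, where $e_i = (y_i - x_i'\beta)/\sigma$. The plain-Python sketch below assumes each row of `X` and `W` already contains an intercept column where one is wanted; the function name `heckman_loglik` and this data layout are hypothetical. A real estimator would hand such a function to a numerical optimizer while keeping $\sigma > 0$ and $|\rho| < 1$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # provides phi (pdf) and Phi (cdf)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def heckman_loglik(beta, gamma, sigma, rho, X, W, y, z):
    """Log-likelihood of the Heckman selection model.

    X, W: lists of regressor rows for the outcome and selection equations.
    y: outcomes (ignored, and may be None, where z == 0).
    z: 0/1 selection indicators. Requires sigma > 0 and |rho| < 1.
    """
    root = math.sqrt(1.0 - rho ** 2)
    ll = 0.0
    for xi, wi, yi, zi in zip(X, W, y, z):
        a = dot(wi, gamma)  # w_i' gamma
        if zi == 0:
            # ln[1 - Phi(a)], written as ln Phi(-a) for numerical accuracy
            ll += math.log(STD_NORMAL.cdf(-a))
        else:
            e = (yi - dot(xi, beta)) / sigma  # standardized outcome residual
            ll += (math.log(STD_NORMAL.pdf(e)) - math.log(sigma)
                   + math.log(STD_NORMAL.cdf((a + rho * e) / root)))
    return ll


X = [[1.0, 0.3], [1.0, -1.2], [1.0, 0.8]]
W = [[1.0, 0.5], [1.0, 0.1], [1.0, -0.4]]
y = [1.7, None, 0.2]
z = [1, 0, 1]
print(heckman_loglik([0.5, 1.0], [0.2, 0.7], 1.5, 0.4, X, W, y, z))
```

When $\rho = 0$ the expression splits into a probit log-likelihood for $z_i$ plus a normal regression log-likelihood for the observed $y_i$, which makes a convenient sanity check.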

Selection can be based on only one variable, but it can govern the observation of several dependent variables. For example, in the following switching regression model,

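$$
\begin{aligned}
z_i^* &= w_i'\gamma + u_i \\
z_i &= \begin{cases} 1 & \text{if } z_i^* > 0 \\ 0 & \text{if } z_i^* \le 0 \end{cases} \\
y_{1i} &= x_{1i}'\beta_1 + \varepsilon_{1i} \quad \text{if } z_i = 0 \\
y_{2i} &= x_{2i}'\beta_2 + \varepsilon_{2i} \quad \text{if } z_i = 1
\end{aligned}
$$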

$z_i$ is the variable on which selection is based. If $z_i = 0$, then $y_{1i}$ is observed; if $z_i = 1$, then $y_{2i}$ is observed. Because $y_{1i}$ and $y_{2i}$ are never observed for the same observation, the correlation between $y_{1i}$ and $y_{2i}$ cannot be estimated. Only the correlation between $z_i$ and $y_{1i}$ and the correlation between $z_i$ and $y_{2i}$ can be estimated. The model is estimated by maximum likelihood.
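
The source states only that this model is estimated by maximum likelihood. As a sketch, assume the same joint normality as in the Heckman model, with $\sigma_1, \sigma_2$ the standard deviations of $\varepsilon_{1i}, \varepsilon_{2i}$ and $\rho_1, \rho_2$ their correlations with $u_i$. Each observation then contributes the density of whichever outcome is observed times the probability of its regime given that outcome's error. The function below is our own derivation under these assumptions, not taken from the source, and the names `switching_loglik`, `sigma1`, `rho1`, and so on are hypothetical. No parameter for the correlation between $\varepsilon_{1i}$ and $\varepsilon_{2i}$ appears anywhere in it, which is exactly why that correlation cannot be estimated.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def switching_loglik(gamma, beta1, sigma1, rho1, beta2, sigma2, rho2,
                     W, X1, X2, y, z):
    """Log-likelihood of the two-regime switching regression (sketch).

    y[i] holds y_1i when z[i] == 0 and y_2i when z[i] == 1.
    rho1 = corr(u, eps1), rho2 = corr(u, eps2); corr(eps1, eps2) never enters.
    """
    ll = 0.0
    for wi, x1i, x2i, yi, zi in zip(W, X1, X2, y, z):
        a = dot(wi, gamma)  # w_i' gamma
        if zi == 0:
            e = (yi - dot(x1i, beta1)) / sigma1
            # P(z = 0 | eps1) = P(u <= -a | eps1), with u | e ~ N(rho1*e, 1 - rho1^2)
            p = STD_NORMAL.cdf(-(a + rho1 * e) / math.sqrt(1.0 - rho1 ** 2))
            ll += math.log(STD_NORMAL.pdf(e)) - math.log(sigma1) + math.log(p)
        else:
            e = (yi - dot(x2i, beta2)) / sigma2
            # P(z = 1 | eps2) = P(u > -a | eps2)
            p = STD_NORMAL.cdf((a + rho2 * e) / math.sqrt(1.0 - rho2 ** 2))
            ll += math.log(STD_NORMAL.pdf(e)) - math.log(sigma2) + math.log(p)
    return ll
```

Each regime uses only its own $(\beta, \sigma, \rho)$, so observations with $z_i = 1$ carry no information about $\rho_1$, and vice versa.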

A brief example of the code for this model can be found in Sample Selection Model.

The Heckman selection model can include censoring or truncation. For a brief example of the code for these models, see Sample Selection Model with Truncation and Censoring. In the following example, the dependent variable is censored from below at zero.

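$$
\begin{aligned}
z_i^* &= w_i'\gamma + u_i \\
z_i &= \begin{cases} 1 & \text{if } z_i^* > 0 \\ 0 & \text{if } z_i^* \le 0 \end{cases} \\
y_i^* &= x_i'\beta + \varepsilon_i \quad \text{if } z_i = 1 \\
y_i &= \begin{cases} y_i^* & \text{if } y_i^* > 0 \\ 0 & \text{if } y_i^* \le 0 \end{cases}
\end{aligned}
$$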

In this case, the log-likelihood function of the Heckman selection model needs to be modified to include the censored region.

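$$
\begin{aligned}
\ell = {} & \sum_{\{i:\, z_i = 0\}} \ln\left[1 - \Phi(w_i'\gamma)\right] \\
& + \sum_{\{i:\, z_i = 1,\, y_i > 0\}} \left\{ \ln \phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right) - \ln \sigma + \ln \Phi\!\left(\frac{w_i'\gamma + \rho\,(y_i - x_i'\beta)/\sigma}{\sqrt{1-\rho^2}}\right) \right\} \\
& + \sum_{\{i:\, z_i = 1,\, y_i = 0\}} \ln \int_{-w_i'\gamma}^{\infty} \int_{-\infty}^{-x_i'\beta/\sigma} \phi_2(u, v, \rho)\, dv\, du
\end{aligned}
$$
where $\phi_2(\cdot, \cdot, \rho)$ is the standard bivariate normal density with correlation $\rho$.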
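
For an observation with $z_i = 1$ and $y_i = 0$, the likelihood contribution is the bivariate normal probability $P(u_i > -w_i'\gamma,\ \varepsilon_i/\sigma \le -x_i'\beta/\sigma)$. Flipping the sign of $u_i$ turns this into $\Phi_2(w_i'\gamma, -x_i'\beta/\sigma; -\rho)$, where $\Phi_2(h, k; r)$ denotes the standard bivariate normal CDF with correlation $r$. The Python standard library has no bivariate normal CDF, so the sketch below computes it by one-dimensional quadrature, $\Phi_2(h,k;r) = \int_{-\infty}^{h}\phi(t)\,\Phi\big((k - rt)/\sqrt{1-r^2}\big)\,dt$, using Simpson's rule with the lower limit truncated at $-10$. This quadrature approach and the names `bvn_cdf` and `censored_heckman_loglik` are our own illustrative choices, not taken from the source.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bvn_cdf(h, k, r, lower=-10.0, n=2000):
    """P(U <= h, V <= k) for standard bivariate normal (U, V) with corr r, |r| < 1.

    Integrates phi(t) * Phi((k - r*t) / sqrt(1 - r^2)) over t in [lower, h]
    with composite Simpson's rule (n must be even); mass below `lower` is ignored.
    """
    if h <= lower:
        return 0.0
    root = math.sqrt(1.0 - r * r)
    step = (h - lower) / n

    def f(t):
        return STD_NORMAL.pdf(t) * STD_NORMAL.cdf((k - r * t) / root)

    total = f(lower) + f(h)
    for j in range(1, n):
        total += (4.0 if j % 2 else 2.0) * f(lower + j * step)
    return total * step / 3.0


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def censored_heckman_loglik(beta, gamma, sigma, rho, X, W, y, z):
    """Log-likelihood of the Heckman model with y censored from below at 0.

    y[i] is ignored (may be None) when z[i] == 0; y[i] == 0 marks a censored value.
    """
    root = math.sqrt(1.0 - rho * rho)
    ll = 0.0
    for xi, wi, yi, zi in zip(X, W, y, z):
        a = dot(wi, gamma)  # w_i' gamma
        if zi == 0:
            ll += math.log(STD_NORMAL.cdf(-a))  # ln[1 - Phi(a)]
        elif yi > 0:
            e = (yi - dot(xi, beta)) / sigma
            ll += (math.log(STD_NORMAL.pdf(e)) - math.log(sigma)
                   + math.log(STD_NORMAL.cdf((a + rho * e) / root)))
        else:
            b = dot(xi, beta) / sigma
            # P(u > -a, eps/sigma <= -b) = P(-u < a, eps/sigma <= -b);
            # corr(-u, eps) = -rho
            ll += math.log(bvn_cdf(a, -b, -rho))
    return ll
```

A quick accuracy check for `bvn_cdf` is the closed form $\Phi_2(0, 0; r) = 1/4 + \arcsin(r)/(2\pi)$; a production implementation would more likely use a dedicated bivariate normal routine than generic quadrature.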

If $y_i$ is truncated from below at zero instead of censored, the likelihood function can be written as