Heckman Selection Model Task

About the Heckman Selection Model Task

The Heckman two-step selection method corrects for the bias that can arise when a sample is not randomly selected. Estimation proceeds in two stages. The first stage fits a binary probit model to a selection equation. The second stage estimates an outcome equation on the selected observations, using a correction term derived from the first-stage probit model.
Note: This task is available only if you are running SAS 9.4 (or later) and SAS/ETS 12.3 (or later).
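The quantity that links the two stages is the inverse Mills ratio, phi(z) / Phi(z), evaluated at each selected observation's fitted probit index z. The second stage includes it as an additional regressor in the outcome equation. The following is a minimal Python sketch of that one calculation, not the code that the task generates. The function name and the index values are hypothetical and are used only for illustration.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
STD_NORMAL = NormalDist()


def inverse_mills_ratio(index: float) -> float:
    """Return phi(index) / Phi(index) for a fitted first-stage probit index."""
    return STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)


# Hypothetical fitted probit indexes for three selected observations.
for z in (-1.0, 0.0, 1.5):
    print(f"index={z:+.1f}  lambda={inverse_mills_ratio(z):.4f}")
```

The ratio shrinks as the index grows, so observations that were very likely to be selected receive a smaller correction term.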

Example: Heckman Selection Model Task

To create this example:
  1. Create the Work.Mroz data set. For more information, see MROZ Data Set.
  2. In the Tasks section, expand the Econometrics folder and double-click Heckman Selection Model. The user interface for the Heckman Selection Model task opens.
  3. On the Data tab, select the WORK.MROZ data set.
  4. Assign columns to these roles:
    Role                      Column Name
    Selection Equation
      Dependent variable      inlf
      Continuous variables    nwifeinc, exper, expersq, age, kidslt6, kidsge6
    Outcome Equation
      Dependent variable      lwage
      Continuous variables    exper, expersq
      Categorical variables   educ
  5. To run the task, click Submit SAS Code.
Here is a subset of the results:
[Figure: Example of Results from the Heckman Selection Model]

Assigning Data to Roles

To run the Heckman Selection Model task, you must assign columns to the Dependent variable roles for the selection and outcome equations.
  Role
    Description

Selection Equation
  Dependent variable
    specifies a single numeric column that takes binary values. By default, the task uses the observations for which the dependent variable is equal to 1.
  Continuous variables
    specifies the independent columns (or regressors) to use in the model for the selection equation dependent variable.
  Categorical variables
    specifies the columns whose values are grouped into levels.
  Include the intercept
    specifies whether to include the intercept in the selection equation.

Outcome Equation
  Dependent variable
    specifies a single numeric column to use.
  Continuous variables
    specifies the independent columns (or regressors) to use in the model for the outcome equation dependent variable.
  Categorical variables
    specifies the columns whose values are grouped into levels.
  Include the intercept
    specifies whether to include the intercept in the outcome equation.

Setting Options

  Option
    Description

Methods
  Variance estimation method
    specifies whether the task reports the corrected standard errors or the OLS standard errors.
  Type of covariances of the parameter estimates
    specifies the method to calculate the covariance matrix of the parameter estimates. You can select the covariance from the outer product matrix, from the inverse Hessian matrix, or from both the outer product and Hessian matrices (the quasi-maximum likelihood estimates).

Optimization
  Method
    specifies the iterative minimization method to use. By default, the Quasi-Newton method is used.
  Maximum number of iterations
    specifies the maximum number of iterations for the selected method.

Statistics
  You can specify whether the results include only the statistics that the task creates by default, the default statistics plus any additional statistics that you select, or no statistics.
  Here is the information that you can include in the results:
  • correlation matrix of the parameter estimates
  • covariance matrix of the parameter estimates
  • iteration history of the objective function and parameter estimates
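
The three covariance choices described under Methods can be illustrated with a deliberately small case. The following Python sketch assumes an intercept-only probit model with a single parameter, so each covariance "matrix" reduces to one variance. It is not the computation that the task performs, and the function name and sample data are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def probit_intercept_variances(y):
    """Variance of an intercept-only probit estimate under three formulas."""
    share = sum(y) / len(y)            # fraction of ones; must lie in (0, 1)
    c = STD_NORMAL.inv_cdf(share)      # closed-form MLE of the intercept
    pdf, cdf = STD_NORMAL.pdf(c), STD_NORMAL.cdf(c)

    outer_product = 0.0  # sum of squared per-observation scores
    neg_hessian = 0.0    # minus the sum of per-observation second derivatives
    for yi in y:
        if yi == 1:
            ratio = pdf / cdf
            score = ratio
            second = -c * ratio - ratio ** 2
        else:
            ratio = pdf / (1.0 - cdf)
            score = -ratio
            second = c * ratio - ratio ** 2
        outer_product += score ** 2
        neg_hessian -= second

    return {
        "op": 1.0 / outer_product,                 # outer product
        "hessian": 1.0 / neg_hessian,              # inverse Hessian
        "qml": outer_product / neg_hessian ** 2,   # sandwich of both
    }


# Hypothetical binary sample, used only for illustration.
variances = probit_intercept_variances([1, 0, 1, 1, 0, 1, 0, 1])
for name, value in variances.items():
    print(f"{name:8s}{value:.6f}")
```

In this one-parameter example the three variances coincide, because the fitted probability matches the sample share exactly. In models with more regressors, the three estimates generally differ.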