The ENTROPY Procedure (Experimental)

Censored or Truncated Dependent Variables

In practice, variables are not always measured over their entire natural range. A variable might be recorded continuously within some range, while outside that range only the endpoint is recorded. For example, suppose that the data generating process is:

$y_\mi {i} = \mb{x}_\mi {i} \mb{\beta } + \epsilon .$

However, you observe the following:

$y^{\star }_\mi {i} = \left\{ \begin{array}{r@{\quad :\quad }l} \mi{ub} & y_\mi {i} \ge \mi{ub} \\ \mb{x}_\mi {i} \mb{\beta } + \epsilon & \mi{lb} < y_\mi {i} < \mi{ub} \\ \mi{lb} & y_\mi {i} \le \mi{lb} \end{array} \right.$
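
The recording rule above can be illustrated with a minimal Python sketch. The function name `censor` and the bounds 0 and 5 are assumptions chosen for illustration; they are not part of PROC ENTROPY.

```python
def censor(y, lb, ub):
    """Map a latent value y to its recorded value under two-sided censoring."""
    if y >= ub:
        return ub   # at or above the upper bound: only the endpoint is recorded
    if y <= lb:
        return lb   # at or below the lower bound: only the endpoint is recorded
    return y        # strictly inside (lb, ub): the value itself is recorded


# Hypothetical latent values and bounds, for illustration only.
latent = [-1.5, 0.2, 3.7, 5.0, 9.1]
observed = [censor(y, lb=0.0, ub=5.0) for y in latent]
print(observed)
```

Values below 0 or above 5 collapse onto the nearest endpoint, so the recorded series no longer reveals how far outside the range the latent value was.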

The primal problem is a slight modification of the primal formulation for GME-GCE. You specify different supports for the errors in the truncated or censored region, perhaps reflecting some nonsample information, and then modify the data constraints: the constraints that arise in the censored regions become inequality constraints (Golan, Judge, and Perloff, 1997). Let $\mb{X}^{u}$ denote the observations of the explanatory variables where censoring occurs from the top, $\mb{X}^{l}$ the observations where censoring occurs from the bottom, and $\mb{X}^{a}$ the observations in the middle region (no censoring). Likewise, let $\mb{V}^{u}$ be the error supports for the observations at the upper bound, $\mb{V}^{l}$ the supports at the lower bound, and $\mb{V}^{a}$ the supports in the middle region.

You have:

$\left[ \begin{array}{c} \mb{y} ^\mi {u} \ge \mi{ub} \\ \mb{y} ^\mi {a} \\ \mb{y} ^\mi {l} \le \mi{lb} \end{array} \right] = \left[ \begin{array}{c} \mb{X}^{u} \\ \mb{X}^{a} \\ \mb{X}^{l} \end{array} \right] \mb{Z} \mb{p} + \left[ \begin{array}{c} \mb{V}^{u}\mb{w} ^{u} \\ \mb{V}^{a}\mb{w} ^{a} \\ \mb{V}^{l}\mb{w} ^{l} \end{array} \right]$
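
The stacking of observations into upper, middle, and lower blocks can be sketched in Python as follows. The helper name `partition_by_region` and the sample data are hypothetical, and the sketch assumes that a recorded value equal to a bound marks a censored observation, as in the recording rule given earlier.

```python
def partition_by_region(X, y_obs, lb, ub):
    """Split rows of X and recorded y into upper (u), uncensored (a), and lower (l) blocks."""
    blocks = {
        "u": {"X": [], "y": []},
        "a": {"X": [], "y": []},
        "l": {"X": [], "y": []},
    }
    for row, y in zip(X, y_obs):
        if y >= ub:
            region = "u"   # censored from the top
        elif y <= lb:
            region = "l"   # censored from the bottom
        else:
            region = "a"   # observed without censoring
        blocks[region]["X"].append(row)
        blocks[region]["y"].append(y)
    return blocks


# Hypothetical design matrix (intercept plus one regressor) and recorded responses.
X = [[1.0, 0.5], [1.0, 1.2], [1.0, 2.0], [1.0, 3.1]]
y_obs = [0.0, 1.7, 5.0, 5.0]
blocks = partition_by_region(X, y_obs, lb=0.0, ub=5.0)
print(blocks["u"]["X"], blocks["a"]["X"], blocks["l"]["X"])
```

Each block then receives its own error supports, while the parameter supports are shared across all three blocks.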

The primal problem then becomes

$\begin{eqnarray*} \mr{maximize} & H(\mb{p},\mb{w}) \: = \: -\mb{p}' \, \ln (\mb{p}) \: - \: \mb{w}' \, \ln (\mb{w}) \\ \mr{subject\, to} & \mb{y} ^\mi {a} \: = \: \mb{X}^{a} \, \mb{Z} \, \mb{p} \: + \: \mb{V}^{a} \, \mb{w} ^{a} \\ & \mb{y} ^\mi {u} \: \ge \: \mb{X}^{u} \, \mb{Z} \, \mb{p} \: + \: \mb{V}^{u} \, \mb{w} ^{u} \\ & \mb{y} ^\mi {l} \: \le \: \mb{X}^{l} \, \mb{Z} \, \mb{p} \: + \: \mb{V}^{l} \, \mb{w} ^{l} \\ & 1_{K} \: = \: (I_{K} \, \otimes \, 1_{L}') \, \mb{p} \\ & 1_{T} \: = \: (I_{T} \, \otimes \, 1_{L}') \, \mb{w} \end{eqnarray*}$
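
To make the objective and the two adding-up constraints concrete, the following Python sketch evaluates the entropy of a candidate pair of weight vectors and checks that each block of support weights sums to one. The function names and the numeric weights are assumptions for illustration; the sketch does not solve the optimization problem and does not check the data constraints.

```python
import math


def entropy(probs):
    """Shannon entropy -sum(q * ln q), treating 0 * ln 0 as 0."""
    return -sum(q * math.log(q) for q in probs if q > 0)


def blocks_sum_to_one(probs, block_size, tol=1e-9):
    """Check the Kronecker adding-up constraint: each consecutive block sums to 1."""
    if len(probs) % block_size != 0:
        return False
    return all(
        abs(sum(probs[i:i + block_size]) - 1.0) <= tol
        for i in range(0, len(probs), block_size)
    )


# Hypothetical weights: K = 2 parameters and T = 2 observations,
# each with 3 support points.
p = [0.2, 0.5, 0.3, 1 / 3, 1 / 3, 1 / 3]
w = [0.25, 0.5, 0.25, 0.1, 0.8, 0.1]

H = entropy(p) + entropy(w)   # value of the objective H(p, w)
feasible = blocks_sum_to_one(p, 3) and blocks_sum_to_one(w, 3)
print(H, feasible)
```

A block with uniform weights contributes the largest possible entropy for its size ($\ln 3$ for three support points), which is why the solution stays as close to uniform as the data constraints allow.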

PROC ENTROPY requires that the number of supports be identical for all three regions.

Alternatively, the dependent variable might be observed continuously over most of its range, while for some observations only the range that contains the value is reported. Such data are often found in highly disaggregated state-level employment measures.

$y^{\star }_\mi {i} = \left\{ \begin{array}{r@{\quad :\quad }l} \mi{missing} & l_1 \le y_\mi {i} \le r_1 \\ \vdots & \vdots \\ \mi{missing} & l_\mi {k} \le y_\mi {i} \le r_\mi {k} \\ \mb{x} _\mi {i} \mb{\beta } + \epsilon & \mi{otherwise} \\ \end{array} \right.$

Just as in the censored case, each range yields two inequality constraints for each observation in that range.
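
The pair of inequalities for a range-reported observation can be sketched in Python as follows. The names `interval_constraints` and `satisfies_range`, the brackets, and the fitted values are hypothetical; `fitted` stands for the model's value of the latent variable for that observation.

```python
def interval_constraints(fitted, left, right):
    """Return the two inequalities implied by knowing only that left <= y <= right."""
    return {
        "lower_ok": fitted >= left,    # first inequality: value is not below the range
        "upper_ok": fitted <= right,   # second inequality: value is not above the range
    }


def satisfies_range(fitted, left, right):
    """True when both inequality constraints hold for this observation."""
    checks = interval_constraints(fitted, left, right)
    return checks["lower_ok"] and checks["upper_ok"]


# Hypothetical reporting brackets and fitted latent values.
ranges = [(0.0, 100.0), (100.0, 250.0)]
fitted = [40.0, 260.0]
covered = [satisfies_range(f, l, r) for f, (l, r) in zip(fitted, ranges)]
print(covered)
```

The second observation violates its upper inequality, so a candidate solution that produces it would be infeasible.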