The ENTROPY Procedure (Experimental)

Generalized Maximum Entropy for Multinomial Discrete Choice Models

Multinomial discrete choice models take the form of an experiment that consists of $n$ trials. On each trial, one of $k$ alternatives is observed. If $y_{ij}$ is the random variable that takes the value 1 when alternative $j$ is selected on the $i$th trial and 0 otherwise, then the probability that $y_{ij}$ is 1, conditional on a vector of regressors $X_i$ and an unknown parameter vector $\beta_j$, is

\[  \mathrm{Pr}( y_{ij} = 1 \mid X_i, \beta_j) = G( X_i' \beta_j )  \]

where $G(\cdot)$ is a link function. For noisy data, the model becomes

\[  y_{ij} = G( X_i' \beta_j ) + \epsilon_{ij} = p_{ij} + \epsilon_{ij}  \]
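As a concrete illustration, the sketch below computes the choice probabilities when $G$ is the multinomial logit link, which maps the linear indices $X_i' \beta_j$ through a softmax. The function name and the random data are illustrative only, not part of PROC ENTROPY.

```python
import numpy as np

def choice_probabilities(X, beta):
    """Multinomial logit link: p[i, j] = exp(X[i] @ beta[:, j]) / sum over j'.

    X    : (n, d) matrix of regressors, one row per trial
    beta : (d, k) matrix, one column of parameters per alternative
    """
    utilities = X @ beta                               # (n, k) linear indices X_i' beta_j
    utilities -= utilities.max(axis=1, keepdims=True)  # stabilize the exponentials
    expu = np.exp(utilities)
    return expu / expu.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))        # 5 trials, 3 regressors
beta = rng.normal(size=(3, 4))     # 4 alternatives
p = choice_probabilities(X, beta)  # each row of p sums to 1
```

Each row of `p` is a valid probability distribution over the $k$ alternatives, which is exactly the adding-up condition imposed on $p_{ij}$ in the primal problem below.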

For multinomial logit, the standard maximum likelihood approach is equivalent to the maximum entropy solution for discrete choice models. The generalized maximum entropy approach avoids assuming a specific functional form for the link function $G(\cdot)$.

The generalized maximum entropy for discrete choice models (GME-D) is written in primal form as

\begin{align*}
\text{maximize} \quad & H(p, w) = -p' \ln(p) - w' \ln(w) \\
\text{subject to} \quad & (I_j \otimes X')\, y = (I_j \otimes X')\, p + (I_j \otimes X')\, V w \\
& \sum_{j=1}^{k} p_{ij} = 1 \quad \text{for } i = 1, \dots, N \\
& \sum_{m=1}^{L} w_{ijm} = 1 \quad \text{for } i = 1, \dots, N \text{ and } j = 1, \dots, k
\end{align*}

Golan, Judge, and Miller (1996) have shown that the dual unconstrained formulation of the GME-D can be viewed as a general class of logit models. Additionally, as the sample size increases, the solution of the dual problem approaches the maximum likelihood solution. Because of these characteristics, only the dual approach is available for the GME-D estimation method.
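Because the dual solution approaches the maximum likelihood solution as the sample size grows, a maximum likelihood multinomial logit fit illustrates this limiting behavior. The sketch below (assuming SciPy is available; all names are hypothetical and this is not PROC ENTROPY's implementation) minimizes the multinomial logit negative log-likelihood, normalizing the first alternative's parameters to zero for identification.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(beta_flat, X, Y, k):
    """Multinomial logit negative log-likelihood, the large-sample limit of
    the GME-D dual objective. Parameters for the first alternative are
    normalized to zero for identification."""
    n, d = X.shape
    beta = np.zeros((d, k))
    beta[:, 1:] = beta_flat.reshape(d, k - 1)
    u = X @ beta
    u -= u.max(axis=1, keepdims=True)                       # numerical stability
    logp = u - np.log(np.exp(u).sum(axis=1, keepdims=True))  # log softmax
    return -(Y * logp).sum()

# simulate choices from a known multinomial logit model
rng = np.random.default_rng(1)
n, d, k = 200, 2, 3
X = rng.normal(size=(n, d))
true_beta = np.array([[0.0, 1.0, -1.0],
                      [0.0, -0.5, 0.5]])
u = X @ true_beta
pr = np.exp(u) / np.exp(u).sum(axis=1, keepdims=True)
choices = np.array([rng.choice(k, p=pr[i]) for i in range(n)])
Y = np.eye(k)[choices]  # one-hot indicators y_ij

res = minimize(neg_log_likelihood, np.zeros(d * (k - 1)), args=(X, Y, k))
beta_hat = res.x.reshape(d, k - 1)
```

With the default BFGS method, `res.hess_inv` approximates the inverse Hessian of the objective at the optimum, the same quantity used to compute the covariance of the parameter estimates from the dual form.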

The parameters $\beta_j$ are the Lagrange multipliers of the constraints. The covariance matrix of the parameter estimates is computed as the inverse of the Hessian of the dual form of the objective function.