The BCHOICE Procedure

Gamerman Algorithm

Discrete choice logit models fall in the framework of a generalized linear model (GLM) with a logit link. The Metropolis-Hastings sampling approach of Gamerman (1997) is well suited to this type of model.

In the GLM setting, the data are assumed to be independent with exponential family density

\[ f(\mb{y}_ i|\btheta _ i)=\exp {[(\mb{y}_ i\btheta _ i-b(\btheta _ i))/\phi _ i]}c(\mb{y}_ i,\phi _ i) \]

The means $\bmu _ i=E(\mb{y}_ i|\btheta _ i)$ are related to the canonical parameters $\btheta _ i$ via $\bmu _ i=b’(\btheta _ i)$ and to the regression coefficients via the link function

\[ g(\bmu _ i)=\eta _ i=\mb{X}_ i\bbeta \]

The maximum likelihood (ML) estimator in a GLM and its asymptotic variance are obtained by iterative weighted least squares (IWLS) applied to transformed observations. Following McCullagh and Nelder (1989), define the transformed response as

\[ \tilde{\mb{y}}_ i(\bbeta )=\eta _ i+(\mb{y}_ i-\bmu _ i)g’(\bmu _ i) \]

and define the corresponding weights as

\[ \bW _ i^{-1}(\bbeta )=b”(\btheta _ i)[g’(\bmu _ i)]^2 \]
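For a binary logit, these quantities have a closed form: $g’(\bmu _ i)=1/[\bmu _ i(1-\bmu _ i)]$ and $b”(\btheta _ i)=\bmu _ i(1-\bmu _ i)$, so the weight reduces to $\bW _ i=\bmu _ i(1-\bmu _ i)$. The following is a minimal sketch (function and variable names are illustrative, not part of the procedure):

```python
import numpy as np

def logit_working_response(X, y, beta):
    """Transformed response y~_i(beta) and weights W_i for a binary logit.

    For the logit link, g'(mu) = 1/(mu*(1-mu)) and b''(theta) = mu*(1-mu),
    so W_i^{-1} = b''(theta_i) * g'(mu_i)^2 reduces to W_i = mu_i*(1-mu_i).
    """
    eta = X @ beta                                 # linear predictor eta_i = X_i beta
    mu = 1.0 / (1.0 + np.exp(-eta))                # mu_i = b'(theta_i)
    y_tilde = eta + (y - mu) / (mu * (1.0 - mu))   # eta_i + (y_i - mu_i) g'(mu_i)
    w = mu * (1.0 - mu)                            # diagonal of W(beta)
    return y_tilde, w
```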

Suppose a normal prior is specified on $\bbeta $, $\pi (\bbeta )=\mbox{N}(\bar{\bbeta }, \bOmega _{\bbeta })$. The posterior density is as follows:

\begin{eqnarray*} p(\bbeta | \bY ) & \propto & p(\bY | \bbeta ) \pi (\bbeta )\\ & \propto & \exp {\{ \sum _{i=1}^ N \frac{\mb{y}_ i\btheta _ i-b(\btheta _ i)}{\phi _ i}-\frac{1}{2}(\bbeta -\bar{\bbeta })’\bOmega _{\bbeta }^{-1}(\bbeta -\bar{\bbeta })\} } \end{eqnarray*}
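For the binary logit case with $\phi _ i=1$, the log of this unnormalized posterior can be evaluated directly, because $\btheta _ i=\eta _ i$ and $b(\btheta )=\log (1+e^{\btheta })$ for the canonical link. A hedged sketch (names are illustrative):

```python
import numpy as np

def log_posterior(beta, X, y, beta_bar, Omega_beta_inv):
    """Unnormalized log posterior for a binary logit with a N(beta_bar, Omega_beta) prior.

    The likelihood term sum_i [y_i*theta_i - b(theta_i)] uses theta_i = eta_i
    and b(theta) = log(1 + exp(theta)) for the canonical logit link.
    """
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))  # log(1 + e^eta), computed stably
    d = beta - beta_bar
    logprior = -0.5 * d @ Omega_beta_inv @ d
    return loglik + logprior
```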

Gamerman (1997) proposes that Metropolis-Hastings sampling be combined with iterative weighted least squares as follows:

  1. Start with $\bbeta ^{(0)}$ and $t=1$.

  2. Sample $\bbeta ^*$ from the proposal density $\bN ( \mb{m}^{(t)}, \mb{C}^{(t)})$, where

    \begin{eqnarray*} \mb{m}^{(t)}& =& \{ \bOmega _{\bbeta }^{-1}+\mb{X}’\mb{W}(\bbeta ^{(t-1)})\mb{X}\} ^{-1} \{ \bOmega _{\bbeta }^{-1}\bar{\bbeta }+\mb{X}’\mb{W}(\bbeta ^{(t-1)})\tilde\bY (\bbeta ^{(t-1)})\} \\ \mb{C}^{(t)}& =& \{ \bOmega _{\bbeta }^{-1}+\mb{X}’\mb{W}(\bbeta ^{(t-1)})\mb{X}\} ^{-1} \end{eqnarray*}
  3. Accept $\bbeta ^*$ with probability

    \[ \alpha (\bbeta ^{(t-1)}, \bbeta ^*)=\min \left[1, \frac{p(\bbeta ^*|\bY )q(\bbeta ^{*},\bbeta ^{(t-1)})}{p(\bbeta ^{(t-1)}|\bY )q(\bbeta ^{(t-1)},\bbeta ^{*})}\right] \]

    where $p(\bbeta | \bY )$ is the posterior density and $q(\bbeta ^{*},\bbeta ^{(t-1)})$ and $q(\bbeta ^{(t-1)},\bbeta ^{*})$ are the transition densities that are based on the proposal density $\bN ( \mb{m}^{(.)}, \mb{C}^{(.)})$. More specifically, $q(\bbeta ^{*},\bbeta ^{(t-1)})$ is an $\bN ( \mb{m}^{*}, \mb{C}^{*})$ density that is evaluated at $\bbeta ^{(t-1)}$, where $\mb{m}^*$ and $\mb{C}^*$ have the same expressions as $\mb{m}^{(t)}$ and $\mb{C}^{(t)}$ but depend on $\bbeta ^{*}$ instead of $\bbeta ^{(t-1)}$. If $\bbeta ^*$ is not accepted, the chain stays at $\bbeta ^{(t-1)}$.

  4. Set $t=t+1$ and return to step 2.
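The four steps above can be sketched for a Bayesian binary logit with $\phi _ i=1$. This is an illustrative sketch, not the procedure's implementation; all names are hypothetical, and the proposal moments follow step 2:

```python
import numpy as np

def proposal_moments(beta, X, y, beta_bar, Omega_inv):
    """m and C of the N(m, C) proposal built from IWLS quantities at beta."""
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))
    w = mu * (1.0 - mu)                               # W(beta), diagonal
    y_tilde = eta + (y - mu) / w                      # transformed response
    C = np.linalg.inv(Omega_inv + X.T @ (w[:, None] * X))
    m = C @ (Omega_inv @ beta_bar + X.T @ (w * y_tilde))
    return m, C

def log_post(beta, X, y, beta_bar, Omega_inv):
    """Unnormalized log posterior for the binary logit."""
    eta = X @ beta
    d = beta - beta_bar
    return np.sum(y * eta - np.logaddexp(0.0, eta)) - 0.5 * d @ Omega_inv @ d

def log_mvn(x, m, C):
    """Log N(x; m, C) up to an additive constant (constants cancel in the ratio)."""
    d = x - m
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (logdet + d @ np.linalg.solve(C, d))

def gamerman_sampler(X, y, beta_bar, Omega_inv, n_draws=1000, seed=0):
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])                       # step 1: beta^(0)
    draws = np.empty((n_draws, X.shape[1]))
    accepted = 0
    for t in range(n_draws):
        m, C = proposal_moments(beta, X, y, beta_bar, Omega_inv)
        beta_star = rng.multivariate_normal(m, C)     # step 2: draw from N(m, C)
        m_s, C_s = proposal_moments(beta_star, X, y, beta_bar, Omega_inv)
        # step 3: q(beta*, beta^(t-1)) is N(m*, C*) evaluated at beta^(t-1)
        log_alpha = (log_post(beta_star, X, y, beta_bar, Omega_inv)
                     + log_mvn(beta, m_s, C_s)
                     - log_post(beta, X, y, beta_bar, Omega_inv)
                     - log_mvn(beta_star, m, C))
        if np.log(rng.uniform()) < log_alpha:
            beta = beta_star
            accepted += 1
        draws[t] = beta                               # step 4: next iteration
    return draws, accepted / n_draws
```

Because the proposal is recentered at an IWLS update of the current state, acceptance rates are typically high, which is the main practical appeal of this sampler.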

You can extend this methodology to logit models that have random effects. If there are random effects, the link function is extended to

\[ g(\bmu _ i)=\eta _ i=\mb{X}_ i\bbeta +\mb{Z}_ i\bgamma _ i \]

where the random effects are assumed to have a normal distribution, $\pi (\bgamma _ i)=\mbox{N}(\mb{0}, \bOmega _{\bgamma })$, and $\pi (\bOmega _{\bgamma })=\mbox{inverse Wishart} (\nu _0, \bV _0)$. The posterior density is

\[ p(\bbeta ,\bgamma _1,\ldots , \bgamma _ N, \bOmega _{\bgamma } | \bY ) \propto p(\bY | \bbeta , \bgamma _1,\ldots , \bgamma _ N, \bOmega _{\bgamma }) \pi (\bbeta ) \prod _{i=1}^ N\pi (\bgamma _ i)\pi (\bOmega _{\bgamma }) \]

The parameters are divided into blocks, $\bbeta ,\bgamma _1,\ldots , \bgamma _ N$, and $\bOmega _{\bgamma }$. For the fixed-effects $\bbeta $ block, the conditional posterior has the same form, but the link changes to include $\mb{Z}_ i\bgamma _ i, i=1,\ldots , N$, which are taken as known constants (offsets) at each iteration. The only change that is needed is to replace the transformed response $\tilde{\mb{y}}_ i(\bbeta ^{(t-1)})$ with $\tilde{\mb{y}}_ i(\bbeta ^{(t-1)})-\mb{Z}_ i\bgamma _ i$ in step 2 of the previous Gamerman procedure.

For the random-effects $\bgamma _ i$ block, the same Metropolis-Hastings sampling with the least squares proposal applies. The conditional posterior is

\[ p(\bgamma _ i |\bY , \bbeta , \bOmega _{\bgamma }) \propto \exp {\{ \frac{\mb{y}_ i\btheta _ i-b(\btheta _ i)}{\phi _ i}-\frac{1}{2}\bgamma _ i’\bOmega _{\bgamma }^{-1}\bgamma _ i\} } \]

The transformed response is now $\tilde{\mb{y}}_ i(\bgamma _ i^{(t-1)})$, and the proposal density is $\bN ( \mb{m}_ i^{(t)}, \mb{C}_ i^{(t)})$, where

\begin{eqnarray*} \mb{m}_ i^{(t)}& =& \{ \bOmega _{\bgamma }^{-1}+\mb{Z}_ i’\mb{W}(\bgamma _ i^{(t-1)})\mb{Z}_ i\} ^{-1} \mb{Z}_ i’\mb{W}(\bgamma _ i^{(t-1)})\{ \tilde{\mb{y}}_ i(\bgamma _ i^{(t-1)})-\mb{X}_ i\bbeta \} \\ \mb{C}_ i^{(t)}& =& \{ \bOmega _{\bgamma }^{-1}+\mb{Z}_ i’\mb{W}(\bgamma _ i^{(t-1)})\mb{Z}_ i\} ^{-1} \end{eqnarray*}
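For one random-effects block in the binary logit case, the proposal moments can be computed as follows. This sketch holds the offset $\mb{X}_ i\bbeta $ fixed and uses the zero prior mean of $\bgamma _ i$, so only the data term enters $\mb{m}_ i$; names are illustrative:

```python
import numpy as np

def random_effect_proposal(gamma_i, Zi, yi, offset_i, Omega_gamma_inv):
    """Proposal moments m_i, C_i for one random-effects block (binary logit).

    offset_i = X_i beta is treated as a known constant; the prior mean of
    gamma_i is zero, so no prior-mean term appears in m_i.
    """
    eta = offset_i + Zi @ gamma_i
    mu = 1.0 / (1.0 + np.exp(-eta))
    w = mu * (1.0 - mu)                                # W(gamma_i), diagonal
    y_tilde = eta + (yi - mu) / w                      # transformed response
    C_i = np.linalg.inv(Omega_gamma_inv + Zi.T @ (w[:, None] * Zi))
    m_i = C_i @ (Zi.T @ (w * (y_tilde - offset_i)))
    return m_i, C_i
```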

Finally, for the covariance matrix $\bOmega _{\bgamma }$ block, direct sampling from an $\mbox{inverse Wishart} (\nu _0+N, \bV _0+S)$ distribution is used, where $ S=\sum _{i=1}^{N}\bgamma _ i\bgamma _ i’$.
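This conjugate update can be sampled with only NumPy by drawing $\bOmega _{\bgamma }^{-1}$ from the corresponding Wishart distribution via the Bartlett decomposition and inverting; the function names below are illustrative:

```python
import numpy as np

def invwishart_draw(df, scale, rng):
    """Draw from inverse Wishart(df, scale) via the Bartlett decomposition.

    If Omega ~ inverse Wishart(df, Psi), then Omega^{-1} ~ Wishart(df, Psi^{-1}).
    """
    q = scale.shape[0]
    L = np.linalg.cholesky(np.linalg.inv(scale))   # Psi^{-1} = L L'
    A = np.zeros((q, q))
    for i in range(q):
        A[i, i] = np.sqrt(rng.chisquare(df - i))   # chi-square on the diagonal
        A[i, :i] = rng.normal(size=i)              # standard normals below it
    W = L @ A @ A.T @ L.T                          # Wishart(df, Psi^{-1}) draw
    return np.linalg.inv(W)

def update_covariance(gammas, nu0, V0, rng):
    """Gibbs update for Omega_gamma given the random effects gamma_1..gamma_N."""
    N = gammas.shape[0]
    S = gammas.T @ gammas                          # S = sum_i gamma_i gamma_i'
    return invwishart_draw(nu0 + N, V0 + S, rng)
```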

The chain is initialized with random effects set to 0 and the covariance set to the identity matrix. Updating is done first for the fixed effects, $\bbeta $, as a block to position the chain in the correct region of the parameter space. Then the random effects are updated, and finally the covariance of the random effects is updated. For more information about this algorithm, see Gamerman (1997).