Mixed Logit Model

In mixed logit models, an individual’s utility from any alternative can be decomposed into a deterministic component, $\mathbf{x}_{ij}'\bbeta $, which is a linear combination of observed variables, and a stochastic component, $\xi _{ij}+\epsilon _{ij}$,

\[  U_{ij} = \mathbf{x}_{ij}'\bbeta + \xi _{ij} + \epsilon _{ij}  \]

where $\mathbf{x}_{ij}$ is a vector of observed variables that relate to individual $i$ and alternative $j$, $\bbeta $ is a vector of parameters, $\xi _{ij}$ is an error component that can be correlated among alternatives and heteroscedastic for each individual, and $\epsilon _{ij}$ is a random term with zero mean that is independently and identically distributed over alternatives and individuals. The conditional logit model is derived if you assume $\epsilon _{ij}$ has an iid Gumbel distribution and $V(\xi _{ij})=0$.

The mixed logit model assumes a general distribution for $\xi _{ij}$ and an iid Gumbel distribution for $\epsilon _{ij}$. Denote the density function of the error component $\xi _{ij}$ as $f(\xi _{ij}|\bgamma )$, where $\bgamma $ is a parameter vector of the distribution of $\xi _{ij}$. The choice probability of alternative $j$ for individual $i$ is written as

\[  P_{i}(j) = \int Q_{i}(j|\xi _{ij})f(\xi _{ij}|\bgamma )d\xi _{ij}  \]

where the conditional choice probability for a given value of $\xi _{ij}$ is the logit

\[  Q_{i}(j|\xi _{ij}) = \frac{\exp (\mathbf{x}_{ij}'\bbeta +\xi _{ij})}{\sum _{k\in C_{i}}\exp (\mathbf{x}_{ik}'\bbeta +\xi _{ik})}  \]

Because $\xi _{ij}$ is unobserved, the unconditional choice probability, $P_{i}(j)$, is the integral of the conditional choice probability, $Q_{i}(j|\xi _{ij})$, over the distribution of $\xi _{ij}$. The model is called mixed logit because the choice probability is a mixture of logits with $f(\xi _{ij}|\bgamma )$ as the mixing distribution.
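As an illustration of the conditional choice probability, the following minimal Python sketch evaluates $Q_{i}(j|\xi _{ij})$ for one individual at a fixed value of the error components. The function name, the data, and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def conditional_choice_probabilities(x, beta, xi):
    """Logit probabilities Q_i(j | xi) for one individual, given a value of xi.

    x    : list of observed-variable vectors, one per alternative in the choice set
    beta : parameter vector
    xi   : list of error-component values, one per alternative
    """
    # Utility net of epsilon: x_ij' beta + xi_ij for each alternative j.
    v = [sum(a * b for a, b in zip(row, beta)) + e for row, e in zip(x, xi)]
    # Subtracting the maximum leaves the ratio unchanged and avoids overflow.
    m = max(v)
    expv = [math.exp(u - m) for u in v]
    total = sum(expv)
    return [e / total for e in expv]

# Hypothetical data: three alternatives, two observed variables each.
x = [[1.0, 0.5], [0.0, 1.5], [2.0, 0.0]]
beta = [0.8, -0.4]

q_zero = conditional_choice_probabilities(x, beta, [0.0, 0.0, 0.0])
q_shift = conditional_choice_probabilities(x, beta, [0.0, 2.0, 0.0])
```

With all error components set to zero, the result is the conditional logit probability; raising $\xi _{ij}$ for the second alternative raises its probability at the expense of the others.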

In general, the likelihood of the mixed logit model cannot be evaluated exactly, because the integral that defines $P_{i}(j)$ usually has no closed-form solution. Therefore, the probability is approximated by simulation,

\[  \tilde{P}_{i}(j) = \frac{1}{S} \sum _{s=1}^{S} Q_{i}(j|\xi _{ij}^{s})  \]

where $S$ is the number of simulation replications, $\xi _{ij}^{s}$ is the $s$th draw from $f(\xi _{ij}|\bgamma )$, and $\tilde{P}_{i}(j)$ is the simulated probability. The simulated log-likelihood function is computed as
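The simulated probability can be sketched in Python as an average of logit probabilities over draws of the error component. This is a minimal sketch, not a description of any particular software's algorithm: the helper names, the single normally distributed error component, and the use of pseudo-random draws are all assumptions made for illustration.

```python
import math
import random

def logit_probabilities(utilities):
    """Logit probabilities for one choice set (numerically stabilized)."""
    m = max(utilities)
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

def simulated_probabilities(v, draw_xi, n_draws, rng):
    """Average the conditional logit probability over n_draws draws of xi."""
    acc = [0.0] * len(v)
    for _ in range(n_draws):
        xi = draw_xi(rng)  # one draw of the error components, one value per alternative
        q = logit_probabilities([vj + xj for vj, xj in zip(v, xi)])
        for j, qj in enumerate(q):
            acc[j] += qj
    return [a / n_draws for a in acc]

# Hypothetical setup: deterministic utilities x_ij' beta for three alternatives,
# and one error component shared by the first two alternatives.
v = [0.6, -0.6, 1.6]
z = [1.0, 1.0, 0.0]
sigma = 1.5

def draw_xi(rng):
    eps = rng.gauss(0.0, 1.0)  # assumed standard normal draw
    return [zj * sigma * eps for zj in z]

rng = random.Random(12345)  # fixed seed so the run is reproducible
p_tilde = simulated_probabilities(v, draw_xi, 500, rng)
```

Because each conditional probability vector sums to one, the simulated probabilities also sum to one over the choice set for any number of draws.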

\[  \tilde{\mathcal{L}} = \sum _{i=1}^{N}\sum _{j=1}^{n_{i}} d_{ij}\ln (\tilde{P}_{i}(j))  \]


where

\[  d_{ij} = \left\{  \begin{array}{cl} 1 &  \mr {if \;  individual \; } i \mr {\;  chooses \;  alternative} \;  j \\ 0 &  \mr {otherwise} \end{array} \right.  \]
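A minimal Python sketch of the simulated log-likelihood follows. It takes simulated probabilities as given (the values below are made up for illustration) and uses the fact that $d_{ij}$ is 1 only for the chosen alternative, so a single term survives for each individual.

```python
import math

def simulated_log_likelihood(sim_probs, choices):
    """Sum of d_ij * ln(simulated P_i(j)) over individuals and alternatives.

    sim_probs : one list of simulated probabilities per individual
    choices   : index of the chosen alternative for each individual
    """
    total = 0.0
    for p_i, chosen in zip(sim_probs, choices):
        # Only the chosen alternative has d_ij = 1; all other terms are zero.
        total += math.log(p_i[chosen])
    return total

# Hypothetical simulated probabilities; choice sets may differ in size (n_i).
sim_probs = [[0.2, 0.5, 0.3], [0.6, 0.4]]
choices = [1, 0]
ll = simulated_log_likelihood(sim_probs, choices)  # ln(0.5) + ln(0.6)
```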

For simulation purposes, assume that the error component has a specific structure,

\[  \xi _{ij} = \mathbf{z}_{ij}'\bmu + \mathbf{w}_{ij}'\bbeta ^{*}  \]

where $\mathbf{z}_{ij}$ is a vector of observed data and $\bmu $ is a random vector with zero mean and density function $\psi (\bmu |\bgamma )$. The observed data vector ($\mathbf{z}_{ij}$) of the error component can contain some or all elements of $\mathbf{x}_{ij}$. The component $\mathbf{z}_{ij}'\bmu $ induces heteroscedasticity and correlation across unobserved utility components of the alternatives. This allows flexible substitution patterns among the alternatives. The $k$th element of vector $\bmu $ is distributed as

\[  \mu _{k} \sim (0,\sigma _{k}^{2})  \]

Therefore, $\mu _{k}$ can be specified as

\[  \mu _{k} = \sigma _{k}\epsilon _{\mu }  \]


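where

\[  \epsilon _{\mu } \sim N(0,1)  \]

or

\[  \epsilon _{\mu } \sim U(-\sqrt {3},\sqrt {3})  \]

Both distributions have zero mean and unit variance, so $\sigma _{k}$ is the standard deviation of $\mu _{k}$ in either case.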
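The error-component specification can be sketched in Python as follows. The first function draws $\mu _{k} = \sigma _{k}\epsilon _{\mu }$ under either distribution; the second computes the covariance that $\mathbf{z}_{ij}'\bmu $ induces across alternatives. The function names and data are hypothetical, and the covariance formula assumes that the elements of $\bmu $ are uncorrelated with one another, which is an assumption of this sketch rather than something stated above.

```python
import math
import random

def draw_mu(sigma, rng, dist="normal"):
    """Draw mu_k = sigma_k * eps for each k, where eps has mean 0 and variance 1."""
    mu = []
    for s in sigma:
        if dist == "normal":
            eps = rng.gauss(0.0, 1.0)
        else:
            # U(-sqrt(3), sqrt(3)) has variance (2*sqrt(3))**2 / 12 = 1.
            eps = rng.uniform(-math.sqrt(3.0), math.sqrt(3.0))
        mu.append(s * eps)
    return mu

def error_component_cov(z, sigma):
    """Covariance matrix of z_j' mu across alternatives j.

    Assumes uncorrelated elements of mu, so that
    Cov(z_j' mu, z_l' mu) = sum_k z_jk * z_lk * sigma_k**2.
    """
    n = len(z)
    return [[sum(a * b * s * s for a, b, s in zip(z[j], z[l], sigma))
             for l in range(n)] for j in range(n)]

# Hypothetical data: three alternatives, two error components.
z = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
sigma = [2.0, 0.5]
cov = error_component_cov(z, sigma)
```

In this example the first two alternatives share the first error component and therefore have correlated unobserved utility, while the first and third share none and are uncorrelated; the unequal diagonal entries show the induced heteroscedasticity.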

In addition, $\bbeta ^{*}$ is a vector of random parameters (random coefficients). Random coefficients allow heterogeneity across individuals in their sensitivity to observed exogenous variables. The observed data vector, $\mathbf{w}_{ij}$, is a subset of $\mathbf{x}_{ij}$. The following three types of distributions for the random coefficients are supported, where the $m$th element of $\bbeta ^{*}$ is denoted as $\beta ^{*}_{m}$:

  • Normally distributed coefficient with the mean $b_{m}$ and spread $s_{m}$ being estimated.

    \[  \beta ^{*}_{m} = b_{m} + s_{m}\epsilon _{\beta } \quad \mbox{and} \quad \epsilon _{\beta } \sim N(0,1)  \]
  • Uniformly distributed coefficient with the mean $b_{m}$ and spread $s_{m}$ being estimated. A uniform distribution with mean $b$ and spread $s$ is $U(b-s, b+s)$.

    \[  \beta ^{*}_{m} = b_{m} + s_{m}\epsilon _{\beta } \quad \mbox{and} \quad \epsilon _{\beta } \sim U(-1, 1)  \]
  • Lognormally distributed coefficient. The coefficient is calculated as

    \[  \beta ^{*}_{m} = \exp (b_{m} + s_{m}\epsilon _{\beta }) \quad \mbox{and} \quad \epsilon _{\beta } \sim N(0,1)  \]

    where $b_{m}$ and $s_{m}$ are parameters that are estimated.
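The three random-coefficient specifications above can be sketched in Python as a single draw function. The function name and the string labels for the distributions are hypothetical, chosen only for this illustration.

```python
import math
import random

def draw_coefficient(b, s, dist, rng):
    """Draw one random coefficient with parameters b and s.

    dist is one of "normal", "uniform", or "lognormal" (labels are hypothetical).
    """
    if dist == "normal":
        return b + s * rng.gauss(0.0, 1.0)
    if dist == "uniform":
        # Equivalent to a draw from U(b - |s|, b + |s|).
        return b + s * rng.uniform(-1.0, 1.0)
    if dist == "lognormal":
        # The exponential keeps the coefficient strictly positive.
        return math.exp(b + s * rng.gauss(0.0, 1.0))
    raise ValueError("unknown distribution: " + dist)

rng = random.Random(7)  # fixed seed so the run is reproducible
normal_draws = [draw_coefficient(1.0, 0.5, "normal", rng) for _ in range(1000)]
uniform_draws = [draw_coefficient(1.0, -0.5, "uniform", rng) for _ in range(1000)]
lognormal_draws = [draw_coefficient(0.0, 0.5, "lognormal", rng) for _ in range(1000)]
```

The lognormal form is a natural choice when a coefficient is expected to have the same sign for every individual, because every draw is positive; the uniform form bounds each draw within a distance $|s_{m}|$ of $b_{m}$.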

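The estimated spread for normally, uniformly, and lognormally distributed coefficients can be negative: because $\epsilon _{\beta }$ is symmetric about zero, $s_{m}$ and $-s_{m}$ imply the same distribution, so only the magnitude of the spread is identified. The absolute value of the estimated spread can be interpreted as an estimate of the standard deviation for normally distributed coefficients.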
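As a worked illustration of how the spread relates to dispersion under each specification, the following are standard properties of the normal, uniform, and lognormal distributions (they are not statements taken from the cited reference). For normally and uniformly distributed coefficients, the standard deviations are

\[  \mbox{SD}(\beta ^{*}_{m}) = |s_{m}| \quad \mbox{(normal)} \quad \mbox{and} \quad \mbox{SD}(\beta ^{*}_{m}) = \frac{|s_{m}|}{\sqrt {3}} \quad \mbox{(uniform)}  \]

where the uniform result follows from the variance of $U(b_{m}-s_{m}, b_{m}+s_{m})$, which is $(2s_{m})^{2}/12 = s_{m}^{2}/3$. For a lognormally distributed coefficient, $|s_{m}|$ is the standard deviation of $\ln \beta ^{*}_{m}$ rather than of $\beta ^{*}_{m}$ itself, and the mean of the coefficient is

\[  E(\beta ^{*}_{m}) = \exp \left( b_{m} + \frac{s_{m}^{2}}{2} \right)  \]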

A detailed description of mixed logit models can be found, for example, in Brownstone and Train (1999).