Multinomial Probit

The multinomial probit model allows the random components of the utility of the different alternatives to be nonindependent and nonidentical. Thus, it does not have the IIA (independence from irrelevant alternatives) property. The increased flexibility of the error structure comes at the expense of introducing several additional parameters in the covariance matrix of the errors.

Consider the random utility function

\[  U_{ij} = \mathbf{x}_{ij}'\bbeta + \epsilon _{ij}  \]

where the joint distribution of $(\epsilon _{i1}, \epsilon _{i2}, \cdots , \epsilon _{iJ})$ is multivariate normal:

\[  \left[ \begin{array}{c} \epsilon _{i1} \\ \epsilon _{i2} \\ \vdots \\ \epsilon _{iJ} \\ \end{array} \right] \sim N(\mathbf{0},\bSigma )  \]
\[  \bSigma = \left[ \sigma _{jk} \right]_{j,k=1,\ldots ,J}  \]
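As a concrete illustration of this specification, the following sketch draws one error vector $(\epsilon _{i1}, \ldots , \epsilon _{iJ})$ from $N(\mathbf{0},\bSigma )$ and forms the utilities $U_{ij} = \mathbf{x}_{ij}'\bbeta + \epsilon _{ij}$ for a single individual. The particular values of `beta`, `x_i`, and `Sigma` are illustrative assumptions, not part of the model.

```python
# Illustrative sketch of the random utility specification (all values made up)
import numpy as np

rng = np.random.default_rng(0)
J, K = 3, 2                                    # number of alternatives and attributes
beta = np.array([0.8, -0.5])                   # assumed taste parameters
x_i = rng.normal(size=(J, K))                  # attribute vectors x_i1, ..., x_iJ
Sigma = np.array([[1.0, 0.5, 0.0],             # nonidentical, correlated errors
                  [0.5, 1.5, 0.0],
                  [0.0, 0.0, 1.0]])

eps_i = rng.multivariate_normal(np.zeros(J), Sigma)   # (eps_i1, ..., eps_iJ) ~ N(0, Sigma)
U_i = x_i @ beta + eps_i                       # U_ij = x_ij' beta + eps_ij
choice = int(U_i.argmax())                     # the utility-maximizing alternative
```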

The dimension of the error covariance matrix is determined by the number of alternatives $J$. Given $(\mathbf{x}_{i1}, \mathbf{x}_{i2}, \cdots , \mathbf{x}_{iJ})$, the $j$th alternative is chosen if and only if $U_{ij} \ge U_{ik}$ for all $k \ne j$. Thus, the probability that the $j$th alternative is chosen is

\[  P(y_{i} = j) = P_{ij} = P[\epsilon _{i1}-\epsilon _{ij}< (\mathbf{x}_{ij}-\mathbf{x}_{i1})'\bbeta ,\ldots , \epsilon _{iJ}-\epsilon _{ij} <(\mathbf{x}_{ij}-\mathbf{x}_{iJ})'\bbeta ]  \]

where $y_{i}$ is a random variable that indicates the choice made. This is a cumulative probability from a $(J-1)$-variate normal distribution. Since evaluation of this probability involves multidimensional integration, it is practical to use a simulation method to estimate the model. Many studies have shown that the simulator proposed by Geweke (1989), Hajivassiliou (1993), and Keane (1994) (henceforth referred to as the GHK simulator) performs well. For example, Hajivassiliou, McFadden, and Ruud (1996) compare 13 simulators using 11 different simulation methods and conclude that the GHK simulation method is the most reliable. To compute the multivariate normal probability, the GHK method uses a recursive simulation. Refer to Hajivassiliou (1993) for more details about GHK simulators.
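A minimal sketch of the GHK recursion for such a $(J-1)$-variate normal probability is shown below. It is an illustration of the method under the stated setup, not the MDC implementation: `b` holds the upper limits $(\mathbf{x}_{ij}-\mathbf{x}_{ik})'\bbeta$ and `Omega` is the covariance matrix of the differenced errors $\epsilon _{ik}-\epsilon _{ij}$, both assumed to be supplied by the caller.

```python
import numpy as np
from scipy.stats import norm

def ghk_probability(b, Omega, n_draws=1_000, seed=0):
    """GHK estimate of P(nu_1 < b_1, ..., nu_m < b_m) for nu ~ N(0, Omega)."""
    rng = np.random.default_rng(seed)
    m = len(b)
    L = np.linalg.cholesky(Omega)            # Omega = L L', L lower triangular
    u = rng.uniform(size=(n_draws, m))       # uniforms used for the truncated draws
    eta = np.zeros((n_draws, m))             # truncated standard normal draws
    prob = np.ones(n_draws)
    for k in range(m):
        # Conditional upper limit for eta_k given the components drawn so far
        upper = (b[k] - eta[:, :k] @ L[k, :k]) / L[k, k]
        p_k = norm.cdf(upper)                # conditional probability of the k-th inequality
        prob *= p_k
        # Draw eta_k from N(0,1) truncated above at `upper` (inverse-CDF method)
        eta[:, k] = norm.ppf(np.clip(u[:, k] * p_k, 1e-12, 1 - 1e-12))
    return prob.mean()                       # average of the products over the draws
```

The truncated draws are obtained by the inverse-CDF method, so the simulated probability is smooth in the model parameters, which is convenient for maximum likelihood estimation.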

The log-likelihood function for the multinomial probit model can be written as

\[  \mathcal{L} = \sum _{i=1}^{N}\sum _{j=1}^{J}d_{ij}\ln P(y_{i} = j)  \]

where

\[  d_{ij} = \left\{  \begin{array}{cl} 1 &  \mr {if \;  individual} \;  i \;  \mr {chooses \;  alternative} \;  j \\ 0 &  \mr {otherwise} \end{array} \right.  \]
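The sketch below accumulates this simulated log likelihood over individuals. For brevity it approximates $P(y_{i} = j)$ with a simple frequency simulator rather than the GHK simulator, and the names `V_all`, `y`, and `Sigma` are illustrative stand-ins for the deterministic utilities $\mathbf{x}_{ij}'\bbeta$, the observed choices, and the error covariance matrix.

```python
import numpy as np

def simulated_log_likelihood(V_all, y, Sigma, n_draws=20_000, seed=0):
    """V_all: N x J deterministic utilities x_ij'beta; y: length-N chosen alternatives."""
    rng = np.random.default_rng(seed)
    N, J = V_all.shape
    # Simulate the error vectors and the implied choices for every individual
    eps = rng.multivariate_normal(np.zeros(J), Sigma, size=(n_draws, N))
    chosen = (V_all[None, :, :] + eps).argmax(axis=2)           # shape (n_draws, N)
    # P_hat[i, j] approximates P(y_i = j); d_ij simply selects the observed choice
    P_hat = np.stack([(chosen == j).mean(axis=0) for j in range(J)], axis=1)
    return np.log(np.clip(P_hat[np.arange(N), y], 1e-300, None)).sum()

# Toy usage with made-up data
rng = np.random.default_rng(1)
N, J, K = 50, 3, 2
X = rng.normal(size=(N, J, K))                # attributes x_ij
V_all = X @ np.array([0.8, -0.5])             # x_ij' beta
y = rng.integers(0, J, size=N)                # observed choices
print(simulated_log_likelihood(V_all, y, np.eye(J)))
```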

For identification of the multinomial probit model, two of the diagonal elements of $\bSigma $ are normalized to 1, and for one of the alternatives whose error variance is normalized to 1 (say, alternative $k$), it is also assumed that $\sigma _{jk} = \sigma _{kj} = 0$ for $j=1,\cdots ,J$ and $j \neq k$. Thus, a model with $J$ alternatives has at most $J (J-1)/2 -1$ covariance parameters after normalization.

Let $D$ and $R$ be defined as

\[  D = \left[ \begin{array}{cccc} \sigma _{1} &  0 &  \cdots &  0 \\ 0 &  \sigma _{2} &  \cdots &  0 \\ \vdots &  \vdots &  \ddots &  \vdots \\ 0 &  0 &  \cdots &  \sigma _{J} \\ \end{array} \right]  \]
\[  R = \left[ \begin{array}{cccc} 1 &  \rho _{12} &  \cdots &  \rho _{1J} \\ \rho _{21} &  1 &  \cdots &  \rho _{2J} \\ \vdots &  \vdots &  \ddots &  \vdots \\ \rho _{J1} &  \rho _{J2} &  \cdots &  1 \\ \end{array} \right]  \]

where $\sigma _{j}^{2} = \sigma _{jj}$ and $\rho _{jk} = \frac{\sigma _{jk}}{\sigma _{j}\sigma _{k}}$. Then, for identification, the restrictions $\sigma _{J-1}= \sigma _{J}=1$ and $\rho _{kJ} = \rho _{Jk} = 0$ for all $k \neq J$ can be imposed, and the error covariance matrix is $\bSigma = DRD$.
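The following sketch builds $\bSigma = DRD$ under these normalizations; the arguments `stds` and `rhos` are illustrative names for the free elements that remain to be estimated.

```python
import numpy as np

def build_sigma(stds, rhos, J):
    """stds: sigma_1, ..., sigma_{J-2}; rhos: rho_{jk} for 1 <= j < k <= J-1 (row major)."""
    assert len(stds) == J - 2 and len(rhos) == (J - 1) * (J - 2) // 2
    sigma = np.concatenate([stds, [1.0, 1.0]])     # sigma_{J-1} = sigma_J = 1
    R = np.eye(J)
    idx = np.triu_indices(J - 1, k=1)              # correlations among alternatives 1..J-1
    R[idx] = rhos
    R[(idx[1], idx[0])] = rhos                     # symmetry; row and column J stay zero
    D = np.diag(sigma)
    return D @ R @ D                               # Sigma = D R D

# Example with J = 4: J(J-1)/2 - 1 = 5 free parameters (two std devs, three correlations)
Sigma = build_sigma(stds=[1.3, 0.8], rhos=[0.4, 0.2, 0.1], J=4)
```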

In the standard MDC output, the parameter estimates STD_j and RHO_jk correspond to $\sigma _{j}$ and $\rho _{jk}$, respectively.

In principle, the multinomial probit model is fully identified with the preceding normalizations. However, in practice, convergence in applications of the model with more than three alternatives often requires additional restrictions on the elements of $\bSigma $.

It must also be noted that the unrestricted structure of the error covariance matrix makes it impossible to forecast demand for a new alternative without knowledge of the new $(J+1) \times (J+1)$ error covariance matrix.