The QLIM Procedure

Ordinal Discrete Choice Modeling

Binary Probit and Logit Model

The binary choice model is

\[ y^{*}_{i} = \mathbf{x}_{i}'\bbeta + \epsilon _{i} \]

where the value of the latent dependent variable, $y^{*}_{i}$, is observed only as follows:

\[ y_{i} = \left\{ \begin{array}{ll} 1 & \hbox{if } y^{*}_{i}>0 \\ 0 & \hbox{otherwise} \end{array} \right. \]

The disturbance, $\epsilon _{i}$, of the probit model has a standard normal distribution with the cumulative distribution function (CDF)

\[ \Phi (x)=\int _{-\infty }^{x}\frac{1}{\sqrt {2\pi }}\exp (-t^2/2)dt \]

The disturbance of the logit model has a standard logistic distribution with the CDF

\[ \Lambda (x)=\frac{\exp (x)}{1+\exp (x)} = \frac{1}{1+\exp (-x)} \]

The binary discrete choice model has the following probability that the event $\{ y_{i}=1\} $ occurs:

\[ P(y_{i}=1) = F(\mathbf{x}_{i}'\bbeta ) = \left\{ \begin{array}{ll} \Phi (\mathbf{x}_{i}'\bbeta ) & \mr{(probit)} \\ \Lambda (\mathbf{x}_{i}'\bbeta ) & \mr{(logit)} \end{array} \right. \]

The log-likelihood function is

\[ \ell = \sum _{i=1}^{N}\left\{ y_{i}\log [F(\mathbf{x}_{i}'\bbeta )] + (1-y_{i})\log [1-F(\mathbf{x}_{i}'\bbeta )]\right\} \]
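This log-likelihood is straightforward to evaluate directly. The following Python sketch (illustrative only, not PROC QLIM code; the data and coefficient values are hypothetical) computes $\ell$ for both link functions, building $\Phi$ from the error function:

```python
from math import erf, exp, log, sqrt

def probit_cdf(x):
    # Phi(x) via the error function: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def logit_cdf(x):
    # Lambda(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + exp(-x))

def loglik(beta, X, y, cdf):
    # ell = sum_i { y_i log F(x_i'beta) + (1 - y_i) log[1 - F(x_i'beta)] }
    ll = 0.0
    for xi, yi in zip(X, y):
        p = cdf(sum(b * v for b, v in zip(beta, xi)))
        ll += yi * log(p) + (1 - yi) * log(1.0 - p)
    return ll

# Hypothetical data: an intercept plus one regressor
X = [(1.0, 0.5), (1.0, -1.2), (1.0, 2.0), (1.0, 0.1)]
y = [1, 0, 1, 0]
print(loglik((0.2, 0.8), X, y, logit_cdf))
print(loglik((0.2, 0.8), X, y, probit_cdf))
```

At $\bbeta =\mathbf{0}$ both CDFs give $F(0)=0.5$, so the log-likelihood reduces to $N\log (0.5)$ for either model, which is a quick sanity check on the implementation.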

where the CDF $F(x)$ is $\Phi (x)$ for the probit model and $\Lambda (x)$ for the logit model. The first-order derivatives of the logit model are

\[ \frac{\partial \ell }{\partial \bbeta } = \sum _{i=1}^{N}(y_{i}- \Lambda (\mathbf{x}_{i}'\bbeta ))\mathbf{x}_{i} \]

The probit model has more complicated derivatives

\[ \frac{\partial \ell }{\partial \bbeta } = \sum _{i=1}^{N} \left\{ \frac{(2y_{i} - 1)\phi \left[(2y_{i} - 1)\mathbf{x}_{i}'\bbeta \right]}{\Phi \left[(2y_{i} - 1)\mathbf{x}_{i}'\bbeta \right]}\right\} \mathbf{x}_{i} = \sum _{i=1}^{N}r_{i} \mathbf{x}_{i} \]

where

\[ r_{i} = \frac{(2y_{i} - 1)\phi \left[(2y_{i} - 1)\mathbf{x}_{i}'\bbeta \right]}{\Phi \left[(2y_{i} - 1)\mathbf{x}_{i}'\bbeta \right]} \]

Note that the logit maximum likelihood estimates are approximately $\frac{\pi }{\sqrt {3}}$ times the probit maximum likelihood estimates, because the probit parameter estimates, $\bbeta $, are standardized to an error variance of 1, while the error term of the logistic distribution has a variance of $\frac{\pi ^{2}}{3}$.
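The score expressions for both models can be checked against numerical derivatives of the log-likelihood. The following self-contained Python sketch (not PROC QLIM code; the data and coefficient values are hypothetical) compares the analytic gradients above with central finite differences:

```python
from math import erf, exp, log, pi, sqrt

def ncdf(x):
    # standard normal CDF Phi
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def npdf(x):
    # standard normal density phi
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)

def lcdf(x):
    # standard logistic CDF Lambda
    return 1.0 / (1.0 + exp(-x))

def loglik(beta, X, y, cdf):
    ll = 0.0
    for xi, yi in zip(X, y):
        p = cdf(sum(b * v for b, v in zip(beta, xi)))
        ll += yi * log(p) + (1 - yi) * log(1.0 - p)
    return ll

def logit_score(beta, X, y):
    # d ell / d beta = sum_i (y_i - Lambda(x_i'beta)) x_i
    g = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        r = yi - lcdf(sum(b * v for b, v in zip(beta, xi)))
        for k, v in enumerate(xi):
            g[k] += r * v
    return g

def probit_score(beta, X, y):
    # d ell / d beta = sum_i r_i x_i with
    # r_i = (2y_i - 1) phi[(2y_i - 1) x_i'beta] / Phi[(2y_i - 1) x_i'beta]
    g = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        s = 2 * yi - 1
        z = s * sum(b * v for b, v in zip(beta, xi))
        r = s * npdf(z) / ncdf(z)
        for k, v in enumerate(xi):
            g[k] += r * v
    return g

def num_grad(beta, X, y, cdf, h=1e-6):
    # central finite-difference gradient of the log-likelihood
    g = []
    for k in range(len(beta)):
        bp = list(beta); bp[k] += h
        bm = list(beta); bm[k] -= h
        g.append((loglik(bp, X, y, cdf) - loglik(bm, X, y, cdf)) / (2 * h))
    return g

X = [(1.0, 0.5), (1.0, -1.2), (1.0, 2.0), (1.0, 0.1)]
y = [1, 0, 1, 0]
beta = (0.2, 0.8)
print(logit_score(beta, X, y), num_grad(beta, X, y, lcdf))
print(probit_score(beta, X, y), num_grad(beta, X, y, ncdf))
```

The agreement of the analytic and numerical gradients confirms both the logit score and the probit $r_{i}$ formula term by term.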

Ordinal Probit/Logit

When the dependent variable is observed in $M$ ordered categories, binary discrete choice modeling is not appropriate for data analysis. McKelvey and Zavoina (1975) proposed the ordinal (or ordered) probit model.

Consider the following regression equation:

\[ y_{i}^{*} = \mathbf{x}_{i}'\bbeta + \epsilon _{i} \]

where the disturbances, $\epsilon _{i}$, have the distribution function $F$. The unobserved continuous random variable, $y_{i}^{*}$, is observed only through $M$ categories. Suppose there are $M+1$ real numbers, $\mu _{0},\cdots ,\mu _{M}$, where $\mu _{0}=-\infty $, $\mu _{1}=0$, $\mu _{M}=\infty $, and $\mu _{0} \leq \mu _{1} \leq \cdots \leq \mu _{M}$. Define

\[ R_{i,j} = \mu _{j} - \mathbf{x}_{i}’\bbeta \]

The probability that the unobserved dependent variable is contained in the jth category can be written as

\[ P[\mu _{j-1}< y_{i}^{*} \leq \mu _{j}] = F(R_{i,j}) - F(R_{i,j-1}) \]

The log-likelihood function is

\[ \ell = \sum _{i=1}^{N}\sum _{j=1}^{M}d_{ij}\log \left[F(R_{i,j}) - F(R_{i,j-1})\right] \]

where

\[ d_{ij} = \left\{ \begin{array}{cl} 1 & \mr{if}\; \mu _{j-1}< y_{i}^{*} \leq \mu _{j} \\ 0 & \mr{otherwise} \end{array} \right. \]

The first derivatives are written as

\[ \frac{\partial \ell }{\partial \bbeta } = \sum _{i=1}^{N}\sum _{j=1}^{M} d_{ij}\left[\frac{f(R_{i,j-1}) - f(R_{i,j})}{F(R_{i,j})-F(R_{i,j-1})} \mathbf{x}_{i}\right] \]
\[ \frac{\partial \ell }{\partial \mu _{k}} = \sum _{i=1}^{N}\sum _{j=1}^{M} d_{ij}\left[\frac{\delta _{j,k}f(R_{i,j}) - \delta _{j-1,k}f(R_{i,j-1})}{F(R_{i,j})-F(R_{i,j-1})}\right] \]

where $f(x) = \frac{d F(x)}{dx}$, and $\delta _{j,k}=1$ if $j=k$ and $\delta _{j,k}=0$ otherwise. The ordinal probit model is estimated with $F(R_{i,j})=\Phi (R_{i,j})$, and the ordinal logit model with $F(R_{i,j})=\Lambda (R_{i,j})$. The first threshold parameter, $\mu _{1}$, is estimated only when the LIMIT1=VARYING option is specified. By default (LIMIT1=ZERO), $\mu _{1}=0$ and the remaining $M-2$ threshold parameters ($\mu _{2},\dots ,\mu _{M-1}$) are estimated.
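As an illustration of the ordinal likelihood, the following Python sketch (hypothetical data and parameter values; not PROC QLIM code) computes the cell probabilities $F(R_{i,j})-F(R_{i,j-1})$ and the log-likelihood for an ordinal probit with $M=3$ categories, fixing $\mu _{1}=0$ as under the LIMIT1=ZERO default:

```python
from math import erf, log, sqrt

def ncdf(x):
    # standard normal CDF Phi
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def cell_probs(xb, mu):
    # mu holds the finite thresholds mu_1, ..., mu_{M-1};
    # mu_0 = -inf and mu_M = +inf contribute F = 0 and F = 1.
    # P(category j) = F(mu_j - x'beta) - F(mu_{j-1} - x'beta)
    F = [0.0] + [ncdf(m - xb) for m in mu] + [1.0]
    return [F[j] - F[j - 1] for j in range(1, len(F))]

def ordinal_loglik(beta, mu, X, y):
    # y_i in {1, ..., M}; ell = sum_i log[ F(R_{i,y_i}) - F(R_{i,y_i - 1}) ]
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, xi))
        ll += log(cell_probs(xb, mu)[yi - 1])
    return ll

# Hypothetical data with M = 3 categories; mu_1 = 0, mu_2 = 1.2
X = [(0.4,), (-0.9,), (1.5,)]
y = [2, 1, 3]
print(ordinal_loglik((0.7,), (0.0, 1.2), X, y))
```

Because the cell probabilities telescope between $F=0$ and $F=1$, they sum to one for every observation, which is a useful check on the threshold bookkeeping.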

Ordered probit models were analyzed by Aitchison and Silvey (1957), and Cox (1970) discussed ordered response data by using the logit model. They defined the probability that $y_{i}^{*}$ belongs to the jth category as

\[ P[\mu _{j-1}< y_{i} \leq \mu _{j}] = F(\mu _{j}+\mathbf{x}_{i}'\btheta ) - F(\mu _{j-1}+\mathbf{x}_{i}'\btheta ) \]

where $\mu _{0}=-\infty $ and $\mu _{M}=\infty $. Therefore, the ordered response model analyzed by Aitchison and Silvey can be estimated if the LIMIT1=VARYING option is specified. Note that $\btheta =-\bbeta $.

Goodness-of-Fit Measures

The goodness-of-fit measures discussed in this section apply only to discrete dependent variable models.

McFadden (1974) suggested a likelihood ratio index that is analogous to the $R^{2}$ in the linear regression model:

\[ R^{2}_{M} = 1 - \frac{\ln L}{\ln L_{0}} \]

where $L$ is the value of the maximized likelihood function and $L_{0}$ is the value of the likelihood function when all regression coefficients except the intercept term are zero. It can be shown that $\ln L_{0}$ can be written as

\[ \ln L_{0} = \sum _{j=1}^{M} N_{j} \ln \left(\frac{N_{j}}{N}\right) \]

where $N_{j}$ is the number of responses in category j.
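McFadden's likelihood ratio index can thus be computed from the category counts together with the fitted log-likelihood value. A minimal Python sketch (the counts and fitted value below are hypothetical):

```python
from math import log

def intercept_only_loglik(counts):
    # ln L_0 = sum_j N_j ln(N_j / N), where N_j counts responses in category j
    N = sum(counts)
    return sum(Nj * log(Nj / N) for Nj in counts)

def mcfadden(lnL, lnL0):
    # R^2_M = 1 - ln L / ln L_0
    return 1.0 - lnL / lnL0

counts = [30, 70]                 # hypothetical category counts (N = 100)
lnL0 = intercept_only_loglik(counts)
print(lnL0)
print(mcfadden(-55.0, lnL0))      # -55.0 is a hypothetical fitted log likelihood
```

The measure is 0 when the fitted model does no better than the intercept-only model and approaches 1 as the fitted log-likelihood approaches 0.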

Estrella (1998) proposes the following requirements for a goodness-of-fit measure to be desirable in discrete choice modeling:

  • The measure must take values in $[0,1]$, where 0 represents no fit and 1 corresponds to perfect fit.

  • The measure should be directly related to a valid test statistic for the significance of all slope coefficients.

  • The derivative of the measure with respect to the test statistic should comply with corresponding derivatives in a linear regression.

Estrella’s (1998) measure is written

\[ R_{E1}^{2} = 1 - \left(\frac{\ln L}{\ln L_{0}}\right) ^{-\frac{2}{N}\ln L_{0}} \]

An alternative measure suggested by Estrella (1998) is

\[ R_{E2}^{2} = 1 - [ (\ln L - K) / \ln L_{0} ]^{-\frac{2}{N}\ln L_{0}} \]

where $\ln L_{0}$ is computed with null slope parameter values, $N$ is the number of observations used, and $K$ represents the number of estimated parameters.

Other goodness-of-fit measures are summarized as follows:

\[ R_{CU1}^{2} = 1 - \left(\frac{L_{0}}{L}\right)^{\frac{2}{N}} \; \; (\mr{Cragg-Uhler 1}) \]
\[ R_{CU2}^{2} = \frac{1 - (L_{0}/L)^{\frac{2}{N}}}{1 - L_{0}^{\frac{2}{N}}} \; \; (\mr{Cragg-Uhler 2}) \]
\[ R_{A}^{2} = \frac{2(\ln L - \ln L_{0})}{2(\ln L - \ln L_{0})+N} \; \; (\mr{Aldrich-Nelson}) \]
\[ R_{VZ}^{2} = R_{A}^{2}\frac{2\ln L_{0} - N}{2\ln L_{0}} \; \; (\mr{Veall-Zimmermann}) \]
\[ R_{MZ}^{2} = \frac{\sum _{i=1}^{N}(\hat{y}_{i} - \bar{\hat{y_{i}}})^{2}}{N +\sum _{i=1}^{N}(\hat{y}_{i} - \bar{\hat{y_{i}}})^{2}} \; \; (\mr{McKelvey-Zavoina}) \]

where $\hat{y}_{i}=\mathbf{x}_{i}'\hat{\bbeta }$ and $\bar{\hat{y_{i}}} = \sum _{i=1}^{N} \hat{y}_{i} / N$.
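Except for the McKelvey-Zavoina measure, which requires the fitted values themselves, all of the measures above are functions of $\ln L$, $\ln L_{0}$, $N$, and (for Estrella's second measure) $K$. The following Python sketch evaluates them from those inputs; the numeric values used are hypothetical:

```python
from math import exp

def fit_measures(lnL, lnL0, N, K=0):
    # Goodness-of-fit measures expressed in terms of lnL, lnL0, N, and K.
    # L0^{2/N} and (L0/L)^{2/N} are rewritten as exponentials of log likelihoods.
    m = {}
    m["McFadden"] = 1.0 - lnL / lnL0
    m["Estrella1"] = 1.0 - (lnL / lnL0) ** (-2.0 * lnL0 / N)
    m["Estrella2"] = 1.0 - ((lnL - K) / lnL0) ** (-2.0 * lnL0 / N)
    m["CraggUhler1"] = 1.0 - exp(2.0 * (lnL0 - lnL) / N)
    m["CraggUhler2"] = (1.0 - exp(2.0 * (lnL0 - lnL) / N)) / (1.0 - exp(2.0 * lnL0 / N))
    lr = 2.0 * (lnL - lnL0)
    m["AldrichNelson"] = lr / (lr + N)
    m["VeallZimmermann"] = m["AldrichNelson"] * (2.0 * lnL0 - N) / (2.0 * lnL0)
    return m

# Hypothetical values: fitted lnL = -55, intercept-only lnL0 = -61, N = 100
for name, value in fit_measures(-55.0, -61.0, 100).items():
    print(name, value)
```

All of these measures equal 0 when $\ln L = \ln L_{0}$ (with $K=0$), which provides a convenient consistency check across the formulas.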