The QLIM Procedure

Ordinal Discrete Choice Modeling

Binary Probit and Logit Model

The binary choice model is

\[ y^{*}_{i} = \mathbf{x}_{i}'\bbeta + \epsilon _{i} \]

where the value of the latent dependent variable, $y^{*}_{i}$, is observed only as follows:

\[ y_{i} = \left\{ \begin{array}{ll} 1 & \hbox{if } y^{*}_{i}>0 \\ 0 & \hbox{otherwise} \end{array} \right. \]

The disturbance, $\epsilon _{i}$, of the probit model has a standard normal distribution with the cumulative distribution function (CDF)

\[ \Phi (x)=\int _{-\infty }^{x}\frac{1}{\sqrt {2\pi }}\exp (-t^2/2)dt \]

The disturbance of the logit model has a standard logistic distribution with the CDF

\[ \Lambda (x)=\frac{\exp (x)}{1+\exp (x)} = \frac{1}{1+\exp (-x)} \]

The binary discrete choice model has the following probability that the event $\{ y_{i}=1\} $ occurs:

\[ P(y_{i}=1) = F(\mathbf{x}_{i}'\bbeta ) = \left\{ \begin{array}{ll} \Phi (\mathbf{x}_{i}'\bbeta ) & \mr{(probit)} \\ \Lambda (\mathbf{x}_{i}'\bbeta ) & \mr{(logit)} \end{array} \right. \]

The log-likelihood function is

\[ \ell = \sum _{i=1}^{N}\left\{ y_{i}\log [F(\mathbf{x}_{i}'\bbeta )] + (1-y_{i})\log [1-F(\mathbf{x}_{i}'\bbeta )]\right\} \]
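This log-likelihood is straightforward to evaluate directly. The following Python sketch (illustrative only, not PROC QLIM code; the data and coefficient values are hypothetical) computes $\ell$ for both link functions, building $\Phi$ from the error function:

```python
from math import erf, exp, log, sqrt

def probit_cdf(x):
    # Phi(x) via the error function: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def logit_cdf(x):
    # Lambda(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + exp(-x))

def loglik(beta, X, y, cdf):
    # ell = sum_i { y_i log F(x_i'beta) + (1 - y_i) log[1 - F(x_i'beta)] }
    ll = 0.0
    for xi, yi in zip(X, y):
        p = cdf(sum(b * v for b, v in zip(beta, xi)))
        ll += yi * log(p) + (1 - yi) * log(1.0 - p)
    return ll

# Hypothetical data: an intercept plus one regressor
X = [(1.0, 0.5), (1.0, -1.2), (1.0, 2.0), (1.0, 0.1)]
y = [1, 0, 1, 0]
print(loglik((0.2, 0.8), X, y, logit_cdf))
print(loglik((0.2, 0.8), X, y, probit_cdf))
```

At $\bbeta =\mathbf{0}$ both CDFs give $F(0)=0.5$, so the log-likelihood reduces to $N\log (0.5)$ for either model, which is a quick sanity check on the implementation.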

where the CDF $F(x)$ is $\Phi (x)$ for the probit model and $\Lambda (x)$ for the logit model. The first-order derivatives of the logit model are

\[ \frac{\partial \ell }{\partial \bbeta } = \sum _{i=1}^{N}(y_{i}- \Lambda (\mathbf{x}_{i}'\bbeta ))\mathbf{x}_{i} \]

The probit model has more complicated derivatives

\[ \frac{\partial \ell }{\partial \bbeta } = \sum _{i=1}^{N} \left\{ \frac{(2y_{i} - 1)\phi \left[(2y_{i} - 1)\mathbf{x}_{i}'\bbeta \right]}{\Phi \left[(2y_{i} - 1)\mathbf{x}_{i}'\bbeta \right]}\right\} \mathbf{x}_{i} = \sum _{i=1}^{N}r_{i} \mathbf{x}_{i} \]

where

\[ r_{i} = \frac{(2y_{i} - 1)\phi \left[(2y_{i} - 1)\mathbf{x}_{i}'\bbeta \right]}{\Phi \left[(2y_{i} - 1)\mathbf{x}_{i}'\bbeta \right]} \]

Note that the logit maximum likelihood estimates are approximately $\frac{\pi }{\sqrt {3}}$ times the probit maximum likelihood estimates, because the probit parameter estimates, $\bbeta $, are standardized to an error variance of 1, while the error term of the logistic distribution has a variance of $\frac{\pi ^{2}}{3}$.
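The score expressions for both models can be checked against numerical derivatives of the log-likelihood. The following self-contained Python sketch (not PROC QLIM code; the data and coefficient values are hypothetical) compares the analytic gradients above with central finite differences:

```python
from math import erf, exp, log, pi, sqrt

def ncdf(x):
    # standard normal CDF Phi
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def npdf(x):
    # standard normal density phi
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)

def lcdf(x):
    # standard logistic CDF Lambda
    return 1.0 / (1.0 + exp(-x))

def loglik(beta, X, y, cdf):
    ll = 0.0
    for xi, yi in zip(X, y):
        p = cdf(sum(b * v for b, v in zip(beta, xi)))
        ll += yi * log(p) + (1 - yi) * log(1.0 - p)
    return ll

def logit_score(beta, X, y):
    # d ell / d beta = sum_i (y_i - Lambda(x_i'beta)) x_i
    g = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        r = yi - lcdf(sum(b * v for b, v in zip(beta, xi)))
        for k, v in enumerate(xi):
            g[k] += r * v
    return g

def probit_score(beta, X, y):
    # d ell / d beta = sum_i r_i x_i with
    # r_i = (2y_i - 1) phi[(2y_i - 1) x_i'beta] / Phi[(2y_i - 1) x_i'beta]
    g = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        s = 2 * yi - 1
        z = s * sum(b * v for b, v in zip(beta, xi))
        r = s * npdf(z) / ncdf(z)
        for k, v in enumerate(xi):
            g[k] += r * v
    return g

def num_grad(beta, X, y, cdf, h=1e-6):
    # central finite-difference gradient of the log-likelihood
    g = []
    for k in range(len(beta)):
        bp = list(beta); bp[k] += h
        bm = list(beta); bm[k] -= h
        g.append((loglik(bp, X, y, cdf) - loglik(bm, X, y, cdf)) / (2 * h))
    return g

X = [(1.0, 0.5), (1.0, -1.2), (1.0, 2.0), (1.0, 0.1)]
y = [1, 0, 1, 0]
beta = (0.2, 0.8)
print(logit_score(beta, X, y), num_grad(beta, X, y, lcdf))
print(probit_score(beta, X, y), num_grad(beta, X, y, ncdf))
```

The agreement of the analytic and numerical gradients confirms both the logit score and the probit $r_{i}$ formula term by term.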

Ordinal Probit/Logit

When the dependent variable is observed in $M$ ordered categories, binary discrete choice modeling is not appropriate for data analysis. McKelvey and Zavoina (1975) proposed the ordinal (or ordered) probit model.

Consider the following regression equation:

\[ y_{i}^{*} = \mathbf{x}_{i}'\bbeta + \epsilon _{i} \]

where the disturbances, $\epsilon _{i}$, have the distribution function $F$. The unobserved continuous random variable, $y_{i}^{*}$, is observed only through $M$ categories. Suppose there are $M+1$ real numbers, $\mu _{0},\cdots ,\mu _{M}$, where $\mu _{0}=-\infty $, $\mu _{1}=0$, $\mu _{M}=\infty $, and $\mu _{0} \leq \mu _{1} \leq \cdots \leq \mu _{M}$. Define

\[ R_{i,j} = \mu _{j} - \mathbf{x}_{i}’\bbeta \]

The probability that the unobserved dependent variable is contained in the jth category can be written as

\[ P[\mu _{j-1}< y_{i}^{*} \leq \mu _{j}] = F(R_{i,j}) - F(R_{i,j-1}) \]

The log-likelihood function is

\[ \ell = \sum _{i=1}^{N}\sum _{j=1}^{M}d_{ij}\log \left[F(R_{i,j}) - F(R_{i,j-1})\right] \]

where

\[ d_{ij} = \left\{ \begin{array}{cl} 1 & \mr{if}\; \mu _{j-1}< y_{i}^{*} \leq \mu _{j} \\ 0 & \mr{otherwise} \end{array} \right. \]

The first derivatives are written as

\[ \frac{\partial \ell }{\partial \bbeta } = \sum _{i=1}^{N}\sum _{j=1}^{M} d_{ij}\left[\frac{f(R_{i,j-1}) - f(R_{i,j})}{F(R_{i,j})-F(R_{i,j-1})} \mathbf{x}_{i}\right] \]
\[ \frac{\partial \ell }{\partial \mu _{k}} = \sum _{i=1}^{N}\sum _{j=1}^{M} d_{ij}\left[\frac{\delta _{j,k}f(R_{i,j}) - \delta _{j-1,k}f(R_{i,j-1})}{F(R_{i,j})-F(R_{i,j-1})}\right] \]

where $f(x) = \frac{d F(x)}{dx}$, and $\delta _{j,k}=1$ if $j=k$ and $\delta _{j,k}=0$ otherwise. The ordinal probit model is estimated with $F(R_{i,j})=\Phi (R_{i,j})$, and the ordinal logit model with $F(R_{i,j})=\Lambda (R_{i,j})$. The first threshold parameter, $\mu _{1}$, is estimated only when the LIMIT1=VARYING option is specified. By default (LIMIT1=ZERO), $\mu _{1}=0$ and the remaining $M-2$ threshold parameters ($\mu _{2},\dots ,\mu _{M-1}$) are estimated.
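As an illustration of the ordinal likelihood, the following Python sketch (hypothetical data and parameter values; not PROC QLIM code) computes the cell probabilities $F(R_{i,j})-F(R_{i,j-1})$ and the log-likelihood for an ordinal probit with $M=3$ categories, fixing $\mu _{1}=0$ as under the LIMIT1=ZERO default:

```python
from math import erf, log, sqrt

def ncdf(x):
    # standard normal CDF Phi
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def cell_probs(xb, mu):
    # mu holds the finite thresholds mu_1, ..., mu_{M-1};
    # mu_0 = -inf and mu_M = +inf contribute F = 0 and F = 1.
    # P(category j) = F(mu_j - x'beta) - F(mu_{j-1} - x'beta)
    F = [0.0] + [ncdf(m - xb) for m in mu] + [1.0]
    return [F[j] - F[j - 1] for j in range(1, len(F))]

def ordinal_loglik(beta, mu, X, y):
    # y_i in {1, ..., M}; ell = sum_i log[ F(R_{i,y_i}) - F(R_{i,y_i - 1}) ]
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, xi))
        ll += log(cell_probs(xb, mu)[yi - 1])
    return ll

# Hypothetical data with M = 3 categories; mu_1 = 0, mu_2 = 1.2
X = [(0.4,), (-0.9,), (1.5,)]
y = [2, 1, 3]
print(ordinal_loglik((0.7,), (0.0, 1.2), X, y))
```

Because the cell probabilities telescope between $F=0$ and $F=1$, they sum to one for every observation, which is a useful check on the threshold bookkeeping.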

Ordered probit models were analyzed by Aitchison and Silvey (1957), and Cox (1970) discussed ordered response data by using the logit model. They defined the probability that $y_{i}^{*}$ belongs to the jth category as

\[ P[\mu _{j-1}< y_{i} \leq \mu _{j}] = F(\mu _{j}+\mathbf{x}_{i}'\btheta ) - F(\mu _{j-1}+\mathbf{x}_{i}'\btheta ) \]

where $\mu _{0}=-\infty $ and $\mu _{M}=\infty $. Therefore, the ordered response model analyzed by Aitchison and Silvey can be estimated if the LIMIT1=VARYING option is specified. Note that $\btheta =-\bbeta $.

Goodness-of-Fit Measures

The goodness-of-fit measures discussed in this section apply only to discrete dependent variable models.

McFadden (1974) suggested a likelihood ratio index that is analogous to the $R^{2}$ in the linear regression model:

\[ R^{2}_{M} = 1 - \frac{\ln L}{\ln L_{0}} \]

where $L$ is the value of the maximized likelihood function and $L_{0}$ is the value of the likelihood function when all regression coefficients except the intercept term are zero. It can be shown that $\ln L_{0}$ can be written as

\[ \ln L_{0} = \sum _{j=1}^{M} N_{j} \ln \left(\frac{N_{j}}{N}\right) \]

where $N_{j}$ is the number of responses in category j.
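McFadden's likelihood ratio index can thus be computed from the category counts together with the fitted log-likelihood value. A minimal Python sketch (the counts and fitted value below are hypothetical):

```python
from math import log

def intercept_only_loglik(counts):
    # ln L_0 = sum_j N_j ln(N_j / N), where N_j counts responses in category j
    N = sum(counts)
    return sum(Nj * log(Nj / N) for Nj in counts)

def mcfadden(lnL, lnL0):
    # R^2_M = 1 - ln L / ln L_0
    return 1.0 - lnL / lnL0

counts = [30, 70]                 # hypothetical category counts (N = 100)
lnL0 = intercept_only_loglik(counts)
print(lnL0)
print(mcfadden(-55.0, lnL0))      # -55.0 is a hypothetical fitted log likelihood
```

The measure is 0 when the fitted model does no better than the intercept-only model and approaches 1 as the fitted log-likelihood approaches 0.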

Estrella (1998) proposes the following requirements for a goodness-of-fit measure to be desirable in discrete choice modeling:

  • The measure must take values in $[0,1]$, where 0 represents no fit and 1 corresponds to perfect fit.

  • The measure should be directly related to a valid test statistic for the significance of all slope coefficients.

  • The derivative of the measure with respect to the test statistic should comply with corresponding derivatives in a linear regression.

Estrella’s (1998) measure is written

\[ R_{E1}^{2} = 1 - \left(\frac{\ln L}{\ln L_{0}}\right) ^{-\frac{2}{N}\ln L_{0}} \]

An alternative measure suggested by Estrella (1998) is

\[ R_{E2}^{2} = 1 - [ (\ln L - K) / \ln L_{0} ]^{-\frac{2}{N}\ln L_{0}} \]

where $\ln L_{0}$ is computed with null slope parameter values, $N$ is the number of observations used, and $K$ represents the number of estimated parameters.

Other goodness-of-fit measures are summarized as follows:

\[ R_{CU1}^{2} = 1 - \left(\frac{L_{0}}{L}\right)^{\frac{2}{N}} \; \; (\mr{Cragg-Uhler 1}) \]
\[ R_{CU2}^{2} = \frac{1 - (L_{0}/L)^{\frac{2}{N}}}{1 - L_{0}^{\frac{2}{N}}} \; \; (\mr{Cragg-Uhler 2}) \]
\[ R_{A}^{2} = \frac{2(\ln L - \ln L_{0})}{2(\ln L - \ln L_{0})+N} \; \; (\mr{Aldrich-Nelson}) \]
\[ R_{VZ}^{2} = R_{A}^{2}\frac{2\ln L_{0} - N}{2\ln L_{0}} \; \; (\mr{Veall-Zimmermann}) \]
\[ R_{MZ}^{2} = \frac{\sum _{i=1}^{N}(\hat{y}_{i} - \bar{\hat{y_{i}}})^{2}}{N +\sum _{i=1}^{N}(\hat{y}_{i} - \bar{\hat{y_{i}}})^{2}} \; \; (\mr{McKelvey-Zavoina}) \]

where $\hat{y}_{i}=\mathbf{x}_{i}'\hat{\bbeta }$ and $\bar{\hat{y_{i}}} = \sum _{i=1}^{N} \hat{y}_{i} / N$.
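Except for the McKelvey-Zavoina measure, which requires the fitted values themselves, all of the measures above are functions of $\ln L$, $\ln L_{0}$, $N$, and (for Estrella's second measure) $K$. The following Python sketch evaluates them from those inputs; the numeric values used are hypothetical:

```python
from math import exp

def fit_measures(lnL, lnL0, N, K=0):
    # Goodness-of-fit measures expressed in terms of lnL, lnL0, N, and K.
    # L0^{2/N} and (L0/L)^{2/N} are rewritten as exponentials of log likelihoods.
    m = {}
    m["McFadden"] = 1.0 - lnL / lnL0
    m["Estrella1"] = 1.0 - (lnL / lnL0) ** (-2.0 * lnL0 / N)
    m["Estrella2"] = 1.0 - ((lnL - K) / lnL0) ** (-2.0 * lnL0 / N)
    m["CraggUhler1"] = 1.0 - exp(2.0 * (lnL0 - lnL) / N)
    m["CraggUhler2"] = (1.0 - exp(2.0 * (lnL0 - lnL) / N)) / (1.0 - exp(2.0 * lnL0 / N))
    lr = 2.0 * (lnL - lnL0)
    m["AldrichNelson"] = lr / (lr + N)
    m["VeallZimmermann"] = m["AldrichNelson"] * (2.0 * lnL0 - N) / (2.0 * lnL0)
    return m

# Hypothetical values: fitted lnL = -55, intercept-only lnL0 = -61, N = 100
for name, value in fit_measures(-55.0, -61.0, 100).items():
    print(name, value)
```

All of these measures equal 0 when $\ln L = \ln L_{0}$ (with $K=0$), which provides a convenient consistency check across the formulas.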