The BCHOICE Procedure

Discrete Choice Models

Discrete choice models are used in marketing research to model decision makers’ choices among alternative products and services. The decision makers might be people, households, companies, and so on, and the alternatives might be products, services, actions, or any other options or items about which choices must be made (Train 2009). The collection of alternatives that are available to the decision makers is called a choice set.

Discrete choice models are derived under the assumption of utility-maximizing behavior by the decision maker. When individuals are asked to make one choice among a set of alternatives, they usually determine the level of utility that each alternative offers. The utility that individual i obtains from alternative j among J alternatives is denoted as

\[ u_{ij} = v_{ij} + \epsilon _{ij}, ~ ~ ~ ~ i=1, \ldots , N, ~ ~ \mbox{and}~ ~ j=1, \ldots , J \]

where the subscript i is an index for the individuals, the subscript j is an index for the alternatives in a choice set, $v_{ij}$ is a nonstochastic utility function that relates observed factors to the utility, and $\epsilon _{ij}$ is the error component that captures the unobserved characteristics of the utility. In discrete choice models, the observed part of the utility function is assumed to be linear in the parameters,

\[ v_{ij}=\mb{x}_{ij}'\bbeta \]

where $\mathbf{x}_{ij}$ is a p-dimensional design vector of observed attribute levels that relate to alternative j and $\bbeta $ is the corresponding vector of fixed regression coefficients that indicate the utilities or part-worths of the attribute levels.
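As a small illustration of the linear-in-parameters form, the following Python sketch computes $v_{ij}$ for one individual who faces three alternatives. The attribute names (price and a feature indicator), the design values, and the part-worths in `beta` are hypothetical and chosen only for illustration; they are not taken from any data set or from PROC BCHOICE output.

```python
def systematic_utility(x, beta):
    """Return v = x' beta for a single alternative."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length p")
    return sum(xk * bk for xk, bk in zip(x, beta))

# Hypothetical part-worths for p = 2 attributes: price and a feature indicator.
beta = [-0.5, 1.0]

# Hypothetical design vectors x_ij for J = 3 alternatives (one row per alternative).
X_i = [
    [2.0, 0.0],
    [3.0, 1.0],
    [4.0, 1.0],
]

# Observed (nonstochastic) part of the utility for each alternative.
v_i = [systematic_utility(x, beta) for x in X_i]
print(v_i)
```

Here the second alternative has the largest observed utility, but the alternative actually chosen also depends on the unobserved error components.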

Decision makers choose the alternative that gives them the greatest utility. Let $\mb{y}_{i}$ be the multinomial response vector for the ith individual. Its jth component, $y_{ij}$, takes the value 1 if the jth component of $\mb{u}_{i}=(u_{i1}, \ldots ,u_{iJ})$ is the largest, and 0 otherwise:

\begin{eqnarray*} u_{ij}& =& \mb{x}_{ij}'\bbeta + \epsilon _{ij}\\ y_{ij}& =& \left\{ \begin{array}{ll} 1 & \mbox{if}~ ~ u_{ij}\ge \max (\mb{u}_{i})\\ 0 & \mbox{otherwise} \end{array}\right. \end{eqnarray*}
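The choice rule can be sketched in a few lines of Python. The observed utilities and the error draws below are hypothetical values chosen for illustration. One difference from the displayed rule is flagged in the comments: when utilities tie exactly, this sketch marks only the first maximizing alternative, whereas with continuous errors a tie has probability zero.

```python
def choice_indicator(v, eps):
    """Return the 0/1 response vector y_i given observed utilities v and errors eps."""
    u = [vj + ej for vj, ej in zip(v, eps)]        # u_ij = v_ij + eps_ij
    best = max(range(len(u)), key=u.__getitem__)   # index of the largest utility
    # Assumption: exact ties are resolved in favor of the first maximizer.
    return [1 if j == best else 0 for j in range(len(u))]

# Hypothetical observed utilities and one hypothetical draw of the errors.
v_i = [-1.0, -0.5, -1.0]
eps_i = [0.3, -0.8, 0.1]

y_i = choice_indicator(v_i, eps_i)
print(y_i)
```

In this example the first alternative is chosen even though the second has the largest observed utility, because the error components reverse the ordering.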

The probability that individual i chooses alternative j is

\begin{eqnarray*} P(y_{ij}=1)& = & \mbox{Pr}~ (u_{ij} > u_{ik} ~ ~ \mbox{for}~ ~ \mbox{all}~ ~ k \ne j) \\ & = & \mbox{Pr}~ (v_{ij}+\epsilon _{ij} > v_{ik}+\epsilon _{ik} ~ ~ \mbox{for}~ ~ \mbox{all}~ ~ k \ne j) \\ & = & \mbox{Pr}~ (\epsilon _{ik} - \epsilon _{ij} < v_{ij} - v_{ik} ~ ~ \mbox{for}~ ~ \mbox{all}~ ~ k \ne j)\\ & = & \int _\epsilon \mbox{I}(\epsilon _{ik} - \epsilon _{ij} < v_{ij} - v_{ik} ~ ~ \mbox{for}~ ~ \mbox{all}~ ~ k \ne j)f(\bepsilon _ i)d\bepsilon _ i \end{eqnarray*}

where I(.) is the indicator function and $f(\bepsilon _ i)$ denotes the joint density of the error vector $\bepsilon _ i=(\epsilon _{i1}, \ldots , \epsilon _{iJ})$. Different specifications of the density result in different types of choice models, as detailed in the next section. Logit and nested logit models have a closed form for this integral, whereas a probit model does not.
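The integral above can be approximated by simulation: draw the error vector repeatedly, record which alternative has the largest utility, and use the relative frequencies as estimates of the choice probabilities. The following Python sketch does this under the assumption that the errors are independent standard Gumbel (type I extreme value) draws, which is the standard specification that yields the logit model, and compares the result with the logit closed form $\exp (v_{ij}) / \sum _ k \exp (v_{ik})$. The observed utilities, the number of draws, and the seed are hypothetical; this is an illustration of the integral, not a description of how PROC BCHOICE computes it.

```python
import math
import random

def logit_probs(v):
    """Closed-form logit probabilities exp(v_j) / sum_k exp(v_k)."""
    m = max(v)                                   # subtract the max for numerical stability
    w = [math.exp(vj - m) for vj in v]
    total = sum(w)
    return [wj / total for wj in w]

def gumbel_draw(rng):
    """One standard Gumbel draw by inverse CDF: -log(-log(U)), with 0 < U < 1."""
    u = rng.random()
    while u == 0.0:                              # avoid log(0) in the rare case U == 0
        u = rng.random()
    return -math.log(-math.log(u))

def simulated_probs(v, n_draws=50_000, seed=0):
    """Monte Carlo estimate of P(y_ij = 1) as the share of draws in which j is best."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_draws):
        u = [vj + gumbel_draw(rng) for vj in v]  # u_ij = v_ij + eps_ij
        counts[max(range(len(u)), key=u.__getitem__)] += 1
    return [c / n_draws for c in counts]

# Hypothetical observed utilities for J = 3 alternatives.
v_i = [-1.0, -0.5, -1.0]
exact = logit_probs(v_i)
approx = simulated_probs(v_i)
print(exact)
print(approx)
```

The two sets of probabilities should agree up to simulation error. For a model without a closed form, such as probit, only a simulation or numerical approximation of this kind is available.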