The BCHOICE Procedure

Random Effects

Choice models that have random effects (or random coefficients) provide a way to estimate individual-level or group-specific utilities. They are also referred to as "mixed models" or "hybrid models." One of the greatest challenges in marketing research is to account for the diversity of preferences and sensitivities in the marketplace. Heterogeneity in individual preferences is the reason for differentiated product programs and market segmentation. As individual preferences become more diverse, it becomes less appropriate to analyze the data in an aggregated way. Individual utilities are useful because they make segmentation easier and provide a way to detect groups. Because people have different preferences, it can be misleading to roll the whole sample together into a single set of utilities.

For example, imagine studying the popularity of a new brand. Some participants in the study love the new brand, whereas others dislike it. If you simply aggregate the data and look at the average, the conclusion is that the sample is ambivalent toward the new brand. This is the least helpful conclusion that could be drawn, because it describes no one in the sample.

Choice models that have random effects generalize the standard choice models to incorporate individual-level effects. Let the utility that individual i obtains from alternative j in choice situation t ($t=1,\ldots , T$ ) be

\begin{eqnarray*} u_{ijt} & =& \mb{x}_{ijt}'\bbeta + \mb{z}_{ijt}'\bgamma _ i+ \epsilon _{ijt}\\ y_{ijt}& =& \left\{ \begin{array}{ll} 1 & \mbox{if}~ ~ u_{ijt}\ge \max (u_{i1t}, u_{i2t}, \ldots , u_{iJt})\\ 0 & \mbox{otherwise} \end{array}\right. \end{eqnarray*}

where $ y_{ijt}$ is the observed choice for individual i and alternative j in choice situation t; $\mb{x}_{ijt}$ is the fixed design vector for individual i and alternative j in choice situation t; $\bbeta $ is the vector of fixed coefficients; $\mb{z}_{ijt}$ is the random design vector for individual i and alternative j in choice situation t; and $\bgamma _ i$ is the vector of random coefficients for individual i that correspond to $\mb{z}_{ijt}$. Sometimes the random coefficients might be at a level different from the individual level. For example, it is common to assume that there are random coefficients at the household level or group level of participants. For the convenience of notation, random effects are assumed to be at the individual level.
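
For example, the following statements sketch a model in which the random coefficients are shared within households rather than varying by individual. The data set name and the variable names (the household identifier household, the individual identifier subj, the choice-set identifier set, and the attributes x1 and x2) are hypothetical; the SUBJECT= option in the RANDOM statement specifies the level at which the random coefficients are drawn:

proc bchoice data=hhdata;
   class subj household set;
   model y = x1 x2 / choiceset=(subj set);
   random x1 x2 / subject=household;
run;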

In the random-effects model, it is assumed that each $\bgamma _ i$ is drawn from a superpopulation and this superpopulation is normal, $\bgamma _ i \sim \mbox{iid}~ ~ ~  \mbox{N}(\mb{0}, \bOmega _{\bgamma })$. An additional stage is added to the model where a prior for $\bOmega _{\bgamma }$ is specified:

\begin{eqnarray*} \pi (\bgamma _ i) & =& \mbox{N} (\mb{0}, \bOmega _{\bgamma })\\ \pi (\bOmega _{\bgamma }) & =& \mbox{inverse Wishart} (\nu _0, \bV _0) \end{eqnarray*}

The covariance matrix $\bOmega _{\bgamma }$ characterizes the extent of heterogeneity among individuals. Large diagonal elements of $\bOmega _{\bgamma }$ indicate substantial heterogeneity in part-worths. Off-diagonal elements indicate patterns in the evaluation of attribute levels. For example, positive covariances specify pairs of attribute levels that tend to be evaluated similarly across respondents. Product offerings that consist of these attribute levels are more strongly preferred or disliked by certain individuals.
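
To illustrate, suppose that there are two random part-worths and that their covariance matrix takes the following hypothetical values:

\begin{eqnarray*} \bOmega _{\bgamma } & =& \left( \begin{array}{rr} 1.5 & 0.8\\ 0.8 & 2.0 \end{array}\right) \end{eqnarray*}

The large diagonal elements indicate that both part-worths vary substantially across individuals, and the positive off-diagonal element indicates that individuals who evaluate the first attribute level favorably tend to evaluate the second one favorably as well.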

In this setup, the prior mean is 0 for the random effects, meaning that the random effects either are truly around 0 or have been centered by the fixed effects. For random effects whose mean is not around 0, you can follow the usual practice of also specifying them as fixed effects. For example, if one random effect is price and you do not think that the population mean of the price effect is around 0, then you should also add price as a fixed effect, as follows:

proc bchoice data=randeffdata;
   class subj set;
   model y = price / choiceset=(subj set);
   random  price / subject=subj;
run;

Thus, you obtain the estimate of the population mean of the price effect through the fixed effect, and you obtain each individual's deviation from that mean through the corresponding random effect.

Allenby and Rossi (1999) and Rossi, Allenby, and McCulloch (2005) propose a hierarchical Bayesian random-effects model that is set up in a different way. In their model, there are no fixed effects but only random effects. This model, which is referred to as the random-effects-only model in the rest of this chapter, is as follows:

\begin{eqnarray*} u_{ijt} & =& \mb{z}_{ijt}'\bgamma _ i+ \epsilon _{ijt}\\ y_{ijt}& =& \left\{ \begin{array}{ll} 1 & \mbox{if}~ ~ u_{ijt}\ge \max (u_{i1t}, u_{i2t}, \ldots , u_{iJt})\\ 0 & \mbox{otherwise} \end{array}\right.\\ \pi (\bgamma _ i) & =& \mbox{N} (\bar\bgamma , \bOmega _{\bgamma })\\ \pi (\bOmega _{\bgamma }) & =& \mbox{inverse Wishart} (\nu _0, \bV _0) \end{eqnarray*}

where $\bar\bgamma $ is a mean vector of regression coefficients, which models the central location of the distribution of the random coefficients; $\bar\bgamma $ represents the average part-worths across the respondents in the data. If you want to use this setup, specify the REMEAN option in the RANDOM statement to request estimation of $\bar\bgamma $, and do not specify any fixed effects in the MODEL statement.
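
For example, the following statements sketch the random-effects-only setup for the data that are used in the preceding example. This is only a sketch: it assumes that the MODEL statement accepts an empty list of fixed effects, and the REMEAN option in the RANDOM statement requests estimation of $\bar\bgamma $:

proc bchoice data=randeffdata;
   class subj set;
   model y = / choiceset=(subj set);
   random price / subject=subj remean;
run;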

Rossi, McCulloch, and Allenby (1996) and Rossi, Allenby, and McCulloch (2005) add another layer of flexibility to the random-effects-only model by allowing heterogeneity that is driven by observable (demographic) characteristics of the individuals. They model the prior mean of the random coefficients as a function of the individual’s demographic variables (such as age and gender),

\begin{eqnarray*} \pi (\bgamma _ i) & =& \mbox{N} (\bGamma d_{i}, \bOmega _{\bgamma })\\ \pi (\bOmega _{\bgamma }) & =& \mbox{inverse Wishart} (\nu _0, \bV _0) \end{eqnarray*}

where ${d_ i}$ is a vector that consists of an intercept and some observable demographic variables, and $\bGamma $ is a matrix of regression coefficients that shifts the location of the distribution of the random coefficients. If individual-level characteristics are included in ${d_ i}$, $\bGamma $ is useful for identifying respondents whose part-worths differ from those of the rest of the sample. This specification allows both the intercepts and the slopes of the part-worths to vary with the demographic variables. If ${d_ i}$ consists of only an intercept, this model reduces to the previous one. For more information about how to sample $\bGamma $, see Rossi, McCulloch, and Allenby (1996); Rossi, Allenby, and McCulloch (2005, Section 2.12); and Rossi (2012).
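
To make the dimensions concrete, suppose (only for illustration) that ${d_ i}$ consists of an intercept, age, and an indicator for gender, and that $\bgamma _ i$ has p components. Then $\bGamma $ is a $p \times 3$ matrix, and the prior mean of $\bgamma _ i$ is

\begin{eqnarray*} \mbox{E}(\bgamma _ i) & =& \bGamma d_ i ~ =~ \bGamma \left( \begin{array}{c} 1\\ \mbox{age}_ i\\ \mbox{gender}_ i \end{array}\right) \end{eqnarray*}

so each column of $\bGamma $ measures how the corresponding demographic variable shifts the average part-worths.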

Logit with Random Effects

The logit model with random effects has the following parameters: the fixed-coefficients parameters $\bbeta $, the random-coefficients parameters $\bgamma _ i$, and the covariance parameters for the random coefficients $\bOmega _{\bgamma }$. You can use Metropolis-Hastings sampling with the Gamerman approach to draw samples from the following three conditional posterior distributions:

\begin{eqnarray*} & (1)& (\bbeta | \bgamma _ i, \bY ) \\ & (2)& (\bgamma _ i | \bbeta , \bOmega _{\bgamma }, \bY ) ~ ~ ~ ~ i=1,\ldots , N \\ & (3)& (\bOmega _{\bgamma }|\bgamma _ i, \bY ) \end{eqnarray*}

All chains are initialized with random effects that are set to 0 and a covariance matrix that is set to an identity matrix. Updating is done first for the fixed effects, $\bbeta $, as a block to position the chain in the correct region of the parameter space. Then the random effects are updated, and finally the covariance of the random effects is updated. For more information, see Gamerman (1997) and the section Gamerman Algorithm.

The hierarchical Bayesian random-effects-only model as proposed in Allenby and Rossi (1999) and Rossi, Allenby, and McCulloch (2005) contains the random-coefficients parameters $\bgamma _ i$, the population mean of the random-coefficients parameters $\bar\bgamma $, and the covariance parameters for the random coefficients $\bOmega _{\bgamma }$. The sampling can be carried out by the following conditional posteriors:

\begin{eqnarray*} & (1)& (\bgamma _ i | \bar\bgamma ,\bOmega _{\bgamma }, \bY ) ~ ~ ~ ~ i=1,\ldots , N \\ & (2)& (\bar\bgamma | \bgamma _ i, \bOmega _{\bgamma }) \\ & (3)& (\bOmega _{\bgamma }|\bgamma _ i, \bar\bgamma ) \end{eqnarray*}

The second and third conditional posteriors are easy to draw from because they have direct sampling distributions: the second is a normal distribution with a mean of $\sum _{i=1}^{N} \bgamma _ i/N$ and a covariance of $\bOmega _{\bgamma }/N$; the third is an $\mbox{inverse Wishart} (\nu _0+N, \bV _0+S)$ distribution, where $ S=\sum _{i=1}^{N}(\bgamma _ i-\bar\bgamma )(\bgamma _ i-\bar\bgamma )'/N$. There is no closed form for the first conditional posterior; Metropolis-Hastings sampling with the Gamerman approach is the default sampling algorithm for it.

You can also use random walk Metropolis sampling when direct sampling is not an option, such as for the fixed-coefficients parameters $\bbeta $ and the random-coefficients parameters $\bgamma _ i$; specify ALGORITHM=RWM to choose random walk Metropolis sampling. For the random-coefficients parameters $\bgamma _ i$, Rossi, McCulloch, and Allenby (1996) and Rossi, Allenby, and McCulloch (2005) suggest random walk Metropolis sampling in which the increments have covariance $s^2 \bOmega _{\bgamma }^{t}$, where s is a scaling constant whose value is usually set to $\frac{2.93}{\sqrt {\mbox{dim}(\bgamma _ i)}}$ and $\bOmega _{\bgamma }^{t}$ is the current draw of $\bOmega _{\bgamma }$.
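
Written out, this random walk proposal draws a candidate $\bgamma _ i^{\ast }$ (the asterisk notation is used here only for illustration) for individual i at iteration t by perturbing the current value:

\begin{eqnarray*} \bgamma _ i^{\ast } & =& \bgamma _ i^{t} + \mb{e}_ i, ~ ~ ~ ~ \mb{e}_ i \sim \mbox{N}(\mb{0}, s^2 \bOmega _{\bgamma }^{t}), ~ ~ ~ ~ s=\frac{2.93}{\sqrt {\mbox{dim}(\bgamma _ i)}} \end{eqnarray*}

The candidate is then accepted or rejected according to the usual Metropolis acceptance probability.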

The sampling for the random-effects-only setup is often faster, because the first conditional posterior for each i involves only the data for individual i and because the second and third conditional posteriors depend not on the data directly but only on the draws of $\bgamma _ i$.

Probit with Random Effects

The probit model with random effects has the following parameters: the fixed-coefficients parameters $\bbeta $, the covariance parameters for the error differences $\tilde\bSigma $, the random-coefficients parameters $\bgamma _ i$, and the covariance parameters for the random coefficients $\bOmega _{\bgamma }$. That is, it has the extra parameters $(\bgamma _ i, \bOmega _{\bgamma } )$ in addition to the $(\bbeta , \tilde\bSigma )$ of a fixed-effects-only model. You can conveniently adapt the Gibbs sampler that is proposed in McCulloch and Rossi (1994) to handle this model by appending the new parameters to the set of parameters that would be drawn for a probit model without random coefficients. The sampling can be carried out by the following conditional posteriors:

\begin{eqnarray*} & (1)& (w_{ij} | \mb{w}_{i,-j}, \bbeta , \tilde\bSigma , \bgamma _ i, \bY ) ~ ~ ~ ~ i=1,\ldots , N ~ ~ \mbox{and}~ ~ j=1,\ldots , J-1 \\ & (2)& (\bbeta | \bW , \tilde\bSigma , \bgamma _ i, \bY ) \\ & (3)& (\bgamma _ i | \bW , \bbeta , \tilde\bSigma ,\bOmega _{\bgamma }, \bY ) ~ ~ ~ ~ i=1,\ldots , N \\ & (4)& (\tilde\bSigma | \bW , \bbeta , \bY ) \\ & (5)& (\bOmega _{\bgamma } | \bW , \bbeta ,\tilde\bSigma ,\bgamma _ i, \bY ) \end{eqnarray*}

All the groups of conditional distributions have closed forms that are easy to draw from: conditional (1) is a truncated normal distribution, (2) and (3) are normal distributions, and (4) and (5) are inverse Wishart distributions. For more information, see McCulloch and Rossi (1994).
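
As a minimal sketch, the following statements fit the earlier random-effects example as a probit model rather than a logit model. The TYPE=PROBIT option in the MODEL statement is assumed here to be the way to request the probit model; it is not described in this section, so check the MODEL statement syntax:

proc bchoice data=randeffdata;
   class subj set;
   model y = price / choiceset=(subj set) type=probit;
   random price / subject=subj;
run;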