The BCHOICE Procedure

A Logit Model with Random Effects

Choice models that have random effects (or random coefficients) provide a way to estimate individual-level or group-specific utilities. Because people have different preferences, it can be misleading to pool the whole sample into a single set of utilities. Accounting for individual differences, instead of treating all respondents alike, is a challenge in marketing research. For logit models that have random effects, using frequentist methods to optimize the likelihood function can be numerically difficult. Bayesian methods are well suited for analysis with random effects.

Choice models that have random effects generalize the standard choice models to incorporate individual-level effects. Let the utility that individual i obtains from alternative j in choice situation t ($t=1,\ldots ,T$) be

\begin{eqnarray*} u_{ijt} & =& \mb{x}_{ijt}'\bbeta + \mb{z}_{ijt}'\bgamma _ i+ \epsilon _{ijt}\\ y_{ijt} & =& 1 ~ ~ \mbox{if}~ ~ u_{ijt}\ge \max (u_{i1t}, u_{i2t}, \ldots , u_{iJt})\\ & =& 0 ~ ~ \mbox{otherwise} \end{eqnarray*}

where $ y_{ijt}$ is the observed choice for individual i and alternative j in choice situation t; $\mb{x}_{ijt}$ is the fixed design vector for individual i and alternative j in choice situation t; $\bbeta $ are the fixed coefficients; $\mb{z}_{ijt}$ is the random design vector for individual i and alternative j in choice situation t; and $\bgamma _ i$ are the random coefficients for individual i corresponding to $\mb{z}_{ijt}$.
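As a worked consequence of this utility model, and assuming that the errors $\epsilon _{ijt}$ are independent and follow a type I extreme value distribution (the standard assumption behind a logit model), the probability that individual i chooses alternative j in choice situation t, conditional on the coefficients, has the familiar closed form

\[ \Pr (y_{ijt}=1 \mid \bbeta , \bgamma _ i) = \frac{\exp (\mb{x}_{ijt}'\bbeta + \mb{z}_{ijt}'\bgamma _ i)}{\sum _{k=1}^{J} \exp (\mb{x}_{ikt}'\bbeta + \mb{z}_{ikt}'\bgamma _ i)} \]

where J is the number of alternatives in the choice set, as in the maximum in the definition of $y_{ijt}$.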

It is assumed that each $\bgamma _ i$ is drawn from a superpopulation and that this superpopulation is normal, $\bgamma _ i \sim \mbox{iid}~ ~ ~  \mbox{N}(\mb{0}, \bOmega _{\bgamma })$. An additional stage is added to the model in which a prior for $\bOmega _{\bgamma }$ is specified:

\begin{eqnarray*} \pi (\bgamma _ i) & =& \mbox{N} (\mb{0}, \bOmega _{\bgamma })\\ \pi (\bOmega _{\bgamma }) & =& \mbox{inverse Wishart} (\nu _0, \bV _0) \end{eqnarray*}

The covariance matrix $\bOmega _{\bgamma }$ characterizes the extent of heterogeneity among individuals. Large diagonal elements of $\bOmega _{\bgamma }$ indicate substantial heterogeneity in part-worths. Off-diagonal elements indicate patterns in the evaluation of attribute levels.

Consider a study that estimates the market demand for kitchen trash cans (Rossi 2013). There are four attributes, and each has two levels: touchless opening (Yes/No), material (Steel/Plastic), automatic trash bag replacement (Yes/No), and price (USD80/USD40). There are $2^4=16$ possible hypothetical types of trash cans, and including more attributes or more levels can quickly make the full set unmanageable. The study therefore uses a fractional factorial design, in which the first three factors are set up as a full factorial design and the fourth is generated as the product of the first three. This design, which is shown in Table 27.1, confounds the three-way interaction with the effect of the fourth factor.

Table 27.1: Design for the Trash Can Study

Obs Touchless Steel AutoBag Price80
1 –1 –1 –1 –1
2 –1 –1 1 1
3 –1 1 –1 1
4 –1 1 1 –1
5 1 –1 –1 1
6 1 –1 1 –1
7 1 1 –1 –1
8 1 1 1 1


In Table 27.1, 1 means "Yes" and –1 means "No"; for example, Steel = –1 denotes a plastic can and Price80 = –1 denotes the USD40 price. This is a balanced design, in which each level appears the same number of times. Each choice set in the study contains only two alternatives, which are formed by randomly sampling two rows from Table 27.1, and each individual is given 10 choice sets (or choice tasks). For more information about how to design a choice model efficiently, see Kuhfeld (2010).
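The construction of the design can be sketched in a few lines. The following Python code is an illustration only and is not part of PROC BCHOICE or of the original study; the function names and the random seed are hypothetical. It builds the eight runs by crossing the first three factors and setting the fourth to their product, checks the balance property, and draws one two-alternative choice set.

```python
import random
from itertools import product

# Column order follows Table 27.1; coding is -1 = "No", 1 = "Yes".
ATTRIBUTES = ("Touchless", "Steel", "AutoBag", "Price80")


def half_fraction_design():
    """Full factorial in the first three factors; fourth factor is their product."""
    rows = []
    for touchless, steel, autobag in product((-1, 1), repeat=3):
        price80 = touchless * steel * autobag  # generator for the fourth factor
        rows.append((touchless, steel, autobag, price80))
    return rows


def is_balanced(rows):
    """True if -1 and 1 appear equally often in every column."""
    return all(sum(column) == 0 for column in zip(*rows))


def sample_choice_set(rows, rng):
    """Draw two distinct design rows to form one two-alternative choice set."""
    return rng.sample(rows, 2)


design = half_fraction_design()
for obs, row in enumerate(design, start=1):
    print(obs, *row)

print("balanced:", is_balanced(design))

# The seed is arbitrary and only makes the illustration repeatable.
print("one choice set:", sample_choice_set(design, random.Random(2024)))
```

Because Price80 is defined as the product of the other three columns, its effect cannot be separated from their three-way interaction, which is the confounding that the text describes.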

Data were obtained by enrolling 104 people and assigning 10 choice tasks to each of them: for each task, the participants stated their preference between two types of trash cans. The following steps read in the data:

data Trashcan;
   input ID Task Choice Index Touchless Steel AutoBag Price80 @@;
   datalines;  
1 1 1 1 0 1 1 0 1 1 0 2 1 1 0 0 1 2 0 1 0 0 0 0 1 2 1 2 1 1 1 1 1 3 0 1
0 0 0 0 1 3 1 2 1 1 0 0 1 4 0 1 0 1 0 1 1 4 1 2 1 0 0 1 1 5 1 1 0 1 1 0
1 5 0 2 1 0 0 1 1 6 0 1 0 0 1 1 1 6 1 2 1 1 1 1 1 7 0 1 1 0 0 1 1 7 1 2
1 1 0 0 1 8 0 1 0 0 1 1 1 8 1 2 0 1 1 0 1 9 1 1 0 0 1 1 1 9 0 2 1 0 0 1
1 10 0 1 0 1 1 0 1 10 1 2 1 0 1 0 2 1 1 1 0 1 1 0 2 1 0 2 1 1 0 0 2 2 0
1 0 0 0 0 2 2 1 2 1 1 1 1 2 3 0 1 0 0 0 0 2 3 1 2 1 1 0 0 2 4 0 1 0 1 0
1 2 4 1 2 1 0 0 1 2 5 1 1 0 1 1 0 2 5 0 2 1 0 0 1 2 6 1 1 0 0 1 1 2 6 0
2 1 1 1 1 2 7 1 1 1 0 0 1 2 7 0 2 1 1 0 0 2 8 1 1 0 0 1 1 2 8 0 2 0 1 1
0 2 9 1 1 0 0 1 1 2 9 0 2 1 0 0 1 2 10 1 1 0 1 1 0 2 10 0 2 1 0 1 0 3 1

   ... more lines ...   

2 1 2 1 1 1 1 104 3 0 1 0 0 0 0 104 3 1 2 1 1 0 0 104 4 0 1 0 1 0 1 104
4 1 2 1 0 0 1 104 5 1 1 0 1 1 0 104 5 0 2 1 0 0 1 104 6 0 1 0 0 1 1 104
6 1 2 1 1 1 1 104 7 0 1 1 0 0 1 104 7 1 2 1 1 0 0 104 8 0 1 0 0 1 1 104
8 1 2 0 1 1 0 104 9 0 1 0 0 1 1 104 9 1 2 1 0 0 1 104 10 0 1 0 1 1 0 104
10 1 2 1 0 1 0
;
proc print data=Trashcan (obs=8);
run;

The data for the first four choice tasks are shown in Figure 27.8.

Figure 27.8: Data for the First Four Choice Tasks

Obs ID Task Choice Index Touchless Steel AutoBag Price80
1 1 1 1 1 0 1 1 0
2 1 1 0 2 1 1 0 0
3 1 2 0 1 0 0 0 0
4 1 2 1 2 1 1 1 1
5 1 3 0 1 0 0 0 0
6 1 3 1 2 1 1 0 0
7 1 4 0 1 0 1 0 1
8 1 4 1 2 1 0 0 1



In the data, ID is the individual’s ID number, and Task indexes the choice tasks. Index numbers the alternatives within each choice task. The response is Choice, which is 1 for the alternative that the individual chose in a choice task and 0 otherwise. Touchless, Steel, AutoBag, and Price80 are the attribute variables; for each of them, 1 means "Yes" and 0 means "No." In the data, 0 replaces the –1 values that are shown in the design matrix in Table 27.1.
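The layout of the data (one row per alternative, with rows grouped into choice sets by ID and Task) can be illustrated with the eight observations in Figure 27.8. The following Python sketch is for illustration only; the helper names are hypothetical. It groups the rows into choice sets and confirms that each set has two alternatives and exactly one chosen alternative.

```python
from collections import defaultdict

COLUMNS = ("ID", "Task", "Choice", "Index",
           "Touchless", "Steel", "AutoBag", "Price80")

# First eight observations, copied from Figure 27.8.
ROWS = [
    (1, 1, 1, 1, 0, 1, 1, 0),
    (1, 1, 0, 2, 1, 1, 0, 0),
    (1, 2, 0, 1, 0, 0, 0, 0),
    (1, 2, 1, 2, 1, 1, 1, 1),
    (1, 3, 0, 1, 0, 0, 0, 0),
    (1, 3, 1, 2, 1, 1, 0, 0),
    (1, 4, 0, 1, 0, 1, 0, 1),
    (1, 4, 1, 2, 1, 0, 0, 1),
]


def group_choice_sets(rows):
    """Group alternative-level rows by the (ID, Task) pair that defines a choice set."""
    sets = defaultdict(list)
    for row in rows:
        record = dict(zip(COLUMNS, row))
        sets[(record["ID"], record["Task"])].append(record)
    return dict(sets)


def chosen_index(alternatives):
    """Return the Index of the chosen alternative; fail if the set is malformed."""
    chosen = [alt["Index"] for alt in alternatives if alt["Choice"] == 1]
    if len(chosen) != 1:
        raise ValueError("a choice set needs exactly one chosen alternative")
    return chosen[0]


choice_sets = group_choice_sets(ROWS)
picks = {key: chosen_index(alts) for key, alts in choice_sets.items()}
for (subject, task), index in picks.items():
    print(f"ID {subject}, task {task}: chose alternative {index}")
```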

The following statements fit a logit model with random effects:

proc bchoice data=Trashcan seed=1 nmc=30000 thin=2 nthreads=4;
   class ID Task;
   model Choice = Touchless Steel AutoBag Price80 / choiceset=(ID Task);
   random Touchless Steel AutoBag Price80 / sub=ID monitor=(1 to 5) type=un;
run;

The NTHREADS option in the PROC BCHOICE statement specifies the number of threads to be used for analytic computations and sampling. Using four threads at the same time enhances the efficiency and reduces the run time. If you do not specify the NTHREADS option, the default number is 1. The maximum number of threads should not exceed the total number of CPUs on the host where the analytic computations execute.

The choice set is specified by ID (which identifies the participants) and by Task (which identifies each of the 10 choice tasks that are assigned to each participant). The variables ID and Task are needed in the CLASS statement because they define the choice set in the MODEL statement.

In addition to the MODEL statement for fixed effects, the RANDOM statement is added for random effects. Touchless, Steel, AutoBag, and Price80 are listed as both fixed and random effects, so that their average part-worth values in the population are estimated via the fixed effects and each individual’s deviation from the overall mean is represented through the random effects. The SUB=ID option in the RANDOM statement defines ID as the subject index for the random-effects grouping, so that each person with a different ID has his or her own random effects. The MONITOR= option requests individual-level random-effects parameter estimates; MONITOR=(1 to 5) requests them for the first five subjects. (By default, PROC BCHOICE does not output results for any individual-level random-effects parameters.) The TYPE=UN option in the RANDOM statement specifies an unstructured covariance matrix for the random effects, which makes it possible to estimate the covariances (and hence the correlations) between the random effects. The TYPE=VC (variance components) option, which is the default structure, models a different variance component for each random effect.
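To make the difference between the two covariance structures concrete, the following Python sketch (an illustration only; the function names are hypothetical and are not PROC BCHOICE syntax) counts the covariance parameters that each structure implies for the four random effects. The labels for the unstructured case are written to mirror the "RECov" labels in the procedure output.

```python
EFFECTS = ["Touchless", "Steel", "AutoBag", "Price80"]


def unstructured_labels(effects):
    """Lower-triangle entries (row, column) of a symmetric covariance matrix."""
    return [
        f"RECov {effects[r]}, {effects[c]}"
        for r in range(len(effects))
        for c in range(r + 1)
    ]


def n_cov_parameters(n_effects, cov_type):
    """Number of free covariance parameters for the given structure."""
    if cov_type == "UN":   # variances plus all pairwise covariances
        return n_effects * (n_effects + 1) // 2
    if cov_type == "VC":   # one variance per random effect
        return n_effects
    raise ValueError(f"unknown covariance type: {cov_type}")


labels = unstructured_labels(EFFECTS)
for label in labels:
    print(label)

print("UN parameters:", n_cov_parameters(len(EFFECTS), "UN"))
print("VC parameters:", n_cov_parameters(len(EFFECTS), "VC"))
```

With four random effects, the unstructured matrix has 10 distinct elements, compared with 4 variance components under the default structure.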

Summary statistics for the fixed coefficients ($\bbeta $), the covariance of the random coefficients ($\bOmega _{\bgamma }$), and the random coefficients ($\bgamma _ i$) for the first five individuals are shown in Figure 27.9.

Figure 27.9: Posterior Summary Statistics

The BCHOICE Procedure

Posterior Summaries and Intervals
Parameter Subject N Mean Standard Deviation 95% HPD Interval
Touchless   15000 1.7043 0.2709 1.1645 2.2322
Steel   15000 1.0516 0.2608 0.5376 1.5691
AutoBag   15000 2.1722 0.3583 1.4886 2.8948
Price80   15000 -4.6321 0.6720 -5.9688 -3.4046
RECov Touchless, Touchless   15000 3.1145 1.0613 1.3441 5.2576
RECov Steel, Touchless   15000 -0.4727 0.8796 -2.1884 1.2334
RECov Steel, Steel   15000 2.6312 0.9565 1.1035 4.5817
RECov AutoBag, Touchless   15000 -0.6473 0.8811 -2.3803 1.1974
RECov AutoBag, Steel   15000 0.0595 0.7078 -1.4318 1.3748
RECov AutoBag, AutoBag   15000 3.5547 1.5837 0.9611 6.7248
RECov Price80, Touchless   15000 -1.3016 1.1857 -3.8046 0.7154
RECov Price80, Steel   15000 -1.5935 1.2052 -4.1428 0.6170
RECov Price80, AutoBag   15000 -2.2190 1.7156 -5.7073 0.7349
RECov Price80, Price80   15000 7.7452 3.4362 2.2287 14.3546
Touchless ID 1 15000 0.5244 1.0525 -1.3366 2.7890
Steel ID 1 15000 -0.3298 1.0799 -2.3057 1.9751
AutoBag ID 1 15000 1.6216 1.3842 -0.9207 4.4860
Price80 ID 1 15000 -0.5394 2.2018 -5.2251 3.4324
Touchless ID 2 15000 -1.7498 0.9341 -3.5965 0.0610
Steel ID 2 15000 -1.4343 0.8572 -3.2176 0.2048
AutoBag ID 2 15000 0.4767 1.3208 -2.0239 3.1743
Price80 ID 2 15000 4.6873 1.4955 1.9223 7.7535
Touchless ID 3 15000 -0.6191 0.9695 -2.4024 1.4490
Steel ID 3 15000 0.5779 0.9524 -1.2792 2.4187
AutoBag ID 3 15000 -1.8737 1.2107 -4.2878 0.4407
Price80 ID 3 15000 3.7507 1.6747 0.4795 7.0672
Touchless ID 4 15000 -0.5921 0.9679 -2.5222 1.2694
Steel ID 4 15000 0.3152 1.0416 -1.7134 2.3286
AutoBag ID 4 15000 0.9770 1.4250 -1.7103 3.9276
Price80 ID 4 15000 -1.9068 2.3717 -6.4256 2.8233
Touchless ID 5 15000 -1.2689 1.1491 -3.4720 0.9828
Steel ID 5 15000 1.6581 1.2220 -0.4731 4.2409
AutoBag ID 5 15000 1.2695 1.5708 -1.7506 4.4762
Price80 ID 5 15000 -0.3706 2.3810 -5.0593 4.2134



The fixed effects (Touchless, Steel, AutoBag, and Price80) are shown in the first four rows. Across all the respondents in the data, the average part-worths for touchless opening, steel material, and automatic trash bag replacement are all positive, indicating that respondents on average favor those features; the average part-worth for having to pay USD80 for a trash can instead of USD40 is negative (–4.6), which is intuitive, because paying more is usually unfavorable.

The covariance estimate of the random coefficients ($\bOmega _{\bgamma }$) is displayed by the parameters whose label begins with "RECov":

\[ \hat{\bOmega }_{\bgamma } = \begin{pmatrix} 3.11 & . & . & . \\ -0.47 & 2.63 & . & . \\ -0.65 & 0.06 & 3.55 & . \\ -1.30 & -1.59 & -2.22 & 7.75 \end{pmatrix} \]

where the dots stand for elements that equal the corresponding entries in the lower part of the symmetric covariance matrix. The covariance estimate of the random coefficients ($\bOmega _{\bgamma }$) characterizes the variability of part-worths across respondents. Some of the diagonal elements of the matrix are large. For example, the variance for price (labeled "RECov Price80, Price80") is 7.75, indicating substantial heterogeneity across respondents in their response to price. Off-diagonal elements of the matrix indicate attribute levels that tend to be evaluated similarly (positive covariance) or differently (negative covariance) across respondents. The posterior mean covariances between each of the attributes (Touchless, Steel, and AutoBag) and Price80 are all negative, suggesting that the respondents who prefer the new features more strongly tend also to be less willing to pay the higher price for the trash can. Therefore, offering a discounted price might be a particularly effective method of introducing the new features to customers.
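Covariances are easier to compare after they are rescaled to correlations. The following Python sketch is an illustration only. It plugs the posterior means of the "RECov" parameters from Figure 27.9 into the usual formula, correlation = covariance / (product of the two standard deviations). This plug-in value is an approximation and is not the same as the posterior mean of the correlation, which would have to be computed from the posterior draws.

```python
import math

EFFECTS = ["Touchless", "Steel", "AutoBag", "Price80"]

# Posterior means of the RECov parameters from Figure 27.9 (lower triangle).
LOWER = [
    [3.1145],
    [-0.4727, 2.6312],
    [-0.6473, 0.0595, 3.5547],
    [-1.3016, -1.5935, -2.2190, 7.7452],
]


def full_symmetric(lower):
    """Expand a lower-triangular list of rows into a full symmetric matrix."""
    n = len(lower)
    return [[lower[max(r, c)][min(r, c)] for c in range(n)] for r in range(n)]


def to_correlation(cov):
    """Rescale a covariance matrix to a correlation matrix."""
    n = len(cov)
    sd = [math.sqrt(cov[k][k]) for k in range(n)]
    return [[cov[r][c] / (sd[r] * sd[c]) for c in range(n)] for r in range(n)]


omega = full_symmetric(LOWER)
corr = to_correlation(omega)

for name, row in zip(EFFECTS, corr):
    print(f"{name:10s}", " ".join(f"{value:6.2f}" for value in row))
```

Because the 95% HPD intervals for the off-diagonal covariances in Figure 27.9 include zero, these correlations are best read as rough indications of direction rather than as precise estimates.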

The next set of parameters that is displayed contains the estimates of the individual-level random effects for the first five respondents (see Figure 27.9). These estimates are the deviations from the overall means (which are estimated via the fixed effects). For example, the part-worth for touchless opening for the first respondent (who is labeled "ID 1" in the Subject column) is 1.7 + 0.5 = 2.2.
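The same addition can be carried out for all four attributes and then used to score a choice set. The following Python sketch is an illustration only; the helper names are hypothetical. It adds the posterior means of the fixed effects and of the random effects for ID 1 (both from Figure 27.9), and then evaluates the two alternatives of that respondent's first task (from Figure 27.8) under the standard multinomial logit probability, which is assumed here. Plugging in posterior means gives an approximation, not the posterior mean of the probability.

```python
import math

ATTRIBUTES = ("Touchless", "Steel", "AutoBag", "Price80")

# Posterior means from Figure 27.9.
FIXED = {"Touchless": 1.7043, "Steel": 1.0516,
         "AutoBag": 2.1722, "Price80": -4.6321}
RANDOM_ID1 = {"Touchless": 0.5244, "Steel": -0.3298,
              "AutoBag": 1.6216, "Price80": -0.5394}

# The two alternatives in the first task of ID 1, from Figure 27.8.
TASK1 = [
    {"Touchless": 0, "Steel": 1, "AutoBag": 1, "Price80": 0},  # Index 1 (chosen)
    {"Touchless": 1, "Steel": 1, "AutoBag": 0, "Price80": 0},  # Index 2
]


def part_worths(fixed, deviation):
    """Individual part-worth = population mean + individual deviation."""
    return {a: fixed[a] + deviation[a] for a in ATTRIBUTES}


def utility(alternative, worths):
    """Deterministic part of the utility for one alternative."""
    return sum(alternative[a] * worths[a] for a in ATTRIBUTES)


def choice_probabilities(alternatives, worths):
    """Multinomial logit probabilities, computed stably by subtracting the max."""
    utilities = [utility(alt, worths) for alt in alternatives]
    top = max(utilities)
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


worths_id1 = part_worths(FIXED, RANDOM_ID1)
probs = choice_probabilities(TASK1, worths_id1)

for name in ATTRIBUTES:
    print(f"{name:10s} {worths_id1[name]:8.4f}")
print("choice probabilities:", [round(p, 3) for p in probs])
```

Under these assumptions, the alternative that respondent 1 actually chose in the first task (Index 1) receives the larger of the two probabilities.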

Allenby and Rossi (1999) and Rossi, Allenby, and McCulloch (2005) propose a hierarchical Bayesian random-effects model that is set up in a different way such that there are no fixed effects but only random effects. For more information about this type of model, see the section Random Effects and a follow-up example in A Random-Effects-Only Logit Model.