The BCHOICE Procedure (Experimental)

Example 27.7 Predict the Choice Probabilities

This example shows how to obtain the posterior predictive distribution of the probability that each alternative in a choice set is chosen. With this distribution you can estimate the expected choice probabilities of all the alternatives in the data. You can also predict market shares for simulated or hypothetical products or marketplaces that do not correspond directly to the choice sets in the data.

Suppose you have a data set that contains all the attribute variables (the design matrix) for all the alternatives in a choice set. For example, you can reuse the eight alternatives from the candy study in the section "A Simple Logit Model" earlier in this chapter: Dark is 1 for dark chocolate and 0 for milk chocolate; Soft is 1 for a soft center and 0 for a chewy center; Nuts is 1 if the candy contains nuts and 0 if it does not. The following data set contains the eight alternatives:

data DesignMatrix;
   input Dark Soft Nuts;
datalines;
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
;
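Because the eight alternatives are simply every combination of three binary attributes, you could also generate the same design matrix with nested DO loops instead of typing the rows. The following DATA step is a sketch of that alternative. The loop order (Dark outermost, Nuts innermost) is chosen so that the rows come out in the same order as the DATALINES version, which matters because the predicted probabilities are indexed by row position within the choice set.

```sas
/* Generate all 2 x 2 x 2 = 8 combinations of the binary attributes. */
/* Dark varies slowest and Nuts fastest, which reproduces the row    */
/* order of the DATALINES version of DesignMatrix.                    */
data DesignMatrix;
   do Dark = 0 to 1;
      do Soft = 0 to 1;
         do Nuts = 0 to 1;
            output;   /* explicit OUTPUT: one row per combination */
         end;
      end;
   end;
run;
```

The explicit OUTPUT statement writes one observation per combination and suppresses the implicit output at the end of the step, so the data set has exactly eight rows.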

You can use the PREDDIST statement, which obtains samples from the posterior predictive distribution of each of the choice probabilities by using the posterior samples of parameters in the model:

proc bchoice data=Chocs outpost=Bsamp nmc=10000 thin=2 seed=124;
   class Dark(ref='0') Soft(ref='0') Nuts(ref='0') Subj;
   model Choice = Dark Soft Nuts / choiceset=(Subj);
   preddist covariates=DesignMatrix nalter=8 outpred=Predout;
run;

%POSTSUM(data=Predout, var=Prob_1_:);

In the PREDDIST statement, the COVARIATES= option names the data set that contains the explanatory variable values for which predictions are made. This data set must contain the same variables that are used in the model. The NALTER= option specifies the number of alternatives in each choice set in the COVARIATES= data set; all choice sets in that data set must have the same number of alternatives. If you omit the COVARIATES= option, the DATA= data set that you specify in the PROC BCHOICE statement is used instead. The OUTPRED= option creates an output data set that contains the samples from the posterior predictive distribution of the choice probabilities. You can then use SAS autocall macros to analyze these samples; for example, the %POSTSUM macro provides summary statistics.
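To make concrete what each posterior predictive sample represents, the following formula sketches the computation for the fixed-effects logit model used in this example; models with random effects involve additional individual-level parameters that this sketch does not cover. The notation is introduced here for illustration: $\mathbf{x}_{ij}$ is the coded attribute row for alternative $j$ in choice set $i$ of the COVARIATES= data set (with REF='0' coding, this is just the row of Dark, Soft, and Nuts values), $\boldsymbol{\beta}^{(s)}$ is the $s$th retained posterior draw, $J$ is the NALTER= value, and $S$ is the number of retained draws. Each draw yields one vector of $J$ probabilities that sum to 1, and the variable Prob_i_j in the OUTPRED= data set holds the values $p_{ij}^{(s)}$ across the draws.

```latex
p_{ij}^{(s)} = \frac{\exp\left(\mathbf{x}_{ij}'\boldsymbol{\beta}^{(s)}\right)}
                    {\sum_{k=1}^{J} \exp\left(\mathbf{x}_{ik}'\boldsymbol{\beta}^{(s)}\right)},
\qquad j = 1, \ldots, J, \quad s = 1, \ldots, S
```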

You can predict the choice probabilities by using the means of the posterior predictive distributions. Output 27.7.1 shows the results from the %POSTSUM macro. There is only one choice set for prediction, and it contains eight alternatives. This explains the parameter names in the first column of the output: the first number indexes the choice set, and the second number indexes the alternative within that choice set. Each N is 5000 because NMC=10000 with THIN=2 keeps every second of the 10,000 posterior draws. The most preferred chocolate candy is the sixth alternative, Dark/Chewy/Nuts, which takes about half the market (posterior mean 0.49462).
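If you prefer a Base SAS procedure to the autocall macro, a PROC MEANS step along the following lines should produce comparable summary statistics from the OUTPRED= data set. This is a sketch, not a documented part of PROC BCHOICE: the statistic keywords are standard PROC MEANS options, and the name-prefix list Prob_1_: assumes that the only variables in Predout that begin with that prefix are the eight choice probabilities. The second step adds a consistency check: within each draw, the eight probabilities of the single choice set should sum to 1. The means in Output 27.7.1 add up to 1.00001, which is 1 up to rounding.

```sas
/* Posterior predictive summaries of the eight choice probabilities. */
proc means data=Predout n mean std p25 median p75;
   var Prob_1_:;   /* name-prefix list: Prob_1_1 through Prob_1_8 */
run;

/* Consistency check: each draw's probabilities should sum to 1. */
data CheckSum;
   set Predout;
   Total = sum(of Prob_1_:);
run;

proc means data=CheckSum min max;
   var Total;
run;
```

Both the minimum and maximum of Total should be 1 apart from floating-point rounding.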

Output 27.7.1: Choice of Chocolate Candies

Summary Statistics

Parameter     N      Mean    StdDev       P25       P50       P75
Prob_1_1   5000   0.05541   0.04252   0.02396   0.04529   0.07282
Prob_1_2   5000   0.13093   0.08009   0.06949   0.11487   0.17965
Prob_1_3   5000   0.00686   0.00896   0.00137   0.00366   0.00853
Prob_1_4   5000   0.01578   0.01786   0.00389   0.00966   0.02061
Prob_1_5   5000   0.21016   0.10328   0.13431   0.19301   0.27497
Prob_1_6   5000   0.49462   0.13152   0.40465   0.49734   0.58797
Prob_1_7   5000   0.02617   0.02793   0.00717   0.01683   0.03550
Prob_1_8   5000   0.06008   0.05261   0.02032   0.04571   0.08514