The BCHOICE Procedure (Experimental)

PREDDIST Statement

PREDDIST <'label'> OUTPRED=SAS-data-set <COVARIATES=SAS-data-set> ;

The PREDDIST statement creates a new SAS data set that contains random samples from the posterior predictive distribution of the choice probabilities. The posterior predictive distribution is the distribution of unobserved data (predictions) conditional on the observed data. Let $\mb {Y}$ be the observed data, $\mb {X}$ be the covariates, $\btheta $ be the parameters, and $\mb {Y}_{\mbox{pred}}$ be the unobserved data. The posterior predictive distribution is defined as follows:

\begin{eqnarray*}  p(\mb {Y}_{\mbox{pred}} | \mb {Y}, \mb {X}) & =&  \int p(\mb {Y}_{\mbox{pred}}, \btheta | \mb {Y}, \mb {X}) d\btheta \\ & =&  \int p(\mb {Y}_{\mbox{pred}} | \btheta , \mb {Y}, \mb {X}) p(\btheta | \mb {Y}, \mb {X}) d\btheta \\ \end{eqnarray*}

Assuming that the observed and unobserved data are conditionally independent given $\btheta $, the posterior predictive distribution can be further simplified as follows:

\[  p(\mb {Y}_{\mbox{pred}} | \mb {Y}, \mb {X} ) = \int p(\mb {Y}_{\mbox{pred}} | \btheta ) p(\btheta | \mb {Y}, \mb {X}) d\btheta  \]
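This integral rarely has a closed form. The following block states the standard Monte Carlo argument as a general illustration, not as a description of the procedure's internals. Suppose $\btheta ^{(1)}, \ldots , \btheta ^{(M)}$ are draws from the posterior $p(\btheta | \mb {Y}, \mb {X})$, and $M$ is a symbol introduced here. The predictive density can then be approximated by averaging the likelihood over those draws. If you also draw one $\mb {Y}_{\mbox{pred}}^{(m)}$ from $p(\mb {Y}_{\mbox{pred}} | \btheta ^{(m)})$ for each $m$, the pairs $(\btheta ^{(m)}, \mb {Y}_{\mbox{pred}}^{(m)})$ come from the joint posterior, so the $\mb {Y}_{\mbox{pred}}^{(m)}$ are draws from the posterior predictive distribution. This technique is known as composition sampling.

\[  p(\mb {Y}_{\mbox{pred}} | \mb {Y}, \mb {X} ) \approx \frac{1}{M} \sum _{m=1}^{M} p(\mb {Y}_{\mbox{pred}} | \btheta ^{(m)}), \qquad \btheta ^{(m)} \sim p(\btheta | \mb {Y}, \mb {X})  \]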

The posterior predictive distribution is the integral of the likelihood function $p(\mb {Y}_{\mbox{pred}} | \btheta )$ weighted by the posterior distribution $p(\btheta | \mb {Y}, \mb {X})$. The PREDDIST statement generates samples from the posterior predictive distribution by using draws from the posterior distribution of $\btheta $.
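The idea of sampling from the posterior predictive distribution by way of posterior draws can be sketched in a few lines of Python. The sketch rests on several assumptions. It uses a multinomial logit likelihood for a single choice set. The "posterior draws" of the coefficient vector are simulated stand-ins, not output from PROC BCHOICE. The function names `logit_probs` and `sample_predictive_choices` are hypothetical. For each draw, the sketch computes the choice probabilities and then samples one predicted choice from them.

```python
import math
import random

def logit_probs(beta, choice_set):
    """Multinomial logit choice probabilities for one choice set (assumed model)."""
    utilities = [sum(b * x for b, x in zip(beta, row)) for row in choice_set]
    top = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def sample_predictive_choices(beta_draws, choice_set, rng):
    """Composition sampling: draw one predicted choice per posterior draw."""
    alternatives = range(len(choice_set))
    return [
        rng.choices(alternatives, weights=logit_probs(beta, choice_set), k=1)[0]
        for beta in beta_draws
    ]

# Hypothetical stand-ins: one choice set with 3 alternatives described by
# 2 attributes, and simulated "posterior draws" of beta.
rng = random.Random(2024)
choice_set = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
beta_draws = [[rng.gauss(1.0, 0.2), rng.gauss(0.5, 0.2)] for _ in range(5000)]

choices = sample_predictive_choices(beta_draws, choice_set, rng)
n_alt = len(choice_set)
empirical = [choices.count(j) / len(choices) for j in range(n_alt)]
# Monte Carlo average of the per-draw choice probabilities
averaged = [
    sum(logit_probs(beta, choice_set)[j] for beta in beta_draws) / len(beta_draws)
    for j in range(n_alt)
]
print("empirical:", [round(p, 3) for p in empirical])
print("averaged: ", [round(p, 3) for p in averaged])
```

The empirical frequencies of the sampled choices and the averaged probabilities should agree up to Monte Carlo error. Both estimate the posterior predictive choice probabilities.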

You can specify the following options:


names the SAS data set to contain the sets of explanatory variable values for which the predictions are established. This data set must contain data that has the same variables used in the model. If you omit the COVARIATES= option, the DATA= data set that is specified in the PROC BCHOICE statement is used instead.


specifies the number of alternatives in a choice set in the COVARIATES= data set. All choice sets in the data must have the same number of alternatives.


creates an output data set to contain the samples from the posterior predictive distribution.
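The requirement that every choice set have the same number of alternatives can be illustrated with a small sketch. The sketch splits covariate rows into choice sets and produces one choice probability per posterior draw, choice set, and alternative. It rests on several assumptions: choice sets are consecutive blocks of rows, the likelihood is a multinomial logit, and each output record is a `(draw, choice set, alternative, probability)` tuple. None of these describe the actual layout of the COVARIATES= or OUTPRED= data sets, and the function names are hypothetical.

```python
import math

def split_choice_sets(rows, n_alternatives):
    """Group consecutive covariate rows into choice sets of equal size (assumed layout)."""
    if n_alternatives < 1 or len(rows) % n_alternatives != 0:
        raise ValueError("every choice set must have the same number of alternatives")
    return [rows[i:i + n_alternatives] for i in range(0, len(rows), n_alternatives)]

def logit_probs(beta, choice_set):
    """Multinomial logit choice probabilities for one choice set (assumed model)."""
    utilities = [sum(b * x for b, x in zip(beta, row)) for row in choice_set]
    top = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def predictive_choice_probs(beta_draws, rows, n_alternatives):
    """Return one (draw, set_id, alternative, probability) record per combination."""
    sets = split_choice_sets(rows, n_alternatives)
    records = []
    for draw, beta in enumerate(beta_draws, start=1):
        for set_id, choice_set in enumerate(sets, start=1):
            for alt, p in enumerate(logit_probs(beta, choice_set), start=1):
                records.append((draw, set_id, alt, p))
    return records

# Hypothetical example: 4 rows form 2 choice sets of 2 alternatives each.
rows = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 0.0]]
beta_draws = [[0.0, 0.0], [1.0, -1.0]]
records = predictive_choice_probs(beta_draws, rows, 2)
for rec in records:
    print(rec)
```

Raising an error when the row count is not a multiple of the set size mirrors the rule that all choice sets must have the same number of alternatives.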