The TRANSREG Procedure

Discrete Choice Experiments: DESIGN, NORESTORE, NOZERO

This example constructs a discrete choice experiment with four product brands, each available at one of three prices: $1.49, $1.99, or $2.49. In addition, each choice set contains a constant "other" alternative available at $1.49. In the fifth choice set, price is constant across all alternatives. PROC TRANSREG is used to code the design, and the PHREG procedure fits the multinomial logit choice model (not shown). See Kuhfeld (2010) for more information about discrete choice modeling and the multinomial logit model; look for the latest "Discrete Choice" report. The following statements produce Figure 97.76:

title 'Choice Model Coding';

data design;
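   array p[4];            /* p1-p4: prices of brands 1-4 in one choice set */
   input p1-p4 @@;        /* @@ holds the line so one record supplies several sets */
   set = _n_;             /* choice set number */
   do brand = 1 to 4;
      price = p[brand];
      output;             /* one observation per brand alternative */
   end;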
   brand = .; price = 1.49; output; /* constant alternative */
   keep set brand price;
   datalines;
1.49 1.99 1.49 1.99 1.99 1.99 2.49 1.49 1.99 1.49 1.99 1.49
1.99 1.49 2.49 1.99 1.49 1.49 1.49 1.49 2.49 1.49 1.99 2.49
1.49 1.49 2.49 2.49 2.49 2.49 1.49 1.49 1.49 2.49 2.49 1.99
2.49 2.49 2.49 1.49 1.99 2.49 1.49 2.49 2.49 1.99 2.49 2.49
2.49 1.49 1.49 1.99 1.49 1.99 1.99 1.49 2.49 1.99 1.99 1.99
1.99 1.99 1.49 2.49 1.99 2.49 1.99 1.99 1.49 2.49 1.99 2.49
;
proc transreg data=design design norestoremissing nozeroconstant;
   model class(brand / zero=none) identity(price);
   output out=coded;
   by set;
run;

proc print data=coded(firstobs=21 obs=25);
   var set brand &_trgind;
run;

In the interest of space, only the fifth choice set is displayed in Figure 97.76.

Figure 97.76: The Fifth Choice Set

Choice Model Coding

Obs    set    brand    brand1    brand2    brand3    brand4    price
 21      5        1         1         0         0         0     1.49
 22      5        2         0         1         0         0     1.49
 23      5        3         0         0         1         0     1.49
 24      5        4         0         0         0         1     1.49
 25      5        .         0         0         0         0     1.49


For the constant alternative (Brand = .), the brand coding is a row of zeros because of the NORESTOREMISSING o-option, and Price is a constant $1.49 (instead of 0) because of the NOZEROCONSTANT a-option.
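To see what the two options contribute, the coding step can be repeated without them and the fifth choice set printed again. The following statements are a minimal sketch for comparison only; the output data set name coded_default is a hypothetical name chosen for illustration. From the description above, the constant alternative would be expected to show missing values rather than zeros in the brand columns, and Price would be expected to be zeroed in this set because it is constant within the BY group.

```sas
/* Comparison run: same model, but NORESTOREMISSING and        */
/* NOZEROCONSTANT are omitted so that the defaults apply.      */
/* coded_default is a hypothetical data set name.              */
proc transreg data=design design;
   model class(brand / zero=none) identity(price);
   output out=coded_default;
   by set;
run;

/* Print the same five observations (the fifth choice set).    */
/* &_trgind is reset by the PROC TRANSREG step just above.     */
proc print data=coded_default(firstobs=21 obs=25);
   var set brand &_trgind;
run;
```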

The data set was coded by choice set (BY set;). This example is small, but with very large problems it might be necessary to restrict the number of observations that are coded at one time so that the procedure uses less time and memory. Coding by choice set is one option. When coding is performed after the data are merged in, coding by subject and choice set combinations is another option. Alternatively, you can specify DESIGN=n, where n is the number of observations to code at one time. For example, you can specify DESIGN=100 or DESIGN=1000 to process the data set in blocks of 100 or 1000 observations. Specify the NOZEROCONSTANT a-option to ensure that variables that are constant within a block are not zeroed. When you specify DESIGN=n, or when you perform coding after the data are merged in, specify the dependent variable and any other variables needed for analysis as ID variables.
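The DESIGN=n alternative can be sketched as follows. These statements are an illustration under stated assumptions, not part of the original example: the BY statement is dropped, DESIGN=100 requests coding in blocks of 100 observations, and Set is listed in an ID statement so that it is carried into the output data set. The output data set name coded_blocks is hypothetical. Because the design data set in this example has only 90 observations (18 choice sets of 5 alternatives), a single block covers all of it; blocking matters only for much larger data sets.

```sas
/* Sketch: code in blocks of observations instead of BY groups. */
/* coded_blocks is a hypothetical data set name.                */
proc transreg data=design design=100 norestoremissing nozeroconstant;
   model class(brand / zero=none) identity(price);
   id set;                      /* carry the choice set number through */
   output out=coded_blocks;
run;

proc print data=coded_blocks(firstobs=21 obs=25);
   var set brand &_trgind;
run;
```

With merged choice data, the dependent variable and any subject identifier would be added to the ID statement in the same way.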