Sample 24992: Fitting the Bradley-Terry model to preference data from items presented in pairs

One way to study preferences among items is to present the items to subjects in pairs and have each subject indicate which item in the pair is preferred. Data from such paired evaluations can be modeled using logistic regression (the Bradley-Terry model), assuming that the multiple evaluations of a given pair are independent with a fixed probability of preferring one item, and that the evaluations of different pairs are independent. To fit this model with the LOGISTIC or GENMOD procedure:

  • Specify a logistic model.
  • Specify the NOINT option to suppress the intercept.
  • Use events/trials syntax. In the data, record the number of times one item is preferred over the other (events variable), and the total number of subjects evaluating the pair (trials variable).
  • Specify one independent variable per item. The independent variables are coded:
    • 1 if the item is preferred
    • -1 if the item is not preferred
    • 0 if the item is not in the current pair
  • To obtain the Pearson and deviance chi-square fit statistics in PROC LOGISTIC, specify the SCALE=NONE option in the MODEL statement.
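
The coding described above can be generated rather than typed by hand. The following DATA step is a minimal sketch, assuming a hypothetical long-format data set (PAIRS) with one row per pair and character variables ITEM1 and ITEM2 naming the two items. These data set and variable names are illustrative and not part of the original sample; only three of the teams from the example below are used.

```sas
/* Hypothetical long-format input: one row per pair.             */
/* ITEM1 is the item whose wins are counted in WIN.              */
data pairs;
   input item1 $ item2 $ win total;
   datalines;
mil det 7 13
mil tor 9 13
det tor 7 13
;

/* Build one indicator per item: 1, -1, or 0 as described above. */
data coded;
   set pairs;
   array ind{*} mil det tor;           /* one variable per item   */
   do i = 1 to dim(ind);
      if      lowcase(vname(ind{i})) = lowcase(item1) then ind{i} =  1;
      else if lowcase(vname(ind{i})) = lowcase(item2) then ind{i} = -1;
      else                                                 ind{i} =  0;
   end;
   drop i;
run;
```

The resulting CODED data set has the same layout as the data sets used in the examples that follow, so it can be passed to the MODEL statement with events/trials syntax.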

Example

The data in the following example show baseball results from 1987. While not quite a typical study of preference, the result of a baseball game can be thought of as indicating a "preference" for one team over the other. The data below are summarized so that each observation gives a count (WIN) of the wins of one team over another during the season. For instance, the first observation indicates that Milwaukee defeated Detroit 7 times. For each pair of teams, 13 games (TOTAL) were played. Indicator variables for each team are created as defined above.

data games; 
   input mil det tor new bos cle bal win;  
   total=13;
   datalines;
1   -1  0   0   0   0   0   7  
1   0   -1  0   0   0   0   9 
1   0   0   -1  0   0   0   7 
1   0   0   0   -1  0   0   7 
1   0   0   0   0   -1  0   9 
1   0   0   0   0   0   -1  11
0   1   -1  0   0   0   0   7 
0   1   0   -1  0   0   0   5 
0   1   0   0   -1  0   0   11
0   1   0   0   0   -1  0   9 
0   1   0   0   0   0   -1  9 
0   0   1   -1  0   0   0   7 
0   0   1   0   -1  0   0   7 
0   0   1   0   0   -1  0   8 
0   0   1   0   0   0   -1  12
0   0   0   1   -1  0   0   6 
0   0   0   1   0   -1  0   7 
0   0   0   1   0   0   -1  10
0   0   0   0   1   -1  0   7 
0   0   0   0   1   0   -1  12
0   0   0   0   0   1   -1  6
;

The following statements fit the Bradley-Terry model using either PROC LOGISTIC or PROC GENMOD. The results from PROC LOGISTIC are discussed below.

proc logistic data=games;
  model win/total = mil det tor new bos cle bal  /
        scale=none noint;
  output out=PrefProbs p=Prob;
  run;
  
proc genmod data=games;
  model win/total = mil det tor new bos cle bal  /
        dist=binomial noint;
  output out=PrefProbs p=Prob;
  run;

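Following are the parameter estimates of the model from PROC LOGISTIC. Because the indicator variables in every observation sum to zero, the seven parameters cannot all be estimated separately. The parameter for the last item (bal) is therefore set to zero, and the other estimates are interpreted relative to it.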

Analysis of Maximum Likelihood Estimates

                           Standard        Wald
Parameter  DF  Estimate       Error  Chi-Square  Pr > ChiSq
mil         1    1.5813      0.3433     21.2236      <.0001
det         1    1.4364      0.3396     17.8936      <.0001
tor         1    1.2945      0.3367     14.7837      0.0001
new         1    1.2476      0.3359     13.7988      0.0002
bos         1    1.1077      0.3339     11.0068      0.0009
cle         1    0.6838      0.3319      4.2458      0.0393
bal         0         0           .           .           .

In the Goodness of Fit table, the Pearson and deviance statistics produced by the SCALE=NONE option give no evidence of lack of fit (p=0.4797 and p=0.3998, respectively), indicating that the model fits.

Deviance and Pearson Goodness-of-Fit Statistics

Criterion    Value  DF  Value/DF  Pr > ChiSq
Deviance   15.7365  15    1.0491      0.3998
Pearson    14.6125  15    0.9742      0.4797

The predicted preference probabilities are available in the OUT= data set, which is displayed by the following statements.

proc print data=PrefProbs; 
  run;
Obs mil det tor new bos cle bal win total Prob
1 1 -1 0 0 0 0 0 7 13 0.53617
2 1 0 -1 0 0 0 0 9 13 0.57123
3 1 0 0 -1 0 0 0 7 13 0.58267
4 1 0 0 0 -1 0 0 7 13 0.61625
5 1 0 0 0 0 -1 0 9 13 0.71044
6 1 0 0 0 0 0 -1 11 13 0.82940
7 0 1 -1 0 0 0 0 7 13 0.53542
8 0 1 0 -1 0 0 0 5 13 0.54706
9 0 1 0 0 -1 0 0 11 13 0.58145
10 0 1 0 0 0 -1 0 9 13 0.67974
11 0 1 0 0 0 0 -1 9 13 0.80790
12 0 0 1 -1 0 0 0 7 13 0.51171
13 0 0 1 0 -1 0 0 7 13 0.54656
14 0 0 1 0 0 -1 0 8 13 0.64809
15 0 0 1 0 0 0 -1 12 13 0.78491
16 0 0 0 1 -1 0 0 6 13 0.53492
17 0 0 0 1 0 -1 0 7 13 0.63732
18 0 0 0 1 0 0 -1 10 13 0.77689
19 0 0 0 0 1 -1 0 7 13 0.60440
20 0 0 0 0 1 0 -1 12 13 0.75170
21 0 0 0 0 0 1 -1 6 13 0.66460

The parameter estimates are such that exponentiating an estimate yields the ratio of predicted preference probabilities (the odds) involving the associated item and the last item. For instance, using the Milwaukee estimate:

                            Pr(Milwaukee over Baltimore)    0.8294
      exp(1.5813) = 4.862 = ---------------------------- =  ------
                            Pr(Baltimore over Milwaukee)    0.1706

This indicates that Milwaukee was nearly 5 times as likely to defeat Baltimore as to lose to Baltimore. The probability of Milwaukee over Baltimore is given in observation 6 of the table of predicted preference probabilities. One minus this probability is the probability of Baltimore over Milwaukee.
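
The reason exponentiating works follows from the form of the model. As a short derivation, write \beta_i for the parameter of item i (notation introduced here for illustration, not used in the original note) and assume every evaluation results in a preference for one of the two items:

```latex
\begin{align*}
\operatorname{logit} \Pr(i \text{ over } j)
  &= \log \frac{\Pr(i \text{ over } j)}{\Pr(j \text{ over } i)}
   = \beta_i - \beta_j, \\
\frac{\Pr(i \text{ over } j)}{\Pr(j \text{ over } i)}
  &= \exp(\beta_i - \beta_j).
\end{align*}
```

Because the parameter of the last item is fixed at zero (bal in the table of estimates), the ratio for Milwaukee over Baltimore reduces to exp(1.5813).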

Similarly, the ratio for any two items can be obtained by exponentiating the difference between the corresponding items' parameter estimates. For example, New York was slightly more likely to defeat Boston (1.15 times) than the reverse:

      exp(1.2476 - 1.1077) = 1.15
                             Pr(New York over Boston)   0.53492
                           = ------------------------ = -------
                             Pr(Boston over New York)   0.46508
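
Such a ratio can also be requested from the procedure instead of being computed by hand. The following is a minimal sketch, assuming a SAS/STAT release in which PROC LOGISTIC supports the ESTIMATE statement. The statement and its label are additions for illustration and are not part of the original sample.

```sas
proc logistic data=games;
  model win/total = mil det tor new bos cle bal /
        scale=none noint;
  /* Contrast: New York parameter minus Boston parameter.      */
  /* EXP exponentiates the difference, CL requests confidence  */
  /* limits, and ILINK applies the inverse logit to give the   */
  /* predicted probability of New York over Boston.            */
  estimate 'New York vs Boston' new 1 bos -1 / exp cl ilink;
  run;
```

The exponentiated estimate should agree with the hand calculation above (about 1.15), and the confidence limits show how much uncertainty surrounds it.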

Ordering effect / Home field advantage

Agresti fits a second model allowing for the effect of home field advantage. The data set below records the wins and total games for each pair of teams separately for home and away games. A variable (HOME) indicates whether the games were played at home (1) or away (-1) for the team coded 1.

data home;
   input mil det tor new bos cle bal win total home; 
   datalines;
1   -1  0   0   0   0   0   4   7   1
1   0   -1  0   0   0   0   4   6   1
1   0   0   -1  0   0   0   4   7   1
1   0   0   0   -1  0   0   6   7   1
1   0   0   0   0   -1  0   4   6   1
1   0   0   0   0   0   -1  6   6   1
0   1   -1  0   0   0   0   4   6   1
0   1   0   -1  0   0   0   4   7   1
0   1   0   0   -1  0   0   6   6   1
0   1   0   0   0   -1  0   6   7   1
0   1   0   0   0   0   -1  4   7   1
0   0   1   -1  0   0   0   2   6   1
0   0   1   0   -1  0   0   4   7   1
0   0   1   0   0   -1  0   4   6   1
0   0   1   0   0   0   -1  6   6   1
0   0   0   1   -1  0   0   4   7   1
0   0   0   1   0   -1  0   4   6   1
0   0   0   1   0   0   -1  6   7   1
0   0   0   0   1   -1  0   5   7   1
0   0   0   0   1   0   -1  6   6   1
0   0   0   0   0   1   -1  2   6   1

1   -1  0   0   0   0   0   3   6   -1
1   0   -1  0   0   0   0   5   7   -1
1   0   0   -1  0   0   0   3   6   -1
1   0   0   0   -1  0   0   1   6   -1
1   0   0   0   0   -1  0   5   7   -1
1   0   0   0   0   0   -1  5   7   -1
0   1   -1  0   0   0   0   3   7   -1
0   1   0   -1  0   0   0   1   6   -1
0   1   0   0   -1  0   0   5   7   -1
0   1   0   0   0   -1  0   3   6   -1
0   1   0   0   0   0   -1  5   6   -1
0   0   1   -1  0   0   0   5   7   -1
0   0   1   0   -1  0   0   3   6   -1
0   0   1   0   0   -1  0   4   7   -1
0   0   1   0   0   0   -1  6   7   -1
0   0   0   1   -1  0   0   2   6   -1
0   0   0   1   0   -1  0   3   7   -1
0   0   0   1   0   0   -1  4   6   -1
0   0   0   0   1   -1  0   2   6   -1
0   0   0   0   1   0   -1  6   7   -1
0   0   0   0   0   1   -1  4   7   -1
;

These statements fit the Bradley-Terry model allowing for the home field advantage, which is an example of an ordering effect. Deleting the HOME predictor reproduces the parameter estimates of the basic Bradley-Terry model above.

proc logistic data=home;
  model win/total = mil det tor new bos cle bal home / 
        scale=none noint;
  run;
  
proc genmod data=home;
  model win/total = mil det tor new bos cle bal home/
        dist=binomial noint;
  run;

The Goodness of Fit table shows that this model fits adequately.

Deviance and Pearson Goodness-of-Fit Statistics

Criterion    Value  DF  Value/DF  Pr > ChiSq
Deviance   38.6429  35    1.1041      0.3084
Pearson    34.9625  35    0.9989      0.4700

The parameter estimates table from the fitted model indicates that there was a significant home field advantage (p=0.0210). In a paired preference study, this model could be used to evaluate the effect of an item being presented first in a pair.

Analysis of Maximum Likelihood Estimates

                           Standard        Wald
Parameter  DF  Estimate       Error  Chi-Square  Pr > ChiSq
mil         1    1.6195      0.3474     21.7375      <.0001
det         1    1.4753      0.3446     18.3348      <.0001
tor         1    1.3271      0.3403     15.2063      <.0001
new         1    1.2813      0.3404     14.1687      0.0002
bos         1    1.1438      0.3378     11.4621      0.0007
cle         1    0.7047      0.3350      4.4248      0.0354
bal         0         0           .           .           .
home        1    0.3023      0.1309      5.3283      0.0210
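
To see what the HOME estimate implies for a particular matchup, the fitted linear predictor can be evaluated by hand. The DATA step below is a minimal sketch that plugs in the rounded estimates from the table above for Milwaukee against Detroit. The data set and variable names (HomeEffect, eta_home, and so on) are illustrative and not part of the original sample, and the sketch assumes that HOME=1 marks games played at the first team's park and HOME=-1 games played at the opponent's.

```sas
data HomeEffect;
   b_mil  = 1.6195;   /* Milwaukee estimate (rounded, from the table) */
   b_det  = 1.4753;   /* Detroit estimate                             */
   b_home = 0.3023;   /* HOME estimate                                */

   /* Linear predictor: difference in team parameters plus HOME term  */
   eta_home = (b_mil - b_det) + b_home*( 1);   /* Milwaukee at home   */
   eta_away = (b_mil - b_det) + b_home*(-1);   /* Milwaukee away      */

   /* Inverse logit converts each linear predictor to a probability   */
   p_home = 1 / (1 + exp(-eta_home));
   p_away = 1 / (1 + exp(-eta_away));

   /* Probability that the home team wins when the two teams have     */
   /* equal parameters, so that only the HOME term remains            */
   p_equal = 1 / (1 + exp(-b_home));
run;

proc print data=HomeEffect noobs;
   var p_home p_away p_equal;
run;
```

For two evenly matched teams, the HOME estimate alone gives the home team a predicted probability of winning of about 0.575.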

_____

Agresti, A. (1990 & 2002), Categorical Data Analysis, New York: John Wiley & Sons, Inc.

McCullagh, P. and Nelder, J.A. (1989), Generalized Linear Models, Second Edition, London: Chapman and Hall.



