Usage Note 23127: Estimating the odds ratio for matched pairs data with binary response

For matched pairs data with a binary response (such as yes/no responses from husband and wife pairs), the AGREE option in PROC FREQ provides McNemar's test of marginal homogeneity. This test checks whether the probability of a yes response is the same for both members of a pair. However, as discussed by Fleiss (2003), the usual odds ratio estimator ignores the pairing, so a different estimator should be used for matched pairs data. An estimate of the odds ratio for matched pairs data can be obtained by using the CMH option with a stratified table specification in PROC FREQ, or by using the STRATA statement in PROC LOGISTIC.

To estimate the difference in probabilities (risk difference) with matched pairs data, rather than the odds ratio, see this note. To estimate the risk difference between independent groups, rather than in matched pairs, see this note.

Example

Using the retrospective study example presented by Fleiss (2003) with matched case-control pairs, the following statements use the AGREE option to compute McNemar's test (without continuity correction) and the RELRISK option to compute the usual odds ratio estimate for unmatched data:

    data a;
        do case = 'present','absent';
            do control = 'present','absent';
                input count @@;
                output;
            end;
        end;
    datalines;
    15 20
     5 60
    ;

    proc freq data=a order=data;
        weight count;
        /* AGREE requests McNemar's test; RELRISK requests the unmatched odds ratio */
        table case * control / agree relrisk;
    run;

McNemar's test is significant (p = 0.0027). However, the odds ratio estimate from the RELRISK option (9.0) does not account for the matching.

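    Table of case by control (rows: case, columns: control)

    Frequency |
    Percent   |
    Row Pct   |
    Col Pct   |  present |   absent |    Total
    ----------+----------+----------+---------
    present   |       15 |       20 |       35
              |    15.00 |    20.00 |    35.00
              |    42.86 |    57.14 |
              |    75.00 |    25.00 |
    ----------+----------+----------+---------
    absent    |        5 |       60 |       65
              |     5.00 |    60.00 |    65.00
              |     7.69 |    92.31 |
              |    25.00 |    75.00 |
    ----------+----------+----------+---------
    Total     |       20 |       80 |      100
              |    20.00 |    80.00 |   100.00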

    Estimates of the Relative Risk (Row1/Row2)

    Type of Study                 Value    95% Confidence Limits
    Case-Control (Odds Ratio)    9.0000       2.9027     27.9051
    Cohort (Col1 Risk)           5.5714       2.2094     14.0497
    Cohort (Col2 Risk)           0.6190       0.4607      0.8318

    McNemar's Test

    Statistic (S)    9.0000
    DF                    1
    Pr > S           0.0027
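As a hand check on the output above, both statistics follow directly from the cell counts. The notation here is introduced for illustration: n_ij is the count in row i (case) and column j (control), so n_12 = 20 pairs have the factor present only in the case and n_21 = 5 pairs have it present only in the control. McNemar's statistic uses only these discordant pairs, while the unmatched odds ratio uses all four cells:

    S = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}
      = \frac{(20 - 5)^2}{20 + 5} = \frac{225}{25} = 9.0

    \widehat{\mathrm{OR}}_{\text{unmatched}} = \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}}
      = \frac{15 \times 60}{20 \times 5} = \frac{900}{100} = 9.0

Referring S to a chi-square distribution with 1 degree of freedom gives the reported p-value of 0.0027. That S and the unmatched odds ratio both equal 9.0 is a coincidence of these particular counts.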

The steps below compute the appropriate odds ratio estimate and confidence interval for matched pairs data. The data must first be arranged in a stratified layout in which a variable identifies the pairs (strata) and another variable identifies the subject in each pair. In the following statements, an observation is created for each subject in each pair: RESPONSE='case' indicates the subject is a case; RESPONSE='control' indicates the subject is a control. The FACTOR variable indicates whether the predictive factor is present or absent. The stratifying variable, ID, has a unique value for each pair so that the members in each pair have the same value of ID.

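    data indiv;
       set a;
       length response $ 7;   /* 'control' needs 7 characters; otherwise 'case' would set the length to 4 */
       retain id 0;
       /* one pass per pair in this cell; ID stays unique across cells */
       do id=id+1 to id+count;
          factor=case;    response='case';    output;   /* case member of the pair */
          factor=control; response='control'; output;   /* control member of the pair */
       end;
       keep id factor response;
    run;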

The following statements display the observations for the first four pairs (eight observations), all of which come from the present/present factor combination.

    proc print data=indiv (obs=8);
       id id;
    run;

The odds ratio can be computed via stratified analyses in the FREQ or LOGISTIC procedure. In PROC FREQ, specify a three-way table with the pair identifier, ID, as a stratifying variable and specify the CMH option. Note that in the TABLE statement of PROC FREQ, the last (rightmost) variable in a table specification is the column variable, the next variable to the left is the row variable, and all variables to the left of the row variable are stratifying variables. The NOPRINT option suppresses printing of the separate tables for the pairs (100 tables in this case).

In PROC LOGISTIC, the STRATA statement specifies the stratification variable(s) and requests the appropriate conditional logistic model. PROC LOGISTIC provides point and confidence interval estimates of the odds ratio. The optional EXACT statement can be used to provide an exact conditional estimate and confidence interval of the odds ratio.

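    proc freq data=indiv order=data;
        /* ID defines the strata (pairs); FACTOR is the row variable, RESPONSE the column variable */
        table id*factor*response / cmh noprint;
    run;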

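    proc logistic data=indiv;
        strata id;                                /* conditional model with one stratum per pair */
        class factor (ref='absent') / param=ref;
        model response(event='case') = factor;
        exact factor / estimate=odds;             /* exact conditional odds ratio and CI */
    run;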
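To see why the stratified analysis gives 4.0, a sketch of the Mantel-Haenszel calculation over the 100 pair strata may help. The cell labels are introduced here for illustration: in the 2×2 table of FACTOR by RESPONSE for pair k, a_k is (present, case), b_k is (present, control), c_k is (absent, case), d_k is (absent, control), and n_k = 2.

    \widehat{\mathrm{OR}}_{\mathrm{MH}}
      = \frac{\sum_{k} a_k d_k / n_k}{\sum_{k} b_k c_k / n_k}
      = \frac{20 \times \tfrac{1}{2}}{5 \times \tfrac{1}{2}}
      = \frac{20}{5} = 4.0

A concordant pair (factor present in both members, or absent in both) has a_k d_k = b_k c_k = 0 and drops out of both sums. Each of the 20 pairs in which only the case has the factor has a_k = d_k = 1 and adds 1/2 to the numerator. Each of the 5 pairs in which only the control has the factor has b_k = c_k = 1 and adds 1/2 to the denominator. The matched-pairs estimate is therefore the ratio of the two discordant-pair counts, and the 75 concordant pairs do not contribute to it.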

The correct odds ratio estimate for these matched pairs data is 4.0. It is given by the Mantel-Haenszel estimate from the CMH option in PROC FREQ and by both the asymptotic and exact odds ratio estimates from PROC LOGISTIC. Both procedures give the 95% asymptotic confidence interval (1.50, 10.66). PROC LOGISTIC also gives the exact 95% confidence interval (1.46, 13.64).

    Estimates of the Common Relative Risk (Row1/Row2)

    Type of Study    Method              Value    95% Confidence Limits
    Case-Control     Mantel-Haenszel    4.0000       1.5013     10.6576
    (Odds Ratio)     Logit **           3.7372       1.5114      9.2406
    Cohort           Mantel-Haenszel    4.0000       1.5013     10.6576
    (Col1 Risk)      Logit **           1.9332       1.1654      3.2067
    Cohort           Mantel-Haenszel    0.2500       0.0938      0.6661
    (Col2 Risk)      Logit **           0.5173       0.3119      0.8580

    Odds Ratio Estimates

                                  Point          95% Wald
    Effect                     Estimate    Confidence Limits
    factor present vs absent      4.000      1.501     10.658

    Exact Odds Ratios

    Parameter          Estimate    95% Confidence Limits    Two-sided p-Value
    factor  present       4.000      1.457      13.639                0.0041
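The asymptotic limits can also be reproduced by hand. For 1:1 matched pairs, both the variance estimate that PROC FREQ uses for the Mantel-Haenszel odds ratio and the conditional Wald variance in PROC LOGISTIC reduce to the sum of the reciprocals of the two discordant-pair counts (20 and 5). Using z = 1.96:

    \widehat{\mathrm{SE}}\bigl(\ln \widehat{\mathrm{OR}}\bigr)
      = \sqrt{\tfrac{1}{20} + \tfrac{1}{5}} = \sqrt{0.25} = 0.5

    \exp\bigl(\ln 4 \pm 1.96 \times 0.5\bigr)
      = \bigl(e^{0.4063},\ e^{2.3663}\bigr) \approx (1.501,\ 10.658)

For the exact analysis, conditioning on the 25 discordant pairs makes the number X of pairs in which only the case has the factor binomial with probability 1/2 under the null hypothesis OR = 1. Doubling the upper tail gives

    2\,P(X \ge 20) = 2 \sum_{x=20}^{25} \binom{25}{x}\, 2^{-25}
      = \frac{2 \times 68406}{33554432} \approx 0.0041

which agrees with the two-sided p-value reported by the EXACT statement.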

______

Fleiss, J. L., Levin, B., and Paik, M. C. (2003), Statistical Methods for Rates and Proportions, 3d ed. New York: John Wiley & Sons, Inc.



Operating System and Release Information

    Product Family    Product     System    SAS Release Reported    SAS Release Fixed*
    SAS System        SAS/STAT    All       n/a
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.