The CASECONTROL Procedure

Example 4.3 Producing Odds Ratios for Various Disease Models

In addition to the chi-square test statistics between a marker and a disease, you might be interested in inferences about the odds ratios based on the table of allele-by-disease counts for each marker. You can use the OR option in the PROC CASECONTROL statement to have the odds ratios from these tables included in the OUTSTAT= data set along with confidence limits based on the level specified in the ALPHA= option (or 0.05 by default).

The following data set contains 20 individuals genotyped at five SNPs (g1 through g5), along with a binary disease indicator:

data genotypes;
   input (g1-g5) ($) disease;
   datalines;
 B/B B/A B/A A/A A/A 1
 B/B B/B B/A A/A B/B 0
 A/B B/B B/A B/A B/B 1
 B/B A/B B/A A/A B/B 1
 B/B B/B A/B A/B B/B 0
 A/A B/B A/A B/A B/B 0
 B/B B/B B/A B/A A/B 1
 B/B B/B A/A B/A A/B 1
 B/B B/B A/A A/A B/B 1
 B/B A/A B/B B/A B/B 1
 B/B B/A B/B B/A A/B 0
 B/B B/B A/A A/A B/B 1
 B/B B/A B/B B/B B/B 0
 B/B B/B B/B A/A B/B 1
 B/B B/B B/A B/A B/B 0
 A/A B/B B/B B/B B/B 1
 B/B B/B B/B B/A B/B 1
 B/B B/B B/B B/B B/B 1
 B/B B/B B/A A/A B/A 0
 B/B B/B A/A B/A B/B 1
;

An output data set containing the odds ratios and respective confidence limits can be produced with the following code:

proc casecontrol data=genotypes genocol or;
   var g1-g5;
   trait disease;
run;

proc print heading=h;
   var Locus NumTrait0 NumTrait1 AlleleOddsRatio LowerCL UpperCL;
run;

Note that the GENOCOL option is used because each column contains a genotype rather than an individual allele. The columns listed in the VAR statement of PROC PRINT are shown in Output 4.3.1. Because the odds ratios are based on the allele counts, an additive disease model is assumed.

Output 4.3.1: Output Data Set from PROC CASECONTROL Containing Odds Ratios: Additive Model

Obs  Locus  NumTrait0  NumTrait1  AlleleOddsRatio  LowerCL  UpperCL
  1  g1             7         13          1.27778  0.18724  8.72011
  2  g2             7         13          0.91667  0.14597  5.75651
  3  g3             7         13          0.87500  0.23620  3.24146
  4  g4             7         13          0.83333  0.22242  3.12219
  5  g5             7         13          0.91667  0.14597  5.75651
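
As a check on how the first row of Output 4.3.1 arises, the allele counts at g1 can be tallied by hand from the data lines: the 13 individuals with disease=1 carry 3 $A$ alleles and 23 $B$ alleles, and the 7 individuals with disease=0 carry 2 $A$ alleles and 12 $B$ alleles. The reported values are reproduced by the usual large-sample interval on the log scale. That this is the formula the procedure applies is an inference from the matching numbers, not something stated in this example.

$$
\widehat{OR} = \frac{23 \times 2}{3 \times 12} = \frac{46}{36} \approx 1.27778
$$

$$
SE\left(\ln \widehat{OR}\right) = \sqrt{\frac{1}{3} + \frac{1}{23} + \frac{1}{2} + \frac{1}{12}} \approx 0.97987
$$

$$
\exp\left(\ln 1.27778 \pm 1.95996 \times 0.97987\right) \approx (0.1872,\ 8.720)
$$

These agree with AlleleOddsRatio, LowerCL, and UpperCL for g1 at the default level of 0.05.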


What if you want to look at odds ratios for genotypes assuming a dominant or recessive disease model? You can use PROC FORMAT to group together genotypes, such as the heterozygous genotype with one of the homozygous genotypes. In the following code, two formats are created for the genotypes: $DOM_B. for a model where allele $B$ is dominant (or $A$ is recessive) and $REC_B. for a model where allele $B$ acts in a recessive manner.

proc format;
   value $dom_B 'A/A'='A/A'
                'B/B'='B/B'
                'A/B'='B/B'
                'B/A'='B/B'
                ;
   value $rec_B 'A/A'='A/A'
                'B/B'='B/B'
                'A/B'='A/A'
                'B/A'='A/A'
                ;
run;

proc casecontrol data=genotypes genocol or;
   var g1-g5;
   format g1-g5 $dom_b.;
   trait disease;
run;

proc print heading=h;
   var Locus NumTrait0 NumTrait1 AlleleOddsRatio LowerCL UpperCL;
run;

In this code, the FORMAT statement is used in PROC CASECONTROL to request odds ratios for a disease model where allele $B$ is dominant; that is, the heterozygous genotypes ($A/B$ and $B/A$) and $B/B$ are grouped into one category. The odds ratios for genotype $A/A$ versus $A/B$ and $B/B$ are shown in Output 4.3.2. At g2 and g5, genotype $A/A$ does not occur among the individuals with disease=0, so one cell of the table is empty; the odds ratio is reported as 0 and the confidence limits are missing. Similarly, a disease model with $B$ as the recessive allele could be tested instead by using the $REC_B. format in the FORMAT statement.

Output 4.3.2: Output Data Set from PROC CASECONTROL Containing Odds Ratios: Dominance Model

Obs  Locus  NumTrait0  NumTrait1  AlleleOddsRatio  LowerCL  UpperCL
  1  g1             7         13            2.000  0.10574  37.8296
  2  g2             7         13            0.000        .        .
  3  g3             7         13            0.375  0.03326   4.2281
  4  g4             7         13            0.640  0.08798   4.6554
  5  g5             7         13            0.000        .        .
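
The recessive-model analysis mentioned above can be sketched by changing only the format name in the FORMAT statement. This assumes that the GENOTYPES data set and the $REC_B. format created earlier are still available in the session. The resulting output is not shown here.

```sas
/* Same analysis as for the dominant model, but heterozygotes are     */
/* grouped with A/A through the $REC_B. format defined in PROC FORMAT */
proc casecontrol data=genotypes genocol or;
   var g1-g5;
   format g1-g5 $rec_b.;
   trait disease;
run;

proc print heading=h;
   var Locus NumTrait0 NumTrait1 AlleleOddsRatio LowerCL UpperCL;
run;
```

With this grouping, each odds ratio compares genotype $B/B$ with the combined $A/A$, $A/B$, and $B/A$ category.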