The CASECONTROL Procedure

Example 5.1 Performing Case-Control Tests on Multiallelic Markers

The following data are taken from GAW9 (Hodge 1995). A sample of 60 founders, 30 affected with a disease and 30 unaffected, was taken from 200 nuclear families. Each founder was genotyped at two marker loci.

data founders;
   input id disease a1-a4 @@;
   datalines;
4   1 6 4 3 7  17  2 4 7 2 7
39  2 6 8 7 7  41  2 4 4 4 7
46  1 8 4 1 5  50  2 4 2 3 7
54  2 4 8 7 6  56  2 7 4 7 7
62  2 4 1 7 3  69  2 6 8 2 7
79  1 6 6 8 7  80  2 6 4 7 3
83  2 8 4 2 7  85  1 5 6 6 2
95  1 3 2 3 7  101 1 4 6 7 7
106 1 2 1 7 2  107 1 1 2 7 7
115 2 4 2 7 5  116 1 4 1 7 3
120 2 1 6 2 7  123 2 4 4 7 2
130 1 5 2 3 7  133 1 8 6 3 6
134 1 8 4 2 2  139 2 6 4 7 6
142 2 3 6 7 7  151 1 4 6 4 3
152 1 6 7 6 7  153 1 5 1 7 6
154 1 4 6 6 6  168 1 1 4 3 7
178 2 4 1 7 1  187 1 1 8 1 2
189 2 6 4 5 7  190 2 4 4 3 7
195 2 4 4 7 2  207 2 1 6 7 7
216 1 7 4 1 5  222 2 4 2 7 3
225 2 8 7 7 6  234 1 6 4 2 2
244 1 4 4 7 6  249 2 6 8 7 2
263 1 8 2 3 7  267 2 2 2 2 7
276 2 1 6 7 1  284 2 4 8 2 2
286 1 8 8 2 1  289 1 2 6 6 3
290 1 2 4 5 7  294 2 1 8 6 7
297 2 5 4 7 6  313 1 1 7 7 2
337 1 2 6 7 6  366 2 2 2 7 7
368 2 3 1 7 2  381 1 6 4 5 3
384 1 6 2 2 7  396 1 4 5 7 2
;

Because each marker has more than two alleles, the multiallelic versions of the association tests are performed. The following code requests all three case-control tests to determine whether there is a significant association between either marker and disease status. Note that the same output would be produced if the three test options, ALLELE, GENOTYPE, and TREND, were omitted from the PROC CASECONTROL statement, since all three tests are performed by default.

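/* Request the genotype, allele, and trend case-control tests */
proc casecontrol data=founders genotype allele trend;
   trait disease;
   var a1-a4;
run;
/* Print the most recently created data set (the default output data set) */
proc print noobs heading=h;
   format ProbAllele ProbGenotype ProbTrend pvalue6.5;
   format ChiSqAllele ChiSqGenotype ChiSqTrend 6.3;
run;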

An output data set is created by default, and the output from the PRINT procedure is displayed in Output 5.1.1.

Output 5.1.1: Output Data Set from PROC CASECONTROL for Multiallelic Markers

Locus  NumTrait1  NumTrait2  ChiSqGenotype  ChiSqAllele  ChiSqTrend  dfGenotype  dfAllele  dfTrend  ProbGenotype  ProbAllele  ProbTrend
M1            30         30         27.333        4.441       5.039          24         7        7        0.2892      0.7278     0.6552
M2            30         30         18.077        8.772      13.244          15         7        7        0.2586      0.2694     0.0664


This analysis finds no significant association between disease status and either marker. Suppose, however, that previous studies had identified allele 7 of the second marker as an allele of interest for this particular disease, raising the concern that its effect is swamped by the other seven alleles. The data set can be modified as follows so that the second marker is treated as a biallelic marker with alleles "7" and "not 7" (the latter recoded as 1).

data marker2;
   set founders;
   /* Collapse every allele other than 7 into a single "not 7" allele, coded 1 */
   if a3 ne 7 then a3=1;
   if a4 ne 7 then a4=1;
   keep id a3 a4 disease;
run;
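
Before running the tests, it can be useful to confirm that the recoding behaved as intended. The following is a minimal sketch, not part of the original example: it counts the copies of allele 7 that each founder carries and cross-tabulates that count against disease status. The data set name genocheck and the variable n7 are hypothetical names introduced only for this check.

```sas
/* Sketch: count copies of allele 7 per founder in the recoded data. */
/* genocheck and n7 are hypothetical names used only for this check. */
data genocheck;
   set marker2;
   n7 = (a3 = 7) + (a4 = 7);   /* 0, 1, or 2 copies of allele 7 */
run;

/* Cross-tabulate disease status by allele 7 copy number (counts only) */
proc freq data=genocheck;
   tables disease*n7 / norow nocol nopercent;
run;
```

A hand tally of the DATALINES suggests that the table should show 12, 16, and 2 founders with 0, 1, and 2 copies of allele 7 for disease=1, and 1, 24, and 5 for disease=2.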

Now all three tests can be performed on the marker in the new data set, as follows:

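/* No test options are given, so all three tests are performed */
proc casecontrol data=marker2;
   trait disease;
   var a3 a4;
run;
/* Print the most recently created data set (the default output data set) */
proc print noobs heading=h;
   format ProbAllele ProbGenotype ProbTrend pvalue6.5;
   format ChiSqAllele ChiSqGenotype ChiSqTrend 6.3;
run;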

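Because no test options are specified, PROC CASECONTROL performs all three tests by default. The output data set for this analysis is displayed in Output 5.1.2. The recoded marker appears as M1 because it is the only marker listed in the VAR statement.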

Output 5.1.2: Output Data Set from PROC CASECONTROL for a Biallelic Marker

Locus  NumTrait1  NumTrait2  ChiSqGenotype  ChiSqAllele  ChiSqTrend  dfGenotype  dfAllele  dfTrend  ProbGenotype  ProbAllele  ProbTrend
M1            30         30         12.193        6.599      10.103           2         1        1        0.0023      0.0102     0.0015
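
As a hand check on Output 5.1.2, the three statistics can be reproduced from the genotype counts. The sketch below assumes the usual Pearson chi-square for the genotype test and the biallelic forms of the trend and allele statistics given by Sasieni (1997), with the allele statistic algebraically simplified. The notation is introduced here for illustration: $r_j$ is the number of founders with $j$ copies of allele 7 in one trait group of size $R$ (disease=2 is used), $n_j$ is the corresponding pooled count, and $N$ is the total sample size. The counts are a hand tally of the DATALINES (12, 16, 2 for disease=1 and 1, 24, 5 for disease=2), not procedure output.

```latex
\begin{align*}
N &= 60,\quad R = 30,\quad (r_0, r_1, r_2) = (1, 24, 5),\quad (n_0, n_1, n_2) = (13, 40, 7) \\[4pt]
X^2_G &= 2\left[\frac{(12-6.5)^2}{6.5} + \frac{(16-20)^2}{20} + \frac{(2-3.5)^2}{3.5}\right] \approx 12.193 \\[4pt]
X^2_T &= \frac{N\,[N(r_1+2r_2) - R(n_1+2n_2)]^2}{R(N-R)\,[N(n_1+4n_2) - (n_1+2n_2)^2]}
       = \frac{60 \cdot 420^2}{900 \cdot 1164} \approx 10.103 \\[4pt]
X^2_A &= \frac{2N\,[N(r_1+2r_2) - R(n_1+2n_2)]^2}{R(N-R)\,[2N(n_1+2n_2) - (n_1+2n_2)^2]}
       = \frac{120 \cdot 420^2}{900 \cdot 3564} \approx 6.599
\end{align*}
```

The factor of 2 in $X^2_G$ reflects that the two trait groups are the same size, so their deviations from the expected counts (6.5, 20, 3.5) are mirror images. The values agree with ChiSqGenotype, ChiSqTrend, and ChiSqAllele in Output 5.1.2.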


With just the single allele of interest, all three case-control tests now find a significant association (at a significance level of $\alpha = 0.05$) between the marker (specifically, allele 7) and disease status. Note that the allele and trend tests, both of which test for additive allele effects, produce quite different $p$-values, which could indicate that Hardy-Weinberg equilibrium (HWE) does not hold for this marker. This is in fact the case, which can be checked by running the ALLELE procedure on the data set marker2 to test for HWE (see Chapter 3: The ALLELE Procedure, for more information). The excess of heterozygotes forces $X^2_A$ to be smaller than $X^2_T$, and only $X^2_T$ remains a valid chi-square statistic under the HWE violation.
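
The HWE check mentioned above might be run as follows. This is a minimal sketch that assumes the default PROC ALLELE output includes the HWE test for each marker; see Chapter 3 for the available options.

```sas
/* Sketch: test the recoded marker for Hardy-Weinberg equilibrium. */
/* The default PROC ALLELE output is assumed to report the HWE     */
/* test for each marker listed in the VAR statement.               */
proc allele data=marker2;
   var a3 a4;
run;
```

For orientation, a hand tally of the DATALINES gives pooled counts of 13, 40, and 7 founders carrying 0, 1, and 2 copies of allele 7, so the allele 7 frequency is $54/120 = 0.45$ and about $2(0.45)(0.55)(60) = 29.7$ heterozygotes would be expected under HWE, compared with 40 observed. Assuming the biallelic forms of the statistics in Sasieni (1997), with $n_0$, $n_1$, $n_2$ denoting those pooled counts (notation introduced here), the ratio is $X^2_A / X^2_T = 1 + (4 n_0 n_2 - n_1^2) / [(n_1 + 2 n_2)(n_1 + 2 n_0)]$, which falls below 1 exactly when heterozygotes are in excess. Here it is $1 + (364 - 1600)/3564 \approx 0.653$, matching $6.599 / 10.103$.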