The ALLELE Procedure

Example 3.2 Computing Linkage Disequilibrium Measures for SNP Data

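The following data set contains the genotypes of 44 individuals at five SNPs. Each SNP is stored as a pair of consecutive allele variables (s1 and s2 for the first SNP, s3 and s4 for the second, and so on), and a period denotes a missing allele.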

data snps;
   input s1-s10;
   datalines;
2 2 2 1 2 1 1 1 2 2
2 2 2 2 2 1 1 1 2 2
2 2 2 2 2 1 2 1 2 2
2 2 2 2 . . 1 1 2 2
2 2 2 2 1 2 1 2 2 2
2 2 2 2 . . 2 1 2 2
2 2 2 2 2 1 2 1 2 2
2 2 2 2 . . 2 1 2 2
2 2 2 2 1 1 1 1 2 2
2 2 1 1 2 2 2 1 2 2
2 2 2 1 2 2 2 1 2 2
2 2 2 2 1 1 1 1 2 2
2 2 2 1 2 2 2 2 2 2

   ... more lines ...   

2 2 2 1 2 2 2 1 2 2
2 2 2 2 2 2 2 1 2 2
2 2 2 2 2 1 1 1 2 2
2 2 2 2 2 2 1 1 2 2
2 2 2 2 2 1 2 1 2 2
2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 1 2 2
2 2 2 2 2 2 2 1 2 2
;

The following statements perform the analysis with PROC ALLELE:

proc allele data=snps prefix=SNP nofreq haplo=est 
      corrcoeff dprime;
   var s1-s10;
run;

This analysis produces summary statistics for the five SNPs as well as the Linkage Disequilibrium Measures table, which contains the estimated two-locus haplotype frequencies (HAPLO=EST), the disequilibrium coefficients, and the linkage disequilibrium measures $r$ (CORRCOEFF) and $D'$ (DPRIME). The NOFREQ option suppresses the allele and genotype frequency output tables.

The results from the analysis are shown in Output 3.2.1 and Output 3.2.2. Note that the markers are named SNP1 through SNP5, as specified by the PREFIX=SNP option.

Output 3.2.1: Summary of SNPs for the ALLELE Procedure

The ALLELE Procedure

Marker Summary

       Number  Number   Polymorph                             Test for HWE
       of      of       Info                       Allelic    Chi-
Locus  Indiv   Alleles  Content    Heterozygosity  Diversity  Square   DF  Pr > ChiSq
SNP1   44      1        0.0000     0.0000          0.0000     0.0000   0   .
SNP2   44      2        0.1190     0.0909          0.1271     3.5627   1   0.0591
SNP3   41      2        0.3283     0.4390          0.4140     0.1493   1   0.6992
SNP4   43      2        0.3728     0.4884          0.4957     0.0093   1   0.9231
SNP5   44      1        0.0000     0.0000          0.0000     0.0000   0   .


Two SNPs, SNP1 and SNP5, have only one allele appearing in the data.

Output 3.2.2: Linkage Disequilibrium Measures for SNPs Using the ALLELE Procedure

Linkage Disequilibrium Measures

                Number
                of                             LD        Corr      Lewontin's
Locus1  Locus2  Indiv   Haplotype  Frequency   Coeff     Coeff     D'
SNP1    SNP2    44      2-1        0.0682      0.0000    .         .
SNP1    SNP2    44      2-2        0.9318      0.0000    .         .
SNP1    SNP3    41      2-1        0.2927      0.0000    .         .
SNP1    SNP3    41      2-2        0.7073      0.0000    .         .
SNP1    SNP4    43      2-1        0.5465      0.0000    .         .
SNP1    SNP4    43      2-2        0.4535      0.0000    .         .
SNP1    SNP5    44      2-2        1.0000      0.0000    .         .
SNP2    SNP3    41      1-2        0.0732      0.0214    0.1807    1.0000
SNP2    SNP3    41      2-1        0.2927      0.0214    0.1807    1.0000
SNP2    SNP3    41      2-2        0.6341     -0.0214   -0.1807   -1.0000
SNP2    SNP4    43      1-1        0.0331     -0.0050   -0.0398   -0.1322
SNP2    SNP4    43      1-2        0.0367      0.0050    0.0398    0.1322
SNP2    SNP4    43      2-1        0.5134      0.0050    0.0398    0.1322
SNP2    SNP4    43      2-2        0.4168     -0.0050   -0.0398   -0.1322
SNP2    SNP5    44      1-2        0.0682      0.0000    .         .
SNP2    SNP5    44      2-2        0.9318      0.0000    .         .
SNP3    SNP4    40      1-1        0.2221      0.0608    0.2661    0.4382
SNP3    SNP4    40      1-2        0.0779     -0.0608   -0.2661   -0.4382
SNP3    SNP4    40      2-1        0.3154     -0.0608   -0.2661   -0.4382
SNP3    SNP4    40      2-2        0.3846      0.0608    0.2661    0.4382
SNP3    SNP5    41      1-2        0.2927      0.0000    .         .
SNP3    SNP5    41      2-2        0.7073      0.0000    .         .
SNP4    SNP5    43      1-2        0.5465      0.0000    .         .
SNP4    SNP5    43      2-2        0.4535      0.0000    .         .


In Output 3.2.2, the values for the linkage disequilibrium measures are missing for several haplotypes; this occurs when there is only one allele at one of the markers contained in the haplotype, and thus the denominators for these measures are zero. Also note that when the markers are biallelic, the gametic disequilibria have the same absolute values for all four possible haplotypes.
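
Both observations can be made concrete with the usual definitions of these measures. The following is a sketch under that assumption; the symbols $P_{uv}$, $p_u$, $p_v$, and $D_{\max}$ are notation introduced here for illustration and do not appear in the output. For allele $u$ at the first marker and allele $v$ at the second, with allele frequencies $p_u$ and $p_v$ and haplotype frequency $P_{uv}$:

$$
D_{uv} = P_{uv} - p_u p_v, \qquad
r = \frac{D_{uv}}{\sqrt{p_u (1 - p_u)\, p_v (1 - p_v)}}, \qquad
D' = \frac{D_{uv}}{D_{\max}},
$$

$$
D_{\max} =
\begin{cases}
\min\{\, p_u (1 - p_v),\; (1 - p_u)\, p_v \,\} & \text{if } D_{uv} > 0, \\
\min\{\, p_u p_v,\; (1 - p_u)(1 - p_v) \,\} & \text{if } D_{uv} < 0.
\end{cases}
$$

If one marker has a single allele, its frequency is 1, so $p_u (1 - p_u) = 0$ and $D_{\max} = 0$; both $r$ and $D'$ are then undefined and are reported as missing. As a check against Output 3.2.2, take haplotype 1-2 of SNP2 and SNP3. Summing the listed haplotype frequencies gives $p_u = 0.0732$ for allele 1 at SNP2 and $p_v = 0.0732 + 0.6341 = 0.7073$ for allele 2 at SNP3, so that, up to rounding of the displayed frequencies,

$$
D_{uv} \approx 0.0732 - (0.0732)(0.7073) \approx 0.0214,
$$

$$
r \approx \frac{0.0214}{\sqrt{(0.0732)(0.9268)(0.7073)(0.2927)}} \approx 0.18, \qquad
D' \approx \frac{0.0214}{\min\{(0.0732)(0.2927),\; (0.9268)(0.7073)\}} \approx 1.00.
$$

For two biallelic markers, the four haplotype frequencies must sum to the corresponding allele frequencies, which forces

$$
D_{11} = -D_{12} = -D_{21} = D_{22},
$$

and this is why the four rows for a pair such as SNP3 and SNP4 differ only in sign.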