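The following data set contains the genotypes of 44 individuals at five SNPs. Each SNP occupies two consecutive variables, one for each allele, so the ten variables s1 through s10 describe five markers; a period denotes a missing allele.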
data snps;
   input s1-s10;
   datalines;
2 2 2 1 2 1 1 1 2 2
2 2 2 2 2 1 1 1 2 2
2 2 2 2 2 1 2 1 2 2
2 2 2 2 . . 1 1 2 2
2 2 2 2 1 2 1 2 2 2
2 2 2 2 . . 2 1 2 2
2 2 2 2 2 1 2 1 2 2
2 2 2 2 . . 2 1 2 2
2 2 2 2 1 1 1 1 2 2
2 2 1 1 2 2 2 1 2 2
2 2 2 1 2 2 2 1 2 2
2 2 2 2 1 1 1 1 2 2
2 2 2 1 2 2 2 2 2 2
... more lines ...
2 2 2 1 2 2 2 1 2 2
2 2 2 2 2 2 2 1 2 2
2 2 2 2 2 1 1 1 2 2
2 2 2 2 2 2 1 1 2 2
2 2 2 2 2 1 2 1 2 2
2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 1 2 2
2 2 2 2 2 2 2 1 2 2
;
Now an analysis using PROC ALLELE can be performed as follows:
proc allele data=snps prefix=SNP nofreq haplo=est corrcoeff dprime;
   var s1-s10;
run;
This analysis produces summary statistics for the five SNPs as well as the “Linkage Disequilibrium Measures” table, which contains estimated two-locus haplotype frequencies, disequilibrium coefficients, and the linkage disequilibrium measures r (the correlation coefficient, requested with the CORRCOEFF option) and Lewontin's D' (requested with the DPRIME option). The NOFREQ option suppresses the allele and genotype frequency output tables.
The results from the analysis are shown in Output 3.2.1 and Output 3.2.2. Note that the markers are named SNP1 through SNP5, following the PREFIX=SNP option.
Output 3.2.1: Summary of SNPs for the ALLELE Procedure
Marker Summary

Locus | Number of Indiv | Number of Alleles | Polymorph Info Content | Heterozygosity | Allelic Diversity | HWE Chi-Square | HWE DF | HWE Pr > ChiSq |
---|---|---|---|---|---|---|---|---|
SNP1 | 44 | 1 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0 | . |
SNP2 | 44 | 2 | 0.1190 | 0.0909 | 0.1271 | 3.5627 | 1 | 0.0591 |
SNP3 | 41 | 2 | 0.3283 | 0.4390 | 0.4140 | 0.1493 | 1 | 0.6992 |
SNP4 | 43 | 2 | 0.3728 | 0.4884 | 0.4957 | 0.0093 | 1 | 0.9231 |
SNP5 | 44 | 1 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0 | . |
Two SNPs, SNP1 and SNP5, have only one allele appearing in the data, so their diversity measures are zero and the HWE test has zero degrees of freedom.
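As a rough check on the SNP2 row of Output 3.2.1, the following DATA step is a minimal sketch that assumes the standard definitions: allelic diversity is one minus the sum of squared allele frequencies, PIC subtracts twice the product of the squared frequencies for each allele pair, heterozygosity is the observed proportion of heterozygotes, and the HWE test is a Pearson chi-square on genotype counts. The allele frequencies are copied from Output 3.2.2. The genotype counts (1, 4, and 39) are not listed in the source; they are inferred here from the reported heterozygosity (0.0909 × 44 = 4 heterozygotes) and the frequency of allele 1 (0.0682 × 88 = 6 copies), so treat them as an assumption. All variable names are illustrative.

```sas
data _null_;
   /* Allele frequencies of SNP2, copied from Output 3.2.2 (rounded) */
   p1 = 0.0682;
   p2 = 0.9318;

   /* Allelic diversity: 1 minus the sum of squared allele frequencies */
   diversity = 1 - (p1**2 + p2**2);

   /* PIC: diversity minus 2 * p1^2 * p2^2 for the single allele pair */
   pic = diversity - 2 * p1**2 * p2**2;

   /* ASSUMPTION: genotype counts inferred from the output, not given */
   n11 = 1;  n12 = 4;  n22 = 39;
   n   = n11 + n12 + n22;
   het = n12 / n;                    /* observed heterozygosity */
   q1  = (2*n11 + n12) / (2*n);      /* unrounded frequency of allele 1 */
   q2  = 1 - q1;

   /* Expected counts under HWE and the Pearson chi-square statistic */
   e11 = n * q1**2;
   e12 = n * 2 * q1 * q2;
   e22 = n * q2**2;
   chisq  = (n11 - e11)**2 / e11 + (n12 - e12)**2 / e12
          + (n22 - e22)**2 / e22;
   pvalue = 1 - probchi(chisq, 1);   /* one degree of freedom */

   put diversity= 6.4 pic= 6.4 het= 6.4 chisq= 7.4 pvalue= 6.4;
run;
```

Under these assumptions the log values should be close to the SNP2 row (0.1271, 0.1190, 0.0909, 3.5627, and 0.0591). For a marker with a single allele, such as SNP1 or SNP5, two of the expected counts would be zero, which is consistent with the table reporting zero degrees of freedom and a missing p-value.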
Output 3.2.2: Linkage Disequilibrium Measures for SNPs Using the ALLELE Procedure
Linkage Disequilibrium Measures

Locus1 | Locus2 | Number of Indiv | Haplotype | Frequency | LD Coeff | Corr Coeff | Lewontin's D' |
---|---|---|---|---|---|---|---|
SNP1 | SNP2 | 44 | 2-1 | 0.0682 | 0.0000 | . | . |
SNP1 | SNP2 | 44 | 2-2 | 0.9318 | 0.0000 | . | . |
SNP1 | SNP3 | 41 | 2-1 | 0.2927 | 0.0000 | . | . |
SNP1 | SNP3 | 41 | 2-2 | 0.7073 | 0.0000 | . | . |
SNP1 | SNP4 | 43 | 2-1 | 0.5465 | 0.0000 | . | . |
SNP1 | SNP4 | 43 | 2-2 | 0.4535 | 0.0000 | . | . |
SNP1 | SNP5 | 44 | 2-2 | 1.0000 | 0.0000 | . | . |
SNP2 | SNP3 | 41 | 1-2 | 0.0732 | 0.0214 | 0.1807 | 1.0000 |
SNP2 | SNP3 | 41 | 2-1 | 0.2927 | 0.0214 | 0.1807 | 1.0000 |
SNP2 | SNP3 | 41 | 2-2 | 0.6341 | -0.0214 | -0.1807 | -1.0000 |
SNP2 | SNP4 | 43 | 1-1 | 0.0331 | -0.0050 | -0.0398 | -0.1322 |
SNP2 | SNP4 | 43 | 1-2 | 0.0367 | 0.0050 | 0.0398 | 0.1322 |
SNP2 | SNP4 | 43 | 2-1 | 0.5134 | 0.0050 | 0.0398 | 0.1322 |
SNP2 | SNP4 | 43 | 2-2 | 0.4168 | -0.0050 | -0.0398 | -0.1322 |
SNP2 | SNP5 | 44 | 1-2 | 0.0682 | 0.0000 | . | . |
SNP2 | SNP5 | 44 | 2-2 | 0.9318 | 0.0000 | . | . |
SNP3 | SNP4 | 40 | 1-1 | 0.2221 | 0.0608 | 0.2661 | 0.4382 |
SNP3 | SNP4 | 40 | 1-2 | 0.0779 | -0.0608 | -0.2661 | -0.4382 |
SNP3 | SNP4 | 40 | 2-1 | 0.3154 | -0.0608 | -0.2661 | -0.4382 |
SNP3 | SNP4 | 40 | 2-2 | 0.3846 | 0.0608 | 0.2661 | 0.4382 |
SNP3 | SNP5 | 41 | 1-2 | 0.2927 | 0.0000 | . | . |
SNP3 | SNP5 | 41 | 2-2 | 0.7073 | 0.0000 | . | . |
SNP4 | SNP5 | 43 | 1-2 | 0.5465 | 0.0000 | . | . |
SNP4 | SNP5 | 43 | 2-2 | 0.4535 | 0.0000 | . | . |
In Output 3.2.2, the values for the linkage disequilibrium measures are missing for several haplotypes; this occurs when there is only one allele at one of the markers contained in the haplotype, and thus the denominators for these measures are zero. Also note that when the markers are biallelic, the gametic disequilibria have the same absolute values for all four possible haplotypes.
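To make the zero-denominator remark concrete, the DATA step below is a minimal sketch that recomputes the three measures for haplotype 1-1 of SNP3 and SNP4 from the frequencies in Output 3.2.2. It assumes the usual definitions: the LD coefficient is D = p_uv - p_u × p_v, the correlation coefficient divides D by the square root of p_u(1 - p_u)p_v(1 - p_v), and Lewontin's D' divides D by its maximum attainable magnitude given the allele frequencies. These definitions are an assumption about how the table values arise, and the variable names are illustrative.

```sas
data _null_;
   /* Estimated haplotype frequencies for SNP3-SNP4, from Output 3.2.2 */
   p11 = 0.2221;  p12 = 0.0779;  p21 = 0.3154;  p22 = 0.3846;

   /* Marginal frequencies of allele 1 at each marker */
   pu = p11 + p12;                   /* SNP3, allele 1 */
   pv = p11 + p21;                   /* SNP4, allele 1 */

   /* LD coefficient for haplotype 1-1, and for 1-2 to show the sign flip */
   d   = p11 - pu * pv;
   d12 = p12 - pu * (1 - pv);        /* equals -d for biallelic markers */

   /* Denominators: both are zero when either marker has only one allele */
   denom = sqrt(pu * (1 - pu) * pv * (1 - pv));
   if d > 0 then dmax = min(pu * (1 - pv), (1 - pu) * pv);
   else          dmax = min(pu * pv, (1 - pu) * (1 - pv));

   /* Leave the measures missing when the denominator is zero */
   if denom > 0 then r      = d / denom;
   if dmax  > 0 then dprime = d / dmax;

   put d= 7.4 d12= 7.4 r= 7.4 dprime= 7.4;
run;
```

The results should land close to the tabulated 0.0608, 0.2661, and 0.4382; small differences in the last digit are expected because the input frequencies are rounded to four decimal places. Setting pu or pv to 1, as for SNP1 or SNP5, makes both denominators zero, so r and dprime stay missing, matching the periods in the table.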