The HAPLOTYPE Procedure
Example
Assume you have a random sample with 25 individuals genotyped at four markers. You want to infer the gametic phases of the genotypes and estimate their frequencies. There are eight columns of data, with the first two columns containing the pair of alleles at the first marker, the next two columns containing the pair of alleles for the second marker, and so on. Each row represents an individual. The data can be read into a SAS data set as follows:
data markers;
   input (m1-m8) ($);
   datalines;
B B A B B B A A
A A B B A B A B
B B A A B B B B
A B A B A B A B
A A A B A B B B
B B A A A B A B
A B B B A B A A
A B A A A A A A
B B A A A A A B
A B A B A B B B
A B A B A B A A
B B A B A B A A
A B A A A B A B
A B B B B B A B
A A A B A A A B
B B A B A B A B
A B B B A A A B
B B B B A A A A
A B A A A B A A
A B A A A B A B
B B A A A A A B
A A A B A A A B
A B A A A A B B
A A A A A A A A
A B B B A A A A
;
You can now use PROC HAPLOTYPE to infer the possible haplotypes and estimate the four-locus haplotype frequencies in this sample. The following statements perform these calculations:
proc haplotype data=markers out=hapout init=random prefix=SNP seed=51220;
   var m1-m8;
run;

proc print data=hapout noobs round;
run;
This analysis uses the EM algorithm to estimate the haplotype frequencies from the sample. By default, the standard errors and confidence intervals are estimated under a binomial assumption for each haplotype frequency estimate. A more precise estimate of the standard error can be obtained through the jackknife process by specifying the option SE=JACKKNIFE in the PROC HAPLOTYPE statement, but this requires considerably more computation (see the Methods of Estimating Standard Error section for more information). The option INIT=RANDOM indicates that initial haplotype frequencies are randomly generated. Because SEED=51220 is specified, that value seeds the random number generator; if the SEED= option were omitted, a seed would be created from the system clock. The default confidence level of 0.95 is used, since the ALPHA= option of the PROC HAPLOTYPE statement is omitted. Also by default, the convergence criterion of 0.00001 must be satisfied for one iteration, and the maximum number of iterations is set to 100. The PREFIX= option requests that the four markers, indicated by the eight allele variables in the VAR statement, be named SNP1–SNP4.
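To make the EM step more concrete, here is a minimal Python sketch of haplotype-frequency estimation on the same 25 genotypes. Python is used only for illustration and is not part of SAS. The function names `compatible_pairs` and `em_haplotype_freqs` are hypothetical, the starting frequencies are uniform rather than random, and the stopping rule (relative change in log likelihood) is an assumption about how a convergence criterion might be applied. The output is therefore not expected to reproduce the PROC HAPLOTYPE results digit for digit.

```python
from itertools import product
from math import log

# The 25 genotypes from the DATA step: two alleles per marker, four markers.
ROWS = """\
B B A B B B A A
A A B B A B A B
B B A A B B B B
A B A B A B A B
A A A B A B B B
B B A A A B A B
A B B B A B A A
A B A A A A A A
B B A A A A A B
A B A B A B B B
A B A B A B A A
B B A B A B A A
A B A A A B A B
A B B B B B A B
A A A B A A A B
B B A B A B A B
A B B B A A A B
B B B B A A A A
A B A A A B A A
A B A A A B A B
B B A A A A A B
A A A B A A A B
A B A A A A B B
A A A A A A A A
A B B B A A A A
"""

def compatible_pairs(genotype):
    """All unordered haplotype pairs consistent with one multilocus genotype."""
    pairs = set()
    # At each marker, either allele may sit on the first haplotype.
    for choice in product(*[((a, b), (b, a)) for a, b in genotype]):
        h1 = "-".join(c[0] for c in choice)
        h2 = "-".join(c[1] for c in choice)
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)

def em_haplotype_freqs(genotypes, max_iter=100, conv=1e-5):
    """Return (frequencies, log likelihood at the last E-step, iterations used)."""
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform start (assumption)
    prev_ll = None
    for iteration in range(1, max_iter + 1):
        counts = dict.fromkeys(haps, 0.0)
        ll = 0.0
        for pairs in pair_lists:
            # E-step: weight each pair by its probability under the current frequencies.
            weights = [(1 if h1 == h2 else 2) * freq[h1] * freq[h2]
                       for h1, h2 in pairs]
            total = sum(weights)
            ll += log(total)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the 2n chromosomes.
        freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        # Assumed stopping rule: relative change in log likelihood.
        if prev_ll is not None and abs(ll - prev_ll) <= conv * abs(prev_ll):
            break
        prev_ll = ll
    return freq, ll, iteration

genotypes = [list(zip(r.split()[0::2], r.split()[1::2])) for r in ROWS.splitlines()]
freq, loglik, n_iter = em_haplotype_freqs(genotypes)
for hap, f in freq.items():
    print(f"{hap}  {f:.5f}")
print("log likelihood:", round(loglik, 5), "iterations:", n_iter)
```

Because EM can settle on different local maxima from different starting values, a run from a uniform start may differ from one that uses INIT=RANDOM with a particular seed.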
The results from the procedure are shown in Figures 8.1 through 8.3.
Analysis Information | |
---|---|
Loci Used | SNP1 SNP2 SNP3 SNP4 |
Number of Individuals | 25 |
Number of Starts | 1 |
Convergence Criterion | 0.00001 |
Iterations Checked for Conv. | 1 |
Maximum Number of Iterations | 100 |
Number of Iterations Used | 15 |
Log Likelihood | -95.94742 |
Initialization Method | Random |
Random Number Seed | 51220 |
Standard Error Method | Binomial |
Haplotype Frequency Cutoff | 0 |
Figure 8.1 displays a table with information about several of the settings used to perform the HAPLOTYPE procedure as well as information about the EM algorithm. The table also reports the random number seed that was used, here 51220 as given in the SEED= option. When the SEED= option is omitted, you can obtain from this table the seed that was generated by the system clock if you need to replicate the analysis.
Haplotype Frequencies | |||||
---|---|---|---|---|---|
Number | Haplotype | Freq | Standard Error | 95% Lower Confidence Limit | 95% Upper Confidence Limit |
1 | A-A-A-A | 0.14302 | 0.05001 | 0.04500 | 0.24105 |
2 | A-A-A-B | 0.07527 | 0.03769 | 0.00140 | 0.14914 |
3 | A-A-B-A | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
4 | A-A-B-B | 0.00000 | 0.00010 | 0.00000 | 0.00020 |
5 | A-B-A-A | 0.09307 | 0.04151 | 0.01173 | 0.17442 |
6 | A-B-A-B | 0.05335 | 0.03210 | 0.00000 | 0.11627 |
7 | A-B-B-A | 0.00002 | 0.00061 | 0.00000 | 0.00122 |
8 | A-B-B-B | 0.07526 | 0.03769 | 0.00140 | 0.14913 |
9 | B-A-A-A | 0.08638 | 0.04013 | 0.00772 | 0.16504 |
10 | B-A-A-B | 0.08792 | 0.04046 | 0.00863 | 0.16722 |
11 | B-A-B-A | 0.07921 | 0.03858 | 0.00359 | 0.15482 |
12 | B-A-B-B | 0.10819 | 0.04437 | 0.02122 | 0.19517 |
13 | B-B-A-A | 0.10098 | 0.04304 | 0.01662 | 0.18534 |
14 | B-B-A-B | 0.00000 | 0.00001 | 0.00000 | 0.00002 |
15 | B-B-B-A | 0.09732 | 0.04234 | 0.01433 | 0.18030 |
16 | B-B-B-B | 0.00000 | 0.00001 | 0.00000 | 0.00002 |
Figure 8.2 displays the possible haplotypes in the sample and their estimated frequencies with standard errors and the lower and upper limits of the 95% confidence interval.
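The standard errors and confidence limits in Figure 8.2 can be approximately checked by hand. The Python sketch below is illustrative only. The function name `binomial_se_ci` is hypothetical. The denominator 2n − 1 = 49 and the normal-approximation interval truncated to [0, 1] are inferred from the printed values rather than stated in this example; the Methods of Estimating Standard Error section remains the authority on the exact formula.

```python
from math import sqrt
from statistics import NormalDist

def binomial_se_ci(freq, n_individuals, alpha=0.05):
    """Binomial standard error and normal-approximation confidence limits."""
    # Assumption inferred from Figure 8.2: the variance uses 2n - 1 in the denominator.
    denom = 2 * n_individuals - 1
    se = sqrt(freq * (1 - freq) / denom)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    lower = max(0.0, freq - z * se)          # limits are kept inside [0, 1]
    upper = min(1.0, freq + z * se)
    return se, lower, upper

# Haplotype 1 (A-A-A-A) from Figure 8.2, with 25 individuals.
se, lower, upper = binomial_se_ci(0.14302, 25)
print(round(se, 5), round(lower, 5), round(upper, 5))
```

Since the input frequency is itself rounded to five decimal places, the last printed digit can differ from the table by one unit.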
_ID_ | m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | HAPLOTYPE1 | HAPLOTYPE2 | PROB |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | B | B | A | B | B | B | A | A | B-A-B-A | B-B-B-A | 1.00 |
2 | A | A | B | B | A | B | A | B | A-B-A-A | A-B-B-B | 1.00 |
2 | A | A | B | B | A | B | A | B | A-B-A-B | A-B-B-A | 0.00 |
3 | B | B | A | A | B | B | B | B | B-A-B-B | B-A-B-B | 1.00 |
4 | A | B | A | B | A | B | A | B | A-A-A-B | B-B-B-A | 0.26 |
4 | A | B | A | B | A | B | A | B | A-B-A-A | B-A-B-B | 0.36 |
4 | A | B | A | B | A | B | A | B | A-B-A-B | B-A-B-A | 0.15 |
4 | A | B | A | B | A | B | A | B | A-B-B-A | B-A-A-B | 0.00 |
4 | A | B | A | B | A | B | A | B | A-B-B-B | B-A-A-A | 0.23 |
5 | A | A | A | B | A | B | B | B | A-A-A-B | A-B-B-B | 1.00 |
6 | B | B | A | A | A | B | A | B | B-A-A-A | B-A-B-B | 0.57 |
6 | B | B | A | A | A | B | A | B | B-A-A-B | B-A-B-A | 0.43 |
7 | A | B | B | B | A | B | A | A | A-B-A-A | B-B-B-A | 1.00 |
7 | A | B | B | B | A | B | A | A | A-B-B-A | B-B-A-A | 0.00 |
8 | A | B | A | A | A | A | A | A | A-A-A-A | B-A-A-A | 1.00 |
9 | B | B | A | A | A | A | A | B | B-A-A-A | B-A-A-B | 1.00 |
10 | A | B | A | B | A | B | B | B | A-B-A-B | B-A-B-B | 0.47 |
10 | A | B | A | B | A | B | B | B | A-B-B-B | B-A-A-B | 0.53 |
11 | A | B | A | B | A | B | A | A | A-A-A-A | B-B-B-A | 0.65 |
11 | A | B | A | B | A | B | A | A | A-B-A-A | B-A-B-A | 0.35 |
11 | A | B | A | B | A | B | A | A | A-B-B-A | B-A-A-A | 0.00 |
12 | B | B | A | B | A | B | A | A | B-A-A-A | B-B-B-A | 0.51 |
12 | B | B | A | B | A | B | A | A | B-A-B-A | B-B-A-A | 0.49 |
13 | A | B | A | A | A | B | A | B | A-A-A-A | B-A-B-B | 0.72 |
13 | A | B | A | A | A | B | A | B | A-A-A-B | B-A-B-A | 0.28 |
14 | A | B | B | B | B | B | A | B | A-B-B-B | B-B-B-A | 1.00 |
15 | A | A | A | B | A | A | A | B | A-A-A-A | A-B-A-B | 0.52 |
15 | A | A | A | B | A | A | A | B | A-A-A-B | A-B-A-A | 0.48 |
16 | B | B | A | B | A | B | A | B | B-A-A-B | B-B-B-A | 0.44 |
16 | B | B | A | B | A | B | A | B | B-A-B-B | B-B-A-A | 0.56 |
17 | A | B | B | B | A | A | A | B | A-B-A-B | B-B-A-A | 1.00 |
18 | B | B | B | B | A | A | A | A | B-B-A-A | B-B-A-A | 1.00 |
19 | A | B | A | A | A | B | A | A | A-A-A-A | B-A-B-A | 1.00 |
20 | A | B | A | A | A | B | A | B | A-A-A-A | B-A-B-B | 0.72 |
20 | A | B | A | A | A | B | A | B | A-A-A-B | B-A-B-A | 0.28 |
21 | B | B | A | A | A | A | A | B | B-A-A-A | B-A-A-B | 1.00 |
22 | A | A | A | B | A | A | A | B | A-A-A-A | A-B-A-B | 0.52 |
22 | A | A | A | B | A | A | A | B | A-A-A-B | A-B-A-A | 0.48 |
23 | A | B | A | A | A | A | B | B | A-A-A-B | B-A-A-B | 1.00 |
24 | A | A | A | A | A | A | A | A | A-A-A-A | A-A-A-A | 1.00 |
25 | A | B | B | B | A | A | A | A | A-B-A-A | B-B-A-A | 1.00 |
Figure 8.3 displays each individual’s genotype together with each haplotype pair into which the genotype can be resolved, and the probability of each such resolution.
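The PROB column can be related back to the frequencies in Figure 8.2: for a given genotype, each compatible haplotype pair is weighted by the product of its two haplotype frequencies, and the weights are normalized to sum to 1. The Python sketch below illustrates this for two individuals. It is not SAS code, the function name `pair_probabilities` is hypothetical, and the frequencies are the rounded values copied from Figure 8.2.

```python
from itertools import product

# Estimated haplotype frequencies copied from Figure 8.2.
FREQ = {
    "A-A-A-A": 0.14302, "A-A-A-B": 0.07527, "A-A-B-A": 0.00000, "A-A-B-B": 0.00000,
    "A-B-A-A": 0.09307, "A-B-A-B": 0.05335, "A-B-B-A": 0.00002, "A-B-B-B": 0.07526,
    "B-A-A-A": 0.08638, "B-A-A-B": 0.08792, "B-A-B-A": 0.07921, "B-A-B-B": 0.10819,
    "B-B-A-A": 0.10098, "B-B-A-B": 0.00000, "B-B-B-A": 0.09732, "B-B-B-B": 0.00000,
}

def pair_probabilities(genotype, freq):
    """Return {(hap1, hap2): probability} for one multilocus genotype."""
    weights = {}
    # At each marker, either allele may sit on the first haplotype.
    for choice in product(*[((a, b), (b, a)) for a, b in genotype]):
        h1 = "-".join(c[0] for c in choice)
        h2 = "-".join(c[1] for c in choice)
        pair = tuple(sorted((h1, h2)))
        # The factor 2 for distinct haplotypes cancels within one genotype.
        weights[pair] = (1 if h1 == h2 else 2) * freq[h1] * freq[h2]
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}

# Individual 6: B/B, A/A, A/B, A/B.  Individual 4: A/B at all four markers.
ind6 = [("B", "B"), ("A", "A"), ("A", "B"), ("A", "B")]
ind4 = [("A", "B")] * 4
for name, genotype in (("6", ind6), ("4", ind4)):
    for pair, prob in sorted(pair_probabilities(genotype, FREQ).items()):
        print(name, pair[0], pair[1], round(prob, 2))
```

For individual 4 this enumerates all eight possible pairs, including those with probability zero, whereas Figure 8.3 lists only five of them.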
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.