The HAPLOTYPE Procedure
This example uses genotypes for 227 individuals at three marker loci. The data were created from genotype frequency tables from the Lab of Statistical Genetics at Rockefeller University (2001). Note that when the data are read in, each data line holds the genotypes of four individuals, except for the last data line of the DATA step, which holds the genotypes of three individuals. The SAS data set created by the following code contains one individual per row, with six columns representing the two alleles at each of the three marker loci.
data ehdata; input m1-m6 @@; datalines; 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 2 3 1 1 1 1 2 3 1 1 1 1 2 3 1 1 1 1 3 3 1 1 1 1 3 3 1 1 1 1 3 3 1 1 1 1 3 3 1 1 1 1 3 3 1 1 1 1 3 3 1 1 1 1 3 3 1 1 1 1 3 3 1 1 1 1 3 3 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 2 2 1 1 1 2 2 2 1 1 1 2 1 3 1 1 1 2 1 3 1 1 1 2 2 3 1 1 1 2 2 3 1 1 1 2 2 3 1 1 1 2 3 3 1 1 1 2 3 3 1 1 1 2 3 3 1 1 1 2 3 3 1 1 2 2 1 1 1 1 2 2 1 1 1 1 2 2 1 1 1 1 2 2 1 1 1 1 2 2 1 1 1 1 2 2 1 2 1 1 2 2 1 2 1 1 2 2 1 2 1 1 2 2 1 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 1 3 1 1 2 2 1 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 2 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 3 1 2 1 1 1 3 1 2 1 1 2 3 1 2 1 1 2 3 1 2 1 1 2 3 1 2 1 1 2 3 1 2 1 1 2 3 1 2 1 1 2 3 1 2 1 1 3 3 1 2 1 1 3 3 1 2 1 1 3 3 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 3 1 2 1 2 1 3 1 2 1 2 1 3 1 2 1 2 2 3 1 2 1 2 2 3 1 2 1 2 2 3 1 2 1 2 2 3 1 2 1 2 3 3 1 2 1 2 3 3 1 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 2 1 2 2 2 1 2 1 2 2 2 1 2 1 2 2 2 1 3 1 2 2 2 1 3 1 2 2 2 2 3 1 2 2 2 2 3 1 2 2 2 2 3 1 2 2 2 3 3 1 2 2 2 3 3 1 2 2 2 3 3 1 2 2 2 3 3 1 2 2 2 3 3 1 2 2 2 3 3 1 2 2 2 3 3 1 2 2 2 3 3 2 2 1 1 1 1 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 1 3 2 2 1 1 1 3 2 2 1 1 2 3 2 2 1 1 2 3 2 2 
1 1 2 3 2 2 1 1 2 3 2 2 1 1 2 3 2 2 1 1 2 3 2 2 1 1 2 3 2 2 1 1 2 3 2 2 1 1 2 3 2 2 1 1 3 3 2 2 1 1 3 3 2 2 1 1 3 3 2 2 1 1 3 3 2 2 1 2 1 1 2 2 1 2 1 1 2 2 1 2 1 1 2 2 1 2 1 1 2 2 1 2 1 1 2 2 1 2 1 2 2 2 1 2 1 2 2 2 1 2 1 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2 1 2 1 3 2 2 1 2 1 3 2 2 1 2 1 3 2 2 1 2 1 3 2 2 1 2 1 3 2 2 1 2 1 3 2 2 1 2 2 3 2 2 1 2 2 3 2 2 1 2 2 3 2 2 1 2 2 3 2 2 1 2 2 3 2 2 1 2 2 3 2 2 1 2 2 3 2 2 1 2 3 3 2 2 1 2 3 3 2 2 1 2 3 3 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2 2 3 2 2 2 2 2 3 2 2 2 2 2 3 2 2 2 2 3 3 2 2 2 2 3 3 2 2 2 2 3 3 2 2 2 2 3 3 2 2 2 2 3 3 2 2 2 2 3 3 2 2 2 2 3 3 2 2 2 2 3 3 2 2 2 2 3 3 2 2 2 2 3 3 ;
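As a quick check on the layout described above, a minimal sketch along the following lines could confirm that the data set has 227 observations and six allele columns. It uses only the data set name ehdata and the variables m1-m6 from the DATA step; the OBS=5 limit is an arbitrary choice for illustration.

```sas
/* List the number of observations and the variables in ehdata */
proc contents data=ehdata;
run;

/* Show the first five individuals: two alleles at each of three loci */
proc print data=ehdata(obs=5);
   var m1-m6;
run;
```

Each printed row should correspond to one individual, with m1-m2, m3-m4, and m5-m6 holding the allele pairs for the first, second, and third marker, respectively.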
The haplotype frequencies can be estimated using the EM algorithm, and their standard errors estimated using the jackknife method, by running the following code:
proc haplotype data=ehdata se=jackknife maxiter=20 itprint nlag=4;
   var m1-m6;
run;
This produces the ODS output shown in Outputs 8.1.1 through 8.1.4.
Analysis Information | |
---|---|
Loci Used | M1 M2 M3 |
Number of Individuals | 227 |
Number of Starts | 1 |
Convergence Criterion | 0.00001 |
Iterations Checked for Conv. | 4 |
Maximum Number of Iterations | 20 |
Number of Iterations Used | 11 |
Log Likelihood | -934.97918 |
Initialization Method | Linkage Equilibrium |
Standard Error Method | Jackknife |
Haplotype Frequency Cutoff | 0 |
Tests for Haplotype-Trait Association |
Individual ID | disease | A_B_C | A_B_c | a_B_C | a_B_c | A_b_C | A_b_c | a_b_c | a_b_C |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | 0.29 | 0.21 | 0.21 | 0.29 | 0.00 | 0.00 | 0.00 | 0 |
2 | 1 | 0.27 | 0.23 | 0.00 | 0.00 | 0.23 | 0.27 | 0.00 | 0 |
3 | 0 | 0.00 | 0.27 | 0.00 | 0.23 | 0.00 | 0.23 | 0.27 | 0 |
4 | 1 | 0.50 | 0.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 |
5 | 1 | 0.27 | 0.23 | 0.00 | 0.00 | 0.23 | 0.27 | 0.00 | 0 |
6 | 0 | 0.27 | 0.23 | 0.00 | 0.00 | 0.23 | 0.27 | 0.00 | 0 |
7 | 1 | 0.22 | 0.00 | 0.13 | 0.15 | 0.15 | 0.13 | 0.22 | 0 |
8 | 1 | 0.27 | 0.23 | 0.00 | 0.00 | 0.23 | 0.27 | 0.00 | 0 |
9 | 1 | 0.00 | 0.50 | 0.00 | 0.50 | 0.00 | 0.00 | 0.00 | 0 |
10 | 0 | 0.00 | 0.00 | 0.00 | 0.50 | 0.00 | 0.00 | 0.50 | 0 |
11 | 1 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 |
12 | 1 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 |
13 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.50 | 0.50 | 0 |
14 | 1 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 |
15 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0 |
16 | 0 | 0.27 | 0.23 | 0.00 | 0.00 | 0.23 | 0.27 | 0.00 | 0 |
17 | 1 | 0.27 | 0.23 | 0.00 | 0.00 | 0.23 | 0.27 | 0.00 | 0 |
18 | 1 | 0.00 | 0.27 | 0.00 | 0.23 | 0.00 | 0.23 | 0.27 | 0 |
19 | 1 | 0.29 | 0.21 | 0.21 | 0.29 | 0.00 | 0.00 | 0.00 | 0 |
20 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 0 |
21 | 1 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 |
22 | 1 | 0.27 | 0.23 | 0.00 | 0.00 | 0.23 | 0.27 | 0.00 | 0 |
23 | 1 | 0.27 | 0.23 | 0.00 | 0.00 | 0.23 | 0.27 | 0.00 | 0 |
24 | 0 | 0.22 | 0.00 | 0.13 | 0.15 | 0.15 | 0.13 | 0.22 | 0 |
25 | 0 | 0.50 | 0.00 | 0.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0 |
26 | 1 | 0.50 | 0.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 |
27 | 0 | 0.27 | 0.23 | 0.00 | 0.00 | 0.23 | 0.27 | 0.00 | 0 |
28 | 1 | 0.50 | 0.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 |
29 | 1 | 0.01 | 0.00 | 0.49 | 0.00 | 0.49 | 0.00 | 0.00 | 0 |
30 | 1 | 0.22 | 0.00 | 0.13 | 0.15 | 0.15 | 0.13 | 0.22 | 0 |
31 | 1 | 0.27 | 0.23 | 0.00 | 0.00 | 0.23 | 0.27 | 0.00 | 0 |
32 | 1 | 0.00 | 0.50 | 0.00 | 0.50 | 0.00 | 0.00 | 0.00 | 0 |
33 | 1 | 0.27 | 0.23 | 0.00 | 0.00 | 0.23 | 0.27 | 0.00 | 0 |
34 | 1 | 0.22 | 0.00 | 0.13 | 0.15 | 0.15 | 0.13 | 0.22 | 0 |
35 | 1 | 0.50 | 0.00 | 0.00 | 0.00 | 0.50 | 0.00 | 0.00 | 0 |
36 | 1 | 0.50 | 0.00 | 0.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0 |
37 | 0 | 0.22 | 0.00 | 0.13 | 0.15 | 0.15 | 0.13 | 0.22 | 0 |
38 | 0 | 0.01 | 0.00 | 0.49 | 0.00 | 0.49 | 0.00 | 0.00 | 0 |
39 | 1 | 0.27 | 0.23 | 0.00 | 0.00 | 0.23 | 0.27 | 0.00 | 0 |
40 | 0 | 0.00 | 0.27 | 0.00 | 0.23 | 0.00 | 0.23 | 0.27 | 0 |
41 | 0 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 |
42 | 1 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 |
43 | 1 | 0.29 | 0.21 | 0.21 | 0.29 | 0.00 | 0.00 | 0.00 | 0 |
Output 8.1.1 displays information about several of the settings used to perform the HAPLOTYPE procedure on the ehdata data set. Note that although the MAXITER= option was set to 20, the EM algorithm stopped after 11 iterations: by that point the convergence criterion of 0.00001 had been met for four consecutive iterations, as required by NLAG=4. To obtain more precise frequency estimates, specify a smaller convergence criterion with the CONV= option.
Because the ITPRINT option was specified in the PROC HAPLOTYPE statement, the iteration history of the EM algorithm is included in the ODS output. Output 8.1.2 contains the table displaying this information. The "Convergence Status" table (Output 8.1.3) is displayed by default; it consists of a single line indicating whether the convergence criterion was met.
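A tighter convergence criterion could be requested along the following lines. This is only a sketch: the CONV= value of 0.0000001 and the larger MAXITER= value of 100 are illustrative assumptions, not settings taken from this example, and a stricter criterion will generally need more iterations than the 11 reported in Output 8.1.1.

```sas
/* Same analysis with a stricter convergence criterion.        */
/* CONV=0.0000001 and MAXITER=100 are illustrative values only. */
proc haplotype data=ehdata se=jackknife
               conv=0.0000001 maxiter=100 nlag=4 itprint;
   var m1-m6;
run;
```

Comparing the "Number of Iterations Used" and "Log Likelihood" rows of the resulting "Analysis Information" table with those in Output 8.1.1 shows how much the estimates change under the stricter criterion.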
Haplotype Frequencies |
Number | Haplotype | Freq | Standard Error | 95% Lower Confidence Limit | 95% Upper Confidence Limit |
---|---|---|---|---|---|
1 | 1-1-1 | 0.09170 | 0.01505 | 0.06221 | 0.12119 |
2 | 1-1-2 | 0.02080 | 0.00952 | 0.00214 | 0.03946 |
3 | 1-1-3 | 0.11509 | 0.01766 | 0.08048 | 0.14971 |
4 | 1-2-1 | 0.07904 | 0.01696 | 0.04580 | 0.11228 |
5 | 1-2-2 | 0.06768 | 0.01546 | 0.03738 | 0.09799 |
6 | 1-2-3 | 0.12788 | 0.02094 | 0.08685 | 0.16891 |
7 | 2-1-1 | 0.05521 | 0.01227 | 0.03115 | 0.07926 |
8 | 2-1-2 | 0.11700 | 0.01782 | 0.08207 | 0.15193 |
9 | 2-1-3 | 0.07376 | 0.01495 | 0.04446 | 0.10307 |
10 | 2-2-1 | 0.11766 | 0.01831 | 0.08177 | 0.15355 |
11 | 2-2-2 | 0.03020 | 0.00899 | 0.01257 | 0.04782 |
12 | 2-2-3 | 0.10397 | 0.01833 | 0.06805 | 0.13989 |
Analysis Information | |
---|---|
Loci Used | SNP1 SNP2 SNP3 SNP4 |
Number of Individuals | 25 |
Number of Starts | 5 |
Convergence Criterion | 0.00001 |
Iterations Checked for Conv. | 1 |
Maximum Number of Iterations | 100 |
Number of Iterations Used | 19 |
Log Likelihood | -95.94742 |
Initialization Method | Random |
Random Number Seed | 499887544 |
Standard Error Method | Binomial |
Haplotype Frequency Cutoff | 0 |
Output 8.1.4 displays the 12 possible three-locus haplotypes in the data and their estimated haplotype frequencies, standard errors, and bounds for the 95% confidence intervals for the estimates.
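The confidence limits in the table appear consistent with a normal-approximation interval, that is, the estimate plus or minus a standard normal quantile times the standard error. The sketch below recomputes the limits for three rows transcribed from Output 8.1.4 under that assumption; the data set name ci_check and the variable names are hypothetical, and small differences in the last digit are to be expected because the displayed standard errors are rounded.

```sas
/* Recompute 95% limits as freq +/- z*se for a few rows of Output 8.1.4. */
/* Assumes a normal-approximation interval; ci_check is a made-up name.  */
data ci_check;
   input haplotype $ freq se;
   z     = quantile('NORMAL', 0.975);   /* approximately 1.96 */
   lower = freq - z*se;
   upper = freq + z*se;
   datalines;
1-1-1 0.09170 0.01505
1-2-3 0.12788 0.02094
2-2-2 0.03020 0.00899
;

proc print data=ci_check noobs;
   var haplotype freq se lower upper;
   format lower upper 8.5;
run;
```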
To see how the CUTOFF= option affects the "Haplotype Frequencies" table, suppose you want to view only the haplotypes with an estimated frequency of at least 0.10. The following code creates such a table:
proc haplotype data=ehdata se=jackknife cutoff=0.10 nlag=4;
   var m1-m6;
run;
Now the "Haplotype Frequencies" table is displayed as in Output 8.1.5.
Haplotype Frequencies |
Number | Haplotype | Freq | Standard Error | 95% Lower Confidence Limit | 95% Upper Confidence Limit |
---|---|---|---|---|---|
1 | 1-1-3 | 0.11509 | 0.01766 | 0.08048 | 0.14971 |
2 | 1-2-3 | 0.12788 | 0.02094 | 0.08685 | 0.16891 |
3 | 2-1-2 | 0.11700 | 0.01782 | 0.08207 | 0.15193 |
4 | 2-2-1 | 0.11766 | 0.01831 | 0.08177 | 0.15355 |
5 | 2-2-3 | 0.10397 | 0.01833 | 0.06805 | 0.13989 |
Output 8.1.5 displays only the five three-locus haplotypes with estimated frequencies of at least 0.10. The CUTOFF= option is especially useful for keeping the "Haplotype Frequencies" table to a manageable size when the analysis involves many marker loci or loci with several alleles, and when many of the haplotypes have estimated frequencies very near zero. Specifying CUTOFF=1 suppresses the "Haplotype Frequencies" table.
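The effect of the cutoff on the display can be mimicked by hand. The sketch below transcribes the 12 estimated frequencies from Output 8.1.4 into a hypothetical data set named hapfreq and keeps the rows with a frequency of at least 0.10. It only illustrates the filtering rule; it does not show how the procedure applies CUTOFF= internally.

```sas
/* Frequencies transcribed from Output 8.1.4; hapfreq is a made-up name. */
data hapfreq;
   input haplotype $ freq;
   datalines;
1-1-1 0.09170
1-1-2 0.02080
1-1-3 0.11509
1-2-1 0.07904
1-2-2 0.06768
1-2-3 0.12788
2-1-1 0.05521
2-1-2 0.11700
2-1-3 0.07376
2-2-1 0.11766
2-2-2 0.03020
2-2-3 0.10397
;

/* Keep only haplotypes at or above the 0.10 threshold */
proc print data=hapfreq noobs;
   where freq >= 0.10;
run;
```

The rows that remain are the same five haplotypes listed in Output 8.1.5.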
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.