The HAPLOTYPE Procedure

Example 8.1 Estimating Three-Locus Haplotype Frequencies

This example uses 227 individuals genotyped at three markers. The data were generated from genotype frequency tables from the Lab of Statistical Genetics at Rockefeller University (2001). Each data line holds the genotypes of four individuals, except the last line of the DATA step, which holds three; the trailing @@ in the INPUT statement lets SAS keep reading successive individuals from the same line. The resulting SAS data set contains one individual per row, with six columns representing the two alleles at each of the three marker loci.

   data ehdata;
      input m1-m6 @@;
      datalines;
   1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3
   1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3
   1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3
   1 1 1 1 2 3 1 1 1 1 2 3 1 1 1 1 2 3 1 1 1 1 3 3
   1 1 1 1 3 3 1 1 1 1 3 3 1 1 1 1 3 3 1 1 1 1 3 3
   1 1 1 1 3 3 1 1 1 1 3 3 1 1 1 1 3 3 1 1 1 1 3 3
   1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 2 1 2 1 1 1 2 1 2
   1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2
   1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2
   1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 2 2 1 1 1 2 2 2
   1 1 1 2 1 3 1 1 1 2 1 3 1 1 1 2 2 3 1 1 1 2 2 3
   1 1 1 2 2 3 1 1 1 2 3 3 1 1 1 2 3 3 1 1 1 2 3 3
   1 1 1 2 3 3 1 1 2 2 1 1 1 1 2 2 1 1 1 1 2 2 1 1
   1 1 2 2 1 1 1 1 2 2 1 1 1 1 2 2 1 2 1 1 2 2 1 2
   1 1 2 2 1 2 1 1 2 2 1 2 1 1 2 2 2 2 1 1 2 2 2 2
   1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 1 3 1 1 2 2 1 3
   1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3
   1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3
   1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3
   1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3
   1 1 2 2 3 3 1 1 2 2 3 3 1 2 1 1 1 1 1 2 1 1 1 1
   1 2 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1
   1 2 1 1 1 1 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 3
   1 2 1 1 1 3 1 2 1 1 2 3 1 2 1 1 2 3 1 2 1 1 2 3
   1 2 1 1 2 3 1 2 1 1 2 3 1 2 1 1 2 3 1 2 1 1 3 3
   1 2 1 1 3 3 1 2 1 1 3 3 1 2 1 2 1 1 1 2 1 2 1 1
   1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1
   1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 3
   1 2 1 2 1 3 1 2 1 2 1 3 1 2 1 2 2 3 1 2 1 2 2 3
   1 2 1 2 2 3 1 2 1 2 2 3 1 2 1 2 3 3 1 2 1 2 3 3
   1 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 1
   1 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 1
   1 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 2 1 2 2 2 1 2
   1 2 2 2 1 2 1 2 2 2 1 3 1 2 2 2 1 3 1 2 2 2 2 3
   1 2 2 2 2 3 1 2 2 2 2 3 1 2 2 2 3 3 1 2 2 2 3 3
   1 2 2 2 3 3 1 2 2 2 3 3 1 2 2 2 3 3 1 2 2 2 3 3
   1 2 2 2 3 3 1 2 2 2 3 3 2 2 1 1 1 1 2 2 1 1 1 2
   2 2 1 1 1 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2
   2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2
   2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 1 3
   2 2 1 1 1 3 2 2 1 1 2 3 2 2 1 1 2 3 2 2 1 1 2 3
   2 2 1 1 2 3 2 2 1 1 2 3 2 2 1 1 2 3 2 2 1 1 2 3
   2 2 1 1 2 3 2 2 1 1 2 3 2 2 1 1 3 3 2 2 1 1 3 3
   2 2 1 1 3 3 2 2 1 1 3 3 2 2 1 2 1 1 2 2 1 2 1 1
   2 2 1 2 1 1 2 2 1 2 1 1 2 2 1 2 1 1 2 2 1 2 1 2
   2 2 1 2 1 2 2 2 1 2 1 2 2 2 1 2 2 2 2 2 1 2 2 2
   2 2 1 2 2 2 2 2 1 2 2 2 2 2 1 2 1 3 2 2 1 2 1 3
   2 2 1 2 1 3 2 2 1 2 1 3 2 2 1 2 1 3 2 2 1 2 1 3
   2 2 1 2 2 3 2 2 1 2 2 3 2 2 1 2 2 3 2 2 1 2 2 3
   2 2 1 2 2 3 2 2 1 2 2 3 2 2 1 2 2 3 2 2 1 2 3 3
   2 2 1 2 3 3 2 2 1 2 3 3 2 2 2 2 1 1 2 2 2 2 1 1
   2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1
   2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 2
   2 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2 2 3 2 2 2 2 2 3
   2 2 2 2 2 3 2 2 2 2 3 3 2 2 2 2 3 3 2 2 2 2 3 3
   2 2 2 2 3 3 2 2 2 2 3 3 2 2 2 2 3 3 2 2 2 2 3 3
   2 2 2 2 3 3 2 2 2 2 3 3 2 2 2 2 3 3 
   ;
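
As a quick check that the trailing @@ split each data line into separate individuals, the data set can be inspected before running the analysis. The following is a minimal sketch and not part of the original example; the column alias n_individuals is a hypothetical name chosen for illustration.

```sas
/* Hypothetical sanity check: count the rows that the DATA step created. */
/* The text above states that the data set should hold 227 individuals.  */
proc sql;
   select count(*) as n_individuals from ehdata;
quit;

/* Print the first four rows, which correspond to the four genotypes */
/* on the first data line (each is 1 1 1 1 1 3 in the listing above). */
proc print data=ehdata(obs=4);
run;
```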

The haplotype frequencies can be estimated with the EM algorithm, and their standard errors with the jackknife method, using the following code:

   proc haplotype data=ehdata se=jackknife maxiter=20 itprint nlag=4;
      var m1-m6;
   run;
   

This produces the ODS output shown in Outputs 8.1.1 through 8.1.4.

Output 8.1.1 Analysis Information for the HAPLOTYPE Procedure
The HAPLOTYPE Procedure

Analysis Information
Loci Used                       M1 M2 M3
Number of Individuals           227
Number of Starts                1
Convergence Criterion           0.00001
Iterations Checked for Conv.    4
Maximum Number of Iterations    20
Number of Iterations Used       11
Log Likelihood                  -934.97918
Initialization Method           Linkage Equilibrium
Standard Error Method           Jackknife
Haplotype Frequency Cutoff      0

Output 8.1.1 displays several of the settings used when running the HAPLOTYPE procedure on the ehdata data set. Although the MAXITER= option allowed up to 20 iterations, the convergence criterion of 0.00001 was satisfied for four consecutive iterations (the NLAG=4 setting) after only 11 iterations, at which point the estimation process stopped. To obtain more precise frequency estimates, specify a smaller convergence criterion with the CONV= option.
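
A tighter convergence criterion could be requested along the following lines. This is a sketch under assumptions: CONV= is taken to be the PROC HAPLOTYPE option that sets the criterion, and the values 0.0000001 and 100 are illustrative choices, not settings from the original example.

```sas
/* Sketch: same analysis with a smaller convergence criterion.        */
/* conv=0.0000001 and maxiter=100 are illustrative values only;       */
/* maxiter is raised because a tighter criterion may need more steps. */
proc haplotype data=ehdata se=jackknife conv=0.0000001 maxiter=100
               itprint nlag=4;
   var m1-m6;
run;
```

With ITPRINT still specified, the iteration history shows how many additional iterations the tighter criterion requires.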

Output 8.1.2 Iteration History for the HAPLOTYPE Procedure
Iteration History
Iter    LogLike       Ratio Changed
   0    -953.89697
   1    -937.92181    0.01675
   2    -935.91870    0.00214
   3    -935.35775    0.00060
   4    -935.13050    0.00024
   5    -935.03710    0.00010
   6    -935.00051    0.00004
   7    -934.98679    0.00001
   8    -934.98180    0.00001
   9    -934.98002    0.00000
  10    -934.97940    0.00000
  11    -934.97918    0.00000

Output 8.1.3 Convergence Status for the HAPLOTYPE Procedure
Algorithm converged.

Because the ITPRINT option was specified in the PROC HAPLOTYPE statement, the iteration history of the EM algorithm is included in the ODS output; Output 8.1.2 contains this table. The "Convergence Status" table (Output 8.1.3) is displayed by default and consists of a single line that reports whether the algorithm converged.
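
The "Ratio Changed" column in Output 8.1.2 appears to be the absolute change in log likelihood relative to the previous log likelihood. This reading is inferred from the printed values rather than stated in the text, and the symbol \ell_t (the log likelihood at iteration t) is introduced here only for illustration. Worked for iteration 1:

```latex
\[
\text{Ratio Changed}_t = \frac{\lvert \ell_t - \ell_{t-1} \rvert}{\lvert \ell_{t-1} \rvert},
\qquad
\frac{\lvert -937.92181 - (-953.89697) \rvert}{953.89697}
  = \frac{15.97516}{953.89697} \approx 0.01675 .
\]
```

The same calculation for iteration 2 gives 2.00311 / 937.92181, or about 0.00214, which also agrees with the table.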

Output 8.1.4 Haplotype Frequencies from the HAPLOTYPE Procedure
Haplotype Frequencies
Number  Haplotype  Freq     Standard Error  95% Confidence Limits
     1  1-1-1      0.09170  0.01505         0.06221  0.12119
     2  1-1-2      0.02080  0.00952         0.00214  0.03946
     3  1-1-3      0.11509  0.01766         0.08048  0.14971
     4  1-2-1      0.07904  0.01696         0.04580  0.11228
     5  1-2-2      0.06768  0.01546         0.03738  0.09799
     6  1-2-3      0.12788  0.02094         0.08685  0.16891
     7  2-1-1      0.05521  0.01227         0.03115  0.07926
     8  2-1-2      0.11700  0.01782         0.08207  0.15193
     9  2-1-3      0.07376  0.01495         0.04446  0.10307
    10  2-2-1      0.11766  0.01831         0.08177  0.15355
    11  2-2-2      0.03020  0.00899         0.01257  0.04782
    12  2-2-3      0.10397  0.01833         0.06805  0.13989

Output 8.1.4 displays the 12 possible three-locus haplotypes in the data and their estimated haplotype frequencies, standard errors, and bounds for the 95% confidence intervals for the estimates.
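
The count of 12 haplotypes follows from the alleles observed in the data: two at each of the first two markers and three at the third, giving 2 x 2 x 3 = 12 combinations. The printed confidence limits are consistent with normal-approximation limits of the form estimate plus or minus 1.96 standard errors. That form is an assumption inferred from the printed numbers, not a statement of how the procedure computes them. Worked for haplotype 1-1-2:

```latex
\[
\hat{p} \pm z_{0.975}\,\mathrm{SE}(\hat{p})
  = 0.02080 \pm 1.96 \times 0.00952
  = 0.02080 \pm 0.01866
  \;\Rightarrow\; (0.00214,\ 0.03946).
\]
```

These values agree with the limits shown for haplotype 1-1-2 in the "Haplotype Frequencies" table.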

To see how the CUTOFF= option affects the "Haplotype Frequencies" table, suppose you want to view only the haplotypes with an estimated frequency of at least 0.10. The following code creates such a table:

   proc haplotype data=ehdata se=jackknife cutoff=0.10 nlag=4;
      var m1-m6;
   run;
   

Now the "Haplotype Frequencies" table is displayed as in Output 8.1.5.

Output 8.1.5 Haplotype Frequencies from the HAPLOTYPE Procedure Using the CUTOFF= Option
The HAPLOTYPE Procedure

Haplotype Frequencies
Number  Haplotype  Freq     Standard Error  95% Confidence Limits
     1  1-1-3      0.11509  0.01766         0.08048  0.14971
     2  1-2-3      0.12788  0.02094         0.08685  0.16891
     3  2-1-2      0.11700  0.01782         0.08207  0.15193
     4  2-2-1      0.11766  0.01831         0.08177  0.15355
     5  2-2-3      0.10397  0.01833         0.06805  0.13989

Output 8.1.5 displays only the five three-locus haplotypes with estimated frequencies of at least 0.10. The CUTOFF= option is especially useful for keeping the "Haplotype Frequencies" table to a manageable size when the analysis involves many marker loci, or loci with several alleles, and many of the haplotypes have estimated frequencies very near zero. Using CUTOFF=1 suppresses the "Haplotype Frequencies" table.