The HAPLOTYPE Procedure

Example 7.1 Estimating Three-Locus Haplotype Frequencies

This example uses the genotypes of 227 individuals at three markers. The data were created from genotype frequency tables from the Lab of Statistical Genetics at Rockefeller University (2001). In the DATA step, each data line holds the genotypes of four individuals, except for the last line, which holds three; the trailing @@ in the INPUT statement holds the current line so that several observations can be read from it. The resulting SAS data set contains one individual per row, with six columns representing the two alleles at each of the three marker loci.

data ehdata;
   input m1-m6 @@;
   datalines;
1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3
1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3
1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3
1 1 1 1 2 3 1 1 1 1 2 3 1 1 1 1 2 3 1 1 1 1 3 3
1 1 1 1 3 3 1 1 1 1 3 3 1 1 1 1 3 3 1 1 1 1 3 3
1 1 1 1 3 3 1 1 1 1 3 3 1 1 1 1 3 3 1 1 1 1 3 3
1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 2 1 2 1 1 1 2 1 2
1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2
1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2
1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 2 2 1 1 1 2 2 2
1 1 1 2 1 3 1 1 1 2 1 3 1 1 1 2 2 3 1 1 1 2 2 3
1 1 1 2 2 3 1 1 1 2 3 3 1 1 1 2 3 3 1 1 1 2 3 3
1 1 1 2 3 3 1 1 2 2 1 1 1 1 2 2 1 1 1 1 2 2 1 1
1 1 2 2 1 1 1 1 2 2 1 1 1 1 2 2 1 2 1 1 2 2 1 2
1 1 2 2 1 2 1 1 2 2 1 2 1 1 2 2 2 2 1 1 2 2 2 2
1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 1 3 1 1 2 2 1 3
1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3
1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3
1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3
1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3
1 1 2 2 3 3 1 1 2 2 3 3 1 2 1 1 1 1 1 2 1 1 1 1
1 2 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1
1 2 1 1 1 1 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 3
1 2 1 1 1 3 1 2 1 1 2 3 1 2 1 1 2 3 1 2 1 1 2 3
1 2 1 1 2 3 1 2 1 1 2 3 1 2 1 1 2 3 1 2 1 1 3 3
1 2 1 1 3 3 1 2 1 1 3 3 1 2 1 2 1 1 1 2 1 2 1 1
1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1
1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 2 1 3
1 2 1 2 1 3 1 2 1 2 1 3 1 2 1 2 2 3 1 2 1 2 2 3
1 2 1 2 2 3 1 2 1 2 2 3 1 2 1 2 3 3 1 2 1 2 3 3
1 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 1
1 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 1
1 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 1 2 1 2 2 2 1 2
1 2 2 2 1 2 1 2 2 2 1 3 1 2 2 2 1 3 1 2 2 2 2 3
1 2 2 2 2 3 1 2 2 2 2 3 1 2 2 2 3 3 1 2 2 2 3 3
1 2 2 2 3 3 1 2 2 2 3 3 1 2 2 2 3 3 1 2 2 2 3 3
1 2 2 2 3 3 1 2 2 2 3 3 2 2 1 1 1 1 2 2 1 1 1 2
2 2 1 1 1 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2
2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2
2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 1 3
2 2 1 1 1 3 2 2 1 1 2 3 2 2 1 1 2 3 2 2 1 1 2 3
2 2 1 1 2 3 2 2 1 1 2 3 2 2 1 1 2 3 2 2 1 1 2 3
2 2 1 1 2 3 2 2 1 1 2 3 2 2 1 1 3 3 2 2 1 1 3 3
2 2 1 1 3 3 2 2 1 1 3 3 2 2 1 2 1 1 2 2 1 2 1 1
2 2 1 2 1 1 2 2 1 2 1 1 2 2 1 2 1 1 2 2 1 2 1 2
2 2 1 2 1 2 2 2 1 2 1 2 2 2 1 2 2 2 2 2 1 2 2 2
2 2 1 2 2 2 2 2 1 2 2 2 2 2 1 2 1 3 2 2 1 2 1 3
2 2 1 2 1 3 2 2 1 2 1 3 2 2 1 2 1 3 2 2 1 2 1 3
2 2 1 2 2 3 2 2 1 2 2 3 2 2 1 2 2 3 2 2 1 2 2 3
2 2 1 2 2 3 2 2 1 2 2 3 2 2 1 2 2 3 2 2 1 2 3 3
2 2 1 2 3 3 2 2 1 2 3 3 2 2 2 2 1 1 2 2 2 2 1 1
2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1
2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 2
2 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2 2 3 2 2 2 2 2 3
2 2 2 2 2 3 2 2 2 2 3 3 2 2 2 2 3 3 2 2 2 2 3 3
2 2 2 2 3 3 2 2 2 2 3 3 2 2 2 2 3 3 2 2 2 2 3 3
2 2 2 2 3 3 2 2 2 2 3 3 2 2 2 2 3 3 
;
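
Before estimating frequencies, it can be worth confirming that the data were read as described. The following is a minimal sketch that uses the Base SAS procedures PRINT and FREQ; it is not part of the original example and assumes only the ehdata data set created above.

```sas
/* Print the first four observations; they should match the first
   line of the DATALINES block (four individuals, six alleles each). */
proc print data=ehdata(obs=4);
run;

/* Tabulate the allele codes in each column. With no missing values,
   each one-way table should total 227 observations. */
proc freq data=ehdata;
   tables m1-m6 / nocum;
run;
```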

The haplotype frequencies can be estimated with the EM algorithm, and their standard errors with the jackknife method, by using the following code:

proc haplotype data=ehdata se=jackknife maxiter=20 itprint nlag=4;
   var m1-m6;
run;

This produces the ODS output shown in Output 7.1.1 through Output 7.1.4.

Output 7.1.1: Analysis Information for the HAPLOTYPE Procedure

The HAPLOTYPE Procedure

Analysis Information
Loci Used                        M1 M2 M3
Number of Individuals            227
Number of Starts                 1
Convergence Criterion            0.00001
Iterations Checked for Conv.     4
Maximum Number of Iterations     20
Number of Iterations Used        11
Log Likelihood                   -934.97918
Initialization Method            Linkage Equilibrium
Standard Error Method            Jackknife
Haplotype Frequency Cutoff       0


Output 7.1.1 displays several of the settings used when the HAPLOTYPE procedure was run on the ehdata data set. Although the MAXITER= option was set to 20, the estimation stopped after 11 iterations because the convergence criterion of 0.00001 had been met for four consecutive iterations (the value of the NLAG= option). To obtain more precise frequency estimates, specify a smaller convergence criterion with the CONV= option.
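
As an illustration of tightening the criterion, the following sketch reruns the analysis with the CONV= option. The values CONV=0.000001 and MAXITER=100 are chosen here only for illustration and do not come from the original example; a tighter criterion generally needs more iterations, so the iteration limit is raised as well.

```sas
/* Same analysis as above, with a convergence criterion ten times
   smaller than the 0.00001 shown in Output 7.1.1 and a higher
   iteration limit so that the EM algorithm has room to converge. */
proc haplotype data=ehdata se=jackknife conv=0.000001 maxiter=100
               itprint nlag=4;
   var m1-m6;
run;
```

With ITPRINT specified, the Iteration History table shows how many additional iterations the tighter criterion requires.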

Output 7.1.2: Iteration History for the HAPLOTYPE Procedure

Iteration History
Iter      LogLike    Ratio Changed
   0   -953.89697
   1   -937.92181          0.01675
   2   -935.91870          0.00214
   3   -935.35775          0.00060
   4   -935.13050          0.00024
   5   -935.03710          0.00010
   6   -935.00051          0.00004
   7   -934.98679          0.00001
   8   -934.98180          0.00001
   9   -934.98002          0.00000
  10   -934.97940          0.00000
  11   -934.97918          0.00000


Output 7.1.3: Convergence Status for the HAPLOTYPE Procedure

Algorithm converged.


Because the ITPRINT option was specified in the PROC HAPLOTYPE statement, the iteration history of the EM algorithm is included in the ODS output; Output 7.1.2 contains this table. The Convergence Status table (Output 7.1.3) is displayed by default and consists of a single line indicating whether the algorithm converged.
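
The Ratio Changed column in Output 7.1.2 appears consistent with the absolute change in the log likelihood divided by the absolute value of the previous log likelihood. The following sketch recomputes that ratio for the first few iterations; the data set name iterhist and the variable names are hypothetical, and the log likelihoods are copied from Output 7.1.2.

```sas
/* Recompute the relative change in log likelihood between iterations. */
data iterhist;
   input iter loglike;
   prev = lag(loglike);   /* log likelihood from the previous iteration */
   if iter > 0 then ratio = abs((loglike - prev) / prev);
   datalines;
0 -953.89697
1 -937.92181
2 -935.91870
3 -935.35775
;

proc print data=iterhist noobs;
   var iter loglike ratio;
   format ratio 8.5;
run;
```

Rounded to five decimal places, the ratios for iterations 1 through 3 should agree with the values in the Ratio Changed column.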

Output 7.1.4: Haplotype Frequencies from the HAPLOTYPE Procedure

Haplotype Frequencies
Number   Haplotype      Freq   Standard Error   95% Confidence Limits
     1   1-1-1       0.09170          0.01505     0.06221     0.12119
     2   1-1-2       0.02080          0.00952     0.00214     0.03946
     3   1-1-3       0.11509          0.01766     0.08048     0.14971
     4   1-2-1       0.07904          0.01696     0.04580     0.11228
     5   1-2-2       0.06768          0.01546     0.03738     0.09799
     6   1-2-3       0.12788          0.02094     0.08685     0.16891
     7   2-1-1       0.05521          0.01227     0.03115     0.07926
     8   2-1-2       0.11700          0.01782     0.08207     0.15193
     9   2-1-3       0.07376          0.01495     0.04446     0.10307
    10   2-2-1       0.11766          0.01831     0.08177     0.15355
    11   2-2-2       0.03020          0.00899     0.01257     0.04782
    12   2-2-3       0.10397          0.01833     0.06805     0.13989


Output 7.1.4 displays the 12 possible three-locus haplotypes in the data and their estimated haplotype frequencies, standard errors, and bounds for the 95% confidence intervals for the estimates.
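
The confidence limits in Output 7.1.4 appear consistent with normal-approximation limits, that is, the estimate plus or minus about 1.96 standard errors. The following sketch checks this for the first two haplotypes; the data set name ci_check and the variable names are hypothetical, and the inputs are the rounded values from the table.

```sas
/* Recompute approximate 95% confidence limits from Freq and Standard Error. */
data ci_check;
   input haplotype $ freq se;
   z = probit(0.975);        /* standard normal quantile, about 1.96 */
   lower = freq - z * se;
   upper = freq + z * se;
   datalines;
1-1-1 0.09170 0.01505
1-1-2 0.02080 0.00952
;

proc print data=ci_check noobs;
   var haplotype freq se lower upper;
   format lower upper 8.5;
run;
```

Because the inputs are already rounded to five decimal places, the recomputed limits can differ from the displayed ones in the last digit.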

To see how the CUTOFF= option affects the Haplotype Frequencies table, suppose you want to view only the haplotypes with an estimated frequency of at least 0.10. The following code creates such a table:

proc haplotype data=ehdata se=jackknife cutoff=0.10 nlag=4;
  var m1-m6;
run;

Now the Haplotype Frequencies table is displayed as in Output 7.1.5.

Output 7.1.5: Haplotype Frequencies from the HAPLOTYPE Procedure Using the CUTOFF= Option

The HAPLOTYPE Procedure

Haplotype Frequencies
Number   Haplotype      Freq   Standard Error   95% Confidence Limits
     1   1-1-3       0.11509          0.01766     0.08048     0.14971
     2   1-2-3       0.12788          0.02094     0.08685     0.16891
     3   2-1-2       0.11700          0.01782     0.08207     0.15193
     4   2-2-1       0.11766          0.01831     0.08177     0.15355
     5   2-2-3       0.10397          0.01833     0.06805     0.13989


Output 7.1.5 displays only the five three-locus haplotypes with estimated frequencies of at least 0.10. The CUTOFF= option is especially useful for keeping the Haplotype Frequencies table to a manageable size when many marker loci, or loci with several alleles, are used and many of the haplotypes have estimated frequencies very near zero. Specifying CUTOFF=1 suppresses the Haplotype Frequencies table.