
The HAPLOTYPE Procedure

Example

Assume you have a random sample of 25 individuals genotyped at four markers. You want to infer the gametic phases of the genotypes and estimate the haplotype frequencies. There are eight columns of data: the first two columns contain the pair of alleles at the first marker, the next two columns contain the pair of alleles at the second marker, and so on. Each row represents an individual. The data can be read into a SAS data set as follows:

   data markers;
      input (m1-m8) ($);
      datalines; 
   B  B  A  B  B  B  A  A     
   A  A  B  B  A  B  A  B     
   B  B  A  A  B  B  B  B     
   A  B  A  B  A  B  A  B     
   A  A  A  B  A  B  B  B     
   B  B  A  A  A  B  A  B     
   A  B  B  B  A  B  A  A     
   A  B  A  A  A  A  A  A     
   B  B  A  A  A  A  A  B     
   A  B  A  B  A  B  B  B     
   A  B  A  B  A  B  A  A     
   B  B  A  B  A  B  A  A     
   A  B  A  A  A  B  A  B     
   A  B  B  B  B  B  A  B     
   A  A  A  B  A  A  A  B     
   B  B  A  B  A  B  A  B     
   A  B  B  B  A  A  A  B     
   B  B  B  B  A  A  A  A     
   A  B  A  A  A  B  A  A     
   A  B  A  A  A  B  A  B     
   B  B  A  A  A  A  A  B     
   A  A  A  B  A  A  A  B     
   A  B  A  A  A  A  B  B     
   A  A  A  A  A  A  A  A     
   A  B  B  B  A  A  A  A 
   ;                

You can now use PROC HAPLOTYPE to infer the possible haplotypes and estimate the four-locus haplotype frequencies in this sample. The following statements perform these calculations:

   proc haplotype data=markers out=hapout init=random prefix=SNP seed=51220;
      var m1-m8;
   run;
   proc print data=hapout noobs round;
   run;

This analysis uses the EM algorithm to estimate the haplotype frequencies from the sample. By default, the standard error and confidence interval for each haplotype frequency estimate are computed under a binomial assumption. A more precise estimate of the standard error can be obtained through the jackknife process by specifying the SE=JACKKNIFE option in the PROC HAPLOTYPE statement, but this requires considerably more computation (see the Methods of Estimating Standard Error section for more information). The INIT=RANDOM option indicates that the initial haplotype frequencies are randomly generated; the SEED=51220 option supplies the random seed, so the analysis can be reproduced. If the SEED= option is omitted, the seed is created from the system clock. Because the ALPHA= option is omitted from the PROC HAPLOTYPE statement, the default confidence level of 0.95 is used. Also by default, the convergence criterion of 0.00001 must be satisfied for one iteration, and the maximum number of iterations is 100. The PREFIX= option requests that the four markers, indicated by the eight allele variables in the VAR statement, be named SNP1–SNP4.
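
As an illustration of the options discussed above, the following sketch reruns the analysis with jackknife standard errors and with the default settings written out explicitly. The SE=, ALPHA=, and SEED= options are named in this section; the CONV= and MAXITER= option names are assumed from the procedure's syntax and are not shown elsewhere in this example, and the output data set name hapout_jk is hypothetical.

```sas
/* Sketch only: same data and seed as above, but with jackknife      */
/* standard errors and the default settings spelled out.             */
/* Assumptions: CONV= and MAXITER= are the option names for the      */
/* convergence criterion and iteration limit; hapout_jk is a         */
/* made-up output data set name.                                     */
proc haplotype data=markers out=hapout_jk init=random prefix=SNP
               seed=51220 se=jackknife
               alpha=0.05 conv=0.00001 maxiter=100;
   var m1-m8;
run;
```

Because the seed and the EM settings are unchanged, only the standard errors and confidence limits would be expected to differ from the binomial results; the jackknife run takes longer.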

The results from the procedure are shown in Figures 8.1 through 8.3.

Figure 8.1 Analysis Information for the HAPLOTYPE Procedure
The HAPLOTYPE Procedure

Analysis Information
Loci Used SNP1 SNP2 SNP3 SNP4
Number of Individuals 25
Number of Starts 1
Convergence Criterion 0.00001
Iterations Checked for Conv. 1
Maximum Number of Iterations 100
Number of Iterations Used 15
Log Likelihood -95.94742
Initialization Method Random
Random Number Seed 51220
Standard Error Method Binomial
Haplotype Frequency Cutoff 0

Figure 8.1 displays the settings used for the analysis together with information about the EM algorithm, such as the number of iterations used and the final log likelihood. The table also reports the random number seed, here the value 51220 given in the SEED= option. When the SEED= option is omitted, you can obtain the clock-generated seed from this table if you need to replicate the analysis.

Figure 8.2 Haplotype Frequencies from the HAPLOTYPE Procedure
Haplotype Frequencies
Number Haplotype Freq Standard Error 95% Confidence Limits (Lower Upper)
1 A-A-A-A 0.14302 0.05001 0.04500 0.24105
2 A-A-A-B 0.07527 0.03769 0.00140 0.14914
3 A-A-B-A 0.00000 0.00000 0.00000 0.00000
4 A-A-B-B 0.00000 0.00010 0.00000 0.00020
5 A-B-A-A 0.09307 0.04151 0.01173 0.17442
6 A-B-A-B 0.05335 0.03210 0.00000 0.11627
7 A-B-B-A 0.00002 0.00061 0.00000 0.00122
8 A-B-B-B 0.07526 0.03769 0.00140 0.14913
9 B-A-A-A 0.08638 0.04013 0.00772 0.16504
10 B-A-A-B 0.08792 0.04046 0.00863 0.16722
11 B-A-B-A 0.07921 0.03858 0.00359 0.15482
12 B-A-B-B 0.10819 0.04437 0.02122 0.19517
13 B-B-A-A 0.10098 0.04304 0.01662 0.18534
14 B-B-A-B 0.00000 0.00001 0.00000 0.00002
15 B-B-B-A 0.09732 0.04234 0.01433 0.18030
16 B-B-B-B 0.00000 0.00001 0.00000 0.00002

Figure 8.2 displays the possible haplotypes in the sample and their estimated frequencies with standard errors and the lower and upper limits of the 95% confidence interval.
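
The binomial standard error and confidence limits in Figure 8.2 can be checked by hand. The following DATA step is a minimal sketch for haplotype 1 (A-A-A-A). The denominator 2n-1 and the truncation of the lower limit at 0 are assumptions inferred from the printed values (for example, haplotype 6 has a lower limit of 0.00000), not taken from the procedure's documentation.

```sas
/* Sketch: recompute the standard error and 95% limits for           */
/* haplotype 1 from the frequency printed in Figure 8.2.             */
data _null_;
   n     = 25;        /* individuals, so 2n = 50 chromosomes         */
   p     = 0.14302;   /* estimated frequency of A-A-A-A              */
   alpha = 0.05;      /* default, giving a 95% interval              */

   /* Assumption: a 2n-1 denominator, which matches the printed SE   */
   se = sqrt(p * (1 - p) / (2*n - 1));

   z     = probit(1 - alpha/2);   /* standard normal quantile        */
   lower = max(0, p - z*se);      /* assumed truncation at 0         */
   upper = p + z*se;

   put se= 8.5 lower= 8.5 upper= 8.5;
run;
```

The values written to the log agree with the first row of Figure 8.2 to about four decimal places; small differences in the last digit arise because the printed frequency is itself rounded.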

Figure 8.3 Output Data Set from the HAPLOTYPE Procedure
_ID_ m1 m2 m3 m4 m5 m6 m7 m8 HAPLOTYPE1 HAPLOTYPE2 PROB
1 B B A B B B A A B-A-B-A B-B-B-A 1.00
2 A A B B A B A B A-B-A-A A-B-B-B 1.00
2 A A B B A B A B A-B-A-B A-B-B-A 0.00
3 B B A A B B B B B-A-B-B B-A-B-B 1.00
4 A B A B A B A B A-A-A-B B-B-B-A 0.26
4 A B A B A B A B A-B-A-A B-A-B-B 0.36
4 A B A B A B A B A-B-A-B B-A-B-A 0.15
4 A B A B A B A B A-B-B-A B-A-A-B 0.00
4 A B A B A B A B A-B-B-B B-A-A-A 0.23
5 A A A B A B B B A-A-A-B A-B-B-B 1.00
6 B B A A A B A B B-A-A-A B-A-B-B 0.57
6 B B A A A B A B B-A-A-B B-A-B-A 0.43
7 A B B B A B A A A-B-A-A B-B-B-A 1.00
7 A B B B A B A A A-B-B-A B-B-A-A 0.00
8 A B A A A A A A A-A-A-A B-A-A-A 1.00
9 B B A A A A A B B-A-A-A B-A-A-B 1.00
10 A B A B A B B B A-B-A-B B-A-B-B 0.47
10 A B A B A B B B A-B-B-B B-A-A-B 0.53
11 A B A B A B A A A-A-A-A B-B-B-A 0.65
11 A B A B A B A A A-B-A-A B-A-B-A 0.35
11 A B A B A B A A A-B-B-A B-A-A-A 0.00
12 B B A B A B A A B-A-A-A B-B-B-A 0.51
12 B B A B A B A A B-A-B-A B-B-A-A 0.49
13 A B A A A B A B A-A-A-A B-A-B-B 0.72
13 A B A A A B A B A-A-A-B B-A-B-A 0.28
14 A B B B B B A B A-B-B-B B-B-B-A 1.00
15 A A A B A A A B A-A-A-A A-B-A-B 0.52
15 A A A B A A A B A-A-A-B A-B-A-A 0.48
16 B B A B A B A B B-A-A-B B-B-B-A 0.44
16 B B A B A B A B B-A-B-B B-B-A-A 0.56
17 A B B B A A A B A-B-A-B B-B-A-A 1.00
18 B B B B A A A A B-B-A-A B-B-A-A 1.00
19 A B A A A B A A A-A-A-A B-A-B-A 1.00
20 A B A A A B A B A-A-A-A B-A-B-B 0.72
20 A B A A A B A B A-A-A-B B-A-B-A 0.28
21 B B A A A A A B B-A-A-A B-A-A-B 1.00
22 A A A B A A A B A-A-A-A A-B-A-B 0.52
22 A A A B A A A B A-A-A-B A-B-A-A 0.48
23 A B A A A A B B A-A-A-B B-A-A-B 1.00
24 A A A A A A A A A-A-A-A A-A-A-A 1.00
25 A B B B A A A A A-B-A-A B-B-A-A 1.00

Figure 8.3 displays each individual's genotype together with every haplotype pair that is compatible with that genotype, and the estimated probability that the genotype resolves into each of those pairs. For each individual, the probabilities sum to 1 across the listed pairs.
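
The PROB values can be related to the frequencies in Figure 8.2. The following DATA step is a sketch for individual 6, whose genotype is compatible with two haplotype pairs. It assumes that, under random mating, the probability of a pair of distinct haplotypes is proportional to twice the product of their estimated frequencies; this is the usual weighting in the EM algorithm, stated here as an assumption rather than as the procedure's documented formula.

```sas
/* Sketch: probabilities of the two haplotype pairs for individual 6 */
/* computed from the frequencies printed in Figure 8.2.              */
data _null_;
   p_BAAA = 0.08638;  p_BABB = 0.10819;  /* pair 1: B-A-A-A, B-A-B-B */
   p_BAAB = 0.08792;  p_BABA = 0.07921;  /* pair 2: B-A-A-B, B-A-B-A */

   /* Assumed weight of each pair: 2 * freq(hap 1) * freq(hap 2)     */
   w1 = 2 * p_BAAA * p_BABB;
   w2 = 2 * p_BAAB * p_BABA;

   /* Normalize so that the probabilities sum to 1                   */
   prob1 = w1 / (w1 + w2);
   prob2 = w2 / (w1 + w2);

   put prob1= 6.2 prob2= 6.2;
run;
```

The two values round to 0.57 and 0.43, matching the rows for individual 6 in Figure 8.3.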
