The HTSNP Procedure (Experimental)

Example

The following haplotypes from markers at the CTLA4 locus (Johnson et al. 2001) can be read into a SAS data set as follows:

data ctla4; 
   input (m1-m12)($) freq;
   datalines;
C T A A G C C A C C A G 0.333
T T A G G C C G C T G G 0.224
T C A G G C C G C T G G 0.058
T T A A G C C G C T G G 0.020
C T A A G T C A C C A G 0.080
C T A G G T C A C C A G 0.017
C T A G G C C A C C A G 0.045
T T A G G C C A C C A G 0.018
C T G G A C T A T C G A 0.086
C T G G A C C A T C G A 0.054
C T G G A C C A C C G A 0.021
;

You can now use PROC HTSNP to search for a subset of markers that explains most of the haplotype diversity in this sample. The following statements perform the search:

proc htsnp data=ctla4 size=5 method=im  
           cutoff=0.05 seed=244 conv=0.99;
   var m1-m12;
   freq freq;
run; 

The iterative maximization algorithm is selected as the search method with the METHOD=IM option. The SIZE=5 option indicates that only subsets containing exactly five SNPs are considered in the search. All haplotypes in the data set with a frequency below 0.05 are excluded from the search process because the CUTOFF=0.05 option was specified. The search continues until the convergence criterion of 0.99 is met as specified in the CONV= option. The iterative maximization algorithm randomly selects an initial set of markers, so using different seeds can produce different results.
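Because the starting subset is random, one way to see how sensitive the result is to the starting point is to rerun the same search with a different seed. The following is a minimal sketch: it reuses the options shown above, and the seed value 1001 is an arbitrary choice for illustration, not a recommended setting. The output is not shown here, since the selected subset can differ from run to run.

```sas
/* Same search as above; only the seed differs.          */
/* seed=1001 is an arbitrary, illustrative value.        */
proc htsnp data=ctla4 size=5 method=im
           cutoff=0.05 seed=1001 conv=0.99;
   var m1-m12;
   freq freq;
run;
```

If several seeds lead to the same set of markers, that suggests the reported subset does not depend on one particular starting point.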

The results from the procedure are shown in Figure 9.1 and Figure 9.2.

Figure 9.1: Marker Summary for PROC HTSNP

The HTSNP Procedure

Marker Summary
Locus   Allele   Frequency   Diversity
m1      C        0.6653      0.4454
        T        0.3347      0.4454
m2      C        0.0607      0.1140
        T        0.9393      0.1140
m3      A        0.8316      0.2801
        G        0.1684      0.2801
m4      A        0.4529      0.4956
        G        0.5471      0.4956
m5      A        0.1684      0.2801
        G        0.8316      0.2801
m6      C        0.8985      0.1823
        T        0.1015      0.1823
m7      C        0.9100      0.1637
        T        0.0900      0.1637
m8      A        0.6841      0.4322
        G        0.3159      0.4322
m9      C        0.8536      0.2500
        T        0.1464      0.2500
m10     C        0.6841      0.4322
        T        0.3159      0.4322
m11     A        0.5157      0.4995
        G        0.4843      0.4995
m12     A        0.1684      0.2801
        G        0.8316      0.2801


Figure 9.1 displays the summary of the marker loci for this sample. This includes the frequency of each allele and the gene diversity at each marker.
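The allele frequencies in Figure 9.1 can be cross-checked outside PROC HTSNP. The sketch below does this for marker m1 with PROC FREQ, weighting each haplotype by its frequency, and then computes one minus the sum of squared allele frequencies. That formula is an assumption about how the Diversity column is obtained; it is consistent with the values displayed for this sample, but the procedure's own computation is not shown in this example. The data set names m1_freq and m1_div and the variables p, sumsq, and diversity are illustrative.

```sas
/* Weighted allele counts for m1; PERCENT is normalized  */
/* by the total of the haplotype frequencies.            */
proc freq data=ctla4 noprint;
   tables m1 / out=m1_freq;
   weight freq;
run;

/* Assumed diversity formula: 1 - sum of squared allele  */
/* frequencies, accumulated over the alleles of m1.      */
data m1_div;
   set m1_freq end=last;
   p = percent / 100;
   sumsq + p*p;
   if last then do;
      diversity = 1 - sumsq;
      output;
   end;
   keep diversity;
run;

proc print data=m1_div;
run;
```

The haplotype frequencies in the input sum to 0.956 rather than 1, so the allele C frequency for m1 is 0.636 / 0.956 = 0.6653, and the resulting diversity is 1 - (0.6653**2 + 0.3347**2) = 0.4454, matching the first row of Figure 9.1.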

Figure 9.2: htSNP Evaluation

htSNP Evaluation
Rank HTSNP1 HTSNP2 HTSNP3 HTSNP4 HTSNP5 PDE
1 m2 m3 m6 m7 m8 1.0000


Figure 9.2 displays the ODS table containing the set of five SNPs that were selected as the htSNPs, together with the proportion of diversity explained (PDE) by that set, which is 1.0000 here. These five markers correspond to those selected by Johnson et al. (2001).