The following haplotypes from markers at the CTLA4 locus (Johnson et al., 2001) can be read into a SAS data set as follows:
data ctla4;
   input (m1-m12)($) freq;
   datalines;
C T A A G C C A C C A G 0.333
T T A G G C C G C T G G 0.224
T C A G G C C G C T G G 0.058
T T A A G C C G C T G G 0.020
C T A A G T C A C C A G 0.080
C T A G G T C A C C A G 0.017
C T A G G C C A C C A G 0.045
T T A G G C C A C C A G 0.018
C T G G A C T A T C G A 0.086
C T G G A C C A T C G A 0.054
C T G G A C C A C C G A 0.021
;
You can now use PROC HTSNP to search for a subset of markers that explains most of the haplotype diversity in this sample. The following statements perform the search:
proc htsnp data=ctla4 size=5 method=im cutoff=0.05 seed=244 conv=0.99;
   var m1-m12;
   freq freq;
run;
The iterative maximization algorithm is selected as the search method with the METHOD=IM option. The SIZE=5 option indicates that only subsets containing exactly five SNPs are considered in the search. All haplotypes in the data set with a frequency below 0.05 are excluded from the search process because the CUTOFF=0.05 option was specified. The search continues until the convergence criterion of 0.99 is met as specified in the CONV= option. The iterative maximization algorithm randomly selects an initial set of markers, so using different seeds can produce different results.
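To see which haplotypes the CUTOFF=0.05 option would retain, you can list them directly. This is only a sketch: it assumes that the cutoff is compared with the FREQ values exactly as entered in the ctla4 data set, which this section does not spell out, and it uses PROC PRINT, which is not part of the original example.

```sas
/* Sketch: list the haplotypes that CUTOFF=0.05 would keep,      */
/* assuming the cutoff is applied to freq as entered (unscaled). */
proc print data=ctla4 noobs;
   where freq >= 0.05;
   var m1-m12 freq;
run;
```

Under that assumption, six of the eleven haplotypes remain (frequencies 0.333, 0.224, 0.058, 0.080, 0.086, and 0.054).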
The results from the procedure are shown in Figure 8.1 and Figure 8.2.
Figure 8.1: Marker Summary for PROC HTSNP
Marker Summary

Locus | Allele | Frequency | Diversity |
---|---|---|---|
m1 | C | 0.6653 | 0.4454 |
m1 | T | 0.3347 | 0.4454 |
m2 | C | 0.0607 | 0.1140 |
m2 | T | 0.9393 | 0.1140 |
m3 | A | 0.8316 | 0.2801 |
m3 | G | 0.1684 | 0.2801 |
m4 | A | 0.4529 | 0.4956 |
m4 | G | 0.5471 | 0.4956 |
m5 | A | 0.1684 | 0.2801 |
m5 | G | 0.8316 | 0.2801 |
m6 | C | 0.8985 | 0.1823 |
m6 | T | 0.1015 | 0.1823 |
m7 | C | 0.9100 | 0.1637 |
m7 | T | 0.0900 | 0.1637 |
m8 | A | 0.6841 | 0.4322 |
m8 | G | 0.3159 | 0.4322 |
m9 | C | 0.8536 | 0.2500 |
m9 | T | 0.1464 | 0.2500 |
m10 | C | 0.6841 | 0.4322 |
m10 | T | 0.3159 | 0.4322 |
m11 | A | 0.5157 | 0.4995 |
m11 | G | 0.4843 | 0.4995 |
m12 | A | 0.1684 | 0.2801 |
m12 | G | 0.8316 | 0.2801 |
Figure 8.1 displays the summary of the marker loci for this sample. This includes the frequency of each allele and the gene diversity at each marker.
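The allele frequencies in Figure 8.1 can be cross-checked with a weighted one-way frequency table. The sketch below uses PROC FREQ, which is not part of the original example, and treats each haplotype frequency as a weight. The Percent column should then give each allele's share of the total haplotype frequency, that is, roughly 100 times the values in the Frequency column.

```sas
/* Sketch: weighted one-way tables for each marker.            */
/* WEIGHT freq makes each haplotype count in proportion to its */
/* frequency; NOCUM suppresses the cumulative columns.         */
proc freq data=ctla4;
   tables m1-m12 / nocum;
   weight freq;
run;
```

The Diversity column appears consistent with the usual gene diversity definition, one minus the sum of squared allele frequencies. For m1, for instance, 1 − 0.6653² − 0.3347² ≈ 0.4454.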
Figure 8.2: htSNP Evaluation
htSNP Evaluation

Rank | HTSNP1 | HTSNP2 | HTSNP3 | HTSNP4 | HTSNP5 | PDE |
---|---|---|---|---|---|---|
1 | m2 | m3 | m6 | m7 | m8 | 1.0000 |
Figure 8.2 displays the ODS table containing the set of five SNPs that were selected as the htSNPs; these five markers correspond to those selected by Johnson et al. (2001).
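A PDE of 1.0000 is consistent with the five selected markers distinguishing every haplotype that passes the cutoff. The following is a minimal check of that reading, not a reimplementation of the PDE calculation. It assumes the cutoff is applied to the FREQ values as entered, and it uses PROC SQL, which is not part of the original example.

```sas
/* Sketch: compare the number of retained haplotypes with the  */
/* number of distinct allele patterns at the five htSNPs.      */
/* CATS concatenates the five one-character alleles into a key. */
proc sql;
   select count(*) as n_haplotypes,
          count(distinct cats(m2, m3, m6, m7, m8)) as n_patterns
   from ctla4
   where freq >= 0.05;
quit;
```

If the two counts agree, no two retained haplotypes share the same alleles at m2, m3, m6, m7, and m8, so the five markers carry enough information to tell those haplotypes apart.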