The ALLELE Procedure

Example 3.1 Using the NDATA= Option with Microsatellites

The following is a subset of data from GAW12 (Wijsman et al. 2001) and contains 17 individuals’ genotypes at 14 microsatellite markers.

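data gaw;
   /* Each individual spans two data lines: the ID and alleles m1-m14 on   */
   /* the first line, then (after the / line pointer) alleles m15-m28.     */
   /* Consecutive variables (m1,m2), (m3,m4), ... are the two alleles of   */
   /* one marker.                                                          */
   input id m1-m14 / m15-m28;
   datalines;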
 1 11 14  6  8  2  5  9  4  6  1  9  9  9  7
    3  5 10  1  4  6  5  9  1  1  3  5  6  2   
 2  2 12  1  4  6  6  3  3  2  1 11 11  4 11  
    2  2 13 11  2  1  9  9  1  5  6  1  2  5    
 3  2 10  4  8  4  9  2  7  7  1  9  2  7 10  
    2  2  7  7  6  8  9  4  5  1  7  2  6  2   
 4  5 14  7  3  9 13  4  2  2  4 11  5  4  7  
    4  5  7  6  8  2  9  9  1  6  4  1  8  9         
 5 12 12  3  8  6  2  1  7  3  5  6 11  6  9  
    5  2 13 16  7  1  9  4  1  1  7  1  1  2   
 6  4  7  7  8  7 12  4  2  6  5  5 11  5 11  
    2  4 15 11  1  1  9  2  6  5  7  6  1  5    
 7  2 10  6  8  7  1  2  3  6  2  5  8  5  6  
    5  6 13 10  1  8  9  3  1  6  7  7  2  6    
 8  2 11  6  2  7  1  2  3  6  6 10 11 11  6  
    4  2 11 11  4  5 11  2  3  2  1  4  1  2     
 9  2  7  1  1  3  1  5  7  2  5  5 11 11 11  
    2  6 11  2  1  6  4  9  5  5  4  2  5  9     
10 11 12  2  4 13  3  1  2  4  9  5 10  7  5  
    4  4  1  6  8  1  6 10  1  1  2  5  1  1    
11 11  2  7  8  1  5  4  6  4  7  5 11 11  6  
    5  4 16 13  7  4  5  6  6  1  1  4  1  1     
12  2 12  6  8  2  7  3  2  7  5  2  8  9  6  
    2  4  7 16  7  1 10  9  5  1  1  4  9  1     
13 13 14  8  3 12 13  7  4  3  2  6 10  9  5  
    4  4  2 14  8  8  3  6  5  1  1  6  6  2    
14  7 10  6  5 10 13  8  3  5  5  9  9 11  6  
    5  4 13 14  1  1  6  9  2  1  5  3  1  2    
15 10 11  4  3  9  7  6  3  4  6 10  1  7  9  
    2  2  2 14  6  1  9  2  1  1  6  7  5  2    
16  2  5  2  7  7  2  2  9  2  2  2  6  9  5  
    2  2  7  1  1  2  6  2  1  1  1  1  9  6      
17 11  4  4  4  9  1  7  8  5  3  5  1 11  5  
    6  5  2 12  1  5  9  9  1  5  7  7  6  1    
; 
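
The wide layout above can be hard to read marker by marker. The following is a minimal sketch, not part of the original example, that reshapes the data into one row per individual and marker so that the pairing of allele columns is visible. The data set name gaw_long and the variables marker, allele1, and allele2 are hypothetical names chosen for illustration.

```sas
/* Sketch: one row per individual and marker (17 x 14 = 238 rows). */
/* gaw_long, marker, allele1, and allele2 are illustrative names.  */
data gaw_long;
   set gaw;
   array m{28} m1-m28;
   do marker = 1 to 14;
      allele1 = m{2*marker - 1};   /* first allele of this marker  */
      allele2 = m{2*marker};       /* second allele of this marker */
      output;
   end;
   keep id marker allele1 allele2;
run;

/* Show the 14 genotypes of the first individual */
proc print data=gaw_long noobs;
   where id = 1;
run;
```

PROC ALLELE itself reads the wide data set gaw directly; the reshaped copy is only an aid for inspecting the input.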

The actual marker names can be used in the output by creating a data set whose NAME variable contains these names, with one row per marker.

data map;
   input name $ location;
   datalines;
D22G001   0.50
D22G002   0.79
D22G003   0.88
D22G004   1.02
D22G005   1.24
D22G006   2.20
D22G007   4.27
D22G008   5.85
D22G009   6.70
D22G010   9.36
D22G011  10.87
D22G012  11.67
D22G013  12.66
D22G014  15.89
;
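
When marker names follow a regular pattern, as they do here, the NAME variable could also be generated rather than typed. The following is a sketch under that assumption; the data set name map_auto is hypothetical, and the LOCATION values are omitted because they cannot be derived from the pattern.

```sas
/* Sketch: build the names D22G001 ... D22G014 programmatically. */
/* map_auto is an illustrative name, not part of the example.    */
data map_auto;
   length name $ 8;
   do i = 1 to 14;
      name = cats('D22G', put(i, z3.));   /* z3. pads with leading zeros */
      output;
   end;
   drop i;
run;
```

Such a data set contains only the NAME variable, so it would stand in for map only where the marker locations are not needed.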

Now an analysis using PROC ALLELE can be performed as follows:

proc allele data=gaw ndata=map nofreq perms=10000 seed=456;
   var m1-m28;
run;

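This analysis produces summary statistics for the 14 markers and uses 10,000 permutations to approximate an exact $p$-value for the test of Hardy-Weinberg equilibrium (HWE). The SEED= option sets the random number seed for the permutations, so the run is reproducible. The allele and genotype frequency output tables are suppressed with the NOFREQ option.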
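
Because the exact $p$-value is approximated by permutation, its value depends on the seed. One informal way to gauge this Monte Carlo variation is to rerun the same analysis with a different seed, as sketched here; the value 123 is arbitrary and is not taken from the original example.

```sas
/* Sketch: same analysis, different (arbitrary) seed. */
proc allele data=gaw ndata=map nofreq perms=10000 seed=123;
   var m1-m28;
run;
```

Only the permutation-based exact probabilities would be expected to change between such runs; the other summary statistics do not involve random sampling.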

The results from the analysis are shown in Output 3.1.1. Note that the marker names from the NDATA= data set are used to identify each locus.

Output 3.1.1: Summary of Microsatellites for the ALLELE Procedure

The ALLELE Procedure

Marker Summary
                                                                        ------------ Test for HWE ------------
Locus    Number of  Number of  Polymorph     Heterozygosity  Allelic    Chi-Square  DF  Pr > ChiSq  Prob Exact
         Indiv      Alleles    Info Content                  Diversity
D22G001  17         9          0.8384        0.9412          0.8547     32.5172     36  0.6350      0.8581
D22G002  17         8          0.8296        0.8824          0.8478     28.5222     28  0.4370      0.3868
D22G003  17         11         0.8749        0.9412          0.8858     48.2139     55  0.7295      0.7050
D22G004  17         9          0.8259        0.9412          0.8443     24.9692     36  0.9166      0.8361
D22G005  17         8          0.8272        0.8235          0.8460     20.9416     28  0.8278      0.9413
D22G006  17         8          0.8257        0.8235          0.8443     32.0018     28  0.2744      0.1102
D22G007  17         7          0.8012        0.9412          0.8253     19.7625     21  0.5363      0.5745
D22G008  17         5          0.6665        0.6471          0.7163     11.4619     10  0.3227      0.2525
D22G009  17         11         0.8788        0.8824          0.8893     52.1333     55  0.5849      0.3866
D22G010  17         7          0.7572        0.8235          0.7820     14.7227     21  0.8366      0.8624
D22G011  17         8          0.7274        0.8235          0.7509     19.0400     28  0.8969      0.8898
D22G012  17         5          0.5661        0.6471          0.6142     17.3473     10  0.0670      0.5122
D22G013  17         7          0.7965        0.8235          0.8201     38.8062     21  0.0104      0.0390
D22G014  17         6          0.7507        0.8824          0.7837     17.2802     15  0.3024      0.4651
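
To make the summary columns concrete, the following sketch recomputes the first row (D22G001) directly from the data set gaw. It assumes these definitions, where p_i is the sample frequency of allele i and k is the number of alleles: allelic diversity is 1 - sum(p_i**2); polymorphism information content (PIC) is 1 - sum(p_i**2) - sum over i<j of 2*p_i**2*p_j**2; heterozygosity is the proportion of individuals with two different alleles; and the chi-square test has k(k-1)/2 degrees of freedom. The data set names alleles1, freq1, check1, and het1 are hypothetical. This is a cross-check of the reported numbers, not a description of how PROC ALLELE is implemented.

```sas
/* Sketch: cross-check the D22G001 row of the Marker Summary. */
/* alleles1, freq1, check1, and het1 are illustrative names.  */

/* Stack both alleles of marker 1 (m1, m2) into a single column */
data alleles1;
   set gaw;
   allele = m1; output;
   allele = m2; output;
   keep allele;
run;

/* One row per distinct allele, with its percentage in PERCENT */
proc freq data=alleles1 noprint;
   tables allele / out=freq1;
run;

/* Accumulate the sums of p**2 and p**4 over the alleles */
data check1;
   set freq1 end=last;
   p = percent / 100;
   n_alleles + 1;
   sum_p2 + p**2;
   sum_p4 + p**4;
   if last then do;
      allelic_diversity = 1 - sum_p2;
      /* sum over i<j of 2*p_i**2*p_j**2 equals sum_p2**2 - sum_p4 */
      pic = 1 - sum_p2 - (sum_p2**2 - sum_p4);
      df  = n_alleles * (n_alleles - 1) / 2;
      output;
   end;
   keep n_alleles allelic_diversity pic df;
run;

/* Proportion of individuals whose two alleles differ */
data het1;
   set gaw end=last;
   n_het + (m1 ne m2);
   if last then do;
      heterozygosity = n_het / _n_;
      output;
   end;
   keep heterozygosity;
run;

proc print data=check1 noobs; format allelic_diversity pic 6.4; run;
proc print data=het1 noobs; format heterozygosity 6.4; run;
```

For this data set, marker 1 has 9 distinct alleles among its 34 allele values and 16 of the 17 individuals are heterozygous, so the sketch should give an allelic diversity of 0.8547, a PIC of 0.8384, 36 degrees of freedom, and a heterozygosity of 0.9412, in agreement with the D22G001 row. Changing m1 and m2 to another pair, such as m15 and m16 for D22G008, checks the other rows in the same way.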