The ALLELE Procedure

Example 3.1 Using the NDATA= Option with Microsatellites

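The following data are a subset of the GAW12 data (Wijsman et al. 2001) and contain the genotypes of 17 individuals at 14 microsatellite markers. Each individual occupies two input lines: the ID and the first 14 alleles on the first line, and the remaining 14 alleles on the second. Each consecutive pair of allele variables (m1 and m2, m3 and m4, and so on) forms the genotype at one marker.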

data gaw;
   input id m1-m14 / m15-m28;
   datalines;
 1 11 14  6  8  2  5  9  4  6  1  9  9  9  7
    3  5 10  1  4  6  5  9  1  1  3  5  6  2   
 2  2 12  1  4  6  6  3  3  2  1 11 11  4 11  
    2  2 13 11  2  1  9  9  1  5  6  1  2  5    
 3  2 10  4  8  4  9  2  7  7  1  9  2  7 10  
    2  2  7  7  6  8  9  4  5  1  7  2  6  2   
 4  5 14  7  3  9 13  4  2  2  4 11  5  4  7  
    4  5  7  6  8  2  9  9  1  6  4  1  8  9         
 5 12 12  3  8  6  2  1  7  3  5  6 11  6  9  
    5  2 13 16  7  1  9  4  1  1  7  1  1  2   
 6  4  7  7  8  7 12  4  2  6  5  5 11  5 11  
    2  4 15 11  1  1  9  2  6  5  7  6  1  5    
 7  2 10  6  8  7  1  2  3  6  2  5  8  5  6  
    5  6 13 10  1  8  9  3  1  6  7  7  2  6    
 8  2 11  6  2  7  1  2  3  6  6 10 11 11  6  
    4  2 11 11  4  5 11  2  3  2  1  4  1  2     
 9  2  7  1  1  3  1  5  7  2  5  5 11 11 11  
    2  6 11  2  1  6  4  9  5  5  4  2  5  9     
10 11 12  2  4 13  3  1  2  4  9  5 10  7  5  
    4  4  1  6  8  1  6 10  1  1  2  5  1  1    
11 11  2  7  8  1  5  4  6  4  7  5 11 11  6  
    5  4 16 13  7  4  5  6  6  1  1  4  1  1     
12  2 12  6  8  2  7  3  2  7  5  2  8  9  6  
    2  4  7 16  7  1 10  9  5  1  1  4  9  1     
13 13 14  8  3 12 13  7  4  3  2  6 10  9  5  
    4  4  2 14  8  8  3  6  5  1  1  6  6  2    
14  7 10  6  5 10 13  8  3  5  5  9  9 11  6  
    5  4 13 14  1  1  6  9  2  1  5  3  1  2    
15 10 11  4  3  9  7  6  3  4  6 10  1  7  9  
    2  2  2 14  6  1  9  2  1  1  6  7  5  2    
16  2  5  2  7  7  2  2  9  2  2  2  6  9  5  
    2  2  7  1  1  2  6  2  1  1  1  1  9  6      
17 11  4  4  4  9  1  7  8  5  3  5  1 11  5  
    6  5  2 12  1  5  9  9  1  5  7  7  6  1    
;
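
As a quick check on the two-line record layout, the following sketch prints the first two individuals with a few of the allele variables. The choice of PROC PRINT and of these particular variables is only illustrative and is not part of the original example.

```sas
/* Illustrative check (not part of the original example):          */
/* the "/" in the INPUT statement advances to the next data line,  */
/* so m15-m28 should come from each individual's second line.      */
proc print data=gaw(obs=2) noobs;
   var id m1 m2 m15 m16;
run;
```

If the data were read as intended, individual 1 should show m1=11, m2=14, m15=3, and m16=5.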

The actual marker names can be used in the output by creating a data set, with one row per marker, in which the variable NAME contains these names.

data map;
   input name $ location;
   datalines;
D22G001   0.50
D22G002   0.79
D22G003   0.88
D22G004   1.02
D22G005   1.24
D22G006   2.20
D22G007   4.27
D22G008   5.85
D22G009   6.70
D22G010   9.36
D22G011  10.87
D22G012  11.67
D22G013  12.66
D22G014  15.89
;

Now an analysis using PROC ALLELE can be performed as follows:

proc allele data=gaw ndata=map nofreq perms=10000 seed=456;
   var m1-m28;
run;

This analysis produces summary statistics for the 14 markers and uses 10,000 permutations to approximate an exact p-value for the HWE test. The NOFREQ option suppresses the allele and genotype frequency output tables.
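
For comparison, a minimal sketch of the same analysis without the NDATA= option follows. It assumes that, when NDATA= is omitted, PROC ALLELE falls back on its PREFIX= naming (believed to default to M), so the loci would be labeled M1 through M14 rather than D22G001 through D22G014.

```sas
/* Sketch only: same analysis without NDATA=, so the marker names  */
/* in MAP are not used. Assumption: loci are then labeled with the */
/* default prefix (M1, M2, ..., M14).                              */
proc allele data=gaw nofreq perms=10000 seed=456;
   var m1-m28;
run;
```

The summary statistics should not change; only the labels in the Locus column would differ.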

The results of the analysis are shown in Output 3.1.1. Note that the loci are labeled with the marker names from the MAP data set.

Output 3.1.1: Summary of Microsatellites for the ALLELE Procedure

The ALLELE Procedure

Marker Summary
Locus	Number of Indiv	Number of Alleles	Polymorph Info Content	Heterozygosity	Allelic Diversity	HWE Chi-Square	HWE DF	HWE Pr > ChiSq	HWE Prob Exact
D22G001	17	9	0.8384	0.9412	0.8547	32.5172	36	0.6350	0.8581
D22G002	17	8	0.8296	0.8824	0.8478	28.5222	28	0.4370	0.3868
D22G003	17	11	0.8749	0.9412	0.8858	48.2139	55	0.7295	0.7050
D22G004	17	9	0.8259	0.9412	0.8443	24.9692	36	0.9166	0.8361
D22G005	17	8	0.8272	0.8235	0.8460	20.9416	28	0.8278	0.9413
D22G006	17	8	0.8257	0.8235	0.8443	32.0018	28	0.2744	0.1102
D22G007	17	7	0.8012	0.9412	0.8253	19.7625	21	0.5363	0.5745
D22G008	17	5	0.6665	0.6471	0.7163	11.4619	10	0.3227	0.2525
D22G009	17	11	0.8788	0.8824	0.8893	52.1333	55	0.5849	0.3866
D22G010	17	7	0.7572	0.8235	0.7820	14.7227	21	0.8366	0.8624
D22G011	17	8	0.7274	0.8235	0.7509	19.0400	28	0.8969	0.8898
D22G012	17	5	0.5661	0.6471	0.6142	17.3473	10	0.0670	0.5122
D22G013	17	7	0.7965	0.8235	0.8201	38.8062	21	0.0104	0.0390
D22G014	17	6	0.7507	0.8824	0.7837	17.2802	15	0.3024	0.4651
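
The D22G001 row can be cross-checked outside PROC ALLELE. The sketch below assumes the standard definitions: heterozygosity as the observed proportion of heterozygous individuals, allelic diversity as 1 minus the sum of squared allele frequencies, PIC as allelic diversity minus the sum over allele pairs u < v of 2 p_u^2 p_v^2, and k(k-1)/2 degrees of freedom for a marker with k alleles. The data set names ALLELES1 and FREQ1 and the output column names are made up for illustration.

```sas
/* Stack both alleles of the first marker (m1, m2) into one column */
data alleles1;
   set gaw;
   allele = m1; output;
   allele = m2; output;
   keep id allele;
run;

proc sql;
   /* Observed heterozygosity: share of individuals with m1 ne m2 */
   select count(*)       as n_indiv,
          mean(m1 ne m2) as heterozygosity format=6.4
   from gaw;

   /* Allele frequencies: count / total number of allele copies */
   create table freq1 as
   select allele,
          count(*) / (select count(*) from alleles1) as p
   from alleles1
   group by allele;

   /* sum(p**2)**2 - sum(p**4) equals the sum of 2*p_u**2*p_v**2 over u < v */
   select count(*)                                   as n_alleles,
          1 - sum(p**2)                              as allelic_diversity format=6.4,
          1 - sum(p**2) - (sum(p**2)**2 - sum(p**4)) as pic format=6.4,
          count(*) * (count(*) - 1) / 2              as hwe_df
   from freq1;
quit;
```

Under these assumptions the queries should reproduce the D22G001 row: 17 individuals, 9 alleles, heterozygosity 0.9412, allelic diversity 0.8547, PIC 0.8384, and 36 degrees of freedom. The same check applies to any other marker by substituting its pair of allele variables.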