A marker locus $M$ can have a series of alleles $M_u$, $u = 1, \ldots, k$. A sample of $n$ individuals can therefore have several different genotypes at the locus, with $n_{uv}$ copies of type $M_u/M_v$. The number $n_u$ of copies of allele $M_u$ can be found directly by summation: $n_u = 2n_{uu} + \sum_{v \neq u} n_{uv}$. The sample frequencies are written as $\tilde{p}_u = n_u/(2n)$ and $\tilde{P}_{uv} = n_{uv}/n$. The $\tilde{P}_{uv}$'s are unbiased maximum likelihood estimates (MLEs) of the population proportions $P_{uv}$.
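The counting step can be illustrated with a short Python sketch. It is not part of PROC ALLELE; the function name `allele_and_genotype_freqs` and the toy sample are hypothetical, and genotypes are assumed to be given as unordered pairs of allele labels, one pair per individual.

```python
from collections import Counter

def allele_and_genotype_freqs(genotypes):
    """Return sample allele and genotype frequencies.

    genotypes: list of (allele, allele) pairs, one per individual (assumed input format).
    """
    n = len(genotypes)
    # n_uv: number of individuals with unordered genotype M_u/M_v
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    # n_u = 2*n_uu + sum over v != u of n_uv (a homozygote contributes two copies)
    allele_counts = Counter()
    for (u, v), count in geno_counts.items():
        allele_counts[u] += count
        allele_counts[v] += count
    p = {u: c / (2 * n) for u, c in allele_counts.items()}   # allele frequencies n_u / (2n)
    P = {g: c / n for g, c in geno_counts.items()}            # genotype frequencies n_uv / n
    return p, P

# Toy sample: 4 A/A, 4 A/B, and 2 B/B individuals (n = 10)
sample = [("A", "A")] * 4 + [("A", "B")] * 4 + [("B", "B")] * 2
p, P = allele_and_genotype_freqs(sample)
print(p, P)
```

In this toy sample, allele A has 2(4) + 4 = 12 copies out of 20, so its sample frequency is 0.6.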
The variance of the sample allele frequency $\tilde{p}_u$ is calculated as

$$\mathrm{Var}(\tilde{p}_u) = \frac{1}{2n}\left(p_u + P_{uu} - 2p_u^2\right)$$

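and can be estimated by replacing $p_u$ and $P_{uu}$ with their sample values $\tilde{p}_u$ and $\tilde{P}_{uu}$. The variance of the sample genotype frequency is not generally calculated; instead, an MLE of the HWD coefficient $d_{uv}$ for alleles $M_u$ and $M_v$ is calculated as

$$\hat{d}_{uv} = \begin{cases} \tilde{P}_{uu} - \tilde{p}_u^2, & u = v \\ \tilde{p}_u \tilde{p}_v - \tfrac{1}{2}\tilde{P}_{uv}, & u \neq v \end{cases}$$
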
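and the MLE's variance is estimated using one of the following formulas, depending on whether the two alleles are the same or different:

$$\mathrm{Var}(\hat{d}_{uu}) = \frac{1}{n}\left[\tilde{p}_u^2(1-\tilde{p}_u)^2 + (1-2\tilde{p}_u)^2\hat{d}_{uu} - \hat{d}_{uu}^2\right]$$

$$\mathrm{Var}(\hat{d}_{uv}) = \frac{1}{2n}\left\{\tilde{p}_u\tilde{p}_v\left[(1-\tilde{p}_u)(1-\tilde{p}_v) + \tilde{p}_u\tilde{p}_v\right] + \sum_{w \neq u,v}\left(\tilde{p}_u^2\hat{d}_{vw} + \tilde{p}_v^2\hat{d}_{uw}\right) - \left[(1-\tilde{p}_u-\tilde{p}_v)^2 - 2(\tilde{p}_u-\tilde{p}_v)^2\right]\hat{d}_{uv} - 2\hat{d}_{uv}^2\right\}$$
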
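A minimal Python sketch of these calculations follows. It is an illustration under stated assumptions rather than the procedure's implementation. The sign convention is assumed to be $d_{uu} = P_{uu} - p_u^2$ and $d_{uv} = p_u p_v - P_{uv}/2$, and the variances are assumed to be the large-sample (delta-method) forms. The function name `hwd_estimates` and the input dictionaries are hypothetical.

```python
def hwd_estimates(p, P, n):
    """Estimate allele-frequency variances, HWD coefficients, and their variances.

    p: {allele: sample allele frequency}
    P: {(u, v): sample genotype frequency}, keys stored with u <= v (assumed layout)
    n: number of individuals
    """
    alleles = sorted(p)

    def geno(u, v):
        # Genotype frequency for the unordered pair; absent genotypes count as 0
        return P.get((u, v) if u <= v else (v, u), 0.0)

    def d(u, v):
        # Assumed convention: P_uu - p_u^2 for u == v, p_u p_v - P_uv / 2 otherwise
        if u == v:
            return geno(u, u) - p[u] ** 2
        return p[u] * p[v] - 0.5 * geno(u, v)

    # Var(p_u) = (p_u + P_uu - 2 p_u^2) / (2n), with sample values plugged in
    var_p = {u: (p[u] + geno(u, u) - 2 * p[u] ** 2) / (2 * n) for u in alleles}

    d_hat, var_d = {}, {}
    for i, u in enumerate(alleles):
        for v in alleles[i:]:
            duv = d(u, v)
            if u == v:
                var = (p[u] ** 2 * (1 - p[u]) ** 2
                       + (1 - 2 * p[u]) ** 2 * duv - duv ** 2) / n
            else:
                # Sum over the remaining alleles w, excluding u and v
                rest = sum(p[u] ** 2 * d(v, w) + p[v] ** 2 * d(u, w)
                           for w in alleles if w != u and w != v)
                var = (p[u] * p[v] * ((1 - p[u]) * (1 - p[v]) + p[u] * p[v]) + rest
                       - ((1 - p[u] - p[v]) ** 2 - 2 * (p[u] - p[v]) ** 2) * duv
                       - 2 * duv ** 2) / (2 * n)
            d_hat[(u, v)] = duv
            var_d[(u, v)] = var
    return var_p, d_hat, var_d

# Toy two-allele input: 4 A/A, 4 A/B, 2 B/B individuals (n = 10)
p = {"A": 0.6, "B": 0.4}
P = {("A", "A"): 0.4, ("A", "B"): 0.4, ("B", "B"): 0.2}
var_p, d_hat, var_d = hwd_estimates(p, P, n=10)
print(var_p, d_hat, var_d)
```

With only two alleles, the three coefficients coincide, so the same-allele and different-allele variance formulas should return the same value. This gives a quick consistency check on the sketch.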
The standard error, the square root of the variance, is reported for the sample allele frequencies and the disequilibrium coefficient estimates. When the BOOTSTRAP= option of the PROC ALLELE statement is specified, bootstrap confidence intervals are formed by resampling individuals from the data set and are reported for these estimates, with the $100(1-\alpha)\%$ confidence level given by the ALPHA= option (or $\alpha = 0.05$ by default).
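The resampling idea can be sketched in Python as follows. The text does not say how PROC ALLELE turns the bootstrap replicates into interval endpoints, so this sketch assumes a simple percentile interval. The names `allele_freq` and `bootstrap_ci`, along with the `n_boot` and `seed` parameters, are hypothetical and do not correspond to the procedure's options.

```python
import random

def allele_freq(genotypes, allele):
    """Sample frequency of one allele: copies of the allele divided by 2n."""
    return sum(g.count(allele) for g in genotypes) / (2 * len(genotypes))

def bootstrap_ci(genotypes, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (assumed method), resampling whole individuals."""
    rng = random.Random(seed)
    n = len(genotypes)
    replicates = []
    for _ in range(n_boot):
        # Draw n individuals with replacement, keeping each genotype pair intact
        resample = [genotypes[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    replicates.sort()
    k = int(n_boot * alpha / 2)          # replicates trimmed from each tail
    return replicates[k], replicates[n_boot - 1 - k]

# Toy sample: 4 A/A, 4 A/B, and 2 B/B individuals
sample = [("A", "A")] * 4 + [("A", "B")] * 4 + [("B", "B")] * 2
lo, hi = bootstrap_ci(sample, lambda g: allele_freq(g, "A"))
print(lo, hi)
```

Resampling individuals rather than single alleles keeps each genotype intact, so any departure from Hardy-Weinberg proportions in the data carries over to every replicate.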