The ALLELE Procedure

Testing for Hardy-Weinberg Equilibrium

Under ideal population conditions, the two alleles an individual receives, one from each parent, are independent, so that $P_{uu} = p_ u^2$ and $P_{uv} = 2p_ up_ v, u\neq v$. The factor of 2 for heterozygotes reflects the fact that $M_ u/M_ v$ and $M_ v/M_ u$ genotypes are generally indistinguishable. This statement about allelic independence within loci is called Hardy-Weinberg equilibrium (HWE). Forces such as selection, mutation, and migration in a population, or nonrandom mating, can cause departures from HWE. Two methods are used here for testing a marker for HWE, and both can accommodate any number of alleles. Both test the hypothesis that $P_{uu}= p_ u^2$ and $P_{uv} = 2p_ up_ v, u\neq v$ for all $u,v=1,\ldots ,k$.

Chi-Square Goodness-of-Fit Test

The chi-square goodness-of-fit test can be used to test markers for HWE. The chi-square statistic

\[  X^2_ T = \sum _ u \frac{(n_{uu} - n\tilde{p}_ u^2)^2}{n\tilde{p}_ u^2} + \sum _ u \sum _{v > u} \frac{(n_{uv}-2n\tilde{p}_ u\tilde{p}_ v)^2}{2n\tilde{p}_ u\tilde{p}_ v}  \]

has $k(k-1)/2$ degrees of freedom (df), where $k$ is the number of alleles at the marker locus.
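As an illustration only, the statistic can be sketched in Python. This is not the procedure's implementation: the function name `hwe_chi_square` and the input layout (a list of two-allele tuples, one per individual) are assumptions made for the example, and $\tilde{p}_ u$ is taken to be the sample frequency of allele $M_ u$ among the $2n$ observed alleles.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotypes):
    """Return (X2, df) for a list of genotypes such as [("A", "a"), ...].

    Hypothetical helper for illustration; not part of PROC ALLELE.
    """
    n = len(genotypes)
    # Sort the alleles within each genotype so that A/a and a/A are pooled.
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    # Sample allele frequencies computed from the 2n observed alleles.
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    alleles = sorted(freqs)

    x2 = 0.0
    # Visit each homozygote (u == v) and each heterozygote (u < v) once.
    for u, v in combinations_with_replacement(alleles, 2):
        if u == v:
            expected = n * freqs[u] ** 2
        else:
            expected = 2 * n * freqs[u] * freqs[v]
        observed = geno_counts.get((u, v), 0)
        x2 += (observed - expected) ** 2 / expected

    k = len(alleles)
    return x2, k * (k - 1) // 2
```

For 30 A/A, 40 A/a, and 30 a/a individuals, both allele frequencies are 0.5, the expected counts are 25, 50, and 25, and the statistic is $1 + 2 + 1 = 4$ with 1 df.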

Permutation Version of Exact Test

The permutation version of the exact test given by Guo and Thompson (1992) is based on the conditional probability of genotype counts given allelic counts and the hypothesis of allelic independence. The probability of the observed genotype counts under this hypothesis is

\[  T = \frac{n!}{(2n)!} \frac{2^ h\prod _ u n_ u!}{\prod _{u,v}n_{uv}!}  \]

where $h=\sum _ u \sum _{v > u}n_{uv}$ is the number of heterozygous individuals. Significance levels are calculated by a Monte Carlo permutation procedure. The $2n$ alleles are randomly permuted the number of times indicated in the PERMS= option to form new sets of $n$ genotypes. The significance level is then calculated as the proportion of permuted data sets for which the value of $T$ does not exceed the value of $T$ for the actual data. You can specify the random seed used to permute the data with the SEED= option of the PROC ALLELE statement.
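The permutation procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure's actual code: the function names `log_prob` and `hwe_permutation_test` are hypothetical, the `perms` and `seed` arguments only mimic the roles of the PERMS= and SEED= options, and heterozygotes are counted once per unordered genotype. The probability $T$ is evaluated on the log scale to avoid overflow in the factorials.

```python
import random
from collections import Counter
from math import lgamma, log


def log_prob(genotypes):
    """Log of T, the probability of the genotype counts given the allele counts."""
    n = len(genotypes)
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    # h: number of heterozygous individuals.
    h = sum(c for (u, v), c in geno_counts.items() if u != v)
    # lgamma(x + 1) equals log(x!).
    log_t = lgamma(n + 1) - lgamma(2 * n + 1) + h * log(2)
    log_t += sum(lgamma(c + 1) for c in allele_counts.values())
    log_t -= sum(lgamma(c + 1) for c in geno_counts.values())
    return log_t


def hwe_permutation_test(genotypes, perms=10000, seed=None):
    """Estimate the significance level by permuting the 2n alleles."""
    rng = random.Random(seed)
    observed = log_prob(genotypes)
    alleles = [a for g in genotypes for a in g]
    hits = 0
    for _ in range(perms):
        rng.shuffle(alleles)
        # Pair consecutive alleles to form n new genotypes.
        permuted = list(zip(alleles[0::2], alleles[1::2]))
        # Small tolerance so that floating-point ties count as "does not exceed".
        if log_prob(permuted) <= observed + 1e-9:
            hits += 1
    return hits / perms
```

With two individuals carrying two copies each of alleles A and a, the only possible samples are {A/A, a/a} with $T = 1/3$ and {A/a, A/a} with $T = 2/3$, so the estimated significance level for the first sample should be close to 1/3 and that for the second is exactly 1.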