The ALLELE Procedure

Population Structure

The genetic structure of populations can be characterized by Wright’s (1951) $F$ statistics, which measure the degree of relatedness between different types of allele pairs. Cockerham (1969, 1973) defines these same quantities in an analysis-of-variance (ANOVA) framework. For a population hierarchy defined by the variable in the POP statement, these measures include $\theta _ P$ and, when Hardy-Weinberg equilibrium (HWE) is not assumed, $F$ and $f$, corresponding to Wright’s $F_{ST}$, $F_{IT}$, and $F_{IS}$, respectively. A weighted average of these measures over loci can be reported as an overall measure, and measures for individual loci can be requested as well. The parameters are estimated by applying the method of moments to the ANOVA mean squares.

For genotypic data with unknown phase from $r$ populations, variation can be partitioned into three sources: between populations, between individuals within populations, and within individuals, with respective observed mean squares $MSP$, $MSI$, and $MSG$. Using the method of moments to equate estimates of the variance components with functions of the observed mean squares, the coancestry coefficients can be estimated as follows:

\begin{eqnarray*}  \hat{F} &  = &  1 - \frac{2n_ c MSG}{MSP + (n_ c-1)MSI + n_ c MSG}\\ \hat{\theta }_ P &  = &  \frac{MSP-MSI}{MSP + (n_ c-1)MSI + n_ c MSG} \\ \hat{f} &  = &  \frac{\hat{F}-\hat{\theta }_ P}{1-\hat{\theta }_ P} \end{eqnarray*}

where $n_ c=\frac{1}{r-1}(\sum _{i=1}^ r n_ i - \frac{\sum _ i n_ i^2}{\sum _ i n_ i})$ and $n_ i$ is the number of individuals sampled from population $i$, $i=1,\ldots ,r$.
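As a minimal sketch of the estimators above, the following Python computes $n_ c$ from the population sample sizes and then $\hat{F}$, $\hat{\theta }_ P$, and $\hat{f}$ from the three mean squares. The function names and the numeric inputs are hypothetical and chosen only for illustration. The sketch computes $\hat{f}$ from the identity $(1-F) = (1-f)(1-\theta _ P)$, and it does not show how the procedure obtains the mean squares from the data.

```python
def n_c(sizes):
    """Sample-size term n_c for r populations with sizes n_1, ..., n_r."""
    r = len(sizes)
    total = sum(sizes)
    return (total - sum(n * n for n in sizes) / total) / (r - 1)


def f_statistics(msp, msi, msg, nc):
    """Method-of-moments estimates (F, theta_P, f) from the mean squares."""
    denom = msp + (nc - 1) * msi + nc * msg
    F = 1 - 2 * nc * msg / denom
    theta = (msp - msi) / denom
    # f follows from the identity (1 - F) = (1 - f)(1 - theta_P).
    f = (F - theta) / (1 - theta)
    return F, theta, f


# Hypothetical sample sizes and mean squares, for illustration only.
nc = n_c([20, 30, 50])
F, theta, f = f_statistics(msp=1.2, msi=0.30, msg=0.20, nc=nc)
print(nc, F, theta, f)
```

When all populations have the same sample size $n$, `n_c` returns $n$ itself, which is a quick sanity check on the formula.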

If HWE is assumed in a two-level population hierarchy, the data can be treated as haploid data where allele, not genotype, frequencies are used in the calculations. Also, in this scenario, $\theta _ P$ and $F$ are equal and $f=0$. Thus, there is only one parameter to estimate, $\theta _ P$, which represents the covariance of alleles from the same population relative to the covariance between alleles from different populations, estimated as follows:

\begin{eqnarray*}  \hat{\theta }_ P &  = &  \frac{MSP-MSG}{MSP + (n_ c-1) MSG} \end{eqnarray*}

where the counts used in $n_ c$ are now in terms of alleles instead of individuals.
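The haploid estimator can be sketched in a few lines of Python. The function name is hypothetical, and the mean squares passed in are made-up values used only to show the two boundary cases: no excess variation between populations ($MSP = MSG$) and no variation within populations ($MSG = 0$).

```python
def theta_p_haploid(msp, msg, nc):
    """theta_P estimate under HWE, with n_c based on allele counts."""
    return (msp - msg) / (msp + (nc - 1) * msg)


# Hypothetical mean squares, for illustration only.
no_structure = theta_p_haploid(msp=0.25, msg=0.25, nc=40)
fixed_differences = theta_p_haploid(msp=5.0, msg=0.0, nc=40)
print(no_structure, fixed_differences)
```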

Tests of hypotheses that these parameters are 0 can be executed via permutation tests. A different permutation scheme is used for each parameter under each population structure scenario. The schemes displayed in Table 2.2 “Permutation Schemes for Population Structure Parameters” are derived from Excoffier and Lischer (2011).

Table 2.2: Permutation Schemes for Population Structure Parameters

| Parameter | $f$ = 0 ? | Permutation Scheme |
| --- | --- | --- |
| $\theta _ P$ | Yes | Individuals among populations |
| $\theta _ P$ | No | Individuals among populations |
| $F$ | No | Alleles among populations |
| $f$ | No | Alleles within populations |
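A permutation test of $\theta _ P = 0$ can be sketched as follows for the haploid (HWE) case. This is an illustration under stated assumptions, not the procedure's implementation: it handles a single biallelic locus, codes each sampled unit as a 0/1 allele indicator, obtains $MSP$ and $MSG$ from a one-way ANOVA on that indicator, uses a one-sided comparison, and computes the p-value as (hits + 1) / (permutations + 1). All function names and the example data are hypothetical.

```python
import random


def theta_p(pops):
    """theta_P for one biallelic locus; pops holds lists of 0/1 indicators."""
    r = len(pops)
    sizes = [len(p) for p in pops]
    total = sum(sizes)
    p_bar = sum(sum(p) for p in pops) / total
    freqs = [sum(p) / len(p) for p in pops]
    # One-way ANOVA mean squares on the allele indicator (assumed form).
    msp = sum(n * (q - p_bar) ** 2 for n, q in zip(sizes, freqs)) / (r - 1)
    msg = sum(n * q * (1 - q) for n, q in zip(sizes, freqs)) / (total - r)
    nc = (total - sum(n * n for n in sizes) / total) / (r - 1)
    denom = msp + (nc - 1) * msg
    return (msp - msg) / denom if denom > 0 else 0.0


def permutation_test(pops, n_perm=999, seed=0):
    """Shuffle sampled units among populations, keeping the sample sizes."""
    rng = random.Random(seed)
    observed = theta_p(pops)
    pooled = [a for p in pops for a in p]
    sizes = [len(p) for p in pops]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for n in sizes:
            shuffled.append(pooled[start:start + n])
            start += n
        if theta_p(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: two populations with very different allele frequencies.
pops = [[1] * 18 + [0] * 2, [1] * 3 + [0] * 17]
observed, p_value = permutation_test(pops)
print(observed, p_value)
```

The other schemes in the table differ in what is shuffled (whole individuals or single alleles) and where (among or within populations). They would need genotype-level data and the three-mean-square estimators, which this sketch does not cover.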