The BTL Procedure (Experimental)

Statistical Computations

The model specified using the MARKER, MODEL, RANDOM, and REPEATED statements is estimated using mixed model theory, and the resulting model statistics are printed in the Model Statistics table. For more details about these calculations, see the Mixed Models Theory section in The MIXED Procedure chapter in the SAS/STAT User's Guide.

If the PARMEST statement is specified, a BTL model is fit to the input data. This section describes the formulation of the BTL model and the procedure for estimating the model parameters for a given data set. It is adapted from Coffman et al. (2005).

PROC BTL fits a probability model for multiple binary trait loci (Simonsen, 2004) to the input data. The assumed genetic map contains alternating markers ($M_i$) and binary trait loci ($G_i$), with at least one marker associated with each binary trait locus: $M_1 G_1 M_2 G_2 \cdots M_k G_k$. One allele is fixed in backcross populations, so there are $K=2^k$ unique marker classes or BTL $k$-locus genotypes. In an $F_n$ population with phase unknown, each locus has three possible genotypes, giving a total of $K=3^{k}$ genotypes across the $k$ markers or BTL. The recombination rate between two loci is the probability that a crossover occurs between the loci, ranging from 0 (complete linkage) to 0.5 (no linkage). This rate is denoted by $r_i$ for the loci $G_i$ and $M_i$, $i=1,\ldots,k$, and by $\theta_i$ for the adjacent markers $M_i$ and $M_{i+1}$, $i=1,\ldots,k-1$, so there are $k-1$ marker recombination parameters. Each penetrance parameter, $p_m$, is the probability that a binary trait is present for the $m$th BTL genotype (McIntyre, Coffman, and Doerge, 2001). Similarly, $\pi_m$ denotes the penetrance parameter for the $m$th marker genotype.
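The class counts described above can be checked with a short enumeration. The following Python sketch is illustrative only and is not part of PROC BTL; the function names and the genotype labels (`"Aa"`, `"aa"`, `"AA"`) are hypothetical choices made for this example.

```python
from itertools import product


def genotype_classes(k, population="backcross"):
    """Enumerate the k-locus genotype classes for k markers (or k BTL).

    Assumption for illustration: a backcross has two possible genotypes
    per locus and an F_n population with phase unknown has three.
    The labels below are hypothetical, not PROC BTL output values.
    """
    if population == "backcross":
        states = ("Aa", "aa")
    elif population == "fn":
        states = ("AA", "Aa", "aa")
    else:
        raise ValueError("population must be 'backcross' or 'fn'")
    # Each class is one choice of genotype at each of the k loci.
    return list(product(states, repeat=k))


def recombination_parameter_counts(k):
    """Count recombination parameters for the map M1 G1 ... Mk Gk."""
    # One r_i per marker/BTL pair, one theta_i per adjacent marker pair.
    return {"r": k, "theta": k - 1}


if __name__ == "__main__":
    print(len(genotype_classes(3, "backcross")))  # K = 2**3
    print(len(genotype_classes(2, "fn")))         # K = 3**2
    print(recombination_parameter_counts(3))
```

Because $K$ grows exponentially in $k$, enumerating the classes explicitly is practical only for a small number of loci.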

The joint probabilities of the BTL genotypes ($G$), the marker classes ($M$), and the trait ($Y$) can be expressed in matrix form in terms of $\mathbf{r}$, $\mathbf{\theta}$, and $\mathbf{p}$, assuming no selection, interference, or mutation, as shown by Simonsen (2004). These probabilities provide a likelihood equation for $\mathbf{r}$, $\mathbf{\theta}$, and $\mathbf{p}$. From this likelihood, the maximum likelihood estimate (MLE) for $\pi_m$, $\hat{\pi}_m$, is given by the observed binomial proportion of individuals with marker genotype $m$ in whom the trait is present. The invariance property of MLEs (Casella and Berger, 1990) can be applied to obtain the MLE of penetrance parameters $\mathbf{p}$ as the product of $\mathbf{\hat{\pi}}$ and a function of the recombination rates $\mathbf{r}$. By entering a known set of $\mathbf{r}$ or performing a grid search over a range of possible values of $\mathbf{r}$, unique estimates of penetrance parameters $\mathbf{p}$ can be computed.
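As a rough illustration of these steps, the following Python sketch works through the simplest case: a backcross with one marker and one BTL ($k=1$). It assumes that the marker penetrances satisfy $\mathbf{\pi} = T(r)\,\mathbf{p}$, where $T(r)$ has entries $P(G=g \mid M=m)$ equal to $1-r$ when $g=m$ and $r$ otherwise. This single-locus form, the function names, and the rule of discarding grid values that give estimates outside $[0,1]$ are assumptions made for this example; they are not taken from the PROC BTL implementation.

```python
def marker_penetrance_mle(marker, trait):
    """MLE of pi_m: the observed proportion of individuals with the trait
    within each marker class (classes coded 0 and 1, trait coded 0/1)."""
    estimates = []
    for m in (0, 1):
        ys = [y for g, y in zip(marker, trait) if g == m]
        estimates.append(sum(ys) / len(ys))
    return estimates


def btl_penetrance(pi_hat, r):
    """Solve pi = T(r) p for p, with T(r) = [[1-r, r], [r, 1-r]].

    Assumed single-marker, single-BTL backcross model. T(r) is singular
    at r = 0.5, so p is not identifiable there.
    """
    if not 0.0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    det = 1.0 - 2.0 * r
    p0 = ((1.0 - r) * pi_hat[0] - r * pi_hat[1]) / det
    p1 = ((1.0 - r) * pi_hat[1] - r * pi_hat[0]) / det
    return [p0, p1]


def grid_search(pi_hat, r_values):
    """Return (r, p) for each grid value whose p is a valid probability
    vector; other grid values are dropped (an assumption of this sketch)."""
    results = []
    for r in r_values:
        p = btl_penetrance(pi_hat, r)
        if all(0.0 <= x <= 1.0 for x in p):
            results.append((r, p))
    return results


if __name__ == "__main__":
    # Hypothetical data: marker class and trait status for 8 individuals.
    marker = [0, 0, 0, 0, 1, 1, 1, 1]
    trait = [1, 0, 0, 0, 1, 1, 1, 0]
    pi_hat = marker_penetrance_mle(marker, trait)
    for r, p in grid_search(pi_hat, [0.0, 0.1, 0.2, 0.4]):
        print(r, p)
```

At $r=0$ the BTL and the marker coincide, so $\mathbf{p}$ equals $\mathbf{\hat{\pi}}$. As $r$ increases, the implied BTL penetrances move further apart than the marker penetrances, which is why large values of $r$ can be ruled out when they push an estimate outside $[0,1]$.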