Example 4.1 Performing BTL Model Selection and Estimation

In the following data taken from Coffman et al. (2005), a sample of trout from an experimental backcross population were genotyped at markers spanning 38 chromosomes, and the disease resistance of each trout was recorded concomitantly. In a prior analysis, single-marker tests were performed with each marker, and the marker most strongly associated with the trait for each chromosome was kept. This reduced set of data along with the corresponding map data is input in the following code. "A" corresponds to the homozygous genotype, and "B" corresponds to the heterozygous genotype; these are the default values for PROC BTL.

data TroutDat;
   input (m1-m17) ($) trait;
   datalines;
A A A B A A B B A A A B B A A B A 0
A B A A B B A B A A A A B B A A B 0

   ... more lines ...   

B B A B B A B A B B A B A B B B B 1
B B B A A B A B A A A A A A A B A 1
;
data TroutMap;
   input marker $ location name $ chromosome;
   datalines;
m1 0 agcagc3 1 
m2 0 acgatt11 2 
m3 0 agcagc5 3 
m4 11.6 agcagc6 4 
m5 0 accaag22 5 
m6 0 agcaca5 6 
m7 43 agcagt27 7 
m8 9.3 acgagt8 8 
m9 52.8 agcact10 9 
m10 2.3 agccag11 10 
m11 0 accatc3 11 
m12 0 accaag8 12 
m13 14 accaag13 13 
m14 7 acgaag8 14 
m15 7 acgatc9 15 
m16 0 agcagt26 16 
m17 18.4 acccca20 17 
;

First, the single-marker models are tested as shown in the following code. The ALL=1 option in the MARKER statement requests the single-marker model tests for each marker in the MARKER statement. The GROUP= option indicates that chromosome is the linkage group variable in the map data set.


proc btl data=TroutDat map=TroutMap;
   marker m1-m17 /all=1 group=chromosome;
   model trait;
run;

The single-marker model results are shown in Output 4.1.1.

Output 4.1.1 Single-Marker Model Statistics
The BTL Procedure

Model Statistics
Linkage
Group
Marker
Effect
Name DF Chi-
Square
Pr > ChiSq AIC AICC BIC
1 M1 agcagc3 1 2.9479 0.0860 48.3 49.1 52.6
2 M2 acgatt11 1 1.0612 0.3029 50.1 51.0 54.4
3 M3 agcagc5 1 2.9479 0.0860 48.3 49.1 52.6
4 M4 agcagc6 1 2.1504 0.1425 49.0 49.9 53.3
5 M5 accaag22 1 1.2346 0.2665 50.0 50.9 54.3
6 M6 agcaca5 1 1.6173 0.2035 49.6 50.5 53.9
7 M7 agcagt27 1 3.5413 0.0599 47.7 48.5 52.0
8 M8 acgagt8 1 1.2158 0.2702 50.0 50.9 54.3
9 M9 agcact10 1 3.5413 0.0599 47.7 48.5 52.0
10 M10 agccag11 1 2.9479 0.0860 48.3 49.1 52.6
11 M11 accatc3 1 1.2848 0.2570 49.9 50.8 54.2
12 M12 accaag8 1 7.3678 0.0066 43.8 44.7 48.1
13 M13 accaag13 1 5.1787 0.0229 46.0 46.9 50.3
14 M14 acgaag8 1 2.6072 0.1064 48.6 49.5 52.9
15 M15 acgatc9 1 1.7060 0.1915 49.5 50.4 53.8
16 M16 agcagt26 1 2.3491 0.1254 48.8 49.7 53.2
17 M17 acccca20 1 3.5413 0.0599 47.7 48.5 52.0

Next, all two-marker models are estimated with the following code. Note that only the five best models are requested using the BEST= option, and the model selection criterion is selected to be the -value from using the MC= option.

proc btl data=TroutDat map=TroutMap;
   marker m1-m17 /all=2 best=5 mc=p group=chromosome;
   model trait;
run;

The two-marker model results are shown in the Output 4.1.2.

Output 4.1.2 Two-Marker Model Statistics
The BTL Procedure

Model Statistics
Marker
Effect
DF Chi-
Square
Pr > ChiSq AIC AICC BIC
M12*M17 3 24.8422 <.0001 30.4 32.8 37.5
M12*M14 3 15.1226 0.0017 40.1 42.5 47.2
M8*M12 3 13.6663 0.0034 41.5 43.9 48.7
M7*M12 3 12.9359 0.0048 42.3 44.7 49.4
M9*M12 3 11.6006 0.0089 43.6 46.0 50.8

It looks like m12 xm17 is the best two-marker effect. You can try to estimate parameters for a BTL model by using the following code. The parameter estimation is requested using the PARMEST statement. The linkage units of the location variable from the map data set are specified to be CM (centimorgans) by using the LINKUNIT option. The linkage model used to create the inputted linkage map is specified as Haldane by using the LINKMOD= option.

Now that the experimental design has been specified, by default PROC BTL performs a grid search over a range of possible recombination values and displays penetrance estimates that are found to be within the range of valid values. The default grid search for the uses values from 0 to 0.5 in increments of 0.1. In the following code, a finer grid search is requested by specifying the increment to be 0.05 in the RINC= option. Also, the penetrance parameter limits are set to be –0.1 and 1.1 using the options PMIN= and PMAX=, respectively (the default values are 0 and 1). The SAS code follows.

proc btl data=TroutDat map=TroutMap;
   marker m12 m17 /group=chromosome;
   model trait;
   parmest r=0 linkmod=H linkunit=cm boot=1000;
run;

The average penetrance value for each marker class is shown in the "Marker Class Means" table in Output 4.1.3, and the BTL penetrance estimates found using 0 for the marker-BTL recombination rates are shown in the "Parameter Estimates" table in Output 4.1.4. The confidence limits for the penetrance parameters from using the bootstrap technique are displayed in Output 4.1.4 as well.

Output 4.1.3 Marker Class Means
The BTL Procedure

Marker Class Means
Marker
Class
Marker
Genotype
N Mean Standard
Error
pi11 AA 8 1.0000 0.0000
pi12 AB 8 0.2500 0.0234
pi21 BA 6 0.0000 0.0000
pi22 BB 9 0.3333 0.0247

Output 4.1.4 Parameter Estimates
Parameter Estimates
Parameter Estimate 95% Confidence Limits
r1 0.0000    
r2 0.0000    
p11 4.0000 4.0000 4.0000
p12 1.0000 0.0000 2.4000
p21 0.0000 0.0000 0.0000
p22 1.3333 0.0000 2.6667
theta 0.5000    
NOTE: The r and theta parameters are
fixed.

Suppose your input data were coded differently, and "A" signified heterozygote and "B" signified homozygote. Since this coding is different from the default, you can specify the genotype values by using the HOMOZYGOTE= (or HO=) and HETEROZYGOTE= (or HE=) options.

proc btl data=TroutDat map=TroutMap;
   marker m12 m17 /group=chromosome;
   model trait;
   parmest ho="B" he="A" r=0 linkmod=H linkunit=cm boot=1000;
run;

In Outputs 4.1.5 and 4.1.6, notice that the marker class mean values are reversed and different penetrance estimates are obtained.

Output 4.1.5 Marker Class Means with Different Genotype Coding
The BTL Procedure

Marker Class Means
Marker
Class
Marker
Genotype
N Mean Standard
Error
pi11 BB 9 0.3333 0.0247
pi12 BA 6 0.0000 0.0000
pi21 AB 8 0.2500 0.0234
pi22 AA 8 1.0000 0.0000

Output 4.1.6 Parameter Estimates with Different Genotype Coding
Parameter Estimates
Parameter Estimate 95% Confidence Limits
r1 0.0000    
r2 0.0000    
p11 1.3333 0.0000 2.6667
p12 0.0000 0.0000 0.0000
p21 1.0000 0.0000 2.4000
p22 4.0000 4.0000 4.0000
theta 0.5000    
NOTE: The r and theta parameters are
fixed.


Note: This procedure is experimental.