Example 6.2 Checking for Genotyping Errors

This example demonstrates the kinds of family genotype errors (that is, Mendelian inconsistencies within a nuclear family) that PROC FAMILY can detect, and the output that reports them. The following sample data set contains genotype errors:

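/* One row per individual. id = individual ID; p1, p2 = parent IDs (0 = founder); */
/* a1, a2 = the two alleles at the marker (. = missing); dis = 1 if affected, 0 if not. */
data ped_samp;
   input id p1 p2 a1 a2 dis;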
   datalines;
  1   0   0 1 1 0
  2   0   0 2 3 0
  3   1   2 1 2 0
  4   1   2 4 5 1
101   0   0 . . 0
102   0   0 2 3 0
103 101 102 4 5 1
104 101 102 2 4 1
201   0   0 . . 0
202   0   0 1 4 0
203 201 202 1 5 1
204 201 202 1 6 0
205 201 202 1 7 1
301   0   0 . . 0
302   0   0 . . 0
303 301 302 1 2 1
304 301 302 1 3 0
305 301 302 1 4 0
401   0   0 . . 0
402   0   0 . . 0
403 401 402 1 1 1
404 401 402 2 2 1
405 401 402 3 3 0
501   0   0 . . 0
502   0   0 . . 0
503 501 502 1 1 0
504 501 502 2 2 0
505 501 502 1 3 1
601   0   0 . . 0
602   0   0 . . 0
603 601 602 1 1 1
604 601 602 1 4 0
605 601 602 2 3 0
701   0   0 . . 0
702   0   0 . . 0
703 701 702 1 2 0
704 701 702 2 3 1
705 701 702 1 4 0
707 701 702 2 5 1
801   0   0 1 3 0
802   0   0 . . 0
804 801 802 1 4 1
805 801 802 3 2 1
;

The following code uses the SHOWALL option, which requests that all families be included in the "Family Summary" table. Because some families have genotype errors, this table would be created by default even without SHOWALL, but it would display only the families in error. The usual output data set is created as well.

proc family data=ped_samp showall;
   id id p1 p2;
   trait dis;
   var a1 a2;
run;

proc print; 
run;
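
PROC PRINT without a DATA= option prints the most recently created data set, which here is the output data set from PROC FAMILY. The same steps can be written with the data set named explicitly. This is a minimal sketch that assumes the OUTSTAT= option of the PROC FAMILY statement; the data set name fam_stats is hypothetical and does not appear in the original example.

```sas
/* Sketch: same analysis, with the output data set named explicitly. */
/* fam_stats is a hypothetical name chosen for illustration.         */
proc family data=ped_samp showall outstat=fam_stats;
   id id p1 p2;
   trait dis;
   var a1 a2;
run;

/* Print the named data set instead of relying on the last one created. */
proc print data=fam_stats;
run;
```

Naming the data set guards against printing the wrong table if another step is later inserted between the two procedures.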

The "Family Summary" table in Output 6.2.1 includes an error code for each family; the codes are explained in the "Description of Error Codes" table in Output 6.2.2. The statistics in Output 6.2.3 are based only on the last family (parents 801 and 802), because every other family has a genotype error and is therefore excluded from the analyses. The analysis would need to be performed again after the genotyping errors have been corrected.

Output 6.2.1 Summary of Family/Marker Information
The FAMILY Procedure

Family Summary

                         Number of      Typed Children   Error
Parent1  Parent2  Locus  Typed Parents  Aff     Unaff    Code
      1        2  M1                 2    1         1       8
    101      102  M1                 1    2         0       6
    201      202  M1                 1    2         1       7
    301      302  M1                 0    1         2       5
    401      402  M1                 0    2         1       4
    501      502  M1                 0    1         2       3
    601      602  M1                 0    1         2       2
    701      702  M1                 0    2         2       1
    801      802  M1                 1    2         0       0
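
To review a family that the summary flags, it can help to pull out just that family's records. The following sketch uses only a DATA step and PROC PRINT; the data set name fam101 is hypothetical, and the family with parents 101 and 102 (error code 6) is used purely as an illustration.

```sas
/* Sketch: extract one flagged family for manual review.       */
/* fam101 is a hypothetical data set name.                     */
data fam101;
   set ped_samp;
   /* Keep the two parents and every child that lists 101 as p1. */
   where id in (101, 102) or p1 = 101;
run;

proc print data=fam101 noobs;
run;
```

In this family the typed parent 102 has genotype 2/3 and child 103 has genotype 4/5, so the child carries neither allele of the typed parent.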

Output 6.2.2 Description of Error Codes
Description of Error Codes

Code  Description
   0  No errors
   1  More than 4 alleles
   2  1 homozygous genotype and more than 3 alleles
   3  2 homozygous genotypes and more than 2 alleles
   4  More than 2 homozygous genotypes
   5  An allele occurs in more than 2 heterozygous genotypes
   6  At least one genotype does not contain a parental allele
   7  More than 2 alleles from missing parent
   8  At least one genotype incompatible with parental genotypes
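
The idea behind code 8 can be illustrated with a small check on families in which both parents are typed. The following PROC SQL step is a sketch only and is not PROC FAMILY's own algorithm; the table name child_check and the column name incompatible are hypothetical. It treats a child as compatible if one of its alleles matches an allele of the first parent and the other matches an allele of the second parent.

```sas
/* Illustrative check only; this is not PROC FAMILY's own algorithm. */
/* child_check and incompatible are hypothetical names.              */
proc sql;
   create table child_check as
   select c.id, c.p1, c.p2, c.a1, c.a2,
          /* 1 if no assignment of one allele per parent yields the child */
          not ( ((c.a1 = f.a1 or c.a1 = f.a2) and (c.a2 = m.a1 or c.a2 = m.a2))
             or ((c.a2 = f.a1 or c.a2 = f.a2) and (c.a1 = m.a1 or c.a1 = m.a2)) )
             as incompatible
   from ped_samp as c
        inner join ped_samp as f on c.p1 = f.id   /* first parent  */
        inner join ped_samp as m on c.p2 = m.id   /* second parent */
   /* Restrict to children whose own and both parents' genotypes are typed. */
   where c.a1 is not missing and c.a2 is not missing
     and f.a1 is not missing and f.a2 is not missing
     and m.a1 is not missing and m.a2 is not missing;
quit;

proc print data=child_check noobs;
run;
```

In the sample data only the family with parents 1 and 2 has both parents typed, so only individuals 3 and 4 are checked. Individual 4 (genotype 4/5) cannot be formed from parental genotypes 1/1 and 2/3, which is consistent with the error code 8 reported for that family.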

Output 6.2.3 Output Data Set from PROC FAMILY

Obs  Locus  ChiSqTDT  ChiSqSTDT  ChiSqSDT  ChiSqRCTDT  dfTDT  dfSTDT  dfSDT  dfRCTDT  ProbTDT  ProbSTDT  ProbSDT  ProbRCTDT
  1  M1            0          0         0           0      1       0      0        1        1         .        .          1