Spoiling the Whole Bunch: Quality Control Aimed at Preserving the Integrity of High-Throughput Genotyping

Department of Medicine, The University of Chicago, Chicago, IL 60637, USA.
The American Journal of Human Genetics (Impact Factor: 10.93). 07/2010; 87(1):123-8. DOI: 10.1016/j.ajhg.2010.06.005
Source: PubMed


False-positive or false-negative results attributable to undetected genotyping errors and confounding factors present a constant challenge for genome-wide association studies (GWAS) given the low signals associated with complex phenotypes and the noise associated with high-throughput genotyping. In the context of the Genetics of Kidneys in Diabetes (GoKinD) study, we identify a source of error in genotype calling and demonstrate that a standard battery of quality-control (QC) measures is not sufficient to detect or correct it. We show that, if genotyping and calling are done by plate (batch), even a few DNA samples of marginally acceptable quality can profoundly alter the allele calls for other samples on the plate. In turn, this leads to significant differential bias in estimates of allele frequency between plates and, potentially, to false-positive associations, particularly when case and control samples are not sufficiently randomized to plates. This problem may become widespread as investigators tap into existing public databases for GWAS control samples. We describe how to detect and correct this bias by utilizing additional sources of information, including raw signal-intensity data.
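The differential bias in allele-frequency estimates between plates described in the abstract can be screened for with a simple per-plate test. The following is a minimal sketch, not the authors' procedure: it assumes genotypes for one SNP are coded as minor-allele counts (0, 1, 2, or `None` for a missing call), and it compares each plate's allele counts against all remaining plates with a 2×2 Pearson chi-square test (1 degree of freedom). The function name `plate_vs_rest_tests` and the input layout are hypothetical, chosen for illustration only. The allele-count test also treats the two alleles of a sample as independent, which is itself an assumption.

```python
import math
from collections import defaultdict


def plate_vs_rest_tests(genotypes, plates):
    """Test each plate's allele frequency against all other plates.

    genotypes: minor-allele counts per sample (0, 1, 2) or None if missing.
    plates:    plate label per sample, in the same order as genotypes.
    Returns a dict {plate: (chi2, p_value)} from a 2x2 Pearson chi-square
    test with 1 degree of freedom (hypothetical helper, for illustration).
    """
    # plate -> [minor-allele count, major-allele count]
    counts = defaultdict(lambda: [0, 0])
    for g, plate in zip(genotypes, plates):
        if g is None:
            continue  # skip missing calls
        counts[plate][0] += g
        counts[plate][1] += 2 - g

    total_minor = sum(c[0] for c in counts.values())
    total_major = sum(c[1] for c in counts.values())

    results = {}
    for plate, (a, b) in counts.items():
        # Allele counts on all remaining plates
        c, d = total_minor - a, total_major - b
        n = a + b + c + d
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:
            # Degenerate table (e.g. monomorphic SNP): no evidence of bias
            results[plate] = (0.0, 1.0)
            continue
        chi2 = n * (a * d - b * c) ** 2 / denom
        # Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
        p_value = math.erfc(math.sqrt(chi2 / 2.0))
        results[plate] = (chi2, p_value)
    return results


# Toy data: plates A and B share one allele frequency, plate C deviates.
genotypes = [0, 1] * 10 + [0, 1] * 10 + [2] * 20
plates = ["A"] * 20 + ["B"] * 20 + ["C"] * 20
results = plate_vs_rest_tests(genotypes, plates)
```

In practice such a test would be run across many SNPs with a multiple-testing correction, and a flagged plate would then be examined using the raw signal-intensity data, as the abstract suggests.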


Available from: Jennifer Below, Nov 18, 2014
  • Source
    • "Another limitation of our study is the use of different genotyping chips for the cases and controls (Human660W-Quadv1 BeadArrays for the cases and HumanHap550v3 for the controls; both from Illumina), while the same quality control (QC) measures were applied to the raw genotyping data for all chips [see also Cichon et al., 2011]. As a word of caution regarding chip based genotyping results, we like to point out that we have used all commonly recommended QC checks (e.g., check of intensity plots, see [Pluzhnikov et al., 2010]) for the 31 best hits. Surprisingly, the formerly best SNP (rs4862110) did pass all QC checks, but showed massive problems in the re-genotyping (TaqMan) approach; so that eventually the SNP had to be removed from further analysis (see 'Materials and Methods' section)."
    ABSTRACT: The heritability of attention deficit hyperactivity disorder (ADHD) is approximately 0.8. Despite several larger scale attempts, genome-wide association studies (GWAS) have not led to the identification of significant results. We performed a GWAS based on 495 German young patients with ADHD (according to DSM-IV criteria; Human660W-Quadv1; Illumina, San Diego, CA) and on 1,300 population-based adult controls (HumanHap550v3; Illumina). Some genes neighboring the single nucleotide polymorphisms (SNPs) with the lowest P-values (best P-value: 8.38 × 10⁻⁷) have potential relevance for ADHD (e.g., glutamate receptor, metabotropic 5 gene, GRM5). After quality control, the 30 independent SNPs with the lowest P-values (P-values ≤ 7.57 × 10⁻⁵) were chosen for confirmation. Genotyping of these SNPs in up to 320 independent German families comprising at least one child with ADHD revealed directionally consistent effect-size point estimates for 19 (10 not consistent) of the SNPs. In silico analyses of the 30 SNPs in the largest meta-analysis so far (2,064 trios, 896 cases, and 2,455 controls) revealed directionally consistent effect-size point estimates for 16 SNPs (11 not consistent). None of the combined analyses revealed a genome-wide significant result. SNPs in previously described autosomal candidate genes did not show significantly lower P-values compared to SNPs within random sets of genes of the same size. We did not find genome-wide significant results in a GWAS of German children with ADHD compared to controls. The second best SNP is located in an intron of GRM5, a gene located within a recently described region with an infrequent copy number variation in patients with ADHD.
    American Journal of Medical Genetics Part B Neuropsychiatric Genetics 12/2011; 156B(8):888-97. DOI:10.1002/ajmg.b.31246 · 3.42 Impact Factor
  • Source
    • "(Pearson Goodness of fit test, P = 0.302 for rs4862110 and P = 0.182 for rs4957798). When each batch was tested individually against all samples to obtain a more powerful test [Pluzhnikov et al., 2010], there was no indication of variation other than one due to sampling (Fig. 3)."
    ABSTRACT: Using genome-wide association studies to identify genetic variants contributing to disease has been highly successful with many novel genetic predispositions identified and biological pathways revealed. Several pitfalls for spurious association or non-replication have been highlighted: from population structure, automated genotype scoring for cases and controls, to age-varying association. We describe an important yet unreported source of bias in case-control studies due to variations in chip technology between different commercial array releases. As cases are commonly genotyped with newer arrays and freely available control resources are frequently used for comparison, there exists an important potential for false associations which are robust to standard quality control and replication design.
    Genetic Epidemiology 07/2011; 35(5):423-6. DOI:10.1002/gepi.20559 · 2.60 Impact Factor
  • Source
    ABSTRACT: There is emerging evidence for a genetic basis of patient-reported quality-of-life (QOL) outcomes that can ultimately be incorporated into clinical research and practice. Objectives are (1) to provide arguments for the timeliness of investigating the genetic basis of QOL given the scientific advances in genetics and patient-reported QOL research; (2) to describe the clinical implications of such investigations; (3) to present a theoretical foundation for investigating the genetic underpinnings of QOL; and (4) to describe a series of papers resulting from the GENEQOL Consortium that was established to move this work forward. Discussion of scientific advances based on relevant literature. In genetics, technological advances allow for increases in speed and efficiency and decreases in costs in exploring the genetic underpinnings of disease processes, drug metabolism, treatment response, and survival. In patient-based research, advances yield empirically based and stringent approaches to measurement that are scientifically robust. Insights into the genetic basis of QOL will ultimately allow early identification of patients susceptible to QOL deficits and to target care. The Wilson and Cleary model for patient-reported outcomes was refined by incorporating the genetic underpinnings of QOL. This series of papers provides a path for QOL and genetics researchers to work together to move this field forward and to unravel the intricate interplay of the genetic underpinnings of patient-reported QOL outcomes. The ultimate result will be a greater understanding of the process relating disease, patient, and doctor that will have the potential to lead to improved survival, QOL, and health services delivery.
    Quality of Life Research 10/2010; 19(10):1395-403. DOI:10.1007/s11136-010-9759-5 · 2.49 Impact Factor