Spoiling the whole bunch: quality control aimed at preserving the integrity of high-throughput genotyping.

Department of Medicine, The University of Chicago, Chicago, IL 60637, USA.
The American Journal of Human Genetics (Impact Factor: 10.99). 07/2010; 87(1):123-8. DOI: 10.1016/j.ajhg.2010.06.005
Source: PubMed

ABSTRACT: False-positive or false-negative results attributable to undetected genotyping errors and confounding factors present a constant challenge for genome-wide association studies (GWAS) given the low signals associated with complex phenotypes and the noise associated with high-throughput genotyping. In the context of the Genetics of Kidneys in Diabetes (GoKinD) study, we identify a source of error in genotype calling and demonstrate that a standard battery of quality-control (QC) measures is not sufficient to detect and/or correct it. We show that, if genotyping and calling are done by plate (batch), even a few DNA samples of marginally acceptable quality can profoundly alter the allele calls for other samples on the plate. In turn, this leads to significant differential bias in estimates of allele frequency between plates and, potentially, to false-positive associations, particularly when case and control samples are not sufficiently randomized to plates. This problem may become widespread as investigators tap into existing public databases for GWAS control samples. We describe how to detect and correct this bias by utilizing additional sources of information, including raw signal-intensity data.
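The differential bias in allele frequency between plates described in the abstract can be illustrated with a simple per-plate check. The sketch below is not the authors' procedure (the abstract says their approach also draws on raw signal-intensity data, which is not modeled here). It assumes a single biallelic SNP, genotypes coded as the number of copies of allele A (0, 1, or 2), and a hypothetical plate label per sample. For each plate it runs a 2x2 chi-square test of that plate's allele counts against all other plates pooled. The function names and the input layout are illustrative assumptions.

```python
import math
from collections import defaultdict


def plate_allele_counts(genotypes):
    """Tally allele counts per plate for one biallelic SNP.

    genotypes: iterable of (plate_id, n_a), where n_a is the number of
    copies of allele A (0, 1, or 2), or None for a missing call.
    Returns {plate_id: [count_A, count_B]}.
    """
    counts = defaultdict(lambda: [0, 0])
    for plate, n_a in genotypes:
        if n_a is None:  # skip missing calls
            continue
        counts[plate][0] += n_a
        counts[plate][1] += 2 - n_a
    return dict(counts)


def plate_vs_rest_pvalues(counts):
    """Test each plate's allele counts against all other plates pooled.

    Uses the 2x2 Pearson chi-square statistic (1 degree of freedom,
    no continuity correction). For 1 df the upper-tail probability is
    erfc(sqrt(chi2 / 2)). Returns {plate_id: p_value}.
    """
    total_a = sum(c[0] for c in counts.values())
    total_b = sum(c[1] for c in counts.values())
    pvals = {}
    for plate, (a, b) in counts.items():
        c, d = total_a - a, total_b - b  # pooled counts on other plates
        n = a + b + c + d
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:  # degenerate table: no evidence either way
            pvals[plate] = 1.0
            continue
        chi2 = n * (a * d - b * c) ** 2 / denom
        pvals[plate] = math.erfc(math.sqrt(chi2 / 2.0))
    return pvals
```

In a genome-wide setting a check like this would be repeated per SNP, with the resulting p-values interpreted against a threshold that accounts for the number of tests. One badly skewed plate also shifts the pooled frequency of the remaining plates, so the p-values are better read as a ranking of suspect plates than as independent tests.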


Available from: Jennifer Below, Nov 18, 2014
  • The Journal of Bone and Joint Surgery 03/2014; 96(5):e38. DOI:10.2106/JBJS.M.00398 · 4.31 Impact Factor
  • ABSTRACT: Diabetes is increasing at daunting rates worldwide, and approximately 40% of affected individuals will develop kidney complications. Diabetic kidney disease (DKD) is the leading cause of end-stage kidney disease, and there are significant healthcare costs providing appropriate renal replacement therapies to affected individuals. For several decades, investigators have sought to discover inherited risk factors and biomarkers for DKD. In recent years, advances in high-throughput laboratory techniques and computational analyses, coupled with the establishment of multicenter consortia, have helped to identify genetic loci that are replicated across multiple populations. Several genome-wide association studies (GWAS) have been conducted for DKD with further meta-analysis of GWAS and comprehensive "single gene" meta-analyses now published. Despite these efforts, much of the inherited predisposition to DKD remains unexplained. Meta-analyses and integrated-omics pathway studies are being used to help elucidate underlying genetic risks. Epigenetic phenomena are increasingly recognized as important drivers of disease risk, and several epigenome-wide association studies have now been completed. This review describes key findings and ongoing genetic and epigenetic initiatives for DKD.
    Advances in chronic kidney disease 05/2014; 21(3):287-296. DOI:10.1053/j.ackd.2014.03.010 · 1.94 Impact Factor
  • ABSTRACT: When analyzing family data, we dream of perfectly informative data, even whole-genome sequences (WGSs) for all family members. Reality intervenes, and we find that next-generation sequencing (NGS) data have errors and are often too expensive or impossible to collect on everyone. The Genetic Analysis Workshop 18 working groups on quality control and dropping WGSs through families using a genome-wide association framework focused on finding, correcting, and using errors within the available sequence and family data, developing methods to infer and analyze missing sequence data among relatives, and testing for linkage and association with simulated blood pressure. We found that single-nucleotide polymorphisms, NGS data, and imputed data are generally concordant but that errors are particularly likely at rare variants, for homozygous genotypes, within regions with repeated sequences or structural variants, and within sequence data imputed from unrelated individuals. Admixture complicated identification of cryptic relatedness, but information from Mendelian transmission improved error detection and provided an estimate of the de novo mutation rate. Computationally, fast rule-based imputation was accurate but could not cover as many loci or subjects as more computationally demanding probability-based methods. Incorporating population-level data into pedigree-based imputation methods improved results. Observed data outperformed imputed data in association testing, but imputed data were also useful. We discuss the strengths and weaknesses of existing methods and suggest possible future directions, such as improving communication between data collectors and data analysts, establishing thresholds for and improving imputation quality, and incorporating error into imputation and analytical models.
    Genetic Epidemiology 09/2014; 38 Suppl 1(S1):S21-8. DOI:10.1002/gepi.21821 · 2.95 Impact Factor