Confounding from Cryptic Relatedness in Case-Control Association Studies

Department of Human Genetics, University of Chicago, Chicago, Illinois, USA.
PLoS Genetics (Impact Factor: 8.17). 10/2005; 1(3):e32. DOI: 10.1371/journal.pgen.0010032
Source: PubMed

ABSTRACT Synopsis
There has long been concern in the human genetics community that case-control association studies may be subject to high rates of false positives if there is unrecognized population structure. After being considered rather suspect in the 1990s for this reason, case-control studies are regaining popularity, and will no doubt be used widely in future genome-wide association studies.
Therefore, it is important to fully understand the types of factors that can lead to excess rates of false positives in case-control studies. Virtually all of the previous discussion in the literature of excess false positives (confounding) in case-control studies has focused on the role of population structure. Yet a widely cited 1999 paper by Devlin and Roeder (that introduced the genomic control concept) argued that, in fact, “cryptic relatedness” (referring to the idea that some members of a case-control sample might actually be close relatives, unbeknownst to the investigator) is likely to be a far more important confounder than population structure. Moreover, one of the two main types of statistical approaches for dealing with confounding in case-control studies (i.e., structured association methods) does not correct for cryptic relatedness.
This work provides the first careful model of cryptic relatedness, and outlines exactly when cryptic relatedness is and is not likely to be a problem. The authors provide simple expressions that predict the extent of confounding due to cryptic relatedness. Surprisingly, these expressions are functions of directly observable parameters. The analytical results show that, for well-designed studies in outbred populations, the degree of confounding due to cryptic relatedness will usually be negligible. In contrast, studies with a sampling bias toward collecting relatives may indeed suffer from excessive rates of false positives.
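As a rough illustration of the genomic control idea mentioned above, the sketch below estimates an inflation factor from a set of 1-df association test statistics by dividing their median by the median of a chi-square distribution with one degree of freedom (about 0.455). This is not the paper's own set of expressions, which are not reproduced on this page. The function names `allelic_chi2` and `genomic_control_lambda` are hypothetical, and the allelic 2x2 test is only one possible choice of statistic.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df) for a 2x2 table of allele counts.

    Hypothetical helper: rows are cases/controls, columns are the
    alternate/reference allele counts at one marker.
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so the test is uninformative.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def genomic_control_lambda(chi2_stats):
    """Inflation factor: observed median statistic over the null median.

    Values near 1 suggest little inflation; values well above 1 suggest
    confounding (population structure or cryptic relatedness).
    """
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Made-up allele counts for three markers, for illustration only.
    tables = [(30, 10, 10, 30), (20, 20, 20, 20), (25, 15, 15, 25)]
    stats = [allelic_chi2(*t) for t in tables]
    print(stats, genomic_control_lambda(stats))
```

In practice the inflation factor would be estimated from many markers assumed to be unlinked to the disease, not from a handful of loci as in this toy input.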

Available from: Benjamin F Voight, Jul 24, 2014
  • Source
    • "So identification of true positives in association mapping requires correction for the confounding effects of population structure. Similarly, covariance between individuals because of their relatedness can increase the false positive rate (Voight and Pritchard 2005). To reduce the false positive associations in our study, we used the population structure matrix (Q) to evaluate the effects of population structure and pairwise kinship (K) to evaluate relatedness among individuals. "
    ABSTRACT: Association studies have emerged as a powerful tool for identification of markers associated with quantitative traits in forest trees. The cytosolic enzyme uridine 5′ diphosphate-glucose dehydrogenase (UGDH) converts UDP-glucose to UDP-glucuronate and affects cell wall formation in higher plants. Here, we used association genetics to identify UDP-glucose dehydrogenase (PtUGDH) allelic variation that associates with wood quality traits in Populus tomentosa. We isolated a 1,828 bp PtUGDH cDNA encoding a polypeptide of 481 amino acids. Expression analysis revealed that PtUGDH was expressed predominantly in young root, developing xylem from vascular tissues, and young leaves, suggesting that UGDH functions in cell wall formation. We identified 59 single-nucleotide polymorphisms (SNPs; π_T = 0.00475) by resequencing the PtUGDH locus of 40 individuals and genotyped the 22 most common SNPs (minor allele frequency >10%) in a discovery population (n = 426). Linkage disequilibrium (LD) analysis showed that LD did not extend over the entire gene (r² < 0.1, within 300 bp). Association studies indicated that three SNPs (false discovery rate, Q < 0.05) and 12 haplotypes (Q < 0.05) were significantly associated with wood properties. The three significant SNPs are all in the 5′ untranslated regions of PtUGDH, and the phenotypic variance explained by each SNP ranged from 5.37 to 11.97%. We validated one association in a validation population (n = 1,200) and validated another association by examining its effect on gene expression. The present study provided molecular markers associated with fiber length and holocellulose content, markers that have potential applications in marker-assisted breeding.
    Tree Genetics & Genomes 04/2014; 10(2). DOI:10.1007/s11295-013-0689-6 · 2.44 Impact Factor
  • Source
    • "The above analyses suggest that population stratification does not correlate with the trait and does not influence the results of the association study. Cryptic relatedness, i.e., unknown genetic relationships between individuals in a sample, can also confound the association analysis due to nonindependence and larger than expected phenotypic variance (Voight and Pritchard 2005; Cheng et al. 2010). We estimated the GRM from whole-genome SNP data using mixmogam (a Python implementation of EMMAX) (Kang et al. 2010; Segura et al. 2012). "
    ABSTRACT: The identification and validation of gene-gene interactions is a major challenge in human studies. Here, we explore an approach for studying epistasis in humans using a Drosophila melanogaster model of neonatal diabetes mellitus. Expression of the mutant preproinsulin (hINS(C96Y)) in the eye imaginal disc mimics the human disease: it activates conserved stress response pathways and leads to cell death (reduction in eye area). Dominant-acting variants in wild-derived inbred lines from the Drosophila Genetics Reference Panel produce a continuous, highly heritable distribution of eye degeneration phenotypes in a hINS(C96Y) background. A genome-wide association study (GWAS) in 154 sequenced lines identified a sharp peak on chromosome 3L, which mapped to a 400 bp linkage block within an intron of the gene sulfateless (sfl). RNAi knock-down of sfl enhanced the eye degeneration phenotype in a mutant-hINS-dependent manner. RNAi against two additional genes in the heparan sulfate (HS) biosynthetic pathway (ttv and botv), in which sfl acts, also modified the eye phenotype in a hINS(C96Y)-dependent manner, strongly suggesting a novel link between HS-modified proteins and cellular responses to misfolded proteins. Finally, we evaluated allele-specific expression difference between the two major sfl-intronic haplotypes in heterozygotes. The results showed significant heterogeneity in marker-associated gene expression, thereby leaving the causal mutation(s) and its mechanism unidentified. In conclusion, the ability to create a model of human genetic disease, map a QTL by GWAS to a specific gene, validate its contribution to disease with available genetic resources, and the potential to experimentally link the variant to a molecular mechanism, demonstrate the many advantages Drosophila holds in determining the genetic underpinnings of human disease.
    Genetics 02/2014; 196(2):557-567. DOI:10.1534/genetics.113.157800 · 4.87 Impact Factor
  • Source
    • "Identity by Descent in the DGRP Resource: Because association studies rely on the assumption that individuals are unrelated (Voight and Pritchard 2005), we used the SNP calls from Mackay et al. (2012) to scan for regions of extensive identity by descent (IBD) in the DGRP sample; >95% similarity in 1 Mb windows with 100 kb steps. We began with a total of 148 DGRP lines for which we have sequence data. "
    ABSTRACT: Here we present computational machinery to efficiently and accurately identify transposable element (TE) insertions in 146 next-generation sequenced inbred strains of Drosophila melanogaster. The panel of lines we use in our study is composed of strains from a pair of genetic mapping resources; the Drosophila Genetic Reference Panel (DGRP) and the Drosophila Synthetic Population Resource (DSPR). We identified 23,087 TE insertions in these lines, of which 83.3% are found in only one line. There are marked differences in the distribution of elements over the genome, with TEs found at higher densities on the X chromosome, and in regions of low recombination. We also identified many more TEs per base pair of intronic sequence and fewer TEs per base pair of exonic sequence than expected if TEs are located at random locations in the euchromatic genome. There was substantial variation in TE load across genes. For example, the paralogs derailed and derailed-2 show a significant difference in the number of TE insertions, potentially reflecting differences in the selection acting on these loci. When considering TE families we find a very weak effect of gene family size on TE insertions per gene, indicating that as gene family size increases the number of TE insertions in a given gene within that family also increases. Transposable elements are known to be associated with certain phenotypes, and our data will allow investigators using the DGRP and DSPR to assess the functional role of TE insertions in complex trait variation more generally. Notably, because most TEs are very rare and often private to a single line, causative TEs resulting in phenotypic differences among individuals may typically fail to replicate across mapping panels since individual elements are unlikely to segregate in both panels. Our data suggest that "burden tests" that test for the effect of TEs as a class may be more fruitful.
    Molecular Biology and Evolution 07/2013; 30(10). DOI:10.1093/molbev/mst129 · 14.31 Impact Factor