Curses—Winnerʼs and Otherwise—in Genetic Epidemiology

Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA.
Epidemiology (Cambridge, Mass.) (Impact Factor: 6.2). 10/2008; 19(5):649-51; discussion 657-8. DOI: 10.1097/EDE.0b013e318181b865
Source: PubMed


The estimated effect of a marker allele from the initial study reporting the marker-allele association is often exaggerated relative to the estimated effect in follow-up studies (the "winner's curse" phenomenon). This is a particular concern for genome-wide association studies, where markers typically must pass very stringent significance thresholds to be selected for replication. A related problem is the overestimation of the predictive accuracy that occurs when the same data set is used to select a multilocus risk model from a wide range of possible models and then estimate the accuracy of the final model ("over-fitting"). Even in the absence of these quantitative biases, researchers can overstate the qualitative importance of their findings, for example by focusing on relative risks in a context where sensitivity and specificity may be more appropriate measures. Epidemiologists need to be aware of these potential problems: as authors, to avoid or minimize them, and as readers, to detect them.
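The selection effect behind the winner's curse can be illustrated with a small simulation. The sketch below is not taken from the article: the function name, the true effect size, the standard error, the one-sided z threshold, and the number of simulated studies are all illustrative assumptions. It draws many noisy estimates of the same true effect and compares the average over all studies with the average over only those that pass the significance threshold.

```python
import random
import statistics


def simulate_winners_curse(true_effect=0.1, se=0.05, z_threshold=4.0,
                           n_studies=100_000, seed=0):
    """Simulate repeated initial studies of a single marker (hypothetical setup).

    Each study yields an estimate drawn from Normal(true_effect, se).
    A study "wins" if its one-sided z statistic exceeds z_threshold.
    All parameter values are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]

    # Keep only the estimates that would have been reported as significant.
    winners = [b for b in estimates if b / se > z_threshold]

    mean_all = statistics.fmean(estimates)    # close to the true effect
    mean_winners = statistics.fmean(winners)  # inflated by selection
    power = len(winners) / n_studies          # fraction passing the threshold
    return mean_all, mean_winners, power


if __name__ == "__main__":
    mean_all, mean_winners, power = simulate_winners_curse()
    print(f"mean estimate, all studies:      {mean_all:.3f}")
    print(f"mean estimate, significant only: {mean_winners:.3f}")
    print(f"fraction significant:            {power:.4f}")
```

Under these assumptions every selected estimate must exceed `z_threshold * se`, which is larger than the true effect, so the average among "winners" is necessarily inflated. An unselected follow-up study would be expected to estimate something closer to the true effect.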

  • Source
    • "Finally, it needs to be emphasized that the GWAS results reported here represent preliminary findings that need to be replicated in independent data sets. However, even if these were to reveal smaller effect estimates than those reported here (e.g., as a result of the “winner's curse”; Kraft, 2008), this should have no bearing on the functional genetic and expression profiling results of our study. In addition to generating independent genetic association data, future work will need to extend our eQTL findings to the CNS, confirm the regulatory role of hsa-mir-138-5p on endogenous expression of DCP1B and other target genes and their corresponding proteins (in particular those potentially involved in human memory function, such as WWC1 [KIBRA]), and assess the role of this miRNA on their putative targets in vivo. "
    ABSTRACT: Genetic factors underlie a substantial proportion of individual differences in cognitive functions in humans, including processes related to episodic and working memory. While genetic association studies have proposed several candidate "memory genes", these currently explain only a minor fraction of the phenotypic variance. Here, we performed genome-wide screening on 13 episodic and working memory phenotypes in 1,318 participants of the Berlin Aging Study II aged 60 years or older. The analyses highlight a number of novel single nucleotide polymorphisms (SNPs) associated with memory performance, including one located in a putative regulatory region of microRNA (miRNA) hsa-mir-138-5p (rs9882688, P-value = 7.8 × 10⁻⁹). Expression quantitative trait locus analyses on next-generation RNA-sequencing data revealed that rs9882688 genotypes show a significant correlation with the expression levels of this miRNA in 309 human lymphoblastoid cell lines (P-value = 5 × 10⁻⁴). In silico modeling of other top-ranking GWAS signals identified an additional memory-associated SNP in the 3' untranslated region (3'UTR) of DCP1B, a gene encoding a core component of the mRNA decapping complex in humans, predicted to interfere with hsa-mir-138-5p binding. This prediction was confirmed in vitro by luciferase assays showing differential binding of hsa-mir-138-5p to 3'UTR reporter constructs in two human cell lines (HEK293: P-value = 0.0470; SH-SY5Y: P-value = 0.0866). Finally, expression profiling of hsa-mir-138-5p and DCP1B mRNA in human post-mortem brain tissue revealed that both molecules are expressed simultaneously in frontal cortex and hippocampus, suggesting that the proposed interaction between hsa-mir-138-5p and DCP1B may also take place in vivo. In summary, by combining unbiased genome-wide screening with extensive in silico modeling, in vitro functional assays, and gene expression profiling, our study identified miRNA-138 as a potential molecular regulator of human memory function.
    Frontiers in Human Neuroscience 07/2014; 8:501. DOI:10.3389/fnhum.2014.00501 · 3.63 Impact Factor
  • Source
    • "However, we often depend on the luck of the draw (cf. “winners curse”) [33,34], when we insist on extreme levels of certainty from a single analysis. "
    ABSTRACT: In omic research, such as genome wide association studies, researchers seek to repeat their results in other datasets to reduce false positive findings and thus provide evidence for the existence of true associations. Unfortunately this standard validation approach cannot completely eliminate false positive conclusions, and it can also mask many true associations that might otherwise advance our understanding of pathology. These issues beg the question: How can we increase the amount of knowledge gained from high throughput genetic data? To address this challenge, we present an approach that complements standard statistical validation methods by drawing attention to both potential false negative and false positive conclusions, as well as providing broad information for directing future research. The Diverse Convergent Evidence approach (DiCE) we propose integrates information from multiple sources (omics, informatics, and laboratory experiments) to estimate the strength of the available corroborating evidence supporting a given association. This process is designed to yield an evidence metric that has utility when etiologic heterogeneity, variable risk factor frequencies, and a variety of observational data imperfections might lead to false conclusions. We provide proof of principle examples in which DiCE identified strong evidence for associations that have established biological importance, when standard validation methods alone did not provide support. If used as an adjunct to standard validation methods this approach can leverage multiple distinct data types to improve genetic risk factor discovery/validation, promote effective science communication, and guide future research directions.
    BioData Mining 06/2014; 7(1):10. DOI:10.1186/1756-0381-7-10 · 2.02 Impact Factor
  • Source
    • "Although the replication population was smaller (378 vs. 454), it was able to confirm each of the major effector loci for femur length. The observed inflation of phenotype effects during an initial search (the “winners curse”) is a predicted phenomenon in genome-wide analyses (Lohmueller et al. 2003; Kraft 2008). When combined, the two populations provided a total of 832 animals derived from (BALB/cJ × C57BL/6J) F1 by (C3H/HeJ × DBA/2J) F1 parents and having a phenotype measurement for adult femur length."
    ABSTRACT: Finding the causative genetic variations that underlie complex adult traits is a significant experimental challenge. The unbiased search strategy of genome-wide association (GWAS) has been used extensively in recent human population studies. These efforts, however, typically find only a minor fraction of the genetic loci that are predicted to affect variation. As an experimental model for the analysis of adult polygenic traits, we measured a mouse population for multiple phenotypes and conducted a genome-wide search for effector loci. Complex adult phenotypes, related to body size and bone structure, were measured as component phenotypes, and each subphenotype was associated with a genomic spectrum of candidate effector loci. The strategy successfully detected several loci for the phenotypes, at genome-wide significance, using a single, modest-sized population (N = 505). The effector loci each explain 2%-10% of the measured trait variation and, taken together, the loci can account for over 25% of a trait's total population variation. A replicate population (N = 378) was used to confirm initially observed loci for one trait (femur length), and, when the two groups were merged, the combined population demonstrated increased power to detect loci. In contrast to human population studies, our mouse genome-wide searches find loci that individually explain a larger fraction of the observed variation. Also, the additive effects of our detected mouse loci more closely match the predicted genetic component of variation. The genetic loci discovered are logical candidates for components of the genetic networks having evolutionary conservation with human biology.
    Genome Research 05/2012; 22(8):1549-57. DOI:10.1101/gr.135582.111 · 14.63 Impact Factor