Identifying Recent Adaptations in Large-Scale Genomic Data

Department of Biology, MIT, Cambridge, MA 02139, USA
Cell (Impact Factor: 32.24). 02/2013; 152(4):703-13. DOI: 10.1016/j.cell.2013.01.035
Source: PubMed


Although several hundred regions of the human genome harbor signals of positive natural selection, few of the relevant adaptive traits and variants have been elucidated. Using full-genome sequence variation from the 1000 Genomes (1000G) Project and the composite of multiple signals (CMS) test, we investigated 412 candidate signals and leveraged functional annotation, protein structure modeling, epigenetics, and association studies to identify and extensively annotate candidate causal variants. The resulting catalog provides a tractable list for experimental follow-up; it includes 35 high-scoring nonsynonymous variants, 59 variants associated with expression levels of a nearby coding gene or lincRNA, and numerous variants associated with susceptibility to infectious disease and other phenotypes. We experimentally characterized one candidate nonsynonymous variant in Toll-like receptor 5 (TLR5) and show that it leads to altered NF-κB signaling in response to bacterial flagellin.


Available from: Angela Yen, Jan 20, 2016
  • Source
    • "For instance, MFPH alone implicated selection in a gene-region in Maasai that has been pointed out as a candidate region for stature in Pygmy groups. Thus, MFPH constitutes a valuable additional summary statistic for investigating local adaptation, possibly in a demography-informed approach utilizing, for instance, Approximate Bayesian Computation [52,53]. MFPH is well suited for analyzing large genome wide data since it is quick and easy to compute for phased data. "
    ABSTRACT: Background: Genome-wide scans for regions that demonstrate deviating patterns of genetic variation have become common approaches for finding genes targeted by selection. Several genomic patterns have been utilized for this purpose, including deviations in haplotype homozygosity, frequency spectra and genetic differentiation between populations. Results: We describe a novel approach based on the Maximum Frequency of Private Haplotypes (MFPH) to search for signals of recent population-specific selection. The MFPH statistic is straightforward to compute for phased SNP and sequence data. Using both simulated and empirical data, we show that MFPH can be a powerful statistic to detect recent population-specific selection, that it performs at the same level as other commonly used summary statistics (e.g. FST, iHS and XP-EHH), and that MFPH in some cases captures signals of selection that are missed by other statistics. For instance, in the Maasai, MFPH reveals a strong signal of selection in a region that contains the genes DOCK3, MAPKAPK3 and CISH, where the other investigated statistics fail to pick up a clear signal. This region has been suggested to affect height in many populations based on phenotype-genotype association studies. It has specifically been suggested to be targeted by selection in Pygmy groups, which are on the opposite end of the human height spectrum compared to the Maasai. Conclusions: From the analysis of both simulated and publicly available empirical data, we show that MFPH represents a summary statistic that can provide further insight concerning population-specific adaptation.
    Full-text · Article · May 2014 · BMC Genetics
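    The abstract above names the MFPH statistic but does not spell out how it is computed. As a minimal sketch, assuming that a "private" haplotype is one observed in the target population and in no comparison population, and that MFPH is the frequency of the most common such haplotype within one genomic window, the calculation could look like the following. The function name `mfph`, the string encoding of haplotypes, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def mfph(target_haplotypes, other_haplotypes):
    """Frequency of the most common haplotype seen only in the target population.

    Assumption: haplotypes are phased allele strings for a single window,
    e.g. "0101". Returns 0.0 when the target has no private haplotype.
    """
    if not target_haplotypes:
        raise ValueError("target population has no haplotypes")
    seen_elsewhere = set(other_haplotypes)
    counts = Counter(target_haplotypes)
    # Keep only the counts of haplotypes that never occur in the comparison populations.
    private_counts = [n for hap, n in counts.items() if hap not in seen_elsewhere]
    if not private_counts:
        return 0.0
    return max(private_counts) / len(target_haplotypes)


# Toy window: "0101" is private to the target and carried by 3 of 5 chromosomes.
target = ["0101", "0101", "0101", "0011", "1100"]
others = ["0011", "1111"]
print(mfph(target, others))  # 0.6
```

    A genome-wide scan would repeat this per window and per population; how windows are defined and how significance is assessed is not covered by this sketch.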
  • Source
    • "Finally, among the strongest selection signals in east Asians, three gene regions have been linked to breast cancer. These include RAD51L1 and the ECHDC1-RNF146 region identified by GWAS (Hoggart et al. 2007; Gold et al. 2008), and HERC1, which has previously been reported as a selection target and is mutated in breast cancer (Grossman et al. 2013). These observations highlight the need for further studies to better understand the extent to which cancer, which is generally a rather late-onset disease, has been a selective factor by itself or a by-product of other selective forces exerting pressure on pleiotropic genes. "
    ABSTRACT: Genome-wide scans for selection have identified multiple regions of the human genome as being targeted by positive selection. However, only a small proportion has been replicated across studies, and the prevalence of positive selection as a mechanism of adaptive change in humans remains controversial. Here we explore the power of two haplotype-based statistics - the integrated haplotype score (iHS) and the Derived Intra-allelic Nucleotide Diversity (DIND) test - in the context of next-generation sequencing data, and evaluate their robustness to demography and other selection modes. We show that these statistics are both powerful for the detection of recent positive selection, regardless of population history, and robust to variation in coverage, with DIND being insensitive to very low coverage. We apply these statistics to whole-genome sequence datasets from the 1000 Genomes Project and Complete Genomics. We found that putative targets of selection were highly significantly enriched in genic and non-synonymous SNPs, and that DIND was more powerful than iHS in the context of small sample sizes, low-quality genotype calling or poor coverage. As we excluded genomic confounders and alternative selection models, such as background selection, the observed enrichment attests to the action of recent, strong positive selection. Further support for the adaptive significance of these genomic regions came from their enrichment in functional variants detected by genome-wide association studies, informing the relationship between past selection and current benign and disease-related phenotypic variation. Our results indicate that hard sweeps targeting low-frequency standing variation have played a moderate, albeit significant, role in recent human evolution.
    Full-text · Article · Apr 2014 · Molecular Biology and Evolution
  • Source
    • "To investigate if high CEhZ scores may be indicative of evolutionary pressure, CEhZ loci were intersected with regions recently identified as harbouring signals of natural selection in central Europeans (CEU), Chinese (CHB) or Yoruba Africans (YRI) using the composite of multiple signals (CMS) test [16]. This revealed that 20% of the CEU, 13% of the YRI, and 10% of the CHB regions identified by CEhZ and CMS overlapped (Additional file 4: Data S2). "
    ABSTRACT: Genomic information allows population relatedness to be inferred and selected genes to be identified. Single nucleotide polymorphism microarray (SNP-chip) data, a proxy for genome composition, contains patterns in allele order and proportion. These patterns can be quantified by compression efficiency (CE). In principle, the composition of an entire genome can be represented by a CE number quantifying allele representation and order. We applied a compression algorithm (DEFLATE) to genome-wide high-density SNP data from 4,155 human, 1,800 cattle, 1,222 sheep, 81 dog and 49 mouse samples. All human ethnic groups can be clustered by CE and the clusters recover phylogeography based on traditional fixation index (FST) analyses. CE analysis of other mammals results in segregation by breed or species, and is sensitive to admixture and past effective population size. This clustering is a consequence of individual patterns such as runs of homozygosity. Intriguingly, a related approach can also be used to identify genomic loci that show population-specific CE segregation. A high resolution CE 'sliding window' scan across the human genome, organised at the population level, revealed genes known to be under evolutionary pressure. These include SLC24A5 (European and Gujarati Indian skin pigmentation), HERC2 (European eye color), LCT (European and Maasai milk digestion) and EDAR (Asian hair thickness). We also identified a set of previously unidentified loci with high population-specific CE scores including the chromatin remodeler SCMH1 in Africans and EDA2R in Asians. Closer inspection reveals that these prioritised genomic regions do not correspond to simple runs of homozygosity but rather compositionally complex regions that are shared by many individuals of a given population. Unlike FST, CE analyses do not require ab initio population comparisons and are amenable to the hemizygous X chromosome. We conclude with a discussion of the implications of CE for a complex systems science view of genome evolution. CE allows one to clearly visualise the evolution of individual genomes and populations through a formal, mathematically-rigorous information space. Overall, CE makes a set of biological predictions, some of which are unique and await functional validation.
    Full-text · Article · Mar 2014 · BMC Bioinformatics
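    The abstract above says that DEFLATE was applied to SNP data and the result summarised as a compression efficiency (CE), but it does not give the genotype encoding or the exact formula. As a rough illustration only, assuming genotypes are encoded as a string of the characters 0, 1 and 2 and that compressibility is measured as compressed size divided by raw size, Python's standard `zlib` module (which implements DEFLATE) can show the basic idea. The function name `compression_ratio` and the toy sequences are hypothetical.

```python
import random
import zlib


def compression_ratio(genotypes: str) -> float:
    """Compressed size divided by raw size; smaller means more compressible.

    Assumption: genotypes are given as an ASCII string such as "0120021".
    zlib wraps the DEFLATE stream in a small header and checksum, so very
    short inputs can give ratios above 1.
    """
    if not genotypes:
        raise ValueError("genotype string is empty")
    raw = genotypes.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


# A long homozygous run versus a seeded pseudo-random genotype sequence.
homozygous_run = "0" * 10_000
rng = random.Random(0)
mixed = "".join(rng.choice("012") for _ in range(10_000))

print(compression_ratio(homozygous_run))
print(compression_ratio(mixed))
```

    The homozygous run compresses far better than the mixed sequence, which is the kind of pattern the abstract attributes to runs of homozygosity. The sliding-window, population-level scan described in the abstract is not reproduced here.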