Transcriptome and genome sequencing uncovers functional variation in humans

[1] Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland [2] Institute for Genetics and Genomics in Geneva (iG3), University of Geneva, 1211 Geneva, Switzerland [3] Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland.
Nature (Impact Factor: 42.35). 09/2013; 501(7468). DOI: 10.1038/nature12531
Source: PubMed

ABSTRACT Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project, the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.


Available from: Xavier Estivill, Jul 29, 2015
    • "These profiles were normalized, averaged, smoothed, and centered on the exon midpoint. To investigate the impact of intronic structural variants on nucleosome localization (Figure 4C), we used the MNase-seq data above and corresponding sQTL (Lappalainen et al., 2013) and genotype data (Abecasis et al., 2012). We then compiled the MNase profiles of individuals with genotypes representing shorter and longer upstream introns. "
    ABSTRACT: Mammalian genes are composed of exons, but the evolutionary origins and functions of new internal exons are poorly understood. Here, we analyzed patterns of exon gain using deep cDNA sequencing data from five mammals and one bird, identifying thousands of species- and lineage-specific exons. Most new exons derived from unique rather than repetitive intronic sequence. Unlike exons conserved across mammals, species-specific internal exons were mostly located in 5' UTRs and alternatively spliced. They were associated with upstream intronic deletions, increased nucleosome occupancy, and RNA polymerase II pausing. Genes containing new internal exons had increased gene expression, but only in tissues in which the exon was included. Increased expression correlated with the level of exon inclusion, promoter proximity, and signatures of cotranscriptional splicing. Altogether, these findings suggest that increased splicing at the 5' ends of genes enhances expression and that changes in 5' end splicing alter gene expression between tissues and between species.
    Cell Reports 03/2015; 10(12). DOI:10.1016/j.celrep.2015.02.058 · 7.21 Impact Factor
    • "The Seqnature software and analysis approach can be applied to other next generation sequencing technologies such as ChIP-seq to identify allelic differences in transcription factor occupancy (Reddy et al. 2012) or DNase I sensitivity mapping to identify allele-specific chromatin activation (Degner et al. 2012). Previous eQTL studies in genetically diverse populations suggest that the most significant eQTL tend to be local (Rockman and Kruglyak 2006; Pickrell et al. 2010; Aylor et al. 2011; Lappalainen et al. 2013). Gene prioritization methods are becoming more important in the genome-wide association studies (GWAS) era (Hou and Zhao 2013), and genes with local eQTL are promising candidates for underlying disease-associated regions in human GWAS studies (Knight 2005; Chen et al. 2008; Emilsson et al. 2008; Musunuru et al. 2010; Hou and Zhao 2013; Li et al. 2013). "
    ABSTRACT: Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations.
    Genetics 09/2014; 198(1):59-73. DOI:10.1534/genetics.114.165886 · 4.87 Impact Factor
    • "Conservative estimations suggest that there are no < 250 LoF variants per sequenced genome, 100 of them known to be related to human diseases, and more than 30 in a homozygous state and predicted to be highly damaging (Xue et al, 2012), suggesting a previously unnoticed level of variation with putative functional consequences in protein-coding regions in humans (MacArthur & Tyler-Smith, 2010). Moreover, this apparently pathological variation is not restricted to coding regions but also seems to occur in other noncoding, regulatory elements, such as miRNAs (Carbonell et al, 2012), transcription factor binding sites (TFBSs) (Spivakov et al, 2012) and others (Lappalainen et al, 2013). The origin of this Goh et al, 2007; Wagner et al, 2007; Ideker & Sharan, 2008; Mitra et al, 2013), suggests that causative genes for the same disease often reside in the same biological module, which can be a protein complex (Lage et al, 2007), or a subnetwork of protein interactions (Lim et al, 2006; Ideker & Sharan, 2008; Vidal et al, 2011). "
    ABSTRACT: Recent genomic projects have revealed the existence of an unexpectedly large amount of deleterious variability in the human genome. Several hypotheses have been proposed to explain such an apparently high mutational load. However, the mechanisms by which deleterious mutations in some genes cause a pathological effect but are apparently innocuous in other genes remain largely unknown. This study searched for deleterious variants in the 1,000 genomes populations, as well as in a newly sequenced population of 252 healthy Spanish individuals. In addition, variants causative of monogenic diseases and somatic variants from 41 chronic lymphocytic leukaemia patients were analysed. The deleterious variants found were analysed in the context of the interactome to understand the role of network topology in the maintenance of the observed mutational load. Our results suggest that one of the mechanisms whereby the effect of these deleterious variants on the phenotype is suppressed could be related to the configuration of the protein interaction network. Most of the deleterious variants observed in healthy individuals are concentrated in peripheral regions of the interactome, in combinations that preserve their connectivity, and have a marginal effect on interactome integrity. On the contrary, likely pathogenic cancer somatic deleterious variants tend to occur in internal regions of the interactome, often with associated structural consequences. Finally, variants causative of monogenic diseases seem to occupy an intermediate position. Our observations suggest that the real pathological potential of a variant might be more a systems property rather than an intrinsic property of individual proteins.
    Molecular Systems Biology 09/2014; 10(9). DOI:10.15252/msb.20145222 · 14.10 Impact Factor