Transcriptome and genome sequencing uncovers functional variation in humans

[1] Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland [2] Institute for Genetics and Genomics in Geneva (iG3), University of Geneva, 1211 Geneva, Switzerland [3] Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland.
Nature (Impact Factor: 41.46). 09/2013; 501(7468). DOI: 10.1038/nature12531
Source: PubMed


Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project, the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.



Available from: Xavier Estivill, Oct 04, 2015
    • "To explore whether genotypes of SNPs influence expression, in silico analysis was performed using expression profile of data set series GSE6536 (Gene Expression Omnibus; Stranger et al., 2007), 1000 genomes data base (Abecasis et al., 2012) and Geuvadis RNA sequencing data base (Lappalainen et al., 2013). P-values were calculated comparing mean of expression values of genes having different genotypes by "
    ABSTRACT: Oral cancer is usually preceded by a precancerous lesion and is related to tobacco abuse. Tobacco carcinogens damage DNA, and cells harboring such damaged DNA normally undergo apoptotic death, but cancer cells are exceptionally resistant to apoptosis. Here we studied the association between sequence and expression variations in apoptotic pathway genes and the risk of oral cancer and precancer. Ninety-nine tagSNPs in 23 genes, involved in mitochondrial and non-mitochondrial apoptotic pathways, were genotyped in 525 cancer and 253 leukoplakia patients and 538 healthy controls using the Illumina GoldenGate assay. Six SNPs (rs1473418 at BCL2; rs1950252 at BCL2L2; rs8190315 at BID; rs511044 at CASP1; rs2227310 at CASP7 and rs13010627 at CASP10) significantly modified the risk of oral cancer, but only SNPs at BCL2, CASP1 and CASP10 modulated the risk of leukoplakia. Combinations of SNPs showed a steep increase in cancer risk with an increasing "effective" number of risk alleles. In silico analysis of a published data set and our unpublished RNA-seq data suggests that changes in expression of BID and CASP7 may have affected cancer risk. In conclusion, three SNPs, rs1473418 in BCL2, rs1950252 in BCL2L2 and rs511044 in CASP1, are implicated for the first time in oral cancer. Because SNPs at BCL2, CASP1 and CASP10 modulated the risk of both leukoplakia and cancer, they should be studied in more detail as possible biomarkers of the transition from leukoplakia to cancer. This study also implies the importance of mitochondrial apoptotic pathway genes (such as BCL2) in the progression of leukoplakia to oral cancer.
    Mitochondrion 09/2015; DOI:10.1016/j.mito.2015.09.001 · 3.25 Impact Factor
    • "These profiles were normalized, averaged, smoothed, and centered on the exon midpoint. To investigate the impact of intronic structural variants on nucleosome localization (Figure 4C), we used the MNase-seq data above and corresponding sQTL (Lappalainen et al., 2013) and genotype data (Abecasis et al., 2012). We then compiled the MNase profiles of individuals with genotypes representing shorter and longer upstream introns. "
    ABSTRACT: Mammalian genes are composed of exons, but the evolutionary origins and functions of new internal exons are poorly understood. Here, we analyzed patterns of exon gain using deep cDNA sequencing data from five mammals and one bird, identifying thousands of species- and lineage-specific exons. Most new exons derived from unique rather than repetitive intronic sequence. Unlike exons conserved across mammals, species-specific internal exons were mostly located in 5' UTRs and alternatively spliced. They were associated with upstream intronic deletions, increased nucleosome occupancy, and RNA polymerase II pausing. Genes containing new internal exons had increased gene expression, but only in tissues in which the exon was included. Increased expression correlated with the level of exon inclusion, promoter proximity, and signatures of cotranscriptional splicing. Altogether, these findings suggest that increased splicing at the 5' ends of genes enhances expression and that changes in 5' end splicing alter gene expression between tissues and between species. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
    Cell Reports 03/2015; 10(12). DOI:10.1016/j.celrep.2015.02.058 · 8.36 Impact Factor
    • "The Seqnature software and analysis approach can be applied to other next generation sequencing technologies such as ChIP-seq to identify allelic differences in transcription factor occupancy (Reddy et al. 2012) or DNase I sensitivity mapping to identify allele-specific chromatin activation (Degner et al. 2012). Previous eQTL studies in genetically diverse populations suggest that the most significant eQTL tend to be local (Rockman and Kruglyak 2006; Pickrell et al. 2010; Aylor et al. 2011; Lappalainen et al. 2013). Gene prioritization methods are becoming more important in the genome-wide association studies (GWAS) era (Hou and Zhao 2013), and genes with local eQTL are promising candidates for underlying disease-associated regions in human GWAS studies (Knight 2005; Chen et al. 2008; Emilsson et al. 2008; Musunuru et al. 2010; Hou and Zhao 2013; Li et al. 2013). "
    ABSTRACT: Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations.
    Genetics 09/2014; 198(1):59-73. DOI:10.1534/genetics.114.165886 · 5.96 Impact Factor