Alvis Brazma

EMBL-EBI, Cambridge, England, United Kingdom

Publications (170) · 1611.56 Total impact

  • ABSTRACT: The intronic SNP rs7903146 in the T-cell factor 7-like 2 gene (TCF7L2) is the common genetic variant most highly associated with Type 2 diabetes known to date. The risk T-allele is located in an open chromatin region specific to human pancreatic islets of Langerhans, thereby accessible for binding of regulatory proteins. The risk T-allele locus exhibits stronger enhancer activity compared to the non-risk C-allele. The aim of this study was to identify transcriptional regulators that bind the open chromatin region in the rs7903146 locus and thereby potentially regulate TCF7L2 expression and activity. Using affinity chromatography followed by Edman sequencing, we identified one candidate regulatory protein, i.e. high-mobility group protein B1 (HMGB1). The binding of HMGB1 to the rs7903146 locus was confirmed in pancreatic islets from human deceased donors, in HCT116 and in HEK293 cell lines using: (i) protein purification on affinity columns followed by Western blot, (ii) chromatin immunoprecipitation followed by qPCR and (iii) electrophoretic mobility shift assay. The results also suggested that HMGB1 might have higher binding affinity to the C-allele of rs7903146 compared to the T-allele, which was supported in vitro using Dynamic Light Scattering, possibly in a tissue-specific manner. The functional consequence of HMGB1 depletion in HCT116 and INS1 cells was reduced insulin and TCF7L2 mRNA expression, TCF7L2 transcriptional activity and glucose-stimulated insulin secretion. These findings suggest that the rs7903146 locus might exert its enhancer function by interacting with HMGB1 in an allele-dependent manner.
    No preview · Article · Feb 2016 · Molecular and Cellular Endocrinology
  • Jo McEntyre · Ugis Sarkans · Alvis Brazma

    Preview · Article · Dec 2015 · Molecular Systems Biology
  • Source
    ABSTRACT: Expression Atlas (http://www.ebi.ac.uk/gxa) provides information about gene and protein expression in animal and plant samples of different cell types, organism parts, developmental stages, diseases and other conditions. It consists of selected microarray and RNA-sequencing studies from ArrayExpress, which have been manually curated, annotated with ontology terms, checked for high quality and processed using standardised analysis methods. Since the last update, Atlas has grown seven-fold (1572 studies as of August 2015), and incorporates baseline expression profiles of tissues from Human Protein Atlas, GTEx and FANTOM5, and of cancer cell lines from ENCODE, CCLE and Genentech projects. Plant studies constitute a quarter of Atlas data. For genes of interest, the user can view baseline expression in tissues, and differential expression for biologically meaningful pairwise comparisons—estimated using consistent methodology across all of Atlas. Our first proteomics study in human tissues is now displayed alongside transcriptomics data in the same tissues. Novel analyses and visualisations include: ‘enrichment’ in each differential comparison of GO terms, Reactome, Plant Reactome pathways and InterPro domains; hierarchical clustering (by baseline expression) of most variable genes and experimental conditions; and, for a given gene-condition, distribution of baseline expression across biological replicates.
    Full-text · Article · Oct 2015 · Nucleic Acids Research
  • Source
    ABSTRACT: Background: Although high-throughput studies of gene expression have generated large amounts of data, most of which is freely available in public archives, the use of this valuable resource is limited by computational complications and non-homogeneous annotation. To address these issues, we have performed a complete re-annotation of public microarray data from human skeletal muscle biopsies and constructed a muscle expression compendium consisting of nearly 3000 samples. The created muscle compendium is a publicly available resource including all curated annotation. Using this data set, we aimed to elucidate the molecular mechanism of muscle aging and to describe how physical exercise may alleviate negative physiological effects. Results: We find 957 genes to be significantly associated with aging (p < 0.05, FDR = 5 %, n = 361). Aging was associated with perturbation of many central metabolic pathways like mitochondrial function including reduced expression of genes in the ATP synthase, NADH dehydrogenase, cytochrome C reductase and oxidase complexes, as well as in glucose and pyruvate processing. Among the genes with the strongest association with aging were H3 histone, family 3B (H3F3B, p = 3.4 × 10^-13), AHNAK nucleoprotein, desmoyokin (AHNAK, p = 6.9 × 10^-12), and histone deacetylase 4 (HDAC4, p = 4.0 × 10^-9). We also discover genes previously not linked to muscle aging and metabolism, such as fasciculation and elongation protein zeta 2 (FEZ2, p = 2.8 × 10^-8). Out of the 957 genes associated with aging, 21 (p < 0.001, false discovery rate = 5 %, n = 116) were also associated with maximal oxygen consumption (VO2MAX). Strikingly, 20 out of those 21 genes are regulated in the opposite direction when comparing increasing age with increasing VO2MAX. Conclusions: These results support that mitochondrial dysfunction is a major age-related factor and also highlight the beneficial effects of maintaining a high physical capacity for prevention of age-related sarcopenia.
    Full-text · Article · Oct 2015 · Skeletal Muscle
  • Source
    ABSTRACT: Phenotypic differences between species are driven by changes in gene expression and, by extension, by modifications in the regulation of the transcriptome. Investigation of mammalian transcriptome divergence has been restricted to analysis of bulk gene expression levels and gene-internal splicing. Using allele-specific expression analysis in inter-strain hybrids of Mus musculus, we determined the contribution of multiple cellular regulatory systems to transcriptome divergence, including: alternative promoter usage, transcription start site selection, cassette exon usage, alternative last exon usage, and alternative polyadenylation site choice. Between mouse strains, a fifth of genes have variations in isoform usage that contribute to transcriptomic changes, half of which alter encoded amino acid sequence. Virtually all divergence in isoform usage altered the post-transcriptional regulatory instructions in gene UTRs. Furthermore, most genes with isoform differences between strains contain changes originating from multiple regulatory systems. This result indicates widespread cross-talk and coordination exists among different regulatory systems. Overall, isoform usage diverges in parallel with and independently to gene expression evolution, and the cis and trans regulatory contribution to each differs significantly.
    Preview · Article · Sep 2015 · PLoS ONE
  • Source
    ABSTRACT: A vast amount of DNA variation is being identified by increasingly large-scale exome and genome sequencing projects. To be useful, variants require accurate functional annotation and a wide range of tools are available to this end. McCarthy et al recently demonstrated the large differences in prediction of loss-of-function (LoF) variation when RefSeq and Ensembl transcripts are used for annotation, highlighting the importance of the reference transcripts on which variant functional annotation is based. We describe a detailed analysis of the similarities and differences between the gene and transcript annotation in the GENCODE and RefSeq genesets. We demonstrate that the GENCODE Comprehensive set is richer in alternative splicing, novel CDSs, novel exons and has higher genomic coverage than RefSeq, while the GENCODE Basic set is very similar to RefSeq. Using RNAseq data we show that exons and introns unique to one geneset are expressed at a similar level to those common to both. We present evidence that the differences in gene annotation lead to large differences in variant annotation where GENCODE and RefSeq are used as reference transcripts, although this is predominantly confined to non-coding transcripts and UTR sequence, with at most ~30% of LoF variants annotated discordantly. We also describe an investigation of dominant transcript expression, showing that it both supports the utility of the GENCODE Basic set in providing a smaller set of more highly expressed transcripts and provides a useful, biologically-relevant filter for further reducing the complexity of the transcriptome. The reference transcripts selected for variant functional annotation do have a large effect on the outcome. 
    The GENCODE Comprehensive transcripts contain more exons, have greater genomic coverage and capture many more variants than RefSeq in both genome and exome datasets, while the GENCODE Basic set shows a higher degree of concordance with RefSeq and has fewer unique features. We propose that the GENCODE Comprehensive set has great utility for the discovery of new variants with functional potential, while the GENCODE Basic set is more suitable for applications demanding less complex interpretation of functional variants.
    Full-text · Article · Jun 2015 · BMC Genomics
  • Source
    ABSTRACT: Motivation: Public and private repositories of experimental data are growing to sizes that require dedicated methods for finding relevant data. To improve on the state of the art of keyword searches from annotations, methods for content-based retrieval have been proposed. In the context of gene expression experiments, most methods retrieve gene expression profiles, requiring each experiment to be expressed as a single profile, typically of case vs. control. A more general, recently suggested alternative is to retrieve experiments whose models are good for modelling the query dataset. However, for very noisy and high-dimensional query data, this retrieval criterion turns out to be very noisy as well. Results: We propose doing retrieval using a denoised model of the query dataset, instead of the original noisy dataset itself. To this end, we introduce a general probabilistic framework, where each experiment is modelled separately and the retrieval is done by finding related models. For retrieval of gene expression experiments, we use a probabilistic model called product partition model, which induces a clustering of genes that show similar expression patterns across a number of samples. We then show empirically that inference for the full probabilistic model can be approximated with good performance using the computationally fast k-means clustering algorithm. The suggested metric for retrieval using clusterings is the normalized information distance. The method is highly scalable and straightforward to apply to construct a general-purpose gene expression experiment retrieval method. Availability: The method can be implemented using only standard k-means and normalized information distance, available in many standard statistical software packages.
    Full-text · Article · May 2015 · Bioinformatics
  • Source
    ABSTRACT: Motivation: The Cellular Phenotype Database (CPD) is a repository for data derived from high-throughput systems microscopy studies. The aims of this resource are: (i) to provide easy access to cellular phenotype and molecular localization data for the broader research community; (ii) to facilitate integration of independent phenotypic studies by means of data aggregation techniques, including use of an ontology and (iii) to facilitate development of analytical methods in this field. Results: In this article we present CPD, its data structure and user interface, propose a minimal set of information describing RNA interference experiments, and suggest a generic schema for management and aggregation of outputs from phenotypic or molecular localization experiments. The database has a flexible structure for management of data from heterogeneous sources of systems microscopy experimental outputs generated by a variety of protocols and technologies and can be queried by gene, reagent, gene attribute, study keywords, phenotype or ontology terms. Availability and implementation: CPD is developed as part of the Systems Microscopy Network of Excellence and is accessible at http://www.ebi.ac.uk/fg/sym. Contact: jes@ebi.ac.uk or ugis@ebi.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
    Full-text · Article · Apr 2015 · Bioinformatics
  • Source
    ABSTRACT: The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is an international functional genomics database at the European Bioinformatics Institute (EMBL-EBI) recommended by most journals as a repository for data supporting peer-reviewed publications. It contains data from over 7000 public sequencing and 42 000 array-based studies comprising over 1.5 million assays in total. The proportion of sequencing-based submissions has grown significantly over the last few years and has doubled in the last 18 months, whilst the rate of microarray submissions is growing slightly. All data in ArrayExpress are available in the MAGE-TAB format, which allows robust linking to data analysis and visualization tools and standardized analysis. The main development over the last two years has been the release of a new data submission tool Annotare, which has reduced the average submission time almost 3-fold. In the near future, Annotare will become the only submission route into ArrayExpress, alongside MAGE-TAB format-based pipelines. ArrayExpress is a stable and highly accessed resource. Our future tasks include automation of data flows and further integration with other EMBL-EBI resources for the representation of multi-omics data.
    Full-text · Article · Oct 2014 · Nucleic Acids Research
  • ABSTRACT: One purpose of the biomedical literature is to report results in sufficient detail that the methods of data collection and analysis can be independently replicated and verified. Here we present reporting guidelines for gene expression localization experiments: the minimum information specification for in situ hybridization and immunohistochemistry experiments (MISFISHIE). MISFISHIE is modeled after the Minimum Information About a Microarray Experiment (MIAME) specification for microarray experiments. Both guidelines define what information should be reported without dictating a format for encoding that information. MISFISHIE describes six types of information to be provided for each experiment: experimental design, biomaterials and treatments, reporters, staining, imaging data and image characterizations. This specification has benefited the consortium within which it was developed and is expected to benefit the wider research community. We welcome feedback from the scientific community to help improve our proposal.
    No preview · Article · Oct 2014 · Nature Biotechnology
  • Source
    ABSTRACT: The incidence of renal cell carcinoma (RCC) is increasing worldwide, and its prevalence is particularly high in some parts of Central Europe. Here we undertake whole-genome and transcriptome sequencing of clear cell RCC (ccRCC), the most common form of the disease, in patients from four different European countries with contrasting disease incidence to explore the underlying genomic architecture of RCC. Our findings support previous reports on frequent aberrations in the epigenetic machinery and PI3K/mTOR signalling, and uncover novel pathways and genes affected by recurrent mutations and abnormal transcriptome patterns including focal adhesion, components of extracellular matrix (ECM) and genes encoding FAT cadherins. Furthermore, a large majority of patients from Romania have an unexpected high frequency of A:T>T:A transversions, consistent with exposure to aristolochic acid (AA). These results show that the processes underlying ccRCC tumorigenesis may vary in different populations and suggest that AA may be an important ccRCC carcinogen in Romania, a finding with major public health implications.
    Full-text · Article · Oct 2014 · Nature Communications
  • Nuno A Fonseca · John Marioni · Alvis Brazma
    ABSTRACT: Accurately quantifying gene expression levels is a key goal of experiments using RNA-sequencing to assay the transcriptome. This typically requires aligning the short reads generated to the genome or transcriptome before quantifying expression of pre-defined sets of genes. Differences in the alignment/quantification tools can have a major effect upon the expression levels found with important consequences for biological interpretation. Here we address two main issues: do different analysis pipelines affect the gene expression levels inferred from RNA-seq data? And, how close are the expression levels inferred to the “true” expression levels? We evaluate fifty gene profiling pipelines in experimental and simulated data sets with different characteristics (e.g., read length and sequencing depth). In the absence of knowledge of the ‘ground truth’ in real RNAseq data sets, we used simulated data to assess the differences between the “true” expression and those reconstructed by the analysis pipelines. Even though this approach does not take into account all known biases present in RNAseq data, it still allows the accuracy of the gene expression values inferred by different analysis pipelines to be estimated. The results show that i) overall there is a high correlation between the expression levels inferred by the best pipelines and the true quantification values; ii) the error in the estimated gene expression values can vary considerably across genes; and iii) a small set of genes have expression estimates with consistently high error (across data sets and methods). Finally, although the mapping software is important, the quantification method makes a greater difference to the results.
    Full-text · Article · Sep 2014 · PLoS ONE
  • Source
    ABSTRACT: Chimeric RNAs originating from two or more different genes are known to exist not only in cancer, but also in normal tissues, where they can play a role in human evolution. However, the exact mechanism of their formation is unknown. Here, we use RNA sequencing data from 462 healthy individuals representing 5 human populations to systematically identify and characterize in depth 81 RNA tandem chimeric transcripts, 13 of which are novel. We observe that 6 out of these 81 chimeras have been regarded as cancer-specific. Moreover, we show that a prevalence of long introns at the fusion breakpoint is associated with chimeric transcript formation. We also find that tandem RNA chimeras have lower abundances as compared to their partner genes. Finally, by combining our results with genomic data from the same individuals we uncover intronic genetic variants associated with chimeric RNA formation. Taken together, our findings provide an important insight into chimeric transcript formation and open new avenues of research into the role of intronic genetic variants in post-transcriptional processing events.
    Full-text · Article · Aug 2014 · PLoS ONE
  • Source
    ABSTRACT: The cooperation of transcriptional and post-transcriptional levels of control to shape gene regulation is only partially understood. Here we show that a combination of two simple and non-invasive genomic techniques, coupled with kinetic mathematical modeling, affords insight into the intricate dynamics of RNA regulation in response to oxidative stress in the fission yeast Schizosaccharomyces pombe. This study reveals a dominant role of transcriptional regulation in response to stress, but also points to the first minutes after stress induction as a critical time when the coordinated control of mRNA turnover can support the control of transcription for rapid gene regulation. In addition, we uncover specialized gene expression strategies associated with distinct functional gene groups, such as simultaneous transcriptional repression and mRNA destabilization for genes encoding ribosomal proteins, delayed mRNA destabilization with varying contribution of transcription for ribosome biogenesis genes, dominant roles of mRNA stabilization for genes functioning in protein degradation, and adjustment of both transcription and mRNA turnover during the adaptation to stress. We also show that genes regulated independently of the bZIP transcription factor Atf1p are predominantly controlled by mRNA turnover, and identify putative cis-regulatory sequences that are associated with different gene expression strategies during the stress response. This study highlights the intricate and multi-faceted interplay between transcription and RNA turnover during the dynamic regulatory response to stress.
    Full-text · Article · Jul 2014 · RNA Biology

  • No preview · Poster · Mar 2014

  • No preview · Article · Mar 2014 · Neuromuscular Disorders
  • Source
    ABSTRACT: Expression Atlas (http://www.ebi.ac.uk/gxa) is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version of Expression Atlas introduces the concept of ‘baseline’ expression, i.e. gene and splice variant abundance levels in healthy or untreated conditions, such as tissues or cell types. Differential gene expression data benefit from an in-depth curation of experimental intent, resulting in biologically meaningful ‘contrasts’, i.e. instances of differential pairwise comparisons between two sets of biological replicates. Other novel aspects of Expression Atlas are its strict quality control of raw experimental data, up-to-date RNA-sequencing analysis methods, expression data at the level of gene sets, as well as genes and a more powerful search interface designed to maximize the biological value provided to the user.
    Full-text · Article · Dec 2013 · Nucleic Acids Research
  • Source
    ABSTRACT: The BioSamples database at the EBI (http://www.ebi.ac.uk/biosamples) provides an integration point for BioSamples information between technology specific databases at the EBI, projects such as ENCODE and reference collections such as cell lines. The database delivers a unified query interface and API to query sample information across EBI’s databases and provides links back to assay databases. Sample groups are used to manage related samples, e.g. those from an experimental submission, or a single reference collection. Infrastructural improvements include a new user interface with ontological and key word queries, a new query API, a new data submission API, complete RDF data download and a supporting SPARQL endpoint, accessioning at the point of submission to the European Nucleotide Archive and European Genotype Phenotype Archives and improved query response times.
    Full-text · Article · Nov 2013 · Nucleic Acids Research
  • Source
    ABSTRACT: Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project, the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.
    Full-text · Article · Sep 2013 · Nature
  • Source
    ABSTRACT: RNA sequencing is an increasingly popular technology for genome-wide analysis of transcript sequence and abundance. However, understanding of the sources of technical and interlaboratory variation is still limited. To address this, the GEUVADIS consortium sequenced mRNAs and small RNAs of lymphoblastoid cell lines of 465 individuals in seven sequencing centers, with a large number of replicates. The variation between laboratories appeared to be considerably smaller than the already limited biological variation. Laboratory effects were mainly seen in differences in insert size and GC content and could be adequately corrected for. In small-RNA sequencing, the microRNA (miRNA) content differed widely between samples owing to competitive sequencing of rRNA fragments. This did not affect relative quantification of miRNAs. We conclude that distributing RNA sequencing among different laboratories is feasible, given proper standardization and randomization procedures. We provide a set of quality measures and guidelines for assessing technical biases in RNA-seq data.
    Full-text · Article · Sep 2013 · Nature Biotechnology

Publication Stats

17k Citations
1,611.56 Total Impact Points

Institutions

  • 1998-2015
    • EMBL-EBI
      • Functional Genomics Group
      Cambridge, England, United Kingdom
  • 2012
    • Wellcome Trust
      London, England, United Kingdom
  • 2010-2012
    • Cancer Research UK Cambridge Institute
      Cambridge, England, United Kingdom
  • 2007-2012
    • Wellcome Trust Sanger Institute
      Cambridge, England, United Kingdom
  • 2009-2011
    • University of Cambridge
      Cambridge, England, United Kingdom
    • Max Planck Institute for Molecular Genetics
      • Department of Computational Molecular Biology
      Berlin, Berlin, Germany
  • 2008
    • European Molecular Biology Laboratory
      Heidelberg, Baden-Württemberg, Germany
  • 2005
    • British Antarctic Survey
      Cambridge, England, United Kingdom
    • University College Dublin
      • Conway Institute of Biomolecular & Biomedical Research
      Dublin, Leinster, Ireland
  • 1996-2005
    • University of Latvia
      • Institute of Mathematics and Computer Science
      Riga, Riga, Latvia
  • 2004
    • Stanford University
      • Department of Biochemistry
      Palo Alto, California, United States
  • 2002
    • Harvard University
      Cambridge, Massachusetts, United States
    • Instituto de Bioinformatica e Biotecnologia
      Natal, Rio Grande do Norte, Brazil