Alvis Brazma

EMBL-EBI, Cambridge, England, United Kingdom

Publications (156) · 1365.21 Total impact

  • ABSTRACT: The cooperation of transcriptional and post-transcriptional levels of control to shape gene regulation is only partially understood. Here we show that a combination of two simple and non-invasive genomic techniques, coupled with kinetic mathematical modeling, affords insight into the intricate dynamics of RNA regulation in response to oxidative stress in the fission yeast Schizosaccharomyces pombe. This study reveals a dominant role of transcriptional regulation in response to stress, but also points to the first minutes after stress induction as a critical time when the coordinated control of mRNA turnover can support the control of transcription for rapid gene regulation. In addition, we uncover specialized gene expression strategies associated with distinct functional gene groups, such as simultaneous transcriptional repression and mRNA destabilization for genes encoding ribosomal proteins, delayed mRNA destabilization with varying contribution of transcription for ribosome biogenesis genes, dominant roles of mRNA stabilization for genes functioning in protein degradation, and adjustment of both transcription and mRNA turnover during the adaptation to stress. We also show that genes regulated independently of the bZIP transcription factor Atf1p are predominantly controlled by mRNA turnover, and identify putative cis-regulatory sequences that are associated with different gene expression strategies during the stress response. This study highlights the intricate and multi-faceted interplay between transcription and RNA turnover during the dynamic regulatory response to stress.
    RNA Biology 07/2014; 11(6).
  • Neuromuscular Disorders 01/2014; 24:S22-S23. · 3.46 Impact Factor
  • ABSTRACT: Chimeric RNAs originating from two or more different genes are known to exist not only in cancer, but also in normal tissues, where they can play a role in human evolution. However, the exact mechanism of their formation is unknown. Here, we use RNA sequencing data from 462 healthy individuals representing 5 human populations to systematically identify and characterize in depth 81 RNA tandem chimeric transcripts, 13 of which are novel. We observe that 6 out of these 81 chimeras have been regarded as cancer-specific. Moreover, we show that a prevalence of long introns at the fusion breakpoint is associated with chimeric transcript formation. We also find that tandem RNA chimeras have lower abundances as compared to their partner genes. Finally, by combining our results with genomic data from the same individuals we uncover intronic genetic variants associated with chimeric RNA formation. Taken together, our findings provide an important insight into chimeric transcript formation and open new avenues of research into the role of intronic genetic variants in post-transcriptional processing events.
    PLoS ONE 01/2014; 9(8):e104567. · 3.73 Impact Factor
  • ABSTRACT: Expression Atlas is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version of Expression Atlas introduces the concept of 'baseline' expression, i.e. gene and splice variant abundance levels in healthy or untreated conditions, such as tissues or cell types. Differential gene expression data benefit from an in-depth curation of experimental intent, resulting in biologically meaningful 'contrasts', i.e. instances of differential pairwise comparisons between two sets of biological replicates. Other novel aspects of Expression Atlas are its strict quality control of raw experimental data, up-to-date RNA-sequencing analysis methods, expression data at the level of gene sets, as well as genes and a more powerful search interface designed to maximize the biological value provided to the user.
    Nucleic Acids Research 12/2013; · 8.81 Impact Factor
  • ABSTRACT: The BioSamples database at the EBI provides an integration point for BioSamples information between technology specific databases at the EBI, projects such as ENCODE and reference collections such as cell lines. The database delivers a unified query interface and API to query sample information across EBI's databases and provides links back to assay databases. Sample groups are used to manage related samples, e.g. those from an experimental submission, or a single reference collection. Infrastructural improvements include a new user interface with ontological and key word queries, a new query API, a new data submission API, complete RDF data download and a supporting SPARQL endpoint, accessioning at the point of submission to the European Nucleotide Archive and European Genotype Phenotype Archives and improved query response times.
    Nucleic Acids Research 11/2013; · 8.81 Impact Factor
  • ABSTRACT: RNA sequencing is an increasingly popular technology for genome-wide analysis of transcript sequence and abundance. However, understanding of the sources of technical and interlaboratory variation is still limited. To address this, the GEUVADIS consortium sequenced mRNAs and small RNAs of lymphoblastoid cell lines of 465 individuals in seven sequencing centers, with a large number of replicates. The variation between laboratories appeared to be considerably smaller than the already limited biological variation. Laboratory effects were mainly seen in differences in insert size and GC content and could be adequately corrected for. In small-RNA sequencing, the microRNA (miRNA) content differed widely between samples owing to competitive sequencing of rRNA fragments. This did not affect relative quantification of miRNAs. We conclude that distributing RNA sequencing among different laboratories is feasible, given proper standardization and randomization procedures. We provide a set of quality measures and guidelines for assessing technical biases in RNA-seq data.
    Nature Biotechnology 09/2013; · 32.44 Impact Factor
  • ABSTRACT: Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project, the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.
    Nature 09/2013; · 38.60 Impact Factor
  • ABSTRACT: To mechanistically characterize the microevolutionary processes active in altering transcription factor (TF) binding among closely related mammals, we compared the genome-wide binding of three tissue-specific TFs that control liver gene expression in six rodents. Despite an overall fast turnover of TF binding locations between species, we identified thousands of TF regions of highly constrained TF binding intensity. Although individual mutations in bound sequence motifs can influence TF binding, most binding differences occur in the absence of nearby sequence variations. Instead, combinatorial binding was found to be significant for genetic and evolutionary stability; cobound TFs tend to disappear in concert and were sensitive to genetic knockout of partner TFs. The large, qualitative differences in genomic regions bound between closely related mammals, when contrasted with the smaller, quantitative TF binding differences among Drosophila species, illustrate how genome structure and population genetics together shape regulatory evolution.
    Cell 08/2013; 154(3):530-40. · 31.96 Impact Factor
  • ABSTRACT: RNA sequencing has opened new avenues for the study of transcriptome composition. Significant evidence has accumulated showing that the human transcriptome contains in excess of a hundred thousand different transcripts. However, it is still not clear to what extent this diversity prevails when considering the relative abundances of different transcripts from the same gene. Here we show that, in a given condition, most protein coding genes have one major transcript expressed at significantly higher level than others, that in human tissues the major transcripts contribute almost 85 percent to the total mRNA from protein coding loci, and that often the same major transcript is expressed in many tissues. We detect a high degree of overlap between the set of major transcripts and a recently published set of alternatively spliced transcripts that are predicted to be translated utilizing proteomic data. Thus, we hypothesize that although some minor transcripts may play a functional role, the major ones are likely to be the main contributors to the proteome. However, we still detect a non-negligible fraction of protein coding genes for which the major transcript does not code a protein. Overall, our findings suggest that the transcriptome from protein coding loci is dominated by one transcript per gene and that not all the transcripts that contribute to transcriptome diversity are equally likely to contribute to protein diversity. This observation can help to prioritize candidate targets in proteomics research and to predict the functional impact of the detected changes in variation studies.
    Genome Biology 07/2013; 14(7):R70. · 10.30 Impact Factor
  • ABSTRACT: Genes for the production of a broad range of fungal secondary metabolites are frequently colinear. The prevalence of such gene clusters was systematically examined across the genome of the cereal pathogen Fusarium graminearum. The topological structure of transcriptional networks was also examined to investigate control mechanisms for mycotoxin biosynthesis and other processes. The genes associated with transcriptional processes were identified, and the genomic location of transcription-associated proteins (TAPs) analyzed in conjunction with the locations of genes exhibiting similar expression patterns. Highly conserved TAPs reside in regions of chromosomes with very low or no recombination, contrasting with putative regulator genes. Co-expression group profiles were used to define positionally clustered genes and a number of members of these clusters encode proteins participating in secondary metabolism. Gene expression profiles suggest there is an abundance of condition-specific transcriptional regulation. Analysis of the promoter regions of co-expressed genes showed enrichment for conserved DNA-sequence motifs. Potential global transcription factors recognising these motifs contain distinct sets of DNA-binding domains (DBDs) from those present in local regulators. Proteins associated with basal transcriptional functions are encoded by genes enriched in regions of the genome with low recombination. Systematic searches revealed dispersed and compact clusters of co-expressed genes, often containing a transcription factor, and typically containing genes involved in biosynthetic pathways. Transcriptional networks exhibit a layered structure in which the position in the hierarchy of a regulator is closely linked to the DBD structural class.
    BMC Systems Biology 06/2013; 7(1):52. · 2.98 Impact Factor
  • ABSTRACT: Oncogenic fusion genes that involve kinases have proven to be effective targets for therapy in a wide range of cancers. Unfortunately, the diagnostic approaches required to identify these events are struggling to keep pace with the diverse array of genetic alterations that occur in cancer. Diagnostic screening in solid tumours is particularly challenging, as many fusion genes occur with a low frequency. To overcome these limitations, we developed a capture enrichment strategy to enable high throughput transcript sequencing of the human kinome. This approach provides a global overview of kinase fusion events, irrespective of the identity of the fusion partner. To demonstrate the utility of this system we profiled one hundred non-small cell lung cancers and identified numerous genetic alterations impacting Fibroblast Growth Factor Receptor 3 (FGFR3) in lung squamous cell carcinoma and a novel ALK fusion partner in lung adenocarcinoma.
    The Journal of Pathology 05/2013; · 7.59 Impact Factor
  • ABSTRACT: Rapid accumulation of large and standardized microarray data collections is opening up novel opportunities for holistic characterization of genome function. The limited scalability of current preprocessing techniques has, however, formed a bottleneck for full utilization of these data resources. Although short oligonucleotide arrays constitute a major source of genome-wide profiling data, scalable probe-level techniques have been available only for few platforms based on pre-calculated probe effects from restricted reference training sets. To overcome these key limitations, we introduce a fully scalable online-learning algorithm for probe-level analysis and pre-processing of large microarray atlases involving tens of thousands of arrays. In contrast to the alternatives, our algorithm scales up linearly with respect to sample size and is applicable to all short oligonucleotide platforms. The model can use the most comprehensive data collections available to date to pinpoint individual probes affected by noise and biases, providing tools to guide array design and quality control. This is the only available algorithm that can learn probe-level parameters based on sequential hyperparameter updates at small consecutive batches of data, thus circumventing the extensive memory requirements of the standard approaches and opening up novel opportunities to take full advantage of contemporary microarray collections.
    Nucleic Acids Research 04/2013; · 8.81 Impact Factor
  • Johan Rung, Alvis Brazma
    ABSTRACT: Our understanding of gene expression has changed dramatically over the past decade, largely catalysed by technological developments. High-throughput experiments - microarrays and next-generation sequencing - have generated large amounts of genome-wide gene expression data that are collected in public archives. Added-value databases process, analyse and annotate these data further to make them accessible to every biologist. In this Review, we discuss the utility of the gene expression data that are in the public domain and how researchers are making use of these data. Reuse of public data can be very powerful, but there are many obstacles in data preparation and analysis and in the interpretation of the results. We will discuss these challenges and provide recommendations that we believe can improve the utility of such data.
    Nature Reviews Genetics 12/2012; · 41.06 Impact Factor
  • ABSTRACT: The paper proposes a hybrid system based approach for modelling of intracellular networks and introduces a restricted subclass of hybrid systems - HSM - with an objective of still being able to provide sufficient power for the modelling of biological systems, while imposing some restrictions that facilitate analysis of systems described by such models. The use of hybrid system based models has become increasingly popular, likely due to the facts that: 1) they provide a sufficiently powerful mathematical formalism to describe biological processes of interest and do so in a 'natural way' from the biological perspective; 2) there are well established mathematical techniques as well as supporting software tools for analyzing such models. However, these models are often very dependent on the quantitative parameters of the system (concentrations of proteins, their growth functions etc.) that are seldom exactly known, rather than on the more limited information about the system that can be observed in practice (directions of change in concentrations, but not the exact values etc.). As a result, these models may work well for simulation of the system (prediction of its state starting from some initial conditions), but are too complicated for prediction of all possible qualitatively different behaviours a modelled system might have. With HSM we try to propose a hybrid system based formalism that is still sufficiently powerful for description of biological systems, while being as restricted as possible to facilitate the analysis of the systems described. We distinguish between the quantitative system parameters and their qualitative values that can be observed in practice. For HSM we provide an algorithm that analyses the system without the need to know the exact parameter values. We apply our model and analysis methods to a well-studied gene network of lambda phage. The phage has two well-known qualitatively different behaviours - lysis and lysogeny. We show that our model has an attractor structure that corresponds well to these two behaviours and that these are the only stable behaviours that can be exhibited by the system. The algorithm also generates (in principle biologically verifiable) hypotheses about the mutations of lambda phage that should change its observable behaviour.
    Gene 12/2012; · 2.20 Impact Factor
  • ABSTRACT: The ArrayExpress Archive of Functional Genomics Data is one of three international functional genomics public data repositories, alongside the Gene Expression Omnibus at NCBI and the DDBJ Omics Archive, supporting peer-reviewed publications. It accepts data generated by sequencing or array-based technologies and currently contains data from almost a million assays, from over 30 000 experiments. The proportion of sequencing-based submissions has grown significantly over the last 2 years and has reached, in 2012, 15% of all new data. All data are available from ArrayExpress in MAGE-TAB format, which allows robust linking to data analysis and visualization tools, including Bioconductor and GenomeSpace. Additionally, R objects, for microarray data, and binary alignment format files, for sequencing data, have been generated for a significant proportion of ArrayExpress data.
    Nucleic Acids Research 11/2012; · 8.81 Impact Factor
  • ABSTRACT: MOTIVATION: A ubiquitous and fundamental step in high-throughput sequencing analysis is the alignment (mapping) of the generated reads to a reference sequence. To accomplish this task numerous software tools have been proposed. Determining the mappers that are most suitable for a specific application is not trivial. RESULTS: This survey focuses on classifying mappers through a wide number of characteristics. The goal is to allow practitioners to compare the mappers more easily and find those that are most suitable for their specific problem. AVAILABILITY: A regularly updated compendium of mappers can be found at CONTACT: SUPPLEMENTARY INFORMATION: Supplementary information on this manuscript is available online.
    Bioinformatics 10/2012; · 5.47 Impact Factor
  • ABSTRACT: Gene expression levels are thought to diverge primarily via regulatory mutations in trans within species, and in cis between species. To test this hypothesis in mammals we used RNA-sequencing to measure gene expression divergence between C57BL/6J and CAST/EiJ mouse strains and allele-specific expression in their F1 progeny. We identified 535 genes with parent-of-origin specific expression patterns, although few of these showed full allelic silencing. This suggests that the number of imprinted genes in a typical mouse somatic tissue is relatively small. In the set of non-imprinted genes, 32% showed evidence of divergent expression between the two strains. Of these, 2% could be attributed purely to variants acting in trans, while 43% were attributable only to variants acting in cis. The genes with expression divergence driven by changes in trans showed significantly higher sequence constraint than genes where the divergence was explained by variants acting in cis. The remaining genes with divergent patterns of expression (55%) were regulated by a combination of variants acting in cis and variants acting in trans. Intriguingly, the changes in expression induced by the cis and trans variants were in opposite directions more frequently than expected by chance, implying that compensatory regulation to stabilize gene expression levels is widespread. We propose that expression levels of genes regulated by this mechanism are fine-tuned by cis variants that arise following regulatory changes in trans, suggesting that many cis variants are not the primary targets of natural selection.
    Genome Research 08/2012; · 14.40 Impact Factor

Publication Stats

10k Citations
1,365.21 Total Impact Points


  • 1998–2014
    • EMBL-EBI
      Cambridge, England, United Kingdom
  • 2012
    • CUNY Graduate Center
      New York City, New York, United States
  • 2003–2012
    • Wellcome Trust Sanger Institute
      Cambridge, England, United Kingdom
  • 2011
    • Aalto University
      • Department of Information and Computer Science
      Helsinki, Province of Southern Finland, Finland
    • Dana-Farber Cancer Institute
      Boston, Massachusetts, United States
  • 2009
    • Helsinki Institute for Information Technology HIIT
      Helsinki, Province of Southern Finland, Finland
    • University College London
      • Department of Genetics, Evolution and Environment (GEE)
      London, England, United Kingdom
    • Max Planck Institute for Molecular Genetics
      • Department of Computational Molecular Biology
      Berlin, Land Berlin, Germany
  • 2008
    • European Molecular Biology Laboratory
      Heidelberg, Baden-Württemberg, Germany
  • 2007
    • King's College London
      • Department of Medical and Molecular Genetics
      London, England, United Kingdom
  • 1996–2007
    • University of Latvia
      • Institute of Mathematics and Computer Science
      Riga, Riga, Latvia
  • 2004–2006
    • Stanford University
      • Department of Biochemistry
      Stanford, California, United States
  • 2005
    • University of Cambridge
      Cambridge, England, United Kingdom
    • British Antarctic Survey
      Cambridge, England, United Kingdom
    • University College Dublin
      Dublin, Leinster, Ireland
  • 2002
    • University of Helsinki
      • Department of Computer Science
      Helsinki, Province of Southern Finland, Finland
    • University of California, Berkeley
      • Department of Molecular and Cell Biology
      Berkeley, California, United States