Yoseph Barash

University of Toronto, Toronto, Ontario, Canada

Publications (14) · 90.78 Total impact

  • Article: Bayesian prediction of tissue-regulated splicing using RNA sequence and cellular context.
    Hui Yuan Xiong, Yoseph Barash, Brendan J Frey
    ABSTRACT: Alternative splicing is a major contributor to cellular diversity in mammalian tissues and relates to many human diseases. An important goal in understanding this phenomenon is to infer a 'splicing code' that predicts how splicing is regulated in different cell types by features derived from RNA, DNA and epigenetic modifiers. We formulate the assembly of a splicing code as a problem of statistical inference and introduce a Bayesian method that uses an adaptively selected number of hidden variables to combine subgroups of features into a network, allows different tissues to share feature subgroups and uses a Gibbs sampler to hedge predictions and ascertain the statistical significance of identified features. Using data for 3665 cassette exons, 1014 RNA features and 4 tissue types derived from 27 mouse tissues (http://genes.toronto.edu/wasp), we benchmarked several methods. Our method outperforms all others, and achieves relative improvements of 52% in splicing code quality and up to 22% in classification error, compared with the state of the art. Novel combinations of regulatory features and novel combinations of tissues that share feature subgroups were identified using our method. Contact: frey@psi.toronto.edu. Supplementary data are available at Bioinformatics online.
    Bioinformatics 07/2011; 27(18):2554-62. · 5.47 Impact Factor
  • Article: Model-based detection of alternative splicing signals.
    ABSTRACT: Transcripts from approximately 95% of human multi-exon genes are subject to alternative splicing (AS). The growing interest in AS is propelled by its prominent contribution to transcriptome and proteome complexity and the role of aberrant AS in numerous diseases. Recent technological advances enable thousands of exons to be simultaneously profiled across diverse cell types and cellular conditions, but require accurate identification of condition-specific splicing changes. It is necessary to accurately identify such splicing changes to elucidate the underlying regulatory programs or link the splicing changes to specific diseases. We present a probabilistic model tailored for high-throughput AS data, where observed isoform levels are explained as combinations of condition-specific AS signals. According to our formulation, given an AS dataset our tasks are to detect common signals in the data and identify the exons relevant to each signal. Our model can incorporate prior knowledge about underlying AS signals, measurement quality and gene expression level effects. Using a large-scale multi-tissue AS dataset, we demonstrate the advantage of our method over standard alternative approaches. In addition, we describe newly found tissue-specific AS signals which were verified experimentally, and discuss associated regulatory features. Supplementary data are available at Bioinformatics online.
    Bioinformatics 06/2010; 26(12):i325-33. · 5.47 Impact Factor
  • Article: Deciphering the splicing code.
    ABSTRACT: Alternative splicing has a crucial role in the generation of biological complexity, and its misregulation is often involved in human disease. Here we describe the assembly of a 'splicing code', which uses combinations of hundreds of RNA features to predict tissue-dependent changes in alternative splicing for thousands of exons. The code determines new classes of splicing patterns, identifies distinct regulatory programs in different tissues, and identifies mutation-verified regulatory sequences. Widespread regulatory strategies are revealed, including the use of unexpectedly large combinations of features, the establishment of low exon inclusion levels that are overcome by features in specific tissues, the appearance of features deeper into introns than previously appreciated, and the modulation of splice variant levels by transcript structure characteristics. The code detected a class of exons whose inclusion silences expression in adult tissues by activating nonsense-mediated messenger RNA decay, but whose exclusion promotes expression during embryogenesis. The code facilitates the discovery and detailed characterization of regulated alternative splicing events on a genome-wide scale.
    Nature 05/2010; 465(7294):53-9. · 36.28 Impact Factor
  • Article: An illuminated view of molecular biology.
    Yoseph Barash, Xinchen Wang
    ABSTRACT: A report on the 18th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and the 7th Special Interest Group meeting on Alternative Splicing, Boston, USA, 9-13 July 2010.
    Genome biology 01/2010; 11(8):307. · 6.63 Impact Factor
  • Article: A systematic analysis of intronic sequences downstream of 5' splice sites reveals a widespread role for U-rich motifs and TIA1/TIAL1 proteins in alternative splicing regulation.
    ABSTRACT: To identify human intronic sequences associated with 5' splice site recognition, we performed a systematic search for motifs enriched in introns downstream of both constitutive and alternative cassette exons. Significant enrichment was observed for U-rich motifs within 100 nucleotides downstream of 5' splice sites of both classes of exons, with the highest enrichment between positions +6 and +30. Exons adjacent to U-rich intronic motifs contain lower frequencies of exonic splicing enhancers and higher frequencies of exonic splicing silencers, compared with exons not followed by U-rich intronic motifs. These findings motivated us to explore the possibility of a widespread role for U-rich motifs in promoting exon inclusion. Since cytotoxic granule-associated RNA binding protein (TIA1) and TIA1-like 1 (TIAL1; also known as TIAR) were previously shown in vitro to bind to U-rich motifs downstream of 5' splice sites, and to facilitate 5' splice site recognition in vitro and in vivo, we investigated whether these factors function more generally in the regulation of splicing of exons followed by U-rich intronic motifs. Simultaneous knockdown of TIA1 and TIAL1 resulted in increased skipping of 36/41 (88%) of alternatively spliced exons associated with U-rich motifs, but did not affect 32/33 (97%) alternatively spliced exons that are not associated with U-rich motifs. The increase in exon skipping correlated with the proximity of the first U-rich motif and the overall "U-richness" of the adjacent intronic region. The majority of the alternative splicing events regulated by TIA1/TIAL1 are conserved in mouse, and the corresponding genes are associated with diverse cellular functions. Based on our results, we estimate that approximately 15% of alternative cassette exons are regulated by TIA1/TIAL1 via U-rich intronic elements.
    Genome Research 08/2008; 18(8):1247-58. · 13.61 Impact Factor
  • Article: Functional coordination of alternative splicing in the mammalian central nervous system.
    ABSTRACT: Alternative splicing (AS) functions to expand proteomic complexity and plays numerous important roles in gene regulation. However, the extent to which AS coordinates functions in a cell and tissue type specific manner is not known. Moreover, the sequence code that underlies cell and tissue type specific regulation of AS is poorly understood. Using quantitative AS microarray profiling, we have identified a large number of widely expressed mouse genes that contain single or coordinated pairs of alternative exons that are spliced in a tissue regulated fashion. The majority of these AS events display differential regulation in central nervous system (CNS) tissues. Approximately half of the corresponding genes have neural specific functions and operate in common processes and interconnected pathways. Differential regulation of AS in the CNS tissues correlates strongly with a set of mostly new motifs that are predominantly located in the intron and constitutive exon sequences neighboring CNS-regulated alternative exons. Different subsets of these motifs are correlated with either increased inclusion or increased exclusion of alternative exons in CNS tissues, relative to the other profiled tissues. Our findings provide new evidence that specific cellular processes in the mammalian CNS are coordinated at the level of AS, and that a complex splicing code underlies CNS specific AS regulation. This code appears to comprise many new motifs, some of which are located in the constitutive exons neighboring regulated alternative exons. These data provide a basis for understanding the molecular mechanisms by which the tissue specific functions of widely expressed genes are coordinated at the level of AS.
    Genome biology 02/2007; 8(6):R108. · 6.63 Impact Factor
  • Article: Y. Barash, G. Elidan, T. Kaplan, N. Friedman.
    Bioinformatics. 01/2005; 21:596-600.
  • Article: Sfp1 is a stress- and nutrient-sensitive regulator of ribosomal protein gene expression.
    ABSTRACT: Yeast cells modulate their protein synthesis capacity in response to physiological needs through the transcriptional control of ribosomal protein (RP) genes. Here we demonstrate that the transcription factor Sfp1, previously shown to play a role in the control of cell size, regulates RP gene expression in response to nutrients and stress. Under optimal growth conditions, Sfp1 is localized to the nucleus, bound to the promoters of RP genes, and helps promote RP gene expression. In response to inhibition of target of rapamycin (TOR) signaling, stress, or changes in nutrient availability, Sfp1 is released from RP gene promoters and leaves the nucleus, and RP gene transcription is down-regulated. Additionally, cells lacking Sfp1 fail to appropriately modulate RP gene expression in response to environmental cues. We conclude that Sfp1 integrates information from nutrient- and stress-responsive signaling pathways to help control RP gene expression.
    Proceedings of the National Academy of Sciences 11/2004; 101(40):14315-22. · 9.68 Impact Factor
  • Article: Comparative analysis of algorithms for signal quantitation from oligonucleotide microarrays.
    ABSTRACT: MOTIVATION: Recent years' exponential increase in DNA microarrays experiments has motivated the development of many signal quantitation (SQ) algorithms. These algorithms perform various transformations on the actual measurements aimed to enable researchers to compare readings of different genes quantitatively within one experiment and across separate experiments. However, it is relatively unclear whether there is a 'best' algorithm to quantitate microarray data. The ability to compare and assess such algorithms is crucial for any downstream analysis. In this work, we suggest a methodology for comparing different signal quantitation algorithms for gene expression data. Our aim is to enable researchers to compare the effect of different SQ algorithms on the specific dataset they are dealing with. We combine two kinds of tests to assess the effect of an SQ algorithm in terms of signal to noise ratio. To assess noise, we exploit redundancy within the experimental dataset to test the variability of a given SQ algorithm output. For the effect of the SQ on the signal we evaluate the overabundance of differentially expressed genes using various statistical significance tests. RESULTS: We demonstrate our analysis approach with three SQ algorithms for oligonucleotide microarrays. We compare the results of using the dChip software and the RMAExpress software to the ones obtained by using the standard Affymetrix MAS5 on a dataset containing pairs of repeated hybridizations. Our analysis suggests that dChip is more robust and stable than the MAS5 tools for about 60% of the genes while RMAExpress is able to achieve an even greater improvement in terms of signal to noise, for more than 95% of the genes.
    Bioinformatics 05/2004; 20(6):839-46. · 5.47 Impact Factor
  • Article: Modeling Dependencies in Protein-DNA Binding Sites
    ABSTRACT: The availability of whole genome sequences and high-throughput genomic assays opens the door for in silico analysis of transcription regulation. This includes methods for discovering and characterizing the binding sites of DNA-binding proteins, such as transcription factors. A common representation of transcription factor binding sites is a position specific score matrix (PSSM). This representation makes the strong assumption that binding site positions are independent of each other. In this work, we explore Bayesian network representations of binding sites that provide different tradeoffs between complexity (number of parameters) and the richness of dependencies between positions. We develop the formal machinery for learning such models from data and for estimating the statistical significance of putative binding sites. We then evaluate the ramifications of these richer representations in characterizing binding site motifs and predicting their genomic locations. We show that these richer representations improve over the PSSM model in both tasks.
    07/2003;
  • Article: From Promoter Sequence to Expression: A Probabilistic Framework
    ABSTRACT: We present a probabilistic framework that models the process by which transcriptional binding explains the mRNA expression of different genes. Our joint probabilistic model unifies the two key components of this process: the prediction of gene regulation events from sequence motifs in the gene's promoter region, and the prediction of mRNA expression from combinations of gene regulation events in different settings. Our approach has several advantages. By learning promoter sequence motifs that are directly predictive of expression data, it can improve the identification of binding site patterns. It is also able to identify combinatorial regulation via interactions of different transcription factors. Finally, the general framework allows us to integrate additional data sources, including data from the recent binding localization assays. We demonstrate our approach on the cell cycle data of Spellman et al., combined with the binding localization information of Simon et al. We show that the learned model predicts expression from sequence, and that it identifies coherent co-regulated groups with significant transcription factor motifs. It also provides valuable biological insight into the domain via these co-regulated "modules" and the combinatorial regulation effects that govern their behavior.
    05/2002;
  • Article: From Promoter Sequence to Expression:
    02/2002;
  • Article: A Simple Hyper-Geometric Approach for Discovering Putative Transcription Factor Binding Sites
    Yoseph Barash, Gill Bejerano, Nir Friedman
    ABSTRACT: A central issue in molecular biology is understanding the regulatory mechanisms that control gene expression. The recent flood of genomic and post-genomic data opens the way for computational methods elucidating the key components that play a role in these mechanisms.
    02/2002;
  • Article: Context-specific Bayesian clustering for gene expression data.
    Yoseph Barash, Nir Friedman
    ABSTRACT: The recent growth in genomic data and measurements of genome-wide expression patterns allows us to apply computational tools to examine gene regulation by transcription factors. In this work, we present a class of mathematical models that help in understanding the connections between transcription factors and functional classes of genes based on genetic and genomic data. Such a model represents the joint distribution of transcription factor binding sites and of expression levels of a gene in a unified probabilistic model. Learning a combined probability model of binding sites and expression patterns enables us to improve the clustering of the genes based on the discovery of putative binding sites and to detect which binding sites and experiments best characterize a cluster. To learn such models from data, we introduce a new search method that rapidly learns a model according to a Bayesian score. We evaluate our method on synthetic data as well as on real life data and analyze the biological insights it provides. Finally, we demonstrate the applicability of the method to other data analysis problems in gene expression data.
    Journal of Computational Biology 02/2002; 9(2):169-91. · 1.55 Impact Factor

Institutions

  • 2010–2011
    • University of Toronto
      • Department of Electrical and Computer Engineering
      • Banting and Best Department of Medical Research
      Toronto, Ontario, Canada
  • 2002–2004
    • Hebrew University of Jerusalem
      • Rachel and Selim Benin School of Computer Science and Engineering
      Jerusalem, Jerusalem District, Israel