Sonja J Prohaska

University of Leipzig, Leipzig, Saxony, Germany

Publications (74) · 345.52 Total impact

  • ABSTRACT: We present an efficient generalization of algebraic dynamic programming (ADP) to unordered data types and a formalism for the automated derivation of outside grammars from their inside progenitors. These theoretical contributions are illustrated by ADP-style algorithms for shortest Hamiltonian path problems. These arise naturally when asking whether the evolutionary history of an ancient gene cluster can be explained by a series of local tandem duplications. Our framework makes it easy to compute maximum accuracy solutions, which in turn require the computation of the probabilities of individual edges in the ensemble of Hamiltonian paths. The expansion of the Hox gene clusters is investigated as a show-case application. For implementation details see: http://www.bioinf.uni-leipzig.de/Software/setgram/
    Lecture Notes in Bioinformatics (Brazilian Symposium on Bioinformatics, BSB 2014); 10/2014
  • ABSTRACT: The cell cycle genes homology region (CHR) has been identified as a DNA element with an important role in transcriptional regulation of late cell cycle genes. It has been shown that such genes are controlled by DREAM, MMB and FOXM1-MuvB and that these protein complexes can contact DNA via CHR sites. However, it has not been elucidated which sequence variations of the canonical CHR are functional and how frequently CHR-based regulation is utilized in mammalian genomes. Here, we define the spectrum of functional CHR elements. As the basis for a computational meta-analysis, we identify new CHR sequences and compile phylogenetic motif conservation as well as genome-wide protein-DNA binding and gene expression data. We identify CHR elements in most late cell cycle genes binding DREAM, MMB, or FOXM1-MuvB. In contrast, Myb- and forkhead-binding sites are underrepresented in both early and late cell cycle genes. Our findings support a general mechanism: sequential binding of DREAM, MMB and FOXM1-MuvB complexes to late cell cycle genes requires CHR elements. Taken together, we define the group of CHR-regulated genes in mammalian genomes and provide evidence that the CHR is the central promoter element in transcriptional regulation of late cell cycle genes by DREAM, MMB and FOXM1-MuvB.
    Nucleic Acids Research 08/2014; · 8.81 Impact Factor
  • Source
    ABSTRACT: Background: Over the last years, more and more biological data became available. Besides the pure amount of new data, also its dimensionality - the number of different attributes per data point - increased. Recently, especially the amount of data on chromatin and its modifications increased considerably. In the field of epigenetics, appropriate visualization tools designed for highlighting the different aspects of epigenetic data are currently not available. Results: We present a tool called TiBi-Scatter enabling correlation analysis in 2D. This approach allows for analyzing multidimensional data while keeping the use of resources such as memory small. Thus, it is in particular applicable to large data sets. Conclusions: TiBi-Scatter is a resource-friendly and easy to use tool that allows for the hypothesis-free analysis of large multidimensional biological data sets.
    4th Symposium on Biological Data Visualization, Boston, MA, USA; 07/2014
  • ABSTRACT: Enzymatic splicing in archaeal tRNAs is guided by bulge-helix-bulge (BHB) structural elements, while much less seems to be known about splicing in other small RNAs (sRNAs). We conduct a genome-wide analysis of several archaeal genomes to identify putative BHB elements and compare our findings with available RNA-seq data. We also provide an analysis of the viability of using pattern-based and stochastic structural scanning algorithms for in silico studies of the occurrence of BHB motifs. Furthermore, we comment on splicing motifs in other small RNAs, which mostly do not fit the pattern of bulge-helix-bulge motifs. Appendix and supporting files available at: http://www.bioinf.uni-leipzig.de/publications/supplements/14-001
    IWBBIO 2014 (2nd International Work-Conference on Bioinformatics and Biomedical Engineering); 04/2014
  • Source
    ABSTRACT: The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance) was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.
    PLoS ONE 01/2014; 9(8):e105015. · 3.53 Impact Factor
  • Source
    ABSTRACT: Most investigations into the large-scale patterns of protein evolution are based on gene annotations that have been compiled in reference databases. The use of these resources for quantitative comparisons, however, is complicated by sometimes vast differences in coverage. More importantly, we also observe substantial ascertainment biases that cannot be removed by simple normalization procedures. A striking example is provided by the correlations between protein domains. We observe that statistics derived from different computational gene annotation procedures show dramatic discrepancies, and even qualitative changes from negative to positive correlation, when compared to statistics obtained from annotation databases.
    Malaysian Journal of Fundamental and Applied Sciences. 07/2013; 10(2).
  • Christian Arnold, Peter F Stadler, Sonja J Prohaska
    ABSTRACT: Eukaryotic histones carry a diverse set of specific chemical modifications that accumulate over the life-time of a cell and have a crucial impact on the cell state in general and the transcriptional program in particular. Replication constitutes a dramatic disruption of the chromatin states that effectively amounts to partial erasure of stored information. To preserve its epigenetic state the cell reconstructs (at least part of) the histone modifications by means of processes that are still very poorly understood. A plausible hypothesis is that the different combinations of reader and writer domains in histone-modifying enzymes implement local rewriting rules that are capable of "recomputing" the desired parental modification patterns on the basis of the partial information contained in that half of the nucleosomes that predate replication. To test whether such a mechanism is theoretically feasible, we have developed a flexible stochastic simulation system (available at http://www.bioinf.uni-leipzig.de/Software/StoChDyn) for studying the dynamics of histone modification states. The implementation is based on Gillespie's approach, i.e., it models the master equation of a detailed chemical model. It is efficient enough to use an evolutionary algorithm to find patterns across multiple cell divisions with high accuracy. We found that it is easy to evolve a system of enzymes that can maintain a particular chromatin state roughly stable, even without explicit boundary elements separating differentially modified chromatin domains. However, the success of this task depends on several previously unanticipated factors, such as the length of the initial state, the specific pattern that should be maintained, the time between replications, and chemical parameters such as enzymatic binding and dissociation rates. All these factors also influence the accumulation of errors in the wake of cell divisions.
    Journal of Theoretical Biology 07/2013; · 2.35 Impact Factor
  • Source
    ABSTRACT: The discovery of a living coelacanth specimen in 1938 was remarkable, as this lineage of lobe-finned fish was thought to have become extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues show the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.
    Nature 04/2013; 496(7445):311-316. · 38.60 Impact Factor
  • Source
    ABSTRACT: Chromatin-related mechanisms, as e.g. histone modifications, are known to be involved in regulatory switches within the transcriptome. Only recently, mathematical models of these mechanisms have been established. So far they have not been applied to genome-wide data. We here introduce a mathematical model of transcriptional regulation by histone modifications and apply it to data of trimethylation of histone 3 at lysine 4 (H3K4me3) and 27 (H3K27me3) in mouse pluripotent and lineage-committed cells. The model describes binding of protein complexes to chromatin which are capable of reading and writing histone marks. Molecular interactions of the complexes with DNA and modified histones create a regulatory switch of transcriptional activity. The regulatory states of the switch depend on the activity of histone (de-) methylases, the strength of complex-DNA-binding and the number of nucleosomes capable of cooperatively contributing to complex-binding. Our model explains experimentally measured length distributions of modified chromatin regions. It suggests (i) that high CpG-density facilitates recruitment of the modifying complexes in embryonic stem cells and (ii) that re-organization of extended chromatin regions during lineage specification into neuronal progenitor cells requires targeted de-modification. Our approach represents a basic step towards multi-scale models of transcriptional control during development and lineage specification.
    Physical Biology 03/2013; 10(2):026006. · 2.62 Impact Factor
  • Source
    ABSTRACT: Epigenetic mechanisms play an important role in regulating and stabilizing functional states of living cells. However, in spite of an increasing amount of experimental data, models of transcriptional regulation by epigenetic processes, in particular by histone modifications, are rather rare. In this article, we focus on epigenetic modes of transcriptional regulation based on histone modifications and their potential dynamical interplay with DNA methylation and higher-order chromatin structure. The main purpose of this article is to review recent formal modeling approaches to the dynamics and propagation of histone modifications and to relate them to available experimental data. We evaluate their assumptions with respect to recruitment of relevant modifiers, establishment and processing of modifications, and compare the emerging stability properties and memory effects. Theoretical predictions that await experimental validation are highlighted and potential extensions of these models towards multiscale models of self-organizing chromatin are discussed.
    Epigenomics 04/2012; 4(2):205-19. · 2.43 Impact Factor
  • Source
    ABSTRACT: Current genome-wide ChIP-seq experiments on different epigenetic marks aim at unraveling the interplay between their regulation mechanisms. Published evaluation tools, however, allow testing for predefined hypotheses only. Here, we present a novel method for annotation-independent exploration of epigenetic data and their inter-correlation with other genome-wide features. Our method is based on a combinatorial genome segmentation solely using information on combinations of epigenetic marks. It does not require prior knowledge about the data (e.g. gene positions), but allows integrating the data in a straightforward manner. Thereby, it combines compression, clustering and visualization of the data in a single tool. Our method provides intuitive maps of epigenetic patterns across multiple levels of organization, e.g. of the co-occurrence of different epigenetic marks in different cell types. Thus, it facilitates the formulation of new hypotheses on the principles of epigenetic regulation. We apply our method to histone modification data on trimethylation of histone H3 at lysine 4, 9 and 27 in multi-potent and lineage-primed mouse cells, analyzing their combinatorial modification pattern as well as differentiation-related changes of single modifications. We demonstrate that our method is capable of reproducing recent findings of gene centered approaches, e.g. correlations between CpG-density and the analyzed histone modifications. Moreover, combining the clustered epigenetic data with information on the expression status of associated genes we classify differences in epigenetic status of e.g. house-keeping genes versus differentiation-related genes. Visualizing the distribution of modification states on the chromosomes, we discover strong patterns for chromosome X. For example, exclusively H3K9me3 marked segments are enriched, while poised and active states are rare. 
    Hence, our method also provides new insights into chromosome-specific epigenetic patterns, raising new questions about how "epigenetic computation" is distributed over the genome in space and time.
    PLoS ONE 01/2012; 7(10):e46811. · 3.53 Impact Factor
  • ABSTRACT: The transitions to multicellularity mark the most pivotal and distinctive events in life's history on Earth. Although several transitions to "simple" multicellularity (SM) have been recorded in both bacterial and eukaryotic clades, transitions to complex multicellularity (CM) have only happened a few times in eukaryotes. A large number of cell types (associated with large body size), increased energy consumption per gene expressed, and an increment of non-protein-coding DNA positively correlate with CM. These three factors can indeed be understood as the causes and consequences of the regulation of gene expression. Here, we discuss how a vast expansion of non-protein-coding RNA (ncRNA) regulators rather than large numbers of novel protein regulators can easily contribute to the emergence of CM. We also propose that the evolutionary advantage of RNA-based gene regulation derives from the robustness of the RNA structure that makes it easy to combine genetic drift with functional exploration. We describe a model which aims to explain how the evolutionary dynamic of ncRNAs becomes dominated by the accessibility of advantageous mutations to innovate regulation in complex multicellular organisms. The information and models discussed here outline the hypothesis that pervasive ncRNA-based regulatory systems, only capable of being expanded and explored in higher eukaryotes, are prerequisite to complex multicellularity. Thereby, regulatory RNA molecules in Eukarya have allowed intensification of morphological complexity by stabilizing critical phenotypes and controlling developmental precision. Although the origin of RNA on early Earth is still controversial, it is becoming clear that once RNA emerged into a protocellular system, its relevance within the evolution of biological systems has been greater than we previously thought.
    Origins of Life 12/2011; 41(6):587-607. · 2.05 Impact Factor
  • Source
    Sonja J. Prohaska, Peter F. Stadler
    ABSTRACT: High-throughput experiments have produced convincing evidence for an extensive contribution of diverse classes of RNAs to the expression of genetic information. Instead of a simple arrangement of mostly protein-coding genes, the human transcriptome features a complex arrangement of overlapping transcripts, many of which do not code for proteins at all, while others "sample" exons from several different "genes". The complexity of the transcriptome and the prevalence of noncoding transcripts force us to reconsider both the concept of the "gene" itself and our understanding of the mechanisms that regulate "gene expression".
    Biophysical Reviews and Letters 11/2011; 03(01n02).
  • Source
    A.A. Parikesit, P. Stadler, S. Prohaska
    26th German Conference on Bioinformatics 2011. 7-9 September 2011; 09/2011
  • Source
    ABSTRACT: Functional RNA elements can also be embedded within exonic sequences coding for functional proteins. While not uncommon in viruses, only a few examples of this type have been described in some detail for eukaryotic genomes. Here we use RNAz and RNAcode, two comparative genomics methods that measure signatures of stabilizing selection acting on RNA secondary structure and peptide sequence, respectively, to survey the fruit fly genomes. We estimate that there might be on the order of 1000 loci that are subject to dual selection pressure. The genome-wide screens used also expose the limitations of the currently available methods.
    Biochimie 07/2011; 93(11):2019-23. · 3.14 Impact Factor
  • Source
    ABSTRACT: Teleost fishes have extra Hox gene clusters owing to shared or lineage-specific genome duplication events in ray-finned fish (actinopterygian) phylogeny. Hence, extrapolating between genome function of teleosts and human or even between different fish species is difficult. We have sequenced and analyzed Hox gene clusters of the Senegal bichir (Polypterus senegalus), an extant representative of the most basal actinopterygian lineage. Bichir possesses four Hox gene clusters (A, B, C, D); phylogenetic analysis supports their orthology to the four Hox gene clusters of the gnathostome ancestor. We have generated a comprehensive database of conserved Hox noncoding sequences that include cartilaginous, lobe-finned, and ray-finned fishes (bichir and teleosts). Our analysis identified putative and known Hox cis-regulatory sequences with differing depths of conservation in Gnathostoma. We found that although bichir possesses four Hox gene clusters, its pattern of conservation of noncoding sequences is mosaic between outgroups, such as human, coelacanth, and shark, with four Hox gene clusters and teleosts, such as zebrafish and pufferfish, with seven or eight Hox gene clusters. Notably, bichir Hox gene clusters have been invaded by DNA transposons and this trend is further exemplified in teleosts, suggesting an as yet unrecognized mechanism of genome evolution that may explain Hox cluster plasticity in actinopterygians. Taken together, our results suggest that actinopterygian Hox gene clusters experienced a reduction in selective constraints that surprisingly predates the teleost-specific genome duplication.
    Journal of Experimental Zoology Part B Molecular and Developmental Evolution 06/2011; 316(6):451-64. · 2.12 Impact Factor
  • Source
    ABSTRACT: Scientific theories seek to provide simple explanations for significant empirical regularities based on fundamental physical and mechanistic constraints. Biological theories have rarely reached a level of generality and predictive power comparable to physical theories. This discrepancy is explained through a combination of frozen accidents, environmental heterogeneity, and widespread non-linearities observed in adaptive processes. At the same time, model building has proven to be very successful when it comes to explaining and predicting the behavior of particular biological systems. In this respect biology resembles alternative model-rich frameworks, such as economics and engineering. In this paper we explore the prospects for general theories in biology, and suggest that these take inspiration not only from physics, but also from the information sciences. Future theoretical biology is likely to represent a hybrid of parsimonious reasoning and algorithmic or rule-based explanation. An open question is whether these new frameworks will remain transparent to human reason. In this context, we discuss the role of machine learning in the early stages of scientific discovery. We argue that evolutionary history is not only a source of uncertainty, but also provides the basis, through conserved traits, for very general explanations for biological regularities, and the prospect of unified theories of life.
    Journal of Theoretical Biology 03/2011; 276(1). · 2.35 Impact Factor

Publication Stats

2k Citations
345.52 Total Impact Points

Institutions

  • 2003–2014
    • University of Leipzig
      • Interdisziplinäres Zentrum für Bioinformatik
      • Institut für Informatik
      Leipzig, Saxony, Germany
  • 2013
    • Harvard University
      Cambridge, Massachusetts, United States
  • 2011
    • Philipps University of Marburg
      • Institut für Pharmazeutische Chemie
      Marburg, Hesse, Germany
    • University of Vienna
      • Department of Theoretical Chemistry
      Vienna, Vienna, Austria
  • 2008–2010
    • Benaroya Research Institute
      Seattle, Washington, United States
    • Santa Fe Institute
      Santa Fe, New Mexico, United States
  • 2009
    • University of Freiburg
      • Bioinformatics
      Freiburg, Baden-Württemberg, Germany
  • 2007–2008
    • Arizona State University
      • Department of Biomedical Informatics
      Phoenix, Arizona, United States
  • 2006
    • Georg-August-Universität Göttingen
      • Institute of Microbiology and Genetics
      Göttingen, Lower Saxony, Germany
  • 2004
    • Yale University
      • Department of Ecology and Evolutionary Biology
      New Haven, CT, United States