Research experience

  • Jan 2012–
    present
    Research: Max Delbrück Centrum für Molekulare Medizin
    Max Delbrück Centrum für Molekulare Medizin · Berlin Institute for Medical Systems Biology
    Germany · Berlin
  • Jan 2011–
    present
    Research: Philipps-Universität Marburg
    Philipps-Universität Marburg
    Germany · Marburg an der Lahn
  • Jan 2011–
    present
    Research: University of Southern Denmark
    University of Southern Denmark · Department of Mathematics and Computer Science
    Denmark · Copenhagen
  • Jan 2010–
    present
    Research: University of Illinois, Urbana-Champaign
    University of Illinois, Urbana-Champaign
    USA · Urbana
  • Jan 2010–
    present
    Research: Università degli Studi del Piemonte Orientale "Amedeo Avogadro"
    Università degli Studi del Piemonte Orientale "Amedeo Avogadro"
    Italy · Alessandria
  • Jan 2009–
    present
    Research: Janelia Farm Research Campus
    Janelia Farm Research Campus
    USA · Ashburn
  • Jan 2009–
    Dec 2011
    Research: Universität Freiburg
    Universität Freiburg · Department of Computer Science
    Germany · Freiburg
  • Jan 2009–
    present
    Research: Zoologisches Forschungsmuseum Alexander Koenig
    Zoologisches Forschungsmuseum Alexander Koenig
    Germany · Bonn
  • Jan 2008–
    Dec 2010
    Research: Benaroya Research Institute
    Benaroya Research Institute
    USA · Seattle
  • Jan 2007–
    Dec 2013
    Research: University of Copenhagen
    University of Copenhagen · Bioinformatics Centre
    Denmark · Copenhagen
  • Jan 2006–
    present
    Research: National Chiao Tung University
    National Chiao Tung University · Institute of Bioinformatics and Systems Biology
    Taiwan · Hsinchu
  • Jan 2006–
    present
    Research: Georg-August-Universität Göttingen
    Georg-August-Universität Göttingen · Institute of Microbiology and Genetics
    Germany · Göttingen
  • Jan 2006–
    present
    Research: Chinese Academy of Sciences
    Chinese Academy of Sciences
    China · Beijing
  • Jan 2005–
    present
    Research: Novartis Institutes for BioMedical Research
    Novartis Institutes for BioMedical Research
    USA · Cambridge
  • Jan 2004–
    present
    Research: Universidade Estadual de Campinas
    Universidade Estadual de Campinas · Instituto de Física "Gleb Wataghin" (IFGW)
    Brazil · Campinas
  • Jan 2004–
    Dec 2006
    Research: Los Alamos National Laboratory
    Los Alamos National Laboratory · Bioscience Division
    USA · Los Alamos
  • Jan 2003–
    Dec 2007
    Research: Yale University
    Yale University · Department of Ecology and Evolutionary Biology
    USA · New Haven
  • Jan 2003–
    Dec 2012
    Research: University of Leipzig
    University of Leipzig · Institut für Informatik
    Germany · Leipzig
  • Jan 1998–
    present
    Research: University of Auckland
    University of Auckland
    New Zealand · Auckland
  • Jan 1991–
    Dec 2012
    Research: Santa Fe Institute
    Santa Fe Institute
    USA · Santa Fe
  • Jan 1991–
    Dec 2013
    Research: Universität Wien
    Universität Wien · Department of Theoretical Chemistry
    Austria · Vienna

Publications (390)

  • Article: Convex cycle bases
    Marc Hellmuth, Josef Leydold, Peter F. Stadler
    ABSTRACT: Convex cycles play a role, e.g., in the context of product graphs. We introduce convex cycle bases and describe a polynomial-time algorithm that recognizes whether a given graph has a convex cycle basis and provides an explicit construction in the positive case. Relations between convex cycle bases and other types of cycle bases are discussed. In particular, we show that if G has a unique minimal cycle basis, this basis is convex. Furthermore, we characterize a class of graphs with convex cycle bases that includes partial cubes and hence median graphs.
    Ars Mathematica Contemporanea 01/2014; 7(1):123-140. · 0.40 Impact Factor
  • Article: Mapping the RNA-Seq trash bin: Unusual transcripts in prokaryotic transcriptome sequencing data
    ABSTRACT: Prokaryotic transcripts almost always constitute uninterrupted intervals when mapped back to the genome. Split reads, i.e., RNA-seq reads consisting of parts that only map to discontiguous loci, are thus disregarded in most analysis pipelines. There are, however, some well-known exceptions, in particular, tRNA splicing and circularized small RNAs in Archaea as well as self-splicing introns. Here, we reanalyze a series of published RNA-seq data sets, screening them specifically for non-contiguously mapping reads. We recover most of the known cases together with several novel archaeal ncRNAs associated with circularized products. In Eubacteria, only a handful of interesting candidates were obtained beyond a few previously described group I and group II introns. Most of the atypically mapping reads do not appear to correspond to well-defined, specifically processed products. Whether this diffuse background is, at least in part, an incidental by-product of prokaryotic RNA processing or whether it consists entirely of technical artifacts of reverse transcription or amplification remains unknown.
    RNA Biology 05/2013; · 4.93 Impact Factor
  • Article: Identification of new protein coding sequences and signal peptidase cleavage sites of Helicobacter pylori strain 26695 by proteogenomics.
    ABSTRACT: Correct annotation of protein coding genes is the basis of conventional data analysis in proteomic studies. Nevertheless, most protein sequence databases rely almost exclusively on gene finding software and inevitably miss protein annotations or contain errors. Proteogenomics tries to overcome these issues by matching MS data directly against a genome sequence database. Here we report an in-depth proteogenomics study of Helicobacter pylori strain 26695. MS data were searched against a combined database of the NCBI annotations and a six-frame translation of the genome. Database searches with Mascot and X!Tandem revealed 1115 proteins identified by at least two peptides with a peptide false discovery rate below 1%. This represents 71% of the predicted proteome. So far this is the most extensive proteome study of H. pylori. Our proteogenomics approach unambiguously identified four previously missed annotations and furthermore allowed us to correct sequences of six annotated proteins. Since secreted proteins are often involved in pathogenic processes, we further investigated signal peptidase cleavage sites. By applying a database search that accommodates the identification of semi-specifically cleaved peptides, 63 previously unknown signal peptides were detected. The motif LXA proved to be the predominant recognition sequence for signal peptidases. BIOLOGICAL SIGNIFICANCE: Even though de novo sequencing algorithms have been significantly optimized in recent years, the method still has its limitations in terms of speed, quality and completeness. Hence, the result of each standard proteomic study relies heavily on correct annotation of protein coding genes, which is the basis of conventional data analysis. Owing to considerable technical improvements, genome sequencing is well established and easily accessible to the scientific community and is no longer a bottleneck in research.
However, the annotation of protein coding sequences in genomic data is usually based on gene finding software. These tools are limited in their prediction accuracy. It is typically problematic to determine exact gene boundaries, and it is even harder to also predict signal peptides correctly. This can result in protein databases with erroneous, incomplete and even missing entries. Nevertheless, proteogenomic evaluations are still rare. In this study we investigated the proteome of Helicobacter pylori (HP), a human pathogen that infects about 50% of the world's population and is responsible for many gastric diseases such as gastric and duodenal ulcers as well as gastric cancer. We used GeLC-MS and 2D-LC-MS and applied multiple proteases. Thus we were able to identify 1115 proteins (FDR <0.01%) with high reliability by at least two peptides with a peptide false discovery rate below 1%. This represents 71% of the predicted proteome listed in the NCBI database and is so far the most extensive proteome study of H. pylori. We then reanalyzed the data with a focus on (i) detecting previously unannotated proteins, (ii) correcting gene boundary and frame shift errors, and (iii) identifying signal peptides. Our proteogenomics approach thus resulted in the unambiguous identification of four previously missed annotations, the correction of six annotated proteins, and the detection of 63 previously unknown signal peptides. Furthermore, a motif analysis determined "L X A" to be the predominant recognition sequence for signal peptidase I in H. pylori. The results were validated by MS spectrum comparison between detected and synthesized peptides. Furthermore, transcripts for all newly annotated proteins were detected in a whole transcriptome analysis.
We have annotated proteins of particular biological interest such as the ferrous iron transport protein A, the coiled-coil-rich protein HP0058 and the lipopolysaccharide biosynthesis protein HP0619. Database entries for these proteins might be important to study biological pathways involved in pathogenesis or drug response of H. pylori. For instance, the protein HP0619 could be a drug target for the inhibition of the LPS synthesis pathway. Additionally, we investigated the specificity of the signal peptidases of H. pylori. We could determine certain differences from the proposed signal peptide structure of other Gram-negative bacteria such as E. coli. Signal peptidases are essential enzymes for the viability of bacterial cells and are involved in pathogenesis. Therefore signal peptidases could be novel targets for antibiotics. Additionally, inclusion of signal peptides into the database could increase peptide and protein identifications of future proteome studies.
    Journal of proteomics 05/2013; · 5.07 Impact Factor
  • Article: The RNAsnp web server: predicting SNP effects on local RNA secondary structure.
    ABSTRACT: The function of many non-coding RNA genes and cis-regulatory elements of messenger RNA largely depends on the structure, which is in turn determined by their sequence. Single nucleotide polymorphisms (SNPs) and other mutations may disrupt the RNA structure, interfere with the molecular function and hence cause a phenotypic effect. RNAsnp is an efficient method to predict the effect of SNPs on local RNA secondary structure based on the RNA folding algorithms implemented in the Vienna RNA package. The SNP effects are quantified in terms of empirical P-values, which, for computational efficiency, are derived from extensive pre-computed tables of distributions of substitution effects as a function of gene length and GC content. Here, we present a web service that not only provides an interface for RNAsnp but also features a graphical output representation. In addition, the web server is connected to a local mirror of the UCSC genome browser database that enables the users to select the genomic sequences for analysis and visualize the results directly in the UCSC genome browser. The RNAsnp web server is freely available at: http://rth.dk/resources/rnasnp/.
    Nucleic Acids Research 04/2013; · 8.03 Impact Factor
  • Article: LocARNAscan: Incorporating thermodynamic stability in sequence and structure-based RNA homology search.
    ABSTRACT: BACKGROUND: The search for distant homologs has become an important issue in genome annotation. A particular difficulty is posed by divergent homologs that have lost recognizable sequence similarity. This same problem also arises in the recognition of novel members of large classes of RNAs such as snoRNAs or microRNAs that consist of families unrelated by common descent. Current homology search tools for structured RNAs are either based entirely on sequence similarity (such as blast or hmmer) or combine sequence and secondary structure. The most prominent example of the latter class of tools is Infernal. Alternatives are descriptor-based methods. In most practical applications published to date, however, the information contained in covariance models or manually prescribed search patterns is dominated by sequence information. Here we ask two related questions: (1) Is secondary structure alone informative for homology search and the detection of novel members of RNA classes? (2) To what extent is the thermodynamic propensity of the target sequence to fold into the correct secondary structure helpful for this task? RESULTS: Sequence-structure alignment can be used as an alternative search strategy. In this scenario, the query consists of a base pairing probability matrix, which can be derived either from a single sequence or from a multiple alignment representing a set of known representatives. Sequence information can be optionally added to the query. The target sequence is pre-processed to obtain local base pairing probabilities. As a search engine we devised a semi-global scanning variant of LocARNA's algorithm for sequence-structure alignment. The LocARNAscan tool is optimized for speed and low memory consumption. In benchmarking experiments on artificial data we observe that the inclusion of thermodynamic stability is helpful, albeit only in a regime of extremely low sequence information in the query.
We observe, furthermore, that the sensitivity is bounded in particular by the limited accuracy of the predicted local structures of the target sequence. CONCLUSIONS: Although we demonstrate that a purely structure-based homology search is feasible in principle, it is unlikely to outperform tools such as Infernal in most application scenarios, where a substantial amount of sequence information is typically available. The LocARNAscan approach will profit, however, from high-throughput methods to determine RNA secondary structure. In transcriptome-wide applications, such methods will provide accurate structure annotations on the target side. AVAILABILITY: Source code of the free software LocARNAscan 1.0 and supplementary data are available at http://www.bioinf.uni-leipzig.de/Software/LocARNAscan.
    Algorithms for Molecular Biology 04/2013; 8(1):14. · 1.35 Impact Factor
