Jan Baumbach

University of Southern Denmark, Odense, South Denmark, Denmark

Publications (84) · 286.62 Total impact

  • Christian Wiwie · Jan Baumbach · Richard Röttger
    ABSTRACT: Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene expression to protein domains. Performance was judged on the basis of 13 common cluster validity indices. We developed a clustering analysis platform, ClustEval (http://clusteval.mpi-inf.mpg.de), to promote streamlined evaluation, comparison and reproducibility of clustering results in the future. This allowed us to objectively evaluate the performance of all tools on all data sets with up to 1,000 different parameter sets each, resulting in a total of more than 4 million calculated cluster validity indices. We observed that there was no universal best performer, but on the basis of this wide-ranging comparison we were able to develop a short guideline for biomedical clustering tasks. ClustEval allows biomedical researchers to pick the appropriate tool for their data type and allows method developers to compare their tool to the state of the art.
    Nature Methods 09/2015; DOI:10.1038/nmeth.3583 · 32.07 Impact Factor
  • ABSTRACT: Bacteria are highly diverse organisms that are able to adapt to a broad range of environments and hosts due to their high genomic plasticity. Horizontal gene transfer plays a pivotal role in this genome plasticity and in evolution by leaps through the incorporation of large blocks of genome sequences, ordinarily known as genomic islands (GEIs). GEIs may harbor genes encoding virulence, metabolism, antibiotic resistance and symbiosis-related functions, namely pathogenicity islands (PAIs), metabolic islands (MIs), resistance islands (RIs) and symbiotic islands (SIs). Although many software tools for the prediction of GEIs exist, they focus only on PAI prediction and present other limitations, such as complicated installation and inconvenient user interfaces. Here, we present GIPSy, the Genomic Island Prediction Software, standalone and user-friendly software for the prediction of GEIs, built on our previously developed Pathogenicity Island Prediction Software (PIPS). We also present four application cases in which we crosslink data from literature to PAIs, MIs, RIs and SIs predicted by GIPSy. Briefly, GIPSy correctly predicted the following previously described GEIs: 13 PAIs larger than 30 kb in Escherichia coli CFT073; 1 MI for Burkholderia pseudomallei K96243, which seems to be a miscellaneous island; 1 RI of Acinetobacter baumannii AYE, named AbaR1; and 1 SI of Mesorhizobium loti MAFF303099 presenting a mosaic structure. GIPSy is the first life-style-specific genomic island prediction software to perform analyses of PAIs, MIs, RIs and SIs, opening a door for a better understanding of bacterial genome plasticity and the adaptation to new traits.
    Journal of Biotechnology 09/2015; DOI:10.1016/j.jbiotec.2015.09.008 · 2.87 Impact Factor
  • ABSTRACT: Organisms utilize a multitude of mechanisms for responding to changing environmental conditions, maintaining their functional homeostasis and overcoming stress situations. One of the most important mechanisms is transcriptional gene regulation. In-depth study of the transcriptional gene regulatory network can lead to various practical applications, creating a greater understanding of how organisms control their cellular behavior. In this work, we present a new database, CMRegNet, for the gene regulatory networks of Corynebacterium glutamicum ATCC 13032 and Mycobacterium tuberculosis H37Rv. We furthermore transferred the known networks of these model organisms to 18 other non-model but phylogenetically close species (target organisms) of the CMNR group. In comparison to other network transfers, for the first time we utilized two model organisms, resulting in a more diverse and complete network of the target organisms. CMRegNet provides easy access to a total of 3,103 known regulations in C. glutamicum ATCC 13032 and M. tuberculosis H37Rv and to 38,940 evolutionarily conserved interactions for 18 non-model species of the CMNR group. This makes CMRegNet to date the most comprehensive database of regulatory interactions of CMNR bacteria. The content of CMRegNet is publicly available online via a web interface found at http://lgcm.icb.ufmg.br/cmregnet.
    BMC Genomics 06/2015; 16(1):452. DOI:10.1186/s12864-015-1631-0 · 3.99 Impact Factor
  • ABSTRACT: Computational breath analysis is a growing research area aiming at identifying volatile organic compounds (VOCs) in human breath to assist medical diagnostics of the next generation. While inexpensive and non-invasive bioanalytical technologies for metabolite detection in exhaled air and bacterial/fungal vapor exist and the first studies on the power of supervised machine learning methods for profiling of the resulting data were conducted, we lack methods to extract hidden data features emerging from confounding factors. Here, we present Carotta, a new cluster analysis framework dedicated to uncovering such hidden substructures by sophisticated unsupervised statistical learning methods. We study the power of transitivity clustering and hierarchical clustering to identify groups of VOCs with similar expression behavior over most patient breath samples and/or groups of patients with a similar VOC intensity pattern. This enables the discovery of dependencies between metabolites. On the one hand, this allows us to eliminate the effect of potential confounding factors hindering disease classification, such as smoking. On the other hand, we may also identify VOCs associated with disease subtypes or concomitant diseases. Carotta is an open source software with an intuitive graphical user interface promoting data handling, analysis and visualization. The back-end is designed to be modular, allowing for easy extensions with plugins in the future, such as new clustering methods and statistics. It does not require much prior knowledge or technical skills to operate. We demonstrate its power and applicability by means of one artificial dataset. We also apply Carotta exemplarily to a real-world example dataset on chronic obstructive pulmonary disease (COPD). While the artificial data are utilized as a proof of concept, we will demonstrate how Carotta finds candidate markers in our real dataset associated with confounders rather than the primary disease (COPD) and bronchial carcinoma (BC). Carotta is publicly available at http://carotta.compbio.sdu.dk [1].
    06/2015; 5(2):344-363. DOI:10.3390/metabo5020344
  • ABSTRACT: An in-depth understanding of complex systems such as hepatitis C virus (HCV) infection and host immunomodulatory response is an open challenge for biologists. In order to understand the mechanisms involved in immune evasion by HCV, we present a simplified formalization of the highly dynamic system consisting of HCV, its replication cycle and host immune responses at the cellular level using Hybrid Petri Net (HPN). The approach followed in this study comprises stepwise simulation, model validation and analysis of host immune response. This study was performed with the objective of making correlations among viral RNA levels, interferon (IFN) production and interferon stimulated genes (ISGs) induction. The results correlate with the biological data, verifying that the model is very useful in predicting the dynamic behavior of the signaling proteins in response to a stimulus. This study implies that HCV infection is dependent upon several key factors of the host immune response. The effect of host proteins on limiting viral infection is effectively overruled by the viral pathogen. This study also analyzes activity levels of RNase L, miR-122, IFN, ISGs and PKR induction and inhibition of TLR3/RIG1 mediated pathways in response to targeted manipulation in the presence of HCV. At the time of writing, the results are in complete agreement with the published expression studies and western blot experiments. Our model also provides some biological insights regarding the role of PKR in the acute infection of HCV. It might help to explain why many patients fail to clear acute HCV infection while others, with low ISGs basal levels, clear HCV spontaneously. The described methodology can easily be reproduced, which suitably supports the study of other viral infections in a formal, automated and expressive manner. The Petri Net-based modeling approach applied here may provide valuable insights for study design and analyses to evaluate other disease associated integrated pathways in biological systems.
    Integrative Biology 03/2015; 7(5). DOI:10.1039/C4IB00285G · 3.76 Impact Factor
  • ABSTRACT: We finally arrived in the post-genome era and face systems biology challenges of immense dimensionality. Huge graphs model the interplay of biological entities of all kinds (genes, proteins, metabolites). In parallel, the emergence of next-generation OMICS technology allows measuring their expression on a large scale and in high throughput. Horizons opened for so-called network enrichment strategies, which aim to combine these two data types, networks and expression matrices. One basically assumes that disease-specific, foreground (FG) genes have a different expression distribution than the others, the background (BG) genes, in a set of patients compared to a control group. A priori, one knows neither FG, BG, nor their expression distributions. De novo network enrichment tools seek to find densely connected sub-networks that are enriched with FG genes, i.e. deregulated disease-specific subnetworks. As we do not know all FG genes and, more important, many of the BG genes (i.e. genes that are not disease-related), we struggle to evaluate the real-world relevance of the sub-networks extracted by network enrichers. Here, we contribute a proof-of-principle study addressing this problem. We introduce a sampling procedure to generate artificial 'gold standards' of FG and BG genes of varying complexity. To this end, we introduce two intuitive parameters controlling how distant FG and BG genes are in their expression values (separation), and how densely the FG genes are distributed in a network (density), respectively. For the latter, we introduce two algorithms to 'hide' FG genes with a certain density in a graph. We exemplarily benchmark the performance of the network enrichment tool KeyPathwayMiner in finding the FG genes that we have 'hidden' in the input network for different density and separation values. We believe that our simple but robust strategy is applicable for systematically assessing and comparing the quality of network enrichment tools in the future.
  • ABSTRACT: With the availability of newer and cheaper sequencing methods, genomic data are being generated at an increasingly fast pace. In spite of the high degree of complexity of currently available search routines, the massive number of sequences available virtually prohibits quick and correct identification of large groups of sequences sharing common traits. Hence, there is a need for clustering tools for automatic knowledge extraction enabling the curation of large-scale databases. Current sophisticated approaches on sequence clustering are based on pairwise similarity matrices. This is impractical for databases of hundreds of thousands of sequences as such a similarity matrix alone would exceed the available memory. In this paper, a new approach called MultiLevel Clustering (MLC) is proposed which avoids a majority of sequence comparisons, and therefore, significantly reduces the total runtime for clustering. An implementation of the algorithm allowed clustering of all 344,239 ITS (Internal Transcribed Spacer) fungal sequences from GenBank utilizing only a normal desktop computer within 22 CPU-hours whereas the greedy clustering method took up to 242 CPU-hours.
    Scientific Reports 10/2014; 4:6837. DOI:10.1038/srep06837 · 5.58 Impact Factor
  • Jan Baumbach · Richard Röttger
    ABSTRACT: A graphical abstract is available for this content
    Integrative Biology 10/2014; 6(11). DOI:10.1039/c4ib90037e · 3.76 Impact Factor
  • ABSTRACT: The volatolome is the sum of volatile organic compounds that are emitted by all living cells and tissues. We seek to non-invasively "sniff" biomarker molecules that are predictive for the biomedical fate of individual patients. This promises great hope to move the therapeutic windows to earlier stages of disease progression. While portable devices for breathomics measurement exist, we face the traditional biomarker research barrier: A lack of robustness hinders translation to the world outside laboratories. To move from biomarker discovery to validation, from separability to predictability, we have developed several bioinformatics methods for computational breath analysis, which have the potential to redefine non-invasive biomedical decision making by rapid and cheap matching of decisive medical patterns in exhaled air. We aim to provide a supplementary diagnostic tool complementing classic urine, blood and tissue samples. The presentation will review the state of the art, highlight existing challenges and introduce new data mining methods for identifying breathomics biomarkers.
    Highlight Talk at 13th European Conference on Computational Biology (ECCB), Strasbourg, France; 09/2014
  • ABSTRACT: Motivation: Reverse-phase protein arrays (RPPAs) allow sensitive quantification of relative protein abundance in thousands of samples in parallel. Typical challenges involved in this technology are antibody selection, sample preparation and optimization of staining conditions. The issue of combining effective sample management and data analysis, however, has been widely neglected. Results: This motivated us to develop MIRACLE, a comprehensive and user-friendly web application bridging the gap between spotting and array analysis by conveniently keeping track of sample information. Data processing includes correction of staining bias, estimation of protein concentration from response curves, normalization for total protein amount per sample and statistical evaluation. Established analysis methods have been integrated with MIRACLE, offering experimental scientists an end-to-end solution for sample management and for carrying out data analysis. In addition, experienced users have the possibility to export data to R for more complex analyses. MIRACLE thus has the potential to further spread utilization of RPPAs as an emerging technology for high-throughput protein analysis. Availability: Project URL: http://www.nanocan.org/miracle/ Contact: mlist@health.sdu.dk Supplementary information: Supplementary data are available at Bioinformatics online.
    Bioinformatics 09/2014; 30(17):i631-i638. DOI:10.1093/bioinformatics/btu473 · 4.98 Impact Factor
  • ABSTRACT: Background: Over the last decade network enrichment analysis has become popular in computational systems biology to elucidate aberrant network modules. Traditionally, these approaches focus on combining gene expression data with protein-protein interaction (PPI) networks. Nowadays, the so-called omics technologies allow for inclusion of many more data sets, e.g. protein phosphorylation or epigenetic modifications. This creates a need for analysis methods that can combine these various sources of data to obtain a systems-level view on aberrant biological networks. Results: We present a new release of KeyPathwayMiner (version 4.0) that is not limited to analyses of single omics data sets, e.g. gene expression, but is able to directly combine several different omics data types. Version 4.0 can further integrate existing knowledge by adding a search bias towards sub-networks that contain (avoid) genes provided in a positive (negative) list. Finally, the new release now also provides a set of novel visualization features and has been implemented as an app for the standard bioinformatics network analysis tool: Cytoscape. Conclusion: With KeyPathwayMiner 4.0, we publish a Cytoscape app for multi-omics based sub-network extraction. It is available in Cytoscape's app store http://apps.cytoscape.org/apps/keypathwayminer or via http://keypathwayminer.mpi-inf.mpg.de.
    BMC Systems Biology 08/2014; 8(1):99. DOI:10.1186/s12918-014-0099-x · 2.44 Impact Factor
  • ABSTRACT: In life sciences, and particularly biomedical research, linking aberrant pathways exhibiting phenotype-specific alterations to the underlying physical condition or disease is an ongoing challenge. Computationally, a key approach for pathway identification is data enrichment, combined with generation of biological networks. This allows identification of intrinsic patterns in the data and their linkage to a specific context such as cellular compartments, diseases or functions. Identification of aberrant pathways by traditional approaches is often limited to biological networks based on either gene expression, protein expression or post-translational modifications. To overcome single omics analysis, we developed a set of computational methods that allow a combined analysis of data collections from multiple omics fields utilizing hybrid interactome networks. We apply these methods to data obtained from a triple-negative breast cancer cell line model, combining data sets of gene and protein expression as well as protein phosphorylation. We focus on alterations associated with the phenotypical differences arising from epithelial-mesenchymal transition in two breast cancer cell lines exhibiting epithelial-like and mesenchymal-like morphology, respectively. Here we identified altered protein signaling activity in a complex biologically relevant network, related to focal adhesion and migration of breast cancer cells. We found dysregulated functional network modules revealing altered phosphorylation-dependent activity in concordance with the phenotypic traits and migrating potential of the tested model. In addition, we identified Ser267 on zyxin, a protein coupled to actin filament polymerization, as a potential in vivo phosphorylation target of cyclin-dependent kinase 1.
    Integrative Biology 08/2014; 6(11). DOI:10.1039/c4ib00137k · 3.76 Impact Factor
  • ABSTRACT: Motivation: We address the problem of multiple protein-protein interaction (PPI) network alignment. Given a set of such networks for different species we might ask how much the network topology is conserved throughout evolution. Solving this problem will help to derive a subset of interactions that is conserved over multiple species thus forming a 'core interactome'. Methods: We model the problem as Topological Multiple one-to-one Network Alignment (TMNA), where we aim to minimize the total Graph Edit Distance (GED) between pairs of the input networks. Here, the GED between two graphs is the number of deleted and inserted edges that are required to make one graph isomorphic to another. By minimizing the GED we indirectly maximize the number of edges that are aligned in multiple networks simultaneously. However, computing an optimal GED value is computationally intractable. We thus propose an evolutionary algorithm and developed a software tool, GEDEVO-M, which is able to align multiple PPI networks using topological information only. We demonstrate the power of our approach by computing a maximal common subnetwork for a set of bacterial and eukaryotic PPI networks. GEDEVO-M thus provides great potential for computing the 'core interactome' of different species. Availability: http://gedevo.mpi-inf.mpg.de/multiple-network-alignment/.
  • ABSTRACT: As high-throughput technologies become cheaper and easier to use, raw sequence data and corresponding annotations for many organisms are becoming available. However, sequence data alone is not sufficient to explain the biological behaviour of organisms, which arises largely from complex molecular interactions. There is a need to develop new platform technologies that can be applied to the investigation of whole-genome datasets in an efficient and cost-effective manner. One such approach is the transfer of existing knowledge from well-studied organisms to closely-related organisms. In this paper, we describe a system, BacillusRegNet, for the use of a model organism, Bacillus subtilis, to infer genome-wide regulatory networks in less well-studied close relatives. The putative transcription factors, their binding sequences and predicted promoter sequences along with annotations are available from the associated BacillusRegNet website (http://bacillus.ncl.ac.uk).
    Journal of integrative bioinformatics 07/2014; 11(2):244. DOI:10.2390/biecoll-jib-2014-244
  • ABSTRACT: Selecting the most promising treatment strategy for breast cancer crucially depends on determining the correct subtype. In recent years, gene expression profiling has been investigated as an alternative to histochemical methods. Since databases like TCGA provide easy and unrestricted access to gene expression data for hundreds of patients, the challenge is to extract a minimal optimal set of genes with good prognostic properties from a large bulk of genes making a moderate contribution to classification. Several studies have successfully applied machine learning algorithms to solve this so-called gene selection problem. However, more diverse data from other OMICS technologies are available, including methylation. We hypothesize that combining methylation and gene expression data could already lead to a largely improved classification model, since the resulting model will reflect differences not only on the transcriptomic, but also on an epigenetic level. We compared so-called random forest derived classification models based on gene expression and methylation data alone, to a model based on the combined features and to a model based on the gold standard PAM50. We obtained bootstrap errors of 10-20% and classification error of 1-50%, depending on breast cancer subtype and model. The gene expression model was clearly superior to the methylation model, which was also reflected in the combined model, which mainly selected features from gene expression data. However, the methylation model was able to identify unique features not considered as relevant by the gene expression model, which might provide deeper insights into breast cancer subtype differentiation on an epigenetic level.
    Journal of integrative bioinformatics 06/2014; 11(2):236. DOI:10.2390/biecoll-jib-2014-236
  • Jan Baumbach · Jiong Guo · Rashid Ibragimov
  • ABSTRACT: Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information.
    05/2014; 5(2):161-168. DOI:10.4331/wjbc.v5.i2.161
  • ABSTRACT: We review the level of genomic specificity regarding actinobacterial pathogenicity. As they occupy various niches in diverse habitats, one may assume the existence of lifestyle-specific genomic features. We include 240 actinobacteria classified into four pathogenicity classes: human pathogens (HPs), broad-spectrum pathogens (BPs), opportunistic pathogens (OPs) and non-pathogenic (NP). We hypothesize: (H1) Pathogens (HPs and BPs) possess specific pathogenicity signature genes. (H2) The same holds for OPs. (H3) Broad-spectrum and exclusively HPs cannot be distinguished from each other because of an observation bias, i.e. many HPs might yet be unclassified BPs. (H4) There is no intrinsic genomic characteristic of OPs compared with pathogens, as small mutations are likely to play a more dominant role to survive the immune system. To study these hypotheses, we implemented a bioinformatics pipeline that combines evolutionary sequence analysis with statistical learning methods (Random Forest with feature selection, model tuning and robustness analysis). Essentially, we present orthologous gene sets that computationally distinguish pathogens from NPs (H1). We further show a clear limit in differentiating OPs from both NPs (H2) and pathogens (H4). HPs may also not be distinguished from bacteria annotated as BPs based only on a small set of orthologous genes (H3), as many HPs might as well target a broad range of mammals but have not been annotated accordingly. In conclusion, we illustrate that even in the post-genome era and despite next-generation sequencing technology, our ability to efficiently deduce real-world conclusions, such as pathogenicity classification, remains quite limited.
    Briefings in functional genomics 05/2014; 13(5). DOI:10.1093/bfgp/elu014 · 3.67 Impact Factor
  • ABSTRACT: We define breathomics as the metabolomics study of exhaled air. It is a strongly emerging metabolomics research field that mainly focuses on health-related volatile organic compounds (VOCs). Since the amount of these compounds varies with health status, breathomics holds great promise to deliver non-invasive diagnostic tools. Thus, the main aim of breathomics is to find patterns of VOCs related to abnormal (for instance inflammatory) metabolic processes occurring in the human body. Recently, analytical methods for measuring VOCs in exhaled air with high resolution and high throughput have been extensively developed. Yet, the application of machine learning methods for fingerprinting VOC profiles in breathomics is still in its infancy. Therefore, in this paper, we describe the current state of the art in data pre-processing and multivariate analysis of breathomics data. We start with the detailed pre-processing pipelines for breathomics data obtained from gas-chromatography mass spectrometry and an ion-mobility spectrometer coupled to multi-capillary columns. The outcome of data pre-processing is a matrix containing the relative abundances of a set of VOCs for a group of patients under different conditions (e.g. disease stage, treatment). Independently of the utilized analytical method, the most important question, 'which VOCs are discriminatory?', remains the same. Answers can be given by several modern machine learning techniques (multivariate statistics) and, therefore, are the focus of this paper. We demonstrate the advantages as well as the drawbacks of such techniques. We aim to help the community to understand how to profit from a particular method. In parallel, we hope to make the community aware of the existing data fusion methods, as yet unresearched in breathomics.
    Journal of Breath Research 04/2014; 8(2):027105. DOI:10.1088/1752-7155/8/2/027105 · 4.63 Impact Factor
  • ABSTRACT: Ion mobility spectrometry coupled to multi-capillary columns (MCC/IMS) combines highly sensitive spectrometry with a rapid separation technique. MCC/IMS is widely used for biomedical breath analysis. The identification of molecules in such a complex sample necessitates a reference database. The existing IMS reference databases are still in their infancy and do not yet allow all analytes to be identified. With a gas chromatograph coupled to a mass selective detector (GC/MSD) set up in parallel to an MCC/IMS instrument, we may increase the accuracy of automatic analyte identification. To overcome the time-consuming manual evaluation and comparison of the results of both devices, we developed a software tool, MIMA (MS-IMS-Mapper), which can computationally generate analyte layers for MCC/IMS spectra by using the corresponding GC/MSD data. We demonstrate the power of our method by successfully identifying the analytes of a seven-component mixture. In conclusion, the main contribution of MIMA is a fast and easy computational method for assigning analyte names to as yet unassigned signals in MCC/IMS data. We believe that this will greatly impact modern MCC/IMS-based biomarker research by "giving a name" to previously detected disease-specific molecules.
    International Journal for Ion Mobility Spectrometry 04/2014; 17(2):95-101. DOI:10.1007/s12127-014-0149-5

Publication Stats

1k Citations
286.62 Total Impact Points


  • 2013–2015
    • University of Southern Denmark
      • Department of Mathematics and Computer Science
      Odense, South Denmark, Denmark
    • Institute of Integrative Omics and Applied Biotechnology
      Rānāghāt, Bengal, India
  • 2014
    • Federal University of Minas Gerais
      • Institute of Biological Sciences
      Cidade de Minas, Minas Gerais, Brazil
  • 2010–2014
    • Universität des Saarlandes
      • Bioinformatics
      Saarbrücken, Saarland, Germany
    • University of California, Berkeley
      Berkeley, California, United States
  • 2010–2012
    • Max Planck Institute for Informatics
      Saarbrücken, Saarland, Germany
  • 2011
    • Buck Institute for Research on Aging
      Novato, California, United States
  • 2006–2011
    • Bielefeld University
      • CeBiTec - Center for Biotechnology
      • Faculty of Technology
      Bielefeld, North Rhine-Westphalia, Germany