Oliver Kohlbacher

University of Tuebingen | EKU Tübingen · Center for Bioinformatics and Medical Informatics

Professor

About

612
Publications
102,461
Reads
18,795
Citations
Citations since 2017
190 Research Items
11,766 Citations
[Chart: citations per year, 2017 to 2023; y-axis 0 to 2,000]
Additional affiliations
July 2015 - present
University of Tuebingen
Position
  • Professor (Associate)
February 2012 - December 2018
University of Tuebingen
Position
  • Managing Director
July 2003 - present
University of Tuebingen
Position
  • Professor
Education
April 1996 - January 2001
Max Planck Institute for Informatics
Field of study
  • Computer Science
October 1990 - January 1996
Universität des Saarlandes
Field of study
  • Chemistry, Computer Science

Publications (612)
Article
Full-text available
Models derived from human pluripotent stem cells that accurately recapitulate neural development in vitro and allow for the generation of specific neuronal subtypes are of major interest to the stem cell and biomedical community. Notch signalling, particularly through the Notch effector HES5, is a major pathway critical for the onset and maintenanc...
Article
Full-text available
RNA-protein complexes play pivotal roles in many central biological processes. Although methods based on high-throughput sequencing have advanced our ability to identify the specific RNAs bound by a particular protein, there is a need for precise and systematic ways to identify RNA interaction sites on proteins. We have developed an experimental an...
Article
Full-text available
Motivation: The human leukocyte antigen (HLA) gene cluster plays a crucial role in adaptive immunity and is thus relevant in many biomedical applications. While next-generation sequencing data are often available for a patient, deducing the HLA genotype is difficult because of substantial sequence similarity within the cluster and exceptionally hi...
Article
Full-text available
DNA methylation is a defining feature of mammalian cellular identity and is essential for normal development. Most cell types, except germ cells and pre-implantation embryos, display relatively stable DNA methylation patterns, with 70-80% of all CpGs being methylated. Despite recent advances, we still have a limited understanding of when, where and...
Article
Differentiation of human embryonic stem cells (hESCs) provides a unique opportunity to study the regulatory mechanisms that facilitate cellular transitions in a human context. To that end, we performed comprehensive transcriptional and epigenetic profiling of populations derived through directed differentiation of hESCs representing each of the thr...
Article
Full-text available
In recent years, modern life sciences research underwent a rapid development driven mainly by the technical improvements in analytical areas leading to miniaturization, parallelization, and high throughput processing of biological samples. This has led to the generation of huge amounts of experimental data. To meet these rising demands, the German...
Preprint
Full-text available
Mass spectrometry has become an indispensable tool in the life sciences. The new major version 3 of the computational framework OpenMS provides significant advancements regarding open, scalable, and reproducible high-throughput workflows for proteomics, metabolomics, and oligonucleotide mass spectrometry. OpenMS makes analyses from emerging fields...
Preprint
Full-text available
Public proteomics data is rapidly increasing, creating a computational challenge for large-scale reanalysis. Here, we introduce quantms, an open-source cloud-based pipeline for massively parallel proteomics data analysis. We used quantms to reanalyze 56 of the largest datasets, comprising 26801 instrument files from 9502 human samples, to quantify...
Article
Full-text available
Metabolomics experiments generate highly complex datasets, which are time and work-intensive, sometimes even error-prone if inspected manually. Therefore, new methods for automated, fast, reproducible, and accurate data processing and dereplication are required. Here, we present UmetaFlow, a computational workflow for untargeted metabolomics that c...
Preprint
Full-text available
Top-down proteomics (TDP) directly analyzes intact proteins and thus provides more comprehensive qualitative and quantitative proteoform-level information than conventional bottom-up proteomics that relies on digested peptides and protein inference. While significant advancements have been made in TDP in sample preparation, separation, instrumentat...
Preprint
Full-text available
Neonatal apneas and hypopneas present a serious risk for healthy infant development. Treating these adverse events requires frequent manual stimulation by skilled personnel, which can lead to alert fatigue. Automatically predicting these adverse events before they occur would enable the use of methods for automatic intervention. In this work, we pr...
Article
Full-text available
Background Personalized oncology represents a shift in cancer treatment from conventional methods to target-specific therapies, where decisions are made based on the patient-specific tumor profile. Selection of the optimal therapy relies on a complex interdisciplinary analysis and interpretation of these variants by experts in molecular tumor bo...
Preprint
Full-text available
Metabolomics experiments generate highly complex datasets, which are time and work-intensive, sometimes even error-prone if inspected manually. Therefore, new methods for automated, fast, reproducible, and accurate data processing and dereplication are required. Here, we present UmetaFlow, a computational workflow for untargeted metabolomics that c...
Article
Full-text available
Background The clinical utility of molecular profiling and targeted therapies for neuro-oncology patients outside of clinical trials is not established. We aimed at investigating feasibility and clinical utility of molecular profiling and targeted therapy in adult patients with advanced tumors in the nervous system within a prospective observationa...
Article
Full-text available
Background The immune peptidome of OPSCC has not previously been studied. Cancer-antigen specific vaccination may improve clinical outcome and efficacy of immune checkpoint inhibitors such as PD1/PD-L1 antibodies. Methods Mapping of the OPSCC HLA ligandome was performed by mass spectrometry (MS) based analysis of naturally presented HLA ligands is...
Article
Full-text available
Background Although of high individual and socioeconomic relevance, a reliable prediction model for the prognosis of juvenile stroke (18–55 years) is missing. Therefore, the study presented in this protocol aims to prospectively validate the discriminatory power of a prediction score for the 3 months functional outcome after juvenile stroke or tran...
Preprint
Full-text available
The need for data privacy and security -- enforced through increasingly strict data protection regulations -- renders the use of healthcare data for machine learning difficult. In particular, the transfer of data between different hospitals is often not permissible and thus cross-site pooling of data not an option. The Personal Health Train (PHT) p...
Poster
Technical and methodological advances make it possible to apply crosslinking mass spectrometry (XL-MS) to identify DNA and RNA binding sites within proteins in vitro. The use of chemical crosslinking agents and the incorporation of photoactivatable nucleotides increase crosslinking efficiencies, broadening the applicability of XL-MS to in vivo stud...
Article
Full-text available
Lysosomes are well-established as the main cellular organelles for the degradation of macromolecules and emerging as regulatory centers of metabolism. They are of crucial importance for cellular homeostasis, which is exemplified by a plethora of disorders related to alterations in lysosomal function. In this context, protein complexes play a decisi...
Preprint
Full-text available
Metabolomics experiments generate highly complex datasets, which are time and work-intensive, sometimes even error-prone if inspected manually. Therefore, new methods for automated, fast, reproducible, and accurate data processing and dereplication are required. Here, we present UmetaFlow, a computational workflow for untargeted metabolomics that c...
Preprint
Full-text available
Metabolomics experiments generate highly complex datasets, which are time and work-intensive, sometimes even error-prone if inspected manually. Therefore, new methods for automated, fast, reproducible, and accurate data processing and dereplication are required. Here, we present UmetaFlow, a computational workflow for untargeted metabolomics that c...
Article
Full-text available
To identify potential genetic causes for Mayer-Rokitansky-Küster-Hauser syndrome (MRKH), we analyzed blood and rudimentary uterine tissue of 5 MRKH discordant monozygotic twin pairs. Assuming that a variant solely identified in the affected twin or affected tissue could cause the phenotype, we identified a mosaic variant in ACTR3B with high allele...
Article
Background As the number of concomitantly used drugs increases, the prevalence of medication risks increases. These include, for example, drug interactions which may reduce or increase the desired and undesired effects of individual drugs. Objectives The POLypharmacy, drug interActions and Risks (POLAR) project aims to contribute to the detection of...
Preprint
Full-text available
Even though raw mass spectrometry data is information rich, the vast majority of the data is underutilized. The ability to interrogate these rich datasets is handicapped by the limited capability and flexibility of existing software. We introduce the Mass Spec Query Language (MassQL) that addresses these issues by enabling an expressive set of mass...
Article
Full-text available
The detailed analysis and structural characterization of proteoforms by top-down proteomics (TDP) has gained a lot of interest in biomedical research. Data-dependent acquisition (DDA) of intact proteins is non-trivial due to the diversity and complexity of proteoforms. Dedicated acquisition methods thus have the potential to greatly improve TDP. He...
Article
Full-text available
Background Stroke is one of the most frequent diseases, and half of the stroke survivors are left with permanent impairment. Prediction of individual outcome is still difficult. Many but not all patients with stroke improve by approximately 1.7 times the initial impairment, which has been termed the proportional recovery rule. The present study aims at...
Preprint
Objective Mayer-Rokitansky-Küster-Hauser syndrome (MRKH) is a rare congenital disease manifesting with aplasia or severe hypoplasia of uterine structures. Even though extensive studies have been performed, for the majority of cases the etiology remains unclear. In this study, we sought to identify genetic causes in discordant monozygotic (MZ) twins...
Chapter
Mass deconvolution, the determination of proteoform precursor and fragment masses, is crucial for top-down proteomics data analysis. Here we describe the detailed procedure to run FLASHDeconv, an ultrafast, high-quality mass deconvolution tool. Both spectrum- and feature-level deconvolution results are obtainable in various output formats by FLASHD...
Chapter
Full-text available
COVID-19 has challenged healthcare systems worldwide. To quickly identify successful diagnostic and therapeutic approaches, large data sharing approaches are inevitable. Though organizational clinical data are abundant, many of them are available only in isolated silos and largely inaccessible to external researchers. To overcome and tackle this...
Article
Full-text available
Human expansion in the course of the Neolithic transition in western Eurasia has been one of the major topics in ancient DNA (aDNA) research in the last ten years. Multiple studies have shown that the spread of agriculture and animal husbandry from the Near East across Europe was accompanied by large-scale human expansions. Moreover, changes in sub...
Article
Full-text available
Background With a growing amount of (multi-)omics data being available, the extraction of knowledge from these datasets is still a difficult problem. Classical enrichment-style analyses require predefined pathways or gene sets that are tested for significant deregulation to assess whether the pathway is functionally involved in the biological proce...
Article
Full-text available
The extraction of meaningful biological knowledge from high-throughput mass spectrometry data relies on limiting false discoveries to a manageable amount. For targeted approaches in metabolomics a main challenge is the detection of false positive metabolic features in the low signal-to-noise ranges of data-independent acquisition results and their...
Article
Full-text available
Motivation Diagnosis and treatment decisions on genomic data have become widespread as the cost of genome sequencing decreases gradually. In this context, disease-gene association studies are of great importance. However, genomic data is very sensitive when compared to other data types and contains information about individuals and their relatives....
Article
Full-text available
Machine learning is increasingly applied in proteomics and metabolomics to predict molecular structure, function, and physicochemical properties, including behavior in chromatography, ion mobility, and tandem mass spectrometry. These must be described in sufficient detail to apply or evaluate the performance of trained models. Here we look at and i...
Article
Proteins are central to all of the processes of life. For their activity, they almost invariably need to interact with other macromolecules, be they nucleic acids, membranes, glycans, or other proteins. The interaction between proteins is indeed the most common mode of macromolecular interaction underpinning living systems. To understand these syst...
Article
Full-text available
Understanding the molecular principles that govern the composition of the MHC-I immunopeptidome across different primary tissues is fundamentally important to predict how T cells respond in different contexts in vivo. Here, we performed a global analysis of the MHC-I immunopeptidome from 29 and 19 primary human and mouse tissues, respectively. First...
Preprint
Full-text available
Lysosomes are well-established as the main cellular organelles for the degradation of macromolecules and emerging as regulatory centers of metabolism. They are of crucial importance for cellular homeostasis, which is exemplified by a plethora of disorders related to alterations in lysosomal function. In this context, protein complexes play a decisi...
Chapter
Visualisations of metabolites and metabolic pathways have been used since the early years of research in biology, and pathway maps have become very popular in biochemistry textbooks, on posters, as well as in electronic resources and web pages about metabolism. Visualisations help to present knowledge and support browsing through chemical structure...
Article
Full-text available
Background Intensive Care Resources are heavily utilized during the COVID-19 pandemic. However, risk stratification and prediction of SARS-CoV-2 patient clinical outcomes upon ICU admission remain inadequate. This study aimed to develop a machine learning model, based on retrospective & prospective clinical data, to stratify patient risk and predic...
Article
Full-text available
The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics...
Preprint
Full-text available
Top-down proteomics (TDP) has gained a lot of interest in biomedical application for detailed analysis and structural characterization of proteoforms. Data-dependent acquisition (DDA) of intact proteins is non-trivial due to the diversity and complex signal of proteoforms. Dedicated acquisition methods thus have the potential to greatly improve TDP...
Preprint
Full-text available
Background Personalized oncology represents a shift in cancer treatment from conventional methods to target-specific therapies, where decisions are made based on the patient-specific tumor profile. Selection of the optimal therapy relies on a complex interdisciplinary analysis and interpretation of these variants by experts in molecular tumor bo...
Article
Full-text available
Data-independent acquisition (DIA) is becoming a leading analysis method in biomedical mass spectrometry. The main advantages include greater reproducibility and sensitivity and a greater dynamic range compared with data-dependent acquisition (DDA). However, the data analysis is complex and often requires expert knowledge when dealing with large-sc...
Preprint
Full-text available
Background. Intensive Care Resources are heavily utilized during the COVID-19 pandemic. However, risk stratification and prediction of SARS-CoV-2 patient clinical outcomes upon ICU admission remain inadequate. This study aimed to develop a machine learning model, based on retrospective & prospective clinical data, to stratify patient risk and predi...
Article
High price differences between European (e.g., from Italy) and Caucasian (e.g., from Georgia) cultivars motivate food fraud in hazelnuts. In this work, we present two targeted methods for differentiation and grouping different hazelnut cultivars based on polymorphic variations in the chloroplast genome. After sequencing the chloroplast genome of 12...
Article
Background: In the initial phase of the COVID-19 pandemic, a lower incidence and death rate was observed in Germany compared to its neighbouring countries, but some studies showed comparatively high death rates in ventilated COVID-19 patients. Methods: In this retrospective analysis, hospital stays of COVID-19 patients at 14 German university hospi...
Preprint
Full-text available
The amount of public proteomics data is increasing at an extraordinary rate. Hundreds of datasets are submitted each month to ProteomeXchange repositories, representing many types of proteomics studies, focusing on different aspects such as quantitative experiments, post-translational modifications, protein-protein interactions, or subcellular loca...
Article
Full-text available
Pathogens and associated outbreaks of infectious disease exert selective pressure on human populations, and any changes in allele frequencies that result may be especially evident for genes involved in immunity. In this regard, the 1346-1353 Yersinia pestis-caused Black Death pandemic, with continued plague outbreaks spanning several hundred years,...
Preprint
Full-text available
Background With a growing amount of (multi-)omics data being available, the extraction of knowledge from these datasets is still a difficult problem. Classical enrichment-style analyses require predefined pathways or gene sets that are tested for significant deregulation to assess whether the pathway is functionally involved in the biological proce...
Article
Full-text available
The SARS-CoV-2 virus is the causative agent of the global COVID-19 infectious disease outbreak, which can lead to acute respiratory distress syndrome (ARDS). However, it is still unclear how the virus interferes with immune cell and metabolic functions in the human body. In this study, we investigated the immune response in acute or convalescent CO...
Article
Full-text available
Objective We present the Berlin-Tübingen-Oncology corpus (BRONCO), a large and freely available corpus of shuffled sentences from German oncological discharge summaries annotated with diagnosis, treatments, medications, and further attributes including negation and speculation. The aim of BRONCO is to foster reproducible and openly available resear...
Article
Full-text available
Background The human leucocyte antigen (HLA) complex controls adaptive immunity by presenting defined fractions of the intracellular and extracellular protein content to immune cells. Understanding the benign HLA ligand repertoire is a prerequisite to define safe T-cell-based immunotherapies against cancer. Due to the poor availability of benign ti...
Preprint
Full-text available
As part of the NUM CODEX project, we have developed a workflow for data collection and transformation of patients suffering from COVID-19. Based on the GECCO dataset, electronic Case Report Forms (eCRFs) were designed for the electronic data capture (EDC)-Systems REDCap and DIS. Their standard CDISC ODM output was subsequently mapped and transforme...
Article
Full-text available
Increasing numbers of protein interactions have been identified in high-throughput experiments, but only a small proportion have solved structures. Recently, sequence coevolution-based approaches have led to a breakthrough in predicting monomer protein structures and protein interaction interfaces. Here, we address the challenges of large-scale int...
Article
Full-text available
Today it is the norm that all relevant proteomics data that support the conclusions in scientific publications are made available in public proteomics data repositories. However, given the increase in the number of clinical proteomics studies, an important emerging topic is the management and dissemination of clinical, and thus potentially sensitiv...
Article
Full-text available
The Wartberg culture (WBC, 3500-2800 BCE) dates to the Late Neolithic period, a time of important demographic and cultural transformations in western Europe. We performed genome-wide analyses of 42 individuals who were interred in a WBC collective burial in Niedertiefenbach, Germany (3300-3200 cal. BCE). The results showed that the farming populati...
Article
Full-text available
Background Overcoming the COVID-19 crisis requires new ideas and strategies for online communication of personal medical information and patient empowerment. Rapid testing of a large number of subjects is essential for monitoring and delaying the spread of SARS-CoV-2 in order to mitigate the pandemic’s consequences. People who do not know that they...
Preprint
BACKGROUND Overcoming the COVID-19 crisis requires new ideas and strategies for online communication of personal medical information and patient empowerment. Rapid testing of a large number of subjects is essential for monitoring and delaying the spread of SARS-CoV-2 in order to mitigate the pandemic’s consequences. People who do not know that they...
Article
Full-text available
The COVID-19 pandemic has caused strains on health systems worldwide disrupting routine hospital services for all non-COVID patients. Within this retrospective study, we analyzed inpatient hospital admissions across 18 German university hospitals during the 2020 lockdown period compared to 2018. Patients admitted to hospital between January 1 and M...
Article
Full-text available
T cell immunity is central for the control of viral infections. To characterize T cell immunity, but also for the development of vaccines, identification of exact viral T cell epitopes is fundamental. Here we identify and characterize multiple dominant and subdominant SARS-CoV-2 HLA class I and HLA-DR peptides as potential T cell epitopes in COVID-...
Article
The aim of the present study was the prediction of the geographical origin of almonds (Prunus dulcis Mill.) via Fourier transform near-infrared (FT-NIR) spectroscopy. For this purpose, 250 almond samples from six different countries were analyzed. As the year of harvest has a major impact on the metabolome, three different crop years (2017–2019) we...
Article
Abstract Background: In the initial phase of the COVID-19 pandemic, a lower incidence and lethality rate was observed in Germany compared with its neighbouring countries, but some studies showed comparatively high lethality rates in ventilated COVID-19 patients. Methods: In this retrospective a...
Preprint
Full-text available
Data-independent acquisition (DIA) is becoming a leading analysis method in biomedical mass spectrometry. Main advantages include greater reproducibility, sensitivity and dynamic range compared to data-dependent acquisition (DDA). However, data analysis is complex and often requires expert knowledge when dealing with large-scale data sets. Here we...
Article
Technological advances in high-resolution mass spectrometry (MS) vastly increased the number of samples that can be processed in a life science experiment, as well as volume and complexity of the generated data. To address the bottleneck of high-throughput data processing, we present SmartPeak (https://github.com/AutoFlowResearch/SmartPeak), an app...