
Oliver Kohlbacher
University of Tuebingen | EKU Tübingen · Center for Bioinformatics and Medical Informatics
Professor
Publications: 612
Reads: 102,461
Citations: 18,795
Introduction
Additional affiliations
July 2015 - present
February 2012 - December 2018
July 2003 - present
Education
April 1996 - January 2001
October 1990 - January 1996
Publications (612)
Models derived from human pluripotent stem cells that accurately recapitulate neural development in vitro and allow for the generation of specific neuronal subtypes are of major interest to the stem cell and biomedical community. Notch signalling, particularly through the Notch effector HES5, is a major pathway critical for the onset and maintenanc...
RNA-protein complexes play pivotal roles in many central biological processes. Although methods based on high-throughput sequencing have advanced our ability to identify the specific RNAs bound by a particular protein, there is a need for precise and systematic ways to identify RNA interaction sites on proteins. We have developed an experimental an...
Motivation:
The human leukocyte antigen (HLA) gene cluster plays a crucial role in adaptive immunity and is thus relevant in many biomedical applications. While next-generation sequencing data are often available for a patient, deducing the HLA genotype is difficult because of substantial sequence similarity within the cluster and exceptionally hi...
DNA methylation is a defining feature of mammalian cellular identity and is essential for normal development. Most cell types, except germ cells and pre-implantation embryos, display relatively stable DNA methylation patterns, with 70-80% of all CpGs being methylated. Despite recent advances, we still have a limited understanding of when, where and...
Differentiation of human embryonic stem cells (hESCs) provides a unique opportunity to study the regulatory mechanisms that facilitate cellular transitions in a human context. To that end, we performed comprehensive transcriptional and epigenetic profiling of populations derived through directed differentiation of hESCs representing each of the thr...
In recent years, modern life sciences research has undergone rapid development, driven mainly by technical improvements in analytical areas that have led to miniaturization, parallelization, and high-throughput processing of biological samples. This has led to the generation of huge amounts of experimental data. To meet these rising demands, the German...
Mass spectrometry has become an indispensable tool in the life sciences. The new major version 3 of the computational framework OpenMS provides significant advancements regarding open, scalable, and reproducible high-throughput workflows for proteomics, metabolomics, and oligonucleotide mass spectrometry. OpenMS makes analyses from emerging fields...
Public proteomics data is rapidly increasing, creating a computational challenge for large-scale reanalysis. Here, we introduce quantms, an open-source cloud-based pipeline for massively parallel proteomics data analysis. We used quantms to reanalyze 56 of the largest datasets, comprising 26801 instrument files from 9502 human samples, to quantify...
Metabolomics experiments generate highly complex datasets, which are time- and work-intensive, and sometimes even error-prone, to inspect manually. Therefore, new methods for automated, fast, reproducible, and accurate data processing and dereplication are required. Here, we present UmetaFlow, a computational workflow for untargeted metabolomics that c...
Top-down proteomics (TDP) directly analyzes intact proteins and thus provides more comprehensive qualitative and quantitative proteoform-level information than conventional bottom-up proteomics that relies on digested peptides and protein inference. While significant advancements have been made in TDP in sample preparation, separation, instrumentat...
Neonatal apneas and hypopneas present a serious risk for healthy infant development. Treating these adverse events requires frequent manual stimulation by skilled personnel, which can lead to alert fatigue. Automatically predicting these adverse events before they occur would enable the use of methods for automatic intervention. In this work, we pr...
Background
Personalized oncology represents a shift in cancer treatment from conventional methods to target-specific therapies, where decisions are made based on the patient-specific tumor profile. Selection of the optimal therapy relies on a complex interdisciplinary analysis and interpretation of these variants by experts in molecular tumor bo...
Background
The clinical utility of molecular profiling and targeted therapies for neuro-oncology patients outside of clinical trials is not established. We aimed at investigating feasibility and clinical utility of molecular profiling and targeted therapy in adult patients with advanced tumors in the nervous system within a prospective observationa...
Background
The immune peptidome of OPSCC has not previously been studied. Cancer-antigen specific vaccination may improve clinical outcome and efficacy of immune checkpoint inhibitors such as PD1/PD-L1 antibodies.
Methods
Mapping of the OPSCC HLA ligandome was performed by mass spectrometry (MS) based analysis of naturally presented HLA ligands is...
Background
Although of high individual and socioeconomic relevance, a reliable prediction model for the prognosis of juvenile stroke (18–55 years) is missing. Therefore, the study presented in this protocol aims to prospectively validate the discriminatory power of a prediction score for the 3-month functional outcome after juvenile stroke or tran...
The need for data privacy and security -- enforced through increasingly strict data protection regulations -- renders the use of healthcare data for machine learning difficult. In particular, the transfer of data between different hospitals is often not permissible and thus cross-site pooling of data not an option. The Personal Health Train (PHT) p...
Technical and methodological advances enable the application of crosslinking mass spectrometry (XL-MS) to the identification of DNA and RNA binding sites within proteins in vitro. The use of chemical crosslinking agents and the incorporation of photoactivatable nucleotides increase crosslinking efficiencies, broadening the applicability of XL-MS to in vivo stud...
Lysosomes are well-established as the main cellular organelles for the degradation of macromolecules and emerging as regulatory centers of metabolism. They are of crucial importance for cellular homeostasis, which is exemplified by a plethora of disorders related to alterations in lysosomal function. In this context, protein complexes play a decisi...
To identify potential genetic causes for Mayer-Rokitansky-Küster-Hauser syndrome (MRKH), we analyzed blood and rudimentary uterine tissue of 5 MRKH discordant monozygotic twin pairs. Assuming that a variant solely identified in the affected twin or affected tissue could cause the phenotype, we identified a mosaic variant in ACTR3B with high allele...
Background
As the number of concomitantly used drugs increases, the prevalence of medication risks increases. These include, for example, drug interactions which may reduce or increase the desired and undesired effects of individual drugs.
Objectives
The POLypharmacy, drug interActions and Risks (POLAR) project aims to contribute to the detection of...
Even though raw mass spectrometry data is information rich, the vast majority of the data is underutilized. The ability to interrogate these rich datasets is handicapped by the limited capability and flexibility of existing software. We introduce the Mass Spec Query Language (MassQL) that addresses these issues by enabling an expressive set of mass...
The detailed analysis and structural characterization of proteoforms by top-down proteomics (TDP) has gained a lot of interest in biomedical research. Data-dependent acquisition (DDA) of intact proteins is non-trivial due to the diversity and complexity of proteoforms. Dedicated acquisition methods thus have the potential to greatly improve TDP. He...
Background
Stroke is one of the most frequent diseases, and half of all stroke survivors are left with permanent impairment. Prediction of individual outcome is still difficult. Many but not all patients with stroke improve by approximately 1.7 times the initial impairment, which has been termed the proportional recovery rule. The present study aims at...
Objective
Mayer-Rokitansky-Küster-Hauser syndrome (MRKH) is a rare congenital disease manifesting with aplasia or severe hypoplasia of uterine structures. Even though extensive studies have been performed, for the majority of cases the etiology remains unclear. In this study, we sought to identify genetic causes in discordant monozygotic (MZ) twins...
Mass deconvolution, the determination of proteoform precursor and fragment masses, is crucial for top-down proteomics data analysis. Here we describe the detailed procedure to run FLASHDeconv, an ultrafast, high-quality mass deconvolution tool. Both spectrum- and feature-level deconvolution results are obtainable in various output formats by FLASHD...
COVID-19 has challenged the healthcare systems worldwide. To quickly identify successful diagnostic and therapeutic approaches large data sharing approaches are inevitable. Though organizational clinical data are abundant, many of them are available only in isolated silos and largely inaccessible to external researchers. To overcome and tackle this...
Human expansion in the course of the Neolithic transition in western Eurasia has been one of the major topics in ancient DNA (aDNA) research in the last ten years. Multiple studies have shown that the spread of agriculture and animal husbandry from the Near East across Europe was accompanied by large-scale human expansions. Moreover, changes in sub...
Background
With a growing amount of (multi-)omics data being available, the extraction of knowledge from these datasets is still a difficult problem. Classical enrichment-style analyses require predefined pathways or gene sets that are tested for significant deregulation to assess whether the pathway is functionally involved in the biological proce...
The extraction of meaningful biological knowledge from high-throughput mass spectrometry data relies on limiting false discoveries to a manageable amount. For targeted approaches in metabolomics a main challenge is the detection of false positive metabolic features in the low signal-to-noise ranges of data-independent acquisition results and their...
Motivation
Diagnosis and treatment decisions on genomic data have become widespread as the cost of genome sequencing decreases gradually. In this context, disease-gene association studies are of great importance. However, genomic data is very sensitive when compared to other data types and contains information about individuals and their relatives....
Machine learning is increasingly applied in proteomics and metabolomics to predict molecular structure, function, and physicochemical properties, including behavior in chromatography, ion mobility, and tandem mass spectrometry. These must be described in sufficient detail to apply or evaluate the performance of trained models. Here we look at and i...
Proteins are central to all of the processes of life. For their activity, they almost invariably need to interact with other macromolecules, be they nucleic acids, membranes, glycans, or other proteins. The interaction between proteins is indeed the most common mode of macromolecular interaction underpinning living systems. To understand these syst...
Understanding the molecular principles that govern the composition of the MHC-I immunopeptidome across different primary tissues is fundamentally important to predict how T cells respond in different contexts in vivo. Here, we performed a global analysis of the MHC-I immunopeptidome from 29 and 19 primary human and mouse tissues, respectively. First...
Visualisations of metabolites and metabolic pathways have been used since the early years of research in biology, and pathway maps have become very popular in biochemistry textbooks, on posters, as well as in electronic resources and web pages about metabolism. Visualisations help to present knowledge and support browsing through chemical structure...
Background
Intensive care resources are heavily utilized during the COVID-19 pandemic. However, risk stratification and prediction of SARS-CoV-2 patient clinical outcomes upon ICU admission remain inadequate. This study aimed to develop a machine learning model, based on retrospective and prospective clinical data, to stratify patient risk and predic...
The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics...
Top-down proteomics (TDP) has gained a lot of interest in biomedical application for detailed analysis and structural characterization of proteoforms. Data-dependent acquisition (DDA) of intact proteins is non-trivial due to the diversity and complex signal of proteoforms. Dedicated acquisition methods thus have the potential to greatly improve TDP...
Data-independent acquisition (DIA) is becoming a leading analysis method in biomedical mass spectrometry. The main advantages include greater reproducibility and sensitivity and a greater dynamic range compared with data-dependent acquisition (DDA). However, the data analysis is complex and often requires expert knowledge when dealing with large-sc...
Background. Intensive care resources are heavily utilized during the COVID-19 pandemic. However, risk stratification and prediction of SARS-CoV-2 patient clinical outcomes upon ICU admission remain inadequate. This study aimed to develop a machine learning model, based on retrospective and prospective clinical data, to stratify patient risk and predi...
High price differences between European (e.g., from Italy) and Caucasian (e.g., from Georgia) cultivars motivate food fraud in hazelnuts. In this work, we present two targeted methods for differentiating and grouping hazelnut cultivars based on polymorphic variations in the chloroplast genome. After sequencing the chloroplast genome of 12...
Background: In the initial phase of the COVID-19 pandemic, a lower incidence and death rate were observed in Germany compared to its neighbouring countries, but some studies showed comparatively high death rates in ventilated COVID-19 patients. Methods: In this retrospective analysis, hospital stays of COVID-19 patients at 14 German university hospi...
The amount of public proteomics data is increasing at an extraordinary rate. Hundreds of datasets are submitted each month to ProteomeXchange repositories, representing many types of proteomics studies, focusing on different aspects such as quantitative experiments, post-translational modifications, protein-protein interactions, or subcellular loca...
Pathogens and associated outbreaks of infectious disease exert selective pressure on human populations, and any changes in allele frequencies that result may be especially evident for genes involved in immunity. In this regard, the 1346-1353 Yersinia pestis-caused Black Death pandemic, with continued plague outbreaks spanning several hundred years,...
The SARS-CoV-2 virus is the causative agent of the global COVID-19 infectious disease outbreak, which can lead to acute respiratory distress syndrome (ARDS). However, it is still unclear how the virus interferes with immune cell and metabolic functions in the human body. In this study, we investigated the immune response in acute or convalescent CO...
Objective
We present the Berlin-Tübingen-Oncology corpus (BRONCO), a large and freely available corpus of shuffled sentences from German oncological discharge summaries annotated with diagnosis, treatments, medications, and further attributes including negation and speculation. The aim of BRONCO is to foster reproducible and openly available resear...
Background
The human leucocyte antigen (HLA) complex controls adaptive immunity by presenting defined fractions of the intracellular and extracellular protein content to immune cells. Understanding the benign HLA ligand repertoire is a prerequisite to define safe T-cell-based immunotherapies against cancer. Due to the poor availability of benign ti...
As part of the NUM CODEX project, we have developed a workflow for data collection and transformation of patients suffering from COVID-19. Based on the GECCO dataset, electronic Case Report Forms (eCRFs) were designed for the electronic data capture (EDC)-Systems REDCap and DIS. Their standard CDISC ODM output was subsequently mapped and transforme...
Increasing numbers of protein interactions have been identified in high-throughput experiments, but only a small proportion have solved structures. Recently, sequence coevolution-based approaches have led to a breakthrough in predicting monomer protein structures and protein interaction interfaces. Here, we address the challenges of large-scale int...
Today it is the norm that all relevant proteomics data that support the conclusions in scientific publications are made available in public proteomics data repositories. However, given the increase in the number of clinical proteomics studies, an important emerging topic is the management and dissemination of clinical, and thus potentially sensitiv...
The Wartberg culture (WBC, 3500-2800 BCE) dates to the Late Neolithic period, a time of important demographic and cultural transformations in western Europe. We performed genome-wide analyses of 42 individuals who were interred in a WBC collective burial in Niedertiefenbach, Germany (3300-3200 cal. BCE). The results showed that the farming populati...
Background
Overcoming the COVID-19 crisis requires new ideas and strategies for online communication of personal medical information and patient empowerment. Rapid testing of a large number of subjects is essential for monitoring and delaying the spread of SARS-CoV-2 in order to mitigate the pandemic’s consequences. People who do not know that they...
The COVID-19 pandemic has strained health systems worldwide, disrupting routine hospital services for all non-COVID patients. Within this retrospective study, we analyzed inpatient hospital admissions across 18 German university hospitals during the 2020 lockdown period compared to 2018. Patients admitted to hospital between January 1 and M...
T cell immunity is central for the control of viral infections. To characterize T cell immunity, but also for the development of vaccines, identification of exact viral T cell epitopes is fundamental. Here we identify and characterize multiple dominant and subdominant SARS-CoV-2 HLA class I and HLA-DR peptides as potential T cell epitopes in COVID-...
The aim of the present study was the prediction of the geographical origin of almonds (Prunus dulcis Mill.) via Fourier transform near-infrared (FT-NIR) spectroscopy. For this purpose, 250 almond samples from six different countries were analyzed. As the year of harvest has a major impact on the metabolome, three different crop years (2017–2019) we...
Abstract
Background: In the initial phase of the COVID-19 pandemic, a lower incidence and case fatality rate were observed in Germany compared with its neighbouring countries; however, some studies showed comparatively high case fatality rates among ventilated COVID-19 patients.
Methods: As part of this retrospective a...
Data-independent acquisition (DIA) is becoming a leading analysis method in biomedical mass spectrometry. Main advantages include greater reproducibility, sensitivity and dynamic range compared to data-dependent acquisition (DDA). However, data analysis is complex and often requires expert knowledge when dealing with large-scale data sets. Here we...
Technological advances in high-resolution mass spectrometry (MS) vastly increased the number of samples that can be processed in a life science experiment, as well as volume and complexity of the generated data. To address the bottleneck of high-throughput data processing, we present SmartPeak (https://github.com/AutoFlowResearch/SmartPeak), an app...