David B. Blumenthal

Friedrich-Alexander-University of Erlangen-Nürnberg | FAU · Department Artificial Intelligence in Biomedical Engineering

Prof. Dr.

About

44 Publications
6,470 Reads
398 Citations
Introduction
I am a tenure-track assistant professor and head of the Biomedical Network Science Lab at the Department Artificial Intelligence in Biomedical Engineering of the Friedrich-Alexander University Erlangen-Nürnberg. My lab investigates molecular disease mechanisms using techniques from network science and artificial intelligence. We also develop privacy-preserving, decentralized biomedical AI solutions that enable cross-institutional studies on sensitive data.

Publications (44)
Article
As the development of new drugs reaches its physical and financial limits, drug repurposing has become more important than ever. For mechanistically grounded drug repurposing, it is crucial to uncover the disease mechanisms and to detect clusters of mechanistically related diseases. Various methods for computing candidate disease mechanisms and dis...
Preprint
Federated learning (FL) is emerging as a privacy-aware alternative to classical cloud-based machine learning. In FL, the sensitive data remains in data silos and only aggregated parameters are exchanged. Hospitals and research institutions which are not willing to share their data can join a federated study without breaching confidentiality. In add...
Preprint
Gene regulation is frequently altered in diseases in unique and often patient-specific ways. Hence, personalized strategies have been proposed to infer patient-specific gene-regulatory networks. However, existing methods do not focus on disease-specific dysregulation or lack assessments of statistical significance. Moreover, they do not account for...
Article
Matchings between objects from two datasets, domains, or ontologies have to be computed in various application scenarios. One often used meta-approach — which we call bipartite data matching — is to leverage domain knowledge for defining costs between the objects that should be matched, and to then use the classical Hungarian algorithm to compute a...
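The bipartite data matching meta-approach described in this abstract (define pairwise costs between objects, then solve the resulting assignment problem with the Hungarian algorithm) can be illustrated with the minimal Python sketch below. It is a generic O(n³) Hungarian method with dual potentials for square cost matrices, not the code from the article; the function name `hungarian` and the toy 3×3 cost matrix are hypothetical. Rectangular instances would first need padding with dummy rows or columns.

```python
from typing import List, Sequence, Tuple

INF = float("inf")


def hungarian(cost: Sequence[Sequence[float]]) -> Tuple[float, List[int]]:
    """Solve a square linear sum assignment problem (minimization).

    Returns (total_cost, assignment), where assignment[i] is the column
    (e.g. the object of the second dataset) matched to row i.
    """
    n = len(cost)
    u = [0.0] * (n + 1)    # row potentials (index 0 is a sentinel)
    v = [0.0] * (n + 1)    # column potentials
    p = [0] * (n + 1)      # p[j]: row matched to column j (0 = none)
    way = [0] * (n + 1)    # way[j]: predecessor column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i           # start a new augmenting path from row i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]  # reduced cost
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so that at least one new column becomes tight
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        # Flip the matching along the augmenting path
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return sum(cost[i][assignment[i]] for i in range(n)), assignment


if __name__ == "__main__":
    # Hypothetical costs between three objects of dataset A (rows) and B (columns)
    costs = [[4, 1, 3], [2, 0, 5], [3, 2, 2]]
    total, matching = hungarian(costs)
    print(total, matching)  # 5 [1, 0, 2]
```

In practice, a library solver such as SciPy's `linear_sum_assignment` would typically be used instead. The hand-written version only makes the algorithm's structure visible.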
Article
Background: Artificial intelligence (AI) has been successfully applied in numerous scientific domains. In biomedicine, AI has already shown tremendous potential, e.g., in the interpretation of next-generation sequencing data and in the design of clinical decision support systems. Objectives: However, training an AI model on sensitive data raises conc...
Article
Motivation: Disease module mining methods (DMMMs) extract subgraphs that constitute candidate disease mechanisms from molecular interaction networks such as protein-protein interaction (PPI) networks. Irrespective of the employed models, DMMMs typically include non-robust steps in their workflows, i. e., the computed subnetworks vary when running th...
Article
Aggregating transcriptomics data across hospitals can increase sensitivity and robustness of differential expression analyses, yielding deeper clinical insights. As data exchange is often restricted by privacy legislation, meta-analyses are frequently employed to pool local results. However, the accuracy might drop if class labels are inhomogeneous...
Article
Traditional drug discovery faces a severe efficacy crisis. Repurposing of registered drugs provides an alternative with lower costs and faster drug development timelines. However, the data necessary for the identification of disease modules, i.e. pathways and sub-networks describing the mechanisms of complex diseases which contain potential drug ta...
Preprint
Finding the graphs that are most similar to a query graph in a large database is a common task with various applications. A widely used similarity measure is the graph edit distance, which provides an intuitive notion of similarity and naturally supports graphs with vertex and edge attributes. Since its computation is NP-hard, techniques for accele...
Chapter
The inference of minimum spanning arborescences within a set of objects is a general problem which translates into numerous application-specific unsupervised learning tasks. We introduce a unified and generic structure called edit arborescence that relies on edit paths between data in a collection, as well as the Minimum Edit Arborescence Problem,...
Chapter
Finding the graphs that are most similar to a query graph in a large database is a common task with various applications. A widely used similarity measure is the graph edit distance, which provides an intuitive notion of similarity and naturally supports graphs with vertex and edge attributes. Since its computation is NP-hard, techniques for accele...
Article
We present the AIMe registry, a community-driven reporting platform for AI in biomedicine. It aims to enhance the accessibility, reproducibility and usability of biomedical AI models, and allows future revisions by the community. View-only version: https://rdcu.be/cv5H7
Preprint
The inference of minimum spanning arborescences within a set of objects is a general problem which translates into numerous application-specific unsupervised learning tasks. We introduce a unified and generic structure called edit arborescence that relies on edit paths between data in a collection, as well as the Min Edit Arborescence Problem, whic...
Article
In network and systems medicine, active module identification methods (AMIMs) are widely used for discovering candidate molecular disease mechanisms. To this end, AMIMs combine network analysis algorithms with molecular profiling data, most commonly, by projecting gene expression data onto generic protein–protein interaction (PPI) networks. Althoug...
Article
In this paper, we present GMG-BCU — a local search algorithm based on block coordinate update for estimating a generalized median graph for a given collection of labeled or unlabeled input graphs. Unlike all competitors, GMG-BCU is designed for both discrete and continuous label spaces and can be configured to run in linear time w. r. t. the size o...
Article
The graph edit distance (GED) is a flexible distance measure which is widely used for inexact graph matching. Since its exact computation is NP-hard, heuristics are used in practice. A popular approach is to obtain upper bounds for GED via transformations to the linear sum assignment problem with error-correction (LSAPE). Typically...
Article
Motivation: Unsupervised learning approaches are frequently employed to stratify patients into clinically relevant subgroups and to identify biomarkers such as disease-associated genes. However, clustering and biclustering techniques are oblivious to the functional relationship of genes and are thus not ideally suited to pinpoint molecular mechanism...
Article
Coronavirus disease 2019 (COVID-19), caused by the SARS-CoV-2 virus, has developed into a pandemic causing major disruptions and hundreds of thousands of deaths in wide parts of the world. As of July 3, 2020, neither vaccines nor approved drugs for effective treatment are available. In this article, we showcase how to individuate drug targets and p...
Article
Motivation: Recently, various tools for detecting single nucleotide polymorphisms (SNPs) involved in epistasis have been developed. However, no studies evaluate the employed statistical epistasis models such as the χ2-test or quadratic regression independently of the tools that use them. Such an independent evaluation is crucial for developing impro...
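To make the χ²-test mentioned in this abstract concrete, here is one common way of applying it to a SNP pair. The Pearson statistic is computed on a table that cross-tabulates the nine joint genotypes against a case/control phenotype. This minimal Python sketch assumes genotypes coded 0/1/2 (minor-allele counts) and a binary phenotype. The helper names and toy data are hypothetical, and this is not code from the evaluated tools. A p-value would then come from the χ² distribution, for example via `scipy.stats.chi2.sf(stat, df)`.

```python
from typing import List, Sequence


def chi2_statistic(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-squared statistic of an r x c contingency table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:  # empty rows/columns contribute nothing
                stat += (observed - expected) ** 2 / expected
    return stat


def degrees_of_freedom(table: Sequence[Sequence[float]]) -> int:
    """(r - 1)(c - 1), counting only rows and columns with a nonzero total."""
    r = sum(1 for row in table if sum(row) > 0)
    c = sum(1 for col in zip(*table) if sum(col) > 0)
    return max(r - 1, 0) * max(c - 1, 0)


def snp_pair_table(g1: Sequence[int], g2: Sequence[int], phenotype: Sequence[int]) -> List[List[int]]:
    """9 x 2 table: row 3*a + b is the joint genotype (a, b); column 0/1 is control/case."""
    table = [[0, 0] for _ in range(9)]
    for a, b, y in zip(g1, g2, phenotype):
        table[3 * a + b][y] += 1
    return table


if __name__ == "__main__":
    # Hypothetical genotypes (minor-allele counts) of two SNPs and case/control labels
    g1 = [0, 1, 2, 2, 1, 0, 2, 1]
    g2 = [0, 1, 2, 1, 2, 0, 2, 0]
    y = [0, 0, 1, 1, 1, 0, 1, 0]
    table = snp_pair_table(g1, g2, y)
    print(chi2_statistic(table), degrees_of_freedom(table))
```

This joint test measures association of the SNP pair as a whole. Separating a genuine interaction effect from the marginal effects of each SNP requires a dedicated model, which is the kind of question the abstract raises.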
Preprint
Federated learning is a well-established approach to privacy-preserving training of a joint model on heavily distributed data. Federated averaging (FedAvg) is a well-known communication-efficient algorithm for federated learning, which performs well if the data distribution across the clients is independently and identically distributed (IID). Howe...
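The server-side step of FedAvg, which this abstract builds on, averages the clients' parameters weighted by each client's share n_k / n of the training samples. The minimal Python sketch below assumes a toy one-parameter linear model y = w·x, trained on each client by full-batch gradient descent. The function names, learning rate, and epoch count are hypothetical. The sketch does not implement the remedies for non-IID data studied in the preprint.

```python
from typing import List, Sequence, Tuple

Dataset = Sequence[Tuple[float, float]]  # (x, y) pairs held by one client


def local_update(params: List[float], data: Dataset, lr: float = 0.1, epochs: int = 5) -> List[float]:
    """Hypothetical client step: full-batch gradient descent on the MSE of y = w * x."""
    w = params[0]
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return [w]


def fedavg_aggregate(client_params: Sequence[Sequence[float]], client_sizes: Sequence[int]) -> List[float]:
    """Server step: average client parameters weighted by n_k / n."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    n = sum(client_sizes)
    aggregated = [0.0] * len(client_params[0])
    for params, n_k in zip(client_params, client_sizes):
        for d, value in enumerate(params):
            aggregated[d] += (n_k / n) * value
    return aggregated


def fedavg_round(global_params: List[float], client_datasets: Sequence[Dataset]) -> List[float]:
    """One communication round: every client trains from the global model, then the server averages."""
    local = [local_update(global_params, data) for data in client_datasets]
    return fedavg_aggregate(local, [len(data) for data in client_datasets])


if __name__ == "__main__":
    # Two hypothetical clients whose data both follow y = 2x (an IID-like setting)
    clients = [[(1.0, 2.0), (2.0, 4.0)], [(0.5, 1.0), (1.5, 3.0)]]
    w = [0.0]
    for _ in range(20):
        w = fedavg_round(w, clients)
    print(w)  # close to [2.0]
```

Only the aggregated parameters leave each client. The raw (x, y) pairs stay local, which is the privacy property federated learning relies on.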
Preprint
Aggregating clinical transcriptomics data across hospitals can increase sensitivity and robustness of differential gene expression analyses yielding deeper clinical insights. As data exchange is often restricted by privacy legislation, meta-analyses are frequently employed to pool local results. However, if class labels or confounders are inhomogen...
Article
We discuss the trend towards using quantitative metrics for evaluating research. We claim that, rather than promoting meaningful research, purely metric-based research evaluation schemes potentially lead to a dystopian academic reality, leaving no space for creativity and intellectual initiative. After sketching what the future could look like if q...
Article
In this paper, we investigate the computation of alternative paths between two locations in a road network. More specifically, we study the k-shortest paths with limited overlap (kSPwLO) problem that aims at finding a set of k paths such that all paths are sufficiently dissimilar to each other and as short as possible. To compute k...
Preprint
Artificial intelligence (AI) has been successfully applied in numerous scientific domains including biomedicine and healthcare. Here, it has led to several breakthroughs ranging from clinical decision support systems and image analysis to whole-genome sequencing. However, training an AI model on sensitive data also raises concerns about the privacy of...
Article
Coronavirus Disease-2019 (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus. Various studies exist about the molecular mechanisms of viral infection. However, such information is spread across many publications, and it is very time-consuming to integrate and exploit. We develop CoVex, an interactive online platform for SARS-CoV-2 ho...
Preprint
Coronavirus Disease-2019 (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus. It was first identified in Wuhan, China, and has since spread causing a global pandemic. Various studies have been performed to understand the molecular mechanisms of viral infection for predicting drug repurposing candidates. However, such information is s...
Article
Simulated data is crucial for evaluating epistasis detection tools in genome-wide association studies. Existing simulators are limited, as they do not account for linkage disequilibrium (LD), support limited interaction models of single nucleotide polymorphisms (SNPs) and only dichotomous phenotypes, or depend on proprietary software. In contrast,...
Article
Because of its flexibility, intuitiveness, and expressivity, the graph edit distance (GED) is one of the most widely used distance measures for labeled graphs. Since exactly computing GED is NP-hard, over the past years, various heuristics have been proposed. They use techniques such as transformations to the linear sum assignment problem with erro...
Article
The graph edit distance (GED) measures the dissimilarity between two graphs as the minimal cost of a sequence of elementary operations transforming one graph into another. This measure is fundamental in many areas such as structural pattern recognition or classification. However, exactly computing GED is NP-hard. Among different classes of heuristi...
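For very small graphs, the definition of GED given in this abstract can be made concrete. The Python sketch below assumes uniform edit costs: 1 per node or edge insertion or deletion, 1 per node relabeling, and unlabeled undirected edges without self-loops. These costs and all function names are assumptions made for illustration.

The sketch works with node maps. In a node map, each node of the first graph is either deleted or substituted by a distinct node of the second graph, and any leftover nodes of the second graph are inserted. Every node map induces an edit path whose cost is an upper bound on GED. For these costs, minimizing over all node maps yields the exact GED.

The enumeration grows exponentially with graph size. This is why practical heuristics, such as the LSAPE-based ones, compute one good node map cheaply and report its induced cost instead.

```python
from itertools import permutations
from typing import AbstractSet, FrozenSet, Optional, Sequence

Edges = AbstractSet[FrozenSet[int]]  # undirected edges {u, v}, no self-loops


def induced_cost(labels1: Sequence[str], edges1: Edges,
                 labels2: Sequence[str], edges2: Edges,
                 node_map: Sequence[Optional[int]]) -> int:
    """Cost of the edit path induced by node_map under uniform costs.

    node_map[u] is the node of G2 that substitutes node u of G1, or None if u is deleted.
    """
    cost = 0
    used = set()
    for u, x in enumerate(node_map):
        if x is None:
            cost += 1                                # node deletion
        else:
            used.add(x)
            cost += int(labels1[u] != labels2[x])    # node substitution (relabel if labels differ)
    cost += len(labels2) - len(used)                 # insert the unmatched nodes of G2
    kept = set()
    for edge in edges1:
        u, v = tuple(edge)
        x, y = node_map[u], node_map[v]
        image = frozenset((x, y)) if x is not None and y is not None else None
        if image is not None and image in edges2:
            kept.add(image)                          # edge substitution, free for unlabeled edges
        else:
            cost += 1                                # edge deletion
    cost += len(set(edges2) - kept)                  # edge insertion
    return cost


def exact_ged(labels1: Sequence[str], edges1: Edges,
              labels2: Sequence[str], edges2: Edges) -> int:
    """Exact GED by enumerating all node maps (exponential; tiny graphs only)."""
    n, m = len(labels1), len(labels2)
    targets = list(range(m)) + [None] * n  # a distinct G2 node, or deletion
    return min(induced_cost(labels1, edges1, labels2, edges2, node_map)
               for node_map in set(permutations(targets, n)))


if __name__ == "__main__":
    path = ["A", "B", "C"], {frozenset({0, 1}), frozenset({1, 2})}
    triangle = ["A", "B", "C"], {frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 2})}
    print(exact_ged(*path, *triangle))  # 1: insert the edge {0, 2}
```

Calling `induced_cost` with any single node map, for example the identity map, gives an upper bound without any search.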
Preprint
Due to their capacity to encode rich structural information, labeled graphs are often used for modeling various kinds of objects such as images, molecules, and chemical compounds. If pattern recognition problems such as clustering and classification are to be solved on these domains, a (dis-)similarity measure for labeled graphs has to be defined....
Preprint
Graph Edit Distance (GED) measures the dissimilarity between two graphs as the minimal cost of a sequence of elementary operations transforming one graph into another. This measure is fundamental in many areas such as structural pattern recognition or classification. However, exactly computing GED is NP-hard. Among different classes of heuristic al...
Preprint
The graph edit distance (GED) is a flexible distance measure which is widely used for inexact graph matching. Since its exact computation is NP-hard, heuristics are used in practice. A popular approach is to obtain upper bounds for GED via transformations to the linear sum assignment problem with error-correction (LSAPE). Typically, local structure...
Conference Paper
The graph edit distance (GED) is a flexible graph dissimilarity measure widely used within the structural pattern recognition field. In this paper, we present GEDLIB, a C++ library for exactly or approximately computing GED. Many existing algorithms for GED are already implemented in GEDLIB. Moreover, GEDLIB is designed to be easily extensible: for implement...
Conference Paper
We discuss the trend towards using quantitative metrics for evaluating research. We claim that, rather than promoting meaningful research, purely metric-based research evaluation schemes potentially lead to a dystopian academic reality, leaving no space for creativity and intellectual initiative. After sketching what the future could look like if q...
Preprint
Shortest path computation is a fundamental problem in road networks. However, in many real-world scenarios, determining solely the shortest path is not enough. In this paper, we study the problem of finding k-Dissimilar Paths with Minimum Collective Length (kDPwML), which aims at computing a set of paths from a source s to a target t such that all...
Conference Paper
The graph edit distance (GED) is a flexible graph dissimilarity measure widely used within the structural pattern recognition field. A widely used paradigm for approximating GED is to define local structures rooted at the nodes of the input graphs and use these structures to transform the problem of computing GED into a linear sum assignment problem with erro...
Conference Paper
The graph edit distance (GED) is a widely used distance measure for attributed graphs. It has recently been shown that the problem of computing GED, which is an NP-hard optimization problem, can be formulated as a quadratic assignment problem (QAP). This formulation is useful, since it allows one to derive well-performing approximative heuristics for GE...
Article
The graph edit distance is a widely used distance measure for labelled graphs. However, A⋆-GED, the standard approach for its exact computation, suffers from huge runtime and memory requirements. Recently, three better performing algorithms have been proposed: The general algorithms DF-GED and BIP-GED, and the algorithm CSI-GED, which only works for...
Article
We propose an algorithm that efficiently solves the linear sum assignment problem with error-correction and no cost constraints. This problem is encountered for instance in the approximation of the graph edit distance. The fastest currently available solvers for the linear sum assignment problem require the pairwise costs to respect the triangle in...
Article
The problem of deriving lower and upper bounds for the edit distance between undirected, labelled graphs has recently received increasing attention. However, only one algorithm has been proposed that allegedly computes not only an upper but also a lower bound for non-uniform edit costs and incorporates information about both node and edge labels. I...
Conference Paper
The graph edit distance is a well-established and widely used distance measure for labelled, undirected graphs. However, since its exact computation is NP-hard, research has mainly focused on devising approximative heuristics and only few exact algorithms have been proposed. The standard approach A⋆-GED, a node-based best-first search that work...
Conference Paper
The problem of deriving lower and upper bounds for the edit distance between labelled undirected graphs has recently received increasing attention. However, only one algorithm has been proposed that allegedly computes not only an upper but also a lower bound for non-uniform metric edit costs and incorporates information about both node and edge lab...
