Jörg Rahnenführer

Technische Universität Dortmund | TUD · Faculty of Statistics

Prof. Dr.

About

361
Publications
37,819
Reads
10,523
Citations
Citations since 2017
141 Research Items
6,168 Citations
Additional affiliations
April 2007 - December 2009
Technische Universität Dortmund
Position
  • Professor (Full)
November 2002 - March 2007
Max Planck Institute for Informatics
Position
  • PostDoc Position
October 2001 - September 2002
University of Nebraska Medical Center
Position
  • Consultant

Publications (361)
Article
Parabens have been used for decades as preservatives in food, drugs and cosmetics. The majority, however, were banned in 2009 and 2014, leaving only methyl-, ethyl-, propyl-, and butyl-derivatives available for subsequent use. Methyl- and propylparaben have been extensively tested in vivo, with no resulting evidence for developmental and reproductive t...
Article
Full-text available
Animal studies for embryotoxicity evaluation of potential therapeutics and environmental factors are complex, costly, and time-consuming. Often, studies are not of human relevance because of species differences. In the present study, we recapitulated the process of cardiomyogenesis in human induced pluripotent stem cells (hiPSCs) by modulation of t...
Article
Full-text available
The analysis of dose–response, concentration–response, and time–response relationships is a central component of toxicological research. A major decision with respect to the statistical analysis is whether to consider only the actually measured concentrations or to assume an underlying (parametric) model that allows extrapolation. Recent research s...
Preprint
Full-text available
Animal studies for embryotoxicity evaluation of potential therapeutics and environmental factors are complex, costly, and time-consuming. Often, studies are not of human relevance because of species differences. In the present study, we recapitulated the process of cardiomyogenesis in human induced pluripotent stem cells (hiPSCs) by modulation of t...
Article
Full-text available
Background In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data that have lar...
Article
Full-text available
Background & aims: Nonalcoholic fatty liver disease (NAFLD) is a major health burden associated with the metabolic syndrome leading to liver fibrosis, cirrhosis and ultimately liver cancer. In humans, the PNPLA3 I148M polymorphism of the phospholipase patatin-like phospholipid domain containing protein 3 (PNPLA3) has a well-documented impact on me...
Article
Full-text available
We examined differences in HER2 expression between primary tumors and distant metastases, particularly within the HER2-negative primary breast cancer cohort (HER2-low and HER2-zero). The retrospective study included 191 consecutive paired samples of primary breast cancer and distant metastases diagnosed between 1995 and 2019. HER2-negative samples...
Article
Full-text available
The experience of adversity in childhood has been associated with poor health outcomes in adulthood. In search of the biological mechanisms underlying these effects, research has so far focused on alterations of DNA methylation or shifts in transcriptomic profiles. The level of protein, however, has been largely neglected. We utilized mass spectrometry...
Article
Full-text available
Background Intrinsic or acquired resistance to HER2-targeted therapy is often a problem when small molecule tyrosine kinase inhibitors or antibodies are used to treat patients with HER2 positive breast cancer. Therefore, the identification of new targets and therapies for this patient group is warranted. Activated choline metabolism, characterized...
Preprint
Full-text available
Background: Novel antibody-drug conjugates (ADCs) show activity in HER2-low advanced breast cancer. We examined differences in HER2 expression between primary tumors and distant metastases, particularly within the HER2-negative cohort (HER2-low and HER2-zero). Patients and Methods: The retrospective study included 191 consecutive paired samples of...
Article
Full-text available
Proteasome inhibition is associated with parkinsonian pathology in vivo and degeneration of dopaminergic neurons in vitro. We explored here the metabolome (386 metabolites) and transcriptome (3257 transcripts) regulations of human LUHMES neurons, following exposure to MG-132 [100 nM]. This proteasome inhibitor killed cells within 24 h but did not r...
Chapter
Full-text available
Survival analysis comprises statistical methods for time-to-event data. The main prediction tasks include the estimation of the influence of prognostic factors for, say, medical treatments, and the modelling and prediction of survival times using regression models. In recent years, in molecular medicine, many omics technologies have been developed,...
Article
Full-text available
A range of regularization approaches have been proposed in the data sciences to overcome overfitting, to exploit sparsity or to improve prediction. Using a broad definition of regularization, namely controlling model complexity by adding information in order to solve ill-posed problems or to prevent overfitting, we review a range of approaches with...
Article
Full-text available
Human-relevant tests to predict developmental toxicity are urgently needed. A currently intensively studied approach makes use of differentiating human stem cells to measure chemically-induced deviations of the normal developmental program, as in a recent study based on cardiac differentiation (UKK2). Here, we (i) tested the performance of an assay...
Article
Full-text available
In bottom-up proteomics, proteins are enzymatically digested into peptides before measurement with mass spectrometry. The relationship between proteins and their corresponding peptides can be represented by bipartite graphs. We conduct a comprehensive analysis of bipartite graphs using quantified peptides from measured data sets as well as theoreti...
Article
Background Recently, novel antibody-drug conjugates (ADCs) showed clinical activity in a subset of advanced human epidermal growth factor receptor 2 (HER2)-negative patients. We investigated the prognostic significance of HER2-low and HER2-zero tumours. Patients and methods The retrospective cohort study included 410 consecutive node-negative bre...
Article
The accumulation of lipid droplets in hepatocytes is a key feature of drug-induced liver injury (DILI) and can be induced by a subset of hepatotoxic compounds. In the present study, we optimized and evaluated an in vitro technique based on the fluorescent dye Nile Red, hereafter called the Nile Red assay, to quantify lipid droplets induced by the exposure...
Article
Full-text available
Foraminifera are highly diverse and have a long evolutionary history. As key bioindicators, their phylogenetic schemes are of great importance for paleogeographic applications, but may be hard to recognize correctly. The phylogenetic relationships within the prominent genus Amphistegina are still uncertain. Molecular studies on Amphistegina have so...
Article
Full-text available
Background Pluripotent stem cell (PSC)-derived hepatocyte-like cells (HLC) have enormous potential as a replacement for primary hepatocytes in drug screening, toxicology and cell replacement therapy, but their genome-wide expression patterns differ strongly from primary human hepatocytes (PHH). Methods We differentiated human induced pluripotent s...
Article
Statistical modeling approaches for dose-response or concentration-response analyses are often required in toxicological applications, especially for cytotoxicity assays. By fitting a concentration-response curve, one can derive target concentrations, such as the EC50. In practice, concentration-response data for different exposure durations might...
Article
Full-text available
Despite the progress made in developmental toxicology, there is a great need for in vitro tests that identify developmental toxicants in relation to human oral doses and blood concentrations. In the present study, we established the hiPSC-based UKK2 in vitro test and analyzed genome-wide expression profiles of 23 known teratogens and 16 non-teratog...
Preprint
Full-text available
For understanding large text corpora, a widely used method is Latent Dirichlet Allocation (LDA). The topic assignments from LDA usually rely on a (random) initialization such that the outcome is also to some extent random. In particular, replicated runs on the same text data lead to different results such that the LDA is not fully reproducible. Thi...
Article
Bile acids (BA) are known to influence the susceptibility of hepatocytes to chemicals. We investigated the cytotoxicity of 18 compounds with known hepatotoxicity status and pharmacokinetics in cultivated primary human hepatocytes with and without the addition of a BA mix to the cell culture medium. This BA mix consisted of physiological ratios of t...
Chapter
Fitting models with high predictive accuracy that include all relevant but no irrelevant or redundant features is a challenging task on data sets with similar (e.g. highly correlated) features. We propose the approach of tuning the hyperparameters of a predictive model in a multi-criteria fashion with respect to predictive accuracy and feature sele...
Article
Full-text available
Background & Aims Acetaminophen (APAP) overdose remains a frequent cause of acute liver failure, which in patients is generally accompanied by increased levels of serum bile acids (BA). However, the pathophysiological role of BA remains elusive. Here, we investigated the role of BA in APAP-induced hepatotoxicity. Methods We performed intravital ima...
Article
We extend the scope of application for MCP‐Mod (Multiple Comparison Procedure and Modeling) to in vitro gene expression data and assess its characteristics regarding model selection for concentration gene expression curves. Precisely, we apply MCP‐Mod on single genes of a high‐dimensional gene expression data set, where human embryonic stem cells w...
Article
We propose to use Bayesian optimization (BO) to improve the efficiency of the design selection process in clinical trials. BO is a method to optimize expensive black‐box functions, by using a regression as a surrogate to guide the search. In clinical trials, planning test procedures and sample sizes is a crucial task. A common goal is to maximize t...
Article
Full-text available
Background Important objectives in cancer research are the prediction of a patient’s risk based on molecular measurements such as gene expression data and the identification of new prognostic biomarkers (e.g. genes). In clinical practice, this is often challenging because patient cohorts are typically small and can be heterogeneous. In classical su...
Article
Full-text available
Background An important task in clinical medicine is the construction of risk prediction models for specific subgroups of patients based on high-dimensional molecular measurements such as gene expression data. Major objectives in modeling high-dimensional data are good prediction performance and feature selection to find a subset of predictors that...
Article
Full-text available
An in vitro/in silico method that determines the risk of human drug induced liver injury in relation to oral doses and blood concentrations of drugs was recently introduced. This method utilizes information on the maximal blood concentration (Cmax) for a specific dose of a test compound, which can be estimated using physiologically-based pharmacoki...
Article
Full-text available
Mouse models of non-alcoholic fatty liver disease (NAFLD) are required to define therapeutic targets, but detailed time-resolved studies to establish a sequence of events are lacking. Here, we fed male C57Bl/6N mice a Western or standard diet over 48 weeks. Multiscale time-resolved characterization was performed using RNA-seq, histopathology, immun...
Article
Full-text available
Feature selection is crucial for the analysis of high-dimensional data, but benchmark studies for data with a survival outcome are rare. We compare 14 filter methods for feature selection based on 11 high-dimensional gene expression survival data sets. The aim is to provide guidance on the choice of filter methods for other researchers and practiti...
Preprint
Full-text available
Motivation In bottom-up proteomics, proteins are enzymatically digested before measurement with mass spectrometry. The relationship between proteins and peptides can be represented by bipartite graphs. This representation is useful to aid protein inference and quantification, which is complex due to the occurrence of shared peptides. We conducted a...
Article
Full-text available
We studied the prognostic impact of tumor immunoglobulin kappa C (IGKC) mRNA expression as a marker of the humoral immune system in the FinHer trial patient population, where 1010 patients with early breast cancer were randomly allocated to either docetaxel-containing or vinorelbine-containing adjuvant chemotherapy. HER2-positive patients were addi...
Article
Full-text available
In many practical machine learning applications, there are two objectives: one is to maximize predictive accuracy and the other is to minimize costs of the resulting model. These costs of individual features may be financial costs, but can also refer to other aspects, for example, evaluation time. Feature selection addresses both objectives, as it...
Preprint
Fitting models with high predictive accuracy that include all relevant but no irrelevant or redundant features is a challenging task on data sets with similar (e.g. highly correlated) features. We propose the approach of tuning the hyperparameters of a predictive model in a multi-criteria fashion with respect to predictive accuracy and feature sele...
Article
Full-text available
The predictive performance of a machine learning model highly depends on the corresponding hyper-parameter setting. Hence, hyper-parameter tuning is often indispensable. Normally such tuning requires the dedicated machine learning model to be trained and evaluated on centralized data to obtain a performance estimate. However, in a distributed machi...
Preprint
Full-text available
We propose to use Bayesian optimization (BO) to improve the efficiency of the design selection process in clinical trials. BO is a method to optimize expensive black-box functions, by using a regression as a surrogate to guide the search. In clinical trials, planning test procedures and sample sizes is a crucial task. A common goal is to maximize t...
Article
Full-text available
In this article, we present a model of interdisciplinary collaboration between communication science and methodological science (here: statistics). The central question is how collaboration between fundamentally different disciplines can be sustained over an extended period. The agile development of research software,...
Article
Purpose: Expression-based classifiers to predict complete pathological response (pCR) after neoadjuvant chemotherapy (NACT) are not routinely used in the clinic. We aimed to build and validate a classifier for pCR after NACT. Experimental design: We performed a prospective multicenter study (EXPRESSION) including 114 patients treated with anthra...
Article
Full-text available
Motivation An important goal of concentration-response studies in toxicology is to determine an ’alert’ concentration where a critical level of the response variable is exceeded. In a classical observation-based approach, only measured concentrations are considered as potential alert concentrations. Alternatively, a parametric curve is fitted to th...
Article
Full-text available
Thousands of transcriptome data sets are available, but approaches for their use in dynamic cell response modelling are few, especially for processes affected simultaneously by two orthogonal influencing variables. We approached this problem for neuroepithelial development of human pluripotent stem cells (differentiation variable), in the presence...
Article
Full-text available
In health research, statistical methods are frequently used to address a wide variety of research questions. For almost every analytical challenge, different methods are available. But how do we choose between different methods and how do we judge whether the chosen method is appropriate for our specific study? Like in any science, in statistics, e...
Article
Full-text available
The debate about possible adverse effects of bisphenol A (BPA) has been ongoing for decades. Bisphenol F (BPF) and S (BPS) have been suggested as "safer" alternatives. In the present study we used hepatocyte-like cells (HLCs) derived from the human embryonic stem cell lines Man12 and H9 to compare the three bisphenol derivatives. Stem cell-derived...
Article
Full-text available
In cell biology, pharmacology and toxicology dose-response and concentration-response curves are frequently fitted to data with statistical methods. Such fits are used to derive quantitative measures (e.g. EC[Formula: see text] values) describing the relationship between the concentration of a compound or the strength of an intervention applied to...
Preprint
For data sets with similar features, for example highly correlated features, most existing stability measures behave in an undesired way: They consider features that are almost identical but have different identifiers as different features. Existing adjusted stability measures, that is, stability measures that take into account the similarities bet...
Preprint
Cost-sensitive feature selection describes a feature selection problem in which features incur individual costs for inclusion in a model. These costs make it possible to incorporate disfavored aspects of features, e.g. failure rates of a measuring device or patient harm, in the model selection process. Random Forests define a particularly challenging problem...
Preprint
In many practical machine learning applications, there are two objectives: one is to maximize predictive accuracy and the other is to minimize costs of the resulting model. These costs of individual features may be financial costs, but can also refer to other aspects, for example, evaluation time. Feature selection addresses both objectives, as...
Chapter
Full-text available
A large number of applications in text data analysis use the Latent Dirichlet Allocation (LDA) as one of the most popular methods in topic modeling. Although the instability of the LDA is mentioned sometimes, it is usually not considered systematically. Instead, an LDA is often selected from a small set of LDAs using heuristic means or human coding...
Article
Full-text available
DNA‐encoded combinatorial synthesis provides efficient and dense coverage of chemical space around privileged molecular structures. The indole side chain of tryptophan plays a prominent role in key, or “hot spot” regions of protein‐protein interactions. A DNA‐encoded combinatorial peptoid library was designed based on the Ugi four‐component reactio...
Article
Full-text available
A focused approach: A DNA‐encoded peptoid library was designed by the Ugi multicomponent reaction around indole structures that mimic the side chain of tryptophan. Applying this focused library to the challenging cancer targets MDM2 and hTEAD4 yielded compounds for inhibitor development. Compounds binding to hTEAD4 disrupted the hTEAD4–YAP interact...
Preprint
Important objectives in cancer research are the prediction of a patient's risk based on molecular measurements such as gene expression data and the identification of new prognostic biomarkers (e.g. genes). In clinical practice, this is often challenging because patient cohorts are typically small and can be heterogeneous. In classical subgroup anal...
Preprint
An important task in clinical medicine is the construction of risk prediction models for specific subgroups of patients based on high-dimensional molecular measurements such as gene expression data. Major objectives in modeling high-dimensional data are good prediction performance and feature selection to find a subset of predictors that are truly...
Preprint
Full-text available
For organizing large text corpora, topic modeling provides useful tools. A widely used method is Latent Dirichlet Allocation (LDA), a generative probabilistic model which models single texts in a collection of texts as mixtures of latent topics. The assignments of words to topics rely on initial values such that generally the outcome of LDA is not f...
Article
Full-text available
Background: With modern methods in biotechnology, the search for biomarkers has advanced to a challenging statistical task exploring high dimensional data sets. Feature selection is a widely researched preprocessing step to handle huge numbers of biomarker candidates and has special importance for the analysis of biomedical data. Such data sets of...
Article
Full-text available
The first in vitro tests for developmental toxicity made use of rodent cells. Newer teratology tests, e.g. developed during the ESNATS project, use human cells and measure mechanistic endpoints (such as transcriptome changes). However, the toxicological implications of mechanistic parameters are hard to judge, without functional/morphological endpo...
Chapter
For data sets with similar features, for example highly correlated features, most existing stability measures behave in an undesired way: They consider features that are almost identical but have different identifiers as different features. Existing adjusted stability measures, that is, stability measures that take into account the similarities bet...
Chapter
Latent Dirichlet Allocation (LDA) is one of the most popular topic models employed for the analysis of large text data. When applied repeatedly to the same text corpus, LDA leads to different results. To address this issue, several methods have been proposed. In this paper, instead of dealing with this methodological source of algorithmic uncertain...
Article
Full-text available
Many toxicological test methods, including assays of cell viability and function, require an evaluation of concentration-response data. This often involves curve fitting, and the resulting mathematical functions are then used to determine the concentration at which a certain deviation from the control value occurs (e.g. a decrease of cell viability...