# Geert Molenberghs

Universiteit Hasselt and University of Leuven · I-BioStat

## About

1,033 publications

203,110 reads

32,662 citations

Additional affiliations

August 2007 - present

October 1993 - present

## Publications

In the meta-analytic surrogate evaluation framework, the trial-level coefficient of determination $R^2_{\text{trial}}$ quantifies the strength of the association between the expected causal treatment effects on the surrogate (S) and the true (T) endpoints. Burzykowski and Buyse supplemented this metric of surrogacy with the surrogate threshold effect (STE)...
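As a rough illustration of the trial-level quantity named in the abstract above, the sketch below computes a squared Pearson correlation between per-trial treatment effects on the surrogate and on the true endpoint. This is a simplified, reduced-model reading that ignores estimation error in the trial-specific effects; it is not the estimation procedure of the paper. The function name `r2_trial` and the effect values are hypothetical.

```python
def r2_trial(alpha, beta):
    """Squared Pearson correlation between trial-level treatment effects.

    alpha: treatment effects on the surrogate S, one per trial (hypothetical input)
    beta:  treatment effects on the true endpoint T, one per trial (hypothetical input)
    """
    if len(alpha) != len(beta) or len(alpha) < 2:
        raise ValueError("need two equally long sequences with at least 2 trials")
    n = len(alpha)
    mean_a = sum(alpha) / n
    mean_b = sum(beta) / n
    # Centered cross-products and sums of squares
    sxy = sum((a - mean_a) * (b - mean_b) for a, b in zip(alpha, beta))
    sxx = sum((a - mean_a) ** 2 for a in alpha)
    syy = sum((b - mean_b) ** 2 for b in beta)
    if sxx == 0 or syy == 0:
        raise ValueError("effects must vary across trials")
    return (sxy * sxy) / (sxx * syy)


# Made-up effects for four trials: T effects are exactly twice the S effects
effects_on_s = [0.1, 0.2, 0.3, 0.4]
effects_on_t = [0.2, 0.4, 0.6, 0.8]
r2 = r2_trial(effects_on_s, effects_on_t)
```

A value near 1 indicates that, across trials, the effect on the surrogate is highly predictive of the effect on the true endpoint; a value near 0 indicates that it carries little information.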

During most of their life, stars fuse hydrogen into helium in their cores. The mixing of chemical elements in the radiative envelope of stars with a convective core is able to replenish the core with extra fuel. If effective, such deep mixing allows stars to live longer and change their evolutionary path. Yet localized observations to constrain int...

This report describes the combined impact of different social distancing scenarios, the 501Y.V1 variant and the ongoing vaccination campaign in Belgium and illustrates the importance of epidemic control in the period up to August 1, 2021. • Changing social distancing behaviour as a result of lifting measures too soon might, in spite of the ongoing...

Starting from historic reflections, the current SARS-CoV-2 induced COVID-19 pandemic is examined from various perspectives, in terms of what it implies for the implementation of non-pharmaceutical interventions, the modeling and monitoring of the epidemic, the development of early-warning systems, the study of mortality, prevalence estimation, diag...

Although COVID-19 has been spreading throughout Belgium since February, 2020, its spatial dynamics in Belgium remain poorly understood, partly due to the limited testing of suspected cases during the epidemic’s early phase. We analyse data of COVID-19 symptoms, as self-reported in a weekly online survey, which is open to all Belgian citizens. We pr...

Given the heterogeneous responses to therapy and the high cost of treatments, there is an increasing interest in identifying pretreatment predictors of therapeutic effect. Clearly, the success of such an endeavor will depend on the amount of information that the patient‐specific variables convey about the individual causal treatment effect on the r...

The relationship between association and surrogacy has been the focus of much debate in the surrogate marker literature. Recently, the individual causal association (ICA) has been introduced as a metric of surrogacy in the causal inference framework, when both the surrogate and the true endpoint are normally distributed and when both are binary. Ea...

Background:
Immunosenescence biomarkers and peripheral blood parameters are evaluated separately as possible predictive markers of immunotherapy. Here, we illustrate the use of a causal inference model to identify predictive biomarkers of CIMAvaxEGF success in the treatment of Non-Small Cell Lung Cancer Patients.
Methods:
Data from a controlled...

Objective. Scrutiny of COVID-19 mortality in Belgium over the period 8 March-9 May 2020 (Weeks 11-19), using number of deaths per million, infection fatality rates, and the relation between COVID-19 mortality and excess death rates.
Data. Publicly available COVID-19 mortality (2020); overall mortality (2009-2020) data in Belgium and demographic dat...

Although COVID-19 has been spreading throughout Belgium since February, 2020, its spatial dynamics in Belgium remain poorly understood, due to the limited testing of suspected cases. We analyse data of COVID-19 symptoms, as self-reported in a weekly online survey, which is open to all Belgian citizens. We predict symptoms' incidence using binomial...

Since factor analysis is one of the most often used techniques in psychometrics, comparing or combining solutions from different factor analyses is often needed. Several measures to compare factors exist; one of the best known is Tucker's congruence coefficient, which is enjoying newly found popularity thanks to the recent work of Lorenzo-Seva and...
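For orientation, Tucker's congruence coefficient between two loading vectors $x$ and $y$ is conventionally defined as $\sum_i x_i y_i / \sqrt{\sum_i x_i^2 \sum_i y_i^2}$. The sketch below implements that standard definition only; the function name and the loadings are hypothetical, and it does not reproduce any extension proposed in the paper.

```python
import math


def tucker_congruence(x, y):
    """Tucker's congruence coefficient between two factor-loading vectors."""
    if len(x) != len(y):
        raise ValueError("loading vectors must have the same length")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if denominator == 0:
        raise ValueError("congruence is undefined for a zero vector")
    return numerator / denominator


# Hypothetical loadings: the second factor is the first scaled by 0.5,
# so the coefficient should be (numerically) 1.
loadings_a = [0.8, 0.7, 0.1]
loadings_b = [0.4, 0.35, 0.05]
phi = tucker_congruence(loadings_a, loadings_b)
```

Unlike a Pearson correlation, the coefficient does not center the loadings, so it is insensitive to rescaling of a factor but not to shifts.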

The purpose of this paper is to contrast the Mantel–Haenszel estimator with an optimal estimator to better understand its specific nature, as well as some unique and interesting properties of the data setting for which it was developed. It is emphasized here that the Mantel–Haenszel estimator does not follow from optimality considerations, but neve...
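As a reminder of what is being contrasted, the classical Mantel–Haenszel estimator of a common odds ratio over $K$ strata of 2×2 tables $(a_k, b_k, c_k, d_k)$ with totals $n_k$ is $\sum_k (a_k d_k / n_k) / \sum_k (b_k c_k / n_k)$. The sketch below implements this textbook form, assuming the odds-ratio version is the one of interest; the function name and the counts are hypothetical and are not taken from the paper.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio over stratified 2x2 tables.

    tables: iterable of (a, b, c, d) per stratum, where
        a = exposed cases,   b = exposed non-cases,
        c = unexposed cases, d = unexposed non-cases.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("estimator undefined: sum of b*c/n is zero")
    return numerator / denominator


# Two made-up strata
strata = [(10, 20, 5, 40), (8, 12, 6, 24)]
or_mh = mantel_haenszel_or(strata)  # equals 43/13, about 3.31
```

With a single stratum the estimator reduces to the usual cross-product ratio $ad/(bc)$.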

The draft ICH E9(R1) addendum stipulates that an estimator should align with its associated estimand and yield an estimate that facilitates reliable interpretations. The addendum further stipulates that assumptions should be justifiable and plausible, and that the extent of assumptions is an important consideration for whether an estimate will be r...

This paper provides examples of defining estimands in real-world scenarios following ICH E9(R1) guidelines. Detailed discussions on choosing the estimands and estimators can be found in our companion papers. Three scenarios of increasing complexity are illustrated. The first example is a proof-of-concept trial in major depressive disorder where the...

The National Research Council (NRC) Expert Panel Report on Prevention and Treatment of Missing Data in Clinical Trials highlighted the need for clearly defining objectives and estimands. That report sparked considerable discussion and literature on estimands and how to choose them. Importantly, consideration moved beyond missing data to include all...

Introduction
Currently, no treatment that delays the progression of Friedreich ataxia is available. In the majority of patients, Friedreich ataxia is caused by homozygous pathological expansion of GAA repeats in the first intron of the FXN gene. Nicotinamide acts as a histone deacetylase inhibitor. Dose escalation studies have shown that short...

Objectives
We investigated the potential impact of reduced tobacco use scenarios on total life expectancy and health expectancies, i.e., healthy life years and unhealthy life years.
Methods
Data from the Belgian Health Interview Survey 2013 were used to estimate smoking and disability prevalence. Disability was based on the Global Activity Limitat...

Background: Immunosenescence biomarkers and peripheral blood parameters are evaluated separately as possible predictive markers of immunotherapy. Here we illustrate the use of a causal inference model to identify predictive biomarkers of CIMAvaxEGF success in the treatment of Non–Small Cell Lung Cancer Patients. Methods: Data from a clinical trial...

Background: Immunosenescence biomarkers and peripheral blood parameters are evaluated separately as possible predictive markers of immunotherapy. Here, we illustrate the use of a causal inference model to identify predictive biomarkers of CIMAvaxEGF success in the treatment of Non–Small Cell Lung Cancer Patients.
Methods: Data from a controlled cli...

Background: Immunosenescence biomarkers and peripheral blood parameters are evaluated separately as possible predictive markers of immunotherapy. Here, we illustrate the use of a causal inference model to identify predictive biomarkers of CIMAvaxEGF success in the treatment of Non–Small Cell Lung Cancer Patients.
Methods: Data from a clinical trial...

In Belgium, variations in thyroid cancer incidence were observed around the major nuclear sites. The present ecological study investigates whether there is an excess incidence of thyroid cancer among people living in the vicinity of the four nuclear sites at the smallest Belgian geographical level. Rate ratios were obtained from a Bayesian hierarch...

When assessing surrogate endpoints in clinical studies under a causal-inference framework, a simulation-based sensitivity analysis is required, so as to sample the unidentifiable parameters across plausible values. To be precise, correlation matrices need to be sampled with only some of their entries identified from the data, known as the matrix co...

Identification of genomic biomarkers is an important area of research in the context of drug discovery experiments. These experiments typically consist of several high dimensional datasets that contain information about a set of drugs (compounds) under development. This type of data structure introduces the challenge of multi-source data integratio...

Biomarkers play a key role in the monitoring of disease progression. The time taken for an individual to reach a biomarker exceeding or lower than a meaningful threshold is often of interest. Due to the inherent variability of biomarkers, persistence criteria are sometimes included in the definitions of progression, such that only two consecutive m...

Context: While factor/principal component analysis is a popular technique to deal with a high number of variables, for example in health surveys, comparing or combining solutions from different factor analyses can be cumbersome even though combining factors is necessary in several situations. For example, when applying multiple imputation (to account...

Clustered count data are commonly analysed by the generalized linear mixed model (GLMM). Here, the correlation due to clustering and some overdispersion is captured by the inclusion of cluster-specific normally distributed random effects. Often, the model does not capture the variability completely. Therefore, the GLMM can be extended by including...

Emulators provide approximations to computationally expensive functions and are widely used in diverse domains, despite the ever increasing speed of computational devices. In this paper we establish a connection between two independently developed emulation methods: radial basis function networks and Gaussian process emulation. The methodological r...

In 1981, the idea of a superwind that ends the life of cool giant stars was proposed. Extreme OH/IR-stars develop superwinds with the highest mass-loss rates known so far, up to a few $10^{-4}\,M_\odot$/yr, informing our understanding of the maximum mass-loss rate achieved during the Asymptotic Giant Branch (AGB) phase. A conundrum arises whereby the ob...

In the version of this Letter originally published, the caption of Fig. 2 incorrectly said J = 3–2; it should have said J = 2–1. This has now been corrected.

Background
Multi-mode data collection is widely used in surveys. Since several modes of data collection are successively applied in such design (e.g. self-administered questionnaire after face-to-face interview), partial nonresponse occurs if participants fail to complete all stages of the data collection. Although such nonresponse might seriously...

In spite of medical and methodological advances, the identification of good surrogate endpoints has remained a challenging endeavour. This may, at least partially, be attributable to the fact that most researchers have only focused on univariate surrogates endpoints. In the present work, we argue in favour of using multivariate surrogates and intro...

The asteroseismic modelling of period spacing patterns from gravito-inertial modes in stars with a convective core is a high-dimensional problem. We utilize the measured period spacing pattern of prograde dipole gravity modes (acquiring $\Pi_0$), in combination with the effective temperature ($T_{\rm eff}$) and surface gravity ($\log g$) derived from spectroscopy, t...

The asteroseismic modelling of period spacing patterns from gravito-inertial modes in stars with a convective core is a high-dimensional problem. We utilise the measured period spacing pattern of prograde dipole gravity modes (acquiring $\Pi_0$), in combination with the effective temperature ($T_{\rm eff}$) and surface gravity ($\log g$) derived fr...

At the beginning of the 21st century, a new paradigm was introduced for the evaluation of surrogate endpoints based on meta-analysis. In this paradigm, the putative surrogate is assessed at two different levels, the so-called, trial and individual level. Trial level surrogacy is defined as the association between the expected causal treatment effec...

This paper presents the rationale, genesis, and applications of Project Cornelia, an ongoing computational art history project developed by a cross-disciplinary team at the KU Leuven (University of Leuven). It shares practical perspectives acquired while conceptualizing and unfolding the project and discusses successes as well as challenges and set...

Surrogate endpoints need to be statistically evaluated before they can be used as substitutes of true endpoints in clinical studies. However, even though several evaluation methods have been introduced over the last decades, the identification of good surrogate endpoints remains practically and conceptually challenging. In the present work, the que...

Background
Non-suicidal self-injury (NSSI) is defined as the repetitive, direct, and deliberate destruction of one’s body tissue without an intention to die. Existing cross-sectional research indicates that the association between maternal/peer attachment and NSSI is mediated by identity synthesis and confusion. However, longitudinal confirmation o...

The simultaneous presence of variability due to both pulsations and binarity is no rare phenomenon. Unfortunately, the complexities of dealing with even one of these sources of variability individually means that the other signal is often treated as a nuisance and discarded. However, both types of variability offer means to probe fundamental stellar...

The increase in life expectancy followed by the burden of chronic diseases contributes to disability at older ages. The estimation of how much chronic conditions contribute to disability can be useful to develop public health strategies to reduce the burden. This paper introduces the R package addhaz, which is based on the attribution method (Nusse...

The asteroseismic modelling of period spacing patterns from gravito-inertial modes in stars with a convective core is a high-dimensional problem. We utilize the measured period spacing pattern of prograde dipole gravity modes (acquiring $\Pi_0$), in combination with the effective temperature ($T_{\rm eff}$) and surface gravity ($\log g$) derived from spectroscop...

Data Monitoring Committees (DMCs) are an integral part of clinical drug development. Their use has evolved along with changing study designs and regulatory expectations, which has associated statistical and ethical implications. Although there is guidance from the different regulatory agencies, there are opportunities to bring more consistency to a...

The individual causal association (ICA) has recently been introduced as a metric of surrogacy in a causal‐inference framework. The ICA is defined on the unit interval and quantifies the association between the individual causal effect on the surrogate (ΔS) and true (ΔT) endpoint. In addition, the ICA offers a general assessment of the surrogate pre...

We present a key example from sequential analysis, which illustrates that conditional bias reduction can cause infinite mean absolute error.

We consider multiple imputation as a procedure iterating over a set of imputed datasets. Based on an appropriate stopping rule the number of imputed datasets is determined. Simulations and real-data analyses indicate that the sufficient number of imputed datasets may in some cases be substantially larger than the very small numbers that are usually...

Background:
IDeAl (Integrated designs and analysis of small population clinical trials) is an EU funded project developing new statistical design and analysis methodologies for clinical trials in small population groups. Here we provide an overview of IDeAl findings and give recommendations to applied researchers.
Method:
The description of the...

Missing data is almost inevitable in correlated-data studies. For non-Gaussian outcomes with moderate to large sequences, direct-likelihood methods can involve complex, hard-to-manipulate likelihoods. Popular alternative approaches, like generalized estimating equations, that are frequently used to circumvent the computational complexity of full li...

Estimating complex linear mixed models using an iterative full maximum likelihood estimator can be cumbersome in some cases. With small and unbalanced datasets, convergence problems are common. Also, for large datasets, iterative procedures can be computationally prohibitive. To overcome these computational issues, an unbiased two-stage closed-form...

The simultaneous presence of variability due to both pulsations and binarity is no rare phenomenon. Unfortunately, the complexities of dealing with even one of these sources of variability individually means that the other signal is often treated as a nuisance and discarded. However, both types of variability offer means to probe fundamental stella...

Missing data methods, maximum likelihood estimation (MLE) and multiple imputation (MI), for longitudinal questionnaire data were investigated via simulation. Predictive mean matching (PMM) was applied at both item and scale levels, logistic regression at item level and multivariate normal imputation at scale level. We investigated a hybrid approach...

We refine the classical Lindeberg-Feller central limit theorem by obtaining asymptotic bounds on the Kolmogorov distance, the Wasserstein distance, and the parametrized Prokhorov distances in terms of a Lindeberg index. We thus obtain more general approximate central limit theorems, which roughly state that the row-wise sums of a triangular array a...

Purpose
Vascular factors have been suggested to influence the development and progression of glaucoma. They are thought to be especially relevant for normal‐tension glaucoma (NTG) patients. We aim to investigate which vascular factors, including advanced vascular examinations, better describe patients with NTG compared to those with primary open‐a...

A Weibull-model-based approach is examined to handle under- and overdispersed discrete data in a hierarchical framework. This methodology was first introduced by Nakagawa and Osaki (1975, IEEE Transactions on Reliability, 24, 300–301), and later examined for under- and overdispersion by Klakattawi et al. (2018, Entropy, 20, 142) in the univariate c...

The maximum entropy principle offers a constructive criterion for setting up probability distributions on the basis of partial knowledge. In the present work, the principle is applied to tackle an important problem in the surrogate marker field, namely, the evaluation of a binary outcome as a putative surrogate for a binary true endpoint within a c...

We propose a methodological framework to perform forward asteroseismic modeling of stars with a convective core, based on gravity-mode oscillations. These probe the near-core region in the deep stellar interior. The modeling relies on a set of observed high-precision oscillation frequencies of low-degree coherent gravity modes with long lifetimes a...

The emergence of multidrug resistant-tuberculosis (MDR-TB), defined as Mycobacterium tuberculosis strains with in vitro resistance to at least isoniazid and rifampicin, has necessitated evaluation and validation of appropriate surrogate endpoints for treatment response in drug trials for MDR-TB. The trial that has demonstrated efficacy of bedaquili...

Institutional review boards. (DOCX)

A) Relationship between S24 (on the basis of AFB smear conversion) and T for BDQ; B) Relationship between S24 (on the basis of AFB smear conversion) and T for Placebo control. (DOCX)

A) Relationship between S24 (on the basis of culture conversion) and T for BDQ: imputed values; B) Relationship between S24 (on the basis of culture conversion) and T for Placebo control: imputed values. (DOCX)

Gaussian process (GP) emulation is a relatively recent statistical technique that provides a fast-running approximation to a complex computer model, given training data generated by the considered model. Despite its sound theoretical foundation, GP emulation falls short in practical applications where the training dataset is very large, due to nume...