Leonhard Held's research while affiliated with University of Zurich and other places

Publications (294)

Article
We examine the concept of Bayesian Additional Evidence (BAE) recently proposed by Sondhi et al. We derive simple closed-form expressions for BAE and compare its properties with other methods for assessing findings in the light of new evidence. We find that while BAE is easy to apply, it lacks both a compelling rationale and clarity of use needed fo...
Article
Background: Post-mortem imaging has been suggested as an alternative to conventional autopsy in the prenatal and postnatal periods. Noninvasive autopsies do not provide tissue for histological examination, which may limit their clinical value, especially when infection-related morbidity and mortality are suspected. Methods: We performed a prospectiv...
Preprint
Increasing evidence suggests that the reproducibility and replicability of scientific findings are threatened by researchers employing questionable research practices (QRP) in order to achieve publishable, positive and significant results. Numerous metrics have been developed to determine replication success, but it has not yet been established how w...
Preprint
The ongoing replication crisis in science has increased interest in the methodology of replication studies. We propose a novel Bayesian analysis approach using power priors: The likelihood of the original study's data is raised to the power of $\alpha$, and then used as the prior distribution in the analysis of the replication data. Posterior distr...
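To make the power-prior idea concrete, here is a minimal sketch under a normal approximation, assuming the original and replication studies each report an effect estimate with a standard error and that the initial prior is flat. Raising a normal likelihood with variance $\sigma_o^2$ to the power $\alpha$ yields a normal prior with variance $\sigma_o^2/\alpha$, so the posterior is a precision-weighted average. The function name `power_prior_posterior` is hypothetical, and the fixed-$\alpha$ case shown here is not the full method, which also places a prior on $\alpha$.

```python
import math


def power_prior_posterior(est_o, se_o, est_r, se_r, alpha):
    """Posterior mean and SD of theta for a fixed power parameter alpha.

    Hypothetical helper (normal approximation, flat initial prior):
    the original likelihood N(est_o, se_o^2) raised to alpha acts as the
    prior N(est_o, se_o^2 / alpha) for the replication data N(est_r, se_r^2).
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie in [0, 1]")
    prec_prior = alpha / se_o**2  # alpha = 0 discards the original study
    prec_rep = 1 / se_r**2
    prec_post = prec_prior + prec_rep
    mean_post = (prec_prior * est_o + prec_rep * est_r) / prec_post
    return mean_post, math.sqrt(1 / prec_post)


# alpha = 1 pools both studies fully; alpha = 0 uses the replication alone
print(power_prior_posterior(0.5, 0.2, 0.2, 0.2, alpha=1.0))
print(power_prior_posterior(0.5, 0.2, 0.2, 0.2, alpha=0.0))
```

Intermediate values of $\alpha$ interpolate between these two extremes, which is why the power parameter can be read as the weight given to the original study.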
Preprint
Clinical translation from bench to bedside often remains challenging despite promising preclinical evidence. Among the many drivers, such as biological complexity or poorly understood disease pathology, one is that preclinical evidence often lacks the desired robustness. Reasons include low sample sizes, selective reporting, publication bias, and consequently inflate...
Preprint
Theoretical arguments and empirical investigations indicate that a high proportion of published findings are false or do not replicate. The current position paper provides a broad perspective on this scientific error, focusing both on reform history and on opportunities for future reform. Talking points are organised along four main themes: methodo...
Preprint
There are over 55 different ways to construct a confidence or credible interval (CI) for the binomial proportion. Methods to compare them are needed to decide which should be used in practice. The interval score has been suggested to compare prediction intervals. This score is a proper scoring rule that combines the coverage as a measu...
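For reference, a small sketch of the interval score in its standard form for a central $(1-\alpha)$ prediction interval $[l, u]$ and an observed value $x$: the width $u - l$ plus a penalty of $2/\alpha$ times the distance by which $x$ falls outside the interval (lower is better). How the preprint adapts this score to confidence intervals for a proportion is not shown in the truncated abstract, so this covers only the generic definition; the function name is hypothetical.

```python
def interval_score(lower, upper, x, alpha=0.05):
    """Interval score of a central (1 - alpha) interval [lower, upper] for outcome x.

    Width is always penalised; misses are penalised by 2/alpha times the
    distance between x and the nearest interval endpoint. Lower is better.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    score = upper - lower
    if x < lower:
        score += (2 / alpha) * (lower - x)
    elif x > upper:
        score += (2 / alpha) * (x - upper)
    return score


print(interval_score(0.2, 0.4, 0.3))  # covered: score equals the width
print(interval_score(0.2, 0.4, 0.5))  # missed by 0.1: width + 40 * 0.1
```

The $2/\alpha$ factor is what makes the score proper: a forecaster minimises the expected score by reporting the true $\alpha/2$ and $1-\alpha/2$ quantiles, so neither overly narrow nor overly wide intervals are rewarded.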
Preprint
We introduce a novel statistical framework to study replicability which simultaneously offers overall Type-I error control, an assessment of compatibility and a combined confidence region. The approach is based on a recently proposed reverse-Bayes method for the analysis of replication success. We show how the method can be recalibrated to obtain a...
Preprint
Power priors are used for incorporating historical data in Bayesian analyses by taking the likelihood of the historical data raised to the power $\alpha$ as the prior distribution for the model parameters. The power parameter $\alpha$ is typically unknown and assigned a prior distribution, most commonly a beta distribution. Here, we give a novel th...
Preprint
Replication studies are increasingly conducted to assess credibility of scientific findings. Most of these replication attempts target studies with a superiority design, but there is a lack of methodology regarding the analysis of replication studies with alternative types of designs. In order to fill this gap, we adapt three approaches used for su...
Article
Sharing data and code as part of a research publication is crucial for ensuring the computational reproducibility of scientific work. But sharing should be done at the article submission stage, not after publication as it is now, say Rachel Heyard and Leonhard Held. Statisticians and data scientists have the skills and tools to make this change and...
Article
After allogeneic hematopoietic stem cell transplantation (allo-HSCT), the recurrence of recent thymic emigrants (RTE) and self-tolerant T cells indicates normalized thymic function. From 2008 to 2019, we retrospectively analyzed the RTE-reconstitution rate and the minimal time to reach normal age-specific first percentiles for CD31⁺CD45RA⁺CD4⁺T cell...
Preprint
Approval of treatments in areas of high medical need may not follow the two-trials paradigm, but might be granted under conditional approval. Under conditional approval, the evidence for a treatment effect from a pre-market clinical trial has to be substantiated in an independent post-market clinical trial or a longer follow-up duration. Several wa...
Article
It is now widely accepted that the standard inferential toolkit used by the scientific research community, null-hypothesis significance testing (NHST), is not fit for purpose. Yet despite the threat posed to the scientific enterprise, there is no agreement concerning alternative approaches for evidence assessment. This lack of consensus reflects...
Preprint
This paper aims to provide early-career researchers with a useful introduction to good research practices.
Article
We present an approach to extend the endemic-epidemic (EE) modelling framework for the analysis of infectious disease data. In its spatiotemporal formulation, spatial dependencies were originally captured by static neighbourhood matrices. We propose to adjust these weight matrices over time to reflect changes in spatial connectivity between ge...
Article
IMPORTANCE: In observational studies, patients' treatment outcome expectations have been associated with better outcomes (ie, a placebo response), whereas concerns about adverse side effects have been associated with an increase in the negative effects of treatments (ie, a nocebo response). Some randomized trials have suggested that communication fr...
Article
Objectives: To assess the prevalence of statistically significant treatment effects, adverse events and small-study effects (when small studies report more extreme results than large studies) and publication bias (over-reporting of statistically significant results) across medical specialties. Design: Large meta-epidemiological study of treatment ef...
Article
Background: It is established that comorbidities negatively influence colorectal cancer (CRC)-specific survival. Only a few studies have used the relative survival (RS) setting to estimate this association, although RS has proven particularly useful given the inaccuracy of death certification. This study aimed to investigate the impact of n...
Article
The coronavirus disease 2019 (COVID-19) pandemic is heterogeneous throughout Africa and threatening millions of lives. Surveillance and short-term modeling forecasts are critical to provide timely information for decisions on control strategies. We created a strategy that helps predict the country-level case occurrences based on cases within or ext...
Preprint
Even though observational studies do not deliver evidence as strong as that from randomized controlled trials, they contribute valuable evidence for clinical decision making. We reflect on potential problems in observational studies and their solutions, and provide a checklist (SHORT, Simple cHecklist for Observational Research in clinical sTudi...
Article
Coding mistakes can lead to false results. Statisticians and data scientists should exploit best practices and tools in statistical programming to enhance reproducible analyses. By Simon Schwab and Leonhard Held
Article
We present an approach to extend the Endemic-Epidemic (EE) modelling framework for the analysis of infectious disease data. In its spatiotemporal application, spatial dependencies were originally captured by a power law applied to static neighbourhood matrices. We propose to adjust these weight matrices over time to reflect changes in spatial...
Article
Background: Colorectal cancer (CRC) is among the three most common incident cancers and causes of cancer death in Switzerland for both men and women. To promote aspects of gender medicine, we examined differences in treatment decision and survival by sex in CRC patients diagnosed in 2000 and 2001 in the canton of Zurich, Switzerland. Methods: Character...
Preprint
Coding mistakes can lead to false results. Statisticians and data scientists should exploit best practices and tools in statistical programming to enhance reproducible analyses.
Preprint
It is now widely accepted that the standard inferential toolkit used by the scientific research community, null-hypothesis significance testing (NHST), is not fit for purpose. Yet despite the threat posed to the scientific enterprise, there is no agreement concerning alternative approaches. This lack of consensus reflects long-standing issues c...
Article
Although recombinant human erythropoietin (rhEpo) has been shown to be neuroprotective in experimental and clinical studies [1,2], prophylactic early high-dose rhEpo did not improve neurodevelopment among 2-year-olds who had been born very preterm in a randomized clinical trial [3]. We report the prespecified secondary neurodevelopmental outcomes of the...
Article
If a scientific study reports a discovery with a p‐value at or around 0.05, how credible is it? And what are the chances that a replication of this study will produce a similarly “significant” finding? Leonhard Held, Samuel Pawel and Simon Schwab's answers may surprise you.
Preprint
The ongoing coronavirus disease 2019 (COVID-19) pandemic is heterogeneous throughout Africa and threatening millions of lives. Surveillance and short-term modeling forecasts are critical to provide timely information for decisions on control strategies. We use a model that explains the evolution of the COVID-19 pandemic over time in the entire Afri...
Preprint
Publication bias is a persistent problem in meta-analyses for evidence-based medicine. As a consequence, small studies with large treatment effects are more likely to be reported than studies with a null result, which causes asymmetry. Here, we investigated treatment effects from 57,186 studies from 1922 to 2019, and overall 99,129 meta-analyses and...
Article
Nunes et al. ([54]) provide an overview of mathematical models used to analyse epidemics and techniques for conducting studies to obtain parameter estimates for such models. They discuss the SEIR model, which has been used in much coronavirus disease 2019 (COVID-19) analysis. Our discussion presents a modelling framework based on time series analysi...
Article
There is an urgent need to develop new methodology for the design and analysis of replication studies. Recently, a reverse-Bayes method called the sceptical p-value has been proposed for this purpose; the inversion of Bayes' theorem allows us to mathematically formalise the notion of scepticism, which in turn can be used to assess the agreement bet...
Article
Replication studies are increasingly conducted to confirm original findings. However, there is no established standard for how to assess replication success and in practice many different approaches are used. The purpose of this paper is to refine and extend a recently proposed reverse-Bayes approach for the analysis of replication studies. We show how...
Preprint
Replication studies are increasingly conducted to confirm original findings. However, there is no established standard for how to assess replication success and in practice many different approaches are used. The purpose of this paper is to refine and extend a recently proposed reverse-Bayes approach for the analysis of replication studies. We show how...
Article
Count data are often subject to underreporting, especially in infectious disease surveillance. We propose an approximate maximum likelihood method to fit count time series models from the endemic-epidemic class to underreported data. The approach is based on marginal moment matching where underreported processes are approximated through completely...
Preprint
There is an urgent need to develop new methodology for the design and analysis of replication studies. Recently, a reverse-Bayes method called the sceptical $p$-value has been proposed for this purpose; the inversion of Bayes' theorem allows us to mathematically formalise the notion of scepticism, which in turn can be used to assess the agreement b...
Article
Background: Hospital-acquired pneumonia (HAP) is divided into two distinct groups, ventilator-associated pneumonia (VAP) and non-ventilator-associated HAP (nvHAP). Although nvHAP occurs more frequently than VAP and results in similar mortality and costs, prevention guidelines and prevention efforts focus almost exclusively on VAP. Scientific evidence about...
Article
The novel coronavirus has dramatically affected our daily lives in the short term. But will the pandemic change research for the better over the longer term? By Simon Schwab and Leonhard Held
Article
Readers on sins and abuses of significance tests, horse kicks and dark forces.
Article
Multivariate count time series models are an important tool for analyzing and predicting the spread of infectious disease. We consider the endemic-epidemic framework, a class of autoregressive models for infectious disease surveillance counts, and replace the default autoregression on counts from the previous time period with more flexible weightin...
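As a rough illustration of the default autoregression on the previous period that this work generalises, the sketch below simulates a single-region endemic-epidemic model with conditional mean $\mu_t = \nu + \lambda Y_{t-1}$. Several simplifications are assumptions made here, not features of the paper: a Poisson instead of a negative binomial distribution, one region (no neighbourhood term), and no seasonality or covariates. All function names are hypothetical.

```python
import math
import random


def rpois(mu, rng):
    """Poisson draw via Knuth's multiplication method (fine for moderate mu)."""
    threshold = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_endemic_epidemic(nu, lam, T, y0=0, seed=1):
    """Simulate Y_t ~ Poisson(nu + lam * Y_{t-1}) for t = 1..T.

    nu:  endemic rate (baseline counts not driven by previous cases)
    lam: epidemic parameter; 0 <= lam < 1 keeps the process stationary
    """
    rng = random.Random(seed)
    y = [y0]
    for _ in range(T):
        mu = nu + lam * y[-1]  # endemic part + epidemic part (lag 1 only)
        y.append(rpois(mu, rng))
    return y[1:]


counts = simulate_endemic_epidemic(nu=5.0, lam=0.5, T=5000)
print(sum(counts) / len(counts))  # close to nu / (1 - lam) = 10 on average
```

Replacing `lam * y[-1]` with a weighted sum over several past counts is the kind of more flexible lag structure the abstract refers to; the specific weighting schemes are not reproduced here.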
Preprint
The reproducibility crisis has led to an increasing number of replication studies being conducted. Sample sizes for replication studies are often calculated using conditional power based on the effect estimate from the original study. However, this approach is not well suited, as it ignores the uncertainty of the original result. Bayesian methods ar...
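The difference between conditional power and a Bayesian alternative can be seen in a short calculation. This sketch assumes normally distributed effect estimates, a one-sided test at $\alpha = 0.025$, a flat prior, and $c = \sigma_o^2/\sigma_r^2$ (roughly the ratio of replication to original sample size). Conditional power plugs in the original estimate, giving $\Phi(z_o\sqrt{c} - z_\alpha)$; predictive power averages over its uncertainty, which inflates the variance of the replication $z$-statistic to $1 + c$. The function names are hypothetical and the preprint's own methods may differ.

```python
import math
from statistics import NormalDist

N = NormalDist()  # standard normal distribution


def conditional_power(z_o, c, alpha=0.025):
    """P(significant replication) if the true effect equals the original estimate."""
    z_a = N.inv_cdf(1 - alpha)
    return N.cdf(z_o * math.sqrt(c) - z_a)


def predictive_power(z_o, c, alpha=0.025):
    """Same probability, averaged over the flat-prior posterior of the effect."""
    z_a = N.inv_cdf(1 - alpha)
    return N.cdf((z_o * math.sqrt(c) - z_a) / math.sqrt(1 + c))


# Original z = 2.5, replication of the same size (c = 1)
print(conditional_power(2.5, 1))  # about 0.71
print(predictive_power(2.5, 1))   # smaller, about 0.65
```

Because the predictive version accounts for the original study's uncertainty, it typically calls for a larger replication sample size to reach the same target power.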
Article
Throughout the last decade, the so-called replication crisis has stimulated many researchers to conduct large-scale replication projects. With data from four of these projects, we computed probabilistic forecasts of the replication outcomes, which we then evaluated regarding discrimination, calibration and sharpness. A novel model, which can take i...
Chapter
Frequentist properties of the maximum likelihood estimate of a scalar parameter are derived. The Wald, score and likelihood ratio test statistics and the corresponding confidence intervals are introduced. Variance-stabilising transformations are also discussed. A case study comparing coverage and width of several confidence intervals for a proporti...
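A compact sketch of the kind of comparison described here: the Wald interval $\hat\pi \pm z\sqrt{\hat\pi(1-\hat\pi)/n}$ versus the score (Wilson) interval, with exact coverage computed by summing binomial probabilities over all outcomes whose interval contains the true $\pi$. This is an illustration under those textbook formulas, not the chapter's own code, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def wald_ci(x, n, level=0.95):
    """Wald interval: estimate +/- z * estimated standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_ci(x, n, level=0.95):
    """Score (Wilson) interval, obtained by inverting the score test."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = x / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def coverage(ci, n, pi, level=0.95):
    """Exact coverage: total binomial probability of x whose interval contains pi."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = ci(x, n, level)
        if lo <= pi <= hi:
            total += math.comb(n, x) * pi**x * (1 - pi) ** (n - x)
    return total


print(wald_ci(0, 10))    # degenerate zero-width interval at x = 0
print(wilson_ci(0, 10))  # still has positive width
print(coverage(wald_ci, 10, 0.1), coverage(wilson_ci, 10, 0.1))
```

With $n = 10$ and $\pi = 0.1$ the Wald interval covers far less often than its nominal 95%, largely because it collapses to a single point when $x = 0$.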
Chapter
This chapter describes numerical methods for Bayesian inference in non-conjugate settings. Standard numerical techniques and the Laplace approximation provide ways to numerically compute posterior characteristics of interest. Monte Carlo methods, including Monte Carlo integration, rejection and importance sampling as well as Markov chain Monte Carl...
Article
Statistical methodology plays a crucial role in drug regulation. Decisions by the US Food and Drug Administration or European Medicines Agency are typically made based on multiple primary studies testing the same medical product, where the two‐trials rule is the standard requirement, despite its shortcomings. A new approach is proposed for this task ba...
Chapter
Chapter 2 introduces the fundamental notion of the likelihood function and related quantities, such as the maximum likelihood estimate, the score function, and Fisher information. Algorithms for computing the maximum likelihood estimate, such as numerical optimisation and the EM algorithm, are also treated. The concept of sufficiency and the likelihood...
Chapter
The concepts described in Chap. 4 are now extended to multiparameter models. The concept of profile likelihood is introduced as well as the generalised likelihood ratio statistic. The conditional likelihood, an alternative way to eliminate a nuisance parameter, is discussed. Exercises are given at the end.
Chapter
This chapter gives an introduction to Bayesian inference. Conjugate, improper and Jeffreys prior distributions are introduced as well as various Bayesian point and interval estimates. Bayesian inference in multiparameter models is discussed and some results from Bayesian asymptotics are described. Finally, empirical Bayes methods are described, com...
Chapter
This chapter discusses fundamental concepts of frequentist inference, such as unbiasedness and consistency, standard errors and confidence intervals, significance tests and P-values. There is also a section on the bootstrap method. Exercises are given at the end.
Chapter
A time series is a series of observations of a quantity of interest. Markov models are commonly used in applications to take into account the dependence between successive observations. This chapter describes the statistical analysis of different types of Markov models for categorical and continuous time series data, including hidden Markov models...
Chapter
Chapter 9 describes the statistical methodology to predict future data in the presence of unknown model parameters. Emphasis is placed on probabilistic predictions, obtained with either a likelihood or Bayesian approach. Connections to the simpler plug-in prediction are also described. Finally, methods to assess the quality of probabilistic predicti...
Chapter
This chapter describes methodology for model selection both from a likelihood and a Bayesian perspective. In particular, AIC and BIC are discussed, along with their connection to cross-validation. Bayesian model selection based on the marginal likelihood is described, including Bayesian model averaging. Finally, DIC is introduced, completed by a number of exe...
Article
Simon Schwab and Leonhard Held explain the differences between confirmatory and exploratory research and the dangers of confusing the two concepts.
Preprint
Count data are often subject to underreporting, especially in infectious disease surveillance. We propose an approximate maximum likelihood method to fit count time series models from the endemic-epidemic class to underreported data. The approach is based on marginal moment matching where underreported processes are approximated through completely...
Article
Leonhard Held and Simon Schwab introduce a new series of articles that will highlight topics related to the production of robust, effective and reproducible science.
Preprint
Reproducibility Notes - a new series of articles that will highlight topics related to the production of robust, effective and reproducible science.
Article
Three suitable compounds (morphine, chlorpromazine, and phenobarbital) to treat neonatal abstinence syndrome were compared in a prospective multicenter, double-blind trial. Neonates exposed to opioids in utero were randomly allocated to one of three treatment groups. When a predefined threshold of a modified Finnegan score was reached, treatment st...
Book
This richly illustrated textbook covers modern statistical methods with applications in medicine, epidemiology and biology. Firstly, it discusses the importance of statistical models in applied quantitative research and the central role of the likelihood function, describing likelihood-based inference from a frequentist viewpoint, and exploring the...
Article
A new standard is proposed for the evidential assessment of replication studies. The approach combines a specific reverse Bayes technique with prior‐predictive tail probabilities to define replication success. The method gives rise to a quantitative measure for replication success, called the sceptical p‐value. The sceptical p‐value integrates trad...
Preprint
A new significance test is proposed to substantiate scientific findings from multiple primary studies investigating the same research hypothesis. The test statistic is based on the harmonic mean of the squared study-specific test statistics and can also include weights. Appropriate scaling ensures that, for any number of studies, the null distribut...
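The scaling mentioned in the abstract can be sketched for the unweighted case: $X^2 = n^2 / \sum_{i=1}^n 1/z_i^2$. Under the null hypothesis with independent standard normal $z_i$, each $1/z_i^2$ follows a Lévy distribution, which is stable with index $1/2$, so $\sum_i 1/z_i^2$ has the same distribution as $n^2/Z^2$ and $X^2$ is exactly $\chi^2_1$ for any $n$. This is a minimal two-sided sketch that ignores effect direction and study weights (both of which the full method can address); the function name is hypothetical.

```python
import math


def harmonic_mean_chisq_test(z_values):
    """Unweighted harmonic mean chi-squared statistic and two-sided p-value.

    X^2 = n^2 / sum(1 / z_i^2), which is chi-squared with 1 df under H0
    (independent standard normal z_i), whatever the number of studies n.
    """
    n = len(z_values)
    if n == 0 or any(z == 0 for z in z_values):
        raise ValueError("need at least one non-zero z-value")
    x2 = n**2 / sum(1 / z**2 for z in z_values)
    # P(chi^2_1 > x2) = P(|Z| > sqrt(x2)) = erfc(sqrt(x2 / 2))
    p_value = math.erfc(math.sqrt(x2 / 2))
    return x2, p_value


print(harmonic_mean_chisq_test([1.96]))      # one study: p close to 0.05
print(harmonic_mean_chisq_test([2.0, 2.0]))  # two studies: X^2 = 8
```

Because the harmonic mean is dominated by the smallest $|z_i|$, a single weak study pulls $X^2$ down strongly, which is what makes the test demanding about consistency across studies.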
Article
Short forms of IQ (S-IQ) assessments are time efficient and highly predictive of the full IQ (F-IQ) in healthy individuals. To investigate the validity of S-IQs for patients with neurodevelopmental impairments, this study tested a well-established S-IQ version in patients with congenital heart disease (CHD). The Wechsler Intelligence Scale for Chil...
Preprint
Throughout the last decade, the so-called replication crisis has stimulated many researchers to conduct large-scale replication projects. With data from four of these projects, we computed probabilistic forecasts of the replication outcomes, which we then evaluated regarding discrimination, calibration and sharpness. A novel model, which can take i...
Chapter
Forecasting the future course of epidemics has always been one of the main goals of epidemic modelling. This chapter reviews statistical methods to quantify the accuracy of epidemic forecasts. We distinguish point and probabilistic forecasts and describe different methods to evaluate and compare the predictive performance across models. Two case st...
Article
Clinical prediction models play a key role in risk stratification, therapy assignment and many other fields of medical decision making. Before they can enter clinical practice, their usefulness has to be demonstrated using systematic validation. Methods to assess their predictive performance have been proposed for continuous, binary, and time‐to‐ev...
Article
Introduction: The Management of Myelomeningocele Study, a.k.a. the MOMS trial, was published in 2011 in the New England Journal of Medicine. This prospective randomized controlled trial proved to be a milestone publication that provided definitive evidence that fetal surgery is a novel standard of care for select fetuses with spina bifida aperta (...
Article
The concept of intrinsic credibility has been recently introduced to check the credibility of 'out of the blue' findings without any prior support. A significant result is deemed intrinsically credible if it is in conflict with a sceptical prior derived from the very same data that would make the effect just non-significant. In this paper, I propos...
Article
Purpose: To find ways to reduce the rate of over-triage without drastically increasing the rate of under-triage, we applied a current guideline and identified relevant pre-hospital triage predictors that indicate the need for immediate evaluation and treatment of severely injured patients in the resuscitation area. Methods: Data for adult trauma...
Preprint
Multivariate time series models are an important tool for the analysis of routine public health surveillance data. We extend the endemic-epidemic framework, a class of multivariate autoregressive models for infectious disease counts, by including higher-order lags. Several parsimonious parameterizations motivated from serial interval distributions...
Article
p‐Values are commonly transformed to lower bounds on Bayes factors, so‐called minimum Bayes factors. For the linear model, a sample‐size adjusted minimum Bayes factor over the class of g‐priors on the regression coefficients has recently been proposed (Held & Ott, The American Statistician 70(4), 335–341, 2016). Here, we extend this methodology to...
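As background on transforming p-values into minimum Bayes factors, the sketch below computes two classic calibrations (Bayes factor of the null against the alternative, so smaller values mean more evidence against the null): the bound $\exp(-z^2/2)$ over all alternatives for a two-sided $z$-test, and the $-e\,p\log p$ bound of Sellke, Bayarri and Berger, valid for $p < 1/e$. The sample-size adjusted g-prior bound that the article extends is not reproduced here, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def min_bf_normal(p):
    """exp(-z^2 / 2): minimum Bayes factor (H0 vs H1) for a two-sided z-test."""
    z = NormalDist().inv_cdf(1 - p / 2)
    return math.exp(-z**2 / 2)


def min_bf_sellke(p):
    """-e * p * log(p) calibration; only defined for 0 < p < 1/e."""
    if not 0 < p < 1 / math.e:
        raise ValueError("bound requires 0 < p < 1/e")
    return -math.e * p * math.log(p)


for p in (0.05, 0.01, 0.005):
    print(p, round(min_bf_normal(p), 3), round(min_bf_sellke(p), 3))
```

At $p = 0.05$ both bounds (about 0.15 and 0.41) indicate that the evidence against the null is considerably weaker than the p-value alone might suggest.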
Preprint
A new standard is proposed for the evidential assessment of replication studies. The approach combines a specific reverse-Bayes technique with prior-predictive tail probabilities to define replication success. The method gives rise to a quantitative measure for replication success, called the sceptical p-value. The sceptical p-value integrates trad...
Article
The development of clinical prediction models requires the selection of suitable predictor variables. Techniques to perform objective Bayesian variable selection in the linear model are well developed and have been extended to the generalized linear model setting as well as to the Cox proportional hazards model. Here, we consider discrete time‐to‐e...
Preprint
Forecasting the future course of epidemics has always been one of the main goals of epidemic modelling. This chapter reviews statistical methods to quantify the accuracy of epidemic forecasts. We distinguish point and probabilistic forecasts and describe different methods to evaluate and compare the predictive performance across models. Two case st...
Article
There is now a large literature on optimal predictive model selection. Bayesian methodology based on the g-prior has been developed for the linear model where the median probability model (MPM) has certain optimality features. However, it is unclear if these properties also hold in the generalised linear model (GLM) framework, frequently used in cl...
Article
Innovations are urgently required for clinical development of antibacterials against multidrug-resistant organisms. Therefore, a European, public-private working group (STAT-Net; part of Combatting Bacterial Resistance in Europe [COMBACTE]) has reviewed and tested several innovative trial designs and analytical methods for randomized clinical tri...
Article
In this study, we assessed intracorporal mercury concentrations in subjects living on partially mercury-contaminated soils in a defined area in Switzerland. We assessed 64 mothers and 107 children who resided in a defined area for at least 3 months. Mercury in biological samples (urine and hair) was measured, a detailed questionnaire was administer...