Michael G. B. Blum’s scientific contributions


Publications (6)


Workflow of parametric simulations. For a set of parameters corresponding to a clinical trial scenario, 10,000 instances of clinical trials are simulated to estimate statistical power. The parameters used to obtain the illustrative Kaplan-Meier curves and power curves are $C=0.65$, $\Lambda=0.9$, $w=1.5$, $d=0$, $\theta=0.7$. The number of patients in the power curve is the sum of both arms.
Reduction in sample size $R_{\mathrm{obs}}^{2}$ as a function of the prognostic performance (C-index) of the covariate for a range of cumulative incidence values. Cumulative incidence $\Lambda$ is measured at the end of the follow-up period. In the simulations, the hazard ratio is set at $\theta=0.7$, the drop-out rate at $d=0.01$, and the shape parameter of the Weibull distribution at $w=1.5$. The cumulative incidence values provided for the breast cancer and HCC indications come from clinical trials selected in Table 2. eBC, early breast cancer; eHCC, early resectable hepatocellular carcinoma; mBC, metastatic breast cancer; aHCC, advanced hepatocellular carcinoma.
Effect of broader eligibility criteria and of covariate adjustment on statistical power. Different inclusion statuses are shown by color and adjustment statuses by type of line, irrespective of color. (A) Results of the parametric simulations where the covariate adjusted for is a standard Gaussian. (B) Results of the semi-synthetic simulations based on the HCC-TCGA cohort. The clinical covariates adjusted for are ECOG score and tumor staging. The three levels of inclusion are based on eligibility criteria of past and ongoing trials outlined in Table 1. For all simulations, a constant treatment effect size is assumed across the population. More restrictive eligibility criteria exclude patients with higher disease severity.
More efficient and inclusive time-to-event trials with covariate adjustment: a simulation study
Article · Full-text available

June 2023 · 51 Reads · 2 Citations

Trials

Honghao Li · [...] · Félix Balazard

Adjustment for prognostic covariates increases the statistical power of randomized trials. The factors influencing the increase in power are well known for trials with continuous outcomes. Here, we study which factors influence power and sample size requirements in time-to-event trials. We consider both parametric simulations and simulations derived from The Cancer Genome Atlas (TCGA) cohort of hepatocellular carcinoma (HCC) patients to assess how sample size requirements are reduced with covariate adjustment. Simulations demonstrate that the benefit of covariate adjustment increases with the prognostic performance of the adjustment covariate (C-index) and with the cumulative incidence of the event in the trial. For a covariate with intermediate prognostic performance (C-index = 0.65), the reduction in sample size varies from 3.1% when the cumulative incidence is 10% to 29.1% when it is 90%. Broadening eligibility criteria usually reduces statistical power, but our simulations show that power can be maintained with adequate covariate adjustment. In a simulation of adjuvant trials in HCC, we find that the number of patients screened for eligibility can be divided by 2.4 when broadening eligibility criteria. Last, we find that the Cox-Snell $R_{\mathrm{CS}}^{2}$ is a conservative estimate of the reduction in sample size requirements provided by covariate adjustment. Overall, more systematic adjustment for prognostic covariates leads to more efficient and inclusive clinical trials, especially when cumulative incidence is large, as in metastatic and advanced cancers. Code and results are available at https://github.com/owkin/CovadjustSim.
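
The parametric power simulation described in the abstract can be sketched in a few lines. Below is a minimal illustration, assuming Python with lifelines; the parameter names (theta for the hazard ratio, w for the Weibull shape, beta_z for the covariate effect) and the censoring horizon are illustrative choices, not the authors' exact setup, and the number of replicates is reduced for runtime (the paper uses 10,000 per scenario).

```python
# Minimal sketch of a parametric power simulation for a covariate-adjusted
# time-to-event trial (illustrative parameters, not the authors' code).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)

def simulate_trial(n, theta=0.7, w=1.5, lam=0.1, beta_z=1.0):
    """One two-arm trial: Weibull event times with a prognostic covariate z."""
    arm = rng.integers(0, 2, n)                  # 1:1 randomization
    z = rng.standard_normal(n)                   # prognostic covariate
    u = rng.uniform(size=n)                      # inverse-transform sampling
    t = (-np.log(u) / (lam * np.exp(np.log(theta) * arm + beta_z * z))) ** (1 / w)
    observed = t < 5.0                           # administrative censoring at t = 5
    return pd.DataFrame({"T": np.minimum(t, 5.0), "E": observed.astype(int),
                         "arm": arm, "z": z})

def estimate_power(n, n_rep=1000, adjusted=True, alpha=0.05):
    """Fraction of simulated trials with a significant treatment effect."""
    hits = 0
    for _ in range(n_rep):
        df = simulate_trial(n)
        cols = ["T", "E", "arm"] + (["z"] if adjusted else [])
        cph = CoxPHFitter().fit(df[cols], duration_col="T", event_col="E")
        hits += cph.summary.loc["arm", "p"] < alpha
    return hits / n_rep

print(estimate_power(500, adjusted=False), estimate_power(500, adjusted=True))
```

In this sketch, increasing beta_z raises the C-index of the covariate, and the gap between the adjusted and unadjusted power estimates widens accordingly.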


Type I error rate and power evaluated with Monte Carlo simulations of the five estimators included in the simulation study. Each dot corresponds to a simulation study that includes 300 replicates. The horizontal dashed line corresponds to the expected type I error rate of 5%.
Logarithm of the Mean Absolute Error (MAE) and Mean Squared Error (MSE) of the five estimators included in the simulation study. Each dot corresponds to a simulation study that includes 300 replicates
Logarithm of the bias of the five estimators included in the simulation study. Each dot corresponds to a simulation study that includes 300 replicates
Log width of the 95% Confidence Intervals (C.I.) for the different methods. To measure the log width, we compute the logarithm of the variance of a Gaussian distribution whose 95% C.I. would match the observed C.I. Each dot corresponds to a simulation study that includes 300 replicates.
Results of the two replication experiments. Each point corresponds to an observational experiment. For the 9 null replication experiments, the expected target ATT is 0 and for the 19 RCT experiments, the expected target ATT is the RCT estimate. The larger point is the mean of the points and the bar extends to the mean plus or minus two times the standard deviation. For each method, the position on the x-axis does not matter and random perturbation on the x-axis is added to the points to allow optimal visualisation
External control arm analysis: an evaluation of propensity score approaches, G-computation, and doubly debiased machine learning

December 2022 · 255 Reads · 15 Citations

Background: An external control arm is a cohort of control patients collected from data external to a single-arm trial. To provide an unbiased estimate of efficacy, the clinical profiles of patients from the single and external arms should be aligned, typically using propensity score approaches. Alternative approaches infer efficacy from comparisons between the outcomes of single-arm patients and machine-learning predictions of control patient outcomes. These methods include G-computation and Doubly Debiased Machine Learning (DDML), and their evaluation for External Control Arm (ECA) analysis is insufficient.

Methods: We consider both numerical simulations and a trial replication procedure to evaluate the different statistical approaches: propensity score matching, Inverse Probability of Treatment Weighting (IPTW), G-computation, and DDML. The replication study relies on five type 2 diabetes randomized clinical trials granted by the Yale University Open Data Access (YODA) project. From the pool of five trials, observational experiments are artificially built by replacing the control arm of one trial with an arm originating from another trial and containing similarly treated patients.

Results: Among the different statistical approaches, numerical simulations show that DDML has the smallest bias, followed by G-computation. G-computation usually minimizes the mean squared error, whereas the mean squared error of DDML varies but improves with increasing sample size. For hypothesis testing, all methods control the type I error, and DDML is the most conservative. G-computation is the best method in terms of statistical power; DDML has comparable power at n = 1000 but inferior power for smaller sample sizes. The replication procedure also indicates that G-computation minimizes the mean squared error, whereas DDML performs in between G-computation and the propensity score approaches. The confidence intervals of G-computation are the narrowest, whereas those of DDML are the widest for small sample sizes, which confirms its conservative nature.

Conclusions: For external control arm analyses, methods based on outcome prediction models can reduce estimation error and increase statistical power compared to propensity score approaches.
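
As a concrete illustration of the outcome-modelling idea, here is a minimal G-computation sketch for an ECA analysis, assuming Python with scikit-learn; the function name, argument names, and the choice of outcome model are hypothetical placeholders, not the authors' setup.

```python
# Minimal G-computation sketch for an external control arm (ECA) analysis.
# Argument names and the outcome model are illustrative placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def g_computation_att(X_trial, y_trial, X_external, y_external):
    """ATT estimate: observed single-arm outcomes minus model-predicted
    counterfactual control outcomes for the same (treated) patients."""
    outcome_model = RandomForestRegressor(n_estimators=500, random_state=0)
    outcome_model.fit(X_external, y_external)   # control outcome ~ covariates
    y_counterfactual = outcome_model.predict(X_trial)
    return np.mean(y_trial) - np.mean(y_counterfactual)
```

In practice, a bootstrap over both cohorts would typically be used to obtain confidence intervals for this estimate.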


Figure 3: Effect of broader eligibility criteria and of covariate adjustment on statistical power. A: results of the parametric simulations where the least-at-risk patients are selected in the 50% inclusion scenario. B: results of the semi-synthetic simulation based on the HCC-TCGA cohort. The three levels of inclusion are based on eligibility criteria of past and ongoing trials outlined in Table 1.
More efficient and inclusive time-to-event trials with covariate adjustment: a simulation study

April 2022 · 37 Reads

Adjustment for prognostic covariates increases the statistical power of randomized trials. The factors influencing the increase in power are well known for trials with continuous outcomes. Here, we study which factors influence power and sample size requirements in time-to-event trials. We consider both parametric simulations and simulations derived from the TCGA cohort of hepatocellular carcinoma (HCC) patients to assess how sample size requirements are reduced with covariate adjustment. Simulations demonstrate that the benefit of covariate adjustment increases with the prognostic performance of the adjustment covariate (C-index) and with the cumulative incidence of the event in the trial. For a covariate with intermediate prognostic performance (C-index = 0.65), the reduction in sample size varies from 1.7% when the cumulative incidence is 10% to 26.5% when it is 90%. Broadening eligibility criteria usually reduces statistical power, but our simulations show that power can be maintained with adequate covariate adjustment. In a simulation of HCC trials, we find that the number of patients screened for eligibility can be divided by 2.7 when broadening eligibility criteria. Last, we find that the Cox-Snell $R^{2}$ is a good approximation of the reduction in sample size requirements provided by covariate adjustment. This metric can be used in the design of time-to-event trials to determine sample size, as sketched below.

Key messages: Covariate adjustment is a statistical technique that leverages prognostic scores within the statistical analysis of a trial; we study its benefits for time-to-event trials. The power gain achieved with covariate adjustment is determined by the prognostic performance of the covariate and by the cumulative incidence of events at the end of the follow-up period. Trials in indications with large cumulative incidence, such as metastatic cancers, can benefit from covariate adjustment to improve their statistical power. Covariate adjustment maintains statistical power in trials when eligibility criteria are broadened. Overall, more systematic adjustment for prognostic covariates leads to more efficient and inclusive clinical trials, especially when cumulative incidence is large, as in metastatic and advanced cancers.
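
To make the sample-size implication concrete, here is a worked sketch assuming Python with scipy. It combines the standard Schoenfeld events formula for a 1:1 trial with a $(1 - R^{2})$ deflation, following the abstract's statement that the Cox-Snell $R^{2}$ approximates the reduction in sample size; the deflation step is an assumption built from that claim, not the authors' exact procedure.

```python
# Sample-size sketch for a 1:1 time-to-event trial, deflated by the
# Cox-Snell R^2 of the adjustment covariate (assumption: reduction ~ R^2).
import numpy as np
from scipy.stats import norm

def n_required(theta, cum_incidence, r2_cs=0.0, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    events = 4 * z**2 / np.log(theta) ** 2   # Schoenfeld formula, 1:1 allocation
    n = events / cum_incidence               # patients needed to observe the events
    return int(np.ceil(n * (1 - r2_cs)))     # covariate-adjustment deflation

# Unadjusted vs. adjusted (Cox-Snell R^2 = 0.2) for theta = 0.7, incidence 90%
print(n_required(0.7, 0.9), n_required(0.7, 0.9, r2_cs=0.2))
```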


Figure 2: Logarithm of the Mean Absolute Error (MAE) and Mean Squared Error (MSE) of the five estimators included in the simulation study. Each dot corresponds to a simulation study that includes 100 replicates.
External control arm analysis: an evaluation of propensity score approaches, G-computation, and doubly debiased machine learning

January 2022 · 32 Reads · 3 Citations

Background: An external control arm is a cohort of control patients collected from data external to a single-arm trial. To provide an unbiased estimate of efficacy, the clinical profiles of patients from the single and external arms should be aligned, typically using propensity score approaches. Alternative approaches infer efficacy from comparisons between the outcomes of single-arm patients and machine-learning predictions of control patient outcomes. These methods include G-computation and Doubly Debiased Machine Learning (DDML), and their evaluation for ECA analysis is insufficient.

Methods: We consider both numerical simulations and a trial replication procedure to evaluate the different statistical approaches: propensity score matching, Inverse Probability of Treatment Weighting (IPTW), G-computation, and DDML. The replication study relies on five type 2 diabetes randomized clinical trials granted by the Yale University Open Data Access (YODA) project. From the pool of five trials, observational experiments are artificially built by replacing the control arm of one trial with an arm originating from another trial and containing similarly treated patients.

Results: Among the different statistical approaches, numerical simulations show that DDML has the smallest bias, followed by G-computation. The ranking based on mean squared error is different: G-computation is always among the lowest-error methods, while the relative performance of DDML improves with increasing sample size. For hypothesis testing, DDML controls the type I error and is conservative, whereas G-computation and the propensity score approaches can be liberal, with type I errors ranging between 5% and 10% in some settings. G-computation is the best method in terms of statistical power; DDML has comparable power at n = 1000, but its power is inferior to that of the propensity score approaches at n = 250. The replication procedure also indicates that G-computation minimizes the mean squared error, while DDML performs in between G-computation and the propensity score approaches. The confidence intervals of G-computation are the narrowest, in line with its liberal type I error, whereas those of DDML are the widest, which confirms its conservative nature.

Conclusions: For external control arm analyses, methods based on outcome prediction models can reduce estimation error and increase statistical power compared to propensity score approaches.
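
The DDML idea of cross-fitted, doubly robust estimation can be illustrated generically. The sketch below is an AIPW-style ATT estimator with cross-fitted nuisance models, written with scikit-learn; it illustrates the technique rather than reproducing the authors' implementation, and the learner choices and clipping threshold are placeholders.

```python
# Cross-fitted doubly robust (AIPW-style) ATT sketch, in the spirit of DDML.
# Generic illustration with scikit-learn, not the authors' implementation.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier

def ddml_att(X, y, a, n_folds=5):
    """X, y, a are NumPy arrays; a = 1 for single-arm (treated) patients,
    a = 0 for external controls."""
    mu0_hat = np.zeros(len(y))   # cross-fitted control-outcome predictions
    e_hat = np.zeros(len(y))     # cross-fitted propensity predictions
    for train, test in KFold(n_folds, shuffle=True, random_state=0).split(X):
        ctrl = train[a[train] == 0]
        mu0 = GradientBoostingRegressor().fit(X[ctrl], y[ctrl])
        e = GradientBoostingClassifier().fit(X[train], a[train])
        mu0_hat[test] = mu0.predict(X[test])
        e_hat[test] = e.predict_proba(X[test])[:, 1]
    e_hat = np.clip(e_hat, 0.01, 0.99)       # guard against extreme weights
    p1, resid = a.mean(), y - mu0_hat
    # AIPW score for the ATT: treated residuals minus reweighted control residuals
    psi = (a * resid - (1 - a) * resid * e_hat / (1 - e_hat)) / p1
    return psi.mean()
```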


Kaplan–Meier curves for the high-risk individuals and the ones with low or medium risk according to AI-severity
The threshold to assign individuals into a high-risk group was the 2/3 quantile of the AI-severity score computed for patients of the KB development cohort. a Kaplan–Meier curves were obtained for the 150 leftover KB patients from the development cohort. b Kaplan–Meier curves were obtained for the 135 patients of the IGR validation cohort. p-values for the log-rank test were equal to 4.77e–07 (KB) and 4.00e–12 (IGR). The two terciles used to determine threshold values for low-, medium-, and high-risk groups were equal to 0.187 and 0.375. Diamonds correspond to censoring of patients who were still hospitalized at the time when data ceased to be updated. The bands correspond to the sequence of the 95% confidence intervals of the survival probabilities for each day. KB Kremlin-Bicêtre hospital, IGR Institut Gustave Roussy hospital.
AUC values when comparing AI-severity to other prognostic scores for COVID-19 severity/mortality
The AI-severity model was trained using the severity outcome defined as an oxygen flow rate of 15 L/min or higher, the need for mechanical ventilation, or death. When evaluating AI-severity on the alternative outcomes, the model was not trained again. a AUC results are reported on the leftover KB patients from the development cohort (150 patients). b The mean AUC (averaged over outcomes and over hospitals) as a function of the sample size (sum of sample sizes for the development and validation cohorts) used to construct the score. c AUC results are reported on the external validation set from IGR (135 patients). Models are sorted from left to right (and from top to bottom in the legend) by decreasing order of AUC values (averaged over outcomes and over hospitals). Error bars represent the 95% confidence intervals obtained with the DeLong procedure. Stars indicate the order of magnitude of p-values for the DeLong one-sided test in which we test whether $\mathrm{AUC}_{\text{AI-severity}} > \mathrm{AUC}_{\text{other score}}$: • 0.05 < p ≤ 0.10, *0.01 < p ≤ 0.05, **0.001 < p ≤ 0.01, ***p ≤ 0.001. KB Kremlin-Bicêtre hospital, IGR Institut Gustave Roussy hospital, ICU intensive care unit, NEWS2 National Early Warning Score 2, AUC area under the curve.
Confusion matrix obtained with AI-severity, which includes CT scan information in addition to clinical and biological variables and with C & B, which contains only clinical and biological variables
Values in the matrices correspond to the number of patients in each category, which is defined by the true severity status and its predicted one. The confusion matrix was computed using the outcome “oxygen flow rate of 15 L/min or higher and/or the need for mechanical ventilation and/or patient death.” For both scores, we considered the 2/3 quantile—computed using the development cohort (KB)—to distinguish severe patients from non-severe patients. In addition to the neural network variable computed from CT scan images, the variables included in AI-severity consist of oxygen saturation, age, sex, platelet, and urea. The variables included in C & B consist of oxygen saturation, age, sex, platelet, urea, LDH, hypertension, chronic kidney disease, dyspnea, and neutrophil values. Both scores were constructed using a feature selection algorithm that selected optimal variables. KB Kremlin-Bicêtre hospital, IGR Institut Gustave Roussy hospital.
Integrating deep learning CT-scan model, biological and clinical variables to predict severity of COVID-19 patients

January 2021 · 334 Reads · 174 Citations

The SARS-CoV-2 pandemic has put pressure on intensive care units, so that identifying predictors of disease severity is a priority. We collect 58 clinical and biological variables, and chest CT scan data, from 1003 coronavirus-infected patients from two French hospitals. We train a deep learning model based on CT scans to predict severity. We then construct the multimodal AI-severity score that includes 5 clinical and biological variables (age, sex, oxygenation, urea, platelet) in addition to the deep learning model. We show that neural network analysis of CT scans brings unique prognostic information, although it is correlated with other markers of severity (oxygenation, LDH, and CRP), explaining the measurable but limited 0.03 increase in AUC obtained when adding CT-scan information to clinical variables. When comparing AI-severity with 11 existing severity scores, we find significantly improved prognostic performance; AI-severity can therefore rapidly become a reference scoring approach.

The SARS-CoV-2 pandemic has put pressure on intensive care units, so that predicting severe deterioration early is a priority. Here, the authors develop a multimodal severity score including clinical and imaging features that has significantly improved prognostic performance in two validation datasets compared to previous scores.
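
A hypothetical sketch of the late-fusion step the abstract describes: a logistic model combining a CT deep-learning output with clinical and biological variables, evaluated by AUC with scikit-learn. The function name, argument names, and data handling are assumptions for illustration; only the listed features (age, sex, oxygenation, urea, platelet, plus the CT model output) come from the abstract.

```python
# Hypothetical late-fusion sketch: combine a CT deep-learning output with
# clinical/biological variables in a logistic model and evaluate the AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def multimodal_auc(X_clin_train, ct_score_train, y_train,
                   X_clin_val, ct_score_val, y_val):
    """X_clin_* hold clinical/biological features; ct_score_* is the 1D
    output of the CT deep learning model for the same patients."""
    X_train = np.column_stack([X_clin_train, ct_score_train])
    X_val = np.column_stack([X_clin_val, ct_score_val])
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
```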


Figure 1: Population description for the KB and IGR hospitals. Among the 1,003 patients of the study, biological and clinical variables were available for 989 individuals. Categorical variables are expressed as percentages [available]. Continuous variables are shown as median (IQR) [available]. Associations with severity are reported with p-values for each center, and the pooled p-value was obtained with Stouffer's method for combining p-values. The p-values shown are not adjusted for multiplicity. Variables and pooled p-values are in bold when the variable is significant after Bonferroni adjustment to account for multiple testing across the 63 variables. For continuous variables, odds ratios are computed for an increase of one standard deviation of the continuous variable. KB odds ratios are in blue, IGR in red.
AI-based multi-modal integration of clinical characteristics, lab tests and chest CTs improves COVID-19 outcome prediction of hospitalized patients

May 2020 · 188 Reads · 9 Citations

With 15% of severe cases among hospitalized patients [1], the SARS-CoV-2 pandemic has put tremendous pressure on Intensive Care Units and made the identification of early predictors of severity a public health priority. We collected clinical and biological data, as well as CT scan images and radiology reports, from 1,003 coronavirus-infected patients from two French hospitals. Radiologists' manual CT annotations were also available. We first identified 11 clinical variables and 3 types of radiologist-reported features significantly associated with prognosis. Next, focusing on the CT images, we trained deep learning models to automatically segment the scans and reproduce radiologists' annotations. We also built CT image-based deep learning models that predicted severity better than models based on the radiologists' scan reports. Finally, we showed that including CT scan features alongside the clinical and biological data yielded more accurate predictions than using clinical and biological data only. These findings show that CT scans provide insightful early predictors of severity.
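
As a toy illustration of turning CT segmentations into severity features, the sketch below computes the percentage of affected lung from binary lesion and lung masks with NumPy. The function name and voxel handling are illustrative assumptions, not the authors' pipeline.

```python
# Toy sketch: percentage of affected lung from binary segmentation masks,
# one way to turn a CT segmentation into a scalar severity feature.
import numpy as np

def affected_lung_percentage(lesion_mask: np.ndarray, lung_mask: np.ndarray,
                             voxel_volume_mm3: float = 1.0) -> float:
    """Both masks are boolean 3D arrays on the same CT grid."""
    lesion_in_lung = np.logical_and(lesion_mask, lung_mask)
    lung_volume = lung_mask.sum() * voxel_volume_mm3
    lesion_volume = lesion_in_lung.sum() * voxel_volume_mm3
    return 100.0 * lesion_volume / max(lung_volume, 1e-9)  # avoid divide-by-zero
```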

Citations (5)


... In this book chapter, we did not discuss in depth the survival endpoint, which is also a common endpoint in RCTs, especially in oncology, and has been studied extensively by other researchers (Hernández et al., 2006; Benkeser et al., 2021; Momal et al., 2023). Hernández et al. (2006) concluded that covariate adjustment for the survival endpoint could also yield power gains while simultaneously controlling the type I error rate. ...

Reference:

Covariate Adjustment in Analyzing Randomized Clinical Trials: Approaches, Software, and Application
More efficient and inclusive time-to-event trials with covariate adjustment: a simulation study

Trials

... ITC methods are applicable both when individual-level patient data (IPD) from the external source is unavailable [7,8] and when IPD is accessible. In the latter case, an external control arm (ECA) can be constructed, allowing combined analysis of outcomes and covariates from the clinical study and the external source [9][10][11]. In addition to contextualising results from single-arm trials, another common application of ECA analysis is estimation of the efficacy of a novel therapy relative to a comparator not included in the control arm of a phase III RCT. ...

External control arm analysis: an evaluation of propensity score approaches, G-computation, and doubly debiased machine learning

... Or machine learning and super learning approaches (van der Laan et al., 2007; Polley and Van Der Laan, 2010) (as briefly mentioned in Section 2.7) can be used to achieve optimal results for semi-parametric estimation of causal effects, such as in Colson et al. (2016). Alternatively, one can construct predictive models by applying machine learning and super learning methods to external control data (or experimental arm data) and then use these models to predict potential outcomes for patients in the master protocol (or external control), such as in Loiseau et al. (2022). ...

External control arm analysis: an evaluation of propensity score approaches, G-computation, and doubly debiased machine learning

... The coronavirus disease 2019 (COVID-19) has had a great impact worldwide. As of 19 April 2023, there have been over 700 million confirmed cases and over six million deaths [1]. ...

Integrating deep learning CT-scan model, biological and clinical variables to predict severity of COVID-19 patients

... Prior studies describe separate use of DL algorithms, volume of disease, and radiomics for diagnosis, disease severity, treatment response, outcome (death), oxygen supplementation, intubation, and ICU admission in patients with SARS-CoV-2 pneumonia. 5,6,[17][18][19] Although the performance of our DL algorithm and radiomics approach is similar to prior reports, besides the influence of motion artifacts, we document both the comparative and additive value of DL-based and radiomics features in predicting outcome and the need for ICU admission. In contrast to the previously reported subjective grading of disease extent in each lobe, a tedious and time-consuming process, we demonstrate that quantitative lobe-level information on the volume and percentage of affected lung is superior for assessing disease severity and predicting patient outcome. ...

AI-based multi-modal integration of clinical characteristics, lab tests and chest CTs improves COVID-19 outcome prediction of hospitalized patients