Jeroen M. Goedhart’s scientific contributions

Publications (3)


Figure 1: Set-up of FusedTree. In each leaf node m (m = 1, ..., 4 in this example), we fit a linear regression using n_m samples with omics covariates X^(m) and an intercept c_m. The intercept contains the (potentially nonlinear) clinical information. The regression in leaf node m borrows information from the other leaf nodes by linking the regressions (indicated with ←→) through fusion penalty (5).
Figure 2: Boxplots of the prediction mean square errors (PMSEs) of several prediction models across 500 simulated data sets for the Interaction (top), Full Fusion (middle), and Linear (bottom) simulation experiments. For all experiments, we consider N = 100 (left) and N = 300 (right). The oracle prediction model is only considered for the Interaction experiment (* indicates that oracle model boxplots are missing for the Full Fusion and Linear experiments). We do not depict results for ridge regression in the Interaction experiment because its PMSEs fall far outside the range of the PMSEs of the other models (indicated by ↑). Outliers of boxplots are not shown.
Figure 3: (a) The estimated survival tree of FusedTree. In the leaf nodes, the relative death rate (top) and the number of events/node sample size (bottom) are depicted. The plot is produced using the R package rpart.plot. (b) Regularization paths as a function of fusion penalty α for the effect estimates of two genes in nodes 5, 12, and 13 of FusedTree. The vertical dotted line (at log α = 9.6) indicates the tuned α of FusedTree.
Figure S1: Fit of the tree
Figure S4: Scatter plot of PMSE_ZeroFus / PMSE_FusedTree as a function of fusion penalty α (log scale) across 500 simulated data sets for N = 100 and N = 300 for the effect modification simulation experiment (Section 4.1)


Fusion of Tree-induced Regressions for Clinico-genomic Data
  • Preprint
  • File available

November 2024 · 36 Reads

Jeroen M. Goedhart · Mark A. van de Wiel · Wessel N. van Wieringen · Thomas Klausch

Cancer prognosis is often based on a set of omics covariates and a set of established clinical covariates such as age and tumor stage. Combining these two sets poses challenges. First, dimension difference: clinical covariates should be favored because they are low-dimensional and usually have stronger prognostic ability than high-dimensional omics covariates. Second, interactions: genetic profiles and their prognostic effects may vary across patient subpopulations. Last, redundancy: a (set of) gene(s) may encode prognostic information similar to that of a clinical covariate. To address these challenges, we combine regression trees, employing clinical covariates only, with a fusion-like penalized regression framework in the leaf nodes for the omics covariates. The fusion penalty controls the variability in genetic profiles across subpopulations. We prove that the shrinkage limit of the proposed method equals a benchmark model: a ridge regression with penalized omics covariates and unpenalized clinical covariates. Furthermore, the proposed method allows researchers to evaluate, for different subpopulations, whether the overall omics effect enhances prognosis compared to employing clinical covariates alone. In an application to colorectal cancer prognosis based on established clinical covariates and 20,000+ gene expressions, we illustrate the features of our method.
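The construction lends itself to a compact prototype. The Python sketch below illustrates the idea only and is not the authors' implementation: a regression tree is grown on the clinical covariates, and ridge regressions on the omics covariates are then fit jointly across the leaf nodes, coupled by a quadratic fusion penalty that shrinks the leaf-specific coefficient vectors toward each other. The function name fit_fused_tree, the exact penalty form, and the tuning parameters lam and alpha are assumptions made for illustration; the actual penalty (5) and tuning procedure are defined in the preprint.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_fused_tree(X_clin, X_omics, y, lam=1.0, alpha=10.0, max_leaf_nodes=4):
    # Step 1: a tree on clinical covariates only; its leaves define
    # subpopulations, and leaf means stand in for the intercepts c_m.
    tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes).fit(X_clin, y)
    leaf_ids = tree.apply(X_clin)          # leaf index for every sample
    leaves = np.unique(leaf_ids)
    M, p = len(leaves), X_omics.shape[1]

    # Step 2: solve the joint normal equations for beta = (beta_1, ..., beta_M),
    # minimizing  sum_m ||y_m - c_m - X^(m) beta_m||^2
    #           + lam * sum_m ||beta_m||^2                   (ridge)
    #           + alpha * sum_{m<m'} ||beta_m - beta_m'||^2  (fusion)
    A = np.zeros((M * p, M * p))
    b = np.zeros(M * p)
    for i, m in enumerate(leaves):
        rows = leaf_ids == m
        Xm = X_omics[rows] - X_omics[rows].mean(axis=0)  # center out c_m
        rm = y[rows] - y[rows].mean()
        s = slice(i * p, (i + 1) * p)
        A[s, s] += Xm.T @ Xm + lam * np.eye(p)
        b[s] = Xm.T @ rm
    # The fusion term contributes alpha * (M*I - J) block-wise (J = all-ones
    # M x M matrix): (M - 1) on diagonal blocks, -1 on off-diagonal blocks.
    for i in range(M):
        for j in range(M):
            si, sj = slice(i * p, (i + 1) * p), slice(j * p, (j + 1) * p)
            A[si, sj] += alpha * (M - 1 if i == j else -1) * np.eye(p)
    betas = np.linalg.solve(A, b).reshape(M, p)
    return tree, dict(zip(leaves, betas))
```

In this simplified sketch, letting alpha grow fuses all leaf-specific coefficient vectors into a single one, consistent with the shrinkage limit stated above: a ridge regression on the omics covariates with the clinical (tree) part left unpenalized.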


Estimation of Predictive Performance in High-Dimensional Data Settings using Learning Curves

September 2022 · 3 Reads · Computational Statistics & Data Analysis

In high-dimensional prediction settings, it remains challenging to reliably estimate the test performance. To address this challenge, a novel performance estimation framework is presented. This framework, called Learn2Evaluate, is based on learning curves: it fits a smooth, monotone curve depicting test performance as a function of the sample size. Learn2Evaluate has several advantages compared to commonly applied performance estimation methodologies. Firstly, a learning curve offers a graphical overview of a learner. This overview assists in assessing the potential benefit of adding training samples, and it provides a more complete comparison between learners than performance estimates at a fixed subsample size. Secondly, a learning curve facilitates estimating the performance at the total sample size rather than at a subsample size. Thirdly, Learn2Evaluate allows the computation of a theoretically justified and useful lower confidence bound. Furthermore, this bound may be tightened by performing a bias correction. The benefits of Learn2Evaluate are illustrated by a simulation study and applications to omics data.
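As a toy illustration of the learning-curve mechanics (a sketch under stated assumptions, not the published implementation), the Python fragment below fits the inverse power law mentioned in the figure captions, performance(n) ≈ a − b·n^(−c), to performance estimates obtained at a few subsample sizes and extrapolates to the total sample size N. The function names and the AUC values are invented for the example.

```python
import numpy as np
from scipy.optimize import curve_fit

def inverse_power_law(n, a, b, c):
    # smooth monotone learning curve: performance rises toward plateau a
    return a - b * n ** (-c)

def extrapolate_performance(sizes, perfs, N):
    """Fit the curve to (subsample size, performance) pairs and
    predict performance at the total sample size N."""
    popt, _ = curve_fit(inverse_power_law, sizes, perfs,
                        p0=(max(perfs), 1.0, 0.5), maxfev=10000)
    return inverse_power_law(N, *popt)

# hypothetical AUC estimates at increasing subsample sizes
sizes = np.array([25.0, 50.0, 75.0, 100.0, 125.0, 150.0])
aucs = np.array([0.62, 0.68, 0.71, 0.73, 0.74, 0.75])
print(extrapolate_performance(sizes, aucs, N=200))
```

A lower confidence bound at N, as computed by Learn2Evaluate, would additionally require quantifying the uncertainty of the fitted curve (for example over repeated subsampling), which this sketch omits.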


Figure 2: Boxplots of the empirical distribution of distances of the lower 95% confidence bounds to the true AUC values (y-axis) for the bootstrap, and for Learn2Evaluate without (L2E − BC) and with (L2E + BC) bias correction, in the N = 100, ν = 1000 setting. The learning curve is fitted by an inverse power law and we determine n_opt by MSE minimization. Outliers are not depicted.
Figure 4: Boxplots of the empirical distribution of distances of the lower 95% confidence bounds to the true AUC values (y-axis) for the bootstrap, and for Learn2Evaluate without (L2E − BC) and with (L2E + BC) bias correction, in the N = 200, ν = 1000 setting. The learning curve is fitted by an inverse power law and we determine n_opt by MSE minimization. Outliers are not depicted.
Table: Coverage results for the 95% lower confidence bounds of Learn2Evaluate with (L^bc_n_opt) and without (L_n_opt) bias correction, the asymptotic AUC variance estimator of LeDell (Le Dell), and leave-one-out bootstrapping (LOOB). The learning curve is fitted by a power law and n_opt is determined by MSE minimization.
Estimation of Predictive Performance in High-Dimensional Data Settings using Learning Curves

June 2022 · 24 Reads
