Marco Virgolin

Centrum Wiskunde & Informatica | CWI · Research Group for Life Sciences & Health

PhD

About

46 Publications
2,074 Reads
216 Citations
Introduction
My main research interests are the two Es of Machine Learning (ML): Explainable ML and Evolutionary ML. Explainable ML concerns explaining the meaning of normally uninterpretable ML models, as well as techniques that directly generate interpretable ML models. Understanding what an ML model means can provide not only important insights into the problem at hand, but also confidence in using the model’s predictions, especially in delicate applications (e.g., cancer treatment). Evolutionary ML uses evolutionary computation (my main expertise is Genetic Programming) to generate ML models, either directly or in synergy with other ML techniques. I am also interested in medical applications of ML, and have experience with pediatric radiation therapy.

Publications (46)
Conference Paper
The Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) is a recently introduced model-based EA that has been shown to be capable of outperforming state-of-the-art alternative EAs in terms of scalability when solving discrete optimization problems. One of the key aspects of GOMEA's success is a variation operator that is designed to extensively...
Preprint
Many risk-sensitive applications require Machine Learning (ML) models to be interpretable. Attempts to obtain interpretable models typically rely on tuning, by trial-and-error, hyper-parameters of model complexity that are only loosely related to interpretability. We show that it is instead possible to take a meta-learning approach: an ML model of...
Article
Feature construction can substantially improve the accuracy of Machine Learning (ML) algorithms. Genetic Programming (GP) has been proven to be effective at this task by evolving non-linear combinations of input features. GP additionally has the potential to improve ML explainability since explicit expressions are evolved. Yet, in most GP works the...
Preprint
Symbolic regression (SR) is the task of learning a model of data in the form of a mathematical expression. By their nature, SR models have the potential to be accurate and human-interpretable at the same time. Unfortunately, finding such models, i.e., performing SR, appears to be a computationally intensive task. Historically, SR has been tackled w...
Article
PURPOSE: To validate and compare the performance of four organ dose reconstruction approaches for historical radiation treatment (RT) planning based on two-dimensional (2D) radiographs. MATERIALS AND METHODS: We considered 10 Wilms’ tumor patients with planning computed tomography (CT) images for whom we developed typical historic Wilms’ tumor RT pl...
Preprint
Currently, the genetic programming version of the gene-pool optimal mixing evolutionary algorithm (GP-GOMEA) is among the top-performing algorithms for symbolic regression (SR). A key strength of GP-GOMEA is its way of performing variation, which dynamically adapts to the emergence of patterns in the population. However, GP-GOMEA lacks a mechanism...
Preprint
Interpretability can be critical for the safe and responsible use of machine learning models in high-stakes applications. So far, evolutionary computation (EC), in particular in the form of genetic programming (GP), represents a key enabler for the discovery of interpretable machine learning (IML) models. In this short paper, we argue that research...
Preprint
Dimensionality reduction (DR) is an important technique for data exploration and knowledge discovery. However, most of the main DR methods are either linear (e.g., PCA), do not provide an explicit mapping between the original data and its lower-dimensional representation (e.g., MDS, t-SNE, isomap), or produce mappings that cannot be easily interpre...
Preprint
Genetic programming (GP) is one of the best approaches today to discover symbolic regression models. To find models that trade off accuracy and complexity, the non-dominated sorting genetic algorithm II (NSGA-II) is widely used. Unfortunately, it has been shown that NSGA-II can be inefficient: in early generations, low-complexity models over-replic...
Preprint
Emotion recognition in children can help the early identification of, and intervention on, psychological complications that arise in stressful situations such as cancer treatment. Though deep learning models are increasingly being adopted, data scarcity is often an issue in pediatric medicine, including for facial emotion recognition in children. I...
Preprint
In this chapter, we provide a review of conversational agents (CAs), discussing chatbots, intended for casual conversation with a user, as well as task-oriented agents that generally engage in discussions intended to reach one or several specific goals, often (but not always) within a specific domain. We also consider the concept of embodied conver...
Preprint
Counterfactual explanations (CEs) are a powerful means for understanding how decisions made by algorithms can be changed. Researchers have proposed a number of desiderata that CEs should meet to be practically useful, such as requiring minimal effort to enact, or complying with causal models. We consider a further aspect to improve the usability of...
Preprint
When it comes to solving optimization problems with evolutionary algorithms (EAs) in a reliable and scalable manner, detecting and exploiting linkage information, i.e., dependencies between variables, can be key. In this article, we present the latest version of, and propose substantial enhancements to, the Gene-pool Optimal Mixing Evolutionary Algo...
Preprint
In this position paper, we present five key principles, namely interpretability, inherent capability to explain, independent data, interactive learning, and inquisitiveness, for the development of conversational AI that, unlike the currently popular black box approaches, is transparent and accountable. At present, there is a growing concern with th...
Preprint
Many promising approaches to symbolic regression have been presented in recent years, yet progress in the field continues to suffer from a lack of uniform, robust, and transparent benchmarking standards. In this paper, we address this shortcoming by introducing an open-source, reproducible benchmarking platform for symbolic regression. We assess 14...
Preprint
High-stakes applications require AI-generated models to be interpretable. Current algorithms for the synthesis of potentially interpretable models rely on objectives or regularization terms that represent interpretability only coarsely (e.g., model size) and are not designed for a specific user. Yet, interpretability is intrinsically subjective. In...
Chapter
Neural Architecture Search (NAS), i.e., the automation of neural network design, has gained much popularity in recent years with increasingly complex search algorithms being proposed. Yet, solid comparisons with simple baselines are often missing. At the same time, recent retrospective studies have found many new algorithms to be no better than ran...
Preprint
Learning ensembles can substantially improve the generalization performance of low-bias high-variance estimators such as deep decision trees and deep nets. Improvements have also been found when Genetic Programming (GP) is used to learn the estimators. Yet, the best way to learn ensembles in GP remains to be determined, especially considering that...
Chapter
Many risk-sensitive applications require Machine Learning (ML) models to be interpretable. Attempts to obtain interpretable models typically rely on tuning, by trial-and-error, hyper-parameters of model complexity that are only loosely related to interpretability. We show that it is instead possible to take a meta-learning approach: an ML model of...
Article
Purpose: Current phantoms used for the dose reconstruction of long-term childhood cancer survivors lack individualization. We design a method to predict highly individualized abdominal three-dimensional (3-D) phantoms automatically. Approach: We train machine learning (ML) models to map (2-D) patient features to 3-D organ-at-risk (OAR) metrics upon...
Article
To study radiotherapy-related adverse effects, detailed dose information (3D distribution) is needed for accurate dose-effect modeling. For childhood cancer survivors who underwent radiotherapy in the pre-CT era, only 2D radiographs were acquired, thus 3D dose distributions must be reconstructed from limited information. State-of-the-art methods ac...
Article
The Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) is a model-based EA framework that has been shown to perform well in several domains, including Genetic Programming (GP). Differently from traditional EAs where variation acts blindly, GOMEA learns a model of interdependencies within the genotype, i.e., the linkage, to estimate what patter...
Preprint
Neural Architecture Search (NAS), i.e., the automation of neural network design, has gained much popularity in recent years with increasingly complex search algorithms being proposed. Yet, solid comparisons with simple baselines are often missing. At the same time, recent retrospective studies have found many new algorithms to be no better than ran...
Preprint
To study radiotherapy-related adverse effects, detailed dose information (3D distribution) is needed for accurate dose-effect modeling. For childhood cancer survivors who underwent radiotherapy in the pre-CT era, only 2D radiographs were acquired, thus 3D dose distributions must be reconstructed. State-of-the-art methods achieve this by using 3D su...
Article
Performing large-scale three-dimensional radiation dose reconstruction for patients requires a large amount of manual work. We present an image processing-based pipeline to automatically reconstruct radiation dose. The pipeline was designed for childhood cancer survivors that received abdominal radiotherapy with anterior-to-posterior and posterior-...
Preprint
Machine Learning (ML) is proving extremely beneficial in many healthcare applications. In pediatric oncology, retrospective studies that investigate the relationship between treatment and late adverse effects still rely on simple heuristics. To assess the effects of radiation therapy, treatment plans are typically simulated on phantoms, i.e., virtu...
Conference Paper
Semantic Backpropagation (SB) is a recent technique that promotes effective variation in tree-based genetic programming. The basic idea of SB is to provide information on what output is desirable for a specified tree node, by propagating the desired root-node output back to the specified node using inversions of functions encountered along the way....
Preprint
Feature construction can substantially improve the accuracy of Machine Learning (ML) algorithms. Genetic Programming (GP) has been proven to be effective at this task by evolving non-linear combinations of input features. GP additionally has the potential to improve ML explainability since explicit expressions are evolved. Yet, in most GP works the...
Article
In retrospective radiation treatment (RT) dosimetry, a surrogate anatomy is often used for patients without 3D CT. To gain insight into which aspects of a surrogate anatomy are crucial for accurate dose reconstruction, we investigated the relation of patient characteristics and internal anatomical features with deviations in reconstructed or...
Preprint
The Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) has been shown to be a top performing EA in several domains, including Genetic Programming (GP). Differently from traditional EAs where variation acts randomly, GOMEA learns a model of interdependencies within the genotype, i.e., the linkage, to estimate what patterns to propagate. In this...
Article
Evolutionary algorithms (EAs) have proven to be effective in tackling problems in many different domains. However, users are often required to spend a significant amount of effort in fine-tuning the EA parameters in order to make the algorithm work. In principle, visualization tools may be of great help in this laborious task, but current visualiza...
Conference Paper
The recently introduced Gene-pool Optimal Mixing Evolutionary Algorithm for Genetic Programming (GP-GOMEA) has been shown to find much smaller solutions of equally high quality compared to other state-of-the-art GP approaches. This is an interesting aspect as small solutions better enable human interpretation. In this paper, an adaptation of GP-GOM...
Article
Purpose: The aim of this study is to establish the first step towards a novel and highly individualized 3D dose distribution reconstruction method, based on CT scans and organ delineations of recently treated patients. Specifically, the feasibility of automatically selecting the CT scan of a recently treated childhood cancer patient who is similar...
Poster
Dose-reconstruction accuracy of the approach based on CT scans of recently-treated children is primarily related to similarity in internal anatomy, not to patient features like height and weight.
Conference Paper
There is an increasing interest in the development of techniques for automatic relation extraction from unstructured text. The biomedical domain, in particular, is a sector that may greatly benefit from those techniques due to the huge and ever increasing amount of scientific publications describing observed phenomena of potential clinical interest...