Riccardo De Bin

University of Oslo · Department of Mathematics

About

60
Publications
9,308
Reads
848
Citations

Publications (60)
Article
Full-text available
This paper addresses the statistical distribution of wave crest heights in ocean environments. This is much needed in several engineering applications including risk and reliability assessment of marine and coastal structures and is an important input for design of ocean structures. However, even though crest height distributions have received a lo...
Preprint
Full-text available
We introduce GPTreeO, a flexible R package for scalable Gaussian process (GP) regression, particularly tailored to continual learning problems. GPTreeO builds upon the Dividing Local Gaussian Processes (DLGP) algorithm, in which a binary tree of local GP regressors is dynamically constructed using a continual stream of input data. In GPTreeO we ext...
Article
Full-text available
Regression modelling often presents a trade-off between predictiveness and interpretability. Highly predictive and popular tree-based algorithms such as Random Forest and boosted trees predict the outcome of new observations very well, but the effect of the predictors on the result is hard to interpret. Highly interpretable algorithms like linear e...
Article
Full-text available
We propose a framework for fitting multivariable fractional polynomial models as special cases of Bayesian generalized nonlinear models, applying an adapted version of the genetically modified mode jumping Markov chain Monte Carlo algorithm. The universality of the Bayesian generalized nonlinear models allows us to employ a Bayesian version of frac...
Preprint
Full-text available
We propose a framework for fitting fractional polynomial models as special cases of Bayesian Generalized Nonlinear Models, applying an adapted version of the Genetically Modified Mode Jumping Markov Chain Monte Carlo algorithm. The universality of the Bayesian Generalized Nonlinear Models allows us to employ a Bayesian version of the fractional po...
Article
Full-text available
Background: In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data that have lar...
Article
Full-text available
Machine learning can make a strong contribution to accelerating the discovery of transition metal complexes (TMC). These compounds will play a key role in the development of new technologies for which there is an urgent need, including the production of green hydrogen from renewable sources. Despite the recent developments in machine learning for d...
Article
Full-text available
Background: The research of biomarker-treatment interactions is commonly investigated in randomized clinical trials (RCT) for improving medicine precision. The hierarchical interaction constraint states that an interaction should only be in a model if its main effects are also in the model. However, this constraint is not guaranteed in the standard...
Article
Lithium-ion batteries are a prominent technology for the electrification of the transport sector, which itself is a key measure towards the departure from fossil fuels. The “green shift” is taking place in the marine industry too, where the number of battery-powered vessels is growing fast. In this case, monitoring the battery State of Health is...
Preprint
Full-text available
Machine learning can make a strong contribution to accelerating the discovery of transition metal complexes (TMC). These compounds will play a key role in the development of new technologies for which there is an urgent need, including the production of green hydrogen from renewable sources. Despite the recent developments in machine learning for d...
Article
Full-text available
Quantitative adverse outcome pathway network (qAOPN) modeling is gaining momentum due to its predictive nature, its alignment with quantitative risk assessment, and its great potential as a computational new approach methodology (NAM) to reduce laboratory animal tests. The present work aimed to demonstrate two advanced modeling approaches, piecewise structural equat...
Preprint
Full-text available
A characteristic feature of time-to-event data analysis is possible censoring of the event time. Most of the statistical learning methods for handling censored data are limited by the assumption of independent censoring, even if this can lead to biased predictions when the assumption does not hold. This paper introduces Clayton-boost, a boosting ap...
Article
Longevity and safety of lithium-ion batteries are facilitated by efficient monitoring and adjustment of the battery operating conditions. Hence, it is crucial to implement fast and accurate algorithms for State of Health (SoH) monitoring on the Battery Management System. The task is challenging due to the complexity and multitude of the factors con...
Conference Paper
Full-text available
In this work, we suggest a framework to fit fractional polynomials based on the Bayesian Generalized Nonlinear Models (BGNLM, Hubin et al., 2021). A version of the Genetically Modified Mode Jumping Markov Chain Monte Carlo (GMJMCMC) algorithm (Hubin et al., 2020) is adopted. Preliminary simulation runs show promising results in terms of identifying t...
Conference Paper
We propose a boosting model for the analysis of censored data with a dependent censoring scheme, based on the accelerated failure time model and the Clayton copula. Both in the motivating example, related to aeroplane landing, and in a classic biomedical dataset, our proposed approach provides excellent results. Url: https://www.iwsm2022.com/wp-c...
Preprint
Full-text available
Background: The research of biomarker-treatment interactions is commonly investigated in randomized clinical trials (RCT) for improving medicine precision. The hierarchical interaction constraint states that an interaction should only be in a model if its main effects are also in the model. However, this constraint is not guaranteed in the differen...
Article
Full-text available
In this paper we propose a boosting algorithm to extend the applicability of a first hitting time model to high-dimensional frameworks. Based on an underlying stochastic process, first hitting time models do not require the proportional hazards assumption, hardly verifiable in the high-dimensional context, and represent a valid parametric alternati...
Article
Full-text available
The presence of snow and ice on runway surfaces reduces the available tire-pavement friction needed for retardation and directional control and causes potential economic and safety threats for the aviation industry during the winter seasons. To activate appropriate safety procedures, pilots need accurate and timely information on the actual runway...
Article
Publication bias and p-hacking are two well-known phenomena that strongly affect the scientific literature and cause severe problems in meta-analyses. Due to these phenomena, the assumptions of meta-analyses are seriously violated and the results of the studies cannot be trusted. While publication bias is very often captured well by the weighting f...
Article
Full-text available
Across the field of education research there has been an increased focus on the development, critique, and evaluation of statistical methods and data usage due to recently created, very large datasets and machine learning techniques. In physics education research (PER), this increased focus has recently been shown through the 2019 Physical Review P...
Preprint
Published version available: https://doi.org/10.1016/j.coldregions.2022.103556 - The presence of snow and ice on runway surfaces reduces the available tire-pavement friction needed for retardation and directional control and causes potential economic and safety threats for the aviation industry during the winter seasons. To activate appropriate sa...
Preprint
Full-text available
Across the field of education research there has been an increased focus on the development, critique, and evaluation of statistical methods and data usage due to recently created, very large data sets and machine learning techniques. In physics education research (PER), this increased focus has recently been shown through the 2019 Physical Review...
Presentation
Full-text available
Quantitative adverse outcome pathway (qAOP) modeling is gaining momentum due to its predictive nature and alignment with quantitative risk assessment. A wide range of modeling approaches can potentially assist the construction of qAOPs. Among these, piecewise structural equation modeling (PSEM) is considered highly suitable for qAOP network construction. Th...
Poster
Full-text available
An adverse outcome pathway (AOP) network has been developed to describe the adverse effect of UV-B radiation (AOP #327−330). This tentative AOP, which is the first AOP for a non-chemical stressor, is a complex network linking a molecular initiating event (MIE: cellular ROS formation) to an adverse outcome (AO: reduced survival of a crustacean), thr...
Preprint
Full-text available
Longevity and safety of Lithium-ion batteries are facilitated by efficient monitoring and adjustment of the battery operating conditions: hence, it is crucial to implement fast and accurate algorithms for State of Health (SoH) monitoring on the Battery Management System. The task is challenging due to the complexity and multitude of the factors con...
Poster
Full-text available
Due to the high number of chemicals and species, it is not feasible to assess the risk of every chemical to humans and ecosystems. Cost-effective alternative ecotoxicity testing strategies with reduced needs for laboratory animal use are in high demand. New Approach Methodologies (NAMs), such as high-throughput screening and high-content toxicogeno...
Article
Full-text available
The time it takes a student to graduate with a university degree is influenced by a variety of factors such as their background, the academic performance at university, and their integration into the social communities of the university they attend. Different universities have different populations, student services, instruction styles, and degree p...
Article
Full-text available
Statistical models are often fitted to obtain a concise description of the association of an outcome variable with some covariates. Even if background knowledge is available to guide preselection of covariates, stepwise variable selection is commonly applied to remove irrelevant ones. This practice may introduce additional variability and selection...
Article
Full-text available
Background: The standard lasso penalty and its extensions are commonly used to develop a regularized regression model while selecting candidate predictor variables on a time-to-event outcome in high-dimensional data. However, these selection methods focus on a homogeneous set of variables and do not take into account the case of predictors belongi...
Preprint
Full-text available
The time it takes a student to graduate with a university degree is influenced by a variety of factors such as their background, the academic performance at university, and their integration into the social communities of the university they attend. Different universities have different populations, student services, instruction styles, and degree p...
Article
Full-text available
Homogeneous catalysis using transition metal complexes is ubiquitously used for organic synthesis, as well as technologically relevant in applications such as water splitting and CO2 reduction. The key steps underlying homogeneous catalysis require a specific combination of electronic and steric effects from the ligands bound to the metal center. F...
Article
U-statistics enjoy good properties such as asymptotic normality, unbiasedness and minimal variance among unbiased estimators. The estimation of their variance is often of interest, for instance to derive asymptotic tests. It is well-known that an unbiased estimator of the variance of a U-statistic can be formulated explicitly as a U-statistic itsel...
Preprint
Full-text available
Homogeneous catalysis using transition metal complexes is ubiquitously used for organic synthesis, as well as technologically relevant in applications such as water splitting and CO2 reduction. The key steps underlying homogeneous catalysis require a specific combination of electronic and steric effects from the ligands bound to the metal center. F...
Preprint
Full-text available
Publication bias and p-hacking are two well-known phenomena which strongly affect the scientific literature and cause severe problems in meta-analysis studies. Due to these phenomena, the assumptions are seriously violated and the results of the meta-analysis studies cannot be trusted. While publication bias is almost perfectly captured by the mode...
Article
Data integration, i.e. the use of different sources of information for data analysis, is becoming one of the most important topics in modern statistics. Especially in, but not limited to, biomedical applications, a relevant issue is the combination of low-dimensional (e.g. clinical data) and high-dimensional (e.g. molecular data such as gene expres...
Preprint
Full-text available
Penalized regression methods, such as ridge regression, heavily rely on the choice of a tuning, or penalty, parameter, which is often computed via cross-validation. Discrepancies in the value of the penalty parameter may lead to substantial differences in regression coefficient estimates and predictions. In this paper, we investigate the effect of...
Article
Full-text available
Background: Omics data can be very informative in survival analysis and may improve the prognostic ability of classical models based on clinical risk factors for various diseases, for example breast cancer. Recent research has focused on integrating omics and clinical data, yet has often ignored the need for appropriate model building for clinical...
Article
Background: The development of high-throughput genomic technologies has led to the rapid growth and easier availability of very large genomic datasets. The Cox proportional hazards model is commonly used to estimate the effect of one or more prognostic factors on survival-type endpoints. The meth...
Article
Full-text available
In biomedical research, boosting-based regression approaches have gained much attention in the last decade. Their intrinsic variable selection procedure and ability to shrink the estimates of the regression coefficients toward 0 make these techniques appropriate to fit prediction models in the case of high-dimensional data, e.g. gene expressions. T...
Article
Objective: To establish the diagnostic test accuracy of both two-dimensional (2D) and four-dimensional (4D) transperineal ultrasound, to assess if 4D ultrasound imaging provides additional value in the diagnosis of posterior pelvic floor disorders in women with obstructed defaecation syndrome. Methods: In this prospective cohort study, 121 conse...
Article
Objective: To establish the diagnostic test accuracy of evacuation proctography, magnetic resonance imaging (MRI), transperineal ultrasonography, and endovaginal ultrasonography for detecting posterior pelvic floor disorders (rectocele, enterocele, intussusception, and anismus) in women with obstructed defecation syndrome and secondarily to identi...
Article
If a number of candidate variables are available, variable selection is a key task aiming to identify those candidates which influence the outcome of interest. Methods such as backward elimination, forward selection, etc. are often implemented, despite their drawbacks. One of these drawbacks is the instability of their results with respect to small pert...
Article
Influential points can cause severe problems when deriving a multivariable regression model. A novel approach to check for such points is proposed, based on the variable inclusion matrix, a simple way to summarize results from resampling-based variable selection procedures. These procedures rely on the variable inclusion matrix, which reports wheth...
Chapter
We review some strategies proposed in the literature to combine clinical and omics data in a prediction model. We show how these strategies can be performed by using two well-known statistical methods, lasso and boosting, through an application to a biomedical study with a time-to-event outcome.
Article
Full-text available
As modern biotechnologies advance, it has become increasingly frequent that different modalities of high-dimensional molecular data (termed “omics” data in this paper), such as gene expression, methylation, and copy number, are collected from the same patient cohort to predict the clinical outcome. While prediction based on omics data has been wide...
Data
The table displays the parameters of the additional simulation settings. See the results in Subsection 3.2.2.
Article
Full-text available
Despite the limitations imposed by the proportional hazards assumption, the Cox model is probably the most popular statistical tool used to analyze survival data, thanks to its flexibility and ease of interpretation. For this reason, novel statistical/machine learning techniques are usually adapted to fit its requirements, including boosting. Boost...
Article
In biomedical research, boosting-based regression approaches have gained much attention in the last decade. Their intrinsic variable selection procedure and their ability to shrink the estimates of the regression coefficients toward 0 make these techniques appropriate to fit prediction models in the case of high-dimensional data, e.g. gene expressi...
Article
In the exponential families framework, we provide a mixing distribution which assures the equivalence between the conditional and the random-effects likelihoods, two widely used tools to make inference on a parameter of interest in the case of many nuisance parameters.
Article
In recent years, increasing attention has been devoted to the problem of the stability of multivariable regression models, understood as the resistance of the model to small changes in the data on which it has been fitted. Resampling techniques, mainly based on the bootstrap, have been developed to address this issue. In particular, the approaches...
Article
Full-text available
Background: In recent years, the importance of independent validation of the prediction ability of a new gene signature has been largely recognized. Recently, with the development of gene signatures which integrate rather than replace the clinical predictors in the prediction rule, the focus has been moved to the validation of the added predictive...
Article
In biomedical literature, numerous prediction models for clinical outcomes have been developed based either on clinical data or, more recently, on high-throughput molecular data (omics data). Prediction models based on both types of data, however, are less common, although some recent studies suggest that a suitable combination of clinical and mole...
Article
Full-text available
We revisit resampling procedures for error estimation in binary classification in terms of U-statistics. In particular, we exploit the fact that the error rate estimator involving all learning-testing splits is a U-statistic. Thus, it has minimal variance among all unbiased estimators and is asymptotically normally distributed. Moreover, there is a...
Article
Full-text available
Cluster analysis is a crucial tool in several biological and medical studies dealing with microarray data. Such studies pose challenging statistical problems due to dimensionality issues, since the number of variables can be much higher than the number of observations. Here, we present a general framework to deal with the clustering of microarray d...
