
Riccardo De Bin
University of Oslo · Department of Mathematics
60 Publications · 9,308 Reads · 848 Citations
Publications (60)
This paper addresses the statistical distribution of wave crest heights in ocean environments. This is much needed in several engineering applications including risk and reliability assessment of marine and coastal structures and is an important input for design of ocean structures. However, even though crest height distributions have received a lo...
We introduce GPTreeO, a flexible R package for scalable Gaussian process (GP) regression, particularly tailored to continual learning problems. GPTreeO builds upon the Dividing Local Gaussian Processes (DLGP) algorithm, in which a binary tree of local GP regressors is dynamically constructed using a continual stream of input data. In GPTreeO we ext...
Regression modelling often presents a trade-off between predictiveness and interpretability. Highly predictive and popular tree-based algorithms such as Random Forest and boosted trees predict the outcome of new observations very well, but the effect of the predictors on the result is hard to interpret. Highly interpretable algorithms like linear e...
We propose a framework for fitting multivariable fractional polynomial models as special cases of Bayesian generalized nonlinear models, applying an adapted version of the genetically modified mode jumping Markov chain Monte Carlo algorithm. The universality of the Bayesian generalized nonlinear models allows us to employ a Bayesian version of frac...
We propose a framework for fitting fractional polynomials models as special cases of Bayesian Generalized Nonlinear Models, applying an adapted version of the Genetically Modified Mode Jumping Markov Chain Monte Carlo algorithm. The universality of the Bayesian Generalized Nonlinear Models allows us to employ a Bayesian version of the fractional po...
Background
In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data that have lar...
Machine learning can make a strong contribution to accelerating the discovery of transition metal complexes (TMC). These compounds will play a key role in the development of new technologies for which there is an urgent need, including the production of green hydrogen from renewable sources. Despite the recent developments in machine learning for d...
Background
Biomarker-treatment interactions are commonly investigated in randomized clinical trials (RCT) to improve precision medicine. The hierarchical interaction constraint states that an interaction should only be in a model if its main effects are also in the model. However, this constraint is not guaranteed in the standard...
Lithium-ion batteries are a prominent technology for the electrification of the transport sector, which itself is a key measure towards the departure from fossil fuels. The “green shift” is taking place in the marine industry too, where the number of battery-powered vessels is growing fast. In this case, monitoring the battery State of Health is...
The quantitative adverse outcome pathway network (qAOPN) is gaining momentum due to its predictive nature, alignment with quantitative risk assessment, and great potential as a computational new approach methodology (NAM) to reduce laboratory animal tests. The present work aimed to demonstrate two advanced modeling approaches, piecewise structural equat...
A characteristic feature of time-to-event data analysis is possible censoring of the event time. Most of the statistical learning methods for handling censored data are limited by the assumption of independent censoring, even if this can lead to biased predictions when the assumption does not hold. This paper introduces Clayton-boost, a boosting ap...
Longevity and safety of lithium-ion batteries are facilitated by efficient monitoring and adjustment of the battery operating conditions. Hence, it is crucial to implement fast and accurate algorithms for State of Health (SoH) monitoring on the Battery Management System. The task is challenging due to the complexity and multitude of the factors con...
In this work, we suggest a framework to fit fractional polynomials based on the Bayesian Generalized Nonlinear Models (BGNLM, Hubin et al., 2021). A version of the Genetically Modified Mode Jumping Markov Chain Monte Carlo (GMJMCMC) algorithm (Hubin et al., 2020) is adopted. Preliminary simulation runs show promising results in terms of identifying t...
We propose a boosting model for the analysis of censored data with a dependent censoring scheme, based on the accelerated failure time model and the Clayton copula. Both in the motivating example, related to aeroplane landing, and in a classic biomedical dataset, our proposed approach provides excellent results.
Url: https://www.iwsm2022.com/wp-c...
Background: Biomarker-treatment interactions are commonly investigated in randomized clinical trials (RCT) to improve precision medicine. The hierarchical interaction constraint states that an interaction should only be in a model if its main effects are also in the model. However, this constraint is not guaranteed in the differen...
In this paper we propose a boosting algorithm to extend the applicability of a first hitting time model to high-dimensional frameworks. Based on an underlying stochastic process, first hitting time models do not require the proportional hazards assumption, hardly verifiable in the high-dimensional context, and represent a valid parametric alternati...
The presence of snow and ice on runway surfaces reduces the available tire-pavement friction needed for retardation and directional control and causes potential economic and safety threats for the aviation industry during the winter seasons. To activate appropriate safety procedures, pilots need accurate and timely information on the actual runway...
Publication bias and p-hacking are two well-known phenomena that strongly affect the scientific literature and cause severe problems in meta-analyses. Due to these phenomena, the assumptions of meta-analyses are seriously violated and the results of the studies cannot be trusted. While publication bias is very often captured well by the weighting f...
Across the field of education research there has been an increased focus on the development, critique, and evaluation of statistical methods and data usage due to recently created, very large datasets and machine learning techniques. In physics education research (PER), this increased focus has recently been shown through the 2019 Physical Review P...
Published version available: https://doi.org/10.1016/j.coldregions.2022.103556 - The presence of snow and ice on runway surfaces reduces the available tire-pavement friction needed for retardation and directional control and causes potential economic and safety threats for the aviation industry during the winter seasons. To activate appropriate sa...
Across the field of education research there has been an increased focus on the development, critique, and evaluation of statistical methods and data usage due to recently created, very large data sets and machine learning techniques. In physics education research (PER), this increased focus has recently been shown through the 2019 Physical Review...
The quantitative adverse outcome pathway (qAOP) is gaining momentum due to its predictive nature and alignment with quantitative risk assessment. A wide range of modeling approaches can potentially assist the construction of qAOPs. Among these, piecewise structural equation modeling (PSEM) is considered highly suitable for qAOP network construction. Th...
An adverse outcome pathway (AOP) network has been developed to describe the adverse effect of UV-B radiation (AOP #327−330). This tentative AOP, which is the first AOP for a non-chemical stressor, is a complex network linking a molecular initiating event (MIE: cellular ROS formation) to an adverse outcome (AO: reduced survival of a crustacean), thr...
Longevity and safety of Lithium-ion batteries are facilitated by efficient monitoring and adjustment of the battery operating conditions: hence, it is crucial to implement fast and accurate algorithms for State of Health (SoH) monitoring on the Battery Management System. The task is challenging due to the complexity and multitude of the factors con...
Due to the high number of chemicals and species, it is not feasible to assess the risk of every chemical to humans and ecosystems. Cost-effective alternative ecotoxicity testing strategies with reduced needs for laboratory animal use are in high demand. New Approach Methodologies (NAMs), such as high-throughput screening and high-content toxicogeno...
The time it takes a student to graduate with a university degree is influenced by a variety of factors such as their background, the academic performance at university, and their integration into the social communities of the university they attend. Different universities have different populations, student services, instruction styles, and degree p...
Statistical models are often fitted to obtain a concise description of the association of an outcome variable with some covariates. Even if background knowledge is available to guide preselection of covariates, stepwise variable selection is commonly applied to remove irrelevant ones. This practice may introduce additional variability and selection...
Background:
The standard lasso penalty and its extensions are commonly used to develop a regularized regression model while selecting candidate predictor variables on a time-to-event outcome in high-dimensional data. However, these selection methods focus on a homogeneous set of variables and do not take into account the case of predictors belongi...
Homogeneous catalysis using transition metal complexes is ubiquitously used for organic synthesis, as well as technologically relevant in applications such as water splitting and CO2 reduction. The key steps underlying homogeneous catalysis require a specific combination of electronic and steric effects from the ligands bound to the metal center. F...
U-statistics enjoy good properties such as asymptotic normality, unbiasedness and minimal variance among unbiased estimators. The estimation of their variance is often of interest, for instance to derive asymptotic tests. It is well-known that an unbiased estimator of the variance of a U-statistic can be formulated explicitly as a U-statistic itsel...
Publication bias and p-hacking are two well-known phenomena which strongly affect the scientific literature and cause severe problems in meta-analysis studies. Due to these phenomena, the assumptions are seriously violated and the results of the meta-analysis studies cannot be trusted. While publication bias is almost perfectly captured by the mode...
Data integration, i.e. the use of different sources of information for data analysis, is becoming one of the most important topics in modern statistics. Especially in, but not limited to, biomedical applications, a relevant issue is the combination of low-dimensional (e.g. clinical data) and high-dimensional (e.g. molecular data such as gene expres...
Penalized regression methods, such as ridge regression, heavily rely on the choice of a tuning, or penalty, parameter, which is often computed via cross-validation. Discrepancies in the value of the penalty parameter may lead to substantial differences in regression coefficient estimates and predictions. In this paper, we investigate the effect of...
Background:
Omics data can be very informative in survival analysis and may improve the prognostic ability of classical models based on clinical risk factors for various diseases, for example breast cancer. Recent research has focused on integrating omics and clinical data, yet has often ignored the need for appropriate model building for clinical...
Background
The development of high-throughput genomic technologies has enabled the rapid growth and easier availability of very large genomic datasets. The Cox proportional hazards model is commonly used to estimate the effect of one or more prognostic factors on survival-type endpoints. The met...
In biomedical research, boosting-based regression approaches have gained much attention in the last decade. Their intrinsic variable selection procedure and ability to shrink the estimates of the regression coefficients toward 0 make these techniques appropriate to fit prediction models in the case of high-dimensional data, e.g. gene expressions. T...
Objective:
To establish the diagnostic test accuracy of both two-dimensional (2D) and four-dimensional (4D) transperineal ultrasound, to assess if 4D ultrasound imaging provides additional value in the diagnosis of posterior pelvic floor disorders in women with obstructed defaecation syndrome.
Methods:
In this prospective cohort study, 121 conse...
Objective:
To establish the diagnostic test accuracy of evacuation proctography, magnetic resonance imaging (MRI), transperineal ultrasonography, and endovaginal ultrasonography for detecting posterior pelvic floor disorders (rectocele, enterocele, intussusception, and anismus) in women with obstructed defecation syndrome and secondarily to identi...
If a number of candidate variables are available, variable selection is a key task aiming to identify those candidates which influence the outcome of interest. Methods such as backward elimination, forward selection, etc. are often implemented, despite their drawbacks. One of these drawbacks is the instability of their results with respect to small pert...
Influential points can cause severe problems when deriving a multivariable regression model. A novel approach to check for such points is proposed, based on the variable inclusion matrix, a simple way to summarize results from resampling-based variable selection procedures. These procedures rely on the variable inclusion matrix, which reports wheth...
We review some strategies proposed in the literature to combine clinical and omics data in a prediction model. We show how these strategies can be performed by using two well-known statistical methods, lasso and boosting, through an application to a biomedical study with a time-to-event outcome.
As modern biotechnologies advance, it has become increasingly frequent that different modalities of high-dimensional molecular data (termed “omics” data in this paper), such as gene expression, methylation, and copy number, are collected from the same patient cohort to predict the clinical outcome. While prediction based on omics data has been wide...
The table displays the parameters of the additional simulation settings. See the results in Subsection 3.2.2.
Despite the limitations imposed by the proportional hazards assumption, the Cox model is probably the most popular statistical tool used to analyze survival data, thanks to its flexibility and ease of interpretation. For this reason, novel statistical/machine learning techniques are usually adapted to fit its requirements, including boosting. Boost...
In biomedical research, boosting-based regression approaches have gained much attention in the last decade. Their intrinsic variable selection procedure and their ability to shrink the estimates of the regression coefficients toward 0 make these techniques appropriate to fit prediction models in the case of high-dimensional data, e.g. gene expressi...
In the exponential families framework, we provide a mixing distribution which assures the equivalence between the conditional and the random-effects likelihoods, two widely used tools to make inference on a parameter of interest in the case of many nuisance parameters.
In recent years, increasing attention has been devoted to the problem of the stability of multivariable regression models, understood as the resistance of the model to small changes in the data on which it has been fitted. Resampling techniques, mainly based on the bootstrap, have been developed to address this issue. In particular, the approaches...
Background
In recent years, the importance of independent validation of the prediction ability of a new gene signature has been widely recognized. Recently, with the development of gene signatures which integrate rather than replace the clinical predictors in the prediction rule, the focus has moved to the validation of the added predictive...
In biomedical literature, numerous prediction models for clinical outcomes have been developed based either on clinical data or, more recently, on high-throughput molecular data (omics data). Prediction models based on both types of data, however, are less common, although some recent studies suggest that a suitable combination of clinical and mole...
We revisit resampling procedures for error estimation in binary classification in terms of U-statistics. In particular, we exploit the fact that the error rate estimator involving all learning-testing splits is a U-statistic. Thus, it has minimal variance among all unbiased estimators and is asymptotically normally distributed. Moreover, there is a...
Cluster analysis is a crucial tool in several biological and medical studies dealing with microarray data. Such studies pose challenging statistical problems due to dimensionality issues, since the number of variables can be much higher than the number of observations.
Here, we present a general framework to deal with the clustering of microarray d...