Article · PDF Available

Abstract

Gradient boosting constructs additive regression models by sequentially fitting a simple parameterized function (base learner) to current “pseudo”-residuals by least squares at each iteration. The pseudo-residuals are the gradient of the loss functional being minimized, with respect to the model values at each training data point evaluated at the current step. It is shown that both the approximation accuracy and execution speed of gradient boosting can be substantially improved by incorporating randomization into the procedure. Specifically, at each iteration a subsample of the training data is drawn at random (without replacement) from the full training data set. This randomly selected subsample is then used in place of the full sample to fit the base learner and compute the model update for the current iteration. This randomized approach also increases robustness against overcapacity of the base learner.
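As a reading aid, the following is a minimal sketch of the procedure described above, assuming squared-error loss (so the pseudo-residuals reduce to ordinary residuals) and scikit-learn regression trees as the base learner; it is not the paper's implementation, and the data and parameter values are illustrative.

```python
# A minimal sketch of stochastic gradient boosting: at each iteration a subsample
# drawn without replacement is used to fit a small tree to the current residuals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def stochastic_gradient_boost(X, y, n_iter=200, learn_rate=0.1,
                              sample_frac=0.5, max_depth=3, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    intercept = y.mean()
    F = np.full(n, intercept)                 # current model values at the training points
    trees = []
    for _ in range(n_iter):
        idx = rng.choice(n, size=int(sample_frac * n), replace=False)  # subsample w/o replacement
        residual = y[idx] - F[idx]            # pseudo-residuals under squared-error loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X[idx], residual)
        F += learn_rate * tree.predict(X)     # update the model at all training points
        trees.append(tree)
    return intercept, trees

def predict(intercept, trees, X, learn_rate=0.1):
    # use the same learn_rate that was used during fitting
    return intercept + learn_rate * sum(t.predict(X) for t in trees)

# toy usage on synthetic data
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(1000, 4))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=1000)
intercept, trees = stochastic_gradient_boost(X, y)
```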
[Figures: relative absolute error, error / min(error), plotted against the fraction of training data randomly sampled (panels for N = 500 and N = 5000) and against the number of terminal nodes (panels for f = 1.0 and f = 0.5), together with the improvement ratio error(f = 1.0) / error(f = 0.5).]
... Extreme Gradient Boosting (XGB) algorithm is a gradient-boosted decision tree model that reduces errors by creating successive, complementary trees (Friedman 2002). Each iteration builds on previous attempts to minimize loss using a collection of weaker models, improving accuracy and controlling overfitting through regularization (Natekin and Knoll 2013). ...
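As an illustration of that description, the sketch below shows how the xgboost library exposes the successive-tree construction, row subsampling, and regularization knobs; the dataset and parameter values are illustrative assumptions, not taken from the cited study.

```python
# Illustrative only: XGBoost builds successive trees, each fit on a random
# subsample of rows, with regularization to control overfitting.
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=1000)

model = XGBRegressor(
    n_estimators=500,      # number of successive, complementary trees
    learning_rate=0.05,    # shrinkage applied to each tree's contribution
    subsample=0.5,         # row subsampling per iteration (Friedman 2002)
    reg_lambda=1.0,        # L2 regularization on leaf weights
    max_depth=4,
)
model.fit(X, y)
```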
Article
Poisson's ratio is a critical parameter in geomechanics and reservoir characterization; it is typically obtained either by conducting laboratory experiments directly on cores or by indirect calculation from transit-time logs. However, measuring it is challenging because of core recovery problems and low-quality signal registration, and these challenges can lead to significant data gaps and inaccuracies that hinder effective reservoir management. The present study aims to develop an advanced intelligent model for real-time prediction of the dynamic Poisson's ratio. It applies machine learning techniques to drilling data from a horizontal wellbore in the Cambro-Ordovician tight fractured reservoir (southeast Algeria). The study evaluates three common machine learning algorithms, random forest (RF), extreme gradient boosting (XGB), and gradient boosting (GB), on the drilling data. The XGB model exhibited the highest R-squared value and lowest errors compared with RF and GB in validation, reaching R² = 0.856, RMSE = 0.025, and MAE = 0.019. The developed models enable successful estimation of Poisson's ratio and could assist petroleum engineers in overcoming core recovery and transit-time problems, which often result in considerable financial expenditure. Continuous monitoring and practical use of drilling variables can therefore provide a real-time, effective understanding of formation characteristics, supporting earlier decision-making.
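For orientation only, here is a minimal sketch of such a three-model comparison, assuming scikit-learn and xgboost and using purely synthetic stand-ins for the drilling features; it is not the authors' workflow and will not reproduce their scores.

```python
# Compare RF, GB, and XGB regressors with R2, RMSE, and MAE on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 6))   # stand-ins for drilling variables (hypothetical)
y = 0.25 + 0.05 * np.tanh(X[:, 0]) + 0.02 * X[:, 1] + rng.normal(scale=0.01, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "RF": RandomForestRegressor(n_estimators=300, random_state=0),
    "GB": GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, random_state=0),
    "XGB": XGBRegressor(n_estimators=300, learning_rate=0.05, subsample=0.8, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    print(f"{name}: R2={r2_score(y_te, pred):.3f} "
          f"RMSE={rmse:.3f} MAE={mean_absolute_error(y_te, pred):.3f}")
```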
... Stochastic gradient boosting is an ensemble technique developed by Friedman (Friedman 2002). It makes a minor but effective modification to the gradient boosting algorithm, which constructs an additive model by fitting a base learner sequentially: at each iteration the base learner is fit to a random subsample of the training data. ...
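A minimal sketch of that modification, assuming scikit-learn: setting subsample below 1.0 in GradientBoostingRegressor makes each base tree fit on a random fraction of the rows, drawn without replacement, which is exactly the stochastic variant.

```python
# Plain gradient boosting becomes stochastic gradient boosting via subsample < 1.0.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(500, 5))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

sgb = GradientBoostingRegressor(
    n_estimators=200,
    learning_rate=0.1,
    subsample=0.5,   # draw 50% of the training rows (without replacement) per iteration
    max_depth=3,
    random_state=0,
).fit(X, y)
```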
... The Gradient Boosted Regression (GBR) model is an ensemble machine learning algorithm composed of weak regression trees. 3,4 Given training samples D = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, with each regression tree having J leaf nodes, the input data is divided into J disjoint regions, and the m-th regression tree is denoted t_m(x). The goal of GBR training is to minimize the loss function L, with the parameters θ_m determined by empirical risk minimization: ...
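The excerpt breaks off before the objective; under the standard gradient-boosted regression formulation (a reconstruction, not a quotation from the cited article), the parameters of the m-th tree solve

\theta_m = \arg\min_{\theta} \sum_{i=1}^{n} L\big(y_i,\; F_{m-1}(x_i) + t_m(x_i; \theta)\big), \qquad F_m(x) = F_{m-1}(x) + t_m(x; \theta_m),

where F_{m-1} denotes the additive model after m − 1 trees.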
Article
The pursuit of two-dimensional single-atom catalysts (SACs) holds far-reaching significance for advancing energy conversion and storage technologies by providing efficient, stable, and low-cost alternatives to precious metals for hydrogen evolution...
... Still, our study showed that boosting and bagging methods also have equal or even greater capability for higher accuracy than neural networks and any other composite version of the ML algorithm. We used nine different ML models: multivariate adaptive regression splines (MARS) (Friedman, 1991), k-nearest neighbors (KNN) (Fix & Hodges, 1951), support vector machine (SVM) (Cortes & Vapnik, 1995), adaptive boosting (ADABOOST) (Freund & Schapire, 1997), extreme gradient boosting (XGBTREE) (Chen & Guestrin, 2016), stochastic gradient boosting machine (SGBM) (Friedman, 2002a, 2002b), random forest (RF) (Breiman, 2001a, 2001b), bagged CART (TREEBAG) (Breiman, 1996), and averaged neural network (Av_NNET) (Kozyrev, 2012). Seven models can be classified into their family methods: ADABOOST, XGBTREE, and SGBM are boosting methods, RF and TREEBAG are bagging methods, and Av_NNET belongs to the family of neural network methods. ...
Article
Full-text available
Univariate neuroimaging studies have shown brain differences in individuals with autism spectrum disorder (ASD) compared to healthy controls (CTL). In contrast, machine learning (ML) combined with neuroimaging provides a framework for building ASD diagnostic models whose predictive accuracy is assessed with cross-validation. Three types of ML methods spanning nine ML algorithms were investigated, i.e., boosting, bagging, and neural networks, to identify the best algorithm for classifying ASD from CTL using structural magnetic resonance imaging (MRI) data (N = 740, 344 ASD) from the Autism Brain Imaging Data Exchange (ABIDE) repository. The study assessed model efficiency with receiver operating characteristic (ROC) curves during the training phase, and balanced accuracy in the testing phase was used to compare the algorithms. Findings showed that the Stochastic Gradient Boosting Machine (SGBM), with a balanced accuracy of 78.87%, was the best algorithm for classifying ASD from CTL, followed by random forest (RF) and averaged neural networks (Av_NNET). Top brain features include the right Heschl gyrus, left median cingulate and paracingulate gyri, left inferior occipital gyrus, right supramarginal gyrus, and left posterior cingulate gyrus. The findings suggest that ensemble ML algorithms with multi-modal brain features will improve the accuracy of diagnostic models.
Article
Water volume, a fundamental characteristic of lakes, serves as a crucial indicator for understanding regional climate, ecological systems, and hydrological processes. However, limitations in existing estimation methods and datasets for water depth, such as insufficient observation of small and medium-sized lakes and unclear temporal information, have hindered a comprehensive understanding of global lake water volumes. To address these challenges, this study develops a machine learning (ML)-based approach to estimate the dynamic water depths of global lakes. By incorporating various lake features and employing multiple innovative water depth extraction methods, we generated an extensive water depth dataset to train the model. Validation results demonstrate the model’s high accuracy, with a bias of −0.08 m, an MAE of 1.09 m, an RMSE of 4.78 m, and an R² of 0.95. The proposed method provides dynamic monthly estimates of global lake water depths and volumes from 2000 to 2020. This study offers a cost-effective and efficient solution for estimating global lake water dynamics, providing reliable data to support the monitoring, analysis, and management of regional and global lake systems.
Article
Fault localization is the task of identifying faulty program elements. Among the large number of fault localization approaches in the literature, coverage-based fault localization, especially spectrum-based fault localization (SBFL), has been intensively studied due to its effectiveness and light weight. Despite the rich literature, almost all existing fault localization approaches and studies target imperative programming languages such as Java and C, leaving a gap in other programming paradigms. In this paper, we aim to study fault localization approaches for the functional programming paradigm, using the Haskell language as a representative. To the best of our knowledge, we build the first dataset of real Haskell projects, including both real and seeded faults. The dataset enables research on fault localization for functional languages. With it, we explore fault localization techniques for Haskell. In particular, as is typical for SBFL approaches, we study methods for coverage collection and formulae for suspiciousness score computation, and we carefully adapt these two components to Haskell, taking into account the language's features and characteristics, resulting in a series of adaptation approaches. Moreover, we design a learning-based approach and a transfer-learning-based approach to take advantage of data from imperative languages. Both approaches are evaluated on our dataset to demonstrate the promise of this direction.
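To make the suspiciousness-score component concrete, here is a small sketch (not taken from the paper) of one widely used SBFL formula, Ochiai, applied to toy coverage data; the data structures are illustrative assumptions.

```python
# Spectrum-based fault localization: rank program elements by Ochiai suspiciousness,
# computed from per-test coverage and pass/fail outcomes.
from math import sqrt

def ochiai(coverage, failed):
    """coverage: dict element -> set of test ids covering it; failed: set of failing test ids."""
    total_failed = len(failed)
    scores = {}
    for element, tests in coverage.items():
        ef = len(tests & failed)       # failing tests that cover the element
        ep = len(tests) - ef           # passing tests that cover the element
        denom = sqrt(total_failed * (ef + ep))
        scores[element] = ef / denom if denom else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# toy example: element "f" is covered by both failing tests and ranks first
print(ochiai({"f": {1, 2}, "g": {2, 3}, "h": {3}}, failed={1, 2}))
```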
Article
Full-text available
Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data and then taking a weighted majority vote of the sequence of classifiers thus produced. For many classification algorithms, this simple strategy results in dramatic improvements in performance. We show that this seemingly mysterious phenomenon can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood. For the two-class problem, boosting can be viewed as an approximation to additive modeling on the logistic scale using maximum Bernoulli likelihood as a criterion. We develop more direct approximations and show that they exhibit nearly identical results to boosting. Direct multiclass generalizations based on multinomial likelihood are derived that exhibit performance comparable to other recently proposed multiclass generalizations of boosting in most situations, and far superior in some. We suggest a minor modification to boosting that can reduce computation, often by factors of 10 to 50. Finally, we apply these insights to produce an alternative formulation of boosting decision trees. This approach, based on best-first truncated tree induction, often leads to better performance, and can provide interpretable descriptions of the aggregate decision rule. It is also much faster computationally, making it more suitable to large-scale data mining applications.
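The core of that connection, stated in standard notation (a restatement, not a quotation from the abstract): for the two-class problem with y ∈ {−1, 1}, boosting builds an additive model F(x) = Σ_m f_m(x), and minimizing the exponential criterion ties F to the class probability via

F(x) = \tfrac{1}{2}\log\frac{p(y = 1 \mid x)}{p(y = -1 \mid x)}, \qquad p(y = 1 \mid x) = \frac{e^{F(x)}}{e^{F(x)} + e^{-F(x)}},

so fitting the f_m amounts to additive modeling on the logistic scale.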
Article
Full-text available
Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient-descent "boosting" paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least-absolute-deviation, and Huber-M loss functions for regression, and multi-class logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are decision trees, and tools for interpreting such "TreeBoost" models are presented. Gradient boosting of decision trees produces competitive, highly robust, interpretable procedures for regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire 1996, and Fr...
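The steepest-descent connection can be summarized in standard notation (a summary, not a quotation): at iteration m the "pseudo"-residuals are the negative gradient of the loss with respect to the current model values,

\tilde{y}_{im} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F = F_{m-1}},

the base learner h(x; a_m) is fit to {\tilde{y}_{im}} by least squares, and the model is updated as F_m(x) = F_{m-1}(x) + \rho_m h(x; a_m), with the step size ρ_m chosen by line search.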
Article
Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent "boosting" paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such "TreeBoost" models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire and of Friedman, Hastie and Tibshirani are discussed.
Article
Breiman (1996) showed that bagging could effectively reduce the variance of regression predictors, while leaving the bias unchanged. A new form of bagging we call adaptive bagging is effective in reducing both bias and variance. The procedure works in stages: the first stage is bagging. Based on the outcomes of the first stage, the output values are altered and a second stage of bagging is carried out using the altered output values. This is repeated until a specified noise level is reached. We give the background theory, and test the method using both trees and nearest neighbor regression methods. Application to two-class classification data gives some interesting results.
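A rough sketch of that staged idea, under a simplified reading (not Breiman's exact procedure): after each bagging stage, the output values are replaced by the residuals of the current staged prediction, and bagging is run again on the altered outputs; a fixed number of stages stands in for the paper's noise-level stopping rule.

```python
# Staged "adaptive bagging" sketch: bag, alter outputs to residuals, bag again.
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(400, 3))
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + rng.normal(scale=0.2, size=400)

stages, residual = [], y.copy()
for _ in range(3):                               # simplified stopping rule
    stage = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50,
                             random_state=0).fit(X, residual)
    stages.append(stage)
    residual = residual - stage.predict(X)       # altered output values for the next stage

prediction = sum(stage.predict(X) for stage in stages)
```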
Article
Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy. The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy.
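A minimal sketch of the procedure described above, assuming scikit-learn: bootstrap replicates of the learning set are formed, an unstable predictor (here a regression tree) is fit to each, and the predictions are averaged.

```python
# Bagging: average many trees, each trained on a bootstrap replicate of the data.
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 2))
y = X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=300)

bag = BaggingRegressor(
    DecisionTreeRegressor(),   # unstable base predictor, where bagging helps most
    n_estimators=100,          # number of bootstrap replicates
    bootstrap=True,            # sample the learning set with replacement
    random_state=0,
).fit(X, y)
```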