Chapter

Gradient Boosting with Neural Networks


Abstract

Gradient boosting machines form a family of powerful machine learning techniques that have been applied successfully in a wide range of practical applications. Many common ensemble techniques rely on simple averaging of the models in the ensemble. The family of boosting methods adopts a different strategy: new models are added to the ensemble sequentially. At each iteration, a new weak base-learner is trained with respect to the error of the whole ensemble built so far.
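The sequential strategy described in the abstract can be illustrated with a minimal sketch. It assumes squared-error loss and uses one-split regression stumps as the weak base-learner purely for brevity; it is not the chapter's method, which uses neural networks as base-learners. The function names (`fit_stump`, `predict_stump`, `boost`), the toy data, and the learning rate are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_stump(x, residual):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    # Exclude the largest value so the right-hand side is never empty.
    for threshold in np.unique(x)[:-1]:
        left = x <= threshold
        left_value = residual[left].mean()
        right_value = residual[~left].mean()
        sse = ((residual[left] - left_value) ** 2).sum() + (
            (residual[~left] - right_value) ** 2
        ).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)


def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Add stumps one at a time, each fitted to the current ensemble error."""
    baseline = y.mean()
    prediction = np.full_like(y, baseline, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the error signal (negative gradient) is the residual.
        residual = y - prediction
        stump = fit_stump(x, residual)
        prediction += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return baseline, stumps, prediction


# Toy data (an assumption for illustration): a noisy sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 6.0, 200)
y = np.sin(x) + 0.1 * rng.normal(size=x.size)

baseline, stumps, fitted = boost(x, y)
mse_start = np.mean((y - baseline) ** 2)  # error of the constant model
mse_end = np.mean((y - fitted) ** 2)      # error after boosting
print(mse_start, mse_end)
```

Each round fits the new stump to what the current ensemble still gets wrong, so the training error cannot increase from one round to the next. A different base-learner could be substituted by replacing `fit_stump` and `predict_stump`.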


Article
Photovoltaic (PV) systems are indispensable in the renewable energy industry as they convert sunlight into electricity. Accurate determination of important factors such as illuminance and Ultraviolet (UV) irradiation is essential for optimizing the effectiveness and maintenance of these systems. The objective of this work is to evaluate the predictive performance of several Machine Learning (ML) models in estimating the amounts of light and UV radiation in PV systems, by comparing and contrasting their effectiveness. The models that were assessed include Support Vector Classification (SVC), Linear Regression (LR), eXtreme Gradient Boosting (XGBoost), Gradient Boosting (GB), Random Forest (RF), and CatBoost. The study employed a comprehensive dataset that encompassed measurements for temperature, humidity, UV, voltage, current, and illuminance. The data was preprocessed to remove invalid values and align indices. Afterwards, it was divided into separate training and testing sets. The main metrics used to train and evaluate each model were Root Mean Squared Error (RMSE) and the Coefficient of Determination (R²). The findings suggest that the Categorical Boosting (CatBoost) and RF models perform better than the other models. This is evidenced by their ability to obtain the lowest RMSE and highest R² values for both illuminance and UV forecasts. More precisely, the CatBoost algorithm obtained an RMSE of 16.088 and an R² of 0.999 for illuminance. Additionally, it achieved an RMSE of 0.228 and an R² of 0.990 for UV. However, LR and SVC had notably inferior results. The results offered valuable perspectives for enhancing decision-making procedures.
Article
The main purpose of the study is to create a mathematical model for the analysis and prediction of technological parameters of the electron-beam welding process using modern regression models, and to implement it as a software system in the Python programming language using the Scikit-learn, Pandas, NumPy and Matplotlib packages. Predicting the parameters of the electron-beam welding process is a regression problem, and many algorithms are available for solving it. This work uses polynomial regression with L2 regularization (ridge regression) and an ensemble of decision trees (a random forest). The developed predictive model will allow the technologist to make better-informed choices of the range of variable parameters to investigate in new technological modes, and to improve quality in modes already developed. The proposed methods will also reduce the time and labor costs of searching for, developing, and adjusting the technological process. The paper describes the ridge regression algorithm, analyzes its applicability to the problem posed, and checks the reliability of the forecasts obtained by using them directly. The training of the model is also considered, based on data obtained in experiments on the development of the electron-beam welding process. An analysis of the applicability of the approach showed that the proposed method may be used for technological processes with similar statistical dependences. Implementing the proposed approach to predicting the parameters of electron-beam welding in production will support technological decision-making when working out the electron-beam welding process and when putting new types of products into production.
Article
Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data and then taking a weighted majority vote of the sequence of classifiers thus produced. For many classification algorithms, this simple strategy results in dramatic improvements in performance. We show that this seemingly mysterious phenomenon can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood. For the two-class problem, boosting can be viewed as an approximation to additive modeling on the logistic scale using maximum Bernoulli likelihood as a criterion. We develop more direct approximations and show that they exhibit nearly identical results to boosting. Direct multiclass generalizations based on multinomial likelihood are derived that exhibit performance comparable to other recently proposed multiclass generalizations of boosting in most situations, and far superior in some. We suggest a minor modification to boosting that can reduce computation, often by factors of 10 to 50. Finally, we apply these insights to produce an alternative formulation of boosting decision trees. This approach, based on best-first truncated tree induction, often leads to better performance, and can provide interpretable descriptions of the aggregate decision rule. It is also much faster computationally, making it more suitable to large-scale data mining applications.
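The reweight-then-vote procedure summarized in this abstract can be sketched as follows. This is a minimal illustration of discrete AdaBoost with threshold stumps on one-dimensional data, assuming labels in {-1, +1}; it is not the additive logistic regression variant the paper develops. The helper names and the toy data set are hypothetical.

```python
import numpy as np


def best_stump(x, y, w):
    """Return (error, threshold, polarity) of the stump with lowest weighted error."""
    best = None
    for threshold in np.unique(x):
        for polarity in (1, -1):
            pred = np.where(x <= threshold, polarity, -polarity)
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(x, y, n_rounds=3):
    """Train stumps on successively reweighted versions of the training data."""
    w = np.full(x.size, 1.0 / x.size)
    ensemble = []
    for _ in range(n_rounds):
        err, threshold, polarity = best_stump(x, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)  # vote weight of this stump
        pred = np.where(x <= threshold, polarity, -polarity)
        # Misclassified points gain weight, correctly classified points lose it.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((alpha, threshold, polarity))
    return ensemble


def predict(ensemble, x):
    """Weighted majority vote of all stumps."""
    score = sum(a * np.where(x <= t, p, -p) for a, t, p in ensemble)
    return np.sign(score)


# Toy data (an assumption): the positive class is an interval,
# so no single threshold stump can classify every point correctly.
x = np.arange(10.0)
y = np.where((x >= 3) & (x <= 6), 1, -1)

ensemble = adaboost(x, y, n_rounds=3)
accuracy = np.mean(predict(ensemble, x) == y)
print(accuracy)
```

On this toy problem the best single stump misclassifies three of the ten points, while the weighted vote of three stumps separates the interval, which is the effect of reweighting the sketch is meant to show.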
Conference Paper
We address the problem of estimating human pose in video sequences, where rough location has been determined. We exploit both appearance and motion information by defining suitable features of an image and its temporal neighbors, and learning a regression map to the parameters of a model of the human body using boosting techniques. Our algorithm can be viewed as a fast initialization step for human body trackers, or as a tracker itself. We extend gradient boosting techniques to learn a multi-dimensional map from (rotated and scaled) Haar features to the entire set of joint angles representing the full body pose. We test our approach by learning a map from image patches to body joint angles from synchronized video and motion capture walking data. We show how our technique enables learning an efficient real-time pose estimator, validated on publicly available datasets.
Article
Two of the major limitations to effective management of coral reef ecosystems are a lack of information on the spatial distribution of marine species and a paucity of data on the interacting environmental variables that drive distributional patterns. Advances in marine remote sensing, together with the novel integration of landscape ecology and advanced niche modelling techniques provide an unprecedented opportunity to reliably model and map marine species distributions across many kilometres of coral reef ecosystems. We developed a multi-scale approach using three-dimensional seafloor morphology and across-shelf location to predict spatial distributions for five common Caribbean fish species. Seascape topography was quantified from high resolution bathymetry at five spatial scales (5-300 m radii) surrounding fish survey sites. Model performance and map accuracy were assessed for two high performing machine-learning algorithms: Boosted Regression Trees (BRT) and Maximum Entropy Species Distribution Modelling (MaxEnt). The three most important predictors were geographical location across the shelf, followed by a measure of topographic complexity. Predictor contribution differed among species, yet rarely changed across spatial scales. BRT provided 'outstanding' model predictions (AUC > 0.9) for three of five fish species. MaxEnt provided 'outstanding' model predictions for two of five species, with the remaining three models considered 'excellent' (AUC = 0.8-0.9). In contrast, MaxEnt spatial predictions were markedly more accurate (92% map accuracy) than BRT (68% map accuracy). We demonstrate that reliable spatial predictions for a range of key fish species can be achieved by modelling the interaction between the geographical location across the shelf and the topographic heterogeneity of seafloor structure. This multi-scale, analytic approach is an important new cost-effective tool to accurately delineate essential fish habitat and support conservation prioritization in marine protected area design, zoning in marine spatial planning, and ecosystem-based fisheries management.
Article
Much recent attention, both experimental and theoretical, has been focussed on classification algorithms which produce voted combinations of classifiers. Recent theoretical work has shown that the impressive generalization performance of algorithms like AdaBoost can be attributed to the classifier having large margins on the training data.
Article
Important ecological phenomena are often observed indirectly. Consequently, probabilistic latent variable models provide an important tool, because they can include explicit models of the ecological phenomenon of interest and the process by which it is observed. However, existing latent variable methods rely on hand-formulated parametric models, which are expensive to design and require extensive preprocessing of the data. Nonparametric methods (such as regression trees) automate these decisions and produce highly accurate models. However, existing tree methods learn direct mappings from inputs to outputs — they cannot be applied to latent variable models. This paper describes a methodology for integrating nonparametric tree methods into probabilistic latent variable models by extending functional gradient boosting. The approach is presented in the context of occupancy-detection (OD) modeling, where the goal is to model the distribution of a species from imperfect detections. Experiments on 12 real and 3 synthetic bird species compare standard and tree-boosted OD models (latent variable models) with standard and tree-boosted logistic regression models (without latent structure). All methods perform similarly when predicting the observed variables, but the OD models learn better representations of the latent process. Most importantly, tree-boosted OD models learn the best latent representations when nonlinearities and interactions are present.
Conference Paper
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
Article
Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent "boosting" paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such "TreeBoost" models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire and Friedman, Hastie and Tibshirani are discussed.
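The link between stagewise additive expansions and steepest descent can be written out for the three regression losses the abstract names. The notation below is an assumption made for illustration and does not appear in the abstract: F_{m-1} is the ensemble after m-1 stages, h(x; a_m) is the base-learner fitted at stage m, ρ_m is its step size, and δ is the Huber transition point.

```latex
% Pseudo-response: negative gradient of the loss at the current ensemble
\tilde{y}_i = -\left[ \frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)} \right]_{F = F_{m-1}}

% Stagewise update: fit h to the pseudo-responses, then take a step
F_m(x) = F_{m-1}(x) + \rho_m \, h(x; a_m)

% Least squares: L = \tfrac{1}{2}(y - F)^2
\tilde{y}_i = y_i - F_{m-1}(x_i)

% Least absolute deviation: L = |y - F|
\tilde{y}_i = \operatorname{sign}\bigl(y_i - F_{m-1}(x_i)\bigr)

% Huber loss with transition point \delta
\tilde{y}_i =
\begin{cases}
  y_i - F_{m-1}(x_i), & |y_i - F_{m-1}(x_i)| \le \delta, \\
  \delta \, \operatorname{sign}\bigl(y_i - F_{m-1}(x_i)\bigr), & \text{otherwise.}
\end{cases}
```

For least squares the pseudo-response is the ordinary residual, so each stage fits what the ensemble still gets wrong. The other two losses bound the pseudo-response, which limits the influence of outliers.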
Article
In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework. The model we study can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting. We show that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems. We show how the resulting learning algorithm can be applied to a variety of problems, including gambling, multiple-outcome prediction, repeated games, and prediction of points in ℝⁿ. In the second part of the paper we apply the multiplicative weight-update technique to derive a new boosting algorithm. This boosting algorithm does not require any prior knowledge about the performance of the weak learning algorithm. We also study generalizations of the new boosting algorithm to the problem of learning functions whose range, rather than being binary, is an arbitrary finite set or a bounded segment of the real line.
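The multiplicative weight-update idea from the first part of this abstract can be sketched in a few lines. The sketch assumes losses in [0, 1] and a fixed update parameter `beta` in (0, 1); the function name `hedge`, the loss matrix, and the parameter value are hypothetical choices for illustration rather than details taken from the paper.

```python
import numpy as np


def hedge(losses, beta=0.5):
    """Allocate resources over options with multiplicative weight updates.

    losses: array of shape (n_rounds, n_options) with entries in [0, 1].
    Returns the learner's cumulative expected loss and the final weights.
    """
    n_rounds, n_options = losses.shape
    weights = np.ones(n_options)
    total_loss = 0.0
    for loss in losses:
        allocation = weights / weights.sum()  # current distribution over options
        total_loss += allocation @ loss       # loss suffered this round
        weights *= beta ** loss               # penalize options in proportion to loss
    return total_loss, weights


# Toy losses (an assumption): option 0 is systematically better than the others.
rng = np.random.default_rng(0)
losses = rng.uniform(0.0, 1.0, size=(20, 3))
losses[:, 0] *= 0.2

total_loss, weights = hedge(losses, beta=0.5)
best_option = int(np.argmax(weights))
print(total_loss, best_option)
```

After the final round each weight equals `beta` raised to that option's cumulative loss, so the option with the smallest cumulative loss holds the largest weight.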