Leo Breiman’s research while affiliated with University of California, Berkeley and other places


Publications (50)


The Π Method for Estimating Multivariate Functions From Noisy Data
  • Article

March 2012 · 95 Reads · 80 Citations · Technometrics · Leo Breiman

The Π method for estimating an underlying smooth function of M variables, (x_1, …, x_M), using noisy data is based on approximating it by a sum of products of the form Π_m φ_m(x_m). The problem is then reduced to estimating the univariate functions in the products. A convergent algorithm is described. The method keeps tight control on the degrees of freedom used in the fit. Many examples are given. The quality of fit given by the Π method is excellent. Usually, only a few products are enough to fit even fairly complicated functions. The coding into products of univariate functions allows a relatively understandable interpretation of the multivariate fit.
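For reference, the form of approximation described in the abstract can be written out explicitly; the number of products K and the per-product functions φ_{k,m} are notation chosen here for illustration, not quoted from the paper:

```latex
f(x_1, \ldots, x_M) \;\approx\; \sum_{k=1}^{K} \prod_{m=1}^{M} \varphi_{k,m}(x_m)
```

Each φ_{k,m} is a smooth univariate function estimated from the noisy responses; as the citing excerpt near the end of this page notes, the products are selected by a stepwise procedure while the total degrees of freedom are kept under control.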


Population theory for boosting ensembles

February 2004 · 75 Reads · 80 Citations · The Annals of Statistics

Tree ensembles are looked at in distribution space, that is, the limit case of "infinite" sample size. It is shown that the simplest kind of trees is complete in D-dimensional L_2(P) space if the number of terminal nodes T is greater than D. For such trees we show that the AdaBoost algorithm gives an ensemble converging to the Bayes risk.
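A compact restatement of the two claims in the abstract (the symbols F_k, R, and R* are introduced here for illustration, not taken from the paper):

```latex
% Completeness claim, in the population (infinite-sample) setting:
T > D \;\Longrightarrow\; \overline{\operatorname{span}}\bigl\{\text{$T$-terminal-node trees on } \mathbb{R}^{D}\bigr\} = L_2(P)
% Convergence claim for the AdaBoost ensemble F_k after k population-level steps:
\lim_{k \to \infty} R(F_k) = R^{*} \quad (\text{the Bayes risk})
```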


Using Random Forest to Learn Imbalanced Data

January 2004 · 8,304 Reads · 1,501 Citations

In this paper we propose two ways to deal with the imbalanced data classification problem using random forest. One is based on cost sensitive learning, and the other is based on a sampling technique. Performance metrics such as precision and recall, false positive rate and false negative rate, F-measure and weighted accuracy are computed. Both methods are shown to improve the prediction accuracy of the minority class, and have favorable performance compared to the existing algorithms.
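The two ideas can be approximated with off-the-shelf tools. The sketch below, using scikit-learn on synthetic data, contrasts a cost-sensitive forest (class weights) with a crude one-shot down-sampling of the majority class; it illustrates the general approach only, not the paper's exact Weighted RF / Balanced RF procedures.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic problem with roughly 5% minority class.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# (1) Cost-sensitive learning: reweight classes inside each bootstrap sample.
rf_weighted = RandomForestClassifier(n_estimators=200,
                                     class_weight="balanced_subsample",
                                     random_state=0).fit(X_tr, y_tr)

# (2) Sampling: down-sample the majority class before fitting.
minority = np.where(y_tr == 1)[0]
majority = np.random.default_rng(0).choice(np.where(y_tr == 0)[0],
                                           size=len(minority), replace=False)
idx = np.concatenate([minority, majority])
rf_sampled = RandomForestClassifier(n_estimators=200,
                                    random_state=0).fit(X_tr[idx], y_tr[idx])

for name, model in [("weighted", rf_weighted), ("down-sampled", rf_sampled)]:
    pred = model.predict(X_te)
    print(name, "minority recall:", recall_score(y_te, pred),
          "F1:", f1_score(y_te, pred))
```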


Two-Eyed Algorithms and Problems

September 2003 · 79 Reads · 1 Citation · Lecture Notes in Computer Science

Two-eyed algorithms are complex prediction algorithms that give accurate predictions and also give important insights into the structure of the data the algorithm is processing. The main example I discuss is RF/tools, a collection of algorithms for classification, regression and multiple dependent outputs. The last algorithm is a preliminary version and further progress depends on solving some fascinating questions of the characterization of dependency between variables. An important and intriguing aspect of the classification version of RF/tools is that it can be used to analyze unsupervised data, that is, data without class labels. This conversion leads to such by-products as clustering, outlier detection, and replacement of missing data for unsupervised data. The talk will present numerous results on real data sets. The code (f77) and ample documentation for RFtools are available on the web site www.stat.berkeley.edu/RFtools.
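The unsupervised conversion mentioned above is usually described as follows: label the real observations as one class, generate a synthetic second class by scrambling each feature independently (keeping the marginals but destroying dependence), and train a forest to separate the two; the resulting tree proximities then support clustering and outlier detection. A minimal sketch of that idea, with scikit-learn standing in for the original f77 RF/tools code:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def unsupervised_rf_proximity(X, n_trees=200, seed=0):
    """Proximity matrix from the 'real vs. synthetic' random-forest trick."""
    rng = np.random.default_rng(seed)
    # Synthetic class: permute each column independently, destroying dependence
    # between variables while preserving each marginal distribution.
    X_synth = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
    X_all = np.vstack([X, X_synth])
    y_all = np.concatenate([np.ones(len(X)), np.zeros(len(X_synth))])

    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed).fit(X_all, y_all)

    # Proximity of two real points = fraction of trees in which they share a leaf.
    leaves = rf.apply(X)                      # shape (n_samples, n_trees)
    prox = np.zeros((len(X), len(X)))
    for t in range(leaves.shape[1]):
        prox += leaves[:, t][:, None] == leaves[:, t][None, :]
    return prox / leaves.shape[1]

# Outliers are points with uniformly low proximity to everything else;
# 1 - prox can be fed to any distance-based clustering routine.
```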


Using Iterated Bagging to Debias Regressions

December 2001 · 218 Reads · 325 Citations · Machine Learning

Breiman (Machine Learning, 26(2), 123–140) showed that bagging could effectively reduce the variance of regression predictors, while leaving the bias relatively unchanged. A new form of bagging we call iterated bagging is effective in reducing both bias and variance. The procedure works in stages: the first stage is bagging. Based on the outcomes of the first stage, the output values are altered, and a second stage of bagging is carried out using the altered output values. This is repeated until a simple rule stops the process. The method is tested using both trees and nearest neighbor regression methods. Accuracy on the Boston Housing data benchmark is comparable to the best of the results obtained using highly tuned and compute-intensive Support Vector Regression Machines. Some heuristic theory is given to clarify what is going on. Application to two-class classification data gives interesting results.
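A rough sketch of the staged procedure, using scikit-learn's BaggingRegressor and its out-of-bag predictions to form the altered outputs; the paper's stopping rule and other details are simplified here to a fixed number of stages.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

def iterated_bagging(X, y, n_stages=3, n_estimators=100, seed=0):
    """Successive bagged tree ensembles; each stage is trained on altered outputs
    formed from the previous stage's out-of-bag residuals."""
    stages, target = [], np.asarray(y, dtype=float).copy()
    for s in range(n_stages):
        bag = BaggingRegressor(n_estimators=n_estimators, oob_score=True,
                               random_state=seed + s).fit(X, target)
        stages.append(bag)
        # Out-of-bag predictions give nearly unbiased residuals for the next stage.
        target = target - bag.oob_prediction_
    return stages

def iterated_bagging_predict(stages, X):
    # The final prediction is the sum of the stage-wise predictions.
    return np.sum([bag.predict(X) for bag in stages], axis=0)
```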


Random Forests: Finding Quasars

October 2001 · 81 Reads · 57 Citations

In this paper, we discuss an example in which we classify objects as quasars or non-quasars using the combined results of a radio survey and an optical survey. Such classification helps guide the choice of which objects to follow up with relatively expensive spectroscopic measurements.


Machine Learning, Volume 45, Number 1

October 2001 · 5,346 Reads · 67,003 Citations · Machine Learning

Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International Conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
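The ingredients listed in the abstract (random feature selection at each split, internal out-of-bag estimates of error, variable importance) map directly onto modern implementations. A small illustration with scikit-learn; the parameter names are scikit-learn's, not the paper's notation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

rf = RandomForestClassifier(
    n_estimators=500,
    max_features="sqrt",   # random subset of features tried at each split
    oob_score=True,        # internal (out-of-bag) estimate of generalization error
    random_state=0,
).fit(X, y)

print("OOB accuracy estimate:", rf.oob_score_)
# Impurity-based variable importance, one value per feature.
print("top features:", rf.feature_importances_.argsort()[::-1][:5])
```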


Randomizing Outputs To Increase Prediction Accuracy

September 2001 · 119 Reads · 306 Citations · Machine Learning

In recent research on combining predictors, it has been recognized that the key to success in combining low-bias predictors such as trees and neural nets lies in methods that reduce the variability in the predictor due to training set variability. Assume that the training set consists of N independent draws from the same underlying distribution. Conceptually, training sets of size N can be drawn repeatedly and the same algorithm used to construct a predictor on each training set. These predictors will vary, and the extent of the variability is a dominant factor in the generalization prediction error. Given a training set {(y_n, x_n), n = 1, ..., N}, where the y's are either class labels or numerical values, the most common way of reducing variability is by perturbing the training set to produce alternative training sets, growing a predictor on ...
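The perturbation discussed in this (truncated) excerpt is applied to the outputs rather than the training inputs. For regression this is usually described as "output smearing": each ensemble member is trained on the full training set but with Gaussian noise added to the response. A minimal sketch, assuming squared-error trees and a user-chosen noise scale:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def output_smearing_ensemble(X, y, n_members=100, noise_scale=1.0, seed=0):
    """Train each tree on the same inputs but independently perturbed outputs."""
    rng = np.random.default_rng(seed)
    sigma = noise_scale * np.std(y)   # noise scaled to the spread of the outputs
    members = []
    for _ in range(n_members):
        y_perturbed = y + rng.normal(0.0, sigma, size=len(y))
        members.append(DecisionTreeRegressor().fit(X, y_perturbed))
    return members

def ensemble_predict(members, X):
    return np.mean([m.predict(X) for m in members], axis=0)
```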


Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author)

August 2001 · 2,663 Reads · 3,880 Citations · Statistical Science

There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.


Random Forests

January 2001 · 305 Reads · 6,015 Citations · Machine Learning

Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy. The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy.
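The aggregation described above (averaging bootstrap-trained predictors for a numerical outcome, plurality vote for a class) is simple enough to write out from scratch; this sketch uses scikit-learn trees as the unstable base predictor.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_trees(X, y, n_bootstrap=100, seed=0):
    """Grow one tree per bootstrap replicate of the learning set."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_bootstrap):
        idx = rng.integers(0, len(X), size=len(X))   # sample with replacement
        trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
    return trees

def bagged_predict(trees, X):
    # Numerical outcome: average over the bootstrap versions.
    # (For class labels, a plurality vote over the trees would be used instead.)
    return np.mean([t.predict(X) for t in trees], axis=0)
```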


Citations (50)


... In this paper, a quick-and-simple strategy is proposed to improve the market operator's forecasts of very short-term Italian electricity demand, which is of interest for power retailers in managing dispatching and bid strategies. No further forecasting model is proposed; rather, we obtain 15-minute Italian energy demand forecasts disaggregated according to billing zones through a stacked-regression [2] post-processing approach using only the public information (observed data and forecast records) made available by Terna. The methodology is validated via out-of-sample comparisons using power load data from December 2023 to December 2024 to train the approach and evaluate its performance. ...

Reference:

Energy load forecasting using Terna public data: a free lunch multi-task combination approach
Stacked Regressions
  • Citing Article
  • July 1996

Machine Learning
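The stacked-regression combination referenced in the excerpt above fits a constrained linear blend of base-model predictions; Breiman's original formulation uses cross-validated predictions and non-negative weights. A rough sketch, with model choices and data placeholders that are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_predict

def stacked_regression(X, y, base_models=None, cv=5):
    """Blend base regressors with non-negative least squares on CV predictions."""
    if base_models is None:
        base_models = [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=5)]
    # Level-one data: cross-validated predictions from each base model.
    Z = np.column_stack([cross_val_predict(m, X, y, cv=cv) for m in base_models])
    # Non-negative weights keep the blend from overfitting the level-one data.
    combiner = LinearRegression(positive=True, fit_intercept=False).fit(Z, y)
    fitted = [m.fit(X, y) for m in base_models]
    return fitted, combiner

def stacked_predict(fitted, combiner, X):
    Z = np.column_stack([m.predict(X) for m in fitted])
    return combiner.predict(Z)
```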

... DTs are closely related to interpretability: not only are they often regarded as a particularly interpretable model (Freitas 2014), but also interpretability itself is regarded as the "biggest single advantage of the tree-structured approach" (Breiman and Friedman 1988). ...

Tree-Structured Classification Via Generalized Discriminant Analysis: Comment
  • Citing Article
  • September 1988

... The impact of features on the output results is assessed through the model's built-in feature importance and SHapley Additive exPlanations (SHAP) [35]. The following algorithms were used: AdaBoost regression (ABR) [36], bagging regression (BAGR) [37], gradient boosting regression (GBR) [38,39], k-nearest neighbors regression (KNR) [40], random forest regression (RFR) [41] and support vector regression (SVR) [42]. The main evaluation metrics for the regression models are the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). ...

Random Forests
  • Citing Article
  • January 2001

Machine Learning
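The evaluation metrics named in the excerpt above are one-liners in scikit-learn; y_true and y_pred below are placeholders for held-out targets and model predictions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # placeholder held-out targets
y_pred = np.array([2.8, 5.4, 2.9, 6.4])   # placeholder model predictions

r2 = r2_score(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mae = mean_absolute_error(y_true, y_pred)
print(f"R^2={r2:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")
```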

... We chose a smoothing spline with the basis function of a thin plate spline over other smoothing spline approaches because it is both theoretically well-founded and particularly suited to our needs given that the approximations developed by Simon Wood (42) made thin plate regressions computationally efficient so that they can also be used for large data sets. This technique has a very good level of accuracy, though the curves produced are not as smooth as other automatic smoothers (45). The smoothing parameter can be estimated automatically and simultaneously with the whole model by either using restricted maximum likelihood (REML) or generalized cross validation (46). ...

Comparing Automatic Smoothers (A Public Service Enterprise)
  • Citing Article
  • December 1992

International Statistical Review

... Archetypal analysis (AA) [176], also known as principal convex hull analysis, is a matrix factorisation that aims to approximate all alloy instances in a data set as a linear combination of extremal points. A given data set of n alloys described by m features is represented by an n × m matrix, X. Archetypal analysis seeks to find a k × m matrix Z such that each data instance can be represented as a mixture of the k archetypes. ...

Archetypal Analysis
  • Citing Article
  • November 1994

Technometrics
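In the notation of the excerpt above, archetypal analysis can be stated as a doubly constrained least-squares problem; this follows the standard Cutler–Breiman formulation, with the mixing matrices A and B introduced here for illustration.

```latex
\min_{A,\,B}\; \lVert X - A Z \rVert_F^2
\quad \text{subject to} \quad Z = B X,\qquad
A \in \mathbb{R}^{n \times k},\; B \in \mathbb{R}^{k \times n},
```

where each row of A and each row of B lies on the probability simplex (non-negative entries summing to one), so every observation is approximated by a convex combination of the k archetypes and every archetype is itself a convex combination of the observations.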

... The output is determined by the mode in classification tasks or the mean prediction in regression tasks of the individual decision trees. In regression, the tree predictor produces numerical values, unlike the class labels used by the random forest classifier (Breiman 1999). The RF regressor can be accurately described as a meta-estimator that aggregates the outputs from multiple individual decision trees (Cheng et al. 2021). ...

RANDOM FORESTS--RANDOM FEATURES
  • Citing Article

... Averaging and voting are common methods for generating the final integrated model. Bagging reduces variance, boosting decreases bias errors, and stacking leverages the strengths of multiple models to improve generalization by reducing both bias and variance errors (Bartlett et al., 1998; Breiman, 1998; Rincy and Gupta, 2020; Ugur et al., 2018). According to Figure 8, ensemble algorithms can independently build ensemble models, such as extreme gradient boosting (GB) and light GB; they can also mix with base models to construct ensemble models, such as random forest (RF) and bagging classification and regression trees (CART). ...

Arcing classifiers. (With discussion)
  • Citing Article
  • June 1998

The Annals of Statistics

... Overfitting is a potential problem when the number of predictor variables relative to the number of subjects in the study (i.e., sample size) is large (Hosmer and Lemeshow, 1989). For better performance, a small number of predictor variables relative to the sample size should be used in model development (Anonymous, 1989). ...

Discriminant Analysis and Clustering: Panel on Discriminant Analysis, Classification, and Clustering
  • Citing Article
  • February 1989

Statistical Science

Ramanathan Gnanadesikan · Roger K. Blashfield · Leo Breiman · [...]

... The Π model (Breiman 1991) also uses a stepwise procedure for selecting a linear combination of products of univariate spline functions to be included in the metamodel. For all of these regression spline methods, the authors assume that the set of data values {(x i , y i )} to be fit are given. ...

The Π method for estimating multivariate functions from noisy data
  • Citing Article
  • April 1991

Technometrics