Leo Breiman’s research while affiliated with University of California, Berkeley and other places



Publications (50)


The Π Method for Estimating Multivariate Functions From Noisy Data
  • Article

March 2012 · 95 Reads · 76 Citations · Technometrics

Leo Breiman

The Π method for estimating an underlying smooth function of M variables, (x_1, …, x_M), using noisy data is based on approximating it by a sum of products of the form ∏_m φ_m(x_m). The problem is then reduced to estimating the univariate functions in the products. A convergent algorithm is described. The method keeps tight control on the degrees of freedom used in the fit. Many examples are given. The quality of fit given by the Π method is excellent. Usually, only a few products are enough to fit even fairly complicated functions. The coding into products of univariate functions allows a relatively understandable interpretation of the multivariate fit.
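
As an illustration of the sum-of-products idea, the sketch below fits a single product term φ_1(x_1)·φ_2(x_2) by alternating univariate least-squares fits, using low-degree polynomials as the univariate factors. This is only a toy version under those simplifying assumptions; Breiman's Π method uses adaptive univariate smooths, adds several products stepwise, and controls the degrees of freedom of the fit.

```python
# Toy sketch: fit y ≈ phi1(x1) * phi2(x2) by alternating linear least squares.
# Polynomial factors and the fixed iteration count are illustrative choices,
# not part of the published algorithm.
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1, x2 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
y = np.sin(2 * x1) * (1 + x2 ** 2) + rng.normal(0, 0.1, n)   # smooth signal plus noise

deg = 5
V1, V2 = np.vander(x1, deg + 1), np.vander(x2, deg + 1)      # univariate polynomial bases
phi2 = np.ones(n)                                            # initial second factor

for _ in range(20):
    # With phi2 fixed, the best polynomial phi1 solves a linear least-squares problem.
    c1, *_ = np.linalg.lstsq(phi2[:, None] * V1, y, rcond=None)
    phi1 = V1 @ c1
    # Symmetric step for phi2 with phi1 fixed.
    c2, *_ = np.linalg.lstsq(phi1[:, None] * V2, y, rcond=None)
    phi2 = V2 @ c2

print("residual RMS:", np.sqrt(np.mean((y - phi1 * phi2) ** 2)))
```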


Population theory for boosting ensembles

February 2004 · 74 Reads · 77 Citations · The Annals of Statistics

Tree ensembles are looked at in distribution space, that is, the limit case of "infinite" sample size. It is shown that the simplest kind of trees are complete in D-dimensional L_2(P) space if the number of terminal nodes T is greater than D. For such trees we show that the AdaBoost algorithm gives an ensemble converging to the Bayes risk.


Using Random Forest to Learn Imbalanced Data

January 2004 · 7,795 Reads · 1,430 Citations

In this paper we propose two ways to deal with the imbalanced data classification problem using random forest. One is based on cost-sensitive learning, and the other is based on a sampling technique. Performance metrics such as precision and recall, false positive rate and false negative rate, F-measure and weighted accuracy are computed. Both methods are shown to improve the prediction accuracy of the minority class, and have favorable performance compared to the existing algorithms.
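
A rough sketch of the two approaches, using scikit-learn rather than the authors' original implementation: class weighting stands in for the cost-sensitive (weighted) variant, and a simple majority-class downsample stands in for the sampling (balanced) variant. The synthetic data set and all parameter values are illustrative assumptions.

```python
# Two simple stand-ins for cost-sensitive and sampling-based random forests.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Cost-sensitive variant: errors on the minority class are penalized more heavily.
wrf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0).fit(X_tr, y_tr)

# Sampling variant: train on a class-balanced subsample of the data.
rng = np.random.default_rng(0)
minority = np.flatnonzero(y_tr == 1)
majority = rng.choice(np.flatnonzero(y_tr == 0), size=minority.size, replace=False)
idx = np.concatenate([minority, majority])
brf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr[idx], y_tr[idx])

for name, model in [("weighted", wrf), ("balanced-sample", brf)]:
    print(name, "minority-class recall:", recall_score(y_te, model.predict(X_te)))
```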


Two-Eyed Algorithms and Problems

September 2003 · 79 Reads · 1 Citation · Lecture Notes in Computer Science

Two-eyed algorithms are complex prediction algorithms that give accurate predictions and also give important insights into the structure of the data the algorithm is processing. The main example I discuss is RF/tools, a collection of algorithms for classification, regression, and multiple dependent outputs. The last algorithm is a preliminary version, and further progress depends on solving some fascinating questions about the characterization of dependency between variables. An important and intriguing aspect of the classification version of RF/tools is that it can be used to analyze unsupervised data, that is, data without class labels. This conversion leads to such by-products as clustering, outlier detection, and replacement of missing data for unsupervised data. The talk will present numerous results on real data sets. The code (f77) and ample documentation for RFtools are available on the web site www.stat.berkeley.edu/RFtools.
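
One way to read the "unsupervised data" remark is the real-versus-synthetic trick sketched below: create a synthetic class by shuffling each column independently (which keeps the marginals but destroys the dependence between variables), then train a forest to separate real rows from synthetic ones. The sketch uses scikit-learn, not RF/tools, and omits the proximities, clustering, outlier detection, and missing-value replacement mentioned in the abstract.

```python
# Minimal sketch of using a classification forest on unlabeled data: real rows
# versus column-wise shuffled ("synthetic") rows. High real-vs-synthetic accuracy
# signals dependence structure in the data. Illustrative only; not RF/tools.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def unsupervised_forest(X, n_trees=200, seed=0):
    rng = np.random.default_rng(seed)
    # Shuffle each column independently: marginals are kept, dependence is destroyed.
    X_synth = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
    X_all = np.vstack([X, X_synth])
    y_all = np.r_[np.zeros(len(X)), np.ones(len(X_synth))]
    return RandomForestClassifier(n_estimators=n_trees, oob_score=True,
                                  random_state=seed).fit(X_all, y_all)

X = np.random.default_rng(1).normal(size=(300, 2))
X[:, 1] += 2 * X[:, 0]                        # introduce dependence between the columns
forest = unsupervised_forest(X)
print("real-vs-synthetic out-of-bag accuracy:", forest.oob_score_)
```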


Using Iterated Bagging to Debias Regressions

December 2001 · 215 Reads · 309 Citations · Machine Learning

Breiman (Machine Learning, 26(2), 123–140) showed that bagging could effectively reduce the variance of regression predictors while leaving the bias relatively unchanged. A new form of bagging, which we call iterated bagging, is effective in reducing both bias and variance. The procedure works in stages: the first stage is bagging; based on the outcomes of the first stage, the output values are altered, and a second stage of bagging is carried out using the altered output values. This is repeated until a simple rule stops the process. The method is tested using both trees and nearest-neighbor regression. Accuracy on the Boston Housing benchmark is comparable to the best results obtained with highly tuned and compute-intensive Support Vector Regression Machines. Some heuristic theory is given to clarify what is going on. Application to two-class classification data gives interesting results.
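
A rough multi-stage sketch of the staged procedure, assuming (as one plausible reading of "the output values are altered") that each new stage is trained on the out-of-bag residuals of the previous one. The paper specifies the exact alteration and the stopping rule; the fixed three stages here are an arbitrary choice.

```python
# Staged bagging sketch: each stage bags trees on the previous stage's
# out-of-bag residuals; the final prediction is the sum of the stages.
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import make_friedman1

X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)

stages, targets = [], y.copy()
for _ in range(3):                                   # fixed number of stages for the sketch
    bag = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50,
                           oob_score=True, random_state=0).fit(X, targets)
    stages.append(bag)
    targets = targets - bag.oob_prediction_          # alter outputs using out-of-bag fits

def predict(X_new):
    return sum(stage.predict(X_new) for stage in stages)

print("training RMSE:", np.sqrt(np.mean((y - predict(X)) ** 2)))
```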


Random Forests: Finding Quasars

October 2001 · 79 Reads · 55 Citations

In this paper, we discuss an example in which we classify objects as quasars or non-quasars using the combined results of a radio survey and an optical survey. Such classification helps guide the choice of which objects to follow up with relatively expensive spectroscopic measurements.


Random Forests (Machine Learning, Volume 45, Number 1)

October 2001 · 5,110 Reads · 61,874 Citations · Machine Learning

Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International Conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation, and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
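
The ingredients named in the abstract (bootstrapped trees, a random feature subset at each split, internal out-of-bag error estimates, and variable importance) can be exercised with scikit-learn's implementation; the sketch below is only a demonstration on synthetic data, not the original Fortran code.

```python
# Random forest demonstration: random feature selection per split (max_features),
# out-of-bag error as the internal estimate, and permutation-style importance.
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=300, max_features="sqrt",
                                oob_score=True, random_state=0).fit(X_tr, y_tr)
print("out-of-bag accuracy:", forest.oob_score_)
print("test accuracy:      ", forest.score(X_te, y_te))

imp = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=0)
print("most important feature index:", imp.importances_mean.argmax())
```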


Randomizing Outputs To Increase Prediction Accuracy

September 2001 · 114 Reads · 292 Citations · Machine Learning

In recent research on combining predictors, it has been recognized that the key to success in combining low-bias predictors such as trees and neural nets is the use of methods that reduce the variability in the predictor due to training set variability. Assume that the training set consists of N independent draws from the same underlying distribution. Conceptually, training sets of size N can be drawn repeatedly and the same algorithm used to construct a predictor on each training set. These predictors will vary, and the extent of the variability is a dominant factor in the generalization prediction error. Given a training set {(y_n, x_n), n = 1, …, N}, where the y's are either class labels or numerical values, the most common way of reducing variability is by perturbing the training set to produce alternative training sets, growing a predictor on …
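
A minimal sketch of perturbing the outputs rather than the training set, for the regression case: each ensemble member is trained on the same inputs but on y plus Gaussian noise, and the members are averaged. The noise scale (one standard deviation of y) and the ensemble size are illustrative assumptions, not values from the paper.

```python
# "Output smearing" sketch: perturb the outputs, not the inputs, for each member.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import make_friedman1

X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)
rng = np.random.default_rng(0)

members = []
for _ in range(100):
    y_smeared = y + rng.normal(0, y.std(), size=y.shape)   # noisy copy of the outputs
    members.append(DecisionTreeRegressor().fit(X, y_smeared))

ensemble_pred = np.mean([m.predict(X) for m in members], axis=0)
print("training RMSE of the averaged ensemble:", np.sqrt(np.mean((y - ensemble_pred) ** 2)))
```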


Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author)

August 2001 · 2,537 Reads · 3,643 Citations · Statistical Science

There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory and questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.


Random Forests

January 2001 · 285 Reads · 5,919 Citations · Machine Learning

Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy. The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy.
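
A compact sketch of the procedure this abstract describes: bootstrap replicates of the learning set, one regression tree per replicate, and aggregation by averaging (a plurality vote would be used for class labels). Data set, tree settings, and ensemble size are illustrative choices.

```python
# Bagging sketch for regression trees: bootstrap replicates, then average.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import make_friedman1
from sklearn.model_selection import train_test_split

X, y = make_friedman1(n_samples=1000, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

trees = []
for _ in range(100):
    idx = rng.integers(0, len(y_tr), size=len(y_tr))        # bootstrap replicate
    trees.append(DecisionTreeRegressor().fit(X_tr[idx], y_tr[idx]))

bagged = np.mean([t.predict(X_te) for t in trees], axis=0)   # aggregate by averaging
single = DecisionTreeRegressor().fit(X_tr, y_tr).predict(X_te)
rmse = lambda p: np.sqrt(np.mean((y_te - p) ** 2))
print("single tree RMSE:", rmse(single), " bagged RMSE:", rmse(bagged))
```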



Citations (50)


... 1A") were built independently on a proper training set (based on LLNA data). Instead of trying to choose a specific method, these methods were combined by the stacking methodology of Wolpert (1992) and Breiman (1996) to obtain a specific stacking meta-model for each tier. ...

Reference:

OPINION on Sodium Bromothymol Blue (C186) (CAS No. 34722-90-2, EC No. 252-169-7)
Stacked Regressions
  • Citing Article
  • July 1996

Machine Learning

... DTs are closely related to interpretability: not only are they often regarded as a particularly interpretable model (Freitas 2014), but also interpretability itself is regarded as the "biggest single advantage of the tree-structured approach" (Breiman and Friedman 1988). ...

Tree-Structured Classification Via Generalized Discriminant Analysis: Comment
  • Citing Article
  • September 1988

Journal of the American Statistical Association

... It builds multiple decision trees sequentially, where each tree corrects the errors of the previous one. While Random Forest is also an ensemble method 34 , it consists of multiple decision trees constructed independently from different random subsets of the data. The final prediction is made by majority voting. ...

Random Forests
  • Citing Article
  • January 2001

Machine Learning

... We chose a smoothing spline with the basis function of a thin plate spline over other smoothing spline approaches because it is both theoretically well-founded and particularly suited to our needs given that the approximations developed by Simon Wood (42) made thin plate regressions computationally efficient so that they can also be used for large data sets. This technique has a very good level of accuracy, though the curves produced are not as smooth as other automatic smoothers (45). The smoothing parameter can be estimated automatically and simultaneously with the whole model by either using restricted maximum likelihood (REML) or generalized cross validation (46). ...

Comparing Automatic Smoothers (A Public Service Enterprise)
  • Citing Article
  • December 1992

International Statistical Review

... Archetypal analysis (AA) 176 , also known as principal convex hull analysis, is a matrix factorisation that aims to approximate all alloy instances in a data set as a linear combination of extremal points. A given data set of n alloys described by m features is represented by an n × m matrix, X. Archetypal analysis seeks to find a k × m matrix Z such that each data instance can be represented as a mixture of the k archetypes. ...

Archetypal Analysis
  • Citing Article
  • November 1994

Technometrics

... The random forest (RF) model creates a forest randomly, so that a direct relationship can be established between the number of trees in the algorithm and the result. The RF classifier is more reliable for large feature size and data noise ranges, and the random process in the algorithm reduces model overfitting (Breiman, 1999). In many crop mapping studies, higher accuracies can be achieved with RF compared to other machine learning algorithms (Zhong et al., 2014;Tatsumi et al., 2015). ...

RANDOM FORESTS--RANDOM FEATURES
  • Citing Article

... Each tree is constructed using a different bootstrap sample of the data set, considering a random subset of features at each node. The final prediction is an average of all trees' predictions (for regression) or a majority vote (for classification) (Breiman, 1998, 2001b). Its versatility, robustness, and ease of use are evident in handling both classification and regression tasks, dealing with large data sets, handling missing values, maintaining accuracy, and providing feature importance measures. ...

Arcing classifiers. (With discussion)
  • Citing Article
  • June 1998

The Annals of Statistics

... Overfitting is a potential problem when the number of predictor variables relative to the number of subjects in the study (i.e., sample size) is large (Hosmer and Lemeshow, 1989). For better performance, a small number of predictor variables relative to the sample size should be used in model development (Anonymous, 1989). ...

Discriminant Analysis and Clustering: Panel on Discriminant Analysis, Classification, and Clustering
  • Citing Article
  • February 1989

Statistical Science

Ramanathan Gnanadesikan · Roger K. Blashfield · Leo Breiman · [...]
... The Π model (Breiman 1991) also uses a stepwise procedure for selecting a linear combination of products of univariate spline functions to be included in the metamodel. For all of these regression spline methods, the authors assume that the set of data values {(x i , y i )} to be fit are given. ...

The Π method for estimating multivariate functions from noisy data
  • Citing Article
  • April 1991

Technometrics