"Since the split made at a node is likely to vary with the sample selected, this technique results in different trees which can be combined in ensembles. Another method for randomization of the decision tree through histograms was proposed in Kamath et al. (2002). The use of histograms has long been suggested as a way of making the features discrete, while reducing the time to handle very large datasets. "
ABSTRACT: The idea of ensemble methodology is to build a predictive model by integrating multiple models. It is well known that ensemble methods can be used to improve prediction performance, and researchers from various disciplines, such as statistics and AI, have considered their use. This paper reviews existing ensemble techniques and can serve as a tutorial for practitioners interested in building ensemble-based systems.
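The core idea of that abstract, combining several base models into one predictor, can be sketched as bagging with majority voting. This is a generic illustration, not the specific method of any paper cited here; `train_fn` is a hypothetical user-supplied training routine:

```python
import random

def train_ensemble(train_fn, dataset, n_models=10, sample_frac=1.0):
    """Train n_models base learners, each on a bootstrap sample
    (sampling with replacement) of the dataset."""
    models = []
    n = len(dataset)
    for _ in range(n_models):
        sample = [random.choice(dataset) for _ in range(int(n * sample_frac))]
        models.append(train_fn(sample))
    return models

def predict_majority(models, x):
    """Combine member predictions by majority vote."""
    votes = [m(x) for m in models]
    return max(set(votes), key=votes.count)
```

Because each member sees a different resample, the members disagree in different places, and the vote averages out part of their individual errors.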
"Furthermore , although ensemble methods require growing several models, their combination with decision/regression trees remains also very attractive in terms of computational efficiency because of the low computational cost of the standard tree growing algorithm. Hence, given the success of generic ensemble methods with trees, several researchers have looked at specific randomization techniques for trees based on a direct randomization of the tree growing method (e.g., Ali and Pazzani, 1996; Ho, 1998; Dietterich, 2000; Breiman, 2001; Cutler and Guohua, 2001; Geurts, 2002; Kamath et al., 2002). All these randomization methods actually cause perturbations in the induced models by modifying the algorithm responsible for the search of the optimal split during tree growing. "
ABSTRACT: This paper proposes a new tree-based ensemble method for supervised classification and regression problems. It essentially consists of randomizing strongly both attribute and cut-point choice while splitting a tree node. In the extreme case, it builds totally randomized trees whose structures are independent of the output values of the learning sample. The strength of the randomization can be tuned to problem specifics by the appropriate choice of a parameter. We evaluate the robustness of the default choice of this parameter, and we also provide insight on how to adjust it in particular situations. Besides accuracy, the main strength of the resulting algorithm is computational efficiency. A bias/variance analysis of the Extra-Trees algorithm is also provided as well as a geometrical and a kernel characterization of the models induced.
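The split randomization described in that abstract, random attribute choice plus a single uniform random cut-point per attribute, can be sketched roughly as follows. This is a simplified reading of the Extra-Trees idea, not the paper's exact procedure; `score_fn` stands in for whatever split-quality measure (e.g. information gain) the tree uses:

```python
import random

def pick_extra_trees_split(X, y, k=1, score_fn=None):
    """Extra-Trees-style split: for each of k randomly chosen attributes,
    draw ONE uniform random cut-point between the attribute's min and max
    in the node, then keep the best-scoring of those k candidate splits.
    With k = 1 the split is totally randomized (independent of y)."""
    n_features = len(X[0])
    attrs = random.sample(range(n_features), min(k, n_features))
    best = None
    for a in attrs:
        vals = [row[a] for row in X]
        lo, hi = min(vals), max(vals)
        if lo == hi:  # attribute is constant in this node; no valid cut
            continue
        cut = random.uniform(lo, hi)
        left = [yi for row, yi in zip(X, y) if row[a] < cut]
        right = [yi for row, yi in zip(X, y) if row[a] >= cut]
        score = score_fn(left, right) if score_fn else 0.0
        if best is None or score > best[0]:
            best = (score, a, cut)
    return best  # (score, attribute index, cut-point), or None
```

Because no exhaustive search over cut-points is performed, each split costs O(k · n) instead of requiring a sort, which is the source of the computational efficiency the abstract highlights.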