Figure 1: Classification boundaries for the proposed gradient boosting method (a) and GB (b)
Source publication
This paper presents a computationally efficient variant of gradient boosting for multi-class classification and multi-output regression tasks. Standard gradient boosting uses a 1-vs-all strategy for classification tasks with more than two classes, which means that one tree per class has to be trained at every iteration. In this work, we...
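To make the contrast concrete, the sketch below shows one boosting iteration under each scheme: K per-class trees fitted on softmax pseudo-residuals versus a single multi-output tree. This is a simplified illustration (it skips the usual leaf-value line search) using assumed names such as raw_pred and Y_onehot; it is not the authors' implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def softmax(raw):
    e = np.exp(raw - raw.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def boost_iteration_standard(X, Y_onehot, raw_pred, lr=0.1, max_depth=3):
    """Standard GB: one regression tree per class, each fitted on that
    class's pseudo-residuals (negative gradient of the softmax loss)."""
    residuals = Y_onehot - softmax(raw_pred)          # shape (n, K)
    trees = []
    for k in range(Y_onehot.shape[1]):                # K separate trees
        t = DecisionTreeRegressor(max_depth=max_depth)
        t.fit(X, residuals[:, k])
        raw_pred[:, k] += lr * t.predict(X)
        trees.append(t)
    return trees, raw_pred

def boost_iteration_condensed(X, Y_onehot, raw_pred, lr=0.1, max_depth=3):
    """Condensed variant (C-GB-style): a single multi-output tree fitted on
    the full residual matrix, so each leaf stores a K-vector output."""
    residuals = Y_onehot - softmax(raw_pred)
    t = DecisionTreeRegressor(max_depth=max_depth)
    t.fit(X, residuals)                               # multi-output fit
    raw_pred += lr * t.predict(X)                     # all K outputs at once
    return t, raw_pred
```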
Contexts in source publication
Context 1
... tree per class could pose difficulties in learning the concepts. For this experiment, a synthetic multi-class classification dataset with three classes, 1200 instances, two features, and one cluster per class was used. The generated dataset is based on the Madelon random data experiment [17]. The distribution of the training data points is shown in Fig. 1, with each class in a different color. The same hyper-parameter configuration was used for both methods (max depth=3, subsample=0.75, learning rate=0.1, decision trees=100). In Fig. 1, the decision boundaries of the two methods are shown using the same colors as the corresponding training data points. As can be observed from the ...
Context 2
... one cluster per class was used. The generated dataset is based on the Madelon random data experiment [17]. The distribution of the training data points is shown in Fig. 1, with each class in a different color. The same hyper-parameter configuration was used for both methods (max depth=3, subsample=0.75, learning rate=0.1, decision trees=100). In Fig. 1, the decision boundaries of the two methods are shown using the same colors as the corresponding training data points. As can be observed from the plots, both C-GB and GB perform very similarly and fit the training instances rather well. This shows that the proposed method is able to adapt to the problem at hand even though ...
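The experimental setup described in these contexts can be approximated with scikit-learn; make_classification implements a Madelon-style generator in the spirit of [17]. Only the standard GB baseline is reproduced here (C-GB is the authors' method), and the grid and plotting details are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Madelon-style synthetic data: 3 classes, 1200 instances, 2 features,
# 1 cluster per class (as described in the text).
X, y = make_classification(n_samples=1200, n_features=2, n_informative=2,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=1)

# Same hyper-parameter configuration as the experiment.
gb = GradientBoostingClassifier(max_depth=3, subsample=0.75,
                                learning_rate=0.1, n_estimators=100)
gb.fit(X, y)

# Decision boundary on a grid, coloured by predicted class.
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300))
zz = gb.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, zz, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, s=10)
plt.show()
```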
Context 3
... Fig. 2, the decision trees of the first iteration are shown as an example for the classification task of Fig. 1. The figure shows, for iteration 1, one tree per class for GB and the single tree of C-GB. The nodes of the trees show the following values: the split information for internal nodes, the MSE of the node, the number of instances, and the node output. For multi-output decision trees, the MSE is the average MSE over all outputs (Eq. ...
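As a hedged illustration of what such a multi-output tree stores, the sketch below fits a single scikit-learn DecisionTreeRegressor on a toy residual matrix with K = 3 columns: the node impurity is the MSE averaged over the outputs, consistent with the averaged MSE described above, and each node holds a K-vector output. The data here is synthetic and only meant to expose the node attributes.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy residual matrix: one column of pseudo-residuals per class (K = 3).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
residuals = rng.normal(size=(200, 3))

tree = DecisionTreeRegressor(max_depth=3)
tree.fit(X, residuals)                      # single tree, K outputs

t = tree.tree_
# Each node carries an impurity (MSE averaged over the K outputs)
# and an output value that is a vector of size K.
print("root impurity (avg MSE over outputs):", t.impurity[0])
print("root output vector (size K):", t.value[0].ravel())
```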
Context 4
... of the node, the number of instances, and the node output. For multi-output decision trees, the MSE is the average MSE over all outputs (Eq. 10) and the output value is a vector of size K (one entry per class). The split of the root node and the left branch is equivalent to that of the class 0 tree of GB and essentially isolates class 0 (purple dots in Fig. 1) from the rest. After the root node, the first two right nodes of the C-GB tree have the same splits as the first nodes of the class 2 GB tree. This illustrates how a single tree is able to capture the information of all classes of the problem in a more compact way. Also, using a single tree would produce more coherent outputs as the ...
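A split-by-split comparison like the one described above can be reproduced on the GB side by exporting the per-class trees of the first iteration as text; gb.estimators_ is scikit-learn's public array of per-class regression trees. The data and hyper-parameters below simply mirror the earlier sketch and are assumptions, not the paper's exact setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import export_text

# Small 3-class problem, same spirit as the experiment above.
X, y = make_classification(n_samples=1200, n_features=2, n_informative=2,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=1)
gb = GradientBoostingClassifier(max_depth=3, subsample=0.75,
                                learning_rate=0.1, n_estimators=100).fit(X, y)

# gb.estimators_ has shape (n_estimators, n_classes): one regression tree
# per class per iteration. Print the three trees of iteration 1 so their
# splits can be compared against the single multi-output tree of C-GB.
for k in range(3):
    print(f"--- GB tree for class {k}, iteration 1 ---")
    print(export_text(gb.estimators_[0, k], feature_names=["x0", "x1"]))
```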