Article

Boosting algorithms: A review of methods, theory, and applications

Authors: Artur J. Ferreira and Mário A. T. Figueiredo

... The combination of multiple simple classifiers to produce a model whose performance surpasses that of any single classifier is known as ensemble learning [1], [2]. This combination is often performed by methods that construct a weighted combination of the individual classifiers. ...
... This combination is often performed by methods that construct a weighted combination of the individual classifiers. The rationale behind these models is that training a combination of multiple simpler classifiers is easier and more effective than training a single complex classifier [2]. Boosting is a metaheuristic that exploits the strategy of building an ensemble of (so-called weak) classifiers, where each weak classifier (or weak learner) contributes to producing a more robust model [1]. ...
... Several boosting algorithms have been proposed, differing essentially in the methodology used to produce the differently weighted versions of the training data [2]. The first boosting algorithm, presented by Schapire [17], randomly splits the training set into three partitions and performs an analysis based on three classifiers (using a majority voting scheme to combine the classifiers' outputs). ...
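As a concrete illustration of that partition-and-vote idea, the sketch below trains three weak learners on three disjoint partitions of a synthetic training set and combines them by majority vote. It deliberately omits Schapire's error-driven filtering of the second and third training sets; the dataset, stump depth, and seeds are illustrative assumptions.

```python
# Minimal partition-and-vote sketch (NOT Schapire's full filtering scheme):
# three weak learners, one per partition, combined by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Split the training set into three partitions, one per weak learner.
rng = np.random.RandomState(0)
parts = np.array_split(rng.permutation(len(X_tr)), 3)
learners = [DecisionTreeClassifier(max_depth=1).fit(X_tr[idx], y_tr[idx])
            for idx in parts]

# Majority vote over the three weak hypotheses (labels are 0/1).
votes = np.stack([h.predict(X_te) for h in learners])
y_pred = (votes.sum(axis=0) >= 2).astype(int)
print("majority-vote accuracy:", (y_pred == y_te).mean())
```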
Preprint
ProBoost, a new boosting algorithm for probabilistic classifiers, is proposed in this work. This algorithm uses the epistemic uncertainty of each training sample to determine the most challenging/uncertain ones; the relevance of these samples is then increased for the next weak learner, producing a sequence that progressively focuses on the samples found to have the highest uncertainty. In the end, the weak learners' outputs are combined into a weighted ensemble of classifiers. Three methods are proposed to manipulate the training set: undersampling, oversampling, and weighting the training samples according to the uncertainty estimated by the weak learners. Furthermore, two approaches are studied regarding the ensemble combination. The weak learner herein considered is a standard convolutional neural network, and the probabilistic models underlying the uncertainty estimation use either variational inference or Monte Carlo dropout. The experimental evaluation carried out on MNIST benchmark datasets shows that ProBoost yields a significant performance improvement. The results are further highlighted by assessing the relative achievable improvement, a metric proposed in this work, which shows that a model with only four weak learners leads to an improvement exceeding 12% in this metric (for either accuracy, sensitivity, or specificity), in comparison to the model learned without ProBoost.
... All models run in parallel, and the most plausible hypothesis is chosen by voting. The bagging algorithm is given in Table 2 [23]. ...
... All models run sequentially, and the most plausible hypothesis is chosen by weighted voting. The boosting algorithm is given in Table 3 [23]. Like bagging, boosting is a general approach that can be applied to many statistical learning methods for regression or classification. ...
... Overall classification performance is improved by updating the selection probabilities. The AdaBoost algorithm is given in Table 4 [23]. As in other ensemble learning algorithms, AdaBoost obtains a high-performance classifier by combining the strongest weak classifiers. ...
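Since Tables 2-4 themselves are not reproduced in these excerpts, here is a from-scratch sketch of the reweighting loop at the heart of AdaBoost, under our own illustrative choices (decision stumps as weak learners, labels in {-1, +1}); it follows the standard discrete-AdaBoost update rather than any specific table from the cited paper.

```python
# Discrete AdaBoost sketch: stumps as weak learners, labels must be in {-1, +1}.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    n = len(y)
    w = np.full(n, 1.0 / n)                  # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()             # weighted training error
        if err >= 0.5:                       # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * y * pred)       # up-weight misclassified samples
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    # Weighted vote of the weak learners.
    agg = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(agg)
```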
Article
Full-text available
Artificial intelligence is a method that is becoming increasingly widespread in all areas of life and enables machines to imitate human behavior. Machine learning is a subset of artificial intelligence techniques that uses statistical methods to enable machines to improve with experience. As technology advances and science develops, the interest in and need for machine learning are increasing day by day. Human beings use machine learning techniques in their daily lives without realizing it. This study addresses ensemble learning algorithms, one family of machine learning techniques. The methods used in this study are the Bagging and AdaBoost ensemble learning algorithms. The main purpose of this study is to find the best-performing classifier with the Classification and Regression Trees (CART) base classifier on three different datasets taken from the UCI machine learning repository, and then to obtain ensemble models that make this performance better and more stable using the two ensemble learning algorithms. For this purpose, the performance measures of the single base classifier and of the ensemble learning algorithms were compared.
... Adaptive boosting (AdaBoost) is one of the most widely used boosting approaches (Ferreira & Figueiredo, 2012). Details of the underlying math involved with AdaBoost can be found in the seminal works of Schapire (1996 and 1997). ...
... Without delving into the mathematical complexities of this method, the advantages of AdaBoost can be boiled down to a couple of key methodological distinctions. First, unlike earlier boosting algorithms that trained multiple classifiers using random subsamples of a training dataset, AdaBoost trains multiple classifiers that each have access to all available training data (Ferreira & Figueiredo, 2012). This allows for highly effective modeling with fewer data points (Ferreira & Figueiredo, 2012). ...
... First, unlike earlier boosting algorithms that trained multiple classifiers using random subsamples of a training dataset, AdaBoost trains multiple classifiers that each have access to all available training data (Ferreira & Figueiredo, 2012). This allows for highly effective modeling with fewer data points (Ferreira & Figueiredo, 2012). Of course, fewer data points will result in a faster training process. ...
Article
Full-text available
Low-use tangible print collections represent a long-standing problem for academic libraries. Expanding on the previous research aimed at leveraging machine learning (ML) toward predicting patterns of collection use, this study explores the potential for adaptive boosting (AdaBoost) as a foundation for developing actionable predictive models of print title use. This study deploys the AdaBoost algorithm, with random forests used as the base classifier, via the adabag package for R. Methodological considerations associated with dataset congruence, as well as sample-based modeling versus novel data modeling, are explored in relation to four AdaBoost models that are trained and tested. Results of this study show AdaBoost as a promising ML solution for predictive modeling of print collections, with the central model of interest able to accurately predict use in over 85% of cases. This research also explores peripheral questions of interest related to general considerations when evaluating ML models, as well as the compatibility of similar models trained with e-book versus print book usage data.
... These techniques create strong classification models based on several weak learners. A weak learner refers to a learning algorithm capable of producing a classifier with an accuracy slightly above random guessing (Ferreira and Figueiredo, 2012). Bagging refers to a technique where several weak learners are trained independently of each other. ...
... The prediction is realized by taking a weighted average of the predictions of each classifier across all iterations; the weights are proportional to each classifier's accuracy on its training set (Ferreira and Figueiredo, 2012). Two boosting approaches are considered within the data analysis: AdaBoost and Gradient ...
... One commonly applied boosting method is Adaptive Boosting, also known as AdaBoost, which was developed by Freund and Schapire (1996) to solve binary classification problems (Ghojogh and Crowley, 2019). The idea of AdaBoost is to learn a sequence of models in which every model gives more attention (higher weight) to instances that are misclassified by the previous model (Ferreira and Figueiredo, 2012). The algorithm is often applied using decision trees (Quinlan, 1996). ...
Thesis
Full-text available
The fashion industry operates in a highly competitive market with an increasing power of consumers regarding fashion trend creation and diffusion induced by the wide usage of online social networking platforms. This forces fashion companies to adapt their methods of trend prediction to meet consumer needs and preferences and to stay competitive. Social networking platforms provide an instrument to share ideas and opinions which allows users to influence others in their behaviors, and therefore, influence the development of trends. The content published on these platforms, thus, is a rich data source for the fashion industry containing information about changing consumer needs and upcoming trends. Fashion companies, however, struggle to benefit from this social media data for trend prediction purposes as they lack the knowledge about the trend-relevant users who publish content that includes information about future trends. This research addresses the challenge of profiting from this valuable data source for trend prediction, especially in the highly competitive fashion industry. It argues that trends are created and diffused by trendsetters and that the content which is shared by these trendsetters in online social networks includes information that enables early trend detection. Due to this, the study seeks to identify trendsetters based on the digital traces they leave on online social networking platforms and addresses the question of how fashion trendsetters in online social networks can be identified automatically based on social media data. To achieve this goal, a feature framework is created based on literature review and expert interviews which enables the measurement of characteristics of trend-relevant roles based on social media data. Next, a two-step approach is developed which first extracts a topic-relevant sample of users (community) from a huge online social network, and then identifies the online trendsetters within this sample based on a supervised machine learning approach. For its development, a prototypical data analysis is realized based on publicly accessible data from the online social networking platform Instagram. The resulting methodology for the identification of online trendsetters related to a specific topic area consists of a topic-focused community detection approach and a classification model. The analysis of the relevant features for the model's class decision further reveals insights into online trendsetters' characteristics in online social networks. The evaluation of the developed methodology shows its transferability to other use cases and validates the trend prediction potential of the identified online trendsetters. The results of this thesis contribute to research and practice. The insights gained about online trendsetters' behavioral patterns, characteristics, and the relevant features for their detection in online social networks expand the knowledge about online trendsetters related to the fashion industry, and thus, contribute to the area of trend research and the recently emerging field of fashion informatics. In addition, the insights can be used by companies to identify appropriate marketing partners to influence trends. Furthermore, the developed methodology supports fashion companies by providing a new data source to increase trend prediction quality and facilitates the identification of changing consumer needs and preferences.
... Ensemble learning consists of three different methods: 1) Bagging: It stands for 'bootstrap aggregating'. It was proposed by Breiman [23] to combine the predictive results of models trained on randomly distributed training sets to improve classification performance [24]. As seen in Fig. 3, training sets are randomly distributed in the bagging method. ...
... And these prediction results are combined at the end. 2) Boosting: Boosting is an algorithm that combines a number of weak classifiers to create a strong classifier [24]. This method builds the ensemble gradually and sequentially. ...
... This method builds the ensemble gradually and sequentially. As in Fig. 4, each new predictive model tries to increase performance by emphasizing the incorrect classifications of the previous model [24]. AdaBoost [25] is still the most used boosting method, although many different boosting methods have been developed. ...
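The contrast drawn in these excerpts (independent parallel models for bagging vs. sequential error-correcting models for boosting) can be reproduced in a few lines with scikit-learn; the dataset, the stump base learner, and the estimator counts below are illustrative assumptions (older scikit-learn versions name the `estimator` argument `base_estimator`).

```python
# Bagging vs. boosting over the same weak learner, compared by cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
base = DecisionTreeClassifier(max_depth=1)   # a weak learner (decision stump)

models = {
    "bagging (parallel, bootstrap samples)":
        BaggingClassifier(estimator=base, n_estimators=100, random_state=0),
    "boosting (sequential, reweighted samples)":
        AdaBoostClassifier(estimator=base, n_estimators=100, random_state=0),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```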
Article
Full-text available
Pneumonia is a bacterial infection that affects people of all ages, causing mild to severe inflammation of the lung tissue. The best-known and most common clinical method for the diagnosis of pneumonia is chest X-ray imaging. But the diagnosis of pneumonia from chest X-ray images is a difficult task, even for specialist radiologists. In developing countries, this lung disease is one of the deadliest for children under the age of 5, causing 15% of the deaths recorded annually. Therefore, in this study, the presence of the disease was first determined using a chest X-ray dataset. In addition, using the bacterial and viral pneumonia classes, which are two different types of pneumonia, multi-class classification into viral pneumonia, bacterial pneumonia, and healthy was performed. Since the dataset used does not have a balanced distribution among all classes, the SMOTE (Synthetic Minority Over-sampling Technique) method was used to deal with the imbalance. The CNN model and the ensemble learning models were created from scratch, instead of using the weights of pre-trained networks, to see the effectiveness of CNN weights on medical data. For each classification problem, two different deep learning methods, CNN and ensemble learning, were used; 95% average accuracy was obtained for each model in binary classification, and 78% and 75% average accuracy, respectively, in the multi-class classification problem.
... Among other ML methods, boosting algorithms have emerged as robust and competitive techniques that have been consistently placed among the top contenders in most Kaggle competitions [10,11]. Their performances are strongly justified and backed up theoretically by their ability to improve the performance of weak classification models by combining the outputs of many "weak" classifiers [12]. Furthermore, these models can absorb more input variables and adequately describe non-linear and complicated relations between variables. ...
... The algorithm's efficiency depends on building a diverse, yet accurate, collection of classifiers [46]. The key idea behind ADAB is to use weighted versions of the same training samples, rather than to use random subsamples [12]. ...
Article
Full-text available
There is growing tension between high-performance machine-learning (ML) models and explainability within the scientific community. In arsenic modelling, understanding why ML models make certain predictions, for instance, "high arsenic" instead of "low arsenic", is as important as the prediction accuracy. In response, this study aims to explain model predictions by assessing the relationship between influencing input variables, i.e., pH, turbidity (Turb), total dissolved solids (TDS), and electrical conductivity (Cond), and arsenic mobility. The two main objectives of this study are to: (i) classify arsenic concentrations in multiple water sources using novel boosting algorithms such as natural gradient boosting (NGB), categorical boosting (CATB), and adaptive boosting (ADAB) and compare them with other existing representative boosting algorithms, and (ii) introduce a novel SHapley Additive exPlanation (SHAP) approach for interpreting the performance of ML models. The outcome of this study indicates that the newly introduced boosting algorithms produced efficient performances, comparable to the state-of-the-art boosting algorithms and a benchmark random forest model. Interestingly, the extreme gradient boosting (XGB) proved superior to the remaining models in terms of overall and single-class performance metrics. Global and local interpretation (using SHAP with XGB) revealed that high pH water is highly correlated with high arsenic water and vice versa. In general, high pH, high Cond, and high TDS were found to be the potential indicators of high arsenic water sources. Conversely, low pH, low Cond, and low TDS were the main indicators of low arsenic water sources. This study provides new insights into the use of ML and explainable methods for arsenic modelling.
... Boosting approaches are an alternative family of ensemble algorithms which perform well in both classification and regression problems [65]. Similarly to bagging, boosting follows the learning paradigm of using simple (or "weak") ML models (classifiers/regressors), named learners, to form a powerful final model that combines their outputs. ...
... A widely used boosting technique is Adaptive Boosting (AdaBoost). AdaBoost proposes to train each weak learner in such a way that each learner focuses on the data that was misclassified by its predecessor, so that learners further down the queue iteratively learn to adapt their parameters and achieve better results [65,62]. Multiple variants of the AdaBoost algorithm exist, starting from the original one [66] designed to tackle binary classification problems, along with regression and multi-class classification options. ...
Preprint
Full-text available
Atmospheric Extreme Events (EEs) cause severe damage to human societies and ecosystems. The frequency and intensity of EEs and other associated events are increasing under current climate change and global warming risk. The accurate prediction, characterization, and attribution of atmospheric EEs is therefore a key research field, in which many groups are currently working by applying different methodologies and computational tools. Machine Learning (ML) methods have arisen in recent years as powerful techniques to tackle many of the problems related to atmospheric EEs. This paper reviews the ML algorithms applied to the analysis, characterization, prediction, and attribution of the most important atmospheric EEs. A summary of the most-used ML techniques in this area and a comprehensive critical review of the literature related to ML in EEs are provided. A number of examples are discussed, and perspectives and outlooks on the field are drawn.
... Boosting methods are a kind of ensemble algorithm that follows a special procedure for training their learners. They obtain excellent performance in both classification and regression problems (Ferreira and Figueiredo, 2012). All boosting methods establish the same structure for all the learners involved in the ensemble, that is, the same architecture, number of parameters, and input-output variables. ...
... Once the ensemble structure is defined, the learners are trained sequentially, in such a way that each new learner requires that the previous learner has been trained before; see Fig. 6. Adaptive Boosting (AdaBoost) is a kind of boosting method that proposes to train each of these machines iteratively, in such a way that each learner focuses on the data that was misclassified by its predecessor, to iteratively adapt its parameters and achieve better results (Ferreira and Figueiredo, 2012; González et al., 2020). Fig. 6 shows an outline of the AdaBoost algorithm for multi-class classification. ...
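The Fig. 6 outline referred to above is not reproduced in the excerpt; as a stand-in, here is a minimal multi-class AdaBoost example using the SAMME variant shipped with scikit-learn (the dataset and estimator count are illustrative assumptions).

```python
# Multi-class AdaBoost via the SAMME algorithm on a 3-class problem.
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)            # three classes of iris
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = AdaBoostClassifier(n_estimators=100, algorithm="SAMME", random_state=0)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```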
Article
Atmospheric low-visibility events are usually associated with fog formation. Extreme low-visibility events deeply affect air and ground transportation, airports, and motor-road facilities, causing accidents and traffic problems every year. Machine Learning (ML) algorithms have been successfully applied to many fog formation and low-visibility prediction problems. The associated problem can be formulated either as a regression or as a classification task, which has an impact on the type of ML approach to be used and on the quality of the predictions obtained. In this paper we carry out a complete analysis of low-visibility events prediction problems, formulated as both regression and classification problems. We discuss the performance of a large number of ML approaches in each type of problem, and evaluate their performance under a common comparison framework. According to the obtained results, we provide indications on the most efficient formulation for tackling low-visibility prediction and on the best-performing ML approaches for low-visibility events prediction.
... Boosting approaches are an alternative family of ensemble algorithms which has obtained excellent performance in both classification and regression problems [31]. Like bagging, boosting follows the learning paradigm of using simple or weak ML models (classifiers/regressors), named learners, to form a powerful final model by properly combining their outputs. ...
... A widely used boosting technique in the history of ensemble learning is Adaptive Boosting (AdaBoost). AdaBoost proposes to train each of these machines iteratively, in such a way that each base learner focuses on the data that was misclassified by its predecessor, to iteratively adapt its parameters and achieve better results [31,26]. Multiple variants of the AdaBoost algorithm can be found, starting from the original one [32] designed to tackle binary classification problems, along with regression and multi-class classification options. ...
Preprint
Randomization-based Machine Learning methods for prediction are currently a hot topic in Artificial Intelligence, due to their excellent performance in many prediction problems, with a bounded computation time. The application of randomization-based approaches to renewable energy prediction problems has been massive in the last few years, including many different types of randomization-based approaches, their hybridization with other techniques, and also the description of new versions of classical randomization-based algorithms, including deep and ensemble approaches. In this paper we review the most important characteristics of randomization-based machine learning approaches and their application to renewable energy prediction problems. We describe the most important methods and algorithms of this family of modeling methods, and perform a critical literature review, examining prediction problems related to solar, wind, marine/ocean and hydro-power renewable sources. We support our critical analysis with an extensive experimental study, comprising real-world problems related to solar, wind and hydro-power energy, where randomization-based algorithms are found to achieve superior results at a significantly lower computational cost than other modeling counterparts. We end our survey with a prospect of the most important challenges and research directions that remain open in this field, along with an outlook motivating further research efforts in this exciting research field.
... It was shown in [159] that H outperforms each of the weak classifiers. This procedure can be run recursively, that is, each h_i can be replaced by a boosted classifier H to achieve enhanced performance [52]. ...
... This variant is called real AdaBoost. The simplicity of AdaBoost and its ability to cope with any weak learner make it attractive, and therefore there exists a variety of methods based on the idea of AdaBoost [97,52]. ...
Book
Full-text available
In the context of advanced driver-assistance systems (ADAS), vehicles are equipped with multiple sensors to record the vehicle's environment and use intelligent algorithms to understand the data. This study contributes to the research in modern ADAS on different aspects. Methods deployed in ADAS must be accurate and computationally efficient in order to run fast on embedded platforms. We introduce a novel approach for pedestrian detection that economizes on the computational cost of cascades. We demonstrate that (a) our two-stage cascade achieves a high accuracy while running in real time, and (b) our three-stage cascade ranks as the fourth best-performing method on one of the most challenging pedestrian datasets. Another challenge faced in ADAS is the scarcity of positive training data. We introduce a novel approach that enables AdaBoost detectors to benefit from a high number of negative samples. We demonstrate that our approach ranks as the second best among its competitors on two challenging pedestrian datasets while being multiple times faster. Acquiring labeled training data is costly and time-consuming, particularly for traffic sign recognition. We investigate the use of synthetic data with the aspiration of reducing the human effort behind data preparation. We (a) algorithmically and architecturally adapt the adversarial modeling framework to the image data provided in ADAS, and (b) conduct various evaluations and discuss promising future research directions.
... whenever $\sum_{\xi=1}^{P} w_\xi h_\xi(x) = 0$ [7]. ...
... In the following, we point out the relationship between the bipolar RCAM-based ensemble classifier and the majority voting ensemble described by (7). Let y be the vector recalled by the RCAM fed by the input z(0) given by (9), that is, y is a stationary state of the RCAM. ...
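To make the tie case concrete, the NumPy sketch below (with our own illustrative weights and outputs, not the chapter's) evaluates the weighted majority vote of the reconstructed equation: bipolar outputs h_ξ(x) ∈ {−1, +1} are combined as the sign of Σ_ξ w_ξ h_ξ(x), and a zero weighted sum is exactly the undefined-vote case the excerpt refers to.

```python
# Weighted majority vote with bipolar (+1/-1) base-classifier outputs.
import numpy as np

h = np.array([[+1, -1, +1],    # outputs of P = 3 base classifiers, sample 1
              [-1, +1, +1]])   # ... and sample 2
w = np.array([0.5, 0.2, 0.3])  # illustrative per-classifier weights

score = h @ w                  # weighted sum for each sample
label = np.sign(score)         # 0 flags the undefined (tie) case, as for sample 2
print(score, label)
```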
Chapter
An ensemble method should cleverly combine a group of base classifiers to yield an improved classifier. The majority vote is an example of a methodology used to combine classifiers in an ensemble method. In this paper, we propose to combine classifiers using an associative memory model. Precisely, we introduce ensemble methods based on recurrent correlation associative memories (RCAMs) for binary classification problems. We show that an RCAM-based ensemble classifier can be viewed as a majority vote classifier whose weights depend on the similarity between the base classifiers and the resulting ensemble method. More precisely, the RCAM-based ensemble combines the classifiers using a recurrent consult and vote scheme. Furthermore, computational experiments confirm the potential application of the RCAM-based ensemble method for binary classification problems.
... Similar to other semi-supervised wrapper-based methods, first, a classifier is trained using the labeled data [22,27]. Next, multiple iterations of a training and pseudo-labeling phase are performed. ...
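A compact sketch of that wrapper loop under our own illustrative assumptions (a logistic-regression base classifier and a 0.95 confidence threshold; the cited methods differ in these specifics): train on the labeled pool, pseudo-label the most confident unlabeled samples, and retrain.

```python
# Self-training wrapper: iteratively absorb confident pseudo-labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, n_iter=5, threshold=0.95):
    model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    for _ in range(n_iter):
        if len(X_unlab) == 0:
            break
        proba = model.predict_proba(X_unlab)
        keep = proba.max(axis=1) >= threshold       # confident samples only
        if not keep.any():
            break
        pseudo = model.classes_[proba[keep].argmax(axis=1)]
        X_lab = np.vstack([X_lab, X_unlab[keep]])   # grow the labeled pool
        y_lab = np.concatenate([y_lab, pseudo])
        X_unlab = X_unlab[~keep]
        model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    return model
```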
Preprint
Labeling a module defective or non-defective is an expensive task. Hence, there are often limits on how much labeled data is available for training. Semi-supervised classifiers use far fewer labels for training models, but there are numerous semi-supervised methods, including self-labeling, co-training, maximal-margin, and graph-based methods, to name a few. Only a handful of these methods have been tested in SE for (e.g.) predicting defects, and even then, those tests have covered just a handful of projects. This paper takes a wide range of 55 semi-supervised learners and applies them to over 714 projects. We find that semi-supervised "co-training methods" work significantly better than other approaches. However, co-training needs to be used with caution, since the specific co-training method must be carefully selected based on a user's specific goals. Also, we warn that a commonly used co-training method ("multi-view", where different learners get different sets of columns) does not improve predictions, while adding considerably to the run-time costs (11 hours vs. 1.8 hours). Those cautions stated, we find that using these "co-trainers" we can label just 2.5% of the data and then make predictions that are competitive with those using 100% of the data. It is an open question worthy of future work to test if these reductions can be seen in other areas of software analytics. All the code used and datasets analyzed during the current study are available at https://GitHub.com/Suvodeep90/Semi_Supervised_Methods.
... This work implements boosting algorithms [2], [10]. Gradient boosting is an algorithm that stands out for its predictive performance and speed, especially when dealing with big and complicated datasets [11], [12]. The gradient boosting algorithm has three main components: a loss function, a weak learner, and an additive model. ...
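A from-scratch sketch of those three components for squared-error regression, under illustrative hyperparameters: the residuals act as the negative gradient of the loss, shallow trees are the weak learners, and the stage-wise sum is the additive model.

```python
# Minimal gradient boosting for regression with squared-error loss.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gb_fit(X, y, n_stages=100, lr=0.1):
    f0 = y.mean()                          # additive model starts at a constant
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_stages):
        residual = y - pred                # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
        pred += lr * tree.predict(X)       # add a shrunken correction stage
        trees.append(tree)
    return f0, trees

def gb_predict(X, f0, trees, lr=0.1):
    return f0 + lr * sum(t.predict(X) for t in trees)
```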
Conference Paper
Considering the current state of the Covid-19 pandemic, vaccine research and production are more important than ever. Antibodies recognize epitopes, which are immunogenic regions of antigen, in a very specific manner, to trigger an immune response. It is extremely difficult to predict such locations, yet they have substantial implications for complex humoral immunogenicity pathways. This paper presents a machine learning epitope prediction model. The research creates several models to test the accuracy of B-cell epitope prediction based solely on protein features. The goal is to establish a quantitative comparison of the accuracy of three machine learning models: XGBoost, CatBoost, and LightGBM. Our results found similar accuracy between the XGBoost and LightGBM models, with the CatBoost model having the highest accuracy of 82%. Though this accuracy is not high enough to be considered reliable, it does warrant further research on the subject.
... The next model then gives more attention to the points with higher weights. Models are trained until a smaller error is observed (Ferreira and Figueiredo, 2012). Fig. 4 displays the framework of the AdaBoost method. ...
Article
Full-text available
Bioethanol demands have increased during the last decade due to unexpected events worldwide. It is among the renewable energy sources that are utilized to replace fossil-fuel-based energy. Designing an efficient Sustainable Bioethanol Supply Chain Network (SBSCN) is a critical task for the government and communities to manage bioethanol demands appropriately. The contribution of this study is threefold. First, this study projects the demand using three popular machine learning methods, Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Ensemble Learning (AdaBoost), to find the best one to be utilized as an input to the proposed mathematical model, which finds optimal values for strategic, planning, and tactical decision variables. Second, the unemployment rate is considered an important parameter of the model to maximize the social effects. Third, since the proposed model of this study is NP-hard, the CPLEX solver is applied for small-size problems, and two meta-heuristic algorithms, Non-dominated Sorting Genetic Algorithm II (NSGA-II) and Multi-Objective Invasive Weed Optimization (MOIWO), are considered for medium- and large-size problems to find Pareto-optimal solutions. Due to the sensitivity of the two meta-heuristic algorithms, the Taguchi method, with metrics such as Max Spread (MS) and Mean Ideal Distance (MID), is utilized to control the parameters of the two applied algorithms. Since North Dakota (ND) is among the states with the most potential to produce bioethanol, due to its vast land area including marginal cropland and Conservation Reserve Program (CRP) land, the proposed model has been evaluated and validated on the ND case study. The results show that the MOIWO algorithm outperforms NSGA-II on the proposed model and the case study of this paper. This algorithm is also more reliable in terms of solution quality. Finally, some research directions are discussed for future studies, and managerial insights are provided.
... In this sense, the SVM is a base model, and new models can focus only on the cases about which the SVM is uncertain, i.e., the support vectors. According to boosting theory [33], [34], we can always combine the SVM and the new models to gain better performance. ...
Preprint
Performance of speaker recognition systems is evaluated on test trials. Although as crucial as rulers for tailors, trials have not been carefully treated so far, and most existing benchmarks compose trials by naive cross-pairing. In this paper, we argue that the cross-pairing approach produces overwhelming easy trials, which in turn leads to potential bias in system and technique comparison. To solve the problem, we advocate more attention to hard trials. We present an SVM-based approach to identifying hard trials and use it to construct new evaluation sets for VoxCeleb1 and SITW. With the new sets, we can re-evaluate the contribution of some recent technologies. The code and the identified hard trials will be published online at http://project.cslt.org.
... The above method borrows a boosting concept, which is based on the idea that a combination of simple classifiers can have better performance than any of the simple classifiers alone [15]. With the same training data, a simple classifier (basic learner) is able to produce labels with a probability of error. ...
Article
Full-text available
Introduction: Obstructive sleep apnea (OSA) can cause serious health problems such as hypertension or cardiovascular disease. The manual detection of apnea is a time-consuming task, and automatic diagnosis is much more desirable. The contribution of this work is to detect OSA using a multi-error-reduction (MER) classification system with multi-domain features from bio-signals. Methods: Time-domain, frequency-domain, and non-linear analysis features are extracted from oxygen saturation (SaO2), ECG, airflow, thoracic, and abdominal signals. To analyse the significance of each feature, we design a two-stage feature selection. Stage 1 is the statistical analysis stage, and Stage 2 is the final feature subset selection stage using machine learning methods. In Stage 1, two statistical analyses (the one-way analysis of variance (ANOVA) and the rank-sum test) provide a list of the significance level of each kind of feature. Then, in Stage 2, the support vector machine (SVM) algorithm is used to select a final feature subset based on the significance list. Next, an MER classification system is constructed, which applies a stacking with a structure that consists of base learners and an artificial neural network (ANN) meta-learner. Results: The Sleep Heart Health Study (SHHS) database is used to provide bio-signals. A total of 66 features are extracted. In the experiment that involves a duration parameter, 19 features are selected as the final feature subset because they provide a better and more stable performance. The SVM model shows good performance (accuracy = 81.68%, sensitivity = 97.05%, and specificity = 66.54%). It is also found that classifiers have poor performance when they predict normal events in less than 60 s. In the next experiment stage, the time-window segmentation method with a length of 60s is used. After the above two-stage feature selection procedure, 48 features are selected as the final feature subset that give good performance (accuracy = 90.80%, sensitivity = 93.95%, and specificity = 83.82%). To conduct the classification, Gradient Boosting, CatBoost, Light GBM, and XGBoost are used as base learners, and the ANN is used as the meta-learner. The performance of this MER classification system has the accuracy of 94.66%, the sensitivity of 96.37%, and the specificity of 90.83%.
... The gentle term comes from the fact that Gentle AdaBoost is considered more conservative and stable than the original AdaBoost [29]. However, according to [87], AdaBoost still outperforms Gentle AdaBoost in FER. ...
Article
Full-text available
Facial expressions are among the most powerful ways to reveal the emotional state. Therefore, Facial Expression Recognition (FER) has been widely introduced into a wide range of application fields, such as security, psychotherapy, neuromarketing, and advertisement. Feature extraction and selection are two essential key issues for the design of efficient FER systems. However, most of the previous studies focused on implementing static feature selection methods. Although these methods have shown promising results, they still present weaknesses, especially when dealing with spontaneous expressions. This is mainly due to the specificity of each face, which makes the facial emotion display differ from one subject to another. To address this problem, we propose a face-based dynamic feature selection of two geometric feature sub-classes, namely linear and eccentricity features. This combination provides a better understanding of the facial transformation during the emotion display. Moreover, the suggested selection method takes into consideration the subject's general facial structure, muscle movements, and head position. The performed experiments, using the CK+ and the DISFA datasets, have shown that the proposed method outperforms the state-of-the-art techniques and maintains superior performance with cross-dataset validation. In fact, the accuracy of facial expression recognition by the proposed method reaches 97.72% and 91.26% on the CK+ and the DISFA datasets, respectively.
... The generalization ability of an ensemble is usually greater than that of its base learners. Taking all factors into consideration, ensemble strategies are employed mainly because they can take weak learners, which are only slightly better than random guessing, and combine them into a model that makes extremely accurate predictions (Ferreira and Figueiredo 2012). Therefore, base learners are also called dependent learners. ...
Article
Artificial intelligence (AI) technology can be used to predict future values, especially in the healthcare industry. With the improvement of AI promotion strategies (bringing exploration into different fields), it is not difficult to predict certain advances. This article describes the use of AI for patients with chronic kidney disease (ckd) and for patients without kidney disease (notckd). Five basic machine learning (ML) classifiers were used, and the confusion matrix and ROC curve were used to check their accuracy. Ensemble methods were also applied, with the accuracy of four ensemble classifiers confirmed through the confusion matrix, accuracy, recall, F1 score, and support values. Finally, the accuracy results obtained by the basic classifiers and by the ensemble classifiers are analyzed; the ensemble classifiers achieve higher accuracy than the basic classifiers. Just as it is wiser to gather several responses to a query before committing to a single choice, the ensemble methods prove better at reaching the decisive goal than the individual well-known classifiers.
... The ensemble of neural networks has been demonstrated to be successful in improving the predictive performance of machine learning models [42]. There are broadly two methods of generating ensembles: (i) randomization-based approaches, where the ensembles can be trained in parallel without any interaction, and (ii) boosting-based approaches, where the ensembles are trained sequentially [43]. The randomization procedure for generating ensembles of neural networks should be such that predictions from individual models are de-correlated and each individual model is strong (i.e., has high accuracy). ...
Article
Full-text available
Recently, computational modeling has shifted towards the use of statistical inference, deep learning, and other data-driven modeling frameworks. Although this shift in modeling holds promise in many applications like design optimization and real-time control by lowering the computational burden, training deep learning models needs a huge amount of data. This big data is not always available for scientific problems and leads to poorly generalizable data-driven models. This gap can be bridged by leveraging information from physics-based models. Exploiting prior knowledge about the problem at hand, this study puts forth a physics-guided machine learning (PGML) approach to build more tailored, effective, and efficient surrogate models. For our analysis, without losing its generalizability and modularity, we focus on the development of predictive models for laminar and turbulent boundary layer flows. In particular, we combine the self-similarity solution and power-law velocity profile (low-fidelity models) with the noisy data obtained either from experiments or computational fluid dynamics simulations (high-fidelity models) through a concatenated neural network. We illustrate how the knowledge from these simplified models results in reducing uncertainties associated with deep learning models applied to boundary layer flow prediction problems. The proposed multi-fidelity information fusion framework produces physically consistent models that attempt to achieve better generalization than data-driven models obtained purely based on data. While we demonstrate our framework for a problem relevant to fluid mechanics, its workflow and principles can be adopted for many scientific problems where empirical, analytical, or simplified models are prevalent. In line with grand demands in novel PGML principles, this work builds a bridge between extensive physics-based theories and data-driven modeling paradigms and paves the way for using hybrid physics and machine learning modeling approaches for next-generation digital twin technologies.
... Decision-tree ensemble methods are supervised learning methods for modeling the relationship between the dependent variable y and the feature vector x. These techniques are also a common choice in the current machine learning research landscape, with a wide range of applications in regression, classification, and other tasks [48], [49]. ...
Preprint
Full-text available
Most representative decision tree ensemble methods have been used to examine the variable importance of Treasury term spreads in predicting US economic recessions, with a balance toward generating rules for US economic recession detection. A strategy is proposed for training the classifiers with Treasury term spread data, and the results are compared in order to select the best model for interpretability. We also discuss the use of the SHapley Additive exPlanations (SHAP) framework to understand US recession forecasts by analyzing feature importance. Consistent with the existing literature, we find the most relevant Treasury term spreads for predicting US economic recession and a methodology for detecting relevant rules for economic recession detection. In this case, the most relevant term spread found is 3 month to 6 month, which we propose should be monitored by economic authorities. Finally, the methodology detected rules with high lift for predicting economic recession that can be used by these entities for this purpose. This latter result stands in contrast to a growing body of literature demonstrating that machine learning methods are useful for interpretation when comparing many alternative algorithms; we discuss the interpretation of our result and propose further research lines aligned with this work.
... Decision-tree ensemble methods are supervised learning methods for modeling the relationship between the dependent variable y and the feature vector x. These techniques are also a common choice in the current machine learning research landscape, with a wide range of applications in regression, classification, and other tasks [48], [49]. ...
Article
Full-text available
Most representative decision-tree ensemble methods have been used to examine the variable importance of Treasury term spreads in predicting US economic recessions, with a balance toward generating rules for US economic recession detection. A strategy is proposed for training the classifiers with Treasury term spread data, and the results are compared in order to select the best model for interpretability. We also discuss the use of the SHapley Additive exPlanations (SHAP) framework to understand US recession forecasts by analyzing feature importance. Consistent with the existing literature, we find the most relevant Treasury term spreads for predicting US economic recession and a methodology for detecting relevant rules for economic recession detection. In this case, the most relevant term spread found is the 3-month–6-month spread, which we propose should be monitored by economic authorities. Finally, the methodology detected rules with high lift for predicting economic recession that can be used by these entities for this purpose. This latter result stands in contrast to a growing body of literature demonstrating that machine learning methods are useful for interpretation when comparing many alternative algorithms; we discuss the interpretation of our result and propose further research lines aligned with this work.
... For the purpose of improving prediction performance, this article conducts a study of ensemble methods from the boosting family, which have proven efficient in several areas. Boosting is a supervised machine learning method (Ferreira et al., 2012) that consists of aggregating classifiers developed sequentially on a learning sample to predict new observations. The task of a boosting algorithm is to learn from the iterative application of weak classifiers, correcting the errors of the last classifier to obtain a more accurate one (Mayr et al., 2014). ...
Article
The student profile has become an important component of education systems. Many system objectives, such as e-recommendation, e-orientation, e-recruitment, and dropout prediction, are essentially based on the profile for decision support. Machine learning plays an important role in this context, and several studies have been carried out for classification, prediction, or clustering purposes. In this paper, the authors present a comparative study between different boosting algorithms, which have been used successfully in many fields and for many purposes. In addition, the authors applied the feature selection methods Fisher Score and Information Gain, combined with Recursive Feature Elimination, to enhance the preprocessing task and the models' performance. Using a multi-label dataset to predict the class of student performance in mathematics, the results show that the Light Gradient Boosting Machine (LightGBM) algorithm achieved the best performance when using Information Gain with the Recursive Feature Elimination method, compared to the other boosting algorithms.
... The ensemble of neural networks has been demonstrated to be successful in improving the predictive performance of machine learning models [41]. There are broadly two methods of generating ensembles, (i) randomization-based approaches where the ensembles can be trained in parallel without any interaction, and (ii) boosting-based approaches where the ensembles are trained sequentially [42]. The randomization procedure for generating ensembles of neural networks should be such that prediction from individual models are de-correlated and each individual models are strong (i.e., high accuracy). ...
Preprint
Full-text available
Recently, computational modeling has shifted towards the use of deep learning and other data-driven modeling frameworks. Although this shift in modeling holds promise in many applications like design optimization and real-time control by lowering the computational burden, training deep learning models needs a huge amount of data. This big data is not always available for scientific problems and leads to poorly generalizable data-driven models. This gap can be bridged by leveraging information from physics-based models. Exploiting prior knowledge about the problem at hand, this study puts forth a concatenated neural network approach to build more tailored, effective, and efficient machine learning models. For our analysis, without losing its generalizability and modularity, we focus on the development of predictive models for laminar and turbulent boundary layer flows. In particular, we combine the self-similarity solution and power-law velocity profile (low-fidelity models) with the noisy data obtained either from experiments or computational fluid dynamics simulations (high-fidelity models) through a concatenated neural network. We illustrate how the knowledge from these simplified models results in reducing uncertainties associated with deep learning models. The proposed framework produces physically consistent models that attempt to achieve better generalization than data-driven models obtained purely based on data. While we demonstrate our framework for a problem relevant to fluid mechanics, its workflow and principles can be adopted for many scientific problems where empirical models are prevalent. In line with grand demands in novel physics-guided machine learning principles, this work builds a bridge between extensive physics-based theories and data-driven modeling paradigms and paves the way for using hybrid modeling approaches for next-generation digital twin technologies.
... It is a powerful ensemble intended for classification problems. The first boosting system was developed in 1996 by Yoav Freund and Robert Schapire [34]. It combines many low-accuracy models to compensate for the shortcomings of predecessors, resulting in a robust classifier. ...
Article
Full-text available
One of the crucial challenges of the distribution network is an unintentionally isolated section of electricity from the power network, called unintentional islanding. Unintentional-islanding detection is most challenging when the local generation is equal to or closely matches the load requirement. In this paper, both ensemble learning and canonical methods are implemented for the islanding detection of synchronous machine-based distributed generation. The ensemble learning models for this study are random forest (RF) and AdaBoost, while the canonical methods are the multi-layer perceptron (MLP), decision tree (DT), and support vector machine (SVM). The training and testing parameters for this technique are the total harmonic distortion (THD) of both current and voltage signals. THD is the most important parameter for power quality monitoring under islanding scenarios. The parameter and data extraction from the test system is executed in a MATLAB/Simulink environment, whereas the training and testing of the presented techniques are implemented in Python. Performance indices such as accuracy, precision, recall, and F1 score are used for evaluation, and both the ensemble learning models and the canonical models demonstrate good performance. AdaBoost shows the highest accuracy among all five models with the original data, while RF is robust and gives the best results with noisy data (20 and 30 dB) because of its ensemble nature.
... The GBT model was introduced as a robust machine learning algorithm based on the combination of weak learners and the gradient algorithm (Ferreira and Figueiredo, 2012). Similar to the RF algorithm, GBT is also a decision-tree-based algorithm, but RF uses the bagging technique whereas GBT uses the boosting technique to improve the accuracy of the model (Dietterich, 2000; Sutton, 2005). ...
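That bagging-vs-boosting distinction between the two tree ensembles can be checked directly; the sketch below compares scikit-learn's RandomForestRegressor (bagging) and GradientBoostingRegressor (boosting) on synthetic data standing in for a price series, with hyperparameters left at illustrative defaults.

```python
# Random forest (bagging) vs. gradient-boosted trees (boosting) on regression.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

for name, model in [("RF  (bagging)", RandomForestRegressor(random_state=0)),
                    ("GBT (boosting)", GradientBoostingRegressor(random_state=0))]:
    mae = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_absolute_error").mean()
    print(f"{name}: CV MAE = {mae:.2f}")
```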
Article
Copper is one of the most valuable natural resources, and it is widely used in many different industries. The complicated fluctuations of copper prices can significantly affect other industries. Therefore, this study aims to develop and propose several forecast models for forecasting monthly copper prices based on various machine learning algorithms, including the multi-layer perceptron (MLP) neural network, k-nearest neighbors (KNN), support vector machine (SVM), gradient boosting tree (GBT), and random forest (RF). The monthly copper price dataset from January 1990 to December 2019 was collected for this aim, together with other metals and natural gas prices. In addition, the influence of the currency exchange rates of the countries with the largest copper production around the world was also taken into account and used as input variables for forecasting the copper price. Different sets of predictors (t, t-1, t-2, t-3, t-4, t-5) were considered to forecast monthly copper prices based on the mentioned machine learning techniques. The results revealed that the currency exchange rates of the countries with the most abundant copper production have a significant effect on the volatility of monthly copper prices in the world, and they should be used to forecast monthly copper prices in the future. A comprehensive comparison of various machine learning techniques shows that the MLP neural network (with deep learning techniques) is the best method for forecasting the monthly copper price, with an MAE of 228.617 and RMSE of 287.539. The other models, such as SVM, RF, KNN, and GBT, produced higher errors, with MAE in the range of 308.691–453.147 and RMSE in the range of 393.599–552.208. In this sense, the MLP neural network can be used as a reliable tool to forecast copper prices in the future.
... Adaptive Boosting, or AdaBoost, is a boosting algorithm whose key idea is to use weighted variants of the same training dataset rather than sub-samples, as other boosting techniques do (Freund and Schapire, 1995; 1996; 1997). The advantage of this idea is that the algorithm does not require massive data, since it repeatedly uses the same training dataset (Ferreira and Figueiredo, 2012). Hastie et al. (2009) note that the algorithm is well known and trusted for building ensemble classifiers that produce excellent results. ...
Article
Full-text available
The consequences of collapsed stopes can be dire in the mining industry. This can lead to the revocation of a mining license in most jurisdictions, especially when the harm costs lives. Therefore, as a mine planning and technical services engineer, it is imperative to estimate the stability status of stopes. This study has attempted to produce a stope stability prediction model adapted from the stability graph using ensemble learning techniques. This study was conducted using 472 case histories from 120 stopes of AngloGold Ashanti Ghana, Obuasi Mine. Random Forest, Gradient Boosting, Bootstrap Aggregating and Adaptive Boosting classification algorithms were used to produce the models. A comparative analysis was done using six classification performance metrics, namely Accuracy, Precision, Sensitivity, F1-score, Specificity and Matthews Correlation Coefficient (MCC), to determine which ensemble learning technique performed best in predicting the stability of a stope. The Bootstrap Aggregating model obtained the highest MCC score of 96.84% while the Adaptive Boosting model obtained the lowest score. The Specificity scores in decreasing order of performance were 98.95%, 97.89%, 96.32% and 95.26% for Bootstrap Aggregating, Gradient Boosting, Random Forest and Adaptive Boosting respectively. The results showed equal Accuracy, Precision, F1-score and Sensitivity scores of 97.89% for the Bootstrap Aggregating model, while the same observation was made for Adaptive Boosting, Gradient Boosting and Random Forest with 90.53%, 92.63% and 95.79% scores respectively. At a 95% confidence interval using the Wilson Score Interval, the results showed that the Bootstrap Aggregating model produced the minimal error and hence was selected as the alternative stope design tool for predicting the stability status of stopes. Keywords: Stope Stability, Ensemble Learning Techniques, Stability Graph, Machine Learning
... A boosted cascade classifier consists of stages, each with an ensemble of Weak Learners (WLs). A WL is a learning algorithm that produces a classifier that can label data with above-chance accuracy [Ferreira & Figueiredo, 2012; Freund & Schapire, 1995; Vaghela et al., 2009]. Its performance benefits from an efficient classifier structure and the use of fast-to-compute hand-crafted features. ...
Thesis
Full-text available
This thesis aims to develop a monocular vision system to track an Unmanned Aerial Vehicle (UAV) pose (3D position and orientation) relative to the camera reference frame during its landing on a ship. The vast majority of accidents and incidents occur during take-off or landing since, in most systems, an external pilot takes control. Having less human intervention increases system reliability and alleviates the need for certified pilots. Due to UAV size and weight, take-off is easily performed by hand, so the main focus is on the landing maneuver. The vision system is located on the ship's deck, which reduces demands on the UAV's processing power, size, and weight. The proposed architecture is based on an Unscented Particle Filter (UPF) scheme with two stages: (i) pose boosting, and (ii) tracking. In the pose boosting stage, we detect the UAV on the captured frame using Deep Neural Networks (DNNs) and initialize a set of pose hypotheses that are likely to describe the true pose of the target, using a pre-trained database indexed by bounding boxes. In the tracking stage, we use a UPF-based approach to obtain an online estimate of the true pose of the target. Contrary to many vision-based particle filters that sample particles from a distribution based solely on predictions from previous frames, in this work we also use information from the current frame to improve the convergence of the filter. We fuse information from current and previous time steps with Unscented Transform (UT) filters, and use, for the first time in this type of problem, the Bingham and Bingham-Gauss distributions to model the dynamics and noise of the orientation in its natural manifold. These filters depend on the computation of importance weights that use sub-optimal approximations to the likelihood function. We evaluate different similarity metrics that compute a distance measure between an artificially rendered image of the hypothetical state of the system and the captured frame. Since we are approximating the likelihood function, we enrich the filter with additional refinement steps to reduce its sub-optimality. We have developed a "realistic" simulator for a quantitative analysis of the results. The entire description and experimental analysis of the system is based on the tracking error and processing time. When analyzing a landing sequence with a real sky gradient filled with clouds, we obtained approximately 81% less rotation error using the Unscented Bingham Filter (UBiF) and the Unscented Bingham-Gauss Filter (UBiGaF) when compared to the simple Unscented Kalman Filter (UKF) without pose optimization. When pose optimization is used, the rotation error can be decreased by more than 50%.
... Gradient Boosted Trees is a machine learning technique for regression and classification problems [20]. Boosting is a method of transforming a weak learner into a strong learner [21]; it does so gradually, over successive iterations. ...
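To make the gradual, iterative transformation concrete, the following sketch (an illustration of the general technique, not code from the cited works) fits each new small tree to the residuals left by the current ensemble, which is gradient boosting under squared-error loss:

```python
# Gradient boosting for squared-error regression: each round fits a
# shallow tree to the residuals (the negative gradient of the loss).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

learning_rate, n_rounds = 0.1, 100
F = np.full_like(y, y.mean())       # F_0: constant initial prediction
trees = []
for _ in range(n_rounds):
    residuals = y - F               # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    F += learning_rate * tree.predict(X)
    trees.append(tree)

print("training MSE:", np.mean((y - F) ** 2))
```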
... It is a strong ensemble method designed for classification problems. Freund and Schapire introduced AdaBoost, the first practical boosting method, in 1995/1996 [46], [47]. It combines several low-accuracy models, each compensating for the weaknesses of its predecessors, to obtain a strong classifier. ...
Article
Full-text available
The specific characteristics and operations of microgrids cause protection problems due to the high penetration of distributed energy resources. To resolve these issues, the proposed scheme employs the Hilbert transform and a data mining approach to protect the microgrid. First, the Hilbert transform is used to preprocess the faulted voltage and current signals to extract sensitive fault features. Then, the obtained dataset of extracted features is input to a logistic regression classifier for fault detection. Later, fault classification is done by training an AdaBoost classifier. In the proposed scheme, the simulation results for feature extraction are evaluated on a standard International Electrotechnical Commission (IEC) medium-voltage microgrid modeled in the MATLAB/Simulink software environment, whereas Python is used for training and testing the data mining model. The results are evaluated under grid-connected and islanded modes for both looped and radial configurations by simulating various fault and no-fault cases. The results show that the accuracy of the proposed logistic regression and AdaBoost classifiers is higher than that of decision tree, support vector machine, and random forest methods. The results further validate the robustness of the proposed method against measurement noise.
... Among these algorithms, boosting algorithms like AdaBoost are considered especially effective for classification problems because of their ability to reduce both variance and bias in the predicted outputs, thereby providing more accurate results [18,79]. In addition, the present work also employs the standalone C4.5 decision tree (DT) algorithm and the support vector machine (SVM) algorithm with a radial basis function (RBF) kernel [15,47] for the task of seizure detection. ...
Article
The present paper proposes a smart framework for detection of epileptic seizures using the concepts of IoT technologies, cloud computing and machine learning. This framework processes the acquired scalp EEG signals by Fast Walsh Hadamard transform. Then, the transformed frequency-domain signals are examined using higher-order spectral analysis to extract amplitude and entropy-based statistical features. The extracted features have been selected by means of correlation-based feature selection algorithm to achieve more real-time classification with reduced complexity and delay. Finally, the samples containing selected features have been fed to ensemble machine learning techniques for classification into several classes of EEG states, viz. normal, interictal and ictal. The employed techniques include Dagging, Bagging, Stacking, MultiBoost AB and AdaBoost M1 algorithms in integration with C4.5 decision tree algorithm as the base classifier. The results of the ensemble techniques are also compared with standalone C4.5 decision tree and SVM algorithms. The performance analysis through simulation results reveals that the ensemble of AdaBoost M1 and C4.5 decision tree algorithms with higher-order spectral features is an adequate technique for automated detection of epileptic seizures in real-time. This technique achieves 100% classification accuracy, sensitivity and specificity values with optimally small classification time.
... In classification problems, it is often easier to train several simple classifiers and then combine them into a more complex classifier than to train a single complex classifier directly. In other words, boosting can be viewed as a process of repeatedly applying differently weighted versions of the training data to a weak learning algorithm [36]. Boosting algorithms include AdaBoost, generic boosting, XGBoost (extreme gradient boosting), stochastic gradient boosting, and bipartite RankBoost. ...
Article
This work presents a multibiometric framework to improve the recognition rate and reduce the error rate using a bin-based classifier for score-level (multi-algorithm) fusion. The bin-based classifier is used as the combination rule that integrates the matching scores from two distinct modalities, namely iris and face. An optimization technique, PSO, is utilized to minimize unwanted information after combining the feature sets of the iris and face obtained with different feature extraction algorithms such as PCA, LDA and LBP. The test results demonstrate that the multibiometric system, as a bin-based classifier employing multi-algorithm score-level fusion, provides better outcomes than other fusion rules such as likelihood-ratio-based fusion, linear discriminant analysis (LDA) and support vector machine (SVM). The experimental results on face (ORL, BANCA, FERET) and iris (CASIA, UBIRIS) databases show that the proposed multimodal system derived from the CBBC (continuous bin-based classifier) with PSO as the optimization technique achieves EER = 0.012, which outperforms the other fusion techniques, with EER = 0.018 for SVM (RBF) and EER = 0.02 for linear SVM.
... A classifier is termed "weak" when it performs only slightly better than random guessing. The logic behind these methods is that combining several simple classifiers into a more complex one is easier than constructing a single complex learner (Ferreira and Figueiredo, 2012). ...
Preprint
Full-text available
The interest in banks' bankruptcy prediction has rapidly increased, especially after the 2008-2009 global financial crisis. The serious consequences of bankruptcy cases have indeed highlighted the necessity for managers and regulators to develop and adopt appropriate early warning systems. The purpose of this paper is therefore to conduct a literature review of recent empirical contributions on banks' default prediction by analysing three underlying aspects: the definition of default and financial distress, the application of statistical and intelligent techniques, and variable selection. The review also proposes some possible upgrades to promote future research on the topic, i.e. pointing out the potential role of non-financial information as a good default predictor.
... Boosting algorithms come in different flavors, varying in the type of learners or in how the weights are updated [Ferreira and Figueiredo, 2012; Schapire, 2003]. Here we focus on boosting with a decision tree model as the weak learner. ...
Preprint
Full-text available
Understanding how neurons cooperate to integrate sensory inputs and guide behavior is a fundamental problem in neuroscience. A large body of methods has been developed to study neuronal firing at the single-cell and population levels, generally seeking interpretability as well as predictivity. However, these methods are usually confronted with the lack of ground truth necessary to validate the approach. Here, using neuronal data from the head-direction (HD) system, we present evidence demonstrating how gradient boosted trees, a non-linear and supervised machine learning tool, can learn the relationship between behavioral parameters and neuronal responses with high accuracy by optimizing the information rate. Interestingly, and unlike other classes of machine learning methods, the intrinsic structure of the trees can be interpreted in relation to behavior (e.g. to recover the tuning curves) or to study how neurons cooperate with their peers in the network. We show how the method, unlike linear analysis, reveals that the coordination in thalamo-cortical circuits is qualitatively the same during wakefulness and sleep, indicating a brain-state independent feed-forward circuit. Machine learning tools thus open new avenues for benchmarking model-based characterization of spike trains. Author summary: The thalamus is a brain structure that relays sensory information to the cortex and mediates cortico-cortical interaction. Unraveling the dialogue between the thalamus and the cortex is thus a central question in neuroscience, with direct implications for our understanding of how the brain operates at the macro scale and of the neuronal basis of brain disorders that possibly result from impaired thalamo-cortical networks, such as absence epilepsy and schizophrenia. Methods that are classically used to study the coordination between neuronal populations are usually sensitive to the ongoing global dynamics of the networks, in particular desynchronized (wakefulness and REM sleep) and synchronized (non-REM sleep) states. They thus fail to capture the underlying temporal coordination. By analyzing recordings of thalamic and cortical neuronal populations of the HD system in freely moving mice during exploration and sleep, we show how a general non-linear encoder captures a brain-state independent temporal coordination, with thalamic neurons leading their cortical targets by 20-50 ms in all brain states. This study thus demonstrates how methods that do not assume any model of neuronal activity may be used to reveal important aspects of neuronal dynamics and coordination between brain regions.
... Many surveys have been published owing to the remarkable properties of ensemble-based models [10,12,17,18,21,22,23,32,33]. Most ensemble-based models are thoroughly studied in Zhou's book [34]. ...
Preprint
Full-text available
The gradient boosting machine is one of the most powerful tools for solving regression problems. In order to cope with its shortcomings, an approach for constructing ensembles of gradient boosting models is proposed. The main idea behind the approach is to use the stacking algorithm in order to learn a second-level meta-model which can be regarded as a model for implementing various ensembles of gradient boosting models. First, the linear regression of the gradient boosting models is considered as the simplest realization of the meta-model, under the condition that the linear model is differentiable with respect to its coefficients (weights). Then it is shown that the proposed approach can be simply extended to arbitrary differentiable combination models, for example, neural networks, which are differentiable and can implement arbitrary functions of gradient boosting models. Various numerical examples illustrate the proposed approach.
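As a rough sketch of the stacking idea described in this abstract (the general technique only; the base models, meta-model, and data below are my own illustrative choices, not the authors' implementation), scikit-learn can combine several gradient boosting models under a linear second-level model:

```python
# Stacking gradient boosting models: a linear meta-model learns a
# weighted combination of the base GBMs' out-of-fold predictions.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0,
                       random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base_models = [
    ("gb_shallow", GradientBoostingRegressor(max_depth=2, random_state=0)),
    ("gb_deep", GradientBoostingRegressor(max_depth=4, random_state=0)),
    ("gb_slow", GradientBoostingRegressor(learning_rate=0.05, random_state=0)),
]
stack = StackingRegressor(estimators=base_models,
                          final_estimator=LinearRegression())
stack.fit(X_tr, y_tr)
print("R^2 on test data:", stack.score(X_te, y_te))
```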
Article
Full-text available
The implementation of tree-ensemble models has become increasingly essential in solving classification and prediction problems. Boosting ensemble techniques have been widely used as individual machine learning algorithms in predicting house prices. One such technique is the LGBM algorithm, which employs a leaf-wise growth strategy that reduces loss and improves accuracy during training but is prone to overfitting. The XGBoost algorithm, in contrast, uses a level-wise growth strategy, which results in higher computation time; however, XGBoost has a regularization parameter and implements column sampling and weight reduction on new trees, which combats overfitting. This study focuses on developing a hybrid LGBM and XGBoost model in order to prevent overfitting by minimizing variance while improving accuracy. A Bayesian hyperparameter optimization technique is applied to the base learners in order to find the best combination of hyperparameters; this reduced variance (overfitting) in the hybrid model since the regularization parameter values were optimized. The hybrid model is compared to the LGBM, XGBoost, AdaBoost and GBM algorithms to evaluate its performance in giving accurate house price predictions using the MSE, MAE and MAPE evaluation metrics. The hybrid LGBM and XGBoost model outperformed the other models with MSE, MAE and MAPE of 0.193, 0.285, and 0.156 respectively.
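The abstract does not spell out the hybridization mechanism; one simple and common way to hybridize the two boosters, sketched below assuming the `lightgbm` and `xgboost` packages are available (the blending grid and hyperparameters are illustrative, not the authors'), is to blend their predictions with a weight chosen on a validation split:

```python
# Illustrative LGBM + XGBoost hybrid: blend the two models' predictions
# with a mixing weight selected for lowest validation MSE.
import numpy as np
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=15, noise=5.0,
                       random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

lgbm = LGBMRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
xgb = XGBRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)

p_lgbm, p_xgb = lgbm.predict(X_val), xgb.predict(X_val)
weights = np.linspace(0.0, 1.0, 21)          # candidate LGBM weights
mse = [np.mean((w * p_lgbm + (1 - w) * p_xgb - y_val) ** 2)
       for w in weights]
best_w = weights[int(np.argmin(mse))]
print(f"best LGBM weight: {best_w:.2f}, validation MSE: {min(mse):.3f}")
```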
Article
Creep lifetime prediction is critical for the design of high-temperature components. Because creep lifetime is affected by many factors, predicting it with high accuracy is still challenging. The 9% Cr martensitic heat-resistant steel is currently the world's most widely used creep-resistant steel in supercritical power plant equipment. In this work, variables such as material chemical composition, heat treatment conditions and creep test conditions are considered in various machine learning (ML) models to predict creep lifetime. First, a series of typical individual regression algorithms is assessed, but the prediction results are imperfect. Second, several ensemble learning algorithms are optimized by bagging and boosting, and a noticeable improvement in predictive performance is observed, especially for the extreme gradient boosting algorithm. Finally, a model coupled with the Larson-Miller (LM) parameter is proposed based on stacking, which gives the best prediction results. The R-square (R²), mean absolute error (MAE), and mean square error (MSE) of the proposed model are 0.918, 0.516, and 0.450, respectively.
Article
Accurate predictions for buildings’ energy performance (BEP) are crucial for retrofitting investment decisions and building benchmarking. With the increasing data availability and popularity of machine learning across disciplines, research started to investigate machine learning for BEP predictions. While stand-alone machine learning models showed first promising results, a comprehensive analysis of advanced ensemble models to increase prediction accuracy is missing for annual BEP predictions. We implement and thoroughly tune twelve machine learning models to bridge this research gap, ranging from stand-alone to homogeneous and heterogeneous ensemble learning models. Based on an extensive real-world dataset of over 25,000 German residential buildings, we benchmark their prediction accuracy. The results provide strong evidence that ensemble models substantially outperform stand-alone machine learning models both on average and in case of the best-performing model. All models are tested for robustness and systematic bias by evaluating their prediction performance along different building age classes, living space bins, and several error measures. Extreme gradient boosting as ensemble model exhibits the highest prediction accuracy, followed by a multilayer perceptron ahead of further ensemble models. We conclude that ensemble models for annual BEP prediction are advantageous compared to stand-alone models and outperform their results in most cases.
Article
Multiple machine learning models were developed in this study to optimize biodiesel production from waste cooking oil in a heterogeneous catalytic reaction mode. Several input parameters were considered for the models, including reaction temperature, reaction time, catalyst loading and methanol/oil molar ratio, whereas the biodiesel production yield (%) was the only output. Three ensemble models were utilized in this study for optimization of the yield: Boosted Linear Regression, Boosted Multi-layer Perceptron, and Forest of Randomized Trees. We then found the optimized configuration, namely the hyper-parameters, for each model; this critical task was done by running more than 1000 combinations of hyper-parameters. The R² scores for Boosted Linear Regression, Boosted Multi-layer Perceptron, and Forest of Randomized Trees were 0.926, 0.998, and 0.992, respectively. The MAPE criterion revealed that the error rates for the three models were 5.68×10⁻², 5.20×10⁻², and 9.83×10⁻², respectively. Furthermore, utilizing the input vector (X1=165, X2=5.72, X3=5.55, X4=13.0), the proposed technique produces an ideal output value of 96.7% as the optimum yield in the catalytic production of biodiesel from waste cooking oil.
Preprint
Full-text available
Classification predictive modeling involves the accurate assignment of observations in a dataset to target classes or categories. Real-world classification problems with severely imbalanced class distributions are increasingly common. In this case, minority classes have far fewer observations to learn from than majority classes. Despite this sparsity, a minority class is often considered the more interesting class, yet developing a learning algorithm suitable for its observations presents countless challenges. In this article, we propose a novel multi-class classification algorithm specialized to handle severely imbalanced classes, based on a method we refer to as SAMME.C2. It blends the flexible mechanics of the boosting techniques from the SAMME algorithm, a multi-class classifier, and the Ada.C2 algorithm, a cost-sensitive binary classifier designed to address severe class imbalance. We provide not only the resulting algorithm but also a scientific and statistical formulation of the proposed SAMME.C2 algorithm. Through numerical experiments examining various degrees of classifier difficulty, we demonstrate the consistently superior performance of our proposed model.
Article
Full-text available
Using telematics technology, insurers are able to capture a wide range of data to better decode driver behavior, such as distance traveled and how drivers brake, accelerate, or make turns. Such additional information also helps insurers improve risk assessments for usage-based insurance, a recent industry innovation. In this article, we explore the integration of telematics information into a classification model to determine driver heterogeneity. For motor insurance during a policy year, we typically observe a large proportion of drivers with zero accidents, a lower proportion with exactly one accident, and a far lower proportion with two or more accidents. We here introduce a cost-sensitive multi-class adaptive boosting (AdaBoost) algorithm we call SAMME.C2 to handle such class imbalances. We calibrate the algorithm using empirical data collected from a telematics program in Canada and demonstrate an improved assessment of driving behavior using telematics compared with traditional risk variables. Using suitable performance metrics, we show that our algorithm outperforms other learning models designed to handle class imbalances.
Thesis
Full-text available
The growth of computational capacity, together with the emergence of new prediction algorithms within the machine learning framework, is producing a substantial improvement in the ability to anticipate events of many kinds, given an adequate database describing the situation. In the Galician rías, one of the major health and economic problems is the appearance of toxic red tides caused by blooms of algae such as Pseudo-nitzschia spp. This work aims to test several of these new algorithms with the objective of finding the most suitable way to predict these toxic algal blooms; to this end, a series of data is applied to both classical and more recently introduced algorithms. Through all these experiments, assessments of the different methods are obtained, with the medium-term objective of implementing a real-time red tide prediction system in the various Galician rías. Data collected in the rías between 2002 and 2012 were used. They were applied, in both training and validation phases, to predictors based on neural networks, support vector machines, logistic regression, Random Forest and AdaBoost. Chapter 1 introduces the current situation in the Galician Rías Baixas, the study area, as well as the upwelling processes and algal blooms, and states the objectives of this work in more concrete terms. Chapter 2 describes the data used and their sources, followed by a mathematical description of the algorithms employed and of the metrics and evaluation procedures; the complete methodology is included here. Chapter 3 presents a direct comparison with the models obtained in previous work by Luis González Vilas (co-supervisor of this thesis) using the same groups of variables but with data collected between 2002 and 2012; in addition to this comparison, learning-curve techniques are used to assess how the predictions could be improved. Chapter 4 applies, as far as possible, the improvements recommended in Chapter 3 by adding new data to the models; models for one-week-ahead predictions are also added, which are of special interest for the Galician shellfish industry. This chapter also describes an algorithm for finding an optimal dataset for bloom prediction. Chapter 5 discusses the results, draws conclusions and outlines future lines of research arising from this work.
Article
Full-text available
The use of infrared spectroscopy to augment decision-making in histopathology is a promising direction for the diagnosis of many disease types. Hyperspectral images of healthy and diseased tissue, generated by infrared spectroscopy, are used to build chemometric models that can provide objective metrics of disease state. It is important to build robust and stable models to provide confidence to the end user. The data used to develop such models can have a variety of characteristics which can pose problems to many model-building approaches. Here we have compared the performance of two machine learning algorithms - AdaBoost and Random Forests - on a variety of non-uniform data sets. Using samples of breast cancer tissue, we devised a range of training data capable of describing the problem space. Models were constructed from these training sets and their characteristics compared. In terms of separating infrared spectra of cancerous epithelium tissue from normal-associated tissue on the tissue microarray, both AdaBoost and Random Forests algorithms were shown to give excellent classification performance (over 95% accuracy) in this study. AdaBoost models were more robust when datasets with large imbalance were provided. The outcomes of this work are a measure of classification accuracy as a function of training data available, and a clear recommendation for choice of machine learning approach.
Chapter
The gradient boosting machine is one of the most powerful tools for solving regression problems. In order to cope with its shortcomings, an approach for constructing ensembles of gradient boosting models is proposed. The main idea behind the approach is to use the stacking algorithm in order to learn a second-level meta-model which can be regarded as a model for implementing various ensembles of gradient boosting models. First, the linear regression of the gradient boosting models is considered as the simplest realization of the meta-model under the condition that the linear model is differentiable with respect to its coefficients (weights). Then it is shown that the proposed approach can be simply extended to arbitrary differentiable combination models, for example, neural networks that are differentiable and can implement arbitrary functions of gradient boosting models. Various numerical examples illustrate the proposed approach.
Chapter
To address the problem that convolutional-neural-network-based vehicle detection algorithms use very deep networks, resulting in low training efficiency, this paper proposes a visualization method for adjusting the structure of the convolutional neural network, so as to improve training efficiency and detection performance. First, an existing convolutional neural network model for image classification is visualized using an intermediate-layer visualization method. Then, the layers of the model are analyzed to select the layer with the best visualization effect for network reconstruction, so as to obtain a relatively simplified network model. The experimental results show that the similar multi-target detection method proposed in this paper yields a clear improvement in training efficiency and accuracy.
Preprint
Full-text available
An ensemble method should cleverly combine a group of base classifiers to yield an improved classifier. The majority vote is an example of a methodology used to combine classifiers in an ensemble method. In this paper, we propose to combine classifiers using an associative memory model. Precisely, we introduce ensemble methods based on recurrent correlation associative memories (RCAMs) for binary classification problems. We show that an RCAM-based ensemble classifier can be viewed as a majority vote classifier whose weights depend on the similarity between the base classifiers and the resulting ensemble method. More precisely, the RCAM-based ensemble combines the classifiers using a recurrent consult and vote scheme. Furthermore, computational experiments confirm the potential application of the RCAM-based ensemble method for binary classification problems.
Article
Full-text available
If one has a multiclass classification problem and wants to boost a multiclass base classifier, AdaBoost.M1 is a well-known and widely applied boosting algorithm. However, AdaBoost.M1 does not work if the base classifier is too weak. We show that, with a modification of only one line of AdaBoost.M1, one can make it usable for weak base classifiers, too. The resulting classifier, AdaBoost.M1W, is guaranteed to minimize an upper bound on a performance measure called the guessing error, as long as the base classifier is better than random guessing. The usability of AdaBoost.M1W could be clearly demonstrated experimentally.
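The modified line itself is not quoted in the abstract; as a hedged reconstruction from this line of work (not a quotation from the paper), the single change concerns the weak learner's weight for a K-class problem:

```latex
% AdaBoost.M1: weight of weak learner t with weighted error e_t,
% usable only while e_t < 1/2:
\alpha_t = \ln\!\left(\frac{1 - \varepsilon_t}{\varepsilon_t}\right)

% AdaBoost.M1W: the one changed line; now any base classifier better
% than random guessing (e_t < (K-1)/K) receives a positive weight:
\alpha_t = \ln\!\left(\frac{(K-1)(1 - \varepsilon_t)}{\varepsilon_t}\right)
```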
Article
Full-text available
The use of data mining approaches in medical domains is increasing rapidly. This is mainly because the effectiveness of these approaches in classification and prediction systems has improved, particularly in relation to helping medical practitioners in their decision making. This type of research has become important for finding ways to improve patient outcomes, reduce the cost of medicine, and further advance clinical studies. Therefore, in this paper, data pre-processing, RELIEF attribute selection, and the Modest AdaBoost algorithm are used to extract knowledge from breast cancer survival databases in Thailand. The performance of these algorithms is examined using classification accuracy, sensitivity and specificity, the confusion matrix, and stratified 10-fold cross-validation. Computational results showed that Modest AdaBoost outperforms Real and Gentle AdaBoost.
Article
Full-text available
Boosting has been a very successful technique for solving the two-class classification problem. In going from two-class to multi-class classification, most algorithms have been restricted to reducing the multi-class classification problem to multiple two-class problems. In this paper, we propose a new algorithm that naturally extends the original AdaBoost algorithm to the multi-class case without reducing it to multiple two-class problems. Similar to AdaBoost in the two-class case, this new algorithm combines weak classifiers and only requires the performance of each weak classifier be better than random guessing (rather than 1/2). We further provide a statistical justification for the new algorithm using a novel multi-class exponential loss function and forward stage-wise additive modeling. As shown in the paper, the new algorithm is extremely easy to implement and is highly competitive with the best currently available multi-class classification methods.
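For experimentation, scikit-learn exposes this multi-class extension as the SAMME variant of its AdaBoost implementation; a minimal usage sketch on synthetic data follows (the dataset and settings are my own illustrative choices):

```python
# SAMME multi-class AdaBoost: each weak learner only has to beat
# random guessing (accuracy > 1/K), not 1/2.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5,
                           random_state=0)
clf = AdaBoostClassifier(n_estimators=200, algorithm="SAMME",
                         random_state=0)
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```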
Article
Full-text available
Recently, object tracking by detection using adaptive on-line classifiers has been investigated. In this case, the tracking problem is reduced to discriminating the current object view from the local background. However, on-line learning may introduce errors, which causes drifting and makes the tracker fail. This can be avoided by using semi-supervised on-line learning (i.e., the use of labeled and unlabeled training samples), which makes it possible to limit the drifting problem while still staying adaptive to appearance changes, in order to stabilize tracking. In particular, this paper extends semi-supervised on-line boosting with a particle filter to achieve a higher frame-rate. Furthermore, a more sophisticated search-space sampling and an improved update-sample selection have been added. In addition, a review of the semi-supervised on-line boosting algorithm is given, and further experiments have been carried out.
Article
Full-text available
Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data and then taking a weighted majority vote of the sequence of classifiers thus produced. For many classification algorithms, this simple strategy results in dramatic improvements in performance. We show that this seemingly mysterious phenomenon can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood. For the two-class problem, boosting can be viewed as an approximation to additive modeling on the logistic scale using maximum Bernoulli likelihood as a criterion. We develop more direct approximations and show that they exhibit nearly identical results to boosting. Direct multiclass generalizations based on multinomial likelihood are derived that exhibit performance comparable to other recently proposed multiclass generalizations of boosting in most situations, and far superior in some. We suggest a minor modification to boosting that can reduce computation, often by factors of 10 to 50. Finally, we apply these insights to produce an alternative formulation of boosting decision trees. This approach, based on best-first truncated tree induction, often leads to better performance, and can provide interpretable descriptions of the aggregate decision rule. It is also much faster computationally, making it more suitable to large-scale data mining applications.
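In symbols, the paper's central observation can be restated as follows (standard notation, summarizing the result above rather than adding to it): boosting builds an additive model and, for the two-class problem, greedily minimizes an exponential criterion whose population minimizer is half the log-odds:

```latex
% Additive expansion produced by boosting:
F(x) = \sum_{m=1}^{M} c_m f_m(x)

% Exponential criterion minimized stagewise by AdaBoost, y in {-1,+1}:
J(F) = \mathbb{E}\!\left[ e^{-y F(x)} \right]

% Population minimizer: half the log-odds, hence the logistic link:
F^{*}(x) = \frac{1}{2} \log \frac{P(y = +1 \mid x)}{P(y = -1 \mid x)}
```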
Article
Full-text available
Divide and conquer" has been a common practice to address complex learning tasks such as multi-view object detection. The positive examples are divided into multiple subcategories for training subcategory classifiers individ-ually. However, the subcategory labeling process, either through manual labeling or through clustering, is subop-timal for the overall classification task. In this paper, we propose multiple category boosting (McBoost), which over-comes the above issue through adaptive labeling. In par-ticular, a winner-take-all McBoost (WTA-McBoost) scheme is presented in detail. Each positive example has a unique subcategory label at any stage of the training process, and the label may switch to a different subcategory if a higher score is achieved by that subcategory classifier. By allowing examples to self-organize themselves in such a winner-take-all manner, WTA-McBoost outperforms traditional schemes significantly, as supported by our experiments on learning a multi-view face detector.
Conference Paper
Full-text available
For on-line learning algorithms, which are applied in many vision tasks such as detection or tracking, robust integration of unlabeled samples is a crucial point. Various strategies such as self-training, semi-supervised learning and multiple-instance learning have been proposed. However, these methods are either too adaptive, which causes drifting, or biased by a prior, which hinders incorporation of new (orthogonal) information. Therefore, we propose a new on-line learning algorithm (TransientBoost), which is highly adaptive but still robust. This is realized by using an internal multi-class representation and modeling reliable and unreliable data in separate classes. Unreliable data is considered transient, hence we use highly adaptive learning parameters to adapt to fast changes in the scene while errors fade out fast. In contrast, the reliable data is preserved completely and not harmed by wrong updates. We demonstrate our algorithm on two different tasks, i.e., object detection and object tracking, showing that we can handle typical problems considerably better than existing approaches. To demonstrate the stability and the robustness, we show long-term experiments for both tasks.
Conference Paper
Full-text available
Boosting has become a powerful and useful tool in the machine learning and computer vision communities in recent years, and many interesting boosting algorithms have been developed to solve various challenging problems. In particular, Friedman proposed a flexible framework called gradient boosting, which has been used to derive boosting procedures for regression, multiple instance learning, semi-supervised learning, etc. Recently some attention has been given to online boosting (where the examples become available one at a time). In this paper we develop a boosting framework that can be used to derive online boosting algorithms for various cost functions. Within this framework, we derive online boosting algorithms for Logistic Regression, Least Squares Regression, and Multiple Instance Learning. We present promising results on a wide range of data sets.
Chapter
This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady. The paper was first published in Russian in Teoriya Veroyatnostei i ee Primeneniya 16(2), 264-279 (1971). © Springer International Publishing Switzerland 2015. All rights reserved.
Article
Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient-descent "boosting" paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least-absolute-deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such "TreeBoost" models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire and Friedman, Hastie and Tibshirani are discussed.
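The core loop of this functional gradient view, restated here for convenience in standard notation: each round fits the base learner to the pseudo-residuals, i.e., the negative gradient of the loss at the current fit, then takes a line-search step:

```latex
% Pseudo-residuals at round m for loss L and current model F_{m-1}:
r_{im} = -\left[ \frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}
         \right]_{F = F_{m-1}}, \qquad i = 1, \dots, N

% Fit base learner h_m to {(x_i, r_{im})}, then line-search the step:
\rho_m = \arg\min_{\rho} \sum_{i=1}^{N}
         L\big(y_i,\, F_{m-1}(x_i) + \rho\, h_m(x_i)\big)

% Update the additive expansion:
F_m(x) = F_{m-1}(x) + \rho_m\, h_m(x)
```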
Chapter
In the history of research of the learning problem one can extract four periods that can be characterized by four bright events: (i) Constructing the first learning machines, (ii) constructing the fundamentals of the theory, (iii) constructing neural networks, (iv) constructing the alternatives to neural networks.
Article
This paper addresses the problem of improving the accuracy of an hypothesis output by a learning algorithm in the distribution-free (PAC) learning model. A concept class is learnable (or strongly learnable) if, given access to a source of examples of the unknown concept, the learner with high probability is able to output an hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class is weakly learnable if the learner can produce an hypothesis that performs only slightly better than random guessing. In this paper, it is shown that these two notions of learnability are equivalent. A method is described for converting a weak learning algorithm into one that achieves arbitrarily high accuracy. This construction may have practical applications as a tool for efficiently converting a mediocre learning algorithm into one that performs extremely well. In addition, the construction has some interesting theoretical consequences, including a set of general upper bounds on the complexity of any strong learning algorithm as a function of the allowed error ε.
Book
Ripley brings together two crucial ideas in pattern recognition: statistical methods and machine learning via neural networks. He brings unifying principles to the fore, and reviews the state of the subject. Ripley also includes many examples to illustrate real problems in pattern recognition and how to overcome them.
Article
This work focuses on algorithms which learn from examples to perform multiclass text and speech categorization tasks. Our approach is based on a new and improved family of boosting algorithms. We describe in detail an implementation, called BoosTexter, of the new boosting algorithms for text categorization tasks. We present results comparing the performance of BoosTexter and a number of other text-categorization algorithms on a variety of tasks. We conclude by describing the application of our system to automatic call-type identification from unconstrained spoken customer responses.
Article
Boosting is a technique for combining a set of weak classifiers to form one high-performance prediction rule. Boosting has been successfully applied to problems such as object detection, text analysis and data mining. The most widely used boosting algorithm is AdaBoost, together with its later, more effective variations Gentle and Real AdaBoost. In this article we propose a new boosting algorithm, which produces less generalization error compared to the mentioned algorithms at the cost of somewhat higher training error.
Article
Learning is regarded as the phenomenon of knowledge acquisition in the absence of explicit programming. A precise methodology is given for studying this phenomenon from a computational viewpoint. It consists of choosing an appropriate information gathering mechanism, the learning protocol, and exploring the class of concepts that can be learned using it in a reasonable (polynomial) number of steps. Although inherent algorithmic complexity appears to set serious limits to the range of concepts that can be learned, it is shown that there are some important nontrivial classes of propositional concepts that can be learned in a realistic sense.
Article
This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady. The paper was first published in Russian in Teoriya Veroyatnostei i ee Primeneniya 16(2), 264-279 (1971).
Article
In this paper, we present a feature selection approach based on Gabor wavelets and AdaBoosting. The features are first extracted by a Gabor wavelet transform. A family of Gabor wavelets with 5 scales and 8 orientations is generated with the standard Gabor kernel. Convolved with the Gabor wavelets, the original images are transformed into vectors of Gabor wavelet features. Then for an individual person, a small set of significant features are selected by the AdaBoost algorithm from the pool of the Gabor wavelet features. In the feature selection process, each feature is the basis for a weak classifier which is trained with XM2VTS face images. In each round of AdaBoost learning, the feature with the lowest error of weak classifiers is selected. The results from the experiment have shown that the approach successfully selects meaningful and explainable features for face verification. The experiments suggest that the feature selection algorithm for face verification selects the features corresponding to the unique characteristics rather than common characteristics, and a large example size statistically benefits AdaBoost feature selection.
Article
Recently a fast and efficient face detection method has been devised [11], which relies on the AdaBoost algorithm and a set of Haar-wavelet-like features. A natural extension of this approach is to use the same technique to locate individual features within the face region. However, we find that there is insufficient local structure to reliably locate each feature in every image, and thus local models can give many false positive responses. We demonstrate that the performance of such feature detectors can be significantly improved by using global shape constraints. We describe an algorithm capable of accurately and reliably detecting facial features and present quantitative results on both high and low resolution image sets.
Conference Paper
Boosting is a general method for improving the accuracy of any given learning algorithm. Focusing primarily on the AdaBoost algorithm, we briefly survey theoretical work on boosting including analyses of AdaBoost’s training error and generalization error, connections between boosting and game theory, methods of estimating probabilities using boosting, and extensions of AdaBoost for multiclass classification problems. We also briefly mention some empirical work.
Article
In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework. The model we study can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting. We show that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems. We show how the resulting learning algorithm can be applied to a variety of problems, including gambling, multiple-outcome prediction, repeated games, and prediction of points in R^n. In the second part of the paper we apply the multiplicative weight-update technique to derive a new boosting algorithm. This boosting algorithm does not require any prior knowledge about the performance of the weak learning algorithm. We also study generalizations of the new boosting algorithm to the problem of learning functions whose range, rather than being binary, is an arbitrary finite set or a bounded segment of the real line.
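The boosting algorithm derived in the second part is AdaBoost; its multiplicative weight update over the training examples, stated here in the now-standard form with labels y_i in {-1,+1}, is:

```latex
% Weighted error of weak hypothesis h_t under distribution D_t:
\varepsilon_t = \sum_{i:\, h_t(x_i) \neq y_i} D_t(i), \qquad
\alpha_t = \frac{1}{2} \ln\!\left(\frac{1 - \varepsilon_t}{\varepsilon_t}\right)

% Multiplicative update; Z_t normalizes D_{t+1} to a distribution:
D_{t+1}(i) = \frac{D_t(i)\, \exp\!\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t}

% Final hypothesis: weighted majority vote of the weak hypotheses:
H(x) = \operatorname{sign}\!\left( \sum_{t=1}^{T} \alpha_t\, h_t(x) \right)
```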
Article
This paper presents a real-time vision-based vehicle detection system employing an online boosting algorithm. It is an online AdaBoost approach for a cascade of strong classifiers instead of a single strong classifier. Most existing cascades of classifiers must be trained offline and cannot effectively be updated when online tuning is required. The idea is to develop a cascade of strong classifiers for vehicle detection that is capable of being online trained in response to changing traffic environments. To make the online algorithm tractable, the proposed system must efficiently tune parameters based on incoming images and up-to-date performance of each weak classifier. The proposed online boosting method can improve system adaptability and accuracy to deal with novel types of vehicles and unfamiliar environments, whereas existing offline methods rely much more on extensive training processes to reach comparable results and cannot further be updated online. Our approach has been successfully validated in real traffic environments by performing experiments with an onboard charge-coupled-device camera in a roadway vehicle.
Article
This paper presents a novel ensemble classifier generation technique RotBoost, which is constructed by combining Rotation Forest and AdaBoost. The experiments conducted with 36 real-world data sets available from the UCI repository, among which a classification tree is adopted as the base learning algorithm, demonstrate that RotBoost can generate ensemble classifiers with significantly lower prediction error than either Rotation Forest or AdaBoost more often than the reverse. Meanwhile, RotBoost is found to perform much better than Bagging and MultiBoost. Through employing the bias and variance decompositions of error to gain more insight of the considered classification methods, RotBoost is seen to simultaneously reduce the bias and variance terms of a single tree and the decrement achieved by it is much greater than that done by the other ensemble methods, which leads RotBoost to perform best among the considered classification procedures. Furthermore, RotBoost has a potential advantage over AdaBoost of suiting parallel execution.
Article
In this paper, we propose lazy bagging (LB), which builds bootstrap replicate bags based on the characteristics of test instances. Upon receiving a test instance xk, LB trims bootstrap bags by taking into consideration xk's nearest neighbors in the training data. Our hypothesis is that an unlabeled instance's nearest neighbors provide valuable information to enhance local learning and generate a classifier with refined decision boundaries emphasizing the test instance's surrounding region. In particular, by taking full advantage of xk's nearest neighbors, classifiers are able to reduce classification bias and variance when classifying xk. As a result, LB, which is built on these classifiers, can significantly reduce classification error, compared with the traditional bagging (TB) approach. To investigate LB's performance, we first use carefully designed synthetic data sets to gain insight into why LB works and under which conditions it can outperform TB. We then test LB against four rival algorithms on a large suite of 35 real-world benchmark data sets using a variety of statistical tests. Empirical results confirm that LB can statistically significantly outperform alternative methods in terms of reducing classification error.
Article
Real AdaBoost is a well-known, high-performance boosting method used to build machine ensembles for classification. Considering that its emphasis function can be decomposed into two factors that pay separate attention to sample errors and to their proximity to the classification border, a generalized emphasis function that combines both components by means of a selectable parameter, λ, is presented. Experiments show that simple methods of selecting λ frequently offer better performance and smaller ensembles.
Article
Multi-class AdaBoost algorithms AdaBoost.MO, -ECC and -OC have received great attention in the literature, but their relationships have not been fully examined to date. In this paper, we present a novel interpretation of the three algorithms, by showing that MO and ECC perform stage-wise functional gradient descent on a cost function defined over margin values, and that OC is a shrinkage version of ECC. This allows us to strictly explain the properties of ECC and OC, empirically observed in prior work. Also, the outlined interpretation leads us to introduce shrinkage as regularization in MO and ECC, and thus to derive two new algorithms: SMO and SECC. Experiments on diverse databases are performed. The results demonstrate the effectiveness of the proposed algorithms and validate our theoretical findings.
Article
This paper presents a strategy to improve the AdaBoost algorithm with a quadratic combination of base classifiers. We observe that learning this combination is necessary to get better performance and is possible by constructing an intermediate learner operating on the combined linear and quadratic terms. This is not trivial, as the parameters of the base classifiers are not under direct control, obstructing the application of direct optimization. We propose a new method realizing iterative optimization indirectly. First we train a classifier by randomizing the labels of training examples. Subsequently, the input learner is called repeatedly with a systematic update of the labels of the training examples in each round. We show that the quadratic boosting algorithm converges under the condition that the given base learner minimizes the empirical error. We also give an upper bound on the VC-dimension of the new classifier. Our experimental results on 23 standard problems show that quadratic boosting compares favorably with AdaBoost on large data sets at the cost of training speed. The classification time of the two algorithms, however, is equivalent.