Article

Machine Learning Research: Four Current Directions

Authors:
Thomas G. Dietterich

Abstract

Machine Learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (a) improving classification accuracy by learning ensembles of classifiers, (b) methods for scaling up supervised learning algorithms, (c) reinforcement learning, and (d) learning complex stochastic models.

1 Introduction

The last five years have seen an explosion in machine learning research. This explosion has many causes. First, separate research communities in symbolic machine learning, computational learning theory, neural networks, statistics, and pattern recognition have discovered one another and begun to work together. Second, machine learning techniques are being applied to new kinds of problems including knowledge discovery in databases, language processing, robot control, and combinatorial optimization, as well as to more traditional problems such as speech recognition, face re...

... The Relief method has been widely used for feature weighting in many studies [26,27]. In this method, each feature is assigned a relevance weight that reflects how relevant the feature is to the target class. ...
... Here, the weight W_j is expected to reach its highest value when X_d (an instance from a different class) is likely to have a different value and X_s (an instance from the same class) is unlikely to have a different value for attribute j [26]. In this study, we prefer ReliefF, which is a more recent modification of Relief. ...
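The weight-update rule sketched in the snippet above is the core of the Relief family: a feature's weight grows when the nearest neighbour from a different class (the nearest miss) differs on that feature and shrinks when the nearest neighbour from the same class (the nearest hit) differs. The following is a minimal, hedged sketch of that update for numeric features scaled to [0, 1]; the function name and the NumPy-based implementation are illustrative assumptions, not the cited study's code.

```python
import numpy as np

def relief_weights(X, y, n_iters=100, seed=0):
    """Minimal Relief sketch: W_j rises when the nearest miss differs on
    feature j and falls when the nearest hit differs (numeric features
    assumed scaled to [0, 1], two classes, each class with >= 2 instances)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)
        dists = np.abs(X - X[i]).sum(axis=1)   # Manhattan distance to instance i
        dists[i] = np.inf                       # exclude the instance itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dists, np.inf))    # nearest hit
        miss = np.argmin(np.where(~same, dists, np.inf))  # nearest miss
        W += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return W / n_iters

# Toy usage: only feature 0 is informative, so its weight should dominate.
X = np.random.default_rng(1).random((100, 5))
y = (X[:, 0] > 0.5).astype(int)
print(relief_weights(X, y))
```

ReliefF extends this basic scheme by averaging over k nearest hits and misses and by handling multi-class and noisy data.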
Article
Full-text available
Numerous studies have been conducted to elucidate the relation of tumor proximity to cancer prognosis and treatment efficacy in colorectal cancer. However, the molecular pathways and prognoses of left- and right-sided colorectal cancers are different, and this difference has not been fully investigated at the genomic level. In this study, a set of data science approaches, including six feature selection methods and three classification models, were used in predicting tumor location from gene expression profiles. Specificity, sensitivity, accuracy, and Matthews correlation coefficient (MCC) evaluation metrics were used to evaluate the classification ability. Gene ontology enrichment analysis was applied by the Gene Ontology PANTHER Classification System. For the most significant 50 genes, protein-protein interactions and drug-gene interactions were analyzed using the GeneMANIA, CytoScape, CytoHubba, MCODE, and DGIdb databases. The highest classification accuracy (90%) is achieved with the most significant 200 genes when the ensemble-decision tree classification model is used with the ReliefF feature selection method. Molecular pathways and drug interactions are investigated for the most significant 50 genes. It is concluded that a machine-learning-based approach could be useful to discover the significant genes that may have an important role in the development of new therapies and drugs for colorectal cancer.
... Ensemble learning can significantly reduce the generalization error of the ensemble model. It is praised as the first of the four research directions of machine learning [1]. The current popular algorithms such as Random Forest (RF) and XGBoost are the products of ensemble learning. ...
... At present, increasing the diversity of classifiers has become the mainstream method to improve the performance of ensemble models. Generally, there are two ways to guide classifier integration by using diversity [6]. ...
Article
Full-text available
In ensemble learning, accuracy and diversity among classifiers are the keys to good integration. However, most diversity measures only evaluate the diversity of classifiers from a single point of view and have a poor correlation with the generalization capability of the final model. As such, there is still a lack of an effective diversity measure to guide the integration of classifiers. In this paper, an updatable fusion measure is proposed to evaluate diversity in classifiers. It is based on the evidential reasoning rule by fusing various measures from multiple perspectives. Before the fusion, the correlation among various diversity measures is tested. Only those measures with weak correlation can be fused. In addition, whenever a new effective measure appears, it can be fused with the old fusion measure after the significance test. Through the experimental verification of multiple data sets, classifiers, and combination strategies, this measure can effectively reflect the diversity of classifier combinations and assist classifier integration.
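The abstract above is concerned with measuring diversity among classifiers. As a point of reference, one of the simplest pairwise diversity measures in the ensemble literature is the disagreement measure: the fraction of samples on which two classifiers make different predictions. The sketch below illustrates that generic measure only; it is not the paper's evidential-reasoning fusion, and the function names are placeholders.

```python
import numpy as np

def disagreement(preds_a, preds_b):
    """Pairwise disagreement: fraction of samples labelled differently
    by two classifiers."""
    preds_a, preds_b = np.asarray(preds_a), np.asarray(preds_b)
    return float(np.mean(preds_a != preds_b))

def mean_pairwise_disagreement(all_preds):
    """Average disagreement over all pairs of classifiers in an ensemble."""
    k = len(all_preds)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    return float(np.mean([disagreement(all_preds[i], all_preds[j])
                          for i, j in pairs]))

# Toy usage: predictions of three classifiers on five samples.
print(mean_pairwise_disagreement([[0, 1, 1, 0, 1],
                                  [0, 1, 0, 0, 1],
                                  [1, 1, 1, 0, 0]]))
```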
... techniques in supervised learning, as a sub-field of machine learning algorithms, that classify input data into categories. There are various algorithms in this field such as decision tree (Quinlan 1986, 1993; Breiman et al. 1984), neural network (Richard and Lippmann 1991; Zhang 2000), K-nearest neighbor (KNN) (Cover and Hart 1967), naive Bayesian (Duda and Hart 1973), support vector machine (SVM) (Vapnik 1995; Cortes and Vapnik 1995), ensemble learning (Dietterich 1997), and extensions of them (Abaszade et al. 2019; Baik and Bala 2004; Baloochian and Ghaffary 2019; Cheng et al. 2002; Duda et al. 2012; Friedman et al. 1997; Fuentes-García and Walker 2010; Grillenzoni 2016; Ma and Ryoo 2021; Wilson and Martinez 2000). ...
... An ensemble classifier is one of the most important classifiers, employing multiple learners to obtain better predictive performance (Dietterich 1997). Adaptive Boosting (AdaBoost) is an effective ensemble technique that can be used in connection with many other kinds of classifiers to improve their performance (Freund and Schapire 1997). ...
Article
Full-text available
In this paper, we suggest a new classifier using a multiple-attribute decision-making (MADM) model for fuzzy classification. First, we form a decision-making matrix whose elements are membership functions of a fuzzy set constructed from the training datasets. Then, for any test datum, we form an MADM problem, and by solving this problem with one of the MADM techniques we obtain a fuzzy classification. For this purpose, we utilize the technique for order of preference by similarity to ideal solution (TOPSIS), a well-known MADM method. Additionally, we use a new criterion for determining a weight vector for the features in this approach. We compare the results of the new approach with five well-known algorithms on ten datasets. We also compare the new approach with its unweighted variant and with a variant weighted by the generalized Fisher score from the feature selection literature. Finally, to show the superiority of the new approach, we use a statistical comparison with the other methods.
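Since the abstract above leans on TOPSIS to rank alternatives, a compact sketch of the generic TOPSIS procedure may help: normalize the decision matrix, weight it, find the ideal and anti-ideal solutions, and rank alternatives by relative closeness. This illustrates plain (crisp) TOPSIS under assumed weights, not the paper's fuzzy classifier; the toy matrix and weights are placeholders.

```python
import numpy as np

def topsis_rank(decision_matrix, weights, benefit=None):
    """Minimal TOPSIS sketch: returns closeness-to-ideal scores for each
    alternative (row). benefit[j] is True if criterion j should be maximized."""
    D = np.asarray(decision_matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    benefit = np.ones(D.shape[1], bool) if benefit is None else np.asarray(benefit)

    R = D / np.linalg.norm(D, axis=0)           # vector-normalize each criterion
    V = R * w                                    # apply criterion weights
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_pos = np.linalg.norm(V - ideal, axis=1)    # distance to the ideal solution
    d_neg = np.linalg.norm(V - anti, axis=1)     # distance to the anti-ideal solution
    return d_neg / (d_pos + d_neg)               # higher = closer to ideal

# Toy usage: pick the alternative (e.g. class) with the highest closeness score.
scores = topsis_rank([[0.8, 0.1], [0.4, 0.6]], weights=[0.5, 0.5])
print(scores.argmax())
```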
... Ensemble learning forms a more comprehensive 'strong learner' through the combination of multiple models. Ensemble learning can obtain more accurate prediction results, with better generalization performance and broader applications [10,11]. Ensemble learning has been successfully applied to character recognition [12], medical diagnosis [62], facial recognition [21] and seismic wave classification [46]. ...
... Ensemble learning [10] is a process of effectively fusing individual learners to form a strong classifier (as shown in Fig. 1). According to how the base learners are combined, ensemble learning can be divided into a parallel topology (representative algorithm: bagging), a serial topology (representative algorithm: boosting) and a hybrid topology (representative algorithm: stacking). ...
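The three topologies named in the snippet above map directly onto standard library components. The sketch below shows one hedged way to instantiate them with scikit-learn; the dataset, base learners and hyperparameters are illustrative placeholders rather than anything from the cited work.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

ensembles = {
    "bagging (parallel)": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50),
    "boosting (serial)": AdaBoostClassifier(n_estimators=50),
    "stacking (hybrid)": StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier()),
                    ("lr", LogisticRegression(max_iter=1000))],
        final_estimator=LogisticRegression(max_iter=1000)),
}
for name, model in ensembles.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```

Bagging trains its base learners independently on bootstrap samples, boosting fits them sequentially on reweighted data, and stacking feeds their outputs to a second-level learner.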
Article
Full-text available
Slope engineering is a complex nonlinear system. It is difficult to meet high precision and efficiency requirements for stability assessment using conventional theoretical analysis and numerical computation. An ensemble learning algorithm for solving highly nonlinear problems is introduced in this paper to study the stability of 444 slope cases. Different ensemble learning methods [AdaBoost, gradient boosting machine (GBM), bagging, extra trees (ET), random forest (RF), hist gradient boosting, voting and stacking] for slope stability assessment are studied and compared to make the best use of the large variety of existing statistical and ensemble learning methods collected. Six potentially relevant indicators, γ, C, φ, β, H and r_u, are chosen as the prediction indicators. The tenfold CV method is used to improve the generalization ability of the classification models. By analysing the evaluation indicators AUC, accuracy, kappa value and log loss, the stacking model shows the best performance with the highest AUC (0.9452), accuracy (84.74%), kappa value (0.6910) and lowest log loss (0.3282), followed by the ET, RF, GBM and bagging models. The analysis of engineering examples shows that the ensemble learning algorithm can deal with this relationship well and give accurate and reliable prediction results, which has good applicability for slope stability evaluation. Additionally, geotechnical material variables are found to be the most influential variables for slope stability prediction.
... AdaBoost: The algorithm belongs to the family of ensemble methods, which employ multiple learners to solve a problem (Dietterich 1997). The generalization ability of such a combination is usually significantly better than that of a single learner. ...
Article
Full-text available
Although many would argue that the most important factor for the success of a big data project is the process of analyzing the data, it is more important to staff, structure and organize the participants involved, so as to ensure efficient collaboration within the team and effective use of the toolsets, the relevant applications and a customized flow of information. The main challenges of big data projects originate from the number of people involved who need to collaborate, the need for advanced and specialized education, the approach chosen to solve an analytical problem that is in many cases ill-defined, the dataset itself (structured or unstructured) and the required hardware and software (such as analysis software or self-learning algorithms). Today, neither an organizational framework nor overarching guidelines are available for the creation of a high-performance analytics team and its organizational integration. This paper builds upon (a) the organizational design of a team for a big data project, (b) the relevant roles and competencies (such as programming or communication skills) of the members of the team and (c) the form in which they are connected and managed.
... It can improve the generalization ability of predictions effectively. Ensemble learning has gradually come to be regarded as a primary research direction and a research hotspot in machine learning [34], and its algorithms are extensively applied in predictive applications [35,36]; however, in the current research literature few scholars have further explored stock prediction models based on modal decomposition techniques and ensemble learning algorithms. ...
Article
Full-text available
After the COVID-19 ended, the global economy gradually recovered. Due to the nonlinearity, complexity, and high noise of financial time series, stock price prediction has become one of the most challenging tasks in the stock market. To tackle this challenge and enhance the prediction performance in the complicated stock markets, we propose a novel integrated approach based on Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN), Long Short-Term Memory (LSTM), and ensemble learning algorithm LightGBM to simultaneously improve the fitting and accuracy of stock price prediction. In addition, to prevent overfitting and improve predictive performance, this study adopted the Simulated Annealing (SA) algorithm for optimization. The predictive performance of the proposed hybrid model is comprehensively evaluated by comparing it with single LSTM, RNN, and other popular hybrid models. Three evaluation metrics, namely Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and accuracy, are used to compare the aforementioned models. The experimental results indicate that the proposed hybrid CEEMDAN-LSTM-SA-LightGBM model outperforms all other comparative models in this study with better fitting and accuracy.
... depicts an RBM, a kind of Boltzmann machine with this design. [132] One benefit of this topology is that the units within each layer are conditionally independent given the other layer. As a result, one layer may be sampled using the activations of the other. ...
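The conditional independence mentioned above is what makes block Gibbs sampling in an RBM cheap: given the visible units, every hidden unit can be sampled independently from a Bernoulli whose parameter is a sigmoid of its weighted input, and vice versa. The following is a minimal NumPy sketch of one such sampling step; the layer sizes, parameter values and function names are illustrative assumptions, not drawn from the cited text.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy RBM parameters (sizes and values are placeholders).
n_visible, n_hidden = 6, 4
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b_v = np.zeros(n_visible)   # visible biases
b_h = np.zeros(n_hidden)    # hidden biases

def sample_hidden(v):
    """Hidden units are conditionally independent given v, so each one
    is sampled from its own Bernoulli distribution."""
    p_h = sigmoid(v @ W + b_h)
    return (rng.random(p_h.shape) < p_h).astype(float), p_h

def sample_visible(h):
    """Symmetrically, visible units are conditionally independent given h."""
    p_v = sigmoid(h @ W.T + b_v)
    return (rng.random(p_v.shape) < p_v).astype(float), p_v

v0 = (rng.random(n_visible) < 0.5).astype(float)
h0, _ = sample_hidden(v0)     # sample one layer from the activations of the other
v1, _ = sample_visible(h0)    # and back again (one block-Gibbs step)
```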
... First, ensemble learning trains several individual models and then combines them, unlike a traditional machine learning model, which is trained only once. The individual "small" prediction models may be combined in different ways to build the final ensemble model [42]. Bagging, stacking, and boosting are the three most popular combination methods. ...
... In a similar manner, the task of predicting or detecting whether a fraudulent transaction has taken place, or of identifying a fraudulent transaction within a dataset that also contains valid transactions, requires classification based on machine learning algorithms. The detection system can help segregate transactions by type, fraudulent or valid [33,67]. However, different machine learning algorithms often give very different results on the same dataset, so it is sometimes advisable to try ensemble learning. ...
Article
Full-text available
Long gone is the time when people preferred using only cash. In recent years, cashless transactions have gained much popularity, be it using UPI apps or credit and debit cards. The same has even led to a significant increase in the number of credit card fraud cases. Detecting fraudulent transactions is a challenging task as fraudsters mimic the ordinary conduct of clients in order to commit fraud. Automated intelligent credit card fraud detection can be employed for detecting fraudulent transactions. In this paper, we propose a credit card fraud detection approach involving an arrangement of supervised machine learning algorithms called ensemble learning. One of the difficulties encountered when identifying fraudulent transactions in such datasets is the imbalanced class distribution. In this work, we employed an ensemble learning model in combination with two data-level techniques for handling the class imbalance problem. The proposed approach is an ensemble of three base classifiers, namely random forest, logistic regression and K-nearest neighbour, along with two data-level algorithms, namely random oversampling and random undersampling. To combine the predictions of the base classifiers, the weighted voting ensemble approach is used. The proposed approach is evaluated using a highly imbalanced credit card transaction dataset and with various sets of weights in order to identify the best possible outcomes in terms of accuracy and to minimise the misclassification of fraudulent transactions.
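A rough sketch of the kind of pipeline described in the abstract is shown below: the majority class is randomly under-sampled and a weighted soft-voting ensemble of random forest, logistic regression and k-NN is trained on the balanced data. The synthetic dataset, the imbalance ratio and the voting weights are placeholders, not the paper's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils import resample

X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Random under-sampling: shrink the majority class to the minority-class size.
maj, mino = np.where(y_tr == 0)[0], np.where(y_tr == 1)[0]
maj_down = resample(maj, n_samples=len(mino), replace=False, random_state=0)
keep = np.concatenate([maj_down, mino])
X_bal, y_bal = X_tr[keep], y_tr[keep]

ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000)),
                ("knn", KNeighborsClassifier())],
    voting="soft",
    weights=[2, 1, 1],          # illustrative weights, tuned in practice
)
ensemble.fit(X_bal, y_bal)
print(ensemble.score(X_te, y_te))
```

Random oversampling would instead replicate minority-class rows (e.g. with resample(..., replace=True)) before fitting the same ensemble.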
... The specialized model is usually called the weak model or component model in ensemble learning. As early as 1997, the ML expert Dietterich (1997) listed ensemble learning as the first of four research directions of ML. To this day, ensemble learning remains one of the most popular fields of ML. ...
Article
Full-text available
As a main type of colorectal cancer, rectal cancer has a high risk and mortality rate so it is very important to accurately predict the survivability of patients to make better decisions on medical treatment and preparation for medical expenses. In recent years, many scholars have studied the survivability of selected common cancers such as lung cancer using machine learning approaches. Therefore, this research proposes a heterogeneous ensemble classification model to predict the survivability of rectal cancer patients. The model employs four different types of classifiers as component classifiers and Bagging algorithm to generate example sets for training component classifiers. In the proposed model, heterogeneous ensemble can help improve the diversity of component classifiers and Bagging can lower the variance and enhance the stability of the model. Finally, a fuzzy multiple criteria decision making method named fuzzy TOPSIS is employed to fuse the results of component classifiers. We evaluated the proposed model on the rectal cancer patient records dataset extracted from Surveillance, Epidemiology, and End Results (SEER) database. The results show that the proposed model obtains a significant improvement in terms of four standard metrics, including accuracy, specificity, sensitivity and area under the receiver operating characteristic curve, compared with single component classifiers and some other state-of-the-art ensemble classification models, such as Random Forest and Gradient Boosting Tree. Experiments also show that fusing component classifiers with fuzzy TOPSIS is superior to voting and simple weighted average methods. The proposed model outperforms other techniques in rectal cancer survival prediction, thereby improving the prognosis of rectal cancer patients and further assisting clinicians in developing better treatment plans.
... In addition to the models previously discussed in Section IV (Center, CNN, LF, MLP, and GBR), we also consider the stacking and voting ensemble methods [42], [43]. A stacked ensemble model [44] takes the outputs of multiple models as the inputs to a meta-regressor, which then gives the final prediction. ...
Article
Full-text available
Eye tracking is a technology that is in high demand, especially for next-generation virtual reality (VR), because it enables foveated rendering, which significantly reduces computational costs by rendering only the area at which a user is gazing at a high resolution and the rest at a lower resolution. However, the conventional eye-tracking technique requires per-eye camera hardware attached near the eyes within a VR headset. Moreover, the detected eye gaze follows the actual eye gaze with a finite delay because of the camera latency, the need for image processing, and the VR system’s native latency. This paper proposes an eye-tracking solution that predicts a user’s future eye gaze using only the inertial sensors that are already built into VR headsets for head tracking. To this end, we formulate three time-series regression problems to predict (1) the current eye gaze using past head orientation data, (2) the future eye gaze using past head orientation and eye gaze data, and (3) the future eye gaze using past head orientation data only. We solve the first and second problems using machine learning models and develop two solutions for the final problem: two-stage and single-stage approaches. The two-stage approach for the final problem relies on two machine learning models connected in series, one for the first problem and the other for the second problem. The single-stage approach uses a single model to predict the future eye gaze directly from past head orientation data. We evaluate the proposed solutions based on real eye-tracking traces captured from a VR headset for multiple test players, considering various combinations of machine learning models. The experimental results show that the proposed solutions for the final problem reduce the error for a center-fixed gaze by up to 50% and 20% for anticipation times of 50 and 150 ms, respectively.
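The abstract above frames gaze prediction as windowed time-series regression: a fixed-length window of past head-orientation samples is mapped to the gaze value a fixed anticipation time ahead. The sketch below illustrates that generic formulation on synthetic signals; the signals, window length, anticipation horizon and choice of regressor are placeholders and are not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
t = np.arange(2000)
head = np.sin(t / 50.0) + 0.05 * rng.normal(size=t.size)   # stand-in for head orientation
gaze = np.roll(head, -5) + 0.05 * rng.normal(size=t.size)  # stand-in for eye gaze

window, anticipation = 20, 10   # input window length and prediction horizon (samples)
n_rows = t.size - window - anticipation
X = np.stack([head[i:i + window] for i in range(n_rows)])               # past window
y = gaze[window + anticipation - 1: window + anticipation - 1 + n_rows] # future gaze

model = GradientBoostingRegressor().fit(X[:1500], y[:1500])
print(model.score(X[1500:], y[1500:]))   # R^2 on the held-out tail of the series
```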
... First, ensemble learning trains several individual models and then combines them, unlike a traditional machine learning model, which is trained only once. The individual "small" prediction models may be combined in different ways to build the final ensemble model [42]. ...
Preprint
Full-text available
Ubiquitination-site prediction is an important task because ubiquitination is a critical regulatory function for many biological processes such as proteasome degradation, DNA repair and transcription, signal transduction, endocytosis, and sorting. However, the highly dynamic and reversible nature of ubiquitination makes it difficult to experimentally identify specific ubiquitination sites. In this paper, we explore the possibility of improving the prediction of ubiquitination sites using ensemble machine learning methods including Random Forest (RF), Adaptive Boosting (ADB), Gradient Boosting (GB), and eXtreme Gradient Boosting (XGB). By doing grid search with the four ensemble methods and six comparison non-ensemble learning methods including Naive Bayes (NB), Logistic Regression (LR), Decision Trees (DT), Support Vector Machine (SVM), LASSO, and K-Nearest Neighbor (KNN), we find that all four ensemble methods significantly outperform one or more non-ensemble methods included in this study. XGB outperforms three out of the six non-ensemble methods that we included; ADB and RF both outperform two of the six non-ensemble methods; GB outperforms one non-ensemble method. Comparing the four ensemble methods among themselves, GB performs the worst; XGB and ADB are very comparable in terms of prediction, but ADB beats XGB by far in terms of both the unit model training time and total running time. Both XGB and ADB tend to do better than RF in terms of prediction, but RF has the shortest unit model training time out of the three. In addition, we notice that ADB tends to outperform XGB when dealing with small-scale datasets, and RF can outperform either ADB or XGB when data are less balanced. Interestingly, we find that SVM, LR, and LASSO, three of the six non-ensemble methods included, perform comparably with all the ensemble methods. Based on this study, ensemble learning is a promising approach to significantly improving ubiquitination-site prediction using protein segment data.
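The grid-search comparison described above reduces, in code terms, to fitting a cross-validated parameter search per model family and comparing the best scores. Below is a small hedged sketch with one ensemble method and one non-ensemble baseline; the dataset, parameter grids and models are placeholders rather than the preprint's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=800, n_features=30, random_state=0)

searches = {
    "gradient boosting (ensemble)": GridSearchCV(
        GradientBoostingClassifier(random_state=0),
        {"n_estimators": [100, 300], "learning_rate": [0.05, 0.1]},
        cv=5),
    "logistic regression (non-ensemble)": GridSearchCV(
        LogisticRegression(max_iter=2000),
        {"C": [0.1, 1.0, 10.0]},
        cv=5),
}
for name, search in searches.items():
    search.fit(X, y)
    print(name, search.best_score_, search.best_params_)
```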
... In this study, various machine learning algorithms were used for survival rate prediction according to mortality, survival time, and treatment method. The algorithms are voting ensembles [8][9][10][11], Logistic Regression (LR) [12], K-nearest neighbors (KNN) [13,14], Decision Tree (DT) Classifier [15][16][17], Support Vector Machine (SVM) [18][19][20][21], Random Forest (RF) [22,23], Extreme gradient boosting trees (XG Boost) [24], Light GBM [25,26], and Natural Gradient Boosting (NG Boost) [27,28]. Their prediction results are compared in Tables 6, 8, and 10. ...
Article
Full-text available
Aim To predict the survival time of Korean hepatocellular carcinoma (HCC) patients using multi-center data as a foundation for the development of a predictive artificial intelligence model according to treatment methods based on machine learning. Methods Data of patients who underwent treatment for HCC from 2008 to 2015 were provided by the Korean Liver Cancer Study Group and the Korea Central Cancer Registry. A total of 10,742 patients with HCC were divided into two groups, with Group I (2920 patients) confirmed on biopsy and Group II (5562 patients) diagnosed as HCC according to the HCC diagnostic criteria outlined in the Korean Liver Cancer Association guidelines. The data were modeled according to features of patient clinical characteristics. Features effective in predicting survival rate were analyzed retrospectively, and various machine learning methods were used. Results The target was overall survival time, divided at approximately 60 months (≤ 60 m, > 60 m). The target distribution in Group I (514 samples in total) was 28.8% (148 samples) less than 60 months and 71.2% (366 samples) greater than 60 months, and in Group II (757 samples in total) it was 66.6% (504 samples) less than 60 months and 33.4% (253 samples) greater than 60 months. Using the NG Boost method, accuracy was 83%, precision 84%, sensitivity 95%, and F1 score 89% for more than 60 months of survival time in Group I with surgical resection. Moreover, accuracy was 79%, precision 82%, sensitivity 87%, and F1 score 84% for less than 60 months of survival time in Group II with TACE. The feature importance with the gain criterion indicated that the pathology, portal vein invasion, surgery, metastasis, and needle biopsy features could be explained as important factors for prediction in the case of biopsy (Group I). Conclusion By developing a predictive model using machine learning algorithms to predict the prognosis of HCC patients, it is possible to project optimized treatment per case according to liver function and tumor status.
... In machine learning, ensemble methods that integrate predictions from multiple models are commonly used to boost the prediction performance. According to previous studies [46,47], ensemble methods are often more accurate than the component methods that the ensemble methods contain. Among various ensemble methods, model stacking is an efficient method that uses the predictions generated by other machine learning algorithms as the inputs of a second layer algorithm, which then combines the input predictions to form a new set of predictions. ...
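The stacking description above can be made concrete in a few lines: out-of-fold predictions from the first-layer models are collected and fed, as features, to a second-layer learner. The sketch below is a generic illustration of that mechanism; the data, the first-layer models and the logistic-regression combiner are assumptions, not the study's actual pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, random_state=0)

first_layer = [RandomForestClassifier(random_state=0), SVC(probability=True)]

# Out-of-fold class probabilities from each first-layer model.
meta_features = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in first_layer
])

# The second-layer model combines the first-layer predictions
# (training-set score shown only for illustration).
second_layer = LogisticRegression().fit(meta_features, y)
print(second_layer.score(meta_features, y))
```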
Article
Full-text available
Dysfunction of microbial communities in various human body sites has been shown to be associated with a variety of diseases raising the possibility of predicting diseases based on metagenomic samples. Although many studies have investigated this problem, there are no consensus on the optimal approaches for predicting disease status based on metagenomic samples. Using six human gut metagenomic datasets consisting of large numbers of colorectal cancer patients and healthy controls from different countries, we investigated different software packages for extracting relative abundances of known microbial genomes and for integrating mapping and assembly approaches to obtain the relative abundance profiles of both known and novel genomes. The random forests (RF) classification algorithm was then used to predict colorectal cancer status based on the microbial relative abundance profiles. Based on within data cross-validation and cross-dataset prediction, we show that the RF prediction performance using the microbial relative abundance profiles estimated by Centrifuge is generally higher than that using the microbial relative abundance profiles estimated by MetaPhlAn2 and Bracken. We also develop a novel method to integrate the relative abundance profiles of both known and novel microbial organisms to further increase the prediction performance for colorectal cancer from metagenomes.
... Ensemble learning (EL) is a foremost research direction in the field of machine learning; it combines several simple learners to make prediction results more accurate and reliable (Kuncheva and Whitaker, 2003; Dietterich, 1997). It can significantly improve generalization performance and reduce the risk of misjudgment caused by relying on a single model. ...
Article
In order to evaluate slope stability quickly, accurately, and reliably, a slope stability prediction method based on margin distance minimization selective ensemble (MDMSE) is proposed. It can objectively evaluate slope stability from basic geometric and geological factors, overcoming the disadvantages of difficult model selection and high risk of misjudgment in traditional machine learning models. Firstly, a large number of differentiated individual learners are built by perturbing the data samples and algorithm parameters. Then, based on the MDMSE algorithm, the optimal subset of individual learners is searched for. Finally, the selected learners are integrated with majority voting to form a reasonable and effective MDMSE prediction model. Using 422 groups of slope samples, the prediction performance of the MDMSE model is compared with common single machine learning models and ensemble models. The results show that the MDMSE model has clearly better generalization ability than the other models, and better recognition accuracy and faster identification speed than all the ensemble models. As a selective ensemble model with strong generalization ability and high efficiency, the MDMSE model is well suited to the prediction and analysis of slope stability and has practical engineering value.
... A comprehensive review paper (Clemen 1989) and many other empirical papers, for example (Ginzburg and Horn 1994; Zhang 2003; Lemke and Gabrys 2010; Lin et al. 2012; Firmino et al. 2013), support the view that combined forecasts can generally outperform individual forecasts. As an extension of combined models, many ensemble models have been introduced and investigated, mainly in the machine learning area (Dietterich 1997; Anifowose et al. 2017; Fatai Anifowose 2015), such as stacked generalization (or stacking) (Wolpert 1992), boosting (Schapire 1990), bagging (Breiman 1996), and voting. Ensembling is an approach to combining the results produced by many single forecasters. ...
Article
Full-text available
Over the last decades, several soft computing techniques have been applied to tourism demand forecasting. Among these techniques, the neuro-fuzzy ANFIS (adaptive neuro-fuzzy inference system) model has started to emerge. A conventional ANFIS model cannot deal with high-dimensional data, and therefore cannot work with our dataset, which is composed of 62 time series. This study attempts to develop an ensemble model by incorporating neural networks with ANFIS to deal with a large number of input variables for multivariate forecasting. Our proposed approach combines two base learners, which are neural network models, with an ANFIS meta-learner in a stacking ensemble framework. The results show that the stacking ensemble of ANFIS (meta-learner) and ANN models (base learners) outperforms its stand-alone counterparts, the base learners. Numerical results indicate that the proposed ensemble model achieved a MAPE of 7.26% compared to its single-instance ANN models with MAPEs of 8.50% and 9.18%, respectively. Finally, this study, a novel application of ensemble systems in the context of tourism demand forecasting, shows better results than single expert systems based on artificial neural networks.
... Data from these classifiers will subsequently be processed and used to generate the HDI model, based on the ensemble learning methodology. Ensemble learning combines the predictions of multiple individual classifiers obtained with different techniques, such as random forest, support vector machine or general linear modelling, in order to enhance generalisation power [26], avoid overfitting, and increase the strength and reliability of the final outcome [27]. Specifically, the outcomes from ctDNA, cfMeDNA, proteomics and radiomics tests will be combined by using a weighted-majority voting approach implemented in the R environment (caret package). ...
Article
Full-text available
Introduction Standard procedures aimed at the early diagnosis of breast cancer (BC) present suboptimal accuracy and imply the execution of invasive and sometimes unnecessary tissue biopsies. The assessment of circulating biomarkers for diagnostic purposes, together with radiomics, is of great potential in BC management. Methods and analysis This is a prospective translational study investigating the accuracy of the combined assessment of multiple circulating analytes together with radiomic variables for early BC diagnosis. Up to 750 patients will be recruited at their presentation at the Diagnostic Senology Unit of Ospedale Policlinico San Martino (Genoa, IT) for the execution of a diagnostic biopsy after the detection of a suspect breast lesion (t0). Each recruited patient will be asked to donate peripheral blood and urine before undergoing breast biopsy. Blood and urine samples will also be collected from a cohort of 100 patients with negative mammography. For cases with histological diagnosis of invasive BC, a second sample of blood and urine will be collected after breast surgery. Circulating tumour DNA, cell-free methylated DNA and circulating proteins will be assessed in samples collected at t0 from patients with stage I–IIA BC at surgery together with those collected from patients with histologically confirmed benign lesions of similar size and from healthy controls with negative mammography. These analyses will be combined with radiomic variables extracted with freeware algorithms applied to cases and matched controls for which digital mammography is available. The overall goal of the present study is to develop a horizontal data integration classifier for the early diagnosis of BC. Ethics and dissemination This research protocol has been approved by Regione Liguria Ethics Committee (reference number: 2019/75, study ID: 4452). Patients will be required to provide written informed consent. Results will be published in international peer-reviewed scientific journals. Trial registration number NCT04781062 .
... The primary aim of machine learning is to develop artifacts, especially algorithms, that improve their performance through experience [48][49][50]. Machine learning is programming computers to optimize a performance criterion using example data or past experience [51,52]. Machine learning tasks are usually divided into three categories: supervised learning, unsupervised learning and reinforcement learning [53]. ...
Article
Full-text available
In recent years, the successful completion of the human genome project has made people realize that genetic, environmental and lifestyle factors should be combined to study cancer, owing to the complexity and various forms of the disease. The increasing availability and growth rate of 'big data' derived from various omics open a new window for the study and therapy of cancer. In this paper, we introduce the application of machine learning methods to handling cancer big data, including the use of artificial neural networks, support vector machines, ensemble learning and naïve Bayes classifiers.
... Dietterich [44] tried to give reasons why an ensemble can be stronger than a single learner, considering that machine learning searches a hypothesis space for the most accurate hypothesis. ...
Article
Glaucoma is one of the best-known irreversible chronic eye diseases; it leads to permanent blindness, but it can be treated if diagnosed early. Convolutional neural networks (CNNs), a branch of deep learning, have an impressive record in image analysis and interpretation, including medical imaging, justified by their capacity and adaptability to extract pertinent features automatically from the original image. On the other hand, the use of ensemble learning algorithms has an important impact on improving the classification rate. In this paper, a two-stage image processing and ensemble learning approach is proposed for automated glaucoma diagnosis. In the first stage, different modalities are generated from the original images by applying advanced image processing techniques, especially Gabor-filter-based texture images. Next, each dataset constructed from the corresponding modality is learned by an individual CNN classifier. Aggregation techniques are then applied to generate the final decision, taking into account the outputs of all the CNN classifiers. Experiments were carried out on the RIM-ONE dataset for glaucoma diagnosis. The obtained results prove the superiority of the proposed ensemble learning system over existing studies, with a classification accuracy of 89.63%.
... In the 1990s there was a large increase in machine learning research coming from several fronts: symbolic Artificial Intelligence, artificial neural networks, statistics, pattern recognition and computational learning theory. Along with this, new problems began to be addressed in machine learning, such as data mining, robot control, combinatorial optimization, speech and face recognition, analysis of medical data, computer games, among others (Dietterich, 1997). ...
Conference Paper
Full-text available
Computing and robotics resources have already been incorporated into the performing and spectacle arts as tools to assist actors and directors, both as mechanical elements composing the set and as active artifacts within the scene. According to Tori (2006), "the use of the computer has enhanced and converged forms of artistic expression, enabling multimedia, which involves text, images, sound, video and animation." However, this use has in some cases been extrapolated toward a broader direction, such as the use of autonomous robots with Artificial Intelligence. A basic principle of Artificial Intelligence, according to Russell and Norvig (2004), is the creation of intelligent autonomous agents capable of performing tasks that are often carried out by people. These agents can perceive their environment and what happens around them through receptors (cameras, sonar, touch, sound receptors) and can respond to these stimuli via a set of actuators (mechanical arms, legs, wheels, sounds). With their internal processing, or Artificial Intelligence, responsible for responding to the environment in the most appropriate way, these automata can perform tasks in such a way that a spectator may classify them as intelligent beings, and they can be identified on stage just like an actor. Nevertheless, as the name itself indicates, it is still artificial: it is simulated and was programmed by a developer or group of developers. In any case, the use of an automaton in a scene is no longer a novelty; it has already been used in a theatre play in Japan, interacting with a real actor, and there are robots that recite poetry and perform excerpts from Shakespeare's plays, such as the RoboThespian (ROBOTHESPIAN, 2011) and Titan (ROBOTX, 2011) robots, or even entire plays (Lange, 2011). In the context of using robots as an integral part of a theatrical and/or performance interpretation, a parallel can be drawn with puppet theatre, in which a person manipulates puppets or other objects. However, the main question regarding a robot/automaton is: who manipulates the automaton? Thus, while still involving computational and robotic tools as resources for theatricality, this work evaluates the extrapolation of these resources and concepts through the use of Artificial Intelligence techniques, with the performance executed by computational and robotic resources endowed with behaviours that are, even if simulated, classified as intelligent.
... First, the ICO algorithm uses a cross-validation or percentage split approach to optimize the number of iterations. Secondly, the Dagging algorithm benefits from ensemble learning in its structure (multiple weak learners) which outperforms a single strong learner (Dietterich, 1997). This learning helps to reduce variance and avoid the over-fitting problem caused by the use of a bootstrap procedure. ...
Article
Scour depth prediction and its prevention is one of the most important issues in channel and waterway design. However the potential for machine learning algorithms to provide models of scour depth has yet to be explored. This study provides the first quantification of the predictive power of a range of standalone and hybrid machine learning models. Using previously collected scour depth data from laboratory flume experiments, the performance of five types of recently developed standalone machine learning techniques - the Isotonic Regression (ISOR), Sequential Minimal Optimization (SMO), Iterative Classifier Optimizer (ICO), Locally Weighted learning (LWL) and Least Median of Squares Regression (LMS) - are assessed, along with their hybrid versions with Dagging (DA) and Random Subspace (RS) algorithms. The main findings are five-fold. First, the DA-ICO model had the highest prediction power. Second, the hybrid models had a higher prediction power than standalone models. Third, all algorithms underestimated the maximum scour depth, except DA-ICO which predicted scour depth almost perfectly. Fourth, scour depth was most sensitive to densimetric particle Froude number followed by the non-dimensionalized contraction width, flow depth within the contraction, sediment geometric standard deviation, approach flow velocity and median grain size. Fifth, most of the algorithms performed best when all the input parameters were involved in the building of the model. An important exception was the best performing model that required only four input parameters: densimetric particle Froude number, non-dimensionalized contraction width, flow depth within the contraction and sediment geometric standard deviation. Overall the results revealed that hybrid machine learning algorithms provide more accurate predictions of scour depth than empirical equations and traditional AI-algorithms. In particular, the DA-ICO model not only created the most accurate predictions but also used the fewest easily and readily measured input parameters. Thus this type of model could be of real benefit to practicing engineers required to estimate maximum scour depth when designing in-channel structures.
Preprint
A problem of bounding the generalization error of a classifier f in H, where H is a "base" class of functions (classifiers), is considered. This problem frequently occurs in computer learning, where efficient algorithms of combining simple classifiers into a complex one (such as boosting and bagging) have attracted a lot of attention. Using Talagrand's concentration inequalities for empirical processes, we obtain new sharper bounds on the generalization error of combined classifiers that take into account both the empirical distribution of "classification margins'' and an "approximate dimension" of the classifiers and study the performance of these bounds in several experiments with learning algorithms.
Article
Full-text available
Private equity (PE) represents the acquisition of stakes in non-listed companies, often long-term, with the objective of improving the performance and value of the company to obtain significant benefits at the time of disinvestment. PE has gained particular importance in the global financial system for delivering superior risk-adjusted returns. Knowing the PE return drivers has been of great interest among researchers and academics, and some studies have developed statistical models to determine PE return drivers. Still, the explanatory capacity of these models has certain limitations related to their precision levels and their exclusive focus on groups of countries located in Europe and the USA. Therefore, the current literature calls for new models of analysis of the PE return drivers that provide a better fit in worldwide scenarios. This study contributes to the accuracy of the models that identify the PE return drivers, using computational methods and a sample of 1606 PE funds with a geographical focus on the world's five regions. The results provide a unique set of PE return drivers with a precision level above 86%. The conclusions present important theoretical and practical implications, expanding knowledge about PE and financial forecasting from a global perspective.
Article
Full-text available
Protein phosphorylation is a dynamic and reversible post‐translational modification that regulates a variety of essential biological processes. The regulatory role of phosphorylation in cellular signaling pathways, protein–protein interactions, and enzymatic activities has motivated extensive research efforts to understand its functional implications. Experimental protein phosphorylation data in plants remains limited to a few species, necessitating a scalable and accurate prediction method. Here, we present PhosBoost, a machine‐learning approach that leverages protein language models and gradient‐boosting trees to predict protein phosphorylation from experimentally derived data. Trained on data obtained from a comprehensive plant phosphorylation database, qPTMplants, we compared the performance of PhosBoost to existing protein phosphorylation prediction methods, PhosphoLingo and DeepPhos. For serine and threonine prediction, PhosBoost achieved higher recall than PhosphoLingo and DeepPhos (.78, .56, and .14, respectively) while maintaining a competitive area under the precision‐recall curve (.54, .56, and .42, respectively). PhosphoLingo and DeepPhos failed to predict any tyrosine phosphorylation sites, while PhosBoost achieved a recall score of .6. Despite the precision‐recall tradeoff, PhosBoost offers improved performance when recall is prioritized while consistently providing more confident probability scores. A sequence‐based pairwise alignment step improved prediction results for all classifiers by effectively increasing the number of inferred positive phosphosites. We provide evidence to show that PhosBoost models are transferable across species and scalable for genome‐wide protein phosphorylation predictions. PhosBoost is freely and publicly available on GitHub.
Chapter
Diabetic retinopathy (DR) is a major ocular complication of diabetes. Delayed diagnosis of such disease increases the risk of vision loss and irreversible blindness. With the popularization of computer-aided diagnosis technology, the use of deep learning for DR classification has become a current research hotspot. We aim to develop a GA-DCNN model that can improve the performance of DR classification. In this paper, a novel global attention-based model called GCA-SA is proposed to provide fine-grained global lesion information for DR classification. Furthermore, inspired by genetic algorithm (GA) and ensemble learning (EL), this paper also proposes a strategy of integrating deep convolutional neural networks (DCNNs) with GA. The GA-DCNN model is constructed by aggregating GCA-SA and spatial pyramid pooling (SPP) into three DCNNs and using the strategy of integrating DCNNs with GA. The experimental results show that the accuracy, specificity and AUC of the GA-DCNN reach 0.91, 0.94 and 0.93, respectively. Compared with traditional CNN, GA-DCNN can capture the detailed features of DR lesions and integrate the classification results of the multiple DCNNs, effectively improving the detection and classification performance of DR.
Article
Microseismic events must be classified to obtain effective disaster precursor information. To further improve the accuracy and efficiency of microseismic event classification, the main factors affecting classification were analyzed based on microseismic data from rockburst monitoring combined with time-frequency analysis theory. For the first time, the Rectangle-Histogram of Oriented Gradient (R-HOG) and Stacking technologies were effectively combined to establish a new Stacking integrated learning model (RSREL-stacking) for classifying microseismic events. Finally, the classification performance of RSREL-stacking, a deep learning model, and other models was tested. The results showed the following: (1) Noisy signal interference and events with similar characteristics are the main factors leading to the high misjudgment rate. (2) RSREL-stacking can accurately distinguish the spectrum of useful signals from complex noise interference and effectively extract the contour, region, and spatial position features of the useful signal spectrum. (3) Different experiments confirmed that RSREL-stacking effectively combines the advantages of many independent models, can recognize the subtle differences and features of similar events, and increases the accuracy of event classification. Compared with other methods, RSREL-stacking combines efficient and accurate classification of microseismic events, ensuring that effective disaster information is obtained quickly.
Chapter
As a significant type of machine learning, supervised learning method is apt at learning from well-labeled training data and it is widely used in various tasks such as classification and regression. Support Vector Machine (SVM) is a powerful and popular algorithm in supervised learning, and it has been successfully applied in machinery intelligent fault diagnosis due to its excellent ability to handle complex decision boundaries and high-dimensional data, especially in small sample cases. Therefore, supervised SVM-based algorithms and their applications in machinery fault diagnosis are introduced in this chapter. To fully improve the generalization performance of SVM, the problems such as parameter optimization, feature selection, and ensemble-based incremental methods are discussed. The effectiveness of the SVM-based algorithms is validated in several fault diagnosis tasks on electrical locomotive rolling bearings, Bently rotor, and motor bearings test benches.
Article
Full-text available
Although machine learning classifiers have been successfully used in the medical and engineering fields, there is still room for improving the predictive accuracy of model classification. The higher the accuracy of the classifier, the better the suggestions that can be provided to decision makers. Therefore, in this study, we propose an ensemble machine learning approach, called Feature generation-based Ensemble Support Vector Machine (FESVM), for classification tasks. We first apply a feature selection technique to select the most relevant features. Next, we introduce an ensemble strategy to aggregate multiple base estimators for the final prediction using the meta-classifier SVM. During this stage, we use the classification probabilities obtained from the base classifiers to generate new features. After that, the generated features are added to the original data set to form a new data set. Finally, this new data set is utilised to train the meta-classifier SVM to obtain the final classification results. For example, for a binary classification task, each base classifier has two probabilities (p for one class and 1−p for the other class). In this case, two new features are generated from the combination of probabilities based on these base classifiers. One is the sum of p as new feature 1, and the other is the sum of 1−p as new feature 2. These two new features are then added to the original data set to form the new data set. In the same way, our feature generation method can be easily extended to a multi-class task, where the number of generated features depends on the number of classes. The features generated from the base estimators (first layer) are added to the original data set to form a new data set. This new data set is used as the input to the second layer (meta-classifier) to obtain the final model. Experiments on 20 data sets show that our proposed model FESVM has the best performance compared to the other machine learning classifiers under comparison. In addition, our FESVM has better performance than the original stacking method in multi-class classification tasks. Statistical results based on the Wilcoxon–Holm method also confirm that our FESVM can significantly outperform the other models. These results indicate that our FESVM can be a useful tool for classification tasks, especially multi-class classification tasks.
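The feature-generation step described above (per-class sums of the base classifiers' probabilities appended to the original features, then an SVM meta-classifier trained on the extended data) can be sketched in a few lines. The data and base models below are placeholders, and the use of out-of-fold probabilities is an added safeguard rather than something stated in the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_classes=2, random_state=0)

base_models = [RandomForestClassifier(random_state=0),
               LogisticRegression(max_iter=1000)]

# Out-of-fold class-probability predictions from each base classifier.
probas = [cross_val_predict(m, X, y, cv=5, method="predict_proba")
          for m in base_models]

# One generated feature per class: the sum of that class's probabilities
# over all base classifiers (two new features for a binary task).
new_features = np.sum(probas, axis=0)

X_extended = np.hstack([X, new_features])       # original + generated features
meta_svm = SVC().fit(X_extended, y)             # SVM as the meta-classifier
print(meta_svm.score(X_extended, y))            # training score, for illustration
```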
Article
To construct a strong classifier ensemble, base classifiers should be accurate and diverse. However, there is no uniform standard for the definition and measurement of diversity. This work proposes a learners' interpretability diversity (LID) to measure the diversity of interpretable machine learners. It then proposes a LID-based classifier ensemble. Such an ensemble concept is novel because: 1) interpretability is used as an important basis for diversity measurement and 2) before its training, the difference between two interpretable base learners can be measured. To verify the proposed method's effectiveness, we choose a decision-tree-initialized dendritic neuron model (DDNM) as a base learner for ensemble design. We apply it to seven benchmark datasets. The results show that the DDNM ensemble combined with LID obtains superior performance in terms of accuracy and computational efficiency compared to some popular classifier ensembles. A random-forest-initialized dendritic neuron model (RDNM) combined with LID is an outstanding representative of the DDNM ensemble.
Article
The bagging method has received much application and attention in recent years due to its good performance and simple framework. It has facilitated the advanced random forest method and the accuracy-diversity ensemble theory. Bagging is an ensemble method based on the simple random sampling (SRS) method with replacement. However, SRS is the most basic sampling method in the field of statistics, and other, more advanced sampling methods exist for probability density estimation. In imbalanced ensemble learning, down-sampling, over-sampling, and SMOTE methods have been proposed for generating base training sets. However, these methods aim at changing the underlying distribution of the data rather than simulating it better. The ranked set sampling (RSS) method uses auxiliary information to obtain more effective samples. The purpose of this article is to propose a bagging ensemble method based on RSS, which uses the ordering of objects related to the class to obtain more effective training sets. To explain its performance, we give a generalization bound for the ensemble from the perspective of posterior probability estimation and Fisher information. On the basis that an RSS sample has higher Fisher information than an SRS sample, the presented bound theoretically explains the better performance of RSS-Bagging. Experiments on 12 benchmark datasets demonstrate that RSS-Bagging statistically performs better than SRS-Bagging when the base classifiers are multinomial logistic regression (MLR) and support vector machine (SVM).
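For reference, the ranked set sampling scheme mentioned in the abstract proceeds by drawing several small sets, ranking each set on an auxiliary variable, and keeping one order statistic per set. The sketch below illustrates a single RSS cycle on synthetic data; the population, auxiliary variable and function name are illustrative assumptions, and it is not the article's RSS-Bagging implementation.

```python
import numpy as np

def ranked_set_sample(values, aux, m, rng):
    """Return indices of one RSS cycle of size m: form m random sets of m
    units, rank each set by the auxiliary variable `aux`, and keep the
    i-th ranked unit from the i-th set."""
    chosen = []
    for i in range(m):
        candidates = rng.choice(len(values), size=m, replace=False)
        ranked = candidates[np.argsort(aux[candidates])]
        chosen.append(ranked[i])        # take the i-th ranked unit from set i
    return np.array(chosen)

rng = np.random.default_rng(0)
population = rng.normal(size=1000)
auxiliary = population + 0.3 * rng.normal(size=1000)   # correlated ranking variable

idx = ranked_set_sample(population, auxiliary, m=5, rng=rng)
print(population[idx])
```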
Article
Full-text available
Structural health monitoring (SHM) for bridges is a crucial concern in engineering due to the degradation risks caused by defects, which can worsen over time. In this respect, the enhancement of various models that can discriminate between healthy and non-healthy states of structures has received extensive attention. These models rely on algorithms that operate on feature sets to quantify the bridge's structural health. The functional correlation between the feature set and the health state of the bridge structure is usually difficult to define; therefore, the models are derived from machine learning techniques. The use of machine learning approaches makes it possible to automate the SHM procedure and detect damage intelligently. In this study, we propose four classification algorithms for SHM that use the concepts of the support vector machine (SVM) algorithm. The laboratory experiment intended to validate the results was performed at Western Sydney University (WSU). The results were compared with the basic SVM to evaluate the performance of the proposed algorithms.
Chapter
The studies for vehicle noise, vibration and harshness (NVH) are related to the modification and optimization of noise and vibration characteristics of vehicles, particularly cars or trucks. This chapter introduces vehicle interior noise generation mechanisms, including air-borne and structure-borne noises. An automobile is a complex vibration system with multiple excitation sources (engine, intake/exhaust system, transmission system, tire/road excitation, vehicle body, and wind noise) which deliver vibrational energy to multiple target points through different transfer paths. Several transfer path analysis methods are introduced to identify the transmission and contribution of each vibration/noise source. Furthermore, the vibration/noise prediction methods based on the model and data are discussed. Under varying high-speed conditions, the vehicle interior noises are nonlinear and nonstationary. Based on machine learning and compressed sensing approaches, the so-called signal decomposition optimization-based back propagation neural network for ear-side noise reconstruction (DBENR) and the multi-variable based time-domain signal reconstruction (MTSR) are introduced for vehicle interior noise. The research results suggest that the two methods can effectively reconstruct the nonlinear and nonstationary vehicle interior noise signals and may provide high-precision reference signals for vehicle active noise control.
Preprint
Full-text available
Customer Churn Prediction Using Stacking Classifier
Chapter
Human intelligence is deeply involved in creating efficient and faster systems that can work independently. Creating such smart systems requires efficient training algorithms. Thus, the aim of this chapter is to introduce readers to the concept of machine learning and the commonly employed learning algorithms for developing efficient and intelligent systems. The chapter gives a clear distinction between supervised and unsupervised learning methods. Each algorithm is explained with the help of a suitable example to give insight into the learning process.
Chapter
Symbolic or explainable learning models stand out within the Machine Learning area because they are self-explanatory, making the decision process easier for humans to interpret. However, these models are overly sensitive to the training set used, so even tiny variations in training sets can result in much worse precision. In this research we propose a meta-learning approach that transforms a Random Forest into a single Decision Tree. Experiments were performed on classification datasets from different domains. Our approach performs as well as a Random Forest in terms of precision (positive reliability), with no statistically significant differences, yet it has the advantage of the interpretability provided by a single decision tree. The results indicate that it is possible to obtain a model that is easier to interpret than a Random Forest and still has higher precision than a standard Decision Tree.
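As a rough illustration of one way such a transformation can be carried out (not necessarily the chapter's exact meta-learning procedure), the sketch below treats a trained Random Forest as an oracle, relabels original and jittered copies of the training data with the forest's predictions, and fits a single decision tree on that relabeled set. The jittering scheme, the dataset, and all hyperparameters are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

def forest_to_tree(X, y, n_synthetic=2000, random_state=0):
    """Distill a random forest into one interpretable decision tree (sketch)."""
    rng = np.random.default_rng(random_state)
    forest = RandomForestClassifier(n_estimators=200, random_state=random_state).fit(X, y)

    # simple synthetic sampling: jitter existing rows (an assumption, not the chapter's method)
    base = X[rng.integers(0, len(X), size=n_synthetic)]
    noise = rng.normal(scale=0.05 * X.std(axis=0), size=base.shape)
    X_aug = np.vstack([X, base + noise])
    y_aug = forest.predict(X_aug)          # the forest acts as a labeling oracle

    tree = DecisionTreeClassifier(max_depth=6, random_state=random_state).fit(X_aug, y_aug)
    return forest, tree

X, y = load_breast_cancer(return_X_y=True)
forest, tree = forest_to_tree(X, y)
```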
Chapter
We compare the performance of six single classifiers trained on the German credit dataset, an imbalanced dataset of 1000 instances with a binary-valued dependent variable. To improve performance, we consider resampling the dataset and ensembling the classifiers. The benchmarks are taken from the best performance among the six considered classifiers. Resampling includes oversampling and undersampling. The performance of the ensemble classifiers is then analyzed and examined. The experimental results provide three benchmarks, i.e. SVM trained on the plain dataset, NB trained on the plain dataset, and SVM trained on the undersampled dataset. Furthermore, the ensemble of kNN, LDA, and SVM outperforms the first benchmark for all metrics used in this research, i.e. recall 92.71%, precision 79.14%, F1 84.73%, AUC 79.96%, and accuracy 76.88%. The ensemble of LR, SVM, and NB and the ensemble of LDA, SVM, and NB outperform the second and third benchmarks, respectively. Keywords: Ensemble classifiers; Imbalanced dataset; Resampling
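A minimal sketch of the kind of voting ensemble evaluated here, using scikit-learn. The synthetic imbalanced dataset stands in for the German credit data and the hyperparameters are placeholders, so the numbers it produces will not match the figures above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate

# stand-in for the imbalanced German credit dataset (roughly 70/30 class split)
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.7, 0.3], random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("lda", LinearDiscriminantAnalysis()),
        ("svm", SVC(kernel="rbf", probability=True)),
    ],
    voting="soft",  # average predicted probabilities; "hard" would majority-vote labels
)

scores = cross_validate(ensemble, X, y, cv=5,
                        scoring=["recall", "precision", "f1", "roc_auc", "accuracy"])
print({k: v.mean() for k, v in scores.items() if k.startswith("test_")})
```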
Article
The application of regulatory technology (RegTech) for monitoring comprehensive data sources has gained increased importance. Nevertheless, previous research neglects that the output of RegTech applications has to be explainable and non-discriminatory. Within this study, we propose design principles and features for a RegTech approach which provides automated assessments of financial consumer complaints. We follow three main design principles. First, we build upon information diagnosticity theory to address the need for explainable classifications empowering regulators to justify their actions. Second, we consider a bag-of-words representation and ensemble learning to ensure high classification accuracy. Third, we take into account author characteristics to avoid discriminating classifications. We evaluate our approach in the financial services industry and show its value for identifying consumer complaints resulting in monetary compensations. The proposed design principles and features are highly relevant for regulators, corporations as well as consumers.
Article
Traditional curved-roof forms have significant potential to mitigate undesirable environmental impacts. Their performance predictions can be grouped into 4 trendlines of varying degrees of sophistication: theoretical abstracts, numerical methods, white-box simulations and black-box machine learning algorithms. For the first time, this research investigates the potential contribution of single- and ensemble-models to approximate the average hourly direct normal and diffuse horizontal irradiances (AHIRDirect, AHIRDiffuse) and cooling energy consumption (AHECCooling) of buildings topped with vaulted-roof forms of various configurations in Aswan, Egypt. Solar and energy simulations are first conducted to build essential datasets, which are pre-processed before developing 8 single-models representing 4 families of supervised single-algorithms: artificial neural networks, random forests, k-nearest neighbors and support vector regression. A voting ensemble-model is then created by combining the best-performing single-models. Lastly, the accuracies of all models are compared against simulation outputs. The results showed that no single-model could dominantly predict AHIRDirect, AHIRDiffuse and AHECCooling, obtaining tolerable R2 values ranging from 97.017 to 61.913%, 92.782 to 43.986% and 99.341 to −9.219%, corresponding to RMSE values of 47.321 to 195.208, 17.457 to 53.617 and 0.002 to 0.032, respectively. Alternatively, the voting ensemble-model acquired even better R2 values of 93.971, 93.047 and 97.276%, with RMSE values of 69.000, 17.249 and 0.004, respectively.
Article
Shimming in the context of nuclear magnetic resonance aims to achieve a uniform magnetic field distribution, as perfect as possible, and is crucial for useful spectroscopy and imaging. Currently, shimming precedes most acquisition procedures in the laboratory, and this mostly semi-automatic procedure often needs to be repeated, which can be cumbersome and time-consuming. The paper investigates the feasibility of completely automating and accelerating the shimming procedure by applying deep learning (DL). We show that DL can relate measured spectral shape to shim current specifications and thus rapidly predict three shim currents simultaneously, given only four input spectra. Due to the lack of accessible data for developing shimming algorithms, we also introduce a database that served as our DL training set, and allows inference of changes to 1H NMR signals depending on shim offsets. In situ experiments of deep regression with ensembles demonstrate a high success rate in spectral quality improvement for random shim distortions over different neural architectures and chemical substances. This paper presents a proof-of-concept that machine learning can simplify and accelerate the shimming problem, either as a stand-alone method, or in combination with traditional shimming methods. Our database and code are publicly available.
Article
[open access] The use of decision trees considerably improves the discriminating capacity of ensemble classifiers. However, this process results in the classifiers no longer being interpretable, although comprehensibility is a desired trait of decision trees. Consolidation (consolidated tree construction algorithm, CTC) was introduced to improve the discriminating capacity of decision trees, whereby a set of samples is used to build the consolidated tree without sacrificing transparency. In this work, PCTBagging is presented as a hybrid approach between bagging and a consolidated tree such that part of the comprehensibility of the consolidated tree is maintained while also improving the discriminating capacity. The consolidated tree is first developed up to a certain point and then typical bagging is performed for each sample. The part of the consolidated tree to be initially developed is configured by setting a consolidation percentage. In this work, 11 different consolidation percentages are considered for PCTBagging to effectively analyse the trade-off between comprehensibility and discriminating capacity. The results of PCTBagging are compared to those of bagging, CTC and C4.5, which serves as the base for all other algorithms. PCTBagging, with a low consolidation percentage, achieves a discriminating capacity similar to that of bagging while maintaining part of the interpretable structure of the consolidated tree. PCTBagging with a consolidation percentage of 100% offers the same comprehensibility as CTC, but achieves a significantly greater discriminating capacity.
Article
Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments. In an unknown environment, the agent needs to explore the environment while exploiting the collected information, which usually forms a sophisticated problem to solve. Derivative-free optimization, meanwhile, is capable of solving sophisticated problems. It commonly uses a sampling-and-updating framework to iteratively improve the solution, in which exploration and exploitation also need to be well balanced. Therefore, derivative-free optimization deals with a similar core issue as reinforcement learning and has been introduced into reinforcement learning approaches under the names of learning classifier systems and neuroevolution/evolutionary reinforcement learning. Although such methods have been developed for decades, derivative-free reinforcement learning has recently been attracting increasing attention. However, a recent survey of this topic is still lacking. In this article, we summarize methods of derivative-free reinforcement learning to date and organize them by aspects including parameter updating, model selection, exploration, and parallel/distributed methods. Moreover, we discuss some current limitations and possible future directions, hoping that this article can bring more attention to this topic and serve as a catalyst for developing novel and efficient approaches.
Conference Paper
Full-text available
Hidden Markov models (HMMs) have proven to be one of the most widely used tools for learning probabilistic models of time series data. In an HMM, information about the past is conveyed through a single discrete variable, the hidden state. We discuss a generalization of HMMs in which this state is factored into multiple state variables and is therefore represented in a distributed manner. We describe an exact algorithm for inferring the posterior probabilities of the hidden state variables given the observations, and relate it to the forward–backward algorithm for HMMs and to algorithms for more general graphical models. Due to the combinatorial nature of the hidden state representation, this exact algorithm is intractable. As in other intractable systems, approximate inference can be carried out using Gibbs sampling or variational methods. Within the variational framework, we present a structured approximation in which the state variables are decoupled, yielding a tractable algorithm for learning the parameters of the model. Empirical comparisons suggest that these approximations are efficient and provide accurate alternatives to the exact methods. Finally, we use the structured approximation to model Bach's chorales and show that factorial HMMs can capture statistical structure in this data set which an unconstrained HMM cannot.
Article
Full-text available
Current inductive machine learning algorithms typically use greedy search with limited lookahead. This prevents them from detecting significant conditional dependencies between the attributes that describe the training objects. Instead of myopic impurity functions and lookahead, we propose to use RELIEFF, an extension of RELIEF developed by Kira and Rendell [10, 11], for heuristic guidance of inductive learning algorithms. We have reimplemented Assistant, a system for top-down induction of decision trees, using RELIEFF as an estimator of attributes at each selection step. The algorithm is tested on several artificial and several real-world problems, and the results are compared with some other well-known machine learning algorithms. Excellent results on artificial data sets and two real-world problems show the advantage of the presented approach to inductive learning.
Article
Full-text available
We describe a Bayesian approach for learning Bayesian networks from a combination of prior knowledge and statistical data. First and foremost, we develop a methodology for assessing informative priors needed for learning. Our approach is derived from a set of assumptions made previously as well as the assumption of likelihood equivalence, which says that data should not help to discriminate network structures that represent the same assertions of conditional independence. We show that likelihood equivalence, when combined with previously made assumptions, implies that the user's priors for network parameters can be encoded in a single Bayesian network for the next case to be seen (a prior network) and a single measure of confidence for that network. Second, using these priors, we show how to compute the relative posterior probabilities of network structures given data. Third, we describe search methods for identifying network structures with high posterior probabilities. We describe polynomial algorithms for finding the highest-scoring network structures in the special case where every node has at most k=1 parent. For the general case (k>1), which is NP-hard, we review heuristic search algorithms including local search, iterative local search, and simulated annealing. Finally, we describe a methodology for evaluating Bayesian-network learning algorithms, and apply this approach to a comparison of various approaches.
Article
Full-text available
Although building sophisticated learning agents that operate in complex environments will require learning to perform multiple tasks, most applications of reinforcement learning have focused on single tasks. In this paper I consider a class of sequential decision tasks (SDTs), called composite sequential decision tasks, formed by temporally concatenating a number of elemental sequential decision tasks. Elemental SDTs cannot be decomposed into simpler SDTs. I consider a learning agent that has to learn to solve a set of elemental and composite SDTs. I assume that the structure of the composite tasks is unknown to the learning agent. The straightforward application of reinforcement learning to multiple tasks requires learning the tasks separately, which can waste computational resources, both memory and time. I present a new learning algorithm and a modular architecture that learns the decomposition of composite SDTs, and achieves transfer of learning by sharing the solutions of elemental SDTs across multiple composite SDTs. The solution of a composite SDT is constructed by computationally inexpensive modifications of the solutions of its constituent elemental SDTs. I provide a proof of one aspect of the learning algorithm.
Article
Full-text available
This paper introduces stacked generalization, a scheme for minimizing the generalization error rate of one or more generalizers. Stacked generalization works by deducing the biases of the generalizer(s) with respect to a provided learning set. This deduction proceeds by generalizing in a second space whose inputs are (for example) the guesses of the original generalizers when taught with part of the learning set and trying to guess the rest of it, and whose output is (for example) the correct guess. When used with multiple generalizers, stacked generalization can be seen as a more sophisticated version of cross-validation, exploiting a strategy more sophisticated than cross-validation's crude winner-takes-all for combining the individual generalizers. When used with a single generalizer, stacked generalization is a scheme for estimating (and then correcting for) the error of a generalizer which has been trained on a particular learning set and then asked a particular question. After introducing stacked generalization and justifying its use, this paper presents two numerical experiments. The first demonstrates how stacked generalization improves upon a set of separate generalizers for the NETtalk task of translating text to phonemes. The second demonstrates how stacked generalization improves the performance of a single surface-fitter. With the other experimental evidence in the literature, the usual arguments supporting cross-validation, and the abstract justifications presented in this paper, the conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate. This paper ends by discussing some of the variations of stacked generalization, and how it touches on other fields like chaos theory.
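A minimal sketch of stacked generalization in the multiple-generalizer classification setting: the level-1 inputs are the out-of-fold guesses of the level-0 generalizers on the learning set, and a simple level-1 generalizer is fit on those guesses. The choice of base models, folds, and dataset is illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

level0 = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]

# level-1 training inputs are out-of-fold guesses of the level-0 generalizers
meta_tr = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=5, method="predict_proba")[:, 1] for m in level0
])
for m in level0:
    m.fit(X_tr, y_tr)                       # refit on the full learning set for test time
meta_te = np.column_stack([m.predict_proba(X_te)[:, 1] for m in level0])

stacker = LogisticRegression().fit(meta_tr, y_tr)   # level-1 generalizer
print("stacked accuracy:", accuracy_score(y_te, stacker.predict(meta_te)))
```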
Article
Full-text available
Learning methods based on dynamic programming (DP) are receiving increasing attention in artificial intelligence. Researchers have argued that DP provides the appropriate basis for compiling planning results into reactive strategies for real-time control, as well as for learning such strategies when the system being controlled is incompletely known. We introduce an algorithm based on DP, which we call Real-Time DP (RTDP), by which an embedded system can improve its performance with experience. RTDP generalizes Korf's Learning-Real-Time-A* algorithm to problems involving uncertainty. We invoke results from the theory of asynchronous DP to prove that RTDP achieves optimal behavior in several different classes of problems. We also use the theory of asynchronous DP to illuminate aspects of other DP-based reinforcement learning methods such as Watkins' Q-Learning algorithm. A secondary aim of this article is to provide a bridge between AI research on real-time planning and learning and relevant concepts and algorithms from control theory.
Conference Paper
Full-text available
While most Reinforcement Learning work utilizes temporal discounting to evaluate performance, the reasons for this are unclear. Is it out of desire or necessity? We argue that it is not out of desire, and seek to dispel the notion that temporal discounting is necessary by proposing a framework for undiscounted optimization. We present a metric of undiscounted performance and an algorithm for finding action policies that maximize that measure. The technique, which we call R-learning, is modelled after the popular Q-learning algorithm [17]. Initial experimental results are presented which attest to a great improvement over Q-learning in some simple cases.
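A minimal tabular sketch of the R-learning update described here, in which relative action values and a separate average-reward estimate are learned without temporal discounting. The two-state environment and the step sizes are toy assumptions, not the paper's experiments.

```python
import numpy as np

def r_learning(env_step, n_states, n_actions, steps=5000,
               beta=0.1, alpha=0.01, epsilon=0.1, rng=None):
    """Tabular R-learning: learn relative action values R and average reward rho."""
    rng = rng or np.random.default_rng(0)
    R = np.zeros((n_states, n_actions))
    rho = 0.0
    s = 0
    for _ in range(steps):
        greedy = R[s].argmax()
        a = greedy if rng.random() > epsilon else rng.integers(n_actions)
        r, s_next = env_step(s, a)
        R[s, a] += beta * (r - rho + R[s_next].max() - R[s, a])
        if a == greedy:  # update the average-reward estimate only on greedy steps
            rho += alpha * (r + R[s_next].max() - R[s].max() - rho)
        s = s_next
    return R, rho

# toy 2-state cyclic task (a placeholder environment, not from the paper)
def env_step(s, a):
    reward = 1.0 if (s == 1 and a == 1) else 0.0
    return reward, (s + 1) % 2

R, rho = r_learning(env_step, n_states=2, n_actions=2)
print(rho)   # approaches the optimal average reward of 0.5 per step
```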
Article
Full-text available
We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.
Chapter
This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounted framework. A wide spectrum of average reward algorithms are described, ranging from synchronous dynamic programming methods to several (provably convergent) asynchronous algorithms from optimal control and learning automata. A general sensitive discount optimality metric called n-discount-optimality is introduced, and used to compare the various algorithms. The overview identifies a key similarity across several asynchronous algorithms that is crucial to their convergence, namely independent estimation of the average reward and the relative values. The overview also uncovers a surprising limitation shared by the different algorithms: while several algorithms can provably generate gain-optimal policies that maximize average reward, none of them can reliably filter these to produce bias-optimal (or T-optimal) policies that also maximize the finite reward to absorbing goal states. This paper also presents a detailed empirical study of R-learning, an average reward reinforcement learning method, using two empirical testbeds: a stochastic grid world domain and a simulated robot environment. A detailed sensitivity analysis of R-learning is carried out to test its dependence on learning rates and exploration levels. The results suggest that R-learning is quite sensitive to exploration strategies, and can fall into sub-optimal limit cycles. The performance of R-learning is also compared with that of Q-learning, the best studied discounted RL method. Here, the results suggest that R-learning can be fine-tuned to give better performance than Q-learning in both domains.
Article
We examine a graphical representation of uncertain knowledge called a Bayesian network. The representation is easy to construct and interpret, yet has formal probabilistic semantics making it suitable for statistical manipulation. We show how we can use the representation to learn new knowledge by combining domain knowledge with statistical data. 1 Introduction Many techniques for learning rely heavily on data. In contrast, the knowledge encoded in expert systems usually comes solely from an expert. In this paper, we examine a knowledge representation, called a Bayesian network, that lets us have the best of both worlds. Namely, the representation allows us to learn new knowledge by combining expert domain knowledge and statistical data. A Bayesian network is a graphical representation of uncertain knowledge that most people find easy to construct and interpret. In addition, the representation has formal probabilistic semantics, making it suitable for statistical manipulation (Howard,...
Article
This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.
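A minimal sketch of a temporal-difference update in its simplest (TD(0)) form on the classic bounded random-walk prediction task: each value estimate is nudged toward the immediately following prediction rather than toward the final outcome. The task, step size, and episode count are illustrative.

```python
import numpy as np

def td0_random_walk(n_states=5, episodes=200, alpha=0.1, rng=None):
    """TD(0) value prediction on a 5-state random walk (reward 1 at the right terminal)."""
    rng = rng or np.random.default_rng(0)
    V = np.zeros(n_states + 2)               # states 0 and n_states+1 are terminal
    for _ in range(episodes):
        s = (n_states + 1) // 2               # start in the middle state
        while 0 < s < n_states + 1:
            s_next = s + (1 if rng.random() < 0.5 else -1)
            r = 1.0 if s_next == n_states + 1 else 0.0
            # credit assigned by the difference between temporally successive predictions
            V[s] += alpha * (r + V[s_next] - V[s])
            s = s_next
    return V[1:-1]

print(td0_random_walk())  # approaches the true values [1/6, 2/6, 3/6, 4/6, 5/6]
```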
Book
Artificial intelligence and expert systems have seen a great deal of research in recent years, much of which has been devoted to methods for incorporating uncertainty into models. This book is devoted to providing a thorough and up-to-date survey of this field for researchers and students.
Article
Gibbs sampling has enormous potential for analysing complex data sets. However, routine use of Gibbs sampling has been hampered by the lack of general purpose software for its implementation. Until now all applications have involved writing one-off computer code in low or intermediate level languages such as C or Fortran. We describe some general purpose software that we are currently developing for implementing Gibbs sampling: BUGS (Bayesian inference using Gibbs sampling). The BUGS system comprises three components: first, a natural language for specifying complex models; second, an 'expert system' for deciding appropriate methods for obtaining samples required by the Gibbs sampler; third, a sampling module containing numerical routines to perform the sampling. S objects are used for data input and output. BUGS is written in Modula-2 and runs under both DOS and UNIX.
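BUGS itself is a full modeling system, but the core mechanism it automates, drawing each variable in turn from its full conditional given the others, can be shown in a few lines. The bivariate-normal example below is a standard textbook illustration and is not taken from the BUGS paper.

```python
import numpy as np

def gibbs_bivariate_normal(rho=0.8, n_iter=10000, rng=None):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each full conditional is univariate normal: x | y ~ N(rho*y, 1-rho^2),
    and symmetrically for y | x, so the sampler simply alternates the two draws.
    """
    rng = rng or np.random.default_rng(0)
    samples = np.empty((n_iter, 2))
    x, y = 0.0, 0.0
    sd = np.sqrt(1.0 - rho ** 2)
    for t in range(n_iter):
        x = rng.normal(rho * y, sd)
        y = rng.normal(rho * x, sd)
        samples[t] = (x, y)
    return samples

draws = gibbs_bivariate_normal()
print(np.corrcoef(draws[2000:].T)[0, 1])  # close to 0.8 after burn-in
```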
Article
We make an analogy between images and statistical mechanics systems. Pixel gray levels and the presence and orientation of edges are viewed as states of atoms or molecules in a lattice-like physical system. The assignment of an energy function in the physical system determines its Gibbs distribution. Because of the Gibbs distribution, Markov random field (MRF) equivalence, this assignment also determines an MRF image model. The energy function is a more convenient and natural mechanism for embodying picture attributes than are the local characteristics of the MRF. For a range of degradation mechanisms, including blurring, nonlinear deformations, and multiplicative or additive noise, the posterior distribution is an MRF with a structure akin to the image model. By the analogy, the posterior distribution defines another (imaginary) physical system. Gradual temperature reduction in the physical system isolates low energy states (``annealing''), or what is the same thing, the most probable states under the Gibbs distribution. The analogous operation under the posterior distribution yields the maximum a posteriori (MAP) estimate of the image given the degraded observations. The result is a highly parallel ``relaxation'' algorithm for MAP estimation. We establish convergence properties of the algorithm and we experiment with some simple pictures, for which good restorations are obtained at low signal-to-noise ratios.
Article
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word ``reinforcement.'' The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
Article
Reasoning about change requires predicting how long a proposition, having become true, will continue to be so. Lacking perfect knowledge, an agent may be constrained to believe that a proposition persists indefinitely simply because there is no way for the agent to infer a contravening proposition with certainty. In this paper, we describe a model of causal reasoning that accounts for knowledge concerning cause-and-effect relationships and knowledge concerning the tendency for propositions to persist or not as a function of time passing. Our model has a natural encoding in the form of a network representation for probabilistic models. We consider the computational properties of our model by reviewing recent advances in computing the consequences of models encoded in this network representation. Finally, we discuss how our probabilistic model addresses certain classical problems in temporal reasoning (e.g., the frame and qualification problems).
Article
Valiant (1984) and others have studied the problem of learning various classes of Boolean functions from examples. Here we discuss incremental learning of these functions. We consider a setting in which the learner responds to each example according to a current hypothesis. Then the learner updates the hypothesis, if necessary, based on the correct classification of the example. One natural measure of the quality of learning in this setting is the number of mistakes the learner makes. For suitable classes of functions, learning algorithms are available that make a bounded number of mistakes, with the bound independent of the number of examples seen by the learner. We present one such algorithm that learns disjunctive Boolean functions, along with variants for learning other classes of Boolean functions. The basic method can be expressed as a linear-threshold algorithm. A primary advantage of this algorithm is that the number of mistakes grows only logarithmically with the number of irrelevant attributes in the examples. At the same time, the algorithm is computationally efficient in both time and space.
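A minimal sketch of a linear-threshold algorithm of the kind described, a Winnow-style multiplicative update for learning monotone disjunctions over Boolean attributes, where mistakes grow only logarithmically with the number of irrelevant attributes. The promotion/demotion factors and the toy target are assumptions for illustration.

```python
import numpy as np

def winnow_fit(X, y, threshold=None):
    """Winnow-style mistake-driven learner for Boolean examples (rows of 0/1).

    On a missed positive, weights of active attributes are doubled (promotion);
    on a false positive, they are halved (demotion). Prediction is the
    linear-threshold rule w.x >= threshold.
    """
    n = X.shape[1]
    theta = threshold if threshold is not None else n
    w = np.ones(n)
    mistakes = 0
    for x, target in zip(X, y):
        pred = int(w @ x >= theta)
        if pred != target:
            mistakes += 1
            w[x == 1] *= 2.0 if target == 1 else 0.5
    return w, mistakes

# toy target: disjunction of attributes 0 and 3 among 50 mostly irrelevant attributes
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 50))
y = (X[:, 0] | X[:, 3]).astype(int)
w, mistakes = winnow_fit(X, y)
print("mistakes:", mistakes, "largest weights at:", np.argsort(w)[-2:])
```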
Article
This paper examines whether temporal difference methods for training connectionist networks, such as Sutton's TD(λ) algorithm, can be successfully applied to complex real-world problems. A number of important practical issues are identified and discussed from a general theoretical perspective. These practical issues are then examined in the context of a case study in which TD(λ) is applied to learning the game of backgammon from the outcome of self-play. This is apparently the first application of this algorithm to a complex non-trivial task. It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs, and which in fact surpasses comparable networks trained on a massive human expert data set. This indicates that TD learning may work better in practice than one would expect based on current theory, and it suggests that further analysis of TD methods, as well as applications in other complex domains, may be worth investigating.
Article
Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy. The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy.
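A minimal sketch of bagging with an unstable base learner: bootstrap replicates of the learning set are drawn, one tree is grown per replicate, and predictions are combined by plurality vote. The dataset and the number of replicates are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(50):
    idx = rng.integers(0, len(X_tr), size=len(X_tr))   # bootstrap replicate of the learning set
    trees.append(DecisionTreeClassifier(random_state=0).fit(X_tr[idx], y_tr[idx]))

votes = np.stack([t.predict(X_te) for t in trees])      # plurality vote over the versions
bagged = (votes.mean(axis=0) >= 0.5).astype(int)

single = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr).predict(X_te)
print("single tree:", accuracy_score(y_te, single))
print("bagged trees:", accuracy_score(y_te, bagged))
```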
Article
Considerable literature has accumulated over the years regarding the combination of forecasts. The primary conclusion of this line of research is that forecast accuracy can be substantially improved through the combination of multiple individual forecasts. Furthermore, simple combination methods often work reasonably well relative to more complex combinations. This paper provides a review and annotated bibliography of that literature, including contributions from the forecasting, psychology, statistics, and management science literatures. The objectives are to provide a guide to the literature for students and researchers and to help researchers locate contributions in specific areas, both theoretical and applied. Suggestions for future research directions include (1) examination of simple combining approaches to determine reasons for their robustness, (2) development of alternative uses of multiple forecasts in order to make better use of the information they contain, (3) use of combined forecasts as benchmarks for forecast evaluation, and (4) study of subjective combination procedures. Finally, combining forecasts should become part of the mainstream of forecasting practice. In order to achieve this, practitioners should be encouraged to combine forecasts, and software to produce combined forecasts easily should be made available.
Conference Paper
In this paper, we describe the partially observable Markov decision process (pomdp) approach to finding optimal or near-optimal control strategies for partially observable stochastic environments, given a complete model of the environment. The pomdp approach was originally developed in the operations research community and provides a formal basis for planning problems that have been of interest to the AI community. We found the existing algorithms for computing optimal control strategies to be highly computationally inefficient and have developed a new algorithm that is empirically more efficient. We sketch this algorithm and present preliminary results on several small problems that illustrate important properties of the pomdp approach.
Conference Paper
Breiman's bagging and Freund and Schapire's boosting are recent methods for improving the predictive power of classifier learning systems. Both form a set of classifiers that are combined by voting: bagging by generating replicated bootstrap samples of the data, and boosting by adjusting the weights of training instances. This paper reports results of applying both techniques to a system that learns decision trees, testing on a representative collection of datasets. While both approaches...
Conference Paper
Standard methods of constructing decision trees can be prohibitively expensive when induction algorithms are given very large training sets on which to compare attributes. This expense can often be avoided. By using a subsample for this calculation, we can get an approximation to the information gain used to assess each attribute. Selecting an attribute on this basis involves a certain risk, which depends on the subsample size and the information gain values observed. This paper addresses the questions of assessing when the choice may be made with a given expected error, and determining a sampling strategy that minimizes the computation cost of making it. The theory is entirely analogous to that used for decision-theoretic control of search. An empirical evaluation shows that an implementation performs well over an appropriately wide range of inputs on two fundamental tasks: deciding whether a choice of attribute can be safely made on a given sample, and if not, estimating how large a subsample is likely to be required to do so.
Conference Paper
In real-world concept learning problems, the representation of data often uses many features, only a few of which may be related to the target concept. In this situation, feature selection is important both to speed up learning and to improve concept quality. A new feature selection algorithm Relief uses a statistical method and avoids heuristic search. Relief requires linear time in the number of given features and the number of training instances regardless of the target concept to be learned. Although the algorithm does not necessarily find the smallest subset of features, the size tends to be small because only statistically relevant features are selected. This paper focuses on empirical test results in two artificial domains; the LED Display domain and the Parity domain with and without noise. Comparison with other feature selection algorithms shows Relief's advantages in terms of learning time and the accuracy of the learned concept, suggesting Relief's practicality.
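A minimal sketch of the basic Relief weight update for binary classification with numeric features scaled to [0, 1]: each sampled instance pulls feature weights down by the per-feature distance to its nearest hit (same class) and up by the distance to its nearest miss (other class). The Manhattan distance and the toy data are simplifying assumptions.

```python
import numpy as np

def relief_weights(X, y, n_samples=None, rng=None):
    """Basic Relief feature weighting for a two-class problem (a sketch)."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    m = n_samples or n
    w = np.zeros(d)
    for _ in range(m):
        i = rng.integers(n)
        diffs = np.abs(X - X[i])              # per-feature differences to every instance
        dist = diffs.sum(axis=1)
        dist[i] = np.inf                      # exclude the sampled instance itself
        same, other = (y == y[i]), (y != y[i])
        hit = np.where(same, dist, np.inf).argmin()
        miss = np.where(other, dist, np.inf).argmin()
        w += (diffs[miss] - diffs[hit]) / m   # reward separation from the miss, penalize from the hit
    return w

# toy check: only the first two features carry the class signal
rng = np.random.default_rng(1)
X = rng.random((300, 6))
y = ((X[:, 0] + X[:, 1]) > 1.0).astype(int)
print(relief_weights(X, y).round(3))          # weights of features 0 and 1 come out largest
```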
Conference Paper
This paper outlines some problems that may occur with Reduced Error Pruning in relational learning algorithms, most notably efficiency. Thereafter a new method, Incremental Reduced Error Pruning, is proposed that attempts to address all of these problems. Experiments show that in many noisy domains this method is much more efficient than alternative algorithms, along with a slight gain in accuracy. However, the experiments show as well that the use of the algorithm cannot be recommended for domains which require a very specific concept description.
Conference Paper
Probabilistic networks, which provide compact descriptions of complex stochastic relationships among several random variables, are rapidly becoming the tool of choice for uncertain reasoning in artificial intelligence. We show that networks with fixed structure containing hidden variables can be learned automatically from data using a gradient-descent mechanism similar to that used in neural networks. We also extend the method to networks with intensionally represented distributions, including networks with continuous variables and dynamic probabilistic networks. Because probabilistic networks provide explicit representations of causal structure, human experts can easily contribute prior knowledge to the training process, thereby significantly improving the learning rate. Adaptive probabilistic networks (APNs) may soon compete directly with neural networks as models in computational neuroscience as well as in industrial and financial applications.
Article
Reinforcement learning has become one of the most actively studied learning frameworks in the area of intelligent autonomous agents. This article describes the results of a three-day meeting of leading researchers in this area that was sponsored by the National Science Foundation. Because reinforcement learning is an interdisciplinary topic, the workshop brought together researchers from a variety of fields, including machine learning, neural networks, AI, robotics, and operations research. Thirty leading researchers from the United States, Canada, Europe, and Japan, representing many different universities and government and industrial research laboratories, participated in the workshop. The goals of the meeting were to (1) understand the limitations of current reinforcement-learning systems and define promising directions for further research; (2) clarify the relationships between reinforcement learning and existing work in engineering fields, such as operations research; and (3) identify potential industrial applications of reinforcement learning.
Article
This article outlines explanation-based learning (EBL) and its role in improving problem solving performance through experience. Unlike inductive systems, which learn by abstracting common properties from multiple examples, EBL systems explain why a particular example is an instance of a concept. The explanations are then converted into operational recognition rules. In essence, the EBL approach is analytical and knowledge-intensive, whereas inductive methods are empirical and knowledge-poor. This article focuses on extensions of the basic EBL method and their integration with the prodigy problem solving system. prodigy's EBL method is specifically designed to acquire search control rules that are effective in reducing total search time for complex task domains. Domain-specific search control rules are learned from successful problem solving decisions, costly failures, and unforeseen goal interactions. The ability to specify multiple learning strategies in a declarative manner enables EBL to serve as a general technique for performance improvement. prodigy's EBL method is analyzed, illustrated with several examples and performance results, and compared with other methods for integrating EBL and problem solving.
Article
This paper presents a Bayesian method for constructing probabilistic networks from databases. In particular, we focus on constructing Bayesian belief networks. Potential applications include computer-assisted hypothesis testing, automated scientific discovery, and automated construction of probabilistic expert systems. We extend the basic method to handle missing data and hidden (latent) variables. We show how to perform probabilistic inference by averaging over the inferences of multiple belief networks. Results are presented of a preliminary evaluation of an algorithm for constructing a belief network from a database of cases. Finally, we relate the methods in this paper to previous work, and we discuss open problems.
Article
We explore the use of Rissanen's minimum description length principle for the construction of decision trees. Empirical results comparing this approach to other methods are given.