Figure 3: example of an evolved solution (a Python expression), as referenced in the contexts below; image not reproduced here.
Source publication
This paper introduces a novel evolutionary approach which can be applied to supervised, semi-supervised and unsupervised learning tasks. The method, Grammatical Evolution Machine Learning (GEML), adapts machine learning concepts from decision tree learning and clustering methods, and integrates these into a Grammatical Evolution framework. With mino...
Contexts in source publication
Context 1
... this work we have specified a grammar, shown in Figure 1, which facilitates the allocation of data instances to clusters based on the results of applying simple 'if then else' decision rules. Although the rules are simple, the grammar allows for the construction of rich expressions which may be capable of representing both simple and complex relationships between attributes as seen in Figure 3. ...
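The grammar itself appears only as Figure 1 in the paper and is not reproduced on this page. As a rough, assumed illustration of what a grammar of this kind could look like, the sketch below writes a small BNF-style grammar for 'if then else' cluster-assignment rules as a Python string; the attribute names, constants and number of clusters are placeholders, not the authors' actual production rules.

```python
# Illustrative sketch only: a BNF-style grammar for 'if then else' cluster
# assignment rules, written as a Python string in the style used by common
# GE tools. The real grammar is the one shown in Figure 1 of the paper;
# every non-terminal, attribute and constant here is a placeholder.
GEML_LIKE_GRAMMAR = r"""
<expr>    ::= (<cluster> if <cond> else <expr>) | <cluster>
<cond>    ::= <attr> <op> <attr> | <attr> <op> <const>
<attr>    ::= x[0] | x[1] | x[2] | x[3]
<op>      ::= < | <= | > | >= | ==
<const>   ::= 0.0 | 0.5 | 1.0 | 2.0
<cluster> ::= 0 | 1 | 2
"""
```

In a GE run, each integer genome is mapped through such a grammar, so every individual decodes to one nested conditional expression of this form.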
Context 2
... all cases, the same mapping from cluster assignment to class label determined during the training phase also applies when evaluating performance on test data. For each of the GEML methods, the evolved solutions look something like the example shown in Figure 3, which is essentially a Python expression that can be evaluated for each training and each test instance. The result of evaluating the expression on a given instance is an integer, which is converted first into a cluster assignment and then into a class label, using the method already described. ...
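To make the described pipeline concrete, a minimal sketch of how such an evolved expression might be evaluated is given below. The expression string, attribute vector and cluster-to-class mapping are invented for illustration; the actual example is the one shown in Figure 3.

```python
# Minimal sketch of evaluating an evolved GEML-style expression per instance.
# The phenotype string below is invented; real ones are produced by the grammar.
phenotype = "(0 if x[2] <= x[0] else (1 if x[1] > 0.5 else 2))"

def assign_cluster(x, expr=phenotype):
    """Evaluate the evolved Python expression for one instance; returns an integer cluster id."""
    return eval(expr, {"__builtins__": {}}, {"x": x})

# Cluster-to-class mapping fixed during training (e.g. majority class per cluster)
# and reused unchanged on the test set, as described above.
cluster_to_class = {0: "A", 1: "B", 2: "A"}

def predict(x):
    return cluster_to_class[assign_cluster(x)]

print(predict([0.3, 0.9, 0.5]))  # -> "B"
```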
Similar publications
Background:
COPD is a highly heterogeneous disease composed of different phenotypes with different aetiological and prognostic profiles, and current classification systems do not fully capture this heterogeneity. In this study we sought to discover, describe and validate COPD subtypes using cluster analysis on data derived from electronic health re...
The amount of data is crucial to the accuracy of fault classification through machine learning techniques. In the wind energy harvesting industry, due to the shortage of faulty data obtained in real practice, together with ever-changing operational conditions, fault detection and evaluation of wind turbine blade problems become intractable through conv...
Under dynamic conditions on bridges, real-time management is needed. To this end, this paper presents a rule-based decision support system in which the necessary rules are extracted from simulation results produced by the Aimsun traffic micro-simulation software. These rules are then generalized with the aid of fuzzy rule generation algorithms. Then, they a...
The paper deals with k-means clustering and the Logic Learning Machine (LLM) for the detection of DNS tunneling. As the LLM shows more versatility in rule generation and classification precision with respect to traditional Decision Trees, the approach proves to be robust to a large set of system conditions. The detection algorithm is designed to be ap...
Citations
... Evolutionary algorithms have also been applied to semi-supervised problems. In [6], the application of grammatical evolution (GE) to semi-supervised classification is proposed. The quality of the model is evaluated by combining the accuracy on the labeled data with a measure of cluster quality on the unlabeled examples. ...
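A combined fitness of this kind could be sketched roughly as follows; the equal weighting and the use of the silhouette score as the cluster-quality measure are assumptions made here for illustration, not necessarily the exact measure used in [6].

```python
# Rough sketch of a semi-supervised fitness combining labeled accuracy with a
# cluster-quality term on unlabeled data. The 50/50 weighting and the silhouette
# score are illustrative choices, not necessarily those of the cited work.
import numpy as np
from sklearn.metrics import accuracy_score, silhouette_score

def semi_supervised_fitness(predict_cluster, cluster_to_class,
                            X_lab, y_lab, X_unlab, alpha=0.5):
    clusters_lab = np.array([predict_cluster(x) for x in X_lab])
    y_pred = np.array([cluster_to_class[c] for c in clusters_lab])
    acc = accuracy_score(y_lab, y_pred)

    clusters_unlab = np.array([predict_cluster(x) for x in X_unlab])
    # silhouette_score requires at least two distinct clusters
    if len(set(clusters_unlab)) > 1:
        quality = (silhouette_score(X_unlab, clusters_unlab) + 1) / 2  # map [-1, 1] -> [0, 1]
    else:
        quality = 0.0
    return alpha * acc + (1 - alpha) * quality
```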
... In [6], Fitzgerald et al. addressed semi-supervised problems using grammatical evolution (GE). They employ a grammar codifying if-then rules and evolve programs able to assign instances to different clusters based on their features. ...
... To create the classification problem benchmark, we follow the approach described in [6], in which fully labeled datasets are used and partial labeling is simulated by considering only a random subset of the training data to be labeled; the rest of the dataset is treated as unlabeled. Starting from a binary classification problem for which all labels are known, we will simulate the semi-supervised scenario by removing the labels for a proportion q of the instances in the dataset. ...
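That partial-labeling simulation can be sketched in a few lines; the function and variable names below are illustrative, with q the proportion of instances whose labels are hidden.

```python
# Sketch of the partial-labeling simulation described above: start from a fully
# labeled dataset and hide the labels of a random proportion q of the instances.
import numpy as np

def simulate_semi_supervised(X, y, q, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    hidden = rng.choice(n, size=int(q * n), replace=False)
    y_partial = np.array(y, dtype=object)
    y_partial[hidden] = None          # None marks an unlabeled instance
    return X, y_partial

# Example: hide 70% of the labels of a toy binary dataset.
X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)
_, y_partial = simulate_semi_supervised(X, y, q=0.7)
print(sum(lab is None for lab in y_partial))  # ~70 unlabeled instances
```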
In some machine learning applications the availability of labeled instances for supervised classification is limited while unlabeled instances are abundant. Semi-supervised learning algorithms deal with these scenarios and attempt to exploit the information contained in the unlabeled examples. In this paper, we address the question of how to evolve neural networks for semi-supervised problems. We introduce neuroevolutionary approaches that exploit unlabeled instances by using neuron coverage metrics computed on the neural network architecture encoded by each candidate solution. Neuron coverage metrics resemble code coverage metrics used to test software, but are oriented to quantify how the different neural network components are covered by test instances. In our neuroevolutionary approach, we define fitness functions that combine classification accuracy computed on labeled examples and neuron coverage metrics evaluated using unlabeled examples. We assess the impact of these functions on semi-supervised problems with a varying amount of labeled instances. Our results show that the use of neuron coverage metrics helps neuroevolution to become less sensitive to the scarcity of labeled data, and can lead in some cases to a more robust generalization of the learned classifiers.
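As a rough illustration of the idea rather than the authors' implementation, a simple activation-based coverage term and a combined fitness might look like the sketch below, assuming a feed-forward network represented as a list of (weights, bias) layers; the activation function, threshold and weighting are assumptions.

```python
# Illustrative sketch of a neuron-coverage term for a feed-forward network,
# combined with labeled accuracy. The network representation, tanh activation,
# threshold and weighting are assumptions, not the paper's actual code.
import numpy as np

def neuron_coverage(layers, X_unlab, threshold=0.0):
    """Fraction of neurons activated above `threshold` by at least one unlabeled input."""
    covered, total = 0, 0
    a = np.asarray(X_unlab, dtype=float)
    for W, b in layers:
        a = np.tanh(a @ W + b)                          # layer activations
        covered += int(np.sum((a > threshold).any(axis=0)))
        total += W.shape[1]
    return covered / total

def fitness(accuracy_on_labeled, coverage, beta=0.3):
    """Weighted combination of labeled accuracy and coverage on unlabeled data."""
    return (1 - beta) * accuracy_on_labeled + beta * coverage

# Toy usage with random weights and unlabeled inputs.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 8)), np.zeros(8)), (rng.normal(size=(8, 3)), np.zeros(3))]
X_unlab = rng.normal(size=(50, 4))
print(fitness(accuracy_on_labeled=0.8, coverage=neuron_coverage(layers, X_unlab)))
```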
... Another research approach is to adopt a single decision tree and improve the fitting algorithm in order to achieve higher predictive performance, for instance by using Evolutionary Computation (EC), as proposed in Chabbouh et al. (2019), Czajkowski and Kretowski (2019a), Fitzgerald et al. (2015), Rivera-López and Canul-Reich (2018) and Motsinger-Reif et al. (2010). ...
... GE is particularly suited for variable-length solution representations, which is the case for a DT. Indeed, GE was used to evolve DTs in Motsinger-Reif et al. (2010) and Fitzgerald et al. (2015), outperforming standard DT algorithms (e.g., C4.5, CART) in several classification tasks. The advantage of using GE is that no limiting threshold needs to be set a priori, which is a limitation of the fixed-length tree representations used in Rivera-López and Canul-Reich (2018) and Chabbouh et al. (2019). ...
... An important distinctive aspect of our work is the type of EC goal. Most related works from Table 1, including the ones that use GE (Fitzgerald et al., 2015; Motsinger-Reif et al., 2010), only focus on predictive performance and not interpretability. These two goals are usually conflicting, and thus a trade-off often needs to be made. ...
The worldwide adoption of mobile devices is raising the value of Mobile Performance Marketing, which is supported by Demand-Side Platforms (DSP) that match mobile users to advertisements. In these markets, monetary compensation only occurs when there is a user conversion. Thus, a key DSP issue is the design of a data-driven model to predict user conversion. To handle this nontrivial task, we propose a novel Multi-objective Optimization (MO) approach to evolve Decision Trees (DT) using a Grammatical Evolution (GE), under two main variants: a pure GE method (MGEDT) and a GE with Lamarckian Evolution (MGEDTL). Both variants evolve variable-length DTs and perform a simultaneous optimization of the predictive performance and model complexity. To handle big data, the GE methods include a training sampling and parallelism evaluation mechanism. The algorithms were applied to a recent database with around 6 million records from a real-world DSP. Using a realistic Rolling Window (RW) validation, the two GE variants were compared with a standard DT algorithm (CART), a Random Forest and a state-of-the-art Deep Learning (DL) model. Competitive results were obtained by the GE methods, which present affordable training times and very fast predictive response times.
... The effectiveness of KGP was demonstrated on a wide set of test problems. Finally, a novel evolutionary approach was proposed in [66], which can be applied to supervised, semi-supervised and unsupervised learning tasks. The method, Grammatical Evolution Machine Learning (GEML), adapts machine learning concepts from decision tree learning and clustering methods, and integrates these into a Grammatical Evolution framework. ...
... The authors state that the framework generates human readable solutions, which explain the mechanics behind the classification decisions, offering a significant advantage over existing paradigms for unsupervised and semi-supervised learning. Even though [64], [65] and [66] all deal with overfitting, and we can assume that the data errors are outliers that the models should not (over)fit, to the best of our knowledge no work has been published yet that is explicitly devoted to the use of GP in a semi-supervised manner for dealing with data errors. ...
Data gathered in the real world normally contains noise, either stemming from inaccurate experimental measurements or introduced by human errors. Our work deals with classification data where the attribute values were accurately measured, but the categories may have been mislabeled by the human in several sample points, resulting in unreliable training data. Genetic Programming (GP) compares favorably with the Classification and Regression Trees (CART) method, but it is still highly affected by these errors. Despite consistently achieving high accuracy in both training and test sets, many classification errors are found in a later validation phase, revealing a previously hidden overfitting to the erroneous data. Furthermore, the evolved models frequently output raw values that are far from the expected range. To improve the behavior of the evolved models, we extend the original training set with additional sample points where the class label is unknown, and devise a simple way for GP to use this additional information and learn in a semi-supervised manner. The results are surprisingly good. In the presence of the exact same mislabeling errors, the additional unlabeled data allowed GP to evolve models that achieved high accuracy also in the validation phase. This is a brand new approach to semi-supervised learning that opens an array of possibilities for making the most of the abundance of unlabeled data available today, in a simple and inexpensive way.
In this paper, we propose a hybrid approach to solving multi-class problems which combines evolutionary computation with elements of traditional machine learning. The method, Grammatical Evolution Machine Learning (GEML), adapts machine learning concepts from decision tree learning and clustering methods and integrates these into a Grammatical Evolution framework. We investigate the effectiveness of GEML on several supervised, semi-supervised and unsupervised multi-class problems and demonstrate its competitive performance when compared with several well known machine learning algorithms. The GEML framework evolves human readable solutions which provide an explanation of the logic behind its classification decisions, offering a significant advantage over existing paradigms for unsupervised and semi-supervised learning. In addition, we examine the possibility of improving the performance of the algorithm through the application of several ensemble techniques.
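On the ensemble point, one simple possibility is majority voting over several independently evolved GEML classifiers, sketched below; the paper examines several ensemble techniques, and this example is not claimed to be the specific variant used.

```python
# Sketch of one simple ensemble technique (majority voting) over several
# independently evolved classifiers; only an illustrative possibility.
from collections import Counter

def majority_vote(classifiers, x):
    """Each classifier maps an instance x to a class label; ties go to the label seen first."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Usage with three toy "evolved" classifiers:
ensemble = [lambda x: "A" if x[0] > 0.5 else "B",
            lambda x: "A" if x[1] > 0.2 else "B",
            lambda x: "B"]
print(majority_vote(ensemble, [0.7, 0.1]))  # -> "B"
```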