Fig 4 - uploaded by Hayder Naser Khraibet Al-Behadili
Source publication
Rule-based classification is considered an important task of data classification. The ant-mining rule-based classification algorithm, inspired by the ant colony optimization algorithm, performs comparably to existing methods in the literature and outperforms them in some application domains. One problem that often arises in any rule-bas...
Context in source publication
Context 1
... pruning is a common technique in rule-based classifiers that reduces the size of the discovered rules and avoids overfitting to noisy data [14]. Noise in a dataset has several causes, such as incorrect input, programming errors, and hardware failures. Such noisy data misguide the learning algorithm and produce a very poor classifier. In the learning process, the algorithm adds terms to the rule to increase its predictive accuracy, which can fit the training instances too closely. The rule covers positive instances (instances correctly predicted by the rule), which are then removed from the training set; the remaining instances are passed to a new rule generation step. A rule produced in this way is a perfect fit for the instances from which it was generated. However, when rules are generated from noisy instances, they are highly complicated, lack usefulness, and exhibit low predictive accuracy on unseen instances. This problem is known as overfitting: the constructed rules fit particular training instances too well, or exactly, and do not generalize to unseen data. Those rules then degrade the performance of the whole model. An example of an overfitting rule, produced by the ant-miner algorithm without rule pruning in an experiment on the Ljubljana breast cancer dataset from the UCI Machine Learning Repository, is shown in Fig. 2. This dataset consists of nine attributes, all of which appear in the rule. Moreover, the rule is a perfect fit to the data instances from which it was generated. This problem can be solved by detecting the significant terms in the generated rule and pruning the irrelevant terms that contribute little to classifying the instances. This mechanism aims to improve the accuracy of the rule and increase its simplicity [15].
Post-pruning, pre-pruning, and hybrid-pruning are three strategies used in the ant-miner rule-based classifier. In the pre-pruning strategy, the rule discovery algorithm is halted before a full rule is created. The stopping condition handles irrelevant terms during the learning process (i.e., term selection stops when the impurity measured for some terms is less than a predetermined value). Post-pruning, by contrast, deals with irrelevant terms after an overfitting rule has been constructed; in this strategy, the rule grows to maximum size, and the irrelevant terms are then deleted from it. Hybrid-pruning combines the characteristics of post-pruning and pre-pruning. To observe the influence of rule pruning, an experiment was undertaken with the same Ljubljana breast cancer dataset, using the same parameter settings on the ant-miner algorithm with the traditional post-pruning procedure. We obtained a different rule from similar instances, as shown in Fig. 3. This rule involves only two attributes. Thus, the rule is simpler and has fewer terms; it also covers more instances and is more accurate. These examples show the impact of rule pruning in ant-miner classification algorithms, which is equivalent to that of local search in stochastic methods. The pruning procedure searches for an improvement to each candidate solution produced by each ant and also increases its simplicity. In contrast, designing the algorithm without pruning techniques, as in the MACO algorithm [16], will introduce complex rules and may lead to overfitting. Rule pruning therefore plays an important role in constructing a classification model in ant-miner and its variants. However, in all pruning strategies, excessive pre-pruning or post-pruning may lead to a very simple rule that cannot capture the underlying structure of the data.
This problem is known as underfitting. Such a rule is unsuitable and leads to poor predictive performance on the data. Fig. 4 shows the overfitting and underfitting rules based on predictive error and model complexity. ...
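The post-pruning strategy described above can be illustrated with a minimal Python sketch. It is not the source publication's implementation. It assumes the rule quality measure commonly associated with the original Ant-Miner (sensitivity multiplied by specificity), keeps the predicted class fixed during pruning, and stops as soon as every possible removal would lower the quality. The names `covers`, `quality`, and `post_prune`, and the dictionary-based rule representation, are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: an instance is (attribute values, class label),
# and a rule antecedent maps each tested attribute to its required value.
Instance = Tuple[Dict[str, str], str]
Rule = Dict[str, str]


def covers(rule: Rule, attrs: Dict[str, str]) -> bool:
    """Return True when every term of the rule holds for the instance."""
    return all(attrs.get(a) == v for a, v in rule.items())


def quality(rule: Rule, target: str, data: List[Instance]) -> float:
    """Sensitivity * specificity of the rule for class `target` (assumed measure)."""
    tp = fp = fn = tn = 0
    for attrs, label in data:
        hit = covers(rule, attrs)
        if hit and label == target:
            tp += 1
        elif hit:
            fp += 1
        elif label == target:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity * specificity


def post_prune(rule: Rule, target: str, data: List[Instance]) -> Rule:
    """Repeatedly drop the term whose removal gives the best quality.

    Stops when one term is left or when every removal would lower quality.
    """
    rule = dict(rule)
    best = quality(rule, target, data)
    while len(rule) > 1:
        scored = []
        for drop in rule:
            shorter = {a: v for a, v in rule.items() if a != drop}
            scored.append((quality(shorter, target, data), shorter))
        top_quality, top_rule = max(scored, key=lambda s: s[0])
        if top_quality < best:
            break
        rule, best = top_rule, top_quality
    return rule
```

On a toy dataset in which only one attribute determines the class, the sketch removes the other term, which mirrors the move from a rule that uses every attribute to a shorter rule that covers more instances.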
Citations
... As mining kidnapping began to spread in the wild to abuse browser users' CPU resources, many researchers started to propose methods to block web mining, such as recording the file names of mining scripts or the domain names of mining pages into a blacklist. Plug-ins such as MinerBlock, No Coin, and AntiMiner [20], or techniques that use machine learning to analyze and detect virtual currency mining scripts, such as TLSH [21], were developed. However, mining kidnapping attacks continue to exist. ...
Coinhive released its browser-based cryptocurrency mining code in September 2017, and vicious web page writers, called vicious miners hereafter, began to embed mining JavaScript code into their web pages, called mining pages hereafter. As a result, browser users surfing these web pages unwittingly mine cryptocurrencies for the vicious miners using the CPU resources of their devices. This activity, called cryptojacking, has become one of the most common threats to web browser users. As mining pages reduce the execution efficiency of regular programs and increase the electricity bills of victims, security specialists have started to provide methods to block mining pages. Nowadays, using a blocklist to filter out mining scripts is the most common solution to this problem. However, when the number of new mining pages increases quickly, and vicious miners apply obfuscation and encryption to bypass detection, the detection accuracy of blacklist-based or feature-based solutions decreases significantly. This paper proposes a solution, called MinerGuard, to detect mining pages. MinerGuard was designed based on the observation that mining JavaScript code consumes a lot of CPU resources because it needs to execute a large amount of computation. MinerGuard does not need to frequently update the data used for detection. In contrast, blacklist-based or feature-based solutions must update their blocklists frequently. Experimental results show that MinerGuard is more accurate than blacklist-based or feature-based solutions in mining page detection. MinerGuard’s detection rate for mining pages is 96%, whereas that of MinerBlock, a blacklist-based solution, is 42.85%. Moreover, MinerGuard can detect 0-day mining pages and scripts, but the blacklist-based and feature-based solutions cannot.
... However, due to a certain degree of imprecision in real-world data, some of the extracted chronicle rules may suffer from low quality. Also, overfitting is a problem that often arises in rule-based classification and prediction problems [4]. In this context, these low-quality and overfitting rules may reduce the accuracy and efficiency of KSPMI. ...
In the context of Industry 4.0, smart factories use advanced sensing and data analytic technologies to understand and monitor the manufacturing processes. To enhance production efficiency and reliability, statistical Artificial Intelligence (AI) technologies such as machine learning and data mining are used to detect and predict potential anomalies within manufacturing processes. However, due to the heterogeneous nature of industrial data, sometimes the knowledge extracted from industrial data is presented in a complex structure. This brings the semantic gap issue which stands for the lack of interoperability among different manufacturing systems. Furthermore, as the Cyber-Physical Systems (CPS) are becoming more knowledge-intensive, uniform knowledge representation of physical resources and real-time reasoning capabilities for analytic tasks are needed to automate the decision-making processes for these systems. These requirements highlight the potential of using symbolic AI for predictive maintenance.
To automate and facilitate predictive analytics in Industry 4.0, in this paper, we present a novel Knowledge-based System for Predictive Maintenance in Industry 4.0 (KSPMI). KSPMI is developed based on a novel hybrid approach that leverages both statistical and symbolic AI technologies. The hybrid approach involves using statistical AI technologies such as machine learning and chronicle mining (a special type of sequential pattern mining approach) to extract machine degradation models from industrial data. On the other hand, symbolic AI technologies, especially domain ontologies and logic rules, will use the extracted chronicle patterns to query and reason on system input data with rich domain and contextual knowledge. This hybrid approach uses Semantic Web Rule Language (SWRL) rules generated from chronicle patterns together with domain ontologies to perform ontology reasoning, which enables the automatic detection of machinery anomalies and the prediction of future events’ occurrence. KSPMI is evaluated and tested on both real-world and synthetic data sets.
... This value is updated when rules are discovered and the size of the training set is reduced. The amount of pheromone is updated after each experiment; thus, the selection of idioms for other ants is influenced in future experiments [29], [30]. ...
Background and Objectives: The Ant-Miner algorithm works based on Ant Colony Optimization as a tool for data analysis, and is used to explore classified laws from a set of data. In the current study, two new methods have been proposed for the purpose of optimizing this algorithm. The first method adopted logical negation operation on the records of the produced laws, while the second employed a new pheromone update strategy called "Generalized exacerbation of quality conflict". The two proposed methods were executed in Visual studio C#.Net, and 8 public datasets were applied in the test. Each one of these datasets was executed 10 times both in an independent way and combined with others, and the average results were recorded. Methods: In this study, we have proposed two approaches for the earlier method. Using the first method in the construction of rule records, idioms that include the rules can be made in the form of . Compared to the idioms of the early algorithm, these idioms are more compatible while constructing rules with high coverage. The advantage of this generalization is the reduction of the produced rules, which results in greater understandability of the output. During the process of pheromone update in the ordinary ACO algorithms, the amount of the sprayed pheromone is a function of the quality of rules. The objective of the second method is to strengthen the conflict between not-found, weak, good, and superior solutions. This method is a new strategy of pheromone update where ants with high-quality solutions are motivated through increasing the amount of pheromone sprayed on the trail that they have found; conversely, the ants that find weaker solutions are punished through eliminating pheromone from their trails. Results: The optimization of the initial algorithm using the two proposed methods produces a smaller number of rules, but increases the number of construction diagrams and prevents the production of low-quality rules.
Conclusion: The results of tests performed on the dataset indicated the enhancement of algorithm efficiency in idioms of fewer tests, increased prediction accuracy of laws, and improved comprehensibility of the produced laws using the proposed methods.
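The reward-and-punish pheromone update outlined in the abstract above can be sketched roughly as follows. The abstract does not give the update formula, so everything here is an assumption made for illustration: the quality thresholds `good` and `weak`, the `reward` and `penalty` factors, the final normalization step, and the function name `update_pheromone`.

```python
from typing import Dict, Hashable, Iterable


def update_pheromone(
    pheromone: Dict[Hashable, float],
    rule_terms: Iterable[Hashable],
    quality: float,
    good: float = 0.6,     # hypothetical threshold for a high-quality rule
    weak: float = 0.3,     # hypothetical threshold for a weak rule
    reward: float = 1.0,   # hypothetical reinforcement factor
    penalty: float = 0.5,  # hypothetical fraction of pheromone removed
) -> Dict[Hashable, float]:
    """Reinforce trails of good rules, drain trails of weak ones, then normalize."""
    updated = dict(pheromone)
    for term in rule_terms:
        if quality >= good:
            # Reward: deposit pheromone in proportion to rule quality.
            updated[term] += updated[term] * quality * reward
        elif quality < weak:
            # Punish: remove part of the pheromone on the trail.
            updated[term] *= 1.0 - penalty
        # Rules of intermediate quality leave the trail unchanged.
    total = sum(updated.values())
    return {term: value / total for term, value in updated.items()}
```

Normalizing after the update means that terms not used by a rewarded rule lose pheromone relative to the others, while terms not used by a punished rule gain it.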
... It uses the collective behavior of self-organized systems. Examples of swarm intelligence algorithms are particle swarm optimization (PSO) [44], ant colony optimization [45], artificial bee colony (ABC) [46], grey wolf optimization (GWO) [47], and bat algorithm (BA) [48]. ...
In today's world, the data generated by many applications are increasing drastically, and finding an optimal subset of features from the data has become a crucial task. The main objective of this review is to analyze and comprehend different stochastic local search algorithms to find an optimal feature subset. Simulated annealing, tabu search, genetic programming, genetic algorithm, particle swarm optimization, artificial bee colony, grey wolf optimization, and bat algorithm, which have been used in feature selection, are discussed. This review also highlights the filter and wrapper approaches for feature selection. Furthermore, this review highlights the main components of stochastic local search algorithms, categorizes these algorithms in accordance with the type, and discusses the promising research directions for such algorithms in future research of feature selection.
... One of the supervised learning techniques that has gained significant attention is rule-based classification, which extracts classification rules from the data. One of the prominent algorithms used for rule-based classification is the ant colony optimization (ACO)-based Ant-Miner and its variants [12], [13]. The Ant-Miner produces a comprehensive classification model by finding a list of classification rules in IF-THEN fashion from the data. ...
Ant colony optimization (ACO) is a well-known algorithm from swarm intelligence that plays an essential role in obtaining rich solutions to complex problems with wide search space. ACO is successfully applied to different application problems involving rules-based classification through an ant-miner classifier. However, in the ant-miner classifier, rule pruning suffers from the nesting effect, which originates from the greedy Sequential Backward Selection (SBS) method used in term selection and removes the opportunity to obtain a good pruned rule by adding/removing terms during the pruning process. This paper presents an extension to the Ant-Miner, namely the genetic algorithm Ant-Miner (GA-Ant Miner), which incorporates the use of GA as a key aspect in the design and implementation of a new rule pruning technique. This pruning technique consists of three fundamental procedures: an initial population Ant-Miner, crossover to prune the rule, and mutation to diversify the pruned classification rule. The GA-Ant Miner performance is tested and compared with the most related ant-mining classifiers, including the original Ant-Miner, ACO/PSO2, TACO-Miner, CAnt-Miner, and Ant-Miner with a hybrid pruner, across various publicly available UCI datasets. These datasets vary in terms of instance number, feature size, class number, and application domain. Overall, the performance results indicate that the GA-Ant Miner classifier outperforms the other five classifiers in classification accuracy and model size. Furthermore, the experimental results using a statistical test indicate that GA-Ant Miner is the best classifier when considering multiple objectives (i.e., accuracy and model size ranks).
... Pheromone trails represent distributed, numerical information that ACO adapts during its execution to reflect the search experience (Dorigo and Stützle, 2004). Many applications involve the usage of the ACO metaheuristic framework, such as scheduling (Blum, 2005), the travelling salesman problem (Sagban et al., 2017), assembly line balancing (Blum, 2008), sequential ordering (Dorigo and Stützle, 2010), DNA sequencing, packet-switched routing (Di Caro and Dorigo, 1998), feature selection (Kanan et al., 2007), data clustering (Jabbar and Ku-Mahamud, 2018; Jabbar et al., 2019a; 2019b) and data classification (Al-Behadili et al., 2019; 2018b; 2018a). ...
In this study, a hybrid rule-based classifier, namely ant colony optimization/genetic algorithm (ACO/GA), is introduced to improve the classification accuracy of the Ant-Miner classifier by using GA. The Ant-Miner classifier is efficient, useful and commonly used for solving rule-based classification problems in data mining. Ant-Miner, which is an ACO variant, suffers from a local optimization problem which affects its performance. In our proposed hybrid ACO/GA algorithm, the ACO is responsible for generating classification rules and the GA improves the classification rules iteratively using the principles of multi-neighborhood structure (i.e., mutation and crossover) procedures to overcome the local optima problem. The performance of the proposed classifier was tested against other existing hybrid ant-mining classification algorithms, namely ACO/SA and ACO/PSO2, using classification accuracy, the number of discovered rules and model complexity. For the experiment, the 10-fold cross-validation procedure was used on 12 benchmark datasets from the University of California, Irvine machine learning repository. Experimental results show that the proposed hybridization was able to produce impressive results in all evaluation criteria.
... Rule pruning is another problem in terms of complexity. For large data, rule pruning is proven to be an exhaustive task [26]. Different researchers have updated AM to improve its performance but all the existing variants suffer from some of the above discussed limitations [9][10][11][12][13][14][15]17,25]. ...
... Rule pruning also plays a crucial role in AM's performance along with heuristic function and pheromone update method. In [26], authors did an extensive survey on existing rule pruning techniques. Rule pruning avoids over-fitting and also sometimes reduces the chances of under-fitting but rule pruning is a complex and computationally expensive task especially when the data-set is in high dimension. ...
Ant-Miner, a rule-based classification algorithm, has been successfully applied for classification tasks but it has some limitations such as getting stuck in local optima, high selective pressure, fixed exploration and exploitation rate, and premature convergence. In this paper, we have proposed a novel Ant-Miner based technique based on new Pheromone update method, Rule Rejection threshold, Adaptive gamma, and altered Tournament selection (PRRAT_AM) that caters to these limitations. The proposed algorithm introduced an adaptive gamma parameter to avoid fixed exploration and exploitation rate. To decrease the selective pressure, pheromone is updated by weighted average of rule length, rule quality and heuristic of the path. Ants are selected using improved tournament selection strategy to update the pheromone. Rules that covered less than one percent of the training examples are rejected to generate generic rules. These improvements aid PRRAT_AM in avoiding premature convergence and high selective pressure. We have tested the proposed approach on eight publicly available data-sets on standard benchmark performance measures that include accuracy and F1-score. The proposed approach has been compared with state of the art versions of Ant-Miner and with various data mining algorithms. The experimental results showed that the proposed approach achieved better results when compared with other techniques in terms of standard performance measures and convergence speed.
... Similarities are measured on the basis of the extracted features amongst data. This indirect data mining approach performs clustering without using predefined classes (unlabelled data) to determine the relationship amongst data, whereas a direct approach (classification) requires the use of predefined classes (pre-labelled data) [1,2]. The clustering approach groups data into different clusters that contain similar objects on the basis of an appropriate fitness measure that determines the relationship amongst data [3][4][5]. ...
Data clustering is a data mining technique that discovers hidden patterns by creating groups (clusters) of objects. Each object in every cluster exhibits sufficient similarity to its neighbourhood, whereas objects with insufficient similarity are found in other clusters. Data clustering techniques maximise intra-cluster similarity in each cluster and maximise inter-cluster dissimilarity amongst different clusters. Ant colony optimisation for clustering (ACOC) is a swarm algorithm inspired by the foraging behaviour of ants. This algorithm minimises deterministic imperfections in which clustering is considered an optimisation problem. However, ACOC suffers from high diversification in which the algorithm cannot search for best solutions in the local neighbourhood. To improve the ACOC, this study proposes a modified ACOC, called M-ACOC, which has a modification rate parameter that controls the convergence of the algorithm. Comparison of the performance of several common clustering algorithms using real-world datasets shows that the accuracy results of the proposed algorithm surpass those of the other algorithms.
... However, generalizing the model to new data is difficult. Meanwhile, under-fitting occurs when the classification model cannot capture the underlying data trend [15]. Rule pruning is the framework used in ant-mining algorithms to avoid under-fitting and over-fitting problems [16]. ...
Pruning is the popular framework for preventing the dilemma of overfitting noisy data. This paper presents a new hybrid Ant-Miner classification algorithm and ant colony system (ACS), called ACS-AntMiner. A key aspect of this algorithm is the selection of an appropriate number of terms to be included in the classification rule. ACS-AntMiner introduces a new parameter called importance rate (IR) which is a pre-pruning criterion based on the probability (heuristic and pheromone) amount. This criterion is responsible for adding only the important terms to each rule, thus discarding noisy data. The ACS algorithm is designed to optimize the IR parameter during the learning process of the Ant-Miner algorithm. The performance of the proposed classifier is compared with related ant-mining classifiers, namely, Ant-Miner, CAnt-Miner, TACO-Miner, and Ant-Miner with a hybrid pruner across several datasets. Experimental results show that the proposed classifier significantly outperforms the other ant-mining classifiers.
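One way to read the importance rate (IR) pre-pruning criterion described above is sketched below. This is an interpretation for illustration and not the ACS-AntMiner implementation. It assumes the usual Ant-Miner term probability (pheromone multiplied by heuristic, normalized over the terms whose attribute is not yet used), picks the most probable term greedily instead of sampling, and stops adding terms once that probability falls below IR. The function names, and the ACS step that tunes IR (not shown), are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Term = Tuple[str, str]  # (attribute, value)


def term_probabilities(
    pheromone: Dict[Term, float],
    heuristic: Dict[Term, float],
    used_attributes: Set[str],
) -> Dict[Term, float]:
    """Normalized pheromone * heuristic over terms of still-unused attributes."""
    weights = {
        term: pheromone[term] * heuristic[term]
        for term in pheromone
        if term[0] not in used_attributes
    }
    total = sum(weights.values())
    if total <= 0:
        return {}
    return {term: weight / total for term, weight in weights.items()}


def build_rule_with_ir(
    pheromone: Dict[Term, float],
    heuristic: Dict[Term, float],
    importance_rate: float,
) -> List[Term]:
    """Add the most probable term until its probability drops below IR."""
    rule: List[Term] = []
    used: Set[str] = set()
    while True:
        probs = term_probabilities(pheromone, heuristic, used)
        if not probs:
            break  # every attribute is already in the rule
        term, probability = max(probs.items(), key=lambda kv: kv[1])
        if probability < importance_rate:
            break  # pre-pruning: the best remaining term is not important enough
        rule.append(term)
        used.add(term[0])
    return rule
```

A higher IR yields shorter rules, and a value that is too high yields an empty rule, which corresponds to the underfitting risk of excessive pre-pruning.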
... One of the supervised learning techniques to gain significant attention is the rules-based classification which extracts classification rules from the data. One of the prominent ant colony optimisation (ACO) algorithms used for rules-classification is the Ant-Miner variant [9], [10]. The Ant-Miner produces a comprehensive classification model by finding a list of classification rules in the form of (IF <term1> AND <term2> AND … <term-n> THEN <class>) from the data. ...
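The IF-THEN rule list mentioned in this excerpt can be represented with a small data structure, sketched here in Python. The `Rule` class, the `classify` function, and the use of a default class for instances that no rule matches are assumptions made for illustration; the excerpt itself only states the rule form.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    """IF <term1> AND ... AND <term-n> THEN <class>."""
    terms: Dict[str, str]   # attribute -> required value
    predicted_class: str

    def matches(self, instance: Dict[str, str]) -> bool:
        """True when the instance satisfies every term of the antecedent."""
        return all(instance.get(a) == v for a, v in self.terms.items())


def classify(rule_list: List[Rule], instance: Dict[str, str], default_class: str) -> str:
    """Apply the rules in order; the first matching rule decides the class."""
    for rule in rule_list:
        if rule.matches(instance):
            return rule.predicted_class
    return default_class  # assumed fallback when no rule fires
```

Because the rules are tried in order, an earlier and more specific rule takes precedence over a later and more general one.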
This research presents the ILS-AntMiner rules-based algorithm, a hybrid Iterated Local Search and Ant Colony Optimization, to improve classification accuracy and the size of the classification model. This hybridisation aims to enhance the classification performance in both accuracy and simplicity by increasing the profit of neighbourhood structures in the exploitation mechanism. The experimental results in this research are compared with the most related ant-mining classifiers, including ACO/PSO2 and ACO/SA across various datasets. The results indicate that the proposed classification algorithm can effectively search the training space based on multiple structures to escape from local optima and achieve high classification accuracy and model size.