"Evolutionary algorithms (EA), and particularly genetic algorithms (GA), have been extensively used for the optimization and adjustment of models in data mining tasks. They are global search algorithms that have been used successfully in many complex and difficult optimization problems due to their flexibility and robust behavior . "
[Show abstract][Hide abstract] ABSTRACT: Association rule mining is a well-known methodology to discover significant and apparently hidden relations among attributes in a subspace of instances from datasets. Genetic algorithms have been extensively used to find interesting association rules. However, the rule-matching task of such techniques usually requires high computational and memory requirements. The use of efficient computational techniques has become a task of the utmost importance due to the high volume of generated data nowadays. Hence, this paper aims at improving the scalability of quantitative association rule mining techniques based on genetic algorithms to handle large-scale datasets without quality loss in the results obtained. For this purpose, a new representation of the individuals, new genetic operators and a windowing-based learning scheme are proposed to achieve successfully such challenging task. Specifically, the proposed techniques are integrated into the multi-objective evolutionary algorithm named QARGA-M to assess their performances. Both the standard version and the enhanced one of QARGA-M have been tested in several datasets that present different number of attributes and instances. Furthermore, the proposed methodologies have been integrated into other existing techniques based in genetic algorithms to discover quantitative association rules. The comparative analysis performed shows significant improvements of QARGA-M and other existing genetic algorithms in terms of computational costs without losing quality in the results when the proposed techniques are applied.
"Therefore, it would be beneficial if one could generate the association rules in a direct way, skipping the frequent itemset generation step. For this purpose, evolutionary algorithms have been used widely for generating association rules by maximizing the support/confidence of the rules . However, the goodness of an association rule cannot only be represented by its support or confidence. "
[Show abstract][Hide abstract] ABSTRACT: This paper is the second part of a two-part paper, which is a survey of multiobjective evolutionary algorithms for data mining problems. In Part I , multiobjective evolutionary algorithms used for feature selection and classification have been reviewed. In this part, different multiobjective evolutionary algorithms used for clustering, association rule mining, and other data mining tasks are surveyed. Moreover, a general discussion is provided along with scopes for future research in the domain of multiobjective evolutionary algorithms for data mining.
"From all of them, support and confidence highlight although lift, gain, certainty factor or leverage are also indicators that provide useful information about the extracted rules. A review on AR learning based on the use of EA applied to boolean, categorical, quantitative and fuzzy variables has been described in . However, as this work is focused on quantitative variables only the works using this kind of data are reviewed in this section. "
[Show abstract][Hide abstract] ABSTRACT: The majority of the existing techniques to mine association rules typically use the support and the confidence to evaluate the quality of the rules obtained. However, these two measures may not be sufficient to properly assess their quality due to some inherent drawbacks they present. A review of the literature reveals that there exist many measures to evaluate the quality of the rules, but that the simultaneous optimization of all measures is complex and might lead to poor results. In this work, a principal components analysis is applied to a set of measures that evaluate quantitative association rules' quality. From this analysis, a reduced subset of measures has been selected to be included in the fitness function in order to obtain better values for the whole set of quality measures, and not only for those included in the fitness function. This is a general-purpose methodology and can, therefore, be applied to the fitness function of any algorithm. To validate if better results are obtained when using the function fitness composed of the subset of measures proposed here, the existing QARGA algorithm has been applied to a wide variety of datasets. Finally, a comparative analysis of the results obtained by means of the application of QARGA with the original fitness function is provided, showing a remarkable improvement when the new one is used.
Data provided are for informational purposes only. Although carefully collected, accuracy cannot be guaranteed. The impact factor represents a rough estimation of the journal's impact factor and does not reflect the actual current impact factor. Publisher conditions are provided by RoMEO. Differing provisions from the publisher's actual policy or licence agreement may be applicable.