José María Luna · University of Cordoba (Spain) | UCO · Department of Computer Sciences and Numerical Analysis
PhD Student
Publications: 91
Reads: 59,924
Citations: 2,531
Introduction
José María Luna is a Ph.D. student in the Department of Computer Sciences and Numerical Analysis at the University of Córdoba. His research is carried out within the "Knowledge Discovery and Intelligent Systems" (KDIS) research group and focuses on mining association rules with genetic programming. He holds a grant from the Spanish Ministry of Education and Science under the FPU program (AP2010-0041).
Publications (91)
Predictive maintenance has been an important milestone in the way industrial systems are analysed in order to detect operating anomalies and possible failures before they occur. This work presents an Advanced Sustainment Tool (Herramienta de Sostenimiento Avanzado, HSA) of the Spanish Army (Ejército de Tierra) that makes it possible to improve the plan...
The task of detection of common and unique characteristics among different cancer subtypes is an important focus of research that aims to improve personalized therapies. Unlike current approaches mainly based on predictive techniques, our study aims to improve the knowledge about the molecular mechanisms that descriptively led to cancer, thus not r...
Sorry, I'm in class. I'll send it to you:
In the airline industry, the Revenue and Pricing teams generally spend a considerable amount of time analysing and interpreting the actions of their competitors. Most of the time the analysts have to use their analytical skills to create ad-hoc methods to interpret or find patterns in the fares. In this fi...
The maintenance of industrial facilities has always been a critical task for ensuring that systems run properly and remain available. Traditional maintenance strategies have been dominated by corrective and preventive approaches. However, the latest advances in sensorization and machine learning have driven...
The multi-label classification task has been widely used to solve problems where each of the instances may be related not only to one class but to many of them simultaneously. Many of these problems usually comprise a high number of labels in the output space, so learning a predictive model from such datasets may turn into a challenging task since...
To provide a good study plan is key to avoid students’ failure. Academic advising based on student’s preferences, complexity of the semester, or even background knowledge is usually considered to reduce the dropout rate. This article aims to provide a good course index to recommend courses to students based on the sequence of courses already taken...
High utility itemset mining is an important model in data mining. It involves discovering all itemsets in a quantitative transactional database that satisfy a user-specified minimum utility (minUtil) constraint. MinUtil controls the minimum value that an itemset must maintain in a database. Since the model evaluates an itemset's interestingness...
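As a rough illustration of the minimum utility constraint described in the abstract above, the following Python sketch enumerates itemsets by brute force and keeps those whose utility reaches a threshold. It is not the algorithm from the publication. The names `itemset_utility`, `high_utility_itemsets`, `unit_profit` and the toy database are assumptions made only for this example, and utility is assumed to be quantity times unit profit, summed over the transactions that contain the whole itemset.

```python
from itertools import combinations


def itemset_utility(itemset, database, unit_profit):
    """Sum the utility of `itemset` over the transactions containing all its items."""
    total = 0
    for transaction in database:
        if all(item in transaction for item in itemset):
            # Utility of one item in one transaction: quantity * unit profit (assumed).
            total += sum(transaction[item] * unit_profit[item] for item in itemset)
    return total


def high_utility_itemsets(database, unit_profit, min_util):
    """Brute-force search: return every itemset whose utility is at least `min_util`."""
    items = sorted({item for transaction in database for item in transaction})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, database, unit_profit)
            if utility >= min_util:
                result[itemset] = utility
    return result


# Hypothetical quantitative database: each transaction maps item -> purchased quantity.
database = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
]
unit_profit = {"a": 5, "b": 2, "c": 1}

found = high_utility_itemsets(database, unit_profit, min_util=9)
print(found)
```

With this toy data and `min_util=9`, the itemsets {a}, {a, b} and {a, c} qualify with utilities 15, 9 and 11. Note that {a, b} qualifies although {b} alone (utility 6) does not, so utility does not shrink monotonically as itemsets grow the way frequency does.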
This paper presents an approach based on emerging pattern mining to analyse cancer through genomic data. Unlike existing approaches, mainly focused on predictive purposes, the proposed approach aims to improve the understanding of cancer in a descriptive way, not requiring either any prior knowledge or hypothesis to be validated. Additionally, it e...
BACKGROUND:
The dataset from genes used for the prediction of HCV outcome was evaluated in a previous study by means of conventional statistical methodology.
OBJECTIVE:
The aim of this study was to reanalyze this same dataset using the data mining approach in order to find models that improve the classification accuracy of the genes studied.
METHO...
The current state of the art in supervised descriptive pattern mining is very good in automatically finding subsets of the dataset at hand that are exceptional in some sense. The most common form, subgroup discovery, generally finds subgroups where a single target variable has an unusual distribution. Exceptional model mining (EMM) typically finds...
Periodic frequent patterns are sets of events or items that periodically appear in a sequence of events or transactions. Many algorithms have been designed to identify periodic frequent patterns in data. However, most assume that the periodic behavior of a pattern does not change much over time. To address this limitation, this paper proposes to di...
Existing systems to support decision-taking process based on textual information of clinical reports are insufficient. Currently, there are few systems that unify different subtasks in a single and user-friendly framework, easing therefore the clinical work by automating complex and arduous tasks such as the detection of clinical alerts as well as...
The goal of this paper is to introduce LAC, a new Java Library for Associative Classification. LAC is the first tool that covers the full taxonomy of this classification paradigm through 10 well-known proposals in the field. Furthermore, it includes several measures to quantify the quality of the solutions as well as different input/output data for...
To date, the subgroup discovery task has been considered in problems where a target variable is unequivocally described by a set of features, also known as an instance. Nowadays, however, with the increasing interest in data storage, new data structures are being provided such as the multiple-instance data in which a target variable value is ambiguous...
Frequent itemset mining (FIM) is an essential task within data analysis since it is responsible for extracting frequently occurring events, patterns, or items in data. Insights from such pattern analysis offer important benefits in decision‐making processes. However, algorithmic solutions for mining such kind of patterns are not straightforward sin...
The aim of this paper is to categorize and describe different types of learners in MOOCs by means of a subgroup discovery approach based on MapReduce. The final objective is to discover generalizable IF-THEN rules that can be replicated into different MOOCs. The proposed subgroup discovery approach is an extension of the well-known FP-Growth algori...
Background. The state-of-the-art in associative classification includes interesting approaches for building accurate and interpretable classifiers. These approaches generally work on four different phases (data discretization, pattern mining, rule mining, and classifier building), some of them being computationally expensive.
Methods. The aim of thi...
Health care professionals produce abundant textual information in their daily clinical practice and this information is stored in many diverse sources and, generally, in textual form. The extraction of insights from all the gathered information, mainly unstructured and lacking normalization, is one of the major challenges in computational medicine....
Background: Associative Classification, a combination of two important and different fields (classification and association rule mining), aims at building accurate and interpretable classifiers by means of association rules. A major problem in this field is that existing proposals do not scale well when Big Data are considered. In this regard, the...
In the association rule mining field many different quality measures have been proposed over time with the aim of quantifying the interestingness of each discovered rule. In evolutionary computation, many of these metrics have been used as functions to be optimized, but the selection of a set of suitable quality measures for each specific proble...
A classical task in data analytics is the finding of data subsets that somehow deviate from the norm and denote that something interesting is going on. Such deviations can be measured in terms of the frequency of occurrence (e.g. class association rules) or as an unusual distribution for a specific target variable (e.g. subgroup discovery). Excepti...
This chapter introduces the supervised descriptive pattern mining task to the reader, providing him/her with the concept of patterns as well as presenting a description of the type of patterns usually found in literature. Patterns on advanced data types are also defined, denoting the usefulness of sequential and spatiotemporal patterns, patterns on...
The study of problems that involve data examples associated with multiple targets at the same time has gained a lot of attention in the past few years. In this work, a method based on gene expression programming for the multi-target regression problem is proposed. This method solves the symbolic regression problem for multi-target contexts, allowin...
Subgroup discovery is the most well-known task within the supervised descriptive pattern mining field. It aims at discovering patterns in the form of rules induced from labeled data. This chapter therefore introduces the subgroup discovery problem and also describes the main differences with regard to classification and clustering tasks. Additional...
Association rule mining is the most well-known task to perform data descriptions. This task aims at extracting useful and unexpected co-occurrences among items in data. Even when this task was generally defined for mining any type of association, it is sometimes desirable to extract co-occurrences between a set of items and a specific targe...
Supervised descriptive pattern mining has been denoted as a really important area of research, including interesting tasks such as the discovery of emerging patterns, contrast set mining, subgroup discovery, class association rule mining, exceptional models mining, among others. The aim of this chapter is to illustrate a wide range of real problems...
Contrast set mining is one of the most important tasks in the supervised descriptive pattern mining field. It aims at finding patterns whose frequencies differ significantly among sets of data under contrast. This chapter introduces therefore the contrast set mining problem as well as similarities and differences with regard to related techniques....
Emerging pattern mining is a well-known task in the supervised descriptive pattern mining field. This task aims at discovering emerging trends amongst timestamped datasets or extracting patterns that denote a clear difference between two disjoint features. This chapter introduces therefore the emerging pattern mining problem and describes the main...
This book provides a general and comprehensible overview of supervised descriptive pattern mining, considering classic algorithms and those based on heuristics. It provides some formal definitions and a general idea about patterns, pattern mining, the usefulness of patterns in the knowledge discovery process, as well as a brief summary on the tasks...
Patterns are sometimes used to describe important properties of different data subsets previously identified or labeled, transforming therefore the pattern mining concept into a more specific one, supervised descriptive pattern mining. In general, supervised descriptive discovery is said to gather three main tasks: contrast set mining, emerging pat...
Association rule mining is one of the most important tasks to describe raw data. Although many efficient algorithms have been developed to this aim, existing algorithms do not work well on huge volumes of data. The aim of this paper is to propose a new genetic programming algorithm for mining association rules in Big Data. The genetic operators of...
Pattern mining is one of the most important tasks to extract meaningful and useful information from raw data. This task aims to extract item-sets that represent any type of homogeneity and regularity in data. Although many efficient algorithms have been developed in this regard, the growing interest in data has caused the performance of existing pa...
Real-world data usually comprise features whose interpretation depends on some contextual information. Such contextual-sensitive features and patterns are of high interest to be discovered and analyzed in order to obtain the right meaning. This paper formulates the problem of mining context-aware association rules, which refers to the search for as...
The growing interest in the extraction of useful knowledge from data with the aim of being beneficial for the data owner is giving rise to multiple data mining tools. The research community is especially aware of the importance of open source data mining software to ensure and ease the dissemination of novel data mining algorithms. The availability of t...
Association rule mining is one of the most common data mining techniques used to identify and describe interesting relationships between patterns from large quantities of data. Whereas much research has focused on the extraction of those patterns that appear frequently in order to obtain general information, in some scenarios it could also be inter...
Subgroup discovery is a well-known technique for the extraction of patterns, with respect to a variable of interest in the data. However, the explosion in data gathering has hampered the performance of traditional algorithms to discover interesting relationships between different objects in a set with respect to a specific property which is of interest to...
The interest in developing Learning Analytics tools that can be integrated into the well-known Moodle course management systems is increasing nowadays. These tools generally provide some type of basic analytics and graphs about users' interaction in the course. However, they do not enable a varied set of Data Mining techniques to be applied, such a...
The growing interest in data storage has caused data size to increase exponentially, hampering the process of knowledge discovery from these large volumes of high-dimensional and heterogeneous data. In recent years, many efficient algorithms for mining data associations have been proposed, facing up time and main memory requirements. Neverthe...
Subgroup Discovery is a broadly applicable supervised local pattern mining method to search relations between different properties with respect to a target variable. With the exponential growth in data storage, the massive data gathered has hampered the performance of current techniques. In this regard, our aim is to propose two new algorithms to d...
The transition from high school to university is a critical step and many students head toward failure just because their final degree option was not the right choice. Both students’ preferences and skills play an important role in choosing the degree that best fits them, so an analysis of these attitudes during the high school can minimize the dro...
Subgroup Discovery is a flexible supervised local pattern mining method whose aim is to discover interesting subgroups with respect to one property of interest. Although many efficient algorithms have been developed in this field, the growing interest in data storage has led to ever larger datasets, hampering their performance....
In pattern mining, solutions tend to be evaluated according to several conflicting quality measures and, sometimes, these measures have to be optimized simultaneously. The problem of optimizing more than one objective function is known as multiobjective optimization, which together with evolutionary computation have given rise to evolutionary multi...
Pattern mining is considered as a really interesting task for the extraction of hidden knowledge in the form of patterns. The extraction of such subsequences, substructures or itemsets that represent any type of homogeneity and regularity in data has been carried out from unlabeled data. However, there are many research areas that aim at discoverin...
In any dataset, it is possible to identify small subsets of data whose distribution is exceptionally different from the distribution in the complete set of data records. By finding such exceptional behaviour, it is possible to mine interesting associations, which is known as the exceptional relationship mining task. This chapter formally describes the...
This chapter introduces the pattern mining task to the reader, providing formal definitions about patterns, the pattern mining task and the usefulness of patterns in the knowledge discovery process. The utility of the extraction of patterns is introduced by a sample dataset for the market basket analysis. Different types of patterns can be considere...
This chapter presents an overview on evolutionary computation, introducing its basic concepts and serving as a starting point for an inexpert user in this field. Then, the chapter discusses paradigms such as genetic algorithms and genetic programming, which are the most widely used techniques in the mining of patterns of interest. Finally, a brief...
This chapter describes the use of genetic algorithms for the mining of patterns of interest and the extraction of accurate relationships between them. The current chapter first makes an analysis of the utility of genetic algorithms in the mining of patterns of interest, paying special attention to the computational time and the memory requirements....
In this chapter different quality measures to evaluate the interest of the patterns discovered in the mining process are described. Patterns represent major features of data so their interestingness should be accordingly quantified by considering metrics that determine how representative a specific pattern is for the dataset. Nevertheless, a patter...
This chapter describes the use of genetic programming for the mining of patterns of interest and the extraction of accurate relationships between patterns. The current chapter first describes the canonical representation of genetic programming and the use of grammars to restrict the search space. Then, it describes different approaches based on gen...
The pattern mining task is the keystone of data analysis, describing and representing any type of homogeneity and regularity in data. Abundant research studies have been dedicated to this task, providing overwhelming improvements in both efficiency and scalability. Nevertheless, the growing interest in data collection is giving rise to extremely la...
Data processing in a fast and efficient way is an important functionality in machine learning, especially with the growing interest in data storage. This exponential increment in data size has hampered traditional techniques for data analysis and data processing, giving rise to a new set of methodologies under the term Big Data. Many efficient algo...
Association rule mining is one of the most common data mining techniques used to identify and describe interesting relationships between patterns from large datasets, the frequency of an association being defined as the number of transactions that it satisfies. In situations where each transaction includes an undetermined number of instances (custo...
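The frequency definition in the abstract above (the number of transactions an association satisfies) can be sketched in Python as follows. The function names and the toy transactions are hypothetical, confidence is added only as the usual companion measure, and the multi-instance setting the abstract goes on to discuss is not modelled here.

```python
def support_count(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for transaction in transactions if itemset <= transaction)


def confidence(antecedent, consequent, transactions):
    """Fraction of transactions holding the antecedent that also hold the consequent."""
    covered = support_count(antecedent, transactions)
    if covered == 0:
        return 0.0  # the rule never applies; reporting zero is a convention assumed here
    return support_count(antecedent | consequent, transactions) / covered


# Hypothetical market-basket data: one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_frequency = support_count({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_frequency, rule_confidence)
```

Here the association between bread and milk is satisfied by two of the four transactions, and two of the three transactions containing bread also contain milk.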
The extraction of patterns of interest and associations between them has been a major research topic since its definition at the beginning of the nineties. Abundant research studies have been dedicated to this field, providing overwhelming progress in both efficiency and scalability, and extracting patterns from different data structures and dom...
Multi-label learning is a challenging task in data mining which has attracted growing attention in recent years. Despite the fact that many multi-label datasets have continuous features, general algorithms developed specially to transform multi-label datasets with continuous attributes’ values into a finite number of intervals have not been propose...
This book provides a comprehensive overview of the field of pattern mining with evolutionary algorithms. To do so, it covers formal definitions about patterns, pattern mining, types of patterns and the usefulness of patterns in the knowledge discovery process. As it is described within the book, the discovery process suffers from both high runtime...
Given a database of records, it might be possible to identify small subsets of data whose distribution is exceptionally different from the distribution in the complete set of data records. Finding such interesting relationships, which we call exceptional relationships, in an automated way would allow discovering unusual or exceptional hidden behav...
The use of data mining techniques in educational domains helps to find new knowledge about how students learn and how to improve the resources management. Using these techniques for predicting school failure is very useful in order to implement corrective actions. With this purpose, we try to determine the earliest stage when the quality of the res...
JCLEC-Classification is a usable and extensible open source library for genetic programming classification algorithms. It houses implementations of rule-based methods for classification based on genetic programming, supporting multiple model representations and providing to users the tools to implement any classifier easily. The software is written...
Most approaches for the extraction of association rules look for associations from a dataset in the form of a single table. However, with the growing interest in the storage of information, relational databases comprising a series of relations (tables) and relationships have become essential. We present the first grammar-guided genetic programming...
This paper proposes a novel grammar-guided genetic programming algorithm for subgroup discovery. This algorithm, called comprehensible grammar-based algorithm for subgroup discovery (CGBA-SD), combines the requirements of discovering comprehensible rules with the ability to mine expressive and flexible solutions owing to the use of a context-free g...
The extraction of useful information for decision making is a challenge in many different domains. Association rule mining is one of the most important techniques in this field, discovering relationships of interest among patterns. Despite the mining of association rules being an area of great interest for many researchers, the search for well-grou...
This paper presents a novel self-adaptive grammar-guided genetic programming proposal for mining association rules. It generates individuals through a context-free grammar, which allows rules to be defined in an expressive and flexible way over different domains. Each rule is represented as a derivation tree that shows a solution (described using the...
Association rule mining, an important data mining technique, has been widely focused on the extraction of frequent patterns. Nevertheless, in some application domains it is interesting to discover patterns that do not frequently occur, even when they are strongly related. More specifically, this type of relation can be very appropriate in e-learnin...
In association rule mining, the process of extracting relations from a dataset often requires the application of more than one quality measure and, in many cases, such measures involve conflicting objectives. In such a situation, it is more appropriate to attain the optimal trade-off between measures. This paper deals with the association rule mini...
This paper treats the first approximation to the extraction of association rules by employing ant programming, a technique that has recently reported very promising results in mining classification rules. In particular, two different algorithms are presented, both guided by a context-free grammar that defines the search space, specifically suited t...
This paper proposes the application of association rule mining to improve quizzes and courses. First, the paper shows how to preprocess quiz data and how to create several data matrices for use in the process of knowledge discovery. Next, the proposed algorithm that uses grammar‐guided genetic programming is described and compared with both classic...
Association rule mining is a well-known data mining task, but it requires much computational time and memory when mining large scale data sets of high dimensionality. This is mainly due to the evaluation process, where the antecedent and consequent in each rule mined are evaluated for each record. This paper presents a novel methodology for evaluat...
This paper deals with the problem of discovering subgroups in data by means of a grammar guided genetic programming algorithm, each subgroup including a set of related patterns. The proposed algorithm combines the requirements of discovering comprehensible rules with the ability of mining expressive and flexible solutions thanks to the use of a con...
This paper treats the first approximation to the extraction of association rules by employing ant programming, a technique that has recently reported very promising results in mining classification rules. In particular, two different algorithms are presented, both guided by a context-free grammar, specifically suited to association rule mining, whi...
This paper presents a free-parameter grammar-guided genetic programming algorithm for mining association rules. This algorithm uses a context-free grammar to represent individuals, encoding the solutions in a tree-shape conformant to the grammar, so they are more expressive and flexible. The algorithm here presented has the advantages of using evolu...
This paper presents a proposal for the extraction of association rules called G3PARM (Grammar-Guided Genetic Programming for Association Rule Mining) that makes the knowledge extracted more expressive and flexible. This algorithm allows a context-free grammar to be adapted and applied to each specific problem or domain and eliminates the problems r...
This paper proposes the application of association rule mining to improve quizzes and courses. First, the paper shows how to preprocess quiz data and how to create several data matrices for use in the process of knowledge discovery. Next, the proposed algorithm that uses grammar-guided genetic programming is described and compared with both classic...
To date, association rule mining has mainly focused on the discovery of frequent patterns. Nevertheless, it is often interesting to focus on those that do not frequently occur. Existing algorithms for mining this kind of infrequent patterns are mainly based on exhaustive search methods and can be applied only over categorical domains. In a previous...