Francisco Herrera

University of Granada | UGR · Department of Computer Science and Artificial Intelligence

PhD

About

1,087
Publications
439,273
Reads
110,620
Citations
Introduction
Head of the research group SCI2S (http://sci2s.ugr.es). He has supervised 36 PhD students. Editor in Chief of the international journals "Information Fusion" (Elsevier) and "Progress in Artificial Intelligence" (Springer). Current research interests include, among others, soft computing (including CWW, fuzzy modeling and evolutionary algorithms), decision making, bibliometrics, biometrics, data preprocessing, data mining, and big data.
Additional affiliations
October 1988 - present
University of Granada
Position
  • Professor

Publications (1,087)
Article
Full-text available
Nowadays, many disciplines have to deal with big datasets that additionally involve a high number of features. Feature selection methods aim at eliminating noisy, redundant, or irrelevant features that may deteriorate the classification performance. However, traditional methods lack enough scalability to cope with datasets of millions of instances...
Article
The term ‘Big Data’ has spread rapidly in the framework of Data Mining and Business Intelligence. This new scenario can be defined by means of those problems that cannot be effectively or efficiently addressed using the standard computing resources that we currently have. We must emphasize that Big Data does not just imply large volumes of data but...
Article
Classification with big data has become one of the latest trends when talking about learning from the available information. The data growth in the last years has rocketed the interest in effectively acquiring knowledge to analyze and predict trends. The variety and veracity that is related to big data introduces a degree of uncertainty that has to...
Article
Full-text available
Providing a good study plan is key to avoiding student failure. Academic advising based on students’ preferences, complexity of the semester, or even background knowledge is usually considered to reduce the dropout rate. This article aims to provide a good course index to recommend courses to students based on the sequence of courses already taken...
Article
Data in the real world is far from being perfect. The appearance of noise is a common issue that arises from the limitations of data acquisition mechanisms and human knowledge. In classification, label noise will hinder the performance of almost all classifiers, inducing a bias in the built model. While label noise has recently attracted researcher...
Article
Nowadays, large-scale group decision making is usually handled based on clustering analysis process (CAP) and consensus reaching process (CRP). However, CAP and CRP can be contradictory since CAP is performed based on differences between potentially small groups and CRP is conducted to improve the overall similarity of a large group. To balance CAP...
Article
In Deep Learning, training a model properly with a high quantity and quality of data is crucial in order to achieve a good performance. In some tasks, however, the necessary data is not available at a particular moment and only becomes available over time. In that case, incremental learning is used to train the model correctly. An open problem rem...
Article
The last decade witnessed tremendous developments in social media and e-democracy technologies. A fundamental aspect in these paradigms is that the number of decision makers allowed to partake in a decision making event drastically increases. As a result Large Scale Decision Making (LSDM) has established itself as an emerging and rapidly developing...
Article
Irony detection is a non-trivial problem and can help to improve natural language processing tasks such as sentiment analysis. When dealing with social media data in real scenarios, an important issue to address is data skew, i.e. the imbalance between the available ironic and non-ironic samples. In this work, the main objective is to address irony...
Preprint
Full-text available
Currently, Coronavirus disease (COVID-19), one of the most infectious diseases in the 21st century, is diagnosed using RT-PCR testing, CT scans and/or Chest X-Ray (CXR) images. CT (Computed Tomography) scanners and RT-PCR testing are not available in most medical centers and hence in many cases CXR images become the most time/cost effective tool fo...
Article
Artificial intelligence and all its supporting tools, e.g. machine and deep learning in computational intelligence-based systems, are rebuilding our society (economy, education, life-style, etc.) and promising a new era for the social welfare state. In this paper we summarize recent advances in data science and artificial intelligence within the in...
Preprint
In many machine learning tasks, learning a good representation of the data can be the key to building a well-performant solution. This is because most learning algorithms operate with the features in order to find models for the data. For instance, classification performance can improve if the data is mapped to a space where classes are easily sepa...
Article
We argue that classic citation-based scientific document clustering approaches, like co-citation or Bibliographic Coupling, fail to leverage the social usage of the scientific literature originating through online information dissemination platforms, such as Twitter. In this paper, we present the methodology Tweet Coupling, which measures the similar...
Preprint
Autoencoders are techniques for data representation learning based on artificial neural networks. Differently to other feature learning methods which may be focused on finding specific transformations of the feature space, they can be adapted to fulfill many purposes, such as data visualization, denoising, anomaly detection and semantic hashing. Th...
Article
Combining traditional diversity and re-balancing techniques serves to design effective ensembles for solving imbalanced classification problems. Therefore, to explore the performance of new diversification procedures and new re-balancing methods is an attractive research subject which can provide even better performances. In this contribution, we p...
Article
Existing opinion dynamics models often fail to consider the relationship between one agent and people two degrees of separation from the agent. In addition, no accurate weight determination method has yet been fully developed. To address these limitations, this paper proposes a two-step communication opinion dynamics model based on the classical De...
Article
In many machine learning tasks, learning a good representation of the data can be the key to building a well-performant solution. This is because most learning algorithms operate with the features in order to find models for the data. For instance, classification performance can improve if the data is mapped to a space where classes are easily sepa...
Preprint
Bio-inspired optimization (including Evolutionary Computation and Swarm Intelligence) is a growing research topic with many competitive bio-inspired algorithms being proposed every year. In such an active area, preparing a successful proposal of a new bio-inspired algorithm is not an easy task. Given the maturity of this research field, proposing a...
Preprint
Full-text available
Multitasking optimization is an incipient research area which is lately gaining a notable research momentum. Unlike traditional optimization paradigm that focuses on solving a single task at a time, multitasking addresses how multiple optimization problems can be tackled simultaneously by performing a single search process. The main objective to ac...
Chapter
The data discretization task transforms continuous numerical data into discrete and bounded values, more understandable for humans and more manageable for a wide range of machine learning methods. With the advent of Big Data, a new wave of large-scale datasets with a predominance of continuous features has arrived in industry and academia. However, stan...
Chapter
Data reduction in data mining selects/generates the most representative instances in the input data in order to reduce the original complex instance space and better define the decision boundaries between classes. Theoretically, reduction techniques should enable the application of learning algorithms on large-scale problems. Nevertheless, standard...
Chapter
The negative impact on learning associated with imbalanced proportion of classes has exploded lately with the exponential growth of “cheap” data. Many real-world problems present scarce number of instances in one class whereas in others their cardinality is several factors greater. The current techniques that treat large-scale imbalanced data are f...
Chapter
In the new era of Big Data, exponential increase in volume is usually accompanied by an explosion in the number of features. Dimensionality reduction arises as a possible solution to enable large-scale learning with millions of dimensions. Nevertheless, as any other family of algorithms, reduction methods require an upgrade in its design so that th...
Chapter
The term Smart Data refers to the challenge of transforming raw data into quality data that can be appropriately exploited to obtain valuable insights. Big Data is focused on volume, velocity, variety, veracity, and value. The idea of Smart Data is to separate the physical properties of the data (volume, velocity, and variety), from the value and v...
Chapter
Throughout this book we have presented a complete vision about Big Data preprocessing and how it enables Smart Data. Data is only as valuable as the knowledge and insights we can extract from it. Referring to the well-known “garbage in, garbage out” principle, accumulating vast amounts of raw data will not guarantee quality results, but poor knowle...
Chapter
The fast-evolving Big Data environment has given rise to a myriad of tools, paradigms, and techniques to tackle different use cases in industry and science. However, because of the myriad of existing tools, it is often difficult for practitioners and experts to analyze and select the correct tool for their problems. In this chapter we present...
Chapter
In any knowledge discovery process the value of extracted knowledge is directly related to the quality of the data used. Big Data problems, generated by massive growth in the scale of data observed in recent years, also follow the same dictate. A common problem affecting data quality is the presence of noise, particularly in classification problems...
Chapter
The advent of Big Data has created the necessity of new computing tools for processing huge amounts of data. Apache Hadoop was the first open-source framework that implemented the MapReduce paradigm. Apache Spark appeared a few years later improving the Hadoop Ecosystem. Similarly, Apache Flink appeared in the last years for tackling the Big Data s...
Preprint
This paper proposes a new model based on Fuzzy k-Nearest Neighbors for classification with monotonic constraints, Monotonic Fuzzy k-NN (MonFkNN). Real-life data-sets often do not comply with monotonic constraints due to class noise. MonFkNN incorporates a new calculation of fuzzy memberships, which increases robustness against monotonic noise witho...
Preprint
Full-text available
In recent years, Multifactorial Optimization (MFO) has gained a notable momentum in the research community. MFO is known for its inherent capability to efficiently address multiple optimization tasks at the same time, while transferring information among such tasks to improve their convergence speed. On the other hand, the quantum leap made by Deep...
Preprint
A key aspect of the design of evolutionary and swarm intelligence algorithms is studying their performance. Statistical comparisons are also a crucial part which allows for reliable conclusions to be drawn. In the present paper we gather and examine the approaches taken from different perspectives to summarise the assumptions made by these statisti...
Preprint
In recent years, a great variety of nature- and bio-inspired algorithms has been published in the literature. These algorithms simulate natural or biological processes, such as natural evolution, to optimize complex problems. Lately, the number of algorithms of this type has become difficult to manage, so there is a need for a goo...
Preprint
Full-text available
With the advent of huge volumes of data produced in the form of fast streams, real-time machine learning has become a challenge of relevance emerging in a plethora of real-world applications. Processing such fast streams often demands high memory and processing resources. In addition, they can be affected by non-stationary phenomena (concept drift...
Article
A key aspect of the design of evolutionary and swarm intelligence algorithms is studying their performance. Statistical comparisons are also a crucial part which allows for reliable conclusions to be drawn. In the present paper we gather and examine the approaches taken from different perspectives to summarise the assumptions made by these statisti...
Preprint
Full-text available
Ensemble methods have been widely used for improving the results of the best single classification model. Indeed, a large body of works have achieved better results mainly by applying one specific ensemble method. However, very few works analyze complex fusion schemes using heterogeneous ensemble strategies. This paper is three-fold: 1) It provides...
Preprint
Full-text available
Big Data scenarios pose a new challenge to traditional data mining algorithms, since they are not prepared to work with such amounts of data. Smart Data refers to data of enough quality to improve the outcome from a data mining algorithm. The inability of existing data mining algorithms to handle Big Datasets prevents the transition from Big to Smart Data....
Article
Full-text available
Accurate tree cover mapping is of paramount importance in many fields, from biodiversity conservation to carbon stock estimation, ecohydrology, erosion control, or Earth system modelling. Despite this importance, there is still uncertainty about global forest cover, particularly in drylands. Recently, the Food and Agriculture Organization of the Un...
Article
Full-text available
Aspect-based sentiment analysis enables the extraction of fine-grained information, as it connects specific aspects that appear in reviews with a polarity. Although we detect that the information from these algorithms is very accurate at local level, it does not contribute to obtain an overall understanding of reviews. To fill this gap, we propose...
Article
Full-text available
Hesitant fuzzy linguistic preference relations (HFLPRs) can be used to represent cognitive complex information in a situation in which people hesitate among several possible linguistic terms for the preference degrees of pairwise comparisons over alternatives. HFLPRs have attracted growing attention owing to their efficiency in dealing with increas...
Book
This book offers a comprehensible overview of Big Data Preprocessing, which includes a formal description of each problem. It also focuses on the most relevant proposed solutions. This book illustrates actual implementations of algorithms that help the reader deal with these problems. This book stresses the gap that exists between big, raw data...
Article
Full-text available
The importance of knowing the descriptive properties of a dataset when tackling a data science problem is well recognized. Having information about the redundancy, complexity and density of a problem allows us to make decisions as to which data preprocessing and machine learning techniques are most suitable. In classification problems, there are mult...
Article
Traditional bibliometric techniques gauge the impact of research through quantitative indices based on the citations data. However, due to the lag time involved in the citation-based indices, it may take years to comprehend the full impact of an article. This paper seeks to measure the early impact of research articles through the sentiments expres...
Article
Full-text available
In the last few years, Artificial Intelligence (AI) has achieved a notable momentum that, if harnessed appropriately, may deliver the best of expectations over many application sectors across the field. For this to occur shortly in Machine Learning, the entire community stands in front of the barrier of explainability, an inherent problem of the la...
Article
The social network group decision‐making is popular due to the advantages of social relationships in the consensus reaching process, especially the trust relationships. To explore the effects of trust on consensus, some minimum cost consensus models are proposed based on implicit trust between individuals and the moderator. The implicit trust is co...
Chapter
Full-text available
The rise of omics techniques has resulted in an explosion of molecular data in modern biomedical research. Together with information from medical images and clinical data, the field of omics has driven the implementation of personalized medicine. Biomedical and omics datasets are complex and heterogeneous, and extracting meaningful knowledge from...
Article
This paper proposes a novel preprocessing methodology, framed within the field of time series forecasting. The aim is to get quality data and to extract information on the most important variables involved in a real-world crude oil refining process. To achieve this objective, the methodology incorporates the addition of dynamic knowledge, treatment...
Preprint
Full-text available
In the last years, Artificial Intelligence (AI) has achieved a notable momentum that may deliver the best of expectations over many application sectors across the field. For this to occur, the entire community stands in front of the barrier of explainability, an inherent problem of AI techniques brought by sub-symbolism (e.g. ensembles or Deep Neur...
Article
The quality of the data is directly related to the quality of the models drawn from that data. For that reason, much research is devoted to improving the quality of the data and to amending errors that it may contain. One of the most common problems is the presence of noise in classification tasks, where noise refers to the incorrect labeling of trainin...
Article
Full-text available
Based on the Computing with Words (CW), double hierarchy hesitant fuzzy linguistic term set (DHHFLTS) can be used to express complex linguistic information accurately with two simple linguistic hierarchies. This paper proposes a group decision making (GDM) model based on multiplicative consistency and consensus with double hierarchy hesitant fuzzy...
Article
Full-text available
Despite their interest and threat status, the number of whales in the world's oceans remains highly uncertain. Whale detection is normally carried out from costly sighting surveys, acoustic surveys or through high-resolution images. Since deep convolutional neural networks (CNNs) are achieving great performance in several computer vision tasks, here w...
Article
Full-text available
There are several breast cancer datasets for building Computer Aided Diagnosis systems (CADs) using either deep learning or traditional models. However, most of these datasets impose various trade-offs on practitioners related to their availability or inner clinical value. Recently, a public dataset called BreakHis has been released to overcome the...
Article
This paper presents otsad, the first R package which implements a set of novel online detection algorithms for univariate time-series. Moreover, this package also provides advanced functionalities and contents such as new false positive reduction algorithm and the novel NAB detectors measurement technique which is specifically designed to measure o...
Article
The hesitant fuzzy linguistic term set (HFLTS) turns out to be useful in representing people's hesitant qualitative information. The aim of this paper is to investigate new correlation measures between HFLTSs and apply them in decision-making process. Firstly, the concepts of mean and hesitancy degree of hesitant fuzzy linguistic elements are intro...
Article
One of the best-known and most effective methods in supervised classification is the k nearest neighbors algorithm (kNN). Several approaches have been proposed to improve its accuracy, where fuzzy approaches prove to be among the most successful, highlighting the classical Fuzzy k-nearest neighbors (FkNN). However, these traditional algorithms fail...
Article
This paper focuses on multi-attribute intuitionistic fuzzy large-scale decision making (LSDM) scenarios. The alternatives are described by attributes in the LSDM model. The decision failure may be caused by unqualified alternative being the final decision. To avoid this, we propose a Defective Alternative Detection-based multi-attribute intuitionis...
Chapter
Addressing the huge amount of data continuously generated is an important challenge in the Machine Learning field. The need to adapt the traditional techniques or create new ones is evident. To do so, distributed technologies have to be used to deal with the significant scalability constraints due to the Big Data context. In many Big Data applicati...
Article
Full-text available
The problem of class noisy instances is omnipresent in different classification problems. However, most research focuses on noise handling in binary classification problems and adaptations to multiclass learning. This paper aims to contextualize noise labels in the context of non-binary classification problems, including multiclass, multi-label,...
Article
Full-text available
Corals are crucial animals as they support a large part of marine life. The automatic classification of coral species based on underwater images is important as it can help experts to track and detect threatened and vulnerable coral species. However, this classification is complicated due to the nature of coral underwater images and the fact that...
Article
Full-text available
Background Data preprocessing techniques are devoted to correcting or alleviating errors in data. Discretization and feature selection are two of the most extended data preprocessing techniques. Although we can find many proposals for static Big Data preprocessing, there is little research devoted to the continuous Big Data problem. Apache Flink is...
Article
Full-text available
Instance reduction techniques are data preprocessing methods originally developed to enhance the nearest neighbor rule for standard classification. They reduce the training data by selecting or generating representative examples of a given problem. These algorithms have been designed and widely analyzed in multi-class problems providing very compet...
Article
Industry 4.0 is revolutionizing decision making processes within the manufacturing industry. Among the technological portfolio enabling this revolution, the late literature has capitalized on the potential of data analytics for improving the production cycle at different stages, from resource provisioning to planning, delivery and storage. However,...
Chapter
Autoencoders are techniques for data representation learning based on artificial neural networks. Differently to other feature learning methods which may be focused on finding specific transformations of the feature space, they can be adapted to fulfill many purposes, such as data visualization, denoising, anomaly detection and semantic hashing. Th...
Article
Full-text available
Because of the increasing complexity of real-world decision-making environment, there is a trend that a large number of decision-makers are becoming involved in group decision making problems. In large-scale group decision making problems, owing to various backgrounds and psychological cognition, it is natural to use heterogeneous representation fo...
Preprint
Uncertain multiple criteria decision making methodologies and their applications in healthcare management
Article
The current evolution in multidisciplinary Learning Analytics Research, poses significant challenges for the exploitation of behavior analysis by fusing data streams towards advanced decision making. The identification of students that are at risk of withdrawals in higher education is connected to numerous educational policies, to enhance their com...
Article
Full-text available
Latent fingerprint identification is attracting increasing interest because of its important role in law enforcement. Although the use of various fingerprint features might be required for successful latent fingerprint identification, methods based on minutiae are often readily applicable and commonly outperform other methods. However, there exist...
Article
Currently, a plethora of industrial and academic sentiment analysis methods for classifying the opinion polarity of a text are available and ready to use. However, each of those methods has its strengths and weaknesses, due mainly to the approach followed in their design (supervised/unsupervised) or the domain of text used in their development....
Article
Full-text available
Analytic Hierarchy Process (AHP), as one of the most important methods to tackle multiple criteria decision-making problems, has achieved much success over the past several decades. Given that linguistic expressions are much closer than numerical values or single linguistic terms to human way of thinking and cognition, this paper investigates the A...
Article
Full-text available
In recent years, the research community has witnessed an explosion of literature dealing with the adaptation of behavioral patterns and social phenomena observed in nature towards efficiently solving complex computational tasks. This trend has been especially dramatic in what relates to optimization problems, mainly due to the unprecedented complex...