Verónica Bolón-Canedo

  • PhD Computer Science
  • Professor (Assistant) at Universidade da Coruña

About

160
Publications
59,867
Reads
7,062
Citations
Current institution
Universidade da Coruña
Current position
  • Professor (Assistant)
Additional affiliations
May 2015 - February 2016
University of Manchester
Position
  • PostDoc Position
June 2014 - May 2015
Universidade da Coruña
Position
  • PostDoc Position
July 2008 - June 2014
Universidade da Coruña
Position
  • PhD Student
Education
September 2003 - December 2008

Publications (160)
Article
Full-text available
The growth in the number of wearable devices has increased the amount of data produced daily. Simultaneously, the limitations of such devices have also led to a growing interest in the implementation of machine learning algorithms with low‐precision computation. We propose green and efficient modifications of state‐of‐the‐art feature selection metho...
Conference Paper
The amount of data used by modern machine learning algorithms is increasing, which presents several challenges. First and foremost, the data does not come from a single repository, but is distributed across multiple sources, often in different geographic locations. Another challenge is the high hardware requirements needed to process the data, whic...
Article
Full-text available
A common situation in classification tasks is to deal with unbalanced datasets, an issue that appears when the majority class(es) has a large number of samples compared to the minority class(es). This problem is even more significant when the datasets have a large number of features but only a few samples, as is the case with microarray datasets. T...
Chapter
Unsupervised domain adaptation focuses on reusing a model trained on a source domain in an unlabeled target domain. Two main approaches stand out in the literature: adversarial training for generating invariant features and minimizing the discrepancy between feature distributions. This paper presents a hybrid approach that combines these two method...
Article
Full-text available
The growth of Big Data has resulted in an overwhelming increase in the volume of data available, including the number of features. Feature selection, the process of selecting relevant features and discarding irrelevant ones, has been successfully used to reduce the dimensionality of datasets. However, with numerous feature selection approaches in t...
Article
The emergence of the Industry 4.0 trend brings automation and data exchange to industrial manufacturing. Using computational systems and IoT devices allows businesses to collect and deal with vast volumes of sensorial and business process data. The growth and proliferation of big data and machine learning technologies enable strategic decisions ba...
Article
Classic embedded feature selection algorithms are often divided into two large groups: tree-based algorithms and LASSO variants. The two approaches focus on different aspects: while the tree-based algorithms provide a clear explanation about which variables are being used to trigger a certain output, LASSO-like approaches sacrifice a detailed expl...
Article
In recent years, new technological areas have emerged and proliferated, such as the Internet of Things or embedded systems in drones, which are usually characterized by making use of devices with strict requirements of weight, size, cost and power consumption. As a consequence, there has been a growing interest in the implementation of machine lear...
Article
This paper presents LSHAD, an anomaly detection (AD) method based on Locality Sensitive Hashing (LSH), capable of dealing with large-scale datasets. The resulting algorithm is highly parallelizable and its implementation in Apache Spark further increases its ability to handle very large datasets. Moreover, the algorithm incorporates an automatic hy...
Preprint
Full-text available
There are many contexts where dyadic data is present. Social networking is a well-known example, where transparency has grown in importance. In these contexts, pairs of items are linked, building a network where interactions play a crucial role. Explaining why these relationships are established is core to addressing transparency. These explanations ar...
Article
Full-text available
In this study, we analyze the capability of several state-of-the-art machine learning methods to predict whether patients diagnosed with COVID-19 (coronavirus disease 2019) will need different levels of hospital care assistance (regular hospital admission or intensive care unit admission) during the course of their illness, using only demographic...
Article
Full-text available
The number of interconnected devices surrounding us every day, such as personal wearables, cars, and smart homes, has recently increased. Internet of Things devices monitor many processes and are capable of using machine learning models for pattern recognition, and even of making decisions, with the added advantage of diminishing network co...
Article
Full-text available
Argumentation-based dialogue models have been shown to be appropriate for decision contexts in which it is intended to overcome the lack of interaction between decision-makers, either because they are dispersed, they are too many, or they are simply not even known. However, to support decision processes with argumentation-based dialogue models, it is ne...
Chapter
Feature selection has been widely used for decades as a preprocessing step that allows for reducing the dimensionality of a problem while improving classification accuracy. The need for this kind of technique has increased dramatically in recent years with the advent of Big Data. This data explosion not only has the problem of a large number of sam...
Article
The video game League of Legends has several professional leagues and tournaments that offer prizes reaching several million dollars, making it one of the most followed games in the Esports scene. This article addresses the prediction of the winning team in professional matches of the game, using only pregame data. We propose to improve the accur...
Chapter
The advent of Big Data has brought with it an unprecedented and overwhelming increase in data volume, not only in samples but also in available features. Feature selection, the process of selecting the relevant features and discarding the irrelevant ones, has been successfully applied over the last decades to reduce the dimensionality of the datase...
Article
Choosing the appropriate classifier for a given scenario is not an easy task. First, there is an increasingly high number of algorithms available, belonging to different families. Second, there is a lack of methodologies that can help recommend in advance a given family of algorithms for a certain type of dataset. Be...
Article
Full-text available
Advances in information technologies have greatly contributed to the advent of larger datasets. These datasets often come from distributed sites, but even so, their large size usually means they cannot be handled in a centralized manner. A possible solution to this problem is to distribute the data over several processors and combine the differ...
Preprint
Classic embedded feature selection algorithms are often divided into two large groups: tree-based algorithms and lasso variants. The two approaches focus on different aspects: while the tree-based algorithms provide a clear explanation about which variables are being used to trigger a certain output, lasso-like approaches sacrifice a detailed expl...
Article
Full-text available
Cyber security is a critical area in computer systems, especially when dealing with sensitive data. At present, it is becoming increasingly important to ensure that computer systems are secured from attacks, given modern society's dependence on those systems. To prevent these attacks, nowadays most organizations make use of anomaly-based intrusion d...
Article
Full-text available
In computer vision, current feature extraction techniques generate high dimensional data. Both convolutional neural networks and traditional approaches like keypoint detectors are used as extractors of high-level features. However, the resulting datasets have grown in the number of features, leading to long training times due to the curse of dime...
Article
Full-text available
Over the years, the success of recommender systems has become remarkable. Due to the massive arrival of options that a consumer can have at his/her reach, a collaborative environment was generated, where users from all over the world seek and share their opinions based on all types of products. Specifically, millions of images tagged with users’ ta...
Article
Full-text available
Image analysis is a prolific field of research which has been broadly studied in the last decades and successfully applied to a great number of disciplines. Since the advent of Big Data, the number of digital images has been growing explosively, and a large amount of multimedia data is publicly available. Not only is it necessary to deal with this incre...
Article
Since wearable computing systems have grown in importance in recent years, there is an increased interest in implementing machine learning algorithms with reduced precision parameters/computations. Not only learning but also feature selection, most of the time a mandatory preprocessing step in machine learning, is often constrained by the available...
Preprint
Full-text available
Recommender systems (RS) are increasingly present in our daily lives, especially since the advent of Big Data, which allows for storing all kinds of information about users' preferences. Personalized RS are successfully applied in platforms such as Netflix, Amazon or YouTube. However, they are missing in gastronomic platforms such as TripAdvisor, w...
Chapter
Due to the proliferation of mobile computing and Internet of Things devices, there is an urgent need to push the machine learning frontiers to the network edge so as to fully unleash the potential of the edge big data. Since feature selection becomes a fundamental step in the data analysis process, the need to perform this preprocessing task in a r...
Article
Classic feature selection techniques remove irrelevant or redundant features to achieve a subset of relevant features in compact models that are easier to interpret and so improve knowledge extraction. Most such techniques operate on the whole dataset, but are unable to provide the user with useful information when only instance-level information i...
Article
Feature selection is nowadays an extremely important data mining stage in the field of machine learning due to the appearance of problems of high dimensionality. In the literature there are numerous feature selection methods, mRMR (minimum-Redundancy-Maximum-Relevance) being one of the most widely used. However, although it achieves good results in...
Article
Feature selection is a crucial step nowadays in machine learning and data analytics to remove irrelevant and redundant characteristics and thus to provide fast and reliable analyses. Many research works have focused on developing new methods that increase the global relevance of the subset of selected features while reducing the redundancy of infor...
Article
Feature selection is a preprocessing technique that identifies the key features of a given problem. It has traditionally been applied in a wide range of problems that include biological data processing, finance, and intrusion detection systems. In particular, feature selection has been successfully used in medical applications, where it can not onl...
Chapter
The current situation in microarray data analysis and prospects for the future are briefly discussed in this chapter, in which the competition between microarray technologies and high-throughput technologies is considered from a data analysis perspective. The current limitations of DNA microarrays are important to forecast challenges and future trends...
Chapter
The advent of DNA microarray datasets has stimulated a new line of research both in bioinformatics and in machine learning. This type of data is used to collect information from tissue and cell samples regarding gene expression differences that could be useful for disease diagnosis or for distinguishing specific types of tumor. Microarray data clas...
Chapter
A typical characteristic of microarray data is that it has a very high number of features (in the order of thousands) while the number of examples is usually less than 100. In the context of microarray classification, this poses a challenge for machine learning methods, which can suffer overfitting and thus degradation in their performance. A commo...
Preprint
Classic feature selection techniques remove those features that are either irrelevant or redundant, achieving a subset of relevant features that help to provide a better knowledge extraction. This allows the creation of compact models that are easier to interpret. Most of these techniques work over the whole dataset, but they are unable to provide...
Article
Feature selection is of great importance for two possible scenarios: (1) prediction, i.e., improving (or minimally degrading) the predictions of a target variable while discarding redundant or uninformative features and (2) discovery, i.e., identifying features that are truly dependent on the target and may be genuine causes to be determined in exp...
Book
This book provides a comprehensive, interdisciplinary collection of the main, up-to-date methods, tools, and techniques for microarray data analysis, covering the necessary steps for the acquisition of the data, its preprocessing, and its posterior analysis. Featuring perspectives from biology, computer science, and statistics, the volume explores...
Article
We consider a distributed framework where training and test samples drawn from the same distribution are available, with the training instances spread across disjoint nodes. In this setting, a novel learning algorithm based on combining with different weights the outputs of classifiers trained at each node is proposed. The weights depend on the dis...
Article
Ensemble learning is a prolific field in Machine Learning since it is based on the assumption that combining the output of multiple models is better than using a single model, and it usually provides good results. It has commonly been employed for classification, but it can be used to improve other disciplines such as feature selection. F...
Article
Full-text available
Proactive Maintenance practices are becoming more standard in industrial environments, with a direct and profound impact on competitiveness within the sector. These practices demand the continuous monitoring of industrial equipment, which generates extensive amounts of data. This information can be processed into useful knowledge with the use...
Article
Full-text available
Data is growing at an unprecedented pace. With the variety, speed and volume of data flowing through networks and databases, newer approaches based on machine learning are required. But what is really big in Big Data? Should it depend on the numerical representation of the machine? Since portable embedded systems have been growing in importance, th...
Article
In an era in which the volume and complexity of datasets is continuously growing, feature selection techniques have become indispensable to extract useful information from huge amounts of data. However, existing algorithms may not scale well when dealing with huge datasets, and a possible solution is to distribute the data in several nodes. In this...
Article
Full-text available
Lately, as a result of the explosion of high dimensionality, researchers in machine learning have become interested not only in accuracy, but also in scalability. Although scalability of learning methods is a trending issue, scalability of feature selection methods has not received the same amount of attention. This research analyzes the scalability of st...
Chapter
Medicine will experience many changes in the coming years because the so-called “medicine of the future” will be increasingly proactive, featuring four basic elements: predictive, personalized, preventive, and participatory. Drivers for these changes include the digitization of data in medicine and the availability of computational tools that deal...
Chapter
Full-text available
The unprecedented amount of visual data that is available nowadays has created new research opportunities and challenges in the areas of computer vision and machine learning. When dealing with large scale datasets, with a huge number of samples and features, the use of feature selection plays an important role for dimensionality reduction whilst al...
Chapter
This chapter describes the different approaches that can be used to evaluate the behavior of the ensembles for feature selection. Besides the well-known, almost universal measures of accuracy, there are two other measures that should be taken into account to quantify the success of an ensemble approach: diversity and stability. In both cases, the re...
Chapter
This chapter describes several new fields, besides the feature selection preprocessing step (the theme of this book), in which ensembles have been successfully used. First, in Sect. 7.1, we introduce a very brief review of the different application fields in which ensembles have been applied, together with basic levels that are used to produce diffe...
Chapter
This chapter reveals the new challenges that researchers are finding in ensemble feature selection, most of them related to “Big Data” and some of its consequences, such as the significant rise in unsupervised learning, because unlabelled samples are the most common situation in large datasets; or the need for visualization, that is a challenge also...
Chapter
The advent of Big Data, and especially the advent of datasets with high dimensionality, has brought a pressing need to identify the relevant features of the data. In this scenario, the importance of feature selection is beyond doubt and different methods have been developed, although researchers do not agree on which one is the best method fo...
Chapter
This chapter presents two different approaches for ensemble feature selection based on the filter model, aiming at achieving a good classification performance together with an important reduction in the input dimensionality. In this manner, we try to overcome the issue of selecting an appropriate method for each problem at hand, as it is usually ve...
Chapter
In the new era of Big Data, the analysis of data is more important than ever, in order to extract useful information. Feature selection is one of the most popular preprocessing techniques used by machine learning researchers, aiming to find the relevant features of a problem. Since the best feature selection method does not exist, a possible approa...
Chapter
This chapter provides users with a review of some popular software tools that can help in the design of their ensembles for feature selection. A large number of feature selection and ensemble learning methods are already implemented and available on different platforms, so it is useful to know them before coding our own ensembles. Sec...
Chapter
This chapter describes the ideas of the ensemble approach applied to feature selection, a classical preprocessing step which, in the present context of Big Data and high dimensional datasets, has become of capital importance. Section 4.1 introduces the context of ensembles for feature selection, which are described in more detail in Sects. 4.2 and 4.3 for homog...
Chapter
Ensemble learning is based on the divide-and-conquer principle but, after dividing, we would need to combine the partial results in some way to reach a final decision. Therefore, a crucial point when designing an ensemble method is to choose an appropriate method for combining the different weak outputs. There are several methods in the literatur...
Chapter
This chapter describes the basic ideas behind the ensemble approach, together with the classical methods that have been used in the field of Machine Learning. Section 3.1 states the rationale behind the approach, while in Sect. 3.2 the most popular methods are briefly described. Finally, Sect. 3.3 summarizes and discusses the contents of this chapte...
Chapter
Full-text available
As a consequence of the increasing competitiveness in the current economic environment, Proactive Maintenance practices are gradually becoming more common in industrial environments. In order to implement these practices, large amounts of heterogeneous information must be analysed, such that knowledge about the status of the equipment can be acquired...
Article
Feature selection ensemble methods are a recent approach aiming at adding diversity in sets of selected features, improving performance and obtaining more robust and stable results. However, using an ensemble introduces the need for an aggregation step to combine the outputs of all the methods that make up the ensemble. Besides, when trying to improve comp...
Chapter
In the last few years, we have witnessed the advent of Big Data and, more specifically, Big Dimensionality, which refers to the unprecedented number of features that are rendering existing machine learning inadequate. To be able to deal with these high-dimensional spaces, a common solution is to use data preprocessing techniques which might help to...
Article
Full-text available
In recent years, ensemble learning has become a prolific area of study in pattern recognition, based on the assumption that using and combining different learning models in the same problem could lead to better performance results than using a single model. This idea of ensemble learning has traditionally been used for classification tasks, but has...
Article
Full-text available
Data complexity analysis enables an understanding of whether classification performance could be affected, not by algorithm limitations, but by intrinsic data characteristics. Microarray datasets based on high numbers of gene expressions combined with small sample sizes represent a particular challenge for machine learning researchers. This type of...
Article
Classification problems with more than two classes can be handled in different ways. The most used approach is the one which transforms the original multiclass problem into a series of binary subproblems which are solved individually. In this approach, should the same base classifier be used on all binary subproblems? Or should these subproblems be...
Conference Paper
To cope with the huge quantity of data brought about by the fast development of sensing, networking and inexpensive data storage, many distributed approaches have been developed in recent years. The main reason is that, when dealing with large datasets, most existing data mining algorithms do not scale well, and their efficiency may significantly...
Article
Fast and effective oil spill detection systems are crucial to ensure a proper response to environmental emergencies caused by hydrocarbon pollution on the ocean's surface. Typically, these systems uncover not only oil spills, but also a high number of look-alikes. The feature extraction is a critical and computationally intensive phase where each d...
Preprint
Full-text available
With the advent of extremely high dimensional datasets, dimensionality reduction techniques are becoming mandatory. Among many techniques, feature selection has been growing in interest as an important tool to identify relevant features on huge datasets (both in number of instances and features). The purpose of this work is to demonstrate that st...
Article
In the era of Big Data, many datasets have a common characteristic, the large number of features. As a result, selecting the relevant features and ignoring the irrelevant and redundant features has become indispensable. However, when dealing with large amounts of data, most existing feature selection algorithms do not scale well, and their efficien...
Conference Paper
When dealing with multiclass problems, the most used approach is the one based on multiple binary classifiers. This approach consists of employing class binarization techniques which transform the multiclass problem into a series of binary problems which are solved individually. Then, the resultant predictions are combined to obtain a final soluti...
