Amparo Alonso-Betanzos
  • Professor
  • Universidade da Coruña

About

294
Publications
109,828
Reads
8,575
Citations
Current institution
Universidade da Coruña
Additional affiliations
January 1991 - December 2011
Universidade da Coruña
January 1989 - present
University of Santiago de Compostela
September 1988 - present
Medical College of Georgia

Publications (294)
Article
Full-text available
The successful adoption of social innovations, such as renewable energy systems or pollution reduction plans in cities, depends, to a large extent, on the willingness and participation of the population in their development and implementation. We present an agent‐based model (ABM) to analyze the process of citizen acceptability of a social innovati...
Preprint
Full-text available
Among the existing approaches for visual-based Recommender System (RS) explainability, utilizing user-uploaded item images as efficient, trustable explanations is a promising option. However, current models following this paradigm assume that, for any user, all images uploaded by other users can be considered negative training examples (i.e. bad ex...
Preprint
Full-text available
Dietary Restriction (DR) is one of the most popular anti-ageing interventions, prompting exhaustive research into genes associated with its mechanisms. Recently, Machine Learning (ML) has been explored to identify potential DR-related genes among ageing-related genes, aiming to minimize costly wet lab experiments needed to expand our knowledge on D...
Article
Full-text available
A common situation in classification tasks is to deal with unbalanced datasets, an issue that appears when the majority class(es) has a large number of samples compared to the minority class(es). This problem is even more significant when the datasets have a large number of features but only a few samples, as is the case with microarray datasets. T...
Article
We are immersed in a new revolution, an era of transformation driven by Artificial Intelligence (AI), which significantly affects the geopolitical balance, society, the economy, employment, and education, generating constant change in these areas. AI is a cross-cutting discipline that is present in practically any...
Conference Paper
Full-text available
During the most critical moments of the SARS-COV-2 pandemic, various containment measures were enacted to hinder the virus’s spread and mitigate its impact. This work focuses on studying the impact of the population’s adherence level to socio-sanitary measures on the virus’s spread, aiming to better understand its relevance in crisis situations. To...
Preprint
Full-text available
Compartmental epidemiological models categorize individuals based on their disease status, such as the SEIRD model (Susceptible-Exposed-Infected-Recovered-Dead). These models determine the parameters that influence the magnitude of an outbreak, such as contagion and recovery rates. However, they don't account for individual characteristics or popul...
Preprint
Full-text available
Recommender Systems have become crucial in the modern world, commonly guiding users towards relevant content or products, and having a large influence over the decisions of users and citizens. However, ensuring transparency and user trust in these systems remains a challenge; personalized explanations have emerged as a solution, offering justificat...
Article
The emergence of the Industry 4.0 trend brings automation and data exchange to industrial manufacturing. Using computational systems and IoT devices allows businesses to collect and deal with vast volumes of sensor and business process data. The growth and proliferation of big data and machine learning technologies enable strategic decisions ba...
Article
Classic embedded feature selection algorithms are often divided into two large groups: tree-based algorithms and LASSO variants. The two approaches focus on different aspects: while the tree-based algorithms provide a clear explanation of which variables are being used to trigger a certain output, LASSO-like approaches sacrifice a detailed expl...
Article
Systems that rely on dyadic data, which relate entities of two types together, have become ubiquitously used in fields such as media services, tourism business, e-commerce, and others. However, these systems have had a tendency to be black-box systems, despite their objective of influencing people's decisions. There is a lack of research on providi...
Article
Full-text available
The Sustainable Development Goals (SDGs) adopted by the United Nations require relevant social changes that sometimes involve the development of innovative projects that cause rejection and confrontation. Agent-Based Models (ABM) are powerful tools to represent the behavior of systems, and they have become valuable for the social sciences as they c...
Preprint
Full-text available
Most proposals in the anomaly detection field focus exclusively on the detection stage, especially in recent deep learning approaches. While providing highly accurate predictions, these models often lack transparency, acting as "black boxes". This criticism has grown to the point that explanation is now considered very relevant in terms of accep...
Conference Paper
Full-text available
Social network analysis is a popular discipline among the social and behavioural sciences, in which the relationships between different social entities are modelled as a network. One of the most popular problems in social network analysis is finding communities in its network structure. Usually, a community in a social network is a functional sub-p...
Preprint
Full-text available
Social network analysis is a popular discipline among the social and behavioural sciences, in which the relationships between different social entities are modelled as a network. One of the most popular problems in social network analysis is finding communities in its network structure. Usually, a community in a social network is a functional sub-p...
Article
This paper presents LSHAD, an anomaly detection (AD) method based on Locality Sensitive Hashing (LSH), capable of dealing with large-scale datasets. The resulting algorithm is highly parallelizable and its implementation in Apache Spark further increases its ability to handle very large datasets. Moreover, the algorithm incorporates an automatic hy...
Preprint
Full-text available
There are many contexts where dyadic data is present. Social networking is a well-known example, where transparency has grown in importance. In these contexts, pairs of items are linked, building a network where interactions play a crucial role. Explaining why these relationships are established is central to addressing transparency. These explanations ar...
Article
Full-text available
In this study, we analyze the capability of several state of the art machine learning methods to predict whether patients diagnosed with CoVid-19 (CoronaVirus disease 2019) will need different levels of hospital care assistance (regular hospital admission or intensive care unit admission), during the course of their illness, using only demographic...
Article
Full-text available
The number of interconnected devices, such as personal wearables, cars, and smart-homes, surrounding us every day has recently increased. The Internet of Things devices monitor many processes, and have the capacity of using machine learning models for pattern recognition, and even making decisions, with the added advantage of diminishing network co...
Article
Full-text available
Argumentation-based dialogue models have been shown to be appropriate for decision contexts in which the aim is to overcome the lack of interaction between decision-makers, whether because they are dispersed, they are too many, or they are simply not even known. However, to support decision processes with argumentation-based dialogue models, it is ne...
Chapter
Feature selection has been widely used for decades as a preprocessing step that allows for reducing the dimensionality of a problem while improving classification accuracy. The need for this kind of technique has increased dramatically in recent years with the advent of Big Data. This data explosion not only has the problem of a large number of sam...
Chapter
Agent based models (ABM) are computational models employed for simulating the actions and interactions of autonomous agents with the objective of assessing their effects on the system as a whole. They have been extensively applied in social sciences because ABM simulations, under different running conditions, can help to test the implications of a...
Article
The task of choosing the appropriate classifier for a given scenario is not an easy one. First, there is an increasingly high number of algorithms available, belonging to different families. There is also a lack of methodologies that can help recommend in advance a given family of algorithms for a certain type of dataset. Be...
Article
Full-text available
Feature selection algorithms, such as ReliefF, are very important for processing high‐dimensionality data sets. However, widespread use of such popular and effective algorithms is limited by their computational cost. We describe an adaptation of the ReliefF algorithm that simplifies its costliest step by approximating the nearest neighbor gr...
Article
Full-text available
Advances in the information technologies have greatly contributed to the advent of larger datasets. These datasets often come from distributed sites, but even so, their large size usually means they cannot be handled in a centralized manner. A possible solution to this problem is to distribute the data over several processors and combine the differ...
Preprint
Classic embedded feature selection algorithms are often divided into two large groups: tree-based algorithms and lasso variants. The two approaches focus on different aspects: while the tree-based algorithms provide a clear explanation of which variables are being used to trigger a certain output, lasso-like approaches sacrifice a detailed expl...
Article
The k-nearest-neighbors (kNN) graph is a popular and powerful data structure that is used in various areas of Data Science, but the high computational cost of obtaining it hinders its use on large datasets. Approximate solutions have been described in the literature using diverse techniques, among which Locality-sensitive Hashing (LSH) is a prom...
Article
Full-text available
Cyber security is a critical area in computer systems, especially when dealing with sensitive data. At present, it is becoming increasingly important to ensure that computer systems are secured from attacks, given modern society's dependence on those systems. To prevent these attacks, most organizations nowadays make use of anomaly-based intrusion d...
Article
Full-text available
COVID-19 has brought a new normality in society. However, to avoid the situation, the virus must be stopped. There are several ways in which the governments of the world have taken action, from small measures like general cleaning up to large-scale measures like confinement. In this work, we present an agent-based tool that allows for simulating th...
Article
Full-text available
Over the years, the success of recommender systems has become remarkable. Due to the massive arrival of options that a consumer can have at his/her reach, a collaborative environment was generated, where users from all over the world seek and share their opinions based on all types of products. Specifically, millions of images tagged with users’ ta...
Article
Full-text available
This work presents EADMNC (Explainable Anomaly Detection on Mixed Numerical and Categorical spaces), a novel approach to address explanation using an anomaly detection algorithm, ADMNC, which provides accurate detections on mixed numerical and categorical input spaces. Our improved algorithm leverages the formulation of the ADMNC model to offer pre...
Preprint
Full-text available
In the present article we study social network modelling using human interaction as a basis. To do so, we propose a new set of functions, Affinities, designed to capture the nature of the local interactions among each pair of actors in a network. Using these functions, we develop a new community detection algorithm, the Borgia Clustering, where com...
Article
In this contribution we study social network modelling by using human interaction as a basis. To do so, we propose a new set of functions, affinities, designed to capture the nature of the local interactions among each pair of actors in a network. By using these functions, we develop a new community detection algorithm, the Borgia Clustering, where...
Article
Aims: Both left ventricular (LV) diastolic dysfunction (LVDD) and hypertrophy (LVH) as assessed by echocardiography are independent prognostic markers of future cardiovascular events in the community. However, selective screening strategies to identify individuals at risk who would benefit most from cardiac phenotyping are lacking. We, therefore,...
Article
This paper presents a unified propagation method for dealing with both the classic Eikonal equation, where the motion direction does not affect the propagation, and the more general static Hamilton-Jacobi equations, where it does. While classic Fast Marching Method (FMM) techniques achieve the solution to the Eikonal equation with a O(M log M) (or...
Article
Since wearable computing systems have grown in importance in recent years, there is increased interest in implementing machine learning algorithms with reduced-precision parameters and computations. Not only learning but also feature selection, most of the time a mandatory preprocessing step in machine learning, is often constrained by the available...
Preprint
Full-text available
Recommender systems (RS) are increasingly present in our daily lives, especially since the advent of Big Data, which allows for storing all kinds of information about users' preferences. Personalized RS are successfully applied in platforms such as Netflix, Amazon or YouTube. However, they are missing in gastronomic platforms such as TripAdvisor, w...
Chapter
Due to the proliferation of mobile computing and Internet of Things devices, there is an urgent need to push the machine learning frontiers to the network edge so as to fully unleash the potential of the edge big data. Since feature selection becomes a fundamental step in the data analysis process, the need to perform this preprocessing task in a r...
Article
Classic feature selection techniques remove irrelevant or redundant features to achieve a subset of relevant features in compact models that are easier to interpret and so improve knowledge extraction. Most such techniques operate on the whole dataset, but are unable to provide the user with useful information when only instance-level information i...
Article
Background: Current heart failure guidelines emphasize the importance of timely detection of subclinical left ventricular (LV) remodelling and dysfunction for more precise risk stratification of asymptomatic subjects. Both LV diastolic dysfunction (LVDD) and LV hypertrophy (LVH) as assessed by echocardiography are known independent prognostic marker...
Article
Full-text available
Gaining relevant insight from a dyadic dataset, which describes interactions between two entities, is an open problem that has sparked the interest of researchers and industry data scientists alike. However, the existing methods have poor explainability, a quality that is becoming essential in certain applications. We describe an explainable and sc...
Chapter
The current situation in microarray data analysis and prospects for the future are briefly discussed in this chapter, in which the competition between microarray technologies and high-throughput technologies is considered under a data analysis view. The up-to-date limitations of DNA microarrays are important to forecast challenges and future trends...
Chapter
The advent of DNA microarray datasets has stimulated a new line of research both in bioinformatics and in machine learning. This type of data is used to collect information from tissue and cell samples regarding gene expression differences that could be useful for disease diagnosis or for distinguishing specific types of tumor. Microarray data clas...
Chapter
A typical characteristic of microarray data is that it has a very high number of features (in the order of thousands) while the number of examples is usually less than 100. In the context of microarray classification, this poses a challenge for machine learning methods, which can suffer overfitting and thus degradation in their performance. A commo...
Preprint
Classic feature selection techniques remove those features that are either irrelevant or redundant, achieving a subset of relevant features that help to provide a better knowledge extraction. This allows the creation of compact models that are easier to interpret. Most of these techniques work over the whole dataset, but they are unable to provide...
Article
Full-text available
This work presents the ADMNC method, designed to tackle anomaly detection for large-scale problems with a mixture of categorical and numerical input variables. A flexible parametric probability measure is adjusted to input data, allowing low likelihood values to be tracked as anomalies. The main contribution of this method is that, to cope with the...
Article
Feature selection is of great importance for two possible scenarios: (1) prediction, i.e., improving (or minimally degrading) the predictions of a target variable while discarding redundant or uninformative features and (2) discovery, i.e., identifying features that are truly dependent on the target and may be genuine causes to be determined in exp...
Preprint
Full-text available
CFS (Correlation-Based Feature Selection) is an FS algorithm that has been successfully applied to classification problems in many domains. We describe Distributed CFS (DiCFS) as a completely redesigned, scalable, parallel and distributed version of the CFS algorithm, capable of dealing with the large volumes of data typical of big data application...
Book
This book provides a comprehensive, interdisciplinary collection of the main, up-to-date methods, tools, and techniques for microarray data analysis, covering the necessary steps for the acquisition of the data, its preprocessing, and its posterior analysis. Featuring perspectives from biology, computer science, and statistics, the volume explores...
Article
We consider a distributed framework where training and test samples drawn from the same distribution are available, with the training instances spread across disjoint nodes. In this setting, a novel learning algorithm based on combining with different weights the outputs of classifiers trained at each node is proposed. The weights depend on the dis...
Article
Ensemble learning is a prolific field in Machine Learning since it is based on the assumption that combining the output of multiple models is better than using a single model, and it usually provides good results. It has commonly been employed for classification, but it can be used to improve other disciplines such as feature selection. F...
Article
Full-text available
Feature selection (FS) is a key preprocessing step in data mining. CFS (Correlation-Based Feature Selection) is an FS algorithm that has been successfully applied to classification problems in many domains. We describe Distributed CFS (DiCFS) as a completely redesigned, scalable, parallel and distributed version of the CFS algorithm, capable of dea...
Article
Full-text available
Obtaining relevant information from the vast amount of data generated by interactions in a market or, in general, from a dyadic dataset, is a broad problem of great interest both for industry and academia. Also, the interpretability of machine learning algorithms is becoming increasingly relevant and even becoming a legal requirement, all of which...
Article
Full-text available
Data is growing at an unprecedented pace. With the variety, speed and volume of data flowing through networks and databases, newer approaches based on machine learning are required. But what is really big in Big Data? Should it depend on the numerical representation of the machine? Since portable embedded systems have been growing in importance, th...
Article
In an era in which the volume and complexity of datasets is continuously growing, feature selection techniques have become indispensable to extract useful information from huge amounts of data. However, existing algorithms may not scale well when dealing with huge datasets, and a possible solution is to distribute the data in several nodes. In this...
Article
Full-text available
Lately, derived from the explosion of high dimensionality, researchers in machine learning became interested not only in accuracy, but also in scalability. Although scalability of learning methods is a trending issue, scalability of feature selection methods has not received the same amount of attention. This research analyzes the scalability of st...
Article
The articles in the long tail are those that are not popular in some sense, but all together often represent a large proportion of the products covered by a recommender system. For companies, it is important to recommend these items that otherwise could be unknown to their customers. It is also interesting for users because knowing about these item...
Chapter
Medicine will experience many changes in the coming years because the so-called “medicine of the future” will be increasingly proactive, featuring four basic elements: predictive, personalized, preventive, and participatory. Drivers for these changes include the digitization of data in medicine and the availability of computational tools that deal...
Data
This dataset is made available for non-commercial use, and it corresponds to the dataset used in the paper: [1] Díez, J., Martínez-Rego, D., Alonso-Betanzos, A., Luaces, O., Bahamonde, A.: Optimizing Novelty and Diversity in Recommendations. Progress in Artificial Intelligence (2018). You can find a better explanation of the dataset...
Chapter
This chapter describes the different approaches that can be used to evaluate the behavior of the ensembles for feature selection. Beside the well-known, almost universal measures of accuracy, there are two other measures that should be taken into account to quantify the success of an ensemble approach: diversity and stability. In both cases, the re...
Chapter
This chapter describes several new fields, beside the feature selection preprocessing step (the theme of this book), in which ensembles have been successfully used. First, in Sect. 7.1, we introduce a very brief review of the different application fields in which ensembles have been applied, together with basic levels that are used to produce diffe...
Chapter
This chapter reveals the new challenges that researchers are finding in ensemble feature selection, most of them related to “Big Data” and some of its consequences, such as the significant rise in unsupervised learning, because unlabelled samples are the most common situation in large datasets; or the need for visualization, which is a challenge also...
Chapter
The advent of Big Data, and especially the advent of datasets with high dimensionality, has brought an important need to identify the relevant features of the data. In this scenario, the importance of feature selection is beyond doubt and different methods have been developed, although researchers do not agree on which one is the best method fo...
Chapter
This chapter presents two different approaches for ensemble feature selection based on the filter model, aiming at achieving a good classification performance together with an important reduction in the input dimensionality. In this manner, we try to overcome the issue of selecting an appropriate method for each problem at hand, as it is usually ve...
Chapter
In the new era of Big Data, the analysis of data is more important than ever, in order to extract useful information. Feature selection is one of the most popular preprocessing techniques used by machine learning researchers, aiming to find the relevant features of a problem. Since the best feature selection method does not exist, a possible approa...
Chapter
This chapter provides the users with a review of some popular software tools that can help in the design of their ensembles for feature selection. There is an important number of feature selection and ensemble learning methods already implemented and available in different platforms, so it is useful to know them before coding our own ensembles. Sec...
Chapter
This chapter describes the ideas of the ensemble approach applied to feature selection, a classical preprocessing step which, in the present context of Big Data and high-dimensional datasets, has become of capital importance. Section 4.1 introduces the context of ensembles for feature selection, which are described in more detail in Sects. 4.2 and 4.3 for homog...
Chapter
Ensemble learning is based on the divide-and-conquer principle but, after dividing, we need to combine the partial results in some way to reach a final decision. Therefore, a crucial point when designing an ensemble method is to choose an appropriate method for combining the different weak outputs. There are several methods in the literatur...
Chapter
This chapter describes the basic ideas behind the ensemble approach, together with the classical methods that have been used in the field of Machine Learning. Section 3.1 states the rationale behind the approach, while in Sect. 3.2 the most popular methods are briefly described. Finally, Sect. 3.3 summarizes and discusses the contents of this chapte...
Article
Feature selection ensemble methods are a recent approach aiming at adding diversity in sets of selected features, improving performance and obtaining more robust and stable results. However, using an ensemble introduces the need for an aggregation step to combine the outputs of all the methods that make up the ensemble. Besides, when trying to improve comp...
Chapter
In the last few years, we have witnessed the advent of Big Data and, more specifically, Big Dimensionality, which refers to the unprecedented number of features that are rendering existing machine learning inadequate. To be able to deal with these high-dimensional spaces, a common solution is to use data preprocessing techniques which might help to...
Article
Full-text available
In recent years, ensemble learning has become a prolific area of study in pattern recognition, based on the assumption that using and combining different learning models in the same problem could lead to better performance results than using a single model. This idea of ensemble learning has traditionally been used for classification tasks, but has...
Article
In this paper, a new one-class classification algorithm capable of working in distributed environments is presented. In it, convex hull is used to build the boundary of the target class defining the one-class problem in each of the distributed nodes. Therefore, we will consider several classifiers, each one determined using a given local data parti...
Chapter
Medical informatics has undergone a significant transformation in the past two decades. Software that once ran only on mainframes is nowadays available to anyone at any desired location via tablets and smartphones. Interestingly, the original ambitious endeavors to construct systems that ultimately would encompass the whole of medicine h...
Article
Full-text available
Data complexity analysis enables an understanding of whether classification performance could be affected, not by algorithm limitations, but by intrinsic data characteristics. Microarray datasets based on high numbers of gene expressions combined with small sample sizes represent a particular challenge for machine learning researchers. This type of...
Article
Classification problems with more than two classes can be handled in different ways. The most used approach is the one which transforms the original multiclass problem into a series of binary subproblems which are solved individually. In this approach, should the same base classifier be used on all binary subproblems? Or should these subproblems be...
Conference Paper
To cope with the huge quantity of data that the fast development of sensing, networking, and inexpensive data storage has brought, many distributed approaches have been developed in recent years. The main reason is that, when dealing with large datasets, most existing data mining algorithms do not scale well, and their efficiency may significantly...
Chapter
This chapter demonstrates an approach to the agent-based modelling of norm transmission using decision trees learned from questionnaire data. We explore the implications of adding norm dynamics implied in static questionnaire data and the influence social network topology has on the outcome. We find that parameters determining network topology infl...
Chapter
With the increasing trend in exploring the use of agent-based models in empirical contexts, this paper reflects on the use of decision trees learned from questionnaire data as behavioral models for the agents. Decision trees are machine learning algorithms most commonly used in the data mining literature, especially for smaller datasets where other...
Book
Using the O.D.D. (Overview, Design concepts, Detail) protocol, this title explores the role of agent-based modeling in predicting the feasibility of various approaches to sustainability. The chapters incorporated in this volume consist of real case studies to illustrate the utility of agent-based modeling and complexity theory in discovering a path...
Article
Fast and effective oil spill detection systems are crucial to ensure a proper response to environmental emergencies caused by hydrocarbon pollution on the ocean's surface. Typically, these systems uncover not only oil spills, but also a high number of look-alikes. The feature extraction is a critical and computationally intensive phase where each d...
Article
Pro-environmental behaviors have been analyzed in the home, with little attention to other important contexts of everyday life, such as the workplace. The research reported here explored three categories of pro-environmental behavior (consumption of materials and energy, waste generation, and work-related commuting) in a public large-scale organiza...
