About
Publications: 165
Reads: 31,605
Citations: 6,819
Publications (165)
Regular vine copulas (R-vines) provide a comprehensive framework for modeling high-dimensional dependencies using a hierarchy of trees and conditional pair-copulas. While the graphical structure of R-vines is traditionally derived from data, this work introduces a novel approach by utilizing a (conditional) pairwise dependence list. Our primary goa...
In some machine learning applications the availability of labeled instances for supervised classification is limited while unlabeled instances are abundant. Semi-supervised learning algorithms deal with these scenarios and attempt to exploit the information contained in the unlabeled examples. In this paper, we address the question of how to evolve...
The COVID-19 pandemic is continuously evolving with drastically changing epidemiological situations which are approached with different decisions: from the reduction of fatalities to even the selection of patients with the highest probability of survival in critical clinical situations. Motivated by this, a battery of mortality prediction models wi...
A probabilistic framework for streaming novelty detection is proposed and illustrated with a mixture of Gaussian distributions that models the set of classes. Instances are predicted based on the probability of belonging to each of the classes. Those for which the model cannot provide confident predictions are introduced into a fixed-sized buffer....
The analysis of continuously larger datasets is a task of major importance in a wide variety of scientific fields. Therefore, the development of efficient and parallel algorithms to perform such an analysis is a crucial topic in unsupervised learning. Cluster analysis algorithms are a key element of exploratory data analysis and, among them, the K...
Due to their unprecedented capacity to learn patterns from raw data, deep neural networks have become the de facto modeling choice to address complex machine learning tasks. However, recent works have emphasized the vulnerability of deep neural networks when being fed with intelligently manipulated adversarial data instances tailored to confuse the...
Recent advances in technology have brought major breakthroughs in data collection, enabling a large amount of data to be gathered over time and thus generating time series. Mining this data has become an important task for researchers and practitioners in the past few years, including the detection of outliers or anomalies that may represent errors...
One of the most common problems in time series data mining is the supervised classification of time series. The goal of this problem is, starting from a set of series divided into classes, to build a model, as accurate as possible, that predicts the class of other unclassified series. As an extension of this classical problem, in some...
Time series classification is a growing research topic due to the vast amount of time series data that is being created over a wide variety of fields. The particularity of the data makes it a challenging task and different approaches have been taken, including the distance-based approach. 1-NN has been a widely used method within distance-based...
The way similarity is measured among time series is of paramount importance in many data mining and machine learning tasks. For instance, Elastic Similarity Measures are widely used to determine whether two time series are similar to each other. Indeed, in off-line time series mining, these measures have been shown to be very effective due to their...
Adversarial Machine Learning (AML) refers to the study of the robustness of classification models when processing data samples that have been intelligently manipulated to confuse them. Procedures aimed at furnishing such confusing samples exploit concrete vulnerabilities of the learning algorithm of the model at hand, by which perturbations can mak...
This paper deals with the problem of detecting sand dunes from remotely sensed images of the surface of Mars. We build on previous approaches that propose methods to extract informative features for the classification of the images. The intricate correlation structure exhibited by these features motivates us to propose the use of probabilistic clas...
Many Pareto-based multi-objective evolutionary algorithms require ranking the solutions of the population in each iteration according to the dominance principle, which can become a costly operation, particularly when dealing with many-objective optimization problems. In this paper, we present a new efficient algorithm for computing the non-...
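To make the ranking step in the abstract above concrete, the following is a minimal sketch of dominance-based ranking by repeatedly peeling off non-dominated solutions. It is a naive quadratic-per-front illustration, not the efficient algorithm the paper presents; minimization of every objective is an assumption, and the names `dominates` and `nondominated_fronts` are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a dominates b (assumed minimization): a is no worse in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated_fronts(points: List[Sequence[float]]) -> List[List[int]]:
    """Naive ranking: repeatedly collect the indices that no remaining point dominates."""
    remaining = list(range(len(points)))
    fronts: List[List[int]] = []
    while remaining:
        front = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


# Toy population with two objectives (illustrative data only).
population = [(1, 5), (2, 2), (3, 1), (4, 4), (5, 5)]
fronts = nondominated_fronts(population)
print(fronts)
```

Each pass compares every remaining pair of solutions, which illustrates why this operation becomes costly as the population and the number of objectives grow.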
This article is a survey paper on solving spacecraft trajectory optimization problems. The solving process is decomposed into four key steps: mathematical modeling of the problem, defining the objective functions, developing an approach, and obtaining the solution of the problem. Several subcategories for each step have been identified and des...
Given a particular instance of a combinatorial optimization problem, the knowledge about the attraction basin sizes can help to analyze the difficulty encountered by local search algorithms while solving it. As calculating these sizes exhaustively is computationally intractable, we focus on methods for their estimation. The accuracy of some of thes...
The statistical assessment of the empirical comparison of algorithms is an essential step in heuristic optimization. Classically, researchers have relied on the use of statistical tests. However, recently, concerns about their use have arisen and, in many fields, other (Bayesian) alternatives are being considered. For a proper analysis, different a...
Estimation of distribution algorithms have already demonstrated their utility when solving a broad range of combinatorial problems. However, there is still room for methodological improvements when approaching constrained type problems. The great majority of works in the literature implement external repairing or penalty schemes, or use ad-hoc samp...
Time series classification is a growing research topic due to the vast amount of time series data that are being created over a wide variety of fields. The particularity of the data makes it a challenging task and different approaches have been taken, including the distance-based approach. 1-NN has been a widely used method within distance-base...
Solving combinatorial optimization problems efficiently requires the development of algorithms that consider the specific properties of the problems. In this sense, local search algorithms are designed over a neighborhood structure that partially accounts for these properties. Considering a neighborhood, the space is usually interpreted as a natura...
In evolutionary computation, it is common to use several instances of optimization problems to evaluate the performance of algorithms on those problems. Sometimes, instances of real problems are available, and the set of instances for experimentation is therefore built from them. Unfortunately, in general, this is not the case. Instances...
The Mallows and Generalized Mallows models are compact yet powerful and natural ways of representing a probability distribution over the space of permutations. In this paper, we deal with the problems of sampling and learning such distributions when the metric on permutations is the Cayley distance. We propose new methods for both operations, and t...
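As a small illustration of the model named in the abstract above, the sketch below computes the Cayley distance (the minimum number of transpositions needed to turn one permutation into another, equal to n minus the number of cycles of their composition) and a Mallows distribution, p(sigma) proportional to exp(-theta * d(sigma, sigma_0)), by brute-force enumeration. This is not the sampling or learning method the paper proposes; the enumeration is only feasible for very small n, and the function names and the parameter values are assumptions chosen for illustration.

```python
import math
from itertools import permutations
from typing import Dict, Sequence, Tuple


def cayley_distance(sigma: Sequence[int], pi: Sequence[int]) -> int:
    """n minus the number of cycles of sigma composed with the inverse of pi."""
    n = len(sigma)
    pi_inv = [0] * n
    for position, value in enumerate(pi):
        pi_inv[value] = position
    composed = [sigma[pi_inv[i]] for i in range(n)]
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = composed[j]
    return n - cycles


def mallows_pmf(n: int, center: Sequence[int], theta: float) -> Dict[Tuple[int, ...], float]:
    """Mallows probabilities over all n! permutations (enumeration; small n only)."""
    perms = list(permutations(range(n)))
    weights = [math.exp(-theta * cayley_distance(p, center)) for p in perms]
    z = sum(weights)  # normalization constant
    return {p: w / z for p, w in zip(perms, weights)}


# Illustrative values: 3 items, identity as the central permutation, theta = 1.
pmf = mallows_pmf(3, center=(0, 1, 2), theta=1.0)
```

With the identity as the center, the central permutation receives the highest probability, and the probability decays exponentially with the Cayley distance from it.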
Classifying software defects according to any defined taxonomy is not straightforward. To support automating the classification of software defects, two sets of defect reports were collected from public issue tracking systems from two different real domains. Due to the lack of a domain expert, the collected defects were categorized b...
In the last decade, many works in combinatorial optimisation have shown that, due to the advances in multi-objective optimisation, the algorithms from this field could be used for solving single-objective problems as well. In this sense, a number of papers have proposed multi-objectivising single-objective problems in order to use multiobjective al...
The analysis of continuously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their ease of implementation and relatively low computational cost. Among these algorithms, the K-means algorithm stands out as t...
The problem of early classification of time series appears naturally in contexts where the data, of temporal nature, are collected over time, and early class predictions are interesting or even required. The objective is to classify the incoming sequence as soon as possible, while maintaining suitable levels of accuracy in the predictions. Thus, we...
In software engineering, associating each reported defect with a category allows, among many other things, for the appropriate allocation of resources. Although this classification task can be automated using standard machine learning techniques, the categorization of defects for model training requires expert knowledge, which is not always availab...
The aim of this working model is to improve the profitability of tuna fishing through fish observation and route optimization, reducing fuel consumption while maintaining catches. Purse-seine fishing is the type of surface fishing that shows the highest contribution levels to the world's skipjack and yellowfin tuna fishery...
The covariance matrix adaptation evolution strategy (CMA-ES) is one of the state-of-the-art evolutionary algorithms for optimization problems with continuous representation. It has been extensively applied to single-objective optimization problems, and different variants of CMA-ES have also been proposed for multi-objective optimization problems (M...
A variety of general strategies have been applied to enhance the performance of multi-objective optimization algorithms for many-objective optimization problems (those with more than three objectives). One of these strategies is to split the solutions to cover different regions of the search space (clusters) and apply an optimizer to each region wi...
This work deals with the so-called weighted independent domination problem, which is an NP-hard combinatorial optimization problem in graphs. In contrast to previous theoretical work from the literature, this paper considers the problem from an algorithmic perspective. The first contribution consists in the development of an integer linear programm...
The goal of early classification of time series is to predict the class value of a sequence early in time, when its full length is not yet available. This problem arises naturally in many contexts where the data is collected over time and the label predictions have to be made as soon as possible. In this work, a method based on probabilistic classi...
The Boltzmann distribution plays a key role in the field of optimization as it directly connects this field with that of probability. Basically, given a function to optimize, the Boltzmann distribution associated to this function assigns higher probability to the candidate solutions with better quality. Therefore, an efficient sampling of the Boltz...
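The link described in the abstract above can be sketched numerically. Assuming a maximization problem and the common form p(x) proportional to exp(f(x) / T), the snippet below computes the Boltzmann distribution exactly over a tiny search space. The OneMax objective (the number of ones in a bit string), the temperature value, and the function name `boltzmann` are illustrative assumptions, not taken from the paper.

```python
import math
from itertools import product
from typing import Callable, Dict, Hashable, Sequence


def boltzmann(
    solutions: Sequence[Hashable],
    f: Callable[[Hashable], float],
    temperature: float = 1.0,
) -> Dict[Hashable, float]:
    """Exact Boltzmann probabilities p(x) = exp(f(x)/T) / Z over an enumerable space."""
    scores = [f(x) / temperature for x in solutions]
    shift = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - shift) for s in scores]
    z = sum(weights)
    return {x: w / z for x, w in zip(solutions, weights)}


# Toy search space: all 3-bit strings, scored by the number of ones (OneMax).
solutions = list(product([0, 1], repeat=3))
probs = boltzmann(solutions, f=sum, temperature=0.5)
best = max(probs, key=probs.get)
```

Lowering the temperature concentrates the probability mass on the best solutions, while a high temperature approaches the uniform distribution. Exact enumeration like this is only possible for tiny spaces, which is why efficient sampling matters in practice.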
The performance of local search algorithms is influenced by the properties that the neighborhood imposes on the search space. Among these properties, the number of local optima has been traditionally considered as a complexity measure of the instance, and different methods for its estimation have been developed. The accuracy of these estimators dep...
Although a great methodological effort has been invested in proposing competitive solutions to the class-imbalance problem, little effort has been made in pursuing a theoretical understanding of this matter. In order to shed some light on this topic, we perform, through a novel framework, an exhaustive analysis of the adequateness of the most commo...
The definition of a distance measure between time series is crucial for many time series data mining tasks, such as clustering and classification. For this reason, a vast portfolio of time series distance measures has been published in the past few years. In this paper, the TSdist package is presented, a complete tool which provides a unified frame...
Probabilistic graphical model (PGM) types; Data format and pre-processing; Bayesian networks (BNs): structure and parameters; Bayesian network classifiers; Applications of Bayesian networks in environmental sciences; Sentimental analysis in social sciences using BNs; Multi-dimensional Bayesian network classifiers; Flexible classifiers; Inference diagrams...
In this paper we present the R package PerMallows, which is a complete toolbox to work with permutations, distances and some of the most popular probability models for permutations: Mallows and the Generalized Mallows models. The Mallows model is an exponential location model, considered as analogous to the Gaussian distribution. It is based on the...
Weakly supervised classification tries to learn from data sets which are not certainly labeled. Many problems, with different natures of partial labeling, fit this description. In this paper, the novel problem of learning from positive-unlabeled proportions is presented. The provided examples are unlabeled, and the only class information available...
In this paper we introduce vine copulas to model probabilistic dependencies in supervised classification problems. Vine copulas allow the representation of the dependence structure of multidimensional distributions as a factorization of bivariate pair-copulas. The flexibility of this model lies in the fact that we can mix different types of pair-co...
Due to the progressive growth of the amount of data available in a wide variety of scientific fields, it has become more difficult to manipulate and analyze such information. In spite of its dependency on the initial settings and the large number of distance computations that it can require to converge, the K-means algorithm remains one of the m...
Machine learning techniques have been previously used to assist clinicians to select embryos for human-assisted reproduction. This work aims to show how an appropriate modeling of the problem can contribute to improve machine learning techniques for embryo selection. In this study, a dataset of 330 consecutive cycles (and associated embryos) carrie...
Due to the progressive growth of the amount of data available in a wide variety of scientific fields, it has become more difficult to manipulate and analyze such information. Even though datasets have grown in size, the K-means algorithm remains one of the most popular clustering methods, in spite of its dependency on the initial settings and...
During the last decades several learning algorithms have been proposed to learn probability distributions based on decomposable models. Some of these algorithms can be used to search for a maximum likelihood decomposable model with a given maximum clique size, k. Unfortunately, the problem of learning a maximum likelihood decomposable model given a...
NM-landscapes have been recently introduced as a class of tunable rugged models. They are a subset of the general interaction models where all the interactions are of order less than or equal to $M$. The Boltzmann distribution has been extensively applied in single-objective evolutionary algorithms to implement selection and study the theoretical propertie...
In recent years, the performance of semisupervised learning (SSL) has been theoretically investigated. However, most of this theoretical development has focused on binary classification problems. In this paper, we take it a step further by extending the work of Castelli and Cover to the multiclass paradigm. In particular, we consider the key proble...
In the last decade, many works in combinatorial optimisation have shown that, due to the advances in multi-objective optimisation, the algorithms in this field could be used for solving single-objective problems. In this sense, a number of papers have proposed multi-objectivising single-objective problems in order to apply multi-objectivisation sch...
Standard supervised classification learns a classifier from a set of labeled examples. Alternatively, in the field of weakly supervised classification different frameworks have been presented where the training data cannot be certainly labeled. In this paper, the novel problem of learning from positive-unlabeled proportions is presented. The provid...
In this paper we propose an extension of the NM-landscape to model multi-objective problems (MOPs). We illustrate the link between the introduced model and previous landscapes used to study MOPs. Empirical results are presented for a variety of configurations of the multi-objective NM-landscapes.
Recently, distance-based exponential probability models, such as Mallows and Generalized Mallows, have demonstrated their validity in the context of estimation of distribution algorithms (EDAs) for solving permutation problems. However, despite their successful performance, these models are unimodal, and therefore, they are not flexible enough to a...
Early diagnosis of psychiatric conditions can be enhanced by taking into account eye movement behavior. However, the implementation of prediction algorithms that are able to assist physicians in the diagnosis is a difficult task. In this paper we propose, for the first time, an automatic approach for classification of multiple psychiatric conditi...
Performance assessment of a learning method related to its prediction ability on independent data is extremely important in supervised classification. This process provides the information to evaluate the quality of a classification model and to choose the most appropriate technique to solve the specific supervised classification problem at hand. T...
Estimation of distribution algorithms (EDAs) are a successful example of how to use machine learning techniques for designing robust and efficient heuristic search algorithms. Understanding the relationship between EDAs and the space of optimization problems is a fundamental issue for the successful application of this type of algorithms. A step fo...
The Mallows (MM) and the Generalized Mallows (GMM) probability models have demonstrated their validity in the framework of Estimation of distribution algorithms (EDAs) for solving permutation-based combinatorial optimisation problems. Recent works, however, have suggested that the performance of these algorithms strongly relies on the distance used...
Learning from crowds is a classification problem where the provided training instances are labeled by multiple (usually conflicting) annotators. In different scenarios of this problem, straightforward strategies show an astonishing performance. In this paper, we characterize the crowd scenarios where these basic strategies show a good behavior. As...
The Linear Ordering Problem is a popular combinatorial optimisation problem which has been extensively addressed in the literature. However, in spite of its popularity, little is known about the characteristics of this problem. This paper studies a procedure to extract static information from an instance of the problem, and proposes a method to inc...
In this paper, we propose a tunable generator of instances of permutation-based combinatorial optimization problems. Our approach is based on a probabilistic model for permutations, called the generalized Mallows model. The generator depends on a set of parameters that permits the control of the properties of the output instances. Specifically, in...
In the past few years, clustering has become a popular task associated with time series. The choice of a suitable distance measure is crucial to the clustering process and, given the vast number of distance measures for time series available in the literature and their diverse characteristics, this selection is not straightforward. With the objecti...
Selection plays an important role in estimation of distribution algorithms. It determines the solutions that will be modeled to represent the promising areas of the search space. There is a strong relationship between the strength of selection and the type and number of dependencies that are captured by the models. In this paper we propose to use d...
Wind is one of the parameters best predicted by numerical weather models, as it can be directly calculated from the physical equations of pressure that govern its movement. However, local winds are considerably affected by topography, which global numerical weather models, due to their limited resolution, are not able to reproduce. To improve the s...
The effect of different factors (spawning biomass, environmental conditions) on recruitment is a subject of great importance in the management of fisheries, recovery plans and scenarios exploration. In this study, recently proposed supervised classification techniques, tested by the machine-learning community, are applied to forecast the recruitmen...
Cloud infrastructures are designed to simultaneously service many, diverse applications that consist of collections of Virtual Machines (VMs). The placement policy used to map applications onto physical servers has important effects in terms of application performance and resource efficiency. We propose enhancing placement policies with network-awa...
This work is focused on learning maximum weighted graphs subject to three structural constraints: (1) the graph is decomposable, (2) it has a maximum clique size of k + 1, and (3) it is coarser than a given maximum k-order decomposable graph. After proving that the problem is NP-hard we give a formulation of the problem based on integer linear prog...
Due to the increase in vehicle transit and congestion in road networks, providing information about the state of the traffic to commuters has become a critical issue for Advanced Traveller Information Systems. These systems should assist users in making pre-trip and en-route decisions and, for this purpose, delivering travel time information is ver...
The minimum common string partition problem is an NP-hard combinatorial optimization problem with applications in computational biology. In this work we propose an iterative probabilistic tree search algorithm for tackling this problem. By means of an extensive experimental evaluation we show the superiority of our approach in comparison to a stand...
Non-contiguous partitioning strategies are often used to select and assign a set of nodes of a parallel computer to a particular job. The main advantage of these strategies, compared to contiguous ones, is the reduction of system fragmentation. However, without contiguity, locality in communications cannot be easily exploited, resulting in longer j...
Cloud infrastructures are designed to simultaneously service many, diverse applications that consist of collections of Virtual Machines (VMs). The policy used to map applications onto physical servers (placement policy) has important effects in terms of application performance and resource efficiency. This paper proposes enhancing placement policie...
This paper studies the influence that contiguous job placement has on the performance of schedulers for large-scale computing systems. In contrast with non-contiguous strategies, contiguous partitioning enables the exploitation of communication locality in applications, and also reduces inter-application interference. However, contiguous partitioni...
Sampling methods are a fundamental component of estimation of distribution algorithms (EDAs). In this paper we propose new methods for generating solutions in EDAs based on Markov networks. These methods are based on the combination of message passing algorithms with decimation techniques for computing the maximum a posteriori solution of a probabi...
In many optimization domains the solution of the problem can be made more efficient by the construction of a surrogate fitness model. Estimation of distribution algorithms (EDAs) are a class of evolutionary algorithms particularly suitable for the conception of model-based surrogate techniques. Since EDAs generate probabilistic models, it is natura...
Estimation of distribution algorithms (EDAs) are optimization methods that construct at each step a probabilistic graphical model (PGM) of the best evaluated solutions. The model serves as a concise representation of the regularities shared by the good solutions and can serve to unveil structural characteristics of the problem domain. In this paper...
While estimation of distribution algorithms (EDAs) based on Markov networks usually incorporate efficient methods to learn undirected probabilistic graphical models (PGMs) from data, the methods they use for sampling the PGMs are computationally costly. In addition, methods for generating solutions in Markov network based EDAs frequently discard in...
We investigate the behavior of message passing algorithms (MPAs) on approximate probabilistic graphical models (PGMs) learned in the context of optimization. We use the framework of estimation of distribution algorithms (EDAs), a class of optimization algorithms that learn in each iteration a PGM and sample new solutions from it. The impact that in...
This work presents a multidimensional classifier described in terms of interaction factors, called the multidimensional k-interaction classifier. The classifier is based on a probabilistic model composed of the product of all the interaction factors of order lower than or equal to k, and it takes advantage of all the information contained in them. The propose...
Learning from crowds is a recently fashioned supervised classification framework where the true/real labels of the training instances are not available. However, each instance is provided with a set of noisy class labels, each indicating the class-membership of the instance according to the subjective opinion of an annotator. The additional challen...
Designing customized optimization problem instances is a key issue in optimization. They can be used to tune and evaluate new algorithms, to compare several optimization algorithms, or to evaluate techniques that estimate the number of local optima of an instance. Given this relevance, several methods have been proposed to design customized optimiz...
Methods for generating a new population are a fundamental component of estimation of distribution algorithms (EDAs). They serve to transfer the information contained in the probabilistic model to the new generated population. In EDAs based on Markov networks, methods for generating new populations usually discard information contained in the model...
In this paper we empirically investigate the structural characteristics that can help to predict the complexity of NK-landscape instances for estimation of distribution algorithms (EDAs). We evolve instances that maximize the EDA complexity in terms of its success rate. Similarly, instances that minimize the algorithm complexity are evolved. We the...
Factor graphs can serve to represent Markov networks and Bayesian networks models. They can also be employed to implement efficient inference procedures such as belief propagation. In this paper we introduce a flexible implementation of belief propagation on factor graphs in the context of estimation of distribution algorithms (EDAs). By using a tr...
Estimation of Distribution Algorithms are a set of algorithms that belong to the field of Evolutionary Computation. Characterized by the use of probabilistic models to learn the (in)dependencies between the variables of the optimization problem, these algorithms have been applied to a wide set of academic and real-world optimization problems, achie...
Which problems a search algorithm can effectively solve is a fundamental issue that plays a key role in understanding and developing algorithms. In order to study the ability limit of estimation of distribution algorithms (EDAs), this paper experimentally tests three different EDA implementations on a sequence of additively decomposable functions (...
An increasing number of data mining domains consider data that can be represented as permutations. Therefore, it is important to devise new methods to learn predictive models over datasets of permutations. However, maintaining probability distributions over the space of permutations is a hard task since there are n! permutations of n elements. The...
This work is related to the search for complexity measures for instances of combinatorial optimization problems. In particular, we have carried out a study of the complexity of random instances of the Traveling Salesman Problem under the 2-exchange neighbor system. We have proposed two descriptors of complexity: the proportion of the size of the b...
Estimation of distribution algorithms (EDAs) that use marginal product model factorizations have been widely applied to a broad range of mainly binary optimization problems. In this paper, we introduce the affinity propagation EDA (AffEDA) which learns a marginal product model by clustering a matrix of mutual information learned from the data using...
This paper presents an optimization algorithm for the automatic selection of a minimal subset of tagging single nucleotide polymorphisms (SNPs). The determination of the set of minimal tagging SNPs is approached as an optimization problem in which each tagged SNP can be covered by a single tagging SNP or by a pair of tagging SNPs. The problem is so...