ABSTRACT: This paper describes a general hybrid metaheuristic for combinatorial optimization labelled Construct, Merge, Solve & Adapt. The proposed algorithm is a specific instantiation of a framework known from the literature as Generate-And-Solve, which is based on the following general idea. First, generate a reduced sub-instance of the original problem instance such that a solution to the sub-instance is also a solution to the original problem instance. Second, apply an exact solver to the reduced sub-instance in order to obtain a (possibly) high-quality solution to the original problem instance. Third, use the results of the exact solver as feedback for the next algorithm iteration. The minimum common string partition problem and the minimum covering arborescence problem are chosen as test cases to demonstrate the application of the proposed algorithm. The results show that the algorithm is competitive with the exact solver for small to medium-sized problem instances, while it significantly outperforms the exact solver for larger problem instances.
Full-text · Article · Nov 2015 · Computers & Operations Research
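The three-step loop described in the abstract above (generate a reduced sub-instance, solve it exactly, feed the result back) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy problem (minimum-cardinality set cover), the brute-force routine standing in for an exact solver, the age-based feedback rule, and all function and parameter names.

```python
import itertools
import random


def random_cover(universe, subsets, rng):
    """Construct one feasible cover by adding subsets in random order."""
    order = list(range(len(subsets)))
    rng.shuffle(order)
    chosen, covered = [], set()
    for i in order:
        if not subsets[i] <= covered:  # skip subsets that add nothing new
            chosen.append(i)
            covered |= subsets[i]
        if covered >= universe:
            break
    return chosen


def exact_cover(universe, subsets, allowed):
    """Brute-force stand-in for an exact solver, restricted to `allowed` indices."""
    allowed = sorted(allowed)
    for k in range(1, len(allowed) + 1):
        for combo in itertools.combinations(allowed, k):
            if set().union(*(subsets[i] for i in combo)) >= universe:
                return list(combo)
    return None


def generate_and_solve(universe, subsets, iterations=20,
                       n_constructions=3, max_age=2, seed=0):
    rng = random.Random(seed)
    age = {}      # sub-instance: subset index -> iterations since last used
    best = None
    for _ in range(iterations):
        # Step 1: generate/extend the reduced sub-instance from random solutions.
        for _ in range(n_constructions):
            for i in random_cover(universe, subsets, rng):
                age.setdefault(i, 0)
        # Step 2: solve the sub-instance exactly; any cover of the sub-instance
        # is also a cover of the original instance.
        solution = exact_cover(universe, subsets, age.keys())
        if best is None or len(solution) < len(best):
            best = solution
        # Step 3: feedback. Components used by the solver stay, unused ones
        # age and are eventually dropped (an assumed rule for this sketch).
        for i in list(age):
            age[i] = 0 if i in solution else age[i] + 1
            if age[i] > max_age:
                del age[i]
    return best
```

Calling `generate_and_solve(set(range(6)), [{0, 1, 2}, {3, 4, 5}, {0, 3}, {1, 4}, {2, 5}])` returns a list of subset indices that covers the universe. The sketch assumes the subsets jointly cover the universe, so every sub-instance contains at least one feasible cover.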
ABSTRACT: In recent years, different researchers in the machine learning community have presented new classification frameworks which go beyond standard supervised classification in different aspects. Specifically, a wide spectrum of novel frameworks that use partially labeled data in the construction of classifiers has been studied. With the objective of drawing up a description of the state-of-the-art, three identifying characteristics of these novel frameworks have been considered: (1) the relationship between instances and labels of a problem, which may be beyond the one-instance one-label standard, (2) the possible provision of partial class information for the training examples, and (3) the possible provision of partial class information also for the examples in the prediction stage. These three ideas have been formulated as axes of a comprehensive taxonomy that organizes the state-of-the-art. The proposed organization allows us both to understand similarities and differences among the classification problems already presented in the literature and to discover unexplored frameworks that might be seen as further challenges and research opportunities. A representative set of state-of-the-art problems has been used to illustrate the novel taxonomy and support the discussion.
No preview · Article · Oct 2015 · Pattern Recognition Letters
ABSTRACT: Evolutionary algorithm-based unmanned aerial vehicle (UAV) path planners have been extensively studied for their effectiveness and flexibility. However, they still suffer from the drawback that high-quality waypoints in previous candidate paths can hardly be exploited for further evolution, since they regard all the waypoints of a path as an integrated individual. Due to this drawback, previous planners usually fail when encountering many obstacles. In this paper, a new idea of separately evaluating and evolving waypoints is presented to solve this problem. Concretely, the original objective and constraint functions of UAV path planning are decomposed into a set of new evaluation functions, with which waypoints on a path can be evaluated separately. The new evaluation functions allow waypoints on a path to be evolved separately and, thus, high-quality waypoints can be better exploited. On this basis, the waypoints are encoded in a rotated coordinate system with an external restriction and evolved with JADE, a state-of-the-art variant of the differential evolution algorithm. To test the capabilities of the new planner on planning obstacle-free paths, five scenarios with increasing numbers of obstacles are constructed. Three existing planners and four variants of the proposed planner are compared to assess the effectiveness and efficiency of the proposed planner. The results demonstrate the superiority of the proposed planner and the idea of separate evolution.
Full-text · Article · Oct 2015 · IEEE Transactions on Robotics
ABSTRACT: An artificial bioindicator system is developed in order to solve a network intrusion detection problem. The system, inspired by an ecological approach to biological immune systems, evolves a population of agents that learn to survive in their environment. An adaptation process allows the transformation of the agent population into a bioindicator that is capable of reacting to system anomalies. Two characteristics stand out in our proposal. On the one hand, it is able to discover new, previously unseen attacks, and on the other hand, contrary to most of the existing systems for network intrusion detection, it does not need any previous training. We experimentally compare our proposal with three state-of-the-art algorithms and show that it outperforms the competing approaches on widely used benchmark data.
ABSTRACT: Genome-wide association studies (GWAS) have discovered numerous loci involved in genetic traits. Virtually all studies have reported associations between individual single nucleotide polymorphisms (SNPs) and traits. However, it is likely that complex traits are influenced by the interaction of multiple SNPs. One approach to detecting interactions of SNPs is the brute force approach, which performs a pairwise association test between a trait and each pair of SNPs. The brute force approach is often computationally infeasible because of the large number of SNPs collected in current GWAS. We propose a two-stage model, Threshold-based Efficient Pairwise Association Approach (TEPAA), to reduce the number of tests needed while maintaining almost identical power to the brute force approach. In the first stage, our method performs the single marker test on all SNPs and selects a subset of SNPs that achieve a certain significance threshold. In the second stage, we perform a pairwise association test between traits and pairs of the SNPs selected from the first stage. The key insight of our approach is that we derive the joint distribution between the association statistics of a single SNP and the association statistics of pairs of SNPs. This joint distribution allows us to provide guarantees that the statistical power of our approach will closely approximate that of the brute force approach. We applied our approach to the Northern Finland Birth Cohort data and achieved a 63-fold speedup while maintaining 99% of the power of the brute force approach.
No preview · Article · Apr 2015 · Journal of computational biology: a journal of computational molecular cell biology
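To make the saving from the two-stage design above concrete, the following sketch counts association tests under each scheme. It is an illustration only: it assumes that stage two tests every pair formed among the SNPs that passed stage one, which is one reading of the abstract, and the function names and example sizes are hypothetical. It does not reproduce the paper's statistics or its reported speedup.

```python
def brute_force_pair_tests(num_snps):
    """Number of pairwise tests when every pair of SNPs is tested."""
    return num_snps * (num_snps - 1) // 2


def two_stage_tests(num_snps, num_selected):
    """Single-marker tests on all SNPs plus pairwise tests among the selected ones.

    Assumption: stage two pairs selected SNPs only with each other.
    """
    if not 0 <= num_selected <= num_snps:
        raise ValueError("num_selected must lie between 0 and num_snps")
    return num_snps + num_selected * (num_selected - 1) // 2
```

For example, with 1000 SNPs of which 100 pass the first-stage threshold, the brute force approach needs 499500 pairwise tests while the two-stage scheme needs 1000 + 4950 = 5950 tests in total.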
ABSTRACT: Message passing algorithms (MPAs) have been traditionally used as an inference method in probabilistic graphical models. Some MPA variants have recently been introduced in the field of estimation of distribution algorithms (EDAs) as a way to improve the efficiency of these algorithms. Multiple developments on MPAs point to an increasing potential of these methods for their application as part of hybrid EDAs. In this paper we review recent work on EDAs that apply MPAs and propose ways to further extend the useful synergies between MPAs and EDAs. Furthermore, we analyze some of the implications that MPA developments can have in their future application to EDAs and other evolutionary algorithms.
No preview · Article · Dec 2014 · Natural Computing
ABSTRACT: Cloud computing environments allow customers to dynamically scale their applications. The key problem is how to lease the right amount of resources, on a pay-as-you-go basis. Application re-dimensioning can be implemented effortlessly, adapting the resources assigned to the application to the incoming user demand. However, the identification of the right amount of resources to lease in order to meet the required Service Level Agreement, while keeping the overall cost low, is not an easy task. Many techniques have been proposed for automating application scaling. We propose a classification of these techniques into five main categories: static threshold-based rules, control theory, reinforcement learning, queuing theory and time series analysis. Then we use this classification to carry out a literature review of proposals for auto-scaling in the cloud.
Full-text · Article · Dec 2014 · Journal of Grid Computing
ABSTRACT: Path planning is an important technique for unmanned aerial vehicles (UAVs). Evolutionary algorithms (EAs) have been widely used for UAV path planning. In these EA-based path planners, the Cartesian coordinate system and the polar coordinate system are commonly used to encode the path. However, each has a drawback: Cartesian coordinate systems result in an enormous search space, whilst polar coordinate systems are unfit for local modifications resulting, e.g., from mutation and/or crossover. In order to overcome these two drawbacks, we solve the UAV path planning problem in a new coordinate system. As the new coordinate system is only a rotation of the Cartesian coordinate system, it is inherently easy to modify locally. Moreover, this new coordinate system reduces the search space by explicitly dividing the mission space into several subspaces. Within this new coordinate system, a path planner based on estimation of distribution algorithms (EDAs) is proposed in this paper. Experiments have been designed to test different aspects of the new path planner. The results show the effectiveness of this planner.
ABSTRACT: Recently, probability models on rankings have been proposed in the field of estimation of distribution algorithms in order to solve permutation-based combinatorial optimisation problems. Particularly, distance-based ranking models, such as Mallows and Generalized Mallows under the Kendall's-τ distance, have demonstrated their validity when solving this type of problem. Nevertheless, there are still many trends that deserve further study. In this paper, we extend the use of distance-based ranking models in the framework of EDAs by introducing new distance metrics such as Cayley and Ulam. In order to analyse the performance of the Mallows and Generalized Mallows EDAs under the Kendall, Cayley and Ulam distances, we run them on a benchmark of 120 instances from four well-known permutation problems. The experiments showed that no single metric performs best on all the problems. However, the statistical test indicated that the Mallows-Ulam EDA is the most stable algorithm among the studied proposals.
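For readers unfamiliar with the three distances named in the abstract above, the sketch below computes them using their standard combinatorial definitions: Kendall counts pairs of items ordered differently, Cayley is n minus the number of cycles of the relative permutation, and Ulam is n minus the length of the longest common subsequence. The function names and the convention that a permutation is a list of the items 0..n-1 are choices made for this example, not taken from the paper.

```python
from bisect import bisect_left


def _relabel(sigma, pi):
    """Express sigma relative to pi, so d(sigma, pi) becomes d(result, identity)."""
    position_in_pi = {item: idx for idx, item in enumerate(pi)}
    return [position_in_pi[item] for item in sigma]


def kendall_tau(sigma, pi):
    """Number of item pairs that the two permutations order differently."""
    seq = _relabel(sigma, pi)
    n = len(seq)
    return sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])


def cayley(sigma, pi):
    """Minimum number of (arbitrary) swaps: n minus the number of cycles."""
    seq = _relabel(sigma, pi)
    seen = [False] * len(seq)
    cycles = 0
    for start in range(len(seq)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = seq[j]
    return len(seq) - cycles


def ulam(sigma, pi):
    """n minus the longest increasing subsequence of the relabelled sequence."""
    seq = _relabel(sigma, pi)
    tails = []  # tails[k] = smallest tail of an increasing subsequence of length k+1
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(seq) - len(tails)
```

With `sigma = [2, 0, 1, 3]` and the identity `pi = [0, 1, 2, 3]`, the Kendall distance is 2, the Cayley distance is 2 and the Ulam distance is 1, which shows that the three metrics can rank the same pair of permutations differently.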
ABSTRACT: The minimum common string partition problem is an NP-hard combinatorial optimization problem with applications in computational biology. In this work we propose the first integer linear programming model for solving this problem. Moreover, on the basis of the integer linear programming model we develop a deterministic 2-phase heuristic which is applicable to larger problem instances. The results show that provably optimal solutions can be obtained for problem instances of small and medium size from the literature by solving the proposed integer linear programming model with CPLEX. Furthermore, new best-known solutions are obtained for all considered problem instances from the literature. Concerning the heuristic, we were able to show that it outperforms heuristic competitors from the related literature.
No preview · Article · Apr 2014 · European Journal of Operational Research
ABSTRACT: A fundamental question in the field of approximation algorithms, for a given problem instance, is the selection of the best (or a suitable) algorithm with regard to some performance criteria. A practical strategy for facing this problem is the application of machine learning techniques. However, limited support has been given in the literature to the case of more than one performance criterion, which is the natural scenario for approximation algorithms. We propose multidimensional Bayesian network (mBN) classifiers as a relatively simple, yet well-principled, approach for helping to solve this problem. Precisely, we relax the algorithm selection decision problem into the elucidation of the nondominated subset of algorithms, which contains the best. This formulation can be used in different ways to elucidate the main problem, each of which can be tackled with an mBN classifier. Namely, we deal with two of them: the prediction of the whole nondominated set and whether an algorithm is nondominated or not. We illustrate the feasibility of the approach for real-life scenarios with a case study in the context of Search Based Software Test Data Generation (SBSTDG). A set of five SBSTDG generators is considered and the aim is to assist a hypothetical test engineer in elucidating good generators to fulfil the branch testing of a given programme.
No preview · Article · Feb 2014 · Information Sciences
ABSTRACT: Most researchers have employed common functional models when dealing with scheduling problems with controllable processing times. However, in many complicated manufacturing systems with a high diversity of jobs, these functional resource models fail to reflect their specific characteristics. To fulfill these requirements, we apply a more general model, the discrete model. Traditional functional models can be viewed as special cases of this model. In this paper, the discrete model is implemented on a problem of minimizing the weighted resource allocation subject to a common deadline on a single machine. By reducing the problem to a partition problem, we demonstrate that it is NP-complete, which raises the difficult issue of guaranteeing both solution quality and time cost. In order to tackle the problem, we develop an estimation of distribution algorithm based on an approximation of the Boltzmann distribution. The approximation strategy represents a tradeoff between complexity and solution accuracy. The results of the experiments conducted on benchmarks show that, compared with other alternative approaches, the proposed algorithm has competitive behavior, obtaining 74 best solutions out of 90 instances.
No preview · Article · Jan 2014 · IEEE Transactions on Evolutionary Computation
ABSTRACT: This paper deals with a classification problem known as learning from label proportions. The provided dataset is composed of unlabeled instances and is divided into disjoint groups. General class information is given within the groups: the proportion of instances of the group that belong to each class. We have developed a method based on the Structural EM strategy that learns Bayesian network classifiers to deal with this problem. Four versions of our proposal are evaluated on synthetic data, and compared with state-of-the-art approaches on real datasets from public repositories. The results obtained show a competitive behavior for the proposed algorithm.
No preview · Article · Dec 2013 · Pattern Recognition
ABSTRACT: The Linear Ordering Problem is a combinatorial optimization problem which has been frequently addressed in the literature due to its numerous applications in diverse fields. In spite of its popularity, little is known about its complexity. In this paper we analyze the linear ordering problem trying to identify features or characteristics of the instances that can provide useful insights into the difficulty of solving them. Particularly, we introduce two different metrics, insert ratio and ubiquity ratio, that measure the difficulty of solving the LOP with local search type algorithms with the insert neighborhood system. Conducted experiments demonstrate that the proposed metrics clearly correlate with the complexity of solving the LOP with a multistart local search algorithm.
ABSTRACT: A multi-species approach to fisheries management requires taking into account the interactions between species in order to improve recruitment forecasting. Recent advances in Bayesian networks allow the learning of models in which several interrelated variables are forecast simultaneously. These are known as multi-dimensional Bayesian network classifiers (MDBNs). Pre-processing steps are critical for the subsequent learning of the model in these kinds of domains. Therefore, in this study, a set of state-of-the-art uni-dimensional pre-processing methods, within the categories of missing data imputation, feature discretization and subset selection, are adapted to be used with MDBNs. A framework that includes the proposed multi-dimensional supervised pre-processing methods, coupled with an MDBN classifier, is tested for fish recruitment forecasting. The rate of correctly forecasting three fish species (anchovy, sardine and hake) simultaneously nearly doubles (from 17.3% to 29.5%) using the multi-dimensional approach in comparison to mono-species models. The probability assessments also improve markedly, with the average error (Brier score) falling from 0.35 to 0.27. These improvements are larger than those obtained by forecasting species in pairs.
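The Brier score mentioned in the abstract above measures the error of probabilistic forecasts; lower is better. A minimal sketch of its common binary form (mean squared difference between the predicted probability and the 0/1 outcome) follows. Which variant the paper uses, for example a multi-class version summed over classes, is not stated in the abstract, so the form and the function name here are assumptions.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)
```

A perfect forecaster scores 0, and a forecaster that always predicts 0.5 scores 0.25 under this binary form.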
ABSTRACT: Symmetry has hitherto been studied piecemeal in a variety of evolutionary computation domains, with little consistency between the definitions. Here we provide formal definitions of symmetry that are consistent across the field of evolutionary computation. We propose a number of evolutionary and estimation of distribution algorithms suitable for variable symmetries in Cartesian power domains, and compare their utility; integrating the symmetry knowledge with the probabilistic model of an EDA yields the best outcomes. We test the robustness of the algorithm to inexact symmetry, finding adequate performance up to about 1% noise. Finally, we present evidence that such symmetries, if not known a priori, may be learnt during evolution.
ABSTRACT: Estimation of distribution algorithms are known as powerful evolutionary algorithms that have been widely used for diverse types of problems. However, they have not been extensively developed for permutation-based problems. Recently, some progress has been made in this area by introducing probability models on rankings to optimize permutation domain problems. In particular, the Mallows model and the Generalized Mallows model demonstrated their effectiveness when used with estimation of distribution algorithms. Motivated by these advances, in this paper we introduce a Thurstone order statistics model, called Plackett-Luce, to the framework of estimation of distribution algorithms. In order to demonstrate the potential of the proposed algorithm, we consider two different permutation problems: the linear ordering problem and the flowshop scheduling problem. In addition, the results are compared with those obtained by the Mallows and the Generalized Mallows proposals. Conducted experiments demonstrate that the Plackett-Luce model is the best performing model for solving the linear ordering problem. However, according to the experimental results, the Generalized Mallows model turns out to be very robust, obtaining very competitive results for both problems, especially for the permutation flowshop scheduling problem.
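As background for the Plackett-Luce model named in the abstract above, the sketch below shows its standard definition: a ranking is built position by position, and each remaining item is chosen with probability proportional to its positive weight. The function names and the use of item indices 0..n-1 are assumptions for this example; how the paper learns the weights inside an estimation of distribution algorithm is not covered here.

```python
import random


def pl_probability(ranking, weights):
    """Probability of a complete ranking under Plackett-Luce weights."""
    remaining = sum(weights[item] for item in ranking)
    prob = 1.0
    for item in ranking:
        prob *= weights[item] / remaining  # chance of picking `item` next
        remaining -= weights[item]
    return prob


def pl_sample(weights, rng):
    """Draw one ranking by repeatedly picking among the items not yet placed."""
    items = list(range(len(weights)))
    ranking = []
    while items:
        pick = rng.choices(items, weights=[weights[i] for i in items])[0]
        ranking.append(pick)
        items.remove(pick)
    return ranking
```

For weights `[3, 2, 1]`, the ranking `[0, 1, 2]` has probability (3/6) · (2/3) · (1/1) = 1/3, and `pl_sample([3, 2, 1], random.Random(0))` returns one permutation of the three items.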
ABSTRACT: The aim of this paper is two-fold. First, we introduce a novel general estimation of distribution algorithm to deal with permutation-based optimization problems. The algorithm is based on the use of a probabilistic model for permutations called the generalized Mallows model. In order to demonstrate the potential of the proposed algorithm, our second aim is to solve the permutation flowshop scheduling problem. A hybrid approach consisting of the new estimation of distribution algorithm and a variable neighborhood search is proposed. Conducted experiments demonstrate that the proposed algorithm is able to outperform the state-of-the-art approaches. Moreover, of the 220 benchmark instances tested, the proposed hybrid approach obtains new best known results in 152 cases. An in-depth study of the results suggests that the successful performance of the introduced approach is due to the ability of the generalized Mallows estimation of distribution algorithm to discover promising regions in the search space.
No preview · Article · Apr 2013 · IEEE Transactions on Evolutionary Computation