Jose A. Lozano

Universidad del País Vasco / Euskal Herriko Unibertsitatea, Leioa, Basque Country, Spain

Publications (109) · 122.63 Total Impact

  • ABSTRACT: An artificial bioindicator system is developed in order to solve a network intrusion detection problem. The system, inspired by an ecological approach to biological immune systems, evolves a population of agents that learn to survive in their environment. An adaptation process allows the transformation of the agent population into a bioindicator that is capable of reacting to system anomalies. Two characteristics stand out in our proposal. On the one hand, it is able to discover new, previously unseen attacks, and on the other hand, contrary to most of the existing systems for network intrusion detection, it does not need any previous training. We experimentally compare our proposal with three state-of-the-art algorithms and show that it outperforms the competing approaches on widely used benchmark data.
    Artificial Life 05/2015; 21(2):93-118. DOI:10.1162/ARTL_a_00162 · 1.93 Impact Factor
  • ABSTRACT: Genome-wide association studies (GWAS) have discovered numerous loci involved in genetic traits. Virtually all studies have reported associations between individual single nucleotide polymorphisms (SNPs) and traits. However, it is likely that complex traits are influenced by interaction of multiple SNPs. One approach to detect interactions of SNPs is the brute force approach which performs a pairwise association test between a trait and each pair of SNPs. The brute force approach is often computationally infeasible because of the large number of SNPs collected in current GWAS studies. We propose a two-stage model, Threshold-based Efficient Pairwise Association Approach (TEPAA), to reduce the number of tests needed while maintaining almost identical power to the brute force approach. In the first stage, our method performs the single marker test on all SNPs and selects a subset of SNPs that achieve a certain significance threshold. In the second stage, we perform a pairwise association test between traits and pairs of the SNPs selected from the first stage. The key insight of our approach is that we derive the joint distribution between the association statistics of a single SNP and the association statistics of pairs of SNPs. This joint distribution allows us to provide guarantees that the statistical power of our approach will closely approximate the brute force approach. We applied our approach to the Northern Finland Birth Cohort data and achieved 63 times speedup while maintaining 99% of the power of the brute force approach.
    Journal of computational biology: a journal of computational molecular cell biology 04/2015; 22(6). DOI:10.1089/cmb.2014.0163 · 1.67 Impact Factor
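As an illustration of the two-stage idea described above (screen SNPs with a single-marker test, then run pairwise tests only on the survivors), here is a minimal sketch in Python. The correlation-based statistic, the interaction encoding and the 90% selection quantile are illustrative assumptions, not the statistics or thresholds used by TEPAA.

```python
import numpy as np
from scipy import stats

def single_marker_stats(genotypes, trait):
    """Per-SNP marginal association score (illustrative: absolute Pearson correlation)."""
    scores = []
    for g in genotypes.T:                      # genotypes: (individuals, SNPs), coded 0/1/2
        r, _ = stats.pearsonr(g, trait)
        scores.append(abs(r))
    return np.array(scores)

def two_stage_pairwise(genotypes, trait, stage1_quantile=0.9):
    """Stage 1: keep SNPs above a marginal-association threshold.
    Stage 2: test only pairs of surviving SNPs (here via a simple interaction term)."""
    marginal = single_marker_stats(genotypes, trait)
    keep = np.where(marginal >= np.quantile(marginal, stage1_quantile))[0]
    results = []
    for a in range(len(keep)):
        for b in range(a + 1, len(keep)):
            i, j = keep[a], keep[b]
            inter = genotypes[:, i] * genotypes[:, j]   # crude interaction encoding
            r, p = stats.pearsonr(inter, trait)
            results.append((i, j, r, p))
    return results

# toy data: 200 individuals, 500 SNPs, with a hidden pairwise effect between SNPs 3 and 7
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(200, 500)).astype(float)
y = G[:, 3] * G[:, 7] + rng.normal(size=200)
print(sorted(two_stage_pairwise(G, y), key=lambda t: t[3])[:3])
```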
  • ABSTRACT: In this paper, we propose and evaluate improved first fit (IFF), a fast implementation of the first fit contiguous partitioning strategy. It has been devised to accelerate the process of finding contiguous partitions in space-shared parallel computers in which the nodes are arranged forming multidimensional cubic networks. IFF uses system status information to drastically reduce the cost of finding partitions with the requested shape. The use of this information, combined with the early detection of zones where requests cannot be allocated, remarkably improves the search speed in large networks. An exhaustive set of simulation-based experiments has been carried out to test IFF against other algorithms implementing the same partitioning strategy. Results, using synthetic and real workloads, show that IFF can be several orders of magnitude faster than competitor algorithms. Copyright © 2013 John Wiley & Sons, Ltd.
    Concurrency and Computation Practice and Experience 12/2014; 26(17). DOI:10.1002/cpe.3174 · 0.78 Impact Factor
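The following is a simplified, hypothetical sketch of a first-fit search for a free contiguous submesh on a 2D mesh. It only loosely mirrors the ideas attributed to IFF (using occupancy information to skip zones where a request cannot fit); it is not the published algorithm, and the 2D layout and function names are assumptions.

```python
import numpy as np

def first_fit_2d(busy, shape):
    """Return the top-left corner of the first free `shape`-sized submesh, or None.

    `busy` is a boolean matrix of node occupancy. On a miss, the scan jumps past
    the right-most busy column in the candidate block (a crude early rejection)."""
    H, W = busy.shape
    h, w = shape
    i = 0
    while i + h <= H:
        j = 0
        while j + w <= W:
            block = busy[i:i + h, j:j + w]
            if not block.any():
                return (i, j)
            # jump past the right-most busy node found inside the candidate block
            j += int(np.max(np.where(block.any(axis=0))[0])) + 1
        i += 1
    return None

mesh = np.zeros((8, 8), dtype=bool)
mesh[0:3, 0:5] = True              # an already-allocated partition
print(first_fit_2d(mesh, (3, 3)))  # -> (0, 5) in this toy layout
```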
  • Journal of Grid Computing 12/2014; DOI:10.1007/s10723-014-9314-7 · 1.67 Impact Factor
  • Peng Yang, Ke Tang, Jose A. Lozano
    ABSTRACT: Path planning techniques are important to Unmanned Aerial Vehicles (UAVs). Evolutionary Algorithms (EAs) have been widely used to plan paths for UAVs. In these EA-based path planners, the Cartesian coordinate system and the polar coordinate system are commonly used to codify the path. However, each of them has its drawback: Cartesian coordinate systems result in an enormous search space, whilst polar coordinate systems are unfit for local modifications resulting, e.g., from mutation and/or crossover. In order to overcome these two drawbacks, we solve the UAV path planning problem in a new coordinate system. As the new coordinate system is only a rotation of the Cartesian coordinate system, it is inherently amenable to local modification. Besides, this new coordinate system reduces the search space by explicitly dividing the mission space into several subspaces. Within this new coordinate system, an Estimation of Distribution Algorithm (EDA) based path planner is proposed in this paper. Some experiments have been designed to test different aspects of the new path planner. The results show the effectiveness of this planner.
    IEEE Congress on Evolutionary Computation, Beijing, China; 07/2014
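A small sketch of the rotate-and-slice encoding idea: align the local x-axis with the start-goal direction and represent a candidate path by its lateral offsets at equally spaced stations. The exact coordinate system and the EDA used in the paper are not reproduced here; the 2D setting and function names are assumptions.

```python
import numpy as np

def encode_frame(start, goal):
    """Rotation that maps the start->goal direction onto the local x-axis."""
    d = np.asarray(goal, float) - np.asarray(start, float)
    theta = np.arctan2(d[1], d[0])
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, s], [-s, c]])            # world -> local rotation
    return R, np.asarray(start, float), np.linalg.norm(d)

def decode_path(offsets, start, goal):
    """A candidate solution is the vector of lateral offsets y_1..y_k,
    one per equally spaced station along the local x-axis."""
    R, origin, length = encode_frame(start, goal)
    k = len(offsets)
    xs = np.linspace(0.0, length, k + 2)[1:-1]            # interior stations only
    local = np.stack([xs, np.asarray(offsets, float)], axis=1)
    world = local @ R + origin                            # local -> world (R is orthonormal)
    return np.vstack([start, world, goal])

print(decode_path([1.0, -0.5, 0.3], start=(0, 0), goal=(10, 0)))
```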
  • ABSTRACT: The minimum common string partition problem is an NP-hard combinatorial optimization problem with applications in computational biology. In this work we propose the first integer linear programming model for solving this problem. Moreover, on the basis of the integer linear programming model we develop a deterministic 2-phase heuristic which is applicable to larger problem instances. The results show that provably optimal solutions can be obtained for problem instances of small and medium size from the literature by solving the proposed integer linear programming model with CPLEX. Furthermore, new best-known solutions are obtained for all considered problem instances from the literature. Concerning the heuristic, we were able to show that it outperforms heuristic competitors from the related literature.
    European Journal of Operational Research 04/2014; 242(3). DOI:10.1016/j.ejor.2014.10.049 · 1.84 Impact Factor
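One natural block-selection ILP for the minimum common string partition problem can be sketched as follows: a binary variable per candidate common block, exact-cover constraints on both strings, and minimization of the number of selected blocks. This is a compact illustration for small instances using PuLP/CBC; it is not claimed to be identical to the model solved with CPLEX in the paper.

```python
import pulp

def mcsp_ilp(s1, s2):
    """Minimum common string partition via a block-selection ILP (small instances only)."""
    n, m = len(s1), len(s2)
    # candidate blocks: (i, j, l) with s1[i:i+l] == s2[j:j+l]
    blocks = [(i, j, l)
              for i in range(n) for j in range(m)
              for l in range(1, min(n - i, m - j) + 1)
              if s1[i:i + l] == s2[j:j + l]]

    prob = pulp.LpProblem("MCSP", pulp.LpMinimize)
    x = {b: pulp.LpVariable(f"x_{b[0]}_{b[1]}_{b[2]}", cat="Binary") for b in blocks}
    prob += pulp.lpSum(x.values())                       # minimize the number of blocks
    for p in range(n):                                   # every position of s1 covered exactly once
        prob += pulp.lpSum(x[(i, j, l)] for (i, j, l) in blocks if i <= p < i + l) == 1
    for q in range(m):                                   # every position of s2 covered exactly once
        prob += pulp.lpSum(x[(i, j, l)] for (i, j, l) in blocks if j <= q < j + l) == 1
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [s1[i:i + l] for (i, j, l) in blocks if x[(i, j, l)].value() == 1]

print(mcsp_ilp("ababcab", "abcabab"))
```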
  • ABSTRACT: A fundamental question in the field of approximation algorithms, for a given problem instance, is the selection of the best (or a suitable) algorithm with regard to some performance criteria. A practical strategy for facing this problem is the application of machine learning techniques. However, limited support has been given in the literature to the case of more than one performance criteria, which is the natural scenario for approximation algorithms. We propose multidimensional Bayesian network (mBN) classifiers as a relatively simple, yet well-principled, approach for helping to solve this problem. Precisely, we relax the algorithm selection decision problem into the elucidation of the nondominated subset of algorithms, which contains the best. This formulation can be used in different ways to elucidate the main problem, each of which can be tackled with an mBN classifier. Namely, we deal with two of them: the prediction of the whole nondominated set and whether an algorithm is nondominated or not. We illustrate the feasibility of the approach for real-life scenarios with a case study in the context of Search Based Software Test Data Generation (SBSTDG). A set of five SBSTDG generators is considered and the aim is to assist a hypothetical test engineer in elucidating good generators to fulfil the branch testing of a given programme.
    Information Sciences 02/2014; 258:122-139. DOI:10.1016/j.ins.2013.09.050 · 3.89 Impact Factor
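Since the abstract relaxes algorithm selection to identifying the nondominated subset, a tiny Pareto-dominance filter illustrates that notion. The generator names and scores below are hypothetical, and both criteria are assumed to be "higher is better".

```python
def dominates(a, b):
    """a dominates b when a is at least as good on every criterion and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def nondominated(scores):
    """Return the names of algorithms not dominated by any other."""
    return [name for name, s in scores.items()
            if not any(dominates(t, s) for other, t in scores.items() if other != name)]

# e.g. (branch coverage, 1 / generation time) for five hypothetical test-data generators
scores = {"gen_A": (0.91, 0.20), "gen_B": (0.88, 0.50),
          "gen_C": (0.91, 0.10), "gen_D": (0.75, 0.60), "gen_E": (0.70, 0.55)}
print(nondominated(scores))   # gen_A, gen_B and gen_D survive; gen_C and gen_E are dominated
```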
  • ABSTRACT: This paper deals with a classification problem known as learning from label proportions. The provided dataset is composed of unlabeled instances and is divided into disjoint groups. General class information is given within the groups: the proportion of instances of the group that belong to each class. We have developed a method based on the Structural EM strategy that learns Bayesian network classifiers to deal with this problem. Four versions of our proposal are evaluated on synthetic data, and compared with state-of-the-art approaches on real datasets from public repositories. The results obtained show a competitive behavior for the proposed algorithm.
    Pattern Recognition 12/2013; 46(12):3425-3440. DOI:10.1016/j.patcog.2013.05.002 · 2.58 Impact Factor
  • ABSTRACT: A multi-species approach to fisheries management requires taking into account the interactions between species in order to improve recruitment forecasting. Recent advances in Bayesian networks enable the learning of models in which several interrelated variables are forecast simultaneously. These are known as multi-dimensional Bayesian network classifiers (MDBNs). Pre-processing steps are critical for the posterior learning of the model in these kinds of domains. Therefore, in this study, a set of 'state-of-the-art' uni-dimensional pre-processing methods, within the categories of missing data imputation, feature discretization and subset selection, are adapted to be used with MDBNs. A framework that includes the proposed multi-dimensional supervised pre-processing methods, coupled with an MDBN classifier, is tested for fish recruitment forecasting. The rate of correctly forecasting three fish species (anchovy, sardine and hake) simultaneously is doubled (from 17.3% to 29.5%) using the multi-dimensional approach, in comparison to mono-species models. The probability assessments also improve markedly, reducing the average error (Brier score) from 0.35 to 0.27. These results are also superior to those obtained when forecasting the species by pairs.
    ICES Annual Science Conference, Reykjavik, Iceland; 09/2013
  • ABSTRACT: The aim of this paper is two-fold. First, we introduce a novel general estimation of distribution algorithm to deal with permutation-based optimization problems. The algorithm is based on the use of a probabilistic model for permutations called the generalized Mallows model. In order to prove the potential of the proposed algorithm, our second aim is to solve the permutation flowshop scheduling problem. A hybrid approach consisting of the new estimation of distribution algorithm and a variable neighborhood search is proposed. Conducted experiments demonstrate that the proposed algorithm is able to outperform the state-of-the-art approaches. Moreover, from the 220 benchmark instances tested, the proposed hybrid approach obtains new best known results in 152 cases. An in-depth study of the results suggests that the successful performance of the introduced approach is due to the ability of the generalized Mallows estimation of distribution algorithm to discover promising regions in the search space.
    IEEE Transactions on Evolutionary Computation 04/2013; 18(2). DOI:10.1109/TEVC.2013.2260548 · 5.55 Impact Factor
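A sketch of the sampling core of a Generalized Mallows model under the Kendall distance: independent inversion counts, decoded by insertion relative to a central permutation. The fitting of the central permutation and spread parameters, as well as the hybridization with variable neighborhood search, are omitted; the parameter values below are assumptions.

```python
import numpy as np

def sample_mallows(sigma0, thetas, rng):
    """Draw one permutation from a Generalized Mallows model under the Kendall distance.

    The model decomposes into independent inversion counts V_j; thetas[j] controls
    how strongly V_j concentrates at 0 (larger theta -> closer to the central permutation)."""
    n = len(sigma0)
    pi = []
    remaining = list(range(n))                  # positions of sigma0 still unused
    for j in range(n):
        support = np.arange(len(remaining))     # V_j in {0, ..., n-j-1}
        theta = thetas[j] if j < len(thetas) else thetas[-1]
        probs = np.exp(-theta * support)
        probs /= probs.sum()
        v_j = rng.choice(support, p=probs)
        pi.append(remaining.pop(int(v_j)))      # pick the (v_j+1)-th remaining index
    return [sigma0[k] for k in pi]

rng = np.random.default_rng(1)
sigma0 = list(range(8))                          # central permutation
print(sample_mallows(sigma0, thetas=[1.5] * 7, rng=rng))
```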
  • Juan Diego Rodríguez, Aritz Pérez, Jose Antonio Lozano
    ABSTRACT: Estimating the prediction error of classifiers induced by supervised learning algorithms is important not only to predict the future error of a classifier, but also to choose a classifier from a given set (model selection). If the goal is to estimate the prediction error of a particular classifier, the desired estimator should have low bias and low variance. However, if the goal is model selection, in order to make fair comparisons the chosen estimator should have low variance, assuming that the bias term is independent of the considered classifier. This paper follows the analysis proposed in [1] about the statistical properties of k-fold cross-validation estimators and extends it to the most popular error estimators: resubstitution, holdout, repeated holdout, simple bootstrap and 0.632 bootstrap estimators, without and with stratification. We present a general framework to analyze the decomposition of the variance of different error estimators considering the nature of the variance (irreducible/reducible variance) and the different sources of sensitivity (internal/external sensitivity). An extensive empirical study has been performed for the previously mentioned estimators with naive Bayes and C4.5 classifiers over training sets obtained from assorted probability distributions. The empirical analysis consists of decomposing the variances following the proposed framework and checking the independence assumption between the bias and the considered classifier. Based on the obtained results, we propose the most appropriate error estimations for model selection under different experimental conditions.
    Pattern Recognition 03/2013; 46(3):855-864. DOI:10.1016/j.patcog.2012.09.007 · 2.58 Impact Factor
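For concreteness, here is a small sketch contrasting several of the error estimators studied (resubstitution, holdout, 10-fold cross-validation and the 0.632 bootstrap) for a naive Bayes classifier. The dataset, the number of bootstrap replicates and the use of scikit-learn are illustrative assumptions; the paper's variance-decomposition framework is not reproduced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
clf = GaussianNB()

# resubstitution: train and test on the same data (optimistically biased)
resub = 1 - clf.fit(X, y).score(X, y)

# holdout: a single random split
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
holdout = 1 - clf.fit(Xtr, ytr).score(Xte, yte)

# 10-fold cross-validation
cv10 = 1 - cross_val_score(clf, X, y, cv=10).mean()

# 0.632 bootstrap: blend resubstitution error with out-of-bag error
oob_errs = []
for _ in range(50):
    idx = rng.integers(0, len(X), len(X))
    oob = np.setdiff1d(np.arange(len(X)), idx)
    clf.fit(X[idx], y[idx])
    oob_errs.append(1 - clf.score(X[oob], y[oob]))
b632 = 0.368 * resub + 0.632 * float(np.mean(oob_errs))

print(f"resub={resub:.3f} holdout={holdout:.3f} cv10={cv10:.3f} b632={b632:.3f}")
```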
  • ABSTRACT: A multi-species approach to fisheries management requires taking into account the interactions between species in order to improve recruitment forecasting of the fish species. Recent advances in Bayesian networks enable the learning of models in which several interrelated variables are forecast simultaneously. These models are known as multi-dimensional Bayesian network classifiers (MDBNs). Pre-processing steps are critical for the posterior learning of the model in these kinds of domains. Therefore, in the present study, a set of 'state-of-the-art' uni-dimensional pre-processing methods, within the categories of missing data imputation, feature discretization and feature subset selection, are adapted to be used with MDBNs. A framework that includes the proposed multi-dimensional supervised pre-processing methods, coupled with an MDBN classifier, is tested with synthetic datasets and the real domain of fish recruitment forecasting. The rate of correctly forecasting three fish species (anchovy, sardine and hake) simultaneously is doubled (from 17.3% to 29.5%) using the multi-dimensional approach in comparison to mono-species models. The probability assessments also improve markedly, reducing the average error (estimated by means of the Brier score) from 0.35 to 0.27. Finally, these results are superior to those obtained when forecasting the species by pairs.
    Environmental Modelling and Software 02/2013; 40:245-254. DOI:10.1016/j.envsoft.2012.10.001 · 4.54 Impact Factor
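A minimal illustration of the two figures quoted above: the joint "all three species correct" rate and the average Brier score, computed from per-species class probabilities. The data below are invented, and a binary high/low recruitment coding is assumed purely for illustration.

```python
import numpy as np

def joint_accuracy(y_true, y_prob):
    """Fraction of cases where all species are forecast correctly (0.5 decision threshold)."""
    y_pred = (y_prob >= 0.5).astype(int)
    return float(np.mean((y_pred == y_true).all(axis=1)))

def mean_brier(y_true, y_prob):
    """Average Brier score over species and cases (lower is better)."""
    return float(np.mean((y_prob - y_true) ** 2))

# toy forecasts for 6 years x 3 species (anchovy, sardine, hake), classes coded 0/1
y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0], [0, 1, 0]])
y_prob = np.array([[0.8, 0.3, 0.6], [0.4, 0.2, 0.9], [0.7, 0.6, 0.4],
                   [0.3, 0.8, 0.7], [0.6, 0.4, 0.3], [0.2, 0.9, 0.5]])
print(joint_accuracy(y_true, y_prob), round(mean_brier(y_true, y_prob), 3))
```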
  • ABSTRACT: This paper shows how the Bayesian network paradigm can be used in order to solve combinatorial optimization problems. To do so, methods for learning the structure of Bayesian networks from data and for simulating Bayesian networks are inserted into Estimation of Distribution Algorithms (EDAs). EDAs are a new tool for evolutionary computation in which new populations of individuals are created by estimating and simulating the joint probability distribution of the selected individuals. We propose new approaches to EDAs for combinatorial optimization based on the theory of probabilistic graphical models. Experimental results are also presented.
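A minimal univariate EDA (UMDA-style) on binary strings makes the estimate-and-sample loop concrete. Note that the paper inserts Bayesian network structure learning and simulation into this loop; the independent marginals used below are only the simplest special case.

```python
import numpy as np

def umda(fitness, n_vars, pop_size=100, n_select=50, generations=50, seed=0):
    """Univariate EDA: estimate bit-wise marginals from the selected individuals,
    then sample the next population from them."""
    rng = np.random.default_rng(seed)
    p = np.full(n_vars, 0.5)                      # initial marginal probabilities
    best, best_f = None, -np.inf
    for _ in range(generations):
        pop = (rng.random((pop_size, n_vars)) < p).astype(int)
        fits = np.apply_along_axis(fitness, 1, pop)
        order = np.argsort(fits)[::-1]
        if fits[order[0]] > best_f:
            best, best_f = pop[order[0]].copy(), fits[order[0]]
        selected = pop[order[:n_select]]
        p = np.clip(selected.mean(axis=0), 0.05, 0.95)   # estimation step + light smoothing
    return best, best_f

onemax = lambda x: int(x.sum())
print(umda(onemax, n_vars=30))
```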
  • ABSTRACT: Cloud computing environments offer the user the capability of running their applications in an elastic manner, using only the resources they need, and paying for what they use. However, to take advantage of this flexibility, it is advisable to use an auto-scaling technique that adjusts the resources to the incoming workload, both reducing the overall cost and complying with the Service Level Objective. In this work we present a comparison of some auto-scaling techniques (both reactive and proactive) proposed in the literature, plus two new approaches based on rules with dynamic thresholds. Results show that dynamic thresholds avoid the bad performance derived from a bad threshold selection.
    CEDI; 01/2013
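A toy sketch of a rule-based autoscaler whose scale-out threshold adapts to recent SLO behavior. The concrete adjustment rule, constants and function names are assumptions for illustration; they are not the policies compared in the paper.

```python
def autoscale_step(servers, utilization, slo_violated, state):
    """One rule-based scaling decision with self-adjusting thresholds.

    `state` carries the current thresholds; tighten the scale-out threshold after an
    SLO violation, relax it slowly while the SLO is met (illustrative policy only)."""
    if slo_violated:
        state["upper"] = max(0.50, state["upper"] - 0.05)   # react earlier next time
    else:
        state["upper"] = min(0.85, state["upper"] + 0.01)   # drift back to save cost
    if utilization > state["upper"]:
        servers += 1
    elif utilization < state["lower"] and servers > 1:
        servers -= 1
    return servers, state

state = {"upper": 0.75, "lower": 0.30}
servers = 2
for util, violated in [(0.80, True), (0.78, False), (0.40, False), (0.20, False)]:
    servers, state = autoscale_step(servers, util, violated, state)
    print(servers, state)
```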
  • R. Santana, R.I. McKay, J.A. Lozano
    ABSTRACT: Symmetry has hitherto been studied piecemeal in a variety of evolutionary computation domains, with little consistency between the definitions. Here we provide formal definitions of symmetry that are consistent across the field of evolutionary computation. We propose a number of evolutionary and estimation of distribution algorithms suitable for variable symmetries in Cartesian power domains and compare their utility, finding that integrating the symmetry knowledge into the probabilistic model of an EDA yields the best outcomes. We test the robustness of the algorithm to inexact symmetry, finding adequate performance up to about 1% noise. Finally, we present evidence that such symmetries, if not known a priori, may be learnt during evolution.
    Evolutionary Computation (CEC), 2013 IEEE Congress on; 01/2013
  • J. Ceberio, A. Mendiburu, J.A. Lozano
    ABSTRACT: Estimation of distribution algorithms are known as powerful evolutionary algorithms that have been widely used for diverse types of problems. However, they have not been extensively developed for permutation-based problems. Recently, some progress has been made in this area by introducing probability models on rankings to optimize permutation domain problems. In particular, the Mallows model and the Generalized Mallows model demonstrated their effectiveness when used with estimation of distribution algorithms. Motivated by these advances, in this paper we introduce a Thurstone order statistics model, called Plackett-Luce, to the framework of estimation of distribution algorithms. In order to prove the potential of the proposed algorithm, we consider two different permutation problems: the linear ordering problem and the flowshop scheduling problem. In addition, the results are compared with those obtained by the Mallows and the Generalized Mallows proposals. Conducted experiments demonstrate that the Plackett-Luce model is the best performing model for solving the linear ordering problem. However, according to the experimental results, the Generalized Mallows model turns out to be very robust obtaining very competitive results for both problems, especially for the permutation flowshop scheduling problem.
    Evolutionary Computation (CEC), 2013 IEEE Congress on; 01/2013
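The sampling step of a Plackett-Luce model is easy to sketch: build a permutation by repeatedly drawing one of the remaining items with probability proportional to its weight. Weight estimation from the selected individuals (e.g., by an MM algorithm) is omitted, and the weights below are made up.

```python
import numpy as np

def sample_plackett_luce(weights, rng):
    """Sample one permutation: at each step pick an item among those remaining
    with probability proportional to its Plackett-Luce weight."""
    remaining = list(range(len(weights)))
    perm = []
    while remaining:
        w = np.array([weights[i] for i in remaining], dtype=float)
        choice = rng.choice(len(remaining), p=w / w.sum())
        perm.append(remaining.pop(int(choice)))
    return perm

rng = np.random.default_rng(0)
weights = [5.0, 3.0, 1.0, 1.0, 0.5]      # item 0 tends to appear first
print([sample_plackett_luce(weights, rng) for _ in range(3)])
```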
  • ABSTRACT: Nowadays, the solution of many combinatorial optimization problems is carried out by metaheuristics, which generally make use of local search algorithms. These algorithms use some kind of neighborhood structure over the search space. The performance of the algorithms strongly depends on the properties that the neighborhood imposes on the search space. One of these properties is the number of local optima. Given an instance of a combinatorial optimization problem and a neighborhood, the estimation of the number of local optima can help not only to measure the complexity of the instance, but also to choose the most convenient neighborhood to solve it. In this paper we review and evaluate several methods to estimate the number of local optima in combinatorial optimization problems. The methods reviewed come not only from the combinatorial optimization literature, but also from the statistical literature. A thorough evaluation on synthetic as well as real problems is given. We conclude by providing recommendations of methods for several scenarios.
    Evolutionary Computation 12/2012; DOI:10.1162/EVCO_a_00100 · 3.73 Impact Factor
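As a flavour of the statistical estimators such a review covers, here is a sketch that samples local optima by restarted hill climbing under the one-bit-flip neighborhood and applies a capture-recapture style (Chao1) richness estimate. The toy objective function and the choice of Chao1 are illustrative assumptions, not necessarily among the methods evaluated in the paper.

```python
import numpy as np
from collections import Counter

def hill_climb(x, f):
    """Best-improvement hill climbing under the one-bit-flip neighbourhood."""
    while True:
        neighbours = [np.concatenate([x[:i], 1 - x[i:i+1], x[i+1:]]) for i in range(len(x))]
        best = max(neighbours, key=f)
        if f(best) <= f(x):
            return tuple(x)
        x = best

def chao1_estimate(f, n_vars, n_restarts=500, seed=0):
    """Sample local optima from random restarts and apply the Chao1 richness estimator."""
    rng = np.random.default_rng(seed)
    found = Counter(hill_climb(rng.integers(0, 2, n_vars), f) for _ in range(n_restarts))
    s_obs = len(found)
    f1 = sum(1 for c in found.values() if c == 1)      # optima seen exactly once
    f2 = sum(1 for c in found.values() if c == 2)      # optima seen exactly twice
    return s_obs + (f1 * f1) / (2 * max(f2, 1))        # crude guard when f2 == 0

# a rugged toy function: reward agreement with several random 'attractor' strings
rng = np.random.default_rng(3)
attractors = rng.integers(0, 2, size=(6, 12))
f = lambda x: max(int(np.sum(np.asarray(x) == a)) for a in attractors)
print(chao1_estimate(f, n_vars=12))
```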
  • ABSTRACT: Methods for generating a new population are a fundamental component of estimation of distribution algorithms (EDAs). They serve to transfer the information contained in the probabilistic model to the new generated population. In EDAs based on Markov networks, methods for generating new populations usually discard information contained in the model to gain in efficiency. Other methods like Gibbs sampling use information about all interactions in the model but are computationally very costly. In this paper we propose new methods for generating new solutions in EDAs based on Markov networks. We introduce approaches based on inference methods for computing the most probable configurations and model-based template recombination. We show that the application of different variants of inference methods can increase the EDAs' convergence rate and reduce the number of function evaluations needed to find the optimum of binary and non-binary discrete functions.
  • ABSTRACT: Understanding the relationship between a search algorithm and the space of problems is a fundamental issue in the optimization field. In this paper, we lay the foundations to elaborate taxonomies of problems under estimation of distribution algorithms (EDAs). By using an infinite population model and assuming that the selection operator is based on the rank of the solutions, we group optimization problems according to the behavior of the EDA. Through the definition of an equivalence relation between functions, it is possible to partition the space of problems into equivalence classes in which the algorithm has the same behavior. We show that only the probabilistic model is able to generate different partitions of the set of possible problems and hence, it predetermines the number of different behaviors that the algorithm can exhibit. As a natural consequence of our definitions, all the objective functions are in the same equivalence class when the algorithm does not impose restrictions on the probabilistic model. The taxonomy of problems, which is also valid for finite populations, is studied in depth for a simple EDA that considers independence among the variables of the problem. We provide the sufficient and necessary condition to decide the equivalence between functions and then we develop the operators to describe and count the members of a class. In addition, we show the intrinsic relation between univariate EDAs and the neighborhood system induced by the Hamming distance by proving that all the functions in the same class have the same number of local optima and that they are in the same ranking positions. Finally, we carry out numerical simulations in order to analyze the different behaviors that the algorithm can exhibit for the functions defined over the search space {0, 1}^3.
    Evolutionary Computation 11/2012; 21(3). DOI:10.1162/EVCO_a_00095 · 3.73 Impact Factor

Publication Stats

1k Citations
122.63 Total Impact Points

Institutions

  • 2–2014
    • Universidad del País Vasco / Euskal Herriko Unibertsitatea
      • Computer Sciences and Artificial Intelligence
      Leioa, Basque Country, Spain
  • 2009
    • Instituto de Salud Carlos III
      Madrid, Madrid, Spain
    • Instituto Interuniversitario de Investigación en Bioingeniería y Tecnología Orientada al Ser Humano
      Valencia, Valencia, Spain
  • 2008
    • University Pompeu Fabra
      Barcelona, Catalonia, Spain
  • 2004–2007
    • Universitat Politècnica de València
      • Institute for Research and Innovation in Bioengineering (i3BH)
      Valencia, Valencia, Spain
  • 1998–2006
    • University of Valencia
      • Facultad de Psicología
      Valencia, Valencia, Spain