Erick Cantu-Paz

About

98 Publications
11,513 Reads
7,543 Citations

Publications (98)
Conference Paper
Full-text available
Amazon is one of the world's largest e-commerce sites and Amazon Search powers the majority of Amazon's sales. As a consequence, even small improvements in relevance ranking both positively influence the shopping experience of millions of customers and significantly impact revenue. In the past, Amazon's product search engine consisted of several ha...
Patent
Full-text available
The present invention is directed towards systems and methods for providing dynamic search results based upon historical data through the use of one or more widgets. The method of the present invention comprises receiving a request for content from a client and generating one or more widgets for providing search result content. A display profile is...
Article
Full-text available
The critical task of predicting clicks on search advertisements is typically addressed by learning from historical click data. When enough history is observed for a given query-ad pair, future clicks can be accurately modeled. However, based on the empirical distribution of queries, sufficient historical information is unavailable for many query-ad...
Article
Parallel Global Optimization Algorithms (PGOA) provide an efficient way of dealing with hard optimization problems. One method of parallelization of GOAs that is frequently applied and commonly found in the contemporary literature is the so-called Island ...
Conference Paper
Full-text available
Previous studies on search engine click modeling have identified two presentation factors that affect users' behavior: (1) position bias: the same result will get a different number of clicks when displayed in different positions and (2) externalities: the same result might get more clicks when displayed with results of relatively lower quality tha...
Conference Paper
Full-text available
Sponsored search is a multi-billion dollar business that generates most of the revenue for search engines. Predicting the probability that users click on ads is crucial to sponsored search because the prediction is used to influence ranking, filtering, placement, and pricing of ads. Ad ranking, filtering and placement have a direct impact on the us...
Article
A fundamental problem in sponsored search advertising is the estimation of probability of click for ads displayed in response to search queries. The historical click-through rate (CTR) is one of the most important predictors of the click, and can be extracted at multiple resolutions of the query-ad hierarchy. However, the new ads do not have any c...
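The abstract describes reading CTR at several levels of the query-ad hierarchy when fine-grained history is missing. One common way to use such a hierarchy is to shrink each level's observed CTR toward the estimate of its parent level with a pseudo-count prior. That choice is an assumption here and not necessarily the method in the article. The names `smoothed_ctr`, `backoff_ctr`, `prior_strength` and the default global CTR are hypothetical.

```python
def smoothed_ctr(clicks, impressions, parent_ctr, prior_strength=100.0):
    """Shrink an observed CTR toward the parent level's CTR.

    Equivalent to adding prior_strength pseudo-impressions that click at
    parent_ctr. With no impressions the result equals parent_ctr.
    """
    if not 0 <= clicks <= impressions:
        raise ValueError("need 0 <= clicks <= impressions")
    return (clicks + prior_strength * parent_ctr) / (impressions + prior_strength)


def backoff_ctr(levels, prior_strength=100.0, global_ctr=0.01):
    """Smooth from the coarsest level of the hierarchy down to the finest.

    levels: (clicks, impressions) pairs ordered from coarse to fine; the
    last pair is the specific query-ad pair being estimated.
    """
    estimate = global_ctr
    for clicks, impressions in levels:
        estimate = smoothed_ctr(clicks, impressions, estimate, prior_strength)
    return estimate
```

A brand-new ad (0 clicks, 0 impressions) inherits its parent's estimate. As impressions accumulate, its own clicks/impressions ratio takes over, and `prior_strength` sets how quickly that happens.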
Book
The objective in editing this book was to assemble a sample of the best work in parallel and distributed biologically inspired algorithms. The editors invited researchers in different domains to submit their work. They aimed to include diverse topics to appeal to a wide audience. Some of the chapters summarize work that has been ongoing for several...
Patent
Full-text available
The present invention is directed towards systems and methods for predicting a frequency with which an advertisement displayed in response to a query will be selected. The method of the present invention comprises receiving analytics data associated with a display of one or more advertisements in response to one or more queries. One or more feat...
Chapter
Full-text available
This chapter focuses on the parallelization of Estimation of Distribution Algorithms (EDAs). More specifically, it presents guidelines for designing efficient parallel EDAs that employ parallel fitness evaluation and parallel model building. Scalability analysis techniques are employed to identify and parallelize the main performance bottlenecks to...
Chapter
The performance of classification algorithms is affected by the features used to describe the labeled examples presented to the inducers. Therefore, the problem of feature subset selection has received considerable attention. Approaches to this problem based on evolutionary algorithms typically use the wrapper method, treating the inducer as a blac...
Chapter
Parallel genetic algorithms (GAs) have numerous parameters that affect their efficiency and accuracy. Traditionally, these parameters have been studied using empirical studies whose generality and limitations are difficult to assess. This chapter reviews existing theoretical models that predict the effects of the parameters. The models are used to...
Article
Estimation of distribution algorithms (EDAs) are a wide-ranging family of evolutionary algorithms whose common feature is the way they evolve by learning a probability distribution from the best individuals in a population and sampling it to generate ...
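The loop described here (learn a distribution from the best individuals, then sample it) can be illustrated with the univariate marginal distribution algorithm (UMDA), one of the simplest EDAs. The article may cover other members of the family. This sketch assumes a OneMax fitness, truncation selection and clamped marginals. `umda`, `onemax` and all parameter values are illustrative.

```python
import random


def onemax(bits):
    # Toy fitness: number of 1s in the string.
    return sum(bits)


def umda(n_bits=30, pop_size=100, n_select=50, generations=50, seed=0):
    """Univariate marginal distribution algorithm on OneMax."""
    rng = random.Random(seed)
    # Keep marginals inside [lo, hi] so no bit value is ever lost completely.
    lo, hi = 1.0 / n_bits, 1.0 - 1.0 / n_bits
    probs = [0.5] * n_bits          # start from the uniform distribution
    best = None
    for _ in range(generations):
        # Sample a population from the product of independent Bernoulli marginals.
        pop = [[1 if rng.random() < p else 0 for p in probs] for _ in range(pop_size)]
        pop.sort(key=onemax, reverse=True)
        if best is None or onemax(pop[0]) > onemax(best):
            best = pop[0]
        # Truncation selection, then re-estimate each marginal from the selected set.
        selected = pop[:n_select]
        probs = [min(hi, max(lo, sum(ind[i] for ind in selected) / n_select))
                 for i in range(n_bits)]
    return best, probs
```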
Article
Full-text available
A data mining decision tree system that uncovers patterns, associations, anomalies, and other statistically significant structures in data by reading and displaying data files, extracting relevant features for each of the objects, and using a method of recognizing patterns among the objects based upon object features through a decision tree that re...
Article
Full-text available
There are numerous combinations of neural networks (NNs) and evolutionary algorithms (EAs) used in classification problems. EAs have been used to train the networks, design their architecture, and select feature subsets. However, most of these combinations have been tested on only a few data sets and many comparisons are done inappropriately measur...
Chapter
Introduction. Master-Slave Parallel GAs. Multipopulation Parallel GAs. Cellular Parallel GAs. Conclusions. References.
Conference Paper
Full-text available
This paper describes GridAssist, a user-friendly Grid-based workflow management tool that allows users to execute workflows in a Grid environment and hides the underlying technology. Two cases are described in which this tool is now being used: processing ...
Conference Paper
Full-text available
The population size of genetic algorithms (GAs) affects the quality of the solutions and the time required to find them. While progress has been made in estimating the population sizes required to reach a desired solution quality for certain problems, in practice the sizing of populations is still usually performed by trial and error. These trials...
Conference Paper
Full-text available
Numerous applications of data mining to scientific data involve the induction of a classification model. In many cases, the collection of data is not performed with this task in mind, and therefore, the data might contain irrelevant or redundant features that negatively affect the accuracy of the induction algorithms. The size and dimensionality of...
Article
Comparing the output of a physics simulation with an experiment is often done by visually comparing the two outputs. In order to determine which simulation is a closer match to the experiment, more quantitative measures are needed. This paper describes our early experiences with this problem by considering the slightly simpler problem of finding ob...
Article
This paper illustrates the application of evolutionary algorithms (EAs) to the problem of oblique decision-tree (DT) induction. The objectives are to demonstrate that EAs can find classifiers whose accuracy is competitive with other oblique tree construction methods, and that, at least in some cases, this can be accomplished in a shorter time. We p...
Article
The FIRST (Faint Images of the Radio Sky at Twenty-cm) survey is an ambitious project scheduled to cover 10,000 square degrees of the northern and southern galactic caps. Until recently, astronomers associated with FIRST identified radio-emitting galaxies with a bent-double morphology through a visual inspection of images. Besides being subjective,...
Conference Paper
Full-text available
The performance of classification algorithms in machine learning is affected by the features used to describe the labeled examples presented to the inducers. Therefore, the problem of feature subset selection has received considerable attention. Genetic approaches to this problem usually follow the wrapper approach: treat the inducer as a black...
Conference Paper
Full-text available
The usual approach to deal with noise present in many real-world optimization problems is to take an arbitrary number of samples of the objective function and use the sample average as an estimate of the true objective value. The number of samples is typically chosen arbitrarily and remains constant for the entire optimization process. This paper s...
Article
Full-text available
The performance of classification algorithms in machine learning is affected by the features used to describe the labeled examples presented to the inducers. Therefore, the problem of feature subset selection has received considerable attention. Genetic approaches to this problem usually follow the wrapper approach: treat the inducer as a black box...
Conference Paper
Full-text available
This paper describes the application of four evolutionary algorithms to the pruning of neural networks used in classification problems. Besides a simple genetic algorithm (GA), the paper considers three distribution estimation algorithms (DEAs): a compact GA, an extended compact GA, and the Bayesian Optimization Algorithm. The objective is to de...
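The compact GA mentioned in the abstract replaces an explicit population with one probability per bit. It updates those probabilities as if a population of a given size were present. The sketch below runs it on OneMax rather than on network pruning. `compact_ga`, `virtual_pop` and the stopping rule are illustrative choices, not code from the paper.

```python
import random


def compact_ga(fitness, n_bits, virtual_pop=50, max_iters=20000, seed=0):
    """Compact GA (maximization). Returns the most likely bit string."""
    rng = random.Random(seed)
    # counts[i] / virtual_pop is the probability that bit i is 1; integer
    # counts keep the +/- 1/virtual_pop updates exact.
    counts = [virtual_pop // 2] * n_bits

    def sample():
        return [1 if rng.random() * virtual_pop < c else 0 for c in counts]

    for _ in range(max_iters):
        a, b = sample(), sample()
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        for i in range(n_bits):
            if winner[i] != loser[i]:
                # Move bit i's probability one step toward the winner's value.
                if winner[i] == 1:
                    counts[i] = min(virtual_pop, counts[i] + 1)
                else:
                    counts[i] = max(0, counts[i] - 1)
        # Stop once every probability has converged to 0 or 1.
        if all(c in (0, virtual_pop) for c in counts):
            break
    return [1 if 2 * c >= virtual_pop else 0 for c in counts]
```

For example, `compact_ga(sum, 20)` searches 20-bit strings for the maximum number of 1s. Only two individuals exist at any time, which is the memory saving that motivates the algorithm.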
Conference Paper
Full-text available
There are conflicting reports over whether multiple independent runs of genetic algorithms (GAs) with small populations can reach solutions of higher quality or can find acceptable solutions faster than a single run with a large population. This paper investigates this question analytically using two approaches. First, the analysis assumes that the...
Article
Full-text available
This paper illustrates the application of evolutionary algorithms (EAs) to the problem of oblique decision-tree (DT) induction. The objectives are to demonstrate that EAs can find classifiers whose accuracy is competitive with other oblique tree construction methods, and that, at least in some cases, this can be accomplished in a shorter time. We p...
Article
Full-text available
This paper describes the application of four evolutionary algorithms to the selection of feature subsets for classification problems.
Article
Full-text available
This paper describes the application of four evolutionary algorithms to the pruning of neural networks used in classification problems. Besides a simple genetic algorithm (GA), the paper considers three distribution estimation algorithms (DEAs): a compact GA, an extended compact GA, and the Bayesian Optimization Algorithm. The objective is to de...
Article
Full-text available
Astronomy data sets have led to interesting problems in mining scientific data. These problems will likely become more challenging as the astronomy community brings several surveys online as part of the National Virtual Observatory, giving rise to the possibility of mining data across many different surveys. In this article, we discuss the work we...
Article
Full-text available
Pseudo random number generators (PRNGs) are the basic input to the stochastic selection, recombination, and mutation operations of genetic algorithms (GAs). Although it does not seem like a crucial decision, recent studies suggest that the choice of PRNG can affect the performance of GAs. The objective of this paper is to study the effect of PRNGs on...
Article
Full-text available
Selection methods are essential components of evolutionary algorithms (EAs). This paper reviews five popular selection methods used in EAs. The algorithms are examined using the cumulants of the fitness distribution of the selected individuals. The cumulants are calculated using order statistics. The method presented here considers finite populatio...
Conference Paper
Full-text available
We describe an application of probabilistic modeling to the problem of recognizing radio galaxies with a bent-double morphology. Galaxies of this type contain distinctive signatures of geometric shape and flux density that can be used to build a probabilistic model that is then used to score potential galaxy configurations. The experi...
Conference Paper
Full-text available
The FIRST survey (Faint Images of the Radio Sky at Twenty-cm) is scheduled to cover 10,000 square degrees of the northern and southern galactic caps. Until recently, astronomers classified radio-emitting galaxies through a visual inspection of FIRST images.
Book
Preface. Acknowledgments. 1. Introduction. 2. The Gambler's Ruin and Population Sizing. 3. Master-Slave Parallel GAs. 4. Bounding Cases of GAs With Multiple Demes. 5. Markov Chain Models of Multiple Demes. 6. Migration Rates and Optimal Topologies. 7. Migration and Selection Pressure. 8. Fine-Grained and Hierarchical Parallel GAs. 9. Summary, Exten...
Chapter
Full-text available
With computers becoming more pervasive, disks becoming cheaper, and sensors becoming ubiquitous, we are collecting data at an ever-increasing pace. However, it is far easier to collect the data than to extract useful information from it. Sophisticated techniques, such as those developed in the multi-disciplinary field of data mining, are increasing...
Article
With computers becoming more pervasive, disks becoming cheaper, and sensors becoming ubiquitous, we are collecting data at an ever-increasing pace. However, it is far easier to collect the data than to extract useful information from it. Sophisticated techniques, such as those developed in the multi-disciplinary field of data mining, are increasing...
Article
In this paper, we describe the use of data mining techniques to search for radio-emitting galaxies with a bent-double morphology. In the past, astronomers from the FIRST (Faint Images of the Radio Sky at Twenty-cm) survey identified these galaxies through visual inspection. This was not only subjective but also tedious as the on-going survey now co...
Article
Full-text available
This paper investigates how the policy used to select migrants and the individuals they replace affects the selection pressure in parallel evolutionary algorithms (EAs) with multiple populations. The four possible combinations of random and fitness-based emigration and replacement of existing individuals are considered. The investigation follows tw...
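The four policy combinations considered here are random or fitness-based emigration, crossed with random or fitness-based replacement. All four fit in a single migration step. The sketch below assumes a unidirectional ring topology, maximization, and `k` migrants per deme. `migrate_ring` and `_pick` are illustrative names, not code from the paper.

```python
import random


def _pick(deme, k, by_fitness, highest, fitness, rng):
    """Indices of k individuals: the k highest/lowest by fitness, or k at random."""
    if by_fitness:
        order = sorted(range(len(deme)), key=lambda i: fitness(deme[i]), reverse=highest)
        return order[:k]
    return rng.sample(range(len(deme)), k)


def migrate_ring(demes, k, fitness, emigrate_best=True, replace_worst=True, seed=0):
    """One synchronous migration step on a unidirectional ring (maximization).

    Deme d sends copies of k emigrants to deme (d + 1) % len(demes), where they
    overwrite k residents. The two flags select fitness-based or random choice
    for emigrants and for replaced individuals: four policies in total.
    """
    rng = random.Random(seed)
    # Choose every deme's emigrants before any deme is modified.
    outgoing = [[deme[i] for i in _pick(deme, k, emigrate_best, True, fitness, rng)]
                for deme in demes]
    new_demes = [list(deme) for deme in demes]
    for d, migrants in enumerate(outgoing):
        target = (d + 1) % len(demes)
        slots = _pick(demes[target], k, replace_worst, False, fitness, rng)
        for slot, migrant in zip(slots, migrants):
            new_demes[target][slot] = migrant
    return new_demes
```

With the default flags, `migrate_ring([[1, 2, 3], [10, 20, 30]], 1, lambda x: x)` returns `[[30, 2, 3], [3, 20, 30]]`. Each deme's best individual replaces the worst individual of its neighbour. Setting both flags to `False` gives the purely random policy.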
Article
Full-text available
This paper introduces simple model-building evolutionary algorithms (EAs) that operate on continuous domains. The algorithms are based on supervised and unsupervised discretization methods that have been used as preprocessing steps in machine learning. The basic idea is to discretize the continuous variables and use the discretization as a simple...
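The basic idea can be sketched in a few steps:

1. Split each continuous variable into bins.
2. Estimate bin frequencies from the selected individuals.
3. Sample a bin, then a uniform point inside it.

This sketch uses equal-width binning, which is one simple unsupervised discretization. The paper's own discretizers may differ. It also assumes independent variables and minimizes a sphere function. `histogram_eda` and its parameters are hypothetical.

```python
import random


def histogram_eda(f, dim, lower, upper, n_bins=10, pop_size=200, n_select=50,
                  generations=60, seed=0):
    """Minimize f on [lower, upper]^dim using one equal-width histogram per variable."""
    rng = random.Random(seed)
    width = (upper - lower) / n_bins
    bins = range(n_bins)
    probs = [[1.0 / n_bins] * n_bins for _ in range(dim)]  # uniform start
    best = None
    for _ in range(generations):
        pop = []
        for _ in range(pop_size):
            # Pick a bin per variable from its histogram, then a uniform point inside it.
            pop.append([lower + (rng.choices(bins, weights=probs[d])[0] + rng.random()) * width
                        for d in range(dim)])
        pop.sort(key=f)
        if best is None or f(pop[0]) < f(best):
            best = pop[0]
        selected = pop[:n_select]
        for d in range(dim):
            # Re-estimate the histogram from the selected points; the initial count
            # of 1 per bin (Laplace smoothing) keeps every bin reachable.
            counts = [1.0] * n_bins
            for x in selected:
                counts[min(int((x[d] - lower) / width), n_bins - 1)] += 1
            total = sum(counts)
            probs[d] = [c / total for c in counts]
    return best
```

Fixed bins cap the precision at roughly one bin width. This is one reason such models are called simple.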
Chapter
Data mining techniques are increasingly gaining popularity in various scientific domains as viable approaches to the analysis of massive data sets. In this chapter, we describe our experiences in applying data mining to a problem in astronomy, namely, the identification of radio-emitting galaxies with a bent-double morphology. Until recently, astro...
Chapter
Master-slave parallel GAs are easy to implement, often yield considerable improvements in performance, and all the theory available for simple GAs can be used to choose adequate values for the search parameters. The analysis of this chapter showed that, for many applications, the reduction in computation time is sufficient to overcome the cost of c...
Chapter
This chapter presented a solution to a long-standing problem in genetic algorithms: how to determine an adequate population size to reach a solution of a particular quality. The model is based on a random walk where the position of a particle on a bounded one-dimensional space represents the number of copies of the correct BBs in the population. Th...
Chapter
This chapter presented models that predict the expected solution quality of parallel GAs with multiple populations after any number of epochs and for any choice of deme size, deme count, topology, or migration rate. The basic idea was to model the parallel GAs as Markov chains to determine the number of correct BBs that are present in the demes at...
Chapter
This chapter treated fine-grained and hierarchical parallel GAs. It began with a brief review of fine-grained parallel GAs. The chapter identified some of the most salient design problems of this type of algorithm, and discussed some of the recent work in this area. This chapter focused on hierarchical combinations of parallel GAs. The hierarchica...
Chapter
The design of efficient and accurate parallel GAs is a complex problem. One must decide on a configuration among the many choices of topologies, migration rates, deme counts and sizes. Each parameter affects the quality of the search and the efficiency of the algorithm in non-linear ways, which makes the choices difficult. The ultimate goal is to d...
Chapter
The calculations presented in this chapter recognize that the design of parallel GAs is a complex problem, and that the choices of topologies, migration rates, number of demes, and their size are intimately related. To make progress on the deme-sizing problem without ignoring the other choices, the analysis used bounds on the topologies and migrati...
Chapter
The choice of migrants and the replacement of individuals are not often considered important parameters of parallel GAs. However, this chapter used two different methods to show that choosing the migrants or replacements according to their fitness increases the selection pressure. Some migration policies may cause the algorithm to converge signific...
Chapter
This chapter extended the previous deme-sizing equations to consider configurations that are likely to be used by practitioners. The first part of the chapter described how the deme size, the migration rate, and the topology’s degree relate to the probability of success after two epochs. It showed how to find the configuration that opti...
Article
Decision trees have long been popular in classification as they use simple and easy-to-understand tests at each node. Most variants of decision trees test a single attribute at a node, leading to axis-parallel trees, where the test results in a hyperplane which is parallel to one of the dimensions in the attribute space. These trees can be rather...
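The contrast between an axis-parallel test (one attribute against a threshold) and an oblique test (a weighted sum against a threshold) can be seen on a toy two-class set. Its class boundary is the diagonal x0 + x1 = 10. The data are hypothetical and not from the article. No single-attribute threshold separates the classes, but one oblique test does. Finding good weights is the search problem that oblique-tree methods must solve; here the weights are simply given.

```python
def axis_parallel_test(x, j, threshold):
    # Split on one attribute: the hyperplane x_j = threshold is axis-parallel.
    return x[j] < threshold


def oblique_test(x, weights, threshold):
    # Split on a linear combination: w . x = threshold can have any orientation.
    return sum(w * xi for w, xi in zip(weights, x)) < threshold


def accuracy(test, data):
    # A one-node tree: predict class 0 when the test holds, class 1 otherwise.
    return sum((0 if test(x) else 1) == y for x, y in data) / len(data)


# Toy data on an 11 x 11 integer grid: class 0 strictly below the line x0 + x1 = 10.
data = [((i, j), 0 if i + j < 10 else 1) for i in range(11) for j in range(11)]

best_axis = max(accuracy(lambda x: axis_parallel_test(x, j, t), data)
                for j in (0, 1) for t in range(12))
oblique = accuracy(lambda x: oblique_test(x, (1, 1), 10), data)
print(f"best single-attribute split: {best_axis:.3f}, oblique split: {oblique:.3f}")
# prints: best single-attribute split: 0.752, oblique split: 1.000
```

An axis-parallel tree can approximate the diagonal only with a staircase of many such nodes. This is why such trees can become large on data like this.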
Article
Full-text available
Implementations of parallel genetic algorithms (GA) with multiple populations are common, but they introduce several parameters whose effect on the quality of the search is not well understood. Parameters such as the number of populations, their size, the topology of communications, and the migration rate have to be set carefully to reach adequate...
Conference Paper
Full-text available
This paper analyzes convergence properties of the
Article
As data mining techniques are applied to ever larger data sets, it is becoming clear that parallel processors will play an important role in reducing the turn-around time for data analysis. In this paper, we describe the design of a parallel object-oriented toolkit for mining scientific data sets. After a brief discussion of our design goals, we desc...
Conference Paper
Full-text available
Migration of individuals between populations may increase the selection pressure. This has the desirable consequence of speeding up convergence, but it may result in an excessively rapid loss of variation that may cause the search to fail. This paper investigates the effects of migration on the distribution of fitness. It considers arbitrary migrat...
Article
Full-text available
This paper proposes an algorithm that uses an estimation of the joint distribution of promising solutions in order to generate new candidate solutions. The algorithm is placed in the context of genetic and evolutionary computation and the algorithms based on the estimation of distributions. The proposed algorithm is called the Bayesian Optimizat...
Conference Paper
Full-text available
This paper presents calculations of the selection intensity of common selection and replacement methods used in genetic algorithms (GAs) with generation gaps. The selection intensity measures the increase of the average fitness of the population after selection, and it can be used to predict the average fitness of the population at each iteration a...
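Selection intensity, as defined here, is I = (mean fitness after selection - mean fitness before) / (standard deviation before). It can be estimated by simulation. For binary tournaments on normally distributed fitness, the exact value is 1/sqrt(pi), about 0.564, which is the expected maximum of two standard normal variables. That value gives the estimate a check. The sketch covers tournament selection only and does not model the generation-gap replacement methods analyzed in the paper. `tournament_intensity` is a hypothetical helper.

```python
import math
import random


def tournament_intensity(tournament_size=2, pop_size=100_000, seed=0):
    """Monte Carlo estimate of selection intensity for tournament selection.

    I = (mean fitness after selection - mean fitness before) / std before,
    with fitness drawn from a standard normal distribution.
    """
    rng = random.Random(seed)
    fitness = [rng.gauss(0.0, 1.0) for _ in range(pop_size)]
    mean = sum(fitness) / pop_size
    sd = math.sqrt(sum((v - mean) ** 2 for v in fitness) / pop_size)
    # Each slot of the new population goes to the best of tournament_size
    # individuals drawn uniformly with replacement.
    selected = [max(rng.choice(fitness) for _ in range(tournament_size))
                for _ in range(pop_size)]
    return (sum(selected) / pop_size - mean) / sd
```

A tournament of size 1 is random selection and gives an intensity near 0. Larger tournaments give larger intensities, which means faster convergence of the average fitness.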
Chapter
With computers becoming more pervasive, disks becoming cheaper, and sensors becoming ubiquitous, we are collecting data at an ever-increasing pace. However, it is far easier to collect the data than to extract useful information from it. Sophisticated techniques, such as those developed in the multi-disciplinary field of data mining, are increasing...
Article
Printout. Thesis (Ph. D.)--University of Illinois at Urbana-Champaign, 1999. Vita. Includes bibliographical references (leaves 140-146).
Article
Parallel implementations of genetic algorithms (GAs) are common, and, in most cases, they succeed in reducing the time required to find acceptable solutions. However, the effect of the parameters of parallel GAs on the quality of their search and on their efficiency is not well understood. This insufficient knowledge limits our ability to design fas...
Article
Full-text available
Genetic algorithms (GAs) are powerful search techniques that are used successfully to solve problems in many different disciplines. Parallel GAs are particularly easy to implement and promise substantial gains in performance. As such, there has been extensive research in this field. This survey attempts to collect, organize, and present in a unif...
Article
Full-text available
Parallel genetic algorithms (GAs) are complex programs that are controlled by many parameters, which affect their search quality and their efficiency. The goal of this paper is to provide guidelines to choose those parameters rationally. The investigation centers on the sizing of populations, because previous studies show that there is a crucial re...
Article
Full-text available
This paper examines the scalability of several types of parallel genetic algorithms (GAs). The objective is to determine the optimal number of processors that can be used by each type to minimize the execution time. The first part of the paper considers algorithms with a single population. The investigation focuses on an implementation where the po...
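The single-population case can be modeled with a simple cost model of the kind used in such scalability analyses. It is an assumption here, and the paper's model may include further terms. Evaluating n individuals on P processors costs n*T_f/P, and communicating with each processor costs T_c. Minimizing T(P) = n*T_f/P + P*T_c gives P* = sqrt(n*T_f/T_c). The function names are hypothetical.

```python
import math


def master_slave_time(n, t_f, t_c, processors):
    """Time per generation: n evaluations split over the processors plus one
    communication cost t_c per processor."""
    return n * t_f / processors + processors * t_c


def optimal_processors(n, t_f, t_c):
    # dT/dP = -n*t_f/P**2 + t_c = 0  gives  P* = sqrt(n*t_f/t_c)
    return math.sqrt(n * t_f / t_c)


def best_processor_count(n, t_f, t_c):
    # T(P) is convex for P > 0, so the best integer is floor(P*) or ceil(P*).
    p = optimal_processors(n, t_f, t_c)
    candidates = {max(1, math.floor(p)), max(1, math.ceil(p))}
    return min(candidates, key=lambda k: master_slave_time(n, t_f, t_c, k))
```

For example, take n = 1000 evaluations of 10 ms each and 1 ms of communication per processor. Then P* = 100. Beyond that point, extra processors increase the time per generation: communication grows linearly in P, while each processor's share of evaluations shrinks only as 1/P.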
Article
Full-text available
This paper presents a model to predict the convergence quality of genetic algorithms based on the size of the population. The model is based on an analogy between selection in GAs and one-dimensional random walks. Using the solution to a classic random walk problem (the gambler's ruin), the model naturally incorporates previous knowledge about the ini...
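The classical gambler's ruin result behind the model goes as follows. Consider a walk on {0, ..., n} that starts at x0 and steps +1 with probability p or -1 with probability q = 1 - p. It reaches n before 0 with probability P = (1 - (q/p)^x0) / (1 - (q/p)^n) when p != q, and x0/n when p = q. In the GA analogy, x0 plays the role of the initial number of copies of the correct building block and n the population size. p plays the role of the probability of deciding correctly between competing building blocks. The paper's derivations of x0 and p are not reproduced here. `ruin_success` and `simulate` are hypothetical helpers.

```python
import random


def ruin_success(x0, n, p):
    """P(a +1/-1 random walk started at x0 reaches n before 0), step up w.p. p."""
    if not 0 <= x0 <= n:
        raise ValueError("x0 must lie in [0, n]")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = 1.0 - p
    if p == q:
        return x0 / n           # unbiased walk: success probability is linear in x0
    r = q / p
    return (1.0 - r ** x0) / (1.0 - r ** n)


def simulate(x0, n, p, trials=20000, seed=0):
    """Monte Carlo check of ruin_success."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        x = x0
        while 0 < x < n:
            x += 1 if rng.random() < p else -1
        wins += x == n
    return wins / trials
```

With p > 0.5 and a large n, the denominator approaches 1, so the success probability is governed mainly by (q/p)^x0. A larger initial supply x0 therefore helps.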
Article
Full-text available
As genetic algorithms (GAs) are used to solve harder problems, it is becoming necessary to use better algorithms and more efficient implementations to reach good solutions fast. This chapter describes the implementation of master-slave and multiple-population parallel GAs. The goal of the chapter is to help others to implement their own parallel co...
Conference Paper
Full-text available
The paper presents a model for predicting the convergence quality of genetic algorithms. The model incorporates previous knowledge about decision making in genetic algorithms and the initial supply of building blocks in a novel way. The result is an equation that accurately predicts the quality of the solution found by a GA using a given population...
Conference Paper
Full-text available
This paper presents models that predict the speedup of two cases that bound the possible topologies and migration rates of parallel genetic algorithms (GAs). The first bounding case is a parallel GA with completely isolated demes or subpopulations and for this case the model and the experiments show that the speedup is not very significant when mor...
Article
This paper investigates the possibility of gaining any computational benefit from multiple deme, small population GAs compared to a single large population GA. Our framework is based on an earlier decision theoretic framework developed by Goldberg, Deb and Clark (1992) for population sizing. Our analysis and empirical results for different bounding...
Article
Full-text available
The FIRST (Faint Images of the Radio Sky at Twenty-cm) survey is an ambitious project scheduled to cover 10,000 square degrees of the northern and southern galactic caps. Until recently, astronomers associated with FIRST identified radio-emitting galaxies with a bent-double morphology through a visual inspection of images. Besides being subjective,...
Article
Full-text available
Recent work in classification indicates that significant improvements in accuracy can be obtained by growing an ensemble of classifiers and having them vote for the most popular class. This paper focuses on ensembles of decision trees that are created with a randomized procedure based on sampling. Randomization can be introduced by using random sam...
Article
Full-text available
In earlier work, we have described our experiences with the use of decision tree classifiers to identify radio-emitting galaxies with a bent-double morphology in the FIRST astronomical survey. We now extend this work to include ensembles of decision tree classifiers, including two algorithms developed by us. These algorithms randomize the decision...
Article
Full-text available
This paper uses Markov chains to analyze the search quality of a bounding case of parallel genetic algorithms with multiple populations. In the bounding case considered here, each population exchanges individuals with all the others. First, the migration rate is set to the maximum value possible, and later the analysis is refined to consider lower migration...
Article
A decision tree system that is part of a parallel object-oriented pattern recognition system, which in turn is part of an object-oriented data mining system. A decision tree process includes the step of reading the data. If necessary, the data is sorted. A potential split of the data is evaluated according to some criterion. An initial split of the...
Article
Full-text available
High-resolution computer simulations produce large volumes of data. As a first step in the analysis of these data, supervised machine learning techniques can be used to retrieve objects similar to a query that the user finds interesting. These objects may be characterized by a large number of features, some of which may be redundant or irrelevant t...
Article
This paper describes the design and implementation of a general-purpose anomaly detector for streaming data. Based on a survey of similar work from the literature, a basic anomaly detector builds a model on normal data, compares this model to incoming data, and uses a threshold to determine when the incoming data represent an anomaly. Models compac...
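The three parts named in this abstract are a model of normal data, a comparison with incoming data, and a threshold. They can be shown in minimal form. This sketch assumes a per-feature Gaussian model and a maximum z-score as the comparison. The paper's models may differ, and `GaussianAnomalyDetector` is a hypothetical name.

```python
import math
from typing import List, Sequence


class GaussianAnomalyDetector:
    """Flag points whose largest per-feature z-score exceeds a threshold."""

    def __init__(self, threshold: float = 3.0):
        self.threshold = threshold
        self.means: List[float] = []
        self.stds: List[float] = []

    def fit(self, normal_data: Sequence[Sequence[float]]) -> "GaussianAnomalyDetector":
        # Model of normal behaviour: mean and standard deviation of each feature.
        n = len(normal_data)
        columns = list(zip(*normal_data))
        self.means = [sum(col) / n for col in columns]
        # A constant feature gets a tiny std, so any change in it scores as extreme.
        self.stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1e-12
                     for col, m in zip(columns, self.means)]
        return self

    def score(self, x: Sequence[float]) -> float:
        # Comparison step: worst-case distance from the model in std units.
        return max(abs(v - m) / s for v, m, s in zip(x, self.means, self.stds))

    def is_anomaly(self, x: Sequence[float]) -> bool:
        # Threshold step.
        return self.score(x) > self.threshold
```

For streaming data, the model is fitted once on known-normal data and `is_anomaly` is called on each arriving point.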
