Article
... The OPSWTW assumes stochastic times, making it challenging to solve using conventional solvers and standard search algorithms, such as (A)LNS, due to the need to perform many evaluations. Due to its complexity, this problem was selected for the IJCAI AI4TSP competition (Zhang et al. 2023). We summarize the contributions of our work as follows: ...
... We show how to apply the proposed DR-ALNS to solve the Orienteering Problem with Stochastic Weights and Time Windows (OPSWTW). This problem, first introduced by Verbeeck, Vansteenwegen, and Aghezzaf (2016) and used in the IJCAI AI4TSP competition (Zhang et al. 2023), poses several challenges, such as unknown travel costs between locations, limited travel time, and time windows for customer visitation. In OPSWTW, each customer is represented as a node with a designated prize and time window for visitation, and the objective is to maximize expected collected prizes while respecting the time budget and time windows. ...
... We evaluate the performance of different algorithms on instances of the OPSWTW problem (Zhang et al. 2023). ALNS-Vanilla, ALNS-BO and DR-ALNS are initialized with an empty route, with a solution quality of 0.00. ...
Article
Full-text available
The Adaptive Large Neighborhood Search (ALNS) algorithm has shown considerable success in solving combinatorial optimization problems (COPs). Nonetheless, the performance of ALNS relies on the proper configuration of its selection and acceptance parameters, which is known to be a complex and resource-intensive task. To address this, we introduce a Deep Reinforcement Learning (DRL) based approach called DR-ALNS that selects operators, adjusts parameters, and controls the acceptance criterion throughout the search. The proposed method aims to learn, based on the state of the search, to configure ALNS for the next iteration to yield more effective solutions for the given optimization problem. We evaluate the proposed method on an orienteering problem with stochastic weights and time windows, as presented in an IJCAI competition. The results show that our approach outperforms vanilla ALNS, ALNS tuned with Bayesian optimization, and two state-of-the-art DRL approaches that were the winning methods of the competition, achieving this with significantly fewer training observations. Furthermore, we demonstrate several good properties of the proposed DR-ALNS method: it is easily adapted to solve different routing problems, its learned policies perform consistently well across various instance sizes, and these policies can be directly applied to different problem variants.
... In fact, to account for constraint violations (e.g., maximum allowed tour duration, time window violations, unvisited customers), the method returns a reward and a penalty for every environment step (Zhang et al. 2023a). The ability to define a penalty, alongside and separately from the reward, is especially beneficial for scenarios that aim at minimizing overall travel time or distance. ...
Preprint
Full-text available
Research on Reinforcement Learning (RL) approaches for discrete optimization problems has increased considerably, extending RL to an area classically dominated by Operations Research (OR). Vehicle routing problems are a good example of discrete optimization problems with high practical relevance where RL techniques have had considerable success. Despite these advances, open-source development frameworks remain scarce, hampering both the testing of algorithms and the ability to objectively compare results. This ultimately slows down progress in the field and limits the exchange of ideas between the RL and OR communities. Here we propose a library composed of multi-agent environments that simulates classic vehicle routing problems. The library, built on PyTorch, provides a flexible modular architecture design that allows easy customization and incorporation of new routing problems. It follows the Agent Environment Cycle ("AEC") games model and has an intuitive API, enabling rapid adoption and easy integration into existing reinforcement learning frameworks. The library allows for a straightforward use of classical OR benchmark instances in order to narrow the gap between the test beds for algorithm benchmarking used by the RL and OR communities. Additionally, we provide benchmark instance sets for each environment, as well as baseline RL models and training code.
Article
Full-text available
The dung beetle optimizer (DBO) is a novel meta-heuristic algorithm that imitates the habits of dung beetles. However, parameter changes in the DBO affect the stability of its results. Because the shrinking boundary is likely to produce overlapping solutions, the algorithm can eventually become trapped in local optima. To overcome these weaknesses, we propose BGADBO, an integrated variant of DBO that combines an adaptive strategy, a dynamic-boundary individual position micro-adjustment strategy, and a mutation strategy. First, an adaptive strategy is applied to overcome the instability caused by parameter changes. Then, a linear scaling method that adjusts the position of individuals within the dynamic boundary enriches the population diversity. A dynamic learning mechanism is introduced to enhance the adaptive capability of individuals when adjusting their positions. Finally, a Gaussian mutation mechanism is introduced to improve the algorithm's ability to escape local optima. In the experiments, we use the CEC2005 and CEC2019 benchmark functions to verify the performance of the proposed algorithm. In addition, BGADBO is applied to several engineering optimization problems and feature selection (FS) problems to evaluate its application value. The experimental results indicate the superior performance of the proposed algorithm compared with the DBO and other well-established algorithms.
Conference Paper
Full-text available
The travelling thief problem (TTP) is a multi-component optimisation problem involving two interdependent NP-hard components: the travelling salesman problem (TSP) and the knapsack problem (KP). Recent state-of-the-art TTP solvers modify the underlying TSP and KP solutions in an iterative and interleaved fashion. The TSP solution (cyclic tour) is typically changed in a deterministic way, while changes to the KP solution typically involve a random search, effectively resulting in a quasi-meandering exploration of the TTP solution space. Once a plateau is reached, the iterative search of the TTP solution space is restarted by using a new initial TSP tour. We propose to make the search more efficient through an adaptive surrogate model (based on a customised form of Support Vector Regression) that learns the characteristics of initial TSP tours that lead to good TTP solutions. The model is used to filter out non-promising initial TSP tours, in effect reducing the amount of time spent to find a good TTP solution. Experiments on a broad range of benchmark TTP instances indicate that the proposed approach filters out a considerable number of non-promising initial tours, at the cost of missing only a small number of the best TTP solutions.
Article
Full-text available
Combinatorial optimization has found applications in numerous fields, from aerospace to transportation planning and economics. The goal is to find an optimal solution among a finite set of possibilities. The well-known challenge one faces with combinatorial optimization is the state-space explosion problem: the number of possibilities grows exponentially with the problem size, which makes solving intractable for large problems. In recent years, deep reinforcement learning (DRL) has shown its promise for designing good heuristics for solving NP-hard combinatorial optimization problems. However, current approaches have an important shortcoming: they only provide an approximate solution with no systematic way to improve it or to prove optimality. In another context, constraint programming (CP) is a generic tool for solving combinatorial optimization problems. Based on a complete search procedure, it will always find the optimal solution if given a large enough execution time. A critical design choice that makes CP non-trivial to use in practice is the branching decision, which directs how the search space is explored. In this work, we propose a general and hybrid approach, based on DRL and CP, for solving combinatorial optimization problems. The core of our approach is a dynamic programming formulation that acts as a bridge between both techniques. We experimentally show that our solver is efficient at solving three challenging problems: the traveling salesman problem with time windows, the 4-moments portfolio optimization problem, and the 0-1 knapsack problem. The results show that the framework introduced outperforms the stand-alone RL and CP solutions, while being competitive with industrial solvers.
Article
Full-text available
Metaheuristics have been developed to provide general purpose approaches for solving hard combinatorial problems. While these frameworks often serve as the starting point for the development of problem-specific search procedures, they very rarely work efficiently in their default state. We combine the ideas of reactive search, which adjusts key parameters during search, and algorithm configuration, which fine-tunes algorithm parameters for a given set of problem instances, for the automatic compilation of a portfolio of highly reactive dialectic search heuristics for MaxSAT. Even though the dialectic search metaheuristic knows nothing more about MaxSAT than how to evaluate the cost of a truth assignment, our automatically generated solver defines a new state of the art for random weighted partial MaxSAT instances. Moreover, when combined with an industrial MaxSAT solver, the self-assembled reactive portfolio was able to win four out of nine gold medals at the recent 2016 MaxSAT Evaluation on random, crafted, and industrial partial and weighted-partial MaxSAT instances.
Conference Paper
Full-text available
We present NeuroLKH, a novel algorithm that combines deep learning with the strong traditional heuristic Lin-Kernighan-Helsgaun (LKH) for solving Traveling Salesman Problem. Specifically, we train a Sparse Graph Network (SGN) with supervised learning for edge scores and unsupervised learning for node penalties, both of which are critical for improving the performance of LKH. Based on the output of SGN, NeuroLKH creates the edge candidate set and transforms edge distances to guide the searching process of LKH. Extensive experiments firmly demonstrate that, by training one model on a wide range of problem sizes, NeuroLKH significantly outperforms LKH and generalizes well to much larger sizes. Also, we show that NeuroLKH can be applied to other routing problems such as Capacitated Vehicle Routing Problem (CVRP), Pickup and Delivery Problem (PDP), and CVRP with Time Windows (CVRPTW).
Article
Full-text available
Recent works using deep learning to solve routing problems such as the traveling salesman problem (TSP) have focused on learning construction heuristics. Such approaches find good quality solutions but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved until reaching a near-optimal one. In this work, we propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning. We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution. Moreover, we introduce a policy neural network that leverages a pointing attention mechanism, which can be easily extended to more general k-opt moves. Our results show that the learned policies can improve even over random initial solutions and approach near-optimal solutions faster than previous state-of-the-art deep learning methods for the TSP. We also show we can adapt the proposed method to two extensions of the TSP: the multiple TSP and the Vehicle Routing Problem, achieving results on par with classical heuristics and learned methods.
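To make the 2-opt operator mentioned above concrete, the following is a minimal sketch of a single 2-opt move and a plain first-improvement local search built on it. It is not the learned policy from the abstract: the move selection here is an exhaustive scan rather than a neural policy, and the function names (`tour_length`, `two_opt_move`, `two_opt_local_search`) and the four-point instance are assumptions made for illustration.

```python
import math


def tour_length(points, tour):
    """Length of the closed tour that visits `points` in the order given by `tour`."""
    n = len(tour)
    return sum(math.dist(points[tour[k]], points[tour[(k + 1) % n]]) for k in range(n))


def two_opt_move(tour, i, j):
    """Reverse the segment tour[i..j] (inclusive), which swaps two edges of the tour."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def two_opt_local_search(points, tour):
    """Apply improving 2-opt moves until none is left (hand-coded, not learned)."""
    best = list(tour)
    best_len = tour_length(points, best)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = two_opt_move(best, i, j)
                cand_len = tour_length(points, candidate)
                if cand_len < best_len - 1e-12:
                    best, best_len = candidate, cand_len
                    improved = True
    return best, best_len


# Hypothetical instance: corners of the unit square, visited in a self-crossing order.
points = [(0, 0), (1, 1), (1, 0), (0, 1)]
best_tour, best_len = two_opt_local_search(points, [0, 1, 2, 3])
```

A learned approach such as the one described would replace the double loop with a policy that proposes the pair (i, j) directly from the current solution.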
Article
Full-text available
When a black-box optimization objective can only be evaluated with costly or noisy measurements, most standard optimization algorithms are unsuited to find the optimal solution. Specialized algorithms that deal with exactly this situation make use of surrogate models. These models are usually continuous and smooth, which is beneficial for continuous optimization problems, but not necessarily for combinatorial problems. However, by choosing the basis functions of the surrogate model in a certain way, we show that it can be guaranteed that the optimal solution of the surrogate model is integer. This approach outperforms random search, simulated annealing and a Bayesian optimization algorithm on the problem of finding robust routes for a noise-perturbed traveling salesman benchmark problem, with similar performance as another Bayesian optimization algorithm, and outperforms all compared algorithms on a convex binary optimization problem with a large number of variables.
Chapter
Full-text available
Local search metaheuristics have been developed as a general tool for solving hard combinatorial search problems. However, in practice, metaheuristics very rarely work straight out of the box. An expert is frequently needed to experiment with an approach and tweak parameters, remodel the problem, and adjust search concepts to achieve a reasonably effective approach. Reactive search techniques aim to liberate the user from having to manually tweak all of the parameters of their approach. In this paper, we focus on one of the most well-known and widely used reactive techniques, reactive tabu search (RTS) [7], and propose a hyper-parameterized tabu search approach that dynamically adjusts key parameters of the search using a learned strategy. Experiments on MaxSAT show that this approach can lead to state-of-the-art performance without any expert user involvement, even when the metaheuristic knows nothing more about the underlying combinatorial problem than how to evaluate the objective function.
Article
Full-text available
We present an end-to-end framework for solving Vehicle Routing Problem (VRP) using deep reinforcement learning. In this approach, we train a single model that finds near-optimal solutions for problem instances sampled from a given distribution, only by observing the reward signals and following feasibility rules. Our model represents a parameterized stochastic policy, and by applying a policy gradient algorithm to optimize its parameters, the trained model produces the solution as a sequence of consecutive actions in real time, without the need to re-train for every new problem instance. Our method is faster in both training and inference than a recent method that solves the Traveling Salesman Problem (TSP), with nearly identical solution quality. On the more general VRP, our approach outperforms classical heuristics on medium-sized instances in both solution quality and computation time (after training). Our proposed framework can be applied to variants of the VRP such as the stochastic VRP, and has the potential to be applied more generally to combinatorial optimization problems.
Article
Full-text available
Real-world optimization problems may require time consuming and expensive measurements or simulations. Recently, the application of surrogate model-based approaches was extended from continuous to combinatorial spaces. This extension is based on the utilization of suitable distance measures such as Hamming or Swap Distance. In this work, such an extension is implemented for Kriging (Gaussian Process) models. Kriging provides a measure of uncertainty when determining predictions. This can be harnessed to calculate the Expected Improvement (EI) of a candidate solution. In continuous optimization, EI is used in the Efficient Global Optimization (EGO) approach to balance exploitation and exploration for expensive optimization problems. Employing the extended Kriging model, we show for the first time that EGO can successfully be applied to combinatorial optimization problems. We describe necessary adaptations and arising issues as well as experimental results on several test problems. All surrogate models are optimized with a Genetic Algorithm (GA). To yield a comprehensive comparison, EGO and Kriging are compared to an earlier suggested Radial Basis Function Network, a linear modeling approach, as well as model-free optimization with random search and GA. EGO clearly outperforms the competing approaches on most of the tested problem instances.
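The Expected Improvement criterion referred to above has a standard closed form when the surrogate's prediction at a candidate is Gaussian. The sketch below computes it for a minimization problem; the function name and argument names are assumptions for illustration, and the predictive mean and standard deviation would come from whatever Kriging model is in use, which is not shown here.

```python
import math


def expected_improvement(mean, std, best):
    """Expected Improvement for minimization under a Gaussian prediction N(mean, std^2).

    `best` is the lowest objective value observed so far.
    EI = (best - mean) * Phi(z) + std * phi(z), with z = (best - mean) / std.
    """
    if std <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(best - mean, 0.0)
    z = (best - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    return (best - mean) * cdf + std * pdf
```

For a fixed predicted mean, the value grows with the predictive standard deviation, which is how the criterion rewards exploration of uncertain candidates alongside exploitation of promising ones.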
Conference Paper
Full-text available
Different solution approaches for combinatorial problems often exhibit incomparable performance that depends on the concrete problem instance to be solved. Algorithm portfolios aim to combine the strengths of multiple algorithmic approaches by training a classifier that selects or schedules solvers dependent on the given instance. We devise a new classifier that selects solvers based on a cost-sensitive hierarchical clustering model. Experimental results on SAT and MaxSAT show that the new method outperforms the most effective portfolio builders to date.
Article
Full-text available
This paper presents the development of new elimination tests which greatly enhance the performance of a relatively well established dynamic programming approach and its application to the minimization of the total traveling cost for the traveling salesman problem with time windows. The tests take advantage of the time window constraints to significantly reduce the state space and the number of state transitions. These reductions are performed both a priori and during the execution of the algorithm. The approach does not experience problems stemming from increasing problem size, wider or overlapping time windows, or an increasing number of states nearly as rapidly as other methods. Our computational results indicate that the algorithm was successful in solving problems with up to 200 nodes and fairly wide time windows. When the density of the nodes in the geographical region was kept constant as the problem size was increased, the algorithm was capable of solving problems with up to 800 nodes. For these problems, the CPU time increased linearly with problem size. These problem sizes are much larger than those of problems previously reported in the literature.
Chapter
Full-text available
Reactive Search Optimization advocates the integration of sub-symbolic machine learning techniques into search heuristics for solving complex optimization problems. The word reactive hints at a ready response to events during the search through an internal online feedback loop for the self-tuning of critical parameters. Methodologies of interest for Reactive Search Optimization include machine learning and statistics, in particular reinforcement learning, active or query learning, neural networks, and meta-heuristics (although the boundary signalled by the “meta” prefix is not always clear).
Conference Paper
Full-text available
We introduce Hegel and Fichte’s dialectic as a search meta-heuristic for constraint satisfaction and optimization. Dialectic is an appealing mental concept for local search as it tightly integrates and yet clearly marks off of one another the two most important aspects of local search algorithms, search space exploration and exploitation. We believe that this makes dialectic search easy to use for general computer scientists and non-experts in optimization. We illustrate dialectic search, its simplicity and great efficiency on four problems from three different problem domains: constraint satisfaction, continuous optimization, and combinatorial optimization.
Conference Paper
Full-text available
A problem that is inherent to the development and efficient use of solvers is that of tuning parameters. The CP community has a long history of addressing this task automatically. We propose a robust, inherently parallel genetic algorithm for the problem of configuring solvers automatically. In order to cope with the high costs of evaluating the fitness of individuals, we introduce a gender separation whereby we apply different selection pressure on both genders. Experimental results on a selection of SAT solvers show significant performance and robustness gains over the current state-of-the-art in automatic algorithm configuration.
Conference Paper
Full-text available
We present a new method for instance-specific algorithm configuration (ISAC). It is based on the integration of the algorithm configuration system GGA and the recently proposed stochastic offline programming paradigm. ISAC is provided a solver with categorical, ordinal, and/or continuous parameters, a training benchmark set of input instances for that solver, and an algorithm that computes a feature vector that characterizes any given instance. ISAC then provides high quality parameter settings for any new input instance. Experiments on a variety of different constrained optimization and constraint satisfaction solvers show that automatic algorithm configuration vastly outperforms manual tuning. Moreover, we show that instance-specific tuning frequently leads to significant speed-ups over instance-oblivious configurations.
Conference Paper
Full-text available
In this paper, we introduce a new crossover operator for the permutation representation of a GA. This new operator, Non-Wrapping Order Crossover (NWOX), is a variation of the well-known Order Crossover (OX) operator. It strongly preserves relative order, as does the original OX, but also respects the absolute positions within the parent permutations. This crossover operator is experimentally compared to several other permutation crossover operators on an NP-hard problem known as weighted tardiness scheduling with sequence-dependent setups. A GA using this NWOX operator finds new best known solutions for several benchmark problem instances and proves to be superior to the previous best performing metaheuristic for the problem.
Article
Full-text available
In the Orienteering Problem (OP), we are given an undirected graph with edge weights and node prizes. The problem calls for a simple cycle whose total edge weight does not exceed a given threshold, while visiting a subset of nodes with maximum total prize. This NP-hard problem arises in routing and scheduling applications. We describe a branch-and-cut algorithm for finding an optimal OP solution. The algorithm is based on several families of valid inequalities. We also introduce a family of cuts, called conditional cuts, which can cut off the optimal OP solution, and propose an effective way to use them within the overall branch-and-cut framework. Exact and heuristic separation algorithms are described, as well as heuristic procedures to produce near-optimal OP solutions. An extensive computational analysis on several classes of both real-world and random instances is reported. The algorithm proved to be able to solve to optimality large-scale instances involving up to 500 nodes, within acceptable computing time. This compares favorably with previous published methods.
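The problem definition above can be illustrated with a small evaluator that, for a candidate cycle, sums the edge weights and node prizes and checks the weight threshold. This is only a sketch of the objective and constraint, not the branch-and-cut algorithm; the function name `evaluate_cycle`, the dictionary-based graph encoding, and the four-node instance are hypothetical.

```python
def evaluate_cycle(cycle, edge_weight, prize, budget):
    """Return (total_prize, total_weight, feasible) for a simple cycle.

    `cycle` lists the visited nodes in order; the edge back to the first node is implied.
    `edge_weight` maps frozenset({u, v}) to the weight of the undirected edge u-v.
    """
    if len(cycle) < 3 or len(set(cycle)) != len(cycle):
        raise ValueError("a simple cycle needs at least three distinct nodes")
    total_weight = 0.0
    for k in range(len(cycle)):
        u, v = cycle[k], cycle[(k + 1) % len(cycle)]
        total_weight += edge_weight[frozenset((u, v))]
    total_prize = sum(prize[node] for node in cycle)
    return total_prize, total_weight, total_weight <= budget


# Hypothetical 4-node instance (all values invented for illustration).
prize = {0: 0, 1: 5, 2: 3, 3: 4}
edge_weight = {
    frozenset((0, 1)): 2.0, frozenset((1, 2)): 2.0, frozenset((0, 2)): 3.0,
    frozenset((2, 3)): 1.0, frozenset((0, 3)): 2.0, frozenset((1, 3)): 4.0,
}
result_small = evaluate_cycle([0, 1, 2], edge_weight, prize, budget=7.0)
result_full = evaluate_cycle([0, 1, 2, 3], edge_weight, prize, budget=7.0)
result_over = evaluate_cycle([0, 1, 3], edge_weight, prize, budget=7.0)
```

In this toy instance the four-node cycle collects more prize than the three-node cycle at the same total weight, while the cycle through nodes 0, 1, and 3 exceeds the threshold.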
Article
Full-text available
The pickup and delivery problem with time windows is the problem of serving a number of transportation requests using a limited amount of vehicles. Each request involves moving a number of goods from a pickup location to a delivery location. Our task is to construct routes that visit all locations such that corresponding pickups and deliveries are placed on the same route, and such that a pickup is performed before the corresponding delivery. The routes must also satisfy time window and capacity constraints. This paper presents a heuristic for the problem based on an extension of the large neighborhood search heuristic previously suggested for solving the vehicle routing problem with time windows. The proposed heuristic is composed of a number of competing subheuristics that are used with a frequency corresponding to their historic performance. This general framework is denoted adaptive large neighborhood search. The heuristic is tested on more than 350 benchmark instances with up to 500 requests. It is able to improve the best known solutions from the literature for more than 50% of the problems. The computational experiments indicate that it is advantageous to use several competing subheuristics instead of just one. We believe that the proposed heuristic is very robust and is able to adapt to various instance characteristics.
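The idea of using competing subheuristics "with a frequency corresponding to their historic performance" is commonly realized with roulette-wheel selection over adaptive weights. The sketch below shows that mechanism in isolation. The exponential-smoothing update, the `reaction` parameter, and the scores are assumptions chosen for illustration and are not taken from the abstract.

```python
import random


def select_operator(weights, rng):
    """Roulette-wheel selection: index i is drawn with probability weights[i] / sum(weights)."""
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return index
    return len(weights) - 1  # guard against floating-point round-off


def update_weight(weight, score, reaction=0.1):
    """Move an operator's weight toward its latest score (assumed smoothing rule)."""
    return (1.0 - reaction) * weight + reaction * score


rng = random.Random(0)
weights = [1.0, 1.0, 1.0]  # three hypothetical subheuristics, initially equally likely
counts = [0, 0, 0]
for _ in range(3000):
    chosen = select_operator(weights, rng)
    counts[chosen] += 1
    # Hypothetical scoring: pretend only operator 2 tends to find improvements.
    score = 5.0 if chosen == 2 else 0.5
    weights[chosen] = update_weight(weights[chosen], score)
```

After the loop, the operator that scored well has both the largest weight and the largest selection count, while the others are still chosen occasionally, which keeps some diversity in the search.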
Article
Full-text available
Evolution strategies (ESs) are powerful probabilistic search and optimization algorithms gleaned from biological evolution theory. They have been successfully applied to a wide range of real world applications. The modern ESs are mainly designed for solving continuous parameter optimization problems. Their ability to adapt the parameters of the multivariate normal distribution used for mutation during the optimization run makes them well suited for this domain. In this article we describe and study mixed integer evolution strategies (MIES), which are natural extensions of ES for mixed integer optimization problems. MIES can deal with parameter vectors consisting not only of continuous variables but also with nominal discrete and integer variables. Following the design principles of the canonical evolution strategies, they use specialized mutation operators tailored for the aforementioned mixed parameter classes. For each type of variable, the choice of mutation operators is governed by a natural metric for this variable type, maximal entropy, and symmetry considerations. All distributions used for mutation can be controlled in their shape by means of scaling parameters, allowing self-adaptation to be implemented. After introducing and motivating the conceptual design of the MIES, we study the optimality of the self-adaptation of step sizes and mutation rates on a generalized (weighted) sphere model. Moreover, we prove global convergence of the MIES on a very general class of problems. The remainder of the article is devoted to performance studies on artificial landscapes (barrier functions and mixed integer NK landscapes), and a case study in the optimization of medical image analysis systems. In addition, we show that with proper constraint handling techniques, MIES can also be applied to classical mixed integer nonlinear programming problems.
Chapter
Routing problems are a class of combinatorial problems with many practical applications. Recently, end-to-end deep learning methods have been proposed to learn approximate solution heuristics for such problems. In contrast, classical dynamic programming (DP) algorithms guarantee optimal solutions, but scale badly with the problem size. We propose Deep Policy Dynamic Programming (DPDP), which aims to combine the strengths of learned neural heuristics with those of DP algorithms. DPDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions. We evaluate our framework on the travelling salesman problem (TSP), the vehicle routing problem (VRP) and TSP with time windows (TSPTW) and show that the neural policy improves the performance of (restricted) DP algorithms, making them competitive to strong alternatives such as LKH, while also outperforming most other ‘neural approaches’ for solving TSPs, VRPs and TSPTWs with 100 nodes.
Chapter
We present PyDGGA, a Python tool that implements a distributed version of the automatic algorithm configurator GGA, which is a specialized genetic algorithm to find high quality parameters for solvers and algorithms. PyDGGA implements GGA using an event-driven architecture and runs a simulation of future generations of the genetic algorithm to maximize the usage of the available computing resources. Overall, PyDGGA offers a friendly interface to deploy elastic distributed AC scenarios on shared high-performance computing clusters.
Chapter
Same-day delivery problems are challenging stochastic vehicle routing problems, where dynamically arriving orders have to be delivered to customers within a short time while minimizing costs. In this work, we consider the short-horizon planning of a problem variant where every order has to be delivered with the goal to minimize delivery tardiness, travel times, and labor costs of the drivers involved. Stochastic information, such as spatial and temporal order distributions, is available upfront. Since timely routing decisions have to be made over the planning horizon of a day, the well-known sampling approach from the literature for considering expected future orders is not suitable due to its high runtimes. To mitigate this, we suggest using a surrogate function for route durations that predicts the future delivery duration of the orders belonging to a route at its planned starting time. This surrogate function is directly used in the online optimization, replacing the myopic current route duration. The function is trained offline on data obtained from running full-day simulations, sampling and solving a number of scenarios for each route at each decision point in time. We consider three different models for the surrogate function and compare them with a sampling approach on challenging real-world inspired artificial instances. Results indicate that the new approach can outperform the sampling approach by orders of magnitude regarding runtime while significantly reducing travel costs in most cases.
Chapter
One method to solve expensive black-box optimization problems is to use a surrogate model that approximates the objective based on previous observed evaluations. The surrogate, which is cheaper to evaluate, is optimized instead to find an approximate solution to the original problem. In the case of discrete problems, recent research has revolved around discrete surrogate models that are specifically constructed to deal with these problems. A main motivation is that literature considers continuous methods, such as Bayesian optimization with Gaussian processes as the surrogate, to be sub-optimal (especially in higher dimensions) because they ignore the discrete structure by, e.g., rounding off real-valued solutions to integers. However, we claim that this is not true. In fact, we present empirical evidence showing that the use of continuous surrogate models displays competitive performance on a set of high-dimensional discrete benchmark problems, including a real-life application, against state-of-the-art discrete surrogate-based methods. Our experiments with different kinds of discrete decision variables and time constraints also give more insight into which algorithms work well on which type of problem.
Article
The Orienteering Problem with Time Windows (OPTW) is a combinatorial optimization problem where the goal is to maximize the total score collected from different visited locations. The application of neural network models to combinatorial optimization has recently shown promising results in dealing with similar problems, like the Travelling Salesman Problem. A neural network allows learning solutions using reinforcement learning or supervised learning, depending on the available data. After the learning stage, it can be generalized and quickly fine-tuned to further improve performance and personalization. The advantages are evident since, for real-world applications, solution quality, personalization, and execution times are all important factors that should be taken into account. This study explores the use of Pointer Network models trained using reinforcement learning to solve the OPTW problem. We propose a modified architecture that leverages Pointer Networks to better address problems related to dynamic time-dependent constraints. Among its various applications, the OPTW can be used to model the Tourist Trip Design Problem (TTDP). We train the Pointer Network with the TTDP problem in mind, by sampling variables that can change across tourists visiting a particular instance-region: starting position, starting time, available time, and the scores given to each point of interest. Once a model-region is trained, it can infer a solution for a particular tourist using beam search. We based the assessment of our approach on several existing benchmark OPTW instances. We show that it generalizes across different tourists that visit each region and that it generally outperforms the most commonly used heuristic, while computing the solution in realistic times.
Article
This paper surveys the recent attempts, both from the machine learning and operations research communities, at leveraging machine learning to solve combinatorial optimization problems. Given the hard nature of these problems, state-of-the-art algorithms rely on handcrafted heuristics for making decisions that are otherwise too expensive to compute or mathematically not well defined. Thus, machine learning looks like a natural candidate to make such decisions in a more principled and optimized way. We advocate for pushing further the integration of machine learning and combinatorial optimization and detail a methodology to do so. A main point of the paper is seeing generic optimization problems as data points and inquiring what is the relevant distribution of problems to use for learning on a given task.
Conference Paper
Learning how to automatically solve optimization problems has the potential to provide the next big leap in optimization technology. The performance of automatically learned heuristics on routing problems has been steadily improving in recent years, but approaches based purely on machine learning are still outperformed by state-of-the-art optimization methods. To close this performance gap, we propose a novel large neighborhood search (LNS) framework for vehicle routing that integrates learned heuristics for generating new solutions. The learning mechanism is based on a deep neural network with an attention mechanism and has been especially designed to be integrated into an LNS search setting. We evaluate our approach on the capacitated vehicle routing problem (CVRP) and the split delivery vehicle routing problem (SDVRP). On CVRP instances with up to 297 customers, our approach significantly outperforms an LNS that uses only handcrafted heuristics and a well-known heuristic from the literature. Furthermore, we show for the CVRP and the SDVRP that our approach surpasses the performance of existing machine learning approaches and comes close to the performance of state-of-the-art optimization approaches.
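For readers unfamiliar with the destroy-and-repair loop that LNS refers to, the following is a minimal sketch on a plain closed tour. It is not the learned framework described in the abstract above: the random removal operator, the cheapest-insertion repair, the improve-only acceptance rule, and all function names (`destroy`, `repair`, `lns`) are illustrative assumptions, and capacity constraints are ignored.

```python
import math
import random


def tour_length(tour, coords):
    """Length of the closed tour visiting coords in the given order."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def destroy(tour, k, rng):
    """Hypothetical destroy operator: remove k randomly chosen nodes."""
    removed = rng.sample(tour, k)
    kept = [c for c in tour if c not in removed]
    return kept, removed


def repair(partial, removed, coords):
    """Hypothetical repair operator: reinsert each node at its cheapest position."""
    tour = list(partial)
    for c in removed:
        best_pos = min(
            range(len(tour) + 1),
            key=lambda pos: tour_length(tour[:pos] + [c] + tour[pos:], coords),
        )
        tour.insert(best_pos, c)
    return tour


def lns(coords, iterations=200, k=3, seed=0):
    """Improve-only LNS: keep a candidate only if it shortens the tour."""
    rng = random.Random(seed)
    best = list(range(len(coords)))
    best_len = tour_length(best, coords)
    for _ in range(iterations):
        partial, removed = destroy(best, k, rng)
        candidate = repair(partial, removed, coords)
        candidate_len = tour_length(candidate, coords)
        if candidate_len < best_len:
            best, best_len = candidate, candidate_len
    return best, best_len
```

An adaptive variant (ALNS) would keep several destroy and repair operators and choose among them with weights updated from their past success; the sketch uses one of each to keep the loop visible.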
Article
The Team Orienteering Problem (TOP) is a well-known NP-Hard vehicle routing problem in which one maximizes the collected profits for visiting some nodes. In this paper, we propose a Hybrid Adaptive Large Neighborhood Search (HALNS) to solve this problem. Our algorithm combines the exploration power of ALNS with local search procedures and an optimization stage using a Set Packing Problem to further improve the solutions. Extensive computational experiments demonstrate the high performance of our HALNS outperforming all the competing algorithms in the literature on a large set of benchmark instances in terms of solution quality and/or computational time. Our HALNS identifies all the 387 Best Known Solutions (BKS) from the literature on a first dataset including small-scale benchmark instances and all the 333 BKS for large-scale benchmark instances within very short computational times. Moreover, we improve one large-scale instance solution.
Article
In this paper, we present a new variant of the Clustered Orienteering Problem (COP) that we refer to as the Clustered Team Orienteering Problem (CluTOP). In this problem, customers are grouped into subsets called clusters. A profit is associated with each cluster and is gained only if all of its customers are served. A set of available vehicles with a limited travel time collaborates in order to visit the customers of the clusters. The objective is to maximize the total collected profit with respect to a travel time limit. The first contribution of this paper consists of an exact method based on a cutting planes approach. This method includes the consideration of a set of valid inequalities. In particular, an incompatibility-cluster-based valid inequality is proposed. Moreover, a pre-processing procedure is considered in order to compute the incompatibilities between clusters. The second contribution is a hybrid heuristic that combines an Adaptive Large Neighborhood Search (ALNS) and an effective split procedure. These two components cooperate together for the purpose of exploring both direct representation and giant tours search spaces. Experimental results show that the cutting planes based algorithm outperforms the state-of-the-art exact methods, in the particular case of a single vehicle by solving 61 additional instances. Moreover, the hybrid heuristic succeeds in finding 38 new best known solutions for the case of one vehicle. For the case with multiple vehicles, new benchmark instances are generated based on those introduced for the COP. Regarding the performance of the methods, the heuristic method finds the optimal solution for almost all the instances solved by the exact method.
Chapter
The aim of the study is to provide interesting insights on how efficient machine learning algorithms could be adapted to solve combinatorial optimization problems in conjunction with existing heuristic procedures. More specifically, we extend the neural combinatorial optimization framework to solve the traveling salesman problem (TSP). In this framework, the city coordinates are used as inputs and the neural network is trained using reinforcement learning to predict a distribution over city permutations. Our proposed framework differs from the one in [1] since we do not make use of the Long Short-Term Memory (LSTM) architecture and we opted to design our own critic to compute a baseline for the tour length which results in more efficient learning. More importantly, we further enhance the solution approach with the well-known 2-opt heuristic. The results show that the performance of the proposed framework alone is generally as good as high performance heuristics (OR-Tools). When the framework is equipped with a simple 2-opt procedure, it could outperform such heuristics and achieve close to optimal results on 2D Euclidean graphs. This demonstrates that our approach based on machine learning techniques could learn good heuristics which, once being enhanced with a simple local search, yield promising results.
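The 2-opt heuristic mentioned above can be sketched in a few lines. This is a generic textbook version, assuming a closed tour over 2D Euclidean coordinates; it is not the authors' implementation, and the function names are illustrative.

```python
import math


def tour_length(tour, coords):
    """Length of the closed tour visiting coords in the given order."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def two_opt(tour, coords):
    """Reverse tour segments until no reversal shortens the tour."""
    best = list(tour)
    best_len = tour_length(best, coords)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reversing best[i..j] replaces two edges of the tour.
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                candidate_len = tour_length(candidate, coords)
                if candidate_len < best_len - 1e-12:
                    best, best_len = candidate, candidate_len
                    improved = True
    return best, best_len
```

For example, on the unit square with a self-crossing starting order, `two_opt([0, 1, 2, 3], [(0, 0), (1, 1), (1, 0), (0, 1)])` uncrosses the tour and returns a length of 4. Recomputing the full tour length for every candidate keeps the sketch short; a practical version would evaluate only the two changed edges.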
Conference Paper
Efficient Global Optimization (EGO) is an effective method to optimize expensive black-box functions and utilizes Kriging models (or Gaussian process regression) trained on a relatively small design data set. In real-world applications, such as experimental optimization, where a large data set is available, the EGO algorithm becomes computationally infeasible due to the time and space complexity of Kriging. Recently, the so-called Cluster Kriging methods have been proposed to reduce such complexities for big data, where data sets are clustered and Kriging models are built on each cluster. Furthermore, Kriging models are combined in an optimal way for the prediction. In addition, we analyze the Cluster Kriging landscape to adopt the existing infill-criteria, e.g., the expected improvement. The approach is tested on selected global optimization problems. It is shown by the empirical studies that this approach significantly reduces the CPU time of the EGO algorithm while maintaining the convergence rate of the algorithm.
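The expected improvement criterion named above has a closed form when the surrogate's prediction at a point is Gaussian. The sketch below is the standard formula for minimization, assuming a predictive mean `mu`, a predictive standard deviation `sigma`, and the best observed value `f_min`; these parameter names are illustrative and nothing here is specific to Cluster Kriging.

```python
import math


def expected_improvement(mu, sigma, f_min):
    """Expected improvement over f_min for a Gaussian prediction N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(f_min - mu, 0.0)
    z = (f_min - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return (f_min - mu) * cdf + sigma * pdf
```

The first term rewards points whose predicted mean is already below the incumbent, and the second rewards points where the model is uncertain, which is how the criterion trades off exploitation against exploration.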
Article
This paper introduces the stochastic time-dependent orienteering problem with time windows. The orienteering problem occurs in logistic situations where an optimal combination of locations must first be selected and then the routing between these selected locations must be optimized. In the stochastic time-dependent variant, the travel time between two locations is a stochastic function that depends on the departure time at the first location. The main contribution of this paper lies in the design of a fast and effective algorithm to solve this challenging problem. To validate the performance and the practical relevance of this proposed algorithm, several experiments were carried out on realistic benchmark instances of varying size and properties. These benchmark instances are constructed based on an actual large road network in Belgium with historic travel time profiles for every road segment.
Article
This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.
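As a small illustration of the update rule this abstract describes, the sketch below applies a REINFORCE-style update to a softmax policy on a multi-armed bandit: each parameter moves by learning rate times (reward minus baseline) times the gradient of the log-probability of the sampled action. The bandit setting, the running-average baseline, the step sizes, and the function names are assumptions chosen for brevity, not the connectionist networks treated in the article.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over a list of preferences."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards, steps=2000, lr=0.1, seed=0):
    """Train a softmax policy on a bandit with fixed per-arm rewards."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        advantage = reward - baseline
        for k in range(len(theta)):
            # d/d theta_k of log pi(action) for a softmax policy.
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log
        # Running-average baseline (an assumption) to reduce variance.
        baseline += 0.05 * (reward - baseline)
    return softmax(theta)
```

Calling `reinforce_bandit([0.0, 1.0])` returns action probabilities that concentrate on the second arm, without the code ever forming an explicit estimate of the gradient of expected reward.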
Article
We introduce a new neural architecture to learn the conditional probability of an output sequence with elements that are discrete tokens corresponding to positions in an input sequence. Such problems cannot be trivially addressed by existent approaches such as sequence-to-sequence and Neural Turing Machines, because the number of target classes in each step of the output depends on the length of the input, which is variable. Problems such as sorting variable sized sequences, and various combinatorial optimization problems belong to this class. Our model solves the problem of variable size output dictionaries using a recently proposed mechanism of neural attention. It differs from the previous attention attempts in that, instead of using attention to blend hidden units of an encoder to a context vector at each decoder step, it uses attention as a pointer to select a member of the input sequence as the output. We call this architecture a Pointer Net (Ptr-Net). We show Ptr-Nets can be used to learn approximate solutions to three challenging geometric problems -- finding planar convex hulls, computing Delaunay triangulations, and the planar Travelling Salesman Problem -- using training examples alone. Ptr-Nets not only improve over sequence-to-sequence with input attention, but also allow us to generalize to variable size output dictionaries. We show that the learnt models generalize beyond the maximum lengths they were trained on. We hope our results on these tasks will encourage a broader exploration of neural learning for discrete problems.
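The pointer mechanism can be sketched as additive attention whose softmax output is used directly as a distribution over input positions, so the number of output classes follows the input length. The sketch below assumes the usual additive scoring form, u_j = v · tanh(W1 e_j + W2 d); the weights are supplied by the caller rather than learned, and all names are illustrative.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def pointer_distribution(encoder_states, decoder_state, W1, W2, v):
    """Return a probability for each input position given the decoder state."""
    d_proj = matvec(W2, decoder_state)
    scores = []
    for e in encoder_states:
        e_proj = matvec(W1, e)
        scores.append(
            sum(vk * math.tanh(a + b) for vk, a, b in zip(v, e_proj, d_proj))
        )
    # Softmax over input positions; output size equals the input length.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]
```

Because the output has one entry per encoder state, the same weights can be applied to input sequences of any length, which is the property the abstract highlights.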
Learning to delegate for large-scale vehicle routing
  • S Li
  • Z Yan
  • C Wu
Learning a latent search space for routing problems using variational autoencoders
  • A Hottung
  • B Bhandari
  • K Tierney
An efficient graph convolutional network technique for the travelling salesman problem
  • C K Joshi
  • T Laurent
  • X Bresson
POMO: policy optimization with multiple optima for reinforcement learning
  • Y.-D Kwon
  • J Choo
  • B Kim
  • I Yoon
  • Y Gwon
  • S Min
Reinforcement learning with combinatorial actions: An application to vehicle routing
  • A Delarue
  • R Anderson
  • C Tjandraatmadja
A learning-based iterative method for solving vehicle routing problems
  • H Lu
  • X Zhang
  • S Yang
Learning to perform local rewriting for combinatorial optimization
  • Chen
Learning collaborative policies to solve NP-hard routing problems
  • M Kim
  • J Park
  • J Kim
Taking the human out of the loop: a review of Bayesian optimization
  • Shahriari
Learning 3-opt heuristics for traveling salesman problem via deep reinforcement learning
  • J Sui
  • S Ding
  • R Liu
  • L Xu
  • D Bu
Bayesian Optimization: Open source constrained global optimization tool for Python
  • F Nogueira
Model-based genetic algorithms for algorithm configuration
  • C Ansótegui
  • Y Malitsky
  • H Samulowitz
  • M Sellmann
  • K Tierney