Conference Paper

Neural Large Neighborhood Search for the Capacitated Vehicle Routing Problem

Authors: André Hottung, Kevin Tierney

Abstract

Learning how to automatically solve optimization problems has the potential to provide the next big leap in optimization technology. The performance of automatically learned heuristics on routing problems has been steadily improving in recent years, but approaches based purely on machine learning are still outperformed by state-of-the-art optimization methods. To close this performance gap, we propose a novel large neighborhood search (LNS) framework for vehicle routing that integrates learned heuristics for generating new solutions. The learning mechanism is based on a deep neural network with an attention mechanism and has been especially designed to be integrated into an LNS search setting. We evaluate our approach on the capacitated vehicle routing problem (CVRP) and the split delivery vehicle routing problem (SDVRP). On CVRP instances with up to 297 customers, our approach significantly outperforms an LNS that uses only handcrafted heuristics and a well-known heuristic from the literature. Furthermore, we show for the CVRP and the SDVRP that our approach surpasses the performance of existing machine learning approaches and comes close to the performance of state-of-the-art optimization approaches.
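The abstract outlines a destroy-and-repair (LNS) loop in which handcrafted operators remove parts of a solution and a learned, attention-based model rebuilds them. As a rough illustration of that control flow only (not the authors' implementation), the Python sketch below uses a random destroy operator and a cheapest-insertion repair as a stand-in for the neural repair model; all function names and data layouts are placeholders.

```python
import random
from copy import deepcopy

def route_cost(route, dist, depot=0):
    """Cost of a single route: depot -> customers -> depot."""
    tour = [depot] + route + [depot]
    return sum(dist[a][b] for a, b in zip(tour, tour[1:]))

def total_cost(routes, dist):
    return sum(route_cost(r, dist) for r in routes)

def random_destroy(routes, degree):
    """Handcrafted destroy operator: remove `degree` random customers."""
    routes = deepcopy(routes)
    customers = [c for r in routes for c in r]
    removed = random.sample(customers, min(degree, len(customers)))
    routes = [[c for c in r if c not in removed] for r in routes]
    return [r for r in routes if r], removed

def greedy_repair(routes, removed, dist, demand, capacity):
    """Stand-in for the learned repair model: cheapest feasible insertion."""
    for c in removed:
        best = None  # (delta_cost, route_index, position)
        for i, r in enumerate(routes):
            if sum(demand[x] for x in r) + demand[c] > capacity:
                continue
            for p in range(len(r) + 1):
                cand = r[:p] + [c] + r[p:]
                delta = route_cost(cand, dist) - route_cost(r, dist)
                if best is None or delta < best[0]:
                    best = (delta, i, p)
        if best is None:
            routes.append([c])            # open a new route if nothing fits
        else:
            _, i, p = best
            routes[i] = routes[i][:p] + [c] + routes[i][p:]
    return routes

def lns(routes, dist, demand, capacity, iters=200, degree=3):
    """Plain LNS loop: destroy, repair, accept improving solutions."""
    best = deepcopy(routes)
    for _ in range(iters):
        partial, removed = random_destroy(best, degree)
        candidate = greedy_repair(partial, removed, dist, demand, capacity)
        if total_cost(candidate, dist) < total_cost(best, dist):
            best = candidate
    return best
```

In the paper's setting, the repair step would be performed by the trained attention model rather than the greedy rule used here.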

... Da Costa et al. [111] extend this approach with dual encoding and pointing attention, allowing faster convergence to a near-optimal solution in TSP and CVRP. Hottung and Tierney [112] combine Ruin-Recreate and neural optimisation for CVRP, improving solution search by leveraging local information. We refer to these studies as learn-to-improve. ...
... Figure 5 and Table A2 (Appendix B) provide an overview of the reviewed studies, classified according to six major problem classes and their architectural and methodological structures. Vehicle Routing Problem [55,56,58,59,80,102,104–113,118,124–132]; Prediction and Forecasting [23,114–117,133,134]. The reviewed studies were classified according to the learning algorithms, applied model architectures, agent environment, and simulator setup strategy (Source: Own diagram). ...
... Network Optimisation [88,90]; Resource Assignment/Scheduling/Dispatching [13,68,69,71,79,82,87,89,91–94,119–122]; Resource Allocation/Balancing/Fleet Composition [77,97–99,123]; Batching and Sequencing [83,100,101]; Vehicle Routing Problem [55,56,58,59,80,102,104–113,118,124–132]; Prediction and Forecasting [23,114–117,133,134]. The reviewed studies were classified according to the learning algorithms, applied model architectures, agent environment, and simulator setup strategy (Source: Own diagram). ...
Article
Full-text available
Intermodal freight transport (IFT) requires a large number of optimisation measures to ensure its attractiveness. This involves numerous control decisions on different time scales, making integrated optimisation with traditional methods almost unfeasible. Recently, a new trend in optimisation science has emerged: the application of Deep Learning (DL) to combinatorial problems. Neural combinatorial optimisation (NCO) enables real-time decision-making under uncertainties by considering rich context information—a crucial factor for seamless synchronisation, optimisation, and, consequently, for the competitiveness of IFT. The objective of this study is twofold. First, we systematically analyse and identify the key actors, operations, and optimisation problems in IFT and categorise them into six major classes. Second, we collect and structure the key methodological components of the NCO framework, including DL models, training algorithms, design strategies, and review the current State of the Art with a focus on NCO and hybrid DL models. Through this synthesis, we integrate the latest research efforts from three closely related fields: optimisation, transport planning, and NCO. Finally, we critically discuss and outline methodological design patterns and derive potential opportunities and obstacles for learning-based frameworks for integrated optimisation problems. Together, these efforts aim to enable a better integration of advanced DL techniques into transport logistics. We hope that this will help researchers and practitioners in related fields to expand their intuition and foster the development of intelligent decision-making systems and algorithms for tomorrow’s transport systems.
... Variants of the Travelling Salesperson Problem (TSP) and Vehicle Routing Problem frequently appear in real-world applications [1,2,3,4]. The Prize-collecting Travelling Salesperson Problem (Pc-TSP) is a member of the family of TSPs with Profits [5]. ...
... (a) Suurballe's heuristic generates the initial tour (0, 1, 5, 9, 7, 2, 0) from a pair of vertex-disjoint paths (0, 1, 5, 9) and (0, 2, 7, 9). Vertices in green are in the tour. ...
Preprint
Full-text available
In many real-world routing problems, decision makers must optimise over sparse graphs such as transportation networks with non-metric costs on the edges that do not obey the triangle inequality. Motivated by finding a sufficiently long running route in a city that minimises the air pollution exposure of the runner, we study the Prize-collecting Travelling Salesperson Problem (Pc-TSP) on sparse graphs with non-metric costs. Given an undirected graph with a cost function on the edges and a prize function on the vertices, the goal of Pc-TSP is to find a tour rooted at the origin that minimises the total cost such that the total prize is at least some quota. First, we introduce heuristics designed for sparse graphs with non-metric cost functions where previous work dealt with either a complete graph or a metric cost function. Next, we develop a branch & cut algorithm that employs a new cut we call the disjoint-paths cost-cover (DPCC) cut. Empirical experiments on two datasets show that our heuristics can produce a feasible solution with less cost than a state-of-the-art heuristic from the literature. On datasets with non-metric cost functions, DPCC is found to solve more instances to optimality than the baseline cutting algorithm we compare against.
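Since Pc-TSP feasibility on a sparse, non-metric graph hinges on two checks, edge existence along the tour and the prize quota, a minimal evaluation routine makes the problem statement concrete. The following Python sketch is illustrative only; the data layout (an edge-cost dict keyed by vertex pairs, a prize dict, and a numeric quota) is an assumption, not the paper's implementation.

```python
def evaluate_pc_tsp_tour(tour, edge_cost, prize, quota, origin=0):
    """Evaluate a candidate Pc-TSP tour on a sparse, possibly non-metric graph.

    tour      : list of vertices, starting and ending at `origin`
    edge_cost : dict mapping frozenset({u, v}) -> edge cost (missing = no edge)
    prize     : dict mapping vertex -> prize
    quota     : minimum total prize the tour must collect
    Returns (feasible, total_cost, total_prize).
    """
    if tour[0] != origin or tour[-1] != origin:
        return False, None, None
    cost = 0.0
    for u, v in zip(tour, tour[1:]):
        edge = frozenset((u, v))
        if edge not in edge_cost:   # sparse graph: the edge may simply not exist
            return False, None, None
        cost += edge_cost[edge]
    collected = sum(prize[v] for v in set(tour))
    return collected >= quota, cost, collected
```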
... TAM [18] features a two-stage divide method to generate sub-route sequences to solve large-scale VRPs in real-time. Differently, improvement methods often hinge on a neighborhood search procedure such as node swap in [19], ruin-and-repair in [20], and 2-opt in [8], and often require longer run times than construction methods. The work in [5] extended the Transformer-styled model of [8] to the Dual-Aspect Collaborative Transformer (DACT), and achieved state-of-the-art performance, which was also competitive with the hybrid neural methods, e.g., the ones combined with differential evolution [21] and dynamic programming [22]. ...
... As for the construction model, its gradient norm is clipped to 1. Although ξ in Eq. (20) can regulate the strength of imitation learning, we actually use gradient clipping to control this. We set ξ = 1 and clip the gradient norm of imitation learning to 0.1, 0.1, and 0.01 for the three problem sizes. ...
Preprint
Full-text available
In this paper, we introduce Neural Collaborative Search (NCS), a novel learning-based framework for efficiently solving pickup and delivery problems (PDPs). NCS pioneers the collaboration between the latest prevalent neural construction and neural improvement models, establishing a collaborative framework where an improvement model iteratively refines solutions initiated by a construction model. Our NCS collaboratively trains the two models via reinforcement learning with an effective shared-critic mechanism. In addition, the construction model enhances the improvement model with high-quality initial solutions via curriculum learning, while the improvement model accelerates the convergence of the construction model through imitation learning. Besides the new framework design, we also propose the efficient Neural Neighborhood Search (N2S), an efficient improvement model employed within the NCS framework. N2S exploits a tailored Markov decision process formulation and two customized decoders for removing and then reinserting a pair of pickup-delivery nodes, thereby learning a ruin-repair search process for addressing the precedence constraints in PDPs efficiently. To balance the computation cost between encoders and decoders, N2S streamlines the existing encoder design through a light Synthesis Attention mechanism that allows the vanilla self-attention to synthesize various features regarding a route solution. Moreover, a diversity enhancement scheme is further leveraged to ameliorate the performance during the inference of N2S. Our NCS and N2S are both generic, and extensive experiments on two canonical PDP variants show that they can produce state-of-the-art results among existing neural methods. Remarkably, our NCS and N2S could surpass the well-known LKH3 solver especially on the more constrained PDP variant. Our code is available at: https://github.com/dtkon/PDP-NCS.
... Q-Learning is a model-free algorithm in which the Q-values can be learned using a table or a neural network, and the policy can be derived from the Q-values using an "ϵ-greedy" policy or a "softmax" policy [63]. Some deep learning models, such as [50], [64], [65], use auto-regressive decoding to create the solution to the VRP incrementally. In these studies, the RL is used to train a policy that selects the next node in the solution based on a reward function generated at each step. ...
... In another study on the application of ML to improve the performance of heuristics in solving VRPs, Hottung and Tierney [65] provide a novel large neighborhood search framework that incorporates learned heuristics for producing new solutions. The learning in their method is based on a deep neural network with an attention mechanism (AM). ...
Article
Full-text available
The vehicle routing problem (VRP) and its variants have been intensively studied by the operational research community. The existing surveys and the majority of the published articles tackle traditional solutions, including exact methods, heuristics, and meta-heuristics. Recently, machine learning (ML)-based methods have been applied to a variety of combinatorial optimization problems, specifically VRPs. The strong trend of using ML in VRPs and the gap in the literature motivated us to review the state-of-the-art. To provide a clear understanding of the ML-VRP landscape, we categorize the related studies based on their applications/constraints and technical details. We mainly focus on reinforcement learning (RL)-based approaches because of their importance in the literature, while we also address non RL-based methods. We cover both theoretical and practical aspects by clearly addressing the existing trends, research gap, and limitations and advantages of ML-based methods. We also discuss some of the potential future research directions.
... Additionally, experimentation faces limitations as the parameters interact, making grid or random search impractical for computationally expensive real-world problems. To address these issues, various machine learning approaches have been proposed that learn how to destroy or repair a solution (e.g., Hottung and Tierney (2020); Gao et al. (2020)). While these approaches have demonstrated good performance, they are problem-specific, making it difficult to adapt them to other problem variants. ...
... In parallel, a growing focus is on enhancing iterative search algorithms with ML techniques. Examples include the integration of neural networks with attention mechanisms as repair operators within the LNS framework to solve routing problems (Hottung and Tierney 2020) and the use of a neural network with domain-specific features for solution repair (Syed et al. 2019). Further advancements include (Sonnerat et al. 2021), which employs neural networks in node removal in a routing problem, and Gao et al. (2020), incorporating a graph attention network with edge embedding as an encoder to capture the effect of graph topology on the solution. ...
Article
Full-text available
The Adaptive Large Neighborhood Search (ALNS) algorithm has shown considerable success in solving combinatorial optimization problems (COPs). Nonetheless, the performance of ALNS relies on the proper configuration of its selection and acceptance parameters, which is known to be a complex and resource-intensive task. To address this, we introduce a Deep Reinforcement Learning (DRL) based approach called DR-ALNS that selects operators, adjusts parameters, and controls the acceptance criterion throughout the search. The proposed method aims to learn, based on the state of the search, to configure ALNS for the next iteration to yield more effective solutions for the given optimization problem. We evaluate the proposed method on an orienteering problem with stochastic weights and time windows, as presented in an IJCAI competition. The results show that our approach outperforms vanilla ALNS, ALNS tuned with Bayesian optimization, and two state-of-the-art DRL approaches that were the winning methods of the competition, achieving this with significantly fewer training observations. Furthermore, we demonstrate several good properties of the proposed DR-ALNS method: it is easily adapted to solve different routing problems, its learned policies perform consistently well across various instance sizes, and these policies can be directly applied to different problem variants.
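The DR-ALNS idea described above, a DRL policy that picks the destroy/repair operators and controls the acceptance criterion at every iteration, can be summarized with a short control-loop sketch. The Python below is a hedged schematic, not the authors' code: `policy` stands in for the trained agent, the state features are simplified, and operators are assumed to return new solutions rather than modify them in place.

```python
import math
import random

def dr_alns(initial, cost, destroy_ops, repair_ops, policy, iters=500):
    """Schematic ALNS loop whose configuration is chosen by a learned policy.

    `policy(state)` returns (destroy_index, repair_index, temperature) and is a
    placeholder for the trained DRL agent.
    """
    current, best = initial, initial
    for it in range(iters):
        state = {
            "progress": it / iters,
            "gap_to_best": cost(current) - cost(best),
        }
        d_idx, r_idx, temperature = policy(state)         # learned configuration
        candidate = repair_ops[r_idx](destroy_ops[d_idx](current))
        delta = cost(candidate) - cost(current)
        # Simulated-annealing acceptance with the policy-chosen temperature.
        if delta < 0 or random.random() < math.exp(-delta / max(temperature, 1e-9)):
            current = candidate
        if cost(current) < cost(best):
            best = current
    return best

def make_random_policy(n_destroy, n_repair):
    """Untrained stand-in for the DRL agent: random operators, fixed temperature."""
    def policy(state):
        return random.randrange(n_destroy), random.randrange(n_repair), 1.0
    return policy
```

A simulated-annealing acceptance rule is used here because the abstract mentions controlling the acceptance criterion; the exact criterion and state features in DR-ALNS may differ.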
... Within these fields, two main approaches for solving optimization problems have emerged: hybrid methods and end-to-end NCO. Hybrid methods enhance conventional heuristics with neural optimization techniques (Hottung & Tierney 2019; Lu et al. 2020; Da Costa et al. 2020). In contrast, end-to-end NCO methods construct solutions step-by-step using encoder-decoder policy networks. ...
Preprint
Full-text available
Finding a feasible and prompt solution to the Vehicle Routing Problem (VRP) is a prerequisite for efficient freight transportation, seamless logistics, and sustainable mobility. Traditional optimization methods reach their limits when confronted with the real-world complexity of VRPs, which involve numerous constraints and objectives. Recently, the ability of generative Artificial Intelligence (AI) to solve combinatorial tasks, known as Neural Combinatorial Optimization (NCO), demonstrated promising results, offering new perspectives. In this study, we propose an NCO approach to solve a time-constrained capacitated VRP with a finite vehicle fleet size. The approach is based on an encoder-decoder architecture, formulated in line with the Policy Optimization with Multiple Optima (POMO) protocol and trained via a Proximal Policy Optimization (PPO) algorithm. We successfully trained the policy with multiple objectives (minimizing the total distance while maximizing vehicle utilization) and evaluated it on medium and large instances, benchmarking it against state-of-the-art heuristics. The method is able to find adequate and cost-efficient solutions, showing both flexibility and robust generalization. Finally, we provide a critical analysis of the solution generated by NCO and discuss the challenges and opportunities of this new branch of intelligent learning algorithms emerging in optimization science, focusing on freight transportation.
... Then, a neural-based policy was learned to improve the current solution by iteratively selecting and swapping two promising positions in the route. [7] developed a learning model based on the REINFORCE approach by integrating an attention mechanism with a large neighborhood search (LNS). They applied this hybrid approach to the CVRP and the split delivery vehicle routing problem (SDVRP). ...
... The policy network is typically trained with supervised learning (SL) techniques [38,14,9,21,8,24] or reinforcement learning (RL) [3,17,27,12,22,15,28,40,29,42,13,43]. Both approaches have particular challenges: SL-based methods require a large number of high-quality expert solutions to be used as labels, which are usually obtained from existing (exact) solvers. ...
Chapter
Full-text available
The constructive approach within Neural Combinatorial Optimization (NCO) treats a combinatorial optimization problem as a finite Markov decision process, where solutions are built incrementally through a sequence of decisions guided by a neural policy network. To train the policy, recent research is shifting toward a ‘self-improved’ learning methodology that addresses the limitations of reinforcement learning and supervised approaches. Here, the policy is iteratively trained in a supervised manner, with solutions derived from the current policy serving as pseudo-labels. The way these solutions are obtained from the policy determines the quality of the pseudo-labels. In this paper, we present a simple and problem-independent sequence decoding method for self-improved learning based on sampling sequences without replacement. We incrementally follow the best solution found and repeat the sampling process from intermediate partial solutions. By modifying the policy to ignore previously sampled sequences, we force it to consider only unseen alternatives, thereby increasing solution diversity. Experimental results for the Traveling Salesman and Capacitated Vehicle Routing Problem demonstrate its strong performance. Furthermore, our method outperforms previous NCO approaches on the Job Shop Scheduling Problem.
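The self-improved training loop described above (sample several solutions from the current policy, keep the best as a pseudo-label, and imitate it) can be sketched generically. The Python below is a simplified outline under stated assumptions: the paper's specific contribution, sampling sequences without replacement and restarting from the best partial solution, is hidden inside the `sample_solutions` stub, and all function names are placeholders.

```python
def self_improved_training(policy, sample_instances, sample_solutions,
                           supervised_update, epochs=10, k=16):
    """Generic self-improved learning loop (simplified).

    sample_solutions(policy, instance, k) -> list of (solution, cost); in the
    paper these k candidates come from sampling sequences *without replacement*
    and re-sampling from the best partial solution, which is hidden in the stub.
    supervised_update(policy, batch) performs one round of imitation learning
    on (instance, pseudo_label) pairs and returns the updated policy.
    """
    for _ in range(epochs):
        batch = []
        for instance in sample_instances():
            candidates = sample_solutions(policy, instance, k)
            pseudo_label = min(candidates, key=lambda sc: sc[1])[0]  # best = label
            batch.append((instance, pseudo_label))
        policy = supervised_update(policy, batch)  # imitate the policy's own best outputs
    return policy
```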
... We compare the HNCO with several representative algorithms, including two classical COP solvers, LKH3 [6] and Google OR-Tools [7]; two CNCOs, i.e., the AM [1] and POMO [18]; and three PNCOs, namely, the NeuRewriter [35], NLNS [36], and L2I [33]. The experimental results on the synthetic CVRP instances and the instances from TSPLib and CVRPLib are shown in Tables 7 and 8, respectively. ...
Article
Full-text available
In recent years, the application of Neural Combinatorial Optimization (NCO) techniques in Combinatorial Optimization (CO) has emerged as a popular and promising research direction. Currently, there are mainly two types of NCO, namely, Constructive Neural Combinatorial Optimization (CNCO) and Perturbative Neural Combinatorial Optimization (PNCO). The CNCO generally trains an encoder-decoder model via supervised learning to construct solutions from scratch. It exhibits high speed in the construction process; however, it lacks the ability for sustained optimization due to the one-shot mapping, which limits its potential for application. Instead, the PNCO generally trains neural network models via deep reinforcement learning (DRL) to intelligently select appropriate human-designed heuristics to improve existing solutions. It can achieve high-quality solutions, but at the cost of high computational demand. To leverage the strengths of both approaches, we propose to hybridize the CNCO and PNCO by designing a hybrid framework comprising two stages, in which the CNCO is the first stage and the PNCO is the second. Specifically, in the first stage, we utilize the attention model to generate preliminary solutions for given CO instances. In the second stage, we employ DRL to intelligently select and combine appropriate algorithmic components from the improvement pool, perturbation pool, and prediction pool to continuously optimize the obtained solutions. Experimental results on synthetic and real Capacitated Vehicle Routing Problems (CVRPs) and Traveling Salesman Problems (TSPs) demonstrate the effectiveness of the proposed hybrid framework with the assistance of automated algorithm design.
... Constructive neural heuristics have been successfully applied to a wide range of COPs, including many variants of routing problems [18,28,30,31,52,57], scheduling problems [34,58], as well as classic COPs such as the Knapsack Problem [33] and the Maximum Independent Set [27,52]. Another successful category of neural CO methods is improvement heuristics, which focus on iteratively enhancing a given complete solution, typically by using a neural network to guide the selection of local search operators [13,22,40,56]. The seminal work of [27] introduces graph embeddings which work for a variety of CO problems but with limited performance. ...
Preprint
Full-text available
Machine Learning-based heuristics have recently shown impressive performance in solving a variety of hard combinatorial optimization problems (COPs). However, they generally rely on a separate neural model, specialized and trained for each single problem. Any variation of a problem requires adjustment of its model and re-training from scratch. In this paper, we propose GOAL (for Generalist combinatorial Optimization Agent Learning), a generalist model capable of efficiently solving multiple COPs and which can be fine-tuned to solve new COPs. GOAL consists of a single backbone plus light-weight problem-specific adapters, mostly for input and output processing. The backbone is based on a new form of mixed-attention blocks which allows handling problems defined on graphs with arbitrary combinations of node, edge and instance-level features. Additionally, problems which involve heterogeneous nodes or edges, such as in multi-partite graphs, are handled through a novel multi-type transformer architecture, where the attention blocks are duplicated to attend only the relevant combination of types while relying on the same shared parameters. We train GOAL on a set of routing, scheduling and classic graph problems and show that it is only slightly inferior to the specialized baselines while being the first multi-task model that solves a variety of COPs. Finally, we showcase the strong transfer learning capacity of GOAL by fine-tuning or learning the adapters for new problems, with only a few shots and little data.
... While the ML-guided rounding explored in this section is straightforward, the inspirations for ML-assisted diving and neighborhood search in this paper are drawn from the works of [140] and [53,85] respectively. The following references are highlighted for other aspects on this subject: [45,57,80,84,86,87,110,112,122,126,145,147,172]. ...
Preprint
Full-text available
Learning to Optimize (L2O) stands at the intersection of traditional optimization and machine learning, utilizing the capabilities of machine learning to enhance conventional optimization techniques. As real-world optimization problems frequently share common structures, L2O provides a tool to exploit these structures for better or faster solutions. This tutorial dives deep into L2O techniques, introducing how to accelerate optimization algorithms, promptly estimate the solutions, or even reshape the optimization problem itself, making it more adaptive to real-world applications. By considering the prerequisites for successful applications of L2O and the structure of the optimization problems at hand, this tutorial provides a comprehensive guide for practitioners and researchers alike.
... In contrast, a trainable rule selection policy selects an operation from a collection of feasible modification procedures in each iteration. Hottung and Tierney [36] employ predetermined manual procedures to systematically alter sections of the solution, which are then rebuilt using learned repair processes. Wu et al. [2] suggested using RL to select better solutions from defined local neighborhoods (e.g., 2-opt neighborhoods) for solving routing problems. ...
Article
Full-text available
Multi-objective optimization (MOO) endeavors to identify optimal solutions from a finite array of possibilities. In recent years, deep reinforcement learning (RL) has exhibited promise through its well-crafted heuristics in tackling NP-hard combinatorial optimization (CO) problems. Nonetheless, current methodologies grapple with two key challenges: (1) They primarily concentrate on single-objective optimization quandaries, rendering them less adaptable to the more prevalent MOO scenarios encountered in real-world applications. (2) These approaches furnish an approximate solution by imbibing heuristics, lacking a systematic means to enhance or substantiate optimality. Given these challenges, this study introduces an overarching hybrid strategy, dynamic programming with meta-reinforcement learning (DPML), to resolve MOO predicaments. The approach melds meta-learning into an RL framework, addressing multiple subproblems inherent to MOO. Furthermore, the precision of solutions is elevated by endowing exact dynamic programming with the prowess of meta-graph neural networks. Empirical results substantiate the supremacy of our methodology over previous RL and heuristics approaches, bridging the chasm between theoretical underpinnings and real-world applicability within this domain.
... Chen and Tian (2019); Li, Yan, and Wu (2021); Kim, Park, and Kim (2021); and Wang et al. (2021) leverage local solvers to rewrite partial tours to improve solutions. Some studies train existing local search solvers such as the 2-opt heuristic (da Costa et al. 2020; Wu et al. 2021), large neighborhood search (Hottung and Tierney 2020), iterative dynamic programming (Kool et al. 2021), and LKH (Xin et al. 2021b) using deep learning. Some studies use fine-tuning schemes focused on test-time adaptation in iterative learning (Hottung, Kwon, and Tierney 2021; Choo et al. 2022). ...
Article
Min-max routing problems aim to minimize the maximum tour length among multiple agents as they collaboratively visit all cities, i.e., the completion time. These problems include impactful real-world applications but are known to be NP-hard. Existing methods face challenges, particularly in large-scale problems that require the coordination of numerous agents to cover thousands of cities. This paper proposes Equity-Transformer to solve large-scale min-max routing problems. First, we model min-max routing problems as sequential planning, reducing the complexity and enabling the use of a powerful Transformer architecture. Second, we propose key inductive biases that ensure equitable workload distribution among agents. The effectiveness of Equity-Transformer is demonstrated through its superior performance in two representative min-max routing tasks: the min-max multi-agent traveling salesman problem (min-max mTSP) and the min-max multi-agent pick-up and delivery problem (min-max mPDP). Notably, our method achieves significant reductions in runtime (approximately 335 times) and cost values (about 53%) compared to a competitive heuristic (LKH3) in the case of 100 vehicles with 1,000 cities of mTSP. We provide reproducible source code: https://github.com/kaist-silab/equity-transformer.
... Furthermore, heuristic algorithms are often designed for a specific problem, so they may lack flexibility. In recent years, with the development of deep learning and reinforcement learning technology [8–14], neural combinatorial optimization has become more and more popular among researchers for its short inference time, high parallelism, and strong robustness [15–18]. However, neural combinatorial optimization algorithms often require elaborate designs, esoteric domain knowledge, and long training times. ...
Article
Full-text available
The vehicle routing problem (VRP) is a common problem in logistics and transportation with high application value. In the past, many methods have been proposed to solve the vehicle routing problem and have achieved good results, but with the development of neural network technology, solving the VRP through neural combinatorial optimization has attracted more and more attention from researchers because of its short inference time and high parallelism. PMOCO is a state-of-the-art multi-objective vehicle routing optimization algorithm. However, in PMOCO, preferences are often uniformly selected, which may lead to uneven Pareto sets and may reduce the quality of solutions. To solve this problem, we propose a multi-objective vehicle routing optimization algorithm based on preference adjustment, which is improved from PMOCO. We incorporate into PMOCO a weight adjustment method that is able to adapt to different approximate Pareto fronts and to find solutions with better quality. We treat the weight adjustment as a sequential decision process and train it through deep reinforcement learning. We find that our method can adaptively search for a better combination of preferences and has strong robustness. We evaluate our method on multi-objective vehicle routing problems and obtain good results (about a 6% improvement compared with PMOCO with 20 preferences).
... Lu et al. [12] use the attention-based model to select a local operator among a pool of operators to solve the capacitated vehicle routing problem (VRP). Using also an attention network, Hottung and Tierney [23] propose a neural large neighborhood search that suggests new solutions destroying and repairing parts of the current solution. ...
Article
Full-text available
Recent advances in graph neural network (GNN) architectures and increased computation power have revolutionized the field of combinatorial optimization (CO). Among the proposed models for CO problems, neural improvement (NI) models have been particularly successful. However, the existing NI approaches are limited in their applicability to problems where crucial information is encoded in the edges, as they only consider node features and nodewise positional encodings (PEs). To overcome this limitation, we introduce a novel NI model capable of handling graph-based problems where information is encoded in the nodes, edges, or both. The presented model serves as a fundamental component for hill-climbing-based algorithms that guide the selection of neighborhood operations for each iteration. Conducted experiments demonstrate that the proposed model can recommend neighborhood operations that outperform conventional versions for the preference ranking problem (PRP) with a performance in the 99th percentile. We also extend the proposal to two well-known problems: the traveling salesman problem and the graph partitioning problem (GPP), recommending operations in the 98th and 97th percentiles, respectively.
... Additionally, various deep learning strategies have been proposed to enhance the search efficiency of ALNS approaches. Hottung et al. [24] proposed a deep neural network model with an attention mechanism for repairing processes. Paulo et al. [25] proposed combining machine learning and two-opt operators to learn a stochastic policy to improve the solutions sequentially. ...
Article
Full-text available
Agile-satellite mission planning is a crucial issue in the construction of satellite constellations. The large scale of remote sensing missions and the high complexity of constraints in agile-satellite mission planning pose challenges in the search for an optimal solution. To tackle the issue, a dynamic destroy deep-reinforcement learning (D3RL) model is designed to facilitate subsequent optimization operations via adaptive destruction of the existing solutions. Specifically, we first perform a clustering and embedding operation to reconstruct tasks into a clustering graph, thereby improving data utilization. Secondly, the D3RL model is established based on graph attention networks (GATs) to enhance the search efficiency for optimal solutions. Moreover, we present two applications of the D3RL model for intensive scenes: the deep-reinforcement learning (DRL) method and the D3RL-based large-neighborhood search method (DRL-LNS). Experimental simulation results illustrate that the D3RL-based approaches outperform the competition in terms of solution quality and computational efficiency, particularly in more challenging large-scale scenarios. DRL-LNS outperforms ALNS with an average scheduling rate improvement of approximately 11% in Area instances. In contrast, the DRL approach performs better in World scenarios, with an average scheduling rate that is around 8% higher than that of ALNS.
... HGS is a hybrid genetic search for solving the capacitated vehicle routing problem (CVRP). The problem instance set is generated using the instance generation function of [31] based on XE [32] problems. All instance sets are generated to include concept drift as follows. ...
Article
Full-text available
A solver’s runtime and the quality of the solutions it generates are strongly influenced by its parameter settings. Finding good parameter configurations is a formidable challenge, even for fixed problem instance distributions. However, when the instance distribution can change over time, a once effective configuration may no longer provide adequate performance. Realtime algorithm configuration (RAC) offers assistance in finding high-quality configurations for such distributions by automatically adjusting the configurations it recommends based on instances seen so far. Existing RAC methods treat the solver as a black box, meaning the solver is given a configuration as input, and it outputs either a solution or runtime as an objective function for the configurator. However, analyzing intermediate output from the solver can enable configurators to avoid wasting time on poorly performing configurations. We propose a gray-box approach that utilizes intermediate output during evaluation and implement it within the RAC method Contextual Preselection with Plackett-Luce (CPPL blue). We apply cost-sensitive machine learning with pairwise comparisons to determine whether ongoing evaluations can be terminated to free resources. We compare our approach to a black-box equivalent on several experimental settings and show that our approach reduces the total solving time in several scenarios and improves solution quality in an additional scenario.
... Besides the TSP, the D&R or LNS framework has also shown strong ability for solving many other combinatorial optimization problems. For example, [Hottung and Tierney, 2019] combines LNS with machine learning to solve vehicle routing problems (VRPs) and reports highly competitive results with respect to state-of-the-art VRP algorithms. Successful applications of the LNS framework can also be found in integer programming, the dial-a-ride problem [Vallée et al., 2019], etc. ...
Preprint
Full-text available
For prohibitively large-scale Travelling Salesman Problems (TSPs), existing algorithms face big challenges in terms of both computational efficiency and solution quality. To address this issue, we propose a hierarchical destroy-and-repair (HDR) approach, which attempts to improve an initial solution by applying a series of carefully designed destroy-and-repair operations. A key innovative concept is the hierarchical search framework, which recursively fixes partial edges and compresses the input instance into a small-scale TSP under some equivalence guarantee. This neat search framework is able to deliver highly competitive solutions within a reasonable time. Fair comparisons based on nineteen famous large-scale instances (with 10,000 to 10,000,000 cities) show that HDR is highly competitive against existing state-of-the-art TSP algorithms, in terms of both efficiency and solution quality. Notably, on two large instances with 3,162,278 and 10,000,000 cities, HDR breaks the world records (i.e., best-known results regardless of computation time), which were previously achieved by LKH and its variants, while HDR is completely independent of LKH. Finally, ablation studies are performed to certify the importance and validity of the hierarchical search framework.
... One early example is the neural large neighborhood search (NLNS) of Hottung and Tierney [48], which integrates a learning model into the well-known large neighborhood search (LNS) algorithm. Specifically, the use of extended/large neighborhood structures has widely proved to be effective for obtaining high-quality solutions to CO problems [49], [50]. ...
Article
Full-text available
Traditional solvers for tackling combinatorial optimization (CO) problems are usually designed by human experts. Recently, there has been a surge of interest in utilizing deep learning, especially deep reinforcement learning, to automatically learn effective solvers for CO. The resultant new paradigm is termed neural combinatorial optimization (NCO). However, the advantages and disadvantages of NCO relative to other approaches have not been empirically or theoretically well studied. This work presents a comprehensive comparative study of NCO solvers and alternative solvers. Specifically, taking the traveling salesman problem as the testbed problem, the performance of the solvers is assessed in five aspects, i.e., effectiveness, efficiency, stability, scalability, and generalization ability. Our results show that the solvers learned by NCO approaches, in general, still fall short of traditional solvers in nearly all these aspects. A potential benefit of NCO solvers would be their superior time and energy efficiency for small-size problem instances when sufficient training instances are available. Hopefully, this work would help with a better understanding of the strengths and weaknesses of NCO and provide a comprehensive evaluation protocol for further benchmarking NCO approaches in comparison to other approaches.
Preprint
The rapid deployment of robotics technologies requires dedicated optimization algorithms to manage large fleets of autonomous agents. This paper supports robotic parts-to-picker operations in warehousing by optimizing order-workstation assignments, item-pod assignments and the schedule of order fulfillment at workstations. The model maximizes throughput, while managing human workload at the workstations and congestion in the facility. We solve it via large-scale neighborhood search, with a novel learn-then-optimize approach to subproblem generation. The algorithm relies on an offline machine learning procedure to predict objective improvements based on subproblem features, and an online optimization model to generate a new subproblem at each iteration. In collaboration with Amazon Robotics, we show that our model and algorithm generate much stronger solutions for practical problems than state-of-the-art approaches. In particular, our solution enhances the utilization of robotic fleets by coordinating robotic tasks for human operators to pick multiple items at once, and by coordinating robotic routes to avoid congestion in the facility.
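The learn-then-optimize scheme described above alternates between predicting the objective improvement of candidate subproblems with an offline-trained model and re-optimizing the most promising one. The Python sketch below is a simplified, hypothetical rendering: in the paper an online optimization model generates the next subproblem, whereas here a fixed candidate pool is scored; all function names are placeholders.

```python
def learn_then_optimize_search(solution, candidate_subproblems, features,
                               predict_improvement, solve_subproblem, iters=50):
    """Schematic learn-then-optimize neighborhood search.

    predict_improvement(x) -> predicted objective gain; in the paper this is a
    regressor trained offline on subproblem features, and a new subproblem is
    generated by an online optimization model rather than chosen from a fixed
    candidate pool as done here.
    """
    for _ in range(iters):
        candidates = candidate_subproblems(solution)
        scored = [(predict_improvement(features(solution, sub)), sub)
                  for sub in candidates]
        _, chosen = max(scored, key=lambda pair: pair[0])
        solution = solve_subproblem(solution, chosen)   # re-optimize the subproblem
    return solution
```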
Article
Electric vehicles (EVs) have been increasingly used in the logistics and transportation industry due to their cost-effectiveness and sustainability. However, one of the major challenges in optimizing routes for EVs is the Electric Vehicle Routing Problem (EVRP), which arises from their limited battery capacity. This paper proposes a Reinforcement Learning (RL)-based end-to-end framework to address EVRPs of different sizes. The framework includes a Graph Attention Network (GAT)-based encoder and an attention-based decoder. In particular, an improved GAT-based encoder is employed to encode node and edge information from the graph-structured EVRP instances, resulting in high-dimensional node embeddings and a graph embedding for downstream tasks. The decoder comprises a dual-layer attention module, which generates solutions (a sequence of input nodes) based on the global state and the embeddings from the encoder. This encoder-decoder architecture constitutes the policy network, which takes instances as input and produces solutions in an auto-regressive manner. The policy network is trained using REINFORCE with a baseline. The experiments indicate that the proposed Deep RL (DRL) method is more efficient at solving the problem than conventional methods (exact algorithms and heuristic algorithms) and shows superior performance to other DRL-based methods.
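The abstract states that the policy network is trained with REINFORCE and a baseline; the standard gradient estimator behind that phrase, written for a batch of B sampled instances, is

```latex
\nabla_\theta J(\theta) \;\approx\; \frac{1}{B} \sum_{i=1}^{B}
  \bigl( L(\pi_i \mid s_i) - b(s_i) \bigr) \,
  \nabla_\theta \log p_\theta(\pi_i \mid s_i)
```

where π_i is the route sampled for instance s_i, L is its cost, and b(s_i) is the baseline that reduces variance without biasing the gradient. The specific baseline used in this paper is not detailed in the abstract.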
Article
Full-text available
Current methods for end-to-end constructive Neural Combinatorial Optimization usually train a policy using behavior cloning from expert solutions or policy gradient methods from reinforcement learning. While behavior cloning is straightforward, it requires expensive expert solutions, and policy gradient methods are often computationally demanding and complex to fine-tune. In this work, we bridge the two and simplify the training process by sampling multiple solutions for random instances using the current model in each epoch and then selecting the best solution as an expert trajectory for supervised imitation learning. To achieve progressively improving solutions with minimal sampling, we introduce a method that combines round-wise Stochastic Beam Search with an update strategy derived from a provable policy improvement. This strategy refines the policy between rounds by utilizing the advantage of the sampled sequences with almost no computational overhead. We evaluate our approach on the Traveling Salesman Problem and the Capacitated Vehicle Routing Problem. The models trained with our method achieve comparable performance and generalization to those trained with expert data. Additionally, we apply our method to the Job Shop Scheduling Problem using a transformer-based architecture and outperform existing state-of-the-art methods by a wide margin.
Article
In order to alleviate urban congestion, improve vehicle mobility, and improve logistics delivery efficiency, this paper establishes a practical multi-objective, multi-constraint logistics delivery mathematical model based on graphs, and proposes a solution algorithm framework that combines a decomposition strategy and deep reinforcement learning (DRL). Firstly, taking into account practical constraints such as customer distribution, vehicle load limits, and time windows in urban logistics distribution regions, a multi-constraint, multi-objective urban logistics distribution model is established with the goal of minimizing the total length, cost, and maximum makespan of urban logistics delivery paths. Secondly, based on the decomposition strategy, a DRL framework for optimizing urban logistics delivery paths based on a Graph Capsule Network (G-Caps Net) is designed. This framework takes the node information of the VRP as input in the form of a 2D graph, modifies the graph attention capsule network by considering multi-layer features, edge information, and residual connections between layers in the graph structure, and replaces the probability calculation with the length of the capsule vector as output. Then, the REINFORCE algorithm with a rollout baseline is used for network training, and a 2-opt local search strategy and a sampling search strategy are used to improve the quality of the solution. Finally, the performance of the proposed method was evaluated on standard instances of different scales. The experimental results show that the constructed model and solution framework can improve logistics delivery efficiency. The method achieves the best overall performance, surpassing the most advanced existing methods, and has great potential in practical engineering.
Preprint
Although several surveys on Neural Combinatorial Optimization (NCO) solvers specifically designed to solve Vehicle Routing Problems (VRPs) have been conducted, these existing surveys did not cover the state-of-the-art (SOTA) NCO solvers that have emerged recently. More importantly, to provide a comprehensive taxonomy of NCO solvers with up-to-date coverage, based on our thorough review of relevant publications and preprints, we divide all NCO solvers into four distinct categories, namely Learning to Construct, Learning to Improve, Learning to Predict-Once, and Learning to Predict-Multiplicity solvers. Subsequently, we present the inadequacies of the SOTA solvers, including poor generalization, incapability to solve large-scale VRPs, inability to address most types of VRP variants simultaneously, and difficulty in comparing these NCO solvers with conventional Operations Research algorithms. Simultaneously, we propose promising and viable directions to overcome these inadequacies. In addition, we compare the performance of representative NCO solvers from the Reinforcement, Supervised, and Unsupervised Learning paradigms across both small- and large-scale VRPs. Finally, following the proposed taxonomy, we provide an accompanying web page as a live repository for NCO solvers. Through this survey and the live repository, we hope to make the research community of NCO solvers for VRPs more thriving.
Article
When utilizing end-to-end learn-to-construct methods to solve routing problems for multi-agent systems, the model is usually trained individually for different problem scales (i.e., the number of customers to be concurrently served within a map) to make the model adaptive to the corresponding scale, ensuring good solution quality. Otherwise, a model trained for one specific scale can lead to poor performance when applied to another scale, and this situation worsens as the scale discrepancy increases. Such a separate training strategy is inefficient and time-intensive. In this paper, we propose a Mix-scale learning framework that requires only a single training session, enabling the model to effectively plan high-quality routes for various problem scales. Based on the capacitated vehicle routing problem (CVRP), the test results reveal that, for problem scales both seen and unseen during training, our once-trained model can produce solution routes with performance comparable or even superior to that of individually trained models, and it offers the highest average solution quality, with improvement ratios ranging from 2.28% to 8.07%, which removes the need for a separate training session for each specific scale. Additionally, the extended comparison with individually trained models on the real-world benchmark dataset from CVRPLib further highlights our once-trained model's generalization performance across various problem scales and diverse node distributions.
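The mix-scale training idea, exposing a single model to several problem sizes in one training session instead of training one model per size, amounts to sampling instance scales when building each batch. A minimal Python sketch under assumed names (`make_instance` is a hypothetical CVRP instance generator) is:

```python
import random

def sample_mixed_scale_batch(make_instance, scales=(20, 50, 100), batch_size=64):
    """Illustrative mix-scale batching: each training batch mixes CVRP instances
    of several scales so a single policy is exposed to all of them, instead of
    training one model per problem size.  `make_instance(n)` is a hypothetical
    instance generator for n customers."""
    return [make_instance(random.choice(scales)) for _ in range(batch_size)]
```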
Article
The recent end-to-end neural solvers have shown promise for small-scale routing problems but suffered from limited real-time scaling-up performance. This paper proposes GLOP (Global and Local Optimization Policies), a unified hierarchical framework that efficiently scales toward large-scale routing problems. GLOP hierarchically partitions large routing problems into Travelling Salesman Problems (TSPs) and TSPs into Shortest Hamiltonian Path Problems. For the first time, we hybridize non-autoregressive neural heuristics for coarse-grained problem partitions and autoregressive neural heuristics for fine-grained route constructions, leveraging the scalability of the former and the meticulousness of the latter. Experimental results show that GLOP achieves competitive and state-of-the-art real-time performance on large-scale routing problems, including TSP, ATSP, CVRP, and PCTSP. Our code is available at: https://github.com/henry-yeh/GLOP.
Article
This paper provides a systematic overview of machine learning methods applied to solve NP-hard Vehicle Routing Problems (VRPs). Recently, there has been great interest from both the machine learning and operations research communities in solving VRPs either through pure learning methods or by combining them with traditional handcrafted heuristics. We present a taxonomy of studies on learning paradigms, solution structures, underlying models, and algorithms. Detailed results of state-of-the-art methods are presented, demonstrating their competitiveness with traditional approaches. The survey highlights the advantages of the machine learning-based models that aim to exploit the symmetry of VRP solutions. The paper outlines future research directions to incorporate learning-based solutions to address the challenges of modern transportation systems.
Chapter
IBM® ILOG® CP Optimizer (CPO) is a constraints solver that integrates multiple heuristics with the goal of handling a large diversity of combinatorial and scheduling problems while exposing a simple interface to users. CPO’s intent is to enable users to focus on problem modelling while automating the configuration of its optimization engine to solve the problem. For that purpose, CPO proposes an Auto search mode which implements a hard-coded logic to configure its search engine based on the runtime environment and some metrics computed on the input problem. This logic is the outcome of a mix of carefully designed rules and fine-tuning using experimental benchmarks. This paper explores the use of Machine Learning (ML) for the off-line configuration of CPO solver based on an analysis of problem instances. This data-driven effort has been motivated by the availability of a proprietary database of diverse benchmark problems that is used to evaluate and document CPO performance before each release. This work also addresses two methodological challenges: the ability of the trained predictive models to robustly generalize to the diverse set of problems that may be encountered in practice, and the integration of this new ML stage in the development workflow of the CPO product. Overall, this work resulted in a speedup improvement of about 14% (resp. 31%) on Combinatorial problems and about 5% (resp. 6%) on Scheduling problems when solving with 4 workers (resp. 8 workers), compared to the regular CPO solver.
Preprint
Homotopy optimization is a traditional method to deal with a complicated optimization problem by solving a sequence of easy-to-hard surrogate subproblems. However, this method can be very sensitive to the continuation schedule design and might lead to a suboptimal solution to the original problem. In addition, the intermediate solutions, often ignored by classic homotopy optimization, could be useful for many real-world applications. In this work, we propose a novel model-based approach to learn the whole continuation path for homotopy optimization, which contains infinite intermediate solutions for any surrogate subproblems. Rather than the classic unidirectional easy-to-hard optimization, our method can simultaneously optimize the original problem and all surrogate subproblems in a collaborative manner. The proposed model also supports real-time generation of any intermediate solution, which could be desirable for many applications. Experimental studies on different problems show that our proposed method can significantly improve the performance of homotopy optimization and provide extra helpful information to support better decision-making.
Article
Mixed integer linear programming (MILP) is an NP-hard problem, which can be solved by the branch and bound algorithm by dividing the original problem into several subproblems and forming a search tree. For each subproblem, linear programming (LP) relaxation can be solved to find the bound for making the following decisions. Recently, with the increasing dimension of MILPs in different applications, how to accelerate the solution process becomes a huge challenge. In this survey, we summarize techniques and trends to speed up MILP solving from two perspectives. First, we present different approaches in simplex initialization, which can help to accelerate the solution of LP relaxation for each subproblem. Second, we introduce the learning-based technologies in branch and bound algorithms to improve decision making in tree search. We also propose several potential directions and extensions to further enhance the efficiency of solving different MILP problems.
Article
Full-text available
We present an end-to-end framework for solving Vehicle Routing Problem (VRP) using deep reinforcement learning. In this approach, we train a single model that finds near-optimal solutions for problem instances sampled from a given distribution, only by observing the reward signals and following feasibility rules. Our model represents a parameterized stochastic policy, and by applying a policy gradient algorithm to optimize its parameters, the trained model produces the solution as a sequence of consecutive actions in real time, without the need to re-train for every new problem instance. Our method is faster in both training and inference than a recent method that solves the Traveling Salesman Problem (TSP), with nearly identical solution quality. On the more general VRP, our approach outperforms classical heuristics on medium-sized instances in both solution quality and computation time (after training). Our proposed framework can be applied to variants of the VRP such as the stochastic VRP, and has the potential to be applied more generally to combinatorial optimization problems.
Technical Report
Full-text available
This report describes the implementation of an extension of the Lin-Kernighan-Helsgaun TSP solver for solving constrained traveling salesman and vehicle routing problems. The extension, which is called LKH-3, is able to solve a variety of well-known problems, including the sequential ordering problem (SOP), the traveling repairman problem (TRP), variants of the multiple traveling salesman problem (mTSP), as well as vehicle routing problems (VRPs) with capacity, time windows, pickup-and-delivery and distance constraints. The implementation of LKH-3 builds on the idea of transforming the problems into standard symmetric traveling salesman problems and handling constraints by means of penalty functions. Extensive testing on benchmark instances from the literature has shown that LKH-3 is effective. Best known solutions are often obtained, and in some cases, new best solutions are found. The program is free of charge for academic and non-commercial use and can be downloaded in source code. A comprehensive library of benchmark instances and the best obtained results for these instances can also be downloaded.
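The penalty-function idea mentioned in the report can be illustrated as follows for a capacity constraint: violations are priced into the objective so that an unconstrained tour-improvement procedure can still compare candidate solutions. The penalty weight and tiny instance below are illustrative only.

# Sketch of the penalty-function idea: constraint violations (here, vehicle
# capacity) are added to the tour cost with a penalty weight, so an
# unconstrained improvement procedure can still rank solutions.
import math

def route_length(points, route, depot=(0.0, 0.0)):
    stops = [depot] + [points[i] for i in route] + [depot]
    return sum(math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in zip(stops, stops[1:]))

def penalized_cost(points, demand, routes, capacity, penalty=10.0):
    length = sum(route_length(points, r) for r in routes)
    excess = sum(max(0, sum(demand[i] for i in r) - capacity) for r in routes)
    return length + penalty * excess

points = {1: (1, 0), 2: (2, 1), 3: (0, 2)}
demand = {1: 4, 2: 5, 3: 3}
print(penalized_cost(points, demand, [[1, 2], [3]], capacity=8))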
Article
Full-text available
Many inverse problems are formulated as optimization problems over certain appropriate input distributions. Recently, there has been a growing interest in understanding the computational hardness of these optimization problems, not only in the worst case, but in an average-complexity sense under this same input distribution. In this note, we are interested in studying another aspect of hardness, related to the ability to learn how to solve a problem by simply observing a collection of previously solved instances. These are used to supervise the training of an appropriate predictive model that parametrizes a broad class of algorithms, with the hope that the resulting "algorithm" will provide good accuracy-complexity tradeoffs in the average sense. We illustrate this setup on the Quadratic Assignment Problem, a fundamental problem in Network Science. We observe that data-driven models based on Graph Neural Networks offer intriguingly good performance, even in regimes where standard relaxation based techniques appear to suffer.
Article
Full-text available
Recent research on the CVRP has been slowed down by the lack of a good set of benchmark instances. The existing sets suffer from at least one of the following drawbacks: (i) they became too easy for current algorithms; (ii) they are too artificial; (iii) they are too homogeneous, not covering the wide range of characteristics found in real applications. We propose a new set of 100 instances ranging from 100 to 1000 customers, designed to provide a more comprehensive and balanced experimental setting. Moreover, the same generating scheme was also used to provide an extended benchmark of 600 instances. In addition to having a greater discriminating ability to identify “which algorithm is better”, these new benchmarks should also allow for a deeper statistical analysis of the performance of an algorithm. In particular, they will enable one to investigate how the characteristics of an instance affect its performance. We report such an analysis on state-of-the-art exact and heuristic methods.
Article
Full-text available
Vehicle routing attributes are extra characteristics and decisions that complement the academic problem formulations and aim to properly account for real-life application needs. Hundreds of methods have been introduced in recent years for specific attributes, but the development of a single, general-purpose algorithm, which is both efficient and applicable to a wide family of variants remains a considerable challenge. Yet, such a development is critical for understanding the proper impact of attributes on resolution approaches, and to answer the needs of actual applications. This paper contributes towards addressing these challenges with a component-based design for heuristics, targeting multi-attribute vehicle routing problems, and an efficient general-purpose solver. The proposed Unified Hybrid Genetic Search metaheuristic relies on problem-independent unified local search, genetic operators, and advanced diversity management methods. Problem specifics are confined to a limited part of the method and are addressed by means of assignment, sequencing, and route-evaluation components, which are automatically selected and adapted and provide the fundamental operators to manage attribute specificities. Extensive computational experiments on 29 prominent vehicle routing variants, 42 benchmark instance sets and overall 1099 instances, demonstrate the remarkable performance of the method which matches or outperforms the current state-of-the-art problem-tailored algorithms. Thus, generality does not necessarily go against efficiency for these problem classes.
Article
Full-text available
The pickup and delivery problem with time windows is the problem of serving a number of transportation requests using a limited amount of vehicles. Each request involves moving a number of goods from a pickup location to a delivery location. Our task is to construct routes that visit all locations such that corresponding pickups and deliveries are placed on the same route, and such that a pickup is performed before the corresponding delivery. The routes must also satisfy time window and capacity constraints. This paper presents a heuristic for the problem based on an extension of the large neighborhood search heuristic previously suggested for solving the vehicle routing problem with time windows. The proposed heuristic is composed of a number of competing subheuristics that are used with a frequency corresponding to their historic performance. This general framework is denoted adaptive large neighborhood search. The heuristic is tested on more than 350 benchmark instances with up to 500 requests. It is able to improve the best known solutions from the literature for more than 50% of the problems. The computational experiments indicate that it is advantageous to use several competing subheuristics instead of just one. We believe that the proposed heuristic is very robust and is able to adapt to various instance characteristics.
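The adaptive selection mechanism described above, competing subheuristics chosen with a frequency tied to their historic performance, can be sketched as follows; the operator names, score values, and reaction factor are illustrative placeholders rather than the paper's exact parameters.

# Minimal sketch of adaptive operator selection: competing destroy/repair
# heuristics are drawn with probability proportional to weights that are
# updated from their recent performance. The operators here are dummies.
import random

random.seed(1)
operators = ["random_removal", "worst_removal", "related_removal"]
weights = {name: 1.0 for name in operators}
reaction = 0.2                       # how quickly weights track recent performance

def select_operator():
    return random.choices(operators, weights=[weights[o] for o in operators])[0]

for it in range(100):
    op = select_operator()
    # ... here the chosen destroy/repair pair would modify the current solution ...
    r = random.random()              # dummy outcome standing in for the acceptance test
    outcome = 10.0 if r < 0.05 else 5.0 if r < 0.25 else 2.0 if r < 0.6 else 0.0
    weights[op] = (1 - reaction) * weights[op] + reaction * outcome

print({name: round(w, 2) for name, w in weights.items()})

In a full ALNS, the outcome would reflect whether the operator produced a new best, an improving, or merely an accepted solution, which is what ties selection frequency to historic performance.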
Article
Full-text available
There is a deep and useful connection between statistical mechanics (the behavior of systems with many degrees of freedom in thermal equilibrium at a finite temperature) and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters). A detailed analogy with annealing in solids provides a framework for optimization of the properties of very large and complex systems. This connection to statistical mechanics exposes new information and provides an unfamiliar perspective on traditional optimization problems and methods.
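The annealing analogy translates into a very small amount of code: worsening moves are accepted with probability exp(-delta/T) and the temperature is lowered over time. The 2-opt-style move and geometric cooling schedule below are common illustrative choices, not prescribed by the article.

# Minimal simulated-annealing sketch: an improving move is always accepted, a
# worsening move is accepted with probability exp(-delta/T), and the
# temperature T is gradually lowered.
import math, random

random.seed(0)
cities = [(random.random(), random.random()) for _ in range(20)]

def tour_length(tour):
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

tour = list(range(len(cities)))
cost, T = tour_length(tour), 1.0
while T > 1e-3:
    i, j = sorted(random.sample(range(len(tour)), 2))
    cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]       # 2-opt style reversal
    delta = tour_length(cand) - cost
    if delta < 0 or random.random() < math.exp(-delta / T):
        tour, cost = cand, cost + delta
    T *= 0.999                                                 # geometric cooling
print("final tour length:", round(cost, 3))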
Article
Full-text available
Highly-interconnected networks of nonlinear analog neurons are shown to be extremely effective in computing. The networks can rapidly provide a collectively-computed solution (a digital output) to a problem on the basis of analog input information. The problems to be solved must be formulated in terms of desired optima, often subject to constraints. The general principles involved in constructing networks to solve specific problems are discussed. Results of computer simulations of a network designed to solve a difficult but well-defined optimization problem--the Traveling-Salesman Problem--are presented and used to illustrate the computational power of the networks. Good solutions to this problem are collectively computed within an elapsed time of only a few neural time constants. The effectiveness of the computation involves both the nonlinear analog response of the neurons and the large connectivity among them. Dedicated networks of biological or microelectronic neurons could provide the computational capabilities described for a wide class of problems having combinatorial complexity. The power and speed naturally displayed by such collective networks may contribute to the effectiveness of biological information processing.
Article
This paper surveys recent attempts, from both the machine learning and operations research communities, at leveraging machine learning to solve combinatorial optimization problems. Given the hard nature of these problems, state-of-the-art algorithms rely on handcrafted heuristics for making decisions that are otherwise too expensive to compute or mathematically not well defined. Machine learning therefore looks like a natural candidate to make such decisions in a more principled and optimized way. We advocate for pushing further the integration of machine learning and combinatorial optimization and detail a methodology to do so. A main point of the paper is to see generic optimization problems as data points and to ask which distribution of problems is relevant for learning on a given task.
Article
One of the key challenges for operations researchers solving real-world problems is designing and implementing high-quality heuristics to guide their search procedures. In the past, machine learning techniques have failed to play a major role in operations research approaches, especially in terms of guiding branching and pruning decisions. We integrate deep neural networks into a heuristic tree search procedure to decide which branch to choose next and to estimate a bound for pruning the search tree of an optimization problem. We call our approach Deep Learning assisted heuristic Tree Search (DLTS) and apply it to a well-known problem from the container terminals literature, the container pre-marshalling problem (CPMP). Our approach is able to learn heuristics customized to the CPMP solely through analyzing the solutions to CPMP instances, and applies this knowledge within a heuristic tree search to produce the highest quality heuristic solutions to the CPMP to date.
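A rough sketch of the idea, a tree search in which a predictive model both orders the branches and supplies a bound estimate for pruning, is given below on a toy sequencing problem; the hand-coded scoring and bound functions are stand-ins for the trained deep neural networks of the approach.

# Sketch of heuristic tree search guided by a predictive model: at each node a
# model ranks the candidate branches and estimates a bound used for pruning.
# The "model" functions below are hand-coded stand-ins for trained networks.
import math

jobs = {"a": 3, "b": 7, "c": 2, "d": 5}        # processing times; objective:
                                               # minimize the sum of completion times
best = {"cost": math.inf, "seq": None}

def model_branch_order(remaining):
    # stand-in for a learned branching policy: try shorter jobs first
    return sorted(remaining, key=lambda j: jobs[j])

def model_bound_estimate(cost_so_far, elapsed, remaining):
    # stand-in for a learned bound: assume remaining jobs run back-to-back in SPT order
    bound, t = cost_so_far, elapsed
    for j in sorted(remaining, key=lambda j: jobs[j]):
        t += jobs[j]
        bound += t
    return bound

def search(seq, remaining, elapsed, cost):
    if not remaining:
        if cost < best["cost"]:
            best["cost"], best["seq"] = cost, seq
        return
    if model_bound_estimate(cost, elapsed, remaining) >= best["cost"]:
        return                                  # prune: estimate no better than incumbent
    for j in model_branch_order(remaining):
        search(seq + [j], remaining - {j}, elapsed + jobs[j], cost + elapsed + jobs[j])

search([], set(jobs), 0, 0)
print(best)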
Conference Paper
Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.7 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a strong phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which beats the previous state of the art. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
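A minimal PyTorch sketch of the encoder-decoder mechanism, two stacked LSTMs linked through the encoder's final state and decoded here with teacher forcing, follows; the vocabulary sizes, dimensions, and random token batches are arbitrary.

# Minimal encoder-decoder sketch: one LSTM maps the source sequence to a
# fixed-size state, a second LSTM decodes the target conditioned on that state.
import torch
import torch.nn as nn

src_vocab, tgt_vocab, dim = 100, 120, 64

src_emb = nn.Embedding(src_vocab, dim)
tgt_emb = nn.Embedding(tgt_vocab, dim)
encoder = nn.LSTM(dim, dim, num_layers=2, batch_first=True)
decoder = nn.LSTM(dim, dim, num_layers=2, batch_first=True)
proj = nn.Linear(dim, tgt_vocab)

src = torch.randint(0, src_vocab, (8, 12))        # batch of 8 source sequences
tgt_in = torch.randint(0, tgt_vocab, (8, 10))     # shifted target tokens (teacher forcing)

_, state = encoder(src_emb(src))                  # state = (h_n, c_n), the fixed-size summary
dec_out, _ = decoder(tgt_emb(tgt_in), state)      # decode conditioned on the encoder state
logits = proj(dec_out)                            # per-step distribution over the target vocab
print(logits.shape)                               # torch.Size([8, 10, 120])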
Conference Paper
A selection hyper-heuristic is a search method that controls a prefixed set of low-level heuristics for solving a given computationally difficult problem. This study investigates a learning-via-demonstrations approach that generates a selection hyper-heuristic for the Open Vehicle Routing Problem (OVRP). As a chosen ‘expert’ hyper-heuristic is run on a small set of training problem instances, data is collected to learn from the expert how to decide which low-level heuristic to select and apply to the solution in hand during the search process. In this study, a Time Delay Neural Network (TDNN) is used to extract hidden patterns within the collected data in the form of a classifier, i.e., an ‘apprentice’ hyper-heuristic, which is then used to solve the ‘unseen’ problem instances. First, the parameters of the TDNN are tuned using a Taguchi orthogonal array as a design-of-experiments method. Then, the influence of extending and enriching the information collected from the expert and fed into the TDNN on the behaviour of the generated apprentice hyper-heuristic is explored. The empirical results show that using the distance between solutions as additional information collected from the expert generates an apprentice which outperforms the expert algorithm on a benchmark of OVRP instances.
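The apprenticeship step can be pictured as ordinary supervised classification over logged expert decisions, as sketched below; the state features, labels, and the use of a decision tree instead of a TDNN are illustrative simplifications.

# Sketch of learning from demonstrations: states observed while an "expert"
# hyper-heuristic runs are logged with the low-level heuristic it chose, and a
# classifier (here a decision tree, standing in for a TDNN) imitates it.
from sklearn.tree import DecisionTreeClassifier

# Hypothetical state features: [normalized cost, iterations since improvement,
# distance between current and best solution]
X = [
    [1.00, 0,  0.0], [0.98, 3,  0.1], [0.95, 12, 0.3],
    [0.95, 25, 0.3], [0.93, 2,  0.2], [0.93, 40, 0.5],
]
# Low-level heuristic the expert applied in each state (labels are invented).
y = ["2-opt", "2-opt", "or-opt", "perturb", "2-opt", "perturb"]

apprentice = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(apprentice.predict([[0.94, 30, 0.4]]))   # heuristic suggested for a new state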
Article
Many combinatorial optimization problems over graphs are NP-hard, and require significant specialized knowledge and trial-and-error to design good heuristics or approximation algorithms. Can we automate this challenging and tedious process, and learn the algorithms instead? In many real world applications, it is typically the case that the same type of optimization problem is solved again and again on a regular basis, maintaining the same problem structure but differing in the data. This provides an opportunity for learning heuristic algorithms which can exploit the structure of such recurring problems. In this paper, we propose a unique combination of reinforcement learning and graph embedding to address this challenge. The learned greedy policy behaves like a meta-algorithm which incrementally constructs a solution, and the action is determined by the output of a graph embedding network capturing the current state of the solution. We show that our framework can be applied to a diverse range of optimization problems over graphs, and provide evidence that our learning approach can compete with or outperform specialized heuristics or approximation algorithms for the Minimum Vertex Cover, Maximum Cut and Traveling Salesman Problems.
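The learned greedy meta-algorithm can be sketched as below for Maximum Cut: a partial solution is grown one node at a time, each step taking the node with the highest estimated value. The true marginal cut gain is used here as a stand-in for the graph-embedding Q-network.

# Sketch of greedy construction guided by a value estimate. The q_estimate
# function stands in for a learned Q(state, action); the random graph is a toy.
import random

random.seed(2)
n = 12
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if random.random() < 0.3]

def cut_value(side):
    return sum(1 for u, v in edges if (u in side) != (v in side))

def q_estimate(side, node):
    # stand-in for a learned Q-value: the true marginal gain of adding `node`
    return cut_value(side | {node}) - cut_value(side)

side = set()
while True:
    candidates = [v for v in range(n) if v not in side]
    if not candidates:
        break
    gains = {v: q_estimate(side, v) for v in candidates}
    best_v = max(gains, key=gains.get)
    if gains[best_v] <= 0:                       # no action improves the objective
        break
    side.add(best_v)

print("cut value:", cut_value(side), "partition:", sorted(side))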
Article
This paper concerns the Split Delivery Vehicle Routing Problem (SDVRP). This problem is a relaxation of the Capacitated Vehicle Routing Problem (CVRP) since the customers' demands are allowed to be split. We deal with the cases where the fleet is unlimited (SDVRP-UF) and limited (SDVRP-LF). In order to solve them, we implemented a multi-start Iterated Local Search (ILS) based heuristic that includes a novel perturbation mechanism. Extensive computational experiments were carried out on benchmark instances available in the literature. The results obtained are highly competitive, more precisely, 55 best known solutions were equaled and new improved solutions were found for 243 out of 324 instances, with an average and maximum improvement of 1.15% and 2.81%, respectively.
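A generic multi-start ILS skeleton of the kind described, local search plus perturbation repeated from several starting solutions, is sketched below on a plain TSP; the SDVRP-specific splitting moves and the paper's novel perturbation mechanism are not modeled.

# Minimal multi-start iterated local search (ILS): each start builds a random
# tour, improves it with 2-opt, then repeatedly perturbs (double bridge) and
# re-improves, keeping the best tour seen.
import math, random

random.seed(3)
pts = [(random.random(), random.random()) for _ in range(15)]

def length(t):
    return sum(math.dist(pts[t[i]], pts[t[(i + 1) % len(t)]]) for i in range(len(t)))

def two_opt(t):
    improved = True
    while improved:
        improved = False
        for i in range(1, len(t) - 1):
            for j in range(i + 1, len(t)):
                cand = t[:i] + t[i:j + 1][::-1] + t[j + 1:]
                if length(cand) < length(t):
                    t, improved = cand, True
    return t

def double_bridge(t):
    a, b, c = sorted(random.sample(range(1, len(t)), 3))
    return t[:a] + t[b:c] + t[a:b] + t[c:]

best = None
for start in range(3):                         # multi-start
    tour = two_opt(random.sample(range(len(pts)), len(pts)))
    for _ in range(30):                        # ILS iterations
        tour = min(tour, two_opt(double_bridge(tour)), key=length)
    best = tour if best is None else min(best, tour, key=length)

print("best length:", round(length(best), 3))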
Article
In this paper we consider the Split Delivery Vehicle Routing Problem (SDVRP), a relaxation of the known Capacitated Vehicle Routing Problem (CVRP) in which the demand of any client can be serviced by more than one vehicle. We define a feasible solution of this problem, and we show that the convex hull of the associated incidence vectors is a polyhedron (PSDVRP), whose dimension depends on whether a vehicle visiting a client must service, or not, at least one unit of the client demand. From a partial and linear description of PSDVRP and a new family of valid inequalities, we develop a lower bound whose quality is exhibited in the computational results provided, which include the optimal resolution of some known instances - one of them with 50 clients. This instance is, as far as we know, the biggest one solved so far.
Article
We propose an algorithmic framework that successfully addresses three vehicle routing problems: the multidepot VRP, the periodic VRP, and the multidepot periodic VRP with capacitated vehicles and constrained route duration. The metaheuristic combines the exploration breadth of population-based evolutionary search, the aggressive-improvement capabilities of neighborhood-based metaheuristics, and advanced population-diversity management schemes. Extensive computational experiments show that the method performs impressively in terms of computational efficiency and solution quality, identifying either the best known solutions, including the optimal ones, or new best solutions for all currently available benchmark instances for the three problem classes. The proposed method also proves extremely competitive for the capacitated VRP.
Article
The Algorithm Selection Problem is concerned with selecting the best algorithm to solve a given problem on a case-by-case basis. It has become especially relevant in the last decade, as researchers are increasingly investigating how to identify the most suitable existing algorithm for solving a problem instead of developing new algorithms. This survey presents an overview of this work focusing on the contributions made in the area of combinatorial search problems, where Algorithm Selection techniques have achieved significant performance improvements. We unify and organise the vast literature according to criteria that determine Algorithm Selection systems in practice. The comprehensive classification of approaches identifies and analyses the different directions from which Algorithm Selection has been approached. This paper contrasts and compares different methods for solving the problem as well as ways of using these solutions. It closes by identifying directions of current and future research.
Article
This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors.
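The core REINFORCE update can be written in a few lines for a single Bernoulli-logistic unit: weights move along a sample estimate of the gradient of expected reward, computed only from the reward signal and the log-probability gradient of the sampled action, plus a running baseline. The toy task (copy an input bit) and the learning rates are illustrative.

# Tiny REINFORCE sketch: w <- w + alpha * (r - b) * grad_w log pi(a | x),
# where a is a sampled stochastic action and b is a running reward baseline.
import math, random

random.seed(4)
w, b = [0.0, 0.0], 0.0                  # weights (bias + one feature) and a baseline
alpha, beta = 0.5, 0.1

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for step in range(2000):
    x = [1.0, random.choice([0.0, 1.0])]          # input: bias term + one random bit
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    a = 1 if random.random() < p else 0           # sample a stochastic action
    r = 1.0 if a == int(x[1]) else 0.0            # reward: copy the input bit
    grad_logp = [(a - p) * xi for xi in x]        # d/dw log pi(a | x) for the logistic unit
    w = [wi + alpha * (r - b) * gi for wi, gi in zip(w, grad_logp)]
    b += beta * (r - b)                           # running baseline

print("learned weights:", [round(v, 2) for v in w])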
Carlos Ansótegui, Britta Heymann, Josep Pon, Meinolf Sellmann, and Kevin Tierney, 'Hyper-reactive tabu search for maxsat', in International Conference on Learning and Intelligent Optimization, pp. 309-325. Springer, (2018).
Irwan Bello, Hieu Pham, Quoc V Le, Mohammad Norouzi, and Samy Bengio, 'Neural combinatorial optimization with reinforcement learning', arXiv preprint arXiv:1611.09940, (2016).
Xavier Bresson and Thomas Laurent, 'Residual gated graph convnets', arXiv preprint arXiv:1711.07553, (2017).
Jan Christiaens and Greet Vanden Berghe, 'A fresh ruin & recreate implementation for the capacitated vehicle routing problem', (2016).
George B Dantzig and John H Ramser, 'The truck dispatching problem', Management science, 6(1), 80-91, (1959).
Michel Deudon, Pierre Cournut, Alexandre Lacoste, Yossiri Adulyasak, and Louis-Martin Rousseau, 'Learning heuristics for the TSP by policy gradient', in International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research, pp. 170-181. Springer, (2018).
Chaitanya K Joshi, Thomas Laurent, and Xavier Bresson, 'An efficient graph convolutional network technique for the travelling salesman problem', arXiv preprint arXiv:1906.01227, (2019).
Yoav Kaempfer and Lior Wolf, 'Learning the multiple traveling salesmen problem with permutation invariant pooling networks', arXiv preprint arXiv:1803.09621, (2018).
Diederik P. Kingma and Jimmy Ba, 'Adam: A method for stochastic optimization', arXiv preprint arXiv:1412.6980, (2014).
Wouter Kool, Herke van Hoof, and Max Welling, 'Attention, learn to solve routing problems!', in International Conference on Learning Representations, (2019).
Paul Shaw, 'Using constraint programming and local search methods to solve vehicle routing problems', in Fourth International Conference on Principles and Practice of Constraint Programming, pp. 417-431. Springer, (1998).
Arslan Ali Syed, Karim Akhnoukh, Bernd Kaltenhaeuser, and Klaus Bogenberger, 'Neural network based large neighborhood search algorithm for ride hailing services', in EPIA Conference on Artificial Intelligence, pp. 584-595. Springer, (2019).
Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio, 'Graph attention networks', arXiv preprint arXiv:1710.10903, (2017).