Article

# A New Class of Hard Problem Instances for the 0-1 Knapsack Problem


## Abstract

The 0-1 knapsack problem is an important optimization problem, because it arises as a special case of a wide variety of optimization problems and has been generalized in several ways. Decades of research have resulted in very powerful algorithms that can solve large knapsack problem instances involving thousands of decision variables in a short amount of time. Current problem instances in the literature no longer challenge these algorithms. However, hard problem instances are important for demonstrating the strengths and weaknesses of algorithms, and this knowledge can in turn be used to create better-performing algorithms. In this paper, we propose a new class of hard problem instances for the 0-1 knapsack problem and provide theoretical support that helps explain why these problem instances are hard to solve to optimality. A large dataset of 3240 hard problem instances was generated and subsequently solved on a supercomputer, using approximately 810 CPU-hours. The analysis of the obtained results shows the extent to which different parameters influence the hardness of the problem instances. This analysis also demonstrates that the proposed problem instances are considerably harder than the previously known hardest instances, despite being much smaller.


Article
Predicting and comparing algorithm performance on graph instances is challenging for multiple reasons. First, there is not always a standard set of instances to benchmark performance. Second, using existing graph generators results in a restricted spectrum of difficulty and the resulting graphs are not always diverse enough to draw sound conclusions. That is why recent work proposes a new methodology to generate a diverse set of instances by using evolutionary algorithms. We can then analyze the resulting graphs and get key insights into which attributes are most related to algorithm performance. We can also fill observed gaps in the instance space in order to generate graphs with previously unseen combinations of features. We apply this methodology to the instance space of the Hamiltonian completion problem using two different solvers, namely the Concorde TSP Solver and a multi-start local search algorithm.
Article
This paper proposes a local search algorithm for a specific combinatorial optimisation problem in graph theory: the Hamiltonian completion problem (HCP) on undirected graphs. In this problem, the objective is to add as few edges as possible to a given undirected graph in order to obtain a Hamiltonian graph. This problem has mainly been studied in the context of various specific kinds of undirected graphs (e.g. trees, unicyclic graphs and series-parallel graphs). The proposed algorithm, however, concentrates on solving HCP for general undirected graphs. It can be considered to belong to the category of matheuristics, because it integrates an exact linear time solution for trees into a local search algorithm for general graphs. This integration makes use of the close relation between HCP and the minimum path partition problem, which makes the algorithm equally useful for solving the latter problem. Furthermore, a benchmark set of problem instances is constructed for demonstrating the quality of the proposed algorithm. A comparison with state-of-the-art solvers indicates that the proposed algorithm is able to achieve high-quality results.
Article
This paper tackles the issue of objective performance evaluation of machine learning classifiers, and the impact of the choice of test instances. Given that statistical properties or features of a dataset affect the difficulty of an instance for particular classification algorithms, we examine the diversity and quality of the UCI repository of test instances used by most machine learning researchers. We show how an instance space can be visualized, with each classification dataset represented as a point in the space. The instance space is constructed to reveal pockets of hard and easy instances, and enables the strengths and weaknesses of individual classifiers to be identified. Finally, we propose a methodology to generate new test instances with the aim of enriching the diversity of the instance space, enabling potentially greater insights than can be afforded by the current UCI repository.
Conference Paper
There are some questions concerning the applicability of meta-heuristic methods for real-world problems; further, some researchers claim there is a growing gap between research and practice in this area. The reason is that the complexity of real-world problems is growing very fast (e.g. due to globalisation), while researchers experiment with benchmark problems that are fundamentally the same as those of 50 years ago. Thus there is a need for a new class of benchmark problems that reflect the characteristics of real-world problems. In this paper, two main characteristics of real-world problems are introduced: combination and interdependence. We argue that real-world problems usually consist of two or more interdependent sub-problems. This interdependence is responsible for the complexity of real-world problems, while this type of complexity is missing from current benchmark problems. A new problem, called the travelling thief problem, is introduced; it is a combination of two well-known problems, the knapsack problem and the travelling salesman problem. Some parameters which are responsible for the interdependence of these two sub-problems are defined. Two sets of parameters are introduced that result in generating two instances of the travelling thief problem. The complexities that are raised by interdependences for these two instances are discussed in detail. Finally, a procedure for generating these two instances is given.
Article
We describe an algorithm for the 0-1 knapsack problem (KP), which relies mainly on three new ideas. The first one is to focus on what we call the core of the problem, namely, a knapsack problem equivalent to KP, defined on a particular subset of the variables. The size of this core is usually a small fraction of the full problem size, and does not seem to increase with the latter. While the core cannot be identified without solving KP, a satisfactory approximation can be found by solving the associated linear program (LKP). The second new ingredient is a binary search-type procedure for solving LKP which, unlike earlier methods, does not require any ordering of the variables. The computational effort involved in this procedure is linear in the number of variables. Finally, the third new feature is a simple heuristic which under certain conditions finds an optimal solution with a probability that increases with the size of KP. Computational experience with an algorithm based on the above ideas, on several hundred randomly generated test problems with 1,000–10,000 variables and with coefficients ranging from between 10 and 100 to between 10 and 10,000, indicates that for such problems the computational effort grows linearly with the number of variables and less than linearly with the range of coefficients. Average time per problem was less than a second, and the maximum time for any single problem was 3 seconds. Value-independent 0-1 knapsack problems (also randomly generated), were solved with a specialized version of the code in less than one-third of the time required for general 0-1 knapsack problems. To conclude, we identify a class of hard knapsack problems.
Article
Perhaps surprisingly, it is possible to predict how long an algorithm will take to run on previously unseen input data, using machine learning techniques to build a model of the algorithm's runtime as a function of domain-specific problem features. Such models have important applications to algorithm analysis, portfolio-based algorithm selection, and the automatic configuration of parameterized algorithms. Over the past decade, a wide variety of techniques have been studied for building such models. Here, we describe extensions and improvements of previous models, new families of models, and (perhaps most importantly) a much more thorough treatment of algorithm parameters as model inputs. We also describe novel features for predicting algorithm runtime for the propositional satisfiability (SAT), mixed integer programming (MIP), and travelling salesperson (TSP) problems. We evaluate these innovations through the largest empirical analysis of its kind, comparing to all previously proposed modeling techniques of which we are aware. Our experiments consider 11 algorithms and 35 instance distributions; they also span a very wide range of SAT, MIP, and TSP instances, with the least structured having been generated uniformly at random and the most structured having emerged from real industrial applications. Overall, we demonstrate that our new models yield substantially better runtime predictions than previous approaches in terms of their generalization to new problem instances, to new algorithms from a parameterized space, and to both simultaneously.
Article
The quadratic assignment problem (QAP), one of the most difficult problems in the NP-hard class, models many real-life problems in several areas such as facilities location, parallel and distributed computing, and combinatorial data analysis. Combinatorial optimization problems, such as the traveling salesman problem, maximal clique and graph partitioning can be formulated as a QAP. In this paper, we present some of the most important QAP formulations and classify them according to their mathematical sources. We also present a discussion on the theoretical resources used to define lower bounds for exact and heuristic algorithms. We then give a detailed discussion of the progress made in both exact and heuristic solution methods, including those formulated according to metaheuristic strategies. Finally, we analyze the contributions brought about by the study of different approaches.
Conference Paper
Whether the goal is performance prediction, or insights into the relationships between algorithm performance and instance characteristics, a comprehensive set of meta-data from which relationships can be learned is needed. This paper provides a methodology to determine if the meta-data is sufficient, and demonstrates the critical role played by instance generation methods. Instances of the Travelling Salesman Problem (TSP) are evolved using an evolutionary algorithm to produce distinct classes of instances that are intentionally easy or hard for certain algorithms. A comprehensive set of features is used to characterise instances of the TSP, and the impact of these features on difficulty for each algorithm is analysed. Finally, performance predictions are achieved with high accuracy on unseen instances for predicting search effort as well as identifying the algorithm likely to perform best.
Article
We extend the classical 0-1 knapsack problem by introducing disjunctive constraints for pairs of items which are not allowed to be packed together into the knapsack. These constraints are represented by edges of a conflict graph whose vertices correspond to the items of the knapsack problem. Similar conditions were treated in the literature for bin packing and scheduling problems. For the knapsack problem with conflict graphs, exact and heuristic algorithms were proposed in the past. While the problem is strongly NP-hard in general, we present pseudopolynomial algorithms for two special graph classes, namely graphs of bounded treewidth (including trees and series-parallel graphs) and chordal graphs. From these algorithms we can easily derive fully polynomial time approximation schemes.
Preprint
As early as 30 years ago, Chvátal showed that some instances of the zero-one knapsack problem cannot be solved in polynomial time using a particular type of branch-and-bound algorithm based on relaxations of linear programs together with some rudimentary cutting-plane arguments as bounding rules. We extend this result by proving an exponential lower bound for a more general class of branch-and-bound and dynamic programming algorithms which are allowed to use memoization and arbitrarily powerful bounding rules to detect and remove subproblems leading to no optimal solution.
Article
It is widely believed that for many optimization problems, no algorithm is substantially more efficient than exhaustive search. This means that finding optimal solutions for many practical problems is completely beyond any current or projected computational capacity. To understand the origin of this extreme 'hardness', computer scientists, mathematicians and physicists have been investigating for two decades a connection between computational complexity and phase transitions in random instances of constraint satisfaction problems. Here we present a mathematically rigorous method for locating such phase transitions. Our method works by analysing the distribution of distances between pairs of solutions as constraints are added. By identifying critical behaviour in the evolution of this distribution, we can pinpoint the threshold location for a number of problems, including the two most-studied ones: random k-SAT and random graph colouring. Our results prove that the heuristic predictions of statistical physics in this context are essentially correct. Moreover, we establish that random instances of constraint satisfaction problems have solutions well beyond the reach of any analysed algorithm.
Conference Paper
We report results from large-scale experiments in satisfiability testing. As has been observed by others, testing the satisfiability of random formulas often appears surprisingly easy. Here we show that by using the right distribution of instances, and appropriate parameter values, it is possible to generate random formulas that are hard, that is, for which satisfiability testing is quite difficult. Our results provide a benchmark for the evaluation of satisfiability-testing procedures.
Introduction: Many computational tasks of interest to AI, to the extent that they can be precisely characterized at all, can be shown to be NP-hard in their most general form. However, there is fundamental disagreement, at least within the AI community, about the implications of this. It is claimed on the one hand that since the performance of algorithms designed to solve NP-hard tasks degrades rapidly with small increases in input size, something will need to be given up to obtain acceptable behavior....
Article
It is well known that for many NP-complete problems, such as K-Sat, etc., typical cases are easy to solve; so that computationally hard cases must be rare (assuming P ≠ NP). This paper shows that NP-complete problems can be summarized by at least one "order parameter", and that the hard problems occur at a critical value of such a parameter. This critical value separates two regions of characteristically different properties. For example, for K-colorability, the critical value separates overconstrained from underconstrained random graphs, and it marks the value at which the probability of a solution changes abruptly from near 0 to near 1. It is the high density of well-separated almost solutions (local minima) at this boundary that causes search algorithms to "thrash". This boundary is a type of phase transition and we show that it is preserved under mappings between problems. We show that for some P problems either there is no phase transition or it occurs for bounded N (and so bound...
Article
We investigate several complexity issues related to branch-and-cut algorithms for 0-1 integer programming based on lifted cover inequalities (LCIs). We show that given a fractional point, determining a violated LCI over all minimal covers is NP-hard. The main result is that there exists a class of 0-1 knapsack instances for which any branch-and-cut algorithm based on LCIs has to evaluate an exponential number of nodes to prove optimality.
Keywords: 0-1 integer programming, branch-and-cut, cover inequality, lifting.
October 1994, revised December 1995.
1 Introduction: Consider the set $P$ of feasible solutions to a 0-1 knapsack problem with integer coefficients, i.e., $P = \{x \in B^n : \sum_{j \in N} a_j x_j \le b\}$, where, without loss of generality, we assume $a_j > 0$ for $j \in N$ (since 0-1 variables can be complemented) and $a_j \le b$ for $j \in N$ (since $a_j > b$ implies $x_j = 0$). A set $C \subseteq N$ is called a cover if $\sum_{j \in C} a_j > b$. A cover $C$ is minimal if it is minimal with respect to this propert...
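The cover definitions in this excerpt can be checked mechanically. The Python sketch below is only an illustration: the function names `is_cover` and `is_minimal_cover` and the sample data are hypothetical, and it reads "minimal" in the usual sense that no proper subset of the cover is itself a cover. Because all coefficients are assumed positive, it is enough to test the removal of each single item.

```python
from typing import Iterable, Sequence


def is_cover(a: Sequence[int], b: int, cover: Iterable[int]) -> bool:
    """Return True if the items in `cover` have total weight strictly above b."""
    return sum(a[j] for j in cover) > b


def is_minimal_cover(a: Sequence[int], b: int, cover: Iterable[int]) -> bool:
    """Return True if `cover` is a cover and dropping any one item breaks it.

    Assumes every coefficient a[j] is positive, so checking single-item
    removals is equivalent to checking all proper subsets.
    """
    items = list(cover)
    total = sum(a[j] for j in items)
    if total <= b:
        return False  # not a cover at all
    return all(total - a[j] <= b for j in items)


# Hypothetical data: weights a and capacity b.
a = [5, 4, 3, 2]
b = 7
print(is_minimal_cover(a, b, {0, 1}))     # items 0 and 1 together weigh 9
print(is_minimal_cover(a, b, {0, 1, 2}))  # still a cover after dropping item 2
```

The single-removal shortcut keeps the check linear in the size of the candidate set; without the positivity assumption a full subset check would be needed.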
Article
In 2005, David Pisinger asked the question "where are the hard knapsack problems?". Noting that the classical benchmark test instances were limited in difficulty due to their selected structure, he proposed a set of new test instances for the 0-1 knapsack problem with characteristics that made them more challenging for dynamic programming and branch-and-bound algorithms. This important work highlighted the influence of diversity in test instances to draw reliable conclusions about algorithm performance. In this paper, we revisit the question in light of recent methodological advances (in the form of Instance Space Analysis) enabling the strengths and weaknesses of algorithms to be visualised and assessed across the broadest possible space of test instances. We show where the hard instances lie, and objectively assess algorithm performance across the instance space to articulate the strengths and weaknesses of algorithms. Furthermore, we propose a method to fill the instance space with diverse and challenging new test instances with controllable properties to support greater insights into algorithm selection, and drive future algorithmic innovations.
Article
In 2001 I (innocently!) asked Lane if I could write an article for his SIGACT News Complexity Theory Column that would be a poll of what computer scientists (and others) thought about P=?NP and related issues. It was to be an objective record of subjective opinions. I asked (by telegraph in those days) over 100 theorists. Exactly 100 responded, which made taking percentages very easy. That poll appeared in the SIGACT News Complexity Theory Column in 2002 (I call it the 2002 poll even though people answered it in 2001). The Wikipedia page on P=?NP links to it. That poll's readership and popularity have exceeded my wildest dreams.
Chapter
Assume that a set of $n$ items is given, each item $j$ having an integer profit $p_j$ and an integer weight $w_j$. The knapsack problem asks to choose a subset of the items such that their overall profit is maximized, while the overall weight does not exceed a given capacity $c$. Introducing binary variables $x_j$ to indicate whether item $j$ is included in the knapsack or not, the model may be defined as:
$$\text{(KP)}\quad \text{maximize}\; \sum_{j=1}^{n} p_j x_j \tag{5.1}$$
$$\text{subject to}\; \sum_{j=1}^{n} w_j x_j \le c, \tag{5.2}$$
$$x_j \in \{0,1\},\quad j = 1,\ldots,n. \tag{5.3}$$
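As a hedged illustration of the model (KP), the Python sketch below solves small instances with the textbook dynamic programming recursion over capacities. The function name `solve_knapsack` and the sample data are assumptions made for illustration and are not taken from the chapter. The sketch assumes non-negative integer weights and takes time and memory proportional to $n \cdot c$, so it is only practical for moderate capacities.

```python
from typing import List, Sequence, Tuple


def solve_knapsack(
    profits: Sequence[int], weights: Sequence[int], capacity: int
) -> Tuple[int, List[int]]:
    """Return (optimal profit, x) for the 0-1 knapsack model (KP).

    best[j][d] is the maximum profit achievable with the first j items
    and a capacity of d. x[j] is 1 if item j is packed, else 0.
    """
    n = len(profits)
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        p, w = profits[j - 1], weights[j - 1]
        for d in range(capacity + 1):
            best[j][d] = best[j - 1][d]  # option 1: skip item j
            if w <= d and best[j - 1][d - w] + p > best[j][d]:
                best[j][d] = best[j - 1][d - w] + p  # option 2: pack item j

    # Walk back through the table to recover one optimal selection.
    x = [0] * n
    d = capacity
    for j in range(n, 0, -1):
        if best[j][d] != best[j - 1][d]:
            x[j - 1] = 1
            d -= weights[j - 1]
    return best[n][capacity], x


# Hypothetical instance with three items and capacity 50.
value, x = solve_knapsack([60, 100, 120], [10, 20, 30], 50)
print(value, x)
```

The pseudo-polynomial running time is the reason instances with very large coefficients are hard for this family of methods, even when the number of items is small.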
Chapter
In this first chapter of extensions and generalizations of the basic knapsack problem (KP) we will add additional constraints to the single weight constraint (1.2) thus attaining the multidimensional knapsack problem. After the introduction we will deal extensively with relaxations and reductions in Section 9.2. Exact algorithms to compute optimal solutions will be covered in Section 9.3 followed by results on approximation in Section 9.4. A detailed treatment of heuristic methods will be given in Section 9.5. Separate sections are devoted to two special cases, namely the two-dimensional knapsack problem (Section 9.6) and the cardinality constrained knapsack problem (Section 9.7). Finally, we will consider the combination of multiple constraints and multiple-choice selection of items from classes (see Chapter 11 for the one-dimensional case) in Section 9.8.
Article
Our confidence in the future performance of any algorithm, including optimization algorithms, depends on how carefully we select test instances so that the generalization of algorithm performance on future instances can be inferred. In recent work, we have established a methodology to generate a two-dimensional representation of the instance space, comprising a set of known test instances. This instance space shows the similarities and differences between the instances using measurable features or properties, and enables the performance of algorithms to be viewed across the instance space, where generalizations can be inferred. The power of this methodology is the insights that can be generated into algorithm strengths and weaknesses by examining the regions in instance space where strong performance can be expected. The representation of the instance space is dependent on the choice of test instances however. In this paper we present a methodology for generating new test instances with controllable properties, by filling observed gaps in the instance space. This enables the generation of rich new sets of test instances to support better the understanding of algorithm strengths and weaknesses. The methodology is demonstrated on graph coloring as a case study.
Article
This paper tackles the difficult but important task of objective algorithm performance assessment for optimization. Rather than reporting average performance of algorithms across a set of chosen instances, which may bias conclusions, we propose a methodology to enable the strengths and weaknesses of different optimization algorithms to be compared across a broader instance space. The results reported in a recent Computers and Operations Research paper comparing the performance of graph coloring heuristics are revisited with this new methodology to demonstrate (i) how pockets of the instance space can be found where algorithm performance varies significantly from the average performance of an algorithm; (ii) how the properties of the instances can be used to predict algorithm performance on previously unseen instances with high accuracy; and (iii) how the relative strengths and weaknesses of each algorithm can be visualized and measured objectively.
Article
Little has been done in the study of these intriguing questions, and I do not wish to give the impression that any extensive set of ideas exists that could be called a "theory." What is quite surprising, as far as the histories of science and philosophy are concerned, is that the major impetus for the fantastic growth of interest in brain processes, both psychological and physiological, has come from a device, a machine, the digital computer. In dealing with a human being and a human society, we enjoy the luxury of being irrational, illogical, inconsistent, and incomplete, and yet of coping. In operating a computer, we must meet the rigorous requirements for detailed instructions and absolute precision. If we understood the ability of the human mind to make effective decisions when confronted by complexity, uncertainty, and irrationality then we could use computers a million times more effectively than we do. Recognition of this fact has been a motivation for the spurt of research in the field of neurophysiology.
Article
We consider a class of algorithms which use the combined powers of branch-and-bound, dynamic programming and rudimentary divisibility arguments for solving the zero-one knapsack problem. Our main result identifies a class of instances of the problem which are difficult to solve by such algorithms. More precisely, if reading the data takes t units of time, then the time required to solve the problem grows exponentially with the square root of t.
Article
This paper presents a formulation of the quadratic assignment problem, of which the Koopmans-Beckmann formulation is a special case. Various applications for the formulation are discussed. The equivalence of the problem to a linear assignment problem with certain additional constraints is demonstrated. A method for calculating a lower bound on the cost function is presented, and this forms the basis for an algorithm to determine optimal solutions. Further generalizations to cubic, quartic, N-adic problems are considered.
Article
Two new algorithms recently proved to outperform all previous methods for the exact solution of the 0-1 Knapsack Problem. This paper presents a combination of such approaches, where, in addition, valid inequalities are generated and surrogate relaxed, and a new initial core problem is adopted. The algorithm is able to solve all classical test instances, with up to 10,000 variables, in less than 0.2 seconds on a HP9000-735/99 computer. The C language implementation of the algorithm is available on the internet.
Article
We present a new algorithm for the optimal solution of the 0-1 Knapsack problem, which is particularly effective for large-size problems. The algorithm is based on determination of an appropriate small subset of items and the solution of the corresponding "core problem": from this we derive a heuristic solution for the original problem which, with high probability, can be proved to be optimal. The algorithm incorporates a new method of computation of upper bounds and efficient implementations of reduction procedures. The corresponding Fortran code is available. We report computational experiments on small-size and large-size random problems, comparing the proposed code with all those available in the literature.
Article
The traveling salesman problem is one of the most famous combinatorial problems. We identify a natural parameter for the two-dimensional Euclidean traveling salesman problem. We show that for random problems there is a rapid transition between soluble and insoluble instances of the decision problem at a critical value of this parameter. Hard instances of the traveling salesman problem are associated with this transition. Similar results are seen both with randomly generated problems and benchmark problems using geographical data. Surprisingly, finite-size scaling methods developed in statistical mechanics describe the behaviour around the critical value in random problems. Such phase transition phenomena appear to be ubiquitous. Indeed, we have yet to find an NP-complete problem which lacks a similar phase transition.
Article
The knapsack problem is believed to be one of the “easier” NP-hard problems. Not only can it be solved in pseudo-polynomial time, but also decades of algorithmic improvements have made it possible to solve nearly all standard instances from the literature. The purpose of this paper is to give an overview of all recent exact solution approaches, and to show that the knapsack problem still is hard to solve for these algorithms for a variety of new test problems. These problems are constructed either by using standard benchmark instances with larger coefficients, or by introducing new classes of instances for which most upper bounds perform badly. The first group of problems challenges the dynamic programming algorithms, while the other group is aimed at branch-and-bound algorithms. Numerous computational experiments with all recent state-of-the-art codes are used to show that (KP) is still difficult to solve for a wide number of problems. One could say that the previous benchmark tests were limited to a few highly structured instances, which do not show the full characteristics of knapsack problems.
Article
A new branch-and-bound algorithm for the exact solution of the 0–1 Knapsack Problem is presented. The algorithm is based on solving an ‘expanding core’, which initially only contains the break item, but which is expanded each time the branch-and-bound algorithm reaches the border of the core. Computational experiments show that most data instances are optimally solved without sorting or preprocessing a great majority of the items. Detailed program sketches are provided, and computational experiments are reported, indicating that the algorithm presented not only is shorter, but also faster and more stable than any other algorithm hitherto proposed.
Article
We address a variant of the classical knapsack problem in which an upper bound is imposed on the number of items that can be selected. This problem arises in the solution of real-life cutting stock problems by column generation, and may be used to separate cover inequalities with small support within cutting-plane approaches to integer linear programs. We focus our attention on approximation algorithms for the problem, describing a linear-storage Polynomial Time Approximation Scheme (PTAS) and a dynamic-programming based Fully Polynomial Time Approximation Scheme (FPTAS). The main ideas contained in our PTAS are used to derive PTAS's for the knapsack problem and its multi-dimensional generalization which improve on the previously proposed PTAS's. We finally illustrate better PTAS's and FPTAS's for the subset sum case of the problem in which profits and weights coincide.
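To make the problem variant concrete, here is a minimal Python sketch of an exact dynamic program for the knapsack problem with an upper bound on the number of selected items. It is not the PTAS or FPTAS described in the abstract; it is a plain pseudo-polynomial recursion with one extra state dimension, and the function name `solve_cardinality_knapsack` and the sample data are hypothetical.

```python
from typing import Sequence


def solve_cardinality_knapsack(
    profits: Sequence[int],
    weights: Sequence[int],
    capacity: int,
    max_items: int,
) -> int:
    """Return the best profit using at most `max_items` items within `capacity`.

    best[k][d] is the maximum profit with at most k items and total
    weight at most d, over the items processed so far.
    """
    best = [[0] * (capacity + 1) for _ in range(max_items + 1)]
    for p, w in zip(profits, weights):
        # Go through k downwards so that row k-1 still describes the
        # state before the current item was considered (each item is
        # used at most once).
        for k in range(max_items, 0, -1):
            for d in range(w, capacity + 1):
                candidate = best[k - 1][d - w] + p
                if candidate > best[k][d]:
                    best[k][d] = candidate
    return best[max_items][capacity]


# Hypothetical instance: the cardinality bound changes the optimum.
profits = [60, 100, 120]
weights = [10, 20, 30]
print(solve_cardinality_knapsack(profits, weights, 50, 1))
print(solve_cardinality_knapsack(profits, weights, 50, 2))
```

The extra dimension multiplies the cost of the standard recursion by the cardinality bound, which is one motivation for approximation schemes on larger instances.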
Article
A fully polynomial time approximation scheme (FPTAS) is presented for the classical 0-1 knapsack problem. The new approach considerably improves the necessary space requirements. The two best previously known approaches need $O(n + 1/\varepsilon^3)$ and $O(n \cdot 1/\varepsilon)$ space, respectively. Our new approximation scheme requires only $O(n + 1/\varepsilon^2)$ space while also reducing the running time.
Article
A vector merging problem is introduced where two vectors of length $n$ are merged such that the $k$-th entry of the new vector is the minimum over $\ell$ of the $\ell$-th entry of the first vector plus the sum of the first $k - \ell + 1$ entries of the second vector. For this problem a new algorithm with $O(n \log n)$ running time is presented, thus improving upon the straightforward $O(n^2)$ time bound. The vector merging problem can appear in different settings of dynamic programming. In particular, it is applied for a recent fully polynomial time approximation scheme (FPTAS) for the classical 0–1 knapsack problem by the same authors.
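The merging rule can be written down directly as the straightforward quadratic-time baseline that the abstract mentions. The Python sketch below is not the paper's faster algorithm; the function name `merge_vectors_naive` is hypothetical, and the range of the index (1 up to k) is an assumption, since the abstract does not state it explicitly.

```python
from itertools import accumulate
from typing import List, Sequence


def merge_vectors_naive(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Quadratic-time vector merging.

    Entry k (1-based) of the result is the minimum, over l = 1..k, of
    a[l] plus the sum of the first k - l + 1 entries of b.
    The range l = 1..k is an assumption made for this sketch.
    """
    n = len(a)
    # prefix[m] holds the sum of the first m entries of b.
    prefix = [0] + list(accumulate(b))
    merged = []
    for k in range(1, n + 1):
        merged.append(
            min(a[l - 1] + prefix[k - l + 1] for l in range(1, k + 1))
        )
    return merged


# Hypothetical input vectors of length 3.
print(merge_vectors_naive([5, 1, 4], [2, 3, 1]))  # [7, 3, 6]
```

Precomputing the prefix sums keeps each candidate evaluation at constant cost, so the total work is proportional to the number of (k, l) pairs.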
Article
We describe a computer code and data that together certify the optimality of a solution to the 85,900-city traveling salesman problem pla85900, the largest instance in the TSPLIB collection of challenge problems.
Article
A new way of computing the upper bound for the zero-one knapsack problem is presented, substantially improving on Dantzig's approach. A branch and bound algorithm is proposed, based on the above mentioned upper bound and on original backtracking and forward schemes. Extensive computational experiences indicate this new algorithm to be superior to the fastest algorithms known at present.
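For context, the classical Dantzig bound that this abstract uses as its reference point can be sketched as follows. This is the well-known bound from the linear relaxation, not the improved bound proposed in the paper; the function name `dantzig_upper_bound` and the sample data are hypothetical, and the sketch assumes positive integer weights and integer profits.

```python
from typing import Sequence


def dantzig_upper_bound(
    profits: Sequence[int], weights: Sequence[int], capacity: int
) -> int:
    """Upper bound on the 0-1 knapsack optimum from the LP relaxation.

    Items are packed greedily by decreasing profit-to-weight ratio. The
    first item that does not fit (the break item) is added fractionally,
    and the result is rounded down because profits are integers.
    """
    order = sorted(
        range(len(profits)),
        key=lambda j: profits[j] / weights[j],
        reverse=True,
    )
    remaining = capacity
    total = 0
    for j in order:
        if weights[j] <= remaining:
            remaining -= weights[j]
            total += profits[j]
        else:
            # Break item: take only the fraction that still fits.
            return total + (profits[j] * remaining) // weights[j]
    return total  # every item fits, so the bound is the exact optimum


# Hypothetical instance; the true optimum here is 220.
print(dantzig_upper_bound([60, 100, 120], [10, 20, 30], 50))  # 240
```

The gap between this bound and the true optimum is what branch-and-bound methods have to close by enumeration, which is why tighter bounds translate directly into smaller search trees.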
Article
Several types of large-sized 0-1 knapsack problems (KP) may be easily solved, but in such cases most of the computational effort is used for sorting and reduction. In order to avoid this problem it has been proposed to solve the so-called core of the problem: a knapsack problem defined on a small subset of the variables. The exact core cannot, however, be identified before KP is solved to optimality, thus previous algorithms had to rely on approximate core sizes. We present an algorithm for KP where the enumerated core size is minimal, and the computational effort for sorting and reduction is also limited according to a hierarchy. The algorithm is based on a dynamic programming approach, where the core size is extended as needed, and the sorting and reduction are performed in a similar “lazy” way. Computational experiments are presented for several commonly occurring types of data instances. Experience from these tests indicates that the presented approach outperforms any known algorithm for KP, having very stable solution times.
MATILDA: Melbourne Algorithm Test Instance Library with Data Analytics
• K Smith-Miles
Measuring instance difficulty for combinatorial optimization problems
• Smith-Miles
Exploring search space trees using an adapted version of Monte Carlo tree search for combinatorial optimization problems
• J Jooken
• P Leyman
• P De Causmaecker
• T Wauters