Article

Randomness and Structure


Abstract

This chapter covers research in constraint programming (CP) and related areas involving random problems. Such research has played a significant role in the development of more efficient and effective algorithms, as well as in understanding the source of hardness in solving combinatorially challenging problems. Random problems have proved useful in a number of different ways. Firstly, they provide a relatively "unbiased" sample for benchmarking algorithms. In the early days of CP, many algorithms were compared using only a limited sample of problem instances. In some cases, this may have led to premature conclusions. Random problems, by comparison, permit algorithms to be tested on statistically significant samples of hard problems. However, as we outline in the rest of this chapter, there remain pitfalls awaiting the unwary in their use. For example, random problems may not contain structures found in many real-world problems, and these structures can make problems much easier or much harder to solve. As a second example, the process of generating random problems may itself be "flawed", giving problem instances which are not, at least asymptotically, combinatorially hard.
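The chapter gives no code here, but as a minimal sketch of the kind of random problem generation being discussed (and of the standard Model B described in one of the entries below), the following Python fragment is one plausible reading; the parameter names n, d, p1, p2 are the conventional ones and the chapter's exact formulation may differ.

```python
import random
from itertools import combinations, product

def model_b(n, d, p1, p2, seed=None):
    """Sketch of a Model-B-style random binary CSP generator.

    n  : number of variables, each with domain {0, ..., d-1}
    d  : domain size
    p1 : constraint density  -- round(p1 * n*(n-1)/2) variable pairs are constrained
    p2 : constraint tightness -- each constraint forbids round(p2 * d*d) value pairs
    Returns a dict mapping constrained variable pairs to their forbidden value pairs.
    """
    rng = random.Random(seed)
    pairs = list(combinations(range(n), 2))
    constrained = rng.sample(pairs, round(p1 * len(pairs)))
    all_tuples = list(product(range(d), repeat=2))
    n_forbidden = round(p2 * d * d)
    return {pair: set(rng.sample(all_tuples, n_forbidden)) for pair in constrained}

# Example: 20 variables, domain size 10, half of the pairs constrained,
# each constraint forbidding 35% of the d*d value combinations.
csp = model_b(20, 10, 0.5, 0.35, seed=0)
```

As the abstract and the entry on alternative generation schemes below note, holding such parameters fixed while n grows can yield instances that are asymptotically trivial (almost surely containing a "flawed" variable detectable by local consistency), which is exactly the pitfall being warned about.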


... QCP problems are used to bridge the gap between random instances and structured problems. QCP problems have many practical applications in the real world, such as conflict-free wavelength routing in wideband optical networks, scheduling, statistical design, and error-correcting codes [17]. ...
... QCP problems have many practical applications in the real world, such as conflict-free wavelength routing in wideband optical networks, scheduling, statistical design, and error-correcting codes [17]. Table 2 shows the results of testing 6 instances each of BQWH and QCP. ...
Article
Full-text available
A Constraint Satisfaction Problem (CSP) is a very powerful framework for representing and solving constraint problems. Solving a CSP often requires searching for a solution in a large search space. Very often, much of the search effort is wasted on parts of the search space that do not lead to a solution. Therefore, many search algorithms and heuristic techniques have been proposed to solve CSPs efficiently by guiding the search and reducing its size. Variable and value ordering techniques are among the most efficient ones, as past experiments have shown that these heuristics can significantly improve the search performance and lead to the solution sooner. One such heuristic works by gathering information during search to guide subsequent decisions when selecting variables. More precisely, this heuristic gathers and records information about failures in the form of constraint weights during constraint propagation. In this paper, we propose a variant of this heuristic where the weight of a constraint is also based on the conflict and support counts of each variable attached to this constraint, gathered during constraint propagation. We also propose a dynamic value ordering heuristic based on the support and conflict count information. Experiments have been conducted on random, quasi-random, pattern, and real-world instances. The test results show that the proposed variable ordering heuristic performs well in the cases of hard random and quasi-random instances. The test results also show that combining the proposed variable and value ordering heuristics can improve the performance significantly for some difficult problems.
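The abstract gives no pseudocode; purely as an illustration of the weighted-degree family of conflict-driven heuristics it builds on, here is a hypothetical Python sketch. The bookkeeping structures (constraint_weight, constraints_of, support_count) and their maintenance during propagation are assumptions for the example, not the authors' exact proposal.

```python
def select_variable(unassigned, domain, constraints_of, constraint_weight):
    """dom/wdeg-style selection: pick the unassigned variable with the
    smallest ratio of current domain size to the summed weights of its
    constraints.  Weights are assumed to be incremented elsewhere whenever
    a constraint causes a dead end during propagation."""
    def score(v):
        wdeg = sum(constraint_weight[c] for c in constraints_of[v]) or 1
        return len(domain[v]) / wdeg
    return min(unassigned, key=score)

def order_values(v, domain, support_count):
    """Value ordering: try values with the highest recorded support count
    first (support_count is assumed to be maintained during propagation)."""
    return sorted(domain[v], key=lambda a: support_count[v][a], reverse=True)

# Toy illustration with hand-made bookkeeping for three variables x, y, z.
domain = {'x': {0, 1, 2}, 'y': {0, 1}, 'z': {0, 1, 2}}
constraints_of = {'x': ['c1'], 'y': ['c1', 'c2'], 'z': ['c2']}
constraint_weight = {'c1': 4, 'c2': 1}
support_count = {'y': {0: 7, 1: 2}}

v = select_variable({'x', 'y', 'z'}, domain, constraints_of, constraint_weight)
print(v, order_values(v, domain, support_count))   # y [0, 1]
```

The proposed variant would additionally fold per-variable conflict and support counts into the constraint weights; the sketch only shows where such information would be consumed.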
... We have explored and documented in this chapter most of the thesis contributions in the context of the propositional satisfiability problem. One of our earliest contributions, which has already been considered useful and interesting by the research community (Muhammad and Stuckey, 2006; Gomes and Walsh, 2006; Santillán Rodríguez, 2007), is a model to randomly generate logical formulae not necessarily in clausal normal form. These formulae, moreover, are useful to create benchmarks to test satisfiability solvers which either directly work with non-clausal representations, or alternatively try to exploit the structural information implicitly available in clausal encodings. ...
... Muhammad and Stuckey (2006) have, in fact, already used our proposed method to evaluate the performance of their stochastic non-clausal solver. Our work was also found relevant by the constraint programming community, who have listed our method as an approach to randomly generate formulae in a handbook of their research area (Gomes and Walsh, 2006). There is also interest from people in theoretical computer science, such as Santillán Rodríguez (2007), who computed tighter bounds for the phase transition region of our random model, where the most difficult problems are expected to be found. ...
... Well-known examples are the Boolean Satisfiability Problem (SAT) [1,3], the Graph-Colouring problem [1,4], and the NPP [1,5]. The control parameter is a problem-dependent quantity that must be suitably defined; for example, in SAT, it is the ratio of the number of clauses to the number of variables, and the phase transition phenomenon has been observed in both random and structured instances [6,7]. ...
Conference Paper
Full-text available
Phase transitions play an important role in understanding search difficulty in combinatorial optimisation. However, previous attempts have not revealed a clear link between fitness landscape properties and the phase transition. We explore whether the global landscape structure of the number partitioning problem changes with the phase transition. Using the local optima network model, we analyse a number of instances before, during, and after the phase transition. We compute relevant network and neutrality metrics and, importantly, identify and visualise the funnel structure with an approach (monotonic sequences) inspired by theoretical chemistry. While most metrics remain oblivious to the phase transition, our results reveal that the funnel structure clearly changes. Easy instances feature a single or a small number of dominant funnels leading to global optima; hard instances have a large number of suboptimal funnels attracting the search. Our study brings new insights and tools to the study of phase transitions in combinatorial optimisation.
... We therefore see that a "power law" tail of the distribution P_n of the form p_t = Pr_{P_n}[T(X) = t] ∼ t^β, for a suitable value of β, gives us an exponential separation. There is substantial empirical evidence, and some analytical evidence, that such power-law, or "heavy", tails can occur in both random and real-world instances of CSPs; for a survey, see [32]. For example, consider the case of graph k-colouring on random graphs with n vertices, where each edge is present with probability Θ(1/n). ...
Article
We describe a general method to obtain quantum speedups of classical algorithms which are based on the technique of backtracking, a standard approach for solving constraint satisfaction problems (CSPs). Backtracking algorithms explore a tree whose vertices are partial solutions to a CSP in an attempt to find a complete solution. Assume there is a classical backtracking algorithm which finds a solution to a CSP on n variables, or outputs that none exists, and whose corresponding tree contains T vertices, each vertex corresponding to a test of a partial solution. Then we show that there is a bounded-error quantum algorithm which completes the same task using O(sqrt(T) n^(3/2) log n) tests. In particular, this quantum algorithm can be used to speed up the DPLL algorithm, which is the basis of many of the most efficient SAT solvers used in practice. The quantum algorithm is based on the use of a quantum walk algorithm of Belovs to search in the backtracking tree. We also discuss how, for certain distributions on the inputs, the algorithm can lead to an average-case exponential speedup.
... Phase transitions have been seen in many other NP-hard problems and are now frequently used to benchmark new algorithms [CKT91, GW94, GW96a, GW96b, Wal11]. How does the structure of Candy Crush problems influence their hardness [GW06]? Finally, it would be interesting to see if we can profit from the time humans spend solving Candy Crush problems. ...
Article
We prove that playing Candy Crush to achieve a given score in a fixed number of swaps is NP-hard.
... The model is similar in spirit to the well-known random models in the study of the standard (constraint) satisfiability. See [27,30,40] and the references therein. ...
Article
Data reduction is a key technique in the study of fixed-parameter algorithms. In the AI literature, pruning techniques based on simple and efficient-to-implement reduction rules also play a crucial role in the success of many industrial-strength solvers. Understanding the effectiveness and the applicability of data reduction as a technique for designing heuristics for intractable problems has been one of the main motivations in studying the phase transition of randomly-generated instances of NP-complete problems. In this paper, we take the initiative to study the power of data reductions in the context of random instances of a generic intractable parameterized problem, the weighted d-CNF satisfiability problem. We propose a non-trivial random model for the problem and study the probabilistic behavior of the random instances from the model. We design an algorithm based on data reduction and other algorithmic techniques and prove that the algorithm solves the random instances with high probability and in fixed-parameter polynomial time O(d^k·nm), where n is the number of variables, m is the number of clauses, and k is the fixed parameter. We establish the exact threshold of the phase transition of the solution probability and show that in some region of the problem space, unsatisfiable random instances of the problem have parametric resolution proofs of fixed-parameter polynomial size. Also discussed is a more general random model and the generalization of the results to the model.
... It was also known that random CSPs can have satisfiability thresholds and that the hardest instances are around the thresholds [23]. Today it is common practice to use random instances as benchmarks in algorithm competitions and research. ...
Conference Paper
Full-text available
A feedback vertex set (FVS) is a subset of vertices whose deletion makes the remaining graph a forest. We show that the minimum FVS (MFVS) in star convex bipartite graphs is NP-hard to find, and give a tighter lower bound on the size of the MFVS in sparse random graphs, to provide further evidence on the hardness of random CSPs.
Article
We study the probabilistic behaviour of solutions of random instances of the Boolean Satisfiability (SAT) and Constraint Satisfaction Problems (CSPs) that generalize the standard notion of a satisfying assignment. Our analysis focuses on a special type of generalized solutions, the (1,0)-super solutions. For random instances of k-SAT, we establish the exact threshold of the phase transition of the solution probability for , and give upper and lower bounds on the threshold of the phase transition for the case of . For CSPs, we derive an upper bound on the threshold of having a (1,0)-super solution asymptotically with probability 1, and establish a condition for the expected number of super solutions to grow exponentially.
Article
Register allocation and instruction scheduling are two central compiler back-end problems that are critical for quality. In the last two decades, combinatorial optimization has emerged as an alternative approach to traditional, heuristic algorithms for these problems. Combinatorial approaches are generally slower but more flexible than their heuristic counterparts and have the potential to generate optimal code. This paper surveys existing literature on combinatorial register allocation and instruction scheduling. The survey covers approaches that solve each problem in isolation as well as approaches that integrate both problems. The latter have the potential to generate code that is globally optimal by capturing the trade-off between conflicting register allocation and instruction scheduling decisions.
Article
Full-text available
Robustness is about reducing the feasible set of a given nominal optimization problem by cutting “risky” solutions away. To this end, the most popular approach in the literature is to extend the nominal model with a polynomial number of additional variables and constraints, so as to obtain its robust counterpart. Robustness can also be enforced by adding a possibly exponential family of cutting planes, which typically leads to an exponential formulation where cuts have to be generated at run time. Both approaches have pros and cons, and it is not clear which is the best one when approaching a specific problem. In this paper we computationally compare the two options on some prototype problems with different characteristics. We first address robust optimization à la Bertsimas and Sim for linear programs, and show through computational experiments that a considerable speedup (up to 2 orders of magnitude) can be achieved by exploiting a dynamic cut generation scheme. For integer linear problems, instead, the compact formulation exhibits a typically better performance. We then move to a probabilistic setting and introduce the uncertain set covering problem where each column has a certain probability of disappearing, and each row has to be covered with high probability. A related uncertain graph connectivity problem is also investigated, where edges have a certain probability of failure. For both problems, compact ILP models and cutting plane solution schemes are presented and compared through extensive computational tests. The outcome is that a compact ILP formulation (if available) can be preferable because it allows for a better use of the rich arsenal of preprocessing/cut generation tools available in modern ILP solvers. For the cases where such a compact ILP formulation is not available, as in the uncertain connectivity problem, we propose a restart solution strategy and computationally show its practical effectiveness.
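For reference, robustness à la Bertsimas and Sim replaces each uncertain linear constraint with a compact set of additional variables and constraints. Under the usual textbook assumptions (coefficient a_ij may deviate by at most â_ij for j in an uncertainty set J_i, with at most Γ_i coefficients deviating at once), the robust counterpart of ∑_j a_ij x_j ≤ b_i takes the standard form below; the paper's exact notation may differ.

```latex
\sum_j a_{ij} x_j + z_i \Gamma_i + \sum_{j \in J_i} p_{ij} \le b_i, \qquad
z_i + p_{ij} \ge \hat{a}_{ij}\, y_j \quad (j \in J_i), \qquad
-y_j \le x_j \le y_j, \qquad
p_{ij},\, y_j,\, z_i \ge 0 .
```

The cutting-plane alternative discussed in the abstract instead separates violated robustness cuts at run time, which is the trade-off the computational study evaluates.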
Article
This chapter reviews and expands our work on the relationship between linkage structure, that is, how decision variables of a problem are linked with (dependent on) one another, and the performance of three basic types of genetic evolutionary algorithms (GEAs): hill climbing, genetic algorithm and bottom-up self-assembly (compositional). It explores how concepts and quantitative methods from the field of social/complex networks can be used to characterize or explain problem difficulty for GEAs. It also re-introduces two novel concepts – inter-level conflict and specificity – which view linkage structure from a level perspective. In general, the basic GEAs performed well on our test problems with linkage structures resembling those empirically observed in many real-world networks. This is a positive indication that the structure of real-world networks which evolved without any central organization, such as biological networks, is not only influenced by evolution and therefore exhibits non-random properties, but also influences its own evolution, in the sense that certain structures are easier for evolutionary forces to adapt for survival. However, this necessarily implies the difficulty of certain other structures. Hence the need to go beyond basic GEAs to what we call GEAs with "brains", of which linkage-learning GEAs are one species.
Article
We study the phase transition of the coalitional manipulation problem for generalized scoring rules. Previously it has been shown that, under some conditions on the distribution of votes, if the number of manipulators is o(√n), where n is the number of voters, then the probability that a random profile is manipulable by the coalition goes to zero as the number of voters goes to infinity, whereas if the number of manipulators is ω(√n), then the probability that a random profile is manipulable goes to one. Here we consider the critical window, where a coalition has size c√n, and we show that as c goes from zero to infinity, the limiting probability that a random profile is manipulable goes from zero to one in a smooth fashion, i.e., there is a smooth phase transition between the two regimes. This result analytically validates recent empirical results, and suggests that deciding the coalitional manipulation problem may not be computationally hard in practice.
Conference Paper
We show that in the Erdős–Rényi random graph G(n,p), with high probability, when p = c/n and c is a constant, the treewidth is upper bounded by t_c·n for some constant t_c < 1 depending on c, but when p ≫ 1/n, the treewidth is lower bounded by n − o(n). The upper bound refutes a conjecture that the treewidth in G(n, p = c/n) is as large as n − o(n), and the lower bound provides further theoretical evidence on the hardness of some random constraint satisfaction problems called Model RB and Model RD.
Conference Paper
Consider random hypergraphs on n vertices, where each k-element subset of vertices is selected with probability p independently and randomly as a hyperedge. By sparse we mean that the total number of hyperedges is O(n) or O(n ln n). When k = 2, these are exactly the classical Erdős–Rényi random graphs G(n, p). We prove that with high probability, the hinge width of these sparse random hypergraphs can grow linearly with the expected number of hyperedges. Some random constraint satisfaction problems such as Model RB and Model RD have satisfiability thresholds on these sparse constraint hypergraphs, thus the large hinge width results provide some theoretical evidence that random instances around the satisfiability thresholds are hard for a standard hinge-decomposition based algorithm. We also conduct experiments on these and other kinds of random graphs with several hundred vertices, including regular random graphs and power law random graphs. The experimental results also show that hinge width can grow linearly with the number of edges on these different random graphs. These results may be of further interest.
Conference Paper
Sequencing errors in high-throughput sequencing data constitute one of the major problems in analyzing such data. Error correction can reduce the error rate. However, it is a computation- and data-intensive process for large-scale data. This poses challenges for more efficient and scalable algorithms. In this paper, we propose PSAEC, an improved algorithm for short-read error correction using partial suffix arrays in high-throughput sequencing data. Our algorithm optimizes the HiTEC program by replacing full suffix arrays with partial suffix arrays to index reads, which is more time- and space-efficient. Moreover, PSAEC is a scalable parallel algorithm that works well on multi-core computers using Pthreads. Experiments show that our algorithm delivers good, scalable performance.
Article
One possible escape from the Gibbard–Satterthwaite theorem is computational complexity. For example, it is NP-hard to compute if the STV rule can be manipulated. However, there is increasing concern that such results may not reflect the difficulty of manipulation in practice. In this tutorial, I survey recent results in this area. The Gibbard–Satterthwaite theorem proves that, under some simple assumptions, a voting rule can always be manipulated. A number of possible escapes have been suggested. For example, if we relax the assumption of a universal domain and replace it with single-peaked preferences, then strategy-proof voting rules exist. In an influential paper [1], Bartholdi, Tovey and Trick proposed that complexity might offer another escape: perhaps it is computationally so difficult to find a successful manipulation that agents have little option but to report their true preferences? Many voting rules have subsequently been shown to be NP-hard to manipulate [3]. However, NP-hardness only dictates the worst case and may not reflect the difficulty of manipulation in practice. Indeed, a number of recent theoretical results suggest that manipulation can often be easy (e.g. [19]).
Article
When agents are acting together, they may need a simple mechanism to decide on joint actions. One possibility is to have the agents express their preferences in the form of a ballot and use a voting rule to decide the winning action(s). Unfortunately, agents may try to manipulate such an election by misreporting their preferences. Fortunately, it has been shown that it is NP-hard to compute how to manipulate a number of different voting rules. However, NP-hardness only bounds the worst-case complexity. Recent theoretical results suggest that manipulation may often be easy in practice. To address this issue, I suggest studying empirically whether computational complexity is in practice a barrier to manipulation. The basic tool used in my investigations is the identification of computational "phase transitions". Such an approach has been fruitful in identifying hard instances of propositional satisfiability and other NP-hard problems. I show that phase transition behaviour gives insight into the hardness of manipulating voting rules, increasing the concern that computational complexity may not in fact be much of a barrier. Finally, I look at the problem of computing manipulations of other, related problems like stable marriage and tournament problems.
Article
Full-text available
Given a monotone graph property P, consider μ_p(P), the probability that a random graph with edge probability p will have P. The function dμ_p(P)/dp is the key to understanding the threshold behavior of the property P. We show that if dμ_p(P)/dp is small (corresponding to a non-sharp threshold), then there is a list of graphs of bounded size such that P can be approximated by the property of having one of the graphs as a subgraph. One striking consequence of this result is that a coarse threshold for a random graph property can only happen when the value of the critical edge probability is a rational power of n.
Article
Full-text available
We present empirical evidence that the distribution of effort required to solve CSPs randomly generated at the 50% satisfiable point, when using a backtracking algorithm, can be approximated by two standard families of continuous probability distribution functions. Solvable problems can be modelled by the Weibull distribution, and unsolvable problems by the lognormal distribution. These distributions fit equally well over a variety of backtracking based algorithms.
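As an illustration of the kind of distribution fitting described (not the authors' code or data), a minimal scipy sketch; the placeholder arrays stand in for per-instance search costs collected from a backtracking solver on satisfiable and unsatisfiable instances.

```python
import numpy as np
from scipy import stats

# Hypothetical per-instance search costs (e.g. number of backtracks),
# split into costs on satisfiable and unsatisfiable instances.
rng = np.random.default_rng(0)
sat_costs = rng.weibull(1.5, 1000) * 200.0      # placeholder data
unsat_costs = rng.lognormal(5.0, 0.8, 1000)     # placeholder data

# Fit a Weibull distribution to the satisfiable instances ...
shape, loc, scale = stats.weibull_min.fit(sat_costs, floc=0)
# ... and a lognormal distribution to the unsatisfiable ones.
sigma, loc2, exp_mu = stats.lognorm.fit(unsat_costs, floc=0)

# A Kolmogorov-Smirnov test gives a rough goodness-of-fit check.
print(stats.kstest(sat_costs, 'weibull_min', args=(shape, loc, scale)))
print(stats.kstest(unsat_costs, 'lognorm', args=(sigma, loc2, exp_mu)))
```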
Article
Full-text available
Let A be a Las Vegas algorithm, i.e., A is a randomized algorithm that always produces the correct answer when it stops but whose running time is a random variable. We consider the problem of minimizing the expected time required to obtain an answer from A using strategies which simulate A as follows: run A for a fixed amount of time t_1, then run A independently for a fixed amount of time t_2, etc. The simulation stops if A completes its execution during any of the runs. Let S = (t_1, t_2, …) be a strategy, and let ℓ_A = inf_S T(A, S), where T(A, S) is the expected value of the running time of the simulation of A under strategy S. We describe a simple universal strategy S_univ, with the property that, for any algorithm A, T(A, S_univ) = O(ℓ_A log ℓ_A). Furthermore, we show that this is the best performance that can be achieved, up to a constant factor, by any universal strategy.
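The universal strategy in question is what is now usually called the Luby sequence (1, 1, 2, 1, 1, 2, 4, ...), with each term multiplied by a base cutoff in practice. A short sketch of one standard way to compute its i-th term:

```python
def luby(i):
    """i-th term (1-indexed) of the Luby et al. universal restart sequence:
    1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, ...
    In practice the cutoffs are this sequence multiplied by a base unit."""
    k = 1
    while (1 << k) - 1 < i:              # find smallest k with 2^k - 1 >= i
        k += 1
    if i == (1 << k) - 1:                # i is exactly 2^k - 1: term is 2^(k-1)
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)  # otherwise recurse into the prefix

print([luby(i) for i in range(1, 16)])
# [1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8]
```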
Conference Paper
Full-text available
Combinatorial search methods often exhibit a large variability in performance. We study the cost profiles of combinatorial search procedures. Our study reveals some intriguing properties of such cost profiles. The distributions are often characterized by very long tails or "heavy tails". We will show that these distributions are best characterized by a general class of distributions that have no moments (i.e., an infinite mean, variance, etc.). Such non-standard distributions have recently been observed in areas as diverse as economics, statistical physics, and geophysics. They are closely related to fractal phenomena, whose study was introduced by Mandelbrot. We believe this is the first finding of these distributions in a purely computational setting. We also show how random restarts can effectively eliminate heavy-tailed behavior, thereby dramatically improving the overall performance of a search procedure.
Conference Paper
Full-text available
We study the backbone and the backdoors of propositional satisfiability problems. We make a number of theoretical, algorithmic and experimental contributions. From a theoretical perspective, we prove that backbones are hard even to approximate. From an algorithmic perspective, we present a number of different procedures for computing backdoors. From an empirical perspective, we study the correlation between being in the backbone and in a backdoor. Experiments show that there tends to be very little overlap between backbones and backdoors. We also study problem hardness for the Davis Putnam procedure. Problem hardness appears to be correlated with the size of strong backdoors, and weakly correlated with the size of the backbone, but does not appear to be correlated to the size of weak backdoors nor their number. Finally, to isolate the effect of backdoors, we look at problems with no backbone.
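As a concrete, deliberately naive illustration of the backbone notion, the sketch below enumerates all assignments of a small CNF formula and returns the literals fixed in every satisfying assignment. It is exponential and only meant to make the definition precise; it is not the paper's procedure (which, as the abstract notes, must contend with backbones being hard even to approximate).

```python
from itertools import product

def backbone(n_vars, clauses):
    """Backbone of a CNF formula over variables 1..n_vars.
    clauses: list of clauses, each a list of non-zero ints (DIMACS-style
    literals, negative = negated).  Returns the set of literals that take
    the same value in every satisfying assignment (empty if unsatisfiable).
    """
    fixed = None
    for bits in product([False, True], repeat=n_vars):
        assign = {v + 1: bits[v] for v in range(n_vars)}
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            lits = {v if val else -v for v, val in assign.items()}
            fixed = lits if fixed is None else fixed & lits
    return fixed or set()

# (x1 or x2) and (not x1 or x2): x2 is a backbone literal.
print(backbone(2, [[1, 2], [-1, 2]]))   # {2}
```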
Conference Paper
Full-text available
The study of phase transitions in algorithmic problems has revealed that usually the critical value of the constrainedness parameter at which the phase transition occurs coincides with the value at which the average cost of natural solvers for the problem peaks. In particular, this confluence of phase transition and peak cost has been observed for the Boolean satisfiability problem and its variants, where the solver used is a Davis-Putnam-type procedure or a suitable modification of it. Here, we investigate the relationship between phase transitions and peak cost for a family of PP-complete satisfiability problems, where the solver used is a symmetric Threshold Counting Davis-Putnam (TCDP) procedure, i.e., a modification of the Counting Davis-Putnam procedure for computing the number of satisfying assignments of a Boolean formula. Our main experimental finding is that, for each of the PP-complete problems considered, the asymptotic probability of solvability undergoes a phase transition at some critical ratio of clauses to variables, but this critical ratio does not always coincide with the ratio at which the average search cost of the symmetric TCDP procedure peaks. Actually, for some of these problems the peak cost occurs at the boundary or even outside of the interval in which the probability of solvability drops from 0.9 to 0.1, and we analyze why this happens.
Conference Paper
Full-text available
Recent work on phase transitions has detected apparently interesting phenomena in the distribution of hard optimisation problems (find, on some measure, the least m such that the given instance x has a solution of value m) and their corresponding decision problems (determine, for a given bound m, whether or not x has a solution of value m). This paper examines the relationship between the hardness of optimisation and that of decision. We identify an expression for the latter in terms of the...
Conference Paper
Full-text available
We study the impact of backbones in optimization and approximation problems. We show that some optimization problems like graph coloring resemble decision problems, with problem hardness positively correlated with backbone size. For other optimization problems like blocks world planning and traveling salesperson problems, problem hardness is weakly and negatively correlated with backbone size, while the cost of finding optimal and approximate solutions is positively correlated with backbone size. A third class of optimization problems like number partitioning have regions of both types of behavior. We find that to observe the impact of backbone size on problem hardness, it is necessary to eliminate some symmetries, perform trivial reductions and factor out the effective problem size.
Article
Full-text available
In recent years, there has been much interest in phase transitions of combinatorial problems. Phase transitions have been successfully used to analyze combinatorial optimization problems, characterize their typical-case features and locate the hardest problem instances. In this paper, we study phase transitions of the asymmetric Traveling Salesman Problem (ATSP), an NP-hard combinatorial optimization problem that has many real-world applications. Using random instances of up to 1,500 cities in which intercity distances are uniformly distributed, we empirically show that many properties of the problem, including the optimal tour cost and backbone size, experience sharp transitions as the precision of intercity distances increases across a critical value. Our experimental results on the costs of the ATSP tours and the assignment problem agree with the theoretical result that the asymptotic cost of the assignment problem is π²/6 as the number of cities goes to infinity. In addition, we show that the average computational cost of the well-known branch-and-bound subtour elimination algorithm for the problem also exhibits a thrashing behavior, transitioning from easy to difficult as the distance precision increases. These results answer positively an open question regarding the existence of phase transitions in the ATSP, and provide guidance on how difficult ATSP problem instances should be generated.
Article
The standard models used to generate random binary constraint satisfaction problems are described. At the problem sizes studied experimentally, a phase transition is seen as the constraint tightness is varied. However, D. Achlioptas, L. M. Kirousis, E. Kranakis, D. Krizanc, M. S. O. Molloy and Y. C. Stamatiou [Lect. Notes Comput. Sci. 1330, 107-120 (1997)] showed that if the problem size (number of variables) increases while the remaining parameters are kept constant, asymptotically almost all instances are unsatisfiable. In this paper, an alternative scheme for one of the standard models is proposed in which both the number of values in each variable’s domain and the average degree of the constraint graph are increased with problem size. It is shown that with this scheme there is asymptotically a range of values of the constraint tightness in which instances are trivially satisfiable with probability at least 0.5 and a range in which instances are almost all unsatisfiable; hence there is a crossover point at some value of the constraint tightness between these two ranges. This scheme is compared to a similar scheme due to Xu and Li.
Conference Paper
There has been significant recent progress in reasoning and constraint processing methods. In areas such as planning and finite model-checking, current solution techniques can handle combinatorial problems with up to a million variables and five million constraints. The good scaling behavior of these methods appears to defy what one would expect based on a worst-case complexity analysis. In order to bridge this gap between theory and practice, we propose a new framework for studying the complexity of these techniques on practical problem instances. In particular, our approach incorporates general structural properties observed in practical problem instances into the formal complexity analysis. We introduce a notion of "backdoors", which are small sets of variables that capture the overall combinatorics of the problem instance. We provide empirical results showing the existence of such backdoors in real-world problems. We then present a series of complexity results that explain the good scaling behavior of current reasoning and constraint methods observed on practical problem instances.
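To make the backdoor idea concrete, here is a naive, hypothetical check of whether a candidate variable set is a weak backdoor with respect to unit propagation as the sub-solver: some assignment of the set must let propagation alone certify satisfiability of the rest. The paper's notion is parameterized by the choice of sub-solver, and its empirical procedures are more elaborate; this is only a sketch.

```python
from itertools import product

def unit_propagate(clauses, assignment):
    """Propagate unit clauses; return (status, assignment) where status is
    'SAT', 'UNSAT', or 'UNKNOWN' (propagation got stuck)."""
    assignment = dict(assignment)
    while True:
        changed = False
        remaining = []
        for clause in clauses:
            vals = [assignment.get(abs(l)) for l in clause]
            if any(v == (l > 0) for l, v in zip(clause, vals)):
                continue                      # clause already satisfied
            unassigned = [l for l, v in zip(clause, vals) if v is None]
            if not unassigned:
                return 'UNSAT', assignment    # clause falsified
            if len(unassigned) == 1:
                l = unassigned[0]
                assignment[abs(l)] = l > 0    # forced literal
                changed = True
            remaining.append(clause)
        if not remaining:
            return 'SAT', assignment          # every clause satisfied
        if not changed:
            return 'UNKNOWN', assignment
        clauses = remaining

def is_weak_backdoor(clauses, candidate_vars):
    """True if some assignment to candidate_vars lets unit propagation
    alone certify satisfiability of the whole formula."""
    for bits in product([False, True], repeat=len(candidate_vars)):
        partial = dict(zip(candidate_vars, bits))
        if unit_propagate(clauses, partial)[0] == 'SAT':
            return True
    return False

# Tiny example: {x1} is a weak backdoor for (x1 or x2) & (-x1 or x3) & (-x3 or x2).
print(is_weak_backdoor([[1, 2], [-1, 3], [-3, 2]], [1]))   # True
```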
Article
To evaluate the performance of SAT algorithms empirically, we have already proposed a new type of random test-instance generator with known answers and proved securities. In this paper, we mainly discuss its controllability of several attributes such as the literal distribution, the clause-size distribution and the number of satisfying truth assignments. The generation algorithm has been implemented into four different generators, which have been used to test a Davis-Putnam algorithm and a local search algorithm to examine how the former works for unsatisfiable predicates of low clause/variable ratio and how the latter’s efficiency depends on the number of solutions.
Article
Stochastic algorithms are among the best for solving computationally hard search and reasoning problems. The runtime of such procedures is characterized by a random variable. Different algorithms give rise to different probability distributions. One can take advantage of such differences by combining several algorithms into a portfolio, and running them in parallel or interleaving them on a single processor. We provide a detailed evaluation of the portfolio approach on distributions of hard combinatorial search problems. We show under what conditions the portfolio approach can have a dramatic computational advantage over the best traditional methods.
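A toy sketch of the underlying calculation: given runtime samples for two hypothetical solvers, estimate the expected runtime of a two-processor portfolio (both started in parallel, stopping at the first to finish) and compare it with running either solver alone. The distributions below are placeholders, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder runtime distributions for two hypothetical solvers on the
# same instance distribution: solver A is heavy-tailed, solver B is not.
runtimes_a = rng.pareto(1.2, 10_000) * 10.0 + 1.0   # heavy tail
runtimes_b = rng.gamma(4.0, 5.0, 10_000) + 1.0      # well-behaved

# Two-processor parallel portfolio: on each (simulated) instance its
# runtime is the minimum of the two solvers' runtimes.
portfolio = np.minimum(runtimes_a, runtimes_b)

print("mean A        :", runtimes_a.mean())
print("mean B        :", runtimes_b.mean())
print("mean portfolio:", portfolio.mean())
```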
Article
Recently developed techniques of the statistical mechanics of random systems are applied to the graph partitioning problem. The averaged cost function is calculated and agrees well with numerical results. The problem bears close resemblance to that of spin glasses. The authors find a spin glass transition in the system, and the low temperature phase space has an ultrametric structure. This sheds light on the nature of hard computation problems.
Article
The heavy-tailed phenomenon that characterises the runtime distributions of backtrack search procedures has received considerable attention over the past few years. Some have conjectured that heavy-tailed behaviour is largely due to the characteristics of the algorithm used. Others have conjectured that problem structure is a significant contributor. In this paper we attempt to explore the former hypothesis, namely we study how variable and value ordering heuristics impact the heavy-tailedness of runtime distributions of backtrack search procedures. We demonstrate that heavy-tailed behaviour can be eliminated from particular classes of random problems by carefully selecting the search heuristics, even when using chronological backtrack search. We also show that combinations of good search heuristics can eliminate heavy tails from quasigroups with holes of order 10 and 20, and give some insights into why this is the case. These results motivate a more detailed analysis of the effects that variable and value orderings can have on heavy-tailedness. We show how combinations of variable and value ordering heuristics can result in a runtime distribution being inherently heavy-tailed. Specifically, we show that even if we were to use an oracle to refute insoluble subtrees optimally, for some combinations of heuristics we would still observe heavy-tailed behaviour. Finally, we study the distributions of refutation sizes found using different combinations of heuristics and gain some further insights into what characteristics tend to give rise to heavy-tailed behaviour.
Conference Paper
Much progress has been made in terms of boosting the effectiveness of backtrack style search methods. In addition, during the last decade, a much better understanding of problem hardness, typical case complexity, and backtrack search behavior has been obtained. One example of a recent insight into backtrack search concerns so-called heavy-tailed behavior in randomized versions of backtrack search. Such heavy-tails explain the large variations in runtime often observed in practice. However, heavy-tailed behavior does certainly not occur on all instances. This has led to a need for a more precise characterization of when heavy-tailedness does and when it does not occur in backtrack search. In this paper, we provide such a characterization. We identify different statistical regimes of the tail of the runtime distributions of randomized backtrack search methods and show how they are correlated with the “sophistication” of the search procedure combined with the inherent hardness of the instances. We also show that the runtime distribution regime is highly correlated with the distribution of the depth of inconsistent subtrees discovered during the search. In particular, we show that an exponential distribution of the depth of inconsistent subtrees combined with a search space that grows exponentially with the depth of the inconsistent subtrees implies heavy-tailed behavior.
Article
The traveling salesman problem is one of the most famous combinatorial problems. We identify a natural parameter for the two-dimensional Euclidean traveling salesman problem. We show that for random problems there is a rapid transition between soluble and insoluble instances of the decision problem at a critical value of this parameter. Hard instances of the traveling salesman problem are associated with this transition. Similar results are seen both with randomly generated problems and benchmark problems using geographical data. Surprisingly, finite-size scaling methods developed in statistical mechanics describe the behaviour around the critical value in random problems. Such phase transition phenomena appear to be ubiquitous. Indeed, we have yet to find an NP-complete problem which lacks a similar phase transition.
Article
The traveling salesman problem (TSP) is one of the best-known combinatorial optimization problems. Branch-and-bound (BnB) is the best method for finding an optimal solution of the TSP. Previous research has shown that there exists a transition in the average computational complexity of BnB on random trees. We show experimentally that when the intercity distances of the asymmetric TSP are drawn uniformly from 0,1,2,…, r, the complexity of BnB experiences an easy-hard transition as r increases. We also observe easy-hard-easy complexity transitions when asymmetric intercity distances are chosen from a log-normal distribution. This transition pattern is similar to one previously observed on the symmetric TSP. We then explain these different transition patterns by showing that the control parameter that determines the complexity is the number of distinct intercity distances.
Article
Stochastic algorithms are among the best methods for solving computationally hard search and reasoning problems. The run time of such procedures can vary significantly from instance to instance and, when using different random seeds, on the same instance. One can take advantage of such differences by combining several algorithms into a portfolio, and running them in parallel or interleaving them on a single processor. We provide an evaluation of the portfolio approach on distributions of hard combinatorial search problems. We show under what conditions the portfolio approach can have a dramatic computational advantage over the best traditional methods. In particular, we will see how, in a portfolio setting, it can be advantageous to use a more “risk-seeking” strategy with a high variance in run time, such as a randomized depth-first search approach in mixed integer programming versus the more traditional best-bound approach. We hope these insights will stimulate the development of novel randomized combinatorial search methods.
Article
Completing partial Latin squares is shown to be NP-complete. Classical embedding techniques of Hall and Ryser underly a reduction from partitioning tripartite graphs into triangles. This in turn is shown to be NP-complete using a recent result of Holyer.
Article
This is a report of research carried out during 1992 and 1993 in which three different automated reasoning programs—DDPP, FINDER, and MGTP—were applied to a series of exhaustive search problems in the theory of quasigroups. All three of the programs succeeded in solving previously open problems concerning the existence of quasigroups satisfying certain additional conditions. Using different programs has allowed us to cross-check the results, helping reliability. We find this research interesting from several points of view: first, it brings techniques from the field of automated reasoning to bear on a rather different problem domain from that which motivated their development; second, investigating such hard problems leads us to push the limits of what our systems have achieved; and finally, it involves us in serious philosophical issues concerning essentially computational proofs.
Article
We present a detailed experimental investigation of the easy-hard-easy phase transition for randomly generated instances of satisfiability problems. Problems in the hard part of the phase transition have been extensively used for benchmarking satisfiability algorithms. This study demonstrates that problem classes and regions of the phase transition previously thought to be easy can sometimes be orders of magnitude more difficult than the worst problems in problem classes and regions of the phase transition considered hard. These difficult problems are either hard unsatisfiable problems or are satisfiable problems which give a hard unsatisfiable subproblem following a wrong split. Whilst these hard unsatisfiable problems may have short proofs, these appear to be difficult to find, and other proofs are long and hard.
Article
We introduce a technique for analyzing the behavior of sophisticated AI search programs working on realistic, large-scale problems. This approach allows us to predict where, in a space of problem instances, the hardest problems are to be found and where the fluctuations in difficulty are greatest. Our key insight is to shift emphasis from modelling sophisticated algorithms directly to modelling a search space that captures their principal effects. We compare our model's predictions with actual data on real problems obtained independently and show that the agreement is quite good. By systematically relaxing our underlying modelling assumptions we identify their relative contribution to the remaining error and then remedy it. We also discuss further applications of our model and suggest how this type of analysis can be generalized to other kinds of AI problems.
Article
The distribution of hard graph coloring problems as a function of graph connectivity is shown to have two distinct transition behaviors. The first, previously recognized, is a peak in the median search cost near the connectivity at which half the graphs have solutions. This region contains a high proportion of relatively hard problem instances. However, the hardest instances are in fact concentrated at a second, lower, transition point. Near this point, most problems are quite easy, but there are also a few very hard cases. This region of exceptionally hard problems corresponds to the transition between polynomial and exponential scaling of the average search cost, whose location we also estimate theoretically. These behaviors also appear to arise in other constraint problems. This work also shows the limitations of simple measures of the cost distribution, such as mean or median, for identifying outlying cases.
Article
Researchers in the areas of constraint satisfaction problems, logic programming, and truth maintenance systems have suggested various schemes for enhancing the performance of the backtracking algorithm. This paper defines and compares the performance of three such schemes: “backjumping,” “learning,” and “cycle-cutset.” The backjumping and the cycle-cutset methods work best when the constraint graph is sparse, while the learning scheme mostly benefits problem instances with dense constraint graphs. An integrated strategy is described which utilizes the distinct advantages of each scheme. Experiments show that, in hard problems, the average improvement realized by the integrated scheme is 20–25% higher than any of the individual schemes.
Article
In recent years there has been significant interest in the study of random k-SAT formulae. For a given set of n Boolean variables, consider the set of all possible disjunctions of k distinct, non-complementary literals from its variables (k-clauses). A random k-SAT formula is formed by selecting uniformly and independently m clauses from this set and taking their conjunction. Motivated by insights from statistical mechanics that suggest a possible relationship between the "order" of phase transitions and computational complexity, Monasson and Zecchina (Phys. Rev. E 56(2) (1997) 1357) proposed the random (2+p)-SAT model: for a given p∈[0,1], a random (2+p)-SAT formula has m randomly chosen clauses over n variables, where pm clauses are chosen from the set of 3-clauses and (1−p)m from the set of 2-clauses. Using the heuristic "replica method" of statistical mechanics, Monasson and Zecchina gave a number of non-rigorous predictions on the behavior of random (2+p)-SAT formulae. In this paper we give the first rigorous results for random (2+p)-SAT, including the following surprising fact: for p⩽2/5, with probability 1−o(1), a random (2+p)-SAT formula is satisfiable iff its 2-SAT subformula is satisfiable. That is, for p⩽2/5, random (2+p)-SAT behaves like random 2-SAT.
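A minimal sketch of a (2+p)-SAT instance generator in the sense just described, using DIMACS-style integer literals; this is simply the natural reading of the model, not code from the paper.

```python
import random

def random_clause(n, k, rng):
    """A random disjunction of k distinct, non-complementary literals."""
    variables = rng.sample(range(1, n + 1), k)
    return [v if rng.random() < 0.5 else -v for v in variables]

def random_2p_sat(n, m, p, seed=None):
    """Random (2+p)-SAT formula: p*m 3-clauses and (1-p)*m 2-clauses
    over n Boolean variables, each clause drawn uniformly at random."""
    rng = random.Random(seed)
    m3 = round(p * m)
    clauses = [random_clause(n, 3, rng) for _ in range(m3)]
    clauses += [random_clause(n, 2, rng) for _ in range(m - m3)]
    return clauses

# Example: n = 100 variables, ratio m/n = 2.5, p = 0.4.
formula = random_2p_sat(100, 250, 0.4, seed=0)
```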
Conference Paper
Existing random models for the constraint satisfaction problem (CSP) all require an extremely low constraint tightness in order to have non-trivial threshold behaviors and guaranteed hard instances at the threshold. We study the possibility of designing random CSP models that have interesting threshold and typical-case complexity behaviors while at the same time allowing a much higher constraint tightness. We show that random CSP models that enforce constraint consistency have guaranteed exponential resolution complexity without putting much restriction on the constraint tightness. A new random CSP model is proposed to generate random CSPs with a high tightness whose instances are always consistent. Initial experimental results are also reported to illustrate the sensitivity of instance hardness to the constraint tightness in classical CSP models and to evaluate the proposed new random CSP model.
Conference Paper
Consider a randomly generated Boolean formula F (in conjunctive normal form) with m clauses of size k over n variables; k is fixed at any value greater than 1, but n tends to infinity and m=(1+o(1))cn for some c depending only on k. It is easy to see that F is unsatisfiable with probability 1−o(1) whenever c > (ln 2)·2^k; we complement this observation by proving that F is satisfiable with probability 1−o(1) whenever c < (0.25)·2^k/k; in fact, we present a linear-time algorithm that satisfies F with probability 1−o(1). (This result is a continuation of work by Chao and Franco.) In addition, we establish a threshold for 2-SAT: if k=2 then F is satisfiable with probability 1−o(1) whenever c<1 and unsatisfiable with probability 1−o(1) whenever c>1.
Conference Paper
Recent empirical studies show that runtime distributions of backtrack procedures for solving hard combinatorial problems often have intriguing properties. Unlike standard distributions (such as the normal) , such distributions decay slower than exponentially and have "heavy tails". Procedures characterized by heavy-tailed runtime distributions exhibit large variability in efficiency, but a very straightforward method called rapid randomized restarts has been designed to essentially improve their average performance. We show on two experimental domains that heavy-tailed phenomena can be observed in ILP, namely in the search for a clause in the subsumption lattice. We also reformulate the technique of randomized rapid restarts to make it applicable in ILP and show that it can reduce the average search-time.
Conference Paper
In this paper we generalize a heuristic that we have introduced previously for efficiently solving random 3-SAT formulae. Our heuristic is based on the notion of backbone, searching for variables belonging to local backbones of a formula. This heuristic was limited to 3-SAT formulae. In this paper we generalize it by introducing a sub-heuristic called a re-normalization heuristic in order to handle formulae with various clause lengths, and particularly hard random k-SAT formulae with k ≥ 3. We implemented this new general heuristic in our previous program cnfs, a classical DPLL algorithm, renamed kcnfs. We give experimental results which show that kcnfs outperforms by far the best current complete solvers on any random k-SAT formula for k ≥ 3.
Conference Paper
We study the parameterized complexity of detecting backdoor sets for instances of the propositional satisfiability problem (SAT) with respect to the polynomially solvable classes Horn and 2-CNF. A backdoor set is a subset of variables; for a strong backdoor set, the simplified formulas resulting from any setting of these variables are in a polynomially solvable class, and for a weak backdoor set, there exists one setting which puts the satisfiable simplified formula in the class. We show that with respect to both the Horn and 2-CNF classes, the detection of a strong backdoor set is fixed-parameter tractable (the existence of a set of size k for a formula of length N can be decided in time f(k)·N^O(1)), but that the detection of a weak backdoor set is W[2]-hard, implying that this problem is not fixed-parameter tractable.
Conference Paper
We study the runtime profiles of complete backtrack-style search methods applied to hard scheduling problems. Such search methods often exhibit a large variability in performance due to the non-standard nature of their underlying cost distributions. The distributions generally exhibit very long tails or "heavy tails" and are best characterized by a general class of distributions that have no moments (i.e., an infinite mean, variance, etc.). We show how one can exploit the special nature of such distributions to significantly improve upon deterministic complete search procedures. Our CSP approach scales better than the best IP formulations. We then show how one can yet further improve upon our CSP approach by adding a stochastic element to the deterministic search strategy. We should stress that our method remains complete, unlike, for example, stochastic local search strategies. The effectiveness of our approach can be explained in terms of the heavy-tailed nature of the underlying cost distributions of complete backtrack-style search procedures. That is, the distributions are characterized by extreme outliers relative to the median cost value. This phenomenon manifests itself in terms of the long tails of the cost distributions and the highly erratic behavior of the mean search cost over multiple runs. Given its simplicity and generality, our approach can be easily adapted to improve the performance of other backtrack-style search methods used in planning and scheduling.
Conference Paper
We outline a technique for studying phase transition behaviour in computational problems using number partitioning as a case study. We first build an "annealed" theory that assumes independence between parts of the number partition problem. Using this theory, we identify a parameter which represents the "constrainedness" of a problem. We determine experimentally the critical value of this parameter at which a rapid transition between soluble and insoluble problems occurs. Finite-size scaling methods developed in statistical mechanics describe the behaviour around the critical value. We identify phase transition behaviour in both the decision and optimization versions of number partitioning, in the size of the optimal partition, and in the quality of heuristic solutions. This case study demonstrates how annealed theories and finite-size scaling allow us to compare algorithms and heuristics in a precise and quantitative manner.
Conference Paper
Variable ordering heuristics have long been an important component of constraint satisfaction search algorithms. In this paper we study the behaviour of standard variable ordering heuristics when searching an insoluble (sub)problem. We employ the notion of an optimal refutation of an insoluble (sub)problem and describe an algorithm for obtaining it. We propose a novel approach to empirically looking at problem hardness and typical-case complexity by comparing optimal refutations with those generated by standard search heuristics. It is clear from our analysis that the standard variable orderings used to solve CSPs behave very differently on real-world problems than on random problems of comparable size. Our work introduces a potentially useful tool for analysing the causes of the heavy-tailed phenomenon observed in the runtime distributions of backtrack search procedures.
Conference Paper
Of late, new insight into the study of random k-SAT formulae has been gained from the introduction of a concept inspired by models of physics, the 'backbone' of a SAT formula, which corresponds to the variables having a fixed truth value in all assignments satisfying the maximum number of clauses. In the present paper, we show that this concept, already invaluable from a theoretical viewpoint in the study of the satisfiability transition, can also play an important role in the design of efficient DPL-type algorithms for solving hard random k-SAT formulae, and more specifically 3-SAT formulae. We define a heuristic search for variables belonging to the backbone of a 3-SAT formula which are chosen as branch nodes for the tree developed by a DPL-type procedure. We give in addition a simple technique to magnify the effect of the heuristic. Implementation yields DPL-type algorithms with a significant performance improvement over the best current algorithms, making it possible to handle unsatisfiable hard 3-SAT formulae up to 700 variables.
Article
A propositional formula is in 2-CNF (2-conjunctive normal form) iff it is the conjunction of clauses each of which has exactly two literals. We show: If C=1+ε, where ε>0 is fixed and q(n)≥C·n, then almost all formulas in 2-CNF with q(n) different clauses, where n is the number of variables, are unsatisfiable. If C=1-ε and q(n)≤C·n, then almost all formulas with q(n) clauses are satisfiable. By "almost all" we mean that the probability of the set of unsatisfiable or satisfiable formulas among all formulas with q(n) clauses approaches 1 as n→∞. So C=1 gives us a threshold separating satisfiability and unsatisfiability of formulas in 2-CNF in a probabilistic, asymptotic sense. To prove our result we translate the satisfiability problem for formulas in 2-CNF into a graph theoretical question. Then we apply techniques from the theory of random graphs.
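The graph-theoretic translation mentioned at the end is closely related to the standard implication-graph criterion: a 2-CNF formula is satisfiable iff no variable lies in the same strongly connected component as its negation. A compact sketch of that criterion (using networkx for the SCC computation; the paper's own construction may differ):

```python
import networkx as nx

def two_sat_satisfiable(n_vars, clauses):
    """2-SAT via the implication graph: for each clause (a or b) add edges
    ~a -> b and ~b -> a; the formula is satisfiable iff no variable ends up
    in the same strongly connected component as its negation.
    Literals are non-zero ints in -n_vars..n_vars (DIMACS style)."""
    g = nx.DiGraph()
    g.add_nodes_from(range(1, n_vars + 1))
    g.add_nodes_from(range(-n_vars, 0))
    for a, b in clauses:
        g.add_edge(-a, b)
        g.add_edge(-b, a)
    for component in nx.strongly_connected_components(g):
        if any(-lit in component for lit in component):
            return False
    return True

print(two_sat_satisfiable(2, [(1, 2), (-1, 2), (1, -2)]))   # True  (x1=T, x2=T)
print(two_sat_satisfiable(1, [(1, 1), (-1, -1)]))           # False (x1 and not x1)
```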
Article
In this paper, we study the possibility of designing non-trivial random CSP models by exploiting the intrinsic connection between structures and typical-case hardness. We show that constraint consistency, a notion that has been developed to improve the efficiency of CSP algorithms, is in fact the key to the design of random CSP models that have interesting phase transition behavior and guaranteed exponential resolution complexity without putting much restriction on the parameter of constraint tightness or the domain size of the problem. We propose a very flexible framework for constructing problem instances with interesting behavior and develop a variety of concrete methods to construct specific random CSP models that enforce different levels of constraint consistency. A series of experimental studies with interesting observations are carried out to illustrate the effectiveness of introducing structural elements in random instances, to verify the robustness of our proposal, and to investigate features of some specific models based on our framework that are highly related to the behavior of backtracking search algorithms.
Article
In this paper we propose a new type of random CSP model, called Model RB, which is a revision to the standard Model B. It is proved that phase transitions from a region where almost all problems are satisfiable to a region where almost all problems are unsatisfiable do exist for Model RB as the number of variables approaches infinity. Moreover, the critical values at which the phase transitions occur are also known exactly. By relating the hardness of Model RB to Model B, it is shown that there exist a lot of hard instances in Model RB.
Article
We illustrate the use of phase transition behavior in the study of heuristics. Using an "annealed" theory, we define a parameter that measures the "constrainedness" of an ensemble of number partitioning problems. We identify a phase transition at a critical value of constrainedness. We then show that constrainedness can be used to analyze and compare algorithms and heuristics for number partitioning in a precise and quantitative manner. For example, we demonstrate that on uniform random problems both the Karmarkar-Karp and greedy heuristics minimize the constrainedness, but that the decisions made by the Karmarkar-Karp heuristic are superior at reducing constrainedness. This supports the better performance observed experimentally for the Karmarkar-Karp heuristic. Our results refute a conjecture of Fu that phase transition behavior does not occur in number partitioning. Additionally, they demonstrate that phase transition behavior is useful for more than just simple benchmarking. It can, for instance, be used to analyze heuristics, and to compare the quality of heuristic solutions.
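For reference, the Karmarkar-Karp (largest differencing) heuristic mentioned above repeatedly replaces the two largest numbers by their difference, thereby committing them to opposite sides of the partition; the greedy heuristic adds each number to the currently lighter side. A standard rendering of both, not the authors' code:

```python
import heapq

def karmarkar_karp(numbers):
    """Largest differencing method: repeatedly commit the two largest
    numbers to opposite sides of the partition by replacing them with
    their difference.  Returns the resulting partition difference."""
    heap = [-x for x in numbers]          # max-heap via negation
    heapq.heapify(heap)
    while len(heap) > 1:
        a = -heapq.heappop(heap)
        b = -heapq.heappop(heap)
        heapq.heappush(heap, -(a - b))
    return -heap[0] if heap else 0

def greedy(numbers):
    """Greedy heuristic: add each number (largest first) to the currently
    lighter side; returns the resulting partition difference."""
    s1 = s2 = 0
    for x in sorted(numbers, reverse=True):
        if s1 <= s2:
            s1 += x
        else:
            s2 += x
    return abs(s1 - s2)

nums = [8, 7, 6, 5, 4]
print(karmarkar_karp(nums), greedy(nums))   # 2 4  (optimal is 0: {8,7} vs {6,5,4})
```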
Article
We study the parameterized complexity of detecting small backdoor sets for instances of the propositional satisfiability problem (SAT). The notion of backdoor sets has been recently introduced by Williams, Gomes, and Selman for explaining the "heavy-tailed" behavior of backtracking algorithms. If a small backdoor set is found, then the instance can be solved efficiently by the propagation and simplification mechanisms of a SAT solver. Empirical studies indicate that structured SAT instances coming from practical applications have small backdoor sets. We study the worst-case complexity of detecting backdoor sets with respect to the simplification and propagation mechanisms of the classic Davis-Logemann-Loveland (DLL) procedure. We show that the detection of backdoor sets of size bounded by a fixed integer k is of high parameterized complexity. In particular, we determine that this detection problem (and some of its variants) is complete for the parameterized complexity class W[P]. We achieve this result by means of a generalization of a reduction due to Abrahamson, Downey, and Fellows.
Article
Heuristic methods for solution of problems in the NP-complete class of decision problems often reach exact solutions, but fail badly at "phase boundaries," across which the decision to be reached changes from almost always having one value to almost always having a different value. We report an analytic solution and experimental investigations of the phase transition that occurs in the limit of very large problems in K-SAT. Studying a model which interpolates K-SAT between K=2 and K=3, we find a change from a continuous to a discontinuous phase transition when K, the average number of inputs per clause, exceeds 2.4. The cost of finding solutions also increases dramatically above this changeover. The nature of its "random first-order" phase transition, seen at values of K large enough to make the computational cost of solving typical instances increase exponentially with problem size, suggests a mechanism for the cost increase.