The Impact of Balancing on Problem Hardness in a Highly Structured Domain.
ABSTRACT Random problem distributions have played a key role in the study and design of algorithms for constraint satisfaction and Boolean satisfiability, as well as in our understanding of problem hardness, beyond standard worst-case complexity. We consider random problem distributions from a highly structured problem domain that generalizes the Quasigroup Completion problem (QCP) and Quasigroup with Holes (QWH), a widely used domain that captures the structure underlying a range of real-world applications. Our problem domain is also a generalization of the well-known Sudoku puzzle: we consider Sudoku instances of arbitrary order, with the additional generalization that the block regions can have rectangular shape, in addition to the standard square shape. We evaluate the computational hardness of Generalized Sudoku instances, for different parameter settings. Our experimental hardness results show that we can generate instances that are considerably harder than QCP/QWH instances of the same size. More interestingly, we show the impact of different balancing strategies on problem hardness. We also provide insights into backbone variables in Generalized Sudoku instances and how they correlate to problem hardness.

Carlos Ansótegui¹, Ramón Béjar², César Fernàndez², Carla Gomes³, Carles Mateu²
¹carlos@iiia.csic.es, IIIA-CSIC, Barcelona, SPAIN
²{ramon,cesar,carlesm}@diei.udl.es, Dept. of Computer Science, Universitat de Lleida, SPAIN
³gomes@cs.cornell.edu, Dept. of Computer Science, Cornell University, USA
Introduction
In recent years we have seen a tremendous development of both complete and incomplete search methods for constraint satisfaction (CSP) and Boolean satisfiability (SAT) problems. An important factor in the development of new search methods is the availability of good sets of benchmark problems used to evaluate and fine-tune the algorithms. Random problem distributions have played a key role in the study and design of algorithms, as well as in our understanding of problem hardness, beyond standard worst-case complexity. Problem distributions, such as random k-SAT, have traditionally been used to study the typical-case complexity of combinatorial search problems. However, real-world problem instances have much more structure than that found in random SAT instances. In order to study the impact of structure on problem hardness, Gomes and Selman introduced the Quasigroup Completion problem (QCP) as a benchmark problem for evaluating combinatorial search methods: the structure underlying this domain can be found in a range of real-world applications, such as timetabling, routing, and design of statistical experiments (Gomes & Selman 1997). QCP (and its variants) has become a widely used benchmark domain and it has led researchers to the discovery of interesting phenomena that characterize combinatorial search and consequently to the design of new search and inference strategies for such domains. Examples include the so-called heavy-tailed phenomena, randomization and restart strategies, e.g., (Gomes, Selman, & Crato 1997; Gomes et al. 2004; Refalo 2004; Hulubei & O'Sullivan 2006), the design of efficient global constraints, e.g., (Shaw, Stergiou, & Walsh 1998; Régin & Gomes 2004), and trade-offs in different CSP and SAT based representations (see e.g. (Dotú, del Val, & Cebrián 2003; Ansótegui et al. 2004)).
∗Research partially supported by projects TIN2004-07933-C03-03, TIN2005-09312-C03-01 and TIC2003-00950 funded by the Ministerio de Educación y Ciencia.
Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
We consider random problem distributions from a highly structured problem domain that generalizes QCP and Quasigroup with Holes (QWH). Our problem domain also generalizes the popular Sudoku puzzle: in Sudoku the goal is to complete a partially filled 9x9 matrix, with 9 symbols, without repeating a symbol in any row, in any column, or in any of its nine 3x3 sub-matrix regions. Like QCP and QWH, Sudoku's structure is that of a Latin square, with the additional constraint that all 9 symbols must appear in each of its 9 sub-regions of size 3x3. Although some Sudoku instances can be challenging for humans, they can be easily solved by current state-of-the-art CSP or SAT solvers, even when only using polytime inference methods, without search (Simonis 2005; Lynce & Ouaknine 2006). So, our goal is to produce challenging Generalized Sudoku Problem (GSP) instances for CSP and SAT based search algorithms.
A Generalized Sudoku instance has an arbitrary order n with block regions that can have rectangular or square shape, and it can have more than one solution or it can even be unsatisfiable. We show that our Generalized Sudoku problem (GSP) is NP-complete. We also provide a method for generating GSP instances that are guaranteed to have solutions. We refer to this variant as the Generalized Sudoku with Holes (GSWH) problem, inspired by the QWH problem (Achlioptas et al. 2000; Barták 2005). GSWH instances are generated by “punching” holes into a complete GS instance. A key question concerning GSWH is the impact of different hole punching strategies on problem hardness. We present a method that produces highly balanced GSWH instances: the idea is to balance the number of holes between the regions of the Sudoku, in addition to balancing the number of holes between rows and columns. Our results show that our GSWH instances are harder to solve than QWH instances; furthermore, we also show that hardness increases as a function of the squareness of the block regions. In other words, GS instances with square block regions are harder than GS instances with rectangular block regions, for the same order. Concerning hole patterns, our highly balanced method of punching holes produces instances substantially harder than balanced QWH instances (Kautz et al. 2001). An open question, raised by our results, is to what extent our method produces harder instances because it increases the “bandwidth” of the corresponding bipartite graph associated with the hole pattern. Finally, we also show an interesting correlation across different constrainedness regions between an approximation of the number of so-called backbone variables and the complexity of GSWH.
Generalized Sudoku Problems
A valid complete Generalized Sudoku (GS) instance of order s on s symbols is an s × s matrix, with each of its s² cells filled with one of the s symbols, such that no symbol is repeated in a row, column, or block region. A block region is a set of s predefined cells; block regions don't overlap, and therefore there are exactly s block regions in a GS instance of order s. In the case of square block regions, each block has √s × √s cells (√s has to be an integer); in the case of rectangular regions, each block has n × l cells (n columns, l rows), always with n · l = s. One can also consider block regions of arbitrary shape. These generalizations provide us with a range of interesting problems, starting from quasigroup with holes (QWH) (Achlioptas et al. 2000) (when the block regions correspond to single rows) to standard generalized Sudokus (when the regions are squares). In this paper we consider GS instances with rectangular and square block regions.
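The definition above can be made concrete with a small checker. This is only an illustrative sketch, not code from our experiments: the function name `is_valid_complete_gs`, the 0-based symbols 0..s-1 and the list-of-lists grid representation are assumptions made for the example.

```python
from typing import List


def is_valid_complete_gs(grid: List[List[int]], n: int, l: int) -> bool:
    """Check a complete Generalized Sudoku whose blocks have n columns and l rows.

    Symbols are assumed to be the integers 0..s-1, with s = n * l.
    """
    s = n * l
    symbols = set(range(s))
    if len(grid) != s or any(len(row) != s for row in grid):
        return False
    # Latin square conditions: every row and every column holds each symbol once.
    for k in range(s):
        if set(grid[k]) != symbols:
            return False
        if {grid[i][k] for i in range(s)} != symbols:
            return False
    # Block regions: the top-left corners advance by l rows and n columns.
    for top in range(0, s, l):
        for left in range(0, s, n):
            block = {grid[i][j]
                     for i in range(top, top + l)
                     for j in range(left, left + n)}
            if block != symbols:
                return False
    return True


# A 4x4 instance that is valid for 2x2 blocks (n = l = 2).
example = [
    [0, 1, 2, 3],
    [2, 3, 0, 1],
    [1, 0, 3, 2],
    [3, 2, 1, 0],
]
# The cyclic Latin square is valid when blocks are single rows (n = 4, l = 1),
# which is the QWH end of the range, but not for 2x2 blocks.
cyclic = [[(i + j) % 4 for j in range(4)] for i in range(4)]
```

With n = s and l = 1 the block check repeats the row check, which mirrors the remark that QWH is the case where block regions correspond to single rows.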
The Generalized Sudoku problem (GSP) is defined as follows: given a partially filled Generalized Sudoku instance of order s, can we fill the remaining cells of the s × s matrix such that we obtain a valid complete Generalized Sudoku instance? GSP is NP-complete, given that it is a generalization of Sudoku, known to be NP-complete (Yato & Seta 2002). To show the NP-completeness of GSP when the regions are strictly of rectangular shape, with dimensions n × l = s, we use a reduction from completing a partial Latin square (LS) of side l that can be seen as a generalization of the one in (Yato & Seta 2002). The idea is that now we use a GS of side n · l with the set of symbols (a,b) with 0 ≤ a < l, 0 ≤ b < n; the set of symbols associated with the LS is still the subset (a,0). To embed the LS into the GSP instance, we fill position S(i,j) of the GSP instance with the symbol:
    S(i,j) = \big( (i + \lfloor j/n \rfloor) \bmod l,\ (j + \lfloor i/l \rfloor) \bmod n \big)    (1)
Observe that in this (totally filled) GSP instance the set of symbols placed in the positions:
    B = \{ (i,j) \mid 0 \le i < l,\ j \bmod n = 0 \}    (2)
is the subset (a,0) and that the positions in B form a LS of side l with that subset of symbols. Then, we embed the partial LS of side l by mapping symbol i of the LS to symbol (i,0) of the GSP and changing the contents of the set of positions B to embed the partial LS in this way:
    S(i,j) = \begin{cases} (LS(i, j/n),\, 0) & (i,j) \in B,\ LS(i, j/n) \neq \bot \\ \bot & (i,j) \in B,\ LS(i, j/n) = \bot \end{cases}    (3)
So, the (partially filled) GSP instance with rectangular block regions will have a solution if and only if the partial LS has a solution.
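The construction used in the reduction can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than part of the proof: the names `canonical_gs` and `embed_partial_ls` are hypothetical, indices are 0-based, empty cells (⊥) are represented by `None`, and the modulo in equation 1 is read as applying to each whole sum.

```python
from typing import List, Optional, Tuple

Symbol = Tuple[int, int]


def canonical_gs(n: int, l: int) -> List[List[Symbol]]:
    """Totally filled GS of side n*l given by equation 1 (blocks: n columns, l rows)."""
    s = n * l
    return [[((i + j // n) % l, (j + i // l) % n) for j in range(s)]
            for i in range(s)]


def embed_partial_ls(ls: List[List[Optional[int]]],
                     n: int) -> List[List[Optional[Symbol]]]:
    """Overwrite the positions B of the canonical GS with a partial Latin square.

    ls has side l; ls[i][c] is a symbol in 0..l-1 or None for an empty cell.
    Position (i, c*n) of the GS receives (ls[i][c], 0), or None if ls[i][c] is empty.
    """
    l = len(ls)
    grid: List[List[Optional[Symbol]]] = [list(row) for row in canonical_gs(n, l)]
    for i in range(l):
        for c in range(l):
            value = ls[i][c]
            grid[i][c * n] = None if value is None else (value, 0)
    return grid
```

For n = 3 and l = 2, the cells of B in `canonical_gs(3, 2)` are (0,0), (0,3), (1,0) and (1,3); they carry only symbols of the form (a,0) and form a Latin square of side 2, as the text observes.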
Generating complete Generalized Sudokus
To generate a GS instance, we follow the approach of building a Markov chain whose set of states includes, as a subset, the set of complete Generalized Sudokus. We use the second Markov chain defined in (Jacobson & Matthews 1996), which considers only “proper” Latin squares as states. Because any GS is also a Latin square of the same order, this chain obviously includes a subchain with all the possible GSs. So, for any pair of complete GSs there exists a sequence of proper moves, of the type mentioned in Theorem 6 of (Jacobson & Matthews 1996), that transforms one into the other.
However, if we simply use the Markov chain by making (proper) moves uniformly at random, starting from an initial complete GS, most of the time we will reach Latin squares that are not valid GSs. To cope with this problem, we select the move that minimizes the number of “violated” cells, i.e., cells with a color that appears at least once more in the region of the cell. To escape local minima, we select the move uniformly at random every certain number of moves.¹ So, to generate a random GS, we start from an initial GS² and perform the moves described above until we have generated a certain number of valid GSs. Observe that this method does not necessarily generate a GS instance uniformly at random, because we do not always select the next move uniformly at random. However, as we will see in the experimental results, the method provides us with computationally very hard instances of the GSWH problem, once we punch holes in the appropriate manner.
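The quantity minimized when selecting a move, the number of “violated” cells, is easy to state in code. The sketch below covers only that objective and not the moves of (Jacobson & Matthews 1996); the function name `count_violated_cells` and the integer colors are assumptions made for the illustration.

```python
from collections import Counter
from typing import List


def count_violated_cells(square: List[List[int]], n: int, l: int) -> int:
    """Count cells whose color appears at least once more in their block region.

    square is a Latin square of side n*l; blocks have n columns and l rows.
    A valid Generalized Sudoku has zero violated cells.
    """
    s = n * l
    violated = 0
    for top in range(0, s, l):
        for left in range(0, s, n):
            cells = [square[i][j]
                     for i in range(top, top + l)
                     for j in range(left, left + n)]
            counts = Counter(cells)
            # Every cell that shares its color with another cell of the block counts.
            violated += sum(1 for color in cells if counts[color] > 1)
    return violated
```

A greedy step would evaluate this count on each candidate successor and keep the smallest, falling back to a uniformly random move at fixed intervals to escape local minima.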
¹In our experiments, this number has been fixed to 20, because it works reasonably well with all the orders of Generalized Sudokus that we have tested.
²Note that we can trivially generate a Generalized Sudoku of arbitrary order, with arbitrary rectangular or square regions, using equation 1.
Balanced Hole Patterns
Figure 1: Examples of singly balanced hole patterns for Sudoku problems (panels: 8x2 Sudoku, 4x4 Sudoku) that allow decomposition into smaller (and easier) problems. Empty cells are in gray.
Kautz et al. (Kautz et al. 2001) introduced a method for balancing the hole pattern of a QWH instance, such that the number of holes in each row and column is equal. We refer to such a method as “singly balanced”. It turns out that this way of balancing does not lead to the hardest Generalized Sudoku instances. To see why, consider the following hole pattern for a Sudoku problem with regions of dimensions n × l such that n mod l = 0. We punch holes in all the cells of a region, choosing the regions in such a way that in every region row (n in total) the number of regions with holes is 1 and in every region column (l in total) this number is n/l. This creates an instance with the same number of holes (n) in every row and column, but one that can be decomposed into l smaller independent subproblems that only involve n/l regions each. Observe that for (standard) Sudoku instances, n/l = 1, so we get a set of n trivial subproblems. Figure 1 shows two examples of this hole pattern, one for 8x2 Sudokus and the other for 4x4 Sudokus.
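One way to build such a decomposable pattern is sketched below. The assignment of region row p to region column p mod l is only one of many valid choices and is an assumption of this example, as is the function name `decomposable_pattern`.

```python
from typing import Set, Tuple


def decomposable_pattern(n: int, l: int) -> Set[Tuple[int, int]]:
    """Singly balanced hole pattern made of fully emptied regions.

    Regions have n columns and l rows, with n mod l = 0. Region row p
    (0 <= p < n) gets its single emptied region in region column p mod l,
    so each of the l region columns receives n/l emptied regions.
    """
    if n % l != 0:
        raise ValueError("this pattern requires n mod l = 0")
    holes = set()
    for p in range(n):
        q = p % l
        for i in range(p * l, (p + 1) * l):
            for j in range(q * n, (q + 1) * n):
                holes.add((i, j))
    return holes
```

For 8x2 regions this yields n = 8 holes in every row and every column of the 16x16 grid, although all holes are concentrated in 8 of the 16 regions.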
So, the distribution of holes between different regions can make a difference in the difficulty of the problem, and one would like to balance the holes also between different regions. Even if the method of (Kautz et al. 2001) will tend to distribute the holes approximately uniformly between the regions, it will not always ensure that we have exactly the same number of holes in every region when the total number of holes is a multiple of the size of the Sudoku. For that reason, we propose a method for ensuring both balance conditions: same number of holes in every row and column and same number of holes in each block region.
Our new “doubly balanced” method is again based on a Markov chain, where every state is a hole pattern H with h holes that satisfies both balance conditions. In order to go from one hole pattern to another we use a “switch” (Kannan, Tetali, & Vempala 1997). This kind of move was introduced to build a Markov chain where every state is a regular bipartite graph (with vertices of a given degree), the Markov chain is connected, and it includes all possible such graphs. A hole pattern with the same number of holes in each row and column can be seen as a regular bipartite graph. A switch is defined by two entries (i,j), (i′,j′) of the GSP instance, such that there is a hole in (i,j) and (i′,j′) but not in (i,j′) and (i′,j). If we move both holes to positions (i,j′) and (i′,j), the number of holes in each row and column does not change. So, we use this Markov chain, but we restrict the moves to those that also maintain the number of holes present in each region, i.e., moves such that the regions of (i,j) and (i′,j′) are in the same region row, in the same region column, or both. Such moves always exist, even for a hole pattern with only one hole in each region. So, we have the following code for generating a hole pattern H with q = h/(n · l) holes in each row, column and region, using a GS S(i,j) to create the initial pattern by considering each symbol in [1,q] as a hole, and then performing s moves through the Markov chain:
H = { (i,j) | S(i,j) ∈ [1,q] }
Repeat s times:
    T = { switch((i,j),(i′,j′)) of H | ⌊i/l⌋ = ⌊i′/l⌋ ∨ ⌊j/n⌋ = ⌊j′/n⌋ }
    pick a uniformly random switch((i,j),(i′,j′)) from T
    H = (H − {(i,j),(i′,j′)}) ∪ {(i,j′),(i′,j)}
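The pseudocode above translates almost line by line into Python. The following is a minimal sketch and not the generator used in our experiments: the function names, the `moves` parameter (the number of iterations of the loop), 0-based cell indices and the optional seeded `rng` are assumptions of the example. It enumerates all admissible switches at every step, which is quadratic in the number of holes and is meant for clarity rather than speed.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple

Cell = Tuple[int, int]


def region_preserving_switches(holes: Set[Cell], n: int, l: int) -> List[Tuple[Cell, Cell]]:
    """All switches of the hole pattern that keep the per-region hole counts."""
    ordered = sorted(holes)
    switches = []
    for a, (i, j) in enumerate(ordered):
        for i2, j2 in ordered[a + 1:]:
            # Both target cells must be filled; this also rules out i == i2 or j == j2.
            if (i, j2) in holes or (i2, j) in holes:
                continue
            # Same region row or same region column (blocks: n columns, l rows).
            if i // l == i2 // l or j // n == j2 // n:
                switches.append(((i, j), (i2, j2)))
    return switches


def doubly_balanced_holes(gs: Sequence[Sequence[int]], n: int, l: int, q: int,
                          moves: int, rng: Optional[random.Random] = None) -> Set[Cell]:
    """Hole pattern with q holes in every row, column and region of a complete GS.

    gs holds symbols 1..n*l; the cells holding symbols 1..q form the initial pattern.
    """
    rng = rng or random.Random()
    s = n * l
    holes = {(i, j) for i in range(s) for j in range(s) if 1 <= gs[i][j] <= q}
    for _ in range(moves):
        switches = region_preserving_switches(holes, n, l)
        if not switches:  # defensive guard only
            break
        (i, j), (i2, j2) = rng.choice(switches)
        holes -= {(i, j), (i2, j2)}
        holes |= {(i, j2), (i2, j)}
    return holes
```

Because every accepted switch preserves the three counts, the balance conditions hold after any number of moves; only the mixing of the chain depends on how many moves are performed.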
We also considered a different method for punching holes, based on the rectangular model also presented in (Kautz et al. 2001). The rectangular model selects a set of columns (or rows) and punches holes in all the cells of these columns: in the case of QWH, this method produces tractable instances (Kautz et al. 2001); they can be solved using an algorithm based on bipartite graph matching. An open question is whether a similar hole pattern in the case of GSP instances also corresponds to a tractable class. Figure 2 shows an example of a solvable 3x3 GSP instance, with a rectangular hole pattern. The initial assignment is indicated by the grayed cells. After two rounds of the bipartite graph matching algorithm, a possible outcome is that the first two columns are completed in the way shown; this configuration cannot be completed into a valid Sudoku (even though there exists a valid completion of the initial partially filled instance). So, the rectangular hole pattern could still provide hard instances for Sudoku. For that reason, we propose a rectangular model for Generalized Sudoku that distributes a set of c hole columns between the different region columns of the Generalized Sudoku in a uniform way. The uniform distribution of the hole columns tries to minimize the clustering of holes.
Figure 2: Grayed cells represent the initial assignment. The cells marked with × cannot be completed after the first two columns are completed in the way shown.
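The text does not fix the details of the uniform distribution of hole columns, so the following Python sketch is one possible reading and should be taken as an assumption: hole columns are spread as evenly as possible over the l region columns, and the columns inside each region column are chosen at random. The name `rectangular_hole_columns` is hypothetical.

```python
import random
from typing import List, Optional


def rectangular_hole_columns(n: int, l: int, c: int,
                             rng: Optional[random.Random] = None) -> List[int]:
    """Choose c distinct hole columns, spread evenly over the l region columns.

    Region column q covers grid columns q*n .. (q+1)*n - 1 (blocks have n columns).
    Each region column receives floor(c/l) or ceil(c/l) hole columns.
    """
    s = n * l
    if not 0 <= c <= s:
        raise ValueError("c must lie between 0 and n*l")
    rng = rng or random.Random()
    base, extra = divmod(c, l)
    region_order = list(range(l))
    rng.shuffle(region_order)  # which region columns get the extra hole column
    columns: List[int] = []
    for rank, q in enumerate(region_order):
        count = base + (1 if rank < extra else 0)
        columns.extend(rng.sample(range(q * n, (q + 1) * n), count))
    return sorted(columns)
```

The hole pattern itself is then the set of all cells (i, j) with j among the returned columns.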
Encodings and Solvers
In this work we consider the best performing encodings for the QCP analyzed in (Ansótegui et al. 2004), and we extend them with a suitable representation of the additional alldiff constraints for the regions in the Generalized Sudoku Problem (GSP). We consider a GSP instance on s symbols.
The SAT encoding extends the SAT 3-dimensional (3D) encoding proposed in (Kautz et al. 2001) for the QCP. The encoding uses s Boolean variables per cell; each variable represents a symbol assigned to a cell, and the total number of variables is s³. The clauses corresponding to the 3D encoding represent the following constraints:
1. at least one symbol must be assigned to each cell (alo-cell);
2. no two symbols are assigned to the same cell (amo-cell);
3. each symbol must appear at least once in each row (alo-row);
4. no symbol is repeated in the same row (amo-row);
5. each symbol must appear at least once in each column (alo-column);
6. no symbol is repeated in the same column (amo-column).
Finally, for each region i the clauses we add represent the following constraints:
7. each symbol must appear at least once in region i (alo-region_i);
8. no symbol is repeated in region i (amo-region_i).
The same SAT encoding is considered in (Lynce & Ouaknine 2006).
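The eight constraint families listed above can be generated mechanically. The sketch below is an illustration, not the encoder used in our experiments: the variable numbering, the pairwise form of the at-most-one constraints, and the unit clauses for the prefilled cells are assumptions of this example, as is the name `gsp_to_cnf`. Clauses are lists of signed integers in the usual DIMACS style.

```python
from itertools import combinations
from typing import List, Optional, Sequence


def gsp_to_cnf(grid: Sequence[Sequence[Optional[int]]], n: int, l: int) -> List[List[int]]:
    """3D encoding plus region constraints for a GSP instance.

    grid[i][j] is a symbol in 0..s-1 or None for a hole; blocks have n columns
    and l rows, with s = n * l.
    """
    s = n * l
    clauses: List[List[int]] = []

    def var(i: int, j: int, k: int) -> int:
        # Boolean variable "cell (i, j) holds symbol k", numbered from 1 to s^3.
        return i * s * s + j * s + k + 1

    def exactly_one(literals: List[int]) -> None:
        clauses.append(list(literals))                                   # alo
        clauses.extend([-a, -b] for a, b in combinations(literals, 2))   # amo

    for i in range(s):
        for j in range(s):
            exactly_one([var(i, j, k) for k in range(s)])                # cell
    for k in range(s):
        for i in range(s):
            exactly_one([var(i, j, k) for j in range(s)])                # row
        for j in range(s):
            exactly_one([var(i, j, k) for i in range(s)])                # column
        for top in range(0, s, l):
            for left in range(0, s, n):
                exactly_one([var(i, j, k)
                             for i in range(top, top + l)
                             for j in range(left, left + n)])            # region
    # Assumption of this sketch: prefilled cells become unit clauses.
    for i in range(s):
        for j in range(s):
            if grid[i][j] is not None:
                clauses.append([var(i, j, grid[i][j])])
    return clauses
```

Each of the four families contributes s² groups of 1 + s(s−1)/2 clauses, so the size of the formula grows as O(s⁴) on top of the unit clauses.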
The CSP encoding extends the “bichannelling model” of (Dotú, del Val, & Cebrián 2003). It consists of:
• A set of primal variables X = {x_ij | 1 ≤ i ≤ s, 1 ≤ j ≤ s}; the value of x_ij is the symbol assigned to the cell in the ith row and jth column.
• Two sets of dual variables: R = {r_ik | 1 ≤ i ≤ s, 1 ≤ k ≤ s}, where the value of r_ik is the column j where symbol k occurs in row i; and C = {c_jk | 1 ≤ j ≤ s, 1 ≤ k ≤ s}, where the value of c_jk is the row i where symbol k occurs in column j.
The domain of all variables is {1,...,s}, where these values represent respectively symbols, columns, and rows. Variables of different types are linked by channeling constraints:
• Row channeling constraints link the primal variables with the row dual variables: x_ij = k ⇔ r_ik = j.
• Column channeling constraints link the primal variables with the column dual variables: x_ij = k ⇔ c_jk = i.
The concrete CSP encoding we use in our experimental investigation is a binary CSP represented with nogoods. Finally, for each region we add the nogoods representing the alldiff constraint over the set of primal variables involved in that region of the Sudoku problem.
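The relationship between primal and dual variables can be illustrated on a complete assignment. This is only a sketch of the channeling constraints, not of the nogood-based solver input; the helper names are hypothetical and indices are 0-based here, whereas the text uses 1..s.

```python
from typing import List, Tuple


def dual_variables(x: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Derive the dual variables from a complete primal assignment x.

    r[i][k] is the column where symbol k occurs in row i;
    c[j][k] is the row where symbol k occurs in column j.
    """
    s = len(x)
    r = [[-1] * s for _ in range(s)]
    c = [[-1] * s for _ in range(s)]
    for i in range(s):
        for j in range(s):
            k = x[i][j]
            r[i][k] = j
            c[j][k] = i
    return r, c


def channeling_holds(x: List[List[int]], r: List[List[int]], c: List[List[int]]) -> bool:
    """Check x_ij = k <=> r_ik = j and x_ij = k <=> c_jk = i for all i, j, k."""
    s = len(x)
    return all((x[i][j] == k) == (r[i][k] == j) and (x[i][j] == k) == (c[j][k] == i)
               for i in range(s) for j in range(s) for k in range(s))
```

In a solver the three views are propagated against each other during search; here they are computed after the fact only to make the equivalences concrete.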
For the experimental investigation we considered four state-of-the-art SAT solvers: Satz (Li & Anbulagan 1997), zChaff (Moskewicz et al. 2001), MiniSAT (Eén & Sörensson 2003), and Siege.³ We also considered a CSP solver, a variation of the MAC solver by Régin and Bessière (Bessière & Régin 1996) (MAC-LAH) proposed in (Ansótegui et al. 2004) that incorporates the technique of failed literals and Satz's heuristic in terms of a CSP approach.
Complexity Patterns
Singly Balanced
We consider first the complexity of solving GSWH instances generated with the singly balanced method. Our first set of results shows the complexity of solving GSWH instances with different region shapes, comparing it with the complexity of solving QWH instances. Figure 3 shows the results for GSWH with regions 15x2, 10x3 and 6x5 (size 30) and QWH also of size 30 (the encoding used for QWH is the 3D). We employ 100 instances per point and the MiniSAT solver with a cutoff of 5,000 seconds. We observe similar complexity patterns for all problems, the difference being that as the shape of the block region gets closer to a square, the peak in complexity increases. So, for the same size, the easiest instances are from QWH and the hardest ones are the ones from GSWH with regions that are almost square.⁴ Observe that the difference between QWH and GSWH with square regions is about three orders of magnitude for this size. A possible partial explanation of this fact is the following. For a GSWH instance with regions n × l and l fixed, when we fix a given cell, the larger n, the more cells in a same region become constrained in each of the regions that intersect with the row of the fixed cell. So, when fixing a cell, for a large n some regions will become more constrained than others, and this may simplify the problem in a way that look-ahead heuristics can take advantage of. By contrast, for square regions, fixing a cell constrains the same number of cells (if the current hole pattern is balanced) in all the regions that intersect with the row or the column of the fixed cell. So, again it seems that balance is making a difference in the complexity of this problem.
Figure 3: Empirical complexity patterns for singly balanced GSWH instances with different region shapes, but same size. (Plot: time in seconds, logarithmic scale from 0.001 to 10000, versus number of holes, from 300 to 650; series: QWH 30, Sudoku 15x2, Sudoku 10x3, Sudoku 6x5.)
We observe the same qualitative behavior when using different SAT algorithms. The main difference is in the magnitude of the peak of the complexity curve. Figure 4 shows a plot with the performance of different algorithms in the critically constrained area for different GSWH problems. The plot shows, for different sizes and different algorithms, the percentage of solved instances from a test set of 200 instances, when using a cutoff of 10^4 seconds. For small sizes, all algorithms solve almost all the instances. But as the size increases, the solver MiniSAT clearly outperforms the other solvers.
³Available at http://www.cs.sfu.ca/~loryan/personal
⁴Since 30 is not a perfect square, we cannot have perfectly square regions.
Doubly Balanced
Next, we consider the doubly balanced method for punching holes. When using this method, the typical hardness of the GSWH instances seems to be very similar to that of the singly balanced method for small sizes; however, as we increase the size of the instances, we see a dramatic difference in computational hardness. This is probably due to the fact that the singly balanced method tends to distribute the holes uniformly between regions in such a way that the difference with respect to the doubly balanced method is not significant for small instances. This can be quantified by looking at the percentage of solved instances from a test set with 500 instances, for both methods, when working with a cutoff of 5,000 seconds. Table 1 shows these values. The values are almost the same for 5x5 Sudokus, but for the instances of higher orders, we observe an increase in the hardness of the instances. This is reflected in the decrease in the number of solved instances, for a given cutoff, when using the doubly balanced method, in comparison to the singly balanced one.
Figure 4: Empirical complexity patterns for singly balanced GSWH instances with different region shapes. (Plot: % solved instances, from 0 to 100, versus Sudoku block size; block sizes shown: 15x2, 16x2, 7x4, 10x3, 17x2, 18x2, 6x5, 11x3, 8x4, 12x3, 7x5, 6x6, 9x4; series: minisat, siege, satz, zchaff, MAC-LAH.)
region   holes   singly balanced   doubly balanced
                 % solved          % solved
5x5      344     98.8              98.8
7x4      414     71.6              67.4
17x2     504     48.6              37.8
18x2     572     33.8              24.9
6x5      480     18.6              11.6
Table 1: Comparison of the percentage of solved GSWH instances generated with both methods for punching holes, for instances at the peak of hardness.
So, our doubly balanced method generates harder instances than the ones produced by balanced QWH. They are also guaranteed to be satisfiable, and therefore they constitute a good benchmark for the evaluation of local and systematic search methods.
Rectangular model
For the rectangular model, in which holes are punched across entire rows (or columns), our empirical results with Satz do not seem to show a clear exponential scaling of cost as the size is increased. Figure 5 shows the results of solving 7x4 GSWH instances with the rectangular and the doubly balanced methods. We observe that the complexity of solving the 7x4 problem is much higher for the doubly balanced model. Also, the fact that for Satz, with the problems tested so far, the hardest instances of the rectangular model are the ones obtained with the maximum number of holes (an empty Sudoku) seems to indicate that this pattern is not inherently hard. A key question is whether this pattern corresponds to a tractable class.
Figure 5: Comparison of the hardness of instances generated using the doubly balanced method versus the rectangular model. (Plot for Sudoku 7x4: median time in seconds, logarithmic scale from 0.1 to 10000, versus number of holes, from 380 to 500; series: doubly balanced, rectangular.)
Figure 6: Lookahead backbone and normalized complexity patterns for Sudokus with different regions. (Plot: number of variables, from 6000 to 13000, versus number of holes, from 300 to 480; series: 5x5, 9x3, 15x2, 7x4, 10x3.)
Backbone
In this section we discuss results about the correlation between the backbone of the satisfiable GS instances and computational hardness. It is known that computing the exact backbone is an intractable problem, so we consider only an approximation of the full backbone. In particular, we use the lookahead backbone provided by Satz. This is the set of variables that Satz discovers to have a unique value by checking all possible assignments with unit propagation and fixing every discovered backbone variable, until no new backbone variable is discovered.
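The lookahead procedure just described can be approximated by failed-literal probing. The sketch below is a simplified illustration and not Satz's implementation: the function names are hypothetical, clauses are lists of signed integers, and the returned set also includes the variables fixed by plain unit propagation, which is an assumption about how the fixed variables are counted.

```python
from typing import Dict, List, Optional

Clause = List[int]


def unit_propagate(clauses: List[Clause],
                   assignment: Dict[int, bool]) -> Optional[Dict[int, bool]]:
    """Extend the assignment by unit propagation; return None on a conflict."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                value = assignment.get(abs(lit))
                if value is None:
                    unassigned.append(lit)
                elif value == (lit > 0):
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return None  # every literal is false
            if len(unassigned) == 1:
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment


def lookahead_backbone(clauses: List[Clause], num_vars: int) -> Dict[int, bool]:
    """Variables whose value is forced by probing with unit propagation."""
    fixed = unit_propagate(clauses, {})
    if fixed is None:
        raise ValueError("formula refuted by unit propagation alone")
    progress = True
    while progress:  # repeat until no new variable is fixed
        progress = False
        for v in range(1, num_vars + 1):
            if v in fixed:
                continue
            for value in (True, False):
                if unit_propagate(clauses, {**fixed, v: value}) is None:
                    # v = value fails, so v must take the opposite value.
                    forced = unit_propagate(clauses, {**fixed, v: not value})
                    if forced is None:
                        raise ValueError("formula is unsatisfiable")
                    fixed = forced
                    progress = True
                    break
    return fixed
```

On the formula (x1 ∨ x2) ∧ (x1 ∨ ¬x2) ∧ (¬x1 ∨ x3 ∨ x4), probing x1 = false leads to a conflict, so x1 is fixed to true and no other variable is fixed.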
In our approximation of the backbone, we consider the fraction of variables fixed by Satz over the number of variables of the (satisfiable) encoded instance. Our satisfiable instances are obtained after preprocessing our instances to discard the cells that are discovered to have a unique possible value due to the initial partial assignment and the propagation of the Sudoku constraints. Figure 6 shows the evolution of the backbone together with the complexity of solving instances, for different region shapes, but normalized so that the value at the peak of hardness coincides with the maximum number of lookahead backbone variables. We observe that this approximated backbone fraction first increases until it reaches a point where it decreases abruptly, and then it starts to increase again, but this time more slowly. The point where it reaches the minimum value is around the value where the hardness starts to increase towards its peak. So, this point of “sudden” decrease in the backbone fraction can be used as a sign of the beginning of the hard region of the problem. It is remarkable that even though the backbone is hard to approximate (Kilby et al. 2005), in this problem this approximated backbone provides valuable information.
We have obtained an approximate location of this minimum point through a doubly exponential regression model, using the location of this minimum value for all possible Sudoku problems with different region forms (n × l), from size 26 to 49. The model obtained is:
    holes = e^{0.537} \cdot n^{1.57} \cdot l^{1.72}
The coefficient of regression (R²) is 0.989, thus indicating that the model is quite good. We have also obtained an analogous regression model, but for the location of the hardness peak. For this model we used data obtained experimentally with Satz, but only for problems with sizes ranging from 26 to 30. The model obtained is:
    holes = e^{-0.217} \cdot n^{1.8} \cdot l^{1.97}
Again, we obtain a high value for R² (R² = 0.997). Observe that for the hardest problems (n = l), the relative difference between these two points is only O(n^{1.14}), much smaller than the whole range of possible holes (n^4). So, as n increases, the width of the hard part of the phase transition seems to decrease, in a normalized scale.
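The two fitted models are straightforward to evaluate. The short Python sketch below simply codes the formulas above; the function names are hypothetical, and the models should only be trusted near the size ranges on which they were fitted.

```python
import math


def backbone_minimum_holes(n: float, l: float) -> float:
    """Fitted location (number of holes) of the lookahead-backbone minimum."""
    return math.exp(0.537) * n ** 1.57 * l ** 1.72


def hardness_peak_holes(n: float, l: float) -> float:
    """Fitted location (number of holes) of the hardness peak."""
    return math.exp(-0.217) * n ** 1.8 * l ** 1.97


if __name__ == "__main__":
    for n, l in [(5, 5), (7, 4), (6, 5)]:
        print(f"{n}x{l}: backbone minimum near {backbone_minimum_holes(n, l):.0f} holes, "
              f"hardness peak near {hardness_peak_holes(n, l):.0f} holes")
```

For 6x5 regions the second model places the peak within a few holes of the 480 listed for that shape in Table 1, and the backbone minimum falls before the peak, in line with the observation that the minimum marks the beginning of the hard region.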
Conclusions
In this paper we show how different strategies for generating Generalized Sudoku instances, based on the shape of the block regions and the balance of the initial empty cells, can dramatically impact the computational hardness of the instances. Our Generalized Sudoku problem generator produces instances that are several orders of magnitude harder than other structured instances of comparable size. We believe our Generalized Sudoku generator should be of use in the future development of systematic and stochastic local search style CSP and SAT methods.
References
Achlioptas, D.; Gomes, C.; Kautz, H.; and Selman, B. 2000. Generating satisfiable problem instances. In Proc. of AAAI-00, 193–200.
Ansótegui, C.; del Val, A.; Dotú, I.; Fernández, C.; and Manyà, F. 2004. Modelling choices in quasigroup completion: SAT vs CSP. In Proc. of AAAI-04.
Barták, R. 2005. On generators of random quasigroup problems. In Proc. of ERCIM 05 Workshop, 264–278.
Bessière, C., and Régin, J.-C. 1996. MAC and combined heuristics: Two reasons to forsake FC (and CBJ?) on hard problems. In CP, 61–75.
Dotú, I.; del Val, A.; and Cebrián, M. 2003. Redundant modeling for the quasigroup completion problem. In CP-03.
Eén, N., and Sörensson, N. 2003. An extensible SAT-solver. In Proceedings of SAT 2003.
Gomes, C., and Selman, B. 1997. Problem structure in the presence of perturbations. In Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI-97), 221–227. New Providence, RI: AAAI Press.
Gomes, C.; Fernández, C.; Selman, B.; and Bessière, C. 2004. Statistical regimes across constrainedness regions. In Proceedings CP'04.
Gomes, C. P.; Selman, B.; and Crato, N. 1997. Heavy-tailed distributions in combinatorial search. In Proceedings of the Third International Conference of Constraint Programming (CP-97). Linz, Austria: Springer-Verlag.
Hulubei, T., and O'Sullivan, B. 2006. The impact of search heuristics on heavy-tailed behaviour. Constraints 11(2).
Jacobson, M. T., and Matthews, P. 1996. Generating uniformly distributed random latin squares. Journal of Combinatorial Design 4:405–437.
Kannan, R.; Tetali, P.; and Vempala, S. 1997. Simple Markov-chain algorithms for generating bipartite graphs and tournaments. In Proc. of the eighth annual ACM-SIAM Symposium on Discrete Algorithms, 193–200.
Kautz, H.; Ruan, Y.; Achlioptas, D.; Gomes, C.; Selman, B.; and Stickel, M. 2001. Balance and filtering in structured satisfiable problems. In Proc. of IJCAI-01, 193–200.
Kilby, P.; Slaney, J.; Thiebaux, S.; and Walsh, T. 2005. Backbones and backdoors in satisfiability. In Proc. of AAAI-05, 193–200.
Li, C. M., and Anbulagan. 1997. Look-ahead versus look-back for satisfiability problems. In Proc. CP'97, 341–355.
Lynce, I., and Ouaknine, J. 2006. Sudoku as a SAT problem. In Proc. of Ninth International Symposium on Artificial Intelligence and Mathematics.
Moskewicz, M.; Madigan, C.; Zhao, Y.; Zhang, L.; and Malik, S. 2001. Chaff: Engineering an efficient SAT solver. In Proceedings of 39th Design Automation Conference.
Refalo, P. 2004. Impact-based search strategies for constraint programming. In Proceedings CP'04.
Régin, J.-C., and Gomes, C. 2004. The cardinality matrix constraint. In Proceedings CP'04.
Shaw, P.; Stergiou, K.; and Walsh, T. 1998. Arc consistency and quasigroup completion. In Proceedings of the ECAI-98 workshop on non-binary constraints.
Simonis, H. 2005. Sudoku as a constraint problem. In Proc. of Fourth International Workshop on Modelling and Reformulating Constraint Satisfaction Problems, CP 2005, 13–27.
Yato, T., and Seta, T. 2002. Complexity and completeness of finding another solution and its application to puzzles. In Proc. of National Meeting of the Information Processing Society of Japan (IPSJ).