Available via license: CC BY 4.0
Content may be subject to copyright.
Towards an Understanding of
Long-Tailed Runtimes of SLS Algorithms
Jan-Hendrik Lorenz and Florian Wörz
Universität Ulm, Germany
jan-hendrik.lorenz@alumni.uni-ulm.de, florian.woerz@uni-ulm.de
October 25, 2022
Abstract
The satisfiability problem (SAT) is one of the most famous problems in computer science.
Traditionally, its NP-completeness has been used to argue that SAT is intractable. However,
there have been tremendous practical advances in recent years that allow modern SAT
solvers to solve instances with millions of variables and clauses. A particularly successful
paradigm in this context is stochastic local search (SLS).
In most cases, there are different ways of formulating the underlying SAT problem.
While it is known that the precise formulation of the problem has a significant impact on
the runtime of solvers, finding a helpful formulation is generally non-trivial. The recently
introduced GapSAT solver [Lorenz and Wörz 2020] demonstrated a successful way to improve
the average performance of an SLS solver by learning additional information that is logically
entailed by the original problem. Still, there were also cases in which the performance slightly
deteriorated. This justifies in-depth investigations into how learning logical implications
affects runtimes for SLS algorithms.
In this work, we propose a method for generating logically equivalent problem formulations,
generalizing the ideas of GapSAT. This method allows a rigorous mathematical study of the
effect on the runtime of SLS SAT solvers. Initially, we conduct empirical investigations. If
the modification process is treated as random, Johnson SB distributions provide a perfect
characterization of the hardness. Since the observed Johnson SB distributions approach
lognormal distributions, our analysis also suggests that the hardness is long-tailed.
As a second contribution, we theoretically prove that restarts are useful for long-tailed
distributions. This implies that incorporating additional restarts can further refine all
algorithms employing the above-mentioned modification technique.
Since the empirical studies compellingly suggest that the runtime distributions follow
Johnson SB distributions, we also investigate this property on a theoretical basis. We
succeed in proving that the runtimes for the special case of Schöning's random walk
algorithm [Schöning 2002] are approximately Johnson SB distributed.
Keywords Stochastic Local Search · Runtime Distribution · Statistical Analysis · Johnson SB
Distribution · Lognormal Distribution · Long-Tailed Distribution · Restarts · SAT Solving ·
Learned Clauses · Logical Entailment
Previous Versions This is the full-length version of the paper in the ACM Journal
of Experimental Algorithmics (JEA). See the last section for a discussion.
arXiv:2210.13159v1 [cs.DS] 24 Oct 2022
Towards an Understanding of Long-Tailed Runtimes Jan-Hendrik Lorenz & Florian Wörz
Contents
1 Introduction
1.1 Studying Runtime Distributions
1.2 Our Results
1.2.1 Hardness Distribution
1.2.2 Theoretical Arguments for the Hardness Distribution
1.3 Previous Work on Runtime Distributions
1.4 Outline of This Paper
2 Preliminaries
2.1 Basic Notation
2.2 The Resolution Proof System
2.3 A Short Probability Primer
2.4 Probability Distributions
2.4.1 The Johnson SB Distribution
2.4.2 Lognormal Distributions as Embedded Model of Johnson SB Distributions
3 Evidence for Long-Tails in SLS Algorithms
3.1 Design of the Adjusted Logical Formula Algorithm Alfa
3.2 Empirical Evaluation of the Hardness Distribution
3.2.1 Experimental Setup, Instance Types, and Solvers Used to Obtain Hardness Distribution Data
3.2.2 Experimental Results and Statistical Evaluation of the Hardness Distribution
3.3 Restarts Are Useful For Long-Tailed Distributions
4 Theoretical Justifications for the Johnson SB Conjecture
4.1 Proof Overview
4.2 Glossary of Notations and Notational Conventions
4.3 Analysis of the Runtime Distribution of the Algorithm
4.3.1 The Infinite Case NeverSel
4.3.2 The Finite Case $c < \infty$
4.3.3 Combining Both Cases
4.4 Analysis of the Random Variables
4.4.1 Distribution Analysis of the Random Variable P
4.4.2 Distribution Analysis of the Random Variable Q
4.4.3 Distribution Analysis of the Random Variable R
4.4.4 Concluding Remarks Regarding the Analysis of the Random Variables
4.5 Putting Everything Together
5 Conclusion
A The Finite Case
A.1 The Case Sel(c + 1) = 1 of the First Factor $D_1$ in Line (19)
A.2 The Case Sel(k) = 0 of the Factors in the Big Product of Line (19)
A.3 Putting Together Both Cases
A.4 Allowing Restarts
B Connections Between Different Distributions
B.1 Embedded Models: Johnson SB Approaches Lognormal
B.2 A Proof of Lemma 54
B.3 A Proof of Lemma 55
1 Introduction
The satisfiability problem (SAT) asks to determine if a given propositional formula $F$ has a
satisfying assignment or not. Since Cook's NP-completeness proof of the problem [Coo71], SAT
is believed to be computationally intractable in the worst case. However, in the field of applied
SAT solving, there have been enormous improvements in the performance of SAT solvers in
the last 20 years. Motivated by these significant improvements, SAT solvers have been applied
to an increasing number of areas, including bounded model checking [BCC+03, CBRZ01],
cryptology [EPV08], and even bioinformatics [LM06], to name just a few.
Stochastic local search (SLS) is an especially successful algorithmic paradigm that many SAT
solvers employ [BHvMW09, Chapter 6]: There are solvers solely based on the SLS paradigm,
e.g., the solvers probSAT [BS12], dimetheus [BM16], and YalSAT [Bie14]; SLS has been used
in parallel solvers, e.g., Plingeling [Bie17]; and it is nowadays even a standard component of
sequential conflict-driven clause learning (CDCL) solvers, for example of ReasonLS [CZ18],
CaDiCaL [BFFH20], the Relaxed* family of solvers [CZ19, CZ21], Kissat [BFH21], newer
versions of CryptoMiniSat [SNC09], and MergeSat [Man21]. In [CZFB22], Cai et al. tightly
integrated SLS with three CDCL solvers, which significantly increased performance. The SLS
paradigm is furthermore frequently employed in solving MaxSAT (see, e.g., [BBJM21]).
Broadly speaking, SLS solvers operate on complete assignments for a formula $F$. These
solvers are started with a randomly generated complete initial assignment $\alpha$. If $\alpha$
satisfies $F$, a solution is found. Otherwise, the SLS solver explores the neighborhood of the
current assignment by repeatedly flipping the value of some variable in the assignment, where
this variable is chosen according to some underlying heuristic (e.g., aiming to minimize the
number of clauses left unsatisfied by the assignment). That is, these solvers perform a random
walk over the set of complete assignments for the underlying formula.¹
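To make the generic SLS scheme described above concrete, the following is a minimal Python sketch of a uniform random-walk solver. It is an illustration only and not any of the solvers named in this paper: the function names, the DIMACS-style clause encoding (non-zero integers, a negative sign for negation), the uniform choice of an unsatisfied clause and of a variable within it, and the `max_flips` budget are all assumptions made for this example. Clauses are assumed to be non-empty.

```python
import random


def unsatisfied(clauses, assignment):
    """Return the clauses that a complete assignment does not satisfy.

    A clause is a list of non-zero ints: v stands for variable v, -v for its
    negation. `assignment` maps each variable to True/False.
    """
    return [c for c in clauses
            if not any(assignment[abs(lit)] == (lit > 0) for lit in c)]


def random_walk_sls(clauses, num_vars, max_flips, rng):
    """Uniform random walk over complete assignments (illustrative heuristic).

    Returns (assignment, flips) on success and (None, max_flips) otherwise.
    """
    # Start from a randomly generated complete assignment.
    assignment = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}
    for flips in range(max_flips + 1):
        falsified = unsatisfied(clauses, assignment)
        if not falsified:
            return assignment, flips          # exactly `flips` flips were made
        if flips == max_flips:
            break                             # budget exhausted
        clause = rng.choice(falsified)        # pick an unsatisfied clause
        var = abs(rng.choice(clause))         # pick one of its variables
        assignment[var] = not assignment[var]  # flip it
    return None, max_flips
```

The number of flips returned by such a procedure is the quantity whose distribution over repeated runs is studied in this paper.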
The success of SLS solvers is demonstrated by probSAT [BS12], dimetheus [BM16], and
YalSAT [Bie17], which won several gold medals in the random track of previous SAT competitions.
SLS algorithms are also of interest from a theoretical perspective. For example, Schöning [Sch02]
describes an algorithm (called SRWA in the following) with an appealing worst-case guarantee.
Furthermore, we firmly believe that a better understanding of SLS will help in the design of
future CDCL–SLS hybrids.
1.1 Studying Runtime Distributions
Although SLS algorithms are highly successful in solving SAT instances, as witnessed by their
comparatively low mean runtime, they often show a high variation in the runtime required to
solve a fixed instance over repeated runs. However, measures like the mean or the variance cannot
capture the long-tailed behavior of difficult instances. Some authors (e.g., [FRV97, GS97, RF97])
thus shifted their focus to studying the runtime distributions of search algorithms, which helps
to understand these methods better and draw meaningful conclusions for the design of new
algorithms.
A relatively new algorithmic technique is considering modified versions of the input problem.
For example, in the mixed integer programming community, it is known that the performance
is sensitive to the used modification [LRV16]. A similar approach is also employed in some
backtracking SAT solvers (known as CDCL solvers [MS96, MMZ+01]) that learn additional
information during their run. However, all successful SLS SAT solvers of the last decades work
on the original, unmodified instance.
¹ In contrast to CDCL solvers and resolution, which are complete algorithms that can prove the
unsatisfiability of a formula in a finite number of steps, SLS solvers are incomplete, i.e., in
general, they cannot output the solution in a finite number of steps.
In [LW20], the authors investigated the effect of modifying the input instance for SLS SAT
solvers. More specifically, they changed the input instance by adding new, logically equivalent
clauses to the problem. For this, a new solver, called GapSAT, was introduced. This new solver
is based on probSAT and uses the addition of new clauses as a preprocessing step, thus yielding
a terraformed landscape. A comprehensive experimental evaluation found statistical evidence
that the performance of probSAT substantially increased with this modification technique.
However, the authors pointed to the fact that for some instances, the performance slightly
deteriorated when probSAT had access to these additional clauses, even though all of them
contained useful information.
These experiments motivate a more detailed study of the technique of adding new clauses.
In particular, it seems worthwhile to obtain a better understanding of the phenomenon that
adding new clauses improves the mean runtime, but there exist instances where adding clauses
can harm the performance of SLS.
Motivated by that, this work centers around studying the behavior of SLS solvers when these
solvers work on formulas that were extended by logical consequences of the initial formulation.
1.2 Our Results
1.2.1 Hardness Distribution
We study the runtime (or, more precisely, hardness) distribution of several SLS algorithms when
logical implications are added to an original formula. Central to all our investigations is the
elementary algorithm Alfa, which we introduce in this work. This algorithm is specifically
constructed so that it is convenient to develop mathematical arguments after an
initial empirical analysis.
Our empirical evaluations suggest that the hardness distribution is long-tailed (called the
Weak Conjecture). In fact, a stronger statement can be deduced: The data indicate that
the distribution follows a Johnson SB distribution (called the Strong Conjecture). We also
empirically show for our setting that this distribution converges to a lognormal distribution.
Since lognormal distributions are long-tailed, it is thus already established that if the Strong
Conjecture is true, the algorithm can be improved by restarts [Lor18]. We extend this result to
the case in which the Weak Conjecture is true: That is, we theoretically prove that restarts are
useful for the larger class of algorithms that exhibit a long-tailed distribution.
1.2.2 Theoretical Arguments for the Hardness Distribution
It should be highlighted how good the Johnson SB fit is for the observed data. The distribution
describes both typical and exceedingly low or high values exceptionally accurately. Only a
marginal absolute and relative error between the fits and the observations can be observed.
Moreover, this is true for all considered problem domains.
It is extraordinary that a simple parameterized distribution accurately describes the runtime
behavior of an entire group of algorithms (SLS solvers) on various domains. Since such behavior
is unlikely due to chance, we are pursuing theoretical explanations for this phenomenon. We
succeed in showing that the hardness distribution for the special case of Schöning's random
walk algorithm SRWA is indeed approximately Johnson SB distributed, confirming the Strong
Conjecture in practice. To the best of our knowledge, there are no comparable works deriving
the runtime distribution for the full support.
1.3 Previous Work on Runtime Distributions
Before continuing, we report on related work regarding the analysis of runtime
distributions. We include related work showing why knowledge of runtime distributions, as
we obtain it in this work, is immensely valuable.
The study [FRV97] presented empirical evidence for the fact that the distribution of the
effort (more precisely, the number of consistency checks) required for backtracking algorithms to
solve constraint satisfaction problems randomly generated at the 50% satisfiable point can be
approximated by the Weibull distribution (in the satisfiable case) and the lognormal distribution
(in the unsatisfiable case). These results were later extended to a wider region around the
50% satisfiable point [RF97]. It should be emphasized that this study created all instances
using the same generation model. This resulted in the creation of similar yet logically
non-equivalent formulas. We, however, firstly use different models to rule out any influence of the
generation model and secondly generate logically equivalent modifications of a base instance
(see Algorithm 1). This approach lends itself to the analysis of existing SLS solvers, like GapSAT.
The significant advantage is that the conducted work is not lost in the case of a restart: only
the logically equivalent instance could be changed while keeping the current assignment.
The runtime distributions of CDCL solvers were studied in [KLW22]. The authors
empirically demonstrated that Weibull mixture distributions can accurately describe the multimodal
distributions found. They concluded that adding new clauses to a base instance has an inherent
effect of making runtimes long-tailed.
In [GSCK00], the cost profiles of combinatorial search procedures were studied. It was shown
that they are often characterized by Pareto-Lévy distributions, and it was empirically demonstrated
how rapid randomized restarts can eliminate tail behavior. We, however, theoretically prove the
effectiveness of restarts for the larger class of long-tailed distributions.
The paper [ATC13] studied the solvers Sparrow and CCASAT and found that the lognormal
distribution is a good fit for the runtime distributions of randomly generated instances. For
this, the Kolmogorov–Smirnov statistic $\sup_{t \in \mathbb{R}} |\hat{F}_n(t) - F(t)|$ was used.
Although the KS-test is very versatile, this comes with the disadvantage that its statistical
power is relatively low. The KS statistic is also nearly useless in the tails of a distribution:
A high relative deviation of the empirical from the theoretical cumulative distribution function
in either tail results in a very small absolute deviation. It should also be remarked that the
paper studies only a few formulas in just two domains, ten randomly generated and nine crafted.
Our work addresses both shortcomings of this paper: The $\chi^2$-test gives equal importance to
the goodness-of-fit over the full support, and various instance domain models (both theoretical
and applied) are considered in this paper.
We want to stress the fact that studies on the runtime distribution of algorithms are quite
sparse, even though knowledge of the runtime distribution of an algorithm is extremely valuable:
• Intuitively speaking, if the distribution is long-tailed, one knows there is a risk of ending
in the tail and experiencing very long runs; simultaneously, the knowledge that the time
the algorithm used thus far is in the tail of the distribution can be exploited to restart the
procedure (and create a new logically equivalent instance $F^{(2)}$). We rigorously prove this
statement for all long-tailed algorithms.
• Given the distribution of an algorithm's sequential runtime, it was shown in [ATC13] how
to predict and quantify the algorithm's expected speedup due to parallelization.
• If the hardness distribution is known, experiments with a small number of instances can
lead to parameter estimations of the underlying distribution [FRV97].
• Knowledge of the distribution can help to compare competing algorithms: e.g., one can test
if the difference in the means of two algorithm runtimes is significant if the distributions
are known [FRV97].
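The first point above can be illustrated numerically. The sketch below is not the proof given later in this paper. It assumes a two-parameter lognormal runtime with illustrative parameters and uses the standard fixed-cutoff identity $E[T_t] = \int_0^t (1 - F(x))\,\mathrm{d}x \,/\, F(t)$ for the expected total runtime when the algorithm is restarted after every $t$ time units. All function names and the simple midpoint integration are assumptions of this example.

```python
import math
from statistics import NormalDist


def lognormal_cdf(x, mu, sigma):
    """cdf of a two-parameter lognormal distribution."""
    if x <= 0:
        return 0.0
    return NormalDist().cdf((math.log(x) - mu) / sigma)


def expected_runtime_with_restarts(cutoff, mu, sigma, steps=10_000):
    """E[T_t] = (integral of the survival function over [0, t]) / F(t)."""
    h = cutoff / steps
    # Midpoint rule for the integral of 1 - F(x) over [0, cutoff].
    integral = h * sum(1.0 - lognormal_cdf((k + 0.5) * h, mu, sigma)
                       for k in range(steps))
    return integral / lognormal_cdf(cutoff, mu, sigma)


def expected_runtime_without_restarts(mu, sigma):
    """Mean of the lognormal distribution, exp(mu + sigma^2 / 2)."""
    return math.exp(mu + sigma ** 2 / 2)
```

For instance, with `mu = 0` and `sigma = 2`, restarting at the median (`cutoff = 1`) bounds the expected runtime by 2, whereas the mean without restarts is $e^2$.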
1.4 Outline of This Paper
The rest of this paper is organized as follows. We start by presenting the necessary notations
and the resolution proof system in Section 2. This section also includes a short probability
primer and an overview of probability distributions that will be appealed to in this paper.
In Section 3, we then begin our empirical analysis to provide evidence for long-tails in SLS
algorithms. We show that the Johnson SB distribution (which converges to the lognormal
distribution) provides an exceptional fit to the hardness distribution. We also obtain strong
evidence that the distribution is long-tailed. To conclude the section, we prove that restarts are
useful for long-tailed distributions. Section 4 contains theoretical justifications for the Johnson
SB distribution in the SRWA case. Finally, in Section 5, we make some concluding remarks. All
data and code produced for this paper are made publicly available. Therefore, all experiments
are completely reproducible. For this, we refer to the end of the article.
2 Preliminaries
2.1 Basic Notation
A literal over a Boolean variable $x$ is either $x$ itself or its negation $\overline{x}$. A clause
$C = a_1 \lor \dots \lor a_\ell$ is a (possibly empty) disjunction of literals $a_i$ over pairwise
disjoint variables. A CNF formula $F = C_1 \land \dots \land C_m$ is a conjunction of clauses. We
will sometimes interpret clauses as sets of literals and CNF formulas as sets of clauses. The set
of variables of a clause $C$ is denoted by $\mathrm{Vars}(C)$. This notion is extended to formulas
by taking unions. The width of a clause $C$ is given by $|\mathrm{Vars}(C)|$. A CNF formula is a
$k$-CNF if all clauses in it have at most $k$ variables. An assignment $\alpha$ for a CNF formula
$F$ is a function that maps some subset $\mathrm{Dom}(\alpha) \subseteq \mathrm{Vars}(F)$ to
$\{0, 1\}$. The assignment is called complete if $\mathrm{Dom}(\alpha) = \mathrm{Vars}(F)$;
otherwise it is called partial. The application of an assignment $\alpha$ to a clause $C$ or a
formula $F$ will be denoted by $C\alpha$ or $F\alpha$, respectively. An assignment $\alpha$
satisfies a CNF formula $F$ if at least one literal in every clause of $F$ is set to 1 by $\alpha$.
A formula $F$ logically implies a clause $C$ if every complete truth assignment which satisfies
$F$ also satisfies $C$, for which we write $F \models C$. If $L$ is a set of clauses, we write
$F \models L$ if $F \models C$ for all $C \in L$. If $L$ is such that $F \models L$, then we call
$F$ and $F \cup L$ logically equivalent formulas. The act of changing the truth value of precisely
one variable of a complete assignment $\alpha$ is called a flip. When changing an assignment
$\alpha$ by flipping a variable $x$ of this assignment, the new assignment will be denoted by
$\alpha[x]$, i.e., $\alpha[x](x) := 1 - \alpha(x)$, while $\alpha[x](y) := \alpha(y)$ for
$y \neq x$. If $\alpha(x) = 1$, we also write $\alpha[x = 1]$ for $\alpha[x]$; otherwise we write
$\alpha[x = 1]$ for $\alpha[x]$.
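The notions of satisfaction, flip, and logical implication introduced in this subsection can be mirrored in a few lines of Python. This is only a hedged sketch for small formulas: the integer encoding of literals, the function names, and the brute-force enumeration of all complete assignments used to decide $F \models C$ are assumptions of this example, not part of the paper's tooling.

```python
from itertools import product


def satisfies(assignment, clauses):
    """True if every clause contains a literal set to 1 by the assignment.

    Literals are non-zero ints (v for a variable, -v for its negation);
    `assignment` maps each variable to True/False.
    """
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)


def flip(assignment, var):
    """Return alpha[x]: the assignment with the value of `var` negated."""
    flipped = dict(assignment)
    flipped[var] = not flipped[var]
    return flipped


def entails(clauses, clause, num_vars):
    """Brute-force check of F |= C over all complete assignments."""
    for bits in product([False, True], repeat=num_vars):
        alpha = {v: bits[v - 1] for v in range(1, num_vars + 1)}
        if satisfies(alpha, clauses) and not satisfies(alpha, [clause]):
            return False
    return True
```

If `entails(F, C, n)` holds for every clause `C` of a set `L`, then `F` and `F + L` are logically equivalent in the sense defined above.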
2.2 The Resolution Proof System
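Resolution is the proof system with the single derivation rule
$$\frac{B \lor x \qquad C \lor \overline{x}}{B \lor C},$$
where $B \lor x$ and $C \lor \overline{x}$ are clauses. Clearly,
$$(B \lor x) \land (C \lor \overline{x}) \models (B \lor C).$$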
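In the paper, we will also use width-$w$ restricted resolution, introduced in the following.
Definition 1. Let $F$ be a clause set, and $w \in \mathbb{N}$ be a positive integer. We define the operator
$$\mathrm{Res}_w(F) := F \cup \{ R \mid R \text{ is a resolvent of two clauses in } F \text{ and } |R| \le w \}.$$
Moreover, we inductively define $\mathrm{Res}_w^0(F) := F$ and
$$\mathrm{Res}_w^{n+1}(F) := \mathrm{Res}_w\bigl(\mathrm{Res}_w^n(F)\bigr), \quad \text{for } n \ge 0.$$
Finally, we set
$$\mathrm{Res}_w^*(F) := \bigcup_{n \ge 0} \mathrm{Res}_w^n(F).$$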
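Definition 1 can be read as a fixed-point computation. The following Python sketch is one possible way to compute $\mathrm{Res}_w^*(F)$ for small clause sets. The names, the representation of clauses as frozensets of non-zero integers, and the decision to discard tautological resolvents (which are not clauses in the sense of Section 2.1) are assumptions of this example.

```python
def resolvents(c1, c2):
    """All non-tautological resolvents of two clauses (frozensets of ints)."""
    found = set()
    for lit in c1:
        if -lit in c2:
            r = (c1 - {lit}) | (c2 - {-lit})
            # Assumption: drop resolvents containing a variable in both polarities.
            if not any(-other in r for other in r):
                found.add(frozenset(r))
    return found


def res_w(clauses, w):
    """One application of Res_w: add all resolvents with at most w literals."""
    clauses = set(clauses)
    ordered = list(clauses)
    new = set()
    for i, c1 in enumerate(ordered):
        for c2 in ordered[i + 1:]:
            new |= {r for r in resolvents(c1, c2) if len(r) <= w}
    return clauses | new


def res_w_star(clauses, w):
    """Iterate Res_w until nothing new is derived (the union over all n)."""
    current = {frozenset(c) for c in clauses}
    while True:
        following = res_w(current, w)
        if following == current:
            return current
        current = following
```

The loop terminates because only finitely many clauses exist over a finite variable set. For $F = \{x_1 \lor x_2,\ \overline{x}_1 \lor x_2,\ \overline{x}_2 \lor x_3\}$ and $w = 1$, the first round derives the unit clause $x_2$ and the second round derives $x_3$.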
2.3 A Short Probability Primer
We assume knowledge of conditional probabilities. In Section 4, however, these elementary
notions do not suffice. Thus, in this section, we introduce the necessary concepts involving
random variables in the expectations.
All random variables in this paper will be denoted with bold lettering. We wish to especially
highlight that some random variables describe probabilities, for which we will use the notation
$\mathbf{P}(\cdot)$. On the other hand, some probabilities are constants not depending on a
random set, say $\mathbf{L}$, and are thus denoted by $P[\cdot]$. We will adhere to this
convention already in the preliminaries to help the readers familiarize themselves with this
notation. While this differentiation might initially feel strange and unnecessary, we believe it
immensely helps to parse the equations in Section 4 and find the random variables that "hide
under the cloak" of appearing as a standard probability at first sight.
Definition 2. Let $\mathbf{X}$ be a discrete random variable and $A$ an event. The conditional
probability of $A$ given $\mathbf{X}$ is defined as the random variable, written
$\mathbf{P}(A \mid \mathbf{X})(\omega)$, that takes on the value $P[A \mid \mathbf{X} = x]$
whenever $\mathbf{X} = x$. More formally,
$$\mathbf{P}(A \mid \mathbf{X})(\omega) := P[A \mid \mathbf{X} = \mathbf{X}(\omega)].$$
Thus, the conditional probability $\mathbf{P}(A \mid \mathbf{X})$ is a function of $\mathbf{X}$
and, therefore, itself a random variable (thus denoted with the $\mathbf{P}$-symbol and round
brackets for better discriminability). In particular, it is not a real value in the interval $[0, 1]$.
A concept similar to Definition 2 can be defined for expectations.
Definition 3 ([MU17]). Let $\mathbf{X}$ be a random variable on a sample space $\Omega$. Further,
let $\mathbf{Y}$ be a discrete random variable defined on the same sample space. Then, the
conditional expectation of $\mathbf{X}$ with respect to $\mathbf{Y}$ is the random variable
$\mathbf{E}(\mathbf{X} \mid \mathbf{Y})(\cdot)\colon \Omega \to \mathbb{R}$ defined by
$$\mathbf{E}(\mathbf{X} \mid \mathbf{Y})(\omega) := E[\mathbf{X} \mid \mathbf{Y} = \mathbf{Y}(\omega)].$$
Notice that $\mathbf{E}(\mathbf{X} \mid \mathbf{Y})$ itself is again a random variable; it is not
a real value. Its value depends on the random variable $\mathbf{Y}$. We make this clear with two
examples. The second example will take a central stage in Section 4.
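Example 4 ([MU17]). Suppose that two standard dice are rolled independently. Let $\mathbf{X}_1$
be the result of the first die, $\mathbf{X}_2$ the result of the second die, and $\mathbf{S}$ the
sum of both results. For all $x \in \{1, \dots, 6\}$ it holds
$$E[\mathbf{S} \mid \mathbf{X}_1 = x] = \frac{1}{6} \sum_{s = x+1}^{x+6} s = x + \frac{7}{2}.$$
Hence, $\mathbf{E}(\mathbf{S} \mid \mathbf{X}_1) = \mathbf{X}_1 + \frac{7}{2}$ is a random
variable whose value depends on $\mathbf{X}_1$. If event $\omega$ occurs, then $\mathbf{X}_1$ has
value $\mathbf{X}_1(\omega)$, and therefore $\mathbf{E}(\mathbf{S} \mid \mathbf{X}_1)$ takes on the value
$$\mathbf{E}(\mathbf{S} \mid \mathbf{X}_1)(\omega) = E[\mathbf{S} \mid \mathbf{X}_1 = \mathbf{X}_1(\omega)] = \mathbf{X}_1(\omega) + \frac{7}{2} \in \mathbb{R}.$$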
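The computation in Example 4 can be checked by exhaustively enumerating the 36 equally likely outcomes. The short Python sketch below does this with exact fractions; the function name is an assumption of this illustration.

```python
from fractions import Fraction
from itertools import product


def cond_expectation_sum_given_first(x):
    """E[S | X1 = x] for two fair dice, computed by enumeration."""
    # Keep only the outcomes whose first die shows x; all are equally likely.
    outcomes = [(d1, d2) for d1, d2 in product(range(1, 7), repeat=2) if d1 == x]
    return Fraction(sum(d1 + d2 for d1, d2 in outcomes), len(outcomes))
```

Viewed as a function of the first die, this is exactly the random variable $\mathbf{X}_1 + \frac{7}{2}$; averaging it over the six equally likely values of the first die gives $E[\mathbf{S}] = 7$, in line with the law of total expectation.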
Example 5. If we let $\#\mathrm{Flips}(G)$ denote the random variable specifying the number of
flips a fixed SLS algorithm takes to find a satisfying assignment for instance $G$, then
$$\mathrm{ER}_{\mathbf{L}} := E[\#\mathrm{Flips}(F \cup \mathbf{L}) \mid \mathbf{L}]$$
denotes the expected runtime of this SLS algorithm on the extended instance $F \cup \mathbf{L}$,
which is dependent on the concrete realization $L$ of the random extension set $\mathbf{L}$. In particular,
$$\mathbf{E}\bigl(\#\mathrm{Flips}(F \cup \mathbf{L}) \mid \mathbf{L}\bigr)(L) = E[\#\mathrm{Flips}(F \cup \mathbf{L}) \mid \mathbf{L} = L].$$
Theorem 6 (Law of total probability, LTP). Let $(B_i)_i$ be a finite or countable partition of the
sample space $\Omega$ such that $P[B_i] > 0$ for all $i$ and $A \subseteq \Omega$. Then
$$P[A] = \sum_i P[A \mid B_i] \cdot P[B_i].$$
Similarly, if $P[A \mid C]$ is defined for $C \subseteq \Omega$ and the partition is such that
$P[C \cap B_i] > 0$ for each $i$, then
$$P[A \mid C] = \sum_i P[A \mid C \cap B_i] \cdot P[B_i \mid C].$$
To simplify arguments using the LTP, it is common practice to omit all terms for which
$P[B_i] = 0$, because $P[A \mid B_i]$ is finite (if $P[B_i] = 0$, then according to the simple
definition above, $P[A \mid B_i]$ is undefined; however, it is possible to define a conditional
probability with respect to a $\sigma$-algebra of such events). The same holds for the LTE below.
Theorem 7 (Law of total expectation, LTE). Let $\mathbf{X}$ be a discrete random variable on a
probability space $(\Omega, \mathcal{F}, P)$ such that $E[\mathbf{X}]$ is defined. Further, let
$(A_i)_i$ be a finite or countable partition of $\Omega$ such that $P[A_i] > 0$ for each $i$. Then
$$E[\mathbf{X}] = \sum_i E[\mathbf{X} \mid A_i] \cdot P[A_i].$$
Similarly, if $E[\mathbf{X} \mid B]$ is defined and the partition is such that $P[B \cap A_i] > 0$
for each $i$, then
$$E[\mathbf{X} \mid B] = \sum_i E[\mathbf{X} \mid B \cap A_i] \cdot P[A_i \mid B].$$
Theorem 8 (Chain rule for probabilities). If $A_1, \dots, A_n$ are random events, then
$$P\left[\bigcap_{k=1}^{n} A_k\right] = \prod_{k=1}^{n} P\left[A_k \,\middle|\, \bigcap_{j=1}^{k-1} A_j\right].$$
2.4 Probability Distributions
Throughout the paper, we will make use of various probability distributions. We need the
concepts of cumulative distribution functions and probability density functions to introduce
these distributions.
Definition 9 ([JKB94]). Let $\mathbf{X}$ be a real-valued random variable.
(i) Its cumulative distribution function (cdf) is the function $F\colon \mathbb{R} \to [0, 1]$ with
$$F(t) := P[\mathbf{X} \le t].$$
(ii) If $\mathbf{X}$ is continuous, its quantile function $Q\colon (0, 1) \to \mathbb{R}$ is given by
$$Q(p) := \inf\{t \in \mathbb{R} \mid F(t) \ge p\}.$$
(iii) Its survival function $S$ is given by
$$S(x) := 1 - F(x).$$
(iv) If a non-negative, integrable function $f$ with the property
$$F(t) = \int_{-\infty}^{t} f(u)\,\mathrm{d}u \quad \text{for all } t \in \mathbb{R}$$
exists, it is called probability density function (pdf) of $\mathbf{X}$.
If the underlying cdf of a sample is unknown, we use the empirical distribution function.
Definition 10. Let $\mathbf{X}_1, \dots, \mathbf{X}_n$ be independent, identically distributed,
real-valued random variables with realizations $x_i$ of $\mathbf{X}_i$. Then, the empirical
cumulative distribution function (ecdf) of the sample $(x_1, \dots, x_n)$ is defined as
$$\hat{F}_n(t) := \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{\{x_i \le t\}}, \quad t \in \mathbb{R},$$
where $\mathbb{1}_A$ is the indicator of event $A$.
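As a small illustration of Definition 10, the ecdf of a sample can be evaluated with a sorted copy of the data and a binary search. The function name and the closure-based interface below are assumptions of this sketch.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical cdf of a non-empty sample as a callable."""
    xs = sorted(sample)
    n = len(xs)

    def f_hat(t):
        # bisect_right counts the observations x_i with x_i <= t.
        return bisect_right(xs, t) / n

    return f_hat
```

For the sample `(3, 1, 2, 2)`, the resulting step function jumps by 1/4 at 1, by 2/4 at 2, and by 1/4 at 3.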
2.4.1 The Johnson SB Distribution
Central to all distributions considered in this paper and necessary to introduce the Johnson SB
distribution is the concept of the well-known Gaussian normal distribution.
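Definition 11. An absolutely continuous random variable $\mathbf{X}$ is normally distributed with
expectation $\mu \in \mathbb{R}$ and variance $\sigma^2 > 0$, denoted by
$\mathbf{X} \sim \mathcal{N}(\mu, \sigma^2)$, if the pdf of $\mathbf{X}$ is given by
$$f_N(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right).$$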
Using normal distributions, we introduce the Johnson SB distribution, which takes the
central stage in this work.
[Figure 1: two panels plotting Johnson SB pdfs. Panel (a): $\xi = 0$, $\lambda = 1$, with curves for
$(\gamma, \delta) = (-1.0, 1.5)$, $(0.5, 0.7)$, and $(-1.0, 0.7)$. Panel (b): $\gamma = 0.5$,
$\delta = 0.7$, with curves for $(\xi, \lambda) = (-1.1, 1)$, $(0.0, 1)$, and $(-1.1, 3)$.]
Figure 1: This figure shows the effect of the parameters on the pdf of the Johnson SB distribution.
All Johnson SB distributions in the left plot use $\xi = 0$, $\lambda = 1$ as parameters. In the
right plot, all distributions use $\gamma = 0.5$, $\delta = 0.7$ as parameters. In both cases,
there are two varying parameters that are given in the respective legend.
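Definition 12 ([Joh49b, Joh49a, JKB94]). An absolutely continuous random variable $\mathbf{X}$ is
Johnson SB distributed with parameters $\gamma \in \mathbb{R}$, $\delta > 0$, $\lambda > 0$, and
$\xi \in \mathbb{R}$, denoted as $\mathbf{X} \sim \mathrm{SB}(\gamma, \delta, \lambda, \xi)$, if
$\xi < \mathbf{X} < \xi + \lambda$ and
$$\mathbf{Z} := \gamma + \delta \cdot \log\left(\frac{\mathbf{X} - \xi}{\xi + \lambda - \mathbf{X}}\right) \sim \mathcal{N}(0, 1).$$
The Johnson SB distribution is highly flexible and can model distributions with finite support.
Figure 1 illustrates several Johnson SB distributions, including the effect of the parameters on
the form of the pdf.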
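Definition 12 directly suggests a way to sample from a Johnson SB distribution: draw a standard normal value $z$ and invert the transformation, which gives $x = \xi + \lambda / (1 + e^{-(z - \gamma)/\delta})$. The following Python sketch illustrates this inversion. The function names are assumptions of this example, and it is not the fitting procedure used in the experiments.

```python
import math
import random


def sample_johnson_sb(gamma, delta, lam, xi, rng):
    """Draw one Johnson SB variate by inverting the transformation to N(0, 1)."""
    z = rng.gauss(0.0, 1.0)
    # Solve z = gamma + delta * log(u / (1 - u)) for u = (x - xi) / lam.
    u = 1.0 / (1.0 + math.exp(-(z - gamma) / delta))
    return xi + lam * u


def to_standard_normal(x, gamma, delta, lam, xi):
    """The transformation Z from Definition 12, applied to a realization x."""
    return gamma + delta * math.log((x - xi) / (xi + lam - x))
```

Every sample lies in the open interval $(\xi, \xi + \lambda)$, and applying `to_standard_normal` to the samples should give values that look standard normal.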
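Remark 13 ([Che17]). A Johnson SB distributed random variable has positive density support
on $(\xi, \xi + \lambda)$. The parameters $\gamma$ and $\delta$ are shape parameters (governing
the asymmetry and kurtosis, respectively), $\lambda$ is the scale parameter, and $\xi$ is the
location parameter. Letting $a := \xi$ and $b := \xi + \lambda$, the pdf $f_{SB}$ is given by
$$f_{SB}(x \mid a, b, \gamma, \delta) = \frac{1}{\sqrt{2\pi}} \, \frac{(b - a)\,\delta}{(x - a)(b - x)} \exp\left(-\frac{1}{2}\left[\gamma + \delta \ln\left(\frac{x - a}{b - x}\right)\right]^2\right), \quad x \in (a, b).$$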
Furthermore, the Johnson SB distribution has the following scaling property.
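Lemma 14. Let $\mathbf{X}$ be a Johnson SB distributed random variable having parameters
$a, b, \gamma, \delta \in \mathbb{R}$ with $a < b$ and $\delta > 0$. Then, the random variable
$\mathbf{Y} = g \cdot \mathbf{X}$, where $g \in \mathbb{R}^+$ is an arbitrary, positive real
number, is Johnson SB distributed with parameters $ga, gb, \gamma, \delta \in \mathbb{R}$.
Proof. Let $f_X$ and $F_X$ denote the pdf and the cdf of $\mathbf{X}$. Likewise, $f_Y$ and $F_Y$
are the pdf and the cdf of $\mathbf{Y}$. Since
$$P[\mathbf{Y} \le x] = P[g \cdot \mathbf{X} \le x] = P\left[\mathbf{X} \le \frac{x}{g}\right] = F_X\left(\frac{x}{g}\right),$$
it follows $f_Y(x) = \frac{1}{g} f_X\left(\frac{x}{g}\right)$. Thus,
$$f_Y(x) = \frac{1}{g} \, \frac{1}{\sqrt{2\pi}} \, \frac{(b - a)\,\delta}{\left(\frac{x}{g} - a\right)\left(b - \frac{x}{g}\right)} \exp\left(-\frac{1}{2}\left[\gamma + \delta \ln\left(\frac{\frac{x}{g} - a}{b - \frac{x}{g}}\right)\right]^2\right)$$
$$= \frac{1}{\sqrt{2\pi}} \, \frac{(gb - ga)\,\delta}{(x - ga)(gb - x)} \exp\left(-\frac{1}{2}\left[\gamma + \delta \ln\left(\frac{x - ga}{gb - x}\right)\right]^2\right).$$
This is the pdf of a Johnson SB distribution with parameters $ga, gb, \gamma, \delta$.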
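The scaling property of Lemma 14 can also be checked numerically. The sketch below evaluates the density from Remark 13 in the $(a, b, \gamma, \delta)$ parametrization and compares $\frac{1}{g} f_X(x/g)$ with the Johnson SB density with parameters $ga, gb, \gamma, \delta$. The function names and the chosen test parameters are assumptions of this illustration.

```python
import math


def johnson_sb_pdf(x, a, b, gamma, delta):
    """Johnson SB density on (a, b); zero outside the support."""
    if not a < x < b:
        return 0.0
    z = gamma + delta * math.log((x - a) / (b - x))
    jacobian = (b - a) * delta / ((x - a) * (b - x))
    return jacobian * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def scaled_pdf(x, g, a, b, gamma, delta):
    """Density of Y = g * X obtained by the change of variables in the proof."""
    return johnson_sb_pdf(x / g, a, b, gamma, delta) / g
```

Up to floating-point error, `scaled_pdf(x, g, a, b, gamma, delta)` and `johnson_sb_pdf(x, g * a, g * b, gamma, delta)` agree for every `x` in the open interval between `g * a` and `g * b`.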
2.4.2 Lognormal Distributions as Embedded Model of Johnson SB Distributions
Experimentally, we show that the fits of the SLS hardness distributions exhibit parameter
combinations that suggest the involvement of an embedding process: Informally speaking,
the Johnson SB distribution can be thought of as converging to a lognormal distribution (see
Figure 2 for an illustration). Hence, the Johnson SB distribution is sometimes referred to as
four-parameter lognormal distribution [AB63]. The lognormal distribution is given as follows.
Definition 15 ([Wic17]). An absolutely continuous, positive random variable $\mathbf{X}$ is
(three-parameter) lognormally distributed with parameters $\sigma^2 > 0$, $\xi > 0$, and
$\mu \in \mathbb{R}$, if $\log(\mathbf{X} - \xi)$, where $\mathbf{X} > \xi$, is normally
distributed with mean $\mu$ and variance $\sigma^2$. In the following, we refer to $\sigma$ as
the shape, $\mu$ as the scale, and $\xi$ as the location parameter.
If the location parameter $\xi$ is zero, we call $\mathbf{X}$ two-parameter lognormally
distributed and commonly omit $\xi$.
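Remark 16 ([CE88]). The pdf $f_{3LogN}$ of the three-parameter lognormal distribution is given by
$$f_{3LogN}(x \mid \mu, \sigma, \xi) = \frac{1}{(x - \xi)\,\sigma\sqrt{2\pi}} \exp\left(-\frac{\bigl(\ln(x - \xi) - \mu\bigr)^2}{2\sigma^2}\right), \quad x > \xi.$$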
The next definitions make the embedding of the lognormal distribution in the Johnson SB
distribution more precise.
Definition 17 ([Che17]). Consider a function $f$ having parameters $\Theta$. Furthermore, let
$\Theta = (\Theta_L, \Theta_R)$ be a partition of the parameters with
$\Theta_L = (\Theta_1, \dots, \Theta_\ell)$ and $\Theta_R = (\Theta_{\ell+1}, \dots, \Theta_{\ell+r})$.
Lastly, assume $\widehat{\Theta} = g(\Theta)$ for some function $g$. If
$$\lim_{\Theta_L \to 0} f(x \mid \Theta) = f_0(x \mid \widehat{\Theta}),$$
for some well-defined function $f_0$, then $f_0$ is called an embedded model of $f$.
Lemma 18 ([Che17, JKB94]). The lognormal distribution is an embedded model of the SB
distribution.
Proof. We have provided a proof in Appendix B.1.
In particular, $f_{SB} \to f_{3LogN}$ for the reparametrization
$\gamma = \delta\,\bigl(\ln(b - a) - \mu\bigr)$ in the Johnson SB distribution and $b \to \infty$.
For all intents and purposes, it suffices for the right endpoint $b$ of the density support
$(a, b)$ to increase (or $\lambda$, respectively), while $\gamma$ can grow logarithmically
slowly [Che17]. We refer to Figure 2 for an illustration.
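The convergence can be observed numerically. The sketch below assumes that the reparametrization is read as $\gamma = \delta\,(\ln(b - a) - \mu)$ together with $\sigma = 1/\delta$; with $\xi = 0$, $\delta = 0.8$, and $\mu = 1.0$ this reading reproduces the values $\gamma = -0.8$, $\gamma \approx 1.042$, and $\gamma \approx 2.884$ given in the subcaptions of Figure 2. The function names and the evaluation grid on $(0, 4]$ are assumptions of this illustration.

```python
import math


def johnson_sb_pdf(x, a, b, gamma, delta):
    """Johnson SB density on (a, b); zero outside the support."""
    if not a < x < b:
        return 0.0
    z = gamma + delta * math.log((x - a) / (b - x))
    jacobian = (b - a) * delta / ((x - a) * (b - x))
    return jacobian * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def lognormal_pdf(x, mu, sigma, xi=0.0):
    """Three-parameter lognormal density; zero for x <= xi."""
    if x <= xi:
        return 0.0
    z = (math.log(x - xi) - mu) / sigma
    return math.exp(-0.5 * z * z) / ((x - xi) * sigma * math.sqrt(2.0 * math.pi))


def gamma_for(lam, delta, mu):
    """Assumed reparametrization: gamma = delta * (ln(lam) - mu), lam = b - a."""
    return delta * (math.log(lam) - mu)


def max_abs_gap(lam, delta=0.8, mu=1.0, xi=0.0):
    """Largest pointwise pdf difference on a grid over (0, 4]."""
    gamma = gamma_for(lam, delta, mu)
    sigma = 1.0 / delta
    grid = [0.05 * k for k in range(1, 81)]
    return max(abs(johnson_sb_pdf(x, xi, xi + lam, gamma, delta)
                   - lognormal_pdf(x, mu, sigma, xi)) for x in grid)
```

Evaluating `max_abs_gap` for `lam` equal to 1, 10, and 100 shows the gap to the limiting lognormal density shrinking as the right endpoint of the support moves outwards.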
3 Evidence for Long-Tails in SLS Algorithms
The authors of [LW20] introduced the hybrid solver GapSAT by augmenting an SLS solver with
a clause-learning feature. After receiving a set of additional clauses $L$ (implied by the original
formula $F$), the solver can be understood as solving a modified instance $F'$. This paper showed
that adding new clauses is beneficial to the mean runtime (in flips) of the SLS solver probSAT
underlying the hybrid model. However, it was also demonstrated that although adding new
clauses can improve the mean runtime, there exist instances where adding clauses can harm the
performance of SLS. As announced, this behavior is worth studying further to help eliminate
the risk of increasing the runtime of such procedures.
[Figure 2 consists of three panels, each plotting the density of a Johnson SB distribution together with its limiting lognormal distribution for x between 0 and 4: (a) λ = 1.0, γ = −0.8; (b) λ = 10.0, γ ≈ 1.042; (c) λ = 100.0, γ ≈ 2.884.]
Figure 2: This figure shows that the Johnson SB distribution converges towards a fixed lognormal distribution if λ and γ approach infinity. Each subfigure shows a Johnson SB distribution together with its limiting lognormal distribution. The Johnson SB distributions all have ξ = 0.0, δ = 0.8 as parameters. The remaining two parameters λ and γ are given in the corresponding subcaptions. One should note that γ can take relatively small values for the convergence to still take place. More precisely, it suffices for γ to grow logarithmically slowly. This is theoretically explained in [Che17]. The limiting lognormal distribution has σ = 1.25, µ = 1.0 as parameters.
For this reason, in this section, we study the runtime (or, more precisely, hardness) distribution of the procedure Alfa that we introduce below. This procedure models the addition of a random set of logically equivalent clauses L to a formula F and the subsequent solving of this amended formula F^(1) := F ∪ L by an SLS solver. Our empirical evaluations show that this distribution is long-tailed. This fact enables us to prove that restarts are useful for Alfa.
3.1 Design of the Adjusted Logical Formula Algorithm Alfa
Our SLS solver Alfa (Adjusted logical formula algorithm) receives a satisfiable formula F as input. The algorithm then proceeds by adding to F a random set L of logically generated clauses. It finally calls an SLS solver to solve the clause set F ∪ L. The pseudocode for Alfa is given in Algorithm 1.
Input: Boolean formula F, Promise: F ∈ SAT
Generate randomly a set L of clauses such that F ⊨ L (e.g., with Algorithm 2)
Call SLS(F ∪ L) for some SLS solver SLS
Algorithm 1: Alfa acts as a base algorithm that can use different SLS algorithms.
We use width-w restricted resolution (recall Definition 1) in Algorithm 2 as a natural way to sample a set L of logically equivalent clauses with respect to a base instance F. This leads to Algorithm 2, which generates random sets L by resolution.
Input: Boolean formula F, integer w, probability p ∈ (0, 1]
L := ∅
foreach R ∈ Res*_w(F) \ F do
    with probability p do L := L ∪ {R}
end
return L
Algorithm 2: Generation of the random set L with width-w restricted resolution.
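Algorithm 2 can be sketched in a few lines of Python. Since Definition 1 is not restated here, the sketch assumes that Res*_w(F) denotes the set of all clauses obtainable from F by repeated resolution steps whose resolvents contain at most w literals. Clauses are represented as frozensets of nonzero integers (DIMACS-style literals); all function names are hypothetical.

```python
import random
from itertools import combinations

def resolvents(c1, c2):
    """Yield all non-tautological resolvents of two clauses."""
    for lit in c1:
        if -lit in c2:
            r = (c1 - {lit}) | (c2 - {-lit})
            if not any(-x in r for x in r):
                yield frozenset(r)

def width_restricted_closure(formula, w):
    """Assumed reading of Res*_w(F): close F under resolution, keeping
    only resolvents with at most w literals."""
    known = set(formula)
    changed = True
    while changed:
        changed = False
        for c1, c2 in combinations(list(known), 2):
            for r in resolvents(c1, c2):
                if len(r) <= w and r not in known:
                    known.add(r)
                    changed = True
    return known

def sample_learned_clauses(formula, w, p, rng):
    """Algorithm 2: keep each clause of Res*_w(F) \\ F independently with probability p."""
    base = set(formula)
    candidates = sorted(width_restricted_closure(base, w) - base, key=sorted)
    return {r for r in candidates if rng.random() < p}

# Toy formula (x1 or x2) and (not x1 or x3); the only new resolvent is (x2 or x3).
F = [frozenset({1, 2}), frozenset({-1, 3})]
print(sample_learned_clauses(F, w=2, p=1.0, rng=random.Random(0)))
```

The naive fixed-point loop is quadratic in the number of known clauses per pass, so it is only meant for small formulas such as those used for illustration.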
3.2 Empirical Evaluation of the Hardness Distribution
3.2.1 Experimental Setup, Instance Types, and Solvers Used to Obtain Hardness Distribution Data
Hoos and Stützle [HS98] introduced the concept of a runtime distribution to characterize the cdf of Las Vegas algorithms, where the runtime can vary from one execution to another, even with the same input. To obtain enough data for fitting such a distribution, for each base instance F, we created 5000 modified instances F^(1), ..., F^(5000) by generating resolvent sets L^(1), ..., L^(5000) using Algorithm 2 with w = 4 and a value of p such that the expected number of resolvents being added was (1/10)·|F|. We also conducted experiments to rule out the influence of p on our results. Each of the modified instances was solved 100 times, each time using a different seed. For i = 1, ..., 5000 and j = 1, ..., 100, we obtained the values flips_S(F^(i), s_j), indicating how many flips solver S used to solve the modified instance F^(i) when using seed s_j. Next, we calculated
$$\mathrm{mean}_S(F^{(i)}) := \frac{1}{100} \sum_{j=1}^{100} \mathrm{flips}_S(F^{(i)}, s_j),$$
the mean number of flips required to solve F^(i) with solver S, whose hardness distribution we are going to analyze.
All experiments were performed on bwUniCluster 2.0 and three local servers, using Sputnik [VLS+15] to distribute the computation and parallelize the trials. Due to the heterogeneity of the computer setup, measured runtimes are not directly comparable to each other. Consequently, we instead measured the number of variable flips performed by the SLS solver. This is a hardware-independent performance measure with the benefit that it can also be analyzed theoretically.
Next, we describe the generation and satisfiability sampling of the instances. For the
experiments, the following instance types were used:
(1) Hidden Solution: We used our implementation [LW21] of the CDC algorithm [BC18, BHL+02] to generate instances with a hidden solution. SLS solvers typically struggle to solve such instances [LW20]. Thus, experiments like these might be beneficial in finding theoretical reasons for this behavior.
(2) Hidden Solution With Different Chances: We also created formulas with different chance values, i.e., the probability of adding a clause in Algorithm 2. The purpose is to rule out the influence of the chance value.
(3) Uniform Random: Using Gableske's kcnfgen [Gab15], we generated formulas with n ∈ {50, 60, 70, 80, 90} variables and a clause-to-variable ratio r close to the satisfiability threshold [MMZ06] of r ≈ 4.267. Then, we checked each instance with Glucose3 [AS09, ES04] for satisfiability until we had five formulas of each size.
(4) Factoring: We encoded the factoring problem in the interval {128, ..., 256} with [Die21].
(5) Coloring: These formulas assert that a graph is colorable with three colors. We generated these formulas, using [LENV17], over random graphs with n vertices and m = 2.254n edges in expectation, which is slightly below the non-colorability threshold [KKS00]. We obtained 32 satisfiable instances in 150 variables.
Our experiments investigated leading SLS solvers whose dominating component is based on the random walk procedure proposed in [Sch02]. In that paper, Schöning's Random Walk Algorithm SRWA (see Algorithm 4) was introduced. The probSAT solver family [Bal15], including YalSAT [Bie14], is based on this approach. The excellent performances and similarities were reasons for choosing SRWA, probSAT, and YalSAT as main solvers (probSAT won the random track of the SAT competition 2013, and YalSAT won in 2017). Only recently, in 2018, other types of solvers significantly exceeded probSAT-based algorithms. This lasting performance is why this solver family is chosen in this study.
We conducted most of our experiments with SRWA: all instance types were tested, including different chance values in Algorithm 2. For probSAT, 55 hidden solution instances with n ∈ {50, 100, 150, 200, 300, 800} were used. Since YalSAT can be regarded as a probSAT derivative, we tested YalSAT with ten hidden solution instances with 300 variables each.
3.2.2 Experimental Results and Statistical Evaluation of the Hardness Distribution
This section aims to explore how an instance’s hardness changes when logically equivalent clauses
are added in the manner described above. To characterize this effect as accurately as possible,
studying the ecdf is the most suitable method (recall Definition 10).
In the following, we demonstrate that the Johnson SB distribution, in particular, provides an exceptionally accurate description of the runtime behavior, and this is true for all considered problem domains and solvers. The results are so compelling that we ultimately conjecture that the runtimes of Alfa-type algorithms all follow a Johnson SB distribution, regardless of the problem domain.
To illustrate the accurate description of the runtime behavior mentioned above, we first demonstrate our approach using two base instances. The first one is a factorization instance that SRWA solved. The second instance has a hidden solution and was solved by probSAT. We refer to the first instance as A and to the second instance as B. We estimate the Johnson SB distribution's four parameters using the 5000 data points obtained by applying the maximum likelihood method (see [WL22]). After that, one can visually evaluate the suitability of the fitted Johnson SB distribution for describing the data by plotting the ecdf and the fitted cdf on the same graph.
Such a comparison is illustrated in Figure 3 for the two instances A and B. In both cases, no difference between the empirical data of the ecdf and the fitted distribution can be detected visually (the absolute error between the predicted probabilities from the fitted cdf versus the empirical probabilities from the ecdf is minuscule). These two example instances are representative of the behavior of the investigated algorithms. Hardly any deviation could be observed in this plot type for all instances and all algorithms (all data is published under [WL22]).
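[Figure 3 consists of two panels plotting cumulative probability (0 to 1) against the number of flips (left: 0 to 6 × 10^8; right: 0 to 1.5 × 10^4), each showing an empirical and a fitted curve.]
Figure 3: The ecdf and fitted cdf of the hardness distribution of instance A (left) and B (right).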
Study of the Left Tail
For the analysis, however, one should not confine oneself to this plot type. Although absolute errors can be observed easily, relative errors are more difficult to detect. Such a relative error may have a significant impact when used for decisions such as restarts. To illustrate this point, suppose that the actual probability of a run of length t is 0.0001. In contrast, the probability estimated based on a fit is 0.001. As can be seen, the absolute error of 0.0009 is small, whereas the relative error of 10 is large. If one were to perform restarts after t steps, the actual expected runtime would be ten times greater than the estimated expected runtime. Thus, the erroneous estimate of that probability would have translated into an unfavorable runtime. This example should illustrate the importance of checking the tails of a distribution for errors as well.
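The arithmetic behind this example can be made explicit. The sketch below assumes the usual simplification that a run restarted after t steps succeeds with probability q independently of earlier attempts, so the number of attempts is geometric with mean 1/q and the runtime is at most t/q steps in expectation. The cutoff t = 1000 is a hypothetical value; only the two probabilities come from the example above.

```python
import math

def restart_runtime_bound(t, success_prob):
    """Expected number of attempts (1 / q) times the cost t of one attempt."""
    return t / success_prob

T = 1000                 # hypothetical restart cutoff
ACTUAL_PROB = 0.0001     # true probability of finishing within one attempt
ESTIMATED_PROB = 0.001   # probability suggested by an inaccurate fit

actual = restart_runtime_bound(T, ACTUAL_PROB)
estimated = restart_runtime_bound(T, ESTIMATED_PROB)
absolute_error = ESTIMATED_PROB - ACTUAL_PROB
relative_error = ESTIMATED_PROB / ACTUAL_PROB

print(f"absolute error of the probability: {absolute_error:.4f}")
print(f"relative error of the probability: {relative_error:.1f}")
print(f"actual / estimated expected runtime: {actual / estimated:.1f}")
```

The ratio of the two runtimes equals the relative error of the probability and does not depend on the chosen cutoff.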
The left tail, i.e., the probabilities for very small values, can be checked visually by plotting both axes logarithmically scaled. Thereby, probabilities for extreme events (in this case, especially easy instances) can be measured accurately. The two instances A and B are being examined in this manner in Figure 4. As can be observed, the Johnson SB fit accurately predicts the probabilities associated with very short runs. For the other instances, Johnson SB distributions were mostly also able to accurately describe the probabilities for short runs. However, the behavior of the ecdf and the fitted Johnson SB distribution differed very slightly in a few instances.
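[Figure 4 consists of two log-log panels plotting cumulative probability (10^−4 to 10^0) against the number of flips (left: 10^7 to 10^9; right: 10^3.6 to 10^4), each showing an empirical and a fitted curve.]
Figure 4: Logarithmically scaled ecdf and fitted cdf of instances A (left) and B (right).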
Study of the Right Tail
The probabilities for particularly hard instances should also be checked. We can easily detect errors in the right tail if we plot the empirical survival function, i.e., $\hat{S}_n(x) := 1 - \hat{F}_n(x)$, and the fitted survival function together on a graph with logarithmically scaled axes. Figure 5 illustrates this type of plot for the instances A and B. Here, there is a discernible deviation between A and B. While for A, the Johnson SB fit provides an accurate description of the probabilities for long runs, in the case of B, the empirical survival function seems to approach 0 somewhat slower than the Johnson SB estimate. In the vast majority of cases, these extreme value probabilities are accurately reflected by the Johnson SB fit. In most other cases, the empirical survival function approaches 0 more slowly than the Johnson SB fit. Thus the likelihood of encountering an exceptionally hard instance is underestimated in these cases.
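The two empirical curves used in these plots are straightforward to compute. The following sketch builds the ecdf and the empirical survival function from a sample; the function names are hypothetical, and the short data list is made up purely for illustration.

```python
import bisect

def ecdf(sample):
    """Return the empirical cdf: the fraction of data points <= x."""
    data = sorted(sample)
    n = len(data)
    return lambda x: bisect.bisect_right(data, x) / n

def empirical_survival(sample):
    """Return the empirical survival function 1 - ecdf."""
    f_hat = ecdf(sample)
    return lambda x: 1.0 - f_hat(x)

def right_tail_points(sample):
    """Points (x, survival(x)) with positive survival, suitable for a log-log plot."""
    s_hat = empirical_survival(sample)
    return [(x, s_hat(x)) for x in sorted(set(sample)) if s_hat(x) > 0.0]

mean_flips = [1200.0, 950.0, 4100.0, 1300.0]  # made-up hardness values
print(right_tail_points(mean_flips))
```

The largest observation is dropped from the tail points because its empirical survival probability is zero, which cannot be shown on a logarithmic axis.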
[Figure 5 consists of two log-log panels plotting survival probability (10^−4 to 10^0) against the number of flips (left: 10^7 to 10^9; right: 10^3.8 to 10^4), each showing an empirical and a fitted curve.]
Figure 5: Logarithmically scaled empirical survival function and fitted survival function of instances A (left) and B (right).
Goodness-Of-Fit Tests
So far, we have discussed the behavior of Johnson SB fits based on this visual inspection. We concluded that Johnson SB distributions seem well suited for describing the data. Next, we concretize this through the χ²-goodness-of-fit test that is executed for each instance. Subsequently, the probability p that such a value of the test statistic occurs under the assumption that the data follow a Johnson SB distribution (null hypothesis) is determined. If the fit is poor, then a small p-value will occur. We use a sufficiently high p-value as a heuristic for whether the distribution assumption is reasonable.
However, there is an obstacle that complicates statistical analysis by this method. As described, each of the 5000 data points is obtained by first sampling 100 runtimes of the corresponding instance and then calculating the mean. This means that we do not work with the actual expected values, but only estimates, meaning our data is noisy. If one were to apply the χ²-test to this noisy data, some cases would be incorrectly rejected, especially if the variance is large. To overcome this limitation, we use a bootstrap test, which is based on a test described by Cheng [Che17]. This test is presented in Algorithm 3. We reject the distribution hypothesis for an instance if it fails the bootstrap test (p < 0.05).
Input: (noisy) random sample y = (y_1, y_2, ..., y_n), integer N, significance α ∈ (0, 1)
θ̂ ← MLE(y, F)    // Johnson SB maximum likelihood estimation; F is the Johnson SB cdf
χ² ← ChiSquare(y, θ̂)    // chi-squared goodness-of-fit test statistic
for j = 1 to N do
    z ← (z_1, ..., z_n), where all z_i are i.i.d. samples from the fitted Johnson SB distribution with parameters θ̂
    z ← z + ξ, where ξ is sampled from an n-dimensional normal distribution
    θ̂′ ← MLE(z, F)
    χ²_j ← ChiSquare(z, θ̂′)
end
Let χ²_(1) ≤ χ²_(2) ≤ ··· ≤ χ²_(N) be the sorted test statistics.
if χ²_(⌊(1−α)·N⌋) < χ² then reject else accept
Algorithm 3: Bootstrap test for noisy data. The null hypothesis is that each datapoint x_i comes from an underlying Johnson SB distribution. Furthermore, each datapoint is obscured by additive noise ξ_i. Thus, the only available observations are y = x + ξ, i.e., noisy data.
Briefly summarized, this test simulates how our data points were generated, assuming the null hypothesis. Due to the central limit theorem, it is reasonable to assume that the initial data's sample mean originates from a normal distribution around the true expected value. We use this assumption in the bootstrap test using a noise signal drawn from a normal distribution with expected value 0. Since each data point is the average of 100 runs, the variance of this normal distribution is determined from the initial data and divided by 100 (cf. central limit theorem).
We now consider Johnson SB distributions' adequacy for describing SRWA runtimes. The results of the statistical analysis are reported in Table 1 and can be found in [WL22]. The total of 4 rejected instances may be attributed to so-called type 1 errors.
Table 1: Goodness-of-fit results for Alfa+SRWA over various problem domains. The rejected row contains the number of instances where the Johnson SB distribution is not a good fit according to the bootstrap test at a significance level of 0.05. To put these results into perspective, the second row contains each domain's total number of instances. Out of a total of 230 instances, 4 got rejected.

                   hidden   different chances   uniform   factoring   coloring   total
rejected                1                   2         0           0          1       4
# of instances         20                 120        25          33         32     230
For probSAT, the situation appears to be slightly different. The results are summarized in Table 2 and [WL22]. As can be seen, the distribution hypothesis was rejected for 7 of the 55 instances. This number can no longer be accounted for by type 1 errors at a significance level of 0.05.
Table 2: Goodness-of-fit results for Alfa+probSAT over various hidden solution instance sizes. The columns refer to the number of variables in the corresponding SAT instances.

number of variables   50   100   150   200   300   800   total
rejected               0     0     2     1     3     1       7
# of instances        10    10    10    10    10     5      55
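The statements about type 1 errors for Tables 1 and 2 can be checked with a binomial tail computation. The sketch below assumes that the tests on different instances are independent and that each true null hypothesis is rejected with probability exactly 0.05; both are modeling assumptions made here for illustration.

```python
from math import comb

def prob_at_least(k, n, alpha):
    """P[at least k false rejections among n independent tests at level alpha]."""
    return sum(comb(n, i) * alpha**i * (1.0 - alpha) ** (n - i) for i in range(k, n + 1))

ALPHA = 0.05
srwa_tail = prob_at_least(4, 230, ALPHA)    # Table 1: 4 of 230 rejected
probsat_tail = prob_at_least(7, 55, ALPHA)  # Table 2: 7 of 55 rejected

print(f"expected false rejections (SRWA):    {230 * ALPHA:.2f}")
print(f"P[at least 4 of 230 rejected]:       {srwa_tail:.4f}")
print(f"expected false rejections (probSAT): {55 * ALPHA:.2f}")
print(f"P[at least 7 of 55 rejected]:        {probsat_tail:.4f}")
```

Under these assumptions, four rejections out of 230 are fewer than the number expected by chance, whereas seven or more rejections out of 55 would occur with probability below 0.05.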
Lastly, for YalSAT, according to the bootstrap test, none of the total 10 instances got rejected.
Distribution Conjectures
In summary, the presumption that Johnson SB distributions are the appropriate choice for describing runtimes has been reinforced for Alfa+SRWA. Likewise, the choice of Johnson SB distributions also seems very reasonable for Alfa+YalSAT. This appears to be still plausible for Alfa+probSAT. This leads us to:
Conjecture 19 (Strong Conjecture). The runtime of Alfa with SLS ∈ {SRWA, probSAT, YalSAT} follows a Johnson SB distribution.
If this is true, then it would be intriguing that one can infer how modifying the base instance affects the hardness of instances. Simultaneously, the Johnson SB distribution parameters also provide insight into how the hardness of the instance changes. For example, the location parameter ξ implies an inherent problem hardness that cannot be decreased regardless of the choice of the added clauses. At the same time, ξ also serves as a numerical description for the value of this intrinsic hardness. Using Bayesian statistics, it is possible to infer the parameters while the solver is running. These estimations can, e.g., be used to schedule restarts. This leads to a scenario similar to that in [RHK02].
Conjecture 19 is a strong statement. However, even a slight deviation of the probabilities, for example, at the left tail, would render the strong conjecture invalid from a strictly mathematical point of view. Notably, the visual analyses revealed that the left tail's behavior, i.e., for extremely short runs, is occasionally not accurately reflected by Johnson SB distributions. Conversely, the right tail, i.e., the probabilities for particularly long runs, are usually either correctly represented by Johnson SB distributions or, occasionally, the corresponding probability approaches 0 even more slowly. We, therefore, rephrase our conjecture in a weakened form. Our observations fit a class of distributions known as long-tail distributions defined purely in terms of their behavior at the right tail.
Definition 20 ([FKZ11]). A positive, real-valued random variable X is long-tailed, if and only if
$$\forall x \in \mathbb{R}^+ : P[X > x] > 0 \quad\text{and}\quad \forall y \in \mathbb{R}^+ : \lim_{x \to \infty} \frac{P[X > x + y]}{P[X > x]} = 1.$$
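The defining limit can be observed numerically. The following sketch compares a two-parameter lognormal distribution, which is long-tailed, with an exponential distribution, which is not, since its ratio stays at exp(−y) for every x. The parameter choices (µ = 0, σ = 1, rate 1, y = 1) are illustrative assumptions.

```python
import math

def lognormal_survival(x, mu=0.0, sigma=1.0):
    """P[X > x] for a two-parameter lognormal random variable."""
    return 0.5 * math.erfc((math.log(x) - mu) / (sigma * math.sqrt(2.0)))

def exponential_survival(x, rate=1.0):
    """P[X > x] for an exponential random variable."""
    return math.exp(-rate * x)

def tail_ratio(survival, x, y):
    """The ratio P[X > x + y] / P[X > x] from the definition of long-tailedness."""
    return survival(x + y) / survival(x)

for x in (10.0, 1e3, 1e6):
    print(f"x = {x:>9}: lognormal {tail_ratio(lognormal_survival, x, 1.0):.6f}, "
          f"exponential {tail_ratio(exponential_survival, x if x < 500 else 500.0, 1.0):.6f}")
```

For the exponential distribution the argument is capped at 500 only to avoid floating-point underflow of the survival probability; the ratio is the same constant for every x.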
Conjecture 21 (Weak Conjecture). The runtime of Alfa with SLS ∈ {SRWA, probSAT, YalSAT} is long-tailed.
The fact of observing long-tailed ecdfs points towards the presence of a limiting process that is involved. Recall from Lemma 18 that the Johnson SB distribution converges towards a lognormal distribution (for λ → ∞, while it is sufficient for γ to increase at a logarithmic rate with respect to λ). This property is called embedded distribution. In our experiments, we observed that the parameter γ of the resulting Johnson SB fit is sufficiently high for convergence. The Johnson SB distribution has bounded support, i.e., all of its probability mass is concentrated on a finite interval. The endpoints of the support can be derived directly from the parameters. Increasing the number of variables of the formula under consideration will additionally ensure that the density support (a, b) = (ξ, ξ + λ) of the fitted distribution will increase since formulas with a higher number of variables will naturally be harder to solve. Hence, λ must increase. Therefore, the Johnson SB distribution fits approach lognormal distributions. An illustration of this convergence is shown in Figure 2.
As a case in point for the actual involvement of such a convergence phenomenon, we repeated all tests above for the lognormal distribution. The visual inspection reveals that the lognormal distribution can also fit the data exceptionally well. For Alfa+SRWA, 5 out of 230 instances got rejected by the goodness-of-fit test; for Alfa+probSAT, 7 out of 55 instances got rejected; and for Alfa+YalSAT, 2 out of 10 instances got rejected. Hence, it also seems very reasonable to use a lognormal distribution to describe the hardness.
It should be noted that lognormal distributions have the long-tail property [FKZ11, NWZ20]. That is, if the Strong Conjecture holds, the Weak Conjecture is implied (at least, after convergence). The reverse is, however, not true. In the next section, we show an important consequence in case the Weak Conjecture holds, i.e., when the distribution is long-tailed.
3.3 Restarts Are Useful For Long-Tailed Distributions
If the runtimes are already lognormally distributed, then restarts are useful [Lor18] in the following sense.
Definition 22. Let X be a random variable for the runtime of an SLS algorithm A on some input. For t > 0, the algorithm A_t is obtained by restarting A after time t if no solution is found. Letting X_t model the runtime of A_t, we say that restarts are useful if there is a t > 0 such that E[X_t] < E[X].
This section extends this result and mathematically proves that restarts are useful even if only the Weak Conjecture holds. This will be achieved by showing that restarts are useful for long-tailed distributions. For this section, we always implicitly use the natural assumption that the cdf F is continuous and strictly monotonically increasing. In this case, the quantile function Q is the inverse of F.
A condition for the usefulness of restarts, as defined in Definition 22, was proven in [Lor18]. For the following, recall the concept of quantile functions (Definition 9). We show the result using the following theorem.
Definition and Theorem 23 ([Lor18]). Let X be a positive, real-valued random variable having quantile function Q. Let
$$R(p, X) := \frac{(1-p) \cdot Q(p)}{E[X]} + \frac{\int_0^p Q(u)\,\mathrm{d}u}{E[X]}.$$
Then restarts are useful if and only if there is a quantile p ∈ (0, 1) such that
$$R(p, X) < p.$$
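The criterion can be evaluated numerically for a concrete distribution. The sketch below does so for a two-parameter lognormal distribution, using its quantile function Q(p) = exp(µ + σ·Φ⁻¹(p)) and its mean exp(µ + σ²/2); the integral is approximated by a midpoint rule. The parameters µ = 0 and σ = 2 and all function names are illustrative assumptions.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def lognormal_quantile(p, mu, sigma):
    """Quantile function Q(p) of the two-parameter lognormal distribution."""
    return math.exp(mu + sigma * STD_NORMAL.inv_cdf(p))

def restart_criterion(p, mu, sigma, steps=20_000):
    """R(p, X) from Theorem 23 for a lognormal X, with a midpoint-rule integral."""
    mean = math.exp(mu + 0.5 * sigma**2)
    h = p / steps
    integral = h * sum(lognormal_quantile((k + 0.5) * h, mu, sigma) for k in range(steps))
    return ((1.0 - p) * lognormal_quantile(p, mu, sigma) + integral) / mean

MU, SIGMA = 0.0, 2.0
for p in (0.25, 0.5, 0.75):
    r = restart_criterion(p, MU, SIGMA)
    print(f"p = {p}: R(p, X) = {r:.4f}, restarts useful at this quantile: {r < p}")
```

In this example, restarting at the median already satisfies R(p, X) < p, so restarts are useful for this lognormal distribution in the sense of Definition 22.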
Even if the quantile function Q and the expected value are unknown, R(p, X) can be characterized for large values of p.
Lemma 24. Consider a positive, real-valued random variable X with pdf f and quantile function Q such that E[X] < ∞. Also, assume that the limit $\lim_{t \to \infty} t^2 \cdot f(t)$ exists. Then,
$$\lim_{p \to 1} R(p, X) = \lim_{p \to 1} \left( \frac{(1-p) \cdot Q(p)}{E[X]} + \frac{\int_0^p Q(u)\,\mathrm{d}u}{E[X]} \right) = 1.$$
For the proof of Lemma 24, we will need the inverse function theorem. This theorem roughly states that a continuously differentiable function ϕ is invertible in a neighborhood of any point a at which ϕ′(a) does not vanish [Rud64].
Theorem 25 (Inverse function theorem [BS00]). Let I ⊆ R be an interval and let ϕ: I → R be continuous and strictly monotone on I. Then, there is an inverse function ψ := ϕ⁻¹ defined on J := ϕ(I) that is continuous and strictly monotone. If ϕ is differentiable at a ∈ I and ϕ′(a) ≠ 0, then ψ is differentiable at b := ϕ(a) and
$$(\varphi^{-1})'(b) = \psi'(b) = \frac{1}{\varphi'(a)} = \frac{1}{\varphi'(\psi(b))}.$$
Proof of Lemma 24. In the following, let F and f be the cdf and pdf of X, respectively (see Definition 9). We start by specifying the derivative of Q with respect to p as a preliminary consideration. From F = Q⁻¹ and the application of the inverse function theorem (Theorem 25, letting ψ = Q and ϕ = F), it follows that
$$Q'(p) := \frac{\mathrm{d}}{\mathrm{d}p} Q(p) = \frac{1}{f(Q(p))}. \tag{1}$$
As the first step in our proof, we consider the limit of the second summand of R(p, X), i.e., of the term $\int_0^p Q(u)\,\mathrm{d}u \,/\, E[X]$. This value can be determined using integration by substitution with x = Q(u) followed by applying the change of variable method with p = F(t):
$$\lim_{p \to 1} \frac{\int_0^p Q(u)\,\mathrm{d}u}{E[X]} = \lim_{p \to 1} \frac{\int_0^{Q(p)} x \cdot f(x)\,\mathrm{d}x}{E[X]} = \lim_{t \to \infty} \frac{\int_0^t x \cdot f(x)\,\mathrm{d}x}{E[X]} = 1.$$
The last equality holds because the numerator matches the definition of the expected value.
Next, we examine the limit of $(1-p) \cdot Q(p) / E[X]$. First, suppose that X has finite support, i.e., that there exists an x ∈ R with F(x) = 1. Then, $\lim_{p \to 1} (1-p) \cdot Q(p) = 0$ follows from the definition of the quantile function.
More care needs to be taken in the case when F(x) < 1 holds for all x ∈ R. In this case, we have $\lim_{p \to 1} Q(p) = \infty$. Hence, to examine $\lim_{p \to 1} (1-p) \cdot Q(p) / E[X]$, we apply L'Hospital's rule twice and use the change of variable method with p = F(t) to obtain
$$\lim_{p \to 1} (1-p) \cdot Q(p) = \lim_{p \to 1} Q(p)^2 \cdot f(Q(p)) = \lim_{t \to \infty} t^2 \cdot f(t).$$
It is well-known that if $\liminf_{t \to \infty} t^2 \cdot f(t) > 0$ were to hold, then the expected value E[X] would be infinite (this statement is, for example, implicitly given in [FKZ11]). This would contradict the premise of the lemma; therefore, $\liminf_{t \to \infty} t^2 \cdot f(t) = 0$. Moreover, since, by assumption, $\lim_{t \to \infty} t^2 \cdot f(t)$ exists, we may conclude that
$$\lim_{t \to \infty} t^2 \cdot f(t) = \limsup_{t \to \infty} t^2 \cdot f(t) = \liminf_{t \to \infty} t^2 \cdot f(t) = 0.$$
Hence, the first summand of R(p, X) tends to 0 in both cases, and together with the limit of the second summand, this yields $\lim_{p \to 1} R(p, X) = 1$.
A frequently used tool for describing distributions is the hazard rate function.
Definition 26 ([RBH03]). Let X be a positive, real-valued random variable having cdf F and pdf f. The hazard rate function r: R⁺ → R⁺ of X is given by
$$r(t) := \frac{f(t)}{1 - F(t)}.$$
There is an interesting relationship between the long-tail property and the hazard rate
function’s behavior.
Lemma 27 ([NWZ20]). Let X be a positive, real-valued random variable with hazard rate function r such that the limit $\lim_{t \to \infty} r(t)$ exists. Then, the following statements are equivalent:
(a) X is long-tailed.
(b) $\lim_{t \to \infty} r(t) = 0$.
With the help of these preliminary considerations, we are now ready to show that restarts are useful for long-tailed distributions. It should be noted that the conditions of the following theorem are not restrictive since all naturally occurring long-tail distributions satisfy these conditions (see also [NWZ20]). To be more precise, to the best of our knowledge, all named continuous long-tailed distributions fulfill the requirements of the following theorem (only pathological examples can be constructed that do not fulfill the requirements).
Theorem 28. Consider a positive, long-tailed random variable X with continuous pdf f and hazard rate function r. Also, assume that either E[X] = ∞ holds or the limits $\lim_{t \to \infty} r(t)$ and $\lim_{t \to \infty} t^2 \cdot f(t)$ both exist. In both cases, restarts are useful for X.
Proof. Let F be the cdf and Q the quantile function of X. According to Theorem 23, restarts are useful if and only if
$$\frac{(1-p) \cdot Q(p)}{E[X]} + \frac{1}{E[X]} \cdot \int_0^p Q(u)\,\mathrm{d}u < p \tag{2}$$
for some p ∈ (0, 1). Let us consider two cases.
First, consider the case where the expected value E[X] is infinite. Let p ∈ (0, 1) be such that Q(p) < ∞. Since E[X] = ∞, it immediately follows that Q(p)/E[X] = 0. Moreover, we also have $\int_0^p Q(u)\,\mathrm{d}u \le p \cdot Q(p) < \infty$. Hence, the left side of Inequality (2) is zero, and the inequality is obviously satisfied. Thus, the statement follows.
For the second case, we assume that E[X] < ∞ and that both $\lim_{t \to \infty} r(t)$ and $\lim_{t \to \infty} t^2 \cdot f(t)$ exist. Equation (1) can now be used to calculate the following derivative:
$$\frac{\mathrm{d}}{\mathrm{d}p} \bigl( R(p, X) - p \bigr) = \frac{\mathrm{d}}{\mathrm{d}p} \left( \frac{(1-p) \cdot Q(p)}{E[X]} + \frac{\int_0^p Q(u)\,\mathrm{d}u}{E[X]} - p \right) = \frac{1-p}{E[X] \cdot f(Q(p))} - 1.$$
Consider the limit of this expression for p → 1. Once again, the change of variable method is applied with p = F(t), resulting in
$$\lim_{p \to 1} \frac{1-p}{E[X] \cdot f(Q(p))} - 1 = \lim_{t \to \infty} \frac{1 - F(t)}{E[X] \cdot f(t)} - 1 = \lim_{t \to \infty} \frac{1}{E[X] \cdot r(t)} - 1.$$
By assumption, X has a long-tail distribution, and the limit $\lim_{t \to \infty} r(t)$ exists. For this reason, $\lim_{t \to \infty} r(t) = 0$ follows as a result of Lemma 27. Furthermore, since E[X] < ∞ holds, we may conclude that
$$\lim_{p \to 1} \frac{1-p}{E[X] \cdot f(Q(p))} - 1 = \lim_{t \to \infty} \frac{1}{E[X] \cdot r(t)} - 1 = \infty. \tag{3}$$
The condition from Theorem 23 can be rephrased in such a way that restarts are useful if and only if
$$R(p, X) - p < 0.$$
According to Lemma 24, the left-hand side of this inequality approaches 0 for p → 1. However, as shown in Equation (3), the derivative of R(p, X) − p approaches infinity for p → 1. Thus, R(p, X) − p is strictly increasing on some interval (p₀, 1) while tending to 0 at its right end, so it must be negative on that interval. These two observations imply that there is a p ∈ (0, 1) satisfying R(p, X) − p < 0. Consequently, restarts are useful for X.
With the help of this theorem, we obtain the following corollary of the Weak Conjecture.
Conjecture 29. Restarts are useful for Alfa with SLS ∈ {SRWA,probSAT,YalSAT}.
If Conjecture 21 is true, then this statement follows immediately by Theorem 28.
4 Theoretical Justifications for the Johnson SB Conjecture
Up to this point, we have established that Johnson SB distributions accurately describe the runtime behavior of the Alfa algorithm as demonstrated for the solvers SRWA, probSAT, and YalSAT. This observation was derived from extensive empirical investigations. This section provides a theoretical justification for why the runtime distributions are Johnson SB distributed in the special case of SRWA as the SLS component of Alfa. We focus on SRWA because this algorithm is best suited for purely theoretical analyses (as witnessed by the worst-case analysis conducted by Schöning in [Sch02]). Furthermore, it is a special case of probSAT. For convenience, SRWA is presented in Algorithm 4.
Input: Boolean formula G, Promise: G ∈ SAT
1 while True do
2     Choose a random assignment α
3     for j = 1 to t_restart do    // restart mechanism
4         if Gα = 1 then return α
5         Uniformly at random choose a clause K ∈ G with Kα = 0
6         Uniformly at random choose a literal u ∈ K
7         Perform the flip α := α[u = 1]
Algorithm 4: Schöning's original random walk algorithm SRWA [Sch02]. Line 3 takes care of restarting the search after t_restart flips. Then, a new random assignment is chosen.
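Algorithm 4 translates almost line by line into Python. The sketch below represents clauses as lists of nonzero integers (DIMACS-style literals) and counts flips, the performance measure used throughout the experiments. The `max_restarts` guard is a hypothetical addition so that the function also terminates on unsatisfiable inputs; it is not part of Algorithm 4, which relies on the promise G ∈ SAT.

```python
import random

def srwa(clauses, num_vars, t_restart, rng, max_restarts=None):
    """Sketch of Schoening's random walk; returns (model, flips) or (None, flips)."""
    flips = 0
    restarts = 0
    while max_restarts is None or restarts < max_restarts:
        restarts += 1
        # alpha[v] is the truth value of variable v; index 0 is unused.
        alpha = [None] + [rng.random() < 0.5 for _ in range(num_vars)]
        for _ in range(t_restart):
            falsified = [c for c in clauses
                         if not any(alpha[abs(lit)] == (lit > 0) for lit in c)]
            if not falsified:
                return alpha[1:], flips
            clause = rng.choice(falsified)    # line 5: random falsified clause
            lit = rng.choice(clause)          # line 6: random literal of that clause
            alpha[abs(lit)] = lit > 0         # line 7: flip so that the literal is true
            flips += 1
    return None, flips

# (x1 or x2) and (not x1 or x2) and (x1 or not x2): the only model sets both to True.
toy = [[1, 2], [-1, 2], [1, -2]]
model, flips = srwa(toy, num_vars=2, t_restart=6, rng=random.Random(0))
print(model, flips)
```

Recomputing the list of falsified clauses in every step keeps the sketch short; practical solvers such as probSAT maintain this set incrementally.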
4.1 Proof Overview
Let us begin by providing an overview of the organization of Section 4. This overview will also function as a rough proof sketch. The overall idea of the proof is to study which random variables make up the expected runtime (called P, Q, and R in the following) and then, subsequently, analyze these random variables. We succeed in showing that these three random variables are indeed approximately Johnson SB distributed. We have provided more details of the proof in the following overview:
Section 4.2 To increase readability, we provide an overview of all notation used in a glossary. We also discuss notational conventions in this section.
Section 4.3 We start the proof by showing that the expected runtime (as measured in the number of flips), ER_L, on the extended instance F ∪ L can be analyzed by separating the expected value into two components. The first component, F, takes care of the case where at some point during the run of the algorithm, a clause of L will be selected by Schöning's algorithm (see line 5 in Algorithm 4). The second component, I, takes care of the case when the formula is solved solely on the initial formula F. We analyze each component in a separate subsection.
    Section 4.3.1 We analyze the term I that gives the expected number of flips in case no clause of L will ever get selected. We show that this term consists of one random variable Q.
    Section 4.3.2 We show that the term F that gives the expected number of flips in the case where L is involved in the solving process contains three random variables that we call P, Q, and R (plus the expected value ER_L after one flip has taken place).
    Section 4.3.3 We combine the two cases in one single equation.
Section 4.4 We analyze the random variables P, Q, and R that we have obtained in the last section and find that each is asymptotically Johnson SB distributed.
4.2 Glossary of Notations and Notational Conventions
For the following sections, the reader may want to consult the glossary below to look up
terminology that is introduced in the subsequent subsections. Let us also emphasize that all our
notation abides by the following convention.
Convention 30. In the next section, we will consider a random set $\mathbf{L}$ of clauses. For the sake of clarity, we print random variables depending on $\mathbf{L}$ in bold font (e.g., $\mathbf{P}$, $\mathbf{Q}$, $\mathbf{R}$), whereas constants are not printed in bold font (e.g., $C_1$, $C_2$, or $P[\text{the number of flips to solve } F \text{ is } i \mid \text{SRWA is started in } \alpha]$, etc.).
We wish to especially highlight that some random variables describe probabilities since they depend on the random set $\mathbf{L}$. We use a subscript $\mathbf{L}$ and boldface and denote this by $\mathbf{P}_{\mathbf{L}}(\cdot)$. For example, we will write
\[ \mathbf{P}_{\mathbf{L}}(\text{the first time a clause from } \mathbf{L} \text{ is chosen is in iteration } c+1 \mid \text{SRWA was started in } \alpha, \mathbf{L}). \]
To correctly interpret this notation, we refer to Definition 2. On the other hand, some probabilities are constants not depending on $\mathbf{L}$, denoted by the notation $P[\cdot]$.
We use the same principle for the expectation operator.
Upon first reading the paper, the reader might skip this glossary, as all definitions are
introduced and explained in the main body of the following section. In brackets, we indicate
where the full definition can be found.
$F$: the original formula SRWA is trying to solve
$F'$: the modified formula, given by $F' := F \cup \mathbf{L}$
$\mathbf{L}$: random set of logically equivalent clauses that gets added to $F$
$L$: set of some logically equivalent clauses with respect to $F$
$\#\mathrm{Flips}(G)$: random variable for the runtime in flips of SRWA on instance $G$ (Definition 31)
$\mathbf{ERL}$: expected runtime $\mathbf{E}[\#\mathrm{Flips}(F \cup \mathbf{L}) \mid \mathbf{L}]$ of SRWA (in flips) on the extended instance $F \cup \mathbf{L}$ (Definition 32)
$A(\alpha)$: event that the initial assignment of SRWA is $\alpha$ (Definition 35)
$\mathbf{ERL}(\cdot)$: expected runtime of SRWA (in flips) on $F \cup \mathbf{L}$ subject to the conditions given in brackets (Notation 33)
$\mathbf{P}_{\mathbf{L}}(\cdot) := \mathbf{P}(\cdot \mid \mathbf{L})$ (Notation 37)
$\mathbf{P}_{\mathbf{L}}(\cdot \mid B) := \mathbf{P}(\cdot \mid B, \mathbf{L})$ if $B$ is some event (Notation 37)
$\mathrm{FirstSel}(c+1)$: event that the first time SRWA selects a clause of $\mathbf{L}$ in line 5 is in the $(c+1)$-st iteration (Definition 36)
$\mathrm{NeverSel}$: event that SRWA never chooses a clause of $\mathbf{L}$ and solves the formula only using clauses from $F$ (Definition 36)
$\mathrm{Sel}(c)$: indicator variable being 1 if and only if a clause in $\mathbf{L}$ gets selected in the $c$-th iteration of SRWA (Definition 41)
$\mathrm{From}\alpha\mathrm{To}\beta\mathrm{In}c$: event that SRWA (started from $\alpha$) ends up in assignment $\beta$ after performing $c$ flips (Definition 44)
$\mathrm{UNSAT}_G(\beta) := \{D \in G \mid D\beta = 0\}$ (Definition 53)
$\mathrm{UNSAT}_G(\beta, \in x) := \{D \in G \mid D\beta = 0 \text{ and } x \in \mathrm{Vars}(D)\}$ (Definition 53)
$\mathrm{UNSAT}_G(\beta, \notin x) := \{D \in G \mid D\beta = 0 \text{ and } x \notin \mathrm{Vars}(D)\}$ (Definition 53)
$V := \mathrm{Vars}(\mathrm{UNSAT}_{\mathbf{L}}(\beta))$: the set of variables of the clauses in $\mathbf{L}$ that are falsified by $\beta$ (Definition 62)
$\mathrm{Flip}_\alpha(c, \beta, x)$: event that in the execution of SRWA (started with assignment $\alpha$), at the beginning of the $c$-th iteration, the current assignment is $\beta$. Furthermore, the next flip will flip variable $x$. (Definition 50)
$C_1(\alpha, i) := P[\#\mathrm{Flips}(F) = i \mid A(\alpha)]$, i.e., some constant independent of $\mathbf{L}$, namely the probability that $F$ gets solved with $i$ flips when SRWA is started from $\alpha$ (Lemma 43)
$C_2(\alpha, k, \gamma) := P\bigl[\mathrm{From}\alpha\mathrm{To}\gamma\mathrm{In}(k-1) \mid \bigcap_{j=1}^{k-1} \{\mathrm{Sel}(j) = 0\}, A(\alpha)\bigr]$, i.e., some constant independent of $\mathbf{L}$, namely the probability that SRWA takes the random walk from the initial assignment $\alpha$ to $\gamma$ in $k-1$ flips, under the condition that no clause of $\mathbf{L}$ was touched (Lemma 46)
$C_3(\alpha, c, \beta) := P[\mathrm{From}\alpha\mathrm{To}\beta\mathrm{In}c \mid \mathrm{FirstSel}(c+1), A(\alpha)]$, i.e., some constant independent of $\mathbf{L}$, namely the probability that SRWA takes the random walk from the initial assignment $\alpha$ to $\beta$ in $c$ flips, under the condition that the first clause of $\mathbf{L}$ gets selected in iteration $c+1$ (Proposition 64)
$C_4(\alpha, c, \gamma) := C_2(\alpha, k+1, \gamma)$, i.e., some constant independent of $\mathbf{L}$ (Proposition 64)
$\mathbf{P} := \mathbf{P}_{\mathbf{L}}(\mathrm{Flip}_\alpha(c, \beta, x) \mid \mathrm{From}\alpha\mathrm{To}\beta\mathrm{In}c, \mathrm{FirstSel}(c+1), A(\alpha))$, Johnson SB distributed random variable (page 43)
$\mathbf{Q} := \mathbf{P}_{\mathbf{L}}(\mathrm{Sel}(k) = 0 \mid \mathrm{From}\alpha\mathrm{To}\gamma\mathrm{In}(k-1), A(\alpha))$, Johnson SB distributed random variable (Lemma 46)
$\mathbf{R} := \mathbf{P}_{\mathbf{L}}(\mathrm{Sel}(c+1) = 1 \mid \mathrm{From}\alpha\mathrm{To}\gamma\mathrm{In}c, A(\alpha))$, Johnson SB distributed random variable (Section A.2)
4.3 Analysis of the Runtime Distribution of the Algorithm
Before beginning the analysis, let us quickly recall the setting of Section 3: The input of the algorithm Alfa is a Boolean CNF formula $F$, and the promise that this formula is indeed satisfiable. The algorithm then randomly generates some set $\mathbf{L}$ of clauses that are logical implications of the original formula. Then, some SLS solver (in this section, Schöning's random walk algorithm, again abbreviated with SRWA) is called on $F' := F \cup \mathbf{L}$.
Note that the parameter $t_{\mathrm{restart}}$ in Algorithm 4 controls the restart mechanism of SRWA. In other words, SRWA chooses a new random assignment after every $t_{\mathrm{restart}}$ flips. Initially, we consider the case $t_{\mathrm{restart}} = \infty$, implying that no restarts are performed. This choice simplifies the analysis slightly. However, the observations and results can be extended to the case in which restarts are performed at the cost of sacrificing clarity of exposition. We briefly explain how to adapt our arguments to the case with restarts in Section A.4 of the appendix.
The aim at the beginning of this section is to establish that the expected runtime on the extended instance $F \cup \mathbf{L}$ can be analyzed by considering two components: the first component $\mathcal{F}$, taking care of the case where the algorithm will at some point select a clause of $\mathbf{L}$, and the other component $\mathcal{I}$, taking care of the case where such a clause is never selected.
We need the following two definitions to make the notion of expected runtime more precise.
Definition 31. We let $\#\mathrm{Flips}(G)$ denote the number of flips of SRWA on an instance $G$ until a satisfying assignment is found. Since SRWA is a Las Vegas algorithm, $\#\mathrm{Flips}(G)$ is a random variable.
We frequently refer to the number of flips required to find a satisfying assignment as the runtime of SRWA. We aim to find the asymptotic distribution of the expected runtimes of SRWA when the algorithm is provided with a random set $\mathbf{L}$ from Alfa. We capture this random choice of additional clauses with the following definition. To understand the term $\mathbf{E}[\#\mathrm{Flips}(F \cup \mathbf{L}) \mid \mathbf{L}]$ in the definition, the reader might refer back to Definition 3 and Example 5.
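Definition 32. Let $F$ be some SAT instance and let $L$ be a set of clauses such that for all $R \subseteq L$, the formula $F \cup R$ is logically equivalent to $F$. Furthermore, let $\mathbf{L}$ be a random subset of $L$. In the following, the random variable $\mathbf{ERL}$ denotes the expected runtime $\mathbf{E}[\#\mathrm{Flips}(F \cup \mathbf{L}) \mid \mathbf{L}]$ of SRWA (in flips) on $F \cup \mathbf{L}$. That is,
\[ \mathbf{ERL}(R) = E[\#\mathrm{Flips}(F \cup \mathbf{L}) \mid \mathbf{L} = R] = E[\#\mathrm{Flips}(F \cup R)] \in \mathbb{N}_0, \quad \text{where } R \subseteq L. \]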
In this section, $L$ may be arbitrarily chosen as long as it only contains clauses implied by $F$. In the first part of this section, any stochastic process can create the random set $\mathbf{L}$. Later on, in Section 4.4, we fix a generating model for $\mathbf{L}$. As $\mathbf{L}$ is being randomly selected in Alfa, $\mathbf{L}$ is a random set (denoted in bold). Thus, the expected runtime $\mathbf{ERL} \colon \mathcal{P}(L) \to \mathbb{N}_0$ is also a random variable.
Furthermore, we frequently work with further restrictions on $\mathbf{ERL}$, such as a condition for the initial assignment. These restrictions result in a conditional expectation.
Notation 33. We denote additional conditions in round brackets, i.e., $\mathbf{ERL}(\cdot)$.
Example 34. Let $A(\alpha)$ be the event that the initial assignment chosen by SRWA is some fixed assignment $\alpha$ (cf. line 2 in Algorithm 4). Then, $\mathbf{ERL}(A(\alpha))$ denotes the conditional expectation $\mathbf{E}[\#\mathrm{Flips}(F \cup \mathbf{L}) \mid A(\alpha), \mathbf{L}]$.
We begin our analysis of the runtime distribution of Alfa by applying the law of total expectation (LTE) to $\mathbf{ERL}$ when conditioning on the randomly chosen initial assignment.
Definition 35. Let $A(\alpha)$ denote the event that the initial assignment chosen in line 2 of SRWA is $\alpha$.
Using this definition, the LTE yields
\[ \mathbf{ERL} = \sum_{\alpha \in \{0,1\}^n} P[A(\alpha)] \cdot \mathbf{ERL}(A(\alpha)), \tag{4} \]
where $\mathbf{ERL}(A(\alpha))$ denotes the expected runtime of SRWA (in flips) on a given extended problem instance $F \cup \mathbf{L}$ under the condition that the algorithm picks $\alpha$ as the initial assignment. Letting $n := |\mathrm{Vars}(F')| = |\mathrm{Vars}(F)|$, one can notice that $P[A(\alpha)] = \frac{1}{2^n}$ for all $\alpha \in \{0,1\}^n$, i.e., this probability is independent of the random set $\mathbf{L}$.
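To make Equation (4) concrete, the following sketch computes the exact expected number of flips for every initial assignment of a tiny formula and then averages with weight $1/2^n$. It is an illustration under assumptions, not part of the paper: it assumes no restarts, uniform choice of a falsified clause and of a variable within it, DIMACS-style clauses without repeated variables, and a satisfiable formula. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def falsified_clauses(clauses, assignment):
    # assignment is a tuple of bools; literal v > 0 refers to assignment[v - 1].
    return [c for c in clauses
            if not any(assignment[abs(l) - 1] == (l > 0) for l in c)]

def expected_flips_per_start(clauses, n):
    """Exact E[#Flips | A(alpha)] for every alpha (no restarts)."""
    states = list(product([False, True], repeat=n))
    index = {s: i for i, s in enumerate(states)}
    size = len(states)
    # Linear system: e[alpha] - sum_beta T[alpha][beta] * e[beta] = 1 for
    # non-satisfying alpha, and e[alpha] = 0 for satisfying alpha.
    M = [[Fraction(0)] * size for _ in range(size)]
    rhs = [Fraction(0)] * size
    for s, i in index.items():
        M[i][i] = Fraction(1)
        unsat = falsified_clauses(clauses, s)
        if not unsat:
            continue
        rhs[i] = Fraction(1)
        for clause in unsat:
            for lit in clause:
                t = list(s)
                t[abs(lit) - 1] = not t[abs(lit) - 1]
                M[i][index[tuple(t)]] -= Fraction(1, len(unsat) * len(clause))
    # Gauss-Jordan elimination over exact fractions.
    for col in range(size):
        pivot = next(r for r in range(col, size) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        inv = 1 / M[col][col]
        M[col] = [v * inv for v in M[col]]
        rhs[col] *= inv
        for r in range(size):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
                rhs[r] -= factor * rhs[col]
    return {s: rhs[index[s]] for s in states}

def expected_runtime(clauses, n):
    # Equation (4): each initial assignment has probability 1 / 2^n.
    per_start = expected_flips_per_start(clauses, n)
    return sum(Fraction(1, 2 ** n) * e for e in per_start.values())
```

The exhaustive state space limits this to very small $n$; it is meant only to show how the sum over initial assignments is assembled.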
In the following, we concentrate on analyzing the expression $\mathbf{ERL}(A(\alpha))$ appearing in Equation (4). The following definition is used to distinguish between the cases of whether the algorithm will use a clause of $\mathbf{L}$ in the solution or not.
Definition 36. Let $\mathrm{FirstSel}(c+1)$ denote the event that the first time SRWA selects a clause of $\mathbf{L}$ in line 5 is in iteration $c+1$ (i.e., in iterations $1, \ldots, c$ a clause of $F$ is chosen). Similarly, we let $\mathrm{NeverSel}$ denote the event that SRWA never chooses a clause of $\mathbf{L}$, i.e., the algorithm solves the instance $F'$ using only clauses from $F$.
Similar to $\mathbf{ERL}$, denoting an expected runtime depending on $\mathbf{L}$, we also deal with probabilities depending on $\mathbf{L}$. Following Convention 30, we write $\mathbf{L}$ as subscript and use bold font:
Notation 37. The notation $\mathbf{P}_{\mathbf{L}}(\cdot)$ will be used as a shorthand for $\mathbf{P}(\cdot \mid \mathbf{L})$, and $\mathbf{P}_{\mathbf{L}}(\cdot \mid B) := \mathbf{P}(\cdot \mid B, \mathbf{L})$ if $B$ is some event.
Example 38. Again, since $\mathbf{L}$ is a random set, $\mathbf{P}_{\mathbf{L}}(\cdot)$ is a random variable. For example, we will write $\mathbf{P}_{\mathbf{L}}(\mathrm{FirstSel}(c+1) \mid A(\alpha))$ as a shorthand for the conditional probability
\[ \mathbf{P}(\mathrm{FirstSel}(c+1) \mid A(\alpha), \mathbf{L}), \]
i.e., for the (random variable) probability that SRWA picks a clause from $\mathbf{L}$ in iteration $c+1$ for the first time, given that it was started from $\alpha$, and dependent on the concrete realization of the random set $\mathbf{L}$. Similarly, as introduced above, the notation $\mathbf{ERL}(\mathrm{FirstSel}(c+1), A(\alpha))$ should be understood as
\[ \mathbf{E}[\#\mathrm{Flips}(F \cup \mathbf{L}) \mid \mathrm{FirstSel}(c+1), A(\alpha), \mathbf{L}]. \]
The definitions of $\mathbf{ERL}(\cdot)$ and $\mathbf{P}_{\mathbf{L}}(\cdot)$ in Notation 33 and 37 are flexible enough to add multiple events, separated by commas.
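Now, applying the LTE again to the respective terms $\mathbf{ERL}(A(\alpha))$ in sum (4) and conditioning on the iteration $c$ in which a clause of $\mathbf{L}$ gets selected for the first time yields
\begin{align}
\mathbf{ERL}(A(\alpha)) &= \sum_{c \in \mathbb{N}_0} \mathbf{P}_{\mathbf{L}}(\mathrm{FirstSel}(c+1) \mid A(\alpha)) \cdot \mathbf{ERL}(\mathrm{FirstSel}(c+1), A(\alpha)) \tag{5} \\
&\quad + \mathbf{P}_{\mathbf{L}}(\mathrm{NeverSel} \mid A(\alpha)) \cdot \mathbf{ERL}(\mathrm{NeverSel}, A(\alpha)). \tag{6}
\end{align}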
For clarity of exposition, we analyze each line of this sum in a different case. We start with line (6), i.e., the case in which SRWA never selects a clause from $\mathbf{L}$ (called the infinite case $\mathcal{I}$). Its treatment can be found in Section 4.3.1. The case of line (5), the finite case $\mathcal{F}$, uses a similar but more involved argument. For this reason, we only mention the result of the analysis in Section 4.3.2 and defer the analysis to Appendix A. We then proceed to present the combined result in Section 4.3.3, showing which random variables (later called $\mathbf{P}$, $\mathbf{Q}$, and $\mathbf{R}$) make up the above expression. These random variables will be in such an elementary form that it is easy for us to analyze their distribution.
4.3.1 The Infinite Case NeverSel
As announced, this section will treat the analysis of the term
\[ \mathcal{I} := \mathbf{P}_{\mathbf{L}}(\mathrm{NeverSel} \mid A(\alpha)) \cdot \mathbf{ERL}(\mathrm{NeverSel}, A(\alpha)), \]
i.e., the term in Equation (6). We will state the goal of this section in an informal form before we begin with the detailed analysis (a detailed version can be found in Proposition 48).
Proposition 39 (The infinite case NeverSel, informal). It holds that
\[ \mathcal{I} = \mathbf{P}_{\mathbf{L}}(\mathrm{NeverSel} \mid A(\alpha)) \cdot \mathbf{ERL}(\mathrm{NeverSel}, A(\alpha)) = \sum_{i \in \mathbb{N}_0} C_1 \cdot \Bigl( \prod_{k=1}^{i} \sum_{\gamma : F\gamma = 0} C_2 \cdot \mathbf{Q} \Bigr) \cdot i, \]
with constants $C_1, C_2$ not depending on $\mathbf{L}$, and the random variable $\mathbf{Q} := \mathbf{Q}(\alpha, k, \gamma)$ that, roughly speaking, tells us how likely it is that we do not select a clause of the set $\mathbf{L}$ in the $k$-th iteration of SRWA, given the knowledge of the previous random walk path (from $\alpha$ to $\gamma$) over the last $k-1$ iterations.
The random variable $\mathbf{Q}$ is “elementary enough” such that we can analyze its distribution in Section 4.4.2.
To make the proof of Proposition 39 more digestible, we have split it into several steps, each
containing one or multiple lemmas of what will be achieved in this step.
Step 1: Reduction From Probability and Expectation to Probabilities Only. In the first step, we rewrite $\mathcal{I}$ in a form that only contains probabilities. These probabilities will then, in turn, be analyzed in a later step. For the formulation of the lemma in this step, we would like to remind the reader that $F'$ refers to the modified formula, i.e., $F' := F \cup \mathbf{L}$.
Additionally, we want to emphasize that the definitions of $\mathbf{ERL}(\cdot)$ and $\mathbf{P}_{\mathbf{L}}(\cdot)$ in Notation 33 and 37 are flexible enough to add multiple events, separated by commas. For example, $\mathbf{P}_{\mathbf{L}}(\#\mathrm{Flips}(F') = i \mid \mathrm{NeverSel}, A(\alpha))$ denotes the probability that SRWA takes $i$ flips to solve $F'$ under the condition that it was started with $\alpha$ and never selects a clause from $\mathbf{L}$. Similarly, $\mathbf{ERL}(\mathrm{NeverSel}, \#\mathrm{Flips}(F') = i, A(\alpha))$ denotes the expected runtime of SRWA subject to the conditions listed in brackets, i.e., under the assumptions that no clause of $\mathbf{L}$ gets selected, $F'$ is solved in $i$ flips, and the random walk started in assignment $\alpha$.
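Lemma 40. It holds
\[ \mathcal{I} = \sum_{i \in \mathbb{N}_0} \mathbf{P}_{\mathbf{L}}(\mathrm{NeverSel} \mid A(\alpha)) \cdot \mathbf{P}_{\mathbf{L}}(\#\mathrm{Flips}(F') = i \mid \mathrm{NeverSel}, A(\alpha)) \cdot i. \]
Proof. By applying the LTE to the factor $\mathbf{ERL}(\mathrm{NeverSel}, A(\alpha))$ when conditioning on the event $\#\mathrm{Flips}(F') = i$, one obtains
\begin{align*}
\mathcal{I} &= \mathbf{P}_{\mathbf{L}}(\mathrm{NeverSel} \mid A(\alpha)) \cdot \mathbf{ERL}(\mathrm{NeverSel}, A(\alpha)) \\
&\overset{\text{LTE}}{=} \sum_{i \in \mathbb{N}_0} \Bigl[ \mathbf{P}_{\mathbf{L}}(\mathrm{NeverSel} \mid A(\alpha)) \cdot \mathbf{P}_{\mathbf{L}}(\#\mathrm{Flips}(F') = i \mid \mathrm{NeverSel}, A(\alpha)) \cdot \mathbf{ERL}(\mathrm{NeverSel}, \#\mathrm{Flips}(F') = i, A(\alpha)) \Bigr].
\end{align*}
The last factor of the above sum can be expressed in a simpler form:
\[ \mathbf{ERL}(\mathrm{NeverSel}, \#\mathrm{Flips}(F') = i, A(\alpha)) = i. \]
This equation holds because we have already conditioned on the event $\{\#\mathrm{Flips}(F') = i\}$, i.e., the runtime on $F'$ being $i$, and the condition that the algorithm never selects a clause from the random set $\mathbf{L}$. Hence, the lemma follows.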
Step 2: Analysis of the Remaining Two Probabilities With the Help of a Selector Variable and the Chain Rule. As our next step, we will analyze the product of the first two factors in Lemma 40, i.e., the expression
\[ \mathcal{E} := \mathbf{P}_{\mathbf{L}}(\mathrm{NeverSel} \mid A(\alpha)) \cdot \mathbf{P}_{\mathbf{L}}(\#\mathrm{Flips}(F') = i \mid \mathrm{NeverSel}, A(\alpha)). \]
For this, we need the following definition telling us if in the $c$-th iteration of SRWA a clause of $\mathbf{L}$ gets selected or not.
Definition 41. Let $\mathrm{Sel}(c)$ be the indicator variable defined as follows:
\[ \mathrm{Sel}(c) := \begin{cases} 1 & \text{if a clause in } \mathbf{L} \text{ gets selected in the } c\text{-th iteration of SRWA,} \\ 0 & \text{otherwise.} \end{cases} \]
With this definition in place, we can present the lemma for this step.
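Lemma 42. We have
\begin{align}
\mathcal{E} &= \mathbf{P}_{\mathbf{L}}(\mathrm{NeverSel} \mid A(\alpha)) \cdot \mathbf{P}_{\mathbf{L}}(\#\mathrm{Flips}(F') = i \mid \mathrm{NeverSel}, A(\alpha)) \notag \\
&= \mathbf{P}_{\mathbf{L}}\Bigl(\#\mathrm{Flips}(F') = i \Bigm| \bigcap_{j=1}^{i} \{\mathrm{Sel}(j) = 0\}, A(\alpha)\Bigr) \tag{7} \\
&\quad \cdot \prod_{k=1}^{i} \mathbf{P}_{\mathbf{L}}\Bigl(\mathrm{Sel}(k) = 0 \Bigm| \bigcap_{j=1}^{k-1} \{\mathrm{Sel}(j) = 0\}, A(\alpha)\Bigr). \tag{8}
\end{align}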
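Proof. By the definition of the conditional probability and by reducing the resulting fraction, we obtain
\begin{align}
\mathcal{E} &= \mathbf{P}_{\mathbf{L}}(\mathrm{NeverSel} \mid A(\alpha)) \cdot \mathbf{P}_{\mathbf{L}}(\#\mathrm{Flips}(F') = i \mid \mathrm{NeverSel}, A(\alpha)) \notag \\
&= \frac{\mathbf{P}_{\mathbf{L}}(\mathrm{NeverSel}, A(\alpha))}{\mathbf{P}_{\mathbf{L}}(A(\alpha))} \cdot \frac{\mathbf{P}_{\mathbf{L}}(\#\mathrm{Flips}(F') = i, \mathrm{NeverSel}, A(\alpha))}{\mathbf{P}_{\mathbf{L}}(\mathrm{NeverSel}, A(\alpha))} \notag \\
&= \mathbf{P}_{\mathbf{L}}(\#\mathrm{Flips}(F') = i, \mathrm{NeverSel}, A(\alpha) \mid A(\alpha)). \tag{9}
\end{align}
Now notice that
\[ P[A \mid B] = \frac{P[A \cap B]}{P[B]} = \frac{P[A \cap B \cap B]}{P[B]} = P[A \cap B \mid B], \]
hence we can simplify line (9) even further:
\[ \mathcal{E} = \mathbf{P}_{\mathbf{L}}(\#\mathrm{Flips}(F') = i, \mathrm{NeverSel}, A(\alpha) \mid A(\alpha)) = \mathbf{P}_{\mathbf{L}}(\#\mathrm{Flips}(F') = i, \mathrm{NeverSel} \mid A(\alpha)). \]
We continue with analyzing $\mathbf{P}_{\mathbf{L}}(\#\mathrm{Flips}(F') = i, \mathrm{NeverSel} \mid A(\alpha))$. Since we have the information that $F'$ is being solved with $i$ flips, we can express $\mathrm{NeverSel}$ more precisely as the event that in none of the iterations $1, \ldots, i$ a clause from $\mathbf{L}$ gets selected, i.e., as the intersection $\bigcap_{j=1}^{i} \{\mathrm{Sel}(j) = 0\}$. Thus,
\[ \mathcal{E} = \mathbf{P}_{\mathbf{L}}\Bigl(\{\#\mathrm{Flips}(F') = i\} \cap \bigcap_{j=1}^{i} \{\mathrm{Sel}(j) = 0\} \Bigm| A(\alpha)\Bigr). \]
Applying the chain rule for probabilities (cf. Theorem 8) we obtain
\[ \mathcal{E} = \mathbf{P}_{\mathbf{L}}\Bigl(\#\mathrm{Flips}(F') = i \Bigm| \bigcap_{j=1}^{i} \{\mathrm{Sel}(j) = 0\}, A(\alpha)\Bigr) \cdot \prod_{k=1}^{i} \mathbf{P}_{\mathbf{L}}\Bigl(\mathrm{Sel}(k) = 0 \Bigm| \bigcap_{j=1}^{k-1} \{\mathrm{Sel}(j) = 0\}, A(\alpha)\Bigr), \]
which is what we wanted.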
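The two probability facts used in this proof, the identity $P[A \mid B] = P[A \cap B \mid B]$ and the chain rule, can be checked numerically on a small finite probability space. The sketch below is purely illustrative: the sample space of three coin flips and the events `A`, `B`, `S1`, `S2` are hypothetical stand-ins for $\{\#\mathrm{Flips}(F') = i\}$, $A(\alpha)$, and the events $\{\mathrm{Sel}(j) = 0\}$.

```python
from fractions import Fraction
from itertools import product

# Uniform probability space: outcomes of three fair coin flips.
omega = set(product([0, 1], repeat=3))

def prob(event, given=None):
    """Conditional probability P[event | given] under the uniform measure."""
    given = omega if given is None else given
    return Fraction(len(event & given), len(given))

A = {w for w in omega if sum(w) >= 1}   # stand-in for {#Flips(F') = i}
B = {w for w in omega if w[0] == 1}     # stand-in for A(alpha)
S1 = {w for w in omega if w[1] == 0}    # stand-in for {Sel(1) = 0}
S2 = {w for w in omega if w[2] == 0}    # stand-in for {Sel(2) = 0}

# Identity used to simplify line (9).
identity_lhs = prob(A, B)
identity_rhs = prob(A & B, B)

# Chain rule: P[A, S1, S2 | B] = P[A | S1, S2, B] * P[S1 | B] * P[S2 | S1, B].
lhs = prob(A & S1 & S2, B)
rhs = prob(A, S1 & S2 & B) * prob(S1, B) * prob(S2, S1 & B)
```

Both pairs agree exactly because `Fraction` avoids rounding; in this toy space the chain-rule value is $1/4$.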
Step 3: Analyzing the Product Rule Factors. Having achieved this, we proceed to analyze the factors in lines (7) and (8). Let us begin with the first factor in line (7).
Lemma 43. The following expression is not a random variable anymore:
\[ \mathbf{P}_{\mathbf{L}}\Bigl(\#\mathrm{Flips}(F') = i \Bigm| \bigcap_{j=1}^{i} \{\mathrm{Sel}(j) = 0\}, A(\alpha)\Bigr) =: C_1. \tag{10} \]
Proof. When $\bigcap_{j=1}^{i} \{\mathrm{Sel}(j) = 0\}$ holds and we have the information that $\#\mathrm{Flips}(F') = i$, Schöning's algorithm has never selected a clause from $\mathbf{L}$ in the $i$ iterations it required to solve formula $F'$. Thus, the algorithm performs its random walk only on clauses of the original formula $F$; hence, the random set $\mathbf{L}$ does not have any influence. We can therefore write
\[ \mathbf{P}_{\mathbf{L}}\Bigl(\#\mathrm{Flips}(F') = i \Bigm| \bigcap_{j=1}^{i} \{\mathrm{Sel}(j) = 0\}, A(\alpha)\Bigr) = P[\#\mathrm{Flips}(F) = i \mid A(\alpha)] = C_1. \]
Notice that $P[\#\mathrm{Flips}(F) = i \mid A(\alpha)]$ is not a random variable.
Because of the simplification provided in Equation (10), we can concentrate on the factors in the big product of line (8), i.e., the part $\prod_{k=1}^{i} \mathbf{P}_{\mathbf{L}}\bigl(\mathrm{Sel}(k) = 0 \bigm| \bigcap_{j=1}^{k-1} \{\mathrm{Sel}(j) = 0\}, A(\alpha)\bigr)$. To proceed, however, we require additional notation.
Definition 44. We will denote the event that SRWA (started from the initial assignment $\alpha$) ends up in assignment $\beta$ after performing $c$ flips with $\mathrm{From}\alpha\mathrm{To}\beta\mathrm{In}c$.
Let us look at a few easy examples to get an intuition for this notation.
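Example 45. (i) It holds that $P[\mathrm{From}\alpha\mathrm{To}\alpha\mathrm{In}0 \mid A(\alpha)] = 1$, since $\alpha$ is the initial assignment selected by SRWA.
(ii) If $\alpha := \{x = y = 0, z = 1\}$ gets chosen in line 2 of Algorithm 4 and we also know that the clause $K := (x \lor y \lor \lnot z)$, which is falsified by $\alpha$, gets selected in line 5, then setting
\[ \beta_1 = \{x = 1, y = 0, z = 1\}, \quad \beta_2 = \{x = 0, y = 1, z = 1\}, \quad \beta_3 = \{x = y = z = 0\}, \]
we have $P[\mathrm{From}\alpha\mathrm{To}\beta_i\mathrm{In}1 \mid A(\alpha), K \text{ is selected}] = \frac{1}{3}$ for all $i \in \{1, 2, 3\}$.
(iii) If in the $(c+1)$-th iteration of SRWA a clause of $\mathbf{L}$ gets selected, then for all $\beta \in \{0,1\}^n$ that do not falsify a clause in $\mathbf{L}$ we have
\[ P[\mathrm{From}\alpha\mathrm{To}\beta\mathrm{In}c \mid A(\alpha), \mathrm{Sel}(c+1) = 1] = 0. \]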
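Part (ii) of the example can be reproduced by enumerating the three possible flips. The following sketch is illustrative only; the helper name `one_flip_distribution` and the dictionary encoding of assignments are assumptions. It relies on the fact that, once a clause over the variables $x, y, z$ has been selected, each of its variables is flipped with probability $1/3$.

```python
from fractions import Fraction

def one_flip_distribution(alpha, clause_vars):
    """Distribution of the assignment after flipping one uniformly chosen
    variable of the selected clause."""
    dist = {}
    for var in clause_vars:
        beta = dict(alpha)
        beta[var] = 1 - beta[var]          # flip the chosen variable
        key = tuple(sorted(beta.items()))  # hashable form of the assignment
        dist[key] = dist.get(key, Fraction(0)) + Fraction(1, len(clause_vars))
    return dist

alpha = {"x": 0, "y": 0, "z": 1}
dist = one_flip_distribution(alpha, ["x", "y", "z"])
```

The three keys of `dist` correspond to $\beta_1$, $\beta_2$, and $\beta_3$ from the example.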
Now, we are ready to analyze the factors in the big product of line (8). In analyzing these factors, we end up with the random variable $\mathbf{Q}$, which we show in Section 4.4 to be Johnson SB distributed.
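Lemma 46. For $k \in \{1, \ldots, i\}$ it holds
\[ \mathbf{P}_{\mathbf{L}}\Bigl(\mathrm{Sel}(k) = 0 \Bigm| \bigcap_{j=1}^{k-1} \{\mathrm{Sel}(j) = 0\}, A(\alpha)\Bigr) = \sum_{\gamma : F\gamma = 0} C_2 \cdot \mathbf{Q}, \]
where
\[ C_2 := C_2(\alpha, k, \gamma) := P\Bigl[\mathrm{From}\alpha\mathrm{To}\gamma\mathrm{In}(k-1) \Bigm| \bigcap_{j=1}^{k-1} \{\mathrm{Sel}(j) = 0\}, A(\alpha)\Bigr] \in [0, 1] \]
is no random variable, and
\[ \mathbf{Q} := \mathbf{Q}(\alpha, k, \gamma) := \mathbf{P}_{\mathbf{L}}(\mathrm{Sel}(k) = 0 \mid \mathrm{From}\alpha\mathrm{To}\gamma\mathrm{In}(k-1), A(\alpha)). \]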
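Proof. Let $k \in \{1, \ldots, i\}$. One can notice by applying the LTP when conditioning on the event $\mathrm{From}\alpha\mathrm{To}\gamma\mathrm{In}(k-1)$, that
\begin{align*}
\mathbf{P}_{\mathbf{L}}\Bigl(\mathrm{Sel}(k) = 0 \Bigm| \bigcap_{j=1}^{k-1} \{\mathrm{Sel}(j) = 0\}, A(\alpha)\Bigr) = \sum_{\gamma : F\gamma = 0} &\mathbf{P}_{\mathbf{L}}\Bigl(\mathrm{Sel}(k) = 0 \Bigm| \mathrm{From}\alpha\mathrm{To}\gamma\mathrm{In}(k-1), \bigcap_{j=1}^{k-1} \{\mathrm{Sel}(j) = 0\}, A(\alpha)\Bigr) \\
&\cdot P\Bigl[\mathrm{From}\alpha\mathrm{To}\gamma\mathrm{In}(k-1) \Bigm| \bigcap_{j=1}^{k-1} \{\mathrm{Sel}(j) = 0\}, A(\alpha)\Bigr].
\end{align*}
Notice that we sum only over those assignments that falsify a clause in $F$ since the algorithm would already have finished in case of a satisfying assignment. Two additional remarks are due:
First, one can observe that $\mathbf{P}_{\mathbf{L}}\bigl(\mathrm{Sel}(k) = 0 \bigm| \mathrm{From}\alpha\mathrm{To}\gamma\mathrm{In}(k-1), \bigcap_{j=1}^{k-1} \{\mathrm{Sel}(j) = 0\}, A(\alpha)\bigr)$ can be rewritten in the form $\mathbf{P}_{\mathbf{L}}(\mathrm{Sel}(k) = 0 \mid \mathrm{From}\alpha\mathrm{To}\gamma\mathrm{In}(k-1), A(\alpha))$ since the condition $\mathrm{From}\alpha\mathrm{To}\gamma\mathrm{In}(k-1)$ (i.e., the algorithm being in $\gamma$ after $k-1$ flips) makes the condition $\bigcap_{j=1}^{k-1} \{\mathrm{Sel}(j) = 0\}$ (i.e., the information which clauses were selected along the way) obsolete for determining the expected number of flips the algorithm performs.
Secondly, notice that $P\bigl[\mathrm{From}\alpha\mathrm{To}\gamma\mathrm{In}(k-1) \bigm| \bigcap_{j=1}^{k-1} \{\mathrm{Sel}(j) = 0\}, A(\alpha)\bigr]$ is a probability that does not depend on $\mathbf{L}$; in other words, this term is no random variable (also recall Convention 30). This is true because, under the condition $\bigcap_{j=1}^{k-1} \{\mathrm{Sel}(j) = 0\}$, the complete $(k-1)$-flip random walk from $\alpha$ to $\gamma$ is made without considering clauses from $\mathbf{L}$.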
Putting together Lemmata 43 and 46 yields:
Corollary 47. We have
\[ \mathcal{E} = C_1 \cdot \prod_{k=1}^{i} \sum_{\gamma : F\gamma = 0} C_2 \cdot \mathbf{Q}. \]
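Final Step: Putting Everything Together. Having done the steps above, we summarize the results obtained in Subsection 4.3.1, dealing with the infinite case, in the following recapitulation.
Proposition 48 (The infinite case NeverSel). It holds that
\[ \mathcal{I} = \mathbf{P}_{\mathbf{L}}(\mathrm{NeverSel} \mid A(\alpha)) \cdot \mathbf{ERL}(\mathrm{NeverSel}, A(\alpha)) = \sum_{i \in \mathbb{N}_0} C_1 \cdot \Bigl( \prod_{k=1}^{i} \sum_{\gamma : F\gamma = 0} C_2 \cdot \mathbf{Q} \Bigr) \cdot i, \]
with the constants
\[ C_1 := C_1(\alpha, i) := P[\#\mathrm{Flips}(F) = i \mid A(\alpha)] \in [0, 1], \quad \text{and} \]
\[ C_2 := C_2(\alpha, k, \gamma) := P\Bigl[\mathrm{From}\alpha\mathrm{To}\gamma\mathrm{In}(k-1) \Bigm| \bigcap_{j=1}^{k-1} \{\mathrm{Sel}(j) = 0\}, A(\alpha)\Bigr] \in [0, 1] \]
and the random variable
\[ \mathbf{Q} := \mathbf{Q}(\alpha, k, \gamma) := \mathbf{P}_{\mathbf{L}}(\mathrm{Sel}(k) = 0 \mid \mathrm{From}\alpha\mathrm{To}\gamma\mathrm{In}(k-1), A(\alpha)). \]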
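Proof. We want to emphasize that we will, from now on, drop the dependencies of random variables like $\mathbf{Q}$ (on $\alpha, k, \gamma$, etc.) in the notation in order not to overload notation. The reader should, however, keep this in mind. We have
\begin{align*}
\mathcal{I} &= \mathbf{P}_{\mathbf{L}}(\mathrm{NeverSel} \mid A(\alpha)) \cdot \mathbf{ERL}(\mathrm{NeverSel}, A(\alpha)) \\
&\overset{\text{Lem 40}}{=} \sum_{i \in \mathbb{N}_0} \mathcal{E} \cdot i \\
&\overset{\text{Cor 47}}{=} \sum_{i \in \mathbb{N}_0} C_1 \cdot \Bigl( \prod_{k=1}^{i} \sum_{\gamma : F\gamma = 0} C_2 \cdot \mathbf{Q} \Bigr) \cdot i.
\end{align*}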
4.3.2 The Finite Case c < ∞
As seen in the previous section, we sometimes end up with terms that are not random variables (above, $C_1$ and $C_2$). Since we will almost exclusively care about random variables in Section 4.3.3, we introduce the following notation that will allow us to drop non-random variables. This has the advantage of keeping the calculations cleaner and more readable. This is theoretically founded in Lemma 14.
Notation 49. We will use the symbol $\cong$ to indicate equality up to constants, e.g., $\frac{1}{2}\mathbf{X} + \frac{1}{4} \cong \mathbf{X}$.
Furthermore, we need the following definition.
Definition 50. We let $\mathrm{Flip}_\alpha(c, \beta, x)$ denote that in the execution of SRWA (started with assignment $\alpha$), at the beginning of the $c$-th iteration, the current assignment is $\beta$. Furthermore, the next flip will flip variable $x$ (i.e., at the end of the algorithm's $c$-th iteration in line 7, we have $\beta[x]$ as the current assignment).
By applying similar arguments as in Section 4.3.1, where we analyzed $\mathcal{I}$, i.e., applying the LTE, the chain rule for probabilities, and using the LTP, one can obtain the following recursion.
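Proposition 51 (The finite case $c < \infty$). Let $V := \mathrm{Vars}(\mathrm{UNSAT}_{\mathbf{L}}(\beta))$, where $\mathrm{UNSAT}_{\mathbf{L}}(\beta)$ is the set of clauses in $\mathbf{L}$ that are falsified by $\beta$. It then holds that
\begin{align*}
\mathcal{F} &= \sum_{c \in \mathbb{N}_0} \mathbf{P}_{\mathbf{L}}(\mathrm{FirstSel}(c+1) \mid A(\alpha)) \cdot \mathbf{ERL}(\mathrm{FirstSel}(c+1), A(\alpha)) \\
&\cong \sum_{c \in \mathbb{N}_0} \sum_{\beta : \mathbf{L}\beta = 0} \Bigl( \sum_{\gamma : F\gamma = 0} \mathbf{R} \Bigr) \cdot \Bigl( \prod_{k=1}^{c} \sum_{\gamma : F\gamma = 0} \mathbf{Q} \Bigr) \cdot \sum_{x \in V} \mathbf{P} \cdot \mathbf{ERL}(A(\beta[x]))
\end{align*}
with the random variables
\begin{align*}
\mathbf{P} &:= \mathbf{P}_{\mathbf{L}}(\mathrm{Flip}_\alpha(c, \beta, x) \mid \mathrm{From}\alpha\mathrm{To}\beta\mathrm{In}c, \mathrm{FirstSel}(c+1), A(\alpha)), \\
\mathbf{Q} &:= \mathbf{P}_{\mathbf{L}}(\mathrm{Sel}(k) = 0 \mid \mathrm{From}\alpha\mathrm{To}\gamma\mathrm{In}(k-1), A(\alpha)), \\
\mathbf{R} &:= \mathbf{P}_{\mathbf{L}}(\mathrm{Sel}(c+1) = 1 \mid \mathrm{From}\alpha\mathrm{To}\gamma\mathrm{In}c, A(\alpha)).
\end{align*}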
Proof. For a detailed presentation of the arguments involved, we refer to Appendix A.
4.3.3 Combining Both Cases
Putting together the cases treated in Sections 4.3.1 and 4.3.2, we obtain the following:
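Theorem 52. Let $V$ be as in Proposition 51. The distribution of the expected value $\mathbf{ERL}(A(\alpha))$ is given by
\begin{align*}
\mathbf{ERL}(A(\alpha)) \cong{} &\sum_{c \in \mathbb{N}_0} \sum_{\beta : \mathbf{L}\beta = 0} \Bigl( \sum_{\gamma : F\gamma = 0} \mathbf{R} \Bigr) \cdot \Bigl( \prod_{k=1}^{c} \sum_{\gamma : F\gamma = 0} \mathbf{Q} \Bigr) \cdot \sum_{x \in V} \mathbf{P} \cdot \mathbf{ERL}(A(\beta[x])) \\
&+ \sum_{i \in \mathbb{N}_0} \prod_{k=1}^{i} \sum_{\gamma : F\gamma = 0} \mathbf{Q},
\end{align*}
with $\mathbf{P}$, $\mathbf{Q}$, and $\mathbf{R}$ again as in Proposition 51.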
We proceed in the next section to analyze the distribution of the random variables $\mathbf{P}$, $\mathbf{Q}$, and $\mathbf{R}$. Finally, in Section 4.5, we present arguments why $\mathbf{ERL}(A(\alpha))$ follows one distribution type.
4.4 Analysis of the Random Variables
Recall that in the setting of this section, a base instance $F$ is modified by adding a set of logically equivalent clauses $\mathbf{L}$. The precise manner in which the set of clauses $\mathbf{L}$ is randomly generated does not affect any derivations and results presented so far in this section. In other words, no matter how the clauses from $L$ are sampled, the expression from Theorem 52 will always describe the runtime distribution.
In this section, we proceed by fixing the generating model for the random set $\mathbf{L}$. Then, we analyze the random variables $\mathbf{P}$, $\mathbf{Q}$, and $\mathbf{R}$ in more detail based on this assumption. In this context, we prove that each of them (asymptotically) follows the Johnson SB distribution. We begin by outlining the model used to modify formula $F$.
Fixing the Generating Model. The generating model for the set $\mathbf{L}$ is specified in Algorithm 5. Note that there are pronounced similarities to the model employed in Section 3.1. However, there are two notable differences. First, the model from Algorithm 5 does not necessarily use clauses generated by resolution. Instead, other methods (for example, CDCL, like in the GapSAT solver) can also be used to produce logically equivalent clauses. In this sense, the model from Algorithm 5 generalizes the model discussed in Section 3.1. On the other hand, in Algorithm 5, we restrict ourselves to clauses with a certain fixed length $\ell$. In that sense, the chosen model is weaker than the one in Section 3.1. The reason for this restriction is that otherwise, another random variable, namely the length of the added clause, would have to be considered, which would only further complicate the analysis.
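Input: Boolean formula $F$, probability $p \in (0, 1]$, set $L$ containing logically equivalent
       clauses with respect to $F$ such that $\exists \ell \in \mathbb{N} : \forall K \in L : |K| = \ell$
$\mathbf{L} := \emptyset$
foreach $K \in L$ do
    with probability $p$ do $\mathbf{L} := \mathbf{L} \cup \{K\}$
Algorithm 5: Generation of the random set $\mathbf{L}$.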
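The sampling step of Algorithm 5 amounts to an independent coin flip per candidate clause. A minimal Python sketch follows; it is not the authors' code. The function name `generate_random_subset`, the tuple encoding of clauses, and the explicit length check are assumptions made for illustration.

```python
import random

def generate_random_subset(candidate_clauses, p, rng=None):
    """Keep each candidate clause independently with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    # Algorithm 5 requires all candidate clauses to have the same length.
    if len({len(clause) for clause in candidate_clauses}) > 1:
        raise ValueError("all candidate clauses must have the same length")
    rng = rng or random.Random()
    # rng.random() is uniform on [0, 1), so the test succeeds with probability p.
    return [clause for clause in candidate_clauses if rng.random() < p]

# Usage: three hypothetical candidate clauses of length 3.
candidates = [(1, -2, 3), (-1, 2, 4), (2, 3, -4)]
sampled = generate_random_subset(candidates, p=0.5, rng=random.Random(42))
```

Because each clause is kept independently, the number of kept clauses from any fixed sub-collection of the candidates is binomially distributed, which is the property the subsequent analysis relies on.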
Some Preliminary Comments for the Analysis of the Random Variables $\mathbf{P}$, $\mathbf{Q}$, and $\mathbf{R}$. As indicated above, the Johnson SB distribution is applied several times in the following. For our purposes, the exact values of the distribution's parameters are irrelevant for the most part. Therefore, we often omit the brackets containing the exact values of the parameters. In addition, we often deal with asymptotic distributions in this section. A sequence of random variables $X_1, X_2, \ldots$ with associated cdfs $F_1, F_2, \ldots$ has an asymptotic distribution $F$ if $\lim_{n \to \infty} F_n(x) = F(x)$ for all $x$. A prominent example is that a (suitably scaled) random variable $X \sim \mathrm{Bin}(n, p)$ is asymptotically normally distributed for $n \to \infty$.
Finally, in the following, it is necessary to argue frequently about the number of unsatisfied
clauses given a specific assignment. For this purpose, we introduce a suitable notation.
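Definition 53. Let $G$ be a CNF formula, $\beta$ an assignment of $G$ and $x \in \mathrm{Vars}(G)$. We set
\begin{align*}
\mathrm{UNSAT}_G(\beta) &:= \{D \in G \mid D\beta = 0\}, \\
\mathrm{UNSAT}_G(\beta, \in x) &:= \{D \in G \mid D\beta = 0 \text{ and } x \in \mathrm{Vars}(D)\}, \\
\mathrm{UNSAT}_G(\beta, \notin x) &:= \{D \in G \mid D\beta = 0 \text{ and } x \notin \mathrm{Vars}(D)\}.
\end{align*}
Using the symbol $\#$ before any of these sets denotes the cardinality of the set.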
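The three sets of Definition 53 translate directly into list comprehensions. The sketch below is illustrative; the function names and the DIMACS-style encoding (a clause is a tuple of nonzero integers, an assignment a dictionary from variable to Boolean) are assumptions.

```python
def is_falsified(clause, beta):
    # A clause is falsified if none of its literals is true under beta.
    return not any(beta[abs(lit)] == (lit > 0) for lit in clause)

def unsat(G, beta):
    """UNSAT_G(beta): clauses of G falsified by beta."""
    return [D for D in G if is_falsified(D, beta)]

def unsat_with(G, beta, x):
    """UNSAT_G(beta, ∈x): falsified clauses that contain variable x."""
    return [D for D in unsat(G, beta) if x in {abs(lit) for lit in D}]

def unsat_without(G, beta, x):
    """UNSAT_G(beta, ∉x): falsified clauses that do not contain variable x."""
    return [D for D in unsat(G, beta) if x not in {abs(lit) for lit in D}]

# Usage on a small hypothetical formula.
G = [(1, 2), (-1, 3), (2, 3)]
beta = {1: False, 2: False, 3: False}
counts = (len(unsat(G, beta)), len(unsat_with(G, beta, 1)), len(unsat_without(G, beta, 1)))
```

By construction, the two restricted sets partition $\mathrm{UNSAT}_G(\beta)$, so their cardinalities add up to $\#\mathrm{UNSAT}_G(\beta)$.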
4.4.1 Distribution Analysis of the Random Variable P
We begin our distribution analysis with the random variable $\mathbf{P}$. The goal of this subsection is to prove that $\mathbf{P}$ is asymptotically Johnson SB distributed (Proposition 58).
The rough idea of the proof is to rewrite $\mathbf{P}$ (up to a constant) as the fraction $\frac{1}{1 + \mathbf{X}}$, see Equation (11), where $\mathbf{X}$ is asymptotically lognormal distributed. Thus, in order to analyze the random variable $\mathbf{P}$, it is helpful to establish a relationship between the lognormal and the Johnson SB distribution.
For that, one technical detail should be clarified beforehand. In the previous sections, we
mainly considered the three-parameter lognormal distribution, i. e., the lognormal distribution,
which is shifted by an additional location parameter. However, for most of this section, we
use the two-parameter lognormal distribution, which can be interpreted as a three-parameter
lognormal distribution with location parameter zero.
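Lemma 54 ([Emp18]). Let $X \sim \mathrm{LogN}(\mu, \sigma^2)$, then for all $c \in \mathbb{R}^+$ we have
\[ \frac{1}{c + X} \sim \mathrm{SB}\Bigl(\gamma = \frac{\mu - \ln c}{\sigma},\ \delta = \frac{1}{\sigma},\ \lambda = \frac{1}{c},\ \xi = 0\Bigr). \]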
Proof. For a proof, refer to Appendix B.2.
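The parameter choice in Lemma 54 can be sanity-checked numerically. The sketch below assumes the standard Johnson SB parametrization, in which $Y \sim \mathrm{SB}(\gamma, \delta, \lambda, \xi)$ means that $\gamma + \delta \ln\frac{Y - \xi}{\xi + \lambda - Y}$ is standard normal; the function names are hypothetical. For $Y = 1/(c + X)$ the transform reduces to $(\mu - \ln X)/\sigma$, which is standard normal whenever $\ln X \sim N(\mu, \sigma^2)$.

```python
import math

def johnson_sb_z(y, gamma, delta, lam, xi):
    """Johnson SB normalizing transform z = gamma + delta * ln((y - xi) / (xi + lam - y))."""
    return gamma + delta * math.log((y - xi) / (xi + lam - y))

def lemma54_parameters(mu, sigma, c):
    """Parameters of the SB distribution of 1 / (c + X) for X ~ LogN(mu, sigma^2)."""
    return dict(gamma=(mu - math.log(c)) / sigma, delta=1 / sigma, lam=1 / c, xi=0.0)

# Usage: compare the SB transform of y = 1 / (c + x) with (mu - ln x) / sigma.
mu, sigma, c = 0.3, 0.8, 2.0
params = lemma54_parameters(mu, sigma, c)
checks = []
for x in (0.1, 1.0, 7.5):
    y = 1.0 / (c + x)
    checks.append((johnson_sb_z(y, **params), (mu - math.log(x)) / sigma))
```

Each pair in `checks` agrees up to floating-point rounding, which is exactly the algebraic identity behind the lemma.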
Furthermore, a connection between binomial and lognormal distributions can also be shown. We need this connection to show that $\mathbf{X}$ is asymptotically lognormal distributed. For this purpose, the following lemma is necessary.
Lemma 55 ([Lac11]). If $Y \sim \mathrm{Bin}(n, p)$, then $\log Y$ is asymptotically normally distributed.
Proof. The proof is given in Appendix B.3.
Together with Definition 15 this lemma immediately yields the following corollary.
Corollary 56. Let $Y$, $Z$ be two independent binomially distributed random variables. Then, $Y$, $Z$, as well as $Y/Z$, are asymptotically lognormal distributed.
Proof. By applying Definition 15 of lognormal distributions, we see that a random variable is lognormal distributed if and only if its logarithm is normally distributed. According to Lemma 55, both $\log Y$ and $\log Z$ are asymptotically normally distributed. Therefore, it follows that $Y$ and $Z$ are both asymptotically lognormal distributed.
It holds that
\[ \log(Y/Z) = \log Y - \log Z. \]
By Lemma 55, we already know that $\log Y$ and $\log Z$ are each asymptotically normally distributed. An extremely valuable property is that the difference of independent normally distributed random variables is again normally distributed. Therefore, we note that $\log Y - \log Z$ is asymptotically normally distributed and, as a result, $Y/Z$ is asymptotically lognormal distributed.
Remark 57. Note that this corollary does not contradict the famous theorem by de Moivre and Laplace, which states that $Y$ is approximately normally distributed. Roughly speaking, this is because both approximations converge towards the same limiting distribution. For further elaboration, refer to [Che17].
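We are now ready to analyze the asymptotic distribution of the random variable $\mathbf{P}$.
Proposition 58. The random variable
\[ \mathbf{P} := \mathbf{P}_{\mathbf{L}}(\mathrm{Flip}_\alpha(c, \beta, x) \mid \mathrm{From}\alpha\mathrm{To}\beta\mathrm{In}c, \mathrm{FirstSel}(c+1), A(\alpha)) \]
is asymptotically Johnson SB distributed.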
Proof. As a first step, it is worthwhile to clarify what exactly is being examined. Therefore, we will initially focus on the conditions in $\mathbf{P}_{\mathbf{L}}(\mathrm{Flip}_\alpha(c, \beta, x) \mid \mathrm{From}\alpha\mathrm{To}\beta\mathrm{In}c, \mathrm{FirstSel}(c+1), A(\alpha))$. The conditions $A(\alpha)$ and $\mathrm{From}\alpha\mathrm{To}\beta\mathrm{In}c$ tell us that SRWA was initialized in $\alpha$ and is in $\beta$ after exactly $c$ flips. Due to the additional condition $\mathrm{FirstSel}(c+1)$, we know that a clause from $\mathbf{L}$ is chosen for the $(c+1)$-st flip. In other words, a clause from the newly added clauses is picked.
Subject to these conditions, we are interested in how likely the algorithm will flip $x$ next. This first requires selecting a clause containing $x$ and then, in the next step, selecting $x$ as the variable to be flipped (cf. Algorithm 4). Since these two events are independent, the overall probability can be expressed as the product of the two individual probabilities.
The probability of selecting a clause containing $x$ is the ratio of the unsatisfied clauses from $\mathbf{L}$ containing $x$ to all unsatisfied clauses from $\mathbf{L}$, as one already knows that a clause from $\mathbf{L}$ is chosen. If a clause $K$ containing $x$ has been selected, then $x$ will be flipped with probability $1/|K|$. In Algorithm 5, we have restricted ourselves to clauses of equal length $\ell$, so $1/|K|$ is constant and is therefore not a random variable. With these preliminary considerations, $\mathbf{P} = \mathbf{P}_{\mathbf{L}}(\mathrm{Flip}_\alpha(c, \beta, x) \mid \mathrm{From}\alpha\mathrm{To}\beta\mathrm{In}c, \mathrm{FirstSel}(c+1), A(\alpha))$ can now be given more precisely:
\begin{align}
\mathbf{P} &= \frac{1}{|K|} \cdot \frac{\#\mathrm{UNSAT}_{\mathbf{L}}(\beta, \in x)}{\#\mathrm{UNSAT}_{\mathbf{L}}(\beta)} \notag \\
&= \frac{1}{\ell} \cdot \frac{\#\mathrm{UNSAT}_{\mathbf{L}}(\beta, \in x)}{\#\mathrm{UNSAT}_{\mathbf{L}}(\beta, \in x) + \#\mathrm{UNSAT}_{\mathbf{L}}(\beta, \notin x)} \notag \\
&= \frac{1}{\ell} \cdot \frac{1}{1 + \frac{\#\mathrm{UNSAT}_{\mathbf{L}}(\beta, \notin x)}{\#\mathrm{UNSAT}_{\mathbf{L}}(\beta, \in x)}}. \tag{11}
\end{align}
We can now utilize knowledge about the model used to generate $\mathbf{L}$ (cf. Algorithm 5). Since each clause from $L$ is included in $\mathbf{L}$ with probability $p$, independently of the other clauses, both $\#\mathrm{UNSAT}_{\mathbf{L}}(\beta, \in x)$ and $\#\mathrm{UNSAT}_{\mathbf{L}}(\beta, \notin x)$ are binomially distributed. Moreover, since no clause can both contain and not contain $x$, the two random variables are independent of each other. As a consequence,
\[ \mathbf{X} := \frac{\#\mathrm{UNSAT}_{\mathbf{L}}(\beta, \notin x)}{\#\mathrm{UNSAT}_{\mathbf{L}}(\beta, \in x)} \]
is asymptotically lognormal distributed according to Corollary 56. Then, Lemma 54 yields the result.
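The algebraic step behind Equation (11) can be illustrated with exact arithmetic. In the sketch below, `n_with_x` and `n_without_x` stand for $\#\mathrm{UNSAT}_{\mathbf{L}}(\beta, \in x)$ and $\#\mathrm{UNSAT}_{\mathbf{L}}(\beta, \notin x)$, and `clause_length` for the common clause length; the function name and the concrete numbers are hypothetical.

```python
from fractions import Fraction

def flip_probability(n_with_x, n_without_x, clause_length):
    """Return (direct, via_ratio): the value of P computed as
    (1/l) * a / (a + b) and as (1/l) * 1 / (1 + b/a)."""
    if n_with_x <= 0:
        raise ValueError("at least one falsified clause must contain x")
    direct = Fraction(n_with_x, clause_length * (n_with_x + n_without_x))
    ratio = Fraction(n_without_x, n_with_x)  # the quantity called X in the proof
    via_ratio = Fraction(1, clause_length) / (1 + ratio)
    return direct, via_ratio

# Usage: 2 falsified clauses contain x, 6 do not, clauses have length 3.
direct, via_ratio = flip_probability(2, 6, 3)
```

Both forms coincide, which is why the distribution of $\mathbf{P}$ is determined by the distribution of the ratio of the two counts.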
4.4.2 Distribution Analysis of the Random Variable Q
The analysis of the random variable $\mathbf{Q}$ can be achieved using similar reasoning as before. Again, we show that $\mathbf{Q}$ is asymptotically Johnson SB distributed.
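Proposition 59. The random variable
\[ \mathbf{Q} := \mathbf{P}_{\mathbf{L}}\Bigl(\mathrm{Sel}(k) = 0 \Bigm| \mathrm{From}\alpha\mathrm{To}\gamma\mathrm{In}(k-1), \bigcap_{j=1}^{k-1} \{\mathrm{Sel}(j) = 0\}, A(\alpha)\Bigr) \]
is asymptotically Johnson SB distributed.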
Proof. Let us again focus on the conditions in the above probability expression. We know that on its random walk from assignment $\alpha$ to $\gamma$, SRWA has never selected a clause of the set $\mathbf{L}$ in the $k-1$ flips this random walk took. By design, the next step of SRWA depends only on the current assignment and not on its past. Thus, the information that it started from $\alpha$ and that no clause from $\mathbf{L}$ was selected in the last $k-1$ flips does not affect the probability of choosing a clause from $\mathbf{L}$ for the current flip. In contrast, the information that the algorithm is in assignment $\gamma$ is of relevance. Therefore, the probability that no clause from $\mathbf{L}$ is chosen is given by the ratio of the original unsatisfied clauses under $\gamma$ to all unsatisfied clauses in the extended instance $F \cup \mathbf{L}$. Based on this reasoning, $\mathbf{Q}$ can be expressed as follows:
\begin{align*}
\mathbf{Q} &= \mathbf{P}_{\mathbf{L}}(\mathrm{Sel}(k) = 0 \mid \mathrm{From}\alpha\mathrm{To}\gamma\mathrm{In}(k-1), A(\alpha)) \\
&= \frac{\#\mathrm{UNSAT}_F(\gamma)}{\#\mathrm{UNSAT}_F(\gamma) + \#\mathrm{UNSAT}_{\mathbf{L}}(\gamma)} = \frac{1}{1 + \frac{\#\mathrm{UNSAT}_{\mathbf{L}}(\gamma)}{\#\mathrm{UNSAT}_F(\gamma)}}.
\end{align*}
Due to the model for generating
L
, we conclude that
#UNSATL
(
γ
) is binomially distributed.
Moreover,
#UNSATL
(
γ
) is asymptotically lognormal distributed because of Corollary 56.
Since lognormal distributions are closed under multiplication of constants,
#UNSATL(γ)
#UNSATF(γ)
is also
asymptotically lognormal distributed. According to Lemma 54 it follows that
Q
is asymptotically
Johnson SB distributed.
4.4.3 Distribution Analysis of the Random Variable R
For the analysis of the random variable $R$, we use another helpful property of lognormal distributions: They are closed under taking reciprocals.
Lemma 60 ([CE88]). If $X \sim \mathrm{LogN}(\mu, \sigma^2)$, then $\frac{1}{X} \sim \mathrm{LogN}(-\mu, \sigma^2)$.
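For completeness, the reciprocal property follows in one line from the definition of the lognormal distribution. This is the standard argument, sketched here rather than quoted from [CE88]; $\mathcal{N}$ denotes the normal distribution.

```latex
X \sim \mathrm{LogN}(\mu,\sigma^2)
\iff \ln X \sim \mathcal{N}(\mu,\sigma^2)
\implies \ln\tfrac{1}{X} = -\ln X \sim \mathcal{N}(-\mu,\sigma^2)
\iff \tfrac{1}{X} \sim \mathrm{LogN}(-\mu,\sigma^2).
```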
With this knowledge, one can now turn to the analysis of $R$ itself.
Proposition 61. The random variable $R := \mathbb{P}_L\bigl[\mathrm{Sel}(c+1)=1 \mid \mathrm{From}\,\alpha\,\mathrm{To}\,\gamma\,\mathrm{In}\,c,\ A(\alpha)\bigr]$ is asymptotically Johnson SB distributed.
Proof. The reasoning is similar to that in Section 4.4.2. The probability that a clause from $L$ is selected is given by the ratio of unsatisfied clauses from $L$ to the total number of unsatisfied clauses:
\[
R = \frac{\#\mathrm{UNSAT}_L(\gamma)}{\#\mathrm{UNSAT}_F(\gamma)+\#\mathrm{UNSAT}_L(\gamma)}
  = \frac{1}{1+\dfrac{\#\mathrm{UNSAT}_F(\gamma)}{\#\mathrm{UNSAT}_L(\gamma)}}.
\]
In our model for adding the new clauses, $\#\mathrm{UNSAT}_L(\gamma)$ has a binomial distribution. Thus $\#\mathrm{UNSAT}_L(\gamma)$ is asymptotically lognormal distributed (cf. Corollary 56). By applying Lemma 60, we see that $\frac{\#\mathrm{UNSAT}_F(\gamma)}{\#\mathrm{UNSAT}_L(\gamma)}$ is also asymptotically lognormal distributed. Lemma 54 then implies that $R$ is asymptotically Johnson SB distributed.
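The two selection probabilities from Sections 4.4.2 and 4.4.3 are complementary for a fixed assignment $\gamma$. The short sketch below makes this concrete with exact rational arithmetic; the function name and the example counts (6 falsified clauses from $F$, 2 from $L$) are illustrative assumptions, and the uniform choice among falsified clauses is the selection rule described in the proofs.

```python
from fractions import Fraction

def selection_probabilities(unsat_f, unsat_l):
    """Return (Q, R) for an assignment with the given falsified-clause counts.

    Q: probability that the selected clause is NOT from L (i.e. it is from F).
    R: probability that the selected clause IS from L.
    Assumes a falsified clause is picked uniformly at random from F ∪ L.
    """
    total = unsat_f + unsat_l
    if total == 0:
        raise ValueError("a satisfying assignment has no falsified clause to select")
    return Fraction(unsat_f, total), Fraction(unsat_l, total)

q, r = selection_probabilities(unsat_f=6, unsat_l=2)
print(q, r, q + r)
```

Both values also agree with the second form used in the proofs, $1/(1+\text{ratio})$, whenever the ratio is defined.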
4.4.4 Concluding Remarks Regarding the Analysis of the Random Variables
In the results of this section, we frequently mentioned that the random variables are asymptotically Johnson SB distributed. Nevertheless, it is crucial to clarify under which conditions the random variables converge to a Johnson SB distribution.
As a starting point, we commonly deal with binomial distributions of the form $\#\mathrm{UNSAT}_L(\gamma) \sim \mathrm{Bin}(n_\gamma, p)$, where $n_\gamma$ is the number of clauses in $L$ that are unsatisfied under $\gamma$. Our results take effect precisely when $n_\gamma$ approaches infinity.
As is common practice, these results can also be applied under weaker conditions. For example, if one has “only” a large number of unsatisfied clauses in $L$, then the random variables are not exactly Johnson SB distributed, but Johnson SB distributions then represent an excellent approximation.
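How quickly a limit statement about $\mathrm{Bin}(n_\gamma, p)$ becomes a good approximation can be illustrated with a related, classical result. The sketch below does not reproduce Corollary 56; it only measures the error of the standard normal approximation (with continuity correction) to a binomial distribution for growing $n$. The value $p = 0.3$ and the sizes 10, 100, and 1000 are arbitrary assumptions.

```python
import math

def binom_cdf_values(n, p):
    """Return [P(X <= k) for k in 0..n] for X ~ Bin(n, p), via a log-space pmf."""
    cdf, acc = [], 0.0
    for k in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        acc += math.exp(log_pmf)
        cdf.append(acc)
    return cdf

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def max_normal_approx_error(n, p):
    """Largest gap between the binomial CDF and its normal approximation."""
    mean, std = n * p, math.sqrt(n * p * (1 - p))
    cdf = binom_cdf_values(n, p)
    return max(abs(cdf[k] - normal_cdf((k + 0.5 - mean) / std)) for k in range(n + 1))

errors = {n: max_normal_approx_error(n, 0.3) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(n, err)
```

The error shrinks as $n$ grows, which is the sense in which a "large but finite" number of unsatisfied clauses already gives a useful approximation.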
4.5 Putting Everything Together
In the last two sections, we observed the distribution of the expected runtime of Alfa. Then, in Section 4.3, we deduced an expression for this distribution consisting of the random variables $P$, $Q$, and $R$. Finally, in Section 4.4, we considered these random variables in detail. We found that each of them asymptotically follows a Johnson SB distribution. In other words, the distribution consists of asymptotically Johnson SB distributed random variables.
Main Result. The expected runtime $\mathrm{ER}_L$ consists of approximately Johnson SB distributed random variables.
While this is not enough to prove the Strong Conjecture (Conjecture 19), the distribution of $\mathrm{ER}_L$ can be treated as Johnson SB for all intents and purposes. Note that this result is slightly weaker than the Strong Conjecture. Therefore, it does not make our conjectures obsolete.
5 Conclusion
It has been shown in [LW20] that adding new, logically equivalent clauses to a formula improves the performance of SLS SAT solvers on average. Building upon this observation, we have shown the following main results in this paper:
1. Treating this process as a random process, the hardness distribution follows a Johnson SB distribution. These distributions can converge to long-tailed distributions.
2. We have proven that restarts are useful to avoid long tails. Thus, the algorithms can be further improved by implementing a restart strategy.
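Why restarts help against long tails can be seen in a toy calculation. The sketch below is not the paper's theorem; it uses the classical identity for a fixed-cutoff restart strategy with independent runs, $\mathbb{E}[T_t] = \mathbb{E}[\min(X,t)] / \Pr[X \le t]$, and a made-up two-point runtime distribution (runtime 1 with probability 0.9, runtime 1000 with probability 0.1).

```python
def expected_runtime_with_restarts(pmf, cutoff):
    """Expected total runtime when every run is aborted after `cutoff` steps.

    pmf maps positive integer runtimes to probabilities; runs are assumed
    to be independent and identically distributed.
    """
    success = sum(pr for t, pr in pmf.items() if t <= cutoff)
    if success == 0:
        raise ValueError("no run can finish within the cutoff")
    truncated_mean = sum(min(t, cutoff) * pr for t, pr in pmf.items())
    return truncated_mean / success

pmf = {1: 0.9, 1000: 0.1}  # hypothetical runtime distribution with a heavy outlier
no_restart = sum(t * pr for t, pr in pmf.items())
with_restart = expected_runtime_with_restarts(pmf, cutoff=1)
print(no_restart, with_restart)
```

In this toy case the expected runtime drops from about 100.9 steps to about 1.11 steps, because the rare long run is cut off instead of being waited out.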
There are several possible starting points for future research. In this work, we studied the
runtime distributions of SAT solvers on modified instances. A major influencing factor for our
work was that different problem formulations are often employed in mixed integer programming.
Therefore, it would be interesting to see if Johnson SB distributions are also the appropriate
descriptive model in that context or if a different family of distributions has to be used. This
would also provide more insight into how exactly modified instances affect an algorithm.
As can be inferred from the theoretical analysis of the runtime behavior, Johnson SB
distributions significantly impact the overall runtime. This suggests that for future studies of
algorithms, one should also consider using the Johnson SB distribution as a suitable model,
alongside established distribution types such as the normal, lognormal, and Weibull distributions.
This is especially true if one knows that there is an upper bound for the runtimes since the
Johnson SB, in contrast to the other distributions mentioned, is able to model finite support.
Such a scenario is given, for example, in the case of a systematic search. As soon as the search
tree has been completely traversed, the algorithm terminates. Thus, the runtime of the algorithm
can be bounded from above by the size of the search tree.
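The finite-support property can be made concrete by sampling. The sketch below assumes Johnson's translation $Z = \gamma + \delta \ln\frac{x-\xi}{\xi+\lambda-x}$ with $Z$ standard normal, inverted to generate samples; the parameter values, including the hypothetical upper runtime bound of 10, are arbitrary.

```python
import math
import random

def sample_johnson_sb(gamma, delta, lam, xi, rng):
    """Draw one Johnson SB variate by inverting Z = gamma + delta*ln((x-xi)/(xi+lam-x))."""
    z = rng.gauss(0.0, 1.0)
    return xi + lam / (1.0 + math.exp(-(z - gamma) / delta))

rng = random.Random(42)  # fixed seed, only for reproducibility of the sketch
xi, lam = 0.0, 10.0      # support is the interval (xi, xi + lam)
samples = [sample_johnson_sb(0.5, 1.2, lam, xi, rng) for _ in range(10_000)]
print(min(samples), max(samples))
```

No sample leaves the interval between $\xi$ and $\xi + \lambda$, whereas normal, lognormal, and Weibull samples have no such upper limit.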
Furthermore, it could be a worthwhile pursuit to theoretically investigate the distribution of other SLS solving paradigms, such as configuration checking solvers, e.g., CCAnr [CLS15], or WalkSAT [SKC94]. For the case of CDCL solvers we have reported results in [KLW22].
Code and Data Availability
The source code realizing the methodologies described in this paper and the corresponding
results are freely available. Therefore, all experiments are fully reproducible. The corresponding
code of Section 3 (visual and statistical evaluations) is available under
https://github.com/FlorianWoerz/Towards-an-Understanding-of-Long-Tailed-Runtimes.
All evaluations take place in the files that can be found under
./evaluation/jupyter SB/evaluate *.ipynb.
A permanent version of this repository has been preserved under
doi:10.5281/zenodo.6945926.
The CNF data (all base instances, resolvents, and modifications) can be found under
doi:10.5281/zenodo.4715893.
The source code of concealSATgen that was used in Section 3.2.1 to generate hidden solution instances is published under [LW21].
Acknowledgments
The authors acknowledge support by the state of Baden-Württemberg through bwHPC. This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation).
The authors would like to express their gratitude to Uwe Schöning for several helpful discussions regarding the theoretical analysis.
Furthermore, the authors are greatly indebted to the anonymous referees of the European
Symposium on Algorithms (ESA ’21) and the ACM Journal of Experimental Algorithmics (ACM
JEA) for several comments that immensely helped to improve the presentation of this paper.
Previous Versions of the Paper
This is the full-length version of the paper in the ACM Journal of Experimental Algorithmics (JEA). It is an extended and enhanced version of the paper “Evidence for Long-Tails in SLS Algorithms” [WL21], presented at ESA 2021. Section 3 describes the results from [WL21], adapted to the Johnson SB distribution, which converges to lognormal distributions. Section 4 and the appendix are entirely new material providing mathematical justification for the fact that Johnson SB distributions underlie the modification model.
References
[AB63]
John Aitchison and J. A. C. Brown. The Lognormal Distribution – with special reference
to its uses in economics. University of Cambridge Department of Applied Economics
Monograph: 5. Cambridge University Press, 2nd edition, 1963.
[AS09]
Gilles Audemard and Laurent Simon. Predicting learnt clauses quality in modern SAT
solvers. In Proceedings of the 21st International Joint Conference on Artificial Intelligence
(IJCAI ’09), pages 399–404, 2009.
[ATC13]
Alejandro Arbelaez, Charlotte Truchet, and Philippe Codognet. Using sequential runtime
distributions for the parallel speedup prediction of SAT local search. Theory and Practice
of Logic Programming, 13(4-5):625–639, 2013.
[Bal15]
Adrian Balint. Original implementation of probSAT, 2015. Available at https://github.com/adrianopolus/probSAT.
[BBJM21]
Fahiem Bacchus, Jeremias Berg, Matti J¨arvisalo, and Ruben Martins, editors. Proceedings
of MaxSAT Evaluation 2021: Solver and Benchmark Descriptions, volume B-2021-2 of
Department of Computer Science Report Series B. University of Helsinki, 2021.
[BC18]
Tom´aˇs Balyo and Luk´as Chrpa. Using algorithm configuration tools to generate hard SAT
benchmarks. In Proceedings of the 11th International Symposium on Combinatorial Search
(SOCS ’18), pages 133–137. AAAI Press, 2018.
[BCC+03]
Armin Biere, Alessandro Cimatti, Edmund M. Clarke, Ofer Strichman, and Yunshan Zhu.
Bounded model checking. Advances in Computers, 58:117–148, 2003.
[BFFH20]
Armin Biere, Katalin Fazekas, Mathias Fleury, and Maximillian Heisinger. CaDiCaL,
Kissat, Paracooba, Plingeling and Treengeling entering the SAT Competition 2020. In
Tomas Balyo, Nils Froleyks, Marijn Heule, Markus Iser, Matti J¨arvisalo, and Martin Suda,
editors, Proceedings of SAT Competition 2020: Solver and Benchmark Descriptions, volume
B-2020-1 of Department of Computer Science Report Series B, pages 51–53. University of
Helsinki, 2020.
[BFH21]
Armin Biere, Mathias Fleury, and Maximillian Heisinger. CaDiCaL, Kissat, Paracooba
entering the SAT Competition 2021. In Tomas Balyo, Nils Froleyks, Marijn Heule, Markus
Iser, Matti J¨arvisalo, and Martin Suda, editors, Proceedings of SAT Competition 2021:
Solver and Benchmark Descriptions, volume B-2021-1 of Department of Computer Science
Report Series B, pages 10–13. University of Helsinki, 2021.
[BHL+02]
Wolfgang Barthel, Alexander K. Hartmann, Michele Leone, Federico Ricci-Tersenghi,
Martin Weigt, and Riccardo Zecchina. Hiding solutions in random satisfiability problems:
A statistical mechanics approach. Physical review letters, 88(188701):1–4, 2002.
[BHvMW09]
Armin Biere, Marijn Heule, Hans van Maaren, and Toby Walsh, editors. Handbook of
Satisfiability, volume 185 of Frontiers in Artificial Intelligence and Applications. IOS Press,
2009.
[Bie14]
Armin Biere. Yet another local search solver and Lingeling and friends entering the SAT
Competition 2014. In Proceedings of SAT Competition 2014: Solver and Benchmark
Descriptions, volume B-2014-2 of Department of Computer Science Report Series B, pages
39–40. University of Helsinki, 2014.
[Bie17]
Armin Biere. CaDiCaL, Lingeling, Plingeling, Treengeling, YalSAT entering the SAT Competition
2017. In Proceedings of SAT Competition 2017: Solver and Benchmark Descriptions, volume
B-2017-1 of Department of Computer Science Report Series B, pages 14–15. University of
Helsinki, 2017.
[BM16]
Adrian Balint and Norbert Manthey. dimetheus. In Proceedings of SAT Competition 2016:
Solver and Benchmark Descriptions, volume B-2016-1 of Department of Computer Science
Report Series B, pages 37–38. University of Helsinki, 2016.
[BS00]
Robert G. Bartle and Donald R. Sherbert. Introduction to real analysis. Wiley New York,
3rd edition, 2000.
[BS12]
Adrian Balint and Uwe Sch¨oning. Choosing probability distributions for stochastic local
search and the role of make versus break. In Proceedings of the 15th International Conference
on Theory and Applications of Satisfiability Testing (SAT ’12), volume 7317 of Lecture
Notes in Computer Science, pages 16–29. Springer, 2012.
[CBRZ01]
Edmund M. Clarke, Armin Biere, Richard Raimi, and Yunshan Zhu. Bounded model
checking using satisfiability solving. Formal Methods in System Design, 19(1):7–34, 2001.
[CE88]
Edwin L. Crow and Kunio Shimizu (Editors). Lognormal Distributions: Theory and
Applications, volume 88 of Statistics: A Series of Textbooks and Monographs. Marcel
Dekker, 1988.
[Che17]
Russell Cheng. Non-standard parametric statistical inference. Oxford University Press,
2017.
[CLS15]
Shaowei Cai, Chuan Luo, and Kaile Su. CCAnr: A configuration checking based local
search solver for non-random satisfiability. In Proceedings of the International Conference
on Theory and Applications of Satisfiability Testing (SAT ’15), volume 9340 of Lecture
Notes in Computer Science, pages 1–8. Springer, 2015.
[Coo71]
Stephen A. Cook. The complexity of theorem-proving procedures. In Proceedings of the
3rd Annual ACM Symposium on Theory of Computing (STOC ’71), pages 151–158, 1971.
[CZ18]
Shaowei Cai and Xindi Zhang. ReasonLS. In Proceedings of SAT Competition 2018: Solver
and Benchmark Descriptions, volume B-2018-1 of Department of Computer Science Report
Series B, pages 52–53. University of Helsinki, 2018.
[CZ19]
Shaowei Cai and Xindi Zhang. Four relaxed CDCL solvers. In Marijn J. H. Heule, Matti
J¨arvisalo, and Martin Suda, editors, Proceedings of SAT Race 2019: Solver and Benchmark
Descriptions, volume B-2019-1 of Department of Computer Science Report Series B, pages
35–36. University of Helsinki, 2019.
[CZ21]
Shaowei Cai and Xindi Zhang. Deep cooperation of CDCL and local search for SAT. In
Proceedings of the 24th International Conference on Theory and Applications of Satisfiability
Testing (SAT ’21), volume 12831 of Lecture Notes in Computer Science, pages 64–81.
Springer, 2021.
[CZFB22]
Shaowei Cai, Xindi Zhang, Mathias Fleury, and Armin Biere. Better decision heuristics in
CDCL through local search and target phases. Journal of Artificial Intelligence Research,
74:1515–1563, 2022.
[Die21]
Maximilian Diemer. Source code of GenFactorSat, 2021. Newest version available at
https://github.com/madiemer/gen-factor-sat/.
[Emp18]
Sextus Empiricus. Reciprocal of shifted lognormal random variable. Cross Validated (Stats
Stack Exchange), 2018. URL: https://stats.stackexchange.com/q/379626.
[EPV08]
Tobias Eibach, Enrico Pilz, and Gunnar V¨olkel. Attacking bivium using SAT solvers. In
Proceedings of the 11th International Conference on Theory and Applications of Satisfiability
Testing (SAT ’08), volume 4966 of Lecture Notes in Computer Science, pages 63–76. Springer,
2008.
[ES04]
Niklas E´en and Niklas S¨orensson. An extensible SAT-solver. In 6th International Conference
on Theory and Applications of Satisfiability Testing (SAT ’03), Selected Revised Papers,
volume 2919 of Lecture Notes in Computer Science, pages 502–518. Springer, 2004.
[FKZ11]
Sergey Foss, Dmitry Korshunov, and Stan Zachary. An Introduction to Heavy-Tailed and Subexponential Distributions, volume 6. Springer, 2011.
[FRV97]
Daniel Frost, Irina Rish, and Llu´ıs Vila. Summarizing CSP hardness with continuous proba-
bility distributions. In Proceedings of the 14th National Conference on Artificial Intelligence
and 9th Innovative Applications of Artificial Intelligence Conference (AAAI/IAAI ’97),
pages 327–333, 1997.
[Gab15]
Oliver Gableske. Source code of kcnfgen (version 1.0), 2015. Retrieved from https://www.gableske.net/downloads/kcnfgen v1.0.tar.gz.
[GS97]
Carla P. Gomes and Bart Selman. Algorithm portfolio design: Theory vs. practice. In
Proceedings of the 13th Conference on Uncertainty in Artificial Intelligence (UAI ’97),
pages 190–197, 1997.
[GSCK00]
Carla P. Gomes, Bart Selman, Nuno Crato, and Henry A. Kautz. Heavy-tailed phenomena
in satisfiability and constraint satisfaction problems. Journal of Automated Reasoning,
24:67–100, 2000. Related version in CP ’97.
[HS98]
Holger H. Hoos and Thomas St¨utzle. Evaluating Las Vegas algorithms: Pitfalls and
remedies. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence
(UAI ’98), pages 238–245, 1998.
[JKB94]
Norman L. Johnson, Samuel Kotz, and Narayanaswamy Balakrishnan. Continuous Uni-
variate Distributions, Volume 1. Wiley Series in Probability and Statistics. John Wiley &
Sons, 2nd edition, 1994.
[Joh49a]
Norman Lloyd Johnson. Bivariate Distributions Based on Simple Translation Systems.
Biometrika, 36(3-4):297–304, 1949.
[Joh49b]
Norman Lloyd Johnson. Systems of Frequency Curves Generated by Methods of Translation.
Biometrika, 36(1-2):149–176, 1949.
[KKS00]
Alexis C. Kaporis, Lefteris M. Kirousis, and Yannis C. Stamatiou. A note on the non-
colorability threshold of a random graph. The Electronic Journal of Combinatorics, 7(1),
2000.
[KLW22]
Tom Krüger, Jan-Hendrik Lorenz, and Florian Wörz. Too much information: Why CDCL solvers need to forget learned clauses. PLOS ONE, 17(8):1–28, 2022. doi:10.1371/journal.pone.0272967.
[Lac11]
John M. Lachin. Biostatistical Methods: The Assessment of Relative Risks. John Wiley &
Sons, 2nd edition, 2011.
[LENV17]
Massimo Lauria, Jan Elffers, Jakob Nordstr¨om, and Marc Vinyals. CNFgen: A generator
of crafted benchmarks. In Proceedings of the 20th International Conference on Theory and
Applications of Satisfiability Testing (SAT ’17), pages 464–473, 2017.
[LM06]
Inˆes Lynce and Jo˜ao Marques-Silva. SAT in bioinformatics: Making the case with haplotype
inference. In Proceedings of the 9th International Conference on Theory and Applications
of Satisfiability Testing (SAT ’06), volume 4121 of Lecture Notes in Computer Science,
pages 136–141. Springer, 2006.
[Lor18]
Jan-Hendrik Lorenz. Runtime distributions and criteria for restarts. In Proceedings of
the 44th International Conference on Current Trends in Theory and Practice of Computer
Science (SOFSEM ’18), pages 493–507. Springer, 2018.
[LRV16]
Eduardo Lalla-Ruiz and Stefan Voss. Improving solver performance through redundancy.
Journal of Systems Science and Systems Engineering, 25(3):303–325, 2016.
[LW20]
Jan-Hendrik Lorenz and Florian W¨orz. On the effect of learned clauses on stochastic local
search. In Proceedings of the 23rd International Conference on Theory and Applications
of Satisfiability Testing (SAT ’20), volume 12178 of Lecture Notes in Computer Science,
pages 89–106. Springer, 2020. Implementation and statistical tests of GapSAT available at
Zenodo doi:10.5281/zenodo.3776052.
[LW21]
Jan-Hendrik Lorenz and Florian W¨orz. Source code of concealSATgen, February 2021.
Newest version available at https://github.com/FlorianWoerz/concealSATgen/.
[Man21]
Norbert Manthey. The MergeSat solver. In Proceedings of the International Conference on
Theory and Applications of Satisfiability Testing (SAT ’21), volume 12831 of Lecture Notes
in Computer Science, pages 387–398. Springer, 2021.
[MMZ+01]
Matthew W. Moskewicz, Conor F. Madigan, Ying Zhao, Lintao Zhang, and Sharad Malik.
Chaff: Engineering an efficient SAT solver. In Proceedings of the 38th Design Automation
Conference (DAC ’01), pages 530–535, 2001.
[MMZ06]
Stephan Mertens, Marc Mézard, and Riccardo Zecchina. Threshold values of random k-SAT from the cavity method. Random Structures & Algorithms, 28(3):340–373, 2006.
[MS96]
Jo˜ao P. Marques-Silva and Karem A. Sakallah. GRASP—a new search algorithm for
satisfiability. In Proceedings of the IEEE/ACM International Conference on Computer-
Aided Design (ICCAD ’96), pages 220–227, 1996.
[MU17]
Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis. Cambridge University Press, 2nd edition, 2017.
[NWZ20]
Jayakrishnan Nair, Adam Wierman, and Bert Zwart. The fundamentals of heavy tails:
Properties, emergence, and estimation. Preprint, California Institute of Technology, 2020.
[RBH03]
Marvin Rausand, Anne Barros, and Arnljot Hoyland. System reliability theory: models,
statistical methods, and applications. John Wiley & Sons, 2nd edition, 2003.
[RF97]
Irina Rish and Daniel Frost. Statistical analysis of backtracking on inconsistent CSPs. In
Proceedings of the 3rd International Conference on Principles and Practice of Constraint
Programming (CP ’97), pages 150–162, 1997.
[RHK02]
Yongshao Ruan, Eric Horvitz, and Henry A. Kautz. Restart policies with dependence
among runs: A dynamic programming approach. In Proceedings of the 8th International
Conference on Principles and Practice of Constraint Programming (CP ’02), volume 2470
of Lecture Notes in Computer Science, pages 573–586. Springer, 2002.
[Rud64]
Walter Rudin. Principles of mathematical analysis, volume 3. McGraw-Hill New York,
1964.
[Sch02]
Uwe Schöning. A probabilistic algorithm for k-SAT based on limited local search and restart.
Algorithmica, 32(4):615–623, 2002. Preliminary version in FOCS ’99.
[SKC94]
Bart Selman, Henry A. Kautz, and Bram Cohen. Noise strategies for improving local search.
In Proceedings of the 12th National Conference on Artificial Intelligence (AAAI ’94), pages
337–343. AAAI Press / The MIT Press, 1994.
[SNC09]
Mate Soos, Karsten Nohl, and Claude Castelluccia. Extending SAT solvers to cryptographic
problems. In Proceedings of the 12th International Conference on Theory and Applications
of Satisfiability Testing (SAT ’09), volume 5584 of Lecture Notes in Computer Science,
pages 244–257. Springer, 2009.
[VLS+15] Gunnar V¨olkel, Ludwig Lausser, Florian Schmid, Johann M. Kraus, and Hans A. Kestler.
Sputnik: ad hoc distributed computation. Bioinformatics, 31(8):1298–1301, 2015.
[Wic17]
Sven Dag Wicksell. On logarithmic correlation with an application to the distribution of
ages at first marriage. Meddelanden fr˚an Lunds Astronomiska Observatorium, 84:1–21,
1917.
[WL21]
Florian W¨orz and Jan-Hendrik Lorenz. Evidence for long-tails in SLS algorithms. In Petra
Mutzel, Rasmus Pagh, and Grzegorz Herman, editors, Proceedings of the 29th Annual
European Symposium on Algorithms (ESA ’21), volume 204 of LIPIcs, pages 82:1–82:16.
Schloss Dagstuhl – Leibniz-Zentrum f¨ur Informatik, 2021. doi:10.4230/LIPIcs.ESA.2021.82.
[WL22]
Florian Wörz and Jan-Hendrik Lorenz. Data set for “Towards an Understanding of Long Tailed Runtimes”, 2022. We have provided all data of this paper. All base instances, resolvents, and modifications can be found under doi:10.5281/zenodo.4715893. Visual and statistical evaluations can be found under https://github.com/FlorianWoerz/Towards-an-Understanding-of-Long-Tailed-Runtimes, where all evaluations take place in the files ./evaluation/jupyter SB/evaluate *.ipynb. A permanent version of this repository has been preserved under doi:10.5281/zenodo.6945926.
A The Finite Case
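Recall that in Equations (5) and (6) on page 26 we have seen that
\[
\mathrm{ER}_L\bigl[A(\alpha)\bigr] = F + I
= \sum_{c\in\mathbb{N}_0} \mathbb{P}_L\bigl[\mathrm{FirstSel}(c+1) \mid A(\alpha)\bigr]\cdot \mathrm{ER}_L\bigl[\mathrm{FirstSel}(c+1),\ A(\alpha)\bigr]
+ \mathbb{P}_L\bigl[\mathrm{NeverSel} \mid A(\alpha)\bigr]\cdot \mathrm{ER}_L\bigl[\mathrm{NeverSel},\ A(\alpha)\bigr].
\]
In Section 4.3.1, we have analyzed the term $I = \mathbb{P}_L\bigl[\mathrm{NeverSel} \mid A(\alpha)\bigr]\cdot \mathrm{ER}_L\bigl[\mathrm{NeverSel},\ A(\alpha)\bigr]$, i.e., the case in which no clause of $L$ ever gets selected. We have seen that we can write
\[
I \cong \sum_{i\in\mathbb{N}_0}\ \prod_{k=1}^{i}\ \sum_{\gamma\colon F\gamma=0} Q,
\]
where $Q$ is asymptotically Johnson SB distributed (cf. Proposition 59). Our aim in this appendix is to obtain a similar representation for
\[
F := \sum_{c\in\mathbb{N}_0} \mathbb{P}_L\bigl[\mathrm{FirstSel}(c+1) \mid A(\alpha)\bigr]\cdot \mathrm{ER}_L\bigl[\mathrm{FirstSel}(c+1),\ A(\alpha)\bigr].
\]
We then proceed to analyze the distribution of the ensuing random variables in Section 4.3.3 of the main body.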
To abbreviate the notation, we use the next definition in the following.
Definition 62. We let $V := \mathrm{Vars}\bigl(\mathrm{UNSAT}_L(\beta)\bigr)$, where $\mathrm{UNSAT}_L(\beta)$ is the set of clauses in $L$ that are falsified by $\beta$.
Let us now state the first intermediate result of this section.
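Proposition 63. It holds that
\begin{align}
F = \sum_{c\in\mathbb{N}_0}\ \sum_{\beta\colon L\beta=0}\
& \mathrm{P}\bigl[\mathrm{From}\,\alpha\,\mathrm{To}\,\beta\,\mathrm{In}\,c \mid \mathrm{FirstSel}(c+1),\ A(\alpha)\bigr]\cdot \mathbb{P}_L\bigl[\mathrm{FirstSel}(c+1) \mid A(\alpha)\bigr] \tag{12}\\
& \cdot \sum_{x\in V} \mathbb{P}_L\bigl[\mathrm{Flip}_\alpha(c,\beta,x) \mid \mathrm{From}\,\alpha\,\mathrm{To}\,\beta\,\mathrm{In}\,c,\ \mathrm{FirstSel}(c+1),\ A(\alpha)\bigr] \tag{13}\\
& \cdot \bigl(c+1+\mathrm{ER}_L\bigl[A(\beta')\bigr]\bigr). \tag{14}
\end{align}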
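Proof. We will apply the LTE twice to analyze the finite case (i.e., terms in $\mathrm{ER}_L\bigl[A(\alpha)\bigr]$ with an eventual selection of a clause from $L$). In the first application of the LTE we condition over the event $\mathrm{From}\,\alpha\,\mathrm{To}\,\beta\,\mathrm{In}\,c$ and obtain
\begin{align*}
F &= \sum_{c\in\mathbb{N}_0} \mathbb{P}_L\bigl[\mathrm{FirstSel}(c+1) \mid A(\alpha)\bigr]\cdot \mathrm{ER}_L\bigl[\mathrm{FirstSel}(c+1),\ A(\alpha)\bigr]\\
&\overset{\text{LTE}}{=} \sum_{c\in\mathbb{N}_0}\ \sum_{\beta\colon L\beta=0} \Bigl\{ \mathrm{P}\bigl[\mathrm{From}\,\alpha\,\mathrm{To}\,\beta\,\mathrm{In}\,c \mid \mathrm{FirstSel}(c+1),\ A(\alpha)\bigr]\cdot \mathbb{P}_L\bigl[\mathrm{FirstSel}(c+1) \mid A(\alpha)\bigr]\\
&\qquad\qquad \cdot\ \mathrm{ER}_L\bigl[\mathrm{FirstSel}(c+1),\ \mathrm{From}\,\alpha\,\mathrm{To}\,\beta\,\mathrm{In}\,c,\ A(\alpha)\bigr] \Bigr\}.
\end{align*}
Due to Example 45 (iii), we only need to consider such $\beta \in \{0,1\}^n$ in the sum emerging from the application of the LTE that falsify a clause in $L$.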
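For the second application of the LTE we condition over the event $\mathrm{Flip}_\alpha(c,\beta,x)$. Then, we can write
\begin{align}
F \overset{\text{LTE}}{=} \sum_{c\in\mathbb{N}_0}\ \sum_{\beta\colon L\beta=0} \Bigl\{
& \mathrm{P}\bigl[\mathrm{From}\,\alpha\,\mathrm{To}\,\beta\,\mathrm{In}\,c \mid \mathrm{FirstSel}(c+1),\ A(\alpha)\bigr]\cdot \mathbb{P}_L\bigl[\mathrm{FirstSel}(c+1) \mid A(\alpha)\bigr] \tag{15}\\
& \cdot \sum_{x\in V} \mathbb{P}_L\bigl[\mathrm{Flip}_\alpha(c,\beta,x) \mid \mathrm{From}\,\alpha\,\mathrm{To}\,\beta\,\mathrm{In}\,c,\ \mathrm{FirstSel}(c+1),\ A(\alpha)\bigr] \tag{16}\\
& \cdot \mathrm{ER}_L\bigl[\mathrm{FirstSel}(c+1),\ \mathrm{From}\,\alpha\,\mathrm{To}\,\beta\,\mathrm{In}\,c,\ \mathrm{Flip}_\alpha(c,\beta,x),\ A(\alpha)\bigr] \Bigr\}. \tag{17}
\end{align}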
One can notice the following two facts: First, the expression $\mathrm{P}\bigl[\mathrm{From}\,\alpha\,\mathrm{To}\,\beta\,\mathrm{In}\,c \mid \mathrm{FirstSel}(c+1),\ A(\alpha)\bigr]$ is no random variable depending on $L$ since the event $\mathrm{FirstSel}(c+1)$ ensures that in none of the first $c$ for-loop iterations, a clause from $L$ gets selected, but only in the $(c+1)$-st iteration. Thus, the probability that the algorithm ends up in $\beta$ after the $c$-th flip does not depend on $L$ since the algorithm behaves as on the original instance $F$.
Secondly, observe that the term appearing in line (17) can be expressed recursively as
\[
\mathrm{ER}_L\bigl[\mathrm{FirstSel}(c+1),\ \mathrm{From}\,\alpha\,\mathrm{To}\,\beta\,\mathrm{In}\,c,\ \mathrm{Flip}_\alpha(c,\beta,x),\ A(\alpha)\bigr] = c+1+\mathrm{ER}_L\bigl[A(\beta')\bigr],
\tag{18}
\]
where $\beta' := \beta[x]$, since $c$ flips were performed to get from the initial assignment $\alpha$ to assignment $\beta$, and one $x$-flip was performed to get from $\beta$ to $\beta'$. The remaining expected runtime is independent of the previous history of the performed random walk and thus only depends on the event $A(\beta')$.
Now, replacing (17) with (18) yields the proposition.
While it turns out in Section 4.4.1 that we are able to analyze the distribution of the random variable
\[
P := \mathbb{P}_L\bigl[\mathrm{Flip}_\alpha(c,\beta,x) \mid \mathrm{From}\,\alpha\,\mathrm{To}\,\beta\,\mathrm{In}\,c,\ \mathrm{FirstSel}(c+1),\ A(\alpha)\bigr],
\]
appearing in line (13), and show that it is asymptotically Johnson SB distributed, the distribution analysis of the more harmless-looking random variable $\mathbb{P}_L\bigl[\mathrm{FirstSel}(c+1) \mid A(\alpha)\bigr]$ appearing in line (12) requires further work.
The main idea in the following steps is to split the information that the event $\mathrm{FirstSel}(c+1)$ contains into several events, apply the chain rule and analyze the resulting cases separately. While in the beginning, this might seem like a lot of unnecessary calculations, after being done, we will reap the fruit of our labor: we will be able to conduct a distribution analysis of all random variables easily. For this purpose, recall Definition 41: Let $\mathrm{Sel}(c)$ be the indicator variable being 1 if and only if a clause in $L$ gets selected in the $c$-th iteration of SRWA.
Following the above-specified plan, we next apply the chain rule to the random variable $\mathbb{P}_L\bigl[\mathrm{FirstSel}(c+1) \mid A(\alpha)\bigr]$. With the notation of Definition 41 in place, and using the fact that
\[
\mathrm{FirstSel}(c+1) = \{\mathrm{Sel}(c+1)=1\} \cap \bigcap_{i=1}^{c}\{\mathrm{Sel}(i)=0\},
\]
one obtains
\begin{align}
\mathbb{P}_L\bigl[\mathrm{FirstSel}(c+1) \mid A(\alpha)\bigr]
&= \mathbb{P}_L\Bigl[\{\mathrm{Sel}(c+1)=1\} \cap \bigcap_{i=1}^{c}\{\mathrm{Sel}(i)=0\} \Bigm| A(\alpha)\Bigr] \notag\\
&= D_1\cdot\prod_{k=1}^{c} D_2, \tag{19}
\end{align}
where
\[
D_1 := \mathbb{P}_L\Bigl[\mathrm{Sel}(c+1)=1 \Bigm| \bigcap_{j=1}^{c}\{\mathrm{Sel}(j)=0\},\ A(\alpha)\Bigr]
\quad\text{and}\quad
D_2 := \mathbb{P}_L\Bigl[\mathrm{Sel}(k)=0 \Bigm| \bigcap_{j=1}^{k-1}\{\mathrm{Sel}(j)=0\},\ A(\alpha)\Bigr].
\]
Here, the nullary intersection $\bigcap_{j=1}^{0}\{\mathrm{Sel}(j)=0\}$ is defined as $\Omega$, i.e., the whole sample space, because the condition of the intersection is a vacuous truth.
We analyze the random variables of the first factor, i.e., $D_1$, and the factors in the big product of line (19), i.e., $D_2$, separately in the cases A.1 and A.2 below. In Section A.3, we finish the case analysis and present the central proposition of this appendix.
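A.1 The Case Sel(c+1) = 1 of the First Factor $D_1$ in Line (19)
With the LTP, conditioning on the event $\mathrm{From}\,\alpha\,\mathrm{To}\,\gamma\,\mathrm{In}\,c$ one obtains
\begin{align*}
D_1 &= \mathbb{P}_L\Bigl[\mathrm{Sel}(c+1)=1 \Bigm| \bigcap_{j=1}^{c}\{\mathrm{Sel}(j)=0\},\ A(\alpha)\Bigr]\\
&= \sum_{\gamma\colon F\gamma=0} \mathbb{P}_L\Bigl[\mathrm{Sel}(c+1)=1 \Bigm| \mathrm{From}\,\alpha\,\mathrm{To}\,\gamma\,\mathrm{In}\,c,\ \bigcap_{j=1}^{c}\{\mathrm{Sel}(j)=0\},\ A(\alpha)\Bigr]
\cdot \mathrm{P}\Bigl[\mathrm{From}\,\alpha\,\mathrm{To}\,\gamma\,\mathrm{In}\,c \Bigm| \bigcap_{j=1}^{c}\{\mathrm{Sel}(j)=0\},\ A(\alpha)\Bigr].
\end{align*}
Note that once again, $\mathrm{P}\bigl[\mathrm{From}\,\alpha\,\mathrm{To}\,\gamma\,\mathrm{In}\,c \mid \bigcap_{j=1}^{c}\{\mathrm{Sel}(j)=0\},\ A(\alpha)\bigr]$ does not depend on $L$ since the event $\bigcap_{j=1}^{c}\{\mathrm{Sel}(j)=0\}$ rules out the usage of any clauses of $L$ on the way from $\alpha$ to $\gamma$. One readily checks that
\begin{align}
\mathbb{P}_L\Bigl[\mathrm{Sel}(c+1)=1 \Bigm| \mathrm{From}\,\alpha\,\mathrm{To}\,\gamma\,\mathrm{In}\,c,\ \bigcap_{j=1}^{c}\{\mathrm{Sel}(j)=0\},\ A(\alpha)\Bigr]
&= \mathbb{P}_L\bigl[\mathrm{Sel}(c+1)=1 \mid \mathrm{From}\,\alpha\,\mathrm{To}\,\gamma\,\mathrm{In}\,c,\ A(\alpha)\bigr] \notag\\
&= R, \tag{20}
\end{align}
since SRWA has the property that past history does not matter, i.e., once the algorithm has arrived in assignment $\gamma$ after $c$ steps, it does not matter how it arrived in this assignment or what clauses it touched on its way.
We will take care of the distribution analysis of the random variable $R$ in Section 4.4.3 (cf. Proposition 61).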
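A.2 The Case Sel(k) = 0 of the Factors in the Big Product of Line (19)
For $k \in \{1,\dots,c\}$ we analyze the factors in the big product of line (19) with the LTP conditioning on $\mathrm{From}\,\alpha\,\mathrm{To}\,\gamma\,\mathrm{In}\,(k-1)$:
\begin{align*}
D_2 &= \mathbb{P}_L\Bigl[\mathrm{Sel}(k)=0 \Bigm| \bigcap_{j=1}^{k-1}\{\mathrm{Sel}(j)=0\},\ A(\alpha)\Bigr]\\
&= \sum_{\gamma\colon F\gamma=0} \mathbb{P}_L\Bigl[\mathrm{Sel}(k)=0 \Bigm| \mathrm{From}\,\alpha\,\mathrm{To}\,\gamma\,\mathrm{In}\,(k-1),\ \bigcap_{j=1}^{k-1}\{\mathrm{Sel}(j)=0\},\ A(\alpha)\Bigr]
\cdot \mathrm{P}\Bigl[\mathrm{From}\,\alpha\,\mathrm{To}\,\gamma\,\mathrm{In}\,(k-1) \Bigm| \bigcap_{j=1}^{k-1}\{\mathrm{Sel}(j)=0\},\ A(\alpha)\Bigr].
\end{align*}
An analogous argumentation as in Section A.1 shows that
\[
\mathrm{P}\Bigl[\mathrm{From}\,\alpha\,\mathrm{To}\,\gamma\,\mathrm{In}\,(k-1) \Bigm| \bigcap_{j=1}^{k-1}\{\mathrm{Sel}(j)=0\},\ A(\alpha)\Bigr]
\]
is no random variable. Similarly, one can use the reasoning brought forward in the last section to see that
\[
\mathbb{P}_L\Bigl[\mathrm{Sel}(k)=0 \Bigm| \mathrm{From}\,\alpha\,\mathrm{To}\,\gamma\,\mathrm{In}\,(k-1),\ \bigcap_{j=1}^{k-1}\{\mathrm{Sel}(j)=0\},\ A(\alpha)\Bigr]
= \mathbb{P}_L\bigl[\mathrm{Sel}(k)=0 \mid \mathrm{From}\,\alpha\,\mathrm{To}\,\gamma\,\mathrm{In}\,(k-1),\ A(\alpha)\bigr].
\]
We will use $Q$ in the following to denote the above specified random variable. Its distribution analysis can be found in Section 4.4.2 (cf. Proposition 59).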
A.3 Putting Together Both Cases
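Having obtained the results of the last two sections, we immediately arrive at the following.
Proposition 64 (Analysis of the finite case $c < \infty$). The term $\mathrm{ER}_L\bigl[A(\alpha)\bigr]$ can be written as
\[
\sum_{c\in\mathbb{N}_0}\ \sum_{\beta\colon L\beta=0} \Biggl( C_3 \Bigl(\sum_{\gamma\colon F\gamma=0} C_4\cdot R\Bigr) \cdot \prod_{k=1}^{c}\Bigl(\sum_{\gamma\colon F\gamma=0} C_4\cdot Q\Bigr) \cdot \sum_{x\in V} P\cdot\bigl(c+1+\mathrm{ER}_L\bigl[A(\beta')\bigr]\bigr) \Biggr) + I,
\]
with the constants being defined by
\begin{align*}
C_3 &:= C_3(\alpha,c,\beta) := \mathrm{P}\bigl[\mathrm{From}\,\alpha\,\mathrm{To}\,\beta\,\mathrm{In}\,c \mid \mathrm{FirstSel}(c+1),\ A(\alpha)\bigr] \in [0,1],\\
C_4 &:= C_4(\alpha,c,\gamma) := \mathrm{P}\Bigl[\mathrm{From}\,\alpha\,\mathrm{To}\,\gamma\,\mathrm{In}\,c \Bigm| \bigcap_{j=1}^{c}\{\mathrm{Sel}(j)=0\},\ A(\alpha)\Bigr] \in [0,1],
\end{align*}
and the random variables $P$, $Q$, and $R$ as in Proposition 51.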
A.4 Allowing Restarts
At the beginning of Section 4.3, we mentioned that our arguments implicitly assume that SRWA does not employ any restarts. However, for the most part, our arguments also apply to the restarted version of the algorithm. The only technical challenge is adapting Equation (18) in the appendix:
\[
\mathrm{ER}_L\bigl[\mathrm{FirstSel}(c+1),\ \mathrm{From}\,\alpha\,\mathrm{To}\,\beta\,\mathrm{In}\,c,\ \mathrm{Flip}_\alpha(c,\beta,x),\ A(\alpha)\bigr] = c+1+\mathrm{ER}_L\bigl[A(\beta')\bigr].
\tag{21}
\]
Here, the difficulty is that after some flips, a restart is due, and thus SRWA does not continue its search in $\beta' := \beta[x]$. Instead, a new random assignment is chosen.
At its core, the reasoning behind Equation (21) is still valid. The left-hand side of Equation (21) uses the condition $\mathrm{FirstSel}(c+1)$, which tells us that the algorithm performed $c+1$ flips without finding a satisfying assignment. The only difference is that SRWA might have performed one (or several) restarts in the meantime. This can be handled by introducing an additional counter $T$ in the expectation $\mathrm{ER}_L$. The counter $T$ is used to count the number of remaining flips until a restart is performed. Incorporating this counter in the argument readily yields the following equations:
\[
\mathrm{ER}_L\bigl[T=0,\ A(\alpha)\bigr] = \sum_{\beta\in\{0,1\}^n} \mathrm{P}\bigl[A(\beta)\bigr]\cdot \mathrm{ER}_L\bigl[T=t_{\mathrm{restart}},\ A(\beta)\bigr],
\]
where $\mathrm{P}\bigl[A(\beta)\bigr] = \frac{1}{2^n}$; and
\[
\mathrm{ER}_L\bigl[\mathrm{FirstSel}(c+1),\ \mathrm{From}\,\alpha\,\mathrm{To}\,\beta\,\mathrm{In}\,c,\ \mathrm{Flip}_\alpha(c,\beta,x),\ T=d,\ A(\alpha)\bigr]
= c+1+\mathrm{ER}_L\bigl[T = d-(c+1) \bmod t_{\mathrm{restart}},\ A(\beta')\bigr].
\]
The rest of the analysis remains unaltered besides the need to add the counter $T$ in each part of the proofs.
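The bookkeeping of the counter $T$ can be sketched in a few lines. This is an illustrative reading, not the authors' implementation: the function name is hypothetical, and the update is interpreted as $(d-(c+1)) \bmod t_{\mathrm{restart}}$, i.e., the modulus is applied to the whole difference.

```python
def remaining_flips_after(d, c, t_restart):
    """Flips left until the next restart after c + 1 further flips.

    d is the number of flips that were left before these c + 1 flips.
    Python's % returns a value in [0, t_restart) for a positive modulus,
    so a negative difference (one or more restarts happened) wraps around.
    """
    return (d - (c + 1)) % t_restart

print(remaining_flips_after(d=5, c=1, t_restart=10))  # no restart in between
print(remaining_flips_after(d=2, c=4, t_restart=10))  # one restart in between
```

A result of 0 corresponds to the case $T = 0$ in the first equation, where a fresh uniformly random assignment is drawn and the counter is reset to $t_{\mathrm{restart}}$.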
B Connections Between Different Distributions
B.1 Embedded Models: Johnson SB Approaches Lognormal
In this section, we provide some details about the embedding property of lognormal and Johnson
SB distributions.
Lemma 65 ([Che17, JKB94]). The lognormal distribution is an embedded model of the SB
distribution.
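Proof (adapted from [Che17]). Reparametrizing with $b = \alpha^{-1}$ and
$\gamma = \delta\bigl(\ln(\alpha^{-1} - a) - \mu\bigr)$ yields
$$
f(x) = \frac{1}{\sqrt{2\pi}} \cdot \frac{\left(\frac{1}{\alpha} - a\right)\delta}{(x - a)\left(\frac{1}{\alpha} - x\right)} \exp\left(-\frac{1}{2}\left[\delta\left(\ln\left(\frac{1}{\alpha} - a\right) - \mu\right) + \delta \ln\frac{x - a}{\frac{1}{\alpha} - x}\right]^2\right).
$$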
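The log-likelihood function $L$ is then given by:
$$
L(a, \delta, \alpha, \mu) = -\frac{1}{2}\ln 2\pi + \ln\delta + \ln\frac{\frac{1}{\alpha} - a}{\frac{1}{\alpha} - x} - \ln(x - a) - \frac{1}{2}\left[\delta\left(\ln\frac{\frac{1}{\alpha} - a}{\frac{1}{\alpha} - x} - \mu\right) + \delta\ln(x - a)\right]^2.
$$
Thus,
$$
\frac{\partial}{\partial\alpha} L(a, \delta, \alpha, \mu) = -\frac{(x - a)\left[\delta^2 \ln\frac{\frac{1}{\alpha} - a}{\frac{1}{\alpha} - x} + \delta^2\ln(x - a) - \delta^2\mu - 1\right]}{(a\alpha - 1)(x\alpha - 1)}.
$$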
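Furthermore, as
$$
\lim_{\alpha \to 0} \ln\frac{\frac{1}{\alpha} - a}{\frac{1}{\alpha} - x} = 0
$$
holds, we have
$$
\lim_{\alpha \to 0} L(a, \delta, \alpha, \mu) = -\frac{1}{2}\ln 2\pi + \ln\delta - \ln(x - a) - \frac{1}{2}\bigl(\delta\ln(x - a) - \delta\mu\bigr)^2
$$
and
$$
\lim_{\alpha \to 0} \frac{\partial}{\partial\alpha} L(a, \delta, \alpha, \mu) = -(x - a)\left[\delta^2\ln(x - a) - \delta^2\mu - 1\right].
$$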
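We expand $L$ as a Taylor series in $\alpha$ around $\alpha = 0$. Taking the limit
$\alpha \to 0$ of this series expansion yields:
$$
\begin{aligned}
\lim_{\alpha \to 0} L(a, \delta, \alpha, \mu)
&= \lim_{\alpha \to 0} \left[ L(a, \delta, 0, \mu) + \frac{\partial}{\partial\alpha} L(a, \delta, 0, \mu) \cdot \alpha + O(\alpha^2) \right] \\
&= -\frac{1}{2}\ln 2\pi + \ln\delta - \ln(x - a) - \frac{1}{2}\delta^2\bigl(\ln(x - a) - \mu\bigr)^2 \\
&\quad + (x - a)\left[1 + \delta^2\bigl(\mu - \ln(x - a)\bigr)\right] \lim_{\alpha \to 0} \alpha + \lim_{\alpha \to 0} O(\alpha^2) \\
&= -\frac{1}{2}\ln 2\pi + \ln\delta - \ln(x - a) - \frac{1}{2}\delta^2\bigl(\ln(x - a) - \mu\bigr)^2.
\end{aligned}
$$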
This is precisely the log-likelihood function of a lognormal distribution. Thus, the Johnson SB
distribution approaches a lognormal distribution for $\alpha \to 0$ (or $b \to \infty$ as well
as $\gamma \to \infty$ in the original parameterization²).
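The limit can also be checked numerically. The sketch below evaluates the reparametrized Johnson SB log-likelihood for decreasing $\alpha$ and compares it with the log-likelihood of a three-parameter lognormal distribution with $\sigma = 1/\delta$. The function names and the chosen values of $x$, $a$, $\delta$, $\mu$ are assumptions made for illustration only.

```python
import math

def sb_loglik(x, a, delta, alpha, mu):
    """Log-likelihood of the reparametrized Johnson SB model (b = 1/alpha)."""
    g = math.log((1 / alpha - a) / (1 / alpha - x))
    return (-0.5 * math.log(2 * math.pi) + math.log(delta) + g
            - math.log(x - a)
            - 0.5 * (delta * (g - mu) + delta * math.log(x - a)) ** 2)

def lognormal_loglik(x, a, delta, mu):
    """Log-likelihood of the lognormal with threshold a and sigma = 1/delta."""
    return (-0.5 * math.log(2 * math.pi) + math.log(delta)
            - math.log(x - a)
            - 0.5 * delta ** 2 * (math.log(x - a) - mu) ** 2)

# Illustrative values; x must satisfy a < x < 1/alpha for every alpha used.
x, a, delta, mu = 3.0, 1.0, 1.5, 1.0
alphas = (1e-1, 1e-2, 1e-3, 1e-4)
gaps = [abs(sb_loglik(x, a, delta, al, mu) - lognormal_loglik(x, a, delta, mu))
        for al in alphas]

for al, gap in zip(alphas, gaps):
    print(f"alpha = {al:g}: |L_SB - L_lognormal| = {gap:.6f}")
```

For small $\alpha$, the gap shrinks roughly linearly in $\alpha$, in line with the first-order term of the Taylor expansion.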
B.2 A Proof of Lemma 54
In this section, we will prove one of our main lemmas. This lemma will be used in the distribution
analysis of the random variables P,Q, and R.
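Lemma 66 (Lemma 54 restated). Let $Y \sim \mathrm{LogN}(\mu, \sigma^2)$, then we have
$$
\frac{1}{c + Y} \sim SB\left(\gamma = \frac{\mu - \ln c}{\sigma},\ \delta = \frac{1}{\sigma},\ \lambda = \frac{1}{c},\ \xi = 0\right)
$$
for all $c \in \mathbb{R}^+$.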
² There is a slight typo in the quoted source [Che17], which states that $b$ should approach zero.
However, following the argument presented in [Che17], it is clear that $b$ has to approach
infinity, since $\alpha$ approaches zero and $b = \frac{1}{\alpha}$.
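Proof of Lemma 54 (adapted from [Emp18]). Let $W \sim \mathcal{N}(\mu, \sigma^2)$. Then
$-W \sim \mathcal{N}(-\mu, \sigma^2)$. By definition, $Y := e^{W} \sim \mathrm{LogN}(\mu, \sigma^2)$.
Define the random variable
$$
X := \frac{1}{c + Y} = \frac{1}{c + e^{W}} = \frac{1}{c + c \cdot e^{W - \ln c}} = \frac{1}{c + c \cdot e^{W'}} = \frac{e^{-W'}}{c\,e^{-W'} + c},
$$
where $W' := W - \ln c \sim \mathcal{N}(\mu - \ln c, \sigma^2)$.
A few simple rearrangements yield
$$
-W' = \log\left(\frac{cX}{1 - cX}\right) = \log\left(\frac{X}{\frac{1}{c} - X}\right).
$$
Letting $\lambda := 1/c$ and $\xi := 0$, we have
$$
-W' = \log\left(\frac{X - \xi}{\xi + \lambda - X}\right).
$$
Define $\delta := \frac{1}{\sigma}$ and $\gamma := \frac{\mu - \ln c}{\sigma}$ and let $Z$ be a
random variable such that
$$
Z = -W' \cdot \delta + \gamma.
$$
Since
$$
-W' \sim \mathcal{N}(\ln c - \mu, \sigma^2),
$$
we have
$$
-W' \cdot \delta \sim \mathcal{N}\left(\frac{\ln c - \mu}{\sigma}, 1\right)
\quad\text{and}\quad
-W' \cdot \delta + \gamma \sim \mathcal{N}(0, 1).
$$
Per definition, $X \sim SB\left(\gamma = \frac{\mu - \ln c}{\sigma},\ \delta = \frac{1}{\sigma},\ \lambda = \frac{1}{c},\ \xi = 0\right)$.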
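The statement of the lemma can be sanity-checked by simulation. The following sketch draws lognormal samples $Y$, forms $X = 1/(c + Y)$, and applies the Johnson SB normalizing transform $Z = \gamma + \delta \ln\frac{X - \xi}{\xi + \lambda - X}$ with the parameters from the lemma; if the lemma holds, $Z$ should look standard normal. The helper name `johnson_sb_z`, the seed, and the values of $\mu$, $\sigma$, $c$ are illustrative assumptions.

```python
import math
import random
import statistics

def johnson_sb_z(x, gamma, delta, lam, xi):
    """Johnson SB transform Z = gamma + delta * ln((x - xi) / (xi + lam - x))."""
    return gamma + delta * math.log((x - xi) / (xi + lam - x))

rng = random.Random(12345)
mu, sigma, c = 0.5, 0.8, 2.0

# Parameters as stated in the lemma.
gamma = (mu - math.log(c)) / sigma
delta = 1 / sigma
lam = 1 / c
xi = 0.0

zs = []
for _ in range(100_000):
    y = rng.lognormvariate(mu, sigma)  # Y ~ LogN(mu, sigma^2)
    x = 1 / (c + y)                    # X = 1 / (c + Y), lies in (0, 1/c)
    zs.append(johnson_sb_z(x, gamma, delta, lam, xi))

z_mean = statistics.fmean(zs)
z_var = statistics.pvariance(zs)
print(f"mean of Z: {z_mean:.4f}, variance of Z: {z_var:.4f}")
```

A sample mean close to 0 and a sample variance close to 1 are consistent with $Z \sim \mathcal{N}(0, 1)$, although they do not by themselves prove normality.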
B.3 A Proof of Lemma 55
Our aim in this section will be to provide a proof of Lemma 55. We have restated the lemma
below for convenience.
Lemma 67 (Lemma 55 restated, [Lac11]). If $Y \sim \mathrm{Bin}(n, p)$, then $\log Y$ is
asymptotically normally distributed.
For the proof of the lemma, we will need to introduce some technical machinery.
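Definition 68 (Order in probability). For a sequence of random variables $(X_n)_{n \in \mathbb{N}}$
and a corresponding sequence of constants $(a_n)_{n \in \mathbb{N}}$, we write
$X_n \in O_p(a_n)$ if for all $\varepsilon > 0$, there exists an $M > 0$ and an $N > 0$ such that
$$
\mathbb{P}\left[\left|\frac{X_n}{a_n}\right| > M\right] < \varepsilon \quad \text{for all } n > N.
$$
Definition 69. An estimator $\hat{\theta}_n$ is a consistent estimator for the parameter $\theta$
if $\hat{\theta}_n \xrightarrow[n \to \infty]{p} \theta$. An estimator $\hat{\theta}_n$ is a
$\sqrt{n}$-consistent estimator for the parameter $\theta$ if
$\hat{\theta}_n - \theta = O_p(1/\sqrt{n})$.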
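Example 70. From the central limit theorem, it follows that for all $\varepsilon > 0$ it holds
$$
\mathbb{P}\bigl[|\bar{X}_n - \mu| > \varepsilon\bigr] = \mathbb{P}\left[\frac{\sqrt{n}\,|\bar{X}_n - \mu|}{\sigma} > \frac{\sqrt{n}\,\varepsilon}{\sigma}\right] = 2\left(1 - \Phi\left(\frac{\sqrt{n}\,\varepsilon}{\sigma}\right)\right) \xrightarrow{n \to \infty} 0,
$$
i.e., $\bar{X}_n \xrightarrow[n \to \infty]{p} \mu$. Thus, $\bar{X}_n$ is a consistent estimator
for $\mu$. Because
$$
\sqrt{n}\,(\bar{X}_n - \mu) \xrightarrow[n \to \infty]{d} \mathcal{N}(0, \sigma^2),
$$
we can also see that $\bar{X}_n$ is a $\sqrt{n}$-consistent estimator of $\mu$.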
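To see how quickly the tail probability in this example decays, the expression $2\bigl(1 - \Phi(\sqrt{n}\,\varepsilon/\sigma)\bigr)$ can be evaluated for growing $n$. The sketch below does this with the standard library only; the function names and the values $\varepsilon = 0.1$, $\sigma = 1$ are assumptions chosen for illustration.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF Phi(z), expressed via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def clt_tail_approx(n, eps, sigma):
    """CLT-based value 2 * (1 - Phi(sqrt(n) * eps / sigma)) for P[|mean - mu| > eps]."""
    return 2 * (1 - std_normal_cdf(math.sqrt(n) * eps / sigma))

sample_sizes = (10, 100, 1000, 10_000)
tails = [clt_tail_approx(n, eps=0.1, sigma=1.0) for n in sample_sizes]

for n, tail in zip(sample_sizes, tails):
    print(f"n = {n:>6}: approximate tail probability = {tail:.6f}")
```

The values decrease towards zero as $n$ grows, which is the convergence in probability stated in the example.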
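Theorem 71 (Slutsky's Theorem). Let $(X_n)_{n \in \mathbb{N}}$, $(A_n)_{n \in \mathbb{N}}$,
$(B_n)_{n \in \mathbb{N}}$ be sequences of random variables. If $X_n$ converges to a random
variable $X$ in distribution, $X_n \xrightarrow[n \to \infty]{d} X$, and
$A_n \xrightarrow[n \to \infty]{p} a$, as well as $B_n \xrightarrow[n \to \infty]{p} b$, then
$$
A_n + B_n X_n \xrightarrow[n \to \infty]{d} a + bX.
$$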
With these definitions in place, we can now prove Lemma 55.
Proof of Lemma 55 (adapted from [Lac11]). Let $n \in \mathbb{N}^+$ and $p \in (0, 1)$. For
$i \in \{1, \dots, n\}$ let $X_i \sim \mathrm{Bern}(p)$. Let $(x_1, \dots, x_n)$ denote a
corresponding sample of these random variables. Then,
$S_n := X_1 + \dots + X_n \sim \mathrm{Bin}(n, p)$. Since
$$
\mathbb{E}[\bar{X}_n] = \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\,\mathbb{E}\left[\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}[X_i] = \frac{1}{n}\,np = p,
$$
a natural moment estimator of $p$ is the sample mean $\bar{X}_n := S_n / n$. In the following,
we let $\bar{x}_n := \frac{1}{n}(x_1 + \dots + x_n)$ denote the concrete sample mean. Applying
Taylor's expansion to the mapping $x \mapsto \log x$, we obtain
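$$
\log \bar{x}_n = \log p + \frac{\mathrm{d} \log p}{\mathrm{d} p}\,(\bar{x}_n - p) + R_2(\xi),
$$
where
$$
R_2(\xi) = \frac{1}{2}\,\frac{\mathrm{d}^2 \log \xi}{\mathrm{d} \xi^2}\,(\bar{x}_n - p)^2
$$
is the Lagrange remainder term and $\xi$ lies between $\bar{x}_n$ and $p$. Since
$\log'(x) = \frac{1}{x}$ and $\log''(x) = -\frac{1}{x^2}$ for all $x > 0$, this yields
$$
\sqrt{n}\,\bigl(\log \bar{x}_n - \log p\bigr) = \sqrt{n}\,\frac{\bar{x}_n - p}{p} - \sqrt{n}\,\frac{(\bar{x}_n - p)^2}{2\xi^2}.
$$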
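It is well known that $\bar{X}_n$ is asymptotically normally distributed (and the same holds for
the concrete realizations $\bar{x}_n$), more precisely
$$
\bar{x}_n \overset{d}{\approx} \mathcal{N}\left(p, \frac{p(1 - p)}{n}\right).
$$
Thus, the first term on the right-hand side, $\sqrt{n}\,\frac{\bar{x}_n - p}{p}$, is likewise
asymptotically normally distributed, more precisely
$$
\sqrt{n}\,\frac{\bar{x}_n - p}{p} \overset{d}{\approx} \mathcal{N}\left(0, \frac{1 - p}{p}\right).
$$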
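In Example 70, we saw that $\bar{x}_n$ is a $\sqrt{n}$-consistent estimator of $p$. Hence,
$(\bar{x}_n - p)^2 = O_p(1/n)$, which tends to zero as $n \to \infty$ faster than $n^{-1/2}$.
Thus,
$$
\sqrt{n}\,R_2(\xi) \xrightarrow[n \to \infty]{p} 0.
$$
Therefore, asymptotically $\sqrt{n}\,\bigl(\log \bar{x}_n - \log p\bigr)$ is the sum of two
random variables, the first one converging in distribution to the normal, the second one
converging in probability to zero (i.e., a constant). From Slutsky's Theorem, it follows that
$$
\sqrt{n}\,\bigl(\log \bar{x}_n - \log p\bigr) \xrightarrow[n \to \infty]{d} \mathcal{N}\left(0, \frac{1 - p}{p}\right).
$$
Hence,
$$
\log \bar{x}_n \overset{d}{\approx} \mathcal{N}\left(\log p, \frac{1 - p}{np}\right).
$$
In other words, $\bar{x}_n$ is asymptotically lognormally distributed. Since $n\bar{x}_n$ is
binomially distributed and $\log(n\bar{x}_n) = \log n + \log \bar{x}_n$ differs from
$\log \bar{x}_n$ only by the constant $\log n$, the claim follows.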
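The asymptotic variance $\frac{1 - p}{p}$ obtained in the proof can be compared against a simulation. The sketch below repeatedly draws $S_n \sim \mathrm{Bin}(n, p)$ as a sum of Bernoulli trials and records $\sqrt{n}\,(\log(S_n / n) - \log p)$. The function name, the seed, and the values of $n$, $p$, and the number of trials are illustrative assumptions; draws with $S_n = 0$ are skipped because the logarithm is undefined there.

```python
import math
import random
import statistics

def scaled_log_means(n, p, trials, rng):
    """Samples of sqrt(n) * (log(S_n / n) - log(p)) with S_n ~ Bin(n, p)."""
    out = []
    for _ in range(trials):
        s = sum(rng.random() < p for _ in range(n))  # sum of n Bernoulli(p) draws
        if s == 0:
            continue  # log(0) is undefined; extremely rare when n * p is large
        out.append(math.sqrt(n) * (math.log(s / n) - math.log(p)))
    return out

rng = random.Random(2022)
p = 0.3
samples = scaled_log_means(n=1000, p=p, trials=2000, rng=rng)

empirical_mean = statistics.fmean(samples)
empirical_var = statistics.pvariance(samples)
limit_var = (1 - p) / p  # asymptotic variance from the proof

print(f"empirical mean: {empirical_mean:.3f} (limit: 0)")
print(f"empirical variance: {empirical_var:.3f} (limit: {limit_var:.3f})")
```

For moderate $n$, the empirical variance is expected to be close to, but not exactly equal to, the limiting value, since the remainder term only vanishes asymptotically.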