
Simple Random Logic Programs

Gayathri Namasivayam and Mirosław Truszczyński

Department of Computer Science, University of Kentucky, Lexington, KY

40506-0046, USA

Abstract. We consider random logic programs with two-literal rules

and study their properties. In particular, we obtain results on the proba-

bility that random “sparse” and “dense” programs with two-literal rules

have answer sets. We study experimentally how hard it is to compute

answer sets of such programs. For programs that are constraint-free and

purely negative we show that the easy-hard-easy pattern emerges. We

provide arguments to explain that behavior. We also show that the hard-

ness of programs from the hard region grows quickly with the number of

atoms. Our results point to the importance of purely negative constraint-

free programs for the development of ASP solvers.

1 Introduction

The availability of a simple model of a random CNF theory was one of the

enabling factors behind the development of fast satisfiability testing programs

— SAT solvers. The model constrains the length of each clause to a fixed integer,

say k, and classifies k-CNF theories according to their density, that is, the ratio

of the number of clauses to the number of atoms. k-CNF theories with low

densities have few clauses relative to the number of atoms. Thus, most of them

have many solutions, and solutions are easy to find. k-CNF theories with high

densities have many clauses relative to the number of atoms. Thus, most of

them are unsatisfiable. Moreover, due to the abundance of clauses, proofs of

contradiction are easy to find. As theories in low- and high-density regions are

“easy,” they played essentially no role in the development of SAT solvers.

There is, however, a narrow range of densities “in between,” called the phase

transition, where random k-CNF theories change rapidly from most being satis-

fiable to most being unsatisfiable. Somewhere in that narrow range is a value d

such that random k-CNF theories with density d are satisfiable with probability
1/2. The problem of determining that value has received much attention.

For instance, for 3-CNF theories, the phase-transition density was found exper-

imentally to be about 4.25 [1]. A paper by Achlioptas discusses recent progress

on the problem, including some lower and upper bounds on the phase transition

value [2]. A key property of 3-CNF theories from the phase transition region

is that they are hard (the low- and high-density regions also contain challenging
theories, but they are relatively rare [3]). Thus, we have the easy-hard-easy
difficulty pattern as a function of density. Moreover, deciding satisfiability of
theories from the

hard region is very hard indeed! Designing solvers that could solve random un-

satisfiable 3-CNF theories with 700 atoms generated from the phase-transition

region was one of the grand challenges for SAT research posed by Selman, Kautz

and McAllester [4]. It resulted in major advances in SAT solver technology.

As in the case of the SAT research, work on random logic programs is likely

to lead to new insights into the properties of answer sets of programs, and lead

to advances in ASP solvers — software for computing them. Yet, the question

of models of random logic programs has received little attention so far, with the

work of Zhao and Lin [5] being a notable exception. Our objective is to propose

a model of simple random logic programs and investigate its properties.

As in SAT, we consider random programs with rules of the same length. For

the present study, we further restrict our attention to programs with two-literal

rules. These programs are simple, which facilitates theoretical studies. But de-

spite their simplicity, they are of considerable interest. First, every problem in

NP can be reduced in polynomial time to the problem of deciding the existence of

an answer set of a program of that type [6]. Second, many problems of interest

have a simple encoding in terms of such programs [7]. We study experimen-

tally and analytically properties of programs with two-literal rules. We obtain

results on the probability that random programs with two-literal rules, both

“sparse” and “dense,” have answer sets. We study experimentally how hard it is

to compute answer sets of such programs. We show that for programs that are

constraint-free and purely negative the easy-hard-easy pattern emerges. We give

arguments to explain that phenomenon, and show that the hardness of programs

from the hard region grows quickly with the number of atoms. Our results point

to the importance of constraint-free purely negative programs for the develop-

ment of ASP solvers, as they can serve as useful benchmarks when developing

good search heuristics. However, unlike in the case of SAT, depending on the

parameters of the model, we either do not observe the phase transition or, when

we do, it is gradual, not sudden.

Even relatively small programs from the hard region are very hard for the

current generation of ASP solvers. Interestingly, that observation may also have

implications for the design of SAT solvers. If P is a purely negative program,

answer sets of P are models of its completion comp(P), a certain propositional

theory [8]. For programs with two-literal rules the completion is (essentially) a

CNF theory. Our experiments showed that these theories are very hard for the

present-day SAT solvers, despite the fact that most of their clauses are binary.

2 Preliminaries

Logic programs consist of rules, that is, of expressions of the form

$a \leftarrow b_1, \ldots, b_m, \mathit{not}\ c_1, \ldots, \mathit{not}\ c_n$    (1)

and

$\leftarrow b_1, \ldots, b_m, \mathit{not}\ c_1, \ldots, \mathit{not}\ c_n,$    (2)

where a, $b_i$ and $c_j$ are atoms. Rules (1) are called definite, and rules (2) are
called constraints. A rule is proper if no atom occurs in it more than once. A rule is
k-regular if it consists of k literals (that is, it is a definite rule with k−1 literals
in the body, or a constraint with k literals in the body).

If r is a rule of type (1) or (2), the expression
$b_1, \ldots, b_m, \mathit{not}\ c_1, \ldots, \mathit{not}\ c_n$
(understood as the conjunction of its literals) is the body of r. We denote it by
$bd(r)$. The set of atoms $\{b_1, \ldots, b_m\}$ is the positive body of r, denoted $bd^+(r)$,
and the set of atoms $\{c_1, \ldots, c_n\}$ is the negative body of r, denoted $bd^-(r)$.
In addition, the head of r, $hd(r)$, is defined as a, if r is of type (1), and as
⊥, otherwise. A program P is constraint-free if it contains no constraints. A
program P is purely negative if for every non-constraint rule $r \in P$, $bd^+(r) = \emptyset$.

A set of atoms M is an answer set of a program P if it is the least model
of the reduct of P with respect to M, that is, the program $P^M$ obtained by
removing from P every rule r such that $M \cap bd^-(r) \neq \emptyset$, and by removing all
literals of the form not c from all other rules of P.
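The definition above can be checked mechanically. The following is a minimal Python sketch, not taken from the paper: the rule encoding (a triple of head, positive body and negative body, with `None` as the head of a constraint) and the function names are assumptions made for illustration. It builds the reduct with respect to a candidate set M, computes the least model of the resulting negation-free program, and compares it with M.

```python
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple

# Hypothetical encoding: (head, positive_body, negative_body);
# head is None for a constraint.
Rule = Tuple[Optional[str], FrozenSet[str], FrozenSet[str]]
ReducedRule = Tuple[Optional[str], FrozenSet[str]]


def reduct(program: Iterable[Rule], m: Set[str]) -> List[ReducedRule]:
    """Drop every rule whose negative body meets m; strip 'not c' from the rest."""
    return [(head, pos) for head, pos, neg in program if not (neg & m)]


def least_model(rules: List[ReducedRule]) -> Optional[Set[str]]:
    """Least model of a negation-free program, or None if a constraint fires."""
    model: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, pos in rules:
            if pos <= model:
                if head is None:
                    return None  # a constraint body is satisfied: no model
                if head not in model:
                    model.add(head)
                    changed = True
    return model


def is_answer_set(program: Iterable[Rule], m: Set[str]) -> bool:
    """True if m equals the least model of the reduct of the program w.r.t. m."""
    lm = least_model(reduct(list(program), m))
    return lm is not None and lm == m


# The program {a <- not b, b <- not a} used as a small illustration.
example = [
    ("a", frozenset(), frozenset({"b"})),
    ("b", frozenset(), frozenset({"a"})),
]
```

For the illustrative program `example`, the sets {a} and {b} pass the check, while ∅ and {a, b} do not.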

Computing answer sets of propositional logic programs is the basic reasoning

task of answer-set programming, and fast programs that can do that, known as

answer-set programming solvers (ASP solvers, for short), have been developed in
recent years [9–13].

3 2-Regular Programs

We assume a fixed set of atoms $At = \{a_1, a_2, \ldots\}$. There are five types of
2-regular rules: a ← not b; a ← b; ← not a, not b; ← a, not b; ← a, b.
Accordingly, we define five classes of programs, $mR^-_n$, $mR^+_n$, $mC^-_n$, $mC^\pm_n$,
and $mC^+_n$, with atoms from $At_n = \{a_1, \ldots, a_n\}$ and consisting of m proper rules
of each of these types, respectively. Without the reference to m, the notation refers
to all programs with n atoms of the corresponding type (for instance, $R^+_n$ stands for
the class of all programs over $At_n$ consisting of proper rules of the form a ← b).
The maximum value of m for which $mR^-_n$, $mR^+_n$ and $mC^\pm_n$ are not empty is
$n(n-1)$. The maximum value of m for which $mC^-_n$ and $mC^+_n$ are not empty is
$n(n-1)/2$. Let $0 \le m_1, m_2, c_2 \le n(n-1)$ and $0 \le c_1, c_3 \le n(n-1)/2$ be integers.
By $[m_1R^- + m_2R^+ + c_1C^- + c_2C^\pm + c_3C^+]_n$ we denote the class of programs
P that are unions of programs from the corresponding classes. We refer to these
programs as components of P. If any of the integers $m_i$ and $c_i$ is 0, we omit
the corresponding term from the notation. When we do not specify the numbers
of rules, we allow any programs from the corresponding classes. For instance,
$[R^- + R^+ + C^- + C^\pm + C^+]_n$ stands for the class of all proper programs with
atoms from $At_n$.

Given integers n and m, it is easy to generate uniformly at random programs
from each class $mR^-_n$, $mR^+_n$, $mC^-_n$, $mC^\pm_n$, and $mC^+_n$. For instance, a random
program from $mR^-_n$ can be viewed as the result of a process in which we start
with the empty program on the set of atoms $At_n$ and then, in each step, we
add a randomly generated proper rule of the form a ← not b, with repeating
rules discarded, until m rules are generated. This approach generalizes easily
to programs from other classes we consider, in particular, to programs from
$[m_1R^- + m_2R^+ + c_1C^- + c_2C^\pm + c_3C^+]_n$. Our goal is to study properties of
such random programs.
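The generation process for programs with m proper rules of the form a ← not b can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' generator: the function name, the encoding of a rule a ← not b as a pair `(a, b)` of atom indices, and the optional `seed` parameter are all hypothetical.

```python
import random
from typing import List, Optional, Tuple


def random_purely_negative_program(
    n: int, m: int, seed: Optional[int] = None
) -> List[Tuple[int, int]]:
    """Sample m distinct proper rules 'a <- not b' over atoms 1..n.

    A pair (a, b) stands for the rule a <- not b (an assumed encoding).
    """
    if not 0 <= m <= n * (n - 1):
        raise ValueError("m must lie between 0 and n(n-1)")
    rng = random.Random(seed)
    rules = set()
    while len(rules) < m:
        # Two distinct atoms, so the rule is proper.
        a, b = rng.sample(range(1, n + 1), 2)
        # Adding a rule that is already present leaves the set unchanged,
        # which corresponds to discarding repeated rules.
        rules.add((a, b))
    return sorted(rules)


program = random_purely_negative_program(n=10, m=30, seed=0)
```

Because every ordered pair of distinct atoms is equally likely at each step and repeats are discarded, each set of m distinct rules is produced with the same probability. For m close to n(n−1) the loop discards many repeats, and sampling m pairs directly from the list of all n(n−1) pairs would be the more economical choice.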

We start with a general observation. If $P \in [m_2R^+ + c_1C^- + c_2C^\pm + c_3C^+]_n$
($m_1 = 0$), then either P has no answer sets (if $c_1 \neq 0$) or, otherwise, ∅ is the
unique answer set of P. Thus, in order to obtain interesting classes of programs, we
must have $m_1 > 0$. In other words, programs from $R^-_n$ (proper, purely negative and
constraint-free) play a key role.

4 The Probability of a Program to Have an Answer Set

We study first the probability that a random program in the class
$[m_1R^- + m_2R^+ + c_1C^- + c_2C^\pm + c_3C^+]_n$ has an answer set. In several places we
use results from random graph theory [14,15]. To this end, we exploit graphs
associated with programs. Namely, with a program $P \in [R^- + R^+ + C^\pm]_n$ we associate
a directed graph D(P) with the vertex set $At_n$, in which a is connected to b with a
directed edge (a,b) if b ← not a, b ← a or ← b, not a is a rule of P. For
$P \in [R^- + R^+]_n$, the graph D(P) is known as the dependency graph of a program.
Similarly, with a program $P \in [R^- + R^+ + C^- + C^\pm + C^+]_n$ we associate an
undirected graph G(P) with the vertex set $At_n$, in which a is connected to b with an
undirected edge {a,b} if a and b appear together in a rule of P. If
$P \in [R^- + R^+ + C^\pm]_n$, then D(P) may have fewer edges than P has rules (the rules
a ← not b, a ← b and ← a, not b determine the same edge). A similar observation
holds for G(P).
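A short Python sketch may help make the two graphs concrete. The encoding is an assumption made for illustration: a rule is a triple `(kind, x, y)`, where kind `"R-"` stands for x ← not y, `"R+"` for x ← y, `"C-"` for ← not x, not y, `"C±"` for ← x, not y, and `"C+"` for ← x, y. Under the edge definition given above, a rule of kind `"R-"`, `"R+"` or `"C±"` contributes the directed edge (y, x).

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical encoding of a 2-regular rule: (kind, x, y).
Rule = Tuple[str, str, str]


def directed_graph(program: Iterable[Rule]) -> Set[Tuple[str, str]]:
    """Edge set of D(P) for a program built from R-, R+ and C± rules."""
    return {(y, x) for kind, x, y in program if kind in ("R-", "R+", "C±")}


def undirected_graph(program: Iterable[Rule]) -> Set[FrozenSet[str]]:
    """Edge set of G(P): atoms that appear together in some rule."""
    return {frozenset((x, y)) for _, x, y in program}


# Three different rules over the atoms a and b:
#   a <- not b,   a <- b,   <- a, not b
three_rules = [("R-", "a", "b"), ("R+", "a", "b"), ("C±", "a", "b")]
d_edges = directed_graph(three_rules)
g_edges = undirected_graph(three_rules)
```

Here `d_edges` contains the single edge (b, a) and `g_edges` the single edge {a, b}, although the program has three rules, which illustrates why the graphs may have fewer edges than the program has rules.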

These graphs contain much information about the underlying programs. For
instance, it is well known that if $P \in [R^- + R^+]_n$ and D(P) has no cycles then P
has a unique answer set. Similarly, if
$P \in [m_1R^- + m_2R^+ + c_1C^- + c_2C^\pm + c_3C^+]_n$
and M is an answer set of P then M is an independent set in the graph $G(P_1)$,
where $P_1$ is the component of P from $m_1R^-_n$.

We denote by $AS^+$ the class of all programs over At that have answer sets.
We write $Prob(P \in AS^+)$ for the probability that a random program P from one
of the classes defined above has an answer set. That probability depends on n
(technically, it also depends on the numbers of rules of particular types but,
whenever it is so, the relevant numbers are themselves expressed as functions
of n). We are interested in understanding the behavior of $Prob(P \in AS^+)$ for
random programs P from the class $[R^- + R^+ + C^- + C^\pm + C^+]_n$ (or one of its
subclasses). More specifically, we will investigate $Prob(P \in AS^+)$ as n grows to
infinity. If $Prob(P \in AS^+) \to 1$ as $n \to \infty$, we say that P asymptotically almost
surely, or a.a.s. for short, has answer sets. If $Prob(P \in AS^+) \to 0$ as $n \to \infty$, we
say that P a.a.s. has no answer sets.

To ground our results in some intuitions, we first consider the probability that
a program from $mR^-_{150}$ has an answer set as a function of the density d = m/150
(or equivalently, the number of rules m). The graphs, shown in Figure 1, were
obtained experimentally. For each value of d, we generated 1000 programs from the
set $mR^-_{150}$, where m = 150d. The graph on the left shows the behavior of the
probability across the entire range of d. The graph on the right shows in more
detail the behavior for small densities.

[Figure 1: two plots of the probability of having an answer set (vertical axis, 0.6 to 1) against the density d. Panel (a) covers d from 0 to 160; panel (b) covers d from 0 to 10.]

Fig. 1. The probability that a program from $mR^-_{150}$ (m = 150d) has an answer set, as a function of d.

The graphs show that the probability is close to 1 for very small densities,

then drops rapidly. After reaching a low point (around 0.6, in this case), it starts

getting larger again and, eventually, reaches 1. We also note that the rate of

drop is faster than the rate of ascent. We will now present theoretical results

that quantify some of these observations. Our results concern the two extremes:
programs of low density and programs of high density.
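The kind of experiment described above can be imitated at a much smaller scale. The Python sketch below is an illustration only and is not the setup used for Figure 1: it checks all subsets of atoms by brute force, which is feasible only for small n (far below the 150 atoms used in the paper), and the function names and the encoding of a rule a ← not b as a pair `(a, b)` are assumptions. It relies on the following consequence of the definition of an answer set: for a constraint-free, purely negative program with two-literal rules, M is an answer set exactly when M equals the set of heads a of rules a ← not b with b not in M.

```python
import random
from typing import List, Optional, Tuple


def has_answer_set(n: int, rules: List[Tuple[int, int]]) -> bool:
    """Brute-force test over all subsets of atoms 0..n-1, encoded as bit masks."""
    for mask in range(1 << n):
        derived = 0
        for a, b in rules:          # the rule a <- not b
            if not (mask >> b) & 1:  # b is not in M, so the rule survives the reduct
                derived |= 1 << a
        if derived == mask:          # least model of the reduct equals M
            return True
    return False


def estimate_probability(
    n: int, m: int, trials: int, seed: Optional[int] = None
) -> float:
    """Fraction of sampled programs with m proper rules 'a <- not b' that have an answer set."""
    rng = random.Random(seed)
    all_rules = [(a, b) for a in range(n) for b in range(n) if a != b]
    hits = 0
    for _ in range(trials):
        rules = rng.sample(all_rules, m)  # uniform choice of m distinct rules
        hits += has_answer_set(n, rules)
    return hits / trials


# Estimates for a small number of atoms, indexed by the density d = m / n.
n_atoms = 8
estimates = {m / n_atoms: estimate_probability(n_atoms, m, trials=50, seed=0)
             for m in range(0, n_atoms * (n_atoms - 1) + 1, 8)}
```

With so few atoms the resulting curve should not be expected to match Figure 1 quantitatively; the sketch only shows how such estimates can be computed. Two sanity checks follow directly from the definitions: the empty program always has the answer set ∅, and the program containing all n(n−1) rules has an answer set consisting of all atoms but one.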

We start with programs of low density and assume first that they do not

have constraints. In this case, the results do not depend on whether or not we

allow positive rules.

Theorem 1. If $m_1 + m_2 = o(n)$ and $P \in [m_1R^- + m_2R^+]_n$, then P a.a.s. has
a unique answer set.

Proof. (Sketch) Let P be a random program from $[m_1R^- + m_2R^+]_n$. The directed
graph D(P) can be viewed as a random directed graph with n vertices and
$m' = o(n)$ edges ($m' \le m_1 + m_2$, as different rules in P may map onto the same
edge). Thus, D(P) a.a.s. has no directed cycles (the claim can be derived from
the property of random undirected graphs: a random undirected graph with n
vertices and o(n) edges a.a.s. has no cycles [15]). It follows that P a.a.s. has a
unique answer set. □
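The last step of the proof uses the fact that a program without constraints whose graph D(P) is acyclic has a unique answer set. A minimal Python sketch of how that answer set can be computed is given below. It is an illustration, not part of the paper: the rule encoding `(kind, head, body)` with kinds `"R-"` (head ← not body) and `"R+"` (head ← body) and the function name are assumptions, and the standard-library module `graphlib` requires Python 3.9 or later. Atoms are processed so that the body atom of every rule is decided before its head.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Iterable, List, Set, Tuple

# Hypothetical encoding: ("R-", a, b) is a <- not b; ("R+", a, b) is a <- b.
Rule = Tuple[str, str, str]


def unique_answer_set(atoms: Iterable[str], program: List[Rule]) -> Set[str]:
    """Answer set of a program in [R- + R+] whose graph D(P) has no cycles.

    Raises graphlib.CycleError if D(P) contains a cycle.
    """
    sorter = TopologicalSorter({atom: set() for atom in atoms})
    for _, head, body in program:
        sorter.add(head, body)  # the head must be decided after its body atom
    true_atoms: Set[str] = set()
    for atom in sorter.static_order():
        for kind, head, body in program:
            if head != atom:
                continue
            fires = (body in true_atoms) if kind == "R+" else (body not in true_atoms)
            if fires:
                true_atoms.add(atom)
                break
    return true_atoms


# a <- not b and c <- a: b has no rule, so a is derived, and then c.
result = unique_answer_set(["a", "b", "c"], [("R-", "a", "b"), ("R+", "c", "a")])
```

In this example `result` is the set {a, c}. For a program such as {a ← not b, b ← not a}, whose graph has a cycle, the function raises `CycleError` instead of returning a set.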

If there are constraints in the program, the situation changes. Even a single
constraint of the form ← not a, not b renders a sparse random program
inconsistent.

Corollary 1. If $c_1 \ge 1$, $m_1 + m_2 = o(n)$, and P is a random program from
$[m_1R^- + m_2R^+ + c_1C^-]_n$, then P a.a.s. has no answer sets.

Proof. Let P be a random program from $[m_1R^- + m_2R^+ + c_1C^-]_n$. Then,
$P = P_1 \cup P_2$, where $P_1$ is a random program from $[m_1R^- + m_2R^+]_n$ and $P_2$ is a
random program from $c_1C^-_n$. By Theorem 1, $P_1$ a.a.s. has a unique answer set,
say M. Since $P_1$ has o(n) non-constraint rules, |M| = o(n). The probability that