Polynomial Time Approximation Schemes for Dense

Instances of NP-Hard Problems

Sanjeev Arora∗

David Karger†

Marek Karpinski‡

Abstract

We present a unified framework for designing polynomial time approximation schemes

(PTASs) for “dense” instances of many NP-hard optimization problems, including

maximum cut, graph bisection, graph separation, minimum k-way cut with and with-

out specified terminals, and maximum 3-satisfiability. By dense graphs we mean graphs

with minimum degree Ω(n), although our algorithms solve most of these problems so

long as the average degree is Ω(n). Denseness for non-graph problems is defined sim-

ilarly. The unified framework begins with the idea of exhaustive sampling: picking a

small random set of vertices, guessing where they go on the optimum solution, and then

using their placement to determine the placement of everything else. The approach

then develops into a PTAS for approximating certain smooth integer programs where

the objective function and the constraints are “dense” polynomials of constant degree.

1 Introduction

Approximation algorithms, whenever they can be found, are a way to deal with the NP-

hardness of optimization problems. Ideally, they should run in polynomial time and have a

small approximation ratio, which is the worst-case ratio of the value of the solution returned

by the algorithm to the value of the optimum solution. (This definition is for minimization

problems; for maximization problems the ratio is inverted so that it is always at least 1.)

Optimization problems seem to be approximable to different degrees (see [Shm94] for

a survey). We know that unless P = NP, problems such as CLIQUE [FGL+91, AS92,

ALM+92] and CHROMATIC NUMBER [LY93] cannot be approximated even to within

a factor of n^δ in polynomial time, for some fixed δ > 0. (More recently, Håstad [H96]
showed that if SAT does not have randomized polynomial-time algorithms, then CLIQUE
cannot be approximated to within a factor n^{1−δ}, for every δ > 0.) Other problems, such as

those related to graph separators [LR88], have algorithms with approximation ratios close

to O(log n). No inapproximability results are known for them. MAX-SNP problems, such

as MAX-CUT or MAX-3-SAT, can be approximated to within some fixed constant factor

but no better [PY91, ALM+92]. Only a few problems, such as KNAPSACK [S75] and BIN

PACKING [FL81], are known to have polynomial time approximation schemes (PTASs).

A PTAS is an algorithm that, for every fixed ǫ > 0, achieves an approximation ratio

of 1 + ǫ in time that is polynomial in the input size (but could grow very fast with 1/ǫ,

such as O(n^{1/ǫ})). A PTAS thus allows us to trade off approximation accuracy for running

time. (In the previous definition, if the running time is polynomial in 1/ǫ as well, then we

∗Princeton University. Supported by an NSF CAREER Award NSF CCR-9502747 and an Alfred Sloan
Fellowship. Email: arora@cs.princeton.edu. URL: http://www.cs.princeton.edu/~arora
†MIT Laboratory for Computer Science. Work done at AT&T Bell Laboratories. Email:
karger@lcs.mit.edu. URL: http://theory.lcs.mit.edu/~karger
‡University of Bonn. Supported in part by the International Computer Science Institute, Berkeley,
California, by the DFG Grant KA 673/4-1, ESPRIT BR Grants 7097, 21726, and EC-US 030, and by the
Max-Planck Research Prize. Email: marek@cs.bonn.edu. URL: http://theory.cs.uni-bonn.de/~marek

have a fully polynomial time approximation scheme. These are known to exist for a few

problems [GJ79, DFK91, KLM89].)

Unfortunately, recent results ([ALM+92]) show that if P ≠ NP, then PTASs do not
exist for many NP-hard problems. In particular, this is true for every MAX-SNP-hard

problem. (The class of MAX-SNP-hard problems includes VERTEX COVER, MAX-3-

SAT, MAX-CUT, METRIC TSP, MULTIWAY CUTS, and many others [PY91].)

Note that the inapproximability results mentioned above, like all NP-hardness results,

rule out approximation only on worst case instances of the problem. They do not rule out

the existence of algorithms (heuristics) that do well on most instances. This observation is

the starting point of our research.

This paper gives PTASs for a large class of NP-hard problems when the problem instance

is dense. The definition of denseness depends on the problem; for example, dense graphs

are graphs with Ω(n^2) edges while dense 3-SAT formulas are those with Ω(n^3) clauses.

Note that almost all graphs (asymptotically speaking) are dense, as are almost all 3-SAT

instances.

The design of many (but not all) of our PTASs relies on the observation that many

optimization problems can be phrased as nonlinear integer programs in which the objective

function is a low degree polynomial. For dense problems, the optimum value of the objective

function is quite large. Thus, to achieve a multiplicative approximation for dense instances it

suffices to achieve an additive approximation for the nonlinear integer programming problem.

We design such an additive approximation algorithm (see Sections 1.2 and 1.3).

In the remainder of this introduction, we describe the problems we solve and sketch our

solution techniques.

1.1 Applicable Optimization Problems

We now describe the problems to which we apply our techniques. The reader will note

that the problems span a broad spectrum. Some, like maximum cut and maximum k-

satisfiability, are MAX-SNP-complete. Thus they do not have PTASs on general (that is,

non-dense) instances [ALM+92], but they can all be approximated within some constant

factor in polynomial time [PY91]. Others, like graph bisection and separation, do not

currently have any algorithms with approximation ratios better than O(log n) on general

instances. It is an open problem whether they are hard to approximate.

MAX-CUT: Partition the vertices of an undirected graph into two groups so as to maximize

the number of edges with exactly one endpoint in each group. An algorithm in [GW94]

achieves an approximation ratio of 1.13 for the problem.

MAX-DICUT: The directed version of the MAX-CUT problem. An algorithm in [FG95]

(improving [GW94]) achieves an approximation ratio of 1.15.

MAX-HYPERCUT(d): A generalization of MAX-CUT to hypergraphs of dimension d; an

edge is considered cut if it has at least one endpoint on each side.

SEPARATOR: Partition the vertices of a graph into two groups, each with at least 1/3 of

the vertices, so as to minimize the number of edges with exactly one endpoint in each

group. An algorithm in [LR88] achieves approximation ratio O(log n) (though it may

produce a 1/4 : 3/4 separator instead of a 1/3 : 2/3 separator).

BISECTION: Partition the vertices of an undirected graph into two equal halves so as to

minimize the number of edges with exactly one endpoint in each half. Some algorithms,

for example using eigenvalues [B87] or simulated annealing [JS93], do well on certain

random graphs (see also [BCLS84]). For worst-case inputs, no true approximation

algorithms are known. Some known “bisection approximators” (based upon techniques

of [LR88]) yield separators whose capacity is within a factor O(log n) of the capacity

of the optimum bisection. Our algorithm gives an exact bisection.

MAX-k-SAT: Given a conjunctive normal form formula with k variables per clause, find a

true-false assignment to the variables making the maximum possible number of clauses

true. An algorithm in [Yan92] achieves an approximation ratio of 1.33 for the problem.

Improved algorithms have since been given for MAX-3-SAT; an approximation ratio

of 8/7 + ǫ is achieved in [KZ97]. It is also known that achieving an approximation ratio
of 8/7 − ǫ is NP-hard [H97].

MIN-k-CUT: Given an n-vertex graph, remove a minimum set of edges that partitions the

graph into k connected components. Saran and Vazirani [SV91] gave a (2 − 2/k)-

approximation algorithm. The variant k-terminal cut problem specifies k vertices that

must all be disconnected from each other by the removal of the edges. Dahlhaus et
al. [DJP+94] give an algorithm that achieves an approximation ratio of (2 − 2/k).

DENSE-k-SUBGRAPH: Given a graph, find a subset of k vertices that induces a graph

with the most edges. This problem was studied in [KP93], where an approximation

algorithm with ratio n^{7/18} was presented.

3-COLORING: Color the vertices of a graph with 3 colors such that no two adjacent vertices

have the same color. Application of our techniques to this problem yields a result

already shown in [Edw86].

MAX-SNP: The class of “constant factor approximable” problems defined in [PY91].

We now define a natural notion of dense instance for each problem. (The definition of

dense instances for the class MAX-SNP appears in Section 4.4, where we also describe a

PTAS for them.) Exact optimization on dense instances is NP-hard for all problems except

MIN-k-CUT and 3-COLORING (see Section 7).

Definition 1.1. A graph is δ-dense if it has δn^2/2 edges. It is everywhere-δ-dense if the
minimum degree is δn. We abbreviate Ω(1)-dense as dense and everywhere-Ω(1)-dense as
everywhere-dense. Thus everywhere-dense implies dense, but not vice versa. Similarly, a
k-SAT formula is dense if it has Ω(n^k) clauses, and a dimension-d hypergraph is dense if it
has Ω(n^d) edges.

Theorem 1.2. There are PTASs for everywhere-dense instances of BISECTION and SEP-

ARATOR.

Theorem 1.3. There are PTASs for dense instances of the following problems: MAX-CUT,

MAX-DICUT, MAX-k-SAT for any constant k, DENSE-k-SUBGRAPH for k = Ω(n),

MAX-HYPERCUT(d) for constant d, and any MAX-SNP problem.

Theorem 1.4. Exact algorithms exist on everywhere-dense graphs for MIN-k-CUT when

k = o(n) and for 3-COLORING.

Remark. The 3-COLORING result is not new—see [Edw86]—but does follow from a direct

application of our general technique.

1.2 Our Methods

Our heuristics are based upon two main ideas: exhaustive sampling and its use in approxima-

tion of polynomial integer programs. We discuss these ideas in the context of the maximum

cut problem (MAX-CUT), one of the problems to which our techniques apply.

The goal in MAX-CUT is to partition the vertices of a given graph into two groups—

called the left and right sides—so as to maximize the number of edges with an endpoint

on each side. Notice that in the optimum solution, every vertex has the majority of its

neighbors on the opposite side of the partition (else, it would improve the cut to move the

vertex to the other side). Thus, if we knew where the neighbors of each vertex lay, we would

know where to put each vertex.

This argument may seem circular, but the circularity can be broken (in dense graphs) by

the following exhaustive sampling approach. Suppose we take a sample of O(log n) vertices.

By exhaustively trying all possible (i.e., 2^{O(log n)}) placements of the vertices in the sample,
we will eventually guess where each vertex of the sample belongs in the optimum cut. Since
there are 2^{O(log n)} = n^{O(1)} possibilities, we can afford to try every one of them in polynomial

time. So assume we have partitioned the sampled vertices correctly according to the optimal

cut. Now consider some unsampled vertex. If a majority of its neighbors belong on the right

side of the optimum cut, then we expect that a majority of its sampled neighbors will be

from the right side of the optimum cut. This suggests the following scheme: put each

unsampled vertex on the side opposite the majority of its sampled neighbors.
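The scheme just described can be sketched in code. This is an illustrative sketch, not the algorithm as analyzed in the paper: the function name, the tie-breaking rule, and the test graph are our own choices. It tries every placement of a small random sample and places each remaining vertex opposite the majority of its sampled neighbors.

```python
import itertools
import random

def sampled_maxcut(adj, sample_size, seed=0):
    """Exhaustive-sampling heuristic for MAX-CUT (illustrative sketch).

    adj: dict mapping each vertex to the set of its neighbors.
    Tries all 2^sample_size placements of a random sample; for each
    guess, places every other vertex opposite the majority of its
    sampled neighbors, and returns the best cut value found.
    """
    rng = random.Random(seed)
    vertices = sorted(adj)
    sample = rng.sample(vertices, sample_size)
    rest = [v for v in vertices if v not in sample]
    best = 0
    for placement in itertools.product((0, 1), repeat=sample_size):
        side = dict(zip(sample, placement))   # guessed optimal sides
        for v in rest:
            nbrs = [u for u in sample if u in adj[v]]
            on_right = sum(side[u] for u in nbrs)
            # place v opposite the majority of its sampled neighbors
            side[v] = 0 if 2 * on_right >= len(nbrs) else 1
        cut = sum(1 for v in adj for u in adj[v] if v < u and side[v] != side[u])
        best = max(best, cut)
    return best

# Complete bipartite graph K_{4,4}: the optimum cut contains all 16 edges,
# and some placement of the sample recovers it.
adj = {v: ({4, 5, 6, 7} if v < 4 else {0, 1, 2, 3}) for v in range(8)}
assert sampled_maxcut(adj, sample_size=4) == 16
```

With a sample of size Θ(log n) the outer loop has n^{O(1)} iterations, matching the counting argument above.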

This scheme works well for vertices whose opposite-side neighbors significantly outnumber

their same-side neighbors. More problematic are vertices for which the neighbors split evenly

between the two sides; sampling will not typically give us confidence about the majority

side. This brings us to the second major idea of our paper: approximately solving nonlinear

integer programs. Define a variable x_i for vertex i which is 1 if the vertex is on the right

side of a cut and 0 otherwise. Then finding a maximum cut corresponds to finding a 0-1

assignment that maximizes the following function (where E is the edge set of the graph):

∑_i x_i ∑_{j : (i,j) ∈ E} (1 − x_j).

To see this, note that the formula counts, for every vertex i on the right side of the cut, the

number of edges leading from it to neighbors j on the left side of the cut.
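This identity can be checked by brute force on a small graph; the snippet below (our own illustration, with hypothetical function names) compares the polynomial with a direct count of cut edges over all 0-1 assignments.

```python
from itertools import product

def poly_value(edges, x):
    # ∑_i x_i · ∑_{(i,j)∈E} (1 − x_j), summed edge by edge: the term
    # contributed by edge {i,j} is x_i(1 − x_j) + x_j(1 − x_i)
    return sum(x[i] * (1 - x[j]) + x[j] * (1 - x[i]) for i, j in edges)

def cut_value(edges, x):
    # direct count of edges with exactly one endpoint on each side
    return sum(1 for i, j in edges if x[i] != x[j])

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
for x in product((0, 1), repeat=4):
    assert poly_value(edges, x) == cut_value(edges, x)
```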

Of course, solving even linear integer programs is NP-complete, and the above program

involves a quadratic objective function. However, we show that exhaustive sampling can

be used to approximately maximize such functions, and more generally, to approximately

solve integer programs in which the constraints and objective involve low-degree polynomials

instead of linear functions. We state our main approximation result in the next section.

Most of our approximation algorithms are more properly viewed as algorithms that com-

pute an additive approximation (see Section 1.3). For example, our algorithm for MAX-CUT

computes, for every graph, a cut of capacity at least OPT − ǫn^2, where ǫ is any desired
constant. Such an approximation is also within a small multiplicative factor of the optimum
in a dense graph (i.e., one with Ω(n^2) edges) because OPT = Ω(n^2) for such graphs (this

follows from our earlier observation that in an optimum cut, every vertex is on the oppo-

site side from a majority of its neighbors). However, our algorithms for BISECTION and

SEPARATOR are not additive approximation algorithms.

1.3 Smooth Integer Programs

Many existing approximation algorithms for NP-hard problems are based on representation
of the problem as a linear integer program (LIP). All problems in NP have such formulations
since solving LIPs is NP-complete. Many problems have natural formulations as LIPs that

give insight into their structure and lead to approximation algorithms. But formulation

as a LIP masks the true nature of many other problems—in particular, an approximately

optimum solution to the LIP may correspond to a far from optimum solution to the original

optimization problem. A more natural formulation involves a nonlinear integer program in
which the objective function is a low degree polynomial. Most of our PTASs for dense

problems are derived from such a representation. We solve a general class of optimization

problems in which the objective function and the constraints are polynomials.

Definition 1.5. A polynomial integer program (or PIP) is of the form

maximize    p_0(x_1, …, x_n)                          (1)
subject to  l_i ≤ p_i(x) ≤ u_i     (i = 1, …, m)      (2)
            x_i ∈ {0,1}            (∀i ≤ n)           (3)

where p_0, …, p_m are polynomials. (The PIP could involve minimization instead of maximization.)

When all pihave degree at most d, we call this program a degree d PIP.

Since they subsume integer programs, it is clear that solving PIPs is NP-hard. One

might hope to define a more tractable class by eliminating the integrality requirement, but

this accomplishes nothing since the 0–1 integrality of x_i can be enforced by the quadratic
constraint x_i(x_i − 1) = 0.

We now describe a class of PIPs that are easy to approximate.

Definition 1.6. An n-variate, degree-d polynomial has smoothness c if the absolute value

of each coefficient of each degree i monomial (term) is at most c · n^{d−i}.

Remark. The reader should think of c and d as being fixed constants, and n as being

allowed to grow. We call the resulting family of polynomials a family of c-smooth degree d

polynomials.

Definition 1.7. A c-smooth degree-d PIP is a PIP in which the objective function and

constraints are c-smooth polynomials with degree at most d.

Smooth integer programs can represent many combinatorial problems in a natural way.

We illustrate this using MAX-CUT as an example.

Example 1.8. A degree-2 polynomial with smoothness c has the form

∑ a_{ij} x_i x_j + ∑ b_i x_i + d

where each |a_{ij}| ≤ c, |b_i| ≤ cn, and |d| ≤ cn^2.

We show how to formulate MAX-CUT on the graph G = (V,E) using a 2-smooth integer

program. Define a variable x_i for each vertex v_i. Then, assign 0,1 values to the x_i (in other
words, find a cut) so as to maximize

∑_{{i,j} ∈ E} (x_i(1 − x_j) + x_j(1 − x_i)).

(Notice that an edge {i,j} contributes 1 to the sum when x_i ≠ x_j and 0 otherwise. Thus
the sum is equal to the cut value.) Expanding the sum shows that the coefficients of the
quadratic terms are 0 and −2 while the coefficients of the linear terms are at most n.
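The coefficient claim is easy to verify mechanically. The sketch below (our own illustration; the expansion helper and the choice c = 2 are assumptions for the demonstration) expands the objective into monomial coefficients and checks the 2-smoothness bounds.

```python
from collections import Counter

def expand_cut_objective(edges):
    """Expand ∑_{{i,j}∈E} (x_i(1−x_j) + x_j(1−x_i)) into monomials.

    Each edge contributes x_i + x_j − 2·x_i·x_j, so every quadratic
    coefficient is 0 or −2 and the linear coefficient of x_i is the
    degree of vertex i.
    """
    quad, lin = Counter(), Counter()
    for i, j in edges:
        lin[i] += 1
        lin[j] += 1
        quad[frozenset((i, j))] -= 2
    return quad, lin

n = 6
edges = [(i, j) for i in range(n) for j in range(i + 1, n)]  # complete graph
quad, lin = expand_cut_objective(edges)
c = 2  # claimed smoothness constant
assert all(abs(a) <= c for a in quad.values())     # |a_{ij}| ≤ c
assert all(abs(b) <= c * n for b in lin.values())  # |b_i| ≤ cn
```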

Now we can state our general theorem about approximation of smooth integer programs.

Definition 1.9. A solution a is said to satisfy a constraint l_i ≤ p_i(x) ≤ u_i to within an
additive error δ if l_i − δ ≤ p_i(a) ≤ u_i + δ.

Theorem 1.10. There is a randomized polynomial-time algorithm that approximately solves

smooth PIPs, in the following sense. Given a feasible c-smooth degree d PIP with n variables,

objective function p_0 and K constraints, the algorithm finds a 0/1 solution z satisfying

p_0(z_1, …, z_n) ≥ OPT − ǫn^d,

where OPT is the optimum of the PIP. This solution z also satisfies each degree d′ constraint
to within an additive factor of ǫn^{d′} for d′ > 1, and satisfies each linear constraint to within
an additive error of O(ǫ√(n log n)).

The running time of the algorithm is O((dKn^d)^t), where t = 4c^2e^2d^2/ǫ^2 = O(1/ǫ^2).

The algorithm can be derandomized (i.e., made deterministic), while increasing the run-

ning time by only a polynomial factor.

Remark. The statement of the theorem can be stronger: the input PIP does not need to be

feasible, but only approximately feasible (that is, there must be a point that satisfies each
degree d′ constraint to within an additive error ǫ′n^{d′} for some ǫ′ < ǫ/2.)

Theorem 1.10 underlies almost all of our PTASs. However, our PTASs for BISECTION

and MIN-k-CUT require some additional ideas since an additive approximation is not good

enough.

1.4 Related Work

There are known examples of problems that are seemingly easier to approximate in dense

graphs than in general graphs. For instance, in graphs with degree exceeding n/2, one can

find Hamiltonian cycles [P76] and approximate the number of perfect matchings [JS89]. In

everywhere-dense graphs it is easy to approximate the values of the Tutte polynomial and,

as a special case, to estimate the reliability of a network [AFW94].

Independent of our work, Fernandez de la Vega [FdlV94] developed a PTAS for everywhere-

dense MAX-CUT using exhaustive sampling principles similar to ours. After sampling and

guessing, Fernandez de la Vega replaces our linear-programming solution with a greedy

placement procedure. While this procedure is significantly simpler than ours (at least con-

ceptually; the running time is still dominated by the exhaustive sampling procedure and is

similar to ours), it is not obvious (and is an interesting open question) whether the procedure

can generalize to the other problems we have listed.

Edwards [Edw86] shows how to 3-color a 3-colorable everywhere-dense graph in polyno-

mial time. Our sampling approach gives an alternative algorithm.

A random-sampling based approach related to ours also appears in [KP92].

In the last section of the paper (Section 8) we describe some results related to our work

that have been discovered since the conference presentation of the current paper.

1.5 Paper Organization

In Section 2 we give details of the main ideas of our approach, exhaustive sampling and

transforming polynomial constraints into linear constraints, already sketched in Sections 1.2

and 1.3. We continue to use MAX-CUT as a motivating example.

In Section 3 we generalize these techniques to derive our (additive error) approximation

algorithm for any smooth polynomial integer program (PIP). In Section 4, we use these PIPs

to approximate most of the problems listed in Section 1.1. Solving BISECTION and SEPA-

RATOR requires some additional exhaustive sampling ideas that are explained in Section 5.

In Section 6, we describe some problems that can be solved purely by exhaustive sampling,

with no recourse to PIPs. Finally, in Section 7, we confirm that all of the problems we

are approximating are still NP-complete when restricted to dense instances, demonstrating

that an exact solution is unlikely.

2 Our Techniques: An overview

In this section we introduce our two major techniques, exhaustive sampling and reducing

degree d constraints to linear constraints (in an approximate sense) to give a PTAS for dense

MAX-CUT.

First we express MAX-CUT as a quadratic integer program as follows. Let the 0/1 vector
x be the characteristic vector of a cut, i.e., x_i = 1 iff i is on the right side. Let N(i) be the
set of neighbors of vertex i, and let

r_i(x) = ∑_{j ∈ N(i)} (1 − x_j)

be the linear function denoting the number of neighbors of i that are on the left side of
cut x. Then

MAX-CUT = max ∑_i x_i · r_i(x)
          s.t. x_i ∈ {0,1}   ∀i

The above formulation looks a lot like an integer linear program, for which numerous ap-

proximation techniques are known. Unfortunately, the “coefficients” r_i(x) in the objective
function are not constants—the program is actually a quadratic program. However, exhaus-

tive sampling lets us estimate the value these coefficients have in the optimum solution. We

arrive at our approximation in three steps:

1. Using exhaustive sampling, we estimate the values of r_i(a) at the optimum solution
a = (a_1, …, a_n). See Section 2.1.

2. We replace each function r_i by the corresponding estimate of r_i(a). This turns the
quadratic program into a linear (integer) program. We show that the optimum of this
linear integer program is near-optimum for the quadratic program. See Section 2.2.

3. We solve the fractional relaxation of the linear integer program, and use randomized

rounding to convert the solution into an integer one. We show that this does not

dramatically change the solution value. See Section 2.3.

A comment on notation: Throughout the paper, we will use a ± b, where a and b are
real numbers, as a shorthand for the interval [a − b, a + b].

2.1 Estimating Coefficients

We begin by using exhaustive sampling to estimate the values r_i(a) at the optimum solution
a. Let a be the optimum cut and let ρ_i = r_i(a). Then a is the solution to the following

integer linear program:

MAX-CUT = max ∑_i x_i · ρ_i
          s.t. x_i ∈ {0,1}   ∀i
               r_i(x) = ρ_i   ∀i

Of course, the usefulness of this observation is unclear, since we don’t know the values
ρ_i. We show, however, that it is possible to compute an additive error estimate of the ρ_i in
polynomial time, in other words, a set of numbers e_i such that

ρ_i − ǫn ≤ e_i ≤ ρ_i + ǫn    ∀i.    (4)

This can be done using our exhaustive sampling approach. We take a random sample of
O(log n) vertices. By exhaustively trying all possible (i.e., 2^{O(log n)} = n^{O(1)}) placements of
the vertices in the sample, we will eventually guess a placement wherein each vertex is placed
as it would be in the optimum cut. So we can assume that we have “guessed” the values a_j
in the optimum cut for all the sampled vertices j. Now consider any unsampled vertex i. If
it has |N(i)| = Ω(n) neighbors, then with high probability, Θ(log n) of its neighbors are part

of the random sample (high probability means probability 1 − n^{−Ω(1)}). A moment’s thought
shows that these neighbors form a uniform random sample from N(i). Hence by examining
the fraction of sampled neighbors on the left hand side of the cut (namely, neighbors for
which a_j = 0) we can obtain an estimate of r_i(a)/|N(i)| that is correct to within a small
additive factor. This follows from the following sampling lemma.

Lemma 2.1 (Sampling Lemma). Let (a_i) be a sequence of n numbers, each with absolute
value at most M. Let f > 0 be any number. If we choose a multiset of s = g log n of the a_i
at random (with replacement), then their sum q satisfies

qn/s ∈ ∑_i a_i ± nM √(f/g)

with probability at least 1 − n^{−f}.

Proof. Let s = g log n. For j = 1, …, s let X_j be the random variable denoting the number
picked in the j-th draw. Since the numbers are drawn with replacement, the values X_j are
identically distributed, and

E[X_j] = (1/n) ∑_{j=1}^{n} a_j.

Since |X_j| ≤ M by hypothesis, the lemma now follows from the standard Hoeffding bound
[H64].
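A quick numerical experiment illustrates the lemma. This is our own demonstration; the particular values of n, g, and f are arbitrary choices, and the helper name is hypothetical.

```python
import math
import random

def scaled_sample_sum(values, g, rng):
    """Draw s = g·log n values with replacement and rescale their
    sum by n/s, as in the Sampling Lemma."""
    n = len(values)
    s = max(1, round(g * math.log(n)))
    q = sum(rng.choice(values) for _ in range(s))
    return q * n / s

rng = random.Random(1)
n, M, g, f = 2000, 1, 50, 2
values = [rng.randint(0, M) for _ in range(n)]   # 0/1 values, so M = 1
estimate = scaled_sample_sum(values, g, rng)
error_bound = n * M * math.sqrt(f / g)           # nM·√(f/g)
assert abs(estimate - sum(values)) <= error_bound
```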

For MAX-CUT, our goal is to estimate the values ρ_i of the form ∑_{j ∈ N(i)} (1 − a_j). First,
if any |N(i)| ≤ ǫn/10, we use the estimate 0 for ρ_i. To estimate ρ_i for the remaining i,
we randomly choose (with replacement) g log n indices with g = O(1/ǫ^3), and “guess” their
values by exhaustively trying all possibilities. Since each a_j = 0 or 1, we can take M = 1
in the Sampling Lemma. The Sampling Lemma shows that for each i, the probability is at
least 1 − 1/n^2 that the following happen: (i) Ω(log n/ǫ^2) of the sampled vertices lie in N(i)
(note that the conditional distribution of these vertices is uniform) and (ii) the estimate for
ρ_i produced using this sample is accurate to within ǫn.

2.2 Linearizing the Quadratic Integer Program

Now we use the coefficient estimates to define an integer linear program whose solutions are

near-optima for MAX-CUT. Given the estimates e_i just derived for the values ρ_i, we write

the following linear integer program. Note that it is feasible, since a satisfies it (assuming

our sampling step in the previous section worked).

NEW-OPT = max ∑_i x_i · e_i
          s.t. x_i ∈ {0,1}                    ∀i
               e_i − ǫn ≤ r_i(x) ≤ e_i + ǫn   ∀i    (5)

(Recall that each r_i(x) is a linear function of x, so the given constraints are linear con-

straints.)

We claim that the optimum solution z to this integer linear program is near-optimum for

MAX-CUT. This can be seen as follows:

∑ z_i r_i(z) ≥ ∑ z_i (e_i − ǫn)          By the constraints (5)
            ≥ ∑ z_i e_i − ǫn^2
            ≥ ∑ a_i e_i − ǫn^2            Since z is integer optimum
            ≥ ∑ a_i (ρ_i − ǫn) − ǫn^2     from (4)
            ≥ ∑ a_i ρ_i − 2ǫn^2
            = MAX-CUT − 2ǫn^2

In other words, the optimum of the integer program is a near-optimum solution to MAX-

CUT.

2.3 Approximating the Linear Integer Program

Of course, we cannot exactly solve the integer linear program just derived. But we can

compute an approximate solution to it as follows. We relax the integrality constraints,

allowing 0 ≤ x_i ≤ 1. We use linear programming to obtain the fractional optimum, say

y ∈ [0,1]n, and then use randomized rounding to convert the fractional solution to an

integral one of roughly the same value. The key lemma is the following:

Lemma 2.2 (Randomized Rounding). If c and f are positive integers and 0 < ǫ < 1,
then the following is true for any integer n ≥ 0. Let y = (y_i) be a vector of n variables,
0 ≤ y_i ≤ 1, that satisfies a certain linear constraint a^T y = b, where each |a_i| ≤ c. Construct
a vector z = (z_i) randomly by setting z_i = 1 with probability y_i and 0 with probability 1 − y_i.
Then with probability at least 1 − n^{−f}, we have

a^T z ∈ b ± c √(f n ln n).

We can apply this lemma to our problem as follows. Given our fractional solution y, let us
apply randomized rounding as in the lemma to yield an integral solution z. We claim that
with high probability,

r_i(z) ∈ r_i(y) ± O(√(n ln n))    (6)

∑ z_i r_i(y) ∈ ∑ y_i r_i(y) ± O(n^{3/2} ln n)    (7)

Specifically, to derive Equations (6) and (7) from Lemma 2.2, note that each r_i(x) is a
linear function with 0–1 coefficients and that each r_i(y) is at most n.

We use these equations as follows. The analysis of the previous section showed that the
integral optimum of our derived linear program was near the maximum cut value, so the
fractional optimum y can only be better. That is,

∑ y_i r_i(y) ≥ MAX-CUT − 2ǫn^2.

We now use our randomized rounding lemma. We have that

∑ z_i r_i(z) ≥ ∑ z_i (r_i(y) − O(√(n ln n)))    From (6)
            ≥ ∑ z_i r_i(y) − O(n^{3/2} ln n)
            ≥ ∑ y_i r_i(y) − O(n^{3/2} ln n)    From (7)
            ≥ MAX-CUT − (2ǫ + o(1))n^2

This finishes the overview of our algorithm for MAX-CUT.
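The rounding step can be exercised directly. The snippet below is an illustration of Lemma 2.2 only, with arbitrary choices of the coefficients a, the constants c and f, and the seed: it rounds a random fractional point and checks the deviation of a linear form against the c√(fn ln n) bound.

```python
import math
import random

def randomized_round(y, rng):
    """Set z_i = 1 with probability y_i and 0 otherwise."""
    return [1 if rng.random() < yi else 0 for yi in y]

rng = random.Random(7)
n, c, f = 4000, 1, 2
a = [rng.choice([-1, 1]) for _ in range(n)]    # coefficients with |a_i| ≤ c
y = [rng.random() for _ in range(n)]           # fractional solution
b = sum(ai * yi for ai, yi in zip(a, y))       # a^T y
z = randomized_round(y, rng)
deviation = abs(sum(ai * zi for ai, zi in zip(a, z)) - b)
assert deviation <= c * math.sqrt(f * n * math.log(n))   # c·√(fn ln n)
```

The typical deviation is on the order of √n, well inside the stated bound, which is what makes the union bound over all n constraints go through.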

3 Approximating Smooth Integer Programs

We now generalize the results of the previous section to handle arbitrary polynomial integer

programs (PIPs). We describe an algorithm that computes approximate solutions to smooth

PIPs of low degree, thus proving Theorem 1.10. We use the fact that smooth PIPs can be

recursively decomposed into smooth lower-degree PIPs. This lets us apply ideas similar to

those described in Section 2 for MAX-CUT. In a PIP the objective function and constraints

are low degree polynomials (degree 2 in the case of MAX-CUT). We use exhaustive sampling

to convert such polynomial integer programs into linear integer programs. Then we use the

Raghavan-Thompson technique to approximately solve the linear integer program.

We will see shortly that we can assume without loss of generality that we are dealing

with the feasibility version of a PIP—that is, we are given a feasible PIP and our goal is to

find an approximately feasible integer solution. Our general algorithm has the same three

elements as the one for MAX-CUT:

1. We show in Section 3.2 that we can relax the integrality conditions, since we can

use randomized rounding to convert every feasible fractional solution of a PIP into a

feasible integral solution.

2. In Section 3.3, we generalize the sampling theorem, which applies only to sums, to let

us estimate the values of polynomials.

3. We show in Section 3.4 that we can use our estimates to convert degree d constraints

into linear constraints without affecting feasibility.

We begin in Section 3.1 with some basic observations.

3.1 Basic Observations

We begin with a few basic observations that we will use at various times in the proof.

3.1.1 A Polynomial Decomposition

Our PIP algorithms are basically recursive generalizations of the approach for MAX-CUT.

They rely on the following key observation that lets us decompose any polynomial into

simpler polynomials:

Lemma 3.1. A c-smooth polynomial p of degree d can be written as

p(x) = t + ∑_i x_i p_i(x)

where t is a constant and each p_i is a c-smooth polynomial of degree d − 1.

Proof. From each monomial term in the expansion of p, pull out one variable x_i. Group all
monomials from which x_i was extracted into p_i. Every degree d′ term in p_i corresponds to
a degree d′ + 1 term in p, and thus has coefficient at most cn^{d−(d′+1)} = cn^{(d−1)−d′}. Thus,
since p_i has degree (at most) d − 1, it is a c-smooth degree d − 1 polynomial.
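The decomposition in the proof is straightforward to implement. In the sketch below (our own representation, not from the paper), a polynomial is a dict from a sorted tuple of variable indices (a monomial, repeats allowed) to its coefficient, and the smallest index is pulled out of each monomial.

```python
def decompose(poly):
    """Write p(x) = t + ∑_i x_i · p_i(x) by pulling the smallest-index
    variable out of each monomial, so p_i depends only on x_i, …, x_n."""
    t = poly.get((), 0)
    parts = {}
    for mono, coef in poly.items():
        if not mono:
            continue                      # the constant term is t
        i, rest = mono[0], mono[1:]       # monomials are stored sorted
        bucket = parts.setdefault(i, {})
        bucket[rest] = bucket.get(rest, 0) + coef
    return t, parts

# p(x) = 3 + x_0·x_1 − 2·x_1^2 + 5·x_2
p = {(): 3, (0, 1): 1, (1, 1): -2, (2,): 5}
t, parts = decompose(p)
assert t == 3
assert parts == {0: {(1,): 1}, 1: {(1,): -2}, 2: {(): 5}}
```

Since each extracted monomial loses one degree, each p_i inherits the smoothness bound, exactly as the proof argues.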

Remark. The above analysis also shows that we can express p uniquely as a sum

p(x) = t + ∑_i x_i p_i(x_i, …, x_n),

that is, where each p_i depends only on variables with index i or greater.

The decomposition of a degree d polynomial into degree d − 1 polynomials gives us a
natural recursion through which we can generalize our quadratic programming techniques.
By computing an estimate ρ_i of the value of p_i(x) at the optimum solution, we replace the
degree-d constraint p with a single constraint on ∑ x_i ρ_i together with a family of constraints
on the values p_i(x). We then recursively expand these degree d − 1 constraints, continuing
until all of our constraints are linear.

To estimate the values p_i(x), we again rely on the expansion above: we expand p_i in
terms of degree d − 2 polynomials, writing p_i(x) = ∑ x_j p_ij(x), recursively estimate the p_ij
values, and then use exhaustive sampling to estimate p_i based on the values of the p_ij.

After constructing the required linear integer program, we solve its fractional relaxation
and use randomized rounding as before to transform the solution into an integral solution.
To prove that randomized rounding works, we again use the decomposition—we show that
each p_i(x) is roughly preserved by rounding, and deduce that ∑ x_i p_i(x) is also preserved.

3.1.2 Reducing Optimization to Feasibility

We can reduce PIP optimization to the corresponding feasibility problem (“Is there a feasible

solution such that the objective exceeds a given value?”) using binary search in the usual

way. This uses the fact that the optimum value of a PIP is not too large, as shown in the

following lemma (which will also be useful later).

Lemma 3.2. If n > d, then the absolute value of a c-smooth polynomial at any point in

[0,1]nis at most 2cend(where lne = 1).

Proof. For 0 ≤ i ≤ d the polynomial has at most \binom{n+i}{i} terms of degree i, and each has a coefficient in [−cn^{d−i}, cn^{d−i}]. Thus an upper bound on the absolute value at any point in [0,1]^n is

    \sum_{i=0}^{d} cn^{d-i}\binom{n+i}{i} \le \sum_{i=0}^{d} cn^{d-i}\binom{n+d}{i} \le cn^{d}\sum_{i=0}^{d}\Big(\frac{n+d}{n}\Big)^{i}\frac{1}{i!} \le cn^{d}e^{1+d/n},

which is at most cen^d(1 + 2d/n) < 2cen^d for n > 5d.
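The reduction from optimization to feasibility described in Section 3.1.2 can be sketched as follows; Lemma 3.2 supplies the search window. This is our own illustrative Python sketch, and `feasible` is a hypothetical stand-in for the linearized feasibility test developed in the rest of this section.

```python
import math

def optimize_via_feasibility(feasible, c, n, d, eps):
    """Binary search for the largest objective value v (to additive error
    eps) with feasible(v) true.  Lemma 3.2 bounds the optimum of a c-smooth
    degree-d PIP by M = 2*c*e*n^d, so O(log(M/eps)) oracle calls suffice."""
    M = 2 * c * math.e * n ** d
    lo, hi = -M, M
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if feasible(mid):
            lo = mid        # a solution of value >= mid exists
        else:
            hi = mid
    return lo

# Toy oracle with a known optimum of 42 (purely illustrative).
opt = optimize_via_feasibility(lambda v: v <= 42.0, c=1.0, n=10, d=2, eps=1e-6)
```

The binary search maintains the invariant that `lo` is feasible and `hi` is not, so the returned value is within `eps` of the true optimum.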

3.2 Rounding Fractional PIPs

We begin with the final step of our algorithm, rounding a fractional solution to an integral

one. We present this section first since it is more straightforward than the following ones

but conveys the same ideas. As we saw in Section 2.3, Raghavan and Thompson [RT87]

show that given a fractional solution to a linear program, we can round it into an integer

solution that is “almost as good.” We rephrased their result in Lemma 2.2. We now modify

the Raghavan-Thompson technique to show in Lemma 3.3 that a similar result is true for

low degree polynomials. In other words, we show that the value of a c-smooth polynomial at a point in [0,1]^n is not too different from its value at a nearby integral point obtained by randomized rounding.

Lemma 3.3 (Randomized Rounding for degree d polynomials). Let p be a c-smooth degree-d polynomial. Given fractional values (y_i) such that p(y_1, ..., y_n) = b, suppose randomized rounding is performed on the y_i as in Lemma 2.2 to yield a 0,1 vector (z_i). Then with probability at least 1 − n^{d−f}, we have

    p(z_1, \ldots, z_n) \in b \pm g\,d\,n^{d-1/2}\sqrt{\ln n},

where g = 2ce\sqrt{f}.

Proof. We use induction on the degree. The case d = 1 follows from Lemma 2.2. Now assume

we have proved the Lemma for all integers less than d, and p is a degree d polynomial. As

argued in Section 3.1, we can express p as

    p(x_1, \ldots, x_n) = \sum_{i=1}^{n} x_i \cdot p_i(x_1, \ldots, x_n) + t,    (8)

where t is a constant and p_i is a c-smooth polynomial of degree at most d − 1.

Let ρ_i denote the value p_i(y_1, ..., y_n). Then

    b = p(y_1, \ldots, y_n) = t + \sum_{i=1}^{n} \rho_i \cdot y_i.

Let (z_1, ..., z_n) ∈ {0,1}^n be obtained by randomized rounding on (y_1, ..., y_n). Our proof consists of noticing that with high probability, (z_i) satisfies both b ≈ \sum_i ρ_i z_i (by Lemma 2.2) and ∀i ≤ n : ρ_i ≈ p_i(z_1, ..., z_n) (by induction for degree d − 1). Then we realize that any such (z_i) also satisfies b ≈ p(z_1, ..., z_n).

Let us formalize this idea. Note that |ρ_i| ≤ 2cen^{d−1} by Lemma 3.2. So we can apply Lemma 2.2 (replacing c by 2cen^{d−1}). We find that with probability at least 1 − n^{−f} (recalling that the notation a ± b is shorthand for the interval [a − b, a + b]),

    t + \sum_{i=1}^{n} \rho_i \cdot z_i \in b \pm g n^{d-1}\sqrt{n \ln n}.    (9)

Furthermore, the inductive hypothesis implies that for each i ≤ n, the probability is at least 1 − n^{d−f−1} that

    p_i(z_1, \ldots, z_n) \in \rho_i \pm g(d-1)n^{d-1-1/2}\sqrt{\ln n}.    (10)

Hence we conclude that with probability at least 1 − n^{d−f} − n^{−f} ≈ 1 − n^{d−f}, the event mentioned in Condition (10) happens for each i ≤ n, and so does the event mentioned in Condition (9). Of course, when all these events happen, we have:

    p(z_1, \ldots, z_n) = t + \sum_{i=1}^{n} z_i \cdot p_i(z_1, \ldots, z_n)
        \in t + \sum_i z_i\big(\rho_i \pm g(d-1)n^{d-1-1/2}\sqrt{\ln n}\big)        by (10)
        \subseteq t + \sum_i z_i \rho_i \pm g(d-1)n^{d-1/2}\sqrt{\ln n}
        \subseteq b \pm g n^{d-1}\sqrt{n \ln n} \pm g(d-1)n^{d-1/2}\sqrt{\ln n}    by (9)
        = b \pm g\,d\,n^{d-1/2}\sqrt{\ln n}.

Hence we have shown that p(z_1, ..., z_n) ∈ b ± g d n^{d−1/2}\sqrt{\ln n} with probability at least 1 − n^{d−f}.
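To see Lemma 3.3 in action, here is a small numeric experiment of our own, using an illustrative 1-smooth quadratic: randomized rounding of a fractional point changes the polynomial's value by far less than its typical magnitude of Θ(n^2).

```python
import random

random.seed(1)
n = 200

def p(x):
    """p(x) = sum_{i<j} x_i x_j, a 1-smooth degree-2 polynomial."""
    s = sum(x)
    return (s * s - sum(v * v for v in x)) / 2

y = [0.5] * n                                        # fractional solution
z = [1 if random.random() < yi else 0 for yi in y]   # randomized rounding

# p(y) = Theta(n^2), while Lemma 3.3 predicts a rounding error of only
# O(n^{3/2} sqrt(log n)); we check a generous multiple of n^{3/2}.
assert abs(p(z) - p(y)) <= 4 * n ** 1.5
```

With n = 200 the polynomial's value is in the thousands while the rounding error stays orders of magnitude smaller, which is exactly the additive-error regime the lemma describes.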

3.3 Estimating the Value of a Polynomial

Having shown how to round a fractional solution to an integral one, we now show how to find an approximately optimal fractional solution by solving a linear program. As discussed above, our procedure for replacing the constraint on p(x) by linear constraints requires estimating the values, at the optimum a, of the coefficients p_i(a) in the expansion p(a) = \sum_i a_i p_i(a). In this section, we show how this estimation can be accomplished by exhaustive sampling. We describe a procedure Eval in Figure 1 that can approximate the value of a c-smooth degree d polynomial p(x_1, ..., x_n) on any unknown 0/1 vector (a_1, ..., a_n), given only partial information about (a_1, ..., a_n). The algorithm is given the values a_i for O(log n) randomly-chosen indices i, and outputs an estimate that, with high probability, lies in p(a_1, ..., a_n) ± εn^d.

To simplify our exposition later, we describe the procedure more generally as using a (multi)set of indices S ⊆ {1, ..., n}.

Algorithm Eval(p, S, {a_i : i ∈ S})

Input:  polynomial p of degree at most d,
        (multi)set of variable indices S,
        a_i for i ∈ S.
Output: estimate for p(a_1, ..., a_n).

if deg(p) = 0 (i.e., p is a constant) then
    return p
else
    write p(x_1, ..., x_n) = t + \sum x_i p_i(x_1, ..., x_n),
        where t is a constant and each p_i has degree at most d − 1
    for each i ∈ S
        e_i ← Eval(p_i, S, {a_i : i ∈ S})
    return t + (n/|S|) \sum_{i ∈ S} a_i e_i

Figure 1: The approximate evaluation algorithm

Note that if S = {1, ..., n} then the procedure returns p(a_1, ..., a_n). We will show that in order to get an additive approximation of the type we are interested in, it suffices to choose S randomly and of size O(log n). We use the Sampling Lemma (2.1) as the base case in our inductive proof of the correctness of Eval.
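A direct Python transcription of Figure 1 (reusing a dict-of-monomials polynomial representation, which is our own choice, not the paper's) makes the exact-recovery property just mentioned easy to check: with S equal to all indices, the procedure returns p(a_1, ..., a_n) exactly.

```python
from collections import defaultdict

def decompose(p):
    """p = t + sum_i x_i * p_i for a multilinear polynomial given as a
    dict mapping sorted index tuples to coefficients."""
    t = p.get((), 0)
    parts = defaultdict(dict)
    for mono, coef in p.items():
        if mono:
            parts[mono[0]][mono[1:]] = parts[mono[0]].get(mono[1:], 0) + coef
    return t, dict(parts)

def eval_approx(p, n, S, a):
    """Procedure Eval: estimate p(a) from the bits a[i], i in S, of an
    otherwise unknown 0/1 vector a (here a dict index -> bit)."""
    if set(p) <= {()}:                 # degree 0: p is a constant
        return p.get((), 0)
    t, parts = decompose(p)
    e = {i: eval_approx(parts[i], n, S, a) if i in parts else 0 for i in S}
    return t + (n / len(S)) * sum(a[i] * e[i] for i in S)

# With S equal to all indices, Eval returns p(a) exactly:
p = {(): 3, (0, 1): 2, (1, 2): 1, (2,): 5}
a = {0: 1, 1: 0, 2: 1}
assert eval_approx(p, 3, [0, 1, 2], a) == 8      # p(1,0,1) = 3 + 5 = 8
```

For a genuinely random sample S the returned value is only an estimate, and Lemma 3.4 below quantifies its error.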

Lemma 3.4. Let p be a c-smooth polynomial of degree d in n variables x_i, and let a_1, ..., a_n ∈ {0,1}. Let S be a set of O(g log n) indices chosen randomly (with replacement). Then with probability at least 1 − n^{d−f}, set S is such that Eval(p, S, {a_i : i ∈ S}) returns a value in p(a_1, ..., a_n) ± εn^d, where

    \epsilon = 4ce\sqrt{f/g}.

Proof. The proof is by induction on d. The case d = 0 is clear. For the inductive step let ρ_i = p_i(a_i, ..., a_n), so we have

    p(a_1, \ldots, a_n) = t + \sum_{i=1}^{n} a_i \cdot \rho_i.    (11)

The intuition for why Eval's output should approximate p(a_1, ..., a_n) is as follows. Each p_i has degree at most d − 1, so the inductive hypothesis implies that e_i ≈ ρ_i. Thus the output of Eval is

    t + \frac{n}{|S|}\sum_{i\in S} a_i \cdot e_i \approx t + \frac{n}{|S|}\sum_{i\in S} a_i \cdot \rho_i    (by the inductive hypothesis)
        \approx t + \sum_i a_i \cdot \rho_i    (by the Sampling Lemma)

It remains to fill in the details, and to deal with the complication that the errors in our recursive estimates of the ρ_i accumulate into the error for our estimate of p(a_1, ..., a_n). Our sample has size g log n. By Lemma 3.2, each |ρ_i| ≤ 2cen^{d−1}. Hence the Sampling Lemma implies that with probability 1 − n^{−f} the set S is such that

    \frac{n}{|S|}\sum_{i\in S} a_i\rho_i \in \sum_i a_i\rho_i \pm \Big(2ce\sqrt{\tfrac{f}{g}}\Big) n^d.    (12)

Of course, we do not have the values ρ_i. However, we do have the values e_i = Eval(p_i, S, {a_i : i ∈ S}). To see the impact of using them instead, let ε_d denote the smallest number such that for every c-smooth degree d polynomial p and point a ∈ {0,1}^n,

    Pr[Eval computes an estimate within p(a) ± ε_d n^d] ≥ 1 − n^{d−f}.

We get a recurrence for ε_d as follows. By definition, Eval estimates any particular ρ_i to within ε_{d−1}n^{d−1} with probability 1 − n^{(d−1)−f}. Thus all n values ρ_i are estimated to within this bound with probability 1 − n · n^{(d−1)−f} = 1 − n^{d−f}. Combining with (12), we conclude that with probability at least 1 − n^{d−f} − n^{−f} ≈ 1 − n^{d−f}, set S is such that the returned value

    t + \frac{n}{|S|}\sum_{i\in S} a_i \cdot e_i \in t + \frac{n}{|S|}\sum_{i\in S} a_i\big(\rho_i \pm \epsilon_{d-1}n^{d-1}\big)
        \subseteq t + \frac{n}{|S|}\sum_{i\in S} a_i \cdot \rho_i \pm \epsilon_{d-1}n^{d-1}\cdot\frac{n}{|S|}
        \subseteq t + \sum_i a_i \cdot \rho_i \pm 2ce\sqrt{\tfrac{f}{g}}\, n^d \pm \frac{\epsilon_{d-1}}{|S|} n^d    by (12)
        \subseteq t + \sum_i a_i \cdot \rho_i \pm \Big(2ce\sqrt{\tfrac{f}{g}} + \frac{\epsilon_{d-1}}{|S|}\Big) n^d
        = p(a_1, \ldots, a_n) \pm \Big(2ce\sqrt{\tfrac{f}{g}} + \frac{\epsilon_{d-1}}{|S|}\Big) n^d.

It follows that

    \epsilon_d \le 2ce\sqrt{\tfrac{f}{g}} + \frac{\epsilon_{d-1}}{|S|} \le 2ce\sqrt{\tfrac{f}{g}}\big(1 + |S|^{-1} + \cdots + |S|^{-d}\big) \le 4ce\sqrt{\tfrac{f}{g}}

for |S| > 1.


Corollary 3.5. With probability 1 − n^{d−f} over the choice of S, the Eval procedure accurately estimates the values of all the polynomials arising from the decomposition of polynomial p (that is, it estimates every degree-d′ polynomial to within ε_{d′}n^{d′}).

Proof. This is implicit in the previous proof. Note that the decomposition of p is determined

solely by p, independent of the value of the optimum solution a that we are estimating.

3.4 Transforming Degree d Constraints to Linear Constraints

Using the estimates produced by Procedure Eval of Section 3.3, we can transform any polynomial constraint into a family of linear constraints, so that any feasible solution to the linear constraints will approximately satisfy the polynomial constraint as well. We use algorithm Linearize in Figure 2. Just like Eval, the inputs to this procedure contain partial information about some feasible solution vector (a_1, ..., a_n) ∈ {0,1}^n to the input constraint.

Algorithm Linearize(“L ≤ p(x_1, ..., x_n) ≤ U”, S, {a_i : i ∈ S}, ε)

Input:  constraint involving polynomial p of degree d,
        with lower bound L and upper bound U,
        multiset of variable indices S,
        a_i ∈ {0,1} for each i ∈ S,
        error parameter ε > 0.
Output: a set of linear constraints.

if p is linear then
    output the input constraint “L ≤ p(x) ≤ U”
else
    write p(x_1, ..., x_n) = t + \sum x_i p_i(x_1, ..., x_n),
        where t is a constant and each p_i has degree at most d − 1
    for i = 1 to n
        e_i ← Eval(p_i, S, {a_i : i ∈ S})
        l_i ← e_i − εn^{d−1}
        u_i ← e_i + εn^{d−1}
        Linearize(“l_i ≤ p_i(x_1, ..., x_n) ≤ u_i”, S, {a_i : i ∈ S}, ε)
    output the constraint
        “L − εn^d ≤ t + \sum x_i e_i ≤ U + εn^d”

Figure 2: Linearizing a Polynomial Constraint
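The recursion of Figure 2 can be sketched in Python as follows; the (polynomial-dict, lo, hi) constraint representation is our own. For a deterministic illustration we plug in the exact values p_i(a) where the algorithm would call Eval on the sample; the lemmas below say the sampled estimates behave almost as well.

```python
from collections import defaultdict

def decompose(p):
    t = p.get((), 0)
    parts = defaultdict(dict)
    for mono, coef in p.items():
        if mono:
            parts[mono[0]][mono[1:]] = parts[mono[0]].get(mono[1:], 0) + coef
    return t, dict(parts)

def evaluate(p, a):
    total = 0
    for mono, coef in p.items():
        v = coef
        for i in mono:
            v *= a[i]
        total += v
    return total

def linearize(p, L, U, n, a, eps, out):
    """Replace "L <= p(x) <= U" by linear constraints, appended to out as
    (linear-polynomial, lo, hi) triples.  evaluate(p_i, a) stands in for
    the estimate e_i that Eval would return."""
    deg = max((len(m) for m in p), default=0)
    if deg <= 1:
        out.append((p, L, U))
        return
    t, parts = decompose(p)
    e = {}
    for i in range(n):
        pi = parts.get(i, {})
        e[i] = evaluate(pi, a)                    # e_i (exact here)
        linearize(pi, e[i] - eps * n ** (deg - 1),
                  e[i] + eps * n ** (deg - 1), n, a, eps, out)
    lin = {(): t}
    lin.update({(i,): e[i] for i in range(n)})    # t + sum_i e_i * x_i
    out.append((lin, L - eps * n ** deg, U + eps * n ** deg))

# p(x) = x0*x1*x2 + 2*x0 + 1, with a known feasible point a, p(a) = 3.
p = {(0, 1, 2): 1, (0,): 2, (): 1}
a = [1, 1, 0]
out = []
linearize(p, 3, 3, 3, a, 0.5, out)
# The known solution a satisfies every output constraint.
assert all(lo <= evaluate(q, a) <= hi for q, lo, hi in out)
```

With exact values of e_i the slack of εn^{d−1} on each child constraint makes the output system trivially feasible for a, which is the deterministic analogue of Lemma 3.6.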

A simple induction shows that the procedure in Figure 2, when given a degree d constraint, outputs a set of at most 1 + n + ··· + n^{d−1} = O(n^{d−1}) linear constraints. The next two lemmas prove the correctness of this (probabilistic) reduction. The first shows that with high probability, the replacement equations are jointly feasible. The second shows that any feasible solution will be almost feasible for the original constraint.

Lemma 3.6. Let f, g, c > 0 be any constants. Let Linearize be given an error parameter ε = 4ce\sqrt{f/g} and a constraint involving a c-smooth polynomial of degree d. Let (a_1, ..., a_n) ∈ {0,1}^n be a feasible solution to the constraint. If S is a random sample of g log n variables (picked with replacement), then with probability at least 1 − dn^{d−f}, Procedure Linearize outputs a set of linear constraints that are satisfied by (a_1, ..., a_n).

Proof. Calling Linearize with polynomial p results in numerous recursive calls, each of which (besides making other recursive calls) outputs a constraint on some degree-d′ polynomial
