Local Property Reconstruction and Monotonicity∗
Michael Saks†
saks@math.rutgers.edu
Dept. of Mathematics
Rutgers University
C. Seshadhri‡
csesha@us.ibm.com
IBM Almaden Research Center
Abstract
We propose a general model of local property reconstruction. Suppose we have a function
f on domain Γ, which is supposed to have a particular property P, but may not have the
property. We would like a procedure which produces a function g that has property P and is
close to f (according to some suitable metric). The reconstruction procedure, called a filter, has
the following form. The procedure takes as input an element x of Γ and outputs g(x). The
procedure has oracle access to the function f and uses a single short random string ρ, but is
otherwise deterministic.
This model was inspired by a related model of online property reconstruction that was
introduced by Ailon, Chazelle, Comandur and Liu (2004). It is related to the property testing
model, and extends the framework that is used in the model of locally decodable codes. A similar
model, in the context of hypergraph properties, was independently proposed and studied by
Austin and Tao (2008).
We specifically consider the property of monotonicity and develop an efficient local filter for
this property. The input f is a real-valued function defined on the domain $\{1,\dots,n\}^d$ (where n
is viewed as large and d as a constant). The function is monotone if the following property holds:
for two domain elements x and y, if x ≤ y (in the product order) then f(x) ≤ f(y). Given x,
our filter outputs the value g(x) in $(\log n)^{O(1)}$ time and uses a random seed ρ of the same size.
With high probability, the ratio of the (Hamming) distance between g and f to the minimum
possible Hamming distance between a monotone function and f is bounded above by a function
of d (independent of n).
∗ This is an extended abstract of work that will appear as “Local Monotonicity Reconstruction” in SIAM Journal
on Computing. A preliminary version of this work appeared as “Parallel Monotonicity Reconstruction” [29].
†This work was supported in part by NSF under grants CCF-0515201 and CCF-0832787.
‡This paper is partly based on material that appeared in this author’s Ph.D. dissertation for the Department of
Computer Science, Princeton University.
1 Online property reconstruction
The process of assembling large data sets is prone to varied sources of corruption, such as measure-
ment error, replication error, and communication noise. Error correction techniques (i.e. coding)
can be used to reduce or eliminate the effects of some sources of error, but often some residual
errors may be unavoidable. Despite the presence of such inherent error, the data set may still be
very useful.
One problem in using such a data set is that even small amounts of error can significantly change
the behavior of algorithms that act on the data. For example, if we do a binary search on an array
that is supposed to be sorted, a few erroneous entries may lead to behavior that deviates significantly
from the “correct” behavior.
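The binary search example can be made concrete with a minimal sketch. The array contents and the corrupted position below are illustrative assumptions, not data from the paper: a single erroneous entry at the first probe position makes the search miss an element that is still present.

```python
from typing import Optional, Sequence


def binary_search(a: Sequence[int], target: int) -> Optional[int]:
    """Standard binary search; its answer is only meaningful if `a` is sorted."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


clean = list(range(1, 16))   # 1..15, sorted
corrupt = list(clean)
corrupt[7] = 100             # one erroneous entry, at the first probe position

found_clean = binary_search(clean, 12)      # index 11
found_corrupt = binary_search(corrupt, 12)  # None, although 12 is still in the array
```

Only one of fifteen entries is wrong, yet the first comparison sends the search into the wrong half and the target is never found.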
This is an example of a more general situation. We have a data set that ideally should have
some specified structural property, e.g., a list of numbers that should be sorted, a set of points that
should be in convex position, or a graph that should be a tree. Algorithms that run on the data
set may rely on this property. A small amount of error may destroy the property, and result in
the algorithm producing wildly unexpected results, or even crashing. In these situations, a small
amount of error may be tolerable but only if the structural property is maintained.
These considerations motivated the formulation of the online property reconstruction model,
which was introduced in [3]. We are given a data set, which we think of as a function f defined on
some domain Γ. Ideally, f should have a specified structural property P, but this property may not
hold due to unavoidable errors. We wish to construct online a new data set g such that:
(1) g has property P and (2) d(g,f) is small, where d(g,f) is the fraction of values x ∈ Γ for
which g(x) ≠ f(x).
How small should d(g,f) be in Condition (2)? Define $\varepsilon_f = \varepsilon_f(P)$ to be the minimum of d(h,f)
over all h that satisfy P. Of course, $\varepsilon_f$ is a lower bound on the deviation of g from f. The error
blow-up of g is the ratio $d(g,f)/\varepsilon_f$. This error blow-up can be viewed as the price that is paid in
order to restore the property P online, and we want it to be bounded by a modest constant.
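To make the error blow-up concrete, here is a small sketch for the one-dimensional monotonicity property. It relies on the standard fact that the fewest entries one must change to sort a real-valued list is its length minus the length of a longest nondecreasing subsequence. The helper names and the naive prefix-maximum repair are assumptions chosen for illustration; they are not the filter of [3] or of this paper.

```python
from bisect import bisect_right
from typing import List, Sequence


def distance(g: Sequence[float], f: Sequence[float]) -> float:
    """d(g, f): fraction of positions where g and f differ."""
    return sum(1 for a, b in zip(g, f) if a != b) / len(f)


def eps_monotone_1d(f: Sequence[float]) -> float:
    """Distance from f to the nearest monotone list: 1 - LNDS(f) / n."""
    tails: List[float] = []  # tails[k] = smallest tail of a nondecreasing run of length k+1
    for v in f:
        k = bisect_right(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return (len(f) - len(tails)) / len(f)


def prefix_max(f: Sequence[float]) -> List[float]:
    """A naive monotone repair: replace each value by the running maximum."""
    out, best = [], float("-inf")
    for v in f:
        best = max(best, v)
        out.append(best)
    return out


f = [1, 2, 9, 3, 4, 5, 6, 7]   # one corrupted entry (the 9)
g = prefix_max(f)              # [1, 2, 9, 9, 9, 9, 9, 9]
eps = eps_monotone_1d(f)       # 1/8: changing the 9 suffices
d = distance(g, f)             # 5/8
blow_up = d / eps              # 5.0
```

The repaired list is monotone, but a single early outlier forces five further changes, so the blow-up of this naive repair grows with the length of the list rather than staying constant.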
An offline reconstruction algorithm explicitly outputs such a g on input f. In the context of
large data sets, the explicit construction of g from f requires a considerable amount of computational
overhead (at least linear in the size of the data set). For this reason, [3] considered online
reconstruction algorithms. Such an algorithm, called a filter, gets as input a sequence $x_1, x_2, \dots$
of elements of Γ presented one at a time and must output the sequence of values $g(x_1), g(x_2), \dots$,
where $g(x_i)$ is produced in response to $x_i$, before knowing $x_{i+1}$. The filter can access the function
f via an oracle which, given y ∈ Γ, answers f(y). The aim is to design a filter that, for any online
input sequence of elements in Γ, outputs a function g satisfying (1) and (2) above and furthermore
produces each successive $g(x_i)$ quickly, i.e., in time much smaller than |Γ|.
In [3], a filter for the monotonicity property was given. In this setting, the domain Γ is the set
$[n]^d = \{(j_1,\dots,j_d) : j_i \in [n]\}$, where [n] denotes the set {1,2,...,n}. The set $[n]^d$ is considered to be
partially ordered under the component-wise (product) order: $(i_1,\dots,i_d) \le (j_1,\dots,j_d)$ iff $i_r \le j_r$ for all r.
A function f defined on Γ is monotone if x ≤ y implies f(x) ≤ f(y). The filter they constructed
satisfies Condition (1), has error blow-up that is bounded above by $2^{O(d)}$ (independent of n), and
answers each successive query in time $(\log n)^{O(d)}$.
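The product order and the monotonicity condition can be sketched as a brute-force check on a small grid. The function names are illustrative assumptions. The check visits all n^d points, so it is an offline test and not a filter. It uses the fact that, by transitivity, it suffices to compare each point with its neighbours one step up in a single coordinate.

```python
from itertools import product
from typing import Callable, Tuple

Point = Tuple[int, ...]


def leq(x: Point, y: Point) -> bool:
    """Product order on [n]^d: x <= y iff every coordinate of x is <= that of y."""
    return all(a <= b for a, b in zip(x, y))


def is_monotone(f: Callable[[Point], float], n: int, d: int) -> bool:
    """Check f on [n]^d by comparing each point with its d upward neighbours."""
    for x in product(range(1, n + 1), repeat=d):
        for r in range(d):
            if x[r] < n:
                y = x[:r] + (x[r] + 1,) + x[r + 1:]  # one step up in coordinate r
                if f(x) > f(y):
                    return False
    return True


coordinate_sum_ok = is_monotone(lambda x: sum(x), n=4, d=2)        # True
difference_ok = is_monotone(lambda x: x[0] - x[1], n=4, d=2)       # False
```

Note that (1,3) and (2,2) are incomparable in the product order, which is why the order is only partial for d > 1.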
2 Local property reconstruction
The filter for monotonicity proposed in [3] has the following general structure. For each successive
query $x_j$, the filter executes a randomized algorithm to compute $g(x_j)$. This algorithm accesses f,
and also needs to access the answers $g(x_i)$ for i < j to the queries asked previously. In particular,
the function g produced may depend on both the order of the queries and the random bits used by
the algorithm.
This general structure for filters has two potential drawbacks: (1) it requires the storage of all
previous queries and answers, thus incurring possibly significant space overhead for the algorithm;
and (2) it does not support a local implementation in which multiple copies of the filter, having
read-only access to f, are able to handle queries independently while maintaining mutual consistency.
In this paper, we propose the following strengthened requirements for a filter. A local filter¹ for
reconstructing property P is an algorithm A that has oracle access to a function f on domain Γ (the
“data set”) and to an auxiliary random string ρ (the “random seed”), and takes as input x ∈ Γ.
For fixed f and ρ, A runs deterministically on input x to produce an output $A_{f,\rho}(x)$. Thus, given
f and ρ, $A_{f,\rho}$ specifies a function on domain Γ. We want A to satisfy the following properties:
1. For each f and ρ, $A_{f,\rho}$ satisfies P.²
2. For each f, with high probability (with respect to the choice of ρ), the function $A_{f,\rho}$ should
be “suitably close” to f.
3. For each x, $A_{f,\rho}(x)$ can be computed very quickly.
4. The size of the random seed ρ should be “much smaller” than |Γ|.
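As a toy illustration of the four requirements above, consider the much simpler property "f is constant" (an assumption made purely for illustration; it is not a property studied in this paper). The hypothetical `ConstantFilter` below re-derives the same sample of oracle calls from the seed on every query and answers with the most frequent sampled value. Its output is constant for every f and every seed, each query costs a fixed number of oracle calls, and the seed is a single integer. When a clear majority of f agrees on one value, the output is likely to be that value.

```python
import random
from collections import Counter
from typing import Callable


class ConstantFilter:
    """Toy local filter for the property "f is constant" on {0, ..., n-1}."""

    def __init__(self, f: Callable[[int], int], n: int, seed: int, samples: int = 31):
        self.f = f              # oracle access to the data set
        self.n = n              # domain size
        self.seed = seed        # the shared random seed
        self.samples = samples  # oracle calls per query

    def query(self, x: int) -> int:
        # Rebuild the same sample from the seed on every call, so no state is
        # kept between queries and the answer depends only on (f, seed).
        rng = random.Random(self.seed)
        votes = Counter(self.f(rng.randrange(self.n)) for _ in range(self.samples))
        # The output ignores x: the reconstructed function is constant.
        return votes.most_common(1)[0][0]


def noisy(x: int) -> int:
    """Hypothetical data set: constant 7, corrupted to 3 on multiples of 10."""
    return 3 if x % 10 == 0 else 7


# Two processors that never communicate but were given the same seed.
a = ConstantFilter(noisy, n=1000, seed=2024)
b = ConstantFilter(noisy, n=1000, seed=2024)
answers_a = [a.query(x) for x in range(0, 1000, 50)]
answers_b = [b.query(x) for x in range(0, 1000, 50)]
```

Because each answer is a deterministic function of f and the seed, the two copies agree on every query without storing or exchanging earlier answers. Monotonicity is far harder because the answer must depend on x while all answers remain mutually consistent.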
Remark 1: In Condition 2, we say that $A_{f,\rho}$ should be “suitably close” to f. There are various
ways to make this precise. Let $\varepsilon_f$ denote the minimum distance from f to a function satisfying P
and let $\gamma_f(\rho)$ denote the distance from f to $A_{f,\rho}$. We would like $\gamma_f(\rho)$ to be small compared to $\varepsilon_f$.
The error blow-up, which is the ratio $\gamma_f(\rho)/\varepsilon_f$, works well for the monotonicity property that we
study. For other properties, it might be more appropriate to use another criterion: for example,
we might consider the difference $\gamma_f(\rho) - \varepsilon_f$. More generally, we could require simply that $\gamma_f(\rho)$ be
bounded above by some arbitrary function of $\varepsilon_f$ (either independent of |Γ| or growing very slowly
with |Γ|).
Remark 2: Similarly, for Condition 3, there are various possibilities for interpreting the phrase
“very quickly”. In this paper, we obtain running times that are polynomial in log|Γ|. In Section
3, we will mention some work on other properties where the running time does not depend on the
domain size. On the other hand, there may be other properties where it is non-trivial and interesting
to obtain running times of the form $|\Gamma|^{\delta}$.
Remark 3: A local filter can be used, trivially, as an online filter. The space required by the
local filter is bounded by the sum of the length of ρ and the running time per query. By keeping
these both small (e.g., much smaller than |Γ|) we obtain an online filter using little auxiliary space.
Remark 4: A local filter can be used to enforce consistent behavior among autonomous processors
that each have access to f but do not communicate with each other. We generate one random
seed ρ and give the same random seed to each of the processors. Since $A_{f,\rho}$ is deterministic, all
processors will reconstruct the same function.
¹ This was originally called a parallel filter in the conference version [29]. We made this terminology change since it
is more compatible with the existing concepts of locally decodable codes.
² In an earlier version of this paper, this condition was replaced by the weaker condition that for each f, $A_{f,\rho}$ should
satisfy P with high probability. Prompted by a question raised by a referee, we were able to modify our monotonicity
filter to satisfy this stronger property, and so modified the definition accordingly. The weaker condition may be more
appropriate for some other properties.
3 Related Work
One case of property reconstruction that has been studied extensively is error correcting codes.
Suppose $C \subseteq \{0,1\}^n$ is such a code in which all members of C are pairwise at distance at least
d. Let P be the property of being a codeword. The error correction problem for C is to find the
closest codeword to a given input string x. This can be formulated as a reconstruction problem for
the property P.
One variant of error correction is the problem of local decoding. This problem was explicitly
named in [25], but, as noted there, was studied previously in connection with self-correcting
computation (e.g., [12,20]), probabilistically checkable proofs (e.g., [8]), average-case reductions
(e.g., [9,30]), and private information retrieval (e.g., [13]). Here we want a decoding algorithm for a
given code that, given oracle access to the bits of an input string x and an index i ∈ [n], finds
the ith bit of the closest codeword to x by querying a small (possibly randomly selected) number
of bits of x. If we view the local decoding algorithm as a deterministic algorithm that takes as input i
and a random string r (used to make the decisions), then we require that for each i, most choices of
r lead to the correct value for the ith bit of the closest codeword.
This is very similar to (though not quite the same as) the local property reconstruction problem
for P; for local property reconstruction we interchange the “for all” and “for most” quantifiers and
require that for most choices of r, and for all i ∈ [n], the algorithm correctly produces the ith bit
of the codeword. Also, we pay attention to the length of the random string r, which we want to be
suitably small.
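The interchange of quantifiers can be written out explicitly. The notation is an assumption introduced here for illustration: $D^{x}(i,r)$ denotes the output of the algorithm on index i with random string r and oracle access to x, $C(x)$ denotes the closest codeword to x, and δ is a small failure probability.

```latex
% Local decoding: for every index, most random strings give the right bit.
\forall i \in [n]: \quad
  \Pr_{r}\bigl[\, D^{x}(i,r) = C(x)_i \,\bigr] \;\ge\; 1 - \delta

% Local reconstruction: most random strings give the right bit for every index at once.
\Pr_{r}\bigl[\, \forall i \in [n]: \; D^{x}(i,r) = C(x)_i \,\bigr] \;\ge\; 1 - \delta
```

The second condition implies the first, but not conversely: under the first, the set of bad random strings may differ from index to index, so no single r need be good for all indices simultaneously.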
In local list decoding, our aim is to find a short list of codewords that are all suitably close to the
input word. For example, in list decoding of low-degree polynomials [6,30], the input is a function
and the output is a small list of low-degree polynomials that are close to the input function.
The monotonicity problem considered in this paper is qualitatively quite different from the
local decoding examples. In local decoding there is either one correct output, or (in the case
of list-decoding) a sparse list of possible correct outputs. For monotonicity there may be many
(possibly infinitely many) ways to correct a given function to a nearby function with the desired
property. One might think that having many possible close corrections (rather than one) makes
reconstruction easier but, at least for the monotonicity problem, it does not. The difficulty arises
from the requirement that once the random seed is fixed, all query answers provided by the filter
must be consistent with a single function having the property.
A related notion of reconstruction was discussed in [23], for generalized partition problems
in dense graphs. Given an input dense graph G that satisfies some partition property (say k-
colorability), we wish to efficiently construct a partition of the vertices that has at most an ε-fraction
of violating edges. The algorithms for this problem provided in [23] behaved like local filters.
Specifically, there was a constant-time algorithm (the constant being a function of ε) that gave the
color class of an input vertex of G, and this could be run independently on all vertices (after fixing
a random seed). This coloring was guaranteed to violate at most an ε-fraction of the edges in G.
Independently of our work, a model of repair of a property was formulated and studied in [7]. This
is closely related to the reconstruction model considered here. The results in [7] primarily considered
reconstruction of hypergraph properties, and obtained local filters of a very special form that modify
an input hypergraph to satisfy a given property. This result can be seen as a generalization of
the characterizations of testable properties of dense graphs [4,5]. The work in [7] does not focus on the exact
form of the error blow-up, and only requires that the distance of the reconstructed hypergraph be
bounded by some arbitrary function of the minimum distance of the hypergraph to the property.
In general, a local filter for reconstructing a given property can be used to estimate the distance
of an input instance to the property. When we fix a random seed and run the filter on f, the
filter implicitly outputs a function g that has the desired property and is at distance at most $B\varepsilon_f$
from f (where $\varepsilon_f$ is the distance of f to P and B bounds the error blow-up). By choosing a random
sample of domain points x and computing the fraction of points where g(x) ≠ f(x), we get an estimate
of the distance d(g,f). Since $\varepsilon_f \le d(g,f)$ and, with high probability, $\varepsilon_f \ge d(g,f)/B$, we get a
multiplicative B-approximation to $\varepsilon_f$ in sublinear time.
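The sampling step can be sketched as follows. Here `g` is only a stand-in for the function computed by a filter under a fixed seed, and the corrupted `f`, the domain size, and the sample count are illustrative assumptions. If the estimate is close to d(g,f), then $\varepsilon_f$ lies between roughly the estimate divided by B and the estimate itself.

```python
import random
from typing import Callable


def estimate_distance(
    f: Callable[[int], int],
    g: Callable[[int], int],
    domain_size: int,
    samples: int,
    rng: random.Random,
) -> float:
    """Estimate d(g, f) as the fraction of sampled points where f and g differ."""
    disagreements = 0
    for _ in range(samples):
        x = rng.randrange(domain_size)
        if f(x) != g(x):
            disagreements += 1
    return disagreements / samples


def f(x: int) -> int:
    """Hypothetical corrupted data: the identity, except multiples of 4 are set to -1."""
    return -1 if x % 4 == 0 else x


def g(x: int) -> int:
    """Stand-in for a filter's monotone output (here simply the identity)."""
    return x


# True distance is 1/4; the estimate concentrates around it as samples grow.
est = estimate_distance(f, g, domain_size=10_000, samples=2_000, rng=random.Random(0))
```

Each sample costs one oracle call to f and one filter query, so the total cost is the number of samples times the per-query time of the filter, independent of the domain size apart from that per-query cost.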
4 Our results
In this work, we construct a local filter for monotonicity for functions defined on $[n]^d$ with the
following performance:
• The time per query is $(\log n)^{O(d)}$.
• The error blow-up is $2^{O(d^2)}$, independent of n.
• The number of random bits needed to initialize the filter is $(d \log n)^{O(1)}$.
The online filter for monotonicity of [3] has a running time per query of $(\log n)^{O(d)}$ (with a better
constant in the exponent) and an error blow-up of $2^d$. We see that our filter achieves local behavior
while having query time and error blow-up that are similar to (but not quite as good as) those
obtained by [3].
Our filter for monotonicity builds on techniques used for property testing of monotonicity. There
has been a large amount of work done on property testing, which was defined in [23,28]. Many
testers have been given for a wide variety of combinatorial, algebraic, and geometric problems (see
surveys [17, 21, 27]). The related notions of tolerant testing and distance approximation were
introduced in [26]. The problem of monotonicity in the context of property testing has been studied
in [1,10,11,14,15,18,19,22,24]. Sublinear algorithms for approximating the distance of a function
to monotonicity have been given in [2,16,26].
Both the running time and error blow-up of our filter have an exponential dependence on the
dimension d. We also prove that this dependence is unavoidable. Specifically we show the following
for some constant 0 < α < 1: given a filter on the boolean hypercube $\{0,1\}^d$ that answers queries
within time $2^{\alpha d}$, there is an input function f such that the filter applied to f has error blow-up $2^{\alpha d}$
with probability close to 1/2. This shows a complexity gap between testing and reconstruction for the
hypercube, since there are monotonicity testers with only a polynomial dependence on d [14,16,22].
5 Overview of the local filter for monotonicity
We now discuss some of the ideas used in constructing the local filter for monotonicity. Details of
the construction and analysis can be found in the full paper.
The starting point for the construction of our local filter for monotonicity is the online filter
of [3]. We now give the main ideas of their construction, and indicate the difficulties in making their
construction local. In the discussion below, when we say an algorithm is “fast”, we mean that it
runs in time polylogarithmic in |Γ|.
We start with the case d = 1, i.e., the one-dimensional case. The basic idea (implicitly used)
in [3] is to classify the domain points as accepted and rejected in such a way that the following
conditions hold:
(1) There is a fast algorithm for testing whether a given point is accepted or rejected.