Round-Competitive Algorithms for Uncertainty
Problems with Parallel Queries
Thomas Erlebach
School of Informatics, University of Leicester, UK
Michael Hoffmann
School of Informatics, University of Leicester, UK
Murilo Santos de Lima¹
School of Informatics, University of Leicester, UK
Abstract
The area of computing with uncertainty considers problems where some information about the input elements is uncertain, but can be obtained using queries. For example, instead of the weight of an element, we may be given an interval that is guaranteed to contain the weight, and a query can be performed to reveal the weight. While previous work has considered models where queries are asked either sequentially (adaptive model) or all at once (non-adaptive model), and the goal is to minimize the number of queries that are needed to solve the given problem, we propose and study a new model where k queries can be made in parallel in each round, and the goal is to minimize the number of query rounds. We use competitive analysis and present upper and lower bounds on the number of query rounds required by any algorithm in comparison with the optimal number of query rounds. Given a set of uncertain elements and a family of m subsets of that set, we present an algorithm for determining the value of the minimum of each of the subsets that requires at most (2 + ε)·opt_k + O((1/ε)·lg m) rounds for every 0 < ε < 1, where opt_k is the optimal number of rounds, as well as nearly matching lower bounds. For the problem of determining the i-th smallest value and identifying all elements with that value in a set of uncertain elements, we give a 2-round-competitive algorithm. We also show that the problem of sorting a family of sets of uncertain elements admits a 2-round-competitive algorithm and this is the best possible.
2012 ACM Subject Classification Theory of computation → Design and analysis of algorithms; Mathematics of computing → Discrete mathematics; Theory of computation → Theory and algorithms for application domains
Keywords and phrases online algorithms, competitive analysis, explorable uncertainty, parallel algorithms, minimum problem, selection problem
Related Version An extended abstract is to appear in the proceedings of the 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021).
Funding This research was supported by EPSRC grant EP/S033483/1.
Acknowledgements We would like to thank Markus Jablonka for helpful discussions.
1 Introduction
Motivated by real-world applications where only rough information about the input data is initially available but precise information can be obtained at a cost, researchers have considered a range of uncertainty problems with queries [7, 13, 14, 15, 16, 19, 26]. This research area has also been referred to as queryable uncertainty [12] or explorable uncertainty [17]. For example, in the input to a sorting problem, we may be given for each input element, instead of its precise value, only an interval containing that value. Querying
¹ Corresponding author.
arXiv:2101.05032v2 [cs.DS] 14 Jan 2021
an element reveals its precise value. The goal is to make as few queries as possible until enough information has been obtained to solve the sorting problem, i.e., to determine a linear order of the input elements that is consistent with the linear order of the precise values. Motivation for explorable uncertainty comes from many different areas (see [12] and the references given there for further examples): The uncertain input elements may, e.g., be locations of mobile nodes or approximate statistics derived from a distributed database cache [29]. Exact information can be obtained at a cost, e.g., by requesting GPS coordinates from a mobile node, by querying the master database or by a distributed consensus algorithm.
The main model that has been studied in the explorable uncertainty setting is the adaptive query model: The algorithm makes queries one by one, and the results of previous queries can be taken into account when determining the next query. The number of queries made by the algorithm is then compared with the best possible number of queries for the given input (i.e., the minimum number of queries sufficient to solve the problem) using competitive analysis [5]. An algorithm is ρ-query-competitive (or simply ρ-competitive) if it makes at most ρ times as many queries as an optimal query set. A very successful algorithm design paradigm in this area is based on the concept of witness sets [7, 14]. A witness set is a set of input elements for which it is guaranteed that every query set that solves the problem contains at least one query in that set. If a problem admits witness sets of size at most ρ, one obtains a ρ-query-competitive algorithm by repeatedly finding a witness set and querying all its elements.
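This paradigm is easy to state in code. The sketch below is our own illustration, not from the paper: it assumes intervals are given as (lo, hi) pairs with a `values` oracle standing in for queries, and instantiates the paradigm for sorting a single set, where every overlapping pair of intervals is a witness set of size 2.

```python
def sort_by_witness_sets(intervals, values):
    """Adaptive (k = 1) witness-set loop for sorting one set of open intervals.

    intervals: dict id -> (lo, hi); values: dict id -> precise value.
    Returns (sorted ids, number of queries made)."""
    known = {}  # revealed precise values

    def current(i):  # interval as currently known (queried -> trivial)
        return (known[i], known[i]) if i in known else intervals[i]

    def find_witness_pair():
        ids = sorted(intervals)
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                (la, ua), (lb, ub) = current(ids[a]), current(ids[b])
                if ua > lb and ub > la:  # overlapping pair = witness set
                    return ids[a], ids[b]
        return None

    queries = 0
    while (pair := find_witness_pair()) is not None:
        for i in pair:  # query every element of the witness set
            if i not in known:
                known[i] = values[i]
                queries += 1
    # no dependencies remain, so ordering by left endpoint / revealed value is safe
    order = sorted(intervals, key=lambda i: current(i)[0])
    return order, queries
```

Since each witness set of size 2 contains at least one query of every feasible query set, this loop makes at most twice the optimal number of queries.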
Some work has also considered the non-adaptive query model (see, e.g., [15, 28, 29]), where all queries are made simultaneously and the set of queries must be chosen in such a way that they certainly reveal sufficient information to solve the problem. In the non-adaptive query model, one is interested in complexity results and approximation algorithms.
In settings where the execution of a query takes a non-negligible amount of time and there are sufficient resources to execute a bounded number of queries simultaneously, the query process can be completed faster if queries are not executed one at a time, but in rounds with k simultaneous queries. Such scenarios include, e.g., IoT environments (such as drones measuring geographic data), or teams of interviewers doing market research. Apart from being well motivated from an application point of view, this variation of the model is also theoretically interesting because it poses new challenges in selecting a useful set of k queries to be made simultaneously. Somewhat surprisingly, however, this has not been studied yet.
In this paper, we address this gap and analyze for the first time a model where the algorithm can make up to k queries per round, for a given value k. The query results from previous rounds can be taken into account when determining the queries to be made in the next round. Instead of minimizing the total number of queries, we are interested in minimizing the number of query rounds, and we say that an algorithm is ρ-round-competitive if, for any input, it requires at most ρ times as many rounds as the optimal query set.
A main challenge in the setting with k queries per round is that the witness set paradigm alone is no longer sufficient for obtaining a good algorithm. For example, if a problem admits witness sets with at most 2 elements, this immediately implies a 2-query-competitive algorithm for the adaptive model, but only a k-round-competitive algorithm for the model with k queries per round. (The algorithm is obtained by simply querying one witness set in each round, and not making use of the other k − 2 available queries.) The issue is that, even if one can find a witness set of size at most ρ, the identity of subsequent witness sets may depend on the outcome of the queries for the first witness set, and hence we may not know how to compute a number of different witness sets that can fill a query round if k ≫ ρ.
Our contribution. Apart from introducing the model of explorable uncertainty with k queries per round, we study several problems in this model: Minimum, Selection and Sorting. For Minimum (or Sorting), we assume that the input can be a family S of subsets of a given ground set I of uncertain elements, and that we want to determine the value of the minimum of (or sort) all those subsets. For Selection, we are given a set I of n uncertain elements and an index i ∈ {1, . . . , n}, and we want to determine the i-th smallest value of the n precise values, and all the elements of I whose value is equal to that value.
Our main contribution lies in our results for the Minimum problem. We present an algorithm that requires at most (2 + ε)·opt_k + O((1/ε)·lg m) rounds, for every 0 < ε < 1, where opt_k is the optimal number of rounds and m = |S|. (The execution of the algorithm does not depend on ε, so the upper bound holds in particular for the best choice of 0 < ε < 1 for given opt_k and m.) Interestingly, our algorithm follows a non-obvious approach that is reminiscent of primal-dual algorithms, but no linear programming formulation features in the analysis. For the case that the sets in S are disjoint, we obtain some improved bounds using a more straightforward algorithm. We also give lower bounds that apply even to the case of disjoint sets, and show that our upper bounds are close to best possible. Note that the Minimum problem is equivalent to the problem of determining the maximum element of each of the sets in S, e.g., by simply negating all the numbers involved. A motivation for studying the Minimum problem thus arises from the minimum spanning tree problem with uncertain edge weights [11, 14, 17, 26]: Determining the maximum-weight edge of each cycle of a given graph allows one to determine a minimum spanning tree. Therefore, there is a connection between the problem of determining the maximum of each set in a family of possibly overlapping sets (which could be the edge sets of the cycles of a given graph) and the minimum spanning tree problem. The minimum spanning tree problem with uncertain edge weights has not been studied yet for the model with k queries per round, and seems to be difficult for that setting. In particular, it is not clear in advance for which cycles of the graph a maximum-weight edge actually needs to be determined, and this makes it very difficult to determine a set of k queries that are useful to be asked in parallel. We hope that our results for Minimum provide a first step towards solving the minimum spanning tree problem.
Another motivation for solving multiple possibly overlapping sets comes from distributed database caches [29], where one wants to answer database queries using cached local data and a minimum number of queries to the master database. Values in the local database cache may be uncertain, and exact values can be obtained by communicating with the central master database. Different database queries might ask for the record with minimum value in the field with uncertain information among a set of database records satisfying certain criteria, or for a list of such database records sorted by the field with uncertain information. Answering such database queries while making a minimum number of queries for exact values to the master database corresponds to the Minimum and Sorting problems we consider.
For the Selection problem, we obtain a 2-round-competitive algorithm. For Sorting,
we show that there is a 2-round-competitive algorithm, by adapting ideas from a recent
algorithm for sorting in the standard adaptive model [21], and that this is best possible.
We also discuss the relationship between our model and another model of parallel queries
proposed by Meißner [27], and we give general reductions between both settings.
Literature overview. The seminal paper on minimizing the number of queries to solve a problem on uncertainty intervals is by Kahan [22]. Given n elements in uncertainty intervals, he presented optimal deterministic adaptive algorithms for finding the maximum, the median, the closest pair, and for sorting. Olston and Widom [29] proposed a distributed database system which exploits uncertainty intervals to improve performance. They gave non-adaptive algorithms for finding the maximum, the sum, the average and for counting problems. They also considered the case in which errors are allowed within a given bound, so a trade-off between performance and accuracy can be achieved. Khanna and Tan [23] extended this previous work by investigating adaptive algorithms for the situation in which bounded errors are allowed. They also considered the case in which query costs may be non-uniform, and presented results for the selection, sum and average problems, and for compositions of such functions. Feder et al. [16] studied the generalized median/selection problem, presenting optimal adaptive and non-adaptive algorithms. They proved that those are the best possible adaptive and non-adaptive algorithms, respectively, instead of evaluating them from a competitive analysis perspective. They also investigated the price of obliviousness, which is the ratio between the non-adaptive and adaptive strategies.
After this initial foundation, many classic discrete problems were studied in this framework, including geometric problems [7, 9], shortest paths [15], network verification [4], minimum spanning tree [11, 14, 17, 26], cheapest set and minimum matroid base [13, 28], linear programming [25, 30], traveling salesman [32], knapsack [19], and scheduling [2, 3, 10]. The concept of witness sets was proposed by Bruce et al. [7], and identified as a pattern in many algorithms by Erlebach and Hoffmann [12]. Gupta et al. [20] extended this framework to the setting where a query may return a refined interval, instead of the exact value of the element.
The problem of sorting uncertain data has received some attention recently. Halldórsson and de Lima [21] presented better query-competitive algorithms, by using randomization or assumptions on the underlying graph structure. Other related work on sorting has considered sorting with noisy information [1, 6] or preprocessing the uncertain intervals so that the actual numbers can be sorted efficiently once their precise values are revealed [31].
The idea of performing multiple queries in parallel was also investigated by Meißner [27]. Her model is different, however. Each round/batch can query an unlimited number of intervals, but at most a fixed number of rounds can be performed. The goal is to minimize the total number of queries. Meißner gave results for selection, sorting and minimum spanning tree problems. We discuss this model in Section 6. A similar model was also studied by Canonne and Gur for property testing [8].
Organization of the paper.
We present some definitions and preliminary results in Section 2.
Sections 3, 4 and 5 are devoted to the sorting, minimum and selection problems, respectively.
In Section 6, we discuss the relationship between the model we study and the model of
Meißner for parallel queries [27]. We conclude in Section 7.
2 Preliminaries and Definitions
For the problems we consider, the input consists of a set of n continuous uncertainty intervals I = {I_1, . . . , I_n} on the real line. The precise value of each data item is v_i ∈ I_i, which can be learnt by performing a query; formally, a query on I_i replaces this interval with {v_i}. We wish to solve the given problem by performing the minimum number of queries (or query rounds). We say that a closed interval I_i = [ℓ_i, u_i] is trivial if ℓ_i = u_i; clearly I_i = {v_i}, so trivial intervals never need to be queried. Some problems require that intervals are either open or trivial; we will discuss this in further detail when addressing each problem. For a given realization v_1, . . . , v_n of the precise values, a set Q ⊆ I of intervals is a feasible query set if querying Q is enough to solve the given problem (i.e., to output a solution that can
be proved correct based only on the given intervals and the answers to the queries in Q), and an optimal query set is a feasible query set of minimum size. Since the precise values are initially unknown to the algorithm and can be defined adversarially, we have an online exploration problem [5]. We fix an optimal query set OPT_1, and we write opt_1 := |OPT_1|. An algorithm which performs up to ρ·opt_1 queries is said to be ρ-query-competitive. Throughout this paper, we only consider deterministic algorithms.
In previous work on the adaptive model, it is assumed that queries are made sequentially, and the algorithm can take the results of all previous queries into account when deciding the next query. We consider a model where queries are made in rounds and we can perform up to k queries in parallel in each round. The algorithm can take into account the results from all queries made in previous rounds when deciding which queries to make in the next round. The adaptive model with sequential queries is the special case of our model with k = 1. We denote by opt_k the optimal number of rounds to solve the given instance. Note that opt_k = ⌈opt_1/k⌉, as OPT_1 only depends on the input intervals and their precise values and can be distributed into rounds of k queries arbitrarily. For an algorithm ALG we denote by ALG_1 the number of queries it makes, and by ALG_k the number of rounds it uses. An algorithm which solves the problem in up to ρ·opt_k rounds is said to be ρ-round-competitive. A query performed by an algorithm that is not in OPT_1 is called a wasted query, and we say that the algorithm wastes that query; a query performed by an algorithm that is not wasted is useful.
Proposition 2.1. If an algorithm makes all queries in OPT_1, wastes w queries in total over all rounds excluding the final round, always makes k queries per round except possibly in the final round, and stops as soon as the queries made so far suffice to solve the problem, then its number of rounds will be ⌈(opt_1 + w)/k⌉ ≤ opt_k + ⌈w/k⌉.
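The counting in the proposition can be sanity-checked numerically; the helper names below are our own, not from the paper.

```python
import math

# An algorithm that makes all opt_1 useful queries plus w wasted ones,
# k per full round, uses ceil((opt_1 + w) / k) rounds in total.
def rounds_used(opt1, w, k):
    return math.ceil((opt1 + w) / k)

# The proposition's bound: opt_k + ceil(w / k), with opt_k = ceil(opt_1 / k),
# since the optimum can pack its queries into rounds arbitrarily.
def proposition_bound(opt1, w, k):
    opt_k = math.ceil(opt1 / k)
    return opt_k + math.ceil(w / k)
```

The inequality follows from the superadditivity of the ceiling: ⌈(a + b)/k⌉ ≤ ⌈a/k⌉ + ⌈b/k⌉.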
The problems we consider are Minimum, Sorting and Selection. For Minimum and Sorting, we assume that we are given a set I of n intervals and a family S of m subsets of I. For Sorting, the task is to output, for each set S ∈ S, an ordering of the elements in S that is consistent with the order of their precise values. For Minimum, the task is to output, for each S ∈ S, an element whose precise value is the minimum of the precise values of all elements in S, along with the value of that element.²
Regarding the family S, we can distinguish the cases where S contains a single set, where all sets in S are pairwise disjoint, and the case where the sets in S may overlap, i.e., may have common elements. For Selection, we are given a set I of n intervals and an index i ∈ {1, . . . , n}. The task is to output the i-th smallest value v (i.e., the value in position i in a sorted list of the precise values of the n intervals), as well as the set of intervals whose precise value equals v. We also discuss briefly a variant of Minimum in which we seek all elements whose precise value is the minimum and a variant of Selection in which we only seek the value v.
For a better understanding of the problems, we give a simple example for Sorting with k = 1. We have a single set with two intersecting intervals. There are four different configurations of the realizations of the precise values, which are shown in Figure 1. In Figure 1a, it is enough to query I_1 to learn that v_1 < v_2; however, if an algorithm first queries I_2, it cannot decide the order, so it must query I_1 as well. In Figure 1b we have a symmetric situation. In Figure 1c, both intervals must be queried (i.e., the only feasible query set is {I_1, I_2}), otherwise it is not possible to decide the order. Finally, in Figure 1d it is
² In some of the literature, it is only required to identify the element with minimum value. Returning the precise minimum value, however, is also an important problem, as discussed in [26, Section 7] for the minimum spanning tree problem.
Figure 1 Example of Sorting for two intervals and the possible realizations of the precise values. We have that opt_1 = 1 in (a), (b) and (d), and opt_1 = 2 in (c).
enough to query either I_1 or I_2; hence, both {I_1} and {I_2} are feasible query sets. Since those realizations are initially indistinguishable to the algorithm, this example shows that no deterministic algorithm can be better than 2-query-competitive, and this example can be generalized by taking multiple copies of the given structure. For Minimum, however, an optimum solution can always be obtained by first querying I_1 (and then I_2 only if necessary): Since we need the precise value of the minimum element, in Figure 1b it is not enough to just query I_2.
3 Sorting
In this section we discuss the Sorting problem. We allow open, half-open, closed, and trivial intervals in the input, i.e., I_i can be of the form [ℓ_i, u_i] with ℓ_i ≤ u_i, or (ℓ_i, u_i], [ℓ_i, u_i) or (ℓ_i, u_i) with ℓ_i < u_i.
First, we consider the case where S consists of a single set S, which we can assume to contain all n of the given intervals. We wish to find a permutation π : [n] → [n] such that v_i ≤ v_j if π(i) < π(j), by performing the minimum number of queries possible. This problem was addressed for k = 1 in [21, 22, 27]; it admits 2-query-competitive deterministic algorithms and has a deterministic lower bound of 2.
For Sorting, if two intervals I_i = [ℓ_i, u_i] and I_j = [ℓ_j, u_j] are such that I_i ∩ I_j = {u_i} = {ℓ_j}, then we can put them in a valid order without any further queries, because clearly v_i ≤ v_j. Therefore, we say that two intervals I_i and I_j intersect (or are dependent) if either their intersection contains more than one point, or if I_i is trivial and v_i ∈ (ℓ_j, u_j) (or vice versa). This is equivalent to saying that I_i and I_j are dependent if and only if u_i > ℓ_j and u_j > ℓ_i. Two simple facts are important to notice, which are proven in [21]:
For any pair of intersecting intervals, at least one of them must be queried in order to decide their relative order; i.e., any intersecting pair is a witness set.
The dependency graph that represents this relation, with a vertex for each interval and an edge between intersecting intervals, is an interval graph [24].
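The dependency criterion translates directly into code. The sketch below is our own illustration, with an assumed representation of intervals as (lo, hi) pairs (trivial when both endpoints coincide).

```python
def dependent(a, b):
    """Intervals a = (la, ua) and b = (lb, ub) are dependent (their relative
    order cannot be fixed without a query) iff ua > lb and ub > la."""
    (la, ua), (lb, ub) = a, b
    return ua > lb and ub > la

def dependency_graph(intervals):
    """Adjacency sets over indices; for interval inputs this is an interval graph."""
    adj = {i: set() for i in range(len(intervals))}
    for i in range(len(intervals)):
        for j in range(i + 1, len(intervals)):
            if dependent(intervals[i], intervals[j]):
                adj[i].add(j)
                adj[j].add(i)
    return adj
```

Note how intervals that merely touch at a single endpoint, such as (0, 2) and (2, 5), are not dependent, matching the discussion above.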
We adapt the 2-query-competitive algorithm for Sorting by Halldórsson and de Lima [21] for k = 1 to the case of arbitrary k. Their algorithm first queries all non-trivial intervals in a minimum vertex cover in the dependency graph. By the duality between vertex covers and independent sets, the unqueried intervals form an independent set, so no query is necessary to decide the order between them. However, the algorithm still must query intervals in the independent set that intersect a trivial interval or the value of a queried interval. To adapt the algorithm to the case of arbitrary k, we first compute a minimum vertex cover and fill as many rounds as necessary with the given queries. After the answers to the queries are returned, we use as many rounds as necessary to query the intervals of the remaining independent set that contain a trivial point.
Theorem 3.1. The algorithm of Halldórsson and de Lima [21] yields a 2-round-competitive algorithm for Sorting that runs in polynomial time.
Proof. Any feasible query set is a vertex cover in the dependency graph, due to the fact that at least one interval in each intersecting pair must be queried. Therefore a minimum vertex cover is at most the size of an optimal query set, so the first phase of the algorithm spends at most opt_k rounds. Since all intervals queried in the second phase are in any solution, again we spend at most another opt_k rounds. As the minimum vertex cover problem for interval graphs can be solved in polynomial time [18], the overall algorithm is polynomial as well.
The problem has a lower bound of 2 on the round-competitive factor. This can be shown by having k·c copies of a structure consisting of two dependent intervals, for some c ≥ 1. OPT_1 may query only one interval in each pair, while we can force any deterministic algorithm to query both of them (cf. the configurations shown in Figures 1a and 1b). We have that opt_k = c while any deterministic algorithm will spend at least 2c rounds.
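A minimal sketch of the two-phase algorithm for a single set of open, non-trivial intervals follows. This is our own code, not the authors' implementation: the `values` list plays the adversary, and the vertex-cover step uses the standard earliest-right-endpoint greedy rule for maximum independent sets in interval graphs.

```python
import math

def sort_in_rounds(intervals, values, k):
    """Two-phase round-based sorting, assuming open non-trivial (lo, hi) intervals.
    Returns (sorted indices, number of rounds used)."""
    n = len(intervals)
    # Phase 1: query a minimum vertex cover of the dependency graph; for
    # interval graphs it is the complement of a maximum independent set,
    # which the greedy earliest-right-endpoint scan below computes.
    mis, last_hi = set(), -math.inf
    for i in sorted(range(n), key=lambda i: intervals[i][1]):
        lo, hi = intervals[i]
        if lo >= last_hi:  # independent of every previously chosen interval
            mis.add(i)
            last_hi = hi
    cover = [i for i in range(n) if i not in mis]
    known = {i: values[i] for i in cover}      # queried k per round
    rounds = math.ceil(len(cover) / k)
    # Phase 2: query independent-set intervals containing a revealed value.
    phase2 = [i for i in mis
              if any(intervals[i][0] < v < intervals[i][1] for v in known.values())]
    known.update({i: values[i] for i in phase2})
    rounds += math.ceil(len(phase2) / k)
    # Queried items sort by revealed value, unqueried ones by left endpoint;
    # the tie-break puts a revealed value before an interval starting at it.
    key = lambda i: (known[i], 0) if i in known else (intervals[i][0], 1)
    return sorted(range(n), key=key), rounds
```

Each phase fills rounds with at most k queries, mirroring the ⌈·/k⌉ accounting in the proof above.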
We remark that the 2-query-competitive algorithm for Sorting with k = 1 due to Meißner [27], when adapted to the setting with arbitrary k in the obvious way, only gives a bound of 2·opt_k + 1 rounds. Her algorithm first greedily computes a maximal matching in the dependency graph and queries all non-trivial matched vertices, and then all remaining intervals that contain a trivial point.
Now we study the case of solving a number of problems on different subsets of the same ground set of uncertain elements. In such a setting, it may be better to perform queries that can be reused by different problems, even if the optimum solution for one problem may not query that interval. We can reuse ideas from the algorithms for single problems that rely on the dependency graph. We define a new dependency relation (and dependency graph) in such a way that two intervals are dependent if and only if they intersect and belong to a common set. Note that the resulting graph may not be an interval graph, so some algorithms for single problems may not run in polynomial time for this generalization.
If we perform one query at a time (k = 1), then there are 2-competitive algorithms. One such is the algorithm by Meißner [27] described above; since a maximal matching can be computed greedily in polynomial time for arbitrary graphs, this algorithm runs in polynomial time for non-disjoint problems. If we can make k ≥ 2 queries in parallel, then this algorithm performs at most 2·opt_k + 1 rounds, and the analysis is tight since we may have an incomplete round in between the two phases of the algorithm. If we relax the requirement that the algorithm runs in polynomial time, then we can obtain an algorithm that needs at most 2·opt_k rounds, by first querying non-trivial intervals in a minimum vertex cover of the dependency graph (in as many rounds as necessary) and then the intervals that contain a trivial interval or the value of a queried interval (again, in as many rounds as necessary).
4 The Minimum Problem
For the Minimum problem, we assume without loss of generality that the intervals are sorted by non-decreasing left endpoints; intervals with the same left endpoint can be ordered arbitrarily. The leftmost interval among a subset of I is the one that comes earliest in this ordering. We also assume that all intervals are open or trivial; otherwise the problem has a trivial lower bound of n on the query-competitive ratio [20].
First, consider the case S = {I}, i.e., we have a single set. It is easy to see that the optimal query set consists of all intervals whose left endpoint is strictly smaller than the precise value of the minimum: If I_i with precise value v_i is a minimum element, then all other intervals with left endpoint strictly smaller than v_i must be queried to rule out that their value is smaller than v_i, and I_i must be queried (unless it is a trivial interval) to determine the value of the minimum. The optimal set of queries is hence a prefix of the sorted list of uncertain intervals (sorted by non-decreasing left endpoint). This shows that there is a 1-query-competitive algorithm when k = 1, and a 1-round-competitive algorithm for arbitrary k: In each round we simply query the next k uncertain intervals in the order of non-decreasing left endpoint, until the problem is solved. For k = 1, the same method yields a 1-query-competitive algorithm for the case with several sets: The algorithm can always query an interval with smallest left endpoint for any of the sets that have not yet been solved.³
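The prefix-querying strategy for a single set can be sketched as follows; this is our own code, assuming open intervals given as (lo, hi) pairs and a `values` oracle answering queries.

```python
def minimum_in_rounds(intervals, values, k):
    """Query k unqueried intervals per round, in order of non-decreasing left
    endpoint, until the minimum is certain. Returns (minimum value, rounds)."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    known, rounds = {}, 0
    while True:
        unqueried = [i for i in order if i not in known]
        best = min(known.values(), default=None)
        # Solved once the best revealed value cannot be beaten: an open
        # interval (lo, hi) only contains values strictly above lo.
        if best is not None and all(best <= intervals[i][0] for i in unqueried):
            return best, rounds
        known.update({i: values[i] for i in unqueried[:k]})
        rounds += 1
```

Because the queried intervals always form a prefix of the left-endpoint order, every query made before the final round belongs to the optimal query set, so no round is wasted.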
In the remainder of this section, we consider the case of multiple sets and k > 1. We first present a more general result for potentially overlapping sets, then we give better upper bounds for disjoint sets. At the end of the section, we also present lower bounds.
Let W(x) = x·lg x; the inverse W⁻¹ of W will show up in our analysis. Note that W⁻¹(x) = Θ(x/lg x) (see Appendix A for a proof).
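For intuition about the growth of W⁻¹, it can be evaluated by bisection and compared with x/lg x. This numeric check is our own illustration, not part of the Appendix A proof.

```python
import math

def W(x):
    return x * math.log2(x)

def W_inv(y):
    """Inverse of W on [2, inf), computed by bisection (W is increasing there)."""
    lo, hi = 2.0, 4.0
    while W(hi) < y:       # find an upper bracket
        hi *= 2.0
    for _ in range(200):   # bisect to high precision
        mid = (lo + hi) / 2.0
        if W(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For large y, the ratio W⁻¹(y) / (y / lg y) stays bounded, consistent with W⁻¹(x) = Θ(x/lg x).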
Throughout this section, we assume w.l.o.g. that the optimum must make at least one query in each set (or we consider only sets that require some query). We also assume that any algorithm always discards from each set all elements that are certainly not the minimum of that set, i.e., all elements for which it is already clear based on the available information that their value must be larger than the minimum value of the set (this is where the right endpoints of intervals also need to be considered). We adopt the following terminology. A set in S is solved if we can determine the value of its minimum element. A set is active at the start of a round if the queries made in previous rounds have not solved the set yet. An active set survives a round if it is still active at the start of the next round. An active set that does not survive the current round is said to be solved in the current round.
To illustrate these concepts, let us discuss a first simple strategy to build a query set Q for a round. Let P be the set of intervals queried in previous rounds. The prefix length of an active set S is the length of the maximum prefix of elements from Q in the list of non-trivial intervals in S \ P ordered by non-decreasing left endpoints. The algorithm proceeds by repeatedly adding to Q the leftmost non-trivial element not in Q ∪ P from an arbitrary active set with minimum prefix length. We call this the balanced algorithm, and denote it by BAL. We give an example of its execution in Figure 2, with m = 3 disjoint sets and k = 5. The optimum solution queries the first three elements in S_1 and S_2, and all elements in S_3. Since the algorithm picks an arbitrary active set with minimum prefix length, it may give preference to S_1 and S_2 over S_3, thus wasting one query in S_1 and one in S_2 in round 2. All sets are active at the beginning of round 2; S_1 and S_2 are solved in round 2, while S_3 survives round 2. Since S_1 and S_2 are solved in round 2, they are no longer active in round 3, so the algorithm no longer queries any of their elements.
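One round of BAL can be sketched as follows. This is our own code: each set is assumed to be a list of element identifiers already ordered by non-decreasing left endpoint with trivial intervals removed, and `queried` holds the set P of previously queried elements.

```python
def bal_round(active_sets, queried, k):
    """Build the query set Q for one round of the balanced algorithm BAL."""
    Q = []
    while len(Q) < k:
        best_len, best_elem = None, None
        for S in active_sets:
            rest = [e for e in S if e not in queried]  # the list S \ P
            n = 0
            while n < len(rest) and rest[n] in Q:      # prefix length w.r.t. Q
                n += 1
            if n < len(rest) and (best_len is None or n < best_len):
                best_len, best_elem = n, rest[n]  # leftmost element not in Q or P
        if best_elem is None:  # every active set lies fully inside Q and P
            break
        Q.append(best_elem)
    return Q
```

Elements shared between sets count towards the prefix of every set containing them, yet, as the next subsection shows, this alone does not prevent BAL from wasting many queries on overlapping sets.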
4.1 The Minimum Problem with Arbitrary Sets
We are given a set I of n intervals and a family S of m possibly overlapping subsets of I, and a number k ≥ 2 of queries that can be performed in each round.
Unfortunately, it is possible to construct an instance on which BAL uses as many as k·optk rounds. Let c be a multiple of k. We have m = c·(k − 1) sets, which are divided into c groups of k − 1 sets each. For i = 1, . . . , c, the sets in groups i, . . . , c share the i leftmost elements.³ Furthermore, each set has one extra element which is unique to that set. The precise values are such that each set in the i-th group is solved after querying its first i elements. We give an example in Figure 3 with k = 3 and c = 3. If we let BAL query the intervals in the order given by the indices, it is easy to see that it queries c·k intervals, while the c intervals that are shared by more than one set are enough to solve all sets. In particular, note that BAL does not take into consideration that some elements are shared between different sets. The challenge is how to balance queries between sets in a better way.

³ If we want to determine all elements whose value equals the minimum, it is not hard to see that the optimal set of queries for each set is again a prefix. As all our algorithms require only this property, we obtain corresponding results for that problem variant, even for inputs with arbitrary closed, open and half-open intervals.

Figure 2: Possible execution of BAL for m = 3 disjoint sets and k = 5. Each interval is represented by a box, and the optimum solution is a prefix of each set. The solid boxes are useful queries, the two hatched boxes are wasted queries, and the white boxes are not queried by the algorithm.
Figure 3: Bad instance for BAL with overlapping sets, with k = 3 and c = 3. The sets are S1 = {I1, I2}, S2 = {I1, I3}, S3 = {I1, I4, I5}, S4 = {I1, I4, I6}, S5 = {I1, I4, I7, I8}, S6 = {I1, I4, I7, I9}. BAL will query the following rounds: {I1, I2, I3}, {I4, I5, I6}, {I7, I8, I9}. It is enough to query {I1, I4, I7}.
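The structure of this lower-bound family is easy to generate programmatically. The following sketch (our illustration; the element names P1, P2, … for the shared prefix and U1, U2, … for the unique elements are made up) builds the instance for arbitrary k and c and exhibits why the c shared intervals suffice: the solving prefix of every set consists of shared intervals only.

```python
def bad_instance(k, c):
    """Build the lower-bound family: m = c*(k-1) sets; each set in group i
    (1-based) consists of the i shared leftmost elements P1..Pi followed by
    one unique element, and is solved after its first i elements."""
    shared = [f"P{j}" for j in range(1, c + 1)]
    sets, uid = [], 0
    for i in range(1, c + 1):          # group i contains k-1 sets
        for _ in range(k - 1):
            uid += 1
            sets.append(shared[:i] + [f"U{uid}"])
    return sets, shared
```

For k = c = 3 this reproduces the six sets of Figure 3 up to renaming, and every set's solving prefix is covered by the c = 3 shared queries.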
We give an algorithm that requires at most (2 + ε)·optk + O((1/ε)·lg m) rounds, for every 0 < ε < 1. (The execution of the algorithm does not depend on ε, so the upper bound holds in particular for the best choice of 0 < ε < 1 for the given optk and m.) It is inspired by how some primal-dual algorithms work. The pseudocode for determining the queries to be made in a round is shown in Algorithm 1. First, we try to include the leftmost element of each set in the set of queries Q. If those are not enough to fill a round, then we maintain a variable bi for each set Si, which can be interpreted as a budget for that set. The variables are increased simultaneously at the same rate, until the sets that share a current leftmost unqueried element not in Q have enough budget to buy it. More precisely, at a given point of the execution, for each element e ∈ I \ Q, let Fe contain the indices of the sets that have e as their leftmost unqueried element not in Q. We include e in Q when Σ_{i∈Fe} bi = 1, and then we set bi to zero for all i ∈ Fe. We repeat this process until |Q| = k or there are no unqueried elements in I \ Q.

Algorithm 1: Computing a query round for possibly non-disjoint sets
  Data: family S = {S1, . . . , Sm} of active subsets of the ground set I
  Result: set Q ⊆ I of at most k queries to make
  1  begin
  2    Q ← set of leftmost unqueried elements of all sets in S;
  3    if |Q| ≥ k then
  4      Q ← arbitrary subset of Q with size k;
  5    else
  6      bi ← 0 for all Si ∈ S;
  7      while |Q| < k and there are unqueried elements in I \ Q do
  8        foreach e ∈ I \ Q do
  9          Fe ← {i | e is the leftmost unqueried element from I \ Q in Si};
  10       increase all bi simultaneously at the same rate until there is an
             unqueried element e ∈ I \ Q that satisfies Σ_{i∈Fe} bi = 1;
  11       Q ← Q ∪ {e};
  12       bi ← 0 for all i ∈ Fe;
  13   return Q;
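The continuous budget increase of Algorithm 1 can be simulated exactly by advancing time to the next event at which some element's budgets sum to 1. The following Python sketch (our own event-driven discretization, not the authors' code) does this for a family given as a dict from set id to a list of element ids ordered by left endpoint:

```python
def algorithm1_round(sets, queried, k):
    """One round of the budget scheme: sets maps id -> ordered element list;
    queried holds elements queried in previous rounds."""
    def leftmost(i, blocked):
        return next((e for e in sets[i]
                     if e not in queried and e not in blocked), None)

    Q = set()                                  # line 2: leftmost of each set
    for i in sets:
        e = leftmost(i, set())
        if e is not None:
            Q.add(e)
    if len(Q) >= k:
        return set(sorted(Q)[:k])              # arbitrary subset of size k
    b = {i: 0.0 for i in sets}                 # budgets
    while len(Q) < k:
        F = {}                                 # e -> sets whose leftmost is e
        for i in sets:
            e = leftmost(i, Q)
            if e is not None:
                F.setdefault(e, []).append(i)
        if not F:
            break                              # no unqueried elements remain
        # time until the budgets of some element's sets sum to 1
        t, e = min((max(0.0, (1 - sum(b[i] for i in F[x])) / len(F[x])), x)
                   for x in F)
        for i in b:
            b[i] += t                          # simultaneous increase
        Q.add(e)
        for i in F[e]:
            b[i] = 0.0                         # the buyers are reset
    return Q
```

Note how an element shared by two sets is bought at time 1/2, before any element wanted by a single set; this is exactly the bias toward shared elements that BAL lacks.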
When a query e is added to Q, we say that it is charged to the sets Si with i ∈ Fe. The amount of charge for set Si is equal to the value of bi just before bi is reset to 0 after adding e to Q. We also say that the set Si pays this amount for e.
Definition 4.1. Let ε > 0. A round is ε-good if at least k/2 of the queries made by Algorithm 1 are also in OPT1 (i.e., are useful queries), or if at least a/r active sets are solved in that round, where a is the number of active sets at the start of the round and r = (2(1 + ε) + √(2ε² + 4ε + 4))/ε. A round that is not ε-good is called ε-bad.

Note that r > 2 for any ε > 0.
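The closed form of r is chosen exactly so that the counting of rich surviving sets in the proof of Lemma 4.2 below comes out to 2a/(2 + ε). Under our reading of the definition, r = (2(1 + ε) + √(2ε² + 4ε + 4))/ε, this can be verified numerically:

```python
import math

def r_of(eps):
    # r as in Definition 4.1 (our reading of the garbled closed form)
    return (2 * (1 + eps) + math.sqrt(2 * eps**2 + 4 * eps + 4)) / eps

for eps in (0.1, 0.5, 0.9):
    r = r_of(eps)
    assert r > 2
    # identity used to count rich surviving sets:
    # (1 - 1/r)·a - a/(r-2) = 2a/(2+ε)
    assert math.isclose(1 - 1/r - 1/(r - 2), 2 / (2 + eps))
```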
Lemma 4.2. If a round is ε-bad, then Algorithm 1 will make at least 2k/(2 + ε) useful queries in the following round.
Proof. Let a denote the number of active sets at the start of an ε-bad round. Let s be the number of sets that are solved in the current round; note that s < a/r because the current round is ε-bad. Let T be the total amount by which each value bi has increased during the execution of Algorithm 1. If the simultaneous increase of all bi is interpreted as time passing, then T corresponds to the point in time when the computation of the set Q has been completed. For example, if some set Si did not pay for any element during the whole execution, then T is equal to the value of bi at the end of the execution of Algorithm 1.
Let Q be the set of queries that Algorithm 1 makes in the current round. We claim that every wasted query in Q is charged only to sets that are solved in this round. Consider a wasted query e that is in some set Sj not solved in this round. At the time e was selected, j cannot have been in Fe, because otherwise e would be a useful query. Therefore, we do not charge e to Sj.
The total number of wasted queries is therefore bounded by T·s, as these queries are paid for by the s sets solved in this round. As the number of wasted queries in a bad round is larger than k/2, we therefore have T·s > k/2. As s < a/r, we get k/2 < T·a/r, so T > (r/2)·(k/a).
Call a surviving set Si rich if bi > k/a when the computation of Q is completed. A set that is not rich is called poor. Note that a poor set must have spent at least an amount of (r/2 − 1)·(k/a) > 0, as its total budget would be at least T > (r/2)·(k/a) if it had not paid for any queries. As the poor sets have paid for fewer than k/2 elements in total (as there are fewer than k/2 useful queries in the current round), the number of poor sets is bounded by (k/2)/((r/2 − 1)·(k/a)) = a/(r − 2) > 0. As there are more than (1 − 1/r)·a surviving sets and at most a/(r − 2) of them are poor, there are at least (1 − 1/r)·a − a/(r − 2) = ((r − 2)(r − 1) − r)/(r(r − 2))·a = 2a/(2 + ε) > 0 surviving sets that are rich.
Let e be any element that is the leftmost unqueried element (at the end of the current round) of a rich surviving set. If e were the leftmost unqueried element of more than a/k rich surviving sets, those sets would have been able to pay for e (because their total remaining budget would be greater than (k/a)·(a/k) = 1) before the end of the execution of Algorithm 1, a contradiction to e not being included in Q. Hence, the number of distinct leftmost unqueried elements of the at least 2a/(2 + ε) rich surviving sets is at least (2a/(2 + ε))/(a/k) = 2k/(2 + ε). So the following round will query at least 2k/(2 + ε) elements that are the leftmost unqueried element of an active set, and all those are useful queries that are made in the next round.
Theorem 4.3. Let optk denote the optimal number of rounds and Ak the number of rounds made if the queries are determined using Algorithm 1. Then, for every 0 < ε < 1, Ak ≤ (2 + ε)·optk + O((1/ε)·lg m).
Proof. In every round, at least one of the following must hold:
- The algorithm makes at least k/2 useful queries.
- The algorithm solves at least a fraction of 1/r of the active sets.
- The round is ε-bad, in which case the algorithm makes at least 2k/(2 + ε) useful queries in the following round (by Lemma 4.2).

The number of rounds in which the algorithm solves at least a fraction of 1/r of the active sets is bounded by log_{r/(r−1)} m = O((1/ε)·lg m), since 1/lg(r/(r − 1)) < 5/ε for 0 < ε < 1. In every round where the algorithm does not solve at least a fraction of 1/r of the active sets, the algorithm makes at least k/(2 + ε) useful queries on average (if in any such round it makes fewer than k/2 useful queries, it makes 2k/(2 + ε) useful queries in the following round). The number of such rounds is therefore bounded by (2 + ε)·optk.
We do not know whether this analysis is tight; investigating this question would be worthwhile.
4.2 The Minimum Problem with Disjoint Sets
We now consider the case where k ≥ 2 and the m sets in the given family S are pairwise disjoint. For this case, it turns out that the balanced algorithm achieves good upper bounds.
Theorem 4.4. BALk ≤ optk + O(lg min{k, m}).
Proof. First we prove the bound for m ≤ k. Index the sets in such a way that Si is the i-th set that is solved by BAL, for 1 ≤ i ≤ m. Sets that are solved in the same round are ordered by non-decreasing number of queries made in them in that round by BAL. In the round when Si is solved, there are at least m − (i − 1) active sets, so the number of wasted queries in Si is at most k/(m − (i − 1)). (BAL makes at most ⌈k/(m − (i − 1))⌉ queries in Si, and at least one of these is not wasted.) The total number of wasted queries is then at most Σ_{i=1}^m k/(m − (i − 1)) = Σ_{i=1}^m k/i = k·H(m), where H(m) denotes the m-th Harmonic number. By Proposition 2.1, BALk ≤ optk + O(lg m).

If m > k, observe that the algorithm does not waste any queries until the number of active sets is at most k. From that point on, it wastes at most k·H(k) queries following the arguments in the previous paragraph, so the number of rounds is bounded by optk + O(lg k).
We now give a more refined analysis that provides a better bound for optk = 1, as well as a better multiplicative bound than what would follow from Theorem 4.4.
Lemma 4.5. If optk = 1, then BALk ≤ O(lg m / lg lg m).
Proof. Consider an arbitrary instance of the problem with optk = 1. Let R + 1 be the number of rounds needed by the algorithm. For each of the first R rounds, we consider the fraction bi of active sets that are not solved in that round. More formally, for the i-th round, for 1 ≤ i ≤ R, if ai denotes the number of active sets at the start of round i and ai+1 the number of active sets at the end of round i, then we define bi = ai+1/ai.

Consider round i, 1 ≤ i ≤ R. A set that is active at the start of round i and is still active at the start of round i + 1 is called a surviving set. A set that is active at the start of round i and gets solved by the queries made in round i is called a solved set. For each surviving set, all queries made in that set in round i are useful. For each solved set, at least one query made in that set is useful. We claim that this implies that the algorithm makes at least k·bi useful queries in round i. To see this, observe that if the algorithm makes k/ai queries in a surviving set and k/ai queries in a solved set, we can conceptually move one useful query from the solved set to the surviving set. After this, the ai+1 surviving sets contain at least k/ai useful queries on average, and hence ai+1·k/ai = bi·k useful queries in total.

As OPT1 must make all useful queries and makes at most k queries in total, we have that Σ_{i=1}^R k·bi ≤ opt1 ≤ k, so Σ_{i=1}^R bi ≤ 1. Furthermore, as there are m active sets initially and there is still at least one active set after round R, we have that Π_{i=1}^R bi = aR+1/a1 ≥ 1/m.

To get an upper bound on R, we need to determine the largest possible value of R for which there exist values bi > 0 for 1 ≤ i ≤ R satisfying Σ_{i=1}^R bi ≤ 1 and Π_{i=1}^R bi ≥ 1/m. We gain nothing from choosing bi with Σ_{i=1}^R bi < 1, so we can assume Σ_{i=1}^R bi = 1. In that case, the value of Π_{i=1}^R bi is maximized if we set all bi equal, namely bi = 1/R. So we need to determine the largest value of R that satisfies Π_{i=1}^R 1/R ≥ 1/m, or equivalently R^R ≤ m, or R·lg R ≤ lg m. This shows that R ≤ W1(lg m) = O(lg m / lg lg m).
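The final bound simply inverts R·lg R ≤ lg m. A small sketch (ours) computes the largest feasible R directly, which also illustrates the O(lg m / lg lg m) growth:

```python
def max_rounds(m):
    """Largest integer R with R**R <= m, i.e. R·lg R <= lg m."""
    R = 1
    while (R + 1) ** (R + 1) <= m:
        R += 1
    return R
```

For example, m = 10^6 admits at most R = 7 such rounds, since 7^7 = 823543 ≤ 10^6 < 8^8.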
Corollary 4.6. If optk = 1, then BALk ≤ O(lg k / lg lg k).
Proof. If k ≥ m, then the corollary follows from Lemma 4.5. If k < m, there can be at most k active sets, because the optimum performs at most k queries since optk = 1. Hence, we only need to consider these k sets and can apply Lemma 4.5 with m = k.
Now we wish to extend these bounds to arbitrary optk. It turns out that we can reduce the analysis for an instance with arbitrary optk to the analysis for an instance with optk = 1, assuming that BAL is implemented in a round-robin fashion. A formal description of such an implementation is as follows: fix an arbitrary order of the m sets of the original problem instance as S1, S2, . . . , Sm, and consider it as a cyclic order where the set after Sm is S1. In each round, BAL distributes the k queries to the active sets as follows. Let i be the index of the set to which the last query was distributed in the previous round (or let i = m if we are in the first round). Then initialize Q = ∅ and repeat the following step k times. Let j be the first index after i such that Sj is active and has unqueried non-trivial elements that are not in Q; pick the leftmost unqueried non-trivial element in Sj \ Q, insert it into Q, and set i = j. The resulting set Q is then queried.
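A sketch (ours) of this round-robin implementation, with 0-based indices: sets is the fixed cyclic list S_1, …, S_m of element lists, and pending_i is the index of the set that received the last query (m − 1 before the first round).

```python
def round_robin_round(sets, active, queried, pending_i, k):
    """Distribute k queries round-robin over the active sets.

    Returns (Q, i): the round's query list and the last served index."""
    m = len(sets)
    Q, i = [], pending_i
    for _ in range(k):
        for step in range(1, m + 1):          # next eligible set after i
            j = (i + step) % m
            if j in active:
                e = next((x for x in sets[j]
                          if x not in queried and x not in Q), None)
                if e is not None:             # leftmost unqueried element
                    Q.append(e)
                    i = j
                    break
        else:
            break                             # no set has unqueried elements
    return Q, i
```

The key point used in Lemma 4.7 is that this choice sequence depends only on the fixed cyclic order and on which elements are already queried, not on k.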
Lemma 4.7. Assume that BAL distributes queries to active sets in a round-robin fashion. If BALk ≤ ρ for instances with optk = 1, with ρ independent of k, then BAL is ρ-round-competitive for arbitrary instances.
Proof. Let L = (I, S) be an instance with optk(L) = t. Note that opt1(L) ≤ tk. Consider the instance L′ which is identical to L except that the number of queries per round is k′ = tk. Use BAL′ to refer to the solution computed by BAL for the instance L′ (and also to the algorithm BAL when it is executed on instance L′). Note that optk′(L′) = 1 as opt1(L′) = opt1(L). By our assumption, BAL′k′ ≤ ρ. We claim that this implies BALk ≤ ρt.
To establish the claim, we compare the situation when BAL′ has executed x rounds on L′ with the situation when BAL has executed xt rounds on L. We claim that the following two invariants hold for every x:
(1) The number of remaining active sets of BAL is at most that of BAL′.
(2) BAL has made at least as many queries in each active set as BAL′.

For a proof of these invariants, note that BAL and BAL′ distribute queries to sets in the same round-robin order, the only difference being that BAL performs a round of queries whenever k queries have been distributed, while BAL′ only performs a round of queries whenever kt queries have accumulated. Imagine the process by which both algorithms pick queries as if it were executed in parallel, with both of the algorithms choosing one query in each step. The only case where BAL and BAL′ can distribute the next query to a different set is when BAL′ distributes the next query to a set Si that is no longer active for BAL (or all of whose non-trivial unqueried elements have already been added by BAL to the set of queries to be performed in the next round). This can happen because BAL may have already made some of the queries that BAL′ has distributed to sets but not yet performed. If this happens, BAL will select for the next query an element of a set that comes after Si in the cyclic order, so it will move ahead of BAL′ (i.e., it chooses a query now that BAL′ will only choose in a later step). Hence, at any step during this process, BAL either picks the same next query as BAL′ or is ahead of BAL′. This shows that if the invariants hold when BAL and BAL′ have executed xt and x rounds, respectively, then they also hold after they have executed (x + 1)t and x + 1 rounds, respectively. As the invariants clearly hold for x = 0, it follows that they always hold, and hence BALk ≤ ρt.
Lemmas 4.5 and 4.7 imply the following.
Corollary 4.8. BAL is O(lg m/ lg lg m)-round-competitive.
Unfortunately, Corollary 4.6 cannot be combined with Lemma 4.7 directly to show that BAL is O(lg k / lg lg k)-round-competitive, because the proof of Lemma 4.7 assumes that ρ is not a function of k. However, we can show the claim using different arguments.
Lemma 4.9. BAL is O(lg k/ lg lg k)-round-competitive.
Proof. If k ≥ m, the lemma follows from Corollary 4.8.

If k < m, let L be the given instance and let R0 be the number of rounds the algorithm needs until the number of active sets falls below k + 1 for the first time. As the algorithm makes at most one query in each active set in the first R0 rounds, all queries made in the first R0 rounds are useful. Let L′ be the instance at the end of round R0. As L′ has at most k active sets, BAL is O(lg k / lg lg k)-round-competitive on L′ by Corollary 4.8, and it needs at most O(lg k / lg lg k)·optk(L′) = O(lg k / lg lg k)·⌈opt1(L′)/k⌉ rounds to solve L′. We have that opt1(L) = k·R0 + opt1(L′), and hence optk(L) = R0 + ⌈opt1(L′)/k⌉. Thus,

BALk(L) ≤ R0 + O(lg k / lg lg k)·⌈opt1(L′)/k⌉
        ≤ O(lg k / lg lg k)·(R0 + ⌈opt1(L′)/k⌉)
        = O(lg k / lg lg k)·optk(L),

and the claim follows.
The following theorem then follows from Corollary 4.8 and Lemma 4.9.
Theorem 4.10. BAL is O(lg min{k, m} / lg lg min{k, m})-round-competitive.
4.3 Lower Bounds
In this section we present lower bounds for Minimum that hold even for the more restricted case where the family S consists of disjoint sets.
Theorem 4.11. For arbitrarily large m and any deterministic algorithm ALG, there exists an instance with m sets and k > m queries per round, such that optk = 1, ALGk ≥ W1(lg m) and ALGk = Ω(W1(lg k)). Hence, there is no o(lg min{k, m} / lg lg min{k, m})-round-competitive deterministic algorithm.
Proof. Fix an arbitrarily large positive integer M. Consider an instance with m = M^M sets, and let k = M^{M+1}. Each set contains Mk elements, with the i-th element having uncertainty interval (1 + iε, 100 + iε) for ε = 1/m. The adversary will pick for each set an index j and set the j-th element to be the minimum, by letting it have value 1 + (j + 0.5)ε, while the i-th element for i ≠ j is given value 100 + (i − 0.5)ε. The optimal query set for the set is thus its first j elements. We assume that an algorithm queries the elements of each set in order of increasing lower interval endpoints. (Otherwise, the lower bound only becomes larger.)
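As a small consistency check of this construction (our own code, with toy parameters instead of M^M), each chosen value lies strictly inside its open interval, and a set is solved exactly once its first j elements have been queried:

```python
def build_set(n, j, eps):
    """Intervals (1+iε, 100+iε) for i = 1..n; element j is the minimum."""
    iv = {i: (1 + i * eps, 100 + i * eps) for i in range(1, n + 1)}
    val = {i: (1 + (j + 0.5) * eps if i == j else 100 + (i - 0.5) * eps)
           for i in range(1, n + 1)}
    # every precise value is consistent with its open uncertainty interval
    assert all(iv[i][0] < val[i] < iv[i][1] for i in iv)
    return iv, val

def solved_after_prefix(iv, val, p):
    """True iff querying elements 1..p determines the set's minimum."""
    if p == 0:
        return False
    mn = min(val[i] for i in range(1, p + 1))
    # solved when no unqueried (open) interval can undercut the minimum
    return all(iv[i][0] >= mn for i in range(p + 1, len(iv) + 1))
```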
Consider the start of a round when a ≤ m sets are still active; initially a = m. The adversary observes how the algorithm distributes its k queries among the active sets and repeatedly adds the active set with the largest number of queries (from the current round) to a set L, until the total number of queries from the current round in sets of L is at least (M − 1)k/M. Let S′ denote the remaining active sets. Note that |S′| ≥ a/M. For the sets in L, the adversary chooses the minimum in such a way that a single query in the current round would have been sufficient to find it, while the sets in S′ remain active (and so the optimum must make the same queries in them that the algorithm has made in the current round, and these are at most k/M queries). We continue for M rounds. In the M-th round, the adversary picks the minimum in all remaining sets in such a way that a single query in that round would have been sufficient to solve the set. The optimal number of queries is then at most (M − 1)k/M + M^M = (M − 1)k/M + k/M = k, and hence optk = 1. On the other hand, we have ALGk = M.
We can now express this lower bound in terms of k or m as follows: As m = M^M, we have lg m = M·lg M and hence M = W1(lg m). As k = M^{M+1}, we have lg k = (M + 1)·lg M and hence M = Ω(W1(lg k)). Thus, the theorem follows.
Theorem 4.12. No deterministic algorithm ALG attains ALGk ≤ optk + o(lg min{k, m}).
Proof. Let k = m be an arbitrarily large integer. The intervals of the m sets are chosen as in the proof of Theorem 4.11, for a sufficiently large value of M. Let a be the number of active sets at the start of a round; initially a = m. After each round, the adversary considers the set Sj in which the algorithm has made the largest number of queries, which must be at least k/a. The adversary picks the minimum element in Sj in such a way that a single query in the current round would have been enough to solve it, and keeps all other sets active. This continues for m rounds. The number of wasted queries is at least k/m + k/(m − 1) + ··· + k/2 + k − m = k·(H(m) − 1) = k·Ω(lg k). As the algorithm must also make all queries in OPT1, the theorem follows from Proposition 2.1.
We thus conclude that the balanced algorithm attains matching upper bounds for disjoint sets. For non-disjoint sets, a small gap remains between our lower and upper bounds.
5 Selection
An instance of the Selection problem is given by a set I of n intervals and an integer i, 1 ≤ i ≤ n. Throughout this section we denote the i-th smallest value in the set of n precise values by v.
5.1 Finding the Value v
If we only want to find the value v, then we can adapt the analysis in [20] to obtain an algorithm that performs at most opt1 + i − 1 queries, simply by querying the intervals in the order of their left endpoints. This is best possible, and the algorithm can easily be parallelized to run in optk + ⌈(i − 1)/k⌉ rounds. Note that we can assume i ≤ ⌈n/2⌉, since otherwise we can consider the i-th largest value problem, noting that the i-th smallest value is the (n − i + 1)-th largest value. We also assume that every input interval is either trivial or open, since otherwise (if arbitrary closed intervals are allowed) the problem has a lower bound of n on the competitive ratio, using the same instance as presented in [20] for the problem of identifying the minimum element.
Let Ij1 be the interval with the i-th smallest left endpoint, and let Ij2 be the interval with the i-th smallest right endpoint. Note that any interval Ij with uj < ℓj1 or ℓj > uj2 can be discarded (and the value of i adjusted accordingly).
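This discarding rule is straightforward to implement; a sketch (ours), representing intervals as (lo, hi) pairs:

```python
def prune(intervals, i):
    """Discard intervals that cannot contain the i-th smallest value v.

    Intervals entirely below v shift the rank; intervals entirely above v
    are simply irrelevant. Returns (kept_intervals, adjusted_i)."""
    lo_i = sorted(lo for lo, hi in intervals)[i - 1]   # i-th smallest left end
    hi_i = sorted(hi for lo, hi in intervals)[i - 1]   # i-th smallest right end
    kept, new_i = [], i
    for lo, hi in intervals:
        if hi < lo_i:
            new_i -= 1          # lies entirely below v: reduces the rank
        elif lo > hi_i:
            pass                # lies entirely above v: drop without effect
        else:
            kept.append((lo, hi))
    return kept, new_i
```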
We analyze the algorithm that simply queries the k leftmost non-trivial intervals until the problem is solved.
Theorem 5.1. For instances of the i-th smallest value problem where all input intervals are open or trivial, there is an algorithm that returns the i-th smallest value v and uses at most ⌈(opt1 + i − 1)/k⌉ ≤ optk + ⌈(i − 1)/k⌉ rounds.
Proof. Let I′ be the set of non-trivial intervals in the input, ordered by non-decreasing left endpoint. We show that there is a set Q ⊆ I′ of size at most opt1 + i − 1 that is a prefix of I′ in the given ordering and has the property that, after querying Q, the instance is solved. Given the existence of such a set Q, it is clear that the theorem follows.
Fix an optimum query set OPT1, and let v be the i-th smallest value. After querying OPT1, assume that there are m trivial intervals with value v. Note that m ≥ 1, since it is necessary to determine the value v. Those m intervals are either queried in OPT1 or already were trivial intervals in the input. We classify the intervals in I into the following categories, depending on where they fall after querying OPT1:
1. The set M (of size m) consisting of trivial intervals whose value is v;
2. The set X consisting of non-trivial intervals that contain v;
3. The set L of intervals that are to the left of v;
4. The set R of intervals that are to the right of v.
Figure 4: An illustration of the sets M, X, L and R in the proof of Theorem 5.1.
We illustrate this classification in Figure 4. Note that intervals in L and R may intersect intervals in X, but cannot contain v. Let M′ = M ∩ OPT1, L′ = L ∩ OPT1 and R′ = R ∩ OPT1. Note that X ∩ OPT1 = ∅, and that every interval in M \ M′ is trivial in the input.
We claim that the set Q = (L ∩ I′) ∪ X ∪ M′ ∪ R′ is a prefix of I′ in the given ordering (note that (X ∪ M′ ∪ R′) \ I′ = ∅), that querying Q suffices to solve the instance, and that |Q| ≤ opt1 + i − 1. Clearly, every interval in L ∪ X ∪ M′ comes before all the intervals in R \ R′ in the ordering considered. It also holds that every interval in R′ comes before all the intervals in R \ R′ in the ordering, since otherwise an interval in R′ not satisfying this condition could be removed from OPT1. Furthermore, querying all intervals in Q is enough to solve the instance, because every interval in R \ R′ is to the right of v, and the optimum solution can decide the problem without querying them. Thus it suffices to bound the size of Q. Note then that |L| + |X| ≤ i − 1 since, after querying OPT1, the i-th smallest interval is in M, and any interval in L ∪ X has a left endpoint to the left of v. Therefore,

|Q| ≤ |L| + |X| + |M′| + |R′| ≤ i − 1 + |M′| + |R′| ≤ opt1 + i − 1,

which concludes the proof.
The upper bound of ⌈(opt1 + i − 1)/k⌉ is best possible, because we can construct a lower bound of opt1 + i − 1 queries to solve the problem. It uses the same instance as described in [20] for the problem of identifying an i-th smallest element (but not necessarily finding its precise value). We include a description of the instance for the sake of completeness. Consider 2i intervals, comprising i copies of (0, 5) and i copies of {3}. For the first i − 1 intervals (0, 5) queried by the algorithm, the adversary returns a value of 1, so the algorithm also needs to query the final interval of the form (0, 5) to decide the problem. Then the adversary sets the value of that interval to 4, and querying only that interval would have been sufficient for determining that 3 is the i-th smallest value. Hence any deterministic algorithm makes at least i queries, while opt1 = 1.
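The key property of this instance, namely that 3 remains the i-th smallest value no matter which values in (0, 5) the first i − 1 queried intervals take, can be checked exhaustively for a small i (our illustration, sampling a few values per interval):

```python
import itertools

def ith_smallest(values, i):
    return sorted(values)[i - 1]

i = 4
# adversary's precise values: i-1 forced queries returned 1, the last
# open interval is set to 4, and there are i trivial intervals {3}
known = [1.0] * (i - 1) + [4.0] + [3.0] * i
assert ith_smallest(known, i) == 3.0
# 3 is the i-th smallest for *any* values of the i-1 queried intervals,
# as long as those values lie in (0, 5)
for combo in itertools.product([0.5, 2.9, 3.5, 4.9], repeat=i - 1):
    vals = list(combo) + [4.0] + [3.0] * i
    assert ith_smallest(vals, i) == 3.0
```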
5.2 Finding all Elements with Value v
Now we focus on the task of finding v as well as identifying all intervals in I whose precise value equals v. For ease of presentation, we assume that all the intervals in I are closed. The result can be generalized to arbitrary intervals without any significant new ideas, but the proofs become longer and require more cases. A complete proof is included in Appendix B.
Let us begin by observing that the optimal query set is easy to characterize.
Lemma 5.2. Every feasible query set contains all non-trivial intervals that contain v. The optimal query set OPT1 contains all non-trivial intervals that contain v and no other intervals.
Proof. If a non-trivial interval Ij containing v is not queried, one cannot determine whether the precise value of Ij is equal to v or not. Thus, every feasible query set contains all non-trivial intervals that contain v.

Furthermore, it is easy to see that the non-trivial intervals containing v constitute a feasible query set: Once these intervals are queried, one can determine for each interval whether its precise value is smaller than v, equal to v, or larger than v.
Let Ij1 be the interval with the i-th smallest left endpoint, and let Ij2 be the interval with the i-th smallest right endpoint. Then it is clear that v must lie in the interval [ℓj1, uj2], which we call the target area. The following lemma was essentially shown by Kahan [22]; we include a proof for the sake of completeness.
Lemma 5.3 (Kahan, 1991). Assume that the current instance of Selection is not yet solved. Then there is at least one non-trivial interval Ij in I that contains the target area, i.e., satisfies ℓj ≤ ℓj1 and uj ≥ uj2.
Proof. First, assume that the target area is trivial, i.e., ℓj1 = uj2 = v. If there is no non-trivial interval in I that contains v, then the instance is already solved, a contradiction.

Now, assume that the target area is non-trivial, and assume that no interval in I contains the target area. Then all intervals Ij with ℓj ≤ ℓj1 have uj < uj2. There are at least i such intervals (because ℓj1 is the i-th smallest left endpoint), and hence the i-th smallest right endpoint must be strictly smaller than uj2, a contradiction to the definition of uj2.
For k = 1, there is therefore an online algorithm that makes opt1 queries: In each round, it determines the target area of the current instance and queries a non-trivial interval that contains the target area. (This algorithm was essentially proposed by Kahan [22] for determining all elements with value equal to v, without necessarily determining v.) For larger k, the difficulty is how to select additional intervals to query if there are fewer than k intervals that contain the target area.
The intervals that intersect the target area can be classified into four categories:
(1) a non-trivial intervals [ℓj, uj] with ℓj ≤ ℓj1 and uj ≥ uj2; they contain the target area;
(2) b intervals [ℓj, uj] with ℓj > ℓj1 and uj < uj2; they are strictly contained in the target area and contain neither endpoint of the target area;
(3) c intervals [ℓj, uj] with ℓj ≤ ℓj1 and uj < uj2; they intersect the target area on the left;
(4) d intervals [ℓj, uj] with ℓj > ℓj1 and uj ≥ uj2; they intersect the target area on the right.
We propose the following algorithm for rounds with k queries: Each round is filled with as many non-trivial intervals as possible, using the following order: first all intervals of category (1); then intervals of category (2); then picking intervals alternatingly from categories (3) and (4), starting with category (3). If one of the two categories (3) and (4) is exhausted, the rest of the k queries is chosen from the other category. Intervals of categories (3) and (4) are picked in order of non-increasing length of overlap with the target area, i.e., intervals of category (3) are chosen in non-increasing order of right endpoint, and intervals of category (4) in non-decreasing order of left endpoint. When a round is filled, it is queried, and the algorithm restarts, with a new target area and the intervals redistributed into the categories.
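A sketch (ours) of the per-round bookkeeping: compute the target area, bucket the non-trivial intervals into categories (1)-(4), and assemble the round in the stated order. Intervals are (lo, hi) pairs, closed as in this section.

```python
def classify(intervals, i):
    """Target area [lo_t, hi_t] and category (1)-(4) of each non-trivial
    interval intersecting it; trivial or disjoint intervals are skipped."""
    lo_t = sorted(lo for lo, hi in intervals)[i - 1]   # i-th smallest left end
    hi_t = sorted(hi for lo, hi in intervals)[i - 1]   # i-th smallest right end
    cats = {1: [], 2: [], 3: [], 4: []}
    for lo, hi in intervals:
        if lo == hi or hi < lo_t or lo > hi_t:
            continue                          # trivial, or misses target area
        if lo <= lo_t and hi >= hi_t:
            cats[1].append((lo, hi))          # contains the target area
        elif lo > lo_t and hi < hi_t:
            cats[2].append((lo, hi))          # strictly inside
        elif lo <= lo_t:
            cats[3].append((lo, hi))          # overlaps on the left
        else:
            cats[4].append((lo, hi))          # overlaps on the right
    return (lo_t, hi_t), cats

def round_order(cats, k):
    """Fill a round: (1), then (2), then alternate (3)/(4) starting at (3),
    each by non-increasing overlap with the target area."""
    order = cats[1] + cats[2]
    c3 = sorted(cats[3], key=lambda iv: -iv[1])   # non-increasing right end
    c4 = sorted(cats[4], key=lambda iv: iv[0])    # non-decreasing left end
    while c3 or c4:
        if c3:
            order.append(c3.pop(0))
        if c4:
            order.append(c4.pop(0))
    return order[:k]
```

On a small example the counts also satisfy Proposition 5.4 (a ≥ 1 and b ≤ a − 1).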
Proposition 5.4. At the start of any round, a ≥ 1 and b ≤ a − 1.
Proof. Lemma 5.3 shows a ≥ 1. If the target area is trivial, we have b = 0 and hence b ≤ a − 1. From now on assume that the target area is non-trivial.

Let L be the set of intervals in I that lie to the left of the target area, i.e., intervals Ij with uj < ℓj1. Similarly, let R be the set of intervals that lie to the right of the target area. Observe that a + b + c + d + |L| + |R| = n.

The intervals in L and the intervals of type (1) and (3) include all intervals with left endpoint at most ℓj1. As ℓj1 is the i-th smallest left endpoint, we have |L| + a + c ≥ i.

Similarly, the intervals in R and the intervals of type (1) and (4) include all intervals with right endpoint at least uj2. As uj2 is the i-th smallest right endpoint, or equivalently the (n − i + 1)-th largest right endpoint, we have |R| + a + d ≥ n − i + 1.

Adding the two inequalities derived in the previous two paragraphs, we get 2a + c + d + |L| + |R| ≥ n + 1. Combined with a + b + c + d + |L| + |R| = n, this yields b ≤ a − 1.
Lemma 5.5. If the current round of the algorithm is not the last one, then the following holds: If the algorithm queries at least one interval of categories (3) or (4), then the algorithm does not query all intervals of category (3) that contain v, or it does not query all intervals of category (4) that contain v.
Proof. Assume for a contradiction that the algorithm queries at least one interval of categories (3) or (4), and that it queries all intervals of categories (3) and (4) that contain v. Observe that the algorithm also queries all intervals in categories (1) and (2), as otherwise it would not have started to query intervals of categories (3) and (4). Thus, the algorithm has queried all intervals that contain v and, hence, solved the problem, a contradiction to the current round not being the last one.
Theorem 5.6. There is a 2-round-competitive algorithm for Selection.
Proof. Consider any round of the algorithm that is not the last one. Let $A$, $B$, $C$ and $D$ be the sets of intervals of categories (1), (2), (3) and (4) that are queried in this round, respectively. Let $A^*$, $B^*$, $C^*$ and $D^*$ be the subsets of $A$, $B$, $C$ and $D$ that are in $\mathrm{OPT}_1$, respectively. By Lemmas 5.2 and 5.3, $|A^*| = |A| \ge 1$. Since the algorithm prioritizes category (1), by Proposition 5.4 we have $|B| \le |A| - 1$, and thus $|A \cup B| \le 2 \cdot |A| - 1 = 2 \cdot |A^*| - 1 \le 2(|A^*| + |B^*|) - 1$.
T. Erlebach, M. Hoffmann, and M. S. de Lima 19
For bounding the size of $C \cup D$, first note that the order in which the algorithm selects the elements of categories (3) and (4) ensures that, within each category, the intervals that contain $v$ are selected first. By Lemma 5.5, there exists a category in which the algorithm does not query all intervals that contain $v$ in the current round. If that category is (3), we have $|C^*| = |C|$ and, by the alternating choice of intervals from (3) and (4) starting with (3), $|D| \le |C|$ and hence $|C \cup D| \le 2 \cdot |C| \le 2(|C^*| + |D^*|)$. If that category is (4), we have $|D^*| = |D|$ and $|C| \le |D| + 1$, giving $|C \cup D| \le 2 \cdot |D| + 1 \le 2(|C^*| + |D^*|) + 1$. In both cases, we thus have $|C \cup D| \le 2(|C^*| + |D^*|) + 1$.
Combining the bounds obtained in the two previous paragraphs, we get $|A \cup B \cup C \cup D| \le 2(|A^*| + |B^*| + |C^*| + |D^*|)$. This shows that, among the queries made in the round, at most half are wasted. The total number of wasted queries in all rounds except the last one is hence bounded by $\mathrm{opt}_1$. Since the algorithm fills each round except possibly the last one and also queries all intervals in $\mathrm{OPT}_1$, the theorem follows by Proposition 2.1.
We also have the following lower bound, which proves that our algorithm has the best possible multiplicative factor. We remark that it uses instances with $\mathrm{opt}_k = 1$, and we do not know how to scale it to larger values of $\mathrm{opt}_k$. In its present form, it does not exclude the possibility of an algorithm using at most $\mathrm{opt}_k + 1$ rounds.
Lemma 5.7. There is a family of instances of Selection with $k = i \ge 2$ and $\mathrm{opt}_1 \le i$ (and hence $\mathrm{opt}_k = 1$) such that any algorithm that makes $k$ queries in the first round needs at least two rounds and performs at least $\mathrm{opt}_1 + \lceil (i-1)/2 \rceil$ queries.
Proof. Consider the instance with $i - 1$ copies of interval $[0,3]$ (called left-side intervals), $i - 1$ copies of interval $[5,8]$ (called right-side intervals), and one interval $[2,6]$ (called the middle interval). The precise values are always 1 for the left-side intervals, and 7 for the right-side intervals. The value of the middle interval depends on the behavior of the algorithm, but in all cases it will be the $i$-th smallest element. If the algorithm does not query the middle interval in the first round, then we set its value to 4, so we have $\mathrm{opt}_1 = 1$ and the algorithm performs at least $\mathrm{opt}_1 + i = (i + 1) \cdot \mathrm{opt}_1$ queries. So assume that the algorithm queries the middle interval in the first round. If it queries more left-side than right-side intervals, then we set the value of the middle interval to 5.5, so all right-side intervals must be queried (and all queries of left-side intervals are wasted); otherwise, we set the middle value to 2.5. In either case, we have $\mathrm{opt}_1 = i$ and the algorithm wastes at least $\lceil (i-1)/2 \rceil$ queries.
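The adversary's case analysis in the proof above can be sketched in code. The function names and the encoding of the algorithm's first-round choice are our own illustration, not part of the paper.

```python
def adversary_value(i, queries_middle, num_left_queried, num_right_queried):
    """Choose the middle interval's value as in the proof of Lemma 5.7.

    Instance: i-1 copies of [0,3] (value 1), i-1 copies of [5,8] (value 7),
    and one middle interval [2,6]; the algorithm makes k = i first-round
    queries, described here by whether it queries the middle interval and
    how many left-/right-side intervals it queries.
    """
    if not queries_middle:
        return 4.0   # only the middle interval contains v = 4
    if num_left_queried > num_right_queried:
        return 5.5   # all right-side intervals must now be queried
    return 2.5       # all left-side intervals must now be queried

def wasted_queries(i, queries_middle, num_left_queried, num_right_queried):
    """Lower bound on wasted first-round queries under the adversary's choice."""
    v = adversary_value(i, queries_middle, num_left_queried, num_right_queried)
    if v == 4.0:
        return i                     # all k = i first-round queries miss the middle
    if v == 5.5:
        return num_left_queried      # left-side queries are wasted
    return num_right_queried         # right-side queries are wasted
```

For example, with $i = 5$ (so $k = 5$), an algorithm that queries the middle interval, three left-side and one right-side interval is answered with value 5.5 and wastes its three left-side queries.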
6 Relationship with the Parallel Model by Meißner
In [27, Section 4.5], Meißner describes a slightly different model for parallelization of queries. There, one is given a maximum number $r$ of batches that can be performed, and there is no constraint on the number of queries that can be performed in a given batch. The goal is to minimize the total number of queries performed, and the algorithm is compared to an optimal query set. The number of uncertain elements in the input is denoted by $n$. In this section, we discuss the relationship between this model and the one we described in the previous sections.
Meißner argues that the sorting problem admits a 2-query-competitive algorithm for $r \ge 2$ batches. For the minimum problem with one set, she gives an algorithm which is $n^{1/r}$-query-competitive, with a matching lower bound. She also gives results for the selection and the minimum spanning tree problems.
Theorem 6.1. If there is an $\alpha$-query-competitive algorithm that performs at most $r$ batches, then there is an algorithm that performs at most $\alpha \cdot \mathrm{opt}_k + r - 1$ rounds of $k$ queries. Conversely, if a problem has a lower bound of $\beta \cdot \mathrm{opt}_k + t$ on the number of rounds of $k$ queries, then any algorithm running at most $t + 1$ batches has query-competitive ratio at least $\beta$.
20 Round-Competitive Algorithms for Uncertainty Problems with Parallel Queries
Proof. Given an $\alpha$-query-competitive algorithm $A$ on $r$ batches, we construct an algorithm $B$ for rounds of $k$ queries in the following way. For each batch in $A$, algorithm $B$ simply performs all queries in as many rounds as necessary. In between batches, we may have an incomplete round, but there are only $r - 1$ such rounds.
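The batch-to-round conversion in this proof can be sketched as follows; the function name and the list-of-batches encoding are our own illustration, not from the paper.

```python
def batches_to_rounds(batches, k):
    """Replay the query batches of a batch algorithm A as rounds of at most
    k parallel queries, as in the proof of Theorem 6.1.

    Each batch of size s contributes ceil(s / k) rounds; only the last round
    of each batch can be incomplete, so at most r - 1 incomplete rounds occur
    before the final batch.
    """
    rounds = []
    for batch in batches:
        batch = list(batch)
        for start in range(0, len(batch), k):
            rounds.append(batch[start:start + k])
    return rounds
```

For example, batches of sizes 3, 2 and 4 with $k = 2$ yield $2 + 1 + 2 = 5$ rounds, at most one of them incomplete per batch, matching the $\alpha \cdot \mathrm{opt}_k + r - 1$ bound.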
In view of Meißner's lower bound for the minimum problem with one set mentioned above, the following result is close to being asymptotically optimal for that problem (using $\alpha = 1$).
Theorem 6.2. If there is an $\alpha$-round-competitive algorithm for rounds of $k$ queries, with $\alpha$ independent of $k$, then there is an algorithm that performs at most $r$ batches with query-competitive ratio $\mathrm{O}(\alpha \cdot n^{\alpha/(r-1)})$, with $r \ge \lfloor \alpha \rfloor \cdot x + 1$ for an arbitrary natural number $x$. In particular, for $r \ge \lfloor \alpha \rfloor \cdot \lg n + 1$, the query-competitive factor is $\mathrm{O}(\alpha)$.
Proof. Assume $r = \lfloor \alpha \rfloor \cdot x + 1$ for some natural number $x$; otherwise we can simply leave some batches unused. Given an $\alpha$-round-competitive algorithm $A$ for rounds of $k$ queries, we construct an algorithm $B$ that performs at most $r$ batches. We group them into sequences of $\lfloor \alpha \rfloor$ batches. For the $i$-th sequence, for $i = 1, \dots, x$, algorithm $B$ runs algorithm $A$ for $\lfloor \alpha \rfloor$ rounds with $k = \lceil n^{(i-1)/x} \rceil$, until the problem is solved. If the problem is not solved after $\lfloor \alpha \rfloor \cdot x$ batches, then algorithm $B$ queries all the remaining intervals in one final batch.
To determine the query-competitive ratio, consider the number $i$ of sequences of $\lfloor \alpha \rfloor$ batches the algorithm executes. If the problem is solved during the $i$-th sequence, then algorithm $B$ performs at most $\lfloor \alpha \rfloor \cdot \left( \sum_{j=0}^{i-1} n^{j/x} \right) = \lfloor \alpha \rfloor \cdot \Theta(n^{(i-1)/x})$ queries. (If the problem is solved during the last batch, it performs at most $n \le \lfloor \alpha \rfloor \cdot n^{x/x}$ queries.) On the other hand, we claim that, if the problem is not solved after the $(i-1)$-th sequence, then the optimum solution queries at least $n^{(i-2)/x}$ intervals. This is because algorithm $A$ is $\alpha$-round-competitive, so whenever the algorithm performs a sequence of $\lfloor \alpha \rfloor$ rounds for a certain value of $k$ and does not solve the problem, it follows that the optimum solution requires more than one round for this value of $k$, and hence more than $k$ queries. Thus, the query-competitive ratio is at most $\lfloor \alpha \rfloor \cdot \Theta(n^{1/x}) = \Theta(\alpha \cdot n^{\alpha/(r-1)})$.
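The geometric schedule of round sizes used by algorithm $B$ can be tabulated as follows; the function names are ours, and the code only reproduces the $k$ values and the query bound from the proof.

```python
import math

def round_sizes(n, x):
    """k values used by algorithm B in the proof of Theorem 6.2: the i-th
    sequence of floor(alpha) batches runs A with k = ceil(n^((i-1)/x)),
    for i = 1, ..., x."""
    return [math.ceil(n ** ((i - 1) / x)) for i in range(1, x + 1)]

def query_bound(n, alpha, x, i):
    """Upper bound on the number of queries if the problem is solved during
    the i-th sequence: floor(alpha) * sum_{j=0}^{i-1} ceil(n^(j/x))."""
    return math.floor(alpha) * sum(math.ceil(n ** (j / x)) for j in range(i))
```

For example, with $n = 16$ and $x = 2$ the two sequences use $k = 1$ and $k = 4$, and solving during the second sequence costs at most $\lfloor \alpha \rfloor \cdot (1 + 4)$ queries.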
Therefore, an algorithm that uses a constant number of batches implies an algorithm with the same asymptotic round-competitive ratio for rounds of $k$ queries. On the other hand, some problems have a worse query-competitive ratio if we require few batches, even if we have round-competitive algorithms for rounds of $k$ queries, but the ratio is preserved up to a constant factor if the number of batches is sufficiently large.
7 Final Remarks
We propose a model with parallel queries and the goal of minimizing the number of query
rounds when solving uncertainty problems. Our results show that, even though the techniques
developed for the sequential setting can be utilized in the new framework, they are not
enough, and some problems are harder (have a higher lower bound on the competitive ratio).
One interesting open question is how to extend our algorithms for Minimum to the
variant where it is not necessary to return the precise minimum value, but just to identify
the minimum element. Another problem one could attack is the following generalization
of Selection: Given multiple sets
S1, . . . , Sm⊆ I
and indices
i1, . . . , im
, identify the
ij
-smallest precise value and all elements with that value in
Sj
, for
j
= 1
, . . . , m
. It would
be interesting to see if the techniques we developed for Minimum with multiple sets can be
adapted to Selection with multiple sets.
It would be nice to close the gaps in the round-competitive ratio, to understand if the
analysis of Algorithm 1 is tight, and to study whether randomization can help to obtain
better upper bounds. One could also study other problems in the parallel model, such as the
minimum spanning tree problem.
References
1 M. Ajtai, V. Feldman, A. Hassidim, and J. Nelson. Sorting and selection with imprecise comparisons. ACM Transactions on Algorithms, 12(2):19:1–19:19, 2016. doi:10.1145/2701427.
2 S. Albers and A. Eckl. Explorable uncertainty in scheduling with non-uniform testing times. In WAOA 2020: 18th International Workshop on Approximation and Online Algorithms, Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2021. To appear. Also: arXiv preprint, arXiv:2009.13316, 2020. URL: https://arxiv.org/abs/2009.13316.
3 L. Arantes, E. Bampis, A. V. Kononov, M. Letsios, G. Lucarelli, and P. Sens. Scheduling under uncertainty: A query-based approach. In IJCAI 2018: 27th International Joint Conference on Artificial Intelligence, pages 4646–4652, 2018. doi:10.24963/ijcai.2018/646.
4 Z. Beerliova, F. Eberhard, T. Erlebach, A. Hall, M. Hoffmann, M. Mihal'ák, and L. S. Ram. Network discovery and verification. IEEE Journal on Selected Areas in Communications, 24(12):2168–2181, 2006. doi:10.1109/JSAC.2006.884015.
5 A. Borodin and R. El-Yaniv. Online Computation and Competitive Analysis. Cambridge University Press, 1998.
6 M. Braverman and E. Mossel. Sorting from noisy information. arXiv preprint, arXiv:0910.1191, 2009. URL: https://arxiv.org/abs/0910.1191.
7 R. Bruce, M. Hoffmann, D. Krizanc, and R. Raman. Efficient update strategies for geometric computing with uncertainty. Theory of Computing Systems, 38(4):411–423, 2005. doi:10.1007/s00224-004-1180-4.
8 C. L. Canonne and T. Gur. An adaptivity hierarchy theorem for property testing. Computational Complexity, 27:671–716, 2018. doi:10.1007/s00037-018-0168-4.
9 G. Charalambous and M. Hoffmann. Verification problem of maximal points under uncertainty. In T. Lecroq and L. Mouchard, editors, IWOCA 2013: 24th International Workshop on Combinatorial Algorithms, volume 8288 of Lecture Notes in Computer Science, pages 94–105. Springer Berlin Heidelberg, 2013. doi:10.1007/978-3-642-45278-9_9.
10 C. Dürr, T. Erlebach, N. Megow, and J. Meißner. An adversarial model for scheduling with testing. Algorithmica, 2020. doi:10.1007/s00453-020-00742-2.
11 T. Erlebach and M. Hoffmann. Minimum spanning tree verification under uncertainty. In D. Kratsch and I. Todinca, editors, WG 2014: International Workshop on Graph-Theoretic Concepts in Computer Science, volume 8747 of Lecture Notes in Computer Science, pages 164–175. Springer Berlin Heidelberg, 2014. doi:10.1007/978-3-319-12340-0_14.
12 T. Erlebach and M. Hoffmann. Query-competitive algorithms for computing with uncertainty. Bulletin of the EATCS, 116:22–39, 2015. URL: http://bulletin.eatcs.org/index.php/beatcs/article/view/335.
13 T. Erlebach, M. Hoffmann, and F. Kammer. Query-competitive algorithms for cheapest set problems under uncertainty. Theoretical Computer Science, 613:51–64, 2016. doi:10.1016/j.tcs.2015.11.025.
14 T. Erlebach, M. Hoffmann, D. Krizanc, M. Mihal'ák, and R. Raman. Computing minimum spanning trees with uncertainty. In S. Albers and P. Weil, editors, STACS'08: 25th International Symposium on Theoretical Aspects of Computer Science, volume 1 of Leibniz International Proceedings in Informatics, pages 277–288. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2008. doi:10.4230/LIPIcs.STACS.2008.1358.
15 T. Feder, R. Motwani, L. O'Callaghan, C. Olston, and R. Panigrahy. Computing shortest paths with uncertainty. Journal of Algorithms, 62(1):1–18, 2007. doi:10.1016/j.jalgor.2004.07.005.
16 T. Feder, R. Motwani, R. Panigrahy, C. Olston, and J. Widom. Computing the median with uncertainty. SIAM Journal on Computing, 32(2):538–547, 2003. doi:10.1137/S0097539701395668.
17 J. Focke, N. Megow, and J. Meißner. Minimum spanning tree under explorable uncertainty in theory and experiments. In C. S. Iliopoulos, S. P. Pissis, S. J. Puglisi, and R. Raman, editors, SEA 2017: 16th International Symposium on Experimental Algorithms, volume 75 of Leibniz International Proceedings in Informatics, pages 22:1–22:14. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPIcs.SEA.2017.22.
18 F. Gavril. Algorithms for minimum coloring, maximum clique, minimum covering by cliques, and maximum independent set of a chordal graph. SIAM Journal on Computing, 1(2):180–187, 1972. doi:10.1137/0201013.
19 M. Goerigk, M. Gupta, J. Ide, A. Schöbel, and S. Sen. The robust knapsack problem with queries. Computers & Operations Research, 55:12–22, 2015. doi:10.1016/j.cor.2014.09.010.
20 M. Gupta, Y. Sabharwal, and S. Sen. The update complexity of selection and related problems. Theory of Computing Systems, 59(1):112–132, 2016. doi:10.1007/s00224-015-9664-y.
21 M. M. Halldórsson and M. S. de Lima. Query-competitive sorting with uncertainty. In P. Rossmanith, P. Heggernes, and J.-P. Katoen, editors, MFCS 2019: 44th International Symposium on Mathematical Foundations of Computer Science, volume 138 of Leibniz International Proceedings in Informatics, pages 7:1–7:15. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPIcs.MFCS.2019.7.
22 S. Kahan. A model for data in motion. In STOC'91: 23rd Annual ACM Symposium on Theory of Computing, pages 265–277, 1991. doi:10.1145/103418.103449.
23 S. Khanna and W.-C. Tan. On computing functions with uncertainty. In PODS'01: 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 171–182, 2001. doi:10.1145/375551.375577.
24 C. Lekkerkerker and J. Boland. Representation of a finite graph by a set of intervals on the real line. Fundamenta Mathematicae, 51(1):45–64, 1962. URL: https://eudml.org/doc/213681.
25 T. Maehara and Y. Yamaguchi. Stochastic packing integer programs with few queries. Mathematical Programming, 182:141–174, 2020. doi:10.1007/s10107-019-01388-x.
26 N. Megow, J. Meißner, and M. Skutella. Randomization helps computing a minimum spanning tree under uncertainty. SIAM Journal on Computing, 46(4):1217–1240, 2017. doi:10.1137/16M1088375.
27 J. Meißner. Uncertainty Exploration: Algorithms, Competitive Analysis, and Computational Experiments. PhD thesis, Technische Universität Berlin, 2018. doi:10.14279/depositonce-7327.
28 A. I. Merino and J. A. Soto. The minimum cost query problem on matroids with uncertainty areas. In C. Baier, I. Chatzigiannakis, P. Flocchini, and S. Leonardi, editors, ICALP 2019: 46th International Colloquium on Automata, Languages, and Programming, volume 132 of Leibniz International Proceedings in Informatics, pages 83:1–83:14. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPIcs.ICALP.2019.83.
29 C. Olston and J. Widom. Offering a precision-performance tradeoff for aggregation queries over replicated data. In VLDB 2000: 26th International Conference on Very Large Data Bases, pages 144–155, 2000. URL: http://ilpubs.stanford.edu:8090/437/.
30 I. O. Ryzhov and W. B. Powell. Information collection for linear programs with uncertain objective coefficients. SIAM Journal on Optimization, 22(4):1344–1368, 2012. doi:10.1137/12086279X.
31 I. van der Hoog, I. Kostitsyna, M. Löffler, and B. Speckmann. Preprocessing ambiguous imprecise points. In G. Barequet and Y. Wang, editors, SoCG 2019: 35th International Symposium on Computational Geometry, volume 129 of Leibniz International Proceedings in Informatics, pages 42:1–42:16. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPIcs.SoCG.2019.42.
32 W. A. Welz. Robot Tour Planning with High Determination Costs. PhD thesis, Technische Universität Berlin, 2014. URL: https://www.depositonce.tu-berlin.de/handle/11303/4597.
A Asymptotic Growth of $W^{-1}$
For the sake of completeness, we include a proof of the following:
Proposition A.1. Let $W(x) = x \lg x$. It holds that $W^{-1}(x) = \Theta(x / \lg x)$.
Proof. First we claim that $W^{-1}(x) \ge 2$ for $x \ge 2$. The inverse $W^{-1}(x)$ is well-defined for $x > 0$, because $y \lg y$ is increasing for $y > 1$, and $y \lg y \le 0$ for $0 < y \le 1$ but $y \lg y > 0$ for $y > 1$. Thus $W^{-1}(2)$ is unique and it is easy to check that $W^{-1}(2) = 2$. By implicit differentiation,
\[
x = W^{-1}(x) \lg W^{-1}(x)
\quad\Longrightarrow\quad
1 = \left( \lg W^{-1}(x) + \frac{1}{\ln 2} \right) \frac{\mathrm{d} W^{-1}(x)}{\mathrm{d} x}
\quad\Longrightarrow\quad
\frac{\mathrm{d} W^{-1}(x)}{\mathrm{d} x} = \frac{1}{\lg W^{-1}(x) + \frac{1}{\ln 2}},
\]
which is greater than zero for $x > 0$, because $W^{-1}(x) > 1$ for $x > 0$ (as $\lg y$ is not defined for $y \le 0$, and $y \lg y \le 0$ for $0 < y \le 1$). Therefore $W^{-1}(x)$ is increasing and $W^{-1}(x) \ge 2$ for $x \ge 2$.
Now we prove the asymptotic bounds for $x \ge 2$. First,
\[
\frac{x}{\lg x} = \frac{W^{-1}(x) \lg W^{-1}(x)}{\lg\left( W^{-1}(x) \lg W^{-1}(x) \right)} \le \frac{W^{-1}(x) \cdot \lg W^{-1}(x)}{\lg W^{-1}(x)} = W^{-1}(x),
\]
where the inequality holds because $y \lg y \ge y$ for $y = W^{-1}(x) \ge 2$. Furthermore,
\[
\frac{2x}{\lg x} = \frac{2\,W^{-1}(x) \lg W^{-1}(x)}{\lg\left( W^{-1}(x) \lg W^{-1}(x) \right)} \ge \frac{2\,W^{-1}(x) \lg W^{-1}(x)}{\lg\left( W^{-1}(x) \right)^2} = \frac{2\,W^{-1}(x) \lg W^{-1}(x)}{2 \lg W^{-1}(x)} = W^{-1}(x),
\]
where the inequality holds because $y \lg y \le y^2$ for $y = W^{-1}(x) \ge 2$.
Thus, we have shown that $\frac{x}{\lg x} \le W^{-1}(x) \le \frac{2x}{\lg x}$ for all $x \ge 2$.
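A quick numerical sanity check of these bounds, using bisection to invert $W$; the code is our own sketch, not part of the proof.

```python
import math

def W(y):
    """W(y) = y lg y."""
    return y * math.log2(y)

def W_inv(x):
    """Invert W by bisection for x >= 2. W is increasing on [1, infinity),
    and W(x) >= x for x >= 2, so the root lies in [1, x]."""
    lo, hi = 1.0, max(2.0, x)
    for _ in range(100):
        mid = (lo + hi) / 2
        if W(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For every tested $x \ge 2$, the computed $W^{-1}(x)$ indeed falls between $x/\lg x$ and $2x/\lg x$.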
B Selection with Arbitrary Intervals
In this section we prove that Theorem 5.6 also holds for arbitrary intervals.
An instance of the Selection problem is given by a set $\mathcal{I}$ of $n$ intervals and an integer $i$, $1 \le i \le n$. The $i$-th smallest value in the set of $n$ precise values is denoted by $v$. The task is to find $v$ as well as to identify all intervals in $\mathcal{I}$ whose precise value equals $v$.
We allow arbitrary intervals as input: trivial intervals containing a single value, open intervals, closed intervals, and intervals that are closed on one side and open on the other.
Lemma 5.2 and its proof hold also for arbitrary intervals without any changes.
We call the left endpoint $\ell_i$ of an interval $I_i$ an open left endpoint if the interval does not contain $\ell_i$, and a closed left endpoint otherwise. The definitions of the terms open right endpoint and closed right endpoint are analogous. When we order the left endpoints of the intervals in non-decreasing order, equal left endpoints are ordered as follows: closed left endpoints come before open left endpoints. When we order the right endpoints of the intervals in non-decreasing order, equal right endpoints are ordered as follows: open right endpoints come before closed right endpoints. Equal left endpoints that are all open can be ordered arbitrarily, and the same holds for equal left endpoints that are all closed, for equal right endpoints that are all open, and for equal right endpoints that are all closed. Informally, the order of left endpoints orders intervals by the "smallest" values they contain, and the order of right endpoints orders intervals by the "largest" values they contain. We call the resulting order of left endpoints $L$ and the resulting order of right endpoints $U$. We say that a right endpoint $u_{i_1}$ strictly precedes a right endpoint $u_{i_2}$ if either $u_{i_1} < u_{i_2}$, or $u_{i_1} = u_{i_2}$ and $u_{i_1}$ is an open right endpoint and $u_{i_2}$ is a closed right endpoint.
Let $I_{j_1}$ be the interval with the $i$-th smallest left endpoint (i.e., the $i$-th left endpoint in the order $L$), and let $I_{j_2}$ be the interval with the $i$-th smallest right endpoint (i.e., the $i$-th right endpoint in the order $U$). Then it is clear that $v$ must lie in the interval $I_{\mathrm{ta}}$, which we call the target area and define as follows:
If $\ell_{j_1}$ is an open left endpoint of $I_{j_1}$ and $u_{j_2}$ is an open right endpoint of $I_{j_2}$, then $I_{\mathrm{ta}} = (\ell_{j_1}, u_{j_2})$.
If $\ell_{j_1}$ is an open left endpoint of $I_{j_1}$ and $u_{j_2}$ is a closed right endpoint of $I_{j_2}$, then $I_{\mathrm{ta}} = (\ell_{j_1}, u_{j_2}]$.
If $\ell_{j_1}$ is a closed left endpoint of $I_{j_1}$ and $u_{j_2}$ is an open right endpoint of $I_{j_2}$, then $I_{\mathrm{ta}} = [\ell_{j_1}, u_{j_2})$.
If $\ell_{j_1}$ is a closed left endpoint of $I_{j_1}$ and $u_{j_2}$ is a closed right endpoint of $I_{j_2}$, then $I_{\mathrm{ta}} = [\ell_{j_1}, u_{j_2}]$.
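The target-area computation can be sketched as follows; the tuple encoding of open/closed endpoints is our own illustration.

```python
def target_area(intervals, i):
    """Compute the target area I_ta for Selection with arbitrary intervals.

    Each interval is (lo, lo_closed, hi, hi_closed), with booleans marking
    closed endpoints. Ties among left endpoints put closed before open;
    ties among right endpoints put open before closed, matching the orders
    L and U defined above.
    """
    lefts = sorted((lo, 0 if lo_closed else 1, lo_closed)
                   for (lo, lo_closed, hi, hi_closed) in intervals)
    rights = sorted((hi, 1 if hi_closed else 0, hi_closed)
                    for (lo, lo_closed, hi, hi_closed) in intervals)
    lo, _, lo_closed = lefts[i - 1]    # i-th left endpoint in the order L
    hi, _, hi_closed = rights[i - 1]   # i-th right endpoint in the order U
    return (lo, lo_closed, hi, hi_closed)
```

For the closed intervals $[0,3]$, $[5,8]$, $[2,6]$ and $i = 2$, this returns the closed target area $[2,6]$.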
The following lemma was essentially shown by Kahan [22]; for the sake of completeness, we give a proof for arbitrary intervals.
Lemma B.1 (Kahan, 1991; version of Lemma 5.3 for arbitrary intervals). Assume that the current instance of Selection is not yet solved. Then there is at least one non-trivial interval $I_j$ in $\mathcal{I}$ that contains the target area $I_{\mathrm{ta}}$.
Proof. First, assume that the target area is trivial, i.e., $I_{\mathrm{ta}} = \{v\}$. If there is no non-trivial interval in $\mathcal{I}$ that contains $v$, then the instance is already solved, a contradiction.
Now, assume that the target area $I_{\mathrm{ta}}$ is non-trivial, and assume that no interval in $\mathcal{I}$ contains the target area. Then all intervals $I_j$ whose left endpoint is not after $\ell_{j_1}$ in the order of left endpoints must have a right endpoint that strictly precedes $u_{j_2}$. There are at least $i$ such intervals (because $\ell_{j_1}$ is the $i$-th smallest left endpoint), and hence the $i$-th smallest right endpoint must strictly precede $u_{j_2}$ in the order of right endpoints, a contradiction to the definition of $u_{j_2}$.
The intervals that intersect the target area can be classified into four categories:
(1) $a$ non-trivial intervals that contain the target area;
(2) $b$ intervals that are strictly contained in the target area such that the target area contains at least one point to the left of the interval and at least one point to the right of the interval;
(3) $c$ intervals that contain some part of $I_{\mathrm{ta}}$ at the left end and do not contain some part of $I_{\mathrm{ta}}$ at the right end. Formally, an interval $I_i$ with closed right endpoint $u_i$ is in this category if $u_i \in I_{\mathrm{ta}}$, $I_i \cap I_{\mathrm{ta}} = \{ w \in I_{\mathrm{ta}} \mid w \le u_i \}$ and $\{ w \in I_{\mathrm{ta}} \mid w > u_i \} \ne \emptyset$. Moreover, an interval $I_i$ with open right endpoint $u_i$ is in this category if $u_i \in I_{\mathrm{ta}}$ and $I_i \cap I_{\mathrm{ta}} = \{ w \in I_{\mathrm{ta}} \mid w < u_i \} \ne \emptyset$;
(4) $d$ intervals that contain some part of $I_{\mathrm{ta}}$ at the right end and do not contain some part of $I_{\mathrm{ta}}$ at the left end. Formally, an interval $I_i$ with closed left endpoint $\ell_i$ is in this category if $\ell_i \in I_{\mathrm{ta}}$, $I_i \cap I_{\mathrm{ta}} = \{ w \in I_{\mathrm{ta}} \mid w \ge \ell_i \}$ and $\{ w \in I_{\mathrm{ta}} \mid w < \ell_i \} \ne \emptyset$. Moreover, an interval $I_i$ with open left endpoint $\ell_i$ is in this category if $\ell_i \in I_{\mathrm{ta}}$ and $I_i \cap I_{\mathrm{ta}} = \{ w \in I_{\mathrm{ta}} \mid w > \ell_i \} \ne \emptyset$.
We propose the following algorithm for rounds with $k$ queries. Each round is filled with as many intervals as possible, using the following order: first all intervals of category (1); then intervals of category (2); then picking intervals alternatingly from categories (3) and (4), starting with category (3). If one of the two categories is exhausted, the rest of the $k$ queries is chosen from the other category. Intervals of categories (3) and (4) are picked in order of non-increasing length of overlap with the target area. More precisely, intervals of category (3) are chosen according to the reverse of the order $U$ of their right endpoints, and intervals of category (4) are chosen according to the order $L$ of their left endpoints. When a round is filled, it is queried, and the algorithm restarts, calculating a new target area and redistributing the intervals into the categories.
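For closed, non-trivial intervals, one round of this selection rule can be sketched as follows; this is a simplification of the arbitrary-endpoint rules above, and the function name and encoding are our own.

```python
def one_round(intervals, i, k):
    """Pick up to k intervals to query in one round: category (1) first,
    then (2), then alternating (3)/(4) starting with (3), each of (3)/(4)
    in order of non-increasing overlap with the target area.

    Assumes closed, non-trivial intervals given as (lo, hi) pairs.
    Returns indices into `intervals` in query order.
    """
    left = sorted(iv[0] for iv in intervals)[i - 1]    # i-th smallest left endpoint
    right = sorted(iv[1] for iv in intervals)[i - 1]   # i-th smallest right endpoint
    cat1, cat2, cat3, cat4 = [], [], [], []
    for idx, (a, b) in enumerate(intervals):
        if a > right or b < left:
            continue                      # does not intersect the target area
        if a <= left and b >= right:
            cat1.append(idx)              # contains the target area
        elif a > left and b < right:
            cat2.append(idx)              # strictly inside the target area
        elif b < right:
            cat3.append(idx)              # covers the left end of the target area
        else:
            cat4.append(idx)              # covers the right end of the target area
    cat3.sort(key=lambda j: -intervals[j][1])   # reverse order of right endpoints
    cat4.sort(key=lambda j: intervals[j][0])    # order of left endpoints
    order = cat1 + cat2
    while cat3 or cat4:                         # alternate, starting with (3)
        if cat3:
            order.append(cat3.pop(0))
        if cat4:
            order.append(cat4.pop(0))
    return order[:k]
```

Sorting category (3) by decreasing right endpoint and category (4) by increasing left endpoint realizes the non-increasing-overlap rule for closed intervals.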
Proposition B.2 (Version of Proposition 5.4 for arbitrary intervals). At the start of any round, $a \ge 1$ and $b \le a - 1$.
Proof. Lemma B.1 shows $a \ge 1$. If the target area is trivial, we have $b = 0$ and hence $b \le a - 1$. From now on assume that the target area is non-trivial.
Let $L$ be the set of intervals in $\mathcal{I}$ that lie to the left of $I_{\mathrm{ta}}$ (and have empty intersection with $I_{\mathrm{ta}}$). Similarly, let $R$ be the set of intervals that lie to the right of $I_{\mathrm{ta}}$ (and have empty intersection with $I_{\mathrm{ta}}$). Observe that $a + b + c + d + |L| + |R| = n$.
The intervals in $L$ and the intervals of type (1) and (3) include all intervals whose left endpoint is not after $\ell_{j_1}$ in the order of left endpoints. As $\ell_{j_1}$ is the $i$-th left endpoint in that order, we have $|L| + a + c \ge i$.
Similarly, the intervals in $R$ and the intervals of type (1) and (4) include all intervals whose right endpoint is not before $u_{j_2}$ in the order of right endpoints. As $u_{j_2}$ is the $i$-th smallest right endpoint in that order, or equivalently the $(n - i + 1)$-th largest right endpoint in that order, we have $|R| + a + d \ge n - i + 1$.
Adding the two inequalities derived in the previous two paragraphs, we get $2a + c + d + |L| + |R| \ge n + 1$. Combined with $a + b + c + d + |L| + |R| = n$, this yields $b \le a - 1$.
Lemma 5.5 and its proof hold for arbitrary intervals without any changes.
Theorem B.3. There is a 2-round-competitive algorithm for Selection even if arbitrary intervals are allowed as input.
Proof.
The proof is identical to the proof of Theorem 5.6, except that Lemma B.1 and
Proposition B.2 are used in place of Lemma 5.3 and Proposition 5.4, respectively.
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
We introduce a novel adversarial model for scheduling with explorable uncertainty. In this model, the processing time of a job can potentially be reduced (by an a priori unknown amount) by testing the job. Testing a job j takes one unit of time and may reduce its processing time from the given upper limit pˉj\bar{p}_j (which is the time taken to execute the job if it is not tested) to any value between 0 and pˉj\bar{p}_j. This setting is motivated e.g., by applications where a code optimizer can be run on a job before executing it. We consider the objective of minimizing the sum of completion times on a single machine. All jobs are available from the start, but the reduction in their processing times as a result of testing is unknown, making this an online problem that is amenable to competitive analysis. The need to balance the time spent on tests and the time spent on job executions adds a novel flavor to the problem. We give the first and nearly tight lower and upper bounds on the competitive ratio for deterministic and randomized algorithms. We also show that minimizing the makespan is a considerably easier problem for which we give optimal deterministic and randomized online algorithms.
Conference Paper
Full-text available
We consider the minimum spanning tree (MST) problem in an uncertainty model where uncertain edge weights can be explored at extra cost. The task is to find an MST by querying a minimum number of edges for their exact weight. This problem has received quite some attention from the algorithms theory community. In this paper, we conduct the first practical experiments for MST under uncertainty, theoretically compare three known algorithms, and compare theoretical with practical behavior of the algorithms. Among others, we observe that the average performance and the absolute number of queries are both far from the theoretical worst-case bounds. Furthermore, we investigate a known general preprocessing procedure and develop an implementation thereof that maximally reduces the data uncertainty. We also characterize a class of instances that is solved completely by our preprocessing. Our experiments are based on practical data from an application in telecommunications and uncertainty instances generated from the standard TSPLib graph library.
Article
Full-text available
Adaptivity is known to play a crucial role in property testing. In particular, there exist properties for which there is an exponential gap between the power of \emph{adaptive} testing algorithms, wherein each query may be determined by the answers received to prior queries, and their \emph{non-adaptive} counterparts, in which all queries are independent of answers obtained from previous queries. In this work, we investigate the role of adaptivity in property testing at a finer level. We first quantify the degree of adaptivity of a testing algorithm by considering the number of "rounds of adaptivity" it uses. More accurately, we say that a tester is k-(round) adaptive if it makes queries in k+1 rounds, where the queries in the i'th round may depend on the answers obtained in the previous i1i-1 rounds. Then, we ask the following question: Does the power of testing algorithms smoothly grow with the number of rounds of adaptivity? We provide a positive answer to the foregoing question by proving an adaptivity hierarchy theorem for property testing. Specifically, our main result shows that for every nNn\in \mathbb{N} and 0kn0.990 \le k \le n^{0.99} there exists a property Pn,k\mathcal{P}_{n,k} of functions for which (1) there exists a k-adaptive tester for Pn,k\mathcal{P}_{n,k} with query complexity O~(k)\tilde{O}(k), yet (2) any (k1)(k-1)-adaptive tester for Pn,k\mathcal{P}_{n,k} must make Ω(n)\Omega(n) queries. In addition, we show that such a qualitative adaptivity hierarchy can be witnessed for testing natural properties of graphs.
Article
Full-text available
We present a framework for computing with input data specified by intervals, representing uncertainty in the values of the input parameters. To compute a solution, the algorithm can query the input parameters that yield more refined estimates in the form of sub-intervals and the objective is to minimize the number of queries. The previous approaches address the scenario where every query returns an exact value. Our framework is more general as it can deal with a wider variety of inputs and query responses and we establish interesting relationships between them that have not been investigated previously. Although some of the approaches of the previous restricted models can be adapted to the more general model, we require more sophisticated techniques for the analysis and we also obtain improved algorithms for the previous model. We address selection problems in the generalized model and show that there exist 2-update competitive algorithms that do not depend on the lengths or distribution of the sub-intervals and hold against the worst case adversary. We also obtain similar bounds on the competitive ratio for the MST problem in graphs.
Article
We consider a stochastic variant of the packing-type integer linear programming problem, which contains random variables in the objective vector. We are allowed to reveal each entry of the objective vector by conducting a query, and the task is to find a good solution by conducting a small number of queries. We propose a general framework of adaptive and non-adaptive algorithms for this problem, and provide a unified methodology for analyzing the performance of those algorithms. We also demonstrate our framework by applying it to a variety of stochastic combinatorial optimization problems such as matching, matroid, and stable set problems.
Conference Paper
We consider a single machine, a set of unit-time jobs, and a set of unit-time errors. We assume that the time-slot at which each error will occur is not known in advance but, for every error, there exists an uncertainty area during which the error will take place. In order to find if the error occurs in a specific time-slot, it is necessary to issue a query to it. In this work, we study two problems: (i) the error-query scheduling problem, whose aim is to reveal enough error-free slots with the minimum number of queries, and (ii) the lexicographic error-query scheduling problem where we seek the earliest error-free slots with the minimum number of queries. We consider both the off-line and the on-line versions of the above problems. In the former, the whole instance and its characteristics are known in advance and we give a polynomial-time algorithm for the error-query scheduling problem. In the latter, the adversary has the power to decide, in an on-line way, the time-slot of appearance for each error. We propose then both lower bounds and algorithms whose competitive ratios asymptotically match these lower bounds.
Article
Given a graph with "uncertainty intervals" on the edges, we want to identify a minimum spanning tree by querying some edges for their exact edge weights which lie in the given uncertainty intervals. Our objective is to minimize the number of edge queries. It is known that there is a deterministic algorithm with best possible competitive ratio 2 [T. Erlebach, et al., in Proceedings of STACS, Schloss Dagstuhl, Dagstuhl, Germany, 2008, pp. 277-288]. Our main result is a randomized algorithm with expected competitive ratio 1 + 1/√2 ≈ 1.707, solving the long-standing open problem of whether an expected competitive ratio strictly less than 2 can be achieved [T. Erlebach and M. Hoffmann, Bull. Eur. Assoc. Theor. Comput. Sci. EATCS, 116 (2015)]. We also present novel results for various extensions, including arbitrary matroids and more general querying models.
Thesis
Computerization plays an ever-growing role in the automotive industry. For this reason, and to reduce costs, it is advantageous to optimize the processes involved using mathematical methods; only then can changes to the production process be reacted to automatically and immediately. This thesis contributes to this goal by studying so-called welding cells, in which several robot arms simultaneously place welding spots on the same workpiece. The tours of these welding robots must be planned so that the required welding spots are visited in the shortest possible time, without collisions between the robots. Scientifically, this optimization task, the "Welding Cell Problem", combines aspects of two classical areas of mathematics: on the one hand, it resembles the famous Travelling Salesman Problem, in that an optimal collision-free ordering of the points must be determined; on the other hand, it also contains classical elements of motion planning and motion optimization from robotics. The first part of the thesis investigates the more practical aspects of the Welding Cell Problem. In this context, we present an algorithm that combines the two parts of the problem by integrating the continuous motion planning directly into the branch-and-price process of the discrete part. Since trajectory computation is computationally far more expensive than the discrete optimization, this approach places particular emphasis on avoiding unnecessary path computations. This approach also leads to the important theoretical question of how many such computations are needed at minimum to guarantee an optimal solution. This question is analyzed in more detail in the second part.
To this end, several classical combinatorial problems (shortest paths, minimum spanning trees, and the Travelling Salesman Problem) are considered in this so-called uncertainty scenario. In the end, this makes it possible to give an approximation algorithm for the TSP with uncertainty that requires only O(n) such exact distance computations in expectation. This problem resembles the "subway problem", in which the entire subway network of a large city must be traversed in the shortest possible time. In the final chapter, we therefore describe a branch-and-cut algorithm that, by adding short-cycle inequalities, is able to compute an optimal solution of the subway problem for a metropolis - illustrated here with Berlin - in less than a second.
Conference Paper
In the verification under uncertainty setting, an algorithm is given, for each input item, an uncertainty area that is guaranteed to contain the exact input value, as well as an assumed input value. An update of an input item reveals its exact value. If the exact value is equal to the assumed value, we say that the update verifies the assumed value. We consider verification under uncertainty for the minimum spanning tree (MST) problem for undirected weighted graphs, where each edge is associated with an uncertainty area and an assumed edge weight. The objective of an algorithm is to compute the smallest set of updates with the property that, if the updates of all edges in the set verify their assumed weights, the edge set of an MST can be computed. We give a polynomial-time optimal algorithm for the MST verification problem by relating the choices of updates to vertex covers in a bipartite auxiliary graph. Furthermore, we consider an alternative uncertainty setting where the vertices are embedded in the plane, the weight of an edge is the Euclidean distance between the endpoints of the edge, and the uncertainty is about the location of the vertices. An update of a vertex yields the exact location of that vertex. We prove that the MST verification problem in this vertex uncertainty setting is NP-hard. This shows a surprising difference in complexity between the edge and vertex uncertainty settings of the MST verification problem.
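The reduction above rests on the classical fact that a minimum vertex cover of a bipartite graph can be computed in polynomial time from a maximum matching (König's theorem). The following sketch (a generic illustration with assumed function names, not the paper's specific auxiliary-graph construction) shows that step:

```python
# Minimum vertex cover in a bipartite graph via maximum matching and
# Koenig's theorem. adj[u] lists the right-side neighbours of left vertex u.
# This illustrates the polynomial-time step the abstract relies on; the
# auxiliary graph built from the MST instance is specific to the paper.

def max_matching(adj, n_left, n_right):
    """Augmenting-path matching; returns match_r mapping right -> left (-1 if free)."""
    match_r = [-1] * n_right

    def try_augment(u, seen):
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                if match_r[v] == -1 or try_augment(match_r[v], seen):
                    match_r[v] = u
                    return True
        return False

    for u in range(n_left):
        try_augment(u, [False] * n_right)
    return match_r

def min_vertex_cover(adj, n_left, n_right):
    """Koenig's construction: alternate from unmatched left vertices."""
    match_r = max_matching(adj, n_left, n_right)
    match_l = [-1] * n_left
    for v, u in enumerate(match_r):
        if u != -1:
            match_l[u] = v
    # Z = vertices reachable from unmatched left vertices along alternating paths.
    visited_l = [u for u in range(n_left) if match_l[u] == -1]
    seen_l = [match_l[u] == -1 for u in range(n_left)]
    seen_r = [False] * n_right
    stack = list(visited_l)
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if not seen_r[v]:
                seen_r[v] = True              # reached via a non-matching edge
                w = match_r[v]
                if w != -1 and not seen_l[w]:
                    seen_l[w] = True          # continue via the matching edge
                    stack.append(w)
    # Cover = (left vertices not in Z) union (right vertices in Z).
    cover_l = [u for u in range(n_left) if not seen_l[u]]
    cover_r = [v for v in range(n_right) if seen_r[v]]
    return cover_l, cover_r
```

By König's theorem the returned cover has exactly the size of a maximum matching, which is what makes the verification problem solvable in polynomial time once the bipartite auxiliary graph is in hand.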
Article
Considering the model of computing under uncertainty where element weights are uncertain but can be obtained at a cost by query operations, we study the problem of identifying a cheapest (minimum-weight) set among a given collection of feasible sets using a minimum number of queries of element weights. For the general case we present an algorithm that makes at most queries, where d is the maximum cardinality of any given set and OPT is the optimal number of queries needed to identify a cheapest set. For the minimum multi-cut problem in trees with d terminal pairs, we give an algorithm that makes at most queries. For the problem of computing a minimum-weight base of a given matroid, we give an algorithm that makes at most queries, generalizing a known result for the minimum spanning tree problem. For each of the above algorithms we give matching lower bounds. We also settle the complexity of the verification version of the general cheapest set problem and the minimum multi-cut problem in trees under uncertainty.
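The underlying query model can be made concrete with a toy sketch for the simplest special case, identifying the minimum weight in a single set (this is a standard folklore strategy, not the paper's cheapest-set algorithm; the function name and interface are illustrative assumptions): each element carries an uncertainty interval containing its true weight, and a query reveals that weight.

```python
# Toy sketch of the query model (NOT the paper's algorithm): elements have
# uncertainty intervals (lo, hi) containing their true weights, and a query
# reveals the true weight. To certify the minimum weight, repeatedly query
# the unqueried element with the smallest lower bound until some revealed
# value is at most every remaining lower bound.

def find_min_weight(intervals, true_weights):
    """intervals: list of (lo, hi); querying i reveals true_weights[i].
    Returns (minimum weight, number of queries issued)."""
    revealed = {}
    while True:
        best = min(revealed.values(), default=float("inf"))
        # Unqueried element with the smallest interval lower bound.
        cand = min((i for i in range(len(intervals)) if i not in revealed),
                   key=lambda i: intervals[i][0], default=None)
        if cand is None or best <= intervals[cand][0]:
            return best, len(revealed)   # minimum is certified
        revealed[cand] = true_weights[cand]
```

The cheapest-set problem studied in the abstract generalizes this to a collection of feasible sets, where the difficulty is deciding which queries are needed to separate the cheapest set from all others.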