The Online Set Aggregation Problem
Rodrigo A. Carrasco1⋆, Kirk Pruhs2⋆⋆, Cliff Stein3⋆⋆⋆, and José Verschae4†
1 Facultad de Ingeniería y Ciencias, Universidad Adolfo Ibáñez, Santiago, Chile, rax@uai.cl
2 Department of Computer Science, University of Pittsburgh, Pittsburgh, PA, USA, kirk@cs.pitt.edu
3 Department of Industrial Engineering and Operations Research, Columbia University, New York, NY, USA, cliff@ieor.columbia.edu
4 Facultad de Matemáticas & Escuela de Ingeniería, Pontificia Universidad Católica de Chile, Santiago, Chile, jverschae@uc.cl
Abstract. We introduce the online Set Aggregation Problem, which is
a natural generalization of the Multi-Level Aggregation Problem, which
in turn generalizes the TCP Acknowledgment Problem and the Joint
Replenishment Problem. We give a deterministic online algorithm, and
show that its competitive ratio is logarithmic in the number of requests.
We also give a matching lower bound on the competitive ratio of any
randomized online algorithm.
Keywords: online algorithms, competitive analysis, set aggregation,
multilevel aggregation
1 Introduction
Problem Statement: We introduce an online problem, which we call the Set Aggregation Problem. In this problem, a sequence R of requests arrives over time. We assume time is continuous. Each request ρ ∈ R has an associated release time r_ρ, when it arrives, and an associated waiting cost function w_ρ(t) that specifies the waiting cost for this request if it is first serviced at time t. We assume that the online algorithm learns w_ρ(t) at time r_ρ. We also assume that each w_ρ(t) is non-decreasing, left continuous, and satisfies lim_{t→∞} w_ρ(t) = ∞.
At any time t, the online algorithm can decide to service any subset S of the previously released requests. Thus, a schedule for this instance is a sequence (S_1, t_1), (S_2, t_2), ..., (S_k, t_k), where the S_i's are sets of requests and the t_i's are the times at which these sets are serviced. We implicitly restrict our attention to feasible schedules, which are those in which every request is serviced, i.e., for all ρ ∈ R there exists an (S_i, t_i) in the resulting sequence with ρ ∈ S_i and t_i ≥ r_ρ.
⋆ Supported in part by Fondecyt Project Nr. 1151098.
⋆⋆ Supported in part by NSF grants CCF-1421508 and CCF-1535755, and an IBM Faculty Award.
⋆⋆⋆ Supported in part by NSF grant CCF-1421161.
† Supported in part by Núcleo Milenio Información y Coordinación en Redes ICM/FIC Código RC130003, and Fondecyt Project Nr. 11140579.
We assume that there is a time-invariant service cost function C(S) that specifies the cost for servicing a set S of requests. Without any real loss of generality, we assume that C(S) is monotone, in the sense that adding requests to S cannot decrease C(S). We postpone the discussion of how the online algorithm learns C(S) until we state our results; essentially, our lower bound holds even if the online algorithm knows C(S) a priori, and our upper bound holds even if C(S) is only known for subsets S of released requests.
The online algorithm incurs two types of costs, a waiting cost and a service cost. At time t_i the algorithm incurs a waiting cost of W_{t_i}(S_i) = ∑_{ρ∈S_i} w_ρ(t_i) for servicing set S_i, which is just the aggregate waiting cost of the serviced requests. At the same time t_i the algorithm also incurs a service cost of C(S_i), so the total service cost incurred by the algorithm is ∑_{i=1}^{k} C(S_i). The total cost associated with the schedule (S_1, t_1), (S_2, t_2), ..., (S_k, t_k) is therefore

∑_{i=1}^{k} C(S_i) + ∑_{i=1}^{k} W_{t_i}(S_i),

and the objective is to find the schedule that minimizes the total cost.
In the deadline version of the set aggregation problem, the waiting cost of
each request is zero until a specified deadline for that request, after which the
waiting cost becomes infinite.
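To make the cost model concrete, here is a minimal sketch (a Python illustration of our own; all function and variable names are ours, not from the problem statement) that evaluates the total cost ∑_i C(S_i) + ∑_i W_{t_i}(S_i) of a feasible schedule:

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

def schedule_cost(
    schedule: List[Tuple[FrozenSet[str], float]],    # [(S_i, t_i), ...]
    release: Dict[str, float],                       # r_rho for each request
    waiting: Dict[str, Callable[[float], float]],    # w_rho(t) for each request
    C: Callable[[FrozenSet[str]], float],            # service cost C(S)
) -> float:
    """Total cost sum_i C(S_i) + sum_i W_{t_i}(S_i) of a feasible schedule."""
    served: set = set()
    total = 0.0
    for S, t in schedule:
        assert all(release[r] <= t for r in S), "only released requests"
        total += C(S)                              # service cost of the set
        total += sum(waiting[r](t) for r in S)     # waiting cost W_{t_i}(S_i)
        served |= set(S)
    assert served == set(release), "feasibility: every request is serviced"
    return total

# toy instance: C(S) = 1 for every set, waiting cost t - r_rho per request
rel = {"a": 0.0, "b": 1.0}
w = {r: (lambda t, r0=r0: t - r0) for r, r0 in rel.items()}
cost = schedule_cost([(frozenset({"a", "b"}), 2.0)], rel, w, lambda S: 1.0)
# one service of {a, b} at time 2: C = 1, waiting = 2 + 1, so cost = 4
```

The toy instance shows the core tension of the problem: aggregating both requests into one service saves one unit of service cost at the price of extra waiting.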
Most Relevant Background: Our main motivation for introducing the Set Aggregation Problem is that it is a natural generalization of the Multi-Level Aggregation Problem (MLAP), which was introduced in [1]. In the MLAP, the requests are vertices in an a priori known tree T, and the edges (and/or vertices) of T have associated costs. Further, a set S of requests has service cost equal to the minimum cost of a subtree of T that contains the root of T and all of the requests in S. The motivation of [1] for introducing the MLAP is that it generalizes both the well-known TCP Acknowledgment Problem (TCPAP) and the well-known Joint Replenishment Problem (JRP), both of which correspond to special cases of the MLAP in which the tree T has constant height. In [1] the authors gave an online algorithm for the Multi-Level Aggregation Problem, and showed that its competitive ratio is O(d^4 2^d), where d is the height of T. It is an open question whether constant competitiveness is achievable in the MLAP, and only a constant lower bound is known. In [2] the authors gave an O(d)-competitive algorithm for the deadline version of the problem.
Our Results: The Set Aggregation Problem allows us to study how critical the restriction is that service costs come from an underlying tree. More specifically, it is natural to ask:
– Can constant competitiveness be achieved if there are no restrictions on service costs (other than monotonicity)?
– And if not, what is the best achievable competitive ratio?
In this paper we answer these two questions. We first show, in Section 2, that the competitive ratio of every algorithm (deterministic or randomized) is at least logarithmic in the number of requests for the deadline version of the set aggregation problem. This lower bound holds even against an online algorithm that a priori knows the domain R of all possible requests that might arrive, and the service cost C(S) for each possible subset S of R. Thus, we can conclude that if a constant competitive algorithm exists for the Multi-Level Aggregation Problem, then its analysis must use the fact that the service costs cannot be arbitrary.
Intuitively, the requests in our lower bound instance are each associated with a node in a full binary tree T. There are n/2^d requests associated with each node of depth d in the tree. The lifetimes of the requests inherit the laminar structure of the tree: the lifetime of the n requests associated with the root of T is [0, n], the lifetime of the n/2 requests associated with the left child of the root is [0, n/2], and the lifetime of the n/2 requests associated with the right child of the root is [n/2, n]. The set costs are defined so that there is clearly no benefit to including more than one request associated with each node in any serviced set. The requests in the subtree rooted at the left child of the root aggregate with all requests associated with the root, but the only requests associated with the root that aggregate with requests associated with the right child of the root are those that the algorithm serviced during the first half of these requests' lifetime. So the algorithm incurs an unnecessary incremental cost for each set serviced during [n/2, n]. The instance applies this same idea recursively lower down in T.
To complement this lower bound, in Section 3 we give a deterministic online algorithm RetrospectiveCover for set aggregation, and show that the competitive ratio of this algorithm is logarithmic in the number of requests. The algorithm only needs to know the service cost C(S) for subsets S of requests that are released but unserviced.
Let us give a brief (necessarily simplified) overview of, and intuition for, the RetrospectiveCover algorithm. Define a proactive schedule to be one in which the total waiting cost of every serviced set is at most its service cost. The algorithm maintains a lower bound LB(t) for the least possible service cost incurred by any proactive schedule up until the current time t. Let u be a time where LB(u) = 2^k and let v be the future time where LB(v) = 2^{k+1}, assuming no more requests are released. Intuitively, at time u the sets in the proactive schedule associated with LB(v) are serviced, and then a recursive call is made to handle the requests that arrive until the time w when LB(w) = 2^{k+1}. Note that due to the release of new requests it may be that w is earlier than v.
Computing the state of the optimum at the current time, and moving to that state, is a classic technique in online algorithms, often called the Retrospective algorithm [3]. For example, the Retrospective algorithm is optimally competitive for the online metric matching problem [4, 5]. Intuitively, the RetrospectiveCover algorithm is a generalization of the Retrospective algorithm that computes the state of the optimum at many carefully-selected times, for many different carefully-selected sub-instances, and then moves to a state that covers/combines all of these optimal states. This RetrospectiveCover algorithmic design technique seems relatively general, and at least plausibly applicable to other problems.
Within the context of the Multi-Level Aggregation Problem, our upper bound on the achievable competitive ratio is incomparable to the upper bound obtained in [1]. The upper bound of [1] is better if d^4 2^d is asymptotically less than the logarithm of the number of requests; otherwise the upper bound that we obtain is better. As a caveat, computing the lower bound used by our algorithm is definitely NP-hard, and it is not clear to us how to even obtain a polynomial-time offline O(log |R|)-approximation algorithm. Techniques used in the prior literature on offline algorithms for the TCPAP and the JRP do not seem to be applicable.
Further Background: As mentioned above, the Set Aggregation Problem generalizes the TCPAP and the JRP. More generally, we can consider the case of the Set Aggregation Problem where the set relationship forms a tree. More precisely, if we include an edge for every pair of sets S_1 and S_2 where S_1 ⊂ S_2 and there is no set S_3 such that S_1 ⊂ S_3 ⊂ S_2, and the family of sets S_i is laminar, then the resulting graph is a tree. In [6], this problem is referred to as the Multi-Level Aggregation Problem. If the tree is of height one (a root and leaves), we obtain the TCPAP. The offline version of the TCPAP on n requests can be solved exactly in O(n log n) time [7]. The best deterministic competitive ratio is 2 [8], and the best randomized one is e/(e−1) [9]. We note that the TCPAP is equivalent to the Lot Sizing Problem, which has been studied in the operations research literature since the 1950s.
If the tree has two levels, we obtain the Joint Replenishment Problem. The best offline approximation ratio is 1.791 [10] and the best competitive ratio is 3, via a primal-dual algorithm [11]. There is also a deadline version of the JRP, where there is no cost for waiting but each request must be satisfied before its deadline. This problem is a special case of general cost functions and admits an approximation ratio of 1.574 [12] and an online competitive ratio of 2 [10].
For general trees of height d, we obtain the Multi-Level Aggregation Problem. This more general problem has several applications in computing, including protocols for aggregating control messages [13, 14], energy efficient data aggregation and fusion in sensor networks [15, 16], and message aggregation in organizational hierarchies [17]. There are also applications in lot sizing problems [18–20]. For the deadline version of the MLAP, there is an offline 2-approximation algorithm [21]. In unpublished work, Pedrosa [22] showed how to adapt an algorithm of Levi et al. [23] for the Multistage Assembly Problem to obtain a (2 + ε)-approximation algorithm for the MLAP with general waiting cost functions. For the online case, no constant competitive algorithm is known. In [1], the authors give an O(d^4 2^d)-competitive algorithm for trees of height d, and a somewhat better bound of O(d^2 2^d) for the deadline version. Their algorithms use a reduction from general trees to trees whose weights decrease exponentially as one goes down the tree, and therefore rely heavily on the tree structure. Building on this, the authors of [2] give an O(d)-competitive algorithm for the deadline version.
2 The Lower Bound
We prove an Ω(log |R|) lower bound on the competitive ratio of any deterministic online algorithm for the deadline version of the Set Aggregation Problem.
Instance Construction. Conceptually, the requests are partitioned such that each request is associated with a node in a full binary tree T with n leaves. The root of T has depth 0 and height lg n; a leaf of T has depth lg n and height 0. A node of height h in T has 2^h requests associated with it. Thus a leaf of T has one associated request, and the root of T has n requests associated with it. Hence there are n requests per level and n(1 + lg n) in total. The lifetime of a request associated with the k-th node at depth d (so k ∈ {1, 2, ..., 2^d}) in the tree is [(k−1)n/2^d, kn/2^d). So the lifetimes of all requests associated with the same node are the same, the lifetimes of the requests inherit the laminar structure of T, and the lifetimes of the requests associated with the nodes at a particular level of the tree cover the time interval [0, n). Let R_x be the requests associated with a node x in T, and T_x be the requests associated with the nodes in the subtree rooted at x in T. We say a collection S of requests is sparse if it does not contain two requests associated with nodes of the same depth in T.
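The geometry of this instance can be sketched directly (a Python illustration of our own; n is assumed to be a power of two):

```python
def lifetime(n: int, d: int, k: int):
    """Lifetime [(k-1)n/2^d, k*n/2^d) of the requests of the k-th node
    at depth d (k = 1, ..., 2^d) in the full binary tree T with n leaves."""
    return ((k - 1) * n / 2 ** d, k * n / 2 ** d)

def num_requests(n: int, d: int) -> int:
    """A node at depth d has height lg n - d, hence 2^(lg n - d) = n/2^d requests."""
    return n // 2 ** d

def laminar(I, J) -> bool:
    """Two intervals are laminar if they are nested or internally disjoint."""
    (a, b), (c, d) = I, J
    return b <= c or d <= a or (a <= c and d <= b) or (c <= a and b <= d)
```

For n = 8, each level carries 2^d · n/2^d = n requests, so there are n(1 + lg n) = 32 requests in total, and all node lifetimes form a laminar family.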
We now proceed to describe the service costs. To do so we split each set R_x into two sets of equal cardinality by defining a set U_x ⊆ R_x with |U_x| = |R_x|/2. The specific requests that belong to U_x will be decided online by the adversary, depending on partial outputs of the algorithm. Consider a set of requests S. If S is ever serviced by any algorithm at a given point in time t ∈ (0, n), the laminar structure of the lifetimes implies that the requests correspond to a subset of nodes on a path P of T whose endpoints are the root and some leaf v of T, where the lifetime of v contains t. Otherwise, we can define the cost of S arbitrarily, e.g., as ∞. We first define the cost of a set S that is sparse. The cost of S is defined inductively by walking up the path P. We say that a request r aggregates with a set S if the service cost of S ∪ {r} is the same as the service cost of S. If r does not aggregate with S, then the service cost of S ∪ {r} is the service cost of S plus one. Finally, we say that r is not compatible with S if the cost of S ∪ {r} is +∞. If S is a sparse set of requests in R_x where x is a leaf, then |S| = 1 and its service cost is also one. Consider now a sparse set S ≠ ∅ with requests belonging to T_y for some y. Let x be the parent of y and r a request corresponding to x. Note that x is not a node associated with the requests in S. We have three cases:
– If S ∩ R_y = ∅, then r is not compatible with S.
– Otherwise, if y is the left child of x, then r always aggregates with S.
– Otherwise, if y is the right child of x, then r aggregates with S when r ∉ U_x, and it does not aggregate with S if r ∈ U_x.
Notice that, using this definition inductively, the cost of S ≠ ∅ is either infinity or an integer between 1 and the height of y plus one (inclusive).⁵ Also, the first condition implies that for a sparse set S to have finite cost there must exist a unique path P from a leaf to a node x such that exactly one request in S belongs to R_v for each node v of P.
If S is an arbitrary set of requests (always corresponding to a leaf-to-root path P), then S can be decomposed into several sparse sets. We define the cost of S as the minimum, over all possible decompositions, of the sum of the costs of the sparse sets in the decomposition. In this way, any solution can be converted into a solution of the same cost in which each serviced set is sparse. Hence, we can restrict ourselves to considering only sparse sets.
⁵ We remark that this definition does not yield monotone service costs. However, this does not cause any trouble. Indeed, if two sets S_1 ⊆ S_2 fulfill C(S_2) < C(S_1), then the algorithm can simply serve S_2 instead of S_1 without increasing its cost and without affecting feasibility. Hence, any instance with service cost C can be turned into an equivalent instance with non-decreasing service cost C′(S) = min_{T⊇S} C(T).
Adversarial Strategy. We now explain how to choose the requests in U_x for each x ∈ T. Let (a_x, c_x) be the lifetime of the requests associated with x, and b_x the midpoint of (a_x, c_x). Let the left and right children of x be ℓ(x) and r(x), respectively. Let S_x be the requests in R_x that the online algorithm has serviced by time b_x. If |S_x| ≤ |R_x|/2, then let U_x be an arbitrary subset of |R_x|/2 requests from R_x − S_x. If |S_x| > |R_x|/2, then let U_x contain all the requests in R_x − S_x plus an arbitrary set of |S_x| − |R_x|/2 requests from S_x.
To see that this is a valid adversarial strategy, consider a node x in T, let P be the path in T from the root to x, and let u_1, ..., u_k be the nodes in P whose right child is also in P. The requests associated with ancestors of x in T that requests in R_x will not aggregate with are determined by the sets U_{u_1}, ..., U_{u_k}, which are all known by time a_x.
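The adversary's choice of U_x can be sketched as follows (our own Python illustration; requests are represented by comparable identifiers, and "arbitrary" choices are made deterministically for reproducibility):

```python
def choose_U(R_x: set, S_x: set) -> set:
    """Pick U_x at the midpoint b_x of x's lifetime, given the set S_x of
    requests in R_x that the online algorithm has serviced by time b_x.
    Always returns exactly |R_x|/2 requests (|R_x| is assumed even)."""
    half = len(R_x) // 2
    unserviced = R_x - S_x
    if len(S_x) <= half:
        # an arbitrary |R_x|/2 requests the algorithm has not serviced yet
        return set(sorted(unserviced)[:half])
    # all unserviced requests, plus |S_x| - |R_x|/2 arbitrary serviced ones
    return unserviced | set(sorted(S_x)[: len(S_x) - half])
```

In the first case every request of U_x is still pending at time b_x; in the second, U_x consists of R_x − S_x padded with enough already-serviced requests to reach cardinality |R_x|/2.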
Competitiveness Analysis. It is not hard to see that the optimum has cost at most n. We can construct a solution where each serviced set is sparse as follows. For each node x in T, any one request in U_x is serviced at each time in (a_x, b_x), and any one request from R_x − U_x is serviced at each time in (b_x, c_x). We can thus construct a solution in which each serviced set S corresponds to a different leaf-to-root path, where each node on the path corresponds to exactly one request in S. Each set S can be made to have a service cost of one. Since there are n leaves, and thus n leaf-to-root paths, the constructed solution has cost n.
Now consider the cost to the online algorithm for the requests in R_x. We say that the incremental cost at x is how much more the online algorithm pays for servicing the requests associated with nodes in T_x than it would pay for servicing the requests in T_x − R_x if the requests in R_x were deleted from every (sparse) set S serviced by the online algorithm. First consider the case that |S_x| ≤ |R_x|/2. In this case, at time b_x the online algorithm has |R_x|/2 unserviced requests in U_x that do not aggregate with any requests in T_{r(x)}. Each such request can only be serviced together with a set S ⊆ T_{r(x)}, and thus each request in U_x implies an increase of 1 in the cost. Then the incremental cost at x is at least |R_x|/2.
Now assume that |S_x| > |R_x|/2. Consider the requests in S_x, which are serviced during the time period (a_x, b_x). A request in S_x can be compatible with at most |R_{ℓ(x)}| = |R_x|/2 many sparse sets within R_{ℓ(x)} (one for each request in R_{ℓ(x)}). Hence, in this case the online algorithm incurs an incremental cost of at least |S_x| − |R_x|/2 for the requests in S_x. The requests in R_x − S_x = U_x all do not aggregate (or are incompatible) with sparse sets in T_{r(x)}, and hence they incur a cost of |R_x| − |S_x|. Thus in both cases the total incremental cost at x is at least |R_x|/2. As there are n lg n requests in total, the online algorithm pays at least (n lg n)/2. Thus we have shown a lower bound of Ω(log n) on the competitive ratio.
Note that one can apply Yao's technique, where the requests in U_x are selected uniformly at random from the requests in R_x, to get an Ω(log n) lower bound on the competitive ratio of any randomized online algorithm.
Theorem 1. The competitive ratio of any randomized online algorithm for the deadline version of the online set aggregation problem is Ω(log |R|).
3 The Upper Bound
In Subsection 3.1 we define a lower bound on the optimum that is used within the online algorithm, and observe some relatively straightforward properties of this lower bound. In Subsection 3.2 we state the online algorithm. In Subsection 3.3 we show that the online algorithm is O(log |R|)-competitive.
3.1 Lower Bound on the Optimal Solution
We simplify our lower bound and algorithm by restricting our attention to sets whose waiting cost is at most their service cost. By doing so, we will be able to focus only on the service cost.
Definition 1. We say that a set S of requests is violated at time t in a schedule if W_t(S) > C(S). A feasible schedule is proactive if it does not contain any violated set.
We will also use in our algorithm a lower bound on the service cost of schedules for subsets of the input in subintervals of time. As mentioned in Section 1, computing this lower bound is NP-hard, but it is critical for guiding our online algorithm.
Definition 2. The lower bound LB−(s, t, d) is the minimum, over all proactive schedules Z for the requests released in the time interval (s, t), of the total service cost incurred by Z during the time period (s, d). Let LB+(s, t, d) be the minimum, over all proactive schedules Z for the requests released in the time interval (s, t), of the total service cost incurred by Z during the time period (s, d]. Polymorphically, we will also use LB−(s, t, d) (resp. LB+(s, t, d)) to denote the sets serviced within (s, d) (resp. (s, d]) by the proactive schedule that attains the minimum.
The difference between LB−(s, t, d) and LB+(s, t, d) is that service costs incurred at time d are included in LB+(s, t, d) but not in LB−(s, t, d). Notice that the values of LB−(s, t, d) and LB+(s, t, d) do not depend on future requests, and thus can be computed at time t by an online algorithm.
Lemma 1. There exists a proactive schedule whose objective value is at most twice the optimum.
Lemma 2. The values LB−(s, t, d) and LB+(s, t, d) are monotone in the set of requests; that is, adding more requests between times s and t cannot decrease either. Moreover, LB−(s, t, d) and LB+(s, t, d) are non-decreasing as a function of t and as a function of d, for any s ≤ t ≤ d.
The proofs of Lemmas 1 and 2 are in Section A.
3.2 Algorithm Design
We now give our algorithm RetrospectiveCover, which is executed at each time t. Although we understand that this is nonstandard, we believe that the most intuitive way to conceptualize our algorithm is to think of it as executing several concurrent processes. The active processes are numbered 1, 2, ..., a. Each process i maintains a start time s[i]. A process i reaches a new milestone at time t if the value of LB+(s[i], t, t) is at least 2 · LB+(s[i], m[i], m[i]), where m[i] is the time of the last milestone of process i. When a milestone for process i is reached at time t, each higher numbered process ℓ services the sets in LB−(s[ℓ], t, t) and then terminates. Process i then services the sets in LB−(s[i], t, d[i]), where d[i] is the earliest time after t where LB+(s[i], t, d[i]) ≥ 2 · LB+(s[i], t, t). Process i then starts a process i + 1 with start time t.
We give pseudocode for our algorithm RetrospectiveCover to explain how various corner cases are handled, and to aid readers who do not want to think about the algorithm in terms of concurrent processes. RetrospectiveCover uses a subroutine CheckMilestone that checks, for a process i, whether it has reached a milestone at the current time t. RetrospectiveCover is initialized by setting a = 1, s[1] = 0 and m[1] = s[1].
CheckMilestone(i, t)
 1  if m[i] = s[i] then
 2      // Process i has not seen its first milestone yet
 3      if LB+(s[i], t, t) > 0 then
 4          return true
 5      else return false
 6  else
 7      if LB+(s[i], t, t) ≥ 2 · LB+(s[i], m[i], m[i]) then
 8          return true
 9      else return false
RetrospectiveCover(t)
 1  for i = 1 to a
 2      if CheckMilestone(i, t) then
 3          for ℓ = a downto i + 1
 4              if any request has arrived since time m[ℓ] then
 5                  service every set in LB−(s[ℓ], t, t)
                    // Intuitively, process ℓ now terminates
 6          // The analysis will prove an inductive invariant on the state at this point
 7          Let d[i] be the earliest time after t where LB+(s[i], t, d[i]) ≥ 2 · LB+(s[i], t, t)
            if such a time exists, and d[i] is infinite otherwise.
 8          Service the sets in LB−(s[i], t, d[i])
 9          m[i] = t
10          a = i + 1; s[a] = t; m[a] = s[a]; return
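To illustrate the milestone mechanism in isolation, the following sketch (ours, heavily simplified: a single process, a toy discretization of time, and an abstract nondecreasing function lb in place of LB+) extracts the times at which that process would reach milestones:

```python
def milestones(lb, times, eps=1e-9):
    """Milestone times of a single process: the first time lb becomes
    positive, and afterwards every time lb has at least doubled since the
    previous milestone.  lb: a nondecreasing function of time; times: sorted
    candidate time points.  The real RetrospectiveCover additionally
    services lower-bound sets and spawns/terminates processes at each
    milestone; none of that is modeled here."""
    ms, last = [], 0.0
    for t in times:
        v = lb(t)
        if (last == 0.0 and v > eps) or (last > 0.0 and v >= 2 * last - eps):
            ms.append(t)   # record the milestone and reset the doubling target
            last = v
    return ms
```

For lb(t) = t on the time points 1, ..., 16, the milestones fall at the doubling times 1, 2, 4, 8, 16; this geometric spacing is exactly what drives the cost accounting in Subsection 3.3.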
[Figure 1 is not reproduced here. It shows the times s[i] = u_{−1}, m[i] = u_0, u_1, ..., u_k = t, with a cost of 2x_j incurred at each u_j, and costs 2x_1, ..., 2x_{k−1} incurred again at time t.]
Fig. 1. An illustration of the generic situation, and of when costs are incurred by RetrospectiveCover. The terms on the diagonal depict the cost incurred in line 8, and the terms on the right the cost incurred in line 5.
3.3 Algorithm Analysis
The first part of this subsection is devoted to proving Lemma 4, which is the key lemma, stating that the service cost incurred by RetrospectiveCover is within a logarithmic factor of the optimum. We then show how this readily implies that RetrospectiveCover is O(log |R|)-competitive.
We first make some simplifying assumptions, and then prove one simple property of RetrospectiveCover. Without loss of generality, we can assume that no two requests are released at the same time, and that no request is released exactly at a milestone (one can sequentialize the releases arbitrarily, followed by the unique milestone). Similarly, without loss of generality we can assume that each waiting function w_ρ(t) is a continuous function of t. Finally, we can assume without loss of generality that the initial waiting cost is 0, that is, w_ρ(r_ρ) = 0.
With this we can state the following lemma, which is proved in Section A.
Lemma 3. Assume that RetrospectiveCover has just computed a new deadline d[i] in line 7. Then process i will either reach a new milestone or be terminated by time d[i].
The Key Induction Argument
Lemma 4. Let c = 6. Consider a point in time when RetrospectiveCover is at line 6. Let Y be the number of requests that arrive during the time interval (s[i], t). Then the service cost incurred by RetrospectiveCover up until this point is at most c (lg Y) LB+(s[i], t, t).
The rest of this subsubsection is devoted to proving Lemma 4 by induction on the number of times that line 6 in RetrospectiveCover is invoked. So consider an arbitrary time at which RetrospectiveCover is at line 6. We now need to introduce some more notation (see Figure 1 for an illustration of the various concepts and notation). Let u_{−1} = s[i], u_0 = m[i], let y_0 be the number of requests that arrive during (s[i], u_0), and let x_0 = LB+(s[i], m[i], m[i]). Let k = a − i. Then for j ∈ {1, 2, ..., k}, let u_{j−1} = s[i+j] = m[i+j−1], let u_k = t, let y_j be the number of requests that arrive during (u_{j−1}, u_j), and let x_j = LB+(u_{j−1}, u_j, u_j). Also, for notational convenience, let z = LB+(s[i], t, t).
Before making our inductive argument, we detour slightly. Lemma 5 gives an inequality on service costs that will be useful in our analysis; its proof is given in Section A.
Lemma 5. It holds that ∑_{j=1}^{k−1} x_j ≤ x_0.
We are now ready to make our inductive argument. We start with the base case. The first milestone occurs at the first time t where LB+(0, t, t) is positive. When this occurs, RetrospectiveCover has incurred no service cost until this point because LB−(0, t, t) = 0. Note that if Y = 1 or k = 0, then again RetrospectiveCover has incurred no service cost until this point because LB−(0, t, t) = 0. So assume from now on that Y ≥ 2 and k ≥ 1.
Now consider the case k = 1. In this case, process i incurred a cost of at most 2x_0 when line 8 was invoked at time u_0, and as process i + 1 did not reach its first milestone (which would have caused process i + 2 to start), no additional cost is incurred at time t. Thus, to establish the induction, it is sufficient to show that 2x_0 + c x_0 lg y_0 ≤ c z lg Y.
Normalizing costs so that x_0 = 1, and using the fact that z ≥ 2x_0 (since t is the next milestone), it is sufficient to show that 4 y_0^c ≤ Y^{2c}. Using the fact that y_0 ≤ Y, it is sufficient to show 4 Y^c ≤ Y^{2c}. This holds as Y ≥ 2 and c = 6.
Now consider the case k = 2. In this case, process i incurred a cost of at most 2x_0 when line 8 was invoked at time u_0, process i + 1 incurs a cost of at most 2x_1 when line 8 is invoked at time u_1 and at most 2x_1 when line 5 is invoked at time t, and process i + 2 incurs no cost at time t. Thus, to establish the induction, it is sufficient to show that 2x_0 + 4x_1 + c x_0 lg y_0 + c x_1 lg y_1 ≤ c z lg Y.
Normalizing costs so that x_0 = 1, and using the facts that z ≥ 2, x_1 ≤ x_0, and y_0 + y_1 ≤ Y, it is sufficient to show that 64 y_0^c y_1^c ≤ (y_0 + y_1)^{2c}, which holds by the binomial theorem as 64 ≤ (2c choose c) = (12 choose 6) = 924.
For the remainder of the proof, we assume k ≥ 3. At time u_j, for 0 ≤ j ≤ k−1, process i + j incurs a cost of at most 2x_j when line 8 is invoked. At time t, process i + j, for 1 ≤ j ≤ k−1, incurs a cost of at most 2x_j when line 5 is invoked. Indeed, the algorithm pays LB−(u_{j−1}, t, t), which must be less than 2 LB+(u_{j−1}, u_j, u_j), as otherwise process i + j would have hit a milestone within (u_j, t). Similarly, at time t process i + k = a does not incur any cost, as otherwise process a would have hit a milestone before time t. Thus, to establish the induction, we need to show that 2x_0 + 4 ∑_{j=1}^{k−1} x_j + c ∑_{j=0}^{k−1} x_j lg y_j ≤ c z lg Y.
Note that our induction uses the fact that the cost of RetrospectiveCover during a time interval (u_{j−1}, u_j) is identical to the cost of RetrospectiveCover on the subinstance of requests that are released during (u_{j−1}, u_j), essentially because RetrospectiveCover can be viewed as a recursive algorithm. We now normalize costs so that x_0 = 1. Note that by Lemma 5, ∑_{j=1}^{k−1} x_j ≤ 1, and z ≥ 2 (or t would not be the next milestone). Thus it is sufficient to show that 64 ∏_{j=0}^{k−1} y_j^{c x_j} ≤ Y^{2c}.
We now claim that the left hand side of this inequality is maximized, subject to the constraint that ∑_{j=0}^{k−1} y_j ≤ Y, when each y_j = x_j Y/X, where X = ∑_{j=0}^{k−1} x_j (this can be shown using the method of Lagrange multipliers). Thus it is sufficient to show that 64 ∏_{j=0}^{k−1} (Y x_j/X)^{c x_j} ≤ Y^{2c}, which is equivalent to 64 Y^{c(X−2)} ∏_{j=0}^{k−1} x_j^{c x_j} ≤ X^{cX}.
Since X ≤ 2 and Y ≥ 2, the value Y^{c(X−2)} is maximized when Y = 2. Thus it suffices to show that 64 · 2^{c(X−2)} ∏_{j=0}^{k−1} x_j^{c x_j} ≤ X^{cX}. Because x_0 = 1, it is sufficient to show that 64 · 2^{c(X−2)} ∏_{j=1}^{k−1} x_j^{c x_j} ≤ X^{cX}.
Using again the method of Lagrange multipliers, the maximum of the left hand side of this inequality, subject to ∑_{j=1}^{k−1} x_j = X − 1, is attained when all the x_j are equal. Hence it suffices to show that 64 · 2^{c(X−2)} ((X−1)/(k−1))^{cX} ≤ X^{cX}. Since ((X−1)/X)^{cX} is clearly between 0 and 1, it is sufficient to show that 2^{6−2c} · 2^{cX} ≤ (k−1)^{cX}, which, by simple algebra, is true for c = 6 and k ≥ 3.
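The three sufficient inequalities used above are easy to sanity-check numerically (a check of our own, not part of the proof):

```python
from math import comb

c = 6  # the constant fixed in Lemma 4

# k = 1 case: 4 * Y^c <= Y^(2c) for all Y >= 2
assert all(4 * Y ** c <= Y ** (2 * c) for Y in range(2, 51))

# k = 2 case: the binomial-theorem step needs 64 <= binom(2c, c)
assert 64 <= comb(2 * c, c) == 924

# k >= 3 case: 2^(6 - 2c) * 2^(cX) <= (k - 1)^(cX) for 1 < X <= 2
assert all(2 ** (6 - 2 * c) * 2 ** (c * X) <= (k - 1) ** (c * X)
           for k in range(3, 8) for X in (1.25, 1.5, 1.75, 2.0))
```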
The Rest of the Analysis. Lemma 6, which is proved in Section A, shows that because the algorithm is trying to mimic proactive schedules, the waiting cost of the algorithm is at most twice its service cost. Finally, in Theorem 2, also proved in Section A, we conclude that these lemmas imply that our algorithm is O(log |R|)-competitive.
Lemma 6. For any set S serviced at time t in the schedule produced by the algorithm RetrospectiveCover, it holds that W_t(S) ≤ 2 C(S).
Theorem 2. The RetrospectiveCover algorithm is O(log |R|)-competitive.
4 Conclusion
Another possible way to generalize the multilevel aggregation problem is to assume that the domain R of possible requests (perhaps it is useful to think of R as the "types" of possible requests), and the service cost C(S) for every subset S of R, are known to the online algorithm a priori, and to then consider the competitive ratio as a function of |R|. Our lower bound instance shows that the optimal competitive ratio is Ω(log log |R|). It is not immediately clear to us how to upper bound the competitiveness of our algorithm RetrospectiveCover in terms of |R|, or how to design an algorithm for which such an analysis is natural. So determining the optimal competitive ratio as a function of |R| seems like a reasonably interesting open problem.
References
1. Bienkowski, M., Böhm, M., Byrka, J., Chrobak, M., Dürr, C., Folwarczný, L., Jeż, Ł., Sgall, J., Thang, N.K., Veselý, P.: Online algorithms for multi-level aggregation. In: European Symposium on Algorithms. (2016) 12:1–12:17
2. Buchbinder, N., Feldman, M., Naor, J.S., Talmon, O.: O(depth)-competitive algo-
rithm for online multi-level aggregation. In: ACM-SIAM Symposium on Discrete
Algorithms. (2017) 1235–1244
3. Borodin, A., El-Yaniv, R.: Online Computation and Competitive Analysis. Cam-
bridge University Press, New York, NY, USA (1998)
4. Kalyanasundaram, B., Pruhs, K.: Online weighted matching. Journal of Algorithms
14(3) (1993) 478–488
5. Khuller, S., Mitchell, S.G., Vazirani, V.V.: On-line algorithms for weighted bipar-
tite matching and stable marriages. Theoretical Computer Science 127(2) (1994)
255–267
6. Bienkowski, M., Böhm, M., Byrka, J., Chrobak, M., Dürr, C., Folwarczný, L., Jeż,
Ł., Sgall, J., Thang, N.K., Veselý, P.: Online algorithms for multi-level aggregation.
arXiv preprint arXiv:1507.02378 (2015)
7. Aggarwal, A., Park, J.K.: Improved algorithms for economic lot sizing problems.
Operations Research 41 (1993) 549–571
8. Dooly, D.R., Goldman, S.A., Scott, S.D.: On-line analysis of the TCP acknowl-
edgment delay problem. Journal of the ACM 48(2) (2001) 243–273
9. Karlin, A.R., Kenyon, C., Randall, D.: Dynamic TCP acknowledgement and other
stories about e/(e - 1). Algorithmica 36(3) (2003) 209–224
10. Bienkowski, M., Byrka, J., Chrobak, M., Jeż, Ł., Nogneng, D., Sgall, J.: Better
approximation bounds for the joint replenishment problem. In: ACM-SIAM
Symposium on Discrete Algorithms. (2014) 42–54
11. Buchbinder, N., Kimbrel, T., Levi, R., Makarychev, K., Sviridenko, M.: Online
make-to-order joint replenishment model: Primal-dual competitive algorithms. In:
ACM-SIAM Symposium on Discrete Algorithms. (2008) 952–961
12. Bienkowski, M., Byrka, J., Chrobak, M., Dobbs, N., Nowicki, T., Sviridenko, M.,
Świrszcz, G., Young, N.E.: Approximation algorithms for the joint replenishment
problem with deadlines. In: International Colloquium on Automata, Languages
and Programming. (2013) 135–147
13. Badrinath, B., Sudame, P.: Gathercast: the design and implementation of a pro-
grammable aggregation mechanism for the internet. In: International Conference
on Computer Communications and Networks. (2000) 206–213
14. Bortnikov, E., Cohen, R.: Schemes for scheduling of control messages by hierarchi-
cal protocols. In: Joint Conference of the IEEE Computer and Communications
Societies. Volume 2. (1998) 865–872
15. Hu, F., Cao, X., May, C.: Optimized scheduling for data aggregation in wire-
less sensor networks. In: Int. Conference on Information Technology: Coding and
Computing (ITCC 2005). Volume 2. (2005) 557–561
16. Yuan, W., Krishnamurthy, S., Tripathi, S.: Synchronization of multiple levels of
data fusion in wireless sensor networks. In: Global Telecommunications Conference.
Volume 1. (2003) 221–225
17. Papadimitriou, C.: Computational aspects of organization theory. In: European
Symposium on Algorithms. (1996) 559–564
18. Crowston, W.B., Wagner, M.H.: Dynamic lot size models for multi-stage assembly
systems. Management Science 20(1) (1973) 14–21
19. Kimms, A.: Multi-level lot sizing and scheduling: Methods for capacitated, dy-
namic, and deterministic models. Springer-Verlag (1997)
20. Lambert, D.M., Cooper, M.C.: Issues in supply chain management. Industrial
Marketing Management 29(1) (2000) 65–83
21. Becchetti, L., Marchetti-Spaccamela, A., Vitaletti, A., Korteweg, P., Skutella, M.,
Stougie, L.: Latency-constrained aggregation in sensor networks. ACM Transac-
tions on Algorithms 6(1) (2009) 13:1–13:20
22. Pedrosa, L.L.C.: Private communication (2013)
23. Levi, R., Roundy, R., Shmoys, D.B.: Primal-dual algorithms for deterministic
inventory problems. Mathematics of Operations Research 31(2) (2006) 267–284
A Detailed Proofs
In this section we detail some of the proofs from Section 3.
Lemma 1 There exists a proactive schedule whose objective value is at most
twice optimal.
Proof. We show how to iteratively transform an arbitrary schedule Z into a
proactive schedule in such a way that the total cost at most doubles. Let t be
the next time in Z at which there is an unserved set S with the property that the
waiting cost for S, infinitesimally after t, is greater than the service cost for S.
We then add the set S at time t to Z. The service cost of this set is at most
the total waiting cost of the requests that it serves. Thus the total service cost
of the added sets is at most the total waiting cost in the original schedule Z.
Further, the transformation can only decrease the total waiting cost, since the
requests in S are served no later than they were originally.
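To make the factor of two explicit, write C_Z and W_Z for the service and waiting costs of the original schedule Z, and Z′ for the transformed proactive schedule (notation introduced only for this calculation):

```latex
% Service cost of Z' is the original service cost plus the cost of the
% added sets (at most W_Z); the waiting cost of Z' is at most W_Z.
\mathrm{cost}(Z')
  \;\le\; \underbrace{(C_Z + W_Z)}_{\text{service cost of } Z'}
        + \underbrace{W_Z}_{\text{waiting cost of } Z'}
  \;\le\; 2\,(C_Z + W_Z)
  \;=\; 2\cdot\mathrm{cost}(Z).
```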
Lemma 2 The values LB−(s, t, d) and LB+(s, t, d) are monotone in the set of
requests; that is, adding more requests between times s and t cannot decrease
either. Moreover, LB−(s, t, d) and LB+(s, t, d) are non-decreasing as a function
of t and as a function of d, for any s ≤ t ≤ d.
Proof. Any schedule that is proactive for the larger set of requests is also proac-
tive for the smaller set of requests, because the waiting cost of the serviced
requests cannot be larger for the smaller set of requests than for the larger set.
The monotonicity in t follows directly from the monotonicity in the set of
requests. The monotonicity in d is clear, since for any d ≤ d′, a proactive
solution up to time d′ is also proactive up to time d.
Lemma 3 Assume that RetrospectiveCover just computed a new deadline
d[i] in line 7. Then process i will either reach a new milestone or be terminated
by time d[i].
Proof. If d[i] is infinite, then this is obvious, so assume otherwise. If process i
is terminated before time d[i], then this is obvious, so assume otherwise. If no
requests arrive during the time interval (m[i], d[i]), then process i will reach a
new milestone exactly at time d[i], by the definition of milestones. If requests
arrive before d[i], then the claim follows from the monotonicity of LB+ (see
Lemma 2).
Lemma 5 It holds that ∑_{j=1}^{k−1} x_j ≤ x_0.
Proof. First notice that LB−(u_{−1}, t, t) ≤ 2 LB+(u_{−1}, u_0, u_0) = 2x_0, since oth-
erwise process i would have hit a milestone within the interval (u_0, t). Then,
it suffices to show that ∑_{j=0}^{k−1} x_j ≤ LB−(u_{−1}, t, t). To show this last bound, fix
j ∈ {0, . . . , k − 1} and consider a proactive schedule Z for the requests arriving in
(u_{−1}, t). Within each interval (u_{j−1}, u_j), the solution is proactive when consid-
ering requests with release date in (u_{j−1}, u_j], and hence the serving cost of the
requests served by Z within this interval is at least LB+(u_{j−1}, u_{j−1}, u_j). We
remark that this is also true for j = k − 1, as u_{k−1} < t. Taking Z to minimize
the service cost within (u_{−1}, t), we obtain the required bound, ∑_{j=0}^{k−1} x_j ≤
LB−(u_{−1}, t, t).
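Combining the two bounds in the proof yields the claimed inequality:

```latex
% Subtract x_0 from the sum starting at j = 0 and apply both bounds:
\sum_{j=1}^{k-1} x_j
  \;=\; \sum_{j=0}^{k-1} x_j - x_0
  \;\le\; LB^-(u_{-1}, t, t) - x_0
  \;\le\; 2x_0 - x_0
  \;=\; x_0.
```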
Lemma 6 For any set S serviced at time t in the schedule produced by the
algorithm RetrospectiveCover, it will be the case that W_t(S) ≤ 2C(S).
Proof. Assume that a process i hit a milestone at time t. We adopt the notation
from Section 3.3, illustrated in Figure 1. Additionally, let U_j be the
requests that are released in the time interval (u_{j−1}, u_j) for j ∈ {0, 1, . . . , k}.
We now prove that the sets serviced in line 5 have waiting cost at most twice
the service cost. Also, we show that after serving such sets, the set ∪_{h=k−j}^{k} U_h
has no violated subset at time t, where j = a − ℓ. We show this by induction
on j = a − ℓ. The base case is when j = 0 (ℓ = a). There is no violated subset
of U_k at time t, since process i + k = a did not hit a milestone before time t.
Thus, no set serviced in LB−(s[a], t, t) is violated. Obviously, after servicing
such sets, U_k still has no violated subset. Now let us show the two properties for
an arbitrary j. By the induction hypothesis, the set ∪_{h=k−(j−1)}^{k} U_h has no violated
subset at time t. There is no violated subset of U_{k−j} at time t, since the servicing
of sets in LB−(s[a−j], m[a−j], d[a−j]) = LB−(u_{k−j−1}, u_{k−j}, d[a−j]) at time
m[a−j] guaranteed that U_{k−j} would not have any violated subsets until after
time d[a−j], and t ≤ d[a−j]. Thus no set serviced in LB−(s[a−j], t, t) in line 5
is violated, and hence each such set has waiting cost at most twice the service
cost. Further, since LB−(s[a−j], t, t) = LB−(u_{k−j−1}, t, t) is a proactive
schedule, after serving the sets in this solution the set ∪_{h=k−j}^{k} U_h has no
violated subset at time t.
Finally, the same argument can be applied to the sets serviced in line 8 of
RetrospectiveCover.
Theorem 2 The RetrospectiveCover algorithm is O(log |R|)-competitive.
Proof. Applying Lemma 4 to the original process, and noting that the service
cost in the final invocation of line 8 is at most twice the previous service costs,
one obtains that the service cost of the algorithm is O(log |R|) times the lower
bound. By Lemma 1, the lower bound is at most twice the optimal, and by
Lemma 6, the waiting cost of the requests serviced by the algorithm is at most
twice the service cost of these requests. Finally, the waiting cost of any
unserviced requests is at most twice the service cost of the algorithm.
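For concreteness, the pieces combine as follows, where C and W denote the total service and waiting costs of the algorithm and LB the lower bound (notation and constants used only in this summary, which does not optimize them):

```latex
% Service cost bound, quality of the lower bound, and waiting cost bound:
C = O(\log|R|)\cdot LB, \qquad
LB \le 2\,\mathrm{OPT}, \qquad
W \le \underbrace{2C}_{\text{serviced}} + \underbrace{2C}_{\text{unserviced}} = 4C,
% and hence
\mathrm{ALG} = C + W \;\le\; 5C \;=\; O(\log|R|)\cdot LB \;=\; O(\log|R|)\cdot \mathrm{OPT}.
```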