arXiv:1312.4413v1 [cs.DS] 16 Dec 2013
Near-optimal labeling schemes for nearest common
ancestors
Stephen Alstrup, Esben Bistrup Halvorsen, Kasper Green Larsen
Abstract
We consider NCA labeling schemes: given a rooted tree T, label the nodes
of T with binary strings such that, given the labels of any two nodes, one can
determine, by looking only at the labels, the label of their nearest common
ancestor.
For trees with n nodes we present upper and lower bounds establishing
that labels of size (2 ± ε) log n, ε < 1, are both sufficient and necessary.¹
Alstrup, Bille, and Rauhe (SIDMA'05) showed that ancestor and NCA
labeling schemes have labels of size log n + Ω(log log n). Our lower bound
increases this to log n + Ω(log n) for NCA labeling schemes. Since Fraigniaud
and Korman (STOC'10) established that labels in ancestor labeling schemes
have size log n + Θ(log log n), our new lower bound separates ancestor and
NCA labeling schemes. Our upper bound improves the 10 log n upper bound
by Alstrup, Gavoille, Kaplan and Rauhe (TOCS'04), and our theoretical result
even outperforms some recent experimental studies by Fischer (ESA'09) where
variants of the same NCA labeling scheme are shown to all have labels of size
approximately 8 log n.
1 Introduction
A labeling scheme assigns a label, which is a binary string, to each node of a tree
such that, given only the labels of two nodes, one can compute some predefined
function of the two nodes. The main objective is to minimize the maximum label
length: that is, the maximum number of bits used in a label.
With labeling schemes it is possible, for instance, to avoid costly access to large,
global tables, to compute locally in distributed settings, and to have storage used for
names/labels be informative. These properties are used in XML search engines [2],
network routing and distributed algorithms [57, 29, 22, 24, 29, 30], graph representations [40] and other areas. An extensive survey of labeling schemes can be found
in [35].
A nearest common ancestor (NCA) labeling scheme labels the nodes such that,
for any two nodes, their labels alone are sufficient to determine the label of their
NCA. Labeling schemes can be found, for instance, for distance, ancestor, NCA,
connectivity, parent and sibling [36, 44, 51, 7, 40, 57, 8, 41, 15, 16, 45, 55], and have
also been analyzed for dynamic trees [20]. NCA labeling schemes are used, among
other things, to compute minimum spanning trees in a distributed setting [50, 28,
13].
Department of Computer Science, University of Copenhagen, Denmark, s.alstrup@diku.dk.
Department of Computer Science, University of Copenhagen, Denmark, esbenbh@diku.dk.
MADALGO - Center for Massive Data Algorithmics, a Center of the Danish National Research
Foundation, Department of Computer Science, Aarhus University, Denmark, larsen@cs.au.dk.
¹ All logarithms in this paper are in base 2.
Our main result establishes that labels of size (2 ± ε) log n, ε < 1, are both
necessary and sufficient for NCA labeling schemes for trees with n nodes. More
precisely, we show that label sizes are lower bounded by 1.008 log n − O(1) and
upper bounded by 2.772 log n + O(1).
Since our lower bound is log n + Ω(log n), this establishes an exponential separation (in the nontrivial, additive term) between NCA labeling and the closely related problem of ancestor labeling, which can be solved optimally with labels of size log n + Θ(log log n) [32, 6]. (An ancestor labeling scheme labels the nodes in a tree such that, for any two nodes, their labels alone are sufficient to determine whether the first node is an ancestor of the second.) The upper bound of log n + O(log log n) for ancestor [32] is the latest result in a sequence [2, 41, 42, 8, 1, 31] of improvements on the trivial 2 log n bound [58].
Our upper bound improves the 10 log n label size of [7]. In addition to the NCA labeling scheme used to establish our upper bound, we present another scheme with labels of size 3 log n which on the RAM uses only linear time for preprocessing and constant time for answering queries, meaning that it may be an efficient solution for large trees compared to traditional non-labeling-scheme algorithms [38].
NCAs, also known as least common ancestors or lowest common ancestors (LCAs), have been studied extensively over the last several decades in many variations; see, for example, [48, 3, 5, 56, 21, 10, 53, 11, 33, 56, 12]. A linear time algorithm to preprocess a tree such that subsequent NCA queries can be answered in constant time is described in [38]. NCAs have numerous applications for graphs [34, 43, 23, 5], strings [37, 25], planarity testing [59], geometric problems [18, 33], evolutionary trees [26], bounded tree-width algorithms [19] and more. A survey on NCAs with variations and applications can be found in [7].
A log n + O(log* n) adjacency labeling scheme is presented in [9], and adjacency labeling schemes of log n + O(1) are presented in [14] for the special cases of binary trees and caterpillars. We present NCA labeling schemes with labels of size 2.585 log n + O(1) and log n + log log n + O(1) for binary trees and caterpillars, respectively. Our lower bound holds for any family of trees that includes all trees of height O(log n) in which all nodes have either 2 or 3 children.
1.1 Variations and related work.
The NCA labeling scheme in [7] is presented as an O(log n) result, but it is easy to see that the construction gives labels of worst-case size 10 log n. The algorithm uses a decomposition of the tree, where each component is assigned a sub-label, and a label for a node is a combination of sub-labels. Fischer [27] ran a series of experiments using various techniques for sub-labels [7, 49, 39] and found experimentally that worst-case label sizes are approximately 8 log n.
Peleg [52] has established labels of size Θ(log² n) for NCA labeling schemes in which NCA queries have to return a predefined label of O(log n) bits. Experimental studies of this variation can be found in [17]. In [13] the results from [7] are extended to predefined labels of length k. We have included a corollary that shows that such an extension can be achieved by adding k log n bits to the labels.
In [46] a model (1-query) is studied where one, in addition to the labels of the input nodes, can access the label of one additional node. With this extra information, using the result from [7] for NCA labeling, they present a series of results for NCA and distance. As our approach improves the label length from [7], we also improve some of the label lengths in [46].
Sometimes various computability requirements are imposed on the labeling scheme: in [45] a query should be computable in polynomial time; in [2] in constant time on the RAM; and in [40] in polynomial time on a Turing machine. We use the same approach as in [7], but with a different kind of sub-labels and with different encodings for lists of strings, and it is only the 2.772 log n + O(1) labeling scheme for trees that we do not show how to implement efficiently.
2 Preliminaries
The size or length of a binary string s = s_1···s_k is the number of bits |s| = k in it. The concatenation of two strings s and t is denoted s·t.
Let T be a rooted tree with root r. The depth of a node v, denoted depth(v), is the length of the unique path from r to v. If a node u lies on the path from r to a node v, then u is an ancestor of v and v is a descendant of u. If, in addition, depth(v) = depth(u) + 1, so that uv is an edge in the tree, then u is the unique parent of v, denoted parent(v), and v is a child of u. A binary tree is a rooted tree in which any node has at most two children. A common ancestor of two nodes v and w is a node that is an ancestor of both v and w, and their nearest common ancestor (NCA), denoted nca(v, w), is the unique common ancestor with maximum depth. The descendants of v form an induced subtree T_v with v as root. The size of v, denoted size(v), is the number of nodes in T_v.
Let 𝒯 be a family of rooted trees. An NCA labeling scheme for 𝒯 consists of an encoder and a decoder. The encoder is an algorithm that accepts any tree T from 𝒯 as input and produces a label l(v), which is a binary string, for every node v in T. The decoder is an algorithm that takes two labels l(v) and l(w) as input and produces the label l(nca(v, w)) as output. Note that the encoder knows the entire tree when producing labels for nodes, whereas the decoder knows nothing about v, w or the tree from which they come, although it does know that they come from the same tree and that this tree belongs to 𝒯. The worst-case label size is the maximum size of a label produced by the encoder for any node in any tree in 𝒯.
3 Lower bound
This section introduces a class of integer sequences, 3-2 sequences, and an associated class of trees, 3-2 trees², so that two 3-2 trees that have many labels in common when labeled with an NCA labeling scheme correspond to two 3-2 sequences that are "close" in the sense of a metric known as the Levenshtein distance. By considering a subset of 3-2 sequences that are pairwise distant in this metric, the corresponding set of 3-2 trees cannot have very many labels in common, which leads to a lower bound on the total number of labels and hence on the worst-case label size.
3.1 Levenshtein distance and 3-2 sequences.
The Levenshtein distance [47], or edit distance, between two sequences x and y is defined as the number lev(x, y) of single-character edits (insertions, deletions and substitutions) required to transform x into y. A 3-2 sequence of length 2k is an integer sequence x = (x_1, ..., x_{2k}) with exactly k 2s and k 3s.
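As an aside, the edit distance itself is easy to compute; the following is a minimal sketch (our own code, not part of the paper) of the standard dynamic-programming computation of lev(x, y):

```python
def lev(x, y):
    """Levenshtein distance: number of single-character edits (insertion,
    deletion, substitution) needed to transform x into y."""
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                       # delete the first i items of x
    for j in range(n + 1):
        d[0][j] = j                       # insert the first j items of y
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]

# a 3-2 sequence of length 2k has exactly k 2s and k 3s
x, y = (2, 3, 2, 3), (3, 2, 2, 3)
```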
Lemma 3.1. For any h, k with 2 ≤ h ≤ k and k an integer with k ≥ 90, there exists a set Σ of 3-2 sequences of length 2k with $|\Sigma| \geq 2^{1.95k}/(16k/h)^{3h}$ and lev(x, y) > h for all x, y ∈ Σ with x ≠ y.

Proof. Since lev(x, y) > h is equivalent to lev(x, y) > ⌊h⌋ and $2^{1.95k}/(16k/\lfloor h\rfloor)^{3\lfloor h\rfloor} \geq 2^{1.95k}/(16k/h)^{3h}$, we can safely assume that h is an integer.
² The related "2-3 trees" [4] have a slightly different definition, which is why we use a different terminology here.
Now, let x be an arbitrary 3-2 sequence of length 2k, and consider the number of 3-2 sequences y of length 2k with lev(x, y) ≤ h. We can transform x into y by performing r deletions followed by s substitutions followed by t insertions, where r + s + t ≤ h. This leads to the following upper bound on the number of y's:
$$\sum_{r=0}^{h}\binom{2k}{r}\sum_{s=0}^{h-r}\binom{2k-r}{s}\sum_{t=0}^{h-r-s}\binom{2k-r-s+t}{t}2^t \;\leq\; (h+1)^3\binom{2k}{h}^{2}\binom{3k}{h}2^h.$$
Using Stirling's approximation [54] and the fact that (h + 1)³ ≤ 8^h for all h ≥ 2, it follows that this is upper bounded by
$$8^h(2ke/h)^{2h}(3ke/h)^{h}2^h \;=\; 3^h(4ke/h)^{3h} \;\leq\; (16k/h)^{3h}.$$
We now construct Σ as follows. Let Σ′ denote the set of all 3-2 sequences of length 2k, and note that $|\Sigma'| = \binom{2k}{k}$. Pick an arbitrary 3-2 sequence x from Σ′, add it to Σ and remove all strings y from Σ′ with lev(x, y) ≤ h. Continue by picking one of the remaining strings from Σ′, add it to Σ and remove all strings from Σ′ within distance h. When we run out of strings in Σ′ we will, according to the previous calculation and Stirling's approximation [54], have
$$|\Sigma| \;\geq\; \frac{\binom{2k}{k}}{(16k/h)^{3h}} \;\geq\; \frac{2^{2k-1}}{k^{1/2}(16k/h)^{3h}} \;\geq\; \frac{2^{1.95k}}{(16k/h)^{3h}},$$
where the last inequality follows from the fact that $2^{0.05k-1} \geq k^{1/2}$ whenever k ≥ 90.
3.2 3-2 trees and a lower bound.
Given a 3-2 sequence x = (x_1, ..., x_{2k}) of length 2k, we can create an associated tree of depth 2k where all nodes at depth i − 1 have exactly x_i children, and all nodes at depth 2k are leaves. We denote this tree the 3-2 tree associated with x. The number of nodes at depth i in the 3-2 tree associated with x is x_1···x_i; in particular, the number of leaves is $x_1\cdots x_{2k} = 6^k$. The total number of nodes is upper bounded by $2\cdot 6^k$.
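To illustrate the counts involved, here is a small sketch (helper names are ours) that tallies the nodes of the 3-2 tree associated with a sequence, level by level:

```python
def tree_counts(x):
    """(total node count, leaf count) of the 3-2 tree in which every node
    at depth i-1 has exactly x[i-1] children."""
    level, total = 1, 1              # one node at depth 0: the root
    for xi in x:
        level *= xi                  # number of nodes at the next depth
        total += level
    return total, level

total, leaves = tree_counts((2, 3, 3, 2))   # k = 2, so 6^2 = 36 leaves
```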
Consider the set of labels produced by an NCA labeling scheme for the nodes in a tree. Given a subset S of these labels, let S̄ denote the set of labels which can be generated from S by the labeling scheme: thus, S̄ contains the labels in S as well as the labels for the NCAs of all pairs of nodes labeled with labels from S. The labels in S̄ can be organized as a rooted tree according to their ancestry relations, which can be determined directly from the labels using the decoder of the labeling scheme and without consulting the original tree. The tree produced in this way is denoted $T_{\bar S}$ and is uniquely determined from S. Note that, if all the nodes in S are leaves, then all internal nodes in $T_{\bar S}$ must have been obtained as the NCA of two leaves, and hence must have at least two children.
Now, given a tree $T_{\bar S}$ induced by a subset S of labels assigned to the leaves of a tree T by an NCA labeling scheme, we can create an integer sequence, I(S), as follows. Start at the root of $T_{\bar S}$, and let the first integer be the number of children of the root. Then recurse to a child v whose subtree in $T_{\bar S}$ contains a maximum number of leaves, and let the second integer be the number of children of this child. Continue this until a leaf is reached (without writing down the last 0). Note that, if T is a 3-2 tree of depth 2k, the produced sequence I(S) will have length at most 2k and will contain only 2s and 3s.
Lemma 3.2. Let T be a 3-2 tree associated with the 3-2 sequence x = (x_1, ..., x_{2k}). Let S be a set of m labels assigned to the leaves of T by an NCA labeling scheme. Then lev(x, I(S)) ≤ log_{3/2}(6^k/m).
Proof. We describe a way to transform x into I(S). Start at the root of T, and let i be the depth in T containing the node v whose label l(v) is the root in $T_{\bar S}$. Delete all entries x_1, ..., x_{i−1} from x and compare the number of children of l(v) in $T_{\bar S}$ to x_i. If the numbers are the same, leave x_i as it is; if not, we must have that x_i = 3 and that the number of children of l(v) is 2, so replace x_i by 2. Then recurse to a child w of v in T for which the corresponding subtree in $T_{\bar S}$ contains a maximum number of leaves, and repeat the process with T_w, the corresponding subtree of $T_{\bar S}$ and the remaining elements x_{i+1}, ..., x_{2k}.
Clearly, this transforms x into I(S) using only deletions and substitutions, where all substitutions replace a 3 by a 2. Each of these edits modifies the maximum possible number of leaves in $T_{\bar S}$ compared to T by a factor of either 1/2 or 2/3. It follows that the number m of leaves in $T_{\bar S}$ satisfies $m \leq 6^k\cdot(2/3)^{\mathrm{lev}(x,I(S))}$, which implies lev(x, I(S)) ≤ log_{3/2}(6^k/m) as desired.
We now present our main lower bound result. The result is formulated for a family 𝒯 that is large enough to contain all 3-2 trees with N nodes; in particular, it holds for the family of all rooted trees with at most N nodes.

Theorem 3.3. If 𝒯 is a family of trees that contains all 3-2 trees with up to $N \geq 2\cdot 3^{240}$ nodes, then any NCA labeling scheme for 𝒯 has a worst-case label size of at least 1.008 log N − 318.
Proof. Let $k = 120\lfloor\frac{1}{120}\log_6(N/2)\rfloor$ be log_6(N/2) rounded down to the nearest multiple of 120, and let $n = 6^k \leq N/2$. Further, set $m = n^{119/120}$ and $h = 2\log_{3/2}(n/m)$. Note that n, m and $n/m = n^{1/120}$ are all integers. Observe also that $n > (N/2)/6^{120} \geq (3/2)^{120}$ and thereby that h ≥ 2. Finally, observe that $h = \frac{1}{60}k\log_{3/2}6 \leq k$ and that k ≥ 120.
According to Lemma 3.1, there exists a set Σ of 3-2 sequences of length 2k with $|\Sigma| \geq 2^{1.95k}/(16k/h)^{3h}$ and lev(x, y) > h for all x, y ∈ Σ with x ≠ y. The set Σ defines a set of |Σ| associated 3-2 trees with n leaves and at most 2n ≤ N nodes. In particular, all the associated trees belong to 𝒯. We can estimate the number of elements in Σ as follows:
$$
\begin{aligned}
|\Sigma| \;\geq\; \frac{2^{1.95k}}{(16k/h)^{3h}}
&= \frac{2^{1.95\log_6 n}}{\bigl(8\log_6 n/\log_{3/2}(n/m)\bigr)^{6\log_{3/2}(n/m)}}\\
&= \frac{n^{1.95\log_6 2}}{\bigl(960\log_6 n/\log_{3/2} n\bigr)^{(6\log_{3/2} n)/120}}\\
&= \frac{n^{1.95\log_6 2}}{\bigl(960\log_6(3/2)\bigr)^{0.05\log_{3/2} n}}\\
&= \frac{n^{1.95\log_6 2}}{n^{0.05\log_{3/2}(960\log_6(3/2))}}\\
&= n^{1.95\log_6 2 \,-\, 0.05\log_{3/2}(960\log_6(3/2))}
\;\geq\; n^{0.09}.
\end{aligned}
$$
Now suppose that an NCA labeling scheme labels the nodes of all 3-2 trees associated with sequences in Σ. Consider two trees associated with sequences x, y ∈ Σ, and let S denote the set of leaf labels that are common to x and y. We must then have |S| < m, since otherwise, by Lemma 3.2, we would have
$$\mathrm{lev}(x, y) \;\leq\; \mathrm{lev}(x, I(S)) + \mathrm{lev}(I(S), y) \;\leq\; \tfrac{h}{2} + \tfrac{h}{2} \;=\; h.$$
It follows that, if we restrict attention to a subset 𝒯′ consisting of min(|Σ|, n/(2m)) of the trees associated with strings in Σ, then the leaves of any tree in 𝒯′ can share a total of at most n/2 labels with all other trees in 𝒯′. In other words, every tree in 𝒯′ has at least n/2 leaf labels that are unique for this tree within the set of all leaf labels of trees in 𝒯′. This gives a total of at least
$$\frac{n}{2}\min\bigl(|\Sigma|,\, n/(2m)\bigr) \;=\; \frac{n}{2}\min\bigl(n^{0.09},\, n^{1/120}/2\bigr) \;\geq\; n^{121/120}/8 \;\geq\; n^{1.008}/8$$
distinct labels. If the worst-case label size is L, we can create at most $2^{L+1} - 1$ distinct labels, and we must therefore have $n^{1.008}/8 \leq 2^{L+1} - 1$, from which it follows that
$$L \;\geq\; \lfloor 1.008\log n\rfloor - 3 \;\geq\; \lfloor 1.008\log\bigl(N/(2\cdot 6^{120})\bigr)\rfloor - 3 \;\geq\; 1.008\log N - 318.$$
4 Upper bound
In this section we construct an NCA labeling scheme that assigns to every node
a label consisting of a sequence of sub-labels, each of which is constructed from a
decomposition of a tree known as heavy-light decomposition. The labeling scheme
is similar to that of [7] but with a different way of constructing sub-labels (presented
in Section 4.4), a different way of ordering sub-labels (presented in Section 4.2) and
a different way of encoding lists of sub-labels (presented in Section 4.1).
4.1 Encodings.
We begin with a collection of small results that show how to efficiently encode
sequences of binary strings.
Lemma 4.1. A collection of n objects can be uniquely labeled with binary strings of length at most L if and only if L ≥ ⌊log n⌋.

Proof. There are $2^L$ binary strings of length L, and hence there are $2^{L+1} - 1$ binary strings of length at most L. Thus, we can create unique labels for n different objects using labels of length at most L whenever $n \leq 2^{L+1} - 1$, which is equivalent to L ≥ ⌈log(n + 1)⌉ − 1 = ⌊log n⌋. (The latter equality follows from the simple fact that ⌊r⌋ = ⌈s⌉ − 1 for all real numbers r < s for which there does not exist an integer z with r < z < s.)
Lemma 4.2. A collection of n objects can be uniquely labeled with binary strings of length exactly L if and only if L ≥ ⌈log n⌉.

Proof. The argument is similar to the one in Lemma 4.1, but with the modification that we only use labels of length exactly equal to L. This yields the inequality $n \leq 2^L$, which is equivalent to L ≥ ⌈log n⌉.
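The correspondence behind Lemmas 4.1 and 4.2 can be made concrete as follows; this is our own illustration (helper names are ours), mapping the numbers 0, ..., n − 1 to binary strings of length exactly L, or of length at most L with shortest strings first:

```python
def label_exact(i, L):
    """i-th binary string of length exactly L (Lemma 4.2 needs
    L >= ceil(log2 n) to fit n objects)."""
    return format(i, "b").zfill(L) if L > 0 else ""

def label_at_most(i, L):
    """i-th binary string of length at most L, shortest first (Lemma 4.1:
    there are 2^(L+1) - 1 such strings, so L >= floor(log2 n) suffices)."""
    length = 0
    while i >= 2 ** length:      # skip the 2^length strings of this length
        i -= 2 ** length
        length += 1
    return format(i, "b").zfill(length) if length > 0 else ""
```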
Lemmas 4.1 and 4.2 can only be efficiently implemented if there is a way to
efficiently implement the 1-1 correspondence between the objects and the numbers
1,...,n. The remaining lemmas of this section show how to encode sequences of
binary strings whose concatenation has length t, and all of them except Lemma 4.4
can be implemented with linear time encoding and constant time decoding on a
RAM machine in which a machine word has size O(t).
Lemma 4.3. Let a = (a_1, a_2) be a pair of (possibly empty) binary strings with |a_1·a_2| = t. We can encode a as a single binary string of length t + ⌈log t⌉ such that a decoder without any knowledge of a or t can recreate a from the encoded string alone.

Proof. Since |a_1| ≤ |a_1·a_2| = t, we can use Lemma 4.2 to encode |a_1| with exactly ⌈log t⌉ bits. We then encode a by concatenating the encoding of |a_1| with a_1·a_2 to give a string of exactly t + ⌈log t⌉ bits. Since t is uniquely determined from t + ⌈log t⌉, the decoder can split the encoded string into the encoding of |a_1| and the concatenation a_1·a_2, from which it can recreate a_1 and a_2.
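The following is a sketch in the spirit of Lemma 4.3 (our own code; all helper names are ours). One detail differs from the statement above: we spend ⌈log(t + 1)⌉ bits on the length prefix so that all t + 1 possible values of |a_1| fit, including the boundary cases.

```python
import math

def header_bits(t):
    # ceil(log2(t+1)) bits suffice for the t+1 possible values of |a1|
    return math.ceil(math.log2(t + 1))

def encode_pair(a1, a2):
    t = len(a1) + len(a2)
    h = header_bits(t)
    header = format(len(a1), "b").zfill(h) if h > 0 else ""
    return header + a1 + a2

def decode_pair(s):
    # t + header_bits(t) is strictly increasing in t, so the decoder can
    # recover t, and hence the split point, from the length alone
    t = 0
    while t + header_bits(t) < len(s):
        t += 1
    h = header_bits(t)
    n1 = int(s[:h], 2) if h > 0 else 0
    return s[h:h + n1], s[h + n1:]
```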
We thank Mathias Bæk Tejs Knudsen for inspiring parts of the proof of Lemma 4.4 below. As the proof shows, the encoding in Lemma 4.4 is optimal with respect to size but comes with no guarantees for time complexities. Lemma 4.5 further below is a suboptimal version of Lemma 4.4 but with a more efficient implementation.

Lemma 4.4. Let a = (a_0, ..., a_{2k}) be a list of (possibly empty) binary strings with |a_0···a_{2k}| = t and with a_{2i}·a_{2i+1} ≠ ε for all i < k. We can encode a as a single binary string of length $\lceil(1+\log(2+\sqrt{2}))t\rceil$ such that a decoder without any knowledge of a, t or k can recreate a from the encoded string alone.
Proof. We will use Lemma 4.2 to encode a for a fixed t. To do this, we must count the number of possible sequences in the form of a. There are $2^t$ choices for the t bits in the concatenation a_0···a_{2k}, and every subdivision of the concatenation into the substrings a_i corresponds to a solution to the equation
$$x_0 + x_1 + \cdots + x_{2k} = t$$
where $x_{2i} + x_{2i+1} \geq 1$ for i = 0, ..., k − 1. Note that we must have k ≤ t. For a given t, let s_t denote the number of solutions (including choices of k) to the above equation. We shall prove further below that
$$s_t = \tfrac{1}{4}c^{t+1} + \tfrac{1}{4}d^{t+1}, \qquad (1)$$
where $c = 2+\sqrt{2}$ and $d = 2-\sqrt{2}$, which easily implies $s_t \leq (2+\sqrt{2})^t$. It then follows that the total number of sequences a for fixed t is bounded by $2^t(2+\sqrt{2})^t$, and using Lemma 4.2 we can therefore encode any such a as a string with exactly $\lceil(1+\log(2+\sqrt{2}))t\rceil$ bits. Since t is uniquely determined by this length, the decoder can determine t from the length of the string and then use Lemma 4.2 to recreate a.
It remains to show (1). For any t, the number of solutions with k = 0 is 1. Given a solution where k > 0, let j = x_0 + x_1, and note that j ≥ 1 and that x_2 + ··· + x_{2k} = t − j is a solution to the problem for t − j. There are j + 1 solutions to x_0 + x_1 = j, and hence the total number of solutions is
$$s_t = 1 + \sum_{j=1}^{t} s_{t-j}(j+1).$$
Using this expression, it is straightforward to see that
$$s_t - 2s_{t-1} + s_{t-2} = 2s_{t-1} - s_{t-2},$$
which implies $s_t = 4s_{t-1} - 2s_{t-2}$. The characteristic polynomial of this recurrence relation has roots c and d, and hence $s_t = \alpha c^t + \beta d^t$ for some α, β. Using s_0 = 1 and s_1 = 3 to solve, we obtain α = c/4 and β = d/4, which proves (1).
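The recurrence and the closed form (1) can be checked numerically; the following is our own verification code, not part of the paper:

```python
import math

def s_direct(T):
    """s_t via the recurrence s_t = 1 + sum_{j=1}^{t} s_{t-j}(j+1)."""
    s = [1]                      # s_0 = 1: only the solution with k = 0
    for t in range(1, T + 1):
        s.append(1 + sum(s[t - j] * (j + 1) for j in range(1, t + 1)))
    return s

def s_closed(t):
    """Closed form (1): s_t = c^{t+1}/4 + d^{t+1}/4."""
    c, d = 2 + math.sqrt(2), 2 - math.sqrt(2)
    return round((c ** (t + 1) + d ** (t + 1)) / 4)

s = s_direct(12)
```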
Lemma 4.5. Let a = (a_0, ..., a_{2k}) be a list of (possibly empty) binary strings with |a_0···a_{2k}| = t and with a_{2i}·a_{2i+1} ≠ ε for all i < k. We can encode a as a single binary string of length 3t such that a decoder without any knowledge of a, t or k can recreate a from the encoded string alone.

Proof. We encode a as a concatenation of three binary strings of lengths t, t − 1 and t + 1, respectively. The first string is the concatenation ã = a_0···a_{2k}. The second string has a 1 in the i'th position for i ≤ t − 1 exactly when the (i + 1)'th position of ã is the first bit in a substring a_{2j}·a_{2j+1} (which by the assumption is nonempty for all j). The third string has a 1 in the i'th position for i ≤ t exactly when the i'th position of ã is the first bit in a substring a_{2j+1} for some j or in a_{2k}, and a 1 in the (t + 1)'th position exactly when a_{2k} ≠ ε.
If the decoder receives the concatenation of length 3t of these three strings, it can easily recreate the three strings by splitting the string into three substrings of sizes t, t − 1 and t + 1. The first string is ã, which it can then split at all positions where the second string has a 1. This gives a list of nonempty strings of the form a_{2i}·a_{2i+1} for i ≤ k − 2 as well as the string a_{2k−2}·a_{2k−1}·a_{2k}. The decoder can then use the third string to split each of these concatenations as follows. For every (nonempty) concatenation a_{2i}·a_{2i+1}, consider the corresponding bits in the third string. If one of these bits is a 1, then the concatenation should be split at that position; in particular, if the 1 is at the first bit in the concatenation, then it means that a_{2i} is empty. If none of the bits is a 1, then it means that a_{2i+1} is empty. In all cases, we can recreate a_{2i} and a_{2i+1}. Likewise, the concatenation a_{2k−2}·a_{2k−1}·a_{2k} can be split using the 1s in the corresponding bits in the third string. If there are two 1s among these bits, then it is clear how to split the concatenation. If there are no 1s, then it means that a_{2k−1} and a_{2k} are both empty. If there is exactly one 1, then we can split the concatenation into a_{2k−2} and a_{2k−1}·a_{2k}, and exactly one of a_{2k−1} and a_{2k} must be empty. The last bit of the third string determines which of these two cases we are in.
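The encoding and decoding just described can be sketched as follows (our own code, with our own helper names; for simplicity it assumes k ≥ 1 and t ≥ 1):

```python
def encode_list(a):
    """Encode a = [a_0, ..., a_2k] (binary strings, with each pair
    a_2i . a_2i+1 nonempty for i < k) as one string of length 3t."""
    k = (len(a) - 1) // 2
    cat = "".join(a)
    t = len(cat)
    starts, pos = [], 0
    for s in a:                      # 0-indexed offset of each a_i in cat
        starts.append(pos)
        pos += len(s)
    # second string (t-1 bits): bit i is 1 iff position i+1 of cat is the
    # first bit of a pair a_2j . a_2j+1 with j >= 1
    s2 = ["0"] * (t - 1)
    for j in range(1, k):
        s2[starts[2 * j] - 1] = "1"
    # third string (t+1 bits): bit i (i < t) is 1 iff position i of cat is
    # the first bit of a nonempty a_2j+1 or of a nonempty a_2k; bit t
    # records whether a_2k is nonempty
    s3 = ["0"] * (t + 1)
    for j in range(k):
        if a[2 * j + 1]:
            s3[starts[2 * j + 1]] = "1"
    if a[2 * k]:
        s3[starts[2 * k]] = "1"
        s3[t] = "1"
    return cat + "".join(s2) + "".join(s3)

def decode_list(code):
    t = len(code) // 3
    cat, s2, s3 = code[:t], code[t:2 * t - 1], code[2 * t - 1:]
    cuts = [i + 1 for i, b in enumerate(s2) if b == "1"]
    bounds = [0] + cuts + [t]
    pieces = [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]
    a = []
    for p, q in pieces[:-1]:         # pieces holding a_2j . a_2j+1
        ones = [i for i in range(p, q) if s3[i] == "1"]
        r = ones[0] if ones else q   # no 1 here means a_2j+1 is empty
        a += [cat[p:r], cat[r:q]]
    p, q = pieces[-1]                # last piece: a_2k-2 . a_2k-1 . a_2k
    ones = [i for i in range(p, q) if s3[i] == "1"]
    if len(ones) == 2:
        a += [cat[p:ones[0]], cat[ones[0]:ones[1]], cat[ones[1]:q]]
    elif not ones:
        a += [cat[p:q], "", ""]
    elif s3[t] == "1":               # the single 1 starts a nonempty a_2k
        a += [cat[p:ones[0]], "", cat[ones[0]:q]]
    else:                            # the single 1 starts a nonempty a_2k-1
        a += [cat[p:ones[0]], cat[ones[0]:q], ""]
    return a
```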
Lemma 4.6. Let a = (a_0, ..., a_k) be a list of (possibly empty) binary strings with |a_0···a_k| = t and with a_i·a_{i+1} ≠ ε for all i < k. We can encode a as a single binary string of length ⌈(1 + log 3)(t − 1)⌉ + 3 such that a decoder without any knowledge of a, t or k can recreate a from the encoded string alone.

Proof. We encode a by concatenating ã = a_0···a_k of length t with a string s of length ⌈(t − 1) log 3⌉. To describe s, we first construct a string s̃ of length t − 1 over the alphabet {0, 1, 2}. The i'th entry s̃_i of s̃ is defined according to the role of the (i + 1)'th bit x in ã as follows:
$$
\tilde{s}_i = \begin{cases}
0, & \text{if } x \text{ is the first bit of a nonempty string } a_j \text{ where } a_{j-1} \text{ is nonempty},\\
1, & \text{if } x \text{ is the first bit of a nonempty string } a_j \text{ where } a_{j-1} \text{ is empty},\\
2, & \text{otherwise.}
\end{cases}
$$
The string s̃ represents a unique choice out of $3^{t-1}$ possibilities, and by Lemma 4.2 we can represent this choice with a binary string s of length exactly $\lceil\log 3^{t-1}\rceil = \lceil(t-1)\log 3\rceil$. We concatenate this with a single indicator bit representing whether a_0 is empty or not, and another indicator bit representing whether a_k is empty or not. Finally, we concatenate all this with ã, giving a string of total length $\lceil(t-1)\log 3\rceil + 2 + t = \lceil(1+\log 3)(t-1)\rceil + 3$.
Since the value t is uniquely determined from the length of the encoded string, the decoder is able to split the encoded string into ã, s and the two indicator bits. It can then convert s to s̃ and use the entries in s̃ and the indicator bits to recreate a from ã. This proves the lemma.
4.2 An order on binary strings.
Consider the total order ⪯ on binary strings defined by
$$s\cdot 0\cdot t \;\prec\; s \;\prec\; s\cdot 1\cdot t'$$
for all binary strings s, t, t′. Here we have written s ≺ t as short for s ⪯ t, s ≠ t. This order naturally arises in many contexts and has been studied before; see, for example, [57]. All binary strings of length three or less are ordered by ⪯ as follows:
$$000 \prec 00 \prec 001 \prec 0 \prec 010 \prec 01 \prec 011 \prec \varepsilon \prec 100 \prec 10 \prec 101 \prec 1 \prec 110 \prec 11 \prec 111$$
A finite sequence (a_i) of binary strings is ⪯-ordered if a_i ⪯ a_j for i < j.
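This order can be compared mechanically: it is the in-order traversal of the infinite binary trie, so each string can be mapped to the midpoint of the dyadic interval it denotes. The following sketch (our own construction, not from the paper) uses this embedding:

```python
from fractions import Fraction

def order_key(s):
    """Midpoint of the dyadic interval reached by following the bits of s;
    keys compare exactly as the order defined by s.0.t < s < s.1.t'."""
    lo, hi = Fraction(0), Fraction(1)
    for bit in s:
        mid = (lo + hi) / 2
        if bit == "0":
            hi = mid                # left subtree: everything below s
        else:
            lo = mid                # right subtree: everything above s
    return (lo + hi) / 2

# the ordering of all strings of length three or less, as listed above
strings = ["000", "00", "001", "0", "010", "01", "011", "",
           "100", "10", "101", "1", "110", "11", "111"]
```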
Lemma 4.7. Given a finite sequence (w_i) of positive numbers with $w = \sum_i w_i$, there exists a ⪯-ordered sequence (a_i) with |a_i| ≤ ⌊log w⌋ − ⌊log w_i⌋ for all i.

Proof. The proof is by induction on the number of elements in the sequence (w_i). If there is only one element, w_1, then we can set a_1 = ε, which satisfies |a_1| = 0 = ⌊log w_1⌋ − ⌊log w_1⌋. So suppose that there is more than one element in the sequence and that the lemma holds for shorter sequences. Let k be the smallest index such that $\sum_{i\leq k} w_i > w/2$, and set a_k = ε. Then a_k clearly satisfies the condition. The subsequences (w_i)_{i<k} and (w_i)_{i>k} are shorter and satisfy $\sum_{i<k} w_i \leq w/2$ and $\sum_{i>k} w_i \leq w/2$, so by induction there exist ⪯-ordered sequences (b_i)_{i<k} and (b_i)_{i>k} with |b_i| ≤ ⌊log(w/2)⌋ − ⌊log w_i⌋ = ⌊log w⌋ − ⌊log w_i⌋ − 1 for all i ≠ k. Now, define a_i for i < k by a_i = 0·b_i and for i > k by a_i = 1·b_i. Then (a_i) is a ⪯-ordered sequence with |a_i| ≤ ⌊log w⌋ − ⌊log w_i⌋ for all i.
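The recursive construction in the proof can be sketched as follows (our own code): split at the smallest k whose prefix sum exceeds w/2, assign that element the empty string, and recurse with 0- and 1-prefixes.

```python
def assign_labels(ws):
    """Given positive weights ws summing to w, return a list of binary
    strings, ordered by the order above, with
    len(a_i) <= floor(log2 w) - floor(log2 ws[i])."""
    if not ws:
        return []
    total = sum(ws)
    acc, k = 0, 0
    while acc + ws[k] <= total / 2:    # smallest k with prefix sum > w/2
        acc += ws[k]
        k += 1
    left = assign_labels(ws[:k])       # each side sums to at most w/2,
    right = assign_labels(ws[k + 1:])  # so the length bound drops by one
    return ["0" + b for b in left] + [""] + ["1" + b for b in right]
```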
A linear time implementation of the previous lemma can be achieved as follows. First compute the numbers t_i = ⌊log w⌋ − ⌊log w_i⌋ in linear time. Now set $a_1 = 0^{t_1}$ to be the minimum (with respect to the order ⪯) binary string of length at most t_1. At the i'th step, set a_i to be the minimum binary string of length at most t_i with a_{i−1} ≺ a_i. If this process successfully terminates, then the sequence (a_i) has the desired property. On the other hand, the process must terminate, because the above lemma says that there exists an assignment of the a_i's, and our algorithm conservatively chooses each a_i so that the set of possible choices left for a_{i+1} is maximal at every step. A similar argument shows that the following lemma can be implemented in linear time.
Lemma 4.8. Given a finite sequence (w_i) of positive numbers with $w = \sum_i w_i$, there exist a ⪯-ordered sequence (a_i) of nonempty strings and a k such that |a_i| ≤ ⌊log(w + w_k)⌋ − ⌊log w_i⌋ for all i.

Proof. Let k be the smallest index such that $\sum_{i\leq k} w_i > w/2$ and add an extra copy of w_k next to w_k in the sequence of weights. The total sequence of weights will now sum to w + w_k, and if we apply Lemma 4.7 to this sequence, exactly one of the two copies of w_k will be assigned the empty string. Discard this string, and what is left is a ⪯-ordered sequence (a_i) with |a_i| ≤ ⌊log(w + w_k)⌋ − ⌊log w_i⌋ for all i as desired.
4.3 Heavy-light decomposition.
We next describe the heavy-light decomposition of Harel and Tarjan [38]. Let T
be a rooted tree. The nodes of Tare classified as either heavy or light as follows.
The root rof Tis light. For each internal node v, pick one child node wwhere
size(w) is maximal among the children of vand classify it as heavy; classify the
other children of vas light. We denote the unique heavy child of vby hchild(v) and
the set of light children by lchildren(v). The light size of a node vis the number
lsize(v) = 1+Pwlchildren(v)size(w), which is equal to size(v)size(hchild(v)) when
vis internal. The apex of v, denoted apex(v), is the nearest light ancestor of v.
By removing the edges between light nodes and their parents, Tis divided into a
collection of heavy paths. The set of nodes on the same heavy path as vis denoted
hpath(v). The top node of hpath(v) is the light node apex(v).
For a node v, consider the sequence u0,...,ukof light nodes encountered on
the path from the root r=u0to v. The number kis the light depth of v, denoted
ldepth(v). The light depth of T, ldepth(T) is the maximum light depth among the
nodes in T. Note that ldepth(v)ldepth(T)log n; see [38].
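A sketch of the decomposition (our own tree representation: a dict mapping each node to the list of its children):

```python
def heavy_light(children, root):
    """Classify children as heavy/light and compute size and light depth."""
    size, hchild, ldepth = {}, {}, {root: 0}

    def compute_size(v):
        size[v] = 1 + sum(compute_size(w) for w in children.get(v, []))
        if children.get(v):
            # heavy child: a child of maximal size; the others are light
            hchild[v] = max(children[v], key=lambda w: size[w])
        return size[v]

    def walk(v):
        for w in children.get(v, []):
            # light depth grows only when we step to a light child
            ldepth[w] = ldepth[v] if w == hchild[v] else ldepth[v] + 1
            walk(w)

    compute_size(root)
    walk(root)
    return size, hchild, ldepth

size, hchild, ldepth = heavy_light({0: [1, 2], 1: [3, 4, 5], 2: []}, 0)
```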
4.4 One NCA labeling scheme.
We now describe the labeling scheme that will be used for the various families of trees, although with different encodings for each family. Given a rooted tree T, we begin by assigning to each node v a heavy label, hlabel(v), and, when v is light and not equal to the root, a light label, llabel(v), as described in Lemmas 4.9 and 4.10 below.

Lemma 4.9. There exist binary strings hlabel(v) for all nodes v in T so that the following hold for all nodes v, w belonging to the same heavy path:

depth(v) < depth(w) ⟹ hlabel(v) ≺ hlabel(w)   (2)
|hlabel(v)| ≤ ⌊log size(apex(v))⌋ − ⌊log lsize(v)⌋   (3)
Proof. Consider each heavy path H separately and use the sequence (lsize(v))_{v∈H}, ordered ascendingly by depth(v), as input to Lemma 4.7.

Lemma 4.10. There exist binary strings llabel(v) for all light nodes v ≠ r in T so that the following hold for all light siblings v, w:

v ≠ w ⟹ llabel(v) ≠ llabel(w)   (4)
|llabel(v)| ≤ ⌊log lsize(parent(v))⌋ − ⌊log size(v)⌋   (5)

Proof. Consider each set L of light siblings separately and use the sequence (size(v))_{v∈L}, not caring about order, as input to Lemma 4.7.
In many cases we are not going to use the constructions in Lemmas 4.9 and 4.10 directly, but will instead use the following two modifications.

Lemma 4.11. It is possible to modify the constructions in Lemmas 4.9 and 4.10 so that, for all nodes u, v where v is a light child of u,

hlabel(u) = ε ⟹ llabel(v) ≠ ε.   (6)

The modification still satisfies (2), (3), (4) and (5), except that when hlabel(u) is empty, (5) is replaced by

|hlabel(u)| + |llabel(v)| ≤ ⌊log size(apex(u))⌋ − ⌊log size(v)⌋   (7)
Proof. First observe that, without modifying the construction in Lemmas 4.9 and 4.10, we can combine (3) with (5) to obtain (7). We now describe the modification: the construction works exactly as in the two lemmas except that, in cases where hlabel(u) is empty, we use Lemma 4.8 in place of Lemma 4.7 in the construction of light labels in Lemma 4.10. This clearly makes (6) true, so it remains to prove (7).
So let u and v be as above. By construction of the heavy-light decomposition, size(hchild(u)) is larger than or equal to the size of any of the light children of u, and hence larger than or equal to the size that corresponds to the weight w_k in Lemma 4.8. Further, lsize(u) + size(hchild(u)) = size(u) ≤ size(apex(u)). Using these two facts together, Lemma 4.8 now yields

|llabel(v)| ≤ ⌊log size(apex(u))⌋ − ⌊log size(v)⌋.

Since |hlabel(u)| = 0, we have therefore obtained (7).
Lemma 4.12. It is possible to modify the constructions in Lemmas 4.9 and 4.10
so that, for all nodes u, v, w where v is a light child of u and w is a descendant of
v on the same heavy path as v,

hlabel(u) = ε and llabel(v) = ε ⟹ hlabel(w) ≠ ε.  (8)

The modification still satisfies (2), (3), (4) and (5), except that when hlabel(u) and
llabel(v) are both empty, (3) is replaced by

|hlabel(u)| + |llabel(v)| + |hlabel(w)| ≤ ⌊log size(apex(u))⌋ − ⌊log lsize(w)⌋  (9)
Proof. The proof is similar to that of the previous lemma. First observe that without
modifying the construction in Lemmas 4.9 and 4.10 we can combine (3), (5) and (3)
again to obtain (9). We now describe the modification: the construction works
exactly as in the two lemmas except that in cases where hlabel(u) and llabel(v) are
both empty, we use Lemma 4.8 in place of Lemma 4.7 in the construction of heavy
labels in Lemma 4.9. This clearly makes (8) true, so it remains to prove (9).
So let u, v and w be as above. Note that size(v) is larger than or equal to
the light size of any of the nodes on the heavy path with v as apex, and hence
larger than the light size that corresponds to the weight wk in Lemma 4.8. Further,
2 size(v) ≤ lsize(u) + size(hchild(u)) = size(u) ≤ size(apex(u)). Using these two
facts together, Lemma 4.8 now yields

|hlabel(w)| ≤ ⌊log size(apex(u))⌋ − ⌊log lsize(w)⌋.

Since |hlabel(u)| = |llabel(v)| = 0, we have therefore obtained (9).
We next assign a new set of labels for the nodes of T. Given a node v with
ldepth(v) = k, consider the sequence u0, v0, ..., uk, vk of nodes from the root r = u0
to v = vk, where ui = apex(vi) for i = 0, ..., k (so that ui is light for i ≥ 1) and
vi−1 = parent(ui) for i = 1, ..., k. Let l(v) = (h0, l1, h1, ..., lk, hk), where li = llabel(ui)
and hi = hlabel(vi). Figure 1 shows an example of a tree with the labels l(v). Note
that we have used Lemmas 4.9 and 4.10 for the construction of labels in this figure
and not any of the modifications in Lemmas 4.11 and 4.12.
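The list l(v) can be assembled by walking from v toward the root, one heavy path at a time. A sketch under the assumption that parent, apex, hlabel and llabel are available as dictionaries; the labels used in the test below are placeholders, not the actual codes produced by Lemmas 4.9 and 4.10:

```python
def build_l(v, root, parent, apex, hlabel, llabel):
    """Return l(v) = (h0, l1, h1, ..., lk, hk) as a list of bit strings.

    Collects (hlabel(v_i), llabel(u_i)) pairs bottom-up, where u_i = apex(v_i)
    and v_{i-1} = parent(u_i), then reverses to get root-to-v order.
    """
    seq = []
    while True:
        seq.append(hlabel[v])      # h_i = hlabel(v_i)
        a = apex[v]
        if a == root:              # reached the root's heavy path: h_0 recorded
            break
        seq.append(llabel[a])      # l_i = llabel(u_i)
        v = parent[a]              # climb to v_{i-1} = parent(u_i)
    return seq[::-1]
```

Since the loop climbs one heavy path per iteration, the list has 2·ldepth(v) + 1 entries.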
To define a labeling scheme, it remains to encode the lists l(v) of binary strings
into a single binary string. Before we do this, however, we note that l(nca(v, w))
can be computed directly from l(v) and l(w). The proof is essentially the same as
that in [7], although with the ≺ order in place of the usual lexicographic order.
Lemma 4.13. Let v and w be nodes in T, and let u = nca(v, w).

(a) If l(v) is a prefix of l(w), then l(u) = l(v).
Figure 1: A tree with the labels l(v) from Section 4.4 and with heavy sub-labels underlined.
(b) If l(w) is a prefix of l(v), then l(u) = l(w).

(c) If l(v) = (h0, l1, ..., hi−1, li, ...) and l(w) = (h0, l1, ..., hi−1, l′i, ...) with li ≠ l′i,
then l(u) = (h0, l1, h1, ..., hi−1).

(d) If l(v) = (h0, l1, ..., li, hi, ...) and l(w) = (h0, l1, h1, ..., li, h′i, ...) with
hi ≠ h′i, then l(u) = (h0, l1, h1, ..., li, min{hi, h′i}).
Proof. By construction, l = l(parent(apex(u))) is a prefix of both l(v) and l(w),
and

l(u) = l · (llabel(apex(u)), hlabel(u)).

Suppose first that v is an ancestor of w, so that u = v, and let x be the nearest
ancestor of w on hpath(v). Then apex(x) = apex(u), so

l(w) = l · (llabel(apex(u)), hlabel(x), ...)

If u = x, then hlabel(x) = hlabel(u) and case (a) applies. (If v = w, then case (b)
applies too.) If u ≠ x, then hlabel(u) ≺ hlabel(x) by (2) and case (d) applies. The
case where w is an ancestor of v is analogous.
Suppose next that v and w are not ancestors of each other. Then u must have
children v̂ and ŵ with v̂ ≠ ŵ such that v̂ is an ancestor of v and ŵ is an ancestor of
w. At most one of v̂ and ŵ can be heavy. If neither of them is heavy, then they
are apexes of their own heavy paths, and hence

l(v) = l(u) · (llabel(v̂), ...)

and

l(w) = l(u) · (llabel(ŵ), ...).

By (4), llabel(v̂) and llabel(ŵ) are distinct, so case (c) applies. If v̂ is heavy,
then apex(v̂) = apex(u) and l(v) = l · (llabel(apex(u)), hlabel(v̂), ...), while l(w)
still has the above form, i.e. l(w) = l · (llabel(apex(u)), hlabel(u), ...). By (2),
hlabel(u) ≺ hlabel(v̂), so (d) applies. The case where ŵ is heavy is analogous.
Note that, as in [7], the above lemma can be used to find labels for NCAs in
constant time on the RAM as long as the labels have size O(log n).
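The case analysis of Lemma 4.13 translates directly into code. In the sketch below, the parameter `less` stands in for the ≺ order on heavy labels used in Lemma 4.9 (not the usual lexicographic order); it is left abstract here, and everything else follows the four cases of the lemma:

```python
def nca_label(lv, lw, less):
    """Compute l(nca(v, w)) from lv = l(v) and lw = l(w).

    lv, lw: lists (h0, l1, h1, ..., lk, hk) of sub-label strings; even
    positions hold heavy labels h_i, odd positions hold light labels l_i.
    less(a, b): True iff a precedes b in the assumed order on heavy labels.
    """
    i = 0
    while i < len(lv) and i < len(lw) and lv[i] == lw[i]:
        i += 1
    if i == len(lv):                 # (a): l(v) is a prefix of l(w)
        return list(lv)
    if i == len(lw):                 # (b): l(w) is a prefix of l(v)
        return list(lw)
    if i % 2 == 1:                   # (c): first difference at a light label;
        return list(lv[:i])          #      the answer ends at the preceding h
    # (d): first difference at a heavy label; take the smaller of the two
    return list(lv[:i]) + [lv[i] if less(lv[i], lw[i]) else lw[i]]
```

In tests a length-then-lexicographic comparison can serve as a stand-in for `less`; cases (a)-(c) do not depend on the order at all.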
As a final step, before presenting the encodings of the labels l(v), we present a
lemma that makes it easier to compute the size of the encodings. For brevity, we
let l̃(v) = h0 · l1 · h1 ··· lk · hk denote the concatenation of the sub-labels of l(v).
Lemma 4.14. If T has n nodes, then |l̃(v)| ≤ ⌊log n⌋ for every node v in T. This
holds whether we use Lemmas 4.9 and 4.10 directly or any of the variants in
Lemmas 4.11 and 4.12 for the construction of heavy and light labels.
Proof. Let v be an arbitrary node in T and recall that l(v) = (h0, l1, h1, ..., lk, hk),
where li = llabel(ui) and hi = hlabel(vi) for nodes ui, vi, i = 0, ..., k, given by
r = u0, v = vk, ui = apex(vi) for all i = 0, ..., k and vi−1 = parent(ui) for
i = 1, ..., k. If we use Lemmas 4.9 and 4.10 for the construction of heavy and light
labels, we have by (3) that |hi| ≤ ⌊log size(ui)⌋ − ⌊log lsize(vi)⌋ for all i = 0, ..., k,
and by (5) that |li| ≤ ⌊log lsize(vi−1)⌋ − ⌊log size(ui)⌋ for i = 1, ..., k. Summing now
gives a telescoping sum:

|l̃(v)| = |h0 · l1 · h1 ··· lk · hk|
≤ (⌊log size(u0)⌋ − ⌊log lsize(v0)⌋) + (⌊log lsize(v0)⌋ − ⌊log size(u1)⌋)
+ ··· + (⌊log size(uk)⌋ − ⌊log lsize(vk)⌋)
≤ ⌊log size(u0)⌋ − ⌊log lsize(vk)⌋
≤ ⌊log n⌋.
In the cases where we have used any of the variants in Lemmas 4.11 and 4.12, we
must use (7) or (9) first to collapse sums of two or three terms in the above sum
before collapsing the whole expression. Nevertheless, the result of the computation
remains unchanged.
4.5 NCA labeling schemes for different families of trees.
Let Trees and BinaryTrees denote the families of rooted trees and binary trees,
respectively.
Theorem 4.15. There exists an NCA labeling scheme for Trees whose worst-case
label size is at most ⌈(1 + log(2 + √2))⌊log n⌋⌉ ≤ 2.772 log n + 1.
Proof. The encoder uses the modified construction in Lemma 4.11 to ensure that
every empty heavy label is followed by a nonempty light label. This means that the
sequence l(v) = (h0, l1, h1, ..., lk, hk) can be encoded using ⌈(1 + log(2 + √2))⌊log n⌋⌉
bits; see Lemma 4.4. Given the encoded labels of two nodes, the decoder can
now decode the labels as described in Lemma 4.4, use Lemma 4.13 to compute the
label of the NCA, and then re-encode that label using Lemma 4.4 once again.
The labeling scheme in Theorem 4.15 makes use of Lemma 4.4 which comes
without any guarantees for the time complexities for encoding and decoding. This
makes the result less applicable in practice. Theorems 4.16, 4.18 and 4.19 and
Corollary 4.17 below all use linear time for encoding and constant time for decoding.
Theorem 4.16. There exists an NCA labeling scheme for Trees whose worst-case
label size is at most 3⌊log n⌋.
Proof. The proof is identical to that of Theorem 4.15 but with Lemma 4.5 in place
of Lemma 4.4.
A variant of NCA labeling schemes [13] allows every node to also have a predefined
label and requires the labeling scheme to return the predefined label of the NCA.
Corollary 4.17. There exists an NCA labeling scheme for Trees with predefined
labels of fixed length k whose worst-case label size is at most (3 + k)⌊log n⌋ + 1.
Proof. It suffices to store, together with the NCA label of a node v, a table of the
predefined labels of the at most ⌊log n⌋ parents of light nodes on the path from
the root to v, since the NCA of two nodes will always be such a node for one of the
two nodes. By prepending a string of the form 0^i 1 to the NCA label of v we can ensure
that it has size exactly 3⌊log n⌋ + 1. We can then append a table of up to ⌊log n⌋
predefined labels of size k. Finally, we append 0s to make the label have size exactly
(3 + k)⌊log n⌋ + 1. The decoder can now use the label's length to split up the label
into the NCA label and the entries in the table of predefined labels.
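The padding and table layout in this proof can be sketched concretely. The helpers below are illustrative (the names `pack`/`unpack` and the treatment of unused table slots as zero entries are assumptions, not from the paper): `pack` pads the NCA label to exactly 3⌊log n⌋ + 1 bits with a 0^i 1 prefix and appends ⌊log n⌋ table slots of k bits each; `unpack` reverses it.

```python
import math

def pack(nca_bits, table, k, n):
    """Pack an NCA label and k-bit predefined labels into one bit string
    of exactly (3 + k) * floor(log n) + 1 bits."""
    logn = int(math.log2(n))
    head_len = 3 * logn + 1
    # Self-delimiting padding: 0^i followed by a single 1, then the label.
    head = '0' * (head_len - len(nca_bits) - 1) + '1' + nca_bits
    body = ''.join(table)
    return head + body + '0' * (k * logn - len(body))

def unpack(label, k, n):
    """Recover the NCA label and the table of k-bit entries."""
    logn = int(math.log2(n))
    head, body = label[:3 * logn + 1], label[3 * logn + 1:]
    nca_bits = head[head.index('1') + 1:]   # drop the 0^i 1 padding
    table = [body[j:j + k] for j in range(0, len(body), k)]
    return nca_bits, table
```

The first '1' in the head is always the padding terminator, since the padding itself consists only of 0s.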
Theorem 4.18. There exists an NCA labeling scheme for BinaryTrees whose worst-
case label size is at most ⌈(1 + log 3)(⌊log n⌋ − 1)⌉ + 3 ≤ 2.585 log n + 2.
Proof. First note that every node in a binary tree has at most one light child. We
can therefore assume that all light labels are empty. Letting the encoder use the
construction in Lemma 4.12, we can then ensure that every empty heavy label is
followed by (an empty light label and) a nonempty heavy label. Since we can ignore
light labels, it suffices to encode the sequence (h0, h1, ..., hk), and this sequence can
be encoded with ⌈(1 + log 3)(⌊log n⌋ − 1)⌉ + 3 bits; see Lemma 4.6. The rest of the
proof follows the same argument as the proof of Theorem 4.15.
A caterpillar is a tree in which all leaves are connected to a single main path.
We assume caterpillars to always be rooted at one of the end nodes of the main
path. Let Caterpillars denote the family of caterpillars.
Theorem 4.19. There exists an NCA labeling scheme for Caterpillars whose worst-
case label size is at most ⌊log n⌋ + ⌈log⌊log n⌋⌉ + 1.
Proof. By the definition of caterpillars, every label l(v) is either of the form (h0) or
(h0, l1, ε). We encode the first case as 0 · h0 and the second case as 1 · x, where x
is the encoding of the pair (h0, l1) using ⌊log n⌋ + ⌈log⌊log n⌋⌉ bits; see Lemma 4.3.
In both cases, the label size is at most ⌊log n⌋ + ⌈log⌊log n⌋⌉ + 1, and the decoder
can easily distinguish the two cases from the first bit. The rest of the proof follows
the same argument as the proof of Theorem 4.15.
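The two-case encoding above can be sketched as follows. Since the pair encoding of Lemma 4.3 is not reproduced in this section, the sketch uses a stand-in that stores |h0| in a fixed-width field of ⌈log(⌊log n⌋ + 1)⌉ bits; the exact bit budget therefore differs slightly from the lemma's, but the leading-bit case split is as in the proof:

```python
import math

def _width(n):
    # Bits needed to write a length in {0, ..., floor(log n)}.
    return max(1, math.ceil(math.log2(int(math.log2(n)) + 1)))

def encode_caterpillar(lv, n):
    """Encode l(v), which in a caterpillar is (h0,) or (h0, l1, '')."""
    if len(lv) == 1:
        return '0' + lv[0]                          # case 1: on the main path
    h0, l1, _ = lv                                  # case 2: a leaf off the path
    return '1' + format(len(h0), '0%db' % _width(n)) + h0 + l1

def decode_caterpillar(code, n):
    """Recover l(v) from its encoding; the first bit selects the case."""
    if code[0] == '0':
        return [code[1:]]
    w = _width(n)
    m = int(code[1:1 + w], 2)                       # length of h0
    return [code[1 + w:1 + w + m], code[1 + w + m:], '']
```

The decoder needs no explicit separator between h0 and l1 because the length field makes the split unambiguous.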
For comparison, the best known lower bound for NCA labeling schemes for
caterpillars is the trivial log n.
References
[1] S. Abiteboul, S. Alstrup, H. Kaplan, T. Milo, and T. Rauhe, Compact labeling
scheme for ancestor queries, SIAM J. Comput. 35 (2006), no. 6, 1295–1309.
[2] S. Abiteboul, H. Kaplan, and T. Milo, Compact labeling schemes for ancestor
queries, Proceedings of the twelfth annual ACM-SIAM Symposium on Discrete
Algorithms (SODA), 2001, pp. 547–556.
[3] A. V. Aho, Y. Sagiv, T. G. Szymanski, and J. D. Ullman, Inferring a tree from
lowest common ancestors with an application to the optimization of relational
expressions, SIAM Journal on Computing 10 (1981), no. 3, 405–421.
[4] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman, The design and
analysis of computer algorithms, Addison-Wesley, 1974.
[5] A.V. Aho, J.E. Hopcroft, and J.D. Ullman, On finding lowest common ancestor
in trees, SIAM Journal on computing 5(1976), no. 1, 115–132, See also STOC
1973.
[6] S. Alstrup, P. Bille, and T. Rauhe, Labeling schemes for small distances in
trees, SIAM J. Discrete Math. 19 (2005), no. 2, 448–462.
[7] S. Alstrup, C. Gavoille, H. Kaplan, and T. Rauhe, Nearest common ances-
tors: A survey and a new algorithm for a distributed environment, Theory of
Computing Systems 37 (2004), no. 3, 441–456.
[8] S. Alstrup and T. Rauhe, Improved labeling schemes for ancestor queries, Proc.
of the 13th annual ACM-SIAM Symp. on Discrete Algorithms (SODA), 2002.
[9] ,Small induced-universal graphs and compact implicit graph representa-
tions, In Proc. 43rd annual IEEE Symp. on Foundations of Computer Science,
2002, pp. 53–62.
[10] S. Alstrup and M. Thorup, Optimal pointer algorithms for finding nearest com-
mon ancestors in dynamic trees, Journal of Algorithms 35 (2000), no. 2, 169–
188.
[11] M. A. Bender and M. Farach-Colton, The LCA problem revisited, 4th LATIN,
2000, pp. 88–94.
[12] O. Berkman and U. Vishkin, Recursive star-tree parallel data structure, SIAM
Journal on Computing 22 (1993), no. 2, 221–242.
[13] L. Blin, S. Dolev, M. Potop-Butucaru, and S. Rovedakis, Fast self-stabilizing
minimum spanning tree construction: using compact nearest common ances-
tor labeling scheme, Proceedings of the 24th international conference on Dis-
tributed computing, DISC’10, 2010, pp. 480–494.
[14] N. Bonichon, C. Gavoille, and A. Labourel, Short labels by traversal and jump-
ing, Electronic Notes in Discrete Mathematics 28 (2007), 153–160.
[15] M. A. Breuer, Coding vertexes of a graph, IEEE Trans. on Information Theory
IT–12 (1966), 148–153.
[16] M. A. Breuer and J. Folkman, An unexpected result on coding vertices of a
graph, J. of Mathematical Analysis and Applications 20 (1967), 583–600.
[17] S. Caminiti, I. Finocchi, and R. Petreschi, Engineering tree labeling schemes:
A case study on least common ancestors., ESA, Lecture Notes in Computer
Science, vol. 5193, Springer, 2008, pp. 234–245.
[18] S. Carlsson and B. J. Nilsson, Computing vision points in polygons, Algorith-
mica 24 (1999), no. 1, 50–75.
[19] S. Chaudhuri and C. D. Zaroliagis, Shortest paths in digraphs of small treewidth.
Part II: Optimal parallel algorithms, Theoretical Computer Science 203 (1998),
no. 2, 205–223.
[20] E. Cohen, H. Kaplan, and T. Milo, Labeling dynamic xml trees, SIAM J. Com-
put. 39 (2010), no. 5, 2048–2074.
[21] R. Cole and R. Hariharan, Dynamic lca queries on trees, Annual ACM-SIAM
Symposium on Discrete Algorithms (SODA), vol. 10, 1999.
[22] L. J. Cowen, Compact routing with minimum stretch, Journal of Algorithms
38 (2001), 170–183.
[23] B. Dixon, M. Rauch, and R. E. Tarjan, Verification and sensitivity analysis
of minimum spanning trees in linear time, SIAM Journal on Computing 21
(1992), no. 6, 1184–1192.
[24] T. Eilam, C. Gavoille, and D. Peleg, Compact routing schemes with low stretch
factor, 17th Annual ACM Symposium on Principles of Distributed Computing
(PODC), August 1998, pp. 11–20.
[25] M. Farach-Colton, Optimal suffix tree construction with large alphabets, 38th
Annual Symposium on Foundations of Computer Science (IEEE, ed.), IEEE
Computer Society Press, 1997, pp. 137–143.
[26] M. Farach-Colton, S. Kannan, and T. Warnow, A robust model for finding
optimal evolutionary trees., Algorithmica 13 (1995), no. 1/2, 155–179.
[27] J. Fischer, Short labels for lowest common ancestors in trees, ESA, 2009,
pp. 752–763.
[28] P. Flocchini, T. Mesa Enriquez, L. Pagli, G. Prencipe, and N. Santoro, Dis-
tributed minimum spanning tree maintenance for transient node failures, IEEE
Trans. Comput. 61 (2012), no. 3, 408–414.
[29] P. Fraigniaud and C. Gavoille, Routing in trees, 28th International Colloquium
on Automata, Languages and Programming (ICALP), vol. 2076 of LNCS, 2001,
pp. 757–772.
[30] P. Fraigniaud and C. Gavoille., A space lower bound for routing in trees, 19th
Annual Symposium on Theoretical Aspects of Computer Science (STACS),
March 2002, pp. 65–75.
[31] P. Fraigniaud and A. Korman, Compact ancestry labeling schemes for xml trees,
SODA, 2010, pp. 458–466.
[32] ,An optimal ancestry scheme and small universal posets, Proceedings
of the 42nd ACM symposium on Theory of computing (New York, NY, USA),
2010, pp. 611–620.
[33] H. N. Gabow, J. L. Bentley, and R. E. Tarjan, Scaling and related techniques
for geometry problems, Proc. of the Sixteenth Annual ACM Symposium on
Theory of Computing, 1984, pp. 135–143.
[34] H.N. Gabow, Data structure for weighted matching and nearest common an-
cestors with linking, Annual ACM-SIAM Symposium on discrete algorithms
(SODA), vol. 1, 1990, pp. 434–443.
[35] C. Gavoille and D. Peleg, Compact and localized distributed data structures,
Distributed Computing 16 (2003), no. 2-3, 111–120.
[36] C. Gavoille, D. Peleg, S. Perennes, and R. Raz, Distance labeling in graphs,
12th Symp. On Discrete algorithms, 2001.
[37] D. Gusfield, Algorithms on strings, trees, and sequences, Cambridge University
Press, 1997, pp. 196-207.
[38] D. Harel and R. E. Tarjan, Fast algorithms for finding nearest common ances-
tors, Siam J. Comput 13 (1984), no. 2, 338–355.
[39] T. C. Hu and A. C. Tucker, Optimum computer search trees, SIAM Journal of
Applied Mathematics 21 (1971), 514–532.
[40] S. Kannan, M. Naor, and S. Rudich, Implicit representation of graphs, SIAM
J. DISC. MATH. (1992), 596–603, Preliminary version appeared in STOC’88.
[41] H. Kaplan and T. Milo, Short and simple labels for small distances and other
functions, 7th Workshop on Algorithms and Data Structures (WADS), LNCS, 2001.
[42] H. Kaplan, T. Milo, and R. Shabo, A comparison of labeling schemes for an-
cestor queries, Proceedings of the thirteenth annual ACM-SIAM Symposium on
Discrete Algorithms (SODA), 2002.
[43] D. R. Karger, P. N. Klein, and R. E. Tarjan, A randomized linear-time algo-
rithm to find minimum spanning trees, Journal of the ACM 42 (1995), no. 2,
321–328.
[44] M. Katz, N. Katz, and D. Peleg, Distance labeling schemes for well-separated
graph classes, STACS'00, LNCS, vol. 1170, Springer Verlag, 2000.
[45] M. Katz, N. A. Katz, A. Korman, and D. Peleg, Labeling schemes for flow and
connectivity, SIAM J. Comput. 34 (2004), no. 1, 23–40.
[46] A. Korman and S. Kutten, Labeling schemes with queries., SIROCCO, 2007,
pp. 109–123.
[47] V. I. Levenshtein, Binary codes capable of correcting deletions, insertions and
reversals., Soviet Physics Doklady. 10 (1966), no. 8, 707–710.
[48] D. Maier, A space efficient method for the lowest common ancestor problem and
an application to finding negative cycles, 18th Annual Symposium on Founda-
tions of Computer Science, 1977, pp. 132–141.
[49] K. Mehlhorn, A best possible bound for the weighted path length of binary search
trees, SIAM J. Comput. 6(1977), no. 2, 235–239.
[50] L. Pagli, G. Prencipe, and T. Zuva, Distributed computation for swapping a
failing edge, Proceedings of the 6th international conference on Distributed
Computing (Berlin, Heidelberg), IWDC’04, Springer-Verlag, 2004, pp. 28–39.
[51] D. Peleg, Proximity-preserving labeling schemes and their applications, Graph-
Theoretic concepts in computer science, 25th international workshop WG’99,
LNCS, vol. 1665, Springer Verlag, 1999, pp. 30–41.
[52] ,Informative labeling schemes for graphs, 25th International Sympo-
sium on Mathematical Foundations of Computer Science (MFCS), vol. 1893 of
LNCS, Springer, August 2000, pp. 579–588.
[53] P. Powel, A further improved lca algorithm, Tech. Report TR90-01, University
of Minneapolis, 1990.
[54] H. Robbins, A remark on Stirling’s formula, Amer. Math. Monthly 62 (1955),
26–29. MR MR0069328 (16,1020e)
[55] N. Santoro and R. Khatib, Labeling and implicit routing in networks, The
computer J. 28 (1985), 5–8.
[56] B. Schieber and U. Vishkin, On finding lowest common ancestors: Simplifica-
tion and parallelization, SIAM Journal of Computing 17 (1988), 1253–1262.
[57] M. Thorup and U. Zwick, Compact routing schemes, ACM Symposium on
Parallel Algorithms and Architectures, vol. 13, 2001.
[58] A. K. Tsakalidis, Maintaining order in a generalized linked list, Acta Informat-
ica 21 (1984), no. 1, 101–112.
[59] J. Westbrook, Fast incremental planarity testing, Automata, Languages and
Programming, 19th International Colloquium (Werner Kuich, ed.), Lecture
Notes in Computer Science, vol. 623, Springer-Verlag, 1992, pp. 342–353.