arXiv:1507.04046v1 [cs.DS] 14 Jul 2015
Distance labeling schemes for trees

Stephen Alstrup∗   Inge Li Gørtz†   Esben Bistrup Halvorsen‡   Ely Porat§
Abstract

We consider distance labeling schemes for trees: given a tree with n nodes, label the nodes with binary strings such that, given the labels of any two nodes, one can determine, by looking only at the labels, the distance in the tree between the two nodes.

A lower bound by Gavoille et al. (J. Alg. 2004) and an upper bound by Peleg (J. Graph Theory 2000) establish that labels must use Θ(log² n) bits¹. Gavoille et al. (ESA 2001) show that for very small approximate stretch, labels use Θ(log n log log n) bits. Several other papers investigate various variants such as, for example, small distances in trees (Alstrup et al., SODA'03).

We improve the known upper and lower bounds of exact distance labeling by showing that 1/4 log² n bits are needed and that 1/2 log² n bits are sufficient. We also give (1 + ε)-stretch labeling schemes using Θ(log n) bits for constant ε > 0. (1 + ε)-stretch labeling schemes with polylogarithmic label size have previously been established for doubling dimension graphs by Talwar (STOC 2004).

In addition, we present matching upper and lower bounds for distance labeling for caterpillars, showing that labels must have size 2 log n − Θ(log log n). For simple paths with k nodes and edge weights in [1, n], we show that labels must have size (k−1)/k log n + Θ(log k).
∗ Department of Computer Science, University of Copenhagen. E-mail: s.alstrup@di.ku.dk.
† DTU Compute, Technical University of Denmark. E-mail: inge@dtu.dk.
‡ Department of Computer Science, University of Copenhagen. E-mail: esben@bistruphalvorsen.dk.
§ Department of Computer Science, Bar-Ilan University. E-mail: porately@cs.biu.ac.il.
¹ Throughout this paper we use log for log₂.
1 Introduction

A distance labeling scheme for a given family of graphs assigns labels to the nodes of each graph in the family such that, given the labels of two nodes in the graph and no other information, it is possible to determine the shortest distance between the two nodes. The labels are assumed to be composed of bits, and the goal is to make the worst-case label size as small as possible. Labeling schemes are also called implicit representations of graphs [60, 67]. The problem of finding implicit representations with small labels for specific families of graphs was introduced in the 1960s [14, 15], and efficient labeling schemes were introduced in [42, 53]. Distance labeling for general graphs has been considered since the 1970/80s [38, 68], and later for various restricted classes of graphs and/or approximate distances, often tightly related to distance oracle and routing problems; see e.g. [6]. This paper focuses on distance labels for the well studied case of trees.
Exact distances. In [57] Peleg presented an O(log² n)-bit distance labeling scheme for general unweighted trees. In [37] Gavoille et al. proved that distance labels for unweighted binary trees require 1/8 log² n − O(log n) bits and presented a scheme with 1/(log 3 − 1) log² n ≈ 1.7 log² n bits. This paper presents a scheme of size 1/2 log² n + O(log n) and further reduces the gap by showing that 1/4 log² n − O(log n) bits are needed. Our upper bound is a somewhat straightforward application of a labeling scheme for nearest common ancestors [7, 8].
Approximate distances. Let dist_T(x, y) denote the shortest distance between nodes x, y in a tree T. An r-additive approximation scheme returns a value dist*_T(x, y) with dist_T(x, y) ≤ dist*_T(x, y) ≤ dist_T(x, y) + r. An s-stretched approximation scheme returns a value dist*_T(x, y) with dist_T(x, y) ≤ dist*_T(x, y) ≤ s · dist_T(x, y). For trees of height h, Gavoille et al. [30, Theorem 4] gave a 1-additive O(log n log h)-bit labeling scheme. However, using an extra bit in the label for the node depth modulo 2, it is easy to see that any 1-additive scheme can be made exact. Gavoille et al. [30] also gave upper and lower bounds of Θ(log n log log n) bits for (1 + 1/log n)-stretched distances. This paper presents a scheme of size Θ(log n) for (1 + ε)-stretch for constant ε > 0. Labeling schemes for (1 + ε)-stretch with polylogarithmic label size have previously been given for graphs of bounded doubling dimension [61] and planar graphs [63].
Distances in caterpillars and paths. Labeling schemes for caterpillars have been studied for various queries, e.g., adjacency [13]. Here we present upper and lower bounds showing that distance labeling for caterpillars requires 2 log n − Θ(log log n) bits. The upper bound is constructed by reduction to the case of weighted paths with k > 1 nodes and positive integer edge weights in [1, n], for which we give upper and lower bounds showing that labels must have size (k−1)/k log n + Θ(log k).
Problem                                        Lower bound                   Upper bound
Exact, general trees                           1/4 log² n                    1/2 log² n
(1 + ε)-stretch, general trees                 Θ(log n)                      Θ(log n)
Caterpillars                                   2 log n − Θ(log log n)        2 log n − Θ(log log n)
Weighted paths, k nodes, weights in [1, n]     (k−1)/k log n + Θ(log k)      (k−1)/k log n + Θ(log k)

Table 1: Results presented in this paper. ε > 0 is a constant.
1.1 Related work

Distances in trees with small height. It is known that, for unweighted trees with bounded height h, labels must have size Θ(log n log h). The upper bound follows from [30, Theorem 2] and the lower bound from [37, Section 3]². In [43] distance labeling for various restricted classes of trees, including trees with bounded height, is considered, and in [62] another distance labeling scheme for unweighted trees using O(log n log h) bits is given.

² We thank Gavoille for pointing this out.
Small distances in trees. Distances in a tree between nodes at distance at most k can be computed with labels of size log n + O(k √log n) [44]. In [4] it is shown that labels of size log n + Θ(log log n) are needed for labeling schemes supporting both parent and sibling queries. More generally, [4] shows that, using labels of size log n + O(log log n), the distance between two nodes can be determined if it is at most k for some constant k, which is optimal for k > 1. In [31, 32] further improvements are given for small distances in trees. For k = 1, corresponding to adjacency testing, there is a sequence of papers improving the second-order term, recently ending with [5], which establishes that log n + Θ(1) bits are sufficient.
Various other cases for trees. Distance labeling schemes for various other cases have been considered, e.g., for weighted trees [30, 37, 57], dynamic trees [50], and a labeling scheme variation with extra free lookups [48, 49].

Exact and approximate distances in graphs. Distance labeling schemes exist for general graphs [6, 37, 38, 60, 66, 68] and for various restricted graph classes, e.g., bounded tree-width, planar and bounded degree graphs [37], distance-hereditary [34], bounded clique-width [20], some non-positively curved plane [17], interval [35] and permutation graphs [12]. Approximate distance labeling schemes, both additive and stretched, are also well studied; see e.g. [16, 24, 30, 33, 37, 39, 40, 51, 57, 65]. An overview of distance labeling schemes can be found in [6].
1.2 Second order terms are important

Chung's solution in [18] gives labels of size log n + O(log log n) for adjacency labeling in trees, which was improved to log n + O(log* n) in FOCS'02 [11] and, in [13, 18, 27, 28, 45], to log n + Θ(1) for various special cases. Finally, it was improved to log n + Θ(1) for general trees in FOCS'15 [5].

A recent STOC'15 paper [9] improves the label size for adjacency in general graphs from n/2 + O(log n) [42, 52] to n/2 + O(1), almost matching the lower bound of (n − 1)/2 [42, 52].

Likewise, the second-order term for the ancestor relationship has been improved in a sequence of STOC/SODA papers [2, 3, 10, 28, 29] (and [1, 45]) to Θ(log log n), giving labels of size log n + Θ(log log n).

Somewhat related, succinct data structures (see, e.g., [22, 25, 26, 54, 55]) focus on the space used in addition to the information-theoretic lower bound, which is often a lower order term with respect to the overall space used.
1.3 Labeling schemes in various settings and applications

By using labeling schemes, it is possible to avoid costly access to large global tables and instead compute locally and in a distributed fashion. Such properties are used, e.g., in XML search engines [2], network routing and distributed algorithms [21, 23, 64, 65], dynamic and parallel settings [19, 50], graph representations [42], and other applications [46, 47, 56, 57, 58]. Various computability requirements are sometimes imposed on labeling schemes [2, 42, 46]. This paper assumes the RAM model.
2 Preliminaries

Trees. Given nodes u, v in a rooted tree T, u is an ancestor of v and v is a descendant of u if u is on the unique path from v to the root. For a node u of T, denote by T_u the subtree of T consisting of all the descendants of u (including u itself). The depth of u is the number of edges on the unique simple path from u to the root of T. The nearest common ancestor (NCA) of two nodes is the unique common ancestor with largest depth. Let T[u, v] denote the nodes on the simple path from u to v in T. The variants T(u, v] and T[u, v) denote the same path without the first and last node, respectively. The distance between u and v is the number dist(u, v) = |T(u, v]|. We set distroot(v) = dist(v, r), where r is the root of T. A caterpillar is a tree whose non-leaf nodes form a path, called the spine.
Heavy-light decomposition. (From [59].) Let T be a rooted tree. The nodes of T are classified as either heavy or light as follows. The root r of T is light. For each non-leaf node v, pick one child w for which |T_w| is maximal among the children of v and classify it as heavy; classify the other children of v as light. The apex of a node v is the nearest light ancestor of v. By removing the edges between light nodes and their parents, T is divided into a collection of heavy paths. Any given node v has at most log n light ancestors (see [59]), so the path from the root to v goes through at most log n heavy paths.
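The following is a small sketch (our own, not from the paper) of how the heavy-light classification and the light ancestors of a node could be computed; the representation of the tree as child lists is an assumption made purely for illustration.

```python
# Sketch (not from the paper): heavy-light classification of a rooted tree.
# Assumed representation: children[v] lists the children of node v; nodes are
# 0, ..., n-1 and `root` is the root.

def heavy_light(children, root=0):
    n = len(children)
    parent = [None] * n
    size = [1] * n

    # Iterative DFS: parents appear before their children in `order`.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for c in children[v]:
            parent[c] = v
            stack.append(c)

    # Subtree sizes, computed bottom-up.
    for v in reversed(order):
        for c in children[v]:
            size[v] += size[c]

    # For every non-leaf node, one child of maximum subtree size becomes heavy.
    heavy_child = [None] * n
    for v in range(n):
        if children[v]:
            heavy_child[v] = max(children[v], key=lambda c: size[c])

    def light_ancestors(v):
        """Light nodes on the root-to-v path (the root itself is light)."""
        result = []
        while v is not None:
            p = parent[v]
            if p is None or heavy_child[p] != v:
                result.append(v)
            v = p
        return result[::-1]          # listed from the root downwards

    return heavy_child, light_ancestors

# Example: a path 0-1-2 with an extra leaf 3 under node 1.
heavy, light_anc = heavy_light([[1], [2, 3], [], []])
assert light_anc(3) == [0, 3]        # the root and node 3 are the light nodes on the path to 3
```

Since every light node roots a subtree of at most half the size of its parent's subtree, the list returned by light_ancestors has length at most log n, which is the property used by the schemes below.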
Bit strings. A bit string s is a member of the set {0, 1}*. We denote the length of a bit string s by |s|, the i'th bit of s by s_i, and the concatenation of two bit strings s, s' by s ∘ s'. We say that s_1 is the most significant bit of s and s_{|s|} is the least significant bit.
Labeling schemes. A distance labeling scheme for trees of size n consists of an encoder e and a decoder d. Given a tree T, the encoder computes a mapping e_T : V(T) → {0, 1}* assigning a label to each node u ∈ V(T). The decoder is a mapping d : {0, 1}* × {0, 1}* → Z+, where Z+ denotes the positive integers, such that, given any tree T and any pair of nodes u, v ∈ V(T), d(e_T(u), e_T(v)) = dist(u, v). Note that the decoder does not know T. The size of a labeling scheme is defined as the maximum label size |e_T(u)| over all trees T and all nodes u ∈ V(T). If, for all trees T, the mapping e_T is injective, we say that the labeling scheme assigns unique labels.
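As a toy illustration of this definition (not from the paper), the following sketch gives an exact distance labeling scheme for the family of unweighted paths with n nodes: the encoder labels every node by its position written with ⌈log n⌉ bits, and the decoder returns the absolute difference of the two positions, using nothing but the two labels.

```python
# Sketch (not from the paper): an exact distance labeling scheme for unweighted
# paths with n nodes. Each label is the node's position, written with
# ceil(log2 n) bits, so the decoder needs no other information about the path.

from math import ceil, log2

def encode_path(n):
    width = max(1, ceil(log2(n)))
    return {i: format(i, "0{}b".format(width)) for i in range(n)}

def decode(label_u, label_v):
    return abs(int(label_u, 2) - int(label_v, 2))

labels = encode_path(10)
assert decode(labels[2], labels[7]) == 5
```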
3 Distances on weighted paths

In this section we study the case of paths with k nodes and integral edge weights in [1, n]. The solution to this problem will later be used to establish the upper bound for caterpillars.

3.1 Upper bound

Theorem 3.1. There exists a distance labeling scheme for paths with k nodes and positive integral edge weights in [1, n] with labels of size (k−1)/k log n + O(log k).
Proof. We begin by considering the family of paths with k nodes, integral edge weights and diameter < n. We shall prove that there exists a distance labeling scheme for this family with labels of size (k−1)/k log n + log k + O(log log k).

So consider such a path, and root it in one of its end nodes, denoted v_0. Denote the nodes on the path v_0, ..., v_{k−1} in order. Let d_i = distroot(v_i) and note that, by assumption, d_i < n for all i. We will let the label for v_i store the number d_i + x for some x < n that allows us to represent d_i + x compactly. Since we use the same x for all nodes, we can easily compute the distance between any pair of nodes v_i, v_j as |(d_i + x) − (d_j + x)|.

Since we choose x < n, the largest number stored in a label will be less than 2n, which can be represented with exactly L = ⌈log(2n)⌉ bits. Divide those L bits into k + 1 segments, of which k have ℓ = ⌊L/k⌋ bits and the last segment contains the remaining bits. The first segment, segment 0, will contain the least significant bits, segment 1 the following bits, and so on. We will choose x such that the representation of d_i + x has 0s in all the bits of the i'th segment. If we manage to do so, we will be able to encode each d_i + x with L − ℓ + ⌈log k⌉ bits. Indeed, we can use exactly ⌈log k⌉ bits to represent i, and the next L − ℓ bits to represent d_i + x, where we skip the i'th segment. Prefixing with a string of the form 0^⌈log⌈log k⌉⌉ 1, we get a string from which we can determine the number of bits needed to write ⌈log k⌉ and therefrom the numbers i and d_i + x. We use this string as the label for v_i. The label length is L − ℓ + ⌈log k⌉ + ⌈log⌈log k⌉⌉ + 1 = (k−1)/k log n + log k + O(log log k).

It remains to show that there exists a number x < n as described. In the following we shall, as in the above, represent numbers < 2n with L bits that are divided into k + 1 segments, of which the first k have size ℓ. For i < k and y < 2n, let a(i, y) be a function which returns a number z with the following properties:

(i) In z, all bits outside segment i are 0.
(ii) z + y has only 0s in segment i.

This function is constructed as follows. If y has only 0s in segment i, let a(i, y) = 0. Otherwise take the representation of y, zero out all bits outside segment i, flip (complement) the bits in segment i, and add v to the resulting number, where v has a 1 in the least significant bit of segment i and 0s in all other positions.

Note that from (i) it follows that adding z to any number will not change bits in less significant positions than segment i. We can now scan through the nodes v_0, ..., v_{k−1}, increasing x by adding bits to x in more and more significant positions (in non-overlapping segments), as follows:

- Set x = 0.
- For i = 1, ..., k − 1, set x = x + a(i, x + d_i).

After iteration i we have that x + d_i has only 0s in segment i, and in the following iterations, 1s are only added to x in more significant bit positions, meaning that d_i + x continues to have only 0s in segment i. Since the segments are non-overlapping, we end up with x < n.

For the more general family of paths with k nodes and edge weights in [1, n], we simply note that the diameter of any path in this family is at most kn. Using the above result thus immediately gives us a labeling scheme with labels of size (k−1)/k log(kn) + log k + O(log log k) = (k−1)/k log n + O(log k).
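The offset construction in the proof above is perhaps easiest to digest as pseudocode. The following sketch (our own, illustrative) computes x segment by segment and stores for each node the pair (i, d_i + x); the final bit-level packing that skips the i'th segment is omitted, since the choice of x is the only delicate step.

```python
# Sketch of the offset construction from the proof of Theorem 3.1 (illustrative
# only; the packing of each label into L - l + O(log k) bits is not shown).

from math import ceil, log2

def build_labels(root_dists, n):
    """root_dists[i] = d_i < n is the distance from v_i to the root v_0."""
    k = len(root_dists)
    L = ceil(log2(2 * n))        # numbers stored in labels are < 2n
    seg = L // k                 # each of the first k segments has floor(L/k) bits

    def a(i, y):
        # Return z with bits only in segment i such that y + z has 0s in segment i.
        lo = i * seg
        y_seg = (y >> lo) & ((1 << seg) - 1)
        return 0 if y_seg == 0 else ((1 << seg) - y_seg) << lo

    x = 0
    for i in range(1, k):
        x += a(i, x + root_dists[i])

    # Label of v_i: the segment index i plus the shifted distance d_i + x, whose
    # i'th segment now consists of 0s and therefore need not be stored.
    return [(i, d + x) for i, d in enumerate(root_dists)]

def distance(label_u, label_v):
    return abs(label_u[1] - label_v[1])

# Example: a path v_0 - v_1 - v_2 with edge weights 3 and 5 (diameter 8 < n = 16).
labels = build_labels([0, 3, 8], n=16)
assert distance(labels[1], labels[2]) == 5
```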
3.2 Lower bound

Theorem 3.2. Any labeling scheme for distances on weighted paths with k nodes and edge weights in [1, n] requires (k−1)/k log n + Ω(log k) bits.

Proof. Let F denote the family of paths with k nodes and integral edge weights in [1, n]. We can construct all the members of F by selecting a sequence of k − 1 edge weights in the range [1, n], skipping the paths which have already been constructed by the reverse sequence of edge weights. With this construction we skip at most half of the paths, and hence |F| ≥ (1/2) n^{k−1}. Let the worst-case label size of an optimal distance labeling scheme for such paths be L. The number of different labels of length at most L is N = 2^{L+1} − 1. We can uniquely represent each of the paths in F by the collection of its labels, and hence |F| ≤ C(N, k), where C(N, k) denotes the binomial coefficient. Thus, we have found that (1/2) n^{k−1} ≤ C(N, k). Since C(N, k) ≤ (Ne/k)^k, it follows that (k−1)/k log n ≤ log N − log k + O(1) and hence that L ≥ (k−1)/k log n + log k − O(1), as desired.

Combining Theorem 3.2 with Theorem 3.1, we see that distance labels for paths with k nodes and integral weights in [1, n] must have length (k−1)/k log n + Θ(log k).
4 Distances in caterpillars

4.1 Upper bound

Theorem 4.1. There exists a distance labeling scheme for caterpillars with worst-case label size 2 log n − log log n + O(log log log n).

Proof. We will start by giving a simple 2 log n-bit scheme and then improve it. The simple solution assigns two numbers to each node. The nodes on the spine store distroot and the number 0. The nodes not on the spine store their parent's distroot and a number that is unique among their siblings. The second number is required to distinguish siblings, and hence to determine whether the distance between two nodes is 0 or 2. The worst-case label size for this solution is 2 log n + O(1).
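For concreteness, the simple solution just described could look as follows; the representation of a caterpillar by its spine edge weights and a leaf count per spine node is an assumption made only for this illustration (use unit spine weights for the unweighted case).

```python
# Sketch (illustrative) of the simple caterpillar scheme: every node stores the
# pair (distroot of its spine node, sibling number), where spine nodes use
# sibling number 0 and leaves use 1, 2, ... among the children of their parent.

def label_caterpillar(spine_weights, leaves_per_spine_node):
    """spine_weights[i] is the weight of the spine edge between spine nodes i and
    i+1 (all 1s in the unweighted case); leaf edges have weight 1."""
    labels, d = {}, 0
    for i, num_leaves in enumerate(leaves_per_spine_node):
        if i > 0:
            d += spine_weights[i - 1]
        labels[("spine", i)] = (d, 0)
        for j in range(num_leaves):
            labels[("leaf", i, j)] = (d, j + 1)
    return labels

def distance(lu, lv):
    if lu == lv:
        return 0
    (du, su), (dv, sv) = lu, lv
    # Leaves (sibling number > 0) sit one edge below their spine node.
    return abs(du - dv) + (su > 0) + (sv > 0)

labels = label_caterpillar([1, 1, 1], [2, 1, 1, 2])
assert distance(labels[("leaf", 0, 0)], labels[("leaf", 0, 1)]) == 2
assert distance(labels[("leaf", 0, 0)], labels[("spine", 2)]) == 3
```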
To improve the solution, we split the nodes on the spine into two groups: (1) nodes with more than n/k leaves and (2) nodes with at most n/k leaves, for some parameter k to be chosen later. We add the root to the first group no matter what. Note that the first group can contain at most k nodes.

As before, all nodes store two numbers: distroot and the number 0 for spine nodes, or a number to distinguish siblings. The idea is to reduce the label size by log k bits by using fewer bits for the first number for nodes in the first group and for the second number for nodes in the second group.

The nodes in the first group form a path with at most k nodes and edge weights in [1, n] (where each weight corresponds to the distance between the nodes in the original graph). The algorithm from Theorem 3.1 will add a number x, which is less than the diameter, which again is less than n, to the numbers representing the root distances of the nodes. Using this technique, we can, as seen in the proof of Theorem 3.1, encode the (modified) distroots of the nodes in the first group with only (k−1)/k log n + log k + O(log log k) bits. This gives labels of size 2(k−1)/k log n + log k + O(log log k) for non-spine nodes whose parents are in the first group.

We will also add x to the distroots of nodes in the second group, but since x < n this will not change the label size by more than a single bit. For non-spine nodes whose parents are in the second group, we need at most log n − log k + O(1) bits for the second number, giving a total label size of 2 log n − log k + O(1).

Finally, since the two numbers that form a label now have different lengths, we need an additional O(log log k) bits to determine where one number ends and the next begins. Indeed, it will be possible to split labels into their components if we know the number of bits used to write ⌈log k⌉, and we represent this number with O(log log k) bits.

Setting k = log n / (2 log log n), we now see that our worst-case label size is the maximum of

    2 log n − log k + O(log log k) = 2 log n − log log n + O(log log log n)

and

    2(k−1)/k log n + log k + O(log log k) = 2 log n − 4 log log n + log log n + O(log log log n)
                                          ≤ 2 log n − log log n + O(log log log n).

This proves the theorem.
4.2 Lower bound
We present a technique that counts tuples of labels that are known to be distinct and compares the
result to the number of tuples one can obtain with labels of size L. The technique may have applications
to distance labeling for other families of graphs.
Theorem 4.2. For any n ≥ 4, any distance labeling scheme for the family of caterpillars with at most n nodes has a worst-case label size of at least 2⌊log n⌋ − ⌊log⌊log n⌋⌋ − 4.

Proof. Set k = ⌊log n⌋ and m = 2^k. Let (i_1, ..., i_k) be a sequence of k numbers from the set {1, ..., m/2} with the only requirement being that i_1 = 1. Now consider, for each such sequence, the caterpillar whose main path has length m/2 and where, for t = 1, ..., k, the node in position i_t has ⌊m/2k⌋ leaf children (not on the main path). We shall refer to these children as the t'th group. Note that two disjoint groups of children may be children of the same node if i_t = i_s for some s ≠ t. Each of these caterpillars has m/2 + k⌊m/2k⌋ ≤ m ≤ n nodes.

Suppose that σ is a distance labeling scheme for the family of caterpillars, and consider one of the caterpillars defined above. Given distinct nodes u, v not on the main path, their distance will be dist(u, v) = |i_s − i_t| + 2, where i_s and i_t are the positions on the main path of the parents of u and v, respectively. In particular, if s = 1, so that i_s = 1, then dist(u, v) = i_t + 1. Thus, if σ has been used to label the nodes of the caterpillar, the number i_t for a child in the t'th group can be uniquely determined from its label together with the label of any of the children from the first group. It follows that any k-tuple of labels (l_1, ..., l_k), where l_t is a label of a child in the t'th group, uniquely determines the sequence (i_1, ..., i_k). In particular, k-tuples of labels from distinct caterpillars must be distinct. Of course, k-tuples of labels from the same caterpillar must also be distinct, since labels are unique in a distance labeling scheme.

Now, there are (m/2)^{k−1} choices for the sequence (i_1, ..., i_k), and hence there are (m/2)^{k−1} different caterpillars of this form. For each of these, there are ⌊m/2k⌋^k different choices of k-tuples of labels. Altogether, we therefore have (m/2)^{k−1} ⌊m/2k⌋^k distinct k-tuples of labels. If the worst-case label size of σ is L, then we can create at most (2^{L+1} − 1)^k distinct k-tuples of labels, so we must have (m/2)^{k−1} ⌊m/2k⌋^k ≤ (2^{L+1} − 1)^k. From this it follows that

    L ≥ ⌊(k−1)/k · (log m − 1) + log⌊m/2k⌋⌋
      ≥ ⌊(k−1)²/k + k − log k⌋ − 2
      ≥ 2k − ⌊log k⌋ − 4
      = 2⌊log n⌋ − ⌊log⌊log n⌋⌋ − 4.
5 Exact distances in trees

5.1 Upper bound

Let u, v be nodes in a tree T and let w be their nearest common ancestor. We then have

    dist(u, v) = distroot(u) − distroot(v) + 2 dist(w, v).    (1)

If w = u, so that u is an ancestor of v, then the above equation is just a difference of distroots, which can be stored for each node with log n bits. The same observation clearly holds if w = v.

Assume now that w ∉ {u, v}, so that u and v are not ancestors of each other. Consider the heavy-light decomposition [59] described in the preliminaries. At least one of the nodes u and v must have an ancestor which is a light child of w. Assume that it is v. Now, v has at most log n light ancestors. Saving the distance to all of them together with distroot gives us sufficient information to compute the distance between u and v using equation (1). This is the idea behind Theorem 5.2 below.

By examining the NCA labeling scheme from [7, 8], we see that it can easily be extended as follows.

Lemma 5.1 ([7, 8]). There exists an NCA labeling scheme of size O(log n). For any two nodes u, v the scheme returns the label of w = nca(u, v) as well as:

- which of u and v (if any) have a light ancestor that is a child of w; and
- the number of light nodes on the path from the root to w and from w to u and v, respectively.
Theorem 5.2. There exists a distance labeling scheme for trees with worst-case label size 1/2 log² n + O(log n).

Proof. We use O(log n) bits for the extended NCA labeling in Lemma 5.1 and for distroot. Using (1), it now only remains to efficiently represent, for each node, the distance to all its light ancestors. We consider the light ancestors of a node v encountered on the path from the root to v. The distance from v to the root is at most n − 1 and can therefore be encoded with exactly ⌈log n⌉ bits (by adding leading zeros if needed). By construction of the heavy-light decomposition, the next light node on the path to v will be the root of a subtree of size at most n/2, meaning that the distance from v to that ancestor is at most n/2 − 1 and can be encoded with exactly ⌈log n⌉ − 1 bits. Continuing this way, we encode the distance to the i'th light ancestor on the path from the root to v with exactly ⌈log n⌉ − i bits. When we run out of light ancestors, we concatenate all the encoded distances, resulting in a string of length at most

    ⌈log n⌉ + (⌈log n⌉ − 1) + ··· + 2 + 1 = 1/2 ⌈log n⌉² + 1/2 ⌈log n⌉.

We can use O(log n) extra bits to encode n and to separate all sublabels from each other. The decoder can now determine ⌈log n⌉ and split up the entries in the list of distances. When applying formula (1), it can then determine the distance between v and w by adding together the relevant distances in the list of light ancestors, using the fact from Lemma 5.1 that it knows the number of light ancestors from v to w.
5.2 Lower bound

In the case of general trees, Gavoille et al. [37] establish a lower bound of 1/8 log² n − O(log n) using an ingenious technique where they apply a distance labeling scheme to a special class of trees called (h, M)-trees³. The following uses a generalization of (h, M)-trees to improve their ideas and leads to a lower bound of 1/4 log² n − O(log n).

(h, W, a)-trees. We begin with some definitions. For integers h, W ≥ 0 and a number a ≥ 1 such that W/a^i is integral for all i = 0, ..., h, an (h, W, a)-tree is a rooted binary tree T with edge weights in [0, W] that is constructed recursively as follows. For h = 0, T is just a single node. For h = 1, T is a claw (i.e. a star with three edges) with edge weights x, x, W − x for some 0 ≤ x < W, rooted at the leaf node of the edge with weight W − x. For h > 1, T consists of a (1, W, a)-tree whose two leaves are the roots of two (h − 1, W/a, a)-trees T_0, T_1. We shall denote an (h, W, a)-tree constructed in this way by T = ⟨T_0, T_1, x⟩. An example for h = 3 can be seen in Figure 1. Note that the case a = 1 simply corresponds to the (h, M)-trees defined in [37].

It is easy to see that an (h, W, a)-tree has 2^h leaves and 3·2^h − 2 nodes. Further, it is straightforward to see that, if u, v are leaves in an (h, W, a)-tree T = ⟨T_0, T_1, x⟩, then

    dist_T(u, v) = 2W·(a^{-1} − a^{-h})/(1 − a^{-1}) + 2x,   if u ∈ T_0 and v ∈ T_1, or vice versa;
    dist_T(u, v) = dist_{T_i}(u, v),                          if u, v ∈ T_i for some i = 0, 1.      (2)
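To make the recursion and formula (2) concrete, the following sketch (our own, illustrative) compares the first case of formula (2) with a direct path sum: on the path between leaves in different halves, every claw level i ≥ 1 contributes its full weight W/a^i twice, regardless of how that weight is split between its two lower edges, and the top claw contributes 2x.

```python
# Sketch (illustrative, not from the paper): check the first case of formula (2)
# against a direct path sum for the distance between leaves of T_0 and T_1.

from fractions import Fraction

def cross_distance_direct(h, W, a, x):
    # Up from u to the root of T_0, across the two weight-x edges of the top
    # claw, and down to v; each level i >= 1 contributes its full weight W/a^i
    # twice, independently of how that weight is split within the claw.
    return 2 * sum(Fraction(W, a**i) for i in range(1, h)) + 2 * x

def cross_distance_formula(h, W, a, x):
    r = Fraction(1, a)
    return 2 * W * (r - r**h) / (1 - r) + 2 * x

for h in range(1, 5):
    W, a = 3 * 2**h, 2                # W / a^i is integral for all i <= h
    for x in range(W):                # 0 <= x < W, as in the definition
        assert cross_distance_direct(h, W, a, x) == cross_distance_formula(h, W, a, x)
```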
Leaf distance labeling schemes. In the following we shall consider leaf distance labeling schemes for the family of (h, W, a)-trees: that is, distance labeling schemes where only the leaves in a tree need to be labeled, and where only leaf labels can be given as input to the decoder. Since an ordinary distance labeling scheme obviously can be used for the leaves only, any lower bound on worst-case label sizes for leaf distance labeling schemes is also a lower bound for ordinary distance labeling schemes. We denote by g(h, W, a) the smallest number of labels needed by an optimal leaf distance labeling scheme to label all (h, W, a)-trees.

³ Note that their exposition has some minor errors, as pointed out (and corrected) in [41].
Figure 1: An (h, W, a)-tree with h = 3. The top claw has edge weights x, x, W − x; the two claws below it have edge weights y_i, y_i, W/a − y_i; and the four bottom claws have edge weights z_j, z_j, W/a² − z_j. We require that x < W, y_1, y_2 < W/a and z_1, ..., z_4 < W/a².
Lemma 5.3. For all h ≥ 1 and W ≥ 2, g(h, W, a)² ≥ W · g(h − 1, W²/a², a²).

Proof. Fix an optimal leaf distance labeling scheme σ which produces exactly g(h, W, a) distinct labels for the family of (h, W, a)-trees. For leaves u and v in an (h, W, a)-tree, denote by l(u) and l(v), respectively, the labels assigned by σ. For x = 0, ..., W − 1, let S(x) be the set consisting of the pairs of labels (l(u), l(v)) for all leaves u ∈ T_0 and v ∈ T_1 in all (h, W, a)-trees T = ⟨T_0, T_1, x⟩.

The sets S(x) and S(x′) are disjoint for x ≠ x′, since every pair of labels in S(x) uniquely determines x due to (2). Letting S = ⋃_{x=0}^{W−1} S(x), we therefore have |S| = Σ_{x=0}^{W−1} |S(x)|. Since S contains pairs of labels produced by σ from leaves in (h, W, a)-trees, we clearly also have |S| ≤ g(h, W, a)², and hence it only remains to prove that |S| ≥ W · g(h − 1, W²/a², a²), which we shall do by showing that |S(x)| ≥ g(h − 1, W²/a², a²) for all x.

The goal for the rest of the proof is therefore to create a leaf distance labeling scheme for (h − 1, W²/a², a²)-trees using only labels from the set S(x) for some fixed x. So let x be given and consider an (h − 1, W²/a², a²)-tree T. Let V = W/a. From T we shall construct an (h − 1, V, a)-tree φ_i(T) for i = 0, 1 such that every leaf node v in T corresponds to nodes φ_i(v) in φ_i(T) for i = 0, 1. The trees φ_i(T) are defined as follows. If h = 1, so that T consists of a single node, then φ_i(T) = T for i = 0, 1. If h > 1, then T is of the form T = ⟨T′_0, T′_1, y⟩ for some 0 ≤ y < V². We can write y in the form y = y_0 + y_1·V for uniquely determined y_0, y_1 with 0 ≤ y_0, y_1 < V. For i = 0, 1, we recursively define φ_i(T) = ⟨φ_i(T′_0), φ_i(T′_1), y_i⟩. Thus, φ_i(T) is an (h − 1, V, a)-tree that is similar to T but where we replace the top edge weight y by the edge weight y_i and, recursively, do the same for all (h − 2, V²/a², a²)-subtrees. Note also that the corresponding edge weight V² − y in T is automatically replaced by the edge weight V − y_i in φ_i(T), in order for φ_i(T) to be an (h − 1, V, a)-tree.

Denote by φ_i(v) the leaf in φ_i(T) corresponding to the leaf v in T.

Consider now the (h, W, a)-tree T* = ⟨φ_0(T), φ_1(T), x⟩. Every leaf v in T corresponds to the leaves φ_0(v), φ_1(v) in T*, where φ_i(v) ∈ φ_i(T) for i = 0, 1. Using formula (2) for the distances in T*, it is straightforward to see that

    dist_T(u, v) = (dist_{φ_0(T)}(φ_0(u), φ_0(v)) mod 2V) + V · dist_{φ_1(T)}(φ_1(u), φ_1(v)).

We can now apply the leaf distance labeling scheme σ to T* and obtain a label for each leaf node in T*. In particular, the pair of leaves (φ_0(v), φ_1(v)) corresponding to a node v in T will receive a pair of labels. We use this pair to label v in T, whereby we have obtained a labeling of the leaves in T with labels from S(x). Using the formula above, we can construct a decoder that computes the distance between two nodes in T from these labels alone, and hence we have obtained a leaf distance labeling scheme for (h − 1, V², a²)-trees, that is, (h − 1, W²/a², a²)-trees, using only labels from S(x), as desired.
Lemma 5.4. For all h ≥ 1 and W ≥ 2, g(h, W, a) ≥ W^{h/2} / a^{h(h−1)/4}.

Proof. The proof is by induction on h. For h = 1 we note that a (0, W, a)-tree has only one node, so that g(0, W²/a², a²) = 1. Lemma 5.3 therefore yields g(1, W, a)² ≥ W, from which it follows that g(1, W, a) ≥ W^{1/2}. The lemma therefore holds for h = 1. Now let h > 1 and assume that the lemma holds for h − 1. Lemma 5.3 and the induction hypothesis now yield

    g(h, W, a)² ≥ W · g(h − 1, W²/a², a²) ≥ W · (W²/a²)^{(h−1)/2} / a^{2(h−1)(h−2)/4} = W^h / a^{h(h−1)/2},

from which it follows that g(h, W, a) ≥ W^{h/2} / a^{h(h−1)/4}, as desired.
The previous lemma implies that any (leaf and hence also ordinary) distance labeling scheme for (h, W, a)-trees must have labels of worst-case length at least

    (h/2)(log W − ((h−1)/2) log a) = (1/2) h log W − (1/4) h² log a + (1/4) h log a.

Since the number of nodes in such a tree is n = 3·2^h − 2, it follows that h = log(n + 2) − log 3, and hence that log n − 2 ≤ h ≤ log n for sufficiently large n. From this we see that the worst-case label length is at least

    (1/2) log n log W − (1/4) log n (log n − 1) log a − log W − (1/2) log a.

In the case where a = 1, we retrieve the bound of (1/2) log n log W − log W obtained in [36]. It seems that larger values of a only make the above result weaker, but the real strength of the above becomes apparent when we switch to the unweighted version of (h, W, a)-trees, in which we replace weighted edges by paths of corresponding lengths. Note that a distance labeling scheme for the family of unweighted (h, W, a)-trees can be used as a distance labeling scheme for the weighted (h, W, a)-trees, and hence any lower bound for the weighted version automatically becomes a lower bound for the unweighted version. The number of nodes n in an unweighted (h, W, a)-tree is upper bounded by

    n ≤ 2W + 2·(2W/a) + 2²·(2W/a²) + ··· + 2^{h−1}·(2W/a^{h−1}) + 1.

In the case a = 2, we get n ≤ 2Wh + 1.
Theorem 5.5. Any distance labeling scheme for unweighted (h, W, 2)-trees, and hence also for general trees, has a worst-case label size of at least 1/4 log² n − O(log n).

Proof. Choose the largest integer h with 2·2^h·h + 1 ≤ n, and note that we must have h ≥ log n − O(log log n). Set W = 2^h and consider the family of (h, W, 2)-trees, which is a subfamily of the family of trees with n nodes. From Lemma 5.4 it therefore follows that the worst-case label length is at least

    (1/2) h log W − (1/4) h² + (1/4) h = (1/4) h² + (1/4) h = (1/4) log² n + (1/4) log n − O(log log n).
6 Approximate distances in trees

In this section we present a (1 + ε)-stretch distance labeling scheme with labels of size O(log n).

Theorem 6.1. For constant ε > 0, (1 + ε)-stretch distance labeling for trees requires Θ(log n) bits.
Proof. As in the case of exact distances, we will create labels of size O(log n) bits that contain the extended NCA labels from Lemma 5.1 as well as distroot. We will also be using the formula in (1). However, we cannot afford to store the exact distance to each apex ancestor. Even storing a 2-approximate distance to each apex ancestor would require log n log log n bits. Furthermore, approximate distances to the apex nodes do not directly guarantee an upper bound on the approximate distance, since equation (1) involves a subtraction. We will address these two problems in the following.

Let w = nca(u, v) and assume w ∉ {u, v}, since otherwise we can compute the exact distance using only distroot. Suppose we know a (1 + ε)-approximation α of dist(w, v) for some ε ≥ 0. That is,

    dist(w, v) ≤ α ≤ (1 + ε) dist(w, v).    (3)

Define d̃ = distroot(u) − distroot(v) + 2α. First we show that d̃ is a (1 + 2ε)-approximation of dist(u, v). Next we show how to represent all the (1 + ε)-approximate distances to the light ancestors of a node using a total of O(log n) bits. Together with formula (1), these two facts prove that we can compute (1 + 2ε)-stretch distances between any pair of nodes with labels of size O(log n). To prove the theorem, we can then simply replace ε by ε/2.

To see that d̃ is a (1 + 2ε)-approximation of dist(u, v), first note that

    d̃ = distroot(u) − distroot(v) + 2α ≥ distroot(u) − distroot(v) + 2 dist(w, v) = dist(u, v).

For the other inequality, note that

    d̃ = distroot(u) − distroot(v) + 2α
      ≤ distroot(u) − distroot(v) + 2(1 + ε) dist(w, v)
      = distroot(u) − (distroot(v) − dist(w, v)) + (1 + 2ε) dist(w, v)
      = distroot(u) − distroot(w) + (1 + 2ε) dist(w, v)
      = dist(u, w) + (1 + 2ε) dist(w, v)
      ≤ (1 + 2ε) (dist(u, w) + dist(w, v))
      = (1 + 2ε) dist(u, v).
It now only remains to show that we can compactly store all the approximate distances α to the light ancestors using O(log n) bits of space.

We use a heavy-light path decomposition of the tree. For each node v we can save a 2-approximate distance to each of its k proper light ancestors as follows. Let S be a binary string, initially consisting of k zeros. Before each 0 we now insert 1s such that, if there are j 1s in total from the beginning of S up to the i'th 0, then the distance to the i'th light ancestor a of v satisfies 2^{j−1} ≤ dist(v, a) ≤ 2^j. This is the same as traversing the tree bottom-up from v and, for each light node encountered on the way, adding a 0, and, each time the distance doubles, adding a 1. The number of 0s equals the number of light nodes, which is at most log n, and the number of 1s is also bounded by log n, since n is the maximum distance in the tree. In total the length of S is at most 2 log n.

Using the O(log n)-bit label from Lemma 5.1 we can tell whether one node is an ancestor of the other and, if not, which one has a light ancestor a that is a child of their nearest common ancestor w. In addition, we can determine the total number i of light ancestors up to a. This means that we can compute j, and hence a 2-approximation of dist(v, a), by counting the number of 1s in S up to the i'th 0.
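A sketch of the string S just described (our own rendering, illustrative): given the exact distances from v to its proper light ancestors in the order they are met when walking up from v, we append a 0 per light ancestor and 1s whenever the distance passes the next power of two; decoding the i'th light ancestor then returns a value within a factor 2 of the true distance. Replacing the base 2 by 1 + ε gives the refinement discussed next.

```python
# Sketch (our own rendering) of the string S. `dists` holds the exact distances
# from v to its proper light ancestors, nearest ancestor first, so the sequence
# is nondecreasing and every entry is >= 1.

from math import ceil, log2

def build_S(dists, base=2):
    S, ones = [], 0
    for d in dists:
        j = ceil(log2(d) / log2(base)) if d > 1 else 0
        S.append("1" * (j - ones))   # reach j ones before this ancestor's 0
        ones = j
        S.append("0")
    return "".join(S)

def approx_dist(S, i, base=2):
    """Return base**j for the i'th light ancestor (i >= 1): a 2-approximation
    of the true distance when base = 2."""
    zeros = ones = 0
    for bit in S:
        if bit == "1":
            ones += 1
        else:
            zeros += 1
            if zeros == i:
                return base ** ones
    raise ValueError("fewer than i light ancestors")

dists = [1, 3, 3, 10]
S = build_S(dists)                   # one 0 per light ancestor, at most log n 1s
for i, d in enumerate(dists, start=1):
    assert d <= approx_dist(S, i) <= 2 * d
```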
We have now obtained a 2-approximation with labels of size O(log n). We can improve this to a (1 + ε)-approximation by adding a 1 to S each time the distance increases by a factor of 1 + ε rather than 2. This increases the label size by a factor of 1/log(1 + ε), which is a constant for constant ε.

This proves that there is a (1 + ε)-stretch distance labeling scheme with labels of size O(log n). To complete the proof of the theorem, we note that, given any (1 + ε)-stretch distance labeling scheme, we can always distinguish nodes (since identical nodes have distance 0), which means that we always need at least n different labels, and hence labels of size at least log n bits.
References
[1] S. Abiteboul, S. Alstrup, H. Kaplan, T. Milo, and T. Rauhe. Compact labeling scheme for ancestor
queries. SIAM J. Comput., 35(6):1295–1309, 2006.
[2] S. Abiteboul, H. Kaplan, and T. Milo. Compact labeling schemes for ancestor queries. In Proc. of
the 12th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 547–556, 2001.
[3] S. Alstrup, P. Bille, and T. Rauhe. Labeling schemes for small distances in trees. In Proc. of the
14th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 689–698, 2003.
[4] S. Alstrup, P. Bille, and T. Rauhe. Labeling schemes for small distances in trees. SIAM J. Discrete
Math., 19(2):448–462, 2005. See also SODA’03.
[5] S. Alstrup, S. Dahlgaard, and M. B. T. Knudsen. Optimal induced universal graphs and labeling
schemes for trees. In Proc. 56th Annual Symp. on Foundations of Computer Science (FOCS), 2015.
[6] S. Alstrup, C. Gavoille, E. B. Halvorsen, and H. Petersen. Simpler, faster and shorter labels for distances in graphs. Submitted, 2015.
[7] S. Alstrup, C. Gavoille, H. Kaplan, and T. Rauhe. Nearest common ancestors: A survey and a
new algorithm for a distributed environment. Theory of Computing Systems, 37(3):441–456, May
2004.
[8] S. Alstrup, E. B. Halvorsen, and K. G. Larsen. Near-optimal labeling schemes for nearest common
ancestors. In Proc. of the 25th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA), pages
972–982, 2014.
[9] S. Alstrup, H. Kaplan, M. Thorup, and U. Zwick. Adjacency labeling schemes and induced-universal
graphs. In Proc. of the 47th Annual ACM Symp. on Theory of Computing (STOC), 2015.
[10] S. Alstrup and T. Rauhe. Improved labeling schemes for ancestor queries. In Proc. of the 13th
Annual ACM-SIAM Symp. on Discrete Algorithms (SODA), 2002.
[11] S. Alstrup and T. Rauhe. Small induced-universal graphs and compact implicit graph represen-
tations. In Proc. 43rd Annual Symp. on Foundations of Computer Science (FOCS), pages 53–62,
2002.
[12] F. Bazzaro and C. Gavoille. Localized and compact data-structure for comparability graphs. Dis-
crete Mathematics, 309(11):3465–3484, 2009.
[13] N. Bonichon, C. Gavoille, and A. Labourel. Short labels by traversal and jumping. In Structural
Information and Communication Complexity, pages 143–156. Springer, 2006. Include proof for
binary trees and caterpillars.
[14] M. A. Breuer. Coding the vertexes of a graph. IEEE Trans. on Information Theory, IT–12:148–153,
1966.
[15] M. A. Breuer and J. Folkman. An unexpected result on coding vertices of a graph. Journal of Mathematical Analysis and Applications, 20:583–600, 1967.
[16] V. D. Chepoi, F. F. Dragan, B. Estellon, M. Habib, and Y. Vaxès. Diameters, centers, and approximating trees of delta-hyperbolic geodesic spaces and graphs. In Proc. of the 24th Annual ACM Symp. on Computational Geometry (SoCG), pages 59–68, 2008.
[17] V. D. Chepoi, F. F. Dragan, and Y. Vaxès. Distance and routing labeling schemes for non-positively curved plane graphs. J. of Algorithms, 61(2):60–88, 2006.
[18] F. R. K. Chung. Universal graphs and induced-universal graphs. J. of Graph Theory, 14(4):443–454,
1990.
[19] E. Cohen, H. Kaplan, and T. Milo. Labeling dynamic XML trees. SIAM J. Comput., 39(5):2048–
2074, 2010.
[20] B. Courcelle and R. Vanicat. Query efficient implementation of graphs of bounded clique-width.
Discrete Applied Mathematics, 131:129–150, 2003.
[21] L. J. Cowen. Compact routing with minimum stretch. J. of Algorithms, 38:170–183, 2001. See also
SODA’91.
[22] Y. Dodis, M. Pătrașcu, and M. Thorup. Changing base without losing space. In Proc. of the 42nd Annual ACM Symp. on Theory of Computing (STOC), pages 593–602, 2010.
[23] T. Eilam, C. Gavoille, and D. Peleg. Compact routing schemes with low stretch factor. J. of
Algorithms, 46(2):97–114, 2003.
[24] M. Elkin, A. Filtser, and O. Neiman. Prioritized metric structures and embedding. In Proc. of the
47th Annual ACM Symp. on Theory of Computing (STOC), pages 489–498, 2015.
[25] A. Farzan and J. I. Munro. Succinct encoding of arbitrary graphs. Theoretical Computer Science,
513:38–52, 2013.
[26] A. Farzan and J. I. Munro. A uniform paradigm to succinctly encode various families of trees.
Algorithmica, 68(1):16–40, 2014.
[27] P. Fraigniaud and A. Korman. On randomized representations of graphs using short labels. In
Proc. of the 21st Annual Symp. on Parallelism in Algorithms and Architectures (SPAA), pages
131–137, 2009.
[28] P. Fraigniaud and A. Korman. Compact ancestry labeling schemes for XML trees. In Proc. of the
21st annual ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 458–466, 2010.
[29] P. Fraigniaud and A. Korman. An optimal ancestry scheme and small universal posets. In Proc.
of the 42nd Annual ACM Symp. on Theory of Computing (STOC), pages 611–620, 2010.
[30] C. Gavoille, M. Katz, N. Katz, C. Paul, and D. Peleg. Approximate distance labeling schemes. In
Proc. of the 9th Annual European Symp. on Algorithms (ESA), pages 476–488, 2001.
[31] C. Gavoille and A. Labourel. Distributed relationship schemes for trees. In 18th International
Symp. on Algorithms and Computation (ISAAC), pages 728–738, 2007.
[32] C. Gavoille and A. Labourel. On local representation of distances in trees. In Proc. of the 26th
Annual ACM Symp. on Principles of Distributed Computing (PODC), pages 352–353, 2007.
[33] C. Gavoille and O. Ly. Distance labeling in hyperbolic graphs. In 16th Annual International Symp.
on Algorithms and Computation (ISAAC), pages 1071–1079, 2005.
[34] C. Gavoille and C. Paul. Distance labeling scheme and split decomposition. Discrete Mathematics,
273(1-3):115–130, 2003.
[35] C. Gavoille and C. Paul. Optimal distance labeling for interval graphs and related graphs families.
SIAM J. Discrete Math., 22(3):1239–1258, 2008.
[36] C. Gavoille, D. Peleg, S. Pérennes, and R. Raz. Distance labeling in graphs. In Proc. of the 12th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 210–219, 2001.
[37] C. Gavoille, D. Peleg, S. Pérennes, and R. Raz. Distance labeling in graphs. J. of Algorithms, 53(1):85–112, 2004. See also SODA'01.
[38] R. L. Graham and H. O. Pollak. On embedding graphs in squashed cubes. In Lecture Notes in
Mathematics, volume 303. Springer-Verlag, 1972.
[39] A. Gupta, R. Krauthgamer, and J. R. Lee. Bounded geometries, fractals, and low-distortion
embeddings. In 44th Annual Symp. on Foundations of Computer Science (FOCS), pages 534–543,
2003.
[40] A. Gupta, A. Kumar, and R. Rastogi. Traveling with a pez dispenser (or, routing issues in mpls).
SIAM J. on Computing, 34(2):453–474, 2005. See also FOCS’01.
[41] E. B. Halvorsen. Labeling schemes for trees - overview and new results. Master’s thesis, University
of Copenhagen, 2013. Available at esben.bistruphalvorsen.dk.
[42] S. Kannan, M. Naor, and S. Rudich. Implicit representation of graphs. SIAM J. Disc. Math., pages
596–603, 1992. See also STOC’88.
[43] M. Kao, X. Li, and W. Wang. Average case analysis for tree labelling schemes. Theor. Comput.
Sci., 378(3):271–291, 2007.
[44] H. Kaplan and T. Milo. Short and simple labels for distances and other functions. In Proc. of the 7th Workshop on Algorithms and Data Structures (WADS), 2001.
[45] H. Kaplan, T. Milo, and R. Shabo. A comparison of labeling schemes for ancestor queries. In Proc.
of the 13th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA), 2002.
[46] M. Katz, N. A. Katz, A. Korman, and D. Peleg. Labeling schemes for flow and connectivity. SIAM
J. Comput., 34(1):23–40, 2004. See also SODA’02.
[47] A. Korman. Labeling schemes for vertex connectivity. ACM Trans. Algorithms, 6(2):39:1–39:10,
2010.
[48] A. Korman and S. Kutten. Labeling schemes with queries. CoRR, abs/cs/0609163, 2006.
[49] A. Korman and S. Kutten. Labeling schemes with queries. In SIROCCO, pages 109–123, 2007.
[50] A. Korman and D. Peleg. Labeling schemes for weighted dynamic trees. Inf. Comput., 205(12):1721–
1740, 2007.
[51] R. Krauthgamer and J. R. Lee. Algorithms on negatively curved spaces. In 47th Annual Symp. on
Foundations of Computer Science (FOCS), pages 119–132, 2006.
[52] J. W. Moon. On minimal n-universal graphs. Proc. of the Glasgow Mathematical Association,
7(1):32–33, 1965.
[53] J. H. Müller. Local structure in graph classes. PhD thesis, Georgia Institute of Technology, 1988.
[54] J. I. Munro, R. Raman, V. Raman, and S. Srinivasa Rao. Succinct representations of permutations
and functions. Theor. Comput. Sci., 438:74–88, 2012.
[55] M. Pătrașcu. Succincter. In Proc. 49th Annual Symp. on Foundations of Computer Science (FOCS), pages 305–313, 2008.
[56] D. Peleg. Informative labeling schemes for graphs. In Proc. 25th Symp. on Mathematical Founda-
tions of Computer Science, pages 579–588, 2000.
[57] D. Peleg. Proximity-preserving labeling schemes. J. Graph Theory, 33(3):167–176, 2000.
[58] N. Santoro and R. Khatib. Labeling and implicit routing in networks. The computer J., 28:5–8,
1985.
[59] D. D. Sleator and R. E. Tarjan. A data structure for dynamic trees. J. of Computer and System
Sciences, 26(3):362 – 391, 1983.
[60] J. P. Spinrad. Efficient Graph Representations, volume 19 of Fields Institute Monographs. AMS,
2003.
[61] K. Talwar. Bypassing the embedding: algorithms for low dimensional metrics. In Proc. of the 36th
Annual ACM Symp. on Theory of Computing (STOC), pages 281–290, 2004.
[62] M. Tang, J. Yang, and G. Zhang. A compact distance labeling scheme for trees of small depths.
In International Conference on Scalable Computing and Communications / Eighth International
Conference on Embedded Computing, ScalCom-EmbeddedCom, pages 455–458, 2009.
[63] M. Thorup. Compact oracles for reachability and approximate distances in planar digraphs. J.
ACM, 51(6):993–1024, 2004. See also FOCS’01.
[64] M. Thorup and U. Zwick. Compact routing schemes. In Proc. of the 13th Annual ACM Symp. on
Parallel Algorithms and Architectures, SPAA ’01, pages 1–10, 2001.
[65] M. Thorup and U. Zwick. Approximate distance oracles. J. of the ACM, 52(1):1–24, 2005. See
also STOC’01.
[66] O. Weimann and D. Peleg. A note on exact distance labeling. Inf. Process. Lett., 111(14):671–673,
2011.
[67] Wikipedia. Implicit graph — wikipedia, the free encyclopedia, 2013. [Online; accessed 15-February-
2014].
[68] P. M. Winkler. Proof of the squashed cube conjecture. Combinatorica, 3(1):135–139, 1983.
... We stress that the only information available to the decoder are the given labels, and it cannot access any other information about the network. This question has been considered for functions such as adjacency [4,6,11,14,34,41], distance [1,5,7,9,25,26,[28][29][30]32,35,39], flows and connectivity [33,36,37] or Steiner tree [40]. See [42] for an up-to-date survey. ...
... Upper bound Distance 1/4 log 2 n − O(log n) [9] 1/4 log 2 n + o(log 2 n) [25] Fixed-port routing Ω(log 2 n/ log log n) [21] O(log 2 n/ log log n) [20] (1 + )-approximate distance Ω(log (1/ ) log n) [25] O(log (1/ ) log n) [25] Nearest common ancestor 1.008 log n − O(1) [10] 2.318 log n + O(1) [31] Designer-port routing log n + Ω(log log n) [5] log n + O((log log n) 2 ) log n + O((log log n) 2 ) log n + O((log log n) 2 ) Ancestry log n + Ω(log log n) [5] log n + O(log log n) [23] Siblings/connectivity (unique) log n + Ω(log log n) [5] log n + O(log log n) [5] Adjacency log n (trivial) log n + O(1) [6] Table 1: Summary of the state-of-the-art bounds for labeling schemes in trees. ...
... Arguably, trees are one of the most important classes of graphs considered in the context of labeling schemes. Functions studied in the literature on labeling schemes in trees include adjacency [6,13,14], ancestry [2,22,23], routing [20,21,44], distance [5,9,27,29,39], and nearest common ancestors [8,10,19,31,40]. See Table 1 for a summary of the state-of-the-art bounds for these problems. ...
Preprint
A routing labeling scheme assigns a binary string, called a label, to each node in a network, and chooses a distinct port number from $\{1,\ldots,d\}$ for every edge outgoing from a node of degree $d$. Then, given the labels of $u$ and $w$ and no other information about the network, it should be possible to determine the port number corresponding to the first edge on the shortest path from $u$ to $w$. In their seminal paper, Thorup and Zwick [SPAA 2001] designed several routing methods for general weighted networks. An important technical ingredient in their paper that according to the authors ``may be of independent practical and theoretical interest'' is a routing labeling scheme for trees of arbitrary degrees. For a tree on $n$ nodes, their scheme constructs labels consisting of $(1+o(1))\log n$ bits such that the sought port number can be computed in constant time. Looking closer at their construction, the labels consist of $\log n + O(\log n\cdot \log\log\log n / \log\log n)$ bits. Given that the only known lower bound is $\log n+\Omega(\log\log n)$, a natural question that has been asked for other labeling problems in trees is to determine the asymptotics of the smaller-order term. We make the first (and significant) progress in 19 years on determining the correct second-order term for the length of a label in a routing labeling scheme for trees on $n$ nodes. We design such a scheme with labels of length $\log n+O((\log\log n)^{2})$. Furthermore, we modify the scheme to allow for computing the port number in constant time at the expense of slightly increasing the length to $\log n+O((\log\log n)^{3})$.
... The current paper is the latest in a series of results dating back to Kannan, Naor, and Rudich [18,19] and Muller [22] who defined adjacency labelling schemes 2 and described O(log n)-bit adjacency labelling schemes for several classes of graphs, including planar graphs. Since this initial work, adjacency labelling schemes and, more generally, informative labelling schemes have remained a very active area of research [2,7,1,4,6,5,8]. ...
... The operations in a bulk tree proceed in rounds where, in each round, two types of bulk operations are performed: bulk insertion, in which a set I ⊂ R \ V (T ) of new values are inserted into T , and bulk deletion, in which a set D ⊆ V (T ) of values are removed from T . The sets I and D inserted into and deleted from T in a single round must satisfy the following two restrictions: 6 Note that (ii) implies that |D| 6(|V (T )| − |D|), so |D| (6/7)|V (T )|. ...
... A graph is an apex if it has a vertex whose removal leaves a planar graph.6 There is nothing special about the constant 3 here. ...
Preprint
We show that there exists an adjacency labelling scheme for planar graphs where each vertex of an $n$-vertex planar graph $G$ is assigned a $(1+o(1))\log_2 n$-bit label and the labels of two vertices $u$ and $v$ are sufficient to determine if $uv$ is an edge of $G$. This is optimal up to the lower order term and is the first such asymptotically optimal result. An alternative, but equivalent, interpretation of this result is that, for every $n$, there exists a graph $U_n$ with $n^{1+o(1)}$ vertices such that every $n$-vertex planar graph is an induced subgraph of $U_n$. These results generalize to bounded genus graphs, apex-minor-free graphs, bounded-degree graphs from minor closed families, and $k$-planar graphs.
... A basic ingredient for our data structure will be distance labeling for trees. Exact distance labeling on an n-vertex tree requires Θ(log n) words [AGHP16]. In order to shave a log n factor, we will use approximated labeling scheme. ...
... The lower bound is based on[AGHP16] (h, M )-trees, and uses only the labels on leaf vertices. One can observe that the metric induces by leaf vertices in (h, M )-trees is in fact an ultrametric. ...
Preprint
Full-text available
In network design problems, such as compact routing, the goal is to route packets between nodes using the (approximated) shortest paths. A desirable property of these routes is a small number of hops, which makes them more reliable, and reduces the transmission costs. Following the overwhelming success of stochastic tree embeddings for algorithmic design, Haeupler, Hershkowitz, and Zuzic (STOC'21) studied hop-constrained Ramsey-type metric embeddings into trees. Specifically, embedding $f:G(V,E)\rightarrow T$ has Ramsey hop-distortion $(t,M,\beta,h)$ (here $t,\beta,h\ge1$ and $M\subseteq V$) if $\forall u,v\in M$, $d_G^{(\beta\cdot h)}(u,v)\le d_T(u,v)\le t\cdot d_G^{(h)}(u,v)$. $t$ is called the distortion, $\beta$ is called the hop-stretch, and $d_G^{(h)}(u,v)$ denotes the minimum weight of a $u-v$ path with at most $h$ hops. Haeupler {\em et al.} constructed embedding where $M$ contains $1-\epsilon$ fraction of the vertices and $\beta=t=O(\frac{\log^2 n}{\epsilon})$. They used their embedding to obtain multiple bicriteria approximation algorithms for hop-constrained network design problems. In this paper, we first improve the Ramsey-type embedding to obtain parameters $t=\beta=\frac{\tilde{O}(\log n)}{\epsilon}$, and generalize it to arbitrary distortion parameter $t$ (in the cost of reducing the size of $M$). This embedding immediately implies polynomial improvements for all the approximation algorithms from Haeupler {\em et al.}. Further, we construct hop-constrained clan embeddings (where each vertex has multiple copies), and use them to construct bicriteria approximation algorithms for the group Steiner tree problem, matching the state of the art of the non constrained version. Finally, we use our embedding results to construct hop constrained distance oracles, distance labeling, and most prominently, the first hop constrained compact routing scheme with provable guarantees.
... Finally, we wonder whether the bound 2 O(f (n) log 2 n) in Theorem 3.4 can be replaced by 2 O(f (n)+log 2 n) . The motivation for this question is the following: on the one hand, we have seen that for planar graphs we can improve the bound of Theorem 3.4 from 2 O( √ n log 2 n) to 2 O( √ n) ; on the other hand, it is known that for n-vertex trees (which admit balanced separators of size 1), the minimum size of the labels in a distance labelling scheme is ( 1 4 + o(1)) log 2 n [14] and the constant 1 4 is best possible [6]. This shows in particular that the log 2 n term cannot be avoided in Theorem 3.4 and in a possible improvement with 2 O(f (n)+log 2 n) vertices. ...
... The work on distance labelling in trees mentioned above also motivates the following natural question: What is the smallest constant c > 0 such that the class of n-vertex trees has an isometric-universal graph with at most 2 c log 2 n vertices? The lower bound on distance labelling schemes in trees [6] shows that c 1 4 , while known upper bounds on the size of trees containing all n-vertex trees as subgraphs [12,20] (and thus also as isometric subgraphs) show that c 1 2 + o(1). ...
Preprint
We show that for any integer $n\ge 1$, there is a graph on $3^{n+O(\log^2 n)}$ vertices that contains isometric copies of all $n$-vertex graphs. Our main tool is a new type of distance labelling scheme, whose study might be of independent interest.
... We stress that the only information available to the decoder are the given labels, and it cannot access any other information about the network. This question has been considered for functions such as adjacency [37,6,44,11,4,15,14,20], distance [1,35,33,9,5,32,7,28,29,31,42,38], flows and connectivity [39,36,40] or Steiner tree [43]. See [45] for an up-to-date survey. ...
... Arguably, trees are one of the most important classes of graphs considered in the context of labeling schemes. Functions studied in the literature include adjacency [6,13,15], ancestry [2,25,26], routing [23,24,47], distance [42,32,5,9,30], and nearest common ancestors [43,8,22,10,34]. See Table 1 for a summary of the state-of-the-art bounds for these problems. ...
... This scheme is asymptotically optimal since Ω(n) bits labels are needed for general graphs. Another important result is that there exists a distance labeling scheme for trees with O(log 2 n) bits labels [6,32,50]. Several classes of graphs containing trees also enjoy a distance labeling scheme with O(log 2 n) bit labels such as bounded tree-width graphs [36], distancehereditary graphs [34], bounded clique-width graphs [29], and non-positively curved plane graphs [24]. A lower bound of Ω(log 2 n) bits on the label length is known for trees [6,36], implying that all the results mentioned above are optimal as well. ...
... Several classes of graphs containing trees also enjoy a distance labeling scheme with O(log 2 n) bit labels such as bounded tree-width graphs [36], distancehereditary graphs [34], bounded clique-width graphs [29], and non-positively curved plane graphs [24]. A lower bound of Ω(log 2 n) bits on the label length is known for trees [6,36], implying that all the results mentioned above are optimal as well. Other families of graphs have been considered such as interval graphs, permutation graphs, and their generalizations [12,35] for which an optimal bound of Θ(log n) bits was given, and planar graphs for which there is a lower bound of Ω(n 1 3 ) bits [36] and an upper bound of O( √ n) bits [38]. ...
Article
Distance labeling schemes are schemes that label the vertices of a graph with short labels in such a way that the distance between any two vertices u and v can be determined efficiently by merely inspecting the labels of u and v, without using any other information. Similarly, routing labeling schemes label the vertices of a graph in such a way that, given the labels of a source node and a destination node, it is possible to compute efficiently the port number of the edge from the source that heads in the direction of the destination. An important problem is finding natural classes of graphs admitting distance and/or routing labeling schemes with labels of polylogarithmic size. In this paper, we show that the class of cube-free median graphs on n nodes enjoys distance and routing labeling schemes with labels of $O(\log^3 n)$ bits.
... Perhaps the next most commonly studied graph labelling problem is distance labelling [GPPR04], where the goal is to compute $\mathrm{dist}(x, y)$ from the labels (see e.g. [ADKP16, AGHP16b, FGNW17, GU21]). Intermediate between distance and adjacency labelling is the decision version of distance labelling: for a given $r$, decide whether $\mathrm{dist}(x, y) \le r$ from the labels. ...
Preprint
We study the problems of adjacency sketching, small-distance sketching, and approximate distance threshold sketching for monotone classes of graphs. The problem is to obtain randomized sketches of the vertices of any graph G in the class, so that adjacency, exact distance thresholds, or approximate distance thresholds of two vertices u, v can be decided (with high probability) from the sketches of u and v, by a decoder that does not know the graph. The goal is to determine when sketches of constant size exist. We show that, for monotone classes of graphs, there is a strict hierarchy: approximate distance threshold sketches imply small-distance sketches, which imply adjacency sketches, whereas the reverse implications are each false. The existence of an adjacency sketch is equivalent to the condition of bounded arboricity, while the existence of small-distance sketches is equivalent to the condition of bounded expansion. Classes of constant expansion admit approximate distance threshold sketches, while a monotone graph class can have arbitrarily small non-constant expansion without admitting an approximate distance threshold sketch.
... Better schemes for distance labeling are known for restricted classes of graphs. As a prime example, trees admit a distance labeling scheme with labels of length $\frac{1}{4}\log^2 n + o(\log^2 n)$ bits [15], and this is known to be tight up to lower-order terms [7]. In fact, any sparse graph admits a sublinear distance labeling scheme [5] (see also [17] for a somewhat simpler construction). ...
Chapter
A distance labeling scheme is an assignment of labels, that is, binary strings, to all nodes of a graph, so that the distance between any two nodes can be computed from their labels without any additional information about the graph. The goal is to minimize the maximum length of a label as a function of the number of nodes. A major open problem in this area is to determine the complexity of distance labeling in unweighted planar (undirected) graphs. It is known that, in such a graph on n nodes, some labels must consist of \(\varOmega (n^{1/3})\) bits, but the best known labeling scheme constructs labels of length \(\mathcal {O}(\sqrt{n}\log n)\) [Gavoille, Peleg, Pérennes, and Raz, J. Algorithms, 2004]. For weighted planar graphs with edges of length polynomial in n, we know that labels of length \(\varOmega (\sqrt{n}\log n)\) are necessary [Abboud and Dahlgaard, FOCS 2016]. Surprisingly, we do not know if distance labeling for weighted planar graphs with edges of length polynomial in n is harder than distance labeling for unweighted planar graphs. We prove that this is indeed the case by designing a distance labeling scheme for unweighted planar graphs on n nodes with labels consisting of \(\mathcal {O}(\sqrt{n})\) bits with a simple and (in our opinion) elegant method. We augment the construction with a mechanism that allows us to compute the distance between two nodes in only polylogarithmic time while increasing the length by \(\mathcal {O}(\sqrt{n\log n})\). The previous scheme required \(\varOmega (\sqrt{n})\) time to answer a query in this model.
... We refer to [FGK20] for an overview of distance labeling schemes in different regimes (and a comparison with metric embedding). Exact distance labeling on an $n$-vertex tree requires $\Theta(\log n)$ words [AGHP16] (see also [GPPR04, Pel00]), which is already larger than the routing table size we are aiming for. Nonetheless, Freedman et al. [FGNW17] (improving upon [AGHP16, GKK+01]) showed that for any $n$-vertex unweighted tree and $\epsilon \in (0, 1)$, one can construct a $(1+\epsilon)$-labeling scheme with labels of size $O(\log \frac{1}{\epsilon})$ words. ...
Preprint
In low-distortion metric embeddings, the goal is to embed a host "hard" metric space into a "simpler" target space while approximately preserving pairwise distances. A highly desirable target space is that of a tree metric. Unfortunately, such an embedding will result in huge distortion. A celebrated bypass to this problem is stochastic embedding with logarithmic expected distortion. Another bypass is Ramsey-type embedding, where the distortion guarantee applies only to a subset of the points. However, both these solutions fail to provide an embedding into a single tree with a worst-case distortion guarantee on all pairs. In this paper we propose a novel third bypass called \emph{clan embedding}. Here each point $x$ is mapped to a subset of points $f(x)$ (called a \emph{clan}) with a special \emph{chief} point $\chi(x)\in f(x)$. The clan embedding has multiplicative distortion $t$ if for every pair $x,y$ some copy $y'\in f(y)$ in the clan of $y$ is close to the chief of $x$: $\min_{y'\in f(y)}d(y',\chi(x))\le t\cdot d(x,y)$. Our first result is a clan embedding into a tree with multiplicative distortion $O(\frac{\log n}{\epsilon})$ such that each point has $1+\epsilon$ copies (in expectation). In addition, for graphs we provide a "spanning" version of this theorem, and use it to devise the first compact routing scheme with constant-size routing tables. Next we turn to minor-free graphs, which were previously stochastically embedded into bounded-treewidth graphs with expected additive distortion $\epsilon D$ ($D$ being the diameter). We devise Ramsey-type embedding and clan embedding analogs of this stochastic embedding. We use these embeddings to construct the first (bicriteria quasi-polynomial) approximation schemes for the metric $\rho$-dominating set and metric $\rho$-independent set problems in minor-free graphs.
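As a concrete reading of the distortion condition quoted above, the following Python sketch (illustrative only; the function and argument names are ours) computes the smallest multiplicative distortion $t$ that a given clan embedding achieves on a finite metric.

```python
def clan_distortion(points, d_G, d_T, clan, chief):
    """Smallest t such that, for every pair x != y,
    min over y' in clan[y] of d_T(y', chief[x]) <= t * d_G(x, y).

    d_G and d_T are distance functions on the source and target spaces,
    clan[x] is the list of copies of x, and chief[x] is a member of clan[x].
    This only *checks* a given embedding; it does not construct one.
    """
    t = 1.0
    for x in points:
        for y in points:
            if x == y:
                continue
            closest = min(d_T(y_copy, chief[x]) for y_copy in clan[y])
            t = max(t, closest / d_G(x, y))
    return t
```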
Article
We consider how to assign labels to any undirected graph with n nodes such that, given the labels of two nodes and no other information regarding the graph, it is possible to determine the distance between the two nodes. The challenge in such a distance labeling scheme is primarily to minimize the maximum label length and secondarily to minimize the time needed to answer distance queries (decoding). Previous schemes have offered different trade-offs between label lengths and query time. This paper presents a simple algorithm with shorter labels and shorter query time than any previous solution, thereby improving the state of the art with respect to both label length and query time in one single algorithm. Our solution addresses several open problems concerning label length and decoding time and is the first improvement of label length for more than three decades. More specifically, we present a distance labeling scheme with labels of size $\frac{\log 3}{2}n + o(n)$ (logarithms are in base 2) and $O(1)$ decoding time. This outperforms all existing results with respect to both size and decoding time, including Winkler's (Combinatorica 1983) decades-old result, which uses labels of size $(\log 3)n$ and $O(n/\log n)$ decoding time, and Gavoille et al. (SODA'01), which uses labels of size $11n + o(n)$ and $O(\log\log n)$ decoding time. In addition, our algorithm is simpler than the previous ones. In the case of integral edge weights of size at most $W$, we present almost matching upper and lower bounds for label sizes. For $r$-additive approximation schemes, where distances can be off by an additive constant $r$, we give both upper and lower bounds. In particular, we present an upper bound for 1-additive approximation schemes which, in the unweighted case, has the same size (ignoring second-order terms) as an adjacency scheme: $n/2$. We also give results for bipartite graphs and for exact and 1-additive distance oracles.
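For contrast with the label sizes discussed above, here is the trivial baseline scheme (our own illustration, not the paper's method): every node stores its identifier together with its entire row of the distance matrix, which already gives constant-time decoding but with labels of $\Theta(n\log n)$ bits rather than roughly $\frac{\log 3}{2}$ bits per entry.

```python
def naive_distance_labels(dist):
    """Trivial baseline, not the scheme from the abstract above: node u's label
    is the pair (u, u's full row of the distance matrix). Decoding is O(1),
    but the label length is Theta(n log n) bits instead of ~(log 3)/2 * n."""
    return {u: (u, list(row)) for u, row in enumerate(dist)}

def naive_decode(label_u, label_v):
    # Only the labels are used: read v's index out of label_v, then look it
    # up in u's stored distance row.
    _, row_u = label_u
    v, _ = label_v
    return row_u[v]
```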
Article
We show that there exists a graph $G$ with $O(n)$ nodes, where any forest of $n$ nodes is a node-induced subgraph of $G$. Furthermore, for constant arboricity $k$, the result implies the existence of a graph with $O(n^k)$ nodes that contains all $n$-node graphs as node-induced subgraphs, matching an $\Omega(n^k)$ lower bound. The lower bound and previously best upper bounds were presented in Alstrup and Rauhe (FOCS'02). Our upper bounds are obtained through a $\log_2 n +O(1)$ labeling scheme for adjacency queries in forests. We hereby solve an open problem that has been raised repeatedly over decades, e.g. in Kannan, Naor, Rudich (STOC 1988), Chung (J. of Graph Theory 1990), Fraigniaud and Korman (SODA 2010).
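The scheme behind these bounds is nontrivial; purely as a point of reference, the sketch below shows the folklore $2\lceil\log n\rceil$-bit adjacency labelling for forests (each node stores its own id and its parent's id), which the $\log_2 n + O(1)$ result above improves by nearly a factor of two. The function names are ours.

```python
def forest_adjacency_labels(parent):
    """Folklore baseline, not the log2(n)+O(1) scheme from the abstract above:
    label(u) = (u, parent of u), with roots storing themselves as their parent.
    Two distinct nodes of a forest are adjacent iff one is the other's parent."""
    return {u: (u, p if p is not None else u) for u, p in parent.items()}

def adjacent(label_u, label_v):
    (u, pu), (v, pv) = label_u, label_v
    return u != v and (pu == v or pv == u)
```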
Article
Metric data structures (distance oracles, distance labeling schemes, routing schemes) and low-distortion embeddings provide a powerful algorithmic methodology, which has been successfully applied for approximation algorithms [21], online algorithms [7], distributed algorithms [19] and for computing sparsifiers [28]. However, this methodology appears to have a limitation: the worst-case performance inherently depends on the cardinality of the metric, and one could not specify in advance which vertices/points should enjoy a better service (i.e., stretch/distortion, label size/dimension) than that given by the worst-case guarantee. In this paper we alleviate this limitation by devising a suite of prioritized metric data structures and embeddings. We show that given a priority ranking $(x_1,x_2,\ldots,x_n)$ of the graph vertices (respectively, metric points), one can devise a metric data structure (respectively, embedding) in which the stretch (resp., distortion) incurred by any pair containing a vertex $x_j$ will depend on the rank $j$ of the vertex. We also show that other important parameters, such as the label size and (in some sense) the dimension, may depend only on $j$. In some of our metric data structures (resp., embeddings) we achieve both prioritized stretch (resp., distortion) and label size (resp., dimension) simultaneously. The worst-case performance of our metric data structures and embeddings is typically asymptotically no worse than that of their non-prioritized counterparts.
Conference Paper
Hyperbolic metric spaces have been defined by M. Gromov in 1987 via a simple 4-point condition: for any four points $u, v, w, x$, the two larger of the distance sums $d(u,v)+d(w,x)$, $d(u,w)+d(v,x)$, $d(u,x)+d(v,w)$ differ by at most $2\delta$. They play an important role in geometric group theory and the geometry of negatively curved spaces, and have recently become of interest in several domains of computer science. Given a finite set $S$ of points of a $\delta$-hyperbolic space, we present simple and fast methods for approximating the diameter of $S$ with an additive error $2\delta$ and computing an approximate radius and center of a smallest enclosing ball for $S$ with an additive error $3\delta$. These algorithms run in linear time for classical hyperbolic spaces and for $\delta$-hyperbolic graphs and networks. Furthermore, we show that for $\delta$-hyperbolic graphs $G = (V, E)$ with uniformly bounded degrees of vertices, the exact center of $S$ can be computed in linear time $O(|E|)$. We also provide a simple construction of distance approximating trees of $\delta$-hyperbolic graphs $G = (V, E)$ on $n$ vertices with an additive error $O(\delta \log_2 n)$. This construction has an additive error comparable with that given by M. Gromov for $n$-point $\delta$-hyperbolic spaces, but can be implemented in linear time $O(|E|)$ (instead of $O(n^2)$). Finally, we establish that several geometrically defined classes of graphs have bounded hyperbolicity.
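Gromov's 4-point condition is simple enough to check by brute force. The Python sketch below (our own, purely illustrative) computes the hyperbolicity $\delta$ of a finite metric as the largest gap, over all quadruples, between the two biggest distance sums.

```python
from itertools import combinations

def four_point_delta(d, u, v, w, x):
    """Half the gap between the two largest of the three distance sums from
    Gromov's 4-point condition for the quadruple (u, v, w, x)."""
    sums = sorted([d(u, v) + d(w, x), d(u, w) + d(v, x), d(u, x) + d(v, w)])
    return (sums[2] - sums[1]) / 2

def hyperbolicity(points, d):
    """Smallest delta for which the finite metric (points, d) satisfies the
    4-point condition; brute force over all O(n^4) quadruples, so this is a
    small sanity check rather than an efficient algorithm."""
    return max((four_point_delta(d, *q) for q in combinations(points, 4)),
               default=0.0)
```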
Article
We describe a way of assigning labels to the vertices of any undirected graph on up to $n$ vertices, each composed of $n/2+O(1)$ bits, such that given the labels of two vertices, and no other information regarding the graph, it is possible to decide whether or not the vertices are adjacent in the graph. This is optimal, up to an additive constant, and constitutes the first improvement in almost 50 years of an $n/2+O(\log n)$ bound of Moon. As a consequence, we obtain an induced-universal graph for $n$-vertex graphs containing only $O(2^{n/2})$ vertices, which is optimal up to a multiplicative constant, solving an open problem of Vizing from 1968. We obtain similar tight results for directed graphs, tournaments and bipartite graphs.
Article
We consider NCA labeling schemes: given a rooted tree $T$, label the nodes of $T$ with binary strings such that, given the labels of any two nodes, one can determine, by looking only at the labels, the label of their nearest common ancestor. For trees with $n$ nodes we present upper and lower bounds establishing that labels of size $(2\pm \epsilon)\log n$, $\epsilon<1$ are both sufficient and necessary. (All logarithms in this paper are in base 2.) Alstrup, Bille, and Rauhe (SIDMA'05) showed that ancestor and NCA labeling schemes have labels of size $\log n +\Omega(\log \log n)$. Our lower bound increases this to $\log n + \Omega(\log n)$ for NCA labeling schemes. Since Fraigniaud and Korman (STOC'10) established that labels in ancestor labeling schemes have size $\log n +\Theta(\log \log n)$, our new lower bound separates ancestor and NCA labeling schemes. Our upper bound improves the $10 \log n$ upper bound by Alstrup, Gavoille, Kaplan and Rauhe (TOCS'04), and our theoretical result even outperforms some recent experimental studies by Fischer (ESA'09) where variants of the same NCA labeling scheme are shown to all have labels of size approximately $8 \log n$.
Conference Paper
We study how to label the vertices of a tree in such a way that we can decide the distance between two vertices in the tree given only their labels. For trees, Gavoille et al. [7] proved that for any such distance labelling scheme, the maximum label length is at least $\frac{1}{8}\log^2 n - O(\log n)$ bits. They also gave a separator-based labelling scheme that has the optimal label length $\Theta(\log n \cdot \log H_n(T))$, where $H_n(T)$ is the height of the tree. In this paper, we present two new distance labelling schemes that not only achieve the optimal label length $\Theta(\log n \cdot \log H_n(T))$, but also have a much smaller expected label length under certain tree distributions. With these new schemes, we can also efficiently find the least common ancestor of any two vertices based on their labels only.
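For readers unfamiliar with separator-based schemes, the sketch below illustrates the general idea only (it is our own minimal Python illustration, not the scheme of Gavoille et al. [7] nor the new schemes of this abstract): decompose the tree by centroids, let each node record its distance to every centroid above it in the decomposition, and decode by minimising over shared centroids. Each node stores $O(\log n)$ (centroid id, distance) pairs, giving $O(\log^2 n)$-bit labels for unweighted trees.

```python
def build_distance_labels(adj):
    """Exact distance labels for an unweighted tree via centroid decomposition.
    adj[u] lists the neighbours of u (nodes are 0..n-1). Each label maps a
    centroid id to the node's distance to that centroid; every node lies in
    O(log n) centroid components, so each label holds O(log n) entries."""
    n = len(adj)
    removed = [False] * n
    labels = [dict() for _ in range(n)]

    def bfs(root):
        # Explore root's current component; return BFS order, parents, distances.
        dist, parent, order = {root: 0}, {root: None}, [root]
        for u in order:
            for v in adj[u]:
                if not removed[v] and v not in dist:
                    dist[v], parent[v] = dist[u] + 1, u
                    order.append(v)
        return order, parent, dist

    def find_centroid(order, parent):
        size = {u: 1 for u in order}
        heaviest = {u: 0 for u in order}
        for u in reversed(order[1:]):        # children are processed before parents
            size[parent[u]] += size[u]
            heaviest[parent[u]] = max(heaviest[parent[u]], size[u])
        total = len(order)
        # The centroid minimises the largest piece left after its removal.
        return min(order, key=lambda u: max(heaviest[u], total - size[u]))

    def decompose(root):
        order, parent, _ = bfs(root)
        c = find_centroid(order, parent)
        c_order, _, c_dist = bfs(c)          # distances from the centroid
        for u in c_order:
            labels[u][c] = c_dist[u]
        removed[c] = True
        for v in adj[c]:
            if not removed[v]:
                decompose(v)

    if n > 0:
        decompose(0)
    return labels

def decode_distance(label_u, label_v):
    """d(u, v) = min over centroids c known to both of d(u, c) + d(c, v)."""
    return min(label_u[c] + label_v[c] for c in label_u.keys() & label_v.keys())
```

For instance, on the path 0-1-2 (`adj = [[1], [0, 2], [1]]`), node 1 is the first centroid, every label stores its distance to node 1, and `decode_distance(labels[0], labels[2])` returns 2.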
Article
We consider the problem of encoding graphs with $n$ vertices and $m$ edges compactly, supporting adjacency, neighborhood and degree queries in constant time in the $\Theta(\log n)$-bit word RAM model. The adjacency query asks whether there is an edge between two vertices, the neighborhood query reports the neighbors of a given vertex in constant time per neighbor, and the degree query reports the number of edges incident to a given vertex. We study the problem in the context of succinctness, where the goal is to achieve the optimal space requirement as a function of $n$ and $m$, to within lower-order terms. We prove a lower bound in the cell probe model indicating that it is impossible to achieve the information-theoretic lower bound up to lower-order terms unless the graph is either too sparse (namely, $m = o(n^\delta)$ for any constant $\delta > 0$) or too dense (namely, $m = \omega(n^{2-\delta})$ for any constant $\delta > 0$). Furthermore, we present a succinct encoding of graphs supporting the aforementioned queries in constant time. The space requirement of the encoding is within a multiplicative $1+\epsilon$ factor of the information-theoretic lower bound for any arbitrarily small constant $\epsilon > 0$. This is the best achievable space bound according to our lower bound where it applies. The space requirement of the representation achieves the information-theoretic lower bound tightly, within lower-order terms, when the graph is very sparse ($m = o(n^\delta)$ for any constant $\delta > 0$) or very dense ($m > n^2/\lg^{1-\delta} n$ for an arbitrarily small constant $\delta > 0$).