
arXiv:1507.04046v1 [cs.DS] 14 Jul 2015

Distance labeling schemes for trees

Stephen Alstrup∗, Inge Li Gørtz†, Esben Bistrup Halvorsen‡, Ely Porat§

Abstract

We consider distance labeling schemes for trees: given a tree with n nodes, label the nodes with binary strings such that, given the labels of any two nodes, one can determine, by looking only at the labels, the distance in the tree between the two nodes.

A lower bound by Gavoille et al. (J. Alg. 2004) and an upper bound by Peleg (J. Graph Theory 2000) establish that labels must use Θ(log² n) bits¹. Gavoille et al. (ESA 2001) show that for very small approximate stretch, labels use Θ(log n log log n) bits. Several other papers investigate variants such as, for example, small distances in trees (Alstrup et al., SODA'03).

We improve the known upper and lower bounds of exact distance labeling by showing that (1/4)log² n bits are needed and that (1/2)log² n bits are sufficient. We also give (1+ε)-stretch labeling schemes using Θ(log n) bits for constant ε > 0. (1+ε)-stretch labeling schemes with polylogarithmic label size have previously been established for doubling dimension graphs by Talwar (STOC 2004).

In addition, we present matching upper and lower bounds for distance labeling for caterpillars, showing that labels must have size 2 log n − Θ(log log n). For simple paths with k nodes and edge weights in [1, n], we show that labels must have size ((k−1)/k) log n + Θ(log k).

∗Department of Computer Science, University of Copenhagen. E-mail: s.alstrup@di.ku.dk.

†DTU Compute, Technical University of Denmark. E-mail: inge@dtu.dk.

‡Department of Computer Science, University of Copenhagen. E-mail: esben@bistruphalvorsen.dk.

§Department of Computer Science, Bar-Ilan University. E-mail: porately@cs.biu.ac.il.

¹Throughout this paper we use log for log₂.

1 Introduction

A distance labeling scheme for a given family of graphs assigns labels to the nodes of each graph in the family such that, given the labels of two nodes in the graph and no other information, it is possible to determine the shortest distance between the two nodes. The labels are assumed to be composed of bits, and the goal is to make the worst-case label size as small as possible. Labeling schemes are also called implicit representations of graphs [60, 67]. The problem of finding implicit representations with small labels for specific families of graphs was introduced in the 1960s [14, 15], and efficient labeling schemes were introduced in [42, 53]. Distance labeling for general graphs has been considered since the 1970/80s [38, 68], and later for various restricted classes of graphs and/or approximate distances, often tightly related to distance oracle and routing problems; see e.g. [6]. This paper focuses on distance labels for the well-studied case of trees.

Exact distances. In [57] Peleg presented an O(log² n)-bit distance labeling scheme for general unweighted trees. In [37] Gavoille et al. proved that distance labels for unweighted binary trees require (1/8)log² n − O(log n) bits and presented a scheme with 1/(log 3 − 1)·log² n ≈ 1.7 log² n bits. This paper presents a scheme of size (1/2)log² n + O(log n) and further reduces the gap by showing that (1/4)log² n − O(log n) bits are needed. Our upper bound is a somewhat straightforward application of a labeling scheme for nearest common ancestors [7, 8].

Approximate distances. Let dist_T(x, y) denote the shortest distance between nodes x, y in a tree T. An r-additive approximation scheme returns a value dist′_T(x, y) with dist_T(x, y) ≤ dist′_T(x, y) ≤ dist_T(x, y) + r. An s-stretched approximation scheme returns a value dist′_T(x, y) with dist_T(x, y) ≤ dist′_T(x, y) ≤ s·dist_T(x, y). For trees of height h, Gavoille et al. [30, Theorem 4] gave a 1-additive O(log n log h)-bit labeling scheme. However, using an extra bit in the label for the node depth modulo 2, it is easy to see that any 1-additive scheme can be made exact. Gavoille et al. [30] also gave upper and lower bounds of Θ(log log n log n) bits for (1 + 1/log n)-stretched distances. This paper presents a scheme of size Θ(log n) for (1+ε)-stretch for constant ε > 0. Labeling schemes for (1+ε)-stretch with polylogarithmic label size have previously been given for graphs of doubling dimension [61] and planar graphs [63].

Distances in caterpillars and paths. Labeling schemes for caterpillars have been studied for various queries, e.g., adjacency [13]. Here we present upper and lower bounds showing that distance labeling for caterpillars requires 2 log n − Θ(log log n) bits. The upper bound is constructed by reduction to the case of weighted paths with k > 1 nodes and positive integer edge weights in [1, n], for which we give upper and lower bounds showing that labels must have size ((k−1)/k) log n + Θ(log k).

Problem                                        Lower bound                   Upper bound
Exact, general trees                           (1/4)log² n                   (1/2)log² n
(1+ε)-stretch, general trees                   Θ(log n)                      Θ(log n)
Caterpillars                                   2 log n − Θ(log log n)        2 log n − Θ(log log n)
Weighted paths, k nodes, weights in [1, n]     ((k−1)/k) log n + Θ(log k)    ((k−1)/k) log n + Θ(log k)

Table 1: Results presented in this paper. ε > 0 is a constant.

1.1 Related work

Distances in trees with small height. It is known that, for unweighted trees with bounded height h, labels must have size Θ(log n log h). The upper bound follows from [30, Theorem 2] and the lower bound from [37, Section 3]². In [43] distance labeling for various restricted classes of trees, including trees with bounded height, is considered, and in [62] another distance labeling scheme for unweighted trees using O(log n log h) bits is given.

Small distances in trees. Distances in a tree between nodes at distance at most k can be computed with labels of size log n + O(k·√(log n)) [44]. In [4] it is shown that size log n + Θ(log log n) is needed for labeling schemes supporting both parent and sibling queries. More generally, [4] shows that, using labels of size log n + O(log log n), the distance between two nodes can be determined if it is at most k for some constant k, which is optimal for k > 1. In [31, 32] further improvements are given for small distances in trees. For k = 1, corresponding to adjacency testing, there is a sequence of papers that improve the second-order term, recently ending with [5], which establishes that log n + Θ(1) bits are sufficient.

Various other cases for trees. Distance labeling schemes for various other cases have been considered, e.g., for weighted trees [30, 37, 57], dynamic trees [50], and a labeling scheme variation with extra free lookup [48, 49].

Exact and approximate distances in graphs. Distance labeling schemes exist for general graphs [6, 37, 38, 60, 66, 68] and for various restricted graph classes, e.g., bounded tree-width, planar and bounded degree [37], distance-hereditary [34], bounded clique-width [20], some non-positively curved plane [17], interval [35] and permutation graphs [12]. Approximate distance labeling schemes, both additive and stretched, are also well studied; see, e.g., [16, 24, 30, 33, 37, 39, 40, 51, 57, 65]. An overview of distance labeling schemes can be found in [6].

1.2 Second order terms are important

Chung's solution in [18] gives labels of size log n + O(log log n) for adjacency labeling in trees, which was improved to log n + O(log* n) in FOCS'02 [11] and in [13, 18, 27, 28, 45] to log n + Θ(1) for various special cases. Finally, it was improved to log n + Θ(1) for general trees in FOCS'15 [5].

A recent STOC'15 paper [9] improves the label size for adjacency in general graphs from n/2 + O(log n) [42, 52] to n/2 + O(1), almost matching the (n−1)/2 lower bound [42, 52].

Likewise, the second-order term for ancestor relationship has been improved in a sequence of STOC/SODA papers [2, 3, 10, 28, 29] (and [1, 45]) to Θ(log log n), giving labels of size log n + Θ(log log n).

Somewhat related, succinct data structures (see, e.g., [22, 25, 26, 54, 55]) focus on the space used in addition to the information-theoretic lower bound, which is often a lower-order term with respect to the overall space used.

1.3 Labeling schemes in various settings and applications

By using labeling schemes, it is possible to avoid costly access to large global tables, computing instead locally and in a distributed fashion. Such properties are used, e.g., in XML search engines [2], network routing and distributed algorithms [21, 23, 64, 65], dynamic and parallel settings [19, 50], graph representations [42], and other applications [46, 47, 56, 57, 58]. Various computability requirements are sometimes imposed on labeling schemes [2, 42, 46]. This paper assumes the RAM model.

2 Preliminaries

Trees. Given nodes u, v in a rooted tree T, u is an ancestor of v, and v a descendant of u, if u is on the unique path from v to the root. For a node u of T, denote by Tu the subtree of T consisting of all the descendants of u (including u itself). The depth of u is the number of edges on the unique simple path from u to the root of T. The nearest common ancestor (NCA) of two nodes is the unique common ancestor with largest depth. Let T[u, v] denote the nodes on the simple path from u to v in T. The variants T(u, v] and T[u, v) denote the same path without the first and last node, respectively. The distance between u and v is the number dist(u, v) = |T(u, v]|. We set distroot(v) = dist(v, r), where r is the root of T. A caterpillar is a tree whose non-leaf nodes form a path, called the spine.

²We thank Gavoille for pointing this out.

Heavy-light decomposition. (From [59].) Let T be a rooted tree. The nodes of T are classified as either heavy or light as follows. The root r of T is light. For each non-leaf node v, pick one child w where |Tw| is maximal among the children of v and classify it as heavy; classify the other children of v as light. The apex of a node v is the nearest light ancestor of v. By removing the edges between light nodes and their parents, T is divided into a collection of heavy paths. Any given node v has at most log n light ancestors (see [59]), so the path from the root to v goes through at most log n heavy paths.
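The decomposition is computable in linear time from subtree sizes. A minimal Python sketch (function and variable names are our own; the tree is assumed to be given as a dict mapping each node to its list of children):

```python
def light_nodes(children, root):
    """Classify nodes as light per the heavy-light decomposition above:
    the root is light, and every child other than the (chosen) child
    with largest subtree is light."""
    # Iterative pre-order; reversing it gives a valid post-order for sizes.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    size = {}
    for v in reversed(order):
        size[v] = 1 + sum(size[c] for c in children.get(v, []))
    light = {root}
    for v in order:
        kids = children.get(v, [])
        if kids:
            heavy = max(kids, key=lambda c: size[c])  # one heavy child per node
            light.update(c for c in kids if c != heavy)
    return light
```

Since a light child heads a subtree of at most half its parent's size, any root-to-node path meets at most log n light nodes, which is exactly the property the schemes below exploit.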

Bit strings. A bit string s is a member of the set {0,1}*. We denote the length of a bit string s by |s|, the i'th bit of s by s_i, and the concatenation of two bit strings s, s′ by s ◦ s′. We say that s_1 is the most significant bit of s and s_|s| the least significant bit.

Labeling schemes. A distance labeling scheme for trees of size n consists of an encoder e and a decoder d. Given a tree T, the encoder computes a mapping e_T : V(T) → {0,1}* assigning a label to each node u ∈ V(T). The decoder is a mapping d : {0,1}* × {0,1}* → Z₊, where Z₊ denotes the positive integers, such that, given any tree T and any pair of nodes u, v ∈ V(T), d(e_T(u), e_T(v)) = dist(u, v). Note that the decoder does not know T. The size of a labeling scheme is defined as the maximum label size |e_T(u)| over all trees T and all nodes u ∈ V(T). If, for all trees T, the mapping e_T is injective, we say that the labeling scheme assigns unique labels.

3 Distances on weighted paths

In this section we study the case of paths with k nodes and integral edge weights in [1, n]. The solution to this problem will later be used to establish the upper bound for caterpillars.

3.1 Upper Bound

Theorem 3.1. There exists a distance labeling scheme for paths with k nodes and positive integral edge weights in [1, n] with labels of size ((k−1)/k) log n + O(log k).

Proof. We begin by considering the family of paths with k nodes, integral edge weights and diameter < n. We shall prove that there exists a distance labeling scheme for this family with labels of size ((k−1)/k) log n + log k + O(log log k).

So consider such a path, and root it in one of its end nodes, denoted v0. Denote the nodes on the path v0, ..., v_{k−1} in order. Let d_i = distroot(v_i) and note that, by assumption, d_i < n for all i. We will let the label for v_i store the number d_i + x for some x < n that allows us to represent d_i + x compactly. Since we use the same x for all nodes, we can easily compute the distance between any pair of nodes v_i, v_j as |(d_i + x) − (d_j + x)|.

Since we choose x < n, the largest number stored in a label will be d_{k−1} + x < 2n, which can be represented with exactly L = ⌈log(2n)⌉ bits. Divide those L bits into k + 1 segments, whereof k have ℓ = ⌊L/k⌋ bits and the last segment contains the remaining bits. The first segment, segment 0, will contain the ℓ least significant bits, segment 1 the following ℓ bits, and so on. We will choose x such that the representation of d_i + x has 0s in all the bits in the i'th segment. If we manage to do so, we will be able to encode each d_i + x with L − ℓ + ⌈log k⌉ bits. Indeed, we can use exactly ⌈log k⌉ bits to represent i, and the next L − ℓ bits to represent d_i + x where we skip the i'th segment. Prefixing with a string of the form 0^⌈log⌈log k⌉⌉ 1, we get a string from which we can determine the number of bits needed to write ⌈log k⌉, and therefrom the numbers i and d_i + x. We use this string as the label for v_i. The label length is L − ℓ + ⌈log k⌉ + ⌈log⌈log k⌉⌉ + 1 = ((k−1)/k) log n + log k + O(log log k).

It remains to show that there exists a number x < n as described. In the following we shall, as in the above, represent numbers < 2n with L bits that are divided into k + 1 segments, whereof the first k have size ℓ. For i < k and y < 2n, let a(i, y) be a function which returns a number z with the following properties:

(i) In z, all bits outside segment i are 0.

(ii) z + y has only 0s in segment i.

This function is constructed as follows. If y only has 0s in segment i, let a(i, y) = 0. Otherwise, take the representation of y, zero out all bits outside segment i, complement the bits in segment i, and add v to the resulting number, where v has a 1 in the least significant bit of segment i and 0s in all other positions.

Note that from (i) it follows that adding z to any number will not change bits in less significant positions than segment i. We can now scan through the nodes v0, ..., v_{k−1}, increasing x by adding bits to x in more and more significant positions (in non-overlapping segments), as follows:

• Set x = 0.

• For i = 1, ..., k − 1, set x = x + a(i, x + d_i).

After iteration i we have that x + d_i has only 0s in segment i, and in the following iterations, 1s are only added to x in more significant bit positions, meaning that d_i + x continues to have only 0s in segment i. Since the segments are non-overlapping, we end up with x < n.

For the more general family of paths with k nodes and edge weights in [1, n], we simply note that the diameter of any path in this family is at most kn. Using the above result thus immediately gives us a labeling scheme with labels of size ((k−1)/k) log n + O(log k).
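The construction in the proof is compact enough to sketch directly. In the following Python sketch (function names are our own; for readability, a label is kept as the pair (i, value) rather than packed into a single bit string), the offset x is computed via the function a(i, y), the all-zero segment i is dropped from d_i + x, and the decoder re-inserts it:

```python
def path_labels(weights, n):
    """Labels for a path v0, ..., v_{k-1} with edge weights in [1, n],
    following the proof of Theorem 3.1 (sketch; assumes diameter < n
    is not required for decoding correctness)."""
    k = len(weights) + 1
    d = [0]
    for w in weights:
        d.append(d[-1] + w)                 # d[i] = distroot(v_i)
    L = (2 * n - 1).bit_length()            # ceil(log(2n)) bits
    ell = L // k                            # segment size

    def a(i, y):
        # A number z with bits only in segment i such that y + z has 0s there.
        seg = (y >> (i * ell)) & ((1 << ell) - 1)
        return 0 if seg == 0 else ((1 << ell) - seg) << (i * ell)

    x = 0
    for i in range(1, k):
        x += a(i, x + d[i])                 # zero out segment i of d[i] + x
    labels = []
    for i in range(k):
        v = d[i] + x
        low = v & ((1 << (i * ell)) - 1)    # bits below segment i
        high = v >> ((i + 1) * ell)         # bits above segment i
        labels.append((i, (high << (i * ell)) | low))  # segment i dropped
    return labels, ell

def path_dist(label_u, label_v, ell):
    """Decode the distance from two labels alone."""
    def restore(i, v):                      # re-insert the zeroed segment i
        low = v & ((1 << (i * ell)) - 1)
        return ((v >> (i * ell)) << ((i + 1) * ell)) | low
    (i, vu), (j, vv) = label_u, label_v
    return abs(restore(i, vu) - restore(j, vv))
```

For example, for edge weights (3, 7, 2) and n = 16, the decoder recovers dist(v0, v2) = 10 from the two labels alone.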

3.2 Lower bound

Theorem 3.2. Labeling schemes for distances on weighted paths with k nodes and edge weights in [1, n] require ((k−1)/k) log n + Ω(log k) bits.

Proof. Let F denote the family of paths with k nodes and integral edge weights in [1, n]. We can construct all the members of F by selecting k − 1 different edge weights in the range [1, n], skipping the paths which have already been constructed by the reverse sequence of edge weights. With this construction we skip at most half of the paths, and hence |F| ≥ (1/2)n^(k−1). Let the worst-case label size of an optimal distance labeling scheme for such paths have length L. The number of different labels with length at most L is N = 2^(L+1) − 1. We can uniquely represent each of the paths in F with the collection of their labels, and hence |F| ≤ C(N, k), where C(N, k) denotes the binomial coefficient. Thus, we have found that (1/2)n^(k−1) ≤ C(N, k). Since C(N, k) ≤ (Ne/k)^k, it follows that ((k−1)/k) log n ≤ log N − log k + O(1), and hence that L ≥ ((k−1)/k) log n + log k − O(1), as desired.
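The counting inequality can be spot-checked with exact arithmetic. The helper below (our own, purely illustrative) finds the smallest label length L that the count in the proof does not rule out:

```python
from math import comb

def min_label_bits(n, k):
    """Smallest L with C(2^(L+1) - 1, k) >= n^(k-1) / 2, i.e. the least
    worst-case label length consistent with the count in Theorem 3.2."""
    L = 0
    while 2 * comb(2 ** (L + 1) - 1, k) < n ** (k - 1):
        L += 1
    return L
```

For growing n and fixed k, the returned value grows like ((k−1)/k) log n + log k, as the theorem predicts.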

Combining Theorem 3.2 with Theorem 3.1, we see that distance labels for paths of k nodes with integral weights in [1, n] must have length ((k−1)/k) log n + Θ(log k).


4 Distances in caterpillars

4.1 Upper bound

Theorem 4.1. There exists a distance labeling scheme for caterpillars with worst-case label size 2 log n − log log n + O(log log log n).

Proof. We will start by giving a simple 2 log n bit scheme and then improve it. The simple solution assigns two numbers to each node. The nodes on the spine save distroot and the number 0. The nodes not on the spine save their parent's distroot and a number that is unique among their siblings. The second number is required to distinguish siblings, and hence determine whether the distance between two nodes is 0 or 2. The worst-case label size for this solution is 2 log n + O(1).

To improve the solution, we split the nodes on the spine into two groups: (1) nodes with more than n/k leaves and (2) nodes with at most n/k leaves, for some parameter k to be chosen later. We add the root to the first group no matter what. Note that the first group can contain at most k nodes.

As before, all nodes save two numbers: distroot, and either the number 0 for spine nodes or a number to distinguish siblings. The idea is to reduce the label size by log k bits by using fewer bits for the first number for nodes in the first group and for the second number for nodes in the second group.

The nodes in the first group form a path with at most k nodes and edge weights in [1, n] (where each weight corresponds to the distance between the nodes in the original graph). The algorithm from Theorem 3.1 will add a number x, which is less than the diameter, which again is less than n, to the numbers representing the root distances of the nodes. Using this technique, we can, as seen in the proof of Theorem 3.1, encode the (modified) distroots of the nodes in the first group with only ((k−1)/k) log n + log k + O(log log k) bits. This gives labels of size ((2k−1)/k) log n + log k + O(log log k) for non-spine nodes whose parents are in the first group.

We will also add x to the distroots of nodes in the second group, but since x < n this will not change the label size by more than a single bit. For non-spine nodes whose parents are in the second group, we need at most log n − log k + O(1) bits for the second number, giving a total label size of 2 log n − log k + O(1).

Finally, since the two numbers that form a label now have different lengths, we need an additional O(log log k) bits to determine where one number ends and the next begins. Indeed, it will be possible to split labels into their components if we know the number of bits used to write ⌈log k⌉, and we represent this number with O(log log k) bits.

Setting k = log n/(2 log log n), we now see that our worst-case label size is the maximum of

2 log n − log k + O(log log k) = 2 log n − log log n + O(log log log n)

and

((2k−1)/k) log n + log k + O(log log k) = 2 log n − 2 log log n + log log n + O(log log log n)
= 2 log n − log log n + O(log log log n).

This proves the theorem.
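The simple scheme from the start of the proof can be made concrete in a few lines. A Python sketch (our own representation: an unweighted caterpillar given by the number of leaves at each spine position, with labels kept as pairs of numbers rather than packed bits):

```python
def caterpillar_labels(leaves_per_spine_node):
    """Simple 2 log n scheme from the proof of Theorem 4.1: spine node i
    gets (i, 0); the j-th leaf below spine node i gets (i, j + 1).  The
    first number is distroot along the spine; the second distinguishes
    siblings, with 0 marking spine nodes."""
    labels = {}
    for i, num_leaves in enumerate(leaves_per_spine_node):
        labels[('spine', i)] = (i, 0)
        for j in range(num_leaves):
            labels[('leaf', i, j)] = (i, j + 1)
    return labels

def caterpillar_dist(lu, lv):
    """Decode the distance from two labels alone."""
    if lu == lv:
        return 0
    (du, su), (dv, sv) = lu, lv
    # one extra edge for each endpoint that is a leaf (sibling number > 0)
    return abs(du - dv) + (su > 0) + (sv > 0)
```

Two distinct leaves of the same spine node share the first number, so the sibling number is what lets the decoder tell distance 0 from distance 2.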

4.2 Lower bound

We present a technique that counts tuples of labels that are known to be distinct and compares the

result to the number of tuples one can obtain with labels of size L. The technique may have applications

to distance labeling for other families of graphs.

Theorem 4.2. For any n ≥ 4, any distance labeling scheme for the family of caterpillars with at most n nodes has a worst-case label size of at least 2⌊log n⌋ − ⌊log⌊log n⌋⌋ − 4.


Proof. Set k = ⌊log n⌋ and m = 2^k. Let (i_1, ..., i_k) be a sequence of k numbers from the set {1, ..., m/2} with the only requirement being that i_1 = 1. Now consider, for each such sequence, the caterpillar whose main path has length m/2 and where, for t = 1, ..., k, the node in position i_t has ⌊m/(2k)⌋ leaf children (not on the main path). We shall refer to these children as the t'th group. Note that two disjoint groups of children may be children of the same node if i_t = i_s for some s ≠ t. Each of these caterpillars has m/2 + k⌊m/(2k)⌋ ≤ m ≤ n nodes.

Suppose that σ is a distance labeling scheme for the family of caterpillars, and consider one of the caterpillars defined above. Given distinct nodes u, v not on the main path, their distance will be dist(u, v) = |i_s − i_t| + 2, where i_s and i_t are the positions on the main path of the parents of u and v, respectively. In particular, if s = 1, so that i_s = 1, then dist(u, v) = i_t + 1. Thus, if σ has been used to label the nodes of the caterpillar, the number i_t for a child in the t'th group can be uniquely determined from its label together with the label of any of the children from the first group. It follows that any k-tuple of labels (l_1, ..., l_k), where l_t is a label of a child in the t'th group, uniquely determines the sequence (i_1, ..., i_k). In particular, k-tuples of labels from distinct caterpillars must be distinct. Of course, k-tuples of labels from the same caterpillar must also be distinct, since labels are unique in a distance labeling scheme.

Now, there are (m/2)^(k−1) choices for the sequence (i_1, ..., i_k), and hence there are (m/2)^(k−1) different caterpillars of this form. For each of these, there are ⌊m/(2k)⌋^k different choices of k-tuples of labels. Altogether, we therefore have (m/2)^(k−1)·⌊m/(2k)⌋^k distinct k-tuples of labels. If the worst-case label size of σ is L, then we can create at most (2^(L+1) − 1)^k distinct k-tuples of labels, so we must have (m/2)^(k−1)·⌊m/(2k)⌋^k ≤ (2^(L+1) − 1)^k. From this it follows that

L ≥ ⌊((k−1)/k)(log m − 1) + log⌊m/(2k)⌋⌋
  ≥ ⌊(k−1)²/k + k − log k⌋ − 2
  ≥ 2k − ⌊log k⌋ − 4
  = 2⌊log n⌋ − ⌊log⌊log n⌋⌋ − 4.

5 Exact distances in trees

5.1 Upper bound

Let u, v be nodes in a tree T and let w be their nearest common ancestor. We then have

dist(u, v) = distroot(u) − distroot(v) + 2 dist(w, v).   (1)

If w = u, so that u is an ancestor of v, then the above equation is just a difference of distroots, which can be stored for each node with log n bits. The same observation clearly holds if w = v.

Assume now that w ∉ {u, v}, so that u and v are not ancestors of each other. Consider the heavy-light decomposition [59] described in the preliminaries. At least one of the nodes u and v must have an ancestor which is a light child of w. Assume that it is v. Now, v has at most log n light ancestors. Saving the distance to all of them together with distroot gives us sufficient information to compute the distance between u and v using equation (1). This is the idea behind Theorem 5.2 below.

By examining the NCA labeling scheme from [7, 8], we see that it can easily be extended as follows.

Lemma 5.1 ([7, 8]). There exists an NCA labeling scheme of size O(log n). For any two nodes u, v, the scheme returns the label of w = nca(u, v) as well as:

• which of u and v (if any) have a light ancestor that is a child of w; and

• the number of light nodes on the path from the root to w and from w to u and v, respectively.

Theorem 5.2. There exists a distance labeling scheme for trees with worst-case label size (1/2)log² n + O(log n).

Proof. We use O(log n) bits for the extended NCA labeling in Lemma 5.1 and for distroot. Using (1), it now only remains to efficiently represent, for each node, the distance to all its light ancestors. We consider the light ancestors of a node v encountered on the path from the root to v. The distance from v to the root is at most n − 1 and can therefore be encoded with exactly ⌈log n⌉ bits (by adding leading zeros if needed). By construction of the heavy-light decomposition, the next light node on the path to v will be the root of a subtree of size at most n/2, meaning that the distance from v to that ancestor is at most n/2 − 1 and can be encoded with exactly ⌈log n⌉ − 1 bits. Continuing this way, we encode the i'th light ancestor on the path from the root to v with exactly ⌈log n⌉ − i bits. When we run out of light ancestors, we concatenate all the encoded distances, resulting in a string of length at most

⌈log n⌉ + (⌈log n⌉ − 1) + ··· + 2 + 1 = (1/2)⌈log n⌉² + (1/2)⌈log n⌉.

We can use O(log n) extra bits to encode n and to separate all sublabels from each other. The decoder can now determine ⌈log n⌉ and split up the entries in the list of distances. When applying formula (1), it can then determine the distance between v and w by adding together the relevant distances in the list of light ancestors, using the fact from Lemma 5.1 that it knows the number of light ancestors from v to w.
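The packing of light-ancestor distances with shrinking bit widths can be sketched as follows (Python; the extra O(log n) bits for n and the NCA label are omitted, and the names are our own):

```python
import math

def pack_light_dists(dists, n):
    """Concatenate dists[i], the distance from a node to its i-th light
    ancestor counted from the root, using exactly ceil(log n) - i bits
    for entry i, as in the proof of Theorem 5.2."""
    b = math.ceil(math.log2(n))
    out = []
    for i, dist in enumerate(dists):
        width = b - i
        # the i-th light subtree has size <= n / 2^i, so dist fits
        assert 0 <= dist < (1 << width)
        out.append(format(dist, '0%db' % width))
    return ''.join(out)

def unpack_light_dists(packed, n):
    """Recover the distance list; the decoder knows n, hence all widths."""
    b = math.ceil(math.log2(n))
    dists, pos = [], 0
    while pos < len(packed):
        dists.append(int(packed[pos:pos + b], 2))
        pos, b = pos + b, b - 1
    return dists
```

The total length is at most ⌈log n⌉ + (⌈log n⌉ − 1) + ··· + 1, i.e. the (1/2)⌈log n⌉² + (1/2)⌈log n⌉ bound of the proof.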

5.2 Lower bound

In the case of general trees, Gavoille et al. [37] establish a lower bound of (1/8)log² n − O(log n) using an ingenious technique where they apply a distance labeling scheme to a special class of trees called (h, M)-trees³. The following uses a generalization of (h, M)-trees to improve their ideas and leads to a lower bound of (1/4)log² n − O(log n).

(h, W, a)-trees. We begin with some definitions. For integers h, W ≥ 0 and a number a ≥ 1 such that W/a^i is integral for all i = 0, ..., h, an (h, W, a)-tree is a rooted binary tree T with edge weights in [0, W] that is constructed recursively as follows. For h = 0, T is just a single node. For h = 1, T is a claw (i.e., a star with three edges) with edge weights x, x, W − x for some 0 ≤ x < W, rooted at the leaf node of the edge with weight W − x. For h > 1, T consists of a (1, W, a)-tree whose two leaves are the roots of two (h−1, W/a, a)-trees T0, T1. We shall denote an (h, W, a)-tree constructed in this way by T = ⟨T0, T1, x⟩. An example for h = 3 can be seen in Figure 1. Note that the case a = 1 simply corresponds to the (h, W)-trees defined in [37].

It is easy to see that an (h, W, a)-tree has 2^h leaves and 3·2^h − 2 nodes. Further, it is straightforward to see that, if u, v are leaves in an (h, W, a)-tree T = ⟨T0, T1, x⟩, then

dist_T(u, v) = 2W(a^(−1) − a^(−h))/(1 − a^(−1)) + 2x,   if u ∈ T0 and v ∈ T1, or vice versa,
dist_T(u, v) = dist_{T_i}(u, v),   if u, v ∈ T_i for some i = 0, 1.   (2)
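Formula (2) can be sanity-checked by building a small (h, W, a)-tree explicitly and computing distances by brute force. A Python sketch (the edge-list encoding and function names are our own):

```python
import itertools

def build(h, W, a, ws, ctr=None):
    """Explicitly build an (h, W, a)-tree: a claw with weights x, x, W - x
    on top of two (h-1, W/a, a)-trees.  ws is an iterator yielding the
    free weight choice for each claw, top-down and left-to-right.
    Returns (edges, root, leaves) with edges as (u, v, weight)."""
    ctr = ctr or itertools.count()
    if h == 0:
        v = next(ctr)
        return [], v, [v]
    x = next(ws)
    root, center = next(ctr), next(ctr)
    e0, r0, l0 = build(h - 1, W // a, a, ws, ctr)
    e1, r1, l1 = build(h - 1, W // a, a, ws, ctr)
    edges = [(root, center, W - x), (center, r0, x), (center, r1, x)] + e0 + e1
    return edges, root, l0 + l1

def tree_dist(edges, u, v):
    """Brute-force weighted distance between u and v (DFS; a tree has no cycles)."""
    adj = {}
    for p, q, w in edges:
        adj.setdefault(p, []).append((q, w))
        adj.setdefault(q, []).append((p, w))
    stack, seen = [(u, 0)], {u}
    while stack:
        node, dist = stack.pop()
        if node == v:
            return dist
        for q, w in adj[node]:
            if q not in seen:
                seen.add(q)
                stack.append((q, dist + w))
```

For h = 2, W = 8, a = 2 and weight choices x = 3, y1 = 1, y2 = 2, a leaf of T0 and a leaf of T1 are at distance 2·8·(2^(−1) − 2^(−2))/(1 − 2^(−1)) + 2·3 = 14, matching the brute-force value.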

Leaf distance labeling schemes. In the following we shall consider leaf distance labeling schemes for the family of (h, W, a)-trees: that is, distance labeling schemes where only the leaves in a tree need to be labeled, and where only leaf labels can be given as input to the decoder. Since an ordinary distance labeling scheme obviously can be restricted to the leaves, any lower bound on worst-case label sizes for a leaf distance labeling scheme is also a lower bound for an ordinary distance labeling scheme. We denote by g(h, W, a) the smallest number of labels needed by an optimal leaf distance labeling scheme to label all (h, W, a)-trees.

³Note that their exposition has some minor errors, as pointed out (and corrected) in [41].

Figure 1: An (h, W, a)-tree with h = 3. The top claw has edge weights x, x, W − x; the two claws below it have weights y_j, y_j, W/a − y_j for j = 1, 2; and the four bottom claws have weights z_j, z_j, W/a² − z_j for j = 1, ..., 4. We require that x < W, that y1, y2 < W/a, and that z1, ..., z4 < W/a².

Lemma 5.3. For all h ≥ 1 and W ≥ 2, g(h, W, a)² ≥ W·g(h−1, W²/a², a²).

Proof. Fix an optimal leaf distance labeling scheme σ which produces exactly g(h, W, a) distinct labels for the family of (h, W, a)-trees. For leaves u and v in an (h, W, a)-tree, denote by l(u) and l(v), respectively, the labels assigned by σ. For x = 0, ..., W − 1, let S(x) be the set consisting of the pairs of labels (l(u), l(v)) for all leaves u ∈ T0 and v ∈ T1 in all (h, W, a)-trees T = ⟨T0, T1, x⟩.

The sets S(x) and S(x′) are disjoint for x ≠ x′, since every pair of labels in S(x) uniquely determines x due to (2). Letting S = ∪_{x=0}^{W−1} S(x), we therefore have |S| = Σ_{x=0}^{W−1} |S(x)|. Since S contains pairs of labels produced by σ from leaves in (h, W, a)-trees, we clearly also have |S| ≤ g(h, W, a)², and hence it only remains to prove that |S| ≥ W·g(h−1, W²/a², a²), which we shall do by showing that |S(x)| ≥ g(h−1, W²/a², a²) for all x.

The goal for the rest of the proof is therefore to create a leaf distance labeling scheme for (h−1, W²/a², a²)-trees using only labels from the set S(x) for some fixed x. So let x be given and consider an (h−1, W²/a², a²)-tree T′. Let V = W/a. From T′ we shall construct an (h−1, V, a)-tree φ_i(T′) for i = 0, 1 such that every leaf node v in T′ corresponds to nodes φ_i(v) in φ_i(T′) for i = 0, 1. The trees φ_i(T′) are defined as follows. If h = 1, so that T′ consists of a single node, then φ_i(T′) = T′ for i = 0, 1. If h > 1, then T′ is of the form T′ = ⟨T′0, T′1, y⟩ for some 0 ≤ y < V². We can write y in the form y = y0 + y1·V for uniquely determined y0, y1 with 0 ≤ y0, y1 < V. For i = 0, 1, we recursively define φ_i(T′) = ⟨φ_i(T′0), φ_i(T′1), y_i⟩. Thus, φ_i(T′) is an (h−1, V, a)-tree that is similar to T′ but where we replace the top edge weight y by the edge weight y_i and, recursively, do the same for all (h−2, V²/a², a²)-subtrees. Note also that the corresponding edge weight V² − y in T′ automatically is replaced by the edge weight V − y_i in φ_i(T′) in order for φ_i(T′) to be an (h−1, V, a)-tree.

Consider now the (h, W, a)-tree T=hφ0(T′), φ1(T′), xi. Every leaf vin T′corresponds to the leaves

φ0(v), φ1(v) in Twhere φi(v)∈φi(T′) for i= 0,1. Using formula (2) for the distances in T′, it is

straightforward to see that

distT′(u, v) = distφ0(T′)(φ0(u), φ0(v)) mod (2V)+Vdistφ1(T′)(φ1(u), φ1(v)).

We can now apply the leaf distance labeling scheme σto Tand obtain a label for each leaf node in

T. In particular, the pair of leaves (φ0(v), φ1(v)) corresponding to a node vin T′will receive a pair of

labels. We use this pair to label vin T′, whereby we have obtained a labeling of the leaves in T′with

labels from S(x). Using the formula in (5.2) we can construct a decoder that can compute the distance

between two nodes in T′using these labels alone, and hence we have obtained a leaf distance labeling

scheme for (h−1, V 2, a2)-trees using only labels from S(x) as desired.


Lemma 5.4. For all h ≥ 1 and W ≥ 2, g(h, W, a) ≥ W^(h/2)/a^(h(h−1)/4).

Proof. The proof is by induction on h. For h = 1 we note that a (0, W, a)-tree has only one node, so that g(0, W²/a², a²) = 1. Lemma 5.3 therefore yields g(1, W, a)² ≥ W, from which it follows that g(1, W, a) ≥ √W. The lemma therefore holds for h = 1. Now let h > 1 and assume that the lemma holds for h − 1. Lemma 5.3 and the induction hypothesis now yield

g(h, W, a)² ≥ W·g(h−1, W²/a², a²) ≥ W·(W²/a²)^((h−1)/2)/a^(2(h−1)(h−2)/4) = W^h/a^(h(h−1)/2),

from which it follows that g(h, W, a) ≥ W^(h/2)/a^(h(h−1)/4), as desired.

The previous lemma implies that any (leaf and hence also ordinary) distance labeling scheme for (h, W, a)-trees must have labels with worst-case length at least (h/2)(log W − ((h−1)/2) log a) = (1/2)h log W − (1/4)h² log a + (1/4)h log a. Since the number of nodes in such a tree is n = 3·2^h − 2, it follows that h = log(n + 2) − log 3, and hence that log n − 2 ≤ h ≤ log n for sufficiently large n. From this we see that the worst-case label length is at least

(1/2) log n log W − (1/4) log n (log n − 1) log a − log W − (1/2) log a.

In the case where a = 1, we retrieve the bound of (1/2) log n log W − log W obtained in [36]. It seems that larger values of a only make the above result weaker, but the real strength of the above becomes apparent when we switch to the unweighted version of (h, W, a)-trees, in which we replace weighted edges by paths of the corresponding lengths. Note that a distance labeling scheme for the family of unweighted (h, W, a)-trees can be used as a distance labeling scheme for the weighted (h, W, a)-trees, and hence any lower bound in the weighted version automatically becomes a lower bound in the unweighted version. The number of nodes n in an unweighted (h, W, a)-tree is upper bounded by

n ≤ 2W + 2·2W/a + 2²·2W/a² + ··· + 2^(h−1)·2W/a^(h−1) + 1.

In the case a = 2, we get n ≤ 2Wh + 1.

Theorem 5.5. Any distance labeling scheme for unweighted (h, W, 2)-trees, and hence also for general trees, has a worst-case label size of at least (1/4)log² n − O(log n).

Proof. Choose the largest integer h with 2·2^h·h + 1 ≤ n, and note that we must have h ≥ log n − O(log log n). Set W = 2^h and consider the family of (h, W, 2)-trees, which is a subfamily of the family of trees with n nodes. From Lemma 5.4 it therefore follows that the worst-case label length is at least

(1/2)h log W − (1/4)h² + (1/4)h = (1/4)h² + (1/4)h = (1/4)log² n + (1/4)log n − O(log n log log n).

6 Approximate distances in trees

In this section we present a (1+ε)-stretch distance labeling scheme with labels of size O(log n).

Theorem 6.1. For constant ε > 0, (1+ε)-stretch labeling schemes use Θ(log n) bits.


Proof. As in the case of exact distances, we will create labels of size O(log n) bits that contain the extended NCA labels from Lemma 5.1 as well as distroot. We will also be using the formula in (1). However, we cannot afford to store the exact distance to each apex ancestor. Even storing a 2-approximate distance to each apex ancestor would require log n log log n bits. Furthermore, having approximate distances to the apex nodes does not directly guarantee an upper bound on the approximate distance, since equation (1) involves a subtraction. We will address these two problems in the following.

Let w = nca(u, v) and assume w ∉ {u, v}, since otherwise we can compute the exact distance using only distroot. Suppose we know a (1 + ε)-approximation α of dist(w, v) for some ε ≥ 0. That is,

    dist(w, v) ≤ α ≤ (1 + ε) dist(w, v).    (3)

Define d̃ = distroot(u) − distroot(v) + 2α. First we show that d̃ is a (1 + 2ε)-approximation of dist(u, v). Next we show how to represent all the (1 + ε)-approximate distances to light ancestors of a node using a total of O(log n) bits. Together with formula (1), these two facts prove that we can compute (1 + 2ε)-stretch distances between any pair of nodes with labels of size O(log n). To prove the theorem, we can then simply replace ε by ε/2.

To see that d̃ is a (1 + 2ε)-approximation of dist(u, v), first note that

    d̃ = distroot(u) − distroot(v) + 2α ≥ distroot(u) − distroot(v) + 2 dist(w, v) = dist(u, v).

For the other inequality, note that

    d̃ = distroot(u) − distroot(v) + 2α
      ≤ distroot(u) − distroot(v) + 2(1 + ε) dist(w, v)
      = distroot(u) − (distroot(v) − dist(w, v)) + (1 + 2ε) dist(w, v)
      = distroot(u) − distroot(w) + (1 + 2ε) dist(w, v)
      = dist(u, w) + (1 + 2ε) dist(w, v)
      ≤ (1 + 2ε)(dist(u, w) + dist(w, v))
      = (1 + 2ε) dist(u, v).
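These bounds are easy to check numerically. The following toy sketch (the variable names are ours, not the paper's) evaluates d̃ on a small rooted tree where w is a child of the root r and u, v hang below w, so w = nca(u, v):

```python
# Toy tree rooted at r: w is a child of r; u and v hang below w,
# so w = nca(u, v).  Edge lengths are chosen arbitrarily.
dist_rw, dist_wu, dist_wv = 1, 2, 3

distroot_u = dist_rw + dist_wu          # distance from the root to u
distroot_v = dist_rw + dist_wv          # distance from the root to v

eps = 0.5
alpha = (1 + eps) * dist_wv             # a (1 + eps)-approximation of dist(w, v)

d_tilde = distroot_u - distroot_v + 2 * alpha
true_dist = dist_wu + dist_wv           # dist(u, v)

# d~ is sandwiched between dist(u, v) and (1 + 2*eps) * dist(u, v)
assert true_dist <= d_tilde <= (1 + 2 * eps) * true_dist
```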

It now only remains to show that we can compactly store all the approximate distances α to light ancestors using O(log n) bits of space.

We use a heavy-light path decomposition of the tree. For each node v we can store a 2-approximate distance to each of its k proper light ancestors as follows. Let S be a binary string initially consisting of k zeros. Before each 0 we now insert 1s such that, if there are j 1s in total from the beginning of S to the i'th 0, then the distance to the i'th light ancestor a of v satisfies 2^{j−1} ≤ dist(v, a) ≤ 2^j. This is the same as traversing the tree bottom-up from v and, for each light node encountered on the way, adding a 0, and each time the distance doubles, adding a 1. The number of 0s equals the number of light nodes, which is at most log n, and the number of 1s is also bounded by log n, since n is the maximum distance in the tree. In total, the length of S is at most 2 log n.

Using the O(log n)-bit labels from Lemma 5.1 we can tell whether one node is an ancestor of the other and, if not, which one has a light ancestor a that is a child of their nearest common ancestor w. In addition, we can determine the total number i of light ancestors up to a. This means that we can compute j as the number of 1s in S before the i'th 0, and hence the 2-approximation 2^j.
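As a concrete sketch of this construction (the helper names encode and query are ours; we assume the distances to the proper light ancestors are given bottom-up, so they are non-decreasing):

```python
import math

def encode(dists):
    """Build S: one 0 per proper light ancestor (bottom-up order,
    non-decreasing distances); before each 0, insert 1s until the
    number of 1s equals the smallest j with dist <= 2**j."""
    s, j = [], 0
    for d in dists:
        target = max(0, math.ceil(math.log2(d)))  # smallest j with d <= 2**j
        s.append('1' * (target - j) + '0')
        j = target
    return ''.join(s)

def query(s, i):
    """2-approximate distance to the i'th (1-indexed) light ancestor:
    2**j, where j is the number of 1s before the i'th 0."""
    j = zeros = 0
    for bit in s:
        if bit == '1':
            j += 1
        else:
            zeros += 1
            if zeros == i:
                return 2 ** j
    raise ValueError("fewer than i light ancestors")
```

For distances 1, 3, 3, 10 bottom-up this yields S = 01100110, and each answer α returned by query satisfies dist ≤ α ≤ 2·dist.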

We have now obtained a 2-approximation with labels of size O(log n). We can improve this to a (1 + ε)-approximation by setting a 1 in S each time the distance increases by a factor of 1 + ε rather than 2. This increases the label size by a constant factor 1/log(1 + ε).
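A hypothetical sketch of this refinement (the helper names are ours; distances to the proper light ancestors are assumed given bottom-up in non-decreasing order). The 1s now mark the powers of 1 + ε that the distance passes, so their count grows to at most log n / log(1 + ε):

```python
import math

def encode_eps(dists, eps):
    """Build S: one 0 per proper light ancestor; insert a 1 each time
    the distance passes the next power of (1 + eps)."""
    s, j = [], 0
    for d in dists:
        # smallest integer j with d <= (1 + eps)**j
        target = max(0, math.ceil(math.log(d) / math.log(1 + eps)))
        s.append('1' * (target - j) + '0')
        j = target
    return ''.join(s)

def query_eps(s, i, eps):
    """(1 + eps)-approximate distance to the i'th light ancestor."""
    j = zeros = 0
    for bit in s:
        if bit == '1':
            j += 1
        else:
            zeros += 1
            if zeros == i:
                return (1 + eps) ** j
    raise ValueError("fewer than i light ancestors")
```

With eps = 1 this reduces to the base-2 scheme; smaller eps trades longer labels for tighter answers.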

This proves that there is a (1 + ε)-stretch distance labeling scheme with labels of size O(log n). To complete the proof of the theorem, we note that, given any (1 + ε)-stretch distance labeling scheme, we can always distinguish nodes (since only identical nodes have distance 0), which means that we always need at least n different labels, and hence labels of size at least log n bits.


References

[1] S. Abiteboul, S. Alstrup, H. Kaplan, T. Milo, and T. Rauhe. Compact labeling scheme for ancestor

queries. SIAM J. Comput., 35(6):1295–1309, 2006.

[2] S. Abiteboul, H. Kaplan, and T. Milo. Compact labeling schemes for ancestor queries. In Proc. of

the 12th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 547–556, 2001.

[3] S. Alstrup, P. Bille, and T. Rauhe. Labeling schemes for small distances in trees. In Proc. of the

14th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 689–698, 2003.

[4] S. Alstrup, P. Bille, and T. Rauhe. Labeling schemes for small distances in trees. SIAM J. Discrete

Math., 19(2):448–462, 2005. See also SODA’03.

[5] S. Alstrup, S. Dahlgaard, and M. B. T. Knudsen. Optimal induced universal graphs and labeling

schemes for trees. In Proc. 56th Annual Symp. on Foundations of Computer Science (FOCS), 2015.

[6] S. Alstrup, C. Gavoille, E. B. Halvorsen, and H. Petersen. Simpler, faster and shorter labels for distances in graphs. Submitted, 2015.

[7] S. Alstrup, C. Gavoille, H. Kaplan, and T. Rauhe. Nearest common ancestors: A survey and a

new algorithm for a distributed environment. Theory of Computing Systems, 37(3):441–456, May

2004.

[8] S. Alstrup, E. B. Halvorsen, and K. G. Larsen. Near-optimal labeling schemes for nearest common

ancestors. In Proc. of the 25th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA), pages

972–982, 2014.

[9] S. Alstrup, H. Kaplan, M. Thorup, and U. Zwick. Adjacency labeling schemes and induced-universal

graphs. In Proc. of the 47th Annual ACM Symp. on Theory of Computing (STOC), 2015.

[10] S. Alstrup and T. Rauhe. Improved labeling schemes for ancestor queries. In Proc. of the 13th

Annual ACM-SIAM Symp. on Discrete Algorithms (SODA), 2002.

[11] S. Alstrup and T. Rauhe. Small induced-universal graphs and compact implicit graph represen-

tations. In Proc. 43rd Annual Symp. on Foundations of Computer Science (FOCS), pages 53–62,

2002.

[12] F. Bazzaro and C. Gavoille. Localized and compact data-structure for comparability graphs. Dis-

crete Mathematics, 309(11):3465–3484, 2009.

[13] N. Bonichon, C. Gavoille, and A. Labourel. Short labels by traversal and jumping. In Structural Information and Communication Complexity, pages 143–156. Springer, 2006. Includes proofs for binary trees and caterpillars.

[14] M. A. Breuer. Coding the vertexes of a graph. IEEE Trans. on Information Theory, IT–12:148–153,

1966.

[15] M. A. Breuer and J. Folkman. An unexpected result on coding vertices of a graph. J. of Mathematical Analysis and Applications, 20:583–600, 1967.

[16] V. D. Chepoi, F. F. Dragan, B. Estellon, M. Habib, and Y. Vaxès. Diameters, centers, and approximating trees of delta-hyperbolic geodesic spaces and graphs. In 24th Annual ACM Symp. on Computational Geometry (SoCG), pages 59–68, 2008.


[17] V. D. Chepoi, F. F. Dragan, and Y. Vaxès. Distance and routing labeling schemes for non-positively curved plane graphs. J. of Algorithms, 61(2):60–88, 2006.

[18] F. R. K. Chung. Universal graphs and induced-universal graphs. J. of Graph Theory, 14(4):443–454,

1990.

[19] E. Cohen, H. Kaplan, and T. Milo. Labeling dynamic XML trees. SIAM J. Comput., 39(5):2048–

2074, 2010.

[20] B. Courcelle and R. Vanicat. Query eﬃcient implementation of graphs of bounded clique-width.

Discrete Applied Mathematics, 131:129–150, 2003.

[21] L. J. Cowen. Compact routing with minimum stretch. J. of Algorithms, 38:170–183, 2001. See also SODA'99.

[22] Y. Dodis, M. Pătrașcu, and M. Thorup. Changing base without losing space. In Proc. of the 42nd Annual ACM Symp. on Theory of Computing (STOC), pages 593–602, 2010.

[23] T. Eilam, C. Gavoille, and D. Peleg. Compact routing schemes with low stretch factor. J. of

Algorithms, 46(2):97–114, 2003.

[24] M. Elkin, A. Filtser, and O. Neiman. Prioritized metric structures and embedding. In Proc. of the

47th Annual ACM Symp. on Theory of Computing (STOC), pages 489–498, 2015.

[25] A. Farzan and J. I. Munro. Succinct encoding of arbitrary graphs. Theoretical Computer Science,

513:38–52, 2013.

[26] A. Farzan and J. I. Munro. A uniform paradigm to succinctly encode various families of trees.

Algorithmica, 68(1):16–40, 2014.

[27] P. Fraigniaud and A. Korman. On randomized representations of graphs using short labels. In

Proc. of the 21st Annual Symp. on Parallelism in Algorithms and Architectures (SPAA), pages

131–137, 2009.

[28] P. Fraigniaud and A. Korman. Compact ancestry labeling schemes for XML trees. In Proc. of the

21st annual ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 458–466, 2010.

[29] P. Fraigniaud and A. Korman. An optimal ancestry scheme and small universal posets. In Proc.

of the 42nd Annual ACM Symp. on Theory of Computing (STOC), pages 611–620, 2010.

[30] C. Gavoille, M. Katz, N. Katz, C. Paul, and D. Peleg. Approximate distance labeling schemes. In

Proc. of the 9th Annual European Symp. on Algorithms (ESA), pages 476–488, 2001.

[31] C. Gavoille and A. Labourel. Distributed relationship schemes for trees. In 18th International

Symp. on Algorithms and Computation (ISAAC), pages 728–738, 2007.

[32] C. Gavoille and A. Labourel. On local representation of distances in trees. In Proc. of the 26th

Annual ACM Symp. on Principles of Distributed Computing (PODC), pages 352–353, 2007.

[33] C. Gavoille and O. Ly. Distance labeling in hyperbolic graphs. In 16th Annual International Symp.

on Algorithms and Computation (ISAAC), pages 1071–1079, 2005.

[34] C. Gavoille and C. Paul. Distance labeling scheme and split decomposition. Discrete Mathematics,

273(1-3):115–130, 2003.

[35] C. Gavoille and C. Paul. Optimal distance labeling for interval graphs and related graphs families.

SIAM J. Discrete Math., 22(3):1239–1258, 2008.


[36] C. Gavoille, D. Peleg, S. Pérennes, and R. Raz. Distance labeling in graphs. In Proc. of the 12th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 210–219, 2001.

[37] C. Gavoille, D. Peleg, S. Pérennes, and R. Raz. Distance labeling in graphs. J. of Algorithms, 53(1):85–112, 2004. See also SODA'01.

[38] R. L. Graham and H. O. Pollak. On embedding graphs in squashed cubes. In Lecture Notes in

Mathematics, volume 303. Springer-Verlag, 1972.

[39] A. Gupta, R. Krauthgamer, and J. R. Lee. Bounded geometries, fractals, and low-distortion

embeddings. In 44th Annual Symp. on Foundations of Computer Science (FOCS), pages 534–543,

2003.

[40] A. Gupta, A. Kumar, and R. Rastogi. Traveling with a pez dispenser (or, routing issues in mpls).

SIAM J. on Computing, 34(2):453–474, 2005. See also FOCS’01.

[41] E. B. Halvorsen. Labeling schemes for trees - overview and new results. Master’s thesis, University

of Copenhagen, 2013. Available at esben.bistruphalvorsen.dk.

[42] S. Kannan, M. Naor, and S. Rudich. Implicit representation of graphs. SIAM J. Disc. Math., pages

596–603, 1992. See also STOC’88.

[43] M. Kao, X. Li, and W. Wang. Average case analysis for tree labelling schemes. Theor. Comput.

Sci., 378(3):271–291, 2007.

[44] H. Kaplan and T. Milo. Short and simple labels for distances and other functions. In Proc. of the 7th Workshop on Algorithms and Data Structures (WADS), 2001.

[45] H. Kaplan, T. Milo, and R. Shabo. A comparison of labeling schemes for ancestor queries. In Proc.

of the 13th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA), 2002.

[46] M. Katz, N. A. Katz, A. Korman, and D. Peleg. Labeling schemes for ﬂow and connectivity. SIAM

J. Comput., 34(1):23–40, 2004. See also SODA’02.

[47] A. Korman. Labeling schemes for vertex connectivity. ACM Trans. Algorithms, 6(2):39:1–39:10,

2010.

[48] A. Korman and S. Kutten. Labeling schemes with queries. CoRR, abs/cs/0609163, 2006.

[49] A. Korman and S. Kutten. Labeling schemes with queries. In SIROCCO, pages 109–123, 2007.

[50] A. Korman and D. Peleg. Labeling schemes for weighted dynamic trees. Inf. Comput., 205(12):1721–

1740, 2007.

[51] R. Krauthgamer and J. R. Lee. Algorithms on negatively curved spaces. In 47th Annual Symp. on

Foundations of Computer Science (FOCS), pages 119–132, 2006.

[52] J. W. Moon. On minimal n-universal graphs. Proc. of the Glasgow Mathematical Association,

7(1):32–33, 1965.

[53] J. H. Müller. Local structure in graph classes. PhD thesis, Georgia Institute of Technology, 1988.

[54] J. I. Munro, R. Raman, V. Raman, and S. Srinivasa Rao. Succinct representations of permutations

and functions. Theor. Comput. Sci., 438:74–88, 2012.

[55] M. Pătrașcu. Succincter. In Proc. 49th Annual Symp. on Foundations of Computer Science (FOCS), pages 305–313, 2008.


[56] D. Peleg. Informative labeling schemes for graphs. In Proc. 25th Symp. on Mathematical Founda-

tions of Computer Science, pages 579–588, 2000.

[57] D. Peleg. Proximity-preserving labeling schemes. J. Graph Theory, 33(3):167–176, 2000.

[58] N. Santoro and R. Khatib. Labeling and implicit routing in networks. The Computer Journal, 28:5–8, 1985.

[59] D. D. Sleator and R. E. Tarjan. A data structure for dynamic trees. J. of Computer and System Sciences, 26(3):362–391, 1983.

[60] J. P. Spinrad. Eﬃcient Graph Representations, volume 19 of Fields Institute Monographs. AMS,

2003.

[61] K. Talwar. Bypassing the embedding: algorithms for low dimensional metrics. In Proc. of the 36th

Annual ACM Symp. on Theory of Computing (STOC), pages 281–290, 2004.

[62] M. Tang, J. Yang, and G. Zhang. A compact distance labeling scheme for trees of small depths.

In International Conference on Scalable Computing and Communications / Eighth International

Conference on Embedded Computing, ScalCom-EmbeddedCom, pages 455–458, 2009.

[63] M. Thorup. Compact oracles for reachability and approximate distances in planar digraphs. J.

ACM, 51(6):993–1024, 2004. See also FOCS’01.

[64] M. Thorup and U. Zwick. Compact routing schemes. In Proc. of the 13th Annual ACM Symp. on

Parallel Algorithms and Architectures, SPAA ’01, pages 1–10, 2001.

[65] M. Thorup and U. Zwick. Approximate distance oracles. J. of the ACM, 52(1):1–24, 2005. See

also STOC’01.

[66] O. Weimann and D. Peleg. A note on exact distance labeling. Inf. Process. Lett., 111(14):671–673,

2011.

[67] Wikipedia. Implicit graph — wikipedia, the free encyclopedia, 2013. [Online; accessed 15-February-

2014].

[68] P. M. Winkler. Proof of the squashed cube conjecture. Combinatorica, 3(1):135–139, 1983.
