# Inge Li Gørtz's research while affiliated with Technical University of Denmark and other places

**What is this page?**

This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.

It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.

If you're a ResearchGate member, you can follow this page to keep up with this author's work.

If you are this author, and you don't want us to display this page anymore, please let us know.

## Publications (73)

Relative Lempel-Ziv (RLZ) parsing is a dictionary compression method in which a string $S$ is compressed relative to a second string $R$ (called the reference) by parsing $S$ into a sequence of substrings that occur in $R$. RLZ is particularly effective at compressing sets of strings that have a high degree of similarity to the reference string, su...
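
The parsing idea described in this abstract can be illustrated with a minimal sketch. It assumes a greedy, longest-match-first strategy and a quadratic-time scan of the reference; the function names `rlz_parse` and `rlz_decode` are hypothetical, and this is not the algorithm of the paper.

```python
def rlz_parse(S: str, R: str) -> list:
    """Greedily parse S into phrases (start_in_R, length), each a substring of R.

    Assumption: every character of S occurs somewhere in R.
    """
    phrases = []
    i = 0
    while i < len(S):
        best_pos, best_len = -1, 0
        # Try every start position in R and keep the longest match (first one on ties).
        for start in range(len(R)):
            length = 0
            while (start + length < len(R) and i + length < len(S)
                   and R[start + length] == S[i + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = start, length
        if best_len == 0:
            raise ValueError(f"character {S[i]!r} does not occur in the reference")
        phrases.append((best_pos, best_len))
        i += best_len
    return phrases


def rlz_decode(phrases: list, R: str) -> str:
    """Rebuild S by concatenating the referenced substrings of R."""
    return "".join(R[pos:pos + length] for pos, length in phrases)
```

Here `rlz_decode(rlz_parse(S, R), R)` reproduces `S` whenever every character of `S` appears in `R`.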

Let $S$ be a string of length $n$ over an alphabet $\Sigma$ and let $Q$ be a subset of $\Sigma$ of size $q \geq 2$. The 'co-occurrence problem' is to construct a compact data structure that supports the following query: given an integer $w$ return the number of length-$w$ substrings of $S$ that contain each character of $Q$ at least once. This is a...
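
To make the query concrete, the following sketch answers a single co-occurrence query directly on the string with a sliding window in linear time. It is a naive baseline rather than the compact data structure the abstract refers to, and the name `cooccurrence_count` is hypothetical.

```python
from collections import Counter


def cooccurrence_count(S: str, Q, w: int) -> int:
    """Count the length-w substrings of S that contain every character of Q."""
    Q = set(Q)
    if w <= 0 or w > len(S):
        return 0
    counts = Counter()   # occurrences of each Q-character inside the current window
    covered = 0          # number of distinct Q-characters present in the window
    total = 0
    for i, c in enumerate(S):
        if c in Q:
            counts[c] += 1
            if counts[c] == 1:
                covered += 1
        if i >= w:       # the character at i - w leaves the window
            out = S[i - w]
            if out in Q:
                counts[out] -= 1
                if counts[out] == 0:
                    covered -= 1
        if i >= w - 1 and covered == len(Q):
            total += 1
    return total
```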

The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string P, report all occurrences of P within S. In this paper, we study a basic and natural extension of string indexing called the string indexing for top-k close consec...

Given a regular expression $R$ and a string $Q$ the regular expression matching problem is to determine if $Q$ is a member of the language generated by $R$. The classic textbook algorithm by Thompson [C. ACM 1968] constructs and simulates a non-deterministic finite automaton in $O(nm)$ time and $O(m)$ space, where $n$ and $m$ are the lengths of the...

We consider the predecessor problem on the ultra-wide word RAM model of computation, which extends the word RAM model with 'ultrawords' consisting of $w^2$ bits [TAMC, 2015]. The model supports arithmetic and boolean operations on ultrawords, in addition to 'scattered' memory operations that access or modify $w$ (potentially non-contiguous) memory...

We consider the classic partial sums problem on the ultra-wide word RAM model of computation. This model extends the classic w-bit word RAM model with special ultrawords of length $w^2$ bits that support standard arithmetic and boolean operations and scattered memory access operations that can access w (non-contiguous) locations in memory. The ultra-wi...
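
For orientation, the partial sums problem is commonly stated as maintaining an array under prefix-sum queries and point updates. The sketch below shows a standard Fenwick (binary indexed) tree for that formulation on an ordinary word RAM; it is assumed here only as a familiar baseline and does not use ultrawords or reflect the paper's data structure. The class name `FenwickTree` is hypothetical.

```python
class FenwickTree:
    """Prefix sums and point updates in O(log n) time each."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)   # 1-indexed internal array
        for i, v in enumerate(values):
            self.update(i, v)

    def update(self, i: int, delta: int) -> None:
        """Add delta to the element at 0-indexed position i."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i                  # move to the next node covering position i

    def prefix_sum(self, i: int) -> int:
        """Return the sum of the elements at positions 0..i (inclusive)."""
        i += 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i                  # strip the lowest set bit
        return total
```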

We present a compressed representation of tries based on top tree compression [ICALP 2013] that works on a standard, comparison-based, pointer machine model of computation and supports efficient prefix search queries. Namely, we show how to preprocess a set of strings of total length n over an alphabet of size \(\sigma\) into a compressed data stru...

Given two strings $S$ and $P$, the Episode Matching problem is to compute the length of the shortest substring of $S$ that contains $P$ as a subsequence. The best known upper bound for this problem is $\tilde O(nm)$ by Das et al. (1997), where $n,m$ are the lengths of $S$ and $P$, respectively. Although the problem is well studied and has many appl...
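
The problem statement can be illustrated with a simple dynamic program that runs in $O(nm)$ time. This is a sketch written for illustration, not necessarily the algorithm of Das et al., and the name `episode_matching` is hypothetical. For each prefix length $k$ of $P$ it tracks the latest start position of a substring of $S$, ending at the current position, that contains that prefix as a subsequence.

```python
def episode_matching(S: str, P: str):
    """Length of the shortest substring of S containing P as a subsequence, or None."""
    m = len(P)
    if m == 0:
        return 0
    # start[k]: latest start index of a window ending at or before the current
    # position that contains P[:k] as a subsequence; -1 means no such window yet.
    start = [-1] * (m + 1)
    best = None
    for j, c in enumerate(S):
        # Go from long prefixes to short ones so that start[k - 1] still
        # refers to positions strictly before j.
        for k in range(m, 0, -1):
            if c == P[k - 1]:
                if k == 1:
                    start[1] = j
                elif start[k - 1] != -1:
                    start[k] = start[k - 1]
        if start[m] != -1:
            length = j - start[m] + 1
            if best is None or length < best:
                best = length
    return best
```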

The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient pattern matching queries. Typical queries include existential queries (decide if the pattern occurs in S), reporting queries (return all positions where the pattern occurs), and counting queries (return the number of occurrences of...

We consider the classic partial sums problem on the ultra-wide word RAM model of computation. This model extends the classic w-bit word RAM model with special ultrawords of length $w^2$ bits that support standard arithmetic and boolean operations and scattered memory access operations that can access w (non-contiguous) locations in memory. The ultra-wide...

The classic string indexing problem is to preprocess a string $S$ into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string $P$, report all occurrences of $P$ within $S$. In this paper, we study a basic and natural extension of string indexing called the string indexing for top-$k$ cl...

We consider the a priori traveling repairman problem, which is a stochastic version of the classic traveling repairman problem. Given a metric (V,d) with a root r∈V, the traveling repairman problem (TRP) involves finding a tour originating from r that minimizes the sum of arrival-times at all vertices. In its a priori version, we are also given ind...

We consider compact representations of collections of similar strings that support random access queries. The collection of strings is given by a rooted tree where edges are labeled by an edit operation (inserting, deleting, or replacing a character) and a node represents the string obtained by applying the sequence of edit operations on the path f...

We present the first linear time algorithm to construct the $2n$-bit version of the Lyndon array using only $o(n)$ bits of working space. A simpler variant of this algorithm computes the plain ($n\lg n$-bit) version of the Lyndon array using only $\mathcal{O}(1)$ words of additional working space. All previous algorithms are either not linear, or u...

Given a string $S$ of length $n$, the classic string indexing problem is to preprocess $S$ into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is given in compressed form and the goal is to achieve query time that is fast in terms of the compressed size of t...

We consider the classic partial sums problem on the ultra-wide word RAM model of computation. This model extends the classic $w$-bit word RAM model with special ultrawords of length $w^2$ bits that support standard arithmetic and boolean operation and scattered memory access operations that can access $w$ (non-contiguous) locations in memory. The u...

We present the first algorithm for regular expression matching that can take advantage of sparsity in the input instance. Our main result is a new algorithm that solves regular expression matching in $O\left(\Delta \log \log \frac{nm}{\Delta} + n + m\right)$ time, where $m$ is the number of positions in the regular expression, $n$ is the length of...

We present a compressed representation of tries based on top tree compression [ICALP 2013] that works on a standard, comparison-based, pointer machine model of computation and supports efficient prefix search queries. Namely, we show how to preprocess a set of strings of total length $n$ over an alphabet of size $\sigma$ into a compressed data stru...

We consider the a priori traveling repairman problem, which is a stochastic version of the classic traveling repairman problem (also called the traveling deliveryman or minimum latency problem). Given a metric $(V,d)$ with a root $r\in V$, the traveling repairman problem (TRP) involves finding a tour originating from $r$ that minimizes the sum of a...

We revisit the mergeable dictionaries with shift problem, where the goal is to maintain a family of sets subject to search, split, merge, make-set, and shift operations. The search, split, and make-set operations are the usual well-known textbook operations. The merge operation merges two sets and the shift operation adds or subtracts an integer fr...

Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. Given a grammar, the random access problem is to compactly represent the grammar while supporting random access, that is, given a position in the ori...

In their ground-breaking paper on grammar-based compression, Charikar et al. (2005) gave a separation between straight-line programs (SLPs) and Lempel–Ziv '77 (LZ77): they described an infinite family of strings such that the size of the smallest SLP generating a string of length n in that family is an Ω(log n / log log n)-factor larger than the siz...

We consider the communication complexity of fundamental longest common prefix (Lcp) problems. In the simplest version, two parties, Alice and Bob, each hold a string, $A$ and $B$, and we want to determine the length of their longest common prefix $l=\text{Lcp}(A,B)$ using as few rounds and bits of communication as possible. We show that if the long...

We consider the problem of decompressing the Lempel-Ziv 77 representation of a string $S\in[\sigma]^n$ using a working space as close as possible to the size $z$ of the input. The folklore solution for the problem runs in optimal $O(n)$ time but requires random access to the whole decompressed text. A better solution is to convert LZ77 into a gramm...
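
The folklore linear-time approach mentioned here can be sketched as follows. The phrase encoding is an assumption made for illustration: each phrase is either a literal string or a pair `(source, length)` that copies from the text decoded so far, and the name `lz77_decode` is hypothetical. Note that the whole output is kept in memory, which is exactly the random-access requirement the abstract points out.

```python
def lz77_decode(phrases) -> str:
    """Decode a list of phrases; each is a literal str or a (source, length) pair."""
    out = []                              # decoded characters, randomly accessible
    for phrase in phrases:
        if isinstance(phrase, str):
            out.extend(phrase)            # literal characters
        else:
            src, length = phrase
            # Copy one character at a time so that a phrase may overlap itself.
            for k in range(length):
                out.append(out[src + k])
    return "".join(out)
```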

In this paper we give an infinite family of strings for which the length of the Lempel-Ziv'77 parse is a factor $\Omega(\log n/\log\log n)$ smaller than the smallest run-length grammar.

We present a highly optimized implementation of tiered vectors, a data structure for maintaining a sequence of $n$ elements supporting access in time $O(1)$ and insertion and deletion in time $O(n^\epsilon)$ for $\epsilon > 0$ while using $o(n)$ extra space. We consider several different implementation optimizations in C++ and compare their perform...

Given a static reference string R and a source string S, a relative compression of S with respect to R is an encoding of S as a sequence of references to substrings of R. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly-repetitive massive data sets such as genomes and we...

We consider compressing labeled, ordered and rooted trees using DAG compression and top tree compression. We show that there exists a family of trees such that the size of the DAG compression is always a logarithmic factor smaller than the size of the top tree compression (even for an alphabet of size 1). The result settles an open problem from Bil...

Given a string $S$, the compressed indexing problem is to preprocess $S$ into a compressed representation that supports fast substring queries. The goal is to use little space relative to the compressed size of $S$ while supporting fast queries. We present a compressed index based on the Lempel-Ziv 1977 compression scheme. Let $n$, and $z$ denote t...

Visualizations of algorithms, such as drawings, slideshow presentations, animations, videos, and software tools, are a key concept to enhance and support student learning. A typical visualization of an algorithm shows the data and then performs computation on the data. For instance, a standard visualization of a standard binary search on an array shows an a...

Re-Pair is an efficient grammar compressor that operates by recursively replacing high-frequency character pairs with new grammar symbols. The most space-efficient linear-time algorithm computing Re-Pair uses $(1+\epsilon)n+\sqrt n$ words on top of the re-writable text (of length $n$ and stored in $n$ words), for any constant $\epsilon>0$; in pract...

We present a new algorithm for subsequence matching in grammar compressed strings. Given a grammar of size n compressing a string of size N and a pattern string of size m over an alphabet of size σ, our algorithm uses O(n + nσ/w) space and O(n + nσ/w + m log N log w · occ) or O(n + (nσ/w) log w + m log N · occ) time. Here w is the word size and occ is the number of minimal occurrences of the pattern. Our algorithm uses less space...

In this paper we show how to construct a data structure for a string S of size N compressed into a context-free grammar of size n that supports efficient Karp–Rabin fingerprint queries to any substring of S. That is, given indices i and j, the answer to a query is the fingerprint of the substring S[i,j]. We present the first O(n) space data structures that an...

Given a string $S$ of length $n$, the classic string indexing problem is to preprocess $S$ into a compact data structure that supports efficient subsequent pattern queries. In the *deterministic* variant the goal is to solve the string indexing problem without any randomization (at preprocessing time or query time). In the *packed* varian...

Re-Pair (Larsson and Moffat, 2000) is an effective grammar-based compression scheme achieving strong compression rates in practice. Let $n$, $\sigma$, and $d$ be the text length, alphabet size, and dictionary size of the final grammar, respectively. In their original paper, the authors show how to compute the Re-Pair grammar in expected linear time and...

We study a location-routing problem in the context of capacitated vehicle routing. The input to the k-location capacitated vehicle routing problem (k-LocVRP) consists of a set of demand locations in a metric space and a fleet of k identical vehicles, each of capacity Q. The objective is to locate k depots, one for each vehicle, and compute routes f...

In this work, we present efficient algorithms for constructing sparse suffix trees, sparse suffix arrays, and sparse position heaps for b arbitrary positions of a text T of length n while using only O(b) words of space during the construction.
Attempts at breaking the naïve bound of Ω(nb) time for constructing sparse suffix trees in O(b) space can...

Let S be a string of length n with characters from an alphabet of size \(\sigma \). The subsequence automaton of S (often called the directed acyclic subsequence graph) is the minimal deterministic finite automaton accepting all subsequences of S. A straightforward construction shows that the size (number of states and transitions) of the subsequen...
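
A straightforward automaton for the subsequences of a string can be sketched with one state per position and a next-occurrence table. The helper names `build_subsequence_automaton` and `accepts` are hypothetical, and the sketch makes no attempt at the space bounds studied in the paper.

```python
def build_subsequence_automaton(S: str):
    """Return delta, where delta[i][c] is the state reached from state i on character c.

    State i means "the first i characters of S have been consumed"; every state accepts.
    """
    n = len(S)
    delta = [dict() for _ in range(n + 1)]
    nxt = {}                       # nxt[c]: state after the next occurrence of c
    for i in range(n - 1, -1, -1):
        nxt[S[i]] = i + 1
        delta[i] = dict(nxt)       # snapshot of the transitions out of state i
    return delta


def accepts(delta, P: str) -> bool:
    """True if and only if P is a subsequence of the string the automaton was built from."""
    state = 0
    for c in P:
        if c not in delta[state]:
            return False
        state = delta[state][c]
    return True
```

Copying the table at every state gives up to $n\sigma$ transitions, which is the kind of size a straightforward construction incurs.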

Let $S$ be a string of length $n$ with characters from an alphabet of size $\sigma$. The *subsequence automaton* of $S$ (often called the *directed acyclic subsequence graph*) is the minimal deterministic finite automaton accepting all subsequences of $S$. A straightforward construction shows that the size (number of states and transition...

We consider distance labeling schemes for trees: given a tree with $n$ nodes, label the nodes with binary strings such that, given the labels of any two nodes, one can determine, by looking only at the labels, the distance in the tree between the two nodes.
A lower bound by Gavoille et al. (J. Alg. 2004) and an upper bound by Peleg (J. Graph Theor...

Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. In this paper, we present new representations of grammars that support efficient finger search style access, random access, and longest common exten...

Given a static reference string $R$ and a source string $S$, a relative compression of $S$ with respect to $R$ is an encoding of $S$ as a sequence of references to substrings of $R$. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly-repetitive massive data sets such as gen...

The longest common extension problem (LCE problem) is to construct a data
structure for an input string $T$ of length $n$ that supports LCE$(i,j)$
queries. Such a query returns the length of the longest common prefix of the
suffixes starting at positions $i$ and $j$ in $T$. This classic problem has a
well-known solution that uses $O(n)$ space and $...
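
The query itself is easy to state in code. The sketch below answers an LCE query by direct character comparison with no preprocessing, so a query may take time linear in the answer; it serves only as a reference definition, assumes 0-indexed positions, and uses the hypothetical name `lce`.

```python
def lce(T: str, i: int, j: int) -> int:
    """Length of the longest common prefix of the suffixes T[i:] and T[j:]."""
    n = len(T)
    k = 0
    while i + k < n and j + k < n and T[i + k] == T[j + k]:
        k += 1
    return k
```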

We study the orthogonal range searching problem on points that have a significant number of geometric repetitions, that is, subsets of points that are identical under translation. Such repetitions occur in scenarios such as image compression, GIS applications and in compactly representing sparse matrices and web graphs. Our contribution is twofold....

We show how to compactly index video data to support fast motion detection queries. A query specifies a time interval T, an area A in the video and two thresholds v and p. The answer to a query is a list of timestamps in T where ≥ p% of A has changed by ≥ v values. Our results show that by building a small index, we can support queries with a speedu...

We present a new algorithm for subsequence matching in grammar compressed
strings. Given a grammar of size $n$ compressing a string of size $N$ and a
pattern string of size $m$ over an alphabet of size $\sigma$, our algorithm
uses $O(n+\frac{n\sigma}{w})$ space and $O(n+\frac{n\sigma}{w}+m\log N\log
w\cdot occ)$ or $O(n+\frac{n\sigma}{w}\log w+m\lo...

The Karp-Rabin fingerprint of a string is a type of hash value that due to
its strong properties has been used in many string algorithms. In this paper we
show how to construct a data structure for a string $S$ of size $N$ compressed
by a context-free grammar of size $n$ that answers fingerprint queries. That
is, given indices $i$ and $j$, the answ...
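
On an uncompressed string, substring fingerprints can be obtained from prefix fingerprints, which is the basic identity that fingerprint data structures build on. The sketch below assumes a fixed base and the Mersenne prime $2^{61}-1$ for reproducibility (Karp–Rabin hashing normally draws its parameters at random), uses 0-indexed inclusive positions, and the class name `Fingerprinter` is hypothetical. It stores $O(N)$ values and is not the grammar-based structure of the paper.

```python
class Fingerprinter:
    """Karp-Rabin fingerprints of substrings via prefix fingerprints."""

    def __init__(self, S: str, base: int = 256, prime: int = (1 << 61) - 1):
        self.p = prime
        self.prefix = [0]        # prefix[k]: fingerprint of S[:k]
        self.power = [1]         # power[k]: base**k mod prime
        for ch in S:
            self.prefix.append((self.prefix[-1] * base + ord(ch)) % prime)
            self.power.append((self.power[-1] * base) % prime)

    def fingerprint(self, i: int, j: int) -> int:
        """Fingerprint of S[i..j], inclusive and 0-indexed."""
        length = j + 1 - i
        return (self.prefix[j + 1] - self.prefix[i] * self.power[length]) % self.p
```

Equal substrings always receive equal fingerprints; distinct substrings may collide, with a probability that depends on the choice of base and prime.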

We consider the problem of computing the $q$-gram profile of a string $T$ of size $N$ compressed by a context-free grammar with $n$ production rules. We present an algorithm that runs in $O(N - \alpha)$ expected time and uses $O(n + k_{T,q})$ space, where $N - \alpha \leq qn$ is the exact number of characters decompressed by the algorithm and $k_{T,q} \leq N - \alpha$ is the number o...
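
As a point of reference, the $q$-gram profile of an uncompressed string is simply a table of substring counts. The sketch below computes it directly from the text, assuming the string fits in memory; it does not operate on the grammar, and the name `qgram_profile` is hypothetical.

```python
from collections import Counter


def qgram_profile(T: str, q: int) -> Counter:
    """Map each length-q substring of T to its number of occurrences."""
    if q <= 0:
        raise ValueError("q must be positive")
    # The range is empty when q exceeds len(T), so the profile is then empty.
    return Counter(T[i:i + q] for i in range(len(T) - q + 1))
```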

We consider the problem of constructing a sparse suffix tree (or suffix array) for b suffixes of a given text T of length n, using only O(b) words of space during construction. Attempts at breaking the naive bound of Ω(nb) time for this problem can be traced back to the origins of string indexing in 1968. First results were only obtained in 1996, b...

The longest common extension (LCE) problem is to preprocess a string in order to allow for a large number of LCE queries, such that the queries are efficient. The LCE value, $\mathrm{LCE}_s(i,j)$, is the length of the longest common prefix of the pair of suffixes starting at index i and j in the string s. The LCE problem can be solved in linear space with co...

We study a location-routing problem in the context of capacitated vehicle routing. The input to k-LocVRP is a set of demand locations in a metric space and a fleet of k vehicles each of capacity Q. The objective is to locate k depots, one for each vehicle, and compute routes for the vehicles so that all demands are satisfied and the total cost is m...

We revisit various string indexing problems with range reporting features, namely, position-restricted substring searching, indexing substrings with gaps, and indexing substrings with intervals. We obtain the following main results.
We give efficient reductions for each of the above problems to a new problem, which we call substring range reportin...

The capacitated vehicle routing problem (CVRP) [21] involves distributing (identical) items from a depot to a set of demand locations in the shortest possible time, using a single capacitated vehicle. We study a generalization of this problem to the setting of multiple vehicles having non-uniform speeds (that we call Heterogenous CVRP), and present...

We consider string matching with variable length gaps. Given a string T and a pattern P consisting of strings separated by variable length gaps (arbitrary strings of length in a specified range), the problem is to find all ending positions of substrings in T that match P. This problem is a basic primitive in computational biology applications. Let...

An arc-annotated string is a string of characters, called bases, augmented with a set of pairs, called arcs, each connecting two bases. Given arc-annotated strings P and Q the arc-preserving subsequence problem is to determine if P can be obtained from Q by deleting bases from Q. Whenever a base is deleted any arc with an endpoint in that base is a...

We study the approximate string matching and regular expression matching problem for the case when the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that leads to algorithms improving the previously known complexities for both problems. In particular, we significantly im...

Dial-a-Ride problems consist of a set V of n vertices in a metric space (denoting travel time between vertices) and a set of m objects represented as source-destination pairs \(\{(s_i,t_i)\}^m_{i=1}\), where each object requires to be moved from its source to destination vertex. In the multi-vehicle Dial-a-Ride problem, there are q vehicles each ha...

In this paper we give approximation algorithms and inapproximability results for various asymmetric k-center with minimum coverage problems. In the k-center with minimum coverage problem, each center is required to serve a minimum number of clients. These problems have been studied by Lim et al. [Theor. Comput. Sci. 2005] in the symmetric s...

The well-known number partition problem is NP-hard even in the following version: Given a set S of n non-negative integers; partition S into two sets X and Y such that |X|=|Y| and the sum of the elements in X is as close as possible to the sum of the elements in Y (or equivalently, minimize the maximum of the two sums). In this paper we study the f...

We study the approximate string matching and regular expression matching problem for the case when the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that leads to algorithms improving the previously known complexities for both problems. In particular, we significa...

Given two rooted, labeled trees P and T the tree path subsequence problem is to determine which paths in P are subsequences of which paths in T. Here a path begins at the root and ends at a leaf. In this paper we propose this problem as a useful query primitive for XML data, and provide new algorithms improving the previously best known time and sp...

In the Finite Capacity Dial-a-Ride problem the input is a metric space, a set of objects $\{d_i\}$, each specifying a source $s_i$ and a destination $t_i$, and an integer $k$, the capacity of the vehicle used for making the deliveries. The goal is to compute a shortest tour for the vehicle in which all objects can be delivered from their sources to their de...

Given two rooted, ordered, and labeled trees P and T the tree inclusion problem is to determine if P can be obtained from T by deleting nodes in T. This problem has recently been recognized as an important query primitive in XML databases. Kilpeläinen and Mannila (SIAM J. of Comp. 1995) presented the first polynomial time algorithm using quadratic...

A union-find data structure maintains a collection of disjoint sets under the operations makeset, union, and find. Kaplan, Shafrir, and Tarjan [SODA 2002] designed data structures for an extension of the union-find problem in which items of the sets maintained may be deleted. The cost of a delete operation in their implementations is essentially th...
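
The three basic operations named here can be sketched with the textbook combination of union by rank and path halving. The sketch does not support the delete operation that the abstract is concerned with, and the class name `UnionFind` is hypothetical.

```python
class UnionFind:
    """Disjoint sets under makeset, union, and find."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def makeset(self, x) -> None:
        """Create the singleton set {x} if x is not already present."""
        if x not in self.parent:
            self.parent[x] = x
            self.rank[x] = 0

    def find(self, x):
        """Return the representative of the set containing x (with path halving)."""
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        """Merge the sets containing x and y and return the new representative."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx              # attach the shallower tree under the deeper one
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return rx
```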

This paper offers a systematic account of techniques to infer strong normalization from weak normalization that make use of syntactic translations from λ-terms to λI-terms. We present variants of such techniques due to Klop, Sørensen, Xi, Gandy, and Loader.
We show that all the translations, in some cases via adjustments, are special cases of a gen...

This paper explores three concepts: the k-center problem, some of its variants, and asymmetry. The k-center problem is a fundamental clustering problem, similar to the k-median problem. Variants of k-center may more accurately model real-life problems than the original formulation. Asymmetry is a significant impediment to approximation in many grap...

The dispatching problem for object oriented languages is the problem of determining the most specialized method to invoke for calls at run-time. This can be a critical component of execution performance. A number of recent results, including [Muthukrishnan and Müller SODA’96, Ferragina and Muthukrishnan ESA’96, Alstrup et al. FOCS’98], have studied...

## Citations

... Another variant is string indexing for consecutive occurrences [9,40]. Here, the goal is to compactly represent the string such that given a pattern P and a gap range [α, β] we can quickly find consecutive occurrences of P with distance in [α, β], i.e., pairs of occurrences immediately following each other and with distance within the range. ...

Reference: Gapped Indexing for Consecutive Occurrences

... , σ} with $\sigma \in o\left(\frac{\log n}{(\log\log n)^2}\right)$ then $u \in o\left(\frac{n \log n}{(\log\log n)^2}\right)$ and Arroyuelo and Raman's space bound is $nH_0(S) + o(n)$. Although there are many other searchable partial-sums data structures (see, e.g., [2,4] and references therein), as far as we know Arroyuelo and Raman's is the first to fit in this space, even for a sequence of sublogarithmic positive integers. In this paper we slightly improve their bound for this special case, to $nH_k(S) + o(n)$ bits for $k \in o\left(\frac{\log n}{(\log\log n)^2}\right)$, where $H_k(S) \leq H_0(S)$ is the $k$th-order empirical entropy of $S$. ...

... By allowing each node in to be identified by ′ 's hash, we can ensure only one node for each ′ ∈ is inserted into . This method of compressing through repeated subtrees is commonly known as Directed Acyclic Graph (DAG) compression [5], and aims at creating the most minimal representation of tree in the form of a DAG. ...

... Except for componentwise multiplication, all of the above componentwise operations can be implemented in constant time on the restricted UWRAM using standard word-level parallelism techniques [12,23] (see Appendix A for details on blend). For our purposes, we will need componentwise multiplication as an instruction (for evaluating hash functions in parallel) and thus we include this in the instruction set of the UWRAM. ...

Reference: Predecessor on the Ultra-Wide Word RAM

... The Apriori algorithm is a market basket analysis algorithm used to generate association rules [3], and the Apriori algorithm is an advantageous solution for solving a problem [4]. Association rules can be used to discover relationships or cause and effect. ...

... Grammar-based compression is a loss-less data compression scheme that represents a string w by an SLP for w. We are aware of more powerful compression schemes such as run-length SLPs [23,35,5], composition systems [18], collage systems [25], NU-systems [34], the Lempel-Ziv 77 family [40,37,11,12], and bidirectional schemes [37]. Nevertheless, since SLPs exhibit simpler structures than those, a number of efficient algorithms that can work directly on SLPs have been proposed, including pattern matching [24,23], convolutions [38], random access [7], detection of repeats and palindromes [21], Lyndon factorizations [22], longest common extension queries [20], longest common substrings [33], finger searches [4], and balancing the grammar [16]. ...

... (i) computational theoretic studies on the space and time needed for matching and parsing; (ii) parsing algorithms with different coverage of syntax trees (total vs. partial); (iii) RE software libraries. Since our focus is on practical and provably correct algorithms, for brevity we only discuss category (ii), with one exception in category (i), i.e., [5], but we recall that we have experimentally found that the BSP parsing speed compares favorably with the popular RE2 library. A representative list of parsing algorithms is in Table 3, where each one is accompanied by a short description, to which we add a few comments. ...

... We are aware of more powerful compression schemes such as run-length SLPs [23,35,5], composition systems [18], collage systems [25], NU-systems [34], the Lempel-Ziv 77 family [40,37,11,12], and bidirectional schemes [37]. Nevertheless, since SLPs exhibit simpler structures than those, a number of efficient algorithms that can work directly on SLPs have been proposed, including pattern matching [24,23], convolutions [38], random access [7], detection of repeats and palindromes [21], Lyndon factorizations [22], longest common extension queries [20], longest common substrings [33], finger searches [4], and balancing the grammar [16]. More examples of algorithms directly working on SLPs can be found in references therein and the survey [30]. ...

... Similar compression ratios are reported in Wikipedia. 4 Despite the obvious practical relevance of these compression methods, there is not a clear entropy measure useful for highly repetitive texts. The number z of phrases generated by the Lempel-Ziv parse [32] is often used as a gold standard, possibly because it can be implemented in linear time [40] and is never larger than g, the size of the smallest context-free grammar that generates the text [41,7]. ...

... Beyond genomics applications, RLZ has also found wider use as a compressor for large text corpora in contexts where random-access support for individual documents is needed [14,32,33,24,19,2] and as a general data compressor [17,16]. In those contexts, S 1 is usually first constructed using substrings sampled from other strings in the collection (Hoobin et al. [14] show that random sampling works well) in a preprocessing phase. ...

Reference: Hierarchical Relative Lempel-Ziv Compression