## About

179 Publications

12,249 Reads

2,968 Citations

## Publications

A Dyck sequence is a sequence of opening and closing parentheses (of various types) that is balanced. The Dyck edit distance of a given sequence of parentheses $S$ is the smallest number of edit operations (insertions, deletions, and substitutions) needed to transform $S$ into a Dyck sequence. We consider the threshold Dyck edit distance problem, w...
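The definition of a Dyck sequence above can be illustrated with a short stack-based check. This is only a sketch of the balance test, not the threshold edit-distance algorithm from the paper; the three bracket types and the name `is_dyck` are assumptions chosen for the example.

```python
# Hypothetical bracket alphabet for illustration: closing -> matching opening.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def is_dyck(seq: str) -> bool:
    """Return True if seq is a balanced sequence of the brackets in PAIRS."""
    stack = []
    for ch in seq:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        else:
            raise ValueError(f"not a parenthesis: {ch!r}")
    # Balanced only if no opener is left unmatched.
    return not stack
```

A sequence has Dyck edit distance 0 exactly when this check succeeds; for instance, `is_dyck("([]{})")` holds while `is_dyck("(]")` does not.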

Filters (such as Bloom Filters) are a fundamental data structure that speed up network routing and measurement operations by storing a compressed representation of a set. Filters are very space efficient, but can make bounded one-sided errors: with tunable probability \(\epsilon \), they may report that a query element is stored in the filter when...
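The one-sided error described above can be seen in a minimal Bloom filter sketch. It is a textbook construction under stated assumptions, not the filter studied in the paper: the class name, the parameters `num_bits` and `num_hashes`, and the use of salted SHA-256 digests as hash functions are all choices made for this illustration.

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter: no false negatives, possible false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple (not space optimal).
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        # Assumption: salted SHA-256 digests stand in for independent hashes.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # True for every added item; may also be True for items never added.
        return all(self.bits[pos] for pos in self._positions(item))
```

Every inserted element is always reported as present, while a query for an element that was never inserted may still return `True` when all of its positions happen to be set by other elements.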

For any forest $G = (V, E)$ it is possible to orient the edges $E$ so that no vertex in $V$ has out-degree greater than $1$. This paper considers the incremental edge-orientation problem, in which the edges $E$ arrive over time and the algorithm must maintain a low-out-degree edge orientation at all times. We give an algorithm that maintains a maxi...
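The opening claim, that a forest can be oriented with out-degree at most 1, can be checked with a small static sketch: root each tree and point every edge from child to parent. This does not address the incremental setting the paper studies; the function name and the vertex numbering `0..num_vertices-1` are assumptions for the example, and the input is assumed to be a forest.

```python
from collections import defaultdict, deque


def orient_forest(num_vertices: int, edges: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Orient each edge of a forest as (child, parent) so out-degree <= 1."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    parent = {}
    oriented = []
    for root in range(num_vertices):
        if root in parent:
            continue
        # Breadth-first search from an arbitrary root of each tree.
        parent[root] = None
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    oriented.append((v, u))  # edge directed from child v to parent u
                    queue.append(v)
    return oriented
```

Each non-root vertex has exactly one outgoing edge (to its parent) and each root has none, which gives the out-degree bound of 1.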

In this work, we revisit the fundamental and well-studied problem of approximate pattern matching under edit distance. Given an integer $k$, a pattern $P$ of length $m$, and a text $T$ of length $n \ge m$, the task is to find substrings of $T$ that are within edit distance $k$ from $P$. Our main result is a streaming algorithm that solves the probl...

Filters (such as Bloom Filters) are data structures that speed up network routing and measurement operations by storing a compressed representation of a set. Filters are space efficient, but can make bounded one-sided errors: with tunable probability epsilon, they may report that a query element is stored in the filter when it is not. This is calle...

We formalize and examine the online Dictionary Recognition with One Gap problem (DROG) which is the following. Preprocess a dictionary D of d patterns each containing a special gap symbol that matches any string, so that given a text arriving online a character at a time, all patterns from D which are suffixes of the text that has arrived so far an...

In population protocols, the underlying distributed network consists of $n$ nodes (or agents), denoted by $V$, and a scheduler that continuously selects uniformly random pairs of nodes to interact. When two nodes interact, their states are updated by applying a state transition function that depends only on the states of the two nodes prior to the...

The shift distance $\mathsf{sh}(S_1,S_2)$ between two strings $S_1$ and $S_2$ of the same length is defined as the minimum Hamming distance between $S_1$ and any rotation (cyclic shift) of $S_2$. We study the problem of sketching the shift distance, which is the following communication complexity problem: Strings $S_1$ and $S_2$ of length $n$ are g...
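The definition of shift distance above translates directly into a brute-force computation over all rotations. The sketch below runs in quadratic time and only illustrates the definition; it is unrelated to the sketching protocol the paper studies, and the function names are chosen for the example.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def shift_distance(s1: str, s2: str) -> int:
    """Minimum Hamming distance between s1 and any cyclic shift of s2."""
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    if not s1:
        return 0
    # Try every rotation of s2 and keep the best alignment.
    return min(hamming(s1, s2[r:] + s2[:r]) for r in range(len(s2)))
```

For example, `shift_distance("abcd", "cdab")` is 0 because `"cdab"` is a rotation of `"abcd"`.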

We revisit the $k$-mismatch problem in the streaming model on a pattern of length $m$ and a streaming text of length $n$, both over a size-$\sigma$ alphabet. The current state-of-the-art algorithm for the streaming $k$-mismatch problem, by Clifford et al. [SODA 2019], uses $\tilde O(k)$ space and $\tilde O\big(\sqrt k\big)$ worst-case time per char...

We revisit a fundamental problem in string matching: given a pattern of length m and a text of length n, both over an alphabet of size $\sigma$, compute the Hamming distance between the pattern and the text at every location. Several $(1+\epsilon)$-approximation algorithms have been proposed in the literature, with running time of the form $O(\epsi...
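As a baseline for the problem stated above, the exact Hamming distance at every location can be computed naively in $O(nm)$ time. This sketch only fixes the input/output convention; it is not one of the approximation algorithms discussed in the abstract, and the function name is an assumption.

```python
def hamming_at_every_location(pattern: str, text: str) -> list[int]:
    """Hamming distance between pattern and each m-length window of text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        raise ValueError("need 1 <= len(pattern) <= len(text)")
    return [
        sum(p != t for p, t in zip(pattern, text[i:i + m]))
        for i in range(n - m + 1)
    ]
```

With pattern `"ab"` and text `"abba"` the three alignments give distances `[0, 1, 2]`.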

In the $\{-1,0,1\}$-APSP problem the goal is to compute all-pairs shortest paths (APSP) on a directed graph whose edge weights are all from $\{-1,0,1\}$. In the (min,max)-product problem the input is two $n\times n$ matrices $A$ and $B$, and the goal is to output the (min,max)-product of $A$ and $B$. This paper provides a new algorithm for the $\{-...

In the SetDisjointness problem, a collection of $m$ sets $S_1,S_2,...,S_m$ from some universe $U$ is preprocessed in order to answer queries on the emptiness of the intersection of some two query sets from the collection. In the SetIntersection variant, all the elements in the intersection of the query sets are required to be reported. These are tw...

In the 3SUM-Indexing problem the goal is to preprocess two lists of elements from $U$, $A=(a_1,a_2,\ldots,a_n)$ and $B=(b_1,b_2,\ldots,b_n)$, such that given an element $c\in U$ one can quickly determine whether there exists a pair $(a,b)\in A \times B$ where $a+b=c$. Goldstein et al. [WADS 2017] conjectured that there is no algorithm for 3SUM-Indexin...
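The preprocess-then-query interface of 3SUM-Indexing can be sketched with a trivial linear-space, linear-query-time baseline. It assumes the universe $U$ is the integers and is not meant to reflect the bounds discussed in the paper; the class and method names are hypothetical.

```python
class ThreeSumIndex:
    """Naive 3SUM-Indexing baseline: O(n) space, O(n) time per query."""

    def __init__(self, a: list[int], b: list[int]) -> None:
        # Preprocessing: hash the elements of A for O(1) expected lookups.
        self._a = set(a)
        self._b = list(b)

    def query(self, c: int) -> bool:
        """Return True if some a in A and b in B satisfy a + b == c."""
        return any((c - y) in self._a for y in self._b)
```

For `A = [1, 5, 9]` and `B = [2, 4, 6]`, `query(11)` succeeds (5 + 6) while `query(4)` fails.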

In the classic dictionary matching problem, the input is a dictionary of patterns \(\mathcal {D}=\{P_1,P_2,\ldots ,P_k\}\) and a text T, and the goal is to report all the occurrences in T of every pattern from \(\mathcal {D}\). In the dynamic version of the dictionary matching problem, patterns may be either added or removed from \(\mathcal {D}\)....

In the pattern matching with $d$ wildcards problem one is given a text $T$ of length $n$ and a pattern $P$ of length $m$ that contains $d$ wildcard characters, each denoted by a special symbol $'?'$. A wildcard character matches any other character. The goal is to establish for each $m$-length substring of $T$ whether it matches $P$. In the streami...
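The matching rule above, where `?` in the pattern matches any character, can be written as a naive offline check over every $m$-length substring. This sketch takes $O(nm)$ time and is not the streaming algorithm of the paper; the function name is an assumption.

```python
WILDCARD = "?"


def wildcard_matches(pattern: str, text: str) -> list[bool]:
    """For each m-length substring of text, report whether it matches pattern."""
    m = len(pattern)
    return [
        all(p == WILDCARD or p == t for p, t in zip(pattern, text[i:i + m]))
        for i in range(len(text) - m + 1)
    ]
```

With pattern `"a?c"` and text `"abcaxcab"`, the substrings starting at positions 0 and 3 match.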

We consider two closely related problems of text indexing in a sub-linear working space. The first problem is the Sparse Suffix Tree (SST) construction of a set of suffixes $B$ using only $O(|B|)$ words of space. The second problem is the Longest Common Extension (LCE) problem, where for some parameter $1\le\tau\le n$, the goal is to construct a da...

We examine the complexity of the online Dictionary Matching with One Gap Problem (DMOG) which is the following. Preprocess a dictionary D of d patterns, where each pattern contains a special gap symbol that can match any string, so that given a text that arrives online, a character at a time, we can report all of the patterns from D that are suffix...

This paper gives a new deterministic algorithm for the dynamic Minimum Spanning Forest (MSF) problem in the EREW PRAM model, where the goal is to maintain a MSF of a weighted graph with n vertices and m edges while supporting edge insertions and deletions. We show that one can solve the dynamic MSF problem using $O(\sqrt n)$ processors and $O(\log n...

In the kSUM problem we are given an array of numbers $a_1,a_2,...,a_n$ and we are required to determine if there are $k$ different elements in this array such that their sum is 0. This problem is a parameterized version of the well-studied SUBSET-SUM problem, and a special case is the 3SUM problem that is extensively used for proving conditional ha...
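The kSUM decision problem stated above has a direct brute-force formulation that tries every choice of $k$ positions. The sketch interprets "k different elements" as $k$ distinct array positions, which is an assumption, and it runs in roughly $O(n^k)$ time, so it is only an illustration of the definition.

```python
from itertools import combinations


def has_k_sum(numbers: list[int], k: int) -> bool:
    """Return True if some k distinct positions of numbers sum to 0."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return any(sum(combo) == 0 for combo in combinations(numbers, k))
```

Setting `k = 3` gives the 3SUM special case mentioned in the abstract; for example, `has_k_sum([3, -1, -2, 7], 3)` holds because 3 + (-1) + (-2) = 0.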

This paper gives a new deterministic algorithm for the dynamic Minimum Spanning Forest (MSF) problem in the EREW PRAM model, where the goal is to maintain a MSF of a weighted graph with $n$ vertices and $m$ edges while supporting edge insertions and deletions. We show that one can solve the dynamic MSF problem using $O(\sqrt n)$ processors and $O(\...

Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a new algorithm to process these queries optimally in terms of worst-case data complexity. Our result builds on recent work by Atserias, Grohe, and Marx,...

We consider the problem of approximate pattern matching in a stream. In the streaming $k$-mismatch problem, we must compute all Hamming distances between a pattern of length $n$ and successive $n$-length substrings of a longer text, as long as the Hamming distance is at most $k$. The twin challenges of streaming pattern matching derive from the nee...

In recent years much effort was put into developing polynomial-time conditional lower bounds for algorithms and data structures in both static and dynamic settings. Along these lines we suggest a framework for proving conditional lower bounds based on the well-known 3SUM conjecture. Our framework creates a *compact representation* of an instan...

In recent years much effort has been concentrated towards achieving polynomial time lower bounds on algorithms for solving various well-known problems. A useful technique for showing such lower bounds is to prove them conditionally based on well-studied hardness assumptions such as 3SUM, APSP, SETH, etc. This line of research helps to obtain a bett...

An approximate sparse recovery system in ℓ1 norm consists of parameters k, ε, N; an m-by-N measurement Φ; and a recovery algorithm R. Given a vector, x, the system approximates x by $\widehat{x} = R(\Phi x)$, which must satisfy $\|\widehat{x}-x\|_1 \le (1+\epsilon)\|x-x_k\|_1$. We consider the “for all” model, in which a single...

In this paper we introduce a general framework that exponentially improves the space, degree of independence, and time needed by min-wise-based algorithms. The authors, in SODA '11 [1], introduced an exponential time improvement for min-wise-based algorithms. Here we develop an alternative approach that achieves both exponential time and exponentia...

We introduce the Holiday Gathering Problem which models the difficulty in scheduling non-interfering transmissions in (wireless) networks. Our goal is to schedule transmission rounds so that the antennas that transmit in a given round will not interfere with each other, i.e. all of the other antennas that can interfere will not transmit in that rou...

We propose a method for efficiently finding all parallel passages in a large corpus, even if the passages are not quite identical due to rephrasing and orthographic variation. The key ideas are the representation of each word in the corpus by its two most infrequent letters, finding matched pairs of strings of four or five words that differ by at m...

In the classical pattern-matching problem, one is given a text and a pattern both of which are sequences of letters. The requirement is to find all occurrences of the pattern in the text. We studied two modifications of the classical problem, where each letter in the text and pattern is a set (Set Intersection Matching problem) or a sequence (Seque...

We revisit the complexity of one of the most basic problems in pattern
matching. In the k-mismatch problem we must compute the Hamming distance
between a pattern of length m and every m-length substring of a text of length
n, as long as that Hamming distance is at most k. Where the Hamming distance is
greater than k at some alignment of the pattern...

The algorithmic task of computing the Hamming distance between a given
pattern of length $m$ and each location in a text of length $n$ is one of the
most fundamental tasks in string algorithms. Unfortunately, there
is evidence that for a text $T$ of size $n$ and a pattern $P$ of size $m$, one
cannot compute the exact Hamming distance f...

In this work, we focus on building an efficient succinct dynamic dictionary that significantly improves the query time of the current best known results. The algorithm that we propose suffers from only a (Formula presented.) multiplicative slowdown in its query time and a (Formula presented.) slowdown for insertion and deletion operations, where n...

Consider the problem of maintaining a family F of dynamic sets subject to insertions, deletions, and set-intersection reporting queries: given \(S,S'\in F\), report every member of \(S\cap S'\) in any order. We show that in the word RAM model, where w is the word size, given a cap d on the maximum size of any set, we can support set intersection qu...

We consider distance labeling schemes for trees: given a tree with $n$ nodes,
label the nodes with binary strings such that, given the labels of any two
nodes, one can determine, by looking only at the labels, the distance in the
tree between the two nodes.
A lower bound by Gavoille et al. (J. Alg. 2004) and an upper bound by Peleg
(J. Graph Theor...

A distance labeling scheme labels the $n$ nodes of a graph with binary
strings such that, given the labels of any two nodes, one can determine the
distance in the graph between the two nodes by looking only at the labels. A
$D$-preserving distance labeling scheme only returns precise distances between
pairs of nodes that are at distance at least $D...

We propose an approach for approximating the Jaccard similarity of two streams for domains where this similarity is known to be high. Our method is based on a reduction from Jaccard similarity to norm estimation, for which there exists a sketch that is efficient in terms of both size and compute time, which we augment by a sampling technique. Ou...

We consider the problem of dictionary matching in a stream. Given a set of
strings, known as a dictionary, and a stream of characters arriving one at a
time, the task is to report each time some string in our dictionary occurs in
the stream. We present a randomised algorithm which takes O(log log(k + m))
time per arriving character and uses O(k log...

The dictionary matching with gaps problem is to preprocess a dictionary D of total size containing d gapped patterns over an alphabet Σ, where each gapped pattern is a sequence of subpatterns separated by bounded sequences of don't cares. Then, given a query text T of length n over Σ, the goal is to output all locations in T in which a pattern , ,...

Hardware-based packet classification has become an essential component in many networking devices. It often relies on ternary content-addressable memories (TCAMs), which compare the packet header against a set of rules. TCAMs are not well suited to encode range rules. Range rules are often encoded by multiple TCAM entries, and little is known about...

This book constitutes the refereed proceedings of the 26th Annual Symposium on Combinatorial Pattern Matching, CPM 2015, held on Ischia Island, Italy, in June/July 2015.
The 34 revised full papers presented together with 3 invited talks were carefully reviewed and selected from 83 submissions. The papers address issues of searching and matching str...

Efficient handling of sparse data is a key challenge in Computer Science.
Binary convolutions, such as polynomial multiplication or the Walsh Transform,
are a useful tool in many applications and are efficiently solved.
In the last decade, several problems required efficient solution of sparse
binary convolutions. Both randomized and deterministic a...

The dictionary matching with gaps problem is to preprocess a dictionary D of d gapped patterns $P_1,\ldots,P_d$ over alphabet Σ, where each gapped pattern $P_i$ is a sequence of subpatterns separated by bounded sequences of don’t cares. Then, given a query text T of length n over alphabet Σ, the goal is to output all locations in T in which a pattern $P_i$ ∈...

We introduce and examine the *Holiday Gathering Problem* which models the
difficulty that couples have when trying to decide with which parents should
they spend the holiday. Our goal is to schedule the family gatherings so that
the parents that will be *happy*, i.e. all their children will be home
*simultaneously* for the holiday fest...

We examine several (dynamic) graph and set intersection problems in the
word-RAM model with word size $w$. We begin with Dynamic Connectivity where we
need to maintain a fully dynamic graph $G=(V,E)$ with $n=|V|$ vertices while
supporting $(s,t)$-connectivity queries. To do this, we provide a new
simplified worst-case solution for the famous Dynami...

We prove lower bounds for several (dynamic) data structure problems
conditioned on the well known conjecture that 3SUM cannot be solved in
$O(n^{2-\Omega(1)})$ time. This continues a line of work that was initiated by
Patrascu [STOC 2010] and strengthened recently by Abboud and
Vassilevska-Williams [FOCS 2014]. The problems we consider are from sev...

We consider the problem of broadcasting a message from a sender to n ≥ 1 receivers in a time-slotted, single-hop, wireless network with a single communication channel. Sending and listening dominate the energy usage of small wireless devices and this is abstracted as a unit cost per time slot. A jamming adversary exists who can disrupt the channel...

In this work, we focus on building an efficient succinct dynamic dictionary that significantly improves the query time of the current best known results. The algorithm that we propose suffers from only a \(O((\log\log n)^2)\) multiplicative slowdown in its query time and a \(O(\frac{1}{\epsilon} \log n)\) slowdown for insertion and deletion operations, wh...

Recently, a new pattern matching paradigm was proposed, pattern matching with address errors. In this paradigm approximate string matching problems are studied, where the content is unaltered and only the locations of the different entries may change. Specifically, a broad class of problems was defined—the class of rearrangement errors. In this typ...

Histogram indexing, also known as jumbled pattern indexing and permutation indexing, is one of the important current open problems in pattern matching. It was introduced about 6 years ago and has seen active research since. Yet, to date there is no algorithm that can preprocess a text T in time $o(|T|^2/\mathrm{polylog}|T|)$ and achieve histogram indexing, ev...

An approximate sparse recovery system in $\ell_1$ norm consists of parameters
$k$, $\epsilon$, $N$, an $m$-by-$N$ measurement $\Phi$, and a recovery
algorithm, $\mathcal{R}$. Given a vector, $\mathbf{x}$, the system approximates
$x$ by $\widehat{\mathbf{x}} = \mathcal{R}(\Phi\mathbf{x})$, which must satisfy
$\|\widehat{\mathbf{x}}-\mathbf{x}\|_1 \l...

We consider how selfish agents are likely to share revenues derived from
maintaining connectivity between important network servers. We model a network
where a failure of one node may disrupt communication between other nodes as a
cooperative game called the vertex Connectivity Game (CG). In this game, each
agent owns a vertex, and controls all the...

We demonstrate how crowdsourcing can be used to automatically build a personalized tourist attraction recommender system, which tailors recommendations to specific individuals, so different people who use the system each get their own list of recommendations, appropriate to their own traits. Recommender systems crucially depend on the availability...

In edge orientations, the goal is usually to orient (direct) the edges of an undirected network (modeled by a graph) such that all out-degrees are bounded. When the network is fully dynamic, i.e., admits edge insertions and deletions, we wish to maintain such an orientation while keeping a tab on the update time. Low out-degree orientations turned...

A key building block for collaborative filtering recommender systems is finding users with similar consumption patterns. Given access to the full data regarding the items consumed by each user, one can directly compute the similarity between any two users. However, for massive recommender systems such a naive approach requires a high running time a...

Fingerprinting is a widely-used technique for efficiently verifying that two files are identical. More generally, linear sketching is a form of lossy compression (based on random projections) that also enables the "dissimilarity" of non-identical files to be estimated. Many sketches have been proposed for dissimilarity measures that decompose coord...

In this paper, we consider the “foreach” sparse recovery problem with failure probability p. The goal of the problem is to design a distribution over m × N matrices Φ and a decoding algorithm A such that for every $\mathbf{x} \in \mathbb{R}^N$, we have with probability at least $1 - p$
$$\|\mathbf{x}-A(\Phi\mathbf{x})\|_2\leqslant C\|\mathbf{x}-\mathbf{x}_k\|_2,$$ where x...

Hardware-based packet classification has become an essential component in many networking devices. It often relies on TCAMs (ternary content-addressable memories), which need to compare the packet header against a set of rules. But efficiently encoding these rules is not an easy task. In particular, the most complicated rules are range rules, which...

We study the problem of generating a large sample from a data stream of elements $(i,v)$, where the sample consists of pairs $(i,C_i)$ for $C_i = \sum_{(i,v) \in \text{stream}} v$. We consider strict turnstile streams and general non-strict turnstile streams, in which $C_i$ may be negative. Our sample is useful for approximating both forward and inverse distribution sta...

Network coding helps maximize the network throughput. However, such schemes are also vulnerable to pollution attacks in which malicious forwarders inject polluted messages into the system. Traditional cryptographic solutions, such as digital signatures, are not suited for network coding, in which nodes do not forward the original packets, but rather...

Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a novel algorithm to process these queries optimally in terms of worst-case data complexity. Our result builds on recent work by Atserias, Grohe, and Marx...

We present two recursive techniques to construct compressed sensing schemes that can be “decoded” in sub-linear time. The first technique is based on the well studied code composition method called code concatenation where the “outer” code has strong list recoverability properties. This technique uses only one level of recursion and critically uses t...

We investigate the problem of deterministic pattern matching in multiple
streams. In this model, one symbol arrives at a time and is associated with one
of s streaming texts. The task at each time step is to report if there is a new
match between a fixed pattern of length m and a newly updated stream. As is
usual in the streaming context, the goal...

Group testing is a long studied problem in combinatorics: A small set of r ill people should be identified out of the whole (n people) by using only queries (tests) of the form “Does set X contain an ill human?”. In this paper we provide an explicit construction of a testing scheme which is better (smaller) than any known explicit construction. Thi...

Communication such as web browsing is often monitored and restricted by organizations and governments. Users who wish to bypass the monitoring and restrictions often relay their (encrypted) communication via proxy servers or anonymizing networks such as Tor. While this type of solution allows users to hide the content of their communication and oft...

We consider a class of pattern matching problems where a normalising
transformation is applied at every alignment. Normalised pattern matching plays
a key role in fields as diverse as image processing and musical information
processing where application specific transformations are often applied to the
input. By considering the class of polynomial...

Thorup and Zwick [J. ACM and STOC’01] in their seminal work introduced the notion of distance oracles. Given an n-vertex weighted undirected graph with m edges, they show that for any integer k ≥ 1 it is possible to preprocess the graph in \(\tilde{O}(mn^{1/k})\) time and generate a compact data structure of size \(O(kn^{1+1/k})\). For each pair of ve...

A (d,ℓ)-list disjunct matrix is a non-adaptive group testing primitive which, given a set of items with at most d “defectives,” outputs a superset of the defectives containing less than ℓ non-defective items. The primitive has found many applications as stand alone objects and as building blocks in the construction of other combinatorial objects.
T...

We present space lower bounds for online pattern matching under a number of different distance measures. Given a pattern of length m and a text that arrives one character at a time, the online pattern matching problem is to report the distance between the pattern and a sliding window of the text as soon as the new character arrives. We require that...

Given an alphabet $\Sigma = \{1,2,\ldots,|\Sigma|\}$, a text string $T \in \Sigma^n$ and a pattern string $P \in \Sigma^m$, for each $i=1,2,\ldots,n-m+1$ define $L_p(i)$ as the p-norm distance when the pattern is aligned below the text and starts at position i of the text. The problem of pattern matching with $L_p$ distan...

Cuckoo hashing [4] is a multiple choice hashing scheme in which each item can
be placed in multiple locations, and collisions are resolved by moving items to
their alternative locations. In the classical implementation of two-way cuckoo
hashing, the memory is partitioned into contiguous disjoint fixed-size buckets.
Each item is hashed to two bucket...

In this paper we extend the notion of min-wise independent family of hash functions by defining a k-min-wise independent family of hash functions. Informally, under this definition, all subsets of size k of any fixed set X have an equal chance to have the minimal hash values among all the elements in X, when the probability is over the random choic...