Ely Porat

Bar Ilan University | BIU · Department of Computer Science

179
Publications
12,249
Reads
2,968
Citations

Publications (179)
Preprint
Full-text available
A Dyck sequence is a sequence of opening and closing parentheses (of various types) that is balanced. The Dyck edit distance of a given sequence of parentheses $S$ is the smallest number of edit operations (insertions, deletions, and substitutions) needed to transform $S$ into a Dyck sequence. We consider the threshold Dyck edit distance problem, w...
Chapter
Filters (such as Bloom Filters) are a fundamental data structure that speed up network routing and measurement operations by storing a compressed representation of a set. Filters are very space efficient, but can make bounded one-sided errors: with tunable probability \(\epsilon \), they may report that a query element is stored in the filter when...
Preprint
For any forest $G = (V, E)$ it is possible to orient the edges $E$ so that no vertex in $V$ has out-degree greater than $1$. This paper considers the incremental edge-orientation problem, in which the edges $E$ arrive over time and the algorithm must maintain a low-out-degree edge orientation at all times. We give an algorithm that maintains a maxi...
Preprint
Full-text available
In this work, we revisit the fundamental and well-studied problem of approximate pattern matching under edit distance. Given an integer $k$, a pattern $P$ of length $m$, and a text $T$ of length $n \ge m$, the task is to find substrings of $T$ that are within edit distance $k$ from $P$. Our main result is a streaming algorithm that solves the probl...
Preprint
Full-text available
Filters (such as Bloom Filters) are data structures that speed up network routing and measurement operations by storing a compressed representation of a set. Filters are space efficient, but can make bounded one-sided errors: with tunable probability epsilon, they may report that a query element is stored in the filter when it is not. This is calle...
Article
Full-text available
We formalize and examine the online Dictionary Recognition with One Gap problem (DROG) which is the following. Preprocess a dictionary D of d patterns each containing a special gap symbol that matches any string, so that given a text arriving online a character at a time, all patterns from D which are suffixes of the text that has arrived so far an...
Preprint
In population protocols, the underlying distributed network consists of $n$ nodes (or agents), denoted by $V$, and a scheduler that continuously selects uniformly random pairs of nodes to interact. When two nodes interact, their states are updated by applying a state transition function that depends only on the states of the two nodes prior to the...
Preprint
Full-text available
The shift distance $\mathsf{sh}(S_1,S_2)$ between two strings $S_1$ and $S_2$ of the same length is defined as the minimum Hamming distance between $S_1$ and any rotation (cyclic shift) of $S_2$. We study the problem of sketching the shift distance, which is the following communication complexity problem: Strings $S_1$ and $S_2$ of length $n$ are g...
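The definition of shift distance in the abstract above can be illustrated with a brute-force computation. This is only a minimal sketch of the definition, not the sketching protocol studied in the paper; the function names `hamming` and `shift_distance` are hypothetical and chosen for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions where the equal-length strings a and b differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def shift_distance(s1: str, s2: str) -> int:
    """Minimum Hamming distance between s1 and any rotation of s2.

    Brute force over all n rotations, so O(n^2) time overall.
    """
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    n = len(s2)
    if n == 0:
        return 0
    # s2[r:] + s2[:r] is s2 rotated left by r positions.
    return min(hamming(s1, s2[r:] + s2[:r]) for r in range(n))
```

For example, `shift_distance("abcd", "cdab")` is 0 because `"cdab"` is a rotation of `"abcd"`.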
Preprint
We revisit the $k$-mismatch problem in the streaming model on a pattern of length $m$ and a streaming text of length $n$, both over a size-$\sigma$ alphabet. The current state-of-the-art algorithm for the streaming $k$-mismatch problem, by Clifford et al. [SODA 2019], uses $\tilde O(k)$ space and $\tilde O\big(\sqrt k\big)$ worst-case time per char...
Preprint
We revisit a fundamental problem in string matching: given a pattern of length m and a text of length n, both over an alphabet of size $\sigma$, compute the Hamming distance between the pattern and the text at every location. Several $(1+\epsilon)$-approximation algorithms have been proposed in the literature, with running time of the form $O(\epsi...
Preprint
In the $\{-1,0,1\}$-APSP problem the goal is to compute all-pairs shortest paths (APSP) on a directed graph whose edge weights are all from $\{-1,0,1\}$. In the (min,max)-product problem the input is two $n\times n$ matrices $A$ and $B$, and the goal is to output the (min,max)-product of $A$ and $B$. This paper provides a new algorithm for the $\{-...
Preprint
In the SetDisjointness problem, a collection of $m$ sets $S_1,S_2,...,S_m$ from some universe $U$ is preprocessed in order to answer queries on the emptiness of the intersection of some two query sets from the collection. In the SetIntersection variant, all the elements in the intersection of the query sets are required to be reported. These are tw...
Preprint
In the 3SUM-Indexing problem the goal is to preprocess two lists of elements from $U$, $A=(a_1,a_2,\ldots,a_n)$ and $B=(b_1,b_2,\ldots,b_n)$, such that given an element $c\in U$ one can quickly determine whether there exists a pair $(a,b)\in A \times B$ where $a+b=c$. Goldstein et al. [WADS 2017] conjectured that there is no algorithm for 3SUM-Indexin...
Chapter
In the classic dictionary matching problem, the input is a dictionary of patterns \(\mathcal {D}=\{P_1,P_2,\ldots ,P_k\}\) and a text T, and the goal is to report all the occurrences in T of every pattern from \(\mathcal {D}\). In the dynamic version of the dictionary matching problem, patterns may be either added or removed from \(\mathcal {D}\)....
Article
Full-text available
In the pattern matching with $d$ wildcards problem one is given a text $T$ of length $n$ and a pattern $P$ of length $m$ that contains $d$ wildcard characters, each denoted by a special symbol $'?'$. A wildcard character matches any other character. The goal is to establish for each $m$-length substring of $T$ whether it matches $P$. In the streami...
Preprint
We consider two closely related problems of text indexing in a sub-linear working space. The first problem is the Sparse Suffix Tree (SST) construction of a set of suffixes $B$ using only $O(|B|)$ words of space. The second problem is the Longest Common Extension (LCE) problem, where for some parameter $1\le\tau\le n$, the goal is to construct a da...
Article
Full-text available
We examine the complexity of the online Dictionary Matching with One Gap Problem (DMOG) which is the following. Preprocess a dictionary D of d patterns, where each pattern contains a special gap symbol that can match any string, so that given a text that arrives online, a character at a time, we can report all of the patterns from D that are suffix...
Conference Paper
This paper gives a new deterministic algorithm for the dynamic Minimum Spanning Forest (MSF) problem in the EREW PRAM model, where the goal is to maintain a MSF of a weighted graph with n vertices and m edges while supporting edge insertions and deletions. We show that one can solve the dynamic MSF problem using $O(\sqrt n)$ processors and $O(\log n...
Preprint
In the kSUM problem we are given an array of numbers $a_1,a_2,...,a_n$ and we are required to determine if there are $k$ different elements in this array such that their sum is 0. This problem is a parameterized version of the well-studied SUBSET-SUM problem, and a special case is the 3SUM problem that is extensively used for proving conditional ha...
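The kSUM decision problem stated in the abstract above can be made concrete with an exhaustive check. This is a minimal sketch under the assumption that "k different elements" means elements at k distinct positions of the array; the name `has_k_sum` is hypothetical, and the brute force takes time on the order of n^k, so it only illustrates the problem statement.

```python
from itertools import combinations
from typing import Sequence


def has_k_sum(values: Sequence[int], k: int) -> bool:
    """Return True if some k entries at distinct positions sum to 0."""
    if k <= 0:
        raise ValueError("k must be positive")
    # combinations() picks k distinct positions; it yields nothing if k > len(values).
    return any(sum(combo) == 0 for combo in combinations(values, k))
```

With `k = 3` this is the 3SUM problem mentioned in the abstract, for example `has_k_sum([1, 2, -3, 7], 3)` holds because 1 + 2 + (-3) = 0.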
Preprint
This paper gives a new deterministic algorithm for the dynamic Minimum Spanning Forest (MSF) problem in the EREW PRAM model, where the goal is to maintain a MSF of a weighted graph with $n$ vertices and $m$ edges while supporting edge insertions and deletions. We show that one can solve the dynamic MSF problem using $O(\sqrt n)$ processors and $O(\...
Article
Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a new algorithm to process these queries optimally in terms of worst-case data complexity. Our result builds on recent work by Atserias, Grohe, and Marx,...
Article
Full-text available
We consider the problem of approximate pattern matching in a stream. In the streaming $k$-mismatch problem, we must compute all Hamming distances between a pattern of length $n$ and successive $n$-length substrings of a longer text, as long as the Hamming distance is at most $k$. The twin challenges of streaming pattern matching derive from the nee...
Article
In recent years much effort was put into developing polynomial-time conditional lower bounds for algorithms and data structures in both static and dynamic settings. Along these lines we suggest a framework for proving conditional lower bounds based on the well-known 3SUM conjecture. Our framework creates a compact representation of an instan...
Conference Paper
In recent years much effort has been concentrated towards achieving polynomial time lower bounds on algorithms for solving various well-known problems. A useful technique for showing such lower bounds is to prove them conditionally based on well-studied hardness assumptions such as 3SUM, APSP, SETH, etc. This line of research helps to obtain a bett...
Article
An approximate sparse recovery system in ℓ₁ norm consists of parameters k, ε, N; an m-by-N measurement Φ; and a recovery algorithm R. Given a vector x, the system approximates x by x̂ = R(Φx), which must satisfy ‖x̂ − x‖₁ ≤ (1+ε)‖x − x_k‖₁. We consider the “for all” model, in which a single...
Article
In this paper we introduce a general framework that exponentially improves the space, degree of independence, and time needed by min-wise-based algorithms. The authors, in SODA '11 [1], introduced an exponential time improvement for min-wise-based algorithms. Here we develop an alternative approach that achieves both exponential time and exponentia...
Conference Paper
We introduce the Holiday Gathering Problem which models the difficulty in scheduling non-interfering transmissions in (wireless) networks. Our goal is to schedule transmission rounds so that the antennas that transmit in a given round will not interfere with each other, i.e. all of the other antennas that can interfere will not transmit in that rou...
Article
Full-text available
We propose a method for efficiently finding all parallel passages in a large corpus, even if the passages are not quite identical due to rephrasing and orthographic variation. The key ideas are the representation of each word in the corpus by its two most infrequent letters, finding matched pairs of strings of four or five words that differ by at m...
Article
In the classical pattern-matching problem, one is given a text and a pattern both of which are sequences of letters. The requirement is to find all occurrences of the pattern in the text. We studied two modifications of the classical problem, where each letter in the text and pattern is a set (Set Intersection Matching problem) or a sequence (Seque...
Conference Paper
Full-text available
We revisit the complexity of one of the most basic problems in pattern matching. In the k-mismatch problem we must compute the Hamming distance between a pattern of length m and every m-length substring of a text of length n, as long as that Hamming distance is at most k. Where the Hamming distance is greater than k at some alignment of the pattern...
Article
Full-text available
Computing the Hamming distance between a given pattern of length $m$ and each location in a text of length $n$ is one of the most fundamental tasks in string algorithms. Unfortunately, there is evidence that for a text $T$ of size $n$ and a pattern $P$ of size $m$, one cannot compute the exact Hamming distance f...
Article
In this work, we focus on building an efficient succinct dynamic dictionary that significantly improves the query time of the current best known results. The algorithm that we propose suffers from only a (Formula presented.) multiplicative slowdown in its query time and a (Formula presented.) slowdown for insertion and deletion operations, where n...
Conference Paper
Consider the problem of maintaining a family F of dynamic sets subject to insertions, deletions, and set-intersection reporting queries: given \(S,S'\in F\), report every member of \(S\cap S'\) in any order. We show that in the word RAM model, where w is the word size, given a cap d on the maximum size of any set, we can support set intersection qu...
Article
Full-text available
We consider distance labeling schemes for trees: given a tree with $n$ nodes, label the nodes with binary strings such that, given the labels of any two nodes, one can determine, by looking only at the labels, the distance in the tree between the two nodes. A lower bound by Gavoille et al. (J. Alg. 2004) and an upper bound by Peleg (J. Graph Theor...
Article
Full-text available
A distance labeling scheme labels the $n$ nodes of a graph with binary strings such that, given the labels of any two nodes, one can determine the distance in the graph between the two nodes by looking only at the labels. A $D$-preserving distance labeling scheme only returns precise distances between pairs of nodes that are at distance at least $D...
Article
We propose an approach for approximating the Jaccard similarity of two streams for domains where this similarity is known to be high. Our method is based on a reduction from Jaccard similarity to norm estimation, for which there exists a sketch that is efficient in terms of both size and compute time, which we augment by a sampling technique. Ou...
Article
Full-text available
We consider the problem of dictionary matching in a stream. Given a set of strings, known as a dictionary, and a stream of characters arriving one at a time, the task is to report each time some string in our dictionary occurs in the stream. We present a randomised algorithm which takes O(log log(k + m)) time per arriving character and uses O(k log...
Article
Full-text available
The dictionary matching with gaps problem is to preprocess a dictionary D of total size containing d gapped patterns over an alphabet Σ, where each gapped pattern is a sequence of subpatterns separated by bounded sequences of don't cares. Then, given a query text T of length n over Σ, the goal is to output all locations in T in which a pattern , ,...
Article
Hardware-based packet classification has become an essential component in many networking devices. It often relies on ternary content-addressable memories (TCAMs), which compare the packet header against a set of rules. TCAMs are not well suited to encode range rules. Range rules are often encoded by multiple TCAM entries, and little is known about...
Book
This book constitutes the refereed proceedings of the 26th Annual Symposium on Combinatorial Pattern Matching, CPM 2015, held on Ischia Island, Italy, in June/July 2015. The 34 revised full papers presented together with 3 invited talks were carefully reviewed and selected from 83 submissions. The papers address issues of searching and matching str...
Article
Full-text available
Efficient handling of sparse data is a key challenge in Computer Science. Binary convolutions, such as polynomial multiplication or the Walsh Transform, are a useful tool in many applications and can be computed efficiently. In the last decade, several problems have required efficient solutions of sparse binary convolutions. Both randomized and deterministic a...
Conference Paper
Full-text available
The dictionary matching with gaps problem is to preprocess a dictionary D of d gapped patterns P_1, …, P_d over alphabet Σ, where each gapped pattern P_i is a sequence of subpatterns separated by bounded sequences of don’t cares. Then, given a query text T of length n over alphabet Σ, the goal is to output all locations in T in which a pattern P_i ∈...
Article
Full-text available
We introduce and examine the Holiday Gathering Problem, which models the difficulty that couples have when trying to decide with which parents they should spend the holiday. Our goal is to schedule the family gatherings so that the parents that will be happy, i.e. all their children will be home simultaneously for the holiday fest...
Article
Full-text available
We examine several (dynamic) graph and set intersection problems in the word-RAM model with word size $w$. We begin with Dynamic Connectivity where we need to maintain a fully dynamic graph $G=(V,E)$ with $n=|V|$ vertices while supporting $(s,t)$-connectivity queries. To do this, we provide a new simplified worst-case solution for the famous Dynami...
Article
Full-text available
We prove lower bounds for several (dynamic) data structure problems conditioned on the well known conjecture that 3SUM cannot be solved in $O(n^{2-\Omega(1)})$ time. This continues a line of work that was initiated by Patrascu [STOC 2010] and strengthened recently by Abboud and Vassilevska-Williams [FOCS 2014]. The problems we consider are from sev...
Article
We consider the problem of broadcasting a message from a sender to n ≥ 1 receivers in a time-slotted, single-hop, wireless network with a single communication channel. Sending and listening dominate the energy usage of small wireless devices and this is abstracted as a unit cost per time slot. A jamming adversary exists who can disrupt the channel...
Conference Paper
In this work, we focus on building an efficient succinct dynamic dictionary that significantly improves the query time of the current best known results. The algorithm that we propose suffers from only an \(O((\log\log n)^2)\) multiplicative slowdown in its query time and a \(O(\frac{1}{\epsilon} \log n)\) slowdown for insertion and deletion operations, wh...
Article
Recently, a new pattern matching paradigm was proposed, pattern matching with address errors. In this paradigm approximate string matching problems are studied, where the content is unaltered and only the locations of the different entries may change. Specifically, a broad class of problems was defined—the class of rearrangement errors. In this typ...
Article
Full-text available
Histogram indexing, also known as jumbled pattern indexing and permutation indexing, is one of the important current open problems in pattern matching. It was introduced about 6 years ago and has seen active research since. Yet, to date there is no algorithm that can preprocess a text T in time o(|T|^2/polylog |T|) and achieve histogram indexing, ev...
Conference Paper
Full-text available
An approximate sparse recovery system in $\ell_1$ norm consists of parameters $k$, $\epsilon$, $N$, an $m$-by-$N$ measurement $\Phi$, and a recovery algorithm, $\mathcal{R}$. Given a vector, $\mathbf{x}$, the system approximates $x$ by $\widehat{\mathbf{x}} = \mathcal{R}(\Phi\mathbf{x})$, which must satisfy $\|\widehat{\mathbf{x}}-\mathbf{x}\|_1 \l...
Article
Full-text available
We consider how selfish agents are likely to share revenues derived from maintaining connectivity between important network servers. We model a network where a failure of one node may disrupt communication between other nodes as a cooperative game called the vertex Connectivity Game (CG). In this game, each agent owns a vertex, and controls all the...
Article
We demonstrate how crowdsourcing can be used to automatically build a personalized tourist attraction recommender system, which tailors recommendations to specific individuals, so different people who use the system each get their own list of recommendations, appropriate to their own traits. Recommender systems crucially depend on the availability...
Conference Paper
Full-text available
In edge orientations, the goal is usually to orient (direct) the edges of an undirected network (modeled by a graph) such that all out-degrees are bounded. When the network is fully dynamic, i.e., admits edge insertions and deletions, we wish to maintain such an orientation while keeping a tab on the update time. Low out-degree orientations turned...
Conference Paper
A key building block for collaborative filtering recommender systems is finding users with similar consumption patterns. Given access to the full data regarding the items consumed by each user, one can directly compute the similarity between any two users. However, for massive recommender systems such a naive approach requires a high running time a...
Article
Fingerprinting is a widely-used technique for efficiently verifying that two files are identical. More generally, linear sketching is a form of lossy compression (based on random projections) that also enables the "dissimilarity" of non-identical files to be estimated. Many sketches have been proposed for dissimilarity measures that decompose coord...
Conference Paper
Full-text available
In this paper, we consider the “foreach” sparse recovery problem with failure probability p. The goal of the problem is to design a distribution over m × N matrices Φ and a decoding algorithm A such that for every x ∈ ℝ^N, we have with probability at least 1 − p $$\|\mathbf{x}-A(\Phi\mathbf{x})\|_2\leqslant C\|\mathbf{x}-\mathbf{x}_k\|_2,$$ where x...
Conference Paper
Hardware-based packet classification has become an essential component in many networking devices. It often relies on TCAMs (ternary content-addressable memories), which need to compare the packet header against a set of rules. But efficiently encoding these rules is not an easy task. In particular, the most complicated rules are range rules, which...
Conference Paper
We study the problem of generating a large sample from a data stream of elements (i, v), where the sample consists of pairs (i, C_i) for C_i = ∑_{(i,v) ∈ stream} v. We consider strict turnstile streams and general non-strict turnstile streams, in which C_i may be negative. Our sample is useful for approximating both forward and inverse distribution sta...
Conference Paper
Network coding helps maximize the network throughput. However, such schemes are also vulnerable to pollution attacks in which malicious forwarders inject polluted messages into the system. Traditional cryptographic solutions, such as digital signatures, are not suited for network coding, in which nodes do not forward the original packets, but rather...
Conference Paper
Full-text available
Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a novel algorithm to process these queries optimally in terms of worst-case data complexity. Our result builds on recent work by Atserias, Grohe, and Marx...
Conference Paper
We present two recursive techniques to construct compressed sensing schemes that can be “decoded” in sub-linear time. The first technique is based on the well studied code composition method called code concatenation where the “outer” code has strong list recoverability properties. This technique uses only one level of recursion and critically uses t...
Conference Paper
Full-text available
We investigate the problem of deterministic pattern matching in multiple streams. In this model, one symbol arrives at a time and is associated with one of s streaming texts. The task at each time step is to report if there is a new match between a fixed pattern of length m and a newly updated stream. As is usual in the streaming context, the goal...
Conference Paper
Full-text available
Group testing is a long studied problem in combinatorics: A small set of r ill people should be identified out of the whole (n people) by using only queries (tests) of the form “Does set X contain an ill human?”. In this paper we provide an explicit construction of a testing scheme which is better (smaller) than any known explicit construction. Thi...
Conference Paper
Full-text available
Communication such as web browsing is often monitored and restricted by organizations and governments. Users who wish to bypass the monitoring and restrictions often relay their (encrypted) communication via proxy servers or anonymizing networks such as Tor. While this type of solution allows users to hide the content of their communication and oft...
Article
Full-text available
We consider a class of pattern matching problems where a normalising transformation is applied at every alignment. Normalised pattern matching plays a key role in fields as diverse as image processing and musical information processing where application specific transformations are often applied to the input. By considering the class of polynomial...
Conference Paper
Thorup and Zwick [J. ACM and STOC’01] in their seminal work introduced the notion of distance oracles. Given an n-vertex weighted undirected graph with m edges, they show that for any integer k ≥ 1 it is possible to preprocess the graph in \(\tilde{O}(mn^{1/k})\) time and generate a compact data structure of size \(O(kn^{1+1/k})\). For each pair of ve...
Conference Paper
A (d,ℓ)-list disjunct matrix is a non-adaptive group testing primitive which, given a set of items with at most d “defectives,” outputs a superset of the defectives containing less than ℓ non-defective items. The primitive has found many applications as stand alone objects and as building blocks in the construction of other combinatorial objects. T...
Conference Paper
We present space lower bounds for online pattern matching under a number of different distance measures. Given a pattern of length m and a text that arrives one character at a time, the online pattern matching problem is to report the distance between the pattern and a sliding window of the text as soon as the new character arrives. We require that...
Article
Given an alphabet Σ = {1, 2, …, |Σ|}, a text string T ∈ Σ^n and a pattern string P ∈ Σ^m, for each i = 1, 2, …, n−m+1 define L_p(i) as the p-norm distance when the pattern is aligned below the text and starts at position i of the text. The problem of pattern matching with L_p distan...
Article
Cuckoo hashing [4] is a multiple choice hashing scheme in which each item can be placed in multiple locations, and collisions are resolved by moving items to their alternative locations. In the classical implementation of two-way cuckoo hashing, the memory is partitioned into contiguous disjoint fixed-size buckets. Each item is hashed to two bucket...
Conference Paper
In this paper we extend the notion of min-wise independent family of hash functions by defining a k-min-wise independent family of hash functions. Informally, under this definition, all subsets of size k of any fixed set X have an equal chance to have the minimal hash values among all the elements in X, when the probability is over the random choic...