Chapter

The Sketching Complexity of Pattern Matching

DOI: 10.1007/978-3-540-27821-4_24
Source: DBLP

ABSTRACT: We address the problems of pattern matching and approximate pattern matching in the sketching model. We show that it is impossible to compress the text into a small sketch and use only the sketch to decide whether a given pattern occurs in the text. We also prove a sketch size lower bound for approximate pattern matching, and show that it is tight up to a logarithmic factor.
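
The abstract does not spell out the proof. A textbook-style reduction from the INDEX problem shows why a small sketch cannot support exact pattern queries. In INDEX, Alice holds $x \in \{0,1\}^n$ and Bob holds an index $i$. Any one-way randomized protocol with constant error, in which Alice sends a single message, needs $\Omega(n)$ bits. This reduction is an assumption for illustration, not necessarily the construction used in the paper, and all function names are hypothetical.

Alice writes each pair $(i, x_i)$ as a fixed-width block. Bob's pattern for index $i$ is the block for $(i, 1)$, so it occurs in $T$ exactly when $x_i = 1$. If a sketch of $T$ could answer every such query, Alice could send the sketch as her message. The sketch would then need $\Omega(n)$ bits, even though $|T| = O(n \log n)$.

```python
# Hypothetical illustration of an INDEX-to-pattern-matching reduction.
# It is not claimed to be the construction used in the paper.

def encode_block(i: int, bit: int, width: int) -> str:
    """Encode the pair (i, bit) as '#' + i in `width` binary digits + ':' + bit."""
    return "#" + format(i, f"0{width}b") + ":" + str(bit)

def index_width(n: int) -> int:
    """Number of binary digits needed to write every index 0..n-1."""
    return max(1, (n - 1).bit_length())

def text_from_bits(x: list[int]) -> str:
    """Alice's side: turn her bit string x into the text T (length O(n log n))."""
    w = index_width(len(x))
    return "".join(encode_block(i, b, w) for i, b in enumerate(x))

def pattern_for_index(i: int, n: int) -> str:
    """Bob's side: a pattern that occurs in T exactly when x[i] == 1."""
    return encode_block(i, 1, index_width(n))

def recover_bit(text: str, i: int, n: int) -> int:
    """Solve INDEX with a single exact pattern-matching query.

    '#' appears only at block starts and all blocks have equal length,
    so any occurrence of the pattern is aligned with block i.
    """
    return int(pattern_for_index(i, n) in text)

if __name__ == "__main__":
    x = [1, 0, 1, 1, 0]
    T = text_from_bits(x)
    print(T)  # #000:1#001:0#010:1#011:1#100:0
    print([recover_bit(T, i, len(x)) for i in range(len(x))])  # [1, 0, 1, 1, 0]
```

Suppose the plain substring test in `recover_bit` is replaced by a query to a pattern-matching sketch of `T`. The sketch then becomes a one-way INDEX protocol, and that is where the size lower bound comes from.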

    • "Our results come from lower-bounding the information cost of a novel one-way communication complexity problem. One can view our results as a strengthening of the augmented-indexing problem [9] [10] [18] [28] [35] to very large domains. Our technique is far-reaching, implying the first lower bounds for the space complexity of streaming algorithms that depend on the error probability δ."
    ABSTRACT: The Johnson-Lindenstrauss transform is a dimensionality reduction technique with a wide range of applications to theoretical computer science. It is specified by a distribution over projection matrices from $\mathbb{R}^d \to \mathbb{R}^k$, where $k \ll d$, and states that $k = O(\varepsilon^{-2} \log(1/\delta))$ dimensions suffice to approximate the norm of any fixed vector in $\mathbb{R}^d$ to within a factor of $1 \pm \varepsilon$ with probability at least $1 - \delta$. In this paper we show that this bound on $k$ is optimal up to a constant factor, improving upon a previous $\Omega(\varepsilon^{-2} \log(1/\delta) / \log(1/\varepsilon))$ dimension bound of Alon. Our techniques are based on lower-bounding the information cost of a novel one-way communication game and yield the first space lower bounds in a data stream model that depend on the error probability $\delta$.
    Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2011, San Francisco, California, USA, January 23-25, 2011; 05/2011
    • "The lower bound for general (randomized) codes and the direct sum theorem are proved via information theory arguments. We extend previous arguments from [17], [18] to obtain a direct sum theorem for the information cost of codes."
    ABSTRACT: Motivated by a problem of transmitting supplemental data over broadcast channels (Birk and Kol, INFOCOM 1998), we study the following coding problem: a sender communicates with $n$ receivers $R_1, \ldots, R_n$. He holds an input $x \in \{0,1\}^n$ and wishes to broadcast a single message so that each receiver $R_i$ can recover the bit $x_i$. Each $R_i$ has prior side information about $x$, induced by a directed graph $G$ on $n$ nodes; $R_i$ knows the bits of $x$ in the positions $\{j \mid (i,j) \text{ is an edge of } G\}$. $G$ is known to the sender and to the receivers. We call encoding schemes that achieve this goal INDEX codes for $\{0,1\}^n$ with side information graph $G$. In this paper we identify a measure on graphs, the minrank, which exactly characterizes the minimum length of linear and certain types of nonlinear INDEX codes. We show that for natural classes of side information graphs, including directed acyclic graphs, perfect graphs, odd holes, and odd anti-holes, minrank is the optimal length of arbitrary INDEX codes. For arbitrary INDEX codes and arbitrary graphs, we obtain a lower bound in terms of the size of the maximum acyclic induced subgraph. This bound holds even for randomized codes, but has been shown not to be tight.
    IEEE Transactions on Information Theory 04/2011; 57(3):1479-1494. DOI:10.1109/TIT.2010.2103753
    • "The lower bound for general (randomized) codes and the direct sum theorem are proved via information theory arguments. We extend previous arguments from [5] [4] to obtain a direct sum theorem for the information cost of codes. Finally, our lower bounds for odd holes and odd antiholes are purely combinatorial."
    ABSTRACT: Motivated by a problem of transmitting data over broadcast channels (Birk and Kol, INFOCOM 1998), we study the following coding problem: a sender communicates with $n$ receivers $R_1, \ldots, R_n$. He holds an input $x \in \{0,1\}^n$ and wishes to broadcast a single message so that each receiver $R_i$ can recover the bit $x_i$. Each $R_i$ has prior side information about $x$, induced by a directed graph $G$ on $n$ nodes; $R_i$ knows the bits of $x$ in the positions $\{j \mid (i,j) \text{ is an edge of } G\}$. We call encoding schemes that achieve this goal INDEX codes for $\{0,1\}^n$ with side information graph $G$. In this paper we identify a measure on graphs, the minrank, which we conjecture to exactly characterize the minimum length of INDEX codes. We resolve the conjecture for certain natural classes of graphs. For arbitrary graphs, we show that the minrank bound is tight for both linear codes and certain classes of non-linear codes. For the general problem, we obtain a (weaker) lower bound that the length of an INDEX code for any graph $G$ is at least the size of the maximum acyclic induced subgraph of $G$.
    47th Annual IEEE Symposium on Foundations of Computer Science (FOCS '06), 2006; 11/2006
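
The Johnson-Lindenstrauss abstract above states that $k = O(\varepsilon^{-2} \log(1/\delta))$ dimensions suffice to preserve the norm of a fixed vector to within $1 \pm \varepsilon$ with probability at least $1 - \delta$. One common way to realize such a distribution is a dense Gaussian matrix with i.i.d. $N(0, 1/k)$ entries.

The sketch below does three things:
- picks $k$ from $\varepsilon$ and $\delta$;
- projects one fixed vector with many independently drawn matrices;
- counts how often the projected norm leaves the $1 \pm \varepsilon$ window.

The function names are hypothetical, and the constant `c = 8` is an assumed value, not one taken from the paper. The sketch illustrates the upper bound that the paper proves optimal and says nothing about its lower-bound technique.

```python
import math
import random

def jl_dimension(eps: float, delta: float, c: float = 8.0) -> int:
    """Target dimension k = ceil(c * eps**-2 * ln(1/delta)); c is an assumed constant."""
    return math.ceil(c * math.log(1.0 / delta) / eps ** 2)

def gaussian_projection(d: int, k: int, rng: random.Random) -> list[list[float]]:
    """Draw a k x d matrix with i.i.d. N(0, 1/k) entries, so E||Av||^2 = ||v||^2."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(matrix: list[list[float]], v: list[float]) -> list[float]:
    """Matrix-vector product Av."""
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]

def norm(v: list[float]) -> float:
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in v))

def failure_rate(d: int, eps: float, delta: float, trials: int, seed: int = 0) -> float:
    """Fraction of independently drawn matrices A with ||Av|| outside (1 +/- eps)||v||."""
    rng = random.Random(seed)
    k = jl_dimension(eps, delta)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # one fixed vector in R^d
    failures = 0
    for _ in range(trials):
        ratio = norm(project(gaussian_projection(d, k, rng), v)) / norm(v)
        if not (1.0 - eps <= ratio <= 1.0 + eps):
            failures += 1
    return failures / trials

if __name__ == "__main__":
    print(jl_dimension(0.5, 0.1))  # 74
    print(failure_rate(d=50, eps=0.5, delta=0.1, trials=100))
```

With $\varepsilon = 0.5$ and $\delta = 0.1$ this gives $k = 74$. The observed failure rate should sit well below $\delta$, because the assumed constant is conservative. Halving $\varepsilon$ roughly quadruples $k$, which is the $\varepsilon^{-2}$ dependence that the abstract shows cannot be improved.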

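
The two index-coding abstracts above bound the optimal INDEX code length from below by the size of a maximum acyclic induced subgraph (MAIS) of the side information graph $G$. The Python sketch below is a hedged toy illustration; all function names and the example graph are hypothetical. It computes that lower bound by brute force and pairs it with a simple upper bound: a greedy clique-cover code that broadcasts one parity bit per group of receivers who all know each other's bits. The clique-cover code is a standard linear scheme chosen for illustration. It is not the minrank construction from the papers, and greedy grouping need not be optimal.

```python
from itertools import combinations

# Hypothetical toy example. knows[i] is the set of positions j with (i, j) an
# edge of G, i.e. the bits x_j that receiver R_i already has as side information.
EXAMPLE = [{1}, {0}, {3}, {2}, {5}, {6}, {4}]  # two mutual pairs + a directed 3-cycle

def is_acyclic(nodes, knows) -> bool:
    """Kahn's algorithm on the subgraph of G induced by `nodes`."""
    nodes = set(nodes)
    indeg = {v: 0 for v in nodes}
    for u in nodes:
        for w in knows[u] & nodes:
            indeg[w] += 1
    stack = [v for v in nodes if indeg[v] == 0]
    removed = 0
    while stack:
        u = stack.pop()
        removed += 1
        for w in knows[u] & nodes:
            indeg[w] -= 1
            if indeg[w] == 0:
                stack.append(w)
    return removed == len(nodes)

def mais(knows) -> int:
    """Size of a maximum acyclic induced subgraph (brute force, tiny graphs only)."""
    n = len(knows)
    for size in range(n, 0, -1):
        if any(is_acyclic(s, knows) for s in combinations(range(n), size)):
            return size
    return 0

def greedy_clique_cover(knows) -> list[list[int]]:
    """Greedily group receivers so that any two members know each other's bits."""
    groups: list[list[int]] = []
    for v in range(len(knows)):
        for g in groups:
            if all(u in knows[v] and v in knows[u] for u in g):
                g.append(v)
                break
        else:  # no existing group fits, so start a new one
            groups.append([v])
    return groups

def encode(x, groups) -> list[int]:
    """Broadcast one parity bit per group: a linear index code of length len(groups)."""
    message = []
    for g in groups:
        parity = 0
        for v in g:
            parity ^= x[v]
        message.append(parity)
    return message

def decode(i, message, groups, side_info) -> int:
    """R_i cancels the other members of its group using only its side information."""
    for g, parity in zip(groups, message):
        if i in g:
            for v in g:
                if v != i:
                    parity ^= side_info[v]  # v is in knows[i] by construction
            return parity
    raise ValueError(f"receiver {i} is not in any group")

if __name__ == "__main__":
    groups = greedy_clique_cover(EXAMPLE)
    print(groups)         # [[0, 1], [2, 3], [4], [5], [6]]
    print(mais(EXAMPLE))  # 4
    n = len(EXAMPLE)
    for m in range(2 ** n):  # every input x in {0,1}^n
        x = [(m >> b) & 1 for b in range(n)]
        msg = encode(x, groups)
        for i in range(n):
            side_info = {j: x[j] for j in EXAMPLE[i]}  # only the bits R_i knows
            assert decode(i, msg, groups, side_info) == x[i]
    print("all receivers decode correctly")
```

For `EXAMPLE`, the greedy cover uses 5 bits and the MAIS is 4. Every INDEX code for this graph therefore needs at least 4 bits, and the parity code shows that 5 suffice. These two simple bounds leave a gap that finer measures such as minrank are meant to close.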