Sketching in Adversarial Environments

SIAM Journal on Computing (Impact Factor: 0.74). 01/2011; 40(6):1845-1870. DOI: 10.1145/1374376.1374471
Source: DBLP


We formalize a realistic model for computations over massive data sets. The model, referred to as the adversarial sketch model, unifies the well-studied sketch and data stream models together with a cryptographic flavor that considers the execution of protocols in "hostile environments", and provides a framework for studying the complexity of many tasks involving massive data sets. The adversarial sketch model consists of several participating parties: honest parties, whose goal is to compute a pre-determined function of their inputs, and an adversarial party. Computation in this model proceeds in two phases. In the first phase, the adversarial party chooses the inputs of the honest parties. These inputs are sets of elements taken from a large universe, and provided to the honest parties in an on-line manner in the form of a sequence of insert and delete operations. Once an operation from the sequence has been processed it is discarded and cannot be retrieved unless explicitly stored. During this phase the honest parties are not allowed to communicate. Moreover, they do not share any secret information and any public information they share is known to the adversary in advance. In the second phase, the honest parties engage in a protocol in order to compute a pre-determined function of their inputs. In this paper we settle the complexity (up to logarithmic factors) of two fundamental problems in this model: testing whether two massive data sets are equal, and approximating the size of their symmetric difference. We construct explicit and efficient protocols with sublinear sketches of essentially optimal size, poly-logarithmic update time during the first phase, and poly-logarithmic communication and computation during the second phase. Our main technical contribution is an explicit and deterministic encoding scheme that enjoys two seemingly conflicting properties: incrementality and high distance, which may be of independent interest.
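To make the two-phase setting concrete, the following is a minimal sketch of the classical, non-adversarial baseline for equality testing over insert/delete streams: a polynomial fingerprint that is updated incrementally. It is not the paper's construction. It relies on a secret random evaluation point `r` shared by both parties, which is exactly what the adversarial sketch model rules out. The class name `SetFingerprint`, the modulus `P`, and the assumption that elements are non-negative integers are illustrative choices, not taken from the paper.

```python
import random

# Assumption: a fixed Mersenne prime as the field size (illustrative choice).
P = (1 << 61) - 1


class SetFingerprint:
    """Incremental fingerprint of a set S of non-negative integers.

    The fingerprint is sum(r**x for x in S) mod P, i.e. the characteristic
    polynomial of S evaluated at the point r. Each operation is processed
    once and then discarded; only `value` is stored.
    """

    def __init__(self, r):
        self.r = r          # shared secret evaluation point (NOT allowed in the adversarial model)
        self.value = 0      # the entire sketch: one field element

    def insert(self, x):
        self.value = (self.value + pow(self.r, x, P)) % P

    def delete(self, x):
        # Assumes x is currently in the set, as in a valid operation sequence.
        self.value = (self.value - pow(self.r, x, P)) % P


# Phase 1: each party processes its own stream without communicating.
r = random.SystemRandom().randrange(1, P)
alice, bob = SetFingerprint(r), SetFingerprint(r)
for x in (3, 10, 7):
    alice.insert(x)
alice.delete(10)            # Alice's set is now {3, 7}
for x in (7, 3):
    bob.insert(x)           # Bob's set is {3, 7}, reached in a different order

# Phase 2: the parties exchange one field element each and compare.
print(alice.value == bob.value)
```

Equal sets always produce equal fingerprints. Two distinct subsets of a universe of size N collide only if `r` is a root of a nonzero polynomial of degree below N, so the collision probability is at most (N - 1) / P when `r` is uniform and hidden from whoever chooses the inputs. An adversary who knows `r` in advance can pick colliding inputs, which is why the model calls for a deterministic encoding with high distance instead.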

  • Source
    • "On the other hand, we can use the special structure of the path-quality monitoring setting to prove new analytical bounds which result in provably lower communication and storage requirements than those typically needed in traffic characterization applications. Also, at the end of Section 5.3 we discuss how the new result of [35] for sketching adversarially-chosen sets could be applied to our setting. "
    ABSTRACT: Edge networks connected to the Internet need effective monitoring techniques to drive routing decisions and detect violations of Service Level Agreements (SLAs). However, existing measurement tools, like ping, traceroute, and trajectory sampling, are vulnerable to attacks that can make a path look better than it really is. In this paper, we design and analyze path-quality monitoring protocols that reliably raise an alarm when the packet-loss rate and delay exceed a threshold, even when an adversary tries to bias monitoring results by selectively delaying, dropping, modifying, injecting, or preferentially treating packets. Despite the strong threat model we consider in this paper, our protocols are efficient enough to run at line rate on high-speed routers. We present a secure sketching protocol for identifying when packet loss and delay degrade beyond a threshold. This protocol is extremely lightweight, requiring only 250-600 bytes of storage and periodic transmission of a comparably sized IP packet to monitor billions of packets. We also present secure sampling protocols that provide faster feedback and accurate round-trip delay estimates, at the expense of somewhat higher storage and communication costs. We prove that all our protocols satisfy a precise definition of secure path-quality monitoring and derive analytic expressions for the trade-off between statistical accuracy and system overhead. We also compare how our protocols perform in the client-server setting, when paths are asymmetric, and when packet marking is not permitted.
    Full-text · Conference Paper · Jun 2008
  • Source
    ABSTRACT: Linear sketches are powerful algorithmic tools that turn an n-dimensional input into a concise lower-dimensional representation via a linear transformation. Such sketches have seen a wide range of applications including norm estimation over data streams, compressed sensing, and distributed computing. In almost any realistic setting, however, a linear sketch faces the possibility that its inputs are correlated with previous evaluations of the sketch. Known techniques no longer guarantee the correctness of the output in the presence of such correlations. We therefore ask: Are linear sketches inherently non-robust to adaptively chosen inputs? We give a strong affirmative answer to this question. Specifically, we show that no linear sketch approximates the Euclidean norm of its input to within an arbitrary multiplicative approximation factor on a polynomial number of adaptively chosen inputs. The result remains true even if the dimension of the sketch is d = n - o(n) and the sketch is given unbounded computation time. Our result is based on an algorithm with running time polynomial in d that adaptively finds a distribution over inputs on which the sketch is incorrect with constant probability. Our result implies several corollaries for related problems including lp-norm estimation and compressed sensing. Notably, we resolve an open problem in compressed sensing regarding the feasibility of l2/l2-recovery guarantees in the presence of computationally bounded adversaries.
    Preview · Article · Nov 2012 · Proceedings of the Annual ACM Symposium on Theory of Computing
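
As an illustration of what a linear sketch for Euclidean norm estimation looks like, here is a minimal example assuming a dense Gaussian sketching matrix, a standard textbook choice that the abstract above does not specify. The function names and the parameters `k`, `n`, and `seed` are hypothetical. The example covers only the non-adaptive case, where the input is chosen independently of the matrix; it does not implement the adaptive attack the abstract describes.

```python
import math
import random


def make_sketch_matrix(k, n, seed):
    """Return a k x n matrix with independent standard Gaussian entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]


def apply_sketch(S, x):
    """Compute the linear map S @ x, reducing n coordinates to k."""
    return [sum(s * v for s, v in zip(row, x)) for row in S]


def estimate_norm(sketch):
    """Estimate ||x|| from S @ x, using E[||S x||^2] = k * ||x||^2."""
    k = len(sketch)
    return math.sqrt(sum(v * v for v in sketch) / k)


# Illustrative sizes: sketch a 1000-dimensional vector down to 200 numbers.
S = make_sketch_matrix(k=200, n=1000, seed=0)
rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(1000)]

true_norm = math.sqrt(sum(v * v for v in x))
print(true_norm, estimate_norm(apply_sketch(S, x)))
```

Because the map is linear, the sketch of a sum equals the sum of the sketches, which is what allows updates over data streams. The accuracy guarantee holds for an input fixed before the matrix is drawn; the abstract's negative result concerns inputs chosen adaptively after observing earlier outputs of the same sketch.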