Michael Mitzenmacher

Harvard University, Boston, MA, USA

Publications (18) · 1.03 Total impact

  • Article: Biff (Bloom Filter) Codes : Fast Error Correction for Large Data Sets
    Michael Mitzenmacher, George Varghese
    ABSTRACT: Large data sets are increasingly common in cloud and virtualized environments. For example, transfers of multiple gigabytes are commonplace, as are replicated blocks of such sizes. There is a need for fast error-correction or data reconciliation in such settings even when the expected number of errors is small. Motivated by such cloud reconciliation problems, we consider error-correction schemes designed for large data, after explaining why previous approaches appear unsuitable. We introduce Biff codes, which are based on Bloom filters and are designed for large data. For Biff codes with a message of length $L$ and $E$ errors, the encoding time is $O(L)$, decoding time is $O(L + E)$ and the space overhead is $O(E)$. Biff codes are low-density parity-check codes; they are similar to Tornado codes, but are designed for errors instead of erasures. Further, Biff codes are designed to be very simple, removing any explicit graph structures and based entirely on hash tables. We derive Biff codes by a simple reduction from a set reconciliation algorithm for a recently developed data structure, invertible Bloom lookup tables. While the underlying theory is extremely simple, what makes this code especially attractive is the ease with which it can be implemented and the speed of decoding. We present results from a prototype implementation that decodes messages of 1 million words with thousands of errors in well under a second.
    08/2012;
  • Article: An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets
    ABSTRACT: As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In this work, we address significance in the context of frequent itemset mining. Specifically, we develop a novel methodology to identify a meaningful support threshold s* for a dataset, such that the number of itemsets with support at least s* represents a substantial deviation from what would be expected in a random dataset with the same number of transactions and the same individual item frequencies. These itemsets can then be flagged as statistically significant with a small false discovery rate. We present extensive experimental results to substantiate the effectiveness of our methodology. Comment: A preliminary version of this work was presented in ACM PODS 2009. 20 pages, 0 figures
    02/2010;
  • Article: The Power of One Move: Hashing Schemes for Hardware.
    Adam Kirsch, Michael Mitzenmacher
    IEEE/ACM Trans. Netw. 01/2010; 18:1752-1765.
  • Chapter: Hash-Based Techniques for High-Speed Packet Processing
    Adam Kirsch, Michael Mitzenmacher, George Varghese
    ABSTRACT: Hashing is an extremely useful technique for a variety of high-speed packet-processing applications in routers. In this chapter, we survey much of the recent work in this area, paying particular attention to the interaction between theoretical and applied research. We assume very little background in either the theory or applications of hashing, reviewing the fundamentals as necessary.
    12/2009: pages 181-218;
  • Article: More Robust Hashing: Cuckoo Hashing with a Stash.
    Adam Kirsch, Michael Mitzenmacher, Udi Wieder
    SIAM J. Comput. 01/2009; 39:1543-1561.
  • Article: The hiring problem and Lake Wobegon strategies
    ABSTRACT: We introduce the hiring problem, in which a growing company continuously interviews and decides whether to hire applicants. This problem is similar in spirit but quite different from the well-studied secretary problem. Like the secretary problem, it captures fundamental aspects of decision making under uncertainty and has many possible applications. We analyze natural strategies of hiring above the current average, considering both the mean and the median averages; we call these Lake Wobegon strategies. Like the hiring problem itself, our strategies are intuitive, simple to describe, and amenable to mathematically and economically significant modifications. We demonstrate several intriguing behaviors of the two strategies. Specifically, we show dramatic differences between hiring above the mean and above the median. We also show that both strategies are intrinsically connected to the lognormal distribution, leading to only very weak concentration results, and the marked importance of the first few hires on the overall outcome.
    SIAM J. Comput. 01/2009; 39:1184-1193.
  • Article: Less hashing, same performance: Building a better Bloom filter
    Adam Kirsch, Michael Mitzenmacher
    ABSTRACT: A standard technique from the hashing literature is to use two hash functions $h_1(x)$ and $h_2(x)$ to simulate additional hash functions of the form $g_i(x) = h_1(x) + i h_2(x)$. We demonstrate that this technique can be usefully applied to Bloom filters and related data structures. Specifically, only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability. This leads to less computation and potentially less need for randomness in practice.
    Random Structures and Algorithms 08/2008; 33(2):187 - 218. · 1.03 Impact Factor
  • Conference Proceeding: More Robust Hashing: Cuckoo Hashing with a Stash.
    Adam Kirsch, Michael Mitzenmacher, Udi Wieder
    Algorithms - ESA 2008, 16th Annual European Symposium, Karlsruhe, Germany, September 15-17, 2008. Proceedings; 01/2008
  • Article: Simple summaries for hashing with choices.
    Adam Kirsch, Michael Mitzenmacher
    IEEE/ACM Trans. Netw. 01/2008; 16:218-231.
  • Conference Proceeding: The Power of One Move: Hashing Schemes for Hardware.
    Adam Kirsch, Michael Mitzenmacher
    INFOCOM 2008. 27th IEEE International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies, 13-18 April 2008, Phoenix, AZ, USA; 01/2008
  • Article: Less hashing, same performance: Building a better Bloom filter.
    Adam Kirsch, Michael Mitzenmacher
    Random Struct. Algorithms. 01/2008; 33:187-218.
  • Conference Proceeding: The hiring problem and Lake Wobegon strategies.
    Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, San Francisco, California, USA, January 20-22, 2008; 01/2008
  • Chapter: Less Hashing, Same Performance: Building a Better Bloom Filter
    Adam Kirsch, Michael Mitzenmacher
    ABSTRACT: A standard technique from the hashing literature is to use two hash functions $h_1(x)$ and $h_2(x)$ to simulate additional hash functions of the form $g_i(x) = h_1(x) + i h_2(x)$. We demonstrate that this technique can be usefully applied to Bloom filters and related data structures. Specifically, only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability. This leads to less computation and potentially less need for randomness in practice.
    09/2006: pages 456-467;
  • Conference Proceeding: Less Hashing, Same Performance: Building a Better Bloom Filter.
    Adam Kirsch, Michael Mitzenmacher
    Algorithms - ESA 2006, 14th Annual European Symposium, Zurich, Switzerland, September 11-13, 2006, Proceedings; 01/2006
  • Chapter: More Robust Hashing: Cuckoo Hashing with a Stash
    Adam Kirsch, Michael Mitzenmacher, Udi Wieder
    ABSTRACT: Cuckoo hashing holds great potential as a high-performance hashing scheme for real applications. Up to this point, the greatest drawback of cuckoo hashing appears to be that there is a polynomially small but practically significant probability that a failure occurs during the insertion of an item, requiring an expensive rehashing of all items in the table. In this paper, we show that this failure probability can be dramatically reduced by the addition of a very small constant-sized stash. We demonstrate both analytically and through simulations that stashes of size equivalent to only three or four items yield tremendous improvements, enhancing cuckoo hashing’s practical viability in both hardware and software. Our analysis naturally extends previous analyses of multiple cuckoo hashing variants, and the approach may prove useful in further related schemes.
    01/1970: pages 611-622;
  • Article: Distance-Sensitive Bloom Filters
    Adam Kirsch, Michael Mitzenmacher
    Applied Sciences.
  • Article: Building a better bloom filter
    Adam Kirsch, Michael Mitzenmacher
    ABSTRACT: A technique from the hashing literature is to use two hash functions $h_1(x)$ and $h_2(x)$ to simulate additional hash functions of the form $g_i(x) = h_1(x) + i h_2(x)$. We demonstrate that this technique can be usefully applied to Bloom filters and related data structures. Specifically, only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability. This leads to less computation and potentially less need for randomness in practice.
  • Article: Using a queue to de-amortize cuckoo hashing in hardware
    Adam Kirsch, Michael Mitzenmacher
    ABSTRACT: Cuckoo hashing combines multiple-choice hashing with the power to move elements, providing hash tables with very high space utilization and low probability of overflow. However, inserting a new object into such a hash table can take substantial time, requiring many elements to be moved. While these events are rare and the amortized performance of these data structures is excellent, this shortcoming is unacceptable in many applications, particularly those involving hardware router implementations. We address this difficulty, focusing on the potential of content-addressable memories and queueing techniques to provide a de-amortization of cuckoo hashing suitable for hardware, and in particular for high-performance routers.
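
The two-hash-function technique described in the "Less hashing, same performance" abstracts above can be illustrated with a minimal sketch. It is not the authors' implementation. The class name `DoubleHashBloomFilter`, the use of two halves of a SHA-256 digest as the base hashes, the reduction modulo the table size, and the index range starting at 0 are all assumptions made for illustration; only the form g_i(x) = h_1(x) + i h_2(x) comes from the abstracts.

```python
import hashlib


class DoubleHashBloomFilter:
    """Illustrative Bloom filter whose k probes derive from two base hashes."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits                # table size (assumed parameter name)
        self.k = num_hashes              # number of simulated hash functions
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _base_hashes(self, item):
        # Assumption: two 64-bit slices of one SHA-256 digest play the roles
        # of h_1(x) and h_2(x); the abstracts do not prescribe specific hashes.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return h1, h2

    def positions(self, item):
        # g_i(x) = h_1(x) + i * h_2(x), reduced modulo the table size.
        h1, h2 = self._base_hashes(item)
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self.positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[p] for p in self.positions(item))


bf = DoubleHashBloomFilter(num_bits=1024, num_hashes=4)
for word in ["bloom", "cuckoo", "stash"]:
    bf.add(word)
```

Each insertion or lookup computes one digest and derives all k probe positions arithmetically, which is the reduction in hashing work that the abstracts refer to. Membership tests on inserted items always succeed; tests on other items may return false positives, as with any Bloom filter.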