Maurice Herlihy
Brown University · Department of Computer Science

About

386 Publications · 49,065 Reads · 20,748 Citations

Publications (386)
Preprint
Full-text available
This paper considers the classical state machine replication (SMR) problem in a distributed system model inspired by cross-chain exchanges. We propose a novel SMR protocol adapted for this model. Each state machine transition takes O(n) message delays, where n is the number of active participants, of which any number may be Byzantine. This prot...
Article
Full-text available
We investigate scheduling algorithms for distributed transactional memory systems where transactions residing at nodes of a communication graph operate on shared, mobile objects. A transaction requests the objects it needs, executes once those objects have been assembled, and then sends the objects to other waiting transactions. We study scheduling...
Chapter
Many aspects of blockchain-based decentralized finance can be understood as an extension of classical distributed computing. In this paper, we trace the evolution of two interrelated notions: failure and fault-tolerance. In classical distributed computing, a failure to complete a multi-party protocol is typically attributed to hardware malfunctions...
Preprint
Full-text available
Automated market makers (AMMs) are smart contracts that automatically trade electronic assets according to a mathematical formula. This paper investigates how an AMM's formula affects the interests of liquidity providers, who endow the AMM with assets, and traders, who exchange one asset for another at the AMM's rates. *Linear slippage* measures ho...
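A minimal sketch of the kind of formula the paper analyzes (an illustration, not the paper's own construction): the constant-product rule x·y = k, popularized by Uniswap, is the canonical AMM formula, and it makes the slippage borne by traders easy to compute.

```python
# Illustrative sketch (assumed constant-product AMM, not taken from the paper):
# reserves satisfy x * y = k, and a trade's output follows from holding k fixed.

def swap_output(x_reserve: float, y_reserve: float, dx: float) -> float:
    """Amount of asset Y received for depositing dx of asset X,
    holding the product of reserves constant."""
    k = x_reserve * y_reserve
    return y_reserve - k / (x_reserve + dx)

def slippage(x_reserve: float, y_reserve: float, dx: float) -> float:
    """Relative shortfall versus trading at the spot price y/x."""
    spot_out = dx * (y_reserve / x_reserve)
    return 1.0 - swap_output(x_reserve, y_reserve, dx) / spot_out

# Larger trades against the same reserves suffer more slippage.
small = slippage(1000.0, 1000.0, 10.0)   # ~1%
large = slippage(1000.0, 1000.0, 100.0)  # ~9%
```

The example shows why slippage grows with trade size under this formula; the paper's contribution is to compare how different formulas trade this off against liquidity providers' interests.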
Preprint
Many aspects of blockchain-based decentralized finance can be understood as an extension of classical distributed computing. In this paper, we trace the evolution of two interrelated notions: failure and fault-tolerance. In classical distributed computing, a failure to complete a multi-party protocol is typically attributed to hardware malfunctions...
Book
This book is the first to present the state of the art and provide technical focus on the latest advances in the foundations of blockchain systems. It is a collaborative work between specialists in cryptography, distributed systems, formal languages, and economics, and addresses hot topics in blockchains from a theoretical perspective: cryptographi...
Article
Full-text available
Modern distributed data management systems face a new challenge: how can autonomous, mutually distrusting parties cooperate safely and effectively? Addressing this challenge brings up familiar questions from classical distributed systems: how to combine multiple steps into a single atomic action, how to recover from failures, and how to synchronize...
Preprint
Full-text available
Safe lock-free memory reclamation is a difficult problem. Existing solutions follow three basic methods (or their combinations): epoch based reclamation, hazard pointers, and optimistic reclamation. Epoch-based methods are fast, but do not guarantee lock-freedom. Hazard pointer solutions are lock-free but typically do not provide high performance....
Preprint
Linearizability is the de facto consistency condition for concurrent objects, widely used in theory and practice. Loosely speaking, linearizability classifies concurrent executions as correct if operations on shared objects appear to take effect instantaneously during the operation execution time. This paper calls attention to a somewhat-neglected...
Article
Full-text available
We investigate scheduling algorithms for distributed transactional memory systems where transactions residing at nodes of a communication graph operate on shared, mobile objects. A transaction requests the objects it needs, executes once those objects have been assembled, and then possibly forwards those objects to other waiting transactions. Minim...
Article
We present LB-Spiral, a novel distributed directory protocol for shared objects, suitable for large-scale distributed shared memory systems. Each shared object has an owner node that can modify its value. The ownership may change by moving the object from one node to another in response to move requests. The value of an object can be read by other...
Chapter
In this chapter, we consider various synchronization primitives that a multiprocessor system might provide, and evaluate their relative power. In particular, we compare synchronization primitives by their ability to solve consensus, an elementary synchronization problem. The consensus number of a synchronization primitive is the maximum number of t...
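The chapter's central idea can be illustrated with a sketch: compare-and-swap has infinite consensus number, because the first successful CAS on a shared register fixes the decision for every thread. Python exposes no hardware CAS, so a lock emulates the atomic step here; the names are illustrative, not the book's own code.

```python
# Sketch: wait-free consensus from compare-and-swap (CAS).
# A lock stands in for the hardware-atomic instruction, since Python
# has no native CAS; the protocol structure is the point.
import threading

class CASRegister:
    def __init__(self, initial=None):
        self._value = initial
        self._lock = threading.Lock()  # emulates atomicity of hardware CAS

    def compare_and_set(self, expected, new) -> bool:
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    def get(self):
        return self._value

class Consensus:
    """All threads calling decide() agree on the first proposed value."""
    def __init__(self):
        self._reg = CASRegister(initial=None)

    def decide(self, proposal):
        self._reg.compare_and_set(None, proposal)  # only the first CAS succeeds
        return self._reg.get()

cons = Consensus()
decisions = []
record_lock = threading.Lock()

def propose(v):
    d = cons.decide(v)
    with record_lock:
        decisions.append(d)

threads = [threading.Thread(target=propose, args=(i,)) for i in range(8)]
for t in threads: t.start()
for t in threads: t.join()
# Every thread decided the same value, and it was some thread's proposal.
```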
Chapter
This chapter covers several useful patterns for distributed coordination: combining, counting, diffraction, and sampling. Some of these techniques are deterministic; others use randomization. We cover two basic structures underlying these patterns, trees and combinatorial networks. Although these techniques support a high degree of parallelism with...
Chapter
This chapter presents two implementations of concurrent skiplists, one lock-based and the other lock-free. Skip lists are a probabilistic data structure that provides logarithmic search (with high probability). Although balanced search trees guarantee logarithmic search, the rebalancing they require can cause contention, which causes poor performance...
Chapter
This chapter uses the example of a list-based set to present several useful techniques for implementing concurrent data structures, starting from simple coarse-grained locking to fine-grained locking to optimistic and lazy synchronization, and finally nonblocking synchronization. It also demonstrates how to reason about the correctness of concurren...
Chapter
This chapter studies concurrent hash table implementations. A hash table can use either closed or open addressing. Closed-address hashing allows the reuse or adaptation of concurrent data structures (most notably lists) from previous chapters, and can result in highly scalable, even lock-free algorithms. The chapter also shows how to craft highly s...
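One of the scalable closed-addressing designs the chapter alludes to can be sketched with lock striping: a fixed array of locks guards the buckets, so operations on different stripes proceed in parallel. This is an illustrative sketch under assumed names; resizing is omitted for brevity.

```python
# Sketch of a lock-striped closed-address hash set (illustrative, no resizing):
# hash(x) selects both a bucket and the lock (stripe) that guards it.
import threading

class StripedHashSet:
    def __init__(self, num_stripes: int = 8, num_buckets: int = 64):
        self._locks = [threading.Lock() for _ in range(num_stripes)]
        self._buckets = [[] for _ in range(num_buckets)]

    def _lock_for(self, x):
        return self._locks[hash(x) % len(self._locks)]

    def add(self, x) -> bool:
        with self._lock_for(x):
            bucket = self._buckets[hash(x) % len(self._buckets)]
            if x in bucket:
                return False
            bucket.append(x)
            return True

    def contains(self, x) -> bool:
        with self._lock_for(x):
            return x in self._buckets[hash(x) % len(self._buckets)]

s = StripedHashSet()
for i in range(100):
    s.add(i)
ok = all(s.contains(i) for i in range(100))
dup = s.add(5)   # already present, so the add fails
```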
Chapter
This chapter introduces monitors, a structured way to encapsulate data, methods, and synchronization in a single modular package. It illustrates how to use monitors to implement basic synchronization mechanisms such as readers-writers locks and semaphores.
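The readers–writers lock mentioned above is a natural monitor example: data, methods, and synchronization live in one package, with a condition variable coordinating waiters. This is a minimal sketch (no writer preference or fairness), not the book's own code.

```python
# Minimal monitor-style readers-writers lock: many concurrent readers,
# or one writer, coordinated through a single condition variable.
import threading

class ReadWriteLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:              # wait while a writer is active
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

counter = 0
rw = ReadWriteLock()

def writer_task():
    global counter
    for _ in range(100):
        rw.acquire_write()
        counter += 1                         # exclusive critical section
        rw.release_write()

ts = [threading.Thread(target=writer_task) for _ in range(4)]
for t in ts: t.start()
for t in ts: t.join()

rw.acquire_read()
final = counter
rw.release_read()
```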
Chapter
This chapter covers classical mutual exclusion algorithms that work by reading and writing shared memory. Although these algorithms are not used in practice, they introduce the kinds of algorithmic and correctness issues that arise in every area of synchronization. This chapter also includes an impossibility proof that demonstrates the limitations...
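Peterson's algorithm is the classic two-thread instance of such read–write mutual exclusion. The sketch below is correct under sequential consistency; CPython's interpreter lock makes it behave for demonstration, but real hardware would need memory fences, which is one reason these algorithms are not used in practice.

```python
# Peterson's two-thread mutual exclusion using only reads and writes of
# shared memory. Demonstration sketch: relies on CPython's sequentially
# consistent execution; real hardware requires fences.
import threading

flag = [False, False]
turn = 0
counter = 0

def lock(me: int):
    global turn
    other = 1 - me
    flag[me] = True                      # announce interest
    turn = other                         # defer to the other thread
    while flag[other] and turn == other:
        pass                             # spin until safe to enter

def unlock(me: int):
    flag[me] = False

def worker(me: int):
    global counter
    for _ in range(200):
        lock(me)
        counter += 1                     # critical section
        unlock(me)

t0 = threading.Thread(target=worker, args=(0,))
t1 = threading.Thread(target=worker, args=(1,))
t0.start(); t1.start(); t0.join(); t1.join()
```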
Chapter
This chapter examines MapReduce and stream programming, two approaches to shared-memory data-parallel programming, that is, to implementing parallel algorithms that process large amounts of data. It explains each approach, describes a simple generic framework for developing applications with that approach, and demonstrates how to use it to implemen...
Chapter
This chapter presents a variety of queue implementations, from bounded lock-based queues to unbounded lock-free queues to synchronous queues. It introduces the notions of partial methods, which block when certain conditions are not met, for example, trying to remove an element from an empty queue or add an element to a full (bounded) queue, and syn...
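The notion of a partial method can be sketched directly: a bounded queue whose enq blocks while the queue is full and whose deq blocks while it is empty. A minimal lock-based version, with assumed names:

```python
# Sketch of a bounded lock-based queue with *partial* enq/deq methods
# that block when their precondition (not full / not empty) fails.
import threading
from collections import deque

class BoundedQueue:
    def __init__(self, capacity: int):
        self._items = deque()
        self._capacity = capacity
        self._cond = threading.Condition()

    def enq(self, x):
        with self._cond:
            while len(self._items) == self._capacity:  # partial: wait if full
                self._cond.wait()
            self._items.append(x)
            self._cond.notify_all()

    def deq(self):
        with self._cond:
            while not self._items:                     # partial: wait if empty
                self._cond.wait()
            x = self._items.popleft()
            self._cond.notify_all()
            return x

q = BoundedQueue(capacity=3)
received = []

def consumer():
    for _ in range(10):
        received.append(q.deq())

c = threading.Thread(target=consumer)
c.start()
for i in range(10):
    q.enq(i)        # blocks whenever the consumer falls 3 items behind
c.join()
```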
Chapter
This chapter switches from C++ to Java to explore the challenges of manual memory management in a concurrent system. This is an important issue faced by designers of concurrent low-level systems software. We present two approaches that represent different ends of a spectrum. On one end are hazard pointers. Hazard pointers enable tight bounds on the...
Chapter
Many parallel algorithms execute in phases such that all threads must complete each phase before any thread moves on to the next phase. Other algorithms, such as work stealing, must be able to detect when all of the threads have run out of work. This kind of synchronization is called a barrier. This chapter introduces barriers, discusses the basic...
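The phase discipline described above can be sketched with a sense-reversing barrier, one of the basic designs such chapters discuss: each thread keeps a local sense and spins until the shared sense flips, which the last arriver does. An illustrative sketch, not the book's code:

```python
# Sense-reversing barrier sketch: no thread passes until all n arrive;
# the last arriver resets the count and flips the shared sense.
import threading

class SenseBarrier:
    def __init__(self, n: int):
        self._n = n
        self._count = n
        self._sense = False
        self._lock = threading.Lock()
        self._local = threading.local()

    def await_barrier(self):
        if not hasattr(self._local, "sense"):
            self._local.sense = True
        with self._lock:
            self._count -= 1
            if self._count == 0:                 # last thread: reset and flip
                self._count = self._n
                self._sense = self._local.sense
        while self._sense != self._local.sense:
            pass                                 # spin until the phase ends
        self._local.sense = not self._local.sense

n = 4
log = []
log_lock = threading.Lock()
barrier = SenseBarrier(n)

def phased_worker():
    for phase in range(3):
        with log_lock:
            log.append(phase)
        barrier.await_barrier()                  # all finish phase k first

ts = [threading.Thread(target=phased_worker) for _ in range(n)]
for t in ts: t.start()
for t in ts: t.join()
# Every thread logs phase k before any thread logs phase k+1.
```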
Chapter
This chapter begins our study of the foundations of concurrent computation by examining its most basic primitive, the read–write register. A register is a single location of shared memory. We characterize a register by the values it can store, the number of threads that can access it and what operations they can do, and the properties it guarantees...
Chapter
Some applications break down naturally into many parallel tasks. This chapter shows how to decompose and analyze such applications, introducing the notions of work and span. The chapter also introduces thread pools, an efficient and robust mechanism for executing such applications that insulates the programmer from platform-dependent details. Final...
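The thread-pool idea is easy to show concretely: decompose a job into independent tasks, submit them to a pool, and combine the results without managing threads by hand. A small example using Python's standard pool:

```python
# Illustrative thread-pool use: split a sum into independent chunk-sums
# (the parallel "work"), submit them, and combine the partial results.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    return sum(chunk)

data = list(range(1, 101))
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]

with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))
# total == 5050, the sum of 1..100
```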
Chapter
This chapter begins our study of practical concurrency by exploring the impact of system architecture on the performance of spin locks. Understanding the memory hierarchy and how processors communicate is critical to being able to write effective concurrent programs. We examine how to exploit knowledge of the architecture to design locking algorith...
Chapter
This chapter introduces transactional programming. Transactional programming raises the level of abstraction, letting a programmer focus on which regions of code must be atomic, and not how to make those regions atomic. By shifting responsibility for achieving atomicity to a run-time library, transactional programming overcomes many limitations and...
Chapter
This chapter presents several concurrent priority queue implementations. A priority queue is a multiset of items, each with an associated priority; items are processed (i.e., removed from the queue) in order of their priority. Priority queues play a vital role in many applications, which have different requirements for their priority queues. For ex...
Chapter
A stack is a last-in-first-out (LIFO) pool: threads push items on, and pop items off, the top of the stack. This chapter considers the challenge of implementing scalable concurrent stacks when every operation would seem to contend for access to the top of the stack. It presents a classic lock-free stack implementation, in which all operations do in...
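The classic lock-free stack referred to here is usually credited to Treiber: push and pop each read the top pointer, then retry a compare-and-set until it succeeds. Python lacks hardware CAS, so a lock emulates the atomic step in this sketch; the retry-loop structure is what matters.

```python
# Sketch of a Treiber-style lock-free stack. A lock stands in for the
# hardware CAS instruction; push/pop retry until their CAS succeeds.
import threading

class Node:
    __slots__ = ("value", "next")
    def __init__(self, value, next=None):
        self.value, self.next = value, next

class TreiberStack:
    def __init__(self):
        self._top = None
        self._cas_lock = threading.Lock()   # emulates atomic CAS on top

    def _cas_top(self, expected, new) -> bool:
        with self._cas_lock:
            if self._top is expected:
                self._top = new
                return True
            return False

    def push(self, value):
        node = Node(value)
        while True:
            top = self._top
            node.next = top
            if self._cas_top(top, node):    # retry on contention
                return

    def pop(self):
        while True:
            top = self._top
            if top is None:
                raise IndexError("pop from empty stack")
            if self._cas_top(top, top.next):
                return top.value

s = TreiberStack()
for v in (1, 2, 3):
    s.push(v)
popped = [s.pop(), s.pop(), s.pop()]        # LIFO order
```

In Python, object identity plus garbage collection sidesteps the ABA problem that a real CAS-based implementation must address, typically with the memory-reclamation schemes discussed elsewhere in this list.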
Chapter
This chapter describes how to use consensus objects to build a universal construction, an algorithm for implementing a linearizable concurrent object for any sequential object type. It presents two algorithms, a lock-free one and a wait-free one. These universal constructions demonstrate that consensus objects are universal; that is, they can be use...
Chapter
This chapter describes how to specify correctness for concurrent objects. All notions of correctness for concurrent objects are based on some notion of equivalence with sequential behavior. Sequential consistency is a strong condition that is useful for describing standalone systems. Linearizability is an even stronger condition that supports compo...
Article
Full-text available
Modern cryptocurrency systems, such as Ethereum, permit complex financial transactions through scripts called smart contracts. These smart contracts are executed many, many times, always without real concurrency. First, all smart contracts are serially executed by miners before appending them to the blockchain. Later, those contracts are serially r...
Chapter
Concurrent data structures implemented with software transactional memory (STM) perform poorly when operations which do not conflict in the definition of the abstract data type nonetheless incur conflicts in the concrete state of an implementation. Several works addressed various aspects of this problem, yet we still lack efficient, general-purpose...
Conference Paper
Full-text available
In a traditional DRAM-based main memory architecture, a memory access operation requires much more time and energy than a simple logic operation. This fact is exploited to build time-consuming and power-hungry memory-hard cryptographic functions that serve the purpose of hindering brute-force security attacks. The security of such memory-hard funct...
Chapter
Full-text available
We present CUDA-DTM, the first ever Distributed Transactional Memory framework written in CUDA for large scale GPU clusters. Transactional Memory has become an attractive auto-coherence scheme for GPU applications with irregular memory access patterns due to its ability to avoid serializing threads while still maintaining programmability. We extend...
Conference Paper
Full-text available
Recent advances in memory architectures have provoked renewed interest in near-data-processing (NDP) as a way to alleviate the "memory wall" problem. An NDP architecture places logic circuits, such as simple processors, in close proximity to memory. Effective use of NDP architectures requires rethinking data structures and their algorithms. Here, we...
Article
The roots of blockchain technologies are deeply interwoven in distributed computing.
Preprint
We use historical data to estimate the potential benefit of speculative techniques for executing Ethereum smart contracts in parallel. We replay transaction traces of sampled blocks from the Ethereum blockchain over time, using a simple speculative execution engine. In this engine, miners attempt to execute all transactions in a block in parallel,...
Article
The M(n)-renaming task requires n + 1 processes, each starting with a unique input name (from an arbitrarily large range), to coordinate the choice of new output names from a range of size M(n). It is known that 2n-renaming can be solved if and only if n + 1 is not a prime power. However, the previous proof of solvability was not constructive, involv...
Chapter
We propose a way to reconcile the apparent contradiction between the immutability of idealized smart contracts and the real-world need to update contracts to fix bugs and oversights. Our proposal is to raise the contract’s level of abstraction to guarantee a specification φ instead of a particular implementation of that specification. A...
Article
Full-text available
High-end embedded systems, like their general-purpose counterparts, are turning to many-core cluster-based shared-memory architectures that provide a shared memory abstraction subject to non-uniform memory access costs. In order to keep the cores and memory hierarchy simple, many-core embedded systems tend to employ simple, scratchpad-like memories...
Article
Full-text available
We consider scheduling problems in the data flow model of distributed transactional memory. Objects shared by transactions move from one network node to another by following network paths. We examine how the objects’ transfer in the network affects the completion time of all transactions and the total communication cost. We show that there are prob...
Conference Paper
Full-text available
We present hamm, a Haskell library that enables programmers to easily configure authenticated map (key-value store) implementations. We use type level programming techniques to establish an extensible foundation, and provide an example base map and several example “add on” transformers supporting features such as caches, Bloom filters and paging st...
Article
Full-text available
The asynchronous computability theorem (ACT) uses concepts from combinatorial topology to characterize which tasks have wait-free solutions in read-write memory. A task can be expressed as a relation between two chromatic simplicial complexes. The theorem states that a task has a protocol (algorithm) if and only if there is a certain chromatic simp...
Article
Today’s hardware transactional memory (HTM) systems rely on existing coherence protocols, which implement a requester-wins strategy. This, in turn, leads to poor performance when transactions frequently conflict, causing them to resort to a non-speculative fallback path. Often, such a path severely limits parallelism. In this article, we propose ve...
Conference Paper
Full-text available
Non-volatile memory is expected to coexist with (or even displace) volatile DRAM for main memory in upcoming architectures. This has led to increasing interest in the problem of designing and specifying durable data structures that can recover from system crashes. Data structures may be designed to satisfy stricter or weaker durability guarantees t...
Article
Full-text available
Non-volatile memory is expected to coexist with (or even displace) volatile DRAM for main memory in upcoming architectures. This has led to increasing interest in the problem of designing and specifying durable data structures that can recover from system crashes. Data structures may be designed to satisfy stricter or weaker durability guarantees t...
Article
Full-text available
Scaling of semiconductor devices has enabled higher levels of integration and performance improvements at the price of making devices more susceptible to the effects of static and dynamic variability. Adding safety margins (guardbands) on the operating frequency or supply voltage prevents timing errors, but has a negative impact on performance and...
Conference Paper
Most STM systems are poorly equipped to support libraries of concurrent data structures. One reason is that they typically detect conflicts by tracking transactions' read sets and write sets, an approach that often leads to false conflicts. A second is that existing data structures and libraries often need to be rewritten from scratch to support tr...
Conference Paper
There has been a recent explosion of interest in blockchain-based distributed ledger systems such as Bitcoin, Ethereum, and many others. Much of this work originated outside the distributed computing community, but the questions raised, such as consensus, replication, fault-tolerance, privacy, and security, and so on, are all issues familiar to our...
Conference Paper
Modern cryptocurrency systems, such as Ethereum, permit complex financial transactions through scripts called smart contracts. These smart contracts are executed many, many times, always without real concurrency. First, all smart contracts are serially executed by miners before appending them to the blockchain. Later, those contracts are serially r...
Conference Paper
We investigate scheduling algorithms for distributed transactional memory systems where transactions residing at nodes of a communication graph operate on shared, mobile objects. A transaction requests the objects it needs, executes once those objects have been assembled, and then possibly forwards those objects to other waiting transactions. Minim...
Conference Paper
The performance gap between memory and CPU has grown exponentially. To bridge this gap, hardware architects have proposed near-memory computing (also called processing-in-memory, or PIM), where a lightweight processor (called a PIM core) is located close to memory. Due to its proximity to memory, a memory access from a PIM core is much faster than...
Article
In an asynchronous distributed system where any number of processes may crash, a process may have to run solo, computing its local output without receiving any information from other processes. In the basic shared memory system where the processes communicate through atomic read/write registers, at most one process may run solo. This paper introduc...
Article
Most STM systems are poorly equipped to support libraries of concurrent data structures. One reason is that they typically detect conflicts by tracking transactions' read sets and write sets, an approach that often leads to false conflicts. A second is that existing data structures and libraries often need to be rewritten from scratch to support tr...
Conference Paper
State teleportation is a new technique for exploiting hardware transactional memory (HTM) to improve existing synchronization and memory management schemes for highly-concurrent data structures. When applied to fine-grained locking, a thread holding the lock for a node launches a hardware transaction that traverses multiple successor nodes, acquire...
Article
State teleportation is a new technique for exploiting hardware transactional memory (HTM) to improve existing synchronization and memory management schemes for highly-concurrent data structures. When applied to fine-grained locking, a thread holding the lock for a node launches a hardware transaction that traverses multiple successor nodes, acquire...
Conference Paper
Full-text available
We present thrifty-malloc: a transaction-friendly dynamic memory manager for high-end embedded multicore systems. The manager combines modularity, ease-of-use and hardware transactional memory (HTM) compatibility in a lightweight and memory-efficient design. Thrifty-malloc is easy to deploy and configure for non-expert programmers, yet provides goo...
Conference Paper
A task is a distributed coordination problem where processes start with private inputs, communicate with one another, and then halt with private outputs. A protocol that solves a task is t-resilient if it tolerates halting failures by t or fewer processes. The t-resilient asynchronous computability theorem stated here characterizes the tasks that h...
Conference Paper
In concurrent systems without automatic garbage collection, it is challenging to determine when it is safe to reclaim memory, especially for lock-free data structures. Existing concurrent memory reclamation schemes are either fast but do not tolerate process delays, robust to delays but with high overhead, or both robust and fast but narrowly appli...
Article
Full-text available
Permissionless decentralized ledgers ("blockchains") such as the one underlying the cryptocurrency Bitcoin allow anonymous participants to maintain the ledger, while avoiding control or "censorship" by any single entity. In contrast, permissioned decentralized ledgers exploit real-world trust and accountability, allowing only explicitly authorized p...
Conference Paper
Current memory reclamation mechanisms for highly-concurrent data structures present an awkward trade-off. Techniques such as epoch-based reclamation perform well when all threads are running on dedicated processors, but the delay or failure of a single thread will prevent any other thread from reclaiming memory. Alternatives such as hazard pointers...
Article
Current memory reclamation mechanisms for highly-concurrent data structures present an awkward trade-off. Techniques such as epoch-based reclamation perform well when all threads are running on dedicated processors, but the delay or failure of a single thread will prevent any other thread from reclaiming memory. Alternatives such as hazard pointers...
Article
Consider a network of n processes, where each process inputs a d-dimensional vector of reals. All processes can communicate directly with others via reliable FIFO channels. We discuss two problems. The multidimensional Byzantine consensus problem, for synchronous systems, requires processes to decide on a single d-dimensional vector v...
Conference Paper
Hardware transactional memory (HTM) is becoming widely available on modern platforms. However, software using HTM requires at least two carefully-coordinated code paths: one for transactions, and at least one for when transactions either fail, or are not supported at all. We present the MCMS interface that allows a simple design of fast concurrent...