Conference Paper

Multiset signatures for transactional memory.

DOI: 10.1145/1995896.1995905 Conference: Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31 - June 04, 2011
Source: DBLP

ABSTRACT Transactional Memory (TM) systems must record the memory locations read and written (read and write sets) by concurrent transactions in order to detect conflicts. Some TM implementations use signatures for this purpose, which summarize read and write sets in bounded hardware at the cost of false positives (detection of non-existing conflicts). Read/write signatures are usually implemented as two separate Bloom filters with the same size. In contrast, transactions usually exhibit read/write sets of uneven cardinality, where read sets use to be larger than write sets. Thus, the read filter populates earlier than the write one and, consequently the read signature false positive rate may be high while the write filter has still a low occupation. In this paper, a multiset signature design is proposed which records both the read and write sets in the same Bloom filter without adding significant hardware complexity. Several designs of multiset signatures are analyzed and evaluated. New problems arise related to hardware complexity and the existence of cross false positives, i.e. new false positives coming from the fact that both sets share the same filter. Additionally, multiset signatures are enhanced using locality-sensitive hashing, proposed by the authors in a previous work. Experimental results show that the multiset approach is able to reduce the false positive rate and improve the execution performance in most of the tested codes, without increasing the required hardware area in a noticeable amount.

  • [Show abstract] [Hide abstract]
    ABSTRACT: The transactional memory (TM) paradigm promises to increase programmer productivity by making it easier to write correct parallel programs. In fulfilling this goal, a TM system should maximize its performance with limited hardware resources. Conflict detection is an essential element for maintaining correctness among concurrent transactions in a TM system. Hardware signatures have been proposed as an area-efficient method for detecting conflicts. However, signatures can degrade TM performance by falsely declaring conflicts. Hence, improving the accuracy of signatures within a given hardware budget is a crucial issue for TM to be adopted as a mainstream programming model. In this paper, we propose a simple and effective signature design, the unified signature. Instead of using separate read- and write-signatures, we implement a single signature to track all read- and write-accesses. By merging read- and write-signatures, a unified signature can effectively enlarge the signature coverage without additional overhead. Within the constraints of a given hardware budget, a TM system with a unified signature outperforms a baseline system with the same-sized traditional signatures by reducing the number of falsely detected conflicts. Even though the unified signature scheme incurs read-read dependencies, we show that these false dependencies do not negate the benefit of unified signatures and can effectively be filtered out. A TM system with a 2-Kbit unified signature with a helper signature scheme achieves speedups of 15 percent over baseline TM with 33 percent less area and 49 percent less power.
    IEEE Transactions on Parallel and Distributed Systems 11/2013; 24(11):2230-2239. DOI:10.1109/TPDS.2012.292 · 2.17 Impact Factor
  • [Show abstract] [Hide abstract]
    ABSTRACT: Transactional Memory (TM) systems must track memory accesses made by concurrent transactions in order to detect conflicts. Many TM implementations use signatures for this purpose, which summarize reads and writes in fixed-size bit registers at the cost of false positives (detection of nonexisting conflicts). Signat f8 ures are commonly implemented as two separate same-sized Bloom filters, one for reads and other for writes. In contrast, transactions frequently exhibit read and write sets of uneven cardinality. This mismatch between data sets and filter storage i 2a9 ntroduces inefficiencies in the use of signatures that have some impact on performance. This paper presents different signature designs as alternatives to the common scheme to deal with the asymmetry in transactional data sets in an effective way. Basically, we analyze two classes of new signatures, called multiset and reconfigurable asymmetric signatures. The first class uses only one Bloom filter to track both read and write sets, while the second class uses Bloom filters of configurable size for reads and writes. The main focus of this paper is a thorough study of these alternative signature designs, including a statistical analysis of false positives and an experimenta d4f l evaluation, providing performance results and hardware area, time and energy requirements.
    IEEE Transactions on Parallel and Distributed Systems 03/2013; 24(3):506-519. DOI:10.1109/TPDS.2012.138 · 2.17 Impact Factor
  • [Show abstract] [Hide abstract]
    ABSTRACT: The efficient management of conflicts among concurrent transactions constitutes a key aspect that hardware transactional memory (HTM) systems must achieve. Scalable HTM proposals so far inherit the cache-based style of conflict detection typically found in bus-based systems, largely unaware of the interactions between transactions and directory coherence. In this paper, we demonstrate that the traditional approach of detecting conflicts at the private cache levels is inefficient when used in the context of a directory protocol. We find that the use of the directory as a mere router of coherence requests restricts the throughput of conflict detection, and show how it becomes a bottleneck under high contention. This paper proposes a scheme for conflict detection that decouples conflict detection from cache coherence in order to overcome pathological situations that degrade the performance of an eager HTM system. Our scheme places bookkeeping metadata at the directory, introducing it as a separate hardware module that leaves the coherence protocol unmodified. In comparison to a state-of-the-art eager HTM system, our design handles contention more efficiently, minimizes the performance degradation of false positives for signatures of similar hardware cost, and reduces the network traffic generated.
    IEEE Transactions on Parallel and Distributed Systems 01/2013; 24(1):59-71. DOI:10.1109/TPDS.2012.103 · 2.17 Impact Factor