Multiset signatures for transactional memory
DOI: 10.1145/1995896.1995905
Conference: Proceedings of the 25th International Conference on Supercomputing, Tucson, AZ, USA, May 31 - June 04, 2011
Transactional Memory (TM) systems must record the memory locations read and written by concurrent transactions (their read and write sets) in order to detect conflicts. Some TM implementations use signatures for this purpose, which summarize read and write sets in bounded hardware at the cost of false positives (detection of nonexistent conflicts). Read/write signatures are usually implemented as two separate Bloom filters of the same size. However, transactions usually exhibit read and write sets of uneven cardinality, with read sets tending to be larger than write sets. Thus, the read filter fills up earlier than the write filter and, consequently, the false positive rate of the read signature may be high while the write filter still has low occupancy. In this paper, a multiset signature design is proposed that records both the read and write sets in the same Bloom filter without adding significant hardware complexity. Several designs of multiset signatures are analyzed and evaluated. New problems arise related to hardware complexity and to the existence of cross false positives, i.e., new false positives that arise because both sets share the same filter. Additionally, multiset signatures are enhanced using locality-sensitive hashing, proposed by the authors in previous work. Experimental results show that the multiset approach reduces the false positive rate and improves execution performance in most of the tested codes, without noticeably increasing the required hardware area.
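The idea of keeping both sets in one filter can be illustrated with a minimal software sketch. This is not the hardware design evaluated in the paper: the class `SharedSignature`, the helper `conflicts`, the use of SHA-256 as a stand-in hash, and the choice to give reads and writes differently salted hash functions over the same bit array are all assumptions made for illustration. The sketch shows how bits set by one set can make a query for the other set succeed, which is one way cross false positives can arise.

```python
import hashlib


class SharedSignature:
    """Toy signature: read and write sets share a single bit array.

    Assumption for illustration: reads ("R") and writes ("W") use
    differently salted hash functions that index the same m-bit array.
    """

    def __init__(self, m_bits=64, k=2):
        self.m = m_bits      # total number of bits in the shared filter
        self.k = k           # hash functions per set
        self.bits = 0        # Python int used as a bit array

    def _positions(self, addr, kind):
        # Derive k bit positions from the address and the set kind.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{kind}:{i}:{addr}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def insert(self, addr, kind):
        for p in self._positions(addr, kind):
            self.bits |= 1 << p

    def may_contain(self, addr, kind):
        # True means "possibly present" (false positives are allowed);
        # False means "definitely not inserted".
        return all((self.bits >> p) & 1 for p in self._positions(addr, kind))


def conflicts(local_sig, addr, remote_is_write):
    """Check a remote access against a local transaction's signature.

    A remote write conflicts with a local read or write of the same
    address; a remote read conflicts only with a local write.
    """
    if remote_is_write:
        return (local_sig.may_contain(addr, "R")
                or local_sig.may_contain(addr, "W"))
    return local_sig.may_contain(addr, "W")
```

A transaction would call `insert(addr, "R")` or `insert(addr, "W")` on each access and test incoming coherence requests with `conflicts`. Because all bits live in one array, a lightly used write set leaves more room for a large read set than a fixed half-size read filter would.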
ABSTRACT: Optimistic concurrency provided by Transactional Memory (TM) makes it a good candidate for maintaining synchronization in future multi-core processors. Speculative execution and bulk-level conflict detection enable TM to provide fine-grained synchronization without the complexity of managing fine-grained locks. Early hardware TM systems proposed to store the information needed for checking conflicts in the Level 1 (L1) cache, thereby limiting the size of a transaction to the size of the L1 cache. The introduction of signatures to TM systems removed this limitation and allowed transactions to be of any size. However, signatures produce false positives, which lead to performance degradation in TM systems. The objective of introducing signatures to TM is that the size of a transaction can be bigger than the L1 cache. Once signatures are integrated into a TM system, they are used to detect conflicts regardless of the size of a transaction. This means signatures are being used even for transactions that can store their read and write sets in the L1 cache. Based on this observation, we propose SnCTM, a TM system that adaptively changes the source used to detect conflicts. In our approach, when a transaction fits in the L1 cache, cache line information is used to detect conflicts, and signatures are used otherwise. By adaptively changing the source, SnCTM achieved up to 4.62 and 2.93 times speed-up over a baseline TM using lazy versioning and lazy conflict detection with two commonly used signature configurations. We also show that our system, even with a smaller signature (64 bits), can achieve performance comparable to a system with a perfect signature (8k bits).
ABSTRACT: The efficient management of conflicts among concurrent transactions constitutes a key aspect that hardware transactional memory (HTM) systems must achieve. Scalable HTM proposals so far inherit the cache-based style of conflict detection typically found in bus-based systems, largely unaware of the interactions between transactions and directory coherence. In this paper, we demonstrate that the traditional approach of detecting conflicts at the private cache levels is inefficient when used in the context of a directory protocol. We find that the use of the directory as a mere router of coherence requests restricts the throughput of conflict detection, and show how it becomes a bottleneck under high contention. This paper proposes a scheme for conflict detection that decouples conflict detection from cache coherence in order to overcome pathological situations that degrade the performance of an eager HTM system. Our scheme places bookkeeping metadata at the directory, introducing it as a separate hardware module that leaves the coherence protocol unmodified. In comparison to a state-of-the-art eager HTM system, our design handles contention more efficiently, minimizes the performance degradation of false positives for signatures of similar hardware cost, and reduces the network traffic generated.
IEEE Transactions on Parallel and Distributed Systems 01/2013; 24(1):59-71. DOI:10.1109/TPDS.2012.103 · 2.17 Impact Factor
ABSTRACT: Transactional Memory (TM) systems must track memory accesses made by concurrent transactions in order to detect conflicts. Many TM implementations use signatures for this purpose, which summarize reads and writes in fixed-size bit registers at the cost of false positives (detection of nonexisting conflicts). Signatures are commonly implemented as two separate same-sized Bloom filters, one for reads and the other for writes. In contrast, transactions frequently exhibit read and write sets of uneven cardinality. This mismatch between data sets and filter storage introduces inefficiencies in the use of signatures that have some impact on performance. This paper presents different signature designs as alternatives to the common scheme to deal with the asymmetry in transactional data sets in an effective way. Basically, we analyze two classes of new signatures, called multiset and reconfigurable asymmetric signatures. The first class uses only one Bloom filter to track both read and write sets, while the second class uses Bloom filters of configurable size for reads and writes. The main focus of this paper is a thorough study of these alternative signature designs, including a statistical analysis of false positives and an experimental evaluation, providing performance results and hardware area, time and energy requirements.
IEEE Transactions on Parallel and Distributed Systems 03/2013; 24(3):506-519. DOI:10.1109/TPDS.2012.138 · 2.17 Impact Factor
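The effect of uneven read and write sets on false positives can be seen with the standard Bloom filter estimate. The following is a back-of-the-envelope sketch, not the statistical analysis the paper presents. The symbols are assumptions introduced here: $m$ is the total bit budget, $k$ the number of hash functions, and $n_R$, $n_W$ the read and write set sizes. The shared-filter line further assumes that every address, read or written, sets $k$ bits in the full $m$-bit array.

```latex
% Standard approximation for a Bloom filter with m bits, k hash
% functions and n inserted elements:
\[
  p(n, m, k) \approx \left(1 - e^{-kn/m}\right)^{k}
\]
% Two separate filters of m/2 bits each (read side shown):
\[
  p_R^{\text{sep}} \approx \left(1 - e^{-2k\,n_R/m}\right)^{k}
\]
% One shared filter of m bits holding both sets:
\[
  p^{\text{shared}} \approx \left(1 - e^{-k\,(n_R + n_W)/m}\right)^{k}
\]
% Since n_R + n_W < 2 n_R exactly when n_W < n_R, we get
% p^{shared} < p_R^{sep} whenever the write set is the smaller one.
```

Under these assumptions the shared filter lowers the read-side false positive estimate whenever $n_W < n_R$, at the price of a higher estimate for write-side queries than a separate, lightly populated write filter would give.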