Conference Paper

Multiset signatures for transactional memory.

DOI: 10.1145/1995896.1995905. In: Proceedings of the 25th International Conference on Supercomputing, Tucson, AZ, USA, May 31 - June 04, 2011
Source: DBLP

ABSTRACT Transactional Memory (TM) systems must record the memory locations read and written (read and write sets) by concurrent transactions in order to detect conflicts. Some TM implementations use signatures for this purpose, which summarize read and write sets in bounded hardware at the cost of false positives (detection of non-existent conflicts). Read/write signatures are usually implemented as two separate Bloom filters of the same size. In contrast, transactions usually exhibit read/write sets of uneven cardinality, where read sets tend to be larger than write sets. Thus, the read filter fills up earlier than the write one and, consequently, the read signature's false positive rate may be high while the write filter still has low occupancy. In this paper, a multiset signature design is proposed which records both the read and write sets in the same Bloom filter without adding significant hardware complexity. Several designs of multiset signatures are analyzed and evaluated. New problems arise related to hardware complexity and the existence of cross false positives, i.e., new false positives that arise because both sets share the same filter. Additionally, multiset signatures are enhanced using locality-sensitive hashing, proposed by the authors in previous work. Experimental results show that the multiset approach is able to reduce the false positive rate and improve execution performance in most of the tested codes, without noticeably increasing the required hardware area.
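The imbalance described in the abstract can be illustrated with the classic Bloom filter approximation for occupancy and false positive rate. The sketch below is not taken from the paper: the filter size, number of hash functions, and set sizes are hypothetical values chosen only to show how a read filter can saturate while an equally sized write filter stays almost empty.

```python
def bloom_occupancy(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Expected fraction of bits set after inserting n_items distinct addresses
    into an m_bits-wide Bloom filter with k_hashes uniform hash functions."""
    return 1.0 - (1.0 - 1.0 / m_bits) ** (k_hashes * n_items)


def bloom_fp_rate(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Classic approximation: a lookup is a false positive when all k probed
    bits happen to be set, treating the probed bits as independent."""
    return bloom_occupancy(m_bits, k_hashes, n_items) ** k_hashes


# Hypothetical parameters (assumptions, not values from the paper).
M_BITS = 1024      # size of each of the two separate filters
K_HASHES = 4       # hash functions per filter
READ_SET = 200     # addresses read by a transaction
WRITE_SET = 20     # addresses written by the same transaction

read_fp = bloom_fp_rate(M_BITS, K_HASHES, READ_SET)
write_fp = bloom_fp_rate(M_BITS, K_HASHES, WRITE_SET)
read_occ = bloom_occupancy(M_BITS, K_HASHES, READ_SET)
write_occ = bloom_occupancy(M_BITS, K_HASHES, WRITE_SET)

print(f"read filter:  occupancy {read_occ:.3f}, false positive rate {read_fp:.2e}")
print(f"write filter: occupancy {write_occ:.3f}, false positive rate {write_fp:.2e}")
```

With uneven set sizes like these, most of the write filter's bits go unused, which is the wasted capacity that sharing one filter between both sets aims to recover.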

  • Source
    ABSTRACT: While Sequential Consistency (SC) is the most intuitive memory consistency model and the one most programmers likely assume, current multiprocessors do not support it. Instead, they support more relaxed models that deliver high performance. SC implementations are considered either too slow or, when they can match the performance of relaxed models, too difficult to implement. In this paper, we propose Bulk Enforcement of SC (BulkSC), a novel way of providing SC that is simple to implement and offers performance comparable to Release Consistency (RC). The idea is to dynamically group sets of consecutive instructions into chunks that appear to execute atomically and in isolation. The hardware enforces SC at the coarse grain of chunks which, to the program, appears as providing SC at the individual memory access level. BulkSC keeps the implementation simple by largely decoupling memory consistency enforcement from processor structures. Moreover, it delivers high performance by enabling full memory access reordering and overlapping within chunks and across chunks. We describe a complete system architecture that supports BulkSC and show that it delivers performance comparable to RC.
    34th International Symposium on Computer Architecture (ISCA 2007), June 9-13, 2007, San Diego, California, USA; 01/2007
  • Source
    ABSTRACT: This paper proposes a hardware transactional memory (HTM) system called LogTM Signature Edition (LogTM-SE). LogTM-SE uses signatures to summarize a transaction's read- and write-sets and detects conflicts on coherence requests (eager conflict detection). Transactions update memory "in place" after saving the old value in a per-thread memory log (eager version management). Finally, a transaction commits locally by clearing its signature, resetting the log pointer, etc., while aborts must undo the log. LogTM-SE achieves two key benefits. First, signatures and logs can be implemented without changes to highly-optimized cache arrays because LogTM-SE never moves cached data, changes a block's cache state, or flash clears bits in the cache. Second, transactions are more easily virtualized because signatures and logs are software accessible, allowing the operating system and runtime to save and restore this state. In particular, LogTM-SE allows cache victimization, unbounded nesting (both open and closed), thread context switching and migration, and paging.
    13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 10-14 February 2007, Phoenix, Arizona, USA; 01/2007
  • Source
    ABSTRACT: Transactional Memory (TM) systems must track the read and write sets (items read and written during a transaction) to detect conflicts among concurrent transactions. Several TMs use signatures, which summarize unbounded read/write sets in bounded hardware at a performance cost of false positives (conflicts detected when none exists). This paper examines different organizations to achieve hardware-efficient and accurate TM signatures. First, we find that implementing each signature with a single k-hash-function Bloom filter (True Bloom signature) is inefficient, as it requires multi-ported SRAMs. Instead, we advocate using k single-hash-function Bloom filters in parallel (Parallel Bloom signature), using area-efficient single-ported SRAMs. Our formal analysis shows that both organizations perform equally well in theory and our simulation-based evaluation shows this to hold approximately in practice. We also show that by choosing high-quality hash functions we can achieve signature designs noticeably more accurate than the previously proposed implementations. Finally, we adapt Pagh and Rodler's cuckoo hashing to implement Cuckoo-Bloom signatures. While this representation does not support set intersection, it mitigates false positives for the common case of small read/write sets and performs like a Bloom filter for large sets.
    Microarchitecture, 2007. MICRO 2007. 40th Annual IEEE/ACM International Symposium on; 01/2008
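
The Parallel Bloom organization mentioned in the last abstract (k single-hash-function filters, each probed once per address) can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the design from any of the listed papers: the class name, the bank sizes, and the use of salted BLAKE2b as a stand-in for hardware hash functions are all hypothetical choices.

```python
import hashlib


class ParallelBloomSignature:
    """Software sketch of a signature built from k single-hash Bloom filters.

    Each bank models one single-ported bit array; an address sets (or tests)
    exactly one bit per bank. Names and parameters are illustrative only.
    """

    def __init__(self, total_bits: int = 1024, k: int = 4) -> None:
        if total_bits % k != 0:
            raise ValueError("total_bits must be divisible by k")
        self.k = k
        self.bank_bits = total_bits // k
        self.banks = [0] * k  # each bank is a Python int used as a bit vector

    def _index(self, bank: int, addr: int) -> int:
        # Assumption: salted BLAKE2b stands in for one hardware hash per bank.
        digest = hashlib.blake2b(
            addr.to_bytes(8, "little"), digest_size=8, salt=bytes([bank])
        ).digest()
        return int.from_bytes(digest, "little") % self.bank_bits

    def insert(self, addr: int) -> None:
        """Record an address by setting one bit in every bank."""
        for bank in range(self.k):
            self.banks[bank] |= 1 << self._index(bank, addr)

    def might_contain(self, addr: int) -> bool:
        """True if the address may have been inserted (false positives are
        possible, false negatives are not)."""
        return all(
            (self.banks[bank] >> self._index(bank, addr)) & 1
            for bank in range(self.k)
        )


# Usage: track a transaction's write set and test an incoming address.
write_sig = ParallelBloomSignature()
write_sig.insert(0x1000)
print(write_sig.might_contain(0x1000))
```

Because every address touches each bank exactly once, each bank needs only a single access per insert or lookup, which is what lets the hardware version use single-ported SRAMs.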