RMS-TM: A transactional memory benchmark for recognition, mining and synthesis applications

Transactional Memory (TM) is a new concurrency control mechanism that aims to make parallel programming for Chip MultiProcessors (CMPs) easier. Recently, this topic has received substantial research attention, with various software and hardware TM proposals and designs that promise to make TM more efficient. These proposals are usually analyzed using existing TM benchmarks; however, the performance evaluation of TM proposals would be more solid if it included more representative benchmarks, especially from the emerging future CMP applications in the Recognition, Mining and Synthesis (RMS) domain. In this work, we introduce RMS-TM, a new TM benchmark suite that includes selected RMS applications. Besides being non-trivial and scalable, RMS-TM applications have several important properties that make them promising candidates as good TM workloads, such as I/O operations inside critical sections, nested locking, various percentages of time spent in atomic sections, and high commit/abort rates depending on the application. We propose a methodical process to construct a TM benchmark suite from candidate applications: in this endeavor, we divide the application selection process into static and dynamic pre-transactification phases and propose criteria for selecting the most suitable applications. Analyzing all the BioBench and MineBench RMS applications and applying our methodology, we selected 4 applications which form the RMS-TM benchmark suite. Our experiments show that the transactified versions of RMS-TM applications scale as well as their lock-based versions.

  • Source
    • "That exploration set the TM properties to a baseline configuration and then varied a single TM property at a time while allowing for a maximum of 20 retries in the Ser-Control policy. That threshold was obtained empirically as the best average result for a series of experiments with the RMS-TM [16] and STAMP benchmarks. Section 4.2 experiments are repeated in this section, but this time we use the htm-pBuilder tool to vary the maximum number of retries."
    ABSTRACT: This paper presents an extensive performance study of the implementation of Hardware Transactional Memory (HTM) in the Haswell generation of Intel x86 core processors. It evaluates the strengths and weaknesses of this new architecture by exploring several dimensions in the space of Transactional Memory (TM) application characteristics using the Eigenbench [1] and the CLOMP-TM [2] benchmarks. This paper also introduces a new tool, called htm-pBuilder, that tailors fallback policies and allows independent exploration of their parameters. This detailed performance study provides insights on the constraints imposed by Intel’s Transaction Synchronization Extension (Intel’s TSX) and introduces a simple but efficient policy for guaranteeing forward progress on top of the best-effort Intel HTM, which was critical to achieving performance. The evaluation also shows that there are a number of potential improvements for designers of TM applications and software systems that use Intel’s TM and provides recommendations to extract maximum benefit from the current TM support available in Haswell.
    Full-text · Article · Dec 2015
  • Source
    • "For example, as long as the program correctness is preserved, the programmer should use two smaller atomic blocks instead of one large atomic block or, as in Figure 1, put the atomic block inside the while loop instead of outside. In an earlier paper, we illustrated examples where smaller atomic blocks aborted less frequently and incurred less wasted work when they did abort [15, 23, 28]. In addition, the underlying TM system may support language-level primitives to tune performance, or provide an API that the programmer can use to give hints about the shared data structures."
    ABSTRACT: Many researchers have developed applications using transactional memory (TM) with the purpose of benchmarking different implementations, and studying whether or not TM is easy to use. However, comparatively little has been done to provide general-purpose tools for profiling and optimizing programs which use transactions. In this paper we introduce a series of profiling and optimization techniques for TM applications. The profiling techniques are of three types: (i) techniques to identify multiple potential conflicts from a single program run, (ii) techniques to identify the data structures involved in conflicts by using a symbolic path through the heap, rather than a machine address, and (iii) visualization techniques to summarize how threads spend their time and which of their transactions conflict most frequently. Altogether they provide in-depth and comprehensive information about the wasted work caused by aborting transactions. To reduce the contention between transactions we suggest several TM-specific optimizations which leverage nested transactions, transaction checkpoints, early release, etc. To examine the effectiveness of the profiling and optimization techniques, we provide a series of illustrations from the STAMP TM benchmark suite and from the synthetic WormBench workload. First we analyze the performance of TM applications using our profiling techniques and then we apply various optimizations to improve the performance of the Bayes, Labyrinth and Intruder applications. We discuss the design and implementation of the profiling techniques in the Bartok-STM system. We process data offline or during garbage collection, where possible, in order to minimize the probe effect introduced by profiling.
    Preview · Article · Feb 2012 · International Journal of Parallel Programming
  • Source
    • "Other interesting future issues are related to RAC, such as adaptive adjustment of sampling interval and finding the optimal parameters such as MIN and MAX for the abort/success ratio. We will also investigate how to best adjust the admission quota Q using more benchmarks such as RMS-TM [37] and STMBench [38]."
    ABSTRACT: This paper proposes the View-Oriented Transactional Memory (VOTM) model to seamlessly integrate locking mechanism and transactional memory. The VOTM model allows programmers to partition the shared memory into "views", which are non-overlapping sets of shared data objects. The Restricted Admission Control (RAC) scheme can then control the number of processes accessing each view individually in order to reduce the number of aborts of transactions. The RAC scheme has the merits of both the locking mechanism and the transactional memory. Experimental results demonstrate that VOTM outperforms traditional transactional memory models such as TinySTM by up to 270%.
    Full-text · Conference Paper · Oct 2011