RMS-TM: A transactional memory benchmark for recognition, mining and synthesis applications



Transactional Memory (TM) is a new concurrency control mechanism that aims to make parallel programming for Chip MultiProcessors (CMPs) easier. Recently, this topic has received substantial research attention, with various software and hardware TM proposals and designs that promise to make TM more efficient. These proposals are usually analyzed using existing TM benchmarks; however, the performance evaluation of TM proposals would be more solid if it included more representative benchmarks, especially from the emerging future CMP applications in the Recognition, Mining and Synthesis (RMS) domain. In this work, we introduce RMS-TM, a new TM benchmark suite that includes selected RMS applications. Besides being non-trivial and scalable, RMS-TM applications have several important properties that make them promising candidates as good TM workloads, such as I/O operations inside critical sections, nested locking, and, depending on the application, varying percentages of time spent in atomic sections and high commit/abort rates. We propose a methodical process to construct a TM benchmark suite from candidate applications: in this endeavor, we divide the application selection process into static and dynamic pre-transactification phases and propose criteria for selecting the most suitable applications. Analyzing all the BioBench and MineBench RMS applications and applying our methodology, we selected four applications, which form the RMS-TM benchmark suite. Our experiments show that the transactified versions of the RMS-TM applications scale as well as their lock-based versions.
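
To make the "I/O operations inside critical sections" and "nested locking" properties concrete, here is a minimal, hypothetical C sketch of the kind of lock-based pattern the abstract refers to; the names (table_lock, log_lock, record_hit) are illustrative and do not come from the benchmark sources. Such a critical section is hard to transactify naively: an aborted transaction cannot undo the fprintf, and the inner lock must be subsumed by the enclosing atomic block.

    #include <pthread.h>
    #include <stdio.h>

    /* Hypothetical lock-based pattern: I/O and a nested lock acquisition
     * inside a critical section (all names are illustrative). */
    pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_t log_lock   = PTHREAD_MUTEX_INITIALIZER;
    static long hits;

    void record_hit(FILE *log)
    {
        pthread_mutex_lock(&table_lock);   /* outer critical section */
        hits++;                            /* shared update: easy to transactify */
        pthread_mutex_lock(&log_lock);     /* nested locking */
        fprintf(log, "hits=%ld\n", hits);  /* irrevocable I/O inside the critical section */
        pthread_mutex_unlock(&log_lock);
        pthread_mutex_unlock(&table_lock);
    }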

Citing publications:
    • "For example, as long as the program correctness is preserved, the programmer should use two smaller atomic blocks instead of one large atomic block or as in Figure 1 put the atomic block inside the while loop instead of outside. In an earlier paper, we illustrated examples where smaller atomic blocks aborted less frequently and incurred less wasted work when they did abort [15] [23] [28]. "
    ABSTRACT: Many researchers have developed applications using transactional memory (TM) with the purpose of benchmarking different implementations, and studying whether or not TM is easy to use. However, comparatively little has been done to provide general-purpose tools for profiling and optimizing programs which use transactions. In this paper we introduce a series of profiling and optimization techniques for TM applications. The profiling techniques are of three types: (i) techniques to identify multiple potential conflicts from a single program run, (ii) techniques to identify the data structures involved in conflicts by using a symbolic path through the heap, rather than a machine address, and (iii) visualization techniques to summarize how threads spend their time and which of their transactions conflict most frequently. Altogether they provide in-depth and comprehensive information about the wasted work caused by aborting transactions. To reduce the contention between transactions we suggest several TM-specific optimizations which leverage nested transactions, transaction checkpoints, early release, and so on. To examine the effectiveness of the profiling and optimization techniques, we provide a series of illustrations from the STAMP TM benchmark suite and from the synthetic WormBench workload. First we analyze the performance of TM applications using our profiling techniques and then we apply various optimizations to improve the performance of the Bayes, Labyrinth and Intruder applications. We discuss the design and implementation of the profiling techniques in the Bartok-STM system. We process data offline or during garbage collection, where possible, in order to minimize the probe effect introduced by profiling.
    International Journal of Parallel Programming 02/2012; 40(1):25-56. DOI:10.1007/s10766-011-0177-2 · 0.49 Impact Factor
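
    The restructuring described in the quoted passage can be illustrated with a short C sketch. The TM_BEGIN()/TM_END() atomic-block macros below are assumptions in the style of STAMP's tm.h (defined here as empty stubs so the fragment compiles), and the list traversal is likewise illustrative.

        #include <stddef.h>

        /* Assumed atomic-block macros; a real port would map these to the
         * TM system's begin/commit primitives. */
        #define TM_BEGIN() /* begin transaction */
        #define TM_END()   /* commit transaction */

        typedef struct node { int value; struct node *next; } node_t;
        void process(node_t *n);   /* illustrative per-item work */

        /* One large atomic block around the while loop: a single conflict
         * aborts the work of every iteration executed so far. */
        void process_list_coarse(node_t *item)
        {
            TM_BEGIN();
            while (item != NULL) {
                process(item);
                item = item->next;
            }
            TM_END();
        }

        /* Atomic block inside the while loop: an abort wastes at most one
         * iteration's work (assuming the traversal itself is safe outside
         * the transaction). */
        void process_list_fine(node_t *item)
        {
            while (item != NULL) {
                TM_BEGIN();
                process(item);
                TM_END();
                item = item->next;
            }
        }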
    • "Other TM benchmark suites have been proposed, including those from Kestor et al. [9], Ansari et al. [10], and Guerraoui et al. [11]. These suites focus on particular application domains and are not designed to exercise the full breadth of TM behavior. "
    ABSTRACT: There are a significant number of Transactional Memory (TM) proposals, varying in almost all aspects of the design space. Although several transactional benchmarks have been suggested, a simple, yet thorough, evaluation framework is still needed to completely characterize a TM system and allow for comparison among the various proposals. Unfortunately, TM system evaluation is difficult because the application characteristics which affect performance are often difficult to isolate from each other. We propose a set of orthogonal application characteristics that form a basis for transactional behavior and are useful in fully understanding the performance of a TM system. In this paper, we present EigenBench, a lightweight yet powerful microbenchmark for fully evaluating a transactional memory system. We show that EigenBench is useful for thoroughly exploring the orthogonal space of TM application characteristics. Because of its flexibility, our microbenchmark is also capable of reproducing a representative set of TM performance pathologies. In this paper, we use EigenBench to evaluate two well-known TM systems and provide significant insight about their strengths and weaknesses. We also demonstrate how EigenBench can be used to mimic the evaluation coverage of a popular TM benchmark suite called STAMP.
    2010 IEEE International Symposium on Workload Characterization (IISWC); 01/2011
    • "To test multicore performance with STM benchmarks running on the BeeFarm, we have run ScalParC from RMS-TM [13], Intruder and SSCA2 TM benchmarks from STAMP [16] which are very commonly used for TM research. We modified ScalParC with explicit calls to use TinySTM. "
    ABSTRACT: In this paper, we take a MIPS-based open-source uniprocessor soft core, Plasma, and extend it to obtain the BeeFarm infrastructure for FPGA-based multiprocessor emulation, a popular research topic of recent years in both the FPGA and computer architecture communities. We discuss various design tradeoffs and we demonstrate superior scalability through experimental results compared to traditional software instruction set simulators. Based on our experience of designing and building a complete FPGA-based multiprocessor emulation system that supports run-time and compiler infrastructure, and on the actual executions of our experiments running Software Transactional Memory (STM) benchmarks, we comment on the pros, cons, and future trends of using hardware-based emulation for research.
    Reconfigurable Computing: Architectures, Tools and Applications - 7th International Symposium, ARC 2011, Belfast, UK, March 23-25, 2011. Proceedings; 01/2011
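
    For readers unfamiliar with what "explicit calls" to a word-based STM look like, below is a minimal C sketch using TinySTM's stm_start/stm_load/stm_store/stm_commit interface. The exact stm_start signature and restart plumbing vary across TinySTM releases, and split_count is an illustrative stand-in for ScalParC's real shared updates, so treat this as a sketch under those assumptions rather than the actual modification.

        #include <setjmp.h>
        #include "stm.h"   /* TinySTM's word-based API */

        volatile stm_word_t split_count;   /* illustrative shared counter */

        /* Assumes stm_init() and stm_init_thread() were called at startup.
         * Note: stm_start's signature (attr by value vs. by pointer)
         * differs between TinySTM releases; this follows the 1.x style. */
        void increment_split_count(void)
        {
            stm_tx_attr_t attr = {0};
            sigjmp_buf *env = stm_start(attr);   /* begin the transaction */
            if (env != NULL)
                sigsetjmp(*env, 0);              /* aborted txns longjmp back here to retry */
            stm_word_t v = stm_load(&split_count);   /* transactional read */
            stm_store(&split_count, v + 1);          /* transactional write */
            stm_commit();
        }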

