RMS-TM: A transactional memory benchmark for recognition, mining and synthesis applications

ABSTRACT Transactional Memory (TM) is a new concurrency control mechanism that aims to make parallel programming for Chip MultiProcessors (CMPs) easier. Recently, this topic has received substantial research attention, with various software and hardware TM proposals and designs that promise to make TM more efficient. These proposals are usually analyzed using existing TM benchmarks; however, the performance evaluation of TM proposals would be more solid if it included more representative benchmarks, especially from the emerging future CMP applications in the Recognition, Mining and Synthesis (RMS) domain. In this work, we introduce RMS-TM, a new TM benchmark suite that includes selected RMS applications. Besides being non-trivial and scalable, RMS-TM applications have several important properties that make them promising candidates as good TM workloads, such as I/O operations inside critical sections, nested locking, various percentages of time spent in atomic sections, and high commit/abort rates depending on the application. We propose a methodical process to construct a TM benchmark suite from candidate applications: in this endeavor, we divide the application selection process into static and dynamic pre-transactification phases and propose criteria for selecting the most suitable applications. Analyzing all the BioBench and MineBench RMS applications and applying our methodology, we selected 4 applications which form the RMS-TM benchmark suite. Our experiments show that the transactified versions of RMS-TM applications scale as well as their lock-based versions.

  • ABSTRACT: In this paper, we take a MIPS-based open-source uniprocessor soft core, Plasma, and extend it to obtain the Beefarm infrastructure for FPGA-based multiprocessor emulation, a popular research topic of the last few years both in the FPGA and the computer architecture communities. We discuss various design tradeoffs and we demonstrate superior scalability through experimental results compared to traditional software instruction set simulators. Based on our experience of designing and building a complete FPGA-based multiprocessor emulation system that supports run-time and compiler infrastructure and on the actual executions of our experiments running Software Transactional Memory (STM) benchmarks, we comment on the pros, cons and future trends of using hardware-based emulation for research.
    Reconfigurable Computing: Architectures, Tools and Applications - 7th International Symposium, ARC 2011, Belfast, UK, March 23-25, 2011. Proceedings; 01/2011
  • ABSTRACT: The simulation of the dynamics and kinematics of solid bodies is an important problem in a wide variety of fields in computing, ranging from animation and interactive environments to scientific simulations. While rigid body simulation has a significant amount of potential parallelism, efficiently synchronizing irregular accesses to the large amount of mutable shared data in such programs remains a hurdle. There has been a significant amount of interest in transactional memory systems for their potential to alleviate some of the problems associated with fine-grained locking and, more broadly, for writing correct and efficient parallel programs. While results so far are promising, the effectiveness of TM systems has been predominantly evaluated on small benchmarks and kernels. In this paper we present our experiences in parallelizing ODE, a real-time physics engine that is widely used in commercial and open-source games. Rigid body simulation in ODE consists of two main phases that are amenable to effective coarse-grained parallelization and which are also suitable for using transactions to orchestrate shared data synchronization. We found ODE to be a good candidate for applying parallelism and transactions: it is a large real-world application, there is a large amount of potential parallelism, it exhibits irregular access patterns, and the amount of contention may vary at runtime. We present an experimental evaluation of our implementation of the parallel transactional ODE engine that shows speedups of up to 1.27x relative to the sequential version.
    Euro-Par 2011 Parallel Processing - 17th International Conference, Euro-Par 2011, Bordeaux, France, August 29 - September 2, 2011, Proceedings, Part II; 01/2011
  • ABSTRACT: There are a significant number of Transactional Memory (TM) proposals, varying in almost all aspects of the design space. Although several transactional benchmarks have been suggested, a simple, yet thorough, evaluation framework is still needed to completely characterize a TM system and allow for comparison among the various proposals. Unfortunately, TM system evaluation is difficult because the application characteristics which affect performance are often difficult to isolate from each other. We propose a set of orthogonal application characteristics that form a basis for transactional behavior and are useful in fully understanding the performance of a TM system. In this paper, we present EigenBench, a lightweight yet powerful microbenchmark for fully evaluating a transactional memory system. We show that EigenBench is useful for thoroughly exploring the orthogonal space of TM application characteristics. Because of its flexibility, our microbenchmark is also capable of reproducing a representative set of TM performance pathologies. We use EigenBench to evaluate two well-known TM systems and provide significant insight about their strengths and weaknesses. We also demonstrate how EigenBench can be used to mimic the evaluation coverage of a popular TM benchmark suite called STAMP.
    Workload Characterization (IISWC), 2010 IEEE International Symposium on; 01/2011

