RMS-TM: A TRANSACTIONAL MEMORY BENCHMARK FOR
RECOGNITION, MINING AND SYNTHESIS APPLICATIONS
Gökçen Kestor
Barcelona Supercomputing Center
Transactional Memory (TM) is a new concurrency control mechanism that aims to make parallel programming for Chip MultiProcessors (CMPs) easier. Recently, this topic has received substantial research attention, with various software and hardware TM proposals and designs that promise to make TM more efficient. These proposals are usually analyzed using existing TM benchmarks; however, the performance evaluation of TM proposals would be more solid if it included more representative benchmarks, especially from the emerging CMP applications in the Recognition, Mining and Synthesis (RMS) domain.
In this work, we introduce RMS-TM, a new TM benchmark suite that includes selected RMS applications. Besides being non-trivial and scalable, RMS-TM applications have several important properties that make them promising TM workloads, such as I/O operations inside critical sections, nested locking, and, depending on the application, varying percentages of time spent in atomic sections and high commit/abort rates.
We propose a methodical process to construct a TM benchmark suite from candidate applications: in this endeavor, we define static and dynamic pre-transactification phases and propose criteria for selecting the most suitable applications. By analyzing all the BioBench and MineBench RMS applications and applying our methodology, we selected four applications that form the RMS-TM benchmark suite. Our experiments show that the transactified versions of the RMS-TM applications scale as well as their lock-based versions.
Keywords: Transactional Memory, Workload Characterization, BioBench, MineBench.
Since it is expensive (in terms of area and power consumption) to extract more Instruction Level Parallelism (ILP) from modern processors, multicore chips have become a de facto standard, as they provide performance scalability by exploiting Thread Level Parallelism (TLP). However, the complexity of parallel programming and the difficulty of implementing efficient and provably correct programs limit the effective use of these Chip MultiProcessors (CMPs). New programming models have been proposed to ease the writing of parallel applications that perform well on multicore architectures. Transactional Memory (TM) is one such programming model for concurrency control.
One could compare TM with locking, the classical concurrency control mechanism. Lock-based implementations provide consistency and isolation to threads that access shared resources, but programmers have to protect those shared resources explicitly. With TM, programmers simply mark the code regions that access shared resources, and the TM system provides consistency and isolation. Furthermore, TM enables programmers to write simple parallel code with coarse-grained transactions that could perform as well as parallel code that uses fine-grained locks.
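To make the contrast concrete, the Python sketch below (our own illustration, not code from the applications studied here) moves money between accounts in two ways: with fine-grained per-account locks, where the programmer must choose the locks and acquire them in a global order to avoid deadlock, and with a hypothetical `atomic` helper that only marks the region. The helper is emulated with a single global lock, the simplest correct way to obtain atomic-block isolation; it has none of the optimistic concurrency of a real TM system. The names `Account`, `transfer_locked`, `transfer_tm`, `atomic` and `run` are all assumptions.

```python
import threading

class Account:
    def __init__(self, ident, balance):
        self.ident = ident
        self.balance = balance
        self.lock = threading.Lock()  # used only by the fine-grained version

def transfer_locked(src, dst, amount):
    # Fine-grained locking: the programmer decides which locks protect the data
    # and must take them in a fixed (id) order to avoid deadlock.
    first, second = sorted((src, dst), key=lambda a: a.ident)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount

_TM_GLOBAL_LOCK = threading.Lock()

def atomic(body):
    # Hypothetical stand-in for a TM atomic block: one global lock makes the
    # region appear indivisible to other threads, but it serializes every
    # region and, unlike a real TM, does not roll back if body raises.
    with _TM_GLOBAL_LOCK:
        return body()

def transfer_tm(src, dst, amount):
    # The programmer only marks the region; no lock selection or ordering.
    def body():
        src.balance -= amount
        dst.balance += amount
    atomic(body)

def run(transfer, accounts, rounds=1000, nthreads=4):
    # Runs one style at a time; mixing both styles on the same accounts
    # concurrently would be unsafe, because they use different locks.
    def worker(i):
        for r in range(rounds):
            src = accounts[(i + r) % len(accounts)]
            dst = accounts[(i + 2 * r + 1) % len(accounts)]
            if src is not dst:
                transfer(src, dst, 1)
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(a.balance for a in accounts)

accounts = [Account(i, 100) for i in range(5)]
print(run(transfer_locked, accounts))  # 500: money is conserved
print(run(transfer_tm, accounts))      # 500
```

A real TM system executes such regions speculatively, so transfers between disjoint accounts can proceed in parallel; that is what lets coarse-grained transactional code approach the performance of fine-grained locking.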
Recognition, Mining and Synthesis (RMS) applications have clear relevance to mainstream workloads and have often been proposed as good workloads for future many-core processors. This prompted us to study whether RMS applications (BioBench and MineBench) benefit from TM. As we show in Sections 2 and 4, the characteristics of these applications differ from those of existing TM benchmarks; thus, they provide a further set of challenging test applications useful to TM researchers. For example, the applications we study involve I/O operations within critical sections, deep nesting levels, and a varied mix of long and short transactions.
Before selecting applications for transactification from the RMS benchmarks, we realized that we needed a methodical, well-defined procedure. Consequently, we developed a set of criteria that make a lock-based threaded parallel program a good candidate for transactification. In a pre-transactification phase, we apply these criteria, such as nested locking, complex function call traces, and irrevocable¹ operations inside lock blocks, and filter out those BioBench and MineBench applications that do not generate interesting cases from a TM point of view. In the transactification phase, the selected applications are transactified from their original lock-based parallel versions using a prototype version of the Intel C++ compiler with Software Transactional Memory (STM) support [28, 7]. Finally, we provide information about the lock-based and TM-based implementations of the selected applications in order to provide a direct comparison that measures the benefits of TM.
Moreover, we discuss the challenges we faced while transactifying the applications. We validate the characteristics of the transactified applications with experimental results obtained on a multi-core machine. According to these results, the selected applications present a wide range of transactional and runtime characteristics that qualify them as a new and comprehensive benchmark suite for evaluating TM designs. The most desirable and important of these properties are the following:
• Nested transactions (up to depth level 9): the depth of nested transactions is unknown at compile time due to conditional recursive function calls inside atomic blocks.
• Large amounts of I/O operations, memory management operations and library calls in atomic blocks.
• Complex function calls and control flow inside atomic blocks.
• A varied mix of long and short transactions with different read- and write-set sizes.
• High and low contention.
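The first property can be illustrated with a small sketch. It assumes flattened (subsumption) nesting, one common way TM systems handle nested atomic blocks: an inner block simply joins the enclosing transaction, so the runtime only needs a per-thread depth counter, and only the outermost block begins and commits. In the hypothetical `insert` function, each recursive call opens a new atomic block, so the nesting depth equals the length of the input and cannot be known at compile time. `FlatNestingTM`, its global lock, and `insert` are our assumptions, not the mechanism of the STM compiler used in this work.

```python
import threading
from contextlib import contextmanager

class FlatNestingTM:
    """Toy runtime: flattened nesting, atomicity emulated by one global lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._local = threading.local()
        self.commits = 0
        self.max_depth = 0

    @contextmanager
    def atomic(self):
        depth = getattr(self._local, "depth", 0)
        if depth == 0:
            self._lock.acquire()       # outermost block: begin the transaction
        self._local.depth = depth + 1
        self.max_depth = max(self.max_depth, depth + 1)
        try:
            yield
            if depth == 0:
                self.commits += 1      # only the outermost block commits
        finally:
            self._local.depth = depth
            if depth == 0:
                self._lock.release()

def insert(tm, trie, items):
    # Hypothetical recursive update: one nested atomic block per item, so the
    # nesting depth depends on the input data, not on the source code.
    with tm.atomic():
        if not items:
            trie["count"] = trie.get("count", 0) + 1
            return
        child = trie.setdefault(items[0], {})
        insert(tm, child, items[1:])

tm = FlatNestingTM()
trie = {}
insert(tm, trie, list("abcdefgh"))  # 8 items -> 9 nested atomic blocks
print(tm.max_depth, tm.commits)     # 9 1
```

Because inner blocks are flattened, an abort anywhere would roll back the whole outer transaction; closed nesting allows partial rollback of an inner block at the cost of extra bookkeeping.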
The rest of this paper is organized as follows: Section 2 summarizes related work and describes our motivation. Section 3 introduces the RMS-TM benchmark suite. The analysis and selection of the RMS-TM applications are covered in Section 4. Section 5 presents our experimental results for the TM-based applications and compares their behavior against the equivalent lock-based versions. Section 6 concludes the paper, and Section 7 comments on future work.
¹ When a transaction runs in irrevocable mode, it is guaranteed to commit and all other transactions in the system are aborted.
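Why I/O inside an atomic block calls for irrevocability can be seen in a deterministic toy model. A transaction that aborts is re-executed, so output produced speculatively is emitted once per attempt; switching to irrevocable mode before the first I/O operation guarantees a single, committing execution. `ToySTM`, `Tx.io`, the `conflict_on_first_try` flag that forces one abort, and the list standing in for a file are all hypothetical. The model is single-threaded, so it does not show the abort of other transactions described in the footnote.

```python
class Abort(Exception):
    pass

class Tx:
    def __init__(self, runtime):
        self.runtime = runtime
        self.irrevocable = False
        self.writes = {}                 # buffered (redo-log) memory writes

    def write(self, key, value):
        self.writes[key] = value

    def io(self, line, speculative=False):
        # I/O cannot be rolled back, so by default the transaction first
        # becomes irrevocable; speculative=True shows what goes wrong otherwise.
        if not speculative:
            self.irrevocable = True
        self.runtime.output.append(line)

class ToySTM:
    def __init__(self):
        self.memory = {}
        self.output = []                 # stands in for a file or stdout

    def atomic(self, body, conflict_on_first_try=False):
        attempt = 0
        while True:
            attempt += 1
            tx = Tx(self)
            try:
                body(tx)
                # Simulated conflict found at commit time; an irrevocable
                # transaction is guaranteed to commit, so it is never aborted.
                if conflict_on_first_try and attempt == 1 and not tx.irrevocable:
                    raise Abort()
            except Abort:
                continue                 # buffered writes are simply discarded
            self.memory.update(tx.writes)  # commit: publish buffered writes
            return attempt

def report(speculative):
    stm = ToySTM()
    def body(tx):
        tx.write("hits", 1)
        tx.io("hit found", speculative=speculative)
    attempts = stm.atomic(body, conflict_on_first_try=True)
    return attempts, stm.output

print(report(speculative=True))   # (2, ['hit found', 'hit found'])
print(report(speculative=False))  # (1, ['hit found'])
```

Memory writes are safe to retry because they are buffered until commit; the output line is not, which is why the speculative run prints it twice.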
This section reviews other benchmarks developed in recent years for analyzing parallel systems as well as TM systems.
TM micro-benchmarks use single data structures, such as hash tables, linked lists and B-trees, to test TM implementations. These micro-benchmarks are useful for gaining basic insights into TM designs, but they do not exhibit a variety of TM characteristics. More importantly, they are not representative of realistic workloads and thus do not provide a comprehensive analysis of TM systems. In fact, realistic workloads operate on several, more complex data structures at the same time.
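For concreteness, the sketch below shows the shape of such a micro-benchmark: several threads run random insert and lookup operations on one shared hash table, each operation inside a transaction, and the driver counts commits and aborts. The small optimistic STM (a global version clock, buffered writes, read-set validation at commit under a single commit lock) is our own simplified illustration, loosely inspired by clock-based STMs such as TL2; it is not the code of any benchmark cited here, and `TVar`, `STM.atomic`, `benchmark` and all sizes are assumptions.

```python
import random
import threading

class Abort(Exception):
    pass

class TVar:
    def __init__(self, value):
        self.state = (value, 0)  # (value, version), always replaced as one tuple

class STM:
    def __init__(self):
        self.clock = 0                       # global version clock
        self.commit_lock = threading.Lock()  # serializes commits
        self.stats_lock = threading.Lock()
        self.commits = 0
        self.aborts = 0

    def atomic(self, body):
        while True:
            tx = Tx(self)
            try:
                result = body(tx)
                tx.commit()
            except Abort:
                with self.stats_lock:
                    self.aborts += 1
                continue                     # retry with a fresh snapshot
            with self.stats_lock:
                self.commits += 1
            return result

class Tx:
    def __init__(self, stm):
        self.stm = stm
        self.start = stm.clock  # snapshot time
        self.reads = {}         # TVar -> version observed
        self.writes = {}        # TVar -> buffered new value

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        value, version = tvar.state
        if version > self.start:             # changed after our snapshot
            raise Abort()
        self.reads[tvar] = version
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value

    def commit(self):
        with self.stm.commit_lock:
            for tvar, version in self.reads.items():
                if tvar.state[1] != version:  # read set invalidated
                    raise Abort()
            if self.writes:
                new_version = self.stm.clock + 1
                for tvar, value in self.writes.items():
                    tvar.state = (value, new_version)
                self.stm.clock = new_version  # publish only after all writes

def benchmark(threads=4, ops=2000, buckets=16, keys=64, seed=1):
    stm = STM()
    table = [TVar(()) for _ in range(buckets)]  # each bucket: tuple of keys

    def insert(key):
        bucket = table[key % buckets]
        def body(tx):
            items = tx.read(bucket)
            if key not in items:
                tx.write(bucket, items + (key,))
        stm.atomic(body)

    def lookup(key):
        return stm.atomic(lambda tx: key in tx.read(table[key % buckets]))

    def worker(tid):
        rng = random.Random(seed + tid)
        for _ in range(ops):
            key = rng.randrange(keys)
            if rng.random() < 0.5:
                insert(key)
            else:
                lookup(key)

    pool = [threading.Thread(target=worker, args=(t,)) for t in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    stored = sorted(k for b in table for k in b.state[0])
    return stm.commits, stm.aborts, stored

commits, aborts, stored = benchmark()
print(f"commits={commits} aborts={aborts} keys stored={len(stored)}")
```

Since this toy serializes commits on one lock and Python threads share the GIL, its abort counts say little about real STM performance; the sketch only shows the structure of a single-data-structure micro-benchmark and its commit/abort accounting.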
SPLASH-2 is a suite of parallel applications that consists of eight complete applications and four computational kernels. The applications and kernels have been implemented to minimize the time spent inside critical sections; SPLASH-2 therefore focuses on parallel applications with little synchronization between threads. Because of this high degree of parallelism, SPLASH-2 does not provide critical sections of various sizes or different conflict rates; hence, the benchmark suite is not fully capable of evaluating the underlying TM systems and discovering interesting transactional behaviors.
STMBench7 presents an application adapted from the 007 benchmark to analyze Software Transactional Memory (STM). STMBench7 provides coarse-grained and medium-grained locking implementations in both Java and C++ that can be compared to their transactified equivalents. The benchmark performs complex and dynamic operations on a non-trivial data structure. However, it relies on users to mark the critical sections with annotations, which may be error-prone and time-consuming. STMBench7 performs operations only on large data structures; thus, it exhibits only long transactions. This characteristic is useful for evaluating virtualized transactions on TM systems. Virtualized transactions are not limited in terms of execution time, memory footprint or nesting depth. Virtualization is a challenge for Hardware Transactional Memory (HTM) systems that use small hardware caches and physical addresses for transaction bookkeeping.
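A back-of-the-envelope sketch shows why long transactions stress bounded HTM designs. It assumes, hypothetically, an HTM that tracks a transaction's footprint at cache-line granularity in a 32 KB L1 cache with 64-byte lines (512 lines): a short update fits, while a traversal over a large structure overflows, and without virtualization the transaction must abort and fall back (for example, to a global lock). The sizes, the address ranges and the `run_bounded_htm` function are illustrative assumptions, not properties of any system cited here.

```python
LINE_SIZE = 64                             # bytes per cache line (assumed)
CACHE_BYTES = 32 * 1024                    # assumed L1 data cache size
CAPACITY_LINES = CACHE_BYTES // LINE_SIZE  # 512 lines of transactional state

def footprint_lines(addresses):
    """Distinct cache lines touched by a transaction (its read/write set)."""
    return {addr // LINE_SIZE for addr in addresses}

def run_bounded_htm(addresses, capacity=CAPACITY_LINES):
    # A non-virtualized HTM keeps transactional state in the cache; once the
    # footprint no longer fits, the transaction cannot complete in hardware.
    # Associativity is ignored, so real hardware may overflow even earlier.
    lines = footprint_lines(addresses)
    if len(lines) <= capacity:
        return "commit in hardware", len(lines)
    return "overflow: abort and fall back", len(lines)

short_update = range(0, 256, 8)         # touches 256 bytes: 4 lines
long_traversal = range(0, 100_000, 8)   # touches ~100 KB: 1563 lines
print(run_bounded_htm(short_update))    # ('commit in hardware', 4)
print(run_bounded_htm(long_traversal))  # ('overflow: abort and fall back', 1563)
```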
STAMP is a benchmark suite consisting of several applications with various transactional and runtime behaviors. STAMP provides sequential and transactional versions of its applications but does not provide lock-based versions.
Lee-TM is a benchmark suite based on Lee's routing algorithm that promises longer, realistic workloads. It consists of sequential, coarse-grained and medium-grained lock-based, transactional, and optimized transactional (with early release) implementations of Lee's routing algorithm. Thus, Lee-TM is good for comparing different lock-based and transactional implementations. Besides that, the data structure used in the implementation of Lee's
M. Ansari, C. Kotselidis, K. Jarvis, M. Luján, C. Kirkham, and I. Watson. Lee-TM: A non-trivial benchmark for transactional memory. In ICA3PP '08, June 2008.
D. R. Butenhof. Programming with POSIX threads. 1997.
M. J. Carey, D. J. DeWitt, and J. F. Naughton. The 007 benchmark. In SIGMOD '93, 1993.
 B. Chapman, G. Jost, and R. Pas. Using OpenMP: Portable
Shared Memory Parallel Programming (Scientific and Engi-
neering Computation). 2007.
W. Chuang, S. Narayanasamy, G. Venkatesh, J. Sampson, M. Van Biesbrouck, G. Pokam, B. Calder, and O. Colavin. Unbounded page-based transactional memory. SIGPLAN Not.,
Intel Corporation. Intel C++ STM Compiler Prototype Edition 2.0: Language Extensions and User's Guide, March 2008.
S. R. Eddy. Profile hidden Markov models. Bioinformatics,
R. J. Ennals. Adaptive Evaluation of Non-Strict Programs. PhD thesis, July 2007.
GCC. Transactional Memory Support for GCC, 2008.
R. Gioiosa, F. Petrini, K. Davis, and F. Lebaillif-Delamare. Analysis of system overhead on parallel computers. In ISSPIT
R. Guerraoui, M. Kapalka, and J. Vitek. STMBench7: A benchmark for software transactional memory. In SIGOPS '07, 2007.
 L. Hammond, V. Wong, M. Chen, B. D. Carlstrom, J. D.
Davis, B. Hertzberg, M. K. Prabhu, H. Wijaya, C. Kozyrakis,
and K. Olukotun. Transactional memory coherence and con-
sistency. In ISCA ’04, 2004.
 M. Herlihy and J. E. B. Moss. Transactional memory: Archi-
tectural support for lock-free data structures. In ISCA, 1993.
M. V. Joshi and G. Karypis. ScalParC: A new scalable and efficient parallel classification algorithm for mining large datasets. In IPDPS '98, 1998.
Y. Liu, W. Liao, and A. Choudhary. A fast high utility itemsets mining algorithm. In UBDM '05, 2005.
Y. Liu, J. Pisharath, W. Liao, G. Memik, A. Choudhary, and P. Dubey. Performance evaluation and characterization of scalable data mining algorithms. In 16th PDCS '04, 2004.
C. C. Minh, J. Chung, C. Kozyrakis, and K. Olukotun. STAMP: Stanford transactional applications for multi-processing. In IISWC '08, September 2008.
K. E. Moore, J. Bobba, M. J. Moravan, M. D. Hill, and D. A. Wood. LogTM: Log-based transactional memory. In HPCA '06, 2006.
R. Narayanan, B. Ozisikyilmaz, J. Zambreno, G. Memik, and A. Choudhary. MineBench: A benchmark suite for data mining workloads. In 2006 IEEE International Symposium on,
S. B. Needleman and C. D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology,
The OpenMP API specification for parallel programming. Available at http://www.openmp.org.
OProfile - a system profiler for Linux.
C. Perfumo, N. Sönmez, S. Stipic, O. Unsal, A. Cristal, T. Harris, and M. Valero. The limits of software transactional memory (STM): Dissecting Haskell STM applications on a many-core environment. In CF '08, 2008.
F. Petrini, D. Kerbyson, and S. Pakin. The case of the missing supercomputer performance: Achieving optimal performance on the 8,192 processors of ASCI Q. In ACM/IEEE SC '03, 2003.
 T. F. Smith and M. S. Waterman. Identification of common
molecular subsequences. 1981.
D. Tsafrir, Y. Etsion, D. G. Feitelson, and S. Kirkpatrick. System noise, OS clock ticks, and fine-grained parallel applications. In ICS '05, 2005.
 C. Wang, W. Chen, Y. Wu, B. Saha, and A. Adl-Tabatabai.
Code generation and optimization for transactional memory
constructs in an unmanaged language. In CGO ’07, 2007.
 A. Welc, B. Saha, and A. Adl-Tabatabai. Irrevocable transac-
tions and their applications. In SPAA ’08, 2008.
S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 programs: Characterization and methodological considerations. In ISCA '95, 1995.
M. J. Zaki, M. Ogihara, S. Parthasarathy, and W. Li. Parallel data mining for association rules on shared-memory multi-processors. In KAIS, 1996.
J. Zambreno, B. Ozisikyilmaz, G. Memik, and A. Choudhary. Performance characterization of data mining applications using MineBench. In 9th CAECW, 2006.
 F. Zyulkyarov, S. Cvijic, O. Unsal, A. Cristal, E. Ayguade,
T. Harris, and M. Valero. Wormbench - a configurable work-
load for evaluating transactional memory systems. In MEDEA
 F. Zyulkyarov, V. Gajinov, O. Unsal, A. Cristal, E. Ayguade,
T. Harris, and M. Valero. Atomic quake: Use case of transac-
tional memory in an interactive multiplayer game server. In
PPoPP ’09, 2009.