Is Transactional Programming Actually Easier?
Christopher J. Rossbach and Owen S. Hofmann and Emmett Witchel
University of Texas at Austin
{rossbach,osh,witchel}@cs.utexas.edu
Abstract
Chip multi-processors (CMPs) have become ubiquitous, while
tools that ease concurrent programming have not. The promise
of increased performance for all applications through ever more
parallel hardware requires good tools for concurrent programming,
especially for average programmers. Transactional memory (TM)
has enjoyed recent interest as a tool that can help programmers
program concurrently.
The transactional memory (TM) research community is heavily
invested in the claim that programming with transactional memory
is easier than alternatives (like locks), but evidence for or against
the veracity of this claim is scant. In this paper, we describe a user-
study in which 237 undergraduate students in an operating systems
course implement the same programs using coarse and fine-grain
locks, monitors, and transactions. We surveyed the students after
the assignment, and examined their code to determine the types and
frequency of programming errors for each synchronization tech-
nique. Inexperienced programmers found baroque syntax a bar-
rier to entry for transactional programming. On average, subjective
evaluation showed that students found transactions harder to use
than coarse-grain locks, but slightly easier to use than fine-grained
locks. Detailed examination of synchronization errors in the stu-
dents’ code tells a rather different story. Overwhelmingly, the num-
ber and types of programming errors the students made were much
lower for transactions than for locks. On a similar programming
problem, over 70% of students made errors with fine-grained lock-
ing, while less than 10% made errors with transactions.
Categories and Subject Descriptors D.1.3 [Programming Tech-
niques]: [Concurrent Programming]
General Terms Design, Performance
Keywords Transactional Memory, Optimistic Concurrency, Syn-
chronization
1. Introduction
The increasing ubiquity of chip multiprocessors has resulted in a
high availability of parallel hardware resources. However, while
parallel computing resources have become commonplace, con-
current programs have not; concurrent programming remains a
challenging endeavor, even for experienced programmers. Trans-
actional memory (TM) has enjoyed considerable research attention
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
PPoPP’10, January 9–14, 2010, Bangalore, India.
Copyright © 2010 ACM 978-1-60558-708-0/10/01...$10.00
precisely because it promises to make the development of concur-
rent software easier. Transactional memory (TM) researchers po-
sition TM as an enabling technology for concurrent programming
for the “average” programmer.
Transactional memory allows the programmer to delimit re-
gions of code that must execute atomically and in isolation. It
promises the performance of fine-grain locking with the code sim-
plicity of coarse-grain locking. In contrast to locks, which use mu-
tual exclusion to serialize access to critical sections, TM is typ-
ically implemented using optimistic concurrency techniques, al-
lowing critical sections to proceed in parallel. Because this tech-
nique dramatically reduces serialization when dynamic read-write
and write-write sharing is rare, it can translate directly to improved
performance without additional effort from the programmer. More-
over, because transactions eliminate many of the pitfalls commonly
associated with locks (e.g. deadlock, convoys, poor composability),
transactional programming is touted as being easier than lock based
programming.
Evaluating the ease of transactional programming relative to
locks is largely uncharted territory. Naturally, the question of
whether transactions are easier to use than locks is qualitative.
Moreover, since transactional memory is still a nascent technology,
the only available transactional programs are research benchmarks,
and the population of programmers familiar with both transactional
memory and locks for synchronization is vanishingly small.
To address the absence of evidence, we developed a concurrent
programming project for students of an undergraduate Operating
Systems course at The University of Texas at Austin, in which stu-
dents were required to implement the same concurrent program us-
ing coarse and fine-grained locks, monitors, and transactions. We
surveyed students about the relative ease of transactional program-
ming as well as their investment of development effort using each
synchronization technique. Additionally, we examined students’
solutions in detail to characterize and classify the types and fre-
quency of programming errors students made with each program-
ming technique.
This paper makes the following contributions:
• A project and design for collecting data relevant to the question of the relative ease of programming with different synchronization primitives.
• Data from 237 student surveys and 1323 parallel programs that constitute the largest-scale (to our knowledge) empirical data relevant to the question of whether transactions are, in fact, easier to use than locks.
• A taxonomy of synchronization errors made with different synchronization techniques, and a characterization of the frequency with which such errors occur in student programs.
2. Sync-gallery
In this section, we describe sync-gallery, the Java programming
project we assigned to students in an undergraduate operating systems course.
Figure 1. A screen-shot of sync-gallery, the program undergraduate OS students were asked to implement. In the figure, the colored boxes
represent 16 shooting lanes in a gallery populated by shooters, or rogues. A red or blue box represents a lane in which a rogue has shot either
a red or blue paint ball. A white box represents a lane in which no shooting has yet taken place. A purple box indicates a lane in which both
a red and a blue shot have occurred, indicating a race condition in the program. Sliders control the rate at which shooting and cleaning threads
perform their work.
The project is designed to familiarize students with
concurrent programming in general, and with techniques and id-
ioms for using a variety of synchronization primitives to manage
data structure consistency. Figure 1 shows a screen shot from the
sync-gallery program.
The project asks students to consider the metaphor of a shooting
gallery, with a fixed number of lanes in which rogues (shooters) can
shoot in individual lanes. Being pacifists, we insist that shooters in
this gallery use red or blue paint balls rather than bullets. Targets
are white, so that lanes will change color when a rogue has shot in
one. Paint is messy, necessitating cleaners to clean the gallery when
all lanes have been shot. Rogues and cleaners are implemented
as threads that must check the state of one or more lanes in the
gallery to decide whether it is safe to carry out their work. For
rogues, this work amounts to shooting at some number of randomly
chosen lanes. Cleaners must return the gallery to its initial state
with all lanes white. The students must use various synchronization
primitives to enforce a number of program invariants:
1. Only one rogue may shoot in a given lane at a time.
2. Rogues may only shoot in a lane if it is white.
3. Cleaners should only clean when all lanes have been shot
(are non-white).
4. Only one thread can be engaged in the process of cleaning
at any given time.
If a student writes code for a rogue that fails to respect the first
two invariants, the lane can be shot with both red and blue, and will
therefore turn purple, giving the student instant visual feedback that
a race condition exists in the program. If the code fails to respect
the second two invariants, no visual feedback is given (indeed,
these invariants can only be checked by inspection of the code in
the current implementation).
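The following sketch distills the rogue logic that the invariants govern (class and method names here are hypothetical placeholders, not the provided framework’s API):

    // Unsynchronized rogue logic; each marked step needs synchronization.
    void doWork(Gallery gallery, Color myColor, Random rand) {
        Lane l = gallery.getLane(rand.nextInt(gallery.numLanes()));
        if (l.getColor() == Color.WHITE) {   // invariant 2: shoot only white lanes
            l.shoot(myColor);                // invariant 1: one shooter per lane
        }
        if (gallery.allLanesShot()) {        // invariant 3: clean only a full gallery
            gallery.cleanAllLanes();         // invariant 4: at most one cleaner
        }
    }

Without synchronization, a second rogue can observe the same white lane between the check and the shot, producing exactly the purple-lane race described above.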
We ask the students to implement 9 different versions of rogues
(Java classes) that are instructive for different approaches to syn-
chronization. Table 1 summarizes the rogue variations. Gaining ex-
clusive access to one or two lanes of the gallery in order to test
the lane’s state and then modify it corresponds directly to the real-
world programming task of locking some number of resources in
order to test and modify them safely in the presence of concurrent
threads.
2.1 Locking
We ask the students to synchronize rogue and cleaner threads in the
sync-gallery using locks to teach them about coarse and fine-grain
locking. To ensure that students write code that explicitly performs
locking and unlocking operations, we require them to use the Java
ReentrantLock class and do not allow use of the synchronized
keyword. In locking rogue variations, cleaners do not use dedicated
threads; the rogue that colors the last white lane in the gallery
is responsible for becoming a cleaner and subsequently cleaning
all lanes. There are four variations on this rogue type: Coarse,
Fine, Coarse2 and Fine2. In the coarse implementation, students
are allowed to use a single global lock which is acquired before
attempting to shoot or clean. In the fine-grain implementation, we
require the students to implement individual locks for each lane.
The Coarse2 and Fine2 variations require the same mapping of
locks to objects in the gallery as their counterparts above, but
introduce the additional stipulation that rogues must acquire access
to and shoot at two random lanes rather than one. The variation
illustrates that fine-grain locking requires a lock-ordering discipline
to avoid deadlock, while a single coarse lock does not. Naturally,
the use of fine grain lane locks complicates the enforcement of
invariants 3 and 4 above.
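To make the required ordering discipline concrete, the following sketch (hypothetical helper types; only the mandated ReentrantLock usage reflects the assignment) acquires two per-lane locks in ascending index order:

    import java.util.concurrent.locks.ReentrantLock;

    // locks[i] guards lane i. Acquiring in ascending index order means no two
    // rogues can each hold one of the other's locks, so deadlock is impossible.
    void shootTwo(ReentrantLock[] locks, Lane[] lanes, int a, int b, Color c) {
        int first = Math.min(a, b), second = Math.max(a, b);
        locks[first].lock();
        locks[second].lock();
        try {
            if (lanes[a].getColor() == Color.WHITE) lanes[a].shoot(c);
            if (lanes[b].getColor() == Color.WHITE) lanes[b].shoot(c);
        } finally {
            locks[second].unlock();
            locks[first].unlock();
        }
    }

If rogues instead locked their lanes in the order drawn, one rogue could hold lane 3 while waiting for lane 7 as another holds lane 7 while waiting for lane 3, deadlocking both.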
2.2 Monitor implementations
Two variations of the program require the students to use condition
variables along with signal/wait to implement both fine and coarse
locking versions of the rogue programs. The monitor variations in-
troduce dedicated threads for cleaners: shooters and cleaners must
use condition variables to coordinate shooting and cleaning phases.
In the coarse version (CoarseCleaner), students use a single global
lock, while the fine-grain version (FineCleaner) requires per-lane
locks.
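A minimal sketch of the coarse monitor structure (field and helper names are hypothetical; the while-loop wait idiom is the one the assignment targets):

    import java.util.concurrent.locks.Condition;
    import java.util.concurrent.locks.ReentrantLock;

    final ReentrantLock lock = new ReentrantLock();
    final Condition galleryFull = lock.newCondition();

    // Cleaner thread: sleep until every lane has been shot, then clean.
    void cleanerLoop() throws InterruptedException {
        lock.lock();
        try {
            while (!allLanesShot())    // re-check the condition after every wakeup
                galleryFull.await();
            cleanAllLanes();
        } finally {
            lock.unlock();
        }
    }

    // Rogue: after shooting, wake the cleaner once the gallery is full.
    void afterShot() {
        lock.lock();
        try {
            if (allLanesShot())
                galleryFull.signal();
        } finally {
            lock.unlock();
        }
    }

Using while rather than if around await() matters: waiting with if was one of the recurring cv-use errors catalogued in Section 4.3.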
2.3 Transactions
Finally, the students are asked to implement 3 TM-based variants
of the rogues that implement the same specification as their cor-
responding locking variations, but use transactional memory for
synchronization instead of locks. The most basic TM-based rogue,
TM, is analogous to the Coarse and Fine versions: rogue and
cleaner threads are not distinct, and shooters need shoot only one
lane, while the TM2 variation requires that rogues shoot at two
lanes rather than one. In the TMCleaner, rogues and cleaners have dedicated threads.
(left) DSTM2:

    TMInt y = new TMInt(0);
    TMInt x = new TMInt(10);
    Callable c = new Callable<Void>() {
        public Void call() {
            // txnl code
            y.setValue(x.getValue() * 2);
            return null;
        }
    };
    Thread.doIt(c);

(middle) JDASTM:

    TMInt y = new TMInt(0);
    TMInt x = new TMInt(10);
    Transaction tx = new Transaction(id);
    boolean done = false;
    while (!done) {
        try {
            tx.BeginTransaction();
            // txnl code
            y.setValue(x.getValue() * 2);
            done = tx.CommitTransaction();
        } catch (AbortException e) {
            tx.AbortTransaction();
            done = false;
        }
    }

(right) ideal atomic keyword syntax:

    int y = 0;
    int x = 10;
    atomic {
        y = x * 2;
    }

Figure 2. Examples of (left) DSTM2 concrete syntax, (middle) JDASTM concrete syntax, and (right, for comparison) ideal atomic keyword syntax.
Year 1 of the study used DSTM2, while years 2 and 3 used JDASTM.
Students can rely on the TM subsystem to de-
tect conflicts and restart transactions to enforce all invariants, so no
condition synchronization is required.
2.4 Transactional Memory Support
Our ideal TM system would support atomic blocks in the Java
language, allowing students to write transactional code of the form:
    void shoot() {
        atomic {
            Lane l = getLane(rand());
            if (l.getColor() == WHITE)
                l.shoot(this.color);
        }
    }
No such tool is yet available; implementing compiler support
for atomic blocks, or using a source-to-source compiler such as
Spoon [2], was considered out-of-scope for the project. Instead we
used a TM library.
Using a TM library means that students are forced to deal
directly with the concrete syntax of our TM implementation, and
must manage read and write barriers explicitly. We assigned the
lab to 5 classes over 3 semesters during 3 different school years.
During the first year both classes used DSTM2 [15]. For the second
and third years, all classes used JDASTM [29].
The concrete syntax has a direct impact on ease of program-
ming, as seen in Figure 2, which provides examples for the same
task using DSTM2, JDASTM, and for reference, with language
support using the atomic keyword. While the version with the
atomic keyword is quite simple, both the DSTM2 and JDASTM
examples pepper the actual data structure manipulation with code
that explicitly manages transactions. We replaced DSTM2 in the
second year because we felt that JDASTM syntax was somewhat
less baroque and did not require students to deal directly with
programming constructs like generics. Also, DSTM2 binds trans-
actional execution to specialized thread classes. However, both
DSTM2 and JDASTM require explicit read and write barrier calls
for transactional reads and writes.
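For comparison with the ideal atomic form above, a shoot() written against the JDASTM-style interface of Figure 2 would look roughly like the following (a sketch modeled on the Figure 2 example rather than actual assignment code; the transactional Lane wrapper is assumed to perform read and write barriers inside getColor() and shoot()):

    void shoot() {
        Transaction tx = new Transaction(id);
        boolean done = false;
        while (!done) {
            try {
                tx.BeginTransaction();
                Lane l = getLane(rand());    // reads go through read barriers
                if (l.getColor() == WHITE)
                    l.shoot(this.color);     // writes go through write barriers
                done = tx.CommitTransaction();
            } catch (AbortException e) {
                tx.AbortTransaction();
                done = false;
            }
        }
    }

The retry loop and exception handling are precisely the boiler-plate that the atomic keyword hides, and (as Section 5 notes) they inflate the measured complexity of the JDASTM-based solutions.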
3. Methodology
Students completed the sync-gallery program as a programming
assignment as part of several operating systems classes at The
University of Texas at Austin. In total, 237 students completed the
assignment, spanning five sections in classes from three different
semesters, over three years of the course. The first year of the
course included 84 students, while the second and third included
101 and 53 respectively. We provided an implementation of the
shooting gallery, and asked students to write the rogue classes
described in the previous sections, respecting the given invariants.
We asked students to record the amount of time they spent
designing, coding, and debugging each programming task (rogue).
We use the amount of time spent on each task as a measure of the
difficulty that task presented to the students. This data is presented
in Section 4.1. After completing the assignment, students rated
their familiarity with concurrent programming concepts prior to the
assignment. Students then rated their experience with the various
tasks, ranking synchronization methods with respect to ease of
development, debugging, and reasoning (Section 4.2).
While grading the assignment, we recorded the type and fre-
quency of synchronization errors students made. These are the er-
rors still present in the student’s final version of the code. We use
Rogue name Technique R/C Threads Additional Requirements
Coarse Single global lock not distinct
Coarse2 Single global lock not distinct rogues shoot at 2 random lanes
CoarseCleaner Single global lock, conditions distinct conditions, wait/notify
Fine Per lane locks not distinct
Fine2 Per lane locks not distinct rogues shoot at 2 random lanes
FineCleaner Per lane locks, conditions distinct conditions, wait/notify
TM TM not distinct
TM2 TM not distinct rogues shoot at 2 random lanes
TMCleaner TM distinct
Table 1. The nine different rogue implementations required for the sync-gallery project. The technique column indicates what synchroniza-
tion technique was required. The R/C Threads column indicates whether coordination was required between dedicated rogue and cleaner
threads or not. A value of “distinct” means that rogue and cleaner instances run in their own thread, while a value of “not distinct” means
that the last rogue to shoot an empty (white) lane is responsible for cleaning the gallery.
the frequency with which students made errors as another metric of
the difficulty of various synchronization constructs.
To prevent experience with the assignment as a whole from in-
fluencing the difficulty of each task, we asked students to com-
plete the tasks in different orders. In each group of rogues (single-
lane, two-lane, and separate cleaner thread), students completed the
coarse-grained lock version first. Students then either completed
the fine-grained or TM version second, depending on their assigned
group. We asked students to randomly assign themselves to groups
based on hashes of their name. Due to an error, nearly twice as
many students were assigned to the group completing the fine-
grained version first. However, there were no significant differences
in programming time between the two groups, suggesting that the
order in which students implemented the tasks did not affect the
difficulty of each task.
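A balanced self-assignment rule might look like the following one-liner (purely illustrative; the paper does not record the exact hash scheme the students used or the error that skewed it):

    // Even hash: implement the fine-grained version second; odd hash: TM second.
    int group = studentName.hashCode() & 1;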
3.1 Limitations
Perhaps the most important limitation of the study is the much
greater availability of documentation and tutorial information about
locking than about transactions. The novelty of transactional mem-
ory made it more difficult both to teach and learn. Lectures about
locking drew on a larger body of understanding that has existed for
a longer time. It is unlikely that students from year one influenced
students from year two given the difference in concrete syntax be-
tween the two courses. Students from year two could have influ-
enced those from year three, since the syntax remained the same
from year two to year three.
Figure 3. Average design, coding, and debugging time spent for analogous rogue variations.
Figure 4. Distributions for the amount of time students spent coding and debugging, for all rogue variations.
Another important limitation is the lack of compiler and lan-
guage support for TM in general, and the lack of support for the
atomic keyword specifically. Because of this, programmers must
directly insert read and write barriers and write exception handling
code to manage retrying on conflicts. This yields a concrete syntax
for transactions that is a a barrier to ease of understanding and use
(see §4.2).
4. Evaluation
We examined development time, user experiences, and program-
ming errors to determine the difficulty of programming with vari-
ous synchronization primitives. In general, we found that a single
coarse-grained lock had similar complexity to transactions. Both of
these primitives were less difficult, caused fewer errors, and had
better student responses than fine-grained locking.
4.1 Development time
Figure 3 shows the average time students spent designing, coding
and debugging with each synchronization primitive, for all three
years of the study. To characterize the diversity of time investment
the students reported, Figure 4 shows the distribution of times
students spent on coding and debugging. Figure 4 shows only data
from year two: distributions for years one and three are similar.
On average, transactional memory required more development
time than coarse locks, but less than what was required for fine-
grain locks and condition synchronization. Coloring two lanes or
using condition synchronization is a more complex synchronization
task than using coarse-grained locks, and debugging time increases
more than design or coding time for these complex tasks (Figure 3).
We evaluate the statistical significance of differences in devel-
opment time in Table 2. Using a Wilcoxon signed-rank test, we
evaluate the alternative hypothesis on each pair of synchronization
tasks that the row task required less time than the column task.
Pairs for which the signed-rank test reports a p-value of < .05 are
considered statistically significant, indicating that the row task re-
quired less time than the column. If the p-value is greater than .05,
the difference in time for the tasks is not statistically significant or
the row task required more time than the column task. Results for
the different class years are separated due to differences in the TM
part of the assignment (Section 2.4).
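For reference, the test statistic underlying Table 2 is the standard signed-rank sum (a textbook formulation, not specific to this paper): with d_i the difference between the row-task and column-task times for student i, ranks are assigned to the |d_i|, and

    W^{+} = \sum_{i \,:\, d_i > 0} \mathrm{rank}(|d_i|),

whose distribution under the null hypothesis of differences symmetric about zero yields the reported p-values.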
We found that students took more time on their initial task, be-
cause they were familiarizing themselves with the assignment. Ex-
cept for fine-grain locks, later versions of similar synchronization
primitives took less time than earlier, e.g. the Coarse2 task took
less time than the Coarse task. In addition, condition synchroniza-
tion is difficult. For both rogues with less complex synchroniza-
tion (Coarse and TM), adding condition synchronization increases
the time required for development. For fine-grain locking, students
simply replace one complex problem with a second, and so do not
require significant additional time.
In years one and two, we found that coarse locks and transac-
tions required less time than fine-grain locks on the more complex
two-lane assignments. This echoes the promise of transactions, re-
moving the coding and debugging complexity of fine-grain locking
and lock ordering when more than one lock is required. The same
trend was not observable in year three.
4.2 User experience
To gain insight into the students’ perceptions about the relative ease
of using different synchronization techniques we asked the students
to respond to a survey after completing the sync-gallery project.
The survey ends with 6 questions asking students to rank their
favorite technique with respect to ease of development, debugging,
reasoning about, and so on.
A version of the complete survey can be viewed at [3].
In student opinions, we found that the more baroque syntax of
the DSTM2 system was a barrier to entry for new transactional
programmers. Figure 5 shows student responses to questions about
syntax and ease of thinking about different transactional primitives.
In the first class year (which used DSTM2), students found trans-
actions more difficult to think about, and their syntax more difficult,
than fine-grain locks.
implementation had a less cumbersome syntax, student opinions
aligned with our other findings: TM ranked behind coarse locks,
but ahead of fine-grain. For all three years, other questions on ease
of design and implementation ranked TM ahead of fine-grain locks.
4.3 Synchronization Error Characterization
We examined the solutions from all three years in detail to classify
the types of synchronization errors students made along with their
frequency. This involved both a thorough reading of every student’s
final solutions and automated testing. While the students’ subjec-
tive evaluation of the ease of transactional programming does not
clearly indicate that transactional programming is easier, the types
and frequency of programming errors does.
The diversity of errors present in the students’ programs far
exceeded our expectations, but all of them fit within the taxonomy
described below.
1. Lock ordering (lock-ord). In fine-grain locking solutions, a
program failed to use a lock ordering discipline to acquire locks,
admitting the possibility of deadlock.
2. Checking conditions outside a critical section (lock-cond).
This type of error occurs when code checks a program invariant
with no locks held, and subsequently acts on that invariant after
acquiring locks. This was the most common error in sync-gallery,
and usually occurred when students checked whether to clean the
gallery with no locks held, and then acquired lane locks and
proceeded to clean. The result is a violation of invariant 4 (§2);
a code sketch of this pattern follows the list. This type of error
may be more common because no visual feedback is given when it
is violated (unlike races for shooting lanes, which can result in
purple lanes).
3. Forgotten synchronization (lock-forgot). This class of er-
rors includes all cases where the programmer forgot to acquire
locks, or simply did not realize that a particular region would
require mutual exclusion to be correct.
4. Exotic use of locks (lock-exotic). This error category is a
catch-all for synchronization errors made with locks for which
we were unable to discern the programmer’s actual intent. Programs
that contribute a data point in this category were characterized by
heavy over-use of additional locks.
Year 1
Best syntax
Answers        1      2      3      4
Coarse      69.6%  17.4%     0%   8.7%
Fine        13.0%  43.5%  17.4%  21.7%
TM           8.7%  21.7%  21.7%  43.5%
Conditions     0%  21.7%  52.1%  21.7%
Easiest to think about
Answers        1      2      3      4
Coarse      78.2%  13.0%   4.3%     0%
Fine         4.3%  39.1%  34.8%  17.4%
TM           8.7%  21.7%  26.1%  39.1%
Conditions   4.3%  21.7%  30.4%  39.1%
Year 2
Best syntax
Answers        1      2      3      4
Coarse      61.6%  30.1%   1.3%   4.1%
Fine         5.5%  20.5%  45.2%  26.0%
TM          26.0%  31.5%  19.2%  20.5%
Conditions   5.5%  20.5%  28.8%  39.7%
Easiest to think about
Answers        1      2      3      4
Coarse      80.8%  13.7%   1.3%   2.7%
Fine         1.3%  38.4%  30.1%  28.8%
TM          16.4%  31.5%  30.1%  20.5%
Conditions   4.1%  13.7%  39.7%  39.7%
Year 3
Best syntax
Answers        1      2      3      4
Coarse      72.2%  22.2%   2.8%   2.8%
Fine        16.7%  38.9%  33.3%  11.1%
TM           8.3%    25%    25%  41.7%
Conditions   8.3%  13.9%  36.1%  41.7%
Easiest to think about
Answers        1      2      3      4
Coarse      88.9%   8.3%   0.0%   0.0%
Fine         0.0%  44.4%  33.3%  19.4%
TM           5.6%  27.8%  33.3%  30.8%
Conditions   2.8%  22.2%  27.8%  44.4%
Figure 5. Selected results from student surveys. Column numbers represent rank order, and entries give the percentage of students who
assigned a particular synchronization technique a given rank (e.g. 80.8% of students ranked coarse locks first in the “Easiest to think about”
category). In the first year the assignment was presented, the more complex syntax of DSTM2 made TM more difficult to think about. In the
second year, simpler syntax alleviated this problem.
Coarse Fine TM Coarse2 Fine2 TM2 CoarseCleaner FineCleaner TMCleaner
Coarse Y1 1.00 0.03 0.02 1.00 0.02 1.00 0.95 0.47 0.73
Y2 1.00 0.33 0.12 1.00 0.38 1.00 1.00 0.18 1.00
Y3 1.00 0.06 0.43 1.00 0.17 0.60 0.93 0.02 0.61
Fine Y1 0.97 1.00 0.33 1.00 0.24 1.00 1.00 0.97 0.88
Y2 0.68 1.00 0.58 1.00 0.51 1.00 1.00 0.40 1.00
Y3 0.94 1.00 0.66 1.00 0.70 0.99 0.99 0.35 0.94
TM Y1 0.98 0.68 1.00 1.00 0.13 1.00 1.00 0.98 0.92
Y2 0.88 0.43 1.00 1.00 0.68 1.00 1.00 0.41 1.00
Y3 0.57 0.35 1.00 1.00 0.46 0.84 0.96 0.03 0.82
Coarse2 Y1 <0.01 <0.01 <0.01 1.00 <0.01 <0.01 <0.01 <0.01 <0.01
Y2 <0.01 <0.01 <0.01 1.00 <0.01 0.45 <0.01 <0.01 <0.01
Y3 <0.01 <0.01 <0.01 1.00 <0.01 <0.01 <0.01 <0.01 <0.01
Fine2 Y1 0.98 0.77 0.87 1.00 1.00 1.00 1.00 1.00 0.98
Y2 0.62 0.49 0.32 1.00 1.00 1.00 0.99 0.59 1.00
Y3 0.83 0.31 0.55 1.00 1.00 0.93 0.96 <0.01 0.69
TM2 Y1 <0.01 <0.01 <0.01 0.99 <0.01 1.00 0.04 <0.01 <0.01
Y2 <0.01 <0.01 <0.01 0.55 <0.01 1.00 <0.01 <0.01 <0.01
Y3 0.41 0.02 0.17 1.00 0.07 1.00 0.73 <0.01 0.40
CoarseCleaner Y1 0.05 <0.01 <0.01 1.00 <0.01 0.96 1.00 <0.01 0.08
Y2 <0.01 <0.01 <0.01 1.00 <0.01 1.00 1.00 <0.01 0.96
Y3 0.07 <0.01 0.04 1.00 0.05 0.28 1.00 <0.01 0.34
FineCleaner Y1 0.53 0.03 0.02 1.00 <0.01 1.00 0.99 1.00 0.46
Y2 0.83 0.60 0.59 1.00 0.42 1.00 1.00 1.00 1.00
Y3 0.98 0.66 0.97 1.00 0.99 1.00 1.00 1.00 0.99
TMCleaner Y1 0.28 0.12 0.08 1.00 0.03 1.00 0.92 0.55 1.00
Y2 <0.01 <0.01 <0.01 0.99 <0.01 1.00 0.04 <0.01 1.00
Y3 0.40 0.06 0.19 1.00 0.32 0.60 0.67 0.02 1.00
Table 2. Comparison of time taken to complete programming tasks for all students. The time to complete the task on the row is compared to
the time for the task on the column. Each cell contains p-values for a Wilcoxon signed-rank test, testing the hypothesis that the row task took
less time than the column task. Entries are considered statistically significant when p < .05, meaning that the row task did take less time to
complete than the column task, and are marked in bold. Results for the three class years are reported separately, due to differing transactional
memory implementations.
Figure 6. Overall error rates for programming tasks, for all three years of the study. Error bars show a 95% confidence interval on the error
rate. Fine-grained locking tasks were more likely to contain errors than coarse-grained or transactional memory (TM).
5. Exotic use of condition variables (cv-exotic). We encountered
a good deal of signal/wait usage on condition variables that
indicates no clear understanding of what the primitives actually
do. The canonical example of this is signaling and waiting on
the same condition in the same thread.
6. Condition variable use errors (cv-use). These types of errors
indicate a failure to use condition variables properly, but do
indicate a certain level of understanding. This class includes use
of if instead of while when checking conditions on a decision
to wait, or failure to check the condition at all before or after
waiting.
7. TM primitive misuse (TM-exotic). This class of error includes
any misuse of transactional primitives. Technically, this class
includes mis-use of the API, but in practice the only errors of
this form we saw were failure to call BeginTransaction be-
fore calling EndTransaction. Omission of read/write barriers
falls within this class as well, but it is interesting to note that we
found no bugs of this form over all three years.
8. TM ordering (TM-order). This class of errors represents at-
tempts by the programmer to follow some sort of locking dis-
cipline in the presence of transactions, where they are strictly
unnecessary. Such errors do not result in an incorrect program,
but do represent a misunderstanding of the primitive.
9. Checking conditions outside a transaction (TM-cond). This
class of errors is analogous to the lock-cond class. It occurs
when a programmer checks a condition outside of a transaction
which is then acted on inside the transaction, and the condition
is no longer guaranteed to hold during the transaction.
10. Forgotten TM synchronization (TM-forgot). Like the for-
gotten synchronization class above (lock-forgot), these errors
occur when a programmer failed to recognize the need for syn-
chronization and did not use transactions to protect a data struc-
ture.
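As referenced in the lock-cond entry above, the most common form of that error reduces to the following pattern (a reconstruction with hypothetical helper names, not code from any particular student):

    // BUGGY (lock-cond): the invariant is checked with no locks held.
    if (allLanesShot()) {                     // gallery state can change after this check
        for (ReentrantLock l : locks) l.lock();
        cleanAllLanes();                      // may clean a gallery that is no longer full
        for (ReentrantLock l : locks) l.unlock();
    }

    // CORRECT: acquire the locks first, then check the condition they protect.
    for (ReentrantLock l : locks) l.lock();
    try {
        if (allLanesShot())
            cleanAllLanes();
    } finally {
        for (ReentrantLock l : locks) l.unlock();
    }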
Because different rogue implementations use different synchro-
nization techniques, a different subset of the error classes applies
for each rogue. For example, it is not possible for a programmer
to create a lock-ordering bug using a single coarse grain lock, so
lock-ord does not apply for coarse rogues. Table 4 shows explicitly
which error classes are applicable for each rogue implementation.
Table 3 shows the characterization of synchronization for pro-
grams submitted for all three years of the study. Figure 6 shows the
overall portion of students that made an error on each programming
task. Students were far more likely to make an error on fine-grain
synchronization than on coarse or TM. At least 50% of students
made at least one error on the Fine and Fine2 portions of the as-
signment. We believe error rates in year 2 are generally lower be-
cause the teaching assistants during this year were more active in
helping students with the material and the assignment. In almost
all cases, the error rates associated with TM implementations are
dramatically lower than those associated with locks or conditions.
The notable exception is the TM-based cleaner implementations
from Year 1, where more than half of the solutions contained at
least one error: the vast majority of errors were of type TM-cond.
The bulk of these errors were similar, and arose when programmers
checked whether lanes should be cleaned outside a transaction,
subsequently starting a transaction to perform the cleaning. In
years 2 and 3, where transaction support relied on JDASTM rather
than DSTM2, this type of error was non-existent.
                coarse                 fine                   TM
                single  two  cleaner   single  two  cleaner   single  two  cleaner
lock-ord                               X       X    X
lock-cond       X       X    X         X       X    X
lock-forgot     X       X    X         X       X    X
lock-exotic     X       X    X         X       X    X
cv-exotic                    X                      X
cv-use                       X                      X
TM-exotic                                                     X       X    X
TM-order                                                      X       X    X
TM-forgot                                                     X       X    X
TM-cond                                                       X       X    X
Table 4. Indication of which synchronization errors are possible
with each rogue assignment. A cell contains an X if it is possible
to make the given error using the synchronization primitives
prescribed for that rogue implementation. Error types are explained
in Section 4.3.
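Reconstructed in the JDASTM-style syntax of Figure 2, the Year 1 TM-cond pattern reduces to the following (hypothetical helper names; the retry loop is omitted for brevity):

    // BUGGY (TM-cond): the condition is checked outside the transaction.
    if (allLanesShot()) {          // unprotected read of gallery state
        tx.BeginTransaction();
        cleanAllLanes();           // the gallery may no longer be full here
        tx.CommitTransaction();
    }

Moving the check inside the transaction makes the TM system responsible for detecting any conflicting shot, which is the intended idiom.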
5. Code complexity
Table 5 shows code complexity statistics, and Figure 7 shows rel-
ative cyclomatic complexity and code size (in instructions) for all
rogue classes (measured by the cyvis software complexity visu-
alizer [1]). Data shown are for year 2. The cyclomatic complex-
ity [21] of a fragment of code is the number of independent
control paths through it. Cyclomatic complexity does not directly
capture important aspects of complexity in a multi-threaded pro-
gramming environment. However, in this study, the programming
tasks remain constant, and the bulk of variation from solution to so-
lution can be attributed to synchronization implementation. Hence,
cyclomatic complexity is a reasonable metric for understanding ad-
ditional differences in code complexity across different synchro-
nization techniques.
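For a method whose control-flow graph has E edges, N nodes, and P connected components, McCabe’s measure [21] is

    M = E - N + 2P,

so each additional decision point (an if, loop, or catch) adds one to the count.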
The cyclomatic complexity data corroborate what we observed
in the students’ surveys. In years two and three, solutions based
on coarse grain locking have the lowest average cyclomatic com-
plexity, while TM-based solutions have lower complexity than fine-
grain and cleaner solutions. For example, in year 2, the single-lane,
two-lane, and cleaner rogues show complexity of 2.9, 3.6, and 2.8
respectively. The solutions synchronizing with TM show slightly
              lock-ord  lock-cond  lock-forgot  lock-exotic  cv-exotic  cv-use  TM-exotic  TM-order  TM-forgot  TM-cond
year 1  occurrences        23        84         39           16          12      32          0         5          8       41
        opportunities      51       312        312          312          52      52        153       153        149      149
        rate            45.1%     29.6%      12.5%         5.1%       23.1%   61.5%       0.0%      3.7%       5.4%    27.5%
year 2  occurrences        11        62         26            0          11      14          5         4          1        0
        opportunities     134       402        402          402         134     134        201       201        201      201
        rate             8.2%      6.5%      15.4%           0%        8.2%   10.5%       2.5%      2.0%       0.5%       0%
year 3  occurrences         5        41          5            0          22      12          6         0          3        0
        opportunities      28       168        168          168          56      56         84        84         84       84
        rate            17.9%     24.4%       3.0%           0%       39.3%   21.4%       7.1%        0%       3.6%       0%
Table 3. Synchronization error rates for all three years. The occurrences row indicates the number of programs in which at least one bug of
the type indicated by the column header occurred. The opportunities row indicates the sample size (the number of programs we examined
in which that type of bug could arise: e.g. lock-ordering bugs cannot occur with a single coarse lock). The rate row expresses the
percentage of examined programs containing that type of bug. Bug types are explained in Section 4.3.
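The rates in Table 3 are simple proportions, rate = occurrences / opportunities. Read against Figure 6, the 95% intervals there are consistent with the usual normal approximation for a proportion (shown here for reference; the construction is not spelled out in the text):

    \hat{p} \pm 1.96 \sqrt{ \hat{p}(1-\hat{p}) / n },

with n the number of opportunities.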
Figure 7. Cyclomatic complexity and code size relative to RogueCoarse for solutions from all years.
                           complexity        size
year  rogue                mean    std     mean    std
1 Coarse 2.8 0.2 114.4 1.5
Fine 5.2 0.3 172.8 7.6
TM 2.7 1.8 96.7 35.3
Coarse2 3.8 0.2 160.9 11.6
Fine2 6.8 0.3 247.2 5.0
TM2 3.2 2.3 96.3 86.2
CoarseCleaner 2.4 0.2 153.1 10.6
FineCleaner 2.9 0.1 211.3 9.1
TMCleaner 1.9 0.7 120.3 60.0
2 Coarse 2.9 0.8 104.5 25.2
Fine 5.4 2.0 189.4 50.5
TM 3.9 1.1 142.9 29.0
Coarse2 3.6 0.9 137.2 32.1
Fine2 6.4 2.5 257.9 59.6
TM2 4.7 1.3 183.4 43.0
CoarseCleaner 2.8 0.6 147.6 23.8
FineCleaner 3.8 1.0 237.1 51.2
TMCleaner 2.9 0.6 190.9 30.2
3 Coarse 2.9 0.2 116.5 1.6
Fine 5.3 0.7 167.1 6.0
TM 4.1 1.1 122.4 30.9
Coarse2 3.5 0.4 153.4 12.9
Fine2 6.1 1.2 254.5 36.5
TM2 5.0 1.4 168.8 47.2
CoarseCleaner 2.7 0.2 165.4 4.1
FineCleaner 3.0 0.1 207.7 16.7
TMCleaner 2.4 0.4 168.5 39.4
Table 5. Cyclomatic complexity and code size measurements for
all rogue implementations for all three years of the study.
higher complexity than their coarse counterparts with values of 3.9,
4.7, and 2.9. However, TM solutions have noticeably lower com-
plexity than solutions based on fine-grain locks, which, in year 2
show average complexity of 5.4, 6.4, and 3.8. The same trends are
echoed by the average instruction counts for the various rogue so-
lutions. Year 1 is an exception to this trend in that TM solutions
showed lower complexity than coarse or fine-grain locks. This is
largely because DSTM2 syntax does not require the programmer
to provide exception handling and retry code to handle aborts due
to conflict. The boiler-plate exception handling code required by
JDASTM inflates the complexity measurement in years 2 and 3.
6. Related work
Hardware transactional memory research is an active research field
with many competing proposals [5–8, 11, 12, 16–18, 22, 23, 26–
28, 32, 34]. All this research on hardware mechanisms is premature
if researchers never validate the assumption that transactional pro-
gramming is actually easier than lock-based programming.
This research uses software transactional memory [4, 10, 13–
15, 19, 20, 31, 33], but its purpose is to validate how untrained
programmers learn to write correct and performant concurrent pro-
grams with locks and transactions. The programming interface for
STM systems is the same as HTM systems, but without compiler
support, STM implementations require explicit read-write barriers,
which are not required in an HTM. Compiler integration creates a
simpler programming model than using a TM library [9]. Future
research could investigate whether compiler integration lowers the
perceived programmer difficulty in using transactions.
There is a previous version of this study [30]. This version extends
that study, bringing a third year of survey results and two additional
years’ worth of student programs into the data set. Pankratius et
al. [25] describe a study in which 12 students, working in 6 teams
of two, wrote a parallel search engine over the course of a 15-week
project. Three teams used Pthreads, and three teams used the Intel
STM compiler [24]. Like this study, the Pankratius study evaluates
the resulting code and surveys the developers involved to understand
their experience. Their findings are complementary to ours: the
development investment required for the TM implementations in
their study was lower. The study in this paper involves 20× more
programmers and evaluates 200× more programs. Our study also
explicitly considers blocking and non-blocking synchronization, as
well as coarse and fine-grain locking.
7. Conclusion
This paper offers evidence that transactional programming really is
less error-prone than high-performance locking, even if inexperi-
enced programmers have some trouble understanding transactions.
Students’ subjective evaluation showed that they found transac-
tional memory slightly harder to use than coarse locks, and easier to
use than fine-grain locks and condition synchronization. However,
analysis of synchronization error rates in students’ code yields a
more dramatic result, showing that for similar programming tasks,
transactions are considerably easier to get correct than locks.
References
[1] Cyvis Software Complexity Visualizer, 2009.
[2] Spoon, 2009. http://spoon.gforge.inria.fr/.
[3] Sync-gallery survey: http://www.cs.utexas.edu/users/witchel/tx/sync-
gallery-survey.html, 2009.
[4] Ali-Reza Adl-Tabatabai, Brian T. Lewis, Vijay Menon, Brian R. Mur-
phy, Bratin Saha, and Tatiana Shpeisman. Compiler and runtime sup-
port for efficient software transactional memory. In PLDI ’06: Pro-
ceedings of the 2006 ACM SIGPLAN conference on Programming lan-
guage design and implementation, pages 26–37, New York, NY, USA,
2006. ACM.
[5] Lee Baugh, Naveen Neelakantam, and Craig Zilles. Using hard-
ware memory protection to build a high-performance, strongly-atomic
hybrid transactional memory. SIGARCH Comput. Archit. News,
36(3):115–126, 2008.
[6] Colin Blundell, Joe Devietti, E. Christopher Lewis, and Milo M. K.
Martin. Making the fast case common and the uncommon case sim-
ple in unbounded transactional memory. SIGARCH Comput. Archit.
News, 35(2):24–34, 2007.
[7] Jayaram Bobba, Neelam Goyal, Mark D. Hill, Michael M. Swift,
and David A. Wood. Tokentm: Efficient execution of large transac-
tions with hardware transactional memory. SIGARCH Comput. Ar-
chit. News, 36(3):127–138, 2008.
[8] JaeWoong Chung, Chi Cao Minh, Austen McDonald, Travis Skare,
Hassan Chafi, Brian D. Carlstrom, Christos Kozyrakis, and Kunle
Olukotun. Tradeoffs in transactional memory virtualization. SIG-
PLAN Not., 41(11):371–381, 2006.
[9] Luke Dalessandro, Virendra J. Marathe, Michael F. Spear, and
Michael L. Scott. Capabilities and limitations of library-based soft-
ware transactional memory in c++. In Proceedings of the 2nd ACM
SIGPLAN Workshop on Transactional Computing. Portland, OR, Aug
2007.
[10] D. Dice, O. Shalev, and N. Shavit. Transactional locking II. In DISC,
2006.
[11] Dave Dice, Yossi Lev, Mark Moir, and Daniel Nussbaum. Early expe-
rience with a commercial hardware transactional memory implemen-
tation. SIGPLAN Not., 44(3):157–168, 2009.
[12] L. Hammond, V. Wong, M. Chen, B. Hertzberg, B. Carlstrom,
M. Prabhu, H. Wijaya, C. Kozyrakis, and K. Olukotun. Transactional
memory coherence and consistency. In ISCA, 2004.
[13] Tim Harris and Keir Fraser. Language support for lightweight trans-
actions. In OOPSLA ’03: Proceedings of the 18th annual ACM SIG-
PLAN conference on Object-oriented programing, systems, languages,
and applications, pages 388–402, New York, NY, USA, 2003. ACM.
[14] Tim Harris, Mark Plesko, Avraham Shinnar, and David Tarditi. Opti-
mizing memory transactions. In PLDI ’06: Proceedings of the 2006
ACM SIGPLAN conference on Programming language design and im-
plementation, pages 14–25, New York, NY, USA, 2006. ACM.
[15] Maurice Herlihy, Victor Luchangco, and Mark Moir. A flexible
framework for implementing software transactional memory. In
OOPSLA ’06: Proceedings of the 21st annual ACM SIGPLAN con-
ference on Object-oriented programming systems, languages, and ap-
plications, pages 253–262, New York, NY, USA, 2006. ACM.
[16] Maurice Herlihy and J. Eliot B. Moss. Transactional memory: archi-
tectural support for lock-free data structures. SIGARCH Comput. Ar-
chit. News, 21(2):289–300, 1993.
[17] Owen S. Hofmann, Christopher J. Rossbach, and Emmett Witchel.
Maximum benefit from a minimal htm. In ASPLOS ’09: Proceed-
ing of the 14th international conference on Architectural support for
programming languages and operating systems, pages 145–156, New
York, NY, USA, 2009. ACM.
[18] Yossi Lev and Jan-Willem Maessen. Split hardware transactions:
true nesting of transactions using best-effort hardware transactional
memory. In PPoPP ’08: Proceedings of the 13th ACM SIGPLAN
Symposium on Principles and practice of parallel programming, pages
197–206, New York, NY, USA, 2008. ACM.
[19] Yossi Lev, Mark Moir, and Dan Nussbaum. PhTM: Phased transac-
tional memory. In Workshop on Transactional Computing (TRANS-
ACT), 2007.
[20] Virendra J. Marathe, Michael F. Spear, Christopher Heriot, Athul
Acharya, David Eisenstat, William N. Scherer III, and Michael L.
Scott. Lowering the overhead of software transactional memory.
Technical Report TR 893, Computer Science Department, University
of Rochester, Mar 2006. Condensed version submitted for publica-
tion.
[21] T. J. McCabe. A complexity measure. IEEE Trans. Softw. Eng.,
2(4):308–320, 1976.
[22] Austen McDonald, JaeWoong Chung, Brian D. Carlstrom, Chi Cao
Minh, Hassan Chafi, Christos Kozyrakis, and Kunle Olukotun. Ar-
chitectural semantics for practical transactional memory. SIGARCH
Comput. Archit. News, 34(2):53–65, 2006.
[23] Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D.
Hill, and David A. Wood. Logtm: Log-based transactional mem-
ory. In Proceedings of the 12th International Symposium on High-
Performance Computer Architecture, pages 254–265. Feb 2006.
[24] Yang Ni, Adam Welc, Ali-Reza Adl-Tabatabai, Moshe Bach, Sion
Berkowits, James Cownie, Robert Geva, Sergey Kozhukow, Ravi
Narayanaswamy, Jeffrey Olivier, Serguei Preis, Bratin Saha, Ady Tal,
and Xinmin Tian. Design and implementation of transactional con-
structs for C/C++. In OOPSLA ’08: Proceedings of the 23rd ACM
SIGPLAN conference on Object-oriented programming systems lan-
guages and applications, pages 195–212, New York, NY, USA, 2008.
ACM.
[25] Victor Pankratius, Ali-Reza Adl-Tabatabai, and Frank Otto. Does
transactional memory keep its promises? Results from an empirical
study. Technical report, September 2009.
[26] Ravi Rajwar, Maurice Herlihy, and Konrad Lai. Virtualizing transac-
tional memory. In ISCA ’05: Proceedings of the 32nd annual inter-
national symposium on Computer Architecture, pages 494–505, Wash-
ington, DC, USA, 2005. IEEE Computer Society.
[27] Hany E. Ramadan, Christopher J. Rossbach, Donald E. Porter,
Owen S. Hofmann, Aditya Bhandari, and Emmett Witchel. Metatm/txlinux:
transactional memory for an operating system. In ISCA
’07: Proceedings of the 34th annual international symposium on Com-
puter architecture, pages 92–103, New York, NY, USA, 2007. ACM.
[28] Hany E. Ramadan, Christopher J. Rossbach, and Emmett Witchel.
Dependence-aware transactional memory for increased concurrency.
In MICRO ’08: Proceedings of the 2008 41st IEEE/ACM International
Symposium on Microarchitecture, pages 246–257, Washington, DC,
USA, 2008. IEEE Computer Society.
[29] Hany E. Ramadan, Indrajit Roy, Maurice Herlihy, and Emmett
Witchel. Committing conflicting transactions in an stm. In PPoPP
’09: Proceedings of the 14th ACM SIGPLAN symposium on Principles
and practice of parallel programming, pages 163–172, New York, NY,
USA, 2009. ACM.
[30] Christopher Rossbach, Owen Hofmann, and Emmett Witchel. Is
transactional memory programming actually easier? In WDDD ’09:
Proc. 8th Workshop on Duplicating, Deconstructing, and Debunking,
June 2009.
[31] Nir Shavit and Dan Touitou. Software transactional memory. In
Proceedings of the 14th ACM Symposium on Principles of Distributed
Computing, pages 204–213, Aug 1995.
[32] Arrvindh Shriraman, Sandhya Dwarkadas, and Michael L. Scott.
Flexible decoupled transactional memory support. In Proceedings of
the 35th Annual International Symposium on Computer Architecture.
Jun 2008.
[33] Fuad Tabba, Cong Wang, James R. Goodman, and Mark Moir.
NZTM: Nonblocking, zero-indirection transactional memory. In
Workshop on Transactional Computing (TRANSACT), 2007.
[34] Luke Yen, Jayaram Bobba, Michael R. Marty, Kevin E. Moore,
Haris Volos, Mark D. Hill, Michael M. Swift, and David A. Wood.
Logtm-se: Decoupling hardware transactional memory from caches.
In HPCA ’07: Proceedings of the 2007 IEEE 13th International Sym-
posium on High Performance Computer Architecture, pages 261–272,
Washington, DC, USA, 2007. IEEE Computer Society.
... It delivers a higher throughput in comparison to coarse-grained locks and does not increase design complexity as compared to fine-grained locks (Rossbach et al., 2010). Last but not least, it is worth noticing that STM-based approaches present very good performances in terms of transaction abort ratio for systems with a low ratio of context switching during the execution of the transactions and with a predominance of (1) read-only transactions; and (2) transactions with a short execution time (Maldonado et al., 2010). ...
Thesis
Full-text available
The current trend in the development of recent real-time embedded systems is driven by (i) a shift from single-core to multi-core platform architectures at the hardware level; (ii) a shift from sequential to parallel programming paradigms at the software level; and finally (iii) the ever increasing demand of new functionalities (e.g. additional tasks with specific timing requirements). These trends taken together increase the complexity of the system as a whole, and have a significant impact on the type of mechanisms that are adopted in order to guarantee both the functional and non-functional correctness of the system. This holds true especially in the case where these mechanisms have to maintain the correctness of data shared between different tasks executing concurrently in parallel. The access to shared resources (e.g. main memory) on single-core systems has traditionally relied on lock-based mechanisms. At any time instant, a single task is granted exclusive access to each shared resource. However, assuming the new settings, i.e. multi-core architectures executing a set of potentially parallel tasks sharing data, the big picture changes. Tasks executing in parallel on different cores and sharing the same data may have to compete before completing the execution. It has been proven that lock-based synchronisation approaches, which were sound in a single-core context, do not scale to multi-cores and, furthermore, hinder the composability of the system, unfortunately. On the path to solving these issues, Software Transactional Memory (STM) based approaches have been proposed as promising candidates. By using these alternative techniques, the underlying STM service would solve the conflicts between contending tasks while maintaining data consistency, and critical sections would be executed speculatively – i.e. they are executed but if the result of the computation harms the system correctness, then changes made by the computation are reverted and the results are ignored. This way, the details on how to synchronise shared data would be hidden from the programmer, thus representing a significant advantage as compared to lock-based synchronisation techniques regarding the functional correctness of the system. Regarding the non-functional correctness instead, the use of STM-based approaches in real-time systems also requires the tasks timing constraints to be met. This is due to the fact that each transaction aborting and repeating multiple times before its eventual commit incurs a timing overhead that might not be negligible and, therefore, must be taken into account to prevent deadline misses at runtime. This work considers a set of potentially parallel real-time tasks sharing data and executed on a multi-core platform. Assuming this setting, first it proposes a complete framework where an STM service is associated with a set of fully partitioned scheduling algorithms in order to improve the predictability of the system as well as guaranteeing that the timing constraints are met for all the tasks. Then, it proposes the corresponding schedulability analysis for each pair of STM and scheduling algorithms. Finally, it proposes a lightweight syntax to enrich the original Ada programming language in order to support STM for concurrent real-time applications.
... We considered Opinion to reflect the intrinsically biased and unreliable nature of tasks that ask subjects to provide their opinion. Previous work [79], [80] has shown that evidence in software engineering studies often contradicts opinions. This kind of analysis falls outside the scope of this paper. ...
Preprint
Reading code is an essential activity in software maintenance and evolution. Several studies with human subjects have investigated how different factors, such as the employed programming constructs and naming conventions, can impact code readability, i.e., what makes a program easier or harder to read and apprehend by developers, and code legibility, i.e., what influences the ease of identifying elements of a program. These studies evaluate readability and legibility by means of different comprehension tasks and response variables. In this paper, we examine these tasks and variables in studies that compare programming constructs, coding idioms, naming conventions, and formatting guidelines, e.g., recursive vs. iterative code. To that end, we have conducted a systematic literature review where we found 54 relevant papers. Most of these studies evaluate code readability and legibility by measuring the correctness of the subjects' results (83.3%) or simply asking their opinions (55.6%). Some studies (16.7%) rely exclusively on the latter variable.There are still few studies that monitor subjects' physical signs, such as brain activation regions (5%). Moreover, our study shows that some variables are multi-faceted. For instance, correctness can be measured as the ability to predict the output of a program, answer questions about its behavior, or recall parts of it. These results make it clear that different evaluation approaches require different competencies from subjects, e.g., tracing the program vs. summarizing its goal vs. memorizing its text. To assist researchers in the design of new studies and improve our comprehension of existing ones, we model program comprehension as a learning activity by adapting a preexisting learning taxonomy. This adaptation indicates that some competencies are often exercised in these evaluations whereas others are rarely targeted.
... HTM implements in hardware [19,21,32] the abstraction of Transactional Memory (TM), an alternative to lock-based synchronization that can significantly simplify the development of concurrent applications [34]. Due to its hardware nature, HTM avoids the overhead imposed by software-based TM implementations. ...
Conference Paper
Full-text available
With the emergence of byte-addressable Persistent Memory(PM), a number of works have recently addressed the problem of how to implement persistent transactional memory using off-the-shelf hardware transactional memory systems. Using Intel Optane DC PM we show, for the first time in the literature, experimental results highlighting several scalability bottlenecks of state of the art approaches, which so far have been evaluated only via PM emulation. We tackle these limitations by proposing SPHT (ScalablePersistent Hardware Transactions), an innovative PersistentTransactional Memory that exploits a set of novel mechanisms aimed at enhancing scalability both during transaction processing and recovery. We show that SPHT enhances through-put by up to 2.6x on STAMP and achieve speedups of 2.8x in the log replay phase vs state of the art solutions.
... HTM implements in hardware [19,21,32] the abstraction of Transactional Memory (TM), an alternative to lock-based synchronization that can significantly simplify the development of concurrent applications [34]. Due to its hardware nature, HTM avoids the overhead imposed by software-based TM implementations. ...
... The development of complex, real-life applications based on transactional futures would be greatly beneficial to research community for two main reasons. On the one hand, it would allow to quantify the reduction in complexity (e.g., development costs) stemming from the use of transctional futures with respect to conventional (e.g., lock-based) synchronization primitive -analogously to the studies that have demonstrated increased programmer's productivity thanks the use of classic TM [34]. On the other hand, it would provide a broader set of benchmarks directly inspired from real use case to evaluate the performance of future TMs with support for transactional futures -whereas in this work we had to resort to parallelize existing benchmarks that were not originally designed to use futures (e.g., STAMP's Vacation), or to develop rather simplistic synthetic benchmarks that did not fully exploit the richness of the proposed semantics (e.g., escaping futures). ...
Preprint
There is abundant observational data in the software engineering domain, whereas running large-scale controlled experiments is often practically impossible. Thus, most empirical studies can only report statistical correlations -- instead of potentially more insightful and robust causal relations. This paper discusses some novel techniques that support analyzing purely observational data for causal relations. Using fundamental causal models such as directed acyclic graphs, one can rigorously express, and partially validate, causal hypotheses; and then use the causal information to guide the construction of a statistical model that captures genuine causal relations -- such that correlation does imply causation. We apply these ideas to analyzing public data about programmer performance in Code Jam, a large world-wide coding contest organized by Google every year. Specifically, we look at the impact of different programming languages on a participant's performance in the contest. While the overall effect associated with programming languages is weak compared to other variables -- regardless of whether we consider correlational or causal links -- we found considerable differences between a purely statistical and a causal analysis of the very same data. The takeaway message is that even an imperfect causal analysis of observational data can help answer the salient research questions more precisely and more robustly than with just purely statistical techniques.
Article
There are many paradigms available to address the unique and complex problems introduced with parallel programming. These complexities have implications for computer science education as ubiquitous multi-core computers drive the need for programmers to understand parallelism. One major obstacle to student learning of parallel programming is that there is very little human factors evidence comparing the different techniques to one another, so there is no clear direction on which techniques should be taught and how. We performed a randomized controlled trial using 88 university-level computer science student participants performing three identical tasks to examine the question of whether or not there are measurable differences in programming performance between two paradigms for concurrent programming: threads compared to process-oriented programming based on Communicating Sequential Processes. We measured both time on task and programming accuracy using an automated token accuracy map (TAM) technique. Our results showed trade-offs between the paradigms using both metrics and the TAMs provided further insight about specific areas of difficulty in comprehension.
Article
The addition of transactional memory (TM) support to existing languages provides the opportunity to create new software from scratch using transactions, and also to simplify or extend legacy code by replacing existing synchronization with language-level transactions. In this paper, we describe our experiences transactionalizing the memcached application through the use of the GCC implementation of the Draft C++ TM Specification. We present experiences and recommendations that we hope will guide the effort to integrate TM into languages, and that may also contribute to the growing collective knowledge about how programmers can begin to exploit TM in existing production-quality software.
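For readers unfamiliar with the syntax involved, the sketch below shows the general shape of such a transactionalization using GCC's language-level atomic blocks (compiled with -fgnu-tm). The counter and function names are illustrative and are not taken from memcached.

```cpp
// Build with: g++ -fgnu-tm counter.cpp
#include <iostream>

static long hits = 0;   // shared statistic, illustrative stand-in for memcached state

// A lock-based version would guard the update with a mutex; the
// transactional version replaces the critical section with an atomic block.
void record_hit() {
    __transaction_atomic {
        ++hits;          // executes atomically and in isolation
    }
}

int main() {
    record_hit();
    std::cout << hits << "\n";   // prints 1
}
```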
Article
Full-text available
This paper presents a software transactional memory system that introduces first-class C++ language constructs for transactional programming. We describe new C++ language extensions, a production-quality optimizing C++ compiler that translates and optimizes these extensions, and a high-performance STM runtime library. The transactional language constructs support C++ language features including classes, inheritance, virtual functions, exception handling, and templates. The compiler automatically instruments the program for transactional execution and optimizes TM overheads. The runtime library implements multiple execution modes and implements a novel STM algorithm that supports both optimistic and pessimistic concurrency control. The runtime switches a transaction's execution mode dynamically to improve performance and to handle calls to precompiled functions and I/O libraries. We present experimental results on 8 cores (two quad-core CPUs) running a set of 20 non-trivial parallel programs. Our measurements show that our system scales well as the number of cores increases and that our compiler and runtime optimizations improve scalability.
Article
Full-text available
Programmers have traditionally used locks to synchronize concurrent access to shared data. Lock-based synchronization, however, has well-known pitfalls: using locks for fine-grain synchronization and composing code that already uses locks are both difficult and prone to deadlock. Transactional memory provides an alternate concurrency control mechanism that avoids these pitfalls and significantly eases concurrent programming. Transactional memory language constructs have recently been proposed as extensions to existing languages or included in new concurrent language specifications, opening the door for new compiler optimizations that target the overheads of transactional memory. This paper presents compiler and runtime optimizations for transactional memory language constructs. We present a high-performance software transactional memory system (STM) integrated into a managed runtime environment. Our system efficiently implements nested transactions that support both composition of transactions and partial roll back. Our JIT compiler is the first to optimize the overheads of STM, and we show novel techniques for enabling JIT optimizations on STM operations. We measure the performance of our optimizations on a 16-way SMP running multi-threaded transactional workloads. Our results show that these techniques enable transactional memory's performance to compete with that of well-tuned synchronization.
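The composition pitfall mentioned in this abstract is easy to make concrete: two lock-protected operations composed in different orders by different threads can deadlock, while nested atomic blocks compose safely because inner transactions merge into the outer one. A minimal sketch, written with GCC-style atomic-block syntax rather than the managed runtime the paper describes:

```cpp
// Build with: g++ -fgnu-tm -c transfer.cpp
#include <mutex>

struct LockAccount { long balance = 0; std::mutex m; };

// Lock-based transfer: deadlocks if another thread concurrently runs
// transfer_locks(to, from, ...) and the two lock acquisitions interleave.
void transfer_locks(LockAccount& from, LockAccount& to, long amt) {
    std::lock_guard<std::mutex> a(from.m);
    std::lock_guard<std::mutex> b(to.m);
    from.balance -= amt;
    to.balance += amt;
}

struct TxAccount { long balance = 0; };

// Transactional transfer: imposes no lock-ordering obligation on callers,
// so already-synchronized code composes freely.
void transfer_tm(TxAccount& from, TxAccount& to, long amt) {
    __transaction_atomic {
        from.balance -= amt;
        to.balance += amt;
    }
}
```

In the lock version the usual remedy is a global lock-ordering discipline; that discipline is exactly the obligation transactions remove.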
Conference Paper
Full-text available
Transactional Memory (TM) is on its way to becoming the programming API of choice for writing correct, concurrent, and scalable programs. Hardware TM (HTM) implementations are expected to be significantly faster than pure software TM (STM); however, full hardware support for true closed and open nested transactions is unlikely to be practical. This paper presents a novel mechanism, the split hardware transaction (SpHT), that uses minimal software support to combine multiple segments of an atomic block, each executed using a separate hardware transaction, into one atomic operation. The idea of segmenting transactions can be used for many purposes, including nesting, local retry, orElse, and user-level thread scheduling; in this paper we focus on how it allows linear closed and open nesting of transactions. SpHT overcomes the limited expressive power of best-effort HTM while imposing overheads dramatically lower than STM and preserving useful guarantees such as strong atomicity provided by the underlying HTM.
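The split-transaction idea can be pictured in software as buffering each segment's speculative writes and publishing them all in one final hardware transaction. The sketch below is our invented analogy; the helper names and the buffering scheme are not SpHT's actual interface.

```cpp
#include <utility>
#include <vector>

// Software buffer for writes deferred until the logical commit.
struct SegmentLog {
    std::vector<std::pair<long*, long>> writes;
};

// Each segment would run in its own best-effort hardware transaction;
// the HW begin/commit are stubbed out here (e.g., _xbegin()/_xend() on RTM).
void hw_txn_begin()  {}
void hw_txn_commit() {}

void buffered_write(SegmentLog& log, long* addr, long val) {
    log.writes.emplace_back(addr, val);   // defer the store
}

void logical_commit(SegmentLog& log) {
    hw_txn_begin();                       // final segment publishes everything
    for (auto& w : log.writes) *w.first = w.second;
    hw_txn_commit();                      // all writes become visible at once
}

int main() {
    long x = 0, y = 0;
    SegmentLog log;
    hw_txn_begin(); buffered_write(log, &x, 1); hw_txn_commit();  // segment 1
    hw_txn_begin(); buffered_write(log, &y, 2); hw_txn_commit();  // segment 2
    logical_commit(log);   // x and y updated atomically, as one logical txn
}
```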
Conference Paper
Full-text available
This paper proposes a hardware transactional memory (HTM) system called LogTM Signature Edition (LogTM-SE). LogTM-SE uses signatures to summarize a transaction's read- and write-sets and detects conflicts on coherence requests (eager conflict detection). Transactions update memory "in place" after saving the old value in a per-thread memory log (eager version management). Finally, a transaction commits locally by clearing its signature, resetting the log pointer, etc., while aborts must undo the log. LogTM-SE achieves two key benefits. First, signatures and logs can be implemented without changes to highly-optimized cache arrays because LogTM-SE never moves cached data, changes a block's cache state, or flash clears bits in the cache. Second, transactions are more easily virtualized because signatures and logs are software accessible, allowing the operating system and runtime to save and restore this state. In particular, LogTM-SE allows cache victimization, unbounded nesting (both open and closed), thread context switching and migration, and paging.
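Signatures of this kind are essentially Bloom-filter summaries: addresses are hashed into a fixed-size bit vector, membership tests may yield false positives but never false negatives, and clearing the summary at commit is trivially cheap. A minimal software sketch of the idea (LogTM-SE implements it in hardware, and with better hash functions):

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

struct Signature {
    std::bitset<1024> bits;   // fixed-size summary of a read or write set

    // Two toy hashes over the cache-block address (offset bits dropped).
    static size_t h1(uintptr_t a) { return (a >> 6) % 1024; }
    static size_t h2(uintptr_t a) { return ((a >> 6) * 2654435761u) % 1024; }

    void add(const void* addr) {
        auto a = reinterpret_cast<uintptr_t>(addr);
        bits.set(h1(a));
        bits.set(h2(a));
    }
    // May report a conflict that is not real (false positive) but never
    // misses a true one; exactly what eager conflict detection requires.
    bool mayContain(const void* addr) const {
        auto a = reinterpret_cast<uintptr_t>(addr);
        return bits.test(h1(a)) && bits.test(h2(a));
    }
    void clear() { bits.reset(); }   // commit: drop the whole summary at once
};
```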
Article
Current hardware transactional memory systems seek to simplify parallel programming, but assume that large transactions are rare, so it is acceptable to penalize their performance or concurrency. However, future programmers may wish to use large transactions more often in order to integrate with higher-level programming models (e.g., database transactions) or perform selected I/O operations. To prevent the "small transactions are common" assumption from becoming self-fulfilling, this paper contributes TokenTM—an unbounded HTM that uses the abstraction of tokens to precisely track conflicts on an unbounded number of memory blocks. TokenTM implements tokens with new mechanisms, including metastate fission/fusion and fast token release. TokenTM executes small transactions fast, executes concurrent large transactions with no penalty to nonconflicting transactions, and gracefully handles paging, context switching, and System-V-style shared memory.
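The token abstraction can be pictured as a per-block counter: a block starts with T tokens, a reader must hold at least one, and a writer must hold all T, so every genuine read-write or write-write conflict is detected with no false positives. A simplified software analogy follows (TokenTM realizes this in hardware via metastate fission/fusion, not an atomic counter):

```cpp
#include <atomic>

constexpr int T = 64;   // illustrative tokens per memory block

struct BlockTokens {
    std::atomic<int> available{T};

    bool acquireRead() {                 // a reader needs one token
        int n = available.load();
        while (n > 0 && !available.compare_exchange_weak(n, n - 1)) {}
        return n > 0;
    }
    bool acquireWrite() {                // a writer needs every token,
        int expected = T;                // so no reader or writer can coexist
        return available.compare_exchange_strong(expected, 0);
    }
    void releaseRead()  { available.fetch_add(1); }
    void releaseWrite() { available.store(T); }
};
```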
Conference Paper
This paper quantifies the effect of architectural design decisions on the performance of TxLinux. TxLinux is a Linux kernel modified to use transactions in place of locking primitives in several key subsystems. We run TxLinux on MetaTM, which is a new hardware transactional memory (HTM) model. MetaTM contains features that enable efficient and correct interrupt handling for an x86-like architecture. Live stack overwrites can corrupt non-transactional stack memory and require a small change to the transaction register checkpoint hardware to ensure correct operation of the operating system. We also propose stack-based early release to reduce spurious conflicts on stack memory between kernel code and interrupt handlers. We use MetaTM to examine the performance sensitivity of individual architectural features. For TxLinux we find that Polka and SizeMatters are effective contention management policies, some form of backoff on transaction contention is vital for performance, and stalling on a transaction conflict reduces transaction restart rates, but does not improve performance. Transaction write sets are small, and performance is insensitive to transaction abort costs but sensitive to commit costs.
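The observation that some form of backoff on contention is vital corresponds to the familiar randomized exponential backoff pattern. A hedged user-level sketch of a restart loop with backoff (the delay policy and cap below are illustrative choices, not MetaTM's):

```cpp
#include <algorithm>
#include <chrono>
#include <random>
#include <thread>

// Retry a transaction body until it commits, sleeping a random,
// exponentially growing (but capped) delay after each abort.
template <typename TxnBody>
void run_with_backoff(TxnBody body) {
    std::mt19937 rng{std::random_device{}()};
    for (unsigned attempt = 0; ; ++attempt) {
        if (body()) return;                            // true means committed
        unsigned cap = 1u << std::min(attempt, 10u);   // 1..1024 microseconds
        std::uniform_int_distribution<unsigned> d(1, cap);
        std::this_thread::sleep_for(std::chrono::microseconds(d(rng)));
    }
}
```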
Conference Paper
Transactional memory (TM) simplifies parallel programming by guaranteeing that transactions appear to execute atomically and in isolation. Implementing these properties includes providing data version management for the simultaneous storage of both new (visible if the transaction commits) and old (retained if the transaction aborts) values. Most (hardware) TM systems leave old values "in place" (the target memory address) and buffer new values elsewhere until commit. This makes aborts fast, but penalizes (the much more frequent) commits. In this paper, we present a new implementation of transactional memory, log-based transactional memory (LogTM), that makes commits fast by storing old values to a per-thread log in cacheable virtual memory and storing new values in place. LogTM makes two additional contributions. First, LogTM extends a MOESI directory protocol to enable both fast conflict detection on evicted blocks and fast commit (using lazy cleanup). Second, LogTM handles aborts in (library) software with little performance penalty. Evaluations running micro- and SPLASH-2 benchmarks on a 32-way multiprocessor support our decision to optimize for commit by showing that only 1-2% of transactions abort.