Improved multithreaded unit testing.
-
Citations (0)
-
Cited In (0)
Page 1
Improved Multithreaded Unit Testing
Vilas Jagannath, Milos Gligoric, Dongyun Jin,
Qingzhou Luo, Grigore Ro¸ su, Darko Marinov
Department of Computer Science, University of Illinois at Urbana-Champaign
Urbana, IL 61801, USA
{vbangal2, gliga, djin3, qluo2, grosu, marinov}@illinois.edu
ABSTRACT
Multithreaded code is notoriously hard to develop and test.
A multithreaded test exercises the code under test with two
or more threads. Each test execution follows some sched-
ule/interleaving of the multiple threads, and different sched-
ules can give different results. Developers often want to en-
force a particular schedule for test execution, and to do so,
they use time delays (Thread.sleep in Java). Unfortunately,
this approach can produce false positives or negatives, and
can result in unnecessarily long testing time.
This paper presents IMUnit, a novel approach to speci-
fying and executing schedules for multithreaded tests. We
introduce a new language that allows explicit specification of
schedules as orderings on events encountered during test ex-
ecution. We present a tool that automatically instruments
the code to control test execution to follow the specified
schedule, and a tool that helps developers migrate their
legacy, sleep-based tests into event-based tests in IMUnit.
The migration tool uses novel techniques for inferring events
and schedules from the executions of sleep-based tests. We
describe our experience in migrating over 200 tests. The
inference techniques have high precision and recall of over
75%, and IMUnit reduces testing time compared to sleep-
based tests on average 3.39x.
Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging
General Terms
Verification, Reliability
Keywords
IMUnit, Unit Testing, Multithreaded Code
1.INTRODUCTION
Multicore processors are here to stay. To extract greater
performance from multicore processors, developers need to
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ESEC/FSE’11, September 5–9, 2011, Szeged, Hungary.
Copyright 2011 ACM 978-1-4503-0443-6/11/09 ...$10.00.
write parallel code. The predominant paradigm for paral-
lel code is that of shared memory where multiple threads of
control communicate by reading and writing shared data ob-
jects. Shared-memory multithreaded code is often afflicted
by concurrency bugs, which are hard to detect because mul-
tithreaded code can demonstrate different behavior based on
the scheduling of threads, and the bugs may only be trig-
gered by a small specific set of schedules.
To validate their multithreaded code, developers write
multithreaded unit tests. A multithreaded test creates and
executes two or more threads (and/or invokes code under
test that itself creates and executes two or more threads).
Each test execution follows some schedule/interleaving of
the multiple threads, and different schedules can give differ-
ent results. Developers often want to enforce a particular
schedule for a test. For example, consider two threads, one
executing a method m and the other executing a method
m′. Developers may want to ensure in one test that m fin-
ishes before m′starts and in another test that m′finishes
before m starts (and in more tests that m and m′inter-
leave in certain ways). Without controlling the schedule, it
is impossible to write precise assertions about the execution
because the results can differ in the two scenarios, and it is
impossible to guarantee which scenarios were covered during
testing, even if multiple runs are performed.
To control the schedule of multithreaded tests, develop-
ers mostly use a combination of timed delays in the vari-
ous test threads. In Java, the delay is performed with the
Thread.sleep method, so we call this approach sleep-based.
A sleep pauses a thread while other threads continue exe-
cution. Using a combination of sleeps, developers attempt
to enforce the desired schedule during the execution of a
multithreaded test, and then assert the intended result for
the desired schedule. A sleep-based test can fail when an
undesired schedule gets executed even if the code under test
has no bug (false positive). Dually, a sleep-based test can
pass when an unintended schedule gets executed even if the
code under test has a bug (false negative). Indeed, sleeps are
an unreliable and inefficient mechanism for enforcing sched-
ules. To use sleeps, one has to estimate the real-time du-
ration for which to delay a thread while the other threads
perform their work. This is usually estimated by trial and
error, starting from a small duration and increasing it un-
til the test passes consistently on the developer’s machine.
The estimated duration depends on the execution environ-
ment (hardware/software configuration and the load on the
machine). Therefore, when the same test is executed in a
different environment, the intended schedule may not be en-
Page 2
forced, leading to false positives/negatives. Moreover, sleep
can be very inaccurate even on a single machine [20]. In an
attempt to mitigate the unreliability of sleep, developers of-
ten end up over-estimating the duration, which in turn leads
to slow running multithreaded tests.
Researchers have previously noted the numerous prob-
lems with using sleeps to specify schedules in multithreaded
tests and have developed frameworks such as ConAn [22,
23], ConcJUnit [27], MultithreadedTC [26], and ThreadCon-
trol [13] to tackle some problems in specifying and enforc-
ing schedules in multithreaded unit tests. However, despite
these frameworks, multithreaded unit testing still has many
issues that could be categorized as follows:
Readability: Most current frameworks force developers
to reason about the execution of threads relative to a global
clock. This is unintuitive since developers usually reason
about the execution of their multithreaded code in terms
of event relationships (such as m finishing before m′starts).
Some frameworks require users to write schedules in external
scripts, which makes it even more difficult to reason about
schedules. In other frameworks the schedule is implicit, as a
part of the unit test code, and hence it is difficult to focus on
the schedule and reason about it separately at a higher level.
Modularity: In some current frameworks, the intended
schedule is intermixed with the test code and effectively
hard-coded into a multithreaded unit test. This makes it
difficult to specify multiple schedules for a particular unit
test and/or to reuse test code among different tests.
Reliability: Some current frameworks, as well as the
legacy sleep-based tests, rely on real time. As explained,
this makes them very fragile, leading to false positives/neg-
atives and/or slow testing time.
Migration Costs: Most current frameworks are very dif-
ferent from the traditional sleep-based tests. This makes it
costly to migrate the existing sleep-based tests.
We present a new framework, called IMUnit (pronounced
“immunity”), which aims to address these issues with mul-
tithreaded unit testing. Specifically, we make the following
contributions:
Schedule Language: IMUnit introduces a novel lan-
guage that enables natural and explicit specification of sch-
edules for multithreaded unit tests. Semantically, the basic
entity in an IMUnit schedule is an event that an execution
can produce at various points (e.g., a thread starting/finish-
ing the execution of a method, or a thread getting blocked).
We call the IMUnit approach event-based. An IMUnit sched-
ule itself is a (monitorable) property [10,24] on the sequence
of events. More precisely, each schedule is expressed as a
set of desirable event orderings, where each event ordering
specifies the order between a pair of events (note that an
IMUnit schedule need not specify a total order between all
events but only the necessary partial order).
While the ideas of IMUnit can be embodied in any lan-
guage, we have developed an implementation for Java. Syn-
tactically, the IMUnit constructs are represented using Java
annotations. A developer can use @Event and @Schedule an-
notations to describe the events and intended schedules, re-
spectively, for a multithreaded unit test.
Automated Migration: We have developed two infer-
ence techniques and a tool to ease migration of legacy, sleep-
based tests to IMUnit, event-based tests.
techniques can automatically infer likely relevant events and
schedules from the execution traces of existing sleep-based
Our inference
tests.
plugin which uses the results of inference to automatically
refactor a given multithreaded test into an IMUnit test.
Execution and Checking: We have implemented a tool
for execution of IMUnit multithreaded unit tests. The tool
can work in two modes. In the active mode, it controls the
thread scheduler to enforce a given IMUnit schedule dur-
ing test execution. In the passive mode, it checks whether
an arbitrary test execution, controlled by the regular JVM
thread scheduler, follows a given IMUnit schedule. To en-
force/check the schedules, our tool uses the JavaMOP mon-
itoring framework [10,24]. We also include a new runner for
the standard JUnit testing framework to enable execution
of IMUnit tests with our enforcement/checking tool.
Evaluation: To guide and refine our design of the IMUnit
language, we have been inspecting over 200 sleep-based tests
from several open-source projects. We manually translated
198 of those tests into IMUnit, adding events and sched-
ules, and removing sleeps. As a result, the current version
of IMUnit is highly expressive, and we were able to remove
all sleeps from all but 4 tests.
We evaluated our inference techniques by automatically
inferring events/schedules for the original tests that we man-
ually translated (the subprojects on manual translation and
automatic inference were performed by different authors to
reduce the direct bias of manual translation on automatic
inference). Computing the precision and recall of the auto-
matically inferred events/schedules with respect to the man-
ually translated events/schedules, we find our techniques to
be highly effective, with over 75% precision and recall.
We also compared the execution time of the original tests
and our translated tests. Because the main goal of IMUnit
is to make tests more readable, modular, and reliable, we
did not expect IMUnit to run faster. However, IMUnit did
reduce the testing time, on average 3.39x, compared to the
sleep-based tests, with the sleep duration that the original
tests had in the code. As mentioned earlier, these duration
values are often over-estimated, especially in older tests that
were written for slower machines. In summary, IMUnit not
only makes multithreaded unit tests more readable, modu-
lar, and reliable than the traditional sleep-based approach,
but IMUnit can also make test execution faster.
This paper makes progress on our vision for improving
multithreaded unit testing; our position paper [15] proposed
the idea of event-based specification of schedules, but the
IMUnit language and algorithms/tools for inference and ex-
ecution are completely new.
We implemented our migration tool as an Eclipse
2.EXAMPLE
We now illustrate IMUnit with the help of an example
multithreaded unit test for the ArrayBlockingQueue class in
java.util.concurrent (JSR-166) [17]. ArrayBlockingQueue
is an array-backed implementation of a bounded blocking
queue. One operation provided by ArrayBlockingQueue is
add, which performs a non-blocking insertion of the given el-
ement at the tail of the queue. If add is performed on a full
queue, it throws an exception. Another operation provided
by ArrayBlockingQueue is take, which removes and returns
the object at the head of the queue. If take is performed on
an empty queue, it blocks until an element is inserted into
the queue. These operations could have bugs that get trig-
gered when the add and take operations execute on different
threads. Consider testing some scenarios for these opera-
Page 3
1 @Test
2 public void testTakeWithAdd() {
3
ArrayBlockingQueue<Integer> q;
q = new ArrayBlockingQueue<Integer>(1);
new Thread(
new CheckedRunnable() {
public void realRun() {
q.add(1);
Thread.sleep(100);
q.add(2);
}
}, ” addThread” ).start();
Thread.sleep(50);
Integer taken = q.take();
assertTrue(taken == 1 && q.isEmpty());
taken = q.take();
assertTrue(taken == 2 && q.isEmpty());
addThread.join();
19 }
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
(a) JUnit
1 public class TestTakeWithAdd
extends MultithreadedTest {
3
ArrayBlockingQueue<Integer> q;
4
@Override
public void initialize() {
q = new ArrayBlockingQueue<Integer>(1);
}
public void addThread() {
q.add(1);
waitForTick(2);
q.add(2);
}
public void takeThread() {
waitForTick(1);
Integer taken = q.take();
assertTrue(taken == 1 && q.isEmpty());
taken = q.take();
assertTick(2);
assertTrue(taken == 2 && q.isEmpty());
}
21 }
2
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
(b) MultithreadedTC
1 @Test
2 @Schedule(” finishedAdd1->startingTake1,
[startingTake2]->startingAdd2” )
4 public void testTakeWithAdd() {
5
ArrayBlockingQueue<Integer> q;
q = new ArrayBlockingQueue<Integer>(1);
new Thread(
new CheckedRunnable() {
public void realRun() {
q.add(1);
@Event(” finishedAdd1” )
@Event(” startingAdd2” )
q.add(2);
}
}, ”addThread” ).start();
@Event(” startingTake1” )
Integer taken = q.take();
assertTrue(taken == 1 && q.isEmpty());
@Event(” startingTake2” )
taken = q.take();
assertTrue(taken == 2 && q.isEmpty());
addThread.join();
23 }
3
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
(c) IMUnit
Figure 1: Example multithreaded unit test for ArrayBlockingQueue
tions (in fact, the JSR-166 TCK provides over 100 tests for
various scenarios for similar classes).
Figure 1 shows a multithreaded unit test that ArrayBlock-
ingQueue exercises add and take in two scenarios. In particu-
lar, Figure 1(a) shows the test written as a regular JUnit test
method, with sleeps used to specify the required schedule.
We invite the reader to consider what scenarios are specified
with that test (without looking at the other figures). It is
likely to be difficult to understand which schedule is being
exercised by reading the code of this unit test. While the
sleeps provide hints as to which thread is waiting for another
thread to perform operations, it is unclear which operations
are intended to be performed by the other thread before the
sleep finishes.
The test actually checks that take performs correctly both
with and without blocking, when used with add from another
thread. To check both scenarios, the test exercises a sched-
ule where the first add finishes before the first take starts,
and the second take blocks before the second add starts.
Line 13 shows the first sleep that is intended to pause the
main thread1while the addThread finishes the first add. Line 9
shows the second sleep which is intended to pause the add-
Thread while the main thread finishes the first take and then
proceeds to block while performing the second take. If the
specified schedule is not enforced during the execution, there
may be a false positive/negative. For example, if both add
operations execute before a take is performed, the test will
throw an exception and fail even if the code has no bug, and
if both take operations finish without blocking, the test will
not fail, even if the blocking take code had a bug.
Figure 1(b) shows the same test written using Multithre-
adedTC [26]. Note that it departs greatly from traditional
JUnit where each test is a method. In MultithreadedTC,
each test has to be written as a class, and each method in
the test class contains the code executed by a thread in the
1JVM names the thread that starts the execution main by
default, although the name can be changed later.
test. The intended schedule is specified with respect to a
global, logical clock. Since this clock measures time in ticks,
we call the approach tick-based. When a thread executes a
waitForTick, it is blocked until the global clock reaches the
required tick. The clock advances implicitly when all threads
are blocked (and at least one thread is blocked in a wait-
ForTick). While a MultithreadedTC test does not rely on
real time, and is thus more reliable than a sleep-based test,
the intended schedule is still not immediately clear upon
reading the test code. It is especially not clear when wait-
ForTick operations are blocked/unblocked, because ticks are
advanced implicitly when all the threads are blocked.
Figure 1(c) shows the same test written using IMUnit.
The interesting events encountered during test execution are
marked with @Event annotations2, and the intended sched-
ule is specified with a @Schedule annotation that contains a
comma-separated set of orderings among events. An order-
ing is specified using the operator ->, where the left event is
intended to execute before the right event. An event speci-
fied within square brackets denotes that the thread execut-
ing that event is intended to block after that event. It should
be clear from reading the schedule that the addThread should
finish the first add before the main thread starts the first take,
and that the main thread should block while performing the
second take before the addThread starts the second add.
We now revisit, in the context of this example, the issues
with multithreaded tests listed in the introduction. In terms
of readability, we believe that making the schedules explicit,
as in IMUnit, allows easier understanding and maintenance
of schedules and code for both testing and debugging. In
terms of modularity, IMUnit allows extracting the addThread
as a helper thread (with its events) that can be reused in
2Note that @Event annotations appear on statements. The
current version of Java (ver. 6) does not support annotations
on statements, but the upcoming version of Java (ver. 7)
will add such support.For now, @Event annotations can
be written as comments, e.g., /* @Event("finishedAdd1") */,
which IMUnit translates into code for test execution.
Page 4
<Schedule>
<Ordering>
<Condition>
::= { <Ordering> [ ” , ”]
::= <Condition> ”->” <Basic Event>
::= <Basic Event> | <Block Event>
| <Condition> ”| | ” <Condition>
| <Condition> ”&&” <Condition>
|”( ” <Condition> ”) ”
<Basic Event> ::= <Event Name> [ ”@” <Thread Name>]
| ”start ” ”@” <Thread Name>
| ”end ” ”@” <Thread Name>
<Block Event> ::= ”[ ” <Basic Event> ”] ”
<Event Name>
::= { <Id> ”. ” } <Id>
<Thread Name> ::= <Id>
} <Ordering>
Figure 2: Syntax of the IMUnit schedule language
other tests (in fact, many tests in the JSR-166 TCK [17] use
such helper threads). In contrast, reusing thread methods
from the MultithreadedTC test class is more involved, re-
quiring subclassing, parametrizing tick values, and providing
appropriate parameter values. Also, IMUnit allows specify-
ing multiple schedules for the same test code (Section 4.3).
In terms of reliability, IMUnit does not rely on real time and
hence has no false positives/negatives due to unintended sch-
edules. In terms of migration costs, IMUnit tests resemble
legacy JUnit tests more than MultithreadedTC tests. This
similarity eases the transition of legacy tests into IMUnit:
in brief, add @Event annotations, add @Schedule annotation,
and remove sleep calls. Section 4 presents our techniques
and tool that automate this transition.
3.SCHEDULE LANGUAGE
We now describe the syntax and semantics of the language
used in IMUnit’s schedules.
3.1Concrete Syntax
Figure 2 shows the concrete syntax of the implemented
IMUnit schedule language. An IMUnit schedule is a comma-
separated set of orderings. Each ordering defines a condi-
tion that must hold before a basic event can take place. A
basic event is an event name possibly tagged with its is-
suing thread name when that is not understood from the
context. An event name is any identifier, possibly prefixed
with a qualified class name. There are two implicit event
names for each thread, start and end, indicating when the
thread starts and terminates. Any other event must be ex-
plicitly introduced by the user with the @Event annotation
(see Figure 1(c)). A condition is a conjunctive/disjunctive
combination of basic and block events, where block events
are written as basic events in square brackets. A block event
[e′] in the condition c of an ordering c → e states that e′
must precede e and, additionally, the thread of e′is blocked
when e takes place.
3.2 Schedule Logic
It is more convenient to define a richer logic than what
is currently supported by our IMUnit implementation; the
additional features are natural and thus may also be imple-
mented in the future. The semantics of our logic is given in
Section 3.3; here is its syntax:
a
t
e
ϕ
∶∶=
∶∶=
∶∶=
∶∶=
start ∣ end ∣ block ∣ unblock ∣ event names
thread names
a@t
[t]∣
ϕ → ϕ
∣
usual propositional connectives
The intuition for [t] is“thread t is blocked”and for ϕ → ψ“if
ψ held in the past, then ϕ must have held at some moment
before ψ”. We call these two temporal operators the block
and the ordering operators, respectively. For uniformity, all
events are tagged with their thread.
plicit events: start@t and end@t were discussed above, and
block@t and unblock@t correspond to when t gets blocked
and unblocked3.
For example, the following formula in our logic
There are four im-
(a1@t1∧([t2] ∨ (¬(start(t2) → a1@t1)))) → a2@t2
(a2@t2∧([t1] ∨ (end(t1) → a2@t2))) → a2@t2
∧
says that if event a2 is generated by thread t2 then: (1)
event a1 must have been generated before that and, when a1
was generated, t2 was either blocked or not started yet; and
(2) when a2 is generated by t2, t1 is either blocked or termi-
nated. As explained shortly, every event except for block and
unblock is restricted to appear at most once in any execution
trace. Above we assumed that a1,a2∉ {block,unblock}.
Before we present the precise semantics, we explain how
our current IMUnit language shown in Figure 2 (whose de-
sign was driven exclusively by practical needs) is a smaller
fragment of the richer logic. An IMUnit schedule is a con-
junction (we use comma instead of ∧) of orderings, and
schedules cannot be nested. Since generating block and un-
block events is expensive, IMUnit currently disallows their
explicit use in schedules. Moreover, to reduce their implicit
use to a fast check of whether a thread is blocked or not,
IMUnit also disallows the explicit use of [t] formulas. In-
stead, it allows block events of the form [a@t] (note the
square brackets) in conditions. Since negations are not al-
lowed in IMUnit, and since we can show (after we discuss the
semantics) that (ϕ1∨ϕ2) → ψ equals (ϕ1→ ψ)∨ (ϕ2 → ψ),
we can reduce any IMUnit schedule to a Boolean combina-
tion of orderings ϕ → e, where ϕ is a conjunction of ba-
sic events or block events. All that is left to show is how
block events are desugared. Consider an IMUnit schedule
(ϕ ∧ [a1@t1]) → a2@t2, saying that a1@t1 and ϕ must pre-
cede a2@t2 and t1 is blocked when a2@t2 occurs. This can
be expressed as ((ϕ ∧ a1@t1) → a2@t2) ∧ ((a2@t2∧ [t1]) →
a2@t2), relying on a2@t2 happening at most once.
3.3Semantics
Our schedule logic is a carefully chosen fragment of past-
time linear temporal logic (PTLTL) over special well-formed
multithreaded system execution traces.
Program executions are abstracted as finite traces of eve-
nts τ = e1e2...en. Unlike in conventional LTL, our traces
are finite because unit tests always terminate. Traces must
satisfy the obvious condition that events corresponding to
thread t can only appear while the thread is alive, that is,
between start@t and end@t. Using PTLTL, this require-
ment states that for any trace τ and any event a@t with
a ∉ {start, end}, the following holds:
τ ⊧ ¬ ⟐(a@t∧(⟐end@t∨ ¬⟐start@t))
where ⟐ stands for “eventually in the past”. Moreover, ex-
cept for block@t and unblock@t events, we assume that each
3It is expensive to explicitly generate block/unblock events in
Java precisely when they occur, because it requires polling
the status of each thread; our currently implemented frag-
ment only needs, through its restricted syntax, to check if a
given thread is currently blocked or not, which is fast.
Page 5
event appears at most once in a trace. With PTLTL, this
says that the following must hold (? is “previously”):
τ ⊧ ¬ ⟐(a@t∧ ?⟐ a@t)
for any trace τ and any a@t with a ∉ {block,unblock}.
The semantics of our logic is defined as follows:
e1e2...en⊧ e
iff e = en
τ ⊧ ϕ ∧/∨ ψ
iff τ ⊧ ϕ and/or τ ⊧ ψ
e1e2...en⊧ [t]
iff (∃1 ≤ i ≤ n) (ei= block@t and
(∀i < j ≤ n) ej ≠ unblock@t)
e1e2...en⊧ ϕ → ψ iff (∀1 ≤ i ≤ n) e1e2...ei / ⊧ ψ or
(∃1 ≤ i ≤ n) (e1e2...ei⊧ ψ and
(∃1 ≤ j ≤ i) e1e2...ej ⊧ ϕ)
It is not hard to see that the two new operators [t] and
ϕ → ψ can be expressed in terms of PTLTL as
[t]
≡
≡
¬unblock@t S block@t
?¬ψ ∨ ⟐(ψ ∧⟐ϕ)
ϕ → ψ
where S stands for “since” and ? for “always in the past”.
4. MIGRATION
We now describe the process of migrating legacy, sleep-
based tests to IMUnit, event-based tests. First we present
the steps that are typically performed during manual migra-
tion and then we describe the automated support that we
have developed for key steps of the migration.
4.1Manual Migration
Based on our experience of manually migrating over 200
tests, the migration process typically follows these steps:
Step 1: Optionally add explicit names for threads in the
test code (by using a thread constructor with a name or by
adding a call to setName). This step is required if events are
tagged with their thread name (e.g. finishedAdd1@addThread)
in the schedule, because by default the JVM automatically
assigns a name (e.g. Thread-5) for each thread created with-
out an explicit name, and the automatic name may differ
between JVMs or between different runs on the same JVM.
Step 2: Introduce @Event annotations for the events relevant
for the intended schedule. Some of these annotations will be
used for block events and some for basic events.
Step 3: Introduce a @Schedule annotation for the intended
schedule. Steps 2 and 3 are the hardest to perform as they
require understanding of the intended behavior of the sleep-
based test. Note that a schedule with too few orderings can
lead to failing tests that are false positives. On the other
hand, a schedule with too many orderings may lead to false
negatives whereby a bug is missed because the schedule is
over-constraining the test execution.
Step 4: Check that the orderings in the introduced sched-
ule are actually satisfied when running the test with sleeps
(Section 5 describes the passive, checking mode).
Step 5: Remove sleeps.
Step 6: Optionally merge multiple tests with different sched-
ules (but similar test code) into one test with multiple sched-
ules, potentially adding schedule-specific code (Section 4.3).
4.2Automated Migration
We have developed automated tool support to enable eas-
ier migration of sleep-based tests to IMUnit. In particular,
we have developed inference techniques that can compute
enum EntryType { SLEEP CALL, SLEEP RETURN, BLOCK CALL,
BLOCK RETURN, OTHER CALL, OTHER RETURN, TH START,
TH END, EVENT }
class LogEntry { EntryType type; ThreadID tid; String info; StmtID sid; }
Figure 3: Log Entries
likely relevant events and schedules for sleep-based tests by
inspecting the execution logs obtained from test runs. We
next describe the common infrastructure for logging the test
runs. We then present the techniques for inferring events
and schedules.
4.2.1
Our inference of events and schedules from sleep-based
tests is dynamic: it first instruments the test code (using As-
pectJ [19]) to emit entries potentially relevant for inference,
then runs the instrumented code (several times, as explained
below) to collect logs of entries from the test executions, and
finally analyzes the logs to perform the inference.
Figure 3 shows the generic representation for log entries,
although event and schedule inference require slightly differ-
ent representations. Each log entry has a type, name/ID of
the thread that emits the entry, potential info/parameters
for the entry, and the ID of the statement that creates the
entry (which is used only for event inference). The types of
log entries and their corresponding info are as follows:
SLEEP_CALL: Invocation of Thread.sleep method. (Only used
for inferring events.)
SLEEP_RETURN: Return from Thread.sleep method.
BLOCK_CALL: Invocation of a thread blocking method (Lock-
Support.park or Object.wait).
BLOCK_RETURN: Return from a thread blocking method.
OTHER_CALL: Invocation of a method (other than those listed
above) in the test class. The info is the method name. (Only
used for inferring events .)
OTHER_RETURN: Return from a method executed from the test
class.
TH_START: Invocation of Thread.start. The info is the ID of
the started thread. (Only used for inferring schedules.)
TH_END: End of thread execution.
EVENT: Execution of an IMUnit event. The info is the name
of the event. (Only available while inferring schedules.)
Note that any logging can affect timing of test execution.
Because sleep-based tests are especially sensitive to timing,
care must be taken to avoid false positives. We address this
in three ways. First, our logging is lightweight. The instru-
mented code only collects log entries (and their parameters)
relevant to the inference. For example, OTHER_CALL is not col-
lected for schedule inference. Also, the entries are buffered
in memory during test execution, and they are converted to
strings and logged to file only at the end of test execution.
While keeping entries in memory would not work well for
very long logs, it works quite well for the relatively short
logs produced by test executions. Second, our instrumenta-
tion automatically scales the duration of sleeps by a given
constant N to compensate for the logging overhead. For ex-
ample, for N = 3 it increases all sleep times 3x. Increasing
all the durations almost never makes a passing test fail, but
it does make the test run slower. Third, we perform multiple
Lightweight Logging