REDIRECTIONS
Editor: Tim Menzies
North Carolina State University
tim@menzies.us
What Do We (Really) Know about Test-Driven Development?

Itir Karac and Burak Turhan
TEST-DRIVEN DEVELOPMENT (TDD) is one of the most controversial agile practices in terms of its impact on software quality and programmer productivity. After more than a decade's research, the jury is still out on its effectiveness. TDD promised it all: increased quality and productivity, along with an emerging, clean design supported by the safety net of a growing library of tests. What's more, the recipe sounded surprisingly simple: Don't write code without a failing test.

Here, we revisit the evidence on the promises of TDD.1 But, before we go on, just pause and think of an answer to the following core question: What is TDD?
Let us guess: your response is most likely along the lines of, "TDD is a practice in which you write tests before code." This emphasis on its test-first dynamic, strongly implied by the name, is perhaps the root of most, if not all, of the controversy about TDD. Unfortunately, it's a common misconception to use "TDD" and "test-first" interchangeably. Test-first is only one part of TDD. There are many other cogs in the system that potentially make TDD tick.
How about working on small tasks, keeping the red–green–refactor cycles short and steady, writing only the code necessary to pass a failing test, and refactoring? What if we told you that some of these cogs contribute more toward fulfilling the promises of TDD than the order of test implementation? (Hint: you should ask for evidence.)
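To make those cogs concrete, here is a minimal red–green–refactor sketch in Python. The task, the test names, and the fizzbuzz function are our own illustrative choices, not an example drawn from the studies discussed in this article.

import unittest

# Red: write a failing test for the next small slice of behavior.
class FizzBuzzTest(unittest.TestCase):
    def test_returns_number_as_string(self):
        self.assertEqual(fizzbuzz(1), "1")

    def test_returns_fizz_for_multiples_of_three(self):
        self.assertEqual(fizzbuzz(3), "Fizz")

# Green: write only the code needed to make the failing test pass.
def fizzbuzz(n: int) -> str:
    if n % 3 == 0:
        return "Fizz"
    return str(n)

# Refactor: with the tests green, clean up names and structure, then
# repeat the cycle for the next small behavior (5 -> "Buzz", and so on).

if __name__ == "__main__":
    unittest.main()

The point of the sketch is the rhythm: each pass through the loop is a small task, a failing test, just enough code, and a tidy-up, rather than a large batch of code followed by a large batch of tests.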
15 Years of (Contradictory) Evidence

Back in 2003, when the software development paradigm started to change irrevocably (for the better?), Kent Beck posed a claim based on anecdotal evidence and paved the way for software engineering researchers:
No studies have categorically demonstrated the difference between TDD and any of the many alternatives in quality, productivity, or fun. However, the anecdotal evidence is overwhelming, and the secondary effects are unmistakable.2
Since then, numerous studies (for example, experiments and case studies) have investigated TDD's effectiveness. These studies are periodically synthesized in secondary studies (see Table 1), only to reveal contradictory results across the primary studies. These syntheses have also demonstrated no consistent overall benefit from TDD, particularly for productivity overall and for quality within subgroups.
Why the inconsistent results? Besides the reasons listed in Table 1, other likely reasons are that

• TDD has too many cogs,
• its effectiveness is highly influenced by the context (for example, the tasks at hand or the skills of individuals),
• the cogs highly interact with each other, and
• most studies have focused on only the test-first aspect.

Identifying the inconsistencies' sources is important for designing further studies that control for those sources.
Table 1. Systematic literature reviews on test-driven development (TDD).

Bissi et al.3
Quality with TDD: improvement. Productivity with TDD: inconclusive.
Inconsistent results: productivity (academic vs. industrial setting).

Munir et al.4
Quality with TDD: improvement or no difference. Productivity with TDD: degradation or no difference.
Inconsistent results: quality (low vs. high rigor; low vs. high relevance); productivity (low vs. high rigor; low vs. high relevance).

Rafique and Mišić5
Quality with TDD: improvement. Productivity with TDD: inconclusive.
Inconsistent results: quality (waterfall vs. iterative test-last); productivity (waterfall vs. iterative test-last; academic vs. industrial).

Turhan et al.6 and Shull et al.1
Quality with TDD: improvement. Productivity with TDD: inconclusive.
Inconsistent results: quality (among controlled experiments; among studies with high rigor); productivity (among pilot studies; controlled experiments vs. industrial case studies; among studies with high rigor).

Kollanus7
Quality with TDD: improvement. Productivity with TDD: degradation.
Inconsistent results: quality (among academic studies; among semi-industrial studies).

Siniaalto8
Quality with TDD: improvement. Productivity with TDD: inconclusive.
Inconsistent results: productivity (among academic studies; among semi-industrial studies).

Matjaž Pančur and Mojca Ciglarič speculated that the results of studies showing TDD's superiority over a test-last approach were due to the fact that most of the experiments employed a coarse-grained test-last process closer to the waterfall
approach as a control group.9 This created a large differential in granularity between the treatments, and sometimes even a complete lack of tests in the control, resulting in unfair, misleading comparisons. In the end, TDD might perform better only when compared to a coarse-grained development process.
Industry Adoption (or Lack Thereof)

Discussions on TDD are common and usually heated. But how common is the use of TDD in practice? Not very, at least according to what the evidence suggests.
For example, after monitoring the development activity of 416 developers over more than 24,000 hours, researchers reported that the developers followed TDD in only 12 percent of the projects that claimed to use it.10 We've observed similar patterns in our work with professional developers. Indeed, if it were possible to reanalyze all existing evidence considering this facet only, the shape of things might change significantly (for better or worse). We'll play devil's advocate and ask: what if the anecdotal evidence from TDD enthusiasts is based on misconceived personal experience from non-TDD activities?
Similarly, a recent study analyzed a September 2015 snapshot of all the (Java) projects in GitHub.11 Using heuristics for identifying TDD-like repositories, the researchers found that only 0.8 percent of the projects adhered to TDD protocol. Furthermore, comparing those projects to a control set, the study reported no difference between the two groups in terms of

• the commit velocity as a measure of productivity,
• the number of bug-fixing commits as an indicator of the number of defects, and
• the number of issues reported for the project as a predictor of quality.

Additionally, a comparison of the number of pull requests and the distribution of commits per author didn't indicate any effect on developer collaboration. A toy sketch of such repository-level measures follows below.
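As a rough illustration of the kinds of measures involved, the following sketch computes a commit velocity and counts bug-fixing commits from a list of commit messages. The messages, the observation window, and the fix/bug keyword heuristic are our own hypothetical simplifications; the heuristics and metrics in the cited study are more involved.

import re

# Hypothetical commit messages from one project's history; the study
# computed its measures over real GitHub repositories.
commits = [
    "Add user registration endpoint",
    "Fix bug in password reset token expiry",
    "Refactor session handling",
    "fix: crash when profile picture is missing",
    "Add integration tests for login",
]

# Toy approximations of the reported measures (our own simplification):
# commit velocity as commits per week, and bug-fixing commits as
# messages that mention a fix or a bug.
bug_fix_pattern = re.compile(r"\b(fix|bug)\b", re.IGNORECASE)
bug_fixing = [msg for msg in commits if bug_fix_pattern.search(msg)]

weeks_observed = 2  # hypothetical observation window
velocity = len(commits) / weeks_observed

print(f"commit velocity: {velocity:.1f} commits/week")
print(f"bug-fixing commits: {len(bug_fixing)} of {len(commits)}")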
Adnan Causevic and his colleagues identified seven factors limiting TDD's use in the industry:12

• increased development time (productivity hits),
• insufficient TDD experience or knowledge,
• insufficient design,
• insufficient developer testing skills,
• insufficient adherence to TDD protocol,
• domain- and tool-specific limitations, and
• legacy code.

It's not surprising that three of these factors are related to the developers' capacity to follow TDD and their rigor in following it.
What Really Makes TDD Tick?

A more refined look into TDD considers not only the order in which production code and test code are written but also the average duration of development cycles, the uniformity of that duration, and the refactoring effort. A recent study of 39 professionals reported that a steady rhythm of short development cycles was the primary reason for improved quality and productivity.13 Indeed, the effect of test-first disappeared entirely once the effects of short and steady cycles were taken into account. These findings are consistent with earlier research demonstrating that TDD experts had much shorter and less variable cycle lengths than novices did.14 The significance of short development cycles extends beyond TDD; Alistair Cockburn, in explaining the Elephant Carpaccio concept, states that "agile developers apply micro-, even nano-incremental development in their work."15
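What "short and steady" might mean in numbers can be seen by logging when the test suite goes green and examining the length and variability of the resulting cycles. The sketch below is our own illustration, not the instrumentation used in the cited studies; the timestamps are hypothetical.

from datetime import datetime
from statistics import mean, pstdev

# Hypothetical timestamps of successful (green) test runs in one session.
green_runs = [
    datetime(2018, 7, 2, 9, 0),
    datetime(2018, 7, 2, 9, 7),
    datetime(2018, 7, 2, 9, 16),
    datetime(2018, 7, 2, 9, 24),
    datetime(2018, 7, 2, 9, 58),  # one long, irregular cycle
]

# Cycle lengths in minutes between consecutive green runs.
cycles = [
    (later - earlier).total_seconds() / 60
    for earlier, later in zip(green_runs, green_runs[1:])
]

average_length = mean(cycles)   # how short the cycles are
variability = pstdev(cycles)    # how steady (uniform) they are

print(f"average cycle: {average_length:.1f} min, variability: {variability:.1f} min")

A short average with low variability corresponds to the steady rhythm that these studies associate with better quality and productivity.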
Another claim of Elephant Carpaccio, related to the TDD concept of working on small tasks, is that agile developers can deliver fast "not because we're so fast we can [develop] 100 times as fast as other people, but rather, we have trained ourselves to ask for end-user-visible functionality 100 times smaller than most other people."15 To test this, we conducted experiments in which we controlled for the framing of task descriptions (finer-grained user stories versus coarser-grained generic descriptions). We observed that the type of task description and the task itself are significant factors affecting software quality in the context of TDD.
In short, working on small, well-defined tasks in short, steady development cycles has a more positive impact on quality and productivity than the order of test implementation.
Deviations from the Test-First Mantra

Even if we consider the studies that focus on only the test-first nature of TDD, there's still the problem of conformance to the TDD process. TDD isn't a dichotomy in which you either religiously write tests first every time or always test after the fact. TDD is a continuous spectrum between these extremes, and developers tend to dynamically span
this spectrum, adjusting the TDD process as needed. In industrial settings, time pressure, lack of discipline, and insufficient realization of TDD's benefits have been reported to cause developers to deviate from the process.12
To gain more insight, in an ethnographically informed study, researchers monitored and documented the TDD development process more closely by means of artifacts including audio recordings and notes.16 They concluded that developers perceived implementation as the most important phase and didn't strictly follow the TDD process. In particular, developers wrote more production code than necessary, often omitted refactoring, and didn't keep test cases up to date with the progression of the production code. Even when the developers followed the test-first principle, they thought about how the production code (not necessarily the design) should look before they wrote the test for the next feature. In other words, perhaps we should simply name this phenomenon "code-driven testing"?
TDD's internal and external dynamics are more complex than the order in which tests are written. There's no convincing evidence that TDD consistently fares better than any other development method, at least among iterative methods. And enough evidence exists to question whether TDD fulfills its promises.
How do you decide whether and when to use TDD, then? And what about TDD's secondary effects?

As always, context is the key, and any potential benefit of TDD is likely not due to the order in which developers write tests and code. It makes sense to have realistic expectations rather than worship or discard TDD. Focus on the rhythm of development; for example, tackle small tasks in short, steady development cycles rather than bother with the test order. Also, keep in mind that some tasks are better (suited) than others with respect to "TDD-bility."
This doesn't mean you should avoid trying TDD or stop using it. For example, if you think that TDD gives you the self-discipline to write tests for each small piece of functionality, following the test-first principle will certainly prevent you from taking shortcuts that skip tests. In this case, there's value in Beck's suggestion, "Never write a line of functional code without a broken test case."2 However, you should primarily consider those tests' quality (without obsessing over coverage),17 instead of fixating on whether you wrote them before the code. Although TDD does result in more tests,1,6 the lack of attention to test quality,12 including maintainability and coevolution with production code,16 could be alarming.
As long as you're aware of and comfortable with the potential trade-off between productivity on the one hand and testability and quality on the other (perhaps paying off in the long term?), using TDD is fine. If you're simply having fun and feeling good while practicing TDD without any significant drawbacks, that's also fine. After all, the evidence shows that happy developers are more productive and produce better code!18
Acknowledgments
Academy of Finland Project 278354 partly supports this research.
References
1. F. Shull et al., "What Do We Know about Test-Driven Development?," IEEE Software, vol. 27, no. 6, 2010, pp. 16–19.
2. K. Beck, Test-Driven Development: By Example, Addison-Wesley, 2003.
3. W. Bissi et al., "The Effects of Test Driven Development on Internal Quality, External Quality and Productivity: A Systematic Review," Information and Software Technology, June 2016, pp. 45–54.
4. H. Munir, M. Moayyed, and K. Petersen, "Considering Rigor and Relevance When Evaluating Test Driven Development: A Systematic Review," Information and Software Technology, vol. 56, no. 4, 2014, pp. 375–394.
5. Y. Rafique and V.B. Mišić, "The Effects of Test-Driven Development on External Quality and Productivity: A Meta-analysis," IEEE Trans. Software Eng., vol. 39, no. 6, 2013, pp. 835–856; http://dx.doi.org/10.1109/TSE.2012.28.
6. B. Turhan et al., "How Effective Is Test-Driven Development?," Making Software: What Really Works, and Why We Believe It, A. Oram and G. Wilson, eds., O'Reilly Media, 2010, pp. 207–219.
7. S. Kollanus, "Test-Driven Development—Still a Promising Approach?," Proc. 7th Int'l Conf. Quality of Information and Communications Technology (QUATIC 10), 2010, pp. 403–408; http://dx.doi.org/10.1109/QUATIC.2010.73.
8. M. Siniaalto, "Test Driven Development: Empirical Body of Evidence," tech. report, Information Technology for European Advancement, 3 Mar. 2006.
9. M. Pančur and M. Ciglarič, "Impact of Test-Driven Development on Productivity, Code and Tests: A Controlled Experiment," Information and Software Technology, vol. 53, no. 6, 2011, pp. 557–573.
10. M. Beller et al., "When, How, and Why Developers (Do Not) Test in Their IDEs," Proc. 10th Joint Meeting Foundations of Software Eng. (ESEC/FSE 15), 2015, pp. 179–190; http://doi.acm.org/10.1145/2786805.2786843.
11. N.C. Borle et al., "Analyzing the Effects of Test Driven Development in GitHub," Empirical Software Eng., Nov. 2017.
12. A. Causevic, D. Sundmark, and S. Punnekkat, "Factors Limiting Industrial Adoption of Test Driven Development: A Systematic Review," Proc. 4th IEEE Int'l Conf. Software Testing, Verification and Validation, 2011, pp. 337–346.
13. D. Fucci et al., "A Dissection of the Test-Driven Development Process: Does It Really Matter to Test-First or to Test-Last?," IEEE Trans. Software Eng., vol. 43, no. 7, 2017, pp. 597–614.
14. M.M. Müller and A. Höfer, "The Effect of Experience on the Test-Driven Development Process," Empirical Software Eng., vol. 12, no. 6, 2007, pp. 593–615; https://doi.org/10.1007/s10664-007-9048-2.
15. A. Cockburn, "Elephant Carpaccio," blog; http://alistair.cockburn.us/Elephant1carpaccio.
16. S. Romano et al., "Findings from a Multi-method Study on Test-Driven Development," Information and Software Technology, Sept. 2017, pp. 64–77.
17. D. Bowes et al., "How Good Are My Tests?," Proc. IEEE/ACM 8th Workshop Emerging Trends in Software Metrics (WETSoM 17), 2017, pp. 9–14.
18. D. Graziotin et al., "What Happens When Software Developers Are (Un)happy," J. Systems and Software, June 2018, pp. 32–47.
ABOUT THE AUTHORS

ITIR KARAC is a project researcher in the M-Group research group and a doctoral student in the Department of Information Processing Science at the University of Oulu. Contact her at itir.karac@oulu.fi.

BURAK TURHAN is a senior lecturer in Brunel University's Department of Computer Science and a professor of software engineering at the University of Oulu. Contact him at turhanb@computer.org.