
What Do We (Really) Know about Test-Driven Development?

Abstract

Test-driven development (TDD) involves more than just testing before coding. This article examines how (and whether) TDD has lived up to its promises.
REDIRECTIONS
Editor: Tim Menzies, North Carolina State University, tim@menzies.us
Itir Karac and Burak Turhan
TEST-DRIVEN DEVELOPMENT (TDD) is one of the most controversial agile practices in terms of its impact on software quality and programmer productivity. After more than a decade’s research, the jury is still out on its effectiveness. TDD promised it all: increased quality and productivity, along with an emerging, clean design supported by the safety net of a growing library of tests. What’s more, the recipe sounded surprisingly simple: don’t write code without a failing test.

Here, we revisit the evidence for the promises of TDD.1 But, before we go on, just pause and think of an answer to the following core question: what is TDD?
Let us guess: your response is most likely along the lines of, “TDD is a practice in which you write tests before code.” This emphasis on its test-first dynamic, strongly implied by the name, is perhaps the root of most, if not all, of the controversy about TDD. Unfortunately, it’s a common misconception to use “TDD” and “test-first” interchangeably. Test-first is only one part of TDD. There are many other cogs in the system that potentially make TDD tick.
How about working on small tasks, keeping the red–green–refactor cycles short and steady, writing only the code necessary to pass a failing test, and refactoring? What if we told you that some of these cogs contribute more toward fulfilling the promises of TDD than the order of test implementation? (Hint: you should ask for evidence.)
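To make these cogs concrete, here is a minimal sketch of a single red–green–refactor micro-cycle in Java with JUnit 5. The leap-year task and the class names are our own illustrative choices, not an example drawn from the studies discussed:

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

// Red: write a failing test for one tiny slice of behavior.
// This test cannot even compile (let alone pass) until LeapYear exists.
class LeapYearTest {
    @Test
    void centuriesAreLeapYearsOnlyWhenDivisibleBy400() {
        assertTrue(LeapYear.isLeap(2000));
        assertFalse(LeapYear.isLeap(1900));
    }
}

// Green: write only the code needed to make the test pass.
class LeapYear {
    static boolean isLeap(int year) {
        if (year % 400 == 0) return true;
        if (year % 100 == 0) return false;
        return year % 4 == 0;
    }
}

// Refactor: with the test green, clean up names and structure,
// then start the next short cycle with a new failing test.

The point is that test order is only one cog here; the cycle’s small scope, its short duration, and the refactoring step are the others.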
15 Years of (Contradictory) Evidence
Back in 2003, when the software development paradigm started to change irrevocably (for the better?), Kent Beck posed a claim based on anecdotal evidence and paved the way for software engineering researchers:

No studies have categorically demonstrated the difference between TDD and any of the many alternatives in quality, productivity, or fun. However, the anecdotal evidence is overwhelming, and the secondary effects are unmistakable.2

Since then, numerous studies—for example, experiments and case studies—have investigated TDD’s effectiveness. These studies are periodically synthesized in secondary studies (see Table 1), only to reveal contradictory results across the primary studies. This research has also demonstrated no consistent overall benefit from TDD, particularly for productivity overall and for quality within subgroups.

Call for Submissions
Do you have a surprising result or industrial experience? Something that challenges decades of conventional thinking in software engineering? If so, email a one-paragraph synopsis to timm@ieee.org (use the subject line “REDIRECTIONS: Idea: your idea”). If that looks interesting, I’ll ask you to submit a 1,000- to 2,400-word article (in which each figure or table counts as 250 words) for review for the Redirections department. Please note: heresies are more than welcome (if supported by well-reasoned industrial experiences, case studies, or other empirical results). —Tim Menzies
Why the inconsistent results? Besides the reasons listed in Table 1, other likely reasons are that

• TDD has too many cogs,
• its effectiveness is highly influenced by the context (for example, the tasks at hand or the skills of individuals),
• the cogs highly interact with each other, and
• most studies have focused on only the test-first aspect.

Identifying the inconsistencies’ sources is important for designing further studies that control for those sources.
Matjaž Pančur and Mojca Ciglarič speculated that the results of studies showing TDD’s superiority over a test-last approach were due to the fact that most of the experiments employed a coarse-grained test-last process closer to the waterfall approach as a control group.9 This created a large differential in granularity between the treatments, and sometimes even a complete lack of tests in the control, resulting in unfair, misleading comparisons. In the end, TDD might perform better only when compared to a coarse-grained development process.

Table 1. Systematic literature reviews on test-driven development (TDD).

Study | Overall conclusion for quality with TDD | Overall conclusion for productivity with TDD | Inconsistent results in the study categories
Bissi et al.3 | Improvement | Inconclusive | Productivity: academic vs. industrial setting
Munir et al.4 | Improvement or no difference | Degradation or no difference | Quality: low vs. high rigor; low vs. high relevance. Productivity: low vs. high rigor; low vs. high relevance
Rafique and Mišić5 | Improvement | Inconclusive | Quality: waterfall vs. iterative test-last. Productivity: waterfall vs. iterative test-last; academic vs. industrial
Turhan et al.6 and Shull et al.1 | Improvement | Inconclusive | Quality: among controlled experiments; among studies with high rigor. Productivity: among pilot studies; controlled experiments vs. industrial case studies; among studies with high rigor
Kollanus7 | Improvement | Degradation | Quality: among academic studies; among semi-industrial studies
Siniaalto8 | Improvement | Inconclusive | Productivity: among academic studies; among semi-industrial studies
Industry Adoption (or Lack Thereof)
Discussions on TDD are common and usually heated. But how common is the use of TDD in practice? Not very—at least, that’s what the evidence suggests.

For example, after monitoring the development activity of 416 developers over more than 24,000 hours, researchers reported that the developers followed TDD in only 12 percent of the projects that claimed to use it.10 We’ve observed similar patterns in our work with professional developers. Indeed, if it were possible to reanalyze all existing evidence considering this facet only, the shape of things might change significantly (for better or worse). We’ll play the devil’s advocate and ask: what if the anecdotal evidence from TDD enthusiasts is based on misconceived personal experience from non-TDD activities?
Similarly, a recent study analyzed a September 2015 snapshot of all the (Java) projects in GitHub.11 Using heuristics for identifying TDD-like repositories, the researchers found that only 0.8 percent of the projects adhered to the TDD protocol. Furthermore, comparing those projects to a control set, the study reported no difference between the two groups in terms of

• the commit velocity as a measure of productivity,
• the number of bug-fixing commits as an indicator of the number of defects, and
• the number of issues reported for the project as a predictor of quality.

Additionally, a comparison of the number of pull requests and the distribution of commits per author didn’t indicate any effect on developer collaboration.
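The study’s exact heuristics aren’t reproduced in this column; purely as an illustration of the general approach, a TDD-likeness check over a repository’s commit history might look like the following sketch. The Commit model, the FooTest.java naming convention, and the 0.8 threshold are our assumptions, not the study’s actual method:

import java.util.*;

// Illustrative only: flags a repository as "TDD-like" if, for most production
// files that have a matching *Test file, the test file first appears in the
// same commit or an earlier one.
class TddLikenessCheck {
    record Commit(int order, Set<String> addedFiles) {}

    static boolean isTddLike(List<Commit> history) {
        // Record the first commit in which each file appears.
        Map<String, Integer> firstSeen = new HashMap<>();
        for (Commit c : history)
            for (String f : c.addedFiles())
                firstSeen.putIfAbsent(f, c.order());

        int paired = 0, testNotLater = 0;
        for (Map.Entry<String, Integer> e : firstSeen.entrySet()) {
            String file = e.getKey();
            if (!file.endsWith(".java") || file.endsWith("Test.java")) continue;
            Integer testSeen = firstSeen.get(file.replace(".java", "Test.java"));
            if (testSeen == null) continue;               // no matching test at all
            paired++;
            if (testSeen <= e.getValue()) testNotLater++; // test first or together
        }
        // The 0.8 cutoff is an arbitrary illustrative threshold.
        return paired > 0 && testNotLater >= 0.8 * paired;
    }
}

Whatever the exact rule, any such heuristic only observes commit order, not whether developers actually worked in short test-first cycles between commits, which is one reason adherence numbers should be read with care.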
Adnan Causevic and his colleagues identified seven factors limiting TDD’s use in the industry:12

• increased development time (productivity hits),
• insufficient TDD experience or knowledge,
• insufficient design,
• insufficient developer testing skills,
• insufficient adherence to the TDD protocol,
• domain- and tool-specific limitations, and
• legacy code.

It’s not surprising that three of these factors are related to the developers’ capacity to follow TDD and their rigor in following it.
What Really Makes TDD Tick?
A more refined look into TDD is concerned with not only the order in which production code and test code are written but also the average duration of development cycles, that duration’s uniformity, and the refactoring effort. A recent study of 39 professionals reported that a steady rhythm of short development cycles was the primary reason for improved quality and productivity.13 Indeed, the effect of test-first completely diminished when the effects of short and steady cycles were considered. These findings are consistent with earlier research demonstrating that TDD experts had much shorter and less variable cycle lengths than novices did.14 The significance of short development cycles extends beyond TDD; Alistair Cockburn, in explaining the Elephant Carpaccio concept, states that “agile developers apply micro-, even nano-incremental development in their work.”15
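Cycle granularity and steadiness can be quantified from timestamped development events. The sketch below is our own formulation for illustration, not the exact metrics of Fucci et al.13: the mean cycle length stands in for granularity, and the coefficient of variation stands in for (the inverse of) uniformity, so lower values mean finer-grained, steadier work:

import java.time.Duration;
import java.time.Instant;
import java.util.List;

class CycleRhythm {
    // Given n+1 cycle boundaries (e.g., timestamps of green-bar events),
    // returns {mean cycle length in minutes, coefficient of variation}.
    // Assumes at least two boundaries.
    static double[] rhythm(List<Instant> cycleBoundaries) {
        int n = cycleBoundaries.size() - 1;
        double[] minutes = new double[n];
        for (int i = 0; i < n; i++)
            minutes[i] = Duration.between(cycleBoundaries.get(i),
                                          cycleBoundaries.get(i + 1)).toMillis() / 60000.0;
        double mean = 0;
        for (double m : minutes) mean += m / n;
        double variance = 0;
        for (double m : minutes) variance += (m - mean) * (m - mean) / n;
        return new double[] { mean, Math.sqrt(variance) / mean };
    }
}

By measures of this kind, an expert’s log might show a mean of a few minutes with little variation, while a novice’s shows long, erratic cycles, regardless of whether either wrote tests first.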
Another claim of Elephant Carpaccio, related to the TDD concept of working on small tasks, is that agile developers can deliver fast “not because we’re so fast we can [develop] 100 times as fast as other people, but rather, we have trained ourselves to ask for end-user-visible functionality 100 times smaller than most other people.”15 To test this, we conducted experiments in which we controlled for the framing of task descriptions (finer-grained user stories versus coarser-grained generic descriptions). We observed that the type of task description and the task itself are significant factors affecting software quality in the context of TDD.

In short, working on small, well-defined tasks in short, steady development cycles has a more positive impact on quality and productivity than the order of test implementation.
Deviations from the Test-First Mantra
Even if we consider the studies that focus on only the test-first nature of TDD, there’s still the problem of conformance to the TDD process. TDD isn’t a dichotomy in which you either religiously write tests first every time or always test after the fact. TDD is a continuous spectrum between these extremes, and developers tend to dynamically span this spectrum, adjusting the TDD process as needed. In industrial settings, time pressure, lack of discipline, and insufficient realization of TDD’s benefits have been reported to cause developers to deviate from the process.12
To gain more insight, in an ethnographically informed study, researchers monitored and documented the TDD development process more closely by means of artifacts including audio recordings and notes.16 They concluded that developers perceived implementation as the most important phase and didn’t strictly follow the TDD process. In particular, developers wrote more production code than necessary, often omitted refactoring, and didn’t keep test cases up to date in accordance with the progression of the production code. Even when the developers followed the test-first principle, they thought about how the production code (not necessarily the design) should look before they wrote the test for the next feature. In other words, perhaps we should simply name this phenomenon “code-driven testing”?
TDD’s internal and external dynamics are more complex than the order in which tests are written. There’s no convincing evidence that TDD consistently fares better than any other development method, at least those methods that are iterative. And enough evidence exists to question whether TDD fulfills its promises.
How do you decide whether and when to use TDD, then? And what about TDD’s secondary effects?

As always, context is the key, and any potential benefit of TDD is likely not due to whatever order of writing tests and code developers follow. It makes sense to have realistic expectations rather than worship or discard TDD. Focus on the rhythm of development; for example, tackle small tasks in short, steady development cycles, rather than bother with the test order. Also, keep in mind that some tasks are better (suited) than others with respect to “TDD-bility.”
This doesn’t mean you should avoid trying TDD or stop using it. For example, if you think that TDD offers you the self-discipline to write tests for each small functionality, following the test-first principle will certainly prevent you from taking shortcuts that skip tests. In this case, there’s value in Beck’s suggestion, “Never write a line of functional code without a broken test case.”2 However, you should primarily consider those tests’ quality (without obsessing over coverage),17 instead of fixating on whether you wrote them before the code. Although TDD does result in more tests,1,6 the lack of attention to testing quality,12 including maintainability and coevolution with production code,16 could be alarming.
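To illustrate what test quality beyond coverage can mean, consider this sketch (our own example, not one from the cited studies): both JUnit 5 tests below execute the same production lines and earn identical coverage, but only the second one can ever fail when the logic regresses:

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

class InvoiceTest {
    // Executes total() (full line coverage) but asserts nothing,
    // so it stays green no matter what total() returns.
    @Test
    void weakTest() {
        new Invoice(100.0, 0.2).total();
    }

    // Same coverage, but pins down the expected behavior;
    // a regression in the discount logic turns it red.
    @Test
    void meaningfulTest() {
        assertEquals(80.0, new Invoice(100.0, 0.2).total(), 1e-9);
    }
}

class Invoice {
    private final double amount, discount;

    Invoice(double amount, double discount) {
        this.amount = amount;
        this.discount = discount;
    }

    double total() {
        return amount * (1 - discount);
    }
}

A suite full of weakTest-style cases can report impressive coverage while offering none of the safety net that motivates TDD in the first place.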
As long as you’re aware of and comfortable with the potential tradeoff between productivity on the one hand and testability and quality on the other (perhaps paying off in the long term?), using TDD is fine. If you’re simply having fun and feeling good while performing TDD without any significant drawbacks, that’s also fine. After all, the evidence shows that happy developers are more productive and produce better code!18
Acknowledgments
Academy of Finland Project 278354 partly
supports this research.
References
1. F. Shull et al., “What Do We Know about Test-Driven Development?,” IEEE Software, vol. 27, no. 6, 2010, pp. 16–19.
2. K. Beck, Test-Driven Development: By Example, Addison-Wesley, 2003.
3. W. Bissi et al., “The Effects of Test Driven Development on Internal Quality, External Quality and Productivity: A Systematic Review,” Information and Software Technology, June 2016, pp. 45–54.
4. H. Munir, M. Moayyed, and K. Petersen, “Considering Rigor and Relevance When Evaluating Test Driven Development: A Systematic Review,” Information and Software Technology, vol. 56, no. 4, 2014, pp. 375–394.
5. Y. Rafique and V.B. Mišić, “The Effects of Test-Driven Development on External Quality and Productivity: A Meta-analysis,” IEEE Trans. Software Eng., vol. 39, no. 6, 2013, pp. 835–856; http://dx.doi.org/10.1109/TSE.2012.28.
6. B. Turhan et al., “How Effective Is Test-Driven Development?,” Making Software: What Really Works, and Why We Believe It, A. Oram and G. Wilson, eds., O’Reilly Media, 2010, pp. 207–219.
7. S. Kollanus, “Test-Driven Development—Still a Promising Approach?,” Proc. 7th Int’l Conf. Quality of Information and Communications Technology (QUATIC 10), 2010, pp. 403–408; http://dx.doi.org/10.1109/QUATIC.2010.73.
8. M. Siniaalto, “Test Driven Development: Empirical Body of Evidence,” tech. report, Information Technology for European Advancement, 3 Mar. 2006.
9. M. Pančur and M. Ciglarič, “Impact of Test-Driven Development on Productivity, Code and Tests: A Controlled Experiment,” Information and Software Technology, vol. 53, no. 6, 2011, pp. 557–573.
10. M. Beller et al., “When, How, and Why Developers (Do Not) Test in Their IDEs,” Proc. 10th Joint Meeting Foundations of Software Eng. (ESEC/FSE 15), 2015, pp. 179–190; http://doi.acm.org/10.1145/2786805.2786843.
11. N.C. Borle et al., “Analyzing the Effects of Test Driven Development in GitHub,” Empirical Software Eng., Nov. 2017.
12. A. Causevic, D. Sundmark, and S. Punnekkat, “Factors Limiting Industrial Adoption of Test Driven Development: A Systematic Review,” Proc. 4th IEEE Int’l Conf. Software Testing, Verification and Validation, 2011, pp. 337–346.
13. D. Fucci et al., “A Dissection of the Test-Driven Development Process: Does It Really Matter to Test-First or to Test-Last?,” IEEE Trans. Software Eng., vol. 43, no. 7, 2017, pp. 597–614.
14. M.M. Müller and A. Höfer, “The Effect of Experience on the Test-Driven Development Process,” Empirical Software Eng., vol. 12, no. 6, 2007, pp. 593–615; https://doi.org/10.1007/s10664-007-9048-2.
15. A. Cockburn, “Elephant Carpaccio,” blog; http://alistair.cockburn.us/Elephant+Carpaccio.
16. S. Romano et al., “Findings from a Multi-method Study on Test-Driven Development,” Information and Software Technology, Sept. 2017, pp. 64–77.
17. D. Bowes et al., “How Good Are My Tests?,” Proc. IEEE/ACM 8th Workshop Emerging Trends in Software Metrics (WETSoM 17), 2017, pp. 9–14.
18. D. Graziotin et al., “What Happens When Software Developers Are (Un)happy,” J. Systems and Software, June 2018, pp. 32–47.
ABOUT THE AUTHORS
ITIR KARAC is a project researcher in the M-Group research group and a doctoral student in the Department of Information Processing Science at the University of Oulu. Contact her at itir.karac@oulu.fi.
BURAK TURHAN is a senior lecturer in Brunel University’s Department of Computer Science and a professor of software engineering at the University of Oulu. Contact him at turhanb@computer.org.