Abstract

Test-driven development (TDD) involves more than just testing before coding. This article examines how (and whether) TDD has lived up to its promises.
0740-7459/18/$33.00 © 2018 IEEE JULY/AUGUST 2018 | IEEE SOFTWARE
REDIRECTIONS
Editor: Tim Menzies
North Carolina State University
tim@menzies.us
What Do We (Really) Know about Test-Driven Development?
Itir Karac and Burak Turhan
TEST-DRIVEN DEVELOPMENT (TDD) is one of the most controversial agile practices in terms of its impact on software quality and programmer productivity. After more than a decade's research, the jury is still out on its effectiveness. TDD promised it all: increased quality and productivity, along with an emerging, clean design supported by the safety net of a growing library of tests. What's more, the recipe sounded surprisingly simple: don't write code without a failing test.
Here, we revisit the evidence of the promises of TDD.1 But, before we go on, just pause and think of an answer to the following core question: What is TDD?
Let us guess: your response is most likely along the lines of, "TDD is a practice in which you write tests before code." This emphasis on its test-first dynamic, strongly implied by the name, is perhaps the root of most, if not all, of the controversy about TDD. Unfortunately, it's a common misconception to use "TDD" and "test-first" interchangeably. Test-first is only one part of TDD. There are many other cogs in the system that potentially make TDD tick.
How about working on small tasks, keeping the red–green–refactor cycles short and steady, writing only the code necessary to pass a failing test, and refactoring? What if we told you that some of these cogs contribute more toward fulfilling the promises of TDD than the order of test implementation does? (Hint: you should ask for evidence.)
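To make those cogs concrete, here is a minimal sketch of one red–green–refactor micro-cycle in Python. The apply_discount function and its pricing rule are our own hypothetical illustration, not taken from any of the studies cited here.

```python
# Red: state the next small behavior as a test first, and watch it
# fail before any production code exists.
def test_ten_percent_off_orders_over_100():
    assert apply_discount(200.0) == 180.0

def test_no_discount_at_or_below_100():
    assert apply_discount(100.0) == 100.0

# Green: write only the code necessary to make the failing tests pass.
def apply_discount(total):
    return total * 0.9 if total > 100 else total

# Refactor: with the suite green, rename and restructure freely,
# rerunning the tests after each small change to keep the rhythm steady.

if __name__ == "__main__":
    test_ten_percent_off_orders_over_100()
    test_no_discount_at_or_below_100()
```

The point of the cycle is the rhythm, not the ceremony: each pass through red, green, and refactor should be short enough that the growing test library stays in lockstep with the production code.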
15 Years of (Contradictory) Evidence
Back in 2003, when the software development paradigm started to change irrevocably (for the better?), Kent Beck posed a claim based on anecdotal evidence and paved the way for software engineering researchers:

No studies have categorically demonstrated the difference between TDD and any of the many alternatives in quality, productivity, or fun. However, the anecdotal evidence is overwhelming, and the secondary effects are unmistakable.2

Call for Submissions
Do you have a surprising result or industrial experience? Something that challenges decades of conventional thinking in software engineering? If so, email a one-paragraph synopsis to timm@ieee.org (use the subject line "REDIRECTIONS: Idea: your idea"). If that looks interesting, I'll ask you to submit a 1,000- to 2,400-word article (in which each figure or table counts as 250 words) for review for the Redirections department. Please note: heresies are more than welcome (if supported by well-reasoned industrial experiences, case studies, or other empirical results).—Tim Menzies
Since then, numerous studies (for example, experiments and case studies) have investigated TDD's effectiveness. These studies are periodically synthesized in secondary studies (see Table 1), only to reveal contradictory results across the primary studies. This research has also demonstrated no consistent overall benefit from TDD, particularly for overall productivity and within subgroups for quality.
Why the inconsistent results? Besides the reasons listed in Table 1, other likely reasons are that

• TDD has too many cogs,
• its effectiveness is highly influenced by the context (for example, the tasks at hand or the skills of individuals),
• the cogs highly interact with each other, and
• most studies have focused on only the test-first aspect.

Identifying the sources of the inconsistencies is important for designing further studies that control for those sources.
Table 1. Systematic literature reviews on test-driven development (TDD).

• Bissi et al.3: quality improved; productivity inconclusive. Inconsistent results for productivity: academic vs. industrial setting.
• Munir et al.4: quality improved or no difference; productivity degraded or no difference. Inconsistent results for both quality and productivity: low vs. high rigor, low vs. high relevance.
• Rafique and Mišić5: quality improved; productivity inconclusive. Inconsistent results for quality: waterfall vs. iterative test-last; for productivity: waterfall vs. iterative test-last, academic vs. industrial.
• Turhan et al.6 and Shull et al.1: quality improved; productivity inconclusive. Inconsistent results for quality: among controlled experiments, among studies with high rigor; for productivity: among pilot studies, controlled experiments vs. industrial case studies, among studies with high rigor.
• Kollanus7: quality improved; productivity degraded. Inconsistent results for quality: among academic studies, among semi-industrial studies.
• Siniaalto8: quality improved; productivity inconclusive. Inconsistent results for productivity: among academic studies, among semi-industrial studies.

Matjaž Pančur and Mojca Ciglarič speculated that the results of studies showing TDD's superiority over a test-last approach were due to the fact that most of the experiments employed a coarse-grained test-last process closer to the waterfall approach as a control group.9 This created a large differential in granularity between the treatments, and sometimes even a complete lack of tests in the control, resulting in unfair, misleading comparisons. In the end, TDD might perform better only when compared to a coarse-grained development process.
Industry Adoption (or Lack Thereof)
Discussions on TDD are common and usually heated. But how common is the use of TDD in practice? Not very; at least, that's what the evidence suggests.

For example, after monitoring the development activity of 416 developers over more than 24,000 hours, researchers reported that the developers followed TDD in only 12 percent of the projects that claimed to use it.10 We've observed similar patterns in our work with professional developers. Indeed, if it were possible to reanalyze all existing evidence considering this facet only, the shape of things might change significantly (for better or worse). We'll play the devil's advocate and ask: what if the anecdotal evidence from TDD enthusiasts is based on misconceived personal experience from non-TDD activities?
Similarly, a recent study analyzed a September 2015 snapshot of all the (Java) projects in GitHub.11 Using heuristics for identifying TDD-like repositories, the researchers found that only 0.8 percent of the projects adhered to the TDD protocol. Furthermore, comparing those projects to a control set, the study reported no difference between the two groups in terms of

• the commit velocity as a measure of productivity,
• the number of bug-fixing commits as an indicator of the number of defects, and
• the number of issues reported for the project as a predictor of quality.

Additionally, a comparison of the number of pull requests and the distribution of commits per author didn't indicate any effect on developer collaboration.
Adnan Causevic and his colleagues identified seven factors limiting TDD's use in the industry:12

• increased development time (productivity hits),
• insufficient TDD experience or knowledge,
• insufficient design,
• insufficient developer testing skills,
• insufficient adherence to the TDD protocol,
• domain- and tool-specific limitations, and
• legacy code.

It's not surprising that three of these factors are related to the developers' capacity to follow TDD and their rigor in following it.
What Really Makes TDD Tick?
A more refined look into TDD is concerned with not only the order in which production code and test code are written but also the average duration of development cycles, that duration's uniformity, and the refactoring effort. A recent study of 39 professionals reported that a steady rhythm of short development cycles was the primary reason for improved quality and productivity.13 Indeed, the effect of test-first completely diminished when the effects of short and steady cycles were considered. These findings are consistent with earlier research demonstrating that TDD experts had much shorter and less variable cycle lengths than novices did.14 The significance of short development cycles extends beyond TDD; Alistair Cockburn, in explaining the Elephant Carpaccio concept, states that "agile developers apply micro-, even nano-incremental development in their work."15
Another claim of Elephant Carpaccio, related to the TDD concept of working on small tasks, is that agile developers can deliver fast "not because we're so fast we can [develop] 100 times as fast as other people, but rather, we have trained ourselves to ask for end-user-visible functionality 100 times smaller than most other people."15 To test this, we conducted experiments in which we controlled for the framing of task descriptions (finer-grained user stories versus coarser-grained generic descriptions). We observed that the type of task description and the task itself are significant factors affecting software quality in the context of TDD.
In short, working on small, well-defined tasks in short, steady development cycles has a more positive impact on quality and productivity than the order of test implementation.
Deviations from the Test-First Mantra
Even if we consider only the studies that focus on the test-first nature of TDD, there's still the problem of conformance to the TDD process. TDD isn't a dichotomy in which you either religiously write tests first every time or always test after the fact. TDD is a continuous spectrum between these extremes, and developers tend to dynamically span this spectrum, adjusting the TDD process as needed. In industrial settings, time pressure, lack of discipline, and insufficient realization of TDD's benefits have been reported to cause developers to deviate from the process.12
To gain more insight, researchers in an ethnographically informed study monitored and documented the TDD development process more closely by means of artifacts including audio recordings and notes.16 They concluded that developers perceived implementation as the most important phase and didn't strictly follow the TDD process. In particular, developers wrote more production code than necessary, often omitted refactoring, and didn't keep test cases up to date with the progression of the production code. Even when the developers followed the test-first principle, they thought about how the production code (not necessarily the design) should look before they wrote the test for the next feature. In other words, perhaps we should simply name this phenomenon "code-driven testing"?
TDD’s internal and external
dynamics are more complex
than the order in which tests
are written. There’s no convincing
evidence that TDD consistently fares
better than any other development
method, at least those methods that
are iterative. And enough evidence ex-
ists to question whether TDD fulls
its promises.
How do you decide whether and when to use TDD, then? And what about TDD's secondary effects?

As always, context is the key, and any potential benefit of TDD is likely not due to whatever order of writing tests and code developers follow. It makes sense to have realistic expectations rather than worship or discard TDD. Focus on the rhythm of development; for example, tackle small tasks in short, steady development cycles, rather than bother with the test order. Also, keep in mind that some tasks are better (suited) than others with respect to "TDD-bility."
This doesn’t mean you should
avoid trying TDD or stop using it.
For example, if you think that TDD
offers you the self-discipline to write
tests for each small functionality,
following the test-rst principle will
certainly prevent you from taking
shortcuts that skip tests. In this case,
there’s value in Beck’s suggestion,
“Never write a line of functional code
without a broken test case.”2 How-
ever, you should primarily consider
those tests’ quality (without obsessing
over coverage),17 instead of xating
on whether you wrote them before
the code. Although TDD does result
in more tests,1,6 the lack of attention
to testing quality,12 including main-
tainability and coevolution with pro-
duction code,16 could be alarming.
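To illustrate why coverage alone says little about test quality, compare two tests for a small function. The word_count example is our own hypothetical sketch, not drawn from the cited studies.

```python
def word_count(text):
    # Count whitespace-separated words; str.split() with no
    # arguments returns [] for an empty string.
    return len(text.split())

# Low-value test: it executes every line (full coverage) yet asserts
# nothing, so almost any buggy rewrite would still "pass."
def test_runs_without_crashing():
    word_count("two words")

# Higher-value test: it pins down the expected behavior, including
# edge cases, so regressions are actually caught.
def test_counts_words():
    assert word_count("two words") == 2
    assert word_count("  spaced   out  ") == 2
    assert word_count("") == 0
```

Both tests produce identical coverage figures for word_count, which is precisely why counting tests, or coverage, without examining the assertions can be misleading.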
As long as you're aware of and comfortable with the potential tradeoff between productivity and testability and quality (perhaps paying off in the long term?), using TDD is fine. If you're simply having fun and feeling good while performing TDD without any significant drawbacks, that's also fine. After all, the evidence shows that happy developers are more productive and produce better code!18
Acknowledgments
Academy of Finland Project 278354 partly supports this research.
References
1. F. Shull et al., "What Do We Know about Test-Driven Development?," IEEE Software, vol. 27, no. 6, 2010, pp. 16–19.
2. K. Beck, Test-Driven Development: By Example, Addison-Wesley, 2003.
3. W. Bissi et al., "The Effects of Test Driven Development on Internal Quality, External Quality and Productivity: A Systematic Review," Information and Software Technology, June 2016, pp. 45–54.
4. H. Munir, M. Moayyed, and K. Petersen, "Considering Rigor and Relevance When Evaluating Test Driven Development: A Systematic Review," Information and Software Technology, vol. 56, no. 4, 2014, pp. 375–394.
5. Y. Rafique and V.B. Mišić, "The Effects of Test-Driven Development on External Quality and Productivity: A Meta-analysis," IEEE Trans. Software Eng., vol. 39, no. 6, 2013, pp. 835–856; http://dx.doi.org/10.1109/TSE.2012.28.
6. B. Turhan et al., "How Effective Is Test-Driven Development?," Making Software: What Really Works, and Why We Believe It, A. Oram and G. Wilson, eds., O'Reilly Media, 2010, pp. 207–219.
7. S. Kollanus, "Test-Driven Development—Still a Promising Approach?," Proc. 7th Int'l Conf. Quality of Information and Communications Technology (QUATIC 10), 2010, pp. 403–408; http://dx.doi.org/10.1109/QUATIC.2010.73.
8. M. Siniaalto, "Test Driven Development: Empirical Body of Evidence," tech. report, Information Technology for European Advancement, 3 Mar. 2006.
9. M. Pančur and M. Ciglarič, "Impact of Test-Driven Development on Productivity, Code and Tests: A Controlled Experiment," Information and Software Technology, vol. 53, no. 6, 2011, pp. 557–573.
10. M. Beller et al., "When, How, and Why Developers (Do Not) Test in Their IDEs," Proc. 10th Joint Meeting Foundations of Software Eng. (ESEC/FSE 15), 2015, pp. 179–190; http://doi.acm.org/10.1145/2786805.2786843.
11. N.C. Borle et al., "Analyzing the Effects of Test Driven Development in GitHub," Empirical Software Eng., Nov. 2017.
12. A. Causevic, D. Sundmark, and S. Punnekkat, "Factors Limiting Industrial Adoption of Test Driven Development: A Systematic Review," Proc. 4th IEEE Int'l Conf. Software Testing, Verification and Validation, 2011, pp. 337–346.
13. D. Fucci et al., "A Dissection of the Test-Driven Development Process: Does It Really Matter to Test-First or to Test-Last?," IEEE Trans. Software Eng., vol. 43, no. 7, 2017, pp. 597–614.
14. M.M. Müller and A. Höfer, "The Effect of Experience on the Test-Driven Development Process," Empirical Software Eng., vol. 12, no. 6, 2007, pp. 593–615; https://doi.org/10.1007/s10664-007-9048-2.
15. A. Cockburn, "Elephant Carpaccio," blog; http://alistair.cockburn.us/Elephant+Carpaccio.
16. S. Romano et al., "Findings from a Multi-method Study on Test-Driven Development," Information and Software Technology, Sept. 2017, pp. 64–77.
17. D. Bowes et al., "How Good Are My Tests?," Proc. IEEE/ACM 8th Workshop Emerging Trends in Software Metrics (WETSoM 17), 2017, pp. 9–14.
18. D. Graziotin et al., "What Happens When Software Developers Are (Un)happy," J. Systems and Software, June 2018, pp. 32–47.
ABOUT THE AUTHORS

ITIR KARAC is a project researcher in the M-Group research group and a doctoral student in the Department of Information Processing Science at the University of Oulu. Contact her at itir.karac@oulu.fi.

BURAK TURHAN is a senior lecturer in Brunel University's Department of Computer Science and a professor of software engineering at the University of Oulu. Contact him at turhanb@computer.org.
... For more information, see https://creativecommons.org/licenses/by/4.0/ developers [10], [11]. As control treatment, we use Incremental Test-Last Development (ITLD), where the code is done incrementally but test are written after the production code has been implemented. ...
... In TDD experiments, process conformance has been highlighted in secondary studies as a confounding factor causing inconsistent results among empirical studies [10], [11], [22], [23]. Moreover, Ghafari et al. [10] remark that there is no commonly shared definition of TDD, which may account, in part, for the variability among the developers when they adopt and follow the TDD process. ...
... The hypothesis This article has been accepted for publication in IEEE Transactions on Software Engineering. This is the author's version which has not been fully edited and (11,12) TDD was tested by applying the partially overlapping samples t-test [38], t(12) = −2.15, p = .05 ...
Article
Full-text available
bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Context :In software engineering (SE) experiments, the way in which a treatment is applied could affect results. Different interpretations of how to apply the treatment and decisions on treatment adherence could lead to different results when data are analysed. Objective : This paper aims to study whether treatment adherence has an impact on the results of an SE experiment. Method : The experiment used as test case for our research uses Test-Driven Development (TDD) and Incremental Test-Last Development, (ITLD) as treatments. We reported elsewhere the design and results of such an experiment where 24 participants were recruited from industry. Here, we compare experiment results depending on the use of data from adherent participants or data from all the participants irrespective of their adherence to treatments. Results : Only 40% of the participants adhere to both TDD protocol and to the ITLD protocol; 27% never followed TDD; 20% used TDD even in the control group; 13% are defiers (used TDD in ITLD session but not in TDD session). Considering that both TDD and ITLD are less complex than other SE methods, we can hypothesize that more complex SE techniques could get even lower adherence to the treatment. Conclusion : Both TDD and ITLD are applied differently across participants. Training participants could not be enough to ensure a medium to large adherence of experiment participants. Adherence to treatments impacts results and should not be taken for granted in SE experiments.
Chapter
Full-text available
In many agile software development projects, documentation is often missing, outdated, or written with only a technical perspective. Existing literature recognizes the importance of documentation quality, especially when it comes to its readability for diverse audiences. While recent advances in Large Language Models (LLMs) offer the potential to tackle these issues, the use of LLMs for software documentation remains unexplored. This paper investigates the use of ChatGPT to improve and adapt documentation to specific audiences. We apply ChatGPT-4 for alternative documentation production and measure the resulting text characteristics and readability. Twenty-five experts from management and development rate these different versions. Results show the suitability of ChatGPT for generating high-quality text for both audiences, with managers benefiting more from an adapted version.
Chapter
Full-text available
Test Driven Development (TDD) is one of the major practices of Extreme Programming for which incremental testing and refactoring trigger the code development. TDD has limited adoption in the industry, as it requires more code to be developed and experienced developers. Generative AI (GenAI) may reduce the extra effort imposed by TDD. In this work, we introduce an approach to automatize TDD by embracing GenAI either in a collaborative interaction pattern in which developers create tests and supervise the AI generation during each iteration or a fully-automated pattern in which developers only supervise the AI generation at the end of the iterations. We run an exploratory experiment with ChatGPT in which the interaction patterns are compared with the non-AI TDD regarding test and code quality and development speed. Overall, we found that, for our experiment and settings, GenAI can be efficiently used in TDD, but it requires supervision of the quality of the produced code. In some cases, it can even mislead non-expert developers and propose solutions just for the sake of the query.
Chapter
Full-text available
Agile and technical debt management should have a symbiotic relationship, as technical debt was conceived as a metaphor (or tool) to balance the benefits of taking shortcuts for early release and user feedback with the responsibility of ‘repairing’ the effects of these trade-offs. Agile processes provide the necessary flexibility to achieve this balance. However, in reality, feature greed often takes over, making it difficult for development teams to ensure that technical debt is repaid. This paper discusses experiences and best practices to address Technical Debt in an Agile context.
Chapter
Full-text available
Self-organizing teams are a common way of organizing teamwork in sectors related to modern technologies, especially in programming teams. Agile methods often promote and advocate such teams. One of the problems in this form of team organization is the issue of leadership, and particularly the relationship between vertical leadership - one person, and horizontal leadership - team members. In the literature on the subject, we can find traces of many concepts in such a broad area as the issue of leadership. However, several selected concepts allow us to capture an emerging feature in self-organizing teams which is the taking over of leadership functions by team members. Shared leadership, where the leadership function comes from team members, not from one appointed leader. Distributed leadership is where leadership in the organization is taken over voluntarily by individuals. Balanced leadership, where the vertical leader enables team members to take over leadership functions depending on the situation. The selected concepts presented here allow for a better understanding and research of the nature and phenomenon of leadership in self-organizing teams.
Chapter
Full-text available
Background and Related Work : Software startups face unique challenges in product development, including limited resources, the need for rapid innovation, and the constant pressure to adapt to market changes. Generative Artificial Intelligence (GenAI) has recently gained significant attention, offering capabilities to assist creative processes, generate content, and enhance decision-making through data analysis. However, how GenAI can be integrated into agile product development processes in software startups remains an open question. Objective : This study aims to identify potential use cases for GenAI in software startups and explore how GenAI can support innovation, overcome development challenges, and integrate with agile practices to improve product quality and development speed. Method : We identified a list of GenAI use cases from existing systematic literature reviews and mapped them to engineering process areas in software startups. Following that, we conducted workshops with experts to validate our results. Results : The results provide a descriptive overview of GenAI’s potential applications in software startup environments. Given the current state of the art, we identified areas that could benefit faster from integrating GenAI. Conclusions : The study delineates the prospective impact of GenAI on agile product development in software startups, showcasing areas of immediate applicability.
Chapter
Full-text available
This paper explores the emergence of agile-inspired approaches in the critical infrastructure sector, with a focus on the current digital transformation of the Norwegian Oil & Gas industry. It addresses how traditional plan-driven development and strict architectural principles are challenged by the need to exploit the growing volume of operational data, in search for better, faster, and safer operations. We emphasize the increasing reliance on data for optimizing operations and the inherent risks and culture clashes between Information Technology (IT) and Operational Technology (OT). We furthermore discuss the role of cybersecurity in this transition, illustrating how increased connectivity and agile-like approaches can both mitigate and exacerbate security vulnerabilities.
Chapter
Full-text available
User stories are the main vehicle to describe user needs in Agile projects and Agile project developments. But being this concept universally agreed, we may find that not all work increments have a clear user-centric view. In this paper, we focus on the distinction between user-centric “user stories” and other type of simple narratives, which may be simply called “stories”, which can be at the same level of abstraction. We propose a conceptual model in the form of UML diagram, and associated definitions, to clarify this distinction. The model also makes clearer the distinction among (user) story and (user) story template, which is not always kept clear.
Chapter
Full-text available
Leadership has been considered from every angle (almost) and the efforts are going strong. New ideas, books and trends, fads are popping out frequently. It’s not a secret that leadership in Agile is a fundament, started looking from attitude and roles up to practice at every level of the organization. Leadership in Agile Teams is still under dispute. It is time to embrace shared leadership, a very helpful concept in describing an emergent team phenomenon whereby leadership roles and influence are distributed among team members. This approach has surprising support in studies about team performance, well-established history, and even anecdotal evidence from practitioners. This very short paper presents results from the initial research of the author using SNA method.
Chapter
Full-text available
Software practitioners have adopted many new ways of working over the past 25 years. Change has been driven by a diverse and global community of users, practitioners, researchers, and vernacular programmers. What have we learned over the past 25 years? What skills will software researchers and practitioners need in the future? Will AI or other emerging technologies offer opportunities for greater achievements, or will they become an obstacle to the human touch needed to develop software products? This paper reports on a combined workshop and panel organized and facilitated by Steven Fraser (Innoxec) together with Dennis Mancl (MSWX Software Experts) and Werner Wild (Evolution Consulting). The workshop and panel were part of the 25th Anniversary Track at the XP 2024 conference held in Bolzano, Italy.
Article
Full-text available
Testing is an integral part of the software development lifecycle, approached with varying degrees of rigor by different process models. Agile process models recommend Test Driven Development (TDD) as a key practice for reducing costs and improving code quality. The objective of this work is to perform a cost-benefit analysis of this practice. To that end, we have conducted a comparative analysis of GitHub repositories that adopts TDD to a lesser or greater extent, in order to determine how TDD affects software development productivity and software quality. We classified GitHub repositories archived in 2015 in terms of how rigorously they practiced TDD, thus creating a TDD spectrum. We then matched and compared various subsets of these repositories on this TDD spectrum with control sets of equal size. The control sets were samples from all GitHub repositories that matched certain characteristics, and that contained at least one test file. We compared how the TDD sets differed from the control sets on the following characteristics: number of test files, average commit velocity, number of bug-referencing commits, number of issues recorded, usage of continuous integration, number of pull requests, and distribution of commits per author. We found that Java TDD projects were relatively rare. In addition, there were very few significant differences in any of the metrics we used to compare TDD-like and non-TDD projects; therefore, our results do not reveal any observable benefits from using TDD.
Article
Full-text available
The growing literature on affect among software developers mostly reports on the linkage between happiness, software quality, and developer productivity. Understanding happiness and unhappiness in all its components -- positive and negative emotions and moods -- is an attractive and important endeavour. Scholars in industrial and organizational psychology have suggested that understanding happiness and unhappiness could lead to cost-effective ways of enhancing working conditions, job performance, and to limiting the occurrence of psychological disorders. Our comprehension of the consequences of (un)happiness among developers is still too shallow, being mainly expressed in terms of development productivity and software quality. In this paper, we study what happens when developers are happy and not happy. Qualitative data analysis of responses given by 317 questionnaire participants identified 42 consequences of unhappiness and 32 of happiness. We found consequences of happiness and unhappiness that are beneficial and detrimental for developers' mental well-being, the software development process, and the produced artefacts. Our classification scheme, available as open data enables new happiness research opportunities of cause-effect type, and it can act as a guideline for practitioners for identifying damaging effects of unhappiness and for fostering happiness on the job.
Article
Full-text available
Context: Test Driven Development (TDD) is an agile practice that has gained popularity when it was defined as a fundamental part in eXtreme Programming (XP). Objective: This study analyzed the conclusions of previously published articles on the effects of TDD on internal and external software quality and productivity, comparing TDD with Test Last Development (TLD). Method: In this study, a systematic literature review has been conducted considering articles published between 1999 and 2014. Results: In about 57% of the analyzed studies, the results were validated through experiments and in 32% of them, validation was performed through a case study. The results of this analysis show that 76% of the studies have identified a significant increase in internal software quality while 88% of the studies identified a meaningful increase in external software quality. There was an increase in productivity in the academic environment, while in the industrial scenario there was a decrease in productivity. Overall, about 44% of the studies indicated lower productivity when using TDD compared to TLD. Conclusion: According to our findings, TDD yields more benefits than TLD for internal and external software quality, but it results in lower developer productivity than TLD.
Conference Paper
Full-text available
The research community in Software Engineering and Software Testing in particular builds many of its contributions on a set of mutually shared expectations. Despite the fact that they form the basis of many publications as well as open-source and commercial testing applications, these common expectations and beliefs are rarely ever questioned. For example, Frederic Brooks' statement that testing takes half of the development time seems to have manifested itself within the community since he first made it in the "Mythical Man Month" in 1975. With this paper, we report on the surprising results of a large-scale field study with 416 software engineers whose development activity we closely monitored over the course of five months, resulting in over 13 years of recorded work time in their integrated development environments (IDEs). Our findings question several commonly shared assumptions and beliefs about testing and might be contributing factors to the observed bug proneness of software in practice: the majority of developers in our study does not test; developers rarely run their tests in the IDE; Test-Driven Development (TDD) is not widely practiced; and, last but not least, software developers only spend a quarter of their work time engineering tests, whereas they think they test half of their time.
Article
This paper provides a systematic meta-analysis of 27 studies that investigate the impact of Test-Driven Development (TDD) on external code quality and productivity. The results indicate that, in general, TDD has a small positive effect on quality but little to no discernible effect on productivity. However, subgroup analysis has found both the quality improvement and the productivity drop to be much larger in industrial studies in comparison with academic studies. A larger drop of productivity was found in studies where the difference in test effort between the TDD and the control group's process was significant. A larger improvement in quality was also found in the academic studies when the difference in test effort is substantial; however, no conclusion could be derived regarding the industrial studies due to the lack of data. Finally, the influence of developer experience and task size as moderator variables was investigated, and a statistically significant positive correlation was found between task size and the magnitude of the improvement in quality.
Article
Context: Test-driven development (TDD) has been extensively researched and compared to traditional approaches (test-last development, TLD). Existing literature reviews show varying results for TDD. Objective: This study investigates how the conclusions of existing literature reviews change when taking two study quality dimensions into account, namely rigor and relevance. Method: A systematic literature review was conducted, and the results of the identified primary studies were analyzed with respect to rigor and relevance scores using the assessment rubric proposed by Ivarsson and Gorschek (2011). Rigor and relevance are rated on a scale, which is explained in this paper. Four categories of studies were defined based on high/low rigor and relevance. Results: We found that studies in the four categories come to different conclusions. In particular, studies with high rigor and relevance scores show clear results for improvement in external quality, which seems to come with a loss of productivity. At the same time, high-rigor, high-relevance studies investigate only a small set of variables. Other categories contain many studies showing no difference, hence biasing the results negatively for the overall set of primary studies. Given this classification, differences from previous literature reviews could be highlighted. Conclusion: Strong indications are obtained that external quality is positively influenced, which has to be further substantiated by industry experiments and longitudinal case studies. Future studies in the high-rigor, high-relevance category would contribute greatly by focusing on a wider set of outcome variables (e.g., internal code quality). We also conclude that considering rigor and relevance in TDD evaluation is important, given the differences in results between categories and in comparison to previous reviews.
Conference Paper
Background: Writing unit tests is one of the primary activities in test-driven development. Yet, the existing reviews report little evidence supporting or refuting the effect of this development approach on test case quality. Developers' lack of ability and skill to produce sufficiently good test cases is also reported as a limitation of applying test-driven development in industrial practice. Objective: We investigate the impact of test-driven development on the effectiveness of unit test cases compared to incremental test-last development in an industrial context. Method: We conducted an experiment in an industrial setting with 24 professionals. Professionals followed the two development approaches to implement the tasks. We measure unit test effectiveness in terms of mutation score. We also measure branch and method coverage of test suites to compare our results with the literature. Results: In terms of mutation score, we found that the test cases written for a test-driven development task have a higher defect detection ability than test cases written for an incremental test-last development task. Subjects wrote test cases that cover more branches on the test-driven development task compared to the other task. However, test cases written for the incremental test-last development task cover more methods than those written for the test-driven development task. Conclusion: Our findings differ from previous studies conducted in academic settings. Professionals were able to perform more effective unit testing with test-driven development. Furthermore, we observe that the coverage measures preferred in academic studies reveal different aspects of a development approach. Our results need to be validated in larger industrial contexts.
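The mutation score used in the study above can be illustrated with a minimal sketch. All names and the hand-seeded faults below are illustrative, not taken from the study; real mutation tools (e.g., PIT for Java or mutmut for Python) generate mutants automatically. A suite's mutation score is the fraction of mutants "killed", i.e., detected by at least one failing test.

```python
# Minimal sketch of mutation score as a measure of test effectiveness.
# Illustrative example only; names and mutants are hand-made.

def price_with_discount(price, rate):
    """Original production code under test."""
    return price * (1 - rate)

# Each mutant seeds one small fault into the original.
mutants = [
    lambda price, rate: price * (1 + rate),         # '-' mutated to '+'
    lambda price, rate: price / (1 - rate),         # '*' mutated to '/'
    lambda price, rate: price * (1 - rate) + rate,  # stray '+ rate' appended
]

def mutation_score(tests, mutants):
    """Fraction of mutants detected (killed) by at least one failing test."""
    killed = sum(1 for m in mutants if any(not test(m) for test in tests))
    return killed / len(mutants)

# A weak suite: rate=0 masks every fault above, so no mutant is killed.
weak_suite = [lambda f: f(100, 0.0) == 100.0]
# A stronger suite adds a non-trivial rate and kills all three mutants.
strong_suite = weak_suite + [lambda f: f(100, 0.25) == 75.0]

print(mutation_score(weak_suite, mutants))    # 0.0
print(mutation_score(strong_suite, mutants))  # 1.0
```

Note that both suites reach full coverage of the one-line function; only the mutation score exposes how much weaker the first suite is, which is why the study prefers it over branch or method coverage alone.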
Article
Context: Test-driven development (TDD) is an iterative software development practice where unit tests are defined before production code. A number of quantitative empirical investigations have been conducted about this practice. The results are conflicting and inconclusive. In addition, previous studies fail to analyze the values, beliefs, and assumptions that inform and shape TDD. Objective: We present a study designed and conducted to understand the values, beliefs, and assumptions about TDD. Participants were novice and professional software developers. Method: We conducted an ethnographically informed study with 14 novice software developers, i.e., graduate students in Computer Science at the University of Basilicata, and six professional software developers (with one to 10 years of work experience). The participants worked on the implementation of a new feature for existing software written in Java. We immersed ourselves in the context of our study. We collected qualitative information by means of audio recordings, contemporaneous field notes, and other kinds of artifacts. We collected quantitative data from the integrated development environment to support or refute the ethnography results. Results: The main insights of our study can be summarized as follows: (i) refactoring (one of the phases of TDD) is not performed as often as the process requires and is considered less important than other phases, (ii) the most important phase is implementation, (iii) unit tests are almost never up to date, and (iv) participants first build in their mind a sort of model of the source code to be implemented and only then write test cases. The analysis of the quantitative data supported the following qualitative findings: (i), (iii), and (iv). Conclusions: Developers write quick-and-dirty production code to pass the tests, do not update their tests often, and ignore refactoring.
Article
Background: Test-driven development (TDD) is a technique that repeats short coding cycles interleaved with testing. The developer first writes a unit test for the desired functionality, followed by the necessary production code, and then refactors the code. Many empirical studies neglect unique process characteristics related to TDD's iterative nature. Aim: We formulate four process characteristics: sequencing, granularity, uniformity, and refactoring effort. We investigate how these characteristics impact quality and productivity in TDD and related variations. Method: We analyzed 82 data points collected from 39 professionals, each capturing the process used while performing a specific development task. We built regression models to assess the impact of process characteristics on quality and productivity. Quality was measured by functional correctness. Results: Quality and productivity improvements were primarily positively associated with granularity and uniformity. Sequencing, the order in which test and production code are written, had no important influence. Refactoring effort was negatively associated with both outcomes. We explain the unexpected negative correlation with quality by the possible prevalence of mixed refactoring. Conclusion: The claimed benefits of TDD may be due not to its distinctive test-first dynamic, but rather to the fact that TDD-like processes encourage fine-grained, steady steps that improve focus and flow.
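The red-green-refactor cycle described in the abstract above can be sketched in a single, fine-grained iteration. The example is a generic illustration of the cycle, not code from any of the studies; the function name and tests are invented for the sketch.

```python
# One red-green-refactor TDD iteration (illustrative names throughout).
import unittest

# RED: the unit test is written first. At this point slugify does not
# exist yet, so running the suite fails -- that failing test drives
# what production code gets written next.
class TestSlugify(unittest.TestCase):
    def test_lowercases_and_hyphenates(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_collapses_extra_whitespace(self):
        self.assertEqual(slugify("  Agile  TDD "), "agile-tdd")

# GREEN: write only enough production code to make the tests pass.
# REFACTOR: with passing tests as a safety net, restructure without
# changing behavior -- here, naming the separator and splitting the
# steps for readability. The tests are rerun after the change.
SEPARATOR = "-"

def slugify(title):
    words = title.lower().split()
    return SEPARATOR.join(words)
```

Run with `python -m unittest` after each step. In the terms of the study above, granularity and uniformity mean keeping every such cycle this small and steady, whichever order the test and production code are written in.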