Test-driven development (TDD) involves more than just testing before coding. This article examines how (and whether) TDD has lived up to its promises.
0740-7459/18/$33.00 © 2018 IEEE JULY/AUGUST 2018 | IEEE SOFTWARE
Editor: Tim Menzies, North Carolina State University
What Do We (Really) Know about Test-Driven Development?
Itir Karac and Burak Turhan
Test-driven development (TDD) is one of the most controversial agile practices in terms of its impact on software quality and programmer productivity. After more than a decade’s research, the jury is still out on its effectiveness. TDD promised all: increased quality and productivity, along with an emerging, clean design supported by the safety net of a growing library of tests. What’s more, the recipe sounded surprisingly simple: Don’t write code without a failing test.
Here, we revisit the evidence of the promises of TDD [1]. But, before we go on, just pause and think of an answer to the following core question: What is TDD?
Let us guess: your response is most likely along the lines of, “TDD is a practice in which you write tests before code.” This emphasis on its test-first dynamic, strongly implied by the name, is perhaps the root of most, if not all, of the controversy about TDD. Unfortunately, it’s a common misconception to use “TDD” and “test-first” interchangeably. Test-first is only one part of TDD. There are many other cogs in the system that potentially make TDD tick.
How about working on small tasks, keeping the red–green–refactor cycles short and steady, writing only the code necessary to pass a failing test, and refactoring? What if we told you that some of these cogs contribute more toward fulfilling the promises of TDD than the order of test implementation? (Hint: you should ask for evidence.)
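
To make these cogs concrete, here is a minimal sketch of one red–green–refactor cycle on a deliberately small task. The task and the names `leap_year` and `LeapYearTest` are hypothetical, chosen only for illustration; they do not come from any study discussed in this article.

```python
import unittest


# Red: these tests are written first and fail while leap_year is missing or wrong.
class LeapYearTest(unittest.TestCase):
    def test_year_divisible_by_4_is_leap(self):
        self.assertTrue(leap_year(2024))

    def test_century_not_divisible_by_400_is_not_leap(self):
        self.assertFalse(leap_year(1900))

    def test_century_divisible_by_400_is_leap(self):
        self.assertTrue(leap_year(2000))


# Green, then refactor: only as much code as the tests above demand,
# tidied into a single expression once all three tests pass.
def leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
```

Saved as a module, the tests can be run with `python -m unittest`. Each test stands for one short cycle: add one failing test, make it pass with the least code, then clean up before moving on.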
15 Years of (Contradictory) Evidence
Back in 2003, when the software development paradigm started to change irrevocably (for the better?), Kent Beck posed a claim based on anecdotal evidence and paved the way for software engineering researchers to evaluate it:
“No studies have categorically demonstrated the difference between TDD and any of the many alternatives in quality, productivity, or fun. However, the anecdotal evidence is overwhelming, and the secondary effects are unmistakable.” [2]

Call for Submissions
Do you have a surprising result or industrial experience? Something that challenges decades of conventional thinking in software engineering? If so, email a one-paragraph synopsis to (use the subject line “REDIRECTIONS: Idea: your idea”). If that looks interesting, I’ll ask you to submit a 1,000- to 2,400-word article (in which each figure or table counts as 250 words) for review for the Redirections department. Please note: heresies are more than welcome (if supported by well-reasoned industrial experiences, case studies, or other empirical
Tim Menzies

Since then, numerous studies (for example, experiments and case studies) have investigated TDD’s effectiveness. These studies are periodically synthesized in secondary studies (see Table 1), only to reveal contradictory results across the primary studies. This research has also demonstrated no consistent overall benefit from TDD, particularly for overall productivity and within subgroups for quality.
Why the inconsistent results? Besides the reasons listed in Table 1, other likely reasons are that
• TDD has too many cogs,
• its effectiveness is highly influenced by the context (for example, the tasks at hand or skills of individuals),
• the cogs highly interact with each other, and
• most studies have focused on only the test-first aspect.
Identifying the inconsistencies’ sources is important for designing further studies that control for those sources.
Matjaž Pančur and Mojca Ciglarič speculated that the results of studies showing TDD’s superiority over a test-last approach were due to the fact that most of the experiments employed a coarse-grained test-last process closer to the waterfall approach as a control group [9]. This created a large differential in granularity between the treatments, and sometimes even a complete lack of tests in the control, resulting in unfair, misleading comparisons. In the end, TDD might perform better only when compared to a coarse-grained development process.

Table 1. Systematic literature reviews on test-driven development (TDD).
Study | Overall conclusion for quality with TDD | Overall conclusion for productivity with TDD | Inconsistent results in the study
Bissi et al. [3] | Improvement | Inconclusive | Productivity: Academic vs. industrial setting
Munir et al. [4] | Improvement or no difference | Degradation or no difference | Quality: Low vs. high rigor; Low vs. high relevance; Low vs. high rigor; Low vs. high relevance
Rafique and Mišić [5] | Improvement | Inconclusive | Quality: Waterfall vs. iterative test-last; Waterfall vs. iterative test-last; Academic vs. industrial
Turhan et al. [6] and Shull et al. [1] | Improvement | Inconclusive | Quality: Among controlled experiments; Among studies with high rigor; Among pilot studies; Controlled experiments vs. industrial case studies; Among studies with high rigor
Kollanus [7] | Improvement | Degradation | Quality: Among academic studies; Among semi-industrial studies
Siniaalto [8] | Improvement | Inconclusive | Productivity: Among academic studies; Among semi-industrial studies

Industry Adoption (or Lack Thereof)
Discussions on TDD are common and usually heated. But how common is the use of TDD in practice? Not very; at least, that’s what the evidence suggests.
For example, after monitoring the development activity of 416 developers over more than 24,000 hours, researchers reported that the developers followed TDD in only 12 percent of the projects that claimed to use it [10]. We’ve observed similar patterns in our work with professional developers. Indeed, if it were possible to reanalyze all existing evidence considering this facet only, the shape of things might change significantly (for better or worse). We’ll be the devil’s advocate and ask, what if the anecdotal evidence from TDD enthusiasts is based on misconceived personal experience from non-TDD activities?
Similarly, a recent study analyzed a September 2015 snapshot of all the (Java) projects in GitHub [11]. Using heuristics for identifying TDD-like repositories, the researchers found that only 0.8 percent of the projects adhered to TDD protocol. Furthermore, comparing those projects to a control set, the study reported no difference between the two groups in terms of
• the commit velocity as a measure of productivity,
• the number of bug-fixing commits as an indicator of the number of defects, and
• the number of issues reported for the project as a predictor of quality.
Additionally, a comparison of the number of pull requests and the distribution of commits per author didn’t indicate any effect on developer collaboration.
Adnan Causevic and his colleagues identified seven factors limiting TDD’s use in the industry [12]:
• increased development time (productivity hits),
• insufficient TDD experience or knowledge,
• insufficient design,
• insufficient developer testing skills,
• insufficient adherence to TDD protocol,
• domain- and tool-specific limitations, and
• legacy code.
It’s not surprising that three of these
factors are related to the developers’
capacity to follow TDD and their
rigor in following it.
What Really Makes TDD Tick?
A more refined look into TDD is concerned with not only the order in which production code and test code are written but also the average duration of development cycles, that duration’s uniformity, and the refactoring effort. A recent study of 39 professionals reported that a steady rhythm of short development cycles was the primary reason for improved quality and productivity [13]. Indeed, the effect of test-first completely diminished when the effects of short and steady cycles were considered. These findings are consistent with earlier research demonstrating that TDD experts had much shorter and less variable cycle lengths than novices did [14]. The significance of short development cycles extends beyond TDD; Alistair Cockburn, in explaining the Elephant Carpaccio concept, states that “agile developers apply micro-, even nano-incremental development in their work” [15].
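
The two rhythm properties named above, average cycle duration and its uniformity, can be estimated from a simple log of cycle lengths. The sketch below is one possible way to do so and is not the measurement used in the cited studies: the function name `cycle_rhythm`, the sample durations, and the choice of the coefficient of variation as the uniformity measure are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def cycle_rhythm(durations_min: list[float]) -> tuple[float, float]:
    """Return (mean duration, coefficient of variation) for cycle durations in minutes."""
    if not durations_min:
        raise ValueError("need at least one cycle duration")
    avg = mean(durations_min)
    # Coefficient of variation: spread relative to the mean; lower means steadier.
    cv = pstdev(durations_min) / avg if avg > 0 else 0.0
    return avg, cv


# Hypothetical logs: the same number of cycles, very different rhythm.
steady = [5.0, 6.0, 5.0, 4.0]
erratic = [2.0, 30.0, 5.0, 43.0]

print(cycle_rhythm(steady))
print(cycle_rhythm(erratic))
```

A lower mean corresponds to shorter cycles, and a lower coefficient of variation to a steadier rhythm. Reporting both keeps a few very long cycles from hiding behind an acceptable average.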
Another claim of Elephant Carpaccio, related to the TDD concept of working on small tasks, is that agile developers can deliver fast “not because we’re so fast we can [develop] 100 times as fast as other people, but rather, we have trained ourselves to ask for end-user-visible functionality 100 times smaller than most other people” [15]. To test this, we conducted experiments in which we controlled for the framing of task descriptions (finer-grained user stories versus coarser-grained generic descriptions). We observed that the type of task description and the task itself are significant factors affecting software quality in the context of TDD.
In short, working on small, well-defined tasks in short, steady development cycles has a more positive impact on quality and productivity than the order of test implementation.
Deviations from the Test-First Mantra
Even if we consider the studies that focus on only the test-first nature of TDD, there’s still the problem of conformance to the TDD process. TDD isn’t a dichotomy in which you either religiously write tests first every time or always test after the fact. TDD is a continuous spectrum between these extremes, and developers tend to dynamically span this spectrum, adjusting the TDD process as needed. In industrial settings, time pressure, lack of discipline, and insufficient realization of TDD’s benefits have been reported to cause developers to deviate from the process [12].
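
One simple way to picture this spectrum is as a conformance score: the share of development cycles in which the test came first. The following sketch assumes each cycle has already been labeled. The `Cycle` type, the `conformance_score` function, and the sample session are hypothetical and do not reproduce any measurement from the cited work.

```python
from dataclasses import dataclass


@dataclass
class Cycle:
    # Hypothetical label: True if the test was written before the production code.
    test_written_first: bool


def conformance_score(cycles: list[Cycle]) -> float:
    """Fraction of cycles that were test-first: 0.0 is always test-last, 1.0 is strict TDD."""
    if not cycles:
        raise ValueError("no cycles to score")
    return sum(c.test_written_first for c in cycles) / len(cycles)


session = [Cycle(True), Cycle(True), Cycle(False), Cycle(True)]
print(conformance_score(session))  # prints 0.75
```

A score strictly between 0 and 1 describes a developer who moves along the spectrum rather than sitting at either extreme.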
To gain more insight, in an ethnographically informed study, researchers monitored and documented the TDD development process more closely by means of artifacts including audio recordings and notes [16]. They concluded that developers perceived implementation as the most important phase and didn’t strictly follow the TDD process. In particular, developers wrote more production code than necessary, often omitted refactoring, and didn’t keep test cases up to date in accordance with the progression of the production code. Even when the developers followed the test-first principle, they thought about how the production code (not necessarily the design) should be before they wrote the test for the next feature. In other words, perhaps we should simply name this phenomenon “code-driven testing”?
TDD’s internal and external dynamics are more complex than the order in which tests are written. There’s no convincing evidence that TDD consistently fares better than any other development method, at least those methods that are iterative. And enough evidence exists to question whether TDD fulfills its promises.
How do you decide whether and when to use TDD, then? And what about TDD’s secondary effects?
As always, context is the key, and any potential benefit of TDD is likely not due to whatever order of writing tests and code developers follow. It makes sense to have realistic expectations rather than worship or discard TDD. Focus on the rhythm of development; for example, tackle small tasks in short, steady development cycles, rather than bother with the test order. Also, keep in mind that some tasks are better (suited) than others with respect to “TDD-bility.”
This doesn’t mean you should avoid trying TDD or stop using it. For example, if you think that TDD offers you the self-discipline to write tests for each small functionality, following the test-first principle will certainly prevent you from taking shortcuts that skip tests. In this case, there’s value in Beck’s suggestion, “Never write a line of functional code without a broken test case” [2]. However, you should primarily consider those tests’ quality (without obsessing over coverage) [17], instead of fixating on whether you wrote them before the code. Although TDD does result in more tests [1, 6], the lack of attention to testing quality [12], including maintainability and coevolution with production code [16], could be alarming.
As long as you’re aware of and comfortable with the potential tradeoff between productivity and testability and quality (perhaps paying off in the long term?), using TDD is fine. If you’re simply having fun and feeling good while performing TDD without any significant drawbacks, that’s also fine. After all, the evidence shows that happy developers are more productive and produce better code! [18]
Academy of Finland Project 278354 partly
supports this research.
References
1. F. Shull et al., “What Do We Know about Test-Driven Development?,” IEEE Software, vol. 27, no. 6, 2010, pp. 16–19.
2. K. Beck, Test-Driven Development: By Example, Addison-Wesley, 2003.
3. W. Bissi et al., “The Effects of Test Driven Development on Internal Quality, External Quality and Productivity: A Systematic Review,” Information and Software Technology, June 2016, pp. 45–54.
4. H. Munir, M. Moayyed, and K. Petersen, “Considering Rigor and Relevance When Evaluating Test Driven Development: A Systematic Review,” Information and Software Technology, vol. 56, no. 4, 2014, pp. 375–394.
5. Y. Rafique and V.B. Mišić, “The Effects of Test-Driven Development on External Quality and Productivity: A Meta-analysis,” IEEE Trans. Software Eng., vol. 39, no. 6, 2013, pp.
6. B. Turhan et al., “How Effective Is Test-Driven Development?,” Making Software: What Really Works, and Why We Believe It, A. Oram and G. Wilson, eds., O’Reilly Media, 2010, pp. 207–219.
7. S. Kollanus, “Test-Driven Development - Still a Promising Approach?,” Proc. 7th Int’l Conf. Quality of Information and Communications Technology (QUATIC 10), 2010, pp. 403–408; http://dx.doi
8. M. Siniaalto, “Test Driven Development: Empirical Body of Evidence,” tech. report, Information Technology for European Advancement, 3 Mar.
9. M. Pančur and M. Ciglarič, “Impact of Test-Driven Development on Productivity, Code and Tests: A Controlled Experiment,” Information and Software Technology, vol. 53, no. 6, 2011, pp. 557–573.
10. M. Beller et al., “When, How, and Why Developers (Do Not) Test in Their IDEs,” Proc. 10th Joint Meeting Foundations of Software Eng. (ESEC/FSE 15), 2015, pp. 179–190; http://doi.acm.org/10.1145/2786805.2786843.
11. N.C. Borle et al., “Analyzing the Effects of Test Driven Development in GitHub,” Empirical Software Eng., Nov. 2017.
12. A. Causevic, D. Sundmark, and S. Punnekkat, “Factors Limiting Industrial Adoption of Test Driven Development: A Systematic Review,” Proc. 4th IEEE Int’l Conf. Software Testing, Verification and Validation, 2011, pp. 337–346.
13. D. Fucci et al., “A Dissection of the Test-Driven Development Process: Does It Really Matter to Test-First or to Test-Last?,” IEEE Trans. Software Eng., vol. 43, no. 7, 2017, pp.
14. M.M. Müller and A. Höfer, “The Effect of Experience on the Test-Driven Development Process,” Empirical Software Eng., vol. 12, no. 6, 2007, pp. 593–615; https:// /10.1007
15. A. Cockburn, “Elephant Carpaccio,”
16. S. Romano et al., “Findings from a Multi-method Study on Test-Driven Development,” Information and Software Technology, Sept. 2017, pp. 64–77.
17. D. Bowes et al., “How Good Are My Tests?,” Proc. IEEE/ACM 8th Workshop Emerging Trends in Software Metrics (WETSoM 17), 2017, pp. 9–14.
18. D. Graziotin et al., “What Happens When Software Developers Are (Un)happy,” J. Systems and Software, June 2018, pp. 32–47.
ITIR KARAC is a project researcher in the M-Group research group and a doctoral student in the Department of Information Processing Science at the University of Oulu. Contact her at itir.karac@oulu.fi.
BURAK TURHAN is a senior lecturer in Brunel University’s Department of Computer Science and a professor of software engineering at the University of Oulu. Contact him at turhanb@
... While test-driven development increases code quality and productivity, its industry adoption is low due to context-related factors [30]. Nevertheless, some of the limiting factors listed by Karac et al. (e.g., increased development time [30]) could be overcome by using automatic tools (M2): Mısırlı et al. stated in their 2011 paper that by inspecting less than a fourth of the code, it is possible to detect almost three fourths of software defects [31]. ...
... While test-driven development increases code quality and productivity, its industry adoption is low due to context-related factors [30]. Nevertheless, some of the limiting factors listed by Karac et al. (e.g., increased development time [30]) could be overcome by using automatic tools (M2): Mısırlı et al. stated in their 2011 paper that by inspecting less than a fourth of the code, it is possible to detect almost three fourths of software defects [31]. Nevertheless, regression testing needs for careful selection and prioritisation of test cases within test suites often too large to run entirely [32]. ...
... There is a clear tendency to use AI to assist in daily tasks, including Agile processes, as discussed in Section 4. The integration of AI in Agile software development can help improve software development outcomes by enhancing accuracy, efficiency, and safety while reducing development time [3,4]. However, there are also challenges associated with integrating AI and Agile methodologies, such as the need for specialised technical expertise, and integrating AI into Agile software development processes requires careful consideration of the context [30]. Additionally, the system's ability to adapt to changing contexts is also crucial, as Agile methodologies prioritise flexibility and adaptability to the ever-changing market [3]. ...
Full-text available
This study explores the benefits and challenges of integrating Artificial Intelligence with Agile software development methodologies, focusing on improving continuous integration and delivery. A systematic literature review and longitudinal meta-analysis of the retrieved studies was conducted to analyse the role of Artificial Intelligence and it's future applications within Agile software development. The review helped identify critical challenges, such as the need for specialised socio-technical expertise. While Artificial Intelligence holds promise for improved software development practices, further research is needed to better understand its impact on processes and practitioners, and to address the indirect challenges associated with its implementation.
... Karac and Turhan [35] provided a rather general overview on the topic of TDD. They highlighted, inter alia, that only a fraction of the projects that claimed to be conducted test-driven actually had the developers consequently following the corresponding methodology. ...
... Finally, challenges regarding the implementation of the actual process fall into the last category. Lack of knowledge, experience, and competencies in applying TDD [27], [30], [32], [33], [34], [35] Difficulty to shift to the TDD mindset [28], [29], [31], [32], [36] Senior-level management not having a proper understanding of the TDD practice [27], [32] Software ...
... Lack of detailed upfront design [29], [31], [32], [35] Not having proper guidelines for using TDD for legacy code [30] High test code volume [32] Tests are often geared towards providing confidence in the developed solution instead of actively looking for issues [32] ...
Full-text available
Due to the ongoing trend of digitalization, the importance of software for today’s society is continuously increasing. Naturally, there is also a huge interest in improving its quality, which led to a highly active research community dedicated to this aim. Consequently, a plethora of propositions, tools, and methods emerged from the corresponding efforts. One of the approaches that have become highly prominent is the concept of test-driven development (TDD) that increases the quality of created software by restructuring the development process. However, such a big change to the followed procedures is usually also accompanied by major challenges that pose a risk for the achievement of the set targets. In order to find ways to overcome them, or at least to mitigate their impact, it is necessary to identify them and to subsequently raise awareness. Furthermore, since the effect of TDD on productivity and quality is already extensively researched, this work focuses only on issues besides these aspects. For this purpose, a literature review is presented that focuses on the challenges of TDD. In doing so, challenges that can be attributed to the three categories of people, software, and process are identified and potential avenues for future research are discussed.
... For the general application, these have already been discussed in multiple contributions, as an overview of the topic shows (Staegemann et al. 2022a). These are, regarding the involved people, mainly a lack of knowledge and experience by the developers (Buchan et al. 2011;Causevic et al. 2011;Causevic et al. 2013;Karac and Turhan 2018;Latorre 2014;Nanthaamornphong and Carver 2017), difficulties in shifting to the TDD mindset (Baldassarre et al. 2022;Causevic et al. 2013;Hammond and Umphress 2012;Kollanus 2011;Marchenko et al. 2009), and senior-level management's insufficient understanding of TDD (Buchan et al. 2011;Causevic et al. 2013). Further, it is tempting to create the tests in a way that they highlight what works, instead of actively looking for potential issues (Causevic et al. 2013). ...
... Further, it is tempting to create the tests in a way that they highlight what works, instead of actively looking for potential issues (Causevic et al. 2013). The envisioned application's initial design is often too poorly planned (Causevic et al. 2013;Hammond and Umphress 2012;Karac and Turhan 2018;Kollanus 2011), the existence of legacy code is not sufficiently considered in the TDD approach (Causevic et al. 2011), and there is obviously a necessity to create huge volumes of test code (Causevic et al. 2013). Moreover, a lack of suitable tools for test creation (Causevic et al. 2013;Kollanus 2011;Nanthaamornphong and Carver 2017) and the high technical complexity of TDD's application in certain scenarios (e.g., GUI development) are noteworthy (Causevic et al. 2011;Causevic et al. 2013;Marchenko et al. 2009). ...
Conference Paper
Big data (BD) is one of the major technological trends of today and finds application in numerous domains and contexts. However, while there are huge potential benefits, there are also considerable challenges. One of these is the difficulty to make sure the respective applications have the necessary quality. For this purpose, the application of test driven development (TDD) to the domain was proposed. In general, the approach already has a rather long history and, thereby, the corresponding challenges are also known. However, since the BD domain has several demanding particularities, this also needs to be accounted for when applying TDD. Yet, to our knowledge, this specific aspect has not been discussed by now. The publication at hand bridges this gap by examining the challenges of applying TDD to the engineering of BD applications. In doing so, it facilitates the approach’s use by practitioners and researchers while also constituting a foundation for further discourse regarding the quality assurance in the BD realm and the TDD approach in general.
... For the general application, these have already been discussed in multiple contributions, as an overview of the topic shows (Staegemann et al. 2022a). These are, regarding the involved people, mainly a lack of knowledge and experience by the developers (Buchan et al. 2011;Causevic et al. 2011;Causevic et al. 2013;Karac and Turhan 2018;Latorre 2014;Nanthaamornphong and Carver 2017), difficulties in shifting to the TDD mindset (Baldassarre et al. 2022;Causevic et al. 2013;Hammond and Umphress 2012;Kollanus 2011;Marchenko et al. 2009), and senior-level management's insufficient understanding of TDD (Buchan et al. 2011;Causevic et al. 2013). Further, it is tempting to create the tests in a way that they highlight what works, instead of actively looking for potential issues (Causevic et al. 2013). ...
... Further, it is tempting to create the tests in a way that they highlight what works, instead of actively looking for potential issues (Causevic et al. 2013). The envisioned application's initial design is often too poorly planned (Causevic et al. 2013;Hammond and Umphress 2012;Karac and Turhan 2018;Kollanus 2011), the existence of legacy code is not sufficiently considered in the TDD approach (Causevic et al. 2011), and there is obviously a necessity to create huge volumes of test code (Causevic et al. 2013). Moreover, a lack of suitable tools for test creation (Causevic et al. 2013;Kollanus 2011;Nanthaamornphong and Carver 2017) and the high technical complexity of TDD's application in certain scenarios (e.g., GUI development) are noteworthy (Causevic et al. 2011;Causevic et al. 2013;Marchenko et al. 2009). ...
Conference Paper
Business Simulation Games (BSGs) aim to simulate reality and impart knowledge as well as skills in a playful way. To be able to verify the goal attainment, the first steps towards an evaluation concept were taken in this paper. With the exemplary evaluation of Global Bike Go, a series of mini BSGs for SAP ERP teaching, initial indications could be generated about what they (can) achieve. One certain finding is that the games are suitable for beginners whereas the participants’ knowledge gain only shows tendencies. From the overall results, development potentials for the BSGs as well as for the evaluation concept used could be identified. However, due to the small sample and the limiting circumstances, further investigations have to be conducted. In this context, the self-performed actions as well as interactions with other players as significant game elements should be focused more, and especially the interdependency between the BSGs and other teaching materials seems promising. Ther efore, an interdisciplinary approach is desirable.
... From a broader perspective, refactoring is just as important for automated test code as production code as it supports the identification of issues caused by production code changes [17], [18]. Testing verifies whether the software functionality and observable behavior are kept the same after code refactorings [19], [20]. ...
Full-text available
Refactorings are transformations to improve the code design without changing overall functionality and observable behavior. During the refactoring process of smelly test code, practitioners may struggle to identify refactoring candidates and define and apply corrective strategies. This paper reports on an empirical study aimed at understanding how test smells and test refactorings are discussed on the Stack Exchange network. Developers commonly count on Stack Exchange to pick the brains of the wise, i.e., to `look up' how others are completing similar tasks. Therefore, in light of data from the Stack Exchange discussion topics, we could examine how developers understand and perceive test smells, the corrective actions they take to handle them, and the challenges they face when refactoring test code aiming to fix test smells. We observed that developers are interested in others' perceptions and hands-on experience handling test code issues. Besides, there is a clear indication that developers often ask whether test smells or anti-patterns are either good or bad testing practices than code-based refactoring recommendations.
... Tests are often written after the code-under-test has been developed. However, the practice of test-driven development-which advocates for test creation before code developmenthas become popular[43].2.4 THE ROLE OF SOFTWARE TESTINGIN THE SOFTWARE DEVELOPMENT LIFE CYCLE The software development life cycle (SDLC) defines a series of stages or activities that take place with the aim of developing software that meets or exceeds customer expectations and reaches completion within time and cost estimates. The typical SDLC includes the following activities: 1. Planning: Analysis of requirements and planning are among software development's most critical aspects. ...
Full-text available
Software testing is notoriously difficult and expensive, and improper testing carries eco- nomic, legal, and even environmental or medical risks. Research in software testing is critical to enabling the development of the robust software that our society relies upon. This dissertation aims to lower the cost of software testing without decreasing the quality by focusing on the use of automation. The dissertation consists of three empirical studies on aspects of software testing. Specifically, these three projects focus on (1) mapping the connections between research topics and the evolution of research topics in the field of software testing, (2) an assessment of the metrics used to guide automated test generation and the factors that suggest when automated test generation can detect real faults, and (3) examination of the semantic coupling between synthetic and real faults in service of im- proving our ability to cost-effectively generate synthetic faults for use in assessing test case quality. • Project 1 (Mapping): Our main goal for this project is to understand better the emergence of individual research topics and the connection between these topics within the broad field of software testing, enabling the identification of new topics and connections in future research. To achieve this goal, we have applied co-word analysis in order to characterize the topology of software testing research over three decades of research studies based on the keywords provided by the authors of studies indexed in the Scopus database. • Project 2 (Automated Input Generation): We have assessed the fault-detection ca- pabilities of unit test suites generated by automated tools with the goal of satisfying eight fitness functions representing common testing goals. Our purpose was not only iv to identify the particular fitness functions that detect the most faults but to explore further the factors that influence fault detection. 
To do this, we gathered observa- tions on the generated test suites and metrics describing the source code of the faulty classes and applied a rule-learning algorithm to identify the factors with the strongest influence on fault detection. • Project 3 (Mutant-Fault Coupling): Synthetic faults (mutants), which can be in- serted into code through transformative mutation operators, offer an automated means to assess the effectiveness of test suites and create new test cases. However, mutants can be expensive to utilize and may not realistically model real faults. To enable the cost-effective generation of mutants, we investigate this semantic relationship between mutation operators and real faults.
Test cases are designed in service of goals, e.g., functional correctness or performance. Unfortunately, we lack a clear understanding of how specific goal types influence test design. In this study, we explore this relationship through interviews and a survey with software developers, with a focus on identification and importance of goal types, quantitative relations between goals and tests, and personal, organizational, methodological, and technological factors. We identify nine goal types and their importance, and perform further analysis of three—correctness, reliability, and quality. We observe that test design for correctness forms a “default” design process that is modified when pursuing other goals. For the examined goal types, test cases tend to be simple, with many tests targeting a single goal and each test focusing on 1–2 goals at a time. We observe differences in testing practices, tools, and targeted system types between goal types. In addition, we observe that test design can be influenced by organization, process, and team makeup. This study provides a foundation for future research on test design and testing goals.
Test-Driven Development (TDD) is a software development practice that gained prominence when Kent Beck defined it as an essential part of Extreme Programming (XP). This study analyzed the experiments and conclusions of previously published studies on the effects of TDD on developer productivity and on the quality of the software produced, contrasting TDD with Test-Last Development (TLD). To this end, a systematic literature review was conducted covering articles published between 2003 and 2020. At the end of the review process, approximately 73% of the studies analyzed consisted of experiments with TDD, and in 27% of them the main topic was TDD itself, described in detail. The analysis shows that 43% of the studies reported a considerable increase in software quality, while no article reported a decrease in quality. Regarding productivity, 28% of the studies reported a decrease in productivity and 47% were inconclusive. As a rule, the studies did not report significant improvements in productivity when TDD was used. According to the analysis, TDD promotes higher quality, even though some studies indicate otherwise. Regarding productivity, the evidence on TDD is inconclusive. Therefore, according to the articles analyzed, there is no final position on the cost-benefit trade-off of this practice; we discuss some possible reasons for this conclusion.
Testing is an integral part of the software development lifecycle, approached with varying degrees of rigor by different process models. Agile process models recommend Test Driven Development (TDD) as a key practice for reducing costs and improving code quality. The objective of this work is to perform a cost-benefit analysis of this practice. To that end, we have conducted a comparative analysis of GitHub repositories that adopt TDD to a lesser or greater extent, in order to determine how TDD affects software development productivity and software quality. We classified GitHub repositories archived in 2015 in terms of how rigorously they practiced TDD, thus creating a TDD spectrum. We then matched and compared various subsets of these repositories on this TDD spectrum with control sets of equal size. The control sets were samples from all GitHub repositories that matched certain characteristics and that contained at least one test file. We compared how the TDD sets differed from the control sets on the following characteristics: number of test files, average commit velocity, number of bug-referencing commits, number of issues recorded, usage of continuous integration, number of pull requests, and distribution of commits per author. We found that Java TDD projects were relatively rare. In addition, there were very few significant differences in any of the metrics we used to compare TDD-like and non-TDD projects; therefore, our results do not reveal any observable benefits from using TDD.
The growing literature on affect among software developers mostly reports on the linkage between happiness, software quality, and developer productivity. Understanding happiness and unhappiness in all its components (positive and negative emotions and moods) is an attractive and important endeavour. Scholars in industrial and organizational psychology have suggested that understanding happiness and unhappiness could lead to cost-effective ways of enhancing working conditions and job performance, and of limiting the occurrence of psychological disorders. Our comprehension of the consequences of (un)happiness among developers is still too shallow, being mainly expressed in terms of development productivity and software quality. In this paper, we study what happens when developers are happy and when they are unhappy. Qualitative data analysis of responses given by 317 questionnaire participants identified 42 consequences of unhappiness and 32 of happiness. We found consequences of happiness and unhappiness that are beneficial and detrimental for developers' mental well-being, the software development process, and the produced artefacts. Our classification scheme, available as open data, enables new happiness research opportunities of cause-effect type, and it can act as a guideline for practitioners for identifying damaging effects of unhappiness and for fostering happiness on the job.
Context: Test Driven Development (TDD) is an agile practice that gained popularity when it was defined as a fundamental part of eXtreme Programming (XP). Objective: This study analyzed the conclusions of previously published articles on the effects of TDD on internal and external software quality and productivity, comparing TDD with Test Last Development (TLD). Method: In this study, a systematic literature review has been conducted considering articles published between 1999 and 2014. Results: In about 57% of the analyzed studies, the results were validated through experiments and in 32% of them, validation was performed through a case study. The results of this analysis show that 76% of the studies have identified a significant increase in internal software quality while 88% of the studies identified a meaningful increase in external software quality. There was an increase in productivity in the academic environment, while in the industrial scenario there was a decrease in productivity. Overall, about 44% of the studies indicated lower productivity when using TDD compared to TLD. Conclusion: According to our findings, TDD yields more benefits than TLD for internal and external software quality, but it results in lower developer productivity than TLD.
The research community in Software Engineering, and in Software Testing in particular, builds many of its contributions on a set of mutually shared expectations. Despite the fact that they form the basis of many publications as well as open-source and commercial testing applications, these common expectations and beliefs are rarely ever questioned. For example, Frederick Brooks's statement that testing takes half of the development time seems to have manifested itself within the community since he first made it in "The Mythical Man-Month" in 1975. With this paper, we report on the surprising results of a large-scale field study with 416 software engineers whose development activity we closely monitored over the course of five months, resulting in over 13 years of recorded work time in their integrated development environments (IDEs). Our findings question several commonly shared assumptions and beliefs about testing and might be contributing factors to the observed bug proneness of software in practice: the majority of developers in our study do not test; developers rarely run their tests in the IDE; Test-Driven Development (TDD) is not widely practiced; and, last but not least, software developers only spend a quarter of their work time engineering tests, whereas they think they test half of their time.
This paper provides a systematic meta-analysis of 27 studies that investigate the impact of Test-Driven Development (TDD) on external code quality and productivity. The results indicate that, in general, TDD has a small positive effect on quality but little to no discernible effect on productivity. However, subgroup analysis has found both the quality improvement and the productivity drop to be much larger in industrial studies in comparison with academic studies. A larger drop of productivity was found in studies where the difference in test effort between the TDD and the control group's process was significant. A larger improvement in quality was also found in the academic studies when the difference in test effort is substantial; however, no conclusion could be derived regarding the industrial studies due to the lack of data. Finally, the influence of developer experience and task size as moderator variables was investigated, and a statistically significant positive correlation was found between task size and the magnitude of the improvement in quality.
Context: Test driven development (TDD) has been extensively researched and compared to traditional approaches (test last development, TLD). Existing literature reviews show varying results for TDD. Objective: This study investigates how the conclusions of existing literature reviews change when taking two study quality dimensions into account, namely rigor and relevance. Method: In this study a systematic literature review has been conducted and the results of the identified primary studies have been analyzed with respect to rigor and relevance scores using the assessment rubric proposed by Ivarsson and Gorschek 2011. Rigor and relevance are rated on a scale, which is explained in this paper. Four categories of studies were defined based on high/low rigor and relevance. Results: We found that studies in the four categories come to different conclusions. In particular, studies with high rigor and relevance scores show clear results for improvement in external quality, which seem to come with a loss of productivity. At the same time, high rigor and relevance studies only investigate a small set of variables. Other categories contain many studies showing no difference, hence biasing the results negatively for the overall set of primary studies. Given the classification, differences from previous literature reviews could be highlighted. Conclusion: Strong indications are obtained that external quality is positively influenced, which has to be further substantiated by industry experiments and longitudinal case studies. Future studies in the high rigor and relevance category would contribute largely by focusing on a wider set of outcome variables (e.g. internal code quality). We also conclude that considering rigor and relevance in TDD evaluation is important given the differences in results between categories and in comparison to previous reviews.
Background: Writing unit tests is one of the primary activities in test-driven development. Yet, the existing reviews report little evidence supporting or refuting the effect of this development approach on test case quality. Developers' lack of ability and skills to produce sufficiently good test cases is also reported as a limitation of applying test-driven development in industrial practice. Objective: We investigate the impact of test-driven development on the effectiveness of unit test cases compared to an incremental test last development in an industrial context. Method: We conducted an experiment in an industrial setting with 24 professionals. Professionals followed the two development approaches to implement the tasks. We measure unit test effectiveness in terms of mutation score. We also measure branch and method coverage of test suites to compare our results with the literature. Results: In terms of mutation score, we have found that the test cases written for a test-driven development task have a higher defect detection ability than test cases written for an incremental test-last development task. Subjects wrote test cases that cover more branches on a test-driven development task compared to the other task. However, test cases written for an incremental test-last development task cover more methods than those written for the second task. Conclusion: Our findings are different from previous studies conducted in academic settings. Professionals were able to perform more effective unit testing with test-driven development. Furthermore, we observe that the coverage measures preferred in academic studies reveal different aspects of a development approach. Our results need to be validated in larger industrial contexts.
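Mutation score, the effectiveness measure used in the study above, is the fraction of seeded faults (mutants) that a test suite detects. The sketch below illustrates the idea under simplifying assumptions: the function `price_with_discount`, its two hand-written mutants, and the test suites are all hypothetical, and a real study would rely on a mutation testing tool rather than hand-made mutants.

```python
def price_with_discount(total):
    """Original code: orders of 100 or more get 10 off."""
    return total - 10 if total >= 100 else total


def mutant_boundary(total):
    """Mutant: relational operator >= replaced by >."""
    return total - 10 if total > 100 else total


def mutant_arithmetic(total):
    """Mutant: arithmetic operator - replaced by +."""
    return total + 10 if total >= 100 else total


def is_killed(mutant, suite):
    """A mutant is killed if at least one test observes a wrong result."""
    return any(mutant(value) != expected for value, expected in suite)


def mutation_score(mutants, suite):
    """Fraction of mutants killed by the suite (0.0 to 1.0)."""
    killed = sum(is_killed(m, suite) for m in mutants)
    return killed / len(mutants)


mutants = [mutant_boundary, mutant_arithmetic]

# Each test is an (input, expected output) pair; both suites pass on the original.
weak_suite = [(50, 50), (200, 190)]
strong_suite = weak_suite + [(100, 90)]  # adds the boundary case

print(mutation_score(mutants, weak_suite))
print(mutation_score(mutants, strong_suite))
```

The weak suite misses the boundary mutant because it never exercises the value 100, which is why two suites with similar coverage can differ in mutation score.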
Context: Test-driven development (TDD) is an iterative software development practice where unit tests are defined before production code. A number of quantitative empirical investigations have been conducted about this practice. The results are contrasting and inconclusive. In addition, previous studies fail to analyze the values, beliefs, and assumptions that inform and shape TDD. Objective: We present a study designed and conducted to understand the values, beliefs, and assumptions about TDD. Participants were novice and professional software developers. Method: We conducted an ethnographically-informed study with 14 novice software developers, i.e., graduate students in Computer Science at the University of Basilicata, and six professional software developers (with one to 10 years of work experience). The participants worked on the implementation of a new feature for an existing software system written in Java. We immersed ourselves in the context of our study. We collected qualitative information by means of audio recordings, contemporaneous field notes, and other kinds of artifacts. We collected quantitative data from the integrated development environment to support or refute the ethnography results. Results: The main insights of our study can be summarized as follows: (i) refactoring (one of the phases of TDD) is not performed as often as the process requires and it is considered less important than other phases, (ii) the most important phase is implementation, (iii) unit tests are almost never up-to-date, and (iv) participants first build in their mind a sort of model of the source code to be implemented and only then write test cases. The analysis of the quantitative data supported the following qualitative findings: (i), (iii), and (iv). Conclusions: Developers write quick-and-dirty production code to pass the tests, do not update their tests often, and ignore refactoring.
Background: Test-driven development (TDD) is a technique that repeats short coding cycles interleaved with testing. The developer first writes a unit test for the desired functionality, followed by the necessary production code, and refactors the code. Many empirical studies neglect unique process characteristics related to TDD's iterative nature. Aim: We formulate four process characteristics: sequencing, granularity, uniformity, and refactoring effort. We investigate how these characteristics impact quality and productivity in TDD and related variations. Method: We analyzed 82 data points collected from 39 professionals, each capturing the process used while performing a specific development task. We built regression models to assess the impact of process characteristics on quality and productivity. Quality was measured by functional correctness. Result: Quality and productivity improvements were primarily positively associated with granularity and uniformity. Sequencing, the order in which test and production code are written, had no important influence. Refactoring effort was negatively associated with both outcomes. We explain the unexpected negative correlation with quality by possible prevalence of mixed refactoring. Conclusion: The claimed benefits of TDD may not be due to its distinctive test-first dynamic, but rather due to the fact that TDD-like processes encourage fine-grained, steady steps that improve focus and flow.
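To make the four process characteristics more concrete, the sketch below computes one plausible version of each from a log of development cycles. The abstract does not give the formulas, so the definitions here are assumptions for illustration: granularity as the median cycle duration, uniformity as the median absolute deviation of durations, sequencing as the share of test-first cycles, and refactoring effort as the share of refactoring cycles. The `Cycle` type and the sample data are hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Cycle:
    minutes: float  # duration of the development cycle
    kind: str       # "test-first", "test-last", or "refactoring"


def process_characteristics(cycles):
    """Summarize a cycle log with four illustrative (assumed) measures."""
    durations = [c.minutes for c in cycles]
    # Granularity: typical cycle length (smaller means finer-grained steps).
    granularity = median(durations)
    # Uniformity: spread of cycle lengths (smaller means steadier rhythm).
    uniformity = median([abs(d - granularity) for d in durations])
    # Sequencing: share of cycles in which the test was written first.
    sequencing = sum(c.kind == "test-first" for c in cycles) / len(cycles)
    # Refactoring effort: share of cycles spent refactoring.
    refactoring_effort = sum(c.kind == "refactoring" for c in cycles) / len(cycles)
    return {
        "granularity": granularity,
        "uniformity": uniformity,
        "sequencing": sequencing,
        "refactoring_effort": refactoring_effort,
    }


# Hypothetical cycle log for one developer on one task.
log = [
    Cycle(4, "test-first"),
    Cycle(6, "test-first"),
    Cycle(5, "refactoring"),
    Cycle(15, "test-last"),
]

summary = process_characteristics(log)
print(summary)
```

Under these assumed definitions, a developer working in short, steady steps would show low granularity and uniformity values regardless of whether tests come first, which is the distinction the study draws between sequencing and the other characteristics.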