Does Test-Driven Development Really Improve Software Design Quality?

Authors: David S. Janzen (California Polytechnic State University, San Luis Obispo) and Hossein Saiedian (University of Kansas)

IEEE Software, vol. 25, no. 2, March/April 2008

Abstract

Support for test-driven development [TDD] is growing in many development contexts beyond its common association with extreme programming. By focusing on how TDD influences design characteristics, we hope to raise awareness of TDD as a design approach and assist others in decisions on whether and how to adopt TDD. Our results indicate that test-first programmers are more likely to write software in more and smaller units that are less complex and more highly tested. We weren't able to confirm claims that TDD improves cohesion while lowering coupling, but we anticipate ways to clarify the questions these design characteristics raised. In particular, we're working to eliminate the confounding factor of accessor usage in the cohesion metrics.
Feature: Software Metrics
Does Test-Driven Development Really Improve Software Design Quality?

David S. Janzen, California Polytechnic State University, San Luis Obispo
Hossein Saiedian, University of Kansas
TDD is first and foremost a design practice. The question is, how good are the resulting designs? Empirical studies help clarify the practice and answer this question.
Software developers are known for adopting new technologies and practices on the basis of their novelty or anecdotal evidence of their promise. Who can blame them? With constant pressure to produce more with less, we often can't wait for evidence before jumping in. We become convinced that competition won't let us wait.
Advocates for test-driven development claim that TDD produces code that's simpler, more cohesive, and less coupled than code developed in a more traditional test-last way. Support for TDD is growing in many development contexts beyond its common association with Extreme Programming. Examples such as Robert C. Martin's bowling game demonstrate the clean and sometimes surprising designs that can emerge with TDD [1], and the buzz has proven sufficient for many software developers to try it. Positive personal experiences have led many to add TDD to their list of best practices, but for others, the jury is still out. And although the literature includes many publications that teach us how to do TDD, it includes less empirical evaluation of the results.

In 2004, we began a study to collect evidence that would substantiate or question the claims regarding TDD's influence on software.
TDD misconceptions
We looked for professional development teams who were using TDD and willing to participate in the study. We interviewed representatives from four reputable Fortune 500 companies who claimed to be using TDD. However, when we dug a little deeper, we discovered some unfortunate misconceptions:

Misconception #1: TDD equals automated testing. Some developers we met placed a heavy emphasis on automated testing. Because TDD has helped propel automated testing to the forefront, many seem to think that TDD is only about writing automated tests.

Misconception #2: TDD means write all tests first. Some developers thought that TDD involved writing the tests (all the tests) first, rather than using the short, rapid test-code iterations of TDD.
Unfortunately, these perspectives miss TDD's primary purpose, which is design. Granted, the tests are important, and automated test suites that can run at the click of a button are great. However, from early on, TDD pioneers have been clear that TDD is about design, not the tests [2].
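To make the contrast with both misconceptions concrete, a single TDD iteration begins with one small failing test rather than a complete up-front suite. The sketch below is ours, not code from the study projects: the HtmlTag class, its behavior, and the JUnit 4 style are illustrative assumptions, loosely echoing the HTML pretty-print exercise described later in the article.

```java
// One "red" step: a single small test written before the unit it exercises.
// JUnit 4 style is assumed here; the HtmlTag class and its behavior are
// invented for illustration and are not taken from the study projects.
import org.junit.Test;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class HtmlTagTest {

    @Test
    public void recognizesClosingTags() {
        assertTrue(new HtmlTag("</li>").isClosing());
        assertFalse(new HtmlTag("<li>").isClosing());
    }
}

// The unit itself starts as a deliberately failing stub, so the first run is "red".
class HtmlTag {
    HtmlTag(String text) { }

    boolean isClosing() {
        throw new UnsupportedOperationException("not implemented yet");
    }
}
```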
Why the confusion regarding TDD? We propose two possible explanations.

First, we can blame it on the name, which includes the word test but not the word design. But, alas, "test-driven development" seems to be here to stay. We're unlikely to revert to earlier, more accurately descriptive names such as test-driven design.
A second source of confusion is the difference between internal and external quality. Several early studies focused on TDD's effects on defects (external quality) and productivity [3]. Many results were promising although somewhat mixed. Boby George and Laurie Williams reported fewer defects but lower productivity [4]. Hakan Erdogmus, Maurizio Morisio, and Marco Torchiano reported minimal external quality differences but improved productivity [5]. Adam Geras, Michael Smith, and James Miller reported no changes in productivity, but more frequent unplanned test failures [6]. The emphasis on external quality is valid and beneficial, but it can also miss TDD's primary focus on design.
Matthias Müller addressed internal quality in a recent case study comparing five open-source and student TDD projects with three open-source non-TDD projects [7]. (The study incorrectly identified one TDD project, JUnit, as being non-TDD, and it didn't confirm whether two projects, Ant and log4j, were TDD or non-TDD.) Although Müller focused on a new metric to gauge testability, he indicated that software developed with TDD had lower coupling, smaller classes, and higher testability, but less cohesiveness.
Despite the misconceptions about TDD, some of the traditional test-last development teams we interviewed reported positive experiences with automated testing, resulting in quality and productivity improvements. Other test-last teams reported frustrations and eventual abandonment of the approach. We believed that focusing on internal qualities, such as simplicity, size, coupling, and cohesion, would emphasize TDD's design aspects and help clarify how to use it.
TDD in a traditional development process
We wanted to examine TDD independent of other process practices, but we had to select a methodology to minimize independent variables. We chose to study TDD in the context of a somewhat traditional development process based on the Unified Process [8]. The projects in this research were relatively short (typically three to four months). We believe the process we used could be repeated as iterations in a larger evolutionary process model, but we didn't study this.
Figure 1a illustrates a traditional test-last flow of development. This process involves significant effort in specifying the system architecture and design before any significant software development. Such an approach does not preclude some programming to explore a prototype or prove a concept, but it assumes that no significant production software is constructed without a detailed design. Unit testing occurs after a unit is coded. We asked test-last programmers in the study to use an iterative approach in which the time from unit construction to unit testing was very short (seconds or minutes rather than weeks or months).
Figure 1b illustrates the test-first development flow. In this approach, the project identifies some high-level architecture early, but that design doesn't proceed to a detailed level. Instead, the test-first process of writing unit tests and constructing the units in short, rapid iterations allows the design to emerge and evolve.

Neither of these flows makes any assumptions about other process practices.
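Continuing the illustrative HtmlTag sketch from the misconceptions discussion, the test-first flow in figure 1b completes the iteration by writing just enough code to make the failing test pass and then refactoring. Design details, such as how a tag stores its text, emerge from these small steps rather than from an up-front detailed design. Again, the names and behavior are hypothetical.

```java
// "Green" and "refactor" steps for the earlier illustrative test: the simplest
// implementation that passes, tidied slightly. The next failing test (say, one
// that extracts a tag's name) would drive the next design decision.
class HtmlTag {

    private final String text;

    HtmlTag(String text) {
        this.text = text;
    }

    boolean isClosing() {
        return text.startsWith("</");
    }
}
```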
Study design and execution
We designed our study to compare the test-first TDD approach with a comparable but reversed test-last approach. In particular, programmers in both the test-first and test-last groups wrote automated unit tests and production code in short, rapid iterations. We conducted pre-experiment surveys to ensure no significant differences existed between the test-first and test-last groups in terms of programming experience, age, and acceptance of TDD. The only difference was whether they wrote the tests before or after writing the code under test.

[Figure 1. Development flow: (a) traditional test-last and (b) test-driven development/test-first flow.]
We selected a development group in one company to conduct three quasi-controlled experiments and one case study. (We call the studies quasi-controlled because the teams weren't randomly assigned.) We selected this group because of their willingness to participate in the study, to share project data, and to use TDD as an integral part of design. Developers voluntarily participated as part of their regular full-time work for the company, which assigned all projects and used the results in production.
In addition, we conducted two quasi-controlled experiments in undergraduate and graduate software engineering courses at the University of Kansas during the summer and fall of 2005. Nineteen students worked in teams of three or four programmers each. Both courses involved the same semester-long project.
Table 1 summarizes the studies. The three industry quasi-controlled experiments involved five similar but distinct projects completed by overlapping groups of five developers. The team entries in table 1 identify these developers with the letters A through E and indicate how the teams overlap on projects. All industry developers had computing degrees and a minimum of six years of professional development experience. The projects were all Web applications completed in Java, developed as part of the team's normal work domain, and completed in three to 12 months each.

Companies are rarely willing to commit two teams to develop the same system just to see which approach works better. So, to make things fair, we interleaved the approaches and mixed up their order in completing individual projects. The first quasi-experiment involved a test-last project with no automated tests, followed by a second phase of the same project completed with a test-first approach. The test-first project used the Spring framework. We labeled this comparison INT-TF for "industry no-tests followed by test-first." The second quasi-experiment involved a test-last project followed by a test-first project. Again, the test-first application used the Spring framework; we labeled this comparison ITL-TF for "industry test-last followed by test-first." The third quasi-experiment involved a test-first project followed by a test-last project. Both projects used the Struts and Spring frameworks along with object-relational mapping patterns and extensive mock objects in testing; we labeled this comparison ITF-TL for "industry test-first followed by test-last."

Table 1. Study profile

Study  | Type             | Test-first project(s)                                                               | Test-last project(s)
INT-TF | Quasi-controlled | 28 classes, 842 LOC, team A (>5 yrs), J2EE + Spring, real world                    | 18 classes, 1,562 LOC, team A (>5 yrs), J2EE, real world
ITL-TF | Quasi-controlled | 28 classes, 842 LOC, team A (>5 yrs), J2EE + Spring, real world                    | 21 classes, 811 LOC, teams A/B (>5 yrs), J2EE, real world
ITF-TL | Quasi-controlled | 69 classes, 1,559 LOC, teams A/B/C (>5 yrs), J2EE + Spring + Struts, real world    | 57 classes, 2,071 LOC, teams B/C (>5 yrs), J2EE + Spring + Struts, real world
ICS    | Case study       | 126 classes, 2,750 LOC, teams A/B/C (>5 yrs), J2EE + Spring + Struts, real world   | 831 classes, 49,330 LOC, teams A through E (>5 yrs), J2EE + Spring + Struts, real world
GSE    | Quasi-controlled | 19 classes, 1,301 LOC, two teams of 3 participants (0-5 yrs), Java, academic       | 4 classes, 867 LOC, one team of 3 participants (>5 yrs), Java, academic
USE    | Quasi-controlled | 28 classes, 1,053 LOC, one team of 3 participants (novice), Java, academic         | 17 classes, 1,254 LOC, two teams of 3 and 4 participants (novice), Java, academic
Unique totals |           | 173 classes, 5,104 LOC, 12 participants                                             | 852 classes, 51,451 LOC, 15 participants

Study labels: INT-TF (industry no-tests followed by test-first), ITL-TF (industry test-last followed by test-first), ITF-TL (industry test-first followed by test-last), ICS (industry case study), GSE (graduate software engineering), USE (undergraduate software engineering). The letters A through E identify five developers and show how teams overlap across projects; one of the early test-last projects had additional developers.
The case study, labeled ICS, examined 15 software projects completed in one development group over five years. The 15 projects included the five test-first and test-last projects from the industry quasi-experiments. The group had completed the remaining 10 projects prior to the quasi-experiment projects. We interviewed the developers from these 10 projects and determined that all 10 used a test-last approach. All 15 case study projects were completed in three to 12 months with less than 10,000 lines of code by development teams of three or fewer primary developers. Six projects were completed with no automated unit tests; six projects, with automated tests in a test-last manner; and three projects, with automated tests in a test-first manner. All projects used Java to develop Web applications in a single domain.
We labeled the academic studies GSE for "graduate software engineering" and USE for "undergraduate software engineering." We divided the student programmers into test-first and test-last groups and gave them the same set of programming requirements for the semester-long project: specifically, to design and build an HTML pretty-print system. The system was to take an HTML file as input and transform the file into a more human-readable format by performing operations such as deleting redundant tags and adding appropriate indentation.
Students self-selected their teammates, and we compared the results from pre-experiment surveys to ensure that no statistically significant differences existed between the teams in preparation or bias. In particular, we established Java experience as a blocking variable to ensure that each team had a minimum and balanced skill set. In every case, the teams were fairly balanced and didn't change during the study. All but one student in the GSE study had at least one year of professional development experience. Students in the USE study were all juniors or seniors.
We developed TDD and automated testing training materials and delivered them in conjunction with each study. We gave the training to the industry participants in summer 2004. The test-first and test-last projects began in fall 2004 and ran through spring 2006. Although the developers might have experienced some learning curve with TDD during the first few months, we believe the project durations and total time elapsed established sufficient experience in the test-first projects.

[Figure 2. Code size metrics: (a) average line coverage of automated tests, (b) lines of code per module (class), (c) lines of code per method, and (d) methods per class.]
The software engineering courses involved relatively short training sessions (about two hours) dedicated to automated unit testing and TDD topics. Some students noted challenges with applying TDD at first. We observed undergraduate students in a lab setting and provided additional informal instruction as needed to keep them writing automated tests. The industry training consisted of a full-day course on automated unit testing and TDD. We carefully presented the materials for both test-first and test-last approaches to avoid introducing any approach bias.
Analyzing the studies
We used several popular software metrics to evaluate the artifacts from the study. Although experts differ regarding the most appropriate metrics, particularly in areas such as coupling [9] and cohesion [10], we selected a representative set that are widely calculated and reported.
We began our analysis by considering whether the programmers in our studies actually wrote automated unit tests. We informally monitored developers during the studies through brief interviews and observed code samples. The post-experiment survey asked developers to anonymously report whether they used the prescribed approach. In all the studies but one, programmers reported using the approach they were instructed to use (test-first or test-last). The one exception was a team in the undergraduate software engineering course. Despite being instructed to use a test-first approach, the team reported using a test-last approach, so we reclassified them into the test-last control group.
Figure 2a reports each study's average line coverage. This measure indicates the percentage of lines of code that the automated test suites execute. Not surprisingly, line coverage is rather low in the student studies, and some test-last teams failed to write any automated tests. In their post-survey comments, several student test-last team members reported running out of time to write tests. In contrast, professional test-last developers in the ICS, ITL-TF, and ITF-TL studies reported more faithful adherence to the rapid-cycle "code-test-refactor" practice.
In every study but the last one, the test-first programmers wrote tests that covered a higher percentage of code. The test-last control group in the INT-TF study performed only manual testing, so the group had no line coverage. In the case study, we omitted test-last projects with no automated tests from the line-coverage percentage calculation to avoid unfairly penalizing the test-last measures. In all the studies, we found additional testing metrics such as branch coverage and number of assertions to be generally consistent with the line-coverage results.
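The difference between line coverage and the branch coverage mentioned here is easy to see in a small example. The following sketch is illustrative only (the method, test, and numbers are invented, and JUnit 4 is assumed): one test executes every line of the method, yet exercises only one of its two branches.

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class DiscountTest {

    // This single test executes every line of apply(), so line coverage is 100
    // percent, but the case where largeOrder is false is never taken, so branch
    // coverage is only 50 percent.
    @Test
    public void largeOrdersGetTenPercentOff() {
        assertEquals(90.0, Discount.apply(100.0, true), 0.001);
    }
}

class Discount {
    static double apply(double price, boolean largeOrder) {
        if (largeOrder) {
            price = price * 0.9;
        }
        return price;
    }
}
```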
In the final study, when the same professional developers completed a test-last project after having completed a test-first project earlier, they increased their average line coverage. Average branch test coverage (Boolean expressions in control structures) was actually a bit lower, at 74 percent for test-last, while the test-first project achieved 94 percent. We observed a similar phenomenon in a separate study with beginning programmers [11]. In that study, student programmers who used the test-first approach first wrote more tests than their test-last counterparts. However, on the subsequent project, when students were asked to use the opposite approach, the test-last programmers (those who used test-first on the first project) again wrote more tests. Could it be that the test-first approach has some sort of a residual effect on a programmer's disposition to write more tests? If so, we wonder whether this effect would diminish over time.
Impact on code size
The simplest software metric is size. Figure 2b reports lines of code per module (generally a class). In all studies, test-first programmers wrote smaller modules than their test-last counterparts. The case study was the only study with enough classes to analyze the data statistically. A two-sample, two-tailed, unequal-variance t-test indicated that the difference in ICS lines of code per module was statistically significant with p < 0.05. Unless stated otherwise, we use this same test and criteria when claiming statistical significance.
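The article doesn't say which tool computed these tests, so the following is only a sketch of how such a comparison could be reproduced. It assumes the Apache Commons Math library (version 3), and the lines-of-code samples are made up for illustration.

```java
// Sketch of a two-sample, two-tailed t-test that does not assume equal
// variances (Welch's test), the kind of comparison described above.
// Assumes Apache Commons Math 3; the module-size samples are invented.
import org.apache.commons.math3.stat.inference.TTest;

public class ModuleSizeComparison {

    public static void main(String[] args) {
        double[] testFirstLinesPerModule = {18, 22, 25, 30, 31, 35, 40, 44};
        double[] testLastLinesPerModule  = {35, 41, 48, 52, 60, 66, 75, 90};

        // tTest(...) returns the two-sided p-value of the unequal-variance test.
        double p = new TTest().tTest(testFirstLinesPerModule, testLastLinesPerModule);
        System.out.printf("p = %.4f, significant at 0.05: %b%n", p, p < 0.05);
    }
}
```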
Similarly, test-first programmers tended to write smaller methods on average. Figure 2c reveals that test-first programmers' average method size in lines of code was below the test-last averages in all but the last two industry studies (ITL-TF and ITF-TL). The use of simple one-line accessor methods affects these differences. The ITF-TL study had the most striking difference, with nearly 40 percent of the methods in the test-last project being simple one-line accessors. In contrast, only 11 percent of the test-first methods were simple accessors. Inlining the one-line accessor methods strengthens the claim that test-first programmers write smaller methods on average.
Finally, figure 2d indicates that the test-first programmers wrote fewer methods per class in all but the ITL-TF study (the difference was very slight in the USE study).

In summary, the data shows a possible tendency for test-first programmers to write smaller, simpler classes and methods.
Impact on complexity
Size is one measure of complexity: smaller classes and methods are generally simpler and easier to understand. Other common complexity measures include counting the number of independent paths through code (cyclomatic complexity) and measuring the degree of nesting (nested block depth). More branches, paths, and nesting make code more complex and therefore more difficult to understand, test, and maintain.
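As a point of reference for how these counts work, consider the small hypothetical method below (not from the study code). Counting one plus the number of decision points, its cyclomatic complexity is 3 (the loop and the if), and the if block nested inside the loop is exactly the kind of nesting the nested-block-depth metric penalizes.

```java
// Illustrative only: the loop and the if are the method's two decision points,
// giving a cyclomatic complexity of 3; the if nested inside the loop raises the
// nested block depth (exact values depend on whether a tool counts the method
// body itself as a level).
class TagCounter {

    int countClosingTags(String[] tags) {
        int count = 0;
        for (String tag : tags) {         // decision point 1
            if (tag.startsWith("</")) {   // decision point 2, nested one level deeper
                count++;
            }
        }
        return count;
    }
}
```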
We report three metrics to compare the complexity differences between the test-first and test-last projects. Weighted-methods complexity measures the sum of cyclomatic complexities for all methods in a class. In figure 3a, we see that test-first programmers consistently produced classes with lower complexity in terms of the number of branches and the number of methods. The ICS and ITF-TL differences were statistically significant. That test-first programmers produced consistently simpler classes isn't surprising, considering the earlier report of fewer methods per class.
The remaining two metrics, cyclomatic complexity per method and nested block depth (NBD) per method, measure whether individual methods are more or less complex. Figure 3b compares cyclomatic complexity per method, and figure 3c compares NBD per method. The method-level differences are less consistent than those at the class level. Cyclomatic complexity per method was lower in the test-first projects in four of the six studies. The difference was statistically significant in ICS and INT-TF. In the two studies where the test-last methods were less complex, the difference was small and the method complexity was low for both test-first and test-last methods. The difference in the ITF-TL study was statistically significant, but we question the difference, given the earlier discussion on accessor methods in this study.
NBD comparisons were similar. The test-first projects had lower NBD in three studies. In the remaining three studies, the test-last projects had lower NBD, but the values are low and the differences are small.

We think the complexity metrics point to a tendency of test-first programmers to write simpler classes and sometimes simpler methods.
Impact on coupling
The tendency of test-first programmers to implement solutions with more and smaller classes and methods might generate more connections between classes. Figure 4a shows the coupling between objects (CBO), which measures the number of connections between objects. Half the studies had a lower CBO in the test-first projects, and half were lower in the test-last projects. The average CBO values were acceptable in all the studies; none of the differences were statistically significant. The maximum CBO for any class was acceptably low (12 or fewer) for all the projects except two test-last ICS projects (CBO of 28 and 49) and the two projects in the ITF-TL study. Interestingly, the test-first project in the ITF-TL study had a class with a CBO of 26 and the test-last project had a class with a CBO of 16, both of which might be considered unacceptably high.
Figure 4b reports differences in another coupling measure: fan-out per class. Fan-out refers to the number of classes used by a class. Not surprisingly, the results are similar to those for CBO. The differences are small and not statistically significant, and the values are acceptable.
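As a quick illustration of how these counts accumulate (the classes below are invented, not from the study projects): a class that references three other classes has a fan-out of 3, while CBO counts the distinct classes it is coupled to, which, depending on the tool, may also include classes that reference it in turn.

```java
// Illustrative coupling counts: OrderReport uses Order, Customer, and Formatter,
// so its fan-out is 3. Its CBO is at least 3; tools that also count incoming
// dependencies would add any classes that use OrderReport.
class Order { double total() { return 42.0; } }
class Customer { String name() { return "Ada"; } }
class Formatter { String money(double amount) { return String.format("$%.2f", amount); } }

class OrderReport {
    String describe(Order order, Customer customer, Formatter formatter) {
        return customer.name() + " owes " + formatter.money(order.total());
    }
}
```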
Two additional metrics seem informative when considering coupling: the average number of method parameters (PAR) and the information flow (IF = fan-in² × fan-out²), where fan-in refers to the number of classes using a particular class. In all but the GSE study, PAR was higher in the test-first projects. This difference was statistically significant in all the industry studies. In all but the ITL-TF study, IF was higher in the test-first projects.
[Figure 3. Complexity metrics: (a) weighted-methods complexity, (b) cyclomatic complexity per method, and (c) nested block depth per method.]
The PAR and IF measures indicate a high volume of interaction and data passing between units in the test-first projects. This could reflect the increased testing discussed earlier. Test-first developers often report writing more parameters to make a method easier to configure and test. The higher IF values in the test-first projects might indicate high reuse (fan-in).
We were curious about whether the possible increased coupling was good or bad. Coupling can be bad when it's rigid and changes in one module cause changes in another module. However, some coupling can be good, particularly when it's configurable or uses abstract connections such as interfaces or abstract classes. Such code can be highly flexible and thus more maintainable and reusable.
Many test-first programmers make heavy use of interfaces and abstract classes to simplify testing. For instance, the dependency-injection pattern [12] is popular among TDD developers, and it's central to frameworks such as Spring [13], which several projects in our study used. To check this out, we looked at several abstraction metrics, including Robert Martin's abstractness measure (RMA) [1], number of interfaces implemented (NII), number of interfaces (NOI), and number of abstract classes (NOA) in all the projects. Our evaluation of these measures didn't give a conclusive answer to whether the test-first approach produces more abstract designs. However, in most of the studies, the test-first approach resulted in more abstract projects in terms of RMA, NOI, and NII.
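A small sketch of what this looks like in code (the names are invented and the wiring is simplified; this is not code from the study projects): the unit under test depends on a small interface rather than a concrete collaborator, so a test can inject a stub while production configuration, for example via Spring, supplies the real implementation. The extra coupling this introduces is the abstract, configurable kind discussed above.

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Hypothetical constructor-injection example: ReportService is coupled to the
// Repository abstraction, not to a concrete database class.
interface Repository {
    int countDefects(String project);
}

class ReportService {

    private final Repository repository;

    ReportService(Repository repository) {
        this.repository = repository;
    }

    String summary(String project) {
        return project + ": " + repository.countDefects(project) + " open defects";
    }
}

public class ReportServiceTest {

    @Test
    public void summarizesTheDefectCountFromAStubbedRepository() {
        // A one-line stub stands in for the real repository during the test.
        ReportService service = new ReportService(project -> 3);
        assertEquals("demo: 3 open defects", service.summary("demo"));
    }
}
```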
The coupling analysis doesn't reveal clear answers. It appears that test-first programmers might actually tend to write more highly coupled smaller units. However, possible increases in abstractness might indicate that the higher coupling is a good kind of coupling, resulting in more flexible software. The coupling question needs more work.
Impact on cohesion
Cohesion is difficult to measure. The most common metrics look at the sharing (or use) of attributes among methods. We elected to use Brian Henderson-Sellers' definition of lack of cohesion of methods, LCOM5 [14], because it normalizes cohesion values between zero and one. In addition, several popular tools calculate LCOM5.
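For reference, Henderson-Sellers' measure (sometimes written LCOM*) is commonly given by the formula below, where m is the number of methods in the class, a is the number of attributes, and μ(Aj) is the number of methods that reference attribute Aj; a value of 0 indicates maximal cohesion (every method uses every attribute) and 1 indicates none. This is our paraphrase of the definition in [14], so consult the original for the exact formulation.

```latex
% Henderson-Sellers' lack-of-cohesion measure (LCOM5 / LCOM*), paraphrased:
% m = number of methods, a = number of attributes,
% \mu(A_j) = number of methods that reference attribute A_j.
\[
  \mathrm{LCOM5} \;=\; \frac{\tfrac{1}{a}\sum_{j=1}^{a}\mu(A_j) \;-\; m}{1 - m}
\]
% LCOM5 = 0 when every method uses every attribute (maximal cohesion);
% LCOM5 = 1 when each attribute is used by exactly one method.
```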
Figure 4c reports LCOM5 measures for the studies. LCOM is an inverse metric, so lower values indicate better cohesion. The chart indicates that cohesion was better in the test-first projects in half the studies (ICS, ITL-TF, and ITF-TL) and worse in the other half. The difference was statistically significant in only two studies (ICS and ITF-TL).
One known problem with most cohesion metrics is their failure to account for accessor methods [10]. Most cohesion metrics, including LCOM5, penalize classes that use accessor methods. The use of accessor methods is common in Java software, and all the study projects involved Java.

To gauge the impact of this concern, we calculated the percentage of accessor to total methods in all but the ICS studies. The test-first projects had an average of 10 percent more accessors in all but the ITF-TL study. It seems plausible that correcting for the accessor problem would bring the test-first cohesion metrics in line with the test-last measures. We were nevertheless unable to substantiate claims that TDD improves cohesion.
Threats to validity
Like those of most empirical studies, our results are subject to several threats to validity. In particular, the results are based on a small number of developers. Team selection wasn't randomized, and participants knew that we were comparing TDD and non-TDD approaches, leading to a possible Hawthorne effect. Furthermore, in the industry experiments, it was nearly impossible to control all variables except the use or non-use of TDD, while keeping the projects real and in-domain.
We made every effort to ensure that the TDD and non-TDD teams were applying the approach assigned to them. We interviewed developers during the projects and at their conclusion. We observed the undergraduate academic teams in a lab setting and examined multiple in-process code samples to see that automated unit tests were written in step with production code. Still, developers could have misapplied the TDD and non-TDD approaches at some points. We look forward to additional studies in varied domains that will increase the results' validity and broaden their applicability.

[Figure 4. Coupling and cohesion metrics: (a) coupling between objects per project, (b) fan-out per class, and (c) lack of cohesion of methods.]
By focusing on how TDD influences design characteristics, we hope to raise awareness of TDD as a design approach and assist others in decisions on whether and how to adopt TDD. Our results indicate that test-first programmers are more likely to write software in more and smaller units that are less complex and more highly tested. We weren't able to confirm claims that TDD improves cohesion while lowering coupling, but we anticipate ways to clarify the questions these design characteristics raised. In particular, we're working to eliminate the confounding factor of accessor usage in the cohesion metrics.
References

1. R.C. Martin, Agile Software Development: Principles, Patterns, and Practices, Pearson Education, 2003.
2. K. Beck, "Aim, Fire," IEEE Software, Sept./Oct. 2001, pp. 87-89.
3. D. Janzen and H. Saiedian, "Test-Driven Development: Concepts, Taxonomy, and Future Direction," Computer, Sept. 2005, pp. 43-50.
4. B. George and L. Williams, "A Structured Experiment of Test-Driven Development," Information and Software Technology, vol. 46, no. 5, 2004, pp. 337-342.
5. H. Erdogmus, M. Morisio, and M. Torchiano, "On the Effectiveness of the Test-First Approach to Programming," IEEE Trans. Software Eng., vol. 31, no. 3, 2005, pp. 226-237.
6. A. Geras, M. Smith, and J. Miller, "A Prototype Empirical Evaluation of Test Driven Development," Proc. 10th Int'l Symp. Software Metrics (Metrics 04), IEEE CS Press, 2004, pp. 405-416.
7. M.M. Müller, "The Effect of Test-Driven Development on Program Code," Proc. Int'l Conf. Extreme Programming and Agile Processes in Software Eng. (XP 06), Springer, 2006, pp. 94-103.
8. P. Kruchten, The Rational Unified Process: An Introduction, 3rd ed., Addison-Wesley, 2003.
9. L.C. Briand, J.W. Daly, and J. Wüst, "A Unified Framework for Coupling Measurement in Object-Oriented Systems," IEEE Trans. Software Eng., vol. 25, no. 1, 1999, pp. 91-121.
10. L. Briand, J. Daly, and J. Wüst, "A Unified Framework for Cohesion Measurement in Object-Oriented Systems," Empirical Software Eng., vol. 3, no. 1, 1998, pp. 65-117.
11. D. Janzen and H. Saiedian, "A Leveled Examination of Test-Driven Development Acceptance," Proc. 29th Int'l Conf. Software Eng. (ICSE 07), IEEE CS Press, 2007, pp. 719-722.
12. M. Fowler, "Inversion of Control Containers and the Dependency Injection Patterns," 2004; www.martinfowler.com/articles/injection.html.
13. R. Johnson et al., Java Development with the Spring Framework, Wrox, 2005.
14. B. Henderson-Sellers, Object-Oriented Metrics: Measures of Complexity, Prentice Hall, 1996.
About the Authors

David Janzen is an assistant professor of computer science at California Polytechnic State University, San Luis Obispo, and president of Simex, a software consulting and training company. His teaching and research interests include agile methodologies and practices, empirical software engineering, software architecture, and software metrics. He received his PhD in computer science from the University of Kansas and is a member of the IEEE Computer Society and the ACM. Contact him at California Polytechnic State University, Computer Science Department, San Luis Obispo, California 93407; djanzen@calpoly.edu or david@simexusa.com.

Hossein Saiedian is a professor of software engineering in the Department of Electrical Engineering and Computer Science at the University of Kansas and a member of the university's Information and Telecommunication Technology Center. His research interests are in software engineering, particularly technical and managerial models for quality software development. He received his PhD in computer science from Kansas State University. He's a senior member of the IEEE. Contact him at EECS, University of Kansas, Lawrence, KS 66049; saiedian@eecs.ku.edu.