Article

Realizing quality improvement through test driven development: Results and experiences of four industrial teams

Authors:
Nachiappan Nagappan, E. Michael Maximilien, Thirumalesh Bhat, Laurie Williams

Abstract

Test-driven development (TDD) is a software development practice that has been used sporadically for decades. With this practice, a software engineer cycles minute-by-minute between writing failing unit tests and writing implementation code to pass those tests. Test-driven development has recently re-emerged as a critical enabling practice of agile software development methodologies. However, little empirical evidence supports or refutes the utility of this practice in an industrial context. Case studies were conducted with three development teams at Microsoft and one at IBM that have adopted TDD. The results of the case studies indicate that the pre-release defect density of the four products decreased between 40% and 90% relative to similar projects that did not use the TDD practice. Subjectively, the teams experienced a 15–35% increase in initial development time after adopting TDD.
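The minute-by-minute cycle described in the abstract can be sketched with a small, hypothetical Python example (the `slugify` function and its tests are invented for illustration; they are not from the study):

```python
import unittest

# Step 1 (red): in TDD the failing test is written first; at that point
# `slugify` does not exist yet, so running the suite fails.
class TestSlugify(unittest.TestCase):
    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_strips_surrounding_whitespace(self):
        self.assertEqual(slugify("  Agile  "), "agile")

# Step 2 (green): just enough implementation code is written to make
# the failing tests pass.
def slugify(title):
    return "-".join(title.lower().split())

# Step 3 (refactor): with a green suite, the implementation can be
# restructured safely; rerunning the tests guards against regressions.
if __name__ == "__main__":
    unittest.main(exit=False)
```

The cycle then repeats: the next failing test drives the next small increment of implementation code.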


... Software testing is the activity that allows developers to check that the source code works as expected (Pezzè and Young 2008). In the past, a number of researchers have investigated the properties that make test code more effective (Cai and Lyu 2007; Chen et al. 2001; Grano et al. 2020; Nagappan et al. 2005b, 2008) as well as their relation to the ability of catching defects in production code (Catolino et al. 2019b; Chen et al. 2001; Kudrjavets et al. 2006). Researchers have successfully demonstrated that the quality of test suites has a strong correlation with the post-release defects that appear in the production classes they test (Chen et al. 2001; Kochhar et al. 2017), i.e., the higher the test quality, the lower the likelihood that the corresponding production code will be affected by defects. ...
... Other studies found a relation between test effort and product quality (Nagappan et al. 2008;Strecker and Memon 2012) based on other testing metrics such as code coverage (Cai and Lyu 2007;Chen et al. 2001;Nagappan et al. 2008) and other static metrics (e.g., number of assertions) (Nagappan et al. 2005b). Kudrjavets et al. (2006) showed the existence of a high correlation between assertion density and defect-proneness of production code, while Catolino et al. (2019b) showed that this relation may be due to the experience of the testing teams. ...
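The assertion density metric referenced above (Kudrjavets et al. 2006) is conventionally the number of assertions per thousand lines of code. A minimal sketch, assuming a plain-text scan for the `assert` keyword is an acceptable approximation:

```python
def assertion_density(test_source):
    """Assertions per thousand lines of code (KLOC).

    Simplified sketch: counts non-blank lines and treats any line
    containing the token "assert" as one assertion.
    """
    lines = [line for line in test_source.splitlines() if line.strip()]
    assertions = sum(1 for line in lines if "assert" in line)
    return 1000.0 * assertions / len(lines) if lines else 0.0

example = """\
def test_add():
    result = add(2, 2)
    assert result == 4
    assert isinstance(result, int)
"""
# 4 non-blank lines, 2 of them assertions -> 500 assertions per KLOC
```

A production-grade tool would parse the syntax tree rather than match text, but the ratio itself is this simple.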
Article
Full-text available
Testing represents a crucial activity to ensure software quality. Recent studies have shown that test-related factors (e.g., code coverage) can be reliable predictors of software code quality, as measured by post-release defects. While these studies provided initial compelling evidence on the relation between tests and post-release defects, they considered different test-related factors separately: as a consequence, there is still a lack of knowledge of whether these factors remain good predictors when considered all together. In this paper, we propose a comprehensive case study on how test-related factors relate to production code quality in Apache systems. We first investigated how the presence of tests relates to post-release defects; then, we analyzed the role played by the test-related factors previously shown as significantly related to post-release defects. The key findings of the study show that, when controlling for other metrics (e.g., size of the production class), test-related factors have a limited connection to post-release defects.
... Therefore, several studies on the productivity and quality impacts of TDD adoption are found in the literature. A study based on four case studies [3] pointed to a 40-90% reduction in pre-release defect density for software that used TDD compared with similar projects that used a traditional test approach. Other meta-analysis research on the effects of TDD adoption [4] found evidence of improvement in product quality, but did not confirm a productivity difference relative to non-TDD projects. ...
... Other meta-analysis research on the effects of TDD adoption [4] found evidence of improvement in product quality, but did not confirm a productivity difference relative to non-TDD projects. While some studies have indicated significant reductions in productivity, e.g., up to -57%, after the introduction of TDD [3], [5], [6], others indicate the opposite, with gains of +50% [7] and up to +72% in productivity [8]. However, no studies were found that analyzed the productivity impact of TDD in relation to the variation of total project duration and product complexity. ...
... For example, if the product is of low complexity, then 60% of the stories should be of low, 30% of medium, and 10% of high complexity, resulting in the following values for the variable: complexStoryDistrib[1]=0.6, complexStoryDistrib[2]=0.3 and complexStoryDistrib[3]=0.1. ...
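The simulation input in this snippet can be recreated as a small, hypothetical sketch; the labels and function below are invented for illustration and stand in for the snippet's complexStoryDistrib array:

```python
import random

# Hypothetical recreation of the parameter described above: for a
# low-complexity product, 60% of user stories are of low, 30% of
# medium, and 10% of high complexity.
complex_story_distrib = {"low": 0.6, "medium": 0.3, "high": 0.1}

def draw_story_complexities(n, distrib, seed=None):
    """Sample the complexity class of n user stories from the distribution."""
    rng = random.Random(seed)
    levels = list(distrib)
    weights = [distrib[level] for level in levels]
    return rng.choices(levels, weights=weights, k=n)
```

In a discrete-event simulation of the kind the paper describes, each drawn complexity class would then drive the effort model for that story.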
Conference Paper
Full-text available
Project management in the software industry has been constantly evolving its development practices. One practice that has been standing out in recent years is Test Driven Development (TDD). However, there is no consensus in the literature about the impact of TDD on the productivity of development teams. Furthermore, no studies were found that analyzed the productivity impact of TDD practitioners in relation to the variations of total project term and the complexity of the product. Based on three case studies, organized in 18 scenarios, this work used modeling and discrete-event simulation to investigate the impact of TDD on team productivity. The results showed that both factors, total project duration, and product complexity, do influence the productivity of the software development team that adopts TDD practice and the most advantageous scenarios were identified. A Fuzzy Logical System was also implemented that recommends TDD based on these two variables.
... This study is a product of a case study that involves three development teams at Microsoft and one development team at IBM. This study follows up on the two studies previously reviewed in this paper. The final conclusion of this study was (Nagappan et al., 2008): ...
... Threats to validity of the study were identified as (Nagappan et al., 2008): ...
... This means that these projects had detailed requirements documents specification that drove test and development effort and there were also design meetings and review sessions. The legacy team did not use any agile methodologies nor the TDD practice (Nagappan et al., 2008). ...
... Many past studies investigated the properties that make test code more effective [1,2,9,10] and demonstrated that test code quality strongly impacts the number of post-release defects contained in the exercised classes [2,6]. ...
... The results, later confirmed by Rafique and Misic [12], showed a significant influence of the effort invested into testing on code quality. Other studies found an influence of test effort and test-driven development on product quality [10,19] based on other testing metrics such as code coverage [1,2,10] and other static metrics (e.g., number of assertions) [11]. Kudrjavets et al. [7] showed the existence of a high correlation between assertion density and defect-proneness of production code, while Chen and Wong [2] used code coverage for software failure prediction and showed that this metric influences code quality. ...
Conference Paper
Testing is a very important activity whose purpose is to ensure software quality. Recent studies have examined the effects of test-related factors (e.g., code coverage) on software code quality, showing that they have good predictive power on post-release defects. Although these studies demonstrated the existence of a relation between test-related factors and software code quality, they considered different factors separately. That led us to conduct an additional empirical study in which we considered these factors all together. The key findings of the study show that, while post-release defects are strongly related to process and code metrics of the production classes, test-related factors have a limited prediction impact.
... Although there are many existing experimental assessments of unit testing techniques, they are usually constrained by their academic or commercial nature (as in [7], [8], [9]). The purpose of this work is to achieve experimental conditions that are very rarely met in the studies on the subject [10]. ...
... [9]); while others are in the form of analysis after the fact, discussing the known sources about the course of the already executed project (e.g. [8]), • results comparability - an experiment may concern only a single project or a group of unique projects, the features of which may vary, making them hard to compare (e.g. [7]); on the other hand, it may refer to a project repeated several times, each time in similar circumstances and controllable, experiment-relevant conditions, including the possibility of control groups (e.g. ...
... Industrial experiments involve permanently employed company workers [20], though possibly also interns. Studies of industrial cases are sometimes realized in the form of a historical analysis, where the projects were executed with no scientific inquiry in mind [10], [7], [8]. This makes them more realistic, but, at the same time, more obscure for investigation. ...
Article
Full-text available
Context: There is still little evidence on differences between Test-Driven Development and Test-Last Development, especially for real-world projects, so their impact on code/test quality is an ongoing research trend. An empirical comparison is presented, with 19 participants working on an industrial project developed for an energy market software company, implementing real-world requirements for one of the company's customers. Objective: Examine the impact of TDD and TLD on the quality of the code and the tests. The aim is to evaluate if there is a significant difference in external code quality and test quality between these techniques. Method: The experiment is based on a randomized within-subjects block design, with participants working for three months on the same requirements using different techniques, changed from week to week, within three different competence blocks: Intermediate, Novice and Mixed. The resulting code was verified for process conformance. The participants developed only business logic and were separated from infrastructural concerns. A separate group of code repositories was used to work without unit tests, to verify that the requirements were not too easy for the participants. Also, it was analysed whether there is any difference between the code created by shared efforts of developers with different competences and the code created by participants isolated in the competence blocks. The resulting implementations had an LOC order of magnitude of 10k. Results: A statistically significant advantage of TDD over TLD in terms of external code quality (1.8 fewer bugs) and test quality (5 percentage points higher). Additionally, TDD narrows the gap in code coverage between developers from different competence blocks. At the same time, TDD proved to have a considerable entry barrier and was hard to follow strictly, especially by Novices. Still, no significant difference w.r.t.
code coverage has been observed between the Intermediate and the Novice developers - as opposed to TLD, which was easier to follow. Lastly, isolating the Intermediate developers from the Novices had a significant impact on the code quality. Conclusion: TDD is a recommended technique for software projects with a long horizon or when it is critical to minimize the number of bugs and achieve high code coverage.
... Although short cycles do not prevent software bugs or issues, these are discovered much earlier, and therefore the reinforcing issue-regeneration process (i.e., bug proliferation) is prevented (see Figure 2). Significant reductions of 40-90% in defect density are reported when teams perform frequent tests, as they do under Agile approaches (Nagappan et al., 2008). Our case analysis illustrates that SoftCo took about 39 weeks to try to fix all issues (Episode C and a part of Episode D) in an attempt to deliver a stable system to ManuCo. ...
... Interestingly, if SoftCo had delayed the start of its software development and waited for the validation equipment to become available, it could have maintained an Agile development approach. It is very likely that frequent testing would have subsequently reduced the time needed for fixing issues and stabilizing the system (Nagappan et al., 2008). This is depicted in Figure 3. ...
Article
Full-text available
Increasingly, the development of today’s “smart” products requires the integration of both software and hardware in embedded systems. To develop these, hardware firms typically enlist the expertise of software development firms to offer integrated solutions. While hardware firms often work according to a plan-driven approach, software development firms draw on Agile development methods. Interestingly, empirically little is known about the implications and consequences of working according to contrasting development methods in a collaborative project. In response to this research gap, we conducted a process study of a collaborative development project involving a software firm and a hardware firm, within which the two firms worked according to contrasting development methods. We found that the software firm was gradually compelled to forgo its Agile method, creating a role conflict in terms of its way of working. As such, our results contribute to the literature on Agile–Stage-Gate hybrids by demonstrating how, in collaborative embedded systems development, hybridization of development methods may cause projects to fail. Our main practical implication entails the introduction of the “sequential Agile approach.”
... In this paper, we discuss experiences on how Ericsson tackled the problem of crafting a software product for the financial sector, not only migrating to a modern programming language and a Service Oriented Architecture, but also using modern ways of developing the software such as test-driven development, integration tests, continuous integration, clean code, learning by doing, mandatory solution review and simple communication. Several studies analyze the effects that TDD has on code quality and defect rate [12,10], though few ...
... Although it has earlier roots in the Smalltalk community, the term Test-Driven Development was popularized in the late 1990s and described as part of the Extreme Programming process, [1,2]. Several scientific studies have analyzed the effects of TDD e.g., [12,10,5]. ...
Conference Paper
Full-text available
In this paper, we present experiences from eight years of developing a financial transaction engine, using what can be described as an integration-test-centric software development process. We discuss the product and the relation between three different categories of its software and how the relative weight of these artifacts has varied over the years. In addition to the presentation, some challenges and future research directions are discussed.
... Often, both of these practices are used together to reap maximum benefits. Some of the benefits of AUT include reduced time to test, discover bugs, fix bugs, and implement new features; wider and measurable test coverage; reproducibility, reusability, consistency, and reliability of tests; improved accuracy, regression testing, parallel testing, faster feedback cycle, reduced cost, and higher team morale [15,31,17,40,27]. Likewise, TDD has been shown to result in benefits such as flexible and adaptive program design, cleaner interfaces, higher code quality, maintainability, extensibility, reliability, detailed evolving specification, reduced time on bug fixes and feature implementation, reliable refactoring and code changes, reduced cost, reduced development time, and increased programmer productivity [28,10,19,6,1,3,14,36,20,5,33]. ...
... These results indicate that unit test automation is beneficial, but additional quality improvements may be realized if the tests are written iteratively, as in TDD. Nagappan et al. [27] conducted case studies with three development teams at Microsoft and one at IBM that had adopted TDD. The results of the case studies indicate that the pre-release defect density of the four products decreased between 40% and 90% relative to similar projects that did not use TDD. ...
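For context, pre-release defect density is conventionally measured as defects per thousand lines of code (KLOC). A small sketch with invented numbers, only to make the reported 40-90% range concrete:

```python
def defect_density(defects, kloc):
    """Pre-release defect density: defects per thousand lines of code."""
    return defects / kloc

# Illustrative numbers (assumed, not from the study): a comparable
# non-TDD project with 50 pre-release defects in 10 KLOC has density
# 5.0; the reported 40-90% reduction would put a comparable TDD
# project in the 0.5-3.0 defects/KLOC range.
baseline = defect_density(50, 10)   # 5.0 defects per KLOC
after_40pct = baseline * 0.6        # 3.0: a 40% reduction
after_90pct = baseline * 0.1        # 0.5: a 90% reduction
```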
Article
The best practices of agile software development have had a significant positive impact on the quality of software and time-to-delivery. As a result, many leading software companies employ some form of agile software development practices. Some of the most important best practices of agile software development, which have received significant attention in recent years, are automated unit testing (AUT) and test-driven development (TDD). Both of these practices work in conjunction to provide numerous benefits. AUT leads to reduced time to test, discover bugs, fix bugs, and implement new features; wider and measurable test coverage; reproducibility, reusability, consistency, and reliability of tests; improved accuracy, regression testing, parallel testing, faster feedback cycle, reduced cost, and higher team morale. The benefits of TDD include flexible and adaptive program design, cleaner interfaces, higher code quality, maintainability, extensibility, reliability, detailed evolving specification, reduced time on bug fixes and feature implementation, reliable refactoring and code changes, reduced cost, reduced development time, and increased programmer productivity. Unfortunately, students in introductory programming courses are generally not introduced to AUT and TDD. This leads to the development of bad programming habits and practices which become harder to change later on. By introducing the students earlier to these industry-standard best practices, not only can the motivation and interest of students in this area be increased, but also their academic success and job marketability can be enhanced. This paper presents the detailed design and efficacy study of an introductory C++ programming course designed using the principles of AUT and TDD. The paper presents the pedagogical techniques employed to build industry-proven agile software development practices in the students.
As part of the paper, all the course material including the source code for the labs and the automated unit tests are being made available to encourage people to incorporate these best practices into their curricula.
... Microsoft researchers [1], in the paper "Empirical Software Engineering at Microsoft Research", described TDD as the art of writing failing unit tests and then writing implementation code that will make the failing tests pass. This description seems to be the widely accepted definition of TDD, as echoed by H. Erdogmus in his paper "Effectiveness of Test-first Approach to Programming" [4] as well as Nachiappan Nagappan in his 2008 publication [6], and several other research articles on TDD also agree with and assert this definition [12]. There have been varying results on the strengths and weaknesses of TDD, which at most times have been similar, although in a few cases the results have been very different. ...
... However, in George and Williams' experiment [10], [11], they found that TDD did improve the productivity and effectiveness of the experiment's subjects, as well as lead to high test coverage. The results and experiences of the research conducted with four industrial teams, as observed by Thirumalesh Bhat, Nachiappan Nagappan, Maximilien, and L. Williams [6], summarizing the Microsoft and IBM studies, indicated a decrease of about 40%-90% in the pre-release defect density of the four projects observed, when compared with similar projects that didn't use the TDD practice. ...
Thesis
Full-text available
UNIVERSITY OF TARTU, Institute of Computer Science, Software Engineering Curriculum. Meya Stephen Kenigbolo, "A Case Study of Test-Driven Development", Master's thesis (30 ECTS). Supervisor: Dietmar Alfred Paul Kurt Pfahl; Co-supervisor: Kaarel Kotkas. Tartu, 2017.
... The discipline of testing and quality assurance extends into defect prevention, where bugs found in released code are addressed in favor of future functionality (by agreement with the customer) (Patel & Ramachandran, 2009b). Root cause analysis is applied to the defect, the cause is addressed, and tests are developed to identify the defect and ensure non-recurrence, an approach which has been shown to increase both quality and the overall sustainable velocity of delivery (Nagappan et al., 2008). ...
... H0-10: Implementing defect prevention and root cause analysis in favor of future functionality is not positively associated with the teams' perceived project success. (Nagappan et al., 2008). ...
Conference Paper
Aim/Purpose: Given the underlying philosophy of the agile manifesto, this study investigates whether an increase in agile maturity is associated with improved perceived project success. Background: The underlying philosophy of the agile manifesto is embodied in principle one, which promotes the continuous delivery of software that is deemed valuable by the customer, while principle twelve encourages continual improvement of the delivery process. This constant improvement, or maturity, is not a concept unique to agile methods and is commonly referred to as a maturity model. The most common maturity model is the Capability Maturity Model Integrated (CMMI). However, research consensus indicates CMMI might not be fully compatible with agile implementation at higher levels of maturity without sacrificing agility. Agile maturity models (AMM), which are aligned to agile principles, encourage continuous improvement while maintaining agility. Methodology: The study employs a conceptual model based on an existing agile maturity model that is related to perceived project success. Using an objectivist perspective, a quantitative method was employed to analyze the results of an online survey of agile practitioners. Contribution: The significant contribution from this research is the validation of the conceptual model relating the activities and maturity levels of the AMM as the independent variables to the dependent variable of perceived project success. Findings: The data analysis found that a significant positive correlation exists between maturity levels and perceived project success. The strongest correlation was found at the highest maturity level, with relatively weaker correlation at the lower levels of maturity. It can thus be concluded that a higher level of maturity in the AMM is positively associated with perceived project success.
Recommendations for Practitioners: The study has practical implications in highlighting that performance management, requirements management, regular delivery and customer availability are key areas to focus on to establish and continually improve the success of agile implementations. This study further assists practitioners in systematically identifying the critical agile activities, such as the use of story cards, continuous delivery and the presence of a knowledgeable customer. Recommendation for Researchers: The contributions of this study for academics is the confirmation of the maturity model developed by Patel and Ramachandran (2009a). This study also shows the association between the individual activities within the maturity levels as well as the maturity levels and the perceived project success, addressing a gap in literature relating these concepts. Future Research: It would be useful to replicate this study whilst following a qualitative approach. The study could also be replicated with a sample consisting of agile project customers.
... It emphasizes writing unit tests before production code, aiming to improve design, increase code coverage, and reduce defects. Interviews with practitioners (e.g. in [3], [4]), case studies in large scale software organizations (e.g. in [5], [6]) and experiments with different developer types (with students in [7], with practitioners in [8]) revealed that both academia and industry are still very much interested in understanding TDD, and observing its benefits on a variety of factors in different types of settings. ...
... In an earlier study, we also reported an industry experiment with 24 professionals to investigate the effects of TDD on external quality and productivity compared to an incremental test-last approach [8]. We designed the experiment with more participants than [19] and [22], reduced confounding factors like pair programming and personal software process, organized a three-day workshop including intensive training and exercises, and defined more granular measures for external quality and productivity. We chose three tasks, one for control and two for two TDD sessions. ...
Article
Reviews on test-driven development (TDD) studies suggest that the conflicting results reported in the literature are due to unobserved factors, such as the tasks used in the experiments, and highlight that there are very few industry experiments conducted with professionals. The goal of this study is to investigate the impact of a new factor, the chosen task, and the development approach on external quality in an industrial experiment setting with 17 professionals. The participants are junior to senior developers in programming with Java, beginner to novice in unit testing and JUnit, and they have no prior experience in TDD. The experimental design is a 2×2 cross-over, i.e., we use two tasks for each of the two approaches, namely TDD and incremental test-last development (ITLD). Our results reveal that both development approach and task are significant factors with regards to the external quality achieved by the participants. More specifically, the participants produce higher quality code during ITLD, in which splitting user stories into subtasks, coding, and testing activities are followed, compared to TDD. The results also indicate that the participants produce higher quality code during the implementation of Bowling Score Keeper, compared to that of Mars Rover API, although they perceived both tasks as of similar complexity. An interaction between the development approach and task could not be observed in this experiment. We conclude that variables that have not been explored so often, such as the extent to which the task is specified in terms of smaller subtasks, and developers' unit testing experience, might be critical factors in TDD experiments. The real-world application of TDD and its implications on external quality remain challenging unless these uncontrolled and unconsidered factors are further investigated by researchers in both academic and industrial settings.
... This way, software requirements are converted to specific evaluations of the functionality. TDD has been demonstrated to provide a variety of different benefits, such as better productivity [5], better quality [6] and higher test coverage [7]. ...
... Embedded system design frequently implies the development of custom processors or accelerators (using, for example, a reconfigurable logic fabric) that will be later integrated in the final platform. Hence, it is interesting to explore the utilization of TDD during this process and benefit from the gain in productivity [5], quality [6] and high test coverage [7] as it has been proven in the software realm. Figure 2 illustrates the workflow to be followed by the user of RC-Unity, identifying the manual (user intervention) and automatic steps, together with the principal inputs and outputs to/from the individual processes. ...
Article
Full-text available
High-Level Synthesis (HLS) tools help engineers to deal with the complexity of building heterogeneous embedded systems that make it use of reconfigurable technology. Also, HLS opens up a way for introducing, into the development flow of custom hardware components, techniques well known in the software industry such as Test-Driven Development (TDD). However, the support provided by HLS tools for verification activities is limited, and it is usually focused on the initial steps of the design process. In this paper, a hardware testing framework is introduced as an enabler for effortless on-board verification of components by applying the Unit Testing Paradigm and, hence, realizing TDD on reconfigurable hardware. The proposed solution comprises a hardware/software introspection infrastructure to verify modules of a system at different stages, spawning multiple abstraction levels without extra effort nor redesigning the component. Our solution has been implemented for the Xilinx ZynQ FPGA-SoC architecture and applied to the verification of C-kernels within the CHStone Benchmark. Effortless integration into the Xilinx Vivado design flow and tools is supported by a set of automatic generation scripts developed for this end. Experimental results show a considerable speedup of the verification time and unveils inaccuracies concerning the co-simulation estimation obtained by Xilinx tools when compared with the on-board latency measured by our framework.
... However, the current work is not directly linked to Principle 9 of the manifesto. It investigates various topics in isolation, like pair programming (e.g., [10,19,22]), refactoring (e.g., [16,30,32]) and test-driven development (e.g., [9,26,33]). Whether these practices are influenced by Principle 9 is still not well understood. ...
Conference Paper
Full-text available
"Technical excellence" is a nebulous term in agile software development. This vagueness is risky, as it creates a gap in the understanding of agile that may have consequences on how software development practitioners operate. Technical excellence is the only reference to quality in the agile manifesto. Hence, it is fundamental to understand how agile software development practitioners both interpret and implement it. We conducted interviews with twenty agile practitioners about their understanding of the term "technical excellence" and how they approach the task of fostering it. To validate the findings, two focus group meetings were conducted after the interviews and the analysis of the data were completed. We found that the configuration of technical excellence is made of four traits: (1) software craftsmanship; (2) software quality (3) mindset for excellence; and (4) consistency with good software engineering practices. Fostering technical excellence is a continuous endeavor. Further, we identified three key principles that were commonly cited as essential to implementing technical excellence, namely: 1) continuous learning; 2) continuous improvement; and 3) control of excellence. Based on our findings, we present several recommendations for software development teams seeking to better realize the goal of technical excellence in their agile implementation.
... In software engineering, test-driven development ensures that in response to a defined input a piece of code generates the expected output [35]. Distributed version control represents an efficient way of tracking and merging changes made by a group of people working on the same project [36]. ...
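The input/output contract described above can be sketched as a minimal TDD cycle. The `slugify` function below is a hypothetical example invented for illustration, not code from the cited paper:

```python
# A minimal sketch of the TDD input/output contract; slugify() is a
# hypothetical example function, not taken from any cited work.

def test_slugify_joins_lowercased_words():
    # Step 1 (red): this test is written first and fails while slugify
    # does not exist or is incomplete.
    assert slugify("Test Driven Development") == "test-driven-development"

# Step 2 (green): just enough implementation to make the test pass.
def slugify(title):
    return "-".join(title.lower().split())

test_slugify_joins_lowercased_words()  # passes once slugify exists
print("test passed")
```

In a real TDD session this red/green step would be followed by refactoring while the test keeps guarding the defined input against the expected output.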
Preprint
Full-text available
Several studies have shown that neither the formal representation nor the functional requirements of genome-scale metabolic models (GEMs) are precisely defined. Without a consistent standard, comparability, reproducibility, and interoperability of models across groups and software tools cannot be guaranteed. Here, we present memote (https://github.com/opencobra/memote), an open-source software package containing a community-maintained, standardized set of metabolic model tests. The tests cover a range of aspects from annotations to conceptual integrity and can be extended to include experimental datasets for automatic model validation. In addition to testing a model once, memote can be configured to do so automatically, i.e., while building a GEM. A comprehensive report displays the model's performance parameters, which supports informed model development and facilitates error detection. Memote provides a measure of model quality that is consistent across reconstruction platforms and analysis software and simplifies collaboration within the community by establishing workflows for publicly hosted and version-controlled models.
... Second, we took into account the effectiveness of test cases in terms of code coverage (Gopinath et al. 2014) and assertion density (Catolino et al. 2019b; Kudrjavets et al. 2006). Third, we assessed the relation between test cases and post-release defects (Chen et al. 2001; Nagappan et al. 2005, 2008; Pecorelli et al. 2020a), in an effort to understand how well tests can prevent the introduction of defects in production code. For these reasons, we defined the following research questions: ...
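Assertion density, as used in the studies cited above, is commonly computed as the number of assertions divided by the lines of test code. A rough sketch of that metric, using Python's `ast` module (it only counts bare `assert` statements, so framework calls like `self.assertEqual` would need extra handling):

```python
import ast

def assertion_density(test_source):
    """Count assert statements per non-blank line of test code.

    Simplified: only bare `assert` statements are counted, not
    unittest-style method calls such as self.assertEqual(...).
    """
    tree = ast.parse(test_source)
    asserts = sum(isinstance(node, ast.Assert) for node in ast.walk(tree))
    lines = [l for l in test_source.splitlines() if l.strip()]
    return asserts / len(lines) if lines else 0.0

source = (
    "def test_add():\n"
    "    assert 1 + 1 == 2\n"
    "    assert 2 + 2 == 4\n"
)
print(round(assertion_density(source), 2))  # 2 asserts over 3 lines: 0.67
```

A higher density suggests each line of test code checks more behavior, which is the intuition behind relating it to defect prevention.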
Article
Full-text available
These days, over three billion users rely on mobile applications (a.k.a. apps) on a daily basis to access high-speed connectivity and all kinds of services it enables, from social to emergency needs. Having high-quality apps is therefore a vital requirement for developers to stay on the market and acquire new users. For this reason, the research community has been devising automated strategies to better test these applications. Despite the effort spent so far, most developers write their test cases manually without the adoption of any tool. Nevertheless, we still observe a lack of knowledge on the quality of these manually written tests: an enhanced understanding of this aspect may provide evidence-based findings on the current status of testing in the wild and point out future research directions to better support the daily activities of mobile developers. We perform a large-scale empirical study targeting 1,693 open-source Android apps, aiming to assess (1) the extent to which these apps are actually tested, (2) how well designed the available tests are, (3) how effective they are, and (4) how well manual tests can reduce the risk of having defects in production code. In addition, we conduct a focus group with 5 Android testing experts to discuss the findings achieved and gather insights into the next research avenues to undertake. The key results of our study show Android apps are poorly tested and the available tests have low (i) design quality, (ii) effectiveness, and (iii) ability to find defects in production code. Among the various suggestions, testing experts report the need for improved mechanisms to locate potential defects and deal with the complexity of creating tests that effectively exercise the production code.
... Another study at IBM and Microsoft, done in part by the same authors, found that defects decreased drastically, yet productivity declined with the introduction of TDD [72]. In the absence of clear evidence for TDD, developers might simply choose not to employ it. ...
Article
Software testing is one of the key activities for software quality in practice. Despite its importance, however, we have a remarkable lack of knowledge on how developers test in real-world projects. In this paper, we report on the surprising results of a large-scale field study with 2,443 software engineers whose development activities we closely monitored over the course of 2.5 years in four Integrated Development Environments (IDEs). Our findings question several commonly shared assumptions and beliefs about developer testing: half of the developers in our study do not test; developers rarely run their tests in the IDE; only once they start testing do they do it heftily; most programming sessions end without any test execution; only a quarter of test cases are responsible for three quarters of all test failures; 12% of tests show flaky behavior; Test-Driven Development (TDD) is not widely practiced; and software developers only spend a quarter of their time engineering tests, whereas they think they test half of their time. We observed only minor differences in the testing practices among developers in different IDEs, Java, and C#. We summarize these practices of loosely guiding one's development efforts with the help of testing as Test-Guided Development (TGD).
... Siniaalto [6] provides an overview of the experiments regarding TDD and productivity. Nagappan [7] describes the effects of TDD in four industrial case studies. Muller and Padberg [8] claim that the lifecycle benefit introduced by TDD outweighs its required investment. ...
Article
Due to embedded co-design considerations, testing embedded software is typically deferred until after the integration phase. In contrast with current embedded engineering practice, Test-Driven Development (TDD) promotes testing software during its development, even before the target hardware becomes available. Principally, TDD promotes a fast feedback cycle in which a test is written before the implementation. Moreover, each test is added to a test suite, which runs at every step of the TDD cycle. As a consequence, test-driven code is well tested and maintainable. Still, embedded software has some typical properties that impose challenges on applying the TDD cycle. Essentially, uploading software to the target is generally too time-consuming to run tests on the target frequently. Secondary issues are hardware dependencies and limited resources, such as memory footprint or processing power. To deal with these limitations, four methods have been identified and evaluated. Furthermore, a number of relevant design patterns are discussed for applying TDD in an embedded environment.
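One common way around the hardware-dependency problem mentioned above is to inject the hardware interface so a test double can stand in for it on the host. A minimal sketch, with a hypothetical LED driver (the register layout and function names are invented for illustration):

```python
from unittest import mock

# Hypothetical LED driver: the hardware register write is injected, so
# the logic can be exercised on a host machine before the target board
# is available, keeping the TDD cycle fast.
def blink_once(write_register, led_mask=0x01):
    """Set the LED bit, then clear it."""
    write_register(led_mask)
    write_register(0x00)

# On the host, a mock stands in for the memory-mapped register.
fake_register = mock.Mock()
blink_once(fake_register)
fake_register.assert_has_calls([mock.call(0x01), mock.call(0x00)])
print("on-host test passed")
```

The same driver code can later run against the real register-write routine on the target, so tests do not need to be rewritten when the hardware arrives.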
... Developers continue in edit-run cycles, writing code and running unit tests until all desired functionality is implemented. This approach has been found to reduce defects and improve code quality [30]- [32]. ...
Preprint
Full-text available
As developers program and debug, they continuously edit and run their code, a behavior known as edit-run cycles. While techniques such as live programming are intended to support this behavior, little is known about the characteristics of edit-run cycles themselves. To bridge this gap, we analyzed 28 hours of programming and debugging work from 11 professional developers, encompassing over three thousand development activities. We mapped activities to edit or run steps, constructing 581 debugging and 207 programming edit-run cycles. We found that edit-run cycles are frequent. Developers edit and run the program, on average, 7 times before fixing a defect and twice before introducing a defect. Developers waited longer before running the program again when programming than when debugging, with a mean cycle length of 3 minutes for programming and 1 minute for debugging. Most cycles involved an edit to a single file, after which a developer ran the program to observe the impact on the final output. Edit-run cycles which included activities beyond edit and run, such as navigating between files, consulting resources, or interacting with other IDE features, were much longer, with a mean length of 5 minutes rather than 1.5 minutes. We conclude with a discussion of design recommendations for tools to enable more fluidity in edit-run cycles.
... This result falls between the 18% [25] and 45% [20] observed in the two most closely related studies in the literature. In practice, Nagappan et al. [40] also observed a significant difference in the defect density of a release of a software project (completed with a reduced TDD adoption level due to shortcuts taken by the team) compared to previous releases (completed with full TDD adoption). This result may explain why TDD is the most popular agile practice, described as the first step of a waterfall project going "agile" [16]. ...
Article
Full-text available
The adoption of agile practices in software projects has been met with scepticism by practitioners, with concerns about the actual effectiveness of these practices. Using system dynamics, this study investigates the impact of four popular agile practices, Test-Driven Development, Pair Programming, On-site Customer Involvement and Pair Testing, on the quality of continuous delivery projects. The system dynamics model, called the predictive continuous delivery model, was developed with extensive use of the existing literature, supported by a survey, interviews, historical data and experts' judgement. Simulation results showed that all the investigated agile practices except pair programming have a significant impact on the quality of continuous delivery projects.
... The authors state that the use of TDD increased programming time by 16% [28]. Along the same lines, the case studies presented in the works of Nagappan et al. [29] and Canfora et al. [30] reached similar conclusions with developers at companies such as Soluziona Software Factory, IBM, and Microsoft. ...
... Taken separately, both unit testing and performance testing are well-established quality control activities. On the unit testing side, studies such as [12] and [25] report successful defect reduction in industrial settings. On the performance testing side, studies such as [35] provide an early summary of the issues, and papers such as [36] contain a broader overview of the software performance engineering challenges. ...
Conference Paper
Although methods and tools for unit testing of performance have existed for over a decade, anecdotal evidence suggests that unit testing of performance is not nearly as common as unit testing of functionality. We examine this situation in a study of GitHub projects written in Java, looking for occurrences of performance evaluation code in common performance testing frameworks. We quantify the use of such frameworks, identify the most relevant performance testing approaches, and describe how we adjust the design of our SPL performance testing framework to follow these conclusions.
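The simplest shape a performance unit test takes is a timing assertion against a static threshold (coarse, as the surrounding text notes, but common). A hedged sketch with a hypothetical function under test; the threshold and repetition counts are illustrative, not from the cited frameworks:

```python
import timeit

def parse_record(line):
    """Hypothetical code under performance test."""
    return line.split(",")

def test_parse_record_latency():
    # A static-threshold performance unit test: coarse but common.
    # Taking the best of several repetitions reduces noise from the OS
    # scheduler and other processes.
    timings = timeit.repeat(lambda: parse_record("a,b,c,d"),
                            number=10_000, repeat=5)
    assert min(timings) < 1.0, "10k parses should finish well under 1 s"

test_parse_record_latency()
print("performance test passed")
```

Such static thresholds silently pass as long as performance stays under the bound, which is exactly why gradual regressions can go undetected, motivating the statistical approaches discussed in the entries below.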
... George and Williams [14] conducted a case study with professional programmers which states that with TDD under an iterative and incremental life cycle, assisted by failure detection, programming time increased by 16% compared with traditional development approaches such as waterfall. The case studies presented in the works of Nagappan et al. [15] and Canfora et al. [16] also reached similar conclusions with developers at companies such as Soluziona Software Factory, IBM and Microsoft. ...
Conference Paper
Full-text available
In this paper, we analyze the application and contribution of Test-Driven Development (TDD) and Behavior-Driven Development (BDD) to the teaching of Software Engineering. As empirical research, we present an experiment conducted in the Software Engineering Laboratory (LES) course of a private university (Tiradentes University) in the Bachelor of Computer Science and Information Systems programs. This experiment focused on verifying the learning difficulties of students. Collected data were subjected not only to quantitative analysis but also to appropriate statistical analysis. The results showed a reduction in student absences, a higher student satisfaction rate and higher grades in the courses. Furthermore, our approach allowed students to deliver a product in a short period, suggesting that BDD can be adopted thanks to this successful learning experience.
... Test-driven development (TDD) has become popular among developers because it is one of the most important practices in any agile development method [3]. In TDD, a developer considers the test cases and scenarios first. ...
... Table 2 lists the similar project characteristics found in the two cases or development projects in terms of context factors and software product measure factors. The factor list is taken from the work in [6][7][8]. We notice that both kinds of factors, especially the contextual factors are comparable and rather similar between the two case studies. ...
Article
Full-text available
In a case study where a Dutch small-to-medium enterprise (SME) implemented test-driven development and continuous integration, researchers observed that the SME discovered a higher number of defects compared to a baseline case study, and that there was an increase in the focus on quality and test applications.
... TDD has been assessed from a quantitative point of view (e.g., [8,9]) and from a qualitative perspective (e.g., [10,11]). A number of primary studies, such as experiments and case studies, have been conducted on TDD (e.g., [8,9,12,13,14]). Their results, gathered and combined in a number of secondary studies (e.g., [6,15,16,17,18,19]), do not fully support the claimed benefits of TDD. ...
Preprint
In this paper, we investigate the effect of TDD, as compared to a non-TDD approach, as well as its retainment (or retention) over a time span of (about) six months. To pursue these objectives, we conducted a (quantitative) longitudinal cohort study with 30 novice developers (i.e., third-year undergraduate students in Computer Science). We observed that TDD affects neither the external quality of software products nor developers' productivity. However, we observed that the participants applying TDD produced significantly more tests, with a higher fault-detection capability than those using a non-TDD approach. As for the retainment of TDD, we found that TDD is retained by novice developers for at least six months.
... Researchers have studied TDD through case studies (e.g. Nagappan et al. 2008), surveys (e.g. Aniche and Gerosa 2010) and experiments (e.g. ...
Article
Full-text available
Existing empirical studies on test-driven development (TDD) report different conclusions about its effects on quality and productivity. Very few of those studies are experiments conducted with software professionals in industry. We aim to analyse the effects of TDD on the external quality of the work done and the productivity of developers in an industrial setting. We conducted an experiment with 24 professionals from three different sites of a software organization. We chose a repeated-measures design, and asked subjects to implement TDD and incremental test last development (ITLD) in two simple tasks and a realistic application close to real-life complexity. To analyse our findings, we applied a repeated-measures general linear model procedure and a linear mixed effects procedure. We did not observe a statistical difference between the quality of the work done by subjects in both treatments. We observed that the subjects are more productive when they implement TDD on a simple task compared to ITLD, but the productivity drops significantly when applying TDD to a complex brownfield task. So, the task complexity significantly obscured the effect of TDD. Further evidence is necessary to conclude whether TDD is better or worse than ITLD in terms of external quality and productivity in an industrial setting. We found that experimental factors such as selection of tasks could dominate the findings in TDD studies.
... Table 2 lists the similar project characteristics found in the two cases or development projects in terms of context factors and software product measure factors. The factor list is taken from the work in [6][7][8]. We notice that both kinds of factors, especially the contextual factors are comparable and rather similar between the two case studies. ...
Article
Full-text available
In this article we describe the implementation of hybrid agile practices, namely Test Driven Development (TDD) and Continuous Integration (CI) at a Dutch SME. The quality and productivity outcomes of the case study were compared to a performance baseline set by a reference case, a preceding development project of similar context, size, complexity and team. We observed that on applying TDD and CI, a higher number of defects were discovered compared to the baseline case. The team members at the Dutch SME perceived an increase in the focus on quality and test applications, while considering customer acceptance. As a result of the case study, the Dutch SME now has an infrastructure in place to further evaluate software process improvement (SPI) initiatives.
... The TDD process is also known as test-first or red-green-refactor. Today, TDD is widely adopted in industry, including at large software firms such as Microsoft and IBM [1]. TDD has also gained in popularity with the introduction of the eXtreme Programming (XP) methodology [2], and it is sometimes used as a stand-alone approach for software engineering maintenance tasks such as adding features to legacy code [3]. ...
Article
Full-text available
Test-Driven Development (TDD) is a software development approach in which test cases are written before the actual code in iterative cycles. Context: TDD has gained the attention of many software practitioners during the last decade, since it contributes several benefits to the software development process. However, empirical evidence of its superiority in terms of internal code quality, external code quality and productivity is fairly limited. Objective: The aim of this controlled experiment with professional Java developers is to assess the impact of TDD on internal code quality, external code quality and productivity compared to Test-Last Development (TLD). Results: Experiment results indicate that the differences in the number of acceptance test cases passed, McCabe's cyclomatic complexity, branch coverage, lines of code per person-hour and user stories implemented per person-hour are statistically insignificant. However, static code analysis results were statistically significant in favor of TDD. Moreover, a survey revealed that the majority of developers in the experiment prefer TLD over TDD, given the shallower learning curve and the smaller effort needed to understand and employ TLD compared to TDD.
Conference Paper
We describe how and why we use test automation and refactoring in the design and evolution of software systems. Writing and running automated tests against the system creates mutual feedback loops that guide design decisions in both the system and the test code.
Conference Paper
Engineering performance-critical systems often requires manual, expensive fine-tuning of critical application parts such as start-up routines, authentication sequences and transactions. It is highly desirable to protect this investment by regression tests that indicate when performance characteristics such as memory usage or thread allocation change. While traditional testing techniques can be used, they are often too coarse, as systems are tested against static thresholds, and therefore important changes that can result in declining system performance will not be detected. To address these limitations, we propose a novel approach to performance regression testing based on automatically generated statistical test oracles. Machine learning methods are used to detect deviations from the profiles shown in these oracles. We present Buto, a proof-of-concept tool tightly integrated into the JUnit testing framework that can be used to test applications executed on the Java virtual machine (JVM). Buto uses data obtained by transparently monitoring applications through Java Management Extensions (JMX). In this paper we describe the Buto framework and demonstrate how to calibrate the tool using an evaluation based on a set of benchmarking examples.
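The idea of a statistical test oracle described above can be sketched as learning an acceptance band from baseline measurements. This is a simplified stand-in for the paper's approach (the real Buto tool monitors JVM metrics via JMX and uses machine learning); the mean-plus-k-standard-deviations rule here is an illustrative assumption:

```python
import statistics

def build_oracle(baseline_samples, k=3.0):
    """Learn an acceptance band (mean +/- k * stdev) from baseline runs.

    Returns a predicate that flags samples falling outside the profile,
    replacing a hand-picked static threshold with a learned one.
    """
    mean = statistics.mean(baseline_samples)
    stdev = statistics.stdev(baseline_samples)
    low, high = mean - k * stdev, mean + k * stdev
    return lambda sample: low <= sample <= high

baseline = [102, 98, 101, 99, 100, 103, 97]  # e.g. MB of heap per test run
within_profile = build_oracle(baseline)
print(within_profile(101))   # True: matches the learned profile
print(within_profile(150))   # False: flagged as a potential regression
```

Unlike a static threshold, re-learning the band from fresh baselines lets the oracle track legitimate shifts in the performance profile across releases.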
Conference Paper
Test-Driven Development (TDD) has been claimed to increase external software quality. However, the extent to which TDD increases external quality has been seldom studied in industrial experiments. We conduct four industrial experiments in two different companies to evaluate the performance of TDD on external quality. We study whether the performance of TDD holds across the premises of the same company and across companies. We identify participant-level characteristics impacting results. Iterative-Test Last (ITL), the reverse approach of TDD, outperforms TDD in three out of four premises. ITL outperforms TDD in both companies. The larger the experience with unit testing and testing tools, the larger the difference in performance between ITL and TDD (in favour of ITL). Technological environment (i.e., programming language and testing tool) seems not to impact results. Evaluating participant-level characteristics impacting results in industrial experiments may ease the understanding of TDD’s performance in realistic settings.
Article
As the scale and complexity of software increase, the number of tests needed for effective validation becomes extremely large. Executing these large test suites is expensive, both in terms of time and energy. Cache misses are a significant contributing factor to the execution time of software. We propose an approach that orders test executions in a test suite in such a way that instruction cache misses are reduced; the ordering uses a distance metric based on the basic blocks visited during test executions and nearest neighbour analysis. We also ensure that the approach scales to large test suite sizes. We conduct an empirical evaluation with 20 subject programs and test suites from the SIR repository, the EEMBC suite, and LLVM Symbolizer, comparing execution times and cache misses of test orderings maximising instruction locality against a traditional ordering maximising coverage and random permutations. We also assess the overhead of the algorithms in generating the orderings that optimise cache locality. The nature of the programs and tests impacts the performance gained with our approach; gains were considerable for programs and test suites where the average number of different instructions executed between tests was high. We achieved an average execution speedup of 6.83% and a maximum execution speedup of 17% over subject programs with differing control flow between test executions.
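The ordering idea can be illustrated with a greedy nearest-neighbour pass over per-test sets of visited basic blocks. The Jaccard distance used here is an assumption for illustration; the paper defines its own distance metric over visited basic blocks:

```python
def jaccard_distance(a, b):
    """Distance between two tests based on the basic blocks they visit."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0

def order_tests(blocks_per_test):
    """Greedy nearest-neighbour ordering: after each test, pick the
    remaining test whose visited-block set is closest, so consecutive
    tests reuse instructions that are likely still in the cache."""
    remaining = dict(blocks_per_test)
    name, blocks = next(iter(remaining.items()))
    order = [name]
    del remaining[name]
    while remaining:
        name, blocks = min(remaining.items(),
                           key=lambda kv: jaccard_distance(blocks, kv[1]))
        order.append(name)
        del remaining[name]
    return order

tests = {
    "t1": {"b1", "b2"},
    "t2": {"b7", "b8"},
    "t3": {"b1", "b2", "b3"},
}
print(order_tests(tests))  # ['t1', 't3', 't2']: t3 shares blocks with t1
```

Greedy nearest-neighbour keeps the ordering cheap to compute, which matters for the scalability claim: each step only compares the current test against the remaining ones.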
Article
The practical applications of agile methods and their impact on the productivity and efficiency of software development dominate the agile literature. We analyzed 827 academic articles with bibliometric techniques to explore the role project management research played in the development of the academic agile discourse. Bibliometric analyses over two time periods reveal that project management–related topics form a distinct stream of research in the second time period but not in the first. Furthermore, our results reveal that the academic agile discussion has been mainly unidirectional. This situation offers many opportunities for project management researchers to contribute to the agile discourse.
Preprint
Full-text available
Test-Driven Development (TDD) has been claimed to increase external software quality. However, the extent to which TDD increases external quality has been seldom studied in industrial experiments. We conduct four industrial experiments in two different companies to evaluate the performance of TDD on external quality. We study whether the performance of TDD holds across premises within the same company and across companies. We identify participant-level characteristics impacting results. Iterative-Test Last (ITL), the reverse approach of TDD, outperforms TDD in three out of four premises. ITL outperforms TDD in both companies. The larger the experience with unit testing and testing tools, the larger the difference in performance between ITL and TDD (in favour of ITL). Technological environment (i.e., programming language and testing tool) seems not to impact results. Evaluating participant-level characteristics impacting results in industrial experiments may ease the understanding of the performance of TDD in realistic settings.
Thesis
Full-text available
How can we improve the unit testing skills of recent graduates to strengthen the connection with the software industry?
Chapter
Nowadays, agile development methodologies are becoming popular in various IT industries. This methodology combines different stages in a repetitive and incremental manner. Its main focus is to increase the adaptability of the process, which in turn increases customer satisfaction. There are various development frameworks within agile methodology, such as Scrum and Kanban. There is also a programming practice known as Test-Driven Development, which starts with developing a test for a feature before its implementation; it is also known as test-first programming. The main objective of this paper is to propose an approach in which TDD is merged with Scrum, to bring the benefits of TDD to Scrum, and to provide a comprehensive review of this approach.
Chapter
Context. Extensive unit testing is worth its cost in terms of the higher quality of the final product and reduced development expenses, though it may consume more than fifty percent of the overall project budget. Thus, even a tiny percentage of savings can significantly decrease costs. Since competing assertion libraries have recently emerged, we need empirical evidence to gauge them in terms of developer productivity, allowing SQA managers and testers to select the best one. Objective. The aim of this work is to compare two assertion frameworks with different approaches (matchers vs. fluent assertions) with respect to tester productivity. Method. We conducted a controlled experiment involving 41 Bachelor students. AssertJ is compared with Hamcrest in a test development scenario with the Java language. We analysed the number of correct assertions developed in a tight time frame and used this measure as a proxy for tester productivity. Results. The results show that adopting AssertJ significantly improves overall tester productivity during the development of assertions. Conclusions. Testers and SQA managers selecting assertion frameworks for their organizations should consider AssertJ as a first choice, since our study shows that it increases the productivity of testers during development more than Hamcrest does.
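The fluent-assertion style compared above can be sketched in a few lines: each assertion method returns the wrapper itself, so checks chain left to right. This toy Python wrapper only mimics the spirit of AssertJ's Java chaining (`assertThat(x).contains(y)`); the method names here are illustrative, not AssertJ's actual API:

```python
# A toy fluent-assertion wrapper in the spirit of AssertJ-style chaining.
class FluentAssert:
    def __init__(self, actual):
        self.actual = actual

    def contains(self, item):
        assert item in self.actual, f"{item!r} not in {self.actual!r}"
        return self  # returning self is what enables chaining

    def has_size(self, n):
        assert len(self.actual) == n, f"expected size {n}"
        return self

def assert_that(actual):
    return FluentAssert(actual)

# Fluent style reads left to right as a single chain; the matcher style
# (Hamcrest) instead nests matcher objects inside one assert call.
assert_that([1, 2, 3]).contains(2).has_size(3)
print("fluent assertions passed")
```

The design difference the experiment probes is exactly this readability trade-off: chained method calls discoverable via autocompletion versus composable matcher objects.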
Preprint
Full-text available
Context: Test-driven development (TDD) is an agile software development approach that has been widely claimed to improve software quality. However, the extent to which TDD improves quality appears to be largely dependent upon the characteristics of the study in which it is evaluated (e.g., the research method, participant type, programming environment, etc.). The particularities of each study make the aggregation of results untenable. Objectives: The goal of this paper is to: increase the accuracy and generalizability of the results achieved in isolated experiments on TDD, provide joint conclusions on the performance of TDD across different industrial and academic settings, and assess the extent to which the characteristics of the experiments affect the quality-related performance of TDD. Method: We conduct a family of 12 experiments on TDD in academia and industry. We aggregate their results by means of meta-analysis. We perform exploratory analyses to identify variables impacting the quality-related performance of TDD. Results: TDD novices achieve a slightly higher code quality with iterative test-last development (i.e., ITL, the reverse approach of TDD) than with TDD. The task being developed largely determines quality. The programming environment, the order in which TDD and ITL are applied, or the learning effects from one development approach to another do not appear to affect quality. The quality-related performance of professionals using TDD drops more than for students. We hypothesize that this may be due to their being more resistant to change and potentially less motivated than students. Conclusion: Previous studies seem to provide conflicting results on TDD performance (i.e., positive vs. negative, respectively). We hypothesize that these conflicting results may be due to different study durations, experiment participants being unfamiliar with the TDD process...
Article
Context: Test-driven development (TDD) is an agile software development approach that has been widely claimed to improve software quality. However, the extent to which TDD improves quality appears to be largely dependent upon the characteristics of the study in which it is evaluated (e.g., the research method, participant type, programming environment, etc.). The particularities of each study make the aggregation of results untenable. Objectives: The goal of this paper is to: increase the accuracy and generalizability of the results achieved in isolated experiments on TDD, provide joint conclusions on the performance of TDD across different industrial and academic settings, and assess the extent to which the characteristics of the experiments affect the quality-related performance of TDD. Method: We conduct a family of 12 experiments on TDD in academia and industry. We aggregate their results by means of meta-analysis. We perform exploratory analyses to identify variables impacting the quality-related performance of TDD. Results: TDD novices achieve a slightly higher code quality with iterative test-last development (i.e., ITL, the reverse approach of TDD) than with TDD. The task being developed largely determines quality. The programming environment, the order in which TDD and ITL are applied, or the learning effects from one development approach to another do not appear to affect quality. The quality-related performance of professionals using TDD drops more than for students. We hypothesize that this may be due to their being more resistant to change and potentially less motivated than students. Conclusion: Previous studies seem to provide conflicting results on TDD performance (i.e., positive vs. negative, respectively). We hypothesize that these conflicting results may be due to different study durations, experiment participants being unfamiliar with the TDD process, or case studies comparing the performance achieved by TDD vs. 
the control approach (e.g., the waterfall model), each applied to develop a different system. Further experiments with TDD experts are needed to validate these hypotheses.
Article
In this paper, we investigate the effect of TDD, as compared to a non-TDD approach, as well as its retainment (or retention) over a time span of (about) six months. To pursue these objectives, we conducted a (quantitative) longitudinal cohort study with 30 novice developers (i.e., third-year undergraduate students in Computer Science). We observed that TDD affects neither the external quality of software products nor developers’ productivity. However, we observed that the participants applying TDD produced significantly more tests, with a higher fault-detection capability, than those using a non-TDD approach. As for the retainment of TDD, we found that TDD is retained by novice developers for at least six months.
Article
Geospatial software developers often rely on Git to collaborate with each other and manage source code in an efficient way. Yet, most GIS programming courses do not prepare students for such a work environment. This article proposes a typology that a GIS programming course could follow consisting of three components: group organization, project and evaluation. Based on the typology, a GIS programming course was designed where randomly formed pairs develop a state‐of‐the‐art QGIS plugin. GitHub Classroom was used to facilitate collaboration among students, which also allowed the lecturer to monitor the progress of groups and provide timely feedback. Five out of the six groups were successful in completing the projects, and a substantial majority of the students were satisfied with the course. A strengths–weaknesses–opportunities–threats analysis reveals insights that other lecturers may find useful when designing their GIS programming courses.
Article
Full-text available
A real-time operating system for avionics (RTOS4A) provides an operating environment for avionics application software. Since an RTOS4A has safety-critical applications, demonstrating a satisfactory level of its quality to its stakeholders is very important. By assessing the variation in quality across consecutive releases of an industrial RTOS4A based on test data collected over 17 months, we aim to provide a set of guidelines to 1) improve the test effectiveness and thus the quality of subsequent RTOS4A releases and 2) similarly assess the quality of other systems from test data. We carefully defined a set of research questions, for which we defined a number of variables (based on available test data), including release and measures of test effort, test effectiveness, complexity, test efficiency, test strength, and failure density. With these variables, to assess the quality in terms of number of failures found in tests, we applied a combination of analyses, including trend analysis using two-dimensional graphs, correlation analysis using Spearman’s test, and difference analysis using the Wilcoxon rank test. Key results include the following: 1) The number of failures and failure density decreased in the latest releases and the test coverage was either high or did not decrease with each release; 2) increased test effort was spent on modules of greater complexity and the number of failures was not high in these modules; and 3) the test coverage for modules without failures was not lower than the test coverage for modules with failures uncovered in all the releases. The overall assessment, based on the evidence, suggests that the quality of the RTOS4A studied improved in the latest release. In addition, our industrial partner found our guidelines useful and we believe that these guidelines can be used to assess the quality of other applications in the future.
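Spearman's test, used above to relate test effort to failures found, correlates ranks rather than raw values, so it captures any monotone relationship. A self-contained sketch (the effort and failure numbers are invented, not the study's data):

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Invented illustration: as test effort rises across releases, failures
# fall monotonically, giving a perfect negative rank correlation of -1.
effort = [10, 25, 40, 60, 80]
failures = [30, 22, 15, 9, 4]
rho = spearman_rho(effort, failures)
```

In practice one would reach for a library implementation (e.g., `scipy.stats.spearmanr`); the hand-rolled version above only makes the rank-then-correlate mechanics visible.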
Article
Full-text available
Test-first programming is one of the central techniques of Extreme Programming. Programming test-first means...
Article
Full-text available
Evidence suggests that pair programmers-two programmers working collaboratively on the same design, algorithm, code, or test-perform substantially better than the two would working alone. Improved quality, teamwork, communication, knowledge management, and morale have been among the reported benefits of pair programming. This paper presents a comparative economic evaluation that strengthens the case for pair programming. The evaluation builds on the quantitative results of an empirical study conducted at the University of Utah. The evaluation is performed by interpreting these findings in the context of two different, idealized models of value realization. In the first model, consistent with the traditional waterfall process of software development, code produced by a development team is deployed in a single increment; its value is not realized until the full project completion. In the second model, consistent with agile software development processes such as Extreme Programming, code is produced and delivered in small increments; thus its value is realized in an equally incremental fashion. Under both models, our analysis demonstrates a distinct economic advantage of pair programmers over solo programmers. Based on these preliminary results, we recommend that organizations engaged in software development consider adopting pair programming as a practice that could improve their bottom line. To be able to perform quantitative analyses, several simplifying assumptions had to be made regarding alternative models of software development, the costs and benefits associated with these models, and how these costs and benefits are recognized. The implications of these assumptions are addressed in the paper.
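The two value-realization models can be contrasted numerically. In this toy sketch, the cash flows and discount rate are invented (they are not the Utah study's figures): the same total value is worth more when delivered incrementally, because the earlier increments are discounted less.

```python
def npv(cashflows, rate):
    """Net present value of end-of-period cash flows at a per-period rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows, start=1))

# Waterfall-style: all value (100 units) realized at month 12.
single_release = npv([0.0] * 11 + [100.0], 0.01)
# XP-style: the same total value delivered in 12 monthly increments.
incremental = npv([100.0 / 12] * 12, 0.01)
# incremental exceeds single_release: earlier delivery is discounted less.
```

This is the core of the paper's second model: under any positive discount rate, incremental delivery dominates single-increment delivery for the same total value.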
Conference Paper
Full-text available
Test-driven development (TDD) is an agile software development strategy that addresses both design and testing. This paper describes a controlled experiment that examines the effects of TDD on internal software design quality. The experiment was conducted with undergraduate students in a software engineering course. Students in three groups completed semester-long programming projects using either an iterative Test-First (TDD), iterative Test-Last, or linear Test-Last approach. Results from this study indicate that TDD can be an effective software design approach improving both code-centric aspects such as object decomposition, test coverage, and external quality, and developer-centric aspects including productivity and confidence. In addition, iterative development approaches that include automated testing demonstrated benefits over a more traditional linear approach with manual tests. This study demonstrates the viability of teaching TDD with minimal effort in the context of a relatively traditional development approach. Potential dangers with TDD are identified regarding programmer motivation and discipline. Pedagogical implications and instructional techniques which may foster TDD adoption will also be referenced.
Conference Paper
Full-text available
An important goal of most empirical software engineering research is the transfer of research results to industrial applications. Two important obstacles for this transfer are the lack of control of variables of case studies, i.e., the lack of explanatory power, and the lack of realism of controlled experiments. While it may be difficult to increase the explanatory power of case studies, there is a large potential for increasing the realism of controlled software engineering experiments. To convince industry about the validity and applicability of the experimental results, the tasks, subjects and the environments of the experiments should be as realistic as practically possible. Such experiments are, however, more expensive than experiments involving students, small tasks and pen-and-paper environments. Consequently, a change towards more realistic experiments requires a change in the amount of resources spent on software engineering experiments. This paper argues that software engineering researchers should apply for resources enabling expensive and realistic software engineering experiments similar to how other researchers apply for resources for expensive software and hardware that are necessary for their research. The paper describes experiences from recent experiments that varied in size from involving one software professional for 5 days to 130 software professionals, from 9 consultancy companies, for one day each.
Conference Paper
Full-text available
Extreme programming (XP) is a new and controversial software process for small teams. A practical training course at the University of Karlsruhe led to the following observations about the key practices of XP. First, it is unclear how to reap the potential benefits of pair programming, although pair programming produces high-quality code. Second, designing in small increments appears to be problematic but ensures rapid feedback about the code. Third, while automated testing is helpful, writing test cases before coding is a challenge. Last, it is difficult to implement XP without coaching. This paper also provides some guidelines for those starting out with XP.
Article
Full-text available
What tools do we use to develop and debug software? Most of us rely on a full-screen editor to write code, a compiler to translate it, a source-level debugger to correct it, and a source-code control system to archive and share it. These tools originated in the 1970s, when the change from batch to interactive programming stimulated the development of innovative languages, tools, environments, and other utilities we take for granted. Microsoft Research has developed two generations of tools, some of which Microsoft developers already use to find and correct bugs. These correctness tools can improve software development by systematically detecting programming errors.
Article
Full-text available
For 25 years, software researchers have proposed improving software development and maintenance with new practices whose effectiveness is rarely, if ever, backed up by hard evidence. We suggest several ways to address the problem, and we challenge the community to invest in being more scientific.
Article
Full-text available
Test-driven development (TDD) is based on formalizing a piece of functionality as a test, implementing the functionality such that the test passes, and iterating the process. This paper describes a controlled experiment for evaluating an important aspect of TDD: in TDD, programmers write functional tests before the corresponding implementation code. The experiment was conducted with undergraduate students. While the experiment group applied a test-first strategy, the control group applied a more conventional development technique, writing tests after the implementation. Both groups followed an incremental process, adding new features one at a time and regression testing them. We found that test-first students on average wrote more tests and, in turn, students who wrote more tests tended to be more productive. We also observed that the minimum quality increased linearly with the number of programmer tests, independent of the development strategy employed.
Article
Full-text available
Experimentation in software engineering is necessary but difficult. One reason is that there are a large number of context variables and, so, creating a cohesive understanding of experimental results requires a mechanism for motivating studies and integrating results. It requires a community of researchers that can replicate studies, vary context variables, and build models that represent the common observations about the discipline. The paper discusses the experience of the authors, based upon a collection of experiments, in terms of a framework for organizing sets of related studies. With such a framework, experiments can be viewed as part of common families of studies, rather than being isolated events. Common families of studies can contribute to important and relevant hypotheses that may not be suggested by individual experiments. A framework also facilitates building knowledge in an incremental manner through the replication of experiments within families of studies. To support the framework, the paper discusses the experiences of the authors in carrying out empirical studies, with specific emphasis on persistent problems encountered in experimental design, threats to validity, criteria for evaluation, and execution of experiments in the domain of software engineering
Article
Full-text available
Experimentation helps determine the effectiveness of proposed theories and methods. However, computer science has not developed a concise taxonomy of methods for demonstrating the validity of new techniques. Experimentation is a crucial part of attribute evaluation and can help determine whether methods used in accordance with some theory during product development will result in software being as effective as necessary. By looking at multiple examples of technology validation, the authors develop a taxonomy for software engineering experimentation that describes twelve different experimental approaches
Article
Full-text available
The "standard" approach to studies in empirical software engineering research is the "comparative group experiment". We are told the participation of large groups of subjects makes the results more conclusive and meaningful. However, requiring large numbers of subjects results in issues involving the expense, timeliness and applicability of results. Many of these can be addressed through the use of a technique long used in psychological research: the single subject experiment. In Single Subject experiments, a single subject is systematically studied and the factor of interest is alternatively introduced and withdrawn so its effect on the subject can be analyzed.
Book
From the Publisher: "XP is the most important movement in our field today. I predict that it will be as essential to the present generation as the S.E.I. and its Capability Maturity Model were to the last." —From the foreword by Tom DeMarco The hallmarks of Extreme Programming—constant integration and automated testing, frequent small releases that incorporate continual customer feedback, and a teamwork approach—make it an exceptionally flexible and effective approach to software development. Once considered radical, Extreme Programming (XP) is rapidly becoming recognized as an approach particularly well-suited to small teams facing vague or rapidly changing requirements—that is, the majority of projects in today's fast-paced software development world. Within this context of flexibility and rapid-fire changes, planning is critical; without it, software projects can quickly fall apart. Written by acknowledged XP authorities Kent Beck and Martin Fowler, Planning Extreme Programming presents the approaches, methods, and advice you need to plan and track a successful Extreme Programming project. The key XP philosophy: Planning is not a one-time event, but a constant process of reevaluation and course-correction throughout the lifecycle of the project. You will learn how planning is essential to controlling workload, reducing programmer stress, increasing productivity, and keeping projects on track. Planning Extreme Programming also focuses on the importance of estimating the cost and time for each user story (requirement), determining its priority, and planning software releases accordingly.
Specific topics include:
- Planning and the four key variables: cost, quality, time, and scope
- Deciding how many features to incorporate into a release
- Estimating scope, time, and effort for user stories
- Prioritizing user stories
- Balancing the business value and technical risk of user stories
- Rebuilding the release plan based on customer and programmer input
- Choosing the iteration length
- Tracking an iteration
- What to do when you're not going to make the date
- Dealing with bugs
- Making changes to the team
- Outsourcing
- Working with business contracts
In addition, this book alerts you to the red flags that signal serious problems: customers who won't make decisions, growing defect reports, failing daily builds, and more. An entire chapter is devoted to war stories from the trenches that illustrate the real-world problems many programmers encounter and the solutions they've devised.
Article
Although testing starts with individual programs, programs are rarely self-contained in real software environments. They depend on external subsystems like language run time and operating system libraries for various functionalities. These subsystems are developed externally to any given program, with their own test processes. Of course, an uncoordinated change in one of the external subsystems may affect the program's correctness. Test teams therefore add an integration testing step to their process to ensure that programs will continue to operate with different versions of the external subsystems. As full testing may take days or weeks to run, it is useful to understand how to prioritize these tests. We present an integration testing system to understand and quantify the impact of a change, so test teams can focus their testing efforts on the most likely affected parts of the program. Detecting the impact of a change is a hard problem due to the size and complexity of the control and data dependencies involved. Our new approach is based on a binary dependency framework, MaX, that determines control and data dependencies in a system and represents them in a dependency graph. MaX is designed to work on systems consisting of thousands of binaries and millions of procedures. It constructs the graph in multiple steps to allow the analysis of individual binaries to proceed in parallel. MaX provides simple abstractions for defining systems, and provides a simple programming interface to tools for analysis of the graph. The integration testing system also contains two tools that use MaX to advise test teams. MaxCift quantifies the effect of a change to guide how much testing is likely to be needed. MaxScout prioritizes an existing set of tests based on changes made to external subsystems. All of the tools use a binary code based approach that does not require source code for external subsystems, an important requirement for practical use. 
MaX runs under the Windows environment and is used by Microsoft product teams. Early results show that the system scales to production software and is effective in guiding testing.
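The change-impact idea behind this kind of test prioritization can be sketched with a toy dependency graph. The binary names and edges below are invented for illustration (real MaX graphs span thousands of binaries and millions of procedures), and the traversal shown is only the generic transitive-dependents computation, not Microsoft's implementation:

```python
from collections import deque

# Toy dependency graph: edges point from a binary to the binaries it
# depends on. All names here are hypothetical.
DEPENDS_ON = {
    "app.exe": ["net.dll", "ui.dll"],
    "ui.dll": ["gfx.dll"],
    "net.dll": ["crypto.dll"],
    "gfx.dll": [],
    "crypto.dll": [],
    "tool.exe": ["gfx.dll"],
}

def impacted_by(changed, graph):
    """Return every component that transitively depends on `changed`."""
    # Invert the edges so we can walk from the changed binary to its dependents.
    dependents = {}
    for src, targets in graph.items():
        for t in targets:
            dependents.setdefault(t, set()).add(src)
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

# A change to crypto.dll flags net.dll and app.exe for focused retesting,
# while tool.exe can safely be deprioritized.
```

Tests covering the impacted set run first; everything else can wait, which is how a days-long integration run becomes tractable.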
Article
The author advises innovators who want others to accept their work to study propaganda, especially as the military, political parties, and even corporations use it. He argues that, while we have come to associate propaganda with nefarious mind-control plots and political extremists, it’s really just an approach to convincing others to see things the way you do.
Article
Test Driven Development (TDD) is a software development practice in which unit test cases are incrementally written prior to code implementation. We ran a set of structured experiments with 24 professional pair programmers. One group developed a small Java program using TDD while the other (control group) used a waterfall-like approach. Experimental results, subject to external validity concerns, tend to indicate that TDD programmers produce higher quality code because they passed 18% more functional black-box test cases. However, the TDD programmers took 16% more time. Statistical analysis of the results showed that a moderate statistical correlation existed between time spent and the resulting quality. Lastly, the programmers in the control group often did not write the required automated test cases after completing their code. Hence it could be perceived that waterfall-like approaches do not encourage adequate testing. This intuitive observation supports the perception that TDD has the potential for increasing the level of unit testing in the software industry.
Article
An integrated approach to software quality, reliability, and safety is described that is termed ‘software quality engineering’. It encompasses the three levels of quality assurance technology generally recognised in manufacturing, namely, product inspection, process control, and design improvement. Most software organizations today operate at the product-inspection level of technology (or lower). Achieving higher levels of quality assurance technology depends on establishing effective measurement, control, and process improvement mechanisms within the software enterprise.
Conference Paper
One way of responding to a keynote speaker is to put the expressed views into context, pointing to highlights in the address, suggesting areas where alternative viewpoints might have been presented, exposing any chinks in the armour of the otherwise ...
Conference Paper
This paper discusses software development using the Test Driven Development (TDD) methodology in two different environments (Windows and MSN divisions) at Microsoft. In both case studies we measured various context, product, and outcome measures to compare and evaluate the efficacy of TDD. We observed a significant increase in quality of the code (greater than two times) for projects developed using TDD compared to similar projects developed in the same organization in a non-TDD fashion. The projects also took at least 15% extra upfront time for writing the tests. Additionally, the unit tests have served as auto documentation for the code when libraries/APIs had to be used as well as for code maintenance.
Conference Paper
Test Driven Development (TDD) is a software development practice in which unit test cases are incrementally written prior to code implementation. In our research, we ran a set of structured experiments with 24 professional pair programmers. One group developed code using TDD while the other used a waterfall-like approach. Both groups developed a small Java program. We found that the TDD developers produced higher quality code, which passed 18% more functional black-box test cases. However, TDD developer pairs took 16% more time for development. A moderate correlation between time spent and the resulting quality was established upon analysis. It is conjectured that the resulting high quality of code written using the TDD practice may be due to the granularity of TDD, which may encourage more frequent and tighter verification and validation. Lastly, the programmers who followed a waterfall-like process often did not write the required automated test cases after completing their code, which might be indicative of a tendency among practitioners toward inadequate testing. This observation supports the view that TDD has the potential to increase the level of testing in industry by making testing an integral part of code development.
Article
Simply transferring knowledge and instrumentation is not enough to help developing countries build their own research base. Such efforts must be tied to national and local needs to create trust and services for society in the long term
Conference Paper
Test-driven development is a software development practice that has been used sporadically for decades. With this practice, test cases (preferably automated) are incrementally written before production code is implemented. Test-driven development has recently re-emerged as a critical enabling practice of the extreme programming software development methodology. We ran a case study of this practice at IBM. In the process, a thorough suite of automated test cases was produced after UML design. In this case study, we found that the code developed using a test-driven development practice showed, during functional verification and regression tests, approximately 40% fewer defects than a baseline prior product developed in a more traditional fashion. The productivity of the team was not impacted by the additional focus on producing automated test cases. This test suite aids in future enhancements and maintenance of this code. The case study and the results are discussed in detail.
Article
The author argues that test-first coding is not testing. Test-first coding is not new. It is nearly as old as programming. It is an analysis technique. We decide what we are programming and what we are not programming, and we decide what answers we expect. Test-first is also a design technique.
Article
Case studies help industry evaluate the benefits of methods and tools and provide a cost-effective way to ensure that process changes provide the desired results. However, unlike formal experiments and surveys, case studies do not have a well-understood theoretical basis. This article provides guidelines for organizing and analyzing case studies so that they produce meaningful results
Article
Although many view iterative and incremental development as a modern practice, its application dates as far back as the mid-1950s. Prominent software-engineering thought leaders from each succeeding decade supported IID practices, and many large projects used them successfully. These practices may have differed in their details, but all had a common theme: to avoid a single-pass, sequential, document-driven, gated-step approach.