Profile plot: mean difference in quality (TDD-ITL) for students per experience level in Questionnaire 2

Source publication
Article
Context: Test-driven development (TDD) is an agile software development approach that has been widely claimed to improve software quality. However, the extent to which TDD improves quality appears to be largely dependent upon the characteristics of the study in which it is evaluated (e.g., the research method, participant type, programming environme...

Citations

... • Pointers to studies outside FM: in software engineering this strategy is quite common, for example to evaluate visual/model-based languages, as in the works of Abrahão et al. [69,70], which focus on modelling notations. In the evaluation of methodologies, a representative work is the one by Santos et al. [71] on test-driven development. When one wants to focus on a single specific methodological step, a reference work is the one by Mohanani et al. [72], about different strategies for framing requirements and their impact on creativity. ...
Article
Empirical studies on formal methods and tools are rare. In this paper, we provide guidelines for such studies. We mention their main ingredients and then define nine different study strategies (usability testing, laboratory experiments with software and human subjects, case studies, qualitative studies, surveys, judgement studies, systematic literature reviews, and systematic mapping studies) and discuss for each of them their crucial characteristics, the difficulties of applying them to formal methods and tools, typical threats to validity, their maturity in formal methods, pointers to external guidelines, and pointers to studies in other fields. We conclude with a number of challenges for empirical formal methods.
... So, it is not yet clear what impact, if any, inline tests will have in the presence of different testing methodologies. For example, since inline tests check existing target statements, their role may be limited in organizations that follow test-driven development (TDD) [3,7,78]. (In TDD, tests are written prior to writing code.) ...
Preprint
Unit tests are widely used to check source code quality, but they can be too coarse-grained or ill-suited for testing individual program statements. We introduce inline tests to make it easier to check for faults in statements. We motivate inline tests through several language features and a common testing scenario in which inline tests could be beneficial. For example, inline tests can allow a developer to test a regular expression in place. We also define language-agnostic requirements for inline testing frameworks. Lastly, we implement I-Test, the first inline testing framework. I-Test works for Python and Java, and it satisfies most of the requirements. We evaluate I-Test on open-source projects by using it to test 144 statements in 31 Python programs and 37 Java programs. We also perform a user study. All nine user study participants say that inline tests are easy to write and that inline testing is beneficial. The cost of running inline tests is negligible, at 0.007x–0.014x, and our inline tests helped find two faults that have been fixed by the developers.
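The regular-expression scenario mentioned in this abstract can be illustrated with a minimal sketch. It places a plain Python `assert` directly after the statement under test to mimic the idea of checking a statement in place. This is not I-Test's API, which the abstract does not describe; the function name `extract_version` and the pattern are hypothetical and chosen only for illustration.

```python
import re
from typing import Optional


def extract_version(line: str) -> Optional[str]:
    """Return the first 'vMAJOR.MINOR.PATCH' version found in line, or None."""
    # Target statement: the regular expression we want to check in place.
    pattern = re.compile(r"v(\d+)\.(\d+)\.(\d+)")

    # Inline-style checks (plain asserts, NOT the I-Test API): they exercise
    # only the statement above, with fixed inputs, right where it is defined.
    assert pattern.search("release v1.20.3 ready").groups() == ("1", "20", "3")
    assert pattern.search("no version here") is None

    match = pattern.search(line)
    return ".".join(match.groups()) if match else None
```

In this sketch the checks run on every call, which a dedicated framework would presumably avoid in production runs; the point is only that the regular expression is tested at statement granularity rather than through the enclosing function.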
... • Pointers to studies outside FM: in software engineering this strategy is quite common, for example to evaluate visual/model-based languages, as in the works of Abrahão et al. [64,65], which focus on modelling notations. In the evaluation of methodologies, a representative work is the one by Santos et al. [66] on test-driven development. When one wants to focus on a single specific methodological step, a reference work is the one by Mohanani et al. [67], about different strategies for framing requirements and their impact on creativity. ...
Preprint
Empirical studies on formal methods and tools are rare. In this paper, we provide guidelines for such studies. We mention their main ingredients and then define nine different study strategies (laboratory experiments with software and human subjects, usability testing, surveys, qualitative studies, judgment studies, case studies, systematic literature reviews, and systematic mapping studies) and discuss for each of them their crucial characteristics, the difficulties of applying them to formal methods and tools, typical threats to validity, their maturity in formal methods, pointers to external guidelines, and pointers to studies in other fields. We conclude with a number of challenges for empirical formal methods.
... In order to ensure the smooth progress of the experiment, we translated the experimental materials into the participants' native language (Spanish) so that they did not have to spend time and mental effort on language translation. We acknowledge that, although the experimental material was translated into the participants' native language to make them feel comfortable, the self-assessment questions may not accurately capture their background [64], [65]. Thus, the use of questionnaires may have biased the results of the satisfaction response variable. ...
Article
Context: Recent developments in natural language processing have facilitated the adoption of chatbots in typically collaborative software engineering tasks (such as diagram modelling). Families of experiments can assess the performance of tools and processes and, at the same time, alleviate some of the typical shortcomings of individual experiments (e.g., inaccurate and potentially biased results due to a small number of participants). Objective: Compare the usability of a chatbot for collaborative modelling (i.e., SOCIO) and an online web tool (i.e., Creately). Method: We conduct a family of three experiments to evaluate the usability of SOCIO against the Creately online collaborative tool in academic settings. Results: The student participants were faster at building class diagrams using the chatbot than with the online collaborative tool and more satisfied with SOCIO. Besides, the class diagrams built using the chatbot tended to be more concise albeit slightly less complete. Conclusion: Chatbots appear to be helpful for building class diagrams. In fact, our study has helped us to shed light on the future direction for experimentation in this field and lays the groundwork for researching the applicability of chatbots in diagramming.
... Another family of experiments evaluated whether the use of test-driven development (TDD) improves software product quality [50]. The family is composed of 12 separate experiments and aims to improve the accuracy and generalizability of the results. ...
Article
Context: The usability software quality characteristic aims to improve system user performance. In a previous study, we found evidence of the impact of a set of usability features from the viewpoint of users in terms of efficiency, effectiveness and satisfaction. However, the impact level appears to depend on the usability feature and suggests priorities with respect to their implementation depending on how they promote user performance. Objectives: We use a family of three experiments to increase the precision and generalization of the results in the baseline experiment and provide findings regarding the impact on user performance of the Abort Operation, Progress Feedback and Preferences usability mechanisms. Method: We conduct two replications of the baseline experiment in academic settings. We analyse the data of 366 experimental subjects and apply aggregation (meta-analysis) procedures. Results: We find that the Abort Operation and Preferences usability mechanisms appear to improve system usability a great deal with respect to efficiency, effectiveness and user satisfaction. Conclusions: We find that the family of experiments further corroborates the results of the baseline experiment. Most of the results are statistically significant, and, because of the large number of experimental subjects, the evidence that we gathered in the replications is sufficient to outweigh other experiments.
Article
The research on the claimed effects of Test-Driven Development (TDD) on software quality and developers’ productivity has shown inconclusive results. Some researchers have ascribed such results to the negative affective reactions that TDD would provoke when developers apply it. In this paper, we studied whether and in which phases TDD influences the affective states of developers who are new to this development approach. To that end, we conducted a baseline experiment and two replications, and analyzed the data from these experiments both individually and jointly. Also, we performed methodological triangulation by means of an explanatory survey, whose respondents were experienced with TDD. The results of the baseline experiment suggested that developers like TDD significantly less, compared to a non-TDD approach. Also, developers who apply TDD like implementing production code significantly less than those who apply a non-TDD approach, while testing production code makes TDD developers significantly less happy. These results were not confirmed in the replicated experiments. We found that the moderator that best explains these differences across experiments is experience (in months) with unit testing, practiced in a test-last manner. The higher the experience with unit testing, the more negative the affective reactions caused by TDD. The results from the survey seem to confirm the role of this moderator.
Article
One of the main challenges that developers face when testing their systems lies in engineering test cases that are good enough to reveal bugs. And while our body of knowledge on software testing and automated test case generation is already quite significant, in practice, developers are still the ones responsible for engineering test cases manually. Therefore, understanding the developers' thought- and decision-making processes while engineering test cases is a fundamental step in making developers better at testing software. In this paper, we observe 13 developers thinking-aloud while testing different real-world open-source methods, and use these observations to explain how developers engineer test cases. We then challenge and augment our main findings by surveying 72 software developers on their testing practices. We discuss our results from three different angles. First, we propose a general framework that explains how developers reason about testing. Second, we propose and describe in detail the three different overarching strategies that developers apply when testing. Third, we compare and relate our observations with the existing body of knowledge and propose future studies that would advance our knowledge on the topic.
Article
Test-driven development (TDD) is an agile development technique that involves running many test cases. The more test cases that are executed in a given time, the more robust the software is likely to be. It is therefore important to find, for different computers, the combination of number of threads and number of test cases that leads to the lowest average running time of a single test case, since this can improve the performance of TDD. This paper proposes a method for this problem. First, three-dimensional raw data are measured on computers with different numbers of cores. Second, these data are converted into two-dimensional data by dimensionality reduction, which facilitates the generation of fitting functions. The fitting function of the new data awaiting prediction is likewise generated after dimensionality reduction. Finally, the similarity between the fitting functions of the raw and new data is calculated using a Euclidean distance similarity algorithm. Experimental results show that the data from the dual-core computer have higher similarities with four new data sets: 81.25%, 100%, 81.25%, and 100%. Thus, the dual-core data have higher reference credibility when predicting the average running time of a single test case for the new data on different computers. In summary, the performance of TDD can be improved by applying the proposed method.
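The final step described in this abstract, comparing fitting functions by Euclidean distance, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the abstract does not say how the distance is turned into a similarity score or how the reported percentages are derived. Here the two functions are sampled at the same points and the distance `d` is mapped to a similarity with `1 / (1 + d)`, a common convention. The names `euclidean_similarity`, `reference_fit`, and `new_fit`, and the coefficients used, are hypothetical.

```python
import math
from typing import Callable, Sequence


def euclidean_similarity(
    f: Callable[[float], float],
    g: Callable[[float], float],
    xs: Sequence[float],
) -> float:
    """Similarity in (0, 1] between two functions sampled at the points xs.

    Assumption: similarity = 1 / (1 + Euclidean distance of the samples).
    Identical samples give 1.0; larger distances approach 0.
    """
    distance = math.sqrt(sum((f(x) - g(x)) ** 2 for x in xs))
    return 1.0 / (1.0 + distance)


# Hypothetical fitted curves: average running time per test case as a
# function of the number of threads (coefficients are made up).
def reference_fit(threads: float) -> float:
    return 0.5 * threads ** 2 - 3.0 * threads + 10.0


def new_fit(threads: float) -> float:
    return 0.5 * threads ** 2 - 3.0 * threads + 10.5


sample_points = [1.0, 2.0, 3.0, 4.0]
similarity = euclidean_similarity(reference_fit, new_fit, sample_points)
```

With these made-up curves the samples differ by a constant 0.5 at four points, so the distance is 1.0 and the similarity is 0.5. Comparing one new curve against the reference curves of several machines and choosing the highest score would correspond to picking the machine with the highest "reference credibility" in the abstract's terms.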