Publications (103)
October 2024 · 8 Reads
August 2024 · 7 Reads
ACM Transactions on Software Engineering and Methodology
Experiments are a commonly used method of research in software engineering (SE). Researchers report their experiments following detailed guidelines. However, researchers do not, in the field of test-driven development (TDD) at least, specify how they operationalized the response variables and, particularly, the measurement process. This article has three aims: (i) identify the response variable operationalization components in TDD experiments that study external quality; (ii) study their influence on the experimental results; (iii) determine if the experiment reports describe the measurement process components that have an impact on the results. We used a two-part sequential mixed-methods research design. The first part adopts a quantitative approach, applying a statistical analysis of the impact of the operationalization components on the experimental results. The second part follows on with a qualitative approach, applying a systematic mapping study (SMS). The test suites, intervention types and measurers influence the measurements and the results of the statistical analysis of TDD experiments in SE. The test suites have a major impact on both the measurements and the results of the experiments. The intervention type has less impact on the results than on the measurements. While the measurers have an impact on the measurements, this is not transferred to the experimental results. On the other hand, the results of our SMS confirm that TDD experiments do not usually report the test suites, the test case generation method, or the details of how external quality was measured. A measurement protocol should be used to ensure that the measurements made by different measurers are similar. It is necessary to report the test cases, the experimental task and the intervention type in order to be able to reproduce the measurements and statistical analyses, as well as to replicate experiments and build dependable families of experiments.
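The abstract's central point, that the chosen test suite drives the external-quality measurement, can be sketched with a toy example (all functions, suites and numbers below are hypothetical, not from the paper): if external quality is computed as the fraction of test cases a submission passes, two different suites can score the same two submissions very differently.

```python
# Toy illustration: external quality as the pass rate of a test suite.
# Two hypothetical suites score the same two submissions differently.

def is_even(n):          # submission A: correct for all integers
    return n % 2 == 0

def is_even_buggy(n):    # submission B: wrong for negative numbers
    return n >= 0 and n % 2 == 0

def quality(func, suite):
    """External quality = fraction of (input, expected) cases passed."""
    passed = sum(1 for arg, expected in suite if func(arg) == expected)
    return passed / len(suite)

suite_1 = [(2, True), (3, False), (10, True)]               # positives only
suite_2 = [(2, True), (-2, True), (-3, False), (0, True)]   # covers negatives

print(quality(is_even, suite_1), quality(is_even_buggy, suite_1))  # 1.0 1.0
print(quality(is_even, suite_2), quality(is_even_buggy, suite_2))  # 1.0 0.75
```

Under suite_1 both submissions look perfect; suite_2 exposes the defect, which is why unreported test suites make such measurements irreproducible.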
August 2024 · 18 Reads
Experimentation is an essential method for causal inference in any empirical discipline. Crossover-design experiments are common in Software Engineering (SE) research. In these, subjects apply more than one treatment in different orders. This design increases the amount of data obtained and deals with subject variability, but introduces threats to internal validity such as the learning and carryover effects. Vegas et al. reviewed the state of practice for crossover designs in SE research and provided guidelines on how to address these threats during data analysis while still harnessing the design's benefits. In this paper, we reflect on the impact of these guidelines and review the state of analysis of crossover-design experiments in SE publications between 2015 and March 2024. To this end, by conducting a forward snowballing of the guidelines, we survey 136 publications reporting 67 crossover-design experiments and evaluate their data analysis against the provided guidelines. The results show that the validity of data analyses has improved compared to the original state of analysis. Still, despite the explicit guidelines, only 29.5% of all threats to validity were addressed properly. While the maturation and optimal-sequence threats are properly addressed in 35.8% and 38.8% of all studies in our sample, respectively, the carryover threat is modeled in only about 3% of the observed cases. The lack of adherence to the analysis guidelines threatens the validity of the conclusions drawn from crossover-design experiments.
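The core arithmetic of a two-sequence (AB/BA) crossover analysis can be sketched in a few lines (the scores below are invented for illustration, not data from the survey): averaging within-subject period differences per sequence separates the treatment effect from the period (learning) effect, which is exactly the kind of modeling the guidelines ask for.

```python
# Minimal AB/BA crossover sketch on hypothetical scores. Each subject
# applies both treatments in one of two sequences; the within-subject
# difference (period 2 minus period 1) mixes treatment and period effects,
# and contrasting the two sequences disentangles them.

# (period-1 score, period-2 score) per subject
seq_AB = [(6.0, 8.5), (5.5, 8.0), (6.5, 9.0)]  # treatment A first, then B
seq_BA = [(7.5, 6.0), (8.0, 6.5), (7.0, 5.5)]  # treatment B first, then A

def mean(xs):
    return sum(xs) / len(xs)

d_AB = mean([p2 - p1 for p1, p2 in seq_AB])  # (B - A) + period effect
d_BA = mean([p2 - p1 for p1, p2 in seq_BA])  # (A - B) + period effect

treatment_effect = (d_AB - d_BA) / 2  # B vs. A; the period effect cancels
period_effect = (d_AB + d_BA) / 2     # learning; the treatment effect cancels

print(treatment_effect, period_effect)
```

The carryover threat, which the survey finds modeled in only ~3% of cases, would additionally require comparing sequence totals (or an explicit carryover term in a mixed model); this sketch deliberately stops at the treatment/period decomposition.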
July 2024 · 21 Reads
Test-driven development (TDD) is a widely used agile practice. However, very little is known with certainty about TDD's underlying foundations, i.e., the way TDD works. In this paper, we propose a theoretical framework for TDD with the following characteristics: 1) each TDD cycle represents a vertical slice of a (probably equally small) user story, 2) vertical slices are captured using contracts, implicit in the developers' minds, and 3) the code created during a TDD cycle is a slice-based specification of a code oracle, using the contracts as slicing pre/post-conditions. We have checked the connections among TDD, contracts, and slices using a controlled experiment conducted in industry.
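The framework's idea of a TDD cycle whose implicit contract becomes explicit pre/post-conditions can be illustrated with a hypothetical sketch (the `withdraw` example and its contract are invented, not taken from the paper): the test is written first (red), then just enough code is added to make it pass (green), with the slice's contract spelled out as assertions.

```python
# One hypothetical TDD cycle. The test below is written first; the
# implementation then makes the slice's contract explicit as
# pre/post-condition assertions.

def test_withdraw_reduces_balance():
    account = {"balance": 100}
    withdraw(account, 30)
    assert account["balance"] == 70

def withdraw(account, amount):
    # Contract of this vertical slice (normally implicit in the
    # developer's mind, per the proposed framework):
    assert amount > 0, "precondition: positive amount"
    assert account["balance"] >= amount, "precondition: sufficient funds"
    old = account["balance"]
    account["balance"] -= amount
    assert account["balance"] == old - amount, "postcondition"

test_withdraw_reduces_balance()
print("cycle green")
```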
July 2024 · 15 Reads
June 2024 · 30 Reads
Experiments are a commonly used method of research in software engineering (SE). Researchers report their experiments following detailed guidelines. However, researchers do not, in the field of test-driven development (TDD) at least, specify how they operationalized the response variables and the measurement process. This article has three aims: (i) identify the response variable operationalization components in TDD experiments that study external quality; (ii) study their influence on the experimental results; (iii) determine if the experiment reports describe the measurement process components that have an impact on the results. We used a two-part sequential mixed-methods design. The first part adopts a quantitative approach, applying a statistical analysis (SA) of the impact of the operationalization components on the experimental results. The second part follows on with a qualitative approach, applying a systematic mapping study (SMS). The test suites, intervention types and measurers influence the measurements and the results of the SA of TDD experiments in SE. The test suites have a major impact on both the measurements and the results of the experiments. The intervention type has less impact on the results than on the measurements. While the measurers have an impact on the measurements, this is not transferred to the experimental results. On the other hand, the results of our SMS confirm that TDD experiments do not usually report the test suites, the test case generation method, or the details of how external quality was measured. A measurement protocol should be used to ensure that the measurements made by different measurers are similar. It is necessary to report the test cases, the experimental task and the intervention type in order to be able to reproduce the measurements and SA, as well as to replicate experiments and build dependable families of experiments.
November 2023 · 1 Citation
June 2023 · 255 Reads
[Context] Microservices enable the decomposition of applications into small and independent services connected to one another. The independence between services can positively affect the development velocity of a project, an important metric measuring the time taken to implement features and fix bugs. However, no studies have investigated the connection between microservices and development velocity. [Objective and Method] The goal of this study plan is to investigate the effect microservices have on development velocity. The study compares GitHub projects that adopted microservices from the beginning with similar projects using monolithic architectures. We designed this study using the cohort study method, to obtain a high level of evidence. [Results] The results of this work will make it possible to confirm whether microservices effectively improve development velocity. Moreover, this study will contribute to the body of knowledge of empirical methods, being among the first works to adopt the cohort study methodology.
May 2023 · 41 Reads
Software engineering techniques increasingly rely on deep learning approaches to support many software engineering tasks, from bug triaging to code generation. To assess the efficacy of such techniques, researchers typically perform controlled experiments. Conducting these experiments, however, is particularly challenging given the complexity of the space of variables involved: specialized and intricate architectures and algorithms, a large number of training hyper-parameters, and choices of evolving datasets, all compounded by how rapidly machine learning technology is advancing and by the inherent sources of randomness in the training process. In this work we conduct a mapping study, examining 194 experiments with techniques that rely on deep neural networks, appearing in 55 papers published in premier software engineering venues, to characterize the state of the practice and pinpoint common trends and pitfalls in these experiments. Our study reveals that most of the experiments, including those that have received ACM artifact badges, have fundamental limitations that raise doubts about the reliability of their findings. More specifically, we find: weak analyses to determine that there is a true relationship between independent and dependent variables (87% of the experiments); limited control over the space of DNN-relevant variables, which can render a relationship between dependent variables and treatments that may be correlational rather than causal (100% of the experiments); and a lack of specificity in terms of which DNN variables and values are used in the experiments to define the treatments being applied (86% of the experiments), which makes it unclear whether the techniques designed are the ones being assessed, or how the sources of extraneous variation are controlled. We provide some practical recommendations to address these limitations.
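One of the pitfalls the study highlights, uncontrolled randomness, can be illustrated with a toy stand-in for stochastic training (hypothetical example; real DNN training involves further seeds for the framework, the GPU kernels, and the data-loading order):

```python
# Toy illustration of uncontrolled randomness in experiments. A stochastic
# "training" run yields a different result on every execution unless the
# seed is fixed and reported, so unseeded experiments cannot be replicated.
import random

def train(seed):
    rng = random.Random(seed)
    # Stand-in for stochastic training: the score depends on the RNG stream.
    return round(0.8 + 0.1 * rng.random(), 4)

runs_seeded = [train(seed=42) for _ in range(3)]     # replicable
runs_unseeded = [train(seed=None) for _ in range(3)]  # entropy-seeded

print(runs_seeded)                      # three identical values
print(len(set(runs_seeded)) == 1)       # True: the run can be replicated
```

Reporting the seed (and every other treatment-defining variable) is precisely the specificity the mapping study finds missing in 86% of the surveyed experiments.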
Citations (60)
... We then presented the design outline of the visualization methodology experiment in [23], which involved implementing a tool named Microvision utilizing Augmented Reality (AR) medium to address the challenge of limited rendering space in traditional visualization, as detailed in [24]. For assessment purposes, we conducted controlled experiments comparing Microvision with conventional 2D-graph-based visualization tools, with the protocol and results analysis published in [25]. The 2D tool uses rectangular boxes and arrows to present microservice dependency graphs, similar to commercial and open-source tools. ...
- Citing Conference Paper
May 2023
... Purposive sampling is the most common methodology for participant selection in SE research [28, 29]. The experts have broad experience in a variety of SE dimensions, such as requirements engineering (RE), software development, testing, and SPI in both organizational and academic settings. Table 1 shows that about 70% (7) of the experts have over 10 years of experience in SE in academia and/or industry. ...
- Citing Conference Paper
August 2021
... In order to ensure the smooth progress of the experiment, we translated the experimental materials into the participants' native language (Spanish) so that they did not have to spend time and mental effort on language translation. We acknowledge that, although the experimental material was translated into the participants' native language to make them feel comfortable, the self-assessment questions may not accurately capture their background [64], [65]. Thus, the use of questionnaires may have biased the results of the satisfaction response variable. ...
- Citing Article
May 2021
Empirical Software Engineering
... A discussion of related work (Section 8), and a short conclusion (Section 9) wrap up our contribution. We use a combination of engineering research, benchmarking and repository mining in this paper [31]. ...
- Citing Technical Report
March 2021
... We round out the results of the LMMs with a cumulative meta-analysis [60] of LMMs (see Appendix C, available in the online supplemental material), which shows that the treatment estimates for both SOCIO and Creately tend to come closer (with increasingly narrower confidence intervals) as the experiment results within the family pile up (i.e., EXP1 vs. EXP1+EXP2, EXP1+EXP2+EXP3). ...
- Citing Article
February 2021
Empirical Software Engineering
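The cumulative meta-analysis mentioned in the snippet above can be sketched with inverse-variance pooling (the effect estimates and standard errors below are invented, not the paper's LMM results): recomputing the pooled estimate after each added experiment shows the confidence interval narrowing as evidence accumulates.

```python
# Sketch of a cumulative fixed-effect meta-analysis on invented numbers.
# After each added experiment, the pooled estimate is recomputed by
# inverse-variance weighting; the 95% CI narrows as experiments pile up.
import math

# (effect estimate, standard error) per experiment, in chronological order
experiments = [(0.50, 0.30), (0.40, 0.25), (0.45, 0.20)]

def pooled(upto):
    ws = [1 / se**2 for _, se in experiments[:upto]]
    est = sum(w * e for (e, _), w in zip(experiments[:upto], ws)) / sum(ws)
    se = math.sqrt(1 / sum(ws))
    return est, (est - 1.96 * se, est + 1.96 * se)

for k in range(1, len(experiments) + 1):
    est, (lo, hi) = pooled(k)
    print(f"EXP1..EXP{k}: {est:.3f} [{lo:.3f}, {hi:.3f}] width={hi - lo:.3f}")
```

A random-effects model (adding between-experiment variance) would widen these intervals; the fixed-effect version is the simplest form of the cumulative idea.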
... In order to ensure the smooth progress of the experiment, we translated the experimental materials into the participants' native language (Spanish) so that they did not have to spend time and mental effort on language translation. We acknowledge that, although the experimental material was translated into the participants' native language to make them feel comfortable, the self-assessment questions may not accurately capture their background [64], [65]. Thus, the use of questionnaires may have biased the results of the satisfaction response variable. ...
- Citing Article
November 2020
Empirical Software Engineering
... A software testing process aims to structure the stages, activities, artifacts, roles, and responsibilities of testing, allowing organization and control of the entire testing cycle, minimizing risks and adding quality to the software [1]. However, it is considered one of the most costly practices in the development process [2]. Software testing therefore requires good management in order to avoid wasted resources and schedule delays. ...
- Citing Article
March 2004
Empirical Software Engineering
... The works that are most related to ours are [2], [66], and [70], who propose a tabular form to summarize the experiments that compose a family. In [2], replications are reported including their motivation, their changes (in unstructured narrative text), the confirmation or non-confirmation of results in previous experiments, other characteristics such as subjects, tasks, and materials, and whether hypotheses or research questions changed from previous experiments. ...
- Citing Article
July 2020
Empirical Software Engineering
... More and more replications of experiments are being conducted in SE [15]. Different authors have analysed the process of experiment replication [16] and data aggregation techniques [17] in order to identify the best techniques for use in the field of SE. ...
- Citing Article
June 2020
Software Quality Journal
... [PS119], IEEE Transactions on Software Engineering, ITL (-): the subjects produce better quality code with ITL than with TDD where the subtasks are divided into user stories, coded and tested [67]. (Legend: TF = Test First; TL = Test Last; ITL = Iterative Test Last; BDD = Behaviour-Driven Development; YW = Your Way.) The table also reports (iv) who ran the test cases, (v) the technique used to generate the test cases, and (vi) the respective reference. As Table 7 shows, test cases are reported in three experiments and not detailed in 15, that is, 16.66% and 83.33%, respectively. ...
- Citing Article
October 2019
IEEE Transactions on Software Engineering