Sira Vegas’s research while affiliated with Universidad Politécnica de Madrid and other places


Publications (103)


Evidence-Based Commit Message Generation with Deep Learning Techniques (EvidenCoM)
  • Conference Paper

October 2024

Sira Vegas · Xavier Ferre · Hongming Zhu


Relevant information in TDD experiment reporting

August 2024 · 7 Reads

ACM Transactions on Software Engineering and Methodology

[...] · Sira Vegas

Experiments are a commonly used method of research in software engineering (SE). Researchers report their experiments following detailed guidelines. However, researchers do not, in the field of test-driven development (TDD) at least, specify how they operationalized the response variables and, particularly, the measurement process. This article has three aims: (i) identify the response variable operationalization components in TDD experiments that study external quality; (ii) study their influence on the experimental results; (iii) determine if the experiment reports describe the measurement process components that have an impact on the results. We used two-part sequential mixed methods research. The first part of the research adopts a quantitative approach applying a statistical analysis of the impact of the operationalization components on the experimental results. The second part follows on with a qualitative approach applying a systematic mapping study (SMS). The test suites, intervention types and measurers have an influence on the measurements and results of the statistical analysis of TDD experiments in SE. The test suites have a major impact on both the measurements and the results of the experiments. The intervention type has less impact on the results than on the measurements. While the measurers have an impact on the measurements, this is not transferred to the experimental results. On the other hand, the results of our SMS confirm that TDD experiments do not usually report either the test suites, the test case generation method, or the details of how external quality was measured. A measurement protocol should be used to assure that the measurements made by different measurers are similar. It is necessary to report the test cases, the experimental task and the intervention type in order to be able to reproduce the measurements and statistical analyses, as well as to replicate experiments and build dependable families of experiments.


Figure 1: Relevant Factors Influencing the Response Variable in a Crossover-Design Experiment
Figure 2: Types of subjects in the experiments
Figure 8: Availability of Analysis Scripts
Data Extraction Attributes
Types of Threat Addressal
Crossover Designs in Software Engineering Experiments: Review of the State of Analysis
  • Preprint
  • File available

August 2024 · 18 Reads

Experimentation is an essential method for causal inference in any empirical discipline. Crossover-design experiments are common in Software Engineering (SE) research. In these, subjects apply more than one treatment in different orders. This design increases the amount of obtained data and deals with subject variability but introduces threats to internal validity, such as the learning and carryover effects. Vegas et al. reviewed the state of practice for crossover designs in SE research and provided guidelines on how to address their threats during data analysis while still harnessing their benefits. In this paper, we reflect on the impact of these guidelines and review the state of analysis of crossover-design experiments in SE publications between 2015 and March 2024. To this end, by conducting a forward snowballing of the guidelines, we survey 136 publications reporting 67 crossover-design experiments and evaluate their data analysis against the provided guidelines. The results show that the validity of data analyses has improved compared to the original state of analysis. Still, despite the explicit guidelines, only 29.5% of all threats to validity were addressed properly. While the maturation and the optimal sequence threats are properly addressed in 35.8% and 38.8% of all studies in our sample, respectively, the carryover threat is only modeled in about 3% of the observed cases. The lack of adherence to the analysis guidelines threatens the validity of the conclusions drawn from crossover-design experiments.


The role of slicing in test-driven development

July 2024 · 21 Reads

Test-driven development (TDD) is a widely used agile practice. However, very little is known with certainty about TDD's underlying foundations, i.e., the way TDD works. In this paper, we propose a theoretical framework for TDD, with the following characteristics: 1) each TDD cycle represents a vertical slice of a (probably also small) user story, 2) vertical slices are captured using contracts, implicit in the developers' minds, and 3) the code created during a TDD cycle is a slice-based specification of a code oracle, using the contracts as slicing pre/post-conditions. We have checked the connections among TDD, contracts, and slices using a controlled experiment conducted in industry.



Relevant information in TDD experiment reporting

June 2024 · 30 Reads

Experiments are a commonly used method of research in software engineering (SE). Researchers report their experiments following detailed guidelines. However, researchers do not, in the field of test-driven development (TDD) at least, specify how they operationalized the response variables and the measurement process. This article has three aims: (i) identify the response variable operationalization components in TDD experiments that study external quality; (ii) study their influence on the experimental results; (iii) determine if the experiment reports describe the measurement process components that have an impact on the results. We used a sequential mixed method. The first part of the research adopts a quantitative approach applying a statistical analysis (SA) of the impact of the operationalization components on the experimental results. The second part follows on with a qualitative approach applying a systematic mapping study (SMS). The test suites, intervention types and measurers have an influence on the measurements and results of the SA of TDD experiments in SE. The test suites have a major impact on both the measurements and the results of the experiments. The intervention type has less impact on the results than on the measurements. While the measurers have an impact on the measurements, this is not transferred to the experimental results. On the other hand, the results of our SMS confirm that TDD experiments do not usually report either the test suites, the test case generation method, or the details of how external quality was measured. A measurement protocol should be used to assure that the measurements made by different measurers are similar. It is necessary to report the test cases, the experimental task and the intervention type in order to be able to reproduce the measurements and SA, as well as to replicate experiments and build dependable families of experiments.



Does Microservices Adoption Impact the Development Velocity? A Cohort Study. A Registered Report

June 2023 · 255 Reads

[Context] Microservices enable the decomposition of applications into small, independent, interconnected services. The independence between services could positively affect the development velocity of a project, which is considered an important metric measuring the time taken to implement features and fix bugs. However, no studies have investigated the connection between microservices and development velocity. [Objective and Method] The goal of this study plan is to investigate the effect microservices have on development velocity. The study compares GitHub projects adopting microservices from the beginning and similar projects using monolithic architectures. We designed this study using a cohort study method, to enable obtaining a high level of evidence. [Results] The results of this work will make it possible to confirm whether microservices effectively improve development velocity. Moreover, this study will contribute to the body of knowledge of empirical methods, being among the first works adopting the cohort study methodology.


Number of papers within scope analyzed / published, and experiments analyzed (in parentheses)
Assessment of a sampled experiment [51] specification in terms of Fully addressed, Partially addressed, or Missing.
Characterization of 194 experiments with DNNs
Characterization of (44) experiments that earned ACM Artifact Badges.
Extraneous variables and how to deal with them
Pitfalls in Experiments with DNN4SE: An Analysis of the State of the Practice

May 2023 · 41 Reads

Software engineering techniques are increasingly relying on deep learning approaches to support many software engineering tasks, from bug triaging to code generation. To assess the efficacy of such techniques, researchers typically perform controlled experiments. Conducting these experiments, however, is particularly challenging given the complexity of the space of variables involved, from specialized and intricate architectures and algorithms to a large number of training hyper-parameters and choices of evolving datasets, all compounded by how rapidly the machine learning technology is advancing, and the inherent sources of randomness in the training process. In this work we conduct a mapping study, examining 194 experiments with techniques that rely on deep neural networks appearing in 55 papers published in premier software engineering venues to provide a characterization of the state of the practice, pinpointing common trends and pitfalls in the experiments. Our study reveals that most of the experiments, including those that have received ACM artifact badges, have fundamental limitations that raise doubts about the reliability of their findings. More specifically, we find: weak analyses to determine that there is a true relationship between independent and dependent variables (87% of the experiments); limited control over the space of DNN relevant variables, which can render a relationship between dependent variables and treatments that may not be causal but rather correlational (100% of the experiments); and lack of specificity in terms of what are the DNN variables and their values utilized in the experiments (86% of the experiments) to define the treatments being applied, which makes it unclear whether the techniques designed are the ones being assessed, or how the sources of extraneous variation are controlled. We provide some practical recommendations to address these limitations.


Citations (60)


... We then presented the design outline of the visualization methodology experiment in [23], which involved implementing a tool named Microvision utilizing Augmented Reality (AR) medium to address the challenge of limited rendering space in traditional visualization, as detailed in [24]. For assessment purposes, we conducted controlled experiments comparing Microvision with conventional 2D-graph-based visualization tools, with the protocol and results analysis published in [25]. The 2D tool uses rectangular boxes and arrows to present microservice dependency graphs, similar to commercial and open-source tools. ...

Reference:

Fostering Microservice Maintainability Assurance through a Comprehensive Framework
Comparing 2D and Augmented Reality Visualizations for Microservice System Understandability: A Controlled Experiment
  • Citing Conference Paper
  • May 2023

... Purposive sampling is the most common methodology for participant selection in SE research. 28,29 The experts have a broad experience in a variety of SE dimensions, such as requirements engineering (RE), software development, testing, and SPI in both organizational and academic settings. Table 1 shows that about 70% (7) of the experts have over 10 years of experience in SE in academia and/or industry. ...

Towards a Methodology for Participant Selection in Software Engineering Experiments. A Vision of the Future

... In order to ensure the smooth progress of the experiment, we translated the experimental materials into the participants' native language (Spanish) so that they did not have to spend time and mental effort on language translation. We acknowledge that, although the experimental material was translated into the participants' native language to make them feel comfortable, the self-assessment questions may not accurately capture their background [64], [65]. Thus, the use of questionnaires may have biased the results of the satisfaction response variable. ...

A family of experiments on test-driven development

Empirical Software Engineering

... We round out the results of the LMMs with a cumulative meta-analysis [60] of LMMs (see Appendix C, available in the online supplemental material), which shows that the treatment estimates for both SOCIO and Creately tend to come closer (with increasingly narrower confidence intervals) as the experiment results within the family pile up (i.e., EXP1 vs. EXP1+EXP2, EXP1+EXP2+EXP3). ...

Comparing the results of replications in software engineering

Empirical Software Engineering

... In order to ensure the smooth progress of the experiment, we translated the experimental materials into the participants' native language (Spanish) so that they did not have to spend time and mental effort on language translation. We acknowledge that, although the experimental material was translated into the participants' native language to make them feel comfortable, the self-assessment questions may not accurately capture their background [64], [65]. Thus, the use of questionnaires may have biased the results of the satisfaction response variable. ...

A Family Of Experiments on Test-Driven Development
  • Citing Article
  • November 2020

Empirical Software Engineering

... A software testing process aims to structure the stages, activities, artifacts, roles, and responsibilities of testing, enabling organization and control of the entire testing cycle, minimizing risks and adding quality to the software [1]. However, it is considered one of the most costly practices in the development process [2]. For this reason, software testing needs good management in order to avoid wasted resources and schedule delays. ...

Reviewing 25 Years of Testing Technique Experiments
  • Citing Article
  • March 2004

Empirical Software Engineering

... The works that are most related to ours are [2], [66], and [70], who propose a tabular form to summarize the experiments that compose a family. In [2], replications are reported including their motivation, their changes (in unstructured narrative text), the confirmation or non-confirmation of results in previous experiments, other characteristics such as subjects, tasks, and materials, and whether hypotheses or research questions changed from previous experiments. ...

On (Mis)perceptions of testing effectiveness: an empirical study

Empirical Software Engineering

... More and more replications of experiments are being conducted in SE [15]. Different authors have analysed the process of experiment replication [16] and data aggregation techniques [17] in order to identify the best techniques for use in the field of SE. ...

Increasing validity through replication: an illustrative TDD case

Software Quality Journal

... [PS119] I IEEE Transactions on Software Engineering ITL (-) The subjects produce better quality code with ITL than with TDD where the subtasks are divided into user stories, coded and tested. [67] TF=Test First; TL = Test Last; ITL = Iterative Test Last, BDD = Behaviour-driven development; YW = Your way reported, (iv) who ran the test cases, (v) the technique used to generate the test cases, and (vi) the respective reference. As Table 7 shows, test cases are reported in three experiments and not detailed in 15, that is, 16.66 % and 83.33 %, respectively. ...

Investigating the Impact of Development Task on External Quality in Test-Driven Development: An Industry Experiment
  • Citing Article
  • October 2019

IEEE Transactions on Software Engineering