PrioriTTVs: A process aimed at supporting researchers
to prioritize threats to validity and their mitigation
actions when planning controlled experiments in SE
1,4Eudis Teixeira, 1Liliane Fonseca, 2Bruno Cartaxo, 1,3 Sergio Soares
1Federal University of Pernambuco (UFPE) - Recife, Pernambuco, Brazil
2Federal Institute of Pernambuco (IFPE) - Paulista, Pernambuco, Brazil
3Senai Innovation Institute for ICT (ISI-TICs) - Recife, Pernambuco, Brazil
4Federal Institute of Sertão Pernambucano (IF Sertão-PE) - Petrolina, Pernambuco, Brazil
eot@cin.ufpe.br, lss4@cin.ufpe.br, email@brunocartaxo.com, scbs@cin.ufpe.br
Abstract
Context: Researchers argue that a critical component of any empirical study in Software Engineering (SE) is to identify, analyze, and mitigate threats to validity.
Objective: We propose PrioriTTVs, a process to support researchers in identifying and prioritizing threats to validity and their corresponding mitigation actions when planning controlled experiments in SE. We also introduce a tool to support the entire process.
Method: Empirical studies were conducted with six experts and 20 postgraduate students to evaluate the ease of use, learning, and perceptions of satisfaction regarding PrioriTTVs.
Results: So far, participants have considered PrioriTTVs useful (83%), found that it significantly contributes to learning (90%), and reported satisfaction with it (75%).
Conclusions: We believe both novice and expert users can benefit from the process we propose for addressing threats to validity when conducting SE experiments. We also intend to extend our approach to manage threats specific to different SE experiment contexts.
Keywords: Empirical Studies, Threats to Validity, Controlled Experiment
Preprint submitted to Information and Software Technology August 2, 2019
1. Introduction
To obtain correct conclusions from an experiment, researchers must identify and manage factors that may interfere with its results and thereby bias its conclusions [1]. Threats to validity (TTVs) are the potential risks (to external, internal, conclusion, and construct validity) that may occur during the planning, execution, and reporting of empirical studies. TTVs may affect the results of studies, compromising their efficacy [2].
An experimental plan is a document in which experimenters describe the specific procedures and directions to be followed when conducting and analyzing experiments, and it also guides the execution and analysis of the research [3]. Therefore, validity must be evaluated during planning, in order to deal with potential threats such as undesired interference or inadequate design decisions [2].
Identifying, analyzing, and prioritizing the TTVs of a controlled experiment is not trivial when defining an experimental plan. For example, when we prioritize one type of threat, another may be affected [2]. There is also controversy over the relationships between TTVs and possible control actions. For instance, by mitigating a threat, a new one may arise [1].
When prioritizing TTVs, one needs to take into account the goals and characteristics of each empirical study. The researcher plays a crucial role in this context because some TTVs will be considered more critical than others.
Fields of study that have come of age, such as medicine, education, the social sciences, and psychology, have strategies to manage TTVs, focusing on threat frequency and severity as a way to prioritize which ones will be mitigated [4, 5].
We recently surveyed 115 specialists from 30 different countries who publish controlled experiments at the leading SE and Empirical Software Engineering (ESE) conferences and journals [6]. The goal was to characterize researchers' actions when trying to mitigate TTVs, as well as to understand their perceptions of the actions they usually deploy compared to the ones provided by the literature. The specialists believe it is not possible to mitigate all TTVs, which is why it is essential to prioritize threats according to each experiment's characteristics.
To overcome those challenges, we propose a process called PrioriTTVs. It aims at identifying, analyzing, and prioritizing TTVs, as well as defining mitigation actions for each threat. With PrioriTTVs, researchers in SE can improve their controlled experiments' plans.
We expect PrioriTTVs to reduce the effort experimenters, especially beginners, spend managing TTVs during the planning phase of their studies. This article is organized as follows: Section 2 reports the proposed process, preliminary experimental results, and main findings; Section 3 presents some discussion, future work, and the main conclusions.
2. The PrioriTTVs Process
The overall focus of this process is to support researchers in identifying and prioritizing TTVs and their respective mitigation actions during the planning of SE controlled experiments. Following the GQM template [7], the goal of this study is:
Analyze the process of managing threats to validity for the purpose of identifying experiment characteristics and of analyzing and prioritizing TTVs and the actions taken to mitigate them, with respect to the planning of controlled experiments involving human subjects, from the point of view of experts and experimental beginners, in the context of Empirical Software Engineering.
We also extended a previously available tool for planning and reviewing experimental plans, developed by Fonseca [8], to support the PrioriTTVs process. The tool, ValidEPlan¹, supports validity-oriented experimental planning, automating our process.
¹ValidEPlan: Validity-Oriented Software Engineering Experiments Planning Tool. https://valideplan.cin.ufpe.br
2.1. Conceptual Modeling
Figure 1 presents PrioriTTVs in SA, a structured analysis language [9]. In SA, inputs are represented by arrows entering from the left, outputs by arrows leaving to the right, controls by arrows from the top, and mechanisms by arrows from the bottom, as depicted in Figure 1.
Figure 1: PrioriTTVs: A process to identify and prioritize threats to validity.
A1. Identify Experiment Characteristics: In this activity, the major characteristics of the experiment are identified by the experimenter through an instrument² (a checklist). Its creation is based on the structure of an experimental plan defined by Fonseca [3]. The instrument is divided into eight categories: (1) Goals; (2) Hypotheses, Variables, and Measurement; (3) Participants; (4) Experimental Materials and Tasks; (5) Experimental Design; (6) Procedure; (7) Data Collection and Data Analysis; and (8) Document. The checklist has 43 questions associated with these eight categories, each of which may be answered with one of four options: Yes, Partially, No, or Not Applicable. The experiment characteristics are the output of Activity A1 and, in turn, the input to the next activity. Activity A1-1 uses a search algorithm to relate the checklist responses filled in by the experimenter to the TTVs database. The output is a list of potential threats and their mitigation actions, suggested to the experimenter.
²The instrument for Experiment Characterization (checklist) is available at https://tinyurl.com/y8qz7cxq
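The paper specifies Activity A1-1 only as "a search algorithm" over the TTVs database. The following Python sketch illustrates one plausible shape for that step, assuming the database is keyed by (question, answer) pairs; the question identifiers, threat names, and mitigation actions shown are hypothetical, not taken from the actual database.

    # Sketch of Activity A1-1: relating checklist answers to a TTV database.
    # The keys, threat names, and actions below are hypothetical; the paper
    # does not publish the actual database schema or search algorithm.
    TTV_DATABASE = {
        ("participants.q3", "No"): [
            {"threat": "Selection bias (internal validity)",
             "actions": ["Randomize the assignment of subjects to treatments"]},
        ],
        ("design.q2", "Partially"): [
            {"threat": "Low statistical power (conclusion validity)",
             "actions": ["Perform an a priori power analysis",
                         "Increase the sample size"]},
        ],
    }

    def suggest_threats(checklist_answers):
        """Return the potential TTVs and mitigation actions matching the
        experimenter's checklist responses (the output of Activity A1-1)."""
        suggestions = []
        for question_id, answer in checklist_answers.items():
            suggestions.extend(TTV_DATABASE.get((question_id, answer), []))
        return suggestions

    # Example: two of the 43 checklist questions answered by the experimenter.
    answers = {"participants.q3": "No", "design.q2": "Partially"}
    for s in suggest_threats(answers):
        print(s["threat"], "->", s["actions"])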
A2. Preliminary Threat Analysis: In this activity, the experimenter classifies the threats identified in Activity A1-1 according to their priority. The classification uses three criteria: IMPACT (the consequences the threat would provoke in the experiment), URGENCY (how fast the threat should be addressed), and TENDENCY (a prognosis of how the threat tends to evolve). Each criterion receives a score from 1 to 3 based on its intensity. The output of this activity is a list of prioritized TTVs.
After that, a magnitude scale of TTVs adapted from Garvey [10] helps the experimenter to understand the degree of importance each identified risk has in his/her experiment. The scale is categorized as (1) Very High, (2) High, (3) Moderate, (4) Low, and (5) Very Low. Thus, the experimenter can define specific control actions during experiment planning.
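To make the prioritization concrete, the sketch below scores a threat on the three criteria and maps the combined score onto the five-level magnitude scale. The paper does not state how the three scores are combined, nor which score ranges correspond to which magnitude levels; the product aggregation (as in GUT-style risk matrices) and the cut-off values here are assumptions.

    # Sketch of Activity A2: prioritizing a threat by IMPACT, URGENCY, and
    # TENDENCY (each scored from 1 to 3), then mapping the combined score
    # onto the magnitude scale adapted from Garvey [10]. The multiplication
    # and the range boundaries are assumptions, not the paper's definitions.
    def priority_score(impact, urgency, tendency):
        for value in (impact, urgency, tendency):
            assert 1 <= value <= 3, "each criterion is scored from 1 to 3"
        return impact * urgency * tendency  # 1 (lowest) to 27 (highest)

    def magnitude(score):
        if score >= 18: return "Very High"
        if score >= 12: return "High"
        if score >= 6:  return "Moderate"
        if score >= 3:  return "Low"
        return "Very Low"

    # Example: maximum impact, medium urgency, low tendency.
    score = priority_score(impact=3, urgency=2, tendency=1)
    print(score, magnitude(score))  # prints: 6 Moderate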
A3. Analysis of Mitigation Actions Effectiveness: In this activity, the experimenter classifies the mitigation actions for each TTV. The classification uses a scale of importance/effectiveness with the following possible values: (i) not important; (ii) slightly important; (iii) moderately important; (iv) important; and (v) very important.
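As a minimal illustration of Activity A3, the sketch below encodes the five ordinal labels as integers so that a threat's mitigation actions can be ranked by their rated effectiveness; the integer encoding and the example actions are our assumptions.

    # Sketch of Activity A3: ranking a threat's mitigation actions by the
    # experimenter's importance/effectiveness ratings. Encoding the ordinal
    # labels as integers 1-5 is an assumption made to allow sorting.
    EFFECTIVENESS = {
        "not important": 1, "slightly important": 2,
        "moderately important": 3, "important": 4, "very important": 5,
    }

    def rank_actions(rated_actions):
        """rated_actions: list of (action, label) pairs; returns the pairs
        ordered from most to least effective according to the ratings."""
        return sorted(rated_actions,
                      key=lambda pair: EFFECTIVENESS[pair[1]], reverse=True)

    ratings = [("Randomize the task order", "important"),
               ("Pilot the experimental instruments", "very important"),
               ("Recruit a larger sample", "moderately important")]
    for action, label in rank_actions(ratings):
        print(label, "->", action)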
2.2. Preliminary Experimental Results
We conducted a preliminary assessment with two objectives: (1) to evaluate the proposed procedures for identifying and classifying TTVs and their respective mitigation actions, by assessing the checklist and the proposed threat prioritization procedures; and (2) to evaluate the impact of using the ValidEPlan tool in terms of its ease of use, user satisfaction, and whether it fosters users' learning of concepts related to experiment TTVs. Six experts in ESE participated in this evaluation, all of them members of the ESE Group³. In addition, 20 postgraduate students enrolled in the experimental software engineering course of the Center of Informatics (CIn) at the Federal University of Pernambuco in Brazil took part in this study.
³ESEG - Empirical Software Engineering Group. https://sites.google.com/site/eseportal
The experts evaluated each activity of PrioriTTVs on a rating scale and then justified their responses. They concluded that risk factors such as TTVs have direct repercussions on the quality, reliability, and security of the experiment's data. They also agreed that the ValidEPlan tool may contribute to identifying and prioritizing threats to experimental validity, as well as providing a systematization of experimental planning. The gaps and points for improvement identified by the experts were: (a) the possibility of including new questions in the checklist; (b) addressing other phases of the experimental process; (c) allowing new threats and mitigation actions to be added to the database; and (d) integrating ValidEPlan with other experimental tools.
Concerning the use of ValidEPlan, 50% of the 20 participants believe the tool is useful, and 33.3% consider it extremely useful. On a scale of 1 to 5 regarding ease of use, 33.3% of the participants considered the tool easy to use, and 66.6% considered it extremely easy.
Regarding the learning of experiment TTVs, we highlight three points. First, 90% of the participants totally agree that using the tool contributed to learning about the identification and control of TTVs during experimental planning, while 5% disagree and 5% partially agree. Second, 75% totally agree that using the tool improved their understanding of the theoretical concepts covered in the classroom during the experimental software engineering course, while 20% partially agree and 5% are indifferent. Third, 60% totally agree that using the tool helps to relate the concepts of TTV identification and control studied in the classroom to the practice of experimentation, while 35% partially agree and 5% are indifferent.
Concerning participants' satisfaction, 75% agree that they are satisfied with the ValidEPlan tool, and 25% partially agree. Furthermore, 60% of the participants agree that using the tool provided greater motivation to learn about the control of TTVs, while 25% partially agree, 10% are indifferent, and 5% partially disagree. We also note that 95% of the respondents agree that they would use ValidEPlan again and, moreover, would recommend it to other experimenters, while 5% partially agree.
3. Discussion and Conclusion
Supporting researchers in identifying and classifying experiment TTVs is challenging but undoubtedly relevant. The current state of the art suggests using summaries or checklists to detect threats.
We propose and briefly demonstrate PrioriTTVs, whose aims are: (1) to reduce the effort required of the experimenter through a structured process that supports identifying, prioritizing, and mitigating TTVs in SE experiments; (2) to provide a checklist capable of capturing information about the main features of an experimental plan; (3) to provide an online repository of TTVs and control actions organized by type of threat; and (4) to contribute to the teaching and learning of experimental software engineering by relating the concepts studied in the classroom to the practice of experimentation.
Despite the aforementioned benefits, we envision some challenges yet to be overcome in our proposal. Threats to validity are very specific and tied to each experiment's context, which is why it is hard to build a complete threat database for every specific context. To mitigate this problem, we plan to evaluate PrioriTTVs by planning experiments in different SE contexts. We also plan to conduct a systematic mapping study to identify experiment TTVs in diverse SE contexts.
We also recognize that our checklist is not yet a finished and complete instrument; however, it has great potential to evolve. That is why further evaluations will be conducted to guarantee the presence of the elements essential for identifying the characteristics of a controlled SE experiment, reducing its bias.
In this paper, we propose a process to support researchers in managing TTVs when planning controlled experiments in SE. Additionally, we report preliminary results of empirical studies conducted with six ESE experts and 20 postgraduate students to assess our approach. We believe our approach can positively impact research in SE, enhancing the quality of controlled experiments by improving their validity.
Acknowledgment
This research was partially funded by INES 2.0, FACEPE grant APQ-0399-1.03/17, CAPES grants 88887.136410/2017-00 and 88887.351815/2019-00, and CNPq grants 465614/2014-0 and 141705/2015-9.
References
[1] A. A. Neto, T. Conte, A conceptual model to address threats to validity in controlled experiments, in: Proceedings of the 17th International Conference on Evaluation and Assessment in Software Engineering, ACM, 2013, pp. 82–85.
[2] C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, A. Wesslén, Experimentation in Software Engineering, Springer Science & Business Media, 2012.
[3] L. Fonseca, An instrument for reviewing the completeness of experimental plans for controlled experiments using human subjects in software engineering, Ph.D. thesis, Federal University of Pernambuco, Recife, Brazil, https://repositorio.ufpe.br/handle/123456789/22421 (2016).
[4] V. C. Henderson, J. Kimmelman, D. Fergusson, J. M. Grimshaw, D. G. Hackam, Threats to validity in the design and conduct of preclinical efficacy studies: a systematic review of guidelines for in vivo animal experiments, PLoS Medicine 10 (7) (2013) e1001489.
[5] R. Singh, A. Singh, T. J. Servoss, G. Singh, Prioritizing threats to patient safety in rural primary care, The Journal of Rural Health 23 (2) (2007) 173–178.
[6] E. Teixeira, L. Fonseca, S. Soares, Threats to validity in controlled experiments in software engineering: what the experts say and why this is relevant, in: Proceedings of the XXXII Brazilian Symposium on Software Engineering, ACM, 2018, pp. 52–61.
[7] V. R. Basili, G. Caldiera, H. D. Rombach, Goal question metric paradigm, Encyclopedia of Software Engineering (1) (1994) 528–532.
[8] L. Fonseca, E. Lucena, S. Soares, C. Seaman, Reviewer EP: A collaborative web platform for reviewing the completeness of experimental plans (in Portuguese), in: Tools Session at the VIII Brazilian Conference on Software: Theory and Practice (CBSoft), Brazil, 2017, pp. 97–104.
[9] D. T. Ross, Structured analysis (SA): A language for communicating ideas, IEEE Transactions on Software Engineering (1) (1977) 16–34.
[10] P. R. Garvey, Z. F. Lansdowne, Risk matrix: an approach for identifying, assessing, and ranking program risks, Air Force Journal of Logistics 22 (1) (1998) 18–21.