PrioriTTVs: A process aimed at supporting researchers
to prioritize threats to validity and their mitigation
actions when planning controlled experiments in SE
Eudis Teixeira (1, 4), Liliane Fonseca (1), Bruno Cartaxo (2), Sergio Soares (1, 3)
(1) Federal University of Pernambuco (UFPE) - Recife, Pernambuco, Brazil
(2) Federal Institute of Pernambuco (IFPE) - Paulista, Pernambuco, Brazil
(3) Senai Innovation Institute for ICT (ISI-TICs) - Recife, Pernambuco, Brazil
(4) Federal Institute of Sertão Pernambucano (IF Sertão-PE) - Petrolina, Pernambuco, Brazil
eot@cin.ufpe.br, lss4@cin.ufpe.br, email@brunocartaxo.com, scbs@cin.ufpe.br
Abstract
Context: Researchers argue that a critical component of any empirical study
in Software Engineering (SE) is to identify, analyze, and mitigate threats to
validity.
Objective: We propose PrioriTTVs, a process to support researchers in identifying and prioritizing threats to validity and their corresponding mitigation actions when planning controlled experiments in SE. We also introduce a tool to support the entire process.
Method: Empirical studies were conducted with six experts and 20 postgraduate students to evaluate PrioriTTVs' ease of use, its contribution to learning, and user satisfaction.
Results: So far, participants have considered PrioriTTVs useful (83%), have reported that it contributed to their learning (90%), and have been satisfied with it (75%).
Conclusions: We believe both novice and expert users can benefit from the process we propose for addressing threats to validity when conducting SE experiments. We also intend to extend our approach to manage threats specific to different SE experiment contexts.
Keywords: Empirical Studies, Threats to Validity, Controlled
Experiment
Preprint submitted to Information and Software Technology August 2, 2019
1. Introduction
To obtain correct conclusions from an experiment, researchers must identify and manage factors that may interfere with its results and thereby lead to biased conclusions [1]. Threats to validity (TTVs) are potential risks (to external, internal, conclusion, and construct validity) that may arise during the planning, execution, and reporting of empirical studies. TTVs may affect a study's results and compromise their trustworthiness [2].
An experimental plan is a document in which experimenters describe the specific procedures and directions to be followed when conducting and analyzing an experiment; it also guides the execution and analysis of the research [3]. Validity should therefore be evaluated already during planning, so that potential threats such as undesired interference or inadequate design decisions can be dealt with [2].
Identifying, analyzing, and prioritizing the TTVs of a controlled experiment is not trivial when defining an experimental plan. For example, prioritizing one type of threat may affect another [2]. There is also controversy over the relationships between TTVs and possible control actions; for instance, mitigating one threat may give rise to a new one [1].
When prioritizing TTVs, one needs to take into account the goals and characteristics of each empirical study. The researcher plays a crucial role here, because deciding which TTVs are more critical than others depends on the researcher's judgment of the study at hand.
More mature fields of study, such as medicine, education, the social sciences, and psychology, have strategies to manage TTVs that use threat frequency and severity to prioritize which threats will be mitigated [4, 5].
We recently surveyed 115 specialists from 30 countries who publish controlled experiments at leading SE and Empirical Software Engineering (ESE) conferences and in leading journals [6]. The goal was to characterize the actions researchers take to mitigate TTVs and to understand how they perceive the actions they usually deploy compared with those proposed in the literature. The specialists believe it is not possible to mitigate all TTVs, which is why it is essential to prioritize threats according to each experiment’s characteristics.
To overcome these challenges, we propose a process called PrioriTTVs. It aims to identify, analyze, and prioritize TTVs and to define mitigation actions for each threat. With PrioriTTVs, SE researchers can improve their plans for controlled experiments.
We expect PrioriTTVs to reduce the effort experimenters, especially beginners, spend managing TTVs during the planning phase of their studies. This article is organized as follows: Section 2 presents the proposed process, the preliminary experimental results, and the main findings; Section 3 presents a discussion, future work, and the main conclusions.
2. The PrioriTTVs Process
The overall focus of this process is to support researchers in identifying and prioritizing TTVs and their respective mitigation actions when planning SE controlled experiments. Accordingly, we state the goal of this study following the GQM template [7]:
Analyze the process of managing threats to validity
for the purpose of identifying experiment characteristics and analyzing and prioritizing TTVs and the actions taken to mitigate them
with respect to the planning of controlled experiments involving human subjects
from the point of view of experts and beginners in experimentation
in the context of Empirical Software Engineering.
We also further developed a previously available tool for planning and reviewing experimental plans, created by Fonseca et al. [8], so that it supports the PrioriTTVs process. This tool, ValidEPlan¹, supports validity-oriented experimental planning and automates our process.
¹ ValidEPlan: Validity-Oriented Software Engineering Experiments Planning Tool. https://valideplan.cin.ufpe.br
2.1. Conceptual Modeling
Figure 1 presents PrioriTTVs in Structured Analysis (SA), a language for communicating ideas [9]. In SA, inputs are represented by arrows entering an activity from the left, outputs by arrows leaving it on the right, controls by arrows entering from the top, and mechanisms by arrows entering from the bottom, as depicted in Figure 1.
Figure 1: PrioriTTVs: A process to identify and prioritize threats to validity.
A1. Identify Experiment Characteristics: In this activity, the experimenter identifies the major characteristics of the experiment using an instrument² (a checklist) based on the structure of an experimental plan defined by Fonseca [3]. The instrument is divided into eight categories: (1) Goals; (2) Hypotheses, Variables, and Measurement; (3) Participants; (4) Experimental Materials and Tasks; (5) Experimental Design; (6) Procedure; (7) Data Collection and Data Analysis; and (8) Document. The checklist has 43 questions, distributed across the eight categories, each answered with one of four options: Yes, Partially, No, or Not Applicable. The experiment characteristics are the output of Activity A1 and, in turn, the input to the next activity. Activity A1-1 uses a search algorithm to relate the experimenter's checklist responses to the TTV database. Its output is a list of potential threats and their mitigation actions, offered to the experimenter as suggestions.
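To make the matching step of Activity A1-1 more concrete, the sketch below shows one simple way checklist answers could be related to a threat database. It is not the search algorithm implemented in ValidEPlan, which is not described here: the `Threat` schema, the question IDs (`P3`, `D2`, `M1`), and the rule that only No and Partially answers trigger a threat are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Answer(Enum):
    """The four answer options of the characterization checklist."""
    YES = "Yes"
    PARTIALLY = "Partially"
    NO = "No"
    NOT_APPLICABLE = "Not Applicable"


@dataclass
class Threat:
    """One entry of a hypothetical TTV database."""
    name: str
    validity_type: str           # "internal", "external", "construct" or "conclusion"
    trigger_questions: set[str]  # hypothetical checklist question IDs linked to this threat
    mitigation_actions: list[str] = field(default_factory=list)


def suggest_threats(responses: dict[str, Answer], database: list[Threat]) -> list[Threat]:
    """Suggest the threats linked to questions answered No or Partially.

    Assumption: a Yes answer means the plan already covers the concern;
    Not Applicable answers and unanswered questions are ignored.
    """
    weak_spots = {q for q, a in responses.items() if a in (Answer.NO, Answer.PARTIALLY)}
    # Keep database order; a threat is suggested if any trigger question is weak.
    return [t for t in database if t.trigger_questions & weak_spots]


if __name__ == "__main__":
    # Illustrative entries only; a real database would be far larger.
    database = [
        Threat("Selection bias", "internal", {"P3"},
               ["Randomly assign participants to treatments"]),
        Threat("Low statistical power", "conclusion", {"P3", "D2"},
               ["Perform a power analysis before recruiting participants"]),
        Threat("Mono-operation bias", "construct", {"M1"},
               ["Use more than one task or experimental object"]),
    ]
    responses = {"P3": Answer.PARTIALLY, "D2": Answer.YES, "M1": Answer.YES}
    for threat in suggest_threats(responses, database):
        print(threat.name, "->", threat.mitigation_actions)
```

Storing each threat's trigger questions as a set reduces the match to one set intersection per database entry.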
² The instrument for Experiment Characterization (checklist) is available at https://tinyurl.com/y8qz7cxq
A2. Preliminary Threat Analysis: In this activity, the experimenter classifies the threats identified in Activity A1-1 according to their priority. The classification uses three criteria: IMPACT (the consequences of the threat for the experiment), URGENCY (how quickly the threat should be addressed), and TENDENCY (a prognosis of how the threat tends to evolve). Each criterion receives a score from 1 to 3 according to its intensity. The output of this activity is a list of prioritized TTVs.
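The scoring in Activity A2 can be sketched as follows. The text does not state how the three scores are combined; this sketch assumes they are multiplied, as in the GUT (gravity, urgency, tendency) matrix used in quality management, so each threat receives a priority between 1 and 27. The class name, field names, and example ratings are hypothetical.

```python
from dataclasses import dataclass

VALID_SCORES = (1, 2, 3)


@dataclass(frozen=True)
class ThreatRating:
    """Scores an experimenter assigns to one threat in Activity A2."""
    threat: str
    impact: int    # consequences for the experiment: 1 (low) to 3 (high)
    urgency: int   # how quickly the threat must be addressed: 1 to 3
    tendency: int  # how the threat is likely to evolve if left alone: 1 to 3

    def __post_init__(self) -> None:
        # Reject scores outside the 1..3 range used by PrioriTTVs.
        for criterion in ("impact", "urgency", "tendency"):
            score = getattr(self, criterion)
            if score not in VALID_SCORES:
                raise ValueError(f"{criterion} must be 1, 2 or 3, got {score}")

    @property
    def priority(self) -> int:
        # Assumption: GUT-style product of the three criteria (range 1..27).
        return self.impact * self.urgency * self.tendency


def prioritize(ratings: list[ThreatRating]) -> list[ThreatRating]:
    """Order threats from highest to lowest priority; ties keep their input order."""
    return sorted(ratings, key=lambda r: r.priority, reverse=True)


if __name__ == "__main__":
    ratings = [
        ThreatRating("Mono-operation bias", impact=2, urgency=1, tendency=2),
        ThreatRating("Selection bias", impact=3, urgency=3, tendency=2),
        ThreatRating("Low statistical power", impact=3, urgency=2, tendency=2),
    ]
    for r in prioritize(ratings):
        print(f"{r.priority:>2}  {r.threat}")
```

A product rewards threats that score high on all three criteria at once; a sum would be an equally plausible reading of the text and would compress the range to 3..9.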
Next, a magnitude scale of TTVs adapted from Garvey and Lansdowne [10] helps the experimenter understand how important each identified risk is for their experiment. The scale has five levels: (1) Very High, (2) High, (3) Moderate, (4) Low, and (5) Very Low. With this information, the experimenter can define specific control actions during experiment planning.
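One way to place each prioritized threat on the five-level magnitude scale is a threshold lookup, sketched below under the same multiplicative-score assumption as in Activity A2. The cut-offs and the function name are hypothetical; they are not taken from Garvey and Lansdowne [10].

```python
# Hypothetical cut-offs over the 1..27 range of a GUT-style priority score.
# Each entry: (minimum score, rank on the scale, label).
MAGNITUDE_SCALE = [
    (18, 1, "Very High"),
    (12, 2, "High"),
    (6, 3, "Moderate"),
    (3, 4, "Low"),
    (1, 5, "Very Low"),
]


def magnitude(priority: int) -> tuple[int, str]:
    """Map a priority score to its (rank, label) on the five-level scale."""
    if not 1 <= priority <= 27:
        raise ValueError(f"priority must be between 1 and 27, got {priority}")
    # Thresholds are checked from highest to lowest; the first match wins.
    for threshold, rank, label in MAGNITUDE_SCALE:
        if priority >= threshold:
            return rank, label
    raise AssertionError("unreachable: every valid score is at least 1")


if __name__ == "__main__":
    # Every product of three scores drawn from {1, 2, 3}.
    for score in (1, 2, 3, 4, 6, 8, 9, 12, 18, 27):
        print(score, magnitude(score))
```

With these cut-offs, only the two highest attainable products (18 and 27) are rated Very High; moving the thresholds changes how aggressively threats are escalated.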
A3. Analysis of Mitigation Actions Effectiveness: In this activity, the experimenter rates the mitigation actions for each TTV on an importance/effectiveness scale with the following values: (i) not important; (ii) slightly important; (iii) moderately important; (iv) important; and (v) very important.
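The outcome of Activity A3 can be sketched as a filter that keeps, for each threat, the mitigation actions rated important enough and orders them by rated effectiveness. The process itself only asks the experimenter to rate the actions; the `minimum` cut-off, the function name, and the example actions are hypothetical.

```python
from enum import IntEnum


class Importance(IntEnum):
    """Five-point importance/effectiveness scale used in Activity A3."""
    NOT_IMPORTANT = 1
    SLIGHTLY_IMPORTANT = 2
    MODERATELY_IMPORTANT = 3
    IMPORTANT = 4
    VERY_IMPORTANT = 5


def select_actions(
    ratings: dict[str, dict[str, Importance]],
    minimum: Importance = Importance.IMPORTANT,
) -> dict[str, list[str]]:
    """For each threat, keep actions rated at least `minimum`, most effective first.

    `minimum` is a hypothetical cut-off, not part of PrioriTTVs.
    """
    plan: dict[str, list[str]] = {}
    for threat, actions in ratings.items():
        # Filter by rating, then sort descending; sorted() is stable for ties.
        kept = sorted(
            (item for item in actions.items() if item[1] >= minimum),
            key=lambda item: item[1],
            reverse=True,
        )
        plan[threat] = [action for action, _ in kept]
    return plan


if __name__ == "__main__":
    ratings = {
        "Selection bias": {
            "Block participants by experience level": Importance.MODERATELY_IMPORTANT,
            "Collect background data with a questionnaire": Importance.IMPORTANT,
            "Randomly assign participants to treatments": Importance.VERY_IMPORTANT,
        },
        "Low statistical power": {
            "Use a within-subjects design": Importance.MODERATELY_IMPORTANT,
            "Perform a power analysis": Importance.VERY_IMPORTANT,
        },
    }
    for threat, actions in select_actions(ratings).items():
        print(threat, "->", actions)
```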
2.2. Preliminary experimental results
We conducted a preliminary assessment with two objectives: (1) to evaluate the proposed procedures for identifying and classifying TTVs and their respective mitigation actions, for which we assessed the checklist and the proposed procedures for threat prioritization; and (2) to evaluate the impact of using the ValidEPlan tool in terms of its ease of use, user satisfaction, and whether it fosters users' learning of concepts related to experiment TTVs. Six ESE experts, all members of the ESE Group³, participated in this evaluation. In addition, 20 postgraduate students enrolled in the Experimental Software Engineering course of the Center of Informatics (CIn) at the Federal University of Pernambuco, Brazil, took part in the study.
³ ESEG: Empirical Software Engineering Group. https://sites.google.com/site/eseportal
The experts evaluated each activity of PrioriTTVs on a rating scale and then justified their responses. They concluded that risk factors such as TTVs directly affect the quality, reliability, and security of the experiment's data. They also agreed that ValidEPlan may help identify and prioritize threats to experimental validity and provides a systematic approach to experimental planning. The experts identified the following gaps and points for improvement: (a) allowing new questions to be included in the checklist; (b) addressing other phases of the experimental process; (c) allowing new threats and mitigation actions to be added to the database; and (d) integrating ValidEPlan with other experimentation tools.
Regarding the usefulness of ValidEPlan, 50% of the 20 participants consider the tool useful and 33.3% consider it extremely useful. On a 1-to-5 ease-of-use scale, 33.3% of the participants rated the tool easy to use and 66.6% rated it extremely easy to use.
Regarding learning about experiment TTVs, we highlight three points. First, 90% of the participants totally agree that using the tool contributed to their learning about identifying and controlling TTVs during experimental planning, while 5% disagree and 5% partially agree. Second, 75% totally agree that using the tool improved their understanding of the theoretical concepts covered in the Experimental Software Engineering course, while 20% partially agree and 5% are indifferent. Third, 60% totally agree that using the tool helps them relate the concepts about identifying and controlling TTVs studied in class to the practice of experimentation, while 35% partially agree and 5% are indifferent.
Regarding participants’ satisfaction, 75% agree that they are satisfied with the ValidEPlan tool and 25% partially agree. Moreover, 60% of the participants agree that using the tool gave them greater motivation to learn about controlling TTVs, while 25% partially agree, 10% are indifferent, and 5% partially disagree. Finally, 95% of the respondents agree that they would use ValidEPlan again and would recommend it to other experimenters, while 5% partially agree.
3. Discussion and Conclusion
Supporting researchers in identifying and classifying the TTVs of experiments is challenging but relevant. The current state of the art suggests using summaries or checklists to detect threats.
We propose and briefly demonstrate PrioriTTVs, whose aims are: (1) to reduce the effort required from experimenters through a structured process that supports identifying, prioritizing, and mitigating TTVs in SE experiments; (2) to provide a checklist that captures information about the main features of an experimental plan; (3) to provide an online repository of TTVs and control actions organized by threat type; and (4) to contribute to the teaching and learning of experimental software engineering by relating the concepts studied in the classroom to the practice of experimentation.
Despite these benefits, some challenges remain. Threats to validity are highly specific to each experiment’s context, so it is hard to build a complete threat database for every context. To mitigate this problem, we plan to evaluate PrioriTTVs by using it to plan experiments in different SE contexts. We also plan to conduct a systematic mapping study to identify the TTVs of experiments in diverse SE contexts.
We also recognize that our checklist is not yet a finished and complete instrument, although it can evolve considerably. We will therefore conduct further evaluations to ensure that it contains the elements essential to characterizing a controlled SE experiment, thereby reducing the experiment’s bias.
In this paper, we propose a process to support researchers in managing TTVs when planning controlled experiments in SE. We also report preliminary results of empirical studies conducted with six ESE experts and 20 postgraduate students to assess our approach. We believe our approach can positively impact SE research by improving the validity, and thus the quality, of controlled experiments.
Acknowledgment
This research was partially funded by INES 2.0, FACEPE grant APQ-0399-1.03/17, CAPES grants 88887.136410/2017-00 and 88887.351815/2019-00, and CNPq grants 465614/2014-0 and 141705/2015-9.
References
[1] A. A. Neto, T. Conte, A conceptual model to address threats to validity in controlled experiments, in: Proceedings of the 17th International Conference on Evaluation and Assessment in Software Engineering, ACM, 2013, pp. 82–85.
[2] C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, A. Wesslén, Experimentation in Software Engineering, Springer Science & Business Media, 2012.
[3] L. Fonseca, An instrument for reviewing the completeness of experimental plans for controlled experiments using human subjects in software engineering, Ph.D. thesis, Federal University of Pernambuco, Recife, Brazil, https://repositorio.ufpe.br/handle/123456789/22421 (2016).
[4] V. C. Henderson, J. Kimmelman, D. Fergusson, J. M. Grimshaw, D. G. Hackam, Threats to validity in the design and conduct of preclinical efficacy studies: a systematic review of guidelines for in vivo animal experiments, PLoS Medicine 10 (7) (2013) e1001489.
[5] R. Singh, A. Singh, T. J. Servoss, G. Singh, Prioritizing threats to patient safety in rural primary care, The Journal of Rural Health 23 (2) (2007) 173–178.
[6] E. Teixeira, L. Fonseca, S. Soares, Threats to validity in controlled experiments in software engineering: what the experts say and why this is relevant, in: Proceedings of the XXXII Brazilian Symposium on Software Engineering, ACM, 2018, pp. 52–61.
[7] V. R. Basili, G. Caldiera, H. D. Rombach, Goal question metric paradigm, Encyclopedia of Software Engineering (1) (1994) 528–532.
[8] L. Fonseca, E. Lucena, S. Soares, C. Seaman, Reviewer ep: A collaborative web platform for reviewing the completeness of experimental plans (in Portuguese), in: Tools Session in VIII Brazilian Conference on Software: Theory and Practice (CBSOFT), Brazil, 2017, pp. 97–104.
[9] D. T. Ross, Structured analysis (SA): A language for communicating ideas, IEEE Transactions on Software Engineering (1) (1977) 16–34.
[10] P. R. Garvey, Z. F. Lansdowne, Risk matrix: an approach for identifying, assessing, and ranking program risks, Air Force Journal of Logistics 22 (1) (1998) 18–21.