Levels of Exploration in Exploratory Testing: From
Freestyle to Fully Scripted
Ahmad Nauman Ghazi, Kai Petersen, Elizabeth Bjarnason, and Per Runeson, Member, IEEE
Abstract—Exploratory testing (ET) is a powerful and efficient
way of testing software by integrating design, execution, and
analysis of tests during a testing session. ET is often contrasted
with scripted testing, and seen as a choice of either exploratory
testing or not. In contrast, we pose that exploratory testing can be
of varying degrees of exploration from fully exploratory to fully
scripted. In line with this, we propose a scale for the degree of
exploration and define five levels. In our classification, these levels
of exploration correspond to the way test charters are defined.
We have evaluated this classification through focus groups at
four companies and identified factors that influence the choice
of exploration level. The results show that the proposed levels
of exploration are influenced by different factors such as ease to
reproduce defects, better learning, verification of requirements,
etc., and that the levels can be used as a guide to structure test
charters. Our study also indicates that applying a combination of
exploration levels can be beneficial in achieving effective testing.
Index Terms—Exploratory testing, test charter, test mission,
session-based test management, levels of exploration, exploratory
testing classification, software testing.
I. INTRODUCTION
ADVOCATES of exploratory testing (ET) stress the ben-
efits of providing the tester with freedom to act based
on his/her skills, paired with the reduced effort for test script
design and maintenance. ET can be very effective in detecting
critical defects [1]. We have found that exploratory testing
can be more effective in practice than traditional software
testing approaches, such as scripted testing [1], [2]. ET supports
testers in learning about the system while testing [1], [3].
The ET approach also enables a tester to explore areas of
the software that were overlooked while designing test cases
based on system requirements [4]. However, ET does come
with some shortcomings and challenges. In particular, ET can
be performed in many different ways, and thus there is no
single way of training someone to be an exploratory tester. Also,
exploratory testing tends to be considered an ad-hoc way of
testing and some argue that defects detected using ET are
difficult to reproduce [5].
The benefits of exploratory testing are discussed both within
industry and academia, but little work addresses how to
perform this kind of testing [6]. Bach introduced a technique
named Session Based Test Management (SBTM) [7] that
Ahmad Nauman Ghazi and Kai Petersen are with the Department of
Software Engineering, Blekinge Institute of Technology, Karlskrona, Sweden.
E-mail: nauman.ghazi@bth.se kai.petersen@bth.se
Elizabeth Bjarnason and Per Runeson are with the Department of Computer
Science, Lund University, Sweden. E-mail: elizabeth.bjarnason@cs.lth.se,
per.runeson@cs.lth.se
Manuscript received March 15, 2018.
Manuscript accepted May 06, 2018.
provides a basic structure and guidelines for ET using test
missions. In the context of SBTM, a test mission is an
objective to provide focus on what to test or what problems
to identify within a test session [7]. SBTM provides a strong
focus on designing test charters to scope exploration to the
test missions assigned to exploratory testers. A test charter
provides a clear goal and scopes the test session, and can
be seen as a high level test plan [8] thus the level of detail
provided in the test charter influences the degree of exploration
in the testing.
However, little guidance exists on how to define test charters
in order to achieve, or combine, various degrees of exploration,
even though there is a need in industry for support in choosing
the “right” degree of exploration, see e.g. [9]. In order to make
an informed decision, there is a need to define what is meant
by “degree of exploration”.
We pose that testing can be performed at varying degrees
of exploration from freestyle ET to fully scripted, and propose
a scale for the degree of exploration defined by five distinct
levels of exploration. In this paper, we present a classification
consisting of five levels of exploratory testing (ET) ranging
from free style testing to fully scripted testing. We exemplify
these five levels of exploration with test charter types that were
defined based on studying existing test charters in industry. In
a previous research study [8], we provided a checklist of
contents to support test charter design in exploratory testing.
The focus of that research was to support practitioners in
designing test charters depending on the context of the test
mission and the system under test. We have extended this
research to explore different levels of exploration and how
they map to the contents of the test charters, e.g. test goals,
test steps, etc.
We evaluated our classification through focus groups at
four companies, Sony Mobile Communications, Axis Com-
munications, Ericsson, and Softhouse Consulting. In addition
to validating the levels of exploration, these focus groups
provided insight into factors that influence the choice of one
or more exploration levels to be used in exploratory testing.
The remainder of this paper is structured as follows. Sec-
tion II presents related work on exploratory testing and test
charter design. Section III presents the research methodology
and case descriptions. The results are presented in Section
IV, including the classification (Section IV-A) and evaluation
(Section IV-B). Section V provides the conclusions from the
research and directions for the future work.
II. RELATED WORK
Exploratory testing (ET) is a way to harness the skills,
knowledge and creativity of a software tester. The tester
explores the system while testing, and uses the knowledge
gained to decide how to continue the exploration. The design,
execution, and analysis of tests take place in an integrated fash-
ion [10]. The experience and skills of the tester play a vital
role in ET and influence the outcome of the testing [11], [12].
ET displays multiple benefits, such as testing efficiency and
effectiveness [1], [6], a goal-focused approach to testing, ease
of use, flexibility in test design and execution, and providing
an interesting and engaging task for the tester [6].
Shah et al. [5] conducted a systematic review of the
literature on ET, and found that the strengths of ET are
often the weaknesses of scripted testing and vice versa. They
conclude that ET and scripted testing should be used together
to address the weaknesses of one process with the strength
of the other. While Shah et al. [5] do not consider different
types of ET, other research highlights the existence of different
degrees or levels of exploration [4]. James Bach [13] shows
that there exists a continuum of exploration between fully
scripted and freestyle ET but does not classify this continuum
of exploration with distinct levels.
We complement the existing work by defining different
levels of exploration and their advantages and disadvantages,
thus aiding practitioners during their decisions of what levels
of exploration to choose.
III. METHOD
The following research questions are formulated for our
study.
RQ1: How can different degrees of exploration be char-
acterized? In previous studies the question was raised
whether or not to do exploratory testing and how to
distribute effort between exploratory and scripted testing
[9]. Practitioners can use the characterization of different
degrees of exploration as decision options that go beyond
the distinction of scripted versus exploratory testing when
defining their testing strategies. That is, the question is
not whether to conduct exploratory testing or not, but
rather which degree of exploration is preferred.
RQ2: What are the factors that influence the levels of
exploration? Knowledge of such factors, e.g. with regard
to defect detection ability, may support practitioners when
deciding at which level to perform exploratory testing.
A. Designing the Classification (RQ1)
In our earlier work we identified 35 potential information
items that may be included in test charters [8]. Examples
of these include descriptions of the test setup, test techniques
to be used, purpose of the testing session, priorities, quality
characteristics to focus on, etc. Potentially one could include
all information items in a test charter, however, we argue that
this would be counterproductive for various reasons, such as:
Not all items in the test charter may be of equal impor-
tance.
Including all the items would overload the test session, as
there is too much to check, resulting in confusion for the
tester. This would be counterproductive from the idea of
exploratory testing, where the testers are driven by short
iterations of learning and reacting during a test session.
Test charters can be used to steer the degree of exploration
where the ET level changes as information items are included
/ excluded in the test charters. When no extensive information
(except the test object or system under test) is provided to the
tester, the tester is free to fully explore. The more information
that is added to the test charter, the more the tester is restricted.
Testers are restricted by test steps or biased by information
provided in the test charter. As a consequence we used test
charters as a means to define the levels of exploration.
The checklists to support test charter design, defined by
Ghazi et al. [8], in combination with existing test charters,
formed the basis for designing the classification of exploration
levels. A total of 15 test charter examples were obtained through
a literature search (peer-reviewed and grey literature). We ranked
these charters from a low to a high degree of exploration.
Test charters with very high degrees of exploration only
stated the test objects and the main aim/mission of the test
session [14]. An example test charter from Suranto [14] states
what should be explored, i.e. the test object (a “histogram
page” in Suranto’s example), and provides ideas for what
input data to use (“various data sets and different bin-interval
setting”) with the goal to discover bugs in “the histogram
display”. The charter in Suranto’s paper corresponds well to
what we later classify as a “high degree of exploration”.
An example of a test charter with a low degree of explo-
ration is presented by Anders Claesson [15] for a Copy/Paste
function. The charter includes information
about:
Actors: Role the tester should take
Purpose: Goal of the testing session
Setup: Specification of the technical environment
Priority: Importance of the function
Reference: Links to complementary documents (here re-
quirements)
Data: Specification of the types of input data that can be
used
Activities: Concrete steps to be taken and their order when
conducting the testing
As is evident, this test charter leaves little room to explore,
as all the test steps and the types of input data are given. That
is, only the concrete test data (e.g. which concrete image to
copy) is left to the tester to explore. By providing all this
information, hardly any room is left for exploration, i.e. this
largely equates to scripted testing.
Other test charters fell in-between the two examples [14],
[15] and thus provided a medium degree of exploration.
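To make the contrast between these two published charters concrete, the sketch below re-expresses them as plain data structures. This is illustration only: the field names follow the charter descriptions above, and the concrete values for the Copy/Paste charter are hypothetical placeholders rather than the original entries from Claesson [15].

```python
# Illustrative only: the two charters above re-expressed as plain data structures.
# Field names follow the descriptions in the text; values for the Copy/Paste
# charter are hypothetical placeholders, not the original entries.

high_exploration_charter = {
    "test_object": "histogram page",
    "mission": "discover bugs in the histogram display",
    "data_ideas": "various data sets and different bin-interval settings",
    # everything else (setup, priorities, steps) is left to the tester
}

low_exploration_charter = {
    "actors": "end user",                               # role the tester should take (placeholder)
    "purpose": "verify the Copy/Paste function",
    "setup": "editor installed in the test environment",   # placeholder environment
    "priority": "high",
    "references": ["Copy/Paste requirements"],
    "data": ["plain text", "images"],                   # types of input data that can be used
    "activities": [                                     # concrete, ordered test steps
        "select content of the given type",
        "copy the selected content",
        "paste it at the target location",
        "verify that the pasted content matches the source",
    ],
    # only the concrete test data (e.g. which image to copy) remains open
}
```

Seen this way, the degree of exploration is essentially a function of how many of these fields are fixed in advance.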
B. Evaluation
We evaluated our classification of ET levels through a study
of test processes at four companies in Sweden involved in
large-scale product development in the area of telecommu-
nications and embedded systems. The companies we studied
are Sony Mobile Communications, Axis Communications,
Ericsson, and Softhouse Consulting. Focus groups were used
as the main data collection method for our evaluation.
1) Companies: All of these companies use agile software
development as their development methodology. Two of these
companies, Sony Mobile Communications and Ericsson, have
a strong focus on developing telecommunication systems,
ranging from mobile applications to telecommunications
charging systems. Axis Communications mainly works in the
area of networked security cameras and embedded software.
Softhouse Consulting provides consultancy to a wide range of
companies working on banking solutions, telecommunication
systems, mobile applications, embedded software, and control
systems.
2) Subjects: Exploratory focus groups were conducted at
Sony Mobile Communications and Axis Communications to
elicit the factors that influence the level of exploration
in testing. The participants were selected by the companies
considering the research needs of the study presented herein.
To validate the factors that influence the levels of exploration,
two focus groups were held at Ericsson and Softhouse
Consulting. Overall, 20 practitioners participated in these focus
groups. These participants were experienced testers with 4 to
25 years of experience in software testing. Table I provides an
overview of the participants in each focus group, the context
of their assignments in the company, and their experience in
software testing.
3) Data collection: We used focus groups [16] as the
main method for data collection at all four companies and
conducted them in two main iterations. In a focus group, a
group of experts is selected by the researchers to discuss and
collect their views in a specific area of expertise where these
practitioners have considerable experience. Focus groups, as a
data collection method, support the researchers in understanding
a research area in a concise way with a strong involvement of
experts from industry [16].
In the study presented herein, the initial two focus groups
were exploratory and were conducted at two companies,
Sony Mobile Communications and Axis Communications.
These companies were interested in extending their use of
exploratory testing in their current test processes. The third
and fourth focus groups were held at Ericsson and at Softhouse
Consulting with the main aim to validate the results from the
initial two focus groups. The focus group at Ericsson was
performed in two 4 hour sessions on different days.
The first two focus groups contained the following steps:
1) Introduce the basic concepts of exploratory testing
2) Present our classification of exploration levels
3) Share examples of test charter types for each level with
the participants
4) The participants re-write an existing test case at the dif-
ferent exploration levels using the provided test charter
types
5) Open discussion of how each level and its test charter type
match the context of their current test practices
6) Elicit factors that are affected by the level of exploration
in testing
Prior to the third and fourth focus group, we conducted a
survey with the focus group participants to gauge their views
of the factors elicited from the first two focus groups (such
as learning) and to which extent these impact the level of
exploration. At the (subsequent) focus group sessions, we
discussed the outcome of the survey and in particular, how
various factors are influenced by the level of exploration
and how this affects the decision regarding which level of
exploration to apply in a test session, and reached a consensus
for each factor.
The companies selected the focus group participants based
on their experience of testing and their interest in exploratory
testing. We audio recorded and transcribed all focus group
sessions, and analyzed these to identify the resulting impact
on the factors for each level of exploration in the proposed
classification. Table I provides an overview of the companies and the
participants in the focus groups.
4) Threats to validity: The focus group participants did not
have direct experience of all the levels of exploration and
the corresponding test charter types discussed in the focus
group. However, we believe that the participants could relate
to these anyway given their experience of testing. We reduced
this threat of lack of first hand experience by letting the
practitioners gain hands-on experience of the test charter types
during the focus group.
A common threat of studies with companies is the gen-
eralizability of the findings. We partially reduce this threat
by involving four companies. Furthermore, we mitigated the
risk of researcher bias by involving three different researchers
in designing and performing the focus groups, and in jointly
discussing the outcome of these.
IV. RESULTS
A. Levels of exploration in exploratory testing (RQ1)
We have identified five levels of exploration ranging from
free style exploratory testing to fully scripted testing, with
the intermediate levels of high, medium, and low degrees of
exploration. Figure 1 provides an overview of the proposed
classification. Each of the five levels is defined by a test
charter type that guides the testing. The test charter for each
level of exploration adds an element that sets the scope of the
exploration space for the tester. At the freestyle level, the tester
is only provided with the test object. For each subsequent level,
the exploration space is reduced by adding further information
and instructions to the test charter, e.g. high-level goals. The
tester is thus further focused for each decreasing exploration
level, and the reduced freedom leads to a less exploratory
approach compared to the previous level. The test charter type
for the lowest exploration level, i.e. fully scripted, contains
both test steps and test data, and thus leaves no space for
exploration during test execution.
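To illustrate this mechanism, the following minimal sketch (ours, not an artifact of the study) infers the level of exploration from which elements a charter contains. The dictionary keys are assumed names for the elements discussed above (test object, high-level goals, restricting information such as priorities, risks, tools or methods, test steps, and test data), not a prescribed charter format.

```python
def exploration_level(charter: dict) -> str:
    """Infer the level of exploration from which elements a charter contains.
    Keys are illustrative names, not a prescribed charter format."""
    has = lambda key: bool(charter.get(key))
    if has("test_steps") and has("test_data"):
        return "fully scripted"              # steps and data given: no room left to explore
    if has("test_steps"):
        return "low degree of exploration"   # steps given, tester still chooses the test data
    if has("restrictions"):                  # e.g. detailed goals, priorities, risks, tools, method
        return "medium degree of exploration"
    if has("goals"):
        return "high degree of exploration"  # high-level goals plus the test object
    return "freestyle"                       # only the test object is provided

# Example: a charter holding only a test object and one high-level goal
charter = {"test_object": "adoptable storage", "goals": ["verify adoptable storage"]}
print(exploration_level(charter))            # -> high degree of exploration
```

Reading the conditions from top to bottom mirrors the classification: each added element removes freedom from the tester.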
We provide examples of test charters produced during one
of the focus groups in Figure 2. The test charter for the
highest level of exploration, i.e. free style (not shown in figure)
contains only the goal for the testing, namely to verify a
specific function of the system. The test charter for the medium
exploration level contains additional information, e.g. suitable
TABLE I
OVERVIEW OF FOCUS GROUPS

Focus group | Company | Context | Number of participants | Experience (in years)
1 | Sony Mobile Communications | Telecommunication systems, mobile applications | 6 | 4-25 years
2 | Axis Communications | Networked security cameras | 4 | 4-25 years
3 | Ericsson | Telecommunication systems, charging systems | 7 | 15-24 years
4 | Softhouse Consulting | Consulting (banking solutions, telecommunication systems, mobile applications, embedded systems, control systems) | 3 | 15-24 years
Freestyle: Only the test object is provided to the tester. The tester can freely explore the system.
High degree of exploration: The tester is provided with one or more high-level goals for the test session, in
addition to knowing the test object. Beyond that, the tester can freely explore the system.
Medium degree of exploration: The tester is provided with one or more high-level goals for the test session. At
the same time, additional restrictions are imposed that may bias and thus limit the tester in his/her testing
session. Biasing aspects could be overly detailed goals, priorities, risks that the tester is required to focus
on, tools to be used, the functionality that needs to be covered, or the test method to be used.
Low degree of exploration: Besides the information for the medium degree of exploration, the tester is also
required to follow certain test steps, which may further bias the tester and reduce the exploration space.
The tester is encouraged to choose the test data to be used in the test steps.
Fully scripted: The tester is provided not only with the test steps but also with the test data, which leaves
no room for exploration.
Fig. 1. Classification of levels of exploration in exploratory testing
starting points for the testing. Finally, the test charter for the
low level of exploration contains detailed test activities/steps
in addition to goals and other information. The next level, i.e.
fully scripted, which is not shown in Figure 2, also contains
test data. For example, the test charter for the test activity
“Copy content to card from PC” would also specify the content
to be copied.
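As a sketch of that last step, assuming hypothetical file names (they are not data from the study), the low-exploration charter of Figure 2 could be turned into a fully scripted test by pairing each test activity with concrete test data:

```python
# The activities come from the low-exploration charter in Fig. 2; the attached
# data values are invented here purely to illustrate the fully scripted level.
fully_scripted_test = [
    ("Insert adoptable storage",                      {"card": "64 GB microSD"}),
    ("Set up as internal storage",                    {}),
    ("Copy content to adoptable storage from PC",     {"files": ["photo_001.jpg", "video_001.mp4"]}),
    ("Read content on adoptable storage from device", {"expected": ["photo_001.jpg", "video_001.mp4"]}),
    ("Save",                                          {}),
]

for step, data in fully_scripted_test:
    print(step, data)  # in an actual session each step would be executed and its outcome verified
```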
B. Factors influencing the choice of exploration levels (RQ2)
We evaluated our classification of exploration levels in ET
and explored factors/characteristics that influence the selection
of these levels and the corresponding charter types through
focus groups at four companies, see Section III. We found
six main areas that influence the choice of the
level of exploration used in testing, namely defect detection,
time and effort, people-related factors, evolution and change,
traceability and quality requirements. We provide an overview
of these factors in Figure 3 by presenting two opposing poles
for each factor, for example, better learning (indicated as
positive, +) versus poor learning (indicated as negative, −). If
an exploration level has a neutral impact on a factor, this is
indicated by 0.
Overall, the practitioners had a positive view of the higher
exploration levels (freestyle and high exploration) for four of
the six main areas. The participants noted a positive impact
for these levels within the areas of defect detection, time
and effort, people-related factors, and evolution and change.
In contrast, they expressed a negative impact for factors
related to traceability and verifying quality requirements. The
participants believed that the higher levels of exploration in ET
have a negative impact on these two areas.
Defect detection: The participants of all focus groups
highlighted that the exploratory approach will identify more
significant defects. However, one participant stated that this
may only be the case “if you know what the faults may be”, i.e.
the tester should have the skills to identify where significant
faults are most likely to occur.

[Fig. 2, reconstructed: three example test charters for the high, medium, and low degrees of exploration. The recovered charter fields and contents are:
Test goal/purpose: “To verify adoptable storage”; “1. Test different adoptable storage (speed and size); 2. Test adoptable storage and internal/external memory; 3. Test move adoptable storage to other device (phone/PC etc.); 4. Test extract files from adoptable storage”; “1. To verify that adoptable storage behaves (as expected) according to the requirement; 2. To test that no data loss occurs”.
Set-up (pre-conditions of what needs to be available to test): 1. PC; 2. Two devices supported by adoptable storage; 3. Adoptable storage 1 ... n; 4. One device not supporting adoptable storage; 5. Test content.
Priority: High. There must not be any data loss.
References: See ALMreg.doc.
Test activities: 1. Insert adoptable storage; 2. Set up as internal; 3. Copy content to adoptable storage from PC; 4. Read content on adoptable storage from device; 5. Save.
Additional information used: 1. Use previously identified problems as input; 2. Use Google requirements if needed.]

Fig. 2. Example of test charters for the high, medium and low degrees of exploration

Thus, the tester’s skills play
a vital role in exploratory testing, as is also confirmed by
empirical studies on ET [12]. These skills are also required to
judge whether an explored behavior is a critical defect or not.
However, the participants also pointed out that people taking
a new perspective often find new defects. One practitioner
said: “every time we get new people in the team, we find new
defects in the system”. This highlights one of the benefits of
exploration, namely that of not biasing the search for defects,
for example, through pre-existing test cases and prior knowl-
edge embedded in scripted tests. This may be the case for
the lower exploration levels, namely low exploration and fully
scripted. Some participants pointed out that high exploration
comes with challenges with reproducibility of detected defect.
One participant said that the “problem is when you have higher
level of exploration .. the developers want to have very detailed
steps to reproduce it”. However, when “you focus on the
reproducing you lose the exploration” which is a drawback
with the fully scripted level.
Time and effort: Many participants highlighted time effi-
ciency as one of the benefits of the high and medium levels
of exploration. One practitioner explained this by saying that
“we can get a better overview quickly” with higher exploration
levels. At these higher exploration levels less effort is required
to prepare the tests, compared to the lower and fully scripted
exploration levels. One participant explained that the many
details of the low levels of exploration require “an upfront
investment to develop test cases” before you can execute them.
Another participant expands on this by saying that there is
“less administration if you have a high level of exploration
because then you have quite openness and it is much easier
to write test cases.” The participants indicated that less effort
is required to maintain test cases at the higher levels of
exploration due to the fact that changes are more likely to
affect details within test cases at the lower exploration levels.
For example, the tester would need to update the test steps.
People factors: The participants highlighted that the higher
levels of exploration are beneficial for encouraging critical
thinking, for challenging the system when testing, and that
these levels support learning. One participant said that at
the highest exploration level (freestyle) learning “might take
longer time. But that it would probably be a better approach
from the beginning to understand what the testing is.” This
participant also said that “you only do fully scripted when
you know the system and it is monotonous and you can also
get tired of it”. However, some participants also expressed
positive learning effects from fully scripted testing and that
“it is definitely easier to start learning about testing when it
is fully scripted. If we do freestyle then it would be difficult”
because “it requires skills and some form of competence or
otherwise you are completely lost.” One participant suggested
that to make full use of the higher levels of exploration, i.e.
freestyle and high exploration “you need a mentor that tells
you explore this, and then you explore and test. When you
have questions, you go back and ask/discuss with the mentor.”
Several participants pointed out that a tester’s experience
plays an important role vis-a-vis the exploration levels. Less
experienced testers are often able to identify new defects since
they bring a new perspective to a project. At the same time, a
tester with less experience may find it hard to conduct freestyle
or high exploration testing since they do not have the domain
knowledge required. Hence, there are additional factors that
affect the level of learning. Participants also stressed that with
scripted testing “one problem can be that if you just keep
following the test steps then there is a chance that you miss
the approval criteria”. Finally, the participants pointed out
that learning occurs during the derivation of test cases from
the detailed requirements; therefore, it is important to consult
requirements documents, no matter what level of exploration
you select to drive your test sessions. They also highlighted
motivation as an important distinguishing factor, where testers
quickly get bored when testing at low levels of exploration,
including fully scripted testing. Furthermore, the participants
highlighted that the impact and effect of the exploration levels
may very well vary throughout the development cycle. They
said that the higher levels of exploration might be particularly
useful during the early phases of testing to explore and learn
about the system. The testers may then design new tests that
later become scripted tests, which are used for regression
testing in later stages when the project is closer to releasing
software.
Evolution and change: The participants highlighted that it
is easier to design new tests for higher levels of exploration
(freestyle and high level of exploration) since this requires less
effort; “you have transparency and it is much easier to write
test cases.” In line with this, the participants also expressed
that changes can be more easily implemented given that
the “higher exploration levels are less resistant [to change]
since you don’t need to change a lot of details.” They also
expressed that the communication around changes to tests is
simplified for these higher exploration levels and that when
“some behavior has changed and you just discuss and notify
that this has changed instead of going in details every time.”
[Fig. 3, reconstructed: impact of each exploration level on the factors, rated as + (positive), 0 (neutral), − (negative), or +/− (mixed). Columns, left to right: Freestyle, High exploration, Medium exploration, Low exploration, Fully scripted (manual).

Defect detection
Finds more significant/critical defects: + | + | + | 0 | +/−
Helps to uncover unknown defects: + | + | +/− | 0 | −
Easier to reproduce defects: − | 0 | 0 | 0 | +

Time and effort
Time efficient: + | + | 0 | − | −
Less effort to prepare tests: + | + | 0 | − | −
Less effort to maintain test cases: + | + | 0 | 0 | −

People factors
Better learning: + | + | + | 0 | 0
Motivates critical thinking to challenge expected outcomes: + | + | + | 0 | −
Motivates the tester: + | + | + | 0 | −

Evolution and change
Easy to design new tests: + | + | 0 | − | −
Easier/provides freedom to change test cases: + | + | + | +/− | +/−
Easier to fill knowledge gap when adding new requirements: − | 0 | + | 0 | +

Traceability
Easy to trace coverage: − | − | + | + | +
Efficient in checking verification of requirements: − | − | − | + | +

Quality requirements
Easier to verify conformance/legal requirements: − | − | − | + | +
Helps to check performance issues: +/− | 0 | 0 | 0 | +]

Fig. 3. Overview of factors influencing the levels of exploration derived from the focus groups and the survey
However, the practitioners also said that the higher exploration
levels are more challenging when requirements are added or
changed, since information of the new requirements is needed
to guide the testing.
Traceability: All focus groups highlighted that the difficulty
of tracing coverage is a major drawback of the higher levels
of exploration, both regarding coverage of code and coverage
of requirements. One participant said: “The sense
of coverage is much lower as compared to when you ticked
off 100 test cases in scripted tests.” This issue also applies to
requirements coverage, since test cases at the higher levels of
exploration per definition do not include any mapping to
individual requirements.
Quality requirements: Several participants highlighted that
the higher levels of exploration are not suitable for verifying
conformance requirements. One participant said that “We do
have a lot of conformance with different standards and legal
requirements” and “if you don’t have this kind of [low]
exploration level then it is easier to miss.” The participants ex-
pressed different views on this regarding performance. When
testing the load of a system, scripted automated tests are
often preferred. In one case, a participant highlighted that
for performance testing “you have to continuously compare
it to different firmware and we need to have similar tests
again. Then we can’t really explore a lot.” However, it is
also important to consider perceived quality and the end
user perspective; for this, the higher levels of exploration
are suitable since they allow for making observations during
testing.
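Read as a whole, the factor assessments in Figure 3 can serve as a simple decision aid. The sketch below is ours rather than part of the study; it encodes a handful of the assessments, coarsens mixed (+/−) ratings to 0, and ranks the exploration levels against a chosen set of priorities.

```python
# Illustrative only: a few factor assessments read off Fig. 3, encoded as
# +1 (positive), 0 (neutral or mixed), -1 (negative) per exploration level.
LEVELS = ["freestyle", "high", "medium", "low", "fully scripted"]

FACTORS = {
    "finds significant defects":       [+1, +1, +1,  0,  0],
    "easier to reproduce defects":     [-1,  0,  0,  0, +1],
    "time efficient":                  [+1, +1,  0, -1, -1],
    "better learning":                 [+1, +1, +1,  0,  0],
    "easy to trace coverage":          [-1, -1, +1, +1, +1],
    "verify conformance requirements": [-1, -1, -1, +1, +1],
}

def rank_levels(priorities):
    """Rank exploration levels by their summed impact on the prioritized factors."""
    scores = {level: 0 for level in LEVELS}
    for factor in priorities:
        for level, impact in zip(LEVELS, FACTORS[factor]):
            scores[level] += impact
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Example: a team valuing quick feedback and learning over traceability
print(rank_levels(["time efficient", "better learning", "finds significant defects"]))
```

As the focus groups stressed, such a ranking is only a starting point; combining levels is often the most effective choice.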
V. CONCLUSIONS AND FUTURE WORK
Exploratory testing (ET) leads to different outcomes in
comparison to scripted testing, thus fulfilling different pur-
poses. While an exploratory testing approach can help find
critical and otherwise missed defects by utilizing the skill
and creativity of the tester, it is also insufficient for verifying
conformance to requirements due to providing weak coverage
of requirements. In contrast, scripted testing provides this and
is a vital component in regression testing. Thus, the question
about exploratory testing is not whether or not to apply it, but
rather when to apply which level of exploration, from fully
exploratory to fully scripted, to achieve the desired outcome.
There have been some previous attempts to provide structure
to ET and guide the test process by defining clear test missions
and time-boxing test sessions. We provide practitioners with
a better understanding of ET practices by introducing the
concept of a sliding scale of exploration from fully exploratory,
or freestyle, to fully scripted testing. In this paper, we propose
five levels of exploratory testing and present factors that are
influenced by these levels. We define each level of exploration
by providing test charter types with distinct elements that help
practitioners to design tests at different levels of exploration.
We have explored factors related to the level of exploration
through a series of focus groups. Our research shows that the
exploration levels have an effect on factors such as the ability
to detect defects, test efficiency, learning and motivation,
and that different outcomes are to be expected depending
on the chosen exploration level. Awareness of these factors
allows testers to select the exploration level according to
what they want to achieve with their testing. For example,
testers operating at higher levels of exploration, e.g. freestyle,
can expect to achieve improved defect detection, savings in
time and effort, and facilitated management of evolution and
change. They can also expect a positive impact with regards
to learning and motivation. Though there are drawbacks too,
since the higher exploration levels are weak in supporting
traceability and verification of quality requirements concerning
conformance and performance. Another characteristic of high
levels of exploration is the weak reproducibility of defects,
as the test steps are not clearly documented for developers to
follow to reproduce the defect. However, we note that recent
research studies provide solutions for tracking testing sessions
to later derive and repeat the test steps [17].
We encourage practitioners to consider striving for a com-
bination of exploratory and scripted testing. In this way,
testers can obtain the positive effects of the higher levels of
exploration, while not neglecting other types of testing. As one
participant stated at the end of one focus group: “we [now]
think that we want both scripted and exploratory; a mix of both
approaches, so that we can approach our testing in different
ways”. We also found that the test charter types we used when
defining the exploration levels provide practical value to the
participants. The practitioners quickly grasped the differences
between the exploration levels by viewing and applying these
test charter types. We suggest that practitioners reflect on the
levels of exploration by using a similar approach. They can
explore and reflect on how the various levels of exploration
could support them by rewriting existing charters or scripted
tests according to the corresponding test charter types.
ACKNOWLEDGMENT
We would like to thank the participating companies and in
particular the individuals for their active involvement in and
support of this research.
This work was partly funded by the Industrial Excellence
Center EASE Embedded Applications Software Engineering,
(http://ease.cs.lth.se).
REFERENCES
[1] W. Afzal, A. N. Ghazi, J. Itkonen, R. Torkar, A. Andrews, and K. Bhatti,
“An experiment on the effectiveness and efficiency of exploratory
testing,” Empirical Software Engineering, vol. 20, no. 3, pp. 844–878,
2015.
[2] K. Bhatti and A. N. Ghazi, “Effectiveness of exploratory testing: An
empirical scrutiny of the challenges and factors affecting the defect de-
tection efficiency,” Master’s thesis, Blekinge Institute
of Technology, Sweden, 2010.
[3] C. Kaner, J. Bach, and B. Pettichord, Lessons learned in software testing.
John Wiley & Sons, 2008.
[4] J. Itkonen, M. V. Mäntylä, and C. Lassenius, “Test better by exploring:
Harnessing human skills and knowledge,” IEEE Software, vol. 33, no. 4,
pp. 90–96, 2016.
[5] S. M. A. Shah, C¸ . Gencel, U. S. Alvi, and K. Petersen, “Towards a
hybrid testing process unifying exploratory testing and scripted testing,”
Journal of Software: Evolution and Process, vol. 26, no. 2, pp. 220–250,
2014.
[6] D. Pfahl, H. Yin, M. V. Mäntylä, and J. Münch, “How is exploratory
testing used?” in ESEM’14 Proceedings of the 8th ACM/IEEE Interna-
tional Symposium on Empirical Software Engineering and Measurement,
2014.
[7] J. Bach, “Session-based test management,” Software Testing and Quality
Engineering Magazine, vol. 2, no. 6, 2000.
[8] A. N. Ghazi, R. P. Garigapati, and K. Petersen, “Checklists to support
test charter design in exploratory testing,” in 18th International Confer-
ence on Agile Software Development (XP2017). Springer, 2017.
[9] E. Engström, K. Petersen, N. bin Ali, and E. Bjarnason, “SERP-test: a
taxonomy for supporting industry–academia communication,” Software
Quality Journal, pp. 1–37, 2016.
[10] J. A. Whittaker, Exploratory software testing: tips, tricks, tours, and
techniques to guide test design. Pearson Education, 2009.
[11] J. Itkonen, M. Mäntylä, and C. Lassenius, “The role of the tester’s
knowledge in exploratory software testing,” IEEE Trans. Software Eng.,
vol. 39, no. 5, pp. 707–724, 2013.
[12] M. Micallef, C. Porter, and A. Borg, “Do exploratory testers need formal
training? an investigation using HCI techniques,” in Ninth IEEE Inter-
national Conference on Software Testing, Verification and Validation
Workshops, ICST Workshops 2016, Chicago, IL, USA, April 11-15, 2016,
2016, pp. 305–314.
[13] J. Bach, “Exploratory testing explained,” 2003.
[14] B. Suranto, “Exploratory software testing in agile project,” in Computer,
Communications, and Control Technology (I4CT), 2015 International
Conference on. IEEE, 2015, pp. 280–283.
[15] A. Claesson, “How to perform exploratory testing by using
test charters.” [Online]. Available: http://www.sast.se/q-moten/2007/
stockholm/q3/2007 q3 claesson.pdf
[16] M. Daneva, “Focus group: Cost-effective and methodologically sound
ways to get practitioners involved in your empirical RE research,” in
Joint Proceedings of REFSQ-2015 Workshops, Research Method Track,
and Poster Track co-located with the 21st International Conference on
Requirements Engineering: Foundation for Software Quality (REFSQ
2015), Essen, Germany, March 23, 2015., 2015, pp. 211–216.
[17] E. Alégroth, R. Feldt, and P. Kolström, “Maintenance of automated
test suites in industry: An empirical study on visual GUI testing,”
Information & Software Technology, vol. 73, pp. 66–80, 2016.
Ahmad Nauman Ghazi is a lecturer with the
Department of Software Engineering at Blekinge
Institute of Technology (BTH), Sweden. He received
his PhD, Licentiate of Technology and M.Sc. in
software engineering from BTH in 2017, 2014 and
2010 respectively. His research interests include
empirical software engineering, software verification
and validation, exploratory testing, agile software
development, software quality assurance, and soft-
ware process improvement. Dr. Ghazi also has ex-
tensive experience from industry where he worked
as a software test engineer for several years.
Kai Petersen is a professor with the Department
of Software Engineering at Blekinge Institute of
Technology (BTH), Sweden. He received his PhD
from BTH in 2010. His research focuses on software
processes, software metrics, lean and agile software
development, quality assurance, and software secu-
rity in close collaboration with industry partners. Kai
has authored over 70 research works in international
journals and conferences.
Elizabeth Bjarnason is a senior lecturer of Software
Engineering at the Department of Computer Science,
Lund University, Sweden. Dr. Bjarnason received
her PhD from Lund University, Sweden and prior to
that she worked in software and telecommunications
industry for several years. Her research interests
include empirical research and theory building on
requirements communication and collaboration, in
particular towards software testing.
Per Runeson is a professor of software engineer-
ing with Lund University, Sweden, head of the
Department of Computer Science, and the leader
of Software Engineering Research Group (SERG)
and the Industrial Excellence Center on Embedded
Applications Software Engineering (EASE). His re-
search interests include empirical research on soft-
ware development and management methods, in
particular for verification and validation. He is the
principal author of Case Study Research in Soft-
ware Engineering, has coauthored Experimentation
in Software Engineering, serves on the editorial boards of Empirical
Software Engineering and Software Testing, Verification and Reliability,
and is a member of several program committees. He is a member of the IEEE.
Users continue to stumble upon software bugs, despite developers' efforts to build and test high-quality software. Although traditional testing and quality assurance techniques are extremely valuable, software testing should pay more attention to exploration. Exploration can directly apply knowledge and learning to the core of industrial software testing, revealing more relevant bugs earlier. This article describes exploration's characteristics, knowledge's role in software testing, and the three levels of exploratory-testing practices. Academics and practitioners should focus on exploiting exploration's strengths in software testing and on reporting existing practices and benefits in different academic and industrial contexts.