In Quest for Requirements Engineering Oracles:
Dependent Variables and Measurements for (good) RE
Daniel Méndez Fernández, Jakob Mund, Henning Femmer, Antonio Vetrò
Technische Universität München, Germany
http://www4.in.tum.de/~mendezfe|femmer|mund|vetro
ABSTRACT
Context: For many years, researchers and practitioners
have been proposing various methods and approaches to Re-
quirements Engineering (RE). Those contributions remain,
however, too often on the level of apodictic discussions with-
out having proper knowledge about the practical problems
they propagate to address, or how to measure the success
of the contributions when applying them in practical con-
texts. While the scientific impact of research might not
be threatened, the practical impact of the contributions is.
Aim: We aim at better understanding practically relevant
variables in RE, how those variables relate to each other,
and to what extent we can measure those variables. This
allows for the establishment of generalisable improvement
goals, and the measurement of success of solution propos-
als. Method: We establish a first empirical basis of de-
pendent variables in RE and means for their measurement.
We classify the variables according to their dimension (e.g.
RE, company, SW project), their measurability, and their
actionability. Results: We reveal 93 variables with 167 de-
pendencies of which a large subset is measurable directly
in RE while further variables remain unmeasurable or have
too complex dependencies for reliable measurements. We
critically reflect on the results and show direct implications
for research in the field of RE. Conclusion: We discuss a
variety of conclusions we can draw from our results. For
example, we show a set of first improvement goals directly
usable for evidence-based RE research such as “increase flexibility
in the RE process”, we discuss suitable study types,
and, finally, we can underpin the importance of replication
studies to obtain generalisability.
Categories and Subject Descriptors
D.2.1 [Software Engineering]: Requirements/Specification
General Terms
Measurement, Experimentation
EASE ’14, May 13-14 2014, London, UK
Copyright 2014 ACM 1-58113-000-0/00/0010 ...$15.00.
Keywords
Evidence-based Research, Requirements Engineering, Met-
rics and Measurements
1. INTRODUCTION
Requirements Engineering (RE) aims at the discovery and
specification of requirements that unambiguously reflect the
purpose of a software system as well as the needs of all relevant
stakeholders. The specification of precise requirements
directly contributes to appropriateness and cost-effectiveness
in the development of a system [21] and, thus, RE is an important
factor for productivity and (product) quality [5].
Despite its practical importance, RE remains an inherently
complex discipline due to the various influences
in practical environments that make the process itself uncertain [17].
The interdisciplinary nature of the field and the dependency
on the various human factors that pervade RE like
no other software engineering discipline eventually make RE
hard to investigate and even harder to improve [16].
In response to the practically motivated relevance and the
fundamental challenges given in the discipline, we can ob-
serve over the last two decades a strong research commu-
nity arising from an initially neglected field of investigation.
In the course of various research endeavours, a plethora of
methods and approaches to RE have been proposed.
However, when it comes to evaluating the contributions in
practical environments, which is a prerequisite for providing
insights into factors relating to the practical impact of the
contributions, we can observe that available evaluations are
often not in tune with the (practical) problems they are in-
tended to address. Available contributions provide, if at all,
isolated case studies investigating aspects that hardly can
be generalised, e.g., long-term views on cost and benefits
when applying developed methods. And in most cases ac-
curate evaluations starve in the future work section of the
publication [2].
From an empirical perspective, however, the investigations
one would expect alongside the contributions cannot be
provided at short notice because:
• The effort necessary to conduct case and field study
research is, in general, very high, and such a study can often
be regarded as a scientific contribution in its own right.
• The accuracy and objectivity depend not only on the
chosen study population, but also on the involved researchers.
This threat implies that, ideally, the evaluations
should be performed independently by researchers
who are not involved in the development of the method
under analysis. Given the current stress fields of academic
research environments, the independent investigation
of given (methodological) contributions via confirmatory
studies often seems unattractive.
• External validity demands replication
studies and longitudinal studies that are often not in
the scope of research projects.
© ACM. PREPRINT. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution.
The definitive version was published in the conference/workshop proceedings.
DOI: http://dx.doi.org/10.1145/2601248.2601258
Over and above all, investigations of selected benefits
are often driven by very context-specific problems, where we
evaluate a solution using metrics and measurements that are
important to one particular context but might not be
important to another; for instance, supporting the creation
of consistent RE artefacts might be important to one company,
while the focus of another company might be on
supporting communication within local teams.
It is thus not surprising that empirical evidence on the
suitability of scientific contributions to tackle practical problems
is per se difficult to provide and often underrepresented
in scientific contributions. Another reason for the currently
weak state of evidence in RE [2] might be the missing aware-
ness of the RE community towards the possibilities we have
in evidence-based research. Condori-Fernandez et al. [4] con-
ducted a study to understand the extent to which the empiri-
cal software engineering research methods are adopted in the
RE community. One result was that even senior researchers
are not aware of the full potential of evidence-based research,
not only to evaluate contributions on basis of hypotheses,
but also to rigorously explore the problem domain itself to
reveal theories and practically relevant improvement goals.
To develop solutions that are meant to solve commonly
accepted problems, we first need to understand what com-
monly accepted problems are, how they evolve within the
whole project ecosystem (beyond RE), and how we can in-
fer a generalised notion of a “good RE” from those problems.
This notion might not necessarily apply to all socio-economic
contexts with individual and isolated problems, improve-
ment goals, and organisational cultures, but it should sup-
port the reliable evaluation of solutions against a set of com-
monly accepted (dependent) variables and measurements.
Having a set of variables and understanding to what extent
those variables eventually matter from a practical perspec-
tive supports an evaluation of solution proposals and, as an
ultimate goal, the establishment of RE oracles. For example,
knowing which variables matter in practice and how they can
be measured allows us to evaluate engineering methodolo-
gies w.r.t., e.g., their support to tackle communication flaws
within teams or the creation of consistent artefacts as we
know to what extent those problems matter and how they
manifest themselves in the whole development process.
1.1 Problem Statement
To develop proper improvement goals, we first need an
empirical basis of measurable variables that originate in RE
and their dependencies on further variables within the whole
project ecosystem. A characterisation of such phenomena
gives a better understanding of practical improvement goals,
which, in turn, allows us to develop and evaluate scientific
contributions against practically relevant problems we are able
to characterise and measure.
1.2 Research Objective
The main objective of the paper is to provide a first empirical
basis of structured variables originating in RE and their
dependencies on further variables within the whole project
ecosystem. A further classification of those variables according
to their measurability and actionability finally allows
us to critically reflect on the possibilities in evidence-based
research and to draw direct research implications for
RE.
1.3 Contribution
In this paper, we transfer the results of a survey we conducted
to identify a set of RE-related problems and their
effects to a set of dependent variables in RE, show where
those variables occur in the project ecosystem, and critically
discuss the measurability of those variables. Based on those
results and their critical discussion, we infer first research
implications for RE research.
1.4 Outline
In Sect. 2, we discuss fundamentals and work related to
our research. In particular, we introduce the NaPiRE project
from which we infer the variables classified and analysed in
context of this paper and introduce further research results
on which we rely during our classification. In Sect. 3, we
then introduce our overall research design. We present our
results in Sect. 4. In Sect. 5, we critically reflect on our results
and draw first research implications for evidence-based
RE research in Sect. 6, before concluding with a discussion
of the threats to validity and future work in Sect. 7.
2. FUNDAMENTALS AND RELATED WORK
Much work has been carried out to explore promising research
directions in Requirements Engineering, mostly baptised
with the term “Roadmap”. Prominent examples are the
one by Nuseibeh et al. [21] and the subsequent contribution by
Chen et al. [2]. Both contributions explore the facets of RE
in great detail and show various research directions within
those facets; for example, regarding requirements elicitation
techniques, modelling and analysis techniques, or aspects re-
lating to RE as an interdisciplinary area which characterises
the discipline in greatest distinction to other software en-
gineering disciplines. However, when it comes to directly
characterising RE more explicitly by its phenomena, which is
necessary to, inter alia, infer measurable improvement goals
for general RE research and to evaluate solution proposals
against a common understanding of practically relevant RE
variables, only few contributions are at our disposal.
Gorschek et al. [9] proposed a framework to characterise
dependent variables in RE in context of a project ecosystem
via five dimensions (from a process improvement perspec-
tive), namely:
1. Requirements phase
2. Project including variables like cost and time or project
estimates
3. Product where dependent variables determine the de-
gree of product success
4. Company considering the effects in multi-project environments
or a product placed in a market
5. Society
In their contribution, they describe an initial set of depen-
dent variables to support a view on the notion of RE quality
in a broader context than within selected particularities of RE
alone. However, the variables proposed so far were an initial
set that needed further investigation and extension.
To reveal RE phenomena and characterise the discipline
from a practical perspective, we mostly rely on (exploratory)
field study and survey research. One of the most well known
surveys is the Chaos Report of the Standish Group, exam-
ining, for example, root causes for project failures of which
most are to be seen in RE, such as missing user involvement.
Apart from having serious flaws in its design negatively affecting
the validity of the results [7], studies of this type do
not support investigation of contemporary phenomena and
problems in industrial RE environments (as they focus on
project failure only). Such investigations have, for exam-
ple, been indirectly conducted by Damian et al. [5]. They
analysed process improvements in RE and the relation to
payoffs regarding, for example, productivity and the final
product quality. Nikula et al. [20] present a survey on RE
at organisational level of small and medium size companies
in Finland. Based on their findings, they inferred improve-
ment goals, e.g. on optimising knowledge transfer. A study
used to infer recommendations to practitioners, such as user
involvement in the elicitation process, has been performed
by Enam et al. [6]. A more curiosity-driven study to anal-
yse typical project situations in companies was presented
by us in [17]. We could discover 31 project characteristics
that directly influence RE. A survey that directly focused
on discovering problems in practical settings was performed
by Hall et al. [10]. They empirically underpin the prob-
lems discussed by Hsia et al. [11] and investigated a set of
critical organisational and project-specific problems, such as
communication problems, inappropriate skills or vague re-
quirements, while those problems matched to a large extent
project characteristics we could discover.
Still, the studies mentioned above, although all valuable
in providing the necessary groundwork for an empirical
understanding of RE, either focused on RE
phenomena without any relation to further phenomena in a
project ecosystem (except project failure), or were performed
in isolation in one single company and thus remain
not representative.
For this reason, we initiated a global family of surveys
to reveal the status quo in industrial requirements engi-
neering, namely the NaPiRE project (“Naming the Pain
in Requirements Engineering”) [16, 18]. This project pro-
vides, in the long run, an empirical survey repository and is
performed in collaboration between various members of the
International Software Engineering Research Network (ISERN).
Under http://re-survey.org, we provide further
information on the project as well as the complete survey
data as it is published to the PROMISE repository.
For the purpose of the paper at hand, we rely on a subset
of the NaPiRE data obtained from its run in Germany in
2013 with 58 companies [16, 18]. We take a set of problems
in RE and their effects as revealed via open questions in
NaPiRE and classify those variables using the scheme intro-
duced by Gorschek et al. [9] (see also the next section).
3. RESEARCH DESIGN
Our aim is to provide a first empirical basis of structured
variables originating in RE and their dependency to further
variables within the whole project ecosystems. This allows
for a critical reflection on metrics and measurements used
for evidence-based RE research and the inference of research
implications for RE research. To this end, we define three
research questions as summarised in Tab. 1.
In RQ 1, we explore the variables and their manifestation
in context of RE as they result from NaPiRE and trans-
fer those variables to a subset of the dimensions defined by
Gorschek et al. [9] (see also Sect. 2). We change the dimensions
in response to disagreements during the classification
procedure (e.g., we remove the dimension “product” as we
see a product as a subset of artefacts occurring in multiple
dimensions). The goal is to understand the problems named
by practitioners in terms of where they occur and how they
relate to each other.
Table 1: Research questions
RQ 1  Which RE-related phenomena exist, where in the project ecosystem do they manifest themselves, and how do they relate to each other?
RQ 2  Are the phenomena measurable?
RQ 3  Are the phenomena actionable?
To this end, two of the authors individually classified the
RE phenomena according to the dimensions (RE, Engineer-
ing, SW project, Company), based on the individual, origi-
nal statements that were given in the NaPiRE report. After
this, each disagreement was discussed in depth until the re-
searchers agreed on one common answer.
In RQ 2, we aim to identify those phenomena amenable
to measurement. Therefore, a phenomenon is considered
measurable [?] if and only if
(i) its understanding is sufficiently mature such that
(ii) an existing or anticipated measure, i.e., an objective mapping
to mathematical objects,
(iii) can efficiently (e.g., in justifiable time) and
(iv) effectively (i.e., preserving empirical observations) capture
the phenomenon
(v) under practical conditions and when applied to study
objects which can be expected to be present in a software
project ecosystem.
We further distinguish between artefacts or activities as the
primary kind of study object, or both in cases where the
phenomena can be measured on artefacts and activities as
well. Please note that our intention is not to provide a taxon-
omy of measurements, but rather rate the extent of general
measurability.
For the classification, we rely again on researcher triangulation.
To this end, three authors of this paper (one senior
researcher and two PhD students) classified the phenomena
on a nominal scale (not measurable, measurable on artefacts,
measurable on activities, measurable on both). The individual
classifications lead to full agreement, partial agreement,
or disagreement. Also in this case, the three researchers
discussed each disagreeing classification in depth
until they agreed on one common answer or at
least a majority vote was possible.
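The triangulation step can be illustrated with a small sketch. The Python snippet below is an illustration under our own assumptions, not the tooling used in the study: the function name `agreement_level`, the category labels, and the level names are hypothetical. It sorts the nominal ratings of one phenomenon into "full" (all raters agree), "majority" (more than half agree), or "none" (to be resolved by discussion). How these levels map onto the terms full agreement, partial agreement, and disagreement used in the text is an assumption.

```python
from collections import Counter

# Nominal scale from the text; the string labels themselves are illustrative.
CATEGORIES = ("not measurable", "artefacts", "activities", "both")

def agreement_level(ratings):
    """Return (level, label) for one phenomenon rated by several raters.

    level is "full" if all raters chose the same category, "majority" if
    more than half did, and "none" otherwise (to be resolved by discussion).
    """
    for r in ratings:
        if r not in CATEGORIES:
            raise ValueError(f"unknown category: {r!r}")
    label, count = Counter(ratings).most_common(1)[0]
    if count == len(ratings):
        return "full", label
    if count > len(ratings) / 2:
        return "majority", label
    return "none", None

print(agreement_level(["artefacts", "artefacts", "artefacts"]))
print(agreement_level(["artefacts", "both", "artefacts"]))
print(agreement_level(["artefacts", "both", "activities"]))
```

Only the "none" case would require the in-depth discussion described above; the other two already yield a classification.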
In RQ 3, finally, we aim at investigating a certain property
of the phenomena called actionability (see also [19]).
An actionable phenomenon allows us to make an empirically
informed decision based on its measurement with significant
impact on the phenomenon, e.g., to improve/change the status
of a phenomenon. Similar notions are interpretative
guidelines or recommendations for action. For the identification,
we rely again on the three authors, who estimated
the actionability of each phenomenon with either yes or no,
leading to only full or partial agreement (majority vote) on
this question.
Validity Procedure
The three researchers who did the classification have several
years of experience in RE. To analyse and increase the
validity of the procedure, the classification was conducted
independently by the three authors, and we used kappa
values to assess the agreement. Furthermore, we required
the researchers to discuss each disagreement in order to ensure
a common understanding and reach a final classification
on which all raters fully agreed or at least a majority vote
was possible.
4. DEPENDENT VARIABLES
As a result of the survey, we revealed 93 variables with
167 dependencies. Given the complexity of the network, the
complete list of all phenomena with the determined dimension
of occurrence is available online (see footnote 1).
Herein, we present only the most relevant relations between
phenomena (also referred to as variables). Fig. 1 illustrates
those interconnected variables.
Each variable is represented as a box in the graph, labelled
on the upper face with a unique alphanumerical identifier.
The first part of the identifier is either “RP”, i.e. requirement
problem, or “M”, i.e. manifestation of the problem, or “R”,
i.e. reasoning, and it is followed by a sequential number. For
further explanations on problems, reasonings and manifestations,
we point the reader to the available publications on
the NaPiRE survey [16, 18]. The two front faces report the
results of the classification: on the left, the measurability
(RQ 2), and on the right, the actionability (RQ 3). As
far as the classification into dimensions is concerned, the nodes
are placed in four different boxes, each one corresponding to
a different dimension. The names of the variables presented
in the graph are reported in Tab. 2.
The graph is directed: for two connected phenomena A and B,
the relation A → B states that A causes B or manifests
itself in B, according to participants’ answers. The
width of the arrow indicates how many respondents connected
A to B: it can be interpreted, approximately, as the
amount of evidence provided by the survey results on the
relationship between A and B. The size of a node corresponds
to the sum of the weights of its outgoing and
incoming connections. In the following, we present our results
on finding dependent variables, structured according
to the research questions.
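The graph construction described above can be sketched in a few lines. The following Python snippet is a minimal illustration, assuming a plain edge list: the identifiers come from Tab. 2, but all weights are invented for the example, and the function names `filter_edges` and `node_sizes` are hypothetical. It shows the filtering on weight > 1 and the node size as the sum of incoming and outgoing weights.

```python
from collections import defaultdict

# (cause, effect, weight). Identifiers are taken from Tab. 2; the weights
# are invented for illustration and are not the survey data.
edges = [
    ("RP02", "M09", 8),
    ("RP07", "M13", 2),
    ("RP08", "M13", 3),
    ("RP11", "M13", 1),
]

def filter_edges(edges, min_weight=2):
    """Keep edges named by at least `min_weight` respondents (weight > 1)."""
    return [e for e in edges if e[2] >= min_weight]

def node_sizes(edges):
    """Node size = sum of the weights of incoming and outgoing edges."""
    sizes = defaultdict(int)
    for src, dst, weight in edges:
        sizes[src] += weight
        sizes[dst] += weight
    return dict(sizes)

kept = filter_edges(edges)
print(kept)
print(node_sizes(kept))
```

With these invented weights, the edge of weight 1 is dropped by the filter and no longer contributes to the size of its target node.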
4.1 RE-related Phenomena (RQ 1)
The classification of the variables according to dimensions
resulted in 38 variables assigned to dimension SW project, 33
to RE, 14 to Engineering, and 8 to Company. The variables
are highly connected (167 overall connections), but 50 %
of nodes have only two connections. Especially the vari-
ables in the dimensions SW project and RE have frequent
connections. We represent a portion of the graph in Fig.
1, selecting only those connections which are supported by
more than one respondent (i.e., weight >1, corresponding
to 80% of all connections).
The most frequently observed relationship is Moving targets
(RP02) → Change requests (M09). While this relationship
might be obvious, it is interesting to notice that Moving
targets manifests itself (outgoing edge), in this filtered
graph, only with phenomena in the dimension SW project.
On the other hand, the reasons for change requests (incoming
edges) result from multiple dimensions. Underspecified
requirements (RP05) and Gold plating (RP10), instead, affect
only phenomena related to SW project.
1 http://www4.in.tum.de/~mendezfe/openspace/NaPiRE/ease2014.zip
Table 2: Descriptions for variables in Fig. 1
ID    Name
M02   Underspecified Reqs.
M03   Incomplete Reqs.
M05   Effort and time overrun
M08   Failed approval of reqs.
M09   Change Requests
M13   Additional communication and replanning
M15   Failed Acceptance
M17   Increased effort in testing
M18   Increased effort in reviews
M20   Too complex solutions
M24   Bugs and defects
M29   Time overrun
M30   Cost overrun
M32   Stagnating progress
M36   Customer dissatisfaction
R01   Implicit Reqs not made explicit
R03   Missing abstraction from solution level
R11   Weak communication
R18   Too ambitious time planning
RP01  Incomplete / hidden reqs.
RP02  Moving targets
RP03  Time boxing
RP04  Separation reqs. from known solutions
RP05  Underspecified reqs.
RP06  Communication flaws in team
RP07  Inconsistent reqs.
RP08  Communication flaws to customer
RP10  Gold plating
RP11  Terminological problems
The node with the highest frequency of incoming/outgoing
connections is Incomplete or hidden requirements (RP01) in
RE. Although it is affected by only one phenomenon, its
manifestations are transversal to all dimensions and refer in
particular to time delays and wasted efforts.
A separate network of dependencies, on the right, is built
around the phenomenon Additional communication and replanning
(M13), which, interestingly, is assumed to be directly
caused only by phenomena from the RE dimension:
Terminological problems (RP11), Communication flaws to
customer (RP08), and Inconsistent requirements (RP07).
We observe yet another isolated network at the bottom
of the RE dimension, around the node Time boxing (RP03),
which is affected by three variables from RE, and Bugs and
defects (M24) from Engineering. It is interesting to note
that this is the only connection in which Bugs and defects
appears (and in fact the size of the node is not large), which
suggests that problems and bad quality in requirements might
not propagate to the point of affecting the external quality of
code artefacts.
A final observation is that no circular dependencies (i.e.,
chains of cause-effect relationships that lead back to their
starting point) are present in this filtered graph.
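Whether a dependency graph contains circular dependencies can be checked mechanically. The sketch below is one possible way to do so, using a depth-first search with three node states; it is not taken from the study, the function name `has_cycle` is hypothetical, and the example edges only reuse identifiers from Tab. 2 for illustration.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:  # back edge: we returned to the current path
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)

# Illustrative edges only; the last call adds a hypothetical back edge.
acyclic = [("RP02", "M09"), ("RP07", "M13"), ("RP11", "M13")]
print(has_cycle(acyclic))
print(has_cycle(acyclic + [("M13", "RP07")]))
```

The recursive formulation is adequate for a graph of this size (93 nodes); a much larger graph would call for an iterative traversal.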
4.2 Measurability of Variables (RQ 2)
Regarding RQ 2, we investigated which phenomenon, and
thus potential variables, can be measured using an existing
or anticipated metric (cf. Sec. 3), and on what study objects
it can be measured. Fig. 2 illustrates the results per
dimension.
Figure 1: Relationship between RE-related variables with weight >1. [Graph not reproduced: the nodes are grouped by the dimensions Company, Project, Engineering, and Requirements Engineering; each node carries its identifier and name, its measurability (on aRtefacts (R), on aCtivities (C), or on aRtefacts & aCtivities (R&C)), and its actionability (A).]
Figure 2: Measurability of variables regarding the dimensions RE, Engineering, Project and Company.
Out of all 93 variables, we considered less than half (38
variables, 41%) not measurable at all. Considering the re-
maining 55 measurable variables, the majority was measur-
able exclusively on artefacts (33 variables, 60%), with mea-
surability on activities coming second (17 variables, 31%).
Only 5 variables (9%) were found to be measurable on both
artefacts and activities.
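Note that the two groups of percentages above use different bases: the share of unmeasurable variables refers to all 93 variables, whereas the shares per study object refer to the 55 measurable ones. A short Python check of this arithmetic, with the hypothetical helper `share`, could look as follows.

```python
def share(part, whole):
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)

total = 93
not_measurable = 38
measurable = total - not_measurable  # 55 variables remain
by_object = {"artefacts": 33, "activities": 17, "both": 5}

# The three measurable groups must add up to the 55 measurable variables.
assert sum(by_object.values()) == measurable

print("not measurable:", share(not_measurable, total), "% of all variables")
for kind, count in by_object.items():
    print(kind + ":", share(count, measurable), "% of measurable variables")
```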
Regarding RE phenomena only, out of 33 variables less
than one-third were considered unmeasurable (10 variables,
30%). Unmeasurable variables comprise social phenomena,
e.g. Weak access to customer needs and Insufficient support
by project lead, as well as phenomena attributable to uncertainty
and limited knowledge inherently present during
RE, e.g., Technically infeasible requirements or Implicit requirements
not made explicit. Measurability of RE phenomena
was mostly considered possible exclusively on artefacts
(17 variables, 74% of the 23 measurable ones) instead of
activities (5 variables, 22%),
with the requirements specification as the predominant artefact
to measure RE phenomena, e.g., Underspecified requirements,
Instable requirements or Inconsistent object models.
Only one variable (4%), namely Informal (unpaid) changes
during RE, was considered to be measurable on both artefacts
and activities.
Regarding the variables in the engineering dimension, only
3 variables (21%) were considered unmeasurable, namely
Too complex solutions, Wrong design decisions and the Use
of throw-away prototypes. The remaining 11 variables (79%)
were considered to be measurable on artefacts (6 variables,
55%, e.g. Bugs and Defects), activities (3 variables, 27%,
e.g. Increased discussion during implementation) or both (2
variables, 18%, e.g. Gold plating).
Out of all variables in the project dimension, exactly half
(19 variables, 50%) were considered measurable, with the
following distribution: eight of them (42%) were classified
as measurable on artefacts, nine on activities (47%), and the
remaining two variables (11%) were classified as measurable
on both artefacts and activities.
Only two variables (25%) attributed to the company di-
mension, No further improvement after acceptance and Com-
munication flaws in teams, were considered measurable, both
on artefacts exclusively. However, the majority of variables
(75%), e.g. Customer dissatisfaction and Volatile domain,
were considered unmeasurable.
4.3 Actionability of Variables (RQ 3)
Actionability (cf. Sec. 3) denotes the ability to take imme-
diate actions to significantly change a phenomenon. Fig. 3
illustrates the results of the classification.
Figure 3: Actionability according to dimensions.
Overall, only a minority of variables (25, 27%) were con-
sidered actionable. Actionable variables are, e.g., No vali-
dation and Underspecified requirements. Those phenomena
can even be avoided by introducing a specific RE process
or a reference model for the artefacts that defines a notion
of quality for the artefacts. Increased process costs and Unavailability
of customer, in turn, are examples of variables
considered not actionable.
The percentage of actionable variables (42%, compared
to all variables within a dimension) is highest in RE. In
contrast, 4 (29%) engineering variables and 7 (19%) project
variables are actionable. None of the eight variables in the
company dimension is actionable.
4.4 Reliability of Classifications
In order to assess the degree of the inter-rater agreement
and, thus, the reliability of the classifications, we apply Cohen’s
$\kappa$-measure [3]. It is defined as
$$\kappa = \frac{P_o - P_e}{1 - P_e}$$
in which $P_o$ is the observed percentage agreement, and $P_e$ is
the expected probability of agreement among raters due to
chance, based on marginal probabilities, i.e. the distribution
of the actual selections of the raters, and complete statistical
independence of raters. $P_e$ estimates the proportion of times
raters would agree if they guessed completely on every case
and with probabilities that match the marginal proportions
of the observed classifications.
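The definition above translates directly into code. The following Python sketch computes Cohen's kappa for two raters from the observed agreement and the chance agreement derived from the raters' marginal proportions; the function name `cohens_kappa` and the toy ratings are our own illustration, not data from the study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # P_o: observed proportion of items with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # P_e: chance agreement from each rater's marginal proportions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when P_e equals 1")
    return (p_o - p_e) / (1 - p_e)

# Toy ratings of four variables by two raters (illustrative only).
a = ["RE", "RE", "Company", "RE"]
b = ["RE", "Company", "Company", "RE"]
print(cohens_kappa(a, b))
```

In this toy example the raters agree on three of four items, so the observed agreement is 0.75, while the marginals give a chance agreement of 0.5.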
The values of κ are constrained to the interval [−1; +1].
A κ value of one means perfect agreement, a κ value of zero
means that agreement is equal to chance, and a κ value
of negative one means perfect disagreement. For 0 ≤ κ ≤ 1,
literature suggests that κ > 0.60 corresponds to “good” or
“substantial” agreement, the range 0.21 ≤ κ ≤ 0.60 is interpreted
as “fair” or “moderate” agreement in the majority of cases,
while for the range 0 ≤ κ ≤ 0.20 we have “poor” agreement (see,
e.g., [8]).
In case the distribution of ratings is skewed, the coefficient
must be adjusted for prevalence, resulting in 2P_o − 1.
Tab. 3 reports the three mentioned agreement metrics. For
the classification of the dimension, which was done by 2
raters, Cohen's Kappa is computed. For the classifications
of the measurability and the actionability, provided by
3 raters, the adjusted Kappa according to Fleiss [8] is computed.
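For more than two raters, a kappa-type coefficient can be sketched as follows. The code implements the standard Fleiss' kappa together with the prevalence adjustment 2P_o − 1 named in the text; whether this matches the exact variant used for Tab. 3 is an assumption. Function names and the example counts are hypothetical.

```python
def fleiss_kappa(counts):
    """Standard Fleiss' kappa.

    counts[i][j] is the number of raters who assigned item i to category j;
    every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    # Per-item agreement: share of agreeing rater pairs.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall category proportions.
    n_cats = len(counts[0])
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_cats)]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_bar - p_e) / (1 - p_e)


def prevalence_adjusted(p_o):
    """Prevalence adjustment as given in the text: 2 * P_o - 1."""
    return 2 * p_o - 1


# Hypothetical example: 4 variables, 3 raters, categories
# (actionable, not actionable).
example_counts = [[3, 0], [2, 1], [0, 3], [1, 2]]
kappa_3_raters = fleiss_kappa(example_counts)
print(round(kappa_3_raters, 2), round(prevalence_adjusted(0.60), 2))
```

The second printed value applies the adjustment to P_o = 0.60, the observed agreement reported for the classification of dimensions.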
We observe moderate agreement for the classification of
dimensions and actionability. The agreement values on mea-
surability are conservative, because we ignored the partial
agreements (i.e. when a rater classified a variable as measur-
able on basis of both activities and artefacts and the other(s)
only on basis of one of them). We report poor agreement in
measurability for SW project and RE. The dimension
Engineering has moderate agreement in measurability.
We observe disagreement for dimension Company both in
measurability and actionability. Although only 8 nodes are
affected, this reveals a difficulty for the raters to classify and
interpret those variables.
Classification     P_o    2P_o − 1   Cohen's κ
Dimensions         0.60     0.20       0.43
Measurability
  Overall          0.43    -0.15       0.20
  Company          0.38    -0.25      -0.13
  Engineering      0.52     0.05       0.34
  RE               0.42    -0.15       0.15
  SW project       0.39    -0.23       0.13
Actionability
  Overall          0.67     0.34       0.34
  Company          0.50     0.00      -0.33
  Engineering      0.76     0.52       0.52
  RE               0.66     0.31       0.30
  SW project       0.68     0.37       0.34
Table 3: Reliability of classifications
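As a reading aid, the interpretation bands given above can be applied mechanically to the κ values of Tab. 3. The sketch below copies the reported κ values; the function name, the label strings, and the handling of the band boundaries (values up to 0.20 count as poor, values up to 0.60 as fair/moderate) are assumptions made for illustration.

```python
def interpret_kappa(kappa):
    """Map a kappa value to the verbal bands used in the text."""
    if kappa < 0:
        return "disagreement"
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.60:
        return "fair/moderate"
    return "good/substantial"


# Kappa values as reported in Tab. 3, keyed by (classification, dimension).
reported_kappa = {
    ("Dimensions", "Overall"): 0.43,
    ("Measurability", "Overall"): 0.20,
    ("Measurability", "Company"): -0.13,
    ("Measurability", "Engineering"): 0.34,
    ("Measurability", "RE"): 0.15,
    ("Measurability", "SW project"): 0.13,
    ("Actionability", "Overall"): 0.34,
    ("Actionability", "Company"): -0.33,
    ("Actionability", "Engineering"): 0.52,
    ("Actionability", "RE"): 0.30,
    ("Actionability", "SW project"): 0.34,
}

labels = {key: interpret_kappa(value) for key, value in reported_kappa.items()}
for (classification, dimension), label in labels.items():
    print(f"{classification:14s} {dimension:12s} {label}")
```

Under these assumptions, no entry reaches the good/substantial band, and both Company entries fall below zero.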
5. CRITICAL REFLECTION
In the following, we critically reflect on our results by con-
sidering the possibilities and limitations for evidence-based
RE research we draw from the extent to which variables are
measurable and actionable in context of RE.
To this end, we first discuss the positive effects we see in
the results, i.e. the measurability of RE phenomena within
RE itself. In a second step, we critically discuss the negative
conclusions we draw from our results. Those negative
effects can be discussed, in turn, on two levels: we have
variables that are barely measurable within RE or barely
measurable at all, and we have variables where the mea-
surements strongly depend on subjective interpretation, i.e.
where the variables are strongly dependent on the partic-
ularities of a narrow (“local”) socio-economic context with
limited possibilities of generalisation. The latter is impor-
tant to the more general, and to some extent philosophical,
© ACM. PREPRINT. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution.
The definitive version was published in the conference/workshop proceedings.
DOI: http://dx.doi.org/10.1145/2601248.2601258
question to which extent the variables can eventually serve
at all to build and evaluate contributions to RE that are not
dependent on subjectivity (i.e. oracles).
Figure 4: The Good, the Bad, and the Ugly: Possibilities and limitations in evidence-based RE research. (Figure labels: Universe of all possible RE phenomena (maximal scope of validity); Scope of local RE theories; Treatments; (Subjective) Effects on further variables; Delta to objectivity; "The Good"; "The Bad"; "The Ugly".)
Figure 4 summarises the perspectives we take in the fol-
lowing while categorising them according to the possibilities
we conclude for evidence-based RE research.
5.1 The Good: Measurability in RE Context
We see those dependent variables that allow us to reliably
measure the effects of treatments in a specific experimental
context with a low degree of subjectivity² as most valuable
(“The Good”, Fig. 4) as they allow us, for example, to
accurately test the sensitivity of applying an RE method in a
socio-economic context.
We discovered that 59 % of all revealed variables can be
measured while 69 % of those variables can be measured on
basis of artefacts. For those variables that originate in the
context of RE, we discovered that 78 % of the variables are
measurable on basis of the artefacts with a high percent-
age of actionability compared to the variables allocated to
other dimensions. One implication we draw is that when ap-
plying a treatment to RE, we can rely on independent and
comparable measurements on basis of artefacts rather than
relying on measurements on basis of software engineering
activities. This understanding results from our experiences
that by taking artefact reference models, we can define a
certain external notion of artefact quality to which we can
compare the actual results of an experiment in a standard-
ised manner [15].
Apart from the possibilities we see in the means for measuring
the variables, a further positive aspect is that a substantial
number of variables is allocated to RE. This enables us to
shorten the empirical cycle when conducting, for example, case
study research as we do not necessarily have to take into
account subjects and objects associated with the whole
development process. For example, we can investigate the
effects of an RE specification method by directly investigating
the quality of the created RE artefacts while isolating
investigations of the effects on other development activities,
e.g. on reviews within analytical quality assurance tasks.
From the perspective of evidence-based research, this means
that we are able to set up an experiment or a case study
while focussing on RE only, while neglecting a large extent
of phenomena relevant to further dimensions, but still
preserving potential effects on them. The latter is reflected
by a set of discovered variables having impacts on further
variables that would demand for longitudinal studies (e.g.,
incomplete requirements causing change requests).
² We speak of a high degree of “subjectivity” when the
outcome of a measurement is dependent on the particularities
and attitude of the involved subjects.
For researchers, this supports the accuracy in the plan-
ning of experimental settings. Practitioners can already use
those results, for example, to calibrate their quality assur-
ance techniques.
5.2 The Bad: Limitations in RE Context
One direct implication of the foregoing discussion is that,
in contrast to the given measurability in RE, we could also
reveal a set of cause-effect relationships between variables
that are barely measurable or not measurable in a local,
context-specific RE setting. We consider those variables to
limit the possibilities in evidence-based RE research (“The
Bad”, Fig. 4). That is:
1. 73 % of the variables are not actionable, not least because
they result from circumstances that are not visible
when setting up a project (e.g. the effects the relationship
to customers has on the quality of the RE
artefacts), or they are
2. not measurable (41 %) or not objectively measurable,
limiting the internal and the external validity of
measurements (e.g. the customer satisfaction), or
3. they are measurable, but they have relationships with
variables not within RE, making measurements and
especially their interpretation difficult (e.g. incomplete
requirements with a strong causal relation to 4 variables
in RE, but also to 8 variables in the project dimension
and even one in the overall company dimension),
or, finally, they
4. have no direct relation to RE (except over transitive
and, thus, not empirically resilient relationships).
Therefore, while we see good chances to use a substan-
tial amount of variables for measurements in RE, we still
have a large extent of variables that imply the need of fur-
ther investigation on (1) means for accurate measurements
and (2) extension of the variables themselves. The latter
considers the need to better understand the further dimensions
via longitudinal studies and replication studies, as well
as to strengthen the external validity of those variables that
by now remain underrepresented (see also Sect. 7.1 on the
threats to validity).
5.3 The Ugly: No RE Oracle in Sight!
Apart from problems in the measurability of certain variables
discussed in the previous section, we consider another
problem to affect the generalisability of empirical results
relying on the obtained variables. That is, the notion of
objectivity³ and, thus, external validity is weak, negatively
affecting the extent to which we are eventually able to obtain RE
oracles in general, i.e. the possibility to eventually establish
universally valid RE theories is, in our opinion, low (“The
Ugly”, Fig. 4).
The reason is that especially in RE, a large extent of variables
usable in experimental settings strongly relates to
measurements where the experiences, the expertise, and the
expectations of the subjects involved strongly affect the
sensitivity of RE contributions tested in socio-economic contexts.
For example, when testing the efficiency of an RE elicitation
method, the results will always depend on beliefs and the
judgment of those who apply the method. In consequence,
one might assume that there is no such thing as universal
truth and that an oracle for RE will never be at our disposal,
because the effects of a treatment will always depend
on the subjective view of human beings involved in the context.
An alternative view is that it would be inherently hard
to identify and eliminate all confounding factors.
³ In its essence, objectivity considers that a treatment always
results in the same effects when applying it independently
to a population.
Even if ignoring that we still have many unknown vari-
ables in our results or variables where the measurements
rely on subjective interpretation, those variables we could
already obtain negatively affect the possibility of
generalisation, because:
1. most variables important to RE have complex and often
transitive cause-effect relationships, and
2. they have a low degree of measurability.
We can see, for example, that underspecified or unmeasurable
requirements impact the effort spent for reviews. We
could now investigate those effects more precisely by
conducting a longitudinal study and come, for example, to the
conclusion that one particular specification method applied
to RE has the effect of leading to more precise RE artefacts
and a decreasing review effort. However, the problem in such
an investigation is of a more general nature, as the decreased
effort could also be caused by variables not included in the
experimental setting and side-effects not taken into account.
One reason why a potential correlation will, in our opinion,
never imply causation is that we doubt that it is possible to
build a system of dependent variables that is able
to capture all possible facets of a project ecosystem.
The results in their current state already reveal the
complexity in the dependent variables; for example, change
requests are already caused by 9 different variables, and moving
targets already affects 10 variables.
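Counts such as "caused by 9 variables" or "affects 10 variables" correspond to the in-degree and out-degree of a node in a directed cause-effect graph, and transitive relationships correspond to reachability. The following is a minimal sketch of that reading. The edge list is hypothetical: it borrows variable names from the paper but does not reproduce the actual 167 dependencies.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Return (causes, effects) adjacency maps for (cause, effect) pairs."""
    causes = defaultdict(set)   # effect -> direct causes
    effects = defaultdict(set)  # cause -> direct effects
    for cause, effect in edges:
        causes[effect].add(cause)
        effects[cause].add(effect)
    return causes, effects


def transitive_effects(effects, start):
    """All variables reachable from start via one or more cause-effect edges."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in effects.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical edges for illustration only.
edges = [
    ("Moving targets", "Change requests"),
    ("Incomplete requirements", "Change requests"),
    ("Underspecified requirements", "Incomplete requirements"),
    ("Change requests", "Increased process costs"),
]
causes, effects = build_graph(edges)
in_degree = len(causes["Change requests"])
reach = transitive_effects(effects, "Underspecified requirements")
print(in_degree, sorted(reach))
```

Even in this small example, a variable with a single direct effect reaches three variables transitively, which illustrates why isolating the effect of one treatment becomes harder as the graph grows.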
However, assuming that an RE theory is always something
relative, we can at least use the variables to establish and
calibrate experimental settings as long as we rely on those
aspects we can measure in RE in a standardised manner
(e.g. on basis of artefacts, see Sect. 5.1) or by explicitly opt-
ing for subjectivity, e.g. by gathering expert opinions. The
latter makes clear the necessity to explicitly and accurately
characterise experimental contexts (subjects, background,
expertise etc.).
Finally, although one might doubt the possibility to gen-
erally obtain full external validity, we can (and should) in-
crease the external validity of experimental results until reach-
ing a certain saturation via a tool already at our disposal:
replication studies [12].
6. RESEARCH IMPLICATIONS
Starting from our results, we picture three areas of impli-
cations for which we encourage researchers and practitioners
to foster the necessary discussions, namely (i) general prin-
ciples in evidence-based RE research, (ii) RE methodologies
and (iii) RE quality management.
6.1 Evidence-based RE Research in General
The implications we can draw from our results on
evidence-based RE research mainly result from the negative
effects we discussed in the previous section. Simplified, we have
1. a large extent of variables that are hard to measure, and
2. a limited understanding on the context surrounding
the particularities in RE.
One might now argue that only the limited set of measur-
able variables with a low degree of dependencies are suitable
to conduct empirical studies. However, we strongly believe
that even if avoiding a pragmatic view on the results pre-
sented in previous sections, we can already make use of all
variables allocated to the context of the RE dimension while
explicitly opting for those for which the measurement is
inherently subjective. That is, the results increase the
awareness of those variables that demand for expert judgement,
thus, allowing us to calibrate the experimental setting, e.g.,
via survey research. To increase the reliability of the results
and to tackle the problem of a limited understanding on the
context surrounding the particularities in RE, we believe that
we should especially value more independent (confirmatory)
replication studies [12].
Apart from the general implications on the various types
of empirical studies, we draw the need to conduct more
curiosity-driven studies in RE to reveal more variables and
strengthen the confidence in those variables we already could
define. This should support a better understanding on the
general phenomena in RE necessary to, for example, infer
proper improvement goals or general characteristics suitable
to tailor RE methodologies to practical environments. As a
matter of fact, this extension of the results is already in
scope via the globally distributed replications as part of the
NaPiRE endeavour.
Nevertheless, the variables presented already serve
practitioners and researchers to calibrate their improvement goals
and the metrics used in evaluation research; this is in scope
of the following section.
6.2 Research on RE Methodologies
When investigating the area of RE methodologies, i.e.
benefits and needs when relying on certain RE methods
or whole software process models, we too often rely on op-
portunistically chosen metrics for measurements that might
be important to that particularly envisioned socio-economic
context while, for others, they might not be. To test the
sensitivity of a method in a context, we therefore believe in
the benefits of relying on measurements for those variables
stated with a high number of occurrences in our result set.
The reason is that those variables seem important to a
broader range of practitioners; thus, they already imply an
increase in external validity for the corresponding improve-
ment goals. Furthermore, by relying on commonly accepted
practical problems to infer improvement goals, we allow for
accurate objectivism via independent replication studies.
Exemplary improvement goals we can already infer from
our results are:
1. Increase flexibility in the RE process (reflecting exem-
plary problems like moving targets, time boxing, or
gold plating)
2. Increase syntactic quality in RE artefacts (reflecting
exemplary problems like inconsistent requirements, ter-
minological problems)
3. Support precise terminology and communication
4. Support consistency and traceability
5. Support testability of requirements
6. Make explicit the particularities of the application do-
main
7. Increase semantic quality of the RE artefacts (reflecting
exemplary problems like incomplete / hidden
requirements)
Those improvement goals already show the diversity of
treatments we can test while still being in scope of practi-
cally relevant problems. Some goals envision whole method-
ologies and the way of working (activities), others envision
the quality of the artefacts and the specified requirements.
The nature of the measurements for the corresponding variables
furthermore reveals first implications on the necessary
type of studies. While those variables relying on subjective
measurements should be envisioned via, for example, technical
action research case studies [22] and study types to
gather subjective expert judgement (e.g., survey research,
or grounded theory) [1], other more objectively measurable
variables could be in scope of controlled environments. The
cause-effect relationships of the variables can be finally used
to design longitudinal studies, e.g. to test the effects of a
treatment on the general project quality (see also the
discussion in Sect. 5).
Another field of application considers the validation of
previously conducted studies and the calibration of the metrics
used for further studies. For example, we have already
tested the benefits and shortcomings of applying artefact
orientation to RE in various experiments and case studies [13,
14]. We could also confirm the value of replication studies
as those studies confirmed trends, e.g. the support of
consistency and a clear terminology. While most of the variables
we used in our studies indeed match the stated improvement
goals, there are some variables we plan to remove for further
replication as they show to have too many dependencies and
uncontrollable side-effects (e.g., the efficiency of the tested
approaches).
We strongly believe that we have the same effects when
testing the practical impact of single methods in evaluation
research. That is, if practitioners and researchers test the
sensitivity of applying single methods (e.g., for requirements
elicitation), they can rely on our presented variables follow-
ing corresponding study types while isolating those variables
that are inherently threatened in their validity.
6.3 Research on RE Quality Management
In the following, we discuss research implications on qual-
ity management in RE, comprising both quality assurance
(assessment) and quality control (correcting actions).
Several indicators suggest the further investigation of met-
rics in RE quality management. While the results of Sec. 4.2
suggest both a substantial basis (23 variables) and the most
promising ratio (70%) of measurable variables, the inter-rater
agreement for RE measurability (κ = 0.15) and
actionability (κ = 0.30) was rather low, especially in contrast to
the Engineering dimension (κ = 0.34 and 0.52, respectively). This suggests
that for RE, we are still lacking knowledge of how to as-
sess and control practical problems using metrics. However,
there are immediate benefits: on the one hand, several phe-
nomena (e.g. Inconsistent and Incomplete/Hidden require-
ments) considered measurable but not actionable may be
refined into precise defects and measures, potentially yield-
ing actionability. On the other hand, for those phenomena
considered actionable but not measurable (R05,R08,RP11;
e.g., Underspecified requirements), the discovery of adequate
metrics could improve quality management in RE notably.
In particular, artefact-based metrics seem most promising,
because (i) 78% of RE phenomena can be measured on basis
of artefacts, and (ii) artefact-based reference models can be
leveraged to standardise measurements to obtain
comparability.
Indeed, the relation of many identified RE problems, e.g.,
Uncertainty in RE and Weak access to customer needs, to
the more traditional notions of RE quality in general and
their manifestations and impacts of requirement specifica-
tions in terms of quality, remains unclear. Bridging this gap
between the observations and expectations of practitioners
and scientific RE quality models would benefit both: Prac-
titioners would have easier access to scientific RE quality
management methods and tools, while the scientific com-
munity could profit from an empirically-grounded notion of
quality, evaluating and revising existing models.
7. CONCLUSION AND FUTURE WORK
In this paper, we contributed a set of dependent RE vari-
ables classified according to the dimensions defined by Gorschek
et al. [9]. We showed to what extent the variables are mea-
surable and actionable. Based on this classification, we
critically reflected on the results from the perspective of
evidence-based research in RE and drew, as a second step,
direct implications for RE research.
Our results showed that out of 93 variables, 33 have their
origin in RE. Furthermore, we discovered that 59 % of all
revealed variables can be measured while 69 % of those vari-
ables can be measured on basis of artefacts. In RE, even
78 % of the variables are measurable on basis of the arte-
facts. This strengthens our confidence in the possibility to
accurately set up an experiment or a case study while fo-
cussing on RE only, and while neglecting a large extent of
phenomena relevant to further dimensions, but still preserving
potential effects on them. However, we could also show a
substantial amount of variables outside RE having complex
dependencies. We critically discussed the implications on
evidence-based RE research and, for example, on the
possibility to obtain objectivity in experimental settings.
Further implications we draw were on the RE research
itself, i.e. we showed
1. the need to conduct longitudinal studies and confirma-
tory replication studies for RE,
2. a first set of improvement goals suitable to calibrate ex-
perimental settings in research on RE methodologies,
and
3. the implications on research on RE quality manage-
ment.
7.1 Limitations and Threats to Validity
A threat to the validity is given by the classification itself
given that the area of metrics and measurements is inherently
complex and multi-faceted. We minimised this threat
via research triangulation. However, Tab. 3 shows mainly
“moderate” agreements and even disagreement for variables
in the Company dimension. We countered the moderate
reliability by solving conflicts in specific reconciliation
meetings, in which the three raters agreed by discussion on
a unique classification.
The biggest threat to the validity of our results remains,
however, the incompleteness of the variables. We have, for
example, revealed small coherent sets of variables independent
of RE and often stated in a limited number of occurrences.
However, those isolated groups of dependent variables
also make clear the necessity (and possibilities) of further
investigations, which we actively discussed. One explanation
we have is that the focus when eliciting the variables
within NaPiRE was on RE-specific variables and their causes
and effects, leaving open an understanding about the phenomena
within the other dimensions. The replication and,
thus, the extension of the results also to other dimensions
is in scope of the NaPiRE endeavour, whereas our results
already allow us to foster the discussion on the implications on
research in RE.
7.2 Future Work
We are currently replicating the globally distributed surveys
on RE in various countries. We plan to use the results
to extend and validate the classification we provided with
this paper.
8. REFERENCES
[1] S. Adolph, W. Hall, and P. Kruchten. Using Grounded
Theory to study the Experience of Software
Development. Journal of Empirical Software
Engineering, 16(4):487–513, 2011.
[2] B. H. C. Cheng and J. M. Atlee. Research Directions in
Requirements Engineering. In Future of Software
Engineering (FOSE’07), pages 285–303. IEEE
Computer Society, 2007.
[3] J. Cohen. A coefficient of agreement for nominal
scales. Educational and Psychological Measurement,
20(1):37–46, 1960.
[4] N. Condori-Fernández, M. Daneva, and R. Wieringa.
Preliminary Survey on Empirical Research Practices
in Requirements Engineering. Technical Report
TR-CTIT-12-10, University of Twente, Centre for
Telematics and Information Technology (CTIT), 2012.
[5] D. Damian and J. Chisan. An Empirical Study of the
Complex Relationships between Requirements
Engineering Processes and other Processes that lead
to Payoffs in Productivity, Quality, and Risk
Management. IEEE Transactions on Software
Engineering, 32(7):433–453, 2006.
[6] K. El Emam and N. Madhavji. A Field Study of
Requirements Engineering Practices in Information
Systems Development. In Proceedings of the 2nd IEEE
International Symposium on Requirements
Engineering, pages 68–80. IEEE Computer Society,
1995.
[7] J. Eveleens and T. Verhoef. The Rise and Fall of the
Chaos Report Figures. IEEE Software, 27(1):30–36,
2010.
[8] J. L. Fleiss. Statistical Methods for Rates and
Proportions. Wiley series in probability and
mathematical statistics. John Wiley & Sons, New
York, second edition, 1981.
[9] T. Gorschek and A. M. Davis. Requirements
Engineering: In Search of the dependent Variables.
Information and Software Technology, 50:67–75, 2007.
[10] T. Hall, S. Beecham, and A. Rainer. Requirements
problems in twelve software companies: an empirical
analysis. Empirical Software Engineering, 8(1):7–42,
2003.
[11] P. Hsia, A. Davis, and D. Kung. Status report:
Requirements engineering. IEEE Software,
10(6):75–79, 1993.
[12] N. Juristo and S. Vegas. Using differences among
replications of software engineering experiments to
gain knowledge, 2009. Invited Talk from the
International Conference on Empirical Software
Engineering and Measurement (ESEM).
[13] M. Kuhrmann, D. Méndez Fernández, and A. Knapp.
Who Cares About Software Process Modelling? A
First Investigation About the Perceived Value of
Process Engineering and Process Consumption. In
PROFES’14, pages 138–152. Springer, 2013.
[14] D. Méndez Fernández, K. Lochmann,
B. Penzenstadler, and S. Wagner. A Case Study on
the Application of an Artefact-Based Requirements
Engineering Approach. In EASE’11, pages 104–113.
Institution of Engineering and Technology (IET),
2011.
[15] D. Méndez Fernández, B. Penzenstadler,
M. Kuhrmann, and M. Broy. A Meta Model for
Artefact-Orientation: Fundamentals and Lessons
Learned in Requirements Engineering. In D. Petriu,
N. Rouquette, and O. Haugen, editors, MoDELS’10,
volume 6395, pages 183–197. Springer-Verlag Berlin
Heidelberg, 2010.
[16] D. Méndez Fernández and S. Wagner. Naming the
Pain in Requirements Engineering: Design of a global
Family of Surveys and first Results from Germany. In
EASE’13, pages 183–194. ACM Press, 2013.
[17] D. Méndez Fernández, S. Wagner, K. Lochmann,
A. Baumann, and H. de Carne. Field Study on
Requirements Engineering: Investigation of Artefacts,
Project Parameters, and Execution Strategies.
Information and Software Technology, 54(2):162–178,
2012.
[18] D. Méndez Fernández and S. Wagner. Naming the
Pain in Requirements Engineering. Information and
Software Technology, Currently in Revision, Author
Version available under:
http://www4.in.tum.de/~mendezfe/openspace/NaPiRE/INFSOF-S-13-00428.pdf, 2013.
[19] A. Meneely, B. Smith, and L. Williams. Validating
software metrics: A spectrum of philosophies. ACM
Transactions on Software Engineering and
Methodology (TOSEM), 21(4):24, 2012.
[20] U. Nikula, J. Sajaniemi, and H. Kälviäinen. A
State-of-the-practice Survey on Requirements
Engineering in Small-and Medium-sized Enterprises.
Research Report 951-764-431-0, Telecom Business
Research Center Lappeenranta, 2000.
[21] B. Nuseibeh and S. Easterbrook. Requirements
Engineering: A Roadmap. In Proceedings of the
Conference on the Future of Software Engineering,
pages 35–46, New York, NY, USA, 2000. ACM.
[22] R. Wieringa and M. Aycse. Technical action research
as a validation method in information systems design
science. In Proceedings of the 7th international
conference on Design Science Research in Information
Systems: advances in theory and practice, pages
220–238, 2012.
... The possible reasons for such a lack of empirical studies and quantitative data include the difficulty to measure or manipulate essential variables and the very high effort required [25]. Some examples that illustrate the scale of such effort include an 18-month project involving 9 companies [37], a 30-month project in a single company requiring a serious commitment of the researchers [7], or 2 large survey studies gathering data from more than 400 organizations [8]. ...
Article
Full-text available
Requirements Engineering (RE) is recognized as one of the most important (yet difficult) areas of software engineering that has a significant impact on other areas of IT projects and their final outcomes. Empirical studies investigating this impact are hard to conduct, mainly due to the great effort required. It is thus difficult for both researchers and industry practitioners to make evidence-based evaluations about how decisions about RE practices translate into requirement quality and influence other project areas. We propose an idea of a lightweight approach utilizing widely-used tools to enable such an evaluation without extensive effort. This is illustrated with a pilot study where the data from six industrial projects from a single organization were analyzed and three metrics regarding the requirement quality, rework effort, and testing were used to demonstrate the impact of different RE techniques. We also discuss the factors that are important for enabling the broader adoption of the proposed approach.
... The possible reasons for such lack of empirical studies and quantitative data include the difficulty to measure or manipulate essential variables and a very high effort required [8]. Some examples that illustrate the scale of such effort include: 18-month project involving 9 companies [9]; 30-month project in a single company requiring a serious commitment of researchers [10]; or two large survey studies gathering data from over 400 organizations [11]. ...
Chapter
Full-text available
Requirements Engineering (RE) is recognized as one of the most important, but difficult areas of software engineering, with a significant impact on other areas of the IT project and its final outcome. The empirical studies investigating this impact are hard to conduct, mainly due to large effort required. It is thus difficult for researchers and even more for industry practitioners to make evidence-based evaluations, how decisions about RE (e.g. RE process improvements, RE techniques selection) translate into requirements quality and influence other project areas. We propose an idea of a lightweight approach, utilizing the popular tools adopted by numerous software companies, to enable such evaluation without an excessive effort. The proposal is illustrated with a pilot study, where the data from 6 industrial projects from a single organization was analyzed and 3 metrics regarding requirements quality, rework effort and testing were used to demonstrate the impact of different RE techniques applied among considered projects.We also discuss the factors important to enabling adoption of the proposed approach.
... Zhang et al. [21] divided dependencies into "increase/decrease" value and cost along with business-related relations such as "precedence and concurrence dependencies". Although requirements dependencies are important factors in the prioritization process and the final ordered results [21,22], dependencies between and among the requirements is a less investigated field, which is the rationale for using an indirect factor in current studies. ...
Article
Owing to the special stance of prioritizing tasks in requirements engineering processes, and as the requirements are not independent in nature, considering their dependencies is essential during the prioritizing process. Although different classifications of dependency types among requirements exist, only a few approaches in the prioritization process consider such valuable data (dependency among requirements). To achieve a practical prioritization, this study proposes a method based on the effects of the requirement dependencies (increase/decrease cost of) on the value of prioritization provided by the tensor concept. Since the strengths of dependencies are also influential factors in the act of prioritization, the algebraic structure of fuzzy graphs is used to model the requirement dependencies and their strengths. Moreover, a weighted page rank algorithm based on the fuzzy concept is provided to determine the final dependency strength of the dependent requirements of the fuzzy graph. To evaluate the proposed approach, a controlled experiment is also conducted. The proposed approach is compared with an analytic hierarchy process-based approach, TOPSIS, and EVOLVE in the experiment. The results analysis demonstrates that our approach is less time-consuming, much easier to use, and highly accurate.
... Over the last years, we have observed an active research community arise and propose a plethora of promising contributions to RE. However, we still know very little about the practical impact of those contributions or whether they are in tune with the practical problems they intend to address [8]. In fact, there still seems to be often a gap between research and current practice [3]. ...
Article
The relevance of Requirements Engineering (RE) research to practitioners is a prerequisite for problem-driven research in the area and key for a long-term dissemination of research results to everyday practice. To better understand how industry practitioners perceive the practical relevance of RE research, we have initiated the RE-Pract project, an international collaboration conducting an empirical study. This project opts for a replication of previous work done in two different domains and relies on survey research. To this end, we have designed a survey to be sent to several hundred industry practitioners at various companies around the world and ask them to rate their perceived practical relevance of the research described in a sample of 418 RE papers published between 2010 and 2015 at the RE, ICSE, FSE, ESEC/FSE, ESEM and REFSQ conferences. In this paper, we summarise our research protocol and present the current status of our study and the planned future steps.
... To elaborate the extent to which the application of ArtREPI eventually leads to an improvement, and how to measure the success of an improvement (including subjective and cognitive facets), we first need a better understanding on the measurability of such an improvement. In [18], we provide a richer discussion on the limitations of measurements in RE. ...
Conference Paper
Most requirements engineering (RE) process improvement approaches are solution-driven and activity-based. They focus on the assessment of the RE of a company against an external norm of best practices. A consequence is that practitioners often have to rely on an improvement approach that skips a profound problem analysis and that results in an RE approach that might be alien to the organisational needs. In recent years, we have developed an RE improvement approach (called ArtREPI) that guides a holistic RE improvement against individual goals of a company putting primary attention to the quality of the artefacts. In this paper, we aim at exploring ArtREPI's benefits and limitations. We contribute an industrial evaluation of ArtREPI by relying on a case study research. Our results suggest that ArtREPI is well-suited for the establishment of an RE that reflects a specific organisational culture but to some extent at the cost of efficiency resulting from intensive discussions on a terminology that suits all involved stakeholders. Our results reveal first benefits and limitations, but we can also conclude the need of longitudinal and independent investigations for which we herewith lay the foundation.
... Over the last years, we have observed a strong research community arise and propose a plethora of promising contributions to RE. Yet, we still know very little about the practical impact of those contributions or whether they are in tune with the practical problems they intend to address [30]. The state of empirical evidence in RE is particularly weak and dominated by, if at all, isolated case studies and small-scale studies investigating aspects that hardly can be generalised. ...
Article
Requirements Engineering (RE) has received much attention in research and practice due to its importance to software project success. Its inter-disciplinary nature, the dependency to the customer, and its inherent uncertainty still render the discipline difficult to investigate. This results in a lack of empirical data. These are necessary, however, to demonstrate which practically relevant RE problems exist and to what extent they matter. Motivated by this situation, we initiated the Naming the Pain in Requirements Engineering (NaPiRE) initiative which constitutes a globally distributed, bi-yearly replicated family of surveys on the status quo and problems in practical RE. In this article, we report on the qualitative analysis of data obtained from 228 companies working in 10 countries in various domains and we reveal which contemporary problems practitioners encounter. To this end, we analyse 21 problems derived from the literature with respect to their relevance and criticality in dependency to their context, and we complement this picture with a cause-effect analysis showing the causes and effects surrounding the most critical problems. Our results give us a better understanding of which problems exist and how they manifest themselves in practical environments. Thus, we provide a first step to ground contributions to RE on empirical observations which, until now, were dominated by conventional wisdom only.
Preprint
Defects in requirements specifications can have severe consequences during the software development lifecycle. Some of them result in overall project failure due to incorrect or missing quality characteristics such as security. There are several concerns that make security difficult to deal with; for instance, (1) when stakeholders discuss general requirements in (review) meetings, they are often not aware that they should also discuss security-related topics, and (2) they typically do not have enough security expertise. These concerns become even more challenging in agile development contexts, where lightweight documentation is typically involved. The goal of this paper is to design and evaluate an approach to support reviewing security-related aspects in agile requirements specifications of web applications. The designed approach considers user stories and security specifications as input and relates those user stories to security properties via Natural Language Processing (NLP) techniques. Based on the related security properties, our approach then identifies high-level security requirements from the Open Web Application Security Project (OWASP) to be verified and generates a focused reading technique to support reviewers in detecting defects. We evaluate our approach via two controlled experiment trials, comparing the effectiveness and efficiency of novice inspectors verifying security aspects in agile requirements using our reading technique against using the complete list of OWASP high-level security requirements. The (statistically significant) results indicate that using the reading technique has a positive impact (with very large effect size) on the performance of inspectors in terms of effectiveness and efficiency.
Conference Paper
Defects in requirements specifications can have severe consequences during the software development lifecycle. Some of them result in overall project failure due to incorrect or missing quality characteristics such as security. There are several concerns that make security difficult to deal with; for instance, (1) when stakeholders discuss general requirements in (review) meetings, they are often not aware that they should also discuss security-related topics, and (2) they typically do not have enough security expertise. These concerns become even more challenging in agile development contexts, where lightweight documentation is typically involved. The goal of this paper is to design and evaluate an approach to support reviewing security-related aspects in agile requirements specifications of web applications. The designed approach considers user stories and security specifications as input and relates those user stories to security properties via Natural Language Processing (NLP) techniques. Based on the related security properties, our approach then identifies high-level security requirements from the Open Web Application Security Project (OWASP) to be verified and generates a focused reading technique to support reviewers in detecting defects. We evaluate our approach via two controlled experiment trials. We compare the effectiveness and efficiency of novice inspectors verifying security aspects in agile requirements using our reading technique against using the complete list of OWASP high-level security requirements. The (statistically significant) results indicate that using the reading technique has a positive impact (with very large effect size) on the performance of inspectors in terms of effectiveness and efficiency.
Article
There are many types of dependencies between software requirements, such as the contribution dependencies (Make, Some+, Help, Break, Some-, Hurt) and business dependencies modeled in the i* framework. However, current approaches for prioritizing requirements seldom take these dependencies into consideration, because it is difficult for stakeholders to prioritize requirements considering their preferences as well as the dependencies between requirements. To make requirement prioritization more practical, a method called DRank is proposed. DRank has the following advantages: 1) a prioritization evaluation attributes tree is constructed to make the ranking criteria selection easier and more operable; 2) RankBoost is employed to calculate the subjective requirements prioritization according to stakeholder preferences, which reduces the difficulty of evaluating the prioritization; 3) an algorithm based on the weighted PageRank is proposed to analyze the dependencies between requirements, allowing the objective dependencies to be automatically transformed into partial order relations; and 4) an integrated requirements prioritization method is developed to amend the stakeholders' subjective preferences with the objective requirements dependencies and make the process of prioritization more reasonable and applicable. A controlled experiment was performed to validate the effectiveness of DRank based on comparisons with Case Based Ranking, Analytical Hierarchy Process, and EVOLVE. The results demonstrate that DRank is less time-consuming and more effective than alternative approaches. A simulation experiment demonstrates that taking requirement dependencies into consideration can improve the accuracy of the final prioritization sequence.
Conference Paper
Context: For many years, we have observed industry struggling in defining a high quality requirements engineering (RE) and researchers trying to understand industrial expectations and problems. Although we are investigating the discipline with a plethora of empirical studies, those studies either concentrate on validating specific methods or on single companies or countries. Therefore, they allow only for limited empirical generalisations. Objective: To lay an empirical and generalisable foundation about the state of the practice in RE, we aim at a series of open and reproducible surveys that allow us to steer future research in a problem-driven manner. Method: We designed a globally distributed family of surveys in joint collaborations with different researchers from different countries. The instrument is based on an initial theory inferred from available studies. As a long-term goal, the survey will be regularly replicated to manifest a clear understanding on the status quo and practical needs in RE. In this paper, we present the design of the family of surveys and first results of its start in Germany. Results: Our first results contain responses from 30 German companies. The results are not yet generalisable, but already indicate several trends and problems. For instance, a commonly stated problem respondents see in their company standards are artefacts being underrepresented, and important problems they experience in their projects are incomplete and inconsistent requirements. Conclusion: The results suggest that the survey design and instrument are well-suited to be replicated and, thereby, to create a generalisable empirical basis of RE in practice.
Conference Paper
When it comes to designing a software process, we have experienced two major strategies. Process engineers can either opt for the strategy in which they focus on designing a process using an artefact model as backbone or, on the other hand, they can design it around activities and methods. So far, we have first studies that directly analyse benefits and shortcomings of both approaches in direct comparison to each other, without addressing the questions relevant to process engineers and which implications the selection of a particular design strategy has on the process consumption. We contribute a first controlled investigation on the perceived value of both strategies from the perspectives of process engineers and process consumers. While our results underpin the artefact-oriented design strategy to be an advantageous instrument for process engineers, process consumers do not evidently care about the selected design strategy. Furthermore, our first investigation performed in an academic environment provides a suitable empirical basis, which we can use to steer further replications and investigations in practical environments.
Article
[Context and Motivation] Based on published output in the premium RE conferences and journals, we observe a growing body of research using both quantitative and qualitative research methods to help understand which RE technique, process or tool work better in which context. Also, more and more empirical studies in RE aim at comparing and evaluating alternative techniques that are solutions to common problems. However, until now there have been few meta studies of the current state of knowledge about common practices carried out by researchers and practitioners in empirical RE. Also, surprisingly little has been published on how RE researchers perceive the usefulness of these best practices. [Objective] The goal of our study is to improve our understanding of what empirical practices are performed by researchers and practitioners in RE, for the purpose of understanding the extent to which the research methods of empirical software engineering are adopted in the RE community. [Method] We surveyed the practices that participants of the REFSQ conference have been using in their empirical research projects. The survey was part of the REFSQ 2012 Empirical Track. [Conclusions] We found that there are 15 commonly used practices out of a set of 27. The study has two implications: first it presents a list of practices that are commonly used in the RE community, and a list of practices that still remain to be practiced. Researchers may now make an informed decision on how to extend the practices they use in producing and executing their research designs, so that their designs get better. Second, we found that senior researchers and PhD students do not always converge in their perceptions about the usefulness of research practices. Whether this is all right and whether something needs to be done in the face of this finding remains an open question.
Article
Technology transfer to small- and medium-sized enterprises has failed to achieve its full potential in the requirements engineering (RE) field. Most companies do not know how to start their RE improvement efforts even if they are aware of the problems in this field. The state-of-the-practice survey presented in this paper gives a realistic view of how marginal technology transfer from the research community to the industry has been. It also reveals that the key development needs in industry are (1) development of own RE process adaptations, (2) RE process improvement, and (3) automation of RE practices. Directing efforts to these areas would substantially improve the chances of successful technology transfer and process improvement efforts in industry.
Article
Context. Researchers proposing a new metric have the burden of proof to demonstrate to the research community that the metric is acceptable in its intended use. This burden of proof is provided through the multi-faceted, scientific, and objective process of software metrics validation. Over the last 40 years, however, researchers have debated what constitutes a “valid” metric. Aim. The debate over what constitutes a valid metric centers on software metrics validation criteria. The objective of this article is to guide researchers in making sound contributions to the field of software engineering metrics by providing a practical summary of the metrics validation criteria found in the academic literature. Method. We conducted a systematic literature review that began with 2,288 papers and ultimately focused on 20 papers. After extracting 47 unique validation criteria from these 20 papers, we performed a comparative analysis to explore the relationships amongst the criteria. Results. Our 47 validation criteria represent a diverse view of what constitutes a valid metric. We present an analysis of the criteria's categorization, conflicts, common themes, and philosophical motivations behind the validation criteria. Conclusions. Although the 47 validation criteria are not conflict-free, the diversity of motivations and philosophies behind the validation criteria indicates that metrics validation is complex. Researchers proposing new metrics should consider the applicability of the validation criteria in terms of our categorization and analysis. Rather than arbitrarily choosing validation criteria for each metric, researchers should choose criteria that can confirm that the metric is appropriate for its intended use. We conclude that metrics validation criteria provide answers to questions that researchers have about the merits and limitations of a metric.
Conference Paper
Current proposals for combining action research and design science start with a concrete problem in an organization, then apply an artifact to improve the problem, and finally reflect on lessons learned. The aim of these combinations is to reduce the tension between relevance and rigor. This paper proposes another way of using action research in design science, which starts with an artifact, and then tests it under conditions of practice by solving concrete problems with them. The aim of this way of using action research in design science is to bridge the gap between the idealizations made when designing the artifact and the concrete conditions of practice that occur in real-world problems. The paper analyzes the role of idealization in design science and compares it with the requirements of rigor and relevance. It then proposes a way of bridging the gap between idealization and practice by means of action research, called technical action research (TAR) in this paper. The core of TAR is that the researcher plays three roles, which must be kept logically separate, namely of artifact developer, artifact investigator, and client helper. Finally, TAR is compared to other approaches of using action research in design science, and with canonical action research.
Article
When software development teams modify their requirements engineering process as an independent variable, they often examine the implications of these process changes by assessing the quality of the products of the requirements engineering process, e.g., a software requirements specification (SRS). Using the quality of the SRS as the dependent variable is flawed. As an alternative, this paper presents a framework of dependent variables that serves as a full range for requirements engineering quality assessment. In this framework, the quality of the SRS itself is just the first level. Other higher, and more significant levels, include whether the project was successful and whether the resulting product was successful. And still higher levels include whether or not the company was successful and whether there was a positive or negative impact on society as a whole.
Article
Requirements Engineering (RE) is a critical discipline mostly driven by uncertainty, since it is influenced by the customer domain or by the development process model used. We aim to investigate RE processes in successful project environments to discover characteristics and strategies that allow us to elaborate RE tailoring approaches in the future. We perform a field study on a set of projects at one company. First, we investigate by content analysis which RE artefacts were produced in each project and to what extent they were produced. Second, we perform qualitative analysis of semi-structured interviews to discover project parameters that relate to the produced artefacts. Third, we use cluster analysis to infer artefact patterns and probable RE execution strategies, which are the responses to specific project parameters. Fourth, we investigate by statistical tests the effort spent in each strategy in relation to the effort spent in change requests to evaluate the efficiency of execution strategies. Our results show no statistically significant difference between the efficiency of the strategies. In addition, it turned out that many parameters considered as the main causes for project failures can be successfully handled. Hence, practitioners can apply the artefact patterns and related project parameters to tailor the RE process according to individual project characteristics.