Determining Context Factors for Hybrid Development Methods
with Trained Models
Jil Klünder
Leibniz University Hannover
jil.kluender@inf.uni-hannover.de
Dzejlana Karajic
University of Passau
dzejlana.karajic@gmail.com
Paolo Tell
IT University Copenhagen
pate@itu.dk
Oliver Karras
Leibniz University Hannover
oliver.karras@inf.uni-hannover.de
Christian Münkel
Leibniz University Hannover
christian@muenkel.cc
Jürgen Münch
Reutlingen University
j.muench@computer.org
Stephen G. MacDonell
Auckland University of Technology
stephen.macdonell@aut.ac.nz
Regina Hebig
Chalmers | University of Gothenburg
regina.hebig@cse.gu.se
Marco Kuhrmann
University of Passau
kuhrmann@acm.org
ABSTRACT
Selecting a suitable development method for a specific project context is one of the most challenging activities in process design. Every project is unique and, thus, many context factors have to be considered. Recent research took some initial steps towards statistically constructing hybrid development methods, yet, paid little attention to the peculiarities of context factors influencing method and practice selection. In this paper, we utilize exploratory factor analysis and logistic regression analysis to learn such context factors and to identify methods that are correlated with these factors. Our analysis is based on 829 data points from the HELENA dataset. We provide five base clusters of methods consisting of up to 10 methods that lay the foundation for devising hybrid development methods. The analysis of the five clusters using trained models reveals only a few context factors, e.g., project/product size and target application domain, that seem to significantly influence the selection of methods. An extended descriptive analysis of these practices in the context of the identified method clusters also suggests a consolidation of the relevant practice sets used in specific project contexts.
CCS CONCEPTS
• Software and its engineering → Software development methods; Software organization and properties; Agile software development; Waterfall model; Spiral model; V-model; Programming teams; • Computing methodologies → Machine learning.
KEYWORDS
Agile software development; software process; hybrid development method; exploratory factor analysis; logistic regression analysis
ACM Reference Format:
Jil Klünder, Dzejlana Karajic, Paolo Tell, Oliver Karras, Christian Münkel, Jürgen Münch, Stephen G. MacDonell, Regina Hebig, and Marco Kuhrmann. 2020. Determining Context Factors for Hybrid Development Methods with Trained Models. In International Conference on Software and System Processes (ICSSP '20), October 10–11, 2020, Seoul, Republic of Korea. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3379177.3388898

ICSSP '20, October 10–11, 2020, Seoul, Republic of Korea
© 2020 Association for Computing Machinery.
This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in International Conference on Software and System Processes (ICSSP '20), October 10–11, 2020, Seoul, Republic of Korea, https://doi.org/10.1145/3379177.3388898.
1 INTRODUCTION
Determining and accounting for the context in which a development method must be used is among the most challenging activities in process design [1, 4, 23]. For every project, many context factors have to be considered—and the number of such context factors is huge. For instance, Clarke and O'Connor [7] identify 44 major situational factors (with in total 170 sub-factors) in eight groups. Kalus and Kuhrmann [17] name 49 tailoring criteria. In this regard, situational factors and tailoring criteria both represent context factors. In both studies, the authors do not claim to have explored all factors and discuss that further domain-specific aspects could extend the set of factors identified. Also, in both studies, the authors point to issues regarding the mapping of context factors to specific methods and development activities in projects. Such a mapping is usually performed during project-specific process tailoring, which, however, still seems to be implemented in a demand-driven and experience-based way [21] rather than in an evidence-based manner.
Recent research provides initial evidence on the systematic use of hybrid methods in industry, i.e., methods that are combinations of multiple development methods and practices [19]. In [32], we proposed a statistical construction procedure for hybrid methods, which is grounded in evidence obtained in a large-scale survey among practitioners [22]. Yet, our approach left out context factors and employed usage frequencies to compute base methods and method combinations that build the framework for plugging in sets of development practices. In 2017, we used statistical clustering methods to identify related methods and practices [20]. However, the direct influence of context factors and the influence of latent factors was not included in these previously conducted studies.
Problem Statement. Even though available research agrees on the importance of context factors in the construction of development methods for a specific project context, an evidence-based method that helps define the "best-fitting" method for a specific context is missing. This adds a risk to software projects, since inappropriate hybrid methods can affect several risk dimensions, e.g., unnecessary work, misunderstandings, and "faked" processes [29].
Objective. We aim to understand the role of context factors, to identify the influential ones among them, and to understand how to integrate such factors in the systematic and evidence-based construction of hybrid development methods. Hence, the overall objective of this paper is to understand which context factors are important when devising hybrid development methods.
Contribution. We contribute a study on the role of context factors in the selection of development methods. Using supervised learning, we analyze a large dataset and derive context factors that influence the selection of development methods. In contrast to our previous study [20], we use Exploratory Factor Analysis and Logistic Regression Analysis methods to learn the context factors from data. The study at hand shows that just a few factors seem to have a significant influence on the method clusters, i.e., project/product size, target application domain, and certain criticality factors. The study at hand also provides a novel approach to refine the construction procedure introduced in [32], in which base methods and method combinations have been identified based on their intentional use. The trained models developed in this study provide a new instrument to refine the method presented in [32] by improving the method-cluster construction through learned context factors.
Outline. This paper is organized as follows: Section 2 provides an overview of related work. Section 3 presents the research design, before we present our findings in Section 4. In Section 5 we discuss our findings, before we conclude the paper in Section 6.
2 RELATED WORK
Determining and balancing relevant context factors is key [1, 4, 23]; however, linking context factors with decisions taken by project managers in process selection and with the impact of a specific method is not yet well understood [17]. Software process improvement (SPI) models, like CMMI [8] or ISO/IEC 15504 [12], have sought to establish such links. However, as recent research [21] shows, software-producing organizations and project teams tend to implement SPI as a project-integrated activity rather than implementing it as a planned project-spanning activity and, therefore, explicit considerations of whether the applied development method properly addresses the project context step into the background. Since situation-specific process selection is bound to a particular project, company-wide learning is limited and, thus, the risk of using an inadequate development method increases.
In [32], we showed that there are hundreds of process variants, and Noll and Beecham [27] stated that companies often use hybrid methods but tend to stay in a specific process category. As there is no "Silver Bullet" [3, 5, 24, 26, 33] and as companies go for highly individualized development methods [15, 32, 34, 35], notably for becoming more agile, the need for answering the question of which is the best-fitting development method becomes increasingly relevant. However, to answer this question, a deep understanding of context factors and how these drive the selection of development methods is necessary. The paper at hand aims to close this gap by utilizing trained models of context factors. Utilizing the HELENA data [22], we implement a supervised learning strategy that helps predict and recommend a hybrid development method based on the project context.
3 RESEARCH DESIGN
We present the research design including a discussion of the threats to validity. Figure 1 provides an overview of the overall research method, which we explain in detail in subsequent sections.
[Figure 1: Overview of the research method including the data (variables) selected for the study. The HELENA 2 dataset (n = 1,467) is filtered to the selected data (n = 829) using the filter variables (D001) Company Size, (D002) Business Area, (D003) Distribution, (D005) Target Application Domain, (D006) Criticality, (D009) Project/Product Size, and (PU01) Company-wide Process. Method use (PU09, 24 items) and practice use (PU10, 36 items) are each rated on a seven-point scale: 1. Do not know the framework/practice, 2. Do not know if we use it, 3. We never use it, 4. We rarely use it, 5. We sometimes use it, 6. We often use it, 7. We always use the framework/practice. The analysis proceeds in three stages: (1) an EFA yielding method clusters, (2) one Logit Model per cluster over the factorized context variables, and (3) a descriptive analysis of practice use.]
3.1 Research Objective and Research Questions
Our overall objective is to understand which context factors are im-
portant when devising hybrid development methods. In particular, we
analyze which development methods are related to similar context
factors, i.e., we group the set of development methods according to
context factors that are related to methods in the set. For this, we
pose the research questions presented in Table 1.
3.2 Data Collection Procedures
This study uses the HELENA 2 dataset [22] and no extra data was collected. The HELENA 2 data was collected in a large international online survey as described in [19, 32]. Starting in 2015, the HELENA survey instrument was incrementally developed and tested in three stages. In total, the questionnaire consisted of five parts and up to 38 questions, depending on previously given answers. Data was collected from May to November 2017 following a convenience sampling strategy [31].
3.2.1 Variable Selection. In this paper, we focus on the context factors of the development process and their relation to the chosen development methods. Therefore, we only consider answers to selected questions on context factors.
Table 1: Overview of the research questions of this study

RQ1: Which development methods have similar usage contexts?
In the first step, we study clusters of development methods. In contrast to our previously published study [20], we use an Exploratory Factor Analysis to build clusters of methods.

RQ2: Which context factors influence the likelihood of using development methods from a specific set?
In the second step, we study context factors, which are the building blocks for the clusters identified in RQ1, in more detail. We build a Logistic Model uncovering the factors that influence the likelihood of ending up in the respective method clusters.

RQ3: Which practices are commonly used to extend the method clusters and hence should be taken into consideration when forming a hybrid development method?
After having identified the clusters of methods in RQ1, these methods need to be extended with practices [28, 32], which are the building blocks of development methods. In this study, we are primarily interested in identifying candidate practices and, therefore, we analyze the sets of practices descriptively only.
This selection defines the base dataset for our analysis. It consists of the parameters D001, D002, D003, D005, D006, D009, and PU01 as shown in Figure 1. All these variables/questions are used in the dataset to describe context-related properties of the actual method use. Furthermore, we excluded all questions that are based on perceptions, such as the degree of agility per project category, or that are defined by the participants, such as the participant's role or the years of experience. Due to an unequal distribution of data points per country [19], we also excluded the country as a context factor from the analysis.
3.2.2 Data Selection. The survey yielded 1,467 responses, of which 691 participants completed the questionnaire. In this study, we use the complete dataset, i.e., only answers of participants who completed the survey with respect to the selected variables. This leads to a base population of n = 836.
3.2.3 Data Cleaning. To prepare the data analysis, we inspected the data and cleaned the dataset. Specifically, we analyzed the data for NA and -9 values indicating that the participants did not provide answers. That is, participants either skipped a question or did not provide an answer to an optional question. Data points containing such values have been analyzed and omitted if the remaining information was insufficient to be included in the statistical analyses. We removed a data point as soon as it contained NA or -9 for at least one variable—including PU09—under investigation. This leads to a final base population of n = 829.
3.3 Data Analysis Procedures
As illustrated in Figure 1, we implemented a three-staged data analysis procedure, which consists of an Exploratory Factor Analysis as the first step, the construction of a set of Logistic Models in the second step, and a descriptive analysis of practice use in the third step. In subsequent sections, we provide detailed information on the chosen methods and their application.
3.3.1 Exploratory Factor Analysis. To answer the first research question (Table 1), we performed an Exploratory Factor Analysis (EFA). An EFA is "a multivariate statistical method designed to facilitate the postulation of latent variables that are thought to underlie—and give rise to—patterns of correlations in new domains of manifest variables" [10]. In the first step of our study, we used an EFA to uncover latent variables or hypothetical constructs. A latent variable cannot be directly observed, but it can emerge¹ from a set of other observed variables. Specifically, we aim to create clusters of methods, based on the use of the methods to a similar degree and in similar method combinations. For this, we use the 829 data points, each containing information about the use of methods. We implemented the EFA in the following steps:
Step 1: Applicability of the EFA. The first step is to ensure that an EFA is applicable to our dataset. In this regard, the first important criterion for applying an EFA is to answer the question whether the correlation matrix of the variables under consideration is the identity matrix, for which Bartlett's test [2] is used. If this is the case, the EFA should not be applied. Therefore, we performed Bartlett's test to ensure that an EFA can be applied. Furthermore, we applied the Kaiser-Meyer-Olkin measure (KMO; [16]) to analyze the suitability of our dataset for an EFA. However, as the KMO provides a metric for all potentially relevant variables, we also checked the individual variables using the Measure of Sampling Adequacy (MSA; [16]) for each variable. Finally, we check if the determinant of the correlation matrix is greater than 0.00001 to avoid singularities. In our case, the determinant is 0.00031 > 0.00001.
Step 2: Calculating the Number of Clusters. Having ensured that the EFA is applicable to our dataset, in the next step, we calculate the number of clusters (factors) that should be generated by the EFA. A parallel analysis is a common approach for deciding on the number of factors. Often, parallel analysis is combined with the so-called Scree test, also known as Cattell's criterion [6], which, in our case, suggests building five factors. A double check using the Kaiser criterion [16] resulted in the suggestion to build two factors. To obtain more detailed results, we opted for the Scree test and used the five-factor suggestion to build five method clusters.
Step 3: Performing the EFA. Eventually, we performed the EFA using R², which constructs the method clusters. We used ordinary least squares (minres) as the factoring method with factor loadings ≥ 0.3, since we cannot guarantee normally distributed data. Furthermore, we used Oblimin [13] as the rotation method. As an oblique rotation method, Oblimin permits correlations among the constructed sets; in the case of uncorrelated data, oblique rotation produces similar results as an orthogonal rotation.
Quality Evaluation of the Analysis. To ensure the quality of the results, we calculate the Root Mean Square Error of Approximation (RMSEA), which estimates the discrepancy between the model and the data. Furthermore, we calculate Cronbach's α for each identified cluster to analyze if the clustering of methods is reliable, i.e., if all elements in the cluster measure the same construct. For the interpretation of Cronbach's α, we use the widely accepted scale: α ≥ 0.9 is considered excellent (items are highly correlated), 0.9 > α ≥ 0.8: good, 0.8 > α ≥ 0.7: acceptable, 0.7 > α ≥ 0.6: questionable, 0.6 > α ≥ 0.5: poor, and 0.5 > α is considered unacceptable.

¹ In this context, a latent variable could be agile, hybrid, or traditional, or anything that emerges from clustering the different methods and frameworks.
² See: https://www.promptcloud.com/blog/exploratory-factor-analysis-in-r
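To make the EFA steps concrete, the following R sketch (using the psych package and assuming a data frame pu09 that holds the 24 PU09 method-use ratings as numeric columns; all column names are illustrative) runs the applicability checks, the parallel analysis, the minres/Oblimin factor analysis, and the reliability check described above:

```r
library(psych)

R <- cor(pu09)

# Step 1: applicability checks
cortest.bartlett(R, n = nrow(pu09))   # Bartlett's test of sphericity
KMO(R)                                # overall KMO and per-variable MSA
det(R)                                # determinant, must exceed 0.00001

# Step 2: number of factors (parallel analysis with scree plot)
fa.parallel(pu09, fm = "minres", fa = "fa")

# Step 3: EFA with minres factoring and Oblimin rotation
efa <- fa(pu09, nfactors = 5, fm = "minres", rotate = "oblimin")
print(efa$loadings, cutoff = 0.3)     # hide loadings with |l| < 0.3
efa$RMSEA                             # model fit

# Quality: Cronbach's alpha for one of the resulting clusters (example)
alpha(pu09[, c("Kanban", "Scrum", "ScrumBan", "DevOps",
               "Scaled_Agile_Framework", "Large_scale_Scrum")])
```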
3.3.2 Logistic Regression Analysis. To answer the second research question (Section 3.1), we analyze which context factors influence the likelihood of being allocated to one of the identified clusters as a starting point for deciding on an appropriate development method using the results of [32]. Since this allocation is a binary decision, we use Binary Logistic Regression analysis—a so-called Binary Logistic Model—to calculate the influence of each context factor on the allocation. In the following, we describe the five steps performed in our logistic regression analysis.
Step 1: Checking Assumptions. Before applying a logistic regression analysis, several assumptions need to be checked. For our analysis, we check the following four assumptions according to Karras et al. [18]:
(1) The dependent variable is dichotomous. To fulfill this assumption, a binary outcome needs to be defined. That is, it must be clearly defined whether or not a data point is in a method cluster.
(2) The independent variables are metric or categorical. All variables used in the model are categorical (Section 3.2.1). All variables are assessed on Likert scales (ordinal answers) or as single-choice options (nominal answers).
(3) In the case of two or more metric independent variables, no multicollinearity is allowed to be present. This assumption is fulfilled, since we do not have any metric independent variables.
(4) Both groups of the dichotomous dependent variable contain at least 25 elements. This assumption has to be checked after having defined the binary outcome (see the next Step 2).
Step 2: Defining the Binary Outcome. Since we used the raw data as a training set for the Logistic Model, we first define a threshold for accepting a data point for a specific cluster. Note: for each cluster, we consider only those data points that provide information about the use of every method in the cluster. This means that we exclude data points that report for one or more of the cluster's methods that the method is not known or that it is not known if the method is used (see classification in Figure 1). Hence, the investigation of the clusters considers a different subset n_total of the overall set of data points. To accept a data point for a cluster, we define the criterion "use" through PU09_i ≥ 4 (Figure 1), i.e., the method i was at least rated "rarely used" by the participant [19]. Due to the varying number of methods used by the participants, we defined a relative threshold > 0.5, i.e., a data point is added to a cluster if at least 50% of the cluster's methods are used in the data point. The resulting number of data points in a cluster is called n_using cluster in the following (see Table 3 for an overview).
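A minimal sketch of this outcome definition in R, using Cluster 2 as an example and assuming pu09 holds the method-use ratings (column names are illustrative), could look as follows:

```r
# Illustrative column names for the methods of Cluster 2
cluster2 <- c("Kanban", "Scrum", "ScrumBan", "DevOps",
              "Scaled_Agile_Framework", "Large_scale_Scrum")

# keep only data points that rate every method of the cluster with >= 3,
# i.e., drop "do not know the method" and "do not know if we use it"
known <- apply(pu09[, cluster2] >= 3, 1, all)
d     <- pu09[known, cluster2]        # these form n_total for the cluster

# "use" criterion: rating >= 4 ("we rarely use it" or higher); a data point
# is in the cluster if more than 50% of the cluster's methods are used
in_cluster <- rowMeans(d >= 4) > 0.5
table(in_cluster)                     # n_using cluster vs. n_rest
```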
Step 3: Data Preparation for the Logistic Analysis. For each cluster identified in the EFA (Section 3.3.1), we built one Logistic Model. Each Logistic Model aims at identifying variables that influence the likelihood of finding suitable methods in the associated cluster of methods (Figure 1). To build the Logistic Models, data needs to be prepared. For each cluster identified, we therefore independently performed the following steps:
(1) We removed all data points that did not match the "use" criterion defined in Step 2, i.e., we removed all data points with PU09_i < 3 (don't know the method i or don't know if we use it). The whole data point was removed at the first occurrence of a rating < 3, as we cannot draw conclusions on the method use. This rigorous decision helps reduce noise in the data as we only include complete data points in the analysis.
(2) The logistic analysis requires factorized variables, i.e., every possible answer option of the considered variables is treated as a single categorical variable. Figure 2 illustrates the factorization for the variable D001 (see also the sketch after this list). This procedure was applied to the variables D001, D003, D009, and PU01. The remaining variables described in Section 3.2.1 are already represented and interpreted as categorical variables.
(3) Some multiple-choice questions (e.g., D002, D005, and D006; Figure 2, [22]) had an option "other" to provide extra information through free-text answers. Due to the diversity and low number of reoccurring answers to these options, we decided to exclude these answers from the analysis.
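The sketch below illustrates the factorization of item (2) for D001 in R; the level labels are taken from Figure 2, while the data frame survey and its column name are assumptions:

```r
# One binary (0/1) indicator column per answer option of D001
d001 <- factor(survey$D001,
               levels = c("Micro", "Small", "Medium", "Large", "Very Large"))
d001_bin <- model.matrix(~ d001 - 1)  # "-1" removes the intercept column
head(d001_bin)
```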
[Figure 2: Variable factorization to prepare the Logit analysis. Each answer option of a filter variable becomes a separate binary (Yes/No) variable; e.g., (D001) Company Size is split into Micro, Small, Medium, Large, and Very Large.]
Step 4: Build the Models. We build all binary logistic models with R. All models considered in total 48 variables (constructed as described in Step 3). Further details on the significant predictors of the models can be found in Section 4.2.
Step 5: Evaluate the Model. To ensure interpretability of the results, we performed multiple steps for the evaluation as suggested by Peng et al. [30] (an R sketch of these checks follows the list):
(1) We evaluated the overall model quality using the Likelihood Ratio test, which compares the model's results with the results given by the (intercept-only) null model. Since the null hypothesis states that there is no difference between the logistic regression model and the null model, this test needs to be significant.
(2) We performed Wald's test [11] to analyze the statistical significance of the individual predictors. This step is necessary for the interpretation of the model's results.
(3) We calculated goodness-of-fit statistics using the Hosmer-Lemeshow test (HL) and Nagelkerke's R² [25]. These statistics analyze the fit of the logistic model against the actual outcomes, i.e., they indicate if the model fits the original data. The HL test must not be significant as it tests the null hypothesis that the model fits the data. Nagelkerke's R² calculates how much variability in the dataset can be explained by the logistic model—the closer R² is to 1, the more variability can be explained by the model. This value can be converted into Cohen's effect size f [9] to assess the practical relevance of the results.
(4) We validated the predicted probabilities to calculate the accuracy of the Logistic Model. Accuracy can be expressed as a measure of association (c-statistic) and a classification table that summarizes true and false positives/negatives. The c-statistic is a measure 0.5 ≤ c ≤ 1, with 0.5 meaning that the model is not better than a random prediction and 1 meaning that all pairwise assignments of elements in and not in the cluster are always correct. A confusion matrix summarizes the results of applying the Logistic Model to the dataset. For each entry in the dataset, the model calculates the predicted probability and classifies the entry into one of the two possible groups of the dependent variable. Afterwards, it is possible to calculate the sensitivity and the specificity of the model.
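The following R sketch summarizes Steps 4 and 5 for one cluster; it assumes a data frame dat with the binary outcome in_cluster (0/1) and the 48 factorized context variables, and uses the ResourceSelection and pROC packages for the Hosmer-Lemeshow test and the c-statistic (all names are illustrative):

```r
model <- glm(in_cluster ~ ., data = dat, family = binomial)
null  <- glm(in_cluster ~ 1, data = dat, family = binomial)

# (1) likelihood ratio test against the intercept-only null model
anova(null, model, test = "Chisq")

# (2) Wald z-tests of the individual predictors
summary(model)$coefficients

# (3) goodness of fit: Hosmer-Lemeshow test and Nagelkerke's R^2
library(ResourceSelection)
hoslem.test(model$y, fitted(model), g = 10)
n      <- nrow(dat)
r2_cs  <- 1 - exp((model$deviance - model$null.deviance) / n)  # Cox & Snell
r2_nag <- r2_cs / (1 - exp(-model$null.deviance / n))          # Nagelkerke

# (4) predicted probabilities: confusion matrix and c-statistic
threshold <- 1 - mean(dat$in_cluster)       # share of "not in cluster"
pred      <- as.integer(fitted(model) > threshold)
table(predicted = pred, actual = dat$in_cluster)
library(pROC)
auc(dat$in_cluster, fitted(model))          # c-statistic (AUC)
```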
3.3.3 Descriptive Analysis on Practice Use. To answer the third research question, we analyzed the sets of practices (Figure 1, PU10) for all data points that are defined to contribute to a respective method cluster (Section 3.3.2, Step 2). For this, we calculate the share of practices per cluster. Again, we only consider data points matching the "use" criterion PU10_i ≥ 4 (see Section 3.3.2, Step 2, and Figure 1), and we use an 85% threshold [32] to consider a practice commonly used within a cluster of methods.
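A minimal sketch of this descriptive step, assuming pu10 holds the 36 practice ratings for the data points assigned to one cluster (names are illustrative):

```r
# share of the cluster's data points that use each practice (rating >= 4)
agreement <- colMeans(pu10 >= 4, na.rm = TRUE)
sort(names(agreement)[agreement >= 0.85])   # practices above the 85% level
```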
3.4 Validity Procedures and Threats to Validity
The research presented in this paper is subject to some limitations and threats to validity, which we discuss using the classification by Wohlin et al. [36].
3.4.1 Construct Validity. Given that the dataset used in this analysis emerged from an online survey, we had to deal with the risk of misunderstood questions leading to incomplete or wrong answers. To mitigate this risk and as described in Section 3.2, the questionnaire was iteratively developed, including translations into three languages by native speakers [19]. Several methods are related to one another and were partially built on top of each other. Thus, participants, e.g., using Scrum, are likely to identify their development process as Iterative as well, which could have resulted in false positives during the identification of the hybrid methods. Another risk emerges from the chosen convenience sampling strategy [31] to distribute the questionnaire, which potentially introduced errors due to participants not reflecting the target population. Given the meaningful results of the analysis of free-text answers [19], we are confident that this threat can be considered mitigated.
3.4.2 Internal Validity. The selection of variables, which emerged from the limitation to context factors, and the cleaning of the data as described in Section 3.2 can influence the results. The variables, in conjunction with the methods as variables under investigation, defined the basic dataset. Based on this dataset, we removed all incomplete data points. A data point was considered incomplete as soon as one of the respective questions was not answered. This reduced the overall sample size, but we considered flawed interpretations due to missing answers more severe. We followed the same approach for the definition of data points for the second step of the analysis. That is, we also decided conservatively on the inclusion of data points and removed all data points that did not match the "use" criterion defined in Section 3.3.2, Step 2, which reduced the sample sizes but decreased the risk of flawed interpretations. Finally, all steps of the data analysis were performed by two researchers, and two more researchers not involved in the data analysis thoroughly reviewed each step. Therefore, we are confident that the analyses are well documented (for replication) and robust.
3.4.3 Conclusion Validity. The interpretation of the statistical tests is based on a significance level of p ≤ 0.05. Nevertheless, before interpreting the results, we included several reliability checks in the analysis, including the calculation of Cronbach's α for the internal reliability of the found sets, measures for error rates (RMSEA), as well as the thorough evaluation of the logistic models as described in Section 3.3.2, Step 5. We used an 85% threshold for the extension of the method sets with practices. This threshold was defined in [32] on the same dataset. Changing this threshold would impact the results and limit the usability of [32] as a baseline. Further research is thus necessary to increase the results' reliability.
3.4.4 External Validity. Our results emerge from a large-scale study representing development methods of a large number of companies with different context factors and in different environments. Yet, we cannot guarantee that our results are correct and applicable for each company. Nevertheless, we found evidence that some context factors tend to be more important than others. These may be taken into account when defining hybrid development methods.
4 RESULTS
We present the results of our study following the three steps of the data analysis shown in Figure 1 and described in Section 3.3.
4.1 Exploratory Factor Analysis
As described in Section 3.3.1, the Exploratory Factor Analysis was performed in different steps, for which we present the results in this section.
4.1.1 Step 1: Applicability of the EFA. Before performing the EFA, we ensured that it is applicable to our dataset. For this, we calculated the MSA for each variable. These checks resulted in an overall MSA = 0.9, which is considered good to very good. Only the two values for "Scrum" and "Kanban" resulted in medium suitability. Nevertheless, the smallest result in the dataset was MSA = 0.78 (medium) and, therefore, the EFA can be applied to our dataset.
4.1.2 Step 2: Calculating the Number of Clusters. To calculate the number of factors to be considered, we performed a parallel analysis. The Scree test suggests five factors. We followed this suggestion to obtain more fine-grained results and constructed five method clusters.
4.1.3 Step 3: Performing the EFA. Performing the EFA resulted in the method clusters shown in Figure 3.
[Figure 3: The five resulting clusters of the Exploratory Factor Analysis (including value ranges of the minres algorithm; grey cells highlight methods that are relevant in multiple clusters). Input: 24 methods and frameworks from PU09.
Cluster 1 (minres values: 0.3011–0.7437): DSDM, Crystal Family, Personal Software Process, Team Software Process, SSADM, Nexus, Spiral Model, Rational Unified Process, Scaled Agile Framework, PRINCE2.
Cluster 2 (minres values: 0.3169–0.6081): Kanban, Scrum, ScrumBan, DevOps, Scaled Agile Framework, Large-scale Scrum.
Cluster 3 (minres values: 0.3243–0.6705): Iterative Development, Extreme Programming, Feature-driven Development, Domain-driven Design, Lean Software Development.
Cluster 4 (minres values: 0.3295–0.6545): V-shaped Process, Phase/Stage-gate Model, Classic Waterfall Process, PRINCE2.
Cluster 5 (minres values: −0.3878 to −0.3038): Rational Unified Process, Spiral Model.]
We removed all elements with loadings |l| < 0.3, which is a common practice; e.g., "Model-driven Architecture" was removed this way. A greater loading value represents a more important position in the set, which is illustrated in Figure 3 by the "rank" of the methods. For example, "Kanban" was "more important" for defining Cluster 2 than "Large-scale Scrum". The factor loading indicates the strength and direction of a factor on a measured variable and, therefore, these values are particularly important in the interpretation of the sets.
4.1.4 Quality Evaluation of the Analysis. To check the quality of the model, we calculated the RMSEA fit index as 0.058. A value between 0.05 and 0.08 constitutes a good fit of the model. Furthermore, Table 2 summarizes the Cronbach's α values for each cluster. While Clusters 1, 2, and 3 are well defined according to their internal reliability, Cluster 4 is questionable, and Cluster 5 is poor. Hence, Clusters 4 and 5 need to be treated with care.
Table 2: Cronbach’s 𝛼reliability check for the clusters
Cluster Cronbach’s 𝛼Interpretation Logistic Analysis?
Cluster 1 0.86 good No
Cluster 2 0.70 acceptable Yes
Cluster 3 0.73 acceptable Yes
Cluster 4 0.66 questionable Yes
Cluster 5 0.56 poor No
4.2 Logistic Regression Analysis
As described in Section 3.3.2, the Logistic Regression Analysis was performed in different steps. In the EFA presented in Section 4.1, we identified five clusters having relationships to similar context factors. In this section, we present the analysis of the correlation of context factors and the clusters.
4.2.1 Data Preparation for the Logistic Analysis. We prepared the dataset for each cluster identified in the EFA following the steps described in Section 3.3.2, Step 3. The data preparation yielded the distribution of datasets shown in Table 3: column n_total shows the number of data points that provide information about use or missing use for all methods in the cluster. The column n_using cluster shows the number of data points that report using more than half of the cluster's methods (see Section 3.3.2). Finally, column n_rest includes the remaining data points of n_total that are not part of n_using cluster. As we had fewer than 25 data points for Cluster 1, we could not apply the logistic regression analysis to this cluster.
Table 3: Data points per cluster used for the Logistic Models

Cluster     n_total    n_using cluster    n_rest
Cluster 1   135        16                 119
Cluster 2   281        130                151
Cluster 3   352        217                135
Cluster 4   209        53                 156
Cluster 5   376        65                 311
4.2.2 Results. Based on the clusters identified in the EFA (Figure 3) and the results of the data preparation shown in Table 3, we performed the logistic regression analysis using R. In the following, we present the results of these analyses per cluster.
Cluster 1. With only 16 data points (Table 3) using more than 50% of the methods in Cluster 1, one condition for applying the logistic regression as described in Section 3.3.2 was violated (both groups of the dependent variable must contain at least 25 elements). Therefore, it was not possible to build a Logistic Model for Cluster 1.
Cluster 2. The first step was to calculate the factorized variables as described in Figure 2. The results of Wald's tests are summarized in Table 4. The table shows that none of the factorized variables significantly influences the likelihood of using more than 50% of the methods in Cluster 2. The results are further refined in Table 5, which summarizes the significant results of the Logistic Model.
Table 4: Wald’s test results of the Logistic Model for Cluster 2
Var. 𝜒2df 𝑃>𝜒2Interpretation
D001 Company Size 4.3 4 0.37 not signicant
D003 Distribution 7.0 3 0.07 not signicant
D009 Project/Prod. Size 1.6 4 0.8 not signicant
PU01 Comp.-w. Process 0.054 2 0.97 not signicant
Based on the results from Table 4 and Table 5, we conclude the following statements: Based on the logistic model, the likelihood that a company uses more than 50% of the methods contained in Cluster 2 is...
(1) ... positively related to companies that are distributed across one continent (D0033, p < 0.05, Est. = 1.02 > 0). That is, companies working in this setting tend to use more agile development methods, including the agile scaling frameworks, e.g., SAFe or LeSS.
(2) ... positively related to companies active in the defense systems domain (D00504, p < 0.05, Est. = 3.52 > 0). That is, companies working in this area tend to use more agile methods.
(3) ... negatively related to companies active in the space systems domain (D00515, p < 0.05, Est. = −3.85 < 0). That is, companies working in this area tend to avoid using agile methods.
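To relate these estimates to odds (assuming they are the raw log-odds coefficients of the binary logit model, which the reported standard errors and z-values suggest), exponentiating a coefficient gives the multiplicative change in the odds of belonging to Cluster 2 when the corresponding binary context variable switches from 0 to 1:

\[
\frac{\mathrm{odds}(\text{Cluster 2} \mid x_j = 1)}{\mathrm{odds}(\text{Cluster 2} \mid x_j = 0)} = e^{\hat{\beta}_j}, \qquad
e^{1.02} \approx 2.8, \quad e^{3.52} \approx 33.8, \quad e^{-3.85} \approx 0.02 .
\]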
Finally, as described in Section 3.3.2, Step 5, we evaluated the model. The first step was to conduct the Likelihood Ratio Test, which showed no significant improvement of the built model over the null model at the 0.05 level (χ² = 36.902, p = 0.06197). However, at a significance level of p = 0.1, there is a significant improvement. Therefore, the results of the model need to be taken with care. In the second step, a z-test was performed to analyze the statistical significance of the individual predictors. Table 4 and Table 5 show three significant variables, i.e., predictors for the likelihood of using methods from Cluster 2. In the analysis of the goodness-of-fit statistics, the Hosmer-Lemeshow test did not support the claim that the model does not fit the data (χ² = 9.0373, df = 8, p = 0.3392 > 0.05). Nagelkerke's R² resulted in 0.272, i.e., almost 27% of the variability in the dataset can be explained with the logistic model, and the resulting effect size of f = 0.282 indicates a medium effect [9].
To validate the predicted probabilities, we defined a threshold using the share of data points contributing to the cluster (Table 3), i.e., 1 − 130/281 = 0.537, which is the relative probability of not being in the cluster. Using this relative probability, we computed the confusion matrix with a sensitivity of 79.47% and a specificity of 53.85%. The c-statistic was calculated as 0.7666, which indicates a good result as it states that for approx. 77% of all possible pairs of data points, the model correctly assigned the higher probability to those in the cluster.

Table 5: Significant results of the Logistic Model for Cluster 2

Var.                       Est.     Std. Err.    z-value    P(>|z|)
D0033 Distr. Continent     1.02     0.48         2.105      0.0353
D00504 Defense Systems     3.52     1.63         2.156      0.0311
D00515 Space Systems       -3.85    1.89         -2.035     0.0419

Table 6: Wald's test results of the Logistic Model for Cluster 3

Var.                          χ²      df    P > χ²    Interpretation
D001 Company Size             5.4     4     0.25      not significant
D003 Distribution             3.1     3     0.38      not significant
D009 Project/Product Size     7.6     4     0.11      not significant
PU01 Comp.-wide Process       0.16    2     0.92      not significant

Table 7: Significant results of the Logistic Model for Cluster 3

Var.                 Est.     Std. Err.    z-value    P(>|z|)
D00516 Telecom.      -1.32    0.66         -2.002     0.0453
Cluster 3. The first step was to calculate the factorized variables as described in Figure 2. The results of Wald's tests are summarized in Table 6. The table shows that none of the factorized variables significantly influences the likelihood of using more than 50% of the methods in Cluster 3. Table 7 summarizes the significant results of the Logistic Model for the refinement of the individual variables.
Based on the results from Table 6 and Table 7, we conclude the following statement: Based on the logistic model, the likelihood that a company uses more than 50% of the methods contained in Cluster 3 is negatively related to companies active in the domain of Telecommunication (D00516, p < 0.05, Est. = −1.32 < 0). That is, companies working in this area tend to avoid using agile methods.
As described in Section 3.3.2, Step 5, we evaluated the model. We conducted the Likelihood Ratio Test, which confirmed that there is a significant improvement comparing the built model with the null model (χ² = 71.124, p = 0.01671 < 0.05). A z-test was performed to analyze the statistical significance of the individual predictors. Table 6 and Table 7 show one significant variable, i.e., one predictor for the likelihood of (not) using methods from Cluster 3. In the analysis of the goodness-of-fit statistics, the Hosmer-Lemeshow test did not support the claim that the model does not fit the data (χ² = 6.3067, df = 8, p = 0.6129 > 0.05). Nagelkerke's R² resulted in 0.249, i.e., almost 25% of the variability in the dataset can be explained with the logistic model, and the resulting effect size of f = 0.257 indicates a medium effect [9].
To validate the predicted probabilities, we defined a threshold using the share of data points contributing to the cluster (Table 3), i.e., 1 − 217/352 = 0.384, which is the relative probability of not being in the cluster. Using this relative probability, we computed the confusion matrix with a sensitivity of 28.89% and a specificity of 94.01%. The c-statistic was calculated as 0.7434, which indicates a good result as it states that for approx. 74% of all possible pairs of data points, the model correctly assigned the higher probability to those in the cluster.
Cluster 4. At first, the factorized variables were calculated as described in Figure 2. The results of Wald's tests are shown in Table 8. The table shows that the Project/Product Size (D009) significantly influences the likelihood of using more than 50% of the methods in Cluster 4. This particular influence is refined in Table 9, which summarizes the significant results of the Logistic Model.
Table 8: Wald’s test results of the Logistic Model for Cluster 4
Var. 𝜒2df 𝑃>𝜒2Interpretation
D001 Company Size 5.3 4 0.26 not signicant
D003 Distribution 3.6 3 0.31 not signicant
D009 Project/Product Size 10.3 4 0.035 signicant
PU01 Comp.-wide Process 1.8 2 0.41 not signicant
Based on the results from Table 8 and Table 9, we conclude the following statements: Based on the logistic model, the likelihood that a company uses more than 50% of the methods contained in Cluster 4 is...
(1) ... negatively related to companies active in the domain of Web Apps. and Services (D00517, p < 0.05, Est. = −1.86 < 0). That is, companies working in this area tend to avoid using traditional development methods.
(2) ... positively related to the size of a product or project – class: Small (D0092, p < 0.05, Est. = 4.65 > 0). Running small projects (effort: 2 person weeks–2 person months) increases the likelihood of using methods from Cluster 4, i.e., small projects tend to use traditional development methods.
Finally, as described in Section 3.3.2, Step 5, we evaluated the model. The first step was to conduct the Likelihood Ratio Test, which confirmed that there is a significant improvement comparing the built model with the null model (χ² = 73.827, p = 0.00971 < 0.05). A z-test was performed to analyze the statistical significance of the individual predictors. Table 8 and Table 9 show two significant variables, i.e., two predictors for the likelihood of using methods from Cluster 4. In the analysis of the goodness-of-fit statistics, the Hosmer-Lemeshow test did not support the claim that the model does not fit the data (χ² = 6.2849, df = 8, p = 0.6154 > 0.05). Nagelkerke's R² resulted in 0.439, i.e., almost 44% of the variability in the dataset can be explained with the logistic model, and the resulting effect size of f = 0.489 indicates a large effect [9].
To validate the predicted probabilities, we defined a threshold using the share of data points contributing to the cluster (Table 3), i.e., 1 − 53/209 = 0.746, which is the relative probability of not being in the cluster. Using this relative probability, we computed the confusion matrix with a sensitivity of 98.08% and a specificity of 15.09%. The c-statistic was calculated as 0.8586, which indicates a good result as it states that for approx. 86% of all possible pairs of data points, the model correctly assigned the higher probability to those in the cluster.
Table 9: Significant results of the Logistic Model for Cluster 4

Var.                      Est.     Std. Err.    z-value    P(>|z|)
D00517 Web Appl./Svc.     -1.86    0.88         -2.122     0.0338
D0092 Small Product       4.65     2            2.324      0.0201
Cluster 5. Due to the poor reliability of Cluster 5 (see Table 2), we
did not build the Logistic Model for this cluster.
4.3 Practice Use
The last step in our analysis (Figure 1) was the analysis of practices used in the method clusters identified in the EFA (Section 4.1). To determine the clusters of practices, we implemented the (descriptive) analysis method described in Section 3.3.3³. Please note that this part of the analysis is an exploratory analysis in which we are primarily interested in learning if there is an effect on the selection of practices in the context of our factor analysis at all, and if we can observe converging subsets of practices that, eventually, can form clusters of core practices as building blocks of hybrid development methods as identified in [32]. Figure 5 visualizes the outcome of the assignment of practices to clusters. Even though Clusters 1 and 5 could not be considered for the logistic analysis (Table 2 and Table 3), we present the sets of practices for all clusters.
4.3.1 Practices for Analyzed Clusters. Figure 5 highlights the three clusters of methods for which we implemented Logistic Models. For these three clusters, similar to our findings from [32], we see that a maximum of 26 out of 36 practices find an 85% agreement regarding their use in the context of the method clusters. That is, we also observe some "preferences" regarding the use of practices. For instance, in all three clusters, we find the core practices "Code Review", "Coding Standards", and "Release Planning" (highlighted in Figure 5), which were identified as the least common denominator in [32]. As illustrated in Figure 4, these three practices build one core component of constructing hybrid development methods.
[Figure 4: Simplified construction procedure [32] with core practices (Code Review, Coding Standards, Release Planning), base methods and base method combinations, and relevant sets of practices for the respective method combinations.]
Together with the initially identified preferences (agreement levels ≥ 0.85) of the practice use in the study at hand, we therefore expect converging sets of practices and combinations of practices for devising hybrid development methods. While the study [32] was limited to only one variable, the "intentional use of hybrid methods", this study adds further relevant context factors and provides a means for a refined and context-sensitive identification of the base methods and their combinations.
³ Please note that we did not execute the same construction method based on usage frequencies and agreement levels as implemented in [32]. In this study, we only used the exploratively identified thresholds for the agreement levels. Yet, we did not apply the combined-set construction to identify process variants. This requires adjustments of the construction procedure and remains subject to future work (Section 6).
[Figure 5: Results of the cluster-practice assignment using an 85% agreement level [32] regarding the use of a practice. For each of the five method clusters from the EFA, the figure lists the practices (out of the 36 practices from PU10) that reach the agreement threshold; the three clusters analyzed with Logistic Models are highlighted, and the core practices Code Review, Coding Standards, and Release Planning appear in every cluster's set.]
4.3.2 Practices for Excluded Clusters. Figure 5 also includes Cluster 1 and Cluster 5, which have been excluded from the logistic regression analysis. However, we can make some interesting observations. First, Cluster 5, which was excluded from the logistic regression analysis due to poor reliability, has a reduced and converging set of practices assigned. This set of practices also includes the core practices [32] and, therefore, this cluster remains a candidate for further investigation. The second observation is that Cluster 1 has all practices assigned. As discussed in Section 3.3.2 and Section 4.2.2, the number of elements in Cluster 1 is too small to draw meaningful conclusions. Figure 5 illustrates this by assigning a complete and unfiltered list to Cluster 1. This finding shows the necessity for further research to grow and improve the data basis.
5 DISCUSSION
From the predefined list of 24 methods, we extracted five clusters with methods that are correlated with similar context factors. These clusters consist of mostly agile methods (Cluster 2), mostly or completely traditional methods (Cluster 4 and Cluster 5), or both (Cluster 1 and Cluster 3). The clusters formed the basis for a logistic regression analysis to study context factors that influence (i.e., increase or decrease) the likelihood of using more than 50% of the methods belonging to the respective cluster. These analyses revealed few significant influence factors: distributed development on one continent, the target application domains defense systems, space systems, telecommunications, and web applications and services, and project/product size: small. However, we found only a few contextual factors that support conclusions, or at least assumptions, about the used development methods. Therefore, we argue that there must be further factors influencing the choice of development methods, which could explain why the definition of a suitable development method is so complicated. As shown in [19], most development methods emerge from experience. However, experience can only take effect when having "something" in place that can be adjusted based on experience, whereas starting from scratch is difficult.
The results of our study provide support for devising hybrid development methods by identifying factors that influence the choice of development methods. However, our results should not be over-interpreted. They represent an initial guideline for defining a hybrid development method. For instance, our findings help find a starting point for selecting base methods or method combinations to define a hybrid development method. Nevertheless, compared with [7, 17] (44 and 49 factors), we could only identify a small number of context factors. Future research is thus strongly required, notably, to study the remaining known factors for which we—so far—could not draw any conclusion. A deeper knowledge about the context factors will also lay the foundation for developing improved tailoring instruments that help project managers define suitable project-specific development methods using a systematic approach in combination with experience and continuous learning [21].
A second key finding is that we can support the claim that practices are the real building blocks of development approaches [14, 32]. For the three analyzed clusters, we could identify at least 22 practices with an agreement level ≥ 85%. This indicates that practices might be context-dependent, which implies that focusing on methods only is insufficient. Further research is necessary to gain deeper insights into the role of practices in method development.
6 CONCLUSION
In this paper, we studied the use of hybrid methods based on 829 data points from a large-scale international survey. Using an Exploratory Factor Analysis, we identified five clusters of methods. We used these clusters as dependent variables for a Logistic Regression Analysis to identify contextual factors that are correlated with the use of methods from these clusters. The analysis using trained models reveals that only a few factors, e.g., project/product size and target application domain, seem to significantly influence the method selection. An extended descriptive analysis of the practices used in the identified method clusters also suggests a consolidation of the relevant practice sets used in specific project contexts.
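For readers who want to retrace the clustering step summarized here, the sketch below outlines an exploratory factor analysis over binary method-use data, grouping methods that load on the same factor into a candidate cluster. The factor_analyzer package, the oblimin rotation, and the 0.4 loading cutoff are assumptions made for illustration, not a description of the original implementation.

```python
# Minimal sketch (assumed tooling): derive method clusters from factor loadings.
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

df = pd.read_csv("helena_subset.csv")  # hypothetical export of the survey data
method_cols = [c for c in df.columns if c.startswith("method_")]  # 0/1 flags
X = df[method_cols]

# Sampling-adequacy checks commonly run before an EFA (Bartlett's test, KMO).
_, bartlett_p = calculate_bartlett_sphericity(X)
_, kmo_total = calculate_kmo(X)
print(f"Bartlett p = {bartlett_p:.3g}, overall KMO = {kmo_total:.2f}")

# Five factors with an oblique rotation; both settings are assumptions here.
fa = FactorAnalyzer(n_factors=5, rotation="oblimin")
fa.fit(X)

loadings = pd.DataFrame(fa.loadings_, index=method_cols)
for factor in loadings.columns:
    members = loadings.index[loadings[factor].abs() >= 0.4].tolist()
    print(f"Candidate cluster {factor + 1}: {members}")
```

Methods that load on the same factor form one candidate cluster; the cutoff of 0.4 is a common rule of thumb rather than a value taken from the paper.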
Our ndings contribute to the evidence-based construction of
hybrid methods. As described in Section 4.3.1, our results provide a
means to learn relevant context factors, which can be used to derive
base methods and method combinations that, themselves, are a core
component of a construction procedure for hybrid methods [
32
].
That is, a hybrid development methods can be constructed using a
set of context factors going beyond the so far used frequency-based
construction procedure. Furthermore, an improved knowledge of
such context factors will also contribute to better understand and
dene powerful tailoring mechanisms that help dene develop-
ment methods for specic project situations to reduce overhead
introduced through inadequate project-specic processes.
ACKNOWLEDGMENTS
We thank all the study participants and the researchers involved in
the HELENA project for their great effort in collecting data.
REFERENCES
[1]
Ove Armbrust and Dieter Rombach. 2011. The Right Process for Each Context:
Objective Evidence Needed. In Proceedings of the International Conference on
Software and Systems Process (ICSSP). Association for Computing Machinery,
New York, NY, USA, 237–241. https://doi.org/10.1145/1987875.1987920
[2]
Maurice Stevenson Bartlett. 1937. Properties of sufficiency and statistical tests.
Proceedings of the Royal Society of London. Series A-Mathematical and Physical
Sciences 160, 901 (1937), 268–282.
[3]
Andrew Begel and Nachiappan Nagappan. 2007. Usage and Perceptions of Agile Software Development in an Industrial Context: An Exploratory Study. In
Intl. Symp. on Empirical Software Engineering and Measurement.
[4]
O. Benediktsson, D. Dalcher, and H. Thorbergsson. 2006. Comparison of software
development life cycles: a multiproject experiment. IEE Proceedings - Software
153, 3 (June 2006), 87–101. https://doi.org/10.1049/ip-sen:20050061
[5] Frederick P Brooks. 1987. No silver bullet. IEEE Computer 20, 4 (1987), 10–19.
[6]
Raymond B Cattell. 1966. The scree test for the number of factors. Multivariate
behavioral research 1, 2 (1966), 245–276.
[7]
Paul Clarke and Rory V. O’Connor. 2012. The Situational Factors That Affect the Software Development Process: Towards a Comprehensive Reference Framework.
Inf. Softw. Technol. 54, 5 (2012), 433–447.
[8]
CMMI Product Team. 2010. CMMI for Development, Version 1.3. Technical Report
CMU/SEI-2010-TR-033. Software Engineering Institute.
[9]
J. Cohen. 2013. Statistical Power Analysis for the Behavioral Sciences. Routledge.
[10]
B.D. Haig. 2010. Abductive Research Methods. In International Encyclopedia
of Education (3 ed.), Penelope Peterson, Eva Baker, and Barry McGaw (Eds.).
Elsevier, 77–82.
[11]
Leonhard Held and D. Sabanés Bové. 2014. Applied Statistical Inference. Springer, Berlin Heidelberg.
[12]
ISO/IEC JTC 1/SC 7. 2004. ISO/IEC 15504:2004: Software Process Assessment – Part
4: Guidance on use for process improvement and process capability determination.
Technical Report. International Organization for Standardization.
[13]
J. Edward Jackson. 2005. Encyclopedia of Biostatistics. American Cancer Society,
Chapter Oblimin Rotation. https://onlinelibrary.wiley.com/doi/abs/10.1002/
0470011815.b2a13060
[14]
Ivar Jacobson, Harold Lawson, Pan-Wei Ng, Paul E. McMahon, and Michael
Goedicke. 2019. The Essentials of Modern Software Engineering: Free the Practices
from the Method Prisons! Morgan & Claypool Publishers.
[15]
Capers Jones. 2003. Variations in software development practices. IEEE Software
20, 6 (Nov 2003), 22–27. https://doi.org/10.1109/MS.2003.1241362
[16] Henry F Kaiser. 1970. A second generation little jiffy. Psychometrika 35, 4 (1970), 401–415.
[17]
G. Kalus and M. Kuhrmann. 2013. Criteria for Software Process Tailoring: A
Systematic Review. In International Conference on Software and Systems Process
(ICSSP). ACM, 171–180.
[18]
Oliver Karras, Kurt Schneider, and Samuel A. Fricker. 2019. Representing Software
Project Vision by Means of Video: A Quality Model for Vision Videos. Journal of
Systems and Software (2019). https://doi.org/10.1016/j.jss.2019.110479
[19]
J. Klünder, R. Hebig, P. Tell, M. Kuhrmann, J. Nakatumba-Nabende, R. Heldal,
S. Krusche, M. Fazal-Baqaie, M. Felderer, M. F. Genero Bocco, S. Küpper, S. A.
Licorish, G. López, F. McCaffery, Ö. Özcan Top, C. R. Prause, R. Prikladnicki,
E. Tüzün, D. Pfahl, K. Schneider, and S. G. MacDonell. 2019. Catching up with
Method and Process Practice: An Industry-Informed Baseline for Researchers. In
Proceedings of International Conference on Software Engineering (ICSE-SEIP).
[20]
M. Kuhrmann, P. Diebold, J. Münch, P. Tell, V. Garousi, M. Felderer, K. Trektere,
F. McCaery, O. Linssen, E. Hanser, and C. R. Prause. 2017. Hybrid Software and
System Development in Practice: Waterfall, Scrum, and Beyond. In International
Conference on Software and System Process (ICSSP). ACM, 30–39.
[21]
Marco Kuhrmann and Jürgen Münch. 2019. SPI is Dead, isn’t it? Clear the Stage
for Continuous Learning!. In International Conference on Software and System
Processes (ICSSP). IEEE, 9–13. https://doi.org/10.1109/ICSSP.2019.00012
[22]
Marco Kuhrmann, Paolo Tell, Jil Klünder, Regina Hebig, Sherlock A. Licorish,
and Stephen G. MacDonell. 2018. Complementing Materials for the HELENA
Study (Stage 2). [online] DOI: 10.13140/RG.2.2.11032.65288.
[23]
Alan MacCormack and Roberto Verganti. 2003. Managing the Sources of Un-
certainty: Matching Process and Context in Software Development. Journal of
Product Innovation Management 20, 3 (2003), 217–232.
[24]
B. Murphy, C. Bird, T. Zimmermann, L. Williams, N. Nagappan, and A. Begel.
2013. Have Agile Techniques been the Silver Bullet for Software Development at
Microsoft. In 2013 ACM / IEEE International Symposium on Empirical Software
Engineering and Measurement.
[25]
Nico J.D. Nagelkerke. 1991. A Note on a General Definition of the Coefficient of
Determination. Biometrika 78, 3 (1991), 691–692.
[26]
S. Nerur, R. Mahapatra, and G. Mangalaraj. 2005. Challenges of Migrating to
Agile Methodologies. In Communications of the ACM, Vol. 48. 73–78.
[27]
John Noll and Sarah Beecham. 2019. How Agile Is Hybrid Agile? An Analysis of
the HELENA Data. In Product-Focused Software Process Improvement. Springer
International Publishing, Cham, 341–349.
[28]
OMG. 2018. Essence – Kernel and Language for Software Engineering Methods.
OMG Standard formal/18-10-02. Object Management Group.
[29]
D. Parnas and P. Clements. 1986. A rational design process: How and why to
fake it. IEEE Transactions on Software Engineering 12, 2 (1986).
[30]
Chao-Ying Joanne Peng, Kuk Lida Lee, and Gary M Ingersoll. 2002. An Introduc-
tion to Logistic Regression Analysis and Reporting. The Journal of Educational
Research 96, 1 (2002), 3–14.
[31] C. Robson and K. McCartan. 2016. Real World Research. John Wiley & Sons.
[32]
P. Tell, J. Klünder, S. Küpper, D. Rao, S. G. MacDonell, J. Münch, D. Pfahl, O.
Linssen, and M. Kuhrmann. 2019. What Are Hybrid Development Methods Made
of?: An Evidence-based Characterization. In Proceedings of the International
Conference on Software and System Processes (ICSSP). IEEE, 105–114.
[33]
G. van Waardenburg and H. van Vliet. 2013. When agile meets the enterprise. Information and Software Technology 55 (2013), 2154–2171.
[34]
Leo R. Vijayasarathy and Charles W. Butler. 2016. Choice of Software Develop-
ment Methodologies: Do Organizational, Project, and Team Characteristics Mat-
ter? IEEE Software 33, 5 (Sept 2016), 86–94. https://doi.org/10.1109/MS.2015.26
[35]
D. West, M. Gilpin, T. Grant, and A. Anderson. 2011. Water-Scrum-Fall Is The
Reality Of Agile For Most Organizations Today. Technical Report. Forrester
Research Inc.
[36]
Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn Regnell, and
Anders Wesslén. 2012. Experimentation in Software Engineering. Springer.