QUANTIFYING THE IMPACT OF DIFFERENT NON-FUNCTIONAL
REQUIREMENTS AND PROBLEM DOMAINS ON SOFTWARE
EFFORT ESTIMATION
Abstract - The effort estimation techniques used in the
software industry often tend to ignore the impact of Non-
functional Requirements (NFR) on effort and reuse standard
effort estimation models without local calibration. Moreover,
the effort estimation models are calibrated using data of
previous projects that may belong to problem domains
different from the one of the project being estimated. Our
approach suggests a novel effort estimation methodology that
can be used in the early stages of software development
projects. Our proposed methodology initially clusters the
historical data from the previous projects into different
problem domains and generates domain specific effort
estimation models, each incorporating the impact of NFR on
effort by sets of objectively measured categorical features. We
reduce the complexity of these models using a feature subset
selection algorithm. In this paper, we discuss our approach in
detail and present the results of our experiments using
different supervised machine learning algorithms. The results
show that our approach performs well by increasing the
correlation coefficient and decreasing the error rate of the
generated effort estimation models and achieving more
accurate effort estimates for the new projects.
Keywords: Software Effort Estimation, Non-functional
Requirements, Supervised Machine Learning.
1. INTRODUCTION
The success of the planning and management of a software
project largely depends on the estimation of size and
effort. A good estimation of these variables available right
from the start in a project gives the project manager
confidence about any future course of action, since many
of the decisions made during development depend on, or
are influenced by, the initial estimations. Hence, effort
estimation is one of the most crucial steps of planning and
management of a software project.
The work described in this paper introduces a new
comprehensive methodology for estimating software
development effort during the early phases of
requirements development, using the functional size of the
software as the primary variable.
Although effort estimation in practice is largely
performed through subjective evaluation, there have been
numerous attempts in this field to build
parametric models for estimating effort. All these models
are calibrated with historical data from past projects, so
that the effort of the new software projects can be
estimated. However, while some tend to ignore the
impacts of different non-functional requirements [7],
others [1] include them only partially, requiring subjective
judgement by human experts. Ignoring non-functional
requirements and introducing subjective evaluations can
often result in a large magnitude of error in effort
estimation. In contrast, our work proposes a methodology
that can objectively quantify the impact of non-functional
requirements on effort estimation. We take into
consideration the impact of six high-level classes of
non-functional requirements, chosen from the NFR ontology
described in [2] to encompass all possible classes of
non-functional requirements. The goals of this
work are presented below:
1. To develop an effort estimation model based on the
historical data of previous projects to estimate
software development effort during the requirements
specification phase.
2. To objectively assess the impacts of different non-
functional requirements and different problem
domains on the estimation of software development
effort.
3. To make the effort estimation model robust by
dynamically reducing the feature space using both
statistical and semantic techniques.
The paper is organized as follows: section 2
provides the background required to understand the
methodology described in section 3. The experimental
work and the discussion of the obtained results are
explained in sections 4 and 5 respectively. The recent
related work is surveyed in section 6. Our concluding
remarks and the future work directions are outlined in
section 7.
2. BACKGROUND
A feature is a variable that describes information about
a certain project. For example, Number of Developers is a
feature that measures the number of software developers
in a software project. Features can have different types,
such as numeric or nominal. A numeric feature is one
whose possible values are numerical. A nominal feature
(a term often used interchangeably with categorical), on
the other hand, is one whose possible values are
qualitative [10]. For example, Project Complexity can be
measured using the three values Low, Medium, and High.
The number of features collected from previously
completed projects can be quite high. Therefore, it is
necessary to reduce the complexity of the feature space by
using feature selection. Feature subset selection helps to
reduce the redundancy in the feature subset [8]. There are
various ways to perform feature subset selection; Filter
and Wrapper methods are two common feature reduction
techniques [8]. Filter methods reduce the feature set with
the help of heuristics such as correlation, standard
deviation, or entropy, and are not based on an induction
algorithm. Wrapper methods, in contrast, rely on a
learning algorithm and reduce the feature space by
evaluating each candidate feature subset for its merit [8].
Correlation-based Feature Selection (CFS) is a filter
method that uses correlation as the heuristic criterion to
evaluate the merit of a feature subset [8].
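For reference, CFS scores a candidate subset S of k features with the following heuristic merit, as defined by Hall [8], where \(\overline{r_{cf}}\) is the mean feature-class correlation of the features in S and \(\overline{r_{ff}}\) is the mean feature-feature inter-correlation:

\[
\mathit{Merit}_S = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}
\]

Subsets whose features correlate strongly with the class but weakly with each other thus receive a higher merit.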
There are many statistical software tools on the market
that can help one perform feature selection or
regression analysis. For example, we have used the Waikato
Environment for Knowledge Analysis (WEKA) tool in
our experiments. WEKA is a well-known and powerful
tool containing multiple learning algorithms designed for
feature reduction, statistical analysis and data mining [11].
Functional size is an important feature used to
quantify software size through the quantification of
functional user requirements [12]. We refer to it as Size in
our paper.
COSMIC is a functional size measurement method
used to measure Size objectively [13]. The method has
become the international standard ISO/IEC 19761:2003
and has been widely used in academia and industry [13].
3. METHODOLOGY
The effort estimation methodology takes into account
the problem domain of the software project to be estimated
by generating effort estimation models specific to a certain
problem domain. It also incorporates categorical features
such as the impact of the non-functional requirements on
effort. The complexity of the models is reduced using a
feature subset selection algorithm. The effectiveness of
the methodology is validated using the case studies
presented in the results section.
The methodology has two parts: the generation of the
effort estimation model from historical data, and the
application of the model to the new project(s) (Figure 1).
In the first part, we cluster projects by problem domain,
gather historical data, split the feature set into nominal and
numeric feature groups, reduce the nominal feature subset
using either a statistical or a semantic method, and
generate the effort estimation model from the feature set
consisting of the reduced nominal subset and the original
numeric subset. In the second part, we identify the new
project's problem domain, gather the new project's
objectively measurable features, estimate the new project's
subjective features, and estimate the new project's effort
using the generated effort estimation model.
The nominal feature reduction step of the
methodology proposed here can be considered a point
of extension where the estimator can decide to plug in the
desired method to reduce the number of nominal features
(e.g. the impact of NFR on effort, the type of architecture
style used in the project, project difficulty). In this work,
we have used the feature subset reduction algorithm CFS
of WEKA, developed previously by Mark A. Hall, to
reduce the set of nominal features. However, one can
use a wrapper method or another filter method to reduce
the feature subset.
Figure 1 Effort Estimation Methodology Steps.
Also, it was observed that effort estimation models
perform better when the nominal feature values are
converted into numerical ones. For example, the impact of
performance on effort can be mapped as {-2 = Very Low,
-1 = Low, 0 = Nominal, 1 = High, 2 = Very High}, where
Very Low means the performance requirements reduce the
effort significantly, Low that they reduce the effort
slightly, Nominal that they have no impact on the effort,
High that they increase the effort slightly, and Very High
that they increase the effort significantly.
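As a minimal illustrative sketch (the class and method names below are ours, not part of the methodology), such an ordinal scale can be encoded as a lookup table before model generation:

```java
import java.util.Map;

// Illustrative encoding of the impact-of-performance-on-effort scale.
public class ImpactScale {

    static final Map<String, Integer> IMPACT_OF_PERFORMANCE = Map.of(
            "Very Low", -2, // reduces the effort significantly
            "Low", -1,      // reduces the effort slightly
            "Nominal", 0,   // no impact on the effort
            "High", 1,      // increases the effort slightly
            "Very High", 2  // increases the effort significantly
    );

    // Converts a nominal level to its numerical code.
    static int encode(String level) {
        return IMPACT_OF_PERFORMANCE.get(level);
    }
}
```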
Furthermore, it has been found that separating the
original features into nominal and numeric groups,
reducing the subset of nominal features separately, and
then combining the reduced subset with the original
numerical subset produces better results with the linear
effort estimation model.
We have designed a special questionnaire to collect
important historical information about projects. The
questionnaire can also be used for the new project to be
estimated. It makes it easy and efficient to gather the
impact of NFR on effort, the project complexity, and the
average experience of the project members.
3.1 Generate Effort Estimation Model from Historical
Data
3.1.1 Cluster Projects by Problem Domains
Problem domains dictate the use of different
architectural design patterns and, thus, play a significant
role in predicting the complexity of the software to be
developed. Our work identifies how this variation in
problem domains translates into changes in development
effort by first clustering the historical dataset into problem
domain categories before calibrating our effort estimation
model.
Software development organizations categorize
problem domains to organize their product
inventories. Thus, the classification of problem domains
varies from organization to organization based on internal
needs. For example, Microsoft Corporation [6] prescribes
40 different classes of problem domains for software
products. We, therefore, allow the decomposition of
problem domains into open categories that can be
customized to obtain an organization-specific classification.
We set the following attributes to describe a problem
domain class:

id: INT
name: STRING
application_type: {desktop, web, plug-in, real-time, developer, publisher, embedded, business, utility, game, academic, communication, system, portable, graphics, multimedia, driver, framework, research, prototype, component, other}
deployment_type: {private, public-open, public-closed}
where id allows us to identify each problem domain
uniquely, and application_type and deployment_type
allow a higher-level classification of the problem domain,
providing additional categorical features during model
calibration.
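As an illustration only, the schema can be rendered as a small Java type (the names follow the attributes above; the type itself is not part of the paper's tooling):

```java
// Illustrative rendering of the problem domain schema described above.
public record ProblemDomain(int id, String name,
                            ApplicationType applicationType,
                            DeploymentType deploymentType) {

    public enum ApplicationType {
        DESKTOP, WEB, PLUG_IN, REAL_TIME, DEVELOPER, PUBLISHER, EMBEDDED,
        BUSINESS, UTILITY, GAME, ACADEMIC, COMMUNICATION, SYSTEM, PORTABLE,
        GRAPHICS, MULTIMEDIA, DRIVER, FRAMEWORK, RESEARCH, PROTOTYPE,
        COMPONENT, OTHER
    }

    public enum DeploymentType { PRIVATE, PUBLIC_OPEN, PUBLIC_CLOSED }
}
```

For instance, the PD1 domain used in Section 4 (application type = web, deployment type = private) would be new ProblemDomain(1, "PD1", ApplicationType.WEB, DeploymentType.PRIVATE).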
Each instance in our historical dataset that represents a
software project is tagged with a problem domain id,
indicating the problem domain that the software belongs
to. Thus, when calibrating our effort estimation model, we
first choose a target problem domain based on the
software project that is to be estimated. Our system then
automatically selects from the historical database the
instances that belong to the chosen problem domain and
calibrates the effort estimation model based on those
instances only.
3.1.2 Gather historical data
Gather the past projects' data, which include effort and
other important variables such as Size, NFR Impact, etc.
Classify the projects into the corresponding domain classes.
Use objective guidelines when assigning categorical values
to the different NFRs (e.g. analyze how much LOC or
effort was spent per categorical NFR value).
Size can be measured in function points, COSMIC
CFP, or any other accepted unit of measure, as long as the
historical projects and the project to be estimated are
consistent in the Size counting method. Our approach is to
use COSMIC CFP in the methodology validation.
The impact of NFR on effort, the average experience of
the project members, and the project complexity are some
important features that can be gathered effectively using the
questionnaire we have designed. The main prerequisite is
that the questionnaire be filled in by a member of the
project or a person who has access to the historical data.
3.1.3 Split the feature subset
Split the feature set of the historical projects into two
groups: nominal and numerical. Nominal features include
variables with categorical values such as Low, Nominal,
and Very High; for example, the impact of NFR on effort
and the project complexity are nominal features.
Numerical features include variables with numeric values;
for example, Size and Number of Developers are
numerical features.
3.1.4 Feature subset reduction
For each domain class, in order to reduce the complexity
and increase the precision of the effort estimation models,
perform one of the techniques described below. The goal
here is to eliminate the NFRs and categorical features that
do not affect the effort variable or are found to be redundant.
The feature subset reduction can be done using either
a statistical or a semantic method. The statistical
method relies on statistical principles to find redundant
features. The semantic method, on the other hand, is based
on the analysis of the semantic meaning of each feature
and its contribution towards the class feature (e.g. effort).
If the projects in the historical database have no
categorical features, then this step can be skipped and we
can proceed to the next step.
3.1.4.1 Statistical feature reduction
The statistical feature reduction uses either the Filter or
the Wrapper method to reduce the number of features in the
feature subset.
1. Assign each categorical variable a numerical value, such
as Low = -1, Medium = 0, High = 1, if possible.
2. Run a feature subset selection analysis to find out which
categorical variables have more impact on Effort. The
Filter or Wrapper method can be used for this step. For
example, WEKA provides the
weka.attributeSelection.CfsSubsetEval algorithm for
feature selection using the CFS filter approach.
3. Select the best feature subset of categorical variables
and proceed to the next step, where the best feature subset
replaces the original set of categorical variables.
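A minimal sketch of steps 1-3 using WEKA's Java API follows; the input file name is hypothetical, and we assume the class attribute (Effort) is the last one:

```java
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class NominalFeatureReduction {
    public static void main(String[] args) throws Exception {
        // Nominal features of the historical projects (illustrative file name).
        Instances data = DataSource.read("nominal-features.arff");
        data.setClassIndex(data.numAttributes() - 1); // Effort is the class

        // Step 2: CFS filter with a best-first search over candidate subsets.
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new CfsSubsetEval());
        selector.setSearch(new BestFirst());
        selector.SelectAttributes(data);

        // Step 3: the reduced subset replaces the original categorical set.
        Instances reduced = selector.reduceDimensionality(data);
        System.out.println(selector.toResultsString());
        System.out.println("Retained features: " + (reduced.numAttributes() - 1));
    }
}
```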
3.1.4.2 Semantic feature reduction
The semantic feature reduction method uses prior
knowledge about the features to identify redundancy among
them. For example, one can use a previously conducted
study on the impact of NFR on effort to analyze which
combinations of NFRs are redundant and eliminate the
redundant features.
3.1.5 Generate effort estimation model
The effort estimation model can be generated in two
different ways. If there are enough historical projects to
cover all combinations of nominal features, then method
EEM2 can be used. Otherwise, method EEM1 can be used.
3.1.5.1 Method EEM1
The reduced nominal feature subset is combined with
the numerical features, and the effort estimation model is
generated from this combined set. The generation of the
model can be done using available tools such as WEKA.
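For illustration, a sketch of generating and evaluating an EEM1 model through WEKA's Java API; the file name and the use of 10-fold cross-validation are our assumptions rather than prescriptions of the methodology:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.LinearRegression;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class Eem1ModelGeneration {
    public static void main(String[] args) throws Exception {
        // Combined set: reduced nominal subset + original numeric subset,
        // with Effort as the class attribute (illustrative file name).
        Instances data = DataSource.read("combined-subset.arff");
        data.setClassIndex(data.numAttributes() - 1);

        LinearRegression model = new LinearRegression();
        model.buildClassifier(data);
        System.out.println(model); // prints the fitted effort equation

        // The same quality indicators reported in Section 5.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new LinearRegression(), data, 10, new Random(1));
        System.out.println("Correlation coefficient:     " + eval.correlationCoefficient());
        System.out.println("Relative absolute error:     " + eval.relativeAbsoluteError() + " %");
        System.out.println("Root relative squared error: " + eval.rootRelativeSquaredError() + " %");
    }
}
```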
3.1.5.2 Method EEM2
The method EEM2 generates a separate effort
estimation model per combination of nominal feature
values, based on the historical projects data.
For example, suppose we have N categorical features and
M numerical features, where Categorical_Variable1 has 3
values {Low, Medium, High}, Categorical_Variable2 has 5
values {Very Low, Low, Nominal, High, Very High}, and
Categorical_VariableN has 5 values {Very Low, Low,
Nominal, High, Very High}.
| Model # | Categorical_Variable1 | Categorical_Variable2 | Categorical_VariableN |
| 1 | Low | Very Low | Very Low |
| 2 | Medium | Very Low | Very Low |
| 3 | High | Very Low | Very Low |
| 4 | Low | Low | Very Low |
| 5 | Medium | Low | Very Low |
Each generated effort estimation model contains only
numeric features. A new project to be estimated is
therefore mapped to the corresponding effort estimation
model by finding the combination it matches. For example,
if a new project has Categorical_Variable1 = Low,
Categorical_Variable2 = Very Low, and
Categorical_VariableN = Very Low, then model #1
will be used to estimate the effort of the new project.
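A minimal sketch of this lookup (our own illustration): the per-combination models are kept in a map keyed by the tuple of nominal values, and a new project selects its model by its own nominal values:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import weka.classifiers.functions.LinearRegression;

// One numeric-only effort model per combination of nominal feature values.
public class Eem2ModelTable {
    private final Map<List<String>, LinearRegression> models = new HashMap<>();

    public void register(List<String> nominalValues, LinearRegression model) {
        models.put(List.copyOf(nominalValues), model);
    }

    public LinearRegression modelFor(List<String> nominalValues) {
        // e.g. List.of("Low", "Very Low", "Very Low") selects model #1 above
        return models.get(nominalValues);
    }
}
```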
3.2 Apply Generated Effort Estimation Models to New
Projects
3.2.1 Identify new project’s problem domain
Find out to which problem domain the new project
belongs. The schema described earlier in section 3.1.1
for the classification of the project problem domain can be
reused for this purpose as well.
3.2.2 Gather new project’s objectively measurable
features
Gather the objectively measurable features of the new
project. For example, Size, Number of Developers, and
Average Experience can be objectively measured and
quantified for the new project.
3.2.3 Estimate new project’s subjective features
Estimate the features that cannot be predicted or measured
objectively (e.g. the impact of NFR on effort) using the
methodology, based on the historical projects' data and the
objective features of the new project such as Size, Problem
Domain, and Average Experience.
The subjective features can be estimated by extracting
a model from the objectively measured features of the
historical projects while keeping out the other categorical
features. The generated model is then used with the new
project's objectively measured features as input to
estimate the subjective features, such as the categorical
variables.
The estimation model can be generated using available
tools in the following ways:
- use WEKA's regression algorithms directly;
- use WEKA's CFS algorithm to reduce the feature subset and then a WEKA regression algorithm (e.g. Linear Regression) on the reduced feature subset;
- use another algorithm of choice from a statistical software tool.
It has been found that the estimation model tends to
achieve higher accuracy when the categorical feature
values are first converted to numerical ones before the
CFS or regression algorithm is applied. The estimation
model can be specified to generate values in a range to
give more flexibility to the estimator. This step of the
methodology helps to reduce the subjectivity in the
estimation of the categorical variables, which can be
incorrectly assessed by human estimators.
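A sketch of this step under the same numeric coding introduced earlier (the file name, the attribute layout, and the rounding and clamping to the five-level scale are our assumptions):

```java
import weka.classifiers.functions.LinearRegression;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SubjectiveFeatureEstimation {
    public static void main(String[] args) throws Exception {
        // Objective features of the historical projects (Size, Problem Domain,
        // Average Experience, ...) with one numerically coded subjective
        // feature, e.g. the impact of performance on effort, as the class.
        Instances hist = DataSource.read("objective-plus-impact.arff");
        hist.setClassIndex(hist.numAttributes() - 1);

        LinearRegression model = new LinearRegression();
        model.buildClassifier(hist);

        // The new project's objective features, built with the same header;
        // the last historical instance stands in for real input here.
        Instance newProject = hist.lastInstance();
        double raw = model.classifyInstance(newProject);
        long level = Math.round(Math.max(-2.0, Math.min(2.0, raw)));
        System.out.println("Estimated impact level (coded): " + level);
    }
}
```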
3.2.4 Estimate new project’s effort
The effort of a new project can be estimated using the
effort estimation models generated with Method EEM1 or
Method EEM2.
If Method EEM1 is used, then it is necessary to plug the
values of the objectively measured features and the values
of the subjective features into the generated effort
estimation model. If Method EEM2 is used, then:
1. Map the categorical feature values to the
corresponding effort estimation model of the particular
problem domain generated previously, in order to identify
the correct effort estimation model.
2. Plug the values of the numeric features of the current
project into that effort estimation model to estimate the
effort of the new project.
The effort estimation models derived using Method
EEM1 and Method EEM2 can be specified to generate
values in a range to give more flexibility to the estimator.
For example, the effort of a new project can be between
100 and 120 person-days.
4. EXPERIMENTS
The validation of the methodology is done using two
case studies. In the first case study, a private data set
consisting of industrial projects completed at a large
international software development company is used. In
the second case study, a public data set from ISBSG is
used; this data set is split into three groups according to
the problem domain.
The data sets use COSMIC as the method of
measurement for software size. The automation of certain
steps of the methodology is achieved using the WEKA
tool; however, an alternative statistical tool can be used by
the estimator to execute the steps of the methodology. The
details of the data sets are presented in Table 1.
Table 1: Data sets used in the validation of the methodology

| Data Set | Number of Projects | Used in Case Study |
| Industry (private) | 20 | 1, 2 |
| ISBSG (public) | 151 | 2 |
| ISBSG (public) | 18 | 2 |
| ISBSG (public) | 64 | 2 |
In the first case study, different experiments are
performed to compare our methodology against a
traditional approach in which the effort estimation model
is generated directly from the data set.
In the case of the industry data set, 16 projects are
used to generate the effort estimation model, and the effort
of 4 new projects is estimated both by human experts and
by our methodology. The PD1 problem domain refers to
application type = web and deployment type = private. In
trial 1, the human experts are asked to estimate the new
projects based on their own experience and knowledge.
In trial 2, we build a regular linear regression model from
the projects of the industry data set. The best model
among all trials, excluding trials 1 and 2, is selected to
estimate the effort of the new projects, and the result of this
estimation is compared to the trial 1 result. The goal of
trials 3-7 is to gradually observe the improvements in the
generated effort estimation model by comparing the
correlation coefficients and error rates. Trial 8 represents
our approach, and its result is compared to the results of
trials 1 and 2.
Table 2: Design of experiment of case study 1. Each of trials 1-8 applies a subset of the following steps:
- Separate nominal and numeric values
- Convert nominal variable values to numerical values
- Perform feature subset selection on nominal features
- Perform feature subset selection on all features (numeric and nominal)
- Run regression on the selected feature subset
- Run regression on all features in the set
- Run regression on all features in the set using LinearRegression
- Estimate new projects' subjective features using expert knowledge
- Estimate new projects' subjective features using our approach
- Estimate new projects' effort using human expert only
In the second case study, the ISBSG and industry data
sets are used to compare our methodology to a regular
approach using three trials. The goal of this case study is to
show that our approach performs better than a regular
approach that neither takes into account the impacts of
different non-functional requirements and different
problem domains on the estimation of software
development effort nor reduces the feature space.
In the first trial, we estimate the projects of problem
domain PD2 using our methodology; PD2 refers to
application type = business and deployment type =
public-closed. In the second trial, we estimate the effort of
new projects of problem domains PD1 and PD2 without
using our approach: the new projects are estimated using
an effort estimation model generated directly from the
data set containing projects of problem domains PD1,
PD2, PD3 and PD4, where PD3 refers to application type
= utility and deployment type = public-open, and PD4
refers to application type = other and deployment type =
public-open. In the third trial, we again estimate new
projects of problem domains PD1 and PD2 without using
our approach, but this time the new projects from PD1 are
estimated using an effort estimation model generated from
problem domains PD2 and PD3, and the new projects
from PD2 are estimated using a model generated from
problem domains PD1 and PD3 (Table 3).
5. RESULTS AND ANALYSIS
Table 4 presents the results of the generation of the
effort estimation model using Method EEM1 of the
methodology, with the best performing regression
algorithms for case study 1. The algorithms are available
in the WEKA tool; MultilayerPerceptron is an artificial
neural network based algorithm. The LinearRegression
algorithm performed best in terms of correlation
coefficient, relative absolute error, and root relative
squared error in trial 8.
Table 3: Outline of case study 2

| New Projects Problem Domain | Problem Domain of the Historical Data Set used for Effort Estimation Model Generation | Trial in Case Study 2 | Our Methodology |
| PD2 | PD2 | 1 | Yes |
| PD1 | PD1, PD2, PD3, PD4 | 2 | No |
| PD2 | PD1, PD2, PD3, PD4 | 2 | No |
| PD1 | PD2 | 3 | No |
| PD1 | PD3 | 3 | No |
| PD2 | PD1 | 3 | No |
| PD2 | PD3 | 3 | No |
Table 4: The best performing algorithms in the Effort Estimation Model (Method EEM1) for case study 1

| Trial # | Algorithm | Correlation Coefficient | Relative Absolute Error (%) | Root Relative Squared Error (%) |
| 8 | LinearRegression | 0.9775 | 31.2926 | 35.4589 |
| 4 | LeastMedSq | 0.9697 | 38.8387 | 38.784 |
| 4 | LinearRegression | 0.9571 | 37.9866 | 44.0302 |
| 3 | LinearRegression | 0.9439 | 40.2121 | 54.4069 |
| 8 | MultilayerPerceptron | 0.9285 | 36.7743 | 42.3345 |
| 8 | LeastMedSq | 0.8248 | 49.6174 | 54.6742 |
| 3 | MultilayerPerceptron | 0.8183 | 53.4167 | 55.9571 |
| 4 | MultilayerPerceptron | 0.7624 | 51.6075 | 61.602 |
| 7 | LeastMedSq | 0.7489 | 57.4825 | 69.5129 |
| 6 | LinearRegression | 0.7444 | 72.6686 | 101.1395 |
| 3 | LeastMedSq | 0.7117 | 61.8939 | 70.1565 |
| 6 | MultilayerPerceptron | 0.6878 | 67.419 | 69.7698 |
| 5 | MultilayerPerceptron | 0.6843 | 62.58 | 69.6723 |
| 7 | MultilayerPerceptron | 0.6484 | 68.2768 | 72.2187 |
| 5 | LeastMedSq | 0.5631 | 71.9888 | 83.4221 |
| 6 | LeastMedSq | 0.093 | 104.7056 | 121.5743 |
The results show that our approach performs well
during the generation of the effort estimation model,
although the number of projects used to generate the
model was not very high.
The results also show that the effort estimation for the
new projects using our approach performed quite well: the
correlation coefficient for the LinearRegression model was
0.7481, while the MMRE was 21%. The human expert
estimates, on the other hand, were prone to a higher MRE.
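The accuracy measures used throughout this section follow their standard definitions: for a project \(i\) with actual effort \(e_i\) and estimated effort \(\hat{e}_i\),

\[
\mathit{MRE}_i = \frac{\lvert e_i - \hat{e}_i \rvert}{e_i}, \qquad
\mathit{MMRE} = \frac{1}{n} \sum_{i=1}^{n} \mathit{MRE}_i, \qquad
\mathit{MdMRE} = \operatorname{median}_i \, \mathit{MRE}_i .
\]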
Table 5: Results of both case studies

| Trial # | Case Study # | New Project Problem Domain | Historical Data Problem Domain | MMRE (%) | MdMRE (%) | Correlation Coefficient |
| 1 | 1 | PD1 | PD1 | 54 | 61 | N/A |
| 2 | 1 | PD1 | PD1 | 41 | 40 | 0 |
| 8 | 1 | PD1 | PD1 | 21 | 19 | 0.75 |
| 1 | 2 | PD2 | PD2 | 23 | 17 | 0.94 |
| 2 | 2 | PD1 | PD1+PD2+PD3+PD4 | 1273 | 1015 | 0.15 |
| 2 | 2 | PD2 | PD1+PD2+PD3+PD4 | 71 | 42 | 0.74 |
| 3 | 2 | PD1 | PD2 | 453 | 515 | 0.91 |
| 3 | 2 | PD1 | PD3 | 705 | 698 | 0.8673 |
| 3 | 2 | PD2 | PD1 | 96 | 96 | 0 |
| 3 | 2 | PD2 | PD3 | 150 | 116 | 0.18 |
We can also observe that, in trials 2 and 3 of case study
2, the estimation of the effort for the new projects did not
perform well when our approach was not used; in fact, the
MMRE and MdMRE deteriorated significantly in some
cases. However, when our approach was used in trial 1 of
case study 2, the effort of the new projects was estimated
well, with an MMRE of 23% and an MdMRE of 17%.
6. RELATED WORK
Software cost and effort estimation plays a significant
role in the successful completion of any software project.
Resources are assigned according to the effort required to
complete the software, and accurate effort estimation leads
to the completion of the software project on schedule.
Many models and approaches have been developed in the
past 40 years to estimate effort. Most of the models take
software size as a basic input to estimate the effort [14],
and effort is usually calculated using the functional size of
the software [15]. There is a strong relationship between
functional size and effort [16]. Validly measured functional
size has the potential to improve effort estimation and
reduce the "cone of uncertainty" effect on project planning.
It is critical to correctly establish the relationship between
functional size and effort so that effort can be estimated
accurately. Many project and product factors affect this
relationship positively or negatively; environmental factors,
technical factors and operating constraints are some of
them [14].
Many significant attempts have been made to explore
the relationship between size and effort and to identify the
subset of NFRs that may affect this relation. In the
following subsections, we present an overview of research
studies, effort estimation models and functional size
estimation methods that consider NFRs as factors affecting
the relationship between software size and effort.
6.1 Study by Maxwell and Forselius
A study carried out in Finnish companies [17] to
explore the factors that affect productivity and effort
estimation shows the following results (Table 6):
Table 6: Factors affecting productivity, by Maxwell and Forselius (adapted from [17])

| Data set | Experience Database (206 business software projects from 26 companies) |
| Variables considered in the database productivity analysis | Application Programming Language, Application Type (MIS etc.), Hardware Platform, User Interface, Development Model, DBMS Architecture, DB Centralization, Software Centralization, DBMS Tools, CASE Tools, Operating System, Company where the project was developed, Business Sector (Banking, Insurance etc.), Customer Participation, Staff Availability, Standard Use, Method Use, Tool Use, Software Logical Complexity, Requirement Volatility, Quality Requirement, Efficiency Requirement, Installation Requirement, Staff's Analysis Skills, Staff's Tools Skills, Staff's Team Skills, Staff's Application Knowledge |
| Base of size measurement | Experience 2.0 Function Point Method |
6.2 Study by Angelis, Stamelos and Morisio
L. Angelis and his colleagues have also made an
important contribution towards finding the different
factors that affect the size-effort relationship. These
authors study the projects in the International Software
Benchmarking Standards Group (ISBSG) database, which
contains data about recently developed projects
characterized mostly by attributes of a categorical nature,
such as the project business area, organization type,
application domain and usage of certain tools or methods.
The authors found 7 important factors that affect the
relationship between size and effort. The result of this
study is given in more detail below (Table 7) [18]:
Table 7: Factors affecting productivity, by Angelis et al. [18]

| Data set | ISBSG release 6 |
| Factors | 1. Development Type; 2. Development Platform; 3. Language Type; 4. Used Methodology; 5. Organization Type; 6. Business Area Type; 7. Application Type |
| Base of size measurement | IFPUG Function Point |
The authors' method is based on characterizing the
software to be developed in terms of project and
environment attributes and comparing it with similar
completed projects recovered from the ISBSG. The
authors also note that human factors are very important
but were not taken into account in previous studies. A
recent study shows that psychometric data should be
collected to better perform such empirical studies [19].
6.3 Study by Liebchen and Shepperd
A study by Liebchen and Shepperd, aimed at
reporting on an ongoing investigation into software
productivity and its influencing factors, brought the
following results (Table 8) [20]:
Table 8: Factors affecting productivity, by Liebchen and Shepperd [20]

| Data set | 25,000 closed projects of a large multinational company |
| Attributes influencing software productivity | 1. Degree of technical innovation, business innovation and application innovation; 2. Team complexity; 3. Client complexity; 4. Degree of concurrency; 5. Development team's degree of experience with tools, information technology, hardware, or the adopted methodology; 6. Project management experience |
| Base of size measurement | Function Point |
This study confirms the intuitive notion that different
industry sectors exhibit differences in productivity [20].
6.4 Summary of Other Studies
A study in different Swedish companies shows
that the following factors affect effort estimation [21]:
1. Requirement volatility (unclear and changing
requirements).
2. Unavailability of templates.
3. Lack of coordination between the product developed
and other parts of the project.
The following factors, considered important in the
ISBSG data repository, also affect productivity [22]:
1. Programming Language.
2. Team Size.
3. Organization Type.
4. Application Type.
Another recent study, published at the Second ACM-
IEEE International Symposium on Empirical Software
Engineering and Measurement, shows the following
results (Table 9) [23]:
Table 9: Factors affecting the phase distribution of software development effort [23]

| Data set | China Software Benchmarking Standard Group |
| Factors | 1. Development Life Cycle; 2. Development Size; 3. Software Size; 4. Team Size |
| Base for size measurement | LOC |
By analyzing the factors collected from the above
studies, we find that all of them map to concepts
under the root NonFunctionalRequirement concept
of the NFRs Ontology [2].
7. CONCLUSIONS AND FUTURE WORK
In this paper, we have presented a novel effort
estimation methodology that can be applied in the early
stages of software development projects to estimate the
effort of new projects. The results show that our approach
performs well, increasing the correlation coefficient and
decreasing the error rate of the generated effort estimation
models and achieving more accurate effort estimates for
the new projects. We have developed an effort estimation
model based on the historical data of previous projects to
estimate software development effort during the
requirements specification phase. We have also objectively
assessed the impacts of different non-functional
requirements and different problem domains on the
estimation of software development effort. Moreover, we
have made the effort estimation model robust by
dynamically reducing the feature space using both
statistical and semantic techniques.
The methodology proposed in this paper is semi-
automated. Our previous work presented in [5] showed
that functional and non-functional requirements can be
automatically and effectively extracted from software
requirements documents using natural language processing
techniques, and our recent work [3,4] has shown that the
functional size of the software can be computed
objectively from any form of unrestricted textual
representation of the functional requirements. In the future,
we plan to fully automate our methodology and to integrate
it with the work presented in [3] in order to obtain the Size
measurements automatically in the early stages of
software development projects.
REFERENCES
[1] Boehm, B. (2000) Safe and Simple Software Cost Analysis. IEEE
Software, 17 (5), 14-17.
[2] Kassab, M. (2009). Non-Functional Requirements:
Modeling and Assessment. VDM Verlag.
[3] Hussain, I., Kosseim, L., & Ormandjieva, O. (2010).
Towards Approximating COSMIC Functional Size from User
Requirements in Agile Development Processes Using Text Mining. In
LNCS: Natural Language Processing and Information Systems (Vol.
6177/2010, pp. 80-91). Germany: Springer-Verlag.
[4] Hussain, I., Ormandjieva, O., & Kosseim, L. (2009). Mining
and Clustering Textual Requirements to Measure Functional Size of
Software with COSMIC. Proceedings of the International Conference
on Software Engineering Research and Practice (SERP 2009).
[5] Hussain, I., Kosseim, L., & Ormandjieva, O. (2008). Using
Linguistic Knowledge to Classify Non-functional Requirements in SRS
documents. In LNCS: Natural Language and Information Systems (Vol.
5039/2008, pp. 287-298). Germany: Springer-Verlag.
[6] Microsoft Corp. (2011). Windows Intune: Software
Categories. URL: http://onlinehelp.microsoft.com/en-
us/windowsintune/ff399004.aspx (Last Retrieved: May 11, 2011)
[7] Boehm, B., & Chulani, S. (2000). Software development cost
estimation approaches: A survey. Technical Report. University of
Southern California and IBM Research, Los Angeles, USA.
[8] Mark A. Hall. (2000). Correlation based Feature Selection for
Machine Learning (PhD Thesis). University of Waikato.
[9] Angelis, L., Stamelos, I. (2000). A Simulation Tool for Efficient
Analogy Based Cost Estimation. In Empirical Software Engineering,
(Vol. 5, pp. 3568). Netherlands: Kluwer Academic Publishers.
[10] StatSoft Inc. (2011). Elementary Concepts in Statistics. URL:
http://www.statsoft.com/textbook/elementary-concepts-in-statistics/
(Last Retrieved: May 14, 2011)
[11] The University of Waikato. (2001). Machine Learning Project at
the University of Waikato in New Zealand. URL:
http://www.cs.waikato.ac.nz/ml/index.html (Last Retrieved: May 14,
2011)
[12] International Organization for Standardization. (2007). ISO/IEC
14143-1:2007: Information technology -- Software measurement --
Functional size measurement -- Part 1: Definition of concepts.
[13] COSMIC. (2009). The COSMIC Functional Size Measurement
Method. URL:
http://www.cosmicon.com/methodV3.asp (Last Retrieved: May 14,
2011)
[14] Gencel, C., & Demirors, O. (2008). Functional size measurement
revisited. ACM Transactions on Software Engineering and Methodology,
17(3), (pp. 1-36).
[15] Fenton, N.E., & Pfleeger, S.L. (1997). Software Metrics: A
Rigorous and Practical Approach, International Thomson Computer
Press.
[16] Pfleeger, S. L., Wu, F., & Lewis, R. (2005). Software Cost
Estimation and Sizing Methods: Issues and Guidelines, RAND
Corporation.
[17] Maxwell, K. D., & Forselius, P. (2000). Benchmarking Software-
Development Productivity, IEEE Software, 17(1), (pp. 80-88).
[18] Angelis, L., Stamelos, I., & Morisio, M. (2001). Building A
Software Cost Estimation Model Based On Categorical Data, In
Proceedings of the 7th international Symposium on Software Metrics,
METRICS, IEEE Computer Society, Washington, DC.
[19] Feldt, R., Torkar, R., Angelis, L., & Samuelsson, M. (2008).
Towards individualized software engineering: empirical studies should
collect psychometrics, In Proceedings of the 2008 international
Workshop on Cooperative and Human Aspects of Software Engineering
(Leipzig, Germany, May 13 - 13, 2008), CHASE '08, ACM, New York,
NY, (pp. 49-52).
[20] Liebchen, G. A., & Shepperd, M. (2005). Software Productivity
Analysis of a Large Data Set and Issues of Confidentiality and Data
Quality, In Proceedings of the 11th IEEE international Software Metrics
Symposium (September 19 - 22, 2005), METRICS, IEEE Computer
Society, Washington, DC, 46.
[21] Magazinovic, A., & Pernstål, J. (2008). Any other cost estimation
inhibitors?, In Proceedings of the Second ACM-IEEE international
Symposium on Empirical Software Engineering and Measurement,
Kaiserslautern, Germany, ESEM '08. ACM, New York, NY, (pp. 233-
242).
[22] Lokan, C., Wright, T., Hill, P. R., & Stringer, M. (2001).
Organizational Benchmarking Using the ISBSG Data Repository, IEEE
Software, 18(5), (pp. 26-32).
[23] Yang, Y., He, M., Li, M., Wang, Q., & Boehm, B. (2008). Phase
distribution of software development effort, In Proceedings of the
Second ACM-IEEE international Symposium on Empirical Software
Engineering and Measurement (Kaiserslautern, Germany, October 09 -
10, 2008), ESEM '08, ACM, New York, NY, (pp. 61-69).