QUANTIFYING THE IMPACT OF DIFFERENT NON-FUNCTIONAL
REQUIREMENTS AND PROBLEM DOMAINS ON SOFTWARE
EFFORT ESTIMATION
Abstract - The effort estimation techniques used in the
software industry often tend to ignore the impact of Non-
functional Requirements (NFR) on effort and to reuse standard
effort estimation models without local calibration. Moreover,
effort estimation models are calibrated using data from
previous projects that may belong to problem domains
different from that of the project being estimated. This paper
proposes a novel effort estimation methodology that can be
used in the early stages of software development projects. The
proposed methodology first clusters the historical data from
previous projects into different problem domains and then
generates domain-specific effort estimation models, each
incorporating the impact of NFR on effort through sets of
objectively measured categorical features. We reduce the
complexity of these models using a feature subset selection
algorithm. In this paper, we discuss our approach in detail and
present the results of our experiments with different
supervised machine learning algorithms. The results show that
our approach performs well: it increases the correlation
coefficient and decreases the error rate of the generated effort
estimation models, yielding more accurate effort estimates for
new projects.
Keywords: Software Effort Estimation, Non-functional
Requirements, Supervised Machine Learning.
1. INTRODUCTION
The success of the planning and management of a software
project largely depends on the estimation of size and effort. A
good estimate of these variables, available right from the start
of a project, gives the project manager confidence about any
future course of action, since many of the decisions made
during development depend on, or are influenced by, the
initial estimates. Hence, effort estimation is one of the most
crucial steps in the planning and management of a software
project.
The work described in this paper introduces a new
comprehensive methodology for estimating software
development effort during the early phases of requirements
development, using the functional size of the software as the
primary variable.
Although effort estimation in practice is largely performed
through subjective evaluation, there have been numerous
attempts in this field to build parametric models for estimating
effort. All of these models are calibrated with historical data
from past projects so that the effort of new software projects
can be estimated. However, while some tend to ignore the
impact of different non-functional requirements [7], others [1]
include it only partially, requiring subjective judgement by
human experts. Ignoring non-functional requirements and
introducing subjective evaluations can often result in a large
magnitude of error in effort estimation. In contrast, our work
proposes a methodology that can objectively quantify the
impact of non-functional requirements on effort estimation.
We take into consideration the impact of six high-level classes
of non-functional requirements chosen from the NFR ontology
described in [2], which encompasses all possible classes of
non-functional requirements. The goals of this work are as
follows:
1. To develop an effort estimation model based on the
historical data of previous projects to estimate
software development effort during the requirements
specification phase.
2. To objectively assess the impacts of different non-
functional requirements and different problem
domains on the estimation of software development
effort.
3. To make the effort estimation model robust by
dynamically reducing the feature space using both
statistical and semantic techniques.
The paper is organized as follows: Section 2 provides the
background required to understand the methodology described
in Section 3. The experimental work and the discussion of the
obtained results are presented in Sections 4 and 5, respectively.
Recent related work is surveyed in Section 6. Our concluding
remarks and future work directions are outlined in Section 7.
2. BACKGROUND
A feature is a variable that describes information about a
given project. For example, Number of Developers is a feature
that measures the number of software developers in a software
project. Features can have different types, such as numeric or
nominal. A numeric feature is one whose possible values are
numerical. A nominal feature (the term is often used
interchangeably with categorical), on the other hand, is one
whose possible values are qualitative [10]. For example,
Project Complexity can be measured using three values: Low,
Medium, and High.
The number of features collected from previously
completed projects can be quite high. Therefore, it is
necessary to reduce the complexity of the feature space by
using feature selection. Feature subset selection helps to
reduce the redundancy in the feature subset [8]. There are
various ways to perform feature subset selection; filter and
wrapper methods are two common feature reduction
techniques [8]. Filter methods reduce the feature set with the
help of heuristics such as correlation, standard deviation, or
entropy; they are not based on an induction algorithm.
Wrapper methods, in contrast, are based on a learning
algorithm that reduces the feature space by evaluating each
feature subset for its merit [8]. Correlation-based feature
selection (CFS) is a filter method that uses correlation as a
heuristic criterion to evaluate the merit of a feature subset [8].
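For reference, the merit heuristic that CFS assigns to a feature subset S containing k features can be written as follows (this is the standard formulation from Hall's work [8]):

\[
\mathrm{Merit}_S = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}
\]

where \(\overline{r_{cf}}\) is the mean feature-class correlation and \(\overline{r_{ff}}\) is the mean feature-feature inter-correlation; subsets whose features correlate strongly with the class but weakly with each other score highest.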
There are many statistical software tools on the market
that can help one perform feature selection or regression
analysis. In our experiments, we have used the Waikato
Environment for Knowledge Analysis (WEKA) tool. WEKA
is a well-known, powerful tool containing multiple learning
algorithms designed for feature reduction, statistical analysis,
and data mining [11].
Functional size is one of the important features used to
quantify software size through the quantification of functional
user requirements [12]. We refer to it as Size in this paper.
COSMIC is a functional size measurement method used to
measure Size objectively [13]. The method has become an
international standard, ISO/IEC 19761:2003, and has been
used widely in academia and industry [13].
3. METHODOLOGY
The effort estimation methodology takes into account the
problem domain of the software project to be estimated by
generating effort estimation models specific to a given
problem domain. It also incorporates categorical features such
as the impact of the non-functional requirements on effort.
The complexity of the models is reduced using a feature
subset selection algorithm. The effectiveness of the
methodology is validated using the case studies presented in
the results section.
The methodology has two parts: generation of the effort
estimation model from historical data, and application of the
model to the new project(s) (Figure 1). In the first part, we
cluster projects by problem domain, gather historical data,
split the feature set into nominal and numeric groups, reduce
the nominal feature subset using either a statistical or a
semantic method, and generate the effort estimation model
from the feature set consisting of the reduced nominal subset
and the original numeric subset. In the second part, we
identify the new project's problem domain, gather the new
project's objectively measurable features, estimate the new
project's subjective features, and estimate the new project's
effort using the generated effort estimation model.
The nominal feature reduction step of the methodology
proposed here can be considered a point of extension where
the estimator can decide to plug in the desired method for
reducing the number of nominal features (e.g. the impact of
NFR on effort, the type of architecture style used in the
project, project difficulty). In this work, we have used the
feature subset reduction algorithm CFS of WEKA, developed
previously by Mark A. Hall, to reduce the set of nominal
features. However, one can use a wrapper method or another
filter method to reduce the feature subset.
Figure 1: Effort estimation methodology steps.
It was also observed that effort estimation models perform
better when the nominal feature values are converted into
numerical ones. For example, the impact of performance on
effort can be mapped as {-2=Very Low, -1=Low, 0=Nominal,
1=High, 2=Very High}, where Very Low means the
performance requirements reduce the effort significantly, Low
means they reduce the effort slightly, Nominal means they
have no impact on the effort, High means they increase the
effort slightly, and Very High means they increase the effort
significantly.
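As a minimal illustration of this conversion, the ordinal labels above could be mapped to integer codes as follows; the scale is taken from the example, while the class and method names are our own (we use Java here and in the sketches below, since the tooling in this work is WEKA-based):

```java
import java.util.Map;

public class NfrImpactScale {
    // Ordinal scale from the example above: negative values mean the NFR
    // reduces effort, zero is neutral, positive values mean it increases effort.
    private static final Map<String, Integer> SCALE = Map.of(
            "Very Low", -2,
            "Low", -1,
            "Nominal", 0,
            "High", 1,
            "Very High", 2);

    // Convert a nominal label such as "High" to its numeric code.
    public static int toNumeric(String label) {
        Integer code = SCALE.get(label);
        if (code == null) {
            throw new IllegalArgumentException("Unknown impact label: " + label);
        }
        return code;
    }
}
```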
Furthermore, it has been found that separating the original
features into nominal and numeric groups, reducing the subset
of nominal features separately, and then combining the
reduced subset with the original numerical subset produces
better results with the linear effort estimation model.
We have designed a special questionnaire to collect
important historical information about projects. The
questionnaire can also be used for the new project to be
estimated. It makes it easy and efficient to gather the impact
of NFR on effort, project complexity, and the average
experience of project members.
3.1 Generate Effort Estimation Model from Historical
Data
3.1.1 Cluster Projects by Problem Domains
Problem domains dictate the use of different
architectural design patterns and, thus, play a significant
role in predicting the complexity of the software to be
developed. Our work identifies how this variation in
problem domains translates into changes in development
effort by first clustering the historical dataset into problem
domain categories before calibrating our effort estimation
model.
Software development organizations categorize problem
domains to organize their product inventories. Thus, the
classification of problem domains varies from organization to
organization based on internal needs. For example, Microsoft
Corporation [6] prescribes 40 different classes of problem
domains for software products. We therefore allow the
decomposition of problem domains into open categories that
can be customized to yield an organization-specific
classification.
We set the following attributes to describe a problem
domain class:
id: INT
name: STRING
application_type: {"desktop", "web", "plug-in", "real-time", "developer", "publisher", "embedded", "business", "utility", "game", "academic", "communication", "system", "portable", "graphics", "multimedia", "driver", "framework", "research", "prototype", "component", "other"}
deployment_type: {"private", "public-open", "public-closed"}
where id allows us to identify each problem domain uniquely,
while application_type and deployment_type allow a higher-
level classification of the problem domain, providing
additional categorical features during model calibration.
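A sketch of this schema as a data structure is shown below; the fields follow the attribute list above, while the Java rendering itself (enum constants, constructor) is our own illustration:

```java
public class ProblemDomain {
    enum ApplicationType {
        DESKTOP, WEB, PLUG_IN, REAL_TIME, DEVELOPER, PUBLISHER, EMBEDDED,
        BUSINESS, UTILITY, GAME, ACADEMIC, COMMUNICATION, SYSTEM, PORTABLE,
        GRAPHICS, MULTIMEDIA, DRIVER, FRAMEWORK, RESEARCH, PROTOTYPE,
        COMPONENT, OTHER
    }

    enum DeploymentType { PRIVATE, PUBLIC_OPEN, PUBLIC_CLOSED }

    final int id;                            // unique domain identifier
    final String name;                       // human-readable domain name
    final ApplicationType applicationType;   // higher-level classification
    final DeploymentType deploymentType;     // deployment classification

    ProblemDomain(int id, String name,
                  ApplicationType applicationType, DeploymentType deploymentType) {
        this.id = id;
        this.name = name;
        this.applicationType = applicationType;
        this.deploymentType = deploymentType;
    }
}
```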
Each instance in our historical dataset that represents a
software project is tagged with a problem domain id,
indicating the problem domain that the software belongs
to. Thus, when calibrating our effort estimation model, we
first choose a target problem domain based on the
software project that is to be estimated. Our system then
automatically selects from the historical database the
instances that belong to the chosen problem domain and
calibrates the effort estimation model based on those
instances only.
3.1.2 Gather historical data
Gather past project data, including effort and other
important variables such as Size and NFR impact. Classify the
projects into the corresponding domain classes. Use objective
guidelines when assigning categorical values to different
NFRs (e.g. analyze how much LOC or effort was spent per
categorical NFR value).
Size can be measured in function points, COSMIC CFP,
or any other accepted unit of measure, as long as the historical
projects and the project to be estimated use the same Size
counting method. Our approach is to use COSMIC CFP in the
methodology validation.
The impact of NFR on effort, the average experience of
project members, and project complexity are some of the
important features that can be gathered effectively using the
questionnaire we have designed. The main prerequisite is that
the questionnaire be filled in by a member of the project or a
person who has access to the historical data.
3.1.3 Split the feature subset
Split the feature set of the historical projects into two
groups: nominal and numerical. Nominal features include
variables with categorical values such as Low, Nominal, and
Very High; for example, the impact of NFR on effort and
project complexity are nominal features. Numerical features
include variables with numeric values; for example, Size and
Number of Developers are numerical features. A sketch of
this step using the WEKA API is shown below.
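A minimal sketch of the split, assuming the historical data has been loaded into a WEKA Instances object with effort set as the class attribute:

```java
import java.util.List;
import weka.core.Attribute;
import weka.core.Instances;

public class FeatureSplitter {
    // Partition the attribute indices of a dataset into nominal and numeric
    // groups, leaving out the class attribute (effort) itself.
    public static void split(Instances data,
                             List<Integer> nominal, List<Integer> numeric) {
        for (int i = 0; i < data.numAttributes(); i++) {
            if (i == data.classIndex()) continue;   // skip the effort variable
            Attribute a = data.attribute(i);
            if (a.isNominal()) {
                nominal.add(i);
            } else if (a.isNumeric()) {
                numeric.add(i);
            }
        }
    }
}
```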
3.1.4 Feature subset reduction
For each domain class, in order to reduce the complexity
and increase the precision of the effort estimation models,
apply one of the techniques described below. The goal here is
to eliminate NFRs and categorical features that do not affect
the effort variable or are found to be redundant.
The feature subset reduction can be done using either a
statistical or a semantic method. The statistical method relies
on statistical principles to find redundant features. The
semantic method, on the other hand, is based on an analysis of
the semantic meaning of each feature and its contribution
towards the class feature (e.g. effort).
If the projects in the historical database have no
categorical features, this step can be skipped and we can
proceed to the next step.
3.1.4.1 Statistical feature reduction
The statistical feature reduction uses either a filter or a
wrapper method to reduce the number of features in the
feature subset, as follows (a sketch of step 2 using WEKA is
given after the list):
1. Assign each categorical variable a numerical value, such as
Low = -1, Medium = 0, High = 1, if possible.
2. Run a feature subset selection analysis to find out which
categorical variables have more impact on Effort. Either a
filter or a wrapper method can be used for this step. For
example, WEKA provides the
weka.attributeSelection.CfsSubsetEval algorithm for feature
selection using the CFS filter approach.
3. Select the best feature subset of categorical variables and
proceed to the next step, where the best feature subset
replaces the original set of categorical variables.
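The following is a minimal sketch of step 2 with WEKA's CFS evaluator and a best-first search; the input file name is hypothetical, and we assume effort is stored as the last attribute:

```java
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CfsExample {
    public static void main(String[] args) throws Exception {
        // Load the historical projects (file name is hypothetical).
        Instances data = DataSource.read("historical_projects.arff");
        data.setClassIndex(data.numAttributes() - 1);   // effort is the class

        // CFS subset evaluation with best-first search, as in step 2.
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new CfsSubsetEval());
        selector.setSearch(new BestFirst());
        selector.SelectAttributes(data);

        // Keep only the selected features for the next step.
        Instances reduced = selector.reduceDimensionality(data);
        System.out.println(selector.toResultsString());
    }
}
```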
3.1.4.2 Semantic feature reduction
The semantic feature reduction method uses prior
knowledge about the features to identify redundancy among
them. For example, one can use a previous study of the impact
of NFR on effort to analyze which combinations of NFRs are
redundant and eliminate the redundant features.
3.1.5 Generate effort estimation model
The effort estimation model can be generated in two
different ways. If there are enough historical projects to cover
all combinations of nominal features, then method EEM2 can
be used. Otherwise, method EEM1 can be used.
3.1.5.1 Method EEM1
The reduced nominal feature subset is combined with the
numerical features, and the effort estimation model is
generated from this combined subset. The generation of the
model can be done using available tools such as WEKA.
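A minimal sketch of this step with WEKA, assuming the combined subset has been exported to an ARFF file (the file name is hypothetical) with effort as the last attribute; the reported statistics are the same ones discussed in Section 5:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.LinearRegression;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class Eem1Example {
    public static void main(String[] args) throws Exception {
        // Combined subset: reduced nominal features plus the original
        // numeric features (file name is hypothetical).
        Instances data = DataSource.read("combined_subset.arff");
        data.setClassIndex(data.numAttributes() - 1);   // effort

        // Fit the effort estimation model on the full historical data.
        LinearRegression model = new LinearRegression();
        model.buildClassifier(data);

        // Cross-validate to report the statistics used in Section 5.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(model, data, 10, new Random(1));
        System.out.println("Correlation coefficient:          "
                + eval.correlationCoefficient());
        System.out.println("Relative absolute error (%):      "
                + eval.relativeAbsoluteError());
        System.out.println("Root relative squared error (%):  "
                + eval.rootRelativeSquaredError());
    }
}
```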
3.1.5.2 Method EEM2
The method EEM2 generates a separate effort estimation
model per combination of nominal feature values, based on
the historical project data.
For example, suppose we have N categorical features and
M numerical features, where Categorical_Variable1 has 3
values {Low, Medium, High}, Categorical_Variable2 has 5
values {Very Low, Low, Nominal, High, Very High}, and
Categorical_VariableN has 5 values {Very Low, Low,
Nominal, High, Very High}. The models are then enumerated
as follows:
Model #   Categorical_Variable1   Categorical_Variable2   ...   Categorical_VariableN
1         Low                     Very Low                ...   Very Low
2         Medium                  Very Low                ...   Very Low
3         High                    Very Low                ...   Very Low
4         Low                     Low                     ...   Very Low
5         Medium                  Low                     ...   Very Low
...       ...                     ...                     ...   ...
Each generated effort estimation model contains only
numeric features. A new project to be estimated is therefore
mapped to the corresponding effort estimation model by
finding the combination it matches. For example, if a new
project has Categorical_Variable1 = Low,
Categorical_Variable2 = Very Low, ...,
Categorical_VariableN = Very Low, then model #1 will be
used to estimate the effort of the new project.
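This lookup could be realized, for instance, with a map keyed by the ordered tuple of nominal values; the class below is our own illustration, with WEKA's LinearRegression standing in for whatever per-combination model is generated:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import weka.classifiers.functions.LinearRegression;

public class Eem2Lookup {
    // One numeric-only model per combination of nominal feature values;
    // the key is the ordered list of values, e.g. ["Low", "Very Low", ...].
    private final Map<List<String>, LinearRegression> models = new HashMap<>();

    void register(List<String> nominalValues, LinearRegression model) {
        models.put(nominalValues, model);
    }

    // Select the model matching the new project's nominal values
    // (model #1 above for ["Low", "Very Low", ..., "Very Low"]).
    LinearRegression modelFor(List<String> nominalValues) {
        LinearRegression model = models.get(nominalValues);
        if (model == null) {
            throw new IllegalStateException(
                "No historical projects with this combination: " + nominalValues);
        }
        return model;
    }
}
```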
3.2 Apply generated Effort Estimation Models on new
projects
3.2.1 Identify new project’s problem domain
Determine the problem domain to which the new project
belongs. The schema described earlier in Section 3.1.1 for the
classification of the project problem domain can be reused for
this purpose.
3.2.2 Gather new project’s objectively measurable
features
Gather objectively measurable features of a new
project. For example, Size, Number of Developers,
Average Experience can be objectively measured and
quantified for the new project.
3.2.3 Estimate new project’s subjective features
Estimate the features that cannot be predicted or measured
objectively (e.g. the impact of NFR on effort) using the
methodology, based on the historical project data and the
objective features of the new project such as Size, Problem
Domain, and Average Experience.
The subjective features can be estimated by extracting a
model from the objectively measured features of the historical
projects while leaving out the other categorical features. The
generated model is then used, with the new project's
objectively measured features as input, to estimate subjective
features such as the categorical variables.
The estimation model can be generated using available
tools in one of the following ways (a sketch of the second
option follows this list):
1. Use the WEKA tool's regression algorithms directly.
2. Use the WEKA tool's CFS algorithm to reduce the feature
subset, then apply a WEKA regression algorithm (e.g. Linear
Regression) on the reduced feature subset.
3. Use another algorithm of choice from a statistical software
tool.
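A sketch of the second option is given below, assuming the subjective feature (here a hypothetical nfr_impact attribute, already converted to a numeric scale) is the prediction target; WEKA's AttributeSelectedClassifier chains the CFS reduction and the regression in one step:

```java
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.classifiers.functions.LinearRegression;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SubjectiveFeatureEstimator {
    public static void main(String[] args) throws Exception {
        // Historical projects described by objective features, with the
        // subjective feature as the class; file name and attribute layout
        // are hypothetical.
        Instances data = DataSource.read("objective_features.arff");
        data.setClassIndex(data.attribute("nfr_impact").index());

        // CFS reduction followed by linear regression (option 2 above).
        AttributeSelectedClassifier estimator = new AttributeSelectedClassifier();
        estimator.setEvaluator(new CfsSubsetEval());
        estimator.setSearch(new BestFirst());
        estimator.setClassifier(new LinearRegression());
        estimator.buildClassifier(data);

        // Predict the subjective feature for a new project described only
        // by its objective features (instance construction omitted; we
        // reuse the first historical instance as a placeholder).
        Instance newProject = data.firstInstance();
        double predictedImpact = estimator.classifyInstance(newProject);
        System.out.println("Estimated NFR impact: " + predictedImpact);
    }
}
```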
It has been found that the estimation model tends to
achieve higher accuracy when the categorical feature values
are first converted to numerical ones before applying CFS or a
regression algorithm. The estimation model can be specified
to generate values in a range, giving the estimator more
flexibility. This step of the methodology helps to reduce the
subjectivity in estimating categorical variables, which can be
incorrectly assessed by human estimators.
3.2.4 Estimate new project’s effort
The effort of a new project can be estimated using the
effort estimation models generated with method EEM1 or
method EEM2.
If method EEM1 is used, it is necessary to plug the values
of the objectively measured features and the values of the
subjective features into the generated effort estimation model.
If method EEM2 is used, then:
1. Map the categorical feature values to the corresponding
effort estimation model of the particular problem domain,
generated previously, in order to identify the correct effort
estimation model.
2. Plug the values of the numeric features of the current
project into that effort estimation model to estimate the effort
of the new project.
The effort estimation models derived using methods
EEM1 and EEM2 can be specified to generate values in a
range, giving the estimator more flexibility. For example, the
effort of a new project can be estimated to lie between 100
and 120 person-days.
4. EXPERIMENTS
The validation of the methodology is done using two case
studies. In the first case study, a private data set consisting of
industrial projects completed at a large international software
development company is used. In the second case study, a
public data set from ISBSG is used; this data set is split into
three groups according to the problem domain.
The data sets use COSMIC as the software size
measurement method. The automation of certain steps of the
methodology is achieved using the WEKA tool; however, an
alternative statistical tool can be used by the estimator to
execute the steps of the methodology. The details of the data
sets are presented in Table 1.
Table 1: Data sets used in the validation of the methodology

Data Set             Number of Projects   Problem Domain   Used in Case Study
Industry (private)   20                   PD1              1, 2
ISBSG (public)       151                  PD2              2
ISBSG (public)       18                   PD3              2
ISBSG (public)       64                   PD4              2
In the first case study, different experiments are performed
to compare our methodology against a traditional approach in
which the effort estimation model is generated directly from
the data set.
In the case of the industry data set, 16 projects are used to
generate the effort estimation model, and the effort of 4 new
projects is estimated both by human experts and by our
methodology. The PD1 problem domain corresponds to
application type = web and deployment type = private. In trial
1, the human experts are asked to estimate the new projects
based on their own experience and knowledge. In trial 2, we
build a regular linear regression model from the projects of
the industry data set. The best model among all trials
excluding trials 1 and 2 is selected to estimate the effort of the
new projects, and the result of this estimation is compared to
the trial 1 result. The goal of trials 3-7 is to gradually observe
the improvements in the generated effort estimation model by
comparing the correlation coefficients and error rates. Trial 8
represents our full approach, and its result is compared to the
results of trials 1 and 2.
Table 2: Design of Experiment of case study 1

Step                                                   Trial: 1  2  3  4  5  6  7  8
Separate nominal and numeric values                           -  -  -  -  -  -  X  X
Convert nominal variable values to numerical values           -  -  -  X  -  X  -  X
Perform Feature Subset Selection on nominal features          -  -  -  -  -  -  X  X
Perform Feature Subset Selection on all features
  (numeric and nominal)                                       -  -  -  -  X  X  -  -
Run regression on the selected feature subset                 -  -  -  -  X  X  X  X
Run regression on all features in the set                     -  -  X  X  -  -  -  -
Run regression on all features using LinearRegression         -  X  -  -  -  -  -  -
Estimate new projects' subjective features using
  expert knowledge                                            -  -  X  X  X  X  -  -
Estimate new projects' subjective features using
  our approach                                                -  -  -  -  -  -  X  X
Estimate new projects' effort using human expert only         X  -  -  -  -  -  -  -
In the second case study, the ISBSG and industry data sets
are used to compare our methodology to a regular approach
over three trials. The goal of this case study is to show that
our approach performs better than a regular approach, which
neither accounts for the impact of different non-functional
requirements and different problem domains on the estimation
of software development effort nor reduces the feature space.
In the first trial, we estimate the projects of problem
domain PD2 using our methodology; PD2 corresponds to
application type = business and deployment type = public
closed. In the second trial, we estimate the effort of new
projects of problem domains PD1 and PD2 without using our
approach: the new projects are estimated using an effort
estimation model generated directly from the data set
containing projects of problem domains PD1, PD2, PD3, and
PD4, where PD3 corresponds to application type = utility and
deployment type = public open, and PD4 corresponds to
application type = other and deployment type = public open.
In the third trial, we again estimate new projects of problem
domains PD1 and PD2 without using our approach, but this
time new projects from PD1 are estimated using an effort
estimation model generated from problem domains PD2 and
PD3, and new projects from PD2 are estimated using an effort
estimation model generated from problem domains PD1 and
PD3 (Table 3).
5. RESULTS AND ANALYSIS
Table 4 presents the results of generating the effort
estimation model using method EEM1 with the best-
performing regression algorithms for case study 1. The
algorithms are available in the WEKA tool; the
MultilayerPerceptron algorithm is an artificial neural network
based algorithm. The LinearRegression algorithm performed
best in terms of correlation coefficient, relative absolute error,
and root relative squared error in trial 8.
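For reference, given predicted efforts p_i and actual efforts a_i with mean \(\bar{a}\) over n projects, the error statistics reported here are the standard ones computed by WEKA:

\[
\mathrm{RAE} = \frac{\sum_{i=1}^{n} |p_i - a_i|}{\sum_{i=1}^{n} |a_i - \bar{a}|} \times 100\%,
\qquad
\mathrm{RRSE} = \sqrt{\frac{\sum_{i=1}^{n} (p_i - a_i)^2}{\sum_{i=1}^{n} (a_i - \bar{a})^2}} \times 100\%
\]

and the correlation coefficient is the Pearson correlation between the predicted and actual efforts.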
Table 3: Outline of Case Study 2

New Projects'    Problem Domain of the Historical Data Set    Trial in the    Our
Problem Domain   used for Effort Estimation Model Generation  Case Study 2    Methodology
PD2              PD2                                          1               Yes
PD1              PD1, PD2, PD3, PD4                           2               No
PD2              PD1, PD2, PD3, PD4                           2               No
PD1              PD2                                          3               No
PD1              PD3                                          3               No
PD2              PD1                                          3               No
PD2              PD3                                          3               No
Table 4: The best performing algorithms in the Effort Estimation Model (Method EEM1) for case study 1

Trial #   Algorithm              Correlation   Relative Absolute   Root Relative
                                 Coefficient   Error (%)           Squared Error (%)
8         LinearRegression       0.9775        31.2926             35.4589
4         LeastMedSq             0.9697        38.8387             38.784
4         LinearRegression       0.9571        37.9866             44.0302
3         LinearRegression       0.9439        40.2121             54.4069
8         MultilayerPerceptron   0.9285        36.7743             42.3345
8         LeastMedSq             0.8248        49.6174             54.6742
3         MultilayerPerceptron   0.8183        53.4167             55.9571
4         MultilayerPerceptron   0.7624        51.6075             61.602
7         LeastMedSq             0.7489        57.4825             69.5129
6         LinearRegression       0.7444        72.6686             101.1395
3         LeastMedSq             0.7117        61.8939             70.1565
6         MultilayerPerceptron   0.6878        67.419              69.7698
5         MultilayerPerceptron   0.6843        62.58               69.6723
7         MultilayerPerceptron   0.6484        68.2768             72.2187
5         LeastMedSq             0.5631        71.9888             83.4221
6         LeastMedSq             0.093         104.7056            121.5743
These results indicate that our approach performs well
during the generation of the effort estimation model, although
the number of projects used to generate the model was not
very high.
The results also show that estimating the effort of new
projects using our approach performed quite well: the
correlation coefficient for the LinearRegression model was
0.7481 and the MMRE was 21%. The human expert estimates,
in contrast, were prone to higher MRE.
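For clarity, the accuracy statistics used in Table 5 are the standard magnitude-of-relative-error measures: for actual effort a_i and predicted effort p_i,

\[
\mathrm{MRE}_i = \frac{|a_i - p_i|}{a_i},
\qquad
\mathrm{MMRE} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{MRE}_i
\]

and MdMRE is the median of the MRE_i values, which is less sensitive to outliers than the mean.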
Table 5: Results of both case studies

Trial #   Case Study #   New Project       Historical Data     MMRE (%)   MdMRE (%)   Correlation
                         Problem Domain    Problem Domain                              Coefficient
1         1              PD1               PD1                 54         61           N/A
2         1              PD1               PD1                 41         40           0
8         1              PD1               PD1                 21         19           0.75
1         2              PD2               PD2                 23         17           0.94
2         2              PD1               PD1+PD2+PD3+PD4     1273       1015         0.15
2         2              PD2               PD1+PD2+PD3+PD4     71         42           0.74
3         2              PD1               PD2                 453        515          0.91
3         2              PD1               PD3                 705        698          0.8673
3         2              PD2               PD1                 96         96           0
3         2              PD2               PD3                 150        116          0.18
We can also observe that in trials 2 and 3 of case study 2,
the estimation of effort for new projects did not perform well
when our approach was not used; in fact, the MMRE and
MdMRE deteriorated significantly in some cases. However,
when our approach was used in trial 1 of case study 2, the
effort of the new projects was estimated well, with an MMRE
of 23% and an MdMRE of 17%.
6. RELATED WORK
Software cost and effort estimation plays a significant role
in the successful completion of any software project, since
resources are assigned according to the effort required to
complete the software, and accurate effort estimation helps
complete a software project on schedule. Many models and
approaches have been developed over the past 40 years to
estimate effort. Most of the models take software size as the
basic input for estimating effort [14], and effort is usually
calculated from the functional size of the software [15]. There
is a strong relationship between functional size and effort
[16]; validly measured functional size has the potential to
improve effort estimation and reduce the "cone of
uncertainty" effect on project planning. It is critical to
correctly establish the relationship between functional size
and effort so that effort can be estimated accurately. Many
project and product factors affect this relationship positively
or negatively; environmental factors, technical factors, and
operating constraints are some of them [14].
Many significant attempts have been made to explore the
relationship between size and effort, and to identify the subset
of NFRs that may affect this relationship. In the following
subsections, we present an overview of research studies, effort
estimation models, and functional size estimation methods
that consider NFRs as factors affecting the relationship
between software size and effort.
6.1 Study by Maxwell and Forselius
A study carried out in Finnish companies [17] to explore
the factors that affect productivity and effort estimation
reports the following results (Table 6):
Table 6: Factors Affecting Productivity by Pekka Forselius (adapted from [17]).

Data set: Experience Database (206 business software projects from 26 companies).

Variables considered in Database Productivity Analysis: Application Programming Language, Application Type (MIS etc.), Hardware Platform, User Interface, Development Model, DBMS Architecture, DB Centralization, Software Centralization, DBMS Tools, Case Tools, Operating System, Company where project was developed, Business Sector (Banking, Insurance etc.), Customer Participation, Staff Availability, Standard Use, Method Use, Tool Use, Software Logical Complexity, Requirement Volatility, Quality Requirement, Efficiency Requirement, Installation Requirement, Staff's Analysis Skills, Staff's Tools Skills, Staff's Team Skills, Staff's Application Knowledge.

Base of Size Measurement: Experience 2.0 Function Point Method.
6.2 Study by Angelis, Stamelos and Morisio
L. Angelis and his colleagues have also made an important
contribution towards identifying the different factors that
affect the size-effort relationship. These authors studied the
projects in the International Software Benchmarking
Standards Group (ISBSG) database, which contains data
about recently developed projects characterized mostly by
attributes of a categorical nature, such as the project business
area, organization type, application domain, and usage of
certain tools or methods. The authors found 7 important
factors that affect the relationship between size and effort.
The result of this study is given in more detail below (Table
7) [18]:
Table 7: Factors Affecting Productivity by L. Angelis [18].

Data set: ISBSG release 6.

Factors: 1. Development Type; 2. Development Platform; 3. Language Type; 4. Used Methodology; 5. Organization Type; 6. Business Area Type; 7. Application Type.

Base of Size Measurement: IFPUG Function Point.
The authors' method is based on characterizing the
software to be developed in terms of project and environment
attributes and comparing it with similar completed projects
retrieved from the ISBSG. The authors also note that human
factors are very important but were not taken into account in
previous studies; a recent study shows that psychometric data
should be collected to better support such empirical studies
[19].
6.3 Study by Liebchen and Shepperd
A study by Liebchen and Shepperd, which reports on an
ongoing investigation into software productivity and its
influencing factors, brought the following results (Table 8)
[20]:
Table 8: Factors Affecting Productivity by Martin Shepperd [20].

Data Set: 25,000 closed projects of a large multinational company.

Attributes Influencing Software Productivity: 1. The degree of technical innovation, business innovation, and application innovation; 2. Team complexity; 3. Client complexity; 4. Degree of concurrency; 5. Development team's degree of experience with tools, information technology, hardware, or the adopted methodology; 6. Project management experience.

Base of Size Measurement: Function Point.
This study confirms the intuitive notion that different
industry sectors exhibit differences in productivity, i.e. that
the industry sector itself affects productivity [20].
6.4 Summary of Other Studies
A study of different Swedish companies shows that the
following factors affect effort estimation [21]:
1. Requirement Volatility (Unclear and Changing
Requirement).
2. Unavailability of Templates.
3. Lack of coordination between product developed
and other parts of the project.
The following factors, considered important in the ISBSG
data repository, also affect productivity [22]:
1. Programming Language.
2. Team Size.
3. Organization Type.
4. Application Type.
Another recent study published in the Second ACM-
IEEE international Symposium on Empirical Software
Engineering and Measurement shows the following
results (Table 9) [23]:
Table 9: Factors Affecting Phase Distribution of Software Development Effort [23].

Data Set: China Software Benchmarking Standard Group.

Factors: 1. Development Life Cycle; 2. Development Size; 3. Software Size; 4. Team Size.

Base of Size Measurement: LOC.
By analyzing the factors collected in the above studies, we
find that all of them map to concepts under the root
NonFunctionalRequirement concept of the NFRs Ontology
[2].
7. CONCLUSIONS AND FUTURE WORK
In this paper, we have presented a novel effort estimation
methodology that can be applied in the early stages of
software development projects to estimate the effort of new
projects. The results show that our approach performs well: it
increases the correlation coefficient and decreases the error
rate of the generated effort estimation models, yielding more
accurate effort estimates for the new projects. We have
developed an effort estimation model based on the historical
data of previous projects to estimate software development
effort during the requirements specification phase. We have
also objectively assessed the impact of different non-
functional requirements and different problem domains on the
estimation of software development effort. Moreover, we
made the effort estimation model robust by dynamically
reducing the feature space using both statistical and semantic
techniques.
The methodology proposed in this paper is semi-
automated. Our previous work presented in [5] showed that
functional and non-functional requirements can be extracted
automatically and effectively from software requirements
documents using natural language processing techniques, and
our recent work [3,4] has shown that the functional size of
software can be computed objectively from any form of
unrestricted textual representation of functional requirements.
In the future, we plan to fully automate our methodology and
to integrate it with the work presented in [3] in order to obtain
the Size measurements automatically in the early stages of
software development projects.
REFERENCES
[1] Boehm, B. (2000) Safe and Simple Software Cost Analysis. IEEE
Software, 17 (5), 14-17.
[2] Kassab, M. (2009). Non-Functional Requirements:
Modeling and Assessment. VDM Verlag.
[3] Hussain, I., Kosseim, L., & Ormandjieva, O. (2010).
Towards Approximating COSMIC Functional Size from User
Requirements in Agile Development Processes Using Text Mining. In
LNCS: Natural Language Processing and Information Systems (Vol.
6177/2010, pp. 80-91). Germany: Springer-Verlag.
[4] Hussain, I., Ormandjieva, O., & Kosseim, L. (2009). Mining
and Clustering Textual Requirements to Measure Functional Size of
Software with COSMIC. Proceedings of the International Conference
on Software Engineering Research and Practice (SERP 2009).
[5] Hussain, I., Kosseim, L., & Ormandjieva, O. (2008). Using
Linguistic Knowledge to Classify Non-functional Requirements in SRS
documents. In LNCS: Natural Language and Information Systems (Vol.
5039/2008, pp. 287-298). Germany: Springer-Verlag.
[6] Microsoft Corp. (2011). Windows Intune: Software
Categories. URL: http://onlinehelp.microsoft.com/en-
us/windowsintune/ff399004.aspx (Last Retrieved: May 11, 2011)
[7] Boehm, B., Chulani, S. (2000). Software development cost
estimation approaches – a survey. Technical Report. University of
Southern California and IBM Research, Los Angeles, USA.
[8] Mark A. Hall. (2000). Correlation based Feature Selection for
Machine Learning (PhD Thesis). University of Waikato.
[9] Angelis, L., Stamelos, I. (2000). A Simulation Tool for Efficient
Analogy Based Cost Estimation. In Empirical Software Engineering,
(Vol. 5, pp. 35–68). Netherlands: Kluwer Academic Publishers.
[10] StatSoft Inc. (2011). Elementary Concepts in Statistics. URL:
http://www.statsoft.com/textbook/elementary-concepts-in-statistics/
(Last Retrieved: May 14, 2011)
[11] The University of Waikato. (2001). Machine Learning Project at
the University of Waikato in New Zealand. URL:
http://www.cs.waikato.ac.nz/ml/index.html (Last Retrieved: May 14,
2011)
[12] International Organization for Standardization. (2007). ISO/IEC IS
14143-1:2007: Information technology -- Software measurement --
Functional size measurement -- Part 1: Definition of concepts.
[13] COSMIC. (2009). The COSMIC Functional Size Measurement
Method. URL:
http://www.cosmicon.com/methodV3.asp (Last Retrieved: May 14,
2011)
[14] Gencel, C., & Demirors, O. (2008). Functional size measurement
revisited, ACM Transactions on Software Engineering and Methodology,
17(3), (pp. 1-36).
[15] Fenton, N.E., & Pfleeger, S.L. (1997). Software Metrics: A
rigorous and Practical Approach, International Thomson Computer
Press.
[16] Pfleeger, S. L., Wu, F., & Lewis, R. (2005). Software Cost
Estimation and Sizing Methods: Issues and Guidelines, RAND
Corporation.
[17] Maxwell, K. D., & Forselius, P. (2000). Benchmarking Software-
Development Productivity, IEEE Software, 17(1), (pp. 80-88).
[18] Angelis, L., Stamelos, I., & Morisio, M. (2001). Building A
Software Cost Estimation Model Based On Categorical Data, In
Proceedings of the 7th international Symposium on Software Metrics,
METRICS, IEEE Computer Society, Washington, DC.
[19] Feldt, R., Torkar, R., Angelis, L., & Samuelsson, M. (2008).
Towards individualized software engineering: empirical studies should
collect psychometrics, In Proceedings of the 2008 international
Workshop on Cooperative and Human Aspects of Software Engineering
(Leipzig, Germany, May 13 - 13, 2008), CHASE '08, ACM, New York,
NY, (pp. 49-52).
[20] Liebchen, G. A., & Shepperd, M. (2005). Software Productivity
Analysis of a Large Data Set and Issues of Confidentiality and Data
Quality, In Proceedings of the 11th IEEE international Software Metrics
Symposium (September 19 - 22, 2005), METRICS, IEEE Computer
Society, Washington, DC, 46.
[21] Magazinovic, A., & Pernstål, J. (2008). Any other cost estimation
inhibitors?, In Proceedings of the Second ACM-IEEE international
Symposium on Empirical Software Engineering and Measurement,
Kaiserslautern, Germany, ESEM '08. ACM, New York, NY, (pp. 233-
242).
[22] Lokan, C., Wright, T., Hill, P. R., & Stringer, M. (2001).
Organizational Benchmarking Using the ISBSG Data Repository, IEEE
Software, 18(5), (pp. 26-32).
[23] Yang, Y., He, M., Li, M., Wang, Q., & Boehm, B. (2008). Phase
distribution of software development effort, In Proceedings of the
Second ACM-IEEE international Symposium on Empirical Software
Engineering and Measurement (Kaiserslautern, Germany, October 09 -
10, 2008), ESEM '08, ACM, New York, NY, (pp. 61-69).