Software Measurement: A Necessary Scientific Basis

Norman Fenton
Abstract—Software measurement, like measurement in any other discipline, must adhere to the science of measurement if it is to gain widespread acceptance and validity. The observation of some very simple, but fundamental, principles of measurement can have an extremely beneficial effect on the subject. Measurement theory is used to highlight both weaknesses and strengths of software metrics work, including work on metrics validation. We identify a problem with the well-known Weyuker properties, but also show that a criticism of these properties by Cherniavsky and Smith is invalid. We show that the search for general software complexity measures is doomed to failure. However, the theory does help us to define and validate measures of specific complexity attributes. Above all, we are able to view software measurement in a very wide perspective, rationalising and relating its many diverse activities.

Index Terms—Software measurement, empirical studies, metrics, measurement theory, complexity, validation.
I. INTRODUCTION

IT IS over eleven years since DeMillo and Lipton outlined the relevance of measurement theory to software metrics [10]. More recent work by the author and others [4], [11], [46] has taken the measurement theory basis for software metrics considerably further. However, despite the important message in this work, and related material (such as [31], [34], [43], [38]), it has been largely ignored by both practitioners and researchers. The result is that much published work in software metrics is theoretically flawed. This paper therefore provides a timely summary and enhancement of measurement theory approaches, which enables us to expose problems in software metrics work and show how they can be avoided.

In Section II, we provide a concise summary of measurement theory. In Section III, we use the theory to show that the search for general-purpose, real-valued software "complexity" measures is doomed to failure. The assumption that fundamentally different views of complexity can be characterised by a single number is counter to the fundamental concepts of measurement theory. This leads us to re-examine critically the much cited Weyuker properties [45]. We explain how the most promising approach is to identify specific attributes of complexity and measure these separately. In Section IV, we use basic notions of measurement to describe a framework which enables us to view apparently diverse software measurement activities in a unified way. We look at some well-known approaches to software measurement within this framework, exposing both the good points and bad points.
Manuscript received August 1992; revised September 1993. This work was supported in part by IED project SMARTIE and ESPRIT project PDCS2. The author is with the Centre for Software Reliability, City University, London EC1V 0HB, UK. IEEE Log Number 9215569.
II. MEASUREMENT FUNDAMENTALS

In this section, we provide a summary of the key concepts from the science of measurement which are relevant to software metrics. First, we define the fundamental notions (which are generally not well understood) and then we summarise the representational theory of measurement. Finally, we explain how this leads inevitably to a goal-oriented approach.
A. What is Measurement?

Measurement is defined as the process by which numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them according to clearly defined rules [13], [36]. An entity may be an object, such as a person or a software specification, or an event, such as a journey or the testing phase of a software project. An attribute is a feature or property of the entity, such as the height or blood pressure (of a person), the length or functionality (of a specification), the cost (of a journey), or the duration (of the testing phase).

Just what is meant by the numerical assignment "describing" the attribute is made precise within the representational theory of measurement presented below. Informally, the assignment of numbers or symbols must preserve any intuitive and empirical observations about the attributes and entities. Thus, for example, when measuring the height of humans, bigger numbers must be assigned to the taller humans, although the numbers themselves will differ according to whether we use metres, inches, or feet. In most situations an attribute, even one as well understood as height of humans, may have a different intuitive meaning to different people. The normal way to get round this problem is to define a model for the entities being measured. The model reflects a specific viewpoint. Thus, for example, our model of a human might specify a particular type of posture and whether or not to include hair height or allow shoes to be worn. Once the model is fixed, there is a reasonable consensus about relations which hold for humans with respect to height (these are the empirical relations). The need for good models is particularly relevant in software engineering measurement. For example, even as simple a measure of length of programs as lines of code (LOC) requires a well defined model of programs which enables us to identify unique lines unambiguously. Similarly, to measure the effort spent on, say, the unit testing process we would need an agreed "model" of the process which at least makes clear when the process begins and ends.
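To make the role of the model concrete, the following minimal sketch (illustrative only; the counting rules are assumptions, not a standard) counts LOC under one explicitly stated model of a program: a line counts only if it is non-blank and not a comment-only line. A different model would yield a different, equally "correct", measure.

```python
def count_loc(source: str) -> int:
    """Count lines of code under one explicit model of a program:
    a line counts only if it is non-blank and not a comment-only line.
    The counting rules ARE the model; changing them changes the measure."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count

example = """# comment-only line: not counted under this model
x = 1

y = x + 1
"""
print(count_loc(example))  # 2 under this model; other models give other answers
```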
There are two broad types of measurement: direct and indirect. Direct measurement of an attribute is measurement which does not depend on the measurement of any other attribute. Indirect measurement of an attribute is measurement which involves the measurement of one or more other attributes. It turns out that while some attributes can be measured directly, we normally get more sophisticated measurement (meaning a more sophisticated scale, see below) if we measure indirectly. For a good discussion of these issues, see [25], [27].
Uses of Measurement: Assessment and Prediction: There are two broad uses of measurement: for assessment and for prediction. Predictive measurement of an attribute A will generally depend on a mathematical model relating A to some existing measures of attributes A1, ..., An. Accurate predictive measurement is inevitably dependent on careful (assessment type) measurement of the attributes A1, ..., An. For example, accurate estimates of project resources are not obtained by simply "applying" a cost estimation model with fixed parameters [26]. However, careful measurement of key attributes of completed projects can lead to accurate resource predictions for future projects [22]. Similarly, it is possible to get accurate predictions of the reliability of software in operation, but these are dependent on careful data collection relating to failure times during alpha-testing [5].

For predictive measurement the model alone is not sufficient. Additionally, we need to define the procedures for a) determining model parameters and b) interpreting the results. For example, in the case of software reliability prediction we might use maximum likelihood estimation for a) and Bayesian statistics for b). The model, together with procedures a) and b), is called a prediction system [29]. Using the same model will generally yield different results if we use different prediction procedures.

It must be stressed that, for all but the most trivial attributes, proposed predictive measures in software engineering are invariably stochastic rather than deterministic. The same is true of proposed indirect measures [14].
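To illustrate the distinction between a model and a prediction system, the sketch below separates the three ingredients just described: a model relating effort to size, a procedure a) for determining the model parameters from completed projects, and a procedure b) for interpreting the result. Every name and number in it is an invented assumption, not a published cost model.

```python
import math

# Model: predicted effort (person-months) = a * size^b, with size in KLOC.
def model(size_kloc: float, a: float, b: float) -> float:
    return a * size_kloc ** b

# Procedure a): determine the parameters a and b by least squares on
# log-transformed (size, effort) data from completed projects.
def determine_parameters(history: list[tuple[float, float]]) -> tuple[float, float]:
    xs = [math.log(s) for s, _ in history]
    ys = [math.log(e) for _, e in history]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b

# Procedure b): interpret the point estimate as a crude range, reflecting
# the stochastic nature of the prediction.
def interpret(point_estimate: float) -> str:
    return f"{0.7 * point_estimate:.1f} to {1.3 * point_estimate:.1f} person-months"

past_projects = [(10, 25), (24, 70), (50, 180)]   # (KLOC, person-months), invented data
a, b = determine_parameters(past_projects)
print(interpret(model(30, a, b)))
```

Using the same power-law model with, say, Bayesian rather than least-squares parameter estimation would constitute a different prediction system and would generally yield different predictions.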
Measurement Activities Must Have Clear Objectives: The basic definitions of measurement suggest that any measurement activity must proceed with very clear objectives or goals. First, you need to know whether you want to measure for assessment or for prediction. Next, you need to know exactly which entities are the subject of interest. Then you need to decide which attributes of the chosen entities are the significant ones. The definition of measurement makes clear the need to specify both an entity and an attribute before any measurement can be undertaken (a simple fact which has been ignored in much software metrics activity). Clearly, there are no definitive measures which can be prescribed for every objective in any application area. Yet for many years software practitioners expected precisely that: "what software metric should we be using?" was, and still is, a commonly asked question. It says something about the previous ignorance of scientific measurement in software engineering that the Goal/Question/Metric paradigm of Basili and Rombach [7] has been hailed as a revolutionary step forward. GQM spells out the above necessary obligations for setting objectives before embarking on any software measurement activity.
B. Representational Theory of Measurement

The Issues Addressed: Although there is no universally agreed theory of measurement, most approaches are devoted to resolving the following issues: what is and what is not measurement; which types of attributes can and cannot be measured, and on what kind of scales; how do we know if we have really measured an attribute; how to define measurement scales; when is an error margin acceptable or not; which statements about measurement are meaningful. The texts [13], [25], [27], [36], [40] all deal with these issues. Here we present a brief overview of the representational theory of measurement [13], [25].
Empirical Relation Systems: Direct measurement of a particular attribute possessed by a set of entities must be preceded by intuitive understanding of that attribute. This intuitive understanding leads to the identification of empirical relations between entities. The set of entities C, together with the set of empirical relations R, is called an empirical relation system (C, R) for the attribute in question. Thus the attribute of "height" of people gives rise to empirical relations like "is tall", "taller than", and "much taller than".
Representation Condition: To measure the attribute that is characterised by an empirical relation system (C, R) requires a mapping M into a numerical relation system (N, P). Specifically, M maps entities in C to numbers (or symbols) in N, and empirical relations in R are mapped to numerical relations in P, in such a way that all empirical relations are preserved. This is the so-called representation condition, and the mapping M is called a representation. The representation condition asserts that the correspondence between empirical and numerical relations is two way. Suppose, for example, that the empirical binary relation ≺ is mapped by M to the numerical relation <. Then, formally, we have the following instance of the representation condition:

x ≺ y ⟺ M(x) < M(y)

Thus, suppose, C is the set of all people and R contains the relation "taller than". A measure M of height would map C into the set of real numbers ℜ and "taller than" to the relation ">". The representation condition asserts that person A is taller than person B if and only if M(A) > M(B).
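The two-way nature of the representation condition can be checked mechanically for any finite example. In this sketch the observed "taller than" pairs and both candidate mappings are invented for illustration:

```python
# Representation condition for "taller than": (a, b) in TALLER  <=>  M[a] > M[b].

TALLER = {("Anna", "Ben"), ("Anna", "Carla"), ("Ben", "Carla")}   # empirical relation
PEOPLE = ["Anna", "Ben", "Carla"]

def is_representation(M: dict) -> bool:
    """True if the empirical relation holds exactly when the numerical one does."""
    return all(((a, b) in TALLER) == (M[a] > M[b])
               for a in PEOPLE for b in PEOPLE if a != b)

M_cm = {"Anna": 181, "Ben": 175, "Carla": 169}   # heights in centimetres
M_bad = {"Anna": 1, "Ben": 2, "Carla": 3}        # an arbitrary numerical assignment

print(is_representation(M_cm))    # True: the mapping is a representation
print(is_representation(M_bad))   # False: numbers alone do not make a measure
```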
By having to identify empirical relations for an attribute in advance, the representational approach to measurement avoids the temptation to define a poorly understood, but intuitively recognisable, attribute in terms of some numerical assignment. This is one of the most common failings in software metrics work. Classic examples are where attributes such as "complexity" or "quality" are equated with proposed numbers; for example, complexity with a "measure" like McCabe's cyclomatic number [30], or Halstead's E [18], and "quality" with Kafura and Henry's fan-in/fan-out equation [23].
Scale Types and Meaningfulness: Suppose that an attribute of some set of entities has been characterised by an empirical relation system (C, R). There may in general be many ways of assigning numbers which satisfy the representation condition. For example, if person A is taller than person B, then M(A) > M(B) irrespective of whether the measure M is in inches, feet, centimetres, metres, etc. Thus, there are many different measurement representations for the normal empirical relation system for the attribute of height of people. However, any two representations M and M' are related in a very specific way: there is always some constant c > 0 such that M = cM'
(so if M is the representation of height in centimetres and M' the representation in inches, then c = 2.54). This transformation from one valid representation into another is called an admissible transformation.
It is the class of admissible transformations which determines the scale type for an attribute (with respect to some fixed empirical relation system). For example, where every admissible transformation is a scalar multiplication (as for height), the scale type is called ratio. The ratio scale is a sophisticated scale of measurement which reflects a very rich empirical relation system. An attribute is never of ratio type a priori; we normally start with a crude understanding of an attribute and a means of measuring it. Accumulating data and analysing the results leads to the clarification and re-evaluation of the attribute. This in turn leads to refined and new empirical relations and improvements in the accuracy of the measurement; specifically, this is an improved scale.

For many software attributes we are still at the stage of having very crude empirical relation systems. In the case of an attribute like "criticality" of software failures, an empirical relation system would at best only identify different classes of failures and a binary relation "is more critical than". In this case, any two representations are related by a monotonically increasing transformation. With this class of admissible transformations, we have an ordinal scale type. In increasing order of sophistication, the best known scale types are: nominal, ordinal, interval, ratio, and absolute. For full details about the defining classes of admissible transformations, see [36].
This formal definition of scale type based on admissible transformations enables us to determine rigorously what kind of statements about measurement are meaningful. Formally, a statement involving measurement is meaningful if its truth or falsity remains unchanged under any admissible transformation of the measures involved. Thus, for example, it is meaningful to say that "Hermann is twice as tall as Peter"; if the statement is true (false) when we measure height in inches, it will remain true (false) when we measure height in any constant multiple of inches. On the other hand, the statement "Failure x is twice as critical as failure y" is not meaningful if we only have an ordinal scale empirical relation system for failure criticality. This is because a valid ordinal scale measure M could define M(x) = 6, M(y) = 3, while another valid ordinal scale measure M' could define M'(x) = 10, M'(y) = 9. In this case the statement is true under M but false under M'.
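The same argument can be replayed numerically. In the sketch below, the ordinal assignments M and M' are the ones used in the text; the heights are invented. The truth of the "twice as" statement survives every rescaling of a ratio-scale measure, but not a merely order-preserving rescaling of an ordinal one:

```python
# Meaningfulness: the truth value of a statement must be invariant under
# every admissible transformation of the measures involved.

# Ratio scale (height): admissible transformations are M -> c*M with c > 0.
hermann, peter = 190.0, 95.0                 # invented heights in centimetres
for c in (1.0, 0.3937, 12.0):                # cm, roughly cm->inches, arbitrary unit
    print(c * hermann == 2 * (c * peter))    # True every time: "twice as tall" is meaningful

# Ordinal scale (failure criticality): any monotonically increasing
# transformation is admissible, so ratios need not be preserved.
M  = {"x": 6,  "y": 3}                       # one valid ordinal assignment (from the text)
M2 = {"x": 10, "y": 9}                       # another valid ordinal assignment
print(M["x"] == 2 * M["y"])                  # True under M ...
print(M2["x"] == 2 * M2["y"])                # ... False under M': not meaningful
```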
The notion of meaningfulness also enables us to determine what kind of operations we can perform on different measures. For example, it is meaningful to use means for computing the average of a set of data measured on a ratio scale, but not on an ordinal scale. Medians are meaningful for an ordinal scale but not for a nominal scale. Again, these basic observations have been ignored in many software measurement studies, where a common mistake is to use the mean (rather than the median) as a measure of average for data which is only ordinal. Good examples of practical applications of meaningfulness ideas may be found in [3], [17], [37], [39]. An alternative definition of meaningfulness is given in [16].
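A small numerical illustration of the point about averages (the ratings below are invented ordinal scores): an admissible monotone transformation can reverse a comparison of means, but it cannot reverse a comparison of medians.

```python
from statistics import mean, median

group_a = [1, 1, 5]        # invented ordinal ratings (1 = lowest ... 5 = highest)
group_b = [2, 2, 2]

def monotone(x: int) -> int:
    """A monotonically increasing re-coding: admissible for an ordinal scale."""
    return {1: 1, 2: 10, 3: 11, 4: 12, 5: 13}[x]

ta = [monotone(x) for x in group_a]
tb = [monotone(x) for x in group_b]

print(mean(group_a) > mean(group_b), mean(ta) > mean(tb))          # True False: verdict flips
print(median(group_a) < median(group_b), median(ta) < median(tb))  # True True: verdict stable
```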
Representation Theorems: The serious mathematical aspects of measurement theory are largely concerned with theorems which assert conditions under which certain scales of direct measurement are possible for certain relation systems. A typical example of such a theorem, due to Cantor, gives conditions for real-valued ordinal-scale measurement when we have a countable set of entities C and a binary relation b on C:

Cantor's Theorem: The empirical relation system (C, b) has a representation in (ℜ, <) if and only if b is a strict weak order. The scale type is ordinal when such a representation exists.

The relation b being a "strict weak order" means that it is: 1) asymmetric (xRy implies that it is not the case that yRx), and 2) negatively transitive (xRy implies that for every z ∈ C, either xRz or zRy).
III. MEASURING SOFTWARE "COMPLEXITY"

The representational theory of measurement is especially relevant to the study of software complexity measures. In this section we show that the search for a general-purpose real-valued complexity measure is doomed to failure, but that there are promising axiomatic approaches which help us to measure specific complexity attributes. However, one well-known axiomatic approach [45] has serious weaknesses because it attempts to characterise incompatible views of complexity.
A. General Complexity Measures: The Impossible Holy Grail

For many years researchers have sought to characterise general notions of "complexity" by a single real number. To simplify matters, we first restrict our attention to those measures which attempt only to characterise control-flow complexity. If we can show that it is impossible to define a general measure of control-flow complexity, then the impossibility of even more general complexity measures is certain.

Zuse cites dozens of proposed control-flow complexity measures in [46]. There seems to be a minimum assumption that the empirical relation system for complexity of programs leads to (at least) an ordinal scale. This is because of the following hypotheses which are implicit in much of the work.

Hypothesis 1: Let C be the class of programs. Then the attribute control-flow "complexity" is characterised by an empirical relation system which includes a binary relation b "less complex than"; specifically (x, y) ∈ b if there is a consensus that x is less complex than y.

Hypothesis 2: The proposed measure M: C → ℜ is a representation of complexity in which the relation b is mapped to <.

Hypothesis 1 seems plausible. It does not state that C is totally ordered with respect to b; only that there is some general view of complexity for which there would be a reasonable consensus that certain pairs of programs are in b. For example, in Fig. 1, it seems plausible that (x, y) ∈ b (from the measurement theory viewpoint it would be good enough if most programmers agreed this).
Fig. 1. Complexity relation not negatively transitive?
Some pairs appear to be incomparable, such as x and z, or y and z; if people were asked to "rank" these for complexity they would inevitably end up asking questions like "what is meant by complexity" before attempting to answer. Since b is supposed to capture a general view of complexity, this would be enough to deduce that (x, z) ∉ b and (z, x) ∉ b, and also that (z, y) ∉ b and (y, z) ∉ b. The idea of the inevitable incomparability of some programs, even for some specific views of complexity, has also been noted in [43].
Unfortunately, while Hypothesis 1 is plausible, Hypothesis 2 can be dismissed because of the Representation Condition. The problem is the "incomparable" programs. While b is not a total order in C, the relation < is a total order in ℜ. The measurement mapping M might force an order which has to be reflected back in C. Thus, if for example M(z) < M(y) (as in the case of McCabe's cyclomatic complexity measure in Fig. 1, where M(z) = 2 and M(y) = 3) then, if M is really a measure of complexity, the Representation Condition asserts that we must also have (z, y) ∈ b, for which there is no consensus. Formally, we can prove the following theorem.
Theorem 1: Assuming Hypothesis 1, there is no general notion of control-flow complexity of programs which can be measured on an ordinal scale in (ℜ, <).

To prove this, the previous argument is made formal by appealing to Cantor's Theorem. It is enough to show that the relation b is not a strict weak order. This follows since (according to our definition of b) it is reasonable to deduce that (x, y) ∈ b but (x, z) ∉ b and (z, y) ∉ b (since it is not clear that any consensus exists about the relative complexities of x and z, and of y and z).
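The proof sketch can be checked mechanically. Assuming, as in the discussion of Fig. 1, that the consensus relation b contains only the pair (x, y), the following sketch confirms that b is not negatively transitive and so, by Cantor's Theorem, has no ordinal representation in the reals:

```python
def is_strict_weak_order(entities: set, b: set) -> bool:
    """b must be asymmetric and negatively transitive (Cantor's condition
    for an ordinal representation in (R, <))."""
    asymmetric = all((y, x) not in b for (x, y) in b)
    negatively_transitive = all((x, z) in b or (z, y) in b
                                for (x, y) in b for z in entities)
    return asymmetric and negatively_transitive

C = {"x", "y", "z"}
print(is_strict_weak_order(C, {("x", "y")}))                          # False: fails at z
print(is_strict_weak_order(C, {("x", "y"), ("x", "z"), ("z", "y")}))  # True: a total strict order
```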
The theorem should put an end to the search for the holy grail of a general complexity measure. However, it does not rule out the search for measures that characterise specific views of complexity (which is the true measurement theory approach). For example, a specific program complexity attribute is "the number of independent paths." McCabe's cyclomatic complexity is an absolute scale measure of this attribute. It might even be a ratio scale measure of the attribute of "testability" with respect to independent path testing. Other specific attributes of complexity, such as the maximum depth of nesting, distribution of primes in the decomposition tree, and the number of paths of various types, can all be measured rigorously and automatically [11], [34].
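For a single-component control-flow graph with e edges and n nodes, McCabe's number is v = e - n + 2, which counts the linearly independent paths. A minimal sketch (the edge-list encoding and the example graph are assumptions made for illustration; it does not reproduce Fig. 1):

```python
# Cyclomatic number v = e - n + 2p for a control-flow graph with e edges,
# n nodes and p connected components (p = 1 for a single program flowgraph).

def cyclomatic_number(edges: list[tuple[str, str]], p: int = 1) -> int:
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * p

# An invented if-then-else flowgraph: entry branches to two blocks that rejoin.
if_then_else = [("entry", "then"), ("entry", "else"), ("then", "exit"), ("else", "exit")]
print(cyclomatic_number(if_then_else))   # 4 - 4 + 2 = 2 independent paths
```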
This idea of looking at measures with respect to particular viewpoints of complexity is taken much further by Zuse [46]. Zuse uses measurement theory to analyse the many complexity measures in the literature; he shows which viewpoint and assumptions are necessary to use the measures on different scales. The beauty and relevance of measurement theory is such that it clearly underlies some of the most promising work in software measurement even where the authors have not made the explicit link. Notable in this respect are the innovative approaches of Melton et al. [31] and Tian and Zelkowitz [43]. In both of these works, the authors seek to characterise specific views of complexity. In [43], the authors do this by proposing a number of axioms reflecting viewpoints of complexity: in the context of measurement theory, the axioms correspond to particular empirical relations. This means that the representation condition can be used to determine the acceptability of potential measures.

Melton et al. [31] characterise a specific view of program complexity by specifying precisely an order relation ≼ on program flowgraphs; in other words, they define the binary relation b (of Hypothesis 1) as ≼. The benefit of this approach is that the view of complexity is explicit and the search for representations (i.e., measures of this view of complexity) becomes purely analytical. The only weakness in [31] is the assertion that a measure M is "any real-valued mapping for which M(x) ≤ M(y) whenever x ≼ y." This ignores the sufficiency condition of the Representation Condition. Thus, while McCabe's cyclomatic complexity [30] satisfies necessity (and is therefore a "measure" according to Melton et al. [31]), it is not a measure in the representational sense (since in Fig. 1, M(z) < M(y) but it is not the case that z ≼ y). Interestingly, Tian and Zelkowitz also use the same weakened form of representation, but acknowledge that they "would like the relationship" to be necessary and sufficient.
It follows from Cantor's theorem that there is no representation of Melton's (F, ≼) in (ℜ, <). However, it is still possible to get ordinal measurement in a number system which is not (ℜ, <) (and hence one in which the numerical relation need not be a strict weak order), although the resulting measure is of purely theoretical interest. It is shown in [12] that there is a representation in (Nat, |), where Nat is the set of natural numbers and | is the divides relation. The construction of the measurement mapping M is based on ensuring that incomparable flowgraphs are mapped to mutually prime numbers. For the flowgraphs of Fig. 1, M(z) = 2, M(x) is a fairly large multiple of 3, and M(y) is a very large multiple of 3.
B. The Weyuker Properties

Despite the above evidence, researchers have continued to search for single real-valued complexity measures which are expected to have the magical properties of being key indicators of such diverse attributes as comprehensibility, correctness, maintainability, reliability, testability, and ease of implementation [30], [32]. A high value for a "complexity" measure is supposed to be indicative of low comprehensibility, low reliability, etc. Sometimes these measures are also called "quality" measures [23]. In this case, high values of the measure actually indicate low values of the quality attributes.

The danger of attempting to find measures which characterise so many different attributes is that inevitably the measures have to satisfy conflicting aims. This is counter to the representational theory of measurement. Nobody would expect a single number M to characterise every notion of "quality" of people, which might include the very different notions of a) physical strength, and b) intelligence. If such a measure M existed, it would have to satisfy a) M(A) > M(B) whenever A is stronger than B, and b) M(A) > M(B) whenever A is more intelligent than B. The fact that some highly intelligent people are very weak physically ensures that no M can satisfy both these properties. Nevertheless, Weyuker's list of properties [45] seems to suggest the need for analogous software "complexity" measures. For example, two of the properties that Weyuker proposes any complexity measure M should satisfy are the following.

Property A: For any program bodies P, Q: M(P) ≤ M(P;Q) and M(Q) ≤ M(P;Q).

Property B: There exist program bodies P, Q, and R such that M(P) = M(Q) and M(P;R) ≠ M(Q;R).

Property A asserts that adding code to a program cannot decrease its complexity. This reflects the view that program size is a key factor in its complexity. We can also conclude from Property A that low comprehensibility is not a key factor in complexity. This is because it is widely believed that in certain cases we can understand a program more easily as we see more of it [43]. Thus, while a "size" type complexity measure M should satisfy Property A, a "comprehensibility" type complexity measure M cannot satisfy Property A.

Property B asserts that we can find two program bodies of equal complexity which, when separately concatenated to the same third program, yield programs of different complexity. Clearly, this property has much to do with comprehensibility and little to do with size.
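As a concrete check (the measure, the encoding of program bodies, and the examples are all invented for illustration), a purely additive "size" measure such as a statement count satisfies Property A but can never witness Property B, because M(P) = M(Q) forces M(P;R) = M(Q;R):

```python
# A purely size-like measure: M counts statements, and concatenation adds.

def M(body: list[str]) -> int:
    return len(body)

def concat(p: list[str], q: list[str]) -> list[str]:
    return p + q                        # the composition written P;Q in the text

P = ["a = 1", "b = a + 2"]
Q = ["print(b)"]
R = ["c = 0", "c = c + 1"]

# Property A: adding code never decreases the measure.
print(M(P) <= M(concat(P, Q)) and M(Q) <= M(concat(P, Q)))        # True

# Property B needs M(P) == M(Q) with M(P;R) != M(Q;R): impossible here,
# since M(concat(X, R)) == M(X) + M(R) for every X.
P2, Q2 = ["x = 1"], ["y = 2"]
print(M(P2) == M(Q2), M(concat(P2, R)) != M(concat(Q2, R)))       # True False
```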
Thus, Properties A and B are relevant for very different, and incompatible, views of complexity. They cannot both be satisfied by a single measure which captures notions of size and low comprehensibility. Although the above argument is not formal, Zuse has recently proved [47] that, within the representational theory of measurement, Weyuker's axioms are contradictory. Formally, he shows that while Property A explicitly requires the ratio scale for M, Property B explicitly excludes the ratio scale.

The general misunderstanding of scientific measurement in software engineering is illustrated further in a recent paper [9], which was itself a critique of Weyuker's axioms. Cherniavsky and Smith define a code-based "metric" which satisfies all of Weyuker's axioms but which, they rightly claim, is not a sensible measure of complexity. They conclude that axiomatic approaches may not work. There is no justification for their conclusion. On the one hand, as they readily accept, there was no suggestion that Weyuker's axioms were complete. More importantly, what they fail to observe is that Weyuker did not propose that the axioms were sufficient; she only proposed that they were necessary. Since the Cherniavsky/Smith "metric" is clearly not a measure (in our sense) of any specific attribute, then showing that it satisfies any set of necessary axioms for any measure is of no interest at all.

These problems would have been avoided by a simple lesson from measurement theory: the definition of a numerical mapping does not in itself constitute measurement. It is popular in software engineering to use the word "metric" for any number extracted from a software entity. Thus, while every measure is a "metric", the converse is certainly not true. The confusion in [9], and also in [45], arises from wrongly equating these two concepts, and ignoring the theory of measurement completely.

IV. UNIFYING FRAMEWORK FOR SOFTWARE MEASUREMENT

A. A Classification of Software Measures

In software measurement activity, there are three classes of entities of interest [11]:

Processes: are any software related activities which take place over time.

Products: are any artefacts, deliverables, or documents which arise out of the processes.

Resources: are the items which are inputs to processes.

We make a distinction between attributes of these which are internal and external.
Internal attributes of a product, process, or resource are those which can be measured purely in terms of the product, process, or resource itself. For example, length is an internal attribute of any software document, while elapsed time is an internal attribute of any software process.

External attributes of a product, process, or resource are those which can only be measured with respect to how the product, process, or resource relates to other entities in its environment. For example, reliability of a program (a product attribute) is dependent not just on the program itself, but on the compiler, machine, and user. Productivity is an external attribute of a resource, namely people (either as individuals or groups); it is clearly dependent on many aspects of the process and the quality of products delivered.

Software managers and software users would most like to measure and predict external attributes. Unfortunately, they are necessarily only measurable indirectly. For example, productivity of personnel is most commonly measured as a ratio of: size of code delivered (an internal product attribute); and effort (an internal process attribute). The problems with this oversimplistic measure of productivity have been well documented. Similarly, "quality" of a software system (a very high level external product attribute) is often defined as the ratio of: faults discovered during formal testing (an internal process attribute); and size (measured in KLOC) [19]. While reasonable for developers, this measure of quality cannot be said to be a valid measure from the viewpoint of the user. Empirical studies have suggested there may be little real correlation between faults and actual failures of the software in operation. For example, Adams [1] made a significant study of a number of large software systems being used on many sites around the world; he discovered that a large proportion of faults almost never lead to failures, while less than 2% of the known faults caused most of the common failures.

It is rare for a genuine consensus to be reached about the contrived definitions of external attributes. An exception is the definition of reliability of code in terms of probability of failure-free operation within a given usage environment [21], [29]. In this case, we need to measure internal process attributes. The processes are each of the periods of software operation
be considered; module calling structures with widely varying widths are not considered to be very modular because of ideas of chunking from cognitive psychology.
D. Validating Software Measures

Validating a software measure in the assessment sense is equivalent to demonstrating empirically that the representation condition is satisfied for the attribute being measured. For a measure in the predictive sense, all the components of the prediction system must be clearly specified and a proper hypothesis proposed, before experimental design for validation can begin.

Despite these simple obligations for measurement validation, the software engineering literature abounds with so-called validation studies which have ignored them totally. This phenomenon has been examined thoroughly in [14] and [33], and fortunately there is some recent work addressing the problem [38]. Typically, a measure (in the assessment sense) is proposed. For example, this might be a measure of an internal structural attribute of source code. The measure is "validated" by showing that it correlates with some other existing measure. What this really means is that the proposed measure is the main independent variable in a prediction system. Unfortunately, these studies commonly fail to specify the required prediction system and experimental hypothesis. Worse still, they do not specify, in advance, what is the dependent variable being predicted. The result is often an attempt to find fortuitous correlations with any data which happens to be available. In many cases, the only such data happens to be some other structural measure. For example, in [28], structural type measures are "validated" by showing that they correlate with "established" measures like LOC and McCabe's cyclomatic complexity number. In such cases, the validation study tells us nothing of interest. The general dangers of the "shotgun" approach to correlations of software measures have been highlighted in [8].

The search for rigorous software measures has not been helped by a commonly held viewpoint that no measure is "valid" unless it is a good predictor of effort. An analogy would be to reject the usefulness of measuring a person's height on the grounds that it tells us nothing about that person's intelligence. The result is that potentially valid measures of important internal attributes become distorted. Consider, for example, Albrecht's function points [2]. In this approach, the unadjusted function count UFC seems to be a reasonable measure of the important attribute of functionality in specification documents. However, the intention was to define a single size measure as the main independent variable in prediction systems for effort. Because of this, a technical complexity factor (TCF) is applied to UFC to arrive at the number of function points FP, which is the model in the prediction system for effort. The TCF takes account of 14 product and process attributes in Albrecht's approach, and even more in Symons' approach [41]. This kind of adjustment (to a measure of system functionality) is analogous to redefining measures of height of people in such a way that the measures correlate more closely with intelligence. Interestingly, Jeffery [20] has shown that the complexity adjustments do not even improve effort predictions; there were no significant differences between UFC and FP as effort predictors in his studies. Similar results have been reported by Kitchenham and Kansala [24].
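For reference, the adjustment criticised above has a simple arithmetic form: FP = UFC x TCF, where the technical complexity factor is TCF = 0.65 + 0.01 x (sum of the 14 factor ratings, each 0-5), so it scales UFC by between 0.65 and 1.35. The sketch below uses invented counts and ratings, and the commonly quoted "average" weights should be treated as an assumption rather than a definitive table:

```python
# Function points: FP = UFC * TCF, with TCF = 0.65 + 0.01 * sum of 14 ratings.
# Weights, counts and ratings below are illustrative assumptions.

weights = {"inputs": 4, "outputs": 5, "inquiries": 4, "files": 10, "interfaces": 7}
counts  = {"inputs": 20, "outputs": 12, "inquiries": 8, "files": 6, "interfaces": 2}

ufc = sum(weights[k] * counts[k] for k in weights)   # unadjusted function count

ratings = [3] * 14                                   # 14 technical factors, each rated 0..5
tcf = 0.65 + 0.01 * sum(ratings)                     # 0.65 + 0.42 = 1.07

fp = ufc * tcf
print(ufc, round(tcf, 2), round(fp, 1))              # 246 1.07 263.2
```

As the text notes, the studies reported in [20] and [24] suggest this adjustment step adds little or nothing to UFC as an effort predictor.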
V. SUMMARY

Contrary to popular opinion, software measurement, like measurement in any other discipline, must adhere to the science of measurement if it is to gain widespread acceptance and validity. The representational theory of measurement asserts that measurement is the process of assigning numbers or symbols to attributes of entities in such a way that all empirical relations are preserved. The entities of interest in software can be classified as processes, products, or resources. Anything we may wish to measure or predict is an identifiable attribute of these. Attributes are either internal or external. Although external attributes like reliability of products, stability of processes, or productivity of resources tend to be the ones we are most interested in measuring, we cannot do so directly. We are generally forced to measure indirectly in terms of internal attributes. Predictive measurement requires a prediction system. This means not just a model but also a set of prediction procedures for determining the model parameters and applying the results. These in turn are dependent on accurate measurements in the assessment sense.

We have used measurement theory to highlight both weaknesses and strengths of software metrics work, including work on metrics validation. Invariably, it seems that the most promising theoretical work has been using the key components of measurement theory. We showed that the search for general software complexity measures is doomed to failure. However, the theory does help us to define and validate measures of specific complexity attributes.
ACKNOWLEDGMENT

I would like to thank B. Littlewood and M. Neil for providing comments on an earlier draft of this paper, and P. Mellor, S. Page, and R. Whitty for sharing views and information that have influenced its contents. Finally, I would like to thank four anonymous referees who made suggestions that clearly improved the paper.
REFERENCES

[1] E. Adams, "Optimizing preventive service of software products," IBM Res. J., vol. 28, no. 1, pp. 2-14, 1984.
[2] A. J. Albrecht, "Measuring application development productivity," in Proc. IBM Applic. Dev. Joint SHARE/GUIDE Symp., Monterey, CA, 1979, pp. 83-92.
[3] J. Aczel, F. S. Roberts, and Z. Rosenbaum, "On scientific laws without dimensional constants," J. Math. Anal. Applicat., vol. 119, pp. 389-416, 1986.
[4] A. Baker, J. Bieman, N. E. Fenton, D. Gustafson, A. Melton, and R. W. Whitty, "A philosophy for software measurement," J. Syst. Software, vol. 12, pp. 277-281, July 1990.
[5] S. Brocklehurst, P. Y. Chan, B. Littlewood, and J. Snell, "Recalibrating software reliability models," IEEE Trans. Software Eng., vol. 16, no. 4, pp. 458-470, Apr. 1990.
[6] B. Boehm, Software Engineering Economics. Englewood Cliffs, NJ: Prentice Hall, 1981.
[7] V. Basili and D. Rombach, "The TAME project: Towards improvement-orientated software environments," IEEE Trans. Software Eng., vol. 14, no. 6, pp. 758-773, June 1988.
[8] R. E. Courtney and D. A. Gustafson, "Shotgun correlations in software measures," IEE Software Eng. J., vol. 8, no. 1, pp. 5-13, 1993.
[9] J. C. Cherniavsky and C. H. Smith, "On Weyuker's axioms for software complexity measures," IEEE Trans. Software Eng., vol. 17, no. 6, pp. 636-638, June 1991.
[10] R. A. DeMillo and R. J. Lipton, "Software project forecasting," in Software Metrics, A. J. Perlis, F. G. Sayward, and M. Shaw, Eds. Cambridge, MA: MIT Press, 1981, pp. 77-89.
[11] N. E. Fenton, Software Metrics: A Rigorous Approach. London: Chapman & Hall, 1991.
[12] N. E. Fenton, "When a software measure is not a measure," IEE Software Eng. J., vol. 7, no. 5, pp. 357-362, May 1992.
[13] L. Finkelstein, "A review of the fundamental concepts of measurement," Measurement, vol. 2, no. 1, pp. 25-34, 1984.
[14] N. E. Fenton and B. A. Kitchenham, "Validating software measures," J. Software Testing, Verification & Reliability, vol. 1, no. 2, pp. 27-42, 1991.
[15] N. E. Fenton and A. Melton, "Deriving structurally based software measures," J. Syst. Software, vol. 12, pp. 177-187, July 1990.
[16] J.-C. Falmagne and L. Narens, "Scales and meaningfulness of quantitative laws," Synthese, vol. 55, pp. 287-325, 1983.
[17] P. J. Fleming and J. J. Wallace, "How not to lie with statistics," Commun. ACM, vol. 29, pp. 218-221, 1986.
[18] M. H. Halstead, Elements of Software Science. Amsterdam: Elsevier North Holland, 1975.
[19] J. Inglis, "Standard software quality metrics," AT&T Tech. J., vol. 65, no. 2, pp. 113-118, Feb. 1985.
[20] D. R. Jeffery, G. C. Low, and M. Barnes, "A comparison of function point counting techniques," IEEE Trans. Software Eng., vol. 19, no. 5, pp. 529-532, May 1993.
[21] Z. Jelinski and P. B. Moranda, "Software reliability research," in Statistical Computer Performance Evaluation, W. Freiberger, Ed. New York: Academic Press, 1972, pp. 465-484.
[22] B. A. Kitchenham and B. de Neumann, "Cost modelling and estimation," in Software Reliability Handbook, P. Rook, Ed. New York: Elsevier Applied Science, 1990, pp. 333-376.
[23] D. Kafura and S. Henry, "Software quality metrics based on interconnectivity," J. Syst. Software, vol. 2, pp. 121-131, 1981.
[24] B. A. Kitchenham and K. Kansala, "Inter-item correlations among function points," in Proc. IEEE Software Metrics Symp., Baltimore, MD, 1993, pp. 11-15.
[25] D. H. Krantz, R. D. Luce, P. Suppes, and A. Tversky, Foundations of Measurement, vol. 1. New York: Academic Press, 1971.
[26] B. A. Kitchenham and N. R. Taylor, "Software project development cost estimation," J. Syst. Software, vol. 5, pp. 267-278, 1985.
[27] H. E. Kyburg, Theory and Measurement. Cambridge: Cambridge Univ. Press, 1984.
[28] H. F. Li and W. K. Cheung, "An empirical study of software metrics," IEEE Trans. Software Eng., vol. 13, no. 6, June 1987.
[29] B. Littlewood, "Forecasting software reliability," in Software Reliability Modelling and Identification (Lecture Notes in Computer Science, vol. 341), S. Bittanti, Ed. New York: Springer-Verlag, 1988, pp. 141-209.
[30] T. J. McCabe, "A complexity measure," IEEE Trans. Software Eng., vol. SE-2, no. 4, pp. 308-320, Dec. 1976.
[31] A. C. Melton, D. A. Gustafson, J. M. Bieman, and A. A. Baker, "Mathematical perspective of software measures research," IEE Software Eng. J., vol. 5, no. 5, pp. 246-254, 1990.
[32] J. C. Munson and T. M. Khoshgoftaar, "The detection of fault prone modules," IEEE Trans. Software Eng., vol. 18, no. 5, pp. 423-433, May 1992.
[33] M. Neil, "Multivariate assessment of software products," J. Software Testing, Verification & Reliability, to appear, 1992.
[34] R. E. Prather and S. G. Giulieri, "Decomposition of flowchart schemata," Comput. J., vol. 24, no. 3, pp. 258-262, 1981.
[35] R. S. Pressman, Software Engineering: A Practitioner's Approach, 2nd ed. New York: McGraw-Hill Int., 1987.
[36] F. S. Roberts, Measurement Theory with Applications to Decision Making, Utility, and the Social Sciences. Reading, MA: Addison Wesley, 1979.
[37] F. S. Roberts, "Applications of the theory of meaningfulness to psychology," J. Math. Psychol., vol. 29, pp. 311-332, 1985.
[38] N. F. Schneidewind, "Methodology for validating software metrics," IEEE Trans. Software Eng., vol. 18, no. 5, pp. 410-422, May 1992.
[39] J. E. Smith, "Characterizing computer performance with a single number," Commun. ACM, vol. 31, pp. 1202-1206, 1988.
[40] P. H. Sydenham, Ed., Handbook of Measurement Science, vol. 1. New York: J. Wiley, 1982.
[41] C. R. Symons, "Function point analysis: Difficulties and improvements," IEEE Trans. Software Eng., vol. 14, no. 1, pp. 2-11, Jan. 1988.
[42] D. A. Troy and S. H. Zweben, "Measuring the quality of structured design," J. Syst. Software, vol. 2, pp. 113-120, 1981.
[43] J. Tian and M. V. Zelkowitz, "A formal program complexity model and its applications," J. Syst. Software, vol. 17, pp. 253-266, 1992.
[44] S. N. Woodfield, H. E. Dunsmore, and V. Y. Shen, "The effects of modularisation and comments on program comprehension," in Proc. 5th Int. Conf. Software Eng., 1979, pp. 213-223.
[45] E. J. Weyuker, "Evaluating software complexity measures," IEEE Trans. Software Eng., vol. 14, no. 9, pp. 1357-1365, Sept. 1988.
[46] H. Zuse, Software Complexity: Measures and Methods. Amsterdam: de Gruyter, 1990.
[47] H. Zuse, "Support of experimentation by measurement theory," in Experimental Software Engineering Issues (Lecture Notes in Computer Science, vol. 706), H. D. Rombach, V. R. Basili, and R. W. Selby, Eds. New York: Springer-Verlag, 1993, pp. 137-140.
Norman Fenton is a Professor of Computing Science in the Centre for Software Reliability at City University, London, UK. He was previously the Director of the Centre for Systems and Software Engineering (CSSE) at South Bank University and a Post-Doctoral Research Fellow at Oxford University (Mathematical Institute).

He has consulted widely to industry about metrics programs, and has also led numerous collaborative projects. One such current project is developing a measurement-based framework for the assessment of software engineering standards and methods. His research interests are in software measurement and formal methods of software development. He has written three books on these subjects and published many papers.

Prof. Fenton is Editor of Chapman and Hall's Computer Science Research and Practice Series and is on the Editorial Board of the Software Quality Journal. He has chaired several international conferences on software metrics. Prof. Fenton is Secretary of the (National) Centre for Software Reliability. He is a Chartered Engineer (Member of the IEE), an Associate Fellow of the Institute of Mathematics and its Applications, and a member of the IEEE Computer Society.
... In the 1990s, before researchers had access to extensive SE data, there were prolonged, somewhat heated, theoretical debates on the value of metric X vs metric Y (e.g. [12]). So to test if (e.g.) McCabe's cyclomatic complexity metrics [13] were any better than (e.g.) Halstead readability metrics [14], our 2007 paper applied feature pruning. ...
Preprint
Full-text available
Industry can get any research it wants, just by publishing a baseline result along with the data and scripts need to reproduce that work. For instance, the paper ``Data Mining Static Code Attributes to Learn Defect Predictors'' presented such a baseline, using static code attributes from NASA projects. Those result were enthusiastically embraced by a software engineering research community, hungry for data. At its peak (2016) this paper was SE's most cited paper (per month). By 2018, twenty percent of leading TSE papers (according to Google Scholar Metrics), incorporated artifacts introduced and disseminated by this research. This brief note reflects on what we should remember, and what we should forget, from that paper.
... With the democratization of the Internet, there is a great trend to develop quality metrics (Gregory, 2019;Pressman & Maxim, 2015;Hoftman, R., Marx, M. & Hancock, 2008;Fenton, 1994). The purpose is to evaluate computer programs with methods and techniques, from an endless number of professional groups, who define themselves as experts. ...
Chapter
Full-text available
The study analyzes the invisible factors that influence the innovation and quality of the software of the 21st century, through natural language and programming languages. The analysis of languages shows how technological evolution influences the innate and acquired skills of human beings, especially those who are dedicated to software engineering and all its derivations in the field of ICTs. There is a detailed list of internal and external factors affecting the qualitative and reliable software industry. It also examines the relationships between innovative and creative education of experts in new technologies, programming over time, and the role of social networks. Finally, a state of the art on the myths and realities of the software profession in the new millennium is presented, which together with a group of rhetorical questions allows generating new lines of research within the formal and factual sciences, starting from the inquiries and conclusions of this work.
... If we intuitively understand (e.g., via visual observations) that a person A (of height 175cm) is taller than another person B (of height 172cm), this mapping will give us ℎ ℎ (A) > ℎ ℎ (B) (i.e., 175cm > 172cm) according to the measurement results. Meanwhile, if we have ℎ ℎ (A) > ℎ ℎ (B) based on our measurements (e.g., ℎ ℎ (A)=175cm and ℎ ℎ (B)=172cm), we will intuitively understand that A is taller than B [40] via visual observations. Then, we can conclude that the height metric is validated. ...
Article
Full-text available
A defining, unique aspect of distributed systems lies in interprocess communication (IPC) through which distributed components interact and collaborate toward the holistic system behaviors. This highly decoupled construction intuitively contributes to the scalability, performance, and resiliency advantages of distributed software, but also adds largely to their greater complexity, compared to centralized software. Yet despite the importance of IPC in distributed systems, little is known about how to quantify IPC-induced behaviors in these systems through IPC measurement and how such behaviors may be related to the quality of distributed software. To answer these questions, in this paper, we present DistMeasure, a framework for measuring distributed software systems via the lens of IPC hence enabling the study of its correlation with distributed system quality. Underlying DistMeasure is a novel set of IPC metrics that focus on gauging the coupling and cohesion of distributed processes. Through these metrics, DistMeasure quantifies relevant run-time characteristics of distributed systems and their quality relevance, covering a range of quality aspects each via respective direct quality metrics. Further, DistMeasure enables predictive assessment of distributed system quality in those aspects via learning-based anomaly detection with respect to the corresponding quality metrics based on their significant correlations with related IPC metrics. Using DistMeasure, we demonstrated the practicality and usefulness of IPC measurement against 11 real-world distributed systems and their diverse execution scenarios. Among other findings, our results revealed that IPC has a strong correlation with distributed system complexity, performance efficiency, and security. Higher IPC coupling between distributed processes tended to be negatively indicative of distributed software quality, while more cohesive processes have positive quality implications. Yet overall IPC-induced behaviors are largely independent of the system scale, and higher (lower) process coupling does not necessarily come with lower (higher) process cohesion. We also show promising merits (with 98% precision/recall/F1) of IPC measurement (e.g., class-level coupling and process-level cohesion) for predictive anomaly assessment of various aspects (e.g., attack surface and performance efficiency) of distributed system quality.
... According to Fenton [1994], "measurement is the process by which numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them according to clearly defined rules". A software metric is a clearly defined rule that assigns values to software entities, such as components, classes, methods, or attributes of development processes. ...
Article
Full-text available
Software metrics measure quantifiable or countable software characteristics. Researchers may apply them to provide better product understanding, evaluate the process effectiveness, and improve the software quality. A threshold is a value that aids the proper interpretation of software measurements; it indicates whether or not a given value represents a quality risk. Thresholds are unknown for most software metrics, inhibiting their use in a software quality assessment process. In a previous paper, we proposed a catalog with 18 object-oriented software metrics thresholds, providing a preliminary case study in proprietary software to validate them. This article evaluates these thresholds more deeply, considering significant aspects. We show a new example of threshold derivation, discussing it qualitatively. We explain these software metrics and discuss their threshold values, presenting each one’s application level, definition, formula, and implications for the software design. We conduct a study with two software systems to evaluate the capacity of our thresholds to identify software quality enhancement after a restructuring process. We assess these thresholds using two case studies, comparing the evaluation provided by the thresholds with the qualitative analysis given by manual inspections. The study results indicate that the thresholds may lead to few false-positive and false-negative occurrences, i.e., the thresholds provide a proper quantitative assessment of software quality. This study contributes with empirical evidence that the metrics’ thresholds proposed in our previous work provide a proper interpretation of software metrics and, hence, may aid the application of software metrics in practice.
... Reliability recognizes the reliance between the disappointment procedure and the primary variables influencing techniques used in testing stage [19]. These models gauge the framework unwavering quality dimension and are partitioned in two primary classifications which are Fault-Count Model and MTBF model [20]. Software reliability methods are applied to limit the blame event and the negative impacts of software development failures [21]. ...
Conference Paper
Full-text available
Evaluating reliability of software is hot concern for decision makers and software engineers seeing as if we assess, it cannot be mastered. It is common that reliability of system stratum could be employed for evaluating growth of testing of system with help of comparison of failure of currently implemented software with required system. It is also generic that reliability of system echelon assures improved utilization by the user consequently enhanced contentment of customers. In this paper, our goal is to put forward a meticulously analysis of reliability measurement process. As the existing approaches still lack software reliability, quality and its growth, successful generation of test cases, scalability with large software systems, cost and customer satisfaction. we have projected a new framework of software reliability measurement process based on software metrics. Our model is evaluated by comparing it with an existing model and a published case study. The control experiment is used to evaluate FESR Framework. CCS CONCEPTS •
... Code Health focuses on how cognitively difficult it is for human developers to comprehend what the code is doing. The metric aligns with the mindset that the best strategy for gauging code quality is to aggregate a set of specific complexity attributes [21]. CodeScene parses source code to identify the presence of established code smells, e.g., God Class, God Methods, and Duplicated Code [22]. ...
Preprint
As generative AI is expected to increase global code volumes, the importance of maintainability from a human perspective will become even greater. Various methods have been developed to identify the most important maintainability issues, including aggregated metrics and advanced Machine Learning (ML) models. This study benchmarks several maintainability prediction approaches, including State-of-the-Art (SotA) ML, SonarQube's Maintainability Rating, CodeScene's Code Health, and Microsoft's Maintainability Index. Our results indicate that CodeScene matches the accuracy of SotA ML and outperforms the average human expert. Importantly, unlike SotA ML, CodeScene also provides end users with actionable code smell details to remedy identified issues. Finally, caution is advised with SonarQube due to its tendency to generate many false positives. Unfortunately, our findings call into question the validity of previous studies that solely relied on SonarQube output for establishing ground truth labels. To improve reliability in future maintainability and technical debt studies, we recommend employing more accurate metrics. Moreover, reevaluating previous findings with Code Health would mitigate this revealed validity threat.
Chapter
The field of software development has widely adopted the object-oriented mechanism (OOM). Software quality can be measured by evaluating quality attributes such as reliability, reusability, and maintainability. Reusability is among the most important attributes of software quality and helps developers produce high-quality, affordable software. Various metrics and models are available for evaluating the quality of object-oriented (OO) software. Here, we propose a metric set consisting of five metrics, namely inherited class count, inherited attribute count, inherited method count, inheritance hierarchy, and degree of reusability, to describe the effects of inheritance and reusability on the design quality of OO software. These metrics have also been validated through three major validation frameworks.
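The first four metrics are straightforward to compute over a class hierarchy. Below is a small Python sketch over an invented two-class hierarchy; the metric names follow the chapter's proposal, but the concrete definitions used here are our own reading of them.

```python
# Illustrative computation of inheritance-related metrics over Python classes;
# the metric names follow the chapter's proposal, the realisation here is ours.
import inspect

class Base:
    base_attr = 1
    def base_method(self): ...

class Derived(Base):
    own_attr = 2
    def own_method(self): ...

def inheritance_metrics(cls: type) -> dict[str, int]:
    """Compute inheritance metrics by walking the class's ancestors."""
    ancestors = [c for c in cls.__mro__[1:] if c is not object]
    inherited = {}
    for ancestor in ancestors:
        for name, member in vars(ancestor).items():
            if not name.startswith("__"):
                inherited.setdefault(name, member)
    method_count = sum(1 for m in inherited.values() if inspect.isfunction(m))
    return {
        "inherited_class_count": len(ancestors),
        "inherited_method_count": method_count,
        "inherited_attribute_count": len(inherited) - method_count,
        "inheritance_hierarchy": len(cls.__mro__) - 1,  # classes on the MRO path, excluding object
    }

print(inheritance_metrics(Derived))
```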
Chapter
The chapter discusses areas that are essential to empirical research. These include ethical concerns and the importance of replication. Moreover, theory building is presented in the context of empirical research. Furthermore, measurement in software engineering is introduced since it is essential when performing empirical research. The chapter concludes by discussing software process improvement and how to conduct technology transfer from research to practice.
Chapter
This chapter presents software reliability research. The software reliability study was initiated by the Advanced Information Systems subdivision of McDonnell Douglas Astronautics Company, Huntington Beach, California, to conduct research into the nature of the software reliability problem, including definitions, contributing factors, and means of control. Discrepancy reports, which originated during the development of two large-scale real-time systems, form two separate primary data sources for the reliability study. A mathematical model, descriptively entitled the De-Eutrophication Process, was developed to describe the time pattern of the occurrence of discrepancies. This model has been employed to estimate the initial or residual error content of a software package, as well as to estimate the time between discrepancies at any phase of its development. The chapter describes the means of predicting mission success on the basis of errors which occur during testing. It also describes the problems in categorizing software anomalies and discusses the special area of the genesis of discrepancies during the integration of modules.
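The model's core idea is usually stated as the failure rate being proportional to the number of faults still remaining after each fix. Below is a minimal simulation sketch of that idea with invented parameter values; it is our paraphrase of the commonly cited form, not the chapter's exact formulation.

```python
# Sketch of the de-eutrophication idea as commonly stated: after the (i-1)th
# fault is removed, the failure rate is proportional to the faults remaining.
# Parameter values are invented for illustration.
import random

def simulate_inter_failure_times(total_faults: int, phi: float, seed: int = 0):
    """Draw the times between successive discrepancies under the model."""
    rng = random.Random(seed)
    times = []
    for i in range(1, total_faults + 1):
        rate = phi * (total_faults - i + 1)   # remaining-fault failure rate
        times.append(rng.expovariate(rate))   # exponential waiting time
    return times

gaps = simulate_inter_failure_times(total_faults=20, phi=0.01)
print([round(g, 1) for g in gaps[:5]], "...", round(gaps[-1], 1))
```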
Article
A summary is presented of the current state of the art and recent trends in software engineering economics. It provides an overview of economic analysis techniques and their applicability to software engineering and management. It surveys the field of software cost estimation, including the major estimation techniques available, the state of the art in algorithmic cost models, and the outstanding research issues in software cost estimation.
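Most algorithmic cost models of the kind surveyed here take a power-law form, effort = a * size^b, adjusted by multiplicative cost drivers. A minimal sketch with invented coefficients follows; these are not the calibrated constants of COCOMO or any other published model.

```python
# Generic power-law effort model: effort = a * size_kloc ** b * product(drivers).
# Coefficients and driver values are invented for illustration only.
from math import prod

def estimate_effort(size_kloc: float, a: float = 3.0, b: float = 1.1,
                    cost_drivers: dict[str, float] | None = None) -> float:
    """Return an estimated effort in person-months (illustrative units)."""
    multiplier = prod(cost_drivers.values()) if cost_drivers else 1.0
    return a * size_kloc ** b * multiplier

print(round(estimate_effort(32, cost_drivers={"required_reliability": 1.15,
                                              "tool_support": 0.90}), 1))
```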
Article
A decomposition theory for flowchart schemata is presented, and a series of algorithms for implementing the nested composition is discussed. It is suggested that the utilization of this process as an initial phase of a flowchart structuring routine will ensure the recognition and preservation of a given flowchart's inherent topology.
Article
Many software measures have been put forward on the simple basis of a high linear correlation coefficient with some measurable quantity. The linear correlation coefficient is an unreliable statistic for deciding whether an observed correlation indicates significant association. Several published software measurement experiments collected more than 20 different measurements or had 14 or fewer observations. With many variables measured on small samples, the probability of 'discovering' a 'significant' correlation is high. We present a computer simulation experiment in which the correlation between sets of randomly generated numbers is calculated. We also look at randomly generated numbers in the ranges that would be expected for Halstead's Software Science measures. Our results show that the average maximum linear correlation for randomly generated numbers is 0.70 or higher when the sample size is low compared to the number of variables. Alternative statistical approaches to obtaining meaningful, significant results are presented.
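The core of the simulation is easy to reproduce in outline. The sketch below generates 20 unrelated variables over 14 observations, mirroring the sample sizes mentioned above, and reports the largest pairwise correlation; the exact design of the published experiment may differ.

```python
# With many variables and few observations, large "significant" correlations
# appear among pure noise.  Sizes here mirror the abstract, not the paper's
# exact experimental design.
import random
from statistics import correlation  # Python 3.10+

def max_spurious_correlation(n_vars: int = 20, n_obs: int = 14, seed: int = 1) -> float:
    rng = random.Random(seed)
    data = [[rng.random() for _ in range(n_obs)] for _ in range(n_vars)]
    return max(abs(correlation(data[i], data[j]))
               for i in range(n_vars) for j in range(i + 1, n_vars))

print(f"Largest |r| among unrelated variables: {max_spurious_correlation():.2f}")
```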
Article
Standard measures of software quality have been set up for AT&T Bell Laboratories. These metrics allow a software project to be followed through its development, controlled introduction, and release to customers. The metrics serve both project and corporate management needs. For project management, they allow more effective management of development effort, and they help to ensure a fast and effective solution to problems that arise at any stage. For corporate management, they provide a vehicle for quantifying the overall quality of software development, for setting quality improvement objectives, and for tracking results. In particular, the metrics provide quantitative information on number of faults, normalized so that corporate results can be summarized and projects of differing size can be compared; the responsiveness of support organizations in resolving problems; and the impact of fixes on customers.
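One common way to normalise fault counts for size, so that projects of differing size can be compared, is faults per thousand source lines. The sketch below uses that convention with invented project data; it is not necessarily the exact normalisation used in the AT&T metrics described above.

```python
# Illustrative normalisation: faults per thousand source lines (KLOC).
# Project data is invented; this is one common convention, not necessarily
# the one used in the AT&T quality metrics.
def faults_per_kloc(total_faults: int, source_lines: int) -> float:
    return total_faults / (source_lines / 1000)

projects = {"A": (42, 120_000), "B": (9, 18_000)}
for name, (faults, loc) in projects.items():
    print(f"Project {name}: {faults_per_kloc(faults, loc):.2f} faults/KLOC")
```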
Article
Experience from a dozen years of analyzing software engineering processes and products is summarized as a set of software engineering and measurement principles that argue for software engineering process models that integrate sound planning and analysis into the construction process. In the TAME (Tailoring A Measurement Environment) project at the University of Maryland, such an improvement-oriented software engineering process model was developed that uses the goal/question/metric paradigm to integrate the constructive and analytic aspects of software development. The model provides a mechanism for formalizing the characterization and planning tasks, controlling and improving projects based on quantitative analysis, learning in a deeper and more systematic way about the software process and product, and feeding the appropriate experience back into the current and future projects. The TAME system is an instantiation of the TAME software engineering process model as an ISEE (integrated software engineering environment). The first in a series of TAME system prototypes has been developed. An assessment of experience with this first limited prototype is presented, including a reassessment of its initial architecture.
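The goal/question/metric paradigm mentioned above links each measurement goal to the questions that refine it and the metrics that answer those questions. A minimal data-structure sketch follows; the goal, questions, and metrics are invented for illustration.

```python
# Minimal sketch of a goal/question/metric (GQM) structure; the example goal,
# questions, and metrics are invented for illustration.
gqm = {
    "goal": "Improve reliability of the release from the developer viewpoint",
    "questions": [
        {
            "question": "How many faults are found after release?",
            "metrics": ["post-release fault count", "faults per KLOC"],
        },
        {
            "question": "How quickly are reported faults resolved?",
            "metrics": ["mean time to fix", "open fault backlog"],
        },
    ],
}

for q in gqm["questions"]:
    print(q["question"], "->", ", ".join(q["metrics"]))
```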