IEEE Transactions on Software Engineering, vol. 20, no. 3, March 1994
Software Measurement: A Necessary Scientific Basis

Norman Fenton
Abstract—Software measurement, like measurement in any other discipline, must adhere to the science of measurement if it is to gain widespread acceptance and validity. The observation of some very simple, but fundamental, principles of measurement can have an extremely beneficial effect on the subject. Measurement theory is used to highlight both weaknesses and strengths of software metrics work, including work on metrics validation. We identify a problem with the well-known Weyuker properties, but also show that a criticism of these properties by Cherniavsky and Smith is invalid. We show that the search for general software complexity measures is doomed to failure. However, the theory does help us to define and validate measures of specific complexity attributes. Above all, we are able to view software measurement in a very wide perspective, rationalising and relating its many diverse activities.
Index Terms—Software measurement, empirical studies, metrics, measurement theory, complexity, validation.
I. INTRODUCTION

IT is over eleven years since DeMillo and Lipton outlined the relevance of measurement theory to software metrics [10]. More recent work by the author and others [4], [11], [46] has taken the measurement theory basis for software metrics considerably further. However, despite the important message in this work, and related material (such as [31], [34], [43], [38]), it has been largely ignored by both practitioners and researchers. The result is that much published work in software metrics is theoretically flawed. This paper therefore provides a timely summary and enhancement of measurement theory approaches, which enables us to expose problems in software metrics work and show how they can be avoided.
In Section II, we provide a concise summary of measurement theory. In Section III, we use the theory to show that the search for general-purpose, real-valued software "complexity" measures is doomed to failure. The assumption that fundamentally different views of complexity can be characterised by a single number is counter to the fundamental concepts of measurement theory. This leads us to re-examine critically the much-cited Weyuker properties [45]. We explain how the most promising approach is to identify specific attributes of complexity and measure these separately. In Section IV, we use basic notions of measurement to describe a framework which enables us to view apparently diverse software measurement activities in a unified way. We look at some well-known approaches to software measurement within this framework, exposing both the good points and bad points.
Manuscript received August 1992; revised September 1993. This work was supported in part by IED project SMARTIE and ESPRIT project PDCS2.
The author is with the Centre for Software Reliability, City University, London EC1V 0HB, UK.
IEEE Log Number 9215569.
II. MEASUREMENT FUNDAMENTALS

In this section, we provide a summary of the key concepts from the science of measurement which are relevant to software metrics. First, we define the fundamental notions (which are generally not well understood), and then we summarise the representational theory of measurement. Finally, we explain how this leads inevitably to a goal-oriented approach.
A. What is Measurement?

Measurement is defined as the process by which numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them according to clearly defined rules [13], [36]. An entity may be an object, such as a person or a software specification, or an event, such as a journey or the testing phase of a software project. An attribute is a feature or property of the entity, such as the height or blood pressure (of a person), the length or functionality (of a specification), the cost (of a journey), or the duration (of the testing phase).
Just what is meant by the numerical assignment "describing" the attribute is made precise within the representational theory of measurement presented below. Informally, the assignment of numbers or symbols must preserve any intuitive and empirical observations about the attributes and entities. Thus, for example, when measuring the height of humans, bigger numbers must be assigned to the taller humans, although the numbers themselves will differ according to whether we use metres, inches, or feet. In most situations an attribute, even one as well understood as height of humans, may have a different intuitive meaning to different people. The normal way to get round this problem is to define a model for the entities being measured. The model reflects a specific viewpoint. Thus, for example, our model of a human might specify a particular type of posture and whether or not to include hair height or allow shoes to be worn. Once the model is fixed, there is a reasonable consensus about relations which hold for humans with respect to height (these are the empirical relations). The need for good models is particularly relevant in software engineering measurement. For example, even as simple a measure of length of programs as lines of code (LOC) requires a well-defined model of programs which enables us to identify unique lines unambiguously. Similarly, to measure the effort spent on, say, the unit testing process, we would need an agreed "model" of the process which at least makes clear when the process begins and ends.
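The point is easily made concrete: the "same" LOC measure gives different values under different program models. Both counters below are hypothetical illustrations, not standard tools:

def loc_physical(source):
    # Model 1: every physical line counts.
    return len(source.splitlines())

def loc_logical(source):
    # Model 2: blank lines and comment-only lines are excluded.
    lines = (line.strip() for line in source.splitlines())
    return sum(1 for line in lines if line and not line.startswith("#"))

program = "# add two numbers\n\ndef add(a, b):\n    return a + b\n"
print(loc_physical(program))   # 4 under model 1
print(loc_logical(program))    # 2 under model 2

Until the model is fixed, neither number is right or wrong; they are measures of different (if related) attributes.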
There are two broad types of measurement: direct and indirect. Direct measurement of an attribute is measurement which does not depend on the measurement of any other attribute. Indirect measurement of an attribute is measurement which involves the measurement of one or more other attributes. It turns out that while some attributes can be measured directly, we normally get more sophisticated measurement (meaning a more sophisticated scale, see below) if we measure indirectly. For a good discussion of these issues, see [25], [27].
Uses of Measurement: Assessment and Prediction: There are two broad uses of measurement: for assessment and for prediction. Predictive measurement of an attribute A will generally depend on a mathematical model relating A to some existing measures of attributes A1, ..., An. Accurate predictive measurement is inevitably dependent on careful (assessment-type) measurement of the attributes A1, ..., An. For example, accurate estimates of project resources are not obtained by simply "applying" a cost estimation model with fixed parameters [26]. However, careful measurement of key attributes of completed projects can lead to accurate resource predictions for future projects [22]. Similarly, it is possible to get accurate predictions of the reliability of software in operation, but these are dependent on careful data collection relating to failure times during alpha-testing [5].

For predictive measurement the model alone is not sufficient. Additionally, we need to define the procedures for a) determining model parameters and b) interpreting the results. For example, in the case of software reliability prediction, we might use maximum likelihood estimation for a) and Bayesian statistics for b). The model, together with procedures a) and b), is called a prediction system [29]. Using the same model will generally yield different results if we use different prediction procedures.
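A minimal sketch of a prediction system in this sense, assuming (purely for illustration) an exponential failure model, maximum likelihood for procedure a), and a direct probability read-out for procedure b):

import math

def fit_rate(interfailure_times):
    # Procedure a): maximum-likelihood estimate of the failure rate
    # under an assumed exponential model.
    return len(interfailure_times) / sum(interfailure_times)

def p_failure_free(rate, t):
    # Procedure b): interpret the fitted model as the probability of
    # surviving the next t hours without failure.
    return math.exp(-rate * t)

times = [12.0, 7.5, 20.0, 15.5]   # hypothetical interfailure times (hours)
print(round(p_failure_free(fit_rate(times), 10.0), 3))   # about 0.483 here

Swapping either procedure, say a Bayesian estimator for a), changes the predictions even though the model itself is untouched, which is why the prediction system as a whole, not the model alone, must be specified.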
It must be stressed that, for all but the most trivial attributes, proposed predictive measures in software engineering are invariably stochastic rather than deterministic. The same is true of proposed indirect measures [14].
Measurement Activities must have Clear Objectives: The basic definitions of measurement suggest that any measurement activity must proceed with very clear objectives or goals. First, you need to know whether you want to measure for assessment or for prediction. Next, you need to know exactly which entities are the subject of interest. Then you need to decide which attributes of the chosen entities are the significant ones. The definition of measurement makes clear the need to specify both an entity and an attribute before any measurement can be undertaken (a simple fact which has been ignored in much software metrics activity). Clearly, there are no definitive measures which can be prescribed for every objective in any application area. Yet for many years software practitioners expected precisely that: "what software metric should we be using?" was, and still is, a commonly asked question. It says something about the previous ignorance of scientific measurement in software engineering that the Goal/Question/Metric paradigm of Basili and Rombach [7] has been hailed as a revolutionary step forward. GQM spells out the above necessary obligations for setting objectives before embarking on any software measurement activity.
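To make these obligations concrete, a GQM instantiation can be sketched as plain data; the goal, questions, and metrics below are invented for illustration only:

gqm = {
    "goal": "Assess whether unit testing is effective on project X",
    "questions": {
        "How much effort does unit testing consume?":
            ["person-hours per tested module"],
        "How many faults does unit testing find?":
            ["faults discovered per module during unit testing"],
    },
}

for question, metrics in gqm["questions"].items():
    print(question, "->", metrics)

Each metric is tied to a named entity (the unit-testing process, a module) and an attribute (effort, faults found), which is exactly the specification the definition of measurement demands.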
B. Representational Theory of Measurement

The Issues Addressed: Although there is no universally agreed theory of measurement, most approaches are devoted to resolving the following issues: what is and what is not measurement; which types of attributes can and cannot be measured, and on what kinds of scales; how do we know if we have really measured an attribute; how do we define measurement scales; when is an error margin acceptable or not; which statements about measurement are meaningful. The texts [13], [25], [27], [36], [40] all deal with these issues. Here we present a brief overview of the representational theory of measurement [13], [25].
Empirical Relation Systems: Direct measurement of a particular attribute possessed by a set of entities must be preceded by intuitive understanding of that attribute. This intuitive understanding leads to the identification of empirical relations between entities. The set of entities C, together with the set of empirical relations R, is called an empirical relation system (C, R) for the attribute in question. Thus the attribute of "height" of people gives rise to empirical relations like "is tall", "taller than", and "much taller than".
Representation Condition: To measure the attribute that is characterised by an empirical relation system (C, R) requires a mapping M into a numerical relation system (N, P). Specifically, M maps entities in C to numbers (or symbols) in N, and empirical relations in R are mapped to numerical relations in P, in such a way that all empirical relations are preserved. This is the so-called representation condition, and the mapping M is called a representation. The representation condition asserts that the correspondence between empirical and numerical relations is two way. Suppose, for example, that the binary relation ≺ is mapped by M to the numerical relation <. Then, formally, we have the following instance of the representation condition:

x ≺ y ⟺ M(x) < M(y)
Thus, suppose C is the set of all people and R contains the relation "taller than". A measure M of height would map C into the set of real numbers ℜ, and "taller than" to the relation ">". The representation condition asserts that person A is taller than person B if and only if M(A) > M(B).
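The two-way reading of the representation condition is easy to state operationally. The following sketch (with invented people and heights) checks that the empirical relation "taller than" holds exactly when the measured values compare the same way:

# Empirical relation and measurement mapping; data invented for illustration.
taller_than = {("Hermann", "Peter"), ("Hermann", "Anna"), ("Peter", "Anna")}
M = {"Hermann": 74, "Peter": 70, "Anna": 65}   # heights in inches

people = M.keys()
represented = all(
    ((a, b) in taller_than) == (M[a] > M[b])   # "if and only if"
    for a in people for b in people if a != b
)
print(represented)   # True: M is a representation

Dropping the pair ("Hermann", "Anna") from the empirical relation would make the check fail: the numbers would then assert an empirical relation that was never observed, which is precisely what the condition forbids.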
By having to identify empirical relations for an attribute in advance, the representational approach to measurement avoids the temptation to define a poorly understood, but intuitively recognisable, attribute in terms of some numerical assignment. This is one of the most common failings in software metrics work. Classic examples are where attributes such as "complexity" or "quality" are equated with proposed numbers; for example, complexity with a "measure" like McCabe's cyclomatic number [30], or Halstead's E [18], and "quality" with Kafura and Henry's fan-in/fan-out equation [23].
Scale Types and Meaningfulness: Suppose that an attribute of some set of entities has been characterised by an empirical relation system (C, R). There may in general be many ways of assigning numbers which satisfy the representation condition. For example, if person A is taller than person B, then M(A) > M(B) irrespective of whether the measure M is in inches, feet, centimetres, metres, etc. Thus, there are many different measurement representations for the normal empirical relation system for the attribute of height of people. However, any two representations M and M' are related in a very specific way: there is always some constant c > 0 such that M' = cM
(so, where M is the representation of height in inches and M' in centimetres, c = 2.54). This transformation from one valid representation into another is called an admissible transformation.
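A small sketch of this, with two representations of the same (invented) heights:

M_inches = {"Hermann": 74.0, "Peter": 37.0}
M_cm = {k: 2.54 * v for k, v in M_inches.items()}   # admissible: M' = cM, c = 2.54

print(M_inches["Hermann"] / M_inches["Peter"])   # 2.0
print(M_cm["Hermann"] / M_cm["Peter"])           # 2.0: ratios survive rescaling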
It is the class of admissible transformations which determines the scale type for an attribute (with respect to some fixed empirical relation system). For example, where every admissible transformation is a scalar multiplication (as for height), the scale type is called ratio. The ratio scale is a sophisticated scale of measurement which reflects a very rich empirical relation system. An attribute is never of ratio type a priori; we normally start with a crude understanding of an attribute and a means of measuring it. Accumulating data and analysing the results leads to the clarification and re-evaluation of the attribute. This in turn leads to refined and new empirical relations and improvements in the accuracy of the measurement; specifically, this is an improved scale.

For many software attributes we are still at the stage of having very crude empirical relation systems. In the case of an attribute like "criticality" of software failures, an empirical relation system would at best only identify different classes of failures and a binary relation "is more critical than". In this case, any two representations are related by a monotonically increasing transformation. With this class of admissible transformations, we have an ordinal scale type. In increasing order of sophistication, the best known scale types are: nominal, ordinal, interval, ratio, and absolute. For full details about the defining classes of admissible transformations, see [36].
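For convenience, the defining classes can be summarised as follows (this is the standard formulation, as in [36]): nominal scales admit any one-to-one recoding M' = f(M); ordinal scales any strictly increasing f; interval scales the affine transformations M' = aM + b with a > 0; ratio scales the rescalings M' = aM with a > 0; and absolute scales only the identity M' = M.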
This formal definition of scale type based on admissible transformations enables us to determine rigorously what kinds of statements about measurement are meaningful. Formally, a statement involving measurement is meaningful if its truth or falsity remains unchanged under any admissible transformation of the measures involved. Thus, for example, it is meaningful to say that "Hermann is twice as tall as Peter"; if the statement is true (false) when we measure height in inches, it will remain true (false) when we measure height in any constant multiple of inches. On the other hand, the statement "Failure x is twice as critical as failure y" is not meaningful if we only have an ordinal scale empirical relation system for failure criticality. This is because a valid ordinal scale measure M could define M(x) = 6, M(y) = 3, while another valid ordinal scale measure M' could define M'(x) = 10, M'(y) = 9. In this case the statement is true under M but false under M'.
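The failure-criticality example can be replayed directly; both mappings below are valid ordinal representations of the same empirical order:

M1 = {"x": 6, "y": 3}     # one valid ordinal measure
M2 = {"x": 10, "y": 9}    # another: same order, different numbers

print(M1["x"] == 2 * M1["y"])   # True under M
print(M2["x"] == 2 * M2["y"])   # False under M': the ratio claim is not meaningful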
The notion of meaningfulness also enables us to determine what kinds of operations we can perform on different measures. For example, it is meaningful to use means for computing the average of a set of data measured on a ratio scale, but not on an ordinal scale. Medians are meaningful for an ordinal scale but not for a nominal scale. Again, these basic observations have been ignored in many software measurement studies, where a common mistake is to use the mean (rather than the median) as a measure of average for data which is only ordinal. Good examples of practical applications of meaningfulness ideas may be found in [3], [17], [37], [39]. An alternative definition of meaningfulness is given in [16].
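A short demonstration of the mean/median point, using an invented ordinal recoding (any strictly increasing recoding is admissible on an ordinal scale):

from statistics import mean, median

a, b = [1, 1, 5], [3, 3, 3]      # ordinal scores for two groups
recode = {1: 1, 3: 3, 5: 100}    # order-preserving, hence admissible
a2, b2 = [recode[v] for v in a], [recode[v] for v in b]

print(mean(a) > mean(b), mean(a2) > mean(b2))          # False True: verdict flips
print(median(a) > median(b), median(a2) > median(b2))  # False False: stable

The comparison of means changes its truth value under an admissible transformation, so it is not meaningful for ordinal data; the comparison of medians is unaffected.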
Representation Theorems: The serious mathematical aspects of measurement theory are largely concerned with theorems which assert conditions under which certain scales of direct measurement are possible for certain relation systems. A typical example of such a theorem, due to Cantor, gives conditions for real-valued ordinal-scale measurement when we have a countable set of entities C and a binary relation b on C:

Cantor's Theorem: The empirical relation system (C, b) has a representation in (ℜ, <) if and only if b is a strict weak order. The scale type is ordinal when such a representation exists.

The relation b being a "strict weak order" means that it is: 1) asymmetric (xRy implies that it is not the case that yRx), and 2) negatively transitive (xRy implies that for every z ∈ C, either xRz or zRy).
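For finite relation systems, both conditions can be checked mechanically; the relation below is a hypothetical strict total order, included only to exercise the checker:

from itertools import product

def is_strict_weak_order(C, b):
    asymmetric = all(
        not ((x, y) in b and (y, x) in b) for x, y in product(C, C)
    )
    negatively_transitive = all(
        (x, z) in b or (z, y) in b for (x, y) in b for z in C
    )
    return asymmetric and negatively_transitive

C = {"p", "q", "r"}
b = {("p", "q"), ("p", "r"), ("q", "r")}
print(is_strict_weak_order(C, b))   # True: an ordinal representation exists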
III. MEASURING SOFTWARE "COMPLEXITY"

The representational theory of measurement is especially relevant to the study of software complexity measures. In this section we show that the search for a general-purpose, real-valued complexity measure is doomed to failure, but that there are promising axiomatic approaches which help us to measure specific complexity attributes. However, one well-known axiomatic approach [45] has serious weaknesses because it attempts to characterise incompatible views of complexity.
A. General Complexity Measures: The Impossible Holy Grail

For many years researchers have sought to characterise general notions of "complexity" by a single real number. To simplify matters, we first restrict our attention to those measures which attempt only to characterise control-flow complexity. If we can show that it is impossible to define a general measure of control-flow complexity, then the impossibility of even more general complexity measures is certain.

Zuse cites dozens of proposed control-flow complexity measures in [46]. There seems to be a minimum assumption that the empirical relation system for complexity of programs leads to (at least) an ordinal scale. This is because of the following hypotheses which are implicit in much of the work.

Hypothesis 1: Let C be the class of programs. Then the attribute control-flow "complexity" is characterised by an empirical relation system which includes a binary relation b, "less complex than"; specifically, (x, y) ∈ b if there is a consensus that x is less complex than y.

Hypothesis 2: The proposed measure M: C → ℜ is a representation of complexity in which the relation b is mapped to <.
Hypothesis 1 seems plausible. It does not state that C is totally ordered with respect to b; only that there is some general view of complexity for which there would be a reasonable consensus that certain pairs of programs are in b. For example, in Fig. 1, it seems plausible that (x, y) ∈ b (from the measurement theory viewpoint, it would be good enough if most programmers agreed this). Some pairs appear to be incomparable, such as x and z, or y and z; if people were asked to "rank" these for complexity, they would inevitably end up asking questions like "what is meant by complexity" before attempting to answer. Since b is supposed to capture a general view of complexity, this would be enough to deduce that (x, z) ∉ b and (z, x) ∉ b, and also that (z, y) ∉ b and (y, z) ∉ b. The idea of the inevitable incomparability of some programs, even for some specific views of complexity, has also been noted in [43].

Fig. 1. Complexity relation not negatively transitive?
Unfortunately, while Hypothesis 1 is plausible, Hypothesis 2 can be dismissed because of the Representation Condition. The problem is the "incomparable" programs. While b is not a total order in C, the relation < is a total order in ℜ. The measurement mapping M might force an order which has to be reflected back in C. Thus, if for example M(z) < M(y) (as in the case of McCabe's cyclomatic complexity measure in Fig. 1, where M(z) = 2 and M(y) = 3), then, if M is really a measure of complexity, the Representation Condition asserts that we must also have (z, y) ∈ b, for which there is no consensus. Formally, we can prove the following theorem.
Theorem 1: Assuming Hypothesis 1, there is no general notion of control-flow complexity of programs which can be measured on an ordinal scale in (ℜ, <).

To prove this, the previous argument is made formal by appealing to Cantor's Theorem. It is enough to show that the relation b is not a strict weak order. This follows since (according to our definition of b) it is reasonable to deduce that (x, y) ∈ b but (x, z) ∉ b and (z, y) ∉ b (since it is not clear that any consensus exists about the relative complexities of x and z, and of y and z).
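The counterexample can be replayed with the is_strict_weak_order checker sketched after Cantor's Theorem in Section II; only the pair (x, y) attracts a consensus, so negative transitivity fails at z:

# Reuses is_strict_weak_order from the earlier sketch.
C = {"x", "y", "z"}
b = {("x", "y")}    # x and z, and y and z, are incomparable
print(is_strict_weak_order(C, b))   # False: neither (x, z) nor (z, y) is in b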
The theorem should put an end to the search for the holy grail of a general complexity measure. However, it does not rule out the search for measures that characterise specific views of complexity (which is the true measurement theory approach). For example, a specific program complexity attribute is "the number of independent paths." McCabe's cyclomatic complexity is an absolute scale measure of this attribute. It might even be a ratio scale measure of the attribute of "testability" with respect to independent path testing. Other specific attributes of complexity, such as the maximum depth of nesting, distribution of primes in the decomposition tree, and the number of paths of various types, can all be measured rigorously and automatically [11], [34].
This idea of looking at measures with respect to particular viewpoints of complexity is taken much further by Zuse [46]. Zuse uses measurement theory to analyse the many complexity measures in the literature; he shows which viewpoint and assumptions are necessary to use the measures on different scales. The beauty and relevance of measurement theory is such that it clearly underlies some of the most promising work in software measurement, even where the authors have not made the explicit link. Notable in this respect are the innovative approaches of Melton et al. [31] and Tian and Zelkowitz [43]. In both of these works, the authors seek to characterise specific views of complexity. In [43], the authors do this by proposing a number of axioms reflecting viewpoints of complexity: in the context of measurement theory, the axioms correspond to particular empirical relations. This means that the representation condition can be used to determine the acceptability of potential measures.

Melton et al. [31] characterise a specific view of program complexity by specifying precisely an order relation ⪯ on program flowgraphs; in other words, they define the binary relation b (of Hypothesis 1) as ⪯. The benefit of this approach is that the view of complexity is explicit, and the search for representations (i.e., measures of this view of complexity) becomes purely analytical. The only weakness in [31] is the assertion that a measure M is "any real-valued mapping for which M(x) ≤ M(y) whenever x ⪯ y." This ignores the sufficiency condition of the Representation Condition.
Thus, while McCabe's cyclomatic complexity [30] satisfies necessity (and is therefore a "measure" according to Melton et al. [31]), it is not a measure in the representational sense (since in Fig. 1, M(z) < M(y) but it is not the case that z ⪯ y). Interestingly, Tian and Zelkowitz also use the same weakened form of representation, but acknowledge that they "would like the relationship" to be necessary and sufficient.
It follows from Cantor's theorem that there is no representation of Melton's (F, ⪯) in (ℜ, <). However, it is still possible to get ordinal measurement in a number system which is not (ℜ, <) (and hence, for which it is not required that b is a strict weak order), although the resulting measure is of purely theoretical interest. It is shown in [12] that there is a representation in (Nat, |), where Nat is the set of natural numbers and | is the divides relation. The construction of the measurement mapping M is based on ensuring that incomparable flowgraphs are mapped to mutually prime numbers. For the flowgraphs of Fig. 1, M(z) = 2, M(x) is a fairly large multiple of 3, and M(y) is a very large multiple of 3.
B. The Weyuker Properties

Despite the above evidence, researchers have continued to search for single real-valued complexity measures which are expected to have the magical properties of being key indicators of such diverse attributes as comprehensibility, correctness, maintainability, reliability, testability, and ease of implementation [30], [32]. A high value for a "complexity" measure is supposed to be indicative of low comprehensibility, low reliability, etc. Sometimes these measures are also called "quality" measures [23]. In this case, high values of the measure actually indicate low values of the quality attributes.

The danger of attempting to find measures which characterise so many different attributes is that inevitably the measures have to satisfy conflicting aims. This is counter to the representational theory of measurement. Nobody would expect a single number M to characterise every notion of "quality" of people, which might include the very different notions of a) physical strength and b) intelligence. If such a measure M existed, it would have to satisfy a) M(A) > M(B) whenever A is stronger than B, and b) M(A) > M(B) whenever A is more intelligent than B. The fact that some highly intelligent people are very weak physically ensures that no M can satisfy both these properties. Nevertheless, Weyuker's list of properties [45] seems to suggest the need
for analogous software "complexity" measures. For example, two of the properties that Weyuker proposes any complexity measure M should satisfy are the following.

Property A: For any program bodies P, Q: M(P) ≤ M(P;Q) and M(Q) ≤ M(P;Q).

Property B: There exist program bodies P, Q, and R such that M(P) = M(Q) and M(P;R) ≠ M(Q;R).
Property A asserts that adding code to a program cannot decrease its complexity. This reflects the view that program size is a key factor in its complexity. We can also conclude from Property A that low comprehensibility is not a key factor in complexity. This is because it is widely believed that in certain cases we can understand a program more easily as we see more of it [43]. Thus, while a "size" type complexity measure M should satisfy Property A, a "comprehensibility" type complexity measure M cannot satisfy Property A.
Property B asserts that we can find two program bodies of equal complexity which, when separately concatenated with the same third program, yield programs of different complexity. Clearly, this property has much to do with comprehensibility and little to do with size.

Thus, Properties A and B are relevant for very different, and incompatible, views of complexity. They cannot both be satisfied by a single measure which captures notions of size and low comprehensibility. Although the above argument is not formal, Zuse has recently proved [47] that, within the representational theory of measurement, Weyuker's axioms are contradictory. Formally, he shows that while Property A explicitly requires the ratio scale for M, Property B explicitly excludes the ratio scale.
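The tension is visible in even the crudest size measure. In the sketch below, program bodies are lists of statements, M is their length, and concatenation stands in for ";"; such a measure satisfies Property A by construction, but can never satisfy Property B, since M(P) = M(Q) forces M(P;R) = M(Q;R):

def M(body):
    return len(body)    # a pure "size" measure: concatenation only adds

P, Q, R = ["a = 1"], ["b = 2"], ["c = 3", "d = 4"]
print(M(P) <= M(P + Q) and M(Q) <= M(P + Q))   # True:  Property A holds
print(M(P) == M(Q) and M(P + R) != M(Q + R))   # False: Property B cannot hold

Any measure exhibiting the context-sensitivity Property B demands must therefore give up the pure size view that Property A encodes.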
The general misunderstanding of scientific measurement in software engineering is illustrated further in a recent paper [9], which was itself a critique of the Weyuker axioms. Cherniavsky and Smith define a code-based "metric" which satisfies all of Weyuker's axioms but which, they rightly claim, is not a sensible measure of complexity. They conclude that axiomatic approaches may not work. There is no justification for their conclusion. On the one hand, as they readily accept, there was no suggestion that Weyuker's axioms were complete. More importantly, what they fail to observe is that Weyuker did not propose that the axioms were sufficient; she only proposed that they were necessary. Since the Cherniavsky/Smith "metric" is clearly not a measure (in our sense) of any specific attribute, showing that it satisfies any set of necessary axioms for any measure is of no interest at all.
These problems would have been avoided by a simple lesson from measurement theory: the definition of a numerical mapping does not in itself constitute measurement. It is popular in software engineering to use the word "metric" for any number extracted from a software entity. Thus, while every measure is a "metric", the converse is certainly not true. The confusion in [9], and also in [45], arises from wrongly equating these two concepts and ignoring the theory of measurement completely.

IV. UNIFYING FRAMEWORK FOR SOFTWARE MEASUREMENT

A. A Classification of Software Measures

In software measurement activity, there are three classes of entities of interest [11]:

Processes: any software-related activities which take place over time.
Products: any artefacts, deliverables, or documents which arise out of the processes.
Resources: the items which are inputs to processes.

We make a distinction between attributes of these which are internal and external.
Internal attributes of a product, process, or resource are those which can be measured purely in terms of the product, process, or resource itself. For example, length is an internal attribute of any software document, while elapsed time is an internal attribute of any software process.

External attributes of a product, process, or resource are those which can only be measured with respect to how the product, process, or resource relates to other entities in its environment. For example, reliability of a program (a product attribute) is dependent not just on the program itself, but on the compiler, machine, and user. Productivity is an external attribute of a resource, namely people (either as individuals or groups); it is clearly dependent on many aspects of the process and the quality of products delivered.
Software managers and software users would most like to measure and predict external attributes. Unfortunately, they are necessarily only measurable indirectly. For example, productivity of personnel is most commonly measured as the ratio of size of code delivered (an internal product attribute) to effort (an internal process attribute). The problems with this oversimplistic measure of productivity have been well documented. Similarly, "quality" of a software system (a very high-level external product attribute) is often defined as the ratio of faults discovered during formal testing (an internal process attribute) to size (measured in KLOC) [19]. While reasonable for developers, this measure of quality cannot be said to be a valid measure from the viewpoint of the user.
Empirical studies have suggested there may be little real correlation between faults and actual failures of the software in operation. For example, Adams [1] made a significant study of a number of large software systems being used on many sites around the world; he discovered that a large proportion of faults almost never lead to failures, while less than 2% of the known faults caused most of the common failures.
It is rare for a genuine consensus to be reached about the contrived definitions of external attributes. An exception is the definition of reliability of code in terms of probability of failure-free operation within a given usage environment [21], [29]. In this case, we need to measure internal process attributes. The processes are each of the periods of software operation ...

... be considered; module calling structures with widely varying widths are not considered to be very modular, because of ideas of chunking from cognitive psychology.
D. Validating Software Measures

Validating a software measure in the assessment sense is equivalent to demonstrating empirically that the representation condition is satisfied for the attribute being measured. For a measure in the predictive sense, all the components of the prediction system must be clearly specified, and a proper hypothesis proposed, before experimental design for validation can begin.

Despite these simple obligations for measurement validation, the software engineering literature abounds with so-called validation studies which have ignored them totally. This phenomenon has been examined thoroughly in [14] and [33], and fortunately there is some recent work addressing the problem [38]. Typically, a measure (in the assessment sense) is proposed. For example, this might be a measure of an internal structural attribute of source code. The measure is "validated" by showing that it correlates with some other existing measure. What this really means is that the proposed measure is the main independent variable in a prediction system. Unfortunately, these studies commonly fail to specify the required prediction system and experimental hypothesis. Worse still, they do not specify, in advance, what is the dependent variable being predicted. The result is often an attempt to find fortuitous correlations with any data which happens to be available. In many cases, the only such data happens to be some other structural measure. For example, in [28], structural-type measures are "validated" by showing that they correlate with "established" measures like LOC and McCabe's cyclomatic complexity number. In such cases, the validation study tells us nothing of interest. The general dangers of the "shotgun" approach to correlations of software measures have been highlighted in [8].
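The danger is easy to reproduce. The simulation below (a minimal sketch of the same effect, not taken from [8]) draws a dependent variable and twenty candidate "measures" entirely at random for ten projects, then reports the best correlation found:

import random

def corr(xs, ys):
    # Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(1)
dependent = [random.random() for _ in range(10)]   # 10 "projects"
candidates = [[random.random() for _ in range(10)] for _ in range(20)]
print(max(abs(corr(c, dependent)) for c in candidates))

With many candidate measures and few observations, the largest spurious correlation is routinely large enough to look publishable, which is exactly the shotgun effect.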
The search for rigorous software measures has not been helped by a commonly held viewpoint that no measure is "valid" unless it is a good predictor of effort. An analogy would be to reject the usefulness of measuring a person's height on the grounds that it tells us nothing about that person's intelligence. The result is that potentially valid measures of important internal attributes become distorted. Consider, for example, Albrecht's function points [2]. In this approach, the unadjusted function count UFC seems to be a reasonable measure of the important attribute of functionality in specification documents. However, the intention was to define a single size measure as the main independent variable in prediction systems for effort. Because of this, a technical complexity factor (TCF) is applied to UFC to arrive at the number of function points FP, which is the model in the prediction system for effort. The TCF takes account of 14 product and process attributes in Albrecht's approach, and even more in Symons' approach [41]. This kind of adjustment (to a measure of system functionality) is analogous to redefining measures of height of people in such a way that the measures correlate more closely with intelligence. Interestingly, Jeffery [20] has shown that the complexity adjustments do not even improve effort predictions; there were no significant differences between UFC and FP as effort predictors in his studies. Similar results have been reported by Kitchenham and Kansala [24].
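For reference, the usual Albrecht-style adjustment (the exact weights here are an assumption; Symons [41] varies the details) multiplies UFC by a factor built from 14 attribute ratings:

def function_points(ufc, ratings):
    assert len(ratings) == 14 and all(0 <= r <= 5 for r in ratings)
    tcf = 0.65 + 0.01 * sum(ratings)    # TCF ranges over [0.65, 1.35]
    return ufc * tcf

print(function_points(100, [3] * 14))   # 100 * 1.07 = 107.0

The adjustment perturbs a measure of one attribute (functionality) to track a different one (effort), which is the distortion criticised above.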
V. SUMMARY

Contrary to popular opinion, software measurement, like measurement in any other discipline, must adhere to the science of measurement if it is to gain widespread acceptance and validity. The representational theory of measurement asserts that measurement is the process of assigning numbers or symbols to attributes of entities in such a way that all empirical relations are preserved. The entities of interest in software can be classified as processes, products, or resources. Anything we may wish to measure or predict is an identifiable attribute of these. Attributes are either internal or external. Although external attributes like reliability of products, stability of processes, or productivity of resources tend to be the ones we are most interested in measuring, we cannot do so directly. We are generally forced to measure indirectly in terms of internal attributes. Predictive measurement requires a prediction system. This means not just a model, but also a set of prediction procedures for determining the model parameters and applying the results. These in turn are dependent on accurate measurements in the assessment sense.

We have used measurement theory to highlight both weaknesses and strengths of software metrics work, including work on metrics validation. Invariably, it seems that the most promising theoretical work has been using the key components of measurement theory. We showed that the search for general software complexity measures is doomed to failure. However, the theory does help us to define and validate measures of specific complexity attributes.
ACKNOWLEDGMENT

I would like to thank B. Littlewood and M. Neil for providing comments on an earlier draft of this paper, and P. Mellor, S. Page, and R. Whitty for sharing views and information that have influenced its contents. Finally, I would like to thank four anonymous referees who made suggestions that clearly improved the paper.
REFERENCES

[1] E. Adams, "Optimizing preventive service of software products," IBM J. Res. Develop., vol. 28, no. 1, pp. 2-14, 1984.
[2] A. J. Albrecht, "Measuring application development productivity," in Proc. IBM Applic. Dev. Joint SHARE/GUIDE Symp., Monterey, CA, 1979, pp. 83-92.
[3] J. Aczel, F. S. Roberts, and Z. Rosenbaum, "On scientific laws without dimensional constants," J. Math. Anal. Applicat., vol. 119, pp. 389-416, 1986.
[4] A. Baker, J. Bieman, N. E. Fenton, D. Gustafson, A. Melton, and R. W. Whitty, "A philosophy for software measurement," J. Syst. Software, vol. 12, pp. 277-281, July 1990.
[5] S. Brocklehurst, P. Y. Chan, B. Littlewood, and J. Snell, "Recalibrating software reliability models," IEEE Trans. Software Eng., vol. 16, no. 4, pp. 458-470, Apr. 1990.
[6] B. Boehm, Software Engineering Economics. Englewood Cliffs, NJ: Prentice-Hall, 1981.
[7] V. Basili and D. Rombach, "The TAME project: Towards improvement-orientated software environments," IEEE Trans. Software Eng., vol. 14, no. 6, pp. 758-773, June 1988.
[8] R. E. Courtney and D. A. Gustafson, "Shotgun correlations in software measures," IEE Software Eng. J., vol. 8, no. 1, pp. 5-13, 1993.
[9] J. C. Cherniavsky and C. H. Smith, "On Weyuker's axioms for software complexity measures," IEEE Trans. Software Eng., vol. 17, no. 6, pp. 636-638, June 1991.
[10] R. A. DeMillo and R. J. Lipton, "Software project forecasting," in Software Metrics, A. J. Perlis, F. G. Sayward, and M. Shaw, Eds. Cambridge, MA: MIT Press, 1981, pp. 77-89.
[11] N. E. Fenton, Software Metrics: A Rigorous Approach. London: Chapman & Hall, 1991.
[12] N. E. Fenton, "When a software measure is not a measure," IEE Software Eng. J., vol. 7, no. 5, pp. 357-362, May 1992.
[13] L. Finkelstein, "A review of the fundamental concepts of measurement," Measurement, vol. 2, no. 1, pp. 25-34, 1984.
[14] N. E. Fenton and B. A. Kitchenham, "Validating software measures," J. Software Testing, Verification & Reliability, vol. 1, no. 2, pp. 27-42, 1991.
[15] N. E. Fenton and A. Melton, "Deriving structurally based software measures," J. Syst. Software, vol. 12, pp. 177-187, July 1990.
[16] J.-C. Falmagne and L. Narens, "Scales and meaningfulness of quantitative laws," Synthese, vol. 55, pp. 287-325, 1983.
[17] P. J. Fleming and J. J. Wallace, "How not to lie with statistics," Commun. ACM, vol. 29, pp. 218-221, 1986.
[18] M. H. Halstead, Elements of Software Science. Amsterdam: Elsevier North-Holland, 1975.
[19] J. Inglis, "Standard software quality metrics," AT&T Tech. J., vol. 65, no. 2, pp. 113-118, Feb. 1985.
[20] D. R. Jeffery, G. C. Low, and M. Barnes, "A comparison of function point counting techniques," IEEE Trans. Software Eng., vol. 19, no. 5, pp. 529-532, May 1993.
[21] Z. Jelinski and P. B. Moranda, "Software reliability research," in Statistical Computer Performance Evaluation, W. Freiberger, Ed. New York: Academic Press, 1972, pp. 465-484.
[22] B. A. Kitchenham and B. de Neumann, "Cost modelling and estimation," in Software Reliability Handbook, P. Rook, Ed. New York: Elsevier Applied Science, 1990, pp. 333-376.
[23] D. Kafura and S. Henry, "Software quality metrics based on interconnectivity," J. Syst. Software, vol. 2, pp. 121-131, 1981.
[24] B. A. Kitchenham and K. Kansala, "Inter-item correlations among function points," in Proc. IEEE Software Metrics Symp., Baltimore, MD, 1993, pp. 11-15.
[25] D. H. Krantz, R. D. Luce, P. Suppes, and A. Tversky, Foundations of Measurement, vol. 1. New York: Academic Press, 1971.
[26] B. A. Kitchenham and N. R. Taylor, "Software project development cost estimation," J. Syst. Software, vol. 5, pp. 267-278, 1985.
[27] H. E. Kyburg, Theory and Measurement. Cambridge: Cambridge Univ. Press, 1984.
[28] H. F. Li and W. K. Cheung, "An empirical study of software metrics," IEEE Trans. Software Eng., vol. 13, no. 6, June 1987.
[29] B. Littlewood, "Forecasting software reliability," in Software Reliability: Modelling and Identification (Lecture Notes in Computer Science, vol. 341), S. Bittanti, Ed. New York: Springer-Verlag, 1988, pp. 141-209.
[30] T. J. McCabe, "A complexity measure," IEEE Trans. Software Eng., vol. SE-2, no. 4, pp. 308-320, Dec. 1976.
[31] A. C. Melton, D. A. Gustafson, J. M. Bieman, and A. A. Baker, "Mathematical perspective of software measures research," IEE Software Eng. J., vol. 5, no. 5, pp. 246-254, 1990.
[32] J. C. Munson and T. M. Khoshgoftaar, "The detection of fault-prone programs," IEEE Trans. Software Eng., vol. 18, no. 5, pp. 423-433, May 1992.
[33] M. Neil, "Multivariate assessment of software products," J. Software Testing, Verification & Reliability, to appear, 1992.
[34] R. E. Prather and S. G. Giulieri, "Decomposition of flowchart schemata," Comput. J., vol. 24, no. 3, pp. 258-262, 1981.
[35] R. S. Pressman, Software Engineering: A Practitioner's Approach, 2nd ed. New York: McGraw-Hill Int., 1987.
[36] F. S. Roberts, Measurement Theory with Applications to Decision Making, Utility, and the Social Sciences. Reading, MA: Addison-Wesley, 1979.
[37] F. S. Roberts, "Applications of the theory of meaningfulness to psychology," J. Math. Psychol., vol. 29, pp. 311-332, 1985.
[38] N. F. Schneidewind, "Methodology for validating software metrics," IEEE Trans. Software Eng., vol. 18, no. 5, pp. 410-422, May 1992.
[39] J. E. Smith, "Characterizing computer performance with a single number," Commun. ACM, vol. 31, pp. 1202-1206, 1988.
[40] P. H. Sydenham, Ed., Handbook of Measurement Science, vol. 1. New York: J. Wiley, 1982.
[41] C. R. Symons, "Function point analysis: Difficulties and improvements," IEEE Trans. Software Eng., vol. 14, no. 1, pp. 2-11, Jan. 1988.
[42] D. A. Troy and S. H. Zweben, "Measuring the quality of structured design," J. Syst. Software, vol. 2, pp. 113-120, 1981.
[43] J. Tian and M. V. Zelkowitz, "A formal program complexity model and its applications," J. Syst. Software, vol. 17, pp. 253-266, 1992.
[44] S. N. Woodfield, H. E. Dunsmore, and V. Y. Shen, "The effects of modularisation and comments on program comprehension," in Proc. 5th Int. Conf. Software Eng., 1979, pp. 213-223.
[45] E. J. Weyuker, "Evaluating software complexity measures," IEEE Trans. Software Eng., vol. 14, no. 9, pp. 1357-1365, Sept. 1988.
[46] H. Zuse, Software Complexity: Measures and Methods. Amsterdam: de Gruyter, 1990.
[47] H. Zuse, "Support of experimentation by measurement theory," in Experimental Software Engineering Issues (Lecture Notes in Computer Science, vol. 706), H. D. Rombach, V. R. Basili, and R. W. Selby, Eds. New York: Springer-Verlag, 1993, pp. 137-140.
Norman Fenton is a Professor of Computing Science in the Centre for Software Reliability at City University, London, UK. He was previously the Director of the Centre for Systems and Software Engineering (CSSE) at South Bank University and a Post-Doctoral Research Fellow at Oxford University (Mathematical Institute).

He has consulted widely to industry about metrics programs, and has also led numerous collaborative projects. One such current project is developing a measurement-based framework for the assessment of software engineering standards and methods. His research interests are in software measurement and formal methods of software development. He has written three books on these subjects and published many papers.

Prof. Fenton is Editor of Chapman and Hall's Computer Science Research and Practice Series and is on the Editorial Board of the Software Quality Journal. He has chaired several international conferences on software metrics. Prof. Fenton is Secretary of the (National) Centre for Software Reliability. He is a Chartered Engineer (Member of the IEE), an Associate Fellow of the Institute of Mathematics and its Applications, and a member of the IEEE Computer Society.