
The International Conference on Dependable Systems and Networks 2001 (FTCS-31 and DCCA-9)

Submission category: Regular paper

Title: An Approach to Analysing the Propagation of Data Errors in Software

General topic area: Software reliability

Authors: Martin Hiller, Arshad Jhumka, Neeraj Suri

Laboratory for Dependable Computing
Department of Computer Engineering
Chalmers University of Technology
Hörsalsvägen 11
SE-41296 Göteborg
Sweden

Email: {hiller, arshad, suri}@ce.chalmers.se
Phone: +46-(0)31-772 5228 (Martin Hiller)
Fax: +46-(0)31-772 3663

The underlined author is the contact author.

Abstract:

We present a novel approach to analysing the propagation of data errors in software. The concept of Error Permeability is introduced as a basic measure upon which we define a set of related measures. These measures guide us in the process of analysing the vulnerability of software to find the modules that are most likely exposed to propagating errors. Based on the analysis performed with error permeability and its related measures, we describe how to select suitable locations for error detection mechanisms (EDM's) and error recovery mechanisms (ERM's). A method for experimental estimation of error permeability, based on fault injection, is described and the software of a real embedded control system is analysed to show the type of results that can be obtained by the analysis framework. The results show that the presented framework is very useful for analysing error propagation and software vulnerability, and for deciding where to place EDM's and ERM's.

Keywords: error propagation analysis, software vulnerability analysis, software dependability design, experimental software metrics, fault injection

"Neither this paper nor any version close to it has been or is being offered elsewhere for publication. All necessary clearances have been obtained for the publication of this paper. If accepted, the paper will be made available in camera-ready form by the due date, and it will be personally presented at ICDSN 2001 by the author or one of the co-authors. The presenting author(s) will pre-register for ICDSN 2001 before the due date of the camera-ready paper."

– The Authors

An Approach to Analysing the Propagation of Data Errors in Software

Martin Hiller, Arshad Jhumka, and Neeraj Suri
Department of Computer Engineering
Chalmers University of Technology
Göteborg, Sweden
{hiller, arshad, suri}@ce.chalmers.se

Abstract

We present a novel approach to analysing the propagation of data errors in software. The concept of Error Permeability is introduced as a basic measure upon which we define a set of related measures. These measures guide us in the process of analysing the vulnerability of software to find the modules that are most likely exposed to propagating errors. Based on the analysis performed with error permeability and its related measures, we describe how to select suitable locations for error detection mechanisms (EDM’s) and error recovery mechanisms (ERM’s). A method for experimental estimation of error permeability, based on fault injection, is described and the software of a real embedded control system is analysed to show the type of results that can be obtained by the analysis framework. The results show that the presented framework is very useful for analysing error propagation and software vulnerability, and for deciding where to place EDM’s and ERM’s.

Keywords: error propagation analysis, software vulnerability analysis, dependability design, experimental software metrics, fault injection

1. Introduction

As software-based functionality becomes pervasive in embedded control systems, software usually comprises numerous discrete software modules interacting with each other to provide a specific task or service. When an error (as defined in [Laprie95]) is present in one software module, there is a likelihood that it will propagate to the other software modules with which that module interacts. Knowing where errors propagate in a system is of particular importance for a number of development activities. Propagation analysis may be used to find the most vulnerable modules in a system and to ascertain how different modules affect each other in the presence of errors. Furthermore, propagation analysis also gives an insight into which locations in the system would be suitable for placement of error detection mechanisms (EDM’s) and associated error recovery mechanisms (ERM’s).

Apart from the technical issues that can be addressed using propagation analysis, there are also issues pertaining to project and resource management. Error propagation analysis may be used as a means of obtaining information for deciding where additional resources for dependability development are necessary and where they would be most cost-effective. Software is common not only in applications such as aircraft or other high-cost systems, but also in consumer-based cost-sensitive systems, such as cars. These systems often require both development costs and production costs to be kept low. Analysing error propagation can also complement other analysis activities, for instance FMECA (Failure Mode Effect and Criticality Analysis). Consequently, modules and signals found to be vulnerable and/or critical during propagation analysis might be given more attention during design activities. Thus, error propagation analysis, as a means of both system analysis and resource management, may be a very useful design-stage tool in such systems.

In this paper we present an approach for analysing error propagation in software-based systems. Our basic intent is software-level error propagation; thus we consider distributed software functions resident on either single or distributed hardware nodes. In our approach, we introduce the measure Error Permeability as well as a set of related measures, and subsequently define a methodology for using these measures to obtain information on error propagation and candidate locations for detection and recovery mechanisms. The basic definition of error permeability is the probability of an error in an input signal permeating to one of the output signals (there is one permeability value assigned to each pair of input/output signals).

Paper organisation: We review existing and related work in error propagation analysis in Section 2. In Section 3 we define the system model used in our proposed approach. The definition of the error permeability measure used for propagation analysis, and a method for analysing error propagation paths, are the subject of Section 4. How the permeability values relate to the locations of EDM’s and ERM’s is discussed in Section 5. In Section 6 we describe our method for obtaining an estimate of the error permeability of software modules. An example study and experiment is presented in Section 7, and the results are discussed in Section 8. A summary and conclusions are found in Section 9.

2. Related Work

Error propagation analysis for logic circuits has been used for over 30 years. Numerous algorithms and techniques have been proposed, e.g., the D-algorithm [Roth80], the PODEM-algorithm [Goel81] and the FAN-algorithm [Fujiwara83] (which improves on the PODEM-algorithm).

Propagation analysis in software has been described for debugging use in [Voas90]. Here the propagation analysis aimed at finding the probabilities of source-level locations propagating data-state errors if they were executed with erroneous initial data-states. The framework was further extended in [Voas92]. This framework was used for analysing source code under test in order to determine which test cases should be used during testing to reveal the largest number of defects [Morrel97]. In [Voas94], the framework was used for determining locations for placing assertions during software testing, i.e., aiming to place simple assertions where normal testing would have difficulties finding defects.

An investigation in [Michael97] reported that there was evidence of uniform propagation of data errors. That is, a data error occurring at a location l of a program would, to a high degree, exhibit uniform propagation, meaning that for location l either all data errors would propagate to the system output or none of them would. Our findings do not corroborate this assertion of uniform propagation.

Finding optimal combinations of hardware EDM’s based on experimental results was described in [Steininger97]. They used coverage and latency estimates for a given set of EDM’s to form subsets which minimised the overlap between different EDM’s, thereby giving the best cost-performance ratio.

3. Software System Model

In our studies, we consider modular software, i.e., discrete software functions interacting to deliver the requisite functionality. In this sense, a module is a generalised black-box module having multiple inputs and outputs. These modules may communicate with each other in some specified way using varied forms of signalling, e.g., shared memory, messaging, parameter passing, etc., as pertinent to the chosen communication model.

A software module performs computations using the provided inputs to generate the outputs. At the lowest level, such a black-box module may be a procedure or a function, but could also conceptually be a basic block or a particular code fragment within a procedure or function (at a finer level of software abstraction). A number of such modules constitute a system, and they are linked to each other via signals, much like signal pathways between hardware components on a circuit board. Of course, this system may itself be seen as a larger component or module in an even larger system. Signals can originate internally from a module, e.g., as a calculation result, or externally from the hardware itself, e.g., a sensor reading from a register in an A/D-converter. The destination of a signal may also be internal, being part of the input set of a module, or external, as for example a value placed in a hardware register for physical transmission or D/A-conversion.

Software constructed according to the above is found in numerous embedded systems. For example, most control applications controlling physical events, such as the systems found in automobiles, are traditionally built up in this way. Our studies mainly focus on software developed for embedded systems in consumer products (high-volume, low-production-cost systems).
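The black-box module and signal model described above can be sketched as a small data structure. The names below (`Signal`, `Module`) and the wiring shown are illustrative assumptions for this sketch, not definitions from the paper:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Signal:
    """A named signal; `external` marks system-level inputs/outputs
    (e.g. a sensor reading or a hardware register for D/A-conversion)."""
    name: str
    external: bool = False

@dataclass
class Module:
    """A black-box software module with multiple input and output signals."""
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

# Modules are linked by sharing Signal objects, much like signal pathways
# between hardware components on a circuit board:
a1 = Signal("A1")  # output 1 of module A...
mod_a = Module("A", inputs=[Signal("sensor", external=True)], outputs=[a1])
mod_b = Module("B", inputs=[a1], outputs=[Signal("B1")])  # ...feeds module B
```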

4. Propagation Analysis: Conceptual Basis

In our study we aim to chart the propagation of errors, i.e., the process by which a fault or an error present in one part of the system causes errors in other modules. The errors we have studied are primarily data errors – erroneous values in the internal variables and signals of a system. The main goal of performing an error propagation analysis is to determine how errors propagate and their effect on system operations.

Typically, an error appears in a module by force of nature, for example as a result of faulty sensor readings or radiation flipping bits in memory areas, or by human force, as a result of a design or implementation defect in the system.

There is a probability that an error will affect the course of the execution in such a way that it generates new errors along the execution trajectory of the system. If one could somehow obtain knowledge of the error propagation characteristics of a particular system, this would aid the development of techniques and mechanisms for detecting and eventually correcting the error. Of course, error detection and error recovery mechanisms can also be devised without performing an error propagation analysis. However, knowing where the errors propagate and how they affect the system can greatly improve the effectiveness of error detection and handling, and the consequent cost-performance ratio of these mechanisms, as the efforts can be concentrated on those areas of the system to which errors tend to propagate. In propagation analysis, the results are useful even with minimal knowledge of the distribution of the occurring errors, i.e., even if one does not know which errors are most likely to appear. Having such knowledge would certainly improve the value of the results, but performing the analysis without it still provides qualitative insights on system error susceptibility.

4.1 Error Permeability

In our approach, we introduce the error permeability measure. Based on this measure we can define a set of related measures that cumulatively give an insight into the propagation characteristics and vulnerabilities of a system.

Figure 1. A basic black-box software module with m inputs and n outputs

Consider the software module in Fig. 1 (at this stage we only consider discrete software modules). We start with a simple definition of error permeability and refine it successively. For each pair of input and output signals we can define the error permeability as the conditional probability of an error occurring on the output given that there is an error on the input. That is, for input i and output k of a module M with m inputs and n outputs, we can define the error permeability, ℘M_i,k, as follows:

0 ≤ ℘M_i,k = Pr{error in output k | error in input i} ≤ 1,  (1 ≤ i ≤ m, 1 ≤ k ≤ n)    (1)

This measure indicates how “permeable” an input/output pair of a software module is to errors occurring on the inputs. One major advantage of this definition of error permeability is that it is independent of the probability of error occurrence on the input. This reduces the need for having a detailed model of error occurrence. On the other hand, error permeability is still dependent on the workload of the module as well as on the type of errors that can occur on the inputs. It should be noted that if the error permeability of an input/output pair is zero, this does not necessarily mean that an incoming error did not cause any damage: the error may have caused a latent error in the internal state of the module that for some reason is not visible on the outputs. In Section 6, we describe an approach for experimentally estimating values for this measure.
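The experimental estimation itself is described later in the paper, but the definition in Eq. 1 already suggests a Monte-Carlo fault-injection sketch: inject errors on input i, compare output k against a golden (fault-free) run, and count mismatches. The toy module, the uniform workload, and the single-bit-flip error model below are all illustrative assumptions, not the paper's actual setup:

```python
import random

def toy_module(inputs):
    """A hypothetical 2-input, 2-output module used only for illustration."""
    x, y = inputs
    return (x + y,                 # output 0: any change in x or y permeates
            x if x > 100 else 0)   # output 1: small values of x are masked

def estimate_permeability(module, m, i, k, n_runs=5000, seed=0):
    """Estimate ℘M_i,k = Pr{error in output k | error in input i} (Eq. 1)
    by fault injection: compare golden vs. faulty runs, count mismatches."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(n_runs):
        workload = [rng.randint(0, 255) for _ in range(m)]  # assumed workload
        golden = module(tuple(workload))
        faulty = list(workload)
        faulty[i] ^= 1 << rng.randint(0, 7)  # inject a single-bit data error
        if module(tuple(faulty))[k] != golden[k]:
            mismatches += 1
    return mismatches / n_runs

p = estimate_permeability(toy_module, m=2, i=0, k=1)
# Masking on output 1 yields a permeability well below 1, illustrating that a
# low value does not imply the incoming error caused no internal damage.
```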

Error permeability is the basic measure, which we can use to obtain other measures that will aid us in analysing the propagation characteristics of a system. We define the relative permeability, ℘M, of a module M to be:

0 ≤ ℘M = (1/m)·(1/n)·Σ_i Σ_k ℘M_i,k ≤ 1    (2)

Note that this does not necessarily reflect the overall probability that an error is permeated from the input of the module to the output. Rather, it is an abstract measure that can be used to obtain a relative ordering between different modules. If all inputs are assumed to be independent of each other and errors on one input signal can only generate errors on one output signal at a time, then this measure is the total real-world probability of an error on the input being permeated to the output. However, this is seldom the case in practical applications.

At this stage, one potential weakness of this measure is that it does not distinguish modules with a large number of input and output signals from those with a small number. This distinction is useful to ascertain, as modules with many input and output signals are likely to be central parts (almost like hubs) of the system, thereby attracting errors from different parts of the system. In order to be able to make this distinction, we remove the weighting factor in Eq. 2, thereby “punishing” modules with a large number of input and output signals. That is, we can define the non-weighted relative permeability, ℘*M, for module M as follows:

℘*M = Σ_i Σ_k ℘M_i,k ≥ 0    (3)

As was the case for the relative permeability, this measure does not have a straightforward real-world interpretation, but is rather a measure that can be used during development to obtain a relative ordering between different modules in a system. The larger this value is for a particular module, the more effort has to be spent in order to increase the error containment capability of that module (which is the same as decreasing the error permeability of the module), for instance by using wrappers as in [Salles99]. Note that this measure is not limited to values ≤ 1: as the maximum value of each individual permeability is 1, the upper bound for this measure is m·n.

The two measures defined in Eqs. 2 and 3 are both necessary for analysing the modules of a system. For instance, consider the case where two modules are to be compared. One of the modules has a low number of signals going in and out, and the other has a large number. If they both have the same relative permeability, they will be distinguishable by the non-weighted relative permeability measure, since the smaller module will have a low value and the larger module a high value. Should the two modules have the same non-weighted relative permeability, then we are able to distinguish between them using the relative permeability measure, since the large module will have a low value and the small module a high value.
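The comparison above can be made concrete: given a matrix of per-pair permeability values, Eq. 2 averages over all m·n pairs while Eq. 3 simply sums them. A minimal sketch (the matrices hold made-up values):

```python
def relative_permeability(perm):
    """Eq. 2: the m·n permeability values averaged, so 0 <= result <= 1."""
    m, n = len(perm), len(perm[0])
    return sum(sum(row) for row in perm) / (m * n)

def non_weighted_relative_permeability(perm):
    """Eq. 3: the plain sum, bounded by m·n rather than by 1."""
    return sum(sum(row) for row in perm)

# perm[i][k] holds ℘M_i,k. A small 1x1 module and a hub-like 4x4 module with
# the same relative permeability are separated by the non-weighted measure:
small = [[0.5]]
large = [[0.5] * 4 for _ in range(4)]
assert relative_permeability(small) == relative_permeability(large) == 0.5
assert non_weighted_relative_permeability(large) == 8.0   # 16 pairs * 0.5
```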

4.2 Ascertaining Propagation Paths in Inter-Linked Software Modules

So far, we have obtained the error permeability factors for each discrete software module in a system. Considering every module individually does have limitations, though: this analysis will give insights into which modules are likely (relatively) to transfer incoming errors, but will not reveal which modules are likely to be exposed to propagating errors in the system. In order to gain knowledge about the exposure of the modules to propagating errors in the system, we define the following process, which now considers interactions across modules.

Consider the example software system shown in Fig. 2. Here we have five modules, A through E, connected to each other with a number of signals. The ith input of module M is designated Mi and the kth output of module M is designated Mk (inputs and outputs are numbered separately). External input to the system is received at A1, C2 and C3. The output produced by the system is E1.

Once we have obtained values for the error permeability measures of each pair of input and output signals of each module, we can construct a permeability graph as illustrated in Fig. 3. Each node in the graph corresponds to a particular module and has a number of incoming arcs and a number of outgoing arcs. Each arc has a weight associated with it, and this weight is an error permeability value. Hence, there may be more arcs between two nodes than there are signals between the corresponding modules. In fact, the maximum number of outgoing arcs for a node is the product of the number of incoming signals and the number of outgoing signals of the corresponding software module (remember that there is an error permeability value for every pair of input and output signals of a module). Arcs with a weight of 0 (representing non-permeability from one input to one output) can be omitted.

With the permeability graph we can perform two different propagation analyses, namely:

Figure 2. An example software system with five modules (A, B, C, D, E)

Figure 3. The permeability graph of the example system


A. Backtrack our way from the output of the system to find the paths that give the highest probability of propagation (output error tracing); or

B. Trace errors from the input of the system to find the paths that these errors will most likely propagate along (input error tracing).

Output error tracing is easily accomplished by constructing a set of backtrack trees, one for each system output. These backtrack trees can be constructed quite simply by applying the following steps to the permeability graph:

Step 1. Select a system output signal and let this signal be the root node of the backtrack tree.

Step 2. For each error permeability value associated with the signal, generate a child node. Each child node will be associated with an input signal.

Step 3. For each child node, if the corresponding signal is not a system input signal, backtrack to the generating module, determine the corresponding output signal, and use this signal to construct the sub-tree for the child node as in Step 2. If the corresponding signal is a system input signal, it will be a leaf in the tree. If the corresponding signal is an input signal of the same module, it will be a leaf in the tree having a special relation to its parent node; we do not follow the recursion that is generated by the feedback.

Step 4. If there are more system output signals, go back to Step 1.

This will, for each system output signal, give us a backtrack tree where the root corresponds to the system output signal, the intermediate nodes correspond to internal output signals, and the leaves correspond to system input signals (or module inputs receiving feedback from their own module). Also, all vertices in the tree have a weight corresponding to an error permeability value. Once we have obtained this tree, finding the propagation paths with the highest propagation probability is simply a matter of finding which paths from the root to the leaves have the highest weight.

Input error tracing is done in a very similar way. However, instead of constructing a backtrack tree for each system output signal, we construct a trace tree for each system input signal. The steps for constructing a trace tree are the following:

Step 1. Select a system input signal and let this signal be the root node of the trace tree.

Step 2. Determine the receiving module of the signal and, for each output of that module, generate a child node. This way, each child node will be associated with an output signal.

Step 3. For each child node, if the corresponding signal is not a system output signal, trace the signal to the receiving module, determine the corresponding input signal, and use this signal to construct the sub-tree of the child node as in Step 2. If the corresponding signal is a system output signal, it will be a leaf in the tree. If the input signal belongs to the same module that generated the output signal (i.e., we have a module feedback), then follow this feedback once and generate the sub-trees for the remaining outputs; we do not follow the recursion generated by this feedback.

Step 4. If there are more system input signals, go back to Step 1.

This procedure results in a set of trace trees – one for each system input signal. In a trace tree, the root represents a system input signal, the leaves represent system output signals, and the intermediate branch nodes represent internal input signals. Thus, all vertices are associated with an error permeability value. From the trace trees we find the propagation pathways that errors on system input signals would most likely take by finding the paths from the root to the leaves having the highest weights.

The case when an output signal of a module is connected to an input signal of the same module is handled in the way described in Step 3 of the backtrack tree generation procedure. If we used recursive sub-tree generation, we would get an infinite number of sub-trees with diminishing probabilities. As all permeability values are ≤ 1, the sub-tree with the highest probability is the one which makes only one pass through the feedback loop, and this path is included in the permeability tree. [Roth80], [Goel81] and [Fujiwara83] have also utilised similar techniques for hardware error propagation analysis.
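The trace-tree construction above amounts to a depth-first walk over the permeability graph, multiplying arc weights along each path. The sketch below uses a hypothetical three-module chain without feedback (so the one-pass feedback rule of Step 3 is not exercised); all permeability values and the wiring are made up:

```python
# (module, input i, output k) -> error permeability value ℘M_i,k
perm = {
    ("A", 1, 1): 0.8,
    ("B", 1, 1): 0.5,
    ("B", 1, 2): 0.3,
    ("C", 1, 1): 0.9,
}
# (module, output k) -> (module, input i) it feeds; absent = system output
wiring = {
    ("A", 1): ("B", 1),
    ("B", 1): ("C", 1),
}

def trace(module, inp, weight=1.0, path=()):
    """Enumerate (path, weight) leaves of a trace tree rooted at (module, inp),
    where weight is the product of permeability values along the path."""
    leaves = []
    for (m, i, k), p in perm.items():
        if (m, i) != (module, inp):
            continue
        step = path + ((m, i, k),)
        nxt = wiring.get((m, k))
        if nxt is None:                       # system output reached: a leaf
            leaves.append((step, weight * p))
        else:                                 # follow the signal onwards
            leaves.extend(trace(nxt[0], nxt[1], weight * p, step))
    return leaves

paths = trace("A", 1)                         # trace errors entering input A1
best_path, best_weight = max(paths, key=lambda t: t[1])
# best_path follows A -> B -> C, with weight 0.8 * 0.5 * 0.9
```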

The backtrack tree for system output E1 of the example in Fig. 2 is shown in Fig. 4.

Figure 4. The backtrack tree of system output signal E1 of the example system.

In the permeability tree in Fig. 4 we observe the double line between input signal B1 and output signal B1. This notation implies that we have a local feedback in module B – output B1 is connected to input B1 – and represents the breaking up of the propagation recursion.

The weight of each path is the product of the error permeability values along the path. For example, in Fig. 4, the path from E1 to A1 going straight from input B2 to output B2 of module B (the leftmost path in the tree) has the probability P = ℘A_1,1 · ℘B_2,2 · ℘E_1,1. This is the conditional probability that, given that there is an error in the output signal E1 and the error originated from input signal A1, it propagated directly through output B2 of module B, which is connected to input E1 of module E, and then to output E1.

If we have knowledge regarding the probability of errors appearing on the input signals, we can use these probabilities as additional weights on the paths. For example, if the probability of an error appearing on input A1 is P_A1, then the probability P can be adjusted with this factor, giving us P' = P_A1 · ℘A_1,1 · ℘B_2,2 · ℘E_1,1. This is the probability of an error appearing on system input A1 and propagating through module B directly via output B2 to the output signal E1 of the system.

The trace tree for system input A1 is shown in Fig. 5.


Figure 5. The trace tree for system input signal A1 of the example system.

From the trace tree we can see which propagation path from system input to system output has the highest probability. As for backtrack trees, the probability of a path is obtained by multiplying the error permeability values along the path. For example, in Fig. 5, the probability of an error in A1 propagating to module C, via its output C2 to module D, and from there via module E to system output E1, is P = ℘A_1,2 · ℘C_1,2 · ℘D_3,1 · ℘E_1,1.

Again, if we have knowledge of the probability P_A1 of an error appearing on system input A1, then we can adjust the probability P so that we get P' = P_A1 · ℘A_1,2 · ℘C_1,2 · ℘D_3,1 · ℘E_1,1.

5. Relating Error Permeability to Locations for EDM’s and ERM’s

Using the backtrack trees and trace trees enables us to determine two specific aspects: (a) the paths in the system along which errors will most likely propagate to reach certain output signals, and (b) which output signals are most likely affected by errors occurring on the input signals. With this knowledge we can start selecting locations for the EDM’s and ERM’s that we may want to incorporate into our system based on system reliability/safety requirements.

One problem remains, though: once we have the most probable propagation paths, we still have to find the modules along those paths that are the best targets for EDM’s and ERM’s. Earlier, in Eqs. 2 and 3, we defined a set of measures, relative permeability and non-weighted relative permeability, that can guide us in this search.

These measures only take into account the permeability values of each individual module. That is, coupling between modules is not taken into account. Using the error permeability graph we can define a set of measures that take the coupling explicitly into account and add more information to the process of determining locations for EDM’s and ERM’s. To find the modules that are most likely to be exposed to propagating errors, we want to have some knowledge of the amount of errors that a module may be subjected to. For this we define the error exposure, ℵM, of a module M as:

ℵM = (1 / #incoming arcs of node M) ⋅ Σ (weight of all incoming arcs of node M)    (4)


where M is the node M in the permeability graph, representing the software module M. Incoming arcs

are those arcs that are inbound to that node. This measure does not have a straightforward real-world

interpretation, and does not take into account any correlation that may exist between two or more

incoming arcs. Since we use this as a relative measure, this is not a concern for us. The error exposure is

the mean of the weights of all incoming arcs of a node and is limited upwards by 1/#incoming arcs.

Analogous to the non-weighted relative permeability measure, we can also define a non-weighted error exposure, ℵ*M, of a module M, as the sum of the weights of all incoming arcs of node M (Eq. 5).

This measure does not have a real-world interpretation either – it is used only during system analysis to

obtain a relative ordering between modules. These two exposure measures (Eqs. 4 and 5) along with the

previously defined permeability measures (Eqs. 2 and 3) will be the basis for the analysis performed to

obtain information upon which to base a decision about locating EDM’s and ERM’s. As was the case for

the two relative permeability measures, the two exposure measures are used for distinguishing between

nodes with a small number of incoming arcs and nodes with a large number of incoming arcs.
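A minimal sketch of the two exposure measures, assuming each node's incoming arcs are given as a list of permeability weights (the graph below is hypothetical):

```python
def error_exposure(incoming_weights):
    """Eq. 4: the mean of the permeability weights on a node's incoming arcs."""
    return sum(incoming_weights) / len(incoming_weights)

def nonweighted_error_exposure(incoming_weights):
    """Eq. 5: the sum of the incoming-arc weights (no division by arc count)."""
    return sum(incoming_weights)

# Hypothetical incoming-arc weights per module node.
graph = {"M1": [0.5, 0.2, 0.4], "M2": [0.8, 0.1]}
by_exposure = sorted(graph, key=lambda m: error_exposure(graph[m]), reverse=True)
```

Since both measures are only used for relative ordering, sorting the nodes by either value is how they would enter the analysis.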

The error exposure measure indicates which modules will most probably be the ones exposed to errors

propagating through the system. If we want to analyse the system at the signal level and get indications on

which signals might be the ones that errors most likely will propagate through, we can define the signal

error exposure. This measure is the equivalent of the error exposure defined in Eq. 4, but is only

calculated for one signal at a time. In the backtrack trees we can easily see which error permeability values

are directly associated with a signal S. We define the set Sp as composed of all unique arcs going to the

child nodes of all nodes generated by the signal S. A signal may generate multiple nodes in a backtrack

tree (see for instance signal B1 in the backtrack tree in Fig. 4). However, in the set Sp, the permeability

values associated with the arcs emanating from those nodes will only be counted once. The signal error exposure, ℵsS, of signal S, is then calculated as the sum of all permeability values in Sp (Eq. 6).

The interpretation for the signal error exposure is the same as for the error exposure of a module, but at

a signal level. That is, the higher a signal error exposure value, the higher the probability of errors in the

system being propagated through that signal.
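The same sketch extends to the signal level, assuming Sp is represented as a mapping from unique arcs to their permeability values, so that an arc reached via several backtrack-tree nodes is counted only once (all values hypothetical):

```python
def signal_error_exposure(sp):
    """Eq. 6: sum the permeability values of the unique arcs in the set Sp.
    Keying by (module, input, output) makes repeated tree nodes count once."""
    return sum(sp.values())

# Hypothetical Sp for a signal feeding two modules; an arc such as
# ("D", 3, 1) may appear under several tree nodes but is stored only once.
sp = {("C", 1, 2): 0.5, ("D", 3, 1): 0.8, ("D", 2, 1): 0.1}
exposure = signal_error_exposure(sp)
```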

It may be difficult to give strict rules for selecting the EDM and ERM locations. However, some rules

of thumb or recommendations can still be made:

•

The higher the error exposure of a module, the higher the probability that it will be subjected to

errors propagating through the system if errors are indeed present. Therefore, it may be more cost

effective to place EDM’s in those modules than in modules with lower error exposure. An

analogous way of reasoning is valid also for the signal error exposure.

ℵ*M = Σ (weight of all incoming arcs of node M)    (5)

ℵsS = Σ (permeability values of the arcs in Sp)    (6)


•

The higher the error permeability of a module, the higher the probability of subsequent modules

being subjected to propagating errors if errors should pass through the module. Therefore, it may be

more cost effective to place ERM’s in those modules than in those with lower error permeability.
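These rules of thumb reduce to two rankings: order the modules by error exposure to obtain EDM candidates, and by error permeability to obtain ERM candidates. A sketch with hypothetical per-module scores (M1 to M3 are placeholder names):

```python
def candidate_locations(exposure, permeability, n=2):
    """Rank modules for EDM placement (by error exposure) and for ERM
    placement (by error permeability), highest values first."""
    edm = sorted(exposure, key=exposure.get, reverse=True)[:n]
    erm = sorted(permeability, key=permeability.get, reverse=True)[:n]
    return edm, erm

# Hypothetical analysis results for three modules.
exposure = {"M1": 0.42, "M2": 0.35, "M3": 0.18}
permeability = {"M1": 0.60, "M2": 0.70, "M3": 0.20}
edm_sites, erm_sites = candidate_locations(exposure, permeability)
```

In practice the cut-off n would be set by the reliability/safety requirements and the cost budget for the mechanisms.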

We have now defined a basic analytical framework for ascertaining measures pertaining to error

propagation and software vulnerability. Next, we will describe how to obtain experimental estimates of

the measures and then use our framework on the actual software of a real embedded control system.

6. Estimating Error Permeability: An Experimental Approach

Our method for experimentally estimating the error permeability values of software modules is based

on fault injection (FI). FI artificially introduces faults and/or errors into a system and has been used for

evaluation and assessment of dependability for several years, see for example [Chillarege89], [Arlat90],

and [Fabre99]. A comprehensive survey of experimental analysis of dependability appears in [Iyer96].

For analysis of the raw experimental data, we make use of so-called Golden Run Comparisons (GRC).

A Golden Run is a trace of the system executing without any injections being made; this trace is used as a reference and is assumed to be “correct”. All traces obtained from the experiments, where injections

were conducted, are compared to the Golden Run, and any difference indicates that an error has occurred.

The main advantage of this approach is that it does not require any a priori knowledge of how the various

signals are supposed to behave, which makes this approach less application specific.

For this study, we used the Propagation Analysis Environment (PROPANE, [Hiller00b]). This is a tool

that enables fault and error injection, using SWIFI (SoftWare Implemented Fault Injection), in software

running on a desktop system (currently only available for Microsoft Windows NT4/2000). The tool is also

capable of creating traces of individual variables and different pre-defined events during the execution.

Each trace of a variable from an injection experiment is compared to the corresponding trace in the

Golden Run. Any discrepancy is recorded as an error.
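At its core, a Golden Run Comparison is a point-by-point trace comparison. A minimal sketch, assuming each trace is a list of sampled variable values:

```python
def deviates_from_golden(golden, injected):
    """Return True if the injected-run trace differs anywhere from the
    golden (injection-free) trace; any discrepancy counts as an error."""
    if len(golden) != len(injected):
        return True
    return any(g != i for g, i in zip(golden, injected))

golden_trace = [0, 1, 2, 3, 4]
hit = deviates_from_golden(golden_trace, [0, 1, 9, 3, 4])   # error observed
miss = deviates_from_golden(golden_trace, [0, 1, 2, 3, 4])  # no deviation
```

The comparison needs no model of correct signal behaviour, which is exactly the advantage noted above.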

Experimentally estimating values for error permeability of a module is done by injecting errors in the

input signals of the module and logging its output signals. We only inject one error at a time in one input

signal at a time. Suppose that, for module M, we inject ninj distinct errors in input i and observe at output k nerr differences compared to the Golden Run; then we can directly estimate the error permeability ℘Mk_i to be nerr/ninj (see more on experimental estimation in [Powell95] and [Cukier99]).
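The estimation itself reduces to counting. A minimal sketch for a single-input, single-output module, using a single-bit-flip error model and a hypothetical module function (both are illustrative assumptions, not the paper's setup):

```python
def estimate_permeability(module, golden_input, bit_width=16):
    """Estimate one error permeability value as n_err / n_inj: flip each
    input bit once (one injected error per run) and compare the module's
    output against the golden run."""
    golden_output = module(golden_input)
    n_inj = n_err = 0
    for bit in range(bit_width):
        corrupted = golden_input ^ (1 << bit)    # single bit-flip error model
        n_inj += 1
        if module(corrupted) != golden_output:   # Golden Run Comparison
            n_err += 1
    return n_err / n_inj

# Hypothetical module: discards the four low-order input bits, so errors
# injected there are masked and do not permeate to the output.
coarse_scale = lambda x: x >> 4
p = estimate_permeability(coarse_scale, golden_input=0)
```

With the module above, the 4 low-order-bit flips are masked and the other 12 propagate, giving an estimate of 12/16.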

Since the propagation of errors may differ depending on the workload of the system, it is generally

preferred to have realistic input distributions rather than randomly generated inputs. This will very likely

generate permeability estimates that are closer to the “real” values than randomly chosen inputs would.

The type of errors that are injected may also have an effect on the estimates. Ideally, one would inject

errors from a realistic set, and with a realistic distribution. However, as in our framework the measures are

mainly used as relative measures, the relevance of the realism provided by the error model is decreased,


assuming that the relative order of the modules and signals when analysing permeability is maintained.

7. Experimental Analysis: An Example Embedded System

As an actual application of our proposed methodology to an embedded control system, we have conducted an example study. This study illustrates the results obtained using experimental estimates for the error permeability values.

7.1 Target Software System

The target system is a medium sized embedded control system used for arresting aircraft on short

runways and aircraft carriers. The system aids incoming aircraft to reduce their velocity, eventually

bringing them to a complete stop. The system is constructed according to specifications found in

[USAF86]. The system is illustrated in Fig. 7.

In our study, we used the original software of the master node of the system and ported it so it would

run on a Windows-based desktop computer. The scheduling of the software is slot-based and non-

preemptive. Therefore, from the viewpoint of the software, there is no difference in running on the actual

hardware or running on a desktop computer. Some glue software had to be developed in order to simulate

registers for A/D-conversion, timers, counter registers etc., accessed by the application. An environment

simulator used in experiments conducted on the real system was also ported, so the environment

experienced by the real system and the desktop system was identical. The simulator handles the rotating

drum and the incoming aircraft (as illustrated in Fig. 8).

In the real system, there are two nodes – one master node which does all the calculations for the desired

pressure to be applied, and one slave node which receives the desired pressure from the master node. Each

node controls one of the rotating drums. In our setup, the slave node was removed and the retracting force

applied by the rotating drum of the master node was also applied on the slave-node end of the cable. This

is equivalent to the case where the two drums behave identically.

The structure of the software is illustrated in Fig. 9. The numbers shown at the inputs and outputs are

used for numbering the signals. For instance, the signal PACNT is input signal #1 of the DIST_S module,

and the signal SetValue is output signal #2 of the CALC module.


Figure 7. The target system used in

our example study


Figure 8. The target system and the

environment simulator


Figure 9. The software structure of the target system

The software is composed of six modules of varying size and input/output signal count. The specifics of each module are as follows:

•

CLOCK provides a clock, mscnt, with one millisecond resolution. The system operates in seven 1-ms-slots.

In each slot, one or more of the other modules (except for CALC) are invoked. The signal ms_slot_nbr tells

the module scheduler the current execution slot. Period = 1 ms.

•

DIST_S receives the signals PACNT and TIC1 from the rotation sensor and the signal TCNT from the

hardware counter modules. The module provides a total count of the pulses, pulscnt, generated during the

arrestment. It also provides two Boolean values, slow_speed and stopped, saying whether the aircraft

velocity has gone below a certain threshold (slow_speed == TRUE) and whether it has stopped altogether

(stopped == TRUE). The rotation sensor reads the number of pulses generated by a tooth wheel on the tape

drum. Period = 1 ms.

•

CALC (which is the main background process) uses the signals mscnt, pulscnt, slow_speed and stopped to

calculate a set point value for the pressure valves, SetValue, at six predefined checkpoints along the

runway. The distance between these checkpoints is constant, and they are detected by comparing the

current pulscnt with internally stored pulscnt-values corresponding to the various checkpoints. The current

checkpoint is stored in i. Period = n/a (background task, runs when other modules are dormant).

•

PRES_S monitors the pressure sensor measuring the pressure that is actually being applied by the pressure

valves, using the signal ADC from the internal A/D-converter. This value is provided to the rest of the

system in the signal IsValue. Period = 7 ms.

•

V_REG uses the signals SetValue and IsValue to control OutValue, the output value to the pressure valve.

OutValue is based on SetValue and then modified to compensate for the difference between SetValue and

IsValue. This module contains the software implemented PID-regulator. Period = 7 ms.

•

PRES_A uses the OutValue signal to set the pressure valve via the hardware register TOC2. Period = 7 ms.
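The slot-based, non-preemptive schedule described above can be sketched as a dispatch table. The per-slot assignments of the 7-ms modules below are illustrative guesses, not the target system's actual schedule:

```python
# CLOCK and DIST_S have a 1-ms period, so they run in every slot of the
# seven-slot cycle; each 7-ms module occupies one slot (the slot numbers
# chosen here are guesses); CALC runs as the background task in whatever
# slot time remains.
SLOT_TABLE = {slot: ["CLOCK", "DIST_S"] for slot in range(7)}
SLOT_TABLE[1].append("PRES_S")
SLOT_TABLE[3].append("V_REG")
SLOT_TABLE[5].append("PRES_A")

def dispatch(ms_slot_nbr):
    """Non-preemptive dispatch: modules run to completion in list order,
    and the background task CALC fills the remainder of the slot."""
    return SLOT_TABLE[ms_slot_nbr % 7] + ["CALC"]
```

Because dispatch depends only on ms_slot_nbr, the same sequence results whether the software runs on the target hardware or on the desktop port.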

7.2 System Analysis

Prior to running the experiments we generated the permeability graph and the backtrack trees and trace

trees for the target system as per the process described in Sections 4 and 5. The permeability graph is

shown in Fig.10.


Figure 10. The permeability graph of the arrestment system

In the graph (Fig. 10) we can see the various permeability values (labels on the arcs) that will have to

be calculated. The numbers used in the notation refer to the numbers of the input signals and output

signals respectively, as shown in Fig. 9. For instance, ℘CALC1_2 is the permeability from input 2 to output 1 of module CALC; as can be seen in Fig. 9, input 2 is mscnt and output 1 is i.

From the permeability graph in Fig. 10 we can now generate the backtrack tree for the system output

signal TOC2, using the steps described in Section 4.2. This tree is shown in Fig. 11.

Figure 11. The backtrack tree for system output TOC2
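Mechanically, generating such a backtrack tree is a backward traversal of the permeability graph from the chosen output signal. A rough sketch, assuming a hypothetical encoding where producers maps a signal to the module emitting it and inputs maps a module to the signals it reads (a depth cut-off stands in for proper cycle handling):

```python
def backtrack_tree(signal, producers, inputs, depth=4):
    """Expand `signal` backwards into a (signal, module, children) tree:
    each node's children are the trees of the producing module's inputs."""
    if depth == 0 or signal not in producers:
        return (signal, None, [])          # system input (or cut-off): leaf
    module = producers[signal]
    children = [backtrack_tree(s, producers, inputs, depth - 1)
                for s in inputs[module]]
    return (signal, module, children)

# Toy fragment of the example system around V_REG (not the full graph).
producers = {"OutValue": "V_REG", "SetValue": "CALC", "IsValue": "PRES_S"}
inputs = {"V_REG": ["SetValue", "IsValue"], "CALC": [], "PRES_S": []}
tree = backtrack_tree("OutValue", producers, inputs)
```

Labelling each parent-child arc with the corresponding ℘ value then yields the path probabilities discussed in Section 5.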

