ABSTRACT: This paper considers automatic, specification-based detection of failures (supervision) of software systems. It is applicable to systems specified in a formalism based on communicating finite state machines. The technique enhances the belief-based approach to supervision to allow its continuation after occurrences of failures. The enhancement adopts the fuzzy set view of the membership of hypotheses in the behavior matched set. The paper first overviews the belief-based approach, then presents the fuzzy enhancements, describes an experimental evaluation of the technique, and summarizes its results.
Availability, Reliability and Security, 2008. ARES 08. Third International Conference on; 04/2008
ABSTRACT: The paper considers automatic, specification-based detection of failures (differences between observed and specified behavior) in external behavior of software systems. The external behavior is recorded in traces, which are analyzed for the presence of failures. The paper describes a novel failure detection technique. The technique is applicable to multi-user systems which are reactive, session-oriented and specified in formalisms based on communicating extended finite state machines. It separates the failure detection concerns into two parts, the detection of failures directly noticeable by individual users, and the determination whether the individually correct local behaviors are globally consistent with the specification. An experimental evaluation of the technique on the control program for a small telephone exchange is also presented.
Availability, Reliability and Security, International Conference on. 01/2007;
ABSTRACT: Self-protection is an attribute of autonomic computing systems, reflecting the requirement to proactively defend against attackers, and automatically detect and recover from attacks. As demonstrated by increasing numbers of Internet worms, a single previously unknown vulnerability can cause an entire infrastructure to crumble, due to software and hardware monocultures. One defence against complete failures is diversity: by utilizing differing implementations of software and hardware, the potential total damage from a single exploit is lessened. The self-deployment and self-configuration features of an autonomic computing infrastructure make it practical to use diversity as a self-protection mechanism. We explore the idea of using diversity as a factor in resource allocation decisions, showing how it could be used to limit the damage an attacker can inflict.
Availability, Reliability and Security, 2006. ARES 2006. The First International Conference on; 05/2006
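As a rough illustration of diversity as a factor in resource allocation, the sketch below greedily places each new instance on the least-used implementation variant and reports the share of instances a single exploit could take out. It is not the paper's allocation scheme; the function names and the variant labels are hypothetical.

```python
from collections import Counter

def allocate(instances, variants):
    """Assign each instance to the currently least-used variant (greedy)."""
    usage = Counter({v: 0 for v in variants})
    placement = {}
    for inst in instances:
        # min() returns the first minimal element, so ties go to the earlier variant
        choice = min(variants, key=lambda v: usage[v])
        placement[inst] = choice
        usage[choice] += 1
    return placement

def worst_case_loss(placement):
    """Fraction of instances lost if the most common variant is exploited."""
    counts = Counter(placement.values())
    return max(counts.values()) / len(placement)

# Hypothetical instance and variant names, for illustration only.
diverse = allocate(["a", "b", "c", "d"], ["linux", "bsd"])
monoculture = allocate(["a", "b", "c", "d"], ["linux"])
print(worst_case_loss(diverse))      # 0.5
print(worst_case_loss(monoculture))  # 1.0
```

With two variants a single exploit reaches at most half of the instances, whereas the monoculture loses all of them.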
ABSTRACT: Autonomic computing addresses increasing complexity in computer-based systems by giving these systems the ability to automatically manage many aspects of their own operation. While many aspects of self-management have been examined in isolation, there is a notable lack of an effective autonomic computing infrastructure publicly available with which these techniques could be integrated, compared, and evaluated. We describe an autonomic computing architecture and accompanying implementation infrastructure constructed on top of the cognitive agent architecture, showing that many of its features map naturally to autonomic computing concepts. By implementing a common infrastructure and providing sample applications, autonomic computing research will be better prepared to develop and evaluate self-management techniques.
Engineering of Autonomic and Autonomous Systems, 2006. EASe 2006. Proceedings of the Third IEEE International Workshop on; 04/2006
ABSTRACT: With current trends towards more complex software systems and the use of higher-level languages, monitoring techniques are of increasing importance in areas such as performance enhancement, dependability, and correctness checking. In this paper, we present a formal specification-based online monitoring technique. The key idea of our technique is to build a linking system, which connects a specification animator and a program debugger. The required information about dynamic behaviors of the formal specification and concrete implementation of a target system is obtained from the animator and the debugger. Based on that information, a judgment on the consistency of the concrete implementation with the formal specification is provided. Because it embeds no instrumentation code in the target system, our monitoring technique does not alter the dynamic behavior of the target system. By animating the formal specification, rather than annotating the target system with extra formal specifications, our monitoring technique separates the implementation-dependent description of the monitored objects from their formal requirement specification.
Engineering of Complex Computer Systems, 2006. ICECCS 2006. 11th IEEE International Conference on; 01/2006
ABSTRACT: The problem of automatic detection of failures of reactive, session-oriented software programs is described. Detection of failures is carried out by a separate unit, which observes the inputs and outputs of the target program and reports the failures detected.
Control and Communications, 2005. SIBCON '05. IEEE International Siberian Conference on; 11/2005
ABSTRACT: The benefits of monitoring the internal health of complex systems are recognized in mature engineering disciplines. Such monitoring helps maintain the operational reliability and availability of the system. Recently, research has begun to address the notion of health of complex software systems and its monitoring. This paper outlines a three-layer software health monitoring architecture and presents a collection of design patterns for the bottom two layers of the architecture. The patterns can be implemented with aspect-oriented technologies, which increase system modularity and facilitate retrofitting of monitoring capability onto existing systems. The application of the patterns to a control program of a small telephone exchange is described and the results of its general assessment are summarized.
Engineering of Complex Computer Systems, 2005. ICECCS 2005. Proceedings. 10th IEEE International Conference on; 07/2005
ABSTRACT: The paper presents a curriculum for a 4-year undergraduate program in Embedded System Engineering (ESE). The curriculum was developed using a two-step approach. First, a body of education knowledge for Embedded System Engineering was defined. The body consists of sixteen knowledge areas. Each area is composed of several knowledge units, some designated as core and others as electives. The minimum lecture time for the core of each knowledge area is identified. The Body of Knowledge for Computer Engineering, developed by the IEEE-CS/ACM task force for Computing Curricula, was used as a reference. The education knowledge for ESE then served as the base for the development of the program curriculum. The curriculum has a strong mathematics and basic science base, an in-depth exposure to engineering science and design of systems implemented with digital hardware and software, and coverage of two prominent application areas of embedded systems. The curriculum core takes approximately 3 years of the program; the remaining part is elective.
ABSTRACT: The quality of software components is very important for the overall service quality of the component-based software systems. Several factors make exhaustive testing of components very difficult. Furthermore, the behavioral correctness of each independently produced component does not guarantee the behavioral correctness of the composed software system. Experience shows that there are faults in components which elude the testing effort and do not surface until the system is operating. In this paper, a specification-based software monitor is presented which can be used for detecting certain kinds of errors and failures of a component as well as the whole system while the system is operating. The behavior of each component is assumed to be specified in a formalism based on communicating finite state machines with addressing variables, and inter-component communications are achieved via asynchronous message passing. The monitor passively observes the external input/output and receives partial state information of the target system or component. These are used to interpret the specification. The approach is compositional as it achieves global monitoring by analyzing the behavior of the components of a system individually, and then combining the results obtained from the independent component analyses. The paper describes the architecture and operations of the monitor and includes illustrative examples. Techniques for dealing with non-determinism and concurrency issues in monitoring a concurrent component-based system are also discussed.
ABSTRACT: Global predicate evaluation is a fundamental problem in distributed systems. This paper views it from a different perspective, namely that of the signals and systems area of electrical engineering. It adapts a signal processing approach to address this problem in the context of monitoring of 'health' of a software system. The global state of the system is viewed as a 'state' signal which evolves over time. The distributed processes are assumed to possess roughly synchronized clocks. The states of individual processes are periodically sampled and reported to a global monitor. The observed system state constructed by the global monitor is viewed as being composed of two components - the consistent global states and an error signal due to the messages in transit and differences in the local clocks. The global monitor removes the error signal by processing the observed global signal through a low-pass filter. It evaluates the predicates on the filtered signal. The approach presented is applicable to distributed systems which are semi-stationary, i.e. whose internal states of interest remain stable over comparatively long intervals of time. The paper presents the relevant signal processing concepts (p-spectrum and p-filtering), outlines an architecture for global predicate monitoring and describes the signal processing done in the global monitor. The paper then summarizes an evaluation of the approach presented on a small computer aided vehicle dispatch system. The evaluation experiments are described and the results are presented and analyzed.
Reliable Distributed Systems, 2004. Proceedings of the 23rd IEEE International Symposium on; 11/2004
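The filtering idea in this abstract can be sketched as follows: the predicate is evaluated on each sampled global state to give a 0/1 signal, the signal is smoothed, and a threshold is applied to the result. A trailing moving average stands in for the low-pass filter here; it is an assumption chosen for brevity, not the paper's p-filtering, and the window, threshold, and state layout are hypothetical.

```python
def observed_signal(samples, predicate):
    """Evaluate the predicate on each sampled global state, yielding a 0/1 signal."""
    return [1.0 if predicate(s) else 0.0 for s in samples]

def low_pass(signal, window):
    """Trailing moving average over at most the last `window` samples."""
    out = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def filtered_predicate(samples, predicate, window=3, threshold=0.5):
    """Report the predicate as holding where the smoothed signal exceeds the threshold."""
    smoothed = low_pass(observed_signal(samples, predicate), window)
    return [v > threshold for v in smoothed]

# Hypothetical global state: (messages sent, messages received).
in_sync = lambda state: state[0] == state[1]

# One inconsistent sample (a message in transit) is smoothed away ...
transient = [(1, 1), (2, 2), (3, 2), (3, 3), (4, 4)]
# ... while a sustained mismatch still shows up after the filter.
sustained = [(1, 1), (2, 2), (3, 2), (4, 2), (5, 2)]

print(filtered_predicate(transient, in_sync))  # [True, True, True, True, True]
print(filtered_predicate(sustained, in_sync))  # [True, True, True, False, False]
```

The filter trades detection delay for robustness, which matches the abstract's restriction to semi-stationary systems whose states of interest stay stable for comparatively long intervals.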
ABSTRACT: We describe a monitoring approach for evaluation of the response performance of services delivered by real-time software systems. Our approach handles certain specification nondeterminism in the behavioral requirements of these systems and is capable of concurrently measuring state-dependent response time intervals. We detect impairments to service performance as response performance failures, i.e., those system response time intervals that statistically exceed some specified maximum delay. While monitoring of behavioral correctness may require a full specification model to detect behavioral failures, our approach detects response time and response performance failures using a reduced timepost-model (TPM). We consider those targets whose: (1) behavior is specified using communicating extended finite state machines; (2) response time objectives are tabular in format. We present an algorithm for deriving an interpretable TPM from these software requirements. We report and comment on an experimental evaluation of the TPM derivation algorithm and the robustness of the approach in the presence of behavioral failures. The target in the evaluation was the call processing program for a small telephone switch.
ABSTRACT: This paper proposes a specification-based monitoring approach for automatic run-time detection of software errors and failures of distributed systems. The specification is assumed to be expressed in a formalism based on communicating finite state machines. The monitor observes the external I/O and partial state information of the target distributed system and uses them to interpret the specification. The approach is compositional as it achieves global monitoring by combining the component-level monitoring. The core of the paper describes the architecture and operations of the monitor. The monitor includes several independent mechanisms, each tailored to detecting specific kinds of errors or failures. Their operations are described in detail using illustrative examples. Techniques for dealing with nondeterminism and concurrency issues in monitoring a distributed system are also discussed with respect to the considered model and specification. A case study describing the application of the prototype monitor to an embedded system is presented.
Dependable Systems and Networks, 2002. DSN 2002. Proceedings. International Conference on; 02/2002
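A minimal sketch of interpreting a nondeterministic finite state machine specification against observed events: the monitor keeps the set of states the target could be in and reports a failure when no candidate state allows the next event. This simplifies the approach described above (a single machine, no partial state information, no inter-component communication), and the toy telephone specification and all names are hypothetical.

```python
def step(spec, states, event):
    """Advance every candidate state on `event`; states with no transition drop out."""
    nxt = set()
    for s in states:
        nxt |= spec.get((s, event), set())
    return nxt

def monitor(spec, initial, trace):
    """Return the index of the first event that no candidate state allows, or None."""
    states = {initial}
    for i, event in enumerate(trace):
        states = step(spec, states, event)
        if not states:
            return i
    return None

# Hypothetical specification: (state, event) -> set of possible next states.
# After a digit the call may still be dialing or already connected (nondeterminism).
SPEC = {
    ("idle", "offhook"): {"dialtone"},
    ("dialtone", "digit"): {"dialing", "connected"},
    ("dialing", "digit"): {"dialing", "connected"},
    ("dialtone", "onhook"): {"idle"},
    ("dialing", "onhook"): {"idle"},
    ("connected", "onhook"): {"idle"},
}

print(monitor(SPEC, "idle", ["offhook", "digit", "digit", "onhook"]))  # None
print(monitor(SPEC, "idle", ["offhook", "offhook"]))                   # 1
```

Tracking a set of candidates avoids backtracking over the trace, at the cost of carrying every hypothesis that is still consistent with the observations.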
ABSTRACT: In the development of many software systems, the focus was on functionality. When these systems are used in situations requiring higher reliability and availability, such qualities must be retrofitted into the system. This paper considers a way of adding such capability to existing software by taking advantage of aspect-oriented programming, a recently developed technology which supports encapsulation of non-functional concerns. The paper introduces the notion of a system health index as a measure of the internal system well-being, and identifies a number of health indicators, i.e. operational metrics from which a health index could be derived. The paper then discusses an aspect-oriented implementation of health indicators and lists several applicable aspect-oriented design patterns. Experience obtained from the implementation of health indicators in a distributed system whose original development focused only on functionality is then summarized. The experience suggests that time and space overhead and development costs are moderate, and that there is a distinct advantage of aspect-oriented implementation of health indicators.
ABSTRACT: The capability to provide an indication of the internal well-being or health of an operational software system would be very valuable in a number of situations. This paper considers a way of adding such capability to existing Java programs by taking advantage of AspectJ, an aspect-oriented programming language. It introduces an approach for detecting internal state corruption by using health indicators which perform state consistency checks. An example is presented and experience obtained from the AspectJ implementation of a number of state consistency health indicators is summarized.
ABSTRACT: This paper presents an approach based on assume-guarantee style reasoning for automatic detection of software failures. Reasoning
about failures requires knowing the expected behavior. The paper considers the case when the requirement specification of
the behavior of the target system is available, and expressed in a formalism based on communicating finite state machines.
The failure detector observes the external inputs and outputs, and receives partial information about the internal state of
the target system. Using this information, it interprets the specification, and determines whether a failure has occurred.
A key issue in the interpretation of the specification is the efficiency of handling of inherent nondeterminism present in
the specification. The paper describes, in a step by step manner, a compositional approach for online failure detection which
reduces the computational costs of dealing with non-determinism. The details of the algorithms required in each of the steps
are provided. To evaluate the algorithms described, a prototype failure detector was used to detect failures of the control
program of a small telephone exchange. We present some of the results obtained.
Integrated Formal Methods, Third International Conference, IFM 2002, Turku, Finland, May 15-18, 2002, Proceedings; 01/2002
ABSTRACT: Large software systems, such as telecom applications, are often
built on reused components. Such systems are often developed using
components from previous similar projects, or simply using a
reconfiguration of the same set of components from a previous similar
project. Performance prediction of such software system architectural
design provides a quantified measurement for better design quality. The
design is specified in a communicating extended finite state machine
model. The model is extended with stochastic information and simulated
for performance prediction. The stochastic extension requires
performance data for each component and load information of the system
environment. This paper addresses the problem of abstracting a stochastic
performance model of a component to be reused in a software
architectural design. We use a software supervision approach to monitor
the performance of a deployed component and collect its execution trace,
including individual time stamps of the externally observable signals.
We then derive a stochastic performance model of the component from the
trace. The model can be used later in performance prediction when the
component is reused. We applied this method to a control program of a
small telephone exchange. We were able to reuse a component and its
performance data in a new exchange design.
Performance, Computing, and Communications Conference, 2000. IPCCC '00. Conference Proceeding of the IEEE International; 03/2000
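To make the trace-to-model step concrete, the sketch below pairs each stimulus signal in a timestamped trace with the next matching response and summarizes the delays by mean and standard deviation. It is only a simplified illustration under assumed names and an assumed trace format (integer millisecond timestamps); the paper's actual derivation of the stochastic model is not reproduced here.

```python
from statistics import mean, pstdev

def delays(trace, stimulus, response):
    """Pair each stimulus with the next response and return the elapsed times."""
    pending = None
    result = []
    for timestamp, signal in trace:
        if signal == stimulus and pending is None:
            pending = timestamp          # start timing at the stimulus
        elif signal == response and pending is not None:
            result.append(timestamp - pending)
            pending = None               # wait for the next stimulus
    return result

def summarize(trace, stimulus, response):
    """Summarize the delays; raises StatisticsError if no pair was observed."""
    d = delays(trace, stimulus, response)
    return {"count": len(d), "mean": mean(d), "stdev": pstdev(d)}

# Hypothetical trace of externally observable signals: (time in ms, signal name).
TRACE = [(0, "offhook"), (20, "dialtone"), (500, "offhook"), (540, "dialtone")]

print(delays(TRACE, "offhook", "dialtone"))  # [20, 40]
print(summarize(TRACE, "offhook", "dialtone"))
```

A summary of this kind could then parameterize the delay of the reused component in a simulation of the new design.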
ABSTRACT: Building software systems from prefabricated components is a very
attractive vision. Distributed component platforms (DCP) and their
visual development environments bring this vision closer. However, some
experiences with component libraries warn us about potential problems
that arise when software-system families or systems evolve over many
years of changes. Indeed, implementation-level components, when affected
by many independent changes, tend to grow in both size and number,
impeding reuse. This unwanted effect is analysed in detail. It is argued
that components affected by frequent unexpected changes require higher
levels of flexibility than the 'plug-and-play' paradigm is able to
provide. A program construction environment is proposed, based on
generative programming techniques, to help in customisation and
evolution of components that require much flexibility. This solution
allows the benefits of DCPs to be reaped during runtime and, at the same
time, keeps components under control during system construction and
evolution. Salient features of a construction environment for component
based systems are discussed. Its implementation with commercial reuse
technology Fusion™ is described. The main lesson learnt from
the project is that generative-programming techniques can extend the
strengths of the component based approach in two important ways: 1)
generative-programming techniques automate routine component
customisation and composition tasks and allow developers to work more
productively, at a higher abstraction level; 2) as custom components
with required properties are generated on demand, it is not necessary to
store and manage multiple versions of components.