Realization of Fault-Tolerant Home Network Management Middleware with the TMO Structuring Approach and an Integration of Fault Detection and Reconfiguration Mechanisms
ABSTRACT A middleware model named ROAFTS (Real-time Object-oriented Adaptive Fault Tolerance Support) has been evolving in the UCI DREAM Lab over the past decade as the core of a reliable execution engine model for fault-tolerant (FT) real-time (RT) distributed computing (DC) applications. It is meant to integrate various mechanisms for fault detection and recovery in a form that meshes well with high-level RT DC object-/component-based programming, in particular TMO (Time-triggered Message-triggered Object) programming. Using ROAFTS as a backbone and low-layer middleware, we developed a model and a skeleton implementation for FT DC middleware providing efficient FT execution services for component-based home network applications. Capabilities for management of home information processing devices, including health monitoring of home devices, reconfiguration of device connections, and servicing queries on device status, were added to ROAFTS. Those additions were first designed as a network of high-level RT DC components, i.e., TMOs. Then the TMO network was extended into an FT TMO network by applying the replication scheme called the PSTR (Primary-Shadow TMO Replication) scheme and incorporating a component responsible for reconfiguring TMO replicas. This extension of ROAFTS is called ROAFTS-HNE (Home Network Extension), and its architecture is presented here. In addition, during the development of the ROAFTS-HNE model, we formulated a new approach for applying the PSTR scheme to RT DC components supported by ROAFTS. Finally, the recovery times of a prototype implementation have been evaluated.
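The abstract names the PSTR scheme but does not describe its mechanics. The following is only a minimal sketch of the general primary-shadow idea, not the ROAFTS-HNE implementation: both replicas process every request, the primary's result is delivered if it passes an acceptance test, and otherwise the shadow's result is delivered and the roles are switched. The names `Replica`, `PrimaryShadowPair`, `ReplicaCrashed`, and `acceptance_test` are hypothetical, and timing constraints, which are central to real TMO replication, are left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


class ReplicaCrashed(Exception):
    """Hypothetical signal that a replica could not produce a result."""


@dataclass
class Replica:
    name: str
    compute: Callable[[int], int]  # the replicated service (assumed int -> int)


class PrimaryShadowPair:
    """Toy primary-shadow pair with an acceptance test and role switching."""

    def __init__(self, primary: Replica, shadow: Replica,
                 acceptance_test: Callable[[int, int], bool]) -> None:
        self.primary = primary
        self.shadow = shadow
        self.acceptance_test = acceptance_test
        self.failovers = 0  # number of role switches so far

    def _run(self, replica: Replica, request: int) -> Tuple[bool, Optional[int]]:
        # Returns (ok, result); ok is False on a crash or a failed acceptance test.
        try:
            result = replica.compute(request)
        except ReplicaCrashed:
            return False, None
        return self.acceptance_test(request, result), result

    def serve(self, request: int) -> int:
        # Both replicas process every request so the shadow stays up to date.
        primary_ok, primary_result = self._run(self.primary, request)
        shadow_ok, shadow_result = self._run(self.shadow, request)
        if primary_ok:
            return primary_result
        if not shadow_ok:
            raise RuntimeError("both replicas failed for this request")
        # The primary failed: deliver the shadow's result and switch roles.
        self.primary, self.shadow = self.shadow, self.primary
        self.failovers += 1
        return shadow_result
```

In this sketch the reconfiguration component mentioned in the abstract would be the part that replaces the demoted replica after a role switch; that step is not modeled here.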
- ABSTRACT: Enterprises are increasingly involved in worldwide round-the-clock e-commerce and e-business, which requires them to be operational 24 hours per day, 7 days per week. With outages leading to loss of revenue, reputation and customers, fault tolerance becomes increasingly important. By mixing the fault tolerance logic into the application logic, existing fault tolerance practices render applications more complex, more prone to errors, and more difficult to maintain and build. The Eternal system is a component-based middleware framework that provides transparent fault tolerance for enterprise applications, and that ensures continuous operation without requiring special skills of the application programmers. The Eternal system implements the new Fault-Tolerant CORBA standard. Copyright © 2002 John Wiley & Sons, Ltd. Software Practice and Experience 07/2002; 32(8):771-788. · 1.01 Impact Factor
- ABSTRACT: Validation of the dependability of distributed systems via fault injection is gaining importance because distributed systems are being increasingly used in environments with high dependability requirements. The fact that distributed systems can fail in subtle ways that depend on the state of multiple parts of the system suggests that a global-state-based fault injection mechanism should be used to validate them. However, global-state-based fault injection is challenging since it is very difficult in practice to maintain the global state of a distributed system at runtime with minimal intrusion into the system execution. We present Loki, a global-state-based fault injector, which has been designed with the goals of low intrusion, high precision, and high flexibility. Loki achieves these goals by utilizing the ideas of partial view of global state, optimistic synchronization, and offline analysis. In Loki, faults are injected based on a partial view of the global state of the system, and a post-runtime analysis is performed to place events and injections into a single global timeline and to discard experiments with incorrect fault injections. Finally, the experiments with correct fault injections are used to estimate user-specified performance and dependability measures. A flexible measure language has been designed that facilitates the specification of a wide range of measures. IEEE Transactions on Parallel and Distributed Systems 08/2004; · 1.80 Impact Factor
- ABSTRACT: The Time-triggered Message-triggered Object (TMO) programming and specification scheme came out of an effort to remove the limitations of conventional object structuring techniques in developing real-time (RT) distributed computing components and composing distributed computing applications out of such components and others. It is a natural and syntactically small but semantically powerful extension of the object-oriented (OO) design and implementation techniques, which allows the system designer to specify in natural and yet precise forms timing requirements imposed on data and function components of high-level distributed computing objects. TMO Support Middleware (TMOSM) was devised to be an efficient middleware architecture that can be easily adapted to many commercial-off-the-shelf (COTS) hardware + kernel operating system platforms to form efficient TMO execution engines. However, up until 2003, its adaptations were done for Microsoft Windows platforms only. As we have been developing and refining an adaptation of TMOSM to the Linux 2.6 operating system platform in recent years, TMOSM has been refined to possess further improved modularity and portability. This paper presents the refined TMOSM as well as the techniques developed for efficient adaptation of TMOSM to the Linux 2.6 platform. Real-Time Systems 01/2007; 36:75-99. · 0.55 Impact Factor
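To make the TMO structure named in the title and in the last abstract more concrete, here is a rough single-threaded sketch, under simplifying assumptions, of an object that has both time-triggered and message-triggered methods sharing one data store. It is not TMOSM or its API: `ToyTMO`, `SpontaneousMethod`, `tick`, and `send` are hypothetical names, the clock is a simulated integer counter, and deadlines are not modeled. The one scheduling rule it illustrates is that time-triggered methods due at a tick run before any queued message-triggered requests.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Tuple

Store = Dict[str, int]


@dataclass
class SpontaneousMethod:
    """Time-triggered method: runs whenever the simulated clock hits its period."""
    name: str
    period: int  # in ticks of the simulated clock (assumption)
    body: Callable[[Store, int], None]


class ToyTMO:
    def __init__(self) -> None:
        self.store: Store = {}                      # shared object data
        self.spms: List[SpontaneousMethod] = []     # time-triggered methods
        self.svms: Dict[str, Callable[[Store, int], None]] = {}  # message-triggered
        self.inbox: Deque[Tuple[str, int]] = deque()
        self.trace: List[str] = []                  # execution log for inspection

    def add_spm(self, spm: SpontaneousMethod) -> None:
        self.spms.append(spm)

    def add_svm(self, name: str, body: Callable[[Store, int], None]) -> None:
        self.svms[name] = body

    def send(self, name: str, arg: int) -> None:
        # A client message requesting a message-triggered method.
        self.inbox.append((name, arg))

    def tick(self, now: int) -> None:
        # Time-triggered methods that are due run first ...
        for spm in self.spms:
            if now % spm.period == 0:
                spm.body(self.store, now)
                self.trace.append(f"{now}:SpM:{spm.name}")
        # ... then queued service requests are handled in arrival order.
        while self.inbox:
            name, arg = self.inbox.popleft()
            self.svms[name](self.store, arg)
            self.trace.append(f"{now}:SvM:{name}")


def record_sample(store: Store, now: int) -> None:
    store["last_sample"] = now


def set_limit(store: Store, value: int) -> None:
    store["limit"] = value


tmo = ToyTMO()
tmo.add_spm(SpontaneousMethod("sample", period=2, body=record_sample))
tmo.add_svm("set_limit", set_limit)
tmo.send("set_limit", 7)
for t in range(3):
    tmo.tick(t)
print(tmo.trace)
```

A real execution engine would use actual timers and concurrent threads; the loop over `tick` only stands in for the passage of time.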