About
115
Publications
34,643
Reads
669
Citations
Additional affiliations
December 2013 - present
Publications (115)
This paper describes a Reliability Block Diagram (RBD) Library that has been added to the Risk Analysis and Assessment Modeling Language (RAAML) version 1.1, the Object Management Group (OMG) standard for safety and reliability extensions for SysML. A discussion of relevant reliability concepts is provided followed by a description of the RBD libra...
Model‐based systems engineering depends on correct models. However, thus far, relatively little attention has been paid to ensuring their correctness. This paper describes a methodology for performing verification and validation on models written in SysML. The methodology relies on a catalog of candidate requirements that can be tailored for a spec...
This paper presents a profile that extends the Systems Modeling Language (SysML) to support the requirements of MIL‐STD‐882E and facilitate the System Safety process. MIL‐STD‐882E is the U.S. Department of Defense (DoD) standard for System Safety Engineering (SSE). It mandates a series of analyses for hazard identification and tracking throughout s...
A SysML library and method for calculating system reliability and availability is described. The method can be used to model and predict reliability and availability early in the design and continue through detailed design and system integration. Values for failure rates, restoration rates, recovery probabilities, and other parameters are stored in...
Fault tree analysis (FTA) is a top‐down method for identifying the discrete primary failure events that lead to system failures (top level events), and the means for determining the probability of the top‐level events if the probabilities of primary discrete failure events are known. Fault tree models document the logical combination of events that...
Model Based Systems Engineering depends on correct models. However, thus far, relatively little attention has been paid to ensuring their correctness. This paper describes a methodology for performing verification and validation on models written in SysML. The methodology relies on a catalog of candidate requirements that can be tailored for a spec...
In this article, I present a method for producing a failure modes and effects analysis (FMEA) from SysML together with an application to a microgrid control system. The significance of the method is the modeling of failure propagation which enables not only an automated approach but also additional results that systems engineers can use to support...
A method for producing a Failure Modes and Effects Analysis (FMEA) from SysML is presented together with a simple critical infrastructure system example. The significance of the method is the modeling of failure propagation which enables not only an automated approach but significant additional analysis results that can be used to support reliabili...
A method for modeling failure propagation is presented in the context of producing a Failure Modes and Effects Analysis (FMEA) from SysML with a critical infrastructure system example. The significance of the method is the modeling of failure propagation which enables not only an automated approach but significant additional analysis results that c...
Electrical power generation, transmission, and distribution are among the most vital of the critical infrastructure services in modern society. While electrical grids in developed regions of the world are highly reliable, they are vulnerable to cyberattacks, extreme weather, electromagnetic pulses, and other extreme natural phenomena and malicious...
This article describes a model-based systems engineering (MBSE) approach to cyberattack resilience modeling for an electrical power substation and demonstrates that (1) resiliency can be quantitatively characterized to enable design tradeoffs and (2) the analysis can be incorporated into a Systems Modeling Language (SysML) model to enable it to be...
Industry has faced increasing challenges to deliver systems that not only meet the performance goals of their customers but can work reliably as well. As systems have become more complex and cost and schedule pressures have increased, Department of Defense systems are failing to meet the reliability and availability needs of their users. Legislatio...
This paper describes a method for automated generation of Failure Modes and Effects Analyses from SysML models containing block definition diagrams, internal block diagrams, state transition machines, and activity diagrams. The SysML model can be created in any SysML modeling tool and then an analysis is performed using the AltaRica language and mo...
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. This chapter articulates some of the success enablers for deploying big data on clouds (BDOC), in the context of some historical perspectives and emerging g...
This paper describes a method for automated generation of Failure Modes and Effects Analyses from SysML models containing block definition diagrams, internal block diagrams, state transition machines, and activity diagrams. The SysML model can be created in any SysML modeling tool and then analysis is performed using the AltaRica language and model...
This paper describes the key criteria for quantitative reliability and availability testing of software intensive systems. Soundly defined requirements are the basis for reliability testing. They are derived from the sustainability KPPs and KSAs defined in CJCSI 3170.01H and must state not only quantities such as operational availability and materi...
• Question - How can Failure Modes and Effects Analyses be generated from SysML models? • Motivation - Technical: Growing ubiquity, complexity, and safety criticality of systems containing software - Programmatic: Reduce cost and schedule burden of FMEAs to levels tolerated by developers and their management - Cultural: Growing use of SysML and • M...
A software development organization's process trustworthiness can be enhanced through development project retrospectives or postmortems, which seek to produce lessons from mistakes and successes in the project performance. Tool support for such retrospectives is usually limited to qualitative analysis. A new model, Dynamic COQUALMO, offers...
Cyberphysical (embedded) computer system availability and reliability can be modeled and assessed using the Architecture Analysis and Design Language (AADL) and its Error Model Annex. AADL can represent systems at multiple levels of abstraction. Therefore, analyses can be performed early and often throughout the development process thereby minimizi...
This paper describes how safety and reliability for space systems can be modeled and assessed using the Architecture Analysis and Design Language (AADL) and its Error Model Annex. AADL can graphically represent systems at multiple levels of abstraction. Therefore, analyses can be performed early and often throughout the development process. Thus, c...
This paper describes the methodology and results of an integrated hardware and software reliability and availability model of an experimental satellite. The satellite computing architecture is based on dual redundant spacecraft and payload processors. The reliability and availability model was implemented as a Stochastic Activity Network. The model...
Various techniques have been used for managing software quality, including those that predict defect counts over time. This paper introduces a simulation model based on COQUALMO, which can be calibrated to organizational process performance for estimating counts of residual defects. This simulator has the additional benefit of producing a set of es...
This paper describes a failure modes and effects analysis for a large software-intensive system that controls the NAVSTAR Global Positioning System satellite constellation. The objective of the assessment was to determine (1) what safety concerns arise due to the functionality, architecture, and implementation of the system; and (2) whether these c...
We describe a specific application of assurance cases to the problem of ensuring that a transition from a legacy system to its replacement will not compromise mission assurance objectives. The application in question was the transition of the Global Positioning System (GPS) to a new ground-control system. The transition, which took place over five...
This paper describes the set of analyses used to evaluate the safety and operational effectiveness of a large ground system as well as the transition from its predecessor "legacy" system. The objective of the assessment is to determine (1) what safety concerns arise due to the functionality, architecture, and implementation; (2) whether these conce...
This paper describes an approach to reliability and maintainability practices and programs that address problems specific to software intensive space systems. Elements of these programs include a precise statement of reliability requirements, a reliability program that addresses software, a software development program that addresses reliability, and...
This study proposes a systems-level approach to airport and runway availability assessments and prediction, and addresses the problem of the aging or continuously degrading aviation infrastructure. Although the availability block diagrams are often used in the availability assessment of aerospace and electronic systems, their application to the air...
System-level reliability and availability requirements set forth by U.S. Government agencies procuring large software-intensive systems encompass both hardware and software. However, specifications, statement of work requirements, and compliance documents (standards) usually implicitly or explicitly focus on hardware and are largely silent about so...
This paper describes stochastic methods for assessing risk in integrated hardware and software systems. The methods evaluate availability, outage probabilities, and effectiveness-weighted degraded states based on data from measurements with a specified confidence level. System-level reliability/availability models can also identify the eleme...
The widespread introduction of sophisticated and complex software and firmware into consumer products ranging from automobiles to home appliances has given rise to a new type of hazard to consumers related to software or firmware defects. Conventional products liability law requires that a plaintiff show that a manufacturing, design, or labeling de...
A model that predicts staffing requirements in National Airspace System (NAS) facilities using three sub-models (preventative maintenance, watch-standing, and corrective maintenance) is described. The means by which service metrics can be defined using these models is proposed, and the benefit of being able to use these service metrics as a basis for...
This paper presents a stochastic model that accounts for failure probability as a function of inspection frequency and effectiveness. A total ownership cost (TOC) model is presented. The model is then applied to a cargo vessel. Stress-strength relationships predict the failure likelihood of details or critical points within the structure. These ind...
Model-based software development, particularly when it utilizes unified modeling language (UML) tools, provides artifacts that make programs more transparent. We use these capabilities to automate major steps in the generation of a software FMEA. Automation not only reduces the labor required but also makes the process repeatable and removes many s...
Today, there are millions of accidents caused by consumer products each year, and an increasing number will be related to the firmware contained in embedded computing devices. These accidents are unfortunately not totally avoidable. However, a proactive safety analysis and failure reporting methodology will reduce the likelihood and severity of suc...
One of the most important attributes of on-line computer systems that are performing mission or business critical applications is availability. System Engineers are often called upon to predict the reliability of such systems as part of proposal preparation, architecture definition, design reviews, and operation. However, traditional modeling techn...
With the increasing air traffic and growth of deployed FAA equipment, high equipment availability and low outage time are also becoming more important. While the use of simulation models and simple queuing models for assessing the impact of staffing on availability has been available for more than 5 decades, it has not been widely used because of th...
This paper describes the use of standard reliability modeling techniques (Markov modeling and reliability block diagrams) to analyze a web site and develop the answers to strategic questions on the configuration and operation of high availability computing systems. The analyses are performed using MEADEP, a powerful reliability analysis tool capable...
Describes OFTT (OLE Fault Tolerance Technology), a fault tolerance middleware toolkit running on the Microsoft Windows NT operating system that provides the required fault tolerance for networked PCs in the context of industrial process monitoring and control applications. It is based on the Microsoft Component Object Model (COM) and consists of co...
This paper describes the design and implementation of software infrastructure for real-time fault tolerance for applications on long duration deep space missions. The infrastructure has advanced capabilities for Adaptive Fault Tolerance (AFT), i.e., the ability to change the recovery strategy based on the failure history, available resources, and t...
Computer-based control systems have grown more complex over the past two decades. Thus, the software aspects of system reliability are an increasingly important concern. Current methods of software and system reliability prediction, whether measurement based or incorporating reliability growth models, cannot accurately predict failure rates of greate...
This paper describes how MEADEP (http://www.sohar.com/meadep), a system level dependability prediction tool, and SMERFS (Farr and Smith, 1993), a software reliability growth prediction tool, can be used together to predict system reliability and availability growth for complex systems. The Littlewood/Verrall model is used to predict reliability growth f...
This paper presents an analytical model and software tool that can be used by non-experts to relate FAA maintenance resources including staffing, training, shift allocation, and geographical deployment to NAS facility and service downtime and availability. The analytical methodology and tool presented in this paper make it possible for any user to...
MEADEP (measure dependability) is a user-friendly dependability-evaluation tool for measurement-based analysis of critical systems. MEADEP consists of 4 software modules: a data preprocessor for converting data in various formats to the MEADEP format, a data analyzer for graphical data-presentation and parameter estimation, a graphical modeling int...
This paper reports some of the authors' experience in using MEADEP, a newly developed measurement-based dependability evaluation tool that includes both data analysis and modeling functions. Several issues are discussed: identification of time between outages and time to repair distributions; need for more graphical model forms; and consistency betw...
This paper describes generic airport runway arrival and departure availability models and provides results of the models on nine airports. The models utilize a three tiered hierarchical approach (runway/configuration/airport). They include local navigational aids, and instrument landing systems (ILS). They take into consideration runway length, vis...
Although there are several measurement and model based approaches to assessing the compliance of critical computing systems with reliability requirements, applying these approaches requires sophisticated data analysis and mathematical skills so that reliability engineers often hesitate to perform such a task. The need to develop cost effective, cre...
This paper describes a model for assessing the impact of staffing on outage times and availability in the US national network of air traffic control equipment using a finite queuing model. Because of the wide geographic distribution of FAA facilities and equipment, maintenance is provided out of a national network of cost centers. Each such center...
Defensible quantitative assessments of the reliability and availability of computer systems including software are possible. This paper characterizes the need for quantitative empirically-based dependability assessment, describes some of the previous work in this area and identifies problems. While there is still ongoing research in measurement-bas...
MEADEP (measure dependability) is a user-friendly dependability evaluation tool for measurement-based analysis of computing systems including both hardware and software. Features of MEADEP are: a data processor for converting data in various formats (records with a number of fields stored in a commercial database format) to the MEADEP format, a sta...
MEADEP is a user-friendly dependability evaluation tool for measurement-based analysis of computing systems. Features of MEADEP include: a data processor for converting data in various formats to the MEADEP format, a statistical analysis module for graphical data presentation and parameter estimation, a graphical modeling interface for building re...
Although there are several measurement and model based approaches to assessing the compliance of critical computing systems with reliability requirements, applying these approaches requires sophisticated data analysis and mathematical skills so that reliability engineers often hesitate to perform such a task. The need to develop cost effective, cre...
Guidelines for the programming and auditing of software written in high level languages for safety systems are presented. The guidelines are derived from a framework of issues significant to software safety which was gathered from relevant standards and research literature. Language-specific adaptations of these guidelines are provided for the foll...
The objective of the paper is to reduce the cost of testing software in high assurance systems. It is at present a very expensive activity and one for which there are no generally accepted guidelines. A part of the problem is that failure mechanisms for software are not as readily understood as those for hardware, and that the experience of any one...
In many cases, it is possible to derive a quantitative reliability or availability assessment for systems containing software with the appropriate use of system-level measurement-based modeling and supporting data. This paper demonstrates the system-level measurement-based approach using a simplified safety protection system example. The approach i...
The great difficulties that are encountered when reliability requirements for critical software have to be validated motivate an approach that facilitates testing for exceptional conditions that the software is expected to handle. It is shown that in several published studies, failures in previously tested critical programs occurred when rare event...
In this paper we identify special quality assurance and test requirements of software for safety systems and show that even the best currently available practices meet these requirements only at very high cost and by application of empirical rather than technically rigorous criteria. Redundancy can help but is expensive and the reduction in failure...
MEADEP is a user-friendly dependability evaluation tool for measurement-based analysis of computing systems. Features of MEADEP include: a data processor for converting data in various formats to the MEADEP format, a statistical analysis module for graphical data presentation and parameter estimation, a graphical modeling interface for building rel...
Presents an overview of a measurement-based methodology for dependability evaluation of critical digital systems and describes a software tool under development for it. The approach is based on measurements of operational systems and on dependability models to provide quantitative reliability and availability assessments with stated confidence leve...
Guidelines for the programming and auditing of software written in high level languages for safety systems are presented. The guidelines are derived from a framework of issues significant to software safety which was gathered from relevant standards and research literature. Language-specific adaptations of these guidelines are provided for the foll...
This paper presents an overview of a measurement-based dependability evaluation of digital safety systems and describes a software tool under development for it. The approach is based on measurements on operational systems and dependability models to provide quantitative assessments for system reliability and availability with stated confidence lev...
The paper discusses a measurement-based approach to dependability evaluation of fault-tolerant, real-time software systems based on failure data collected from stability tests of an air traffic control system under development. Several dependability analysis techniques are illustrated with the data: parameter estimation, availability modeling of so...
Conventional dependability measures, such as reliability or availability, assume that the equipment characterized by the measure is either operational or has failed. This dichotomy does not hold for decentralized or distributed systems because these can operate in modes in which partial or degraded service is furnished. Whether a specific degraded...
The problem of developing software for critical systems in the decision support context is considered. The limitations of existing software development methodologies are mentioned and a new methodology, cooperating diverse experts (CDE), is proposed. This new methodology draws upon techniques used in multiple version software and in distributed rec...
The following material is furnished as an experimental guide for the use of risk based classification for nuclear plant protection systems. As shown in Sections 2 and 3 of this report, safety classifications for the nuclear field are application based (using the function served as the primary criterion), whereas those in use by the process industry...
High integrity systems include all protective (safety and mitigation) systems for nuclear power plants, and also systems for which comparable reliability requirements exist in other fields, such as in the process industries, in air traffic control, and in patient monitoring and other medical systems. Verification aims at determining that each stage...
Verification and validation activities in defense projects are compared with those in the nuclear power plant industry. A significant difference is that in most defense projects the V&V effort is funded directly by the sponsor whereas in the nuclear power industry the major responsibility for V&V resides with the developer. This may cause a shift i...
Software verification and validation (V&V) methodologies were investigated for high integrity systems. The effort was jointly sponsored by the Nuclear Regulatory Commission and the Electric Power Research Institute as a precursor to official nuclear regulatory guidance. The technology is dual-use; both the nuclear and defense communities will benef...
The concept, design, and features of a fault-tolerant intelligent robotic control system being developed for space and commercial applications that require high dependability are described. The comprehensive strategy integrates system level hardware/software fault tolerance with task level handling of uncertainties and unexpected events for robotic...
Describes extensions to the Halstead Software Science measures for avionics software written in Ada. The Halstead measures are based only on the syntax of the program text (operators and operands) without considering the semantics of the applications. Multitasking real-time software, widely used in avionics control, is generally more difficult to d...
A distributed fault tolerant system for process control that is based on an enhancement of the distributed recovery block (DRB) is described. Fault tolerance provisions in the system cover software faults by use of the DRB; hardware faults by means of replication and the DRB; system software faults by means of replication, loose coupling, periodic...
A distributed fault tolerant system for process control based on an enhancement of the distributed recovery block has been implemented and integrated into a chemical processing system. Fault tolerance provisions in the system cover software faults by use of the distributed recovery block (DRB); hardware faults by means of replication, loose couplin...
A fault-tolerant architecture that provides tolerance to a broad scope of hardware, software, and communications faults is being developed. This architecture relies on widely available commercial operating systems, local area networks, and software standards. Thus development time is significantly shortened, and modularity allows for continuous and...
In a study conducted for the USAF Rome Air Development Center to improve reliability prediction for spacecraft, approximately 2600 incident reports on 300 spacecraft (representing 100 programmes) that were launched between the early 1960s and January 1984 were analysed. The causes of the spacecraft failures and the severity of the observed effects a...
This paper presents a new approach for verification of fault tolerant software in aerospace applications. The approach, called the "Enhanced Condition Table", integrates the merits of functional and structural testing in a single framework. The goal is to generate a reasonably sized set of test cases that will reveal operationally significant defec...
A systems approach to the analysis and control of software reliability is described which is intended to supplement conventional software reliability models which focus on program attributes under the control of the software professionals. A review of software reliability experience during the operations and maintenance (O&M) phase is presented. Th...
This study provides the basis for improving the utility of MIL-HDBK-217 for reliability prediction of spacecraft components and systems. The reliability performance histories of 300 satellite vehicles, which were launched between the early 1960s and January 1984, were reviewed and analyzed during the course of the study. Analysis of over 2500 repor...
The work reported here provides protection against software failures in the task dispatcher of the FTMP, a particularly critical portion of the system software. Faults in other system modules and application programs can be handled by similar techniques but are not covered in this effort. Goals of the work reported here are: (1) to develop provisio...
This report presents software error data from major recent Digital Flight Control Systems development programs. It summarizes the data, compares them with similar data from previous surveys, and identifies trends and disciplines to improve software reliability.
A growing need exists for improved fault tolerance, reliability, and testability in distributed systems which support Command, Control, Communications, and Intelligence (C3I) activities. The objective of this study is to provide a foundation for the development of design measures and guidelines for the design of fault tolerant systems. Taxonomies...
The implementation of software fault tolerance is investigated for critical modules of the Software Implemented Fault Tolerance (SIFT) operating system to support the computational and reliability requirements of advanced fly by wire transport aircraft. Fault tolerant designs generated for the error reporter and global executive are examined. A des...
A categorized data base of software errors which were discovered during the various stages of development and operational use of the Deep Space Network DSN/Mark 3 System was developed. A study team identified several existing error classification schemes (taxonomies), prepared a detailed annotated bibliography of the error taxonomy literature, and...
An equation of state based on the properties of normal fluids, the law of rectilinear averages, and the second law of thermodynamics can be derived for advanced fuels on the basis of the vapor pressure, enthalpy of vaporization, change in heat capacity upon vaporization, and liquid density at the melting point. The method consists of estimating an...
used to determine the measure of association of candidate causes and effects during the development process and analyses can therefore be used to make adjustments in the development process. However, such data are not useful for quantitative measurement of reliability, availability, recovery time, recovery probability, and extent of common mode fai...