CRITICAL STEPS:
MANAGING HUMAN OPERATIONS THAT MUST GO RIGHT EVERY TIME
Tony Muschara, CPT
Muschara Error Management Consulting, LLC
4724 Outlook Way NE, Marietta, Georgia 30066 USA
tmuschara@muschara.com
Abstract
A critical step is defined as a human action that will trigger immediate, irreversible, and intolerable harm to an
asset, if that action or a preceding action is performed improperly. Critical steps occur anytime an action involves
a substantial transfer of energy, the movement of mass, or the transmission of information that, if not performed
under control, could result in serious harm to an organization’s assets. Accidents typically occur when front line
production personnel lose control of such transfers during work. Human beings, because of human fallibility, are
the greatest source of variation in operations. First, this paper describes the critical step concept and the
preceding conditions (risk-important actions) that create critical steps. Second, it describes two means of
managing and controlling the performance of critical steps during operations, including an adaptation of the
cornerstones of resilience described by Erik Hollnagel in the Ashgate series on Resilience Engineering. Risk-based
thinking (anticipate, monitor, respond, and learn) is offered as a fundamental first principle for identifying and
controlling the performance of critical steps during operations.
1 INTRODUCTION
Do you need a parachute to sky dive? Literally, no. However, you definitely need a parachute to sky dive twice.
Although humorous, this example conveys the fact that some things have to go right the first time, every time.
There is a point of no return involving substantial transfers of energy, movements of mass, or transmissions of
information that, if not performed under control, could result in serious, if not tragic, harm to people; damage
to product, property, or the environment; or even loss that threatens the survival of the organization. Accidents
typically occur when front-line production personnel at the sharp end lose control of such transfers during work.
Recall from the definition of a critical step that it involves harm to something of importance: an asset. An asset
is anything of substantial value to an organization. For a commercial airline pilot, the primary assets are the
passengers and the aircraft; secondary assets include schedule adherence and quality of service. For a hospital,
assets include patients, staff, medications, and facilities. For a biotech company, it is the drug substance; for a
nuclear power plant, the reactor core. And so on. For the purposes of this paper, assets include those things
deemed important to the organization and vulnerable to human action during operations.
A human interaction with an asset, direct or indirect, that changes the state of that asset, whether for good or
ill, is referred to as a touchpoint. All work accomplishments include a series of touchpoints intended to produce
a work output for business purposes. Most touchpoints add value, but occasionally some extract value, causing
damage or loss. When the operator loses control at a critical touchpoint, the action triggers an unwanted release
of energy, movement of mass, or transmission of information; these are critical steps. Therefore, it becomes
important for high-hazard organizations to methodically identify critical steps, identify the preceding
risk-important actions (discussed later), install appropriate controls, and prepare front-line workers to recognize
and adapt conservatively to critical steps that arise unexpectedly and were not identified previously. Together,
risk-based thinking and chronic unease (discussed later) serve as an important way of thinking and feeling about
risk-important operations, enhancing people's adaptive capacity during work.
Most errors occur without our knowledge because of their trivial nature; most have no apparent consequences.
Occasionally, however, a simple human error can trigger serious harm, possibly with catastrophic consequences.
What if failure is not an option? A study by the Civil Aviation Safety Authority of Australia suggests that people,
in their waking moments, make three to four errors per hour (CASA, 2013, p. 27). We are not concerned with
avoiding all errors; that is truly an impossibility. But can you pinpoint the specific operations (touchpoints) that
possess the lethal capacity to cause injury, damage, or loss? I believe you can.
2 UNDERSTANDING AND IDENTIFYING CRITICAL STEPS
Dr. James Reason, author of Managing the Risks of Organizational Accidents (1998), has said repeatedly that,
“Wherever there are human beings, there is human error—human error is part of the human condition.” It is
normal to err: an error is a behavior that unintentionally deviates from the preferred behavior for a particular
situation. More simply, human error is a loss of control, a source of uncertainty in the workplace. This uncertainty becomes
relevant when people perform work on key assets or manipulate the controls of industrial processes.
Understandably, human fallibility is one of the greatest sources of variation in any industrial operation, and thus
it poses a serious risk to the integrity of an organization’s assets and its ability to achieve its goals. During
operational tasks, a worker could unintentionally trigger harm to one or more assets, losing control of transfers
of energy, movements of mass, or transmissions of information during operations.
This innate human tendency to err is not a problem unless it occurs coincidentally at a critical step (CCPS, 1994,
pp. 207-211). Occasionally, people's slips, lapses, or mistakes trigger harm, extracting value. An asset's exposure
to human fallibility occurs when workers touch it, which usually happens when front-line personnel, such as
operators, nurses, technicians, pilots, and surgeons, perform work that applies inherent hazards to assets with
the intent of adding value. When performing tasks, people are usually concentrating on accomplishing their
immediate production goal, not necessarily on safety (Hollnagel, 2009, p. 29). If people cannot fully concentrate
on safety 100 percent of the time, then when should they be fully focused on safe outcomes? I believe at critical
steps.
2.1 Critical Step Basics
All tasks involve human action. Some actions change the state of assets (touchpoints). Some touchpoints are
more important than others, initiating substantial transfers of energy, mass, or information, such as tripping a
distribution system circuit breaker, energizing an x-ray machine, starting a large electric-driven pump, giving
medication to a patient, or clicking Enter to start a software update for a distributed digital control system.
People perform such actions many times a day, but with the right preconditions, an action performed out of
control could seriously hurt someone, damage product, or otherwise destroy something of importance. The
question is: which actions absolutely have to go right the first time? To identify a critical step accurately, it helps
to know its attributes:
Touchpoint: A physical human interaction with an asset or with a control such that the state of the
asset is altered, as confirmed by one or more asset parameters.
Transfers: There is a substantial flow of energy, mass, or information associated with the action.
Certainty: There is complete assurance that harm triggered by the action is unavoidable if there is a
loss of control. It will happen; for example, briefly touching a hot burner on a stovetop will be painful,
possibly leaving a blister.
Immediate: The harm occurs faster than a performer can humanly respond to avoid the
consequence. In some cases there is no turning back, such as falling out of an aircraft door without a
parachute, as described by the next characteristic.
Irreversible: There is no undo. You cannot undo the harm by simply reversing the action; you are past
the point of no return. No means exists to restore the asset to its previous, unharmed state, or the
performer has no means to regain control of transfers of energy, mass, or information after taking the
action.
Intolerable Harm: Significant injury, damage, or loss is realized. The severity of harm depends on the
asset's susceptibility, on the magnitude and intensity of the release of energy or movement of mass,
and on the significance and type of information transmitted and its receiver. Whether an outcome is
intolerable depends on what the organization deems important to safety, quality, the environment,
and production.
Collectively, these attributes define a critical step: a human action that will trigger immediate, irreversible, and
intolerable harm to an asset if that action or a preceding action is performed improperly. As you read the
following examples, test for yourself that each situation satisfies the definition of a critical step:
Breaching a pressure boundary such as a pipe or tank manhole cover
Entering a confined space where the atmosphere could be deficient of oxygen
Pulling the trigger on a firearm
Opening or closing a power system grid circuit breaker that supplies electric power to a hospital
Clicking “Send,” “Start,” or “Enter” after recording or entering sensitive information or data in a digital
control system, including software updates
Leaping out of the door of an airborne aircraft while skydiving
Entering the line of fire: direct exposure to a moving hazard you have little or no control over, such as
crossing a street or driving a vehicle on public streets and highways
In every case, there is either a transfer (or interruption) of energy, a movement of mass, or a transmission of
information that could cause immediate, irreversible, intolerable harm to an asset if the performer loses control.
However, an action that has a point of no return is not necessarily a critical step. Designating an action as a
critical step depends on the severity of the consequences, that is, the degree of harm that ensues after the
action. To keep the concept useful, reserve it for situations that could threaten the life and well-being of
employees or the public, endanger the environment, jeopardize accomplishment of the organization's mission,
and so on.
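The attributes above lend themselves to a simple screening check. The following is a minimal, hypothetical Python sketch (the class and function names are mine, not part of the paper's method) that treats an action as a critical step only when all five attributes hold:

```python
from dataclasses import dataclass

@dataclass
class Touchpoint:
    """A human interaction that changes the state of an asset."""
    description: str
    substantial_transfer: bool   # flow of energy, mass, or information
    certain: bool                # harm unavoidable if control is lost
    immediate: bool              # faster than the performer can respond
    irreversible: bool           # no way to undo or regain control
    intolerable: bool            # severity the organization cannot accept

def is_critical_step(tp: Touchpoint) -> bool:
    """An action is a critical step only if ALL five attributes hold."""
    return (tp.substantial_transfer and tp.certain and tp.immediate
            and tp.irreversible and tp.intolerable)

# Pulling the trigger satisfies every attribute; moving a safety lever
# (reversible, no discharge, no transfer) fails several, so it is not one.
trigger = Touchpoint("Pull the trigger on a loaded firearm",
                     True, True, True, True, True)
safety_lever = Touchpoint("Move the safety lever from on to off",
                          False, False, False, False, False)

print(is_critical_step(trigger))       # True
print(is_critical_step(safety_lever))  # False
```

The conjunction matters: an action missing even one attribute (for example, a reversible action) is screened out, which mirrors the paper's warning that a point of no return alone does not make a critical step.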
2.2 Risk-Important Actions
People often treat actions they perceive as important to safety or quality as critical steps even when those
actions do not satisfy the definition. Many people consider buckling a seatbelt before driving away in an
automobile to be a critical step. But is it? As long as the car is not moving, nothing happens whether the seatbelt
is buckled or unbuckled; no harm occurs. Critical steps involve transfers of energy, movements of mass, or
transmissions of information that could trigger immediate harm to assets if performed out of control. Though
important, actions such as buckling a seatbelt trigger no harm until later, at the true critical steps.
In an operational process, the conditions for harm are created earlier by previous actions, where energy, mass,
or information is poised for transfer with the intent to add value. Mistakes that trigger serious harm always
combine with one or more hazardous conditions that already exist. Performers create these conditions by one
or more preceding actions in the process, which are referred to as risk-important actions (RIAs). RIAs create the
conditions for potential harm at subsequent critical steps.
A RIA is any human action preceding a critical step that either 1) creates a condition that could cause harm to
an asset, 2) reduces the number of actions required to initiate a transfer of energy, mass, or information, or
3) weakens positive control for a critical step. Take skydiving, for example. Taking off in an aircraft and flying at
12,000 feet creates a potentially hazardous condition for any human being, making it a RIA. Another RIA is
properly folding the parachute. Finally, donning the parachute before leaping across the threshold of the
aircraft's open door is a necessary precondition for skydiving safely. RIAs possess some or all of the following
characteristics:
Reversible: an action that can be reversed or undone without causing harm; being able to start over.
Using a handgun example, a bullet can be inserted into or ejected from the gun’s chamber, the safety
lever can be moved from on to off and back to on, or the hammer can be cocked and uncocked. In every
case, there is no discharge of the firearm as long as the trigger is not pulled.
Slack: a substantial gap of time exists after the action, affording an opportunity to stop the transfer
of energy, movement of mass, or transmission of information before harm is realized. Time is available
to think and take recovery actions that preclude unwanted outcomes. RIAs always precede critical steps.
Reduced margin to safety limits: fewer actions remain before a critical step, as when safety devices
or functions are defeated beforehand, or when an x-ray machine is energized to a certain power level
before the shutter is snapped to release the energy.
Minimal transfer: little or no transfer of energy, relocation of substances, movement of objects, or
communication of information occurs. The asset experiences little or no change in state.
The following examples illustrate how commonplace risk-important actions are. Notice that in every case, an
asset’s condition does not change—there is no substantial release or transfer of energy, mass, or information,
and the action is reversible as long as the critical step is not performed.
Checking the atmosphere of a large tank before entering it to inspect the internal structure
Adding new oil to an engine during an oil change before starting the engine
Donning and securing a parachute harness before leaping out the open door of an aircraft
Donning a hardhat, protective eyewear, and gloves before entering a construction zone
Securing the lanyard of a fall protection harness after reaching an elevated work location
Removing the car keys from the ignition and securing them before closing the locked car door
Just prior to performing a critical step, one has an opportunity to review the outcomes of previously performed
RIAs to verify (prove) that conditions are safe to proceed with the critical step.
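The review described above can be sketched as a simple gate: a critical step should not proceed until every preceding RIA has been verified. This hypothetical Python sketch (the class and method names are illustrative, not from the paper) uses the skydiving example:

```python
from dataclasses import dataclass, field

@dataclass
class RiskImportantAction:
    """An action that sets up safe conditions for a later critical step."""
    description: str
    verified: bool = False  # outcome confirmed safe by the performer

@dataclass
class CriticalStep:
    description: str
    preceding_rias: list = field(default_factory=list)

    def safe_to_proceed(self) -> bool:
        # Proceed only when every preceding RIA has been verified.
        return all(ria.verified for ria in self.preceding_rias)

# Skydiving example from the text: two RIAs precede the leap.
fold = RiskImportantAction("Properly fold the parachute")
don = RiskImportantAction("Don and secure the parachute harness")
leap = CriticalStep("Leap out the open aircraft door", [fold, don])

print(leap.safe_to_proceed())  # False: RIAs not yet verified
fold.verified = True
don.verified = True
print(leap.safe_to_proceed())  # True: conditions proven safe
```

The gate captures the paper's point that the moment just before a critical step is the last chance to prove, not assume, that the outcomes of prior RIAs are in place.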
2.3 Identifying a Critical Step
Process engineers and procedure writers can improve the effectiveness of anticipating and controlling critical
steps by using methodical, repeatable processes, such as a failure modes and effects analysis, which has been
adapted to identify critical steps. In general terms, a CRITICAL STEP MAPPINGSM process includes the following
high-level phases:
1. Pinpoint important assets, things of value to protect from injury, damage, or loss.
2. Identify significant touchpoints occurring during important work functions (accomplishments).
3. Assess the risk at significant touchpoints, comparing the touchpoint to the definition of a critical step.
4. Develop controls and barriers to avoid loss of control (human error) and damage to assets.
5. Implement and evaluate controls and barriers.
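The five phases can be illustrated with a toy walkthrough. The following hypothetical Python sketch applies them to the confined-space entry example mentioned earlier; the task data, the criticality judgments, and the listed controls are invented for illustration, not taken from any real analysis:

```python
# Phase 1: pinpoint important assets to protect.
assets = ["worker entering the tank"]

# Phase 2: identify significant touchpoints in the work function.
# Phase 3: assess each against the critical step definition (here the
# analyst's judgment is recorded in the "critical" flag).
touchpoints = [
    {"action": "Check tank atmosphere before entry", "critical": False},  # a RIA
    {"action": "Enter the tank through the manhole", "critical": True},
]
critical_steps = [tp for tp in touchpoints if tp["critical"]]

# Phase 4: develop controls and barriers for each critical step.
controls = {tp["action"]: ["calibrated oxygen monitor", "attendant at the manhole"]
            for tp in critical_steps}

# Phase 5: implement and evaluate (stubbed here as a simple report).
for action, barriers in controls.items():
    print(f"Critical step: {action} -> controls: {barriers}")
```

Even this toy version shows the mapping's value: the output is a short list of actions that must go right the first time, each paired with the barriers that protect the asset.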
3 MANAGING CRITICAL STEPS
3.1 Risk-Based Thinking
I believe a practical, risk-based approach to managing human performance in operations has been offered
through the research of Erik Hollnagel in a series of articles and chapters associated with Resilience Engineering.
He strongly suggests that safety cannot be managed simply by imposing restraints on how work is done
(procedures, supervision, automation, and so on). He states clearly that:
The solution is instead to identify the situations where the variability of everyday performance may
combine to create unwanted effects and to monitor continuously how the system functions in
order to intervene and dampen performance variability when it threatens to get out of control.
(Hollnagel, 2014, p.121) (Italics added for emphasis.)
Although critical steps and their related risk-important actions can be identified during work planning (described
later), a more flexible approach is needed to take advantage of front-line operators’ abilities to adapt and adjust
to protect assets during uncertain work situations, where critical steps can arise unexpectedly. Hollnagel noted
four essential cornerstones that permeate all levels and all functions of ultra-safe organizations (Hollnagel, 2009,
pp.117-133):
Anticipate: know what to expect; what can go wrong with one or more assets during operations
Monitor: know what to pay attention to; where a loss of control could trigger harm, i.e., critical steps
Respond: know what to do; how to control, avoid, or mitigate threats or hazards to assets
Learn: know what has happened, what is happening, and what to change
Notice the recurring verb know. Each cornerstone involves creating knowledge (to know), which requires
thinking: representing a thing or idea accurately as it really is. Ultra-safe organizations think about safety and
reliability; they practice mindfulness. Therefore, in an operational context, I refer to these cornerstones
collectively as risk-based thinking. Risk-based thinking is a fundamental first principle of operational human performance, and
anyone can use it. For those at the sharp end of the organization, risk-based thinking offers a straightforward
and practical approach to controlling performance (variability) at critical steps. I suggest managers integrate risk-
based thinking into important work functions throughout the organization, not just for critical steps during
operational work, by enabling the abilities to anticipate, monitor, respond, and learn in the key functions of a
process.
People can apply risk-based thinking more effectively when they know clearly what assets to protect during their
work. To manage the variability of human performance (uncertainty), the personnel involved in an operation
must understand the inherent hazards of the processes to be used and the asset(s) to be engaged during the
work process. As a minimum, there are at least two assets in most work activities, the health and well-being of
the performer and the quality and integrity of the product or service. Foreknowledge of assets and related
critical steps offers workers an opportunity to avoid serious harm, not only to themselves and to their coworkers,
but also to the products or services the organization provides. In the following section, risk-based thinking serves
as a foundation for developing a systematic approach to managing critical steps.
3.2 A Risk-Based Approach
Because of the specter of human error, it is helpful to provide front-line workers with structured time to think
about critical steps in their work. Before starting work, workers participate in a pre-task briefing or tailboard,
where they review procedures to familiarize themselves with the desired work accomplishments and the criteria
for success, and to learn what to avoid. It is important for workers to consider explicitly how their actions affect
assets in light of the production objectives. Knowing what to avoid includes identifying, denoting, and controlling
critical steps and their related RIAs. Integrating the logic of risk-based thinking into pre-task briefings and
tailboards aids this awareness. Similar to failure modes and effects analysis, an adaptation of an approach
commonly used in the nuclear electric generating industry (DOE, 2009, pp.34-40) to preview a work activity is
RU-SAFESM:
1. Recognize assets important to safety, reliability, quality, and production (economic goal).
2. Understand inherent hazards to each asset and relevant lessons learned from previous incidents.
3. Summarize the critical steps and related risk-important actions (RIAs) from the work plan.
4. Anticipate errors (losses of control) for each critical step, highlighting dangerous error traps.
5. Foresee worst-case consequences to each asset should control be lost at each critical step.
6. Evaluate the controls and barriers needed at each critical step, including contingencies and stop-work
criteria.
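One way to picture the six RU-SAFE elements is as a briefing checklist that flags any element not yet discussed. This hypothetical Python sketch is mine, not the published tool; the prompt wording and function names are illustrative:

```python
# The six RU-SAFE elements, paraphrased as briefing prompts.
RU_SAFE_PROMPTS = [
    "Recognize: which assets matter for this task?",
    "Understand: what inherent hazards and lessons learned apply?",
    "Summarize: which steps are critical, and which RIAs precede them?",
    "Anticipate: what errors and error traps are likely at each critical step?",
    "Foresee: what is the worst-case consequence if control is lost?",
    "Evaluate: what controls, contingencies, and stop-work criteria apply?",
]

def open_items(answers: dict) -> list:
    """Return the prompts not yet discussed; an empty list means the
    briefing covered all six elements."""
    return [p for p in RU_SAFE_PROMPTS if p not in answers]

# A briefing that covered only the first five elements still has one open item.
answers = {p: "discussed" for p in RU_SAFE_PROMPTS[:5]}
remaining = open_items(answers)
print(len(remaining))  # 1: the Evaluate prompt remains open
```

Treating the briefing as a completeness check, rather than free-form discussion, reflects the paper's point that stop-work criteria and contingencies (the Evaluate element) are easy to skip unless prompted explicitly.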
Individuals can preview their work guided by RU-SAFESM not only during work preparation, but also during the
work planning process, procedure development process, and even during work execution in the field. It can help
anyone think through their actions before performing them, anytime, anywhere. However, the effectiveness of
RU-SAFESM depends greatly on the user’s technical knowledge, without which the worker may not recognize the
limits of safety for the assets they work with.
Even when no apparent critical steps exist, people should still adopt an attitude of mindfulness and wariness
toward their work, a chronic uneasiness, especially during any activity that involves a transfer of energy,
movement of mass, or transmission of information. This is when self-checking is useful.
3.3 Performing a Critical Step
Rigor and care are essential when performing critical steps. The same can be said about RIAs, especially when
the results of a RIA can be hidden or obscure after performance (such as properly folding a parachute before
stitching it closed for skydiving). When about to perform a critical step, it is important to pause for a moment to
consider the situation: to self-check. Stopping the flow of work momentarily helps the performer collect his or
her thoughts, to think and concentrate (pay attention) on what is about to happen. Any action involving the transfer
of energy, the movement of mass, or the transmission of information could trigger harm if control is lost.
To retain positive control, self-checking is perhaps the most effective human performance technique an
individual can use to perform a critical step (DOE, 2009, pp. 18-19). Positive control, simply stated, is "what is
intended to happen is what happens, and that is all that happens." Before acting, the performer thinks
explicitly about the intended action, its control, the asset, and the expected outcome. The performer ensures
the asset's safety by verifying that the proper conditions created by preceding RIAs exist. Self-checking also preserves
concentration during and after an action to verify results. The effectiveness of self-checking depends greatly on
the performer’s grasp and understanding of process technical knowledge. Without it, uncertainty abounds.
If uncertain, the performer resolves any questions or concerns before proceeding. When the performer believes
a situation places himself/herself, a coworker, the product, the equipment, or the environment in danger or at
risk, he or she STOPS the work and gets help. Stop-work criteria should be explicit, having been discussed
previously in the pre-task briefing. If there is any doubt, there is no doubt: STOP! Never proceed in the face of
uncertainty. Once the performer is satisfied that safe conditions exist, he or she performs the critical
step, the right action on the right component at the right place and time, while monitoring the key parameters
regarding the change in state of the asset. The STAR (stop, think, act, review) self-checking tool (DOE, 2006)
describes how an individual performer can exercise positive control of a critical step.
1. Stop: Just before transferring energy, moving mass, or transmitting information, pause.
Focus attention on the asset(s) and the task’s immediate objective.
Eliminate distractions.
2. Think: Understand what will happen, especially to assets, when performing the action.
Verify the action is appropriate, given the status of the asset(s); understand the path of the transfer
of energy, movement of mass, or transmission of information.
Know the key parameters of the asset to monitor, how they should change, and expected result(s)
of the action.
Consider a contingency to minimize harm if an unexpected result occurs.
If there is any doubt, STOP, and get help. Apply STOP work criteria.
3. Act: Perform the correct action under control on the right component.
Without losing eye contact, read and touch the component label.
Compare the component label with the guiding document.
Without losing physical contact, perform the action.
4. Review: Verify that the anticipated result is obtained.
Verify the desired change in key parameters of the asset(s).
Perform the contingency, if an unexpected result occurs.
STOP work, if criteria are met, and notify a supervisor or those with the expertise.
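The four STAR steps can be sketched as a guard around a single action: verify the component before acting, and verify the result afterward, stopping on any mismatch. This hypothetical Python sketch is illustrative only; the component labels, parameter names, and return strings are invented:

```python
def perform_with_star(component_label, expected_label, action, expected_result):
    """Run one action through a simplified STAR sequence."""
    # Stop: pause; the explicit checks below stand in for eliminating
    # distractions and focusing attention on the asset.
    # Think: verify we are on the right component before acting.
    if component_label != expected_label:
        return "STOP: wrong component, get help"
    # Act: perform the correct action under control.
    result = action()
    # Review: verify the anticipated result was obtained; otherwise
    # invoke the contingency and stop.
    if result != expected_result:
        return "STOP: unexpected result, perform the contingency"
    return "OK"

# Usage: closing breaker "CB-12" and expecting it to indicate "closed".
print(perform_with_star("CB-12", "CB-12", lambda: "closed", "closed"))  # OK
print(perform_with_star("CB-13", "CB-12", lambda: "closed", "closed"))
```

The two STOP branches mirror the tool's intent: the performer never carries an uncertainty forward past the point of no return; any doubt before or after the action halts the work.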
4 SUMMARY
A critical step is a human action that will trigger immediate, irreversible, and intolerable harm to an asset, if that
action or a preceding action is performed improperly. Safety is not what you have, it is what you do (especially
at critical steps) (Woods, 2010, pp.38-39). Safety is achieved by pinpointing the situations where human error
could trigger unwanted consequences and then introducing means to enhance the reliability of the practitioner’s
performance. There is no margin for error at a critical step; performance must be perfect. Critical steps always
involve substantial transfers of energy, movements of mass, or transmissions of information. Workers improve
their chances of controlling critical steps by identifying the associated risk-important actions (RIAs). A RIA is a
human action preceding a critical step that either 1) creates the condition that could cause harm to an asset, 2)
reduces the number of actions required to initiate a transfer of energy, movement of mass, or transmission of
information, or 3) weakens positive control for a subsequent critical step. Every critical step has one or more
associated RIAs that precede it. Systematically identifying and controlling the occurrence of human error at
critical steps is an important safety function to incorporate into work to promote safety and reliability.
Identifying and controlling critical steps helps people protect assets from unacceptable harm, and improves the
chances for success even during varying or unexpected circumstances. CRITICAL STEP MAPPINGSM, RU-SAFESM,
and STAR (self-checking) provide workers with proactive means of making sure value is added rather than
extracted.
REFERENCES
Center for Chemical Process Safety (CCPS) (1994). Guidelines for Preventing Human Error in Process Safety. New
York: American Institute of Chemical Engineers.
Civil Aviation Safety Authority (CASA) (2013). Safety Behaviors Human Factors: Resource Guide for Engineers.
Canberra, Australia: Civil Aviation Safety Authority.
Hollnagel, E. (2009). The Four Cornerstones of Resilience Engineering. In Nemeth, C., Hollnagel, E., and
Dekker, S. (Eds.), Resilience Engineering Perspectives, Volume 2: Preparation and Restoration. Farnham: Ashgate.
Hollnagel, E. (2014). Safety-I and Safety-II: The Past and Future of Safety Management. Farnham: Ashgate.
Kletz, T. (2001). An Engineer’s View of Human Error. New York, NY: Taylor and Francis.
Reason, J. (1998). Managing the Risks of Organizational Accidents. Aldershot: Ashgate.
United States Department of Defense (2000). Standard Practice for System Safety (MIL-STD 882D). Washington,
D.C.: Government Printing Office.
United States Department of Energy (DOE) (2009). DOE Standard: Human Performance Improvement Handbook,
Volume 2: Human Performance Tools for Individuals, Work Teams, and Management (DOE-HDBK-1028-2009).
Retrieved from http://www.hss.energy.gov/nuclearsafety/ns/techstds/.
Woods, D., Dekker, S., Cook, R., Johannesen, L., and Sarter, N. (2010). Behind Human Error (2nd ed.). Farnham:
Ashgate.
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Resilience engineering has since 2004 attracted widespread interest from industry as well as academia. Practitioners from various fields, such as aviation and air traffic management, patient safety, off-shore exploration and production, have quickly realised the potential of resilience engineering and have became early adopters. The continued development of resilience engineering has focused on four abilities that are essential for resilience. These are the ability a) to respond to what happens, b) to monitor critical developments, c) to anticipate future threats and opportunities, and d) to learn from past experience - successes as well as failures. Working with the four abilities provides a structured way of analysing problems and issues, as well as of proposing practical solutions (concepts, tools, and methods). This book is divided into four main sections which describe issues relating to each of the four abilities. The chapters in each section emphasise practical ways of engineering resilience and feature case studies and real applications. The text is written to be easily accessible for readers who are more interested in solutions than in research, but will also be of interest to the latter group.
Article
Many 21st century operations are characterised by teams of workers dealing with significant risks and complex technology, in competitive, commercially-driven environments. Informed managers in such sectors have realised the necessity of understanding the human dimension to their operations if they hope to improve production and safety performance. While organisational safety culture is a key determinant of workplace safety, it is also essential to focus on the non-technical skills of the system operators based at the 'sharp end' of the organisation. These skills are the cognitive and social skills required for efficient and safe operations, often termed Crew Resource Management (CRM) skills. In industries such as civil aviation, it has long been appreciated that the majority of accidents could have been prevented if better non-technical skills had been demonstrated by personnel operating and maintaining the system. As a result, the aviation industry has pioneered the development of CRM training. Many other organisations are now introducing non-technical skills training, most notably within the healthcare sector. Safety at the Sharp End is a general guide to the theory and practice of non-technical skills for safety. It covers the identification, training and evaluation of non-technical skills and has been written for use by individuals who are studying or training these skills on CRM and other safety or human factors courses. The material is also suitable for undergraduate and post-experience students studying human factors or industrial safety programmes. © Rhona Flin, Paul O'Connor and Margaret Crichton 2008. All rights reserved.
Article
Accident investigation and risk assessment have for decades focused on the human factor, particularly 'human error'. Countless books and papers have been written about how to identify, classify, eliminate, prevent and compensate for it. This bias towards the study of performance failures, leads to a neglect of normal or 'error-free' performance and the assumption that as failures and successes have different origins there is little to be gained from studying them together. Erik Hollnagel believes this assumption is false and that safety cannot be attained only by eliminating risks and failures. The ETTO Principle looks at the common trait of people at work to adjust what they do to match the conditions – to what has happened, to what happens, and to what may happen. It proposes that this efficiency-thoroughness trade-off (ETTO) – usually sacrificing thoroughness for efficiency – is normal. While in some cases the adjustments may lead to adverse outcomes, these are due to the very same processes that produce successes, rather than to errors and malfunctions. The ETTO Principle removes the need for specialised theories and models of failure and 'human error' and offers a viable basis for effective and just approaches to both reactive and proactive safety management.
Book
Safety has traditionally been defined as a condition where the number of adverse outcomes was as low as possible (Safety-I). From a Safety-I perspective, the purpose of safety management is to make sure that the number of accidents and incidents is kept as low as possible, or as low as is reasonably practicable. This means that safety management must start from the manifestations of the absence of safety and that, paradoxically, safety is measured by counting the number of cases where it fails rather than the number of cases where it succeeds. This unavoidably leads to a reactive approach based on responding to what goes wrong, or to what is identified as a risk, that is, as something that could go wrong. Focusing on what goes right, rather than on what goes wrong, changes the definition of safety from 'avoiding that something goes wrong' to 'ensuring that everything goes right'. More precisely, Safety-II is the ability to succeed under varying conditions, so that the number of intended and acceptable outcomes is as high as possible. From a Safety-II perspective, the purpose of safety management is to ensure that as much as possible goes right, in the sense that everyday work achieves its objectives. This means that safety is managed by what it achieves (successes, things that go right), and that it is likewise measured by counting the number of cases where things go right. In order to do this, safety management cannot be only reactive; it must also be proactive. But it must be proactive with regard to how actions succeed, to everyday acceptable performance, rather than with regard to how they can fail, as traditional risk analysis does. This book analyses and explains the principles behind both approaches and uses this to consider the past and future of safety management practices. The analysis makes use of common examples and cases from domains such as aviation, nuclear power production, process management and health care.
The final chapters explain the theoretical and practical consequences of the new perspective on the level of day-to-day operations as well as on the level of strategic management (safety culture).
Conference Paper
Nuclear explosive assembly/disassembly operations carried out under United States Department of Energy (DOE) purview are characterized by activities that primarily involve manual tasks. These process activities are governed by procedural and administrative controls that traditionally have been developed without a formal link to process hazards. This work, which was based on hazard assessment (HA) activities conducted as part of the W69 Integrated Safety Process (ISP), specifies an approach to identifying formal safety controls for controlling (i.e., preventing or mitigating) hazards associated with nuclear explosive operations. Safety analysis methods are used to identify controls, which are then integrated into a safety management framework to provide assurance to the DOE that hazardous activities are managed properly. As a result of the work on the W69 ISP dismantlement effort, the authors have developed an approach to identifying controls and safety measures that improve the safety of nuclear explosive operations. The methodology developed for the W69 dismantlement effort is being adapted to the W76 ISP effort, and considerable work is still ongoing to address issues such as the adequacy and effectiveness of controls. DOE nuclear explosive safety orders and some historical insights are discussed briefly in this paper, and the safety measure identification methodology developed as part of the W69 ISP dismantlement process is then summarized.
Article
As the practical interest in resilience engineering continues to grow, so does the need for a clear definition and for practical methods. The purpose of this short chapter is to propose a working definition of resilience and to analyse it in some detail. The working definition, based on the work in the project described here, is as follows: a resilient system is able to adjust its functioning effectively prior to, during, or following changes and disturbances, so that it can continue to perform as required after a disruption or a major mishap, and in the presence of continuous stresses.