All content in this area was uploaded by Tony Muschara on Jan 19, 2018
CRITICAL STEPS:
MANAGING HUMAN OPERATIONS THAT MUST GO RIGHT EVERY TIME
Tony Muschara, CPT
Muschara Error Management Consulting, LLC
4724 Outlook Way NE, Marietta, Georgia 30066 USA
tmuschara@muschara.com
Abstract
A critical step is defined as a human action that will trigger immediate, irreversible, and intolerable harm to an
asset, if that action or a preceding action is performed improperly. Critical steps occur anytime an action involves
a substantial transfer of energy, the movement of mass, or the transmission of information that, if not performed
under control, could result in serious harm to an organization’s assets. Accidents typically occur when front line
production personnel lose control of such transfers during work. Human beings, because of human fallibility, are
the greatest source of variation in operations. First, this paper describes the critical step concept and the
preceding conditions—risk-important actions—that create critical steps. Second, a couple of means of managing
and controlling performance of critical steps during operations are described. Included is an adaptation of the
cornerstones of resilience described by Erik Hollnagel in the Ashgate series on Resilience Engineering. Risk-based
thinking (anticipate, monitor, respond, and learn) is offered as a fundamental first principle of identifying and
controlling the performance of critical steps during operations.
1 INTRODUCTION
Do you need a parachute to sky dive? Literally, no. However, you definitely need a parachute to sky dive twice.
Although humorous, this example conveys the fact that some things have to go right first time, every time. There
is a point of no return, involving substantial transfers of energy, movements of mass, or transmissions of
information that, if not performed under control, could result in serious, if not tragic, harm to people,
damage to product, property, or environment, or even loss that threatens the survival of the organization.
Accidents typically occur when front line production personnel at the sharp end lose control of such transfers
during work.
Recall from the definition of a critical step that it involves harm to something of importance—an asset. An asset
is anything of substantial value to an organization. For a commercial airline pilot, the primary assets are
passengers and the aircraft. Secondary assets include schedule adherence and quality of service. For a hospital,
assets include its patients, staff, medications, and facilities. For a biotech company, it is the drug substance.
For a nuclear power plant, it is the reactor core. And so on. For the purposes of this paper, assets include those
things deemed important to the organization and vulnerable to human action during operations.
A human interaction with an asset, directly or indirectly, such that it changes the state of that asset, whether for
good or ill, is referred to as a touchpoint. All work accomplishments include a series of touchpoints intended to
produce a work output for business purposes. Most touchpoints add value, but, occasionally, some extract value
causing damage or loss. Actions that extract value trigger unwanted releases of energy, movements of mass, or
transmissions of information if the operator loses control at these critical touchpoints; these are critical steps.
Therefore, it becomes important for high-hazard organizations to methodically identify critical steps, identify
preceding risk-important actions (discussed later), install appropriate controls, and prepare front-line workers to
recognize and adapt conservatively to the surprise occurrences of critical steps not identified previously. Hence,
together, risk-based thinking and chronic unease (discussed later) serve as an important way of thinking and
feeling about risk-important operations, enhancing people’s adaptive capacity during work.
Most errors occur without our knowledge, because of their trivial nature; there are no apparent consequences
to most errors. However, occasionally, simple human error can trigger serious harm, possibly catastrophic
consequences. What if failure is not an option? A study by the Civil Aviation Safety Authority of Australia suggests
people, in their waking moments, make three to four errors per hour (CASA, 2013, p.27). However, we are not
concerned with avoiding all errors. That’s truly an impossibility. But, can you pinpoint specific operations
(touchpoints) that possess the lethal capacity to cause injury, damage, or loss? Yes, I believe you can.
2 UNDERSTANDING AND IDENTIFYING CRITICAL STEPS
Dr. James Reason, author of Managing the Risks of Organizational Accidents (1998), has said repeatedly that,
“Wherever there are human beings, there is human error—human error is part of the human condition.” It is
normal to err, that is, to behave in a way that unintentionally deviates from the preferred behavior for a particular situation. More
simply, human error is a loss of control—a source of uncertainty in the workplace. This uncertainty becomes
relevant when people perform work on key assets or manipulate the controls of industrial processes.
Understandably, human fallibility is one of the greatest sources of variation in any industrial operation, and thus
it poses a serious risk to the integrity of an organization’s assets and its ability to achieve its goals. During
operational tasks, a worker could unintentionally trigger harm to one or more assets, losing control of transfers
of energy, movements of mass, or transmissions of information during operations.
This innate human tendency to err is not a problem unless it occurs coincidentally at a critical step (CCPS, 1994,
pp.207-211). Occasionally, people’s slips, lapses, or mistakes trigger harm, extracting value. An asset’s exposure
to human fallibility occurs when workers touch an asset, which usually happens when front-line personnel, such
as operators, nurses, technicians, pilots, surgeons, and others, perform work that involves applying
inherent hazards to assets with the intent of adding value. When performing tasks, people are usually
concentrating on accomplishing their immediate production goal, not necessarily safety (Hollnagel, 2009, p.29).
If people cannot fully concentrate on safety 100 percent of the time, then when should people be fully focused
on safe outcomes? I believe at critical steps.
2.1 Critical Step Basics
All tasks involve human action. Some actions change the state of assets—touchpoints. Some touchpoints are
more important than others, initiating substantial transfers of energy, mass, or information, such as tripping a distribution system
circuit breaker, energizing an x-ray machine, starting a large electric-driven pump, giving medication to a patient,
and clicking Enter to start a software update for a distributed digital control system. People perform such actions
many times a day, but such actions could, with the right preconditions, seriously hurt someone, damage product, or
otherwise destroy something of importance if performed out of control. The question is, which action(s)
absolutely have to go right the first time? To help us accurately identify a critical step, it helps to know the
attributes of one:
Touchpoint: A physical human interaction with an asset or with a control such that the state of the
asset is altered as confirmed by one or more asset parameters.
Transfers: There is a substantial flow of energy, mass, or information associated with the action.
Certainty: There is complete assurance that harm triggered by the action is unavoidable if there is a
loss of control. It will happen, just as briefly touching a hot burner on a stovetop will be painful, possibly
leaving a blister.
Immediate: The occurrence of harm is faster than a performer can humanly respond to avoid the
consequence. In some cases there is no turning back to avoid the harm such as falling out of an aircraft
door without a parachute, described by the next characteristic.
Irreversible: There is no undo. You cannot undo the harm by simply reversing the action; you are past
the point of no return. No means exists to restore the asset to its previous, unharmed state, or the
performer has no means to regain control of transfers of energy, mass, or information after taking the
action.
Intolerable Harm: Significant injury, damage, or loss is realized. The severity of harm depends on the
asset’s susceptibility and on the magnitude and intensity of the release of energy and movement of
mass, and the significance and type of information transmitted and its receiver. An outcome is
intolerable depending on what the organization deems important to safety, quality, the environment,
and production.
Collectively, these attributes define a critical step: a human action that will trigger immediate, irreversible, and
intolerable harm to an asset, if that action or a preceding action is performed improperly. As you read the
following examples, test for yourself that every one of the situations satisfies the definition of a critical step:
Breaching a pressure boundary such as a pipe or tank manhole cover
Entering a confined space where the atmosphere could be deficient of oxygen
Pulling the trigger on a firearm
Opening or closing a power system grid circuit breaker that supplies electric power to a hospital
Clicking “Send,” “Start,” or “Enter” after recording or entering sensitive information or data in a digital
control system, including software updates
Leaping out of the door of an airborne aircraft while skydiving
Entering the line of fire—direct exposure to a moving hazard you have little or no control over, such as
crossing a street or driving a vehicle on public streets and highways
In every case, there is either a transfer (or interruption) of energy, a movement of mass, or a transmission of
information that could cause immediate, irreversible, intolerable harm to an asset if the performer loses control.
However, it is important to note that an action that has a point of no return is not necessarily a critical step. To
designate an action as a critical step depends on the severity of the consequences, that is, the degree of harm
that ensues after the action. To be useful, reserve the critical step concept for situations that could threaten the
life and well-being of employees or the public, endanger the environment, jeopardize accomplishment of the
organization’s mission, and so on.
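The attribute test above can be sketched as a simple predicate. This is a minimal illustration, not a tool described in this paper: the class name, field names, and the two example classifications are assumptions made here, and each true/false judgment would in practice come from people who know the asset and its hazards.

```python
from dataclasses import dataclass

# Hypothetical field names, one per attribute listed in Section 2.1.
ATTRIBUTES = (
    "changes_asset_state",   # Touchpoint: the asset's state is altered
    "substantial_transfer",  # Transfers: energy, mass, or information
    "harm_certain",          # Certainty: harm is unavoidable if control is lost
    "harm_immediate",        # Immediate: faster than a person can respond
    "harm_irreversible",     # Irreversible: there is no undo
    "harm_intolerable",      # Intolerable harm: judged by the organization
)


@dataclass(frozen=True)
class Touchpoint:
    """One human action on an asset, scored against each attribute."""
    description: str
    changes_asset_state: bool
    substantial_transfer: bool
    harm_certain: bool
    harm_immediate: bool
    harm_irreversible: bool
    harm_intolerable: bool


def missing_attributes(tp: Touchpoint) -> list:
    """Return the names of the attributes this touchpoint does not satisfy."""
    return [name for name in ATTRIBUTES if not getattr(tp, name)]


def is_critical_step(tp: Touchpoint) -> bool:
    """A touchpoint is a critical step only if every attribute is present."""
    return not missing_attributes(tp)


# Illustrative judgments only (assumed here, not taken from a real assessment).
leap = Touchpoint(
    "Leaping out of the door of an airborne aircraft",
    True, True, True, True, True, True,
)
routine_email = Touchpoint(
    "Clicking Send on a routine status email",
    changes_asset_state=True,
    substantial_transfer=False,
    harm_certain=False,
    harm_immediate=False,
    harm_irreversible=True,   # a point of no return, yet not a critical step
    harm_intolerable=False,
)
```

The second example mirrors the caution above: an action can be irreversible and still fail the test because the harm is not intolerable. Calling missing_attributes on such a touchpoint shows which attributes kept it from qualifying.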
2.2 Risk-Important Actions
People often treat actions they perceive as important to safety or quality as critical steps, even when they do not satisfy the definition of a
critical step. Many people consider securing your seatbelt before driving away in your automobile to be a critical
step. But, really, is it? As long as the car is not moving, nothing happens if the seatbelt is buckled or unbuckled—
no harm occurs. Critical steps involve transfers of energy, movements of mass, or transmissions of information
that could trigger immediate harm to assets, if performed out of control. Though important, actions such as buckling
a seatbelt trigger no harm until later at the true critical steps.
In an operational process, the conditions for harm are created earlier by previous actions, where energy, mass,
or information is poised for transfer with the intent to add value. Mistakes that trigger serious harm always
combine with one or more hazardous conditions that already exist. Performers create these conditions by one
or more preceding actions in the process, which are referred to as risk-important actions (RIA). RIAs create the
conditions for potential harm for subsequent critical steps.
A RIA includes any human action preceding a critical step that either 1) creates a condition that could cause
harm to an asset, 2) reduces the number of actions required to initiate a transfer of energy, mass, or information,
or 3) weakens positive control for a critical step. Take skydiving for example. Taking off in an aircraft and flying
at 12,000 feet creates a potentially hazardous condition for any human being, a RIA. Another RIA involves
properly folding the parachute. Finally, donning a parachute prior to leaping across the threshold of an open
door of the aircraft is a necessary precondition for skydiving safely. RIAs possess some or all of the following
characteristics:
Reversible: an action that can be reversed or undone without causing harm; being able to start over.
Using a handgun example, a bullet can be inserted into or ejected from the gun’s chamber, the safety
lever can be moved from on to off and back to on, or the hammer can be cocked and uncocked. In every
case, there is no discharge of the firearm as long as the trigger is not pulled.
Slack: a substantial gap of time exists after an action that affords the opportunity to stop the transfer
of energy, movement of mass, or transmission of information before realizing harm. Time is available
to think and take recovery actions to preclude unwanted outcomes. RIAs always precede critical steps.
Reduced margin to safety limits: fewer actions remain before a critical step, for example after defeating safety
devices or functions, or after energizing an x-ray machine to a certain
power level prior to snapping the shutter to release the energy.
Minimal transfer: little or no transfer of energy, relocation of substances, movement of objects, or
communication of information. An asset experiences no substantial change in state, if any.
The following examples illustrate how commonplace risk-important actions are. Notice that in every case, an
asset’s condition does not change—there is no substantial release or transfer of energy, mass, or information,
and the action is reversible as long as the critical step is not performed.
Checking the atmosphere of a large tank before entering it to inspect the internal structure
Adding new oil to an engine during an oil change before starting the engine
Donning and securing a parachute harness before leaping out the open door of an aircraft
Donning a hardhat, protective eyewear, and gloves before entering a construction zone
Securing the lanyard of a fall protection harness after reaching an elevated work location
Removing car keys from the ignition and securing them before closing a locked car door after leaving
Just prior to performing a critical step, one has an opportunity to review the outcomes of previously performed
RIAs to verify (prove) that conditions are safe to proceed with the critical step.
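The review of preceding RIAs just before a critical step can be sketched as a simple gate. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, and treating a critical step with no listed RIAs as not yet safe to proceed is a design choice made here for illustration, not a rule stated in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RiskImportantAction:
    """A preceding action whose outcome should be verified (hypothetical model)."""
    description: str
    verified: bool = False  # set to True once the outcome has been checked


@dataclass
class CriticalStep:
    """A critical step together with the RIAs that set up its conditions."""
    description: str
    preceding_rias: List[RiskImportantAction] = field(default_factory=list)

    def unverified_rias(self) -> List[str]:
        """Descriptions of RIAs whose outcomes have not been verified yet."""
        return [r.description for r in self.preceding_rias if not r.verified]

    def safe_to_proceed(self) -> bool:
        # Assumption: an empty RIA list means the review has not been done,
        # so the gate stays closed until at least one RIA is listed and verified.
        return bool(self.preceding_rias) and not self.unverified_rias()


# Skydiving example drawn from the RIAs named in this section.
fold = RiskImportantAction("Properly fold the parachute")
don = RiskImportantAction("Don and secure the parachute harness")
jump = CriticalStep("Leap out of the open door of the aircraft", [fold, don])
```

In this sketch, jump.safe_to_proceed() stays false until both fold.verified and don.verified have been set, and jump.unverified_rias() lists what remains to be checked.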
2.3 Identifying a Critical Step
Process engineers and procedure writers can improve the effectiveness of anticipating and controlling critical
steps by using methodical, repeatable processes, such as a failure modes and effects analysis, which has been
adapted to identify critical steps. In general terms, a CRITICAL STEP MAPPING℠ process includes the following
high-level phases.
1. Pinpoint important assets, things of value to protect from injury, damage, or loss.
2. Identify significant touchpoints occurring during important work functions (accomplishments).
3. Assess the risk at significant touchpoints, comparing the touchpoint to the definition of a critical step.
4. Develop controls and barriers to avoid loss of control (human error) and damage to assets.
5. Implement and evaluate controls and barriers.
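One way to picture the output of phases 2 and 3 is a map from each critical step to the risk-important actions that precede it on the same asset. The sketch below is illustrative only and is not the mapping process itself: the Step structure, the three labels, and the example classifications are assumptions made here, and the labels would come from the risk assessment in phase 3.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Step:
    """One touchpoint in a work sequence (hypothetical structure)."""
    name: str
    asset: str
    kind: str  # "critical", "ria", or "ordinary", as judged in phase 3


def map_critical_steps(steps: List[Step]) -> Dict[str, List[str]]:
    """Link each critical step to the earlier RIAs performed on the same asset."""
    rias_seen: Dict[str, List[str]] = {}   # asset -> RIA names seen so far
    mapping: Dict[str, List[str]] = {}     # critical step -> preceding RIAs
    for step in steps:                     # steps are assumed to be in work order
        if step.kind == "ria":
            rias_seen.setdefault(step.asset, []).append(step.name)
        elif step.kind == "critical":
            mapping[step.name] = list(rias_seen.get(step.asset, []))
    return mapping


# Illustrative work sequence built from examples in Section 2.2.
work_plan = [
    Step("Add new oil to the engine", "engine", "ria"),
    Step("Wipe down the work area", "workshop", "ordinary"),
    Step("Start the engine", "engine", "critical"),
    Step("Check the tank atmosphere", "inspector", "ria"),
    Step("Enter the tank", "inspector", "critical"),
]
critical_step_map = map_critical_steps(work_plan)
```

A critical step that maps to an empty list is worth a second look during phase 4, since the controls for it may not have been identified yet.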
3 MANAGING CRITICAL STEPS
3.1 Risk-Based Thinking
I believe a practical, risk-based approach to managing human performance in operations has been offered
through the research of Erik Hollnagel in a series of articles and chapters associated with Resilience Engineering.
He strongly suggests safety cannot be managed simply by imposing restraints on how work is done—procedures,
supervision, automation, and so on. He states clearly that:
The solution is instead to identify the situations where the variability of everyday performance may
combine to create unwanted effects and to monitor continuously how the system functions in
order to intervene and dampen performance variability when it threatens to get out of control.
(Hollnagel, 2014, p.121) (Italics added for emphasis.)
Although critical steps and their related risk-important actions can be identified during work planning (described
later), a more flexible approach is needed to take advantage of front-line operators’ abilities to adapt and adjust
to protect assets during uncertain work situations, where critical steps can arise unexpectedly. Hollnagel noted
four essential cornerstones that permeate all levels and all functions of ultra-safe organizations (Hollnagel, 2009,
pp.117-133):
Anticipate: know what to expect; what can go wrong with one or more assets during operations
Monitor: know what to pay attention to; where a loss of control could trigger harm, i.e., critical steps
Respond: know what to do; how to control, avoid, or mitigate threats or hazards to assets
Learn: know what has happened, what is happening, and what to change
Notice the recurring verb know. Each cornerstone involves creating knowledge—to know, which requires
thinking, to represent some thing or idea accurately as it really is. Ultra-safe organizations think about safety
and reliability—mindfulness. Therefore, in an operational context, I refer to these cornerstones collectively as
risk-based thinking. Risk-based thinking is a fundamental first principle of operational human performance, and
anyone can use it. For those at the sharp end of the organization, risk-based thinking offers a straightforward
and practical approach to controlling performance (variability) at critical steps. I suggest managers integrate risk-
based thinking into important work functions throughout the organization, not just for critical steps during
operational work, by enabling the abilities to anticipate, monitor, respond, and learn in the key functions of a
process.
People can apply risk-based thinking more effectively, when they know clearly what asset to protect during their
work. To manage the variability of human performance (uncertainty), the personnel involved in an operation
must understand the inherent hazards of the processes to be used and the asset(s) to be engaged during the
work process. At a minimum, most work activities involve two assets: the health and well-being of
the performer and the quality and integrity of the product or service. Foreknowledge of assets and related
critical steps offers workers an opportunity to avoid serious harm, not only to themselves and to their coworkers,
but also to the products or services the organization provides. In the following section, risk-based thinking serves
as a foundation for developing a systematic approach to managing critical steps.
3.2 A Risk-Based Approach
Because of the specter of human error, it is helpful to provide front-line workers with structured time to think
about critical steps in their work. Before starting work, workers participate in a pre-task briefing or tailboard,
where workers review procedures to familiarize themselves with desired work accomplishments, criteria for
success, and what to avoid. It is important for the workers to consider explicitly how their actions affect
assets in light of the production objectives. Knowing what to avoid includes identifying, denoting, and controlling
critical steps and their related RIAs. Integrating the logic of risk-based thinking into pre-task briefings and
tailboards aids this awareness. Similar to failure modes and effects analysis, an adaptation of an approach
commonly used in the nuclear electric generating industry (DOE, 2009, pp.34-40) to preview a work activity is
RU-SAFE℠:
1. Recognize assets important to safety, reliability, quality, and production (economic goal).
2. Understand inherent hazards to each asset and relevant lessons learned from previous incidents.
3. Summarize the critical steps and related risk-important actions (RIAs) from the work plan.
4. Anticipate errors (losses of control) for each critical step, highlighting dangerous error traps.
5. Foresee worst-case consequences to each asset should control be lost at each critical step.
6. Evaluate the controls and barriers needed at each critical step, including contingencies and stop-work
criteria.
Individuals can preview their work guided by RU-SAFE℠ not only during work preparation, but also during the
work planning process, procedure development process, and even during work execution in the field. It can help
anyone think through their actions before performing them, anytime, anywhere. However, the effectiveness of
RU-SAFE℠ depends greatly on the user’s technical knowledge, without which the worker may not recognize the
limits of safety for the assets they work with.
Even when no apparent critical steps exist, people should still adopt an attitude of mindfulness and wariness
toward their work—a chronic uneasiness, especially during any activity that involves a transfer of energy,
movement of mass, or transmission of information during operations. This is when self-checking is useful.
3.3 Performing a Critical Step
Rigor and care are essential when performing critical steps. The same can be said about RIAs, especially when
the results of a RIA can be hidden or obscure after performance (such as properly folding a parachute before
stitching it closed for skydiving). When about to perform a critical step, it is important to pause for a moment to
consider the situation—self-check. Stopping the flow of work momentarily helps the performer collect his/her
thoughts—to think, to concentrate (pay attention) on what is about to happen. Any action involving the transfer
of energy, the movement of mass, or the transmission of information could trigger harm if control is lost.
To retain positive control, self-checking is perhaps the most effective human performance technique an
individual can use to perform a critical step (DOE, 2009, pp.18-19). Positive control, simply stated, is “what is
intended to happen is what happens, and that is all that happens.” Before acting, the performer thinks
explicitly about the intended action, its control, the asset and expected outcome. The performer ensures safety
exists for the asset by verifying proper conditions created by preceding RIAs exist. Self-checking also preserves
concentration during and after an action to verify results. The effectiveness of self-checking depends greatly on
the performer’s grasp and understanding of process technical knowledge. Without it, uncertainty abounds.
If uncertain, the performer resolves any questions or concerns before proceeding. When the performer believes
a situation places himself/herself, a coworker, the product, the equipment, or the environment in danger or at
risk, he or she STOPS the work, and gets help. STOP work criteria should be explicit, having been discussed
previously in the pre-task briefing. If there is any doubt, there is no doubt, STOP! Never proceed in the face of
uncertainty! Once the performer satisfies him/herself that safe conditions exist, he/she performs the critical
step, the right action on the right component at the right place and time, while monitoring the key parameters
regarding the change in state of the asset. The STAR (stop, think, act, review) self-checking tool (DOE, 2006, p.)
describes how an individual performer can exercise positive control of a critical step.
1. Stop – Just before transferring energy, moving mass, or transmitting information, pause.
Focus attention on the asset(s) and the task’s immediate objective.
Eliminate distractions.
2. Think – Understand what will happen, especially to assets, when performing the action.
Verify the action is appropriate, given the status of the asset(s); understand the path of the transfer
of energy, movement of mass, or transmission of information.
Know the key parameters of the asset to monitor, how they should change, and expected result(s)
of the action.
Consider a contingency to minimize harm if an unexpected result occurs.
If there is any doubt, STOP, and get help. Apply STOP work criteria.
3. Act – Perform the correct action under control on the right component.
Without losing eye contact, read and touch the component label.
Compare the component label with the guiding document.
Without losing physical contact, perform the action.
4. Review – Verify anticipated result obtained.
Verify the desired change in key parameters of the asset(s).
Perform the contingency, if an unexpected result occurs.
STOP work, if criteria are met, and notify a supervisor or those with the expertise.
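The ordering of the four STAR steps can be sketched as a small control flow. This is a hedged illustration rather than an implementation from the DOE handbook: the function name, the callback parameters, the StopWork exception, and the returned status strings are all assumptions introduced here.

```python
from typing import Callable, List


class StopWork(Exception):
    """Raised when doubt remains before acting; the action is not performed."""


def perform_with_star(
    doubts: Callable[[], List[str]],
    act: Callable[[], float],
    result_ok: Callable[[float], bool],
    contingency: Callable[[], None],
) -> str:
    """Run one action through Stop, Think, Act, Review (illustrative sketch)."""
    # Stop and Think: pause and collect anything uncertain or out of place.
    open_doubts = doubts()
    if open_doubts:
        # If there is any doubt, stop and get help before touching the asset.
        raise StopWork("; ".join(open_doubts))

    # Act: perform the action and capture the key parameter that should change.
    observed = act()

    # Review: confirm the expected change; otherwise perform the contingency.
    if result_ok(observed):
        return "completed"
    contingency()
    return "contingency performed"
```

The point of the structure is that act() is unreachable while any doubt is open, which reflects the rule of never proceeding in the face of uncertainty. What should follow a contingency, such as notifying a supervisor, is left to the caller in this sketch.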
SUMMARY
A critical step is a human action that will trigger immediate, irreversible, and intolerable harm to an asset, if that
action or a preceding action is performed improperly. Safety is not what you have, it is what you do (especially
at critical steps) (Woods, 2010, pp.38-39). Safety is achieved by pinpointing the situations where human error
could trigger unwanted consequences and then introducing means to enhance the reliability of the practitioner’s
performance. There is no margin for error at a critical step—performance must be perfect. Critical steps always
involve substantial transfers of energy, movements of mass, or transmissions of information. Workers improve
their chances of controlling critical steps by identifying the associated risk-important actions (RIAs). A RIA is a
human action preceding a critical step that either 1) creates the condition that could cause harm to an asset, 2)
reduces the number of actions required to initiate a transfer of energy, movement of mass, or transmission of
information, or 3) weakens positive control for a subsequent critical step. Every critical step has one or more
associated RIAs that precede it. Systematically identifying and controlling the occurrence of human error at
critical steps is an important safety function to incorporate into work to promote safety and reliability.
Identifying and controlling critical steps helps people protect assets from unacceptable harm, and improves the
chances for success even during varying or unexpected circumstances. CRITICAL STEP MAPPING℠, RU-SAFE℠,
and STAR (self-checking) provide workers with a proactive means of making sure value is added instead of
extracted.
REFERENCES
Center for Chemical Process Safety (CCPS) (1994). Guidelines for Preventing Human Error in Process Safety. New
York: American Institute of Chemical Engineers.
Civil Aviation Safety Authority (CASA) (2013). Safety Behaviors Human Factors: Resource Guide for Engineers.
Canberra, Australia: Civil Aviation Safety Authority.
Hollnagel, E. (2009). The Four Cornerstones of Resilience Engineering. Pp. in Nemeth, C., Hollnagel, E., and
Dekker, S. (Eds.). Resilience Engineering Perspectives Volume 2, Preparation and Restoration. Farnham: Ashgate.
Hollnagel, E. (2014). Safety-I and Safety-II: The Past and Future of Safety Management. Farnham: Ashgate.
Kletz, T. (2001). An Engineer’s View of Human Error. New York, NY: Taylor and Francis.
Reason, J. (1998). Managing the Risks of Organizational Accidents. Aldershot: Ashgate.
United States Department of Defense (2000). Standard Practice for System Safety (MIL-STD 882D). Washington,
D.C.: Government Printing Office.
United States Department of Energy (DOE) (2009). DOE Standard: Human Performance Improvement Handbook,
VOLUME 2: Human Performance Tools for Individuals, Work Teams, and Management (DOE-HDBK-1028-2009)
(pp.18-19) retrieved from http://www.hss.energy.gov/nuclearsafety/ns/techstds/.
Woods, D., Dekker, S., Cook, R., Johannesen, L., and Sarter, N. (2010). Behind Human Error (2nd ed.). Farnham:
Ashgate.