CIRED
21st International Conference on Electricity Distribution, Frankfurt, 6-9 June 2011
Paper 0594
BUILDING SYSTEM RESILIENCE THROUGH MULTI-DISCIPLINARY AND CROSS-DIVISIONAL REGIONAL RESILIENCE TEAMS
Malcolm Van Harte (vhartem@eskom.co.za)
Robert Koch (robert.koch@eskom.co.za)
Jose Correia (CorreiaJ@eskom.co.za)
Gunter Rohde (RohdeG@eskom.co.za)
System Operations & Planning Division
Eskom Holdings Limited
ABSTRACT
This paper describes some of the structures and processes implemented to build resilience across the various divisions of a vertically integrated utility (Eskom¹). The aim in implementing these was to enhance the utility's ability to: identify, anticipate, and adapt rapidly to threats and vulnerabilities; operate at elevated levels of stress without failure for extended periods of time; respond to and recover from a disturbance; and learn from near misses and events.
This paper describes Eskom’s Regional Resilience Teams
and how they relate to Eskom’s Integrated Risk
Management (IRM) Framework, Eskom’s Integrated
Emergency Response structures and protocols [1,2], and
Eskom’s normal business processes [3]. The paper also
illustrates how these contribute to a greater degree of
organisational “mindfulness” in relation to the potential
for failure - and the manner in which management and staff
recognise, prioritise, and mobilise to address areas of
potential failure [4,5].
1. INTRODUCTION
After experiencing a significant regional power system
constraint in 2006, Eskom initiated a re-design of its
emergency structures and protocols – with the aim of
facilitating a more coordinated response to power system
emergencies across generation, transmission, and
distribution boundaries. These structures and protocols were
stress-tested during national load shedding experienced in
2008 (when system demand outstripped available generation
capacity in South Africa) [6]. Modifications were
subsequently made to these structures and protocols and to the way in which Eskom identifies and manages risk². One of the initiatives to better manage power system risks was the implementation of Regional Resilience Teams (RRT’s).
These structures integrate business units (i.e. generation,
transmission, distribution, capital build programme) and
various functional areas (i.e. technical, customer services,
stakeholder management, and risk management).
2. RESILIENCE
Resilience is more than simply “the ability to bounce back”
after a failure – an organisation seeking to be highly
resilient needs to also continuously focus on aspects related
to the potential for failure at all levels of the organisation.
¹ Eskom is a vertically integrated, South African state-owned electricity company, established in 1923. The utility is the largest producer of electricity in Africa, and is among the top seven utilities in the world in terms of generation capacity and among the top nine in terms of sales (www.eskom.co.za).
² Risk is defined as the “effect of uncertainty on objectives” [ISO 31000:2009].
With this in mind, Eskom defines resilience as the inherent
ability to: (i) identify, anticipate, and adapt rapidly to
threats and vulnerabilities arising from changes in the
internal & external environment; (ii) operate at elevated
levels of stress without failure for extended periods of time;
(iii) respond to a shock by containing the impact
(severity/duration) of the event; (iv) recover quickly in a
coordinated manner; and (v) implement learning from
near-misses and recovery experiences.
Figure 1: Attributes of resilience. (Figure elements along a time axis: identify and anticipate threats & system vulnerabilities; build adaptive capacity & ability to operate under stress; shock; respond; recover; learn.)
Whilst structures and processes play an important role in
establishing these attributes, highly reliable organisations
inherently reflect these in their culture – i.e. in “the way
things are done” in the organisation [4]. Development of
such a culture needs to be facilitated by leadership that has
collectively resolved to build the capacity to recognise risks,
prioritise the actions required to address these, and mobilise
resources to implement these actions [5]. This paper
focuses on management of the integrated power system, and
some of the systems (structures and procedures that support
decision-making) and organisational accountabilities that
have been implemented in Eskom to facilitate the process of
building a Highly Resilient Organisation (HRO). These key
elements are illustrated in Figure 2.
Figure 2: Key elements of a highly resilient organisation. (Figure elements surrounding “HRO”: organisation culture (mindfulness); people (accountable); system (decision making); leadership style (resolve to solve).)
An effective risk management framework constitutes one such decision-making system. Resilience is an organisational trait, and not a “new function” in the organisation.
3. INTEGRATED RISK MANAGEMENT
3.1. Background
Integrated Risk Management (IRM) has been practiced in
Eskom for many years – with varying degrees of success.
After the 2008 capacity crisis Eskom conducted an
extensive review to identify the gaps and opportunities for
improvement in the existing IRM framework. The outcome
of this review was a complete overhaul of Eskom’s risk
management framework – including adoption of the new
ISO 31000:2009 standard [7].
3.2. Overview of the IRM framework
Eskom’s IRM policy and revised standards aim to establish a uniform approach to risk management across all its strategic and operational decision-making processes.
Key to this framework are: (i) establishment of the context
in which risks are assessed; (ii) risk ranking and
prioritisation based on common consequence and likelihood
criteria; (iii) leadership resolve to ensure that treatment is
timeous and effective; and (iv) effective communication,
monitoring, and review systems. The process is summarised
in Figure 3.
Figure 3: Integrated Risk Management process [7]
Common consequence and likelihood criteria are applied
across the organisation. These consequence criteria address
the following areas: (i) financial impact; (ii) people effects; (iii) environmental impact; (iv) brand and reputation; (v) legal and compliance; and (vi) continuity of supply.
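The paper does not publish Eskom's actual rating scales, so the following is only an illustration of how common consequence and likelihood criteria might be combined into a single ranking. It assumes hypothetical 1-5 scores for each of the six consequence areas, takes the worst-scoring area as the overall consequence, and maps consequence × likelihood onto the four priority levels of Table 1 using invented band thresholds.

```python
# Hypothetical sketch: the 1-5 scales and priority bands are assumptions, not Eskom's criteria.
CONSEQUENCE_AREAS = (
    "financial",
    "people",
    "environmental",
    "brand_and_reputation",
    "legal_and_compliance",
    "continuity_of_supply",
)


def overall_consequence(scores: dict[str, int]) -> int:
    """Return the worst (highest) consequence score across the six common areas."""
    if not scores:
        raise ValueError("score at least one consequence area")
    for area, score in scores.items():
        if area not in CONSEQUENCE_AREAS:
            raise ValueError(f"unknown consequence area: {area}")
        if not 1 <= score <= 5:
            raise ValueError(f"{area} score {score} is outside 1-5")
    return max(scores.values())


def priority(consequence: int, likelihood: int) -> int:
    """Map consequence x likelihood (1-25) onto Priority 1 (highest) to 4 (assumed bands)."""
    if not (1 <= consequence <= 5 and 1 <= likelihood <= 5):
        raise ValueError("consequence and likelihood must both be 1-5")
    score = consequence * likelihood
    if score >= 20:
        return 1
    if score >= 12:
        return 2
    if score >= 6:
        return 3
    return 4


c = overall_consequence({"financial": 2, "continuity_of_supply": 5})
print(c, priority(c, 4))  # 5 1
```

For example, a risk scored 5 for continuity of supply with likelihood 4 gives a product of 20 and would land in Priority 1 under these assumed bands. Taking the maximum rather than an average keeps a severe supply impact from being diluted by mild scores in other areas.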
Risks identified throughout the organisation are captured
on a central database. This enables access to these risks by
the whole organisation. It also enables communication and
review of these risks through standardised reporting and
queries.
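A minimal sketch of such a central register, using Python's built-in sqlite3 module with an in-memory database. The table layout, the region labels, and the priority_summary report are assumptions made for illustration; they are not Eskom's actual risk database schema.

```python
import sqlite3
from typing import Optional

# Hypothetical schema for a central risk register (illustration only).
SCHEMA = """
CREATE TABLE risk (
    id          INTEGER PRIMARY KEY,
    region      TEXT NOT NULL,
    category    TEXT NOT NULL,
    description TEXT NOT NULL,
    priority    INTEGER NOT NULL CHECK (priority BETWEEN 1 AND 4)
)
"""


def create_register() -> sqlite3.Connection:
    """Create an in-memory register; a real deployment would use a shared database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def add_risk(conn: sqlite3.Connection, region: str, category: str,
             description: str, priority: int) -> int:
    """Insert one risk and return its id."""
    cur = conn.execute(
        "INSERT INTO risk (region, category, description, priority) VALUES (?, ?, ?, ?)",
        (region, category, description, priority),
    )
    conn.commit()
    return cur.lastrowid


def priority_summary(conn: sqlite3.Connection, region: Optional[str] = None) -> dict[int, int]:
    """Standardised report: number of risks per priority, optionally for one region."""
    sql = "SELECT priority, COUNT(*) FROM risk"
    params: tuple = ()
    if region is not None:
        sql += " WHERE region = ?"
        params = (region,)
    sql += " GROUP BY priority ORDER BY priority"
    return dict(conn.execute(sql, params).fetchall())


conn = create_register()
add_risk(conn, "Region A", "security risks", "Conductor theft on a key feeder", 1)
add_risk(conn, "Region A", "load growth", "Substation loading near firm capacity", 2)
add_risk(conn, "Region B", "system adequacy", "Insufficient N-1 transformer capacity", 2)
print(priority_summary(conn))              # {1: 1, 2: 2}
print(priority_summary(conn, "Region A"))  # {1: 1, 2: 1}
```

Because every report runs the same parameterised query, a regional view and the national view are guaranteed to count risks the same way.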
The various attributes of a resilient organisation (Figure 1)
can be communicated in the context of the risk management
framework by considering six key stages (R⁶):
(i) Recognition: of threats and vulnerabilities that give rise to business risks (i.e. uncertainty of meeting defined objectives);
(ii) Reduction: of the potential consequences and/or likelihood associated with these risks, through prioritising and implementing effective controls and treatment plans;
(iii) Readiness: effective emergency planning in order to respond should identified risks materialise (or unexpected events occur);
(iv) Response: the necessary capacity (structures, protocols, and systems) to respond to any type of event;
(v) Recovery: effective business continuity plans; and
(vi) Reflection: post-event analysis with the aim of capturing organisational learning.
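The six stages can be represented as an ordered enumeration. The sketch assumes that Recognition, Reduction, and Readiness precede a shock while the remaining stages follow it, and that Reflection feeds back into Recognition; both are interpretations of the text rather than something the paper states explicitly.

```python
from enum import Enum


class Stage(Enum):
    """The six 'R' stages, in the order the paper lists them."""
    RECOGNITION = 1
    REDUCTION = 2
    READINESS = 3
    RESPONSE = 4
    RECOVERY = 5
    REFLECTION = 6


# Assumption: the first three stages precede a shock, the last three follow it.
PRE_EVENT = frozenset({Stage.RECOGNITION, Stage.REDUCTION, Stage.READINESS})


def is_pre_event(stage: Stage) -> bool:
    return stage in PRE_EVENT


def next_stage(stage: Stage) -> Stage:
    """Advance one stage; Reflection wraps back to Recognition (assumed learning loop)."""
    stages = list(Stage)
    return stages[(stages.index(stage) + 1) % len(stages)]


print(next_stage(Stage.REFLECTION).name)  # RECOGNITION
```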
When it comes to the integrated power system, effective
risk management requires integration across various areas of
the business. Recognising this (through events such as the
regional capacity constraint in 2005/6), Eskom has
formalised specific structures and accountabilities across the
country to facilitate a more integrated approach to risk
management.
4. REGIONAL RESILIENCE TEAMS
4.1. Objective
Regional Resilience Teams (RRT’s) have been established
and aligned with Eskom’s six distribution areas across the
country. Their specific objective is to develop and maintain
a consolidated and common leadership view of supply risks
and associated societal resilience in a given region. The
intent behind their establishment was to continue to evolve
Eskom’s organisational culture from a largely event-driven
(reactive) culture, to a culture that is characterised by
proactive attention to the potential for failure (i.e. threats,
vulnerabilities, and small failures).
4.2. Risk identification and management
Risk identification is undertaken from a “bottom-up” and a
“top-down” perspective. The “bottom-up” perspective is
facilitated by establishing risk management as a requirement
at all levels of the organisation – supported by an extensive
risk management training programme. The “top-down”
perspective is guided by specific risk categories identified to
ensure that all aspects are regularly reviewed. These categories are: (i) load growth; (ii) system adequacy; (iii) operational risks; (iv) security risks; (v) emergency preparedness; (vi) customers; (vii) stakeholders; and (viii) audits, reviews, and post-mortems.
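The top-down categories lend themselves to a simple coverage check. The sketch below flags categories that have never been reviewed or whose last review is older than a review cycle; the 90-day cycle and the function name overdue_categories are hypothetical, since the paper does not state how often each category is reviewed.

```python
from datetime import date, timedelta

# The eight "top-down" risk categories listed in the paper.
TOP_DOWN_CATEGORIES = (
    "load growth",
    "system adequacy",
    "operational risks",
    "security risks",
    "emergency preparedness",
    "customers",
    "stakeholders",
    "audits, reviews, and post-mortems",
)


def overdue_categories(last_reviewed: dict[str, date], today: date,
                       max_age_days: int = 90) -> list[str]:
    """Return categories never reviewed, or last reviewed more than max_age_days ago.

    The 90-day default is a hypothetical review cycle, not one stated in the paper.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [
        category
        for category in TOP_DOWN_CATEGORIES
        if category not in last_reviewed or last_reviewed[category] < cutoff
    ]


reviews = {category: date(2011, 5, 1) for category in TOP_DOWN_CATEGORIES}
reviews["customers"] = date(2010, 12, 1)  # stale review
del reviews["stakeholders"]               # never reviewed
print(overdue_categories(reviews, today=date(2011, 6, 6)))  # ['customers', 'stakeholders']
```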
4.3. Integration with normal business
A particular concern when prioritising treatment plans across functional areas (system operations, grid operations & maintenance, system planning, customer services, etc.) as well as business areas (generation, transmission, distribution) is that each of these areas may have a different perspective on a particular risk.
For each of the identified risk categories, a review of
business processes is being undertaken to enhance the
integration between functional and business areas. For
example, the prioritisation of treatment plans by the
transmission grid planning department is influenced by both
technical assessments of system vulnerabilities by the
System Operator, as well as a review by the various
functional and business areas of the risks in accordance with
the company risk criteria (consequence and likelihood).
The RRT’s further provide a good mechanism for tracking the implementation of corporate initiatives aimed at improving the resilience of the organisation and the country, such as the implementation of new policies or procedures which require interaction between various parts of the
business. A good example is the implementation of the
national code of practice for emergency load shedding and
restoration – which requires interaction between customers,
distributors, and Eskom divisions [2].
4.4. Escalation of risks
One of the shortcomings identified in the review of Eskom’s
previous risk management framework was the inadequacy of
the risk escalation process. Risks identified by the Regional
Resilience Teams are escalated through several channels.
Regular presentations are made to the operational meeting of Eskom’s executive management committee. These have proved to be an effective means of ensuring executive oversight. Priority 1 risks are also tabled at the risk committee meetings of the Eskom Board. Risks identified by the RRT’s are actioned through the individual divisional executive teams. Table 1 summarises the risk escalation requirements, i.e. the level at which the treatment plan for a risk of a given priority must be reviewed and approved.
Table 1 – Risk escalation requirements
Ranking     | Review and approval of treatment plan by
Priority 1  | Executive Committee & Eskom Board
Priority 2  | Divisional Executives & Exec Sub-Committee
Priority 3  | Business Unit General Managers
Priority 4  | Senior Managers
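Table 1 can be encoded as a lookup from priority to the bodies that must review and approve the treatment plan. The sketch assumes that where Table 1 lists two bodies, approval is required from both; the helper names approvers and is_approved are illustrative only.

```python
# Table 1 as a lookup: priority -> bodies that review and approve the treatment plan.
ESCALATION: dict[int, tuple[str, ...]] = {
    1: ("Executive Committee", "Eskom Board"),
    2: ("Divisional Executives", "Exec Sub-Committee"),
    3: ("Business Unit General Managers",),
    4: ("Senior Managers",),
}


def approvers(priority: int) -> tuple[str, ...]:
    """Return the approving bodies for a priority, per Table 1."""
    try:
        return ESCALATION[priority]
    except KeyError:
        raise ValueError(f"unknown priority {priority!r}; expected 1-4") from None


def is_approved(priority: int, approved_by: set[str]) -> bool:
    """Assumption: every body listed for the priority must have approved the plan."""
    return set(approvers(priority)) <= approved_by


print(is_approved(1, {"Executive Committee"}))  # False: Board approval still missing
```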
5. INTEGRATED EMERGENCY STRUCTURES
5.1. Objective
An emergency is by its nature chaotic and therefore
emergency structures, facilities, and protocols are required
to coordinate an effective response to an impending or
actual emergency. “The ability to deal with a crisis situation
is largely dependent on the structures that have been
developed before chaos arrives. The event can in some
ways be considered as an abrupt and brutal audit: at a
moment’s notice, everything that was left unprepared
becomes a complex problem, and every weakness comes
rushing to the forefront” [8].
An electrical power system is essentially passive, and
responding to supply interruptions and plant outages is a
day-to-day activity in the operational divisions of a power
company – i.e. the normal business of a utility is to handle
these events as they arise. An emergency may be defined as “an event which cannot be handled by the normal business resources and processes of the utility in question” [1].
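The definition above is qualitative. As a hedged illustration only, the sketch below turns it into a three-way classification that also reflects the Alert phase described in Section 5.4: an event is treated as an emergency when either the resources it requires or the time normal processes would need exceed what normal business can provide. All field names and thresholds are hypothetical, and in practice the declaration is a decision by the standby chairman, not a calculation.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    NORMAL_BUSINESS = "normal business"
    ALERT = "alert"
    EMERGENCY = "emergency"


@dataclass
class Event:
    # All fields are hypothetical, normalised measures used only for illustration.
    resources_needed: float           # e.g. crew-hours the event requires
    normal_capacity: float            # crew-hours normal business processes can supply
    hours_to_resolve_normally: float  # time normal processes would take
    acceptable_hours: float           # time within which resolution is acceptable
    could_escalate: bool              # judged capable of growing beyond normal business


def classify(event: Event) -> Level:
    """Emergency if normal business cannot cope on resources or speed; alert if it might."""
    exceeds_normal = (
        event.resources_needed > event.normal_capacity
        or event.hours_to_resolve_normally > event.acceptable_hours
    )
    if exceeds_normal:
        return Level.EMERGENCY
    if event.could_escalate:
        return Level.ALERT
    return Level.NORMAL_BUSINESS


print(classify(Event(50, 100, 12, 8, False)).value)  # emergency: too slow for normal processes
```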
The key objective with the establishment of integrated
emergency structures and protocols is therefore to ensure an
effectively coordinated response to events that exceed the
boundaries of what constitutes “normal business” – i.e.
events that could otherwise escalate to a major crisis for the
company. Key criteria in the establishment and maintenance of emergency preparedness protocols & structures are that these are: (i) fully integrated within internal and external structures; (ii) regularly tested & assured; and (iii) able to demonstrate a high level of preparedness for all kinds of emergencies that could affect the functioning of the power system.
5.2. Structures
To effectively coordinate a response, a national Emergency
Response Command Centre (ERCC) has been established to
coordinate the emergency protocols across the regions with
support from Regional Joint Command Centres (RJCC’s).
These structures are only activated in the event of an emergency. In the event of a regional emergency, the relevant RJCC is established. Venues are equipped with a range of communication technologies, namely video and teleconferencing facilities.
5.3. Mandate
When an emergency is declared by the standby chairman, emergency powers are invoked. This mandate is only available for emergencies, in other words situations in which Eskom’s normal processes cannot achieve the speed required for an acceptable resolution of the situation. The triggers for such an emergency declaration were originally related only to the power system; however, the usefulness of the system has been demonstrated for many kinds of emergencies, including a strike in mid-2010 and the 2010 FIFA World Cup™ in South Africa.
5.4. Protocols
Emergency protocols have been defined to provide structure in a potentially chaotic situation (see Figure 4). Annual scenario testing is conducted at national and regional level. The national tests involve all the structures, with teleconferenced committees playing out the scenarios across the country.
The Alert phase of the protocol allows management of a potential crisis where emergency powers are not required, i.e. the ERCC/RJCC’s play a strong coordinating and monitoring role.
Figure 4: Emergency response protocols. (Figure elements: Event Trigger; Activation; Alert; Convene Initial Response Team; Assess Situation; Info Request; Determine Initial Response; Develop Strategic Options & Priorities; Plan Actions; Further Actions; Mobilise Recovery Team; Status Reporting; Stand Down; Communications and Stakeholder Management. Annotations: “The process provides structure in a chaotic situation”; “The process is a guide to Emergency Response for decision-makers”.)
The ERCC and the RJCC stand down once an emergency has been managed and recovery is complete.
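The protocol steps in Figure 4 can be modelled as a small state machine that rejects out-of-order transitions. Because the arrows of Figure 4 could not be recovered, the transition table below is an assumed ordering of the listed steps, and the state names are hypothetical. Status reporting is treated as a query that can be made at any step rather than as a separate state.

```python
# Assumed ordering of the Figure 4 steps; state names are hypothetical.
TRANSITIONS: dict[str, set[str]] = {
    "event_trigger": {"alert", "activation"},
    "alert": {"activation", "stand_down"},
    "activation": {"convene_initial_response_team"},
    "convene_initial_response_team": {"assess_situation"},
    "assess_situation": {"determine_initial_response"},
    "determine_initial_response": {"develop_strategic_options"},
    "develop_strategic_options": {"plan_actions"},
    "plan_actions": {"further_actions", "mobilise_recovery_team"},
    "further_actions": {"assess_situation", "mobilise_recovery_team"},
    "mobilise_recovery_team": {"stand_down"},
    "stand_down": set(),
}


class EmergencyProtocol:
    """Tracks progress through the protocol and rejects out-of-order steps."""

    def __init__(self) -> None:
        self.state = "event_trigger"
        self.history = [self.state]

    def advance(self, next_state: str) -> None:
        if next_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot move from {self.state!r} to {next_state!r}")
        self.state = next_state
        self.history.append(next_state)

    def status_report(self) -> str:
        # Status reporting can be requested at any step, so it is a query, not a state.
        return f"current step: {self.state}; steps taken: {len(self.history) - 1}"

    @property
    def stood_down(self) -> bool:
        return self.state == "stand_down"


protocol = EmergencyProtocol()
for step in ("activation", "convene_initial_response_team", "assess_situation",
             "determine_initial_response", "develop_strategic_options",
             "plan_actions", "mobilise_recovery_team", "stand_down"):
    protocol.advance(step)
print(protocol.status_report())  # current step: stand_down; steps taken: 8
```

The loop from further_actions back to assess_situation reflects the idea that a situation may need to be reassessed several times before recovery; an Alert that never escalates can stand down without activation.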
5.5. Relationship to the Regional Resilience Teams
The Regional Resilience Teams are accountable for reviewing the readiness of the Regional Joint Command Centres and of specific emergency preparedness plans in the various divisions, and for overseeing the implementation of learning after the event. Figure 5 illustrates the oversight role of the RRT and highlights that the RJCC is activated only during an emergency condition. The RJCC will oversee containment of the severity of the emergency, relying on both its command capabilities and the
groundwork laid by the RRT in establishing suitable
contingency plans for identified risks.
Figure 5: Relationship between the RRT’s and RJCC’s. (Figure elements: system vulnerability and system adaptive capacity plotted as a KPI against time, with potential threats and a shock; stages labelled Recognise, Reduction, Readiness, Response, Recovery, and Reflect; RRT and RJCC labels indicating their respective roles.)
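Section 5.5 and Figure 5 suggest a simple division of roles across the six stages. The sketch below encodes one reading of it, which is an assumption rather than a mapping stated by the paper: the RJCC leads response and recovery only when an emergency has been declared, events below that threshold remain with normal business, and the RRT provides oversight in the other stages.

```python
# One reading of Figure 5 and Section 5.5 (assumption, not stated explicitly in the paper).
STAGES = ("recognise", "reduction", "readiness", "response", "recovery", "reflect")
RJCC_STAGES = frozenset({"response", "recovery"})


def lead_structure(stage: str, emergency_declared: bool) -> str:
    """Return which structure leads a stage under the assumed division of roles."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    if stage in RJCC_STAGES:
        # The RJCC is activated only in an emergency; otherwise normal business handles it.
        return "RJCC" if emergency_declared else "normal business"
    return "RRT"


print([lead_structure(s, emergency_declared=True) for s in STAGES])
# ['RRT', 'RRT', 'RRT', 'RJCC', 'RJCC', 'RRT']
```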
6. VISUALISATION
The ability to anticipate and identify threats and
vulnerabilities is a necessary prerequisite for ensuring
an appropriate response. Eskom is pursuing enhanced
visualisation at many levels in the organisation (ranging
from implementation of an Integrated Generation Control
Centre - which tracks detailed power station parameters
across Eskom’s generation fleet, to an Operational Health
Index that acts as a “wide net” to track operational
vulnerabilities across the organisation for the operations
sub-committee of Eskom’s Executive Management
Committee).
In preparation for the hosting of the 2010 FIFA World Cup™ in South Africa, the need was identified for improved visualisation of threats related to the supply to FIFA venues and other key sites. This resulted in the establishment of temporary structures (Situational Awareness Centres) at national and regional level. Supply-related threats to specific event-related sites were visible at national level. The value of this information contributed to the decision to establish a permanent nerve-centre to provide senior management with situational information on the status of supply to customers as part of Eskom’s new business model.
7. CONCLUSIONS
Whilst it may be true that “resilience is something you realise after the fact” [8], the challenge remains how the level of resilience of an organisation can be enhanced and evaluated before a crisis emerges. Furthermore, no organisation can predict all its failure modes or plan in advance for every unexpected change in the internal and external environment in which it operates.
This paper has proposed a general framework for
institutionalising some aspects of organisational resilience
through: (i) improved integration across business areas and
functions; (ii) a common risk management framework to
establish clear communication on the nature of specific
risks; (iii) escalation processes to ensure prioritisation of risk treatment plans; and (iv) a clear alignment between the “normal” business and the “abnormal” requirements associated with a response to a crisis.
Whilst resilience can be enhanced through
institutionalised risk management processes and improved
integration across the business, the true benefit of the
structures discussed relates to the manner in which an
elevated level of “mindfulness” is entrenched as a culture in
the organisation.
This elevated level of “mindfulness” is considered to have played a significant part in Eskom’s contribution to the successful hosting of the 2010 FIFA World Cup™ in South Africa.
Acknowledgments
The authors wish to acknowledge Eskom’s leadership team,
in particular: Erica Johnson, Kannan Lakmeeharan, and
Greg Tosen for their enthusiastic envisioning and
sponsorship of the resilience initiative; the RRT leadership
teams and coordinators for their excellent support and
leadership in the implementation process; Gavin Brice,
Juan la Grange, Angie Dubbini, Vanessa Carpal, and the
Deloitte Consulting team for their role in the review and
establishment of the emergency structures and protocols; and Christopher Palm and Grant Purdee (Broadleaf Consulting) for development of the new IRM framework.
8. REFERENCES
[1] A.J. Correia, R.G. Koch, Integrated Power System
Emergency Preparedness - Framework and
Implementation in South Africa, Paper 85, Cigré
Symposium, Recife, Brazil, April 2011.
[2] R.G. Koch, J. Correia, A. Dold, D. Marais, P. van Niekerk, M. Motaung, M. Mncube, P. Johnson (on behalf of the NRS 048 WG), National Code of Practice: Emergency Load Reduction and System Restoration Practices (NRS 048-9:2009), AMEU Convention, Port Elizabeth, South Africa, 28-30 Sept 2009.
[3] M. Van Harte, R.G. Koch, M. Nene, G. Havford, M.
Bala, Integrated Risk Management and System
Adequacy Assessment: Implementation of the ISO
31000:2009 Standard in the South African Power
System, Paper 84, Cigré Symposium, Recife, Brazil,
April 2011.
[4] K. Weick, K. Sutcliffe, “Managing the Unexpected: Resilient Performance in an Age of Uncertainty”, 2nd Edition, San Francisco, John Wiley & Sons, 2007.
[5] M.H. Bazerman, M.D. Watkins, Predictable Surprises: The disasters you should have seen coming and how to prevent them, 2nd Edition, San Francisco, John Wiley & Sons, 2007.
[6] M. Chettiar, K. Lakmeeharan, R.G. Koch, A Review of the January 2008 Electricity Crisis in South Africa: A Problem a Decade in the Making, Cigré Southern Africa Regional Conference, August 2009.
[7] ISO 31000:2009, Risk management — Principles and
guidelines
[8] P. Lagadec, “Preventing Chaos in a Crisis: Strategies
for Prevention, Control, and Damage Limitation”,
London, McGraw-Hill International, 1993, p.54
[9] D.L. Coutu, “How resilience works – confronted with life’s hardships some people snap, and others snap back”, Harvard Business Review, May 2002, p.47.