Enhancing Microgrid Resiliency Against Cyber Vulnerabilities

Measuring and Enhancing Microgrid Resiliency Against Cyber Threats
V. Venkataramanan, Student Member, IEEE, A. Srivastava, Senior Member, IEEE, A. Hahn, Member, IEEE,
and S. Zonouz, Member, IEEE
Abstract—Recent cyber attacks on the power grid have been of increasing complexity and sophistication. In order to understand the impact of cyber-attacks on power system resiliency, it is important to consider a holistic cyber-physical system, especially with increasing industrial automation. In this work, the device level resilience properties of the various controllers and their impact on microgrid resiliency are studied. In addition, a cyber-physical resiliency metric considering vulnerabilities, the system model, and device level properties is proposed. Resiliency is defined as the system's ability to provide energy to critical loads even in extreme contingencies, and it depends on the system's ability to withstand, predict, and recover. A use case inspired by the recent Ukraine cyber-attack is presented to demonstrate the application of the developed cyber-physical resiliency metric to enhance the situational awareness of the operator and enable better proactive or remedial control actions to improve resiliency.
Index Terms—Cyber-attacks, cyber-physical systems, microgrid, cyber-vulnerabilities, CVSS, resiliency.
I. INTRODUCTION
Presidential Policy Directive 21 (PPD-21) states that resilience is “the ability to prepare for and adapt to changing conditions and withstand and recover rapidly from disruptions” [1]. Hence, in the power grid, and especially in military or other critical microgrids, critical loads must be kept operational even in the presence of contingencies. While reliability-based problems have been studied for the power grid [2], resiliency is a relatively new and evolving concept. In this paper, resilience is defined as the ability of the microgrid to supply the critical load even in the case of multiple contingencies.
Advances in industrial automation and control of microgrids have led to an increasing number of digital devices in the power grid. These devices, while offering many advantages, also introduce cyber-vulnerabilities. This is a growing area of concern for the industry, as an increasing number of SCADA devices and industrial control systems (ICS) are being exposed to such vulnerabilities [3], [4]. More advanced threats, such as covert attacks [5] and attacks against EMS/control applications [6], are also recent concerns for the industry. A number of papers discuss the performance of resilient microgrid systems against adverse events such as faults or weather events, but most do not consider the microgrid as a cyber-physical system [7]–[9]. Considering the enhanced integration of information and communication technology (ICT) in smart grid systems and the resulting cyber vulnerabilities [10], it is essential that microgrid operation not be affected by failures in either the physical infrastructure or the ICT infrastructure [11].

V. Venkataramanan, A. Srivastava and A. Hahn are with Washington State University, Pullman, WA 99163 USA (E-mail: asrivast@eecs.wsu.edu).
S. Zonouz is with Rutgers University, New Jersey, 08854 USA (E-mail: saman.zonouz@rutgers.edu).
This material is based upon work supported by the Department of Energy under Award Number DE-OE0000780.
To understand cyber-physical microgrid resiliency, detailed modeling of both the physical system and the cyber components is required. The cyber attacks are modeled to disrupt the operation of the critical load by trying to isolate the load from the generation sources. Before resilience can be enhanced, it must be measured numerically, so that the operator can quickly understand the changing operating conditions of the system. A report from the National Academy of Sciences titled “Enhancing the Resilience of the Nation’s Electricity System” [12] states that “without some numerical basis for assessing resilience, it would be impossible to monitor changes or show that community resilience has improved”. To bridge this gap, and building on our previous work [13], this paper proposes a metric to determine the effect of cyber-vulnerabilities on the microgrid while also considering the device level properties in the microgrid. The proposed metric identifies the effect of exploiting a particular vulnerability and helps the operator stay aware of the system status by providing an operational score based on the status of the vulnerability. Recent cyber incidents such as BlackEnergy, CrashOverride, and Triton illustrate the growing threat of cyber vulnerabilities. It is critical to define metrics that measure system performance and guide operators and planners in making key decisions to improve it. Many cyber security metrics have been developed for Information Technology (IT) systems, such as the Common Vulnerability Scoring System (CVSS) [14]. CVSS relies on expert opinion to grade vulnerabilities on several factors, enabling comparison among vulnerabilities.
While CVSS is useful for studying and analyzing vulnerabilities, it does not account for the context of critical infrastructure systems such as the power grid. Recent efforts have focused on organizational practices to study resiliency, such as the CERT Resilience Management Model [15] or MITRE’s Cyber Resilience Engineering Framework (CREF) [16]. These frameworks use qualitative ratings (such as low, medium, high) to define operational levels, which may not be sufficient in some cases. Other metrics developed from system theory, such as the Infrastructure Resilience Analysis Methodology (IRAM) and the metrics defined by Jacobs et al. [17] and by Rieger [18], use control-theoretic definitions of resilience to evaluate performance degradation and the time and cost of recovery. While there are metrics that compute physical and cyber resiliency individually, metrics that rigorously study cyber-physical resiliency in an integrated manner are lacking. Also, many of the metrics proposed in the literature are aggregations or tuples of existing metrics, which require a domain specialist to interpret. Our contribution aims to help the operator intuitively understand the status and importance of the various vulnerabilities present in the system and take proactive control actions based on this information to ensure resiliency. The contributions of this work are as follows:
1) Analyzing the effects of cyber-vulnerabilities on device-
level resiliency,
2) Developing a device resilience framework based on
verification of controller code,
3) Quantifying the impact on microgrid resiliency using
a developed cyber-physical resiliency metric (CPRM)
based on Common Vulnerability Scoring System
(CVSS),
4) Demonstrating the performance of our device level resilience tool,
5) Demonstrating the change in proposed metric CPRM,
using a case inspired by the Ukraine cyber-attack. The
metric will enable control actions based on better situa-
tional awareness.
II. CYBER-PHYSICAL RESILIENCY METRIC (CPRM)
The cyber-physical resiliency metric (CPRM) is calculated from two main components: device level resilience, measured by the Trusted Safety Verifier (TSV), and the CPRM for microgrid system resiliency.
A. TSV - Device Level Resiliency
In modern control system networks, a security flaw in almost any component can be leveraged to upload malicious code to a PLC. A clear example of this was the Stuxnet worm, which used many potential vectors, including the program development environment, to propagate to a PLC-connected computer [19]. TSV’s aim is to reduce the amount of control system infrastructure needed to guarantee safe behavior of PLC-controlled processes to a single embedded computer and the PLC itself.
We must assume that the interface for uploading safety properties to TSV is secured. For example, a simple file format could be read from a USB key directly by TSV. While the use of an air gap may seem to negate the advantage of a network-connected PLC, safety properties require modification far less frequently than PLC code. Compared to the large number of requirements in existing industrial security regulations, this is a small additional overhead. We note that TSV is not secure against a privileged insider with physical access to the plant floor.
TSV is interposed between the controllers (i.e., microgrid controllers and other control devices) and the control network (i.e., the supervisory control and data acquisition (SCADA) network). Any controller-bound code must be formally verified against a set of safety properties. The safety properties are stated in linear temporal logic (LTL), which allows for the description of temporal properties such as causal relationships and guarantees of eventual progress.
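To illustrate the kind of temporal property involved, the sketch below evaluates two common LTL patterns over a finite trace of controller states: the invariant G p ("p always holds") and the causal response G(p → X q) ("whenever p holds, q holds at the next step"). This is a minimal sketch under stated assumptions: the state names (`fault`, `breaker_open`, `gen_closed`, `gen_synced`) and both properties are hypothetical, and checking a single finite trace is only a simplified stand-in for model checking LTL over all executions.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, bool]
Prop = Callable[[State], bool]


def always(p: Prop, trace: List[State]) -> Optional[int]:
    """Check G p on a finite trace; return the first violating step, or None."""
    for step, state in enumerate(trace):
        if not p(state):
            return step
    return None


def always_next(trigger: Prop, response: Prop, trace: List[State]) -> Optional[int]:
    """Check G(trigger -> X response): whenever trigger holds at step i,
    response must hold at step i + 1 (the last step has no successor to check)."""
    for step in range(len(trace) - 1):
        if trigger(trace[step]) and not response(trace[step + 1]):
            return step
    return None


# Hypothetical safety properties for a microgrid controller.
never_close_unsynced: Prop = lambda s: not (s["gen_closed"] and not s["gen_synced"])
fault: Prop = lambda s: s["fault"]
trip: Prop = lambda s: s["breaker_open"]

trace = [
    {"fault": False, "breaker_open": False, "gen_closed": False, "gen_synced": False},
    {"fault": True, "breaker_open": False, "gen_closed": False, "gen_synced": True},
    {"fault": False, "breaker_open": True, "gen_closed": True, "gen_synced": True},
]

print(always(never_close_unsynced, trace))  # None: the invariant holds
print(always_next(fault, trip, trace))      # None: the fault is followed by a trip
```

Returning the first violating step, rather than just a Boolean, loosely mirrors how a model checker reports a counterexample.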
TSV will reject any controller code that violates one or more of its safety properties. This checking is done in several offline steps. First, the controller code is disassembled into an Instruction List (IL) program. IL is an accumulator-based assembly language that is widely used as the low-level language in controller development environments. Most IL instructions cause CPU side effects, such as implicit zero flag register settings, which make direct IL analysis difficult. Thus, TSV uses the IL Intermediate Language (ILIL) to represent IL programs. ILIL is a register transfer language based on the Vine intermediate language. An ILIL program models the low-level register and memory operations resulting from execution of its corresponding IL program.
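The sketch below illustrates why a register transfer form is easier to analyze than raw IL: each accumulator instruction is rewritten as explicit assignments, including the implicit zero-flag update. It assumes a hypothetical four-instruction subset (`LD`, `AND`, `OR`, `ST`) and a hypothetical flag behavior in which every load or logic operation sets `ZF`; the output notation is illustrative only and is not ILIL's actual syntax.

```python
from typing import List

# Explicit register-transfer template for each supported IL opcode.
TEMPLATES = {
    "LD": "ACC := {operand}",
    "AND": "ACC := ACC & {operand}",
    "OR": "ACC := ACC | {operand}",
    "ST": "{operand} := ACC",
}
# Opcodes assumed (for this sketch only) to implicitly update the zero flag.
SETS_ZERO_FLAG = {"LD", "AND", "OR"}


def lift(il_program: List[str]) -> List[str]:
    """Translate accumulator-based IL lines into explicit register transfers."""
    rtl: List[str] = []
    for line in il_program:
        opcode, operand = line.split()
        if opcode not in TEMPLATES:
            raise ValueError(f"unsupported IL opcode: {opcode}")
        rtl.append(TEMPLATES[opcode].format(operand=operand))
        if opcode in SETS_ZERO_FLAG:
            # Make the hidden CPU side effect an explicit assignment.
            rtl.append("ZF := (ACC == 0)")
    return rtl


for stmt in lift(["LD %I0.0", "AND %I0.1", "ST %Q0.0"]):
    print(stmt)
```

Once every side effect is an explicit assignment, a symbolic executor can track each register and flag without special-casing individual opcodes.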
Cyber-physical system controllers typically follow a synchronous execution paradigm, running as a sequence of fixed-time sense-process-actuate epochs called scan cycles. TSV performs a mixed concrete and symbolic execution of an ILIL program to produce a symbolic scan cycle. A symbolic scan cycle represents every possible execution of a single scan cycle of the controller program. It is a mapping from path constraints over controller input variables (sensor measurements) to symbolic values for controller output variables (actuation commands). For a single entry in the mapping, if the input variables satisfy the path constraint, then the output variables will have the corresponding symbolic value.
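A symbolic scan cycle can be pictured as a list of (path constraint, output expressions) pairs. The sketch below is a minimal illustration, assuming a hypothetical controller with Boolean inputs `fault` and `close_req` and one output `breaker_cmd`; constraints and output values are written as Python functions of the inputs rather than as solver terms. Because a deterministic controller takes exactly one path for each input, the sketch also checks that the path constraints partition the (finite) input space.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Inputs = Dict[str, bool]
Entry = Tuple[Callable[[Inputs], bool], Dict[str, Callable[[Inputs], bool]]]

# Hypothetical scan cycle: trip on a fault, otherwise follow the close request.
ssc: List[Entry] = [
    (lambda i: i["fault"], {"breaker_cmd": lambda i: False}),
    (lambda i: not i["fault"], {"breaker_cmd": lambda i: i["close_req"]}),
]


def outputs_for(scan: List[Entry], inputs: Inputs) -> Dict[str, bool]:
    """Return the outputs of the unique entry whose path constraint holds."""
    matches = [outs for guard, outs in scan if guard(inputs)]
    if len(matches) != 1:
        raise ValueError("path constraints must select exactly one entry")
    return {name: expr(inputs) for name, expr in matches[0].items()}


def is_partition(scan: List[Entry], names: List[str]) -> bool:
    """Check that exactly one path constraint holds for every Boolean input vector."""
    for values in product([False, True], repeat=len(names)):
        inputs = dict(zip(names, values))
        if sum(guard(inputs) for guard, _ in scan) != 1:
            return False
    return True


print(outputs_for(ssc, {"fault": False, "close_req": True}))  # {'breaker_cmd': True}
print(is_partition(ssc, ["fault", "close_req"]))              # True
```

Note that the second entry's output is an expression over an input (`close_req`), which is what distinguishes a symbolic value from a fixed constant.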
To model subsequent scan cycles of the controller’s execution, TSV combines successive symbolic scan cycles into the Temporal Execution Graph (TEG). The TEG is a tree that represents a nondeterministic execution of a controller program for some fixed number of scan cycles. For example, a TEG of depth three represents all possible executions of the program for three successive scan cycles. The TEG is later checked against the safety requirements to determine whether the controller code could ever produce unsafe outputs. Finally, TSV produces a “confidence” score that indicates whether the device performs as expected. This score is then scaled to lie between 0 and 1 for calculating the CPRM. For larger systems (such as bigger microgrids or electric distribution systems), the TEG might be larger; our approach to this problem is detailed at the end of this section. This is not necessarily the case for every controller, since some controllers perform simple actions that restrict the size of the TEG. Larger systems do, however, require more individual devices to be analyzed. As this step is performed offline and only needs to be updated occasionally, it should not affect the real-time operation of the microgrid.
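To see why TEG size matters, consider a worst-case bound under a simplifying assumption that is not part of TSV itself: if every state has b satisfiable successors and no states are merged, a TEG of depth d is a complete b-ary tree with 1 + b + ... + b^d = (b^(d+1) - 1)/(b - 1) states (for b > 1). The hypothetical helper below evaluates this bound.

```python
def teg_state_bound(branching: int, depth: int) -> int:
    """Worst-case number of states in a TEG tree without state merging:
    the sum of branching**k for k = 0..depth."""
    if branching < 1 or depth < 0:
        raise ValueError("branching must be >= 1 and depth >= 0")
    return sum(branching ** k for k in range(depth + 1))


# Growth of the bound for a few branching factors and depths.
for b in (2, 4, 8):
    print(b, [teg_state_bound(b, d) for d in (1, 3, 6)])
```

Even modest branching makes the bound grow geometrically with depth, which is the motivation for merging equivalent states and for bounded generation.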
The following text describes in detail the procedure for TEG generation (Algorithm 1). The main inputs to the algorithm are i) the symbolic scan cycle set ssc, i.e., symbolic execution outputs that are mappings from path predicates to symbolic PLC variable values; ii) the safety specification ϕ of the underlying physical system; and iii) the termination deadline γ for the TEG generation algorithm. We parse the given LTL safety formula to get the corresponding atomic propositions¹ (Line 1). The TEG generation algorithm starts by initializing the TEG state space with an initial state σ in which all of the PLC variables/predicates are reset to zero/true (Lines 2-7), which happens when the PLC loads the controller code for the first time.

Input : The symbolic scan cycle ssc
Input : The LTL safety specification ϕ
Input : The TEG generation deadline γ
Output: The generated temporal execution graph TEG
1  A ← get_atomic_propositions(ϕ)
2  σ ← create_initial_state()
3  if initial_GenTEG_call then
4      σ_predicate ← initialize_predicate(True)
5      σ_var_values ← initialize_PLC_variables(False)
6      σ_prop_values ← concretize_atomic_propositions(σ_var_values, A)
7      s ← {σ}
8      e ← ∅
9  end
10 foreach path predicate π ∈ ssc do
11     symbolic_values ← ssc[π]
12     foreach α ∈ 2^A do
13         τ ← σ_predicate ∧ predicate(α, A) ∧ π
14         if ¬satisfiable(τ) then
15             continue
16         end
17         σ′ ← create_state()
18         [σ′_predicate, σ′_prop_values] ← [τ, α]
19         σ′_var_values ← update(σ_var_values, symbolic_values)
20         σ″ ← find_equivalent_state(Ω, σ′)
21         if σ″ ≠ NULL then
22             delete(σ′)
23         end
24         else
25             σ″ ← σ′
26         end
27         s ← s ∪ {σ″}
28         e ← e ∪ {(σ, σ″)}
29         if γ < elapsed_time then
30             return
31         end
32         GenTEG(ssc, ϕ, σ″)
33     end
34 end
Algorithm 1: GenTEG
Regarding the TEG state notion, each state includes three types of information: i) σ_predicate denotes the logical predicate, resulting from symbolic execution of branch/jump instructions, that has been accumulated in the current state through the state transition sequence starting at the initial state σ; ii) σ_var_values indicates the symbolic variable values that have been initiated in the current state; and iii) σ_prop_values represents the concrete Boolean value vector for the atomic propositions in the current state. For the initial state², given the reset concrete variable/predicate values, the concrete values for the LTL atomic propositions A are calculated and stored in σ_prop_values (Line 6); for other states, which store symbolic values, we take a different approach to assigning concrete atomic proposition values, as discussed below. The TEG state space s and the set of transitions e are also initialized to the initial state σ and the empty set, respectively, during the initial function call (Lines 7-8).

¹Note that “←” denotes an assignment.
Following the algorithm, a transition is then added for each (path predicate, symbolic output values) mapping in the symbolic scan cycle ssc (Line 10) that is satisfiable given the variable values in the initial state σ. The algorithm goes through a nested loop (Line 12) to assign concrete Boolean values to each atomic proposition on every generated TEG state. We produce the conjunctive predicate τ using i) the accumulated state predicate; ii) the path predicate π from ssc; and iii) the concrete atomic proposition vector α (Line 13). The satisfiability check is performed in Line 14; if τ is satisfiable, we create the corresponding state σ′ (Lines 17-19) and transition, and update the TEG (Lines 27-28).
The update function (Line 19) creates the symbolic variable values for the new state σ′. It takes the symbolic variable values in the source state, σ_var_values, which capture the PLC’s current memory state, as well as the symbolic values from the corresponding program control path in ssc (Line 11). The update function then performs an intermediate variable elimination step to remove intermediate variables from the ssc symbolic values, and stores the result in the new state’s symbolic variable values σ′_var_values.
There is one case in which the state is not added even when the path predicate is satisfied. If the TEG already contains a state with PLC variables equivalent to the destination state (Line 20), then a transition is added back to the existing state, and the new destination is discarded (Line 22). Two states are considered equivalent if their PLC variables have equal symbolic values. This step avoids unnecessary growth of the state space, and hence improves formal verification efficiency. Notably, to decrease the false negative rate of the state equivalence checking function, we check for equality after simplifying the symbolic values. For instance, we mark X1 ← I0_k + 2 + 3·I0_k and X2 ← 4·I0_k + 2 as equal after simplifying those expressions’ abstract syntax trees.
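The simplification-based equality check can be sketched for linear expressions like the X1 and X2 example. This is a minimal sketch assuming that symbolic values are sums of integer-coefficient terms such as `3*I0_k`; the string format and the `normalize` and `equivalent` helpers are hypothetical, and a real implementation would simplify solver expression trees rather than strings.

```python
from collections import defaultdict
from typing import Dict


def normalize(expr: str) -> Dict[str, int]:
    """Collapse a linear sum such as 'I0_k + 2 + 3*I0_k' into {variable: coefficient};
    the constant term is stored under the empty-string key."""
    coeffs: Dict[str, int] = defaultdict(int)
    for term in expr.replace(" ", "").replace("-", "+-").split("+"):
        if not term:
            continue  # empty piece left by a leading minus sign
        if "*" in term:
            coef, var = term.split("*")
            coeffs[var] += int(coef)
        elif term.lstrip("-").isdigit():
            coeffs[""] += int(term)
        else:
            coeffs[term.lstrip("-")] += -1 if term.startswith("-") else 1
    # Drop zero coefficients so that, e.g., 'x - x' normalizes to {}.
    return {name: c for name, c in coeffs.items() if c != 0}


def equivalent(a: str, b: str) -> bool:
    """Two symbolic values are treated as equal if their simplified forms match."""
    return normalize(a) == normalize(b)


print(equivalent("I0_k + 2 + 3*I0_k", "4*I0_k + 2"))  # True
print(equivalent("I0_k + 2", "2*I0_k + 2"))          # False
```

Comparing canonical forms instead of raw expressions is what reduces false negatives: syntactically different but algebraically equal values are merged into one state.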
Finally, we call the TEG generation function GenTEG recursively to explore the next possible states, starting from the recently explored state σ″. The recursive graph generation procedure returns under two conditions. First, the procedure returns if all of the states have been created and the graph is completely generated. This is the ideal return condition, as the complete graph yields accurate model checking results, including a counterexample when a property is violated. Second, the procedure returns if the explored depth, i.e., the number of PLC input-output scans, exceeds a predefined bound value (Line 29). This results in a partially generated temporal execution graph that is later used for formal model checking. Bounded graph generation is a suitable solution when the program is large and complete graph generation is too costly.

²It is assumed that the function GenTEG takes a Boolean argument initial_GenTEG_call that denotes whether this is the first call in the recursion chain. For presentation simplicity, this variable is not listed explicitly in the algorithm’s input list.
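For readers who want to experiment, the sketch below mirrors the structure of Algorithm 1 in an explicit-state form. It is not TSV's implementation: it swaps symbolic states and an SMT satisfiability check for concrete PLC variable tuples and brute-force enumeration over a small Boolean input domain, it merges states whose variable values are equal (the analogue of Line 20), and, unlike Line 32, it only recurses into newly created states so that it terminates without relying on the deadline. The function name `gen_teg`, its parameters, and the breaker example are hypothetical.

```python
import time
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

Vars = Tuple[bool, ...]
Inp = Tuple[bool, ...]
# One scan-cycle entry: (path predicate, variable update), over concrete values.
Path = Tuple[Callable[[Vars, Inp], bool], Callable[[Vars, Inp], Vars]]


def gen_teg(ssc: List[Path], props: Dict[str, Callable[[Vars], bool]],
            n_inputs: int, init_vars: Vars, deadline_s: float = 1.0,
            max_depth: int = 10) -> Tuple[Set[Vars], Set[Tuple[Vars, Vars]]]:
    names = sorted(props)                                 # atomic propositions A (Line 1)
    inputs = list(product((False, True), repeat=n_inputs))
    start = time.monotonic()
    states: Set[Vars] = {init_vars}                      # s <- {sigma} (Line 7)
    edges: Set[Tuple[Vars, Vars]] = set()                # e <- empty set (Line 8)

    def label(v: Vars) -> Tuple[bool, ...]:
        return tuple(props[n](v) for n in names)

    def explore(sigma: Vars, depth: int) -> None:
        if depth >= max_depth:
            return                                        # bounded generation
        for path_pred, update in ssc:                     # Line 10
            for alpha in product((False, True), repeat=len(names)):  # Line 12
                # tau = pi AND (successor labelled alpha); witnesses found by enumeration.
                succs = {update(sigma, x) for x in inputs
                         if path_pred(sigma, x) and label(update(sigma, x)) == alpha}
                for succ in succs:                        # each satisfiable tau
                    is_new = succ not in states           # equivalence check (Line 20)
                    states.add(succ)                      # Line 27
                    edges.add((sigma, succ))              # Line 28
                    if time.monotonic() - start > deadline_s:
                        return                            # deadline gamma (Line 29)
                    if is_new:
                        explore(succ, depth + 1)          # Line 32, new states only

    explore(init_vars, 0)
    return states, edges


# Hypothetical breaker: vars = (closed,), inputs = (close_cmd, fault).
breaker_ssc: List[Path] = [
    (lambda v, x: x[1], lambda v, x: (False,)),                # fault: open
    (lambda v, x: not x[1] and x[0], lambda v, x: (True,)),    # close command
    (lambda v, x: not x[1] and not x[0], lambda v, x: v),      # hold state
]
states, edges = gen_teg(breaker_ssc, {"closed": lambda v: v[0]}, 2, (False,))
print(sorted(states), len(edges))
```

Because equal variable tuples are merged, the resulting graph for this toy breaker has two states and four transitions instead of an exponentially growing tree.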
To summarize, we strive for minimality of the model state space through three approaches: (i) symbolic execution lumps as many concrete input values (and hence, scan cycles) together as possible; (ii) in the refinement step, a truth value for a proposition is only added if the transition from the previous state is feasible; and (iii) as a measure of last resort, we perform bounded model generation when the TEG’s diameter becomes too large.
B. CPRM - Contextualizing System Level Resiliency
It is important to consider a cyber-physical model of the
microgrid to model the dependencies between the cyber and
physical system formally. This becomes important in order
to simulate the system, and also to understand the potential
impacts of cyber-attacks. A cyber-physical model of the mi-
crogrid is developed here along the lines of the work in [20].
The power system can be represented as a directed graph G_p with a set of nodes N and a set of links L. The power system topology, based only on electrical components and connectivity, can be represented by a connectivity matrix and then by a directed graph. While the power system graph can be determined from the topology, the communication system graph is harder to determine, because such details are not usually available to researchers. However, some basic information is available from which the cyber-physical dependencies can be better understood.
In our power system model, each node is considered to be a connection point, meaning there exists a switch which can be operated. Fig. 1 shows these various nodes. The switches are connected to an actuator and one or more sensors. Also, based on their location, these nodes are often grouped into a single substation containing a substation computer that can control a specific subset of switches. Substations may be interconnected or isolated, but they are all connected to a central control center, which is the application layer that provides tertiary control. Although the communication topology may take different configurations, such as point-to-point or ring, the overall architecture lends itself to a hierarchical topology based on the control structure. The communication topology can also be derived from the firewall rules implemented in the various systems on the network. The communication model is essential for understanding the attack paths present in the system and the relation between the cyber and physical nodes. In our case, we associate each physical node with a cyber node in M, and the links L with communication links C, to create a cyber graph G_c such that there exists a bijection as given in Eqn. 1:

$$ f : N \to M, \qquad (x, y) \in L \iff (f(x), f(y)) \in C \qquad (1) $$
The isomorphic nature of the graphs gives us the flexibility to define more detailed communication models by aggregating a larger number of nodes into a single power system node N_i. Isomorphism allows us to include detailed communication models based on both OT (operational technology) and IT (information technology) networks, while still retaining the same overall structure as the power system model G_p. Section III-B describes the communication model derived from G_p based on these principles.
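A small sketch of Eqn. 1: given the power-system links L (derived here from a connectivity matrix) and a node mapping f from power nodes to cyber nodes, it checks that f is a bijection and that a link exists in L exactly when the mapped link exists in C. The node names (`PCC`, `BRK1`, `GW0`, `RELAY1`, ...) and the matrix are hypothetical and are not taken from Fig. 1.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def links_from_matrix(nodes: List[str], conn: List[List[int]]) -> Set[Edge]:
    """Directed links (x, y) for every nonzero entry conn[i][j] of a connectivity matrix."""
    n = len(nodes)
    return {(nodes[i], nodes[j]) for i in range(n) for j in range(n) if conn[i][j]}


def is_isomorphism(f: Dict[str, str], power_nodes: Iterable[str],
                   cyber_nodes: Iterable[str], links: Set[Edge],
                   comm: Set[Edge]) -> bool:
    """Check Eqn. 1: f is a bijection N -> M and (x, y) in L <=> (f(x), f(y)) in C."""
    if set(f) != set(power_nodes):
        return False                      # f must be defined on every power node
    images = list(f.values())
    if len(set(images)) != len(images) or set(images) != set(cyber_nodes):
        return False                      # f is not one-to-one and onto M
    return {(f[x], f[y]) for x, y in links} == comm


# Hypothetical three-node example (not the Fig. 1 system).
N = ["PCC", "BRK1", "BRK2"]
L = links_from_matrix(N, [[0, 1, 1], [0, 0, 0], [0, 0, 0]])
f = {"PCC": "GW0", "BRK1": "RELAY1", "BRK2": "RELAY2"}
M = ["GW0", "RELAY1", "RELAY2"]
C = {("GW0", "RELAY1"), ("GW0", "RELAY2")}
print(is_isomorphism(f, N, M, L, C))
```

Since f is checked to be injective, equality of the mapped link set with C gives both directions of the equivalence in Eqn. 1.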
The detailed communication graph is based on the set of hosts H, the firewall and IDS rules R, and the communication links C between them. This allows us to create an attack graph for the communication graph G_c, from which the status of the cyber system and the influence of a particular cyber or physical node can be derived.
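One simple way to derive attack paths from G_c is to enumerate the simple paths from an attacker entry point to a target host, keeping only links that the firewall rules permit. The sketch below does this with a depth-first search; the host names, the allow-list representation of R, and the entry point are all hypothetical, and a full attack graph would additionally model per-host vulnerabilities and privileges.

```python
from typing import Dict, List, Set, Tuple


def attack_paths(links: Dict[str, List[str]], allowed: Set[Tuple[str, str]],
                 entry: str, target: str) -> List[List[str]]:
    """All simple paths entry -> target that use only firewall-permitted links."""
    paths: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        if node == target:
            paths.append(path)
            return
        for nxt in links.get(node, []):
            # Skip hosts already on the path (simple paths only) and blocked links.
            if nxt not in path and (node, nxt) in allowed:
                dfs(nxt, path + [nxt])

    dfs(entry, [entry])
    return paths


# Hypothetical hierarchy: control center -> substation computers -> relays.
links = {"CC": ["SS1", "SS2"], "SS1": ["BRK3", "SS2"], "SS2": ["BRK5", "SS1"]}
allowed = {("CC", "SS1"), ("CC", "SS2"), ("SS1", "BRK3"),
           ("SS2", "SS1"), ("SS2", "BRK5")}
print(attack_paths(links, allowed, "CC", "BRK3"))
```

Tightening a firewall rule shows up directly as fewer (or no) paths to a critical breaker.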
The Cyber Physical Resilience Metric (CPRM) proposed in this paper uses the Common Vulnerability Scoring System (CVSS) as a basis for understanding the complexity of a cyber attack. CVSS is a vulnerability evaluation and scoring system developed by FIRST (the Forum of Incident Response and Security Teams) [14], [21]. It considers known vulnerabilities in devices (known as Common Vulnerabilities and Exposures, or CVEs) and assesses the impact of each vulnerability by looking at parameters such as the complexity of the attack vector and the confidentiality, integrity, and availability impacts. The CPRM metric has three components. Initially, a base score is calculated from the properties of the vulnerability, capturing its exposure and impact factors. An environmental score modifier is then applied based on the location of the device in the system, which yields revised exposure and impact scores. This modified environmental score can be derived by analyzing the attack graphs generated from G_c. CVSS computes the Impact and Exploitability sub-scores for a specific vulnerability without considering the previous steps the attacker might have taken. We propose an added vuln score that modifies the Impact and Exploitability scores based on the device’s location in the system. The concept is not dissimilar to the CVSS environmental score, but it gives the user more control over the weights assigned to the device, and takes into consideration any previous exploits performed by the attacker. Based on the work in [22], we consider vuln_i to be a tuple (CR_i, IR_i, AR_i, R_i, AU_i), where CR_i, IR_i, and AR_i are the confidentiality, integrity, and availability requirements for that particular node. R_i represents the firewall rules for the host and its alerts, and AU_i represents the authentication status, i.e., the privilege of the attacker on that particular host. Each device on the network has a vulnerability modifier vuln_i, which is used to modify the CVSS formulation as follows:
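$$
\begin{aligned}
\mathrm{ModConfidentialityImpact} &= \mathrm{ConfidentialityImpact} \times CR_i\\
\mathrm{ModIntegrityImpact} &= \mathrm{IntegrityImpact} \times IR_i\\
\mathrm{ModAvailabilityImpact} &= \mathrm{AvailabilityImpact} \times AR_i\\
\mathrm{ModAccessComplexity} &= \mathrm{AccessComplexity} \times R_i\\
\mathrm{ModAuthentication} &= \mathrm{Authentication} \times AU_i
\end{aligned}
\qquad (2)
$$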
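Eqn. 2 is a per-node, element-wise scaling of the CVSS base metrics. The sketch below applies it to the BRK 3 modifiers listed in Table I (CR_i = 6, IR_i = 8, AR_i = 8, AU_i = 2, R_i = 8). The base metric values are hypothetical CVSS v2-style weights chosen only for illustration, and the modifiers are applied exactly as written in Eqn. 2, without any further normalization; the class and function names are also hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseMetrics:
    """CVSS-style base metric weights for one vulnerability (values hypothetical)."""
    confidentiality_impact: float
    integrity_impact: float
    availability_impact: float
    access_complexity: float
    authentication: float


@dataclass(frozen=True)
class Vuln:
    """Per-node modifier tuple vuln_i = (CR_i, IR_i, AR_i, R_i, AU_i)."""
    cr: float
    ir: float
    ar: float
    r: float
    au: float


def apply_modifiers(base: BaseMetrics, v: Vuln) -> BaseMetrics:
    """Eqn. 2: scale each base metric by the corresponding node modifier."""
    return BaseMetrics(
        confidentiality_impact=base.confidentiality_impact * v.cr,
        integrity_impact=base.integrity_impact * v.ir,
        availability_impact=base.availability_impact * v.ar,
        access_complexity=base.access_complexity * v.r,
        authentication=base.authentication * v.au,
    )


brk3 = Vuln(cr=6, ir=8, ar=8, r=8, au=2)          # BRK 3 row of Table I
base = BaseMetrics(0.275, 0.275, 0.275, 0.71, 0.704)
print(apply_modifiers(base, brk3))
```

Keeping the base metrics and the node modifiers in separate records reflects the split described in the text: the base scores belong to the vulnerability, while vuln_i belongs to the node's position in the system.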
In the tuple vuln_i, the modified confidentiality, integrity, and availability impact scores are derived from the position of the hosts in the network topology. The score is dictated by the mapping between the power and communication models. In our communication model, each breaker is associated with a physical switch. Each of these breakers controls individual devices or components, such as loads or generating sources. Some breakers might also be sectionalizing breakers, isolating a part of the system from the rest; such a breaker is assumed to control a larger area containing different components. For example, a microgrid point of common coupling (PCC) switch would isolate the microgrid, with all its various components, from the main grid. However, a breaker connecting a building or another load to the grid would control just a single component.
The load/component priority can be determined from the utility, or a suitable priority can be assigned by the system operator. Once the priority is known, the same priority can also be assigned to the corresponding communication node, because the two systems are isomorphic. Usually, a communication node would be considered to host other processes or systems, such as operating system processes.
Each communication node is assumed to have a certain security mechanism, which might be a firewall or, in the case of a controller, a real-time model verifier such as TSV. These firewall rules or additional security mechanisms are described in a set of rules R. The change in R_i is captured in the modified AccessComplexity, which represents how complex the attack vector needs to be. The ModAccessComplexity score is larger for more easily exploitable vulnerabilities.
Each communication host also has a privilege level based on the authentication needed to perform a malicious operation or to gain further access in the network. This is captured in the authentication variable AU_i, which reflects the sophistication needed to obtain elevated privileges. For example, if a device has two-factor authentication rather than a single password, this variable will result in higher resiliency.
For our analysis, the confidentiality, integrity, and availability (CIA) impact scores do not change with attack progression or increasing privileges of the attacker. This means that once the modified impact scores are determined using the load priorities, they remain constant unless the system configuration changes. However, R_i and AU_i change periodically as the attacker compromises the security mechanisms and gains elevated privileges. Hence these variables are cumulative and represent the risk presented to the system at any time. In accordance with the CVSS specification, each element of the tuple vuln_i is restricted to the range 0-10.
The confidentiality requirement CR_i is high only for hosts that aggregate measurements and can potentially inform the attacker about the status of the system. For individual breakers it is not as high, except for the critical load, as information about it can also allow the attacker to study the switching pattern. The availability requirement AR_i specifies how important it is to have access to a device when required. This is again higher for the substation switches, as they need to be available to ensure communication to the breakers. The control center does not score high here, as the protection system will operate without instruction from the control center, and the backup generation will also operate without awaiting instructions. The integrity requirement IR_i specifies the trust or veracity of the information. In this case, the control center has the highest requirement, as it can control the entire system. The requirement is assigned hierarchically, and at the lowest level the load priority is used to specify it. In the absence of a real network configuration, typical values are assumed for the authentication AU_i and security privileges R_i variables. We assume that firewalls are present between substations, and also between the control center and other nodes. Only the critical load and the control center are assumed to have multi-factor authentication mechanisms. For AU_i and R_i, smaller values lead to a more resilient system.
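Based on CVSS, the modified environmental score from Eqn. 2, and the TSV score incorporating temporal elements, the CPRM score is determined as follows:

$$
\begin{aligned}
\mathrm{ModImpact} &= 10.41 \times \bigl(1 - (1 - \mathrm{ModConfidentialityImpact})(1 - \mathrm{ModIntegrityImpact})(1 - \mathrm{ModAvailabilityImpact})\bigr)\\
\mathrm{ModExploitability} &= 20 \times \mathrm{ModAccessComplexity} \times \mathrm{AccessVector} \times \mathrm{ModAuthentication}
\end{aligned}
\qquad (3)
$$

$$
\mathrm{CPRM} = 10 - \bigl(0.6\,\mathrm{ModImpact} + 0.4\,\mathrm{ModExploitability} - 1.5\bigr) \times \mathrm{TSVScore} \qquad (4)
$$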
where the TSV score ranges between 0 and 1 depending on the temporal score. CPRM is computed for individual devices. To determine the resiliency of the overall system, individual CPRM scores can be combined by assigning a weight to each component in the system and then aggregating them. The numerical constants in Eqn. 3 and Eqn. 4 are taken directly from the CVSS formulation, where they scale the score to the range [0, 10]. The constants can of course be changed; however, that will change how the individual parameters are weighted, and will also change the scale. Various techniques exist to weight and combine these factors into a single system level metric, such as [23], [24].
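Eqns. 3 and 4 can be evaluated directly once the modified metrics and the TSV score are known. The sketch below implements them, reading Eqn. 4 as CPRM = 10 - (0.6 ModImpact + 0.4 ModExploitability - 1.5) x TSVScore, and adds a priority-weighted mean as one simple, hypothetical way of combining device scores into a system-level value (the text leaves the combination technique open, citing [23], [24]). As a sanity check, with all modifiers equal to 1 and CVSS v2 weights for a network-accessible, low-complexity, unauthenticated vulnerability with complete CIA impact, ModImpact and ModExploitability reduce to the CVSS v2 Impact and Exploitability sub-scores. All function names are hypothetical.

```python
from typing import Sequence


def mod_impact(c: float, i: float, a: float) -> float:
    """Eqn. 3 (first line): CVSS-style impact from modified C, I, A impacts."""
    return 10.41 * (1 - (1 - c) * (1 - i) * (1 - a))


def mod_exploitability(access_complexity: float, access_vector: float,
                       authentication: float) -> float:
    """Eqn. 3 (second line): CVSS-style exploitability sub-score."""
    return 20 * access_complexity * access_vector * authentication


def cprm(impact: float, exploitability: float, tsv_score: float) -> float:
    """Eqn. 4: device-level CPRM; tsv_score is expected in [0, 1]."""
    if not 0.0 <= tsv_score <= 1.0:
        raise ValueError("TSV score must lie in [0, 1]")
    return 10 - (0.6 * impact + 0.4 * exploitability - 1.5) * tsv_score


def system_cprm(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical aggregation: priority-weighted mean of device CPRM scores."""
    if len(scores) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one weight per score and a positive weight sum")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# CVSS v2 weights: C/I/A complete = 0.66, AV network = 1.0, AC low = 0.71, Au none = 0.704.
imp = mod_impact(0.66, 0.66, 0.66)
expl = mod_exploitability(0.71, 1.0, 0.704)
print(round(imp, 2), round(expl, 2), round(cprm(imp, expl, 1.0), 2))
```

A weighted mean is only one choice; giving critical-load devices larger weights makes the system-level value track the devices that matter most for the resiliency definition used here.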
In terms of implementation, CPRM needs to be computed only centrally, not at all nodes. TSV is very light in terms of computational burden and does not require any additional components to be added. Our trusted computing base components, including the servers where CPRM is implemented, ensure that the CPRM system itself is well protected. In addition, CPRM does not directly perform any control actions, so the attacker cannot use this interface to cause physical impact. Such infrastructure is also cost-efficient, as it does not go beyond what current security solutions (e.g., IDSes) already require, i.e., a trusted server for their security analysis.

Fig. 1. Microgrid Model
III. TEST CASE DESCRIPTION
The microgrid model used in this work is shown in Fig. 1. We consider a cyber-physical layered model of a microgrid; details about this modeling approach can be found in [13].
A. Power System Modeling
The microgrid used in this work is based on the model of an Army microgrid [25]. The Army microgrid was chosen for the following reasons:
1) Microgrid architecture, which allows redundant paths for
reconfiguration,
2) Presence of critical loads and priority loads,
3) Strong emphasis on cybersecurity in a critical infrastructure
This work primarily uses a design based on the Fort Carson microgrid. This microgrid is already established, and some of its details have been made public. A conceptual overview of the microgrid’s architecture is also available in the public domain [25], and it has been used to design the microgrid for this paper.
The microgrid components, as described in [25], are:
1) 1.1 MW of critical load and 1 MW of priority load,
2) 3.25 MVA of existing diesel generation,
3) a 1 MW solar array and 5 electric vehicles.
As shown in Fig. 1, the critical load is at the far left, and it has an auxiliary generator connected to it. The PV panel is directly above it, and the priority load is next to it but is not connected to its own auxiliary generator. This part of the microgrid is followed by a sectionalizing switch, which can be used to further isolate the sensitive loads from the rest of the microgrid. The non-critical load is at the far right and is fed by the grid. In normal operation, the main grid is connected to the microgrid, and all the loads are supplied by the main grid and the PV array. Generator 2, a gas turbine based unit, is normally not connected and has first priority to be connected to the system during any contingency. Generator 1 is an expensive diesel unit that only supplies the critical load during emergencies following extreme contingencies.

TABLE I
CIA IMPACTS AND SECURITY STATUS
Node    CR_i  IR_i  AR_i  AU_i  R_i
BRK 1   4     5     6     4     8
BRK 2   4     5     6     4     8
BRK 3   6     8     8     2     8
BRK 4   4     5     6     4     8
BRK 5   4     4     4     7     8
BRK 6   4     4     4     7     8
BRK 7   4     4     4     7     8
BRK 8   2     2     2     7     8
BRK 9   2     2     2     7     8
SS 1    8     8     6     5     3
SS 2    6     6     4     5     3
SS 3    4     4     2     5     3
CC      8     10    6     2     5

Fig. 2. Microgrid Communication Model
B. Communication Network Modeling
The communication network for this microgrid is modeled in CORE, a communication system emulation platform [26]. The CORE model is shown in Fig. 2. The nodes at the bottom of the figure represent the relays that control the breakers in the simulated system and the measuring devices that send status information back to the control center. Each relay is connected to a substation gateway, and the substation also has a substation computer. The substation computer acts as an aggregator for the measurements coming into the substation and sends them to the control center. It also behaves as a switch, forwarding control commands from the control center to the appropriate relay. For a small test system such as this one, routing is unnecessary and simple switching suffices; for larger systems, however, the substation gateway can also run routing protocols. CORE is an emulation platform that provides the user with a Linux kernel at all nodes. This enables the user to create proxy attacks at various locations and to generate and monitor traffic to observe the effect of cyber-attacks. When the simulation starts, data from the power system simulator are routed through this network model to the control center. The control center either runs a control algorithm, such as reconfiguration, or has an operator who responds to contingencies based on the system information obtained. Table I provides the values used to compute CPRM for our test system.
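The aggregation and switching role of the substation computer can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the CORE configuration used in this work: the `Measurement` and `SubstationComputer` types, the relay identifiers, and the measurement fields are hypothetical, and a static breaker-to-relay map stands in for the simple switching described above.

```python
from dataclasses import dataclass, field


@dataclass
class Measurement:
    # Hypothetical status message sent by a relay to the substation.
    relay_id: str
    breaker_closed: bool
    current_a: float


@dataclass
class SubstationComputer:
    # Static breaker -> relay map: simple switching, no routing protocol.
    breaker_to_relay: dict
    latest: dict = field(default_factory=dict)

    def receive(self, m: Measurement) -> None:
        # Aggregation: keep only the most recent measurement per relay.
        self.latest[m.relay_id] = m

    def report(self) -> dict:
        # Aggregated breaker status forwarded to the control center.
        return {rid: m.breaker_closed for rid, m in self.latest.items()}

    def forward(self, breaker: str, command: str) -> tuple:
        # Deliver a control-center command to the relay owning `breaker`.
        if command not in ("OPEN", "CLOSE"):
            raise ValueError(f"unsupported command: {command}")
        return self.breaker_to_relay[breaker], command
```

For example, with `SubstationComputer({"BRK 7": "relay7", "BRK 8": "relay8"})`, each call to `receive` overwrites that relay's previous reading, `report()` returns one status entry per relay, and `forward("BRK 7", "OPEN")` returns `("relay7", "OPEN")`. A larger system would replace the dictionary lookup with a routing decision.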
IV. SIMULATION RESULTS AND DISCUSSION
Resiliency was defined earlier as the ability of the microgrid to supply critical loads. The CPRM metric allows the operator to monitor the status of the system in terms of both physical power system performance and cyber vulnerabilities in devices, and to take proactive or remedial control actions that keep critical loads supplied even in the presence of multiple and extreme contingencies. As the results below demonstrate, CPRM can enable resilience in a scenario similar to the Ukraine cyber-attack.
A. Device Defense Performance
We measured the run times of individual solution components while checking the safety properties for each test case. Figure 3 shows the results for a sample use case (the traffic light control system) for up to 14 steps during bounded model generation. This allows exploration of control systems with up to 14 consecutive unique state outputs, significantly more than Stuxnet's malicious code, which used a state machine with three unique outputs to manipulate centrifuge speed [19]³. One could imagine an attack that evades detection by counting to 15 before violating a safety property. In this case, any control logic capable of producing a non-repeating chain of more than 14 unique outputs could also be rejected. This bound could be set higher if legitimate plant functionality requires it. The results are shown for running our solution on a desktop computer with a 3.4 GHz processor and on a Raspberry Pi with a 700 MHz processor.
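The rejection rule above can be illustrated with a small explicit search. This is a simplified sketch, assuming the control logic is modeled as a deterministic machine `step(state, input) -> (next_state, output)` and that chains are counted from the initial state only; it is not the symbolic bounded model generation used in this work, and `exceeds_unique_output_bound` is a hypothetical name. It reports whether any input sequence yields more than `bound` consecutive, pairwise distinct outputs.

```python
from typing import Callable, Hashable, Iterable, Tuple

# Control logic modeled as step(state, input) -> (next_state, output).
Step = Callable[[Hashable, Hashable], Tuple[Hashable, Hashable]]


def exceeds_unique_output_bound(step: Step, initial: Hashable,
                                inputs: Iterable[Hashable], bound: int) -> bool:
    """True if some input sequence, starting from `initial`, produces more
    than `bound` consecutive outputs that are all different."""
    inputs = list(inputs)

    def dfs(state, seen: frozenset) -> bool:
        if len(seen) > bound:
            return True  # a non-repeating chain longer than the bound exists
        for x in inputs:
            nxt, out = step(state, x)
            if out in seen:
                continue  # this branch repeats an output, so the chain ends
            if dfs(nxt, seen | {out}):
                return True
        return False

    # Depth-first search over input sequences; the cost can grow as
    # len(inputs) ** (bound + 1), so this is only practical for small cases.
    return dfs(initial, frozenset())


if __name__ == "__main__":
    def traffic_light(state, _tick):
        # Three-phase cycle: output the current phase, then advance.
        nxt = {"GREEN": "YELLOW", "YELLOW": "RED", "RED": "GREEN"}[state]
        return nxt, state

    def count_to_15(state, _tick):
        # Logic that emits 0, 1, ..., 15 before repeating.
        return (state + 1) % 16, state

    print(exceeds_unique_output_bound(traffic_light, "GREEN", ["tick"], 14))  # False
    print(exceeds_unique_output_bound(count_to_15, 0, ["tick"], 14))          # True
```

The three-output traffic light stays within a bound of 14, while logic that counts to 15 produces 16 distinct outputs and would be rejected, which mirrors the evasion example above.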
The initial processing of the symbolic scan cycle is shown in Figure 3(a). For all cases, this step requires less than 22 ms. Once the engine creates the initial PLC program models, it starts building the temporal execution graph (TEG), which is the main source of overhead. Figure 3(b) shows how long the solution needs to complete the graph generation phase. Most of the time in this phase is spent on recursive exploration of the TEG to set concrete values for atomic propositions. A complete graph generation for 14 input-output scans takes 2 minutes on the desktop computer and 17 minutes on the Raspberry Pi. As expected, however, trimming the analysis horizon to 10 reduces the graph generation time significantly, to less than 10 seconds on the desktop computer and 1 minute on the Raspberry Pi. Figure 3(c) shows the corresponding state space sizes for the generated graphs. The reported numbers (only 4K states for a full 14-step horizon analysis) demonstrate the effectiveness of symbolic execution at reducing the state space size.
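The effect of the analysis horizon on graph size can be sketched with a plain breadth-first exploration. Note that this swaps in explicit-state enumeration for the symbolic execution used here, so it does not reproduce the state-space reduction reported above; `build_bounded_graph` and the counter example are hypothetical. Each node records the first scan cycle at which a state is reached, revisited states are merged, and states at the horizon are not expanded.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple

# Control logic modeled as step(state, input) -> (next_state, output).
Step = Callable[[Hashable, Hashable], Tuple[Hashable, Hashable]]


def build_bounded_graph(step: Step, initial: Hashable,
                        inputs: Iterable[Hashable], horizon: int):
    """Breadth-first exploration of reachable states up to `horizon` scans.

    Returns (nodes, edges): `nodes` maps each reached state to the first
    scan cycle at which it appears, and `edges` holds
    (state, input, output, next_state) tuples.
    """
    inputs = list(inputs)
    nodes: Dict[Hashable, int] = {initial: 0}
    edges: Set[tuple] = set()
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        depth = nodes[state]
        if depth == horizon:
            continue  # stop expanding at the analysis horizon
        for x in inputs:
            nxt, out = step(state, x)
            edges.add((state, x, out, nxt))
            if nxt not in nodes:  # merge revisited states into one node
                nodes[nxt] = depth + 1
                frontier.append(nxt)
    return nodes, edges
```

For a hypothetical 16-state counter `step = lambda s, _: ((s + 1) % 16, s)`, a horizon of 10 yields 11 nodes and 10 edges, while a horizon of 20 reaches all 16 states; shrinking the horizon prunes states reachable only after more scans.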
³ We are currently working with several parties to obtain a disassembled copy of Stuxnet's PLC code.
Fig. 3. Performance Analysis of the Traffic Light Control System: (a) Initial Model Creation; (b) Temporal Execution Graph Generation; (c) Temporal Execution Graph Cardinality.
B. Attack Scenarios
The Ukraine cyber attack exploited a Windows vulnerability described in CVE-2014-4114. This CVE allows "remote attackers to execute arbitrary code via a crafted OLE object in an Office document". It is not a zero-day vulnerability, meaning that the operator should already know that this vulnerability exists in the system. A simplified timeline of the Ukraine attack is given below; the steps are also shown in Fig. 4.
1) Phishing emails containing the BlackEnergy malware are sent to substation operators.
2) The operator opens the Office document, which enables the attacker to exploit CVE-2014-4114 and install malware on the IT network computer.
3) The attacker gains access to credentials for the VPN into the OT network.
4) The attacker gains unauthorized elevated privileges on the substation HMI computer.
5) The attack is coordinated across multiple substations to maximize the impact.
6) The attacker opens the switches in the system and disables remote access by the operator.

Fig. 4. Attack Steps
Following these steps, an attacker can isolate the critical loads in the system from the sources and prevent the operator from taking mitigating control actions, unless the attack is detected immediately after the vulnerability is exploited. For our test system, we consider that the attacker gains unauthorized access to the breaker connecting the main grid to the microgrid (BRK 8), the sectionalizing breaker (BRK 7), and the breaker connecting the PV unit to the microgrid (BRK 4). Because the gas unit is usually not connected to the microgrid, it is assumed that the attacker does not try to gain access to its breaker.
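Why opening these three breakers cuts off the critical load can be checked with a simple reachability test over the closed breakers. The topology below is a hypothetical reduction of Fig. 1: only the roles of BRK 4, BRK 7 and BRK 8 come from the text, while the bus names and the `BRK_GEN1` breaker for the diesel auxiliary generator are assumptions. The check considers connectivity only and ignores generator capacity and load balance.

```python
from collections import deque
from typing import Dict, Iterable, List

# Hypothetical reduction of Fig. 1. Only the roles of BRK 4, BRK 7 and
# BRK 8 are taken from the text; bus names and BRK_GEN1 are assumptions.
BREAKERS = {
    "BRK 8": ("main_grid", "grid_bus"),           # main grid <-> microgrid
    "BRK 7": ("grid_bus", "critical_bus"),        # sectionalizing breaker
    "BRK 4": ("pv", "critical_bus"),              # PV array
    "BRK_GEN1": ("diesel_gen1", "critical_bus"),  # auxiliary diesel (assumed)
}
SOURCES = {"main_grid", "pv", "diesel_gen1"}


def is_supplied(bus: str, closed: Iterable[str]) -> bool:
    """True if `bus` reaches any source through closed breakers only.

    Connectivity check only: capacity and load balance are ignored.
    """
    adj: Dict[str, List[str]] = {}
    for name in closed:
        a, b = BREAKERS[name]
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    seen, queue = {bus}, deque([bus])
    while queue:  # breadth-first search from the load bus
        node = queue.popleft()
        if node in SOURCES:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


if __name__ == "__main__":
    normal = {"BRK 8", "BRK 7", "BRK 4"}  # generators disconnected
    attacked = normal - {"BRK 8", "BRK 7", "BRK 4"}
    print(is_supplied("critical_bus", normal))                   # True
    print(is_supplied("critical_bus", attacked))                 # False
    print(is_supplied("critical_bus", attacked | {"BRK_GEN1"}))  # True
```

With all three breakers opened and the diesel breaker still open, the critical bus has no path to a source; closing the assumed `BRK_GEN1` restores supply, which corresponds to manually starting the auxiliary generator.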
C. Defense mechanism without CPRM
In this case, the substation operator does not have access to continuous monitoring tools that can calculate metrics and provide awareness. Because the attacker accesses the OT network using valid credentials, the operator is not aware of the attack. The operator becomes aware only after the attacker has gained unauthorized elevated privileges and the circuit breakers open, causing a physical impact on the system. At that point, the operator is forced to take drastic control actions, such as performing a hard reset of the system to restore operation. The only remaining way to supply the critical load is to turn on the auxiliary generator manually, so the critical load loses power until the generator is turned on.
According to the CVE definition, the Impact score for this CVE is 10 and the Exploitability score is 8.6. When the BlackEnergy malware is first activated, it affects the confidentiality requirement, because the attacker now obtains credentials to the HMI. The exploit has a partial effect, which results in a ModImpact score of 6.4 from Eqn. 3. Since the attacker does not need any authentication for this exploit, the ModExploitability score is 8.6. Since the attacker has already gained access to the substation computer, the TSV score is 0.5. This leads to a CPRM score of 7.1 from Eqn. 4. The CPRM score can be calculated similarly for all the other steps.

Fig. 5. Changing CPRM for Different Scenarios of the Ukraine Attack
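The CVSS subscores quoted above can be reproduced with the CVSS v2 base equations. This sketch assumes the CVSS v2 vector AV:N/AC:M/Au:N/C:C/I:C/A:C for CVE-2014-4114 (it yields the Impact of 10 and Exploitability of 8.6 stated above) and assumes that the ModImpact of 6.4 corresponds to setting all three impacts to Partial. Eqns. 3 and 4, and therefore the TSV weighting and the final CPRM of 7.1, are not reproduced here.

```python
# CVSS v2 base-metric weights.
AV_NETWORK = 1.0   # Access Vector: Network
AC_MEDIUM = 0.61   # Access Complexity: Medium
AU_NONE = 0.704    # Authentication: None
CIA_WEIGHT = {"none": 0.0, "partial": 0.275, "complete": 0.660}


def cvss2_impact(conf: str, integ: str, avail: str) -> float:
    # Impact = 10.41 * (1 - (1 - C) * (1 - I) * (1 - A))
    c, i, a = (CIA_WEIGHT[v] for v in (conf, integ, avail))
    return 10.41 * (1 - (1 - c) * (1 - i) * (1 - a))


def cvss2_exploitability(av: float, ac: float, au: float) -> float:
    # Exploitability = 20 * AV * AC * Au
    return 20 * av * ac * au


if __name__ == "__main__":
    print(round(cvss2_impact("complete", "complete", "complete"), 1))      # 10.0
    print(round(cvss2_impact("partial", "partial", "partial"), 1))         # 6.4
    print(round(cvss2_exploitability(AV_NETWORK, AC_MEDIUM, AU_NONE), 1))  # 8.6
```

The exploitability term depends only on access vector, complexity, and authentication, which is why it stays at 8.6 when the impact is reduced from complete to partial.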
The progression of the attack in terms of CPRM is shown in Fig. 5. Without awareness of the microgrid resiliency, the control action is taken too late, and the microgrid resiliency drops significantly, as shown by the solid line in Fig. 6. Although the operator does not have access to this metric in this case, it is shown here for comparison with the CPRM-based defense mechanisms. Fig. 6 is a representative illustration of the progress of the attack over time.
D. Defense mechanism with CPRM
In this case, the substation operator has access to CPRM in real time and is aware of the physical and cyber system status. During the cyber attack, the CPRM score is computed in real time, and the operator can see the impact at each step.
Two defense mechanisms are possible: a cyber control action or a physical control action. If the IDS/firewall detects the BlackEnergy malware, the operator can take a proactive control action to isolate the malicious host, thus restoring the resiliency to its original score. This scenario is shown by the dashed line in Fig. 6, where resiliency improves once the operator isolates the infected host. If the malware is not detected in time (e.g., a zero-day exploit), the operator is notified once there is unauthorized access to the HMI. The operator then sees the increasing CPRM score and can reconfigure the microgrid by switching on the gas unit to mitigate the effect of disconnection from the main grid, and proactively open the sectionalizing switch to isolate the critical load from any disturbance caused by that disconnection. The effect of this proactive physical control action on resiliency is shown by the dotted line in Fig. 6. The figure demonstrates the awareness and increased visibility that the metric provides, enabling the operator to take better-informed control actions.
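The difference between the defense cases can be summarized as an event-driven decision sketch. The event names and action strings are hypothetical labels for the steps in this section, and the rule simply encodes the text: without CPRM the operator reacts only to the physical impact, whereas with CPRM the operator acts on malware detection (cyber action) or on unauthorized HMI access (physical action). The function returns the step at which the first action is taken.

```python
from enum import Enum, auto
from typing import List, Optional, Sequence, Tuple


class Event(Enum):
    # Hypothetical observable events along the attack timeline.
    PHISHING_OPENED = auto()
    MALWARE_DETECTED = auto()         # IDS/firewall flags BlackEnergy
    VPN_LOGIN = auto()                # valid credentials, looks legitimate
    UNAUTHORIZED_HMI_ACCESS = auto()  # elevated privileges on the HMI
    BREAKERS_OPENED = auto()          # physical impact on the microgrid


def respond(timeline: Sequence[Event],
            with_cprm: bool = True) -> Tuple[Optional[int], List[str]]:
    """Return (step number, actions) for the operator's first response."""
    for step, event in enumerate(timeline, start=1):
        if with_cprm and event is Event.MALWARE_DETECTED:
            # Cyber control action: cut the attack chain at the IT network.
            return step, ["isolate infected host"]
        if with_cprm and event is Event.UNAUTHORIZED_HMI_ACCESS:
            # Physical control action before the breakers are opened.
            return step, ["start gas unit (Generator 2)",
                          "open sectionalizing switch (BRK 7)"]
        if event is Event.BREAKERS_OPENED:
            # Without earlier warning, only a late manual response remains.
            return step, ["start auxiliary generator manually"]
    return None, []
```

For a zero-day timeline (phishing, VPN login, HMI access, breakers opened), the operator without CPRM acts at step 4, after the physical impact, while the operator with CPRM acts at step 3, before the breakers open.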
In the case of an insider attacker, the system would detect the attack only if the attacker elevates their privileges to perform it. Otherwise, the CPRM reflects a change in resiliency only when there is a physical impact on the power system. Resiliency relates to the ability to withstand, predict, and recover. Security mechanisms may help the system withstand external attacks but not internal attacks; resiliency concepts help the system recover in both cases.

Fig. 6. Enhancing Microgrid Resiliency Using CPRM
V. CONCLUSIONS
Cyber-physical attacks are a threat to the microgrid, and resilient systems need to be designed to prevent loss of critical loads. Cyber-physical resiliency metrics considering physical, cyber, and device-level factors need to be formulated to help the operator make accurate proactive or remedial control decisions in a short time. The Cyber-Physical Resilience Metric (CPRM) and TSV proposed in this work use publicly available vulnerability information and provide an intuitive score that the operator can use to monitor the system state and take the control actions needed for system performance. A case study similar to the cyber-attacks in Ukraine is considered to show the usefulness of the proposed metric in enabling resiliency. The developed metric has been tested and validated in real time using a comprehensive cyber-physical testbed.
ACKNOWLEDGMENT
This material is based upon work supported by the Depart-
ment of Energy under Award Number DE-OE0000780.
REFERENCES
[1] PRESIDENTIAL POLICY DIRECTIVE/PPD-21, “Critical Infrastruc-
ture Security and Resilience,” 2013, [Accessed March-2018]. [Online].
Available: https://goo.gl/j9y4ef
[2] J. Stamp, A. McIntyre, and B. Ricardson, “Reliability impacts from
cyber attack on electric power systems,” in Power Systems Conference
and Exposition, 2009. PSCE ’09. IEEE/PES, March 2009.
[3] M. Cheminod, L. Durante, and A. Valenzano, “Review of security issues
in industrial networks,” Trans. on Industrial Informatics, IEEE, 2013.
[4] A. A. Creery and E. J. Byres, “Industrial cybersecurity for a power
system and scada networks - be secure,” IEEE Industry Applications
Magazine, 2007.
[5] A. O. de S, L. F. R. d. C. Carmo, and R. C. S. Machado, “Covert attacks in cyber-physical control systems,” Trans. on Industrial Informatics, IEEE, 2017.
[6] W. Zeng, Y. Zhang, and M. Y. Chow, “Resilient distributed energy management subject to unexpected misbehaving generation units,” Trans. on Industrial Informatics, IEEE, 2017.
[7] C. Chen, J. Wang, F. Qiu, and D. Zhao, “Resilient distribution system
by microgrids formation after natural disasters,” Smart Grid, IEEE
Transactions on, March 2016.
[8] S. Manshadi and M. Khodayar, “Resilient operation of multiple energy
carrier microgrids,” Smart Grid, IEEE Transactions on, Sept 2015.
[9] S. Chanda, “Measuring and Enabling Resiliency in Distribution Systems
with Multiple Microgrids,” Master’s thesis, Washington State University,
Pullman, WA, 2015.
[10] D. Jin, D. M. Nicol, and G. Yan, “An event buffer flooding attack
in dnp3 controlled scada systems,” in Simulation Conference (WSC),
Proceedings of the 2011 Winter, Dec 2011.
[11] D. He, S. Chan, Y. Zhang, C. Wu, and B. Wang, “How effective are
the prevailing attack-defense models for cybersecurity anyway?” IEEE
Intelligent Systems, Sept 2014.
[12] National Academies of Sciences, Engineering, and Medicine, Enhancing
the Resilience of the Nation’s Electricity System, 2017. [Online].
Available: https://www.nap.edu/catalog/24836/enhancing-the-resilience-of-the-nations-electricity-system
[13] V. Venkataramanan, A. Srivastava, A. Hahn, and S. Zonouz, “Enhancing microgrid resiliency against cyber vulnerabilities,” in 2018 IEEE
Industry Applications Society Annual Meeting (IAS). IEEE, 2018, pp.
1–8.
[14] M. Schiffman, “Common vulnerability scoring system (CVSS),” URL http://www.first.org/cvss/cvss-guide.html, 2011.
[15] R. A. Caralli, J. H. Allen, and D. W. White, CERT resilience management model: A maturity model for managing operational resilience.
Addison-Wesley Professional, 2010.
[16] D. Bodeau et al., “Cyber resiliency engineering framework (CREF),” https://www.mitre.org/publications/technical-papers/cyber-resiliency-engineering-framework, 2012.
[17] N. Jacobs, S. Hossain-McKenzie, and E. Vugrin, “Measurement and analysis of cyber resilience for control systems: An illustrative example,” in 2018 Resilience Week (RWS). IEEE, 2018, pp. 38–46.
[18] C. G. Rieger, “Resilient control systems practical metrics basis for
defining mission impact,” in 2014 7th International Symposium on
Resilient Control Systems (ISRCS). IEEE, 2014, pp. 1–10.
[19] T. M. Chen, “Stuxnet, the real start of cyber warfare? [editor’s note],”
IEEE Network, vol. 24, no. 6, pp. 2–3, 2010.
[20] Q. Zhang, C. Zhou, Y. C. Tian, N. Xiong, Y. Qin, and B. Hu, “A
fuzzy probability bayesian network approach for dynamic cybersecurity
risk assessment in industrial control systems,” Trans. on Industrial
Informatics, IEEE, 2017.
[21] P. Mell, K. Scarfone, and S. Romanosky, “Common vulnerability scoring
system,” Security & Privacy, IEEE, 2006.
[22] L. Gallon and J. J. Bascou, “Using CVSS in attack graphs,” in Sixth
International Conference on Availability, Reliability and Security, 2011.
[23] A. Pandit and J. C. Crittenden, “Index of network resilience for urban water distribution systems,” International Journal of Critical Infrastructures, 2016.
[24] S. Chanda, A. K. Srivastava, M. Mohanpurkar, and R. Hovsapian,
“Quantifying power distribution system resiliency using code based
metric,” IEEE Transactions on Industry Applications, 2018.
[25] Naval Facilities Engineering Command, “SPIDERS Phase 2 Fort Carson Technology Transition Public Report,” Oct. 2014.
[26] J. Ahrenholz, “Comparison of CORE network emulation platforms,” in Military Communications Conference (MILCOM 2010), Oct. 2010.