Assuring long-term operational resilience in a pandemic:
Lessons learned from COVID-19
Dr. Stefan Hofbauer
team Technology Management GmbH, Senior Consultant
IT-Security, Vienna
stefan.hofbauer@te-am.net
Univ.-Prof. Dipl.-Ing. Dr. Dr. Gerald Quirchmayr
University of Vienna, Research Group Multimedia
Information Systems, Vienna
gerald.quirchmayr@univie.ac.at
ABSTRACT
The COVID-19 pandemic has shown that some companies were
prepared for a pandemic in terms of crisis management, while
others were not prepared at all. A company’s dependency on
third-party providers is even greater in a pandemic situation.
Operational resilience must be assured for third-party providers
that support the company in delivering critical business processes.
In a pandemic, the risk is much higher that a third-party provider
faces economic or employee-related issues, for example financial
problems or loss of staff, so that the provider can no longer support
the company at the same level as before the pandemic, or cannot
support it at all. To assure operational resilience within a company,
it is necessary to first identify the critical IT assets and critical
processes within the company. Only then is it possible to protect
these IT assets and assure the business continuity of the critical
business processes. The results described in this paper are based on
practical experience gained during the COVID-19 crisis.
KEYWORDS
COVID-19, operational resilience, risk management, controls, BCM,
KRI
ACM Reference Format:
Dr. Stefan Hofbauer and Univ.-Prof. Dipl.-Ing. Dr. Dr. Gerald Quirchmayr.
2021. Assuring long-term operational resilience in a pandemic: Lessons
learned from COVID-19. In The 12th International Conference on Advances
in Information Technology (IAIT2021), June 29–July 01, 2021, Bangkok,
Thailand. ACM, New York, NY, USA, 9 pages.
https://doi.org/10.1145/3468784.3470466
1 INTRODUCTION
There are different kinds of Business Continuity Management
(BCM) approaches [1], such as the project management approach,
the risk approach, or the managerial approach. Each company must
decide for itself which approach is the most appropriate.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
IAIT2021, June 29–July 01, 2021, Bangkok, Thailand
©2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-9012-5/21/06...$15.00
https://doi.org/10.1145/3468784.3470466
For slow-onset disasters, as can be seen in Figure 1 below, a
proactive response is needed to avoid falling into a crisis. If the early
warnings are not taken seriously, the result is a crisis that is
difficult to master. Slow-onset disasters are the second problem in
disaster risk resilience besides the long-term effects of a disaster. It
is typical for slow-onset disasters that the disaster triggers spread
over half a year with no one taking early warnings seriously. The
warnings, especially those coming out of China, should not have
been put aside, and well-prepared mitigation efforts could have
prevented some of the worst effects. However, there were a few
fortunate companies in Europe with branches in Asia that correctly
analyzed the developing danger and prepared adequate business
continuity measures. So, while COVID-19 was a surprise for most
Asian countries, European and American companies would, at least
in theory, have had sufficient time to plan properly for the
developing situation.
Figure 1: Slow onset disaster management continuum [2]
The nuclear disaster of Fukushima [3–5] and the temperature rise
due to climate change [6] are examples of the long-term effects of
a disaster. In both cases, those in charge should have known that
there would be long-term effects on the environment as well as on
humans. Figure 2 below shows the long-term effects of climate change
due to temperature rise and is a practical example of the urgent need
to focus on getting prepared for disaster management. If no
environmental measures are taken, the temperature will rise
constantly and even faster.
Figure 2: Temperature Rise of 1.5°C could happen by 2024 in “B2C” [6]
Looking at the COVID-19 pandemic, only a small number of
companies were prepared for it. Now, the long-term effects of a
disaster must be considered by applying the lessons learned from
the COVID-19 pandemic. It should be expected that the current
crisis will be followed by other pandemics. Therefore, companies
are required to learn from this pandemic and improve their BCM
and long-term resilience [7] capabilities.
The remainder of this paper is organized as follows: chapter 2
discusses the unmet challenge of long-term resilience, chapter 3
presents the main contribution with our approach driven by
operational resilience requirements, chapter 4 applies our approach
to handling Information Security challenges during the COVID-19
pandemic, chapter 5 presents lessons learned from applying this
approach during the COVID-19 pandemic, and chapter 6 finally gives
the conclusions.
2 THE UNMET CHALLENGE OF LONG-TERM
RESILIENCE
The pandemic should also be a chance to learn from the crisis and to
build long-term resilience for a company. The pandemic has shown
that critical infrastructures and providers are essential, and even
more so during a crisis. The industry must become more resilient
so that a crisis cannot affect it. Much can be learned from the
COVID-19 pandemic about becoming more resilient in terms of
preparation, response, and recovery.
A criticality assessment of the commercial products, customer
services, and supporting business activities needs to be done at
the beginning. The business continuity and contingency strategies
[8] need to be elaborated and reviewed constantly, together with
senior management and all relevant stakeholders.
Documentation needs to be established that includes all business
processes of the company, together with an assessment of whether
regulatory requirements apply to these processes. The applications
supporting these processes are listed, and it must be verified
whether each process is in scope (meaning critical for operational
resilience) and whether the targeted availability score and tolerance
level for this process (in terms of application and infrastructure
availability, in percent) are met. Based on the criticality of the
business processes, a score is calculated to see to what extent this
control is fulfilled to mitigate the risks. As an example, the risk
score can be between 1 and 5, with 1 being the lowest and best risk
score and 5 being the highest and worst risk score.
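As a minimal sketch of such a scoring step, the following Python function maps the shortfall between the targeted and the measured availability of a process to a risk score from 1 (best) to 5 (worst). The function name, the parameters, and the banding of the shortfall in multiples of the tolerance level are illustrative assumptions; the actual scoring formula is not specified here.

```python
import math


def risk_score(target_pct: float, measured_pct: float, tolerance_pct: float) -> int:
    """Return a risk score from 1 (lowest, best) to 5 (highest, worst).

    Hypothetical banding, not taken from the paper: the score is 1 if the
    targeted availability is met; otherwise one point is added for every
    started tolerance band of shortfall, capped at 5.
    """
    if tolerance_pct <= 0:
        raise ValueError("tolerance_pct must be positive")
    shortfall = target_pct - measured_pct
    if shortfall <= 0:
        return 1  # target met or exceeded
    bands = math.ceil(shortfall / tolerance_pct)
    return min(1 + bands, 5)


# Example with assumed values: target 99.5 %, measured 99.0 %, tolerance 1.0 %
print(risk_score(99.5, 99.0, 1.0))
```

A company would replace the banding with its own approved thresholds; the point of the sketch is only that the score is derived from the target, the measurement, and the tolerance level.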
The goal in disaster management is to move from crisis response
to operational resilience, as can be seen in Figure 3. The different
phases are described there and, depending on the crisis, the starting
point is at a different phase.
Companies have been used to planning for short-term effects, but
not for long-term effects. Many Business Continuity Plans (BCP)
did not mention Work from Home (WFH) as a measure, but limited
themselves to the relocation of staff to a backup site, which is not
helpful in case of a pandemic. Also, many companies did not have
a pandemic plan when the COVID-19 crisis started. To be able
to move from crisis response to operational resilience, different
controls need to be in place. Among these controls are, for example,
the Business Impact Analysis (BIA), Resilience by Design (Test of
Design), Test of Effectiveness, and an Operational Recovery Strategy.
The key controls listed in Table 1 (entity, asset, or business-service
based), aiming at operational resilience, need to be set up, where
entity stands for department, asset stands for IT asset, and
business-service stands for service within the company.
For confidentiality reasons, more details about the measures taken
can unfortunately not be provided at this stage.
In general, a Resilience by Design approach needs to be in place
and the operational resilience lifecycle needs to be covered for
all critical business processes and IT assets. It thus makes sense
for a company to have a central Configuration Management Database
(CMDB), used as an inventory that includes all IT assets with their
related information such as hostname, serial number, support
contract numbers, etc. Many companies do not have a CMDB in place,
and a crisis is certainly not the best moment to implement one,
because there is no time.
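To illustrate what such an inventory could hold, the following Python sketch models a CMDB entry with the attributes named above (hostname, serial number, support contract numbers). The class names, the flag linking an asset to a critical process, and the sample data are hypothetical; a real CMDB product has a far richer data model.

```python
from dataclasses import dataclass, field


@dataclass
class CmdbAsset:
    hostname: str
    serial_number: str
    support_contracts: list[str] = field(default_factory=list)
    # Assumed flag (not from the paper): asset supports a critical process.
    supports_critical_process: bool = False


class Cmdb:
    """Tiny in-memory inventory keyed by hostname (illustrative only)."""

    def __init__(self) -> None:
        self._assets: dict[str, CmdbAsset] = {}

    def register(self, asset: CmdbAsset) -> None:
        self._assets[asset.hostname] = asset

    def critical_assets(self) -> list[CmdbAsset]:
        return [a for a in self._assets.values() if a.supports_critical_process]

    def critical_without_support(self) -> list[CmdbAsset]:
        # Critical assets lacking any support contract are an obvious gap.
        return [a for a in self.critical_assets() if not a.support_contracts]


cmdb = Cmdb()
cmdb.register(CmdbAsset("db01", "SN-1001", ["SC-77"], supports_critical_process=True))
cmdb.register(CmdbAsset("web01", "SN-1002", [], supports_critical_process=True))
cmdb.register(CmdbAsset("test01", "SN-1003"))
print([a.hostname for a in cmdb.critical_without_support()])
```

Queries like the last one are only possible if the inventory exists before the crisis, which is the argument made above.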
In an internal audit conducting a Test of Design and a Test of
Effectiveness [11], these controls [12] are tested at least once a year
to see whether they are sufficient to reach long-term resilience.
In the Test of Design, the design of the controls is tested and verified,
whereas the Test of Effectiveness tests whether the controls in
place are effective.
Due to incomplete risk assessment, the long-term threats are
often underestimated. This is the reason why strategy (phase 3) and
planning (phase 4) are designed for short-term threats, although
the Business Continuity models would offer more (see Figure 4).
Figure 4 also shows the key steps of analyzing the business,
assessing the risks, and rehearsing the plan. The potential of these
methods is only partially utilized today.
Considering the long-term effects of a crisis is the kernel of our
approach. Thus, we can make sure that the core IT systems
supporting critical processes are operationally resilient. Measures for
closing existing gaps are needed to secure the relevant IT systems.
The following measures assure that important IT systems
supporting critical processes are operationally resilient:
•Using only stateless services
•Usage of a transactional based database
•Redundancy of the following infrastructure (in the data center
or between data centers):
  •Load balancer
  •Firewall
  •All application server (Web and Middleware)
  •Database Server (cold standby)
  •Storage (mirrored)
  •Network (including dark fiber connection between data centers)
  •Document storage
•Periodically simulating the outage of one data center
Figure 3: A phase model based on the “comprehensive approach” to disaster management [9]
Table 1: Key controls aiming at operational resilience.
Entity Asset Business service
Local CEO annually reviews the local
operation resilience targets regarding the
resilience-critical business/IT services
Business/IT service owners annually review
relevant BIA and service levels for alignment,
interdependencies, and consistency
Operational Recovery Strategy for
business strategy
Local CEO checks whether the resilience
critical business and IT services meet the
defined threshold
Resilience by Design as a service-based test Operational Recovery Strategy for
business as a service-based test
Resilience by Design [10] as local approach
Ability to monitor as a service-based test
Resilience by Design as a service-based test
Operational Recovery Strategy for IT assets
strategy
Ability to monitor as local approach Operational Recovery Strategy for IT
asset-based test
Ability to monitor as a service-based test
Ability to respond, in case of incidents
process
Ability to respond, in case of incidents by
Test of Effectiveness
Ability to learn, in case of incidents process
Ability to learn, in case of incidents by Test
of Effectiveness
Concurrently maintainable Data Centers
assessment
Other Data Centers assessment
Preventive maintenance criteria
For these measures, evidence is needed that they have been
implemented. A Disaster Recovery test [14] of the data centers serves
as such evidence. If more than one data center is in place, the
distance between the data centers should be at least 200 kilometers
[15]. Otherwise, redundancy is not assured, for example in case of an
environmental crisis. At least once a year, a disaster recovery test
including a disaster recovery test protocol needs to be carried out.
This test takes place in the data center and helps personnel to get
familiar with the IT infrastructure at this location and to monitor
the automatic failover of the network and servers, running
applications, and services.
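The distance requirement can be checked with a few lines of code. The following Python sketch computes the great-circle distance between two sites with the haversine formula and compares it with the 200 kilometer threshold mentioned above. The site coordinates are made-up examples (approximate city centers), and the spherical Earth radius is the usual approximation, so the result is an estimate rather than a surveyed distance.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius, spherical approximation
MIN_DISTANCE_KM = 200.0    # threshold named in the text [15]


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_geo_redundant(site_a, site_b, min_km: float = MIN_DISTANCE_KM) -> bool:
    """True if the two sites are at least min_km apart."""
    return great_circle_km(*site_a, *site_b) >= min_km


# Hypothetical site locations, for illustration only.
vienna = (48.21, 16.37)
munich = (48.14, 11.58)
bratislava = (48.15, 17.11)

print(is_geo_redundant(vienna, munich))      # sites several hundred km apart
print(is_geo_redundant(vienna, bratislava))  # sites well under 200 km apart
```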
Due to the long distance between the two geo-redundant data
centers, asynchronous data mirroring is needed. There are special
requirements regarding infrastructure redundancy concerning the
transmission medium. For the failover process, it makes a difference
whether the servers to be failed over are in cold standby or hot standby.
Figure 4: Key steps in developing BCM according to [13]
Unmet resilience can have a significant negative impact on the
following business areas: customer services, regulatory compliance,
financial management, staff confidence, and reputation of the
company. The longer the business-critical process cannot be executed,
the higher the impact on the individual, as well as on the impact
area. Consider the example of a data breach, which needs to be
reported to the public authorities within 72 hours [16]. If the data
breach is not reported within this amount of time, there will not
only be a regulatory impact but also a financial impact, resulting in
a financial penalty for the company.
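The reporting window is simple date arithmetic, which an incident tracking tool could evaluate automatically. The following Python sketch assumes that the 72 hours are counted from the time the breach was detected; the function names and the sample timestamps are illustrative.

```python
from datetime import datetime, timedelta, timezone

REPORTING_WINDOW = timedelta(hours=72)  # window stated in the text [16]


def reporting_deadline(detected_at: datetime) -> datetime:
    """Latest point in time for the report (assumption: counted from detection)."""
    return detected_at + REPORTING_WINDOW


def is_overdue(detected_at: datetime, now: datetime) -> bool:
    return now > reporting_deadline(detected_at)


# Sample timestamps, for illustration only.
detected = datetime(2021, 3, 1, 9, 0, tzinfo=timezone.utc)
print(reporting_deadline(detected).isoformat())
print(is_overdue(detected, datetime(2021, 3, 4, 9, 1, tzinfo=timezone.utc)))
```

Using timezone-aware timestamps avoids ambiguity when the crisis team and the authority are in different time zones.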
Operational resilience can only be achieved when a well-organized
framework and policies are in place within the company. Once the
pandemic has arrived and operational resilience has not yet been
implemented within the company, it is no longer feasible to reach
operational resilience.
Operational resilience can also be achieved by outsourcing tasks
or part of the service to another location, either a different internal
department or, in case of a pandemic, personnel working from home.
Looking for example at a contact center service, phone calls must be
rerouted, and a different conferencing solution needs to be
implemented so that personnel working from home with a softphone
client can use the company’s contact center number as an outgoing
number. With such a solution, the availability of the contact center
can be maintained and customer satisfaction can be kept at a high
level, even during a crisis. Companies must bear in mind that such a
scenario will require additional hardware resources, documentation,
governance, processes, testing, and time to have the new solution in
place.
3 AN APPROACH DRIVEN BY OPERATIONAL
RESILIENCE REQUIREMENTS
The goal for a company regarding crisis management is to move
from the crisis response phase to an operational resilience phase.
During the crisis response phase, businesses need to be closely
monitored, while operational resilience helps to stabilize the
company’s situation during the crisis.
Figure 5: An approach driven by operational resilience requirements, based on the NIST Cybersecurity Framework Version 1.1 [19]
The requirements [17] regarding operational resilience need to
be defined and documented. Requirements engineers within the
company can help here, as they know how to gather and formulate
requirements. One example is to keep track of the impact on business
as usual and to define for which business services operational
resilience is needed. As a rule, operational resilience is needed for
the business-critical processes [18], because the availability of the
company and of the services it provides needs to be maintained.
A sketch of our approach is shown in Figure 5. The long-term
perspective on problems is highlighted.
Our approach consists of the following 6 steps:
•Step 1 (Identify Problems) is the phase in which a company
needs to identify existing problems. Often, monitoring
solutions are used to watch out for potential problems.
•Step 2 (Long-term perspective on Problems) is the phase
in which we also consider the long-term effects of a
crisis on crisis management.
•Step 3 (Protect against Problems) is the phase in which
measures for covering existing gaps need to be thought about
and documented.
•Step 4 (Detect Problems) is the phase in which administrators
need to detect the identified problems and see whether
additional problems exist that have not yet been identified in
Step 1.
•Step 5 (Respond to Problems) is the phase in which the
measures defined in Step 3 are put into action. Improving the
business processes in place is also done in this step.
•Step 6 (Recover from Problems) is the phase in which a
company can recover from problems because adequate
measures have been put in place and business processes have
been improved.
Regular service management meetings need to be in place, which
cover reporting, service levels, incidents, and performance.
If there is, for example, a degradation of the service, an additional
remark needs to be added to the service level report comments.
The same applies if issues require additional actions regarding
operational resilience to be taken.
From a third-party provider point of view, there are a couple of
topics that need to be considered in terms of operational resilience:
•Expected impact on operational stability/delivery from
third-party providers.
•Business Continuity risk: Likelihood of an outage in terms
of a generic assessment as expert judgment.
•Impact of the outage and its criticality.
•Preventive actions the company (not the third-party
provider) can take to prevent an outage.
•Reactive actions the company (not the third-party provider)
can take when an outage occurs.
•IT-Security Controls risk increase in terms of a generic
assessment as expert judgment, to verify whether remote
working increases the risk of IT security controls not being
effective or not being in place.
•Other risks, for example, downstream risk.
High dependency on third-party providers of technical support
or legal support is a risk. In a pandemic, especially small third-party
providers suffer from being unable to provide the service
or a part of the service. The main asset of our model is setting up
crisis operations and operational resilience to regain control and
to recover.
IT-Security Controls [20] are part of a Detailed Risk Assessment
(dRA), which looks, for example, at the controls defined for IT
assets. During the dRA, different scenarios are defined in a risk
landscape matrix, based on the probability of the event (x-axis) as
well as the impact of the event (y-axis). IT assets need to be secured
based on their Confidentiality, Integrity, and Availability (CIA)
rating. Examples of IT-Security controls are extensive monitoring, the
need for Multifactor Authentication, or the need for implementing
data protection security measures. Multifactor Authentication
should be used whenever possible.
Risk management always looks at the inherent risk as well
as the residual risk. The risk rating (Low, Medium, High, or Critical)
is calculated based on the likelihood and impact of the risk. It makes
good sense that scenarios are based on realistic events and that
they are tested in real exercises and not only in tabletop exercises on
paper. Real exercises also count for a higher BCM maturity level
[21] than tabletop exercises.
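A rating of this kind is commonly derived from a likelihood/impact matrix. The following Python sketch shows one possible calculation on a 4x4 matrix; the 1 to 4 scales and the thresholds on the product of likelihood and impact are assumptions made for illustration, since the concrete matrix is not given here.

```python
def risk_rating(likelihood: int, impact: int) -> str:
    """Map likelihood and impact (each 1..4, assumed scale) to a rating.

    The thresholds on likelihood * impact are hypothetical and would be
    replaced by the matrix approved in the company's risk policy.
    """
    for name, value in (("likelihood", likelihood), ("impact", impact)):
        if not 1 <= value <= 4:
            raise ValueError(f"{name} must be between 1 and 4")
    score = likelihood * impact  # ranges from 1 to 16
    if score >= 12:
        return "Critical"
    if score >= 6:
        return "High"
    if score >= 3:
        return "Medium"
    return "Low"


# Inherent risk (before controls) versus residual risk (after a control
# that is assumed to lower the likelihood from 3 to 1).
inherent = risk_rating(likelihood=3, impact=4)
residual = risk_rating(likelihood=1, impact=4)
print(inherent, residual)
```

Evaluating the same function before and after controls makes the difference between inherent and residual risk explicit.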
Open security issues found in IT assets by penetration tests or
operational measures must be fixed within a defined time frame,
based on the criticality level of the security issue found. No high or
critical security issues are allowed to become overdue. It is the
responsibility of the IT security department to do the governance,
create vulnerability status reports for management, follow up on
open security issues with the specialist department, and follow up
on already overdue open security issues.
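The follow-up described above can be supported by a simple overdue check. The following Python sketch derives a due date from the criticality of each issue and lists open issues that violate the rule that no high or critical issue may become overdue. The remediation periods in days, the class name, and the sample issues are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical remediation time frames in days per criticality level.
REMEDIATION_DAYS = {"low": 180, "medium": 90, "high": 30, "critical": 7}


@dataclass
class SecurityIssue:
    issue_id: str
    criticality: str  # one of the keys of REMEDIATION_DAYS
    found_on: date
    fixed: bool = False

    def due_date(self) -> date:
        return self.found_on + timedelta(days=REMEDIATION_DAYS[self.criticality])


def overdue_issues(issues, today: date):
    """Open issues whose remediation time frame has passed."""
    return [i for i in issues if not i.fixed and today > i.due_date()]


def policy_violations(issues, today: date):
    """Overdue issues rated high or critical, which the policy forbids."""
    return [i for i in overdue_issues(issues, today) if i.criticality in ("high", "critical")]


issues = [
    SecurityIssue("VULN-1", "critical", date(2021, 3, 1)),
    SecurityIssue("VULN-2", "low", date(2021, 3, 1)),
    SecurityIssue("VULN-3", "high", date(2021, 1, 1), fixed=True),
]
print([i.issue_id for i in policy_violations(issues, date(2021, 3, 10))])
```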
Controls have the task of mitigating risks, though sometimes
it is not possible to fully mitigate all risks. A risk can either be
mitigated, reduced, or accepted. It is a matter of priorities, as well
as of the costs involved in reducing or mitigating the risk. The 80/20
rule applies, meaning that with 20% of the measures, 80% of the risk
can be mitigated, but it will take the other 80% of the measures to
also reduce the last 20% of the risk. Companies have a certain risk
budget, and management needs to decide where to spend it.
The risk budget is also used to compensate for incidents and the
damage they cause.
As a first step of the operational resilience process, it needs to be
analyzed whether the critical business process is affected by
regulatory requirements (see KRITIS [22]) and whether these regulatory
requirements are mandatory. The next question is whether the business
services are directly linked to a customer channel and whether there
is reputational risk if the business service cannot be maintained.
Further, it must be checked whether there are contractual obligations
regarding availability requirements or timely process execution.
The last point of step 1 is to verify whether the business owner has
defined specific risks for the business service. Only business-critical
processes that are in the scope of BCM are looked at from an
operational resilience point of view.
Further, the number of Full-Time Employees (FTEs) needed to
process the business service at a minimum level is defined. The
question is also whether the business process can only be carried
out by certain personnel who have the required know-how, or whether
everybody in the department has the knowledge to carry out this
business process. If the business process can only be
carried out by certain personnel, this needs to be documented in
the BCP.
The resilience of processes and infrastructures always needs to
be audited to assess the efficiency of the company’s BCPs as well as
to see how well the company is prepared for disruptive incidents.
In a second step, it needs to be verified whether the business owner
has defined enough FTEs to keep the business process available at
the required minimum level. The last point of step 2 is to define the
applications, meaning the IT assets supporting the business service.
Different business continuity strategies exist, such as split
operations between employees working in the office and employees
working remotely, split operations over 2 or more locations, splitting
the team within the same building, activation of an alternative
location (BCP location or other), and a backup team working remotely,
returning to the office and working remotely.
After the resilience-critical business services have been defined,
recovery strategies for them are written down. The owners (or
delegates) of resilience-critical business services need to define
these recovery strategies. The recovery strategies [23] should first
cover a list of scenarios that can go wrong in operations and can be
solved by the recovery strategy, and secondly cover the Recovery Point
Objective (RPO), Recovery Time Objective (RTO), and Mean Time
to Repair (MTTR) at the service level (possibly already included
in the BIA at the process level) or the service levels as defined in
service catalogs.
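A recovery strategy of this shape can be written down in a structured form so that exercise results can be compared with the objectives. The following Python sketch is one possible representation; the class, the field names, and the sample values are assumptions. It relies on the usual meaning of the terms: the RPO bounds the tolerable data loss and the RTO bounds the tolerable downtime, both expressed as time spans.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryStrategy:
    service: str
    scenarios: list[str]   # what can go wrong and is covered by the strategy
    rpo: timedelta         # maximum tolerable data loss, expressed as time
    rto: timedelta         # maximum tolerable time until the service is restored
    mttr: timedelta        # average repair time observed or expected


def meets_objectives(strategy: RecoveryStrategy, data_loss: timedelta, downtime: timedelta) -> bool:
    """Compare the outcome of a recovery test with the defined objectives."""
    return data_loss <= strategy.rpo and downtime <= strategy.rto


# Hypothetical strategy for an example service.
contact_center = RecoveryStrategy(
    service="Contact center",
    scenarios=["data center outage", "loss of personnel"],
    rpo=timedelta(hours=1),
    rto=timedelta(hours=4),
    mttr=timedelta(hours=2),
)
print(meets_objectives(contact_center, timedelta(minutes=30), timedelta(hours=3)))
```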
Figure 6: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand [29]
On a company level, the operational resilience targets need to
be defined and approved by general management. The business
services in scope need a defined measurement model, meaning
Key Performance Indicators (KPIs) for monitoring and reporting
availability. IT assets will be linked to critical business processes
and checked for their availability. The main goal during a
pandemic is business continuity. Experience has shown that both
shorter time frames (less than 48 hours) and much longer time frames
(2 weeks to 2 months) need to be looked at when considering
business-critical processes and their impact on the business.
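An availability KPI of this kind can be computed directly from recorded outages. The following Python sketch calculates the availability of a service in percent over a reporting period and compares it with a target; the function names, the 30-day period, and the 99.9% target are illustrative assumptions.

```python
def availability_pct(period_minutes: float, outage_minutes: list[float]) -> float:
    """Availability in percent: share of the period without an outage."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    downtime = sum(outage_minutes)
    return 100.0 * (period_minutes - downtime) / period_minutes


def meets_target(period_minutes: float, outage_minutes: list[float], target_pct: float) -> bool:
    return availability_pct(period_minutes, outage_minutes) >= target_pct


# Assumed example: a 30-day month with two outages of 200 and 16 minutes.
MONTH_MINUTES = 30 * 24 * 60
print(availability_pct(MONTH_MINUTES, [200, 16]))
print(meets_target(MONTH_MINUTES, [200, 16], target_pct=99.9))
```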
Companies can achieve operational resilience by ensuring that
information and communication systems are protected by Cybersecurity.
Their IT systems need protection, detection, response, and recovery
programs in place, which are regularly tested.
In summary, our approach covers the following main activities:
Documentation, Assessment, Recovery, and Testing.
4 APPLICATION OF THE APPROACH TO
HANDLING INFORMATION SECURITY
CHALLENGES DURING THE COVID-19
PANDEMIC
Our stepwise approach is based on the National Institute of
Standards and Technology (NIST) Cybersecurity Framework. The NIST
Cybersecurity Framework becomes applicable when it is combined
with the long-term effects of a crisis. Recent experience [24] shows
that companies often consider only the short-term effects of
a crisis in the Identification Phase (step 1 in our approach), but not
the long-term effects (step 2 in our approach).
It is advisable to have additional security monitoring in place,
especially during times of crisis. On the application level, with
Application Security Monitoring (ASM) [25], different actions need to
be monitored, such as audit logs deleted, users locked or deleted,
multiple failed logins, unexpected logins outside of business hours,
etc. These events then need to be implemented within a monitoring
system, like QRadar [26] or Splunk [27]. Besides that, Technical
State Compliance Monitoring (TSCM) [28] for different kinds of
operating systems is also helpful.
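The monitored events listed above can be expressed as simple detection rules. The following Python sketch evaluates such rules on a list of application log events. It is a standalone illustration and does not use the rule syntax or API of QRadar or Splunk; the event names, the business hours, and the failed-login threshold are assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogEvent:
    timestamp: datetime
    user: str
    action: str  # assumed names, e.g. "login_ok", "login_failed", "audit_log_deleted"


ALERT_ACTIONS = {"audit_log_deleted", "user_locked", "user_deleted"}
BUSINESS_HOURS = range(7, 19)   # assumed 07:00 to 18:59; weekends are ignored here
MAX_FAILED_LOGINS = 3           # assumed threshold per user


def find_alerts(events: list[LogEvent]) -> list[str]:
    alerts: list[str] = []
    failed: Counter[str] = Counter()
    for e in events:
        if e.action in ALERT_ACTIONS:
            alerts.append(f"{e.action}: {e.user}")
        elif e.action == "login_failed":
            failed[e.user] += 1
            if failed[e.user] == MAX_FAILED_LOGINS:  # alert once per user
                alerts.append(f"multiple_failed_logins: {e.user}")
        elif e.action == "login_ok" and e.timestamp.hour not in BUSINESS_HOURS:
            alerts.append(f"login_outside_business_hours: {e.user}")
    return alerts


events = [
    LogEvent(datetime(2021, 3, 1, 10, 0), "alice", "login_failed"),
    LogEvent(datetime(2021, 3, 1, 10, 1), "alice", "login_failed"),
    LogEvent(datetime(2021, 3, 1, 10, 2), "alice", "login_failed"),
    LogEvent(datetime(2021, 3, 1, 23, 30), "bob", "login_ok"),
    LogEvent(datetime(2021, 3, 2, 9, 0), "carol", "login_ok"),
    LogEvent(datetime(2021, 3, 2, 9, 5), "carol", "audit_log_deleted"),
]
for alert in find_alerts(events):
    print(alert)
```

In a production setup the same logic would be configured as correlation rules in the monitoring system rather than coded by hand.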
Further, Indicators of Compromise for current ongoing threats need
to be checked, to see whether they can be observed within the
company’s own network, either by a self-managed Security Operation
Center (SOC) or by a SOC as a service.
In case of a crisis, it is a strategic advantage to have BCM plans
and procedures in place, also as offline copies. If there is no
connection to the cloud provider or the internal network, and thus no
access to BCM documentation, it might be necessary to have BCM
documentation available offline.
As can be seen in Figure 6, doing nothing leads to a crisis, and
the same holds true for the IT systems of a company. If enterprises
do not look at their existing problems and identify them in order to
close these gaps with measures, they will sooner or later
face an unmanageable crisis.
Figure 6 shows the suppression strategy according to Imperial
College. It is called the hammer and dance phenomenon.
If no measures are taken, the hammer and dance phenomenon will
reappear again and again over time.
There is a striking parallel with Advanced Persistent Threats
(APTs [30]). Attackers also attack companies in several waves
at different times. First, they try to exploit a vulnerability of
an IT system to get access to a company; then they carry out
lateral movement to get administrator access to the crown jewels of
a company (intellectual property and financials); and finally they
steal and offload data or encrypt data to demand ransom from the
company.
5 LESSONS LEARNED FROM APPLYING THIS
APPROACH DURING THE COVID-19
PANDEMIC
One lesson learned from the COVID-19 pandemic is that it makes
sense to verify whether the company’s third-party providers have an
operational resilience clause, Force-Majeure clause [31], or SOC
type 2 clause in their contract with the company. While
Force-Majeure clauses were the standard model applied in the European
context, other approaches, such as Frustration of Purpose or the
Doctrine of Impracticability, are possible, depending primarily on
the legal environment the company is operating in. It makes sense
to review not only those third-party providers with which
the company already has a contract, but also the contracts of all
partners and outsourcing companies that do business
with the company. If the third-party provider is unavailable
and cannot provide the assured service for whatever reason, the
third-party provider needs to announce its backup service provider.
Also, the backup service provider needs to adhere to an operational
resilience clause, Force-Majeure clause, or SOC type 2 clause. The
backup service provider must also prove this to the company. The proof
can either be a dedicated abstract describing this conformance
or other means to assure that the third-party provider is resilient,
such as having the data stored in a data center. This evidence needs
to be verified once a year by the company, which approaches the
third-party provider and/or backup service provider and asks for
evidence within a predefined time frame. The third-party provider
also needs to be asked for its Disaster Recovery Plan (DRP) as
well as its Disaster Recovery test protocol, if the provided service is
based on IT. Further, a complete and up-to-date list of the third-party
providers supporting the company, including their relevant contact
details, is needed. The business continuity coordinator
of the company must make sure that the third-party providers
of critical processes have a legal contract with the company,
which also means that the third-party providers have a BCM
framework in place as well as regular BCM crisis management
exercises, at least once per year. Additionally, the third-party
providers’ BCP and BCP test need to be provided. If the third-party
provider does not have a BCM framework in place, it needs to tell
the company which partner will take over in case the third-party
provider is not operational. This needs to be written down and
archived by the company.
It is helpful to reassess business continuity for all business units,
to create BIAs and BCPs for the business units identified as relevant,
and to include them in the overall BCM dRA. Further, the scenarios
with the highest risk/probability need to be assessed in order to
create specific rapid-response guides [32] and checklists based on
these scenarios and to include them in the BCM crisis plan document
and future tests. It is a good idea to think about scenarios such as
a blackout, in which no electricity is available anymore.
The ideal situation is not only to think about these scenarios but
to test them, one by one, over time. To test the different scenarios,
companies can get help from the big 5 consultancies (Deloitte,
PwC, KPMG, Ernst & Young, Accenture) or any other consultancy
specializing in BCM and operational resilience. Further,
all available BCM-relevant documents need to be reviewed and
organized in an intuitive hierarchical structure to allow for easy
navigation and quick retrieval during a crisis.
It is useful to define certain Key Performance Indicators/Key Risk
Indicators (KPIs/KRIs) related to business continuity, such as the
number of BCM crises (number of official and unofficial crises reported
per year) and the average duration per BCM crisis. These numbers serve
as an indicator of the BCM maturity level in the company. Different
BCM maturity levels can be achieved, and the goal is to
strive for a higher BCM maturity level each year. Departments
always need to consider whether the current KPIs/KRIs in place
are sufficient or whether new KPIs/KRIs are needed.
Further BCM KRIs are:
• % BCM Framework with maturity level outside of defined risk score
• % Crisis Management Organization (CMO) plans with maturity level
  outside of defined risk score
• % BCPs with a Maximum Outage Time (MOT) smaller than or equal to
  48 hours with maturity level outside of defined risk score
• % Applications with an MOT smaller than or equal to 24 hours with
  maturity level outside of defined risk score
• % Outsourced applications with an MOT smaller than or equal to
  24 hours with maturity level outside of defined risk score
• % of critical outsourcers with an MOT smaller than 48 hours
  without a business continuity plan
• % Outsourced applications with an MOT smaller than or equal to
  24 hours as the relation between outsourced applications and all
  applications used by the entity
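Two of the KRIs listed above can be sketched in Python as follows.
The data model (Application, Outsourcer), the sample records, and the
choice of denominators are assumptions made for illustration: the
first KRI is read here as outsourced applications with an MOT of at
most 24 hours divided by all applications, the second as outsourcers
with an MOT below 48 hours that lack a BCP divided by all outsourcers
with an MOT below 48 hours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    name: str
    mot_hours: int   # Maximum Outage Time in hours
    outsourced: bool


@dataclass(frozen=True)
class Outsourcer:
    name: str
    mot_hours: int   # shortest MOT of the services it provides
    has_bcp: bool


def percentage(part, whole):
    """Return part/whole as a percentage; 0.0 if there is nothing to count."""
    return 0.0 if whole == 0 else 100.0 * part / whole


def kri_outsourced_share(apps, mot_limit_hours=24):
    """% outsourced applications with MOT <= limit, relative to all applications."""
    hits = sum(1 for a in apps if a.outsourced and a.mot_hours <= mot_limit_hours)
    return percentage(hits, len(apps))


def kri_outsourcers_without_bcp(outsourcers, mot_limit_hours=48):
    """% critical outsourcers (MOT < limit) that have no BCP."""
    critical = [o for o in outsourcers if o.mot_hours < mot_limit_hours]
    missing = sum(1 for o in critical if not o.has_bcp)
    return percentage(missing, len(critical))


# Sample records are invented for illustration only.
apps = [
    Application("payroll", 24, True),
    Application("email", 8, True),
    Application("intranet", 72, True),
    Application("erp", 12, False),
]
outsourcers = [
    Outsourcer("provider-a", 24, True),
    Outsourcer("provider-b", 12, False),
    Outsourcer("provider-c", 48, False),
    Outsourcer("provider-d", 4, True),
    Outsourcer("provider-e", 36, True),
]

print(kri_outsourced_share(apps))
print(kri_outsourcers_without_bcp(outsourcers))
```

Whichever denominators a company chooses, they should be documented
together with the KRI so that values remain comparable over time.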
The pandemic has shown that measures in BCPs such as relocating
staff to a different office location are no longer adequate, because
the crisis does not stop at national borders. In such a case, remote
working has proven to be a much more efficient measure. Of course,
remote working also means that higher bandwidth to the central office
is needed, that technical issues have to be expected at the
beginning, and that the probability of data loss is much higher,
which requires additional Data Loss Prevention (DLP) [33] measures.
Company information can easily be leaked when working from home, for
example by taking screenshots, which must be considered by the
security department.
Documentation during a pandemic, including decisions taken,
communication sent, and scenarios, needs to be kept up to date at all
times, both for audits and for reasons of clarity. It makes good
sense to test the scenarios rated with the highest impact and
probability during the dRA in resilience exercises (cybercrime
resilience, crisis management exercise) during the year. Example
scenarios are cyber-attacks, loss of personnel, a pandemic, or an
environmental crisis.
It has also proven necessary to translate the documentation into
English as soon as one member of the crisis management team speaks
only English and not the local language. Otherwise, this member
cannot understand the documents and is not able to support the
crisis management team. During the crisis, there is no time to
translate all these documents; this must be done beforehand.
The current situation of the pandemic needs to be monitored, and the
measures need to be reviewed and evaluated constantly. One crisis
management coordinator is not enough; he or she cannot handle all
the different tasks of coordinating and producing concepts. Thus, a
team of at least three people, organized like a Project Management
Office (PMO) and called the Crisis Management Coordination Office
(CMCO), is needed that can handle all the different tasks on time and
with the required quality. Once a crisis phase has finished, the next
crisis phase already needs to be planned, aligned, and organized
within the crisis management team. Company personnel always need to
be informed and kept up to date regarding current measures, plans,
and concepts, and need to be given an outlook for the future.
To be well prepared for a crisis, it makes good sense to have a
Facility Management DRP in place regarding emergency workplaces. All
BCM activities can be audited, announced or unannounced, by an
independent auditor. Such audits often reveal gaps, which need to be
followed up and fixed within a given time frame.
Another lesson learned is to always follow the process of creating
an incident ticket or calling the internal support hotline when
having an IT issue; otherwise, IT has no chance to do proper problem
management and to see how many issues exist. During a pandemic, many
employees work from home, which increases the number of IT support
tickets and incidents.
It is advisable to always keep the current laws and legislation in
mind when deciding on internal measures, and not to impose stricter
measures on employees than the government does during a pandemic.
Another approach is to use cloud and platform technologies for
business continuity, although IT-Security in the cloud is a big risk
for BCM. During the pandemic, a switch away from on-prem applications
and infrastructures towards such technologies could be seen for
company collaboration, program management, and organizational
efficiency.
The lessons learned from the yearly resilience exercises need to be
incorporated into the BCM framework and must not be left out. These
exercises can yield considerable insight into points of improvement
in internal communication, external communication, incident
response, and incident handling, as well as into taking the right
decisions within a short time frame.
Practical experience has shown that quite a few BCM approaches need
to be adjusted, refined, or extended for a pandemic. Also, quite a
few companies did not have a pandemic plan [34] in place at all,
which made crisis work difficult at the beginning.
It is always a good idea to look at how other companies or
governments are dealing with a pandemic, to learn from them, and to
incorporate good parts into the company’s framework. Attack trends
can be recognized either by exchanging knowledge with different
Computer Emergency Response Teams (CERTs) or by having Threat
Intelligence Systems [35] in place. From the experience of already
compromised companies, much can be learned about attackers’ tactics
and countermeasure techniques, which can be incorporated into the
company’s own IT-Security.
For developing a more generalized recommendation, additional
studies will have to be carried out to validate the conclusions drawn
from the example on which this paper is based. Once the COVID-19
crisis is mastered, it should become possible to collect wider
samples of how the situation was dealt with. In a next step,
including this additional information can give the work carried out
by the authors a valid scientific basis. At this stage, the lessons
learned from the example discussed in the paper should only be
applied to very similar settings. More data is needed to see if the
model is more generally applicable. Policy formation in the long run
can and should only be done once the additional data becomes
available, because otherwise it will be difficult to ground the
policy in scientifically validated research.
6 CONCLUSIONS
For enterprises, the pandemic has shown that companies need to
be prepared for many different kinds of crises, even if the
probability of them becoming reality is rather low. As it is
impossible to prepare for every sort of crisis, it is essential to
have some kind of early warning system that allows ample time for
getting prepared. Therefore, the recognition and correct assessment
of emerging crisis patterns is a must.
Consequently, the BCM framework itself needs to be audited and
continuously improved, based on the Plan-Do-Check-Act (PDCA)
cycle. The company’s personnel must be trained constantly, including
on security threats through security awareness programs, because
security threats also rise significantly during a pandemic.
For companies, it has become mandatory to have not only IT security
personnel but also specialists for BCM and operational resilience in
place. Practice has also shown that personnel acting in the crisis
management team should do this full time, not only during the crisis
but also during normal times. It is not a good idea to appoint crisis
management team members who also need to carry out other
time-consuming work tasks, such as working in the information
security department.
The right attention and support of management concerning BCM and
operational resilience is needed to have a positive effect on the
company. Therefore, it should not be a question of cost to
implement, for example, a cloud-based solution [36] that supports the
BCM and operational resilience process by alerting the right persons
in case of a crisis and by providing the right platform to share
information and to communicate within the crisis management team
and with employees.
A well-prepared BCM approach together with tool support is the
best way to avoid a panic-like situation during a crisis. The tool
helps to get an overview of the current situation by providing all
available information and status, makes sure that the communication
within the crisis management team is sharp and easy to understand,
and finally provides every crisis management team member with the
information needed to fulfill his or her role in dealing with the
situation.
Finally, our paper has shown a sample approach for meeting
operational resilience requirements, based on our experience in
different domains and in a coordinating role in a crisis management
team.
ACKNOWLEDGMENTS
This paper has been generously supported by team Technology
Management GmbH (https://te-am.net/).
REFERENCES
[1] Brahim Herbane, Ethné Swartz, and Dominic Elliott. 2004. Business
Continuity Management: Time for a Strategic Role? Article in Long
Range Planning.
https://www.researchgate.net/profile/Ethne-Swartz/publication/240177042_Business_Continuity_Management_Time_for_a_Strategic_Role/links/5d23141492851cf4406f5462/Business-Continuity-Management-Time-for-a-Strategic-Role.pdf
[2] Department of Humanitarian Affairs/United Nations Disaster Relief
Office – United Nations Development Programme. 1992. An Overview of
Disaster Management. http://www.nzdl.org/cgi-bin/library?e=d- 00000-00- -- o-0aedl--00-0- -- -
0-10- 0-- -0- -- 0direct-10-- -4- -- -- -- 0-1l-- 11-en- 50-- -20- about--- 00-0- 1-00- 0-
0-11- 1-0utfZz- 8-00&cl=CL1.3&d=HASH68c99b49db28474206b4.4.3.2.3>=1
[3] Yoshiaki Nemoto and Kiyoshi Hamaguchi. 2014. Resilient ICT research
based on lessons learned from the Great East Japan Earthquake. IEEE.
https://ieeexplore.ieee.org/abstract/document/6766082
[4] Lisa V. Chewning, Chih-Hui Lai, and Marya L. Doerfel. 2021.
Organizational Resilience and Using Information and Communication
Technologies to Rebuild Communication Structures. Sage Journals.
https://journals.sagepub.com/doi/abs/10.1177/0893318912465815
[5] Jan Beyea, Edwin Lyman, and Frank N. von Hippel. 2013. Accounting
for long-term doses in “worldwide health effects of the Fukushima
Daiichi nuclear accident”. From the journal: Energy & Environmental
Science.
https://pubs.rsc.org/en/content/articlelanding/2013/ee/c2ee24183h#!divAbstract
[6] Sam Carana. 2015. Temperature Rise of 1.5 °C could happen by 2024.
Below2C.
https://below2c.org/2015/12/temperature-rise-1-5c-happen-2024/
[7] McKinsey & Company. 2018. Don’t stress out: how to build long-term
resilience.
https://www.mckinsey.com/business-functions/organization/our-insights/the-organization-blog/dont-stress-out-how-to-build-longterm-resilience#
[8] NIST. 2010. Contingency Planning Guide for Federal Information
Systems. NIST Special Publication 800-34 Revision 1.
https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-34r1.pdf
[9] Queensland Fire and Emergency Services. 2018. Queensland
Prevention, Preparedness, Response and Recovery Disaster Management
Guideline.
https://www.disaster.qld.gov.au/dmg/Documents/QLD-Disaster-Management-Guideline.pdf
[10] Joe Oleksak and John Hampson. 2020. The new Business Continuity
Management Booklet: Four questions you should ask. Plante Moran.
https://www.plantemoran.com/explore-our-thinking/insight/2020/02/the-new-business-continuity-management-booklet-four-questions
[11] Kim Le. 2016. SOX Walk-through Overview. A2Q2.
https://www.a2q2.com/blog/sox/sox-walk-through-overview/#:~:text=Test%20of%20Design%20(TOD)%20%E2%80%93,operates%20as%20it%20was%20designed
[12] NIST. 2020. Security and Privacy Controls for Information Systems
and Organizations. NIST Special Publication 800-53 Revision 5.
https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-53r5.pdf
[13] London First. 2003. Expecting the unexpected, Business Continuity
in an uncertain world.
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/61089/expecting-the-unexpected.pdf
[14] Jessie Reed. 2019. Data Center Disaster Recovery: A Complete
Guide. Nakivo.
https://www.nakivo.com/blog/data-center-disaster-recovery-a-complete-guide/
[15] Marten Bütow. 2019. Thus spoke the BSI.... T-Systems.
https://www.t-systems.com/de/en/newsroom/expert-blogs/thus-spoke-the-bsi-76334
[16] European Commission. 2016. What is a data breach and what do we
have to do in case of a data breach?
https://ec.europa.eu/info/law/law-topic/data-protection/reform/rules-business-and-organisations/obligations/what-data-breach-and-what-do-we-have-do-case-data-breach_en
[17] Bank for International Settlements. 2020. Principles for
operational resilience. https://www.bis.org/bcbs/publ/d509.pdf
[18] CrossCountry Consulting. 2020. Operational Resilience:
Identifying Critical Business Processes.
https://insights.crosscountry-consulting.com/operational-resilience-identifying-critical-business-processes
[19] NIST. 2018. Framework Version 1.1. NIST Cybersecurity Framework.
https://www.nist.gov/cyberframework
[20] IBM Cloud Education. 2019. What are Security Controls? IBM.
https://www.ibm.com/cloud/learn/security-controls
[21] Margaret Langsett. 2016. Six levels of business continuity
maturity. Virtual corporation.
https://www.continuitycentral.com/index.php/news/business-continuity-news/1293-six-levels-of-business-continuity-maturity
[22] Bundesamt für Sicherheit in der Informationstechnik. 2020.
Orientation guide to documentation of compliance according to
Section 8a (3) BSIG.
https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/IT-SiG/Orientierungshilfe_8a_3_eng.pdf?__blob=publicationFile&v=3
[23] Moh Heng Goh. 2019. What are the Types of Business Continuity
Strategy? BCM Institute.
https://blog.bcm-institute.org/bcm-planning-methodology/what-are-the-types-of-business-continuity-strategy
[24] Paul Malliet, Frédéric Reynès, Gissela Landa, Meriem
Hamdi-Cherif, and Aurélien Saussay. 2020. Assessing Short-Term and
Long-Term Economic and Environmental Effects of the COVID-19 Crisis
in France. Environmental and Resource Economics.
https://link.springer.com/article/10.1007/s10640-020-00488-z
[25] HP Enterprise Security. 2011. Next-Generation Application
Monitoring: Combining Application Security Monitoring and SIEM.
http://docs.media.bitpipe.com/io_10x/io_101711/item_478578/TT%2011-105%20HP%20Enterprise%20Security.pdf
[26] Karen Scarfone. 2015. IBM Security QRadar: SIEM product overview.
SearchSecurity.
https://searchsecurity.techtarget.com/feature/IBM-Security-QRadar-SIEM-product-overview
[27] Stephen Cooper. 2020. Splunk SIEM Review & Alternatives.
Comparitech.
https://www.comparitech.com/net-admin/splunk-siem-review-alternatives/
[28] Sander Berkouwer. 2018. The Cloud Identity Dilemma. Semperis.
https://www.semperis.com/blog/cloud-identity-dilemma/
[29] Neil Ferguson et al. 2020. Impact of non-pharmaceutical
interventions [NPIs] to reduce COVID19 mortality and healthcare
demand. Imperial College.
https://medium.com/tomas-pueyo/coronavirus-der-hammer-und-der-tanz-abf9015cb2af
[30] MITRE. 2020. ATT&CK Matrix for Enterprise. MITRE ATT&CK.
https://attack.mitre.org/
[31] Simmons & Simmons LLP. 2020. Operational resilience – outsourcing
and third party risk management.
https://www.simmons-simmons.com/en/publications/cka512adwj5yq0999l3fbcm1d/operational-resilience---outsourcing-and-third-party-risk-management
[32] Federal Office for Information Security. 2009. BSI Standard 100-4
Business Continuity Management.
https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/BSIStandards/standard_100-4_e_pdf.pdf?__blob=publicationFile&v=1
[33] Edward Bishop. 2020. Our New Normal Of Remote Work Makes Data Loss
Prevention Crucial For GDPR Compliance. Forbes.
https://www.forbes.com/sites/forbestechcouncil/2020/06/15/our-new-normal-of-remote-work-makes-data-loss-prevention-crucial-for-gdpr-compliance/?sh=24adde665937
[34] Michael Berman. 2020. What’s The Difference Between Business
Continuity Management (BCM) And Pandemic Planning? NContracts.
https://www.ncontracts.com/integrated-risk-blog/whats-the-difference-between-business-continuity-management-bcm-and-pandemic-planning/
[35] CREST. 2019. What is Cyber Threat Intelligence and how is it used?
https://www.crest-approved.org/wp-content/uploads/CREST-Cyber-Threat-Intelligence.pdf
[36] Eske Ofner. 2021. Keep calm and ...? Five Tips For Successful
Crisis Management. FACT24.
https://fact24.com/en/five-tips-for-successful-crisis-management/