Towards cloud-native simulations - lessons learned from the front-line of cloud computing

Authors:
  • Lübeck University of Applied Sciences
  • aditerna GmbH

Special Issue
Journal of Defense Modeling and Simulation: Applications, Methodology, Technology (JDMS)
1–20
© The Author(s) 2020
DOI: 10.1177/1548512919895327
journals.sagepub.com/home/dms
Nane Kratzke¹ and Robert Siegfried²
Abstract
Cloud computing can be a game-changer for computationally intensive tasks like simulations. The computational power of Amazon, Google, or Microsoft is available even to a single researcher. However, the pay-as-you-go cost model of cloud computing influences how cloud-native systems are being built. We transfer these insights to the simulation domain. The major contributions of this paper are twofold: (A) we propose a cloud-native simulation stack and (B) we derive expected software engineering trends for cloud-native simulation services. Our insights are based on systematic mapping studies on cloud-native applications, a review of cloud standards, action research activities with cloud engineering practitioners, and corresponding software prototyping activities. Two major trends have dominated cloud computing over the last 10 years: the size of deployment units has been minimized, and corresponding architectural styles prefer more fine-grained service decompositions of independently deployable and horizontally scalable services. We forecast similar trends for cloud-native simulation architectures. These trends should make cloud-native simulation services more microservice-like: composable services that just “simulate one thing well.” However, merely transferring existing simulation models to the cloud can result in significantly higher costs. One critical insight of our (and other) research is that cloud-native systems should follow cloud-native architecture principles to get the most out of the pay-as-you-go cost model.
Keywords
Cloud computing, cloud native, cloud maturity, simulation, reference model, maturity model
1. Introduction
Simulation is used for various purposes, such as training, analysis, and decision support. Consequently, modeling and simulation (M&S) has become a critical technology for many industry sectors (such as logistics and manufacturing) and the defense sector. Achieving interoperability between multiple simulation systems and ensuring the credibility of results often requires enormous effort with regard to time, personnel, and budget. Recent technical developments in the area of cloud computing technology and service-oriented architectures (SOAs) may offer opportunities to better utilize M&S capabilities in order to satisfy these critical needs. A concept that includes service orientation and the provision of M&S applications via the as-a-service model of cloud computing may enable more composable simulation environments that can be deployed on demand. This new concept is commonly known as M&S as a Service (MSaaS).
1.1 MSaaS simulation principles and validation results
The NATO Modelling and Simulation Group (NMSG) is part of the NATO Science and Technology Organization (STO). The mission of the NMSG is to promote cooperation among alliance bodies, NATO, and partner nations to maximize the effective utilization of M&S. Primary mission areas include the following:
  • M&S standardization;
  • education;
  • associated science and technology.

¹ Department for Electrical Engineering and Computer Science, Lübeck University of Applied Sciences, Germany
² aditerna GmbH, Germany
Corresponding author:
Nane Kratzke, Department for Electrical Engineering and Computer Science, Lübeck University of Applied Sciences, Mönkhofer Weg 239, Lübeck, 23562, Germany.
Email: nane.kratzke@th-luebeck.de

The NMSG is tasked to enforce and supervise implementation of the NATO Modelling and Simulation Masterplan (NMSMP; v2.0 (AC/323/NMSG(2012)-015)). The NMSMP defines several objectives that collectively will help to exploit M&S to its full potential across NATO and the nations to enhance both operational and cost-effectiveness. This vision will be achieved through a cooperative effort guided by the following principles.
  • Synergy: leverage and share the existing NATO and national M&S capabilities.
  • Interoperability: direct the development of common M&S standards and services for simulation interoperability and foster interoperability between Command & Control (C2) and simulation.
  • Reuse: increase the visibility, accessibility, and awareness of M&S assets to foster sharing across all NATO M&S application areas.
The NMSMP defines five strategic objectives, two of which are directly addressed by the MSaaS efforts described in this paper:
  • establish a common technical framework;
  • provide coordination and common services.
NATO MSG-136 (Modelling and Simulation as a Service)¹ is one of the working groups under the NMSG. From 2014 to 2017 this working group investigated the concept of MSaaS to provide the technical and organizational foundations for a future service-based allied framework for MSaaS within NATO and partner nations. In this period, MSG-136 did groundbreaking work by defining MSaaS in the NATO context and by developing operational, technical, and governance concepts for permanently establishing the “Allied Framework for MSaaS.” In addition to developing the foundational concepts, MSG-136 conducted extensive experimentation activities to test and validate the concepts.² From 2018 to 2021, the initial concepts are extended by MSG-164 and validated through dedicated evaluation events and participation in operational exercises.
1.2 Cloud-native lessons learned
Even tiny companies can generate enormous economic growth and business value by providing cloud-based services or applications. Instagram, Uber, WhatsApp, Netflix, and Twitter are examples of companies that were astonishingly small (if we relate their modest headcount in their founding days to their noteworthy economic impact) and whose services are frequently used. However, even a fast-growing start-up business model should have long-term consequences and dependencies in mind. Many of these companies rely on public cloud infrastructures, often provided by Amazon Web Services (AWS), Microsoft (Azure), Google (Cloud Services), etc. Meanwhile, cloud providers run a significant amount of mission-critical business software for companies that no longer operate their own data centers. Moreover, using the cloud is very often economical if workloads have a high peak-to-average ratio.³ However, there are downsides. Although cloud services could be standardized commodities, they mostly are not. Once a cloud-hosted application or service is deployed to a specific cloud infrastructure, it is often inherently bound to that infrastructure due to non-obvious technological bindings. A transfer to another cloud infrastructure is very often a time-consuming and expensive one-time exercise. A good real-world example here is Instagram. After Instagram was bought by Facebook, it took over a year for the Instagram engineering team to find and establish a solution for the transfer of all its services from AWS to Facebook data centers. Although no downtimes were planned, noteworthy outages occurred during that period.
The National Institute of Standards and Technology (NIST) definition of cloud computing defines three basic and well-accepted service categories:⁴ Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). IaaS provides maximum flexibility for arbitrary consumer-created software but hides almost none of the operational complexity of the application (just that of the infrastructure). SaaS, on the other hand, hides operational complexity almost entirely but is too limited for many use cases involving consumer-created software. PaaS is a compromise that enables the operation of consumer-created software with manageable operational complexity, but at the cost of accepting some degree of lock-in resulting from the platform.
Throughout a project called CloudTRANSIT, we searched intensively for solutions to overcome this “cloud lock-in” and to make cloud computing an actual commodity. We developed and evaluated a cloud application transferability concept that has prototype status but already works for approximately 70% of the current cloud market, and that can be extended to the rest of the market.⁵ However, what is essential for this paper is that we learned some core insights from our action research with practitioners:
  • practitioners want to have a choice between platforms;
  • practitioners prefer declarative and cybernetic (auto-adjusting) deployment and orchestration approaches over workflow-based (imperative) ones;
  • practitioners are forced to make efficient use of cloud resources because more and more systems are migrated to cloud infrastructures, causing steadily increasing bills;
  • practitioners rate the pragmatism of solutions much higher than full feature coverage of cloud platforms and infrastructures.
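
The preference for declarative, auto-adjusting deployment can be illustrated with a minimal sketch. It is not taken from CloudTRANSIT or from any specific orchestrator; the function name `reconcile` and the service names are hypothetical. The operator only states the desired number of instances per service, and a reconciliation step derives the start and stop actions from the difference to the observed state.

```python
def reconcile(desired: dict[str, int], observed: dict[str, int]) -> list[str]:
    """Derive the actions that move the observed state to the desired state.

    Both arguments map a (hypothetical) service name to an instance count.
    The caller declares *what* should run; this function derives *how*.
    """
    actions = []
    for name in sorted(set(desired) | set(observed)):
        want = desired.get(name, 0)
        have = observed.get(name, 0)
        if want > have:
            actions.append(f"start {want - have} x {name}")
        elif want < have:
            actions.append(f"stop {have - want} x {name}")
    return actions


# Hypothetical example: one service is under-provisioned, one is obsolete.
desired = {"terrain-service": 3, "weather-service": 1}
observed = {"terrain-service": 1, "legacy-service": 2}
for action in reconcile(desired, observed):
    print(action)
```

Running such a step repeatedly is what makes the approach auto-adjusting: whenever the observed state drifts (for example, after a node failure), the next pass produces the corrective actions without anyone rewriting a deployment workflow.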
1.3 Research question
All these points influence how practitioners construct cloud application architectures that are intentionally designed for the cloud. One thing we learned was that cloud-native applications, although they are all different, follow some common architectural patterns that we could exploit for transferability. This paper investigates the research question of how these lessons learned can be transferred from cloud-native computing to the modeling and simulation domain.
1.4 Outline
The remainder of this paper is outlined as follows. We present a cloud application reference model in Section 2 that steered our research in the cloud computing domain. According to our experiences and action research activities over the last 10 years, cloud computing is dominated by two major long-term trends that are investigated in Section 3. In particular, we investigate resource utilization improvements in Section 3.1 and the architectural evolution of cloud applications in Section 3.2. Section 4 analyzes both trends with regard to possible upcoming trends of interest to the M&S community. Section 5 presents related work from cloud computing and the MSaaS domain to provide interesting follow-up reading. We conclude in Section 6 and forecast intensified decentralization and more fine-grained service composition approaches for cloud computing and the MSaaS domain.
2. Reference model
Our problem awareness results mainly from the conducted research project CloudTRANSIT. This project dealt with the question of how to transfer cloud applications and services at runtime without downtime across cloud infrastructures from different public and private cloud service providers to tackle the existing and growing problem of vendor lock-in in cloud computing. Throughout the project, we published more than 20 research papers. However, the intent of this paper is not to summarize these papers. The interested reader is referred to the corresponding technical report⁵ that provides an integrated view of these outcomes.
Almost all cloud system engineers focus on a common problem. The core components of their distributed and cloud-based systems, such as virtualized server instances and essential networking and storage, can be deployed using commodity services. However, further services that are needed to integrate these virtualized resources in an elastic, scalable, and pragmatic manner are often not considered in standards. Services such as load balancing, auto-scaling, or message queuing systems are needed to design an elastic and scalable cloud-native system on almost every cloud service infrastructure. Some standards, such as AMQP⁶ for messaging (dating back almost to the pre-cloud era), exist. However, it is mainly these integrating and “gluing” service types, which are crucial for almost every cloud application on a higher cloud maturity level, that are often not provided in a standardized manner by cloud providers.⁷ It seems that all public cloud service providers try to stimulate cloud customers to use their non-commodity convenience service “interpretations” to bind them to their infrastructures and higher level service portfolios.
What is more, according to an analysis we performed in 2016,⁸ the percentage of these commodity service categories that are considered in standards, such as CIMI,⁹ OCCI,¹⁰,¹¹ CDMI,¹² OVF,¹³ OCI,¹⁴ and TOSCA,¹⁵ has even decreased over the years. That has mainly to do with the fact that new cloud service categories are released faster than standardization authorities can standardize existing service categories. Figure 1 shows this effect by the example of AWS over the years. That is how vendor lock-in emerges in cloud computing. For a more detailed discussion, we refer to Opara-Martins et al.,¹⁶ Kratzke et al.,⁸ and Kratzke and Peinl.¹⁷
Figure 1. Decrease of standard coverage over years (by example of Amazon Web Services).
Therefore, all reviewed cloud standards focus on a minimal but necessary subset of popular cloud services: compute nodes (virtual machines), storage (file, block, object), and (virtual private) networking. Standardized deployment approaches, such as TOSCA, are defined mainly against this commodity infrastructure level of abstraction. These kinds of services are often subsumed as IaaS and build the foundation of cloud services and therefore cloud-native applications. All other service categories might foster vendor lock-in situations. That might sound disillusioning. In consequence, many cloud engineering teams follow the basic idea that a cloud-native application stack should use only a minimal subset of well-standardized IaaS services as founding building blocks. Because existing cloud standards cover only specific cloud service categories (mainly the IaaS level) and do not show an integrated point of view, a more integrated reference model that takes the best practices of practitioners into account would be helpful.
Very often cloud computing is investigated from a service model point of view (IaaS, PaaS, SaaS) or a deployment point of view (private, public, hybrid, community cloud).⁴ Alternatively, one can look from an actor point of view (provider, consumer, auditor, broker, carrier) or a functional point of view (service deployment, service orchestration, service management, security, privacy), as done by Bohn et al.¹⁸ Points of view are particularly useful to split problems into concise parts. However, the viewpoints mentioned above might be common in cloud computing and useful from a service provider point of view, but not from a cloud-native application engineering point of view. From an engineering point of view, it seems more useful to have views on the technology levels involved and applied in cloud-native application engineering.
By using the insights from our systematic mapping study¹⁹ and our review of cloud standards,¹⁷ we compiled a reference model of cloud-native applications. This layered reference model is shown and explained in Figure 2. The basic idea of this reference model is to use only a small subset of well-standardized IaaS services as founding building blocks (Layer 1). Four primary viewpoints form the overall shape of this model.
  • Infrastructure provisioning: this is a viewpoint that is familiar for engineers working on the infrastructure level and how IaaS is understood. IaaS deals with the deployment of separate compute nodes for a cloud consumer. The cloud consumer must manage these (hundreds of) requested and isolated nodes.
  • Clustered elastic platforms: this is a viewpoint that is familiar for engineers who are dealing with horizontal scalability across nodes. Clusters are a concept to handle many Layer 1 nodes as one logical compute node (a cluster). Such technologies are often the technological backbone for portable cloud runtime environments because they are hiding complexity (of hundreds or thousands of single nodes) appropriately. In addition, this layer realizes the foundation to define services and applications without reference to particular cloud services, cloud platforms, or cloud infrastructures. Thus, it provides a foundation to avoid vendor lock-in.
  • Service composing: this is a viewpoint familiar for application engineers dealing with web services in SOAs. These (micro)-services operate on a Layer 2 cloud runtime platform (such as Kubernetes, Mesos, Swarm, Nomad, and so on). Thus, the complex orchestration and scaling of these services are abstracted and delegated to a cluster (cloud runtime environment) on Layer 2.
  • Application: this is a viewpoint that is familiar for end-users of cloud services (or cloud-native applications). These cloud services are composed of smaller cloud Layer 3 services being operated on clusters formed of single compute and storage nodes.
Figure 2. Cloud-native stack observable in many cloud-native applications. FaaS: Function as a Service; IaaS: Infrastructure as a Service.
For more details, we refer to Kratzke and Peinl¹⁷ and Kratzke and Quint.⁵ The remainder of this paper follows this model.
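
As a reading aid, the four viewpoints can be written down as a small data structure. This is only a sketch: the class name `Layer`, the helper `layers_below`, and the numbering of the application viewpoint as Layer 4 are assumptions made for illustration, and the example entries simply repeat technologies named in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    number: int
    viewpoint: str
    examples: tuple[str, ...]


# Assumption: the application viewpoint is numbered as Layer 4.
STACK = (
    Layer(1, "Infrastructure provisioning", ("compute nodes", "storage", "networking")),
    Layer(2, "Clustered elastic platforms", ("Kubernetes", "Mesos", "Swarm", "Nomad")),
    Layer(3, "Service composing", ("(micro)services",)),
    Layer(4, "Application", ("cloud-native application",)),
)


def layers_below(number: int) -> list[Layer]:
    """Return the layers a given layer builds on, nearest layer first."""
    return [layer for layer in reversed(STACK) if layer.number < number]


for layer in layers_below(3):
    print(layer.number, layer.viewpoint)
```

The point of the sketch is the direction of dependency: each layer only builds on the layers beneath it, which is why services defined on Layer 3 need no reference to a particular infrastructure provider.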
3. Observable long-term trends in cloud
computing
Cloud computing emerged some 10 years ago. In the first adoption phase, existing IT systems were merely transferred to cloud environments without changing the original design and architecture of these applications. Tiered applications were merely migrated from dedicated hardware to virtualized hardware in the cloud. Cloud system engineers implemented remarkable improvements in cloud platforms (PaaS) and infrastructures (IaaS) over the years and established several engineering trends.
All of these trends try to optimize specific quality factors, such as functional stability, performance efficiency, compatibility, usability, reliability, maintainability, portability, and security of cloud services, to improve the overall quality of service (QoS). The quality factors that receive the most attention are functional stability, performance efficiency, and reliability (including availability).²⁰,²¹ Therefore, these engineering trends, listed in Table 1, seem somewhat isolated. We want to review these trends from two different perspectives.
Table 1. Some observable software engineering trends coming along with CNAs.
Trend | Rationale
Microservices | Microservices can be seen as a “pragmatic” interpretation of SOA. In addition to SOA, microservice architectures intentionally focus on and compose small, independently replaceable, horizontally scalable services that are “doing one thing well.”
DevOps | DevOps is a practice that emphasizes the collaboration of software developers and IT operators. It aims to build, test, and release software more rapidly, frequently, and reliably using automated processes for software delivery. DevOps fosters the need for independently replaceable and standardized deployment units and therefore pushes microservice architectures and container technologies.
Cloud modeling languages | Softwareization of infrastructure and network enables one to automate the process of software delivery and infrastructure changes more rapidly. Cloud modeling languages can express applications and services, and their elasticity behavior, that shall be deployed to such infrastructures or platforms.
Standardized deployment units | Deployment units wrap a piece of software in a complete file system that contains everything needed to run: code, runtime, system tools, system libraries. So, it is guaranteed that the software will always run the same, regardless of its environment. This deployment approach is often realized using container technologies (OCI standard). Each deployment unit should be designed and interconnected according to a collection of cloud-focused patterns, such as the twelve-factor app collection, the circuit breaker pattern, and cloud computing patterns.
Elastic platforms | Elastic platforms, such as Kubernetes, Mesos, or Swarm, can be seen as a unifying middleware of elastic infrastructures. Elastic platforms extend resource sharing and increase the utilization of underlying compute, network, and storage resources for custom but standardized deployment units.
Serverless | The term “serverless” is used for an architectural style for cloud application architectures that deeply depend on external third-party services (Backend-as-a-Service, BaaS) and integrate them via small event-triggered functions (Function-as-a-Service, FaaS). FaaS extends resource sharing of elastic platforms by simply applying time-sharing concepts.
State isolation | Stateless components are easier to scale up/down horizontally than stateful components. Of course, stateful components cannot be avoided, but they should be reduced to a minimum and realized by intentionally horizontally scalable storage systems (often eventually consistent NoSQL databases).
Versioned REST APIs | REST-based APIs provide scalable and pragmatic communication, which means relying mainly on already existing internet infrastructure and well-defined and widespread standards.
Loose coupling | Service composition is done by events or by data. Event coupling relies on messaging solutions (e.g., AMQP standard). Data coupling often relies on scalable but (mostly) eventually consistent storage solutions (which are often subsumed as NoSQL databases).
CNAs: cloud-native applications; SOA: service-oriented architecture; API: application programming interface.
3.1 Resource utilization
Cloud infrastructures (IaaS) and platforms (PaaS) are built to be elastic. Elasticity is understood as the degree to which a system adapts to workload changes by provisioning and de-provisioning resources automatically. Without this, cloud computing is very often not reasonable from an economic point of view.³ Over time, system engineers learned to understand the elasticity options of modern cloud environments better. Eventually, systems were designed for such elastic cloud infrastructures, which increased the utilization rates of underlying computing infrastructures via new deployment and design approaches, such as containers, microservices, or serverless architectures. This design intention is often expressed using the term “cloud native.”
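
Why elasticity matters economically can be made concrete with a deliberately simplified cost model. The model is an assumption for illustration and not the paper's own analysis: fixed capacity must be sized for the peak and is paid for every hour, whereas elastic capacity is paid only for what is used but at a price premium. Under these assumptions the elastic option is cheaper exactly when the premium is smaller than the peak-to-average ratio of the demand. The demand numbers and prices below are hypothetical.

```python
def peak_to_average(demand: list[float]) -> float:
    """Ratio of peak demand to mean demand over the observation window."""
    return max(demand) / (sum(demand) / len(demand))


def fixed_cost(demand: list[float], unit_price: float) -> float:
    """Capacity sized for the peak, paid for in every time slot."""
    return max(demand) * len(demand) * unit_price


def elastic_cost(demand: list[float], unit_price: float, premium: float) -> float:
    """Capacity follows demand, billed at unit_price * premium per used unit."""
    return sum(demand) * unit_price * premium


# Hypothetical hourly demand (in node units) with one pronounced peak.
demand = [2, 2, 2, 20, 2, 2]
print("peak-to-average ratio:", peak_to_average(demand))
print("fixed capacity cost:  ", fixed_cost(demand, unit_price=1.0))
print("elastic cost (2x):    ", elastic_cost(demand, unit_price=1.0, premium=2.0))
print("elastic cost (5x):    ", elastic_cost(demand, unit_price=1.0, premium=5.0))
```

With a flat demand curve the ratio approaches 1 and any premium makes the elastic option more expensive, which matches the warning that merely moving an always-busy system to the cloud can raise costs.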
Figure 3 shows a noticeable trend over the last decade. Machine virtualization was introduced to consolidate many bare metal machines and thus make more efficient use of physical resources. This machine virtualization forms the technological backbone of IaaS cloud computing. Virtual machines might be more lightweight than bare metal servers, but they are still heavy, especially regarding their image sizes. Because they are more fine-grained, containers improved standardized deployments and also increased the utilization of virtual machines.
Nevertheless, although containers can be scaled quickly, they are still always-on components. For that reason, Function-as-a-Service (FaaS) approaches have emerged that apply time sharing of containers on underlying container platforms. Due to this time-shared execution of containers on the same hardware, FaaS even enables a scale-to-zero capability. This improved resource efficiency can even be measured monetarily.²² So, over time the technology stack to manage resources in the cloud became more complicated and more difficult to understand but followed one trend: to run a greater workload on the same number of physical machines.
3.1.1 Service-oriented deployment monoliths. Service-oriented computing is a paradigm for distributed computing and e-business processing and has been introduced to manage the complexity of distributed systems and to integrate different software applications. A service offers functionalities to other services mainly via message passing. Services decouple their interfaces from their implementation. Corresponding architectures for such applications are called SOAs. Many business applications have been developed over recent decades following this architectural paradigm. Also, due to their underlying service concepts, these applications can be deployed in cloud environments without any problems. However, the main problem for cloud system engineers emerges from the fact that, although these kinds of applications are composed of distributed services, their deployment is not. These kinds of distributed applications are conceptually monolithic applications from a deployment point of view.
Figure 3. The cloud architectural evolution from a resource utilization point of view. VM: virtual machine; FaaS: Function as a Service.
In other words, the complete distributed application must be deployed all at once in the case of updates or new service releases. This monolithic style even leads to situations where complete applications are simply packaged as one large virtual machine image. That fits perfectly to the situations shown in Figure 3 (Dedicated Server and Virtualization). However, depending on the application size, this normally involves noteworthy downtimes of the application for end-users and limits the capability to scale the application in the case of increasing or decreasing workloads.
Cloud-native applications in particular come with 24×7 availability requirements and the need to deploy, update, or scale single components independently from each other at runtime without any downtime. Therefore, SOA evolved into a so-called microservice architectural style. One might say that microservices are mainly a more pragmatic version of SOAs. What is more, microservices are intentionally designed to be independently deployable, updateable, and horizontally scalable. So, microservices have some architectural implications that will be investigated in Section 3.2.1. However, deployment units of microservices should be standardized and self-contained. This aspect will be investigated in Section 3.1.2.
3.1.2 Standardized and self-contained deployment units. While deployment monoliths mainly use IaaS resources in the form of virtual machines that are deployed and updated less regularly, microservice architectures split up the monolith into independently deployable units that are deployed and terminated much more frequently. What is more, this deployment is done in a horizontally scalable way that is very often triggered by request stimuli. If many requests are hitting a service, more service instances are launched to distribute the requests across more instances. If the requests are decreasing, service instances are shut down to free resources (and save money). So, the inherent elasticity capabilities of microservice architectures are much more in focus compared with classical deployment monoliths and SOA approaches. One of the critical success factors behind microservice architectures gaining so much traction over recent years might be the fact that the deployment of service instances could be standardized as self-contained deployment units, so-called containers.²³ Containers make use of operating system virtualization instead of machine virtualization (see Figure 4) and are therefore much more lightweight. Containers make scaling much more pragmatic and faster, and because containers consume fewer resources than virtual machines, the instance density on the underlying hosts can be increased.
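
The request-driven scaling described above can be sketched as a simple sizing rule. This is a minimal illustration under stated assumptions, not the algorithm of any particular platform: the function name `desired_instances`, the per-instance capacity, and the bounds are hypothetical. The lower bound defaults to 1 because a containerized service normally keeps at least one instance running.

```python
import math


def desired_instances(
    requests_per_second: float,
    capacity_per_instance: float,
    min_instances: int = 1,
    max_instances: int = 10,
) -> int:
    """Number of service instances needed for the current request rate.

    Scales out when requests increase and scales in when they decrease,
    clamped to the configured lower and upper bounds.
    """
    needed = math.ceil(requests_per_second / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


# Hypothetical load profile: each instance is assumed to handle 100 requests/s.
for rate in (0, 80, 250, 5000):
    print(rate, "req/s ->", desired_instances(rate, capacity_per_instance=100))
```

An orchestrator would evaluate such a rule periodically and start or stop containers to match the result, which is where the fast start-up of containers pays off.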
However, even in microservice architectures, the service concept is an always-on concept. So, at least one service instance (container) must be active and running for each microservice at all times. Thus, even container technologies do not overcome the need for always-on components. Also, always-on components are among the most expensive cloud workloads and should therefore be avoided, according to Weinmann.³ Thus, the question arises as to whether it is possible to execute service instances only in the case of actual requests. The answer leads to FaaS concepts and corresponding platforms that will be discussed in Section 3.1.3.
Figure 4. Comparing containers and virtual machines (adapted from the Docker website: https://www.docker.com/resources/what-container).
3.1.3 Function as a Service. Microservice architectures propose a solution to efficiently scale computing resources that is hardly realizable with monolithic architectures.²⁴ The allocated infrastructure can be better tailored to the needs of each microservice because each one can be scaled independently via standardized deployment units, addressed in Section 3.1.2. However, microservice architectures face additional efforts, such as deploying every single microservice and scaling and operating them in cloud infrastructures. To address these concerns, container orchestrating platforms, such as Kubernetes²⁵ and Mesos/Marathon,²⁶ have emerged. However, this shifts the problem to the operation of these platforms, and these platforms are still always-on components. Thus, so-called serverless architectures and FaaS platforms have emerged in the cloud service ecosystem. The AWS Lambda service might be the most prominent one, but there exist more, such as Google Cloud Functions, Azure Functions, OpenWhisk, and Spring Cloud Functions, to name just a few. However, all (commercial) platforms follow the same principle: they provide minimal and fine-grained services (just exposing one stateless function) that are billed according to consumed runtime (at millisecond granularity).
FaaS is more fine-grained than microservices and facilitates the creation of functions. Therefore, these fine-grained functions are sometimes called nanoservices. These functions can be quickly deployed and automatically scaled, and provide the potential to reduce infrastructure and operation costs. Unlike the deployment unit approaches of Section 3.1.2, which are still always-on software components, functions are only processed if there are active requests. Thus, FaaS can be much more cost efficient than just containerized deployment approaches. According to a cost comparison study of monolithic, microservice, and FaaS architectures in a case study by Villamizar et al.,²² cost reductions of up to 75% are possible.
On the other hand, there are still open problems, such as
the serverless trilemma. The serverless trilemma ‘‘captures
the inherent tension between economics, performance, and
synchronous composition’’ of serverless functions.
27
One
obvious problem stressed by Baldini et al.
27
is the ‘‘double
spending problem’’ shown in Figure 5. This problem
occurs when a serverless function f is calling another serverless
function g synchronously. The consumer is billed
for the execution of both f and g, although only g is consuming
resources because f is waiting for the result of g. To
avoid this double spending problem, many serverless
applications delegate the composition of fine-grained ser-
verless functions into higher order functionality to client
applications and edge devices outside the scope of FaaS
platforms. This composition problem leads to new – more
distributed and decentralized – forms of cloud-native
architectures investigated in Section 3.2.2.
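The double spending problem can be illustrated with a small billing calculation. The following is a minimal sketch, assuming that a platform bills the wall-clock runtime of every function invocation; the price constant, the Invocation class, and the runtimes are hypothetical and do not reflect any real provider tariff.

```python
from dataclasses import dataclass

# Hypothetical price per millisecond of function runtime (not a real tariff).
PRICE_PER_MS = 0.000002


@dataclass
class Invocation:
    name: str
    own_work_ms: int      # time the function itself spends computing
    waiting_ms: int = 0   # time spent blocked on a synchronous callee

    @property
    def billed_ms(self) -> int:
        # Assumption: billing counts wall-clock runtime, including idle waiting.
        return self.own_work_ms + self.waiting_ms


def cost(invocations):
    """Total cost of a list of invocations under the assumed billing model."""
    return sum(i.billed_ms for i in invocations) * PRICE_PER_MS


# Case 1: f calls g synchronously inside the FaaS platform and waits for it.
g = Invocation("g", own_work_ms=800)
f_sync = Invocation("f", own_work_ms=200, waiting_ms=g.billed_ms)
platform_composition = [f_sync, g]

# Case 2: a client calls g first and then passes the result to f.
f_client = Invocation("f", own_work_ms=200)
client_composition = [f_client, g]

billed_sync = sum(i.billed_ms for i in platform_composition)
billed_client = sum(i.billed_ms for i in client_composition)
print(billed_sync, billed_client)
```

In this sketch, the runtime of g is billed twice in the first case (once for g itself and once as waiting time of f), which is exactly the overhead that client-side composition avoids.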
3.2 Architectural evolution
The reader has seen in Section 3.1 that cloud-native appli-
cations strived for a better resource utilization mainly by
applying more fine-grained deployment units in shape of
lightweight containers (instead of virtual machines) or the
shape of functions in the case of FaaS approaches.
Moreover, these improvements of resource utilization rates
had an impact on how the architectures of cloud
Figure 5. The double spending problem resulting from the serverless trilemma. FaaS: Function as a Service.
applications evolved. Two major architectural trends
(microservices, and serverless architectures) in cloud
application architectures have emerged in the last decade.
We will investigate microservice architectures in Section
3.2.1 and serverless architectures in Section 3.2.2.
3.2.1 Microservice architectures. Microservices form
...an approach to software and systems architecture that
builds on the well-established concept of modularisation but
emphasise technical boundaries. Each module — each micro-
service — is implemented and operated as a small yet inde-
pendent system, offering access to its internal logic and data
through a well-defined network interface. This architectural
style increases software agility because each microservice
becomes an independent unit of development, deployment,
operations, versioning, and scaling.
28
Often-mentioned benefits of microservice architectures are
faster delivery, improved scalability, and greater auton-
omy.
28,29
Different services in a microservice architecture
can be scaled independently from each other according to
their specific requirements and actual request stimuli.
What is more, each service can be developed and oper-
ated by different teams. So, microservices do not only have
a technological but also an organizational impact. These
teams can make localized decisions per service regarding
programming languages, libraries, frameworks, and more.
This organizational impact enables, on the one hand, best-
of-breed approaches within each area of responsibility. On
the other hand, it might increase the technological heterogeneity
across the complete system. What is more, the corresponding
long-term effects regarding the maintainability of
such systems might not have even been observed so far.
30
First generation microservices are formed of individual
services that were packed using container technologies.
These services were then deployed and managed at run-
time using container orchestration tools, such as Mesos.
Each service was responsible for keeping track of other
services, and invoking them by specific communication
protocols. Failure-handling was implemented directly in
the service source code. With an increase of services per
application, the reliable and fault-tolerant location and
invocation of appropriate service instances became a prob-
lem itself. If new services were implemented using differ-
ent programming languages, reusing existing discovery
and failure-handling code would become increasingly dif-
ficult. So, freedom of choice and ‘‘polyglot programming’’
are often-mentioned benefits of microservices, but they
have drawbacks that need to be managed.
Therefore, second generation microservice architec-
tures made use of discovery services and reusable fault-
tolerant communication libraries. Common discovery ser-
vices (such as Consul, see Table 2) were used to register
provided functionalities. During service invocation, all
protocol-specific and failure-handling features were dele-
gated to an appropriate communication library, such as
Finagle (see Table 2). This simplified service implementa-
tion and reuse of boilerplate communication code across
services.
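The division of labor in second generation microservices can be sketched as follows. This is a minimal, in-memory illustration and not the API of Consul or Finagle: the ServiceRegistry and CircuitBreaker classes, their methods, and the failure threshold are hypothetical names chosen for this example.

```python
import random


class ServiceRegistry:
    """Hypothetical in-memory stand-in for a discovery service."""

    def __init__(self):
        self._instances = {}  # service name -> list of network addresses

    def register(self, name, address):
        self._instances.setdefault(name, []).append(address)

    def lookup(self, name):
        instances = self._instances.get(name)
        if not instances:
            raise LookupError(f"no instance registered for {name!r}")
        # Callers refer to a service by name, not by network location.
        return random.choice(instances)


class CircuitBreaker:
    """Hypothetical reusable failure handling, kept out of the service code."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def is_open(self):
        return self.failures >= self.max_failures

    def call(self, func, *args):
        if self.is_open:
            # Fail fast instead of contacting a service that keeps failing.
            raise RuntimeError("circuit open: call rejected")
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # a successful call resets the failure counter
        return result
```

A service would look up its peer by name via the registry and wrap the remote call in breaker.call(...), so that neither discovery nor failure handling has to be reimplemented in each service.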
The third generation introduced service proxies as
transparent service intermediates with the intent to
improve software reusability. So-called sidecars encapsu-
late reusable service discovery and communication fea-
tures as self-contained services that can be accessed via
existing fault-tolerant communication libraries provided by
almost every programming language nowadays. Because
sidecars are designed as network intermediaries, they are well
suited for monitoring the behavior of all service interactions
in a microservice application. This intermediary role is
precisely the idea behind service mesh technologies such
as Linkerd (see Table 2). These tools extend the notion of
self-contained sidecars to provide a more integrated service
communication solution. Using service meshes, operators
have much more fine-grained control over the service-to-
service communication, including service discovery, load bal-
ancing, fault tolerance, message routing, and even security.
So, besides the pure architectural point of view, the fol-
lowing tools, frameworks, services, and platforms (see Table
2) form our current understanding of the term microservice.
• Service discovery technologies let services communicate with each other without explicitly referring to their network locations.
• Container orchestration technologies automate container allocation and management tasks, abstracting away the underlying physical or virtual infrastructure from service developers. That is the reason we see this technology as an essential part of any cloud-native application stack (see Figure 2).
• Monitoring technologies are often based on time-series databases to enable runtime monitoring and analysis of the behavior of microservice resources at different levels of detail.
• Latency and fault-tolerant communication libraries let services communicate more efficiently and reliably in permanently changing system configurations with plenty of service instances permanently joining and leaving the system according to changing request stimuli.
• Continuous delivery technologies integrate solutions, often into third-party services, that automate many of the DevOps practices typically used in a web-scale microservice production environment.
31
• Service proxy technologies encapsulate mainly communication-related features, such as service discovery and fault-tolerant communication, and expose them over HTTP.
• Finally, the latest service mesh technologies build on sidecar technologies to provide a fully integrated service-to-service communication monitoring and management environment.
Table 2 shows that a complex tool-chain evolved to
handle the continuous operation of microservice-based
cloud applications.
3.2.2 Serverless architectures. Serverless computing is a
cloud computing execution model in which the allocation
of machine resources is dynamically managed and inten-
tionally out of control of the service customer. The ability
to scale-to-zero instances is one of the critical differentia-
tors of serverless platforms compared with container
focused PaaS or virtual machine focused IaaS services.
Scale-to-zero makes it possible to avoid always-on components and
therefore excludes the most expensive cloud usage pattern,
according to Weinmann.
3
That might be one reason why
the term ‘‘serverless’’ has become more and more com-
mon since 2014.
28
However, what is ‘‘serverless’’ exactly?
Servers must still exist somewhere.
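The economic effect of scale-to-zero can be made concrete with a simple cost comparison. The sketch below assumes two simplified billing models (an hourly price for an always-on instance and a per-millisecond price for function runtime); the function names and all prices are hypothetical and chosen only for illustration.

```python
def always_on_cost(hours, price_per_hour):
    """Cost of a component that is billed whether or not requests arrive."""
    return hours * price_per_hour


def pay_per_use_cost(requests, ms_per_request, price_per_ms):
    """Cost of a scale-to-zero function that is billed only while it runs."""
    return requests * ms_per_request * price_per_ms


def break_even_requests(hours, price_per_hour, ms_per_request, price_per_ms):
    """Number of requests at which both models cost the same."""
    return always_on_cost(hours, price_per_hour) / (ms_per_request * price_per_ms)


# Hypothetical figures: one month (720 h) of an always-on instance versus
# a function that needs 100 ms per request.
monthly_always_on = always_on_cost(720, 0.05)
monthly_low_load = pay_per_use_cost(10_000, 100, 0.000002)
threshold = break_even_requests(720, 0.05, 100, 0.000002)
print(monthly_always_on, monthly_low_load, threshold)
```

Under these assumptions, a service with no requests costs nothing in the pay-per-use model, and the always-on model only becomes cheaper beyond the break-even request volume.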
So-called serverless architectures replace server admin-
istration and operation mainly by using FaaS concepts
32
and integrating third-party backend services. Figure 3
showed the evolution of how resource utilization has been
optimized over the last 10 years, ending in the latest trend
to make use of FaaS platforms. FaaS platforms apply
time-sharing principles and increase the utilization factor
of computing infrastructures, and thus avoid expensive
always-on components. As already mentioned, at least one
study showed that due to this time-sharing, serverless
architectures can reduce costs by 70%.
22
A serverless plat-
form is merely an event processing system (see Figure 6).
According to Baldini et al.,
33
serverless platforms take an
event (sent over HTTP or received from a further event
source in the cloud), then these platforms determine which
functions are registered to process the event, find an exist-
ing instance of the function (or create a new one), send the
event to the function instance, wait for a response, gather
execution logs, make the response available to the user,
and stop the function when it is no longer needed. Besides
application programming interface (API) composition and
aggregation to reduce API calls,
33
event-based applica-
tions are very much suited for this approach.
34
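The event processing steps listed above can be sketched in a few lines. The MiniFaaS class below is a hypothetical toy model and not the interface of any real FaaS platform; a "function instance" is simply a Python callable, and "cold" and "warm" only record whether an instance had to be created.

```python
class MiniFaaS:
    """Toy event processing system following the steps described above."""

    def __init__(self):
        self._handlers = {}   # event type -> registered function
        self._instances = {}  # event type -> running (warm) instance
        self.logs = []        # gathered execution logs

    def register(self, event_type, func):
        self._handlers[event_type] = func

    def dispatch(self, event_type, payload):
        # Determine which function is registered to process the event.
        func = self._handlers.get(event_type)
        if func is None:
            raise KeyError(f"no function registered for {event_type!r}")
        # Find an existing instance of the function or create a new one.
        cold_start = event_type not in self._instances
        if cold_start:
            self._instances[event_type] = func
        instance = self._instances[event_type]
        # Send the event to the instance and wait for the response.
        response = instance(payload)
        # Gather execution logs and make the response available.
        self.logs.append((event_type, "cold" if cold_start else "warm"))
        return response

    def scale_to_zero(self, event_type):
        # Stop the function instance when it is no longer needed.
        self._instances.pop(event_type, None)
```

For example, registering a function for a hypothetical "http.request" event and dispatching two events yields one cold and one warm invocation; after scale_to_zero, the next event triggers a cold start again.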
Serverless platform provision models can be grouped
into the following categories.
• Public (commercial) serverless services of public cloud service providers provide computational runtime environments, also known as FaaS platforms. Some well-known representatives include AWS Lambda, Google Cloud Functions, and Microsoft Azure Functions. All of the mentioned commercial serverless computing models are prone to create vendor lock-in (to some degree).
• Open (source) serverless platforms such as Apache’s OpenWhisk and OpenLambda might be an alternative, with the downside that these platforms need infrastructure.
• Provider agnostic serverless frameworks provide a provider and platform agnostic way to define and deploy serverless code on various serverless platforms or commercial serverless services. So, these frameworks are an option to avoid (or reduce) vendor lock-in without the necessity to operate their own infrastructure.
So, on the one hand, serverless computing provides
some inherent benefits, such as resource and cost effi-
ciency, operation simplicity, and a possible increase in
development speed and better time-to-market.
32
However,
serverless computing also comes with some noteworthy
Table 2. Some observable microservice engineering ecosystem components.
Ecosystem component | Example tools, frameworks, services, and platforms (last accessed 18 December 2019)
Service discovery | Zookeeper (https://zookeeper.apache.org), Eureka (https://github.com/Netflix/eureka), Consul (https://www.consul.io), etcd (https://github.com/coreos/etcd), Synapse (https://github.com/airbnb/synapse)
Container orchestration | Kubernetes (https://kubernetes.io), Mesos (http://mesos.apache.org), Swarm (https://docs.docker.com/engine/swarm), Nomad (https://www.nomadproject.io)
Monitoring | Graphite (https://graphiteapp.org), InfluxDB (https://github.com/influxdata/influxdb), Sensu (https://sensuapp.org), cAdvisor (https://github.com/google/cadvisor), Prometheus (https://prometheus.io), Elastic Stack (https://elastic.io/products)
Fault-tolerant communication | Finagle (https://twitter.github.io/finagle), Hystrix (https://github.com/Netflix/Hystrix), Proxygen (https://github.com/facebook/proxygen), Resilience4j (https://github.com/resilience4j)
Continuous delivery services | Ansible (https://ansible.com), Circle CI (https://circleci.com/), Codeship (https://codeship.com/), Drone (https://drone.io), Spinnaker (https://spinnaker.io), Travis CI (https://travis-ci.org/)
Service proxy | Envoy (https://www.envoyproxy.io)
Service meshes | Linkerd (https://linkerd.io), Istio (https://istio.io)
drawbacks, such as runtime constraints, state constraints,
and still unsatisfactorily solved function composition prob-
lems, such as the double spending problem (see Figure 5).
What is more, resulting serverless architectures have secu-
rity implications. They increase attack surfaces and shift
parts of the application logic (service composing) to the
client-side (which is not under complete control of the ser-
vice provider). Furthermore, FaaS increases vendor lock-
in problems and client complexity, as well as integration
and testing complexity.
Furthermore, Figure 7 shows that serverless architec-
tures (and microservice architectures as well) require a
cloud application architecture redesign, compared to tradi-
tional e-commerce applications. Much more than micro-
service architectures, serverless architectures integrate
third-party backend services, such as authentication or
database services, intentionally. Functions on FaaS plat-
forms provide only very service specific, security relevant,
or computing intensive functionality. All functionality that
would have been provided classically on a central
Figure 6. Blueprint of a serverless platform architecture. FaaS: Function as a Service; API: application programming interface.
Figure 7. Serverless architectures result in a different and less centralized composition of application components and backend
services compared with classical tiered application architectures. API: application programming interface; FaaS: Function as a Service;
BaaS: Backend as a Service.
application server is now provided as many isolated
micro- or even nanoservices. The integration of all these
isolated services as meaningful end-user functionality is
delegated to end devices (very often in the shape of native
mobile applications or progressive web applications). In
summary, we can see the following observable engineer-
ing decisions in serverless architectures.
• Former cross-sectional but service-internal logic, such as authentication or storage, is sourced to external third-party services.
• Even nano- and microservice composition is shifted to end-user clients or edge devices. This means that even service orchestration is not done anymore by the service provider itself but by the service consumer via provided applications. This end-user orchestration has two interesting effects: (1) the service consumer now provides resources needed for service orchestration; (2) because the service composition is done outside the scope of the FaaS platform, still unsolved FaaS function composition problems (such as the double spending problem) are avoided.
• Such client or edge devices are interfacing third-party services directly.
• Endpoints of service-specific functionality are provided via API gateways. So, HTTP- and REST-based/REST-like communication protocols are generally preferred.
• Only very domain- or service-specific functions are provided on FaaS platforms. This is mainly when this functionality is security relevant and should be executed in a controlled runtime environment by the service provider, or the functionality is too processing- or data-intensive to be executed on consumer clients or edge devices, or the functionality is so domain-, problem-, or service-specific that simply no external third-party service exists.
Finally, the reader might observe the trend in serverless
architectures that this kind of architecture is more decen-
tralized and distributed, makes more intentional use of
independently provided services, and is therefore much
more intangible (more cloudy) compared with microser-
vice architectures.
4. Impacts on the Modeling and
Simulation as a Service domain
The impacts on MSaaS are presented from several points
of view. Section 4.1 will present several example use
cases to derive some implications for cloud-native simulations
(CNSs; see Section 4.2). Section 4.3 will explain
how these implications have been considered in a CNS
reference model (see Figure 8). In addition, Section 4.4
will discuss some limitations that should be considered to
raise the overall maturity level of CNSs (Section 4.5).
4.1 Example cloud simulation use cases
There exist several examples and investigations of simulation
models that have been successfully deployed to public
cloud computing infrastructures.
35
• A cloud-based distributed agent-based traffic simulator named Megaffic.
36
• The Scalable Electro-Mobility Simulation Cloud Service was used to study the impact of large-scale electromobility on a city’s infrastructure.
37
• The D-Mason framework is a parallel version of the Mason library for writing and running distributed agent-based simulations.
38
• GridSpice is a cloud-based simulation platform for distributed smart power grid simulation.
39
• The British Army is investigating the potential of virtual reality (VR), machine learning, and cloud computing for the Army’s Collective Training Transformation Programme (CTTP; https://bit.ly/2r6oL51). A series of training events aim to demonstrate VR and mixed reality (MR) capabilities. To display the potential benefits of data capture and machine learning-driven analytics for military training, subcontractors will also show the use of cloud computing in this context.
Guzzetti et al.
40
investigated the impact of different
high-performance computing (HPC) platforms for numeri-
cal simulation in computational hemodynamics with the
LiFEV (Library for Finite Elements). They compared in-
house computing clusters, a large-scale university-based
HPC cluster, and a regional supercomputer with public
clouds. According to their results, cloud computing can be
utilized for scientific computational fluid dynamics (CFD)
simulations, possibly at lower cost/performance than using
a more expensive local computing cluster. Ledyayev and
Richter
41
evaluated the private cloud solution OpenStack
by using three case studies in transportation modeling
(network optimization), high-energy physics (Monte
Carlo), and materials simulation (CFD). They concluded
that cloud computing is suitable for multiple runs of
non-concurrent code but needs specialist hardware to
support parallel processing.
4.2 Implications for cloud-native simulations
If we analyze these examples, we see that cloud-based
simulations are possible, even for large-scale problems.
However, economies of scale depend heavily on the kind of
simulation and the parallelization approach used for processing.
Summarizing our central insights of Section 3, we get the
following lessons learned from the cloud-native domain
that can be transferred to simulation contexts. First of all,
if a cloud-native application is an application that is composed
of services, then correspondingly, a CNS would be a
simulation composed of small, independently deployable,
and replaceable simulation services that (UNIX-like) simulate
‘‘one thing well’’ and can be scaled horizontally to
enable parallel processing.
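What horizontal scaling of such a simulation service could look like is sketched below with a deliberately simple example. It assumes a hypothetical service that estimates pi by Monte Carlo sampling; threads only stand in for independent service instances (in CPython they provide no real speed-up for this workload), and all function names are illustrative.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_replica(seed, samples=20_000):
    """One service instance: an independent Monte Carlo run with its own seed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits, samples


def run_scaled_out(replicas, samples_per_replica=20_000):
    """Scale out: start several replicas and merge their partial results."""
    with ThreadPoolExecutor(max_workers=replicas) as pool:
        results = list(
            pool.map(lambda seed: simulate_replica(seed, samples_per_replica),
                     range(replicas))
        )
    hits = sum(h for h, _ in results)
    total = sum(n for _, n in results)
    return 4.0 * hits / total  # Monte Carlo estimate of pi


print(run_scaled_out(replicas=4))
```

Because every replica only depends on its seed and returns a partial result, adding replicas increases the sample size without any coordination between the instances.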
Consequently, existing (monolithic) simulations must
be migrated into microservice architectures and would
evolve somehow from a cloud-ready into a cloud-native
maturity level (see Table 3). Cloud-native application engi-
neering showed that it is rarely possible to transfer existing
applications one to one into cloud environments without
reengineering.
As the reader will notice, we will propose a CNS stack
(Figure 8) that is deeply based on the already introduced
cloud-native stack (Figure 2). Consequently, correspond-
ing CNS engineering trends (Table 4) are derived from the
general cloud-native engineering trends (Table 1). The
cloud-native simulation stack as well as the corresponding
engineering trends have been compiled systematically by
‘‘replacing’’ general cloud-native concepts with more spe-
cific CNS concepts. That is because we assume that a
CNS is a particular cloud-native application (with some
specific requirements). However, this eliminated some
already discussed features and software trends, for exam-
ple, the observable DevOps trend is a general software
engineering trend. We do not see specific impacts on
simulation service engineering here that go beyond stan-
dard software engineering. However, that does not mean
that this trend should not be applied in simulation service
Table 3. Cloud simulation maturity model (adapted from the Open Data Center Alliance).
Level | Maturity | Criteria
3 | Cloud native | Simulations are transferable across infrastructure providers at runtime and without interruption of service; simulation services are automatically scaled out/in based on stimuli.
2 | Cloud resilient | The state of simulation services is isolated in a minimum of services; simulations are unaffected by dependent service failures; simulations are infrastructure agnostic.
1 | Cloud friendly | Simulations are composed of loosely coupled simulation services; simulation services are discoverable by name; simulation services are designed to cloud patterns; compute and storage are separated.
0 | Cloud ready | Simulations can be operated on virtualized infrastructure; simulations can be instantiated from image or script.
Figure 8. Proposal of a cloud-native simulation stack. FaaS: Function as a Service; IaaS: Infrastructure as a Service.
engineering. It only means that we do not see simulation-
specific problems here.
The same is true for cloud modeling and cloud simula-
tion tools (such as CloudSim) to represent and analyze
cloud architectures. At first glance, it seems obvious to
cover these tools as well. However, we do not see that
these tools are relevant to the research of CNSs in general,
except for the case that cloud simulations should be run as
CNSs. However, simulating cloud infrastructures is
simply a particular object of investigation. This paper
intentionally does not focus on
simulation-specific objects of investigation.
We do not have a common definition that explains what
a CNS exactly is. Nevertheless, we use our experiences
with cloud-native applications to derive a definition pro-
posal for CNSs. If we assume that a CNS is a special kind
of a cloud-native application we should consider the fol-
lowing aspects.
Fehling et al.
42
postulate that almost all cloud-native
systems should be IDEAL: They [i]solate their state, they
are [d]istributed in their nature, they are [e]lastic in a hor-
izontal scaling notion, they are operated on [a]utomated
management systems, and their components are [l]oosely
coupled. According to Stine,
43
there are common motiva-
tions for cloud-native architectures, such as to deliver
software-based solutions more quickly (speed), in a more
fault isolating, fault tolerating, and automatic recovering
way (safety), to enable horizontal (instead of vertical)
application scaling (scale), and finally to handle a diver-
sity of consumer platforms and legacy systems (client
diversity).
Several application architectures and infrastructure
approaches address these common motivations.
• Microservices represent the decomposition of monolithic systems into independently deployable services that do ‘‘one thing well.’’
44,45
• The primary mode of interaction between services in a cloud-native application architecture is via published and versioned APIs (API-based collaboration). These APIs are often HTTP based and follow a REST-style with JSON serialization, but other protocols and serialization formats can be used as well.
• Single deployment units of the architecture are designed and interconnected according to a collection of cloud-focused patterns, such as the twelve-
Table 4. Expectable software engineering trends for cloud-native simulation services.
CNA trend | Impact on MSaaS
Microservices | Simulation architectures should be composed of small and independently replaceable horizontally scalable simulation services that are ‘‘simulating one thing well.’’
Modeling languages | Existing simulation modeling languages should be extended to define the composition of simulation services and their elasticity behavior.
Standardized deployment units | Simulation deployment units should wrap a piece of simulation software in a complete file system that contains everything needed to run: code, runtime, system tools, system libraries. So, it is guaranteed that the software will always run the same, regardless of its environment. This deployment approach can be realized using standardized container technologies (OCI standard).
Elastic platforms | Simulation platforms should evolve into a unifying middleware of cloud infrastructures. Such platforms extend resource sharing and increase the utilization of underlying compute, network, and storage resources for custom but standardized simulation deployment units.
Serverless | Serverless simulation would denote an architectural style for cloud-based simulations that depend heavily on external third-party simulation services and integrate them via small event-triggered functions (Function as a Service, FaaS).
State isolation | Stateless simulation services are easier to scale up/down horizontally than stateful simulation services. Of course, stateful components cannot be avoided, but stateful components should be reduced to a minimum and realized by intentionally horizontally scalable storage systems (often eventually consistent NoSQL databases).
Versioned REST APIs | If simulation services provide versioned REST APIs this inherently provides a scalable and pragmatic communication. Such a kind of simulation service communication relies mainly on already existing internet infrastructure and well-defined and widespread standards. It would even enable the seamless integration of simulation services that are not ‘‘cloud-native’’ but only ‘‘internet accessible.’’
Loose coupling | Simulation service composition can be done by events or by data. Event coupling in ‘‘normal cloud-native applications’’ relies on messaging solutions (e.g., AMQP standard). Data coupling relies on scalable but (mostly) eventually consistent storage solutions (which are often subsumed as NoSQL databases).
CNA: cloud-native architecture; MSaaS: Modeling and Simulation as a Service; API: application programming interface; AMQP: Advanced Message Queueing Protocol.
factor app collection,
46
the circuit breaker pattern,
47
or cloud computing patterns.
42,48
• More and more often elastic container platforms are used to deploy and operate these microservices via self-contained deployment units (containers). These platforms provide additional operational capabilities on top of IaaS infrastructures, such as automated and on-demand scaling of application instances, application health management, dynamic routing, load balancing, and aggregation of logs and metrics.
4.3 A cloud-native simulation reference model
These aspects let us derive the following understanding
of a CNS system and the corresponding CNS stack
(Figure 8).
The core design idea of plenty of cloud-native applica-
tion architectures inspires the leading conceptual approach
of the derived CNS stack (Figure 8). Every simulation on
Layer 4 (or service) should be composable of stateless
Layer 3 simulation services that rely on services managing
and encapsulating simulation state. This separation of con-
cerns (simulation logic and simulation state) makes it pos-
sible for distributed simulations to decide for eventual or
strict consistency models for the simulation state.
This separation also enables seamless horizontal scalability of
functional simulation services and is a widespread and proven
pattern in cloud-native application architectures,
according to our experiences.
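The separation of simulation logic and simulation state can be illustrated with a minimal sketch. It assumes a trivial constant-velocity model; the StateStore class is a hypothetical in-memory stand-in for a stateful service (for example, a database), and all names are chosen for this example only.

```python
class StateStore:
    """Hypothetical stateful service that encapsulates the simulation state."""

    def __init__(self):
        self._data = {}

    def load(self, sim_id):
        default = {"time": 0.0, "position": 0.0, "velocity": 1.0}
        return dict(self._data.get(sim_id, default))

    def save(self, sim_id, state):
        self._data[sim_id] = dict(state)


def step(state, dt):
    """Stateless simulation logic: the new state depends only on the inputs."""
    return {
        "time": state["time"] + dt,
        "position": state["position"] + state["velocity"] * dt,
        "velocity": state["velocity"],
    }


def handle_step_request(store, sim_id, dt):
    """Any replica of the stateless service can process any request."""
    state = store.load(sim_id)
    new_state = step(state, dt)
    store.save(sim_id, new_state)
    return new_state
```

Since handle_step_request keeps nothing between calls, replicas of the simulation logic are interchangeable, and the consistency model becomes a property of the state store alone.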
Another general cloud application architecture best
practice is the standardization of the deployment of Layer
3 simulation services. This deployment standardization via
a Layer 2 elastic simulation platform enables one to oper-
ate plenty of services on the same physical or virtual Layer
1 hardware. In the general cloud computing context, this is
customarily done via container-based technologies. A container
is nothing more than a self-contained deployment
unit encapsulating all its runtime dependencies. It very often
exposes its functionality via REST-based interfaces.
Such containers can be operated on corresponding container
platforms, such as Kubernetes, Mesos, Docker
Swarm, and more. These kinds of platforms are application-agnostic
and can be used for arbitrary types of applications.
Consequently, they can be used for simulation
services as well. Therefore, we recommend using these
kinds of building blocks for elastic simulation platforms
(Layer 2).
What is more, the proposed CNS stack aligns with the
general design principles of distributed and federated simulations
that have been successfully standardized, for example,
via the High Level Architecture (HLA). HLA is a standard
for distributed simulation, used when building a simulation
for a larger purpose by combining (federating) several
simulations. HLA requires a runtime infrastructure (RTI)
that provides a standardized set of services, as specified in
the HLA Federate Interface Specification. This RTI is
closely aligned with Layer 2 of the proposed reference model.
Further HLA services of the interface specification can be
mapped to our model as well:
• federation management services → simulation deployment unit orchestrator;
• object and ownership management services → stateful simulation services;
• time management services → stateful simulation services;
• declaration and data distribution services → existing messaging solutions (e.g., all Advanced Message Queueing Protocol (AMQP) message brokers) can be deployed on Layer 3 by the simulation deployment unit orchestrator alongside further simulation-specific services.
4.4 Discussion of limitations
We have to admit that this mapping stays vague at this
level of abstraction. So, more common detailed cross-
functional simulation services (such as timing or messa-
ging services) on Layer 3 could (and should) be defined in
future MSaaS work. The CNS stack does not
require a specific time or messaging simulation service (or
other services). Instead, it recommends providing such
services in a programming language-agnostic way, as
microservice approaches do via HTTP and REST-based
versioned APIs. To make use of such common internet
communication standards would efficiently tackle one
downside of current HLA-based approaches.
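A versioned, language-agnostic interface of this kind can be sketched as a small request router. The example below is a minimal illustration without a real HTTP server: the paths /v1/step and /v2/step, the payload fields, and the handler names are hypothetical and do not follow any existing simulation standard.

```python
import json


def step_v1(body):
    """Version 1 of a hypothetical time-advance service."""
    return {"time": body["time"] + body["dt"]}


def step_v2(body):
    """Version 2 adds a field; clients of version 1 remain unaffected."""
    return {"time": body["time"] + body["dt"], "unit": "s"}


# The version is part of the resource path, so both versions can coexist.
ROUTES = {
    ("POST", "/v1/step"): step_v1,
    ("POST", "/v2/step"): step_v2,
}


def handle(method, path, raw_body):
    """Map a request to a handler and return (status code, JSON body)."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return 404, json.dumps({"error": "unknown endpoint"})
    return 200, json.dumps(handler(json.loads(raw_body)))


status, body = handle("POST", "/v1/step", '{"time": 1.0, "dt": 0.5}')
print(status, body)
```

Because requests and responses are plain JSON documents, a client written in any programming language can use such a service, which is the property that API-bound RTI libraries lack.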
Because HLA is a message-oriented middleware that
defines a set of services, mostly provided by a C++ or
Java API, there is no standardized on-the-wire protocol. In
consequence, participants in a federation are very often
bound to RTI libraries from the same provider and usually
also of the same version for applications to interoperate.
The resulting simulations are mostly so-called deployment
monoliths.
Instead, the cloud-native application stack would not prescribe a specific set of services but rather the way to interface every simulation service in an Internet standard-conforming manner. Each simulation service would be a self-contained deployment unit encapsulating all its runtime dependencies. Every simulation service on Layer 3 and higher should be integrated using standardized internet protocols, such as HTTP and REST-based approaches.
A CNS is a distributed, elastic, and horizontally scalable simulation system composed of simulation (micro)services that isolate the state in a minimum of stateful components. The self-contained deployment units of that simulation are designed according to cloud-focused design patterns and operated on self-service elastic simulation platforms.
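The recommendation to integrate simulation services via HTTP and versioned REST APIs can be illustrated with a minimal sketch. It assumes a hypothetical time-advance service: the route `/v1/time/advance`, the JSON fields `state` and `step`, and all function names are illustrative assumptions and are not defined by the CNS stack or by MSaaS. Only the Python standard library is used.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

API_VERSION = "v1"  # hypothetical version prefix; a later "v2" could coexist


def advance_time(state, step):
    """Pure function: return a new state with the simulation time advanced."""
    return {**state, "time": state["time"] + step}


def route(method, path, body):
    """Map an HTTP request to (status code, JSON-serializable payload)."""
    if method == "POST" and path == f"/{API_VERSION}/time/advance":
        try:
            new_state = advance_time(body["state"], float(body["step"]))
        except (KeyError, TypeError, ValueError):
            return 400, {"error": "expected JSON with 'state.time' and 'step'"}
        return 200, {"state": new_state}
    return 404, {"error": f"unknown route {method} {path}"}


class SimulationHandler(BaseHTTPRequestHandler):
    """Thin HTTP adapter; the service keeps no state between requests."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:
            body = {}
        status, payload = route("POST", self.path, body)
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8080):
    """Run the service locally (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), SimulationHandler).serve_forever()
```

Calling `serve()` starts the service. Any client that speaks HTTP and JSON can then use it regardless of its own implementation language, which is the kind of language-agnostic interface the text argues for.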
We also have to consider that cloud computing has traditionally not been HPC or simulation oriented. It is often mentioned that current Internet of Things (IoT) approaches are, therefore, being implemented using Fog or Edge Computing approaches. However, this has more to do with communication latencies outside the scope of the regional datacenters of the hyperscalers. Nevertheless, simulations are often based on message passing or data analysis, and the current cloud approaches or MSaaS could be the wrong choice. Recent studies by NASA showed that “in all cases, the full cost of running on NASA on-premises resources was less than the lowest-possible compute-only cost of running on AWS.”⁴⁹ This NASA study showed that cloud-based simulations tend to be between two and even 12 times more expensive than simulations operated on on-premises simulation facilities.
This sounds disillusioning. However, the study did not take into account that cloud-based simulations should follow a different kind of architectural style to leverage the economic benefits. Classical simulations are very often so-called deployment monoliths. This kind of architecture is hardly scalable and does not fit the general pay-as-you-go business model of cloud computing. Insights into the cloud-native application domain show that cloud-native deployments should make use of more fine-grained and horizontally scalable units to optimize resource usage. If this can be done (and this might be simulation specific), then other studies by Villamizar et al.²² show that costs could be reduced to 25%. So, if we take both kinds of investigation into account, MSaaS would be a reasonable option for simulations that the NASA cost comparison methodology rates as two to four times more expensive in the cloud, because a reduction to 25% brings a factor of four back to cost parity. That is exactly what Taylor et al. are stating.³⁵ However, not all simulations will benefit equally from MSaaS.
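The break-even argument can be made explicit with a small calculation. The sketch below assumes, as a strong simplification, that the reduction to 25% reported for web applications transfers unchanged to a simulation and simply multiplies the cost factor measured for the monolithic deployment. The function names are illustrative.

```python
def cloud_native_cost_factor(monolith_factor, remaining_share=0.25):
    """Cloud cost relative to on-premises after a cloud-native redesign.

    monolith_factor: how many times more expensive the unchanged
                     (monolithic) simulation is in the cloud.
    remaining_share: share of the cost that remains after the redesign
                     (0.25 means "reduced to 25%"); assumed transferable.
    """
    return monolith_factor * remaining_share


def break_even_factor(remaining_share=0.25):
    """Largest monolith cost factor that still reaches cost parity."""
    return 1.0 / remaining_share


for factor in (2, 4, 12):
    print(factor, cloud_native_cost_factor(factor))
# 2 0.5
# 4 1.0
# 12 3.0
```

Under these assumptions, simulations rated up to four times more expensive reach parity or better, while a simulation rated 12 times more expensive would still cost three times as much as on-premises operation.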
What is more, the reader should take into account that plenty of small companies, organizations, or independent researchers do not have access to simulation facilities comparable to NASA’s High-End Computing Capability (HECC) project. For those who do not have access to such supercomputing facilities, cost and performance comparisons make little sense.
4.5 How to raise the cloud-native maturity of simulations
The following points should be considered to reach such a cloud-native level.
• We need standardized deployment units for simulation components (services) and a standardized platform to operate them. Furthermore, the deployment units should provide a better operating density than virtual machines.
• Event-triggered FaaS-based simulation services can be considered to avoid expensive always-on components.
• FaaS-based simulation services will likely change the architecture of simulations to avoid the double spending problem (see Figure 5). Furthermore, runtime limitations of FaaS functions, start-up latencies, and state preservation must be considered and might limit the applicability for special kinds of time-sensitive simulations.
• Horizontal scalability in cloud-native applications is mostly realized via loosely coupled (event-based) microservice approaches. These scalability requirements should be considered for CNS architectures as well.
• That raises the need for simulation service meshes that connect, secure, control, and observe simulation services to enable loose coupling of simulation services.
• Problems in composing FaaS-based simulation services might raise the need for domain-specific languages (DSLs) that compose different kinds of (stateful, stateless, serverless) simulation services without friction.
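The points on event-triggered FaaS services, runtime limits, and state preservation can be sketched as a stateless handler that loads its state from an external store, advances the simulation by a bounded number of steps, and writes the state back. This is an illustration under stated assumptions, not a provider API: the event fields, the in-memory `STATE_STORE` (standing in for an external database or object store), the toy motion model, and `MAX_STEPS_PER_INVOCATION` are all hypothetical.

```python
import json

# Hypothetical stand-in for an external state store (database, object store).
# The handler itself keeps nothing in memory between invocations.
STATE_STORE = {}

# Hypothetical bound that keeps one invocation within a FaaS runtime limit.
MAX_STEPS_PER_INVOCATION = 1000

INITIAL_STATE = {"time": 0.0, "position": 0.0, "velocity": 1.0}


def load_state(simulation_id):
    """Fetch the serialized state, or start from the initial state."""
    raw = STATE_STORE.get(simulation_id)
    return json.loads(raw) if raw is not None else dict(INITIAL_STATE)


def save_state(simulation_id, state):
    """Persist the state so that the next invocation can continue."""
    STATE_STORE[simulation_id] = json.dumps(state)


def handle_event(event):
    """Event-triggered entry point: advance one simulation run.

    The event is assumed to carry 'simulation_id' and optionally
    'steps' and 'dt'.
    """
    simulation_id = event["simulation_id"]
    dt = event.get("dt", 0.5)
    steps = min(event.get("steps", 1), MAX_STEPS_PER_INVOCATION)

    state = load_state(simulation_id)
    for _ in range(steps):
        # Toy model: constant-velocity motion.
        state["position"] += state["velocity"] * dt
        state["time"] += dt
    save_state(simulation_id, state)
    return state
```

Because all state lives in the store, no component has to stay always-on between events, and a long run can be split into several short invocations that each stay below the platform's runtime limit.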
For operational convenience, the operation of cloud-
based simulation platforms and the operation of simula-
tions on top of these platforms should be handled as two
independent but complementary engineering problems.
Corresponding and simulation-specific engineering
trends are listed in Table 4 for the convenience of the
reader.
5. Related work
As far as the authors know, no survey has focused intentionally on observable trends in cloud computing over the last decade from a “big picture” and evolutionary point of view. This paper grouped that evolution into the following points of view:
• resource utilization optimization approaches, such as the containerization and FaaS approaches;
• the architectural evolution of cloud applications via microservices and serverless architectures.
For all four of these specific aspects (containerization, FaaS, microservices, serverless architectures) there exist surveys that should be considered by the reader. The studies and surveys²³,⁵⁰–⁵² deal mainly with containerization and its accompanying resource efficiency. Although FaaS is quite young and is not reflected extensively in research so far, there exist first survey papers³³,⁵³–⁵⁶ dealing with FaaS approaches and deriving some open research questions regarding tool support, performance, patterns for serverless solutions, enterprise suitability, and whether serverless architectures will extend beyond traditional cloud platforms and architectures.
Service composition provides value-adding and higher order services by composing basic services that can even be provided pervasively by various organizations.⁵⁷,⁵⁸ What is more, service computing is quite established, and there are several surveys on SOA-related aspects.⁵⁹–⁶³ However, more recent studies focus mainly on microservices. Dragoni et al.,²⁴ Jamshidi et al.,²⁸ and Cerny et al.⁶⁴ focus on the architectural point of view and the relationship between SOA and microservices. All of these papers help to better understand the current microservice “hype,” and studying them is highly recommended. However, these papers are bound to microservices and do not take the “big picture” of general cloud applications and simulation architecture evolution into account. Very often, serverless architectures are subsumed as a part of microservices to some degree. The authors suspect that serverless architectures might introduce fundamentally new aspects into cloud simulation architectures that evolve from the “scale-to-zero” capability on the one hand and the unsolved function composition aspects (such as the double spending problem) on the other hand.
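The double spending problem in function composition can be illustrated with a small billing model. The sketch assumes that the problem corresponds to what Baldini et al.²⁷ discuss as double billing: a function that synchronously invokes another function is billed for its own work and for the time it spends waiting. The durations, the class, and the function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    """A FaaS function with its own work and synchronous downstream calls."""
    name: str
    own_ms: int                      # time spent on the function's own work
    sync_calls: list = field(default_factory=list)


def wall_time_ms(fn):
    """Run time of one invocation, including waiting for synchronous calls."""
    return fn.own_ms + sum(wall_time_ms(c) for c in fn.sync_calls)


def billed_ms(fn):
    """Billed time: every function pays for its full wall time."""
    return wall_time_ms(fn) + sum(billed_ms(c) for c in fn.sync_calls)


def compute_ms(fn):
    """Time in which some function actually performs work."""
    return fn.own_ms + sum(compute_ms(c) for c in fn.sync_calls)


b = Function("B", own_ms=300)
a = Function("A", own_ms=100, sync_calls=[b])

print(compute_ms(a), billed_ms(a))
# 400 700
```

In this toy case, 400 ms of actual work leads to 700 ms of billed time, because A is charged for the 300 ms it waits for B. Event-based composition, in which A finishes after emitting an event that triggers B, avoids this overhead.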
M&S products are highly valuable to NATO and the military, and it is essential that M&S products, data, and processes are conveniently accessible to a large number of users as often as possible. Therefore, a new M&S ecosystem is required where M&S products can be accessed simultaneously and spontaneously by a large number of users for their individual purposes. This “as-a-service” paradigm has to support stand-alone use as well as the integration of multiple simulated and real systems into a unified simulation environment whenever the need arises. Several approaches head in this direction.⁶⁵–⁶⁷ The “Allied Framework for MSaaS” is the common approach of NATO and the nations toward implementing MSaaS and is defined by the following documents.¹
• Operational Concept Document (OCD): the OCD describes the intended use, key capabilities, and desired effects of the Allied Framework for MSaaS from a user’s perspective.
• Technical Reference Architecture: the Technical Reference Architecture describes the architectural building blocks and patterns for realizing MSaaS capabilities.
• Governance Policies: the Governance Policies identify MSaaS stakeholders and relationships and provide guidance for implementing and maintaining the Allied Framework for MSaaS.
The MSaaS Technical Reference Architecture⁶⁸ is most important in the context of this paper. It provides technical guidelines, recommended standards, architecture building blocks, and architecture patterns that should be considered in realizing MSaaS capabilities. Compared to the CNS maturity model, the MSaaS Technical Reference Architecture provides guidance on a higher level (building blocks, patterns, and more), but does not explicitly define how to implement those building blocks (e.g., as microservice or FaaS).
6. Conclusion
Because the inherent cost structure of cloud computing stays the same for simulation services, we forecast that CNS architectures will follow similar trends to cloud-native applications. These trends should make CNS services more microservice-like, or even nanoservice-like (similar to Functions as a Service). To leverage the opportunities of cloud computing, they should be composed of smaller, more fine-grained, and domain-specific simulation services that just “simulate one thing well.” Simulation services should strive to become stateless or isolate state in a minimum of stateful components. As the inherent nature of simulations deeply relies on state (data), this focus on state might even raise some problems that are not so common in “normal” cloud-native application design and need new solutions to be developed. “Classical” cloud-native applications come along with 24×7 requirements. These 24×7 requirements are very often not necessary for simulations. This might provide short-cut opportunities in a cloud maturity model for simulation services. A more detailed analysis of how trends and approaches from cloud-native applications might be applied to CNSs, and which new challenges might arise, would be valuable future research.
Recent studies by NASA showed that cloud-based
simulations tend to be between two and even 12 times
more expensive than simulations operated on on-premises
simulation facilities. However, this study did not take into
account that cloud-based simulations should follow a dif-
ferent kind of architectural style to leverage the economic
benefits of cloud infrastructures. If this cloud-native rear-
chitecting of simulations can be done (and this might be
simulation specific), then other studies show that costs
could be reduced to 25%. So, if we are taking both kinds
of investigations into account, MSaaS would be a reason-
able option for maybe not all but a significant portion of
simulations.
What is more, plenty of small companies, organizations, or independent researchers do not have access to simulation facilities comparable to NASA’s HECC project. For all those who do not have access to supercomputing facilities, such cost and performance comparisons make little sense. Cloud computing might be the only viable option here.
Funding
The authors disclosed receipt of the following financial
support for the research, authorship, and/or publication of
this article: The CloudTRANSIT project of Nane Kratzke
has been funded by the German Ministry of Education and
Research (13FH021PX4). The NMSG activities of Robert
Siegfried have been funded by the German Federal Office
of Bundeswehr Equipment, Information Technology and
In-Service Support (BAAINBw).
ORCID iD
Nane Kratzke https://orcid.org/0000-0001-5130-4969
References
1. NATO STO. Modelling and Simulation as a Service
(MSaaS) - Rapid Deployment of Interoperable and Credible
Simulation Environments. Technical report, AC/323(MSG-
136)TP/826, NATO Science and Technology Organization
(STO), 2018.
2. Siegfried R, McGroarty C, Lloyd J, et al. A new reality:
Modelling & Simulation as a Service. CSIAC J Cyber Secur
Inform Syst 2018; 6, https://www.csiac.org/journal-article/a-
new-reality-modelling-simulation-as-a-service/
3. Weinman J. Mathematical proof of the inevitability of cloud computing, http://www.JoeWeinman.com/Resources/Joe_Weinman_Inevitability_Of_Cloud.pdf (2011, accessed 10 July 2018).
4. Mell PM and Grance T. The NIST definition of cloud com-
puting. Technical report, National Institute of Standards &
Technology, Gaithersburg, MD, USA, 2011.
5. Kratzke N and Quint PC. Technical report of project CloudTRANSIT - Transfer cloud-native applications at runtime. Technical report, Lübeck University of Applied Sciences, 2018, https://doi.org/10.2314/KXP:1678556971
6. OASIS. Advanced Message Queuing Protocol (AMQP), version 1.0, http://www.amqp.org/sites/amqp.org/files/amqp.pdf (2011, accessed 16 December 2019).
7. Kratzke N. Lightweight virtualization cluster – how to over-
come cloud vendor lock-in. J Comput Commun 2014; 2:
50326.
8. Kratzke N, Quint PC, Palme D, et al. Project CloudTRANSIT - or to simplify cloud-native application provisioning for SMEs by integrating already available container technologies. In: Kantere V and Koch B (eds) European project space on smart systems, big data, future internet - towards serving the grand societal challenges, 2016, pp.3–26. Setúbal, Portugal: SCITEPRESS.
9. Hogan M, Fang L, Sokol A, et al. Cloud Infrastructure
Management Interface (CIMI) model and RESTful HTTP-
based protocol, https://www.iso.org/standard/66296.html
(2015, accessed 16 December 2019).
10. Nyren R, Edmonds A, Papaspyrou A, et al. Open Cloud
Computing Interface (OCCI) - core, version 1.1, https://
www.ogf.org/documents/GFD.183.pdf (2011, accessed 16
December 2019).
11. Metsch T and Edmonds A. Open Cloud Computing Interface
(OCCI) - infrastructure, version 1.1 https://www.ogf.org/doc
uments/GFD.184.pdf (2011, accessed 16 December 2019).
12. SNIA. Cloud Data Management Interface (CDMI), version
1.1, http://www.snia.org/sites/default/files/CDMI_Spec_
v1.1.1.pdf (2015, accessed 16 December 2019).
13. System Virtualization, Partitioning, and Clustering Working
Group. Open Virtualization Format specification, version
2.1.0, https://www.dmtf.org/sites/default/files/standards/doc-
uments/DSP0243_2.1.1.pdf (2015, accessed 16 December
2019).
14. OCI. Open Container Initiative, https://www.opencontainers.
org (2015, accessed 4 February 2016).
15. OASIS. Topology and Orchestration Specification for Cloud
Applications (TOSCA), version 1.0, http://docs.oasis-open.org/
tosca/TOSCA/v1.0/os/TOSCA-v1.0-os.pdf (2013, accessed 16
December 2019).
16. Opara-Martins J, Sahandi R and Tian F. Critical review of
vendor lock-in and its impact on adoption of cloud comput-
ing. In: International conference on information society
(i-Society 2014), London, 10–12 November 2014, pp.92–97.
IEEE.
17. Kratzke N and Peinl R. ClouNS - a cloud-native application
reference model for enterprise architects. In: 2016 IEEE
20th international enterprise distributed object computing
Workshop (EDOCW), Vienna, 5–9 September 2016, pp.1–
10. IEEE.
18. Bohn RB, Messina J, Liu F, et al. NIST cloud computing
reference architecture. In: world congress on services
(SERVICES 2011), Washington, DC, 4–9 July 2011,
pp.594–596. Washington, DC: IEEE Computer Society.
19. Quint PC and Kratzke N. Overcome vendor lock-in by inte-
grating already available container technologies - towards
transferability in cloud computing for SMEs. In: proceedings
of 7th international conference on cloud computing, grids
and virtualization (CLOUD COMPUTING 2016) (eds CB
Westphall, YW Lee and S Rass), pp.38–41.
20. Ardagna D, Casale G, Ciavotta M, et al. Quality-of-service
in cloud computing: modeling techniques and their applica-
tions. J Internet Serv Appl 2014; 5: 11.
21. White G, Nallur V and Clarke S. Quality of service
approaches in IoT: a systematic mapping. J Syst Softw 2017;
132: 186–203.
22. Villamizar M, Garcés O, Ochoa L, et al. Cost comparison of running web applications in the cloud using monolithic, microservice, and AWS Lambda architectures. Service Orient Comput Appl. Epub ahead of print 27 April 2017. DOI: 10.1007/s11761-017-0208-y.
23. Pahl C, Brogi A, Soldani J, et al. Cloud container technolo-
gies: a state-of-the-art review. IEEE Trans Cloud Comput
2017; 7: 1–1.
24. Dragoni N, Giallorenzo S, Lafuente AL, et al. Microservices:
yesterday, today, and tomorrow. In: Mazzara M and Meyer
B (eds) Present and ulterior software engineering. Cham:
Springer.
25. Verma A, Pedrosa L, Korupolu M, et al. Large-scale cluster
management at Google with Borg. In: Proceedings of the
tenth European conference on computer systems (EuroSys
’15), Bordeaux, France, 21–24 April 2015, pp.1–17. New
York: ACM.
26. Hindman B, Konwinski A, Zaharia M, et al. Mesos: a plat-
form for fine-grained resource sharing in the data center. In:
proceedings of the 8th USENIX conference on networked
systems design and implementation (NSDI’11), Boston, MA,
30 March–1 April 2011, pp.295–308. Berkeley, CA:
USENIX Association.
27. Baldini I, Cheng P, Fink SJ, et al. The serverless
trilemma: function composition for serverless computing. In:
proceedings of the 2017 ACM SIGPLAN international sympo-
sium on new ideas, new paradigms, and reflections on pro-
gramming and software - onward!, Vancouver, BC, Canada,
25–27 October 2017, pp.89–103. New York: ACM.
28. Jamshidi P, Pahl C, Mendonça NC, et al. Microservices: the journey so far and challenges ahead. IEEE Softw 2018; 35: 24–35.
29. Taibi D, Lenarduzzi V and Pahl C. Architectural patterns for
microservices: a systematic mapping study. In: 8th interna-
tional conference on cloud computing and services science
(CLOSER‘18), Funchal, Madeira, Portugal, 19–21 March
2018, pp.221–232. SCITEPRESS.
30. Kratzke N and Quint PC. Understanding cloud-native appli-
cations after 10 years of cloud computing - a systematic map-
ping study. J Syst Softw 2017; 126: 1–16.
31. Balalaie A, Heydarnoori A and Jamshidi P. Microservices architecture enables DevOps: migration to a cloud-native architecture. IEEE Softw. Epub ahead of print 18 March 2016. DOI: 10.1109/MS.2016.64 (arXiv: 1606.04036).
32. Roberts M. Serverless architectures, https://martinfowler.com/articles/serverless.html (2016, accessed 18 December 2019).
33. Baldini I, Castro P, Chang K, et al. Serverless computing:
current trends and open problems. In: Research advances in
cloud computing. Singapore: Springer Singapore, 2017,
pp.1–20.
34. Baldini I, Castro P, Cheng P, et al. Cloud-native, event-based
programming for mobile applications. In: Proceedings of the
international conference on mobile software engineering and
systems, Austin, TX, 16–17 May 2016, pp.287–288. ACM.
35. Taylor SJ, Kiss T, Anagnostou A, et al. The CloudSME
simulation platform and its applications: a generic multi-
cloud platform for developing and executing commercial
cloud-based simulations. Fut Generat Comput Syst 2018; 88:
524–539.
36. Hanai M, Suzumura T, Ventresque A, et al. An adaptive
VM provisioning method for large-scale agent-based traffic
simulations on the cloud. In: 2014 IEEE 6th international
conference on cloud computing technology and science,
Singapore, 15–18 December 2014, pp.130–137. IEEE.
37. Zehe D, Knoll A, Cai W, et al. SEMSim Cloud Service:
large-scale urban systems simulation in the cloud. Simulat
Model Pract Theor 2015; 58: 157–171.
38. Carillo M, Cordasco G, Serrapica F, et al. D-Mason on the
cloud: an experience with Amazon Web Services. In:
Desprez F, Dutot P-F, Kaklamanis C, et al. (Eds.) Euro-Par
2016: parallel processing workshops, Lecture Notes in
Computer Science, pp.322–333. Cham: Springer
International Publishing.
39. Anderson K, Du J, Narayan A, et al. Gridspice: A distributed
simulation platform for the smart grid. IEEE Trans Ind
Informat 2014; 10: 2354–2363.
40. Guzzetti S, Passerini T, Slawinski J, et al. Platform and algorithm effects on computational fluid dynamics applications in life sciences. Fut Generat Comput Syst 2017; 67: 382–396.
41. Ledyayev R and Richter H. High performance computing in
a cloud using openstack. Cloud Computing 2014; 5: 108–
113.
42. Fehling C, Leymann F, Retter R, et al. Cloud computing pat-
terns. Wien, Austria: Springer, 2014.
43. Stine M. Migrating to cloud-native application architectures. Sebastopol, CA: O’Reilly, 2015.
44. Newman S. Building microservices. Sebastopol, CA:
O’Reilly Media, Incorporated, 2015.
45. Namiot D and Sneps-Sneppe M. On micro-services architec-
ture. Int J Open Inform Technol 2014; 2: 24–27.
46. Wiggins A. The twelve-factor app, http://12factor.net/ (2014,
accessed 14 February 2016).
47. Fowler M. Circuit breaker, http://martinfowler.com/bliki/CircuitBreaker.html (2014, accessed 27 May 2016).
48. Erl T, Cope R and Naserpour A. Cloud computing design
patterns. Westford, MA: Prentice Hall, 2015.
49. Chang S, Hood R, Jin H, et al. Evaluating the suitability of
commercial clouds for NASA’s high performance comput-
ing applications: a trade study. Technical report, NASA,
https://www.nas.nasa.gov/assets/pdf/papers/NAS_Technical_
Report_NAS-2018-01.pdf (2018, accessed 18 December
2019).
50. Kaur T and Chana I. Energy efficiency techniques in cloud
computing: a survey and taxonomy. ACM Comput Surv
2015; 48: 22:1–22:46.
51. Tosatto A, Ruiu P and Attanasio A. Container-based orches-
tration in cloud: state of the art and challenges. In: 2015 ninth
international conference on complex, intelligent, and soft-
ware intensive systems, Blumenau, Brazil, 8–10 July 2015,
pp.70–75.
52. Peinl R, Holzschuher F and Pfitzer F. Docker cluster man-
agement for the cloud - survey results and own solution. J
Grid Comput 2016; 14: 265–282.
53. Spillner J. Practical tooling for serverless computing. In:
proceedings of the 10th international conference on utility
and cloud computing (UCC ’17), Austin, TX, 5–8 December
2017, pp.185–186. New York: ACM.
54. Lynn T, Rosati P, Lejeune A, et al. A preliminary review of enterprise serverless cloud computing (Function-as-a-Service) platforms. In: 2017 IEEE international conference on cloud computing technology and science (CloudCom), Hong Kong, China, 11–14 December 2017, pp.162–169.
55. van Eyk E, Toader L, Talluri S, et al. Serverless is more:
from PaaS to present cloud computing. IEEE Internet
Comput 2018; 22: 8–17.
56. van Eyk E, Iosup A, Abad CL, et al. A SPEC RG Cloud Group’s vision on the performance challenges of FaaS cloud architectures. In: Proceedings of the 8th ACM/SPEC on international conference on performance engineering (ICPE 2018), Berlin, Germany, 9–13 April 2018, pp.21–24. ACM.
57. Ylianttila M, Riekki J, Zhou J, et al. Cloud architecture for
dynamic service composition. Int J Grid High Perform
Comput 2012; 4: 17–31.
58. Zhou J, Riekki J and Sun J. Pervasive service computing
toward accommodating service coordination and collabora-
tion. In: 2009 4th international conference on frontier of
computer science and technology, Shanghai, China, 17–19
December 2009, pp.686–691. IEEE.
59. Huhns MN and Singh MP. Service-oriented computing:
key concepts and principles. IEEE Internet Comput 2005; 9:
75–81.
60. Dustdar S and Schreiner W. A survey on web services com-
position. Int J Web Grid Serv 2005; 1: 1–30.
61. Papazoglou MP, Traverso P, Dustdar S, et al. Service-
oriented computing: state of the art and research challenges.
Computer 2007; 40: 38–45.
62. Papazoglou MP and van den Heuvel WJ. Service oriented
architectures: approaches, technologies and research issues.
VLDB J 2007; 16: 389–415.
63. Razavian M and Lago P. A survey of SOA migration in
industry. In: Kappel G, Maamar Z and Motahari-Nezhad HR
(Eds.) Service-oriented computing. Berlin, Heidelberg:
Springer Berlin Heidelberg, 2011, pp.618–626.
64. Cerny T, Donahoo MJ and Pechanec J. Disambiguation and
comparison of SOA, microservices and self-contained sys-
tems. In: Proceedings of the international conference on
research in adaptive and convergent systems (RACS ’17),
Krakow, Poland, 20–23 September 2017, pp.228–235. New
York: ACM.
65. Mittal S, Risco-Martín JL and Zeigler BP. DEVS/SOA: a cross-platform framework for net-centric modeling and simulation in DEVS unified process. Simulation 2009; 85: 419–450.
66. Al-Zoubi K and Wainer G. Performing distributed simula-
tion with RESTful web-services. In: proceedings of the 2009
winter simulation conference (WSC), Austin, TX, 13–16
December 2009, pp.1323–1334. IEEE.
67. Mittal S and Risco-Martín JL. DEVSML 3.0 Stack: rapid deployment of DEVS farm in distributed cloud environment using microservices and containers. In: Proceedings of the symposium on theory of modeling & simulation (TMS/DEVS ’17), Virginia Beach, Virginia, 23–26 April 2017.
68. NATO STO. Modelling and Simulation as a Service
(MSaaS) - volume 1: MSaaS technical reference architec-
ture. Technical report, NATO Science and Technology
Organization (STO), 2018.
Author biographies
Nane Kratzke is a professor of Computer Science at
the Lübeck University of Applied Sciences and a former
Navy Officer (German Navy). He consulted for the
German Ministry of Defence in questions regarding
network-centric warfare. His particular research focus is
directed at cloud-native applications and cloud-native ser-
vice-related software engineering methodologies and cor-
responding application architectural styles, such as
microservices or serverless architectures. In addition, he is
interested in data science, distributed systems, and web-
scale elastic systems.
Robert Siegfried is a senior consultant for IT and M&S
projects and CEO of aditerna GmbH and Aditerna Inc.
Prior to his industry engagement, he was a research associate
at the University of the Federal Armed Forces in
Munich, Germany. His primary research areas are agent-
based M&S and parallel and distributed simulation.
Within several projects for the German Armed Forces and
US Department of Defense (DoD), he has worked (and is
still working) on topics such as MSaaS, artificial intelli-
gence (AI)-supported data fusion, metadata specifications,
model management systems, distributed simulation test
beds, and process models. Since October 2018, he has
served as Vice-Chair of the NMSG. He is actively
involved in multiple working groups of the Simulation
Interoperability Standards Organization (SISO) and serves
as a member of the SISO Executive Committee.
20 Journal of Defense Modeling and Simulation: Applications, Methodology, Technology 00(0)
... Third, Kratzke and Siegfried (2020) focus on the consequences expected from leveraging the cloud for M&S services. Using their work on and definition of CNAs (Kratzke and Quint 2017) as a basis, they propose a definition for what Cloud-native Simulations (CNSs) are in terms of a textual definition (Kratzke and Siegfried 2020, section 4.3), a cloud-native simulation stack and a cloud simulation maturity model. ...
... The SIMaaS-implementation presented in section 3 has several desirable characteristics. First, its design reflects best practice and the state of the art for creating SaaS as identified by Kratzke and Siegfried (2020). Second, the decision to decouple API and workers, only linking them by the internal representation of tasks in the task queue, means that the implementation of how models are simulated can be changed without having to change the public service interface. ...
Conference Paper
Full-text available
Providing modelling and simulation capabilities as a service promises to increase their value by improving accessibility for non-expert users and software agents as well as by leveraging cloud computing technology to scale simulation performance beyond the capabilities of a single computer. In order to reach this potential, implementations must align their design with the architectural styles of cloud computing applications and the web in general. We present an open-source, cloud-native Simulation as a Service (SIMaaS)-implementation that gives access to models and allows simulating them on the web. The implementation uses Functional Mockup Units (FMUs) for co-simulation as an executable form of a model and relies on FMPy for simulation. It is realized as a microservice in the form of a REST-based HTTP-API. Functionality and performance are demonstrated by using the service to create ensemble forecasts for PV systems and to search for an optimal parameter set using a genetic algorithm. Conceptual limitations and the resulting opportunities for further work are summarized.
... FaaS simulation is a subarea of the previously introduced cloud simulation. Approaches present in literature can be divided into two categories: Firstly, the FaaS platforms are used as simulation engines where other systems are deployed to and investigated, like in [6], [37]. Secondly, research where the FaaS platform itself is simulated and cloud functions are only deployed to validate the simulation in the specific experiments. ...
... FaaS simulation is a subarea of the previously introduced cloud simulation. Approaches present in literature can be divided into two categories: Firstly, the FaaS platforms are used as simulation engines where other systems are deployed to and investigated, like in [6], [37]. Secondly, research where the FaaS platform itself is simulated and cloud functions are only deployed to validate the simulation in the specific experiments. ...
Conference Paper
Full-text available
Function as a Service (FaaS)-the reason why so many practitioners and researchers talk about Serverless Computing-claims to hide all operational concerns. The promise when using FaaS is that users only have to focus on the core business functionality in form of cloud functions. However, a few configuration options remain within the developer's responsibility. Most of the currently available cloud function offerings force the user to choose a memory or other resource setting and a timeout value. CPU is scaled based on the chosen options. At a first glance, this seems like an easy task, but the tradeoff between performance and cost has implications on the quality of service of a cloud function. Therefore, in this paper we present a local simulation approach for cloud functions and support developers in choosing a suitable configuration. The methodology we propose simulates the execution behavior of cloud functions locally, makes the cloud and local environment comparable and maps the local profiling data to a cloud platform. This reduces time during the development and enables developers to work with their familiar tools. This is especially helpful when implementing multi-threaded cloud functions.
... As compared to traditional approaches, the applications developed using serverless cloud computing (also known as serverless architectures) either depend on the third-party services (known as Backend-as-a-Service) or on the custom code runs in some stateless compute containers (known as Functionas-a-Service) which are short-lived, event-triggered and fully managed by the cloud provider [1][2][3][4]. It implies that the developers write concise and stateless functions which can be triggered through the events, produced from different sensors as well as services / users or middleware [5,6]. Consequently, the platform ensures a secure and timely execution of these functions (written by developers) by running the infrastructure efficiently [7]. ...
Article
Full-text available
In a serverless cloud computing environment, the cloud provider dynamically manages the allocation of resources whereas the developers purely focus on their applications. The data-driven applications in serverless cloud computing mainly address the web as well as other distributed scenarios, and therefore, it is essential to offer a consistent user experience across different connection types. In order to address the issues of data-driven application in a real-time distributed environment, the use of GraphQL (Graph Query Language) is getting more and more popularity in state-of-the-art cloud computing approaches. However, the existing solutions target the low level implementation of GraphQL, for the development of a complex data-driven application, which may lead to several errors and involve a significant amount of development efforts due to various users’ requirements in real-time. Therefore, it is critical to simplify the development process of data-driven applications in a serverless cloud computing environment. Consequently, this research introduces UMLPDA (Unified Modeling Language Profile for Data-driven Applications), which adopts the concepts of UML-based Model-driven Architectures to model the frontend as well as the backend requirements for data-driven applications developed at a higher abstraction level. Particularly, a modeling approach is proposed to resolve the development complexities such as data communication and synchronization. Subsequently, a complete open source transformation engine is developed using a Model-to-Text approach to automatically generate the frontend as well as backend low level implementations of Angular2 and GraphQL respectively. The validation of proposed work is performed with three different case studies, deployed on Amazon Web Services platform. The results show that the proposed framework enables to develop the data-driven applications with simplicity.
... Therefore, this paper deals with the question of how to improve the resource sharing of future VC projects. We have done similar transfer research for another domain and asked what could be usefully transferred from the cloud computing domain to the simulation domain [1]. Because we derived some valuable insights for the simulation domain, we will follow a quite similar methodology here (see Figure 3). ...
Article
From close to scratch, the COVID-19 pandemic created the largest volunteer supercomputer on earth. Sadly, processing resources assigned to the corresponding Folding@home project cannot be shared with other volunteer computing projects efficiently. Consequently, the largest supercomputer had significant idle times. This perspective paper investigates how the resource sharing of future volunteer computing projects could be improved. Notably, efficient resource sharing has been optimized throughout the last ten years in cloud computing. Therefore, this perspective paper reviews the current state of volunteer and cloud computing to analyze what both domains could learn from each other. It turns out that the disclosed resource sharing shortcomings of volunteer computing could be addressed by technologies that have been invented, optimized, and adapted for entirely different purposes by cloud-native companies like Uber, Airbnb, Google, or Facebook. Promising technologies might be containers, serverless architectures, image registries, distributed service registries, and all have one thing in common: They already exist and are all tried and tested in large web-scale deployments.
Article
Cloud infrastructure provides rapid resource provision for on-demand computational requirements. Cloud simulation environments today are largely employed to model and simulate complex systems for remote accessibility and variable capacity requirements. In this regard, scalability issues in Modeling and Simulation (M&S) computational requirements can be tackled through the elasticity of on-demand Cloud deployment. However, implementing a high performance cloud M&S framework following these elastic principles is not a trivial task, as parallelizing and distributing existing architectures is challenging. Indeed, parallel and distributed M&S developments have evolved along separate paths. Parallel solutions have always focused on ad-hoc solutions, while distributed approaches, on the other hand, have led to the definition of standard distributed frameworks like the High Level Architecture (HLA) or influenced the use of distributed technologies like the Message Passing Interface (MPI). Only a few developments have been able to evolve with the current resilience of computing hardware resources deployment, largely focused on the implementation of Simulation as a Service (SaaS), albeit independently of the parallel ad-hoc methods branch. In this paper, we present a unified parallel and distributed M&S architecture with enough flexibility to deploy parallel and distributed simulations in the Cloud with low effort, without modifying the underlying model source code, and reaching important speedups against the sequential simulation, especially in the parallel implementation. Our framework is based on the Discrete Event System Specification (DEVS) formalism. The performance of the parallel and distributed framework is tested using the xDEVS M&S tool, Application Programming Interface (API) and the DEVStone benchmark with up to eight computing nodes, obtaining maximum speedups of 15.95× and 1.84×, respectively.
Article
Purpose: This study aims to investigate the impact of skills and knowledge of employees, economic situations of the company, current IT infrastructure, payment fashion, cloud availability, and cloud privacy and security on the productivity of human resources in the COVID-19 era.
Design/methodology/approach: Over the past few years, the advent of cloud-assisted technologies has dramatically advanced the Information Technology (IT)-based industries by providing everything as a service. Cloud computing is recognized as a growing technology among companies around the world. One of the most critical cloud applications is deploying systems and organizational resources, especially systems whose deployment costs are high. Manpower is one of the basic and vital resources of the organization, and organizations need an efficient workforce to achieve their goals. But, in the COVID-19 era, human resources' productivity can be reduced due to stress, high labor force, reduced organizational performance and profits, unfavorable organizational conditions, inability to manage and lack of training. Therefore, this study tries to investigate the productivity of human resources in the COVID-19 era. Data were collected from medium-sized companies through a questionnaire. Distributed questionnaires were conducted on the Likert scale. The model is assessed using the structural equation modeling technique to examine its reliability and validity. The study is a library method and literature review. A case study was conducted through a questionnaire and statistical analysis by SPSS 25 and SMART-PLS.
Findings: Based on the findings, the skills and knowledge of employees, the economic situations of the company, payment fashion, cloud availability and the current IT infrastructures of the company have a positive impact on human resource efficiency in the COVID-19 era. But cloud privacy and security have a negative effect on the productivity of human resources. The findings can be the basis for companies and organizations in the COVID-19 era.
Research limitations/implications: This study has some restrictions that need to be considered in evaluating the obtained results. First, due to the prevalence of Coronavirus, access to information from the companies under study was limited. Second, this research may have overlooked other variables that affect human resource productivity in the COVID-19 era. Prospective researchers can examine the impact of Customer Relationship Management (CRM) and Supply Chain Management (SCM) on human resources' productivity in the COVID-19 era.
Practical implications: The results of this research are applicable for all companies, their departments and human resources in the COVID-19 era.
Originality/value: In this paper, human resources' productivity in the COVID-19 era is pointed out. The presented new model provides a complete framework for investigating how cloud-based enterprise resource planning systems affect the productivity of human resources in the COVID-19 era.
Article
The vision of network-centric operations is to increase operational capabilities through networked collaboration. NATO and its member nations state this vision in strategic documents at a very high level of abstraction. While suitable for giving an overall feel, current documentation renders the steps toward implementing those visions largely unsupported. We outline a method that is based on agile requirements engineering, for converting high-level strategic visions into capabilities whose forms lend themselves to incremental implementation. We illustrate the use of this method in two cases that deal with both operational capabilities and technical capabilities. We also show how the method enables one to prioritise which capabilities to develop first. We conclude that it is necessary to formulate and implement some form of explicit methodology with which to span the gap between strategic visions and an effective implementation of those visions.
Chapter
Systems engineering and simulation of cyber-physical systems require the aggregation of disparate models from the component cyber and physical domains in order to understand the whole system. Military multi-domain operations employ emerging technologies such as unmanned sensors, cyber, and electronic warfare. The Discrete Event System—Distributed Modeling Framework (DEVS-DMF) is a simulation technology that enables composition of multiple models via the actor model of computation, parallel and asynchronous messaging, and location transparency. Using a system of systems engineering approach, we compose models of military operations, unmanned systems, and electronic warfare technologies to analyze mission performance using different advanced equipment sets. Important performance metrics span the physical (sensor performance), cyber (electronic attack), human factors (soldier load), and military (mission success) domains. Simulation services are allocated to each domain, and the simulation’s microservice architecture allows for independently deployable services that own their internal state. Containerization and cloud deployment allow geographically distributed users to manipulate simulation inputs, conduct large-scale experiments, and analyze simulation output using browser and web tools. The resulting ensemble enables system of systems engineering and analysis of cyber and electronic systems in support of small tactical operations.
Article
In the late-1950s, leasing time on an IBM 704 cost hundreds of dollars per minute. Today, cloud computing, that is, using IT as a service, on-demand and pay-per-use, is a widely used computing paradigm that offers large economies of scale. Born from a need to make platform as a service (PaaS) more accessible, fine-grained, and affordable, serverless computing has garnered interest from both industry and academia. This article aims to give an understanding of these early days of serverless computing: what it is, where it comes from, what is the current status of serverless technology, and what are its main obstacles and opportunities.
Article
Simulation is used in industry to study a large variety of problems ranging from increasing the productivity of a manufacturing system to optimising the design of a wind turbine. However, some simulation models can be computationally demanding and some simulation projects require time consuming experimentation. High performance computing infrastructures such as clusters can be used to speed up the execution of large models or multiple experiments but at a cost that is often too much for Small and Medium-sized Enterprises (SMEs). Cloud computing presents an attractive, lower cost alternative. However, developing a cloud-based simulation application can again be costly for an SME due to training and development needs, especially if software vendors need to use resources of different heterogeneous clouds to avoid being locked-in to one particular cloud provider. In an attempt to reduce the cost of development of commercial cloud-based simulations, the CloudSME Simulation Platform (CSSP) has been developed as a generic approach that combines an AppCenter with the workflow of the WS-PGRADE/gUSE science gateway framework and the multi-cloud-based capabilities of the CloudBroker Platform. The paper presents the CSSP and two representative case studies from distinctly different areas that illustrate how commercial multi-cloud-based simulations can be created.
Article
Microservices are an architectural approach emerging out of service-oriented architecture, emphasizing self-management and lightweightness as the means to improve software agility, scalability, and autonomy. This article examines microservice evolution from the technological and architectural perspectives and discusses key challenges facing future microservice developments.
Conference Paper
As a key part of the serverless computing paradigm, Function-as-a-Service (FaaS) platforms enable users to run arbitrary functions without being concerned about operational issues. However, there are several performance-related issues surrounding the state-of-the-art FaaS platforms that can deter widespread adoption of FaaS, including sizeable overheads, unreliable performance, and new forms of the cost-performance trade-off. In this work we, the SPEC RG Cloud Group, identify six performance-related challenges that arise specifically in this FaaS model, and present our roadmap to tackle these problems in the near future. This paper aims at motivating the community to solve these challenges together.
Conference Paper
Microservices is an architectural style increasing in popularity. However, there is still a lack of understanding how to adopt a microservice-based architectural style. We aim at characterizing different microservice architectural style patterns and the principles that guide their definition. We conducted a systematic mapping study in order to identify reported usage of microservices and based on these use cases extract common patterns and principles. We present two key contributions. Firstly, we identified several agreed microservice architecture patterns that seem widely adopted and reported in the case studies identified. Secondly, we presented these as a catalogue in a common template format including a summary of the advantages, disadvantages, and lessons learned for each pattern from the case studies. We can conclude that different architecture patterns emerge for different migration, orchestration, storage and deployment settings for a set of agreed principles.
Chapter
Serverless computing has emerged as a new compelling paradigm for the deployment of applications and services. It represents an evolution of cloud programming models, abstractions, and platforms, and is a testament to the maturity and wide adoption of cloud technologies. In this chapter, we survey existing serverless platforms from industry, academia, and open-source projects, identify key characteristics and use cases, and describe technical challenges and open problems.
Conference Paper
In line with cloud computing emergence as the dominant enterprise computing paradigm, our conceptualization of the cloud computing reference architecture and service construction has also evolved. For example, to address the need for cost reduction and rapid provisioning, virtualization has moved beyond hardware to containers. More recently, serverless computing or Function-as-a-Service has been presented as a means to introduce further cost-efficiencies, reduce configuration and management overheads, and rapidly increase an application's ability to speed up, scale up and scale down in the cloud. The potential of this new computation model is reflected in the introduction of serverless computing platforms by the main hyperscale cloud service providers. This paper provides an overview and multi-level feature analysis of seven enterprise serverless computing platforms. It reviews extant research on these platforms and identifies the emergence of AWS Lambda as a de facto base platform for research on enterprise serverless cloud computing. The paper concludes with a summary of avenues for further research.
Conference Paper
There is an industrial shift from Service-Oriented Architectures (SOA) into Microservices; however, a quick review of online resources on these topics reveals a range of different understandings of these two architectures. Individuals often mix terms, grant false advantages or expect different quality attributes and properties. The purpose of this paper is to provide readers a solid understanding of the differences between these two architectures and their features. We provide both research and industry perspectives to point out strengths and weaknesses of both architectural directions, and we point out many shortcomings in both approaches that are not addressed by the architecture. Finally, based on this we propose challenges for future research.
Conference Paper
Cloud applications are increasingly built from a mixture of runtime technologies. Hosted functions and service-oriented web hooks are among the most recent ones which are natively supported by cloud platforms. They are collectively referred to as serverless computing by application engineers due to the transparent on-demand instance activation and microbilling without the need to provision infrastructure explicitly. This half-day tutorial explains the use cases for serverless computing and the drivers and existing software solutions behind the programming and deployment model also known as Function-as-a-Service in the overall cloud computing stack. Furthermore, it presents practical open source tools for deriving functions from legacy code and for the management and execution of functions in private and public clouds.