Special Issue
Journal of Defense Modeling and
Simulation: Applications,
Methodology, Technology
© The Author(s) 2020
DOI: 10.1177/1548512919895327
Towards cloud-native simulations –
lessons learned from the front-line of
cloud computing
Nane Kratzke
and Robert Siegfried
Cloud computing can be a game-changer for computationally intensive tasks like simulations. The computational power
of Amazon, Google, or Microsoft is even available to a single researcher. However, the pay-as-you-go cost model of
cloud computing influences how cloud-native systems are being built. We transfer these insights to the simulation
domain. The major contributions of this paper are twofold: (A) we propose a cloud-native simulation stack and (B)
derive expectable software engineering trends for cloud-native simulation services. Our insights are based on systematic
mapping studies on cloud-native applications, a review of cloud standards, action research activities with cloud engineer-
ing practitioners, and corresponding software prototyping activities. Two major trends have dominated cloud computing
over the last 10 years. The size of deployment units has been minimized and corresponding architectural styles prefer
more fine-grained service decompositions of independently deployable and horizontally scalable services. We forecast
similar trends for cloud-native simulation architectures. These similar trends should make cloud-native simulation ser-
vices more microservice-like, which are composable but just ‘‘simulate one thing well.’’ However, merely transferring
existing simulation models to the cloud can result in significantly higher costs. One critical insight of our (and other)
research is that cloud-native systems should follow cloud-native architecture principles to leverage the most out of the
pay-as-you-go cost model.
Keywords: Cloud computing, cloud native, cloud maturity, simulation, reference model, maturity model
1. Introduction
Simulation is used for various purposes, such as training,
analysis, and decision support. Consequently, modeling
and simulation (M&S) has become a critical technology
for many industry sectors (such as logistics and manufac-
turing) and the defense sector. Achieving interoperability
between multiple simulation systems and ensuring the
credibility of results often requires enormous efforts with
regards to time, personnel, and budget. Recent technical
developments in the area of cloud computing technology
and service-oriented architectures (SOAs) may offer
opportunities to better utilize M&S capabilities in order to
satisfy these critical needs. A concept that includes service
orientation and the provision of M&S applications via the
as-a-service model of cloud computing may enable more
composable simulation environments that can be deployed
on-demand. This new concept is commonly known as
M&S as a Service (MSaaS).
1.1 MSaaS simulation principles and validation
The NATO Modelling and Simulation Group (NMSG) is
part of the NATO Science and Technology Organization
(STO). The mission of the NMSG is to promote coopera-
tion among alliance bodies, NATO, and partner nations to
Department for Electrical Engineering and Computer Science, Lübeck University of Applied Sciences, Germany
aditerna GmbH, Germany
Corresponding author:
Nane Kratzke, Department for Electrical Engineering and Computer Science, Lübeck University of Applied Sciences, Mönkhofer Weg 239, Lübeck, 23562, Germany.
maximize the effective utilization of M&S. Primary mis-
sion areas include the following:
M&S standardization;
associated science and technology.
The NMSG is tasked to enforce and supervise imple-
mentation of the NATO Modelling and Simulation
Masterplan (NMSMP; v2.0 (AC/323/NMSG(2012)-015)).
The NMSMP defines several objectives that collectively
will help to exploit M&S to its full potential across NATO
and the nations to enhance both operational and cost-effec-
tiveness. This vision will be achieved through a coopera-
tive effort guided by the following principles.
Synergy: leverage and share the existing NATO
and national M&S capabilities.
Interoperability: direct the development of common
M&S standards and services for simulation intero-
perability and foster interoperability between
Command & Control (C2) and simulation.
Reuse: increase the visibility, accessibility, and
awareness of M&S assets to foster sharing across
all NATO M&S application areas.
The NMSMP defines five strategic objectives, two of
which are directly addressed by the MSaaS efforts
described in this paper:
establish a common technical framework;
provide coordination and common services.
NATO MSG-136 (Modelling and Simulation as a Service) is one of the working groups under the NMSG.
From 2014 to 2017 this working group investigated the
concept of MSaaS to provide the technical and organiza-
tional foundations for a future service-based allied frame-
work for MSaaS within NATO and partner nations. In this
period, MSG-136 did groundbreaking work by defining
MSaaS in the NATO context and by developing opera-
tional, technical, and governance concepts for permanently
establishing the ‘‘Allied Framework for MSaaS.’’ In addi-
tion to developing the foundational concepts, MSG-136
conducted extensive experimentation activities to test and
validate the concepts.
From 2018 to 2021, the initial concepts are being extended by MSG-164 and validated through
dedicated evaluation events and participation in opera-
tional exercises.
1.2 Cloud-native lessons learned
Even tiny companies can generate enormous economic
growth and business value by providing cloud-based
services or applications: Instagram, Uber, WhatsApp,
NetFlix, Twitter – and many astonishingly small companies (if we relate the modest headcount of these companies in their founding days to their noteworthy economic impact) whose services are frequently used. However, even a fast-growing start-up should keep the long-term consequences and dependencies of its business model in mind. Many of these
companies rely on public cloud infrastructures – often pro-
vided by Amazon Web Services (AWS), Microsoft
(Azure), Google (Cloud Services), etc. Meanwhile, cloud
providers run a significant amount of mission-critical busi-
ness software for companies that no longer operate their
own data centers. Moreover, it is very often economical if
workloads have a high peak-to-average ratio. However, there are downsides. Although cloud services could be
standardized commodities, they are mostly not. Once a
cloud-hosted application or service is deployed to a spe-
cific cloud infrastructure, it is often inherently bound to
that infrastructure due to non-obvious technological bind-
ings. A transfer to another cloud infrastructure is very
often a time consuming and expensive one-time exercise.
A good real-world example here is Instagram. After being
bought by Facebook, it took over a year for the Instagram
engineering team to find and establish a solution for the
transfer of all its services from AWS to Facebook data
centers. Although no downtimes were planned, noteworthy
outages occurred during that period.
The National Institute of Standards and Technology
(NIST) definition of cloud computing defines three basic
and well-accepted service categories: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software
as a Service (SaaS). IaaS provides maximum flexibility
for arbitrary consumer-created software but hides almost
no operation complexity of the application (just of the
infrastructure). SaaS, on the other hand, hides operation
complexity almost entirely but is too limited for many use
cases involving consumer-created software. PaaS is a compromise enabling the operation of consumer-created software with convenient operation complexity, but at the cost of accepting some degree of lock-in resulting from the platform.
Throughout a project called CloudTRANSIT, we
searched intensively for solutions to overcome this ‘‘cloud
lock-in’’ – to make cloud computing an actual commodity.
We developed and evaluated a cloud application transfer-
ability concept that has prototype status but already works
for approximately 70% of the current cloud market, and
that can be extended for the rest of the market share.
However, what is essential for this paper is that we learned some core insights from our action research with practitioners:

practitioners want to have a choice between different cloud infrastructures and providers;
practitioners prefer declarative and cybernetic (auto-adjusting) instead of workflow-based (imperative) deployment and orchestration;
practitioners are forced to make efficient use of cloud resources because more and more systems are migrated to cloud infrastructures, causing steadily increasing bills;
practitioners rate pragmatism of solutions much higher than full feature coverage of cloud platforms and infrastructures.
1.3 Research question
All these points influence how practitioners construct
cloud application architectures that are intentionally
designed for the cloud. One thing we learned was the fact
that cloud-native applications – although they are all dif-
ferent – follow some common architectural patterns that
we could exploit for transferability. This paper investigates
the research question of how these lessons learned can be transferred from cloud-native computing to the simulation and modeling domain.
1.4 Outline
Therefore, the remainder of this paper is outlined as fol-
lows. We present a cloud application reference model in
Section 2 that steered our research in the cloud computing
domain. According to our experiences and action research
activities over the last 10 years, cloud computing is domi-
nated by two major long-term trends that are investigated
in Section 3. In particular, we investigate resource utiliza-
tion improvements in Section 3.1 and the architectural evo-
lution of cloud applications in Section 3.2. Section 4 will
analyze both trends regarding possible upcoming trends of
interest in the M&S community. Section 5 will present cor-
responding related work from cloud computing and the
MSaaS domain to provide interesting follow-up for the
reader. We will conclude our thoughts in Section 6 and
forecast intensified decentralizing and more fine-grained
service composing approaches for cloud computing and
the MSaaS domain.
2. Reference model
Our problem awareness results mainly from the conducted
research project CloudTRANSIT. This project dealt with
the question of how to transfer cloud applications and ser-
vices at runtime without downtime across cloud infrastruc-
tures from different public and private cloud service
providers to tackle the existing and growing problem of
vendor lock-in in cloud computing. Throughout the proj-
ect, we published more than 20 research papers. However,
the intent of this paper is not to summarize these papers.
The interested reader is referred to the corresponding technical report that provides an integrated view of these papers.
Almost all cloud system engineers focus on a common
problem. The core components of their distributed and
cloud-based systems, such as virtualized server instances
and essential networking and storage, can be deployed
using commodity services. However, further services –
that are needed to integrate these virtualized resources in
an elastic, scalable, and pragmatic manner – are often not
considered in standards. Services such as load balancing,
auto-scaling, or message queuing systems are needed to
design an elastic and scalable cloud-native system on
almost every cloud service infrastructure. Some standards, such as AMQP for messaging (dating back almost to the pre-cloud era), exist. However, mainly these integrating and ''gluing'' service types – that are crucial for almost every cloud application on a higher cloud maturity level – are often not provided in a standardized manner by cloud providers. It seems that all public cloud service providers try to stimulate cloud customers to use their non-commodity convenience service ''interpretations'' to bind them to their infrastructures and higher-level service offerings.
What is more, according to an analysis we performed
in 2016,
the percentage of these commodity service categories that are considered in standards, such as CIMI and TOSCA,
even decreased over the years. That has mainly to do with
the fact that new cloud service categories are released
faster than standardization authorities can standardize
existing service categories. Figure 1 shows this effect by
the example of AWS over the years. That is how vendor
Figure 1. Decrease of standard coverage over years (by
example of Amazon Web Services).
Kratzke and Siegfried 3
lock-in emerges in cloud computing. For a more detailed
discussion, we refer to Opara-Martins et al., Kratzke et al., and Kratzke and Peinl.
Therefore, all reviewed cloud standards focus on a min-
imal but necessary subset of popular cloud services: com-
pute nodes (virtual machines), storage (file, block, object),
and (virtual private) networking. Standardized deployment
approaches, such as TOSCA, are defined mainly against
this commodity infrastructure level of abstraction. These
kinds of services are often subsumed as IaaS and build the
foundation of cloud services and therefore cloud-native
applications. All other service categories might foster ven-
dor lock-in situations. That might sound disillusioning. In
consequence, many cloud engineering teams follow the
basic idea that a cloud-native application stack should only use a minimal subset of well-standardized IaaS services as founding building blocks. Because existing cloud
standards cover only specific cloud service categories
(mainly the IaaS level) and do not show an integrated
point of view, a more integrated reference model that takes the best practices of practitioners into account would be helpful.
Very often cloud computing is investigated from a ser-
vice model point of view (IaaS, PaaS, SaaS) or a deploy-
ment point of view (private, public, hybrid, community cloud). Alternatively, one can look from an actor point of
view (provider, consumer, auditor, broker, carrier) or a
functional point of view (service deployment, service
orchestration, service management, security, privacy), as
done by Bohn et al.
Points of view are particularly useful
to split problems into concise parts. However, the
viewpoints mentioned above might be common in cloud
computing and useful from a service provider point of
view, but not from a cloud-native application engineering
point of view. From an engineering point of view, it seems
more useful to have views on the technology levels
involved and applied in cloud-native application engineering.
By using the insights from our systematic mapping studies and our review of cloud standards, we compiled
a reference model of cloud-native applications. This
layered reference model is shown and explained in
Figure 2. The basic idea of this reference model is to use
only a small subset of well-standardized IaaS services as
founding building blocks (Layer 1). Four primary view-
points form the overall shape of this model.
Infrastructure provisioning: this is a viewpoint
that is familiar for engineers working on the infra-
structure level and how IaaS is understood. IaaS
deals with the deployment of separate compute
nodes for a cloud consumer. The cloud consumer
must manage these (hundreds of) requested and iso-
lated nodes.
Clustered elastic platforms: this is a viewpoint
that is familiar for engineers who are dealing with
horizontal scalability across nodes. Clusters are a
concept to handle many Layer 1 nodes as one logi-
cal compute node (a cluster). Such technologies are
often the technological backbone for portable cloud
runtime environments because they are hiding com-
plexity (of hundreds or thousands of single nodes)
Figure 2. Cloud-native stack observable in many cloud-native applications. FaaS: Function as a Service; IaaS: Infrastructure as a
4Journal of Defense Modeling and Simulation: Applications, Methodology, Technology 00(0)
appropriately. In addition, this layer realizes the
foundation to define services and applications with-
out reference to particular cloud services, cloud
platforms, or cloud infrastructures. Thus, it pro-
vides a foundation to avoid vendor lock-in.
Service composing: this is a viewpoint familiar for
application engineers dealing with web services in
SOAs. These (micro)-services operate on a Layer 2
cloud runtime platform (such as Kubernetes,
Mesos, Swarm, Nomad, and so on). Thus, the com-
plex orchestration and scaling of these services are
abstracted and delegated to a cluster (cloud runtime
environment) on Layer 2.
Application: this is a viewpoint that is familiar for
end-users of cloud services (or cloud-native applica-
tions). These cloud services are composed of smaller
cloud Layer 3 services being operated on clusters
formed of single compute and storage nodes.
For more details, we refer to Kratzke and Peinl and Kratzke and Quint. However, the remainder of this paper follows this model.
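The four viewpoints above can be summarized as plain data. The following sketch lists only the layer names and technology examples actually named in the text; it is an illustrative summary, not an exhaustive taxonomy:

```python
# The four-layer cloud-native stack of Figure 2, sketched as plain data.
# Technology examples are those mentioned in the text only.
cloud_native_stack = {
    1: ("Infrastructure provisioning", ["compute nodes", "storage", "virtual networks"]),
    2: ("Clustered elastic platform", ["Kubernetes", "Mesos", "Swarm", "Nomad"]),
    3: ("Service composing", ["(micro)services", "versioned REST APIs"]),
    4: ("Application", ["cloud services composed of Layer 3 services"]),
}

# Each layer builds only on the layer beneath it; restricting Layer 1 to
# well-standardized IaaS services is what limits vendor lock-in.
for level in sorted(cloud_native_stack):
    name, examples = cloud_native_stack[level]
    print(f"Layer {level}: {name} ({', '.join(examples)})")
```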
3. Observable long-term trends in cloud computing
Cloud computing emerged some 10 years ago. In the first
adoption phase, existing IT systems were merely trans-
ferred to cloud environments without changing the original
design and architecture of these applications. Tiered appli-
cations were merely migrated from dedicated hardware to
virtualized hardware in the cloud. Cloud system engineers
implemented remarkable improvements in cloud platforms
(PaaS) and infrastructures (IaaS) over the years and estab-
lished several engineering trends.
All of these trends try to optimize specific quality fac-
tors, such as functional stability, performance efficiency,
compatibility, usability, reliability, maintainability, port-
ability, and security of cloud services to improve the over-
all quality of service (QoS). The most focused quality
factors are functional stability, performance efficiency,
and reliability (including availability).
Therefore, these engineering trends, listed in Table 1, seem somehow isolated. We want to review these trends from two different points of view.
Table 1. Some observable software engineering trends coming along with CNAs.

Microservices: Microservices can be seen as a ''pragmatic'' interpretation of SOA. In addition to SOA, microservice architectures intentionally focus on and compose small, independently replaceable, horizontally scalable services that are ''doing one thing well.''

DevOps: DevOps is a practice that emphasizes the collaboration of software developers and IT operators. It aims to build, test, and release software more rapidly, frequently, and more reliably using automated processes for software delivery. DevOps fosters the need for independently replaceable and standardized deployment units and therefore pushes microservice architectures and container technologies.

Cloud modeling languages: Softwareization of infrastructure and network enables one to automate the process of software delivery and infrastructure changes more rapidly. Cloud modeling languages can express applications and services and their elasticity behavior that shall be deployed to such infrastructures or platforms.

Standardized and self-contained deployment units: Deployment units wrap a piece of software in a complete file system that contains everything needed to run: code, runtime, system tools, system libraries. So, it is guaranteed that the software will always run the same, regardless of its environment. This deployment approach is often realized using container technologies (OCI standard). Each deployment unit should be designed and interconnected according to a collection of cloud-focused patterns, such as the twelve-factor app collection, the circuit breaker pattern, and cloud computing patterns.

Elastic platforms: Elastic platforms, such as Kubernetes, Mesos, or Swarm, can be seen as a unifying middleware of elastic infrastructures. Elastic platforms extend resource sharing and increase the utilization of underlying compute, network, and storage resources for custom but standardized deployment units.

Serverless: The term ''serverless'' is used for an architectural style of cloud applications that deeply depend on external third-party services (Backend-as-a-Service, BaaS) and integrate them via small event-triggered functions (Function-as-a-Service, FaaS). FaaS extends the resource sharing of elastic platforms by simply applying time-sharing concepts.

State isolation: Stateless components are easier to scale up/down horizontally than stateful components. Of course, stateful components cannot be avoided, but they should be reduced to a minimum and realized by intentionally horizontally scalable storage systems (often eventually consistent NoSQL databases).

Versioned REST APIs: REST-based APIs provide scalable and pragmatic communication, which means relying mainly on already existing internet infrastructure and well-defined and widespread standards.

Loose coupling: Service composition is done by events or by data. Event coupling relies on messaging solutions (e.g., the AMQP standard). Data coupling often relies on scalable but (mostly) eventually consistent storage solutions (often subsumed as NoSQL databases).

CNAs: cloud-native applications; SOA: service-oriented architecture; API: application programming interface.
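Several of the trends above (a microservice ''doing one thing well,'' state isolation, a versioned REST API) can be illustrated with a minimal service sketch. The endpoint name and service are hypothetical and use only the Python standard library:

```python
# A minimal, hypothetical "simulates one thing well" service: a stateless
# dice-roll simulation exposed via a versioned REST path (/v1/roll).
import json
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

class DiceSimulation(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/v1/roll"):  # versioned API path
            body = json.dumps({"roll": random.randint(1, 6)}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

# The handler keeps no state between requests, so it is trivially
# horizontally scalable: start as many instances as the load requires, e.g.
# HTTPServer(("127.0.0.1", 8080), DiceSimulation).serve_forever()
```

Because the service is stateless, any instance can answer any request, which is exactly the property that makes independent horizontal scaling (Table 1, ''State isolation'') pragmatic.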
3.1 Resource utilization
Cloud infrastructures (IaaS) and platforms (PaaS) are built
to be elastic. Elasticity is understood as the degree to
which a system adapts to workload changes by provision-
ing and de-provisioning resources automatically. Without
this, cloud computing is very often not reasonable from an
economic point of view.
Over time, system engineers
learned to understand the elasticity options of modern
cloud environments better. Eventually, systems were
designed for such elastic cloud infrastructures, which
increased the utilization rates of underlying computing
infrastructures via new deployment and design approaches,
such as containers, microservices, or serverless architec-
tures. This design intention is often expressed using the
term ‘‘cloud native.’
Figure 3 shows a noticeable trend over the last decade.
Machine virtualization was introduced to consolidate many
bare metal machines to make more efficient utilization of
physical resources. This machine virtualization forms the
technological backbone of IaaS cloud computing. Virtual
machines might be more lightweight than bare metal ser-
vers, but they are still heavy, especially regarding their
image sizes. Due to being more fine-grained, containers
improved the way of standardized deployments but also
increased the utilization of virtual machines.
Nevertheless, although containers can be scaled
quickly, they are still always-on components. For that
reason Function-as-a-Service (FaaS) approaches have
emerged and applied time sharing of containers on under-
lying container platforms. Due to this time-shared execu-
tion of containers on the same hardware, FaaS enables
even a scale-to-zero capability. This improved resource
efficiency can even be measured monetarily. So, over
time the technology stack to manage resources in the
cloud became more complicated and more difficult to
understand but followed one trend – to run a greater work-
load on the same number of physical machines.
3.1.1 Service-oriented deployment monoliths. Service-
oriented computing is a paradigm for distributed comput-
ing and e-business processing and has been introduced to
manage the complexity of distributed systems and to inte-
grate different software applications. A service offers func-
tionalities to other services mainly via message passing.
Services decouple their interfaces from their implementa-
tion. Corresponding architectures for such applications are
called SOAs. Many business applications have been devel-
oped over recent decades following this architectural para-
digm. Also, due to its underlying service concepts, these
applications can be deployed in cloud environments with-
out any problems. However, the main problem for cloud
system engineers emerges from the problem that –
although these kinds of applications are composed of dis-
tributed services – their deployment is not. These kinds of
Figure 3. The cloud architectural evolution from a resource utilization point of view. VM: virtual machine; FaaS: Function as a
6Journal of Defense Modeling and Simulation: Applications, Methodology, Technology 00(0)
distributed applications are conceptually monolithic appli-
cations from a deployment point of view.
In other words, the complete distributed application
must be deployed all at once in the case of updates or new
service releases. This monolithic style even leads to situa-
tions where complete applications are simply packaged as
one large virtual machine image. That fits perfectly to the
situations shown in Figure 3 (Dedicated Server and
Virtualization). However, depending on the application
size, this normally involves noteworthy downtimes of the
application for end-users and limits the capability to scale
the application in the case of increasing or decreasing workloads.
It is evident that especially cloud-native applications come along with such 24 × 7 requirements and the need to
deploy, update, or scale single components independently
from each other at runtime without any downtime.
Therefore, SOA evolved into a so-called microservice
architectural style. One might mention that microservices
are mainly a more pragmatic version of SOAs. What is
more, microservices are intentionally designed to be
independently deployable, updateable, and horizontally
scalable. So, microservices have some architectural
implications that will be investigated in Section 3.2.1.
However, deployment units of microservices should be
standardized and self-contained. This aspect will be inves-
tigated in Section 3.1.2.
3.1.2 Standardized and self-contained deployment
units. While deployment monoliths are mainly using IaaS
resources in the form of virtual machines that are deployed
and updated less regularly, microservice architectures split
up the monolith into independently deployable units that
are deployed and terminated much more frequently. What
is more, this deployment is done in a horizontally scalable
way that is very often triggered by request stimuli. If many
requests are hitting a service, more service instances are
launched to distribute the requests across more instances.
If the requests are decreasing, service instances are shut
down to free resources (and save money). So, the inherent
elasticity capabilities of microservice architectures are
much more in focus compared with classical deployment
monoliths and SOA approaches. One of the critical suc-
cess factors resulting in microservice architectures gaining
so much attraction over the recent years might be the fact
that the deployment of service instances could be standardized as self-contained deployment units – so-called containers. Containers make use of operating system virtualization instead of machine virtualization (see Figure 4) and are therefore much more lightweight. Containers make scaling much more pragmatic and faster, and because containers are less resource consuming compared with virtual machines, the instance density per host can be increased.
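The density argument can be made concrete with a back-of-envelope calculation. All overhead figures below are assumptions chosen for illustration, not measurements:

```python
# Back-of-envelope density comparison (illustrative numbers, not measurements):
# a VM carries a full guest OS, a container only the process and its libraries.
host_ram_gb = 64
app_ram_gb = 0.5              # memory the service itself needs

vm_overhead_gb = 1.5          # assumed guest OS + hypervisor bookkeeping per VM
container_overhead_gb = 0.05  # assumed per-container runtime overhead

vms_per_host = int(host_ram_gb // (app_ram_gb + vm_overhead_gb))
containers_per_host = int(host_ram_gb // (app_ram_gb + container_overhead_gb))

print(vms_per_host, containers_per_host)  # → 32 116
```

Even with these rough assumptions, the same host serves several times more container instances than VM instances, which is the utilization gain Figure 3 illustrates.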
However, even in microservice architectures, the ser-
vice concept is an always-on concept. So, at least one ser-
vice instance (container) must be active and running for
each microservice at all times. Thus, even container tech-
nologies do not overcome the need for always-on compo-
nents. Also, always-on components are one of the most
expensive and therefore avoidable cloud workloads,
according to Weinmann.
Figure 4. Comparing containers and virtual machines (adapted from the Docker website).
Thus, the question arises as to whether it is possible to execute service instances only
the case of actual requests? The answer leads to FaaS con-
cepts and corresponding platforms that will be discussed
in Section 3.1.3.
3.1.3 Function as a Service. Microservice architectures pro-
pose a solution to efficiently scale computing resources
that are hardly realizable with monolithic architectures.
The allocated infrastructure can be better tailored to the
microservice needs due to the independent scaling of each
one of them via standardized deployment units, addressed
in Section 3.1.2. However, microservice architectures face additional efforts, such as deploying every single microservice and scaling and operating them in cloud infrastructures. To address these concerns, container orchestrating plat-
forms, such as Kubernetes
and Mesos/Marathon,
emerged. However, this shifts the problem to the operation
of these platforms, and these platforms are still always-on
components. Thus, so-called serverless architectures and
FaaS platforms have emerged in the cloud service ecosys-
tem. The AWS Lambda service might be the most promi-
nent one, but there exist more, such as Google Cloud
Functions, Azure Functions, OpenWhisk, and Spring
Cloud Functions, to name just a few. However, all (commercial) platforms follow the same principle of providing minimal and fine-grained services (just exposing one stateless function) that are billed on a runtime-consumption basis (millisecond dimension).
FaaS is more fine-grained than microservices and facil-
itates the creation of functions. Therefore, these fine-
grained functions are sometimes called nanoservices.
These functions can be quickly deployed and automati-
cally scaled, and provide the potential to reduce infrastruc-
ture and operation costs. Unlike the deployment unit
approaches of Section 3.1.2 – that are still always-on soft-
ware components – functions are only processed if there
are active requests. Thus, FaaS can be much more cost
efficient than just containerized deployment approaches.
According to a cost comparison study of monolithic,
microservice, and FaaS architectures in a case study by
Villamizar et al.,
cost reductions of up to 75% are possible.
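How request-driven billing can produce savings of this magnitude can be illustrated with a simple calculation. All prices and workload figures below are hypothetical, not actual provider rates and not the numbers from the cited study:

```python
# Illustrative only: hypothetical prices and workload, not provider rates.
hours_per_month = 730
requests_per_month = 2_000_000
ms_per_request = 200

always_on_instance_per_hour = 0.10    # one container/VM kept running 24x7
faas_price_per_ms = 0.10 / 3_600_000  # same hourly rate, billed per ms actually used

always_on_cost = hours_per_month * always_on_instance_per_hour
faas_cost = requests_per_month * ms_per_request * faas_price_per_ms

print(round(always_on_cost, 2), round(faas_cost, 2))  # → 73.0 11.11
```

The always-on instance pays for every hour of the month; the FaaS variant pays only for the roughly 111 hours of accumulated request processing time, so a low-duty-cycle workload ends up several times cheaper. For a workload that keeps the instance busy around the clock, the advantage disappears.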
On the other hand, there are still open problems, such as the serverless trilemma. The serverless trilemma "captures the inherent tension between economics, performance, and synchronous composition" of serverless functions. One obvious problem stressed by Baldini et al. is the "double spending problem" shown in Figure 5. This problem occurs when a serverless function f calls another serverless function g synchronously. The consumer is billed for the execution of f and g, although only g is consuming resources because f is waiting for the result of g. To avoid this double spending problem, many serverless applications delegate the composition of fine-grained serverless functions into higher-order functionality to client applications and edge devices outside the scope of FaaS platforms. This composition problem leads to new, more distributed and decentralized, forms of cloud-native architectures investigated in Section 3.2.2.
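A back-of-the-envelope model makes the double spending effect concrete. The per-millisecond rate and the durations below are invented for illustration; they are not real provider prices:

```python
def billed_cost(duration_ms, rate_per_ms=0.000002):
    """Cost of one function execution, billed per millisecond."""
    return duration_ms * rate_per_ms

# Hypothetical durations: g performs 50 ms of real work, while f needs
# 10 ms of its own work plus the synchronous wait for g's result.
G_MS = 50
F_OWN_MS = 10

# Synchronous composition: f's billing clock keeps running while it
# idles, so the consumer pays for f's waiting time *and* for g itself.
sync_cost = billed_cost(F_OWN_MS + G_MS) + billed_cost(G_MS)

# Client-side composition: the client (or edge device) calls f and g
# separately, so no function is billed for waiting on another one.
client_cost = billed_cost(F_OWN_MS) + billed_cost(G_MS)

overhead = sync_cost / client_cost  # 110 ms billed instead of 60 ms
```

In this toy setting the synchronous composition is billed for 110 ms instead of 60 ms, an overhead of roughly 83% that grows with the depth of the call chain.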
3.2 Architectural evolution
The reader has seen in Section 3.1 that cloud-native applications have striven for better resource utilization, mainly by applying more fine-grained deployment units in the shape of lightweight containers (instead of virtual machines) or in the shape of functions in the case of FaaS approaches. Moreover, these improvements of resource utilization rates had an impact on how the architectures of cloud applications evolved. Two major architectural trends (microservices and serverless architectures) have emerged in cloud application architectures in the last decade. We will investigate microservice architectures in Section 3.2.1 and serverless architectures in Section 3.2.2.

Figure 5. The double spending problem resulting from the serverless trilemma. FaaS: Function as a Service.

8 Journal of Defense Modeling and Simulation: Applications, Methodology, Technology 00(0)
3.2.1 Microservice architectures. Microservices form

"…an approach to software and systems architecture that builds on the well-established concept of modularisation but emphasises technical boundaries. Each module — each microservice — is implemented and operated as a small yet independent system, offering access to its internal logic and data through a well-defined network interface. This architectural style increases software agility because each microservice becomes an independent unit of development, deployment, operations, versioning, and scaling."
Often-mentioned benefits of microservice architectures are faster delivery, improved scalability, and greater autonomy. Different services in a microservice architecture can be scaled independently from each other according to their specific requirements and actual request stimuli.

What is more, each service can be developed and operated by different teams. So, microservices have not only a technological but also an organizational impact. These teams can make localized decisions per service regarding programming languages, libraries, frameworks, and more. This organizational impact enables, on the one hand, best-of-breed approaches within each area of responsibility. On the other hand, it might increase the technological heterogeneity across the complete system. What is more, the corresponding long-term effects regarding the maintainability of such systems might not even have been observed so far.
First-generation microservices are formed of individual services that were packaged using container technologies. These services were then deployed and managed at runtime using container orchestration tools, such as Mesos. Each service was responsible for keeping track of other services and invoking them via specific communication protocols. Failure handling was implemented directly in the service source code. With an increase of services per application, the reliable and fault-tolerant location and invocation of appropriate service instances became a problem in itself. If new services were implemented using different programming languages, reusing existing discovery and failure-handling code would become increasingly difficult. So, freedom of choice and "polyglot programming" are often-mentioned benefits of microservices, but they have drawbacks that need to be managed.
Therefore, second-generation microservice architectures made use of discovery services and reusable fault-tolerant communication libraries. Common discovery services (such as Consul, see Table 2) were used to register provided functionalities. During service invocation, all protocol-specific and failure-handling features were delegated to an appropriate communication library, such as Finagle (see Table 2). This simplified service implementation and the reuse of boilerplate communication code across services.
The third generation introduced service proxies as transparent service intermediates with the intent to improve software reusability. So-called sidecars encapsulate reusable service discovery and communication features as self-contained services that can be accessed via existing fault-tolerant communication libraries, which are provided by almost every programming language nowadays. Because of their network-intermediary conception, sidecars are well suited for monitoring the behavior of all service interactions in a microservice application. This intermediary role is precisely the idea behind service mesh technologies such as Linkerd (see Table 2). These tools extend the notion of self-contained sidecars to provide a more integrated service communication solution. Using service meshes, operators gain much more fine-grained control over service-to-service communication, including service discovery, load balancing, fault tolerance, message routing, and even security.
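The sidecar idea can be sketched in a few lines. The following Python toy is purely illustrative (in-process callables stand in for real network endpoints, and all names are invented); it shows why a network intermediary is a natural home for discovery, load balancing, retries, and monitoring:

```python
import random

class Sidecar:
    """Toy sidecar: the application talks only to its local sidecar,
    which takes over discovery, load balancing, retries, and metrics.
    Real sidecars (e.g., Envoy) do this as separate network proxies."""

    def __init__(self, registry, retries=3):
        self.registry = registry  # service name -> list of callables
        self.retries = retries
        self.metrics = {"calls": 0, "errors": 0}

    def invoke(self, service_name, payload):
        instances = self.registry[service_name]
        last_error = None
        for _ in range(self.retries):
            self.metrics["calls"] += 1
            try:
                # Pick any instance: naive load balancing.
                return random.choice(instances)(payload)
            except ConnectionError as err:
                self.metrics["errors"] += 1
                last_error = err
        raise last_error

registry = {"echo": [lambda payload: {"echo": payload}]}
sidecar = Sidecar(registry)
reply = sidecar.invoke("echo", "ping")
```

Because every call passes through the sidecar, the metrics dictionary already observes all service interactions, which is exactly the property service meshes build upon.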
So, besides the pure architectural point of view, the following tools, frameworks, services, and platforms (see Table 2) form our current understanding of the term microservice.

- Service discovery technologies let services communicate with each other without explicitly referring to their network locations.
- Container orchestration technologies automate container allocation and management tasks, abstracting away the underlying physical or virtual infrastructure from service developers. That is the reason we see this technology as an essential part of any cloud-native application stack (see Figure 2).
- Monitoring technologies, often based on time-series databases, enable runtime monitoring and analysis of the behavior of microservice resources at different levels of detail.
- Latency and fault-tolerant communication libraries let services communicate more efficiently and reliably in permanently changing system configurations, with plenty of service instances permanently joining and leaving the system according to changing request stimuli.
- Continuous delivery technologies integrate solutions, often into third-party services, that automate many of the DevOps practices typically used in a web-scale microservice production environment.
- Service proxy technologies encapsulate mainly communication-related features, such as service discovery and fault-tolerant communication, and expose them over HTTP.
- Finally, the latest service mesh technologies build on sidecar technologies to provide a fully integrated service-to-service communication monitoring and management environment.

Kratzke and Siegfried 9
Table 2 shows that a complex tool-chain evolved to
handle the continuous operation of microservice-based
cloud applications.
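As a toy illustration of the discovery idea from that tool-chain, the registry below mimics what Consul or etcd provide as a highly available network service; the class and all addresses are made up for this sketch:

```python
class ServiceRegistry:
    """Toy stand-in for a discovery service such as Consul or etcd:
    services register their network location under a logical name,
    and callers resolve that name instead of hard-coding addresses."""

    def __init__(self):
        self._services = {}  # name -> list of (host, port) instances

    def register(self, name, host, port):
        self._services.setdefault(name, []).append((host, port))

    def deregister(self, name, host, port):
        self._services[name].remove((host, port))

    def resolve(self, name):
        instances = self._services.get(name)
        if not instances:
            raise LookupError(f"no instance of '{name}' registered")
        return instances[0]  # a real registry would also health-check

registry = ServiceRegistry()
registry.register("billing", "10.0.0.7", 8080)
registry.register("billing", "10.0.0.8", 8080)
host, port = registry.resolve("billing")
```

Because instances register and deregister themselves at runtime, callers always resolve a current address by name, which is what makes permanently changing system configurations manageable.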
3.2.2 Serverless architectures. Serverless computing is a cloud computing execution model in which the allocation of machine resources is dynamically managed and intentionally out of the control of the service customer. The ability to scale to zero instances is one of the critical differentiators of serverless platforms compared with container-focused PaaS or virtual machine-focused IaaS services. Scale-to-zero makes it possible to avoid always-on components and therefore excludes the most expensive cloud usage pattern, according to Weinmann. That might be one reason why the term "serverless" has become more and more common since 2014. However, what is "serverless" exactly? Servers must still exist somewhere.
So-called serverless architectures replace server administration and operation mainly by using FaaS concepts and integrating third-party backend services. Figure 3 showed the evolution of how resource utilization has been optimized over the last 10 years, ending in the latest trend to make use of FaaS platforms. FaaS platforms apply time-sharing principles and increase the utilization factor of computing infrastructures, and thus avoid expensive always-on components. As already mentioned, at least one study showed that due to this time-sharing, serverless architectures can reduce costs by 70%. A serverless platform is merely an event processing system (see Figure 6). According to Baldini et al., serverless platforms take an event (sent over HTTP or received from a further event source in the cloud); then these platforms determine which functions are registered to process the event, find an existing instance of the function (or create a new one), send the event to the function instance, wait for a response, gather execution logs, make the response available to the user, and stop the function when it is no longer needed. Besides application programming interface (API) composition and aggregation to reduce API calls, event-based applications are very much suited for this approach.
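A minimal event-processing function in this style might look as follows. The sketch uses the AWS Lambda Python calling convention (`handler(event, context)`); the payload fields are invented for illustration:

```python
import json

def handler(event, context=None):
    """Stateless, event-triggered function: the platform locates or
    creates an instance, hands over the event, returns the response,
    and may stop the instance again once it is no longer needed."""
    body = json.loads(event.get("body", "{}"))
    celsius = body["celsius"]  # invented payload field
    fahrenheit = celsius * 9 / 5 + 32
    return {
        "statusCode": 200,
        "body": json.dumps({"fahrenheit": fahrenheit}),
    }

# A platform-side invocation, e.g., triggered by an HTTP event:
response = handler({"body": json.dumps({"celsius": 100})})
```

Note that the function holds no state between invocations; everything it needs arrives with the event, which is what allows the platform to create and destroy instances at will.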
Serverless platform provision models can be grouped into the following categories.

- Public (commercial) serverless services of public cloud service providers provide computational runtime environments, also known as FaaS platforms. Some well-known representatives include AWS Lambda, Google Cloud Functions, and Microsoft Azure Functions. All of the mentioned commercial serverless computing models are prone to create vendor lock-in (to some degree).
- Open (source) serverless platforms such as Apache's OpenWhisk and OpenLambda might be an alternative, with the downside that these platforms need infrastructure.
- Provider-agnostic serverless frameworks provide a provider- and platform-agnostic way to define and deploy serverless code on various serverless platforms or commercial serverless services. So, these frameworks are an option to avoid (or reduce) vendor lock-in without the necessity to operate one's own infrastructure.
So, on the one hand, serverless computing provides some inherent benefits, such as resource and cost efficiency, operational simplicity, and a possible increase in development speed and better time-to-market. On the other hand, serverless computing also comes with some noteworthy drawbacks, such as runtime constraints, state constraints, and still unsatisfactorily solved function composition problems, such as the double spending problem (see Figure 5). What is more, the resulting serverless architectures have security implications. They increase attack surfaces and shift parts of the application logic (service composition) to the client side (which is not under complete control of the service provider). Furthermore, FaaS increases vendor lock-in problems and client complexity, as well as integration and testing complexity.

Table 2. Some observable microservice engineering ecosystem components.

- Service discovery: Zookeeper, Eureka, Consul, etcd, Synapse
- Container orchestration: Kubernetes, Mesos, Swarm, Nomad
- Monitoring: Graphite, InfluxDB, Sensu, cAdvisor, Prometheus, Elastic Stack
- Fault-tolerant communication: Finagle, Hystrix, Proxygen, Resilience4j
- Continuous delivery services: Ansible, Circle CI, Codeship, Drone, Spinnaker, Travis CI
- Service proxy: Envoy
- Service meshes: Linkerd, Istio
Furthermore, Figure 7 shows that serverless architec-
tures (and microservice architectures as well) require a
cloud application architecture redesign, compared to tradi-
tional e-commerce applications. Much more than micro-
service architectures, serverless architectures integrate
third-party backend services, such as authentication or
database services, intentionally. Functions on FaaS plat-
forms provide only very service specific, security relevant,
or computing-intensive functionality.

Figure 6. Blueprint of a serverless platform architecture. FaaS: Function as a Service; API: application programming interface.

Figure 7. Serverless architectures result in a different and less centralized composition of application components and backend services compared with classical tiered application architectures. API: application programming interface; FaaS: Function as a Service; BaaS: Backend as a Service.

All functionality that would classically have been provided on a central application server is now provided as many isolated micro- or even nanoservices. The integration of all these
isolated services as meaningful end-user functionality is
delegated to end devices (very often in the shape of native
mobile applications or progressive web applications). In
summary, we can see the following observable engineer-
ing decisions in serverless architectures.
- Former cross-sectional but service-internal logic, such as authentication or storage, is outsourced to external third-party services.
- Even nano- and microservice composition is shifted to end-user clients or edge devices. This means that even service orchestration is no longer done by the service provider itself but by the service consumer via provided applications. This end-user orchestration has two interesting effects: (1) the service consumer now provides the resources needed for service orchestration; (2) because the service composition is done outside the scope of the FaaS platform, still unsolved FaaS function composition problems (such as the double spending problem) are avoided.
- Such client or edge devices interface third-party services directly.
- Endpoints of service-specific functionality are provided via API gateways. So, HTTP- and REST-based/REST-like communication protocols are generally preferred.
- Only very domain- or service-specific functions are provided on FaaS platforms. This is mainly the case when this functionality is security relevant and should be executed in a controlled runtime environment by the service provider, when the functionality is too processing- or data-intensive to be executed on consumer clients or edge devices, or when the functionality is so domain-, problem-, or service-specific that simply no external third-party service exists.
Finally, the reader might observe that serverless architectures are more decentralized and distributed, make more intentional use of independently provided services, and are therefore much more intangible (more "cloudy") compared with microservice architectures.
4. Impacts on the Modeling and Simulation as a Service domain

The impacts on MSaaS are presented from diverse points of view. Section 4.1 will present several example use cases to derive some implications for cloud-native simulations (CNSs; see Section 4.2). Section 4.3 will explain how these implications have been considered in a CNS reference model (see Figure 8). In addition, Section 4.4 will discuss some limitations that should be considered to raise the overall maturity level of CNSs (Section 4.5).
4.1 Example cloud simulation use cases

There exist several examples and investigations of simulation models that have been successfully deployed to public cloud computing infrastructures.

- A cloud-based distributed agent-based traffic simulator named Megaffic.
- The Scalable Electro-Mobility Simulation Cloud Service was used to study the impact of large-scale electromobility on a city's infrastructure.
- The D-Mason framework is a parallel version of the Mason library for writing and running distributed agent-based simulations.
- GridSpice is a cloud-based simulation platform for distributed smart power grid simulation.
- The British Army is investigating the potential of virtual reality (VR), machine learning, and cloud computing for the Army's Collective Training Transformation Programme (CTTP). A series of training events aim to demonstrate VR and mixed reality (MR) capabilities. To display the potential benefits of data capture and machine learning-driven analytics for military training, subcontractors will also show the use of cloud computing in this context.
Guzzetti et al. investigated the impact of different high-performance computing (HPC) platforms for numerical simulation in computational hemodynamics with LiFEV (Library for Finite Elements). They compared in-house computing clusters, a large-scale university-based HPC cluster, and a regional supercomputer with public clouds. According to their results, cloud computing can be utilized for scientific computational fluid dynamics (CFD) simulations, possibly at a lower cost/performance ratio than using a more expensive local computing cluster. Ledyayev et al. evaluated the private cloud solution OpenStack by using three case studies in transportation modeling (network optimization), high-energy physics (Monte Carlo), and materials simulation (CFD). They concluded that cloud computing is suitable for multiple runs of non-concurrent code but needs specialist hardware to support parallel processing.
4.2 Implications for cloud-native simulations

If we analyze these examples, we see that cloud-based simulations are possible, even for large-scale problems. However, economies of scale rely intensely on the kind of simulation and the parallelizing approach of processing. Summarizing our central insights of Section 3, we get the following lessons learned from the cloud-native domain that can be transferred to simulation contexts. First of all, if a cloud-native application is an application that is composed of services, then, correspondingly, a CNS would be a simulation composed of small, independently deployable, and replaceable simulation services that simulate (UNIX-like) "one thing well" and can be scaled horizontally to enable parallel processing.

Consequently, existing (monolithic) simulations must be migrated into microservice architectures and would evolve somehow from a cloud-ready into a cloud-native maturity level (see Table 3). Cloud-native application engineering showed that it is rarely possible to transfer existing applications one-to-one into cloud environments without adaptations.
As the reader will notice, we propose a CNS stack (Figure 8) that is deeply based on the already introduced cloud-native stack (Figure 2). Consequently, the corresponding CNS engineering trends (Table 4) are derived from the general cloud-native engineering trends (Table 1). The cloud-native simulation stack, as well as the corresponding engineering trends, have been compiled systematically by "replacing" general cloud-native concepts with more specific CNS concepts. That is because we assume that a CNS is a particular cloud-native application (with some specific requirements). However, this eliminated some already discussed features and software trends; for example, the observable DevOps trend is a general software engineering trend. We do not see specific impacts on simulation service engineering here that go beyond standard software engineering. However, that does not mean
that this trend should not be applied in simulation service engineering. It only means that we do not see simulation-specific problems here.

Table 3. Cloud simulation maturity model (adapted from the Open Data Center Alliance).

Level 3 (Cloud native):
- Simulations are transferable across infrastructure providers at runtime and without interruption of service.
- Simulation services are automatically scaled out/in based on stimuli.

Level 2 (Cloud resilient):
- The state of simulation services is isolated in a minimum of services.
- Simulations are unaffected by dependent service failures.
- Simulations are infrastructure agnostic.

Level 1 (Cloud friendly):
- Simulations are composed of loosely coupled simulation services.
- Simulation services are discoverable by name.
- Simulation services are designed to cloud patterns.
- Compute and storage are separated.

Level 0 (Cloud ready):
- Simulations can be operated on virtualized infrastructure.
- Simulations can be instantiated from image or script.

Figure 8. Proposal of a cloud-native simulation stack. FaaS: Function as a Service; IaaS: Infrastructure as a Service.
The same is true for cloud modeling and cloud simulation tools (such as CloudSim) used to represent and analyze cloud architectures. At first glance, it seems obvious to cover these tools as well. However, we do not see that these tools are relevant to the research of CNSs in general, except for the case that cloud simulations should be run as CNSs. Simulating cloud infrastructures is simply one particular object of investigation, and this paper intentionally does not focus on specific objects of investigation.
We do not yet have a common definition that explains what a CNS exactly is. Nevertheless, we use our experiences with cloud-native applications to derive a definition proposal for CNSs. If we assume that a CNS is a special kind of cloud-native application, we should consider the following aspects.

Fehling et al. postulate that almost all cloud-native systems should be IDEAL: they [i]solate their state, they are [d]istributed in their nature, they are [e]lastic in a horizontal scaling notion, they are operated on [a]utomated management systems, and their components are [l]oosely coupled. According to Stine, there are common motivations for cloud-native architectures, such as to deliver software-based solutions more quickly (speed), in a more fault-isolating, fault-tolerating, and automatically recovering way (safety), to enable horizontal (instead of vertical) application scaling (scale), and finally to handle a diversity of consumer platforms and legacy systems (client diversity). Several application architectures and infrastructure approaches address these common motivations.
- Microservices represent the decomposition of monolithic systems into independently deployable services that do "one thing well."
- The primary mode of interaction between services in a cloud-native application architecture is via published and versioned APIs (API-based collaboration). These APIs are often HTTP based and follow a REST style with JSON serialization, but other protocols and serialization formats can be used as well.
- Single deployment units of the architecture are designed and interconnected according to a collection of cloud-focused patterns, such as the twelve-factor app collection, the circuit breaker pattern, or cloud computing patterns.

Table 4. Expectable software engineering trends for cloud-native simulation services.

- Microservices: Simulation architectures should be composed of small and independently replaceable, horizontally scalable simulation services that are "simulating one thing well."
- Modeling languages: Existing simulation modeling languages should be extended to define the composition of simulation services and their elasticity behavior.
- Standardized deployment units: Simulation deployment units should wrap a piece of simulation software in a complete file system that contains everything needed to run: code, runtime, system tools, and system libraries. It is thus guaranteed that the software will always run the same, regardless of its environment. This deployment approach can be realized using standardized container technologies (OCI standard).
- Elastic platforms: Simulation platforms should evolve into a unifying middleware of cloud infrastructures. Such platforms extend resource sharing and increase the utilization of underlying compute, network, and storage resources for custom but standardized simulation deployment units.
- Serverless: Serverless simulation would be an architectural style for cloud-based simulations that deeply depend on external third-party simulation services, integrating them via small event-triggered functions (Function as a Service, FaaS).
- State isolation: Stateless simulation services are easier to scale up/down horizontally than stateful simulation services. Of course, stateful components cannot be avoided, but they should be reduced to a minimum and realized by intentionally horizontally scalable storage systems (often eventually consistent NoSQL databases).
- Versioned REST APIs: If simulation services provide versioned REST APIs, this inherently provides scalable and pragmatic communication. Such simulation service communication relies mainly on already existing internet infrastructure and well-defined and widespread standards. It would even enable the seamless integration of simulation services that are not "cloud-native" but only "internet accessible."
- Loose coupling: Simulation service composition can be done by events or by data. Event coupling in "normal" cloud-native applications relies on messaging solutions (e.g., the AMQP standard). Data coupling relies on scalable but (mostly) eventually consistent storage solutions (often subsumed as NoSQL databases).

CNA: cloud-native architecture; MSaaS: Modeling and Simulation as a Service; API: application programming interface; AMQP: Advanced Message Queueing Protocol.
- More and more often, elastic container platforms are used to deploy and operate these microservices via self-contained deployment units (containers). These platforms provide additional operational capabilities on top of IaaS infrastructures, such as automated and on-demand scaling of application instances, application health management, dynamic routing, load balancing, and aggregation of logs and metrics.
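One of these cloud-focused patterns, the circuit breaker, fits into a short sketch. This is a minimal illustration; production systems would rather rely on the libraries listed in Table 2, such as Hystrix or Resilience4j:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: after max_failures consecutive errors
    the circuit opens and further calls fail fast, giving the remote
    service reset_after seconds to recover before one probe call."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow a single probe
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```

Failing fast instead of piling up requests toward an unhealthy service is what prevents cascading failures in densely interconnected microservice deployments.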
4.3 A cloud-native simulation reference model

These aspects let us derive the following understanding of a CNS system and the corresponding CNS stack (Figure 8).

The core design idea of plenty of cloud-native application architectures inspires the leading conceptual approach of the derived CNS stack (Figure 8). Every simulation (or service) on Layer 4 should be composable of stateless Layer 3 simulation services that rely on services managing and encapsulating the simulation state. This separation of concerns (simulation logic and simulation state) makes it possible for distributed simulations to decide between eventual and strict consistency models for the simulation state. It enables seamless horizontal scalability of functional simulation services and is, according to our experiences, a widespread and proven pattern in cloud-native application architectures.
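A minimal sketch of this separation of concerns (all names and the trivial "physics" are invented): the simulation logic is a pure, stateless function, so any replica of a Layer 3 service can execute any step, while the state lives in a dedicated state-managing service, here reduced to an in-memory store:

```python
class StateStore:
    """Stand-in for a state-managing Layer 3 service, e.g., backed by
    a horizontally scalable key-value database in a real deployment."""

    def __init__(self):
        self._state = {}

    def load(self, sim_id):
        return self._state.get(sim_id, {"t": 0, "x": 0.0})

    def save(self, sim_id, state):
        self._state[sim_id] = state

def step(state, dt=1):
    """Stateless simulation service: a pure function of loaded state.
    Any number of identical replicas can execute steps in parallel."""
    return {"t": state["t"] + dt, "x": state["x"] + 0.5 * dt}

store = StateStore()
for _ in range(3):  # three steps, possibly on three different replicas
    state = store.load("sim-42")
    store.save("sim-42", step(state))
```

Because `step` holds no state of its own, scaling the simulation logic is reduced to starting more replicas, while consistency decisions are confined to the state store.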
Another general cloud application architecture best practice is the standardization of the deployment of Layer 3 simulation services. This deployment standardization via a Layer 2 elastic simulation platform enables one to operate plenty of services on the same physical or virtual Layer 1 hardware. In the general cloud computing context, this is customarily done via container-based technologies. A container is nothing more than a self-contained deployment unit encapsulating all its runtime dependencies. It exposes its functionality very often via REST-based interfaces. Such containers can be operated on corresponding container platforms, such as Kubernetes, Mesos, Docker Swarm, and more. These kinds of platforms are application-agnostic and can be used for arbitrary types of applications. Consequently, they can be used for simulation services as well. Therefore, we recommend using these kinds of building blocks for elastic simulation platforms (Layer 2).
What is more, the proposed CNS stack aligns with the general design principles of distributed and federated simulations that have been successfully standardized, for example, via the high-level architecture (HLA). HLA is a standard for distributed simulation, used when building a simulation for a larger purpose by combining (federating) several simulations. HLA requires a runtime infrastructure (RTI) that provides a standardized set of services, as specified in the HLA Federate Interface Specification. This RTI corresponds closely to Layer 2 of the proposed reference model. Further HLA services of the interface specification can be mapped to our model as well:
- federation management services → simulation deployment unit orchestrator;
- object and ownership management services → stateful simulation services;
- time management services → stateful simulation services;
- declaration and data distribution services → existing messaging solutions (e.g., all Advanced Message Queueing Protocol (AMQP) message brokers), which can be deployed on Layer 3 by the simulation deployment unit orchestrator alongside further simulation-specific services.
4.4 Discussion of limitations

We have to admit that this mapping stays vague at this level of abstraction. So, more detailed common cross-functional simulation services (such as timing or messaging services) on Layer 3 could (and should) be defined in future MSaaS work. However, the CNS stack does not request a specific time or messaging simulation service (or other services). It merely recommends providing such services in a programming language-agnostic way, as microservice approaches do via HTTP and REST-based versioned APIs. Making use of such common internet communication standards would efficiently tackle one downside of current HLA-based approaches.

Because HLA is a message-oriented middleware that defines a set of services, mostly provided by a C++ or Java API, there is no standardized on-the-wire protocol. In consequence, participants in a federation are very often bound to RTI libraries from the same provider, and usually also of the same version, for applications to interoperate. The resulting simulations are mostly so-called deployment monoliths.
Instead of that, the cloud-native application stack would not request a specific set of services but the way to interface every simulation service in an Internet standard-conforming manner. Each simulation service would be a self-contained deployment unit encapsulating all its runtime dependencies. Every simulation service on Layer 3 and higher should be integrated using standardized internet protocols, such as HTTP and REST-based approaches.

A CNS is a distributed, elastic, and horizontally scalable simulation system composed of simulation (micro)services that isolate the state in a minimum of stateful components. The self-contained deployment units of that simulation are designed according to cloud-focused design patterns and operated on self-service elastic simulation platforms.
We also have to consider that cloud computing has traditionally not been HPC or simulation oriented. It is often mentioned that current Internet of Things (IoT) approaches, therefore, are being implemented using Fog or Edge computing approaches. However, this has more to do with communication latencies outside the scope of the regional datacenters of the hyperscalers. Nevertheless, simulations are often based on message passing or data analysis, and then current cloud approaches or MSaaS could be the wrong choice. Recent studies by NASA showed that "in all cases, the full cost of running on NASA on-premises resources was less than the lowest-possible compute-only cost of running on AWS." This NASA study showed that cloud-based simulations tend to be between two and even 12 times more expensive than simulations operated on on-premises simulation facilities.
This sounds disillusioning. However, the study did not take into account that cloud-based simulations should follow a different kind of architectural style to leverage the economic benefits. Classical simulations are very often so-called deployment monoliths. This kind of architecture is hardly scalable and does not fit the general pay-as-you-go business model of cloud computing. Insights from the cloud-native application domain show that cloud-native deployments should make use of more fine-grained and horizontally scalable units to optimize resource usage. If this can be done (and this might be simulation specific), then other studies, by Villamizar et al., show that costs could be reduced to 25%. So, if we take both kinds of investigation into account, MSaaS would be a reasonable option even for simulations that are still two to four times more expensive according to the NASA cost comparison methodology. That is exactly what Taylor et al. are stating. However, not all simulations will benefit equally from MSaaS.
What is more, the reader should take into account that
plenty of small companies, organizations, or independent
researchers do not have access to simulation facilities
comparable to NASA’s High-End Computing Capability
(HECC) project. For people that do not have access to
such supercomputing facilities, cost and performance com-
parisons make little sense.
4.5 How to raise the cloud-native maturity of simulations
The following setting should be considered to reach such a cloud-native level.

- We need standardized deployment units for simulation components (services) and a standardized platform to operate them. Furthermore, the deployment units should provide a better operating density than virtual machines.
- Event-triggered FaaS-based simulation services can be considered to avoid expensive always-on services.
- FaaS-based simulation services will likely change the architecture of simulations to avoid the double spending problem (see Figure 5). Furthermore, runtime limitations of FaaS functions, start-up latencies, and state preservation must be considered and might limit the applicability for special kinds of time-sensitive simulations.
- Horizontal scalability in cloud-native applications is mostly realized via loosely coupled (event-based) microservice approaches. These scalability requirements should be considered for CNS architectures as well.
- That raises the need for simulation service meshes that connect, secure, control, and observe simulation services to enable loose coupling of simulation services.
- FaaS-based simulation service composition problems might raise the need for domain-specific languages (DSLs) to compose different kinds of (stateful, stateless, serverless) simulation services frictionlessly.
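A minimal sketch of the event-triggered, stateless style recommended above (the event format, field names, and the constant-velocity model are assumptions; no real FaaS platform is involved): the handler receives the complete simulation state in the triggering event and returns the successor state, so the platform can replicate it freely and scale it to zero between events.

```python
import json

def handle_step_event(event: dict) -> dict:
    """Hypothetical FaaS handler for one simulation step.

    Stateless by design: all state arrives in the triggering event and the
    successor state is returned (e.g. to an event bus or object store in a
    real deployment), so no instance affinity is required."""
    state = event["state"]
    dt = event["dt"]
    # Hypothetical model: constant-velocity motion.
    new_state = {"t": state["t"] + dt,
                 "pos": state["pos"] + state["vel"] * dt,
                 "vel": state["vel"]}
    # Signal whether a follow-up event should be emitted.
    return {"state": new_state, "emit_next": new_state["t"] < event["t_end"]}

if __name__ == "__main__":
    # A local driver stands in for the event bus that would re-trigger
    # the function in a FaaS deployment.
    event = {"state": {"t": 0.0, "pos": 0.0, "vel": 2.0},
             "dt": 0.5, "t_end": 1.0}
    while True:
        result = handle_step_event(event)
        event["state"] = result["state"]
        if not result["emit_next"]:
            break
    print(json.dumps(event["state"]))  # pos 2.0 at t 1.0
```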
For operational convenience, the operation of cloud-
based simulation platforms and the operation of simula-
tions on top of these platforms should be handled as two
independent but complementary engineering problems.
Corresponding and simulation-specific engineering trends are listed in Table 4 for the convenience of the reader.
5. Related work
As far as the authors know, no survey has focused inten-
tionally on observable trends in cloud computing over the
last decade from a ‘‘big picture’’ and evolutional point of
view. This paper grouped that evolution into the following
points of view:
- resource utilization optimization approaches, such as containerization and FaaS approaches;
- the architectural evolution of cloud applications via microservices and serverless architectures.
For all four of these specific aspects (containerization, FaaS, microservices, serverless architectures) there exist surveys that should be considered by the reader.

16 Journal of Defense Modeling and Simulation: Applications, Methodology, Technology 00(0)

The studies and surveys deal mainly with containerization and its accompanying resource efficiency. Although FaaS is quite young and is not reflected extensively in research so far, there exist first survey papers dealing with FaaS approaches, deriving some open research questions regarding tool support, performance, patterns for serverless solutions, enterprise suitability, and whether serverless architectures will extend beyond traditional cloud platforms and architectures.
Service composition provides value-adding and higher-order services by composing basic services that can even be provided pervasively by various organizations.
What is more, service computing is quite established, and
there are several surveys on SOA-related aspects.
However, more recent studies focus mainly on microser-
vices. Dragoni et al.,
Jamshidi et al.,
and Cerny et al.
focus on the architectural point of view and the relation-
ship between SOA and microservices. All of these papers are helpful for understanding the current microservice "hype" better, and studying them is highly recommended. However, these papers are bound to microservices and do not take the "big picture" of general cloud application and simulation architecture evolution into account. Very often, serverless architectures are subsumed to some degree as a part of microservices. The authors are not quite sure that this subsumption is justified: serverless architectures may introduce fundamentally new aspects into cloud simulation architectures, arising from the "scale-to-zero" capability on the one hand and the unsolved function composition aspects (such as the double spending problem) on the other hand.
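The double spending problem mentioned above can be illustrated locally (a sketch only; a thread pool stands in for a remote function invocation, and the function names are hypothetical): while the composed function runs, the synchronously waiting composing function accrues billed runtime as well, so the same wall-clock time is paid for twice.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def worker_function(payload):
    """A composed ('inner') cloud function: billed for its own runtime."""
    time.sleep(0.2)  # stand-in for real simulation work
    return payload * 2

def orchestrator_function(payload):
    """A composing ('outer') cloud function that synchronously waits for
    the inner one. During the wait, both functions accrue billed runtime:
    the 'double spending' problem of naive FaaS composition."""
    start = time.monotonic()
    with ThreadPoolExecutor() as pool:  # stand-in for a remote invocation
        result = pool.submit(worker_function, payload).result()
    billed_outer = time.monotonic() - start
    return result, billed_outer

if __name__ == "__main__":
    result, billed_outer = orchestrator_function(21)
    print(result)  # 42
    # billed_outer covers at least the worker's whole runtime (~0.2 s here),
    # even though the orchestrator itself did no useful work meanwhile.
```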
M&S products are highly valuable to NATO and the
military, and it is essential that M&S products, data, and
processes are conveniently accessible to a large number of
users as often as possible. Therefore, a new M&S ecosys-
tem is required where M&S products can be accessed
simultaneously and spontaneously by a large number of
users for their individual purposes. This ‘‘as-a-service’’
paradigm has to support stand-alone use as well as the
integration of multiple simulated and real systems into a
unified simulation environment whenever the need arises.
Several approaches head into this direction.
‘‘Allied Framework for MSaaS’’ is the common approach
of NATO and the nations toward implementing MSaaS
and is defined by the following documents.
- Operational Concept Document (OCD): the OCD describes the intended use, key capabilities, and desired effects of the Allied Framework for MSaaS from a user's perspective.
- Technical Reference Architecture: the Technical Reference Architecture describes the architectural building blocks and patterns for realizing MSaaS capabilities.
- Governance Policies: the Governance Policies identify MSaaS stakeholders and relationships and provide guidance for implementing and maintaining the Allied Framework for MSaaS.
The MSaaS Technical Reference Architecture is most important in the context of this paper. It provides technical guidelines, recommended standards, architecture building blocks, and architecture patterns that should be considered in realizing MSaaS capabilities. Compared to the CNS maturity model, the MSaaS Technical Reference Architecture provides guidance on a higher level (building blocks, patterns, and more), but does not explicitly define how to implement those building blocks (e.g., as microservices or FaaS).
6. Conclusion
Because the inherent cost structure of cloud computing
stays the same for simulation services, we forecast that
CNS architectures follow similar trends to cloud-native
applications. These similar trends should make CNS ser-
vices more microservice-like, or even nanoservice-like
(similar to Functions as a Service). To leverage the oppor-
tunities of cloud computing, they should be much more
composed of smaller and more fine-grained and domain-
specific simulation services that just ‘‘simulate one thing
well." Simulation services should strive to become stateless or to isolate state in a minimum of stateful components. As the inherent nature of simulations deeply relies on state (data), this focus on state might even raise some problems that are not so common in "normal" cloud-native application design and need new solutions to be developed. "Classical" cloud-native applications come along with 24×7 requirements. These 24×7 requirements are very often not necessary for simulations. This might provide short-cut opportunities in a cloud maturity
model for simulation services. A more detailed analysis of
how trends and approaches from cloud-native applications
might be applied to CNSs, and which new challenges
might arise, would be valuable future research.
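The recommendation to isolate state in a minimum of stateful components can be sketched as follows (hypothetical names; an in-memory dict stands in for the single stateful component, e.g. a key-value store, of a real deployment):

```python
class StateStore:
    """The single stateful component: every other simulation service stays
    stateless and reads/writes state here (a key-value store in a real
    deployment; an in-memory dict in this sketch)."""

    def __init__(self):
        self._states = {}

    def load(self, run_id: str) -> dict:
        return dict(self._states[run_id])

    def save(self, run_id: str, state: dict) -> None:
        self._states[run_id] = dict(state)

def stateless_step(state: dict) -> dict:
    """Pure function of its input: freely replicable and restartable."""
    return {"t": state["t"] + 1, "count": state["count"] + state["t"]}

def run(store: StateStore, run_id: str, steps: int) -> dict:
    """Drive the simulation: state only ever lives in the store."""
    for _ in range(steps):
        store.save(run_id, stateless_step(store.load(run_id)))
    return store.load(run_id)

if __name__ == "__main__":
    store = StateStore()
    store.save("demo", {"t": 0, "count": 0})
    print(run(store, "demo", 3))  # {'t': 3, 'count': 3}
```

Because `stateless_step` carries no hidden state, any replica can execute any step, while consistency concerns are confined to the one stateful component.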
Recent studies by NASA showed that cloud-based
simulations tend to be between two and even 12 times
more expensive than simulations operated on on-premises
simulation facilities. However, this study did not take into
account that cloud-based simulations should follow a dif-
ferent kind of architectural style to leverage the economic
benefits of cloud infrastructures. If this cloud-native rear-
chitecting of simulations can be done (and this might be
simulation specific), then other studies show that costs
could be reduced to 25%. So, taking both kinds of investigation into account, MSaaS would be a reasonable option for maybe not all but a significant portion of simulations.
What is more, plenty of small companies, organizations, or independent researchers do not have access to simulation facilities comparable to NASA's HECC project. For all those that do not have access to supercomputing facilities, such cost and performance comparisons make little sense. Cloud computing might be the only viable option for them.
Funding

The authors disclosed receipt of the following financial
support for the research, authorship, and/or publication of
this article: The CloudTRANSIT project of Nane Kratzke
has been funded by the German Ministry of Education and
Research (13FH021PX4). The NMSG activities of Robert
Siegfried have been funded by the German Federal Office
of Bundeswehr Equipment, Information Technology and
In-Service Support (BAAINBw).
ORCID iD

Nane Kratzke

References
1. NATO STO. Modelling and Simulation as a Service
(MSaaS) - Rapid Deployment of Interoperable and Credible
Simulation Environments. Technical report, AC/323(MSG-
136)TP/826, NATO Science and Technology Organization
(STO), 2018.
2. Siegfried R, McGroarty C, Lloyd J, et al. A new reality:
Modelling & Simulation as a Service. CSIAC J Cyber Secur
Inform Syst 2018; 6.
3. Weinman J. Mathematical proof of the inevitability of cloud computing, Weinman_Inevitability_Of_Cloud.pdf (2011, accessed 10 July 2018).
4. Mell PM and Grance T. The NIST definition of cloud com-
puting. Technical report, National Institute of Standards &
Technology, Gaithersburg, MD, USA, 2011.
5. Kratzke N and Quint PC. Technical report of project
CloudTRANSIT - Transfer cloud-native applications at runtime. Technical report, Lübeck University of Applied
Sciences, 2018,
6. OASIS. Advanced Message Queuing Protocol (AMQP), version 1.0 (2011, accessed 16 December 2019).
7. Kratzke N. Lightweight virtualization cluster – how to over-
come cloud vendor lock-in. J Comput Commun 2014; 2:
8. Kratzke N, Quint PC, Palme D, et al. Project cloud
TRANSIT - or to simplify cloud-native application provi-
sioning for SMEs by integrating already available container
technologies. In: Kantere V and Koch B (eds) European
project space on smart systems, big data, future internet -
towards serving the grand societal challenges, 2016, pp.3–
26. Setúbal, Portugal: SCITEPRESS.
9. Hogan M, Fang L, Sokol A, et al. Cloud Infrastructure
Management Interface (CIMI) model and RESTful HTTP-
based protocol,
(2015, accessed 16 December 2019).
10. Nyren R, Edmonds A, Papaspyrou A, et al. Open Cloud Computing Interface (OCCI) - core, version 1.1 (2011, accessed 16 December 2019).
11. Metsch T and Edmonds A. Open Cloud Computing Interface (OCCI) - infrastructure, version 1.1, GFD.184.pdf (2011, accessed 16 December 2019).
12. SNIA. Cloud Data Management Interface (CDMI), version 1.1.1 (2015, accessed 16 December 2019).
13. System Virtualization, Partitioning, and Clustering Working Group. Open Virtualization Format specification, version 2.1.1, DSP0243_2.1.1.pdf (2015, accessed 16 December 2019).
14. OCI. Open Container Initiative, https://www.opencontainers.
org (2015, accessed 4 February 2016).
15. OASIS. Topology and Orchestration Specification for Cloud
Applications (TOSCA), version 1.0,
tosca/TOSCA/v1.0/os/TOSCA-v1.0-os.pdf (2013, accessed 16
December 2019).
16. Opara-Martins J, Sahandi R and Tian F. Critical review of
vendor lock-in and its impact on adoption of cloud comput-
ing. In: International conference on information society
(i-Society 2014), London, 10–12 November 2014, pp.92–97.
17. Kratzke N and Peinl R. ClouNS - a cloud-native application
reference model for enterprise architects. In: 2016 IEEE
20th international enterprise distributed object computing
Workshop (EDOCW), Vienna, 5–9 September 2016, pp.1–
10. IEEE.
18. Bohn RB, Messina J, Liu F, et al. NIST cloud computing
reference architecture. In: world congress on services
(SERVICES 2011), Washington, DC, 4–9 July 2011,
pp.594–596. Washington, DC: IEEE Computer Society.
19. Quint PC and Kratzke N. Overcome vendor lock-in by inte-
grating already available container technologies - towards
transferability in cloud computing for SMEs. In: proceedings
of 7th international conference on cloud computing, grids
and virtualization (CLOUD COMPUTING 2016) (eds CB
Westphall, YW Lee and S Rass), pp.38–41.
20. Ardagna D, Casale G, Ciavotta M, et al. Quality-of-service
in cloud computing: modeling techniques and their applica-
tions. J Internet Serv Appl 2014; 5: 11.
21. White G, Nallur V and Clarke S. Quality of service
approaches in IoT: a systematic mapping. J Syst Softw 2017;
132: 186–203.
22. Villamizar M, Garcés O, Ochoa L, et al. Cost comparison of
running web applications in the cloud using monolithic,
microservice, and AWS Lambda architectures. Service
Orient Comput Appl. Epub ahead of print 27 April 2017.
DOI: 10.1007/s11761-017-0208-y.
23. Pahl C, Brogi A, Soldani J, et al. Cloud container technolo-
gies: a state-of-the-art review. IEEE Trans Cloud Comput
2017; 7: 1–1.
24. Dragoni N, Giallorenzo S, Lafuente AL, et al. Microservices:
yesterday, today, and tomorrow. In: Mazzara M and Meyer
B (eds) Present and ulterior software engineering. Cham:
25. Verma A, Pedrosa L, Korupolu M, et al. Large-scale cluster
management at Google with Borg. In: Proceedings of the
tenth European conference on computer systems (EuroSys
’15), Bordeaux, France, 21–24 April 2015, pp.1–17. New
York: ACM.
26. Hindman B, Konwinski A, Zaharia M, et al. Mesos: a plat-
form for fine-grained resource sharing in the data center. In:
proceedings of the 8th USENIX conference on networked
systems design and implementation (NSDI’11), Boston, MA,
30 March–1 April 2011, pp.295–308. Berkeley, CA:
USENIX Association.
27. Baldini I, Cheng P, Fink SJ, et al. The serverless
trilemma: function composition for serverless computing. In:
proceedings of the 2017 ACM SIGPLAN international sympo-
sium on new ideas, new paradigms, and reflections on pro-
gramming and software - onward!, Vancouver, BC, Canada,
25–27 October 2017, pp.89–103. New York: ACM.
28. Jamshidi P, Pahl C, Mendonça NC, et al. Microservices: the
journey so far and challenges ahead. IEEE Softw 2018; 35:
29. Taibi D, Lenarduzzi V and Pahl C. Architectural patterns for
microservices: a systematic mapping study. In: 8th interna-
tional conference on cloud computing and services science
(CLOSER‘18), Funchal, Madeira, Portugal, 19–21 March
2018, pp.221–232. SCITEPRESS.
30. Kratzke N and Quint PC. Understanding cloud-native appli-
cations after 10 years of cloud computing - a systematic map-
ping study. J Syst Softw 2017; 126: 1–16.
31. Balalaie A, Heydarnoori A and Jamshidi P. Microservices
architecture enables DevOps: migration to a cloud-native
architecture. IEEE Softw. Epub ahead of print 18 March
2016. DOI: 10.1109/MS.2016.64. arXiv: 1606.04036.
32. Roberts M. Serverless architectures, https://martinfowler.com/articles/serverless.html (2016, accessed 18 December 2019).
33. Baldini I, Castro P, Chang K, et al. Serverless computing:
current trends and open problems. In: Research advances in
cloud computing. Singapore: Springer Singapore, 2017,
34. Baldini I, Castro P, Cheng P, et al. Cloud-native, event-based
programming for mobile applications. In: Proceedings of the
international conference on mobile software engineering and
systems, Austin, TX, 16–17 May 2016, pp.287–288. ACM.
35. Taylor SJ, Kiss T, Anagnostou A, et al. The CloudSME
simulation platform and its applications: a generic multi-
cloud platform for developing and executing commercial
cloud-based simulations. Fut Generat Comput Syst 2018; 88:
36. Hanai M, Suzumura T, Ventresque A, et al. An adaptive
VM provisioning method for large-scale agent-based traffic
simulations on the cloud. In: 2014 IEEE 6th international
conference on cloud computing technology and science,
Singapore, 15–18 December 2014, pp.130–137. IEEE.
37. Zehe D, Knoll A, Cai W, et al. SEMSim Cloud Service:
large-scale urban systems simulation in the cloud. Simulat
Model Pract Theor 2015; 58: 157–171.
38. Carillo M, Cordasco G, Serrapica F, et al. D-Mason on the
cloud: an experience with Amazon Web Services. In:
Desprez F, Dutot P-F, Kaklamanis C, et al. (Eds.) Euro-Par
2016: parallel processing workshops, Lecture Notes in
Computer Science, pp.322–333. Cham: Springer
International Publishing.
39. Anderson K, Du J, Narayan A, et al. Gridspice: A distributed
simulation platform for the smart grid. IEEE Trans Ind
Informat 2014; 10: 2354–2363.
40. Guzzetti S, Passerini T, Slawinski J, et al.. Platform and algo-
rithm effects on computational fluid dynamics applications
in life sciences. Fut Generat Comput Syst 2017; 67: 382–
41. Ledyayev R and Richter H. High performance computing in
a cloud using openstack. Cloud Computing 2014; 5: 108–
42. Fehling C, Leymann F, Retter R, et al. Cloud computing pat-
terns. Wien, Austria: Springer, 2014.
43. Stine M. Migrating to cloud-native application architectures.
Sebastopol, CA: O'Reilly, 2015.
44. Newman S. Building microservices. Sebastopol, CA:
O’Reilly Media, Incorporated, 2015.
45. Namiot D and Sneps-Sneppe M. On micro-services architec-
ture. Int J Open Inform Technol 2014; 2: 24–27.
46. Wiggins A. The twelve-factor app (2014, accessed 14 February 2016).
47. Fowler M. Circuit breaker, bliki/CircuitBreaker.html (2014, accessed 27 May 2016).
48. Erl T, Cope R and Naserpour A. Cloud computing design
patterns. Westford, MA: Prentice Hall, 2015.
49. Chang S, Hood R, Jin H, et al. Evaluating the suitability of commercial clouds for NASA's high performance computing applications: a trade study. Technical report, NASA, Report_NAS-2018-01.pdf (2018, accessed 18 December 2019).
50. Kaur T and Chana I. Energy efficiency techniques in cloud
computing: a survey and taxonomy. ACM Comput Surv
2015; 48: 22:1–22:46.
51. Tosatto A, Ruiu P and Attanasio A. Container-based orches-
tration in cloud: state of the art and challenges. In: 2015 ninth
international conference on complex, intelligent, and soft-
ware intensive systems, Blumenau, Brazil, 8–10 July 2015,
52. Peinl R, Holzschuher F and Pfitzer F. Docker cluster man-
agement for the cloud - survey results and own solution. J
Grid Comput 2016; 14: 265–282.
53. Spillner J. Practical tooling for serverless computing. In:
proceedings of the 10th international conference on utility
and cloud computing (UCC ’17), Austin, TX, 5–8 December
2017, pp.185–186. New York: ACM.
54. Lynn T, Rosati P, Lejeune A, et al. A preliminary review of
enterprise serverless cloud computing (Function-as-a-
Service) platforms. In: 2017 IEEE international conference
on cloud computing technology and science (CloudCom),
Hong Kong, China, 11–14 December 2017, pp.162–169.
55. van Eyk E, Toader L, Talluri S, et al. Serverless is more:
from PaaS to present cloud computing. IEEE Internet
Comput 2018; 22: 8–17.
56. van Eyk E, Iosup A, Abad CL, et al. A SPEC RG Cloud
Group’s vision on the performance challenges of FaaS cloud
architectures. In: Proceedings of the 8th ACM/SPEC on
international conference on performance engineering (ICPE
2018), Berlin, Germany, ACM, 9–13 April 2018, pp 21–24.
57. Ylianttila M, Riekki J, Zhou J, et al. Cloud architecture for
dynamic service composition. Int J Grid High Perform
Comput 2012; 4: 17–31.
58. Zhou J, Riekki J and Sun J. Pervasive service computing
toward accommodating service coordination and collabora-
tion. In: 2009 4th international conference on frontier of
computer science and technology, Shanghai, China, 17–19
December 2009, pp.686–691. IEEE.
59. Huhns MN and Singh MP. Service-oriented computing:
key concepts and principles. IEEE Internet Comput 2005; 9:
60. Dustdar S and Schreiner W. A survey on web services com-
position. Int J Web Grid Serv 2005; 1: 1–30.
61. Papazoglou MP, Traverso P, Dustdar S, et al. Service-
oriented computing: state of the art and research challenges.
Computer 2007; 40: 38–45.
62. Papazoglou MP and van den Heuvel WJ. Service oriented
architectures: approaches, technologies and research issues.
VLDB J 2007; 16: 389–415.
63. Razavian M and Lago P. A survey of SOA migration in
industry. In: Kappel G, Maamar Z and Motahari-Nezhad HR
(Eds.) Service-oriented computing. Berlin, Heidelberg:
Springer Berlin Heidelberg, 2011, pp.618–626.
64. Cerny T, Donahoo MJ and Pechanec J. Disambiguation and
comparison of SOA, microservices and self-contained sys-
tems. In: Proceedings of the international conference on
research in adaptive and convergent systems (RACS ’17),
Krakow, Poland, 20–23 September 2017, pp.228–235. New
York: ACM.
65. Mittal S, Risco-Martín JL and Zeigler BP. DEVS/SOA: a
cross-platform framework for net-centric modeling and
simulation in DEVS unified process. Simulation 2009; 85:
66. Al-Zoubi K and Wainer G. Performing distributed simula-
tion with RESTful web-services. In: proceedings of the 2009
winter simulation conference (WSC), Austin, TX, 13–16
December 2009, pp.1323–1334. IEEE.
67. Mittal S and Risco-Martín JL. DEVSML 3.0 Stack: rapid
deployment of DEVS farm in distributed cloud environment
using microservices and containers. In: Proceedings of the
symposium on theory of modeling & simulation (TMS/DEVS
’17), Virginia Beach, Virginia, 23–26 April 2017.
68. NATO STO. Modelling and Simulation as a Service
(MSaaS) - volume 1: MSaaS technical reference architec-
ture. Technical report, NATO Science and Technology
Organization (STO), 2018.
Author biographies
Nane Kratzke is a professor for Computer Science at
the Lübeck University of Applied Sciences and a former
Navy Officer (German Navy). He consulted for the
German Ministry of Defence in questions regarding
network-centric warfare. His particular research focus is
directed at cloud-native applications and cloud-native ser-
vice-related software engineering methodologies and cor-
responding application architectural styles, such as
microservices or serverless architectures. In addition, he is
interested in data science, distributed systems, and web-
scale elastic systems.
Robert Siegfried is senior consultant for IT and M&S
projects and CEO of aditerna GmbH and Aditerna Inc.
Prior to his industry engagement, he was research associ-
ate at the University of the Federal Armed Forces in
Munich, Germany. His primary research areas are agent-
based M&S and parallel and distributed simulation.
Within several projects for the German Armed Forces and
US Department of Defense (DoD), he has worked (and is
still working) on topics such as MSaaS, artificial intelli-
gence (AI)-supported data fusion, metadata specifications,
model management systems, distributed simulation test
beds, and process models. Since October 2018, he has
served as Vice-Chair of the NMSG. He is actively
involved in multiple working groups of the Simulation
Interoperability Standards Organization (SISO) and serves
as member of the SISO Executive Committee.
... Third, Kratzke and Siegfried (2020) focus on the consequences expected from leveraging the cloud for M&S services. Using their work on and definition of CNAs (Kratzke and Quint 2017) as a basis, they propose a definition for what Cloud-native Simulations (CNSs) are in terms of a textual definition (Kratzke and Siegfried 2020, section 4.3), a cloud-native simulation stack and a cloud simulation maturity model. ...
... The SIMaaS-implementation presented in section 3 has several desirable characteristics. First, its design reflects best practice and the state of the art for creating SaaS as identified by Kratzke and Siegfried (2020). Second, the decision to decouple API and workers, only linking them by the internal representation of tasks in the task queue, means that the implementation of how models are simulated can be changed without having to change the public service interface. ...
Conference Paper
Full-text available
Providing modelling and simulation capabilities as a service promises to increase their value by improving accessibility for non-expert users and software agents as well as by leveraging cloud computing technology to scale simulation performance beyond the capabilities of a single computer. In order to reach this potential, implementations must align their design with the architectural styles of cloud computing applications and the web in general. We present an open-source, cloud-native Simulation as a Service (SIMaaS)-implementation that gives access to models and allows simulating them on the web. The implementation uses Functional Mockup Units (FMUs) for co-simulation as an executable form of a model and relies on FMPy for simulation. It is realized as a microservice in the form of a REST-based HTTP-API. Functionality and performance are demonstrated by using the service to create ensemble forecasts for PV systems and to search for an optimal parameter set using a genetic algorithm. Conceptual limitations and the resulting opportunities for further work are summarized.
... FaaS simulation is a subarea of the previously introduced cloud simulation. Approaches present in literature can be divided into two categories: Firstly, the FaaS platforms are used as simulation engines where other systems are deployed to and investigated, like in [6], [37]. Secondly, research where the FaaS platform itself is simulated and cloud functions are only deployed to validate the simulation in the specific experiments. ...
... FaaS simulation is a subarea of the previously introduced cloud simulation. Approaches present in literature can be divided into two categories: Firstly, the FaaS platforms are used as simulation engines where other systems are deployed to and investigated, like in [6], [37]. Secondly, research where the FaaS platform itself is simulated and cloud functions are only deployed to validate the simulation in the specific experiments. ...
Conference Paper
Full-text available
Function as a Service (FaaS)-the reason why so many practitioners and researchers talk about Serverless Computing-claims to hide all operational concerns. The promise when using FaaS is that users only have to focus on the core business functionality in form of cloud functions. However, a few configuration options remain within the developer's responsibility. Most of the currently available cloud function offerings force the user to choose a memory or other resource setting and a timeout value. CPU is scaled based on the chosen options. At a first glance, this seems like an easy task, but the tradeoff between performance and cost has implications on the quality of service of a cloud function. Therefore, in this paper we present a local simulation approach for cloud functions and support developers in choosing a suitable configuration. The methodology we propose simulates the execution behavior of cloud functions locally, makes the cloud and local environment comparable and maps the local profiling data to a cloud platform. This reduces time during the development and enables developers to work with their familiar tools. This is especially helpful when implementing multi-threaded cloud functions.
... As compared to traditional approaches, the applications developed using serverless cloud computing (also known as serverless architectures) either depend on the third-party services (known as Backend-as-a-Service) or on the custom code runs in some stateless compute containers (known as Functionas-a-Service) which are short-lived, event-triggered and fully managed by the cloud provider [1][2][3][4]. It implies that the developers write concise and stateless functions which can be triggered through the events, produced from different sensors as well as services / users or middleware [5,6]. Consequently, the platform ensures a secure and timely execution of these functions (written by developers) by running the infrastructure efficiently [7]. ...
Full-text available
In a serverless cloud computing environment, the cloud provider dynamically manages the allocation of resources whereas the developers purely focus on their applications. The data-driven applications in serverless cloud computing mainly address the web as well as other distributed scenarios, and therefore, it is essential to offer a consistent user experience across different connection types. In order to address the issues of data-driven application in a real-time distributed environment, the use of GraphQL (Graph Query Language) is getting more and more popularity in state-of-the-art cloud computing approaches. However, the existing solutions target the low level implementation of GraphQL, for the development of a complex data-driven application, which may lead to several errors and involve a significant amount of development efforts due to various users’ requirements in real-time. Therefore, it is critical to simplify the development process of data-driven applications in a serverless cloud computing environment. Consequently, this research introduces UMLPDA (Unified Modeling Language Profile for Data-driven Applications), which adopts the concepts of UML-based Model-driven Architectures to model the frontend as well as the backend requirements for data-driven applications developed at a higher abstraction level. Particularly, a modeling approach is proposed to resolve the development complexities such as data communication and synchronization. Subsequently, a complete open source transformation engine is developed using a Model-to-Text approach to automatically generate the frontend as well as backend low level implementations of Angular2 and GraphQL respectively. The validation of proposed work is performed with three different case studies, deployed on Amazon Web Services platform. The results show that the proposed framework enables to develop the data-driven applications with simplicity.
... Therefore, this paper deals with the question of how to improve the resource sharing of future VC projects. We have done some similar transfer research for another domain and asked what could be shifted auspiciously from the cloud computing domain to the simulation domain [1]. Because we derived some stupendous insights for the simulation domain, we will follow a quite similar methodology here (see Figure 3). ...
Full-text available
From close to scratch, the COVID-19 pandemic created the largest volunteer supercomputer on earth. Sadly, processing resources assigned to the corresponding Folding@home project cannot be shared with other volunteer computing projects efficiently. Consequently, the largest supercomputer had significant idle times. This perspective paper investigates how the resource sharing of future volunteer computing projects could be improved. Notably, efficient resource sharing has been optimized throughout the last ten years in cloud computing. Therefore, this perspective paper reviews the current state of volunteer and cloud computing to analyze what both domains could learn from each other. It turns out that the disclosed resource sharing shortcomings of volunteer computing could be addressed by technologies that have been invented, optimized, and adapted for entirely different purposes by cloud-native companies like Uber, Airbnb, Google, or Facebook. Promising technologies might be containers, serverless architectures, image registries, distributed service registries, and all have one thing in common: They already exist and are all tried and tested in large web-scale deployments.
Cloud infrastructure provides rapid resource provisioning for on-demand computational requirements. Cloud simulation environments today are largely employed to model and simulate complex systems for remote accessibility and variable capacity requirements. In this regard, scalability issues in Modeling and Simulation (M&S) computational requirements can be tackled through the elasticity of on-demand Cloud deployment. However, implementing a high-performance cloud M&S framework following these elastic principles is not a trivial task, as parallelizing and distributing existing architectures is challenging. Indeed, parallel and distributed M&S developments have evolved along separate paths. Parallel solutions have always focused on ad-hoc implementations, while distributed approaches have led to the definition of standard distributed frameworks like the High Level Architecture (HLA) or influenced the use of distributed technologies like the Message Passing Interface (MPI). Only a few developments have been able to evolve with the current flexible deployment of computing hardware resources, largely focusing on the implementation of Simulation as a Service (SaaS), albeit independently of the parallel ad-hoc branch. In this paper, we present a unified parallel and distributed M&S architecture flexible enough to deploy parallel and distributed simulations in the Cloud with low effort, without modifying the underlying model source code, and achieving important speedups over sequential simulation, especially in the parallel implementation. Our framework is based on the Discrete Event System Specification (DEVS) formalism. The performance of the parallel and distributed framework is tested using the xDEVS M&S tool and Application Programming Interface (API), together with the DEVStone benchmark, on up to eight computing nodes, obtaining maximum speedups of 15.95× and 1.84×, respectively.
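The sequential baseline that such frameworks parallelize is, at its core, a loop over a time-ordered event queue. The following minimal sketch illustrates that idea only; it is not the xDEVS API, and the ping/pong handlers are invented for demonstration.

```python
import heapq

def run_simulation(initial_events, until):
    """Minimal discrete-event loop: pop the earliest event, execute its handler,
    and let handlers schedule follow-up events. Illustrative only, not xDEVS."""
    queue = list(initial_events)          # (time, seq, handler) tuples
    heapq.heapify(queue)
    seq = len(queue)                      # unique tie-breaker for equal timestamps
    log = []

    def schedule(t, handler):
        nonlocal seq
        heapq.heappush(queue, (t, seq, handler))
        seq += 1

    while queue:
        t, _, handler = heapq.heappop(queue)
        if t > until:                     # stop once simulated time is exhausted
            break
        log.append(t)
        handler(t, schedule)
    return log

# A ping-pong pair of events, each rescheduling the other one time unit later.
def ping(t, schedule): schedule(t + 1, pong)
def pong(t, schedule): schedule(t + 1, ping)

print(run_simulation([(0, 0, ping)], until=3))  # [0, 1, 2, 3]
```

Parallel DEVS engines essentially distribute the handler executions of such a loop across workers while preserving the timestamp order, which is why doing so "without modifying the underlying model source code" is the hard part.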
Purpose This study aims to investigate the impact of employees' skills and knowledge, the company's economic situation, current IT infrastructure, payment fashion, cloud availability, and cloud privacy and security on the productivity of human resources in the COVID-19 era. Design/methodology/approach Over the past few years, the advent of cloud-assisted technologies has dramatically advanced Information Technology (IT)-based industries by providing everything as a service. Cloud computing is recognized as a growing technology among companies around the world. One of the most critical cloud applications is deploying systems and organizational resources, especially systems whose deployment costs are high. Manpower is one of the basic and vital resources of an organization, and organizations need an efficient workforce to achieve their goals. In the COVID-19 era, however, the productivity of human resources can be reduced by stress, heavy workloads, reduced organizational performance and profits, unfavorable organizational conditions, poor management, and lack of training. Therefore, this study investigates the productivity of human resources in the COVID-19 era. Data were collected from medium-sized companies through a questionnaire using a Likert scale. The model is assessed using the structural equation modeling technique to examine its reliability and validity. The study combines a library method with a literature review; the case study was conducted through a questionnaire with statistical analysis in SPSS 25 and SMART-PLS. Findings Based on the findings, employees' skills and knowledge, the company's economic situation, payment fashion, cloud availability, and the company's current IT infrastructure have a positive impact on human resource efficiency in the COVID-19 era, whereas cloud privacy and security have a negative effect on the productivity of human resources.
The findings can serve as a basis for companies and organizations in the COVID-19 era. Research limitations/implications This study has some restrictions that need to be considered in evaluating the obtained results. First, due to the prevalence of the Coronavirus, access to information from the companies under study was limited. Second, this research may have overlooked other variables that affect human resource productivity in the COVID-19 era. Prospective researchers can examine the impact of Customer Relationship Management (CRM) and Supply Chain Management (SCM) on the productivity of human resources in the COVID-19 era. Practical implications The results of this research are applicable to all companies, their departments, and human resources in the COVID-19 era. Originality/value This paper examines the productivity of human resources in the COVID-19 era. The presented model provides a complete framework for investigating how cloud-based enterprise resource planning (ERP) systems affect the productivity of human resources in the COVID-19 era.
The vision of network-centric operations is to increase operational capabilities through networked collaboration. NATO and its member nations state this vision in strategic documents at a very high level of abstraction. While suitable for giving an overall feel, current documentation renders the steps toward implementing those visions largely unsupported. We outline a method that is based on agile requirements engineering, for converting high-level strategic visions into capabilities whose forms lend themselves to incremental implementation. We illustrate the use of this method in two cases that deal with both operational capabilities and technical capabilities. We also show how the method enables one to prioritise which capabilities to develop first. We conclude that it is necessary to formulate and implement some form of explicit methodology with which to span the gap between strategic visions and an effective implementation of those visions.
Systems engineering and simulation of cyber-physical systems require the aggregation of disparate models from the component cyber and physical domains in order to understand the whole system. Military multi-domain operations employ emerging technologies such as unmanned sensors, cyber, and electronic warfare. The Discrete Event System—Distributed Modeling Framework (DEVS-DMF) is a simulation technology that enables composition of multiple models via the actor model of computation, parallel and asynchronous messaging, and location transparency. Using a system of systems engineering approach, we compose models of military operations, unmanned systems, and electronic warfare technologies to analyze mission performance using different advanced equipment sets. Important performance metrics span the physical (sensor performance), cyber (electronic attack), human factors (soldier load), and military (mission success) domains. Simulation services are allocated to each domain, and the simulation’s microservice architecture allows for independently deployable services that own their internal state. Containerization and cloud deployment allow geographically distributed users to manipulate simulation inputs, conduct large-scale experiments, and analyze simulation output using browser and web tools. The resulting ensemble enables system of systems engineering and analysis of cyber and electronic systems in support of small tactical operations.
In the late-1950s, leasing time on an IBM 704 cost hundreds of dollars per minute. Today, cloud computing, that is, using IT as a service, on-demand and pay-per-use, is a widely used computing paradigm that offers large economies of scale. Born from a need to make platform as a service (PaaS) more accessible, fine-grained, and affordable, serverless computing has garnered interest from both industry and academia. This article aims to give an understanding of these early days of serverless computing: what it is, where it comes from, what is the current status of serverless technology, and what are its main obstacles and opportunities.
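The Function-as-a-Service model this abstract surveys boils down to stateless handler functions invoked once per event and billed per call. A minimal sketch in the AWS-Lambda-style Python handler signature follows; the event and response shapes imitate an HTTP-triggered function and are illustrative assumptions, not a specific platform's contract.

```python
import json

def handler(event, context=None):
    """AWS-Lambda-style handler: stateless, invoked per event, billed per invocation.
    The event/response shapes mimic an HTTP-triggered function."""
    name = json.loads(event.get("body") or "{}").get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Simulate one invocation locally with a synthetic HTTP event.
resp = handler({"body": json.dumps({"name": "simulation"})})
print(resp["statusCode"], json.loads(resp["body"])["message"])  # 200 hello, simulation
```

The appeal for simulation workloads is visible even in this toy: there is no server process to provision, and a thousand concurrent invocations of `handler` cost roughly a thousand times one invocation, which is the fine-grained pay-per-use the abstract traces back to time-shared mainframes.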
Simulation is used in industry to study a large variety of problems ranging from increasing the productivity of a manufacturing system to optimising the design of a wind turbine. However, some simulation models can be computationally demanding and some simulation projects require time consuming experimentation. High performance computing infrastructures such as clusters can be used to speed up the execution of large models or multiple experiments but at a cost that is often too much for Small and Medium-sized Enterprises (SMEs). Cloud computing presents an attractive, lower cost alternative. However, developing a cloud-based simulation application can again be costly for an SME due to training and development needs, especially if software vendors need to use resources of different heterogeneous clouds to avoid being locked-in to one particular cloud provider. In an attempt to reduce the cost of development of commercial cloud-based simulations, the CloudSME Simulation Platform (CSSP) has been developed as a generic approach that combines an AppCenter with the workflow of the WS-PGRADE/gUSE science gateway framework and the multi-cloud-based capabilities of the CloudBroker Platform. The paper presents the CSSP and two representative case studies from distinctly different areas that illustrate how commercial multi-cloud-based simulations can be created.
Microservices are an architectural approach emerging out of service-oriented architecture, emphasizing self-management and lightweightness as the means to improve software agility, scalability, and autonomy. This article examines microservice evolution from the technological and architectural perspectives and discusses key challenges facing future microservice developments.
As a key part of the serverless computing paradigm, Function-as-a-Service (FaaS) platforms enable users to run arbitrary functions without being concerned about operational issues. However, there are several performance-related issues surrounding the state-of-the-art FaaS platforms that can deter widespread adoption of FaaS, including sizeable overheads, unreliable performance, and new forms of the cost-performance trade-off. In this work we, the SPEC RG Cloud Group, identify six performance-related challenges that arise specifically in this FaaS model, and present our roadmap to tackle these problems in the near future. This paper aims at motivating the community to solve these challenges together.
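One of the cost–performance trade-offs behind the overheads mentioned here is the cold start: expensive initialization runs on a container's first invocation. A common mitigation, caching initialized state at module scope so warm invocations skip it, can be sketched as follows; the `load_model` step is a stand-in for any costly setup and is not taken from the paper.

```python
import time

_model_cache = {}

def load_model(name):
    """Stand-in for an expensive initialization step (e.g. loading a simulation model)."""
    time.sleep(0.05)                      # simulate slow I/O or deserialization
    return {"name": name}

def handler(event):
    """Cold invocation pays load_model(); warm invocations reuse the module-level cache,
    which survives between calls as long as the platform keeps the container alive."""
    name = event["model"]
    if name not in _model_cache:          # cold path: runs once per container
        _model_cache[name] = load_model(name)
    return {"model": _model_cache[name]["name"]}

cold = time.perf_counter(); handler({"model": "m1"}); cold = time.perf_counter() - cold
warm = time.perf_counter(); handler({"model": "m1"}); warm = time.perf_counter() - warm
print(cold > warm)  # True: the warm call skips the expensive load
```

The pattern also illustrates why FaaS performance is hard to benchmark reliably: the same function exhibits two very different latency regimes depending on scheduling decisions the user cannot observe.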
Microservices is an architectural style increasing in popularity. However, there is still a lack of understanding of how to adopt a microservice-based architectural style. We aim to characterize different microservice architectural style patterns and the principles that guide their definition. We conducted a systematic mapping study to identify reported usage of microservices and, based on these use cases, to extract common patterns and principles. We present two key contributions. First, we identified several agreed microservice architecture patterns that seem widely adopted and reported in the identified case studies. Second, we present these as a catalogue in a common template format, including a summary of the advantages, disadvantages, and lessons learned for each pattern from the case studies. We conclude that different architecture patterns emerge for different migration, orchestration, storage, and deployment settings, guided by a set of agreed principles.
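A pattern that recurs in such catalogues is the API gateway, which maps external routes onto the internal services that own them. A toy routing table makes the idea concrete; the service names and addresses below are invented for illustration.

```python
# Toy API-gateway routing table: external path prefix -> internal service address.
# Service names and hosts are invented examples, not from the mapping study.
ROUTES = {
    "/orders": "http://order-service:8080",
    "/users": "http://user-service:8080",
}

def route(path: str) -> str:
    """Resolve an external request path to the owning microservice, or fail loudly."""
    for prefix, backend in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return backend + path
    raise LookupError(f"no service owns {path}")

print(route("/orders/42"))  # http://order-service:8080/orders/42
```

The design choice the pattern encodes is that clients see one stable entry point while the decomposition behind it (how many services, where they run) can change freely, which is what enables the independent deployability the catalogue emphasizes.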
Serverless computing has emerged as a new compelling paradigm for the deployment of applications and services. It represents an evolution of cloud programming models, abstractions, and platforms, and is a testament to the maturity and wide adoption of cloud technologies. In this chapter, we survey existing serverless platforms from industry, academia, and open-source projects, identify key characteristics and use cases, and describe technical challenges and open problems.
In line with the emergence of cloud computing as the dominant enterprise computing paradigm, our conceptualization of the cloud computing reference architecture and service construction has also evolved. For example, to address the need for cost reduction and rapid provisioning, virtualization has moved beyond hardware to containers. More recently, serverless computing or Function-as-a-Service has been presented as a means to introduce further cost-efficiencies, reduce configuration and management overheads, and rapidly increase an application's ability to speed up, scale up and scale down in the cloud. The potential of this new computation model is reflected in the introduction of serverless computing platforms by the main hyperscale cloud service providers. This paper provides an overview and multi-level feature analysis of seven enterprise serverless computing platforms. It reviews extant research on these platforms and identifies the emergence of AWS Lambda as a de facto base platform for research on enterprise serverless cloud computing. The paper concludes with a summary of avenues for further research.
There is an industrial shift from Service-Oriented Architectures (SOA) to Microservices; however, a quick review of online resources on these topics reveals a range of different understandings of the two architectures. Individuals often mix terms, attribute false advantages, or expect different quality attributes and properties. The purpose of this paper is to provide readers with a solid understanding of the differences between these two architectures and their features. We provide both research and industry perspectives to point out the strengths and weaknesses of both architectural directions, and we point out many shortcomings that are not addressed by either architecture. Finally, based on this, we propose challenges for future research.
Cloud applications are increasingly built from a mixture of runtime technologies. Hosted functions and service-oriented web hooks are among the most recent ones which are natively supported by cloud platforms. They are collectively referred to as serverless computing by application engineers due to the transparent on-demand instance activation and microbilling without the need to provision infrastructure explicitly. This half-day tutorial explains the use cases for serverless computing and the drivers and existing software solutions behind the programming and deployment model also known as Function-as-a-Service in the overall cloud computing stack. Furthermore, it presents practical open source tools for deriving functions from legacy code and for the management and execution of functions in private and public clouds.