Journal of Defense Modeling and Simulation: Applications, Methodology, Technology
© The Author(s) 2020
Towards cloud-native simulations –
lessons learned from the front-line of
and Robert Siegfried
Cloud computing can be a game-changer for computationally intensive tasks like simulations. The computational power
of Amazon, Google, or Microsoft is even available to a single researcher. However, the pay-as-you-go cost model of
cloud computing influences how cloud-native systems are being built. We transfer these insights to the simulation
domain. The major contributions of this paper are twofold: (A) we propose a cloud-native simulation stack and (B)
derive expectable software engineering trends for cloud-native simulation services. Our insights are based on systematic
mapping studies on cloud-native applications, a review of cloud standards, action research activities with cloud engineer-
ing practitioners, and corresponding software prototyping activities. Two major trends have dominated cloud computing
over the last 10 years. The size of deployment units has been minimized and corresponding architectural styles prefer
more fine-grained service decompositions of independently deployable and horizontally scalable services. We forecast
similar trends for cloud-native simulation architectures. These similar trends should make cloud-native simulation ser-
vices more microservice-like, which are composable but just ‘‘simulate one thing well.’’ However, merely transferring
existing simulation models to the cloud can result in significantly higher costs. One critical insight of our (and other)
research is that cloud-native systems should follow cloud-native architecture principles to get the most out of the
pay-as-you-go cost model.
Cloud computing, cloud native, cloud maturity, simulation, reference model, maturity model
Simulation is used for various purposes, such as training,
analysis, and decision support. Consequently, modeling
and simulation (M&S) has become a critical technology
for many industry sectors (such as logistics and manufac-
turing) and the defense sector. Achieving interoperability
between multiple simulation systems and ensuring the
credibility of results often requires enormous efforts with
regards to time, personnel, and budget. Recent technical
developments in the area of cloud computing technology
and service-oriented architectures (SOAs) may offer
opportunities to better utilize M&S capabilities in order to
satisfy these critical needs. A concept that includes service
orientation and the provision of M&S applications via the
as-a-service model of cloud computing may enable more
composable simulation environments that can be deployed
on-demand. This new concept is commonly known as
M&S as a Service (MSaaS).
1.1 MSaaS simulation principles and validation
The NATO Modelling and Simulation Group (NMSG) is
part of the NATO Science and Technology Organization
(STO). The mission of the NMSG is to promote cooperation
among alliance bodies, NATO, and partner nations to
maximize the effective utilization of M&S. Primary mission
areas include the following:
•associated science and technology.
Department for Electrical Engineering and Computer Science, Lübeck
University of Applied Sciences, Germany
aditerna GmbH, Germany
Nane Kratzke, Department for Electrical Engineering and Computer
Science, Lübeck University of Applied Sciences, Mönkhofer Weg 239,
Lübeck, 23562, Germany.
The NMSG is tasked to enforce and supervise imple-
mentation of the NATO Modelling and Simulation
Masterplan (NMSMP; v2.0 (AC/323/NMSG(2012)-015)).
The NMSMP defines several objectives that collectively
will help to exploit M&S to its full potential across NATO
and the nations to enhance both operational and cost-effec-
tiveness. This vision will be achieved through a coopera-
tive effort guided by the following principles.
•Synergy: leverage and share the existing NATO
and national M&S capabilities.
•Interoperability: direct the development of common
M&S standards and services for simulation intero-
perability and foster interoperability between
Command & Control (C2) and simulation.
•Reuse: increase the visibility, accessibility, and
awareness of M&S assets to foster sharing across
all NATO M&S application areas.
The NMSMP defines five strategic objectives, two of
which are directly addressed by the MSaaS efforts
described in this paper:
•establish a common technical framework;
•provide coordination and common services.
NATO MSG-136 (Modelling and Simulation as a Service)
is one of the working groups under the NMSG.
From 2014 to 2017 this working group investigated the
concept of MSaaS to provide the technical and organiza-
tional foundations for a future service-based allied frame-
work for MSaaS within NATO and partner nations. In this
period, MSG-136 did groundbreaking work by defining
MSaaS in the NATO context and by developing opera-
tional, technical, and governance concepts for permanently
establishing the ‘‘Allied Framework for MSaaS.’’ In addi-
tion to developing the foundational concepts, MSG-136
conducted extensive experimentation activities to test and
validate the concepts.
From 2018 to 2021, the initial con-
cepts are extended by MSG-164 and validated through
dedicated evaluation events and participation in opera-
1.2 Cloud-native lessons learned
Even tiny companies can generate enormous economic
growth and business value by providing cloud-based
services or applications. Instagram, Uber, WhatsApp,
Netflix, Twitter, and many other astonishingly small companies
(if we relate the modest headcount of these companies in
their founding days to their noteworthy economic impact)
provide services that are frequently used. However, even a fast-
growing start-up business model should have long-term
consequences and dependencies in mind. Many of these
companies rely on public cloud infrastructures – often pro-
vided by Amazon Web Services (AWS), Microsoft
(Azure), Google (Cloud Services), etc. Meanwhile, cloud
providers run a significant amount of mission-critical busi-
ness software for companies that no longer operate their
own data centers. Moreover, using cloud resources is very often
economical if workloads have a high peak-to-average ratio. However,
there are downsides. Although cloud services could be
standardized commodities, they are mostly not. Once a
cloud-hosted application or service is deployed to a spe-
cific cloud infrastructure, it is often inherently bound to
that infrastructure due to non-obvious technological bind-
ings. A transfer to another cloud infrastructure is very
often a time consuming and expensive one-time exercise.
A good real-world example here is Instagram. After being
bought by Facebook, it took over a year for the Instagram
engineering team to find and establish a solution for the
transfer of all its services from AWS to Facebook data
centers. Although no downtimes were planned, noteworthy
outages occurred during that period.
The National Institute of Standards and Technology
(NIST) definition of cloud computing defines three basic
and well-accepted service categories: Infrastructure as a
Service (IaaS), Platform as a Service (PaaS), and Software
as a Service (SaaS). IaaS provides maximum flexibility
for arbitrary consumer-created software but hides almost
no operation complexity of the application (just of the
infrastructure). SaaS, on the other hand, hides operation
complexity almost entirely but is too limited for many use
cases involving consumer-created software. PaaS is a compromise
that enables the operation of consumer-created software
with a convenient level of operation complexity, but at the
cost of accepting some degree of lock-in resulting
from the platform.
Throughout a project called CloudTRANSIT, we
searched intensively for solutions to overcome this ‘‘cloud
lock-in’’ – to make cloud computing an actual commodity.
We developed and evaluated a cloud application transfer-
ability concept that has prototype status but already works
for approximately 70% of the current cloud market, and
that can be extended for the rest of the market share.
However, what is essential for this paper is that we learned
some core insights from our action research with
•practitioners want to have a choice between
•practitioners prefer declarative and cybernetic
(auto-adjusting) instead of workflow-based
(imperative) deployment and orchestration
•practitioners are forced to make efficient use of
cloud resources because more and more systems
are migrated to cloud infrastructures, causing stea-
dily increasing bills;
•practitioners rate pragmatism of solutions much
higher than full feature coverage of cloud platforms
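The preference for declarative, auto-adjusting orchestration over imperative workflows can be illustrated with a small control loop. The following Python sketch is not taken from any particular platform; the names `Action`, `reconcile`, and `apply_actions` as well as the service names are hypothetical. It assumes that a deployment is described only by the desired number of instances per service and that an orchestrator compares this desired state with the observed state to derive corrective actions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str      # "start" or "stop" (hypothetical action vocabulary)
    service: str
    count: int


def reconcile(desired: dict[str, int], observed: dict[str, int]) -> list[Action]:
    """Derive the actions that move the observed state towards the desired state."""
    actions = []
    for service in sorted(set(desired) | set(observed)):
        delta = desired.get(service, 0) - observed.get(service, 0)
        if delta > 0:
            actions.append(Action("start", service, delta))
        elif delta < 0:
            actions.append(Action("stop", service, -delta))
    return actions


def apply_actions(observed: dict[str, int], actions: list[Action]) -> dict[str, int]:
    """Simulate the effect of the actions on the observed state."""
    state = dict(observed)
    for action in actions:
        change = action.count if action.kind == "start" else -action.count
        state[action.service] = state.get(action.service, 0) + change
    # Services without running instances disappear from the observed state.
    return {service: n for service, n in state.items() if n > 0}


# Declared target: three terrain instances and one weather instance.
desired = {"terrain": 3, "weather": 1}
# What is currently running, including a service that is no longer wanted.
observed = {"terrain": 1, "radar": 2}

actions = reconcile(desired, observed)
converged = apply_actions(observed, actions)
```

An imperative workflow would script each start and stop step explicitly. In the declarative variant, only `desired` is edited, and the same `reconcile` call can be repeated until it returns no further actions.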
1.3 Research question
All these points influence how practitioners construct
cloud application architectures that are intentionally
designed for the cloud. One thing we learned is that
cloud-native applications, although they are all different,
follow some common architectural patterns that
we could exploit for transferability. This paper investigates
the research question of how these lessons learned can be
transferred from cloud-native computing to the modeling
and simulation domain.
Therefore, the remainder of this paper is outlined as fol-
lows. We present a cloud application reference model in
Section 2 that steered our research in the cloud computing
domain. According to our experiences and action research
activities over the last 10 years, cloud computing is domi-
nated by two major long-term trends that are investigated
in Section 3. In particular, we investigate resource utiliza-
tion improvements in Section 3.1 and the architectural evo-
lution of cloud applications in Section 3.2. Section 4 will
analyze both trends regarding possible upcoming trends of
interest in the M&S community. Section 5 will present cor-
responding related work from cloud computing and the
MSaaS domain to provide interesting follow-up for the
reader. We will conclude our thoughts in Section 6 and
forecast intensified decentralizing and more fine-grained
service composing approaches for cloud computing and
the MSaaS domain.
2. Reference model
Our problem awareness results mainly from the conducted
research project CloudTRANSIT. This project dealt with
the question of how to transfer cloud applications and ser-
vices at runtime without downtime across cloud infrastruc-
tures from different public and private cloud service
providers to tackle the existing and growing problem of
vendor lock-in in cloud computing. Throughout the proj-
ect, we published more than 20 research papers. However,
the intent of this paper is not to summarize these papers.
The interested reader is referred to the corresponding tech-
that provides an integrated view of these
Almost all cloud system engineers focus on a common
problem. The core components of their distributed and
cloud-based systems, such as virtualized server instances
and essential networking and storage, can be deployed
using commodity services. However, further services –
that are needed to integrate these virtualized resources in
an elastic, scalable, and pragmatic manner – are often not
considered in standards. Services such as load balancing,
auto-scaling, or message queuing systems are needed to
design an elastic and scalable cloud-native system on
almost every cloud service infrastructure. Some standards,
such as AMQP
for messaging (dating back almost to the
pre-cloud era), exist. However, mainly these integrating
and ‘‘gluing’’ service types – that are crucial for almost
every cloud application on a higher cloud maturity level –
are often not provided in a standardized manner by cloud
It seems that all public cloud service providers
try to stimulate cloud customers to use their non-
commodity convenience service ‘‘interpretations’’ to bind
them to their infrastructures and higher level service
What is more, according to an analysis we performed
the percentage of these commodity service cate-
gories that are considered in standards, such as CIMI,
even decreased over the years. That has mainly to do with
the fact that new cloud service categories are released
faster than standardization authorities can standardize
existing service categories. Figure 1 shows this effect by
the example of AWS over the years. That is how vendor
lock-in emerges in cloud computing. For a more detailed
discussion, we refer to Opara-Martins et al.
and Kratzke and Peinl.
Figure 1. Decrease of standard coverage over years (by
example of Amazon Web Services).
Therefore, all reviewed cloud standards focus on a min-
imal but necessary subset of popular cloud services: com-
pute nodes (virtual machines), storage (file, block, object),
and (virtual private) networking. Standardized deployment
approaches, such as TOSCA, are defined mainly against
this commodity infrastructure level of abstraction. These
kinds of services are often subsumed as IaaS and build the
foundation of cloud services and therefore cloud-native
applications. All other service categories might foster ven-
dor lock-in situations. That might sound disillusioning. In
consequence, many cloud engineering teams follow the
basic idea that a cloud-native application stack should be
only using a minimal subset of well-standardized IaaS ser-
vices as founding building blocks. Because existing cloud
standards cover only specific cloud service categories
(mainly the IaaS level) and do not show an integrated
point of view, a more integrated reference model that takes
the best practices of practitioners into account would be
Very often cloud computing is investigated from a ser-
vice model point of view (IaaS, PaaS, SaaS) or a deploy-
ment point of view (private, public, hybrid, community
Alternatively, one can look from an actor point of
view (provider, consumer, auditor, broker, carrier) or a
functional point of view (service deployment, service
orchestration, service management, security, privacy), as
done by Bohn et al.
Points of view are particularly useful
to split problems into concise parts. However, the
viewpoints mentioned above might be common in cloud
computing and useful from a service provider point of
view, but not from a cloud-native application engineering
point of view. From an engineering point of view, it seems
more useful to have views on the technology levels
involved and applied in cloud-native application
By using the insights from our systematic mapping
and our review of cloud standards,
a reference model of cloud-native applications. This
layered reference model is shown and explained in
Figure 2. The basic idea of this reference model is to use
only a small subset of well-standardized IaaS services as
founding building blocks (Layer 1). Four primary view-
points form the overall shape of this model.
•Infrastructure provisioning: this is a viewpoint
that is familiar to engineers working on the infrastructure
level and how IaaS is understood. IaaS
deals with the deployment of separate compute
nodes for a cloud consumer. The cloud consumer
must manage these (hundreds of) requested and iso-
•Clustered elastic platforms: this is a viewpoint
that is familiar to engineers who are dealing with
horizontal scalability across nodes. Clusters are a
concept to handle many Layer 1 nodes as one logical
compute node (a cluster). Such technologies are
often the technological backbone for portable cloud
runtime environments because they hide complexity
(of hundreds or thousands of single nodes)
appropriately. In addition, this layer realizes the
foundation to define services and applications without
reference to particular cloud services, cloud
platforms, or cloud infrastructures. Thus, it provides
a foundation to avoid vendor lock-in.
•Service composing: this is a viewpoint familiar to
application engineers dealing with web services in
SOAs. These (micro)services operate on a Layer 2
cloud runtime platform (such as Kubernetes,
Mesos, Swarm, Nomad, and so on). Thus, the complex
orchestration and scaling of these services are
abstracted and delegated to a cluster (cloud runtime
environment) on Layer 2.
•Application: this is a viewpoint that is familiar to
end-users of cloud services (or cloud-native applications).
These cloud services are composed of smaller
cloud Layer 3 services being operated on clusters
formed of single compute and storage nodes.
Figure 2. Cloud-native stack observable in many cloud-native applications. FaaS: Function as a Service; IaaS: Infrastructure as a
For more details, we refer to Kratzke and Peinl
and Kratzke and Quint.
However, the remainder of this paper
follows this model.
3. Observable long-term trends in cloud
Cloud computing emerged some 10 years ago. In the first
adoption phase, existing IT systems were merely trans-
ferred to cloud environments without changing the original
design and architecture of these applications. Tiered appli-
cations were merely migrated from dedicated hardware to
virtualized hardware in the cloud. Cloud system engineers
implemented remarkable improvements in cloud platforms
(PaaS) and infrastructures (IaaS) over the years and estab-
lished several engineering trends.
All of these trends try to optimize specific quality fac-
tors, such as functional stability, performance efficiency,
compatibility, usability, reliability, maintainability, port-
ability, and security of cloud services to improve the over-
all quality of service (QoS). The most focused quality
factors are functional stability, performance efficiency,
and reliability (including availability).
engineering trends, listed in Table 1, seem somehow iso-
lated. We want to review these trends from two different
Table 1. Some observable software engineering trends coming along with CNAs.
Microservices: Microservices can be seen as a ‘‘pragmatic’’ interpretation of SOA. In addition to SOA, microservice architectures intentionally focus and compose small and independently replaceable horizontally scalable services that are ‘‘doing one thing well.’’
DevOps: DevOps is a practice that emphasizes the collaboration of software developers and IT operators. It aims to build, test, and release software more rapidly, frequently, and more reliably using automated processes for software delivery. DevOps fosters the need for independent replaceable and standardized deployment units and therefore pushes microservice architectures and container technologies.
Softwareization of infrastructure and network enables one to automate the process of software delivery and infrastructure changes more rapidly. Cloud modeling languages can express applications and services and their elasticity behavior that shall be deployed to such infrastructures or platforms.
Deployment units wrap a piece of software in a complete file system that contains everything needed to run: code, runtime, system tools, system libraries. So, it is guaranteed that the software will always run the same, regardless of its environment. This deployment approach is often made using container technologies (OCI standard). Each deployment unit should be designed and interconnected according to a collection of cloud-focused patterns, such as the twelve-factor app collection, the circuit breaker pattern, and cloud
Elastic platforms: Elastic platforms, such as Kubernetes, Mesos, or Swarm, can be seen as a unifying middleware of elastic infrastructures. Elastic platforms extend resource sharing and increase the utilization of underlying compute, network, and storage resources for custom but standardized deployment units.
Serverless: The term ‘‘serverless’’ is used for an architectural style that is used for cloud application architectures that deeply depend on external third-party services (Backend-as-a-Service, BaaS) and integrating them via small event-based triggered functions (Function-as-a-Service, FaaS). FaaS extends resource sharing of elastic platforms by simply applying time-sharing concepts.
State isolation: Stateless components are easier to scale up/down horizontally than stateful components. Of course, stateful components cannot be avoided, but stateful components should be reduced to a minimum and realized by intentional horizontal scalable storage systems (often eventual consistent NoSQL databases).
Versioned REST APIs: REST-based APIs provide scalable and pragmatic communication, which means relying mainly on already existing internet infrastructure and well-defined and widespread standards.
Loose coupling: Service composition is done by events or by data. Event coupling relies on messaging solutions (e.g., AMQP standard). Data coupling often relies on scalable but (mostly) eventual consistent storage solutions (which are often subsumed as NoSQL databases).
CNAs: cloud-native applications; SOA: service-oriented architecture; API: application programming interface.
3.1 Resource utilization
Cloud infrastructures (IaaS) and platforms (PaaS) are built
to be elastic. Elasticity is understood as the degree to
which a system adapts to workload changes by provision-
ing and de-provisioning resources automatically. Without
this, cloud computing is very often not reasonable from an
economic point of view.
Over time, system engineers
learned to understand the elasticity options of modern
cloud environments better. Eventually, systems were
designed for such elastic cloud infrastructures, which
increased the utilization rates of underlying computing
infrastructures via new deployment and design approaches,
such as containers, microservices, or serverless architec-
tures. This design intention is often expressed using the
term ‘‘cloud native.’’
Figure 3 shows a noticeable trend over the last decade.
Machine virtualization was introduced to consolidate many
bare metal machines to make more efficient utilization of
physical resources. This machine virtualization forms the
technological backbone of IaaS cloud computing. Virtual
machines might be more lightweight than bare metal ser-
vers, but they are still heavy, especially regarding their
image sizes. Due to being more fine-grained, containers
improved the way of standardized deployments but also
increased the utilization of virtual machines.
Nevertheless, although containers can be scaled
quickly, they are still always-on components. For that
reason Function-as-a-Service (FaaS) approaches have
emerged and applied time sharing of containers on under-
lying container platforms. Due to this time-shared execu-
tion of containers on the same hardware, FaaS enables
even a scale-to-zero capability. This improved resource
efficiency can even be measured monetarily. Over
time, the technology stack to manage resources in the
cloud became more complicated and more difficult to
understand, but it followed one trend: to run a greater work-
load on the same number of physical machines.
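How a scale-to-zero capability translates into money can be sketched with a simple cost model. The prices, durations, and function names below are illustrative assumptions and do not reflect the tariff of any real provider. The sketch assumes that an always-on component is billed per hour regardless of load, while a function is billed only for the seconds it actually runs.

```python
def always_on_cost(hours: float, price_per_hour: float) -> float:
    """Cost of a component that runs (and is billed) the whole time."""
    return hours * price_per_hour


def pay_per_use_cost(requests: int, seconds_per_request: float,
                     price_per_second: float) -> float:
    """Cost of a function that is billed only while it processes requests."""
    return requests * seconds_per_request * price_per_second


def break_even_requests(hours: float, price_per_hour: float,
                        seconds_per_request: float,
                        price_per_second: float) -> float:
    """Number of requests at which both billing models cost the same."""
    cost_per_request = seconds_per_request * price_per_second
    return always_on_cost(hours, price_per_hour) / cost_per_request


# Hypothetical month: 720 hours, one always-on instance at 0.05 per hour,
# versus functions running 0.2 s per request at 0.00002 per second.
monthly_always_on = always_on_cost(720, 0.05)
monthly_functions = pay_per_use_cost(1_000_000, 0.2, 0.00002)
break_even = break_even_requests(720, 0.05, 0.2, 0.00002)
```

Under these assumed numbers, one million requests per month are far cheaper with per-use billing, and the always-on instance only pays off beyond the break-even request volume. With different prices or longer-running requests, the comparison can turn out the other way.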
3.1.1 Service-oriented deployment monoliths. Service-
oriented computing is a paradigm for distributed comput-
ing and e-business processing and has been introduced to
manage the complexity of distributed systems and to inte-
grate different software applications. A service offers func-
tionalities to other services mainly via message passing.
Services decouple their interfaces from their implementa-
tion. Corresponding architectures for such applications are
called SOAs. Many business applications have been developed
over recent decades following this architectural paradigm.
Also, due to their underlying service concepts, these
applications can be deployed in cloud environments without
any problems. However, the main problem for cloud
system engineers stems from the fact that,
although these kinds of applications are composed of distributed
services, their deployment is not. These kinds of
distributed applications are conceptually monolithic applications
from a deployment point of view.
Figure 3. The cloud architectural evolution from a resource utilization point of view. VM: virtual machine; FaaS: Function as a
In other words, the complete distributed application
must be deployed all at once in the case of updates or new
service releases. This monolithic style even leads to situa-
tions where complete applications are simply packaged as
one large virtual machine image. That fits perfectly to the
situations shown in Figure 3 (Dedicated Server and
Virtualization). However, depending on the application
size, this normally involves noteworthy downtimes of the
application for end-users and limits the capability to scale
the application in the case of increasing or decreasing
It is evident that especially cloud-native applications
come along with such 24×7 requirements and the need to
deploy, update, or scale single components independently
from each other at runtime without any downtime.
Therefore, SOA evolved into a so-called microservice
architectural style. One might mention that microservices
are mainly a more pragmatic version of SOAs. What is
more, microservices are intentionally designed to be
independently deployable, updateable, and horizontally
scalable. So, microservices have some architectural
implications that will be investigated in Section 3.2.1.
However, deployment units of microservices should be
standardized and self-contained. This aspect will be inves-
tigated in Section 3.1.2.
3.1.2 Standardized and self-contained deployment
units. While deployment monoliths are mainly using IaaS
resources in the form of virtual machines that are deployed
and updated less regularly, microservice architectures split
up the monolith into independently deployable units that
are deployed and terminated much more frequently. What
is more, this deployment is done in a horizontally scalable
way that is very often triggered by request stimuli. If many
requests are hitting a service, more service instances are
launched to distribute the requests across more instances.
If the requests are decreasing, service instances are shut
down to free resources (and save money). So, the inherent
elasticity capabilities of microservice architectures are
much more in focus compared with classical deployment
monoliths and SOA approaches. One of the critical success
factors behind microservice architectures gaining
so much attention in recent years might be the fact
that the deployment of service instances could be standardized
as self-contained deployment units, so-called containers.
Containers make use of operating system
virtualization instead of machine virtualization (see Figure 4)
and are therefore much more lightweight. Containers make
scaling much more pragmatic and faster, and because containers
consume fewer resources than virtual
machines, the instance density is increased.
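The request-driven scaling described above can be sketched as a simple sizing rule. This is a minimal illustration, assuming that each instance can handle a fixed number of requests per second; the function name, parameters, and limits are hypothetical and not taken from a specific orchestrator.

```python
import math


def desired_instances(requests_per_second: float,
                      capacity_per_instance: float,
                      min_instances: int = 1,
                      max_instances: int = 100) -> int:
    """Return how many instances are needed for the current request rate."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_second / capacity_per_instance)
    # Clamp to the configured bounds; min_instances=1 models an always-on service.
    return max(min_instances, min(max_instances, needed))


busy = desired_instances(250, 100)                    # load above two instances
idle = desired_instances(0, 100)                      # no load, one instance stays up
idle_scale_to_zero = desired_instances(0, 100, min_instances=0)
```

The default lower bound of one instance reflects that a containerized service keeps at least one instance running even without load. Setting the bound to zero corresponds to a scale-to-zero behavior.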
However, even in microservice architectures, the ser-
vice concept is an always-on concept. So, at least one ser-
vice instance (container) must be active and running for
each microservice at all times. Thus, even container tech-
nologies do not overcome the need for always-on compo-
nents. Also, always-on components are one of the most
expensive and therefore avoidable cloud workloads,
according to Weinmann.
Thus, the question arises as to
whether it is possible to execute service instances only in
the case of actual requests. The answer leads to FaaS concepts
and corresponding platforms that will be discussed
in Section 3.1.3.
Figure 4. Comparing containers and virtual machines (adapted from the Docker website: https://www.docker.com/resources/
3.1.3 Function as a Service. Microservice architectures pro-
pose a solution to efficiently scale computing resources
that are hardly realizable with monolithic architectures.
The allocated infrastructure can be better tailored to the
microservice needs due to the independent scaling of each
one of them via standardized deployment units, addressed
in Section 3.1.2. However, microservice architectures require
additional efforts, such as deploying, scaling, and operating
every single microservice in cloud infrastructures.
To address these concerns, container orchestrating platforms,
such as Kubernetes, have
emerged. However, this shifts the problem to the operation
of these platforms, and these platforms are still always-on
components. Thus, so-called serverless architectures and
FaaS platforms have emerged in the cloud service ecosys-
tem. The AWS Lambda service might be the most promi-
nent one, but there exist more, such as Google Cloud
Functions, Azure Functions, OpenWhisk, and Spring
Cloud Functions, to name just a few. However, all (commercial)
platforms follow the same principle: they provide
minimal and fine-grained services (just exposing one stateless
function) that are billed according to the runtime
consumed (at millisecond granularity).
FaaS is more fine-grained than microservices and facil-
itates the creation of functions. Therefore, these fine-
grained functions are sometimes called nanoservices.
These functions can be quickly deployed and automati-
cally scaled, and provide the potential to reduce infrastruc-
ture and operation costs. Unlike the deployment unit
approaches of Section 3.1.2 – that are still always-on soft-
ware components – functions are only processed if there
are active requests. Thus, FaaS can be much more cost
efficient than just containerized deployment approaches.
According to a case study by Villamizar et al. that compares
the costs of monolithic, microservice, and FaaS architectures,
cost reductions of up to 75% are possible.
On the other hand, there are still open problems, such as
the serverless trilemma. The serverless trilemma ‘‘captures
the inherent tension between economics, performance, and
synchronous composition’’ of serverless functions. One
obvious problem stressed by Baldini et al.
is the ‘‘double
spending problem’’ shown in Figure 5. This problem
occurs when a serverless function f is calling another serverless
function g synchronously. The consumer is billed
for the execution of f and g, although only g is consuming
resources because f is waiting for the result of g. To
avoid this double spending problem, many serverless
applications delegate the composition of fine-grained ser-
verless functions into higher order functionality to client
applications and edge devices outside the scope of FaaS
platforms. This composition problem leads to new – more
distributed and decentralized – forms of cloud-native
architectures investigated in Section 3.2.2.
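The billing effect of the double spending problem can be made concrete with a small calculation. The sketch assumes a simplified billing model in which each function is charged for its wall-clock runtime in milliseconds; `t_f` denotes the time f spends on its own work and `t_g` the runtime of g. The function names and the numbers are illustrative only.

```python
def billed_synchronous(t_f: int, t_g: int) -> int:
    """f calls g synchronously inside the FaaS platform.

    f is billed for its own work plus the time it waits for g,
    and g is billed for its runtime as well.
    """
    billed_f = t_f + t_g
    billed_g = t_g
    return billed_f + billed_g


def billed_client_composed(t_f: int, t_g: int) -> int:
    """A client outside the platform calls f and g one after the other."""
    return t_f + t_g


# Hypothetical runtimes in milliseconds.
t_f, t_g = 100, 400
synchronous = billed_synchronous(t_f, t_g)        # 900 ms billed
composed = billed_client_composed(t_f, t_g)       # 500 ms billed
double_spent = synchronous - composed             # 400 ms, the runtime of g
```

In this simplified model, the extra charge always equals the runtime of g, which is why long-running callees make synchronous composition inside the platform particularly expensive.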
3.2 Architectural evolution
The reader has seen in Section 3.1 that cloud-native applications
strove for better resource utilization mainly by
applying more fine-grained deployment units in the shape of
lightweight containers (instead of virtual machines) or in the
shape of functions in the case of FaaS approaches.
Moreover, these improvements of resource utilization rates
had an impact on how the architectures of cloud
applications evolved. Two major architectural trends
(microservices and serverless architectures) in cloud
application architectures have emerged in the last decade.
We will investigate microservice architectures in Section
3.2.1 and serverless architectures in Section 3.2.2.
Figure 5. The double spending problem resulting from the serverless trilemma. FaaS: Function as a Service.
3.2.1 Microservice architectures. Microservices form
... an approach to software and systems architecture that
builds on the well-established concept of modularisation but
emphasise technical boundaries. Each module — each micro-
service — is implemented and operated as a small yet inde-
pendent system, offering access to its internal logic and data
through a well-defined network interface. This architectural
style increases software agility because each microservice
becomes an independent unit of development, deployment,
operations, versioning, and scaling.
Often-mentioned benefits of microservice architectures are
faster delivery, improved scalability, and greater autonomy.
Different services in a microservice architecture
can be scaled independently from each other according to
their specific requirements and actual request stimuli.
What is more, each service can be developed and oper-
ated by different teams. So, microservices do not only have
a technological but also an organizational impact. These
teams can make localized decisions per service regarding
programming languages, libraries, frameworks, and more.
This organizational impact enables, on the one hand, best-
of-breed approaches within each area of responsibility. On
the other hand, it might increase the technological
heterogeneity across the complete system. What is more, the
corresponding long-term effects on the maintainability of
such systems might not have been observed so far.
First generation microservices were formed of individual
services that were packaged using container technologies.
These services were then deployed and managed at run-
time using container orchestration tools, such as Mesos.
Each service was responsible for keeping track of other
services, and invoking them by specific communication
protocols. Failure-handling was implemented directly in
the service source code. With an increase of services per
application, the reliable and fault-tolerant location and
invocation of appropriate service instances became a prob-
lem itself. If new services were implemented using differ-
ent programming languages, reusing existing discovery
and failure-handling code would become increasingly dif-
ficult. So, freedom of choice and ‘‘polyglot programming’’
are often-mentioned benefits of microservices, but they
have drawbacks that need to be managed.
Therefore, second generation microservice architec-
tures made use of discovery services and reusable fault-
tolerant communication libraries. Common discovery ser-
vices (such as Consul, see Table 2) were used to register
provided functionalities. During service invocation, all
protocol-specific and failure-handling features were dele-
gated to an appropriate communication library, such as
Finagle (see Table 2). This simplified service implementation
and the reuse of boilerplate communication code across services.
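The division of labour in second generation microservices can be illustrated with a minimal sketch. The in-memory `ServiceRegistry` below merely stands in for a discovery service such as Consul, and `call_with_retry` stands in for a fault-tolerant communication library such as Finagle. Both names and their behaviour are illustrative assumptions and do not reproduce the APIs of these tools. Plain callables take the place of network endpoints.

```python
from typing import Callable, Dict, List

Endpoint = Callable[[str], str]  # a callable stands in for a network endpoint


class ServiceRegistry:
    """In-memory stand-in for a discovery service (hypothetical API)."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[Endpoint]] = {}

    def register(self, name: str, endpoint: Endpoint) -> None:
        self._instances.setdefault(name, []).append(endpoint)

    def lookup(self, name: str) -> List[Endpoint]:
        return list(self._instances.get(name, []))


def call_with_retry(registry: ServiceRegistry, name: str, request: str,
                    attempts: int = 3) -> str:
    """Resolve a service by name and fail over to the next registered
    instance when a call raises ConnectionError."""
    instances = registry.lookup(name)
    if not instances:
        raise LookupError(f"no instance registered for {name!r}")
    last_error = None
    for attempt in range(attempts):
        endpoint = instances[attempt % len(instances)]  # round-robin failover
        try:
            return endpoint(request)
        except ConnectionError as error:
            last_error = error
    raise ConnectionError(f"{name!r} failed after {attempts} attempts") from last_error


if __name__ == "__main__":
    def broken(_request: str) -> str:
        raise ConnectionError("instance down")

    registry = ServiceRegistry()
    registry.register("pricing", broken)
    registry.register("pricing", lambda request: f"handled {request}")
    print(call_with_retry(registry, "pricing", "order-1"))
```

The calling service contains neither lookup nor failure-handling logic of its own, which is the simplification this generation aimed for.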
The third generation introduced service proxies as
transparent service intermediates with the intent to
improve software reusability. So-called sidecars encapsu-
late reusable service discovery and communication fea-
tures as self-contained services that can be accessed via
existing fault-tolerant communication libraries provided by
almost every programming language nowadays. Because they
are conceived as network intermediaries, sidecars are well
suited for monitoring the behavior of all service interactions
in a microservice application. This intermediary role is
precisely the idea behind service mesh technologies such
as Linkerd (see Table 2). These tools extend the notion of
self-contained sidecars to provide a more integrated service
communication solution. Using service meshes, operators
have much more fine-grained control over the service-to-
service communication, including service discovery, load bal-
ancing, fault tolerance, message routing, and even security.
So, besides the pure architectural point of view, the fol-
lowing tools, frameworks, services, and platforms (see Table
2) form our current understanding of the term microservice.
•Service discovery technologies let services commu-
nicate with each other without explicitly referring
to their network locations.
•Container orchestration technologies automate con-
tainer allocation and management tasks, abstracting
away the underlying physical or virtual infrastruc-
ture from service developers. That is the reason we
see this technology as an essential part of any
cloud-native application stack (see Figure 2).
•Monitoring technologies are often based on
time-series databases and enable runtime monitoring
and analysis of the behavior of microservice
resources at different levels of detail.
•Latency and fault-tolerant communication libraries
let services communicate more efficiently and reli-
ably in permanently changing system configura-
tions with plenty of service instances permanently
joining and leaving the system according to chang-
ing request stimuli.
•Continuous delivery technologies integrate solutions,
often as third-party services, that automate
many of the DevOps practices typically used in a
web-scale microservice production environment.
•Service proxy technologies encapsulate mainly
communication-related features, such as service
discovery and fault-tolerant communication, and
expose them over HTTP.
•Finally, the latest service mesh technologies build
on sidecar technologies to provide a fully integrated
service-to-service communication monitoring and
Table 2 shows that a complex tool-chain evolved to
handle the continuous operation of microservice-based applications.
3.2.2 Serverless architectures. Serverless computing is a
cloud computing execution model in which the allocation
of machine resources is dynamically managed and intentionally
out of the control of the service customer. The ability
to scale to zero instances is one of the critical differentiators
of serverless platforms compared with container-focused
PaaS or virtual machine-focused IaaS services.
Scale-to-zero makes it possible to avoid always-on components
and therefore excludes the most expensive cloud usage pattern,
according to Weinmann. That might be one reason why
the term ‘‘serverless’’ has become more and more common
since 2014. However, what is ‘‘serverless’’ exactly?
Servers must still exist somewhere.
So-called serverless architectures replace server admin-
istration and operation mainly by using FaaS concepts
and integrating third-party backend services. Figure 3
showed the evolution of how resource utilization has been
optimized over the last 10 years, ending in the latest trend
to make use of FaaS platforms. FaaS platforms apply
time-sharing principles and increase the utilization factor
of computing infrastructures, and thus avoid expensive
always-on components. As already mentioned, at least one
study showed that due to this time-sharing, serverless
architectures can reduce costs by 70%. A serverless platform
is merely an event processing system (see Figure 6).
According to Baldini et al., serverless platforms take an
event (sent over HTTP or received from a further event
source in the cloud), then these platforms determine which
functions are registered to process the event, find an exist-
ing instance of the function (or create a new one), send the
event to the function instance, wait for a response, gather
execution logs, make the response available to the user,
and stop the function when it is no longer needed. Besides
application programming interface (API) composition and
aggregation to reduce API calls, event-based applications
are very much suited for this approach.
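The processing steps listed by Baldini et al. can be mirrored in a toy, in-process model. The class and method names below are hypothetical and chosen for illustration only. A real FaaS platform adds isolation, scheduling, and billing, all of which this sketch omits.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

Handler = Callable[[dict], Any]


class ToyServerlessPlatform:
    """Minimal event processing loop (illustrative, not a real platform)."""

    def __init__(self) -> None:
        self._functions: Dict[str, List[Tuple[str, Handler]]] = {}
        self._running: Set[str] = set()  # names of live function instances
        self.logs: List[str] = []

    def register(self, event_type: str, name: str, handler: Handler) -> None:
        self._functions.setdefault(event_type, []).append((name, handler))

    def handle(self, event: dict) -> List[Any]:
        responses = []
        # Determine which functions are registered for this event.
        for name, handler in self._functions.get(event["type"], []):
            if name not in self._running:  # no instance yet: create one
                self._running.add(name)
                self.logs.append(f"cold start: {name}")
            result = handler(event)        # send the event, wait for response
            self.logs.append(f"executed: {name}")  # gather execution logs
            responses.append(result)
        return responses                   # make the responses available

    def stop_idle(self) -> None:
        """Scale to zero: stop all instances that are no longer needed."""
        self.logs.extend(f"stopped: {name}" for name in sorted(self._running))
        self._running.clear()


if __name__ == "__main__":
    platform = ToyServerlessPlatform()
    platform.register("simulation.step", "double", lambda e: e["payload"] * 2)
    print(platform.handle({"type": "simulation.step", "payload": 21}))
    platform.stop_idle()
    print(platform.logs)
```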
Serverless platform provision models can be grouped
into the following categories.
•Public (commercial) serverless services of public
cloud service providers provide computational run-
time environments, also known as FaaS platforms.
Some well-known representatives include
AWS Lambda, Google Cloud Functions, and
Microsoft Azure Functions. All of the mentioned
commercial serverless computing models are prone
to create vendor lock-in (to some degree).
•Open (source) serverless platforms such as
Apache OpenWhisk and OpenLambda might be
an alternative, with the downside that these platforms
need infrastructure that has to be operated.
•Provider agnostic serverless frameworks provide
a provider and platform agnostic way to define and
deploy serverless code on various serverless platforms
or commercial serverless services. So, these
frameworks are an option to avoid (or reduce) vendor
lock-in without the necessity to operate their own infrastructure.
So, on the one hand, serverless computing provides
some inherent benefits, such as resource and cost efficiency,
operational simplicity, and a possible increase in
development speed and better time-to-market. On the other
hand, serverless computing also comes with some noteworthy
drawbacks, such as runtime constraints, state constraints,
and still unsatisfactorily solved function composition
problems, such as the double spending problem (see Figure 5).
What is more, the resulting serverless architectures have
security implications. They increase attack surfaces and shift
parts of the application logic (service composition) to the
client side (which is not under the complete control of the
service provider). Furthermore, FaaS increases vendor lock-in
problems and client complexity, as well as integration
and testing complexity.
Table 2. Some observable microservice engineering ecosystem components.
Ecosystem component | Example tools, frameworks, services, and platforms (last accessed 18 December 2019)
Service discovery | Zookeeper (https://zookeeper.apache.org), Eureka (https://github.com/Netflix/eureka), Consul (https://www.consul.io), etcd (https://github.com/coreos/etcd), Synapse (https://github.com/airbnb/
Container orchestration | Kubernetes (https://kubernetes.io), Mesos (http://mesos.apache.org), Swarm (https://docs.docker.com/engine/swarm), Nomad (https://www.nomadproject.io)
Monitoring | Graphite (https://graphiteapp.org), InfluxDB (https://github.com/influxdata/influxdb), Sensu (https://sensuapp.org), cAdvisor (https://github.com/google/cadvisor), Prometheus (https://prometheus.io), Elastic Stack (https://elastic.io/products)
Fault-tolerant communication | Finagle (https://twitter.github.io/finagle), Hystrix (https://github.com/Netflix/Hystrix), Proxygen (https://github.com/facebook/proxygen), Resilience4j (https://github.com/resilience4j)
Continuous delivery services | Ansible (https://ansible.com), Circle CI (https://circleci.com/), Codeship (https://codeship.com/), Drone (https://drone.io), Spinnaker (https://spinnaker.io), Travis CI (https://travis-ci.org/)
Service proxy | Envoy (https://www.envoyproxy.io)
Service meshes | Linkerd (https://linkerd.io), Istio (https://istio.io)
Figure 6. Blueprint of a serverless platform architecture. FaaS: Function as a Service; API: application programming interface.
Figure 7. Serverless architectures result in a different and less centralized composition of application components and backend services compared with classical tiered application architectures. API: application programming interface; FaaS: Function as a Service; BaaS: Backend as a Service.
Furthermore, Figure 7 shows that serverless architectures
(and microservice architectures as well) require a
cloud application architecture redesign, compared to
traditional e-commerce applications. Much more than
microservice architectures, serverless architectures
intentionally integrate third-party backend services, such
as authentication or database services. Functions on FaaS
platforms provide only very service-specific, security-relevant,
or computing-intensive functionality. All functionality that
would classically have been provided on a central
application server is now provided as many isolated
micro- or even nanoservices. The integration of all these
isolated services into meaningful end-user functionality is
delegated to end devices (very often in the shape of native
mobile applications or progressive web applications). In
summary, we can see the following observable engineering
decisions in serverless architectures.
•Former cross-sectional but service-internal logic,
such as authentication or storage, is outsourced to
external third-party services.
•Even nano- and microservice composition is shifted
to end-user clients or edge devices. This means that
even service orchestration is not done anymore by
the service provider itself but by the service con-
sumer via provided applications. This end-user
orchestration has two interesting effects: (1) the
service consumer now provides resources needed
for service orchestration; (2) because the service
composition is done outside the scope of the FaaS
platform, still unsolved FaaS function composition
problems (such as the double spending problem)
•Such client or edge devices are interfacing third-
party services directly.
•Endpoints of service-specific functionality are provided
via API gateways. So, HTTP- and REST-based/REST-like
communication protocols are generally preferred.
•Only very domain- or service-specific functions are
provided on FaaS platforms. This is mainly when
this functionality is security relevant and should be
executed in a controlled runtime environment by
the service provider, or the functionality is too pro-
cessing- or data-intensive to be executed on con-
sumer clients or edge devices, or the functionality is
so domain-, problem-, or service-specific that sim-
ply no external third-party service exists.
Finally, the reader might observe the trend in serverless
architectures that this kind of architecture is more decen-
tralized and distributed, makes more intentional use of
independently provided services, and is therefore much
more intangible (more cloudy) compared with microservice architectures.
4. Impacts on the Modeling and Simulation as a Service domain
The impacts on MSaaS are presented from several points
of view. Section 4.1 presents several example use
cases to derive some implications for cloud-native
simulations (CNSs; see Section 4.2). Section 4.3 explains
how these implications have been considered in a CNS
reference model (see Figure 8). In addition, Section 4.4
discusses some limitations that should be considered to
raise the overall maturity level of CNSs (Section 4.5).
4.1 Example cloud simulation use cases
There exist several examples and investigations of
simulation models that have been successfully deployed to
public cloud computing infrastructures.
•A cloud-based distributed agent-based traffic simu-
lator named Megaffic.
•The Scalable Electro-Mobility Simulation Cloud
Service was used to study the impact of large-scale
electromobility on a city’s infrastructure.
•The D-Mason framework is a parallel version of the
Mason library for writing and running distributed simulations.
•GridSpice is a cloud-based simulation platform for
distributed smart power grid simulation.
•The British Army is investigating the potential of
virtual reality (VR), machine learning, and cloud
computing for the Army’s Collective Training
Transformation Programme (CTTP; https://bit.ly/
2r6oL51). A series of training events aim to demon-
strate VR and mixed reality (MR) capabilities. To
display the potential benefits of data capture and
machine learning-driven analytics for military train-
ing, subcontractors will also show the use of cloud
computing in this context.
Guzzetti et al. investigated the impact of different
high-performance computing (HPC) platforms for numerical
simulation in computational hemodynamics with
LifeV (Library for Finite Elements). They compared
in-house computing clusters, a large-scale university-based
HPC cluster, and a regional supercomputer with public
clouds. According to their results, cloud computing can be
utilized for scientific computational fluid dynamics (CFD)
simulations, possibly at lower cost/performance than using
a more expensive local computing cluster. Ledyayev and
evaluated the private cloud solution OpenStack
by using three case studies in transportation modeling
(network optimization), high-energy physics (Monte
Carlo), and materials simulation (CFD). They concluded
that cloud computing is suitable for multiple runs of
non-concurrent code but needs specialist hardware to
support parallel processing.
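The case of multiple runs of non-concurrent code can be illustrated with independent Monte Carlo replications. The sketch below is our own minimal example and is not taken from the cited studies. A local thread pool stands in for separately provisioned cloud instances, and the estimation of pi is an arbitrary placeholder workload.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Iterable, List


def replication(seed: int, samples: int = 10_000) -> float:
    """One independent Monte Carlo run estimating pi. Each run owns its
    random generator, so runs share no state and need no coordination."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * hits / samples


def run_replications(seeds: Iterable[int], workers: int = 4) -> List[float]:
    """Distribute replications over workers; results keep the seed order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(replication, seeds))


if __name__ == "__main__":
    estimates = run_replications(range(8))
    print(f"{len(estimates)} replications, mean estimate {mean(estimates):.4f}")
```

Because the replications do not communicate, adding workers requires no change to the simulation code, which is the property that makes this class of workload a natural fit for horizontally scaled infrastructure.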
4.2 Implications for cloud-native simulations
If we analyze these examples, we see that cloud-based
simulations are possible, even for large-scale problems.
However, economies of scale depend strongly on the kind of
simulation and the approach taken to parallelize processing.
Summarizing our central insights of Section 3, we get the
following lessons learned from the cloud-native domain
that can be transferred to simulation contexts. First of all,
if a cloud-native application is an application that is
composed of services, then correspondingly, a CNS would be a
simulation composed of small, independently deployable
and replaceable simulation services that simulate (UNIX-like)
‘‘one thing well’’ and can be scaled horizontally to
enable parallel processing.
Consequently, existing (monolithic) simulations must
be migrated into microservice architectures and would
evolve somehow from a cloud-ready into a cloud-native
maturity level (see Table 3). Cloud-native application engi-
neering showed that it is rarely possible to transfer existing
applications one to one into cloud environments without
Table 3. Cloud simulation maturity model (adapted from the Open Data Center Alliance).
Level | Maturity | Criteria
3 | Cloud native | - Simulations are transferable across infrastructure providers at runtime and without interruption of service. - Simulation services are automatically scaled out/in based on stimuli.
2 | Cloud resilient | - The state of simulation services is isolated in a minimum of services. - Simulations are unaffected by dependent service failures. - Simulations are infrastructure agnostic.
1 | Cloud friendly | - Simulations are composed of loosely coupled simulation services. - Simulation services are discoverable by name. - Simulation services are designed to cloud patterns. - Compute and storage are separated.
0 | Cloud ready | - Simulations can be operated on virtualized infrastructure. - Simulations can be instantiated from image or script.
Figure 8. Proposal of a cloud-native simulation stack. FaaS: Function as a Service; IaaS: Infrastructure as a Service.
As the reader will notice, we propose a CNS stack
(Figure 8) that is closely based on the already introduced
cloud-native stack (Figure 2). Consequently, the corresponding
CNS engineering trends (Table 4) are derived from the
general cloud-native engineering trends (Table 1). The
cloud-native simulation stack as well as the corresponding
engineering trends have been compiled systematically by
‘‘replacing’’ general cloud-native concepts with more
specific CNS concepts. That is because we assume that a
CNS is a particular cloud-native application (with some
specific requirements). However, this eliminated some
already discussed features and software trends. For example,
the observable DevOps trend is a general software
engineering trend. We do not see specific impacts on
simulation service engineering here that go beyond standard
software engineering. However, that does not mean
that this trend should not be applied in simulation service
engineering. It only means that we do not see
simulation-specific problems here.
The same is true for cloud modeling and cloud simula-
tion tools (such as CloudSim) to represent and analyze
cloud architectures. At first glance, it seems obvious to
cover these tools as well. However, we do not see that
these tools are relevant to the research of CNSs in general,
except for the case that cloud simulations should be run as
CNSs. However, simulating cloud infrastructures is
simply one particular object of investigation. This paper
intentionally does not focus on simulation-specific
objects of investigation.
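Returning to the maturity model in Table 3, its levels can be read as a checklist. The following sketch encodes the criteria under short, illustrative keys of our own choosing and assumes that the levels build on each other, so that a level only counts if all lower levels are fulfilled as well. Table 3 itself does not state this cumulative reading explicitly.

```python
from typing import Optional, Set

# Criteria paraphrased from Table 3; the short keys are illustrative labels.
MATURITY_MODEL = [
    (0, "Cloud ready", {"virtualized_infrastructure",
                        "instantiable_from_image_or_script"}),
    (1, "Cloud friendly", {"loosely_coupled_services", "discoverable_by_name",
                           "designed_to_cloud_patterns",
                           "compute_storage_separated"}),
    (2, "Cloud resilient", {"state_isolated", "tolerates_service_failures",
                            "infrastructure_agnostic"}),
    (3, "Cloud native", {"transferable_at_runtime", "automatically_scaled"}),
]


def maturity_level(fulfilled: Set[str]) -> Optional[str]:
    """Return the highest level whose criteria, and those of all lower
    levels, are fulfilled (assumption: levels are cumulative)."""
    reached = None
    for level, name, criteria in MATURITY_MODEL:
        if not criteria <= fulfilled:  # some criterion of this level is missing
            break
        reached = f"{level} ({name})"
    return reached


if __name__ == "__main__":
    legacy_simulation = {"virtualized_infrastructure",
                         "instantiable_from_image_or_script",
                         "discoverable_by_name"}
    print(maturity_level(legacy_simulation))
```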
We do not have a common definition that explains what
a CNS exactly is. Nevertheless, we use our experiences
with cloud-native applications to derive a definition
proposal for CNSs. If we assume that a CNS is a special kind
of cloud-native application, we should consider the following.
Fehling et al. postulate that almost all cloud-native
systems should be IDEAL: They [i]solate their state, they
are [d]istributed in their nature, they are [e]lastic in a hor-
izontal scaling notion, they are operated on [a]utomated
management systems, and their components are [l]oosely
coupled. According to Stine, there are common motivations
for cloud-native architectures, such as to deliver
software-based solutions more quickly (speed), in a more
fault-isolating, fault-tolerating, and automatically recovering
way (safety), to enable horizontal (instead of vertical)
application scaling (scale), and finally to handle a diversity
of consumer platforms and legacy systems (client diversity).
Several application architectures and infrastructure
approaches address these common motivations.
•Microservices represent the decomposition of
monolithic systems into independently deployable
services that do ‘‘one thing well.’’
•The primary mode of interaction between services in
a cloud-native application architecture is via pub-
lished and versioned APIs (API-based collabora-
tion). These APIs are often HTTP based and follow
a REST-style with JSON serialization, but other pro-
tocols and serialization formats can be used as well.
•Single deployment units of the architecture are
designed and interconnected according to a collection
of cloud-focused patterns, such as the twelve-factor
app collection, the circuit breaker pattern,
or cloud computing patterns.
•More and more often elastic container platforms
are used to deploy and operate these microservices
via self-contained deployment units (containers).
These platforms provide additional operational
capabilities on top of IaaS infrastructures, such as
automated and on-demand scaling of application
instances, application health management, dynamic
routing, load balancing, and aggregation of logs
Table 4. Expectable software engineering trends for cloud-native simulation services.
CNA trend | Impact on MSaaS
Microservices | Simulation architectures should be composed of small and independently replaceable horizontally scalable simulation services that are ‘‘simulating one thing well.’’
Modeling languages | Existing simulation modeling languages should be extended to define the composition of simulation services and their elasticity behavior.
[...] | Simulation deployment units should wrap a piece of simulation software in a complete file system that contains everything needed to run: code, runtime, system tools, system libraries. So, it is guaranteed that the software will always run the same, regardless of its environment. This deployment approach can be realized using standardized container technologies (OCI standard).
Elastic platforms | Simulation platforms should evolve into a unifying middleware of cloud infrastructures. Such platforms extend resource sharing and increase the utilization of underlying compute, network, and storage resources for custom but standardized simulation deployment units.
Serverless | Serverless simulation would be an architectural style for cloud-based simulations that deeply depend on external third-party simulation services and integrate them via small event-triggered functions (Function-as-a-Service, FaaS).
State isolation | Stateless simulation services are easier to scale up/down horizontally than stateful simulation services. Of course, stateful components cannot be avoided, but stateful components should be reduced to a minimum and realized by intentionally horizontally scalable storage systems (often eventually consistent NoSQL databases).
Versioned REST APIs | If simulation services provide versioned REST APIs, this inherently provides scalable and pragmatic communication. This kind of simulation service communication relies mainly on already existing internet infrastructure and well-defined and widespread standards. It would even enable the seamless integration of simulation services that are not ‘‘cloud-native’’ but only
Loose coupling | Simulation service composition can be done by events or by data. Event coupling in ‘‘normal cloud-native application’’ relies on messaging solutions (e.g., AMQP standard). Data coupling relies on scalable but (mostly) eventually consistent storage solutions (which are often subsumed as
CNA: cloud-native architecture; MSaaS: Modeling and Simulation as a Service; API: application programming interface; AMQP: Advanced Message Queuing Protocol.
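One of the cloud-focused patterns named above, the circuit breaker, can be sketched in a few lines. The implementation below is a deliberately small illustration and not the API of any particular library. The thresholds, the class names, and the injectable clock are assumptions made to keep the example self-contained.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Open: fail fast instead of calling an unhealthy service.
                raise CircuitOpenError("circuit open; call rejected")
            # Reset interval elapsed: half-open, allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures or self._opened_at is not None:
                self._opened_at = self._clock()  # (re-)open the circuit
            raise
        self._failures = 0       # success closes the circuit again
        self._opened_at = None
        return result


if __name__ == "__main__":
    breaker = CircuitBreaker(max_failures=2, reset_after=5.0)

    def unavailable_service() -> str:
        raise ConnectionError("simulation service unreachable")

    for _ in range(3):
        try:
            breaker.call(unavailable_service)
        except (ConnectionError, CircuitOpenError) as error:
            print(type(error).__name__, error)
```

After the configured number of failures the breaker rejects further calls immediately, which protects both the caller and the failing service until the reset interval has elapsed.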
4.3 A cloud-native simulation reference model
These aspects let us derive the following understanding
of a CNS system and the corresponding CNS stack (see Figure 8).
The core design idea of plenty of cloud-native applica-
tion architectures inspires the leading conceptual approach
of the derived CNS stack (Figure 8). Every simulation on
Layer 4 (or service) should be composed of stateless
Layer 3 simulation services that rely on services managing
and encapsulating the simulation state. This separation of
concerns (simulation logic and simulation state) makes it
possible for distributed simulations to choose between
eventual and strict consistency models for the simulation
state. Because it also enables seamless horizontal scalability
of functional simulation services, it is a widespread and
proven pattern in cloud-native application architectures,
according to our experience.
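The separation of simulation logic and simulation state can be sketched as follows. The `StateStore` class is a hypothetical in-memory stand-in for a stateful Layer 3 service (in practice a database or similar storage system), and the constant-velocity update is an arbitrary placeholder for real simulation logic.

```python
from typing import Dict


class StateStore:
    """In-memory stand-in for a stateful simulation service."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def load(self, entity_id: str) -> dict:
        return dict(self._data[entity_id])  # copy: callers hold no shared state

    def save(self, entity_id: str, state: dict) -> None:
        self._data[entity_id] = dict(state)


def advance(state: dict, dt: float) -> dict:
    """Stateless simulation logic: a pure function of state and time step."""
    return {
        "position": state["position"] + state["velocity"] * dt,
        "velocity": state["velocity"],
    }


def handle_step(store: StateStore, entity_id: str, dt: float) -> dict:
    """Request handler of a stateless service replica. Nothing is kept
    between calls, so any replica can serve any request."""
    new_state = advance(store.load(entity_id), dt)
    store.save(entity_id, new_state)
    return new_state


if __name__ == "__main__":
    store = StateStore()
    store.save("vehicle-1", {"position": 0.0, "velocity": 2.0})
    print(handle_step(store, "vehicle-1", 0.5))
    print(handle_step(store, "vehicle-1", 0.5))
```

Since `handle_step` holds no data of its own, replicas of the service can be added or removed freely, while consistency decisions are confined to the store.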
Another general cloud application architecture best
practice is the standardization of the deployment of Layer
3 simulation services. This deployment standardization via
a Layer 2 elastic simulation platform enables one to oper-
ate plenty of services on the same physical or virtual Layer
1 hardware. In the general cloud computing context, this is
customarily done via container-based technologies. A
container is nothing more than a self-contained deployment
unit encapsulating all its runtime dependencies. It very
often exposes its functionality via REST-based interfaces.
Such containers can be operated on corresponding
container platforms, such as Kubernetes, Mesos, Docker
Swarm, and more. These kinds of platforms are
application-agnostic and can be used for arbitrary types of
applications. Consequently, they can be used for simulation
services as well. Therefore, we recommend using these
kinds of building blocks for elastic simulation platforms.
What is more, the proposed CNS stack aligns with the
general design principles of distributed and federated
simulations that have been successfully standardized, for
example, via the High Level Architecture (HLA). HLA is a
standard for distributed simulation, used when building a
simulation for a larger purpose by combining (federating)
several simulations. HLA requires a runtime infrastructure (RTI)
that provides a standardized set of services, as specified in
the HLA Federate Interface Specification. This RTI is
closely aligned with Layer 2 of the proposed reference model.
Further HLA services of the interface specification can be
mapped to our model as well:
•federation management services → simulation
deployment unit orchestrator;
•object and ownership management services →
stateful simulation services;
•time management services → stateful simulation services;
•declaration and data distribution services → existing
messaging solutions (e.g., all Advanced
Message Queuing Protocol (AMQP) message brokers)
can be deployed on Layer 3 by the simulation
deployment unit orchestrator alongside further
4.4 Discussion of limitations
We have to admit that this mapping stays vague at this
level of abstraction. So, more detailed common
cross-functional simulation services (such as timing or
messaging services) on Layer 3 could (and should) be defined in
future MSaaS work. The CNS stack does not require a
specific time or messaging simulation service (or
other services). However, it recommends providing such
services in a programming language-agnostic way, as
microservice approaches do via HTTP and REST-based
versioned APIs. Making use of such common internet
communication standards would efficiently tackle one
downside of current HLA-based approaches.
Because HLA is a message-oriented middleware that
defines a set of services, mostly provided by a C++ or
Java API, there is no standardized on-the-wire protocol. In
consequence, participants in a federation are very often
bound to RTI libraries from the same provider, and usually
also of the same version, for applications to interoperate.
The resulting simulations are mostly so-called deployment
monoliths.
Instead, the cloud-native application stack would
not require a specific set of services but a way to interface
every simulation service in an Internet standard-conforming
manner. Each simulation service would be a
self-contained deployment unit encapsulating all its runtime
dependencies. Every simulation service on Layer 3
and higher should be integrated using standardized internet
protocols, such as HTTP and REST-based approaches.
A CNS is a distributed, elastic, and horizontally scalable
simulation system composed of simulation (micro)services that
isolate the state in a minimum of stateful components. The
self-contained deployment units of that simulation are designed
according to cloud-focused design patterns and operated on
self-service elastic simulation platforms.
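The recommendation to interface simulation services via HTTP and versioned REST-style APIs can be sketched with the Python standard library alone. The endpoint path `/v1/step`, the JSON fields, and the constant-velocity update are hypothetical choices for illustration and are not part of the proposed reference model.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Any, Tuple


def route(method: str, path: str, payload: Any) -> Tuple[int, dict]:
    """Map a request to a response. The version is part of the path, so a
    later /v2/step could coexist with /v1/step."""
    if (method, path) != ("POST", "/v1/step"):
        return 404, {"error": "unknown endpoint or API version"}
    try:
        position = payload["position"] + payload["velocity"] * payload["dt"]
    except (KeyError, TypeError):
        return 400, {"error": "expected numeric position, velocity, and dt"}
    return 200, {"position": position}


class SimulationHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            payload = None  # rejected by route() with status 400
        status, result = route("POST", self.path, payload)
        body = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: Any) -> None:
        pass  # keep the example quiet


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), SimulationHandler)  # any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    connection = http.client.HTTPConnection(host, port, timeout=5)
    request_body = json.dumps({"position": 0.0, "velocity": 2.0, "dt": 0.5})
    connection.request("POST", "/v1/step", body=request_body.encode("utf-8"),
                       headers={"Content-Type": "application/json"})
    response = connection.getresponse()
    print(response.status, json.loads(response.read()))
    connection.close()
    server.shutdown()
    server.server_close()
```

Any client that speaks HTTP and JSON can call such a service, independent of the programming language or library version used on either side.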
We also have to consider that cloud computing has
traditionally not been HPC or simulation oriented. It is often
mentioned that current Internet of Things (IoT)
approaches, therefore, are being implemented using Fog or
Edge Computing approaches. However, this has more to
do with communication latencies outside the scope of the
regional datacenters of the hyperscalers. Nevertheless,
simulations are often based on message passing or data
analysis, and the current cloud approaches or MSaaS could
be the wrong choice. A recent NASA study showed
that ‘‘in all cases, the full cost of running on NASA
on-premises resources was less than the lowest-possible
compute-only cost of running on AWS.’’ This NASA study
showed that cloud-based simulations tend to be between
2 and 12 times more expensive than simulations
operated on on-premises simulation facilities.
This sounds disillusioning. However, the study did not
take into account that cloud-based simulations should
follow a different kind of architectural style to leverage the
economic benefits. Classical simulations are very often
so-called deployment monoliths. This kind of architecture is
hardly scalable and does not fit the general pay-as-you-go
business model of cloud computing. Insights into the
cloud-native application domain show that cloud-native
deployments should make use of more fine-grained and
horizontally scalable units to optimize resource usage. If
this can be done (and this might be simulation specific),
then other studies by Villamizar et al. show that costs
could be reduced to 25%. So, if we take both kinds
of investigation into account, MSaaS would be a reasonable
option for simulations that are still two to four times
more expensive according to the NASA cost comparison
methodology. That is exactly what Taylor et al. are stating.
However, not all simulations will benefit equally.
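The reasoning behind the factor of four can be written out explicitly. The sketch below takes the cost ratios reported in the NASA study (2 to 12 times the on-premises cost) and the reduction to 25% reported by Villamizar et al. at face value. It assumes, as a strong simplification, that this reduction applies uniformly to any simulation once it has been redesigned in a cloud-native way.

```python
REDUCTION_FACTOR = 0.25  # costs "reduced to 25%" (Villamizar et al.)


def cloud_native_ratio(monolith_ratio: float) -> float:
    """Cloud cost relative to on-premises cost after a cloud-native
    redesign, given the ratio observed for the monolithic deployment."""
    return monolith_ratio * REDUCTION_FACTOR


if __name__ == "__main__":
    for ratio in (2, 4, 12):  # range reported by the NASA study
        adjusted = cloud_native_ratio(ratio)
        verdict = ("at or below on-premises cost" if adjusted <= 1
                   else "still more expensive")
        print(f"{ratio}x as monolith -> {adjusted:.2f}x cloud-native ({verdict})")
```

Under this assumption, a simulation that costs four times as much in the cloud when run as a monolith reaches parity with on-premises operation, while one at the upper end of the reported range remains more expensive.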
What is more, the reader should take into account that
plenty of small companies, organizations, or independent
researchers do not have access to simulation facilities
comparable to NASA’s High-End Computing Capability
(HECC) project. For people that do not have access to
such supercomputing facilities, cost and performance com-
parisons make little sense.
4.5 How to raise the cloud-native maturity of
The following setting should be considered to reach such a
•We need standardized deployment units for simula-
tion components (services) and a standardized plat-
form to operate them. Furthermore, the deployment
units should provide a better-operating density than
•Event-triggered FaaS-based simulation services can
be considered to avoid expensive always-on components.
•FaaS-based simulation services will likely change
the architecture of simulations to avoid the double
spending problem (see Figure 5). Furthermore, run-
time limitations of FaaS functions, start-up laten-
cies, and state preservation must be considered and
might limit the applicability of special kinds of
•Horizontal scalability in cloud-native applications
is mostly realized via loosely coupled (event-based)
microservice approaches. These scalability require-
ments should be considered for CNS architectures
•That raises the need for simulation service meshes
that connect, secure, control, and observe simula-
tion services to enable loose coupling of simulation
•FaaS-based simulation service composing problems
might raise the need for specific domain specific
languages (DSLs) to compose different kinds of
(stateful, stateless, serverless) simulation services
that are frictionless.
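The double spending problem mentioned in the list above can be illustrated with a small billing model. The sketch below is a simplification under stated assumptions: functions are billed per second of wall-clock runtime, a synchronous caller stays active (and billed) while it waits for its callees, and in the event-triggered variant each function terminates after emitting an event for its successors. The class `Fn` and the helper functions are hypothetical and are not taken from any FaaS platform.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Fn:
    """A hypothetical FaaS function inside a simulation workflow."""
    name: str
    own_seconds: float                      # pure compute time of this function
    callees: List["Fn"] = field(default_factory=list)


def wall_seconds(fn: Fn) -> float:
    """Wall-clock runtime when fn blocks until all of its callees return."""
    return fn.own_seconds + sum(wall_seconds(c) for c in fn.callees)


def billed_synchronous(fn: Fn) -> float:
    """Billed seconds for nested synchronous calls.

    The caller is billed for its whole wall-clock time (including waiting),
    and every callee is billed again for its own runtime.
    """
    return wall_seconds(fn) + sum(billed_synchronous(c) for c in fn.callees)


def billed_event_triggered(fn: Fn) -> float:
    """Billed seconds when each function only emits an event and terminates."""
    return fn.own_seconds + sum(billed_event_triggered(c) for c in fn.callees)


# Hypothetical three-stage chain: step -> update -> integrate
integrate = Fn("integrate", 3.0)
update = Fn("update", 2.0, [integrate])
step = Fn("step", 1.0, [update])

if __name__ == "__main__":
    print("synchronous:", billed_synchronous(step))
    print("event-triggered:", billed_event_triggered(step))
```

In this model the waiting time of every caller is paid in addition to the runtime of its callees, which is why an event-triggered composition that lets functions terminate early is cheaper for the same amount of computation.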
For operational convenience, the operation of cloud-
based simulation platforms and the operation of simula-
tions on top of these platforms should be handled as two
independent but complementary engineering problems.
Corresponding and simulation-specific engineering
trends are listed in Table 4 for the convenience of the
5. Related work
As far as the authors know, no survey has focused intentionally
on observable trends in cloud computing over the
last decade from a “big picture” and evolutionary point of
view. This paper grouped that evolution into the following
points of view:
• resource utilization optimization approaches, such
as the containerization and FaaS approaches;
• the architectural evolution of cloud applications via
microservices and serverless architectures.
For all four of these specific aspects (containerization,
FaaS, microservices, serverless architectures) there exist
surveys that should be considered by the reader. The
studies and surveys deal mainly with containerization
and its accompanying resource efficiency. Although
FaaS is quite young and is not reflected extensively in
research so far, first survey papers exist that deal
with FaaS approaches and derive some open research
questions regarding tool support, performance, patterns for
serverless solutions, enterprise suitability, and whether serverless
architectures will extend beyond traditional cloud
platforms and architectures.
Service composition provides value-adding and higher-order
services by composing basic services that can
even be provided pervasively by various organizations.
What is more, service computing is quite established, and
there are several surveys on SOA-related aspects.
However, more recent studies focus mainly on microservices.
Dragoni et al., Jamshidi et al., and Cerny et al.
focus on the architectural point of view and the relationship
between SOA and microservices. All of these papers
help to understand the current microservice “hype”
better, and studying them is highly recommended.
However, these papers are bound to microservices and do
not take the “big picture” of general cloud applications
and simulation architecture evolution into account. Very
often, serverless architectures are subsumed as a part of
microservices to some degree. The authors are not convinced
by this view, because serverless architectures may introduce
fundamentally new aspects into cloud simulation architectures
that arise from the “scale-to-zero” capability on the one
hand and the unsolved function composition aspects (such
as the double spending problem) on the other hand.
M&S products are highly valuable to NATO and the
military, and it is essential that M&S products, data, and
processes are conveniently accessible to a large number of
users as often as possible. Therefore, a new M&S ecosys-
tem is required where M&S products can be accessed
simultaneously and spontaneously by a large number of
users for their individual purposes. This “as-a-service”
paradigm has to support stand-alone use as well as the
integration of multiple simulated and real systems into a
unified simulation environment whenever the need arises.
Several approaches head in this direction. The
“Allied Framework for MSaaS” is the common approach
of NATO and the nations toward implementing MSaaS
and is defined by the following documents.
• Operational Concept Document (OCD): the OCD
describes the intended use, key capabilities, and
desired effects of the Allied Framework for MSaaS
from a user’s perspective.
• Technical Reference Architecture: the Technical
Reference Architecture describes the architectural
building blocks and patterns for realizing MSaaS
• Governance Policies: the Governance Policies identify
MSaaS stakeholders and relationships and provide
guidance for implementing and maintaining
the Allied Framework for MSaaS.
The MSaaS Technical Reference Architecture is
important in the context of this paper. It provides technical
guidelines, recommended standards, architecture building
blocks, and architecture patterns that should be considered
in realizing MSaaS capabilities. Compared to the CNS
maturity model, the MSaaS Technical Reference Model
provides guidance on a higher level (building blocks, patterns, and
more), but does not explicitly define how to implement
those building blocks (e.g., as microservice or FaaS).
Because the inherent cost structure of cloud computing
stays the same for simulation services, we forecast that
CNS architectures follow similar trends to cloud-native
applications. These trends should make CNS services
more microservice-like, or even nanoservice-like
(similar to Functions as a Service). To leverage the opportunities
of cloud computing, CNS architectures should be
composed of smaller, more fine-grained, and domain-specific
simulation services that just “simulate one thing
well.” Simulation services should strive to become stateless
or isolate state in a minimum of stateful components.
As simulations inherently rely on
state (data), this focus on state might raise
problems that are less common in “normal” cloud-native
application design and need new solutions to be
developed. “Classical” cloud-native applications come
with 24×7 requirements, which
are very often not necessary for simulations. This
might provide short-cut opportunities in a cloud maturity
model for simulation services. A more detailed analysis of
how trends and approaches from cloud-native applications
might be applied to CNSs, and which new challenges
might arise, would be valuable future research.
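The recommendation to keep simulation services stateless and to isolate state in a few stateful components can be sketched as follows. This is a minimal illustration and not an implementation from this paper: the one-dimensional point-mass model, the `StateStore` class, and the `handle_step_event` function are hypothetical, and an in-memory dictionary stands in for an external state service.

```python
from typing import Dict, Tuple

# (position, velocity) of a hypothetical one-dimensional point mass
State = Tuple[float, float]


def step(state: State, dt: float, accel: float = -9.81) -> State:
    """Stateless simulation step (semi-implicit Euler).

    The result depends only on the arguments, so any number of
    identical instances can serve step requests in parallel.
    """
    position, velocity = state
    new_velocity = velocity + accel * dt
    new_position = position + new_velocity * dt
    return (new_position, new_velocity)


class StateStore:
    """Single stateful component; stands in for an external state service."""

    def __init__(self) -> None:
        self._states: Dict[str, State] = {}

    def load(self, run_id: str) -> State:
        return self._states[run_id]

    def save(self, run_id: str, state: State) -> None:
        self._states[run_id] = state


def handle_step_event(store: StateStore, run_id: str, dt: float,
                      accel: float = -9.81) -> State:
    """Event handler: load state, compute one step, persist the result."""
    new_state = step(store.load(run_id), dt, accel)
    store.save(run_id, new_state)
    return new_state
```

Because `step` keeps nothing between invocations, it could be deployed as a function that scales to zero between simulation runs, while only the store has to preserve state.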
A recent study by NASA showed that cloud-based
simulations tend to be between two and even 12 times
more expensive than simulations operated on on-premises
simulation facilities. However, this study did not take into
account that cloud-based simulations should follow a different
kind of architectural style to leverage the economic
benefits of cloud infrastructures. If this cloud-native rearchitecting
of simulations can be done (and this might be
simulation specific), then other studies show that costs
could be reduced to 25%. So, if we take both kinds
of investigation into account, MSaaS would be a reasonable
option for maybe not all but a significant portion of
What is more, plenty of small companies, organizations,
or independent researchers do not have access to simulation
facilities comparable to NASA’s HECC project.
For all those who do not have access to supercomputing
facilities, such cost and performance comparisons make little
sense. Cloud computing might be the only viable option
The authors disclosed receipt of the following financial
support for the research, authorship, and/or publication of
this article: The CloudTRANSIT project of Nane Kratzke
has been funded by the German Ministry of Education and
Research (13FH021PX4). The NMSG activities of Robert
Siegfried have been funded by the German Federal Office
of Bundeswehr Equipment, Information Technology and
In-Service Support (BAAINBw).
Nane Kratzke https://orcid.org/0000-0001-5130-4969
1. NATO STO. Modelling and Simulation as a Service
(MSaaS) - Rapid Deployment of Interoperable and Credible
Simulation Environments. Technical report, AC/323(MSG-
136)TP/826, NATO Science and Technology Organization
2. Siegfried R, McGroarty C, Lloyd J, et al. A new reality:
Modelling & Simulation as a Service. CSIAC J Cyber Secur
Inform Syst 2018; 6, https://www.csiac.org/journal-article/a-
3. Weinman J. Mathematical proof of the inevitability of cloud
Weinman_Inevitability_Of_Cloud.pdf (2011, accessed 10
4. Mell PM and Grance T. The NIST definition of cloud com-
puting. Technical report, National Institute of Standards &
Technology, Gaithersburg, MD, USA, 2011.
5. Kratzke N and Quint PC. Technical report of project
CloudTRANSIT - Transfer cloud-native applications at runtime.
Technical report, Lübeck University of Applied
Sciences, 2018, https://doi.org/10.2314/KXP:1678556971
6. OASIS. Advanced Message Queuing Protocol (AMQP),
Version 1.0 http://www.amqp.org/sites/amqp.org/files/amqp.
pdf (2011, accessed 16 December 2019).
7. Kratzke N. Lightweight virtualization cluster – how to over-
come cloud vendor lock-in. J Comput Commun 2014; 2:
8. Kratzke N, Quint PC, Palme D, et al. Project CloudTRANSIT
- or to simplify cloud-native application provisioning
for SMEs by integrating already available container
technologies. In: Kantere V and Koch B (eds) European
project space on smart systems, big data, future internet -
towards serving the grand societal challenges, 2016, pp.3–
Setúbal, Portugal: SCITEPRESS.
9. Hogan M, Fang L, Sokol A, et al. Cloud Infrastructure
Management Interface (CIMI) model and RESTful HTTP-
based protocol, https://www.iso.org/standard/66296.html
(2015, accessed 16 December 2019).
10. Nyren R, Edmonds A, Papaspyrou A, et al. Open Cloud
Computing Interface (OCCI) - core, version 1.1, https://
www.ogf.org/documents/GFD.183.pdf (2011, accessed 16
11. Metsch T and Edmonds A. Open Cloud Computing Interface
(OCCI) - infrastructure, version 1.1 https://www.ogf.org/doc
uments/GFD.184.pdf (2011, accessed 16 December 2019).
12. SNIA. Cloud Data Management Interface (CDMI), version
v1.1.1.pdf (2015, accessed 16 December 2019).
13. System Virtualization, Partitioning, and Clustering Working
Group. Open Virtualization Format specification, version
uments/DSP0243_2.1.1.pdf (2015, accessed 16 December
14. OCI. Open Container Initiative, https://www.opencontainers.
org (2015, accessed 4 February 2016).
15. OASIS. Topology and Orchestration Specification for Cloud
Applications (TOSCA), version 1.0, http://docs.oasis-open.org/
tosca/TOSCA/v1.0/os/TOSCA-v1.0-os.pdf (2013, accessed 16
16. Opara-Martins J, Sahandi R and Tian F. Critical review of
vendor lock-in and its impact on adoption of cloud comput-
ing. In: International conference on information society
(i-Society 2014), London, 10–12 November 2014, pp.92–97.
17. Kratzke N and Peinl R. ClouNS - a cloud-native application
reference model for enterprise architects. In: 2016 IEEE
20th international enterprise distributed object computing
Workshop (EDOCW), Vienna, 5–9 September 2016, pp.1–
18. Bohn RB, Messina J, Liu F, et al. NIST cloud computing
reference architecture. In: world congress on services
(SERVICES 2011), Washington, DC, 4–9 July 2011,
pp.594–596. Washington, DC: IEEE Computer Society.
19. Quint PC and Kratzke N. Overcome vendor lock-in by inte-
grating already available container technologies - towards
transferability in cloud computing for SMEs. In: proceedings
of 7th international conference on cloud computing, grids
and virtualization (CLOUD COMPUTING 2016) (eds CB
Westphall, YW Lee and S Rass), pp.38–41.
20. Ardagna D, Casale G, Ciavotta M, et al. Quality-of-service
in cloud computing: modeling techniques and their applica-
tions. J Internet Serv Appl 2014; 5: 11.
21. White G, Nallur V and Clarke S. Quality of service
approaches in IoT: a systematic mapping. J Syst Softw 2017;
22. Villamizar M, Garcés O, Ochoa L, et al. Cost comparison of
running web applications in the cloud using monolithic,
microservice, and AWS Lambda architectures. Service
Orient Comput Appl. Epub ahead of print 27 April 2017.
23. Pahl C, Brogi A, Soldani J, et al. Cloud container technolo-
gies: a state-of-the-art review. IEEE Trans Cloud Comput
2017; 7: 1–1.
24. Dragoni N, Giallorenzo S, Lafuente AL, et al. Microservices:
yesterday, today, and tomorrow. In: Mazzara M and Meyer
B (eds) Present and ulterior software engineering. Cham:
25. Verma A, Pedrosa L, Korupolu M, et al. Large-scale cluster
management at Google with Borg. In: Proceedings of the
tenth European conference on computer systems (EuroSys
’15), Bordeaux, France, 21–24 April 2015, pp.1–17. New
26. Hindman B, Konwinski A, Zaharia M, et al. Mesos: a plat-
form for fine-grained resource sharing in the data center. In:
proceedings of the 8th USENIX conference on networked
systems design and implementation (NSDI’11), Boston, MA,
30 March–1 April 2011, pp.295–308. Berkeley, CA:
27. Baldini I, Cheng P, Fink SJ, et al. The serverless
trilemma: function composition for serverless computing. In:
proceedings of the 2017 ACM SIGPLAN international sympo-
sium on new ideas, new paradigms, and reflections on pro-
gramming and software - onward!, Vancouver, BC, Canada,
25–27 October 2017, pp.89–103. New York: ACM.
28. Jamshidi P, Pahl C, Mendonça NC, et al. Microservices: the
journey so far and challenges ahead. IEEE Softw 2018; 35:
29. Taibi D, Lenarduzzi V and Pahl C. Architectural patterns for
microservices: a systematic mapping study. In: 8th interna-
tional conference on cloud computing and services science
(CLOSER‘18), Funchal, Madeira, Portugal, 19–21 March
2018, pp.221–232. SCITEPRESS.
30. Kratzke N and Quint PC. Understanding cloud-native appli-
cations after 10 years of cloud computing - a systematic map-
ping study. J Syst Softw 2017; 126: 1–16.
31. Balalaie A, Heydarnoori A and Jamshidi P. Microservices
architecture enables DevOps: migration to a cloud-native
architecture. IEEE Softw. Epub ahead of print 18 March
2016. DOI: 10.1109/MS.2016.64. arXiv: 1606.04036.
32. Roberts M. Serverless architectures, https://martinfowler.
com/articles/serverless.html (2016, accessed 18 December
33. Baldini I, Castro P, Chang K, et al. Serverless computing:
current trends and open problems. In: Research advances in
cloud computing. Singapore: Springer Singapore, 2017,
34. Baldini I, Castro P, Cheng P, et al. Cloud-native, event-based
programming for mobile applications. In: Proceedings of the
international conference on mobile software engineering and
systems, Austin, TX, 16–17 May 2016, pp.287–288. ACM.
35. Taylor SJ, Kiss T, Anagnostou A, et al. The CloudSME
simulation platform and its applications: a generic multi-
cloud platform for developing and executing commercial
cloud-based simulations. Fut Generat Comput Syst 2018; 88:
36. Hanai M, Suzumura T, Ventresque A, et al. An adaptive
VM provisioning method for large-scale agent-based traffic
simulations on the cloud. In: 2014 IEEE 6th international
conference on cloud computing technology and science,
Singapore, 15–18 December 2014, pp.130–137. IEEE.
37. Zehe D, Knoll A, Cai W, et al. SEMSim Cloud Service:
large-scale urban systems simulation in the cloud. Simulat
Model Pract Theor 2015; 58: 157–171.
38. Carillo M, Cordasco G, Serrapica F, et al. D-Mason on the
cloud: an experience with Amazon Web Services. In:
Desprez F, Dutot P-F, Kaklamanis C, et al. (eds) Euro-Par
2016: parallel processing workshops, Lecture Notes in
Computer Science, pp.322–333. Cham: Springer
39. Anderson K, Du J, Narayan A, et al. Gridspice: A distributed
simulation platform for the smart grid. IEEE Trans Ind
Informat 2014; 10: 2354–2363.
40. Guzzetti S, Passerini T, Slawinski J, et al. Platform and algorithm
effects on computational fluid dynamics applications
in life sciences. Fut Generat Comput Syst 2017; 67: 382–
41. Ledyayev R and Richter H. High performance computing in
a cloud using openstack. Cloud Computing 2014; 5: 108–
42. Fehling C, Leymann F, Retter R, et al. Cloud computing pat-
terns. Wien, Austria: Springer, 2014.
43. Stine M. Migrating to cloud-native application architectures.
Sebastopol, CA: O’Reilly, 2015.
44. Newman S. Building microservices. Sebastopol, CA:
O’Reilly Media, Incorporated, 2015.
45. Namiot D and Sneps-Sneppe M. On micro-services architec-
ture. Int J Open Inform Technol 2014; 2: 24–27.
46. Wiggins A. The twelve-factor app, http://12factor.net/ (2014,
accessed 14 February 2016).
47. Fowler M. Circuit breaker, http://martinfowler.com/
bliki/CircuitBreaker.html (2014, accessed 27 May 2016).
48. Erl T, Cope R and Naserpour A. Cloud computing design
patterns. Westford, MA: Prentice Hall, 2015.
49. Chang S, Hood R, Jin H, et al. Evaluating the suitability of
commercial clouds for NASA’s high performance comput-
ing applications: a trade study. Technical report, NASA,
Report_NAS-2018-01.pdf (2018, accessed 18 December
50. Kaur T and Chana I. Energy efficiency techniques in cloud
computing: a survey and taxonomy. ACM Comput Surv
2015; 48: 22:1–22:46.
51. Tosatto A, Ruiu P and Attanasio A. Container-based orches-
tration in cloud: state of the art and challenges. In: 2015 ninth
international conference on complex, intelligent, and soft-
ware intensive systems, Blumenau, Brazil, 8–10 July 2015,
52. Peinl R, Holzschuher F and Pfitzer F. Docker cluster man-
agement for the cloud - survey results and own solution. J
Grid Comput 2016; 14: 265–282.
53. Spillner J. Practical tooling for serverless computing. In:
proceedings of the 10th international conference on utility
and cloud computing (UCC ’17), Austin, TX, 5–8 December
2017, pp.185–186. New York: ACM.
54. Lynn T, Rosati P, Lejeune A, et al. A preliminary review of
enterprise serverless cloud computing (Function-as-a-
Service) platforms. In: 2017 IEEE international conference
on cloud computing technology and science (CloudCom),
Hong Kong, China, 11–14 December 2017, pp.162–169.
55. van Eyk E, Toader L, Talluri S, et al. Serverless is more:
from PaaS to present cloud computing. IEEE Internet
Comput 2018; 22: 8–17.
56. van Eyk E, Iosup A, Abad CL, et al. A SPEC RG Cloud
Group’s vision on the performance challenges of FaaS cloud
architectures. In: Proceedings of the 8th ACM/SPEC on
international conference on performance engineering (ICPE
2018), Berlin, Germany, 9–13 April 2018, pp.21–24. ACM.
57. Ylianttila M, Riekki J, Zhou J, et al. Cloud architecture for
dynamic service composition. Int J Grid High Perform
Comput 2012; 4: 17–31.
58. Zhou J, Riekki J and Sun J. Pervasive service computing
toward accommodating service coordination and collabora-
tion. In: 2009 4th international conference on frontier of
computer science and technology, Shanghai, China, 17–19
December 2009, pp.686–691. IEEE.
59. Huhns MN and Singh MP. Service-oriented computing:
key concepts and principles. IEEE Internet Comput 2005; 9:
60. Dustdar S and Schreiner W. A survey on web services com-
position. Int J Web Grid Serv 2005; 1: 1–30.
61. Papazoglou MP, Traverso P, Dustdar S, et al. Service-
oriented computing: state of the art and research challenges.
Computer 2007; 40: 38–45.
62. Papazoglou MP and van den Heuvel WJ. Service oriented
architectures: approaches, technologies and research issues.
VLDB J 2007; 16: 389–415.
63. Razavian M and Lago P. A survey of SOA migration in
industry. In: Kappel G, Maamar Z and Motahari-Nezhad HR
(eds) Service-oriented computing. Berlin, Heidelberg:
Springer Berlin Heidelberg, 2011, pp.618–626.
64. Cerny T, Donahoo MJ and Pechanec J. Disambiguation and
comparison of SOA, microservices and self-contained sys-
tems. In: Proceedings of the international conference on
research in adaptive and convergent systems (RACS ’17),
Krakow, Poland, 20–23 September 2017, pp.228–235. New
65. Mittal S, Risco-Martín JL and Zeigler BP. DEVS/SOA: a
cross-platform framework for net-centric modeling and
simulation in DEVS unified process. Simulation 2009; 85:
66. Al-Zoubi K and Wainer G. Performing distributed simula-
tion with RESTful web-services. In: proceedings of the 2009
winter simulation conference (WSC), Austin, TX, 13–16
December 2009, pp.1323–1334. IEEE.
67. Mittal S and Risco-Martín JL. DEVSML 3.0 Stack: rapid
deployment of DEVS farm in distributed cloud environment
using microservices and containers. In: Proceedings of the
symposium on theory of modeling & simulation (TMS/DEVS
’17), Virginia Beach, Virginia, 23–26 April 2017.
68. NATO STO. Modelling and Simulation as a Service
(MSaaS) - volume 1: MSaaS technical reference architec-
ture. Technical report, NATO Science and Technology
Organization (STO), 2018.
Nane Kratzke is a professor of Computer Science at
the Lübeck University of Applied Sciences and a former
Navy Officer (German Navy). He consulted for the
German Ministry of Defence on questions regarding
network-centric warfare. His particular research focus is
directed at cloud-native applications and cloud-native ser-
vice-related software engineering methodologies and cor-
responding application architectural styles, such as
microservices or serverless architectures. In addition, he is
interested in data science, distributed systems, and web-
scale elastic systems.
Robert Siegfried is a senior consultant for IT and M&S
projects and CEO of aditerna GmbH and Aditerna Inc.
Prior to his industry engagement, he was a research associate
at the University of the Federal Armed Forces in
Munich, Germany. His primary research areas are agent-
based M&S and parallel and distributed simulation.
Within several projects for the German Armed Forces and
US Department of Defense (DoD), he has worked (and is
still working) on topics such as MSaaS, artificial intelli-
gence (AI)-supported data fusion, metadata specifications,
model management systems, distributed simulation test
beds, and process models. Since October 2018, he has
served as Vice-Chair of the NMSG. He is actively
involved in multiple working groups of the Simulation
Interoperability Standards Organization (SISO) and serves
as a member of the SISO Executive Committee.