Preliminary Technical Report
Released: February 28, 2018
Project CloudTRANSIT
Transfer Cloud-native Applications at Runtime
Nane Kratzke, Peter-Christian Quint
Abstract
The project CloudTRANSIT dealt with the question of how to transfer cloud applications and services at runtime without downtime across cloud infrastructures from different public and private cloud service providers. This technical report summarizes the outcomes of more than 20 research papers and reports that have been published throughout the course of the project. The intent of this report is to provide an integrated birds-eye view on these – so far – isolated papers. The report references the original papers wherever possible. This project also systematically investigated practitioner-initiated cloud application engineering trends of the last three years that provide several promising technical opportunities to avoid cloud vendor lock-in pragmatically. Especially European cloud service providers should track such research because of the technical opportunities to bring cloud application workloads back home to Europe that are currently often deployed on and inherently bound to U.S. providers. Intensified EU General Data Protection Regulation (GDPR) policies, European Cloud Initiatives, or "America First" policies might even make this imperative. Technical solutions are needed for these scenarios that are manageable not only by large but also by small and medium sized enterprises. Therefore, this project systematically analyzed commonalities of cloud infrastructures and of cloud applications. Latest evolutions of cloud standards and cloud engineering trends (like containerization) were used to derive a cloud-native reference model (ClouNS) that guided the development of a pragmatic cloud-transferability solution. This solution intentionally separated the infrastructure-agnostic operation of elastic container platforms (like Swarm, Kubernetes, Mesos/Marathon, etc.) via a multi-cloud scaler and the platform-agnostic definition of cloud-native applications and services via a unified cloud application modeling language. Both components are independent but complementary. Because of their independence they can even contribute (although not intended) to other fields like moving-target-based cloud security. The report summarizes the main outcomes and insights of a proof-of-concept solution to realize transferability for cloud applications and services at runtime without downtime.
Keywords
Cloud computing – cloud-native application – runtime transferability – elastic container platform – European cloud strategy
Lübeck University of Applied Sciences, Germany
Center of Excellence for Communication, Systems and Applications (CoSA)
Mönkhofer Weg 239, 23562 Lübeck, Germany
Corresponding author: nane.kratzke@fh-luebeck.de
Project duration: 1st Nov. 2014 - 31st Mar. 2018
Funding: German Ministry of Education and Research, 13FH021PX4
Contents
1 Introduction 2
2 Research Objectives and Outcomes 3
3 Methodology 3
3.1 Systematic mapping study 3
3.2 Review of cloud standards and action research 5
4 Reference Model 6
4.1 Description of a reference cloud-native stack 6
4.2 Performance Considerations 7
Machine similarity of different cloud providers • Performance impact of overlay networks
4.3 Summary of reference model considerations 10
5 Multi-Cloud Scaler 11
5.1 Solution Description 11
Defining intended states of container platforms • A control loop to reach an intended state • Handling infrastructure and platform specifics
5.2 Evaluation 13
5.3 Results, Discussion, and Limitations 15
6 Unified Cloud Application Modeling Language 17
6.1 Solution Description 17
6.2 Evaluation 19
6.3 Results, Discussion, and Limitations 20
7 Cloud Security Opportunities 21
7.1 Breaking the cyber attack life cycle 22
7.2 Regenerating nodes permanently 22
7.3 Expectable regeneration intervals 23
7.4 Critical discussion and limitations 23
8 Lessons learned 23
8.1 Transfer the platform, not the application 24
8.2 Make the platform a user decision as well 24
8.3 Rate pragmatism over expressiveness 24
9 Conclusion 24
Acknowledgments 25
References 25
App A: Cloud services and cloud standards 29
App B: Cluster definition formats 30
App C: UCAML defined cloud applications 32
App D: Mapping of outcomes to work packages 36
1. Introduction
Even for very small companies it is possible to generate enormous economic growth and business value by providing cloud services or applications. For example, Instagram successfully proved that it was able to generate exponential business growth and value within very few years. At the time of being bought by Facebook for 1 billion USD, Instagram had a headcount of only approximately 20 persons, was only two years old, but already operated a worldwide accessible and scalable social network for image sharing (hosted exclusively by Amazon Web Services, AWS) without owning any data center or any noteworthy IT assets. On the other hand, if AWS had had to shut down its services at that time due to an insolvency or something alike, it is likely that Instagram would have been swept away by such an event as well. The reader should consider that it took years for Instagram's cloud engineers to transfer their services from the AWS cloud infrastructure completely into Facebook's data centers. And this transfer was accompanied by several noteworthy service outages.
According to the current state of software technology, once a cloud application or a cloud service is deployed to a specific cloud infrastructure, it is often inherently bound to that infrastructure due to non-obvious technological bindings [1]. And once it is up, it normally has always-on requirements. So, the obvious question arises how to transfer such applications and services between different cloud infrastructures at runtime without noteworthy downtime. This problem is not solved satisfactorily so far. Given the fact that the most powerful and convenient cloud infrastructures are provided by U.S. providers, this is an economic advantage for the U.S. cloud industry. This effect is known as vendor lock-in, and standardization approaches like CIMI, OCCI, CDMI, OVF, OCI, TOSCA, etc., try to minimize these effects. But such standardization processes are much slower than the dynamic technical progress in the cloud computing domain that could be observed in the last years. Furthermore, cloud service providers have no economic interest in decreasing this effect¹. So, although the amount of cloud standards increases in absolute numbers, we can in fact observe a relative decrease of cloud computing standardization (see Figure 3).

¹ Werner Vogels, CTO of the biggest cloud service provider AWS, is known for the bon mot "Cloud standardization must be considered harmful."
In consequence – and although cloud computing is a multi-billion EUR business with a lot of potential for the European ICT industry – European cloud providers seem to fall behind the pace of U.S. providers like Amazon Web Services (AWS), Google, Microsoft, IBM, and more. According to Forbes and their considerations on "Cloud Wars", only one of the Big Five public cloud computing providers resides in the European Union – SAP. However, even SAP is forced to establish partnerships with Amazon, Google, IBM, and Microsoft (the driving players of "Cloud Wars"). Being aware of that, astonishingly few research projects² focus on a technological point of view of how to overcome such technical vendor lock-in and transferability-at-runtime problems in cloud computing. While migration aspects are considered by several research projects, transferability concepts often fall behind.

² According to a performed systematic mapping study [2], almost no European research project takes the latest development trends in cloud application design systematically into account. Especially the standardization of deployment units of such cloud applications via container technologies seems not to be considered by a lot of research projects.
This report focuses on transferability and the corresponding problem of moving cloud applications and services at runtime without downtime from one cloud infrastructure to another. Especially European cloud service providers should track such research because of the technical opportunities to bring cloud application workloads back home to Europe. The majority of existing cloud applications and services is mainly deployed on and inherently bound to U.S. providers.
The remainder of this report is outlined as follows. We will present the main research objectives of this project to create a pragmatic transferability concept for cloud applications in Section 2. The general outline of our research methodology is explained in Section 3 and involved systematic mapping studies, reviews of cloud standards, application reference modeling, software engineering, and domain-specific language engineering techniques. The resulting reference model that guided our technical solution proposals is presented in Section 4. This additionally comprises upfront performance considerations and benchmark results for our proposal. Section 5 explains an extendable transferability and scalability solution for elastic container platforms that provides transferability features for every application that can be deployed on top of such elastic platforms. We could successfully demonstrate that it is possible to transfer cloud application workloads across different public and private cloud service infrastructures at runtime and without downtime. There are several promising or even established elastic container platforms (see Table 1). Although there are a lot of indicators that Kubernetes might become the dominant solution³, our approach considers that the choice of a container platform should conceptually be a user decision. Therefore, it is necessary to define cloud applications in a unified way that can be deployed to any of the mentioned platforms. This unified cloud application modeling language (UCAML) – which is aligned to our overall transferability concept – is explained in Section 6. Although not intended by research design, options might even arise for other research fields in the cloud computing domain. We explicitly see opportunities for asymmetric cloud security and additionally present the core of a moving target security concept in Section 7. Section 8 presents some lessons learned throughout the course of the
project. Finally, we conclude our research outcomes in Section 9.

³ Comparable to Linux as the dominating operating system for servers, Windows for business desktop computers, and the Java Virtual Machine for platform-independent software engineering.
2. Research Objectives and Outcomes
The intent of the project CloudTRANSIT was to investigate technical solutions to overcome existing vendor lock-in problems in cloud computing for cloud-hosted applications and services. This technical report will investigate some – mainly practitioner-initiated – cloud application engineering trends of the last three years that provide several promising opportunities for cloud services. The research objectives of the project were:
• Public and private cloud infrastructures should be analyzed to determine commonalities that can be relied on for a vendor-lock-in-free operation of cloud applications.
• The project should investigate technical solutions (considering expectable performance impacts) to realize a promising runtime transferability concept for cloud applications.
• Requirements had to be derived for a unified cloud application description language that has a clear focus on cloud services of typical complexity (distributed, load balanced, auto scaling) with a built-in transferability concept.
Furthermore, we provide some information about interrelated research prototypes (see Appendix D: Table 18) that have been developed throughout the course of the project. These prototypes demonstrate a proof-of-concept of our proposals. These research prototypes enable building transferable cloud applications that are not prone to vendor lock-in and can be transferred between arbitrary public, private, or hybrid cloud infrastructures.
3. Methodology
The research objectives mentioned in Section 2 guided our research methodology that is shown in Figure 1. Initially we performed a systematic mapping study on cloud-native applications to get a better understanding of what the specifics and problems of cloud-native applications (CNA) exactly are. Basic insights are summarized in Section 3.1. Because cloud-native applications are very vulnerable to vendor lock-in, we additionally performed a review of cloud standardization approaches. Section 3.2 summarizes how vendor lock-in emerges in cloud computing. Both reviewing steps have been accompanied by action research in concrete projects or by having cloud-native application engineering approaches of practitioners under research surveillance.
Based on these insights we derived a reference model for cloud-native applications (ClouNS, cloud-native application stack) that is presented in Figure 4 (see Section 4) and that mainly guided our development of a transferability concept for cloud-native applications. We evaluated this reference model using a concrete project from our action research activities. Based on the reference model we derived two independent but complementary engineering problems. The first insight was not to make the application itself transferable but the platform the application is operated on. Therefore, we had to find a solution for how to operate elastic (container) platforms in an infrastructure-agnostic, secure, transferable, and elastic way. This multi-cloud scaler is described in Section 5. Because not only the infrastructure provider but also the elastic platform decision is a user decision, we additionally had to find a solution to describe cloud applications in a unified format. This format can be transformed into platform-specific definition formats like Swarm compose files, Kubernetes manifest files, and more. This unified cloud application modeling language UCAML is explained in Section 6. Both approaches mutually influenced each other and have been evaluated in parallel.
Figure 1. Research methodology in retrospect of the project
But let us start with a brief summary of our initial insights derived from our systematic mapping study on cloud-native applications (see Section 3.1), our review of cloud standards, and what can be learned from practitioner surveillance (see Section 3.2).
3.1 Systematic mapping study
This paragraph will summarize the main aspects of our systematic mapping study on cloud-native applications. We refer to the original study [2] for all details.
Table 1. Some popular open source elastic container platforms (ECPs)
Platform Contributors URL
Kubernetes Cloud Native Foundation https://kubernetes.io (initiated by Google)
Swarm Docker https://docker.io
Mesos Apache https://mesos.apache.org/
Nomad Hashicorp https://nomadproject.io/
It is common sense that cloud-native applications (CNA) are intentionally designed for the cloud. Although this understanding can be broadly used, it does not guide or explain what a cloud-native application exactly is. That is why we performed a systematic mapping study according to general guidelines for systematic mapping studies in software engineering [3, 4] and for systematic literature reviews [5]. Additionally, we took updates and critical reflections [6] on these guidelines into consideration. We searched for contributions published in or after 2006 having the term "cloud-native" in their title, abstract or keywords. The cloud-native aspect should be so intentionally taken into consideration by the authors that the term "cloud-native" is explicitly mentioned in the title, the abstract or as a keyword.
Table 2. Systematic mapping study results
Electronic Source Papers Oldest Youngest Latest access
IEEExplore 11 2010 2016 15.09.2016
ACM Digital Library 8 2014 2016 15.09.2016
Google Scholar 23 2015 2016 15.09.2016
Citeseer 10 2010 2014 15.09.2016
ScienceDirect 8 2010 2016 15.09.2016
SpringerLink 41 2006 2016 15.09.2016
Semantic Scholar 63 2011 2016 15.09.2016
Figure 2. Mapping of research topics to research approaches
The number of considered studies is summarized in Table 2. All selected papers dealt with 21 detailed research topics of cloud-native applications. We decided to group these detailed research topics into the following major CNA research topics.
• CNA principles describe recurring principles for how CNA properties are achieved and how transferability of a CNA can be realized. According to the selected papers, CNAs should be operated on automation platforms. Softwarization of infrastructures should be strived for to support DevOps principles more consequently. Operation of CNAs in multi- and hybrid clouds should be supported by applying migration and interoperability principles and lessons learned.
• CNA architectures deal with general CNA design aspects like service-oriented architecture approaches (in particular the microservice architecture approach) as well as accompanying service composition principles of self-contained deployment units (containers).
• CNA methods include patterns and design methodologies to create effective CNA architectures and DevOps to automate the process of delivery and infrastructure changes.
• CNA properties describe characteristics of CNAs. Such characteristics deal with consistency models, availability, partition tolerance (strongly related to Brewer's CAP theorem [7]), elasticity, resilience and service levels (SLA). This property combination seems to be very characteristic of CNAs.
As Figure 2 indicates, the cloud-native research field currently seems to be dominated by solution proposals for principles and architectures for cloud applications and seems to be inspired very often by (failed) project experiences. But systematic evaluation and validation research falls behind. So, we are talking about a not yet well-consolidated software engineering domain with isolated engineering trends (see Table 3) that made it hard to derive well-founded engineering decisions.
Our literature review did not turn up a common definition that explains what a cloud-native application exactly is. Nevertheless, our survey identified enough studies that can be used to derive a definition proposal for the term CNA. There is a common but unconscious understanding across several relevant studies. Fehling et al. propose that a cloud-native application should be IDEAL, so it should have an [i]solated state, be [d]istributed in its nature, be [e]lastic in a horizontal scaling way, be operated via an [a]utomated management system, and have [l]oosely coupled components [8]. According to Stine [9] there are common motivations for cloud-native application architectures, like delivering software-based solutions more quickly (speed), in a more fault isolating, fault tolerating, and automatically recovering way (safety), enabling horizontal (instead of vertical) application scaling (scale), and finally handling a huge diversity of (mobile) platforms and legacy systems (client diversity). These common motivations are addressed by several application architecture and infrastructure approaches:
• Microservices represent the decomposition of monolithic (business) systems into independently deployable services that do "one thing well" [10, 11].
Table 3. Identified isolated engineering trends for CNA
Trend Rationale
Microservices Microservices can be seen as a "pragmatic" interpretation of SOA. In addition to SOA, microservice architectures intentionally focus on and compose small and independently replaceable, horizontally scalable services that are "doing one thing well".
DevOps DevOps is a practice that emphasizes the collaboration of software developers and IT operators. It aims to build, test, and release software more rapidly, frequently, and reliably using automated processes for software delivery.
Softwarization Softwarization of infrastructure and network enables automating the process of software delivery and infrastructure changes more rapidly.
Standardized Deployment Units Deployment units wrap a piece of software in a complete file system that contains everything needed to run: code, runtime, system tools, system libraries. This guarantees that the software will always run the same, regardless of its environment.
Elastic Platforms Elastic platforms like Kubernetes, Mesos, and Docker Swarm can be seen as a middleware for the execution of custom but standardized deployment units. Elastic platforms extend resource sharing and increase the utilization of underlying compute, network and storage resources.
State Isolation Stateless components are easier to scale horizontally than stateful components. Of course, stateful components cannot be avoided, but they should be reduced to a minimum and realized by intentionally horizontally scalable storage systems (often eventually consistent NoSQL databases).
Versioned REST APIs REST-based APIs provide scalable and pragmatic communication means relying mainly on already existing internet infrastructure and well defined and widespread standards.
Loose coupling Service composition is done by events or by data. Event coupling relies on messaging solutions (e.g. the AMQP standard). Data coupling often relies on scalable but (mostly) eventually consistent storage solutions (which are often subsumed as NoSQL databases).
• The main mode of interaction between services in a cloud-native application architecture is via published and versioned APIs (API-based collaboration). These APIs are often HTTP-based and follow a REST style with JSON serialization, but other protocols and serialization formats can be used as well.
• Single deployment units of the architecture are designed and interconnected according to a collection of cloud-focused patterns like the twelve-factor app collection [12], the circuit breaker pattern [13] or cloud computing patterns [8, 14].
• And finally, more and more often elastic container platforms are used to deploy and operate these microservices via self-contained deployment units (containers). These platforms provide additional operational capabilities on top of IaaS infrastructures like automated and on-demand scaling of application instances, application health management, dynamic routing, load balancing, and aggregation of logs and metrics.
These aspects let us derive the following definition that guided our understanding of cloud applications throughout the course of the project:

A cloud-native application (CNA) is a distributed, elastic and horizontally scalable system composed of (micro)services which isolates state in a minimum of stateful components. The application and each self-contained deployment unit of that application is designed according to cloud-focused design patterns and operated on a self-service elastic platform.
3.2 Review of cloud standards and action research
This paragraph will summarize the main aspects of our review of cloud standards and our surveillance of practitioners (action research). For all details we refer to our reference model study [1] but also to initial pre-project studies like [15, 16].
A commodity describes a class of demanded goods (in the case of IaaS: computing, storage and networking services) without substantial qualitative differentiation. The market treats instances of these goods as (nearly) equivalent. Or more simply: a virtual server provided by Google Compute Engine (GCE) can easily be replaced by a server provided by Amazon Web Services (AWS). Therefore, IaaS provides basically a commodity service.
Table 4. Cloud Application Maturity Model, adapted from OPEN DATA CENTER ALLIANCE Best Practices [17]
Level Maturity Criteria
3 Cloud native - Transferable across infrastructure providers at runtime and without interruption of service. - Automatically scale out/in based on stimuli.
2 Cloud resilient - State is isolated in a minimum of services. - Unaffected by dependent service failures. - Infrastructure agnostic.
1 Cloud friendly - Composed of loosely coupled services. - Services are discoverable by name. - Components are designed to cloud patterns. - Compute and storage are separated.
0 Cloud ready - Operated on virtualized infrastructure. - Instantiateable from image or script.
On the other hand, switching from one SaaS provider to another, or changing the middleware (PaaS), is in most cases not as easy as buying coffee from a different supplier [18]. The core components of these distributed systems like virtualized server instances and basic networking and storage can be deployed using commodity services. However, further services – that are needed to integrate these virtualized resources in an elastic, scalable, and pragmatic manner – are often not considered in standards. Services like load balancing, auto scaling or message queuing systems are needed to design an elastic and scalable cloud-native system on almost every cloud service infrastructure. But especially the service types that are needed for cloud applications on a higher cloud maturity level (see Table 4) are not consistently considered by current cloud standards (see App. A: Table 16 and [15]).

Figure 3. Decrease of standard coverage over the years (by example of AWS)
All public cloud service providers try to stimulate cloud customers to use these non-commodity convenience services in order to bind them to their infrastructures and higher-level service portfolios. The percentage of service categories that are considered in standards like CIMI [19], OCCI [20, 21], CDMI [22], OVF [23], OCI [24], and TOSCA [25] – as opposed to non-commodity services⁴ – is decreasing over the years. Figure 3 shows this by example of AWS. That is how vendor lock-in mainly emerges in cloud computing. For a more detailed discussion we additionally refer to [26]. Appendix A: Table 16 shows the service portfolio of the major cloud service providers AWS, Google, Azure (and OpenStack) and their relationship to current cloud standards.

⁴ That is, services that are not covered by cloud standards.
All reviewed cloud standards (but also practitioner-initiated open-source bottom-up approaches) focus on a very small but basic subset of popular cloud services: compute nodes (virtual machines), storage (file, block, object), and (virtual private) networking. Standardized deployment approaches like TOSCA are defined mainly against this commodity infrastructure level of abstraction. These kinds of services are often subsumed as IaaS and form the foundation of cloud services and therefore of cloud-native applications. All other service categories might foster vendor lock-in situations. This all might sound disillusioning. But in consequence, the basic idea of a cloud-native application stack should be to use only this small subset of well-standardized IaaS services as founding building blocks. Because existing cloud standards cover only specific cloud service categories (mainly the IaaS level) and do not provide an integrated point of view, we developed a more integrated reference model (see ClouNS column in Appendix A: Table 16) that will be presented in the following Section 4. This reference model takes best practices of practitioners for how cloud-native applications should be built into account.
4. Reference Model
This Section will summarize the main aspects of our reference modeling but also of accompanying performance considerations for cloud-native applications. We refer to our studies [1, 15, 16] for more in-depth information about our guiding reference model.
Having multiple viewpoints on the same object has been adopted successfully by several architecture frameworks. Obviously the same is true for cloud computing. We can look at cloud computing from a service model point of view (IaaS, PaaS, SaaS) or a deployment point of view (private, public, hybrid, community cloud), as is done by [27]. Or we can look from an actor point of view (provider, consumer, auditor, broker, carrier) or a functional point of view (service deployment, service orchestration, service management, security, privacy), as is done by [28]. Points of view are particularly useful to split problems into concise parts. However, the above mentioned viewpoints might be common in cloud computing and useful from a service provider point of view, but not from a cloud-native application engineering point of view. From an engineering point of view it seems more useful to have views on the technology levels involved and applied in cloud-native application engineering. This is often done by practitioner models. However, these practitioner models have only been documented in some blog posts⁵ and do not expand into any academic papers as far as the authors know.
4.1 Description of a reference cloud-native stack
Taking the insights from our systematic mapping study (see Section 3.1) and our review of cloud standards (see Section 3.2), we could compile a reference model of cloud-native applications that is far more aligned to practitioner points of view than other academic reference models. This layered reference model is shown and explained in Figure 4 and Table 5. The basic idea of this reference model is to use only a small subset of well-standardized IaaS services as founding building blocks. Four basic viewpoints form the overall shape of this model:

1. Node centric view point (aka IaaS). This is a view point familiar to engineers who are developing classical client-server architectures. This is how IaaS can be understood. IaaS deals with the deployment of isolated compute nodes for a cloud consumer. It is up to the cloud consumer what is done with these isolated nodes (even if hundreds of them are provisioned).

⁵ Jason Lavigne, "Don't let a PaaS you by - What is a PaaS and why Microsoft is excited about it", see http://bit.ly/2nWFmDS (last access 13th Feb. 2018); Johann den Haan, "Categorizing and Comparing the Cloud Landscape", see http://bit.ly/2BY7Sh2 (last access 13th Feb. 2018)
Table 5. Layers of a cloud-native stack (ClouNS reference model)
Viewpoint Layer (Sub)layer Name Main Purpose Examples Example Standards
SaaS (6) Application Application Providing End User Functionality
PaaS (5) Service Functional Services Providing Functional Services RabbitMQ, Hadoop AMQP
All Purpose Services Providing Distrib. Sys. Patterns SpringCloud
Storage Services Providing Storage Services S3, Swift, RADOS CDMI
Providing Database Services RDS SQL
CaaS (4) Cluster Container Orchestrator Providing Continual Elasticity Kubernetes, Marathon TOSCA
Overlay Network Bridging IaaS Networks Flannel, Weave, Calico
Cluster Scheduler Scaling Cluster Swarm, Mesos, ECS
Bridging IaaS Infrastructures
Clustered Storage Providing Scalable Storage Gluster FS, Ceph NFS, CIFS, WebDav
IaaS (3) Container Host Container Executing Deployment Units Docker, Rkt OCI
Operating System Operating Hosts Linux, OS X, Windows POSIX
(2) Virtual Host Virtual Infrastructure Providing Virtual Machines EC2, GCE, OS, ESX OVF, OCCI, CIMI
(1) Physical Host Physical Infrastructure Providing Host Machines Bare Metal Machines
Figure 4. Cloud-native stack
2. Cluster centric view point (aka CaaS). This is a view point familiar to engineers who are dealing with horizontal scalability across nodes. Clusters are a concept to handle many nodes as one logical compute node (a cluster). Such technologies are often the technological backbone for PaaS solutions and portable cloud runtime environments because they hide the complexity (of hundreds or thousands of single nodes) in an appropriate way. Additionally, CaaS realizes the foundation to define services and applications without reference to particular cloud services or cloud infrastructures and therefore provides the basis to avoid vendor lock-in.

3. Service centric view point (aka PaaS). This is a view point familiar to application engineers dealing with web services in service-oriented architectures (SOA). Of course, (micro)services have to be deployed on and operated by single nodes. However, all necessary and complex orchestration of these single nodes is delegated to a cluster (cloud runtime environment) providing a platform as a service (PaaS).

4. Application centric view point (aka SaaS). This is a view point familiar to end-users of cloud services (or cloud-native applications). These cloud services are composed of smaller cloud services being operated on clusters formed of single compute and storage nodes. The cloud provision model SaaS falls into this category.
4.2 Performance Considerations
This Section summarizes the main points of our performance considerations. The following of our studies investigated performance aspects on the service level [29, 30, 31] and on the elastic container platform level [32, 33]. We developed the research benchmarking prototypes EASYCOMPARE for the platform level and PPBENCH for
the service level (see Table 18) to do these pre-solution
performance analytics.
Common practitioner concerns regarding the operation of microservice and cloud-native architectures in multi-cloud contexts have to do with performance aspects. We identified the following main concerns.

1. Platforms for fine-grained resource allocation like Mesos or Kubernetes rely on a vast amount of homogeneous nodes. The obvious question arises whether it is possible to identify similar machine types across different cloud service providers. In short: it is possible! We show how to do this in the following paragraph 4.2.1.

2. Elastic container platforms normally involve overlay networks to avoid IP collisions pragmatically. But overlay networks may have severe performance impacts. The question is whether and how these performance impacts can be contained. Paragraph 4.2.2 addresses this aspect.
4.2.1 Machine similarity of different cloud providers
Selecting virtual machine types can be a complex task in its details. The underlying decision making problem is difficult to model appropriately. There exist several complex mathematical models like analytic hierarchy processes [34], utility function based methodologies [35], outranking approaches [36], simple additive weighting approaches [37], cloud feature models [38], dynamic programming approaches [39], cloud service indexing [40], ranking approaches [41], and Markov decision processes [42] to support this task. However, in order to establish a container cluster across multiple providers, it is necessary to select the most similar machines [43].
All mentioned approaches try to identify a best resource in terms of performance or cost effectiveness. But operating container clusters relies on homogeneous nodes across different cloud infrastructures. So, we do not search for the best performing or the most cost effective resources; we have to search for the most similar – homogeneous – resources. Therefore we developed EASYCOMPARE [33] (see Appendix D: Table 18). This benchmark suite uses the feature vector shown in equation (1) to describe the performance of a virtual machine type. EASYCOMPARE comprises several different benchmarks. Processing performance is determined by the Tachyon benchmark [44]. The Stream benchmark is used to measure the memory performance [45]. With the IOZone benchmark [46] the read and write disk performance can be measured. iperf [47] is used to measure network transfer rates of intra-cloud data transfers.
$$
i = (i_1, i_2, i_3, i_4, i_5, i_6, i_7)^T \tag{1}
$$

with
- i_1 (Processing): amount of simultaneously executable threads
- i_2 (Processing): processing time in seconds (median of all Tachyon benchmark runs)
- i_3 (Memory): memory size in MB
- i_4 (Memory): memory transfer in MB/s (median of all Stream Triad benchmark runs)
- i_5 (Disk): data transfer in MB/s for disk reads (median of all IOZone read stress benchmark runs)
- i_6 (Disk): data transfer in MB/s for disk writes (median of all IOZone write stress benchmark runs)
- i_7 (Network): data transfer in MB/s via network (median of all iperf benchmark runs)

$$
\forall i,j: \quad s(i,j) = 1 - \frac{1}{m}\sum_{n=1}^{m}\left(1 - \frac{\min(i_n, j_n)}{\max(i_n, j_n)}\right)^2,
\qquad s(i,j) = s(j,i), \qquad 0 \le s(i,j) \le 1, \qquad s(i,i) = 1 \tag{2}
$$
The similarity s(i, j) of two virtual machine types i and j can be analyzed by calculating their vector similarity. Although this approach is mostly used in domains like information retrieval, text and data mining, vector similarities can be used to determine the similarity of virtual machine types as well. A good similarity measure has to fulfill the three similarity intuitions of Lin [48]. It turned out that the normalized Euclidean distance measure shown in equation (2) fulfills Lin's intuitions of a good similarity measure for the intended purpose of identifying the most similar cloud resources across different providers. This variant of the Euclidean distance metric measures the performance relations of the performance components (i_1, i_2, ..., i_m) and (j_1, j_2, ..., j_m) and normalizes the result s(i, j) to a similarity value between 0.0 (i and j are not similar at all) and 1.0 (i and j are completely similar).
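To make the measure concrete, the following minimal sketch (in Python) applies equation (2) to two hypothetical feature vectors. The numbers are made up for illustration only and are not EASYCOMPARE benchmark results.

def similarity(i, j):
    """Normalized Euclidean similarity s(i, j) from equation (2); 1.0 means identical performance."""
    m = len(i)
    return 1 - sum((1 - min(a, b) / max(a, b)) ** 2 for a, b in zip(i, j)) / m

# (threads, Tachyon seconds, memory MB, memory MB/s, disk read MB/s, disk write MB/s, network MB/s)
vm_a = (8, 12.5, 30720, 9800.0, 210.0, 180.0, 118.0)   # hypothetical machine type A
vm_b = (8, 13.1, 29696, 9100.0, 190.0, 160.0, 110.0)   # hypothetical machine type B

print(round(similarity(vm_a, vm_a), 3))   # 1.0, a machine type is completely similar to itself
print(round(similarity(vm_a, vm_b), 3))   # close to 1.0, a reasonable pairing for a container cluster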
Table 6 was compiled using EASYCOMPARE and compares machine types provided by two major and representative public cloud service providers: Amazon Web Services and Google Compute Engine. It turned out that among over 500 possible machine type pairings only three pairings have a strong similarity. These three pairings with high similarity would be reasonable choices for container clusters. So, similar machine types might be rare, but they exist and they are identifiable. EASYCOMPARE can easily be extended to benchmark further IaaS cloud service providers [33].
4.2.2 Performance impact of overlay networks
To analyze the impact of containers, additional overlay network layers, and machine types on network performance, several experiments have been designed (see Figure 5).
Table 6. Example similarity matrix of AWS and GCE virtual machine types (sorted on both axes by descending amount of simultaneously executable threads).
Similarities
n1-highmem-32
n1-standard-32
n1-highcpu-32
n1-highmem-16
n1-standard-16
n1-highcpu-16
n1-highmem-8
n1-standard-8
n1-highcpu-8
n1-highmem-4
n1-standard-4
n1-highcpu-4
n1-highmem-2
n1-standard-2
n1-highcpu-2
n1-standard-1
g1-small
f1-micro
m4.10xlarge 0.67 0.70 0.68 0.56 0.50 0.52 0.51 0.46 0.48 0.40 0.41 0.43 0.36 0.37 0.40 0.35 0.30 0.13
c4.8xlarge 0.66 0.66 0.70 0.59 0.65 0.54 0.55 0.47 0.50 0.41 0.43 0.46 0.37 0.39 0.43 0.38 0.32 0.14
c3.8xlarge 0.81 0.82 0.81 0.56 0.64 0.51 0.52 0.45 0.47 0.39 0.40 0.42 0.34 0.35 0.39 0.33 0.27 0.13
i2.4xlarge 0.61 0.57 0.60 0.84 0.77 0.80 0.66 0.60 0.64 0.52 0.54 0.59 0.46 0.48 0.53 0.47 0.40 0.22
r3.4xlarge 0.60 0.57 0.60 0.83 0.76 0.79 0.65 0.59 0.62 0.50 0.52 0.56 0.44 0.45 0.51 0.44 0.37 0.21
m4.4xlarge 0.61 0.58 0.61 0.84 0.77 0.80 0.66 0.60 0.64 0.51 0.54 0.58 0.45 0.47 0.52 0.46 0.39 0.21
c3.4xlarge 0.60 0.57 0.60 0.83 0.76 0.79 0.65 0.72 0.62 0.50 0.52 0.56 0.43 0.45 0.50 0.44 0.37 0.21
c4.4xlarge 0.59 0.55 0.58 0.82 0.75 0.79 0.64 0.72 0.62 0.53 0.56 0.60 0.49 0.51 0.56 0.50 0.44 0.25
i2.2xlarge 0.52 0.49 0.51 0.62 0.63 0.65 0.79 0.81 0.83 0.60 0.62 0.63 0.52 0.53 0.54 0.51 0.43 0.27
r3.2xlarge 0.52 0.49 0.51 0.62 0.63 0.65 0.79 0.81 0.83 0.60 0.62 0.63 0.52 0.53 0.54 0.51 0.43 0.27
m4.2xlarge 0.46 0.43 0.45 0.55 0.57 0.60 0.74 0.77 0.78 0.57 0.60 0.60 0.49 0.51 0.50 0.53 0.54 0.37
m3.2xlarge 0.52 0.49 0.50 0.61 0.63 0.66 0.79 0.96 0.83 0.60 0.63 0.63 0.52 0.54 0.54 0.52 0.44 0.27
c3.2xlarge 0.52 0.50 0.51 0.62 0.64 0.80 0.79 0.82 0.83 0.60 0.76 0.62 0.52 0.53 0.53 0.51 0.44 0.27
c4.2xlarge 0.53 0.51 0.52 0.62 0.64 0.80 0.78 0.81 0.82 0.58 0.74 0.60 0.50 0.52 0.52 0.50 0.42 0.26
i2.xlarge 0.45 0.42 0.43 0.54 0.59 0.59 0.58 0.78 0.62 0.81 0.82 0.80 0.60 0.60 0.58 0.57 0.48 0.32
r3.xlarge 0.46 0.42 0.43 0.54 0.59 0.60 0.58 0.79 0.63 0.82 0.83 0.81 0.62 0.62 0.60 0.58 0.50 0.33
m4.xlarge 0.47 0.44 0.45 0.56 0.61 0.61 0.60 0.66 0.64 0.83 0.85 0.82 0.63 0.63 0.60 0.60 0.51 0.34
m3.xlarge 0.47 0.43 0.45 0.56 0.58 0.74 0.60 0.63 0.64 0.80 0.96 0.82 0.59 0.61 0.61 0.58 0.49 0.31
c3.xlarge 0.46 0.42 0.43 0.54 0.60 0.59 0.58 0.65 0.77 0.82 0.83 0.79 0.61 0.74 0.57 0.56 0.48 0.32
c4.xlarge 0.49 0.46 0.47 0.57 0.62 0.62 0.61 0.67 0.79 0.81 0.83 0.80 0.59 0.74 0.57 0.57 0.48 0.32
r3.large 0.39 0.36 0.37 0.45 0.54 0.65 0.48 0.56 0.52 0.64 0.76 0.58 0.83 0.81 0.77 0.65 0.57 0.40
m3.large 0.41 0.38 0.39 0.48 0.54 0.53 0.51 0.58 0.69 0.63 0.64 0.61 0.84 0.98 0.80 0.67 0.58 0.39
c3.large 0.39 0.37 0.37 0.46 0.54 0.51 0.49 0.57 0.52 0.64 0.63 0.58 0.84 0.81 0.77 0.79 0.56 0.39
c4.large 0.40 0.37 0.37 0.46 0.55 0.51 0.50 0.58 0.53 0.65 0.63 0.59 0.81 0.79 0.75 0.78 0.56 0.40
t2.medium 0.43 0.40 0.40 0.49 0.56 0.52 0.53 0.59 0.55 0.66 0.62 0.59 0.74 0.70 0.68 0.69 0.48 0.31
m3.medium 0.31 0.30 0.30 0.34 0.42 0.39 0.36 0.43 0.40 0.50 0.47 0.44 0.57 0.55 0.51 0.86 0.77 0.57
t2.small 0.38 0.36 0.37 0.43 0.50 0.47 0.45 0.52 0.49 0.61 0.58 0.54 0.66 0.63 0.74 0.78 0.87 0.52
t2.micro 0.31 0.30 0.29 0.34 0.37 0.36 0.37 0.40 0.38 0.47 0.46 0.45 0.53 0.52 0.50 0.68 0.63 0.46
We defined a reference ping-pong system that provides a REST-like and HTTP-based protocol to exchange data (PPBENCH, see Appendix D: Table 18). This kind of connection is commonly used in microservice architectures and container-based cloud systems. The ping and pong services can be implemented in different programming languages in order to identify the impact of polyglot programming techniques.

PPBENCH sends several HTTP requests with a random message size m between a minimum and maximum amount of bytes to the ping server to analyze the answer and response behavior for different message sizes. The ping server relays each request to the pong server. And the pong server answers the request with an m byte long answer message (as requested by the HTTP request). The ping server measures the time from request entry to the point in time when the pong server's answer hits the ping server again. PPBENCH repeats a request for a specified message size m several times to calculate a mean transfer rate. Using this approach we got a much more complete coverage of and insight into our analyzed problem domain.
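The following is a minimal, self-contained sketch (in Python) of the measurement idea behind PPBENCH. It is not the actual PPBENCH implementation (which provides Go, Ruby, Java, and Dart service variants, see Table 7); for brevity the ping server and the measuring client are collapsed into one process, and the port and endpoint names are assumptions.

import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from threading import Thread

PONG_PORT = 8081  # assumed port for the local stand-in pong service

class PongHandler(BaseHTTPRequestHandler):
    # Answers GET /pong?size=m with an m byte long message, like the pong service.
    def do_GET(self):
        size = int(self.path.rsplit("size=", 1)[-1])
        body = b"x" * size
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass  # keep the measurement output clean

def mean_transfer_rate(size, repetitions=10):
    # Requests an answer of `size` bytes several times and returns the mean transfer rate in MB/s.
    rates = []
    for _ in range(repetitions):
        start = time.perf_counter()
        with urllib.request.urlopen(f"http://localhost:{PONG_PORT}/pong?size={size}") as response:
            data = response.read()
        rates.append(len(data) / (time.perf_counter() - start) / 1e6)
    return sum(rates) / len(rates)

if __name__ == "__main__":
    server = HTTPServer(("localhost", PONG_PORT), PongHandler)
    Thread(target=server.serve_forever, daemon=True).start()
    for m in (1_000, 100_000, 500_000):   # message sizes in bytes
        print(f"{m:>7} bytes: {mean_transfer_rate(m):7.2f} MB/s")
    server.shutdown()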
This general setting can be modified to analyze the impact
Table 7. Language variants for ping and pong service
Language Version server library client library
Go 1.5 net/http + gorilla/mux net/http
Ruby 2.2 Webrick httpclient
Java 1.8 com.sun.net.httpserver java.net + java.io
Dart 1.12 http + start http
of an additional container layer, an overlay network, or even
the impact of a specific programming language (in order to
investigate the performance impact of polyglot programming
approaches).
• The Bare experiment setup shown in Figure 5(a) was used to collect reference performance data of the ping-pong system deployed to different virtual machines interacting with a REST-like and HTTP-based protocol.
• The intent of the Container experiment setup was to figure out the impact of an additional container layer on network performance (Figure 5(b)). Because the ping and pong services are provided as containers, every performance impact must be due to this container layer.
• The intent of the Overlay network experiment setup shown in Figure 5(c) was to figure out the impact of an additional SDN layer on network performance. Because an overlay network connects the ping and the pong containers, every data transfer must pass this overlay network. Therefore, every performance impact must be due to this additional networking layer. PPBENCH currently covers Weave and Calico but can easily be extended with further overlay network solutions.
• Additionally, the ping and pong services can be deployed in different programming language variants (see Table 7). These variants provide the same functionality and implement the same interface but can be used to measure the impact of polyglot programming techniques.
(a) Bare experiment to identify reference performance
(b) Container experiment to identify impact of containers
(c) Overlay network experiment to identify impact of overlay networks
Figure 5. Network performance experiment setups

PPBENCH can be used to run, plot, and compare different variants of this experiment in order to get an impression of expectable performance impacts upfront of a system design. Figure 7 shows the relation of some measured transfer rates on different virtual machine types (AWS). Table 8 summarizes such measurements for GCE and AWS as guidance levels. It turned out that the impacts of overlay networks can be severe on low-core virtual machine types but can be contained on high-core machine types. However, the performance impact should not be overrated. Figure 6 shows the relative performance impact of programming languages (due to polyglot programming).
Table 8. Relative performance impacts on data transfer rates
by an additional overlay network layer (Weave); just
guidance levels
Machine type cores Small messages Big messages
AWS m3.large 2 45% 60%
GCE n1-standard-2 2 25% - 80% 25%
AWS m3.xlarge 4 60% 90% - 100%
GCE n1-standard-4 4 40% - 100% 60%
AWS m3.2xlarge 8 70% - 80% 90% - 100%
GCE n1-standard-8 8 55% - 80% 70%
Figure 6. Relative impact of different programming languages (requests per second in relative comparison of Go, Java, Ruby, and Dart variants on m3.2xlarge over varying message sizes; bigger is better)
The impact of programming languages on network
performance can be much higher than the impact of
an additional overlay network layer.
In consequence the project was able to derive a simple engineering rule:

It seems to be a simple and cost effective strategy to operate container clusters with most similar high-core machine types across different providers.
4.3 Summary of reference model considerations
More and more cloud-native applications are deployed on elastic container platforms. We saw that the resulting performance impacts have to be considered but can be contained. So, one obvious idea was to delegate the transferability problem to the elastic container platform layer. If a container platform is transferable, then the applications on top of that platform are transferable as well. That is why we decided to derive a transferability concept that is composed of two independent but complementary parts:

• Finding a solution to make elastic container platforms transferable across different IaaS infrastructures. This will be explained in Section 5.
• Finding a solution to define cloud applications in a way that they are deployable to arbitrary elastic container platforms. That would make it possible to make the choice of an elastic container platform a user decision as well. This will be explained in Section 6.
(a) AWS m3.large (2 cores)   (b) AWS m3.xlarge (4 cores)   (c) AWS m3.2xlarge (8 cores)
Figure 7. Relative transfer rates of containers and overlay networks (bare vs. Docker vs. Weave) measured on different AWS machine types over varying message sizes (bigger is better)
5. Multi-Cloud Scaler
This Section will present a multi-cloud scaler that was developed throughout the course of the project. This component is capable of operating and transferring elastic container platforms in multi-cloud contexts at runtime without downtime. We refer to the original studies [49, 50, 51, 52] and our software prototypes PLAIN, ECP DEPLOY, and OPEN4SSH (see Appendix D: Table 18) for more in-depth information.
A lot of requirements regarding transferability, awareness and security come along with multi-cloud approaches [53, 54, 55, 56]. Already existing elastic container platforms contribute to fulfilling these requirements [1]. Table 9 summarizes the main points. This Section presents insights from two software prototypes (ECP DEPLOY and PLAIN, see Appendix D: Table 18) that are both implemented in Ruby. These tools can be triggered in the execution phase of a MAPE auto-scaling loop [57] and scale elastic container platforms according to an execution pipeline (see Figure 10). A control process interprets a cluster description format (the intended state of the container cluster, see Appendix B: Listing 1) and the current state of the cluster (attached nodes, existing security groups, see Appendix B: Listing 2). If the intended state differs from the current state, necessary adaptation actions are deduced (attachment/detachment of nodes, creation and termination of security groups); a minimal sketch of this state comparison is given after the following pipeline overview. That is basically the strategy that Kubernetes applies to its orchestrated container workload, but we apply this strategy to the elastic container platform itself. An execution pipeline (see Figure 11) assures that
• a set of synchronized security groups is established to enable network access control for all participating nodes forming the elastic container platform,
• that nodes are provided with all necessary software and configuration installed to join the elastic container platform successfully,
• that the elastic container platform provides an encrypted overlay network for containers,
• that removed nodes are drained (graceful node shutdown) in order to initiate rescheduling of workloads to the remaining nodes of the cluster,
• and that leaving nodes and "empty" security groups are terminated to free resources of IaaS infrastructures.

Table 9. Common multi-cloud requirements and contributing elastic platform concepts
Transferability: - Integration of nodes into one logical elastic platform - Elastic platforms are designed for failure - Cross-provider deployable
Data location: - Pod concept (Kubernetes) - Volume orchestrators (e.g. Flocker for Docker)
Awarenesses (pricing, legislation/policy, local resources): - Tagging of nodes with geolocation, pricing, policy or on-premise information - Platform schedulers have selectors (Swarm), affinities (Kubernetes), and constraints (Mesos) to consider these taggings for scheduling
Security: - Default encrypted data/control plane (e.g. Swarm) - Pluggable and encryptable overlay networks (e.g. Weave for Kubernetes)
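As a minimal sketch (in Python), the state comparison driving this pipeline can be thought of as follows. Keying the states by (district, flavor, role) tuples is an illustrative simplification of the cluster.json and resources.json formats, not their exact structure.

from collections import Counter

def deduce_actions(intended, current):
    # Compare intended and current node counts and deduce attach/detach actions.
    actions = []
    for key in set(intended) | set(current):
        delta = intended.get(key, 0) - current.get(key, 0)
        if delta > 0:
            actions += [("attach", key)] * delta             # launch, install, join new nodes
        elif delta < 0:
            actions += [("drain_and_detach", key)] * -delta  # graceful shutdown, then terminate
    return actions

intended = Counter({("gce-europe", "small", "master"): 3, ("gce-europe", "medium", "worker"): 5})
current  = Counter({("aws-europe", "small", "master"): 3, ("aws-europe", "medium", "worker"): 5})

for action in sorted(deduce_actions(intended, current)):
    print(action)   # a transfer from AWS to GCE becomes a series of attach and detach actions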
This enables a lot of deployment opportunities for cloud-native applications (see Figure 8). It is possible to realize public-to-public, private-to-public, and public-to-private cloud transfers at runtime without downtime. But it is also possible to realize hybrid variants like public-public, public-private (hybrid), or overflow deployments. All these deployment options can be realized with just a single cybernetic concept of a control loop.
5.1 Solution Description
The conceptual model shown in Figure 9 is used to describe the deployment of elastic platforms in multi-cloud scenarios and considers arbitrary IaaS cloud service providers and arbitrary elastic container platforms via corresponding extension points. Public cloud service providers organize their IaaS services using mainly two approaches: project- and region-based service delivery. GCE and OpenStack infrastructures are examples of project-based approaches. To request IaaS resources like virtual machines, one has to create a project first. The project has access to resources of all provider regions. AWS is an example of region-based service delivery. Both approaches have their advantages and disadvantages. However, the approaches are given, and multi-cloud solutions must be prepared for both approaches occurring in parallel. The conceptual model integrates both approaches by introducing a concept called District (see Figure 9).

Figure 8. Deployment opportunities of multi-cloud deployed CNAs
5.1.1 Defining intended states of container platforms
A District can be understood as a user-defined "datacenter" which is provided by a specific cloud service provider (following the project- or region-based approach). So, provider regions or projects can be mapped to one or more Districts and vice versa. This additional layer provides maximum flexibility in defining multi-cloud deployments of elastic container platforms. A multi-cloud deployed elastic container platform can be defined using two definition formats (cluster.json and districts.json). The definition formats are explained in Appendix B in more detail.

An elastic platform is defined as a list of Deployments in a cluster definition file (see Appendix B: Listing 1). A Deployment is defined per District and defines how many nodes of a specific Flavor should perform a specific cluster role. A lot of elastic container platforms have two main roles of nodes in a cluster: a "master" role to perform scheduling, control and management tasks, and a "worker" (or slave) role to execute containers. The proposed prototype can work with arbitrary roles and role names. Role specifics can be considered by Platform drivers (see Figure 9) in their install, join and leave cluster hooks (see Figure 11). A typical Deployment can be expressed using this JSON snippet:
{
  "district": "gce-europe",
  "flavor": "small",
  "role": "master",
  "quantity": 3,
  "tags": {
    "location": "Europe",
    "policy": "EU-US privacy shield",
    "scheduling-priority": "low",
    "further": "arbitrary tags"
  }
}
This snippet would deploy three master nodes to the district gce-europe. These nodes are tagged with several labels that will be considered when container workloads are scheduled. For instance, the label "policy: EU-US privacy shield" would make it possible to deploy containerized workloads only on nodes that fulfill the governance requirement to be operated under the EU-US Privacy Shield regulation⁶. The reader is referred to Section 6, where it will be explained how applications can define arbitrary scheduling constraints for the underlying platform and infrastructure layer.

⁶ The EU-US Privacy Shield is a framework for transatlantic exchanges of personal data for commercial purposes between the European Union and the United States. It replaced the Safe Harbor regulation, which was overturned by the European Court of Justice in 2015 due to insufficient U.S. data privacy standards.
A complete elastic container platform is composed of a list of such Deployments. Machine Flavors (e.g. small, medium, and large machines) and Districts are user defined and have to be mapped to concrete cloud service provider specifics. This is done using the districts.json definition file (see Appendix B, Listing 3). Each District object executes deployments by adding machines to or removing machines from the cluster. The infrastructure-specific execution is delegated by a District object to a Driver object. A Driver encapsulates and processes all necessary cloud service provider and elastic container platform specific requests. The driver uses access Credentials for authentication (see Appendix B: Listing 4) and generates Resource objects (Nodes and SecurityGroups) representing the resources provided by the cloud service provider (District); this current state of the cluster is encoded in a resources.json file (see Appendix B: Listing 2). SecurityGroups are used to allow internal platform communication across IaaS infrastructures. These basic security means are provided by all IaaS infrastructures under different names (firewalls, security groups, access rules, network rules, ...). This resource list is used by the control process to build the delta between the intended state (encoded in cluster.json) and the current state (encoded in resources.json).
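To make the role of this delta concrete, the following Ruby sketch shows one way such a comparison between cluster.json and resources.json could be computed per District and role. The data layout and method names are illustrative assumptions and not the prototype's actual code.

# Illustrative sketch (not the prototype's actual code): compute per-district,
# per-role deltas between the intended state (cluster.json) and the current
# state (resources.json). Positive delta => launch nodes, negative => terminate.
require 'json'

def node_deltas(cluster_file, resources_file)
  intended = JSON.parse(File.read(cluster_file))   # assumed: list of Deployment objects
  current  = JSON.parse(File.read(resources_file)) # assumed: list of Node objects

  deltas = Hash.new(0)
  intended.each do |deployment|
    key = [deployment['district'], deployment['role']]
    deltas[key] += deployment['quantity']
  end
  current.each do |node|
    key = [node['district'], node['role']]
    deltas[key] -= 1
  end
  deltas # e.g. { ["gce-europe", "master"] => 1, ["aws-us", "worker"] => -2 }
end

A positive entry would translate into node creation actions for that District and role, a negative entry into drain and termination actions, as described in the following subsection.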
5.1.2 A control loop to reach an intended state
The control loop shown in Figure 10 is responsible for reaching the intended state and can handle common multi-cloud workflows (see Figure 8):

• A deployment of a cluster can be understood as running the execution pipeline on an initially empty resources list.
• A shutdown can be expressed by setting all deployment quantities to 0.
• A migration from one District A to another District B can be expressed by setting all Deployment quantities of A to 0 and adding the former quantities of A to the quantities of B.
The execution pipeline of the control loop deduces and processes a prioritized action plan to reach the intended state (see Figure 11). The pipeline must keep the affected cluster in a valid and operational state at all times. The currently implemented strategy considers practitioner experience to reduce "stress" for the affected elastic container platform, but other pipeline strategies might work as well and are subject to further research.
Whenever a new node attachment is triggered by the control loop, the corresponding Driver is called to launch a new Node request. The Node is added to the list of requested resources (and therefore extends the current state of the cluster). Then all existing SecurityGroups are updated to allow incoming network traffic from the new Node. These steps are handled by an IaaS Infrastructure driver. Next, control is handed over to a Platform driver performing the necessary software install steps via SSH-based scripting. Finally, the node is joined to the cluster using platform- (and maybe role-)specific joining calls provided by the Platform driver. If install or join operations were not successful, the machine is terminated and removed from the resources list by the Infrastructure driver. In these cases the current state could not be extended and the next round of the control loop would retry. Due to its "cybernetic" design philosophy, the control loop can handle failures simply by repeating failed actions in the next loop.
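The following Ruby sketch illustrates this attach workflow under the assumption of simplified driver interfaces; request_node, allow_traffic_from, install, join, and terminate are assumed names for illustration, not the prototype's actual API.

# Illustrative sketch (assumed method names): attach one node to the cluster,
# rolling back on install/join failures so the next loop cycle can retry.
def attach_node(district, flavor, role, resources)
  infrastructure = district.infrastructure_driver
  platform       = district.platform_driver

  node = infrastructure.request_node(flavor: flavor, role: role)
  resources << node                          # extend the current state
  infrastructure.allow_traffic_from(node)    # update all security groups

  begin
    platform.install(node)                   # SSH-based provisioning
    platform.join(node, role)                # platform- and role-specific join call
  rescue StandardError => error
    warn "Attach failed (#{error.message}), terminating #{node}"
    infrastructure.terminate(node)           # roll back the failed attachment
    resources.delete(node)                   # a later loop cycle retries
  end
end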
5.1.3 Handling infrastructure and platform specifics
The handling of infrastructure and platform specifics is done using an extendable driver concept (see Figure 9). The classes Platform and Infrastructure form two extension points to provide support for IaaS infrastructures like AWS, GCE, Azure, DigitalOcean, RackSpace, ..., and for elastic container platforms like Docker Swarm, Kubernetes, Mesos/Marathon, Nomad, and so on. Infrastructures and platforms can be integrated by extending the Infrastructure class (for IaaS infrastructures) or the Platform class (for additional elastic container platforms). The current state of implementation provides platform drivers for the elastic container platforms Kubernetes and Docker's Swarm Mode, and infrastructure drivers for the public IaaS infrastructures AWS, GCE, and Azure as well as for the IaaS infrastructure OpenStack. Due to the mentioned extension points, further container Platforms and IaaS Infrastructures can easily be added. Table 10 summarizes how this has been applied for different IaaS drivers.
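As an illustration of these extension points, a new IaaS driver could look roughly like the following Ruby sketch. The base class stub and all method names are assumptions for illustration; the prototype's actual interface may differ.

# Illustrative extension-point sketch (assumed interface): a new IaaS driver
# would subclass Infrastructure and map the generic concepts of Table 10 to
# provider-specific API calls.
class Infrastructure; end  # stand-in for the prototype's base class

class DigitalOceanDriver < Infrastructure
  def initialize(credentials)
    @credentials = credentials   # keyfile / API token (compare Appendix B: Listing 4)
  end

  def request_node(flavor:, role:)
    # map the user-defined Flavor onto a provider-specific machine size and
    # return a generic Node resource object for the resources list
  end

  def allow_traffic_from(node)
    # adjust the provider's firewall / security-group equivalent
  end

  def terminate(node)
    # prefer a non-blocking termination call if the provider API offers one
  end
end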
5.2 Evaluation
The prototype was evaluated by operating and transferring two elastic platforms (Swarm Mode of Docker 17.06 and Kubernetes 1.7) across four public and private cloud infrastructures (see Table 11). The platforms operated a reference "sock-shop" application (https://github.com/microservices-demo/microservices-demo, last access 3rd July 2017), one of the most complete reference applications for microservices architecture research [58]. The implementation was tested using a six-node cluster formed of one master node and five worker nodes executing the above mentioned reference application.

Figure 9. Conceptual model for the ECP multi-cloud-scaler solution
It turned out that most of the runtime is due to low-level IaaS infrastructure operations and not due to elastic container platform operations. The container platform Swarm can be installed approximately 10 seconds faster than Kubernetes, and joining a Kubernetes node is approximately 5 seconds slower than joining a Swarm node. Only the cluster initialization of Kubernetes is remarkably slower. However, that is an operation which is done only once while bootstrapping the cluster. The reader should compare these platform runtimes with infrastructure-specific runtimes. Even on the fastest provider it took more than three minutes to launch a cluster. So, 15 seconds of installation and joining runtime differences between different elastic container platforms are negligible. Therefore, we present data for Kubernetes only; the data for another elastic container platform like Swarm would simply be too similar. Instead, Figure 12 can be used to identify much more severe and hence more interesting time-intensive infrastructure operations.
The reader might be surprised that cloud infrastructure operations show substantial variation in runtimes although similar resources were used across cloud service providers for the presented experiments. A cluster launch on AWS takes only 3 minutes but can take up to 15 minutes on Azure (median values). The termination is even worse: a cluster can be terminated in approximately a minute on AWS or OpenStack, but it can take up to 18 minutes to complete on Azure.
The question is why? Figure 12 presents runtimes of
infrastructure operations to create/delete and adjust security
groups and to create/terminate virtual cluster nodes (virtual
machines) that have been measured while performing the
mentioned experiments. Transfers are complex sequences of
these IaaS operations and substantial differences for different
IaaS infrastructures are observable for these operations. AWS
and OpenStack infrastructures are much faster than GCE and
Azure in creating security groups and nodes. The same is true
for node terminations.
Figure 10. Control loop of the ECP multi-cloud-scaler

And because different providers (and their different runtime behaviors) can be combined in any combination, this can result in astonishing runtime differences for node transfers. Figure 13 shows all transfer experiments and plots measured runtime differences of transfers between the analyzed infrastructures of AWS, GCE, Azure, and OpenStack. In the best case (transfer from OpenStack to AWS) the transfer could be completed in 3 minutes; in the worst case (transfer from Azure to GCE) the transfer took more than 18 minutes (median values). Furthermore, a transfer from provider A to B did in no case take the same time as from B to A. A more in-depth analysis revealed that this is mainly due to different runtimes of create/terminate security group operations and to whether the APIs trigger node terminations in a blocking or non-blocking way. The APIs used for the providers GCE and Azure are blocking (that means the call returns at the point in time when the infrastructure completed the termination operation). The behavior for OpenStack and AWS was non-blocking (that means the calls returned immediately and just triggered the termination but did not wait until completion). The non-blocking behavior obviously leads to a more reactive behavior in case of node terminations (a complete node termination takes a minute and more). Figure 13 visualizes this by plotting how many nodes are up at any point in time during a transfer. A transfer from a "faster" provider (e.g. AWS) to a "slower" provider (e.g. Azure) can be done substantially faster than vice versa.
It turned out that the runtime behavior of the slowest IaaS operation dominates the overall runtime behavior of multi-cloud operations. Taking this all together, IaaS termination operations should be launched by drivers in a non-blocking manner (whenever possible) to improve the overall multi-cloud performance.
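A driver can emulate such non-blocking behavior even for providers whose SDK calls block, for instance by issuing terminations from background threads. The following Ruby sketch illustrates this idea; the driver interface shown is an assumption for illustration.

# Illustrative sketch (assumed driver interface): emulate non-blocking node
# termination for providers whose APIs only offer blocking calls, so that the
# slowest IaaS operation does not stall the whole execution pipeline.
def terminate_nodes(driver, nodes)
  threads = nodes.map do |node|
    Thread.new do
      driver.terminate(node)   # blocking provider call runs in the background
    end
  end
  threads                      # caller may join later, or a later loop cycle verifies
end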
Table 10. The APIs used by PLAIN IaaS drivers

Driver    | Provider API                                                          | Used provider concepts
AWS       | AWS Ruby SDK (http://bit.ly/1gzgpoy)                                  | Keyfile (SSH), AWS VPC, gateway/route tables, security group, ingress permissions, AWS instances (VM)
OpenStack | fog.io library, fog-openstack (http://bit.ly/2obnGDF)                 | Keyfile (SSH), security group, security group rules, external network, floating IPs, Fog server concept (VM)
GCE       | gcloud CLI; fog.io works as well (http://bit.ly/2EtIBJ5)              | Keyfile (SSH), GCE VPC, GCE machines (VM)
Azure     | fog.io library with Azure plugin fog-azure-rm (http://bit.ly/2o3NUZx) | Keyfile (SSH), virtual network (Azure), network interface (Azure), public IPs (Azure), storage account (Azure), security group (Azure), access rules (Azure), resource group (Azure), Fog server concept (VM)
Table 11. Used machine types and regions for evaluation

Provider  | Region         | Master (4 vCPU) | Worker (2 vCPU)
AWS       | eu-west-1      | m4.xlarge       | m4.large
GCE       | europe-west1   | n1-standard-4   | n1-standard-2
Azure     | europewest     | Standard A3     | Standard A2
OpenStack | own datacenter | m1.large        | m1.medium
5.3 Results, Discussion, and Limitations
The proposed control loop is designed to be generic enough to adapt to provider-specific and non-standardized detail concepts which are likely to occur in an IaaS context. For instance, the control loop was even able to handle completely different timing behaviors. Intentionally, only very basic IaaS concepts are used in order to assure that the control loop can work with every public or private IaaS infrastructure providing concepts like virtual machines and IP-based network access control (see Table 10).

A migration from one infrastructure A to another infrastructure B can be expressed by setting all quantities of A to 0 and all quantities of B to the former quantities of A. The presented prototype keeps the cluster in an operational state under all circumstances. The current implementation of the execution pipeline simply executes a worst-case scaling: the pipeline processes node creation steps before node termination steps. In consequence, a migration doubles the cluster size in a first step. In a second step, the cluster is shrunk down to its intended size in its intended infrastructure. This is not very sophisticated and obviously leaves room for improvement. Figure 13 visualizes this behavior over time for different infrastructure transfers.
Figure 11. Execution pipeline for an ECP multi-cloud-scaler

The execution pipeline is designed to be just the execution step of a higher-order MAPE loop. That might lead to situations where an intended state is not reachable. In these cases, the execution loop may simply have no effect. For a better understanding, the reader might want to imagine a cluster under high load. If the intended state were set to half of the nodes, the execution loop would not be able to reach this state. Why? Before a node is terminated, the execution pipeline informs the scheduler of the elastic container platform to mark this node as unschedulable, with the intent that the container platform will reschedule all load of this node to other nodes. A lot of elastic container platforms call this "draining a node". For this kind of purpose, elastic container platforms have operations to mark nodes as unschedulable (Kubernetes has the cordon command, Docker has a drain concept, and so on). Only if the container platform could successfully drain the node will the node be deregistered and deleted. However, in high-load scenarios the scheduler of the container platform will return an error due to the fact that draining is not possible because of insufficient resources. In consequence, the execution pipeline will not terminate the node and will trigger draining the next node on its list (which will not work either). So, this cycle of the execution pipeline will be finished without substantially changing the current state. The analyzing step of the higher-order MAPE loop will still identify a delta between the intended and the current state and will retrigger the execution pipeline. That is not perfect, but at least the cluster is kept in an operational and working state.
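The following Ruby sketch illustrates this drain-before-terminate step for a Kubernetes-based platform driver by shelling out to kubectl. The object and method names are assumptions; the prototype's actual platform driver may implement this differently.

# Illustrative sketch (assumed interface): only terminate a node if the
# container platform could reschedule its workload elsewhere.
def drain_and_terminate(node, infrastructure)
  system("kubectl", "cordon", node.name) or return false        # mark unschedulable
  drained = system("kubectl", "drain", node.name, "--ignore-daemonsets")
  if drained
    infrastructure.terminate(node)        # safe to deregister and remove the node
    true
  else
    system("kubectl", "uncordon", node.name)  # insufficient resources: keep the node
    false                                      # a later MAPE cycle will retry
  end
end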
Although there are opportunities to improve our research prototype, we could successfully demonstrate that it is possible to transfer elastic container platforms at runtime without downtime across different public and private cloud infrastructures.

Figure 12. Infrastructure specific processing times
6. Unified Cloud Application Modeling Language

This Section presents the Unified Cloud Application Modeling Language prototype (UCAML, see Appendix D: Table 18). We refer to the original studies [59, 60] for more in-depth information.
While a descriptive platform definition model can be used to define the elastic platform (see Section 5), there is also the need to define the application topology on top of that platform. Because we strived for a solution where the platform decision can be a user decision as well, we needed a pragmatic way to define applications without having a concrete container platform in mind. This report proposes to do this using a domain-specific language (DSL) which focuses on Layers 5 and 6 of the cloud-native application reference model (see Section 4). This DSL is the focal point of this Section. The central idea is to split the transferability problem into two independent engineering problems which are too often solved together:

1. The infrastructure-aware deployment and operation of ECPs: these platforms can be deployed and operated in a way that they can be transferred across IaaS infrastructures of different private and public cloud services, as Section 5 showed.

2. The infrastructure-agnostic deployment of applications on top of these kinds of transferable container platforms, which is the focus of this Section.

The intent of UCAML is to define CAMM Level 2 or higher cloud applications (see Table 4) using a domain-specific language that can be transformed into any elastic container platform specific definition format (like Swarm compose files, Kubernetes manifests, etc.).
6.1 Solution Description
In order to enable an ECP-based CNA deployment by a domain-specific language that is not bound to a specific ECP, the particular characteristics and commonalities of the target systems had to be identified. Therefore, the architectures and concepts of elastic container platforms had to be analyzed and compared. As representatives, we have chosen the three most often used elastic container platforms Kubernetes, Docker Swarm Mode, and Apache Mesos with Marathon listed in Table 1. Table 12 shows that all of these ECPs provide comparable concepts (from a bird's eye view). The following containerization trends in cloud-native application engineering can be observed.
• Application Definition. All platforms define applications as a set of deployment units. The dependencies of these deployment units are expressed descriptively. YAML-based definition formats seem to be common for all ECPs. However, the formats are not standardized. A DSL should provide a model-to-model transformation (M2M) to generate these ECP-specific application definition formats out of a unified format [AD].
• Service discovery is the task of getting service endpoints by name and not by a (permanently) changing address. All analyzed ECPs supported service discovery via DNS-based solutions (Mesos, Kubernetes) or using the service names defined in the application definition format (Docker Swarm). Thus, a DSL must make it possible to name services in order to make them discoverable via DNS or ECP-specific naming services [SD].
• Deployment units. The basic units of execution are mostly based on containers. That is especially true for Docker Swarm and Kubernetes. Only Apache Mesos supports arbitrary binaries as deployment units by design. However, the Marathon framework supports container workloads and has emerged as a standard for the Mesos platform to operate containerized workloads. A DSL should consider that the deployment unit concept (whether named application group, pod, or container) is the basic unit of execution for all ECPs [DU].
• Scheduling. All ECPs provide some kind of scheduling service. The scheduler assigns deployment units to nodes of the ECP considering the current workload and resource efficiency. The scheduling process of all ECPs can be constrained using scheduling constraints or so-called (anti-)affinities [61, 62, 63]. These kinds of scheduling constraints must be considered and be expressible by a DSL [SCHED].
• Load Balancing. Like scheduling, load balancing is supported by almost all analyzed platforms using special add-ons like Marathon-lb-autoscale (Mesos), kube-proxy (Kubernetes), or the Ingress service (Docker Swarm). These load balancers provide basic load balancing strategies and are used to distribute and forward IP traffic to the deployment units under execution. So load balancing strategies should be considered as future extensions for a DSL [LB].
• Autoscaling. Except for Docker Swarm, all analyzed ECPs provide (basic) autoscaling features which rely mostly on measuring CPU or memory metrics. In the case of Docker Swarm this could be achieved using an add-on monitoring solution triggering Docker Compose file updates. The Mesos platform provides Marathon-autoscale for this purpose and Kubernetes relies on a horizontal pod autoscaler. Furthermore, Kubernetes even supports making use of custom metrics. So, a DSL should provide support for autoscaling based on custom and even application-specific metrics [AS].
• Component labeling. All ECPs provide a key/value-based labeling concept that is used to group components (services, deployment units) of applications by a key-value scheme. This labeling is used more (Kubernetes) or less (Docker Swarm) intensively by concepts like service discovery, schedulers, load balancers, and autoscalers. Component labeling can even be used to encode data-center regions, prices, or policies, and makes it possible to deploy services only on specific nodes in multi-cloud scenarios [64]. In consequence, a DSL should be able to label application components in key/value style [CL].

Figure 13. Amount of nodes during a transfer

Table 12. Concepts of analyzed elastic container platforms

Concept                | Mesos                           | Docker Swarm                 | Kubernetes
Application Definition | Application Group               | Compose                      | Service + Namespace + Controller (Deployment, DaemonSet, Job, ...); all K8s concepts are described in YAML
Service discovery      | Mesos DNS                       | Service names, service links | KubeDNS (or replacements)
Deployment Unit        | Binaries, Pods (Marathon)       | Container (Docker)           | Pod (Docker, rkt)
Scheduling             | Marathon Framework, constraints | Swarm scheduler, constraints | kube-scheduler, affinities + (anti-)affinities
Load Balancing         | Marathon-lb-autoscale           | Ingress load balancing       | Ingress controller, kube-proxy
Autoscaling            | Marathon-autoscale              | -                            | Horizontal pod autoscaling
Component Labeling     | key/value                       | key/value                    | key/value
To define a universal CNA definition DSL, we followed established methodologies for DSL development as proposed by [65, 66, 67]. We identified the following requirements for a DSL with the intended purpose to define elastic, transferable, multi-cloud-aware cloud-native applications being operated on elastic container platforms (for more details about the identification of these requirements, please read Section II of [59]).

Figure 14. Model-to-model transformation approach
• R1: Containerized deployments. Containers are self-contained deployment units of a service and the core building block of every modern cloud-native application. The DSL must be designed to describe and label a containerized deployment of discoverable services. This requirement comprises [SD], [DU], and [CL].
• R2: Application Scaling. Elasticity and scalability are among the major advantages of using cloud computing [68]. Scalability makes it possible to follow workloads driven by request stimuli in order to improve resource efficiency [69]. The DSL must be designed to describe elastic services. This requirement comprises [SCHED], [LB], and [AS].
• R3: Compendiously. To simplify operations the DSL should be pragmatic. Our approach is based on a separation between the description of the application and the elastic container platform. The DSL must be designed to be lightweight and infrastructure-agnostic. This requirement comprises [AD], [SD], and [CL].
• R4: Multi-Cloud Support. Using multi-cloud-capable ECPs for deploying CNAs is a major requirement for our migration approach. Multi-cloud support also enables the use of hybrid-cloud infrastructures. The DSL must be designed to support multi-cloud operations. This requirement comprises [SCHED], [CL] and the necessity to be applied on ECPs operated in the way described by [49].
• R5: Independence. To avoid dependencies, the CNA should be deployable independently of a specific ECP and also of specific IaaS providers. The DSL must be designed to be independent from a specific ECP or cloud infrastructure. This requirement comprises [AD] and the necessity to be applied on ECPs operated in the way described by [49].
• R6: Elastic Runtime Environment. Our approach provides a CNA deployment on an ECP which is transferable across multiple IaaS cloud infrastructures. The DSL must be designed to define applications that can be operated on an elastic runtime environment. This requirement comprises [SD], [SCHED], [LB], [AS], [CL] and should consider the operation of ECPs in the way described in [49].
According to these requirements, we examined existing domain-specific languages for similar kinds of purposes. By investigating literature and conducting practical experiments in expressing a reference application, we analyzed whether each DSL fulfills our requirements. The examined DSLs were CAMEL [70], CAML [73], CloudML [75], MODACloudML [78], MultiCLAPP, and TOSCA [25]. None of these languages covered all of the requirements [59], although we have to state that TOSCA fulfills most of them. We decided against TOSCA because of its tool-chain complexity and its tendency to cover all layers of abstraction (especially the infrastructure layer).
Figure 15 summarizes the core language model of the resulting DSL. An Application provides a set of Services. A Service can have an Endpoint on which its features are exposed. One Endpoint belongs to exactly one Service and is associated with a Load Balancing Strategy. A Service can also use Endpoints of other Services. These Services can be external services that are not part of the application deployment itself. However, each internal Service executes at least one DeploymentUnit which is composed of one or more Containers. Furthermore, schedulers of ECPs should consider DeploymentPolicies for DeploymentUnits. Such DeploymentPolicies can be workload-considering Scaling Rules but also general Scheduling Constraints.
Figure 15. UCAML core language model

Table 13 relates these DSL concepts to the identified requirements and the initially identified containerization trends. Multi-cloud support (requirement R4) is not directly mapped to a specific DSL concept. It is basically supported by separating the description of the ECP (see Section 5) and the description of the CNA (see Section 6). Therefore, multi-cloud support does not have to be a part of the CNA DSL itself, which makes the CNA description less complex. Multi-cloud support is simply delegated to the lower-level ECP. According to [65] we implemented this core language model as a declarative, internal DSL in Ruby.
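For illustration, the concepts of the core language model in Figure 15 could be represented with plain Ruby structures as in the following sketch. This only mirrors the concepts and their relations; the actual UCAML DSL syntax (see Appendix C) differs, and all names and values below are illustrative assumptions.

# Minimal sketch of the core language model's concepts as plain Ruby structs.
Application        = Struct.new(:name, :services)
Service            = Struct.new(:name, :endpoint, :deployment_units, :used_endpoints)
Endpoint           = Struct.new(:port, :load_balancing_strategy)
DeploymentUnit     = Struct.new(:containers, :deployment_policies)
Container          = Struct.new(:image, :labels)
DeploymentPolicies = Struct.new(:scaling_rules, :scheduling_constraints)

# Example: a single-service application with one containerized deployment unit.
web = Service.new(
  "frontend",
  Endpoint.new(80, :round_robin),
  [DeploymentUnit.new(
     [Container.new("example/front-end", { "policy" => "EU-US privacy shield" })],
     DeploymentPolicies.new({ min: 2, max: 10 }, { "location" => "Europe" })
   )],
  []   # no used endpoints of other services in this minimal example
)
app = Application.new("example-shop", [web])

A model-to-model transformation (see Figure 14) would then traverse such an object graph and emit platform-specific definition formats like Kubernetes manifests or Swarm compose files.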
6.2 Evaluation
The UCAML model-to-model transformer is able to generate Kubernetes (version 1.9.0) and Docker Swarm (version 1.2.5) application description files. To evaluate the usability of the DSL, we used UCAML to describe several applications of different complexity:

• The PrimeService is an example application of low complexity. As a kind of Hello World application, it demonstrates basic UCAML features (see Appendix C: Listing 5).
• The Guestbook (designed by Kubernetes, http://bit.ly/2HOSnaO) is a multi-tier cloud-native guestbook application of medium complexity where visitors can leave messages and read the most recently created messages. The application consists of a PHP-based frontend and Redis master and slaves for the backend functionality (see Appendix C: Listing 6).
• Sock Shop (https://go.weave.works/socks) is a reference microservice e-commerce application of typical real-world complexity for demonstrating and testing microservice and cloud-native technologies. Sock Shop is developed by Weaveworks using technologies like Node.js, Go, Spring Boot, and MongoDB and is one of the most complete reference applications for cloud-native application research according to [58]. The application consists of nine services and four databases (see Appendix C: Listing 7).
Instructions for generating the Kubernetes and Docker Swarm application definition formats are included in the comments of the respective listings (see Appendix C).
Table 13. Mapping DSL concepts to derived requirements (R1-R6) and containerization trends (AD, SD, DU, ..., CL).
Concept R1 R2 R3 R4 R5 R6 AD SD DU SCHED LB AS CL
Application x x
Service x x x
Endpoint x x x x
DeploymentUnit x x x x
Container x x x
DeploymentPolicies x x x x x x
LoadBalancingStrategy x x x x
Scaling Rules x x x x x x x
Scheduling Constraints x x x x x x x
Figure 16. Screenshot of the sock-shop reference application
Figure 17. The sock-shop reference model used for evaluation
We validated that our DSL fulfills all requirements we defined by three evaluation steps:

E1. To evaluate the usability of the DSL for describing a containerized (R1), auto-scalable (R2) deployment in a pragmatic way (R3), we described the above mentioned applications.
E2. To evaluate multi-cloud support (R4) and ECP independence (part of R5 and R6), we deployed and operated the most complex reference application, Sock Shop, on Docker Swarm and Kubernetes. The ECPs consisted of five worker machines (and one master) hosted on the IaaS infrastructures OpenStack, Amazon AWS, Google GCE, and Microsoft Azure.
E3. To demonstrate IaaS independence (R5), we migrated the sock-shop deployment between various IaaS infrastructures of Amazon Web Services (region eu-west-1, worker node type m4.xlarge), Microsoft Azure (region europewest, worker node type Standard A2), Google Compute Engine (region europe-west1, worker node type n1-standard-2), and a research institution specific OpenStack installation (own platform, machines with 2 vCPUs). To validate all migration possibilities we performed the following experiments:

• E3.1: Migration OpenStack ⇔ AWS
• E3.2: Migration OpenStack ⇔ GCE
• E3.3: Migration OpenStack ⇔ Azure
• E3.4: Migration AWS ⇔ GCE
• E3.5: Migration AWS ⇔ Azure
• E3.6: Migration GCE ⇔ Azure
Every E3 experiment is a set of migrations in both directions. E.g., evaluation experiment E3.1 includes migrations from OpenStack to AWS and from AWS to OpenStack. All migrations were repeated 10 times. The transfer times of the infrastructure migrations are shown in Figure 18. As the reader can see, the time needed for an infrastructure migration ranges from 3 minutes (E3.1 OpenStack ⇒ AWS) to more than 18 minutes (E3.6 Azure ⇒ GCE). Moreover, the transfer time also depends on the transfer direction between the source and the target infrastructure. E.g., as seen in E3.5, the migration Azure ⇒ AWS takes four times longer than the reversed migration AWS ⇒ Azure. Our analysis showed that the differences in the transfer times are mainly due to the different blocking behavior of the IaaS API operations of different providers. We already discussed the impact of the infrastructure layer on application transfer runtimes; the reader is referred to Section 5. The differences in transfer times are due to the different IaaS cloud service providers involved and not due to the presented DSL.
Figure 18. Transferability results (sock-shop application)

6.3 Results, Discussion, and Limitations
We could successfully show that UCAML is usable to describe CNAs in a short and non-verbose manner. Redundant and repetitive definitions – which are common in ECP definition formats – can be avoided. Using UCAML it was possible to define a complete (auto-scalable and monitored) sock-shop reference application composed of four databases, one RabbitMQ-based messaging service, and 8 further database-interfacing or web-frontend microservices in only 90 lines of code (see Appendix C: Listing 7). The same application needs more than 650 lines if expressed as Kubernetes manifest files, 500 lines if expressed as a Nomad job (without auto-scaling and monitoring), 300 lines for Mesos/Marathon (without auto-scaling and monitoring), or 170 lines if expressed as Docker compose files (without auto-scaling and monitoring).
The model-to-model transformation allows choosing between different ECPs and also switching them without changing the CNA description. Some features are not supported by all ECPs alike. For example, Docker Swarm Mode does not offer auto-scaling of services. UCAML is designed such that despite such incompatibilities the transformation to the respective ECP succeeds. For example, in the case of auto-scaling rules in combination with Docker Swarm, the upper bounds of scalability rules are taken as fixed numbers (which would cover even worst case scenarios).
We also rated DSL pragmatism and practitioner acceptance higher than the richness of possible DSL expressions. This was a result of discussions with practitioners [1] and leads to some limitations. For instance, our DSL is intentionally designed for container and microservice architectures, but has limitations in expressing applications outside this scope. This limits language complexity but reduces possible use cases. For applications outside the scope of microservice architectures, we recommend following more general TOSCA- or CAMEL-based approaches.

Furthermore, because short-term failures of dependent services might occur during transfers, the presented approach is only pragmatically applicable for cloud-resilient (CAMM Level 2, see Table 4) or cloud-native applications (CAMM Level 3, see Table 4) that can handle such short-term failures by design. But these kinds of applications seem to become the dominant architectural style for modern cloud applications, so this does not seem to be a severe limitation from a practitioner point of view.
7. Cloud Security Opportunities

This Section presents some opportunities for cloud security that derived more or less unintentionally from our multi-cloud-scaler prototype (see Section 5). We refer to the following position papers [79, 80] for more in-depth information.
Cloud security was not the main focus of our research project, and it was not the intent of this project to contribute intentionally to the cloud security field. We just consistently considered cloud security requirements as an important engineering restriction. Cloud transferability concepts, however, lead rather obviously to easily realizable moving target concepts. Thus, our presented approach can provide the foundation for an asymmetric cloud security defense strategy that is based on a moving target philosophy. Retaining footholds in moving targets is simply more challenging for an attacker; thus moving targets simply require more effort to attack.

"Moving target defense aims to frustrate the undetected attacker, [...] who hides in a system." (Mike Burshteyn, see http://bit.ly/2FE0tST)

Unlike other active defense strategies like deception, threat intelligence, and counter-attack, it is a purely asymmetric defense strategy with the intent to increase the efforts on the attacker's and not on the defender's side.
Therefore, we were pro-actively invited by security experts to share our ideas at the CLOUD COMPUTING 2018 and CLOSER 2018 conferences as position papers. Several experts rated the approach presented in Section 5 as an "intriguing" opportunity to improve cloud security for cloud-native applications – although this was not intended with our solution. However, this shows the "think-outside-the-box" opportunities of our approach and might show options for follow-up cloud security research projects in the asymmetric defense field.
Many research studies and programs focus on developing systems in a responsible way to ensure the security and privacy of users. But compliance with standards, audits, and checklists does not automatically equal security [81], and a fundamental issue remains. Zero-day vulnerabilities are computer-software vulnerabilities that are unknown to those who would be interested in mitigating them (including the entity responsible for operating a cloud application). Until a vulnerability is mitigated, hackers can exploit it to adversely affect computer programs, data, additional computers, or a network. For zero-day exploits, the probability that vulnerabilities are patched is zero, so the exploit should always succeed. Therefore, zero-day attacks are a severe threat, and we have to draw a scary conclusion: in principle, attackers can establish footholds in our systems whenever they want [82]. And intruders have an astonishingly long time to act undetected on victim systems (see Table 14).
However, taking the approach described in Section 5, it should be possible to immunize cloud applications even against undetected intruders simply by moving an application within the same provider infrastructure. To move anything from A to A makes no sense at first glance. However, let us be paranoid and aware that with some probability and at a given time, an attacker will be successful and compromise at least one virtual machine [82]. In these cases, a transfer from A to A would be an efficient counter measure – because the intruder immediately loses any hijacked machine that is moved. Just as a kind reminder for the reader: to move a machine means, for the presented scaler, to launch a compensating machine unknown to the intruder and to terminate the former (hijacked) machine. Whenever an application is moved, all of its virtual machines are regenerated. And this would effectively eliminate undetected hijacked machines.

The biological analogy of this strategy is called "cell regeneration", and the attack on ill cells is coordinated by an immune system. The task of such an immune system is to break the inner circle of a cyber attack process (maintain presence and lateral movement, see Figure 19). And that fits perfectly as a moving target defense strategy.

Table 14. Undetected days on victim systems (reported by M-Trends)

Year | External notification | Internal discovery | Median
2010 | -                     | -                  | 416
2011 | -                     | -                  | ?
2012 | -                     | -                  | 243
2013 | -                     | -                  | 229
2014 | -                     | -                  | 205
2015 | 320                   | 56                 | 146
2016 | 107                   | 80                 | 99

Figure 19. Cyber attack life cycle model, adapted from the M-Trends reports
7.1 Breaking the cyber attack life cycle
Figure 19 shows the cyber attack life cycle model which is used by the M-Trends reports (http://bit.ly/2m7UAYb, visited 9th Nov. 2017) to report developments in cyber attacks over the years. According to this model, an attacker passes through different stages to complete a cyber attack mission. It starts with initial reconnaissance and compromising of access means. These steps are very often supported by social engineering methodologies [83] and phishing attacks [84]. The goal is to establish a foothold near the system of interest. All these steps are not covered by this report, because technical solutions are not able to harden the weakest point in security – the human being. The following steps of this model are more interesting for this report. According to the life cycle model, the attacker's goal is to escalate privileges to get access to the target system. Table 14 shows how astonishingly many days on average an intruder has access to a victim system. So, basically there is the requirement that an undetected attacker should lose access to compromised nodes of a system as fast as possible. But how?
7.2 Regenerating nodes permanently
Elastic container platforms are designed for failure and provide self-healing capabilities via auto-placement, auto-restart, auto-replication, and auto-scaling features. They will identify lost containers (for whatever reason, e.g. process failure or node unavailability) and will restart containers and place them on remaining nodes. These features are absolutely necessary to operate large-scale distributed systems in a resilient way. However, the same features can be used intentionally to purge "compromised" nodes.
Section 5 demonstrated a software prototype that provides a control process that can be used as an essential part of an "immune system". This process relies on an intended state ρ and a current state σ of a container cluster. If the intended state differs from the current state (ρ ≠ σ), necessary adaptation actions are deduced (creation and attachment/detachment of nodes, creation and termination of security groups) and processed by an execution pipeline fully automatically (see Figure 11) to reach the intended state ρ. With this kind of control process, a cluster can simply be resized by changing the intended amount of nodes in the cluster. If the cluster is shrinking and nodes have to be terminated, affected containers of running applications will be rescheduled to other available nodes.

The downside of this approach is that it will only work for Level 2 (cloud resilient) or Level 3 (cloud native) applications (see Table 4) that can tolerate dependent service failures (due to node failures and container rescheduling) by design. However, for that kind of Level 2 or Level 3 application, we can use the same control process to regenerate nodes of the container cluster. The reader shall consider a cluster with σ = N nodes. If we want to regenerate one node, we change the intended state to ρ = N+1 nodes, which will add one new node to the cluster (σ' = N+1). In a second step, we decrease the intended size of the cluster to ρ' = N again, which has the effect that one node of the cluster is terminated (σ'' = N). So, a node is regenerated simply by adding one node and deleting one node. We could even regenerate the complete cluster by changing the cluster size in the following way: σ = N ↦ σ' = 2N ↦ σ'' = N. But this would consume much more resources because the cluster would double its size for a limited amount of time. A more resource-efficient way is to regenerate the cluster in N steps: σ = N ↦ σ' = N+1 ↦ σ'' = N ↦ ... ↦ σ^(2N-1) = N+1 ↦ σ^(2N) = N.
This should make the general idea clear.
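The following Ruby sketch illustrates this N-step regeneration cycle on top of such a control process; the control-loop interface shown here is an assumption for illustration, not the prototype's actual API.

# Illustrative sketch (assumed control-loop interface): regenerate a cluster of
# N nodes one node at a time by repeatedly growing to N+1 and shrinking to N.
def regenerate_cluster(control_loop, n)
  n.times do
    control_loop.set_intended_size(n + 1)   # launch one compensating node
    control_loop.run_until_converged        # sigma'  = N + 1
    control_loop.set_intended_size(n)       # terminate one (possibly hijacked) node
    control_loop.run_until_converged        # sigma'' = N
  end
end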
Whenever such a regeneration is triggered, all – even undetected – hijacked machines would be terminated and replaced by other machines, but the application would be unaffected. For an attacker, this means losing their foothold in the system completely. Imagine this being done once a day or even more frequently. In fact, this could mean reducing the time available to an attacker from weeks or months down to just minutes. Establishing a foothold in a moving target is simply much more complicated.

Table 15. Durations to regenerate a node (median values)

Provider  | Creation | Secgroup | Joining | Termination | Total
AWS       | 70 s     | 1 s      | 7 s     | 2 s         | 81 s
GCE       | 100 s    | 8 s      | 9 s     | 50 s        | 175 s
Azure     | 380 s    | 17 s     | 7 s     | 180 s       | 600 s
OpenStack | 110 s    | 2 s      | 7 s     | 5 s         | 126 s
7.3 Expectable regeneration intervals
Because cloud security was not an intentional research objective of the project CloudTRANSIT, we did no in-depth evaluations of node regeneration in cloud security contexts. However, we can take some data from our multi-cloud-scaler evaluation presented in Section 5.2 to derive reasonable expectations for possible regeneration intervals. Basically, we just have to add up the infrastructure runtimes that would occur during a node regeneration. Such a regeneration involves the following steps:

1. Request a compensating node
2. Adjust security group (add compensating node)
3. Join compensating node
4. Terminate the node to be regenerated
5. Adjust security group (remove regenerated node)

Table 15 summarizes the median times of observed infrastructure runtimes from our transferability experiments (see Figure 12). Even on the "slowest" infrastructure, a node can be regenerated in about 10 minutes. In other words, one can regenerate six nodes every hour, up to 144 nodes a day, or a cluster of 432 nodes every 72 h (which is the reporting time requested by the EU General Data Protection Regulation). If the reader compares the 72 h regeneration time of a more than 400 node cluster (most systems are not that large) with the median value of 99 days that attackers were present on a victim system in 2016 (see Table 14), the benefit of the proposed approach should become obvious.
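As a worked example of this estimate, the following Ruby snippet derives the regeneration throughput per provider from the Table 15 totals, assuming nodes are regenerated strictly one after another.

# Worked example for the regeneration-interval estimate (Table 15 median totals,
# sequential regeneration, one node at a time).
regeneration_seconds = { "AWS" => 81, "GCE" => 175, "Azure" => 600, "OpenStack" => 126 }

regeneration_seconds.each do |provider, seconds|
  per_hour = 3600 / seconds      # whole nodes per hour
  per_72h  = per_hour * 72       # nodes per 72 h (GDPR reporting window)
  puts "#{provider}: ~#{per_hour} nodes/hour, ~#{per_72h} nodes/72h"
end
# Even on the slowest infrastructure (Azure, 600 s per node) this yields
# 6 nodes/hour, i.e. 144 nodes per day or 432 nodes per 72 h, which matches
# the estimate given above.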
7.4 Critical discussion and limitations
The idea of using an immune-system-like approach to remove undetected intruders from virtual machines seems to many experts an intriguing moving target strategy. But the state of the art is that this is not done, and there might be reasons for that and open questions the reader should consider.

One question is how to detect "infected" nodes efficiently. The presented approach would select nodes simply at random. This will hit every node at some time. The same could be done using a round-robin approach, but a round-robin strategy would be more predictable for an attacker. However, both strategies will create a lot of unnecessary regenerations, and that obviously leaves room for improvement. It seems obvious to search for solutions like those presented by [85, 86] to provide some "intelligence" for the identification of "suspicious" nodes. This would limit regenerations to likely "infected" nodes. In all cases it is essential for anomaly detection approaches to secure the forensic trail [87, 88].
Furthermore, regenerating nodes periodically or even randomly is likely nontrivial in practice and depends on the state management requirements of the affected nodes. Therefore, this report proposes the approach only as a promising solution for Level 2 or 3 cloud applications (see Table 4) that are operated on elastic container platforms. These kinds of applications have eligible state management characteristics. But this is obviously a limitation.

One could further be concerned about exploits that are adaptable to bio-inspired systems. Stealthy resident worms dating back to the old PC era would be an example. This might be especially true for the often encountered case of not entirely stateless services, when data-as-code dependencies or code-injection vulnerabilities exist. Furthermore, attackers could shift their focus to the platform itself in order to disable the regeneration mechanism as a first step. On the other hand, this could be easily detected – but more sophisticated attacks could exist.

Finally, there is obviously room and need for a much more detailed evaluation. The effectiveness of this approach needs a large-scale and real-world evaluation with more complex cloud-native applications using multiple coordinated virtual machines. This is up for ongoing research and should be kept in mind.
8. Lessons learned
The proposed cloud transferability concept leverages more up-to-date container technologies with the intent to be more "pragmatic", "lightweight", and complexity hiding. It prefers descriptive ways to define intended states of platforms and applications instead of (often TOSCA-based) workflow approaches. Especially the currently dominant TOSCA standard has the tendency to describe full stacks of cloud applications, while the presented CloudTRANSIT approach describes the infrastructure, the platform, and the application level as independent (but complementary) layers. Simply put: in TOSCA, an application is composed of services, services of containers, and containers are deployed on virtual machines (the container layer can be skipped; it is also possible to deploy services directly on virtual machines or bare metal servers), and virtual machines are placed in data centers or cloud infrastructure zones. If you want to transfer such an application from one zone or data center to another, this stack has to be torn down in the current zone or data center and then built up again in the intended zone or data center. During this tear-down-build-up process – which can take hours or more to complete – the application is not reachable. TOSCA is therefore likely to come along with downtimes when it comes to application transfers between cloud infrastructures.
Our presented transferability concept transfers the platform at runtime, independently of what application is deployed on that platform. What is more, our software prototypes listed in Appendix D: Table 18 provide a working proof-of-concept solution. That is hardly realizable using TOSCA. But we have to be fair: TOSCA can express arbitrary kinds of applications and therefore has shortcomings leveraging specific aspects. The presented transferability concept might only be applicable for container-based applications on a true cloud-native maturity level (see Table 4). But using container platforms and corresponding microservice architectures is becoming more and more common in CNA engineering [2]. For exactly this setting we derive the following main conclusions.
8.1 Transfer the platform, not the application
Elastic container platforms provide inherent – but often overlooked – multi-cloud support and are a viable and pragmatic option to support multi-cloud handling. But operating elastic container platforms across different public and private IaaS cloud infrastructures can be a complex and challenging engineering task. Most manuals do not recommend operating these kinds of platforms in the proposed way due to operational complexity. In fact, defining an intended multi-cloud state of an elastic container platform and letting a control process take care of reaching this state is not less challenging. But it hides and manages complexity much better. We showed that this complexity can be efficiently embedded in the execution pipeline of a control loop. The resulting control process was able to migrate and operate elastic container platforms at runtime across different cloud service providers. It was possible to transfer two of the currently most popular open-source container platforms, Swarm and Kubernetes, between AWS, GCE, Azure, and OpenStack. These infrastructures cover three of the big five public cloud service providers that are responsible for almost 70% of the worldwide IaaS market share (according to the Synergy 2016 Cloud Research Report, http://bit.ly/2f2FsGK, visited 12th Jul. 2017). In other words, the presented solution can already be used for 70% of the most frequently used IaaS infrastructures. Due to its driver concept it can be extended with further IaaS infrastructures and further elastic container platforms.
8.2 Make the platform a user decision as well
Open issues in deploying cloud-native applications to cloud infrastructures come along with the combination of multi-cloud interoperability, application topology definition/composition, and elastic runtime adaptation. This combination was – to the best of the authors' knowledge – not solved satisfactorily so far, because these three problems have often been seen in isolation. Therefore, we strived for a more integrated point of view. The key idea is to describe the platform independently from the application. However, the platform – be it Kubernetes, Swarm, Mesos, Nomad, ... – should be a user decision as well. So, cloud applications should be defined in a unified way, and this application definition should be transformable into platform-specific formats like Swarm compose files or Kubernetes manifests. We proposed UCAML and a model-to-model transformation as one option and implemented UCAML as an internal DSL in Ruby to fulfill our own special demands in a fast and pragmatic way. However, other approaches might provide pragmatic and comparable solutions to UCAML as well. Even TOSCA could be a reasonable choice for the application level, especially if TOSCA were limited to topologies.
8.3 Rate pragmatism over expressiveness
What is more, we also considered the balance between language expressiveness, pragmatism, complexity, and practitioner acceptance. For instance, using UCAML it was possible to define a complete (auto-scalable and monitored) sock-shop reference application composed of four databases, one RabbitMQ-based messaging service, and 8 further database-interfacing or web-frontend microservices in only 90 lines of code (see Appendix C: Listing 7). The same application needs more than 650 lines if expressed as Kubernetes manifest files, 500 lines if expressed as a Nomad job (without auto-scaling and monitoring), 300 lines for Mesos/Marathon (without auto-scaling and monitoring), or 170 lines if expressed as Docker compose files (without auto-scaling and monitoring), and it is even hard to count all the necessary lines of code in TOSCA because the complete infrastructure layer would have to be expressed here as well.

To achieve this, one design criterion of UCAML was to rate pragmatism over expressiveness (and completeness of feature coverage of underlying container platforms). UCAML is nevertheless capable of expressing scheduling constraints and scalability rules – exactly the features practitioners are asking for.
9. Conclusion
Because of cloud computing, even very small companies can generate enormous economic growth and business value by providing cloud-native services or applications: Instagram, Uber, WhatsApp, Netflix, Twitter – there are a lot of examples of astonishingly small companies (if we relate the modest headcount of these companies in their founding days to their noteworthy economic impact) whose services are frequently used. However, even a fast-growing start-up business model should have its long-term consequences and dependencies in mind. A lot of these companies rely on public cloud infrastructures – currently often provided by AWS. But will Amazon Web Services still be the leading and dominating cloud service provider in 20 years? IT history is full of examples of companies that failed: Atari, Hitachi, America Online, Compaq, Palm. Even Microsoft – still a prospering company – is no longer the dominating software company it used to be in the 1990s and 2000s. However, cloud providers are becoming more and more essential – meanwhile they run a large amount of mission-critical business software for companies that no longer operate their own data centers. And there are good economic reasons for this. So, cloud providers become a too-big-to-fail company category that seems to become equally important for national economies as banks, financial institutions, electricity suppliers, and public transport systems. Although essential for national economies, these financial, energy, or transport providers provide commodities – replaceable goods or services. But the cloud computing domain is still different. Although cloud services could be standardized commodities, they are mostly not. Once a cloud-hosted application or service is deployed to a specific cloud infrastructure, it is often inherently bound to that infrastructure due to non-obvious technological bindings. A transfer to another cloud infrastructure is very often a time-consuming and expensive one-time exercise (it took over a year for Instagram to transfer its services from AWS to Facebook datacenters).
Throughout the course of this project we searched intensively for solutions to overcome this "cloud lock-in" – to make cloud computing a true commodity. We developed and evaluated a transferability concept that has prototype status but already works for approximately 70% of the current cloud market, and that is designed to be extendable for the rest of the market share. But what is more essential: we learned some core insights from taking a purely technological and pragmatic point of view on the problem.

1. We should focus on transferring platforms (and not applications).
2. We should prefer descriptive and cybernetic deployment and orchestration approaches instead of workflow-based ones.
3. We should provide solutions that make the choice of platforms a user decision as well.
4. We should rate pragmatism over language expressiveness and complete platform feature coverage.

That would make cloud computing a true – and not just a postulated – commodity.
Acknowledgments
This study was funded by the German Federal Ministry of Education and Research (Project CloudTRANSIT, 13FH021PX4). Without this funding the project and its outcomes would have been impossible. But besides being thankful for money and jobs, we additionally have to thank a lot of further (often anonymous) researchers, experts, and practitioners. First we would like to thank the many anonymous reviewers who contributed valuable and very constructive feedback to all of our conference and journal papers throughout this research project. Their valuable comments improved our papers and sharpened our view on the research problem. Furthermore, we have to thank the following persons for their valuable input from a practitioner or research perspective: Dirk Reimers and Derek Palme from fat IT solutions GmbH (Kiel) for their general support of our research project; Josef Adersberger and his team from QAWARE GmbH (Munich) for their inspiring MEKUNS approach (Mesos, Kubernetes Cloud-native Stack); René Peinl from Hof University for his valuable contributions to our cloud-native reference model; Bob Duncan from Aberdeen University for his inspiring points of view on current and upcoming cloud security challenges; and finally our students Christian Stüben, Arne Salveter, and Thomas Finnern for all of their hard programming efforts that made our research prototypes work and steadily grow.
A - Cloud services and cloud standards
Table 16. Popular cloud services/platforms and their mapping to cloud standards, compiled in 2016
Columns: Service Category, Google, Azure, AWS, OpenStack, OCCI, CIMI, CDMI, OVF, OCI, TOSCA, ClouNS
Compute Compute Engine Virtual Machines EC2 Nova C M M 2,3
Custom Machine Types Glance C M 2
App Engine Cloud Services Elastic Beanstalk 5
RemoteApp MobileHub + AppStream 5
Container Container Engine Container Service ECS Magnum +
3, 4
Container Registry EC2 Container Registry +
4
Storage Cloud Storage S3 Swift St 5
Storage (Disks) EBS Cinder St V St 2
Storage (Files) Elastic File System Manila St St 4
Nearline Backup Glacier 5
StorSimple 5
Storage Gateway 1
Networking Cloud Networking Virtual Network VPN Neutron N N 2
Express Route Direct Connect 1
TrafficManager 4
Load Balancer ELB 4
Azure DNS Route 53 Designate 4
Media Services Elastic Transcoder 5
CDN Cloud Front 5
Cloud Endpoints Application Gateway API Gateway 5
Web Application Firewall 5
Database Cloud SQL SQL Database RDS Trove 5
SQL Data Warehouse Redshift 5
Datastore DocumentDB DynamoDB St 5
Big Table Storage (Tables) 5
Big Query 5
Data Flow Data Pipeline 5
Data Proc HDInsights EMR Sahara 5
Data Lake Store 5
Batch 5
Data Lab 5
Redis Cache ElastiCache 5
Database Migration Service -
Messaging PUB/SUB 5
Notification Hubs SNS 5
Notification Hubs SES 5
Event Hubs SQS + Lambda Zaqar 5
Storage (Queues) 5
Service Bus 5
Monitoring Cloud Monitoring Operational Insights Cloud Watch Ceilometer Mon 4
Cloud Logging CloudTrail 4
Management Cloud Deployment Manager Automation Cloud Formation Heat Sys Sys 4
OpsWork 4
Config + Service Catalog -
Azure Service Fabric 5
Site Recovery 5
API Management API Gateway 5
Mobile Engagement Mobile Analytics 5
Active Directory Directory Service 5
Multi Factor Authentication IAM Keystone 4
Certificate Manager 4
Key Vault CloudHSM Barbican 4
Security Center Trusted Advisor -
Inspector -
Further Services Translate API 5
Prediction API Machine Learning Machine Learning 5
Data Lake Analytics Kinesis + QuickSight 5
Search Cloud Search 5
IoT Hub IoT 5
BizTalk Services 5
VS Team Services CodeCommit -
DevTestsLabs CodePipeline&Deploy -
VS Application Insights 4
LumberJack 5
WorkSpaces, Mail, Docs 6
Sum of Services: 22 46 55 15 5 6 4 1 2 1
OCCI Concepts: Compute = C, Network = N, Storage = St (used for CDMI as well)
CIMI Concepts: System = Sys (used for TOSCA as well), Machine = M (used for OVF as well), Volume = V, Network = N, Monitoring = Mon
ClouNS Layers: 1 = Physical, 2 = Virtual Node, 3 = Host, 4 = Cluster (Container/Storage), 5 = Service, 6 = Application
B - Cluster definition formats
Cluster Definition File (Intended state)
This exemplary cluster definition file defines a Swarm cluster with the intended state to be deployed in two districts provided by the two providers GCE and AWS. It defines three user-defined node types (flavors): small, med, and large. Three master nodes should be deployed on medium virtual machine types in district gce-europe, and ten worker nodes should be deployed on small virtual machine types in district aws-europe. The flavors small, med, and large are defined in Listing 3.
Listing 1. Cluster Definition (cluster.json)
{
  "type": "cluster",
  "platform": "Swarm",
  // [...], Simplified for readability
  "flavors": ["small", "med", "large"],
  "deployments": [
    { "district": "gce-europe",
      "flavor": "med",
      "role": "master",
      "quantity": 3
    },
    { "district": "aws-europe",
      "flavor": "small",
      "role": "worker",
      "quantity": 10
    }
  ]
}
Resources File (Current state)
This exemplary resources file describes the provided resources of the operated cluster. The example describes a simple one-node cluster (one master) being operated in one district (OpenStack). A security group was requested. Some data is omitted for better readability.
Listing 2. Resources (resources.json)
[
  {
    "id": "36c76118-d8e4-4d2c-b14e-fd67387d35f5",
    "district_id": "openstack-nova",
    "os_external_network_id": "80de501b-e836-47ed-a413",
    "os_secgroup_name": "secgroup-a66817bd85e96c",
    "os_secgroup_id": "36c76118-d8e4-4d2c-b14e",
    "os_key_name": "sshkey-for-secgroup-a66817bd85e96c",
    "type": "secgroup"
  },
  {
    "id": "13c30642-b337-4963-94aa-60cef8db9bbf",
    "role": "master",
    "flavor": "medium",
    "public_ip": "212.201.22.189",
    "user": "ubuntu",
    "sshkey": "sshkey.pem",
    "district_id": "openstack-nova",
    "os_zone": "nova",
    "type": "node"
  }
]
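The cluster definition above captures the intended state, while this resources file captures the current state; a scaler compares both to decide what to provision or terminate. The following Ruby sketch only illustrates that bookkeeping step under the assumption that the files are structured exactly like Listings 1 and 2; the function names and file names are illustrative, not the prototype's actual API.

require 'json'

# Count operated nodes per [district_id, role] from a resources file
# structured like Listing 2 (assumed field names).
def current_state(resources_file = 'resources.json')
  JSON.parse(File.read(resources_file))
      .select { |entry| entry['type'] == 'node' }
      .group_by { |node| [node['district_id'], node['role']] }
      .transform_values(&:size)
end

# Read the intended quantities per [district, role] from a cluster
# definition structured like Listing 1.
def intended_state(cluster_file = 'cluster.json')
  JSON.parse(File.read(cluster_file))['deployments']
      .map { |d| [[d['district'], d['role']], d['quantity']] }
      .to_h
end

# The difference between both states tells a scaler what to provision or
# terminate (compare the reconciliation sketch in the conclusion above).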
District Definition File (JSON)
The following exemplary district definition defines provider-specific settings and mappings. The user-defined district gce-europe should be realized using the provider-specific GCE zones europe-west1-b and europe-west1-c. Necessary provider-specific access settings like project identifiers, regions, and credentials are provided as well. User-defined flavors (see the cluster definition format above) are mapped to concrete provider-specific machine types.
Listing 3. District Definitions (districts.json)
[
  {
    "type": "district",
    "id": "gce-europe",
    "provider": "gce",
    "credential_id": "gce_default",
    "gce_project_id": "your-proj-id",
    "gce_region": "europe-west1",
    "gce_zones": ["europe-west1-b", "europe-west1-c"],
    "flavors": [
      { "flavor": "small", "machine_type": "n1-standard-1" },
      { "flavor": "med",   "machine_type": "n1-standard-2" },
      { "flavor": "large", "machine_type": "n1-standard-4" }
    ]
  }
]
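The flavor mapping is what keeps the cluster definition provider-agnostic: at deployment time a user-defined flavor has to be resolved into a provider-specific machine type for the selected district. The following Ruby sketch shows such a lookup, assuming a districts.json structured like Listing 3; the function name and error handling are illustrative only.

require 'json'

# Resolve a user-defined flavor (e.g. 'med') into a provider-specific
# machine type (e.g. 'n1-standard-2') for a given district.
def machine_type_for(district_id, flavor, districts_file = 'districts.json')
  district = JSON.parse(File.read(districts_file))
                 .find { |d| d['id'] == district_id }
  raise "unknown district #{district_id}" unless district

  mapping = district['flavors'].find { |f| f['flavor'] == flavor }
  raise "flavor #{flavor} is not mapped in #{district_id}" unless mapping
  mapping['machine_type']
end

# With the exemplary district definition above,
# machine_type_for('gce-europe', 'med') would return 'n1-standard-2'.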
Credentials File (JSON)
The following exemplary credentials file provides access credentials for the customer-specific GCE and AWS accounts that are referenced by the district definition file (gce_default and aws_default).
Listing 4. Credentials (credentials.json)
[
  {
    "type": "credential",
    "id": "gce_default",
    "provider": "gce",
    "gce_key_file": "path-to-key.json"
  },
  {
    "type": "credential",
    "id": "aws_default",
    "provider": "aws",
    "aws_access_key_id": "AKID",
    "aws_secret_access_key": "SECRET"
  }
]
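A district references its access credentials only indirectly via credential_id, so the scaler has to look the referenced entry up in the credentials file before it can talk to the provider API. The following minimal Ruby sketch assumes the file layouts of Listings 3 and 4; the lookup function and the driver call mentioned in the comment are hypothetical.

require 'json'

# Fetch the credential entry that a district definition references via
# its credential_id (file layouts as in Listings 3 and 4 assumed).
def credential_for(district, credentials_file = 'credentials.json')
  JSON.parse(File.read(credentials_file))
      .find { |c| c['id'] == district['credential_id'] } ||
    raise("no credential #{district['credential_id']} defined")
end

# The returned hash would then parameterize a provider-specific driver,
# e.g. a hypothetical Driver.for('aws', credential) call.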
C - UCAML defined cloud applications
Prime Service
The prime service is the simplest UCAML example. It is merely a single, yet auto-scalable, container service, and its main purpose is to demonstrate UCAML in a kind of ”Hello World” style.
Listing 5. Prime Service (prime-service.ucaml)
# This is a small example application demonstrating a kind of UCAML hello world application.
#
# Start it like here:
# > ucaml.rb transform --prime-service > prime-service.yaml
# > kubectl apply -f prime-service.yaml
#
# This will launch a very basic prime number checking service.
# After you have started this application you can navigate using your favorite web browser or curl.
# Let us assume your K8s cluster can be accessed via 192.168.99.100 (replace with your IP if necessary)
#
# > curl -k http://192.168.99.100:32501/is-prime/127721
# Answer will be: 127721 is not a prime number. It can be divided by 11, 17, 187, 683, 7513, 11611
#
# > curl -k http://192.168.99.100:32501/is-prime/127
# Answer will be: 127 is a prime number
#
# Ten lines of code to define an executable macro-architecture for an autoscaling cloud application.
Ucaml::application('prime-service-app',
  services: [
    Ucaml::service('prime-service',
      request: Ucaml::request(cpu: 100, memory: 256, ephemeral_storage: 2),
      scale: Ucaml::scalingrule(min: 1, max: 10, cpu: 66),
      ports: [8888],
      expose: [8888 => 32501], # This exposes this service to the outside world via port 32501
      container: Ucaml::container('prime-unit', 'transit/primesvc:latest', cmd: 'ruby hw-service.rb', ports: [80])
    )
  ]
)
Guestbook Application
The guestbook example is an application of medium complexity. It is a well-known example application used by a number of elastic container platforms like Docker Swarm and Kubernetes. It is composed of a web front-end and a Redis backend; the Redis backend itself consists of a master and a scalable number of slaves.
Listing 6. Guestbook Application (guestbook.ucaml)
# This is the Ucaml description of the Kubernetes guestbook, a simple and multi-tier cloud-native web application
# https://github.com/kubernetes/kubernetes/blob/master/examples/guestbook/all-in-one/guestbook-all-in-one.yaml
#
# Start it like here:
# > ucaml.rb transform --guestbook > guestbook.yaml
# > kubectl apply -f guestbook.yaml
#
# This will launch a guestbook webservice.
# Let us assume your K8s cluster can be accessed via 192.168.99.100 (replace with your IP if necessary)
# After you have started this application you can navigate using your favorite web browser
#
# http://192.168.99.100:32500
#
scalingrule_cpu_50 = Ucaml::scalingrule(min: 1, max: 10, cpu: 66)
scalingrule_noscale = Ucaml::scalingrule(min: 1, max: 1, cpu: 80)
environment = Ucaml::environment(GET_HOSTS_FROM: 'env')

service_template = Ucaml::Service.new(name: 'template',
  scheduling: Ucaml::constraint('beta.kubernetes.io/os': 'linux'),
  request: Ucaml::request(cpu: 70, memory: 256, ephemeral_storage: 2)
)

Ucaml::application('demo-guestbook',
  scale: scalingrule_cpu_50,
  services: [
    service_template.copy.update(
      labels: Ucaml::labels(app: 'redis', tier: 'backend', role: 'master'),
      name: 'redis-master',
      scale: scalingrule_noscale,
      ports: [6379],
      container: Ucaml::Container.new(name: 'master', image: 'k8s.gcr.io/redis:e2e', ports: [6379])
    ),
    service_template.copy.update(
      labels: Ucaml::labels(app: 'redis', tier: 'backend', role: 'slave'),
      name: 'redis-slave',
      ports: [6379],
      container: Ucaml::Container.new(name: 'slave',
        image: 'gcr.io/google_samples/gb-redisslave:v1',
        ports: [6379]
      ),
      environment: environment