Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Container Platforms

Nane Kratzke
Center of Excellence for Communication, Systems and Applications (CoSA),
Lübeck University of Applied Sciences, 23562 Lübeck, Germany
nane.kratzke@fh-luebeck.de
Keywords: Cloud-native Application, Multi-cloud, Elastic Platform, Container, Microservice, Portability, Transferability,
MAPE, AWS, GCE, OpenStack, Kubernetes, Docker, Swarm.
Abstract: Elastic container platforms (like Kubernetes, Docker Swarm, Apache Mesos) fit very well with
existing cloud-native application architecture approaches. So it is more than astonishing that these
already existing, open source elastic platforms are not considered more consistently in multi-cloud
research. Elastic container platforms provide inherent multi-cloud support that can be easily accessed.
We propose a control process which is able to scale (and, as a side effect, migrate) elastic container
platforms across different public and private cloud service providers. This control loop can be used in
the execution phase of self-adaptive auto-scaling MAPE loops (monitoring, analysis, planning, execution).
Additionally, we present several lessons learned from our prototype implementation which might be of
general interest for researchers and practitioners. For instance, describing only the intended state of
an elastic platform and letting a single control process take care of reaching this intended state is far
less complex than defining plenty of specific multi-cloud aware workflows to deploy, migrate, terminate,
scale up and scale down elastic platforms or applications.
1 INTRODUCTION
Cloud-native applications (CNA) are large-scale elastic systems
deployed to public or private cloud infrastructures. The capability
to design, develop and operate a CNA can create enormous business
growth and value in a very limited amount of time. Companies like
Instagram, Netflix, Dropbox, etc. have demonstrated that
impressively. They often operate these kinds of applications on
large-scale elastic container-based clusters with up to thousands of
nodes. However, due to limited standardization in cloud computing,
it can be tricky to operate CNA across differing public or private
infrastructures in multi-cloud or hybrid cloud scenarios. CNA, even
when written from scratch, are often targeted at one specific cloud
only. Porting to a different cloud is usually a one-time exercise
and can be very time consuming and complex. For instance, Instagram
had to analyze their existing services for almost one year to derive
a viable plan for migrating their services from Amazon Web Services
(AWS) to Facebook datacenters. This migration finally worked, but it
was accompanied by severe outages. This phenomenon is called vendor
lock-in, and CNA seem to be extremely vulnerable to it. Almost no
recent multi-cloud survey study (see Section 6) considered elastic
container platforms (see Table 1) as a viable and pragmatic option
to support multi-cloud handling. It is very astonishing that this
kind of already existing, open source technology is not considered
more consistently in multi-cloud research (see Section 6). That
might have to do with the fact that "the emergence of containers,
especially container supported microservices and service pods, has
raised a new revolution in [...] resource management. However,
dedicated auto-scaling solutions that cater for specific
characteristics of the container era are still left to be explored."
(Ch. Qu and R. N. Calheiros and R. Buyya, 2016). The acceptance of
container technologies and corresponding elastic container platforms
has gained substantial momentum in recent years. That resulted in a
lot of technological progress driven by companies like Docker, Netflix,
Google, Facebook, and Twitter, who very often released their
solutions as open source software. So, from the current state of
technology, existing multi-cloud approaches (often dating from
before container technologies became widespread) seem very complex,
much too complex for many use cases of cloud-native applications
which have become possible due to the mentioned technological
progress of the last three or four years.

Table 1: Some popular open source elastic platforms and their major contributing organizations.
Platform    Contributors  URL
Kubernetes  Google        http://kubernetes.io
Swarm       Docker        https://docker.io
Mesos       Apache        http://mesos.apache.org/
Nomad       Hashicorp     https://nomadproject.io/

Kratzke, N.
Smuggling Multi-cloud Support into Cloud-native Applications using Elastic Container Platforms.
In Proceedings of the 7th International Conference on Cloud Computing and Services Science (CLOSER 2017), pages 29-42
ISBN: 978-989-758-243-1
Copyright ©2017 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

This paper considers this progress and makes two main
contributions:
• A control loop (see Section 4.2) is presented that is able to
  scale elastic container platforms in multi-cloud scenarios. This
  single control loop is capable of handling common multi-cloud
  workflows like deploying, migrating/transferring, terminating, and
  scaling up/down CNAs. The control loop provides not just
  scalability but also federation and transferability across
  multiple IaaS cloud infrastructures as a side effect.
• This scaling control loop is intended to be used in the execution
  phase of higher-level auto-scaling MAPE loops (monitoring,
  analysis, planning, execution) as systematized by (Pahl and
  Jamshidi, 2015; Ch. Qu and R. N. Calheiros and R. Buyya, 2016) and
  others. To some degree, the proposed control loop makes complex
  and IaaS infrastructure-specific multi-cloud workflows
  unnecessary.
The remainder of this paper is outlined as follows. Section 2
investigates how CNAs are being built. This is essential to
understand how to avoid vendor lock-in in a pragmatic and often
overlooked way. Section 3 focuses on some requirements which should
be fulfilled by multi-cloud capable CNAs and shows how already
existing open source elastic container platforms can contribute
pragmatically. The reader will see that these kinds of platforms
contribute to operating cloud-native applications in a resilient and
elastic way. We provide a multi-cloud aware proof of concept in
Section 4 and derive several lessons learned from the evaluation in
Section 5. The presented scaling control loop is related to other
work in Section 6, where similarities and differences are
summarized.
2 WHAT IS A CNA?
Although the term CNA is vague, there are similarities between the
various viewpoints (Kratzke and Quint, 2017b). According to (Stine,
2015), common motivations for CNA architectures are to deliver
software-based solutions more quickly (speed), in a more fault
isolating, fault tolerating, and automatically recovering way
(safety), to enable horizontal (instead of vertical) application
scaling (scale), and finally to handle a huge diversity of (mobile)
platforms and legacy systems (client diversity). (Fehling et al.,
2014) propose that a CNA should be IDEAL: it should have an isolated
state, be distributed in its nature, be elastic in a horizontally
scaling way, be operated via an automated management system, and its
components should be loosely coupled.
These common motivations and properties are ad-
dressed by several application architecture and in-
frastructure approaches (Balalaie et al., 2015): Mi-
croservices represent the decomposition of mono-
lithic (business) systems into independently deploy-
able services that do ”one thing well” (Namiot and
Sneps-Sneppe, 2014; Newman, 2015). The main
mode of interaction between services in a cloud-
native application architecture is via published and
versioned APIs (API-based collaboration). These
APIs often follow the HTTP REST-style with JSON
serialization, but other protocols and serialization
formats can be used as well. Single deployment
units of the architecture are designed and intercon-
nected according to a collection of cloud-focused
patterns like the twelve-factor app collection, the cir-
cuit breaker pattern and a lot of further cloud comput-
ing patterns (Fehling et al., 2014). And finally, self-
service elastic platforms are used to deploy and oper-
ate these microservices via self-contained deployment
units (containers). These platforms provide additional
operational capabilities on top of IaaS infrastructures
like automated and on-demand scaling of application
instances, application health management, dynamic
routing and load balancing as well as aggregation of
logs and metrics. Some open source examples of such
elastic platforms are listed in Table 1. So, this
paper follows this understanding of a CNA:
A cloud-native application is a distributed, elas-
tic and horizontal scalable system composed of
(micro)services which isolates state in a min-
imum of stateful components. The applica-
tion and each self-contained deployment unit of
that application is designed according to cloud-
focused design patterns and operated on a self-
service elastic platform. (Kratzke and Quint,
2017b)
It is essential to understand that CNAs are operated on elastic,
often container-based, platforms. Therefore, this paper focuses on
the multi-cloud aware handling of these elastic platforms.
3 MULTI-CLOUD SPECIFICS
Several transferability, awareness and security re-
quirements come along with multi-cloud approaches
(Barker et al., 2015; Petcu and Vasilakos, 2014; Toosi
et al., 2014; Grozev and Buyya, 2014). We investigate
these requirements in this section and show how
already existing elastic container platforms contribute
to fulfilling these requirements.
3.1 Transferability Requirements
Cloud computing is basically a computing model
based on ubiquitous network access to a shared and
virtualized pool of computing resources. The problem is that
this conceptual model is implemented by a large number of
service providers in different and not necessarily
standardized or compatible ways. So, portability or
transferability is required for CNA for reasons ranging from
optimal selection regarding utilization, costs or profits, to
technology changes, as well as legal issues (Petcu and
Vasilakos, 2014).
Elastic container platforms (see Table 1) integrate container hosts
(nodes) into one single, higher-level logical cluster. These
technologies provide self-service elastic platforms for cloud-native
applications (Stine, 2015) in an obvious but also often overlooked
way. Furthermore, some of these platforms are really "bulletproof".
Apache Mesos (Hindman et al., 2011) has been successfully operated
for years by companies like Twitter or Netflix to consolidate
hundreds of thousands of compute nodes. More recent approaches are
Docker Swarm and Google's Kubernetes, the open-source successor of
Google's internal Borg system (Verma et al., 2015). (Peinl and
Holzschuher, 2015) provide an excellent overview for interested
readers.

From the author's point of view, there are four main benefits of
using these elastic container platforms, starting with the
integration of single nodes (container hosts) into one logical
cluster (1st benefit). This integration can be done within one IaaS
infrastructure (for example only within AWS) and is mainly done for
complexity management reasons. However, it is possible to deploy
such elastic platforms across public and private cloud
infrastructures (for example deploying some hosts of the cluster to
AWS, some to Google Compute Engine (GCE) and some to an on-premise
OpenStack infrastructure). Even if these elastic container platforms
are deployed across different cloud service providers (2nd benefit),
they can be accessed as one single logical cluster, which is of
course a great benefit from a vendor lock-in avoiding (3rd benefit)
point of view.

Last but not least, these kinds of platforms are designed for
failure, so they have self-healing capabilities: their
auto-placement, auto-restart, auto-replication and auto-scaling
features are designed to identify lost containers (for whatever
reason, e.g. process failure or node unavailability). In these cases
they restart containers and place them on the remaining nodes
(without any user interaction). The cluster can be resized simply by
adding nodes to or removing nodes from the cluster. Containers
affected by a planned or unplanned node removal will be rescheduled
transparently to other available nodes. If clusters are formed of
hundreds or thousands of nodes, some single nodes are always in an
invalid state and have to be replaced at almost any time. So, these
features are absolutely necessary to operate large-scale elastic
container platforms in a resilient way.

However, exactly the same features can be used intentionally to
realize transferability requirements (4th benefit). For instance, if
we want to migrate from AWS to GCE, we simply attach additional
nodes provisioned by GCE to the cluster. In a second step, we shut
down all nodes provided by AWS. The elastic platform will recognize
node failures and will reschedule lost containers accordingly. From
the platform's inner point of view, expectable node failures occur
and corresponding rescheduling operations are triggered. From the
outside, it is a migration from one provider to another provider at
run-time. We will explain the details in Section 4. At this point it
should be sufficient to get the idea. Further multi-cloud options
like public cloud exits, cloud migrations, public multi-clouds,
hybrid clouds, overflow processing and so on are presented in
Figure 1 and can be handled using the same approach.
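The migration idea described above (attach nodes of the target provider, then shut down the nodes of the source provider and let the platform reschedule) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Cluster` class, its least-loaded placement rule, and the node and container names are hypothetical and do not model the scheduler of any real platform.

```python
from collections import defaultdict


class Cluster:
    """Toy model of an elastic container platform (hypothetical, not a real API)."""

    def __init__(self):
        self.nodes = {}                     # node name -> provider
        self.placement = defaultdict(list)  # node name -> container names

    def add_node(self, name, provider):
        self.nodes[name] = provider
        self.placement[name]                # create an empty container list

    def schedule(self, container):
        # Assumed placement rule: pick the least loaded node (ties by name).
        target = min(self.nodes, key=lambda n: (len(self.placement[n]), n))
        self.placement[target].append(container)
        return target

    def remove_node(self, name):
        # The node disappears; its containers are "lost" and get rescheduled.
        lost = self.placement.pop(name, [])
        del self.nodes[name]
        for container in lost:
            self.schedule(container)

    def providers_in_use(self):
        return {self.nodes[n] for n, cs in self.placement.items() if cs}


def migrate(cluster, source, target, new_nodes):
    # Step 1: attach additional nodes provisioned by the target provider.
    for name in new_nodes:
        cluster.add_node(name, target)
    # Step 2: shut down all nodes of the source provider.
    for name in [n for n, p in cluster.nodes.items() if p == source]:
        cluster.remove_node(name)


cluster = Cluster()
for i in range(3):
    cluster.add_node(f"aws-{i}", "aws")
for i in range(6):
    cluster.schedule(f"container-{i}")

migrate(cluster, "aws", "gce", ["gce-0", "gce-1", "gce-2"])
print(cluster.providers_in_use())  # only the target provider remains
```

From the toy platform's point of view only node removals and rescheduling happen; the migration is visible only from the outside, which mirrors the argument made in the text.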
3.2 Awareness Requirements
Besides portability/transferability requirements,
(Grozev and Buyya, 2014) emphasize that multi-cloud
applications need several additional kinds of
awareness:
1. Data location awareness: The persistent data and the processing
   units of an application should be in the same data center (even
   on the same rack) and should be connected with a high-speed
   network. Elastic container platforms like Kubernetes introduced
   the pod concept to ensure this kind of data locality (Verma et
   al., 2015).
2. Geo-location awareness: Requests should be scheduled near the
   geographical location of their origin to achieve better
   performance.
3. Pricing awareness: An application scheduler needs up-to-date
   information about providers' prices to perform fiscally efficient
   provisioning.
4. Legislation/policy awareness: For some applications, legislative
   and political considerations must be taken into account upon
   provisioning and scheduling. For example, some services could be
   required to avoid placing data outside a given country.
5. Local resources awareness: Often, the usage of in-house resources
   should have higher priority than that of external ones (overflow
   processing into a public cloud).

Figure 1: Some deployment options and transferability opportunities of elastic container platforms.
Platforms like Kubernetes, Mesos, and Docker Swarm are able to tag
nodes of their clusters with arbitrary key/value pairs. These tags
can be used to encode geo-locations, prices, policies and preferred
local resources (and arbitrary further aspects) and can be
considered in scheduling strategies to place containers according to
the above-mentioned awareness requirements. For instance, Docker
Swarm uses constraint filters [1] for that kind of purpose.
Arbitrary tags like a location tag "Germany" can be assigned to a
node. This tagging can be defined in a cluster definition file and
will be assigned to a node in the install node step shown in
Figure 3(b). This tagging is considered by the schedulers of the
mentioned platforms. Docker Swarm would deploy a database container
only on nodes which are tagged as "location=Germany" if a constraint
filter is applied as shown.
docker run \
-e constraint:location==Germany \
couchdb
[1] See https://docs.docker.com/v1.11/swarm/scheduler/filter/ (last access 15th Feb. 2017)
Kubernetes provides similar tag-based concepts called node selectors
and even more expressive (anti-)affinities, which are considered by
the Kubernetes scheduler [2]. The Marathon framework for Mesos uses
constraints [3]. Obviously, all of these concepts rely on the same
idea, are tag-based, and can therefore be used to cover the
mentioned awareness requirements in a consistent way by platform
drivers (see Figure 2).
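The common idea behind these tag-based concepts can be sketched in a few lines. The following is a simplified illustration only: the functions `matches` and `eligible_nodes` and the example node names are hypothetical, and the sketch covers plain equality constraints, not the full expression syntax of Swarm filters, Kubernetes affinities, or Marathon constraints.

```python
def matches(node_tags, constraints):
    """True if every key == value constraint is satisfied by the node's tags."""
    return all(node_tags.get(key) == value for key, value in constraints.items())


def eligible_nodes(nodes, constraints):
    """Names of all nodes a container with these constraints may be placed on."""
    return sorted(name for name, tags in nodes.items() if matches(tags, constraints))


# Hypothetical cluster: node name -> arbitrary key/value tags.
nodes = {
    "node-1": {"location": "Germany", "scheduling-priority": "high"},
    "node-2": {"location": "USA", "scheduling-priority": "low"},
    "node-3": {"location": "Germany", "scheduling-priority": "low"},
}

# Corresponds to the constraint location==Germany used in the example above.
print(eligible_nodes(nodes, {"location": "Germany"}))
```

Because every awareness requirement is reduced to key/value tags, the same filter works for locations, prices, policies, or priorities.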
3.3 Security Requirements
Furthermore, security requirements have to be considered for
multi-cloud scenarios because these kinds of platforms can span
different providers and therefore data is likely to be transmitted
via the "open and unprotected" internet. Again, platforms like
Docker's Swarm Mode (since version 1.12) provide an encrypted data
and control plane via overlay networks to handle this. Kubernetes
can be configured to use encryptable overlay network plugins like
Weave. The accompanying network performance impacts can be contained
(Kratzke and Quint, 2015b; Kratzke and Quint, 2017a).
[2] See https://kubernetes.io/docs/user-guide/node-selection/ (last access 15th Feb. 2017)
[3] See https://mesosphere.github.io/marathon/docs/constraints.html (last access 15th Feb. 2017)
3.4 Summary
Existing open source elastic container platforms ful-
fill common transferability, awareness and security
requirements in an elastic and resilient manner. The
following Section 4 presents a proof-of-concept
solution showing how to exploit these opportunities using a
simple control process.
4 PROOF OF CONCEPT
The proposed solution is implemented as a prototypical Ruby command
line tool which could be triggered in the execution phase of a MAPE
auto-scaling loop (Ch. Qu and R. N. Calheiros and R. Buyya, 2016).
Although a MAPE loop needs always-on components, this execution loop
alone needs neither a central server component nor a permanent
cluster connection, and can be operated on a single machine (outside
the cloud). The tool scales elastic container platforms according to
a simple control process (see Figure 3(a)). This control process
realizes all necessary multi-cloud transferability requirements. The
control process evaluates a JSON-encoded cluster description (the
intended state of the container cluster, see Appendix: Listing 1)
and the current state of the cluster (attached nodes, existing
security groups). If the intended state differs from the current
state, the necessary adaption tasks are deduced
(attachment/detachment of nodes, creation and termination of
security groups). The control process terminates once the intended
state is reached, i.e. when no further adaption tasks can be
deduced.
4.1 Description of Elastic Platforms
The description of elastic platforms in multi-cloud scenarios must
consider arbitrary IaaS cloud service providers. This is done by the
conceptual model shown in Figure 2. Public cloud service providers
organize their IaaS services according to two main approaches:
project-based and region-based service delivery. GCE and OpenStack
infrastructures are examples following the project-based approach.
To request IaaS resources like virtual machines, one has to create a
project, and within the project one has access to resources in any
region. AWS is an example of region-based service provisioning.
Resources are requested per region (Europe, US, Asia, ...) and they
are not assigned to a particular project. So, one can easily access
resources within a region (and across projects), but it gets
complicated to access resources outside a region. Both approaches
have their advantages and disadvantages, as the reader will see.
Region-based approaches seem to provide better adaption performance
(see Section 5). However, this is not worth discussing: the
approaches are given. Multi-cloud solutions simply have to consider
that both approaches occur in parallel. The key idea to integrate
both approaches is the introduction of a concept called District
(see Figure 2).
One provider region or project can map to one or more Districts and
vice versa. A District is simply a user-defined "datacenter" which
is provided by a specific cloud service provider (following the
project- or region-based approach). This additional layer provides
maximum flexibility in defining multi-cloud deployments of elastic
container platforms. A multi-cloud deployed elastic container
platform can be defined using two descriptive, JSON-encoded
definition formats (cluster.json and districts.json). The definition
formats are explained by example in more detail in the Appendix
(Listings 1, 2, and 3).
A Cluster (elastic platform) is defined as a list of Deployments.
Both concepts are defined in a cluster definition file format (see
Appendix, Listing 1). A Deployment defines how many nodes of a
specific Flavor should perform a specific cluster role in a
specified District. Most elastic container platforms assume two
roles of nodes in a cluster: a "master" role to perform scheduling,
control and management tasks and a "worker" role to execute
containers. Our solution can work with arbitrary roles and role
names. However, these roles have to be considered by Platform
drivers (see Figure 2) in their install, join and leave cluster
hooks (see Figure 3(b)). So, role and container platform specifics
can be isolated in Platform drivers. A typical Deployment can be
expressed using this JSON snippet.
{
"district": "gce-europe",
"flavor": "small",
"role": "master",
"quantity": 3,
"tags": {
"location": "Europe",
"policy": "Safe-Harbour",
"scheduling-priority": "low",
"further": "arbitrary tags"
}
}
A complete cluster can be expressed as a list of such Deployments.
Machine Flavors (e.g. small, medium and large machines) and
Districts are user defined and have to be mapped to concrete cloud
service provider specifics. This is done using the districts.json
definition file (see Appendix, Listing 3). A District object is
responsible for executing deployments (adding machines to or
removing them from the cluster as well as tagging them to handle the
awareness requirements mentioned in Section 3). The execution is
delegated to a Driver object which encapsulates and processes all
necessary cloud service provider and elastic container platform
specific requests. The driver uses access Credentials for
authentication (see Appendix, Listing 2). The Driver generates
Resource objects (Nodes and SecurityGroups) representing resources
(the current state of the cluster, encoded in a resources.json file)
provided by the cloud service provider (District). SecurityGroups
are used to allow internal platform communication across IaaS
infrastructures. These basic security means are provided by all IaaS
infrastructures under different names (firewalls, security groups,
access rules, network rules, ...). This resources list is used by
the control process to build the delta between the intended state
(encoded in cluster.json) and the current state (encoded in
resources.json).

Figure 2: The conceptual model and its relation to descriptive cluster definition formats (please compare with Appendix).
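How such a delta between intended and current state might be computed is sketched below. The Deployment fields follow the JSON snippet shown earlier, but the layout of the resources list (a `type` field plus district, flavor and role per node) is an assumption made for illustration, since the resources.json format is not shown in this section; the function names are hypothetical as well.

```python
from collections import Counter


def intended_counts(cluster):
    """Sum the requested node quantities per (district, flavor, role)."""
    counts = Counter()
    for deployment in cluster:
        key = (deployment["district"], deployment["flavor"], deployment["role"])
        counts[key] += deployment["quantity"]
    return counts


def current_counts(resources):
    """Count existing nodes per (district, flavor, role); other resources are ignored."""
    return Counter(
        (r["district"], r["flavor"], r["role"])
        for r in resources
        if r["type"] == "node"
    )


def delta(cluster, resources):
    """Positive values mean nodes to attach, negative values nodes to detach."""
    want, have = intended_counts(cluster), current_counts(resources)
    return {k: want[k] - have[k] for k in set(want) | set(have) if want[k] != have[k]}


# Intended state (cluster.json style) and assumed current state (resources list).
cluster = [
    {"district": "gce-europe", "flavor": "small", "role": "master", "quantity": 3},
    {"district": "aws-europe", "flavor": "small", "role": "worker", "quantity": 0},
]
resources = [
    {"type": "node", "district": "gce-europe", "flavor": "small", "role": "master"},
    {"type": "node", "district": "aws-europe", "flavor": "small", "role": "worker"},
    {"type": "node", "district": "aws-europe", "flavor": "small", "role": "worker"},
    {"type": "security_group", "district": "aws-europe"},
]

adaption_tasks = delta(cluster, resources)
print(adaption_tasks)
```

An empty delta corresponds to the termination condition of the control process: no further adaption tasks can be deduced.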
4.2 One Control Process for All
The control process shown in Figure 3(a) is responsible for
commanding and controlling all actions necessary to reach the
intended state (encoded in cluster.json). This cybernetic
understanding can be used to handle common multi-cloud workflows.
• A fresh deployment of a cluster can be understood as executing the
  control loop on an initially empty resources list.
• A shutdown can be expressed by setting all Deployment quantities
  to 0.
• A migration from one District A to another District B can be
  expressed by setting all Deployment quantities of A to 0 and
  adding the former quantities of A to the quantities of B.
With this cybernetic understanding, it is easy to realize all
the multi-cloud deployment options and transferability
opportunities shown in Figure 1 (and more).
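The shutdown and migration workflows listed above then reduce to simple transformations of the intended state. The following sketch assumes the Deployment fields from the JSON snippet in Section 4.1; the function names are hypothetical, and the migration assumes that the same Flavor and role names are defined for the target District.

```python
def shutdown(cluster):
    """Intended state of a shutdown: every Deployment quantity set to 0."""
    return [dict(deployment, quantity=0) for deployment in cluster]


def migrate_intended_state(cluster, source, target):
    """Move all quantities of District `source` to District `target`."""
    result = [dict(deployment) for deployment in cluster]
    moved = [d for d in result if d["district"] == source and d["quantity"] > 0]
    for deployment in moved:
        # Find the matching Deployment in the target District, or create one.
        match = next(
            (t for t in result
             if t["district"] == target
             and t["flavor"] == deployment["flavor"]
             and t["role"] == deployment["role"]),
            None,
        )
        if match is None:
            match = dict(deployment, district=target, quantity=0)
            result.append(match)
        match["quantity"] += deployment["quantity"]
        deployment["quantity"] = 0
    return result


cluster = [
    {"district": "aws", "flavor": "small", "role": "master", "quantity": 1},
    {"district": "aws", "flavor": "small", "role": "worker", "quantity": 9},
    {"district": "gce", "flavor": "small", "role": "worker", "quantity": 0},
]

migrated = migrate_intended_state(cluster, "aws", "gce")
stopped = shutdown(cluster)
```

Neither function talks to a cloud provider; each only rewrites the description that the control loop is then asked to reach.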
The control loop derives a prioritized action plan to
reach the intended state. The current implementation
realizes this according to the workflow shown in Fig-
ure 3(a). However, other workflow sequences might
work as well and could show faster adaption cycles.
The reader should be aware that the workflow must
keep the affected cluster in a valid and operational
state at all times. The currently implemented strategy
considers practitioner experience to reduce ”stress”
for the affected elastic container platform. Only ac-
tion steps (2) and (3) of the control loop are explained
(shown in Figure 3(b)). The other steps are not pre-
sented due to triviality and page limitations.
Figure 3: The control loop embedded in a MAPE loop. (a) The execution control loop (monitoring, analysis, planning are only shown to indicate the context of a full elastic setting). (b) Add master/worker action, steps (2) and (3).

Whenever a new node attachment is triggered by the control loop, the
corresponding Driver is called to launch a new Node request. The
Node is added to the list of requested resources (and therefore
extends the current state of the cluster). Then all existing
SecurityGroups are updated to allow incoming network traffic from
the new Node. These steps are handled by an IaaS Infrastructure
driver. Next, control is handed over to a Platform driver. This
driver performs the necessary software installs via SSH-based
scripting. Finally, the node is joined to the cluster using
platform-specific (and maybe role-specific) joining calls provided
by the Platform driver. If install or join operations were not
successful, the machine is terminated and removed from the resources
list by the Infrastructure driver. In these cases the current state
could not be extended, and the next round of the control loop would
retry. Therefore, the control loop is inherently designed for
failure and will take care of retrying failed install and join
actions.
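The add-node action and its retry behaviour can be sketched as follows. This is not the Ruby implementation of the prototype: the driver classes are in-memory stand-ins, and all method names (`launch_node`, `update_security_groups`, `install`, `join`, `terminate`) as well as the `max_rounds` bound are assumptions chosen for illustration.

```python
class FakeInfrastructure:
    """In-memory stand-in for an IaaS Infrastructure driver (hypothetical)."""

    def __init__(self):
        self.launched = 0
        self.terminated = []

    def launch_node(self, role):
        self.launched += 1
        return {"id": f"node-{self.launched}", "role": role}

    def update_security_groups(self, resources):
        pass  # a real driver would allow traffic from the new node here

    def terminate(self, node):
        self.terminated.append(node["id"])


class FlakyPlatform:
    """Stand-in Platform driver whose first `failures` installs fail."""

    def __init__(self, failures):
        self.failures = failures

    def install(self, node):
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError("install failed")

    def join(self, node, role):
        pass  # a real driver would issue the platform-specific join call


def add_node(infrastructure, platform, resources, role):
    node = infrastructure.launch_node(role)
    resources.append(node)  # extends the current state
    infrastructure.update_security_groups(resources)
    try:
        platform.install(node)
        platform.join(node, role)
    except Exception:
        # Roll back: the current state could not be extended.
        infrastructure.terminate(node)
        resources.remove(node)
        return False
    return True


def reach(infrastructure, platform, resources, role, wanted, max_rounds=10):
    """Repeat the add-node action until enough nodes exist (bounded retries)."""
    rounds = 0
    while len(resources) < wanted and rounds < max_rounds:
        add_node(infrastructure, platform, resources, role)
        rounds += 1
    return rounds


infra, platform, resources = FakeInfrastructure(), FlakyPlatform(failures=1), []
rounds = reach(infra, platform, resources, "worker", wanted=2)
print(rounds, [n["id"] for n in resources], infra.terminated)
```

The failed first attempt needs no special handling: the rolled-back node simply leaves the delta unchanged, so the next round tries again.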
4.3 IaaS Infrastructures and Platforms
The workflows shown in Figure 3 are designed to handle arbitrary
IaaS Infrastructures and arbitrary elastic Platforms. However, the
infrastructure and platform specifics must be handled as well. This
is done using an extendable driver concept (see Figure 2). The
classes Platform and Infrastructure are two extension points to
provide support for IaaS infrastructures like AWS, GCE, Azure,
DigitalOcean, RackSpace, ..., and for elastic container platforms
like Docker Swarm, Kubernetes, Mesos/Marathon, Nomad, and so on.
Infrastructures and platforms can be integrated simply by extending
the Infrastructure class (for IaaS infrastructures) or the Platform
class (for additional elastic container platforms). Both concerns
can be combined to enable the operation of a Platform on an IaaS
Infrastructure. The current implementation provides platform drivers
for the elastic container platforms Kubernetes and Docker's Swarm
Mode and infrastructure drivers for the public IaaS infrastructures
AWS and GCE and the private IaaS infrastructure OpenStack. Thanks to
the mentioned extension points, further container Platforms and IaaS
Infrastructures can easily be added.
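The two extension points could look roughly like the abstract base classes below. This is a sketch only: the class names Platform and Infrastructure and the install, join and leave hooks come from the text, while the method signatures, the `launch_node`/`terminate_node` names, and the two concrete example drivers are assumptions and not the prototype's Ruby API.

```python
from abc import ABC, abstractmethod


class Infrastructure(ABC):
    """Extension point for IaaS infrastructures (assumed method names)."""

    @abstractmethod
    def launch_node(self, flavor): ...

    @abstractmethod
    def terminate_node(self, node): ...


class Platform(ABC):
    """Extension point for elastic container platforms (install/join/leave hooks)."""

    @abstractmethod
    def install(self, node, role): ...

    @abstractmethod
    def join(self, node, role): ...

    @abstractmethod
    def leave(self, node): ...


class InMemoryInfrastructure(Infrastructure):
    """Hypothetical driver that only keeps nodes in a list."""

    def __init__(self):
        self.nodes = []

    def launch_node(self, flavor):
        node = {"id": len(self.nodes) + 1, "flavor": flavor}
        self.nodes.append(node)
        return node

    def terminate_node(self, node):
        self.nodes.remove(node)


class RecordingPlatform(Platform):
    """Hypothetical driver that records which hooks were called."""

    def __init__(self):
        self.log = []

    def install(self, node, role):
        self.log.append(("install", node["id"], role))

    def join(self, node, role):
        self.log.append(("join", node["id"], role))

    def leave(self, node):
        self.log.append(("leave", node["id"]))


def attach(infrastructure, platform, flavor, role):
    """Any Platform driver can be combined with any Infrastructure driver."""
    node = infrastructure.launch_node(flavor)
    platform.install(node, role)
    platform.join(node, role)
    return node


infra, platform = InMemoryInfrastructure(), RecordingPlatform()
node = attach(infra, platform, "small", "worker")
print(platform.log)
```

The `attach` function depends only on the abstract interfaces, which is what allows a new provider or platform to be supported by adding one subclass.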
5 EVALUATION
The solution was evaluated using two elastic platforms (Docker's
1.12 Swarm Mode and Kubernetes 1.4) on three public and private
cloud infrastructures (GCE, eu-west1 region; AWS, eu-central-1
region; and OpenStack, the research institution's private
datacenter). The platforms operated two multi-tier web applications
(a Redis-based guestbook and a reference "sock-shop" application
[4]). Both CNAs are often used by practitioners to demonstrate
elastic container platform features.
[4] See https://github.com/kubernetes/kubernetes/tree/master/examples/guestbook and https://github.com/microservices-demo/microservices-demo (last access 19th Feb. 2017)

Figure 4: Launching and terminating of an elastic container platform (single-cloud experiments E1 and E2)
5.1 Experiments
The implementation was tested using a 10-node cluster composed of
one master node and 9 worker nodes executing the above-mentioned
reference applications. The following experiments demonstrate
elastic container platform deployments, terminations, and complete
or partial platform transfers across different cloud service
infrastructures. Additionally, the experiments were used to measure
the runtimes of these kinds of operations.
E1: Launch a 10 node cluster in AWS and GCE
(single-cloud).
E2: Terminate a 10 node cluster in AWS and GCE
(single-cloud).
E3: Transfer 1 node of a 10 node cluster from
AWS to GCE (multi-cloud) and vice versa.
E4: Transfer 5 nodes of a 10 node cluster from
AWS to GCE (multi-cloud) and vice versa.
E5: Transfer a complete 10 node cluster from
AWS to GCE (multi-cloud) and vice versa.
To compare similar machine types in AWS and GCE, it was decided to
use n1-standard-2 machine types from GCE and m3.large machine types
from AWS. These machine types show high similarities regarding
processing, networking, I/O and memory performance (Kratzke and
Quint, 2015a). Additionally, a machine type with performance
characteristics comparable to the mentioned "reference machine
types" selected from GCE and AWS has been defined on the institute's
on-premise OpenStack infrastructure.
Each experiment was repeated at least 10 times. Due to page
limitations, only data for Docker Swarm is presented. It turned out
that most of the runtime is due to low-level IaaS infrastructure
operations and not due to elastic container platform
installation/configuration and rescheduling operations. So, the data
for Kubernetes is quite similar. Admittedly, the measured data is
better suited as a performance comparison of AWS and GCE
infrastructure operations than as a performance measurement of the
proposed control loop. So far, one can say that the proposed control
loop is slow on slow infrastructures and fast on fast
infrastructures. That is not astonishing. However, some interesting
findings could be derived, especially regarding software-defined
networking aspects in multi-cloud scenarios and the reactiveness of
public cloud service infrastructures.
OpenStack performance is highly dependent on the physical
infrastructure and its detailed configuration. So, although
the implementation was tested against the institute's
on-premise OpenStack private cloud infrastructure
(OpenStack is somewhere in between AWS and GCE),
it is likely that this data is not representative. That is
why only data for AWS and GCE is presented.
Figure 4 shows the results of the experiments E1 and E2
as boxplots of the completion times until all security
groups, master and worker nodes were up or down. The
cluster was in an initial operating mode when the master
node was up, and it was fully operational when all worker
nodes were up. The reader might be surprised that GCE
infrastructure operations take much longer than AWS
infrastructure operations. The cluster was launched on AWS
in approximately 260 seconds and on GCE in 450 seconds
(median values). The analysis showed that the AWS
infrastructure is much more reactive regarding adjustments
of network settings than GCE (see SDN related processing
times in Figure 5). The
security groups on AWS can be created in approxi-
CLOSER 2017 - 7th International Conference on Cloud Computing and Services Science
mately 10 seconds but it takes almost two minutes
on GCE. There are similarly severe performance differences
when adjustments of these security groups are necessary
(that is, when nodes are added or removed). We believe this
has to do with the network philosophies of both
infrastructures. Security groups in AWS are region specific.
Firewalls and networks in GCE are designed to be used in
GCE projects, and GCE projects can be deployed across all
available GCE regions. So, AWS has to activate SDN changes
only in one region (that means within one datacenter). But
GCE has to activate changes in all regions (so in all of
their datacenters). From a cloud customer perspective GCE
networking is much more comfortable, but adaptations are
slow compared with the AWS infrastructure. Figure 4
visualizes the runtime effects for deployment.
The termination is even worse. A cluster can be terminated
in approximately 90 seconds on AWS but it takes up to 720
seconds on GCE. Our analysis showed that the CLI (command
line interface) of AWS works mainly asynchronously while
the CLI of GCE works mainly synchronously. The control loop
terminates nodes node by node in order to reduce
rescheduling stress for the elastic platform (node
requesting is done in parallel because adding a node
normally does not involve immediate rescheduling of the
workload). So, on AWS a node is terminated by deregistering
it from its master of the elastic container platform, and
then its termination is launched. The CLI does not wait
until the termination is completed and just returns. The
effect is that nodes are deregistered sequentially from the
platform (which only takes one or two seconds) and the
subsequent termination of all nodes is done mainly in
parallel (a termination is started almost every 2 seconds
but takes almost a minute). The reader should be aware that
this results in much more rescheduling stress for the
container platform. However, no problems with that higher
stress level were observable in the experiments on the AWS
side. The CLI of GCE is synchronous. So, when a node is
terminated, the GCE CLI waits until the termination is
completed (which takes approximately a minute).
Additionally, every time a node is removed the firewalls
have to be adapted, and this is a time-consuming operation
in GCE as well (approximately 25 seconds for GCE but only
10 seconds for AWS). That is why termination durations show
these dramatic differences.
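The effect of the two CLI styles can be illustrated with a small back-of-the-envelope model. The following Python sketch is not part of the prototype. The function names are hypothetical, the parameter values (2 seconds per deregistration, 60 seconds per node termination) are assumptions taken from the approximate figures above, and firewall adjustments are ignored.

```python
def async_termination_time(nodes, deregister_s, terminate_s):
    """Nodes are deregistered one by one; terminations are started but not awaited."""
    if nodes == 0:
        return 0
    # The last termination starts after all deregistrations and then runs to completion.
    return nodes * deregister_s + terminate_s


def sync_termination_time(nodes, deregister_s, terminate_s):
    """Nodes are deregistered one by one; every termination call blocks until done."""
    return nodes * (deregister_s + terminate_s)


if __name__ == "__main__":
    print(async_termination_time(10, 2, 60))  # 80
    print(sync_termination_time(10, 2, 60))   # 620
```

With these assumed values the model yields 80 seconds for the asynchronous and 620 seconds for the synchronous case. It is a simplification and is not expected to reproduce the measured 90 and 720 seconds exactly, but it shows why the blocking CLI scales with the number of nodes while the non-blocking one hardly does.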
Figure 5: Detail data of multi-cloud experiments E3, E4, E5
Figure 5 shows the results of the experiments E3, E4 and E5
(transfer times between AWS and GCE). It was simply
measured how long it took to transfer 1, 5 or all 10 cluster
nodes from AWS to GCE or vice versa. Transfer speeds depend
on the origin provider. Figure 5 shows that a transfer from
AWS to GCE (6 minutes) is more than two times faster than
from GCE to AWS (13 minutes). Furthermore, it is
astonishing that a transfer of only one single node from
AWS to GCE takes almost as long as a complete single-cloud
cluster launch on AWS (both operations take approximately
4 minutes). However, a complete cluster transfer (10 nodes)
from AWS to GCE is only slightly slower (6 minutes, but not
40 minutes!). A transfer from one provider to another is
obviously a multi-cloud operation, and multi-cloud
operations always involve operations of the "slower"
provider (in the case of the experiments E3, E4 and E5 this
involved SDN adjustments and node terminations). It turned
out that the runtime behaviour of the slowest provider
dominates the overall runtime behaviour of multi-cloud
operations. In the analyzed case, slower GCE SDN related
processing times and node termination times were the
dominating factors that slowed down operations on the AWS
side. This can result in surprising effects. A complete
cluster transfer from a "faster" provider (AWS) to a
"slower" provider (GCE) can be done substantially faster
than from a "slower" provider to a "faster" provider.
All in all, IaaS termination operations should be launched
asynchronously to improve the overall multi-cloud
performance.
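Where a provider CLI or API only offers blocking termination calls, one possible workaround is to wrap these calls on the client side. The following Python sketch illustrates this under assumptions: `terminate_node` is a hypothetical stand-in for a blocking IaaS call (simulated with a short sleep), and the standard `concurrent.futures` thread pool is used to start all terminations without waiting for each one in turn.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def terminate_node(node_id: str, duration_s: float = 0.2) -> str:
    """Hypothetical stand-in for a blocking IaaS termination call."""
    time.sleep(duration_s)  # simulates waiting for the provider
    return node_id


def terminate_all(node_ids: List[str], duration_s: float = 0.2) -> List[str]:
    """Start all terminations at once and collect the results afterwards."""
    if not node_ids:
        return []
    with ThreadPoolExecutor(max_workers=len(node_ids)) as pool:
        futures = [pool.submit(terminate_node, n, duration_s) for n in node_ids]
        # result() blocks per future, but all calls are already running.
        return [f.result() for f in futures]
```

Deregistering the nodes from the container platform would still be done sequentially before such a function is called, so only the slow infrastructure part runs concurrently.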
5.2 Critical Discussion
Each cloud provider has specific requirements and
the proposed control loop is designed to be generic
enough to adapt to each one. The presented control
loop was even able to handle completely different tim-
ing behaviors without being explicitly designed for
that purpose. However, this "genericness" obviously
cannot be proven. There might be an IaaS cloud
infrastructure not suitable for the proposed approach.
But the reader should be aware that – intentionally –
only very basic IaaS concepts are used. The control
loop should work with every public or private IaaS in-
frastructure providing concepts like virtual machines
and IP-based network access control concepts like
security groups (AWS) or network/firewalls (GCE).
These are very basic concepts. An IaaS infrastructure
not providing these basic concepts is hardly imaginable
and, to the best of our knowledge, does not exist.
The proposed solution tries to keep the cluster in
a valid and operational state under all circumstances
(whatever it costs). A migration from one infrastructure A
to another infrastructure B could be expressed by setting
all quantities of A to 0 and all quantities of B to the
former quantities of A (that is basically the experiment
E5). The current implementation of the control loop is not
very sophisticated and simply executes a worst-case
scaling. Node creation steps have a higher priority than
node deletion steps. So, in a first step a migration
increases the cluster to double its size. In a second step,
the cluster is shrunk down to its intended size in its
intended infrastructure. This obviously leaves room for
improvement.
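Expressed in code, such a migration is only a change of the intended state. The following Python sketch uses a hypothetical dictionary that maps infrastructure names to node quantities (this is an assumption for illustration, not the prototype's data model) and shows why the worst-case scaling temporarily doubles the cluster.

```python
from typing import Dict


def migrate(intended: Dict[str, int], source: str, target: str) -> Dict[str, int]:
    """Return a new intended state with all nodes of `source` moved to `target`."""
    new_state = dict(intended)
    new_state[target] = new_state.get(target, 0) + new_state.get(source, 0)
    new_state[source] = 0
    return new_state


def peak_size(current: Dict[str, int], intended: Dict[str, int]) -> int:
    """Cluster size after all creations and before any deletion (worst-case scaling)."""
    infrastructures = set(current) | set(intended)
    return sum(max(current.get(i, 0), intended.get(i, 0)) for i in infrastructures)


current = {"aws": 10, "gce": 0}
intended = migrate(current, "aws", "gce")
print(intended)                      # {'aws': 0, 'gce': 10}
print(peak_size(current, intended))  # 20
```

Because creations are executed before deletions, the 10 node cluster of experiment E5 would grow to 20 nodes before it is reduced to its intended size.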
Furthermore, the control loop is designed to be just the
execution step of a higher order MAPE loop. If the planning
step (or the operator) defines an intended state which is
not reachable, the execution loop may simply have no
effect. Imagine a cluster under high load. If the intended
state were set to half of the nodes (for whatever reason),
the execution loop would not be able to reach this state.
Why? Before a node is terminated by the control loop, the
control loop informs the container scheduler to mark this
node as unschedulable with the intent that the container
platform will reschedule all load of this node to other
nodes (draining the node). For these kinds of purposes
elastic container platforms have operations to mark nodes
as unschedulable (Kubernetes has the cordon command, Docker
has a drain concept and so on). Only if the container
platform could successfully drain the node will the node be
deleted. However, in high load scenarios the scheduler of
the container platform will simply answer that draining is
not possible. The control loop will not terminate the node
and will simply try to drain the next node on its list
(which will not work either). In consequence it will finish
its cycle without substantially changing the current state.
The analyzing step of the MAPE loop will still identify a
delta between the intended and the current state and will
consequently trigger the execution control loop one more
time. That is not perfect, but at least the cluster is kept
in an operational state.
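The behavior described above can be sketched as follows. The `Scheduler` class is a hypothetical stand-in for a container platform (it is neither the Kubernetes nor the Docker API), and the load and capacity units are assumptions for illustration. The scheduler refuses to drain a node when the remaining nodes lack the capacity to take over its load.

```python
from typing import Dict, List


class Scheduler:
    """Hypothetical scheduler: every node carries a load and has the same capacity."""

    def __init__(self, loads: Dict[str, int], capacity: int) -> None:
        self.loads = dict(loads)
        self.capacity = capacity

    def drain(self, node: str) -> bool:
        """Try to move the load of `node` to the other nodes; False if impossible."""
        others = [n for n in self.loads if n != node]
        free = sum(self.capacity - self.loads[n] for n in others)
        if self.loads[node] > free:
            return False  # high load: draining is not possible
        remaining = self.loads.pop(node)
        for n in others:  # fill the other nodes one after another
            moved = min(remaining, self.capacity - self.loads[n])
            self.loads[n] += moved
            remaining -= moved
        return True


def scale_down_cycle(scheduler: Scheduler, candidates: List[str]) -> List[str]:
    """One execution cycle: terminate only those nodes that could be drained."""
    terminated = []
    for node in candidates:
        if scheduler.drain(node):
            terminated.append(node)  # the IaaS termination request would go here
    return terminated
```

For a cluster of four nodes that each use 9 of 10 capacity units, a cycle over two candidate nodes terminates nothing and leaves the state unchanged. The same cycle on a lightly loaded cluster terminates both candidates.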
5.3 Lessons Learned
Finally, several lessons learned can be derived from the
performed software engineering activities, which might be
of interest for researchers or practitioners.
1. Just Use Very Basic Requests to Launch, Ter-
minate and Secure Virtual Machines. Try to
avoid cloud-init. It is not identically supported by
all IaaS infrastructures especially in timing rele-
vant public/private IP assignments. Use ssh-based
scripting instead.
2. Consider Secure Networking Across Different
Providers. If you want to bridge different IaaS
cloud service providers you have to work with
public IPs from the very beginning! However, this
is not the default operation for most elastic plat-
forms. Additionally, control and data plane en-
cryption must be supported by the used overlay
network of your elastic platform.
3. Never Use IaaS Infrastructure Elasticity
Features. They are not 1:1 portable across
providers. The elastic platform has to cover this.
4. Separate IaaS Support and Elastic Platform
Support Concerns from Each Other. They can
be solved independently from each other using
two independent extension points.
5. Describe Intended States of an Elastic Platform
and Let a Control Process Take Care to Reach
This Intended State. Do not think in TOSCA-
like and IaaS infrastructure specific workflows
how to deploy, scale, migrate and terminate an
elastic platform. All this can be solved by a single
control loop.
6. Separate Description Concerns of the Intended
State. Try to describe the general cluster as an
intended state in a descriptive way. Do not mix
the intended state with infrastructure specifics and
access credentials.
7. Consider What Causes Stress to an Elastic
Platform. Adding nodes to a platform is less
stressful than removing nodes. It seems to be a
good and defensive strategy to add nodes in parallel
but to shut down nodes sequentially. However,
this increases the runtime of the execution phase
of a MAPE loop. Investigating time-optimal
execution strategies could be a fruitful research
direction to make MAPE loops more reactive.
8. Respect Resilience Limitations of an Elastic
Platform. Never shutting down nodes before
compensating nodes have been attached (in case of
transferability scaling actions) is an obvious solution.
But it is likely not resource efficient. Investigating
resilient and resource-efficient execution strategies
could be a fruitful research direction to optimize
MAPE loops for transferability scenarios.
9. Platform Roles Increase Avoidable Deploy-
ment Complexity. Elastic container platforms
should be more P2P-like and composed of homogeneous
and equal nodes. This could be a fruitful
research direction as well.
10. Asynchronous CLIs or APIs Are Especially
Preferable for Termination Operations. Elastic
container platforms will show much more reactive
behavior (and faster adaptation cycles) if operated
on IaaS infrastructures providing asynchronous
termination operations (see Figure 4).
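Lessons 5 and 7 can be combined into a simple planning function. The Python sketch below is an illustration under assumptions, not the prototype's implementation: states are hypothetical mappings from (district, flavor, role) to node quantities, and the function name `plan` is made up. It returns the node creations (which could be requested in parallel) separately from the node deletions (which would be executed one by one).

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (district, flavor, role)


def plan(current: Dict[Key, int], intended: Dict[Key, int]) -> Tuple[List[Key], List[Key]]:
    """Compute the delta between the current and the intended state.

    Returns (creations, deletions); each list holds one entry per node.
    """
    creations: List[Key] = []
    deletions: List[Key] = []
    for key in sorted(set(current) | set(intended)):
        delta = intended.get(key, 0) - current.get(key, 0)
        if delta > 0:
            creations.extend([key] * delta)
        else:
            deletions.extend([key] * -delta)
    return creations, deletions
```

A single function of this kind covers deployment (empty current state), termination (empty intended state), scaling and migration, which is the point of describing intended states instead of separate workflows.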
6 RELATED WORK
According to (Barker et al., 2015; Petcu and Vasilakos,
2014; Toosi et al., 2014; Grozev and Buyya, 2014) there are
several promising approaches dealing with multi-cloud
scenarios. However, none of these surveys identified
elastic container platforms as a viable option. Only (Petcu
and Vasilakos, 2014) identify the need to "adopt
open-source platforms" and "mechanisms for real-time
migration" at run-time level but did not identify (nor
integrate) concrete and existing platforms or solutions.
All surveys identified approaches fitting mainly in the
following fields: Volunteer federations for groups of
"cloud providers collaborating voluntarily with each other
to exchange resources" (Grozev and Buyya, 2014).
Independent federations (or multi-clouds) "when multiple
clouds are used in aggregation by an application or its
broker. This approach is essentially independent of the
cloud provider" and focuses on the client-side of cloud
computing (Toosi et al., 2014).
This contribution focuses on independent federations
(multi-clouds). We do not propose a broker-based
solution (Barker et al., 2015) because cloud brokers
tend to just shift the vendor lock-in problem to a broker.
However, the following approaches show some similarities.
The following paragraphs briefly explain how the proposed
approach differs.
Approaches like OPTIMIS (Ferrer et al., 2012),
ConTrail (Carlini et al., 2012) or multi-cloud PaaS
platforms (Paraiso et al., 2012) enable dynamic
provisioning of cloud services targeting multi-cloud
architectures. These solutions have to provide a lot
of plugins to support possible implementation languages.
(Paraiso et al., 2012) mention at least 19 different
plugins (just for a research prototype). This increases
the inner complexity of such solutions. Container-based
approaches might be better suited to handle this kind of
complexity. Approaches like mOSAIC (Petcu et al., 2011) or
Cloud4SOA (Kamateri et al., 2013) assume that an
application can be divided into components according to a
service oriented application architecture (SOA). These
approaches rely on applications being bound to a specific
run-time environment. This is true for the proposed
approach as well. However, this paper proposes a solution
where the run-time environment (elastic container platform)
is up to a user decision as well.
The proposed deployment description format is
based on JSON. It is not at all unlike the kind of
deployment description languages used by TOSCA
(Brogi et al., 2014), CAMEL (Rossini, 2015) or
CloudML (Lushpenko et al., 2015). In fact, some
EC-funded projects like PaaSage⁵ (Baur and Domaschka,
2016) combine such deployment specification languages
with runtime environments. Nonetheless, this contribution
is focused on a more container-centric approach. Finally,
several libraries have been developed in recent years like
JClouds, LibCloud, DeltaCloud, SimpleCloud, Nuvem, CPIM
(Giove et al., 2013) to name a few. All these libraries
unify differences in the management APIs of clouds and
provide control over the provisioning of resources
across geographical locations. Configuration management
tools like Chef or Puppet address deployments as well.
But these solutions do not provide any (elastic) runtime
environments.
All in all, the proposed approach intends to be more
"pragmatic", "lightweight" and complexity hiding by using
existing elastic container platforms. On the downside, it
might be only applicable for container-based applications.
But using container platforms is becoming more and more
common in CNA engineering.
7 CONCLUSIONS
As emphasized throughout this contribution, elastic
container platforms are a viable and pragmatic option to
support multi-cloud handling which should be considered
more consistently in multi-cloud research. Elastic
container platforms provide inherent (but astonishingly
often overlooked) multi-cloud support. Multi-cloud
workflows to deploy, scale, migrate and terminate elastic
container platforms across different public and private
IaaS cloud infrastructures can be complex and challenging.
Instead, this paper proposed to define an intended
multi-cloud state of an elastic platform and to let a
control process take care of reaching this state. This
paper presented an implementation of such a control process
that is able to migrate and operate elastic container
platforms across different cloud service providers. It was
possible to transfer a 10 node cluster from AWS to GCE in
approximately six minutes. This control process can be used
as the execution phase in auto-scaling MAPE loops (Qu et
al., 2016). The presented cybernetic approach could be
evaluated successfully using common elastic container
platforms (Docker's Swarm Mode, Kubernetes) and IaaS
infrastructures (AWS, GCE, and OpenStack). Furthermore,
fruitful lessons learned about runtime behaviors of IaaS
operations and promising research directions like more
P2P-based and control-loop based designs of elastic
container platforms could be derived. The reader should be
aware that the presented approach might not be feasible for
applications and services outside the scope of CNAs.
Nevertheless, it seems that CNA architectures are becoming
a predominant architectural style for deploying and
operating services in the cloud.
⁵ see http://www.paasage.eu/ (last access 15th Feb. 2017)
ACKNOWLEDGEMENTS
This research is funded by the German Federal Ministry
of Education and Research (03FH021PX4). I would
like to thank Peter Quint, Christian Stüben, and Arne
Salveter for their hard work and their contributions to
the project CloudTRANSIT. Finally, let me thank all
anonymous reviewers for their valuable feedback that
improved this paper.
REFERENCES
Rossini, A. (2015). Cloud Application Modelling and Execution
Language (CAMEL) and the PaaSage Workflow.
In Advances in Service-Oriented and Cloud
Computing—Workshops of ESOCC 2015, volume
567, pages 437–439.
Balalaie, A., Heydarnoori, A., and Jamshidi, P. (2015).
Migrating to Cloud-Native Architectures Using Mi-
croservices: An Experience Report. In 1st Int. Work-
shop on Cloud Adoption and Migration (CloudWay),
Taormina, Italy.
Barker, A., Varghese, B., and Thai, L. (2015). Cloud Ser-
vices Brokerage: A Survey and Research Roadmap.
In 2015 IEEE 8th International Conference on Cloud
Computing, pages 1029–1032. IEEE.
Baur, D. and Domaschka, J. (2016). Experiences from
Building a Cross-cloud Orchestration Tool. In Proc.
of the 3rd Workshop on CrossCloud Infrastructures &
Platforms, CrossCloud ’16, pages 4:1–4:6, New York,
NY, USA. ACM.
Brogi, A., Soldani, J., and Wang, P. (2014). TOSCA in a
Nutshell: Promises and Perspectives, pages 171–186.
Springer Berlin Heidelberg, Berlin, Heidelberg.
Carlini, E., Coppola, M., Dazzi, P., Ricci, L., and Righetti,
G. (2012). Cloud Federations in Contrail. pages 159–
168. Springer Berlin Heidelberg.
Qu, C., Calheiros, R. N., and Buyya, R. (2016). Auto-scaling
Web Applications in Clouds: A Taxonomy and
Survey. CoRR, abs/1609.09224.
Fehling, C., Leymann, F., Retter, R., Schupeck, W., and Ar-
bitter, P. (2014). Cloud Computing Patterns: Funda-
mentals to Design, Build, and Manage Cloud Appli-
cations. Springer Publishing Company, Incorporated.
Ferrer, A. J., Hernandez, F., Tordsson, J., Elmroth, E., Ali-
Eldin, A., Zsigri, C., Sirvent, R., Guitart, J., Badia,
R. M., Djemame, K., Ziegler, W., Dimitrakos, T., Nair,
S. K., Kousiouris, G., Konstanteli, K., Varvarigou, T.,
Hudzia, B., Kipp, A., Wesner, S., Corrales, M., Forgo,
N., Sharif, T., and Sheridan, C. (2012). OPTIMIS: A
holistic approach to cloud service provisioning. Fu-
ture Generation Computer Systems, 28(1):66–77.
Giove, F., Longoni, D., Yancheshmeh, M. S., Ardagna, D.,
and Di Nitto, E. (2013). An Approach for the Devel-
opment of Portable Applications on PaaS Clouds. In
Proceedings of the 3rd International Conference on
Cloud Computing and Services Science, pages 591–
601. SciTePress - Science and Technology Publications.
Grozev, N. and Buyya, R. (2014). Inter-Cloud architectures
and application brokering: taxonomy and survey. Soft-
ware: Practice and Experience, 44(3):369–390.
Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A.,
Joseph, A. D., Katz, R. H., Shenker, S., and Stoica,
I. (2011). Mesos: A Platform for Fine-Grained Re-
source Sharing in the Data Center. In 8th USENIX
Conf. on Networked systems design and implementa-
tion (NSDI’11), volume 11.
Kamateri, E., Loutas, N., Zeginis, D., Ahtes, J., D’Andria,
F., Bocconi, S., Gouvas, P., Ledakis, G., Ravagli,
F., Lobunets, O., and Tarabanis, K. A. (2013).
Cloud4SOA: A Semantic-Interoperability PaaS Solu-
tion for Multi-cloud Platform Management and Porta-
bility. pages 64–78. Springer Berlin Heidelberg.
Kratzke, N. and Quint, P.-C. (2015a). About Automatic
Benchmarking of IaaS Cloud Service Providers for a
World of Container Clusters. Journal of Cloud Com-
puting Research, 1(1):16–34.
Kratzke, N. and Quint, P.-C. (2015b). How to Operate Con-
tainer Clusters more Efficiently? Some Insights Con-
cerning Containers, Software-Defined-Networks, and
their sometimes Counterintuitive Impact on Network
Performance. International Journal On Advances in
Networks and Services, 8(3&4):203–214.
Kratzke, N. and Quint, P.-C. (2017a). Investigation of Im-
pacts on Network Performance in the Advance of a
Microservice Design. In Helfert, M., Ferguson, D.,
Munoz, V. M., and Cardoso, J., editors, Cloud Com-
puting and Services Science Selected Papers, Com-
munications in Computer and Information Science
(CCIS). Springer.
Kratzke, N. and Quint, P.-C. (2017b). Understanding
Cloud-native Applications after 10 Years of Cloud
Computing - A Systematic Mapping Study. Journal
of Systems and Software, 126(April):1–16.
Lushpenko, M., Ferry, N., Song, H., Chauvel, F., and Sol-
berg, A. (2015). Using Adaptation Plans to Control
the Behavior of Models@Runtime. In Bencomo, N.,
Götz, S., and Song, H., editors, MRT 2015: 10th
Int. Workshop on Models@run.time, co-located with
MODELS 2015: 18th ACM/IEEE Int. Conf. on Model
Driven Engineering Languages and Systems, volume
1474 of CEUR Workshop Proceedings. CEUR.
Namiot, D. and Sneps-Sneppe, M. (2014). On micro-
services architecture. Int. Journal of Open Informa-
tion Technologies, 2(9).
Newman, S. (2015). Building Microservices. O’Reilly Me-
dia, Incorporated.
Pahl, C. and Jamshidi, P. (2015). Software architecture
for the cloud – A roadmap towards control-theoretic,
model-based cloud architecture. In Lecture Notes in
Computer Science (including subseries Lecture Notes
in Artificial Intelligence and Lecture Notes in Bioin-
formatics), volume 9278.
Paraiso, F., Haderer, N., Merle, P., Rouvoy, R., and Sein-
turier, L. (2012). A Federated Multi-cloud PaaS In-
frastructure. In 2012 IEEE Fifth International Con-
ference on Cloud Computing, pages 392–399. IEEE.
Peinl, R. and Holzschuher, F. (2015). The Docker Ecosys-
tem Needs Consolidation. In 5th Int. Conf. on Cloud
Computing and Services Science (CLOSER 2015),
pages 535–542.
Petcu, D., Craciun, C., Neagul, M., Lazcanotegui, I., and
Rak, M. (2011). Building an interoperability API for
Sky computing. In 2011 International Conference
on High Performance Computing & Simulation, pages
405–411. IEEE.
Petcu, D. and Vasilakos, A. V. (2014). Portability in clouds:
approaches and research opportunities. Scalable Com-
puting: Practice and Experience, 15(3):251–270.
Stine, M. (2015). Migrating to Cloud-Native Application
Architectures. O’Reilly.
Toosi, A. N., Calheiros, R. N., and Buyya, R. (2014). In-
terconnected Cloud Computing Environments. ACM
Computing Surveys, 47(1):1–47.
Verma, A., Pedrosa, L., Korupolu, M. R., Oppenheimer, D.,
Tune, E., and Wilkes, J. (2015). Large-scale cluster
management at Google with Borg. In 10th. Europ.
Conf. on Computer Systems (EuroSys ’15), Bordeaux,
France.
APPENDIX
Exemplary Cluster Definition File (JSON)
This cluster definition file defines a Swarm cluster
with the intended state to be deployed in two districts
provided by the two providers GCE and AWS. It defines
three user-defined node types (flavors): small, med,
and large. 3 master and 3 worker nodes should be deployed
on small virtual machine types in district gce-europe.
10 worker nodes should be deployed on small virtual
machine types in district aws-europe. The flavors small,
med, and large are defined in Listing 3.
{ "type": "cluster",
  "platform": "Swarm",
  // [...], Simplified for readability
  "flavors": ["small", "med", "large"],
  "deployments": [
    { "district": "gce-europe",
      "flavor": "small",
      "role": "master",
      "quantity": 3
    },
    { "district": "gce-europe",
      "flavor": "small",
      "role": "worker",
      "quantity": 3
    },
    { "district": "aws-europe",
      "flavor": "small",
      "role": "worker",
      "quantity": 10
    }
  ]
}
Listing 1: Cluster Definition (cluster.json).
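To illustrate how such a definition could be consumed, the following Python sketch parses the deployments of Listing 1 and sums the intended node quantities per district. It is a sketch only: the function name `nodes_per_district` is hypothetical, and the elision comment of the listing is left out because standard JSON parsers do not accept comments.

```python
import json
from collections import Counter

CLUSTER_JSON = """
{ "type": "cluster",
  "platform": "Swarm",
  "flavors": ["small", "med", "large"],
  "deployments": [
    { "district": "gce-europe", "flavor": "small", "role": "master", "quantity": 3 },
    { "district": "gce-europe", "flavor": "small", "role": "worker", "quantity": 3 },
    { "district": "aws-europe", "flavor": "small", "role": "worker", "quantity": 10 }
  ]
}
"""


def nodes_per_district(cluster: dict) -> Counter:
    """Sum the intended node quantities for each district."""
    totals: Counter = Counter()
    for deployment in cluster["deployments"]:
        totals[deployment["district"]] += deployment["quantity"]
    return totals


cluster = json.loads(CLUSTER_JSON)
print(dict(nodes_per_district(cluster)))  # {'gce-europe': 6, 'aws-europe': 10}
```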
Exemplary Credentials File (JSON)
The following credential file provides access credentials
for customer-specific GCE and AWS accounts
as identified by the district definition file (gce_default
and aws_default).
[ { "type": "credential",
    "id": "gce_default",
    "provider": "gce",
    "gce_key_file": "path-to-key.json"
  },
  { "type": "credential",
    "id": "aws_default",
    "provider": "aws",
    "aws_access_key_id": "AKID",
    "aws_secret_access_key": "SECRET"
  }
]
Listing 2: Credentials (credentials.json).
Exemplary District Definition File (JSON)
The following district definition defines provider
specific settings and mappings. The user defined dis-
trict gce-europe should be realized using the provider
specific GCE zones europe-west1-b and europe-
west1-c. Necessary and provider specific access set-
tings like project identifiers, regions, and credentials
are provided as well. User defined flavors (see clus-
ter definition format above) are mapped to concrete
provider specific machine types. The same is done
for the AWS district aws-europe.
[
  {
    "type": "district",
    "id": "gce-europe",
    "provider": "gce",
    "credential_id": "gce_default",
    "gce_project_id": "your-proj-id",
    "gce_region": "europe-west1",
    "gce_zones": [
      "europe-west1-b",
      "europe-west1-c"
    ],
    "flavors": [
      { "flavor": "small",
        "machine_type": "n1-standard-1"
      },
      { "flavor": "med",
        "machine_type": "n1-standard-2"
      },
      { "flavor": "large",
        "machine_type": "n1-standard-4"
      }
    ]
  },
  {
    "type": "district",
    "id": "aws-europe",
    "provider": "aws",
    "credential_id": "aws_default",
    "aws_region": "eu-central-1",
    "flavors": [
      { "flavor": "small",
        "instance_type": "m3.medium"
      },
      { "flavor": "med",
        "instance_type": "m4.large"
      },
      { "flavor": "large",
        "instance_type": "m4.xlarge"
      }
    ]
  }
]
Listing 3: District Definitions (districts.json).
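The mapping from user-defined flavors to provider-specific machine types can be sketched as a simple lookup. The following Python example works on a shortened copy of Listing 3. The function name `resolve` and the `TYPE_KEY` table are hypothetical; the assumption that the key naming the machine type depends on the provider is derived from the listing itself.

```python
import json

DISTRICTS_JSON = """
[
  { "type": "district", "id": "gce-europe", "provider": "gce",
    "flavors": [
      { "flavor": "small", "machine_type": "n1-standard-1" },
      { "flavor": "med", "machine_type": "n1-standard-2" }
    ]
  },
  { "type": "district", "id": "aws-europe", "provider": "aws",
    "flavors": [
      { "flavor": "small", "instance_type": "m3.medium" },
      { "flavor": "med", "instance_type": "m4.large" }
    ]
  }
]
"""

# Assumption: the key that names the machine type depends on the provider.
TYPE_KEY = {"gce": "machine_type", "aws": "instance_type"}


def resolve(districts: list, district_id: str, flavor: str) -> str:
    """Map a user-defined flavor to the provider-specific machine type."""
    for district in districts:
        if district["id"] != district_id:
            continue
        key = TYPE_KEY[district["provider"]]
        for entry in district["flavors"]:
            if entry["flavor"] == flavor:
                return entry[key]
    raise KeyError(f"no flavor {flavor!r} in district {district_id!r}")


districts = json.loads(DISTRICTS_JSON)
print(resolve(districts, "gce-europe", "small"))  # n1-standard-1
print(resolve(districts, "aws-europe", "med"))    # m4.large
```

Keeping this lookup in the district definition is what allows the cluster definition to stay free of infrastructure specifics, as recommended in lesson 6.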
CLOSER 2017 - 7th International Conference on Cloud Computing and Services Science
42
... This component is capable to operate and transfer elastic container platforms in multi-cloud contexts at runtime without downtimes. We refer to the original studies [51,52,53,54] and our software prototypes PLAIN, ECP DEPLOY, and OPEN4SSH (see Appendix D: Table 18) for more in-depth information. ...
... and the necessity to be applied on ECPs operated in a way described by [51]. ...
... The DSL must be designed to be independent of a specific ECP or cloud infrastructure. This requirement comprises [AD], and the necessity to be applied on ECPs operated in a way described by [51]. ...
Technical Report
Full-text available
The project CloudTRANSIT dealt with the question of how to transfer cloud applications and services at runtime without downtime across cloud infrastructures from different public and private cloud service providers. This technical report summarizes the outcomes of approximately 20 research papers that have been published throughout the project. This report intends to provide an integrated birds-eye view on these-so far-isolated papers. The report references the original papers where ever possible. This project also systematically investigated practitioner initiated cloud application engineering trends of the last three years that provide several promising technical opportunities to avoid cloud vendor lock-in pragmatically. Especially European cloud service providers should track such kind of research because of the technical opportunities to bring cloud application workloads back home to Europe. Such workloads are currently often deployed and inherently bound to U.S. providers. Intensified EU General Data Protection (GDPR) policies, European Cloud Initiatives, or "America First" policies might even make this imperative. So, technical solutions needed for these scenarios that are manageable not only by large but also by small and medium-sized enterprises. Therefore, this project systematically analyzed commonalities of cloud infrastructures and cloud applications. Latest evolutions of cloud standards and cloud engineering trends (like containerization) were used to derive a cloud-native reference model (ClouNS) that guided the development of a pragmatic cloud-transferability solution. This solution intentionally separated the infrastructure-agnostic operation of elastic container platforms (like Swarm, Kubernetes, Mesos/Marathon, etc.) via a multi-cloud-scaler and the platform-agnostic definition of cloud-native applications and services via an unified cloud application modeling language. Both components are independent but complementary. 
Because of their independence, they can even contribute (although not intended) to other fields like moving target based cloud security-but also distributed ledger technologies (block-chains) made provide options here. The report summarizes the main outcomes and insights of a proof-of-concept solution to realize transferability for cloud applications and services at runtime without downtime.
... Moreover, we analyze the potential of Kubernetes to detect and configure compatible support for desired features across these vendors in a uniform manner. This analysis is inspired by the fact that existing state-of-the-art on modeldriven migration of clusters across cloud providers [17] is very similar to the declarative configuration management approach of Kubernetes. Declarative configuration management approaches enforce a desired cluster state by a control loop that detects differences between desired and actual cluster state. ...
... In both cases, the need arises to ensure that a particular application or cluster resource works the same across multiple cloud providers either when migrating or replicating these resources to another cloud provider. The existing stateof-the-art in cloud migration and multi-cloud resource orchestration has thoroughly validated the model-driven approach that allows for infrastructure-agnostic configuration management of container orchestration platforms across multiple cloud providers [17], [6], [101], [102]. ...
... We already referred to our own work on an infrastructureagnostic middleware platform to transfer container clusters from one cloud provider to another cloud provider [17], [6]. As the requirements of this middleware platform favor pragmatism over expressiveness [20], the middleware platform supports commonly supported features that are VOLUME XX, 2017 supported by Kubernetes, Docker Swarm and Mesos and therefore ignore many unique features of Kubernetes. ...
Article
Kubernetes (k8s) is a kind of cluster operating system for cloud-native workloads that has become a de-facto standard for container orchestration. Provided by more than one hundred vendors, it has the potential to protect the customer from vendor lock-in. However, the open-source k8s distribution consists of many optional and alternative features that must be explicitly activated and may depend on pre-configured system components. As a result, incompatibilities still may ensue among Kubernetes vendors. Mostly managed k8s services typically restrict the customizability of Kubernetes. This paper firstly compares the most relevant k8s vendors and, secondly, analyses the potential of Kubernetes to detect and configure compatible support for required features across vendors in a uniform manner. Our comparison is performed based on documented features, by testing, and by inspection of the configuration state of running clusters. Our analysis focuses on the potential of the end-to-end testing suite of Kubernetes to detect support for a desired feature in any Kubernetes vendor and the possibility of reconfiguring the studied vendors with missing features in a uniform manner. Our findings are threefold: First, incompatibilities arise between default cluster configurations of the studied vendors for approximately 18% of documented features. Second, matching end-to-end tests exist only for around 64% of features and for 17% of features these matching tests are not well developed for all vendors. Third, almost all feature incompatibilities can be resolved using a vendor-agnostic API. These insights are beneficial to avoid feature incompatibilities already in cloud-native application engineering processes. Moreover, the end-to-end testing suite can be extended in currently unlighted areas to provide better feature coverage.
... Recent research [3], [4] successfully made use of elastic container platforms (see Table I) and their "designed for failure" capabilities to realize transferability of cloud-native applications at runtime. By transferability, the conducted research means that a cloud-native application can be moved from one IaaS provider infrastructure to another without any downtime. ...
... Figure 3. So, platforms can be operated elastically in a set of synchronized IaaS infrastructures. Explained in detail by [3]. ...
Article
Cloud applications expose, besides service endpoints, also potential or actual vulnerabilities. Therefore, cloud security engineering efforts focus on hardening the fortress walls but seldom assume that attacks may be successful. At least against zero-day exploits, this approach is often toothless. Other than most security approaches and comparable to biological systems, we accept that defensive "walls" can be breached at several layers. Instead of hardening the "fortress" walls, we propose to make use of an (additional) active and adaptive defense system to attack potential intruders. This immune system is inspired by the concept of a moving target defense. This "immune system" works on two layers. On the infrastructure layer, virtual machines are continuously regenerated (cell regeneration) to wipe out even undetected intruders. On the application level, the vertical and horizontal attack surface is continually modified to circumvent successful replays of formerly scripted attacks. Our evaluations with two common cloud-native reference applications in popular cloud service infrastructures (Amazon Web Services, Google Compute Engine, Azure and OpenStack) show that it is technically possible to limit the time of attackers acting undetected down to minutes. Further, more than 98% of an attack surface can be changed automatically and minimized, which makes it hard for intruders to replay formerly successful scripted attacks. So, even if intruders get a foothold in the system, it is hard for them to maintain it. Therefore, our proposals are robust and dynamically change in response to security threats, similar to biological immune systems.
... Figure 3. So, platforms can be operated elastically in a set of synchronized IaaS infrastructures. Explained in detail by [3]. We could even regenerate the complete cluster by changing the cluster size in the following way: ...
Preprint
Cloud applications expose - besides service endpoints - also potential or actual vulnerabilities. Therefore, cloud security engineering efforts focus on hardening the fortress walls but seldom assume that attacks may be successful. At least against zero-day exploits, this approach is often toothless. Other than most security approaches and comparable to biological systems we accept that defensive "walls" can be breached at several layers. Instead of hardening the "fortress" walls we propose to make use of an (additional) active and adaptive defense system to attack potential intruders - an immune system that is inspired by the concept of a moving target defense. This "immune system" works on two layers. On the infrastructure layer, virtual machines are continuously regenerated (cell regeneration) to wipe out even undetected intruders. On the application level, the vertical and horizontal attack surface is continuously modified to circumvent successful replays of formerly scripted attacks. Our evaluations with two common cloud-native reference applications in popular cloud service infrastructures (Amazon Web Services, Google Compute Engine, Azure and OpenStack) show that it is technically possible to limit the time of attackers acting undetected down to minutes. Further, more than 98% of an attack surface can be changed automatically and minimized which makes it hard for intruders to replay formerly successful scripted attacks. So, even if intruders get a foothold in the system, it is hard for them to maintain it.
... This Chapter extends the ideas formulated in [17] and focuses on the complexity to transfer Cloud-native Applications (CNA) at runtime which seems (even after 10 years of cloud computing) to be an astonishingly complex problem [22,31]. It can be hard to operate a CNA across different public or private infrastructures. ...
... Deployment options and transferability opportunities: taken from [17] ...
Preprint
Cloud-native applications are often designed for only one specific cloud infrastructure or platform. The effort to port such applications into a different cloud is usually a laborious one-time exercise. Modern cloud-native application architecture approaches make use of popular elastic container platforms (Apache Mesos, Kubernetes, Docker Swarm). These kinds of platforms contribute to a lot of existing cloud engineering requirements. Given this, it is astonishing that these kinds of platforms (already existing and available as open source) are not considered more consistently for multi-cloud solutions. These platforms provide inherent multi-cloud support, but this is often overlooked. This paper presents a software prototype and shows how Kubernetes and Docker Swarm clusters could be successfully transferred at runtime across public cloud infrastructures of Google (Google Compute Engine), Microsoft (Azure) and Amazon (EC2) and further cloud infrastructures like OpenStack. Additionally, software engineering lessons learned are derived and some astonishing performance data of the mentioned cloud infrastructures is presented that could be used for further optimizations of IaaS transfers of cloud-native applications.
... This Chapter extends the ideas formulated in [17] and focuses on the complexity to transfer Cloud-native Applications (CNA) at runtime which seems (even after 10 years of cloud computing) to be an astonishingly complex problem [22,31]. It can be hard to operate a CNA across different public or private infrastructures. ...
Chapter
Cloud-native applications are often designed for only one specific cloud infrastructure or platform. The effort to port such applications into a different cloud is usually a laborious one-time exercise. Modern cloud-native application architecture approaches make use of popular elastic container platforms (Apache Mesos, Kubernetes, Docker Swarm). These kinds of platforms contribute to a lot of existing cloud engineering requirements. Given this, it is astonishing that these kinds of platforms (already existing and available as open source) are not considered more consistently for multi-cloud solutions. These platforms provide inherent multi-cloud support, but this is often overlooked. This paper presents a software prototype and shows how Kubernetes and Docker Swarm clusters could be successfully transferred at runtime across public cloud infrastructures of Google (Google Compute Engine), Microsoft (Azure) and Amazon (EC2) and further cloud infrastructures like OpenStack. Additionally, software engineering lessons learned are derived and some astonishing performance data of the mentioned cloud infrastructures is presented that could be used for further optimizations of IaaS transfers of cloud-native applications.
... The elastic nature is incorporated with the use of CLUES manager. Kratzke attempts to enable multi-cloud deployments of cloud-native applications [31]. The inherent nature of cloud-native applications is that it is developed for a specific Cloud provider. ...
Chapter
Microservice architectures (MSA), composed of loosely coupled and autonomous units called microservices, are gaining wide adoption in the software field. With characteristics that are loyal to the requirements of the Cloud environment, such as inherent support for continuous integration/continuous deployment (CI/CD), MSA are actively embraced by the Cloud computing community. Containers employing lightweight virtualization have also been increasingly adopted in the Cloud environment. The containers wrap applications along with their dependencies into self-contained units, which can be deployed independently. These features make it the unanimously accepted technology to enable seamless execution of microservices in the Cloud. With this outlook, this chapter undertakes a study on how containers may be used to support the execution of microservices. The study also includes other technologies that, in collaboration with container technologies, provide the support required for running microservices in the Cloud. An interesting concern for applications running on containers is resource management. Nevertheless, this is a significant aspect for supporting microservices as well. Such issues have been identified and research works addressing all or some of these issues, have been considered. The various relevant studies have been classified into different categories and the future directions have been identified, which can be used by researchers aiming to enhance the technological support for microservices in Cloud.
... Both components are independent but complementary and provide a solution to operate elastic (container) platforms in an infrastructure-agnostic, secure, transferable, and elastic way. This multi-cloud-scaler is described in [6,7]. Additionally, we had to find a solution to describe cloud applications in a unified format. ...
Article
This paper presents a review of cloud application architectures and its evolution. It reports observations being made during a research project that tackled the problem to transfer cloud applications between different cloud infrastructures. As a side effect, we learned a lot about commonalities and differences from plenty of different cloud applications which might be of value for cloud software engineers and architects. Throughout the research project, we analyzed industrial cloud standards, performed systematic mapping studies of cloud-native application-related research papers, did action research activities in cloud engineering projects, modeled a cloud application reference model, and performed software and domain-specific language engineering activities. Two primary (and sometimes overlooked) trends can be identified. First, cloud computing and its related application architecture evolution can be seen as a steady process to optimize resource utilization in cloud computing. Second, these resource utilization improvements resulted over time in an architectural evolution of how cloud applications are being built and deployed. A shift from monolithic service-oriented architectures (SOA), via independently deployable microservices towards so-called serverless architectures, is observable. In particular, serverless architectures are more decentralized and distributed and make more intentional use of separately provided services. In other words, a decentralizing trend in cloud application architectures is observable that emphasizes decentralized architectures known from former peer-to-peer based approaches. This is astonishing because, with the rise of cloud computing (and its centralized service provisioning concept), the research interest in peer-to-peer based approaches (and its decentralizing philosophy) decreased. However, this seems to change. Cloud computing could head into the future of more decentralized and more meshed services.
Preprint
This paper presents a review of cloud application architectures and their evolution. It reports observations made during the course of a research project that tackled the problem of transferring cloud applications between different cloud infrastructures. As a side effect, we learned a lot about commonalities and differences from plenty of different cloud applications, which might be of value for cloud software engineers and architects. Throughout the course of the research project we analyzed industrial cloud standards, performed systematic mapping studies of cloud-native application related research papers, performed action research activities in cloud engineering projects, modeled a cloud application reference model, and performed software and domain-specific language engineering activities. Two major (and sometimes overlooked) trends can be identified. First, cloud computing and its related application architecture evolution can be seen as a steady process to optimize resource utilization in cloud computing. Second, these resource utilization improvements resulted over time in an architectural evolution of how cloud applications are being built and deployed. A shift from monolithic service-oriented architectures (SOA), via independently deployable microservices, towards so-called serverless architectures is observable. Serverless architectures in particular are more decentralized and distributed, and make more intentional use of independently provided services. In other words, a decentralizing trend in cloud application architectures is observable that emphasizes decentralized architectures known from former peer-to-peer based approaches. That is astonishing because with the rise of cloud computing (and its centralized service provisioning concept) the research interest in peer-to-peer based approaches (and their decentralizing philosophy) decreased. But this seems to change. Cloud computing could head into a future of more decentralized and more meshed services.
Chapter
Developing cloud-native applications demands a radical shift from the way we design and build traditional applications. Application designers usually divide their business logic into several business functions, each developed according to a microservices architectural style and packaged in containers. Throughout the stages of cloud-native application development (development, testing, staging and production), container orchestration helps coordinate the execution environment. Thanks to the increasing popularity of cloud-native applications, there has been a growing interest in container and orchestration technologies recently. However, despite their closeness, these two inter-related technologies are supported by different toolsets and specification formalisms, with minimal portability between them and usually a disregard for the best practices. This paper presents velo, a domain-specific language (DSL) that unifies containerisation and orchestration concepts. velo has two components: (1) a specification language that supports an abstract description of containerisation and orchestration for a complex application; and (2) a transpiler, a source-to-source compiler into concrete container manifest and orchestration description.
Article
It is common sense that cloud-native applications (CNA) are intentionally designed for the cloud. Although this understanding can be broadly used, it does not guide and explain what a cloud-native application exactly is. The term "cloud-native" was used quite frequently in the early days of cloud computing (2006), which seems somewhat obvious nowadays. But the term disappeared almost completely. In recent years, the term has suddenly been used again more and more frequently and shows increasing momentum. This paper summarizes the outcomes of a systematic mapping study analyzing research papers covering "cloud-native" topics, research questions and engineering methodologies. We summarize research focuses and trends dealing with cloud-native application engineering approaches. Furthermore, we provide a definition for the term "cloud-native application" which takes all findings, insights of analyzed publications and already existing and well-defined terminology into account.
Conference Paper
Due to REST-based protocols, microservice architectures are inherently horizontally scalable. That might be why the microservice architectural style is getting more and more attention for cloud-native application engineering. Corresponding microservice architectures often rely on a complex technology stack which includes containers, elastic platforms and software defined networks. Astonishingly, there are almost no specialized tools to figure out performance impacts (coming along with this microservice architectural style) in the upfront of a microservice design. Therefore, we propose a benchmarking solution intentionally designed for this upfront design phase. Furthermore, we evaluate our benchmark and present some performance data to reflect some often heard cloud-native application performance rules (or myths).
Article
Web application providers have been migrating their applications to cloud data centers, attracted by the emerging cloud computing paradigm. One of the appealing features of the cloud is elasticity. It allows cloud users to acquire or release computing resources on demand, which enables web application providers to auto-scale the resources provisioned to their applications under dynamic workload in order to minimize resource cost while satisfying Quality of Service (QoS) requirements. In this paper, we comprehensively analyze the challenges that remain in auto-scaling web applications in clouds and review the developments in this field. We present a taxonomy of auto-scaling systems according to the identified challenges and key properties. We analyze the surveyed works and map them to the taxonomy to identify the weaknesses in this field. Moreover, based on the analysis, we propose new future directions.
Conference Paper
A Cloud Services Brokerage (CSB) acts as an intermediary between cloud service providers (e.g., Amazon and Google) and cloud service end users, providing a number of value-adding services. CSBs as a research topic are in their infancy. The goal of this paper is to provide a concise survey of existing CSB technologies in a variety of areas and highlight a roadmap, which details five future opportunities for research.
Article
In previous work, we concluded that container technologies and overlay networks typically have negative performance impacts, mainly due to an additional networking layer. This is what everybody would expect; only the degree of impact might be questionable. These negative performance impacts can be accepted (if they stay moderate) in exchange for better flexibility and manageability of the resulting systems. However, we drew our conclusions only from data covering small-core machine types. This extended work additionally analyzed the impact of various (high-core) machine types of different public cloud service providers (Amazon Web Services, AWS, and Google Compute Engine, GCE) and comes to a more differentiated view and some astonishing results for high-core machine types. Our findings suggest that a simple and cost-effective strategy is to operate container clusters with highly similar high-core machine types (even across different cloud service providers). This strategy should appropriately cover the major relevant and complex effects of containers, container clusters and software-defined networks that reduce data transfer rates.
Conference Paper
The PaaSage project delivers a platform to support the modelling, execution, and automatic adaptation of multi-cloud applications (i.e., applications deployed across multiple private, public, or hybrid cloud infrastructures). In order to cover the necessary aspects of the modelling and execution of multi-cloud applications, PaaSage adopts the Cloud Application Modelling and Execution Language (CAMEL). In particular, PaaSage leverages upon CAMEL models that are progressively refined throughout the modelling, deployment, and execution phases of the PaaSage workflow. By leveraging upon CAMEL models not only at design-time but also run-time, PaaSage enables self-adaptive multi-cloud applications (i.e., multi-cloud applications that automatically adapt to changes in the environment).
Conference Paper
Even after being around for almost a decade, the cloud still has tremendous complexity and suffers from vendor lock-in. Users still face poor comparability of performance and quality of service of different operators. This still hinders users of all kinds to take full advantage of clouds and even more of cross-cloud deployments. While multiple tools exist promising to address all those issues, they still lack fundamental features. Based on an analysis of existing tools, we introduce Cloudiator, an open source software, addressing existing issues in cross-cloud orchestration and depict our experiences and insights gained while designing and implementing it.
Article
Cloud computing is becoming important for the ICT industry. Despite undoubted advantages in terms of scalability and cost savings, today there are still issues hindering its widespread adoption due to the limited portability of applications. To address this problem, in this paper we propose a new approach for the development of portable applications for Platform as a Service (PaaS) systems. This is based on a Java library exposing a vendor-independent API that provides an abstract intermediation layer for the most important middleware services typically offered by PaaS systems (e.g., NoSQL services, message queues and memcache). The current version of our library supports the portability of applications across Java platforms for Google App Engine and Windows Azure. We have conducted some experiments especially focusing on evaluating the performance degradation introduced by our library when executing an application on both PaaS systems. The experiments demonstrate that such degradation is not significant.