About the Complexity to Transfer Cloud
Applications at Runtime and how Container
Platforms can Contribute?
Nane Kratzke
Lübeck University of Applied Sciences,
Center for Communication, Systems and Applications (CoSA), Germany,
nane.kratzke@fh-luebeck.de
Abstract. Cloud-native applications are often designed for only one specific cloud infrastructure or platform. The effort to port such applications into a different cloud is usually a laborious one-time exercise. Modern Cloud-native application architecture approaches make use of popular elastic container platforms (Apache Mesos, Kubernetes, Docker Swarm). These kinds of platforms contribute to a lot of existing cloud engineering requirements. Given this, it is astonishing that such platforms (already existing and available as open source) are not considered more consistently for multi-cloud solutions. These platforms provide inherent multi-cloud support, but this is often overlooked. This paper presents a software prototype and shows how Kubernetes and Docker Swarm clusters can be transferred successfully at runtime across public cloud infrastructures of Google (Google Compute Engine), Microsoft (Azure) and Amazon (EC2) and further cloud infrastructures like OpenStack. Additionally, software engineering lessons learned are derived, and some astonishing performance data of the mentioned cloud infrastructures is presented that could be used for further optimizations of IaaS transfers of Cloud-native applications.
Keywords: cloud-native application, multi-cloud, elastic platform, container, portability, transferability, MAPE, AWS, GCE, Azure, OpenStack, Kubernetes, Docker, Swarm
1 Introduction
This Chapter extends the ideas formulated in [17] and focuses on the complexity of transferring Cloud-native Applications (CNA) at runtime, which even after 10 years of cloud computing seems to be an astonishingly complex problem [22, 31]. It can be hard to operate a CNA across different public or private infrastructures, very often because standardization in cloud computing is not well established or still in very early stages. A very good case study is Instagram. Instagram had to analyze its existing services for almost one year to derive a viable migration plan for transferring these services from Amazon Web Services (AWS) to Facebook datacenters¹.
Table 1. Some popular open source elastic platforms. These kinds of platforms can be used as a kind of cloud infrastructure unifying middleware.

Platform    Contributors                  URL
Kubernetes  Cloud Native Comput. Found.   http://kubernetes.io (initiated by Google)
Swarm       Docker                        https://docker.io
Mesos       Apache                        http://mesos.apache.org/
Nomad       Hashicorp                     https://nomadproject.io/
This migration was accompanied by some observable outages for service customers. This phenomenon is called a vendor lock-in, and CNAs seem to be extremely vulnerable to it [18]. Therefore, the author proposes to think about how to deploy CNAs in order to get migration capabilities across different cloud infrastructures by design. The central idea is to split the migration problem into two independent engineering problems which are too often solved together.
1. The infrastructure-aware deployment and operation of elastic container platforms (like the platforms listed in Table 1). These platforms can be deployed and operated in a way that they can be transferred across IaaS infrastructures of different private and public cloud service providers at runtime, as this contribution will show.
2. The infrastructure-agnostic deployment of applications on top of these kinds of migratable container platforms. These elastic container platforms can be seen as a kind of cloud infrastructure unifying middleware.
The main point of this contribution is to make use of elastic container platforms that can be used to abstract and encapsulate IaaS infrastructure specifics. That makes it possible to define CNAs that need not be aware of the specifics of the underlying cloud infrastructures. CNAs are operated on a logical platform, and this logical platform is transferable across different public or private IaaS infrastructures. Although this is technologically possible, as this contribution will show, almost no recent multi-cloud survey study (see Section 6) considered elastic container platforms (see Table 1) as a viable and pragmatic option to support this style of multi-cloud handling. It is very astonishing that this kind of already existing and open-source-available technology is not considered more consistently (see Section 6). That might have to do with the fact that "the emergence of containers, especially container supported microservices and service pods, has raised a new revolution in [...] resource management" [9]. However, container technologies and elastic container platforms gained substantial momentum in recent years and resulted in a lot of technological progress driven by companies like Docker, Netflix, Google, Facebook, and Twitter. A lot of these companies released their solutions as open source software. Having this progress in mind, we have to state that existing multi-cloud approaches often date from before container technologies became widespread and seem very complex, much too complex for a lot of use cases of cloud-native applications. This Chapter considers this progress and presents a software prototype that provides the following:
- Section 4.2 presents a control loop that is able to scale elastic container platforms in multi-cloud scenarios. The control loop makes use of the same features that provide scalability to support federation and transferability across multiple IaaS cloud infrastructures as a side-effect.
- The intention of this control loop is to be used in the execution phase of higher-level auto-scaling MAPE loops (monitoring, analysis, planning, execution) [26, 9] and to make the necessity for complex and IaaS infrastructure-specific multi-cloud workflows redundant (to some degree).
Section 2 will investigate how CNAs are being built and how to use these insights to avoid vendor lock-in in a pragmatic and often overlooked way. Section 3 will focus on requirements which should be fulfilled by multi-cloud capable CNAs and on how existing open source elastic container platforms contribute pragmatically to fulfill these requirements in a resilient and elastic way. Section 4 presents a multi-cloud aware proof-of-concept. Several lessons learned from the evaluation, performance analysis and software prototyping are presented in Section 5. The presented approach is related to other work in Section 6.

¹ To the best of the author's knowledge there are no research papers analyzing this interesting case study. So, the reader is referred to a Wired magazine article: https://www.wired.com/2014/06/facebook-instagram/
2 Commonalities of Cloud-native Applications
There exist noteworthy similarities between various viewpoints on the vague term CNA [22]. A common approach is to define maturity levels in order to categorize different kinds of cloud applications. Table 2 shows a maturity model proposed by the Open Data Center Alliance. Common motivations for CNA architectures are software development speed (time to market), fault isolation, fault tolerance, and automatic recovery to improve safety, and to enable horizontal (instead of vertical) application scalability [32]. Fehling et al. [10] proposed the IDEAL model for CNAs. A CNA should strive for an isolated state, is distributed, provides elasticity in a horizontal scaling way, and should be operated on an automated deployment machinery. Finally, its components should be loosely coupled.
Balalaie et al. [4] stress that these properties are addressed by cloud-specific architecture and infrastructure approaches like Microservices [24, 25], API-based collaboration, adoption of cloud-focused patterns [10], and self-service elastic platforms that are used to deploy and operate these microservices via self-contained deployment units (containers). These platforms provide additional operational capabilities on top of IaaS infrastructures, like automated and on-demand scaling of application instances, application health management, dynamic routing and load balancing, as well as aggregation of logs and metrics [22]. Some open source examples of such elastic platforms are listed in Table 1.
Table 2. Cloud Application Maturity Model, adapted from OPEN DATA CENTER ALLIANCE Best Practices (Architecting Cloud-Aware Applications) [3]

Level  Maturity         Criteria
3      Cloud native     - A CNA can migrate across infrastructure providers at runtime
                          and without interruption of service (focus of this Chapter).
                        - A CNA can automatically scale out/in based on stimuli.
2      Cloud resilient  - The application state is isolated in a minimum of services.
                        - The application is unaffected by dependent service failures.
                        - The application is infrastructure agnostic.
1      Cloud friendly   - The application is composed of loosely coupled services.
                        - Application services are discoverable by name (not by IP).
                        - Application components are designed using cloud patterns.
                        - Application compute and storage are separated.
0      Cloud ready      - The application runs on virtualized infrastructure.
                        - The application can be instantiated from an image or script.
If the reader understands the commonality that CNAs are operated (more and more often) on elastic, often container-based, platforms, it is an obvious idea to delegate the multi-cloud handling down to these platforms. The question is how to do this. Therefore, the multi-cloud aware handling of these elastic platforms is the focus throughout this Chapter.
3 Multi-Cloud Specifics
A lot of requirements regarding transferability, awareness and security come along with multi-cloud approaches [5, 30, 33, 13]. These requirements will be addressed in this Section. Furthermore, it is investigated how already existing elastic container platforms contribute to fulfilling these requirements [18]. Impatient readers may jump directly to Table 3 at the end of this Section, which summarizes the main points.
3.1 Transferability requirements
Cloud computing can be understood as a computing model making use of ubiquitous network access to a shared and virtualized pool of resources. Sadly, this conceptual model is quite vague and has been implemented by a large number of service providers in different and not necessarily standardized or compatible ways. So, portability or transferability has to be requested for multi-cloud capable CNAs due to several reasons like costs, optimal resource utilization, technology changes, or legal issues [30].
Elastic container platforms (see Table 1) compose container hosts (nodes) into a higher-level logical concept: a cluster. Such clusters provide self-service elastic platforms for cloud-native applications [32] in an often overlooked way. Some of these platforms are really "bulletproof". Apache Mesos [14] has been
successfully operated for years by companies like Twitter or Netflix to consolidate hundreds of thousands of compute nodes. Peinl and Holzschuher [28] provide an excellent overview for interested readers. From the author's point of view, there are the following benefits of using these elastic container platforms.
1. The integration of single nodes (container hosts) into one logical cluster is mainly done to manage complexity. However, it is possible to deploy such clusters across public and private cloud infrastructures.
2. If elastic container platforms are deployed across different cloud service providers, they can still be accessed as one logical cluster. And cross-provider deployed elastic container platforms obviously avoid vendor lock-in.
3. Furthermore, elastic container platforms are designed for failure and provide self-healing capabilities via auto-placement, auto-restart, auto-replication and auto-scaling features. They will identify lost containers (due to whatever reasons, e.g. process failure or node unavailability) and will restart containers and place them on remaining nodes. These features are absolutely necessary to operate large-scale distributed systems in a resilient way. However, exactly the same features can be used intentionally to realize transferability requirements.
A cluster can be resized simply by adding or removing nodes. Affected containers will be rescheduled to other available nodes. Imagine a cluster deployed on AWS that should be transferred to GCE. In a first step, we simply attach additional nodes provisioned by GCE to the cluster. In a second step, we shut down all nodes provided by AWS. The cluster will observe node failures and trigger its self-healing features to reschedule lost containers accordingly. From an inner point of view of the platform, rescheduling operations are triggered by node failures. From an outside point of view it looks like (and in fact is) a migration from one provider to another provider at runtime. At this point this should make the general idea clear. All the details will be explained in Section 4, and further multi-cloud options like public cloud exits, cloud migrations, public multi-clouds, hybrid clouds, overflow processing and more can be handled using the same approach (see Figure 1).
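To make the general idea more concrete, the following Ruby sketch outlines such a provider transfer as a plain attach-then-drain sequence. The district objects and their methods (launch_node, drain, leave_cluster, terminate_node) are illustrative assumptions and not the actual API of the prototype presented in Section 4.

  # Illustrative sketch: transfer all worker nodes from a source district
  # (e.g. AWS) to a target district (e.g. GCE). Method names are assumptions.
  def transfer_workers(source_district, target_district, quantity)
    # Step 1: attach compensating worker nodes in the target district first.
    new_nodes = Array.new(quantity) { target_district.launch_node(role: "worker") }
    new_nodes.each(&:join_cluster)

    # Step 2: drain and remove the worker nodes of the source district.
    # The platform scheduler observes the vanishing nodes and reschedules
    # the affected containers onto the freshly attached nodes.
    source_district.worker_nodes.each do |node|
      node.drain            # graceful shutdown triggers rescheduling
      node.leave_cluster
      source_district.terminate_node(node)
    end
  end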
3.2 Awareness requirements
Besides portability/transferability requirements, multi-cloud applications need to have several additional awarenesses [13]:
1. Data location awareness: The persistent data and the processing units of an application should be in the same data center (or even on the same rack).
2. Geo-location awareness: To achieve better performance, requests should be scheduled near the geographical location of their origin.
3. Pricing awareness: Fiscally efficient provisioning needs information about providers' prices for scheduling operations.
Fig. 1. Deployment options and transferability opportunities: taken from [17]
4. Legislation/policy awareness: For some use cases legislative and political
considerations upon provisioning and scheduling must be taken into account.
For example, some services could be required to avoid placing data outside
a given country.
5. Local resources awareness: Very often in-house resources should have
higher priority than external ones (overflow processing into a public cloud).
Platforms like Kubernetes, Mesos and Docker Swarm are able to tag nodes of their clusters with arbitrary key/value pairs. These tags can be used to encode geo-locations, prices, policies and preferred local resources (and further aspects). So, the above mentioned awareness requirements can be mapped to scheduling constraints for the container schedulers of elastic platforms.
For instance, Docker Swarm uses constraint filters² for this kind of purpose. Arbitrary tags like a location tag "Germany" can be assigned to a node. This tagging can be defined in a cluster definition file and will be assigned to a node in the install node step shown in Figure 4. Docker Swarm would schedule a CouchDB database container only on nodes which are tagged as "location=Germany" if a constraint filter is applied as shown:
docker run -e constraint:location==Germany couchdb
Kubernetes provides similar tag-based concepts called node selectors and even more expressive (anti-)affinities, which are considered by the Kubernetes scheduler³. The Marathon framework for Mesos uses constraints⁴. All of these concepts rely on the same idea and can be used to handle the mentioned awareness requirements.
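To illustrate how such node tags could be turned into scheduler-specific constraints, the following Ruby sketch shows a helper that a platform driver might provide; the method name, the tag format and the Marathon constraint operator are illustrative assumptions, not the actual prototype API.

  # Illustrative sketch: translate tags (e.g. from a cluster definition file)
  # into platform specific scheduling constraints.
  def scheduling_constraints(platform, tags)
    case platform
    when :swarm       # legacy Docker Swarm constraint filters
      tags.map { |key, value| "-e constraint:#{key}==#{value}" }.join(" ")
    when :kubernetes  # node selectors (the nodes must be labeled accordingly)
      { "nodeSelector" => tags }
    when :marathon    # Marathon constraints
      tags.map { |key, value| [key, "CLUSTER", value] }
    end
  end

  scheduling_constraints(:swarm, "location" => "Germany")
  # => "-e constraint:location==Germany"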
² See https://docs.docker.com/v1.11/swarm/scheduler/filter/ (last access 15th Feb. 2017)
³ See https://kubernetes.io/docs/user-guide/node-selection/ (last access 15th Feb. 2017)
⁴ See https://mesosphere.github.io/marathon/docs/constraints.html (last access 15th Feb. 2017)
Table 3. Common multi-cloud requirements and contributing elastic platform concepts

Requirements            Contributing platform concepts
Transferability         - Integration of nodes into one logical elastic platform
                        - Elastic platforms are designed for failure
                        - Cross-provider deployable (as shown by Section 4)
Awarenesses             - Pod concept (Kubernetes)
 - Data location        - Volume orchestrators (e.g. Flocker for Docker)
 - Pricing              - Tagging of nodes with geolocation, pricing, policy
 - Legislation/policy     or on-premise information
 - Local resources      - Platform schedulers have selectors (Swarm),
                          affinities (Kubernetes), constraints (Mesos/Marathon)
                          to consider these taggings for scheduling
Security                - Default encrypted data/control plane (e.g. Swarm)
                        - Pluggable and encryptable overlay networks
                          (e.g. Weave for Kubernetes)
3.3 Security Requirements
If such elastic platforms are deployed across different providers, it is likely that data has to be submitted via the "open and unprotected" internet. Therefore, elastic container platforms provide encryptable overlay networks which can be used for such scenarios. For instance, Docker's Swarm Mode (since version 1.12) provides an encrypted data and control plane, and Kubernetes can be configured to use encryptable overlay network plugins like Weave. The often feared network performance impacts can be contained [20, 21].
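As an illustration of what a Swarm platform driver might execute on a manager node to provide such an encrypted network, consider the following sketch; the run_on_master helper is a hypothetical placeholder, while the docker network create options are standard Docker Swarm Mode flags.

  # Sketch: create an encrypted overlay network for application containers.
  # run_on_master is a hypothetical helper that executes a command via SSH
  # on a Swarm manager node.
  def create_encrypted_overlay(network_name)
    run_on_master("docker network create --driver overlay --opt encrypted #{network_name}")
  end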
4 Proof of Concept
This Section presents a software prototype that is implemented in Ruby and provides a command line tool as its core component. The tool can be triggered in the execution phase of a MAPE auto-scaling loop [9] and scales elastic container platforms according to an execution pipeline (see Figure 4). The control process interprets a cluster description format (the intended state of the container cluster, see Appendix A) and the current state of the cluster (attached nodes, existing security groups, see Appendix B). If the intended state differs from the current state, necessary adaptation actions are deduced (attachment/detachment of nodes, creation and termination of security groups); a minimal sketch of this control cycle is given after the list below.
Fig. 2. The execution loop and synchronized security group concept: The control-theory-inspired execution control loop compares the intended state of an elastic container platform with the current state and derives necessary actions. These actions are processed by the execution pipeline explained in Figure 4. The elastic container platform is operated in a set of synchronized security groups across different IaaS infrastructures. An elastic container platform composed of several nodes is deployed to multiple providers and secured by synchronized security groups (IP incoming access rules) allowing platform-internal traffic only. External traffic must be configured explicitly by the operator and is neither done by the execution pipeline nor covered by this contribution.

The execution pipeline assures
- that security groups are established in participating cloud service infrastructures to enable network access control before a node joins the elastic container platform,
- that all nodes are reachable by all other nodes of the cluster, by adjusting all security groups whenever a node enters or leaves the cluster,
- that nodes are provided with all necessary software and configuration installed to join the elastic container platform successfully,
- that the elastic container platform is operated in a way that it provides encrypted overlay networks for containers,
- that removed nodes are drained (graceful node shutdown) in order to initiate rescheduling of workloads to the remaining nodes of the cluster,
- that leaving nodes and "empty" security groups are terminated to free resources of the IaaS infrastructures.
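The following Ruby sketch condenses this control cycle; the class and method names (deployments, nodes_of, execute_pipeline, and so on) are illustrative assumptions, not the actual prototype API.

  # Illustrative sketch of one control cycle:
  # intended: parsed cluster.json, current: parsed resources.json
  def control_cycle(intended, current)
    actions = []
    intended.deployments.each do |deployment|
      running = current.nodes_of(deployment.district, deployment.role).size
      delta   = deployment.quantity - running
      if delta.positive?
        delta.times { actions << [:attach, deployment] }     # create/adjust security groups, launch, install, join
      elsif delta.negative?
        delta.abs.times { actions << [:detach, deployment] } # drain, leave, terminate, clean up security groups
      end
    end
    execute_pipeline(actions)  # prioritized processing, see Figure 4
  end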
4.1 Description of Elastic Platforms
The conceptual model shown in Figure 3 is used to describe the deployment of elastic platforms in multi-cloud scenarios and considers arbitrary IaaS cloud service providers [18]. Public cloud service providers organize their IaaS services using mainly two approaches: project-based and region-based service delivery. GCE and OpenStack infrastructures are examples of the project-based approach. To request IaaS resources like virtual machines, one has to create a project first. The project has access to resources of all provider regions. AWS is an example of region-based service delivery. Both approaches have their advantages and disadvantages, as the reader will see. However, the approaches are given, and multi-cloud solutions must be prepared for both approaches occurring in parallel. The conceptual model integrates both approaches by introducing a concept called District (see Figure 3).

Fig. 3. The conceptual data model: The relation to the descriptive cluster definition formats is shown as well (please compare with Appendices A, B, C, D).
A District can be understood as a user-defined "datacenter" which is provided by a specific cloud service provider (following the project- or region-based approach). So, provider regions or projects can be mapped to one or more Districts and vice versa. This additional layer provides maximum flexibility in defining multi-cloud deployments of elastic container platforms. A multi-cloud deployed elastic container platform can be defined using two definition formats (cluster.json and districts.json). The definition formats are explained in more detail in the Appendices.
An elastic platform is defined as a list of Deployments in a cluster definition file (see Appendix A). A Deployment is defined per District and defines how many nodes of a specific Flavor should perform a specific cluster role. A lot of elastic container platforms have two main roles of nodes in a cluster: a "master" role to perform scheduling, control and management tasks, and a "worker" (or slave) role to execute containers. The proposed prototype can work with arbitrary roles and role names. Role specifics can be considered by Platform drivers (see Figure 3) in their install, join and leave cluster hooks (see Figure 4). A typical Deployment can be expressed using this JSON snippet:
{ "district": "gce-europe",
  "flavor": "small",
  "role": "master",
  "quantity": 3,
  "tags": {
    "location": "Europe", "policy": "Safe-Harbour",
    "scheduling-priority": "low", "further": "arbitrary tags"
  }
}
A complete elastic container platform is composed of a list of such Deployments. Machine Flavors (e.g. small, medium and large machines) and Districts are user defined and have to be mapped to concrete cloud service provider specifics. This is done using the districts.json definition file (see Appendix C). Each District object executes deployments by adding machines to or removing machines from the cluster, as well as tagging them to handle the awareness requirements mentioned in Section 3. The infrastructure-specific execution is delegated by a District object to a Driver object. A Driver encapsulates and processes all necessary cloud service provider and elastic container platform specific requests. The driver uses access Credentials for authentication (see Appendix D) and generates Resource objects (Nodes and SecurityGroups) representing resources (the current state of the cluster, encoded in a resources.json file, see Appendix B) provided by the cloud service provider (District). SecurityGroups are used to allow internal platform communication across IaaS infrastructures. These basic security means are provided by all IaaS infrastructures under different names
(firewalls, security groups, access rules, network rules, ...). This resources list is used by the control process to build the delta between the intended state (encoded in cluster.json) and the current state (encoded in resources.json).
4.2 The Control Loop
The control loop shown in Figure 2 is responsible for reaching the intended state (encoded in cluster.json, Appendix A) and can handle common multi-cloud workflows:
- A deployment of a cluster can be understood as running the execution pipeline on an initially empty resources list.
- A shutdown can be expressed by setting all deployment quantities to 0.
- A migration from one District A to another District B can be expressed by setting all Deployment quantities of A to 0 and adding the former quantities of A to the quantities of B (illustrated by the schematic example below).
- and more.
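For illustration only: expressed in the style of the Deployment snippet shown in Section 4.1, a worker migration from a district "aws-europe" to a district "gce-europe" could boil down to the following quantity changes. The district names are examples, and the exact cluster.json structure is defined in Appendix A.

  [ { "district": "aws-europe", "flavor": "small", "role": "worker", "quantity": 0 },
    { "district": "gce-europe", "flavor": "small", "role": "worker", "quantity": 5 } ]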
The execution pipeline of the control loop derives a prioritized action plan to reach the intended state (see Figure 4). The reader should be aware that the pipeline must keep the affected cluster in a valid and operational state at all times. The currently implemented strategy considers practitioner experience to reduce "stress" for the affected elastic container platform, but other pipeline strategies might work as well and are subject to further research.
Whenever a new node attachment is triggered by the control loop, the corresponding Driver is called to launch a new Node request. The Node is added to the list of requested resources (and therefore extends the current state of the cluster). Then all existing SecurityGroups are updated to allow incoming network traffic from the new Node. These steps are handled by an IaaS Infrastructure driver. Next, control is handed over to a Platform driver performing necessary software install steps via SSH-based scripting. Finally, the node is joined to the cluster using platform (and maybe role-)specific joining calls provided by the Platform driver. If the install or join operations were not successful, the machine is terminated and removed from the resources list by the Infrastructure driver. In these cases the current state could not be extended, and the next round of the control loop would do a retry. Due to its "cybernetic" design philosophy, the control loop can handle failures simply by repeating failed actions in a subsequent loop.
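This attachment step could be sketched in Ruby roughly as follows; all object and method names are illustrative assumptions and not the actual prototype API.

  # Illustrative sketch of the node attachment step with failure handling.
  def attach_node(deployment, infrastructure, platform, resources)
    node = infrastructure.launch_node(deployment)        # IaaS driver: request a virtual machine
    resources.add(node)                                   # extends the current state
    resources.security_groups.each { |group| infrastructure.allow_ingress(group, node.ip) }
    platform.install(node)                                # SSH-based software installation
    platform.join(node, role: deployment.role)            # platform and role specific join call
  rescue StandardError
    # roll back on failure and let the next control loop cycle retry
    if node
      infrastructure.terminate_node(node)
      resources.remove(node)
    end
  end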
4.3 IaaS Infrastructures and Platforms
Fig. 4. Execution pipeline

The handling of infrastructure and platform specifics is done using an extendable driver concept (see Figure 3). The classes Platform and Infrastructure form two extension points to provide support for IaaS infrastructures like AWS, GCE, Azure, DigitalOcean, RackSpace, and so on, and for elastic container platforms like Docker Swarm, Kubernetes, Mesos/Marathon, Nomad, and so on. Infrastructures and platforms can be integrated by extending the Infrastructure class (for IaaS infrastructures) or the Platform class (for additional elastic container platforms). The current state of implementation provides platform drivers for the elastic container platforms Kubernetes and Docker's Swarm Mode and infrastructure drivers for the public IaaS infrastructures AWS, GCE, Azure and the IaaS infrastructure OpenStack. Due to these extension points, further container Platforms and IaaS Infrastructures can easily be added.
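As an illustration, the two extension points might be shaped roughly like the following abstract classes; the method names are assumptions chosen to match the pipeline steps described above, not the actual prototype API.

  # Illustrative sketch of the two extension points.
  class Infrastructure
    def launch_node(deployment)
      raise NotImplementedError
    end

    def terminate_node(node)
      raise NotImplementedError
    end

    def synchronize_security_groups(resources)
      raise NotImplementedError
    end
  end

  class Platform
    def install(node)
      raise NotImplementedError
    end

    def join(node, role:)
      raise NotImplementedError
    end

    def drain(node)
      raise NotImplementedError
    end
  end

  # A new provider or container platform is integrated by subclassing,
  # e.g. class DigitalOceanDriver < Infrastructure ... end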
Table 4 shows how this has been applied for different IaaS drivers. Although only common and very basic IaaS concepts (virtual machines, security groups and IP-based access rules) have been used, we can observe a substantial variation in infrastructure-specific detail concepts. Even the drivers for OpenStack and Azure rely on different detail concepts. This is astonishing because both drivers have been implemented using the same cloud library (fog.io). That is why CNAs are prone to vendor lock-in: they bind to these infrastructure-specific details. The proposed concept encapsulates all those details in low-level infrastructure drivers to make them completely transparent for the platform drivers and for CNAs being operated on elastic container platforms.
Table 4. IaaS drivers: The following APIs were used to interact with different cloud infrastructures. In case of GCE, the terminal-based CLI program gcloud is used to demonstrate that the presented approach would even work for infrastructures where no Ruby API is available. Drivers are responsible to launch and terminate nodes (Ubuntu 16.04 LTS) and to synchronize security groups (see Figure 2).

Driver     Provider API               Used detail provider concepts
AWS        AWS Ruby SDK               - Keyfile (SSH public/private key)
                                      - Virtual network + subnet (AWS VPC)
                                      - Internet gateway + route tables
                                      - Security group
                                      - IP based ingress permissions
                                      - Virtual machines (AWS instances)
OpenStack  fog.io library             - Keyfile (SSH public/private key)
           (fog-openstack)            - Security group
                                      - Security group rules
                                      - External network
                                      - Floating IPs
                                      - Fog server concept (virtual machine)
GCE        gcloud CLI                 - Keyfile (SSH public/private key)
           (fog.io would work         - Virtual network + subnet (GCE VPC)
           as well)                   - GCE instance concept (virtual machine)
Azure      fog.io library             - Keyfile (SSH public/private key)
           with Azure plugin          - Virtual network + subnet (Azure)
           (fog-azure-rm)             - Network interface (Azure)
                                      - Public IPs (Azure)
                                      - Storage account (Azure)
                                      - Security group + rules (Azure)
                                      - Resource group (Azure)
                                      - Fog server concept (virtual machine)

See the following URLs: https://github.com/aws/aws-sdk-ruby, https://github.com/fog/fog-openstack, https://github.com/fog/fog-azure-rm, https://cloud.google.com/sdk/gcloud/
5 Evaluation
The prototype was evaluated by operating and transferring two elastic platforms (Swarm Mode of Docker 17.06 and Kubernetes 1.7) across four public and private cloud infrastructures (see Table 5). The platforms operated a reference "sock-shop" application⁵, one of the most complete reference applications for microservices architecture research [2].

⁵ https://github.com/microservices-demo/microservices-demo (last access 3rd July 2017)
Fig. 5. Launching and terminating times (Kubernetes): The Kubernetes cluster
was composed of one master and five worker nodes. Data of single cloud experiments
E1 and E2 (see Table 6) is presented.
5.1 Experiments
The implementation was tested using a six-node cluster formed of one master node and five worker nodes executing the above mentioned reference application. The experiments shown in Table 6 demonstrate elastic container platform deployments, terminations, and platform transfers across different cloud service infrastructures. Additionally, the experiments were used to measure the runtimes of these kinds of infrastructure operations.
To compare similar machine types, it was decided to follow the approach presented in [19] and to make use of machine types from different providers that show high similarities regarding processing, networking, I/O and memory performance. Table 5 shows the selection of machine types.
It turned out that most of the runtime is caused by low-level IaaS infrastructure operations and not by elastic container platform operations.
Table 5. Used machine types and regions: The machine types have been selected according to the method proposed by Kratzke and Quint [19]. The OpenStack m1.large and m1.medium are research-institute-specific machine types that have been intentionally defined to show maximum similarity with the other mentioned machine types. The OpenStack platform is operated in the author's research institution datacenter and does not necessarily provide representative data.

Provider   Region          Master node type         Worker node type
AWS        eu-west-1       m4.xlarge (4 vCPU)       m4.large (2 vCPU)
GCE        europe-west1    n1-standard-4 (4 vCPU)   n1-standard-2 (2 vCPU)
Azure      europewest      Standard A3 (4 vCPU)     Standard A2 (2 vCPU)
OpenStack  own datacenter  m1.large (4 vCPU)        m1.medium (2 vCPU)
Table 6. Experiments: Single cloud experiments E1 and E2 were mainly used to measure infrastructure-specific timings (see Figure 5) and have been repeated 10 times.

Nr.  Experiment           Cloud   Master  Worker  Provider
E1   Cluster launch       single  1       5       AWS, GCE, Azure, OpenStack
E2   Cluster termination  single  1       5       AWS, GCE, Azure, OpenStack

Multi-cloud transfer experiments E3: Multi-cloud experiments E3 are used to demonstrate the transfer of elastic container platforms at runtime between different cloud service infrastructures and to get a better understanding of runtimes for these kinds of operations (see Figure 8). 5 worker nodes of the cluster have been transferred between two infrastructures in both directions. Initial transfer experiments E3.1 - E3.3 (to cover all providers) were repeated 10 times. Follow-up transfer experiments (E3.4 - E3.6) to cover all possible provider pairings were only repeated 5 times. In total 450 nodes were transferred between 4 providers.

            AWS                     GCE                     Azure
OpenStack   E3.1: 5 nodes, n=10,    E3.2: 5 nodes, n=10,    E3.3: 5 nodes, n=10,
            both directions         both directions         both directions
AWS         see E1, E2              E3.4: 5 nodes, n=5,     E3.5: 5 nodes, n=5,
                                    both directions         both directions
GCE         see E3.4                see E1, E2              E3.6: 5 nodes, n=5,
                                                            both directions
Figure 6 shows platform differences between Kubernetes and Swarm measured using the cloud service provider AWS. We see that the container platform Swarm can be installed approximately 10 seconds faster than Kubernetes, and joining a Kubernetes node is approximately 5 seconds slower than joining a Swarm node. Only the cluster initialization of Kubernetes is remarkably slower. However, that is an operation which is done only once while bootstrapping the cluster. The reader should compare these platform runtimes with the infrastructure-specific runtimes presented in Figure 5. Even on the fastest provider it took more than three minutes to launch a cluster. So, 15 seconds of installation and joining runtime differences between different elastic container platforms are negligible. Therefore, only the data for Kubernetes is presented throughout this Chapter; the data for another elastic container platform like Swarm would simply be too similar. Instead, Figure 7 can be used to identify the much more severe, and therefore more interesting, time-intensive infrastructure operations. Several interesting findings, especially regarding software-defined network aspects in multi-cloud scenarios and the reactiveness of public cloud service infrastructures, might be of interest for the reader.
Fig. 6. Differences in platform-specific timings: differences in creation are due to slightly longer installation times of Kubernetes; differences in joining are due to a more complex join process of Kubernetes (especially for cluster initialization, with an initial join of approx. 50 seconds)
Figure 5 shows the results of the experiments E1 and E2. The reader might be surprised that cloud infrastructure operations show a substantial variation in runtimes although similar resources were used for the presented experiments. Figure 5 shows that a cluster launch on AWS takes only 3 minutes but can take up to 15 minutes on Azure (median values). The termination is even worse. A cluster can be terminated in approximately a minute on AWS or OpenStack, but it can take up to 18 minutes to complete on Azure.
The question is why? Figure 7 presents runtimes of infrastructure operations to create/delete and adjust security groups and to create/terminate virtual cluster nodes (virtual machines) that have been measured while performing the mentioned experiments. Transfers are complex sequences of these IaaS operations, and substantial differences between IaaS infrastructures are observable for these operations. AWS and OpenStack infrastructures are much faster than GCE and Azure in creating security groups and nodes. The same is true for node terminations.
And because different providers (and their different runtime behaviors) can be combined in any combination, this can result in astonishing runtime differences for node transfers. Figure 8 shows all E3 experiments and plots the measured runtime differences of transfers between the analyzed infrastructures of AWS, GCE, Azure and OpenStack. In the best case (transfer from OpenStack to AWS) the transfer could be completed in 3 minutes; in the worst case (transfer from Azure to GCE) the transfer took more than 18 minutes (median values). Furthermore, a transfer from provider A to B did in no case take the same time as from B to A. A more in-depth analysis showed that this is mainly due to different runtimes of creating/terminating security groups and to whether the APIs trigger node terminations in a blocking or non-blocking way. The APIs used for the providers GCE and Azure are blocking operations (that means the call returns at the point in time when the infrastructure has completed the termination operation). The behavior for OpenStack and AWS was non-blocking (that means the calls returned immediately and just triggered the termination but did not wait until completion).
Fig. 7. Differences in infrastructure-specific processing times: data is taken from experiments E1, E2 and E3.x and presented for the Kubernetes elastic container platform (AWS = Amazon Web Services EC2, OS = OpenStack, GCE = Google Compute Engine, Azure = Microsoft Azure)
The non-blocking behavior obviously leads to more reactive behavior in case of node terminations (a complete node termination takes a minute and more). Figure 9 visualizes this by plotting how many nodes are up at any point in time during a transfer. A transfer from a "faster" provider (AWS) to a "slower" provider (Azure) can be done substantially faster than vice versa. It turned out that the runtime behavior of the slowest IaaS operation dominates the overall runtime behavior of multi-cloud operations. Taking all together, IaaS termination operations should be launched in a non-blocking manner (whenever possible) to improve the overall multi-cloud performance.
5.2 Critical discussion
The proposed control loop is designed to be generic enough to adapt to provider-specific and non-standardized detail concepts which are likely to occur in an IaaS context. For instance, the control loop was even able to handle completely different timing behaviors. Intentionally, only very basic IaaS concepts are used in order to assure that the control loop can work with every public or private IaaS infrastructure providing concepts like virtual machines and IP-based network access control. An IaaS infrastructure not providing these basic concepts is hardly imaginable and, to the best of the author's knowledge, does not exist.
Fig. 8. Transfer times between different providers: data is taken from experiments E3.x and presented for the Kubernetes elastic container platform. The reader can deduce the effect of combining slow, medium and fast IaaS infrastructure providers on transfer durations. The slowest provider dominates the overall runtime of a scaling operation. (AWS = Amazon Web Services EC2, OS = OpenStack, GCE = Google Compute Engine, Azure = Microsoft Azure)
A migration from one infrastructure A to another infrastructure B can be expressed by setting all quantities of A to 0 and all quantities of B to the former quantities of A. The presented prototype keeps the cluster in an operational state under all circumstances. The current implementation of the execution pipeline simply executes a worst-case scaling: the pipeline processes node creation steps before node termination steps. In consequence, a migration increases the cluster to double its size in a first step. In a second step, the cluster is shrunk down to its intended size in its intended infrastructure. This is not very sophisticated and obviously leaves room for improvement. Figure 9 visualizes this behavior over time for different infrastructure transfers.
The execution pipeline is designed to be just the execution step of a higher-order MAPE loop. That might lead to situations where an intended state is not reachable. In these cases, the execution loop may simply have no effect. For better understandability, the reader might want to imagine a cluster under high load. If the intended state were set to half of the nodes, the execution loop would not be able to reach this state. Why?
Fig. 9. Amount of nodes during a transfer: These are randomly chosen transfers taken from experiments E3.1, E3.2 and E3.3 to visualize the pipeline behaviour of a transfer over time. For a longer or shorter amount of time (depending on the reactiveness of the provider) the cluster is doubled in size. After that it is shrunk down to its intended size in the intended provider district. The transfer is completed when the lines end.
Before a node is terminated, the execution pipeline informs the scheduler of the elastic container platform to mark this node as unschedulable, with the intent that the container platform will reschedule all load of this node to other nodes. A lot of elastic container platforms call this "draining" a node. For these kinds of purposes, elastic container platforms have operations to mark nodes as unschedulable (Kubernetes has the cordon command, Docker has a drain concept, and so on). Only if the container platform could successfully drain the node will the node be deregistered and deleted. However, in high-load scenarios the scheduler of the container platform will return an error due to the fact that draining is not possible. In consequence, the execution pipeline will not terminate the node and will trigger the draining of the next node on its list (which will not work either). So, this cycle of the execution pipeline will finish without substantially changing the current state. The analyzing step of the higher-order MAPE loop will still identify a delta between the intended and the current state and will retrigger the execution pipeline. That is not perfect, but at least the cluster is kept in an operational state.
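A Ruby sketch of this drain-before-terminate behaviour could look as follows; the object, method and error names are illustrative assumptions, not the actual prototype API.

  # Illustrative sketch: only terminate a node if the platform could drain it.
  def detach_node(node, platform, infrastructure, resources)
    platform.drain(node)                # e.g. "kubectl drain" or "docker node update --availability drain"
    platform.leave(node)
    infrastructure.terminate_node(node)
    resources.remove(node)
  rescue DrainFailedError               # hypothetical error class raised by the platform driver
    # Under high load the scheduler refuses to drain; keep the node untouched
    # and let the next MAPE cycle retry with an unchanged current state.
  end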
5.3 Lessons learned
By building, testing and evaluating the presented prototype, several lessons learned have been derived from the performed software engineering activities. The following lessons learned might be of interest for researchers and practitioners.
1. Secure networking should be considered from the beginning. If different IaaS cloud service providers shall be bridged, it is necessary to work with public IPs from the very beginning! According to the experiences made, this is not the default mode of operation for most elastic platforms and may result in tricky details to consider⁶. A second essential aspect is that control and data plane encryption must be supported by the overlay network used by the elastic platform. If several overlay networks can be used by the elastic platform, encryption should be rated as a "showstopper" feature for overlay network selection.
   ⁶ To get Kubernetes running in a multi-cloud scenario it is necessary to assign an additional virtual network interface with the public IP address of the node. Kubernetes provides no config options for that mode of operation! However, even these kinds of obstacles can be transparently handled by drivers.
2. Do not rely on IaaS infrastructure elasticity features like auto-scaling, load balancing and so on. Although these features are, from a high-level point of view, very basic concepts, they are in a lot of cases not 1:1 portable across providers. The elastic platform (and its supervising MAPE loop) has to cover this.
3. Separate IaaS support and elastic platform support concerns. Both concerns can be solved independently from each other using two independent extension points. The proposed prototype introduced drivers for platforms like Kubernetes and Swarm and drivers for infrastructures like AWS, GCE, OpenStack and Azure.
4. To describe an intended state and let a control process take care to reach this intended state is less complex. Thinking in IaaS infrastructure-specific workflows for how to deploy, scale, migrate and terminate an elastic platform has the tendency to increase complexity. The presented prototype showed that this can be solved by a single control loop.
5. Consider what causes stress to an elastic platform. Adding nodes to a platform is less stressful than removing nodes. Adding a node involves no immediate rescheduling; deregistering and removing a node has immediate rescheduling efforts as a consequence. Knowing that, it seems a good and defensive strategy to add nodes in parallel but to shut down nodes sequentially (see the sketch after this list). However, this increases the runtime of the execution phase of a MAPE loop. To investigate time-optimal execution strategies might be a fruitful research direction to make MAPE loops more reactive.
6. Consider varying runtimes of similar IaaS infrastructure operations across different providers. IaaS operations often take (several) minutes; container platform operations take just seconds or even less. MAPE loops should consider this by providing two adaptation levels for auto-scaling: one slow-reacting, infrastructure-aware auto-scaling loop for the elastic container platform and one fast-reacting, infrastructure-agnostic auto-scaling loop for the applications operating on top of the elastic container platform. Furthermore, the infrastructure-aware auto-scaling loop must be aware that different IaaS service providers might show substantially differing reaction times. The reader might want to recapitulate the really differing timing behaviours of the AWS and Azure IaaS infrastructures.
7. Respect resilience limitations of elastic platforms. Never shutting down nodes before attaching compensating nodes (in the case of transferability scaling actions) is an obvious solution! But it is likely not resource efficient, especially if we consider the different timing behaviours of public cloud service providers (see Figures 5, 7, 8, and 9). To investigate resilient, resource-efficient and timing-efficient execution strategies could be a fruitful research direction to optimize MAPE loops for transferability scenarios.
8. Platform roles increase avoidable deployment complexity. Platform roles increase the inner complexity of platform drivers. Elastic container platforms should be more P2P-like and composed of homogeneous and equal nodes. This could be a fruitful research direction as well, and there exist first interesting ideas investigating this direction [16].
9. Non-blocking APIs are preferable. This is especially true for terminating operations. In consequence, elastic container platforms will show much more reactive behavior (and faster adaptation cycles) if operated on IaaS infrastructures providing non-blocking terminating operations (see Figure 8 and Figure 9).
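As referenced in lesson 5, such a defensive scaling strategy could be sketched like this in Ruby, reusing the hypothetical attach_node and detach_node helpers sketched in Section 4 (with simplified signatures):

  # Illustrative sketch: attach new nodes in parallel, drain and remove
  # old nodes one by one to reduce rescheduling stress.
  def rebalance(deployments_to_add, nodes_to_remove)
    deployments_to_add.map { |deployment| Thread.new { attach_node(deployment) } }.each(&:join)
    nodes_to_remove.each { |node| detach_node(node) }   # sequential draining
  end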
6 Related Work
Several promising approaches deal with multi-cloud scenarios, and there are some good survey papers on this topic [5, 30, 33, 13]. But none of these surveys identified elastic container platforms as a viable option. Nevertheless, the need to "adopt open-source platforms" and "mechanisms for real-time migration" at run-time level is identified [30]. But to the best of the author's knowledge there do not exist concrete platforms or solutions based on container platforms. All surveys identified approaches fitting mainly into the following fields:
- Volunteer federations for groups of "cloud providers collaborating voluntarily with each other to exchange resources" [13].
- Independent federations (or multi-clouds) "when multiple clouds are used in aggregation by an application or its broker. This approach is essentially independent of the cloud provider" and focuses on the client side of cloud computing [33].
This contribution intentionally did not propose a broker-based solution [5] because cloud brokers have the tendency to just shift the vendor lock-in problem to a broker. Instead, mainly independent federations (multi-clouds) were the focus of this Chapter. But of course there are similarities with other existing approaches.
Approaches like OPTIMIS [11], ConTrail [8] or multi-cloud PaaS platforms [27] enable dynamic provisioning of cloud services targeting multi-cloud architectures. These solutions have to provide a lot of plugins to support all possible implementation languages. For instance, [27] mentions at least 19 different plugins (just for a research prototype). Such solutions seem to come along with an increase of inner complexity. Container-based approaches seem to be a better fit for handling this language complexity. mOSAIC [29] or Cloud4SOA [15] assume that an application can be divided into components according to a service-oriented application architecture (SOA) and rely on the constraint that applications are bound to a specific run-time environment. This is true for the proposed approach as well. However, this Chapter proposes a solution where the run-time environment (the elastic container platform) is up to a user decision as well.
The proposed deployment description format is based on JSON and shows similarities with other kinds of deployment description languages like TOSCA [7], CAMEL [1] or CloudML [23]. In fact, some EC-funded projects like PaaSage⁷ [6] combine such deployment specification languages with runtime environments. Nonetheless, the proposed prototype is focused on a more container-centric approach. Finally, several libraries have been developed in recent years, like JClouds, LibCloud, DeltaCloud, SimpleCloud, fog, Nuvem and CPIM [12], to name a few. All these libraries unify differences in the management APIs of clouds and provide control over the provisioning of resources across geographical locations. And experiences with the fog library for the Azure and OpenStack drivers show that infrastructure specifics are not completely encapsulated (even the (non-)blocking behavior can differ for the same operation in different infrastructures⁸).

⁷ http://www.paasage.eu/ (visited 15th Feb. 2017)
⁸ In this case node termination (Azure blocking, OpenStack non-blocking).
Taking all together, the proposed approach leverages more up-to-date container technologies with the intent to be more "pragmatic", "lightweight" and complexity hiding. On the downside, it might only be applicable for container-based applications at the cloud-native level of the maturity model shown in Table 2. But using container platforms and corresponding microservice architectures gets more and more common in CNA engineering [22].
7 Conclusions
Elastic container platforms provide inherent but often overlooked multi-cloud support and are a viable and pragmatic option to support multi-cloud handling. But operating elastic container platforms across different public and private IaaS cloud infrastructures can be a complex and challenging engineering task. Most manuals do not recommend operating these kinds of platforms in the proposed way due to operational complexity. In fact, to define an intended multi-cloud state of an elastic container platform and let a control process take care to reach this state is not less challenging. But it hides and manages complexity much better from the author's point of view. This Chapter showed that this complexity can be efficiently embedded in the execution pipeline of a control loop. The resulting control process was able to migrate and operate elastic container platforms at runtime across different cloud service providers. It was possible to transfer two of the currently most popular open-source container platforms, Swarm and Kubernetes, between AWS, GCE, Azure and OpenStack. These infrastructures cover three of the big five public cloud service providers that are responsible for almost 70% of the worldwide IaaS market share⁹. In other words, the presented solution can already be used for 70% of the most frequently used IaaS infrastructures. Due to its driver concept it can be extended with further IaaS infrastructures and further elastic container platforms.

⁹ According to the Synergy 2016 Cloud Research Report http://bit.ly/2f2FsGK (visited 12th Jul. 2017)
However, there is work to do. It was astonishing to see that the completion times of these transfers vary from 3 minutes to almost 20 minutes depending on the involved infrastructures (see Figure 8). This should be considered in further thoughts on how to implement a more time- and resource-efficient execution pipeline. Fruitful lessons learned about runtime behaviors of IaaS operations and promising research directions like more P2P-based and control-loop-based designs of elastic container platforms could be derived for that purpose. The presented data can be used as a reference for further research and development of these kinds of ideas. However, even the worst transfer times are likely to be much faster than the engineering effort to port infrastructure-specific CNAs into a different cloud, which is usually a very time consuming and complex one-time exercise not being done in hours or minutes.
Acknowledgements. This research is funded by the German Federal Ministry of Education and Research (13FH021PX4). I would like to thank Peter Quint, Christian Stüben, and Arne Salveter for their hard work and their contributions to the Project Cloud TRANSIT. Additionally, I would like to thank the practitioners Mario-Leander Reimer and Josef Adersberger from QAware for inspiring discussions and contributions concerning cloud-native application stacks.
References
1. Rossini, A.: Cloud Application Modelling and Execution Language (CAMEL) and the PaaSage Workflow. In: Advances in Service-Oriented and Cloud Computing – Workshops of ESOCC 2015. vol. 567, pp. 437–439 (2015)
2. Aderaldo, C.M., Mendonça, N.C., Pahl, C., Jamshidi, P.: Benchmark requirements
for microservices architecture research. In: Proc. of the 1st Int. Workshop on Es-
tablishing the Community-Wide Infrastructure for Architecture-Based Software
Engineering. pp. 8–13. ECASE ’17, IEEE Press, Piscataway, NJ, USA (2017)
3. Ashtikar, S., Barker, C., Clem, B., Fichadia, P., Krupin, V., Louie, K., Malhotra,
G., Nielsen, D., Simpson, N., Spence, C.: OPEN DATA CENTER ALLIANCE
Best Practices: Architecting Cloud-Aware Applications Rev. 1.0 (2014)
4. Balalaie, A., Heydarnoori, A., Jamshidi, P.: Migrating to Cloud-Native Architec-
tures Using Microservices: An Experience Report. In: 1st Int. Workshop on Cloud
Adoption and Migration (CloudWay). Taormina, Italy (2015)
5. Barker, A., Varghese, B., Thai, L.: Cloud Services Brokerage: A Survey and Re-
search Roadmap. In: 2015 IEEE 8th International Conference on Cloud Comput-
ing. pp. 1029–1032. IEEE (jun 2015)
6. Baur, D., Domaschka, J.: Experiences from Building a Cross-cloud Orchestration
Tool. In: Proc. of the 3rd Workshop on CrossCloud Infrastructures & Platforms.
pp. 4:1–4:6. CrossCloud ’16, ACM, New York, NY, USA (2016)
7. Brogi, A., Soldani, J., Wang, P.: TOSCA in a Nutshell: Promises and Perspectives,
pp. 171–186. Springer Berlin Heidelberg, Berlin, Heidelberg (2014)
8. Carlini, E., Coppola, M., Dazzi, P., Ricci, L., Righetti, G.: Cloud Federations in
Contrail. pp. 159–168. Springer Berlin Heidelberg (2012)
9. Qu, C., Calheiros, R.N., Buyya, R.: Auto-scaling Web Applications in Clouds: A Taxonomy and Survey. CoRR abs/1609.09224 (2016), http://arxiv.org/abs/1609.09224
10. Fehling, C., Leymann, F., Retter, R., Schupeck, W., Arbitter, P.: Cloud Comput-
ing Patterns: Fundamentals to Design, Build, and Manage Cloud Applications.
Springer Publishing Company, Incorporated (2014)
11. Ferrer, A.J., Hernandez, F., Tordsson, J., Elmroth, E., Ali-Eldin, A., Zsigri, C.,
Sirvent, R., Guitart, J., Badia, R.M., Djemame, K., Ziegler, W., Dimitrakos, T.,
Nair, S.K., Kousiouris, G., Konstanteli, K., Varvarigou, T., Hudzia, B., Kipp, A.,
Wesner, S., Corrales, M., Forgo, N., Sharif, T., Sheridan, C.: OPTIMIS: A holis-
tic approach to cloud service provisioning. Future Generation Computer Systems
28(1), 66–77 (2012)
12. Giove, F., Longoni, D., Yancheshmeh, M.S., Ardagna, D., Di Nitto, E.: An Ap-
proach for the Development of Portable Applications on PaaS Clouds. In: Proceed-
ings of the 3rd International Conference on Cloud Computing and Services Science.
pp. 591–601. SciTePress – Science and Technology Publications (2013)
13. Grozev, N., Buyya, R.: Inter-Cloud architectures and application brokering: tax-
onomy and survey. Software: Practice and Experience 44(3), 369–390 (mar 2014)
14. Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A., Joseph, A.D., Katz, R.H.,
Shenker, S., Stoica, I.: Mesos: A Platform for Fine-Grained Resource Sharing in
the Data Center. In: 8th USENIX Conf. on Networked systems design and imple-
mentation (NSDI’11). vol. 11 (2011)
15. Kamateri, E., Loutas, N., Zeginis, D., Ahtes, J., D’Andria, F., Bocconi, S., Gou-
vas, P., Ledakis, G., Ravagli, F., Lobunets, O., Tarabanis, K.A.: Cloud4SOA: A
Semantic-Interoperability PaaS Solution for Multi-cloud Platform Management
and Portability. pp. 64–78. Springer Berlin Heidelberg (2013)
16. Karwowski, W., Rusek, M., Dwornicki, G., Orłowski, A.: Swarm Based System
for Management of Containerized Microservices in a Cloud Consisting of Hetero-
geneous Servers, pp. 262–271. Springer International Publishing, Cham (2018),
https://doi.org/10.1007/978-3-319-67220-5_24
17. Kratzke, N.: Smuggling Multi-Cloud Support into Cloud-native Applications using
Elastic Container Platforms. In: Proc. of the 7th Int. Conf. on Cloud Computing
and Services Science (CLOSER 2017). pp. 29–42 (2017)
18. Kratzke, N., Peinl, R.: ClouNS - A Cloud-native Applications Reference Model for
Enterprise Architects. In: 8th Workshop on Service oriented Enterprise Architec-
ture for Enterprise Engineering (SoEA4EE 2016) in conjunction with the EDOC
2016 conference (2016)
19. Kratzke, N., Quint, P.C.: About Automatic Benchmarking of IaaS Cloud Service
Providers for a World of Container Clusters. Journal of Cloud Computing Research
1(1), 16–34 (2015)
20. Kratzke, N., Quint, P.C.: How to Operate Container Clusters more Efficiently?
Some Insights Concerning Containers, Software-Defined-Networks, and their some-
times Counterintuitive Impact on Network Performance. International Journal On
Advances in Networks and Services 8(3&4), 203–214 (2015)
21. Kratzke, N., Quint, P.C.: Investigation of Impacts on Network Performance in the
Advance of a Microservice Design. In: Helfert, M., Ferguson, D., Munoz, V.M.,
Cardoso, J. (eds.) Cloud Computing and Services Science – Selected Papers. Com-
munications in Computer and Information Science (CCIS), Springer (2017)
22. Kratzke, N., Quint, P.C.: Understanding Cloud-native Applications after 10 Years
of Cloud Computing - A Systematic Mapping Study. Journal of Systems and Soft-
ware 126(April), 1–16 (2017)
23. Lushpenko, M., Ferry, N., Song, H., Chauvel, F., Solberg, A.: Using Adaptation
Plans to Control the Behavior of Models@Runtime. In: Bencomo, N., Götz, S.,
Song, H. (eds.) MRT 2015: 10th Int. Workshop on Models@run.time, co-located
with MODELS 2015: 18th ACM/IEEE Int. Conf. on Model Driven Engineering
Languages and Systems. CEUR Workshop Proceedings, vol. 1474. CEUR (2015)
24. Namiot, D., Sneps-Sneppe, M.: On micro-services architecture. Int. Journal of
Open Information Technologies 2(9) (2014)
25. Newman, S.: Building Microservices. O’Reilly Media, Incorporated (2015)
26. Pahl, C., Jamshidi, P.: Software architecture for the cloud – A roadmap towards
control-theoretic, model-based cloud architecture. In: Lecture Notes in Computer
Science (including subseries Lecture Notes in Artificial Intelligence and Lecture
Notes in Bioinformatics). vol. 9278 (2015)
27. Paraiso, F., Haderer, N., Merle, P., Rouvoy, R., Seinturier, L.: A Federated Multi-
cloud PaaS Infrastructure. In: 2012 IEEE Fifth International Conference on Cloud
Computing. pp. 392–399. IEEE (jun 2012)
28. Peinl, R., Holzschuher, F.: The Docker Ecosystem Needs Consolidation. In: 5th
Int. Conf. on Cloud Computing and Services Science (CLOSER 2015). pp. 535–
542 (2015)
29. Petcu, D., Craciun, C., Neagul, M., Lazcanotegui, I., Rak, M.: Building an in-
teroperability API for Sky computing. In: 2011 Int. Conf. on High Performance
Computing & Simulation. pp. 405–411. IEEE (jul 2011)
30. Petcu, D., Vasilakos, A.V.: Portability in clouds: approaches and research oppor-
tunities. Scalable Computing: Practice and Experience 15(3), 251–270 (oct 2014)
31. Quint, P.C., Kratzke, N.: Taming the Complexity of Elasticity, Scalability and
Transferability in Cloud Computing - Cloud-Native Applications for SMEs. Int.
Journal on Advances in Networks and Services 9(3&4), 389–400 (2016)
32. Stine, M.: Migrating to Cloud-Native Application Architectures. O’Reilly (2015)
33. Toosi, A.N., Calheiros, R.N., Buyya, R.: Interconnected Cloud Computing Envi-
ronments. ACM Computing Surveys 47(1), 1–47 (may 2014)
Appendix A Cluster Definition File (Intended state)
This exemplary cluster definition file describes the intended state of a Swarm cluster to be deployed across two districts provided by two providers (GCE and AWS). It references three user-defined node flavors: small, med, and large. Three master nodes should be deployed on the med flavor in the district gce-europe, and ten worker nodes on the small flavor in the district aws-europe. The flavors small, med, and large are mapped to provider-specific machine types in Appendix C.
{
"type": "cluster",
"platform": "Swarm",
// [...], Simplified for readability
"flavors": ["small", "med", "large"],
"deployments": [
{"district":"gce-europe",
"flavor": "med",
"role": "master",
"quantity": 3
},
{"district":"aws-europe",
"flavor": "small",
"role": "worker",
"quantity": 10
}
]
}
Listing 1.1. Cluster Definition (cluster.json)
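The cluster definition format is intentionally platform-agnostic. Assuming the prototype uses the same schema for both supported platforms (which the schema suggests but which is not shown explicitly here), the intended state of a Kubernetes cluster would presumably differ only in the platform field, as in the following sketch; the field value "Kubernetes" is an assumption for illustration.
{
"type": "cluster",
"platform": "Kubernetes",
// [...], Simplified for readability; same (assumed) schema as Listing 1.1
"flavors": ["small", "med", "large"],
"deployments": [
{"district": "gce-europe", "flavor": "med", "role": "master", "quantity": 3},
{"district": "aws-europe", "flavor": "small", "role": "worker", "quantity": 10}
]
}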
Appendix B Resources File (Current state)
This exemplary resources file describes the resources currently provided for the operated cluster. The example shows a simple one-node cluster (one master) operated in a single district (OpenStack). A security group has been requested. Some data is omitted for better readability.
[
{"id":"36c76118-d8e4-4d2c-b14e-fd67387d35f5",
"district_id": "openstack-nova",
"os_external_network_id": "80de501b-e836-47ed-a413",
"os_secgroup_name": "secgroup-a66817bd85e96c",
"os_secgroup_id": "36c76118-d8e4-4d2c-b14e",
"os_key_name": "sshkey-for-secgroup-a66817bd85e96c",
"type": "secgroup"
},
{"id":"13c30642-b337-4963-94aa-60cef8db9bbf",
"role": "master",
"flavor ": "medium",
"public_ip ": "212.201.22.189",
"user": "ubuntu",
"sshkey": "sshkey.pem",
"district_id": "openstack-nova",
"os_zone": "nova",
"type": "node"
}
]
Listing 1.2. Resources (resources.json)
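After a scale-out, newly provisioned nodes would be appended to this resources file as additional node entries. The following sketch is an assumption following the node schema of Listing 1.2; the identifier, IP address, and flavor values are made up for illustration. It shows how an additional worker node in the same OpenStack district might be represented.
{"id": "0a1b2c3d-0000-4000-8000-000000000000",
"role": "worker",
"flavor": "small",
"public_ip": "198.51.100.10",
"user": "ubuntu",
"sshkey": "sshkey.pem",
"district_id": "openstack-nova",
"os_zone": "nova",
"type": "node"
}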
Appendix C District Definition File (JSON)
The following exemplary district definition defines provider-specific settings and mappings. The user-defined district gce-europe should be realized using the provider-specific GCE zones europe-west1-b and europe-west1-c. Necessary provider-specific access settings such as project identifiers, regions, and credentials are provided as well. User-defined flavors (see the cluster definition format above) are mapped to concrete provider-specific machine types.
[
{"type":"district",
"id": "gce-europe",
"provider": "gce",
"credential_id": "gce_default",
"gce_project_id": "your-proj-id",
"gce_region": "europe-west1",
"gce_zones": ["europe-west1-b", "europe -west1-c"],
"flavors ": [
{"flavor":"small","machine_type":"n1-standard-1"},
{"flavor":"med", "machine_type":"n1-standard-2"},
{"flavor":"large","machine_type":"n1-standard-4"}
]
}
]
Listing 1.3. District Definitions (districts.json)
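The cluster definition in Appendix A also references the district aws-europe, which is not shown above. Assuming the same schema, a corresponding AWS district definition would presumably follow the same pattern. Note that the AWS-specific field names (aws_region, aws_zones) and the machine type mappings in the following sketch are illustrative assumptions and not taken from the prototype.
[
{"type": "district",
"id": "aws-europe",
"provider": "aws",
"credential_id": "aws_default",
"aws_region": "eu-west-1",
"aws_zones": ["eu-west-1a", "eu-west-1b"],
"flavors": [
{"flavor": "small", "machine_type": "m4.large"},
{"flavor": "med", "machine_type": "m4.xlarge"},
{"flavor": "large", "machine_type": "m4.2xlarge"}
]
}
]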
Appendix D Credentials File (JSON)
The following exemplary credentials file provides access credentials for customer-specific GCE and AWS accounts, as referenced by the district definition file (gce_default and aws_default).
[{"type":"credential",
"id": "gce_default",
"provider": "gce",
"gce_key_file": "path-to-key.json"
},
{"type":"credential",
"id": "aws_default",
"provider": "aws",
"aws_access_key_id": "AKID",
"aws_secret_access_key": "SECRET"
}
]
Listing 1.4. Credentials (credentials.json)
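Since the resources file in Appendix B refers to an OpenStack district (openstack-nova), a corresponding OpenStack credential entry would be needed as well. The following sketch is an assumption only; the field names are modeled on typical OpenStack (Keystone) access parameters and are not taken from the prototype.
{"type": "credential",
"id": "openstack_default",
"provider": "openstack",
// Field names below are assumptions, analogous to the GCE/AWS entries
"os_auth_url": "https://keystone.example.org:5000/v3",
"os_username": "USER",
"os_password": "SECRET",
"os_project_name": "PROJECT"
}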