
Managing Feature Compatibility in Kubernetes: Vendor Comparison and Analysis

Authors:
  • Lübeck University of Applied Sciences

Abstract

Kubernetes (k8s) is a kind of cluster operating system for cloud-native workloads that has become a de-facto standard for container orchestration. Provided by more than one hundred vendors, it has the potential to protect the customer from vendor lock-in. However, the open-source k8s distribution consists of many optional and alternative features that must be explicitly activated and may depend on pre-configured system components. As a result, incompatibilities may still ensue among Kubernetes vendors. Moreover, managed k8s services typically restrict the customizability of Kubernetes. This paper firstly compares the most relevant k8s vendors and, secondly, analyses the potential of Kubernetes to detect and configure compatible support for required features across vendors in a uniform manner. Our comparison is based on documented features, on testing, and on inspection of the configuration state of running clusters. Our analysis focuses on the potential of the end-to-end testing suite of Kubernetes to detect support for a desired feature in any Kubernetes vendor, and on the possibility of reconfiguring the studied vendors with missing features in a uniform manner. Our findings are threefold: First, incompatibilities arise between the default cluster configurations of the studied vendors for approximately 18% of documented features. Second, matching end-to-end tests exist for only around 64% of features, and for 17% of features these matching tests are not well developed for all vendors. Third, almost all feature incompatibilities can be resolved using a vendor-agnostic API. These insights help to avoid feature incompatibilities early in cloud-native application engineering processes. Moreover, the end-to-end testing suite can be extended in currently uncovered areas to provide better feature coverage.
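The abstract's third comparison method, inspecting the configuration state of running clusters, can be illustrated with a short sketch. Assuming cluster credentials in the default kubeconfig and the official `kubernetes` Python client (both assumptions, not the paper's tooling), each node's kubelet exposes its effective configuration, including enabled feature gates, through the API server's node proxy:

```python
# Hedged sketch (not the paper's tooling): read each node's live
# KubeletConfiguration via the API server's node proxy endpoint
# /api/v1/nodes/{name}/proxy/configz and print the enabled feature gates.
import json

from kubernetes import client, config

config.load_kube_config()  # assumes credentials in ~/.kube/config
v1 = client.CoreV1Api()

for node in v1.list_node().items:
    name = node.metadata.name
    raw = v1.connect_get_node_proxy_with_path(name, path="configz")
    # some client versions return the JSON payload as a Python-repr string
    cfg = json.loads(raw.replace("'", '"')) if isinstance(raw, str) else raw
    gates = cfg.get("kubeletconfig", {}).get("featureGates", {})
    print(f"{name}: featureGates={gates or '<vendor defaults>'}")
```

Running this against the default clusters of two vendors and diffing the printed gates is, in spirit, the kind of comparison the paper performs at larger scale.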
... This brings great challenges to achieving the aforementioned consistency requirements in cluster federations as explained in Section 1. In our previous work [35], we have found that 30 out of 162 documented features of the open-source distribution of K8s v1.13 are not consistently activated or de-activated in the aforementioned leading vendors of the hosted product type, and these feature settings cannot be modified by the cluster administrator due to hidden configuration manifests of the Kubernetes control plane. ...
... Various plug-in components and configuration parameters can be set via options of the kube-apiserver command that starts up the API server [36]. These plug-in components and configuration parameters of the API server can be mapped to K8s features that are clearly defined as part of the open-source documentation [5,35]. ... In our previous work [35], we found differences between the three K8s vendors with respect to 2 beta feature gates. Unfortunately, different alpha features were also enabled. ...
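On clusters where the control plane is reachable (e.g., kubeadm-provisioned ones; managed vendors usually hide it), these kube-apiserver options can be read directly from the static pod manifest. A minimal sketch, assuming the standard kubeadm path and PyYAML; both are assumptions, not something the quoted paper prescribes:

```python
# Hedged sketch: extract feature-relevant kube-apiserver flags from the
# static pod manifest that kubeadm-style clusters keep on the control plane.
import yaml

MANIFEST = "/etc/kubernetes/manifests/kube-apiserver.yaml"  # assumed path

with open(MANIFEST) as f:
    pod = yaml.safe_load(f)

# the container command holds flags such as --feature-gates=Foo=true,Bar=false
command = pod["spec"]["containers"][0]["command"]
flags = dict(
    arg.lstrip("-").split("=", 1)
    for arg in command
    if arg.startswith("--") and "=" in arg
)
print("feature-gates:           ", flags.get("feature-gates", "<defaults>"))
print("enable-admission-plugins:", flags.get("enable-admission-plugins", "<defaults>"))
```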
Article
Full-text available
Kubernetes (K8s) defines standardized APIs for container-based cluster orchestration such that it becomes possible for application managers to deploy their applications in a portable and interoperable manner. However, a practical problem arises when the same application must be replicated in a distributed fashion across different edge, fog and cloud sites; namely, there will not exist a single K8s vendor that is able to provision and manage K8s clusters across all these sites. Hence, the problem of feature incompatibility between different K8s vendors arises. A large number of documented features in the open-source distribution of K8s are optional features that are turned off by default but can be activated by setting specific combinations of parameters and plug-in components in configuration manifests for the K8s control plane and worker node agents. However, none of these configuration manifests are standardized, giving K8s vendors the freedom to hide the manifests behind a single, more restricted, and proprietary customization interface. Therefore, some optional K8s features cannot be activated consistently across K8s vendors, and applications that require these features cannot be run on those vendors. In this paper, we present a unified, vendor-agnostic feature management approach for consistently configuring optional K8s features across a federation of clusters hosted by different Kubernetes vendors. We describe vendor-agnostic reconfiguration tactics that are already applied in industry and that cover a wide range of optional K8s features. Based on these tactics, we design and implement an autonomic controller for declarative feature compatibility management across a cluster federation. We found that the features configured through our vendor-agnostic approach have no impact on application performance when compared with a cluster where the features are configured using the configuration manifests of the open-source K8s distribution. Moreover, the maximum time to complete reconfiguration of a single feature is within 100 seconds, which is 6 times faster than using the proprietary customization interfaces of mainstream K8s vendors such as Google Kubernetes Engine. However, there is a non-negligible disruption to running applications when performing the reconfiguration on an existing cluster; this disruption does not appear with the proprietary customization methods of the K8s vendors due to their use of rolling upgrades of cluster nodes. Therefore, our approach is best applied in the following three use cases: (i) when starting up new K8s clusters, (ii) when optional K8s features of existing clusters must be activated as quickly as possible and temporary disruption to running applications can be tolerated, or (iii) when proprietary customization interfaces do not allow activating the desired optional feature.
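Before any reconfiguration like the one described above, a federation operator first needs to know where a feature is set inconsistently. A small illustrative audit, iterating over all kubeconfig contexts with the official Python client; the feature gate name is an arbitrary example, and this is not the paper's autonomic controller:

```python
# Illustrative federation-wide audit, not the paper's autonomic controller:
# collect one feature-gate setting from every node of every cluster that is
# registered as a context in the local kubeconfig, then compare the results.
import json

from kubernetes import client, config

FEATURE = "TopologyManager"  # example feature gate, chosen arbitrarily

per_cluster: dict[str, set] = {}
contexts, _active = config.list_kube_config_contexts()
for ctx in contexts:
    config.load_kube_config(context=ctx["name"])
    v1 = client.CoreV1Api()
    settings = set()
    for node in v1.list_node().items:
        raw = v1.connect_get_node_proxy_with_path(node.metadata.name, path="configz")
        cfg = json.loads(raw.replace("'", '"')) if isinstance(raw, str) else raw
        # None means the gate is left at the vendor's default
        settings.add(cfg.get("kubeletconfig", {}).get("featureGates", {}).get(FEATURE))
    per_cluster[ctx["name"]] = settings

for name, settings in per_cluster.items():
    print(f"{name}: {FEATURE}={settings}")
all_values = set().union(*per_cluster.values())
print("federation-wide:", "consistent" if len(all_values) <= 1 else "INCONSISTENT")
```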
... This brings great challenges to achieving the aforementioned consistency requirements in cluster federations as explained in Section 1. In our previous work [28], we have found that 30 out of 162 documented features of the open-source distribution of K8s v1.13 are not consistently activated or de-activated in the aforementioned leading vendors of the hosted product type, and these feature settings cannot be modified by the cluster administrator due to hidden configuration manifests of the Kubernetes control plane. ...
... In the latest version of K8s there are more than 40 beta features. In our previous work [28], we found differences between the three K8s vendors with respect to 2 beta feature gates. Unfortunately, different alpha features were also enabled. ...
... To address this issue, we can utilize the end-to-end testing suite provided by the open-source distribution of Kubernetes to run conformance tests to validate if a feature is supported in a cluster. However, the feature coverage of the test suite is insufficient [28]. Improving the coverage of this test suite is beyond the scope of this paper. ...
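In practice, such conformance validation is commonly driven with CNCF tooling on top of the upstream e2e suite. A hedged sketch, assuming the `sonobuoy` CLI is installed and the kubeconfig points at the cluster under test; the focus regex is only an illustrative example of selecting tests that match one feature:

```python
# Hedged sketch: run only the upstream e2e tests matching one feature via the
# sonobuoy CLI (assumed installed), then print the per-test pass/fail summary.
import subprocess

FOCUS = r"\[sig-network\].*NetworkPolicy"  # illustrative test-selection regex

# launch the matching e2e tests and block until they finish
subprocess.run(["sonobuoy", "run", "--e2e-focus", FOCUS, "--wait"], check=True)

# `retrieve` downloads the result tarball and prints its local path
tarball = subprocess.run(
    ["sonobuoy", "retrieve"], check=True, capture_output=True, text=True
).stdout.strip()

# summarize results; a failing test suggests the feature is unsupported
subprocess.run(["sonobuoy", "results", tarball], check=True)
```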
Preprint
Kubernetes (K8s) defines standardized APIs for container-based cluster orchestration so that it becomes possible for application managers to deploy their applications in a unified manner across different cloud providers. A practical problem, however, is feature incompatibility between different K8s vendors, who offer commercial K8s products based on the open-source K8s distribution. A large number of documented features in this open-source distribution are optional features that are turned off by default, but can be activated by setting specific combinations of parameters and plug-in components in configuration manifests for the K8s control plane and worker node agents. However, none of these configuration manifests are standardized, giving K8s vendors the freedom to hide the manifests behind a single, more restricted, and proprietary customization interface. Therefore, some optional K8s features cannot be activated consistently across K8s vendors, and applications that require these features cannot be run on those vendors. In this paper we present a unified, vendor-agnostic feature management approach that bypasses the proprietary customization interface of K8s vendors in order to consistently activate optional K8s features across a federation of clusters hosted by different Kubernetes vendors. We describe vendor-agnostic reconfiguration tactics that are already applied in industry and cover a wide range of optional K8s features. Based on these tactics, we design and implement an autonomic controller for declarative feature compatibility management across a cluster federation. We found that the features configured through our vendor-agnostic approach have no impact on application performance when compared with a cluster where the features are configured using the configuration manifests of the open-source K8s distribution. Moreover, the maximum time to complete reconfiguration of a single feature is within 100 seconds, which is 6 times faster than using the proprietary customization interfaces of mainstream K8s vendors such as Google Kubernetes Engine. However, there is a non-negligible disruption to running applications when performing the reconfiguration; this disruption does not appear with the proprietary customization methods of the K8s vendors. Therefore, our approach is best applied in the following three use cases: (i) when starting up new K8s clusters, (ii) when optional K8s features of existing clusters must be activated as quickly as possible and temporary disruption to running applications can be tolerated, or (iii) when proprietary customization interfaces do not allow activating the desired optional feature.
... Throughout the implementation of the full software stack for DCS, we learned some lessons about how the current Linux kernel and cloud infra software, such as Kubernetes, should evolve to better support CXL pooled memory systems. ...
Article
Full-text available
CXL pooled memory is gaining attention from the industry as a viable memory disaggregation solution offering memory expansion and alleviating memory overprovisioning. One essential feature for efficient use of the pooled memory is to dynamically allocate or release memory from the pool based on hosts' demands. We refer to this feature as Dynamic Capacity Service (DCS). This paper introduces one of the industry's first DCS implementations for CXL pooled memory. We demonstrate a fully functional DCS by implementing an FPGA-based CXL pooled memory prototype and full software stacks. Our experiment shows DCS can substantially improve system memory utilization by dynamically allocating and releasing memory resources on demand. We also present the lessons learned from the DCS implementation.
... However, deploying and managing containerized applications requires a container orchestration platform, such as Kubernetes and KubeEdge (KE), to maximize application performance under various circumstances. Kubernetes [12], [13] is a well-known container orchestration platform that offers several features, such as service management and resource provisioning of edge nodes, as well as assuring service availability. A pod is the smallest deployable unit in Kubernetes that contains one or more containers. ...
Article
Full-text available
KubeEdge (KE) is a container orchestration platform for deploying and managing containerized IoT applications in an edge computing environment based on Kubernetes. It is intended to be hosted at the edge and provides seamless cloud-edge coordination as well as an offline mode that allows the edge to function independently of the cloud. However, there are unreliable communication links between edge nodes in edge computing environments, implying that load balancing in an edge computing environment is not guaranteed while using KE. Furthermore, KE lacks Horizontal Pod Autoscaling (HPA), implying that KE cannot dynamically deploy new resources to efficiently handle increasing requests. Both of the aforementioned issues have a significant impact on the performance of a KE-based edge computing system, particularly when traffic volumes vary over time and geographical location. In this study, a node-based horizontal pod autoscaler (NHPA) is proposed that dynamically adjusts the number of pods of individual nodes independently from each other in an edge computing environment where the traffic volume fluctuates over time and location and the communication links between edge nodes are not stable. The proposed NHPA can dynamically adjust the number of pods depending on the incoming traffic at each node, improving the overall performance of the KubeEdge-based edge computing environment. The experimental findings reveal that NHPA outperforms KE in terms of throughput and response time by factors of about 3 and 25, respectively.
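For comparison, the HPA that plain KubeEdge lacks is a first-class API object in upstream Kubernetes. A minimal sketch with the official Python client; the target deployment name and thresholds are illustrative assumptions:

```python
# Minimal sketch of the upstream HPA that plain KubeEdge lacks: scale an
# (assumed) deployment `web` between 1 and 10 replicas at 80% CPU utilization.
from kubernetes import client, config

config.load_kube_config()  # assumes credentials in ~/.kube/config

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="web-hpa", namespace="default"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="web"
        ),
        min_replicas=1,
        max_replicas=10,
        target_cpu_utilization_percentage=80,
    ),
)
client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```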
... In past and current industrial action research [4,6,[8][9][10][11][12][13][14], I came across various cloud-native applications and corresponding engineering methodologies like the 12-factor app (see 4.1) and learned that the discussion around observability is increasingly moving beyond these three stovepipes and taking a more nuanced and integrated view. There is a growing awareness of integrating and unifying these three pillars, and more emphasis is being placed on analytics. ...
Preprint
Full-text available
Background: Cloud-native software systems often have a much more decentralized structure and many independently deployable and (horizontally) scalable components, making it more complicated to create a shared and consolidated picture of the overall decentralized system state. Today, observability is often understood as a triad of collecting and processing metrics, distributed tracing data, and logging. The result is often a complex observability system composed of three stovepipes whose data is difficult to correlate. Objective: This study analyzes whether these three historically emerged observability stovepipes of logs, metrics and distributed traces could be handled in a more integrated way and with a more straightforward instrumentation approach. Method: This study applied an action research methodology used mainly in industry-academia collaboration and common in software engineering. The research design utilized iterative action research cycles, including one long-term use case. Results: This study presents a unified logging library for Python and a unified logging architecture that uses the structured logging approach. The evaluation shows that several thousand events per minute are easily processable. Conclusion: The results indicate that a unification of the current observability triad is possible without the necessity to develop utterly new toolchains.
... For example, Kubernetes can manage containerized applications and data running on different public cloud providers, and it can also realize connections among microservices distributed across multiple clouds and hybrid environments (data center and public cloud) [17]. However, Truyen et al. [18] found that there may still be incompatibilities among Kubernetes vendors by comparing the default cluster configurations of Azure Kubernetes Service (AKS), AWS Elastic Kubernetes Service (EKS), and Google Kubernetes Engine (GKE). They also proposed that almost all feature incompatibilities can be solved if vendors activate the KubeletConfiguration API. ...
Article
Full-text available
Cloud-native is an innovative technology and methodology that is necessary to realize the digital transformation of enterprises. Promoting the wide adoption of cloud-native in cloud providers and enterprises has gained popularity in recent years. According to the technological and commercial characteristics of cloud-native, this paper analyzes the game relationship between cloud providers and enterprises on the selection of cloud-native, and combines evolutionary game theory to establish a model. In addition, empirical analysis indicates the impact of parameter changes on the dynamic evolution process. The results show that (1) enterprises are more vulnerable to the impact of direct benefit to adopt cloud-native, and cloud providers are especially affected by the cost of providing cloud-native; (2) enterprises are more likely to be impacted by the invisible benefit than cloud providers, but the impact has a marginal decreasing effect; (3) the low price is one of the reasons to attract enterprises; (4) enterprises are more concerned about the potential loss caused by the supply and demand mismatch. The results of the discussion provide a reference for all stakeholders to promote the implementation of cloud-native and the digital transformation of enterprises.
Article
Full-text available
Cloud-native software systems often have a much more decentralized structure and many independently deployable and (horizontally) scalable components, making it more complicated to create a shared and consolidated picture of the overall decentralized system state. Today, observability is often understood as a triad of collecting and processing metrics, distributed tracing data, and logging. The result is often a complex observability system composed of three stovepipes whose data are difficult to correlate. Objective: This study analyzes whether these three historically emerged observability stovepipes of logs, metrics and distributed traces could be handled in a more integrated way and with a more straightforward instrumentation approach. Method: This study applied an action research methodology used mainly in industry-academia collaboration and common in software engineering. The research design utilized iterative action research cycles, including one long-term use case. Results: This study presents a unified logging library for Python and a unified logging architecture that uses the structured logging approach. The evaluation shows that several thousand events per minute are easily processable. Conclusions: The results indicate that a unification of the current observability triad is possible without the necessity to develop utterly new toolchains.
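The structured-logging approach at the core of this unified architecture can be sketched with the Python standard library alone; the field names are illustrative and the paper's actual library may differ:

```python
# Illustrative sketch of structured (JSON) logging with only the standard
# library; the unified logging library evaluated in the paper may differ.
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        event = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # merge structured context passed via `extra={"context": {...}}`
        event.update(getattr(record, "context", {}))
        return json.dumps(event)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

log = logging.getLogger("checkout")
log.info("order placed", extra={"context": {"order_id": 42, "total": 19.99}})
```

Because every event is a self-describing JSON object, logs, metrics derived from them, and trace context can share one pipeline, which is the kind of unification argued for above.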
Article
Full-text available
Kubernetes, a container orchestration tool for automatically installing and managing Docker containers, has recently begun to support a federation function of multiple Docker container clusters. This technology, called Kubernetes Federation, allows developers to increase the responsiveness and reliability of their applications by distributing and federating container clusters to multiple service areas of cloud service providers. However, it is still a daunting task to manually manage federated container clusters across all the service areas or to maintain the entire topology of cloud applications at a glance. This research work proposes a method to automatically form and monitor Kubernetes Federation, given application topology descriptions in TOSCA (Topology and Orchestration Specification for Cloud Applications), by extending the orchestration tool that automatizes the modeling and instantiation of cloud applications. It also demonstrates the successful federation of the clusters according to the TOSCA specifications and verifies the auto-scaling capability of the configured system through a scenario in which the servers of a sample application are deployed and federated.
Technical Report
Full-text available
The project CloudTRANSIT dealt with the question of how to transfer cloud applications and services at runtime, without downtime, across cloud infrastructures from different public and private cloud service providers. This technical report summarizes the outcomes of approximately 20 research papers that have been published throughout the project and intends to provide an integrated bird's-eye view on these so-far-isolated papers. The report references the original papers wherever possible. The project also systematically investigated practitioner-initiated cloud application engineering trends of the last three years that provide several promising technical opportunities to avoid cloud vendor lock-in pragmatically. Especially European cloud service providers should track this kind of research because of the technical opportunities to bring cloud application workloads back home to Europe. Such workloads are currently often deployed on, and inherently bound to, U.S. providers. Intensified EU General Data Protection Regulation (GDPR) policies, European cloud initiatives, or "America First" policies might even make this imperative. These scenarios therefore need technical solutions that are manageable not only by large but also by small and medium-sized enterprises. The project consequently analyzed commonalities of cloud infrastructures and cloud applications in a systematic way. The latest evolutions of cloud standards and cloud engineering trends (like containerization) were used to derive a cloud-native reference model (ClouNS) that guided the development of a pragmatic cloud-transferability solution. This solution intentionally separates the infrastructure-agnostic operation of elastic container platforms (like Swarm, Kubernetes, Mesos/Marathon, etc.) via a multi-cloud scaler from the platform-agnostic definition of cloud-native applications and services via a unified cloud application modeling language. Both components are independent but complementary. Because of their independence, they can even contribute (although not intended) to other fields like moving-target-based cloud security; distributed ledger technologies (blockchains) may provide options here as well. The report summarizes the main outcomes and insights of a proof-of-concept solution that realizes transferability for cloud applications and services at runtime without downtime.
Article
Full-text available
This paper presents a review of cloud application architectures and their evolution. It reports observations made during a research project that tackled the problem of transferring cloud applications between different cloud infrastructures. As a side effect, we learned a great deal about the commonalities and differences of many different cloud applications, which might be of value for cloud software engineers and architects. Throughout the research project, we analyzed industrial cloud standards, performed systematic mapping studies of cloud-native application-related research papers, did action research activities in cloud engineering projects, modeled a cloud application reference model, and performed software and domain-specific language engineering activities. Two primary (and sometimes overlooked) trends can be identified. First, cloud computing and its related application architecture evolution can be seen as a steady process to optimize resource utilization in cloud computing. Second, these resource utilization improvements resulted over time in an architectural evolution of how cloud applications are being built and deployed. A shift from monolithic service-oriented architectures (SOA), via independently deployable microservices, towards so-called serverless architectures is observable. In particular, serverless architectures are more decentralized and distributed and make more intentional use of separately provided services. In other words, a decentralizing trend in cloud application architectures is observable that emphasizes decentralized architectures known from former peer-to-peer based approaches. This is astonishing because, with the rise of cloud computing (and its centralized service provisioning concept), the research interest in peer-to-peer based approaches (and their decentralizing philosophy) decreased. However, this seems to change. Cloud computing could head into a future of more decentralized and more meshed services.
Article
Full-text available
Cloud-native applications are intentionally designed for the cloud in order to leverage cloud platform features like horizontal scaling and elasticity - benefits coming along with cloud platforms. In addition to classical (and very often static) multi-tier deployment scenarios, cloud-native applications are typically operated on much more complex but elastic infrastructures. Furthermore, there is a trend to use elastic container platforms like Kubernetes, Docker Swarm or Apache Mesos. However, especially multi-cloud use cases are astonishingly complex to handle. In consequence, cloud-native applications are prone to vendor lock-in. Very often TOSCA-based approaches are used to tackle this aspect. But, these application topology defining approaches are limited in supporting multi-cloud adaption of a cloud-native application at runtime. In this paper, we analyzed several approaches to define cloud-native applications being multi-cloud transferable at runtime. We have not found an approach that fully satisfies all of our requirements. Therefore we introduce a solution proposal that separates elastic platform definition from cloud application definition. We present first considerations for a domain specific language for application definition and demonstrate evaluation results on the platform level showing that a cloud-native application can be transferred between different cloud service providers like Azure and Google within minutes and without downtime. The evaluation covers public and private cloud service infrastructures provided by Amazon Web Services, Microsoft Azure, Google Compute Engine and OpenStack.
Conference Paper
Full-text available
Cloud-native applications are intentionally designed for the cloud in order to leverage cloud platform features like horizontal scaling and elasticity - benefits coming along with cloud platforms. In addition to classical (and very often static) multi-tier deployment scenarios, cloud-native applications are typically operated on much more complex but elastic infrastructures. Furthermore, there is a trend to use elastic container platforms like Kubernetes, Docker Swarm or Apache Mesos. However, especially multi-cloud use cases are astonishingly complex to handle. In consequence, cloud-native applications are prone to vendor lock-in. Very often TOSCA-based approaches are used to tackle this aspect. But, these application topology defining approaches are limited in supporting multi-cloud adaption of a cloud-native application at runtime. In this paper, we analyzed several approaches to define cloud-native applications being multi-cloud transferable at runtime. We have not found an approach that fully satisfies all of our requirements. Therefore we introduce a solution proposal that separates elastic platform definition from cloud application definition. We present first considerations for a domain specific language for application definition and demonstrate evaluation results on the platform level showing that a cloud-native application can be transferred between different cloud service providers like Azure and Google within minutes and without downtime. The evaluation covers public and private cloud service infrastructures provided by Amazon Web Services, Microsoft Azure, Google Compute Engine and OpenStack.
Article
Full-text available
The capability to operate cloud-native applications can generate enormous business growth and value. But enterprise architects should be aware that cloud-native applications are vulnerable to vendor lock-in. We investigated cloud-native application design principles, public cloud service providers, and industrial cloud standards. All results indicate that most cloud service categories seem to foster vendor lock-in situations which might be especially problematic for enterprise architectures. This might sound disillusioning at first. However, we present a reference model for cloud-native applications that relies only on a small subset of well standardized IaaS services. The reference model can be used for codifying cloud technologies. It can guide technology identification, classification, adoption, research and development processes for cloud-native application and for vendor lock-in aware enterprise architecture engineering methodologies.
Article
Containers, enabling lightweight environment and performance isolation, fast and flexible deployment, and fine‐grained resource sharing, have gained popularity in better application management and deployment in addition to hardware virtualization. They are being widely used by organizations to deploy their increasingly diverse workloads derived from modern‐day applications such as web services, big data, and internet of things in either proprietary clusters or private and public cloud data centers. This has led to the emergence of container orchestration platforms, which are designed to manage the deployment of containerized applications in large‐scale clusters. These systems are capable of running hundreds of thousands of jobs across thousands of machines. To do so efficiently, they must address several important challenges including scalability, fault tolerance and availability, efficient resource utilization, and request throughput maximization among others. This paper studies these management systems and proposes a taxonomy that identifies different mechanisms that can be used to meet the aforementioned challenges. The proposed classification is then applied to various state‐of‐the‐art systems leading to the identification of open research challenges and gaps in the literature intended as future directions for researchers.
Conference Paper
In the current world of immense technological advancement, the Information Technology (IT) world is switching from physical storage to cloud storage, since "cloud" providers supply resources on demand over the Internet. Cloud computing is a successful and speedily evolving model, with new features and capabilities being announced regularly. Its "pay as you use" concept is what enables firms to shift to the cloud. As a result, the security of the data moved there has become an issue, and the security of cloud-based applications is one of the key concerns of cloud customers. The three principles of cloud security are Availability, Confidentiality and Integrity. One of the most efficient ways to address these concerns is container clustering, which can be achieved with various technologies. This paper presents a study of, and comparison between, two such technologies: Docker Swarm and Kubernetes.
Conference Paper
Diagnosing misconfiguration across modern software stacks is increasingly difficult. These stacks comprise multiple micro-services which are deployed across a combination of containers and hosts (VMs, physical machines) in a cloud or a data center. The existing approaches for detecting misconfiguration, whether rule-based or inference-based, are highly specialized (e.g., security only), cumbersome to write and maintain, geared towards hosts (instead of container images), and can result in false positives or false negatives. This paper introduces the configuration validation language (CVL), a declarative language for writing rules to detect misconfigurations that can, for instance, impact security, performance, or functionality. We have built a system, ConfigValidator, which applies the CVL rules across a multitude of environments such as Docker images, running containers, hosts, and clouds. The system is running in production and has scanned thousands of Docker images and running containers to identify misconfigurations.
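CVL's concrete rule syntax is not reproduced in this preview. Purely to illustrate the rule-based style it describes, here is a hedged Python analogue that flags two common container misconfigurations; this is not CVL itself, and the rules are made-up examples:

```python
# Illustrative analogue of rule-based misconfiguration detection, not CVL
# itself: each rule is a named predicate over a parsed configuration document.
RULES = [
    ("container runs as root",
     lambda cfg: cfg.get("User", "") in ("", "root", "0")),
    ("SSH port exposed",
     lambda cfg: "22/tcp" in cfg.get("ExposedPorts", {})),
]

def validate(cfg: dict) -> list[str]:
    """Return the names of all rules the configuration violates."""
    return [name for name, violated in RULES if violated(cfg)]

# e.g. the `Config` section of `docker inspect` output for an image
image_cfg = {"User": "", "ExposedPorts": {"22/tcp": {}, "80/tcp": {}}}
print(validate(image_cfg))  # ['container runs as root', 'SSH port exposed']
```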
Article
Platform-as-a-Service (PaaS) clouds offer services to automate the deployment and management of applications, relieving application owners of the complexity of managing the underlying infrastructure resources. However, application owners have an increasingly larger diversity and volume of workloads, which they want to execute at minimum cost while maintaining desired performance guarantees. In this paper we investigate how existing PaaS systems cope with this challenge. In particular, we first present a taxonomy of commonly-encountered design decisions regarding how PaaS systems manage underlying resources. We then use this taxonomy to analyze an extensive set of PaaS systems targeting different application domains. Based on this analysis, we identify several future research opportunities in the PaaS design space, which will enable PaaS owners to reduce hosting costs while coping with the workload variety.