
ElaClo: A Framework for Optimizing Software Application Topology in the Cloud Environment


Abstract

Application architectures in the cloud employ elastic components and achieve lower operating costs without sacrificing quality. Software architects strive to provide efficient services by deciding on a software topology: a set of structural architectural decisions. For a given application there can be numerous software topology alternatives, which creates the need for automated optimization methods. Current optimization approaches rely on experts providing application performance models built upfront, based on their experience and on the requirements provided. While such techniques are effective and valuable, they require additional maintenance effort as the software evolves.
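To make the idea of a topology alternative concrete, the sketch below enumerates the ways a small set of services can be grouped into deployable components and scores each grouping. All names (the service list, the cost and latency placeholders) are hypothetical illustrations, not ElaClo's actual interfaces, and the evaluation step merely stands in for the measurement-driven feedback a framework could use instead of an upfront performance model.

```python
# Hypothetical sketch: enumerate software topology alternatives for a
# small application and rank them by (placeholder) cost and latency.
from itertools import product

SERVICES = ["catalog", "orders", "billing"]

def alternatives(services):
    """Yield every grouping of services into deployable components.

    Each structural decision is binary here: co-locate a service with
    the previous one, or split it into a new component."""
    for splits in product([False, True], repeat=len(services) - 1):
        topology, component = [], [services[0]]
        for svc, split in zip(services[1:], splits):
            if split:
                topology.append(component)
                component = [svc]
            else:
                component.append(svc)
        topology.append(component)
        yield topology

def evaluate(topology):
    # Placeholder for measurement-driven feedback: deploy the topology,
    # replay a workload, and collect cost and latency from monitoring.
    cost = len(topology) * 10.0                       # e.g. one VM per component
    latency = 5.0 * sum(len(c) for c in topology) / len(topology)
    return cost, latency

for topo in alternatives(SERVICES):
    print(topo, evaluate(topo))
```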
... Various tools are available [19], [20], [21], [22], [23], [24] to test Cloud services at different layers, including storage systems, platform interfaces, hardware interfaces, and application systems. Some frameworks, such as PCTF [15], AUToCLES [25], ROAR [26], and ElaClo [27], have also been developed for this purpose. ...
Article
Serverless computing systems have become very popular because of their natural advantages with respect to auto-scaling, load balancing and fast distributed processing. As of today, almost all serverless systems define two QoS classes: best-effort (BE) and latency-sensitive (LS). Systems typically do not offer any latency or QoS guarantees for BE jobs and run them on a best-effort basis. In contrast, systems strive to minimize the processing time for LS jobs. This work proposes a precise definition for these job classes and argues that we need to consider a bouquet of performance metrics for serverless applications, not just a single one. We thus propose the comprehensive latency (CL) that comprises the mean, tail latency, median and standard deviation of a series of invocations for a given serverless function. Next, we design a system FaaSCtrl, whose main objective is to ensure that every component of the CL is within a prespecified limit for an LS application, and for BE applications, these components are minimized on a best-effort basis. Given the sheer complexity of the scheduling problem in a large multi-application setup, we use the method of surrogate functions in optimization theory to design a simpler optimization problem that relies on performance and fairness. We rigorously establish the relevance of these metrics through characterization studies. Instead of using standard approaches based on optimization theory, we use a much faster reinforcement learning (RL) based approach to tune the knobs that govern process scheduling in Linux, namely the real-time priority and the assigned number of cores. RL works well in this scenario because the benefit of a given optimization is probabilistic in nature, owing to the inherent complexity of the system. We show using rigorous experiments on a set of real-world workloads that FaaSCtrl achieves its objectives for both LS and BE applications and outperforms the state-of-the-art by 36.9% (for tail response latency) and 44.6% (for response latency's std. dev.) for LS applications.
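As a concrete reading of the comprehensive latency (CL), the sketch below computes its four components for a series of invocation latencies and checks them against prespecified limits, as required for an LS application. The p99 choice for the tail and all names are our assumptions, not code from the paper.

```python
# Compute the four CL components (mean, median, std. dev., tail) for a
# series of invocation latencies of one serverless function.
import statistics

def comprehensive_latency(latencies_ms, tail_quantile=0.99):
    ordered = sorted(latencies_ms)
    tail_index = min(len(ordered) - 1, int(tail_quantile * len(ordered)))
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "stdev": statistics.stdev(ordered),
        "tail": ordered[tail_index],
    }

def within_limits(cl, limits):
    # An LS function meets its QoS target only if every checked
    # component stays within its prespecified limit.
    return all(cl[k] <= limits[k] for k in limits)

cl = comprehensive_latency([12.1, 11.8, 13.0, 45.2, 12.5, 12.2])
print(cl, within_limits(cl, {"mean": 20, "tail": 50}))
```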
Chapter
Attracted by the flexibility of the microservice architecture, many Cloud services are composed of components that communicate via Remote Procedure Calls (RPCs). Given the high cost of RPCs between components running on different machines, the question arises of how to effectively arrange and place these components onto physical machines in the Cloud to offer a good quality of service. Current co-location strategies mainly consider the resource constraints and performance interference among individual components but ignore the workflow dependencies between components. Although the workflow of background jobs has been used as an important modeling element when seeking optimal scheduling schemes, workflow-aware scheduling for microservice applications has received little research attention. In this paper, we propose a workflow-aware component placement scheme for microservice applications to reduce their running time. Specifically, we design a new workflow model based on a Directed Acyclic Graph (DAG) and probability theory, describing components' calling and timing dependencies, to predict the running time of applications. Based on the proposed model, we quantify the affinity degree of two components, which in turn supports pod affinity scheduling. The affinity degree considers multiple dimensions, including the critical path, the possible improvement of running time, and the throughput of user requests. The experimental evaluation confirms the accuracy of the workflow model and its effectiveness for pod affinity scheduling.
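The sketch below illustrates the two ingredients the chapter combines: an expected critical path over a DAG whose calls fire with some probability, and an affinity degree for a pair of components. Both functions and the example numbers are illustrative simplifications, not the chapter's actual model.

```python
# Hypothetical sketch: probability-weighted critical path over a DAG of
# components, plus a toy affinity degree for co-location decisions.
def expected_critical_path(order, edges, node_time, call_prob):
    """order: components in topological order; edges: u -> list of callees;
    call_prob[(u, v)]: probability that u actually invokes v."""
    finish = {u: node_time[u] for u in order}
    for u in order:
        for v in edges.get(u, []):
            # Weight the callee's running time by how likely the RPC is.
            finish[v] = max(finish[v],
                            finish[u] + call_prob[(u, v)] * node_time[v])
    return max(finish.values())

def affinity(u, v, call_prob, rpc_cost_ms):
    # Toy affinity degree: expected RPC cost saved by placing u and v on
    # the same machine (the chapter also folds in the critical path and
    # the throughput of user requests).
    return call_prob.get((u, v), 0.0) * rpc_cost_ms

order = ["gateway", "catalog", "checkout"]
edges = {"gateway": ["catalog", "checkout"], "catalog": ["checkout"]}
times = {"gateway": 2.0, "catalog": 8.0, "checkout": 5.0}
probs = {("gateway", "catalog"): 1.0, ("gateway", "checkout"): 0.3,
         ("catalog", "checkout"): 0.6}
print(expected_critical_path(order, edges, times, probs))   # 13.0
print(affinity("gateway", "catalog", probs, rpc_cost_ms=4.0))
```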
Article
The increasing heterogeneity of the VM offerings on public IaaS clouds gives rise to a very large number of deployment options for constructing distributed, multi-component cloud applications. However, selecting an appropriate deployment variant, i.e., a valid combination of deployment options, to meet required performance levels is non-trivial. The combinatorial explosion of the deployment space makes it infeasible to measure the performance of all deployment variants to build a comprehensive empirical performance model. To address this problem, we propose Feature-Oriented Cloud (FOCloud), a performance engineering approach for deployment-configurable cloud applications. FOCloud (i) uses feature modeling to structure and constrain the valid deployment space by modeling the commonalities and variations in the different deployment options and their inter-dependencies, (ii) uses sampling and machine learning to incrementally and cost-effectively build a performance prediction model whose input variables are the deployment options, and the output variable is the performance of the resulting deployment variant, and (iii) uses Explainable AI techniques to provide explanations for the prediction outcomes of valid deployment variants in terms of the deployment options. We demonstrate the practicality and feasibility of FOCloud by applying it to an extension of the RuBiS benchmark application deployed on Google Cloud.
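A minimal sketch of step (ii) follows: sample a handful of valid deployment variants, "measure" them, and fit a regression model that predicts performance for unmeasured variants. The encoding, the synthetic measurement, and the use of scikit-learn's RandomForestRegressor are our assumptions; FOCloud's feature-model constraints and explainability step are not reproduced.

```python
# Hypothetical sketch: learn a performance model over deployment options
# from a small measured sample, then predict unmeasured variants.
import random
from sklearn.ensemble import RandomForestRegressor

VM_TYPES = [0, 1, 2]      # encoded VM offering chosen per component
COMPONENTS = 3

def random_variant():
    return [random.choice(VM_TYPES) for _ in range(COMPONENTS)]

def measure(variant):
    # Placeholder for an actual benchmark run of the deployed variant.
    return 100.0 - 15.0 * sum(variant) + random.gauss(0, 2)

sample = [random_variant() for _ in range(30)]
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(sample, [measure(v) for v in sample])

# Predicted performance of a variant that was never measured.
print(model.predict([[2, 2, 2]]))
```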
Article
During the last decade, the research community has developed different simulation tools to model and study cloud systems. However, current cloud simulators focus on specific features and typically do not fully cover all aspects of the cloud infrastructure. The ever-growing number of available simulators makes it increasingly difficult to choose the most appropriate one. Moreover, in certain situations these simulators must be combined to analyze the features required by the user, which demands considerable time and effort for their selection. In this paper, we propose CloudExpert, an intelligent system based on metamorphic testing that selects the most appropriate simulator covering the features of interest for the user. In contrast to our previous work, where metamorphic testing is applied to improve models representing a cloud, in this work we analyse the underlying features of several well-known cloud simulators to generate metamorphic rules, which are applied to represent the properties of each simulator. To show the applicability of CloudExpert, we conducted an empirical study in which the adequacy of six well-known cloud simulators was analyzed. In this experiment, CloudExpert recommended the most appropriate simulator for eight scenarios involving different aspects of the cloud (energy, storage, network, memory, CPU) and simulator performance, and it could also identify strengths and weaknesses of these simulators. We then further validated CloudExpert in two ways. Firstly, the effectiveness of CloudExpert was measured using different faulty cloud simulators. Secondly, we designed a questionnaire based on the results provided by CloudExpert for some of the scenarios of the first experiment. The questionnaire was answered by eight experts in cloud simulation, confirming the usefulness of the tool.
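To illustrate what a metamorphic rule looks like in this setting, the sketch below checks a relation between two simulator runs instead of comparing one run against a known oracle. The rule, the simulator interface, and the numbers are hypothetical.

```python
# Hypothetical metamorphic rule over a cloud simulator's energy model.
def metamorphic_rule_energy(simulate):
    """Doubling the number of identical hosts should not decrease the
    total energy reported by the simulator."""
    base = simulate(hosts=10, vms=40)
    follow_up = simulate(hosts=20, vms=40)
    return follow_up["energy"] >= base["energy"]

def toy_simulator(hosts, vms):
    # Stand-in for a real simulator's API; returns aggregate metrics.
    return {"energy": hosts * 120.0 + vms * 15.0}

# A simulator "covers" the energy feature only if its energy-related
# rules hold; verdicts like this can be aggregated across features
# (energy, storage, network, memory, CPU).
print(metamorphic_rule_energy(toy_simulator))
```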
Article
References to NP-completeness and NP-hardness are common in the computer science literature, but unfortunately they are often in error or misguided. This article explains the most widespread misconceptions, why they are wrong, and why we should care.
Conference Paper
One important issue in software engineering is to find an effective way to deal with the increasing complexity of software computing systems. Modern software applications have evolved in terms of size and scope. Specific tools have been created to predict the Quality of Service (QoS) at design time. However, the optimization of an architecture usually has to be done manually, resulting in an arduous and time-consuming process. For this reason, we present the Palladio Optimization Suite (POS), a collection of complementary plugins built to run atop Palladio Bench with the aim of automating the exploration of the space of possible architectures by means of advanced search paradigms.
Article
Traditionally, software architecture is seen as the result of the software architecture design process: the solution, usually represented by a set of components and connectors. Recently, the why of the solution, the set of design decisions made by the software architect, is complementing or even replacing the solution-oriented definition of software architecture. This in turn leads to the study of the process of making these decisions. We outline some research directions that may help us understand and improve the software architecture design process.
Article
Recent years have seen the massive migration of enterprise applications to the cloud. One of the challenges posed by cloud applications is Quality-of-Service (QoS) management, which is the problem of allocating resources to the application to guarantee a service level along dimensions such as performance, availability and reliability. This paper aims at supporting research in this area by providing a survey of the state of the art of QoS modeling approaches suitable for cloud systems. We also review and classify their early application to some decision-making problems arising in cloud QoS management.
Conference Paper
A recent trend, the movement of software applications to the Cloud, provides, among numerous benefits, an important model for infrastructure cost reduction based on the pay-as-you-go concept. In our experiments, we noticed that software distribution may significantly influence the cost benefits achieved in the Cloud. Software distribution optimization requires a continuous influx of information on key metrics characterizing the incoming workload. In this paper we propose a method for modeling workloads of business applications characterized by a nonuniform distribution over the day. Two important properties are described: (1) modeling and forecasting repeatable patterns observed in the business context, and (2) modeling the inter-arrival time distribution of service requests. While the former is important for constructing automated capacity-planning controllers, the latter is required for describing the amount of traffic variability. We analyzed these properties on a two-month workload collected from production business services used by several thousand customers in the retail domain in Croatia. Based on this analysis, we propose a high-level design of a quality-of-service controller applicable to business services in the cloud environment.
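The sketch below ties the two modeled properties together: a repeatable daily rate pattern and exponentially distributed candidate gaps, combined through the standard thinning algorithm for a nonhomogeneous Poisson process. The rate curve is invented for illustration; the paper fits its model to a real two-month retail workload.

```python
# Generate request arrivals that follow a nonuniform daily pattern via
# thinning of a nonhomogeneous Poisson process.
import math
import random

def rate(t_hours):
    # Invented daily pattern: low at night, peaking in business hours.
    return 5.0 + 45.0 * max(0.0, math.sin(math.pi * (t_hours - 8) / 12))

def arrivals(horizon_hours, rate_fn, rate_max):
    t, out = 0.0, []
    while t < horizon_hours:
        t += random.expovariate(rate_max)                 # candidate gap
        if t < horizon_hours and random.random() < rate_fn(t) / rate_max:
            out.append(t)                                 # accept (thinning)
    return out

events = arrivals(24.0, rate, rate_max=50.0)
print(len(events), "requests in one simulated day")
```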
Book
Metaheuristics are widely used to solve important practical combinatorial optimization problems. Many new multicast applications emerging on the Internet, such as TV over the Internet, radio over the Internet, and multipoint video streaming, require reduced bandwidth consumption, end-to-end delay, and packet loss ratio. It is necessary to design these kinds of applications and to provide the resources they need to function. Multi-Objective Optimization in Computer Networks Using Metaheuristics provides a solution to the multi-objective routing problem in computer networks. It analyzes layer 3 (IP), layer 2 (MPLS), and layer 1 (GMPLS and wireless functions). In particular, it assesses basic optimization concepts, as well as several techniques and algorithms for the search for minima; examines the basic multi-objective optimization concepts and the ways to solve them through traditional techniques and through several metaheuristics; and demonstrates how to analytically model the computer networks presented within the text. The book then focuses on multi-objective models in computer networks, optical networks, and wireless networks, and on practical ways they can be solved. It also contains annexes presenting solver files and C source code that solve some of the multi-objective optimization problems discussed in the book.
Article
Both network security and quality of service (QoS) consume the computational resources of an IT system and thus may noticeably affect application services. When computational resources are limited, it is important to model the mutual influence between network security and QoS so that both can be optimized concurrently to provide better performance under the available computational resources. In this paper, an evaluation model is accordingly presented to describe the mutual influence of network security and QoS, and the multi-objective genetic algorithm NSGA-II is revised to optimize the multi-objective model. Using intrinsic information from the target problem, a new crossover approach is designed to further enhance optimization performance. Simulation results validate that our algorithm can find a set of Pareto-optimal security policies under different network workloads, which can be offered to potential users as differentiated security preferences. These Pareto-optimal security policies not only meet the security requirements of the user, but also provide the optimal QoS under the available computational resources.
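Underlying the algorithm above is the notion of Pareto optimality over the two competing objectives. The sketch below shows the dominance test and front extraction for a handful of hypothetical security policies scored by (risk, QoS loss), both to be minimized; it is generic Pareto machinery, not the paper's revised NSGA-II.

```python
# Pareto dominance and front extraction over hypothetical security
# policies with two objectives to minimize: (risk, qos_loss).
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly
    better in at least one (minimization)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(policies):
    return [p for p in policies
            if not any(dominates(q["obj"], p["obj"]) for q in policies)]

policies = [
    {"name": "strict-ids", "obj": (0.10, 0.40)},
    {"name": "balanced",   "obj": (0.25, 0.20)},
    {"name": "qos-first",  "obj": (0.45, 0.05)},
    {"name": "dominated",  "obj": (0.50, 0.45)},
]
print([p["name"] for p in pareto_front(policies)])
```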
Conference Paper
Cloud Computing is an emerging paradigm in Information Technology that enables the delivery of infrastructure, software, and platform resources as services. It is an environment with automatic service provisioning and management. In recent years, autonomic management of Cloud services has received increasing attention, while the optimization of autonomic managers remains largely unexplored. In fact, almost all existing work on autonomic computing has focused on modeling and implementing autonomic environments without paying attention to optimization. In this paper, we propose a new efficient algorithm to optimize autonomic managers for the management of service-based applications. Our algorithm determines the minimum number of autonomic managers and assigns them to the services that compose the managed service-based applications. The experiments we performed show that our approach is efficient and suited to service-based applications described not only as architecture-based but also as behavior-based compositions of services.
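As a rough illustration of the assignment problem, the sketch below maps services to the smallest number of managers a first-fit-decreasing greedy can find under a per-manager capacity. This heuristic is our stand-in, not the paper's algorithm, which also exploits the architecture- and behavior-based structure of the composition.

```python
# Hypothetical first-fit-decreasing assignment of services to autonomic
# managers, each with a fixed management capacity.
def assign_managers(service_loads, manager_capacity):
    managers = []                    # remaining capacity per open manager
    assignment = {}
    for service, load in sorted(service_loads.items(),
                                key=lambda kv: -kv[1]):  # big services first
        for i, remaining in enumerate(managers):
            if remaining >= load:
                managers[i] -= load
                assignment[service] = i
                break
        else:
            managers.append(manager_capacity - load)     # open a new manager
            assignment[service] = len(managers) - 1
    return len(managers), assignment

print(assign_managers({"auth": 3, "search": 5, "cart": 2, "pay": 4},
                      manager_capacity=8))
```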
Article
In this paper, we present performance comparisons between two popular elitism-based evolutionary multi-objective optimization algorithms, NSGA2 and SPEA2, in the presence of noise. Three test problems and six noise levels are employed in the research experiments. The results show that SPEA2 outperforms NSGA2 in the early generations. NSGA2, however, is superior during later generations regardless of the level of noise present in the problem.