Conference Paper

Abstract

Cloud systems are large, scalable distributed systems that must be carefully monitored to detect problems and anomalies in a timely manner. While a number of cloud monitoring frameworks are available, only a few solutions address the problem of adaptively and dynamically selecting the indicators to be collected based on the actual needs of the operator. Unfortunately, these solutions are either limited to infrastructure-level indicators or technology-specific: for instance, they are designed to work with OpenStack but not with other cloud platforms. This paper presents the VARYS monitoring framework, a technology-agnostic Monitoring-as-a-Service solution that can address KPI monitoring at all levels of the Cloud stack, including the application level. Operators use VARYS to state their monitoring goals declaratively, letting the framework automatically perform all the operations necessary to achieve the requested monitoring configuration. Interestingly, the VARYS architecture is general and extensible, and can thus be used to support an increasing number of platforms and probing technologies.
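
To give a concrete flavor of the Monitoring-as-a-Service idea described in the abstract, the sketch below submits a declarative monitoring goal to a MaaS REST endpoint. The endpoint URL, payload schema, and indicator names are illustrative assumptions made for this example, not the actual VARYS API.

```python
# Illustrative sketch only: the endpoint, payload schema, and indicator
# names are assumptions made for this example, not the actual VARYS API.
import requests

MAAS_ENDPOINT = "http://varys.example.org/api/v1/monitoring-goals"  # hypothetical

# The operator declares WHAT to monitor; the framework decides HOW
# (which probes to deploy, where, and with which probing technology).
monitoring_goal = {
    "target": "checkout-service",  # an application-level component
    "indicators": [
        {"name": "response_time_ms", "aggregation": "p95", "period_s": 60},
        {"name": "cpu_utilization", "aggregation": "avg", "period_s": 30},
    ],
    "retention_days": 7,
}

response = requests.post(MAAS_ENDPOINT, json=monitoring_goal, timeout=10)
response.raise_for_status()
print("Accepted monitoring configuration:", response.json())
```

The point of the declarative style is that the operator never mentions probes or deployment details; reconciling the running probes with the declared goal is left entirely to the framework.
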


... This paper presents our work on delivering ADaaS for cloud systems. We defined our approach on top of VARYS [30], which is a technology-agnostic MaaS framework for cloud systems. ADaaS relies on an architecture that is designed to dynamically and automatically deploy, redeploy, and un-deploy anomaly detectors based on the operators' needs. ...
... Note that we started from VARYS [30] to define ADaaS because VARYS provides a flexible API that facilitates integration. However, the ADaaS principle is not limited to VARYS and can also be implemented using other MaaS frameworks. ...
... This means that ADaaS can be used with regular monitoring frameworks, but it is more effective when used jointly with a MaaS framework, because both the collected KPIs and the anomaly detection strategies can be changed dynamically. In particular, our prototype has been designed to seamlessly integrate with VARYS [30]. ...
Preprint
Full-text available
Cloud systems are complex, large, and dynamic systems whose behavior must be continuously analyzed to detect misbehaviors and failures in a timely manner. Although there are solutions to flexibly monitor cloud systems, cost-effectively controlling the anomaly detection logic is still a challenge. In particular, cloud operators may need to quickly change the types of detected anomalies and the scope of anomaly detection, for instance based on their observations. This kind of intervention still consists of a largely manual and inefficient ad-hoc effort. In this paper, we present Anomaly Detection as-a-Service (ADaaS), which uses the same as-a-service paradigm often exploited in cloud systems to declaratively control the anomaly detection logic. Operators can use ADaaS to specify the set of indicators that must be analyzed and the types of anomalies that must be detected, without having to address any operational aspect. Early results with lightweight detectors show that the presented approach is a promising solution to deliver better control of the anomaly detection logic.
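
As a rough illustration of the declarative control described in the abstract, the sketch below submits a detection request listing the indicators to analyze and the anomaly types to detect. The endpoint and payload fields are assumptions for this example, not the actual ADaaS interface.

```python
# Illustrative sketch only: the endpoint and payload fields are assumptions,
# not the actual ADaaS interface.
import requests

ADAAS_ENDPOINT = "http://adaas.example.org/api/v1/detectors"  # hypothetical

detection_request = {
    "scope": "payment-service",
    "indicators": ["response_time_ms", "error_rate", "memory_usage_mb"],
    "anomaly_types": [
        {"type": "threshold", "indicator": "error_rate", "max": 0.05},
        {"type": "spike", "indicator": "response_time_ms", "sensitivity": "high"},
    ],
}

# The service is expected to deploy, redeploy, or un-deploy lightweight
# detectors so that the running setup matches this declaration.
response = requests.post(ADAAS_ENDPOINT, json=detection_request, timeout=10)
response.raise_for_status()
print("Active detectors:", response.json())
```
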
... Although a large number of monitoring solutions are already available for the Cloud [5,9,10,23,25,29,35,37,38], they are well known to adapt poorly to heterogeneous and massively distributed infrastructures characterized by frequent changes to the topology of the nodes, such as the Fog [41]. For instance, Abderrahim et al. [1] discuss how a fog monitoring system, unlike cloud-specific solutions, must make particularly good use of the available resources and must be resilient to changes in the topology (e.g., nodes joining and leaving the network) and in network conditions (e.g., communication links may not always be fully operational). ...
... In the last decade, a large number of monitoring solutions, both commercial and academic, have been proposed for the Cloud [5,9,10,23,25,29,35,37,38]. However, they are seriously challenged by several characteristics of the Fog, such as its massively distributed infrastructure characterized by frequent changes to the topology and the presence of heterogeneous and resource-constrained devices [32,41]. ...
Preprint
Full-text available
Monitoring is a critical component in fog environments: it promptly provides insights about the behavior of systems, reveals Service Level Agreements (SLAs) violations, enables the autonomous orchestration of services and platforms, calls for the intervention of operators, and triggers self-healing actions. In such environments, monitoring solutions have to cope with the heterogeneity of the devices and platforms present in the Fog, the limited resources available at the edge of the network, and the high dynamism of the whole Cloud-to-Thing continuum. This paper addresses the challenge of accurately and efficiently monitoring the Fog with a self-adaptive peer-to-peer (P2P) monitoring solution that can opportunistically adjust its behavior according to the collected data exploiting a lightweight rule-based expert system. Empirical results show that adaptation can improve monitoring accuracy, while reducing network and power consumption at the cost of higher memory consumption.
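
As a concrete illustration of the rule-based adaptation mentioned in the abstract, the following minimal sketch shows how a monitoring agent might trade accuracy for network and power consumption by adjusting its sampling period and batching. The rules, thresholds, and metric names are assumptions for illustration, not the paper's actual rule set.

```python
# Minimal sketch of a rule-based adaptation loop for a monitoring agent.
# The rules, thresholds, and metric names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class AgentConfig:
    sampling_period_s: float = 5.0   # how often metrics are collected
    batch_size: int = 10             # samples sent per network message

def adapt(config: AgentConfig, observed: dict) -> AgentConfig:
    """Apply simple if-then rules to trade accuracy for resources."""
    # Rule 1: when the metric is stable, sample less often to save power.
    if observed.get("metric_stddev", 0.0) < 0.01:
        config.sampling_period_s = min(config.sampling_period_s * 2, 60.0)
    # Rule 2: when values change quickly, sample more often for accuracy.
    elif observed.get("metric_stddev", 0.0) > 0.5:
        config.sampling_period_s = max(config.sampling_period_s / 2, 1.0)
    # Rule 3: under high network usage, batch more samples per message.
    if observed.get("network_usage_pct", 0.0) > 80.0:
        config.batch_size = min(config.batch_size * 2, 100)
    return config

if __name__ == "__main__":
    cfg = adapt(AgentConfig(), {"metric_stddev": 0.002, "network_usage_pct": 85.0})
    print(cfg)  # AgentConfig(sampling_period_s=10.0, batch_size=20)
```
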
... In such a scenario, changes are performed solely to the model and system code is generated automatically, without the need to manually adapt and modify different parts of the system. In this sense, it seems promising to employ the same technique to model a monitoring framework, define events, data that needs to be collected, and link constraints to specific elements in the model [19], [29]. ...
... There are already monitoring solutions that allow collecting non-trivial amounts of runtime data from many running services, such as the ELK [4] and Prometheus [5] commercial frameworks and the VARYS [6] and Monasca [7] open-source research frameworks. While these frameworks can provide end-to-end monitoring capabilities, from data collection to data visualization, they only offer limited autonomous operation capabilities. ...
Preprint
Full-text available
Systems of systems are highly dynamic software systems that require flexible monitoring solutions to be observed and controlled. Indeed, operators have to frequently adapt the set of collected indicators according to changing circumstances, to visualize the behavior of the monitored systems and take timely actions when needed. Unfortunately, dashboard systems are still quite cumbersome to configure and to adapt to a changing set of indicators that must be visualized. This paper reports our initial effort towards the definition of an automatic dashboard generation process that exploits metamodel layouts to create a full dashboard from a set of indicators selected by operators.
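
To make the idea of automatic dashboard generation more concrete, the sketch below turns a list of operator-selected indicators into a dashboard description with one panel per indicator, laid out on a grid. The JSON structure loosely resembles a Grafana-style dashboard, but the schema, field names, and layout rule are illustrative assumptions rather than the metamodel layouts used in the paper.

```python
# Illustrative sketch: generate a dashboard description from a list of
# indicators chosen by the operator. The JSON layout loosely follows a
# Grafana-style dashboard, but the exact schema here is an assumption.
import json

def generate_dashboard(title: str, indicators: list[str], columns: int = 2) -> dict:
    panels = []
    for i, indicator in enumerate(indicators):
        panels.append({
            "title": indicator,
            "type": "timeseries",
            # Place panels on a grid, left to right, top to bottom.
            "gridPos": {"x": (i % columns) * 12, "y": (i // columns) * 8,
                        "w": 12, "h": 8},
            "targets": [{"expr": indicator}],  # query expression (assumed)
        })
    return {"title": title, "panels": panels}

if __name__ == "__main__":
    dashboard = generate_dashboard(
        "Checkout service health",
        ["response_time_ms", "error_rate", "cpu_utilization"],
    )
    print(json.dumps(dashboard, indent=2))
```
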
... There are a number of monitoring techniques that can be used to collect these interactions, especially in the context of cloud technologies. For instance, ELK [2] and Prometheus [9] are two popular monitoring systems that can be used to probe cloud applications, while Monasca [4], CloudHealth [42], and Varys [43] provide more sophisticated monitoring features, including the capability to turn probes on and off dynamically. ...
Conference Paper
Full-text available
Microservice-based applications consist of multiple services that can evolve independently. When services are modified, they are typically tested before being deployed. However, the test suites that are executed are usually designed without exact knowledge about how the services will be accessed and used in the field; therefore, they may easily miss relevant test scenarios, failing to prevent the deployment of faulty services. To address this problem, we introduce ExVivoMicroTest, an approach that analyzes the execution of deployed services at run-time in the field in order to generate test cases for future versions of the same services. ExVivoMicroTest exploits cloud technologies, containers in particular, to generate a mocked environment that fully isolates the service under test from the rest of the system. It then reproduces service interactions as previously analyzed, thus testing the new version of the service against usage scenarios that capture the field usage of its earlier versions. We evaluate our approach on an open-source microservice application and show that ExVivoMicroTest can effectively reveal faults based on automatically collected data.
Article
Microservice-based applications consist of multiple services that can evolve independently. When a service must be updated, it is first tested with in-house regression test suites. However, the test suites that are executed are usually designed without exact knowledge about how the services will be accessed and used in the field; therefore, they may easily miss relevant test scenarios, failing to prevent the deployment of faulty services. To address this problem, we introduce ExVivoMicroTest, an approach that analyzes the execution of deployed services at run-time in the field in order to generate test cases for future versions of the same services. ExVivoMicroTest implements lightweight monitoring and tracing capabilities to inexpensively record executions that can later be turned into regression test cases capturing how services are used in the field. To prevent accumulating an excessive number of test cases, ExVivoMicroTest uses a test coverage model that can discriminate between recorded executions that are worth turning into test cases and those that should be discarded. The resulting test cases use a mocked environment that fully isolates the service under test from the rest of the system to faithfully replay interactions. We assessed ExVivoMicroTest with the PiggyMetrics and Train Ticket open-source microservice applications and studied how different configurations of the monitoring and tracing logic impact the capability to generate test cases. In summary, ExVivoMicroTest is a technique that can trace interactions between microservices recorded in the field and turn them into in-house regression test cases. It includes policies to dynamically adjust tracing probabilities based on the collected data, so that interactions are collected only when they are likely to represent new, untested behaviors. Empirical results show that ExVivoMicroTest can generate regression test suites that cover a relevant portion of the behavioral space of the microservices under test.
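
The core record-and-replay idea can be sketched as follows: interactions observed in the field are recorded as request/response pairs and later replayed against a new version of the service running in isolation, flagging responses that diverge. The trace format, function names, and comparison criterion below are simplified assumptions, not the actual ExVivoMicroTest implementation.

```python
# Sketch of the record-and-replay idea behind field-based regression
# testing of microservices. Recording format, endpoints, and the mocked
# environment are simplified assumptions, not ExVivoMicroTest itself.
import json
import requests

def record_interaction(log_path: str, request: dict, response: dict) -> None:
    """Append one observed request/response pair to a JSON-lines trace."""
    with open(log_path, "a") as log:
        log.write(json.dumps({"request": request, "response": response}) + "\n")

def replay_as_tests(log_path: str, service_under_test: str) -> list[str]:
    """Replay recorded requests against a new service version and report
    responses that diverge from what was observed in the field."""
    failures = []
    with open(log_path) as log:
        for line in log:
            trace = json.loads(line)
            req = trace["request"]
            resp = requests.request(
                req["method"], service_under_test + req["path"],
                json=req.get("body"), timeout=10,
            )
            if resp.status_code != trace["response"]["status"]:
                failures.append(f"{req['method']} {req['path']}: "
                                f"expected {trace['response']['status']}, "
                                f"got {resp.status_code}")
    return failures
```

In practice the service under test would be started inside a container together with mocks of its dependencies, so that replaying requests exercises only the new version in isolation.
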
Conference Paper
Full-text available
Smart Cities represent the future of urban aggregation, where a multitude of heterogeneous systems and IoT devices interact to provide a safer, more efficient, and greener environment. The vision for smart cities is adapting to the evolution of software and IoT-based services: the current trend is not to have one big comprehensive system, but a plethora of small, well-integrated systems that interact with each other. Monitoring these kinds of systems is challenging for a number of reasons. Having a centralized and modular monitoring infrastructure, able to translate high-level monitoring requirements into low-level software metrics to monitor and to provision and deploy the probes accordingly, may ease the monitoring of systems in the context of smart cities. It will also help in dealing with, or at least mitigating, conflicting requirements coming from the different stakeholders involved. In this work, we envision a novel approach for monitoring IoT applications in a Smart City scenario, where a quality model of the services enables the monitoring activities in a flexible and manageable way.
Article
Full-text available
Cloud systems are complex and large systems where services provided by different operators must coexist and possibly cooperate. In such a complex environment, controlling the health of both the whole environment and the individual services is extremely important to react timely and effectively to misbehaviours, unexpected events, and failures. Although there are solutions to monitor cloud systems at different granularity levels, how to relate the many KPIs that can be collected about the health of the system, and how health information can be properly reported to operators, are open questions. This paper reports the early results we achieved in the challenge of monitoring the health of cloud systems. In particular, we present CloudHealth, a model-based health monitoring approach that operators can use to watch specific quality attributes. The CloudHealth Monitoring Model describes how to operationalize high-level monitoring goals by dividing them into subgoals, deriving metrics for the subgoals, and using probes to collect the metrics. We use the CloudHealth Monitoring Model to control the probes that must be deployed on the target system, the KPIs that are dynamically collected, and the visualization of the data in dashboards.
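
A goal decomposition of this kind can be represented as a small tree linking goals to subgoals, metrics, and the probes able to collect them; walking the tree for a selected goal then yields the set of probes to deploy. The goals, metrics, and probe names below are illustrative assumptions, not the actual CloudHealth Monitoring Model.

```python
# Minimal sketch of a goal/subgoal/metric/probe decomposition in the
# spirit of a monitoring model. The concrete goals, metrics, and probe
# names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Metric:
    name: str
    probe: str   # the probe technology able to collect this metric

@dataclass
class Goal:
    name: str
    metrics: list[Metric] = field(default_factory=list)
    subgoals: list["Goal"] = field(default_factory=list)

def probes_for(goal: Goal) -> set[str]:
    """Collect every probe needed to operationalize a high-level goal."""
    probes = {m.probe for m in goal.metrics}
    for sub in goal.subgoals:
        probes |= probes_for(sub)
    return probes

availability = Goal("availability", subgoals=[
    Goal("vm-health", metrics=[Metric("vm_uptime", "hypervisor-probe")]),
    Goal("service-health", metrics=[
        Metric("http_success_rate", "http-probe"),
        Metric("error_log_rate", "log-shipper"),
    ]),
])

print(probes_for(availability))
# e.g. {'hypervisor-probe', 'http-probe', 'log-shipper'} (set order may vary)
```
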
Conference Paper
Full-text available
Auto-scalability has become an essential feature of cloud software systems, including but not limited to big data and IoT applications. Cloud application providers are now in full control of their applications' microservices and macroservices; virtual machines and containers can be provisioned or deprovisioned on demand at runtime. Elascale strives to adjust both micro- and macro-level resources with respect to the workload and to changes in the internal state of the whole application stack. Elascale leverages the Elasticsearch stack for the collection, analysis, and storage of performance metrics, and then uses its default scaling engine to elastically adapt the managed application. Extensibility is guaranteed through the provider, schema, plug-in, and policy elements of Elascale, by which flexible scalability algorithms, including both reactive and proactive techniques, can be designed and implemented for various technologies, infrastructures, and software stacks. In this paper, we present the architecture and initial implementation of Elascale; an instance is leveraged to add auto-scalability to a generic IoT application. Because it has no dependencies on the target software system, Elascale can be leveraged to provide auto-scalability and monitoring as-a-service for any type of cloud software system.
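
As an illustration of the reactive side of such a scaling engine, the sketch below is a minimal control loop that reads an aggregated metric and adjusts the replica count within bounds. The thresholds, control period, and metric are assumptions for illustration; Elascale's actual policies and plug-in mechanism are not shown here.

```python
# Sketch of a reactive scaling policy of the kind an auto-scaler may apply
# to a micro/macroservice. Thresholds, control period, and metric names
# are assumptions for illustration.
import time

def desired_replicas(current: int, avg_cpu: float,
                     scale_out_at: float = 75.0, scale_in_at: float = 25.0,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    if avg_cpu > scale_out_at:
        return min(current + 1, max_replicas)
    if avg_cpu < scale_in_at:
        return max(current - 1, min_replicas)
    return current

def control_loop(read_avg_cpu, apply_replicas, period_s: int = 30):
    """Periodically read the aggregated metric and enforce the policy."""
    replicas = 1
    while True:
        new_replicas = desired_replicas(replicas, read_avg_cpu())
        if new_replicas != replicas:
            apply_replicas(new_replicas)  # e.g., call the container orchestrator
            replicas = new_replicas
        time.sleep(period_s)
```
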
Conference Paper
Full-text available
Over the past decade, Cloud Computing has rapidly become a widely accepted paradigm, with core concepts such as elasticity, scalability, and on-demand automatic resource provisioning emerging as must-have properties of next-generation Cloud services. Automatic resource provisioning for Cloud applications is not a trivial task: it requires both the applications and the platform to be constantly monitored, capturing information at various levels and time granularities. In this paper we describe the challenges that occur when monitoring elastically adaptive Cloud applications and, to address these issues, we present JCatascopia, a fully automated, multi-layer, interoperable Cloud Monitoring System. Experiments on different production Cloud platforms show that JCatascopia is a Monitoring System capable of supporting a fully automated Cloud resource provisioning system, with proven interoperability, scalability, and a low runtime footprint. Most importantly, JCatascopia is able to adapt in a fully automatic manner when elasticity actions are enforced on an application deployment.
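
A common building block of such multi-layer monitoring systems is a lightweight agent that registers itself with a monitoring server and periodically ships metrics, so that instances added or removed by elasticity actions are tracked automatically. The sketch below illustrates this pattern; the server URL, message format, and the use of psutil are assumptions, not JCatascopia's actual agent protocol.

```python
# Sketch of a push-based monitoring agent that registers itself with a
# monitoring server and periodically ships metrics. The server URL and
# message format are assumptions for illustration.
import socket
import time

import psutil     # third-party: pip install psutil
import requests

SERVER = "http://monitoring.example.org/api/v1"  # hypothetical

def main(period_s: int = 10) -> None:
    agent_id = socket.gethostname()
    # Self-registration lets the server track instances that elasticity
    # actions add or remove at runtime.
    requests.post(f"{SERVER}/agents", json={"id": agent_id}, timeout=5)
    while True:
        metrics = {
            "agent": agent_id,
            "cpu_pct": psutil.cpu_percent(interval=1),
            "mem_pct": psutil.virtual_memory().percent,
            "timestamp": time.time(),
        }
        requests.post(f"{SERVER}/metrics", json=metrics, timeout=5)
        time.sleep(period_s)

if __name__ == "__main__":
    main()
```
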
Conference Paper
Full-text available
Cloud computing is a computing model through which resources such as infrastructure, applications, or software are offered as services to users. Cloud computing offers the opportunity of virtualization by deploying multiple virtual machines (VMs) on a single physical machine, which increases resource utilization and reduces power consumption. The main benefit of virtualization lies in its on-demand resource allocation strategy and flexible management. OpenStack is one of the most promising open-source solutions offering infrastructure as a service. This paper covers how the underlying deployment infrastructure affects the performance of OpenStack. The aim is to provide a comparative view of the performance of OpenStack when deployed in a virtual environment versus on dedicated hardware. We conduct three basic tests in both environments to check CPU performance, data transfer rate, and bandwidth. The results show that OpenStack on dedicated hardware performs much better than OpenStack in a virtualized environment.
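
The comparison rests on a handful of micro-benchmarks run identically in both environments. The sketch below shows the flavor of such tests with a CPU-bound workload and a disk-write measurement; the workload sizes are arbitrary assumptions and these scripts are not the benchmarks used in the paper, which also measured network bandwidth.

```python
# Minimal sketch of the kind of micro-benchmarks one could run on both
# deployments (virtualized vs. dedicated hardware) to compare them.
# Workload sizes are arbitrary assumptions.
import os
import time

def cpu_benchmark(n: int = 2_000_000) -> float:
    """Seconds needed for a fixed CPU-bound workload (lower is better)."""
    start = time.perf_counter()
    total = 0
    for i in range(n):
        total += i * i
    return time.perf_counter() - start

def disk_benchmark(path: str = "bench.tmp", mb: int = 256) -> float:
    """Write throughput in MB/s for a fixed amount of data (higher is better)."""
    chunk = b"\0" * (1024 * 1024)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # measure durable writes, not just buffering
    return mb / (time.perf_counter() - start)

if __name__ == "__main__":
    print(f"CPU workload: {cpu_benchmark():.2f} s")
    print(f"Disk write:   {disk_benchmark():.1f} MB/s")
```
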
Conference Paper
Full-text available
In computing clouds, it is desirable to avoid wasting resources as a result of under-utilization and to avoid lengthy response times as a result of over-utilization. In this paper, we propose a new approach for dynamic autonomous resource management in computing clouds. The main contribution of this work is two-fold. First, we adopt a distributed architecture where resource management is decomposed into independent tasks, each of which is performed by Autonomous Node Agents that are tightly coupled with the physical machines in a data center. Second, the Autonomous Node Agents carry out configurations in parallel through Multiple Criteria Decision Analysis using the PROMETHEE method. Simulation results show that the proposed approach is promising in terms of scalability, feasibility and flexibility.
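
For readers unfamiliar with PROMETHEE, the sketch below implements a basic PROMETHEE II ranking with the "usual" preference function, of the kind a node agent could apply to rank candidate configurations. The criteria, weights, and candidate scores are illustrative assumptions; the paper's agents use their own criteria and preference functions.

```python
# Sketch of PROMETHEE II ranking with the "usual" preference function.
# Criteria, weights, and candidate scores are illustrative assumptions.

def promethee_ii(alternatives: dict[str, list[float]],
                 weights: list[float]) -> list[tuple[str, float]]:
    names = list(alternatives)
    n = len(names)

    def preference(a: str, b: str) -> float:
        # Usual criterion: full preference as soon as a beats b on a criterion.
        return sum(w for w, va, vb in
                   zip(weights, alternatives[a], alternatives[b]) if va > vb)

    net_flow = {}
    for a in names:
        positive = sum(preference(a, b) for b in names if b != a) / (n - 1)
        negative = sum(preference(b, a) for b in names if b != a) / (n - 1)
        net_flow[a] = positive - negative  # rank by net outranking flow
    return sorted(net_flow.items(), key=lambda kv: kv[1], reverse=True)

# Criteria to maximize: free CPU share, free memory share, 1 - power cost.
candidates = {
    "node-a": [0.9, 0.6, 0.3],
    "node-b": [0.4, 0.5, 0.7],
    "node-c": [0.2, 0.3, 0.9],
}
print(promethee_ii(candidates, weights=[0.5, 0.3, 0.2]))
# Expected ranking: node-a first, then node-b, then node-c.
```
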
Article
Full-text available
Cloud computing, the long-held dream of computing as a utility, has the potential to transform a large part of the IT industry, making software even more attractive as a service and shaping the way IT hardware is designed and purchased. Developers with innovative ideas for new Internet services no longer require large capital outlays in hardware to deploy their service or the human expense to operate it. They need not be concerned about overprovisioning for a service whose popularity does not meet their predictions, thus wasting costly resources, or underprovisioning for one that becomes wildly popular, thus missing potential customers and revenue. Moreover, companies with large batch-oriented tasks can get results as quickly as their programs can scale, since using 1,000 servers for one hour costs no more than using one server for 1,000 hours.
Article
This issue's "Cloud Tidbit" focuses on container technology and how it's emerging as an important part of the cloud computing infrastructure. It looks at Docker, an open source project that automates the faster deployment of Linux applications, and Kubernetes, an open source cluster manager for Docker containers.
Article
This paper presents a novel monitoring architecture addressed to both the cloud provider and the cloud consumers. This architecture offers a monitoring platform-as-a-service to each cloud consumer that allows the monitoring metrics to be customized. The cloud provider sees a complete overview of the infrastructure, whereas each cloud consumer automatically sees her own cloud resources and can define additional resources or services to be monitored. This is accomplished by means of an adaptive distributed monitoring architecture automatically deployed in the cloud infrastructure. This architecture has been implemented and released to the community under the GPL license as "MonPaaS", an open-source software integrating Nagios and OpenStack. An intensive empirical evaluation of performance and scalability has been carried out using a real deployment of a cloud computing infrastructure in which more than 3,700 VMs have been executed.
Article
Over the past few years, cloud computing has grown from being a promising business idea to one of the fastest growing parts of the IT industry. IT organizations have expressed concerns about critical issues (such as security) that exist with the widespread implementation of cloud computing. These types of concerns originate from the fact that data is stored remotely from the customer's location; in fact, it can be stored at any location. Security, in particular, is one of the most argued-about issues in the cloud computing field; several enterprises look at cloud computing warily due to projected security risks. The risks of compromised security and privacy may however be lower overall with cloud computing than they would be if the data were stored on individual machines instead of in a so-called "cloud" (the network of computers used for remote storage and maintenance). A comparison of the benefits and risks of cloud computing with those of the status quo is necessary for a full evaluation of the viability of cloud computing. Consequently, some issues arise that clients need to consider as they contemplate moving to cloud computing for their businesses. In this paper I summarize reliability, availability, and security issues for cloud computing (RAS issues), and propose feasible and available solutions for some of them.
Article
With the significant advances in Information and Communications Technology (ICT) over the last half century, there is an increasingly perceived vision that computing will one day be the 5th utility (after water, electricity, gas, and telephony). This computing utility, like all other four existing utilities, will provide the basic level of computing service that is considered essential to meet the everyday needs of the general community. To deliver this vision, a number of computing paradigms have been proposed, of which the latest one is known as Cloud computing. Hence, in this paper, we define Cloud computing and provide the architecture for creating Clouds with market-oriented resource allocation by leveraging technologies such as Virtual Machines (VMs). We also provide insights on market-based resource management strategies that encompass both customer-driven service management and computational risk management to sustain Service Level Agreement (SLA)-oriented resource allocation. In addition, we reveal our early thoughts on interconnecting Clouds for dynamically creating global Cloud exchanges and markets. Then, we present some representative Cloud platforms, especially those developed in industries, along with our current work towards realizing market-oriented resource allocation of Clouds as realized in Aneka enterprise Cloud technology. Furthermore, we highlight the difference between High Performance Computing (HPC) workload and Internet-based services workload. We also describe a meta-negotiation infrastructure to establish global Cloud exchanges and markets, and illustrate a case study of harnessing ‘Storage Clouds’ for high performance content delivery. Finally, we conclude with the need for convergence of competing IT paradigms to deliver our 21st century vision.
Article
Scalability is said to be one of the major advantages brought by the cloud paradigm and, more specifically, the one that distinguishes it from an "advanced outsourcing" solution. However, some important issues remain open before the dream of automated scaling for applications can come true. In this paper, the most notable initiatives towards whole-application scalability in cloud environments are presented. We present relevant efforts at the edge of the state of the art, providing an encompassing overview of the trends they each follow. We also highlight pending challenges that will likely be addressed in new research efforts and present an ideal scalable cloud system.
B. Burns and D. Oppenheimer. 2016. Design Patterns for Container-based Distributed Systems. In Proceedings of the 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud).

Elasticsearch B.V. 2019. Elasticsearch: RESTful, Distributed Search & Analytics. https://www.elastic.co/products/elasticsearch. [Online; accessed 15-May-2019].

Hewlett-Packard Enterprise Development LP. 2017. Monasca - an OpenStack Community project. http://monasca.io/. [Online; accessed 15-May-2019].

Armin Ronacher. 2019. Flask (A Python Microframework). http://flask.pocoo.org/. [Online; accessed 15-May-2019].

Elasticsearch B.V. 2019. Elasticsearch Beats - Lightweight Data Shippers. https://www.elastic.co/products/beats/. [Online; accessed 15-May-2019].

Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. 2010. A view of cloud computing. Commun. ACM 53, 4 (2010).

Salvatore Sanfilippo and contributors. 2019. Redis.io. https://redis.io/. [Online; accessed 15-May-2019].