Context in source publication

Context 1
... deployed Ceph [11], a distributed file system, to store glance images. Figure 2 presents the different OpenStack services and their distribution on hosts. ...

Citations

... On these servers, a highly available OpenStack installation is deployed, which functions as a private cloud [280]. Since OpenStack itself consists of management services, it occupies a subset of the physical hosts. ...
Thesis
Cloud computing is widely applied by modern software development companies. Providing digital services in a cloud environment offers both the possibility of cost-efficient usage of computation resources and the ability to dynamically scale applications on demand. Based on this flexibility, more and more complex software applications are being developed, leading to increasing maintenance efforts to ensure the reliability of the entire system infrastructure. Furthermore, high availability requirements for cloud services (99.999% as the industry standard) are difficult to guarantee due to the complexity of modern systems and can therefore be ensured only with great effort. Due to these trends, there is an increasing demand for intelligent applications that automatically detect anomalies and suggest ways to solve or at least mitigate problems before they cascade into a negative impact on service quality. This thesis focuses on the detection of degraded abnormal system states in cloud environments. A holistic analysis pipeline and infrastructure is proposed, and the applicability of different machine learning strategies is discussed to provide an automated solution. Based on the underlying assumptions, a novel unsupervised anomaly detection algorithm called CABIRCH is presented and its applicability is analyzed and discussed. Since the choice of hyperparameters has a great influence on the accuracy of the algorithm, a hyperparameter selection procedure with a novel fitness function is proposed, leading to further automation of the integrated anomaly detection. The method is generalized and applicable to a variety of unsupervised anomaly detection algorithms, which are evaluated and compared with recent publications. The results show the applicability for the automated detection of degraded abnormal system states, and possible limitations are discussed.
The results show that detection of system anomaly scenarios achieves accurate detection rates but comes with a false alarm rate of more than 1%.
... A highly available OpenStack private cloud platform [21] forms the foundation of the evaluation scenario. The platform runs on 14 commodity servers with the following specifications: ...
Article
Reliable deployment of services is especially challenging in virtualized infrastructures, where the deep technological stack and the multitude of components necessitate automatic anomaly detection and remediation mechanisms. Traditional monitoring solutions observe the system and generate alarms when the collected metrics exceed predefined thresholds. The fixed thresholds rely on expert knowledge and can lead to numerous false alarms, while abnormal behavior that spans multiple metrics, components, or system layers may not be detected. We propose to use an unsupervised online clustering algorithm to create a model of the normal behavior of each monitored component with minimal human interaction and no impact on the monitored system. When an anomaly is detected, a human administrator or automatic remediation system can subsequently revert the component into a normal state. An experimental evaluation resulted in a high accuracy of our approach, indicating that it is suitable for anomaly detection in productive systems.
... Clearwater respects the basic IMS architectural principles and interfaces well known in the telecommunication world. In the literature, Clearwater has been used as the main testbed for anomaly detection in NFV [164], and S. Makhsous et al. used it to study high availability [165] for NFV deployments. ...
Thesis
The main goal of the PhD activities is to define and develop an architecture and mechanisms that ensure consistency and continuity of operations and behaviors in mixed physical/virtual environments characterized by a high level of dynamicity, elasticity and heterogeneity, applying a cognitive approach to the architecture where applicable. The target is to avoid the "build it first, manage it later" paradigm. The research questions targeted by the PhD are the following: 1. Identify the changes to the implementation of Network Operation Support Systems when using SDN as a design approach for future networks. The study could be restricted to mobile networks, for example, or a sub-part of them (core networks, RAN, data centers, etc.); 2. Identify the needed evolution at the management interface level: a. Do we need an alternative to the well-known FCAPS, and do we still need the element management system? b. What will change when provisioning an SDN-based service? c. How can the resiliency of SDN-based networks be ensured?
... The evaluation testbed is based on 13 physical machines, each equipped with an Intel Xeon X3450 (4 cores), 16 GB RAM, three 1 TB disks and two 1 Gbit Ethernet interfaces. The cloud environment is based on a highly available OpenStack installation [18]. Since a number of hosts are occupied by the redundant OpenStack services, only a subset of 6 physical hosts is included in the evaluation. ...
Conference Paper
Critical services in the field of Network Function Virtualization require elaborate reliability and high availability mechanisms to meet the high service quality requirements. Traditional monitoring systems detect overload situations and outages in order to automatically scale out services or mask faults. However, faults are often preceded by anomalies and subtle misbehaviors of the services, which are overlooked when detecting only outages. We propose to exploit machine learning techniques to detect abnormal behavior of services and hosts by analysing metrics collected from all layers and components of the cloud infrastructure. Various algorithms are able to compute models of a host's normal behavior that can be used for anomaly detection at runtime. An offline evaluation of data collected from anomaly injection experiments shows that the models are able to achieve very high precision and recall values.
Article
Software as a Service is evolving into a leading model for cloud service delivery, enabling service providers to remotely deliver hosted, developed and managed software over the Internet. In parallel, some IT services are moving from traditional Internet services to cloud services based on peer-to-peer technologies. However, the P2P-based cloud is a large-scale, heterogeneous and highly dynamic environment whose performance is highly dependent on its ability to maintain persistent availability of SaaS services. In this paper, we propose an approach for improving SaaS service availability in order to meet service quality requirements and maintain performance in a P2P-based cloud environment. It is mainly based on a new hybrid clustering mechanism that aims to provide a virtual and optimal infrastructure in order to organize the system peers into distinct clusters represented by virtual nodes, which together form a virtual layer. This layer allows not only the distribution of peer providers but also the formation of condensed areas of each service of interest for a set of neighboring peers, which improves the availability probability of services in specific regions. In addition, a service availability measurement model is proposed based on the system's virtual layer, taking into account different entities at different levels. The experimental results show that the proposed approach improves the probability of SaaS service availability and the reliability of the P2P-Cloud system. It mainly addresses the large-scale nature of distributed systems while making the best trade-off in maintaining QoS in terms of availability, performance and cost.
Conference Paper
Software Networks built by combining Software Defined Networks (SDN), Network Function Virtualization (NFV) and Cloud principles call for agile and dynamic automation of management operations to ensure continuous provisioning and deployment of networked resources and services. In this context, efficient Service Level Agreements (SLA) management and anticipation of Service Level Objectives (SLO) breaches become essential to fulfill established service contracts with clients. In this paper, we design and specify a framework for cognitive SLA enforcement (using Artificial Neural Network learning) for networking services involving VNFs (Virtualized Network Functions) and SDN controllers. A proof of concept, a testbed description and an extensive evaluation assess the performance of the proposed framework.