Source publication
Execution traces capture information on a software system’s runtime behavior, including data on system-internal software control flows, performance, as well as request parameters and values. In research and industrial practice, execution traces serve as an important basis for model-based and measurement-based performance evaluation, e.g., for appli...
Citations
... In the same way, OpenTelemetry [35] was proposed, based on two former standards: OpenCensus [63] and OpenTracing [64]. Other standards like Open.XTrace [65] have been proposed, although they did not gain much attention from academia and industry. The benefit of using such standards is that they ease data exchange and the integration of different monitoring components. ...
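To make the interoperability point concrete, here is a minimal sketch of instrumenting one operation with the vendor-neutral OpenTelemetry Java API. The service and operation names are hypothetical; which backend ultimately receives the spans is decided purely by SDK configuration, which is the data-exchange benefit the excerpt refers to.

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class CheckoutInstrumentation {

    // Tracer obtained from whatever SDK/exporter the operator configured.
    private static final Tracer TRACER =
            GlobalOpenTelemetry.getTracer("example-shop");

    public static void handleCheckout(String orderId) {
        Span span = TRACER.spanBuilder("checkout")
                .setAttribute("order.id", orderId)
                .startSpan();
        try (Scope ignored = span.makeCurrent()) {
            // Business logic goes here; spans created by called code become
            // children of this span via the active context.
        } finally {
            span.end(); // hands the finished span to the configured exporter
        }
    }
}
```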
Fog computing is a distributed paradigm that provides computational resources in the users' vicinity. Fog orchestration is a set of functionalities that coordinate the dynamic infrastructure and manage the services to guarantee the Service Level Agreements. Monitoring is an orchestration functionality of prime importance: it is the basis for resource management actions, collecting the status of resources and services and delivering updated data to the orchestrator. There are several cloud monitoring solutions and tools, but none of them complies with fog characteristics and challenges. Fog monitoring solutions are scarce, and they may not be prepared to compose an orchestration service. This paper updates the knowledge base about fog monitoring, assessing recent subjects in this context such as observability, data standardization, and instrumentation domains. We propose a novel taxonomy of fog monitoring solutions, supported by a systematic review of the literature. Fog monitoring proposals are analyzed and categorized by this new taxonomy, offering researchers a comprehensive overview. This work also highlights the main challenges and open research questions.
... Because Dialogflow is limited to conversations, we had to implement the fulfillment functionality [1] for Dialogflow to connect it to a web service that actually executes load tests. We send the load test configuration to the Vizard framework [22], which executes load tests and sends back execution reports. For the load test execution itself, Vizard uses Apache JMeter. ...
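A minimal sketch of the kind of fulfillment-style web service described in the excerpt: an HTTP endpoint accepts a load-test request and launches Apache JMeter in non-GUI mode. The endpoint path, port, and test plan file are hypothetical, and the cited Vizard framework's actual interface is not shown here.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class LoadTestWebhook {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/run-load-test", exchange -> {
            // The request body would carry the load-test configuration
            // (users, duration, target URL); here it is only logged.
            String config = new String(exchange.getRequestBody().readAllBytes(),
                    StandardCharsets.UTF_8);
            System.out.println("Received load test config: " + config);

            // Launch JMeter headless: -n non-GUI, -t test plan, -l result log.
            new ProcessBuilder("jmeter", "-n", "-t", "testplan.jmx",
                    "-l", "results.jtl")
                    .inheritIO()
                    .start();

            byte[] reply = "load test started".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
    }
}
```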
... To achieve low overhead, resource monitoring approaches typically do not aim at complex in-vivo online analyses, e.g., aggregation of monitoring information or constraint checking. Instead, many tools collect extensive traces, e.g., using Open.XTrace [44], in order to support different post-mortem offline analyses. ...
... Like the other parts of the architecture, this is not provided by all monitoring frameworks, and may take very different forms, ranging from simple trace files to entire database solutions. In recent years, several initiatives have pushed on standardization for persistence and interchange formats, e.g., Open.XTrace [44] or OpenTracing for performance monitoring. ...
[Context] Complex and heterogeneous software systems need to be monitored as their full behavior often only emerges at runtime, e.g., when interacting with other systems or the environment. Software monitoring approaches observe and check properties or quality attributes of software systems during operation. Such approaches have been developed in diverse communities for various kinds of systems and purposes. For instance, requirements monitoring aims to check at runtime whether a software system adheres to its requirements, while resource or performance monitoring collects information about the consumption of computing resources by the monitored system. Many venues publish research on software monitoring, often using diverse terminology, and focusing on different monitoring aspects and phases. The lack of a comprehensive overview of existing research often leads to re-inventing the wheel. [Objective] We provide a domain model to structure and systematize the field of software monitoring, starting with requirements and resource monitoring. [Method] We developed an initial domain model based on (i) our extensive experiences with requirements and resource monitoring, (ii) earlier efforts to develop a comparison framework for monitoring approaches, and (iii) an earlier systematic literature review on requirements monitoring frameworks. We then systematically analyzed 47 existing requirements and resource monitoring approaches to iteratively refine the domain model and to develop a reference architecture for software monitoring approaches. [Results] Our domain model covers the key elements of monitoring approaches and allows analyzing their commonalities and differences. Together with the reference architecture, our domain model supports the development of integrated monitoring solutions. We provide details on 47 approaches we analyzed with the model to assess its coverage. We also evaluate the reference architecture by instantiating it for five different monitoring solutions. [Conclusions] We conclude that requirements and resource monitoring have more commonalities than differences, which is promising for the future integration of existing monitoring solutions.
... Kieker traces are also available for use with tools other than Kieker's own tooling, as they can be automatically transformed to Open Execution Trace Exchange (OPEN.xtrace) traces, an open source trace format enabling interoperability between software performance engineering approaches (Okanović et al., 2016). ...
Energy efficiency of computing systems has become an increasingly important issue over the last decades. In 2015, data centers were responsible for 2% of the world's greenhouse gas emissions, which is roughly the same as the amount produced by air travel. In addition to these environmental concerns, power consumption of servers in data centers results in significant operating costs, which increase by at least 10% each year. To address this challenge, the U.S. EPA and other government agencies are considering the use of novel measurement methods in order to label the energy efficiency of servers. The energy efficiency and power consumption of a server is subject to a great number of factors, including, but not limited to, hardware, software stack, workload, and load level. This huge number of influencing factors makes measuring and rating of energy efficiency challenging. It also makes it difficult to find an energy-efficient server for a specific use-case. Among others, server provisioners, operators, and regulators would profit from information on the servers in question and on the factors that affect those servers' power consumption and efficiency. However, we see a lack of measurement methods and metrics for energy efficiency of the systems under consideration. Even assuming that a measurement methodology existed, making decisions based on its results would be challenging. Power prediction methods that make use of these results would aid in decision making. They would enable potential server customers to make better purchasing decisions and help operators predict the effects of potential reconfigurations. Existing energy efficiency benchmarks cannot fully address these challenges, as they only measure single applications at limited sets of load levels. In addition, existing efficiency metrics are not helpful in this context, as they are usually a variation of the simple performance per power ratio, which is only applicable to single workloads at a single load level. Existing data center efficiency metrics, on the other hand, express the efficiency of the data center space and power infrastructure, not focusing on the efficiency of the servers themselves. Power prediction methods for not-yet-available systems that could make use of the results provided by a comprehensive power rating methodology are also lacking. Existing power prediction models for hardware designers have a very fine level of granularity and detail that would not be useful for data center operators. This thesis presents a measurement and rating methodology for energy efficiency of servers and an energy efficiency metric to be applied to the results of this methodology. We also design workloads, load intensity and distribution models, and mechanisms that can be used for energy efficiency testing. Based on this, we present power prediction mechanisms and models that utilize our measurement methodology and its results for power prediction. Specifically, the six major contributions of this thesis are: We present a measurement methodology and metrics for energy efficiency rating of servers that use multiple, specifically chosen workloads at different load levels for a full system characterization. We evaluate the methodology and metric with regard to their reproducibility, fairness, and relevance. We investigate the power and performance variations of test results and show fairness of the metric through a mathematical proof and a correlation analysis on a set of 385 servers. 
We evaluate the metric's relevance by showing the relationships that can be established between metric results and third-party applications. We create models and extraction mechanisms for load profiles that vary over time, as well as load distribution mechanisms and policies. The models are designed to be used to define arbitrary dynamic load intensity profiles that can be leveraged for benchmarking purposes. The load distribution mechanisms place workloads on computing resources in a hierarchical manner. Our load intensity models can be extracted in less than 0.2 seconds and our resulting models feature a median modeling error of 12.7% on average. In addition, our new load distribution strategy can save up to 10.7% of power consumption on a single server node. We introduce an approach to create small-scale workloads that emulate the power consumption-relevant behavior of large-scale workloads by approximating their CPU performance counter profile, and we introduce TeaStore, a distributed, micro-service-based reference application. TeaStore can be used to evaluate power and performance model accuracy, elasticity of cloud auto-scalers, and the effectiveness of power saving mechanisms for distributed systems. We show that we are capable of emulating the power consumption behavior of realistic workloads with a mean deviation less than 10% and down to 0.2 watts (1%). We demonstrate the use of TeaStore in the context of performance model extraction and cloud auto-scaling also showing that it may generate workloads with different effects on the power consumption of the system under consideration. We present a method for automated selection of interpolation strategies for performance and power characterization. We also introduce a configuration approach for polynomial interpolation functions of varying degrees that improves prediction accuracy for system power consumption for a given system utilization. We show that, in comparison to regression, our automated interpolation method selection and configuration approach improves modeling accuracy by 43.6% if additional reference data is available and by 31.4% if it is not. We present an approach for explicit modeling of the impact a virtualized environment has on power consumption and a method to predict the power consumption of a software application. Both methods use results produced by our measurement methodology to predict the respective power consumption for servers that are otherwise not available to the person making the prediction. Our methods are able to predict power consumption reliably for multiple hypervisor configurations and for the target application workloads. Application workload power prediction features a mean average absolute percentage error of 9.5%. Finally, we propose an end-to-end modeling approach for predicting the power consumption of component placements at run-time. The model can also be used to predict the power consumption at load levels that have not yet been observed on the running system. We show that we can predict the power consumption of two different distributed web applications with a mean absolute percentage error of 2.2%. In addition, we can predict the power consumption of a system at a previously unobserved load level and component distribution with an error of 1.2%. The contributions of this thesis already show a significant impact in science and industry. The presented efficiency rating methodology, including its metric, have been adopted by the U.S. 
EPA in the latest version of the ENERGY STAR Computer Server program. They are also being considered by additional regulatory agencies, including the EU Commission and the China National Institute of Standardization. In addition, the methodology's implementation and the underlying methodology itself have already found use in several research publications. Regarding future work, we see a need for new workloads targeting specialized server hardware. At the moment, we are witnessing a shift in execution hardware to specialized machine learning chips, general purpose GPU computing, FPGAs being embedded into compute servers, etc. To ensure that our measurement methodology remains relevant, workloads covering these areas are required. Similarly, power prediction models must be extended to cover these new scenarios.
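One of the contributions above is an interpolation-based characterization of power consumption across load levels. Purely as an illustration of that technique class (not the thesis' actual models, and with made-up measurements), the following sketch predicts power at an unobserved utilization by piecewise-linear interpolation between measured load levels:

```java
public class PowerInterpolation {

    /** Linearly interpolates power [W] for a utilization in [0, 1]. */
    static double interpolate(double[] util, double[] powerW, double target) {
        if (target <= util[0]) return powerW[0];
        for (int i = 1; i < util.length; i++) {
            if (target <= util[i]) {
                double fraction = (target - util[i - 1]) / (util[i] - util[i - 1]);
                return powerW[i - 1] + fraction * (powerW[i] - powerW[i - 1]);
            }
        }
        return powerW[powerW.length - 1];
    }

    public static void main(String[] args) {
        // Hypothetical measured load levels (utilization -> power draw).
        double[] util   = {0.00, 0.25, 0.50, 0.75, 1.00};
        double[] powerW = {45.0, 80.0, 110.0, 150.0, 200.0};

        // Predict power at a load level that was never measured directly.
        System.out.printf("Predicted power at 62%% load: %.1f W%n",
                interpolate(util, powerW, 0.62));
    }
}
```

The thesis additionally selects and configures polynomial interpolation functions of varying degrees; the linear variant above is only the simplest instance of the idea.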
... Kieker traces are also available for use with tools other than Kieker's own tooling, as they can be automatically transformed to Open Execution Trace Exchange (OPEN.xtrace) traces, an open source trace format enabling interoperability between software performance engineering approaches [35]. ...
Researchers propose and employ various methods to analyze, model, optimize and manage modern distributed cloud applications. In order to demonstrate and evaluate these methods in realistic scenarios, researchers rely on reference applications. These applications should offer a range of different behaviors, degrees of freedom allowing for customization and should use a modern and representative technology stack. Existing testing and benchmarking applications are either outdated, designed for specific testing scenarios, or do not offer the necessary degrees of freedom. Further, most cloud reference applications are difficult to deploy and run.

In this paper, we present the TeaStore, a micro-service-based test and reference cloud application. TeaStore offers services with various performance characteristics and a high degree of freedom regarding its deployment and configuration to be used as a cloud reference application for researchers. The TeaStore is designed for the evaluation of performance modeling and resource management techniques. We invite cloud researchers to use the TeaStore and provide it open-source, extendable, easily deployable and monitorable.
... Kieker traces are also available for use with tools other than Kieker's own tooling, as they can be automatically transformed to Open Execution Trace Exchange (OPEN.xtrace) traces, an open source trace format enabling interoperability between software performance engineering approaches [35]. ...
Modern distributed applications offer complex performance behavior and many degrees of freedom regarding deployment and configuration. Researchers employ various methods of analysis, modeling, and management that leverage these degrees of freedom to predict or improve non-functional properties of the software under consideration. In order to demonstrate and evaluate their applicability in the real world, methods resulting from such research areas require test and reference applications that offer a range of different behaviors, as well as the necessary degrees of freedom. Existing production software is often inaccessible for researchers or closed off to instrumentation. Existing testing and benchmarking frameworks, on the other hand, are either designed for specific testing scenarios, or they do not offer the necessary degrees of freedom. Further, most test applications are difficult to deploy and run, or are outdated. In this paper, we introduce the TeaStore, a state-of-the-art micro-service-based test and reference application. TeaStore offers services with different performance characteristics and many degrees of freedom regarding deployment and configuration to be used as a benchmarking framework for researchers. The TeaStore allows evaluating performance modeling and resource management techniques; it also offers instrumented variants to enable extensive run-time analysis. We demonstrate TeaStore's use in three contexts: performance modeling, cloud resource management, and energy efficiency analysis. Our experiments show that TeaStore can be used for evaluating novel approaches in these contexts and also motivates further research in the areas of performance modeling and resource management.
... specific analysis goals and capabilities. Interoperability of evaluation techniques can be achieved by automated performance model extraction [10,27], and by transformations of monitoring data [18] and performance models [9]. Integrating all aspects into a unified interface, declarative software performance engineering (SPE) approaches automate the whole process of deriving performance metrics [28]. ...
... Qualitative comparisons of measurement-based approaches have been performed in [13,18,29]. Gartner evaluates application performance management (APM) tool vendors on a yearly basis [13]. ...
... Kowall and Cappelli [13] define three dimensions of APM functionality: application topology discovery and visualization, application component deep dive, and user-defined transaction profiling. Okanović et al. [18] compare application performance monitoring tools concerning derivable information, like call parameters or error stack traces. According to Watson [29], key aspects to consider include programming language support, cloud support, software-as-a-service (SaaS) versus on-premise, pricing, and ease of use. ...
Software performance engineering (SPE) provides a plethora of methods and tooling for measuring, modeling, and evaluating performance properties of software systems. The solution approaches come with different strengths and limitations concerning, for example, accuracy, time-to-result, or system overhead. While approaches can, in principle, be interchanged, the choice of an appropriate approach and tooling to solve a given performance concern still relies on expert knowledge. Currently, there is no automated and extensible approach for decision support. In this paper, we present a methodology for the automated selection of performance engineering approaches tailored to user concerns. We decouple the complexity of selecting an SPE approach for a given scenario by providing a decision engine and capability models of the solution approaches. This separation makes it easy to append additional solution approaches and rating criteria. We demonstrate the applicability by presenting decision engines that compare measurement- and model-based analysis approaches.
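A minimal sketch of the capability-model idea described above: each solution approach advertises what it can deliver, and a decision engine filters the catalog against a user concern. All names, criteria, and example approaches below are hypothetical placeholders, not the paper's actual models.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SpeDecisionEngine {

    /** Capability model of one solution approach. */
    record Capability(String approach, boolean needsRunningSystem,
                      double relativeAccuracy, double timeToResultMinutes) {}

    /** User concern: constraints the selected approach must satisfy. */
    record Concern(boolean systemAvailable, double minAccuracy,
                   double maxTimeToResultMinutes) {}

    static List<Capability> select(List<Capability> catalog, Concern concern) {
        return catalog.stream()
                .filter(c -> !c.needsRunningSystem() || concern.systemAvailable())
                .filter(c -> c.relativeAccuracy() >= concern.minAccuracy())
                .filter(c -> c.timeToResultMinutes() <= concern.maxTimeToResultMinutes())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Capability> catalog = List.of(
                new Capability("load-test measurement", true, 0.95, 120),
                new Capability("queueing-model analysis", false, 0.80, 5));

        // No running system yet, moderate accuracy, quick answer wanted.
        Concern concern = new Concern(false, 0.75, 30);
        select(catalog, concern).forEach(c ->
                System.out.println("Suitable approach: " + c.approach()));
    }
}
```

In this shape, appending a new solution approach or rating criterion amounts to adding a catalog entry or another filter, which mirrors the extensibility the abstract claims.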
... Regarding the standardization of a common exchange format, attempts have been made both in academia [14] and in industry [15]; however, there is not yet strong adoption of such proposals. Our approach does not aim at creating a new protocol or a new vocabulary; rather, we aim at reducing product teams' effort, freeing them from having to care about the underlying protocol and making metrics available to them self-service. ...
Modern DevOps pipelines entail extreme automation and speed as paramount assets for continuous application improvement. Likewise, monitoring is required to assess quality of service and user experience so that applications can continuously evolve towards user-centric excellence. In this scenario, however, it is increasingly difficult to set up and maintain efficient monitoring infrastructures that are frictionless, i.e., that introduce no slowdown either in the DevOps pipeline or in the DevOps organizational and social structure comprising multiple roles and responsibilities. Using an experimental prototype, this paper elaborates Omnia, an approach for structured monitoring configuration and rollout built around a monitoring factory, i.e., a re-interpretation of the factory design pattern for building and managing ad-hoc monitoring platforms. Comparing with practitioner surveys and the state of the art, we observed that Omnia shows promise of delivering an effective solution that tackles the steep learning curve and entry costs needed to embrace cloud monitoring and monitoring-based DevOps continuous improvement.
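Since the abstract explicitly frames the monitoring factory as a re-interpretation of the factory design pattern, here is a plain-Java sketch of that pattern applied to monitoring: product teams ask for a monitor by concern and never touch the underlying protocol or backend. All class and backend names are hypothetical; Omnia's real design is more elaborate.

```java
import java.util.Map;
import java.util.function.Supplier;

public class MonitoringFactoryDemo {

    interface Monitor {
        void record(String metric, double value);
    }

    /** Hypothetical backend that ships metrics over some wire protocol. */
    static class HttpPushMonitor implements Monitor {
        @Override public void record(String metric, double value) {
            System.out.printf("push %s=%.2f to metrics endpoint%n", metric, value);
        }
    }

    /** Hypothetical backend that just writes metrics to a local log. */
    static class LogMonitor implements Monitor {
        @Override public void record(String metric, double value) {
            System.out.printf("log %s=%.2f%n", metric, value);
        }
    }

    /** The factory hides which backend a given concern is wired to. */
    static class MonitoringFactory {
        private static final Map<String, Supplier<Monitor>> REGISTRY = Map.of(
                "latency", HttpPushMonitor::new,
                "debug", LogMonitor::new);

        static Monitor forConcern(String concern) {
            return REGISTRY.getOrDefault(concern, LogMonitor::new).get();
        }
    }

    public static void main(String[] args) {
        Monitor monitor = MonitoringFactory.forConcern("latency");
        monitor.record("checkout.response_time_ms", 182.0);
    }
}
```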
... A similar approach is likely possible for Kieker through plug-in extensions [19]. An alternative could be to use a common exchange format such as Open.Xtrace [12], enabling a more generic integration. Other kinds of data sources, such as offline monitoring databases or DQL, could also be used instead. ...
In large and complex systems there is a need to monitor resources as it is critical for system operation to ensure sufficient availability of resources and to adapt the system as needed. While there are various (resource)-monitoring solutions, these typically do not include an analysis part that takes care of analyzing violations and responding to them. In this paper we report on experiences, challenges and lessons learned in creating a solution for performing requirements-monitoring for resource constraints and using this as a basis for adaptation to optimize the resource behavior. Our approach rests on reusing two previous solutions (one for resource monitoring and one for requirements-based adaptation) that were built in our group.
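A minimal sketch of the monitoring-plus-adaptation loop the paper describes: a resource constraint is checked against observed values, and a violation triggers an adaptation action. Thresholds, metrics, and the adaptation step are hypothetical placeholders, not the authors' actual tooling.

```java
public class ResourceConstraintMonitor {

    interface AdaptationAction {
        void apply(String resource, double observed);
    }

    static void check(String resource, double observedUtilization,
                      double maxUtilization, AdaptationAction onViolation) {
        if (observedUtilization > maxUtilization) {
            // Analysis part: constraint violated, hand over to adaptation.
            onViolation.apply(resource, observedUtilization);
        }
    }

    public static void main(String[] args) {
        AdaptationAction scaleOut = (resource, observed) ->
                System.out.printf("Constraint violated on %s (%.0f%%): requesting scale-out%n",
                        resource, observed * 100);

        // Simulated monitoring samples for one resource.
        double[] cpuSamples = {0.55, 0.72, 0.91};
        for (double sample : cpuSamples) {
            check("cpu", sample, 0.80, scaleOut);
        }
    }
}
```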
... Their high level of abstraction prevents a straightforward realization of REQ1 and REQ3, making them unsuitable for our purposes. An approach closer to the desired generic format is the Common Trace API (CTA), also known as OPEN.XTRACE [12]. The CTA is an API for representing monitored execution traces in a tool-independent fashion and therefore immediately satisfies REQ1 and REQ2. ...
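To illustrate what a tool-independent trace representation in the spirit of OPEN.XTRACE / the CTA looks like, here is a simplified sketch. The types below are hypothetical stand-ins, not the actual CTA interfaces; they only show how a trace from any monitoring tool can be mapped onto one common structure.

```java
import java.util.ArrayList;
import java.util.List;

public class GenericTraceDemo {

    /** One monitored operation call, possibly with nested child calls. */
    static class Callable {
        final String operation;
        final long durationMicros;
        final List<Callable> children = new ArrayList<>();

        Callable(String operation, long durationMicros) {
            this.operation = operation;
            this.durationMicros = durationMicros;
        }
    }

    /** A complete end-to-end trace with a single root callable. */
    record Trace(String traceId, Callable root) {}

    static void print(Callable c, String indent) {
        System.out.println(indent + c.operation + " (" + c.durationMicros + " µs)");
        c.children.forEach(child -> print(child, indent + "  "));
    }

    public static void main(String[] args) {
        // A tool-specific adapter would build this structure from, e.g.,
        // Kieker records; here it is filled in by hand.
        Callable root = new Callable("ShopController.checkout", 5200);
        Callable db = new Callable("OrderRepository.save", 1800);
        root.children.add(db);

        Trace trace = new Trace("trace-42", root);
        print(trace.root(), "");
    }
}
```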
The performance of software systems is an ongoing concern in industry, as is the development of corresponding performance models. Recently, several approaches for deriving such performance models from monitoring data have been proposed. A current limitation of these approaches is that most of them are bound to specific monitoring tools for providing the data, which limits their applicability.
We therefore propose a generic platform for transforming monitoring data into performance models, encapsulating these approaches for deriving performance models. This platform gives the flexibility of exchanging the monitoring tool or the used performance modeling approach, allowing more comprehensive performance analysis without additional manual transformation work. A seamless exchangeability of the performance modeling approach enables the generation of different types of performance models based on the same monitoring data, while the exchangeability of the monitoring tool enables the same approaches to be employed on a wider range of systems, as often the applicability of certain monitoring tools is limited by environmental properties. In addition, the generic nature of the platform aims to support the rapid development of prototypes of new, upcoming ideas within the context of performance modeling based on monitoring data.
During our evaluation we examine the quality of our approach in terms of accuracy and scalability. We show that our platform for transforming monitoring data into performance models scales with a very low overhead and that the results of the integrated performance modeling approaches are very accurate in comparison to the results of the non-integrated versions.
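The core architectural idea above, exchanging the monitoring tool and the performance modeling approach independently, can be sketched as two interfaces wired together by the platform. All type names and the intermediate record below are hypothetical simplifications of that idea, not the platform's actual API.

```java
import java.util.List;

public class TransformationPlatformDemo {

    /** Tool-agnostic view of one monitored operation execution. */
    record OperationRecord(String operation, double responseTimeMs) {}

    /** Adapter over a concrete monitoring tool (Kieker, OpenTelemetry, ...). */
    interface MonitoringDataSource {
        List<OperationRecord> readRecords();
    }

    /** Builder for one concrete performance-model formalism. */
    interface PerformanceModelBuilder {
        String build(List<OperationRecord> records);
    }

    /** The platform simply wires any source to any builder. */
    static String transform(MonitoringDataSource source, PerformanceModelBuilder builder) {
        return builder.build(source.readRecords());
    }

    public static void main(String[] args) {
        MonitoringDataSource fakeSource = () -> List.of(
                new OperationRecord("checkout", 182.0),
                new OperationRecord("browse", 45.0));

        // A trivial "model": mean response time over all records.
        PerformanceModelBuilder meanModel = records -> {
            double mean = records.stream()
                    .mapToDouble(OperationRecord::responseTimeMs)
                    .average().orElse(0.0);
            return "mean response time = " + mean + " ms";
        };

        System.out.println(transform(fakeSource, meanModel));
    }
}
```

Swapping in a different monitoring tool or model formalism then only requires a new implementation of the corresponding interface, which is the exchangeability the abstract emphasizes.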