Source publication
The DevOps movement intends to improve communication, collaboration, and integration between software developers (Dev) and IT operations professionals (Ops). Automation of software quality assurance is key to DevOps success. We present how automated performance benchmarks may be included in continuous integration. As an example, we report on reg...
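To make the idea of including benchmarks in continuous integration concrete, the following minimal sketch shows a regression gate that fails a CI job when a fresh benchmark result exceeds a stored baseline; the class name, the baseline values, and the 10% tolerance are illustrative assumptions, not details from the publication.

```java
// Hypothetical sketch: fail a CI step when a benchmark result regresses
// beyond a tolerance relative to a stored baseline. Names and the 10%
// threshold are illustrative assumptions, not taken from the cited paper.
public class RegressionGate {

    /** Returns true if the new measurement is acceptably close to the baseline. */
    static boolean withinTolerance(double baselineMicros, double currentMicros, double tolerance) {
        return currentMicros <= baselineMicros * (1.0 + tolerance);
    }

    public static void main(String[] args) {
        double baseline = 1.10;  // µs per monitored call, from the last accepted run
        double current  = 1.42;  // µs per monitored call, from this CI run

        if (!withinTolerance(baseline, current, 0.10)) {
            System.err.printf("Performance regression: %.2f µs vs. baseline %.2f µs%n",
                    current, baseline);
            System.exit(1);   // non-zero exit fails the CI job
        }
        System.out.println("Benchmark within tolerance.");
    }
}
```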
Similar publications
Model-based testing (MBT) provides an automated approach for finding discrepancies between software models and their implementation. If we want to incorporate MBT into the fast and iterative software development process that is Continuous Integration / Continuous Deployment (CI/CD), then MBT must be able to test the entire model in as little time as possible...
Citations
... Using the trace data, Kieker allows for reverse engineering and visualization of the software architecture. The Kieker developers have been continuously monitoring its performance overhead with the MooBench monitoring overhead microbenchmark [29,42]. ...
... The instrumentation for observability imposes overhead, which should be minimized in production environments. Kieker's overhead has been continuously measured since 2015 using the MooBench microbenchmark [42]. It measures both the overall overhead of Kieker tracing and the overhead incurred by different factors: the instrumentation itself, the measurement, and the final serialization of data. ...
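The decomposition of the overhead into instrumentation, measurement, and serialization portions can be read as simple differences between benchmark configurations. The sketch below only illustrates that arithmetic, assuming four measured configurations and made-up numbers; it is not MooBench code.

```java
// Illustrative arithmetic only: derive overhead portions as differences
// between four measured configurations (uninstrumented, instrumented but
// deactivated probe, collecting without writing, full monitoring).
// The numbers are made up; the decomposition mirrors the idea described above.
public class OverheadPortions {
    public static void main(String[] args) {
        double uninstrumented = 0.50; // µs per call, plain method
        double deactivated    = 0.60; // µs, probe woven in but disabled
        double collecting     = 1.30; // µs, records created but not written
        double fullMonitoring = 2.10; // µs, records serialized/written

        System.out.printf("instrumentation: %.2f µs%n", deactivated - uninstrumented);
        System.out.printf("data collection: %.2f µs%n", collecting - deactivated);
        System.out.printf("serialization:   %.2f µs%n", fullMonitoring - collecting);
        System.out.printf("total overhead:  %.2f µs%n", fullMonitoring - uninstrumented);
    }
}
```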
Observability of a software system aims to enable its engineers and operators to keep the system robust and highly available. With this paper, we present the Kieker Observability Framework Version 2, the successor of the Kieker Monitoring Framework. In this tool artifact paper, we do not just present the Kieker framework, but also a demonstration of its application to the TeaStore benchmark, integrated with the visual analytics tool ExplorViz. This demo is provided both as an online service and as an artifact that you can deploy yourself.
... These results show that µOpTime significantly reduces the execution time of microbenchmark suites while maintaining result accuracy. This further closes the gap towards enabling continuous benchmarking with microbenchmark suites in CI/CD pipelines [21,53]. ...
... Benchmarking in CI/CD pipelines. Besides our study, there are several others that integrate microbenchmarks into CI/CD pipelines to detect performance changes [10,11,19,28,35,53]. Application benchmarks, i.e., stressing fully set-up systems such as a database system with an artificial load such as HTTP requests [6], are increasingly used for detecting performance issues [16,17,21,27]. ...
Performance regressions have a tremendous impact on the quality of software. One way to catch regressions before they reach production is executing performance tests before deployment, e.g., using microbenchmarks, which measure performance at the subroutine level. In projects with many microbenchmarks, this may take several hours due to repeated execution to get accurate results, disqualifying them from frequent use in CI/CD pipelines. We propose µOpTime, a static approach to reduce the execution time of microbenchmark suites by configuring the number of repetitions for each microbenchmark. Based on the results of a full, previous microbenchmark suite run, µOpTime determines the minimal number of (measurement) repetitions with statistical stability metrics that still lead to accurate results. We evaluate µOpTime in an experimental study on 14 open-source projects written in two programming languages, using five stability metrics. Our results show that (i) µOpTime reduces the total suite execution time (measurement phase) by up to 95.83% (Go) and 94.17% (Java), (ii) the choice of stability metric depends on the project and programming language, (iii) microbenchmark warmup phases have to be considered for Java projects (potentially leading to higher reductions), and (iv) µOpTime can be used to reliably detect performance regressions in CI/CD pipelines.
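One way to picture the core step is: given the measurement series of a previous full run, find the smallest number of repetitions for which a stability metric is already below a threshold. The sketch below uses the coefficient of variation with a 2% threshold purely as an illustration; µOpTime defines its own stability metrics and decision procedure.

```java
import java.util.Arrays;

// Illustrative sketch of repetition reduction: find the smallest prefix of a
// previously recorded series of repetition means whose coefficient of
// variation (CV) stays below a threshold. CV and the 2% threshold are
// assumptions for illustration; µOpTime defines its own stability metrics.
public class MinimalRepetitions {

    static double coefficientOfVariation(double[] values, int n) {
        double mean = Arrays.stream(values, 0, n).average().orElse(0.0);
        double variance = Arrays.stream(values, 0, n)
                .map(v -> (v - mean) * (v - mean)).sum() / (n - 1);
        return Math.sqrt(variance) / mean;
    }

    /** Smallest repetition count whose measurements are already "stable enough". */
    static int minimalRepetitions(double[] repetitionMeans, double cvThreshold) {
        for (int n = 3; n <= repetitionMeans.length; n++) {
            if (coefficientOfVariation(repetitionMeans, n) <= cvThreshold) {
                return n;
            }
        }
        return repetitionMeans.length; // fall back to the full suite configuration
    }

    public static void main(String[] args) {
        double[] means = {10.2, 10.4, 10.3, 10.5, 10.3, 10.4, 10.3, 10.4, 10.2, 10.3};
        System.out.println("repetitions needed: " + minimalRepetitions(means, 0.02));
    }
}
```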
... MooBench addresses another important dimension of the performance engineering of cloud-native applications: the performance overhead introduced by performance observability frameworks [1,10,16]. It continuously evaluates the performance regression of its targets. ...
... In empirical software engineering, benchmarks can be used for comparing different methods, techniques, and tools [9]. MooBench is designed for regression benchmarking of individual monitoring frameworks within continuous integration pipelines [1], not for comparing such frameworks against each other. ...
Performance engineering has become crucial for cloud-native architectures. Such an architecture deploys multiple services, with each service representing an orchestration of containerized processes. OpenTelemetry is growing in popularity in the cloud-native industry for observing software behaviour, and Kieker provides the necessary tools to monitor and analyze the performance of target architectures. Observability overhead is an important aspect of performance engineering, and MooBench is designed to compare different observability frameworks, including OpenTelemetry and Kieker. In this work, we measure the overhead of Cloudprofiler, a performance profiler implemented in C++ to measure native and JVM processes. It minimizes the profiling overhead by locating the profiler process outside the target process and by moving the disk-writing overhead off the critical path with buffer blocks and compression threads. Using MooBench, Cloudprofiler's buffered ID handler with Zstandard (ZSTD) lossless data compression showed an average execution time of 2.28 microseconds, which is 6.15 times faster than the non-buffered, non-compressing handler.
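The "off the critical path" design described in the abstract can be sketched generically with a bounded queue and a background writer thread: the instrumented code only enqueues records, while a separate thread performs the expensive persisting (and, in Cloudprofiler, compression). This Java analogue is an assumption-laden illustration, not the C++ implementation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Generic sketch of the "off the critical path" idea: the instrumented code
// only enqueues small records, while a background thread drains the queue
// and performs the expensive writing (and, in Cloudprofiler, compression).
// This is an illustrative Java analogue, not the C++ implementation.
public class BufferedHandlerSketch {

    private final BlockingQueue<String> buffer = new ArrayBlockingQueue<>(8192);

    BufferedHandlerSketch() {
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    String record = buffer.take();   // blocks until data arrives
                    // Expensive work (compression, disk I/O) would happen here.
                    System.out.println("persisted: " + record);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "writer");
        writer.setDaemon(true);
        writer.start();
    }

    /** Called on the hot path: only a cheap, non-blocking enqueue. */
    void record(String event) {
        buffer.offer(event);  // drop the record instead of blocking the caller
    }

    public static void main(String[] args) throws InterruptedException {
        BufferedHandlerSketch handler = new BufferedHandlerSketch();
        handler.record("methodA entry");
        handler.record("methodA exit");
        Thread.sleep(100);    // give the daemon writer time to drain the queue
    }
}
```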
... MooBench is a benchmark that aims to measure the performance overhead of monitoring frameworks [22]. To measure this overhead, it calls a method recursively down to a given call depth. ...
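The recursive workload mentioned in the citation above can be pictured as a method that calls itself down to a configured call depth, so that every recursion level triggers the monitoring probe. The following simplified sketch illustrates the idea; the names and timing logic are not MooBench's actual classes.

```java
// Simplified illustration of a recursive benchmark workload: the monitored
// method calls itself down to a configured call depth, so every level of the
// recursion triggers the monitoring probe. Names are illustrative, not
// MooBench's actual classes.
public class RecursiveWorkload {

    /** Busy-work method that recurses until the requested call depth is reached. */
    static long monitoredOperation(int remainingDepth, long methodTimeNanos) {
        long start = System.nanoTime();
        while (System.nanoTime() - start < methodTimeNanos) {
            // simulate the monitored method's own execution time
        }
        if (remainingDepth <= 1) {
            return System.nanoTime() - start;
        }
        return monitoredOperation(remainingDepth - 1, methodTimeNanos);
    }

    public static void main(String[] args) {
        int callDepth = 10;            // configured recursion depth
        long begin = System.nanoTime();
        monitoredOperation(callDepth, 500);
        System.out.printf("one call tree of depth %d took %d ns%n",
                callDepth, System.nanoTime() - begin);
    }
}
```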
The examination of performance changes or the performance behavior of a software system requires the measurement of its performance. This is done via probes, i.e., pieces of code which obtain and process measurement data and which are inserted into the examined application. The execution of those probes in a single method creates overhead, which deteriorates performance measurements of calling methods and slows down the measurement process. Therefore, an important challenge for performance measurement is minimizing this measurement overhead. Based on an analysis of the sources of performance overhead, we derive the following four optimization options: (1) source instrumentation instead of AspectJ instrumentation, (2) reduction of measurement data, (3) change of the queue, and (4) aggregation of measurement data. We evaluate the effect of these optimization options using the MooBench benchmark and thereby show that they reduce the monitoring overhead of the monitoring framework Kieker. For MooBench, the execution duration could be reduced from 4.77 ms to 0.39 ms per method invocation on average.
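Of the four options, the aggregation of measurement data (option 4) is perhaps the easiest to picture: instead of writing one record per invocation, durations are accumulated per operation and flushed periodically. The sketch below is a generic illustration of that option under assumed names, not Kieker's implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Generic illustration of optimization option (4), aggregation of measurement
// data: per-invocation durations are accumulated per operation signature and
// only the aggregate is written out, reducing the amount of data on the
// critical path. Not Kieker code; names are illustrative.
public class AggregatingProbe {

    private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> callCounts = new ConcurrentHashMap<>();

    /** Hot-path call: only two cheap in-memory additions per invocation. */
    void report(String operation, long durationNanos) {
        totalNanos.computeIfAbsent(operation, k -> new LongAdder()).add(durationNanos);
        callCounts.computeIfAbsent(operation, k -> new LongAdder()).increment();
    }

    /** Called periodically (e.g., by a timer) to emit one record per operation. */
    void flush() {
        totalNanos.forEach((op, nanos) -> {
            long calls = callCounts.get(op).sum();
            System.out.printf("%s: %d calls, %.2f µs on average%n",
                    op, calls, nanos.sum() / 1000.0 / calls);
        });
    }

    public static void main(String[] args) {
        AggregatingProbe probe = new AggregatingProbe();
        probe.report("Bookstore.searchBook()", 42_000);
        probe.report("Bookstore.searchBook()", 38_000);
        probe.flush();
    }
}
```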
... If performance issues are detected, then engineers must also be able to promptly fix such issues. To this end, several approaches have emerged, e.g., automated performance tests [18] to guarantee the prompt identification and fixing of performance degradation, or performance load testing [19] to evaluate software refactorings that most likely lead to performance improvement. However, most of the approaches in the literature, e.g., [2], [20], [21], [22], act statically on the implementation code. ...
The detection of performance issues in Java-based applications is not trivial, since many factors contribute to poor performance and software engineers are not sufficiently supported for this task. The goal of this manuscript is the automated detection of performance problems in running systems to guarantee that no quality-related hindrances prevent their successful usage. Starting from software performance antipatterns, i.e., bad practices (e.g., extensive interaction between software methods) that express both the problem and the solution, with the purpose of identifying shortcomings and promptly fixing them, we develop a framework that automatically detects seven software antipatterns capturing a variety of performance issues in Java-based applications. Our approach is applied to real-world case studies from different domains, and it captures four real-life performance issues of Hadoop and Cassandra that were not predicted by state-of-the-art approaches. As empirical evidence, we calculate the accuracy of the proposed detection rules, we show that code commits inducing and fixing real-life performance issues present interesting variations in the number of detected antipattern instances, and we demonstrate that solving one of the detected antipatterns improves the system performance by up to 50%.
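To give a flavour of such a detection rule, the sketch below flags caller/callee pairs that interact unusually often within one trace, echoing the "extensive interaction between software methods" example; the rule shape and the threshold are purely illustrative assumptions, not the manuscript's actual rules.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Purely illustrative detection rule: flag caller/callee pairs that interact
// "extensively" within one transaction, based on counted calls from a trace.
// The threshold and rule shape are assumptions, not the paper's definitions.
public class ExtensiveInteractionRule {

    record Call(String caller, String callee) {}

    static Map<String, Long> suspiciousPairs(List<Call> trace, long threshold) {
        return trace.stream()
                .collect(Collectors.groupingBy(c -> c.caller() + " -> " + c.callee(),
                        Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() >= threshold)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        List<Call> trace = List.of(
                new Call("OrderService.place", "InventoryDao.check"),
                new Call("OrderService.place", "InventoryDao.check"),
                new Call("OrderService.place", "InventoryDao.check"),
                new Call("OrderService.place", "PaymentGateway.charge"));
        System.out.println(suspiciousPairs(trace, 3));
    }
}
```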
... Furthermore, there exist tool-specific inclusions of regression benchmarking into the CI of tools, e.g., [41], which describes how regression benchmarking was introduced into the Kieker CI process. Schulz et al. [37] discuss how continuous load test generation from production session data can be used to continuously execute load tests of a web application. ...
To develop software with optimal performance, even small performance changes need to be identified. Identifying performance changes is challenging since the performance of software is influenced by non-deterministic factors; therefore, not every performance change is measurable with reasonable effort. In this work, we discuss which performance changes are measurable at code level with reasonable measurement effort and how to identify them. We present (1) an analysis of the boundaries of measuring performance changes, (2) an approach for determining a configuration for reproducible performance change identification, and (3) an evaluation of how well our approach is able to identify performance changes in the application server Jetty compared with using Jetty's own performance regression benchmarks. Thereby, we find (1) that small performance differences are only measurable by fine-grained measurement workloads, (2) that performance changes caused by the change of one operation can be identified using a unit-test-sized workload definition and a suitable configuration, and (3) that using our approach identifies small performance regressions more efficiently than using Jetty's performance regression benchmarks.
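Deciding whether a performance change is measurable ultimately means comparing two samples of measurements. As one illustration (not necessarily the paper's procedure), the sketch below computes Welch's t-statistic for the samples of two versions and uses a rough critical value of 2.0 instead of a proper lookup.

```java
import java.util.Arrays;

// Illustrative comparison of two versions' measurement samples with a Welch
// t-statistic. The critical value of 2.0 is a rough stand-in for a proper
// lookup; the cited paper derives its own configuration and decision procedure.
public class ChangeDetectionSketch {

    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(0.0);
    }

    static double variance(double[] xs) {
        double m = mean(xs);
        return Arrays.stream(xs).map(x -> (x - m) * (x - m)).sum() / (xs.length - 1);
    }

    /** Welch's t-statistic for two independent samples. */
    static double welchT(double[] a, double[] b) {
        double se = Math.sqrt(variance(a) / a.length + variance(b) / b.length);
        return (mean(a) - mean(b)) / se;
    }

    public static void main(String[] args) {
        double[] oldVersion = {10.1, 10.3, 10.2, 10.4, 10.2};  // µs per call
        double[] newVersion = {10.6, 10.8, 10.7, 10.9, 10.7};

        double t = welchT(newVersion, oldVersion);
        System.out.printf("t = %.2f -> %s%n", t,
                Math.abs(t) > 2.0 ? "likely performance change" : "no clear change");
    }
}
```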
... user experience, performance issues can also occupy additional resources and result in major fixing efforts which all imply unpredictable additional costs [14,60,61]. Thus, performance changes should ideally be detected by a Continuous Integration and Deployment (CI/CD) pipeline immediately after a code change is checked in [11,17,23,27,35,48,59]. ...
... Nevertheless, neither benchmarking technique is currently suited to be executed on every code change due to the extensive execution durations of several hours as well as the resulting costs [18,27,40,55,59]. Applying one of these two approaches to a large project with many application developers, hundreds of source code files, and multiple code changes per day would soon create a stack of benchmark tasks that would prevent fast-paced software development and integration of individual changes. ...
... A well-designed application benchmark can provide answers to many performance-related questions and can also be used to compare different versions of an SUT. This is especially relevant for the context of this work, in which a dedicated benchmark step as part of a CI/CD pipeline is envisioned [27,59]. On the other hand, however, continuous benchmarking for early performance regression detection is expensive, complex, and time-consuming [11,55]. ...
Software performance changes are costly and often hard to detect pre-release. Similar to software testing frameworks, either application benchmarks or microbenchmarks can be integrated into quality assurance pipelines to detect performance changes before releasing a new application version. Unfortunately, extensive benchmarking studies usually take several hours, which is problematic when examining dozens of daily code changes in detail; hence, trade-offs have to be made. Optimized microbenchmark suites, which only include a small subset of the full suite, are a potential solution for this problem, given that they still reliably detect the majority of the application performance changes, such as an increased request latency. It is, however, unclear whether microbenchmarks and application benchmarks detect the same performance problems and whether one can be a proxy for the other. In this paper, we explore whether microbenchmark suites can detect the same application performance changes as an application benchmark. For this, we run extensive benchmark experiments with both the complete and the optimized microbenchmark suites of the two time series database systems InfluxDB and VictoriaMetrics and compare their results to the results of corresponding application benchmarks. We do this for 70 and 110 commits, respectively. Our results show that it is possible to detect application performance changes using an optimized microbenchmark suite if frequent false-positive alarms can be tolerated.
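The per-commit comparison boils down to classifying each commit's relative change under both benchmark types and checking whether the verdicts agree. The sketch below illustrates that classification with an assumed ±3% threshold and made-up latencies, which are not values from the study.

```java
// Illustrative classification of per-commit results: a commit counts as a
// performance change if the relative difference to the previous commit
// exceeds a threshold. The ±3% threshold and names are assumptions made for
// this sketch, not values from the cited study.
public class CommitChangeClassifier {

    enum Verdict { IMPROVEMENT, REGRESSION, NO_CHANGE }

    static Verdict classify(double previous, double current, double threshold) {
        double relativeChange = (current - previous) / previous;
        if (relativeChange > threshold)  return Verdict.REGRESSION;   // slower
        if (relativeChange < -threshold) return Verdict.IMPROVEMENT;  // faster
        return Verdict.NO_CHANGE;
    }

    public static void main(String[] args) {
        // Mean latencies (ms) of two consecutive commits, per benchmark type.
        Verdict micro = classify(12.0, 12.8, 0.03);  // microbenchmark suite
        Verdict app   = classify(95.0, 99.5, 0.03);  // application benchmark

        System.out.println("microbenchmarks: " + micro + ", application: " + app);
        System.out.println("agreement: " + (micro == app));
    }
}
```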
... To improve the communication, coordination, and integration paths between development and operations teams, a paradigm known as DevOps has been adopted [5]. The word DevOps originated from the two words developers ("Dev") and operations ("Ops") [1]. ...
Due to a multitude of factors, such as rapid changes in technology, customer needs, and business trends, software organizations face pressure to deliver quality software on time. To address this concern, the software industry is continually looking for solutions to improve process timelines. Thus, Development and Operations (DevOps) has gained wide popularity in recent years, and several organizations are adopting it to leverage its perceived benefits. However, companies face several problems while executing DevOps practices. The objective of this work is to identify the DevOps success factors that will help in DevOps process improvement. To accomplish this research, firstly, a systematic literature review was conducted to identify the factors having a positive influence on DevOps. Secondly, the success factors were mapped to the DevOps principles, i.e., culture, automation, measurement, and sharing. Thirdly, the identified success factors and their mapping were further verified with industry experts via a questionnaire survey. In the last step, the PROMETHEE-II method was adopted to prioritize the success factors and investigate their logical relationships concerning their criticality for the DevOps process. This study's outcomes portray a taxonomy of the success factors, which helps experts design new strategies that are effective for DevOps process improvement.