Including Performance Benchmarks
into Continuous Integration to Enable DevOps
Jan Waller, Nils C. Ehmke, and Wilhelm Hasselbring
Software Engineering Group, Kiel University, 24098 Kiel, Germany
{jwa,nie,wha}@informatik.uni-kiel.de
ABSTRACT
The DevOps movement intends to improve communication, col-
laboration, and integration between software developers (Dev)
and IT operations professionals (Ops). Automation of software
quality assurance is key to DevOps success. We present how auto-
mated performance benchmarks may be included into continuous
integration. As an example, we report on regression benchmarks
for application monitoring frameworks and illustrate the inclusion
of automated benchmarks into continuous integration setups.
Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging – Test-
ing tools. D.2.8 [Software Engineering]: Metrics – Perfor-
mance measures. D.2.9 [Software Engineering]: Management
– Software quality assurance (SQA).
General Terms
Measurement, Performance.
Keywords
Jenkins, Kieker, MooBench.
1. INTRODUCTION
Based upon our experience with regression benchmarks for appli-
cation monitoring frameworks, we suggest including automated
benchmarks into continuous integration setups. Applying these
automated benchmarks in an early stage of the development pro-
cess enables an early detection and repair of performance issues
before such issues are propagated into a release.
This approach contributes to the current efforts of the DevOps
movement by presenting a case study of executing and analyzing
regression benchmarks in continuous integration setups. Further-
more, we provide a vision for improved analyses and visualizations
including automated notifications to the developers in charge.
The rest of this article is structured as follows: First, we introduce
the DevOps movement (Section 2) and our case study setup (Sec-
tion 3). In Section 4, a short overview of our present inclusion of
benchmarks into continuous integration is given, while our vision
is sketched in Section 5. Finally, we provide a short summary and
advice in Section 6.
2. DEVOPS: DEVELOPMENT + OPERATIONS
In addition to studying the construction and evolution of software,
the software engineering discipline needs to address the operation
of continuously running software services. Often, software devel-
opment and IT operations are detached organizational units, with
a high potential for misunderstanding and conflict. The DevOps
movement intends to improve communication, collaboration, and
integration between software developers (Dev) and IT operations
professionals (Ops).
Automation is key to DevOps success: automated building of sys-
tems out of version management repositories; automated execu-
tion of unit tests, integration tests, and system tests; automated
deployment in test and production environments. Besides func-
tional acceptance tests, automated tests of non-functional quality
attributes, such as performance, are required to ensure seamless
operation of the software. In this article, we present how perfor-
mance benchmarks may be included into continuous integration
setups.
Continuous integration [3] and continuous delivery are enabling
techniques for DevOps. A further important requirement for the
robust operation of software services is continuous monitoring
of software runtime behavior. In contrast to profiling during
construction activities, monitoring of operational services
should only impose a small performance overhead. The Kieker
monitoring framework is one example that provides these means
with a small performance overhead [5]. We report on how we in-
clude micro-benchmarks that measure the performance overhead
of Kieker into the continuous integration process for this moni-
toring framework.
3. KIEKER WITH MOOBENCH
The Kieker framework [5] is an extensible framework for appli-
cation-level performance monitoring of operational services and
subsequent dynamic software analysis.1 It includes measurement
probes for the instrumentation of software systems and writers
to facilitate the storage or further transport of collected obser-
vation data. Analysis plug-ins operate on the collected data to
extract and visualize architectural models that can be augmented
by quantitative observations.
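To illustrate the role of probes and writers, the following is a minimal sketch of what an application-level probe conceptually does; the class and method names are hypothetical and do not reflect Kieker's actual API. A probe takes timestamps around a monitored operation, assembles a record, and hands it to a writer for storage or transport.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

// Hypothetical sketch of an application-level monitoring probe and writer queue
// (not Kieker's actual API): the probe records timestamps around a monitored
// operation and hands the resulting record to a writer for storage or transport.
final class OperationRecord {
    final String signature;
    final long startNanos;
    final long stopNanos;

    OperationRecord(final String signature, final long startNanos, final long stopNanos) {
        this.signature = signature;
        this.startNanos = startNanos;
        this.stopNanos = stopNanos;
    }
}

final class MonitoringProbe {
    // A writer thread would consume this queue asynchronously, e.g., appending
    // records to a monitoring log or sending them to an analysis system.
    static final BlockingQueue<OperationRecord> WRITER_QUEUE = new LinkedBlockingQueue<>();

    static <T> T monitor(final String signature, final Supplier<T> operation) {
        final long start = System.nanoTime();      // timestamp before the call
        try {
            return operation.get();                // the actual monitored operation
        } finally {
            final long stop = System.nanoTime();   // timestamp after the call
            WRITER_QUEUE.offer(new OperationRecord(signature, start, stop));
        }
    }
}
```

A monitored call would then be wrapped as, for instance, MonitoringProbe.monitor("Foo.bar()", () -> foo.bar()).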
In 2011, the Kieker framework was reviewed, accepted, and pub-
lished as a recommended tool for quantitative system evaluation
and analysis by the SPEC Research Group. Since then, the tool
has also been distributed as part of the SPEC Research Group's tool
repository.2 Although it was originally developed as a research
tool, Kieker is used in several industrial systems.
A monitored software system has to share some of its resources
(e. g., CPU-time or memory) with the monitoring framework. The
amount of additional resources that are consumed by the moni-
toring framework is called monitoring overhead. In this article,
we are concerned with monitoring overhead measured as the
increase in the response times of the monitored system.
1http://kieker-monitoring.net/
2http://research.spec.org/projects/tools.html
Benchmarks are used to compare different platforms, tools, or
techniques in experiments. They define standardized measure-
ments to provide repeatable, objective, and comparable results.
In computer science, benchmarks are used to compare, for in-
stance, the performance of CPUs, database management systems,
or information retrieval algorithms [4].
The MooBench micro-benchmark has been developed to mea-
sure and compare the monitoring overhead of different monitoring
frameworks [6]. We have identified three causes of application-
level monitoring overhead that are common to most monitoring
tools: (1) the instrumentation of the monitored system, (2) col-
lecting data within the system, e. g., response times or method
signatures, and (3) either writing the data into a monitoring log
or transferring the data to an analysis system. With the help of
our micro-benchmark, we can quantify these three causes of mon-
itoring overhead and, for instance, detect performance regressions
or guide performance tuning of the monitoring framework.
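One way to quantify these three portions, consistent with the four-phase measurement approach of MooBench [6] (the notation here is ours), is to benchmark the monitored method in four configurations and take the differences of the mean response times:

```latex
% T1: no instrumentation (base time)      T2: instrumentation, deactivated probes
% T3: data collection, but no writing     T4: full monitoring including writing
I = \bar{T}_2 - \bar{T}_1, \qquad  % (1) instrumentation overhead
C = \bar{T}_3 - \bar{T}_2, \qquad  % (2) collection overhead
W = \bar{T}_4 - \bar{T}_3          % (3) writing overhead
```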
Thanks to irregularly performed manual benchmarks of the mon-
itoring overhead of Kieker, we have detected several performance
regressions after the releases of new versions over the last years.
After their detection, these performance regressions have been
transformed into tickets in our issue tracking system. This has
enabled us to further investigate the regressions and to provide
bug fixes for future releases.
The main challenge when patching performance regressions is
identifying the code changes that have triggered the regression.
With irregular manual benchmarks, a large number of commits
may contain the culprit. Ideally, our benchmark would have been
executed automatically with each nightly build to provide imme-
diate hints on performance problems with each change. Thus, the
continuous integration of benchmarks provides the benefit of an
immediate and automatic feedback to developers.
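To make this concrete, the following is a minimal sketch of the kind of micro-benchmark harness such a nightly run could execute. The names are hypothetical, and the sketch omits MooBench's careful benchmark engineering (e.g., multiple independent JVM runs and garbage-collection control); it only illustrates the basic measurement loop.

```java
import java.util.Arrays;

// Minimal regression micro-benchmark sketch (hypothetical, not MooBench itself):
// repeatedly call a monitored method and record its response times so that
// summary statistics can be compared across nightly builds.
public final class NightlyBenchmark {

    // Stand-in for the operation that the monitoring framework instruments.
    static void monitoredMethod(final int recursionDepth) {
        if (recursionDepth > 1) {
            monitoredMethod(recursionDepth - 1);
        }
    }

    public static void main(final String[] args) {
        final int warmupCalls = 100_000;    // discard early calls to let the JIT stabilize
        final int measuredCalls = 100_000;
        final int depth = 10;               // example call depth
        final long[] responseTimesNs = new long[measuredCalls];

        for (int i = 0; i < warmupCalls; i++) {
            monitoredMethod(depth);
        }
        for (int i = 0; i < measuredCalls; i++) {
            final long start = System.nanoTime();
            monitoredMethod(depth);
            responseTimesNs[i] = System.nanoTime() - start;
        }

        Arrays.sort(responseTimesNs);
        final double meanNs = Arrays.stream(responseTimesNs).average().orElse(0.0);
        final long medianNs = responseTimesNs[measuredCalls / 2];
        System.out.printf("mean=%.1f ns, median=%d ns%n", meanNs, medianNs);
    }
}
```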
4. INCLUDING MOOBENCH INTO JENKINS
The continuous integration setup that is employed by Kieker is
based upon Jenkins.3
As our initial approach, we developed a Jenkins plugin to execute
the benchmark. This plugin directly executes MooBench from
within Jenkins. However, this kind of execution violates the com-
mon benchmark requirement for an idle environment: Jenkins and
its periodical tasks, as well as the running plugin itself, influence
the benchmark results. This can be seen in Figure 1. Note that
the changes in the response times of the purple graph are not
caused by actual changes in the source code but rather by back-
ground tasks within Jenkins. Even remote execution with the help
of a Jenkins master/slave setup, i. e., executing the benchmark
within an otherwise idle Jenkins instance on a separate server,
has only provided fluctuating results.
Finally, we have chosen a simpler approach: Instead of using com-
plex plugins, we simply call a shell script at the end of each nightly
build on Jenkins. This script copies the benchmark and the cre-
ated Kieker nightly jar-file to an idle, pre-configured remote server
(e. g., onto a cloud instance). There, the benchmark gets exe-
cuted while Jenkins waits for the results. In addition to the usual
analyses performed by MooBench, e. g., calculating mean and me-
dian with their confidence intervals and quartiles, we also create a
comma-separated values (CSV) file with the mean measurement
results. This file can be read and interpreted by a plot plugin
within Jenkins. An example of such a generated plot based upon
the Kieker nightly builds is presented in Figure 2.
3http://jenkins-ci.org/
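How such a CSV file could be produced is sketched below. The column layout, class name, and example values are our own assumptions for illustration, not necessarily the actual MooBench output format.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Locale;

// Minimal sketch: append the mean response times of one nightly benchmark run
// to a CSV file that a plotting plugin can read (hypothetical column layout).
public final class BenchmarkCsvWriter {

    public static void appendMeans(final Path csvFile, final String build,
                                   final double baseUs, final double instrUs,
                                   final double collectUs, final double writeUs) throws IOException {
        if (Files.notExists(csvFile)) {
            // Write the header once when the file is created.
            Files.writeString(csvFile, "build,base,instrumentation,collecting,writing\n");
        }
        final String row = String.format(Locale.ROOT, "%s,%.2f,%.2f,%.2f,%.2f\n",
                build, baseUs, instrUs, collectUs, writeUs);
        Files.writeString(csvFile, row, StandardOpenOption.APPEND);
    }

    public static void main(final String[] args) throws IOException {
        // Illustrative values only.
        appendMeans(Paths.get("moobench-means.csv"), "nightly-2013-03-08", 0.9, 4.0, 4.1, 39.4);
    }
}
```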
As is evident from the display of the data in Figure 2, the plot plugin
is rather limited. For instance, it is only capable of displaying the
measured mean response times, which still contain some variation.
Displaying additional statistical measures, such as confidence
intervals, would aid their interpretation.
In addition, we currently only display the gathered results rather
than automatically notifying the developers when a performance
anomaly occurs. The actual detection of the anomalies has to be
performed manually. However, previous work on anomaly detec-
tion within Kieker results can be adapted for this scenario [1].
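As a first step toward such notifications, a simple threshold heuristic (a sketch of our own, not the anomaly detection approach of [1]) could compare the latest mean against the history of the preceding builds:

```java
import java.util.List;

// Minimal sketch of an automated regression check: flag the latest build if its
// mean overhead exceeds the average of the preceding builds by a threshold factor.
public final class RegressionCheck {

    /** Returns true if the latest mean exceeds the historical mean by more than 'factor'. */
    static boolean isRegression(final List<Double> historicalMeans,
                                final double latestMean, final double factor) {
        if (historicalMeans.isEmpty()) {
            return false; // nothing to compare against yet
        }
        final double historicalMean = historicalMeans.stream()
                .mapToDouble(Double::doubleValue).average().orElse(0.0);
        return latestMean > historicalMean * factor;
    }

    public static void main(final String[] args) {
        // Illustrative values only: mean overhead (in microseconds) of recent nightly builds.
        final List<Double> history = List.of(0.8, 0.8, 0.9, 0.8);
        final double latest = 4.0; // a jump like this should trigger a notification
        if (isRegression(history, latest, 1.2)) {
            System.out.println("Possible performance regression - notify the developers in charge.");
        }
    }
}
```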
Finally, as is common with dynamic analysis approaches, the de-
tection and visualization of performance regressions is only pos-
sible within benchmarked areas of Kieker. As a consequence, any
performance regression caused by, for instance, unused probes or
writers cannot be found. However, a more thorough benchmark
requires a higher time cost (currently about 60 minutes per nightly
build). Thus, a balance has to be found between benchmark cov-
erage and time spent benchmarking.
Despite these remaining challenges in the current implementation,
the inclusion of MooBench into the continuous integration setup of
Kieker already provides great benefits. Any performance regres-
sions are now detected immediately. Furthermore, the regressions
can directly be linked to small sets of changes, which aids the
diagnosis of performance problems. The current and future im-
plementations of our integration of benchmarks into Jenkins are
available as open source software with MooBench.4 Furthermore,
the current state of our implementation is available with our con-
tinuous integration setup.5
5. EVALUATION AND VISION
In this section, we present our vision for improved analyses and
visualizations of regression benchmarks within continuous inte-
gration setups. Of special interest are automated detections of
performance regressions and corresponding notifications to the
developers in charge. We demonstrate the capabilities of our
envisioned approach with the help of our case study system by
studying a previous performance regression.
Since our inclusion of MooBench into the continuous integration
setup of Kieker, no additional major performance regressions have
occurred. Instead of artificially creating an anomaly to demon-
strate the capabilities of our setup, we have recreated earlier
nightly builds and executed the benchmark as it would have been
included. This post-mortem benchmarking also allows for an out-
look on a more advanced visualization and anomaly detection than
is currently realized within our Jenkins implementation.
Specifically, we have selected a performance regression that hap-
pened in March 2013 and that was detected in Kieker release
version 1.7: An unintended increase of the first part of monitor-
ing overhead (instrumentation) that was related to a bug in our
implementation of adaptive monitoring. To further narrow down
the cause of this regression, we have taken a look at the nightly
builds between Kieker releases 1.6 and 1.7. For each build, we
have run the MooBench benchmark in a configuration identical
to the one used in our continuous integration setup. The result-
ing visualization of a few of the relevant benchmark results of the
nightly builds is presented in Figure 3.
4http://kieker-monitoring.net/MooBench/
5http://build.kieker-monitoring.net/job/kieker-nightly-release/plot/
[Figure 1: Initial inclusion of MooBench into Jenkins. The plot shows fluctuating benchmark response times caused by background tasks within Jenkins rather than by changes in the source code.]
In Figure 3, the mean benchmark results are depicted as stacked
bars. Each bar is annotated to the right with its respective 95%
confidence interval. The lowest bar is barely visible and represents
the base time of the benchmark without any monitoring overhead.
The other three bars correspond to the three causes of overhead
(instrumentation, data collection, and writing). Our focus in this
analysis is on the orange bar, representing the instrumentation.
The actual anomaly is highlighted with a red ellipse.
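For reference, the 95% confidence intervals annotated in Figure 3 can be obtained in the usual way from the sample mean and sample standard deviation of the measured response times (a normal-approximation sketch; the exact procedure implemented in MooBench may differ in details):

```latex
% 95% confidence interval for the mean response time of n measurements
\bar{T} \pm z_{0.975}\,\frac{s}{\sqrt{n}},
\qquad
\bar{T} = \frac{1}{n}\sum_{i=1}^{n} T_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\bigl(T_i - \bar{T}\bigr)^2,
\qquad
z_{0.975} \approx 1.96
```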
The first four nightly builds presented are part of our analysis:
two builds before and two after the performance regression occurred.
The final three builds demonstrate our bug fix two and a half
months after the performance regression. With the help of our
presented vision of including benchmarks into continuous integra-
tion and performing automated anomaly detections on the results,
e. g., similar to [1], the time to fix performance regressions can be
reduced.
The necessity to automate the continuous execution of bench-
marks has also been recognized in other contexts. For instance,
the academic BEEN project is concerned with automated re-
gression benchmarking [2]. An industrial approach for continuous
benchmarking of web-based software systems is presented in [7].
6. SUMMARY AND ADVICE
Based on our experience with regression benchmarks, we suggest
including benchmarks into continuous integration early in the
development process. In current practice, benchmark and moni-
toring instrumentation is often only integrated into the software
when problems occur after release in the production environment
of IT operations. Measurement instrumentation, i. e., benchmarks
as well as monitoring, should be integrated into software devel-
opment right from the start. Automated benchmarks are then
able to detect performance issues that may have been introduced
between versions, before these issues are propagated via the de-
ployment pipeline. Monitoring can then aid in the early detection
of further performance issues that only manifest in the actual de-
ployment of the software.
References
[1] J. Ehlers, A. van Hoorn, J. Waller, and W. Hasselbring. Self-
adaptive software system monitoring for performance anomaly
localization. In Proceedings of the 8th IEEE/ACM Interna-
tional Conference on Autonomic Computing (ICAC 2011),
pages 197–200. ACM, June 2011.
[2] T. Kalibera, J. Lehotsky, D. Majda, B. Repcek, M. Tomcanyi,
A. Tomecek, P. Tůma, and J. Urban. Automated bench-
marking and analysis tool. In Proceedings of the 1st Interna-
tional Conference on Performance Evaluation Methodologies
and Tools (Valuetools 2006), pages 5–14. ACM, Oct. 2006.
[3] M. Meyer. Continuous integration and its tools. IEEE Soft-
ware, 31(3):14–16, May 2014.
[4] S. E. Sim, S. Easterbrook, and R. C. Holt. Using benchmark-
ing to advance research: A challenge to software engineering.
In Proceedings of the 25th International Conference on Soft-
ware Engineering (ICSE 2003), pages 74–83. IEEE Computer
Society, May 2003.
[5] A. van Hoorn, J. Waller, and W. Hasselbring. Kieker:
A framework for application performance monitoring and
dynamic software analysis. In Proceedings of t he 3 rd
ACM/SPEC International Conference on Performance Engi-
neering (ICPE 2012), pages 247–248. ACM, Apr. 2012.
[6] J. Waller and W. Hasselbring. A benchmark engineering
methodology to measure the overhead of application-level
monitoring. In Proceedings of the Symposium on Software Per-
formance: Joint Kieker/Palladio Days (KPDays 2013), pages
59–68. CEUR Workshop Proceedings, Nov. 2013.
[7] C. Weiss, D. Westermann, C. Heger, and M. Moser. Sys-
tematic performance evaluation based on tailored benchmark
applications. In Proceedings of the 4th ACM/SPEC Interna-
tional Conference on Performance Engineering (ICPE 2013),
pages 411–420. ACM, Apr. 2013.
[Figure 2: Performance measurements within Jenkins. The plot shows the mean benchmark response times over the Kieker nightly builds as displayed by the Jenkins plot plugin.]

[Figure 3: Scenario for detecting performance anomalies between releases via benchmarks in continuous integration. The figure shows stacked bars of mean response times (Base, Instrumentation, Collecting Data, Writer (ASCII)) with 95% confidence intervals for the nightly builds from 2013-03-05 to 2013-03-08 and from 2013-05-26 to 2013-05-28 on our Jenkins server.]