Article
Publisher preview available

On the Effectiveness of Bisection in Performance Regression Localization

Authors: Frolin S. Ocariza, Jr.

Abstract

Performance regressions can have a drastic impact on the usability of a software application. The crucial task of localizing such regressions can be achieved using bisection, which attempts to find the bug-introducing commit using binary search. This approach is used extensively by many development teams, but it is an inherently heuristical approach when applied to performance regressions, and therefore, does not have correctness guarantees. Unfortunately, bisection is also time-consuming, which implies the need to assess its effectiveness prior to running it. To this end, the goal of this study is to analyze the effectiveness of bisection for performance regressions. This goal is achieved by first formulating a metric that quantifies the probability of a successful bisection, and extracting a list of input parameters – the contributing properties – that potentially impact its value; a sensitivity analysis is then conducted on these properties to understand the extent of their impact. Furthermore, an empirical study of 310 bug reports describing performance regressions in 17 real-world applications is conducted, to better understand what these contributing properties look like in practice. The results show that while bisection can be highly effective in localizing real-world performance regressions, this effectiveness is sensitive to the contributing properties, especially the choice of baseline and the distributions at each commit. The results also reveal that most bug reports do not provide sufficient information to help developers properly choose values and metrics that can maximize the effectiveness, which implies the need for measures to fill this information gap.
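For illustration, a minimal sketch (in Python) of what commit bisection for a performance regression can look like; the measure() benchmark, the 20% regression threshold, and the assumption of known good/bad endpoint commits are hypothetical stand-ins rather than the paper's formulation, which instead quantifies the probability that such a search blames the correct commit.

def measure(commit):
    # Hypothetical benchmark: build and run `commit`, return a response time.
    raise NotImplementedError

def is_regressed(commit, baseline_time, threshold=1.2):
    # Heuristic check: flag the commit if it runs 20% slower than the baseline.
    # Measurements are noisy, which is why bisection on performance data
    # carries no correctness guarantees.
    return measure(commit) > threshold * baseline_time

def bisect(commits, baseline_time):
    # Assumes `commits` is ordered oldest to newest, commits[0] is known good,
    # and commits[-1] is known bad (exhibits the regression).
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_regressed(commits[mid], baseline_time):
            bad = mid      # regression already present at mid
        else:
            good = mid     # regression introduced after mid
    return commits[bad]    # first commit judged to have introduced the regression

The choice of baseline and the noise in each commit's measurements are among the contributing properties the abstract refers to; varying them changes whether each is_regressed() verdict, and hence the whole search, turns out correct.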
https://doi.org/10.1007/s10664-022-10152-3
Frolin S. Ocariza, Jr.
Accepted: 23 March 2022
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022
Keywords: Software performance · Bisection · Empirical study · Bug localization
Communicated by: Philipp Leitner
Frolin S. Ocariza, Jr.
frolin.ocariza@sap.com
SAP Canada Inc., Vancouver, BC, Canada
Published online: 30 April 2022
Empirical Software Engineering (2022) 27: 95
... The goal is to provide a comprehensive understanding of how these approaches can be implemented and their potential impact on overall database performance. ...
Research
Full-text available
This research explores innovative approaches to enhancing the performance of enterprise databases, which are critical for managing extensive data storage, retrieval, and real-time access in large organizations. Addressing current performance challenges such as handling big data, query optimization, hardware constraints, and security, the study examines cutting-edge solutions including database sharding, in-memory databases, AI-driven query optimization, and cloud-based Database as a Service (DBaaS). These approaches aim to improve scalability, reduce latency, and ensure data integrity and availability. By evaluating the effectiveness and applicability of these techniques, the research provides valuable insights and recommendations for database administrators and IT managers to optimize enterprise database systems, ultimately supporting operational efficiency and growth.
... Finally, in 2022 Frolin Ocariza [61] published a paper on bisecting performance regressions. There are strong connections between a flaky test and a performance regression, as a performance regression may not happen 100% of the time. ...
Conference Paper
Full-text available
When a change introduces a bug into a large software repository, there is often a delay between when the change is committed and when the bug is detected. This is true even when the bug causes an existing test to fail! These delays are caused by resource constraints, which prevent the organization from running all of the tests on every change. Due to the delay, a Continuous Integration system needs to locate buggy commits. Locating them is complicated by flaky tests that pass and fail non-deterministically. The flaky tests introduce noise into the CI system, requiring costly reruns to determine whether a failure was caused by a bad code change or by non-deterministic test behavior. This paper presents an algorithm, Flake Aware Culprit Finding, that locates buggy commits more accurately than a traditional bisection search. The algorithm is based on Bayesian inference and noisy binary search, utilizing prior information about which changes are most likely to contain the bug. A large-scale empirical study was conducted at Google on 13,000+ test breakages. The study evaluates the accuracy and cost of the new algorithm versus a traditional deflaked bisection search.
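For context on the traditional deflaked bisection search used as the baseline above, a naive deflaking check can be sketched as follows (in Python); run_test() and the rerun count are hypothetical placeholders, and this is not the paper's flake-aware algorithm, which instead applies Bayesian inference over noisy pass/fail observations together with prior information about suspect changes.

def run_test(commit):
    # Hypothetical: check out `commit`, run the breaking test, return True on pass.
    # A flaky test may pass or fail non-deterministically even on a good commit.
    raise NotImplementedError

def deflaked_is_bad(commit, reruns=5):
    # Classify the commit as culpable only if the test fails on every rerun.
    # Reruns reduce, but never eliminate, the chance of blaming a good commit,
    # and every rerun adds cost; that accuracy/cost trade-off is what the
    # study compares against the flake-aware alternative.
    return all(not run_test(commit) for _ in range(reruns))

Plugging a check like deflaked_is_bad() into an ordinary binary search over the suspect commit range approximates the deflaked baseline; the flake-aware algorithm instead reasons probabilistically, via Bayesian inference, about which commit is the culprit.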
Article
Full-text available
A performance regression in software is defined as an increase in an application step's response time as a result of code changes. Detecting such regressions can be done using profiling tools; however, investigating their root cause is a mostly manual and time-consuming task. This statement holds true especially when comparing execution timelines, which are dynamic function call trees augmented with response time data; these timelines are compared to find the performance regression-causes – the lowest-level function calls that regressed during execution. When done manually, these comparisons often require the investigator to analyze thousands of function call nodes. Further, performing these comparisons on web applications is challenging due to JavaScript's asynchronous and event-driven model, which introduces noise into the timelines. In response, we propose a design – Zam – that automatically compares execution timelines collected from web applications, to identify performance regression-causes. Our approach uses a hybrid node matching algorithm that recursively attempts to find the longest common subsequence in each call tree level, then aggregates multiple comparisons' results to eliminate noise. Our evaluation of Zam on 10 web applications indicates that it can identify performance regression-causes with a path recall of 100% and a path precision of 96%, while performing comparisons in under a minute on average. We also demonstrate the real-world applicability of Zam, which has been used to successfully complete performance investigations by the performance and reliability team at SAP.
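To make the level-by-level matching idea concrete, below is a toy longest-common-subsequence match over one level of sibling call names (in Python); the call names are invented for illustration, and Zam's actual hybrid matcher adds recursion over tree levels, aggregation of multiple comparisons, and noise elimination on top of a matching step of roughly this kind.

from functools import lru_cache

def lcs_pairs(before, after):
    # Match two ordered lists of sibling call names from the "before" and
    # "after" timelines, returning index pairs (i, j) of matched calls.
    @lru_cache(maxsize=None)
    def lcs(i, j):
        if i == len(before) or j == len(after):
            return 0
        if before[i] == after[j]:
            return 1 + lcs(i + 1, j + 1)
        return max(lcs(i + 1, j), lcs(i, j + 1))

    pairs, i, j = [], 0, 0
    while i < len(before) and j < len(after):
        if before[i] == after[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif lcs(i + 1, j) >= lcs(i, j + 1):
            i += 1
        else:
            j += 1
    return pairs

# Example: one level of sibling calls before and after a code change.
print(lcs_pairs(["init", "fetch", "render"], ["init", "auth", "fetch", "render"]))
# -> [(0, 0), (1, 2), (2, 3)]: "auth" is new, everything else is matched.

Unmatched nodes (here the new "auth" call) and matched nodes whose response times diverge are natural places to look when hunting for regression-causes.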
Article
Full-text available
The detection of performance bugs, like those causing an unexpected execution time, has gained much attention in recent years due to their potential impact in safety-critical and resource-constrained applications. Much effort has been put into trying to understand the nature of performance bugs in different domains as a starting point for the development of effective testing techniques. However, the lack of a widely accepted classification scheme of performance faults and, more importantly, the lack of well-documented and understandable datasets makes it difficult to draw rigorous and verifiable conclusions widely accepted by the community. In this paper, we present TANDEM, a dual contribution related to real-world performance bugs. Firstly, we propose a taxonomy of performance bugs based on a thorough systematic review of the related literature, divided into three main categories: effects, causes and contexts of bugs. Secondly, we provide a complete collection of fully documented real-world performance bugs. Together, these contributions pave the way for the development of stronger and reproducible research results on performance testing.
Book
Tackling the questions that systems designers care about, this book brings queueing theory decisively back to computer science. The book is written with computer scientists and engineers in mind and is full of examples from computer systems, as well as manufacturing and operations research. Fun and readable, the book is highly approachable, even for undergraduates, while still being thoroughly rigorous and also covering a much wider span of topics than many queueing books. Readers benefit from a lively mix of motivation and intuition, with illustrations, examples and more than 300 exercises – all while acquiring the skills needed to model, analyze and design large-scale systems with good performance and low cost. The exercises are an important feature, teaching research-level counterintuitive lessons in the design of computer systems. The goal is to train readers not only to customize existing analyses but also to invent their own.
Article
Bisection is of no use if you have a heisenbug that fails only from time to time. These subtle bugs are the hardest to fix and the ones that cause us to think critically about what we are doing. Timing bugs, bugs in distributed systems, and all the difficult problems we face in building increasingly complex software systems can't yet be addressed by simple bisection. It's often the case that it would take longer to write a usable bisection test for a complex problem than it would to analyze the problem whilst at the tip of the tree.
Chapter
Consider a generalization of the classical binary search problem in linearly sorted data to the graph-theoretic setting. The goal is to design an adaptive query algorithm, called a strategy, that identifies an initially unknown target vertex in a graph by asking queries. Each query is conducted as follows: the strategy selects a vertex q and receives a reply v: if q is the target, then v = q, and if q is not the target, then v is a neighbor of q that lies on a shortest path to the target. Furthermore, there is a noise parameter 0 ≤ p < 1/2, which means that each reply can be incorrect with probability p. The optimization criterion to be minimized is the overall number of queries asked by the strategy, called the query complexity. The query complexity is well understood to be O(ε⁻² log n) for general graphs, where n is the order of the graph and ε = 1/2 - p. However, implementing such a strategy is computationally expensive, with each query requiring possibly O(n²) operations. In this work we propose two efficient strategies that keep the optimal query complexity. The first strategy achieves an overall complexity of O(ε⁻¹ n log n) per single query. The second strategy is dedicated to graphs of small diameter D and maximum degree Δ and has an average complexity of O(n + ε⁻² D Δ log n) per query. We point out that we develop an algorithmic tool of graph median approximation that is of independent interest: the median can be efficiently approximated by finding a vertex minimizing the sum of distances to a randomly sampled vertex subset of size O(ε⁻² log n).
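A noiseless toy version of this graph search (in Python) is sketched below: repeatedly query a median of the remaining candidates and keep only the candidates that lie closer to the replied neighbor than to the queried vertex. The adjacency-dict representation and the example graph are invented for illustration, the median here is computed by brute force, and the handling of replies that are wrong with probability p is omitted; avoiding exactly this kind of per-query cost under noise is what the chapter contributes.

from collections import deque

def bfs_dist(adj, src):
    # Unweighted shortest-path distances from `src` to every reachable vertex.
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def graph_binary_search(adj, ask):
    # `ask(q)` returns q if q is the target, otherwise a neighbor of q that
    # lies on a shortest path from q to the target (always truthful here).
    candidates = set(adj)
    while len(candidates) > 1:
        dists = {v: bfs_dist(adj, v) for v in adj}
        # Brute-force 1-median of the candidate set.
        q = min(adj, key=lambda v: sum(dists[v][c] for c in candidates))
        reply = ask(q)
        if reply == q:
            return q
        # The target is always kept, and a median query at least halves the set.
        candidates = {c for c in candidates if dists[reply][c] < dists[q][c]}
    return candidates.pop()

# Example on the path a-b-c-d with hidden target "d".
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
target = "d"

def truthful_ask(q):
    if q == target:
        return q
    to_target = bfs_dist(adj, target)
    return min(adj[q], key=lambda v: to_target[v])

print(graph_binary_search(adj, truthful_ask))   # prints: d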