Project

diagnoseIT: Expert-guided Automatic Diagnosis of Performance Problems in Enterprise Applications

Goal: Quality attributes of enterprise software applications, such as performance, availability, and reliability, have a significant impact on business-critical metrics such as revenue and total cost of ownership. Application Performance Management (APM) processes and tools are often used and integrated into the application lifecycle to monitor performance-relevant metrics of enterprise applications (e.g., response time, throughput, or resource utilization). APM is necessary to detect and solve performance problems early. Experience shows, however, that comprehensive APM is seldom implemented in industry, resulting in unsatisfactory application quality and a low detection rate of performance problems. A major reason for the low adoption rate is that the initial setup and maintenance of APM are error-prone and require high manual effort and expertise.

To improve this situation, NovaTec Consulting GmbH and the University of Stuttgart (Reliable Software Systems Group) launched the collaborative research project diagnoseIT on “Expert-guided Automatic Diagnosis of Performance Problems in Enterprise Applications”. The project uses formalized APM expert knowledge to systematically detect and diagnose performance problems. To this end, diagnoseIT follows an APM-tool-independent approach that orchestrates available APM solutions, initially focusing on the open-source tools inspectIT and Kieker. diagnoseIT provides a goal-oriented root cause analysis, offering a starting point for problem resolution. The project results will be published under an open-source license.


Project log

Dušan Okanović
added a research item
The success of modern businesses relies on the quality of their supporting application systems. Continuous application performance management is mandatory to enable efficient problem detection, diagnosis, and resolution during production. In today's age of ubiquitous computing, a large fraction of users access application systems from mobile devices, such as phones and tablets. For detecting, diagnosing, and resolving performance and availability problems, an end-to-end view, i.e., traceability of requests starting on the (mobile) clients' devices, is becoming increasingly important. In this paper, we propose an approach for end-to-end monitoring of applications from the users' mobile devices to the back end, and for diagnosing root causes of detected performance problems. We extend our previous work on diagnosing performance anti-patterns from execution traces with new metrics and rules. The evaluation of this work shows that our approach successfully detects and diagnoses performance anti-patterns in applications with iOS-based mobile clients. While there are threats to the validity of our experiment, our research is a promising starting point for future work.
Dušan Okanović
added a research item
This is the final report of the collaborative research project diagnoseIT on expert-guided automatic diagnosis of performance problems in enterprise applications.
Dušan Okanović
added 2 research items
Application performance management (APM) is a necessity to detect and solve performance problems during operation of enterprise applications. While existing tools provide alerting and visualization capabilities when performance requirements are violated during operation, the isolation and diagnosis of the problem's real root cause is the responsibility of the rare performance expert, often resulting in a boring and recurring task. The main challenges for APM adoption in practice are that the initial setup and maintenance of APM, and particularly the diagnosis of performance problems, are error-prone, costly, and require high manual effort and expertise. In this paper, we present preliminary work on diagnoseIT, an approach that utilizes formalized APM expert knowledge to automate the aforementioned recurring APM activities.
Failures in software systems during operation are inevitable. They cause system downtime, which needs to be minimized to reduce or avoid unnecessary costs and customer dissatisfaction. Online failure prediction aims at identifying upcoming failures at runtime to enable proactive maintenance actions. Existing online failure prediction approaches focus on predicting failures of either individual components or the system as a whole. They do not take into account software architectural dependencies, which determine the propagation of failures. In this paper, we propose a hierarchical online failure prediction approach, HORA, which employs a combination of both failure predictors and architectural models. We evaluate our approach using a distributed RSS reader application by Netflix and investigate the prediction quality for two representative types of failures, namely memory leak and system overload. The results show that, overall, our approach improves the area under the ROC curve by 10.7% compared to a monolithic approach.
Dušan Okanović
added 4 research items
A challenging problem with today's increasingly large and distributed software systems is their performance behavior. To help developers avoid or detect mistakes that lead to performance problems, many researchers in software performance engineering have come up with classifications of such problems, called antipatterns. To test approaches for antipattern detection, data from running systems is required. However, the usefulness of this data is doubtful, as it may or may not include manifestations of performance problems. In this paper, we classify existing performance antipatterns w.r.t. their suitability for being injected and, based on this, introduce an extensible tool for injecting instances of these antipatterns into existing applications. The approach can be useful for researchers to test and validate their automated runtime problem evaluation and prevention techniques. Using two exemplary performance antipatterns, we demonstrate that the injection is easily possible and produces feasible, though currently rather clinical, results.
The performance of application systems has a direct impact on business metrics. For example, companies lose customers and revenue in case of poor performance such as high response times. Application performance management (APM) aims to provide the required processes and tools to have a continuous and up-to-date picture of relevant performance measures during operations, as well as to support the detection and resolution of performance-related incidents. In this tutorial paper, we provide an overview of the state of the art in APM in industrial practice and academic research, highlight current challenges, and outline future research directions.
As the importance of application performance grows in modern enterprise systems, many organizations employ application performance management (APM) tools to help them deal with potential performance problems during production. In addition to monitoring capabilities, these tools provide problem detection and alerting. In large enterprise systems these tools can report a very large number of performance problems. They have to be dealt with individually, in a time-consuming and error-prone manual process, even though many of them have a common root cause. In this vision paper, we propose using automatic categorization for dealing with large numbers of performance problems reported by APM tools. This leads to the aggregation of reported problems, reducing the work required for resolving them. Additionally, our approach opens the possibility of extending the analysis approaches to use this information for a more efficient diagnosis of performance problems.
Dušan Okanović
added a research item
Execution traces capture information on a software system’s runtime behavior, including data on system-internal software control flows, performance, as well as request parameters and values. In research and industrial practice, execution traces serve as an important basis for model-based and measurement-based performance evaluation, e.g., for application performance monitoring (APM), extraction of descriptive and prescriptive models, as well as problem detection and diagnosis. A number of commercial and open-source APM tools that capture execution traces within distributed software systems are available. However, each of these tools uses its own (proprietary) format, which means that each approach building on execution trace data is tool-specific. In this paper, we propose the Open Execution Trace Exchange (OPEN.xtrace) format to enable data interoperability and exchange between APM tools and software performance engineering (SPE) approaches. In particular, this enables SPE researchers to develop their approaches in a tool-agnostic and comparable manner. OPEN.xtrace is a community effort as part of the overall goal to increase interoperability of SPE/APM techniques and tools. In addition to describing the OPEN.xtrace format and its tooling support, we evaluate OPEN.xtrace by comparing its modeling capabilities with the information that is available in leading APM tools.
André van Hoorn
added a project goal