Steffen Herbold

Universität Passau · Chair of AI Engineering

Prof. Dr.

About

89
Publications
7,702
Reads
868
Citations

Publications (89)
Article
Full-text available
Context: Performance metrics are a core component of the evaluation of any machine learning model and used to compare models and estimate their usefulness. Recent work started to question the validity of many performance metrics for this purpose in the context of software defect prediction. Objective: Within this study, we explore the relationship b...
Preprint
Context: The identification of bugs within the reported issues in an issue tracker is crucial for the triage of issues. Machine learning models have shown promising results regarding the performance of automated issue type prediction. However, we have only limited knowledge beyond our assumptions how such models identify bugs. LIME and SHAP are pop...
Preprint
Context: Differential testing is a useful approach that uses different implementations of the same algorithms and compares the results for software testing. In recent years, this approach was successfully used for test campaigns of deep learning frameworks. Objective: There is little knowledge on the application of differential testing beyond deep...
Article
Full-text available
Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and the types of changes that...
Preprint
Pre-trained transformer models are the current state-of-the-art for natural language processing. seBERT is such a model; it was developed based on the BERT architecture but trained from scratch with software engineering data. We fine-tuned this model for the NLBSE challenge for the task of issue type prediction. Our model dominates the ba...
Article
Soil water storage (SWS) illustrates the available water capacity of soil horizons and its water reservoir from which crops can draw upon during transient water deficit periods. Information on SWS quantity and stability at spatial and temporal scales represented in a 4D predictive map is important to support sustainable agricultural intensification...
Article
Full-text available
Context: The SZZ algorithm is the de facto standard for labeling bug fixing commits and finding inducing changes for defect prediction data. Recent research uncovered potential problems in different parts of the SZZ algorithm. Most defect prediction data sets provide only static code metrics as features, while research indicates that other features...
Article
Full-text available
Machine learning is nowadays a standard technique for data analysis within software applications. Software engineers need quality assurance techniques that are suitable for these new kinds of systems. Within this article, we discuss the question whether standard software testing techniques that have been part of textbooks for decades are also use...
Article
This paper presents an Expert Decision Support System for the identification of time-invariant, aeroacoustic source types. The system comprises two steps: first, acoustic properties are calculated based on spectral and spatial information. Second, clustering is performed based on these properties. The clustering aims at helping and guiding an exper...
Preprint
Automated Static Analysis Tools (ASATs) are part of software development best practices. ASATs are able to warn developers about potential problems in the code. On the one hand, ASATs are based on best practices so there should be a noticeable effect on software quality. On the other hand, ASATs suffer from false positive warnings, which developers...
Preprint
Bug localization is a tedious activity in the bug fixing process in which a software developer tries to locate bugs in the source code described in a bug report. Since this process is time-consuming and sometimes requires additional knowledge about the software project, current literature proposes several information retrieval techniques which can...
Preprint
Transformers are the current state-of-the-art of natural language processing in many domains and are gaining traction within software engineering research as well. Such models are pre-trained on large amounts of data, usually from the general domain. However, we only have a limited understanding regarding the validity of transformers within the softw...
Preprint
Static software metrics, e.g., size, complexity and coupling are used in defect prediction research as well as software quality models to evaluate software quality. Static analysis tools also include boundary values for complexity and size that generate warnings for developers. However, recent studies found that complexity metrics may be unreliable...
Article
Beamforming is an imaging tool for the investigation of aeroacoustic phenomena and results in high-dimensional data that are broken down to spectra by integrating spatial regions of interest. This paper presents two methods that enable the automated identification of aeroacoustic sources in sparse beamforming maps and the extraction of their corres...
Preprint
Full-text available
This paper presents an Expert Decision Support System for the identification of time-invariant, aeroacoustic source types. The system comprises two steps: first, acoustic properties are calculated based on spectral and spatial information. Second, clustering is performed based on these properties. The clustering aims at helping and guiding an exper...
Preprint
Full-text available
Beamforming is an imaging tool for the investigation of aeroacoustic phenomena and results in high dimensional data that is broken down to spectra by integrating spatial Regions Of Interest. This paper presents two methods that enable the automated identification of aeroacoustic sources in sparse beamforming maps and the extraction of their corresp...
Preprint
The competent programmer hypothesis states that most programmers are competent enough to create correct or almost correct source code. Because this implies that bugs should usually manifest through small variations of the correct code, the competent programmer hypothesis is one of the fundamental assumptions of mutation testing. Unfortunately, it i...
Preprint
Performance metrics are a core component of the evaluation of any machine learning model and used to compare models and estimate their usefulness. Recent work started to question the validity of many performance metrics for this purpose in the context of software defect prediction. Within this study, we explore the relationship between performance...
Preprint
The SmartSHARK repository mining data is a collection of rich and detailed information about the evolution of software projects. The data is unique in its diversity and contains detailed information about each change, issue tracking data, continuous integration data, as well as pull request and code review data. Moreover, the data does not contain...
Preprint
Full-text available
Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and the types of changes tha...
Article
Full-text available
Context: Issue tracking systems are used to track and describe tasks in the development process, e.g., requested feature improvements or reported bugs. However, past research has shown that the reported issue types often do not match the description of the issue. Objective: We want to understand the overall maturity of the state of the art of issue...
Article
Full-text available
Automated static analysis tools (ASATs) have become a major part of the software development workflow. Acting on the generated warnings, i.e., changing the code indicated in the warning, should be part of, at latest, the code review phase. Despite this being a best practice in software development, there is still a lack of empirical research regard...
Article
Full-text available
The original version of this article unfortunately contained mistakes. Figures 8, 9 and 10 were incorrectly captured. Somehow, the plots in Fig. 8 were replaced with those from Fig. 9 and the original Fig. 8 was lost.
Preprint
Machine learning is nowadays a standard technique for data analysis within software applications. Software engineers need quality assurance techniques that are suitable for these new kinds of systems. Within this article, we discuss the question whether standard software testing techniques that have been part of textbooks for decades are also use...
Article
Developer social networks (DSNs) are a tool for the analysis of community structures and collaborations between developers in software projects and software ecosystems. Within this paper, we present the results of a systematic mapping study on the use of DSNs in software engineering research. We identified 255 primary studies on DSNs. We mapped the...
Article
Full-text available
Data extracted from software repositories is used intensively in Software Engineering research, for example, to predict defects in source code. In our research in this area, with data from open source projects as well as an industrial partner, we noticed several shortcomings of conventional data mining approaches for classification problems: (1) Do...
Conference Paper
Full-text available
We present an expert decision support system for time-invariant aeroacoustic source classification from deconvolved beamforming maps and results based on scaled airframe half-model wind tunnel measurements. The system consists of three steps: the identification of acoustic sources from the deconvolved maps, the calculation of their acoustic propert...
Preprint
Context: Issue tracking systems are used to track and describe tasks in the development process, e.g., requested feature improvements or reported bugs. However, past research has shown that the reported issue types often do not match the description of the issue. Objective: We want to understand the overall maturity of the state of the art of issue...
Preprint
The scale of manually validated data is currently limited by the effort that small groups of researchers can invest for the curation of such data. Within this paper, we propose the use of registered reports to scale the curation of manually validated data. The idea is inspired by the mechanical turk and replaces monetary payment with authorship of...
Preprint
Software repository mining is the foundation for many empirical software engineering studies. The collection and analysis of detailed data can be challenging, especially if data shall be shared to enable replicable research and open science practices. SmartSHARK is an ecosystem that supports replicable and reproducible research based on software re...
Article
Defect prediction can be a powerful tool to guide the use of quality assurance resources. However, while lots of research covered methods for defect prediction as well as methodological aspects of defect prediction research, the actual cost saving potential of defect prediction is still unclear. Within this article, we close this research gap and f...
Preprint
Automated static analysis tools (ASATs) have become a major part of the software development workflow. Acting on the generated warnings, i.e., changing the code indicated in the warning, should be part of, at latest, the code review phase. Despite this being a best practice in software development, there is still a lack of empirical research regard...
Preprint
Defect prediction research has a strong reliance on published data sets that are shared between researchers. The SZZ algorithm is the de facto standard for collecting defect labels for this kind of data and is used by most public data sets. Thus, problems with the SZZ algorithm may have a strong indirect impact on almost the complete state of the a...
Preprint
Defect prediction can be a powerful tool to guide the use of quality assurance resources. However, while lots of research covered methods for defect prediction as well as methodological aspects of defect prediction research, the actual cost saving potential of defect prediction is still unclear. Within this article, we close this research gap and f...
Article
Context: Unit and integration testing are popular testing techniques. However, while the software development context evolved over time, the definitions remained unchanged. There is no empirical evidence whether these commonly used definitions still fit modern software development. Objective: We analyze whether the existing standard definitions of unit a...
Preprint
Developer social networks (DSNs) are a tool for the analysis of community structures and collaborations between developers in software projects and software ecosystems. Within this paper, we present the results of a systematic mapping study on the use of DSNs in software engineering research. We identified 194 primary studies on DSNs. We mapped the...
Preprint
Data extracted from software repositories is used intensively in Software Engineering research, for example, to predict defects in source code. In our research in this area, with data from open source projects as well as an industrial partner, we noticed several shortcomings of conventional data mining approaches for classification problems: (1) Do...
Preprint
Change-based code review is used widely in industrial software development. Thus, research on tools that help the reviewer to achieve better review performance can have a high impact. We analyze one possibility to provide cognitive support for the reviewer: Determining the importance of change parts for review, specifically determining which parts...
Conference Paper
Cross-Project Defect Prediction (CPDP) as a means to focus quality assurance of software projects was under heavy investigation in recent years. However, within the current state-of-the-art it is unclear which of the many proposals performs best due to a lack of replication of results and diverse experiment setups that utilize different performance...
Article
Full-text available
The usage of empirical methods has grown common in software engineering. This trend spawned hundreds of publications, whose results are helping to understand and improve the software development process. Due to the data-driven nature of this venue of investigation, we identified several problems within the current state-of-the-art that pose a threa...
Article
Defect prediction can be a powerful tool to guide the use of quality assurance resources. In recent years, many researchers focused on the problem of Cross-Project Defect Prediction (CPDP), i.e., the creation of prediction models based on training data from other projects. However, only few of the published papers evaluate the cost efficiency of pr...
Article
Unfortunately, the article "A Comparative Study to Benchmark Cross-project Defect Prediction Approaches" has a problem in the statistical analysis which was pointed out almost immediately after the pre-print of the article appeared online. While the problem does not negate the contribution of the article and all key findings remain the same, it...
Chapter
Replications and replicable research are currently gaining traction in the software engineering research community. Our research group has made an effort in recent years to make our own research accessible to other researchers through the provision of replication kits that allow rerunning our experiments. Within this chapter, we present our exper...
Article
In this article, we discuss the ScottKnottESD test, which was proposed in a recent paper "An Empirical Comparison of Model Validation Techniques for Defect Prediction Models" that was published in this journal. We discuss the implications and the empirical impact of the proposed normality correction of ScottKnottESD and come to the conclusion that...
Article
Full-text available
Although researchers invested significant effort, the performance of defect prediction in a cross-project setting, i.e., with data that does not come from the same project, is still unsatisfactory. A recent proposal for the improvement of defect prediction is using local models. With local models, the available data is first clustered into homogene...
Article
Cross-project defect prediction (CPDP) as a means to focus quality assurance of software projects was under heavy investigation in recent years. However, within the current state-of-the-art it is unclear which of the many proposals performs best due to a lack of replication of results and diverse experiment setups that utilize different performance...
Article
Full-text available
Usage-based testing focuses quality assurance on highly used parts of the software. The basis for this are usage profiles based on which test cases are generated. There are two fundamental approaches in usage-based testing for deriving usage profiles: either the system under test (SUT) is observed during its operation and from the obtained usage da...
Article
Full-text available
The quality of Web services is an important factor for businesses that advertise or sell their services on the Internet. Failures can directly lead to fewer customers or security problems. However, the testing of complex Web services that are organized in service-oriented architectures is a difficult and complex problem. Model-based testing (MBT) i...
Conference Paper
Reliability is one of the key concerns of both cloud providers and consumers, who require accurate reliability evaluation methods to develop, deploy, and maintain cloud applications. However, few works assess the reliability of cloud applications considering deep dependencies in the deployment stack. To explore the impact of deep dependencies to th...
Article
Cross-Project-Defect Prediction as a sub-topic of defect prediction in general has become a popular topic in research. In this article, we present a systematic mapping study with the focus on CPDP, for which we found 50 publications. We summarize the approaches presented by each publication and discuss the case study setups and results. We discover...
Conference Paper
Fault prediction on high-quality, industry-grade software often suffers from a strongly imbalanced class distribution due to a low bug rate. Previous work reports low predictive performance, so parameter tuning is required. As the State of the Art recommends sampling methods for imbalanced learning, we analyse effects when under- and oversampling...
Conference Paper
In software project planning, project managers have to keep track of several things simultaneously, including the estimation of the consequences of decisions about, e.g., the team constellation. The application of machine learning techniques to predict possible outcomes is a widespread research topic in software engineering. In this paper, we summari...
Conference Paper
The evolution of software projects is driven by developers who are in control of the developed artifacts. When analyzing the behavior of developers, the observable behaviors are, e.g., commits, messages, or bug assignments. For defining dynamic activities and workload of developers, we consider underlying characteristics, which means the level of i...
Conference Paper
Research in software repository mining has grown considerably over the last decade. Due to the data-driven nature of this venue of investigation, we identified several problems within the current state-of-the-art that pose a threat to the external validity of results. The heavy re-use of data sets in many studies may invalidate the results in case probl...
Book
This book constitutes revised papers of the proceedings of the 9th International Workshop on System Analysis and Modeling, SAM 2016, held in Saint-Malo, France, in October 2016. The 15 full papers presented were carefully reviewed and selected from 31 submissions. The contributions are organized in topical themes named: Technology-Specific Aspects o...
Conference Paper
Defect prediction is a powerful tool that greatly helps focusing quality assurance efforts during development. In the case of the availability of fault data from a particular context, there are different ways of using such fault predictions in practice. Companies like Google, Bell Labs and Cisco make use of fault prediction, whereas its use within...
Conference Paper
Along with the increasing importance of software systems for our daily life, attacks on these systems may have a critical impact. Since the number of attacks and their effects increase the more systems are connected, the secure operation of IT systems becomes a fundamental property. In the future, this importance will increase, due to the rise of...
Article
IT security tests examine systems for security-relevant vulnerabilities by executing them. A technique that is now widespread for this purpose is so-called fuzzing, in which the interfaces of a system are stimulated with invalid data. Such data can be random-based, based on descriptions of the input data formats, for example...
Conference Paper
While Service Oriented Architectures (SOAs) are for many parts deployed online, and today often in a cloud, the testing of the systems still happens mostly locally. In this paper, we want to present the MIDAS Testing as a Service (TaaS), a cloud platform for the testing of SOAs. We focus on the testing of whole SOA orchestrations, a complex task du...
Article
We combine a new data model, where the random classification is subjected to rather weak restrictions which in turn are based on the Mammen-Tsybakov [E. Mammen and A. B. Tsybakov, Ann. Statis. 27 (1999) 1808-1829; A. B. Tsybakov, Ann. Statis. 32 (2004) 135-166.] small margin conditions, and the statistical query (SQ) model due to Kearns [M. J. Kear...
Conference Paper
Software defect prediction has been a popular research topic in recent years and is considered as a means for the optimization of quality assurance activities. Defect prediction can be done in a within-project or a cross-project scenario. The within-project scenario produces results with a very high quality, but requires historic data of the projec...
Conference Paper
In this paper, we present AutoQUEST, a testing platform for Event-Driven Software (EDS) that decouples the implementation of testing techniques from the concrete platform they should be applied to. AutoQUEST provides the means to define testing techniques against an abstract Application Programming Interface (API) and provides plugins to port the t...
Article
End-user software systems are usually operated using a Graphical User Interface (GUI). Therefore, the quality of the GUI greatly impacts the quality of use of a software, which makes GUI testing an important part of software quality assurance. Furthermore, bugs in the software are triggered by the users through interaction with the software's GUI....
Article
Full-text available
In this article, we present a novel algorithmic method for the calculation of thresholds for a metric set. To this aim, machine learning and data mining techniques are utilized. We define a data-driven methodology that can be used for efficiency optimization of existing metric sets, for the simplification of complex classification models, and for t...
Conference Paper
Full-text available
Event-driven software is very diverse, e.g., in form of Graphical User Interfaces (GUIs), Web applications, or embedded software. Regardless of the application, the challenges for testing event-driven software are similar. Most event-driven systems allow a huge number of possible event sequences, which makes exhaustive testing infeasible. As a poss...
Conference Paper
Full-text available
Most software systems are operated using a Graphical User Interface (GUI). Therefore, bugs are often triggered by user interaction with the software's GUI. Hence, accurate and reliable GUI usage information is an important tool for bug fixing, as the reproduction of a bug is the first important step towards fixing it. To support bug reproduction, a...