Benjamin Ledel's research while affiliated with Institute for Software & Systems Engineering and other places

Publications (11)

Preprint
Context: The identification of bugs within the reported issues in an issue tracker is crucial for the triage of issues. Machine learning models have shown promising results regarding the performance of automated issue type prediction. However, we have only limited knowledge beyond our assumptions how such models identify bugs. LIME and SHAP are pop...
Article
Context Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs. Objective We want to improve our understanding of the prevalence of tangling and the types of changes that...
Article
Context The SZZ algorithm is the de facto standard for labeling bug fixing commits and finding inducing changes for defect prediction data. Recent research uncovered potential problems in different parts of the SZZ algorithm. Most defect prediction data sets provide only static code metrics as features, while research indicates that other features...
Preprint
Bug localization is a tedious activity in the bug fixing process in which a software developer tries to locate bugs in the source code described in a bug report. Since this process is time-consuming and sometimes requires additional knowledge about the software project, current literature proposes several information retrieval techniques which can...
Preprint
Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and the types of changes tha...
Preprint
Software repository mining is the foundation for many empirical software engineering studies. The collection and analysis of detailed data can be challenging, especially if data shall be shared to enable replicable research and open science practices. SmartSHARK is an ecosystem that supports replicable and reproducible research based on software re...

Citations

... For instance, should the caller of a vulnerable code snippet also be labelled as such? Herbold et al. [60] encountered similar problems in their investigation of tangled commits. Alternatively, the label source may not contain enough information in the bug report to properly trace it. ...
... We use defect prediction data by Herbold et al. (2022) that contains data for 398 releases of 38 Java projects (Table 1). Each instance in a release represents a Java production file. ...
... We identify the following directions as future works to extend this study; i) integrate PreMOSA in a continuous integration environment, ii) adapt an appropriate test suite minimisation technique to address the generation of large test suites, iii) define and simulate acceptable defect predictors with respect to unbiased performance metrics like MCC, and iv) validate PreMOSA against other bug datasets [62,63,64]. ...
... Inaccurate ground truth may introduce noise and bias into the assessment activities, and thus may further negatively impact the reliability of assessment results [31], [32]. Therefore, for each bug of datasets used in APR, the associated human-written patch is expected to exclude the irrelevant code changes to the bug that is exposed by the triggering tests. ...
... The inclusion of these counterparts results in computational effort as we need every metric for every file in every commit. However, we are able to provide this data via the SmartSHARK ecosystem (Trautsch et al. 2017, 2020b). This additional effort allows us to infer if categories of changes are different when regarding all changes of a software project. ...