Kwabena BenninWageningen University & Research | WUR
Kwabena Bennin
About
78
Publications
12,153
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
1,703
Citations
Introduction
Additional affiliations
September 2014 - February 2019
Publications
Publications (78)
Highly imbalanced data typically make accurate predictions difficult. Unfortunately, software defect datasets tend to have fewer defective modules than non-defective modules. Synthetic oversampling approaches address this concern by creating new minority defective modules to balance the class distribution before a model is trained. Notwithstanding...
To prioritize quality assurance efforts, various fault prediction models have been proposed. However, the best performing fault prediction model is unknown due to three major drawbacks: (1) comparison of few fault prediction models considering small number of data sets, (2) use of evaluation measures that ignore testing efforts and (3) use of n-fol...
Several defect prediction models proposed are effective when historical datasets are available. Defect prediction becomes difficult when no historical data exist. Cross-project defect prediction (CPDP), which uses projects from other sources/companies to predict the defects in the target projects proposed in recent studies has shown promising resul...
Code readability is an important indicator of software maintenance as it can significantly impact maintenance efforts. Recently, LLM (large language models) have been utilized for code readability evaluation. However, readability evaluation differs among developers, so personalization of the evaluation by LLM is needed. This study proposes a method...
Various clone detection methods have been proposed, with results varying depending on the combination of the methods and hyperparameters used (i.e., configurations). To help select a suitable clone detection configuration, we propose two Bandit Algorithm (BA) based methods that can help evaluate the configurations used dynamically while using detec...
Background: Code s generation tools such as GitHub Copilot have received attention due to their performance in generating code. Generally, a prior analysis of their performance is needed to select new code-generation tools from a list of candidates. Without such analysis, there is a higher risk of selecting an ineffective tool, which would negative...
Ensemble learning methods have been used to enhance the reliability of defect prediction models. However, there is an inconclusive stability of a single method attaining the highest accuracy among various software projects. This work aims to improve the performance of ensemble-learning defect prediction among such projects by helping select the hig...
Building defect prediction models based on online learning can enhance prediction accuracy. It continuously rebuilds a new prediction model while adding new data points. However, a module predicted as “non-defective” can result in fewer test cases for such modules. Thus, a defective module can be overlooked during testing. The erroneous test result...
Cross-project defect prediction (CPDP) aims to use data from external projects as historical data may not be available from the same project. In CPDP, deciding on a particular historical project to build a training model can be difficult. To help with this decision, a Bandit Algorithm (BA) based approach has been proposed in prior research to selec...
Unmanned aerial vehicles (UAVs) have emerged as versatile tools with significant potential in various fields, including, but not limited to civil engineering, ecology, networking and precision agriculture. Systematic literature reviews (SLRs) play a crucial role in assessing the quality of research methods and approaches, aiding researchers and pra...
There are many aspects of code quality, some of which are difficult to capture or to measure. Despite the importance of software quality, there is a lack of commonly accepted measures or indicators for code quality that can be linked to quality attributes. We investigate software developers’ perceptions of source code quality and the practices they...
Building defect prediction models based on online learning can enhance prediction accuracy. It continuously rebuilds a new prediction model when adding a new data point. However, predicting a module as "non-defective" (i.e., negative prediction) can result in fewer test cases for such modules. Therefore, defects can be overlooked during testing, ev...
This study aims to identify opportunities and barriers in developing and implementing Food Shopping Support Systems (FSSS) for healthier and more sustainable choices, given the growing consumer demand and persistent societal problems related to food. The study examined the social and technical value of FSSS in an early development stage through one...
Content Tractor Drive Trains An Analysis of Mixed Hydraulic and Electric Configurations for the Actuation of Tractor Auxiliary and Implement Functions to Reduce Power Consumption 1 The development of a reference working cycles for agricultural tractors 15 Performance optimization of CVT standard tractors in front loading application 21 Numerical si...
Currently, not all children that need speech therapy have access to a therapist. With the current international shortage of speech–language pathologists (SLPs), there is a demand for online tools to support SLPs with their daily tasks. Several online speech therapy (OST) systems have been designed and proposed in the literature; however, the implem...
Context: Automated software defect prediction (SDP) methods are increasingly applied, often with the use of machine learning (ML) techniques. Yet, the existing ML-based approaches require manually extracted features, which are cumbersome, time consuming and hardly capture the semantic information reported in bug reporting tools. Deep learning (DL)...
With the current international shortage of speech-language pathologists (SLPs), there is a demand for online tools to support SLPs with their daily tasks. For this purpose, several online speech therapy systems (OSTSs) have been proposed and discussed in the literature. However, developing these OSTSs is not trivial since it involves the considerat...
Context
Automated software defect prediction (SDP) methods are increasingly applied, often with the use of machine learning (ML) techniques. Yet, the existing ML-based approaches require manually extracted features, which are cumbersome, time consuming and hardly capture the semantic information reported in bug reporting tools. Deep learning (DL) t...
Crossp-roject defect prediction (CPDP), where data from different software projects are used to predict defects, has been proposed as a way to provide data for software projects that lack historical data. Evaluations of CPDP models using the Nearest Neighbour (NN) Filter approach have shown promising results in recent studies. A key challenge with...
Background: Selecting a suitable feature reduction technique, when building a defect prediction model, can be challenging. Different techniques can result in the selection of different independent variables which have an impact on the overall performance of the prediction model. To help in the selection, previous studies have assessed the impact of...
By 2050, according to the UN medium forecast, 68.6% of the world’s population will live in cities. This growth will place a strain on critical infrastructure distribution networks, which already operate in a state that is complex and intertwined within society. In order to create a sustainable society, there needs to be a change in both societal be...
Context: Technical debt (TD) discusses the negative impact of sub-optimal decisions to cope with the need-for-speed in software development. Code technical debt items (TDI) are atomic elements of TD that can be observed in code artifacts. Empirical results on open-source systems demonstrated how code-smells, which are just one type of TDIs, are int...
Cross-project defect prediction (CPDP), where data from different software projects are used to predict defects, has been proposed as a way to provide data for software projects that lack historical data. Evaluations of CPDP models using the Nearest Neighbor (NN) Filter approach have shown promising results in recent studies. A key challenge with d...
Background: defect prediction model is built using historical data from previous versions/releases of the same project. However, such historical data may not exist in case of newly developed projects. Alternatively, one can train a model using data obtained from external projects. This approach is known as cross-project defect prediction (CPDP). In...
In the past 20 years, defect prediction studies have generally acknowledged the effect of class size on software prediction performance. To quantify the relationship between object-oriented (OO) metrics and defects, modelling has to take into account the direct, and potentially indirect, effects of class size on defects. However, some studies have...
Inter-release defect prediction (IRDP) is a practical scenario that employs the datasets of the previous release to build a prediction model and predicts defects for the current release within the same software project. A practical software project experiences several releases where data of each release appears in the form of chunks that arrive in...
In the past 20 years, defect prediction studies have generally acknowledged the effect of class size on software prediction performance. To quantify the relationship between object-oriented (OO) metrics and defects, modelling has to take into account the direct, and potentially indirect, effects of class size on defects. However, some studies have...
Bellwether effect refers to the existence of exemplary projects (called the Bellwether) within a historical dataset to be used for improved prediction performance. Recent studies have shown an implicit assumption of using recently completed projects (referred to as moving window) for improved prediction accuracy. In this paper, we investigate the B...
Context: In addressing how best to estimate how much effort is required to develop software, a recent study found that using exemplary and recently completed projects [forming Bellwether moving windows (BMW)] in software effort prediction (SEP) models leads to relatively improved accuracy. More studies need to be conducted to determine whether the...
BACKGROUND: In object oriented (OO) software systems, class size has been acknowledged as having an indirect effect on the relationship between certain artifact characteristics, captured via metrics, and faultproneness, and therefore it is recommended to control for size when designing fault prediction models. AIM: To use robust statistical methods...
Various software fault prediction models have been proposed in the past twenty years. Many studies have compared and evaluated existing prediction approaches in order to identify the most effective ones. However, in most cases, such models and techniques provide varying results, and their outcomes do not result in best possible performance across d...
Context
Generally, there are more non-defective instances than defective instances in the datasets used for software defect prediction (SDP), which is referred to as the class imbalance problem. Oversampling techniques are frequently adopted to alleviate the problem by generating new synthetic defective instances. Existing techniques generate eithe...
Context: Technical Debt (TD) discusses the negative impact of sub-optimal decisions to cope with the need-for-speed in software development. Code Technical Debt Items (TDI) are atomic elements of TD that can be observed in code artefacts. Empirical results on open-source systems demonstrated how code-smells, which are just one type of TDIs, are int...
Context: Ranking-oriented defect prediction (RODP) ranks software modules to allocate limited testing resources to each module according to the predicted number of defects. Most RODP methods overlook that ranking a module with more defects incorrectly makes it difficult to successfully find all of the defects in the module due to fewer testing reso...
Software defect data sets are typically characterized by an unbalanced class distribution where the defective modules are fewer than the non-defective modules. Prediction performances of defect prediction models are detrimentally affected by the skewed distribution of the faulty minority modules in the data set since most algorithms assume both cla...
Effort-Aware Defect Prediction (EADP) ranks software
modules based on the possibility of these modules being
defective, their predicted number of defects, or defect density by
using learning to rank algorithms. Prior empirical studies compared
a few learning to rank algorithms considering small number
of datasets, evaluating with inappropriate or o...
BACKGROUND: In object oriented (OO) software systems, class size has been acknowledged as having an indirect effect on the relationship between certain artifact characteristics, captured via metrics, and fault-proneness, and therefore it is recommended to control for size when designing fault prediction models.
AIM: To use robust statistical metho...
Context: Automatic localization of buggy files can speed up the process of bug fixing to improve the efficiency and productivity of software quality assurance teams. Useful semantic information is available in bug reports and source code, but it is usually underutilized by existing bug localization approaches. Objective: To improve the performance...
Background: Correctly localizing buggy files for bug reports together with their semantic and structural information is a crucial task, which would essentially improve the accuracy of bug localization techniques. Aims: To empirically evaluate and demonstrate the effects of both semantic and structural information in bug reports and source files on...
Context: In addressing how best to estimate how much effort is required to develop software, a recent study found that using exemplary and recently completed projects [forming Bellwether moving windows (BMW)] in software effort prediction (SEP) models leads to relatively improved accuracy. More studies need to be conducted to determine whether the...
This study presents MAHAKIL, a novel and efficient synthetic over-sampling approach for software defect datasets that is based on the chromosomal theory of inheritance. Exploiting this theory, MAHAKIL interprets two distinct sub-classes as parents and generates a new instance that inherits different traits from each parent and contributes to the di...
Cross-company defect prediction (CCDP) is a practical way that trains a prediction model by exploiting one or multiple projects of a source company and then applies the model to a target company. Unfortunately, larger irrelevant cross-company (CC) data usually make it difficult to build a prediction model with high performance. On the other hand, b...
Context
Cross-project defect prediction (CPDP) which uses dataset from other projects to build predictors has been recently recommended as an effective approach for building prediction models that lack historical or sufficient local datasets. Class imbalance and distribution mismatch between the source and target datasets associated with real-world...
Context
The challenge of locating bugs in mostly large-scale software systems has led to the development of bug localization techniques. However, the lexical mismatch between bug reports and source codes degrades the performances of existing information retrieval or machine learning-based approaches.
Objective
To bridge the lexical gap and improve...
The structural complexity of design components (e.g. Classes) is proportional to design quality at the system level and is quantified via the object-oriented metrics. The frequent use of design patterns causes of too much abstraction and can increase the structural complexity of design components. Though, in our previous work, we have empirically i...
Bug localization is a software development and maintenance activity that aims to find relevant source code entities to be modified so that a specific bug can be fixed on the basis of the given bug report. Information retrieval (IR) techniques have been widely used to locate bugs in recent decades. These techniques mainly use the IR similarity betwe...
Context: Recent studies have shown that
performance of defect prediction models can be affected
when data sampling approaches are applied to imbalanced
training data for building defect prediction models. However, the
magnitude (degree and power) of the effect of these sampling
methods on the classification and prioritization performances of
defect...
Context
Software effort estimation (SEE) plays a key role in predicting the effort needed to complete software development task. However, the conclusion instability across learners has affected the implementation of SEE models. This instability can be attributed to the lack of an effort classification benchmark that software researchers and practit...
Programmers tend to leave incomplete, temporary workarounds and buggy codes that require rework in software development and such pitfall is referred to as Self-admitted Technical Debt (SATD). Previous studies have shown that SATD negatively affects software project and incurs high maintenance overheads. In this study, we introduce a prioritization...
Background: In the plethora of studies, the objectorientedmetrics have been empirically validated to assess thedesign properties and quantify the high-level quality attributessuch as fault-proneness, either at the method or class granularitylevels of software. Motivation: A more precise value of an objectorientedmetric can be used as an indicator f...
In the domain of software fault prediction, class membership probability of a selected classifier and the factors related to its estimation can be considered as necessary information for tester to take informed decisions about software quality issues. The objective of this study is to empirically investigate the class membership probability estimat...
Open Source Software (OSS) is often developed in
a public collaborative manner. Online OSS repositories such as
GitHub, Google Code and SourceForge support collaborative
OSS development by offering services such as subversion
management, bug tracking and others. However, OSS mostly
favors end-users who are programmers or have some prerequisite
prog...
In object-oriented software development, a plethora of studies have been carried out to present the application of machine learning algorithms for fault prediction. Furthermore, it has been empirically validated that an ensemble method can improve classification performance as compared to a single classifier. But, due to the inherent differences am...
Trend of software development is changing rapidly most of the software development organizations are trying to globalize their activities throughout the world. This trend leads towards a phenomenon called Global Software Development (GSD).The main reason behind the software globalization is its various benefits. Besides these benefits, software org...