Kwabena Bennin

Wageningen University & Research | WUR

About

78
Publications
12,153
Reads
1,703
Citations
Additional affiliations
September 2014 - February 2019
City University of Hong Kong
Position
  • Graduate Research Student

Publications (78)
Article
Highly imbalanced data typically make accurate predictions difficult. Unfortunately, software defect datasets tend to have fewer defective modules than non-defective modules. Synthetic oversampling approaches address this concern by creating new minority defective modules to balance the class distribution before a model is trained. Notwithstanding...
Conference Paper
Full-text available
To prioritize quality assurance efforts, various fault prediction models have been proposed. However, the best performing fault prediction model is unknown due to three major drawbacks: (1) comparison of few fault prediction models considering a small number of data sets, (2) use of evaluation measures that ignore testing efforts and (3) use of n-fol...
Conference Paper
Several defect prediction models proposed are effective when historical datasets are available. Defect prediction becomes difficult when no historical data exist. Cross-project defect prediction (CPDP), which uses projects from other sources/companies to predict the defects in the target projects, has been proposed in recent studies and has shown promising resul...
Preprint
Code readability is an important indicator of software maintenance as it can significantly impact maintenance efforts. Recently, large language models (LLMs) have been utilized for code readability evaluation. However, readability evaluation differs among developers, so personalization of the evaluation by LLMs is needed. This study proposes a method...
Article
Various clone detection methods have been proposed, with results varying depending on the combination of the methods and hyperparameters used (i.e., configurations). To help select a suitable clone detection configuration, we propose two Bandit Algorithm (BA) based methods that can help evaluate the configurations used dynamically while using detec...
Article
Background: Code generation tools such as GitHub Copilot have received attention due to their performance in generating code. Generally, a prior analysis of their performance is needed to select new code-generation tools from a list of candidates. Without such analysis, there is a higher risk of selecting an ineffective tool, which would negative...
Preprint
Ensemble learning methods have been used to enhance the reliability of defect prediction models. However, there is an inconclusive stability of a single method attaining the highest accuracy among various software projects. This work aims to improve the performance of ensemble-learning defect prediction among such projects by helping select the hig...
Article
Building defect prediction models based on online learning can enhance prediction accuracy. It continuously rebuilds a new prediction model while adding new data points. However, a module predicted as “non-defective” can result in fewer test cases for such modules. Thus, a defective module can be overlooked during testing. The erroneous test result...
Article
Cross-project defect prediction (CPDP) aims to use data from external projects as historical data may not be available from the same project. In CPDP, deciding on a particular historical project to build a training model can be difficult. To help with this decision, a Bandit Algorithm (BA) based approach has been proposed in prior research to selec...
Article
Full-text available
Unmanned aerial vehicles (UAVs) have emerged as versatile tools with significant potential in various fields, including, but not limited to civil engineering, ecology, networking and precision agriculture. Systematic literature reviews (SLRs) play a crucial role in assessing the quality of research methods and approaches, aiding researchers and pra...
Article
Full-text available
There are many aspects of code quality, some of which are difficult to capture or to measure. Despite the importance of software quality, there is a lack of commonly accepted measures or indicators for code quality that can be linked to quality attributes. We investigate software developers’ perceptions of source code quality and the practices they...
Preprint
Building defect prediction models based on online learning can enhance prediction accuracy. It continuously rebuilds a new prediction model when adding a new data point. However, predicting a module as "non-defective" (i.e., negative prediction) can result in fewer test cases for such modules. Therefore, defects can be overlooked during testing, ev...
Article
Full-text available
This study aims to identify opportunities and barriers in developing and implementing Food Shopping Support Systems (FSSS) for healthier and more sustainable choices, given the growing consumer demand and persistent societal problems related to food. The study examined the social and technical value of FSSS in an early development stage through one...
Chapter
Content: Tractor Drive Trains. An Analysis of Mixed Hydraulic and Electric Configurations for the Actuation of Tractor Auxiliary and Implement Functions to Reduce Power Consumption; The development of a reference working cycles for agricultural tractors; Performance optimization of CVT standard tractors in front loading application; Numerical si...
Article
Full-text available
Currently, not all children that need speech therapy have access to a therapist. With the current international shortage of speech–language pathologists (SLPs), there is a demand for online tools to support SLPs with their daily tasks. Several online speech therapy (OST) systems have been designed and proposed in the literature; however, the implem...
Preprint
Full-text available
Context: Automated software defect prediction (SDP) methods are increasingly applied, often with the use of machine learning (ML) techniques. Yet, the existing ML-based approaches require manually extracted features, which are cumbersome, time consuming and hardly capture the semantic information reported in bug reporting tools. Deep learning (DL)...
Article
Full-text available
With the current international shortage of speech-language pathologists (SLPs), there is a demand for online tools to support SLPs with their daily tasks. For this purpose, several online speech therapy systems (OSTSs) have been proposed and discussed in the literature. However, developing these OSTSs is not trivial since it involves the considerat...
Article
Context: Automated software defect prediction (SDP) methods are increasingly applied, often with the use of machine learning (ML) techniques. Yet, the existing ML-based approaches require manually extracted features, which are cumbersome, time consuming and hardly capture the semantic information reported in bug reporting tools. Deep learning (DL) t...
Preprint
Cross-project defect prediction (CPDP), where data from different software projects are used to predict defects, has been proposed as a way to provide data for software projects that lack historical data. Evaluations of CPDP models using the Nearest Neighbour (NN) Filter approach have shown promising results in recent studies. A key challenge with...
Conference Paper
Full-text available
Background: Selecting a suitable feature reduction technique, when building a defect prediction model, can be challenging. Different techniques can result in the selection of different independent variables which have an impact on the overall performance of the prediction model. To help in the selection, previous studies have assessed the impact of...
Article
Full-text available
By 2050, according to the UN medium forecast, 68.6% of the world’s population will live in cities. This growth will place a strain on critical infrastructure distribution networks, which already operate in a state that is complex and intertwined within society. In order to create a sustainable society, there needs to be a change in both societal be...
Article
Full-text available
Context: Technical debt (TD) discusses the negative impact of sub-optimal decisions to cope with the need-for-speed in software development. Code technical debt items (TDI) are atomic elements of TD that can be observed in code artifacts. Empirical results on open-source systems demonstrated how code-smells, which are just one type of TDIs, are int...
Article
Full-text available
Cross-project defect prediction (CPDP), where data from different software projects are used to predict defects, has been proposed as a way to provide data for software projects that lack historical data. Evaluations of CPDP models using the Nearest Neighbor (NN) Filter approach have shown promising results in recent studies. A key challenge with d...
Conference Paper
Full-text available
Background: A defect prediction model is built using historical data from previous versions/releases of the same project. However, such historical data may not exist in the case of newly developed projects. Alternatively, one can train a model using data obtained from external projects. This approach is known as cross-project defect prediction (CPDP). In...
Article
Full-text available
In the past 20 years, defect prediction studies have generally acknowledged the effect of class size on software prediction performance. To quantify the relationship between object-oriented (OO) metrics and defects, modelling has to take into account the direct, and potentially indirect, effects of class size on defects. However, some studies have...
Article
Inter-release defect prediction (IRDP) is a practical scenario that employs the datasets of the previous release to build a prediction model and predicts defects for the current release within the same software project. A practical software project experiences several releases where data of each release appears in the form of chunks that arrive in...
Preprint
In the past 20 years, defect prediction studies have generally acknowledged the effect of class size on software prediction performance. To quantify the relationship between object-oriented (OO) metrics and defects, modelling has to take into account the direct, and potentially indirect, effects of class size on defects. However, some studies have...
Preprint
Bellwether effect refers to the existence of exemplary projects (called the Bellwether) within a historical dataset to be used for improved prediction performance. Recent studies have shown an implicit assumption of using recently completed projects (referred to as moving window) for improved prediction accuracy. In this paper, we investigate the B...
Preprint
Context: In addressing how best to estimate how much effort is required to develop software, a recent study found that using exemplary and recently completed projects [forming Bellwether moving windows (BMW)] in software effort prediction (SEP) models leads to relatively improved accuracy. More studies need to be conducted to determine whether the...
Preprint
BACKGROUND: In object oriented (OO) software systems, class size has been acknowledged as having an indirect effect on the relationship between certain artifact characteristics, captured via metrics, and fault-proneness, and therefore it is recommended to control for size when designing fault prediction models. AIM: To use robust statistical methods...
Article
Various software fault prediction models have been proposed in the past twenty years. Many studies have compared and evaluated existing prediction approaches in order to identify the most effective ones. However, in most cases, such models and techniques provide varying results, and their outcomes do not result in best possible performance across d...
Article
Context: Generally, there are more non-defective instances than defective instances in the datasets used for software defect prediction (SDP), which is referred to as the class imbalance problem. Oversampling techniques are frequently adopted to alleviate the problem by generating new synthetic defective instances. Existing techniques generate eithe...
Preprint
Full-text available
Context: Technical Debt (TD) discusses the negative impact of sub-optimal decisions to cope with the need-for-speed in software development. Code Technical Debt Items (TDI) are atomic elements of TD that can be observed in code artefacts. Empirical results on open-source systems demonstrated how code-smells, which are just one type of TDIs, are int...
Article
Context: Ranking-oriented defect prediction (RODP) ranks software modules to allocate limited testing resources to each module according to the predicted number of defects. Most RODP methods overlook that ranking a module with more defects incorrectly makes it difficult to successfully find all of the defects in the module due to fewer testing reso...
Article
Full-text available
Software defect data sets are typically characterized by an unbalanced class distribution where the defective modules are fewer than the non-defective modules. Prediction performances of defect prediction models are detrimentally affected by the skewed distribution of the faulty minority modules in the data set since most algorithms assume both cla...
Conference Paper
Effort-Aware Defect Prediction (EADP) ranks software modules based on the possibility of these modules being defective, their predicted number of defects, or defect density by using learning to rank algorithms. Prior empirical studies compared a few learning to rank algorithms considering a small number of datasets, evaluating with inappropriate or o...
Conference Paper
Full-text available
BACKGROUND: In object oriented (OO) software systems, class size has been acknowledged as having an indirect effect on the relationship between certain artifact characteristics, captured via metrics, and fault-proneness, and therefore it is recommended to control for size when designing fault prediction models. AIM: To use robust statistical metho...
Article
Context: Automatic localization of buggy files can speed up the process of bug fixing to improve the efficiency and productivity of software quality assurance teams. Useful semantic information is available in bug reports and source code, but it is usually underutilized by existing bug localization approaches. Objective: To improve the performance...
Conference Paper
Background: Correctly localizing buggy files for bug reports together with their semantic and structural information is a crucial task, which would essentially improve the accuracy of bug localization techniques. Aims: To empirically evaluate and demonstrate the effects of both semantic and structural information in bug reports and source files on...
Article
Context: In addressing how best to estimate how much effort is required to develop software, a recent study found that using exemplary and recently completed projects [forming Bellwether moving windows (BMW)] in software effort prediction (SEP) models leads to relatively improved accuracy. More studies need to be conducted to determine whether the...
Conference Paper
This study presents MAHAKIL, a novel and efficient synthetic over-sampling approach for software defect datasets that is based on the chromosomal theory of inheritance. Exploiting this theory, MAHAKIL interprets two distinct sub-classes as parents and generates a new instance that inherits different traits from each parent and contributes to the di...
Article
Full-text available
Cross-company defect prediction (CCDP) is a practical way that trains a prediction model by exploiting one or multiple projects of a source company and then applies the model to a target company. Unfortunately, larger irrelevant cross-company (CC) data usually make it difficult to build a prediction model with high performance. On the other hand, b...
Article
Full-text available
Context: Cross-project defect prediction (CPDP) which uses dataset from other projects to build predictors has been recently recommended as an effective approach for building prediction models that lack historical or sufficient local datasets. Class imbalance and distribution mismatch between the source and target datasets associated with real-world...
Article
Context: The challenge of locating bugs in mostly large-scale software systems has led to the development of bug localization techniques. However, the lexical mismatch between bug reports and source codes degrades the performances of existing information retrieval or machine learning-based approaches. Objective: To bridge the lexical gap and improve...
Conference Paper
The structural complexity of design components (e.g., classes) is proportional to design quality at the system level and is quantified via the object-oriented metrics. The frequent use of design patterns causes too much abstraction and can increase the structural complexity of design components. Though, in our previous work, we have empirically i...
Article
Bug localization is a software development and maintenance activity that aims to find relevant source code entities to be modified so that a specific bug can be fixed on the basis of the given bug report. Information retrieval (IR) techniques have been widely used to locate bugs in recent decades. These techniques mainly use the IR similarity betwe...
Conference Paper
Context: Recent studies have shown that performance of defect prediction models can be affected when data sampling approaches are applied to imbalanced training data for building defect prediction models. However, the magnitude (degree and power) of the effect of these sampling methods on the classification and prioritization performances of defect...
Article
Context: Software effort estimation (SEE) plays a key role in predicting the effort needed to complete software development task. However, the conclusion instability across learners has affected the implementation of SEE models. This instability can be attributed to the lack of an effort classification benchmark that software researchers and practit...
Article
Programmers tend to leave incomplete, temporary workarounds and buggy codes that require rework in software development and such pitfall is referred to as Self-admitted Technical Debt (SATD). Previous studies have shown that SATD negatively affects software project and incurs high maintenance overheads. In this study, we introduce a prioritization...
Conference Paper
Background: In the plethora of studies, the object-oriented metrics have been empirically validated to assess the design properties and quantify the high-level quality attributes such as fault-proneness, either at the method or class granularity levels of software. Motivation: A more precise value of an object-oriented metric can be used as an indicator f...
Conference Paper
In the domain of software fault prediction, class membership probability of a selected classifier and the factors related to its estimation can be considered as necessary information for testers to make informed decisions about software quality issues. The objective of this study is to empirically investigate the class membership probability estimat...
Conference Paper
Full-text available
Open Source Software (OSS) is often developed in a public collaborative manner. Online OSS repositories such as GitHub, Google Code and SourceForge support collaborative OSS development by offering services such as subversion management, bug tracking and others. However, OSS mostly favors end-users who are programmers or have some prerequisite prog...
Conference Paper
Full-text available
In object-oriented software development, a plethora of studies have been carried out to present the application of machine learning algorithms for fault prediction. Furthermore, it has been empirically validated that an ensemble method can improve classification performance as compared to a single classifier. But, due to the inherent differences am...
Conference Paper
Full-text available
The trend of software development is changing rapidly: most software development organizations are trying to globalize their activities throughout the world. This trend leads towards a phenomenon called Global Software Development (GSD). The main reason behind the software globalization is its various benefits. Besides these benefits, software org...
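
Several abstracts in the list above describe synthetic oversampling, which creates new minority (defective) modules to balance the class distribution before a model is trained. The following is a minimal sketch of that general idea only, assuming simple linear interpolation between randomly paired minority rows. It is not MAHAKIL or any other method from these publications, and the function name `oversample_minority`, the two metrics per module, and the toy data are hypothetical.

```python
import random


def oversample_minority(features, labels, minority_label=1, seed=0):
    """Balance a binary dataset by appending synthetic minority rows.

    Each synthetic row is a linear interpolation between two randomly
    chosen minority rows (an illustrative simplification).
    """
    rng = random.Random(seed)
    minority = [row for row, lab in zip(features, labels) if lab == minority_label]
    n_majority = len(labels) - len(minority)
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")

    new_features = list(features)
    new_labels = list(labels)
    # Append synthetic rows until both classes have the same count.
    for _ in range(max(0, n_majority - len(minority))):
        a, b = rng.sample(minority, 2)
        t = rng.random()  # interpolation weight in [0, 1)
        new_features.append([x + t * (y - x) for x, y in zip(a, b)])
        new_labels.append(minority_label)
    return new_features, new_labels


# Toy data (hypothetical): two metrics per module; label 1 = defective.
X = [[10, 1], [12, 2], [11, 1], [9, 1], [13, 2], [40, 7], [50, 9]]
y = [0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = oversample_minority(X, y)
print(y_bal.count(0), y_bal.count(1))
```

In this sketch only the training data would be rebalanced; the original rows are kept unchanged and the synthetic rows are appended after them.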
