
David Bowes
BSc, PGCE, MSc, PhD
Lancaster University · School of Computing and Communications
About
72 Publications · 30,441 Reads
2,958 Citations
Introduction
David Bowes currently works at the School of Computing and Communications at Lancaster University. He does research in Software Engineering, Algorithms and Computing in Mathematics, Natural Science, Engineering and Medicine. His most recent publication is 'How Effectively Is Defective Code Actually Tested?: An Analysis of JUnit Tests in Seven Open Source Systems'.
Additional affiliations
March 2018 - present
January 2006 - August 2017
Publications (72)
Background: Developers inevitably make human errors while coding. These errors can lead to faults in code, some of which may result in system failures. It is important to reduce the faults inserted by developers as well as fix any that slip through. Aim: To investigate the fault insertion and fault fixing activities of developers. We identify devel...
Automatic program repair (APR) offers significant potential for automating some coding tasks. Using APR could reduce the high costs historically associated with fixing code faults and deliver significant benefits to software engineering. Adopting APR could also have profound implications for software developers' daily activities, transforming their...
Automatic program repair (APR) is a rapidly advancing field of software engineering that aims to supplement or replace manual bug fixing with an automated tool. For APR to be successfully adopted in industry, it is vital that APR tools respond to developer needs and preferences. However, very little research has considered developers' general attit...
A key to the success of Automatic Program Repair techniques is how easily they can be used in an industrial setting. In this article, we describe a collaboration by a team from four UK-based universities with Bloomberg (London) in implementing automatic, high-quality fixes to its code base. We explain the motivation for adopting APR, the mechanics o...
The use of asserts in code has received increasing attention in the software engineering community in the past few years, even though it has been a recognized programming construct for many decades. A previous empirical study by Casalnuovo showed that methods containing asserts had fewer defects than those that did not. In this paper, we analyze th...
Developers inevitably make human errors while coding. These errors can lead to faults in code, some of which may result in system failures. It is important to reduce the faults inserted by developers as well as fix any that slip through. To investigate the fault insertion and fault fixing activities of developers. We identify developers who insert...
Substantial development time is devoted to software maintenance and testing. As development resources are usually finite, there is a risk that some components receive insufficient effort for thorough testing. Architectural complexity (e.g. tight coupling) can make effective testing particularly challenging. Software components with high architectur...
Background: Studies related to human factors in software engineering are providing insightful information on the emotional state of contributors and the impact this has on the code. The open source software development paradigm involves different roles, and previous studies about emotions in software development have not taken into account what dif...
Background: Newspaper headlines still regularly report latent software defects. Such defects have often evaded testing for many years. It remains difficult to identify how well a system has been tested. It also remains difficult to assess how successful at finding defects particular tests are. Coverage and mutation testing are frequently used to as...
Automatic and repeatable builds are established software engineering practices for achieving continuous integration and continuous delivery. The build phase of modern software systems is an important part of the development process, such that dedicated roles such as "Release Engineer" are increasingly required. Software development is a...
Context: Identifying defects in code early is important. A wide range of static code metrics have been evaluated as potential defect indicators. Most of these metrics offer only high-level insights and focus on particular pre-selected features of the code. None of the currently used metrics clearly performs best in defect prediction. Objective: We...
In this study, we analyzed issues and comments on GitHub projects and built collaboration networks dividing contributors into two categories: users and commenters. We identified as commenters those users who only post comments without posting any issues nor committing changes in the source code. Since previous studies showed that there is a link be...
During the last 10 years, hundreds of different defect prediction models have been published. The performance of the classifiers used in these models is reported to be similar with models rarely performing above the predictive performance ceiling of about 80% recall. We investigate the individual defects that four classifiers predict and analyse th...
Context: Replications are an important part of scientific disciplines. Replications test the credibility of original studies and can separate true results from those that are unreliable. Objective: In this paper we investigate the replication of defect prediction studies and identify the characteristics of replicated studies. We further assess how...
The International Conference on Evaluation and Assessment in Software Engineering (EASE) had its twentieth anniversary in 2016, with that year’s edition hosted in Limerick, Ireland. Founded in 1997, the EASE conference was the first event solely dedicated to encouraging empirical research in software engineering, and its founders have been longtime...
Evolutionary coupling (EC) is defined as the implicit relationship between 2 or more software artifacts that are frequently changed together. Changing software is widely reported to be defect-prone. In this study, we investigate the effect of EC on the defect proneness of large industrial software systems and explain why the effects vary. We analys...
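The co-change relation behind evolutionary coupling can be sketched by counting how often pairs of artifacts appear in the same commit. This is a minimal illustration assuming commits are already available as sets of changed file names; it is not the mining pipeline used in the study itself:

```python
from collections import Counter
from itertools import combinations

def co_change_counts(commits):
    """Count how often each pair of files changes in the same commit.

    `commits` is a list of sets of file names. Pairs that change
    together frequently are evolutionarily coupled.
    """
    counts = Counter()
    for files in commits:
        for pair in combinations(sorted(files), 2):
            counts[pair] += 1
    return counts

# Hypothetical commit history for illustration.
commits = [
    {"a.c", "b.c"},
    {"a.c", "b.c", "c.c"},
    {"b.c", "c.c"},
]
print(co_change_counts(commits)[("a.c", "b.c")])  # 2
```

In practice a threshold (or a support/confidence measure, as in association-rule mining) is applied to these raw counts before declaring two artifacts coupled.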
Context: Defect prediction research is based on a small number of defect datasets and most are at class not method level. Consequently our knowledge of defects is limited. Identifying defect datasets for prediction is not easy and extracting quality data from identified datasets is even more difficult. Goal: Identify open source Java systems suitab...
Background: Ensemble techniques have gained attention in various scientific fields. Defect prediction researchers have investigated many state-of-the-art ensemble models and concluded that in many cases these outperform standard single classifier techniques. Almost all previous work using ensemble techniques in defect prediction rely on the majorit...
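Majority voting, the combination scheme most prior ensemble work relies on, is simple to state: each base classifier casts a vote and the most common label wins. A minimal sketch (the classifier names are illustrative, not from the study):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine binary defect predictions from several classifiers.

    `predictions` maps classifier name -> predicted label
    (0 = clean, 1 = defective); the most common label wins.
    """
    votes = Counter(predictions.values())
    return votes.most_common(1)[0][0]

# Hypothetical votes from three base classifiers.
print(majority_vote({"naive_bayes": 1, "svm": 0, "random_forest": 1}))  # 1
```

Alternatives to the plain majority include weighted voting or stacking, where a meta-classifier learns how much to trust each base model.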
We introduce mutation-aware fault prediction, which leverages additional guidance from metrics constructed in terms of mutants and the test cases that cover and detect them. We report the results of 12 sets of experiments, applying 4 different predictive modelling techniques to 3 large real-world systems (both open and closed source). The results s...
Background: The NASA datasets have previously been used extensively in studies of software defects. In 2013 Shepperd et al. presented an essential set of rules for removing erroneous data from the NASA datasets making this data more reliable to use. Objective: We have now found additional rules necessary for removing problematic data which were not...
Software defect prediction performance varies over a large range. Menzies suggested there is a ceiling effect of 80% Recall [8]. Most of the data sets used are highly imbalanced. This paper asks, what is the empirical effect of using different datasets with varying levels of imbalance on predictive performance? We use data synthesised by a previous...
BACKGROUND – During the last 10 years hundreds of different defect prediction models have been published. The performance of the classifiers used in these models is reported to be similar with models rarely performing above the predictive performance ceiling of about 80% recall. OBJECTIVE – We investigate the individual defects that four classifier...
Background. The ability to predict defect-prone software components would be valuable. Consequently, there have been many empirical studies to evaluate the performance of different techniques endeavouring to accomplish this effectively. However no one technique dominates and so designing a reliable defect prediction model remains problematic. Objec...
We investigate the relationship between faults and five of Fowler et al.'s least-studied smells in code: Data Clumps, Switch Statements, Speculative Generality, Message Chains, and Middle Man. We developed a tool to detect these five smells in three open-source systems: Eclipse, ArgoUML, and Apache Commons. We collected fault data from the change a...
There are many hundreds of fault prediction models published in the literature. The predictive performance of these models is often reported using a variety of different measures. Most performance measures are not directly comparable. This lack of comparability means that it is often difficult to evaluate the performance of one model against anothe...
Fowler and Beck defined 22 Code Bad Smells. These smells are useful indicators of code that may need to be refactored. A range of tools have been developed that measure smells in Java code. We aim to compare the results of using two smell measurement tools (DECOR which is embedded in the Ptidej tool and Stench Blossom) on the same Java code (ArgoUM...
Background: The NASA metrics data program (MDP) data sets have been heavily used in software defect prediction research. Aim: To highlight the data quality issues present in these data sets, and the problems that can arise when they are used in a binary classification context. Method: A thorough exploration of all 13 original NASA data sets, follow...
The aim of this paper is to investigate the quality of methodology in software fault prediction studies using machine learning. Over two hundred studies of fault prediction have been published in the last 10 years. There is evidence to suggest that the quality of methodology used in some of these studies does not allow us to have confidence in the...
Background: Systematic literature reviews are increasingly used in software engineering. Most systematic literature reviews require several hundred papers to be examined and assessed. This is not a trivial task and can be time consuming and error-prone. Aim: We present SLuRp - our open source web enabled database that supports the management of sys...
Sound empirical research suggests that we should analyze software metrics from a theoretical and practical perspective. This paper describes the result of an investigation into the respective merits of two cohesion-based metrics for program slicing. The Tightness and Overlap metrics were those originally proposed by Weiser for the procedural paradi...
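The Tightness and Overlap metrics can be sketched over slices represented as sets of statement numbers. The formulations below are one common reading of Weiser's definitions (Tightness: the proportion of the module lying in every slice; Overlap: the average proportion of each slice made up of those common statements) and are not necessarily the exact variants evaluated in the paper:

```python
def tightness(slices, module_length):
    """Proportion of the module's statements that appear in every slice."""
    common = set.intersection(*slices)
    return len(common) / module_length

def overlap(slices):
    """Average proportion of each slice made up of statements common
    to all slices."""
    common = set.intersection(*slices)
    return sum(len(common) / len(s) for s in slices) / len(slices)

# Two hypothetical slices over a 10-statement function
# (statements numbered 1..10).
slices = [{1, 2, 3, 4, 5}, {3, 4, 5, 6}]
print(tightness(slices, 10))          # 0.3
print(round(overlap(slices), 3))      # 0.675
```

High values of both metrics indicate that the module's slices share most of their statements, which is usually read as high functional cohesion.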
Background: Software Code Cloning is widely used by developers to produce code in which they have confidence and which reduces development costs and improves the software quality. However, Fowler and Beck suggest that the maintenance of clones may lead to defects and therefore clones should be re-factored out. Objective: We investigate the purpose...
A systematic review of the research literature on fault-prediction models from 2000 through 2010 identified 36 studies that sufficiently defined their models and development context and methodology. The authors quantitatively analyzed 19 of these studies and the 206 models they presented. They identified several key features to help industry softwa...
It is important to develop corpuses of data to test out the efficacy of using metrics. Replicated studies are an important contribution to corpuses of metrics data. There are few replicated studies using metrics reported in software engineering. To contribute more data to the body of evidence on the use of novel program slicing-based cohesion metri...
The advantages of a DSL and the benefits its use potentially brings imply that informed decisions on the design of a domain specific language are of paramount importance for its use. We believe that the foundations of such decisions should be informed by analysis of data empirically collected from systems to highlight salient features that should then...
Background: There has been much discussion amongst automated software defect prediction researchers regarding use of the precision and false positive rate classifier performance metrics. Aim: To demonstrate and explain why failing to report precision when using data with highly imbalanced class distributions may provide an overly optimistic view of...
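The optimism the abstract above warns about is easy to demonstrate from a confusion matrix. With a hedged, hypothetical class distribution (100 defective vs 10,000 clean modules), the false positive rate can look excellent while precision reveals that most alarms are false:

```python
def precision(tp, fp):
    """Fraction of predicted-defective modules that are truly defective."""
    return tp / (tp + fp)

def false_positive_rate(fp, tn):
    """Fraction of clean modules wrongly flagged as defective."""
    return fp / (fp + tn)

# Hypothetical imbalanced result: 100 defective vs 10,000 clean modules;
# the classifier finds 70 true positives at the cost of 500 false positives.
tp, fp, tn = 70, 500, 9500
print(round(false_positive_rate(fp, tn), 3))  # 0.05  (looks good)
print(round(precision(tp, fp), 3))            # 0.123 (most alarms are false)
```

Because the false positive rate is normalised by the huge clean class, 500 false alarms barely register in it; precision, normalised by the number of alarms raised, exposes them.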
Background: The accurate prediction of where faults are likely to occur in code can help direct test effort, reduce costs and improve the quality of software. Objective: We investigate how the context of models, the independent variables used and the modelling techniques applied, influence the performance of fault prediction models. Method:We used...
Background: The NASA Metrics Data Program data sets have been heavily used in software defect prediction experiments. Aim: To demonstrate and explain why these data sets require significant pre-processing in order to be suitable for defect prediction. Method: A meticulously documented data cleansing process involving all 13 of the original NASA dat...
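The kind of pre-processing such cleansing involves can be sketched with two illustrative rules: dropping exact duplicate instances and rows with implausible metric values. These example rules are hypothetical and merely in the spirit of the published checks, not Shepperd et al.'s actual rule set:

```python
def clean_instances(rows):
    """Illustrative data-cleansing pass over metric rows (dicts).

    Hypothetical rules: drop exact duplicate instances, and drop rows
    with implausible values such as zero or negative lines of code.
    """
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # duplicate instance
        seen.add(key)
        if row.get("loc", 0) <= 0:
            continue  # implausible: a module with no code
        cleaned.append(row)
    return cleaned

rows = [
    {"loc": 12, "defective": 1},
    {"loc": 12, "defective": 1},  # exact duplicate
    {"loc": 0, "defective": 0},   # implausible value
]
print(len(clean_instances(rows)))  # 1
```

Real cleansing rule sets also cover cross-field consistency (for example, a metric that cannot logically exceed another) and missing values.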
In this paper, we investigate the Barcode open-source system (OSS) using one of Weiser's original slice-based metrics (Tightness) as a basis. In previous work, low numerical values of this slice-based metric were found to indicate fault-free (as opposed to fault-prone) functions. In the same work, we deliberately excluded from our analysis a catego...
Many studies have been carried out to predict the presence of software code defects using static code metrics. Such studies typically report how a classifier performs with real world data, but usually no analysis of the predictions is carried out. An analysis of this kind may be worthwhile as it can illuminate the motivation behind the predictions...
Software products can only be improved if we have a good understanding of the faults they typically contain. Code faults are a significant source of software product problems which we currently do not understand sufficiently. Open source change repositories are potentially a rich and valuable source of fault data for both researchers and practition...
Visual adaptation is the process that allows animals to be able to see over a wide range of light levels. This is achieved partially by lateral inhibition in the retina which compensates for low/high light levels. Neural controllers which cause robots to turn away from or towards light tend to work in a limited range of light conditions. In real en...
In this paper, we investigate the barcode OSS using two of Weiser's original slice-based metrics (tightness and overlap) as a basis, complemented with fault data extracted from multiple versions of the same system. We compared the values of the metrics in functions with at least one reported fault with fault-free modules to determine a) whether sig...
Program slicing metrics are an important addition to the range of static code measures available to software developers and researchers alike. However, slicing metrics still remain under-utilized due partly to the difficulty in calibrating such metrics for practical use; previous use of slicing metrics reveals a variety of calibration approaches. T...
This paper investigates different models of leakiness for the soma of a simulated spiking neural controller for a robot exhibiting negative photo-taxis. It also investigates two models of receptor response to stimulus levels. The results show that exponential decay of ions across the soma and of a receptor response function where intensity is p...
The automated detection of defective modules within software systems could lead to reduced development costs and more reliable software. In this work the static code metrics for a collection of modules contained within eleven NASA data sets are used with a Support Vector Machine classifier. A rigorous sequence of pre-processing steps was applied t...