Tim Menzies

Tim Menzies
North Carolina State University | NCSU · Department of Computer Science

Ph.D. (Com Sci)

About

698
Publications
185,688
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
18,345
Citations
Introduction
An effective data scientist supports some human endeavor, via the application of automated data miners. So I explore how to support this combination of silicon and carbon intelligence.
Additional affiliations
February 1996 - June 1997
Monash University (Australia)
Position
  • Lecturer
June 1993 - July 1995
UNSW Sydney
Position
  • PostDoc Position
July 2014 - present
North Carolina State University
Position
  • Professor (Full)
Education
April 1991 - April 1995
UNSW Sydney
Field of study
  • Computer Science

Publications

Publications (698)
Preprint
Full-text available
This paper proposes a new challenge problem for software analytics. In the process we shall call "software review", a panel of SMEs (subject matter experts) review examples of software behavior to recommend how to improve that's software's operation. SME time is usually extremely limited so, ideally, this panel can complete this optimization task a...
Article
Full-text available
Labeling a module defective or non-defective is an expensive task. Hence, there are often limits on how much-labeled data is available for training. Semi-supervised classifiers use far fewer labels for training models. However, there are numerous semi-supervised methods, including self-labeling, co-training, maximal-margin, and graph-based methods,...
Article
Full-text available
Synthetic data has been widely applied in the real world recently. One typical example is the creation of synthetic data for privacy concerned datasets. In this scenario, synthetic data substitute the real data which contains the privacy information, and is used to public testing for machine learning models. Another typical example is the unbalance...
Article
Full-text available
A “partial ordering” is a way to heuristically order a set of examples (partial orderings are a set where, for certain pairs of elements, one precedes the other). While these orderings may only be approximate, they can be useful for guiding a search towards better regions of the data. To illustrate the value of that technique, this paper presents i...
Article
This research seeks to benefit the software engineering society by providing a simple yet effective pre-processing approach to achieve equalized odds fairness in machine learning software. Fairness issues have attracted increasing attention since machine learning software is increasingly used for high-stakes and high-risk decisions. It is the respo...
Article
When data is scarce, software analytics can make many mistakes. For example, consider learning predictors for open source project health (e.g. the number of closed pull requests in twelve months time). The training data for this task may be very small (e.g. five years of data, collected every month means just 60 rows of training data). The models g...
Preprint
Full-text available
To make models more understandable and correctable, I propose that the PROMISE community pivots to the problem of model review. Over the years, there have been many reports that very simple models can perform exceptionally well. Yet, where are the researchers asking "say, does that mean that we could make software analytics simpler and more compreh...
Article
For simulation-based systems, finding a set of test cases with the least cost by exploring multiple goals is a complex task. Domain-specific optimization goals (e.g. maximize output variance) are useful for guiding the rapid selection of test cases via mutation. But evaluating the selected test cases via mutation (that can distinguish the current p...
Preprint
Full-text available
Testing complex Cyber Physical Systems (CPSs) can be expensive and time consuming. Current state-of-the-art methods that explore this problem are fully-supervised; i.e. they require that all examples are labeled. On the other hand, the GenClu system (introduced in this paper) takes a semi-supervised approach; i.e. (a) only a small subset of informa...
Article
Full-text available
Context Many software systems can be tuned for multiple objectives (e.g., faster runtime, less required memory, less network traffic or energy consumption, etc.). Such systems can suffer from “disagreement” where different models have different (or even opposite) insights and tactics on how to optimize a system. For configuration problems, we show...
Article
Automatically generated static code warnings suffer from a large number of false alarms. Hence, developers only take action on a small percent of those warnings. To better predict which static code warnings should n ot be ignored, we suggest that analysts need to look deeper into their algorithms to find choices that better improve the particular...
Article
Before researchers rush to reason across all available data or try complex methods, perhaps it is prudent to first check for simpler alternatives. Specifically, if the historical data has the most information in some small region, perhaps a model learned from that region would suffice for the rest of the project. To support this claim, we offer a c...
Preprint
Full-text available
In many domains, there are many examples and far fewer labels for those examples; e.g. we may have access to millions of lines of source code, but access to only a handful of warnings about that code. In those domains, semi-supervised learners (SSL) can extrapolate labels from a small number of examples to the rest of the data. Standard SSL algorit...
Article
Prior pessimism about reuse in software engineering research may have been a result of using the wrong methods to measure the wrong things.
Preprint
When learning from very small data sets, the resulting models can make many mistakes. For example, consider learning predictors for open source project health. The training data for this task may be very small (e.g. five years of data, collected every month means just 60 rows of training data). Using this data, prior work had unacceptably large err...
Preprint
Labeling a module defective or non-defective is an expensive task. Hence, there are often limits on how much-labeled data is available for training. Semi-supervised classifiers use far fewer labels for training models, but there are numerous semi-supervised methods, including self-labeling, co-training, maximal-margin, and graph-based methods, to n...
Article
Full-text available
Context Applying vulnerability detection techniques is one of many tasks using the limited resources of a software project. Objective The goal of this research is to assist managers and other decision-makers in making informed choices about the use of software vulnerability detection techniques through an empirical study of the efficiency and effe...
Preprint
CONTEXT: Applying vulnerability detection techniques is one of many tasks using the limited resources of a software project. OBJECTIVE: The goal of this research is to assist managers and other decision-makers in making informed choices about the use of software vulnerability detection techniques through an empirical study of the efficiency and eff...
Article
Full-text available
Keeping track of and managing Self-Admitted Technical Debts (SATDs) is important for maintaining a healthy software project. Current active-learning SATD recognition tool involves manual inspection of 24% of the test comments on average to reach 90% of the recall. Among all the test comments, about 5% are SATDs. The human experts are then required...
Article
Full-text available
Software developed on public platform is a source of data that can be used to make predictions about those projects. While the individual developing activity may be random and hard to predict, the developing behavior on project level can be predicted with good accuracy when large groups of developers work together on software projects. To demonstra...
Preprint
Full-text available
Automatically generated static code warnings suffer from a large number of false alarms. Hence, developers only take action on a small percent of those warnings. To better predict which static code warnings should not be ignored, we suggest that analysts need to look deeper into their algorithms to find choices that better improve the particulars o...
Preprint
Full-text available
Background: Most of the existing machine learning models for security tasks, such as spam detection, malware detection, or network intrusion detection, are built on supervised machine learning algorithms. In such a paradigm, models need a large amount of labeled data to learn the useful relationships between selected features and the target class....
Article
Full-text available
Numerous methods can build predictive models from software data. However, what methods and conclusions should we endorse as we move from analytics in-the-small (dealing with a handful of projects) to analytics in-the-large (dealing with hundreds of projects)? To answer this question, we recheck prior small-scale results (about process versus produc...
Preprint
Full-text available
Ethical bias in machine learning models has become a matter of concern in the software engineering community. Most of the prior software engineering works concentrated on finding ethical bias in models rather than fixing it. After finding bias, the next step is mitigation. Prior researchers mainly tried to use supervised approaches to achieve fairn...
Article
Given the large numbers of publications in software engineering, frequent literature reviews are required to keep current on work in specific areas. One tedious work in literature reviews is to find relevant studies amongst thousands of non-relevant search results. In theory, expert systems can assist in finding relevant work but those systems have...
Preprint
Background: Machine learning techniques have been widely used and demonstrate promising performance in many software security tasks such as software vulnerability prediction. However, the class ratio within software vulnerability datasets is often highly imbalanced (since the percentage of observed vulnerability is usually very low). Goal: To help...
Preprint
Full-text available
To reduce technical debt and make code more maintainable, it is important to be able to warn programmers about code smells. State-of-the-art code small detectors use deep learners, without much exploration of alternatives within that technology. One promising alternative for software analytics and deep learning is GHOST (from TSE'21) that relies on...
Preprint
Full-text available
Keeping track of and managing Self-Admitted Technical Debts (SATDs) is important for maintaining a healthy software project. Current active-learning SATD recognition tool involves manual inspection of 24% of the test comments on average to reach 90% of the recall. Among all the test comments, about 5% are SATDs. The human experts are then required...
Article
Full-text available
ContextMachine learning-based security detection models have become prevalent in modern malware and intrusion detection systems. However, previous studies show that such models are susceptible to adversarial evasion attacks. In this type of attack, inputs (i.e., adversarial examples) are specially crafted by intelligent malicious adversaries, with...
Article
Full-text available
bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Context : Machine learning software can generate models that inappropriately discriminate against specific protected social groups (e.g., groups based on gender, ethnicity, etc.). Motivated by those results, software engineering researchers have proposed...
Preprint
Full-text available
For cyber-physical systems, finding a set of test cases with the least cost by exploring multiple goals is a complex task. For example, Arrieta et al. reported that state-of-the-art optimizers struggle to find minimal test suites for this task. To better manage this task, we propose DoLesS (Domination with Least Squares Approximation) which uses a...
Preprint
Testing machine learning software for ethical bias has become a pressing current concern. In response, recent research has proposed a plethora of new fairness metrics, for example, the dozens of fairness metrics in the IBM AIF360 toolkit. This raises the question: How can any fairness tool satisfy such a diverse range of goals? While we cannot comp...
Preprint
When reasoning over complex models, AI tools can generate too many solutions for humans to read and understand. In this case, \textit{interactive search-based software engineering} techniques might use human preferences to select relevant solutions (and discard the rest). Often, iSBSE methods rely on evolutionary methods that need to score thousand...
Preprint
Full-text available
A large scale collection of both semantic and natural language resources is essential to leverage active Software Engineering research areas such as code reuse and code comprehensibility. Existing machine learning models ingest data from Open Source repositories (like GitHub projects) and forum discussions (like Stackoverflow.com), whereas, in this...
Preprint
Machine learning software can generate models that inappropriately discriminate against specific protected social groups (e.g., groups based on gender, ethnicity, etc). Motivated by those results, software engineering researchers have proposed many methods for mitigating those discriminatory effects. While those methods are effective in mitigating...
Preprint
Full-text available
Cloud-based software has many advantages. When services are divided into many independent components, they are easier to update. Also, during peak demand, it is easier to scale cloud services (just hire more CPUs). Hence, many organizations are partitioning their monolithic enterprise applications into cloud-based microservices. Recently there has...
Preprint
Full-text available
Standard software analytics often involves having a large amount of data with labels in order to commission models with acceptable performance. However, prior work has shown that such requirements can be expensive, taking several weeks to label thousands of commits, and not always available when traversing new research problems and domains. Unsuper...
Preprint
Full-text available
In any field, finding the "leading edge" of research is an on-going challenge. Researchers cannot appease reviewers and educators cannot teach to the leading edge of their field if no one agrees on what is the state-of-the-art. Using a novel crowdsourced "reuse graph" approach, we propose here a new method to learn this state-of-the-art. Our reuse...
Article
Full-text available
Software engineering is a highly dynamic discipline. Hence, as times change, so too might our beliefs about core processes in this field. This paper checks some five beliefs that originated in the past decades that comment on the relationships between (i) developer productivity; (ii) software quality and (iii) years of developer experience. Using d...
Preprint
Full-text available
When optimizing software for the cloud, monolithic applications need to be partitioned into many smaller *microservices*. While many tools have been proposed for this task, we warn that the evaluation of those approaches has been incomplete; e.g. minimal prior exploration of hyperparameter optimization. Using a set of open source Java EE applicatio...
Preprint
When AI tools can generate many solutions, some human preference must be applied to determine which solution is relevant to the current project. One way to find those preferences is interactive search-based software engineering (iSBSE) where humans can influence the search process. Current iSBSE methods can lead to cognitive fatigue (when they over...
Preprint
Full-text available
Software comes with many configuration options, satisfying varying needs from users. Exploring those options for non-functional requirements can be tedious, time-consuming, and even error-prone (if done manually). Worse, many software systems can be tuned to multiple objectives (e.g., faster response time, fewer memory requirements, decreased netwo...
Conference Paper
Full-text available
Increasingly, software is making autonomous decisions in case of criminal sentencing, approving credit cards, hiring employees, and so on. Some of these decisions show bias and adversely affect certain social groups (e.g. those defined by sex, race, age, marital status). Many prior works on bias mitigation take the following form: change the data o...
Preprint
Increasingly, software is making autonomous decisions in case of criminal sentencing, approving credit cards, hiring employees, and so on. Some of these decisions show bias and adversely affect certain social groups (e.g. those defined by sex, race, age, marital status). Many prior works on bias mitigation take the following form: change the data o...
Preprint
Before researchers rush to reason across all available data, they should first check if the information is densest within some small region. We say this since, in 240 GitHub projects, we find that the information in that data ``clumps'' towards the earliest parts of the project. In fact, a defect prediction model learned from just the first 150 com...
Article
One truism of deep learning is that the automatic feature engineering (seen in the first layers of those networks) excuses data scientists from performing tedious manual feature engineering prior to running DL. For the specific case of deep learning for defect prediction, we show that that truism is false. Specifically, when we pre-process data wit...
Article
Full-text available
Background In order that the general public is not vulnerable to hackers, security bug reports need to be handled by small groups of engineers before being widely discussed. But learning how to distinguish the security bug reports from other bug reports is challenging since they may occur rarely. Data mining methods that can find such scarce target...
Article
Full-text available
Static code warning tools often generate warnings that programmers ignore. Such tools can be made more useful via data mining algorithms that select the “actionable” warnings; i.e. the warnings that are usually not ignored. In this paper, we look for actionable warnings within a sample of 5,675 actionable warnings seen in 31,058 static code warning...
Article
How can we make software analytics simpler and faster? One method is to match the complexity of analysis to the intrinsic complexity of the data being explored. For example, hyperparameter optimizers find the control settings for data miners that improve the predictions generated via software analytics. Sometimes, very fast hyperparameter optimizat...
Preprint
Full-text available
Modern scientific workflows are data-driven and are often executed on distributed, heterogeneous, high-performance computing infrastructures. Anomalies and failures in the workflow execution cause loss of scientific productivity and inefficient use of the infrastructure. Hence, detecting, diagnosing, and mitigating these anomalies are immensely imp...
Preprint
Full-text available
Lack of security expertise among software practitioners is a problem with many implications. First, there is a deficit of security professionals to meet current needs. Additionally, even practitioners who do not plan to work in security may benefit from increased understanding of security. The goal of this paper is to aid software engineering educa...
Article
Software comes in releases. An implausible change to software is something that has never been changed in prior releases. When planning how to reduce defects, it is better to use plausible changes, i.e., changes with some precedence in the prior releases. To demonstrate these points, this paper compares several defect reduction planning tools. LIME...
Article
Improved test case prioritization means that software developers can detect and fix more software faults sooner than usual. But is there one "best" prioritization algorithm Or do different kinds of projects deserve special kinds of prioritization To answer these questions, this paper applies nine prioritization schemes to 31 projects that range fro...
Article
As 2020 drew to a close, I was thinking about what lessons we have learned about software engineering (SE) for artificial intelligence (AI)-things that we can believe now but, in the last century, would have seemed somewhat shocking. One very surprising lesson, at least for me, is the success of the very complex and very simple. At the complex end,...
Preprint
Full-text available
Is deep learning over-hyped? Where are the case studies that compare state-of-the-art deep learners with simpler options? In response to this gap in the literature, this paper offers one case study on using deep learning to predict issue close time in Bugzilla. We report here that a SIMPLE extension to a decades-old feedforward neural network works...
Preprint
Full-text available
Theorem provers has been used extensively in software engineering for software testing or verification. However, software is now so large and complex that additional architecture is needed to guide theorem provers as they try to generate test suites. The SNAP test suite generator (introduced in this paper) combines the Z3 theorem prover with the fo...
Article
Many methods have been proposed to estimate how much effort is required to build and maintain software. Much of that research tries to recommend a single method – an approach that makes the dubious assumption that one method can handle the diversity of software project data. To address this drawback, we apply a configuration technique called “ROME”...
Preprint
Many methods in defect prediction are "datahungry"; i.e. (1) given a choice of using more data, or some smaller sample, researchers assume that more is better; (2) when data is missing, researchers take elaborate steps to transfer data from another project; and (3) given a choice of older data or some more recent sample, researchers usually ignore...
Preprint
BACKGROUND: Machine learning-based security detection models have become prevalent in modern malware and intrusion detection systems. However, previous studies show that such models are susceptible to adversarial evasion attacks. In this type of attack, inputs (i.e., adversarial examples) are specially crafted by intelligent malicious adversaries,...
Article
Knowledge-based systems reason over some knowledge base. Hence, an important issue for such systems is how to acquire the knowledge needed for their inference. This paper assesses active learning methods for acquiring knowledge for “static code warnings”. Static code analysis is a widely-used method for detecting bugs and security vulnerabilities i...
Article
Keeping track of and managing Self-Admitted Technical Debts (SATDs) are important to maintaining a healthy software project. This requires much time and effort from human experts to identify the SATDs manually. The current automated solutions do not have satisfactory precision and recall in identifying SATDs to fully automate the process. To solve...
Article
Full-text available
The current generation of software analytics tools are mostly prediction algorithms (e.g. support vector machines, naive bayes, logistic regression, etc). While prediction is useful, after prediction comes planning about what actions to take in order to improve quality. This research seeks methods that generate demonstrably useful guidance on “what...