About
148 Publications
15,033 Reads
1,204 Citations
Publications (148)
The package manager (PM) is crucial to most technology stacks, acting as a broker to ensure that a verified dependency package is correctly installed, configured, or removed from an application. Diversity in technology stacks has led to dozens of PMs with various features. While our recent study indicates that package management features of PM are...
Technical Debt occurs when development teams favour short-term operability over long-term stability. Since this places software maintainability at risk, technical debt requires early attention to avoid paying for accumulated interest. Most of the existing work focuses on detecting technical debt using code comments, known as Self-Admitted Technical...
The ability of an Open Source Software (OSS) project to attract, onboard, and retain any newcomer is vital to its livelihood. Although evidence suggests an upsurge in novice developers joining social coding platforms (such as GitHub), the extent to which their activities result in an OSS contribution is unknown. Hence, we execute the protocols...
AlphaCode is a code generation system for assisting software developers in solving competitive programming problems using natural language problem descriptions. Despite the advantages of the code generating system, the open source community expressed concerns about practicality and data licensing. However, there is no research investigating generat...
Forking is a common practice for developers when building upon already existing projects. These forks create variants, which share a common code base but then evolve the code in different directions, according to each forked project's requirements. An interesting side-effect of having multiple forks is the ability to select between differen...
Coding rules automatically exclude language-specific anti-patterns in the source code. However, developers still need to validate project-specific anti-patterns manually. We present DevReplay, a novel static analysis tool that generates coding rules as regular expressions from real-time source code editing. The generated regular expressions automati...
Popular adoption of third-party libraries for contemporary software development has led to the creation of large inter-dependency networks, where sustainability issues of a single library can have widespread network effects. Maintainers of these libraries are often overworked, relying on the contributions of volunteers to sustain these libraries. I...
Third-party library dependencies are commonplace in today's software development. With the growing threat of security vulnerabilities, applying security fixes in a timely manner is important to protect software systems. As such, the community developed a list of software and hardware weaknesses known as Common Weakness Enumeration (CWE) to assess vul...
Software developers may write a number of similar source code fragments containing the same mistake in software products. To remove such faulty code fragments, developers inspect code clones when they find a bug in their code. While various code clone detection methods have been proposed to identify clones of either code blocks or functions, those to...
Clone-and-own is a typical code reuse approach because of its simplicity and efficiency. Cloned software components are maintained independently by a new owner. These clone-and-own operations can occur sequentially, that is, cloned components can be cloned again and owned by other new owners along the supply chain. In general, code reuse is not...
In the field of data science, and for academics in general, the Python programming language is a popular choice, mainly because of its libraries for storing, manipulating, and gaining insight from data. Evidence includes the versatile set of machine learning, data visualization, and manipulation packages used for the ever-growing size of available...
Third-party package usage has become a common practice in contemporary software development. Developers often face different challenges, including choosing the right libraries, installation errors, discrepancies, setting up the environment, and build failures during software development. The risks of maintaining a third-party package are well know...
It has become common practice for software projects to adopt third-party dependencies. Developers are encouraged to update any outdated dependency to remain safe from potential vulnerability threats. In this study, we present an approach that helps developers determine whether vulnerable code is reachable in JavaScript projects. Our prototype...
The widespread adoption of third-party libraries for contemporary software development has led to the creation of large inter-dependency networks, where sustainability issues of a single library can have widespread network effects. Maintainers of these libraries are often overworked, relying on the contributions of volunteers to sustain these libra...
Technical Debt is a metaphor used to describe the situation in which long-term software artifact quality is traded for short-term goals in software projects. In recent years, the concept of self-admitted technical debt (SATD) was proposed, which focuses on debt that is intentionally introduced and described by developers. Although prior work has ma...
The management of third-party package dependencies is crucial to most technology stacks, with package managers acting as brokers to ensure that a verified package is correctly installed, configured, or removed from an application. Diversity in technology stacks has led to dozens of package ecosystems with their own management features. While recent...
Context: Open source software development has become more social and collaborative, especially with the rise of social coding platforms like GitHub. Since 2016, GitHub has supported more informal methods such as emoji reactions, with the goal of reducing commenting noise when reviewing code changes to a repository. Interestingly, preliminary...
SQL is one of the most popular tools for data analysis, and it is now used by an increasing number of users without database expertise. Several studies have proposed programming-by-example approaches to help such non-experts write correct SQL queries. While existing methods support a variety of SQL features such as aggregation and nes...
The Node.js Package Manager (i.e., npm) archive repository serves as a critical part of the JavaScript community and helps support one of the largest developer ecosystems in the world. However, as a developer, selecting an appropriate npm package to use or contribute to can be difficult. To understand what features users and contributors consider i...
Context: Contemporary code review tools are a popular choice for software quality assurance. Using these tools, reviewers are able to post a linkage between two patches during a review discussion. Large development teams that use a review-then-commit model risk being unaware of these linkages. Objective: Our objective is to first explore how patch...
Context: Code Review (CR) is the cornerstone for software quality assurance and a crucial practice for software development. As CR research matures, it can be difficult to keep track of the best practices and state-of-the-art in methodology, datasets, and metrics. Objective: This paper investigates the potential of benchmarking by collecting methodol...
Security vulnerabilities in third-party dependencies are a growing concern not only for developers of the affected software, but also because of the risks they pose to an entire software ecosystem, e.g., the Heartbleed vulnerability. Recent studies show that developers are slow to respond to the threat of vulnerability, sometimes taking four to eleven months to act. T...
Finding the same or similar code snippets in the source code for a query code snippet is one of the fundamental activities in software maintenance. Code clone detectors detect the same or similar code snippets, but they report all of the code clone pairs in the target, which is generally excessive for users. In this paper, we propose ccgrep, a...
Context: Open Source Software (OSS) projects rely on a continuous stream of new contributors for sustainable livelihood. Recent studies reported that new contributors experience many barriers in their first contribution. One of the critical barriers is the social barrier. Although a number of studies investigated the social barriers to new contribu...
Code Review plays a crucial role in software quality by allowing reviewers to discuss and critique any new patches before they can be successfully integrated into the project code. Yet, it is unclear to what extent coding pattern changes (i.e., repetitive code) occur between when a patch is first submitted and when the decision is made (i.e., during th...
Technical Debt is a metaphor used to describe the situation in which long-term code quality is traded for short-term goals in software projects. In recent years, the concept of self-admitted technical debt (SATD) was proposed, which focuses on debt that is intentionally introduced and described by developers. Although prior work has made important...
Online collaboration platforms such as GitHub have provided software developers with the ability to easily reuse and share code between repositories. With clone-and-own and forking becoming prevalent, maintaining these shared files is important, especially for keeping the most up-to-date version of reused code. Unlike related work, we propose...
Logging is an important feature of a software system to record run-time information. Detailed logging allows developers to collect run-time information in situations where they cannot use an interactive debugger, such as continuous integration and web application server cases. However, extensive logging leads to larger execution traces because few...
The ability of an Open Source Software (OSS) project to attract, onboard, and retain any newcomer is vital to its livelihood. Evidence suggests more new users are joining GitHub; however, the extent to which they contribute to OSS projects is unknown. In this study, we coin the term newcomer candidate to describe a novice developer that is a new u...
Technical debt occurs when software engineers favour short-term operability over long-term stability. Since this puts software stability at risk, technical debt requires early attention (otherwise it accumulates interest). Most existing work focuses on detecting technical debt through code comments (i.e., self-admitted technical debt). However,...
SQL is one of the most popular tools for data analysis and is used by an increasing number of users without expertise in databases. To help such non-experts write correct SQL queries, several studies have proposed programming-by-example approaches. In these approaches, the user can obtain a desired query just by giving input and out...
With one of the largest available collections of reusable packages, the JavaScript runtime environment Node.js is one of the most popular programming platforms. With recent work showing evidence that known vulnerabilities are prevalent in both open source and industrial software, we propose and implement a viable code-based vulnerability detection...
Context: Attracting, onboarding, and retaining newcomers in Open Source Software (OSS) projects is vital to their livelihood. Recent studies conclude that OSS projects risk failure due to abandonment and poor participation of newcomers. Evidence suggests more new users are joining GitHub; however, the extent to which they contribute to OSS projects i...
Static analysis tools, or linters, detect violations of source code conventions to maintain project readability. These tools automatically fix specific violations while developers edit the source code. However, existing tools are designed for the general conventions of programming languages. These tools do not check the project/API-specific conventi...
Finding the same or similar code snippets in source code is one of the fundamental activities in software maintenance. Text-based pattern matching tools such as grep are frequently used for this purpose, but writing proper queries for the expected result is not easy. Code clone detectors could be used, but their features and results are generally excessive...
Code Review (CR) is a cornerstone of Quality Assurance within software development teams. Also known as "software inspections" and "walk-throughs", traditional CR involved time-consuming processes, unlike the more lightweight contemporary forms used today. In this paper, we aim to summarize how CR research has evolved into its current...
Vulnerabilities in third-party libraries are a growing concern for software developers, as they pose risks not only to the software client itself but to the entire software ecosystem. To mitigate these risks, developers are strongly recommended to update their dependencies. Recent studies show that affected developers are not likely to respond to...
Links are an essential feature of the World Wide Web, and source code repositories are no exception. However, despite their many undisputed benefits, links can suffer from decay, insufficient versioning, and lack of bidirectional traceability. In this paper, we investigate the role of links contained in source code comments from these perspectives....
Defects in spacecraft software may result in loss of life and serious economic damage. To avoid such consequences, the software development process incorporates code review activity. A code review conducted by a third-party organization independently of a software development team can effectively identify defects in software. However, such review a...
Peer code review is key to ensuring the absence of software defects. To reduce review costs, software developers adopt code convention checking tools that automatically identify maintainability issues in source code. However, these tools do not always address the maintainability issue for a particular project. The goal of this study is to understan...