Bin Lin’s research while affiliated with Radboud University and other places


Publications (29)


On the Evolution of Unused Dependencies in Java Project Releases: An Empirical Study
  • Conference Paper
  • Full-text available

April 2025 · 17 Reads · Yagut Shakizada · [...]

Modern software development relies heavily on third-party dependencies to reduce workload and improve developer productivity. Given the vast number of available dependencies and the ease of adding them to projects, some introduced dependencies are never used, leading to bloated software, longer build times, and increased network bandwidth usage. While several previous studies have examined the prevalence of unused dependencies and their impact on security, it remains unclear how these dependencies are introduced into and removed from software projects. This study addresses this question through an empirical study of 3,020 release versions of 417 Java projects. Our analysis shows that unused packages are common (found in 52% of projects), but few releases (9%) introduce new unused dependencies. Among the unused dependencies that were later resolved, 59% were removed and 41% came into use in later versions. Our findings highlight that not all unused dependencies should be removed in practice.


Leveraging Context Information for Self-Admitted Technical Debt Detection

Self-Admitted Technical Debt (SATD) refers to non-optimal software design or implementation that developers acknowledge and explicitly document in the code. Detecting SATD and understanding its evolution can help developers better manage their development activities and monitor software quality. In recent years, numerous approaches have been proposed to identify SATD automatically. However, these approaches still suffer from a high number of false positives (i.e., non-SATD comments detected as SATD). To advance the field further, we conduct an empirical study to evaluate the performance of state-of-the-art SATD detection tools and investigate the causes of their false positives. By manually analyzing 135 false positive cases, we identify the main types of comments that are easily misclassified. To address this issue, we propose a new approach, CASTI, which integrates context information into CodeBERT, a pre-trained model for programming languages. Our evaluation demonstrates that CASTI significantly reduces false positives and that context information does improve performance.




Investigating the Impact of Vocabulary Difficulty and Code Naturalness on Program Comprehension

August 2023 · 76 Reads

Context: Developers spend most of their time comprehending source code during software development. Automatically assessing how readable and understandable source code is can benefit various tasks, such as task triaging and code review. While several studies have proposed approaches to predict software readability and understandability, most of them focus only on local characteristics of source code. Moreover, the performance of understandability prediction is far from satisfactory. Objective: In this study, we aim to assess readability and understandability from the perspective of language acquisition. More specifically, we investigate whether code readability and understandability are correlated with the naturalness and vocabulary difficulty of source code. Method: To assess code naturalness, we adopt the cross-entropy metric, and to assess vocabulary difficulty, we use a manually crafted list of code elements with assigned advancement levels. We will conduct a statistical analysis to understand their correlations and analyze whether code naturalness and vocabulary difficulty can improve the performance of code readability and understandability prediction methods. The study will be conducted on existing datasets.
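Cross-entropy measures how "surprised" a language model trained on a code corpus is by a given snippet: lower values indicate more natural, predictable code. For a token sequence t_1 ... t_N it is commonly computed as H = -(1/N) Σ log2 P(t_i | t_{i-1}, ...). The abstract does not say which language model the study uses, so the sketch below assumes a simple add-one-smoothed token bigram model; the function names `train_bigram` and `cross_entropy` are hypothetical.

```python
import math
from collections import Counter


def train_bigram(corpus_tokens):
    """Count context (unigram) and bigram frequencies over tokenized snippets.

    Assumption: a bigram model stands in for whatever n-gram or neural
    model an actual naturalness study would use.
    """
    unigrams = Counter()  # how often each token appears as a context
    bigrams = Counter()   # how often each (previous, current) pair appears
    vocab = set()
    for tokens in corpus_tokens:
        padded = ["<s>"] + list(tokens)  # start-of-snippet marker
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    return unigrams, bigrams, vocab


def cross_entropy(tokens, unigrams, bigrams, vocab):
    """Average negative log2-probability per token (bits per token).

    Uses add-one (Laplace) smoothing so unseen tokens get non-zero probability.
    """
    if not tokens:
        raise ValueError("cannot score an empty token sequence")
    padded = ["<s>"] + list(tokens)
    v = len(vocab | set(tokens))  # vocabulary size including unseen test tokens
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + v)
        total -= math.log2(p)
    return total / len(tokens)
```

Scoring a snippet that repeats patterns seen in the training corpus yields a lower value than scoring one made of unseen tokens, which is the intuition behind treating low cross-entropy as high naturalness.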


The Human Side of Fuzzing: Challenges Faced by Developers During Fuzzing Activities

August 2023 · 45 Reads · 9 Citations

ACM Transactions on Software Engineering and Methodology

Fuzz testing, also known as fuzzing, is a software testing technique aimed at identifying software vulnerabilities. In recent decades, fuzzing has gained increasing popularity in the research community. However, existing studies led by fuzzing experts mainly focus on improving the coverage and performance of fuzzing techniques. As a result, there is still a gap in empirical knowledge about fuzzing, especially regarding the challenges developers face when they adopt it. Understanding these challenges can provide valuable insights to both practitioners and researchers on how to further improve fuzzing processes and techniques. We conducted a study to understand the challenges developers encounter during fuzzing. More specifically, we first manually analyzed 829 randomly sampled fuzzing-related GitHub issues and constructed a taxonomy of 39 types of challenges (22 related to the fuzzing process itself and 17 related to using external fuzzing providers). We then surveyed 106 fuzzing practitioners to verify the validity of our taxonomy and collected feedback on how the fuzzing process can be improved. Our taxonomy, accompanied by representative examples and highlighted implications, can serve practitioners as a reference point for adopting fuzzing techniques more effectively, and indicates potential directions for researchers working toward better fuzzing approaches and practices.


On the Rise of Modern Software Documentation

July 2023 · 22 Reads · 3 Citations

Classical software documentation, as it was conceived and intended decades ago, is no longer the only reality. Official documentation from authoritative sources is being replaced by real-time collaborative platforms and ecosystems, which have surged in popularity under the influence of changes in society, technology, and best practices. These modern tools influence the way developers document the conception, design, and implementation of software. As a by-product of these shifts, developers are changing the way they communicate about software. Where official documentation once stood as the only truth about a project, we now find a multitude of volatile and heterogeneous documentation sources, forming a complex and ever-changing documentation landscape. Software projects often include a top-level README file with important information, which we leverage to identify their documentation landscape. Starting from ∼12K GitHub repositories, we mine their README files to extract links to additional documentation sources. We present a qualitative analysis that reveals multiple dimensions of the documentation landscape (e.g., content type, source type) and highlights important insights. By analyzing instant messaging application links (e.g., Gitter, Slack, Discord) in the histories of README files, we show how this part of the landscape has grown and evolved over the last decade. Our findings show that modern documentation encompasses communication platforms, which are exploding in popularity. This is not a passing phenomenon: on the contrary, it entails a number of unknowns and socio-technical problems that the research community is currently ill-prepared to tackle.



Opinion Mining for Software Development: A Systematic Literature Review

July 2022 · 193 Reads · 76 Citations

ACM Transactions on Software Engineering and Methodology

Opinion mining, sometimes referred to as sentiment analysis, has gained increasing attention in software engineering (SE) studies. SE researchers have applied opinion mining techniques in various contexts, such as identifying developers’ emotions expressed in code comments and extracting users’ criticism of mobile apps. Given the large number of relevant studies available, it can take researchers and developers considerable time to figure out which approaches they can adopt in their own studies and what perils these approaches entail. We conducted a systematic literature review involving 185 papers. More specifically, we present (1) well-defined categories of opinion mining-related software development activities, (2) available opinion mining approaches, whether they are evaluated when adopted in other studies, and how their performance compares, (3) available datasets for performance evaluation and tool customization, and (4) concerns or limitations SE researchers might need to take into account when applying or customizing these opinion mining techniques. The results of our study serve as a reference for choosing suitable opinion mining tools for software development activities and provide critical insights for the further development of opinion mining techniques in the SE domain.


Why Do Developers Reject Refactorings in Open-Source Projects?

April 2022 · 80 Reads · 12 Citations

ACM Transactions on Software Engineering and Methodology

Refactoring operations are behavior-preserving changes aimed at improving source code quality. While refactoring is largely considered a good practice, refactoring proposals in pull requests are often rejected after code review. Understanding the reasons behind the rejection of refactoring contributions can shed light on how such contributions can be improved, ultimately benefiting software quality. This article reports a study in which we manually coded rejection reasons inferred from 330 refactoring-related pull requests from 207 open-source Java projects. We also surveyed 267 developers to assess how prevalent they perceive the identified rejection reasons to be and to complement the set of reasons. Our study resulted in a comprehensive taxonomy of 26 refactoring-related rejection reasons and 21 process-related rejection reasons. The taxonomy, accompanied by representative examples and highlighted implications, provides developers with valuable insights on how to reflect on and polish their refactoring contributions, and indicates a number of directions researchers can pursue toward better refactoring recommenders.


Citations (23)


... In this regard, tools, functioning as "artificial collaborators", such as GitHub Copilot [2] and ChatGPT [3], have been effective in assisting and supporting developers in multiple phases of the software development lifecycle [4]- [6] as well as enhancing their understanding of code [7], [8]. ...

Reference:

Toward Neurosymbolic Program Comprehension
On the Use of ChatGPT for Code Review: Do Developers Like Reviews By ChatGPT?

... (P8) With the important technical knowledge that links the technology with realistic applications, documentors design their documentation to provide novel, practical examples: "I create a small use case that will help put someone in this problem situation, and then I explain that we will build this [solution] [...] At the end, [the learner] will have a project that they can use in the real world." (P23) Ultimately, documentation is a communication [78], and since different people communicate differently, there is an inherent originality to human-created documentation: "There's this whole thing of: the topic has already been written about, so there's no need to write about it. But, individually, we all have various ways we could add value to people that relate to us, by how we frame our writing." ...

On the Rise of Modern Software Documentation
  • Citing Conference Paper
  • July 2023

... Graybox fuzzing [29] focuses on seed schedule, power schedule, mutation strategy, etc. New methods are proposed or combined with other techniques to enhance the effectiveness of testing [30]. ...

The Human Side of Fuzzing: Challenges Faced by Developers During Fuzzing Activities

ACM Transactions on Software Engineering and Methodology

... These idioms are represented as a syntactic probabilistic model that uses probabilities to measure the quality of a proposed idiom. Similar approaches have been used for measuring how natural/idiomatic code is, or how it changes when bugs are fixed [14,15,16,17]. Based on such measures, these approaches have all found that software is repetitive; in other words, idioms are often used. ...

On the Uniqueness of Code Redundancies
  • Citing Conference Paper
  • May 2017

... In software engineering studies, LIWC has been adopted and used repeatedly by researchers from 2007 until now, providing insight into developer collaboration, the emotional tone of project discussions, the overall dynamics of software teams, etc. While previous studies have examined the use of opinion mining and sentiment analysis tools in SE [13], this study focuses on psycholinguistic tools due to their ability to target and interpret complex psychological constructs. In particular, we explore the significant role of LIWC in the analysis of SE-specific language toward gaining insights into various psychological and social factors affecting the daily tasks of software engineers. ...

Opinion Mining for Software Development: A Systematic Literature Review
  • Citing Article
  • July 2022

ACM Transactions on Software Engineering and Methodology

... This taxonomy can be compared with existing ones in the literature, such as those proposed by Pantiuchina et al. [46,85] and Paixao et al. [43], which classify refactorings based on reasons for refactoring rejection and rationale in the GitHub pull requests and Gerrit, respectively. For example, previous studies found that lack of clear goals and poorly documented proposals are the main causes of rejection after code review, while design improvement and test quality are key motivations. ...

Why Do Developers Reject Refactorings in Open-Source Projects?
  • Citing Article
  • April 2022

ACM Transactions on Software Engineering and Methodology

... While GP can improve run time performance [58] and energy efficiency [59], it is often limited to small source codes due to high memory requirements and the expansive search space of tree-based solutions. Second, there are approaches that preserve the functional properties of the SW, with code refactoring being a promising method that can significantly reduce run time [60] and energy consumption [61]. Innovative methods that combine deep and machine learning to optimize code, alongside parameter optimization and auto-tuning, demonstrate significant potential for enhancing SW efficiency. ...

How Software Refactoring Impacts Execution Time
  • Citing Article
  • April 2022

ACM Transactions on Software Engineering and Methodology

... the software's external behavior [22,32,35]. Numerous empirical studies [20,25,28,31,32,35,36] have highlighted the importance of test refactoring: developers widely accept refactored tests, and both developers and testers increasingly recognize the negative impact of test smells. They widely agree that test quality improves significantly once test smells are eliminated. ...

Does Refactoring Break Tests and to What Extent?

... The majority of efforts resulted in new approaches to analyze Java projects (e.g., [1,2,8,16,23,31,33,35,38,39,41]). Due to the need for finding refactorings in other programming languages, recent years also saw research on refactoring detection for software written in Python [3,9,25], JavaScript [33], Go [4], Kotlin [18,21,22], C [33], and C++ [23]. Some of these tools [21,22,25], including the one for C++ projects, are based on RefDetect [23], which "is not currently publicly available" [24]. ...

PYREF: Refactoring Detection in Python Projects

... Maipradit et al. [17] presented an approach using N-gram IDF that detects issues referenced in code comments and automatically identifies instances of "On-hold" SATD, where a developer's comment indicates that they are holding off on future work until some future event. Using N-gram IDF with auto-sklearn, they achieved an F1-score of 0.73. ...

Automated Identification of On-hold Self-admitted Technical Debt
  • Citing Conference Paper
  • September 2020