Massimiliano Di Penta

Massimiliano Di Penta
Università degli Studi del Sannio | UniSannio · Department of Engineering (DING)

PhD

About

407
Publications
119,317
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
19,138
Citations
Citations since 2017
104 Research Items
11327 Citations
201720182019202020212022202305001,0001,5002,000
201720182019202020212022202305001,0001,5002,000
201720182019202020212022202305001,0001,5002,000
201720182019202020212022202305001,0001,5002,000
Additional affiliations
January 2008 - present
January 2001 - December 2012
Università degli Studi del Sannio

Publications

Publications (407)
Preprint
Upon evolving their software, organizations and individual developers have to spend a substantial effort to pay back technical debt, i.e., the fact that software is released in a shape not as good as it should be, e.g., in terms of functionality, reliability, or maintainability. This paper empirically investigates the extent to which technical debt...
Preprint
Since its launch in November 2022, ChatGPT has gained popularity among users, especially programmers who use it as a tool to solve development problems. However, while offering a practical solution to programming problems, ChatGPT should be mainly used as a supporting tool (e.g., in software education) rather than as a replacement for the human bei...
Preprint
Recommender systems for software engineering (RSSEs) assist software engineers in dealing with a growing information overload when discerning alternative development solutions. While RSSEs are becoming more and more effective in suggesting handy recommendations, they tend to suffer from popularity bias, i.e., favoring items that are relevant mainly...
Article
Full-text available
Continuous Integration and Delivery (CI/CD) practices have shown several benefits for software development and operations, e.g., faster release cycles and early discovery of defects. For Cyber-Physical System (CPS) development, CI/CD can help achieving required goals, such as high dependability, yet it may be challenging to apply. This paper empiri...
Article
Unmanned Aerial Vehicles (UAVs) are nowadays used in a variety of applications. Given the cyber-physical nature of UAVs, software defects in these systems can cause issues with safety-critical implications. An important aspect of the lifecycle of UAV software is to minimize the possibility of harming humans or damaging properties through a continuo...
Article
Video games represent a substantial and increasing share of the software market. However, their development is particularly challenging as it requires multi-faceted knowledge, which is not consolidated in computer science education yet. This paper aims at defining a catalog of bad smells related to video game development. To achieve this goal, we m...
Article
Full-text available
Self-Admitted Technical Debt (SATD) consists of annotations—typically, but not only, source code comments—pointing out incomplete features, maintainability problems, or, in general, portions of a program not-ready yet. The way a SATD comment is written, and specifically its polarity, may be a proxy indicator of the severity of the problem and, to s...
Article
Full-text available
Background Cyber-Physical Systems (CPSs) are systems in which software and hardware components interact with each other. Understanding the specific nature and root cause of CPS bugs would help to design better verification and validation (V&V) techniques for these systems such as domain-specific mutants. Aim We look at CPS bugs from an open-source...
Article
Refactoring operations are behavior-preserving changes aimed at improving source code quality. While refactoring is largely considered a good practice, refactoring proposals in pull requests are often rejected after the code review. Understanding the reasons behind the rejection of refactoring contributions can shed light on how such contributions...
Article
Defect prediction models can be beneficial to prioritize testing, analysis, or code review activities, and has been the subject of a substantial effort in academia, and some applications in industrial contexts. A necessary precondition when creating a defect prediction model is the availability of defect data from the history of projects. If this d...
Article
Full-text available
Developers often use Static Code Analysis Tools (SCAT) to automatically detect different kinds of quality flaws in their source code. Since many warnings raised by SCATs may be irrelevant for a project/organization, it can be possible to leverage information from the project development history, to automatically configure which warnings a SCAT shou...
Article
Code completion aims at speeding up code writing by predicting the next code token(s) the developer is likely to write. Works in this field focused on improving the accuracy of the generated predictions, with substantial leaps forward made possible by deep learning (DL) models. However, code completion techniques are mostly evaluated in the scenari...
Article
Full-text available
Self-admitted technical debt (SATD) consists of annotations, left by developers as comments in the source code or elsewhere, as a reminder about pieces of software manifesting technical debt (TD), i.e., “not being ready yet”. While previous studies have investigated SATD management and its relationship with software quality, there is little underst...
Conference Paper
Software developers rely on various repositories and communication channels to exchange relevant information about their ongoing tasks and the status of overall project progress. In this context, semi-structured and unstructured software artifacts have been leveraged by researchers to build recommender systems aimed at supporting developers in diff...
Preprint
Code completion aims at speeding up code writing by predicting the next code token(s) the developer is likely to write. Works in this field focused on improving the accuracy of the generated predictions, with substantial leaps forward made possible by deep learning (DL) models. However, code completion techniques are mostly evaluated in the scenari...
Preprint
Full-text available
Defect prediction models can be beneficial to prioritize testing, analysis, or code review activities, and has been the subject of a substantial effort in academia, and some applications in industrial contexts. A necessary precondition when creating a defect prediction model is the availability of defect data from the history of projects. If this d...
Article
Technical Debt (TD) concerns the lack of an adequate solution in a software project, from its design to the source code. Its admittance through source code comments, issues, or commit messages is referred to as Self-Admitted Technical Debt (SATD). Previous research has studied SATD from different perspectives, including its distribution, impact on...
Preprint
Code completion is one of the main features of modern Integrated Development Environments (IDEs). Its objective is to speed up code writing by predicting the next code token(s) the developer is likely to write. Research in this area has substantially bolstered the predictive performance of these techniques. However, the support to developers is sti...
Preprint
Full-text available
Software development activity has reached a high degree of complexity, guided by the heterogeneity of the components, data sources, and tasks. The proliferation of open-source software (OSS) repositories has stressed the need to reuse available software artifacts efficiently. To this aim, it is necessary to explore approaches to mine data from soft...
Preprint
Refactoring aims at improving code non-functional attributes without modifying its external behavior. Previous studies investigated the motivations behind refactoring by surveying developers. With the aim of generalizing and complementing their findings, we present a large-scale study quantitatively and qualitatively investigating why developers pe...
Conference Paper
An effective and efficient application of Continuous Integration (CI) and Delivery (CD) requires software projects to follow certain principles and good practices. Configuring such a CI/CD pipeline is challenging and error-prone. Therefore, automated linters have been proposed to detect errors in the pipeline. While existing linters identify syntac...
Preprint
Software refactoring aims at improving code quality while preserving the system's external behavior. Although in principle refactoring is a behavior-preserving activity, a study presented by Bavota et al. in 2012 reported the proneness of some refactoring actions (e.g., pull up method) to induce faults. The study was performed by mining refactoring...
Article
Refactoring aims at improving code non-functional attributes without modifying its external behavior. Previous studies investigated the motivations behind refactoring by surveying developers. With the aim of generalizing and complementing their findings, we present a large-scale study quantitatively and qualitatively investigating why developers pe...
Article
Full-text available
Mobile apps are playing a major role in our everyday life, and they are tending to become more and more complex and resource demanding. Because of that, performance issues may occur, disrupting the user experience or, even worse, preventing an effective use of the app. Ultimately, such problems can cause bad reviews and influence the app success. D...
Article
Full-text available
On question and answer sites, such as Stack Overflow (SO), developers use tags to label the content of a post and to support developers in question searching and browsing. However, these tags mainly refer to technological aspects instead of the purpose of the question. Tagging questions with their purpose can add a new dimension to the identificati...
Article
Mutation testing has been widely used to assess the fault-detection effectiveness of a test suite, as well as to guide test case generation or prioritization. Empirical studies have shown that, while mutants are generally representative of real faults, an effective application of mutation testing requires “traditional” operators designed for progra...
Preprint
To develop and train defect prediction models, researchers rely on datasets in which a defect is attributed to an artifact, e.g., a class of a given release. However, the creation of such datasets is far from being perfect. It can happen that a defect is discovered several releases after its introduction: this phenomenon has been called "dormant de...
Article
Full-text available
Continuous Integration (CI) has been claimed to introduce several benefits in software development, including high software quality and reliability. However, recent work pointed out challenges, barriers and bad practices characterizing its adoption. This paper empirically investigates what are the bad practices experienced by developers applying CI...
Article
Context:Behavior-Driven Development (BDD) features the capability, through appropriate domain-specific languages, of specifying acceptance test cases and making them executable. The availability of frameworks such as Cucumber or RSpec makes the application of BDD possible in practice. However, it is unclear to what extent developers use such framew...
Preprint
Mutation testing can be used to assess the fault-detection capabilities of a given test suite. To this aim, two characteristics of mutation testing frameworks are of paramount importance: (i) they should generate mutants that are representative of real faults; and (ii) they should provide a complete tool chain able to automatically generate, inject...
Article
Full-text available
When creating a new software system, or when evolving an existing one, developers do not reinvent the wheel but, rather, seek available libraries that suit their purpose. In such a context, open source software repositories contain rich resources that can provide developers with helpful advice to support their tasks. However, the heterogeneity of r...
Article
Millions of open source projects with numerous bug fixes are available in code repositories. This proliferation of software development histories can be leveraged to learn how to fix common programming bugs. To explore such a potential, we perform an empirical study to assess the feasibility of using Neural Machine Translation techniques for learni...
Conference Paper
A major problem with user-written bug reports, indicated by developers and documented by researchers, is the (lack of high) quality of the reported steps to reproduce the bugs. Low-quality steps to reproduce lead to excessive manual effort spent on bug triage and resolution. This paper proposes Euler, an approach that automatically identifies and a...
Article
Communication means, such as issue trackers, mailing lists, Q&A forums, and app reviews, are premier means of collaboration among developers, and between developers and end-users. Analyzing such sources of information is crucial to build recommenders for developers, for example suggesting experts, re-documenting source code, or transforming user fe...
Preprint
A major problem with user-written bug reports, indicated by developers and documented by researchers, is the (lack of high) quality of the reported steps to reproduce the bugs. Low-quality steps to reproduce lead to excessive manual effort spent on bug triage and resolution. This paper proposes Euler, an approach that automatically identifies and a...
Conference Paper
Continuous Integration (CI) is a widely-used software engineering practice. The software is continuously built so that changes can be easily integrated and issues such as unmet quality goals or style inconsistencies get detected early. Unfortunately, it is not only hard to introduce CI into an existing project, but it is also challenging to live up...
Conference Paper
Full-text available
Software developers interact with APIs on a daily basis and, therefore, often face the need to learn how to use new APIs suitable for their purposes. Previous work has shown that recommending usage patterns to developers facilitates the learning process. Current approaches to usage pattern recommendation, however, still suffer from high redundancy...
Preprint
Mutation testing has been widely accepted as an approach to guide test case generation or to assess the effectiveness of test suites. Empirical studies have shown that mutants are representative of real faults; yet they also indicated a clear need for better, possibly customized, mutation operators and strategies. While some recent papers have trie...
Preprint
Millions of open-source projects with numerous bug fixes are available in code repositories. This proliferation of software development histories can be leveraged to learn how to fix common programming bugs. To explore such a potential, we perform an empirical study to assess the feasibility of using Neural Machine Translation techniques for learni...
Article
The number of mobile devices sold worldwide has exponentially increased in recent years, surpassing that of personal computers in 2011. Such devices daily download and run millions of apps that take advantage of modern hardware features (e.g., multi-core processors, large Organic Light-Emitting Diode—OLED—screens, etc.) to offer exciting user exper...
Conference Paper
Although very important in software engineering, establishing traceability links between software artifacts is extremely tedious, error-prone, and it requires significant effort. Even when approaches for automated traceability recovery exist, these provide the requirements analyst with a, usually very long, ranked list of candidate links that needs...
Conference Paper
Millions of open-source projects with numerous bug fixes are available in code repositories. This proliferation of software development histories can be leveraged to learn how to fix common programming bugs. To explore such a potential, we perform an empirical study to assess the feasibility of using Neural Machine Translation techniques for learni...
Preprint
Test Case Prioritization (TCP) is an important component of regression testing, allowing for earlier detection of faults or helping to reduce testing time and cost. While several TCP approaches exist in the research literature, a growing number of studies have evaluated them against synthetic software defects, called mutants. Hence, it is currently...
Article
Full-text available
Code smells are symptoms of poor design and implementation choices that may hinder code comprehensibility and maintainability. Despite the effort devoted by the research community in studying code smells, the extent to which code smells in software systems affect software maintainability remains still unclear. In this paper we present a large scale...
Conference Paper
Full-text available
Recent research has provided evidence that, in the industrial context , developing video games diverges from developing software systems in other domains, such as office suites and system utilities. In this paper, we consider video game development in the open source system (OSS) context. Specifically, we investigate how developers contribute to vi...
Conference Paper
Full-text available
Software developers frequently solve development issues with the help of question and answer web forums, such as Stack Overflow (SO). While tags exist to support question searching and browsing, they are more related to technological aspects than to the question purposes. Tagging questions with their purpose can add a new dimension to the investiga...
Conference Paper
Recent research has provided evidence that, in the industrial context, developing video games diverges from developing software systems in other domains, such as office suites and system utilities. In this paper, we consider video game development in the open source system (OSS) context. Specifically, we investigate how developers contribute to vid...
Conference Paper
Assessing the similarity between code components plays a pivotal role in a number of Software Engineering (SE) tasks, such as clone detection, impact analysis, refactoring, etc. Code similarity is generally measured by relying on manually defined or hand-crafted features, e.g., by analyzing the overlap among identifiers or comparing the Abstract Sy...
Conference Paper
Sentiment analysis has been applied to various software engineering (SE) tasks, such as evaluating app reviews or analyzing developers' emotions in commit messages. Studies indicate that sentiment analysis tools provide unreliable results when used out-of-the-box, since they are not designed to process SE datasets. The silver bullet for a successfu...
Conference Paper
Code smells were defined as symptoms of poor design choices applied by programmers during the development of a software project [2]. They might hinder the comprehensibility and maintainability of software systems [5]. Similarly to some previous work [3, 4, 6, 7] in this paper we investigate the relationship between the presence of code smells and t...
Conference Paper
Software licenses dictate how source code or binaries can be modified, reused, and redistributed. In the case of open source projects, software licenses generally fit into two main categories, permissive and restrictive, depending on the degree to which they allow redistribution or modification under licenses different from the original one(s). Dev...