Andrea Capiluppi

University of Groningen | RUG · Johann Bernoulli Institute for Mathematics and Computer Science (JBI)

PhD in Computer Systems

About

134
Publications
38,562
Reads
1,841
Citations
Additional affiliations
April 2020 - present
University of Groningen
Position
  • Professor (Associate)
May 2012 - March 2020
Brunel University London
Position
  • Professor (Associate)
February 2009 - May 2012
University of East London
Position
  • Professor (Associate)
Education
January 2002 - May 2005
Politecnico di Torino
Field of study
  • Software Engineering

Publications (134)
Article
Full-text available
Software systems continuously evolve to accommodate new features, and interoperability relationships between artifacts point to increasingly relevant software change impacts. During maintenance, developers must ensure that related entities are updated to be consistent with these changes. Studies in the static change impact analysis domain have ident...
Article
Full-text available
One of the most time-consuming tasks for developers is the comprehension of new code bases. An effective approach to aid this process is to label source code files with meaningful annotations, which can help developers understand the content and functionality of a code base quicker. However, most existing solutions for code annotation focus on proj...
Article
Context: GitHub is the world's most prominent host of source code, with more than 327M repositories. However, most of these repositories are not labelled, or are labelled inadequately, making it harder for users to find relevant projects. Various approaches to software application domain classification have been proposed over the past years. However, these severa...
Article
Full-text available
A key part of software evolution and maintenance is the continuous integration from collaborative efforts, often resulting in complex traceability challenges between software artifacts: features and modules remain scattered in the source code, and traceability links become harder to recover. In this paper, we perform a systematic mapping study deal...
Article
Full-text available
Collaboration between academia and industry for education is a common practice, and several academic institutions already benefit from that. While best practices have been shared on how to run these collaborations, the process of starting these collaborations is less clear. Even long established schemes, like the one running at the Norwegian Univer...
Article
The five papers in this special section focus on collaboration and innovation dynamics in software ecosystems. Over the past years, there has been an exponential growth in the number of articles mentioning “ecosystem” as their topic, and among them, ecosystems in the context of information and communication technologies have received particular att...
Article
Full-text available
Context: Burnout is a work-related syndrome that, as in many other occupations, affects most software developers. For decades, studies in software engineering (SE) have explored the causes of burnout and its consequences among IT professionals. Objective: This paper is a systematic mapping study (SMS) of the studies on burnout in SE, exploring its...
Article
Full-text available
Effort estimation models are a fundamental tool in software management, and are used to forecast the resources, constraints and costs associated with software development. For Free/Open Source Software (FOSS) projects, effort estimation is especially complex: professional developers work alongside occasional, volunteer developers, so the overall effort...
Preprint
Full-text available
E-type open-source software inevitably grows in size and complexity over time, and without performing anti-regressive tasks this type of software has a limited lifespan. In this project, a case study of the effect of such anti-regressive tasks is conducted using GrimoireLab Graal as a subject. This process is guided by quality metrics and developer...
Preprint
Full-text available
GitHub is the world's largest host of source code, with more than 150M repositories. However, most of these repositories are not labeled or inadequately so, making it harder for users to find relevant projects. There have been various proposals for software application domain classification over the past years. However, these approaches lack a well...
Preprint
Full-text available
Empirical results in software engineering have long started to show that findings are unlikely to be applicable to all software systems, or any domain: results need to be evaluated in specified contexts, and limited to the type of systems that they were extracted from. This is a known issue, and requires the establishment of a classification of sof...
Article
Empirical results in software engineering have long started to show that findings are unlikely to be applicable to all software systems, or any domain: results need to be evaluated in specified contexts, and limited to the type of systems that they were extracted from. This is a known issue, and requires the establishment of a classification of sof...
Preprint
Full-text available
Effort estimation models are a fundamental tool in software management, and are used to forecast the resources, constraints and costs associated with software development. For Free/Open Source Software (FOSS) projects, effort estimation is especially complex: professional developers work alongside occasional, volunteer developers, so the overall effort...
Preprint
Full-text available
Software repository hosting services contain large amounts of open-source software, with GitHub hosting more than 100 million repositories, from new to established ones. Given this vast amount of projects, there is a pressing need for a search based on the software's content and features. However, even though GitHub offers various solutions to aid...
Preprint
Full-text available
Component-Based Software Engineering (CBSE) seeks to promote the reuse of software by integrating existing software modules into the development process. However, such reusable components are not immediately available, and obtaining them is costly and time-consuming. As an alternative, the extraction from pre-existing OO software can be considered. In this work,...
Chapter
Full-text available
Context: Constant evolution in software systems often results in their documentation losing sync with the content of the source code. Traceability research has often helped in the past to recover links between code and documentation when the two fell out of sync. Objective: The aim of this paper is to compare the concepts cont...
Conference Paper
Full-text available
Context: Constant evolution in software systems often results in their documentation losing sync with the content of the source code. Traceability research has often helped in the past to recover links between code and documentation when the two fell out of sync. Objective: The aim of this paper is to compare the concepts co...
Article
Full-text available
In a widely used definition, a software ecosystem is a 'collection of software projects which are developed and evolve together in the same environment'. The objective of this paper is to explore how software ecosystems foster collaboration among their key players to achieve innovation within individual software projects. Thus, in this paper, we me...
Article
Full-text available
A smart contract (SC) is a programme stored in the Ethereum blockchain by a contract‐creation transaction. SC developers deploy an instance of the SC and attempt to execute it in exchange for a fee, paid in Ethereum coins (Ether). If the computation needed for their execution turns out to be larger than the effort proposed by the developer (i.e., t...
Article
Full-text available
Context: Long-term software projects employ different software developers who collaborate on shared artifacts. The accumulation of changes pushed by different developers leaves traces on the underlying code that have an effect on its future maintainability, and even reuse. Objective: This study focuses on how the changes by different developers...
Article
Collaborative development is a paradigm shift in software development. Loosely coupled developers coordinate their work via distributed versioning systems (SVN, Git, and others), code reviews and priority-led bug tracking systems. This development approach allows many different developers to input additional source code to the same source artifact....
Article
Full-text available
Background: Research on empirical software engineering has increasingly been conducted by analysing and measuring vast amounts of software systems. Hundreds, thousands and even millions of systems have been (and are) considered by researchers, and often within the same study, in order to test theories, demonstrate approaches or run prediction models...
Chapter
Full-text available
Context: Conducting experiments is central to machine learning research to benchmark, evaluate and compare learning algorithms. Consequently, it is important that we conduct reliable, trustworthy experiments. Objective: We investigate the incidence of errors in a sample of machine learning experiments in the domain of software defect prediction....
Preprint
Full-text available
Context: Conducting experiments is central to machine learning research to benchmark, evaluate and compare learning algorithms. Consequently, it is important that we conduct reliable, trustworthy experiments. Objective: We investigate the incidence of errors in a sample of machine learning experiments in the domain of software defect prediction....
Conference Paper
Full-text available
Students in higher education are traditionally requested to produce various pieces of written work during the courses they undertake. When students' work is submitted online as a whole, both the ethically questionable act of procrastinating and late submissions affect performance. The objective of this paper is to assess the performance of students...
Preprint
Full-text available
The term 'software ecosystem' refers to a collection of software systems that are related in some way. Researchers have been using different levels of aggregation to define an ecosystem: grouping them by a common named project (e.g., the Apache ecosystem); or considering all the projects contained in online repositories (e.g., the GoogleCode ecosys...
Conference Paper
Global software development has long been recognised as a paradigm shift in modern software development. As an immediate effect, co-location of workers in the same building or office is not seen as necessary any longer. Coordination in distributed socio-technical systems is mostly achieved by means of the artifacts that are produced by the develop...
Conference Paper
The evolution of software systems is an inevitable process which has to be managed effectively to enhance software quality. Change impact analysis (CIA) is a technique that identifies impact sets, i.e., the set of classes that require correction as a result of a change made to a class or artefact. These sets can also be considered as ripple effects...
Conference Paper
Full-text available
Dependency-based software change impact analysis is the domain concerned with estimating sets of artifacts impacted by a change to a related artifact. Research has shown that analysing the various class dependency types independently will not reveal a complete estimate of impact sets. Therefore, dependency types are combined to improve the precisio...
Article
Full-text available
During the lifetime of object-oriented (OO) software systems, new classes are added to increase functionality, also increasing the inter-dependencies between classes. Logical coupling depicts the change dependencies between classes, while structural coupling measures source code dependencies induced via the system architecture. The relationship or...
Conference Paper
Full-text available
Context: Conceptual coupling is a measure of how loosely or closely related two software artifacts are, by considering the semantic information embedded in the comments and identifiers. This type of coupling is typically evaluated by extracting the semantic information from source code into a word corpus. The extraction of word corpora can be lengthy, es...
Conference Paper
Full-text available
Specialised, or vertical, social networks (SSN) are emerging as a useful tool to address practical issues such as household water use management. Despite the perceived benefits, the design of such systems is still not fully aware of the social interactions or the incentives that could be used to change users’ behaviours when engaging with the netwo...
Conference Paper
It is established that the internal quality of software is a key determinant of the total cost of ownership of that software. The objective of this research is to determine the impact that the development team’s size has on the internal structural attributes of a codebase and, in doing so, we consider the impact that the team’s size may have on the...
Conference Paper
Full-text available
Context: Information and tracking of defects can be severely incomplete in almost every Open Source project, resulting in a reduced traceability of defects into the development logs (i.e., version control commit logs). In particular, defect data often appears not in sync when considering what developers logged as their actions. Synchronizing or com...
Article
Full-text available
A promising way to support software reuse is based on Component-Based Software Development (CBSD). Open Source Software (OSS) products are increasingly available that can be freely used in product development. However, OSS communities still face several challenges before taking full advantage of the "reuse mechanism": many OSS projects duplicate effort...
Article
Full-text available
Context: Obfuscation is a common technique used to protect software against malicious reverse engineering. Obfuscators manipulate the source code to make it harder to analyze and more difficult to understand for the attacker. Although different obfuscation algorithms and implementations are available, they have never been directly compared in a lar...
Conference Paper
Full-text available
It has been suggested that the data from bug repositories is not always in sync or complete compared to the logs detailing the actions of developers on source code. In this paper, we trace two sources of information relative to software bugs: the change logs of the actions of developers and the issues reported as bugs. The aim is to identify and qu...
Conference Paper
Full-text available
Because of the distributed and collaborative nature of free / open source software (FOSS) projects, the development effort invested in a project is usually unknown, even after the software has been released. However, this information is becoming of major interest, especially —but not only— because of the growth in the number of companies for which...
Conference Paper
Full-text available
In the last twenty years, the evolution of web systems has been driven along three dimensions: the processes used to develop, evolve, maintain and re-engineer the systems themselves; the end products (the pages, content and links) of such processes; and finally the people dimension, with the extraordinary shift in how developers and users shape, in...
Article
Full-text available
Online communities are flourishing as social meeting web spaces for users and peer community members. Different online communities require different levels of competence for participants to join, and scattered evidence suggests that females and minorities as participants can be under-represented. Additional anecdotal evidence suggests that women wi...
Article
Full-text available
Several years of research and evidence have demonstrated that open source software portals often contain a large amount of software projects that simply do not evolve, developed by relatively small communities, struggling to attract a sustained number of contributors. These portals have started to increasingly act as a storage for abandoned project...
Conference Paper
Full-text available
Open Source Software (OSS) proponents suggest that when developers lose interest in their project, their last duty is to "hand it off to a competent successor." However, the mechanisms of such a hand-off are not clear, or widely known among OSS developers. As a result, many OSS projects, after a certain long period of evolution, stop evolving, in...
Article
Full-text available
Empirical research on Free/Libre/Open Source Software (FLOSS) has shown that developers tend to cluster around two main roles: “core” contributors differ from “peripheral” developers in terms of a larger number of responsibilities and a higher productivity pattern. A further, cross-cutting characterization of developers could be achieved by associa...
Article
Full-text available
A promising way to support software reuse is based on Component-Based Software Development (CBSD). Open Source Software (OSS) products are increasingly available that can be freely used in product development. However, OSS communities still face several challenges before taking full advantage of the "reuse mechanism": many OSS projects duplicate ef...
Chapter
The process of fixing software bugs plays a key role in the maintenance activities of a software project. Ideally, code ownership and responsibility should be enforced among developers working on the same artifacts, so that those introducing buggy code could also contribute to its fix. However, especially in FLOSS projects, this mechanism is not cl...
Chapter
A promising way to support software reuse is based on Component-Based Software Development (CBSD). Open Source Software (OSS) products are increasingly available that can be freely used in product development. However, OSS communities still face several challenges before taking full advantage of the “reuse mechanism”: many OSS projects duplicate ef...
Conference Paper
Full-text available
Online communities are flourishing as social meeting web-spaces for users and peer community members. Different online communities require different levels of competence for participants to join, and scattered evidence suggests that women can be overly under-represented. Moreover, anecdotal evidence of the Q&A website StackOverflow suggests that wo...
Article
Full-text available
The Social Web provides comprehensive and publicly available information about software developers: they can be identified as contributors to open source projects, as experts at maintaining weak ties on social network sites, or as active participants to knowledge sharing sites. These signals, when aggregated and summarized, could be used to define...
Article
The Mining Software Repositories (MSR) field analyzes software repository data to uncover knowledge and assist development of ever growing, complex systems. However, existing approaches and platforms for MSR analysis face many challenges when performing ...
Conference Paper
Full-text available
Obfuscation is a very common protection against reverse engineering attacks: it modifies a program structure to make it harder for the adversary to analyse and understand it. Conceptually, obfuscation is the opposite of refactoring: the code should be more complex to understand, bloated, and with excessive characteristics from the design point of v...
Conference Paper
Full-text available
It has recently been established that a major success or failure factor of an OSS project is whether or not it involves a commercial company, or more extremely, when a project is managed by a commercial software corporation. As documented recently, the success of the Eclipse project can be largely attributed to IBM's project management, since the up...
Conference Paper
Full-text available
Wikipedia is the largest online service storing user-generated content. Its pages are open to anyone for addition, deletion and modifications, and the effort of contributors is recorded and can be tracked in time. Although potentially the Wikipedia web content could exhibit unbounded growth, it is still not clear whether the effort of developers an...
Article
The public data available in Open Source Software (OSS) repositories has been used for many practical reasons: detecting community structures; identifying key roles among developers; understanding software quality; predicting the arousal of bugs in large OSS systems, and so on; but also to formulate and validate new metrics and proof-of-concepts on...
Conference Paper
Full-text available
A promising way of software reuse is Component-Based Software Development (CBSD). There is an increasing number of OSS products available that can be freely used in product development. However, OSS communities themselves have not yet taken full advantage of the “reuse mechanism”. Many OSS projects duplicate effort and code, even when sharing the s...
Article
Full-text available
This paper proposes to use a historical perspective on generic laws, principles, and guidelines, like Lehman’s software evolution laws and Martin’s design principles, in order to achieve a multi-faceted process and structural assessment of a system’s architectural evolution. We present a simple structural model with associated historical metrics a...
Conference Paper
Full-text available
It has been claimed that the advent of user-generated content has reshaped the way people approached all sorts of content realization projects, being multimedia (YouTube, DeviantArt, etc.), knowledge (Wikipedia, blogs), to software in general, when based on a more general Open Source model. After many years of research and evidence, several studies...
Article
Full-text available
The process of fixing software bugs plays a key role in the maintenance activities of a software project. Ideally, code ownership and responsibility should be enforced among developers working on the same artifacts, so that those introducing buggy code could also contribute to its fix. However, especially in FLOSS projects, this mechanism is not cl...
Chapter
Agile sprints are short events where a small team collocates in order to work on particular aspects of the overall project for a short period of time. Sprinting is a process that has been observed also in Free Software projects: these two paradigms, sharing common principles and values have shown several commonalities of practice. This article eval...
Conference Paper
Full-text available
When considering the jobs market, changes or recurring trends for skilled employees expressed by employers' needs have a tremendous impact on the evolution of website content. On-line jobs sites adverts, academic institutions and professional development “standard bodies” all share those needs as their common driver for contents evolution. This pap...
Conference Paper
Full-text available
The Second International Workshop on Building Sustainable Open Source Communities (OSCOMM 2010) aims at building a community of researchers and practitioners to share experiences and discuss challenges involved in building and maintaining open source communities.