Tom Mens

University of Mons · Department of Computer Science

PhD in Sciences
Interested in a PhD position in empirical software engineering or software development ecosystems? Contact me!

About

373
Publications
148,339
Reads
11,106
Citations
Introduction
I am a Full Professor in Software Engineering at the University of Mons, working mainly on software evolution, socio-technical software ecosystems, software quality, software health, empirical software engineering and model-driven software engineering. Homepage: http://staff.umons.ac.be/tom.mens
Additional affiliations
October 2003 - December 2012
University of Mons
September 1992 - September 2003
Vrije Universiteit Brussel
Education
October 1993 - September 1999
Vrije Universiteit Brussel
Field of study
  • Computer Science

Publications (373)
Article
Identifying whether GitHub contributors are automated bots is important for empirical research on collaborative software development practices. Multiple such bot identification approaches have been proposed in the past. In this article, we identify the limitations of these approaches and we propose a new binary classification model, called BIMBAS,...
Preprint
While open-source software has enabled significant levels of reuse to speed up software development, it has also given rise to the dreadful dependency hell that all software practitioners face on a regular basis. This article provides a catalogue of dependency-related challenges that come with relying on OSS packages or libraries. The catalogue is...
Conference Paper
Collaborative software development through GitHub repositories frequently relies on bot accounts to automate repetitive and error-prone tasks. This highlights the need to have accurate and efficient bot identification tools. Several such tools have been proposed in the past, but they tend to rely on a substantial amount of historical data, or they...
Preprint
This chapter defines and presents different kinds of software ecosystems. The focus is on the development, tooling and analytics aspects of software ecosystems, i.e., communities of software developers and the interconnected software components (e.g., projects, libraries, packages, repositories, plug-ins, apps) they are developing and maintaining....
Conference Paper
Development bots are being used by maintainers of GitHub repositories to perform repetitive or error-prone tasks. While multiple approaches have been proposed in the past to identify such bots in GitHub repositories, they are either inaccurate, not general enough (e.g., based exclusively on specific types of activities) or require a large amount of...
Chapter
This chapter defines and presents the kinds of software ecosystems that are targeted in this book. The focus is on the development, tooling and analytics aspects of "software ecosystems", i.e., communities of software developers and the interconnected software components (e.g., projects, libraries, packages, repositories, plug-ins, apps) they are d...
Chapter
Large-scale software development has become a highly collaborative and geographically distributed endeavour, especially in open-source software development ecosystems and their associated developer communities. It has given rise to modern development processes (e.g., pull-based development) that involve a wide range of activities such as issue and...
Preprint
Full-text available
Large-scale software development has become a highly collaborative and geographically distributed endeavour, especially in open-source software development ecosystems and their associated developer communities. It has given rise to modern development processes (e.g., pull-based development) that involve a wide range of activities such as issue and...
Article
Full-text available
Continuous integration, delivery and deployment (CI/CD) is used to support the collaborative software development process. CI/CD tools automate a wide range of activities in the development workflow such as testing, linting, updating dependencies, creating and deploying releases, and so on. Previous quantitative studies have revealed important chan...
Conference Paper
Software repositories hosted on GitHub frequently use development bots to automate repetitive, effort intensive and error-prone tasks. To understand and study how these bots are used, state-of-the-art bot identification tools have been developed to detect bots based on their comments in commits, issues and pull requests. Given that bots can be invo...
Preprint
Full-text available
Software artifacts often interact with each other throughout the software development cycle. Associating related artifacts is a common practice for effective documentation and maintenance of software projects. Conventionally, to register the link between an issue report and its associated commit, developers manually include the issue identifier in...
Article
Using popular open source projects on GitHub, we provide evidence that bots are regularly among the most active contributors, even though GitHub does not explicitly acknowledge their presence. This poses a problem for techniques that analyze human contributor activity.
Article
Full-text available
The increasing interest in open source software has led to the emergence of large language-specific package distributions of reusable software libraries, such as npm and RubyGems. These software packages can be subject to vulnerabilities that may expose dependent packages through explicitly declared dependencies. Using Snyk’s vulnerability database...
Article
Full-text available
Technical collaboration between multiple contributors is a natural phenomenon in distributed open source software development projects. Macro-collaboration, where each code commit is attributed to a single collaborator, has been extensively studied in the research literature. This is much less the case for so-called micro-collaboration practices, i...
Presentation
Full-text available
Invited Talk for the Doctoral Symposium of the ICSR 2022 Conference. The presentation provides practical advice to PhD students on how to become a successful PhD student, and beyond...
Article
The practice of backporting aims to bring the benefits of a bug or vulnerability fix from a higher to a lower release of a software package. When such a package adheres to semantic versioning, backports can be recognised as new releases in a lower major train. This is particularly useful in case a substantial number of software packages continues t...
Preprint
Full-text available
The increasing interest in open source software has led to the emergence of large package distributions of reusable software libraries, such as npm and RubyGems. These software packages can be subject to security vulnerabilities that may expose dependent packages through explicitly declared dependencies. This article empirically studies security vu...
Article
Docker is one of the most popular containerization technologies. A Docker container can be saved into an image including all environmental packages required to run it, such as system and third-party packages from language-specific package repositories. Relying on its modularity, an image can be shared and included in other images to simplify the wa...
Article
Distributions of open source software packages dedicated to specific programming languages facilitate software development by allowing software projects to depend on the functionality provided by such reusable packages. The health of a software project can be affected by the maturity of the packages on which it depends. The version numbers of the u...
Preprint
Full-text available
Detecting the presence of bots in distributed software development activity is very important in order to prevent bias in large-scale socio-technical empirical analyses. In previous work, we proposed a classification model to detect bots in GitHub repositories based on the pull request and issue comments of GitHub accounts. The current study genera...
Preprint
Full-text available
Linux users expect fresh packages in the official repositories of their distributions. Yet, due to philosophical divergences, the packages available in various distributions do not all have the same degree of freshness. Users therefore need to be informed as to those differences. Through quantitative empirical analyses, we assess and compare the fr...
Preprint
Software projects are regularly updated with new functionality and bug fixes through so-called releases. In recent years, many software projects have been shifting to shorter release cycles and this can affect the bug handling activity. Past research has focused on the impact of switching from traditional to rapid release cycles with respect to bug...
Preprint
Development bots are used on GitHub to automate repetitive activities. Such bots communicate with human actors via issue comments and pull request comments. Identifying such bot comments allows preventing bias in socio-technical studies related to software development. To automate their identification, we propose a classification model based on nat...
Article
Full-text available
Container-based solutions, such as Docker, have become increasingly relevant in the software industry to facilitate deploying and maintaining software systems. Little is known, however, about how outdated such containers are at the moment of their release or when used in production. This article addresses this question, by measuring and comparing f...
Article
Bots are frequently used in GitHub repositories to automate repetitive activities that are part of the distributed software development process. They communicate with human actors through comments. While detecting their presence is important for many reasons, no large and representative ground-truth dataset is available, nor are classification mode...
Preprint
Full-text available
Distributions of open source software packages dedicated to specific programming languages facilitate software development by allowing software projects to depend on the functionality provided by such reusable packages. The health of a software project can be affected by the maturity of the packages on which it depends. The version numbers of the u...
Article
Full-text available
Large software projects follow a continuous development process with regular releases during which bugs are handled. In recent years, many software projects shifted to rapid releases that reduce time-to-market and claim a faster delivery of fixed issues, but also have a shorter period to address bugs. To better understand the impact of rapid releas...
Article
Full-text available
Immersive technologies have made their way into many architectural modelling tools. Nevertheless, their use is very often limited to visualisation purposes, for example to validate a design with a client wearing a virtual reality headset. Our work aims to enable the use of this immersive medium...
Preprint
Bots are frequently used in GitHub repositories to automate repetitive activities that are part of the distributed software development process. They communicate with human actors through comments. While detecting their presence is important for many reasons, no large and representative ground-truth dataset is available, nor are classification mode...
Preprint
Full-text available
The open-source Linux operating system is available through a wide variety of distributions, each containing a collection of installable software packages. It can be important to keep these packages as fresh as possible to benefit from new features, bug fixes and security patches. However, not all distributions place the same emphasis on package fr...
Conference Paper
Full-text available
Many empirical studies focus on socio-technical activity in social coding platforms such as GitHub, for example to study the onboarding, abandonment, productivity and collaboration among team members. Such studies face the difficulty that GitHub activity can also be generated automatically by bots of a different nature. It therefore becomes impera...
Article
Full-text available
Statecharts are a well-known visual modelling language for representing the executable behaviour of complex reactive event-based systems. The essential complexity of statechart models solicits the need for advanced model testing and validation techniques, such as test-driven development, behaviour-driven development, design by contract, and propert...
Article
Abandonment of active developers poses a significant risk for many open source software projects. This risk can be reduced by forecasting the future activity of contributors involved in such projects. Focusing on the commit activity of individuals involved in git repositories, this paper proposes a practicable probabilistic forecasting model based...
Conference Paper
Full-text available
Large open source software projects, like Eclipse, follow a continuous software development process, with a regular release cycle. During each release, new bugs are reported, triaged and resolved. Previous studies have focused on various aspects of bug fixing, such as bug triaging, bug prediction, and bug process analysis. Most studies, however,...
Preprint
Full-text available
Even though architectural modelling radically evolved over the course of its history, the current integration of Augmented Reality (AR) and Virtual Reality (VR) components in the corresponding design tasks is mostly limited to enhancing visualisation. Little to none of these tools attempt to tackle the challenge of modelling within immersive environ...
Article
The semantic versioning (semver) policy is commonly accepted by open source package management systems to inform whether new releases of software packages introduce possibly backward incompatible changes. Maintainers depending on such packages can use this information to avoid or reduce the risk of breaking changes in their own packages by specifyi...
Article
Full-text available
Statecharts constitute an executable language for modelling event-based reactive systems. The essential complexity of statechart models solicits the need for advanced model testing and validation techniques. In this article we propose a method aimed at enhancing statechart design with a range of techniques that have proven their usefulness to inc...
Article
Reusable Open Source Software (OSS) components for major programming languages are available in package repositories. Developers rely on package management tools to automate deployments, specifying which package releases satisfy the needs of their applications. However, these specifications may lead to deploying package releases that are outdated,...
Article
Full-text available
Nearly every popular programming language comes with one or more package managers. The software packages distributed by such package managers form large software ecosystems. These packaging ecosystems contain a large number of package releases that are updated regularly and that have many dependencies to other package releases. While packaging ecos...
Preprint
Full-text available
Containerized applications, and in particular Docker images, are becoming a common solution in cloud environments to meet ever-increasing demands in terms of portability, reliability and fast deployment. A Docker image includes all environmental dependencies required to run it, such as specific versions of system and third-party packages. Leveragin...
Preprint
Full-text available
Software systems often leverage open source software libraries to reuse functionalities. Such libraries are readily available through software package managers like npm for JavaScript. Due to the huge number of packages available in such package distributions, developers often decide to rely on or contribute to a software package based on its po...
Article
Contemporary Software Engineering has inevitably become much more social. Due to the size, complexity, and diversity of today's software systems, there is a need to interact across organizational, geographical, cultural, and socioeconomic boundaries. Large-scale software development now implies active user involvement and requires close cooperation...
Preprint
The pull-based development process has become prevalent on platforms such as GitHub as a form of distributed software development. Potential contributors can create and submit a set of changes to a software project through pull requests. These changes can be accepted, discussed or rejected by the maintainers of the software project, and can influen...
Preprint
Software ecosystems are collections of projects that are developed and evolve together in the same environment. Existing literature investigates software ecosystems as isolated entities whose boundaries do not overlap and assumes they are self-contained. However, a number of software projects are distributed in more than one ecosystem. As different...
Preprint
Full-text available
Packaging software into containers is becoming a common practice when deploying services in cloud and other environments. Docker images are one of the most popular container technologies for building and deploying containers. A container image usually includes a collection of software packages, that can have bugs and security vulnerabilities that a...
Conference Paper
The Architecture, Engineering and Construction (AEC) industry started to integrate Augmented Reality (AR) and Virtual Reality (VR) solutions in Computer-Aided Architectural Design (CAAD) tools, but their use is mostly limited to visualisation purposes. Few of these tools propose the ability of advanced modelling directly within an immersive environ...
Conference Paper
Full-text available
Even though architectural modelling radically evolved over the course of its history, the current integration of Augmented Reality (AR) and Virtual Reality (VR) components in the corresponding design tasks is mostly limited to enhancing visualisation. Little to none of these tools attempt to tackle the challenge of modelling within immersive enviro...
Chapter
Component‐based software reuse has led to the emergence of numerous open‐source software ecosystems. Such ecosystems offer the user a wide and diverse collection of software components that are interconnected by dependency relationships and maintained by large communities of developers. While developers can reuse the work of others by depending on...
Preprint
Full-text available
Software packages developed and distributed through package managers extensively depend on other packages. These dependencies are regularly updated, for example to add new features, resolve bugs or fix security issues. In order to take full advantage of the benefits of this type of reuse, developers should keep their dependencies up to date by rely...
Conference Paper
Full-text available
Software library packages are constantly evolving and increasing in number. Not updating to the latest available release of dependent libraries may negatively affect software development by not benefiting from new functionality, vulnerability and bug fixes available in more recent versions. On the other hand, automatically updating to the latest re...
Conference Paper
Security vulnerabilities are among the most pressing problems in open source software package libraries. It may take a long time to discover and fix vulnerabilities in packages. In addition, vulnerabilities may propagate to dependent packages, making them vulnerable too. This paper presents an empirical study of nearly 400 security reports over a...
Presentation
Full-text available
This abstract presents pitfalls of automatic link extraction, based on our experience of manually investigating links in the RubyGems package manager metadata. This work can lead to automating the link extraction approach so as to avoid these pitfalls and produce more complete datasets to be used by researchers when they investigate the multi-platf...
Article
Full-text available
This extended abstract presents the research goals and preliminary research results of the interdisciplinary research project SECOHealth, an ongoing collaboration between research teams of Polytechnique Montreal (Canada), the University of Mons (Belgium) and Laval University (Canada). SECOHealth aims to contribute to research and practice in softwa...
Technical Report
Full-text available
This document contains the final report of the research activities carried out by research partners Université de Mons and Université de Namur in the context of F.R.S.-FNRS research project T.0022.13 entitled “Empirical Analysis of the Co-Evolution and Social Interaction in Data-Intensive Software Systems”. During a four-year period from July 2013...
Article
Full-text available
Software ecosystems can be viewed as socio-technical networks consisting of technical components (software packages) and social components (communities of developers) that maintain the technical components. Ecosystems evolve over time through socio-technical changes that may greatly impact the ecosystem's sustainability. Social changes like develop...
Conference Paper
Full-text available
Open source cloud computing solutions, such as CloudStack and Eucalyptus, have become increasingly popular in recent years. Despite this popularity, a better understanding of the factors influencing user adoption is still under active research. For example, increased project agility may lead to solutions that remain competitive in a rapidly evolvin...
Conference Paper
Nearly every popular programming language comes with one or more open source software packaging ecosystem(s), containing a large collection of interdependent software packages developed in that programming language. Such packaging ecosystems are extremely useful for their respective software development community. We present an empirical analysis...
Conference Paper
Full-text available
Software development projects frequently rely on testing-related libraries to test the functionality of the software product automatically and efficiently. Many such libraries are available for Java, and developers face a hard time deciding which libraries are most appropriate for their project, or when to migrate to a competing library. We empiric...
Conference Paper
Full-text available
Relational databases (DB) play a critical role in many information systems. For different reasons, their schemas gather not only tables and columns but also views, triggers or stored functions (i.e., fragments of code describing treatments). As for any other code-related artefact, software quality in a DB schema helps avoiding future bugs. However,...
Article
Full-text available
This article presents an empirical study of how the use of relational database access technologies in open source Java projects evolves over time. Our observations may be useful to project managers to make more informed decisions on which technologies to introduce into an existing project and when. We selected 2,457 Java projects on GitHub using th...
Article
In this paper we formally present a layered calculus for encapsulated modification of objects. Its denotational as well as operational semantics are given. The confluency of the calculus is proven, and a translation of λ-calculus into our calculus is presented.
Chapter
Component-based software reuse has led to the emergence of numerous open source software ecosystems. Such ecosystems offer the user a wide and diverse collection of software components that are interconnected by dependency relationships and maintained by large communities of developers. While developers can reuse the work of others by depending on...
Chapter
Full-text available
This chapter presents the research advancements in the field of data-intensive software system evolution, 5 years after the publication of our IEEE Computer column presenting the challenges in this field. We present the state-of-the-art in this research domain, and report on research on the evolution of open source Java projects relying on relati...
Conference Paper
Software ecosystems evolve through an active community of developers who contribute to projects within the ecosystem. However, development teams change over time, suggesting a potential impact on the evolution of the technical parts of the ecosystem. The impact of such modifications has been studied by previous works, but only temporary changes hav...
Conference Paper
Package-based software ecosystems are composed of thousands of interdependent software packages. Many empirical studies have focused on software packages belonging to a single software ecosystem, and suggest to generalise the results to more ecosystems. We claim that such a generalisation is not always possible, because the technical structure of s...
Conference Paper
In this invited paper I focus on the difficulties of maintaining and evolving software systems that are part of a larger ecosystem. While not every software system falls under this category, software ecosystems are becoming ubiquitous due to the omnipresence of open source software. I present several challenges that arise during maintenance and evo...
Article
Full-text available
There are many dimensions of software complexity. In this article, we explore how structural complexity is measured and used to study and control evolving software systems. We also