Tom Mens
University of Mons · Department of Computer Science
PhD in Sciences
Interested in a PhD position in empirical software engineering or software development ecosystems? Contact me!
About
373 Publications
148,339 Reads
11,106 Citations
Introduction
I am a Full Professor in Software Engineering at the University of Mons, working mainly on software evolution, socio-technical software ecosystems, software quality, software health, empirical software engineering and model-driven software engineering.
http://staff.umons.ac.be/tom.mens
Additional affiliations
October 2003 - December 2012
September 1992 - September 2003
Education
October 1993 - September 1999
Publications
Identifying whether GitHub contributors are automated bots is important for empirical research on collaborative software development practices. Multiple such bot identification approaches have been proposed in the past. In this article, we identify the limitations of these approaches and we propose a new binary classification model, called BIMBAS,...
While open-source software has enabled significant levels of reuse to speed up software development, it has also given rise to the dreadful dependency hell that all software practitioners face on a regular basis. This article provides a catalogue of dependency-related challenges that come with relying on OSS packages or libraries. The catalogue is...
Collaborative software development through GitHub repositories frequently relies on bot accounts to automate repetitive and error-prone tasks. This highlights the need to have accurate and efficient bot identification tools. Several such tools have been proposed in the past, but they tend to rely on a substantial amount of historical data, or they...
This chapter defines and presents different kinds of software ecosystems. The focus is on the development, tooling and analytics aspects of software ecosystems, i.e., communities of software developers and the interconnected software components (e.g., projects, libraries, packages, repositories, plug-ins, apps) they are developing and maintaining....
Development bots are being used by maintainers of GitHub repositories to perform repetitive or error-prone tasks. While multiple approaches have been proposed in the past to identify such bots in GitHub repositories, they are either inaccurate, not general enough (e.g., based exclusively on specific types of activities) or require a large amount of...
This chapter defines and presents the kinds of software ecosystems that are targeted in this book. The focus is on the development, tooling and analytics aspects of "software ecosystems", i.e., communities of software developers and the interconnected software components (e.g., projects, libraries, packages, repositories, plug-ins, apps) they are d...
Large-scale software development has become a highly collaborative and geographically distributed endeavour, especially in open-source software development ecosystems and their associated developer communities. It has given rise to modern development processes (e.g., pull-based development) that involve a wide range of activities such as issue and...
Continuous integration, delivery and deployment (CI/CD) is used to support the collaborative software development process. CI/CD tools automate a wide range of activities in the development workflow such as testing, linting, updating dependencies, creating and deploying releases, and so on. Previous quantitative studies have revealed important chan...
Software repositories hosted on GitHub frequently use development bots to automate repetitive, effort-intensive and error-prone tasks. To understand and study how these bots are used, state-of-the-art bot identification tools have been developed to detect bots based on their comments in commits, issues and pull requests. Given that bots can be invo...
Software artifacts often interact with each other throughout the software development cycle. Associating related artifacts is a common practice for effective documentation and maintenance of software projects. Conventionally, to register the link between an issue report and its associated commit, developers manually include the issue identifier in...
Using popular open source projects on GitHub, we provide evidence that bots are regularly among the most active contributors, even though GitHub does not explicitly acknowledge their presence. This poses a problem for techniques that analyze human contributor activity.
The increasing interest in open source software has led to the emergence of large language-specific package distributions of reusable software libraries, such as npm and RubyGems. These software packages can be subject to vulnerabilities that may expose dependent packages through explicitly declared dependencies. Using Snyk’s vulnerability database...
Technical collaboration between multiple contributors is a natural phenomenon in distributed open source software development projects. Macro-collaboration, where each code commit is attributed to a single collaborator, has been extensively studied in the research literature. This is much less the case for so-called micro-collaboration practices, i...
Invited Talk for the Doctoral Symposium of the ICSR 2022 Conference. The presentation provides practical advice to PhD students on how to become a successful PhD student, and beyond...
The practice of backporting aims to bring the benefits of a bug or vulnerability fix from a higher to a lower release of a software package. When such a package adheres to semantic versioning, backports can be recognised as new releases in a lower major train. This is particularly useful in case a substantial number of software packages continues t...
The increasing interest in open source software has led to the emergence of large package distributions of reusable software libraries, such as npm and RubyGems. These software packages can be subject to security vulnerabilities that may expose dependent packages through explicitly declared dependencies. This article empirically studies security vu...
Docker is one of the most popular containerization technologies. A Docker container can be saved into an image including all environmental packages required to run it, such as system and third-party packages from language-specific package repositories. Relying on its modularity, an image can be shared and included in other images to simplify the wa...
Distributions of open source software packages dedicated to specific programming languages facilitate software development by allowing software projects to depend on the functionality provided by such reusable packages. The health of a software project can be affected by the maturity of the packages on which it depends. The version numbers of the u...
Detecting the presence of bots in distributed software development activity is very important in order to prevent bias in large-scale socio-technical empirical analyses. In previous work, we proposed a classification model to detect bots in GitHub repositories based on the pull request and issue comments of GitHub accounts. The current study genera...
Linux users expect fresh packages in the official repositories of their distributions. Yet, due to philosophical divergences, the packages available in various distributions do not all have the same degree of freshness. Users therefore need to be informed as to those differences. Through quantitative empirical analyses, we assess and compare the fr...
Software projects are regularly updated with new functionality and bug fixes through so-called releases. In recent years, many software projects have been shifting to shorter release cycles and this can affect the bug handling activity. Past research has focused on the impact of switching from traditional to rapid release cycles with respect to bug...
Development bots are used on GitHub to automate repetitive activities. Such bots communicate with human actors via issue comments and pull request comments. Identifying such bot comments allows preventing bias in socio-technical studies related to software development. To automate their identification, we propose a classification model based on nat...
Container-based solutions, such as Docker, have become increasingly relevant in the software industry to facilitate deploying and maintaining software systems. Little is known, however, about how outdated such containers are at the moment of their release or when used in production. This article addresses this question, by measuring and comparing f...
Bots are frequently used in GitHub repositories to automate repetitive activities that are part of the distributed software development process. They communicate with human actors through comments. While detecting their presence is important for many reasons, no large and representative ground-truth dataset is available, nor are classification mode...
Large software projects follow a continuous development process with regular releases during which bugs are handled. In recent years, many software projects shifted to rapid releases that reduce time-to-market and claim a faster delivery of fixed issues, but also have a shorter period to address bugs. To better understand the impact of rapid releas...
Immersive technologies have made their appearance in many architectural modelling tools. Nevertheless, their use is very often limited to visualisation purposes, for example to validate a design with a client wearing a virtual reality headset. Our work aims to enable the use of this immersive medium dur...
The open-source Linux operating system is available through a wide variety of distributions, each containing a collection of installable software packages. It can be important to keep these packages as fresh as possible to benefit from new features, bug fixes and security patches. However, not all distributions place the same emphasis on package fr...
Many empirical studies focus on socio-technical activity in social coding platforms such as GitHub, for example to study the onboarding, abandonment, productivity and collaboration among team members. Such studies face the difficulty that GitHub activity can also be generated automatically by bots of a different nature. It therefore becomes impera...
Statecharts are a well-known visual modelling language for representing the executable behaviour of complex reactive event-based systems. The essential complexity of statechart models solicits the need for advanced model testing and validation techniques, such as test-driven development, behaviour-driven development, design by contract, and propert...
Abandonment of active developers poses a significant risk for many open source software projects. This risk can be reduced by forecasting the future activity of contributors involved in such projects. Focusing on the commit activity of individuals involved in git repositories, this paper proposes a practicable probabilistic forecasting model based...
Large open source software projects, like Eclipse, follow a continuous software development process, with a regular release cycle. During each release, new bugs are reported, triaged and resolved. Previous studies have focused on various aspects of bug fixing, such as bug triaging, bug prediction, and bug process analysis. Most studies, however,...
Even though architectural modelling radically evolved over the course of its history, the current integration of Augmented Reality (AR) and Virtual Reality (VR) components in the corresponding design tasks is mostly limited to enhancing visualisation. Little to none of these tools attempt to tackle the challenge of modelling within immersive environ...
The semantic versioning (semver) policy is commonly accepted by open source package management systems to inform whether new releases of software packages introduce possibly backward incompatible changes. Maintainers depending on such packages can use this information to avoid or reduce the risk of breaking changes in their own packages by specifyi...
Statecharts constitute an executable language for modelling event-based reactive systems. The essential complexity of statechart models solicits the need for advanced model testing and validation techniques. In this article we propose a method aimed at enhancing statechart design with a range of techniques that have proven their usefulness to inc...
Reusable Open Source Software (OSS) components for major programming languages are available in package repositories. Developers rely on package management tools to automate deployments, specifying which package releases satisfy the needs of their applications. However, these specifications may lead to deploying package releases that are outdated,...
Nearly every popular programming language comes with one or more package managers. The software packages distributed by such package managers form large software ecosystems. These packaging ecosystems contain a large number of package releases that are updated regularly and that have many dependencies to other package releases. While packaging ecos...
Containerized applications, and in particular Docker images, are becoming a common solution in cloud environments to meet ever-increasing demands in terms of portability, reliability and fast deployment. A Docker image includes all environmental dependencies required to run it, such as specific versions of system and third-party packages. Leveragin...
Software systems often leverage open source software libraries to reuse functionalities. Such libraries are readily available through software package managers like npm for JavaScript. Due to the huge number of packages available in such package distributions, developers often decide to rely on or contribute to a software package based on its po...
Contemporary Software Engineering has inevitably become much more social. Due to the size, complexity, and diversity of today's software systems, there is a need to interact across organizational, geographical, cultural, and socioeconomic boundaries. Large-scale software development now implies active user involvement and requires close cooperation...
The pull-based development process has become prevalent on platforms such as GitHub as a form of distributed software development. Potential contributors can create and submit a set of changes to a software project through pull requests. These changes can be accepted, discussed or rejected by the maintainers of the software project, and can influen...
Software ecosystems are collections of projects that are developed and evolve together in the same environment. Existing literature investigates software ecosystems as isolated entities whose boundaries do not overlap and assumes they are self-contained. However, a number of software projects are distributed in more than one ecosystem. As different...
Packaging software into containers is becoming a common practice when deploying services in cloud and other environments. Docker images are one of the most popular container technologies for building and deploying containers. A container image usually includes a collection of software packages, that can have bugs and security vulnerabilities that a...
The Architecture, Engineering and Construction (AEC) industry started to integrate Augmented Reality (AR) and Virtual Reality (VR) solutions in Computer-Aided Architectural Design (CAAD) tools, but their use is mostly limited to visualisation purposes. Few of these tools propose the ability of advanced modelling directly within an immersive environ...
Even though architectural modelling radically evolved over the course of its history, the current integration of Augmented Reality (AR) and Virtual Reality (VR) components in the corresponding design tasks is mostly limited to enhancing visualisation. Little to none of these tools attempt to tackle the challenge of modelling within immersive enviro...
Component‐based software reuse has led to the emergence of numerous open‐source software ecosystems. Such ecosystems offer the user a wide and diverse collection of software components that are interconnected by dependency relationships and maintained by large communities of developers. While developers can reuse the work of others by depending on...
Software packages developed and distributed through package managers extensively depend on other packages. These dependencies are regularly updated, for example to add new features, resolve bugs or fix security issues. In order to take full advantage of the benefits of this type of reuse, developers should keep their dependencies up to date by rely...
Software library packages are constantly evolving and increasing in number. Not updating to the latest available release of dependent libraries may negatively affect software development by not benefiting from new functionality, vulnerability and bug fixes available in more recent versions. On the other hand, automatically updating to the latest re...
Security vulnerabilities are among the most pressing problems in open source software package libraries. It may take a long time to discover and fix vulnerabilities in packages. In addition, vulnerabilities may propagate to dependent packages, making them vulnerable too. This paper presents an empirical study of nearly 400 security reports over a...
This abstract presents the pitfalls of automatic link extraction, based on our experience of manually investigating links in the RubyGems package manager metadata. This work can lead to automating the link extraction approach so as to avoid these pitfalls and produce more complete datasets to be used by researchers when they investigate the multi-platf...
This extended abstract presents the research goals and preliminary research results of the interdisciplinary research project SECOHealth, an ongoing collaboration between research teams of Polytechnique Montreal (Canada), the University of Mons (Belgium) and Laval University (Canada). SECOHealth aims to contribute to research and practice in softwa...
This document contains the final report of the research activities carried out by research partners Université de Mons and Université de Namur in the context of F.R.S.-FNRS research project T.0022.13 entitled “Empirical Analysis of the Co-Evolution and Social Interaction in Data-Intensive Software Systems”. During a four-year period from July 2013...
Software ecosystems can be viewed as socio-technical networks consisting of technical components (software packages) and social components (communities of developers) that maintain the technical components. Ecosystems evolve over time through socio-technical changes that may greatly impact the ecosystem's sustainability. Social changes like develop...
Open source cloud computing solutions, such as CloudStack and Eucalyptus, have become increasingly popular in recent years. Despite this popularity, a better understanding of the factors influencing user adoption is still under active research. For example, increased project agility may lead to solutions that remain competitive in a rapidly evolvin...
Nearly every popular programming language comes with one or more open source software packaging ecosystem(s), containing a large collection of interdependent software packages developed in that programming language. Such packaging ecosystems are extremely useful for their respective software development community. We present an empirical analysis...
Software development projects frequently rely on testing-related libraries to test the functionality of the software product automatically and efficiently. Many such libraries are available for Java, and developers face a hard time deciding which libraries are most appropriate for their project, or when to migrate to a competing library. We empiric...
Relational databases (DB) play a critical role in many information systems. For different reasons, their schemas gather not only tables and columns but also views, triggers or stored functions (i.e., fragments of code describing treatments). As for any other code-related artefact, software quality in a DB schema helps avoid future bugs. However,...
This article presents an empirical study of how the use of relational database access technologies in open source Java projects evolves over time. Our observations may be useful to project managers to make more informed decisions on which technologies to introduce into an existing project and when. We selected 2,457 Java projects on GitHub using th...
In this paper we formally present a layered calculus for encapsulated modification of objects. Its denotational as well as operational semantics are given. The confluency of the calculus is proven, and a translation of λ-calculus into our calculus is presented.
Component-based software reuse has led to the emergence of numerous open source software ecosystems. Such ecosystems offer the user a wide and diverse collection of software components that are interconnected by dependency relationships and maintained by large communities of developers. While developers can reuse the work of others by depending on...
This chapter presents the research advancements in the field of data-intensive software system evolution, 5 years after the publication of our IEEE Computer column presenting the challenges in this field. We present the state-of-the-art in this research domain, and report on research on the evolution of open source Java projects relying on relati...
Software ecosystems evolve through an active community of developers who contribute to projects within the ecosystem. However, development teams change over time, suggesting a potential impact on the evolution of the technical parts of the ecosystem. The impact of such modifications has been studied by previous works, but only temporary changes hav...
Package-based software ecosystems are composed of thousands of interdependent software packages. Many empirical studies have focused on software packages belonging to a single software ecosystem, and suggest to generalise the results to more ecosystems. We claim that such a generalisation is not always possible, because the technical structure of s...
In this invited paper I focus on the difficulties of maintaining and evolving software systems that are part of a larger ecosystem. While not every software system falls under this category, software ecosystems are becoming ubiquitous due to the omnipresence of open source software. I present several challenges that arise during maintenance and evo...
There are many dimensions of software complexity. In this article, we explore how structural complexity is measured and used to study and control evolving software systems. We also