Benoit Baudry

Benoit Baudry
  • Professor
  • Professor at Université de Montréal

About

376
Publications
100,011
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
7,625
Citations
Current institution
Université de Montréal
Current position
  • Professor

Publications

Publications (376)
Preprint
Full-text available
New media art builds on top of rich software stacks. Blending multiple media such as code, light or sound , new media artists integrate various types of software to draw, animate, control or synchronize different parts of an artwork. Yet, the artworks rarely credit software and all the developers involved. In this work, we present Myriad People, an...
Preprint
Full-text available
Software Bills of Materials (SBOMs) are essential to ensure the transparency and integrity of the software supply chain. There is a growing body of work that investigates the accuracy of SBOM generation tools and the challenges for producing complete SBOMs. Yet, there is little knowledge about how developers distribute SBOMs. In this work, we mine...
Article
Full-text available
Software tests require data that is realistic, but not real. For example, banking applications cannot be tested with actual customer names and addresses. In these situations, developers rely on fake data generators, also known as fakers, to generate test data to be used in automated tests. Fakers exist in all programming languages. For example, the...
Article
Full-text available
Mocking allows testing program units in isolation. A developer who writes tests with mocks faces two challenges: design realistic interactions between a unit and its environment; and understand the expected impact of these interactions on the behavior of the unit. In this paper, we propose to monitor an application in production to generate tests t...
Preprint
Full-text available
Using open-source dependencies is essential in modern software development. However, this practice implies significant trust in third-party code, while there is little support for developers to assess this trust. As a consequence, attacks have been increasingly occurring through third-party dependencies. These are called software supply chain attac...
Preprint
One of the main challenges of N-Version Programming is development cost: it requires paying multiple teams to develop variants of the same system. To address this issue, we propose the automated generation of variants using large language models. We design, develop and evaluate Gal\'apagos: a tool for generating program variants using LLMs, validat...
Preprint
We introduce Java-Class-Hijack, a novel software supply chain attack that enables an attacker to inject malicious code by crafting a class that shadows a legitimate class that is in the dependency tree. We describe the attack, provide a proof-of-concept demonstrating its feasibility, and replicate it in the German Corona-Warn-App server application...
Preprint
Dependency updates often cause compilation errors when new dependency versions introduce changes that are incompatible with existing client code. Fixing breaking dependency updates is notoriously hard, as their root cause can be hidden deep in the dependency tree. We present Breaking-Good, a tool that automatically generates explanations for breaki...
Article
Algorithmic artists master programming to create art. Specialized libraries and hardware devices such as pen plotters support their practice.
Preprint
Typically, a conventional unit test (CUT) verifies the expected behavior of the unit under test through one specific input / output pair. In contrast, a parameterized unit test (PUT) receives a set of inputs as arguments, and contains assertions that are expected to hold true for all these inputs. PUTs increase test quality, as they assess correctn...
Preprint
Full-text available
Software supply chain attacks have become a significant threat as software development increasingly relies on contributions from multiple, often unverified sources. The code from unverified sources does not pose a threat until it is executed. Log4Shell is a recent example of a supply chain attack that processed a malicious input at runtime, leading...
Preprint
Full-text available
JavaScript packages are notoriously prone to bloat, a factor that significantly impacts the performance and maintainability of web applications. While web bundlers and tree-shaking can mitigate this issue in client-side applications at the function level, they cannot effectively detect and remove bloat in server-side applications. In this paper, we...
Preprint
In managed languages, serialization of objects is typically done in bespoke binary formats such as Protobuf, or markup languages such as XML or JSON. The major limitation of these formats is readability. Human developers cannot read binary code, and in most cases, suffer from the syntax of XML or JSON. This is a major issue when objects are meant t...
Article
Full-text available
Software bills of materials (SBOM) promise to become the backbone of software supply chain hardening. We deep-dive into six tools and the SBOMs they produce for complex open source Java projects, revealing challenges regarding the accurate production and usage of SBOMS.
Article
Large-scale code reuse significantly reduces both development costs and time. However, the massive share of third-party code in software projects poses new challenges, especially in terms of maintenance and security. In this paper, we propose a novel technique to specialize dependencies of Java projects, based on their actual usage. Given a project...
Article
Full-text available
In this paper, we present ChaosETH , a chaos engineering approach for resilience assessment of Ethereum blockchain clients. ChaosETH operates in the following manner: First, it monitors Ethereum clients to determine their normal behavior. Then, it injects system call invocation errors into one single Ethereum client at a time, and observes the beha...
Article
Full-text available
WebAssembly has become a crucial part of the modern web, offering a faster alternative to Java-Script in browsers. While boosting rich applications in browser, this technology is also very efficient to develop cryptojacking malware. This has triggered the development of several methods to detect cryptojacking malware. However, these defenses have n...
Preprint
As all software, blockchain nodes are exposed to faults in their underlying execution stack. Unstable execution environments can disrupt the availability of blockchain nodes interfaces, resulting in downtime for users. This paper introduces the concept of N-version Blockchain nodes. This new type of node relies on simultaneous execution of differen...
Preprint
Software bills of materials (SBOM) promise to become the backbone of software supply chain hardening. We deep-dive into 6 tools and the accuracy of the SBOMs they produce for complex open-source Java projects. Our novel insights reveal some hard challenges for the accurate production and usage of SBOMs.
Preprint
Modern software systems rely on a multitude of third-party dependencies. This large-scale code reuse reduces development costs and time, and it poses new challenges with respect to maintenance and security. Techniques such as tree shaking or shading can remove dependencies that are completely unused by a project, which partly address these challeng...
Article
Full-text available
As all software, blockchain nodes are exposed to faults in their underlying execution stack. Unstable execution environments can disrupt the availability of blockchain nodes' interfaces, resulting in downtime for users. This paper introduces the concept of N-Version Blockchain nodes. This new type of node relies on simultaneous execution of differe...
Preprint
Full-text available
WebAssembly is a binary format that has become an essential component of the web nowadays. Providing a faster alternative to JavaScript in the browser, this new technology has been embraced from its early days to create cryptomalware. This has triggered a solid effort to propose defenses that can detect WebAssembly malware. Yet, no defensive work h...
Article
Ethereum is the single largest programmable blockchain platform today. Ethereum nodes operate the blockchain, relying on a vast supply chain of third-party software dependencies. In this article, we perform an analysis of the software supply chain of Java Ethereum nodes and distill the challenges of maintaining and securing this blockchain technolo...
Preprint
Mocking in the context of automated software tests allows testing program units in isolation. Designing realistic interactions between a unit and its environment, and understanding the expected impact of these interactions on the behavior of the unit, are two key challenges that software testers face when developing tests with mocks. In this paper,...
Article
Full-text available
Software bloat is code that is packaged in an application but is actually not necessary to run the application. The presence of software bloat is an issue for security, for performance, and for maintenance. In this paper, we introduce a novel technique for debloating, which we call coverage-based debloating. We implement the technique for one singl...
Article
It's a period of unrest. Rebel developers, striking from continuous deployment servers, have won their first victory. During the battle, rebel spies managed to push an epic commit in the HTML code of https://pro.sony. Pursued by sinister agents, the rebels are hiding in commits, buttons, tooltips, API, HTTP headers, and configuration screens.
Preprint
Full-text available
Rickrolling is an Internet cultural phenomenon born in the mid 2000s. Originally confined to Internet fora, it has spread to other channels and media. In this paper, we hypothesize that rickrolling has reached the formal academic world. We design and conduct a systematic experiment to survey rickrolling in the academic literature. As of March 2022,...
Preprint
Full-text available
The rise of blockchain technologies has triggered tremendous research interests, coding efforts, and monetary investments in the last decade. Ethereum is the largest programmable blockchain platform today. It features cryptocurrency trading, digital art, and decentralized finance through smart contracts. So-called Ethereum nodes operate the blockch...
Preprint
Full-text available
The highly parallel workflows of modern software development have made merging of source code a common activity for developers. The state of the practice is based on line-based merge, which is ubiquitously used with "git merge". Line-based merge is however a generalized technique for any text that cannot leverage the structured nature of source cod...
Article
The highly parallel workflows of modern software development have made merging of source code a common activity for developers. The state of the practice is based on line-based merge, which is ubiquitously used with “git merge”. Line-based merge is however a generalized technique for any text that cannot leverage the structured nature of source cod...
Article
Full-text available
Modern software deployment process produces software that is uniform, and hence vulnerable to large-scale code-reuse attacks, such as Jump-Oriented Programming (JOP) attacks. Compiler-based diversification improves the resilience and security of software systems by automatically generating different assembly code versions of a given program. Existi...
Preprint
Full-text available
GraphQL is a new paradigm to design web APIs. Despite its growing popularity, there are few techniques to verify the implementation of a GraphQL API. We present a new testing approach based on GraphQL queries that are logged while users interact with an application in production. Our core motivation is that production queries capture real usages of...
Preprint
Full-text available
Modern software deployment process produces software that is uniform and hence vulnerable to large-scale code-reuse attacks, such as Jump-Oriented Programming (JOP) attacks. Compiler-based diversification improves the resilience of software systems by automatically generating different assembly code versions of a given program. Existing techniques...
Preprint
Full-text available
Despite its obvious benefits, the increased adoption of package managers to automate the reuse of libraries has opened the door to a new class of hazards: supply chain attacks. By injecting malicious code in one library, an attacker may compromise all instances of all applications that depend on the library. To mitigate the impact of supply chain a...
Article
Hyrum’s law states a common observation in the software industry: “With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody”. Meanwhile, recent research results seem to contradict this observation when they state that “for most APIs, the...
Preprint
The Ethereum blockchain is the operational backbone of major decentralized finance platforms. As such, it is expected to be exceptionally reliable. In this paper, we present ChaosETH, a chaos engineering tool for resilience assessment of Ethereum clients. ChaosETH operates in the following manner: First, it monitors Ethereum clients to determine th...
Preprint
Full-text available
This paper explores the use of relational symbolic execution to counter timing side channels in WebAssembly programs. We design and implement Vivienne, an open-source tool to automatically analyze WebAssembly cryptographic libraries for constant-time violations. Our approach features various optimizations that leverage the structure of WebAssembly...
Article
Full-text available
In this paper, we present a novel fault injection system called ChaosOrca for system calls in containerized applications. ChaosOrca aims at evaluating a given application’s self-protection capability with respect to system call errors. The unique feature of ChaosOrca is that it conducts experiments under production-like workload without instrumenti...
Article
Full-text available
In this article, we propose to use production executions to improve the quality of testing for certain methods of interest for developers. These methods can be methods that are not covered by the existing test suite or methods that are poorly tested. We devise an approach called pankti which monitors applications as they execute in production and t...
Preprint
Full-text available
Edge-cloud computing offloads parts of the computations that traditionally occurs in the cloud to edge nodes,e.g., CDN servers, in order to get closer to the users and reduce latency. To improve performance even further, WebAssembly is increasingly used in this context. Edge-cloud computing providers, such as Fastly or Cloudflare, let their clients...
Conference Paper
Full-text available
We study the evolution and impact of bloated dependencies in a single software ecosystem: Java/Maven. Bloated dependencies are third-party libraries that are packaged in the application binary but are not needed to run the application. We analyze the history of 435 Java projects. This historical data includes 48,469 distinct dependencies, which we...
Preprint
Full-text available
We study the evolution and impact of bloated dependencies in a single software ecosystem: Java/Maven. Bloated dependencies are third-party libraries that are packaged in the application binary but are not needed to run the application. We analyze the history of 435 Java projects. This historical data includes 48,469 distinct dependencies, which we...
Article
Full-text available
Build automation tools and package managers have a profound influence on software development. They facilitate the reuse of third-party libraries, support a clear separation between the application’s code and its external dependencies, and automate several software development tasks. However, the wide adoption of these tools introduces new challeng...
Article
Full-text available
The automatic interpretation of sign languages is a challenging task, as it requires the usage of high-level vision and high-level motion processing systems for providing accurate image perception. In this paper, we use Convolutional Neural Networks (CNNs) and transfer learning to make computers able to interpret signs of the Swedish Sign Language...
Conference Paper
Full-text available
Software engineering researchers look for software artifacts to study their characteristics or to evaluate new techniques. In this paper, we introduce DUETS, a new dataset of software libraries and their clients. This dataset can be exploited to gain many different insights, such as API usage, usage inputs, or novel observations about the test suit...
Preprint
Full-text available
JSON is a popular file and data format that is precisely specified by the IETF in RFC 8259. Yet, this specification implicitly and explicitly leaves room for many design choices when it comes to parsing and generating JSON. This yields the opportunity of diverse behavior among independent implementations of JSON libraries. A thorough analysis of th...
Article
Full-text available
Software bugs are common and correcting them accounts for a significant part of costs in the software development and maintenance process. This calls for automatic techniques to deal with them. One promising direction towards this goal is gaining repair knowledge from historical bug fixing examples. Retrieving insights from software development his...
Article
Full-text available
In this paper, we present a novel fault injection framework for system call invocation errors, called Phoebe. Phoebe is unique as follows; First, Phoebe enables developers to have full observability of system call invocations. Second, Phoebe generates error models that are realistic in the sense that they mimic errors that naturally happen in produ...
Preprint
Full-text available
Software engineering researchers look for software artifacts to study their characteristics or to evaluate new techniques. In this paper, we introduce DUETS, a new dataset of software libraries and their clients. This dataset can be exploited to gain many different insights, such as API usage, usage inputs, or novel observations about the test suit...
Article
This exposition is an investigation into software tranquillity through sound. One second of activity on a laptop was recorded by tracing the function calls within the Linux kernel. Can software be wild or calm? If so, what would calm software be like? Imagining that the software could experience its own existence, is the nature of its tranquillity...
Preprint
Full-text available
Software bugs are common and correcting them accounts for a significant part of costs in the software development and maintenance process. This calls for automatic techniques to deal with them. One promising direction towards this goal is gaining repair knowledge from historical bug fixing examples. Retrieving insights from software development his...
Preprint
Full-text available
Software testing ensures that a software system behaves as intended. In this paper, we identify the methods in a software system that need better testing, and propose to use production executions to improve test suites. We devise an approach called PANKTI which monitors applications as they execute in production, and then automatically generates un...
Preprint
The automatic interpretation of sign languages is a challenging task, as it requires the usage of high-level vision and high-level motion processing systems for providing accurate image perception. In this paper, we use Convolutional Neural Networks (CNNs) and transfer learning in order to make computers able to interpret signs of the Swedish Sign...
Chapter
Modern software deployment process produces software that is uniform, and hence vulnerable to large-scale code-reuse attacks. Compiler-based diversification improves the resilience and security of software systems by automatically generating different assembly code versions of a given program. Existing techniques are efficient but do not have a pre...
Preprint
Software bloat is code that is packaged in an application but is actually not used and not necessary to run the application. The presence of bloat is an issue for software security, for performance, and for maintenance. In recent years, several works have proposed techniques to detect and remove software bloat. In this paper, we introduce a novel t...
Preprint
Full-text available
The adoption of WebAssembly has rapidly increased in the last few years as it provides a fast and safe model for program execution. However, WebAssembly is not exempt from vulnerabilities that could be exploited by side channels attacks. This class of vulnerabilities that can be addressed by code diversification. In this paper, we present the first...
Preprint
Full-text available
Modern software deployment process produces software that is uniform, and hence vulnerable to large-scale code-reuse attacks. Compiler-based diversification improves the resilience and security of software systems by automatically generating different assembly code versions of a given program. Existing techniques are efficient but do not have a pre...
Article
Full-text available
When a developer pushes a change to an application’s codebase, a good practice is to have a test case specifying this behavioral change. Thanks to continuous integration (CI), the test is run on subsequent commits to check that they do no introduce a regression for that behavior. In this paper, we propose an approach that detects behavioral changes...
Preprint
Full-text available
In this paper, we present a novel fault injection framework called Phoebe for reliability analysis with respect to system call invocation errors. First, Phoebe enables developers to have full observability of system call invocations. Second, Phoebe generates error models that are realistic in the sense that they resemble errors that naturally happe...
Preprint
Full-text available
During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code is not symmetric. Consequently, decompilation, which aims at producing source code from bytecode, relies on strategies to reconstruct the information that has been lost. Different Java decompilers u...
Article
Full-text available
During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code is not symmetric. Consequently, decompilation, which aims at producing source code from bytecode, relies on strategies to reconstruct the information that has been lost. Different Java decompilers u...
Article
With this article, we survey the research performed in the domain of browser fingerprinting, while providing an accessible entry point to newcomers in the field. We explain how this technique works and where it stems from. We analyze the related work in detail to understand the composition of modern fingerprints and see how this technique is curren...
Preprint
Motivated by the fast adoption of WebAssembly, we propose the first functional pipeline to support the superoptimization of WebAssembly bytecode. Our pipeline works over LLVM and Souper. We evaluate our superoptimization pipeline with 12 programs from the Rosetta code project. Our pipeline improves the code section size of 8 out of 12 programs. We...
Conference Paper
Full-text available
Motivated by the fast adoption of WebAssembly, we propose the first functional pipeline to support the superoptimization of Web-Assembly bytecode. Our pipeline works over LLVM and Souper. We evaluate our superoptimization pipeline with 12 programs from the Rosetta code project. Our pipeline improves the code section size of 8 out of 12 programs. We...
Preprint
Build automation tools and package managers have a profound influence on software development. They facilitate the reuse of third-party libraries, support a clear separation between the application's code and its external dependencies, and automate several software development tasks. However, the wide adoption of these tools introduces new challeng...
Article
Full-text available
Generative software development has paved the way for the creation of multiple code generators that serve as a basis for automatically generating code to different software and hardware platforms. In this context, the software quality becomes highly correlated to the quality of code generators used during software development. Eventual failures may...
Preprint
Docker is a virtualization technique heavily used in industry to build cloud-based systems. In this context, observability means that it is hard for engineers to get timely and accurate information about the running state in production, due to scale and virtualization. In this paper, we present a novel approach, called POBS, to automatically improv...
Article
Full-text available
Neutral program variants are alternative implementations of a program, yet equivalent with respect to the test suite. Techniques such as approximate computing or genetic improvement share the intuition that potential for enhancements lies in these acceptable behavioral differences (e.g., enhanced performance or reliability). Yet, the automatic synt...
Article
Full-text available
Software systems contain resilience code to handle those failures and unexpected events happening in production. It is essential for developers to understand and assess the resilience of their systems. Chaos engineering is a technology that aims at assessing resilience and uncovering weaknesses by actively injecting perturbations in production. In...
Conference Paper
Browser fingerprinting is a technique that collects information about the browser configuration and the environment in which it is running. This information is so diverse that it can partially or totally identify users online. Over time, several countermeasures have emerged to mitigate tracking through browser fingerprinting. However, these measure...
Conference Paper
Full-text available
During compilation from Java source code to byte-code, some information is irreversibly lost. In other words, compilation and decompilation of Java code is not symmetric. Consequently, the decompilation process, which aims at producing source code from bytecode, must establish some strategies to reconstruct the information that has been lost. Moder...
Conference Paper
The comparison and alignment of runtime traces are essential, e.g., for semantic analysis or debugging. However, naive sequence alignment algorithms cannot address the needs of the modern web: (i) the bytecode generation process of V8 is not deterministic; (ii) bytecode traces are large. We present STRAC, a scalable and extensible tool tailored to...
Preprint
Repairnator is a bot. It constantly monitors software bugs discovered during continuous integration of open-source software and tries to fix them automatically. If it succeeds in synthesizing a valid patch, Repairnator proposes the patch to the human developers, disguised under a fake human identity. To date, Repairnator has been able to producepat...
Preprint
The comparison and alignment of runtime traces are essential, e.g., for semantic analysis or debugging. However, naive sequence alignment algorithms cannot address the needs of the modern web: (i) the bytecode generation process of V8 is not deterministic; (ii) bytecode traces are large. We present STRAC, a scalable and extensible tool tailored to...
Preprint
An extreme transformation removes the body of a method that is reached by one test case at least. If the test suite passes on the original program and still passes after the extreme transformation, the transformation is said to be undetected, and the test suite needs to be improved. In this work we propose a technique to automatically determine whi...
Preprint
Full-text available
This paper addresses the following question: does a small, essential, core set of API members emerges from the actual usage of the API by client applications? To investigate this question, we study the 99 most popular libraries available in Maven Central and the 865,560 client programs that declare dependencies towards them, summing up to 2.3M depe...
Preprint
Full-text available
During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code is not symmetric. Consequently, the decompilation process, which aims at producing source code from bytecode, must establish some strategies to reconstruct the information that has been lost. Modern...
Article
Full-text available
In the literature, there is a rather clear segregation between manually written tests by developers and automatically generated ones. In this paper, we explore a third solution: to automatically improve existing test cases written by developers. We present the concept, design and implementation of a system called DSpot, that takes developer-written...
Article
Full-text available
The adoption of agile approaches has put an increased emphasis on testing, resulting in extensive test suites. These suites include a large number of tests, in which developers embed knowledge about meaningful input data and expected properties as oracles.This article surveys works that exploit this knowledge to enhance manually written tests with...
Preprint
Full-text available
In this paper, we present a novel fault injection system called ChaosOrca for system calls in containerized applications. ChaosOrca aims at evaluating a given application's self-protection capability with respect to system call errors. The unique feature of ChaosOrca is that it conducts experiments under production-like workload without instrumenti...
Article
Full-text available
Repairnator is a bot. It constantly monitors software bugs discovered during continuous integration of open-source software and tries to fix them automatically. If it succeeds in synthesizing a valid patch, Repairnator proposes the patch to the human developers, disguised under a fake human identity. To date, Repairnator has been able to produce pa...

Network

Cited By