Benoit Baudry

National Institute for Research in Computer Science and Control | INRIA · TRISKELL - Reliable and Efficient Component Based Software Engineering Research Team

About

Publications: 337
Reads: 69,523
Citations: 6,193
Citations since 2016: 121 research items, 3314 citations
[Chart: citations per year, 2016 to 2022; vertical axis 0 to 500]

Publications (337)
Article
Ethereum is the single largest programmable blockchain platform today. Ethereum nodes operate the blockchain, relying on a vast supply chain of third-party software dependencies. In this article, we perform an analysis of the software supply chain of Java Ethereum nodes and distill the challenges of maintaining and securing this blockchain technolo...
Preprint
Mocking in the context of automated software tests allows testing program units in isolation. Designing realistic interactions between a unit and its environment, and understanding the expected impact of these interactions on the behavior of the unit, are two key challenges that software testers face when developing tests with mocks. In this paper,...
Article
Full-text available
Software bloat is code that is packaged in an application but is actually not necessary to run the application. The presence of software bloat is an issue for security, for performance, and for maintenance. In this paper, we introduce a novel technique for debloating, which we call coverage-based debloating. We implement the technique for one singl...
Article
It's a period of unrest. Rebel developers, striking from continuous deployment servers, have won their first victory. During the battle, rebel spies managed to push an epic commit in the HTML code of https://pro.sony. Pursued by sinister agents, the rebels are hiding in commits, buttons, tooltips, API, HTTP headers, and configuration screens.
Preprint
Rickrolling is an Internet cultural phenomenon born in the mid 2000s. Originally confined to Internet fora, it has spread to other channels and media. In this paper, we hypothesize that rickrolling has reached the formal academic world. We design and conduct a systematic experiment to survey rickrolling in the academic literature. As of March 2022,...
Preprint
Full-text available
The rise of blockchain technologies has triggered tremendous research interest, coding efforts, and monetary investments in the last decade. Ethereum is the largest programmable blockchain platform today. It features cryptocurrency trading, digital art, and decentralized finance through smart contracts. So-called Ethereum nodes operate the blockch...
Preprint
The highly parallel workflows of modern software development have made merging of source code a common activity for developers. The state of the practice is based on line-based merge, which is ubiquitously used with "git merge". Line-based merge is however a generalized technique for any text that cannot leverage the structured nature of source cod...
Article
The highly parallel workflows of modern software development have made merging of source code a common activity for developers. The state of the practice is based on line-based merge, which is ubiquitously used with git merge. Line-based merge is however a generalized technique for any text that cannot leverage the structured nature of source code,...
Article
The modern software deployment process produces software that is uniform, and hence vulnerable to large-scale code-reuse attacks, such as Jump-Oriented Programming (JOP) attacks. Compiler-based diversification improves the resilience and security of software systems by automatically generating different assembly code versions of a given program. Existi...
Preprint
Full-text available
GraphQL is a new paradigm to design web APIs. Despite its growing popularity, there are few techniques to verify the implementation of a GraphQL API. We present a new testing approach based on GraphQL queries that are logged while users interact with an application in production. Our core motivation is that production queries capture real usages of...
Preprint
Full-text available
The modern software deployment process produces software that is uniform and hence vulnerable to large-scale code-reuse attacks, such as Jump-Oriented Programming (JOP) attacks. Compiler-based diversification improves the resilience of software systems by automatically generating different assembly code versions of a given program. Existing techniques...
Preprint
Full-text available
Despite its obvious benefits, the increased adoption of package managers to automate the reuse of libraries has opened the door to a new class of hazards: supply chain attacks. By injecting malicious code in one library, an attacker may compromise all instances of all applications that depend on the library. To mitigate the impact of supply chain a...
Article
Hyrum’s law states a common observation in the software industry: “With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody”. Meanwhile, recent research results seem to contradict this observation when they state that “for most APIs, the...
Preprint
The Ethereum blockchain is the operational backbone of major decentralized finance platforms. As such, it is expected to be exceptionally reliable. In this paper, we present ChaosETH, a chaos engineering tool for resilience assessment of Ethereum clients. ChaosETH operates in the following manner: First, it monitors Ethereum clients to determine th...
Preprint
Full-text available
This paper explores the use of relational symbolic execution to counter timing side channels in WebAssembly programs. We design and implement Vivienne, an open-source tool to automatically analyze WebAssembly cryptographic libraries for constant-time violations. Our approach features various optimizations that leverage the structure of WebAssembly...
Article
Full-text available
In this paper, we present a novel fault injection system called ChaosOrca for system calls in containerized applications. ChaosOrca aims at evaluating a given application’s self-protection capability with respect to system call errors. The unique feature of ChaosOrca is that it conducts experiments under production-like workload without instrumenti...
Article
Full-text available
In this article, we propose to use production executions to improve the quality of testing for certain methods of interest for developers. These methods can be methods that are not covered by the existing test suite or methods that are poorly tested. We devise an approach called pankti which monitors applications as they execute in production and t...
Preprint
Full-text available
Edge-cloud computing offloads parts of the computations that traditionally occur in the cloud to edge nodes, e.g., CDN servers, in order to get closer to the users and reduce latency. To improve performance even further, WebAssembly is increasingly used in this context. Edge-cloud computing providers, such as Fastly or Cloudflare, let their clients...
Conference Paper
Full-text available
We study the evolution and impact of bloated dependencies in a single software ecosystem: Java/Maven. Bloated dependencies are third-party libraries that are packaged in the application binary but are not needed to run the application. We analyze the history of 435 Java projects. This historical data includes 48,469 distinct dependencies, which we...
Preprint
Full-text available
We study the evolution and impact of bloated dependencies in a single software ecosystem: Java/Maven. Bloated dependencies are third-party libraries that are packaged in the application binary but are not needed to run the application. We analyze the history of 435 Java projects. This historical data includes 48,469 distinct dependencies, which we...
Article
Full-text available
Build automation tools and package managers have a profound influence on software development. They facilitate the reuse of third-party libraries, support a clear separation between the application’s code and its external dependencies, and automate several software development tasks. However, the wide adoption of these tools introduces new challeng...
Article
Full-text available
The automatic interpretation of sign languages is a challenging task, as it requires the usage of high-level vision and high-level motion processing systems for providing accurate image perception. In this paper, we use Convolutional Neural Networks (CNNs) and transfer learning to make computers able to interpret signs of the Swedish Sign Language...
Conference Paper
Full-text available
Software engineering researchers look for software artifacts to study their characteristics or to evaluate new techniques. In this paper, we introduce DUETS, a new dataset of software libraries and their clients. This dataset can be exploited to gain many different insights, such as API usage, usage inputs, or novel observations about the test suit...
Preprint
Full-text available
JSON is a popular file and data format that is precisely specified by the IETF in RFC 8259. Yet, this specification implicitly and explicitly leaves room for many design choices when it comes to parsing and generating JSON. This yields the opportunity of diverse behavior among independent implementations of JSON libraries. A thorough analysis of th...
Article
Full-text available
Software bugs are common and correcting them accounts for a significant part of costs in the software development and maintenance process. This calls for automatic techniques to deal with them. One promising direction towards this goal is gaining repair knowledge from historical bug fixing examples. Retrieving insights from software development his...
Article
Full-text available
In this paper, we present a novel fault injection framework for system call invocation errors, called Phoebe. Phoebe is unique in the following ways. First, Phoebe enables developers to have full observability of system call invocations. Second, Phoebe generates error models that are realistic in the sense that they mimic errors that naturally happen in produ...
Preprint
Full-text available
Software engineering researchers look for software artifacts to study their characteristics or to evaluate new techniques. In this paper, we introduce DUETS, a new dataset of software libraries and their clients. This dataset can be exploited to gain many different insights, such as API usage, usage inputs, or novel observations about the test suit...
Preprint
Full-text available
Software bugs are common and correcting them accounts for a significant part of costs in the software development and maintenance process. This calls for automatic techniques to deal with them. One promising direction towards this goal is gaining repair knowledge from historical bug fixing examples. Retrieving insights from software development his...
Preprint
Full-text available
Software testing ensures that a software system behaves as intended. In this paper, we identify the methods in a software system that need better testing, and propose to use production executions to improve test suites. We devise an approach called PANKTI which monitors applications as they execute in production, and then automatically generates un...
Preprint
The automatic interpretation of sign languages is a challenging task, as it requires the usage of high-level vision and high-level motion processing systems for providing accurate image perception. In this paper, we use Convolutional Neural Networks (CNNs) and transfer learning in order to make computers able to interpret signs of the Swedish Sign...
Chapter
The modern software deployment process produces software that is uniform, and hence vulnerable to large-scale code-reuse attacks. Compiler-based diversification improves the resilience and security of software systems by automatically generating different assembly code versions of a given program. Existing techniques are efficient but do not have a pre...
Preprint
Software bloat is code that is packaged in an application but is actually not used and not necessary to run the application. The presence of bloat is an issue for software security, for performance, and for maintenance. In recent years, several works have proposed techniques to detect and remove software bloat. In this paper, we introduce a novel t...
Preprint
Full-text available
The adoption of WebAssembly has rapidly increased in the last few years as it provides a fast and safe model for program execution. However, WebAssembly is not exempt from vulnerabilities that could be exploited by side-channel attacks. This class of vulnerabilities can be addressed by code diversification. In this paper, we present the first...
Preprint
Full-text available
The modern software deployment process produces software that is uniform, and hence vulnerable to large-scale code-reuse attacks. Compiler-based diversification improves the resilience and security of software systems by automatically generating different assembly code versions of a given program. Existing techniques are efficient but do not have a pre...
Article
Full-text available
When a developer pushes a change to an application’s codebase, a good practice is to have a test case specifying this behavioral change. Thanks to continuous integration (CI), the test is run on subsequent commits to check that they do not introduce a regression for that behavior. In this paper, we propose an approach that detects behavioral changes...
Preprint
Full-text available
In this paper, we present a novel fault injection framework called Phoebe for reliability analysis with respect to system call invocation errors. First, Phoebe enables developers to have full observability of system call invocations. Second, Phoebe generates error models that are realistic in the sense that they resemble errors that naturally happe...
Preprint
Full-text available
During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code are not symmetric. Consequently, decompilation, which aims at producing source code from bytecode, relies on strategies to reconstruct the information that has been lost. Different Java decompilers u...
Article
Full-text available
During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code are not symmetric. Consequently, decompilation, which aims at producing source code from bytecode, relies on strategies to reconstruct the information that has been lost. Different Java decompilers u...
Article
With this article, we survey the research performed in the domain of browser fingerprinting, while providing an accessible entry point to newcomers in the field. We explain how this technique works and where it stems from. We analyze the related work in detail to understand the composition of modern fingerprints and see how this technique is curren...
Preprint
Motivated by the fast adoption of WebAssembly, we propose the first functional pipeline to support the superoptimization of WebAssembly bytecode. Our pipeline works over LLVM and Souper. We evaluate our superoptimization pipeline with 12 programs from the Rosetta code project. Our pipeline improves the code section size of 8 out of 12 programs. We...
Conference Paper
Full-text available
Motivated by the fast adoption of WebAssembly, we propose the first functional pipeline to support the superoptimization of WebAssembly bytecode. Our pipeline works over LLVM and Souper. We evaluate our superoptimization pipeline with 12 programs from the Rosetta code project. Our pipeline improves the code section size of 8 out of 12 programs. We...
Preprint
Build automation tools and package managers have a profound influence on software development. They facilitate the reuse of third-party libraries, support a clear separation between the application's code and its external dependencies, and automate several software development tasks. However, the wide adoption of these tools introduces new challeng...
Article
Full-text available
Generative software development has paved the way for the creation of multiple code generators that serve as a basis for automatically generating code for different software and hardware platforms. In this context, the software quality becomes highly correlated to the quality of code generators used during software development. Eventual failures may...
Preprint
Docker is a virtualization technique heavily used in industry to build cloud-based systems. In this context, observability is a challenge: scale and virtualization make it hard for engineers to get timely and accurate information about the running state in production. In this paper, we present a novel approach, called POBS, to automatically improv...
Article
Full-text available
Neutral program variants are alternative implementations of a program, yet equivalent with respect to the test suite. Techniques such as approximate computing or genetic improvement share the intuition that potential for enhancements lies in these acceptable behavioral differences (e.g., enhanced performance or reliability). Yet, the automatic synt...
Article
Full-text available
Software systems contain resilience code to handle failures and unexpected events that happen in production. It is essential for developers to understand and assess the resilience of their systems. Chaos engineering is a technology that aims at assessing resilience and uncovering weaknesses by actively injecting perturbations in production. In...
Conference Paper
Browser fingerprinting is a technique that collects information about the browser configuration and the environment in which it is running. This information is so diverse that it can partially or totally identify users online. Over time, several countermeasures have emerged to mitigate tracking through browser fingerprinting. However, these measure...
Conference Paper
Full-text available
During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code are not symmetric. Consequently, the decompilation process, which aims at producing source code from bytecode, must establish some strategies to reconstruct the information that has been lost. Moder...
Conference Paper
The comparison and alignment of runtime traces are essential, e.g., for semantic analysis or debugging. However, naive sequence alignment algorithms cannot address the needs of the modern web: (i) the bytecode generation process of V8 is not deterministic; (ii) bytecode traces are large. We present STRAC, a scalable and extensible tool tailored to...
Preprint
Repairnator is a bot. It constantly monitors software bugs discovered during continuous integration of open-source software and tries to fix them automatically. If it succeeds in synthesizing a valid patch, Repairnator proposes the patch to the human developers, disguised under a fake human identity. To date, Repairnator has been able to produce pat...
Preprint
The comparison and alignment of runtime traces are essential, e.g., for semantic analysis or debugging. However, naive sequence alignment algorithms cannot address the needs of the modern web: (i) the bytecode generation process of V8 is not deterministic; (ii) bytecode traces are large. We present STRAC, a scalable and extensible tool tailored to...
Preprint
An extreme transformation removes the body of a method that is reached by at least one test case. If the test suite passes on the original program and still passes after the extreme transformation, the transformation is said to be undetected, and the test suite needs to be improved. In this work, we propose a technique to automatically determine whi...
Preprint
This paper addresses the following question: does a small, essential, core set of API members emerge from the actual usage of the API by client applications? To investigate this question, we study the 99 most popular libraries available in Maven Central and the 865,560 client programs that declare dependencies towards them, summing up to 2.3M depe...
Preprint
Full-text available
During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code are not symmetric. Consequently, the decompilation process, which aims at producing source code from bytecode, must establish some strategies to reconstruct the information that has been lost. Modern...
Article
Full-text available
In the literature, there is a rather clear segregation between manually written tests by developers and automatically generated ones. In this paper, we explore a third solution: to automatically improve existing test cases written by developers. We present the concept, design and implementation of a system called DSpot, that takes developer-written...
Article
Full-text available
The adoption of agile approaches has put an increased emphasis on testing, resulting in extensive test suites. These suites include a large number of tests, in which developers embed knowledge about meaningful input data and expected properties as oracles. This article surveys works that exploit this knowledge to enhance manually written tests with...