Conference Paper

An Empirical Study of Build Failures in the Docker Context


Abstract

Docker containers have become the de-facto industry standard. Docker builds often break, and considerable effort goes into troubleshooting broken builds. Prior studies have evaluated the rate at which builds in large organizations fail. However, little is known about the frequency and fix effort of failures that occur in Docker builds of open-source projects. This paper presents a preliminary study of 857,086 Docker builds from 3,828 open-source projects hosted on GitHub. Using the Docker build data, we measure the frequency of broken builds and report their fix time. Furthermore, we explore the evolution of Docker build failures over time. Our findings help to characterize and understand Docker build failures and motivate the need for collecting more empirical evidence.
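The two measurements the abstract describes, failure frequency and fix time, can be sketched with a minimal analysis over build records. The record fields and sample data below are hypothetical illustrations, not the paper's actual dataset:

```python
from datetime import datetime, timedelta

# Hypothetical build records: (build_id, succeeded, started_at).
builds = [
    ("b1", True,  datetime(2020, 1, 1, 9, 0)),
    ("b2", False, datetime(2020, 1, 1, 10, 0)),  # build breaks here
    ("b3", False, datetime(2020, 1, 1, 11, 0)),  # still broken
    ("b4", True,  datetime(2020, 1, 1, 15, 0)),  # fix lands here
]

def failure_rate(builds):
    """Fraction of builds that failed."""
    return sum(1 for _, ok, _ in builds if not ok) / len(builds)

def fix_times(builds):
    """Time from the first build of a broken streak to the next passing build."""
    times, first_failure = [], None
    for _, ok, started in builds:
        if not ok and first_failure is None:
            first_failure = started
        elif ok and first_failure is not None:
            times.append(started - first_failure)
            first_failure = None
    return times

print(failure_rate(builds))  # 0.5
print(fix_times(builds))     # [datetime.timedelta(seconds=18000)], i.e. 5 hours
```

This treats a "broken build" streak as fixed at the next passing build, one simple way to operationalize fix time from build logs.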


... It is also a trendy research area, with more than 7,770 research papers published on Docker since the beginning of 2020. In software engineering, a wide variety of Docker studies have investigated specific aspects, including the ecosystem of Docker containers (Cito et al. 2017), Docker build failures (Wu et al. 2020), and other Docker topics (Henkel et al. 2020a, b). These studies examined practical findings and lessons learned by mining a large number of published Dockerfiles. ...
Article
Full-text available
In software development, ad hoc solutions that are intentionally implemented by developers are called self-admitted technical debt (SATD). Because the existence of SATD spreads poor implementations, it is necessary to remove it as soon as possible. Meanwhile, container virtualization has been attracting attention in recent years as a technology to support infrastructure such as servers. Currently, Docker is the de facto standard for container virtualization. In Docker, a file describing how to build a container (Dockerfile) is a set of procedural instructions; thus, it can be considered a kind of source code. Moreover, because Docker is a relatively new technology, few developers have accumulated good or bad practices for building Docker containers. Hence, it is likely that Dockerfiles contain many SATDs, as is the case with the general programming language source code analyzed in previous SATD studies. The goal of this paper is to categorize SATDs in Dockerfiles and to share knowledge with developers and researchers. To achieve this goal, we conducted a manual classification of SATDs in Dockerfiles. We found that about 3.0% of the comments in Dockerfiles are SATD. In addition, we classified SATDs into five classes and eleven subclasses. Among them, there are some SATDs specific to Docker, such as SATDs for version fixing and for integrity checks. The three most common classes of SATD were related to lowering maintainability, testing, and defects.
... Similar to the concept of inheritance in object-oriented programming, Docker images can use the FROM instruction to inherit image definitions from another base image [15]. The new image inherits all the attributes and files encapsulated in the base image [13]. In practice, to find a suitable image, developers often need to manually search for a base image in Docker registries, e.g., Docker Hub. ...
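The FROM-based inheritance described above is declared textually in the Dockerfile, so the base image(s) a project depends on can be recovered with a simple scan. A minimal sketch, using a hypothetical Dockerfile as input:

```python
import re

# Hypothetical multi-stage Dockerfile text, for illustration only.
dockerfile = """\
FROM python:3.8-slim AS builder
RUN pip install --no-cache-dir requests
FROM python:3.8-slim
COPY --from=builder /usr/local /usr/local
"""

# FROM <image>[:<tag>] [AS <stage>] -- capture just the image reference.
FROM_RE = re.compile(r"^FROM\s+(\S+)", re.IGNORECASE | re.MULTILINE)

base_images = FROM_RE.findall(dockerfile)
print(base_images)  # ['python:3.8-slim', 'python:3.8-slim']
```

A multi-stage build yields one base image per stage, which is why the scan returns a list rather than a single name.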
Preprint
Full-text available
Docker containers are being widely used in large-scale industrial environments. In practice, developers must manually specify the base image in the dockerfile in the process of container creation. However, finding a proper base image is a nontrivial task because manual searching is time-consuming and easily leads to the use of unsuitable base images, especially for newcomers. There is still a lack of automatic approaches for recommending a related base image to developers through the dockerfile configuration. To tackle this problem, this paper makes the first attempt to propose a neural network approach named DCCimagerec, which is based on deep configuration comprehension. It uses structural configuration features of the dockerfile, extracted by an AST and a path-attention model, to recommend a potentially suitable base image. Evaluation experiments based on about 83,000 dockerfiles show that DCCimagerec outperforms multiple baselines, improving Precision by 7.5%-67.5%, Recall by 6.2%-106.6%, and F1 by 7.5%-150.2%.
... Today, source code is essentially required to have Continuous Integration and Continuous Delivery/Deployment (CI/CD) to target environments because automation has become a norm in software development practices [5]. To carry out CI/CD, GitHub, as one of the most widely used source code repository providers, has integrated many third-party CI/CD tools, such as Travis CI [34], Jenkins [1] and Docker [28]. It is becoming increasingly clear that making a choice of CI/CD tools and infrastructures is a tough task [21]. ...
Preprint
Automation has become a norm in software development practices, especially in CI/CD practices. Recently, GitHub introduced GitHub Actions (GA) to provide automated workflows for software maintainers. However, little research has evaluated its capability and impact, even though practitioners have already built several GA workflows. In this paper, we conduct a large-scale empirical study of GitHub projects to help practitioners gain deep insights into GA. We quantitatively investigate the basic adoption of GA and its potential correlation with project properties. We also analyze the usage details of GA, including its component scale and action sequences. Finally, using regression modeling, we investigate the impact of GA on commit frequency and on pull request and issue resolution efficiency. Our findings suggest a nuanced picture of how practitioners are adapting to, and benefiting from, GA.
... However, the CI/CD pipeline execution is often blocked by Docker build failures due to the discrepancy between the local environment and that inside a Docker image. Previous work found that 17.8% of historical Docker builds failed in the sample open-source projects [5]. In the community of deep learning, it was reported that 39.4% of the jobs based on Docker failed because of accessing a non-existent file or directory [6]. ...
Preprint
Continuous Integration (CI) and Continuous Deployment (CD) are widely adopted in software engineering practice. In reality, the CI/CD pipeline execution is not yet reliably continuous because it is often interrupted by Docker build failures. However, the existing trial-and-error practice of detecting faults is time-consuming. To detect Dockerfile faults in a timely manner, we propose a context-based pre-build analysis approach, named DockerMock, that mocks the execution of common Dockerfile instructions. A Dockerfile fault is declared when an instruction conflicts with the approximated and accumulated running context. By explicitly keeping track of whether the context is fuzzy, DockerMock strikes a good balance between detection precision and recall. We evaluated DockerMock with 53 faults in 41 Dockerfiles from open-source projects on GitHub and 130 faults in 105 Dockerfiles from student course projects. On average, DockerMock detected 68.0% of the Dockerfile faults in these two datasets, while the baselines hadolint and BuildKit detected 6.5% and 60.5%, respectively, without instruction execution. In the GitHub dataset, DockerMock reduced the number of builds to 47, outperforming hadolint (73) and BuildKit (74).
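The pre-build mocking idea, scanning instructions while tracking an approximate running context and flagging conflicts before any build runs, can be illustrated with a toy checker. The rules and instruction coverage below are greatly simplified sketches, not DockerMock's actual algorithm:

```python
# Toy context-tracking check: remember paths that instructions create,
# then flag a COPY whose absolute source path was never created.
def check_dockerfile(lines):
    known_paths, faults = set(), []
    for n, line in enumerate(lines, 1):
        parts = line.split()
        if not parts:
            continue
        inst = parts[0].upper()
        if inst == "RUN" and "mkdir" in parts:
            known_paths.add(parts[-1])  # crude: treat last arg as the new dir
        elif inst == "COPY":
            src = parts[1]
            if src.startswith("/") and src not in known_paths:
                faults.append((n, f"COPY source {src} not found in context"))
    return faults

lines = [
    "FROM ubuntu:20.04",
    "RUN mkdir /app",
    "COPY /data/config.yml /app/",  # /data was never created
]
print(check_dockerfile(lines))
```

The key property is that the fault is reported from static scanning alone, before any (slow) trial-and-error build is attempted.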
... We found a comparable breakage rate, but have also developed methods aimed at making repairs instead of just analyzing quality. More recently, Wu et al. [34] conducted a comprehensive study of build failures in Dockerfiles. They analyzed a total of 3,828 GitHub projects containing Dockerfiles, and a total of 857,086 Docker builds. ...
Preprint
Docker is a tool for lightweight OS-level virtualization. Docker images are created by performing a build, controlled by a source-level artifact called a Dockerfile. We studied Dockerfiles on GitHub, and -- to our great surprise -- found that over a quarter of the examined Dockerfiles failed to build (and thus to produce images). To address this problem, we propose SHIPWRIGHT, a human-in-the-loop system for finding repairs to broken Dockerfiles. SHIPWRIGHT uses a modified version of the BERT language model to embed build logs and to cluster broken Dockerfiles. Using these clusters and a search-based procedure, we were able to design 13 rules for making automated repairs to Dockerfiles. With the aid of SHIPWRIGHT, we submitted 45 pull requests (with a 42.2% acceptance rate) to GitHub projects with broken Dockerfiles. Furthermore, in a "time-travel" analysis of broken Dockerfiles that were later fixed, we found that SHIPWRIGHT proposed repairs that were equivalent to human-authored patches in 22.77% of the cases we studied. Finally, we compared our work with recent, state-of-the-art, static Dockerfile analyses, and found that, while static tools detected possible build-failure-inducing issues in 20.6--33.8% of the files we examined, SHIPWRIGHT was able to detect possible issues in 73.25% of the files and, additionally, provide automated repairs for 18.9% of the files.
Article
Full-text available
Dockerfiles play an important role in the Docker-based software development process, but many Dockerfiles are infected with smells in practice. Understanding the occurrence of Dockerfile smells in open-source software can benefit the practice of writing Dockerfiles and enhance project maintenance. In this paper, we perform an empirical study on a large dataset of 6,334 projects to help developers gain insights into the occurrence of Dockerfile smells, including their coverage, distribution, co-occurrence, and correlation with project characteristics. Our results show that smells are very common in Dockerfiles and that different types of Dockerfile smells co-occur. Further, using linear regression analysis, controlling for various variables, we statistically identify and quantify the relationships between Dockerfile smell occurrence and project characteristics. We also provide a rich resource of implications for software practitioners.
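Dockerfile smells of the kind studied above are typically detected with line-level rules. The two checks below, an unpinned or `latest` base-image tag and an unpinned `apt-get install`, are simplified illustrations of well-known smells, not the study's actual smell catalog:

```python
# Illustrative line-level smell checks over a hypothetical Dockerfile.
def find_smells(lines):
    smells = []
    for n, line in enumerate(lines, 1):
        stripped = line.strip()
        if stripped.upper().startswith("FROM"):
            image = stripped.split()[1]
            if ":" not in image or image.endswith(":latest"):
                smells.append((n, "base image tag missing or 'latest'"))
        if "apt-get install" in stripped and "=" not in stripped:
            smells.append((n, "unpinned package version in apt-get install"))
    return smells

lines = [
    "FROM ubuntu",
    "RUN apt-get update && apt-get install -y curl",
]
print(find_smells(lines))
# [(1, "base image tag missing or 'latest'"),
#  (2, 'unpinned package version in apt-get install')]
```

Both smells hurt reproducibility: the image that `FROM ubuntu` or an unpinned install resolves to can change between builds.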
Conference Paper
Full-text available
Continuous Integration (CI) and Continuous Delivery (CD) are widespread in both industrial and open-source software (OSS) projects. Recent research characterized build failures in CI and identified factors potentially correlated to them. However, most observations and findings of previous work are exclusively based on OSS projects or data from a single industrial organization. This paper provides a first attempt to compare the CI processes and occurrences of build failures in 349 Java OSS projects and 418 projects from a financial organization, ING Nederland. Through the analysis of 34,182 failing builds (26% of the total number of observed builds), we derived a taxonomy of failures that affect the observed CI processes. Using cluster analysis, we observed that in some cases OSS and ING projects share similar build failure patterns (e.g., few compilation failures as compared to frequent testing failures), while in other cases completely different patterns emerge. In short, we explain how OSS and ING CI processes exhibit commonalities, yet are substantially different in their design and in the failures they report.
Conference Paper
Full-text available
Software development has always inherently required multitasking: developers switch between coding, reviewing, testing, designing, and meeting with colleagues. The advent of software ecosystems like GitHub has enabled something new: the ability to easily switch between projects. Developers also have social incentives to contribute to many projects; prolific contributors gain social recognition and (eventually) economic rewards. Multitasking, however, comes at a cognitive cost: frequent context switches can lead to distraction, sub-standard work, and even greater stress. In this paper, we gather ecosystem-level data on a group of programmers working on a large collection of projects. We develop models and methods for measuring the rate and breadth of a developer's context-switching behavior, and we study how context switching affects their productivity. We also survey developers to understand the reasons for and perceptions of multitasking. We find that the most common reason for multitasking is interrelationships and dependencies between projects. Notably, we find that both the rate of switching and the breadth (number of projects) of a developer's work matter. Developers who work on many projects have higher productivity if they focus on few projects per day. Developers who switch projects too much during the course of a day have lower productivity as they work on more projects overall. Despite these findings, developers' perceptions of the benefits of multitasking are varied.
Conference Paper
Continuous Integration (CI) is a widely-used software development practice to reduce risks. CI builds often break, and considerable effort goes into troubleshooting broken builds. Although compiler errors have been recognized as one of the most frequent types of build failures, little is known about the common types, fix effort, and fix patterns of compiler errors that occur in CI builds of open-source projects. To fill this gap, we present a large-scale empirical study on 6,854,271 CI builds from 3,799 open-source Java projects hosted on GitHub. Using the build data, we measured the frequency of broken builds caused by compiler errors, investigated the ten most common compiler error types, and reported their fix time. We manually analyzed 325 broken builds to summarize fix patterns of the ten most common compiler error types. Our findings help to characterize and understand compiler errors during CI and provide practical implications to developers, tool builders, and researchers.
Article
Docker, as a de-facto industry standard, enables the packaging of an application with all its dependencies and execution environment in a lightweight, self-contained unit, i.e., a container. By launching a container from a Docker image, developers can easily share the same operating system, libraries, and binaries. As the configuration file, the dockerfile plays an important role because it defines the specific Docker image architecture and build order. As a project progresses through its development stages, the content of the dockerfile may be revised many times. This dockerfile evolution is indicative of how the project infrastructure varies over time, and different projects can exhibit different evolutionary trajectories. However, projects with similar goals and needs may converge to more similar trajectories than more disparate projects. Identifying software projects that have undergone similar changes can be very important for discovering and implementing best practices when adopting new tools and pipelines, especially in the DevOps software development paradigm. The potential to implement best practices through the analysis of dockerfile evolutionary trajectories motivated this work. This research studied dockerfile longitudinal changes at large scale and presents a clustering-based approach for mining convergent evolutionary trajectories. An empirical study of 2,840 projects was conducted, and six distinct clusters of dockerfile evolutionary trajectories were found. Furthermore, each cluster is summarized, accompanied by case studies, and the differences between clusters are discussed. The proposed approach quantifies distinct dockerfile evolution modes and reflects the learning curves of project maintainers, which benefits future project maintenance. The approach is also generic and can be used to study the evolution of general infrastructure configuration files.
Conference Paper
Continuous deployment (CD) is a software development practice aimed at automating delivery and deployment of a software product, following any changes to its code. If properly implemented, CD together with other automation in the development process can bring numerous benefits, including higher control and flexibility over release schedules, lower risks, fewer defects, and easier on-boarding of new developers. Here we focus on the (r)evolution in CD workflows caused by containerization, the virtualization technology that enables packaging an application together with all its dependencies and execution environment in a light-weight, self-contained unit, of which Docker has become the de-facto industry standard. There are many available choices for containerized CD workflows, some more appropriate than others for a given project. Owing to cross-listing of GitHub projects on Docker Hub, in this paper we report on a mixed-methods study to shed light on developers' experiences and expectations with containerized CD workflows. Starting from a survey, we explore the motivations, specific workflows, needs, and barriers with containerized CD. We find two prominent workflows, based on the automated builds feature on Docker Hub or continuous integration services, with different trade-offs. We then propose hypotheses and test them in a large-scale quantitative study.
Conference Paper
Docker containers are standardized, self-contained units of applications, packaged with their dependencies and execution environment. The environment is defined in a Dockerfile that specifies the steps to reach a certain system state as infrastructure code, with the aim of enabling reproducible builds of the container. To lay the groundwork for research on infrastructure code, we collected structured information about the state and the evolution of Dockerfiles on GitHub and release it as a PostgreSQL database archive (over 100,000 unique Dockerfiles in over 15,000 GitHub projects). Our dataset enables answering a multitude of interesting research questions related to different kinds of software evolution behavior in the Docker ecosystem.
Conference Paper
Build systems translate sources into deliverables. Developers execute builds on a regular basis in order to integrate their personal code changes into testable deliverables. Prior studies have evaluated the rate at which builds in large organizations fail. A recent study at Google has analyzed (among other things) the rate at which builds in developer workspaces fail. In this paper, we replicate the Google study in the Visual Studio context of the MSR challenge. We extract and analyze 13,300 build events, observing that builds are failing 67%--76% less frequently and are fixed 46%--78% faster in our study context. Our results suggest that build failure rates are highly sensitive to contextual factors. Given the large number of factors by which our study contexts differ (e.g., system size, team size, IDE tooling, programming languages), it is not possible to trace the root cause for the large differences in our results. Additional data is needed to arrive at more complete conclusions.
Conference Paper
Continuous Integration (CI) is a cornerstone of modern quality assurance, providing on-demand builds (compilation and tests) of code changes or software releases. Despite the myriad of CI tools and frameworks, the basic activity of interpreting build results is not straightforward, due not only to the number of builds being performed but also, and especially, to the phenomenon of build inflation, whereby one code change can be built on dozens of different operating systems, run-time environments, and hardware architectures. As existing work has mostly ignored this inflation, this paper performs a large-scale empirical study of the impact of OS and run-time environment on build failures across 30 million builds of the CPAN ecosystem's CI environment. We observe the evolution of build failures over time and investigate the impact of OSes and environments on build failures. We show that distributions may fail differently on different OSes and environments and, thus, that CI results require careful filtering and selection to identify reliable failure data.
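The filtering the abstract calls for amounts to disaggregating inflated build results by platform before computing failure rates. A minimal sketch with hypothetical result tuples (the OS/environment names and outcomes are made up for illustration):

```python
from collections import defaultdict

# One code change built on several OS/runtime combinations (build inflation).
results = [
    ("linux",   "perl-5.30", "pass"),
    ("linux",   "perl-5.28", "pass"),
    ("freebsd", "perl-5.30", "fail"),
    ("freebsd", "perl-5.28", "fail"),
    ("darwin",  "perl-5.30", "pass"),
]

per_os = defaultdict(lambda: [0, 0])  # os -> [failures, total]
for os_name, _env, outcome in results:
    per_os[os_name][1] += 1
    if outcome == "fail":
        per_os[os_name][0] += 1

rates = {os_name: f / t for os_name, (f, t) in per_os.items()}
print(rates)  # {'linux': 0.0, 'freebsd': 1.0, 'darwin': 0.0}
```

Here a naive aggregate rate (2/5 = 40%) would suggest a flaky change, while the per-OS view shows the failure is specific to one platform.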
Article
In episode 217 of Software Engineering Radio, host Charles Anderson talks with James Turnbull, a software developer and security specialist who's vice president of services at Docker. Lightweight Docker containers are rapidly becoming a tool for deploying microservice-based architectures.
Article
Building is an integral part of the software development process. However, little is known about the compiler errors that occur in this process. In this paper, we present an empirical study of 26.6 million builds produced during a period of nine months by thousands of developers. We describe the workflow through which those builds are generated, and we analyze failure frequency, compiler error types, and resolution efforts to fix those compiler errors. The results provide insights on how a large organization build process works, and pinpoints errors for which further developer support would be most effective.