Conference Paper

One Size Does Not Fit All: An Empirical Study of Containerized Continuous Deployment Workflows


Abstract

Continuous deployment (CD) is a software development practice aimed at automating delivery and deployment of a software product, following any changes to its code. If properly implemented, CD together with other automation in the development process can bring numerous benefits, including higher control and flexibility over release schedules, lower risks, fewer defects, and easier on-boarding of new developers. Here we focus on the (r)evolution in CD workflows caused by containerization, the virtualization technology that enables packaging an application together with all its dependencies and execution environment in a light-weight, self-contained unit, of which Docker has become the de-facto industry standard. There are many available choices for containerized CD workflows, some more appropriate than others for a given project. Owing to cross-listing of GitHub projects on Docker Hub, in this paper we report on a mixed-methods study to shed light on developers' experiences and expectations with containerized CD workflows. Starting from a survey, we explore the motivations, specific workflows, needs, and barriers with containerized CD. We find two prominent workflows, based on the automated builds feature on Docker Hub or continuous integration services, with different trade-offs. We then propose hypotheses and test them in a large-scale quantitative study.


... configuration file [6], workflow [12], and best configuration practices [10]. Those works have yielded many valuable findings and practical implications for developers, but were not designed to look into the details of Docker builds. ...
... Docker Hub provides GitHub integration as well as some featured tools, e.g., automated builds, which allow developers to build their images automatically from GitHub sources. The build data on Docker Hub is available for mining if the projects are public and use automated builds [12]. ...
... The trends that we observe in the Docker context may be due to several reasons. For example, a previous study [12] verified that Docker build latency tends to increase over time. A time-consuming and burdensome build process may increase the probability of build failure. ...
Conference Paper
Docker containers have become the de-facto industry standard. Docker builds often break, and considerable effort is put into troubleshooting broken builds. Prior studies have evaluated the rate at which builds in large organizations fail. However, little is known about the frequency and fix effort of failures that occur in Docker builds of open-source projects. This paper presents a preliminary study on 857,086 Docker builds from 3,828 open-source projects hosted on GitHub. Using the Docker build data, we measure the frequency of broken builds and report their fix time. Furthermore, we explore the evolution of Docker build failures across time. Our findings help to characterize and understand Docker build failures and motivate the need for collecting more empirical evidence.
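To make such measurements concrete, here is a minimal Python sketch of computing broken-build frequency and fix time from a project's chronological build history; the build-record schema is invented for illustration and is not the actual Docker Hub API.

from datetime import datetime

# Hypothetical per-project build log, ordered by time: (created_at, status).
builds = [
    (datetime(2019, 5, 1, 9, 0), "success"),
    (datetime(2019, 5, 2, 9, 0), "failure"),
    (datetime(2019, 5, 4, 9, 0), "success"),
]

# Frequency of broken builds.
n_failed = sum(1 for _, status in builds if status == "failure")
failure_rate = n_failed / len(builds)

# Fix time: the span from the first failing build to the next succeeding one.
fix_times = []
broken_since = None
for ts, status in builds:
    if status == "failure" and broken_since is None:
        broken_since = ts
    elif status == "success" and broken_since is not None:
        fix_times.append(ts - broken_since)
        broken_since = None

print(failure_rate, fix_times)  # 0.333..., [timedelta(days=2)]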
... Recently, many studies have been conducted on the Docker-based software development process. Specifically, Zhang et al. [17] conducted a mixed-methods study to shed light on developers' experiences and expectations with Docker-enabled workflows. Their results revealed two prominent workflows, based on the automated builds feature on Docker Hub or Continuous Integration services, with different trade-offs. ...
... By following the detection approach used in previous studies [10], [17], [28], in our study we mainly use the Haskell Dockerfile Linter to detect different Dockerfile smells. This tool parses the Dockerfile into an AST and applies rules on top of the AST. ...
... • nInstructions: number of instructions in a Dockerfile, as a proxy for Dockerfile complexity [17], [20]. A large Dockerfile may have more smells than a small Dockerfile; • nContributors: number of project contributors (those who submitted at least one commit), as a proxy for the project team size [34]. ...
Article
Full-text available
Dockerfiles play an important role in the Docker-based software development process, but in practice many Dockerfiles are affected by smells. Understanding the occurrence of Dockerfile smells in open-source software can benefit Dockerfile practice and enhance project maintenance. In this paper, we perform an empirical study on a large dataset of 6,334 projects to help developers gain insights into the occurrence of Dockerfile smells, including its coverage, distribution, co-occurrence, and correlation with project characteristics. Our results show that smells are very common in Dockerfiles and that different types of Dockerfile smells co-occur. Further, using linear regression analysis, when controlling for various variables, we statistically identify and quantify the relationships between Dockerfile smell occurrence and project characteristics. We also provide a rich resource of implications for software practitioners.
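As a rough illustration of this detection approach, the following Python sketch runs the Haskell Dockerfile Linter (hadolint) mentioned in the citing text above and also computes an nInstructions-style complexity proxy; it assumes hadolint is installed on PATH and that its JSON output mode is available.

import json
import subprocess

def dockerfile_smells(path):
    # hadolint parses the Dockerfile and reports rule violations; --format json
    # is one of its documented output formats (treated here as an assumption).
    result = subprocess.run(["hadolint", "--format", "json", path],
                            capture_output=True, text=True)
    return json.loads(result.stdout or "[]")

def n_instructions(path):
    # Proxy for Dockerfile complexity: count instructions, skipping blank
    # lines, comments, and backslash-continued lines (a simplification).
    count, continued = 0, False
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            if not continued:
                count += 1
            continued = line.endswith("\\")
    return count

print(dockerfile_smells("Dockerfile"), n_instructions("Dockerfile"))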
... Similar to the concept of inheritance in object-oriented programming, Docker images can use the FROM instruction to inherit image definitions from another base image [15]. The new image will inherit all the attributes and files encapsulated in the base image [13]. ...
... Adopting a suitable base image is important to Docker image creation, as the choice of base image can affect the quality, efficiency, and size of the resulting image [6], [15]. For instance, Cito et al. [6] suggest that the same application can be run with a different base image to reduce the overall size and preferably also build time. ...
Preprint
Full-text available
Docker containers are being widely used in large-scale industrial environments. In practice, developers must manually specify the base image in the dockerfile during container creation. However, finding the proper base image is a nontrivial task because manual search is time-consuming and easily leads to the use of unsuitable base images, especially for newcomers. There is still a lack of automatic approaches for recommending related base images to developers based on dockerfile configuration. To tackle this problem, this paper makes the first attempt to propose a neural network approach named DCCimagerec, which is based on deep configuration comprehension. It aims to use the structural configuration features of the dockerfile, extracted by an AST and a path-attention model, to recommend potentially suitable base images. Evaluation experiments based on about 83,000 dockerfiles show that DCCimagerec outperforms multiple baselines, improving Precision by 7.5%-67.5%, Recall by 6.2%-106.6%, and F1 by 7.5%-150.2%.
... To ameliorate this problem, we introduce binnacle: the first toolset for semantics-aware rule mining from, and rule enforcement in, Dockerfiles. We selected Dockerfiles as the initial type of artifact because it is the most prevalent DevOps artifact in industry (some 79% of IT companies use it [27]), has become the de-facto container technology in OSS [15,38], and it has a characteristic that we observe in many other types of DevOps artifacts, namely, fragments of shell code are embedded within its declarative structure. ...
... Zhang et al. [38] explored the different methods of continuous deployment (CD) that use containerized deployment. While they found that developers see many benefits when using CD, adopting CD also poses many challenges. ...
Preprint
With the growing use of DevOps tools and frameworks, there is an increased need for tools and techniques that support more than code. The current state-of-the-art in static developer assistance for tools like Docker is limited to shallow syntactic validation. We identify three core challenges in the realm of learning from, understanding, and supporting developers writing DevOps artifacts: (i) nested languages in DevOps artifacts, (ii) rule mining, and (iii) the lack of semantic rule-based analysis. To address these challenges we introduce a toolset, binnacle, that enabled us to ingest 900,000 GitHub repositories. Focusing on Docker, we extracted approximately 178,000 unique Dockerfiles, and also identified a Gold Set of Dockerfiles written by Docker experts. We addressed challenge (i) by reducing the number of effectively uninterpretable nodes in our ASTs by over 80% via a technique we call phased parsing. To address challenge (ii), we introduced a novel rule-mining technique capable of recovering two-thirds of the rules in a benchmark we curated. Through this automated mining, we were able to recover 16 new rules that were not found during manual rule collection. To address challenge (iii), we manually collected a set of rules for Dockerfiles from commits to the files in the Gold Set. These rules encapsulate best practices, avoid docker build failures, and improve image size and build latency. We created an analyzer that used these rules, and found that, on average, Dockerfiles on GitHub violated the rules five times more frequently than the Dockerfiles in our Gold Set. We also found that industrial Dockerfiles fared no better than those sourced from GitHub. The learned rules and analyzer in binnacle can be used to aid developers in the IDE when creating Dockerfiles, and in a post-hoc fashion to identify issues in, and to improve, existing Dockerfiles.
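For flavor, here is a minimal Python sketch of one semantics-aware rule of the kind binnacle mines and enforces. The rule itself (pairing apt-get install with apt-get update and --no-install-recommends inside the same RUN layer) is a widely documented Dockerfile best practice; the encoding below is illustrative and is not binnacle's actual rule representation.

import re

RUN_RE = re.compile(r"^RUN\s+(.*)$", re.IGNORECASE)

def check_apt_rule(dockerfile_text):
    # Single-line RUN instructions only; backslash continuations are not
    # handled in this sketch.
    violations = []
    for lineno, line in enumerate(dockerfile_text.splitlines(), start=1):
        m = RUN_RE.match(line.strip())
        if not m:
            continue
        shell = m.group(1)
        if "apt-get install" in shell:
            if "--no-install-recommends" not in shell:
                violations.append((lineno, "install without --no-install-recommends"))
            if "apt-get update" not in shell:
                violations.append((lineno, "install not paired with update in same RUN"))
    return violations

print(check_apt_rule("FROM ubuntu:20.04\nRUN apt-get install -y curl"))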
... Moreover, containers enable fast deployment and recovery, supporting continuous deployment [7]. Docker is an open-source containerization platform that has been adopted as the de-facto method for virtualization in the industry [14]. ...
... The most difficult topics mentioned in the feedback were networking (37 times), volumes (23), application frameworks (14), Nginx (16), Postgres (8), and CORS (8). Based on the feedback collected from the course staff, the usage of operating systems, namely Linux, was challenging at the beginning of the course as many students were not familiar with the usage of the command-line interface and, for example, creating small script files using bash. ...
Chapter
We present the design of an online course that focuses on container-based virtualization as part of the DevOps toolchain. In addition, we outline the professional background of participants taking the course, and describe how this affects perceived previous knowledge of DevOps. We found that the self-evaluated conceptual understanding of DevOps topics is nearly equal regardless of the participants' professional identity (e.g., student or developer). However, there are significant differences in how much participants have used tools like Docker before. We conclude that there is a clear need for lifelong learning among software engineering professionals, as (future) developers often struggle with operations-related skills such as command-line usage or networking.
... Therefore, in such a case, the development team will not benefit from the test automation that CI tools are supposed to promote. Thus, a commonly acknowledged misconception about CI is that the sole adoption of a CI tool implies proper adherence to CI practices [8], [10]. Indeed, this kind of situation has long been one of the Achilles' heels of agile. ...
Preprint
Full-text available
Background: Continuous Integration (CI) systems are now the bedrock of several software development practices. Several tools such as TravisCI, CircleCI, and Hudson, which implement CI practices, are commonly adopted by software engineers. However, the way that software engineers use these tools could lead to what we call "Continuous Integration Theater", a situation in which software engineers do not employ these tools effectively, leading to unhealthy CI practices. Aims: The goal of this paper is to make sense of how commonplace these unhealthy continuous integration practices are. Method: By inspecting 1,270 open-source projects that use TravisCI, the most used CI service, we quantitatively studied how common it is to use CI (1) with infrequent commits, (2) in a software project with poor test coverage, (3) with builds that stay broken for long periods, and (4) with builds that take too long to run. Results: We observed that 748 (~60%) projects face infrequent commits, which essentially makes the merging process harder. Moreover, we were able to find code coverage information for 51 projects. The average code coverage was 78%, although Ruby projects have higher code coverage than Java projects (86% and 63%, respectively). However, some projects with very low coverage (~4%) were found. Still, we observed that 85% of the studied projects have at least one broken build that takes more than four days to be fixed. Interestingly, very small projects (up to 1,000 lines of code) are the ones that take the longest to fix broken builds. Finally, we noted that, for the majority of the studied projects, the build is executed under the 10-minute rule of thumb. Conclusions: Our results are important to an increasing community of software engineers that employ CI practices on a daily basis but may not be aware of bad practices that are eventually employed.
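Two of these health checks are easy to sketch in Python over a list of build records; the schema below is invented for illustration and is not Travis CI's API.

from datetime import datetime, timedelta

builds = [
    {"state": "failed", "broken_at": datetime(2019, 1, 1),
     "fixed_at": datetime(2019, 1, 7), "duration_min": 12},
    {"state": "passed", "broken_at": None,
     "fixed_at": None, "duration_min": 6},
]

# Builds that stayed broken for more than four days.
long_broken = [b for b in builds
               if b["state"] == "failed" and b["fixed_at"] is not None
               and b["fixed_at"] - b["broken_at"] > timedelta(days=4)]

# Builds violating the 10-minute rule of thumb.
slow = [b for b in builds if b["duration_min"] > 10]

print(len(long_broken), len(slow))  # 1 1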
... Because developer populations overlap between GitHub and SO [32], we identified those developers who also use SO, sampled 200 of them, and sent them an email invitation with a link to the online form. Within 10 days, we received 26 responses, for a response rate of 13%, which is consistent with other software engineering online surveys [33], [34]. Respondents indicated that their experience in the industry was 7.10 years on average (median: 8; range 1-13), while their microservice experience was 3.07 years on average (median: 3; range 1-9). ...
Preprint
Full-text available
Microservice architecture is a dominant architectural style in the SaaS industry, which helps to develop a single application as a collection of independent, well-defined, and inter-communicating services. The number of microservice-related questions in Q&A websites, such as Stack Overflow, has expanded substantially in recent years. Due to its increasing popularity, it is essential to understand the existing problems that microservice developers face in practice as well as the potential solutions to these problems. Such an investigation of problems and solutions is vital for long-term, impactful, and qualified research and practices in the microservice community. Unfortunately, we currently know relatively little about such knowledge. To fill this gap, we conduct a large-scale in-depth empirical study on 17,522 Stack Overflow microservice-related posts. Our analysis leads to the first taxonomy of microservice-related topics based on the software development process. By analyzing the characteristics of the accepted answers, we find that there are fewer experts in the microservice domain than in other domains, and such a phenomenon is most significant with respect to the microservice design phase. Furthermore, we perform manual analysis on 6,013 answers accepted by developers and distill 47 general solution strategies for different microservice-related problems, 22 of which are proposed for the first time. For instance, several problems inherent in the delivery phase can be lessened by referring to external sources like GitHub code examples. Our findings can therefore facilitate research and development on emerging microservice systems.
... Request [27,34], and reports from CI systems require effort to process and can cause unwanted interruptions [39], especially without developer buy-in and in the presence of frequent false positives from flaky tests and platform instabilities [37]. Bad experiences or frustration with a specific CI tool can turn developers away from CI as a practice, even when more customized tool solutions exist [86]. ...
Conference Paper
Continuous integration (CI) is an established software quality assurance practice, and the focus of much prior research with a diverse range of methods and populations. In this paper, we first conduct a literature review of 37 papers on CI pain points. We then conduct a conceptual replication study on results from these papers using a triangulation design consisting of a survey with 132 responses, 12 interviews, and two logistic regressions predicting Travis CI abandonment and switching on a dataset of 6,239 GitHub projects. We report and discuss which past results we were able to replicate, those for which we found conflicting evidence, those for which we did not find evidence, and the implications of these findings.
... DevOps artifacts in general, and Dockerfiles in particular, represent a relatively under-served area with respect to advanced tooling for assisting developers. We focus on Docker because it is the most prevalent DevOps artifact in industry (some 79% of IT companies use it [10]) and the de-facto container technology in OSS [6,12]. Nevertheless, the VS Code Docker extension, with its over 3.7 million unique installations, features relatively shallow syntactic support [8]. ...
Preprint
Dockerfiles are one of the most prevalent kinds of DevOps artifacts used in industry. Despite their prevalence, there is a lack of sophisticated semantics-aware static analysis of Dockerfiles. In this paper, we introduce a dataset of approximately 178,000 unique Dockerfiles collected from GitHub. To enhance the usability of this data, we describe five representations we have devised for working with, mining from, and analyzing these Dockerfiles. Each Dockerfile representation builds upon the previous ones, and the final representation, created by three levels of nested parsing and abstraction, makes tasks such as mining and static checking tractable. The Dockerfiles, in each of the five representations, along with metadata and the tools used to shepherd the data from one representation to the next are all available at: https://doi.org/10.5281/zenodo.3628771.
... The success of many community-based Open Source Software (OSS) projects relies heavily on a large number of volunteer developers [35], [64], [84], [86], who are geographically distributed and collaborate online with others from all over the world [43], [58]. Compared to traditional email-based contribution submission [25], the pull-based model [40] on modern collaborative coding platforms (e.g., GitHub [9] and GitLab [10]) supports a more efficient collaboration process [115] by coupling the code repository with issue tracking, review discussion, and continuous integration, delivery, and deployment [79], [110]. Consequently, an increasing number of OSS projects are adopting the synthesized pull-based mechanism, which helps them improve their productivity [98] and attract more contributors [108]. ...
Article
Full-text available
OSS projects are being developed by globally distributed contributors, who often collaborate through the pull-based model today. While this model lowers the barrier to entry for OSS developers by synthesizing, automating and optimizing the contribution process, coordination among an increasing number of contributors remains a challenge due to the asynchronous and self-organized nature of distributed development. In particular, duplicate contributions, where multiple different contributors unintentionally submit duplicate pull requests to achieve the same goal, are an elusive problem that may waste effort in automated testing, code review and software maintenance. While the issue of duplicate pull requests has been highlighted, the extent to which duplicate pull requests affect development in OSS communities has not been well investigated. In this paper, we conduct a mixed-approach study to bridge this gap. Based on a comprehensive dataset constructed from 26 popular GitHub projects, we obtain the following findings: (a) Duplicate pull requests result in redundant human and computing resources, exerting a significant impact on the contribution and evaluation process. (b) Contributors' inappropriate working patterns and the drawbacks of their collaborating environment might result in duplicate pull requests. (c) Compared to non-duplicate pull requests, duplicate pull requests have significantly different features, e.g., being submitted by inexperienced contributors, fixing bugs, touching cold files, and solving tracked issues. (d) Integrators choosing between duplicate pull requests prefer to accept those with early submission time, accurate and high-quality implementation, broad coverage, test code, high maturity, deep discussion, and active response. Finally, actionable suggestions and implications are proposed for OSS practitioners.
... Containerization enables packaging an application into light-weight, self-contained units; Docker and Kubernetes are prominent technologies in this space. Investigations by Zhang et al. (2018) showed that 45.8% of the respondents have changed from one form of the container workflow to another, and that centralized logging allows containers to share information across the entire set of components. Among the popular container technologies, Docker runs on a single node, whereas Kubernetes is designed to run across a cluster. ...
Article
Software application deployment change management is one of the emerging research themes that is gaining increased focus day by day. Our study examined the factors that affect software application deployment change management in Agile software development settings. Our study provided a systematic review and synthesized the approaches, practices, and challenges reported for adopting and implementing deployment change management. The prime objective of our study was to systematically synthesize the data extracted and formulate evidence-based practical recommendations that are influential in software deployment change management. Six research themes are proposed to evaluate the rationale of the research question. This qualitative study and systematic review explored the pertinent research articles and key findings from prominent academic databases. Based on the selected criteria, the final screening revealed 25 articles from an immense set of publications. Key findings that emerged from these publications are correlated with the six research themes: (a) timely communication with all stakeholders; (b) the reliance of deployment approaches on past experience; (c) the importance of collaboration among team members having adequate knowledge of DevOps tools; (d) the ramification of the differences among development, test, and production environments; (e) the influential areas that reap the benefits of continuous delivery and deployment; and (f) the challenges of the effective use of containerization. We also found indications of the significance of Lewin’s three-step change process model in the Agile development and deployment environment. Overall, our study deepens understanding of this thriving research area and contributes to the literature on Agile deployment and the software change management process.
... External validity. External validity is the validity of the generalizations of the findings in this study beyond the context under study [51]. When performing qualitative analyses of the instances of FAs, we selected a set of 300 questions, which is a relatively limited sample set. ...
Conference Paper
Full-text available
Modern programming question & answer (Q&A) sites such as Stack Overflow (SO) employ gamified mechanisms to stimulate volunteers' contributions. To maximize the chances of winning gamification rewards such as reputation and badges, a portion of users race to post answers as quickly as possible (i.e., fast answers or FAs), which makes SO the fastest Q&A site; however, this behavior may affect the contribution quality as well. In this paper, we report on a large-scale, mixed-methods empirical study of the gamification-influenced FA phenomenon in SO. We first quantitatively investigate the popularity of the phenomenon and user behaviors regarding FAs. Then, we study the quality of FAs by using regression modeling and qualitatively analyzing 300 instances of FAs. Our main findings reveal that more than 70% and 90% of FAs are not edited by the answerers and other users, respectively, and that later incoming answers have lower chances of being voted on and accepted. Notably, we find that the answer length, code snippets length, and readability of FAs are significantly lower than those of non-fast answers. Although FAs have higher crowd assessment scores, they have no relationship with acceptance from the perspective of asker assessment, and a considerable portion of FAs solve the problem by interacting with the asker in the comments. These results help us better understand the effects of reward-based gamification on crowdsourced software engineering communities and provide implications for designers of gamified systems.
... A deployment pipeline should include explicit stages, e.g., building and packaging, to transfer code from a source repository to the production environment [2]. Zhang et al. [31] investigated the barriers developers face when using containerized CD workflow and the trade-offs developers must make when choosing different CD workflows. ...
Preprint
Automation has become a norm in software development practices, especially in CI/CD. Recently, GitHub introduced GitHub Actions (GA) to provide automated workflows for software maintainers. However, little research has evaluated its capability and impact, even though practitioners have already built many GA workflows. In this paper, we conduct a large-scale empirical study of GitHub projects to help practitioners gain deep insights into GA. We quantitatively investigate the basic adoption of GA and its potential correlation with project properties. We also analyze the usage details of GA, including its component scale and action sequences. Finally, using regression modeling, we investigate the impact of GA on commit frequency and on the resolution efficiency of pull requests and issues. Our findings suggest a nuanced picture of how practitioners are adapting to, and benefiting from, GA.
Chapter
Over the past decade, continuous software development has become commonplace in the field of software engineering. Containers like Docker are a lightweight solution that developers can use to deploy and manage applications. Containers are used to build both component-based architectures and microservice architectures. Still, practitioners often view containers only as a way to lower resource requirements compared to virtual machines. In this paper, we conducted a systematic mapping study to survey what is known about how containers are used in software development. 56 primary studies were selected, categorized, and mapped to identify the gaps in the current research. Based on the results, containers are most often discussed in the context of cloud computing, performance, and DevOps. We find that what is currently missing is more deeply focused research.
Preprint
Full-text available
Despite substantial recent research activity related to continuous delivery and deployment (CD), there has not yet been a systematic, empirical study on how the practices often associated with continuous deployment have found their way into the broader software industry. This raises the question of to what extent our knowledge of the area is dominated by the peculiarities of a small number of industrial leaders, such as Facebook. To address this issue, we conducted a mixed-method empirical study, consisting of a pre-study on literature, qualitative interviews with 20 software developers or release engineers with heterogeneous backgrounds, and a Web-based quantitative survey that attracted 187 complete responses. A major trend in the results of our study is that architectural issues are currently one of the main barriers to CD adoption. Further, feature toggles as an implementation technique for partial rollouts lead to unwanted complexity, and require research on better abstractions and modelling techniques for runtime variability. Finally, we conclude that practitioners are in need of more principled approaches to release decision making, e.g., which features to conduct A/B tests on, or which metrics to evaluate.
Conference Paper
Full-text available
Continuous Integration (CI) and Continuous Delivery (CD) are widespread in both industrial and open-source software (OSS) projects. Recent research characterized build failures in CI and identified factors potentially correlated to them. However, most observations and findings of previous work are exclusively based on OSS projects or data from a single industrial organization. This paper provides a first attempt to compare the CI processes and occurrences of build failures in 349 Java OSS projects and 418 projects from a financial organization, ING Nederland. Through the analysis of 34,182 failing builds (26% of the total number of observed builds), we derived a taxonomy of failures that affect the observed CI processes. Using cluster analysis, we observed that in some cases OSS and ING projects share similar build failure patterns (e.g., few compilation failures as compared to frequent testing failures), while in other cases completely different patterns emerge. In short, we explain how OSS and ING CI processes exhibit commonalities, yet are substantially different in their design and in the failures they report.
Article
Full-text available
Context: Continuous practices, i.e., continuous integration, delivery, and deployment, are the software development industry practices that enable organizations to frequently and reliably release new features and products. With the increasing interest in and literature on continuous practices, it is important to systematically review and synthesize the approaches, tools, challenges, and practices reported for adopting and implementing continuous practices. Objective: This research aimed at systematically reviewing the state of the art of continuous practices to classify approaches and tools, identify challenges and practices in this regard, and identify the gaps for future research. Method: We used the systematic literature review (SLR) method for reviewing the peer-reviewed papers on continuous practices published between 2004 and 1st June 2016. We applied the thematic analysis method for analysing the data extracted from 69 papers selected using predefined criteria. Results: We have identified thirty approaches and associated tools, which facilitate the implementation of continuous practices in the following ways: (1) "reducing build and test time in continuous integration (CI)"; (2) "increasing visibility and awareness on build and test results in CI"; (3) "supporting (semi-) automated continuous testing"; (4) "detecting violations, flaws and faults in CI"; (5) "addressing security and scalability issues in deployment pipeline", and (6) "improving dependability and reliability of deployment process". We have also determined a list of critical factors such as "testing (effort and time)", "team awareness and transparency", "good design principles", "customer", "highly skilled and motivated team", "application domain", and "appropriate infrastructure" that should be carefully considered when introducing continuous practices in a given organization.
Article
Full-text available
The need for ever-shorter development cycles, continuous delivery, and cost savings in cloud-based infrastructures led to the rise of containers, which are more flexible than virtual machines and provide near-native performance. Among all container solutions, Docker, a complete packaging and software delivery tool, currently leads the market. This article gives an overview of the container ecosystem and discusses the Docker environment's security implications through realistic use cases. The authors define an adversary model, point out several vulnerabilities affecting current Docker usage, and discuss further research directions.
Conference Paper
Full-text available
Continuous deployment is the software engineering practice of deploying many small incremental software updates into production, leading to a continuous stream of 10s, 100s, or even 1,000s of deployments per day. High-profile Internet firms such as Amazon, Etsy, Facebook, Flickr, Google, and Netflix have embraced continuous deployment. However, the practice has not been covered in textbooks and no scientific publication has presented an analysis of continuous deployment. In this paper, we describe the continuous deployment practices at two very different firms: Facebook and OANDA. We show that continuous deployment does not inhibit productivity or quality even in the face of substantial engineering team and code size growth. To the best of our knowledge, this is the first study to show it is possible to scale the size of an engineering team by 20X and the size of the code base by 50X without negatively impacting developer productivity or software quality. Our experience suggests that top-level management support of continuous deployment is necessary, and that given a choice, developers prefer faster deployment. We identify elements we feel make continuous deployment viable and present observations from operating in a continuous deployment environment.
Conference Paper
Full-text available
Software development is conducted in increasingly dynamic business environments. Organizations need the capability to develop, release and learn from software in rapid parallel cycles. The abilities to continuously deliver software, to involve users, and to collect and prioritize their feedback are necessary for software evolution. In 2014, we introduced Rugby, an agile process model with workflows for continuous delivery and feedback management, and evaluated it in university projects together with industrial clients. Based on Rugby’s release management workflow we identified the specific needs for project-based organizations developing mobile applications. Varying characteristics and restrictions in projects teams in corporate environments impact both process and infrastructure. We found that applicability and acceptance of continuous delivery in industry depend on its adaptability. To address issues in industrial projects with respect to delivery process, infrastructure, neglected testing and continuity, we extended Rugby’s workflow and made it tailorable. Eight projects at Capgemini, a global provider of consulting, technology and outsourcing services, applied a tailored version of the workflow. The evaluation of these projects shows anecdotal evidence that the application of the workflow significantly reduces the time required to build and deliver mobile applications in industrial projects, while at the same time increasing the number of builds and internal deliveries for feedback.
Article
Full-text available
Continuous delivery (CD) has emerged as an auspicious alternative to traditional release engineering, promising to provide the capability to release valuable software continuously to customers. Paddy Power has been implementing CD for the past two years. This article explains why Paddy Power decided to adopt CD, describes the resulting CD capability, and reports the huge benefits and challenges involved. These experiences can provide fellow practitioners with insights for their adoption of CD, and the identified challenges can provide researchers valuable input for developing their research agendas.
Article
Full-text available
As computational work becomes more and more integral to many aspects of scientific research, computational reproducibility has become an issue of increasing importance to computer systems researchers and domain scientists alike. Though computational reproducibility seems more straightforward than replicating physical experiments, the complex and rapidly changing nature of computer environments makes being able to reproduce and extend such work a serious challenge. In this paper, I explore common reasons that code developed for one research project cannot be successfully executed or extended by subsequent researchers. I review current approaches to these issues, including virtual machines and workflow systems, and their limitations. I then examine how the popular emerging technology Docker combines several areas from systems research - such as operating system virtualization, cross-platform portability, modular re-usable elements, versioning, and a 'DevOps' philosophy - to address these challenges. I illustrate this with several examples of Docker use with a focus on the R statistical environment.
Article
Full-text available
Nakagawa & Schielzeth extended the widely used goodness-of-fit statistic R² to apply to generalized linear mixed models (GLMMs). However, their R²GLMM method is restricted to models with the simplest random effects structure, known as random intercepts models. It is not applicable to another common random effects structure, random slopes models. I show that R²GLMM can be extended to random slopes models using a simple formula that is straightforward to implement in statistical software. This extension substantially widens the potential application of R²GLMM.
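For context, the quantities involved can be sketched as follows (standard notation assumed; this is a summary, not the paper's exact derivation). The marginal R² for a GLMM is

R^2_{GLMM(m)} = \frac{\sigma_f^2}{\sigma_f^2 + \sum_{l=1}^{u} \sigma_l^2 + \sigma_\varepsilon^2}

where \sigma_f^2 is the variance of the fixed-effect predictions, the \sigma_l^2 are the variances of the u random effects, and \sigma_\varepsilon^2 is the residual variance. The random-slopes extension replaces each \sigma_l^2 with a mean random-effect variance, the average of the diagonal elements of Z_l \Sigma_l Z_l^\top, which reduces to the random-intercept variance in the intercepts-only case.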
Conference Paper
Full-text available
Continuous delivery is a set of practices and principles to release software faster and more frequently. While it helps to bridge the gap between developers and operations for software in production, it can also improve the communication between developers and customers in the development phase, i.e. before software is in production. It shortens the feedback cycle and developers ideally use it right from the beginning of a software development project. In this paper we describe the implementation of a customized continuous delivery workflow and its benefits in a multi-customer project course in summer 2013. Our workflow focuses on the ability to deliver software with only a few clicks to the customer in order to obtain feedback as early as possible. This helps developers to validate their understanding about requirements, which is especially helpful in agile projects where requirements might change often. We describe how we integrated this workflow and the role of the release manager into our project-based organization and how we introduced it using different teaching methods. Within three months 90 students worked in 10 different projects with real customers from industry and delivered 490 releases. After the project course we evaluated our approach in an online questionnaire and in personal interviews. Our findings and observations show that participating students understood and applied the concepts and are convinced about the benefits of continuous delivery.
Conference Paper
Full-text available
Large open and closed source organizations like Google, Facebook and Mozilla are migrating their products towards rapid releases. While this allows faster time-to-market and user feedback, it also implies less time for testing and bug fixing. Since initial research results indeed show that rapid releases fix proportionally fewer reported bugs than traditional releases, this paper investigates the changes in software testing effort after moving to rapid releases. We analyze the results of 312,502 execution runs of the 1,547 mostly manual system-level test cases of Mozilla Firefox from 2006 to 2012 (5 major traditional and 9 major rapid releases), and triangulated our findings with a Mozilla QA engineer. In rapid releases, testing has a narrower scope that enables deeper investigation of the features and regressions with the highest risk, while traditional releases run the whole test suite. Furthermore, rapid releases make it more difficult to build a large testing community, forcing Mozilla to increase contractor resources in order to sustain testing for rapid releases.
Article
Full-text available
The use of categorical variables in regression involves the application of coding methods. The purpose of this paper is to describe how categorical independent variables can be incorporated into regression by virtue of two coding methods: dummy and effect coding. The paper discusses the uses, interpretations, and underlying assumptions of each method. In general, overall results of the regression are unaffected by the methods used for coding the categorical independent variables. In any of the methods, the analysis tests whether group membership is related to the dependent variables. Both methods yield identical R² and F. However, the interpretations of the intercept and regression coefficients depend on what coding method has been applied and whether the groups have equal sample sizes.
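A small Python/pandas sketch of the two coding schemes for a three-level categorical predictor (group labels invented for illustration):

import pandas as pd

group = pd.Series(["A", "A", "B", "B", "C", "C"], name="group")

# Dummy coding: one level ("A") is dropped as the reference; the regression
# intercept is the reference-group mean and each coefficient contrasts a
# group against that reference.
dummy = pd.get_dummies(group, drop_first=True, dtype=int)

# Effect coding: same columns, but reference-level rows are coded -1; the
# intercept becomes the unweighted grand mean of the group means.
effect = dummy.copy()
effect.loc[group == "A"] = -1

print(dummy.join(group))
print(effect.join(group))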
Conference Paper
Full-text available
Agile software development is well-known for its focus on close customer collaboration and customer feedback. In emphasizing flexibility, efficiency and speed, agile practices have led to a paradigm shift in how software is developed. However, while agile practices have succeeded in involving the customer in the development cycle, there is an urgent need to learn from customer usage of software also after delivery and deployment of the software product. The concept of continuous deployment, i.e. the ability to deliver software functionality frequently to customers and subsequently, the ability to continuously learn from real-time customer usage of software, has become attractive to companies realizing the potential in having even shorter feedback loops. However, the transition towards continuous deployment involves a number of barriers. This paper presents a multiple-case study in which we explore barriers associated with the transition towards continuous deployment. Based on interviews at four different software development companies we present key barriers in this transition as well as actions that need to be taken to address these.
Article
Full-text available
The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses – the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.
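A compact Python sketch of the sequential (step-up) procedure described here, with illustrative p-values:

import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    # Find the largest rank k with p_(k) <= (k/m) * q, then reject the k
    # hypotheses with the smallest p-values.
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= (np.arange(1, m + 1) / m) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        reject[order[:k + 1]] = True
    return reject

pvals = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298,
         0.0344, 0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.0000]
print(benjamini_hochberg(pvals))  # rejects the four smallest p-values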
Conference Paper
Continuous Integration (CI) services, which can automatically build, test, and deploy software projects, are an invaluable asset in distributed teams, increasing productivity and helping to maintain code quality. Prior work has shown that CI pipelines can be sophisticated, and choosing and configuring a CI system involves tradeoffs. As CI technology matures, new CI tool offerings arise to meet the distinct wants and needs of software teams, as they negotiate a path through these tradeoffs, depending on their context. In this paper, we begin to uncover these nuances, and tell the story of open-source projects falling out of love with Travis, the earliest and most popular cloud-based CI system. Using logistic regression, we quantify the effects that open-source community factors and project technical factors have on the rate of Travis abandonment. We find that increased build complexity reduces the chances of abandonment, that larger projects abandon at higher rates, and that a project's dominant language has significant but varying effects. Finally, we find the surprising result that metrics of configuration attempts and knowledge dispersion in the project do not affect the rate of abandonment.
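Such regressions are straightforward to sketch; the following Python example fits a logistic model of abandonment on build complexity and project size with statsmodels. The data and column names are invented for illustration and are not the paper's dataset.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "abandoned":  [1, 0, 0, 1, 0, 1, 0, 0, 1, 0],  # 1 = project left Travis
    "complexity": [2, 9, 3, 1, 8, 6, 7, 5, 4, 2],  # e.g., CI config size
    "log_size":   [8.1, 5.2, 6.9, 7.3, 4.8, 5.0, 5.5, 6.2, 7.0, 5.1],
})

# A negative coefficient on complexity would mirror the paper's finding that
# increased build complexity reduces the chances of abandonment.
model = smf.logit("abandoned ~ complexity + log_size", data=df).fit()
print(model.summary())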
Conference Paper
This paper introduces the idea of mining container image repositories for configuration and other deployment information of software systems. Unlike traditional software repositories (e.g., source code repositories and app stores), image repositories encapsulate the entire execution ecosystem for running target software, including its configurations, dependent libraries and components, and OS-level utilities, which contributes to a wealth of data and information. We showcase the opportunities based on concrete software engineering tasks that can benefit from mining image repositories. To facilitate future mining efforts, we summarize the challenges of analyzing image repositories and the approaches that can address these challenges. We hope that this paper will stimulate exciting research agenda of mining this emerging type of software repositories.
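As a starting point for such mining, repository metadata can be fetched from Docker Hub's public web API; the endpoint shape and field names below reflect common usage and should be treated as assumptions rather than a stable contract. A Python sketch:

import requests

# Metadata for the official nginx repository on Docker Hub.
url = "https://hub.docker.com/v2/repositories/library/nginx/"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
meta = resp.json()

# Fields like pull_count, star_count, and last_updated are commonly present;
# use .get() since the response schema is not officially guaranteed.
print(meta.get("pull_count"), meta.get("star_count"), meta.get("last_updated"))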
Conference Paper
Continuous integration (CI) systems automate the compilation, building, and testing of software. Despite CI being a widely used activity in software engineering, we do not know what motivates developers to use CI, and what barriers and unmet needs they face. Without such knowledge, developers make easily avoidable errors, tool builders invest in the wrong direction, and researchers miss opportunities for improving the practice of CI. We present a qualitative study of the barriers and needs developers face when using CI. We conduct semi-structured interviews with developers from different industries and development scales. We triangulate our findings by running two surveys. We find that developers face trade-offs between speed and certainty (Assurance), between better access and information security (Security), and between more configuration options and greater ease of use (Flexibility). We present implications of these trade-offs for developers, tool builders, and researchers.
Article
Continuous deployment involves automatically testing incremental software changes and frequently deploying them to production environments. With it, developers' changes can reach customers in days or even hours. Such ultrafast changes create a new reality in software development. To understand the emerging practices surrounding continuous deployment, researchers facilitated a one-day Continuous Deployment Summit at the Facebook campus in July 2015, at which participants from 10 companies described how they used continuous deployment. From the resulting conversation, the researchers derived 10 adages about continuous-deployment practices. These adages represent a working set of approaches and beliefs that guide current practice and establish a tangible target for empirical validation by the research community.
Conference Paper
Docker containers have recently become a popular approach to provision multiple applications over shared physical hosts in a more lightweight fashion than traditional virtual machines. This popularity has led to the creation of the Docker Hub registry, which distributes a large number of official and community images. In this paper, we study the state of security vulnerabilities in Docker Hub images. We create a scalable Docker image vulnerability analysis (DIVA) framework that automatically discovers, downloads, and analyzes both official and community images on Docker Hub. Using our framework, we have studied 356,218 images and made the following findings: (1) both official and community images contain more than 180 vulnerabilities on average when considering all versions; (2) many images have not been updated for hundreds of days; and (3) vulnerabilities commonly propagate from parent images to child images. These findings demonstrate a strong need for more automated and systematic methods of applying security updates to Docker images, and our current Docker image analysis framework provides a good foundation for such automated security updates.
Conference Paper
Continuous Delivery is an agile software development practice in which developers frequently integrate changes into the main development line and produce releases of their software. An automated Continuous Integration infrastructure builds and tests these changes. Claimed advantages of CD include early discovery of (integration) errors, reduced cycle time, and better adoption of coding standards and guidelines. This paper reports on a study in which we surveyed 152 developers of a large financial organization (ING Nederland), and investigated how they adopt a Continuous Integration and delivery pipeline during their development activities. In our study, we focus on topics related to managing technical debt, as well as test automation practices. The survey results shed light on the adoption of some agile methods in practice, and sometimes confirm, while in other cases, confute common wisdom and results obtained in other studies. For example, we found that refactoring tends to be performed together with other development activities, technical debt is almost always "self-admitted", developers timely document source code, and assure the quality of their product through extensive automated testing, with a third of respondents dedicating more than 50% of their time to do testing activities.
Conference Paper
Continuous deployment is the practice of releasing software updates to production as soon as it is ready, which is receiving increased adoption in industry. The frequency of updates of mobile software has traditionally lagged the state of practice for cloud-based services for a number of reasons. Mobile versions can only be released periodically. Users can choose when and if to upgrade, which means that several different releases coexist in production. There are hundreds of Android hardware variants, which increases the risk of having errors in the software being deployed. Facebook has made significant progress in increasing the frequency of its mobile deployments. Over a period of 4 years, the Android release has gone from a deployment every 8 weeks to a deployment every week. In this paper, we describe in detail the mobile deployment process at FB. We present our findings from an extensive analysis of software engineering metrics based on data collected over a period of 7 years. A key finding is that the frequency of deployment does not directly affect developer productivity or software quality. We argue that this finding is due to the fact that increasing the frequency of continuous deployment forces improved release and deployment automation, which in turn reduces developer workload. Additionally, the data we present shows that dog-fooding and obtaining feedback from alpha and beta customers is critical to maintaining release quality.
Conference Paper
This paper illustrates how Jenkins evolved from being a pure Continuous Integration platform to a Continuous Delivery one, embracing the new design tendency where not only the build but also the release and delivery process of the product is automated. In this scenario, Jenkins becomes the orchestrator tool for all the teams/roles involved in the software lifecycle, thanks to which Development, Quality Assurance and Operations teams can work closely together. The goal of this paper is not only to position Jenkins as a hub for CD, but also to introduce the challenges that still need to be solved in order to strengthen Jenkins' tracking capabilities.
Conference Paper
Docker is an open platform for developers and system administrators to build, ship, and run distributed applications using Docker Engine, a portable, lightweight runtime and packaging tool, and Docker Hub, a cloud service for sharing applications and automating workflows. The main advantage is that Docker can get code tested and deployed into production as fast as possible. Different applications can be run in Docker containers independently of their implementation language. In this paper, the performance of Docker containers is evaluated in terms of system resource utilization. File system performance is evaluated using Bonnie++; other system resources, such as CPU and memory utilization, are evaluated with benchmarking code developed in Python using psutil. Detailed results obtained from all these tests are also included in this paper, covering CPU utilization, memory utilization, CPU count, CPU times, disk partitions, network I/O counters, etc.
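In the same spirit as the psutil-based benchmarking this abstract describes, a minimal Python snapshot of the reported metrics (all calls below are part of psutil's public API) might look like:

import psutil

print("CPU count:", psutil.cpu_count())
print("CPU utilization (%):", psutil.cpu_percent(interval=1))  # 1 s sample
print("CPU times:", psutil.cpu_times())
print("Memory utilization (%):", psutil.virtual_memory().percent)
print("Disk partitions:", psutil.disk_partitions())
print("Network I/O counters:", psutil.net_io_counters())

Run inside a container, the same script reports that container's view of the host resources.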
Article
BACKGROUND - The software intensive industry is moving towards the adoption of a value-driven and adaptive real-time business paradigm. The traditional view of software as an item that evolves through releases every few months is being replaced by continuous evolution of software functionality. OBJECTIVE - This study aims to classify and analyse literature related to continuous deployment in the software domain in order to scope the phenomenon, provide an overview of its state-of-the-art, investigate the scientific evidence in the reported results and identify areas that are suitable for further research. METHOD - We conducted a systematic mapping study and classified the continuous deployment literature. The benefits and challenges related to continuous deployment were also analyzed. RESULTS - The systematic mapping study includes 50 primary studies published between 2001 and 2014. An in-depth analysis of the primary studies revealed ten recurrent themes that characterize continuous deployment and provide researchers with directions for future work. In addition, a set of benefits and challenges of which practitioners may take advantage were identified. CONCLUSION - Overall, although the topic area is very promising, it is still in its infancy, thus offering a plethora of new opportunities for both researchers and software intensive companies.
Article
In episode 217 of Software Engineering Radio, host Charles Anderson talks with James Turnbull, a software developer and security specialist who's vice president of services at Docker. Lightweight Docker containers are rapidly becoming a tool for deploying microservice-based architectures.
Article
As part of a Finnish research program, researchers interviewed 15 information and communications technology companies to determine the extent to which the companies adopted continuous deployment. They also aimed to find out why continuous deployment is considered beneficial and what the obstacles are to its full adoption. The benefits mentioned the most often were the ability to get faster feedback, the ability to deploy more often to keep customers satisfied, and improved quality and productivity. Despite understanding the benefits, none of the companies adopted a fully automatic deployment pipeline. The companies also had higher continuous-deployment capability than what they practiced. In many cases, they consciously chose to not aim for full continuous deployment. Obstacles to full adoption included domain-imposed restrictions, resistance to change, customer desires, and developers' skill and confidence.
Article
This issue's "Cloud Tidbit" focuses on container technology and how it's emerging as an important part of the cloud computing infrastructure. It looks at Docker, an open source project that automates the faster deployment of Linux applications, and Kubernetes, an open source cluster manager for Docker containers.
Article
Using grounded theory as an example, this paper examines three methodological questions that are generally applicable to all qualitative methods. How should the usual scientific canons be reinterpreted for qualitative research? How should researchers report the procedures and canons used in their research? What evaluative criteria should be used in judging the research products? The basic argument we propose is that the criteria should be adapted to fit the procedures of the method. We demonstrate how we have done this with grounded theory and suggest criteria for evaluating studies done in this mode. We suggest that other qualitative researchers might be similarly specific about their procedures and evaluative criteria.
Article
Context Continuous deployment (CD) is an emerging software development process, with organisations such as Facebook, Microsoft, and IBM successfully implementing and using it. The CD process aims to deploy software to customers as soon as new code is developed, and can result in a number of benefits for organisations, such as new business opportunities, reduced risk for each release, and less wasted development effort. There is little academic literature on the challenges organisations face when adopting the CD process; however, many anecdotal challenges have been voiced by organisations on their online blogs. Objective The aim of this research is to examine the challenges faced by organisations when adopting CD, as well as the strategies used to mitigate these challenges. Method An explorative case study involving in-depth interviews with software practitioners in an organisation that has adopted CD was conducted to identify these challenges. Results The study found a total of 20 technical and social adoption challenges that organisations may face when adopting the CD process. The results are discussed to gain a deeper understanding of the strategies employed by organisations to mitigate the impact of these challenges. Conclusion While a number of individual technical and social adoption challenges were uncovered by the case study, most challenges were not faced in isolation, and their severity was reduced by a number of mitigation strategies employed by the case study organisation. It is concluded that organisations need to be well prepared to handle technical and social adoption challenges with their existing expertise, processes, and tools before adopting the CD process. For practitioners, knowing how to address the challenges an organisation may face when adopting CD provides a level of awareness that they previously may not have had.
Article
Docker promises the ability to package applications and their dependencies into lightweight containers that move easily between different distros, start up quickly and are isolated from each other.
Article
Continuous integration has been around for a while now, but the habits it suggests are far from common practice. Automated builds, a thorough test suite, and committing to the mainline branch every day sound simple at first, but they require a responsible team to implement and constant care. What starts with improved tooling can be a catalyst for long-lasting change in your company's shipping culture. Continuous integration is more than a set of practices; it's a mindset with one thing in mind: increasing customer value. The Web extra at http://youtu.be/tDl_cHfrJZo is an audio podcast of the Tools of the Trade column discussing this theme.
Conference Paper
Rally Software transitioned from shipping code every eight weeks, with time-boxed Scrum sprints, to a model of continuous delivery with Kanban. The team encountered complex challenges with their build systems, automated test suites, customer enablement, and internal communication. But there was light at the end of the tunnel: greater control and flexibility over feature releases, incremental delivery of value, lower risks, fewer defects, easier on-boarding of new developers, less off-hours work, and a considerable uptick in confidence. This experience report describes the journey to continuous delivery, with the aim that others can learn from our mistakes and get their teams deploying more frequently. We describe and contrast this transition from the business (product management) and engineering perspectives.
Conference Paper
Doing high-quality research about the human side of software engineering necessitates the participation of real software developers in studies, but achieving high levels of participation is a challenge for software engineering researchers. In this paper, we discuss several factors that software engineering researchers can use when recruiting participants, drawn from a combination of general research on survey design, research on persuasion, and our own experience in conducting surveys. We study these factors by performing post-hoc analysis on several previously conducted surveys. Our results provide insight into the factors associated with increased response rates, which are neither wholly those identified by persuasion research nor those of conventional wisdom in software engineering.
Article
The use of both linear and generalized linear mixed-effects models (LMMs and GLMMs) has become popular not only in the social and medical sciences, but also in the biological sciences, especially in the field of ecology and evolution. Information criteria, such as the Akaike Information Criterion (AIC), are usually presented as model comparison tools for mixed-effects models. The presentation of 'variance explained' (R2) as a relevant summarizing statistic of mixed-effects models, however, is rare, even though R2 is routinely reported for linear models (LMs) and also generalized linear models (GLMs). R2 has the extremely useful property of providing an absolute value for the goodness-of-fit of a model, which cannot be given by the information criteria. As a summary statistic that describes the amount of variance explained, R2 can also be a quantity of biological interest. One reason for the under-appreciation of R2 for mixed-effects models lies in the fact that R2 can be defined in a number of ways. Furthermore, most definitions of R2 for mixed-effects models have theoretical problems (e.g., decreased or negative R2 values in larger models) and/or their use is hindered by practical difficulties (e.g., implementation). Here, we make a case for the importance of reporting R2 for mixed-effects models. We first provide the common definitions of R2 for LMs and GLMs and discuss the key problems associated with calculating R2 for mixed-effects models. We then recommend a general and simple method for calculating two types of R2 (marginal and conditional R2) for both LMMs and GLMMs, which are less susceptible to common problems. This method is illustrated by examples and can be widely employed by researchers in any field of research, regardless of the software packages used for fitting mixed-effects models. The proposed method has the potential to facilitate the presentation of R2 for a wide range of circumstances.
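For reference, the two statistics recommended in that work are commonly written as follows for an LMM, where sigma^2_f is the variance explained by the fixed effects, sigma^2_l (l = 1, ..., u) are the random-effect variance components, and sigma^2_e is the residual variance (our paraphrase of the method, in LaTeX):

R^2_{\text{marginal}} = \frac{\sigma^2_f}{\sigma^2_f + \sum_{l=1}^{u} \sigma^2_l + \sigma^2_e}
\qquad
R^2_{\text{conditional}} = \frac{\sigma^2_f + \sum_{l=1}^{u} \sigma^2_l}{\sigma^2_f + \sum_{l=1}^{u} \sigma^2_l + \sigma^2_e}

Marginal R2 is thus the share of total variance attributable to the fixed effects alone, while conditional R2 credits the fixed and random effects together; the GLMM versions replace the residual variance with a distribution-specific variance term.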
Conference Paper
The software as a service (SaaS) delivery model has recently become an acceptable business model for software delivery. However, the attractiveness of SaaS to customers, with their low upfront costs and seamless delivery, changes the game in ways that challenge traditional software development methodologies. As customers begin to treat software as an on demand utility, expectations of high uptime, predictable quality, strong security and privacy, and lower total costs become at least as critical as time-honored feature functionality requirements. The efficiency and accuracy of delivery can make or break a business. As demands upon the service increase and change, the resilience and transparent scalability of the operational architecture becomes as important as that of the software running within it. This paper explores how these new challenges parallel modern demands within manufacturing, and how many key lean manufacturing techniques might be applicable within this emerging field.
Conference Paper
Testing and deployment can be a difficult and time-consuming process in complex environments comprising application servers, messaging infrastructure, and interfaces to external systems. We have seen deployments take several days, even in cases where teams have used automated builds to ensure their code is fully tested. In this paper we describe principles and practices that allow new environments to be created, configured, and deployed to at the click of a button. We show how to fully automate your testing and deployment process using a multi-stage automated workflow. Using this "deployment production line", it is possible to deploy fully tested code into production environments quickly and with full confidence that you can easily fall back to a previous version should a problem occur.
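To make the "deployment production line" idea concrete, here is a minimal sketch of such a multi-stage workflow in Python (entirely hypothetical: the stage names and the build.sh/deploy.sh/run_tests.sh scripts are placeholders, not artifacts from the paper), where each stage must succeed before the next runs and a failed production deployment triggers a rollback to the previous version:

import subprocess
import sys

# Each stage: (name, command, rollback command or None). All commands are placeholders.
STAGES = [
    ("compile-and-unit-test", ["./build.sh"], None),
    ("deploy-to-staging", ["./deploy.sh", "staging"], None),
    ("acceptance-tests", ["./run_tests.sh", "staging"], None),
    ("deploy-to-production", ["./deploy.sh", "production"],
     ["./deploy.sh", "production", "--previous-version"]),
]

def run_pipeline():
    for name, command, rollback in STAGES:
        print(f"Running stage: {name}")
        if subprocess.run(command).returncode == 0:
            continue  # stage passed; move down the production line
        print(f"Stage '{name}' failed.", file=sys.stderr)
        if rollback is not None:
            print("Falling back to the previous version.", file=sys.stderr)
            subprocess.run(rollback)
        return False
    return True

if __name__ == "__main__":
    sys.exit(0 if run_pipeline() else 1)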
Conference Paper
One purpose of empirical software engineering is to enable an understanding of the factors that influence software development. Surveys are an appropriate empirical strategy for gathering data from a large population (e.g., about methods, tools, developers, companies) and achieving an understanding of that population. Although surveys are performed quite often, for example, in the social sciences and in marketing research, they are underrepresented in empirical software engineering research, which most often uses controlled experiments and case studies. Consequently, methodological support for performing such studies in software engineering is rather limited. However, with the increasing pervasiveness of the Internet it is possible to perform surveys easily and cost-effectively over Internet pages (i.e., on-line), while at the same time interest in performing surveys is growing. The purpose of this paper is twofold. First, we want to raise awareness of on-line surveys and discuss methods for performing them in the context of software engineering. Second, we report our experience in performing on-line surveys in the form of lessons learned and guidelines.
Travis CI. 2018. Build Environment Overview. Retrieved July 17, 2018 from https://docs.travis-ci.com/user/reference/overview/
Travis CI. 2018. Installing a newer Docker version. Retrieved July 17, 2018 from https://docs.travis-ci.com/user/docker/#Installing-a-newer-Docker-version
Travis CI. 2018. Using Docker Compose. Retrieved July 17, 2018 from https://docs.travis-ci.com/user/docker/#Using-Docker-Compose
Travis CI. 2018. Using Docker in Builds. Retrieved July 17, 2018 from https://docs.travis-ci.com/user/docker/