Article

Continuous Integration and Its Tools


Abstract

Continuous integration has been around for a while now, but the habits it suggests are far from common practice. Automated builds, a thorough test suite, and committing to the mainline branch every day sound simple at first, but they require a responsible team and constant care to implement. What starts with improved tooling can be a catalyst for long-lasting change in your company's shipping culture. Continuous integration is more than a set of practices; it's a mindset that has one thing in mind: increasing customer value. The Web extra at http://youtu.be/tDl_cHfrJZo is an audio podcast of the Tools of the Trade column in which the author discusses this mindset.
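The practice the abstract describes boils down to an automated gate that runs on every mainline commit: pull the latest changes, build, and run the test suite. A minimal sketch of such a gate is shown below; the build and test commands and the branch name are placeholders, not details from the column.

```python
# Minimal sketch of the automated gate CI applies to every mainline commit:
# integrate the latest changes, build, and run the test suite, failing loudly on error.
# The commands and branch name are illustrative placeholders.
import subprocess
import sys

STEPS = [
    ["git", "pull", "--ff-only", "origin", "main"],  # integrate the latest mainline changes
    ["make", "build"],                               # automated build
    ["make", "test"],                                # thorough test suite
]

def run_pipeline() -> int:
    for step in STEPS:
        print("running:", " ".join(step))
        result = subprocess.run(step)
        if result.returncode != 0:
            print("integration broken at step:", " ".join(step), file=sys.stderr)
            return result.returncode  # fail fast so the team fixes the mainline first
    print("build green: safe to keep committing to mainline")
    return 0

if __name__ == "__main__":
    sys.exit(run_pipeline())
```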

... To merge several changes efficiently, this process should be automated as much as possible. Common version control systems (VCS) are capable of automatically merging changes as long as there are no conflicting changes [28]. For this, [28] provides an exhaustive overview of tools and approaches. ...
... 1. Continuous product planning considers various inputs (customer feedback, market data, etc.) and prepares plans on leveraging software engineering to attain organizational objectives (Lin 2018; Provost and Fawcett 2013). 2. Continuous software integration comprises continuous development, configuration management, testing, integration, and other activities to produce working software (Meyer 2014; Felidré et al. 2019; Hilton et al. 2017). 3. Continuous deployment makes the latest software features available for delivery to end-users (Zhu et al. 2016; Senapathi et al. 2018). ...
... Step #3 (Pinto et al. 2018; Kim et al. 2008; Meyer 2014; Hilton et al. 2017; Hilton et al. 2016; Felidré et al. 2019). ...
Article
Full-text available
Context: Software companies must become better at delivering software to remain relevant in the market. Continuous integration and delivery practices promise to streamline software deliveries to end-users by implementing an automated software development and delivery pipeline. However, implementing or retrofitting an organization with such a pipeline is a substantial investment, while the reporting on benefits and their relevance in specific contexts/domains is vague. Aim: In this study, we explore continuous software engineering practices from an investment-benefit perspective. We identify what benefits can be attained by adopting continuous practices, what the associated investments and risks are, and analyze what parameters determine their relevance. Method: We perform a multiple case study to understand state-of-practice, organizational aims, and challenges in adopting continuous software engineering practices. We compare state-of-practice with state-of-the-art to validate the best practices and identify relevant gaps for further investigation. Results: We found that companies start the CI/CD adoption by automating and streamlining the internal development process with clear and immediate benefits. However, upgrading customers to continuous deliveries is a major obstacle due to existing agreements and customer push-back. Renegotiating existing agreements comes with a risk of losing customers and disrupting the whole organization. Conclusions: We conclude that the benefits of CI/CD are overstated in the literature without considering the contextual and domain complexities that render some benefits infeasible. We identify the need to further understand the customer and organizational perspectives and the contextual requirements towards CI/CD.
... Overall, build pipelines are perfectly suited to improve the quality of code through automated testing, which enables developers to "Commit Daily, Commit Often" and in this way introduce a culture that can greatly improve debugging capabilities [9]. ...
Article
Full-text available
To implement software for autonomous driving in Formula Student competitions, frequent retraining of neural networks (NNs) is critical to adapt to evolving datasets and dynamic team structures. This paper presents a Continuous Integration (CI) pipeline that automates data labeling, augmentation, training, and validation, focusing on lightweight NNs for perception tasks. Initially developed for Sensor-Fusion within the perception component, the pipeline was later adapted to the Vision-Only approach. The challenges of transitioning pipelines across differing sensor setups and software solutions and the feasibility of reusing existing artifacts are evaluated, highlighting the resilience and limitations of CI pipelines over time. Insights from deployment emphasize the importance of standardization and robust documentation to ensure adaptability and sustainability in rapidly evolving software environments.
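The retrain-validate-integrate loop described here can be pictured as a small gating step inside the pipeline, sketched below. The training stub, the accuracy threshold, and the artifact path are assumptions for illustration, not details from the paper.

```python
# Sketch of a retrain-validate-integrate gate such a CI pipeline might run on new data.
# train_model, the 0.90 accuracy bar, and the artifact path are illustrative placeholders.
from dataclasses import dataclass

@dataclass
class TrainedModel:
    accuracy: float
    artifact_path: str

def train_model(dataset_path: str) -> TrainedModel:
    # stand-in for the actual training job (labeling, augmentation, training)
    return TrainedModel(accuracy=0.93, artifact_path="artifacts/perception.onnx")

def validate(model: TrainedModel, min_accuracy: float = 0.90) -> bool:
    # automated validation step: only models above the bar are integrated
    return model.accuracy >= min_accuracy

def pipeline(dataset_path: str) -> str:
    model = train_model(dataset_path)
    if not validate(model):
        raise RuntimeError("validation failed: model not integrated into the code base")
    return model.artifact_path  # a later stage would commit/publish this artifact

if __name__ == "__main__":
    print("promoted:", pipeline("data/cones-2024/"))
```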
... This is especially the case for VCS and CI processes. While most of the technologies are loosely coupled or even independent from each other, for the successful realization of a CI process, a VCS is strictly required (Meyer, 2014). When a developer pushes code to the repository, VCS tracks changes to know the code's version. ...
Conference Paper
Full-text available
The growing digitalization influences almost all areas of our modern life. Companies are aligning themselves with this movement already. However, public institutions in particular still have problems that primarily originate from high costs. Serverless computing is a new trend in cloud computing that enables modern software development to build and run services without managing the underlying infrastructure, solely providing single functionalities. To identify whether serverless computing offers benefits in terms of use and cost, especially for higher educational institutions, two autograder approaches for programming assignments that utilize services offered by the Google Cloud Platform are designed, developed, and implemented. While one is built using serverless technologies, the other is based on regular cloud technologies and best practices from existing literature. Eventually, a thorough evaluation is performed, in which both are investigated in terms of execution time, load, cold start, cost, and usability.
... MCR is integrated into modern software development pipelines, and all leading configuration management platforms enable this way of working. Git and Gerrit [4,5] are two examples of such tools, where developers can review others' code before it is integrated with the main branch. ...
Article
Background: Modern Code Reviews (MCR) are frequently adopted when assuring code and design quality in continuous integration and deployment projects. Although tiresome, they serve a secondary purpose of learning about the software product. Aim: Our objective is to design and evaluate a support tool to help software developers focus on the most important code fragments to review and provide them with suggestions on what should be reviewed in this code. Method: We used design science research to develop and evaluate a tool for automating code reviews by providing recommendations for code reviewers. The tool is based on Transformer-based machine learning models for natural language processing, applied to both programming language code (patch content) and the review comments. We evaluate both the ability of the language model to match similar lines and the ability to correctly indicate the nature of the potential problems encoded in a set of categories. We evaluated the tool on two open-source projects and one industry project. Results: The proposed tool was able to correctly annotate (only true positives) 35%–41% and partially correctly annotate 76%–84% of code fragments to be reviewed with labels corresponding to different aspects of code the reviewer should focus on. Conclusion: By comparing our study to similar solutions, we conclude that indicating lines to be reviewed and suggesting the nature of the potential problems in the code allows us to achieve higher accuracy than suggesting entire changes in the code considered in other studies. Also, we have found that the differences depend more on the consistency of commenting rather than on the ability of the model to find similar lines.
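To make the "match similar lines" idea concrete, the sketch below retrieves the most similar previously reviewed line, and its review label, for a changed line using plain TF-IDF cosine similarity. This is a deliberately simplified lexical stand-in for the Transformer-based models the paper evaluates; the code snippets and labels are invented.

```python
# Simplified stand-in for the "match similar lines" step: retrieve the most similar
# previously reviewed line (and its review label) for each changed line.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reviewed_lines = [
    ("if (user == null) return;", "missing error handling"),
    ("String sql = \"SELECT * FROM t WHERE id=\" + id;", "possible SQL injection"),
    ("for (int i = 0; i <= items.size(); i++)", "off-by-one risk"),
]
changed_lines = ["String q = \"SELECT name FROM users WHERE id=\" + userId;"]

corpus = [line for line, _ in reviewed_lines] + changed_lines
vectors = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit_transform(corpus)
similarities = cosine_similarity(vectors[len(reviewed_lines):], vectors[:len(reviewed_lines)])

for line, sims in zip(changed_lines, similarities):
    best = sims.argmax()
    print(f"review focus: {line!r} -> {reviewed_lines[best][1]} (score {sims[best]:.2f})")
```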
... In CI, developers are encouraged to commit their changes early and often. Changes that are smaller and more regularly integrated are easier to debug when something breaks [9]. Thus, build times are very important when integrations are frequent. ...
Preprint
Automated builds are integral to the Continuous Integration (CI) software development practice. In CI, developers are encouraged to integrate early and often. However, long build times can be an issue when integrations are frequent. This research focuses on finding a balance between integrating often and keeping developers productive. We propose and analyze models that can predict the build time of a job. Such models can help developers to better manage their time and tasks. Also, project managers can explore different factors to determine the best setup for a build job that will keep the build wait time to an acceptable level. Software organizations transitioning to CI practices can use the predictive models to anticipate build times before CI is implemented. The research community can modify our predictive models to further understand the factors and relationships affecting build times.
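As a rough illustration of the kind of predictive model the preprint proposes, the sketch below fits a regression model to job features and predicts build duration. The feature set and the toy data are invented; the paper's actual features and models may differ.

```python
# Illustrative build-time predictor: regress build duration on job features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# columns: lines changed, files changed, number of tests, cached dependencies (0/1)
X = np.array([
    [120, 4, 350, 1],
    [15, 1, 340, 1],
    [900, 30, 400, 0],
    [60, 2, 360, 1],
    [450, 12, 380, 0],
    [30, 1, 355, 1],
    [700, 25, 395, 0],
    [200, 6, 365, 1],
])
y = np.array([310, 250, 1450, 280, 900, 260, 1300, 420])  # observed build times (s)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
print("MAE (s):", mean_absolute_error(y_test, model.predict(X_test)))
print("predicted build time (s):", model.predict([[300, 8, 370, 0]])[0])
```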
... Although complicated and expensive to build and maintain, distributed and heterogeneous wireless testbeds offer the most realistic experimental conditions [3]. While the cost of adding additional hardware functionality to support emerging technologies is difficult to overcome, the cost of developing software for the increased variety and number of wireless devices can be significantly reduced using continuous integration (CI) principles and tools [4]. In particular, open source software projects have benefited significantly from adopting CI principles that significantly reduce the workload of the core maintainers. ...
Preprint
Full-text available
Network testing plays an important role in the iterative process of developing new communication protocols and algorithms. However, test environments have to keep up with the evolution of technology and require continuous update and redesign. In this paper, we propose COINS, a framework that can be used by wireless technology developers to enable continuous integration (CI) practices in their testbed infrastructure. As a proof-of-concept, we provide a reference architecture and implementation of COINS for controlled testing of multi-technology 5G Machine Type Communication (MTC) networks. The implementation upgrades an existing wireless experimentation testbed with new software and hardware functionalities. It blends web service technology and operating system virtualization technologies with emerging Internet of Things technologies enabling CI for wireless networks. Moreover, we also extend an existing qualitative methodology for comparing similar frameworks and identify and discuss open challenges for wider use of CI practices in wireless technology development.
... While CI testing is essential to identify and fix issues that could lead to build failures (Abdalkareem et al. 2019), performing such CI in real-world industry settings is far from a trivial task (Herzig et al. 2015). On the one hand, a large number of commits are produced daily (Meyer 2014). Each such commit has to undergo the compilation and test activities; the latter can involve hundreds to thousands of tests to be run. ...
Article
Full-text available
Context: Continuous Integration (CI) is a resource intensive, widely used industry practice. The two most commonly used heuristics to reduce the number of builds are either grouping multiple builds together or skipping builds predicted to be safe. Yet, both techniques have their disadvantages in terms of missed build failures and higher build turn-around time (delays), respectively. Objective: We aim to bring together these two lines of research, empirically comparing their advantages and disadvantages over time, and proposing and evaluating two ways in which these build avoidance heuristics can be combined more effectively, i.e., the ML-CI model based on machine learning and the Timeout Rule. Method: We empirically study the trade-off between the reduction in the number of builds required and the speed of recognition of failing builds on a dataset of 79,482 builds from 20 open-source projects. Results: We find that both of our hybrid heuristics can provide a significant improvement in terms of fewer missed build failures and lower delays than the baseline heuristics. They substantially reduce the turn-around time of commits by 96% in comparison to skipping heuristics; the Timeout Rule also enables a median of 26.10% fewer builds to be scheduled than grouping heuristics. Conclusions: Our hybrid approaches offer build engineers better flexibility in scheduling builds during CI without compromising the quality of the resulting software.
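One possible reading of a hybrid heuristic along the lines of the Timeout Rule is sketched below: commits predicted to be safe are grouped and deferred, but a pending group is force-built once its oldest commit has waited longer than a timeout, bounding feedback delay. The risk predictor and the 30-minute timeout are placeholders, not the calibrated values from the paper.

```python
# Simplified hybrid build-avoidance heuristic: skip/group safe commits,
# but bound the delay with a timeout on the oldest pending commit.
TIMEOUT_SECONDS = 30 * 60
pending = []  # commits accumulated since the last build

def predicted_safe(commit: dict) -> bool:
    return commit.get("risk", 0.0) < 0.2  # stand-in for a learned risk model

def on_new_commit(commit: dict, now: float) -> bool:
    """Return True if a build of all pending commits should be scheduled now."""
    pending.append({**commit, "arrived": now})
    oldest_wait = now - pending[0]["arrived"]
    if not predicted_safe(commit):
        return True                        # risky change: build immediately
    return oldest_wait >= TIMEOUT_SECONDS  # otherwise build only once the timeout expires

def schedule_build() -> list:
    group, pending[:] = list(pending), []
    return group  # one build covers the whole group of accumulated commits

if __name__ == "__main__":
    on_new_commit({"id": "abc123", "risk": 0.05}, now=0.0)       # safe: deferred
    if on_new_commit({"id": "def456", "risk": 0.90}, now=60.0):  # risky: triggers a build
        print("build group:", [c["id"] for c in schedule_build()])
```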
... To assure the functional safety of highly configurable systems, system testing is important [1,2]. However, system testing often mandates a trade-off between efficient test execution (i.e., the time it takes to execute all test cases) and test coverage (i.e., how many system configurations were covered by the testing procedure) [3,4,5,6,7]. This trade-off is even more severe for configurable systems since test cases must be executed on multiple system configurations. ...
Preprint
Full-text available
Ensuring the functional safety of highly configurable systems often requires testing representative subsets of all possible configurations to reduce testing effort and save resources. The ratio of covered t-wise feature interactions (i.e., T-Wise Feature Interaction Coverage) is a common criterion for determining whether a subset of configurations is representative and capable of finding faults. Existing t-wise sampling algorithms uniformly cover t-wise feature interactions for all features, resulting in lengthy execution times and large sample sizes, particularly when large t-wise feature interactions are considered (i.e., high values of t). In this paper, we introduce a novel approach to t-wise feature interaction sampling, questioning the necessity of uniform coverage across all t-wise feature interactions, called MulTiWise. Our approach prioritizes between subsets of critical and non-critical features, considering higher t-values for subsets of critical features when generating a t-wise feature interaction sample. We evaluate our approach using subject systems from real-world applications, including BusyBox, Soletta, Fiasco, and uClibc. Our results show that sacrificing uniform t-wise feature interaction coverage between all features reduces the time needed to generate a sample and the resulting sample size. Hence, MulTiWise Sampling offers an alternative to existing approaches if knowledge about feature criticality is available.
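The coverage demands behind non-uniform t-wise sampling can be enumerated directly, using a higher t for a critical feature subset than for the rest, as sketched below. The feature names and t-values are invented, and the sketch only enumerates the interactions a sample would have to cover; it does not implement the paper's sampling algorithm.

```python
# Enumerate t-wise feature-interaction coverage demands, with higher t for critical features.
from itertools import combinations, product

critical = ["crypto", "watchdog", "brake_ctrl"]
non_critical = ["logging", "telemetry", "ui_theme", "locale"]

def interactions(features, t):
    """All t-wise feature interactions as (feature, value) tuples over boolean features."""
    demands = set()
    for combo in combinations(features, t):
        for values in product([True, False], repeat=t):
            demands.add(tuple(zip(combo, values)))
    return demands

to_cover = interactions(critical, t=3) | interactions(non_critical, t=2)
print("critical 3-wise interactions:", len(interactions(critical, t=3)))
print("non-critical 2-wise interactions:", len(interactions(non_critical, t=2)))
print("total coverage demands:", len(to_cover))
```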
... While continuous integration focuses on automating the build and test processes to ensure quality and consistency in the code, continuous deployment is focused on automating the process of deploying code changes to test, staging and production environments. The process for continuous integration and deployment is also explained in [2], [3]. In the absence of continuous integration and continuous deployment practices, code changes are manually promoted to the test environment for validation. ...
Article
Full-text available
To cater to the changing needs of businesses, enterprises are adopting processes that allow rapid iteration and feedback loops. Today, development teams work closely with the business, leveraging agile methods to gather feedback, assess the impact of changes and deploy changes in a short duration. By taking advantage of the microservices architecture (MSA), large monolithic code is logically broken down into microservices that can be developed, deployed and scaled independently. Applications are leveraging containerization and orchestration technologies along with microservices architecture to package, deploy and manage the code across different environments. The underlying infrastructure and agile processes need to be supported with robust methods to perform code integration and deployments without any service disruption. This paper provides a qualitative assessment of the following code deployment techniques: (i) recreating deployments, (ii) rolling deployment, (iii) blue-green deployment, and (iv) canary deployment. This assessment can guide enterprises to identify the right code deployment strategy that can be adopted based on the business use case. Next, the paper dives deeper into how the built-in capabilities of Kubernetes, along with open-source tools like Istio, can be leveraged to implement canary deployments for service changes. The paper presents a novel technique for performing canary deployments whereby service and database changes can be promoted to production by leveraging Istio and Liquibase along with a load balancer without incurring any downtime for the application. The paper provides a complete canary deployment reference architecture that can be adopted by enterprises pursuing zero downtime for continuous deployments.
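A canary rollout of the kind assessed here can be reduced to a loop that shifts traffic in steps and rolls back on a bad health signal. The sketch below is a simplified controller, not the Istio/Liquibase reference architecture from the paper; the traffic steps, error threshold, and telemetry hook are assumptions.

```python
# Simplified canary rollout loop: shift traffic in steps, watch an error-rate signal,
# and roll back on regression. Telemetry and routing are stubbed out.
import random

def observe_error_rate(canary_weight: int) -> float:
    return random.uniform(0.0, 0.02)  # stand-in for real telemetry

def set_traffic_split(canary_weight: int) -> None:
    print(f"routing {canary_weight}% of traffic to canary, {100 - canary_weight}% to stable")

def canary_rollout(steps=(5, 25, 50, 100), max_error_rate=0.01) -> bool:
    for weight in steps:
        set_traffic_split(weight)
        if observe_error_rate(weight) > max_error_rate:
            set_traffic_split(0)  # roll back: all traffic to the stable version
            return False
    return True  # canary promoted to 100% without breaching the error budget

if __name__ == "__main__":
    print("promoted" if canary_rollout() else "rolled back")
```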
... Get to know the popular DevOps tools like Git, Jenkins, Docker, Kubernetes, Terraform and how they can be used to automate and manage infrastructure and applications [49]. Make sure to thoroughly prepare for the technical round by reviewing the job requirements, practicing scenario-based questions, and familiarizing yourself with common DevOps tools and practices [50]. ...
Article
Full-text available
DevOps has become a critical component of modern software development, bridging the gap between development and operations teams to ensure efficient software delivery. With the increasing demand for skilled DevOps professionals, it is essential for job seekers to effectively prepare for the interview process. This article explores the typical DevOps interview process, including the recruiter screening, take-home challenge, hiring manager screen, coding round, and technical round. It provides strategies and best practices for navigating each stage successfully. The preparation strategies discussed include outlining and strengthening the required skills, practicing articulating answers and storytelling, solving coding problems on platforms like LeetCode, reviewing common DevOps interview questions, gaining hands-on experience with relevant tools and technologies, and staying updated with the latest trends and best practices in the field. By understanding the interview patterns, diligently preparing, and demonstrating a combination of technical expertise, problem-solving abilities, and effective communication skills, candidates can increase their chances of success in securing a desired DevOps role. The article emphasizes the importance of customizing preparation to the specific role and company requirements, setting aside adequate time for practice, and adopting the right mindset to confidently navigate the DevOps interview process.
... Build Automation: Understanding build automation tools like Maven and Gradle is essential for success [32]. These tools make it easier to compile, test, and deploy software artifacts, which improves efficiency and reliability in the development pipeline [33]. ...
Article
Full-text available
This article provides valuable insights and guidance for aspiring DevOps professionals on building a strong portfolio to succeed in the rapidly evolving technology industry. It highlights the importance of DevOps in driving innovation, efficiency, and collaboration within organizations, and emphasizes the growing demand for skilled professionals in this field. The article discusses the versatility and flexibility required in DevOps roles, and outlines essential skills such as version control, build automation, repository management, CI/CD, scripting and automation, configuration management, Infrastructure as Code, containerization, container orchestration, monitoring and observability, and cloud platforms. It also emphasizes the significance of additional skills, including systems design, data structures and algorithms, programming languages, and security. The article provides practical advice on building a compelling DevOps portfolio through personal projects, illustrating expertise, showcasing a range of projects, effective documentation and presentation, and continuous improvement. By following these guidelines and consistently enhancing their skills, aspiring DevOps professionals can position themselves for a successful and rewarding career in this dynamic field.
... Typically, the expected generalization performance can be expressed as the expected loss E_{(x,y)∼p}[L(y; f(x))], approximated by the mean loss over held-out data. - Reliability and reproducibility: a reliable AI system is one that works properly with a range of inputs and in a range of situations, while reproducibility describes whether an AI experiment exhibits the same behaviour when repeated under the same conditions. This idea is tied with the software engineering concept of continuous integration [44], that is, is the algorithm auditable? (e.g. ...
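The garbled formula in this excerpt appears to be the standard empirical estimate of the expected loss; a plausible reconstruction (our notation, not necessarily the cited paper's) is:

```latex
\mathbb{E}_{(x,y)\sim p}\bigl[L(y; f(x))\bigr]
\;\approx\;
\frac{1}{N}\sum_{i=1}^{N} L\bigl(y_i; f(x_i)\bigr),
\qquad (x_i, y_i) \text{ drawn from held-out data.}
```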
Article
Full-text available
Business reliance on algorithms is becoming ubiquitous, and companies are increasingly concerned about their algorithms causing major financial or reputational damage. High-profile cases include Google’s AI algorithm for photo classification mistakenly labelling a black couple as gorillas in 2015 (Gebru 2020 In The Oxford handbook of ethics of AI, pp. 251–269), Microsoft’s AI chatbot Tay that spread racist, sexist and antisemitic speech on Twitter (now X) (Wolf et al. 2017 ACM Sigcas Comput. Soc. 47, 54–64 (doi:10.1145/3144592.3144598)), and Amazon’s AI recruiting tool being scrapped after showing bias against women. In response, governments are legislating and imposing bans, regulators fining companies and the judiciary discussing potentially making algorithms artificial ‘persons’ in law. As with financial audits, governments, business and society will require algorithm audits; formal assurance that algorithms are legal, ethical and safe. A new industry is envisaged: Auditing and Assurance of Algorithms (cf. data privacy), with the remit to professionalize and industrialize AI, ML and associated algorithms. The stakeholders range from those working on policy/regulation to industry practitioners and developers. We also anticipate the nature and scope of the auditing levels and framework presented will inform those interested in systems of governance and compliance with regulation/standards. Our goal in this article is to survey the key areas necessary to perform auditing and assurance and instigate the debate in this novel area of research and practice.
... A common metric to quantify the operational carbon emissions of energy consumption is called carbon intensity, which describes the grams of CO2-equivalent greenhouse gases emitted per kilowatt-hour of consumed energy (gCO2/kWh). A significant portion of the work performed in cloud data centers pertains to services that enable the automated testing, building, and delivery of software, often referred to as continuous integration and delivery (CI/CD) [3]. According to a survey conducted by the CD Foundation [4], the use of CI/CD in the context of development and operation has become a standard practice today and has steadily increased. ...
Chapter
Full-text available
While the environmental impact of cloud computing is increasingly evident, the climate crisis has become a major issue for society. For instance, data centers alone account for 2.7% of Europe's energy consumption today. A considerable part of this load is accounted for by cloud-based services for automated software development, such as continuous integration and delivery (CI/CD) workflows. In this paper, we discuss opportunities and challenges for greening CI/CD services by better aligning their execution with the availability of low-carbon energy. We propose a system architecture for carbon-aware CI/CD services, which uses historical runtime information and, optionally, user-provided information. Our evaluation examines the potential effectiveness of different scheduling strategies using real carbon intensity data and 7,392 workflow executions of GitHub Actions, a popular CI/CD service. Results show that user-provided information on workflow deadlines can effectively improve carbon-aware scheduling.
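A deadline-aware, carbon-aware scheduling strategy of the kind evaluated in the chapter can be sketched as picking the start time with the lowest mean carbon intensity within the user-provided deadline. The forecast values below are invented and the scheduler is deliberately minimal, not the proposed system architecture.

```python
# Pick the start hour minimising mean carbon intensity over the workflow's runtime,
# subject to a user-provided deadline. Forecast values are invented.
def best_start_hour(forecast_gco2_per_kwh, runtime_hours, deadline_hours):
    """Return the start hour (0 = now) with the lowest mean intensity over the run."""
    last_start = min(deadline_hours - runtime_hours, len(forecast_gco2_per_kwh) - runtime_hours)
    candidates = {
        start: sum(forecast_gco2_per_kwh[start:start + runtime_hours]) / runtime_hours
        for start in range(0, last_start + 1)
    }
    return min(candidates, key=candidates.get)

# hourly forecast for the next 12 hours; a 2-hour workflow must finish within 8 hours
forecast = [480, 455, 430, 390, 310, 280, 300, 350, 410, 460, 470, 490]
start = best_start_hour(forecast, runtime_hours=2, deadline_hours=8)
print(f"delay the workflow by {start} hours to minimise carbon intensity")
```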
... Overall, build pipelines are perfectly suited to improve the quality of code through automated testing, which enables developers to "Commit Daily, Commit Often" and thereby introduce a culture that can greatly improve debugging capabilities [8]. ...
Conference Paper
The perception component of the autonomous driving software of the FS223, a low-level sensor fusion of Lidar and camera data, requires the use of a neural network for image classification. To keep the neural network up to date with updates in the training data, we introduce a Continuous Integration (CI) pipeline to re-train the network. The network is then automatically validated and integrated into the code base of the autonomous system. The introduction of proper CI methods in these high-speed embedded software applications is an application of state-of-the-art MLOps techniques that aim to provide rapid generation of production-ready models. It further serves the purpose of professionalizing the otherwise script-based software production, which is re-done almost completely every year as the teams change from one year to the next.
... A third-party alternative, however, offers fewer possibilities for customisation. An important aspect of a CI/CD pipeline is modularity, where each step is independent and easier to maintain and troubleshoot [49]. This can also make the pipeline more efficient, as the entire process can be shortened if the application fails in earlier steps. ...
Article
Full-text available
As software applications continue to become more complex and attractive to cyber-attackers, enhancing resilience against cyber threats becomes essential. Aiming to provide more robust solutions, different approaches were proposed for vulnerability detection in different stages of the application life-cycle. This article explores three main approaches to application security: Static Application Security Testing (SAST), Dynamic Application Security Testing (DAST), and Software Composition Analysis (SCA). The analysis conducted in this work is focused on open-source solutions while considering commercial solutions to show contrast in the approaches taken and to better illustrate the different options available. It proposes a baseline comparison model to help evaluate and select the best solutions, using comparison criteria that are based on community standards. This work also identifies future opportunities for application security, highlighting some of the key challenges that still need to be addressed in order to fully protect against emerging threats, and proposes a workflow that combines the identified tools to be used for vulnerability assessments.
... Continuous Integration is a software development practice where team members integrate their work frequently, and each integration is verified by an automated build (including test) to detect integration errors [35]. Such automation has been reported to increase productivity significantly in both industrial projects [10,36,37]. Driven by these success stories, the impact of CI on the open source software development process has become a topic of active research. ...
Preprint
Full-text available
Code review is a popular practice where developers critique each other's changes. Since automated builds can identify low-level issues (e.g., syntactic errors, regression bugs), it is not uncommon for software organizations to incorporate automated builds in the code review process. In such code review deployment scenarios, submitted change sets must be approved for integration by both peer code reviewers and automated build bots. Since automated builds may produce an unreliable signal of the status of a change set (e.g., due to "flaky" or non-deterministic execution behaviour), code review tools, such as Gerrit, allow developers to request a "recheck", which repeats the build process without updating the change set. We conjecture that an unconstrained recheck command will waste time and resources if it is not applied judiciously. To explore how the recheck command is applied in a practical setting, in this paper, we conduct an empirical study of 66,932 code reviews from the OpenStack community. We quantitatively analyze (i) how often build failures are rechecked; (ii) the extent to which invoking recheck changes build failure outcomes; and (iii) how much waste is generated by invoking recheck. We observe that (i) 55% of code reviews invoke the recheck command after a failing build is reported; (ii) invoking the recheck command only changes the outcome of a failing build in 42% of the cases; and (iii) invoking the recheck command increases review waiting time by an average of 2,200% and equates to 187.4 compute years of waste, enough compute resources to compete with the oldest land living animal on earth.
... They use problem-based learning as a teaching method, starting from the problem and showing why people use it and what they are using. In this context, Meyer et al. [58] motivate the adoption of CI by facing the following main problem: "How do you verify whether someone's changes broke something in the code or whether the changes work in the larger context of the entire codebase?". Any educator could start teaching CI from this problem motivation and not only focus on teaching CI tools and their respective functionalities. ...
... They have reduced the number of mutants and test executions, but they are still not suitable for large systems like Google's, on which more than ten million test executions take place every day [19]. To avoid this, recent advances employ the context of Continuous Integration (CI) [20]: predictive mutation testing techniques [21,22,23] aim to predict mutation testing results without mutant executions by learning static and dynamic features of mutants from earlier versions of the project. ...
Preprint
Full-text available
Context: Automated fault localisation aims to assist developers in the task of identifying the root cause of the fault by narrowing down the space of likely fault locations. Simulating variants of the faulty program called mutants, several Mutation Based Fault Localisation (MBFL) techniques have been proposed to automatically locate faults. Despite their success, existing MBFL techniques suffer from the cost of performing mutation analysis after the fault is observed. Method: To overcome this shortcoming, we propose a new MBFL technique named SIMFL (Statistical Inference for Mutation-based Fault Localisation). SIMFL localises faults based on the past results of mutation analysis that has been done on the earlier version in the project history, allowing developers to make predictions on the location of incoming faults in a just-in-time manner. Using several statistical inference methods, SIMFL models the relationship between test results of the mutants and their locations, and subsequently infers the location of the current faults. Results: The empirical study on Defects4J dataset shows that SIMFL can localise 113 faults on the first rank out of 224 faults, outperforming other MBFL techniques. Even when SIMFL is trained on the predicted kill matrix, SIMFL can still localise 95 faults on the first rank out of 194 faults. Moreover, removing redundant mutants significantly improves the localisation accuracy of SIMFL by the number of faults localised at the first rank up to 51. Conclusion: This paper proposes a new MBFL technique called SIMFL, which exploits ahead-of-time mutation analysis to localise current faults. SIMFL is not only cost-effective, as it does not need a mutation analysis after the fault is observed, but also capable of localising faults accurately.
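A much-simplified illustration of the underlying idea, scoring code locations by how well the kill patterns of their past mutants match the currently failing tests, is sketched below. This is not SIMFL's statistical model; the kill matrix, test names, and similarity measure are invented for illustration.

```python
# Toy illustration of ahead-of-time, mutation-based fault localisation:
# rank locations by overlap between past mutant kill sets and the failing tests.
from collections import defaultdict

# historical kill matrix: (location, mutant) -> set of tests that killed the mutant
kill_matrix = {
    ("Foo.java:42", "m1"): {"testAdd", "testSum"},
    ("Foo.java:42", "m2"): {"testAdd"},
    ("Bar.java:10", "m3"): {"testParse"},
    ("Baz.java:77", "m4"): {"testSum", "testParse"},
}

def rank_locations(failing_tests: set) -> list:
    scores = defaultdict(float)
    for (location, _mutant), killed_by in kill_matrix.items():
        overlap = len(failing_tests & killed_by)
        union = len(failing_tests | killed_by)
        scores[location] = max(scores[location], overlap / union if union else 0.0)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_locations({"testAdd", "testSum"}))  # Foo.java:42 should rank first
```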
... Sarma et al., 2016), archiving and deployment of data, code and reports (e.g. this manuscript, White et al., 2018), and the interpretation, integration and usage of data and software across different sources (see Pasquier et al., 2017). In this context, small modifications to code and data can be frequently committed and automatically tested, as in continuous integration and continuous deployment practices (Meyer, 2014). This allows for early detection and correction of errors, potentially improving confidence in scientific development by minimizing software errors (see Soergel, 2015). ...
Article
Full-text available
1. Researchers in ecology and evolutionary biology are increasingly dependent on computational code to conduct research. Hence, the use of efficient methods to share, reproduce, and collaborate on code as well as document research is fundamental. GitHub is an online, cloud-based service that can help researchers track, organize, discuss, share, and collaborate on software and other materials related to research production, including data, code for analyses, and protocols. Despite these benefits, the use of GitHub in ecology and evolution is not widespread.
2. To help researchers in ecology and evolution adopt useful features from GitHub to improve their research workflows, we review 12 practical ways to use the platform.
3. We outline features ranging from low to high technical difficulty, including storing code, managing projects, coding collaboratively, conducting peer review, writing a manuscript, and using automated and continuous integration to streamline analyses. Given that members of a research team may have different technical skills and responsibilities, we describe how the optimal use of GitHub features may vary among members of a research collaboration.
4. As more ecologists and evolutionary biologists establish their workflows using GitHub, the field can continue to push the boundaries of collaborative, transparent, and open research.
... The software evolution process has been developing towards full automation during the past decade [18], [27], yet, it remains, in essence, human-driven. With the increasing complexity of computing systems, the rapid advancements of technologies, the need for systems operating 24/7, and the ever-changing demands on computing systems, human-driven evolution will eventually become unmanageable [5], [10], [31]. ...
Preprint
Full-text available
Engineering long-running computing systems that achieve their goals under ever-changing conditions poses significant challenges. Self-adaptation has been shown to be a viable approach to dealing with changing conditions. Yet, the capabilities of a self-adaptive system are constrained by its operational design domain (ODD), i.e., the conditions for which the system was built (requirements, constraints, and context). Changes, such as adding new goals or dealing with new contexts, require system evolution. While the system evolution process has been automated substantially, it remains human-driven. Given the growing complexity of computing systems, human-driven evolution will eventually become unmanageable. In this paper, we provide a definition for ODD and apply it to a self-adaptive system. Next, we explain why conditions not covered by the ODD require system evolution. Then, we outline a new approach for self-evolution that leverages the concept of ODD, enabling a system to evolve autonomously to deal with conditions not anticipated by its initial ODD. We conclude with open challenges to realise self-evolution.
... Modern software development is based on the DevOps development approach [15]. DevOps relies on a range of tools to improve both the development phase (Dev), e.g., code management, and the operational phase (Ops), e.g., deployment, monitoring, and logging [33], [32]. These tools help to ensure that software is delivered quickly, reliably, and with minimal risk of errors or downtime. ...
... According to [Meyer 2014], the life cycle of a CI process involves running a program that automatically picks up the changes that were made, applies them, and then runs several commands (such as automated tests) to check whether there is any problem with the changes. One tool for using CI on GitHub is Travis CI. ...
Article
Full-text available
This article describes the process of adapting the short-course project of the PET Computação UFRGS group to an online format due to the COVID-19 pandemic. It covers both the process of choosing platforms and tools and the creation of the courses themselves. After analysing some alternatives, we chose the technologies MDBook, Git, Github Pages and Travis CI to develop and publish the online courses. The group has received positive feedback from students about the format and the published material, which allows the material to be updated constantly and collaboratively. This experience showed that the adopted model is suitable not only for this pandemic period, but also as support for in-person courses.
... Continuous integration is the practice of integrating code changes from multiple developers into a single source via automation [77]. The important practice of CI is that all developers commit code frequently to the main or trunk branch [113], as mentioned in Section 3.1, subsection 'Plan'. After the commit, the code build is performed as explained in Section 3.1, subsection 'Build'. As soon as the build succeeds, the unit test cases are run as explained in Section 3.1, subsection 'Test'. ...
Article
Full-text available
In the Software Development Life Cycle (SDLC), Development and Operations (DevOps) has been proven to deliver reliable, scalable software within a shorter time. Due to the explosion of Machine Learning (ML) applications, the term Machine Learning Operations (MLOps) has gained significant interest among ML practitioners. This paper explains the DevOps and MLOps processes relevant to the implementation of MLOps. The contribution of this paper towards the MLOps framework is threefold: First, we review the state of the art in MLOps by analyzing the related work in MLOps. Second, we present an overview of the leading DevOps principles relevant to MLOps. Third, we derive an MLOps framework from the MLOps theory and apply it to a time-series forecasting application in the hourly day-ahead electricity market. The paper concludes with how MLOps could be generalized and applied to two more use cases with minor changes.
... files often only specify the direct dependencies of the code, and not downstream dependencies which may change if an environment or container is later re-built. Continuous integration (CI) (Meyer, 2014) can help to identify these issues, but reproducibility issues can also occur due to differences in hardware (e.g., GPU model) and environment (e.g., environment variables). In addition, comprehensive CI is not always feasible with large omics datasets. ...
Article
Full-text available
Motivation: Computational systems biology analyses typically make use of multiple software packages and their dependencies, which are often run across heterogeneous compute environments. This can introduce differences in performance and reproducibility. Capturing metadata (e.g., package versions, GPU model) currently requires repetitious code and is difficult to store centrally for analysis. Even where virtual environments and containers are used, updates over time mean that versioning metadata should still be captured within analysis pipelines to guarantee reproducibility. Results: Microbench is a simple and extensible Python package to automate metadata capture to a file or Redis database. Captured metadata can include execution time, software package versions, environment variables, hardware information, Python version, and more, with plugins. We present three case studies demonstrating Microbench usage to benchmark code execution and examine environment metadata for reproducibility purposes. Availability: Install from the Python Package Index using pip install microbench. Source code is available from https://github.com/alubbock/microbench.
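The kind of metadata capture Microbench automates can be illustrated with a hand-rolled decorator, shown below, that records runtime, platform, and package versions to a JSON-lines file. This is explicitly not the microbench API; the output path and the tracked packages are placeholders.

```python
# Hand-rolled illustration of automated metadata capture (not the microbench API).
import functools, importlib.metadata, json, platform, sys, time

def _safe_version(pkg: str) -> str:
    try:
        return importlib.metadata.version(pkg)
    except importlib.metadata.PackageNotFoundError:
        return "not installed"

def capture_metadata(outfile="runs.jsonl", packages=("numpy",)):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.time()
            result = func(*args, **kwargs)
            record = {
                "function": func.__name__,
                "runtime_s": time.time() - start,
                "python": sys.version.split()[0],
                "platform": platform.platform(),
                "packages": {p: _safe_version(p) for p in packages},
            }
            with open(outfile, "a") as fh:
                fh.write(json.dumps(record) + "\n")
            return result
        return wrapper
    return decorator

@capture_metadata(packages=("numpy",))
def analysis_step():
    return sum(i * i for i in range(10_000))

analysis_step()
```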
... Note that open source license compliance (or OSLC [11]) is just a part of OSC, albeit an often-discussed one due to the variety and complexity of software licensing [20,2]. The state-of-the-art industry approach for managing the complexity of OSC, known as continuous open source compliance [26], is to automate as much as possible the verification of adherence to all obligations and best practices for FOSS component management and integrate them into continuous integration (CI) toolchains [23]. ...
Preprint
Free/Open Source Software (FOSS) enables large-scale reuse of preexisting software components. The main drawback is increased complexity in software supply chain management. A common approach to tame such complexity is automated open source compliance, which consists in automating the verification of adherence to various open source management best practices about license obligation fulfillment, vulnerability tracking, software composition analysis, and nearby concerns. We consider the problem of auditing a source code base to determine which of its parts have been published before, which is an important building block of automated open source compliance toolchains. Indeed, if source code allegedly developed in house is recognized as having been previously published elsewhere, alerts should be raised to investigate where it comes from and whether this entails that additional obligations shall be fulfilled before product shipment. We propose an efficient approach for prior publication identification that relies on a knowledge base of known source code artifacts linked together in a global Merkle directed acyclic graph and a dedicated discovery protocol. We introduce swh-scanner, a source code scanner that realizes the proposed approach in practice, using as knowledge base Software Heritage, the largest public archive of source code artifacts. We validate the proposed approach experimentally, showing its efficiency in both abstract (number of queries) and concrete terms (wall-clock time), performing benchmarks on 16,845 real-world public code bases of various sizes, from small to very large.
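The building block behind such prior-publication identification can be sketched as hashing every file with a git-style blob hash and looking it up in a set of known artifacts. The known_hashes set below merely stands in for the Software Heritage archive; this is not the swh-scanner API or its discovery protocol.

```python
# Sketch: flag files whose git-style blob hash is already in a known-artifacts set.
import hashlib
from pathlib import Path

def git_blob_sha1(data: bytes) -> str:
    header = f"blob {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()

known_hashes = {
    # identifiers of artifacts already published elsewhere (illustrative value:
    # the git blob hash of the text "Hello World\n")
    "557db03de997c86a4a028e1ebd3a1ceb225be238",
}

def scan(codebase: str):
    for path in Path(codebase).rglob("*"):
        if path.is_file():
            digest = git_blob_sha1(path.read_bytes())
            if digest in known_hashes:
                yield path, digest  # previously published: check license obligations

for path, digest in scan("."):
    print(f"{path}: already published ({digest})")
```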
... The more target platforms that need to be supported, the bigger the risk that this leads to multiple redundant code segments with potentially different programming syntax, compilation configurations, and deployment mechanisms, which are error-prone and labor-intensive to maintain. In the software development world, mechanisms and paradigms have been found that facilitate writing more portable software, such as programming paradigms that support multiple architectures (Wolfe, 2021), and modern continuous integration mechanisms (Meyer, 2014). If we want to be able to keep benefiting from future hardware developments in neuroscience, neurosimulator software will have to fully engage with the portability challenge. ...
Article
Full-text available
The need for reproducible, credible, multiscale biological modeling has led to the development of standardized simulation platforms, such as the widely-used NEURON environment for computational neuroscience. Developing and maintaining NEURON over several decades has required attention to the competing needs of backwards compatibility, evolving computer architectures, the addition of new scales and physical processes, accessibility to new users, and efficiency and flexibility for specialists. In order to meet these challenges, we have now substantially modernized NEURON, providing continuous integration, an improved build system and release workflow, and better documentation. With the help of a new source-to-source compiler of the NMODL domain-specific language we have enhanced NEURON's ability to run efficiently, via the CoreNEURON simulation engine, on a variety of hardware platforms, including GPUs. Through the implementation of an optimized in-memory transfer mechanism this performance optimized backend is made easily accessible to users, providing training and model-development paths from laptop to workstation to supercomputer and cloud platform. Similarly, we have been able to accelerate NEURON's reaction-diffusion simulation performance through the use of just-in-time compilation. We show that these efforts have led to a growing developer base, a simpler and more robust software distribution, a wider range of supported computer architectures, a better integration of NEURON with other scientific workflows, and substantially improved performance for the simulation of biophysical and biochemical models.
... With this procedure, the developers integrate their work packages (e.g. code) regularly (several times a day) to comply with shorter and more frequent release cycles (Fitzgerald and Stol, 2017). This is expected to improve the final quality and team productivity through the provision of quick feedback on new problems with each code change (Humble and Farley, 2011) and changes in smaller increments which are easier to debug (Meyer, 2014). This is also harnessed by other quality assurance strategies such as test-driven development, where CI is often utilized (Staegemann et al., 2021). ...
Preprint
Full-text available
To enable reliable software releases, automated concepts are increasingly being sought as part of the necessary test processes so that potential errors can be detected at an early stage. Static code analysis tools (SCATs) are especially suited for this purpose, as these testing tools perform their checks without actually executing the software. Thus, they represent an important part of the test suite. To address the problem of carefully selecting such a tool for project use, this article develops a method for constructing SCAT comparison catalogs that allows different tools to be contrasted and evaluated with respect to derived criteria. While the comparison categories are derived from established software quality models by taking into account the specifics of SCATs, multi-criteria decision-making procedures employing linguistic predicates are used to determine the most suitable test tool for each case. The artifact is demonstrated and evaluated in an artificial SCAT selection process which involves FindBugs, checkstyle, and PMD. Finally, potential extensions of the proposed method are outlined and its simple modifiability to other software decision situations is set forth.
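As a simplified stand-in for the selection step, the sketch below scores each tool as a weighted sum over comparison criteria. The criteria, weights, and ratings are invented, and the article itself uses multi-criteria procedures with linguistic predicates rather than crisp numbers.

```python
# Simplified weighted-sum multi-criteria scoring of static code analysis tools.
criteria_weights = {"rule_coverage": 0.4, "ide_integration": 0.2,
                    "configurability": 0.25, "reporting": 0.15}

ratings = {  # 1 (poor) .. 5 (excellent), illustrative only
    "FindBugs":   {"rule_coverage": 4, "ide_integration": 3, "configurability": 3, "reporting": 3},
    "checkstyle": {"rule_coverage": 3, "ide_integration": 4, "configurability": 5, "reporting": 4},
    "PMD":        {"rule_coverage": 4, "ide_integration": 4, "configurability": 4, "reporting": 3},
}

def score(tool: str) -> float:
    return sum(criteria_weights[c] * ratings[tool][c] for c in criteria_weights)

for tool in sorted(ratings, key=score, reverse=True):
    print(f"{tool}: {score(tool):.2f}")
```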
... CD is based on the principles of agile development (Dingsøyr, Nerur, Balijepally, and Moe, 2012) and DevOps (Mishra and Otaiwi, 2020) that aim at increasing the deployment speed and quality of systems. CD leverages continuous integration (CI) (Meyer, 2014), which automates tasks such as compiling code, running tests, and building deployment packages. Among the benefits of CI/CD are rapid innovation, shorter time-to-market, increased customer satisfaction, continuous feedback, and improved developer productivity. ...
Preprint
Full-text available
Computing systems are omnipresent; their sustainability has become crucial for our society. A key aspect of this sustainability is the ability of computing systems to cope with the continuous change they face, ranging from dynamic operating conditions, to changing goals, and technological progress. While we are able to engineer smart computing systems that autonomously deal with various types of changes, handling unanticipated changes requires system evolution, which remains in essence a human-centered process. This will eventually become unmanageable. To break through the status quo, we put forward an arguable opinion for the vision of self-evolving computing systems that are equipped with an evolutionary engine enabling them to evolve autonomously. Specifically, when a self-evolving computing system detects conditions outside its operational domain, such as an anomaly or a new goal, it activates an evolutionary engine that runs online experiments to determine how the system needs to evolve to deal with the changes, thereby evolving its architecture. During this process the engine can integrate new computing elements that are provided by computing warehouses. These computing elements provide specifications and procedures enabling their automatic integration. We motivate the need for self-evolving computing systems in light of the state of the art, outline a conceptual architecture of self-evolving computing systems, and illustrate the architecture for a future smart city mobility system that needs to evolve continuously with changing conditions. To conclude, we highlight key research challenges to realize the vision of self-evolving computing systems.
... Typically, the expected generalization performance can be expressed as the expected loss E_{(x,y)∼p}[L(y; f(x))]. ▪ Reliability and Reproducibility: a reliable AI system is one that works properly with a range of inputs and in a range of situations, whilst reproducibility describes whether an AI experiment exhibits the same behaviour when repeated under the same conditions. This idea is tied with the software engineering concept of Continuous Integration (Meyer, 2014), that is, is the algorithm auditable? (e.g. ...
Article
Algorithms are becoming ubiquitous. However, companies are increasingly alarmed about their algorithms causing major financial or reputational damage. A new industry is envisaged: auditing and assurance of algorithms with the remit to validate artificial intelligence, machine learning, and associated algorithms.
Article
Full-text available
For modern software development, especially for Java applications, Continuous Integration and Continuous Deployment (CI/CD) pipelines are a vital aspect. This paper aims to provide an in-depth exploration of building custom CI/CD pipelines in GitLab specifically for Java applications. In this guide, we touch upon how to integrate GitLab CI/CD with Java development tools, strategies for pipeline optimization towards effective testing and deployment, and the use of Docker where environment consistency is a concern. We emphasize the key areas to be considered for security, performance and scalability. The analysis also covers automation techniques for testing and deployment, ensuring high code quality and fast iteration cycles. Developers can therefore use GitLab CI/CD and Java tools in tandem to simplify their workflow, with a consistent stream of production-ready software delivered efficiently.
Article
Full-text available
Abstract: DevOps is a set of methodologies and cultural values that automate and combine software development (Dev) and IT operations (Ops) to create the best application. DevOps, based on agile concepts, applies incremental creation and continuous input across the product lifecycle. DevOps reduces friction between operations and development to improve software development and delivery. These rules govern software efficiency, structure, naming, and documentation. Jenkins automates code development, verification, and distribution, enhancing CI/CD efficiency and dependability. Kubernetes streamlines container-based application administration, enabling controlled scalability, growth, and fast service execution. These technologies improve DevOps processes, ensuring on-time delivery of items. DevOps release management includes thorough planning, infrastructure, versioning, restoration plans, continuous tracking, automated testing, reflection alerts, interaction, and recordkeeping. Well-controlled processes ensure reliable application deployments by resolving issues quickly, simplifying setup, and reducing user effort. Kubernetes' Permanent Amounts, flexible supply, and capacity divisions help manage, customize, and extend storage. It streamlines deployment and maintenance, organizes containers into Pods, and provides a flexible and robust environment due to its formal approach. Kubernetes manages all program storage, simplifying application deployment and management. Kubernetes has powerful networking, cloud support, recovery, load balancing, simple scalability, strong monitoring, and backup operations.
Article
Software development teams establish elaborate continuous integration pipelines containing automated test cases to accelerate the development process of software. Automated tests help to verify the correctness of code modifications, decreasing the response time to changing requirements. However, when software teams do not track the performance impact of pending modifications, they may need to spend considerable time refactoring existing code. This paper presents PACE, a program analysis framework that provides continuous feedback on the performance impact of pending code updates. We design performance microbenchmarks by mapping the execution time of functional test cases given a code update. We map microbenchmarks to code stylometry features and feed them to predictors for performance predictions. Our experiments achieved significant performance in predicting code performance, outperforming the current state of the art by 75% on neural-represented code stylometry features.
Conference Paper
Full-text available
Continuous Integration (CI) is a widely adopted practice in modern software engineering that involves integrating developers' local changes with the project baseline daily. Despite its popularity, recent studies have revealed that integrating changes can be time-consuming, requiring significant effort to correct errors that arise. This can lead to development activities being paused, including the addition of new features and fixing bugs, while developers focus on analyzing and correcting build failures. In this study, we investigate the factors that influence the time taken to correct build failures in CI. Specifically, we analyze the impact of developer activity, project characteristics, and build complexity on build failure correction time. To conduct our analysis, we collected data from 18 industrial projects of a software company, calculating 13 metrics for each project based on the literature on build failures analysis. We used association rules, a data mining technique, to examine the relationship between the defined factors and build failure correction time. Our findings reveal significant correlations between the factors studied and the duration of build failure correction time. Specifically, we found that more experienced developers require less time to correct build failures, while build failures that originate in the early stages of the project are resolved more quickly. Additionally, we observed that build failures with more lines and modified files tend to have longer correction times. Overall, this study sheds light on the factors that impact build failure correction time in CI. By identifying these factors, our findings can help software development teams optimize their CI processes and minimize the impact of build failures on development activities.
Chapter
DevOps has changed the software industry to enable continuous delivery. While many studies have investigated how to introduce DevOps into a software product from the organizational perspective, less is known about the technical challenges developers and practitioners face when transforming legacy code into DevOps, despite the undisputed importance of this topic. In this paper, in the context of web applications, we report the results of a case study on the adoption of four legacy open-source projects into DevOps to understand which refactoring techniques and strategies influence developers' decisions. We analyze two dependent variables: the technique used and how it is applied to the project. After every implementation, there was an overview of the process that had just occurred and later a written report on how the strategies had been applied, their respective order, which strategy had been more fruitful, and so on. Those reports have been the foundation of this study. The main findings of this study are that some strategies are more efficient when viewed from the evolution aspect, and that the order in which these techniques are employed matters.
Article
Full-text available
Computing systems are omnipresent; their sustainability has become crucial for our society. A key aspect of this sustainability is the ability of computing systems to cope with the continuous change they face, ranging from dynamic operating conditions, to changing goals, and technological progress. While we are able to engineer smart computing systems that autonomously deal with various types of changes, handling unanticipated changes requires system evolution, which remains in essence a human-centered process. This will eventually become unmanageable. To break through the status quo, we put forward an arguable opinion for the vision of self-evolving computing systems that are equipped with an evolutionary engine enabling them to evolve autonomously. Specifically, when a self-evolving computing system detects conditions outside its operational domain, such as an anomaly or a new goal, it activates an evolutionary engine that runs online experiments to determine how the system needs to evolve to deal with the changes, thereby evolving its architecture. During this process the engine can integrate new computing elements that are provided by computing warehouses. These computing elements provide specifications and procedures enabling their automatic integration. We motivate the need for self-evolving computing systems in light of the state of the art, outline a conceptual architecture of self-evolving computing systems, and illustrate the architecture for a future smart city mobility system that needs to evolve continuously with changing conditions. To conclude, we highlight key research challenges to realize the vision of self-evolving computing systems.
Article
Software-defined manufacturing (SDM) transfers the approach of separating applications from the underlying systems, a decisive paradigm and success factor for digitalization in many industries, to industrial production. In addition to a closer integration of IT and OT, this allows the production processes of an entire factory to be adapted dynamically through software. The aim of this article is to present the SDM approach and the concepts required for its implementation with regard to the technological infrastructure and abstraction layer. Two use cases are used to illustrate the resulting potential.
Chapter
Many frameworks and libraries are available for researchers working on optimization. However, the majority of them require programming knowledge, lack a friendly user interface, and cannot be run on different operating systems. WebGE is a new optimization tool which provides a web-based graphical user interface allowing any researcher to use Grammatical Evolution and Differential Evolution on symbolic regression problems. In addition, the fact that it can be deployed on any server as a web service, also incorporating user authentication, makes it a versatile and portable tool that can be shared by multiple researchers. Finally, the modular software architecture allows WebGE to be easily extended to other algorithms and types of problems. Keywords: Grammatical Evolution, Differential Evolution, Symbolic regression, Open-source software