Roles with co-occurring smells within a snapshot


Source publication
Conference Paper
Infrastructure as Code is the practice of automating the provisioning, configuration, and orchestration of network nodes using code in which variable values such as configuration parameters, node hostnames, etc. play a central role. Mistakes in these values are an important cause of infrastructure defects and corresponding outages. Ansible, a popul...

Citations

... One approach to investigating the impact of security weaknesses on Ansible-based infrastructure management is to understand how security weaknesses propagate into tasks. A 'task' is a code construct that performs infrastructure management operations (Opdebeeck et al. 2022; Ansible 2020; Borovits et al. 2022). With tasks, Ansible users can specify what configurations are required to set up and manage necessary computing infrastructure and how such configurations will be executed (Opdebeeck et al. 2022; Ansible 2020; Borovits et al. 2022). ...
... If a security weakness propagates into a task, then that task can be considered impacted by that security weakness. ...
... Ansible uses a state-based approach to manage computing infrastructure (Ansible 2020). Ansible uses a code element called 'task' to perform the necessary changes to the relevant computing infrastructure (Opdebeeck et al. 2022). While executing a task, Ansible first queries the state of the infrastructure, i.e., whether the configurations specified in the task already exist in the infrastructure (Opdebeeck et al. 2022). ...
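The state-based execution described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Ansible's implementation: `Host`, `PackageTask`, and the result dictionary are hypothetical simplifications. The only idea taken from Ansible is that a task compares the observed state with the declared one and reports whether it changed anything.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical managed node, reduced to the set of installed packages."""
    packages: set[str] = field(default_factory=set)


@dataclass
class PackageTask:
    """Hypothetical task that declares a desired state rather than a command."""
    name: str
    package: str
    state: str = "present"  # "present" or "absent"

    def run(self, host: Host) -> dict[str, object]:
        # 1. Query the current state of the node.
        installed = self.package in host.packages
        wanted = self.state == "present"
        # 2. If it already matches the declared state, do nothing.
        if installed == wanted:
            return {"task": self.name, "changed": False}
        # 3. Otherwise apply the change and report it.
        if wanted:
            host.packages.add(self.package)
        else:
            host.packages.discard(self.package)
        return {"task": self.name, "changed": True}


host = Host()
install = PackageTask(name="install nginx", package="nginx")
print(install.run(host))  # {'task': 'install nginx', 'changed': True}
print(install.run(host))  # {'task': 'install nginx', 'changed': False}
```

Running the same task twice shows the purpose of the state check: the first run reports a change, while the second finds the node already in the declared state and does nothing.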
Article
Context: Despite being beneficial for managing computing infrastructure at scale, Ansible scripts include security weaknesses, such as hard-coded passwords. Security weaknesses can propagate into tasks, i.e., code constructs used for managing computing infrastructure with Ansible. Propagation of security weaknesses into tasks makes the provisioned infrastructure susceptible to security attacks. A systematic characterization of task infection, i.e., the propagation of security weaknesses into tasks, can aid practitioners and researchers in understanding how security weaknesses propagate into tasks and can yield insights that help practitioners develop Ansible scripts securely.
Objective: The goal of the paper is to help practitioners and researchers understand how Ansible-managed computing infrastructure is impacted by security weaknesses by conducting an empirical study of task infections in Ansible scripts.
Method: We conduct an empirical study in which we quantify the frequency of task infections in Ansible scripts. Upon detecting task infections, we apply qualitative analysis to determine task infection categories. We also conduct a survey with 23 practitioners to determine the prevalence and severity of the identified task infection categories. With logistic regression analysis, we identify development factors that correlate with the presence of task infections.
Results: In all, we identify 1,805 task infections in 27,213 scripts. We identify six task infection categories: anti-virus, continuous integration, data storage, message broker, networking, and virtualization. From our survey, we observe that tasks used to manage data storage infrastructure are perceived to have the most severe consequences. We also find three development factors, namely age, minor contributors, and scatteredness, to correlate with the presence of task infections.
Conclusion: Our empirical study shows that computing infrastructure managed by Ansible scripts is impacted by security weaknesses. We conclude the paper by discussing the implications of our findings for practitioners and researchers.
... Sharma et al. [153] studied best practices in Puppet configuration code, analysing 4,621 Puppet repositories for the presence of implementation and design configuration smells. Opdebeeck et al., conversely, studied variable-related [132] and security-related [133] bad smells in Ansible files. The Ansible Galaxy ecosystem has been an active subject of study in general, as will be shown in Chapter 9, and more specifically in Section 9.3. ...
Chapter
This chapter defines and presents the kinds of software ecosystems that are targeted in this book. The focus is on the development, tooling and analytics aspects of "software ecosystems", i.e., communities of software developers and the interconnected software components (e.g., projects, libraries, packages, repositories, plug-ins, apps) they are developing and maintaining. The technical and social dependencies between these developers and software components form a socio-technical dependency network, and the dynamics of this network change over time. We classify and provide several examples of such ecosystems, many of which will be explored in further detail in the subsequent chapters of the book. The chapter also introduces and clarifies the relevant terms needed to understand and analyse these ecosystems, as well as the techniques and research methods that can be used to analyse different aspects of these ecosystems.
... Defects can cause issues in infrastructure deployments and potentially lead to outages [4]. Bad practices and code smells (i.e., recurring code patterns that indicate potential problems [5]) can hamper the maintainability of infrastructure code, decreasing the velocity at which changes are made [6]-[9]. Worse, security-related bad practices can give rise to security vulnerabilities in an infrastructure. ...
... In this paper, we propose a novel approach to detecting security smells which leverages Program Dependence Graphs (PDG) built for Ansible code [9]. This representation includes control-flow and data-flow information, enabling our approach to address the latter two limitations. ...
... To summarise, this paper makes the following contributions.
• We describe a PDG representation for both Ansible playbooks (client) and role (library) code, based on Opdebeeck et al.'s prior work [9], which only supports the latter and a smaller subset of the Ansible syntax.
• We propose a novel PDG-based approach to security smell detection for IaC. ...
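To see why data-flow information helps, consider a toy reachability check over a simplified dependence graph. The Python sketch below is not the PDG construction of Opdebeeck et al. or of the citing paper; the node naming (`lit:`, `var:`, `task:` prefixes), the keyword heuristic in `SECRET_HINTS`, and the graph shape are all assumptions for illustration. A literal is reported only if it actually flows, possibly through intermediate variables, into a task parameter whose name looks secret.

```python
from collections import deque

# Assumed naming heuristic for parameters that should never receive constants.
SECRET_HINTS = ("password", "passwd", "secret", "token")


def find_hardcoded_secrets(literals: set[str],
                           edges: list[tuple[str, str]],
                           task_params: set[str]) -> list[tuple[str, str]]:
    """Report (literal, task parameter) pairs where a constant literal reaches
    a secret-looking task parameter along data-flow edges (src -> dst)."""
    succ: dict[str, list[str]] = {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)

    findings: list[tuple[str, str]] = []
    for lit in sorted(literals):
        # Breadth-first search from each literal over data-flow successors.
        seen, queue = {lit}, deque([lit])
        while queue:
            node = queue.popleft()
            if node in task_params and any(h in node.lower() for h in SECRET_HINTS):
                findings.append((lit, node))
            for nxt in succ.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return findings


# Hypothetical graph: a literal flows into a variable, which feeds a task parameter.
edges = [
    ("lit:hunter2", "var:db_pass"),
    ("var:db_pass", "task:mysql_user.password"),
    ("lit:admin", "task:mysql_user.name"),
    ("var:prompted", "task:user.password"),  # not a literal, so not reported
]
params = {"task:mysql_user.password", "task:mysql_user.name", "task:user.password"}
print(find_hardcoded_secrets({"lit:hunter2", "lit:admin"}, edges, params))
# [('lit:hunter2', 'task:mysql_user.password')]
```

Here the literal is reported even though it never appears directly in the task; a check that only inspects the task's own parameter values would miss it, which is the kind of gap that data-flow edges are meant to close.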
Chapter
Edge-cloud systems aim to reduce the processing time of big data by bringing massive infrastructures closer to the source of the data. Infrastructure as Code (IaC) supports the automatic deployment and management of these infrastructures through reusable code, and Ansible is the most popular IaC tool. As the quality of Ansible scripts directly influences the quality of edge-cloud systems, many researchers have studied how to improve the quality of Ansible scripts. However, there has yet to be an attempt to leverage ChatGPT for this purpose. We therefore explore the feasibility of using ChatGPT to improve the quality of Ansible scripts. Three raters evaluate ChatGPT's code recommendation ability on 48 code revision cases from 25 Ansible project GitHub repositories, and we analyze the rating results. We confirm that ChatGPT can recognize and understand Ansible scripts. However, its ability largely depends on how the user formulates the questions, which confirms the need for prompt engineering to obtain stable code recommendation results from ChatGPT.
Article
Edge-cloud systems require massive infrastructures located closer to users to minimize latencies in handling big data. Ansible is one of the most popular Infrastructure as Code (IaC) tools and is crucial for deploying the infrastructures of edge-cloud systems. However, Ansible scripts are also code, and their quality is critical to delivering high-quality services within edge-cloud systems. Meanwhile, Large Language Models (LLMs) have performed remarkably on various Software Engineering (SE) tasks in recent years. One such task is Automated Program Repair (APR), where LLMs assist developers by proposing code fixes for identified bugs. Nevertheless, prior studies of LLM-based APR have predominantly concentrated on widely used programming languages (PLs) such as Java and C, and there has yet to be an attempt to apply it to Ansible. Hence, we explore the applicability of LLM-based APR to Ansible. We assess the performance of two LLMs (ChatGPT and Bard) on 58 Ansible script revision cases from Open Source Software (OSS). Our findings reveal promising prospects, with the LLMs generating helpful responses in 70% of the sampled cases. Nonetheless, further research is necessary to fully harness this approach's potential.
Conference Paper
While Terraform has gained popularity for implementing the practice of infrastructure as code (IaC), little is known about the characteristics or actionability of static analysis for Terraform manifests. This lack of understanding hinders practitioners from adopting static analysis in their Terraform development process, as happened at Company A, an organization that uses Terraform to create automated software deployment pipelines. In this experience report, we summarize our study of 491 static analysis alerts that occur in 10 open source Terraform repositories and one proprietary Terraform repository. From our analysis, we observed that: (i) 10 categories of static analysis alerts appear in Terraform manifests, of which five are related to security; (ii) the majority of practitioners understand static analysis alert messages and their underlying root causes; (iii) Terraform resources with dependencies have 1.5x-2.1x more static analysis alerts than resources with no dependencies; and (iv) practitioner perceptions vary from one alert category to another when deciding whether to act on reported alerts. This paper concludes with recommendations for practitioners, toolsmiths, and researchers on how to use and analyze static analysis alerts for Terraform in an actionable manner.
... Opdebeeck et al. [30] hypothesise that some of Ansible's semantics, such as lazy evaluation of expressions and a complicated variable precedence system, may lead to defects in Ansible code. The authors therefore propose 6 code smells related to the usage of variables and expressions, such as multiple usages of a variable whose value may have changed between the usages, variables that have been defined through unnecessarily complicated mechanisms, and potentially accidental redefinitions of variables. ...
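The lazy-evaluation hazard behind the smell about a variable whose value may change between usages can be illustrated with a toy variable store. The Python sketch below is a deliberate simplification: Ansible evaluates Jinja2 templates, whereas here a regular expression expands only plain `{{ name }}` references, and `LazyVars` and `inconsistent_reads` are hypothetical names. Values are stored unevaluated and expanded on each read, so redefining an upstream variable silently changes what a later read of a dependent variable returns.

```python
import re

# Matches only plain "{{ name }}" references; real Ansible uses full Jinja2 templating.
_REF = re.compile(r"\{\{\s*(\w+)\s*\}\}")


class LazyVars:
    """Hypothetical toy variable store: templates are stored unevaluated
    and expanded every time a variable is read (lazy evaluation)."""

    def __init__(self) -> None:
        self._raw: dict[str, str] = {}
        self._reads: dict[str, set[str]] = {}  # top-level reads -> observed values

    def define(self, name: str, template: str) -> None:
        # Redefinition simply overwrites the stored template.
        self._raw[name] = template

    def get(self, name: str, _depth: int = 0) -> str:
        if _depth > 20:
            raise RecursionError(f"template loop while resolving {name!r}")
        # Resolve each referenced variable at use time, not at definition time.
        value = _REF.sub(lambda m: self.get(m.group(1), _depth + 1), self._raw[name])
        if _depth == 0:
            self._reads.setdefault(name, set()).add(value)
        return value

    def inconsistent_reads(self) -> list[str]:
        """Variables that were read more than once with different values."""
        return sorted(n for n, vals in self._reads.items() if len(vals) > 1)


v = LazyVars()
v.define("version", "1.0")
v.define("download_url", "https://example.org/app-{{ version }}.tar.gz")
first = v.get("download_url")   # "https://example.org/app-1.0.tar.gz"
v.define("version", "2.0")      # e.g. a later task overrides the variable
second = v.get("download_url")  # "https://example.org/app-2.0.tar.gz"
print(v.inconsistent_reads())   # ['download_url']
```

The same expression for `download_url` yields two different URLs across its two usages, which is exactly the situation this smell warns about; a detector would need to reason about where such redefinitions can occur between usages.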
Chapter
Infrastructure as Code (IaC) is the practice of automating the provisioning, configuration, and orchestration of systems onto which software is deployed through scripts in domain-specific languages. With the increasing importance of reliable and repeatable deployments, ecosystems are emerging around online repositories of reusable IaC assets. In this chapter, we study two such ecosystems in detail: the one forming around the Docker Hub repository of reusable Docker images and the one forming around the Ansible Galaxy repository of reusable Ansible roles. We start with an introduction to Docker, the most popular container management tool, and Ansible, the most popular configuration management tool. Although both tools are used to configure machines onto which applications are deployed, they differ fundamentally in the means through which this is achieved. Next, we discuss the Docker Hub and Ansible Galaxy online repositories for reusable Docker images and Ansible roles. Having introduced these emerging ecosystems, we highlight a number of approaches taken by researchers studying them. Subsequently, we survey the state of the art in research on the practices followed by their contributors and users, ranging from the versioning of releases and keeping dependencies up to date to detecting bugs. We conclude with the challenges that researchers face when analyzing these ecosystems.