Article

Cfengine: A site configuration engine

Authors:
  • Researcher and Advisor at ChiTek-i
... Using automated tools to configure systems has a long history [5] and is established as a best practice [37]. Configuration management systems are especially suited to managing HPC systems, which generally have a large number of very similar constituent nodes. Today, configuration management tools such as Ansible [44], Cfengine [5], and Puppet [42] are widely used in HPC environments, but their installations can benefit from an updated look at their roles and responsibilities. ...
Preprint
Through the 1990s, HPC centers at national laboratories, universities, and other large sites designed distributed system architectures and software stacks that enabled extreme-scale computing. By the 2010s, these centers were eclipsed by the scale of web-scale and cloud computing architectures, and today even upcoming exascale HPC systems are orders of magnitude smaller than the datacenters operated by large web companies. Meanwhile, the HPC community has allowed system software designs to stagnate, relying on incremental changes to tried-and-true designs to move between generations of systems. We contend that a modern system software stack that focuses on manageability, scalability, security, and modern methods will benefit the entire HPC community. In this paper, we break down the logical parts of a typical HPC system software stack, look at more modern ways to meet their needs, and make recommendations for future work that would help the community move in that direction.
... Binding provisioning plan to topology model makes the orchestration less elastic, whereas our proposal is to internalize elasticity in fine-grained cloud operations. Another group [17], [24], [14] uses state into which an application shall be transferred: an orchestration plan consists of relationships between provisioning operations and relationships are generated based on the desired state. AI planning and graph covering techniques are used to analyze dependencies between nodes, relationships, and operations in order to generate workflows. CFEngine [14] has a behavioral model for cloud resources based on promise theory [9]. In Ops-Scale, both topology and desired state are the implicit results of value functions; at the functional level, data is untyped and unstructured, giving a better degree of freedom to functions to encapsulate elastic operations. ...
Conference Paper
Recent research has proposed new techniques to streamline the autoscaling of cloud applications, but little effort has been made to advance configuration management (CM) systems for such elastic operations. Existing practices use CM systems, from the DevOps paradigm, to automate operations. However, these practices still require human intervention to program ad hoc procedures to fully automate reconfiguration. Moreover, even after careful programming of cloud operations, the backing models are insufficient for re-running such programs unchanged on other platforms, which implies an overhead in rewriting the programs. We argue that CM programs can be designed to be deployment-agnostic and highly elastic with well-defined abstractions. In this paper, we introduce our abstraction based on declarative functional programming, and we demonstrate it using a feedback loop control mechanism. Our proposal, called Ops-Scale, is a family of cloud operations that are derived by making a functional abstraction over existing configuration programs. The hypothesis in this paper is twofold: 1) it should be possible to make a highly declarative CM system rich enough to capture fine-grained reconfigurations of autoscaling automatically; and 2) a program written for a specific deployment can be re-used in other deployments. To test this hypothesis, we have implemented an open-source configuration engine called Karamel that is already used in industry for large-scale cluster deployments. Results show that at scale Ops-Scale can capture a polynomial order of reconfiguration growth in a fully automated manner. In practice, recent deployments have demonstrated that Karamel can provision clusters of 100 virtual machines consisting of multi-layered distributed services on Google's IaaS Cloud in less than 10 minutes.
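The abstract above describes declarative operations driven by a feedback loop. The following minimal Python sketch illustrates that general idea only; it is not Karamel's or Ops-Scale's API. The names `desired_size`, `reconcile`, and `control_loop`, and the load and capacity parameters, are all hypothetical and chosen for illustration.

```python
import math
from typing import Iterable, List, Tuple


def desired_size(load: float, capacity_per_node: float, min_nodes: int = 1) -> int:
    """Pure function: the desired node count is derived from the observed load.

    Hypothetical model: each node handles `capacity_per_node` units of load.
    """
    return max(min_nodes, math.ceil(load / capacity_per_node))


def reconcile(current: int, desired: int) -> List[Tuple[str, int]]:
    """Compute the actions that move the cluster from `current` to `desired`."""
    if desired > current:
        return [("add", desired - current)]
    if desired < current:
        return [("remove", current - desired)]
    return []  # already converged: nothing to do


def control_loop(loads: Iterable[float], capacity_per_node: float,
                 start_nodes: int = 1) -> List[int]:
    """Feedback loop: observe load, derive desired state, apply the plan."""
    nodes = start_nodes
    history = []
    for load in loads:
        for action, count in reconcile(nodes, desired_size(load, capacity_per_node)):
            nodes += count if action == "add" else -count
        history.append(nodes)
    return history


if __name__ == "__main__":
    # Simulated load observations; the values are made up for the example.
    print(control_loop([50, 250, 250, 90], capacity_per_node=100))
```

Because the desired size is a function of observations rather than a scripted procedure, the same loop could in principle be reused unchanged against a different deployment; only the functions that observe and apply would differ.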
... IaC has attracted a lot of attention in recent years from both practitioners and researchers. For IaC, cfengine (Burgess and College 1995) provided a high-level language for network administration. The scope was expanded to operating systems and middleware configurations with Puppet (Kanies 2006) and Chef (Nelson-Smith 2013). ...
Article
Infrastructure as code (IaC) for the cloud, which automatically configures a system’s cloud environment from source code, is an important practice thanks to its efficient, reproducible provisioning. On a cloud IaC definition (template), developers must carefully manage permission settings to minimize the risk of cyber-attacks. To this end, least privilege on IaC templates, i.e., the assignment of a necessary and sufficient set of permissions, is widely regarded as a best practice. However, the discovery of least privilege can be an error-prone, burdensome task for developers. This is partially because the execution of an action on the cloud sometimes implicitly requires permissions of other services, and since these are difficult to recognize without actual execution, developers are forced to manually iterate the execution of an action and the modification of permissions. In this work, we present an approach to automatically discover least privilege. Our approach utilizes a test suite, which represents what a system should achieve on the cloud, as an indicator of least privilege, and it iterates testing on the cloud and (re)configuration of permissions on the basis of the test results. We also propose a stepwise filtering technique that utilizes the co-occurrences of cloud services/actions and clustering-based pruning to efficiently rule out unnecessary permissions. Our experiments demonstrate that this filtering reduces the number of iterations compared to naive approaches, which directly affects the time and cost to discover least privilege. Moreover, three case studies show that our approach can identify least privilege on Amazon Web Services within a practical time.
... CFEngine is an open-source configuration management tool that offers enterprise functionality in a commercial version. Its primary function is to provide automated configuration and management on top of existing computing resources [11]. Similar to AWS CloudFormation, Azure Resource Manager is the deployment and management service by Azure to manage resources in Microsoft's cloud environment [27]. ...
Article
In recent years, a plethora of deployment technologies evolved, many following a declarative approach to automate the delivery of software components. Even if such technologies share the same purpose, they differ in features and supported mechanisms. Thus, it is difficult to compare and select deployment automation technologies as well as to migrate from one technology to another. Hence, we present a systematic review of declarative deployment technologies and introduce the essential deployment metamodel (EDMM) by extracting the essential parts that are supported by all these technologies. Thereby, the EDMM enables a common understanding of declarative deployment models by facilitating the comparison, selection, and migration of technologies. Moreover, it provides a technology-independent baseline for further deployment automation research.
...
  • Create a basic design for a typical HPC cluster and Big Data cluster
  • Create a hardware parts list for that cluster design
  • Assemble the hardware into a working cluster
  • Install a base OS onto all nodes in the cluster
  • Compile and install a list of software needed for the operation of an HPC cluster such as a scheduler, common scientific libraries, etc. or a Big Data cluster such as Hadoop, Spark, MapReduce, etc.
  • Install and maintain services for monitoring cluster such as Ganglia, Sensu, Nagios, etc
  • Configure the installed machine to work as a cluster making sure ssh keys are in place, proper scheduling rules are in place, etc.
  • Use a configuration management system such as xCAT, CFEngine [2] or Puppet to configure and maintain the cluster
  • Create tests and monitors to confirm the cluster is in good health from a node and service perspective
  • Take the entire cluster down for maintenance, do the maintenance, and return the cluster to service within allotted time
2.5.2 Scientific Application Support. ...
Conference Paper
There is a shortage of training programs for research cyber-facilitators and the need is only growing, especially in academia. This paper will discuss the importance of developing a workforce at the undergraduate level, creating a formal program for training and mentoring undergraduates in Research Computing at Purdue University, and how the approach to mentoring has evolved. The hands-on training and mentoring program has changed from one with students working as junior HPC administrators, performing hardware break-fix in a relative vacuum, to one with students working closely with their mentors, building real-world cyberinfrastructure solutions, such as distributed computing environments. More recently, the mentoring program has grown to include facilitating and supporting research applications with the Purdue user community. Finally, outcomes for the students in these programs and lessons learned will be discussed.
Conference Paper
Infrastructure as Code (IaC) for cloud is an important practice due to its efficient and reproducible provisioning of cloud environments. On a cloud IaC definition (template), developers need to manage permissions for each cloud service as well as a desired cloud environment. To minimize the risk of cyber-attacks, retaining least privilege, i.e., giving a minimum set of permissions, on IaC templates is important and widely regarded as best practice. However, discovering least privilege on a target IaC template at one time is an error-prone and burdensome task for developers. One reason is that some actions of a cloud service implicitly use other services and require corresponding permissions, which are hard to recognize without actual executions on the cloud and burden the development process with iterations of permission setting and provisioned result checking. In this paper, we present a technique to automatically discover least privilege. Our method incrementally finds the least privilege by the iteration of testing on the cloud and (re)configuring permissions on the basis of test results. We conducted case studies and found that our approach can identify least privilege on Amazon Web Services within a practical time. Our experiments also show that the proposed algorithm can reduce the number of test executions, which directly affects the time and cost on the cloud to determine least privilege, by 69.3% and 39.8% compared with the random and heuristic methods, respectively, on average.
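The iterate-test-and-reconfigure idea in the abstract above can be illustrated with a small Python sketch. This is not the paper's algorithm: it substitutes a plain greedy removal pass, and it assumes that granting extra permissions never makes a passing test suite fail. The function `minimize_permissions`, the `tests_pass` callback, and the permission strings in the example are hypothetical stand-ins for a real cloud test run.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple


def minimize_permissions(
    candidates: Iterable[str],
    tests_pass: Callable[[FrozenSet[str]], bool],
) -> Tuple[Set[str], int]:
    """Greedily drop permissions while the test suite keeps passing.

    Returns the reduced permission set and the number of test executions.
    Assumes monotonicity: adding a permission never breaks a passing suite.
    """
    granted = set(candidates)
    executions = 1
    if not tests_pass(frozenset(granted)):
        raise ValueError("test suite fails even with every candidate permission")

    for perm in sorted(granted):
        trial = granted - {perm}
        executions += 1  # each trial stands for one (costly) run on the cloud
        if tests_pass(frozenset(trial)):
            granted = trial  # the permission was unnecessary; keep it removed
    return granted, executions


if __name__ == "__main__":
    # Hypothetical ground truth, normally unknown and only observable via tests.
    required = {"kms:Decrypt", "s3:GetObject"}
    candidates = ["ec2:RunInstances", "kms:Decrypt", "s3:GetObject", "s3:PutObject"]

    result, runs = minimize_permissions(candidates, lambda g: required <= g)
    print(sorted(result), runs)
```

This naive pass needs one test execution per candidate permission plus an initial check, which is why techniques that reduce the number of executions matter for time and cost.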
Preprint
In recent years, a plethora of deployment technologies evolved, many following a declarative approach to automate the delivery of software components. Even if such technologies share the same purpose, they differ in features and supported mechanisms. Thus, it is difficult to compare and select deployment automation technologies as well as to migrate from one technology to another. Hence, we present a systematic review of declarative deployment technologies and introduce the Essential Deployment Metamodel (EDMM) by extracting the essential parts that are supported by all these technologies. Thereby, the EDMM enables a common understanding of declarative deployment models by facilitating the comparison, selection, and migration of technologies. Moreover, it provides a technology-independent baseline for further deployment automation research.
Article
We study a scenario for cloud services based on autonomous resource management agents in situations of competition for limited resources. In the scenario, autonomous agents make independent decisions on resource consumption in a competitive environment. Altruistic and selfish strategies for agent behaviour are simulated and compared with respect to whether they lead to successful resource management in the overall system, and how much information exchange is needed among the agents for the strategies to work. Our results imply that local agent information could be sufficient for global optimisation. Also, the selfish strategy proved stable compared to uninformed altruistic behaviour.