Article · PDF Available

Cfengine: A site configuration engine

... Using automated tools to configure systems has a long history [5] and is established as a best practice [37]. Configuration management systems are especially suited to managing HPC systems, which generally have a large number of very similar constituent nodes. Today, configuration management tools such as Ansible [44], Cfengine [5], and Puppet [42] are widely used in HPC environments, but their installations can benefit from an updated look at their roles and responsibilities. ...
Preprint
Through the 1990s, HPC centers at national laboratories, universities, and other large sites designed distributed system architectures and software stacks that enabled extreme-scale computing. By the 2010s, these centers were eclipsed in scale by web and cloud computing architectures, and today even upcoming exascale HPC systems are orders of magnitude smaller than the datacenters employed by large web companies. Meanwhile, the HPC community has allowed system software designs to stagnate, relying on incremental changes to tried-and-true designs to move between generations of systems. We contend that a modern system software stack that focuses on manageability, scalability, security, and modern methods will benefit the entire HPC community. In this paper, we break down the logical parts of a typical HPC system software stack, look at more modern ways to meet their needs, and make recommendations for future work that would help the community move in that direction.
... Binding the provisioning plan to the topology model makes the orchestration less elastic, whereas our proposal is to internalize elasticity in fine-grained cloud operations. Another group [17], [24], [14] uses the state into which an application shall be transferred: an orchestration plan consists of relationships between provisioning operations, and relationships are generated based on the desired state. AI planning and graph-covering techniques are used to analyze dependencies between nodes, relationships, and operations in order to generate workflows. CFEngine [14] has a behavioral model for cloud resources based on promise theory [9]. In Ops-Scale, both topology and desired state are the implicit results of value functions; at the functional level, data is untyped and unstructured, giving functions a greater degree of freedom to encapsulate elastic operations. ...
Conference Paper
Full-text available
Recent research has proposed new techniques to streamline the autoscaling of cloud applications, but little effort has been made to advance configuration management (CM) systems for such elastic operations. Existing practices use CM systems, from the DevOps paradigm, to automate operations. However, these practices still require human intervention to program ad hoc procedures to fully automate reconfiguration. Moreover, even after careful programming of cloud operations, the backing models are insufficient for re-running such programs unchanged on other platforms, which implies an overhead in rewriting the programs. We argue that CM programs can be designed to be deployment-agnostic and highly elastic with well-defined abstractions. In this paper, we introduce our abstraction based on declarative functional programming, and we demonstrate it using a feedback-loop control mechanism. Our proposal, called Ops-Scale, is a family of cloud operations derived by making a functional abstraction over existing configuration programs. The hypothesis in this paper is twofold: 1) it should be possible to make a highly declarative CM system rich enough to capture fine-grained reconfigurations of autoscaling automatically, and 2) a program written for a specific deployment can be re-used in other deployments. To test this hypothesis, we have implemented an open-source configuration engine called Karamel that is already used in industry for large-scale cluster deployments. Results show that at scale Ops-Scale can capture a polynomial order of reconfiguration growth in a fully automated manner. In practice, recent deployments have demonstrated that Karamel can provision clusters of 100 virtual machines running many layers of distributed services on Google's IaaS cloud in less than 10 minutes.
... Although this code captures the sysadmin's knowledge, it does not enable knowledge reuse, because building on top of automation code requires the same knowledge as creating the code in the first place. This issue stems from the foundational theory behind configuration management tools: converging towards a predefined end-state, as popularized by Burgess et al. [3] The idea of convergence is that a sysadmin specifies the desired end-state of an application and the configuration management tool executes the necessary actions to get the application into that state. The automation code is, in that sense, a description of the desired end-state of the application. ...
... The approach of converging towards a predefined end-state, as popularized by Burgess et al. [3], is inherently inflexible, as explained in the introduction. We believe that agent-based cloud management addresses this inflexibility. ...
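The convergence model described in these snippets can be sketched in a few lines: declare the desired end-state, compare it against the actual state, and apply repairs until a fixed point is reached. This is a minimal illustrative sketch of the idea only; the resource names and the dictionary-based state are assumptions, not CFEngine's actual API or implementation.

```python
# Minimal sketch of convergent configuration management: repeatedly
# compare desired vs. actual state and "repair" any drift until no
# resource needs fixing (a fixed point). Resource names are illustrative.

def converge(resources, actual_state):
    """Drive actual_state toward the declared desired state."""
    changed = True
    while changed:
        changed = False
        for name, desired in resources.items():
            if actual_state.get(name) != desired:
                actual_state[name] = desired   # the repair action
                changed = True
    return actual_state

# The sysadmin declares *what* the end-state is, not *how* to reach it.
desired = {"ntp.service": "running", "/etc/motd": "Welcome"}
actual = {"ntp.service": "stopped"}
print(converge(desired, dict(actual)))
```

Note that running `converge` a second time performs no repairs, which is the idempotence property that makes convergent tools safe to re-run on a schedule.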
Article
Full-text available
Managing cloud applications is complex, and the current state of the art does not address this issue. The ever-growing software ecosystem continues to increase the knowledge required to manage cloud applications at a time when there is already an IT skills shortage. Solving this issue requires capturing IT operations knowledge in software so that this knowledge can be reused by sysadmins who do not have it. The presented research tackles this issue by introducing a new and fundamentally different way to approach cloud application management: a hierarchical collection of independent software agents, collectively managing the cloud application. Each agent encapsulates knowledge of how to manage specific parts of the cloud application, is driven by sending and receiving cloud models, and collaborates with other agents by communicating using conversations. The entirety of communication and collaboration in this collection is called the orchestrator conversation. A thorough evaluation shows that the orchestrator conversation makes it possible to encapsulate IT operations knowledge that current solutions cannot, reduces the complexity of managing a cloud application, and is inherently concurrent. The evaluation also shows that the conversation figures out how to deploy a single big-data cluster in less than 100 milliseconds, which scales linearly to less than 10 seconds for 100 clusters, resulting in minimal overhead compared to the deployment time of at least 20 minutes with the state of the art.
... CFEngine is an open-source configuration management tool that also offers enterprise functionality through a commercial version. Its primary function is to provide automated configuration and management on top of existing computing resources [11]. Similar to AWS CloudFormation, Azure Resource Manager is the deployment and management service used by Azure to manage resources in Microsoft's cloud environment [27]. ...
Article
Full-text available
In recent years, a plethora of deployment technologies evolved, many following a declarative approach to automate the delivery of software components. Even if such technologies share the same purpose, they differ in features and supported mechanisms. Thus, it is difficult to compare and select deployment automation technologies as well as to migrate from one technology to another. Hence, we present a systematic review of declarative deployment technologies and introduce the essential deployment metamodel (EDMM) by extracting the essential parts that are supported by all these technologies. Thereby, the EDMM enables a common understanding of declarative deployment models by facilitating the comparison, selection, and migration of technologies. Moreover, it provides a technology-independent baseline for further deployment automation research.
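The kind of declarative deployment model that the EDMM abstracts over can be pictured as typed components with properties, connected by typed relations. The following sketch is an illustration under that reading of the abstract; the class names, field names, and example types are assumptions for illustration, not the EDMM specification itself.

```python
# Illustrative sketch of a declarative deployment model of the kind the
# EDMM generalizes: components with types and properties, plus typed
# relations between them. Names and types here are assumptions.

from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    type: str                      # e.g. "web_application", "mysql_dbms"
    properties: dict = field(default_factory=dict)

@dataclass
class Relation:
    source: str
    target: str
    type: str                      # e.g. "hosted_on", "connects_to"

@dataclass
class DeploymentModel:
    components: list
    relations: list

    def dependencies_of(self, name):
        """Names of components this component points at via a relation."""
        return [r.target for r in self.relations if r.source == name]

model = DeploymentModel(
    components=[
        Component("shop", "web_application", {"port": 8080}),
        Component("db", "mysql_dbms", {"root_password": "secret"}),
    ],
    relations=[Relation("shop", "db", "connects_to")],
)
print(model.dependencies_of("shop"))
```

A technology-independent structure like this is what makes comparison and migration between deployment technologies tractable: each concrete tool's model can be mapped onto the same small set of entities.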
... • Create a basic design for a typical HPC cluster and Big Data cluster
• Create a hardware parts list for that cluster design
• Assemble the hardware into a working cluster
• Install a base OS onto all nodes in the cluster
• Compile and install the software needed for the operation of an HPC cluster (such as a scheduler, common scientific libraries, etc.) or a Big Data cluster (such as Hadoop, Spark, MapReduce, etc.)
• Install and maintain services for monitoring the cluster, such as Ganglia, Sensu, Nagios, etc.
• Configure the installed machines to work as a cluster, making sure ssh keys are in place, proper scheduling rules are in place, etc.
• Use a configuration management system such as xCAT, CFEngine [2], or Puppet to configure and maintain the cluster
• Create tests and monitors to confirm the cluster is in good health from a node and service perspective
• Take the entire cluster down for maintenance, do the maintenance, and return the cluster to service within the allotted time
2.5.2 Scientific Application Support. ...
Conference Paper
Full-text available
There is a shortage of training programs for research cyber-facilitators, and the need is only growing, especially in academia. This paper discusses the importance of developing a workforce at the undergraduate level, the creation of a formal program for training and mentoring undergraduates in Research Computing at Purdue University, and how the approach to mentoring has evolved. The hands-on training and mentoring program has changed from one with students working as junior HPC administrators, performing hardware break-fix in a relative vacuum, to one with students working closely with their mentors, building real-world cyberinfrastructure solutions such as distributed computing environments. More recently, the mentoring program has grown to include facilitating and supporting research applications with the Purdue user community. Finally, outcomes for the students in these programs and lessons learned are discussed.
Conference Paper
Infrastructure as Code (IaC) for the cloud is an important practice due to its efficient and reproducible provisioning of cloud environments. In a cloud IaC definition (template), developers need to manage permissions for each cloud service as well as the desired cloud environment. To minimize the risk of cyber-attacks, retaining least privilege, i.e., granting a minimum set of permissions, in IaC templates is important and widely regarded as best practice. However, discovering least privilege for a target IaC template in a single pass is an error-prone and burdensome task for developers. One reason is that some actions of a cloud service implicitly use other services and require corresponding permissions, which are hard to recognize without actual executions on the cloud; this burdens the development process with iterations of permission setting and checking of provisioned results. In this paper, we present a technique to automatically discover least privilege. Our method incrementally finds the least privilege through iterations of testing on the cloud and (re)configuring permissions on the basis of test results. We conducted case studies and found that our approach can identify least privilege on Amazon Web Services within a practical time. Our experiments also show that the proposed algorithm reduces the number of test executions, which directly affects the time and cost on the cloud to determine least privilege, by 69.3% and 39.8% on average compared with random and heuristic methods, respectively.
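The iterate-test-and-reconfigure loop described in this abstract can be sketched as a grow-from-empty search: run the deployment, grant whichever permission the failure reports, and repeat until the deployment succeeds. This is only an illustrative sketch under that reading; the `mock_deploy` function and the permission action names are assumptions, not the paper's actual tooling or algorithm.

```python
# Illustrative grow-from-empty loop for least-privilege discovery:
# test the deployment, add the one permission the failure demanded,
# repeat until success. mock_deploy and the action names are assumptions.

def discover_least_privilege(deploy, granted=None):
    granted = set(granted or ())
    while True:
        ok, missing = deploy(granted)
        if ok:
            return granted          # every granted permission was needed
        granted.add(missing)        # grant only what the failure reported

# Mock deployment that needs exactly these actions and, like a real
# cloud run, reports only the first missing permission per attempt.
REQUIRED = ["s3:GetObject", "ec2:RunInstances", "iam:PassRole"]

def mock_deploy(granted):
    for action in REQUIRED:
        if action not in granted:
            return False, action
    return True, None

print(sorted(discover_least_privilege(mock_deploy)))
```

Each loop iteration corresponds to one paid test execution on the cloud, which is why reducing the number of iterations, as the paper's algorithm does, directly cuts time and cost.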
Article
We study a scenario for cloud services based on autonomous resource management agents in situations of competition for limited resources. In the scenario, autonomous agents make independent decisions on resource consumption in a competitive environment. Altruistic and selfish strategies for agent behaviour are simulated and compared with respect to whether they lead to successful resource management in the overall system, and how much information exchange is needed among the agents for the strategies to work. Our results imply that local agent information could be sufficient for global optimisation. Also, the selfish strategy proved stable compared to uninformed altruistic behaviour.
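The competition scenario in this abstract can be illustrated with a toy round of resource allocation: agents share a limited capacity, selfish agents request their full demand while altruistic agents request only a fair share, and requests are scaled down proportionally when the pool is oversubscribed. The payoff rule and all numbers below are illustrative assumptions, not the paper's simulation model.

```python
# Toy round of competition for a limited resource: "selfish" agents
# request their full demand, "altruistic" agents cap their request at a
# fair share, and oversubscribed requests are scaled proportionally.
# The allocation rule and numbers are assumptions for illustration.

def run_round(agents, capacity):
    fair_share = capacity / len(agents)
    requests = {
        a["id"]: (a["demand"] if a["strategy"] == "selfish"
                  else min(a["demand"], fair_share))
        for a in agents
    }
    total = sum(requests.values())
    scale = min(1.0, capacity / total) if total else 1.0
    return {aid: req * scale for aid, req in requests.items()}

agents = [
    {"id": "selfish", "strategy": "selfish", "demand": 10.0},
    {"id": "alt1", "strategy": "altruistic", "demand": 10.0},
    {"id": "alt2", "strategy": "altruistic", "demand": 10.0},
]
alloc = run_round(agents, capacity=12.0)
print(alloc)
```

Under proportional scaling, the selfish agent receives more than either altruist, which hints at why uninformed altruism is unstable against selfish behaviour in such settings.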