Article

On the theory of system administration

Authors:
  • Mark Burgess (Researcher and Advisor at ChiTek-i)

Abstract

This paper describes a mean field approach to defining and implementing policy-based system administration. The concepts of regulation and optimization are used to define the notion of maintenance. These are then used to evaluate stable equilibria of system configuration that are associated with sustainable policies for system management. Stable policies are thus associated with fixed points of a mapping that describes the evolution of the system. In general, such fixed points are the solutions of strategic games. A consistent system policy is not sufficient to guarantee compliance; the policy must also be implementable and maintainable. The paper proposes two types of model to understand policy-driven management of human-computer systems: (i) average dynamical descriptions of computer system variables which provide a quantitative basis for decision, and (ii) competitive game theoretical descriptions that select optimal courses of action by generalizing the notion of configuration equilibria. It is shown how models can be formulated and simple examples are given.
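To make the abstract's fixed-point language concrete, the following minimal sketch (an illustration, not taken from the paper) treats a maintenance operation as a map that contracts a configuration value toward its policy target; the policy-compliant state is the map's fixed point, and repeated maintenance sweeps converge to it. The target value, starting value and correction fraction k are illustrative assumptions.

    # Sketch: maintenance as a contraction mapping whose fixed point is the policy state.
    def maintain(q, q_star, k=0.5):
        """One maintenance sweep: correct a fraction k of the deviation from policy."""
        return q + k * (q_star - q)

    q, q_star = 10.0, 3.0              # perturbed state and policy target (illustrative)
    for sweep in range(20):
        q = maintain(q, q_star)

    print(abs(q - q_star) < 1e-3)      # True: repeated sweeps converge to the fixed point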

... A fixed-point model of change was introduced in [8,9], based on the notion of repairability or 'maintenance' of an intended state. This model is realized in the software Cfengine [10], and was further elaborated upon using an alternative formulation in [11]. ...
... We follow the notation of [8] in writing generic operators as letters with carets over them, e.g. Ô1, Ô2, etc., while generic states on which these operators act are written in Dirac notation |q⟩. The resulting state after applying an operator Ô1 to a system in the state |q⟩ is written as Ô1|q⟩. ...
... Viewing parameter values as a subset of a field (e.g., the rational numbers), with corresponding algebraic structure, allows us to distinguish three approaches to change in the value of a parameter, making precise the notion of change q → q + δq, used in [8]. We call the three approaches relative (∆), absolute (C), and multiplicative (µ) or scale change, and we now wish to separate these, so as to distinguish their properties more clearly. ...
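As a worked restatement of the three kinds of change named here (a sketch in the snippet's own operator notation; the cited papers may define them differently), each operator acts on a parameter value q as follows:

    \begin{align*}
      \hat{\Delta}_{\delta q}\,|q\rangle &= |q + \delta q\rangle &&\text{relative change}\\
      \hat{C}_{c}\,|q\rangle             &= |c\rangle            &&\text{absolute change}\\
      \hat{\mu}_{\lambda}\,|q\rangle     &= |\lambda q\rangle    &&\text{multiplicative (scale) change}
    \end{align*}

Only the absolute operator is idempotent, \hat{C}_c\hat{C}_c = \hat{C}_c, which is why it alone defines a fixed point that is independent of the starting value.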
Preprint
Full-text available
In system operations it is commonly assumed that arbitrary changes to a system can be reversed or `rolled back', when errors of judgement and procedure occur. We point out that this view is flawed and provide an alternative approach to determining the outcome of changes. Convergent operators are fixed-point generators that stem from the basic properties of multiplication by zero. They are capable of yielding a repeated and predictable outcome even in an incompletely specified or `open' system. We formulate such `convergent operators' for configuration change in the language of groups and rings and show that, in this form, the problem of convergent reversibility becomes equivalent to the `division by zero' problem. Hence, we discuss how recent work by Bergstra and Tucker on zero-totalised fields helps to clear up long-standing confusion about the options for `rollback' in change management.
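A hedged illustration of the preprint's central point: a convergent operator is idempotent and many-to-one, like multiplication by zero, so its effect cannot be inverted by a 'rollback'. The values below are arbitrary.

    def C(q, c=0):
        """Convergent (absolute) operator: force the parameter to the value c."""
        return c

    states = [1, 7, 42]
    print([C(q) for q in states])   # [0, 0, 0]: every prior state maps to the same outcome
    print(C(C(5)) == C(5))          # True: idempotent, a second application changes nothing
    # Rollback would require an inverse taking 0 back to the original state, but 0 has
    # three distinct preimages here, so no well-defined inverse exists (the 'division
    # by zero' analogy made in the abstract).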
... Seeking invariance of promises is the key to process stability. A process may be called adiabatic [29] if exterior information does not alter promise definitions over the timescale of interactions that rely on it, meaning that a process's ... (The example of observing the inconsistent state of a clock was discussed in reference [5].) ...
... • Fixed point behaviour: a memory process referring to an invariant future state, with intrinsic process stability, which does not require any runtime memory, a priori, only a maintenance process that counters state drift and converges, or the absence of complete isolation from external change [10], [29]. ...
... The strategy of engineering around fixed points still goes highly unappreciated across software engineering and management [32]. The legacy of industrial commoditization is still with us, and our old-fashioned thinking favours the pattern of replacing defective parts with fresh 'clean' parts (like changing the air filters). (While memory is crucial for some tasks, it shouldn't be accumulated without good reason; that's called a memory leak.) ...
Preprint
Several popular best-practice manifestos for IT design and architecture use terms like `stateful', `stateless', `shared nothing', etc, and describe `fact based' or `functional' descriptions of causal evolution to describe computer processes, especially in cloud computing. The concepts are used ambiguously and sometimes in contradictory ways, which has led to many imprecise beliefs about their implications. This paper outlines the simple view of state and causation in Promise Theory, which accounts for the scaling of processes and the relativity of different observers in a natural way. It's shown that the concepts of statefulness or statelessness are artifacts of observational scale and causal bias towards functional evaluation. If we include feedback loops, recursion, and process convergence, which appear acausal to external observers, the arguments about (im)mutable state need to be modified in a scale-dependent way. In most cases the intended focus of such remarks is not terms like `statelessness' but process predictability. A simple principle may be substituted in most cases as a guide to system design: the principle of the separation of dynamic scales. Understanding data reliance and the ability to keep stable promises is of crucial importance to the consistency of data pipelines, and distributed client-server interactions, albeit in different ways. With increasingly data intensive processes over widely separated distributed deployments, e.g. in the Internet of Things and AI applications, the effects of instability need a more careful treatment. These notes are part of an initiative to engage with thinkers and practitioners towards a more rational and disciplined language for systems engineering for the era of ubiquitous extended-cloud computing.
... A paradigm shift appears to be underway, which expands the scope of study in the area of systems administration. This is evidenced by the recent development of a formal theory of system administration, using a mathematical framework [9], the objective of which is to make possible a dynamical stability of the system as a whole. Reference [9] makes the point that the complexity of interaction between humans and computers presents an interesting challenge toward the formulation of any mathematical theory of system administration. ...
... This is evidenced by the recent development of a formal theory of system administration, using a mathematical framework [9], the objective of which is to make possible a dynamical stability of the system as a whole. Reference [9] makes the point that the complexity of interaction between humans and computers presents an interesting challenge toward the formulation of any mathematical theory of system administration. Interestingly, this complexity of human-computer interaction brings us to a second aspect of this paradigm shift. ...
... A proposed research framework should have a theoretical basis to which multiple problem spaces should be linked. As such, our proposed research framework employs the key aspects of the theory of system administration described in Reference [9]. Additionally, we adopt the definition for systems administration from [7], which states: "Network and system administration is a branch of engineering that concerns the operational management of human-computer systems." ...
Article
Full-text available
Information and computing infrastructures (ICT) involve levels of complexity that are highly dynamic in nature. This is due in no small measure to the proliferation of technologies, such as: cloud computing and distributed systems architectures, data mining and multidimensional analysis, and large scale enterprise systems, to name a few. Effective computing and network systems administration is integral to the stability and scalability of these complex software, hardware and communication systems. Systems administration involves the design, analysis, and continuous improvement of the performance or operation of information and computing systems. Additionally, social and administrative responsibilities have become nearly as integral for the systems administrator as are the technical demands that have been imposed for decades. The areas of operations research (OR) and system dynamics (SD) modeling offer system administrators a rich array of analytical and optimization tools that have been developed from diverse disciplines, which include: industrial, scientific, engineering, economic and financial, to name a few. This paper proposes a research framework by which OR and SD modeling techniques may prove useful to computing and network systems administration, which include: linear programming, network analysis, integer programming, nonlinear optimization, Markov processes, queueing modeling, simulation, decision analysis, heuristic techniques, and system dynamics modeling.
... In refs. [6][7][8], a bounded n-dimensional Euclidean model was used to model the memory of a computer system. As large as current systems may be, they remain finite and can be modelled by finite vectors. ...
... We assume that adjacency and the ability to exchange information are synonymous. After all, space is merely a reflection of the ability to observe and transport information. This might seem peculiar from the viewpoint of absolute spacetime, but distance is just one of many possible associations between agents that change with perception, circumstance and individual capability. ...
... However, in a closed system, external states are irrelevant except to an outside observer with access to them. Our familiar notion of time must be understood as a continuum hypothesis of a finite state system. ...
Chapter
Full-text available
Using the previously developed concepts of semantic spacetime, I explore the interpretation of knowledge representations, and their structure, as a semantic system, within the framework of promise theory. By assigning interpretations to phenomena, from observers to observed, we may approach a simple description of knowledge-based functional systems, with direct practical utility. The focus is especially on the interpretation of concepts, associative knowledge, and context awareness. The inference seems to be that most if not all of these concepts emerge from purely semantic spacetime properties, which opens the possibility for a more generalized understanding of what constitutes a learning, or even 'intelligent' system. Some key principles emerge for effective knowledge representation: 1) separation of spacetime scales, 2) the recurrence of four irreducible types of association, by which intent propagates: aggregation, causation, cooperation, and similarity, 3) the need for discrimination of identities (discrete), which is assisted by distinguishing timeline simultaneity from sequential events, and 4) the ability to learn (memory). It is at least plausible that emergent knowledge abstraction capabilities have their origin in basic spacetime structures. These notes present a unified view of mostly well-known results; they allow us to see information models, knowledge representations, machine learning, and semantic networking (transport and information base) in a common framework. The notion of 'smart spaces' thus encompasses artificial systems as well as living systems, across many different scales, e.g. smart cities and organizations.
... The stability of every sampled observation made is uncertain, because an agent does not know how fast the world is changing without repeating its observations. Agents choose a sampling rate T_sample which (by the Nyquist theorem) should be at least twice as fast as the rate of change of the promise they are assessing, and hence also the rate at which they are able to form a stable assessment of what they observe [3,7,8]. ...
... A corollary is that the significance of data to an assessment decays at the same rate with its relative age. By virtue of Nyquist's theorem [7,9], we can also say that: ...
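A small sketch of the sampling rule quoted in these snippets: sample an observed promise at least twice as fast as it changes, and discount older samples. The exponential form of the age weighting is an assumption for illustration, not a claim about the cited papers.

    import math

    def max_sampling_interval(change_period):
        """Nyquist-style rule: T_sample <= change_period / 2 (at least two samples per cycle)."""
        return change_period / 2.0

    def age_weight(age, change_period):
        """Assumed illustrative decay: a sample's significance falls off on the change timescale."""
        return math.exp(-age / change_period)

    print(max_sampling_interval(10.0))        # 5.0 for a promise that changes every 10 time units
    print(round(age_weight(10.0, 10.0), 3))   # 0.368: a sample one cycle old carries ~37% weight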
... 6. Containment: the constraining of locations within a perimeter or other boundary. 7. State or pattern: configuration patterns of internal degrees of freedom, whose effect may or may not be visible in the exterior. ...
Article
Full-text available
Using the previously developed concepts of semantic spacetime, I explore the interpretation of knowledge representations, and their structure, as a semantic system, within the framework of promise theory. By assigning interpretations to phenomena, from observers to observed, we may approach a simple description of knowledge-based functional systems, with direct practical utility. The focus is especially on the interpretation of concepts, associative knowledge, and context awareness. The inference seems to be that most if not all of these concepts emerge from purely semantic spacetime properties, which opens the possibility for a more generalized understanding of what constitutes a learning, or even 'intelligent' system. Some key principles emerge for effective knowledge representation: 1) separation of spacetime scales, 2) the recurrence of four irreducible types of association, by which intent propagates: aggregation, causation, cooperation, and similarity, 3) the need for discrimination of identities (discrete), which is assisted by distinguishing timeline simultaneity from sequential events, and 4) the ability to learn (memory). It is at least plausible that emergent knowledge abstraction capabilities have their origin in basic spacetime structures. These notes present a unified view of mostly well-known results; they allow us to see information models, knowledge representations, machine learning, and semantic networking (transport and information base) in a common framework. The notion of 'smart spaces' thus encompasses artificial systems as well as living systems, across many different scales, e.g. smart cities and organizations.
... By embedding compressed information about intended state into a system throughout, repairs can be made in real time [Bur03,Bur04a,SW49]. If agents along a chain experience errors that can be self-corrected (e.g. ...
... How stable do they need to be? In ref. [Bur03] it is argued that policies should be based always on fixed points or equilibria of possible host strategies, since these are the regions that provide the necessary stability for hosts to perform a reliable function. Studies of destabilization of policies by successful mutant policies are a matter for game theoretical studies [Axe97]. ...
... • The controller can only address the average properties of its catchment area, as expressed in the maintenance theorem [Bur03,Bur04a]. ...
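The remark above about destabilization by 'mutant' policies can be illustrated with a toy stability check (illustrative payoffs only, not taken from the cited game-theoretic studies): a fixed-point policy is sustainable if a rare deviating strategy cannot do better against it.

    # payoff[(my_strategy, their_strategy)]: illustrative numbers for a 2x2 policy game.
    payoff = {
        ("maintain", "maintain"): 3,
        ("maintain", "defect"):   1,
        ("defect",   "maintain"): 2,
        ("defect",   "defect"):   0,
    }

    def is_stable(incumbent, mutant):
        """The incumbent policy is stable if a rare mutant cannot score better against it."""
        return payoff[(incumbent, incumbent)] >= payoff[(mutant, incumbent)]

    print(is_stable("maintain", "defect"))   # True with these illustrative payoffs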
... This is the path or set of intermediate states between the start and the current value of an agent's observables, e.g. the history of an agent or the complete history of state transitions in a finite state machine. Let q be a vector of state information (which might include position, internal registers and other details) [7]. Such a trajectory begins at a certain time t_0 with a certain coordinate value q_0, known as the initial conditions. ...
... We define an organization as a discrete pattern that is formed from interacting agents and which facilitates the achievement of a desired trajectory or task, i.e. a change from an initial state Q_i to a final state Q_f over a certain span of time. We refer to the discussion of systems in ref. [7] for the definition of a task. ...
... Neither promises nor programming exactly determine behaviour in general. Rather we must look to the spectrum of observable outcomes, as the interplay between freedoms and constraints in the agents [7,4]. ...
Conference Paper
Full-text available
Can the whole be greater than the sum of its parts? The phenomenon of emergence claims that it can. Autonomics suggests that emergence can be harnessed to solve problems in self-management and behavioural regulation without human involvement, but the definitions of these key terms are unclear. Using promise theory, and the related operator theory of Burgess and Couch, we define behaviour in terms of promises and their outcomes. We describe the interaction of agents and their collective properties.
... In refs. [6][7][8], a bounded n-dimensional Euclidean model was used to model the memory of a computer system. As large as current systems may be, they remain finite and can be modelled by finite vectors. ...
... We assume that adjacency and the ability to exchange information are synonymous. After all, space is merely a reflection of the ability to observe and transport information. This might seem peculiar from the viewpoint of absolute spacetime, but distance is just one of many possible associations between agents that change with perception, circumstance and individual capability. ...
... However, in a closed system, external states are irrelevant except to an outside observer with access to them. Our familiar notion of time must be understood as a continuum hypothesis of a finite state system. ...
Article
Full-text available
Relationships between objects constitute our notion of space. When these relationships change we interpret this as the passage of time. Observer interpretations are essential to the way we understand these relationships. Hence observer semantics are an integral part of what we mean by spacetime. Semantics make up the essential difference in how one describes and uses the concept of space in physics, chemistry, biology and technology. In these notes, I have tried to assemble what seems to be a set of natural, and pragmatic, considerations about discrete, finite spacetimes, to unify descriptions of these areas. It reviews familiar notions of spacetime, and brings them together into a less familiar framework of promise theory (autonomous agents), in order to illuminate the goal of encoding the semantics of observers into a description of spacetime itself. Autonomous agents provide an exacting atomic and local model for finite spacetime, which quickly reveals the issues of incomplete information and non-locality. From this we should be able to reconstruct all other notions of spacetime. The aim of this exercise is to apply related tools and ideas to an initial unification of real and artificial spaces, e.g. databases and information webs with natural spacetime. By reconstructing these spaces from autonomous agents, we may better understand naming and coordinatization of semantic spaces, from crowds and swarms to datacentres and libraries, as well as the fundamental arena of natural science.
... How can a normal state for a system be defined, and how can that system keep that state by itself? Normality is interesting because we want systems to be predictable [5]. These questions require much research to be answered fully. ...
... When speaking of anomaly detection, it is not always certain whether this term covers only anomaly intrusion detection or has a broader view, where anomalies can come from misconfigurations, faults or other deviations which do not have anything to do with any intrusion attempt [5]. ...
... Commonly one supposes that systems are normal when they exhibit medium term stability, i.e. stability on a time scale at which users experience the system [5]. Health or stability is thus related to one's idea of policy. ...
... This paper is about the formulation of such a view within human-computer systems. It is framed in the setting of a theory of maintenance for systems [2], so that we shall take the view that systems can have stable properties even in uncertain environments by arranging for there to be corrective forces maintaining an equilibrium with forces of environmental change. Specifically this paper is about the relationship between promises made by the parts of a system, i.e. the properties claimed for them and the actions or changes that are required to keep these promises. ...
... It represents the past or future history of an agent's state transitions. Let q be a vector of state information (which might include position, internal registers and other details) [2]. Such a trajectory begins at a certain time t_0 with a certain coordinate value q_0, known as the initial conditions. ...
Article
Full-text available
We begin with two axioms: that system behaviour is an empirical phenomenon and that organization is a form of behaviour. We derive laws and characterizations of behaviour for generic systems. In our view behaviour is not determined by internal mechanisms alone but also by environmental forces. Systems may 'announce' their internal expectations by making "promises" about their intended behaviour. We formalize this idea using promise theory to develop a reductionist understanding of how system behaviour and organization emerge from basic rules of interaction. Starting with the assumption that all system components are autonomous entities, we derive basic laws of influence between them. Organization is then understood as persistent patterns in the trajectories of the system. We show how hierarchical structure emerges from the need to offload the cost of observational calibration: it is not a design requirement for control, rather it begins as an economic imperative which then throttles itself through poor scalability and leads to clustered tree structures, with a trade-off between depth and width.
... It is this immunity model that we shall use in the present paper. We believe that this is the correct model for a self-regulating system, since it was shown in ref. [13] that a complete specification of policy determines a system's properties only to within an intrinsic uncertainty. The uncertainty can only be reduced by making each operation a closure [14, 15] by constant polling or maintenance sweeps, as advocated by the immunity model. ...
... We assert then that a reactive system does not exist without a policy. Policy compliance can be enforced, maintained and regulated by mapping each requirement onto a number of operations that are autonomically self-regulating [3, 13, 16, 4]. We shall draw on some of the results and concepts from earlier work to piece together a description of autonomic computing based on constellations of promises. ...
... In geometry and linear algebra the notion of an orthogonal basis that spans a vector space renders many discussions not only possible but lucid (if not Euclid!). Configuration entities do not generally form a vector space, but they are often organized with a Cartesian product structure (like a struct or record, or a database table) and therefore there is a natural decomposition of parameters that can change independently of one another [13, 18]. We would like to adopt such a set of orthogonal change operators, since this allows us to decompose any change into elemental, atomic sub-changes. ...
Article
Full-text available
We use the concept of promises to develop a service oriented abstraction of the primitive operations that make an autonomic computer system. Convergent behaviour does not depend on centralized control. We summarize necessary and sufficient conditions for maintaining a convergently enforced policy without sacrificing autonomy of decision, and we discuss whether the idea of versioning control or "rollback" is compatible with an autonomic framework.
... Availability of peers in a network: We begin with an abstract idea: that the correct configuration of a system over time depends on the system having regular maintenance checks. This is the essence of the maintenance theorem in ref. [6], which notes that human-computer systems are stochastic in nature and subject to random errors in configuration. Such errors occur due to human interventions, automatic updates, software failures and even low probability bugs. ...
... The needs of small clusters of users override the broader strokes painted by wide area management. This is the need for a scaled approach to system management [6]. ...
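A hedged simulation sketch of the maintenance idea in these snippets: configuration errors arrive at random and periodic repair sweeps keep the average outstanding deviation bounded. The error rate and repair period are illustrative parameters; this is not the cited maintenance theorem itself.

    import random

    random.seed(1)
    deviation, history = 0, []
    repair_period = 10                    # a maintenance sweep every 10 time steps

    for t in range(1, 1001):
        if random.random() < 0.2:         # a random error corrupts one configuration item
            deviation += 1
        if t % repair_period == 0:        # the maintenance agent repairs what it finds
            deviation = 0
        history.append(deviation)

    # Average outstanding error stays of order (error rate x repair period) / 2, here about 1.
    print(sum(history) / len(history))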
Article
Full-text available
Current interest in ad hoc and peer-to-peer networking technologies prompts a re-examination of models for configuration management, within these frameworks. In the future, network management methods may have to scale to millions of nodes within a single organization, with complex social constraints. In this paper, we discuss whether it is possible to manage the configuration of large numbers of network devices using well-known and not-so-well-known configuration models, and we discuss how the special characteristics of ad hoc and peer-to-peer networks are reflected in this problem. Keywords: Configuration management, ad hoc networks, peer to peer.
... Rather than assuming that transitions between states of its model occur only at the instigation of an operator, or at the behest of a protocol, cfengine imagines that changes of state occur unpredictably at any time, as part of the environment to be discovered. The cfengine project and derivative work (Burgess, 2003) accepts the idea of randomness in the interaction with the environment. User interaction (Burgess et al., 2001) forms a mixture of signals which tends to disorder the system configuration over time. ...
... Cfengine makes this process of 'maintenance' into an error-correction channel for messages belonging to a fuzzy alphabet (Burgess, 2002a), where error-correction is meant in the sense of Shannon (Shannon and Weaver, 1949). In ref. (Burgess, 2003) it was shown that a complete specification of policy determines the configuration of a software system only approximately over persistent times. There are fundamental limits to the tolerances a system can satisfy with respect to policy compliance in a stochastic environment. ...
... The maintenance model, underpinning cfengine, was outlined in ref. (Burgess, 2000b) and is fully described in ref. (Burgess, 2003). Although a precise description of the cfengine viewpoint is involved, the idea is rather simple. ...
Article
Full-text available
Cfengine is a distributed agent framework for performing policy-based network and system administration that is used on hundreds of thousands of Unix-like and Windows systems. This paper describes cfengine's stochastic approach to policy implementation using distributed agents. It builds on the notion of 'convergent' statements, i.e. those which cause agents to gravitate towards an ideal configuration state, which is implied by policy specification. Cfengine's host classification model is briefly described and the model is compared to related work.
... There is thus additional uncertainty to be considered. In ref. [16], policy is identified as a specification of the average configuration of the system over persistent times. An important aspect of this definition is that it allows for error tolerances, necessitated by randomly occurring events that corrupt policy in the system management loop. ...
... There is a probabilistic or stochastic element to system behaviour and hence policy can only be an average property in general. We do not require a full definition of host policy here from ref. [16]. It suffices to define the following. ...
... It suffices to define the following. Definition 1 (Agent policy): The policy of an individual computing device (agent or component) is a representative specification of the desired average configuration of a host by appropriate constraints [16]. Promise theory takes a service viewpoint of policy and uses a graphical language to compose system properties and analyse them. ...
Conference Paper
Full-text available
The theory of promises describes policy-governed services, in a framework of completely autonomous agents, which assist one another by voluntary cooperation alone. We propose this as a framework for analysing realistic models of modern networking, and as a formal model for swarm intelligence.
... It applies both to humans and to computers. Clearly, policy should be based on a sound model of system behavior [23] if it is to lead to successful decision-making. Policy success entails both the efficiency and security of the system. ...
... For instance, if we want to know the expected user activity on Fridays, we might compute the mean activity on Fridays for the sample we have. The maintenance theorem [23,26] was introduced as a way of describing the level of outstanding maintenance in a system, given a schedule for repair. In this paper we investigate the meaning of the maintenance theorem for risk management specifically in relation to disk backups. ...
... The maintenance theorem of ref. [23] tells us that regular maintenance of the system can lead to average stability if it is judiciously chosen. We therefore need to ascertain what the appropriate maintenance regimen should be. ...
Article
We discuss a simple model of disk backups and other maintenance processes that include change to computer data. We determine optimal strategies for scheduling such processes. A maximum entropy model of random change provides a simple and intuitive guide to the process of sector based disk change and leads to an easily computable optimum time for backup that is robust to changes in the model. We conclude with some theoretical considerations about strategies for organizing backup information.
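For flavour, here is a generic cost-balance sketch of the kind of trade-off this abstract describes; the paper's own optimum is derived from its maximum entropy model of disk change, which this illustration does not reproduce. Backing up every T hours costs backup_cost/T per hour, while unsaved changes accumulate expected loss at rate loss_rate, roughly loss_rate*T/2 per hour, so the total is minimised near T* = sqrt(2*backup_cost/loss_rate).

    import math

    def optimal_backup_interval(backup_cost, loss_rate):
        """Interval minimising backup_cost/T + loss_rate*T/2 (illustrative model only)."""
        return math.sqrt(2.0 * backup_cost / loss_rate)

    print(optimal_backup_interval(backup_cost=8.0, loss_rate=1.0))   # 4.0 hours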
... This document is inspired by the studies of coarse-grained universal scaling in cities [1][2][3], and a comparison with models developed over the past decade or two on information systems, e.g. [4,5]. As IT systems grow in scale, it is natural to expect a bridge between the behaviours of cities, software networks, and other functionally 'smart' spaces, and one hopes for a better understanding of pervasive information technology in social contexts. ...
... In other words, the volume is the effective average linear volume swept out by a fixed cross section L^(D−H), as it feeds into the N_I nodes connected by the infrastructure. (This was an important argument in deriving the biological scaling laws [8]. The model cannot formally distinguish between the intricacy of the infrastructure itself and the movement of agents around it, but it makes sense to assume that it is the motion of people and mobile agents that is complex, rather than the system of roads and wires of the city.) ...
Article
Full-text available
The study of spacetime, and its role in understanding functional systems, has received little attention in information science. Recent work on the origin of universal scaling in cities and biological systems provides an intriguing insight into the functional use of space, and its measurable effects. Cities are large information systems, with many similarities to other technological infrastructures, so the results shed new light indirectly on the expected scaling behaviour of smart pervasive infrastructures and the communities that make use of them. Using promise theory, I derive and extend the scaling laws for cities to expose what may be extrapolated to technological systems. From the promise model, I propose an explanation for some anomalous exponents in the original work, and discuss what changes may be expected due to technological advancement.
... Nevertheless, short term dynamic environmental changes mean that the promised SLA can never be more than an expectation of a best-effort service quality during long term periods [2]. After admission, since the goal is to allocate resources to services in such a way that optimises the system's global utility, individual SLAs may be downgraded in a controlled fashion to a lower QoS level in order to accommodate new service requests with a higher utility. ...
... Making effective use of the system's resources in a dynamic system is a complex and difficult task [8]. Any service provider's resource allocation policy is subject to environmental uncertainties, and for that reason, the promised SLA can never be more than an expectation of a quality level during longer term periods [2]. ...
Article
Full-text available
Due to the growing complexity and adaptability requirements of real-time systems, which often exhibit unrestricted Quality of Service (QoS) inter-dependencies among supported services and user-imposed quality constraints, it is increasingly difficult to optimise the level of service of a dynamic task set within a useful and bounded time. This is even more difficult when intending to benefit from the full potential of an open distributed cooperating environment, where service characteristics are not known beforehand and tasks may be inter-dependent. This paper focuses on optimising a dynamic local set of inter-dependent tasks that can be executed at varying levels of QoS to achieve an efficient resource usage that is constantly adapted to the specific constraints of devices and users, nature of executing tasks and dynamically changing system conditions. Extensive simulations demonstrate that the proposed anytime algorithms are able to quickly find a good initial solution and effectively optimise the rate at which the quality of the current solution improves as the algorithms are given more time to run, with a minimum overhead when compared against their traditional versions.
... Autonomy implies that one cannot require anything of agents directly; all agents can do is to make promises about their own behaviour. In ref. [29], host policy is identified as a specification of the average configuration of the system over persistent times. An important aspect of this definition is that it allows for error tolerances, necessitated by randomly occurring events that corrupt policy in the system management loop. ...
... There is a probabilistic or stochastic element to system behaviour and hence policy can only be an average property in general. We do not require a full definition of host policy here from ref. [29]. It suffices to define the following. ...
Article
Full-text available
We present a model for policy based management, stressing the role of decisive autonomy in generalized networks. The organization and consistency of agent cooperation is discussed within a cooperative network. We show that some simple rules can eliminate formal inconsistencies, allowing robust approximations to management. Using graph theoretical ranking methods, we evaluate also the probable consistency and robustness of cooperation in a network region. Our theory makes natural contact with social network models in building a theory of pervasive computing. We illustrate our model with a number of examples. Index Terms: Configuration management, ad hoc networks, peer to peer, pervasive computing.
... In an appropriate sense, policy is a set of grammatically structured control knobs for altering the average state of a system. The view of policy taken in ref. [9] is that of a series of instructions, coded into the computer itself, that summarizes the expected behaviour. The precise behaviour is not enforceable, since the system is not deterministic: it is subject to a number of environmental disturbances. ...
... These have fluctuating values but might develop stable averages over time. These cannot normally be 'corrected' but they can be regulated over time (again this agrees with the maintenance theorem's view of average specification over time [9]). Cfengine deals with these two different realms differently: the former by direct language specification and the latter by machine learning and by classifying (digitizing) the arrival process. ...
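A minimal sketch of the 'continuous' half of the hybrid loop described in these snippets: learn a running average of a fluctuating observable and flag departures from it. This is only an illustration of the idea of regulating averages, not cfengine's actual learning algorithm.

    def ewma_update(mean, var, sample, alpha=0.1):
        """Exponentially weighted running estimates of an observable's mean and variance."""
        delta = sample - mean
        mean += alpha * delta
        var = (1 - alpha) * (var + alpha * delta * delta)
        return mean, var

    mean, var = 10.0, 1.0
    for sample in [12, 11, 13, 12, 11, 40]:          # the final sample is a large excursion
        mean, var = ewma_update(mean, var, sample)
        anomalous = abs(sample - mean) > 2 * var ** 0.5
        print(sample, round(mean, 1), anomalous)     # only the final sample is flagged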
Article
Full-text available
Cfengine is an autonomous agent for the configuration of Unix-like operating systems. It works by implementing a hybrid feedback loop, with both discrete and continuous elements.
... Rather than assuming that transitions between states of its model occur only at the instigation of an operator, or at the behest of a protocol, cfengine imagines that changes of state occur unpredictably at any time. The focus of cfengine, and supporting work [8], is this willingness to accept the idea of increasing random entropy of configuration through interaction. Users' social behaviour [9] is seen as a central and unignorable mixture of signals which tends to disorder the system configuration over time. ...
... When a system complies with policy, it is healthy; when it deviates, it is sick. In ref. [8] it was shown that a complete specification of policy determines the configuration of a software system only approximately over persistent times. There are fundamental limits to the tolerances one can expect a system to satisfy with respect to policy compliance. ...
Article
Full-text available
Cfengine is a distributed agent framework for performing policy-based network and system administration. It is in widespread use on Unix and NT systems. This paper describes recent changes to the cfengine tool-set, including architectural changes in order to facilitate anomaly detection research, public key methods, improved scheduling technology and search filters.
... They do not capture the long-term behaviour of a system given imperfect knowledge or specification, such as one usually faces in a management scenario; thus they do not deal with the problem of inherent uncertainties. In ref. [27], host policy is identified as a specification of the average configuration of the system over persistent times. An important aspect of this definition is that it allows for error tolerances, necessitated by randomly occurring events that corrupt policy in the system management loop. ...
Preprint
Full-text available
We present a model for policy based management, stressing the role of decisive autonomy in generalized networks. The organization and consistency of agent cooperation is discussed within a cooperative network. We show that some simple rules can eliminate formal inconsistencies, allowing robust approximations to management. Using graph theoretical ranking methods, we evaluate also the probable consistency and robustness of cooperation in a network region. Our theory makes natural contact with social network models in building a theory of pervasive computing. We illustrate our model with a number of examples. Index Terms: Configuration management, ad hoc networks, peer to peer, pervasive computing, end-to-end service provision.
... • DO178c Software Engineering standards. Promise Theory emerged from the study of stability and formal correctness of system states in computer installations as a deviation from the over-constraints of logical reasoning towards network processes [19][20][21][22]. (Figure 1 caption: The promises directed between agents. The public is a superagent containing all the others.) ...
Preprint
Full-text available
Many public controversies involve the assessment of statements about which we have imperfect information. Without a structured approach, it is quite difficult to develop an approach to reasoning which is not based on ad hoc choices. Forms of logic have been used in the past to try to bring such clarity, but these fail for a variety of reasons. We demonstrate a simple way of bringing a standardized approach to semantics, in certain discourse, using Promise Theory. As a case, we use Promise Theory (PT) to collect and structure publicly available information about the case of the MCAS software component for the Boeing 737 Max flight control system.
... System administration can be defined as a sequence of tasks to perform upkeep, configuration, and reliability operations of multi-user Information Technology (IT) infrastructures [2]. The implementation and modification of access control permissions is often performed by a System Administrator. ...
Chapter
Full-text available
Understanding how to implement file system access control rules within a system is heavily reliant on expert knowledge, both that intrinsic to how a system can be configured as well as how a current configuration is structured. Maintaining the required level of expertise in fast-changing environments, where frequent configuration changes are implemented, can be challenging. Another set of complexities lies in gaining structural understanding of large volumes of permission information. The accuracy of a new addition within a file system access control is essential, as inadvertently assigning rights that result in a higher than necessary level of access can generate unintended vulnerabilities. To address these issues, a novel mechanism is devised to automatically process a system’s event history to determine how previous access control configuration actions have been implemented and then utilise the model for suggesting how to implement new access control rules. Throughout this paper, we focus on Microsoft’s New Technology File System (NTFS) permissions and access control through processing operating system generated log data. We demonstrate how the novel technique can be utilised to produce plans for the administrator when assigning new permissions. The plans are then evaluated in terms of their validity as well as the reduction in required expert knowledge.
... The term compliance is often used today for correctness of state with respect to a model. If a system deviates from its model, then with proper automation it self-repairs [2,4], somewhat like an autopilot that brings systems back on course. What is interesting is that, when you can repair system state (both static configuration and runtime state), then the initial condition of the system becomes unimportant, and you may focus entirely on the desired outcome. ...
Article
Full-text available
The methods of system administration have changed little in the past 20 years. While core IT technologies have improved in a multitude of ways, for many if not most organizations system administration is still based on production-line build logistics (aka provisioning) and reactive incident handling. As we progress into an information age, humans will need to work less like the machines they use and embrace knowledge-based approaches. That means exploiting simple (hands-free) automation that leaves us unencumbered to discover patterns and make decisions. This goal is reachable if IT itself opens up to a core challenge of automation that is long overdue: namely, how to abandon the myth of determinism and expect the unexpected.
... Commonly one supposes that systems are normal when they exhibit medium term stability, i.e. stability on a time scale at which users experience the system [4]. Health or stability is thus related to one's idea of policy. ...
Chapter
Full-text available
We discuss the combination of two anomaly detection models, the Linux kernel module pH and cfengine, in order to create a multi-scaled approach to computer anomaly detection with automated response. By examining the time-average data from pH, we find the two systems to be conceptually complementary and to have compatible data models. Based on these findings, we build a simple prototype system and comment on how the same model could be extended to include other anomaly detection mechanisms.
... In a discrete system, it is straightforward to define a coarse graining by aggregation of autonomous agents into collections. In physics (dynamically), one does this by defining characteristic lengths (see the discussion for pseudo-continuous information in [10]). In a semantically labelled theory, there are no such easily defined lengths, and we are forced to define granular scales explicitly as sets, see definition 21 (section 3.7). ...
Research
Full-text available
Using Promise Theory as a calculus, I review how to define agency in a scalable way, for the purpose of understanding semantic spacetimes. By following simple scaling rules, replacing individual agents with `super-agents' (sub-spaces), it is shown how agency can be scaled both dynamically and semantically. The notion of occupancy and tenancy, or how space is used and filled in different ways, is also defined, showing how spacetime can be shared between independent parties, both by remote association and local encapsulation. I describe how to build up dynamic and semantic continuity, by joining discrete individual atoms and molecules of space into quasi-continuous lattices.
... System administrators create roles according to the job functions performed in an organization, grant access authorization to those roles, and then assign users to the roles on the basis of their specific job responsibilities and qualifications. In [4] Burgess described a mean field approach to defining and implementing policy-based system administration and proposed two types of model to understand policy driven management of human-computer systems. It was shown how these models could be formulated. ...
Article
Full-text available
A system administration model is described in this article. Two problems called the MINIMUM REPLICATION PROBLEM and the MINIMUM REPLICATION PROBLEM WITH USER PREFERENCES are defined to illustrate the use of the model. The complexities of the two problems are analyzed and shown to be NP-complete. A polynomial-time algorithm to solve one of the problems is described and the experimental results are shown to support the feasibility of the algorithm. In our experiments, our algorithm produces solutions of values at most 1.38 times the optimal ones.
... We define the responsibilities of the administrator [25,26,27,28,29], which are manifold: ...
... In this large, ever-changing, and complex computing environment, many organizations (including AMD) have turned to the practice of autonomic computing [4] to reduce the effort that sysadmins must exert to keep the environment stable. There is a system and OS configuration aspect to this, in which tools such as Cfengine [5] can enable autonomic behavior. ...
Conference Paper
System administrators have utilized log analysis for decades to monitor and automate their environments. As compute environments grow, and the scope and volume of the logs increase, it becomes more difficult to get timely, useful data and appropriate triggers for enabling automation using traditional tools like Swatch. Cloud computing is intensifying this problem as the number of systems in datacenters increases dramatically. To address these problems at AMD, we developed a tool we call the Variable Temporal Event Correlator, or VTEC. VTEC has unique design features, such as inherent multi-threaded/multi-process design, a flexible and extensible programming interface, built-in job queuing, and a novel method for storing and describing temporal information about events, that well suit it for quickly and efficiently handling a broad range of event correlation tasks in real-time. These features also enable VTEC to scale to tens of gigabytes of log data processed per day. This paper describes the architecture, use, and efficacy of this tool, which has been in production at AMD for more than four years.
... We think of the inference system as creating new facts from old facts, and new rules from old rules. An inference system is convergent if – by some finite number of applications of rules – it achieves a fixed point state in which no further operations add new facts or rules [3, 6]. The reason for this philosophical stance is computational. ...
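The convergent-inference idea quoted here can be sketched as a forward-chaining loop that applies rules until a fixed point is reached where no application adds a new fact. The facts and the single transitivity rule below are hypothetical examples, not from the cited system.

    facts = {("depends", "web", "db"), ("depends", "db", "disk")}
    rules = [
        # transitivity: depends(a, b) and depends(b, c) => depends(a, c)
        lambda fs: {("depends", a, c)
                    for (_, a, b1) in fs for (_, b2, c) in fs if b1 == b2},
    ]

    while True:
        new = set().union(*(rule(facts) for rule in rules)) - facts
        if not new:                      # fixed point: no rule adds anything new
            break
        facts |= new

    print(sorted(facts))                 # now includes the inferred ("depends", "web", "disk")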
Article
Full-text available
In troubleshooting a complex system, hidden dependencies manifest in unexpected ways. We present a methodology for uncovering dependencies between behavior and configuration by exploiting what we call "weak transitive relationships" in the architecture of a system. The user specifies known architectural relationships between components, plus a set of inference rules for discovering new ones. A software system uses these to infer new relationships and suggest culprits that might cause a specific behavior. This serves both as a memory aid and to quickly enumerate potential causes of symptoms. Architectural descriptions, including selected data from Configuration Management Databases (CMDB), contain most of the information needed to perform this analysis. Thus the user can obtain valuable information from such a database with little effort.
... This makes manual configuration time-consuming and error prone. [1] System administration research tells us that hand-configuring does not scale well and is unreliable. In a survey of ISPs in the San Francisco Bay area, 90% of system problems could be attributed to system administration errors in hand-configuration or simply misconfiguration. ...
Article
Computer Science instructors work hard to provide computer labs that are meaningful on the equipment available. New advances in computer virtualization offer a more effective utilization of existing equipment with flexible assignments. Modeling of existing networks with virtualization can create opportunities for deeper student understanding. Initial efforts to provide virtual systems for students required substantial support efforts and manual labor. By using a management tool called MLN, four universities have been able to provide student access to virtual systems and reduce the management overhead of supporting system configurations. Project in a box assignments, automated assignment checking and large scale network modeling all become easier using virtualization support tools.
INTRODUCTION: Too many students and not enough machines make the search for effective use of computer resources a regular activity. One solution is to provide a uniform heterogeneous computing lab facility and require that all student assignments fit into the lab resources. This diminishes the opportunities for student exploration. Virtualization offers the possibility of increasing resource utilization. Additionally, it provides for more diversity and complexity in the student programming experience.
... How can we solve this kind of problem? In the theory of system maintenance [7], one builds up consistent and stable structures by imposing independent, atomic operations, satisfying certain constraints [8, 9]. By making the building blocks primitive and having special properties, we ensure consistency. ...
Conference Paper
Full-text available
Presently, there is no satisfactory model for dealing with political autonomy of agents in policy based management. A theory of atomic policy units called ‘promises’ is therefore discussed. Using promises, a global authority is not required to build conventional management abstractions, but work is needed to bind peers into a traditional authoritative structure. The construction of promises is precise, if tedious, but can be simplified graphically to reason about the distributed effect of autonomous policy. Immediate applications include resolving the problem of policy conflicts in autonomous networks.
... While resource management is beyond the scope of this paper, cfenvd's capabilities definitely inspired the current work on observed state. Cfenvd and related strategies inspired a parallel state machine model based upon linear algebra and convergent operators [9,11] that most definitely guided us in seeking the algebraic properties of operations. Other than cfenvd, the literature has been surprisingly quiet on the rather obvious relationship between configuration management and monitoring. ...
Article
A rigorous language for discussing the issue of configuration management is currently lacking. To this end, we develop a simple state machine model of configuration management. Observed behaviors comprise the state of a host, and configuration processes accomplish state transitions. Using this language, we show that for one host in isolation and for some configuration processes, reproducibility of observed effect for a configuration process is a statically verifiable property of the process. Using configuration processes verified in this manner, we can efficiently identify latent preconditions that affect behavior among a population of hosts. Constructing configuration management tools with statically verifiable observed behaviors thus reduces the lifecycle cost of configuration management.
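The verifiability claim can be pictured with a toy model (an independent simplification, not the paper's formalism): represent observed state as a dictionary, a configuration process as a function on it, and test reproducibility of observed effect as idempotence.

    # Toy model: a configuration process acts on an observed-state dictionary and
    # only changes what is out of compliance; reproducible effect = idempotence.
    def ensure_line(state, path, line):
        lines = list(state.get(path, []))
        if line not in lines:
            lines.append(line)
        return {**state, path: lines}

    def has_reproducible_effect(process, state):
        once = process(state)
        return process(once) == once      # applying it again changes nothing

    host = {"/etc/hosts": ["127.0.0.1 localhost"]}
    fix = lambda s: ensure_line(s, "/etc/hosts", "10.0.0.5 backup")
    print(has_reproducible_effect(fix, host))   # True

In the cited work this kind of property is checked statically over the process description rather than by executing it, but the intuition is the same.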
Article
System administration comprises a set of functions that keep a multi-user computing environment operating reliably, manage processing efficiently, and enhance overall system performance. Due to rapid developments in wide-area networks and powerful computational resources, distributed computing systems have gained widespread application. As a result, monitoring and managing such large-scale heterogeneous distributed systems has become a challenging task. Existing ad hoc and agent-based monitoring and administration models are not entirely suitable or reliable for managing and maintaining large-scale distributed systems. This paper proposes a novel software architectural model for realizing autonomous remote system monitoring and administration in large-scale heterogeneous distributed systems. The proposed architecture incorporates a cloud computing platform to gain scalability and location transparency. A combination of mobile-agent-based and script-based technologies is employed in the proposed model to achieve scalability, reliability, and autonomy. The architectural model is implemented on a heterogeneous distributed systems testbed and is evaluated with promising results. The heterogeneity of distributed computing systems is considered in the design by employing different operating systems, network connections, and system administration domains. A detailed comparative analysis of the proposed design is included in the paper.
Article
We study a scenario for cloud services based on autonomous resource management agents in situations of competition for limited resources. In the scenario, autonomous agents make independent decisions on resource consumption in a competitive environment. Altruistic and selfish strategies for agent behaviour are simulated and compared with respect to whether they lead to successful resource management in the overall system, and how much information exchange is needed among the agents for the strategies to work. Our results imply that local agent information could be sufficient for global optimisation. Also, the selfish strategy proved stable compared to uninformed altruistic behaviour.
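A toy version of such a simulation (all parameters and the backoff rule are invented here, not taken from the cited study) contrasts agents that claim whatever they want with agents that scale their claims back when the shared capacity looked overloaded:

    import random

    CAPACITY = 100.0

    def claims(demands, strategy, last_total):
        out = []
        for want in demands:
            if strategy == "selfish":
                out.append(want)                      # take what you want regardless
            else:                                     # altruistic: back off when overloaded
                out.append(want * min(1.0, CAPACITY / max(last_total, 1e-9)))
        return out

    def simulate(strategy, agents=10, rounds=50, seed=1):
        rng = random.Random(seed)
        total, overloads = CAPACITY, 0
        for _ in range(rounds):
            demands = [rng.uniform(5.0, 20.0) for _ in range(agents)]
            total = sum(claims(demands, strategy, total))
            overloads += total > CAPACITY
        return overloads

    for s in ("selfish", "altruistic"):
        print(s, "rounds over capacity:", simulate(s))

Note that the only information the backoff rule uses is the previously observed total load, echoing the abstract's point about how little information exchange the strategies may need.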
Conference Paper
CRDT (Conflict-free replicated data type) is a data type that supports conflict-free resolution of concurrent, distributed updates. It is often mentioned alongside storage systems that are distributed, fault-tolerant and reliable. These are similar to the properties and features of Erlang/OTP systems. What distributed Erlang/OTP systems lack, however, is a standardised way to configure multiple nodes. OTP middleware allows you to set configuration parameters called application environment variables on a per-node basis; they can be updated at runtime, but will not survive a restart unless persisted in the business logic of the system. There is no widely adopted solution to address this omission. In some installations, changes are done manually in the Erlang shell and persisted by editing the configuration files. In others, changes and updates are implemented as part of new releases and deployed through an upgrade procedure. These tools expect a happy path, and rarely take network failures and consistency into consideration. As a result, issues have been known to cause outages and have left systems in an inconsistent state, with no automated means of detecting the root cause of the problem. In this paper, we introduce a configuration management approach designed for distributed Erlang/OTP systems. These are systems which often trade consistency for availability and scalability, making them a perfect fit for CRDTs. We use a proprietary tool called WombatOAM to update environment variables and check their consistency at both node and cluster level. Inconsistencies and failed updates are detected and reported in the form of alarms, and the history and status of all performed changes are logged, facilitating troubleshooting and recovery efforts. In this paper, we show our approach to configuration management and discuss how we addressed the issue of consistency in the presence of unreliable networks. We present a qualitative evaluation and a case study to assess the capabilities of WombatOAM's CRDT-based configuration management feature.
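For readers unfamiliar with CRDTs, a last-writer-wins register is one of the simplest examples and a common choice for single configuration values; the sketch below is only an illustration of conflict-free merging, not WombatOAM's implementation:

    from dataclasses import dataclass

    @dataclass
    class LWWRegister:
        """Last-writer-wins register: merging keeps the newest write, ties broken by node id."""
        value: object = None
        timestamp: float = 0.0
        node_id: str = ""

        def set(self, value, timestamp, node_id):
            if (timestamp, node_id) > (self.timestamp, self.node_id):
                self.value, self.timestamp, self.node_id = value, timestamp, node_id

        def merge(self, other):
            self.set(other.value, other.timestamp, other.node_id)

    a, b = LWWRegister(), LWWRegister()
    a.set({"pool_size": 10}, 100.0, "node_a")
    b.set({"pool_size": 20}, 101.0, "node_b")
    a.merge(b); b.merge(a)
    print(a.value == b.value)    # True: replicas converge regardless of merge order

Because the merge is commutative, associative and idempotent, replicas that have seen the same updates agree without coordination, which is what makes the approach attractive for eventually consistent configuration.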
Article
Cfengine is a policy-based configuration management system (Burgess 1995). Its primary function is to provide automated configuration and maintenance of computers from a policy specification. The cfengine project was started in 1993 as a reaction to the complexity and non-portability of shell scripting for Unix configuration management, and continues today. The aim was to absorb frequently used coding paradigms into a declarative, domain-specific language that would offer self-documenting configuration. Cfengine is estimated to run on millions of Unix, MacOSX and Windows computers all around the world. It is used in both large and small companies, as well as in many universities and governmental institutions. Sites as large as 11,000 machines are reported, while sites of several thousand hosts running under cfengine are common. Cfengine falls into a class of approaches to system administration called policy-based configuration management (Sloman and Moffet 1993).
Article
Mobile applications are increasingly used by first responders, medics, researchers and other people in the field in support of their missions and tasks. These environments have very limited connectivity and computing resources. Cloudlet-based cyber-foraging is a method of opportunistically discovering nearby resource-rich nodes that can increase the computing power of mobile devices and enhance the mobile applications running on them. In this paper we present On-Demand VM Provisioning, a mechanism for provisioning cloudlets at runtime by leveraging the advantages of enterprise provisioning tools commonly used to maintain configurations in enterprise environments. We present details of a prototype for On-Demand VM Provisioning and the results of a quantitative and qualitative evaluation of the prototype compared to other cloudlet provisioning mechanisms. The evaluation shows that On-Demand VM Provisioning is promising in terms of flexibility, energy consumption, maintainability and leverage of cloud computing best practices, but can be challenging in disconnected environments, especially for complex applications with many dependencies. © 2014 The Institute for Computer Sciences, Social Informatics, and Telecommunications Engineering (ICST).
Chapter
This chapter appraises the art of system and network configuration, which has come a long way from its humble roots in scripting of manual system administration. System configuration management is the process of maintaining the function of computer networks as holistic entities, in alignment with some previously determined policy. This policy describes how systems should behave and it is translated into a low-level configuration, informally defined as the contents of a number of specific files contained within each computer system. A typical system to be managed consists of a central-processing unit, hard disk, and associated peripherals, including network card, video card, etc. Depending upon the system, many of these components may actually be contained on a single computer board. A system typically starts its life with no operating system installed, often called "bare metal". Configuration management is easy if one has access to unlimited hardware, and can encapsulate each distinct service on a separate machine. The complexities of configuration management arise from the need to combine or compose distinct and changing needs while utilizing limited amounts of hardware. The exact boundary between "system configuration management" and regular "system administration" is unclear, but there seems to be a solid distinction between the two at the point at which the overall configuration of a network of systems is being managed as an entity, rather than managing individual machines that contain distinct and unrelated configurations.
Article
This chapter presents a simple theory developing a reliable syntax, whose execution semantics are unambiguous. Of all the languages used by system administrators and operators in deploying services (e.g. PHP, Perl, TCL, Scheme, etc.), few can be said to have such a clear connection between syntax and behavior. The system world would do well to foster this kind of predictability in future technologies. Today there is a need for a new generation of languages for system management, in a variety of contexts. One example is configuration management. Configuration management is currently an active area. The chapter highlights the role of layers of language abstraction and their effect on the semantics of language statements, and shows how these languages can be built up from low-level primitives through virtualization layers to high-level constructs, so that we might build new languages that are "correct by construction". A simple program notation for object-oriented programming has been provided admitting a projection semantics. Many more features exist and the project of syntax design has only been touched upon. Still, the claim is made that the above considerations provide a basis for the design of reliable syntax for much more involved program notations. Projection semantics provides a scientifically well-founded and rigorous, yet simple approach to the semantics of programming languages. Future work will have to clarify that this framework is industrially viable for the high-level design and analysis of complex systems, and for natural refinements of models to executable and reliable code.
Article
Biology has succeeded in solving many computational and communication problems in the natural world, and computer users are ever inspired by its apparently ingenious creativity. Today scientists are building artificial immune systems and discussing autonomic computing, with self-healing, self-anything systems. We discuss the relevance and efficacy of these approaches. Are they better than classical software engineering design?
Article
Full-text available
A causal, stochastic model of networked computers, based on information theory and non-equilibrium dynamical systems, is presented. This provides a simple explanation for recent experimental results revealing the structure of information in network transactions. The model is based on non-Poissonian stochastic variables and pseudo-periodic functions. It explains the measured patterns seen in resource variables on computers in network communities. Weakly non-Poissonian behaviour can be eliminated by a conformal scaling transformation, and leads to a mapping onto statistical field theory. From this it is possible to calculate the exact profile of the spectrum of fluctuations. This work has applications to anomaly detection and time-series analysis of computer transactions.
Article
This inductive study relies on activity theory as the guiding framework to interpret the theory-practice linkages found in organizational projects that scholar-practitioners considered successful in delivering business results and furthering academic knowledge. These projects that delivered both business and academic results involved certain components of theory and practice as tools of mediation to inform action and the creation of distinct linkages between them. Six basic forms of linkages were evident in all projects that further tended to serve four predominant functions: as framing devices, influencing and legitimizing devices, sensemaking devices, and demonstrative devices. Two dominant strategies, turns and scaffolding, from theory to practice, and practice to theory, were used to create these linkages. The temporal sequence involved in this process is described in the paper; and the agential role of the scholar-practitioner in creating these theory-practice linkages is highlighted. The significance of this model is in moving beyond general descriptions of the usefulness of theory-practice integration, to provide a more specific description of the process of how such integration is achieved by the scholar-practitioner that she/he further uses to generate theoretical contributions and business results.
Article
Full-text available
We study the adaptive behavior of a computational ecosystem in the presence of time-periodic resource utilities as seen, for example, in the day-night load variations of computer use and in the price fluctuations of seasonal products. We do so within the context of the Huberman-Hogg model of such systems. The dynamics is studied for the cases of competitive and cooperative payoff functions with time-modulated resource utilities, and the system's adaptability is measured by tracking its performance in response to a time-varying environment.
Article
Full-text available
The software architecture of a distributed program can be represented by a hierarchical composition of subsystems, with interacting processes at the leaves of the hierarchy. Compositional reachability analysis has been proposed as a promising automated method to derive the overall behavior of a distributed program in stages, based on its architecture. The method is particularly suitable for the analysis of programs which are subject to evolutionary change. When a program evolves, only the behavior of those subsystems affected by the change need be re-evaluated. The method however has a limitation. The properties available for analysis are constrained by the set of actions that remain globally observable. The properties of subsystems may not be analyzed. We extend the method to check safety properties of subsystems which may contain actions that are not globally observable. These safety properties can still be checked in the framework of compositional reachability analysis. The extension is supported by augmenting finite-state machines with a special undefined state π. The state is used to capture possible violation of the safety properties specified by software developers. The concepts are illustrated using a gas station system as a case study.
Conference Paper
Full-text available
We present a form of discretionary lock which is designed to render unreliable but frequently scheduled scripts or programs predictable even when the execution time of locked operations may grow and exceed their expected scheduling interval. We implement our locking policy with lock-unlock semantics and test it on the system administration language cfengine. The locks are controlled by too-soon and too-late parameters so that execution times can be controlled within fixed bounds even when scheduling requests occur randomly in addition to the periodic scheduling time. This has the added bonus of providing an anti-spamming functionality.
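A sketch of such a lock policy (the parameter names and file-based realization here are ours, chosen to mirror the too-soon and too-late bounds described above; the actual cfengine implementation differs in detail):

    import os, time

    IF_ELAPSED   = 60     # "too soon": do not start if the last start was this recent (s)
    EXPIRE_AFTER = 600    # "too late": a lock older than this is treated as hung (s)

    def acquire(lockfile):
        now = time.time()
        if os.path.exists(lockfile):
            age = now - os.path.getmtime(lockfile)
            if age < IF_ELAPSED:
                return False          # ran too recently; skip this invocation
            if age < EXPIRE_AFTER:
                return False          # a run is presumably still in progress
            os.remove(lockfile)       # stale lock: previous run exceeded its bound
        open(lockfile, "w").close()   # timestamped implicitly by its modification time
        return True

    def release(lockfile):
        if os.path.exists(lockfile):
            os.remove(lockfile)

Randomly arriving requests that fall inside the too-soon window are simply dropped, which is the anti-spamming behaviour the abstract mentions.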
Conference Paper
Full-text available
When deploying and administering systems infrastructures it is still common to think in terms of individual machines rather than view an entire infrastructure as a combined whole. This standard practice creates many problems, including labor-intensive administration, high cost of ownership, and limited generally available knowledge or code usable for administering large infrastructures. The model we describe treats an infrastructure as a single large distributed virtual machine. We found that this model allowed us to approach the problems of large infrastructures more effectively. This model was developed during the course of four years of mission-critical rollouts and administration of global financial trading floors. The typical infrastructure size was 300-1000 machines, but the principles apply equally as well to much smaller environments. Added together these infrastructures totaled about 15,000 hosts. Further refinements have been added since then, based on experiences at NASA Ames. The methodologies described here use UNIX and its variants as the example operating system. We have found that the principles apply equally well, and are as sorely needed, in managing infrastructures based on other operating systems. This paper is a living document: Revisions and additions are expected and are available at www.infrastructures.org. We also maintain a mailing list for discussion of infrastructure design and implementation issues - details are available on the web site.
Conference Paper
Full-text available
In order to develop system administration strategies which can best achieve organizations' goals, impartial methods of analysis need to be applied, based on the best information available about needs and user practices. This paper draws together several threads of earlier research to propose an analytical method for evaluating system administration policies, using statistical dynamics and the theory of games.
Article
Full-text available
In this paper, we present a game theoretic framework for bandwidth allocation for elastic services in high-speed networks. The framework is based on the idea of the Nash bargaining solution from cooperative game theory, which not only provides rate settings for users that are Pareto optimal from the point of view of the whole system, but is also consistent with the fairness axioms of game theory. We first consider the centralized problem and then show that this procedure can be decentralized so that greedy optimization by users yields the system-optimal bandwidth allocations. We propose a distributed algorithm for implementing the optimal and fair bandwidth allocation and provide conditions for its convergence. The paper concludes with the pricing of elastic connections based on users' bandwidth requirements and users' budget. We show that the above bargaining framework can be used to characterize a rate allocation and a pricing policy which takes into account users' budget in a fair way and such that the total network revenue is maximized.
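For orientation, the Nash bargaining solution referred to above selects, among the feasible allocations X with disagreement point d = (d_1, ..., d_n), the point maximizing the product of gains:

    x^* = \arg\max_{x \in X,\; x_i \ge d_i} \; \prod_{i=1}^{n} (x_i - d_i).

In the simplest case of a single link of capacity C shared by n users with d_i = 0, this reduces to the equal split x_i = C/n; the cited framework generalizes this to network-wide allocation, decentralized computation, and pricing.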
Article
Full-text available
Cfengine is a language-based system administration tool in which system maintenance tasks are automated and the configuration of all networked hosts is defined in a central file. Host configuration may be tested and repaired any number of times without the need for human intervention. Cfengine uses a decision-making process based on class membership and is therefore optimized for dealing with large numbers of related hosts as well as individually pin-pointed systems.
Article
Full-text available
A common property of many large networks, including the Internet, is that the connectivity of the various nodes follows a scale-free power-law distribution, P(k) = c k^(-α). We study the stability of such networks with respect to crashes, such as random removal of sites. Our approach, based on percolation theory, leads to a general condition for the critical fraction of nodes, p_c, that needs to be removed before the network disintegrates. We show analytically and numerically that for α ≤ 3 the transition never takes place, unless the network is finite. In the special case of the physical structure of the Internet (α ≈ 2.5), we find that it is impressively robust, with p_c > 0.99.
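The criterion behind this result can be stated compactly (restating the standard Molloy–Reed condition used in such percolation analyses): a randomly wired network retains a giant connected component as long as κ ≡ ⟨k²⟩/⟨k⟩ > 2, which gives a critical random-removal fraction of

    p_c = 1 - \frac{1}{\kappa_0 - 1}, \qquad \kappa_0 = \frac{\langle k^2 \rangle}{\langle k \rangle},

evaluated over the original degree distribution. For P(k) = c k^{-α} with α ≤ 3, ⟨k²⟩ diverges as the network grows, so p_c → 1, which is the robustness statement made above.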
Conference Paper
Full-text available
With the increasing demand for long running and highly available distributed services, interest in systems which can undergo dynamic reconfiguration has risen. However for dynamic change to yield valid systems, change actions must be carried out such that the consistency of the software modules making up the system is not breached. This can be ensured if the subset of the system which is to undergo change is in a state amenable to reconfiguration. This paper presents an algorithm which imposes a safe state over the part of the system undergoing change. The algorithm suits a particular class of transactional systems and places special emphasis on minimising the interference to the rest of the system and reducing the programmer contribution necessary for achieving this safe state.
Article
Full-text available
A model for dynamic change management which separates structural concerns from component application concerns is presented. This separation of concerns permits the formulation of general structural rules for change at the configuration level without the need to consider application state, and the specification of application component actions without prior knowledge of the actual structural changes which may be introduced. In addition, the changes can be applied in such a way so as to leave the modified system in a consistent state, and cause no disturbance to the unaffected part of the operational system. The model is applied to an example problem, `evolving philosophers'. The principles of this model have been implemented and tested in the Conic environment for distributed systems
Article
Full-text available
This article argues that the similarities are compelling and could point the way to improved computer security. Improvements can be achieved by designing computer immune systems that have some of the important properties illustrated by natural immune systems. These include multi-layered protection, highly distributed detection and memory systems, diversity of detection ability across individuals, inexact matching strategies, and sensitivity to most new foreign patterns. We first give an overview of how the immune system relates to computer security. We then illustrate these ideas with two examples. The immune system is composed of cells and molecules.
Article
Full-text available
Automated intrusion response is an important unsolved problem in computer security. A system called pH (for process homeostasis) is described which can successfully detect and stop intrusions before the target system is compromised. In its current form, pH monitors every executing process on a computer at the system-call level, and responds to anomalies by either delaying or aborting system calls. The paper presents the rationale for pH, its design and implementation, and a set of initial experimental results. 1 Introduction: This paper addresses a largely ignored aspect of computer security: the automated response problem. Previously, computer security research has focused almost entirely on prevention (e.g., cryptography, firewalls and protocol design) and detection (e.g., virus and intrusion detection). Response has been an afterthought, generally restricted to increased logging and administrator email. Commercial intrusion detection systems (IDSs) are capable of terminating connections...
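The graded response described above can be pictured as a delay that grows exponentially with the number of recently observed anomalies; the constants below are invented for illustration and are not pH's actual parameters:

    import time

    DELAY_UNIT = 0.01     # seconds of delay per unit of 2**anomalies (invented)
    MAX_DELAY  = 4.0      # cap so a noisy profile cannot freeze the process (invented)

    def respond(recent_anomalies):
        """Delay an anomalous system call; larger anomaly counts give exponentially longer delays."""
        delay = min(DELAY_UNIT * (2 ** recent_anomalies), MAX_DELAY)
        time.sleep(delay)
        return delay

    for n in range(6):
        print(n, respond(n))

Exponential back-off of this kind slows a suspicious process smoothly instead of killing it outright, which is the homeostatic flavour of the response.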
Conference Paper
Article
System administration is about the design, running and maintenance of human-computer systems. Examples of human-computer systems include business enterprises, service institutions and any extensive machinery that is operated by, or interacts with human beings. System administration is often thought of as the technological side of a system: the architecture, construction and optimization of the collaborating parts, but it also occasionally touches on softer factors such as user assistance (help desks), ethical considerations in deploying a system, and the larger implications of its design for others who come into contact with it. This book summarizes the state of research and practice in this emerging field of network and system administration, in an anthology of chapters written by the top academics in the field. The authors include members of the IST-EMANICS Network of Excellence in Network Management. This book will be a valuable reference work for researchers and senior system managers wanting to understand the essentials of system administration, whether in practical application of a data center or in the design of new systems and data centers.
Article
The Dirac field, as perturbed by a time-dependent external electromagnetic field that reduces to zero on the boundary surfaces, is the object of discussion. Apart from the modification of the Green's function, the transformation function differs in form from that of the field-free case only by the occurrence of a field-dependent numerical factor, which is expressed as an infinite determinant. It is shown that, for the class of fields characterized by finite space-time integrated energy densities, a modification of this determinant is an integral function of the parameter measuring the strength of the field and can therefore be expressed as a power series with an infinite radius of convergence. The Green's function is derived therefrom as the ratio of two such power series. The transformation function is used as a generating function for the elements of the occupation number labelled scattering matrix S and, in particular, we derive formulas for the probabilities of creating n pairs, for a system initially in the vacuum state. The general matrix element of S is presented, in terms of the classification that employs a time-reversed description for the negative frequency modes, with the aid of a related matrix Sigma, which can be viewed as describing the development of the system in proper time. The latter is characterized as indefinite unitary, in contrast with the unitary property of S, which is verified directly. Two appendices are devoted to determinantal properties.
Book
"This is the classic work upon which modern-day game theory is based. What began more than sixty years ago as a modest proposal that a mathematician and an economist write a short paper together blossomed, in 1944, when Princeton University Press published Theory of Games and Economic Behavior. In it, John von Neumann and Oskar Morgenstern conceived a groundbreaking mathematical theory of economic and social organization, based on a theory of games of strategy. Not only would this revolutionize economics, but the entirely new field of scientific inquiry it yielded--game theory--has since been widely used to analyze a host of real-world phenomena from arms races to optimal policy choices of presidential candidates, from vaccination policy to major league baseball salary negotiations. And it is today established throughout both the social sciences and a wide range of other sciences. This sixtieth anniversary edition includes not only the original text but also an introduction by Harold Kuhn, an afterword by Ariel Rubinstein, and reviews and articles on the book that appeared at the time of its original publication in the New York Times, tthe American Economic Review, and a variety of other publications. Together, these writings provide readers a matchless opportunity to more fully appreciate a work whose influence will yet resound for generations to come.
Article
The approach described in this paper represents a substantive departure from the conventional quantitative techniques of system analysis. It has three main distinguishing features: 1) use of so-called "linguistic" variables in place of or in addition to numerical variables; 2) characterization of simple relations between variables by fuzzy conditional statements; and 3) characterization of complex relations by fuzzy algorithms. A linguistic variable is defined as a variable whose values are sentences in a natural or artificial language. Thus, if tall, not tall, very tall, very very tall, etc. are values of height, then height is a linguistic variable. Fuzzy conditional statements are expressions of the form IF A THEN B, where A and B have fuzzy meaning, e.g., IF x is small THEN y is large, where small and large are viewed as labels of fuzzy sets. A fuzzy algorithm is an ordered sequence of instructions which may contain fuzzy assignment and conditional statements, e.g., x = very small, IF x is small THEN y is large. The execution of such instructions is governed by the compositional rule of inference and the rule of the preponderant alternative. By relying on the use of linguistic variables and fuzzy algorithms, the approach provides an approximate and yet effective means of describing the behavior of systems which are too complex or too ill-defined to admit of precise mathematical analysis.
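A tiny concrete rendering of a linguistic variable (the membership shapes and the rule are invented for this sketch): "load" on a normalized scale takes the fuzzy values low and high, and a single rule IF load is high THEN throttle is large is evaluated with the rule's degree of activation weighting the output.

    def mu_low(load):      # 1 at load = 0, falling linearly to 0 at load = 0.5
        return max(0.0, min(1.0, (0.5 - load) / 0.5))

    def mu_high(load):     # 0 at load = 0.5, rising linearly to 1 at load = 1
        return max(0.0, min(1.0, (load - 0.5) / 0.5))

    def throttle(load, large=0.9, small=0.1):
        w = mu_high(load)  # activation of "IF load is high THEN throttle is large"
        return w * large + (1.0 - w) * small

    for load in (0.2, 0.5, 0.8):
        print(load, round(mu_high(load), 2), round(throttle(load), 2))

Intermediate loads produce intermediate throttle values, which is the point of replacing crisp thresholds by fuzzy ones.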
Article
The recently discovered, nearby young supernova remnant in the southeast corner of the older Vela supernova remnant may have been seen in measurements of nitrate abundances in Antarctic ice cores. Such an interpretation of this 20-year-old ice-core data would provide a more accurate dating of this supernova than is possible purely using astrophysical techniques. It permits an inference of the supernova's 44Ti yield purely on an observational basis, without reference to supernova modelling. The resulting estimates of the supernova distance and light-arrival time are 200 pc and 700 years ago, implying an expansion speed of 5000 km/s for the supernova remnant. Such an expansion speed has been argued elsewhere to imply the explosion to have been a 15 M⊙ Type II supernova. This interpretation also adds new evidence to the debate as to whether nearby supernovae can measurably affect nitrate abundances in polar ice cores.
Article
We study the macroscopic behavior of computation and examine both emergent collective phenomena and dynamical aspects with an emphasis on software issues, which are at the core of large scale distributed computation and artificial intelligence systems. By considering large systems, we exhibit novel phenomena which cannot be foreseen from examination of their smaller counterparts. We review both the symbolic and connectionist views of artificial intelligence, provide a number of examples which display these phenomena, and resort to statistical mechanics, dynamical systems theory and the theory of random graphs to elicit the range of possible behaviors.
Conference Paper
The rapid growth of eCommerce increasingly means business revenues depend on providing good quality of service (QoS) for web site interactions. Traditionally, system administrators have been responsible for optimizing tuning parameters, a process that is time-consuming and skills-intensive, and therefore high cost. This paper describes an approach to automating parameter tuning using a fuzzy controller that employs rules incorporating qualitative knowledge of the effect of tuning parameters. An example of such qualitative knowledge in the Apache web server is "MaxClients has a concave upward effect on response times." Our studies using a real Apache web server suggest that such a scheme can improve performance without human intervention. Further, we show that the controller can automatically adapt to changes in workloads.
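One way to exploit the kind of qualitative knowledge quoted above is a simple hill-climbing adjustment: keep moving MaxClients in the direction that improved response time, and reverse with a smaller step when it got worse, which behaves sensibly on a concave-upward metric. The sketch below is our own illustration with an invented response-time curve, not the paper's fuzzy controller:

    def next_setting(maxclients, step, prev_rt, curr_rt, lo=16, hi=1024, shrink=0.5):
        if curr_rt > prev_rt:                 # got worse: reverse direction, smaller step
            step = -step * shrink
        return max(lo, min(hi, int(maxclients + step))), step

    # Example trajectory against a made-up concave response-time curve.
    rt = lambda m: (m - 300) ** 2 / 1000.0 + 50.0    # minimum near MaxClients = 300
    m, step, prev = 100, 64, float("inf")
    for _ in range(10):
        cur = rt(m)
        m, step = next_setting(m, step, prev, cur)
        prev = cur
        print(m, round(cur, 1))

A fuzzy controller refines this by grading "worse" and "better" instead of treating them as crisp events.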
Conference Paper
Management of operating system configuration files files is an essential part of UNIX systems administration. It is particularly difficult in environments with a large number of computers. This paper presents a study of UNIX configuration file management. It compares existing systems and tools from the literature, presents several case studies of configuration file management in practice, examines one site in depth, and makes numerous observations on the configuration process.
Conference Paper
Probabilistic Risk Assessment (PRA) is a method of estimating system reliability by combining logic models of the ways systems can fail with numerical failure rates. One postulates a failure state and systematically decomposes this state into a combination of more basic events through a process known as Fault Tree Analysis (FTA). Failure rates are derived from vendor specifications, historical trends, on-call reports, and many other sources. FTA has been used for decades in the defense, aerospace, and nuclear power industries to manage risk and increase reliability of complex engineering systems. Combining FTA with event tree analysis (ETA), one can associate failure probabilities with consequences to clearly communicate risk both pictorially and numerically. Basic PRA techniques can help increase the reliability and security of computer systems.
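A worked toy example of the arithmetic (failure probabilities invented, independence assumed): basic-event probabilities combine through AND and OR gates to give the probability of the top event.

    def p_or(*ps):     # probability that at least one independent event occurs
        q = 1.0
        for p in ps:
            q *= (1.0 - p)
        return 1.0 - q

    def p_and(*ps):    # probability that all independent events occur
        q = 1.0
        for p in ps:
            q *= p
        return q

    p_disk = 0.02                              # one disk fails in the period of interest
    p_psu  = 0.01                              # power supply fails
    p_mirror = p_and(p_disk, p_disk)           # both mirrored disks fail (AND gate)
    p_server = p_or(p_mirror, p_psu)           # top event: server unavailable (OR gate)
    print(round(p_mirror, 6), round(p_server, 6))   # 0.0004 0.010396

Real analyses must also account for dependent and common-cause failures, which simple multiplication like this does not capture.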
Conference Paper
We were recently asked by management to produce an interactive metric. This metric should somehow measure the performance of the machine as the user perceives it, or the interactive response time of a machine. The metric could then be used to identify unusual behavior, or machines with potential performance problems in our network. This paper describes firstly how we set about trying to pin down such an intangible quality of the system and how we produced graphs that satisfied management requirements. We also discuss how further use can be made of the metric results to provide data for dealing with user reported interactive response problems. Finally, we relate why this metric is not the tool for analyzing system performance that it may superficially appear to be.
Conference Paper
The process of network debugging is commonly guided by "decision trees" that describe and attempt to address the most common failure modes. We show that troubleshooting can be made more effective by converting decision trees into suites of "convergent" troubleshooting scripts that do not change network attributes unless these are out of compliance with accepted norms. "Maelstrom" is a tool for managing and coordinating execution of these scripts. Maelstrom exploits convergence of individual scripts to dynamically infer an appropriate execution order for the scripts. It accomplishes this in O(n^2) procedure trials, where n is the number of troubleshooting scripts. This greatly eases adding scripts to a troubleshooting scheme, and thus makes it easier for people to cooperate in producing more exhaustive and effective troubleshooting schemes.
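The coordination idea can be sketched as follows (our own simplification, not Maelstrom itself): because each script is convergent, i.e. it acts only when its aspect of the system is out of compliance and reports whether it changed anything, sweeping the whole list until a pass makes no changes discovers a workable order in at most n+1 passes over n scripts, i.e. O(n^2) trials.

    def run_until_converged(scripts, state):
        for _ in range(len(scripts) + 1):      # at most n passes of changes plus one clean pass
            changed = False
            for script in scripts:
                changed |= script(state)       # True if the script had to repair something
            if not changed:
                return state                   # fixed point: everything is in compliance
        raise RuntimeError("no convergence; scripts may have conflicting goals")

    # Two toy convergent scripts with a hidden ordering dependency.
    def fix_interface(state):
        if not state.get("if_up"):
            state["if_up"] = True
            return True
        return False

    def fix_route(state):
        if state.get("if_up") and not state.get("route_ok"):
            state["route_ok"] = True
            return True
        return False

    print(run_until_converged([fix_route, fix_interface], {}))

Even though the scripts are supplied in the wrong order, repetition sorts out the dependency, which is why adding new scripts to such a scheme stays cheap.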
Conference Paper
In an ideal world, the system administrator would simply specify a complete model of system requirements and the system would automatically fulfill them. If requirements changed, or if the system deviated from requirements, the system would change itself to converge with requirements. Current specialized tools for convergent system administration already provide some ability to do this, but are limited by specification languages that cannot adequately represent all possible sets of requirements. We take the opposite approach of starting with a general-purpose logic programming language intended for specifying requirements and analyzing system state, and adapting that language for system administration. Using Prolog with appropriate extensions, one can specify complex system requirements and convergent processes involving multiple information domains, including information about files, filesystems, users, and processes, as well as information from databases. By hiding unimportant details, Prolog allows a simple relationship between requirements and the scripts that implement them. We illustrate these observations by use of a simple proof-of-concept prototype.
Conference Paper
When faced with the many problems that arise in a complex of heterogeneous networked workstations, systems administrators often resort to coding scripts to monitor and problem-solve, scripts that they then schedule via cron. PIKT is a new and innovative approach to monitor scripting and managing system configurations. PIKT consists of an embedded scripting language with unique labor-saving features, a sophisticated script and system configuration file preprocessor, a scheduler, an installer, and other useful tools. More than just a systems monitor, PIKT is also a cross-categorical toolkit for configuring systems, organizing system security, formatting documents, assisting command-line work, and performing other common systems administration tasks.
Conference Paper
Computer systems require monitoring to detect performance anomalies such as runaway processes, but problem detection and diagnosis is a complex task requiring skilled attention. Although human attention was never ideal for this task, as networks of computers grow larger and their interactions more complex, it falls far short. Existing computer-aided management systems require the administrator to manually specify fixed "trouble" thresholds. In this paper we report on an expert system that automatically sets thresholds, and detects and diagnoses performance problems on a network of Unix computers. Key to the success and scalability of this system are the time series models we developed to model the variations in workload on each host. Analysis of the load average records of 50 machines yielded models which show, for workstations with simulated problem injection, false positive and negative rates of less than 1%. The server machines most difficult to model still gave average false positive/negative rates of only 6%/32%. Observed values exceeding the expected range for a particular host cause the expert system to focus on that machine. There it applies tools with finer resolution and more discrimination, including per-command profiles gleaned from process accounting records. It makes one of 18 specific diagnoses and notifies the administrator, and optionally the user.
Article
We describe experiences and frequently used configuration idioms for simplifying data and system administration using the GNU site configuration tool cfengine. © 1997 John Wiley & Sons, Ltd.
Article
In this paper two things are done. (1) It is shown that a considerable simplification can be attained in writing down matrix elements for complex processes in electrodynamics. Further, a physical point of view is available which permits them to be written down directly for any specific problem. Being simply a restatement of conventional electrodynamics, however, the matrix elements diverge for complex processes. (2) Electrodynamics is modified by altering the interaction of electrons at short distances. All matrix elements are now finite, with the exception of those relating to problems of vacuum polarization. The latter are evaluated in a manner suggested by Pauli and Bethe, which gives finite results for these matrices also. The only effects sensitive to the modification are changes in mass and charge of the electrons. Such changes could not be directly observed. Phenomena directly observable, are insensitive to the details of the modification used (except at extreme energies). For such phenomena, a limit can be taken as the range of the modification goes to zero. The results then agree with those of Schwinger. A complete, unambiguous, and presumably consistent, method is therefore available for the calculation of all processes involving electrons and photons.
Article
I want in this article to trace the history of an idea. It is beginning to become clear that a range of problems in evolution theory can most appropriately be attacked by a modification of the theory of games, a branch of mathematics first formulated by Von Neumann and Morgenstern in 1944 for the analysis of human conflicts. The problems are diverse and include not only the behaviour of animals in contest situations but also some problems in the evolution of genetic mechanisms and in the evolution of ecosystems. It is not, however, sufficient to take over the theory as it has been developed in sociology and apply it to evolution. In sociology, and in economics, it is supposed that each contestant works out by reasoning the best strategy to adopt, assuming that his opponents are equally guided by reason. This leads to the concept of a ‘minimax’ strategy, in which a contestant behaves in such a way as to minimise his losses on the assumption that his opponent behaves so as to maximise them. Clearly, this would not be a valid approach to animal conflicts. A new concept has to be introduced, the concept of an ‘evolutionary stable strategy’.
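The central definition in that line of work is compact enough to state here. Writing E(J, I) for the expected payoff to an individual playing strategy J against an opponent playing I, a strategy I is an evolutionarily stable strategy if, for every alternative strategy J ≠ I,

    E(I, I) > E(J, I), \quad \text{or} \quad E(I, I) = E(J, I) \ \text{and} \ E(I, J) > E(J, J),

so that a population of I-players cannot be invaded by a small minority playing J. This replaces the rational ‘minimax’ reasoning mentioned above with a population-dynamic stability criterion.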
Article
It is shown that networks of computers can be described by concepts of statistical physics. Computers in a network behave like systems coupled to a thermal reservoir. The role of thermal fluctuations is played by computing transactions. A thermal Kubo-Martin-Schwinger condition arises due to the coupling of a computer to a strong periodic source, namely, the daily and weekly usage patterns of the system.
Conference Paper
Service providers typically define quality of service problems using threshold tests, such as "are HTTP operations greater than 12 per second on server XYZ?" This paper explores the feasibility of predicting violations of threshold tests. Such a capability would allow providers to take corrective actions in advance of service disruptions. Our approach estimates the probability of threshold violations for specific times in the future. We modeled the threshold metric (e.g., HTTP operations per second) at two levels: (1) nonstationary behavior (as is done in workload forecasting for capacity planning) and (2) stationary, time-serial dependencies. Using these models, we compute the probability of threshold violations. We assess our approach using measurements of HTTP operations per second collected from a production Web server. These assessments suggest that our approach works well if: (a) the actual values of predicted metrics are sufficiently distant from their thresholds; and/or (b) the prediction horizon is not too far into the future
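A stripped-down version of the calculation (not the paper's two-level model; a plain Gaussian forecast error is assumed here) computes the violation probability as an upper tail:

    from math import erf, sqrt

    def violation_probability(mu, sigma, threshold):
        """P(metric > threshold) when the forecast for the metric is N(mu, sigma**2)."""
        z = (threshold - mu) / (sigma * sqrt(2.0))
        return 0.5 * (1.0 - erf(z))

    # Forecast of 9 HTTP operations/second with standard deviation 2, threshold 12.
    print(round(violation_probability(mu=9.0, sigma=2.0, threshold=12.0), 3))   # 0.067

As the abstract notes, such predictions are most useful when the forecast sits far enough from the threshold or the horizon is short, since otherwise the tail probability is dominated by forecast error.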
Conference Paper
Extensible operating systems allow applications to modify kernel behavior by providing mechanisms for application code to run in the kernel address space. Extensibility enables a system to efficiently support a broader class of applications than is currently supported. This paper discusses the key challenge in making extensible systems practical: determining which parts of the system need to be extended and how. The determination of which parts of the system need to be extended requires self-monitoring, capturing a significant quantity of data about the performance of the system. Determining how to extend the system requires self-adaptation. In this paper, we describe how an extensible operating system (VINO) can use in situ simulation to explore the efficacy of policy changes. This automatic exploration is applicable to other extensible operating systems and can make these systems self-adapting to workload demands
Disk space management without quotas
  • E D Zwicky
E.D. Zwicky. Disk space management without quotas. Proceedings of the third systems administration conference LISA, (SAGE/USENIX), page 41, 1989.
Allocating functions among humans and machines
  • T B Sheridan
T.B. Sheridan, Allocating functions among humans and machines, in: D. Beevis, P. Essens, H. Schuuel (Eds.), Improving Function Allocation for Integrated Systems Design, Wright-Patterson Air Force Base, CSERIAC State-of-the-Art Report, 1996, pp. 179–198.
The Theory of Financial Risks
  • J Bouchard
  • M Potters
J.P. Bouchard, M. Potters, The Theory of Financial Risks, Cambridge University Press, Cambridge, 2000.
Cfengine's immunity model of evolving configuration management
  • M Burgess
M. Burgess, Cfengine's immunity model of evolving configuration management, Sci. Comput. Programming, 2002, submitted for publication.
A site configuration engine
  • M Burgess
M. Burgess, A site configuration engine, Computing systems, Vol. 8, MIT Press, Cambridge, MA, 1995, p. 309.
Measuring host normality
  • M Burgess
  • H Haugerud
  • T Reitan
  • S Straumsnes
M. Burgess, H. Haugerud, T. Reitan, S. Straumsnes, Measuring host normality, ACM Trans. Comput. Systems 20 (2001) 125–160.
Measuring host normality. Software Practice and Experience (submitted)
  • M Burgess
  • H Haugerud
  • S Straumsnes
M. Burgess, H. Haugerud, and S. Straumsnes. Measuring host normality. Software Practice and Experience (submitted), 1999.