Configuration management at massive scale: system design and experience

Pennsylvania State Univ., University Park, PA
IEEE Journal on Selected Areas in Communications (Impact Factor: 3.12). 05/2009; DOI: 10.1109/JSAC.2009.090408
Source: DBLP

ABSTRACT The development and maintenance of network device configurations is one of the central challenges faced by large network providers. Current network management systems fail to meet this challenge primarily because of their inability to adapt to rapidly evolving customer and provider-network needs, and because of mismatches between the conceptual models of the tools and the services they must support. In this paper, we present the PRESTO configuration management system, which attempts to address these failings in a comprehensive and flexible way. Developed for and used during the last 5 years within a large ISP network, PRESTO constructs device-native configurations based on the composition of configlets representing different services or service options. Configlets are compiled by extracting and manipulating data from external systems as directed by the PRESTO configuration scripting and template language. We outline the configuration management needs of large-scale network providers, introduce the PRESTO system and configuration language, and reflect upon our experiences developing PRESTO-configured VPN and VoIP services. In doing so, we describe how PRESTO promotes healthy configuration management practices.

  • ABSTRACT: Recent works have shown the benefits of a systematic approach to designing enterprise networks. However, these works are limited to the design of greenfield (newly deployed) networks, or to incremental evolution of existing networks without altering prior design decisions. In this paper, we focus on redesigning existing networks, allowing for changes to existing decisions. Such redesign (migration) may be desirable from the perspective of improved network performance or lower complexity. However, the key challenge is that the costs of redesign may be high due to the presence of complex dependencies between network configurations. We consider these issues in the context of virtual local area networks (VLANs), an important area of enterprise network design. We make three contributions. First, we present a model to capture VLAN redesign costs. Such costs may arise from the need to reconfigure policies (e.g., security policies) to reflect the changes in VLAN design and ensure the continued correctness of the network. Second, we present a framework that enables operators to systematically determine the best strategies to redesign VLANs so the desired performance goals may be achieved while the costs of redesign are minimized. Finally, we demonstrate the effectiveness of our approach using data obtained from a large-scale campus network.
    INFOCOM, 2011 Proceedings IEEE; 05/2011
  • ABSTRACT: Modern data centers need to manage complex, multi-level hardware and software infrastructures in order to provide a wide array of services flexibly and reliably. The emerging trends of virtualization and outsourcing further increase the scale and complexity of this management. In this chapter, we focus on the configuration management issues and expose a variety of attack and misconfiguration scenarios, and discuss some approaches to making configuration management more robust. We also discuss a number of challenges in identifying the vulnerabilities in configurations, handling configuration management in the emerging cloud computing environments, and in hardening the configurations against hacker attacks.
    08/2011: pages 161-181
  • ABSTRACT: As a network evolves over time, multiple operators modify its configuration, without fully considering what has previously been done. Similar policies are defined more than once, and policies that become obsolete after a transition are left in the configuration. As a result, the network configuration becomes complicated and disorganized, escalating maintenance costs and operator faults. We present a reorganization system that groups common policies by discovering a set of shared features and uses the groupings in the configuration instead of each individual policy. Such an approach removes redundancies and simplifies the configuration while preserving the intended behavior of the configuration. We apply the reorganization system to the routing-policy configurations from four production networks, and reduce the number of configuration commands by more than 50%. These reduced configurations are shown to be sufficient to satisfy changes as the network evolves over a two-year period. In addition, we conduct a set of user studies involving 62 participants. These studies examine the participants’ comprehension of reorganized configurations as compared to the original configurations. The studies show that our reorganization system improves both accuracy, from 60% to nearly 90%, and time to task completion, from 24 min to 13 min.
    Computer Networks. 09/2012; 56(14):3192–3205.