An adaptive approach to network resilience: Evolving challenge detection and mitigation.
ABSTRACT: It is widely agreed that computer networks need to become more resilient to a range of challenges that can seriously impact their normal operation. Challenges include malicious attacks, misconfigurations, accidental faults and operational overloads. As part of an overall strategy for network resilience, a crucial requirement is the identification of challenges in real time, followed by the application of appropriate remedial action. In this paper, we motivate and describe a novel solution that enables the progressive multi-stage deployment of resilience strategies, based on incomplete challenge and context information. Policies are used to orchestrate the interactions between various resilience mechanisms, which incrementally identify the nature of a challenge and deploy appropriate remediation mechanisms. We demonstrate the benefits of this approach via simulation of a resource starvation attack on an Internet Service Provider infrastructure. A key contribution of our work is the ability to mitigate a challenge as early as possible and rapidly detect its root cause, achieved by initially using lightweight detection and then progressively applying more heavyweight analysis. The approach we propose in this paper has the flexibility, reproducibility and extensibility needed to assist in the identification and remediation of various network challenges in the future.
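The progressive, policy-driven idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the flow-record format, the three stage functions (`link_overloaded`, `find_victim`, `heavy_sources`), the thresholds and the `POLICIES` table are all hypothetical names and values chosen for illustration. Each stage runs only if the cheaper stage before it found something, and each finding immediately triggers a remediation, so mitigation begins before the root cause is fully known.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical flow record: (source, destination, bytes seen in the window).
Flow = Tuple[str, str, int]


def link_overloaded(flows: List[Flow], capacity: int, threshold: float = 0.8) -> bool:
    """Stage 1 (lightweight): compare total load with a fraction of capacity."""
    return sum(size for _, _, size in flows) > threshold * capacity


def find_victim(flows: List[Flow], share: float = 0.5) -> Optional[str]:
    """Stage 2 (heavier): find a destination receiving most of the traffic."""
    per_dst = Counter()
    for _, dst, size in flows:
        per_dst[dst] += size
    if not per_dst:
        return None
    dst, size = per_dst.most_common(1)[0]
    return dst if size > share * sum(per_dst.values()) else None


def heavy_sources(flows: List[Flow], victim: str, share: float = 0.3) -> List[str]:
    """Stage 3 (heaviest): sources that each send a large share of the victim's traffic."""
    per_src = Counter()
    for src, dst, size in flows:
        if dst == victim:
            per_src[src] += size
    total = sum(per_src.values())
    return sorted(src for src, size in per_src.items() if size > share * total)


# Assumed policy table: maps each kind of finding to the remediation it triggers.
POLICIES = {
    "overload": lambda _: "rate-limit link",
    "victim": lambda v: f"rate-limit traffic to {v}",
    "sources": lambda s: "block " + ", ".join(s),
}


def respond(flows: List[Flow], capacity: int) -> List[str]:
    """Run the stages in order, remediating after each finding."""
    actions: List[str] = []
    if not link_overloaded(flows, capacity):
        return actions
    actions.append(POLICIES["overload"](None))   # coarse mitigation first
    victim = find_victim(flows)
    if victim is None:
        return actions
    actions.append(POLICIES["victim"](victim))   # narrower mitigation
    sources = heavy_sources(flows, victim)
    if sources:
        actions.append(POLICIES["sources"](sources))  # targeted mitigation
    return actions
```

In this sketch, normal traffic never pays the cost of the heavier stages. A challenge is first met with a coarse rate limit, and that response is refined as more is learned about the victim and the sources.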
- Conference Paper: Management patterns: SDN-enabled network resilience management
ABSTRACT: Software-defined networking provides abstractions and a flexible architecture for the easy configuration of network devices, based on the decoupling of the data and control planes. This separation has the potential to considerably simplify the implementation of resilience functionality (e.g., traffic classification, anomaly detection, traffic shaping) in future networks. Although software-defined networking in general, and OpenFlow as its primary realisation, provide such abstractions, support is still needed for orchestrating a collection of OpenFlow-enabled services that must cooperate to implement network-wide resilience. In this paper, we describe a resilience management framework that can be readily applied to this problem. An important part of the framework is a set of policy-controlled management patterns that describe how to orchestrate individual resilience services, implemented as OpenFlow applications.
NOMS 2014 - 2014 IEEE/IFIP Network Operations and Management Symposium; 05/2014
- Conference Paper: Single Points of Failure Within Systems-of-Systems
ABSTRACT: Computer technology has become highly complex and widely available and, thanks to the growth and popularity of the Internet, society and organisations are becoming heavily reliant on distinct computer technology to meet objectives and fulfil daily demands. Unfortunately, distinct systems cannot always fulfil the complex requirements demanded; hence, distinct systems are now being integrated to form larger collaborating systems to meet intended requirements. These systems are defined as "Systems-of-Systems", and are emerging in areas such as critical infrastructure, space exploration, and the military. This paper highlights some of the challenges faced when integrating distinct systems to form these large, complex, heterogeneous Systems-of-Systems, and identifies Systems-of-Systems that have failed and the disastrous consequences of those failures. Our research also discusses how single points of failure can heavily impact collaborating systems' ability to fulfil objectives, which can cause them to fail with disastrous consequences.
Proceedings of the 14th Annual Post Graduate Symposium on the Convergence of Telecommunications, Networking and Broadcasting (PGNet 2013); 06/2013
- ABSTRACT: Network resilience strategies aim to maintain acceptable levels of network operation in the face of challenges, such as malicious attacks, operational overload or equipment failures. Often the nature of these challenges requires resilience strategies comprising mechanisms across multiple protocol layers and in disparate locations of the network. In this paper, we address the problem of resilience management and advocate that a new approach is needed for the design and evaluation of resilience strategies. To support the realisation of this approach we propose a framework that enables (1) the offline evaluation of resilience strategies to combat several types of challenges, (2) the generalisation of successful solutions into reusable patterns of mechanisms, and (3) the rapid deployment of appropriate patterns when challenges are observed at run-time. The evaluation platform permits the simulation of a range of challenge scenarios and the resilience strategies used to combat these challenges. Strategies that can successfully address a particular type of challenge can be promoted to become resilience patterns. Patterns can thus be used to rapidly deploy resilience configurations of mechanisms when similar challenges are detected in the live network.
01/2012; DOI:10.1109/NOMS.2012.6211924
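The three-step workflow in this abstract (offline evaluation, promotion to a pattern, run-time deployment) can be sketched as a small registry. This is only an illustration under stated assumptions: the `PatternRegistry` class, the `min_score` threshold and the `simulate` callback signature are hypothetical and are not taken from the paper, which does not specify how strategies are scored.

```python
from typing import Callable, Dict, List, Optional

# Assumed representation: a strategy is an ordered list of mechanism names.
Strategy = List[str]
# Assumed simulator signature: returns a score in [0, 1] for a strategy
# applied to a given challenge type (higher is better).
Simulator = Callable[[str, Strategy], float]


class PatternRegistry:
    """Keeps the best sufficiently successful strategy per challenge type."""

    def __init__(self, min_score: float = 0.9) -> None:
        self.min_score = min_score
        self._patterns: Dict[str, Strategy] = {}
        self._best_score: Dict[str, float] = {}

    def evaluate(self, challenge: str, strategy: Strategy, simulate: Simulator) -> bool:
        """Offline step: simulate the strategy and promote it if it scores well enough."""
        score = simulate(challenge, strategy)
        if score >= self.min_score and score > self._best_score.get(challenge, 0.0):
            self._best_score[challenge] = score
            self._patterns[challenge] = list(strategy)
            return True
        return False

    def deploy(self, challenge: str) -> Optional[Strategy]:
        """Run-time step: look up the pattern for an observed challenge, if any."""
        return self._patterns.get(challenge)
```

A strategy that fails the offline evaluation is never stored, so `deploy` returns `None` for challenge types that have no promoted pattern yet.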