An adaptive approach to network resilience: Evolving challenge detection and mitigation.
ABSTRACT It is widely agreed that computer networks need to become more resilient to a range of challenges that can seriously impact their normal operation. Challenges include malicious attacks, misconfigurations, accidental faults and operational overloads. As part of an overall strategy for network resilience, a crucial requirement is the identification of challenges in real-time, followed by the application of appropriate remedial action. In this paper, we motivate and describe a novel solution that enables the progressive multi-stage deployment of resilience strategies, based on incomplete challenge and context information. Policies are used to orchestrate the interactions between various resilience mechanisms, which incrementally identify the nature of a challenge and deploy appropriate remediation mechanisms. We demonstrate the benefits of this approach via simulation of a resource starvation attack on an Internet Service Provider infrastructure. By initially using lightweight detection and then progressively applying more heavyweight analysis, a key contribution of our work is the ability to mitigate a challenge as early as possible and rapidly detect its root cause. The approach we propose in this paper has the flexibility, reproducibility and extensibility needed to assist in the identification and remediation of various network challenges in the future.
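The progressive approach described in the abstract (cheap detection first, heavier analysis only once something looks wrong, remediation refined at each stage) might be sketched as follows. This is a minimal illustration and not the authors' implementation: the `Stage` class, the `run_stages` function, the two detectors, the 0.8 and 0.5 thresholds, and the observation fields are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Stage:
    """One detection/remediation step (hypothetical structure)."""
    name: str
    detect: Callable[[Dict], Optional[str]]  # returns a finding, or None
    remediate: Callable[[str], str]          # maps a finding to an action


def run_stages(stages: List[Stage], observation: Dict) -> List[str]:
    """Run stages in order, cheapest first; stop once a stage finds nothing."""
    actions = []
    for stage in stages:
        finding = stage.detect(observation)
        if finding is None:
            break  # nothing further to refine, skip the costlier stages
        actions.append(stage.remediate(finding))
    return actions


def volume_anomaly(obs: Dict) -> Optional[str]:
    # Lightweight check: assumed utilisation threshold of 0.8.
    return "high utilisation" if obs["link_utilisation"] > 0.8 else None


def likely_victim(obs: Dict) -> Optional[str]:
    # Heavier check: a destination receiving over half of all bytes (assumed rule).
    by_dst = obs["bytes_by_dst"]
    if not by_dst:
        return None
    dst, nbytes = max(by_dst.items(), key=lambda kv: kv[1])
    return dst if nbytes / sum(by_dst.values()) > 0.5 else None


STAGES = [
    Stage("link monitor", volume_anomaly, lambda f: "rate-limit link"),
    Stage("victim detection", likely_victim,
          lambda dst: f"rate-limit traffic to {dst}"),
]

observation = {
    "link_utilisation": 0.95,
    "bytes_by_dst": {"10.0.0.5": 900, "10.0.0.7": 100},
}
print(run_stages(STAGES, observation))
```

In this sketch, coarse mitigation (rate-limiting the whole link) is applied as soon as the lightweight stage fires, and is followed by a narrower action once the heavier stage has more information, which mirrors the "mitigate early, then identify the root cause" idea in the abstract.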
- Source available from: Paul Smith
- "Despite the multitude of mechanisms and techniques available, it is often not clear how these should be combined and coordinated in complex multi-service networks. We found that the published state-of-the-art in challenge detection and classification varies in the resources that are required, the timeliness and accuracy of their operation, and the challenges they can effectively operate with. For example, localised detection in fluctuations of traffic volumes can give a rapid and relatively lightweight indication of the onset of challenges, such as DDoS attacks or flash crowd events, whereas a sophisticated classification system can yield more accurate information about the challenge, e.g., the identification of malicious flows, over a longer period of time."
Conference Paper: Management patterns: SDN-enabled network resilience management
ABSTRACT: Software-defined networking provides abstractions and a flexible architecture for the easy configuration of network devices, based on the decoupling of the data and control planes. This separation has the potential to considerably simplify the implementation of resilience functionality (e.g., traffic classification, anomaly detection, traffic shaping) in future networks. Although software-defined networking in general, and OpenFlow as its primary realisation, provide such abstractions, support is still needed for orchestrating a collection of OpenFlow-enabled services that must cooperate to implement network-wide resilience. In this paper, we describe a resilience management framework that can be readily applied to this problem. An important part of the framework is a set of policy-controlled management patterns that describe how to orchestrate individual resilience services, implemented as OpenFlow applications.
NOMS 2014 - 2014 IEEE/IFIP Network Operations and Management Symposium; 05/2014
- "While it could be perceived that data is at its most vulnerable state during its transmission across the SoS, we look at data in its entirety and recognise there are many weaknesses that can truly affect data within SoS. Data is not just at risk from malicious attacks by outsiders; data can be at risk from legitimate user error, components within the SoS, the physical structure of the network and Internet, as well as natural disasters. For example, data can become corrupt via system components during its creation and processing; also, malicious attackers from within the SoS can alter, corrupt or delete data just as easily as a legitimate user's unintentional actions."
Conference Paper: Single Points of Failure Within Systems-of-Systems
ABSTRACT: Computer technology has become highly complex and widely available and, thanks to the growth and popularity of the Internet, society and organisations are becoming heavily reliant on distinct computer technology to meet objectives and fulfil daily demands. Unfortunately, distinct systems cannot always fulfil the complex requirements demanded; hence distinct systems are now being integrated to form larger collaborating systems to meet intended requirements. These systems are defined as "Systems-of-Systems", and are emerging in areas such as critical infrastructure, space exploration, and the military. This paper highlights some of the challenges faced when integrating distinct systems to form these large, complex, heterogeneous Systems-of-Systems; we identify Systems-of-Systems that have failed and the disastrous consequences. Our research also discusses how single points of failure can heavily impact collaborating systems' abilities to fulfil objectives, which can result in them failing with disastrous consequences.
Proceedings of the 14th Annual Post Graduate Symposium on the Convergence of Telecommunications, Networking and Broadcasting (PGNet 2013); 06/2013
- "The framework presented in this paper builds on parts of our previous efforts on network resilience. It also employs research done independently by the authors."
ABSTRACT: Network resilience strategies aim to maintain acceptable levels of network operation in the face of challenges, such as malicious attacks, operational overload or equipment failures. Often the nature of these challenges requires resilience strategies comprising mechanisms across multiple protocol layers and in disparate locations of the network. In this paper, we address the problem of resilience management and advocate that a new approach is needed for the design and evaluation of resilience strategies. To support the realisation of this approach we propose a framework that enables (1) the offline evaluation of resilience strategies to combat several types of challenges, (2) the generalisation of successful solutions into reusable patterns of mechanisms, and (3) the rapid deployment of appropriate patterns when challenges are observed at run-time. The evaluation platform permits the simulation of a range of challenge scenarios and the resilience strategies used to combat these challenges. Strategies that can successfully address a particular type of challenge can be promoted to become resilience patterns. Patterns can thus be used to rapidly deploy resilience configurations of mechanisms when similar challenges are detected in the live network.
01/2012; DOI:10.1109/NOMS.2012.6211924