Risk-Aware Routing for Optical Transport Networks
ABSTRACT: A Service Level Agreement (SLA) typically specifies the availability a Service Provider (SP) promises to a customer. In an Optical Transport Network, a lightpath for a connection is commonly chosen based on whether the lightpath's availability complies with the connection's SLA-requested availability. Because network failures are stochastic, the actual availability of a lightpath over a specific time period is uncertain, and the SLA is usually at risk. We take this uncertainty into account and study routing that minimizes the probability of SLA violation. First, we use a single-link model to study SLA Violation Risk (i.e., the probability of SLA violation) under different settings. We show that SLA Violation Risk may vary across paths and is affected by other factors (e.g., failure rate and connection holding time), and hence cannot be described by path availability alone. We then formulate the problem of risk-aware routing in mesh networks, in which routing decisions are dictated by SLA Violation Risk. In particular, we focus on devising a scheme capable of computing lightpath(s) that are likely to satisfy a connection's SLA-requested availability. A novel technique converts links with heterogeneous failure profiles into reference links, which capture the main risk features in a relative manner. Based on this "reference link" concept, we present a polynomial-time Risk-Aware Routing scheme that uses only limited failure information. In addition, we extend our Risk-Aware Routing scheme to incorporate shared path protection (SPP) when protection is needed. We evaluate the performance of our schemes, demonstrate their effectiveness in terms of SLA violation ratio and, more generally, contrast them with generic availability-aware approaches.
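The claim that SLA Violation Risk is not determined by availability alone can be illustrated with a small Monte Carlo sketch. This is not the authors' single-link model: it assumes a link that alternates between up and down states with exponentially distributed time-to-failure and time-to-repair, and the names `simulate_downtime`, `sla_violation_risk`, `mttf`, and `mttr` are hypothetical. It estimates the probability that the downtime accumulated over a connection's holding time exceeds what the SLA allows.

```python
import random


def simulate_downtime(mttf, mttr, holding_time, rng):
    """Total downtime of one link over holding_time.

    Assumption (not from the paper): the link starts in the up state and
    alternates exponential up periods (mean mttf) with exponential repair
    periods (mean mttr).
    """
    t = 0.0
    down = 0.0
    while True:
        t += rng.expovariate(1.0 / mttf)       # next failure instant
        if t >= holding_time:
            break
        repair = rng.expovariate(1.0 / mttr)   # repair duration
        down += min(repair, holding_time - t)  # count only downtime inside the window
        t += repair
        if t >= holding_time:
            break
    return down


def sla_violation_risk(mttf, mttr, holding_time, sla_availability,
                       trials=20000, seed=0):
    """Estimate P(achieved availability < sla_availability) by simulation."""
    rng = random.Random(seed)
    allowed_downtime = (1.0 - sla_availability) * holding_time
    violations = sum(
        simulate_downtime(mttf, mttr, holding_time, rng) > allowed_downtime
        for _ in range(trials)
    )
    return violations / trials


if __name__ == "__main__":
    # Two hypothetical links with the same steady-state availability
    # mttf / (mttf + mttr), but a tenfold difference in failure rate.
    frequent_short = sla_violation_risk(1000.0, 1.0, 500.0, 0.999)
    rare_long = sla_violation_risk(10000.0, 10.0, 500.0, 0.999)
    print("frequent, short outages:", frequent_short)
    print("rare, long outages:     ", rare_long)
```

Under these assumptions the two links have identical steady-state availability, yet the estimated risk over a 500-hour holding time differs noticeably, which is consistent with the abstract's point that failure rate and holding time matter in addition to availability.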
ABSTRACT: Carrier networks need to provide their customers with high availability of communication services. Unfortunately, failures are managed by recovery mechanisms that engage only after a failure occurs, to limit the impact on traffic flows. However, there are often forewarning signs that a network device will stop working properly. We propose to take this risk exposure into account in order to improve the performance of existing restoration mechanisms, in particular for IP networks. Based on an embedded, real-time risk-level assessment, we can perform proactive fault management and isolate failing routers from the routed topology, and thus avoid service unavailability altogether. Our novel approach enables routers to preventively steer traffic away from risky paths by temporally tuning OSPF link cost. 01/2010.
ABSTRACT: Telecom customers may have specific time periods during which they require extra resilience. However, these time-differentiated resilience requirements are not effectively addressed by current service level agreement (SLA) frameworks. To satisfy these high-priority periods, a generic SLA framework typically provides upgraded protection over the entire service duration, which is unnecessary and expensive. In this study, we propose a novel SLA framework that allows customers to specify critical windows (CWs) to address their time-differentiated demands for resilience. CWs correspond to the resilience-sensitive periods, and connections are backed up during CWs using pre-cross-connected protection. To achieve high resource efficiency, we identify opportunities for sharing backup resources in a time-domain multiplexing manner. Two heuristic schemes are proposed, namely, Locally and Globally CW-Aware connection assignments. Our study on a sample optical mesh network shows that, by applying our SLA framework, 1) resource efficiency can be significantly improved, 2) CWs are effectively protected with high resilience (in terms of availability), 3) the availability of CWs can be increased almost linearly with used backup resources, and 4) our framework requires low operational complexity. Journal of Optical Communications and Networking. 03/2011; 3(4):312-322.