Conference Paper

Risk-Aware Routing for Optical Transport Networks

Dept. of Comput. Sci., Univ. of California, Davis, CA, USA
DOI: 10.1109/INFCOM.2010.5462168
Conference: INFOCOM, 2010 Proceedings IEEE
Source: IEEE Xplore

ABSTRACT: A Service Level Agreement (SLA) typically specifies the availability a Service Provider (SP) promises to a customer. In an Optical Transport Network, a lightpath for a connection is commonly selected according to whether the lightpath's availability complies with the connection's SLA-requested availability. Because of the stochastic nature of network failures, the actual availability of a lightpath over a specific time period is subject to uncertainty, and the SLA is usually at risk. We take this network uncertainty into account and study routing that minimizes the probability of SLA violation. First, we use a single-link model to study SLA Violation Risk (i.e., the probability of SLA violation) under different settings. We show that SLA Violation Risk may vary across paths and is affected by other factors (e.g., failure rate and connection holding time), and hence cannot be described by path availability alone. We then formulate the problem of risk-aware routing in mesh networks, in which routing decisions are dictated by SLA Violation Risk. In particular, we focus on devising a scheme capable of computing lightpath(s) that are likely to meet a connection's SLA-requested availability. A novel technique is applied to convert links with heterogeneous failure profiles into reference links, which capture the main risk features in a relative manner. Based on this "reference link" concept, we present a polynomial-time Risk-Aware Routing scheme that uses only limited failure information. In addition, we extend our Risk-Aware Routing scheme to incorporate shared path protection (SPP) when protection is needed. We evaluate the performance of our schemes, demonstrate their effectiveness in terms of SLA violation ratio and, more generally, contrast them with generic availability-aware approaches.
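
The abstract's point that SLA Violation Risk is not determined by availability alone can be illustrated with a small Monte Carlo sketch. This is not the paper's single-link model: it assumes a link that starts in the up state, with exponentially distributed times to failure and to repair, and the names `simulate_downtime`, `sla_violation_risk`, `mttf`, and `mttr` are hypothetical. A connection counts as violating its SLA when its downtime over the holding time exceeds the fraction the SLA allows.

```python
import random


def simulate_downtime(mttf, mttr, holding_time, rng):
    """Total downtime of one link over [0, holding_time].

    Assumption (not from the paper): the link starts up, and both the
    time to failure and the time to repair are exponentially distributed
    with means mttf and mttr (same time unit as holding_time).
    """
    t = 0.0
    down = 0.0
    while True:
        t += rng.expovariate(1.0 / mttf)  # link stays up until next failure
        if t >= holding_time:
            return down
        repair = rng.expovariate(1.0 / mttr)
        # Only count the part of the outage that falls inside the holding time.
        down += min(repair, holding_time - t)
        t += repair
        if t >= holding_time:
            return down


def sla_violation_risk(mttf, mttr, holding_time, sla_availability,
                       trials=20000, seed=1):
    """Estimate P(downtime > allowed downtime) by simulation."""
    rng = random.Random(seed)
    allowed = (1.0 - sla_availability) * holding_time
    violations = sum(
        simulate_downtime(mttf, mttr, holding_time, rng) > allowed
        for _ in range(trials)
    )
    return violations / trials


if __name__ == "__main__":
    # Two hypothetical links with the same steady-state availability
    # mttf / (mttf + mttr) = 0.999 but different failure rates.
    for mttf, mttr in [(999.0, 1.0), (9990.0, 10.0)]:
        risk = sla_violation_risk(mttf, mttr, holding_time=100.0,
                                  sla_availability=0.999)
        print(f"mttf={mttf}, mttr={mttr}: estimated risk = {risk:.4f}")
```

Under these assumptions, the two links have identical steady-state availability, yet the link that fails more often should show a noticeably higher estimated risk for a short holding time, which is the kind of effect the abstract describes.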

  • ABSTRACT: Carrier networks need to provide their customers with high availability of communication services. Unfortunately, failures are handled by recovery mechanisms that act only after a failure occurs, to limit its impact on traffic flows. However, there are often forewarning signs that a network device will stop working properly. We propose to take this risk exposure into account in order to improve the performance of existing restoration mechanisms, in particular for IP networks. Based on an embedded, real-time risk-level assessment, we can perform proactive fault management and isolate failing routers from the routed topology, and thus totally avoid service unavailability. Our novel approach enables routers to preventively steer traffic away from risky paths by temporally tuning OSPF link costs.
  • ABSTRACT: Telecom customers may have specific time periods during which they require extra resilience. However, these time-differentiated resilience requirements are not effectively addressed by current service level agreement (SLA) frameworks. To satisfy these high-priority periods, a generic SLA framework typically provides upgraded protection over the entire service duration, which is unnecessary and expensive. In this study, we propose a novel SLA framework that allows customers to specify critical windows (CWs) to address their time-differentiated demands for resilience. CWs correspond to the resilience-sensitive periods, and connections are backed up during CWs using pre-cross-connected protection. To achieve high resource efficiency, we identify opportunities for backup resource sharing in a time-domain multiplexing manner. Two heuristic schemes are proposed, namely, Locally and Globally CW-Aware connection assignments. Our study on a sample optical mesh network shows that, by applying our SLA framework, (1) resource efficiency can be significantly improved, (2) CWs are effectively protected with high resilience (in terms of availability), (3) the availability of CWs can be increased almost linearly with the backup resources used, and (4) our approach has low operational complexity.
    Journal of Optical Communications and Networking. 03/2011; 3(4):312-322.
  • ABSTRACT: A Service Level Agreement (SLA) is a contract between a Service Provider (SP) and a customer that typically includes the customer requirements that the SP guarantees, the fee paid to the SP if the requirements are satisfied, and the penalty incurred by the SP if they are violated. Since an important requirement is the customer's service availability, we focus on routing and admission control in optical networks to improve the SP's ability to meet customers' availability requirements. Previous researchers used statistical path availabilities to satisfy SLA requirements. A more accurate measure is the actual probability that the request will satisfy the SLA requirements. Furthermore, since the SP's goal is typically to maximize profit, a good admission control policy should also consider the profitability of the request. We study the problem of provisioning connection requests to maximize profit in optical networks. We propose a two-step solution to this problem: first, efficient SLA-aware routing and, second, intelligent admission control. For the SLA-aware routing, we consider both single-path and path-pair (one primary and one backup) solutions that route the request while minimizing the SLA violation probability. For the admission control, we propose a model to express the profitability of a request and an admission control policy that considers the violation probability and profitability to determine if and how the request should be admitted. Our admission control policy assesses a request's profitability not only by considering its expected profit but also by quantifying its resource utilization. Our results show that our solution provisions more requests, satisfies more SLA requirements, and yields more expected profit than the traditional approach.
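
The admission-control idea in the last abstract above (weighing violation probability, fee, penalty, and resource use) can be sketched as follows. This is an illustrative guess, not the authors' model: it assumes expected profit is the fee weighted by the probability of meeting the SLA minus the penalty weighted by the violation probability, and it normalizes by a hypothetical resource measure (`wavelength_links`). The `Request` fields and the `min_profit_per_link` threshold are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass
class Request:
    fee: float              # paid to the SP if the SLA is met
    penalty: float          # paid by the SP if the SLA is violated
    violation_prob: float   # estimated SLA violation probability of the chosen route
    wavelength_links: int   # hypothetical resource measure: wavelength-links used


def expected_profit(req: Request) -> float:
    """Assumed model: fee earned when the SLA holds, penalty paid otherwise."""
    return (1.0 - req.violation_prob) * req.fee - req.violation_prob * req.penalty


def admit(req: Request, min_profit_per_link: float = 0.0) -> bool:
    """Admit only if the expected profit per resource unit exceeds a threshold."""
    if req.wavelength_links <= 0:
        raise ValueError("wavelength_links must be positive")
    return expected_profit(req) / req.wavelength_links > min_profit_per_link


if __name__ == "__main__":
    low_risk = Request(fee=100.0, penalty=500.0, violation_prob=0.05, wavelength_links=4)
    high_risk = Request(fee=100.0, penalty=500.0, violation_prob=0.20, wavelength_links=4)
    print(admit(low_risk, min_profit_per_link=10.0))
    print(admit(high_risk, min_profit_per_link=10.0))
```

Dividing by resource use is one simple way to make two requests with the same expected profit compare differently when one consumes more capacity; the abstract does not say how the authors quantify resource utilization.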