Conference Paper

Risk-Aware Routing for Optical Transport Networks

Dept. of Comput. Sci., Univ. of California, Davis, CA, USA
DOI: 10.1109/INFCOM.2010.5462168
Conference: INFOCOM, 2010 Proceedings IEEE
Source: IEEE Xplore


A Service Level Agreement (SLA) typically specifies the availability a Service Provider (SP) promises to a customer. In an Optical Transport Network, a lightpath for a connection is commonly selected according to whether its availability complies with the connection's SLA-requested availability. Because of the stochastic nature of network failures, the actual availability of a lightpath over a specific time period is subject to uncertainty, and the SLA is usually at risk. We take this uncertainty into account and study routing that minimizes the probability of SLA violation. First, we use a single-link model to study SLA Violation Risk (i.e., the probability of SLA violation) under different settings. We show that SLA Violation Risk may vary across paths and is affected by other factors (e.g., failure rate and connection holding time), and hence cannot be described by path availability alone. We then formulate the problem of risk-aware routing in mesh networks, in which routing decisions are dictated by SLA Violation Risk. In particular, we focus on devising a scheme capable of computing lightpath(s) that are likely to successfully accommodate a connection's SLA-requested availability. A novel technique is applied to convert links with heterogeneous failure profiles to reference links, which capture the main risk features in a relative manner. Based on the "reference link" concept, we present a polynomial-time Risk-Aware Routing scheme that uses only limited failure information. In addition, we extend our Risk-Aware Routing scheme to incorporate shared path protection (SPP) when protection is needed. We evaluate the performance and demonstrate the effectiveness of our schemes in terms of SLA violation ratio and, more generally, contrast them with generic availability-aware approaches.
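
The claim that SLA Violation Risk is not determined by availability alone can be illustrated with a small simulation. The sketch below is not the paper's model: it assumes a single link whose up and down periods are exponentially distributed (mean time to failure `mttf`, mean time to repair `mttr`), and the function names, the parameter values, and the Monte Carlo approach are all illustrative assumptions. It estimates the probability that the uptime fraction over a finite holding time falls below the SLA target.

```python
import random


def interval_availability(mttf, mttr, holding_time, rng):
    """Simulate one up/down history of a link; return its uptime fraction.

    Assumption (not from the paper): up and down durations are exponential
    with means mttf and mttr, and the link is up at connection setup.
    """
    t, up_time, is_up = 0.0, 0.0, True
    while t < holding_time:
        mean = mttf if is_up else mttr
        # Truncate the current period at the end of the holding time.
        duration = min(rng.expovariate(1.0 / mean), holding_time - t)
        if is_up:
            up_time += duration
        t += duration
        is_up = not is_up
    return up_time / holding_time


def sla_violation_risk(mttf, mttr, holding_time, sla_target, runs=20000, seed=1):
    """Monte Carlo estimate of P(uptime fraction < sla_target)."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    violations = sum(
        interval_availability(mttf, mttr, holding_time, rng) < sla_target
        for _ in range(runs)
    )
    return violations / runs


if __name__ == "__main__":
    # Two hypothetical links with the same steady-state availability
    # mttf / (mttf + mttr) = 0.999 but different failure frequencies (hours).
    frequent = sla_violation_risk(mttf=999.0, mttr=1.0,
                                  holding_time=720.0, sla_target=0.999)
    rare = sla_violation_risk(mttf=9990.0, mttr=10.0,
                              holding_time=720.0, sla_target=0.999)
    print(f"frequent short outages: risk ~ {frequent:.3f}")
    print(f"rare long outages:      risk ~ {rare:.3f}")
```

Both hypothetical links have the same long-run availability, yet over a 720-hour holding time the link that fails often is far more likely to exceed the downtime budget than the link that fails rarely. This is consistent with the abstract's point that failure rate and holding time matter in addition to availability.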

  • ABSTRACT: Carrier networks need to provide their customers with high availability of communication services. Unfortunately, failures are handled by recovery mechanisms that act only after a failure has occurred, to limit the impact on traffic flows. However, there are often forewarning signs that a network device will stop working properly. We propose to take this risk exposure into account in order to improve the performance of existing restoration mechanisms, in particular for IP networks. Based on an embedded, real-time risk-level assessment, we can perform proactive fault management and isolate failing routers from the routed topology, and thus totally avoid service unavailability. Our novel approach enables routers to preventively steer traffic away from risky paths by temporally tuning OSPF link cost.
  • ABSTRACT: Routing and wavelength assignment under availability constraints has been extensively researched recently. Availability (or more precisely, steady-state availability) can be defined as the average probability of a connection operating over a time window that tends to infinity. However, service level agreements (SLAs) commit a minimum connection uptime fraction over a finite contract duration. This random variable is known in reliability engineering as interval availability. If the minimum agreed interval availability is not honored, the service provider is penalized. In order to balance the risk of non-compliance fines against asset protection costs, network planners must know the interval availability distribution. However, its estimation with existing numerical techniques is computationally expensive, motivating the search for approximate analytical methods. Under the hypotheses of Poissonian node and link failures and repairs, and assuming no more than two link failures or one node failure in the network, we propose, for connections protected by shared or dedicated methods:
      • an approximate Markov model that allows the derivation of a closed-form expression for the connection steady-state availability;
      • under the approximate Markov model, analytical bounds on the interval availability distribution.
    The proposed methods are validated by discrete-event simulations of an Italian network.
    Computer Networks 01/2011; 55(1-55):193-204. DOI:10.1016/j.comnet.2010.07.018 · 1.26 Impact Factor
  • ABSTRACT: Telecom customers may have specific time periods during which they require extra resilience. However, these time-differentiated resilience requirements are not effectively addressed by current service level agreement (SLA) frameworks. To satisfy these high-priority periods, a generic SLA framework typically provides upgraded protection over the entire service duration, which is unnecessary and expensive. In this study, we propose a novel SLA framework that allows customers to specify critical windows (CWs) to address their time-differentiated demands for resilience. CWs correspond to the resilience-sensitive periods, and connections are backed up during CWs using pre-cross-connected protection. To achieve high resource efficiency, we identify opportunities for backup resource sharing in a time-domain multiplexing manner. Two heuristic schemes are proposed, namely, Locally and Globally CW-Aware connection assignments. Our study on a sample optical mesh network shows that, by applying our SLA framework, 1) resource efficiency can be significantly improved, 2) CWs are effectively protected with high resilience (in terms of availability), 3) the availability of CWs can be increased almost linearly with used backup resources, and 4) our framework requires low operational complexity.
    Journal of Optical Communications and Networking 03/2011; 3(4):312-322. DOI:10.1364/JOCN.3.000312 · 2.06 Impact Factor