Article

Fast Rerouting Against Multi-Link Failures Without Topology Constraint

Abstract

Multi-link failures may incur heavy packet loss and degrade network performance. Fast rerouting has been proposed to address this issue by enabling routing protections. However, the effectiveness and efficiency issues of fast rerouting are not well addressed. In particular, the protection performance of existing approaches is not satisfactory even though their overhead is high, and topology constraints need to be met for the approaches to achieve complete protection. To optimize the efficiency, we first answer the question of whether label-free routing can provide complete protection against arbitrary multi-link failures in any network. We propose a model for interface-specific routing, which can be seen as a general form of label-free routing. We analyze the conditions under which a multi-link failure will induce routing loops, and then show that there exist networks in which no interface-specific routing (ISR) can be constructed to protect the routing against arbitrary k-link failures (k ≥ 2). We then propose a tunneling on demand (TOD) approach, which covers most failures with ISR and activates tunneling only when failures cannot be detoured around by ISR. We develop algorithms to compute ISR properly so as to minimize the number of activated tunnels, and to compute the protection tunnels when necessary. We prove that TOD can protect routing against any single-link and dual-link failures. We evaluate TOD through simulations with real-world topologies. The results show that TOD achieves a near-100% protection ratio with small tunneling overhead for multi-link failures, making a better tradeoff than the state-of-the-art label-based approaches.
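The abstract describes forwarding that depends on both the incoming interface and the destination, with tunnels activated only when the interface-specific rules cannot detour a failure. Below is a minimal sketch of that control flow; the table layouts, interface names, and tunnel endpoint are hypothetical illustrations, not the paper's actual algorithm.

```python
# Minimal sketch of interface-specific routing (ISR) with tunneling on demand.
# The table structures and names are assumptions for illustration only.

# ISR table: (incoming interface, destination) -> outgoing interface
isr_table = {
    ("if0", "D"): "if2",   # normal case
    ("if2", "D"): "if1",   # a packet bounced back on if2 is detoured via if1
}

# Pre-computed protection tunnels, used only when ISR cannot detour a failure:
tunnel_table = {"D": "T"}       # destination -> tunnel endpoint (hypothetical)

failed_interfaces = {"if1"}     # local view of failed links/interfaces


def forward(packet_dst, in_if):
    """Return (action, next interface / tunnel endpoint) for one forwarding decision."""
    out_if = isr_table.get((in_if, packet_dst))
    if out_if is not None and out_if not in failed_interfaces:
        return ("forward", out_if)                        # ISR covers this case
    if packet_dst in tunnel_table:
        return ("encapsulate", tunnel_table[packet_dst])  # activate a tunnel on demand
    return ("drop", None)                                 # no protection available


print(forward("D", "if0"))   # ('forward', 'if2'): ISR handles it
print(forward("D", "if2"))   # ('encapsulate', 'T'): detour interface failed, tunnel on demand
```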

... Farooq Butt et al. [14]: maximize allocated VNs (heuristic); Todimala and Ramamurthy [15]: maximize reliability (ILP). VN reconfiguration: Choi et al. [16]: maximize reliability (exact algorithm); Yang et al. [17]: maximize reliability (exact algorithm); Pozza et al. [18]: maximize allocated VNs (heuristic). ...
... In this section, we divide the related work into three different categories: (1) VN mapping [7][8][9][10][11][12], (2) VN reconfiguration/resilience [13][14][15][16][17][18], and (3) service placement in fog environments [19][20][21][22][23] (Table 1). ...
... Yang et al. [17] proposed an approach to deal with multi-link failures in networks with arbitrary topologies, where the objective is to find paths that maximize the routing protection performance. They also proposed a packet-drop approach to prevent damage whenever the previous scheme cannot protect the routing because the network has been disconnected by a failure. ...
Article
Full-text available
Fog computing emerged to ease the load of resource-constrained devices on the Internet. It provides services and computational devices closer to the users to reduce latency and improve quality of service. For this paradigm to be used in practice, certain optimization tasks need to be solved: to allocate the resources that are available to the users and to perform service placement and routing on the communications executed by them. In this paper, we develop a framework that allows users to offload services and perform communications in a Fog environment. For this, we propose a Mixed Integer Linear Programming (MILP) formulation and a heuristic to map Virtual Networks (VNs) into the substrate network, maximizing fairness of energy and bandwidth usage on the system. Moreover, we propose a MILP formulation and a heuristic to route and offload services of an application in each VN, minimizing energy and latency. We extend both heuristics to be used in dynamic settings with increasing number of users and applications and consider a reconfiguration approach when nodes or links fail in the substrate network. To evaluate the proposed approaches, we use the YAFS simulator and several instances of VNs and services with both randomly generated values and referenced parameters. We observed that the heuristics are able to obtain results close to the optimal value in most settings.
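The paper above maps virtual networks (VNs) onto a substrate while balancing resource usage. A toy greedy node-mapping heuristic in that spirit is sketched below; the function name, the CPU-only capacity model, and the balancing rule are assumptions for illustration, not the paper's MILP or heuristic.

```python
# Toy greedy virtual-network node mapping: place each virtual node on the substrate
# node with the most remaining capacity, keeping usage roughly balanced.
# Illustrative sketch only, not the MILP/heuristic from the cited paper.

def map_virtual_nodes(vn_demands, substrate_capacity):
    """vn_demands: {virtual_node: cpu_demand};
    substrate_capacity: {substrate_node: free_cpu}.
    Returns {virtual_node: substrate_node} or None if some node does not fit."""
    remaining = dict(substrate_capacity)
    mapping = {}
    # Place the largest demands first so they get the emptiest hosts.
    for vnode, demand in sorted(vn_demands.items(), key=lambda kv: -kv[1]):
        host = max(remaining, key=remaining.get)      # least-loaded substrate node
        if remaining[host] < demand:
            return None                               # embedding rejected
        remaining[host] -= demand
        mapping[vnode] = host
    return mapping


print(map_virtual_nodes({"a": 4, "b": 2, "c": 2}, {"s1": 5, "s2": 5, "s3": 3}))
```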
... Li defined a survivable k-node (edge) content connected virtual optical network in an elastic optical data centre network to survive against multiple failures [28]. Yang presented a fast rerouting algorithm against multi-link failures without topology constraint, which can reduce the overhead [29]. ...
... We build an ILP model composed of objective (21) and constraints (22)–(29) to realise the optimal configuration of the capacity share for the communication paths of all DR services. Gurobi can solve this model. ...
... Constraints (25)–(29) restrict the shared link capacity to meet the requirements. $y_{uv}^{r,k,r',k'}$ denotes the capacity shared by $r_k$ and $r_{k'}$ on link $uv$. ...
Article
Full-text available
The increase of power consumption around the world puts forward higher requirements for the robustness of the power grid. As a critical application of the smart grid, demand response (DR) has significant impacts on the stable operation of the grid. The source–grid–load system (SGLS) in Jiangsu Province of China has implemented three different types of DR. This study aims at promoting the survivability of the DR communication network, which can enhance the reliability of DR. In response to concurrent dual‐link failure, the authors first present a capacity expansion algorithm with the minimum distance to enhance the topology of the DR communication network. Next, they design a multipath routing algorithm based on pre‐estimation to configure multiple paths for every DR terminal according to the characteristics of the DR service. Then they propose a link capacity sharing algorithm for multipath to reduce the communication network cost for DR. Simulations demonstrate the proposed algorithm can improve survivability and reduce the cost of the DR communication network. Furthermore, they compare the multipath configuration algorithms in different topology enhancement schemes, which can give some suggestions on the construction of SGLS in the future.
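The scheme above configures multiple paths per DR terminal so that traffic survives concurrent dual-link failures. A simple way to illustrate the underlying idea is a remove-and-reroute search for two link-disjoint paths; the sketch below is a heuristic illustration only (optimal disjoint pairs would use Suurballe's or Bhandari's algorithm), and the toy graph is an assumption.

```python
import heapq

def dijkstra(adj, src, dst):
    """adj: {u: {v: weight}}. Returns the shortest path src -> dst, or None."""
    dist, prev, seen = {src: 0}, {}, set()
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u in seen:
            continue
        seen.add(u)
        if u == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    return None

def two_link_disjoint_paths(adj, src, dst):
    """Greedy heuristic: find a shortest path, delete its links, reroute."""
    p1 = dijkstra(adj, src, dst)
    if p1 is None:
        return None
    pruned = {u: dict(nbrs) for u, nbrs in adj.items()}
    for u, v in zip(p1, p1[1:]):          # remove p1's links in both directions
        pruned[u].pop(v, None)
        pruned[v].pop(u, None)
    return p1, dijkstra(pruned, src, dst)

g = {"s": {"a": 1, "b": 2}, "a": {"s": 1, "t": 1, "b": 1},
     "b": {"s": 2, "a": 1, "t": 2}, "t": {"a": 1, "b": 2}}
print(two_link_disjoint_paths(g, "s", "t"))   # (['s', 'a', 't'], ['s', 'b', 't'])
```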
... When the network topology is changed, the routing of some nodes needs to be recalculated. Some routing protection methods [30]–[33] have been proposed to cope with this case: the reactive strategy [30] recalculates the routing table after distributing the information about the failed links, while the proactive strategy [31]–[33] maintains multiple forwarding entries per destination for each router. The proactive strategy can only achieve fast routing protection against single-link or dual-link failures [33]. In the reactive strategy, the information of the failed link is either globally broadcast to achieve global routing optimization, or locally diffused to fulfill local routing recovery. ...
Article
Network structure has significant impact on its data transmission and robustness. This article presents a framework to evaluate the structure performance of dynamic networks, with a focus on measuring the topological dynamics and assessing its effect on routing, transmission efficiency, network robustness, and network capacity. Based on this framework, the structural performance of satellite networks including both single-layered and multilayered satellite networks is evaluated systematically. The performance results reveal that it is difficult to design efficient and feasible transmission protocols for multilayered satellite networks due to their highly dynamic topologies with irregular changes, which imposes impractical requirements on routing updates. To cope with this problem, a synchronization mechanism of the interorbital links for multilayered satellite networks is presented, which can significantly reduce and regularize the topological dynamics of multilayered satellite networks. Its advantages can effectively facilitate the realization of transmission protocols for multilayered satellite networks. Finally, the experimental evaluation results demonstrate the effectiveness of the proposed strategy.
... Traditional routing algorithms, such as open shortest path first (OSPF) and Intermediate system-to-intermediate system (IS-IS), are mostly concerned with finding shortest paths towards each destination, and thus cannot provide good connectivity under frequent network failures. On detecting a failure, they start a global link-state advertisement and then recompute routes, inevitably causing network outage [4], [5]. This highlights the need for mechanisms that possess fast and efficient recovery capability [6]- [9]. ...
... They also suggest building an augmented fat-tree topology which allows LFA to protect against all single link and node failures. TOD [4] proposes tunneling on demand to handle single- or dual-link failures, but needs an additional signaling protocol to establish tunnels. The authors in [33] presented the STATIC-ROUTING-RESILIENCY problem and explored the power of static fast failover routing in a variety of models. ...
Article
Full-text available
The internet is playing an increasingly crucial role in both personal and business activities. In addition, with the emergence of real-time, delay-sensitive and mission-critical applications, stringent network availability requirements are put forward for internet service providers (ISPs). However, commonly deployed intradomain link-state routing protocols react to link failures by globally exchanging link state advertisements and recalculating routing tables, inevitably causing significant forwarding discontinuity after a failure. Therefore, the loop-free criterion (LFC) approach has been widely deployed by many ISPs to cope with single network component failures in large internet backbones. The success of LFC lies in its inherent simplicity, but this comes at the expense of letting certain failure scenarios go unprotected. To achieve full failure coverage with LFC without incurring significant extra overhead, we propose a novel link protection scheme, hybrid link protection (HLP), to achieve failure-resilient routing. Compared to previous schemes, HLP ensures high network availability in a more efficient way. HLP is implemented in two stages. Stage one provides an efficient LFC-based method (MNP-e). The complexity of the algorithm is less than that of Dijkstra's algorithm, and it provides network availability similar to that of LFC. Stage two provides backup path protection (BPP) based on MNP-e, where only a minimum number of links need to be protected, using special paths and packet headers, to meet the network availability requirement. We evaluate these algorithms on a wide spread of relevant topologies, both real and synthetic, and the results reveal that HLP can achieve high network availability without introducing conspicuous overhead. HLP not only needs around 10% of the time of full protection, but also provides the full protection capabilities that full protection provides.
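LFC/LFA selection rests on the classic loop-free condition: a neighbor n of source s can protect traffic to destination d if dist(n, d) < dist(n, s) + dist(s, d), so that n never sends the packet back through s. The sketch below checks that inequality over all-pairs shortest distances; the toy ring topology and function names are assumptions for illustration.

```python
# Loop-free alternate (LFA) condition check over a small weighted graph.
from itertools import product

def all_pairs_dist(adj):
    """Floyd-Warshall over a small weighted graph {u: {v: w}}."""
    nodes = list(adj)
    dist = {(u, v): (0 if u == v else adj[u].get(v, float("inf")))
            for u, v in product(nodes, nodes)}
    for k, i, j in product(nodes, nodes, nodes):
        dist[i, j] = min(dist[i, j], dist[i, k] + dist[k, j])
    return dist

def lfas(adj, s, d):
    """All neighbors n of s with dist(n, d) < dist(n, s) + dist(s, d)."""
    dist = all_pairs_dist(adj)
    return [n for n in adj[s] if dist[n, d] < dist[n, s] + dist[s, d]]

ring = {"s": {"a": 1, "b": 1}, "a": {"s": 1, "d": 1},
        "b": {"s": 1, "d": 2}, "d": {"a": 1, "b": 2}}
print(lfas(ring, "s", "d"))   # ['a', 'b']: 'a' is the primary next hop, 'b' a loop-free alternate
```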
... By constructing a directed acyclic graph with three disjoint edges [12], any two links in the network can be protected against failure. This study [13] investigates and proves that single- and double-fault protection algorithms are not restricted by network topology. Among the hop-by-hop proactive schemes, DC has been favored by the industry because of its simplicity in coping with all single-link failure scenarios. ...
... Equal cost multipath routing (ECMP) is the earliest and simplest route protection scheme employed in the industry. However, numerous studies have shown that ECMP cannot provide a high network failure protection ratio [Geng, Shi, Wang et al. (2017); Yang, Xu and Li (2018)]. To overcome this, the internet engineering task force (IETF) put forward the framework of fast re-routing. ...
... Similarly, in the case of multi-link failures [24], a disjoint spanning tree based on edge cuts was employed to reduce the packet loss ratio, balance network load, and lower network recovery delay. To improve algorithm efficiency and performance and to implement complete protection against link failures in the network [25], a tunneling on demand (TOD) approach was proposed. An interface-specific routing (ISR) model is capable of handling link failures in most situations, and a tunneling mechanism is activated when a failure cannot be detoured by ISR. ...
Article
Full-text available
Natural disasters such as earthquakes have consecutive impacts on the smart grid because of aftershock activities. To guarantee service requirements and stable smart grid operation, it is a challenge to design a fast and survivable rerouting mechanism. However, there are few studies that consider concurrent rerouting aimed at multiple services in the smart grid communication network. Firstly, we formulate the node survivability, link survivability, and path survivability models in terms of the distance from the epicenter to the nodes and links of the network. Meanwhile, we introduce the indicator of site difference level, which is unique to the smart grid, to further restrict the service path. Secondly, to improve algorithm efficiency and reduce rerouting time, a depth-first search algorithm is utilized to obtain the set of available rerouting paths, and then I-DQN, based on the reinforcement learning framework, is proposed to achieve concurrent rerouting for multiple services. The experimental results show that our approach has better convergence performance and higher survivability, with comparable latency, in comparison with other approaches.
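The scheme above first uses depth-first search to build the set of available rerouting paths before the learning stage. A compact bounded DFS enumeration in that spirit is sketched below; the toy graph, the hop bound, and the exclusion of failed links are assumptions for illustration.

```python
def candidate_paths(adj, src, dst, failed, max_hops):
    """Depth-first enumeration of loop-free paths from src to dst that avoid
    failed links and use at most max_hops hops. adj: {u: set(neighbors)};
    failed: set of frozenset({u, v}) link keys. Illustrative sketch only."""
    results, stack = [], [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            results.append(path)
            continue
        if len(path) - 1 >= max_hops:          # hop budget exhausted
            continue
        for nxt in adj[node]:
            if nxt in path or frozenset((node, nxt)) in failed:
                continue
            stack.append((nxt, path + [nxt]))
    return results

mesh = {"u": {"v", "w"}, "v": {"u", "w", "d"}, "w": {"u", "v", "d"}, "d": {"v", "w"}}
# Two detours survive the failed link v-d: u-w-d and u-v-w-d (order may vary).
print(candidate_paths(mesh, "u", "d", failed={frozenset(("v", "d"))}, max_hops=4))
```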
... We use the definition of protection ratio in [24], which is the ratio of the number of original end-to-end paths that are protected to the number of original end-to-end paths that traverse failed links. We test the protection ratio of RSFR under single-link failure in five topologies respectively. ...
Article
Full-text available
Software-defined networking (SDN) is a great innovation, which makes the network programmable so that it is easier to achieve failure recovery. Through pre-programmed recovery strategies and pre-deployed backup resources, flows can be redirected to their destinations quickly upon failures. However, this consumes a large amount of backup resources for fast failover. Nowadays the network scale and the magnitude of flows have increased greatly, which leads to the need to deploy a large number of flow entries in the face of failures. However, the Ternary Content Addressable Memory (TCAM) that stores flow entries is capacity-limited. Therefore, it is meaningful to reduce backup resource consumption. In this paper, we propose a ring-based single-link failure recovery approach (RSFR) to improve backup resource utilization. We select a ring from the network based on node importance and link performance. Then, based on the selected ring, we plan backup paths and design flow tables to reuse most backup flow entries. Thus failure recovery can be achieved with fewer flow entries. Additionally, in order to ensure the performance of backup paths, we periodically update the ring according to the predicted load. Simulation results show that the proposed approach performs well in terms of resource consumption and backup path performance.
... To improve fault resilience, we use tags in our rerouting algorithms, since the tag-based approach provides better protection performance in rerouting techniques [19]. The methods in [20], [21] ensure that routing over k colored trees cannot be affected by (k−1) link failures. ...
Article
Full-text available
In this paper, we consider the problem of routing interruption, which has received significant interest in recent years. Rerouting is an effective way to solve this problem. However, how to design an effective, secure, and applicable rerouting algorithm remains a significant challenge. To this end, we propose a fast rerouting framework for routing interruption. Our framework consists of two parts: (1) diagnosing routing interruptions, where we determine the location of an interruption by directly interrogating the control-plane IPv6 stack; and (2) implementing fast rerouting, where we propose a rerouting algorithm that makes the router match and reroute all prefixes affected by interruptions. The path is updated only by adding a forwarding rule, thus reducing the interruption recovery time. We conduct a comprehensive evaluation of our rerouting framework. Experimental results show that our method is better than FCP, OSPF, and DLF in speed and accuracy.
... Traffic monitoring and measurement is a fundamental activity, with high impact for many networking applications and protocols such as load balancing [2,3,55], traffic engineering [4,48,58], flow rerouting [5,60,63], fairness [37], intrusion and anomaly detection [32,49,65], caching [28], policy enforcement [56] and performance diagnostics [24]. ...
Article
Full-text available
Reliably tracking large network flows in order to determine so-called elephant flows, also known as heavy hitters or frequent items, is a common data mining task. Indeed, this kind of information is crucial in many different contexts and applications. Since storing all of the traffic is impossible, owing to the fact that flows arrive as an unbounded, infinite-length stream, many different algorithms have been designed for traffic measurement using only a limited amount of memory. CountMax is a recently published algorithm that solves this problem approximately by combining ideas from the Count-Min and Misra-Gries algorithms. In this paper we introduce CMSS, which cleverly combines ideas from the Count-Min and Space-Saving algorithms. We compare CMSS and CountMax on both synthetic and real datasets. The experimental results show that both algorithms achieve 100% recall, and can retrieve the frequent items without false negatives even with a limited amount of budgeted memory. Regarding precision, CMSS proves to be robust and independent of all of the parameters, whilst CountMax is severely affected by false positives. Finally, both algorithms provide, in practice, the same frequency estimation quality. It follows that CMSS outperforms CountMax with regard to overall accuracy whilst providing the same frequency estimation quality.
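Since CMSS combines Count-Min and Space-Saving ideas, the Space-Saving half is easy to show on its own. Below is a compact version of the classic Space-Saving heavy-hitter algorithm, tracking at most k counters; the parameter value and toy stream are assumptions, and this is not the CMSS algorithm itself.

```python
# Space-Saving heavy-hitter sketch: keep at most k (item, count) pairs; when a new
# item arrives and the table is full, evict the minimum counter and inherit its count.
# Counts may overestimate but never underestimate true frequencies.

def space_saving(stream, k):
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1   # inherit the evicted count
    return counters

flows = ["a"] * 6 + ["b"] * 4 + ["c", "d", "a", "b", "e"]
print(space_saving(flows, k=3))   # {'a': 7, 'b': 5, 'e': 3}: big flows persist, small ones churn
```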
... The Tunneling on Demand (ToD) [15] technique is the combination of two algorithms: Interface Specific Routing (ISR) and tunneling. The BasicISR algorithm calculates ISR path pairs and then divides them into Loop Risk Link Sets (LRLS). ...
Article
Fast rerouting requires that a backup route is already available at each node so that traffic can immediately be shifted on it without new path discovery and convergence time delay. Handling multiple failures with least possible delay, high throughput and least overhead with regard to memory and battery is a real challenge in Wireless Sensor Networks (WSN). Current fast rerouting techniques that handle multiple failures do not specifically target mission-critical WSN applications. Fast Rerouting techniques use spanning trees, backup topologies or configurations to shift traffic immediately as and when error is detected. These techniques do not focus on finding least hop count on the backup path and therefore, end-to-end delay on the backup paths is higher than on the primary path. The proposed Fast Rerouting Protocol (FRP) establishes primary and backup routes before the start of data transfer. It creates at least one backup path towards destination from every node on the primary path. FRP therefore has the ability to handle multiple failures in mission-critical WSN environment. NS-2 simulation results of FRP against the competitor reveal that, FRP takes least time and control messages to establish shorter fast rerouting paths, produces minimum end-to-end delay, least energy consumption and higher network life time.
Article
Routing schemes play a crucial role in handling link failures, especially in satellite networks where such failures are common due to the challenging space environment. In this article, a resilient routing algorithm called “shortest path first directional routing (SPFDR)” is proposed. By modeling the topology of the constellation as a mesh and applying specific rules for outport selection, SPFDR can handle any k (k ≤ 2) link/node failures with minimal path stretch in inclined low Earth-orbit megaconstellations. A rigorous proof of SPFDR under k (k ≤ 2) link/node failures is provided. The most prominent aspects of SPFDR are linear time complexity and label-free strategy. The simulation results demonstrate that SPFDR guarantees 100% reachability against any k link/node failures (k ≤ 2), with an average path stretch increase of approximately zero.
Article
Due to the rapid development of Internet technology such as 5G/6G and artificial intelligence, more and more new network applications appear. Customers using these applications may have different individual demands, and such a trend causes great challenges to the traditional integrated service and routing model. In order to satisfy the individual demands of customers, service customization should be considered, during which the cost of the Internet Service Provider (ISP) naturally increases. Hence, how to reach a balance between customer satisfaction and ISP profit becomes vitally important. To address this critical problem, this work proposes a service customization oriented reliable routing mechanism, which includes two modules, that is, the service customization module and the routing module. In particular, the former (i.e., the service customization module) is responsible for classifying services by analyzing and processing the customer's demands. After that, the IPv6 protocol is used to implement the service customization, since it naturally supports differentiated services via the extended header fields. The latter is responsible for transforming the customized services into specific routing policies. Specifically, the Nash equilibrium based economic model is first introduced to balance user satisfaction and ISP profit, which finally produces a win-win solution. After that, based on the customized service policies, an optimized grey wolf algorithm is designed to establish the routing path, during which the routing reliability is formulated and calculated. Finally, the experiments are carried out and the proposed mechanism is evaluated. The results indicate that the proposed service customization and routing mechanism improves the routing reliability, user satisfaction and ISP satisfaction by about 8.42%, 15.5% and 17.75% respectively compared with the classical open shortest path first algorithm and the function learning based algorithm.
Article
Wireless sensor networks (WSNs) are short-range networks with sensing, computation, and wireless connectivity capabilities. Many routing, range management, and log transfer protocols are specifically designed for WSNs. The previous method showed less efficiency in routing management. The proposed lifetime maximization energy-aware routing protocol (LTMEARP) is known for its high routing efficiency, giving higher lifetime and throughput performance. The requirement for a routing system approach is advanced with Internet service provider's protocols by recalculating the routing table after the link stage's substitution worldwide, leading to responses and connection failures by sharing important data after traffic. The LTMEARP routing protocol assures high-availability routing performance under traffic conditions. LTMEARP supports homogeneous and expanded nodes. We analyze a new approach to routing-based selection algorithms for homogeneous-node WSNs. The number of connections is limited and must be adjusted using separate paths and header packets to meet the user's network access location. The results show that LTMEARP has achieved quality without introducing excessive network access program overhead, which deals with the study of communication and the benefits and problems with the performance of each routing technology.
Article
Energy consumption is becoming a key issue in the research of future networks. In practice, network traffic has a periodic time distribution that occurs most often at a low level. This feature provides the possibility of achieving network energy savings through topology switching. By considering the deficiencies in existing studies, such as the low adaptability between network working topology and traffic load, the abnormal topology switching caused by abnormal and unbalanced traffic, and the low reliability of energy-saving topologies, this paper proposes an energy-efficient routing method for software-defined networks based on topology switching and reliability. The method involves two parts: a topology-switching method and a failure recovery method. The former adapts the network working topology to the network traffic demands through dynamic topology switching to decrease the network energy consumption. The latter adopts an active strategy for fast fault recovery to ensure network reliability in the energy-efficient topology. Two network topologies and their traffic data are used to experimentally verify the method. The results show that, compared with the static topology switching method TLS, the energy saving of the proposed method can be improved by up to 2.07 times and 4.63 times in the two typical topologies, respectively, while ensuring network reliability.
Article
Software-defined networking (SDN) is an emerging trend where the control plane and the data plane are separated from each other, culminating in effective bandwidth utilization. This separation also allows multi-vendor interoperability. Link failure is a major problem in networking and must be detected as soon as possible because when a link fails the path becomes congested and packet loss occurs, delaying the delivery of packets to the destination. Backup paths must be configured immediately when a failure is detected in the network to speed up packet delivery, avoid congestion and packet loss and provide faster convergence. Various SDN segment protection algorithms that efficiently reduce CPU cycles and flow table entries exist, but each has drawbacks. An independent transient plane technique can be used to reduce packet loss but is not as efficient when multiple flows try to share the same link. The proposed work focuses on reducing congestion, providing faster convergence with minimal packet loss and effectively utilizing link bandwidth using bandwidth-sharing techniques. An analysis and related studies show that this method performs better and offers a more reliable network without loss, while simultaneously ensuring the swift delivery of data packets toward the destination without congestion, compared to the other existing schemes.
Chapter
This chapter provides a taxonomy of schemes for resilient routing followed by a discussion of their application to contemporary architectures of communication networks. In particular, a general classification of schemes for resilient routing is first presented followed by a description of the reference schemes for IP networks. The chapter in its later part focuses on the representative techniques of resilient routing for multi-domain and multi-layer network scenarios followed by conclusions also referring to the applicability of the resilient routing schemes in disaster scenarios.
Article
Network failures may lead to serious packet loss and degrade network performance. Therefore, loop-free alternates (LFA) have been widely deployed by many Internet service providers to cope with single network component failures in large Internet backbones. However, the efficiency of LFA has not been sufficiently studied. Some existing methods have extremely large computational overhead, and their computational complexity is linear in the average node degree of the network. The current methods consume a large amount of central processing unit resources, thereby aggravating the burden on the router. To improve routing resilience without introducing significant extra overhead, this study proposes an incremental alternate computation with negative augmentation algorithm (IAC), which is based on the incremental shortest path first algorithm. First, IAC turns the problem of quick implementation of LFA into that of efficiently calculating the minimum cost from all of its neighbors to all other network nodes on the shortest path tree rooted at the computing node. Then, several theorems for calculating the cost are presented and their correctness is validated. Finally, we evaluate IAC through simulations with real-world and generated topologies. Compared with TBFH, DMPA, and DMPA-e, which are algorithms optimized for limited scenarios, IAC finds approximately 50% more available alternates, is more than three times faster, and provides node protection capabilities that TBFH, DMPA, and DMPA-e cannot provide. These advantages make IAC a good candidate for traditional telecommunication networks and emerging complex networks that require failure repair and load balancing in a highly dynamic environment.
Conference Paper
Full-text available
In today's IP network, if a link or router fails, packets that traverse the failed link or router will be lost until the network re-converges, even if there exists another path bypassing the failure. We call such a single failure a valid single failure. Therefore, the IETF published a framework called IP Fast ReRoute (IPFRR) which aims to provide protection against such valid single failures before the network re-converges. Based on the IPFRR framework, many methods have been proposed. One category of IPFRR methods is IP-tunnel-based. The IP-tunnel-based IPFRR methods impose extra cost, such as encapsulation cost, on traffic delivery. In this paper we focus on the other category of IPFRR methods, no-tunnel-based IPFRR methods, which do not impose any extra cost on traffic delivery. However, the existing no-tunnel-based IPFRR methods cannot provide complete protection against all the valid single failures. Therefore, in this paper we propose a new no-tunnel-based IPFRR method named RPFP which can provide complete protection against all the valid single failures.
Conference Paper
Full-text available
High reliability is always pursued by network designers. Multipath routing can provide multiple paths for transmission and failover, and is considered to be effective in the improvement of the network reliability. However, existing multipath routing algorithms focus on how to find as many paths as possible, rather than their computation or communication overhead. We propose a dynamic distributed multipath algorithm (DMPA) to help a router in a link-state network find multiple nexthops for each destination. A router runs the algorithm locally and independently, where only one single shortest path tree (SPT) needs to be constructed, and no message other than the basic link states is disseminated. DMPA maintains the SPT and dynamically adjusts it in response to network state changes, so the sets of nexthops can be incrementally and efficiently updated. At the same time, DMPA guarantees loop-freeness of the induced forwarding path by a partial order of the routers underpinning it. We evaluate DMPA and compare it with some latest multipath algorithms, using a set of real, inferred and synthetic topologies. The results show that DMPA can provide good reliability and fast recovery for the network with very low overhead.
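DMPA keeps forwarding loop-free through a partial order of routers. One classic order of that kind is the downstream criterion, dist(n, d) < dist(s, d): forwarding to any neighbor strictly closer to the destination can never form a loop. The sketch below illustrates that criterion on a toy topology; it is not DMPA's SPT-maintenance algorithm, and the graph and names are assumptions.

```python
import heapq

def sp_dist(adj, src):
    """Dijkstra distances from src over {u: {v: w}}."""
    dist, pq = {src: 0}, [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def multipath_nexthops(adj, s, d):
    """Neighbors n of s with dist(n, d) < dist(s, d): every hop strictly
    decreases the distance to d, so no forwarding loop can form."""
    to_d = {u: sp_dist(adj, u).get(d, float("inf")) for u in adj}
    return [n for n in adj[s] if to_d[n] < to_d[s]]

net = {"s": {"a": 1, "b": 1}, "a": {"s": 1, "d": 2}, "b": {"s": 1, "d": 2},
       "d": {"a": 2, "b": 2}}
print(multipath_nexthops(net, "s", "d"))   # ['a', 'b']: both neighbors qualify as nexthops
```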
Article
Full-text available
Many modern network designs incorporate "failover" paths into routers' forwarding tables. We initiate the theoretical study of such resilient routing tables.
Article
Full-text available
We consider the problem of maintaining communication between the nodes of a data network and a central station in the presence of frequent topological changes as, for example, in mobile packet radio networks. We argue that flooding schemes have significant drawbacks for such networks, and propose a general class of distributed algorithms for establishing new loop-free routes to the station for any node left without a route due to changes in the network topology. By virtue of built-in redundancy, the algorithms are typically activated very infrequently and, even when they are, they do not involve any communication within the portion of the network that has not been materially affected by a topological change.
Conference Paper
Full-text available
Although providing reliable network services is getting more and more important, currently used methods in IP networks are typically reactive and error correction can take a long time. One of the most interesting solutions is interface-based fast rerouting, where not only the destination address but also the incoming interface is taken into account during forwarding. Unfortunately, current methods cannot handle all possible situations, as they are prone to forming loops and making parts of the network with no failure unavailable. In this paper we propose a new interface-based routing method, which always avoids loops at the price of slightly longer paths. We also present extensive simulation results to compare current and proposed algorithms.
Conference Paper
Full-text available
IP Fast ReRoute (IPFRR) is the IETF standard for providing fast failure protection in IP and MPLS/LDP networks and Loop Free Alternates (LFA) is a basic specification for implementing it. Even though LFA is simple and unobtrusive, it has a significant drawback: it does not guarantee protection for all possible failure cases. Consequently, many IPFRR proposals have appeared lately, promising full failure coverage at the price of added complexity and non-trivial modifications to IP hardware and software. Meanwhile, LFA remains the only commercially available, and therefore, the only deployable IPFRR solution. Deployment, however, crucially depends on the extent to which LFA can protect failures in operational networks. In this paper, therefore, we revisit LFA in order to give theoretical insights and practical hints to LFA failure coverage analysis. First, we identify the topological properties a network must possess to profit from good failure coverage. Then, we study how coverage varies as new links are added to a network; we show how to do this optimally and, through extensive simulations, arrive at the conclusion that cleverly adding just a couple of new links can improve the quality of LFA protection drastically.
Conference Paper
Full-text available
In this paper, we propose a routing technique to alleviate packet loss due to transient link failures, which are major causes of disruption in the Internet. The proposed technique based on Alternate Next Hop Counters (ANHC) allows routers to calculate backup paths and re-route packets accordingly, thereby bypassing transient failures. This technique guarantees full repair coverage for single link failures, without significantly changing the way traditional routing works and with minimal impact on the computation and memory requirements for routers. We evaluate the performance of our proposed ANHC approach through extensive simulations and show that the stretch of its pre-computed alternate paths, its failure-state link load increase, and its computational and memory overheads are minimal.
Conference Paper
Full-text available
We present path splicing, a new routing primitive that allows network paths to be constructed by combining multiple routing trees ("slices") to each destination over a single network topology. Path splicing allows traffic to switch trees at any hop en route to the destination. End systems can change the path on which traffic is forwarded by changing a small number of additional bits in the packet header. We evaluate path splicing for intradomain routing using slices generated from perturbed link weights and find that splicing achieves reliability that approaches the best possible using a small number of slices, for only a small increase in latency and no adverse effects on traffic in the network. In the case of interdomain routing, where splicing derives multiple trees from edges in alternate backup routes, path splicing achieves near-optimal reliability and can provide significant benefits even when only a fraction of ASes deploy it. We also describe several other applications of path splicing, as well as various possible deployment paths.
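Path splicing derives its intradomain slices from shortest-path trees computed over perturbed link weights, and packets can switch slices at any hop. The sketch below builds k such slices; the topology, the number of slices, and the perturbation range are assumptions for illustration, not the paper's parameterization.

```python
# Sketch of path-splicing slice construction: k shortest-path trees toward a
# destination, each over independently perturbed link weights.
import heapq
import random

def perturb(adj, spread, rng):
    """Copy {u: {v: w}} with each undirected link's weight scaled by a random factor."""
    factors, new = {}, {}
    for u, nbrs in adj.items():
        new[u] = {}
        for v, w in nbrs.items():
            key = frozenset((u, v))
            factors.setdefault(key, 1 + rng.uniform(0, spread))
            new[u][v] = w * factors[key]
    return new

def nexthop_tree(adj, dst):
    """Each node's next hop toward dst: its parent on the shortest-path tree
    rooted at dst (Dijkstra with lazy deletion; weights treated as symmetric)."""
    dist, parent, done, pq = {dst: 0}, {}, set(), [(0, dst)]
    while pq:
        d, u = heapq.heappop(pq)
        if u in done:
            continue
        done.add(u)
        for v, w in adj[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                parent[v] = u          # v reaches dst through u
                heapq.heappush(pq, (d + w, v))
    return parent

def build_slices(adj, dst, k, spread=1.0):
    rng = random.Random(1)
    return [nexthop_tree(perturb(adj, spread, rng), dst) for _ in range(k)]

g = {"s": {"a": 1, "b": 1}, "a": {"s": 1, "d": 1}, "b": {"s": 1, "d": 1},
     "d": {"a": 1, "b": 1}}
for i, nh in enumerate(build_slices(g, "d", k=3)):
    print("slice", i, nh)      # a packet may switch to a different slice at any hop
```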
Conference Paper
Full-text available
This paper presents BCube, a new network architecture specifically designed for shipping-container based, modular data centers. At the core of the BCube architecture is its server-centric network structure, where servers with multiple network ports connect to multiple layers of COTS (commodity off-the-shelf) mini-switches. Servers act as not only end hosts, but also relay nodes for each other. BCube supports various bandwidth-intensive applications by speeding-up one-to-one, one-to-several, and one-to-all traffic patterns, and by providing high network capacity for all-to-all traffic. BCube exhibits graceful performance degradation as the server and/or switch failure rate increases. This property is of special importance for shipping-container data centers, since once the container is sealed and operational, it becomes very difficult to repair or replace its components. Our implementation experiences show that BCube can be seamlessly integrated with the TCP/IP protocol stack and BCube packet forwarding can be efficiently implemented in both hardware and software. Experiments in our testbed demonstrate that BCube is fault tolerant and load balancing and it significantly accelerates representative bandwidth-intensive applications.
Conference Paper
Full-text available
This paper presents Packet Re-cycling (PR), a technique that takes advantage of cellular graph embeddings to reroute packets that would otherwise be dropped in case of link or node failures. The technique employs only one bit in the packet header to cover any single link failures, and in the order of log2(d) bits to cover all non-disconnecting failure combinations, where d is the diameter of the network. We show that our routing strategy is effective and that its path length stretch is acceptable for realistic topologies. The packet header overhead incurred by PR is very small, and the extra memory and packet processing time required to implement it at each router are insignificant. This makes PR suitable for loss-sensitive, mission-critical network applications.
Conference Paper
Full-text available
With the increasing demand for low-latency applications in the Internet, the slow convergence of the existing routing protocols is a growing concern. A number of IP fast reroute mechanisms have been developed by the IETF to address the issue. The goal of the IPFRR mechanisms is to activate alternate routing paths which avoid micro loops under node or link failures. In this paper we present a comprehensive analysis of these proposals by evaluating their coverage for a variety of inferred and synthetic ISP topologies.
Article
Full-text available
The Internet would be more efficient and robust if routers could flexibly divide traffic over multiple paths. Often, having one or two extra paths is sufficient for customizing paths for different applications, improving security, reacting to failures, and balancing load. However, support for Internet-wide multipath routing faces two significant barriers. First, multipath routing could impose significant computational and storage overhead in a network the size of the Internet. Second, the independent networks that comprise the Internet will not relinquish control over the flow of traffic without appropriate incentives. In this article, we survey flexible multipath routing techniques that are both scalable and incentive compatible. Techniques covered include: multihoming, tagging, tunneling, and extensions to existing Internet routing protocols.
Article
This paper presents Packet Re-cycling (PR), a technique that takes advantage of cellular graph embeddings to reroute packets that would otherwise be dropped in case of link or node failures. The technique employs only one bit in the packet header to cover any single link failures, and in the order of log2(d) bits to cover all non-disconnecting failure combinations, where d is the diameter of the network. We show that our routing strategy is effective and that its path length stretch is acceptable for realistic topologies. The packet header overhead incurred by PR is very small, and the extra memory and packet processing time required to implement it at each router are insignificant. This makes PR suitable for loss-sensitive, mission-critical network applications.
Article
Fast reroute and other forms of immediate failover have long been used to recover from certain classes of failures without invoking the network control plane. While the set of such techniques is growing, the level of resiliency to failures that this approach can provide is not adequately understood. In this paper, we embarked upon a systematic algorithmic study of the resiliency of forwarding tables in a variety of models (i.e., deterministic/probabilistic routing, with packet-header-rewriting, with packet-duplication). Our results show that the resiliency of a routing scheme depends on the “connectivity” k of a network, i.e., the minimum number of link deletions that partition a network. We complement our theoretical result with extensive simulations. We show that resiliency to four simultaneous link failures, with limited path stretch, can be achieved without any packet modification/duplication or randomization. Furthermore, our routing schemes provide resiliency against k-1 failures, with limited path stretch, by storing log(k) bits in the packet header, with limited packet duplication, or with randomized forwarding technique.
Conference Paper
In datacenter networks, link and switch failures are a common occurrence. Although most of these failures do not disconnect the underlying topology, they do cause routing failures, disrupting communications between some hosts. Unfortunately, current 1:1 redundancy groups are only partly effective at reducing the impact of these routing failures. In principle, local fast failover schemes, such as OpenFlow fast failover groups, could reduce the impact by preinstalling backup routes that protect against multiple simultaneous failures. However, providing a sufficient number of backup routes within the available space provided by the forwarding tables of datacenter switches is challenging. To solve this problem, we contribute a new forwarding table compression algorithm. Further, we introduce the concept of compression-aware routing to improve the achieved compression ratio. Lastly, we have created Plinko, a new forwarding model that is designed to have more easily compressible forwarding tables. All optimizations combined, we often saw compression ratios ranging from 2.10x to 19.29x.
Article
This paper presents BCube, a new network architecture specifically designed for shipping-container based, modular data centers. At the core of the BCube architecture is its server-centric network structure, where servers with multiple network ports connect to multiple layers of COTS (commodity off-the-shelf) mini-switches. Servers act as not only end hosts, but also relay nodes for each other. BCube supports various bandwidth-intensive applications by speeding-up one-to-one, one-to-several, and one-to-all traffic patterns, and by providing high network capacity for all-to-all traffic. BCube exhibits graceful performance degradation as the server and/or switch failure rate increases. This property is of special importance for shipping-container data centers, since once the container is sealed and operational, it becomes very difficult to repair or replace its components. Our implementation experiences show that BCube can be seamlessly integrated with the TCP/IP protocol stack and BCube packet forwarding can be efficiently implemented in both hardware and software. Experiments in our testbed demonstrate that BCube is fault tolerant and load balancing and it significantly accelerates representative bandwidth-intensive applications.
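BCube addresses each server with an array of digits, and routing builds a path by correcting one digit per hop through the switch of the corresponding level. The toy sketch below fixes digits in a fixed order for simplicity; BCube's actual routing varies the correction order to obtain multiple parallel paths, which is omitted here, and the addresses used are assumptions.

```python
# Toy BCube-style digit-correction routing: servers are addressed by digit arrays,
# and each hop fixes one digit of the current address toward the destination.

def bcube_path(src, dst):
    """src, dst: equal-length tuples of digits. Returns the server-level path."""
    path, current = [tuple(src)], list(src)
    for level in range(len(src)):             # correct digits from level 0 upward
        if current[level] != dst[level]:
            current[level] = dst[level]       # one hop through the level-`level` switch
            path.append(tuple(current))
    return path

print(bcube_path((0, 1, 2), (3, 1, 0)))
# [(0, 1, 2), (3, 1, 2), (3, 1, 0)] -- two digits differ, so the path has two hops
```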
Conference Paper
The introduction of coherent optics and wavelength division multiplexing (WDM) in telecommunication networks has led to unprecedented gains in backbone capacity. The increase in optical layer capacity inadvertently exacerbates the problem of traffic loss due to optical component failures. IP networks are designed over optical backbone networks, where each IP link traverses a multihop optical lightpath. Therefore, the failure of optical components often leads to multiple link failures in the IP network. In this paper, we develop an IP fast reroute mechanism using rooted arc-disjoint spanning trees that guarantees recovery from (k-1) link failures in a k-edge-connected network. As arc-disjoint spanning trees may be constructed in sub-quadratic time in the size of the network, our approach offers excellent scalability. Through experimental results, we show that employing arc-disjoint spanning trees to recover from multiple failures reduces path stretch in comparison with previously known techniques.
Conference Paper
We typically think of network architectures as having two basic components: a data plane responsible for forwarding packets at line-speed, and a control plane that instantiates the forwarding state the data plane needs. With this separation of concerns, ensuring connectivity is the responsibility of the control plane. However, the control plane typically operates at timescales several orders of magnitude slower than the data plane, which means that failure recovery will always be slow compared to data plane forwarding rates. In this paper we propose moving the responsibility for connectivity to the data plane. Our design, called Data-Driven Connectivity (DDC) ensures routing connectivity via data plane mechanisms. We believe this new separation of concerns -- basic connectivity on the data plane, optimal paths on the control plane -- will allow networks to provide a much higher degree of availability, while still providing flexible routing control.
Conference Paper
This paper introduces Plinko, a network architecture that uses a novel forwarding model and routing algorithm to build networks with forwarding paths that, assuming arbitrarily large forwarding tables, are provably resilient against t link failures, ∀t ∈ N. However, in practice, there are clearly limits on the size of forwarding tables. Nonetheless, when constrained to hardware comparable to modern top-of-rack (TOR) switches, Plinko scales with high resilience to networks with up to ten thousand hosts. Thus, as long as t or fewer links have failed, the only reason packets of any flow in a Plinko network will be dropped are congestion, packet corruption, and a partitioning of the network topology, and, even after t + 1 failures, most, if not all, flows may be unaffected. In addition, Plinko is topology independent, supports arbitrary paths for routing, provably bounds stretch, and does not require any additional computation during forwarding. To the best of our knowledge, Plinko is the first network to have all of these properties.
Article
It has been observed that transient failures are fairly common in IP backbone networks and there have been several proposals based on local rerouting to provide high network availability despite failures. While most of these proposals are effective in handling single failures, they either cause loops or drop packets in the case of multiple independent failures. To ensure forwarding continuity even with multiple failures, we propose Localized On-demand Link State (LOLS) routing. Under LOLS, each packet carries a blacklist, which is a minimal set of failed links encountered along its path, and the next hop is determined by excluding the blacklisted links. We show that the blacklist can be reset when the packet makes forward progress towards the destination and hence can be encoded in a few bits. Furthermore, blacklist-based forwarding entries at a router can be precomputed for a given set of failures requiring protection. While the LOLS approach is generic, this paper describes how it can be applied to ensure forwarding to all reachable destinations in case of any two link or node failures. Our evaluation of this failure scenario based on various real network topologies reveals that LOLS needs 6 bits in the worst case to convey the blacklist information. We argue that this overhead is acceptable considering that LOLS routing deviates from the optimal path by a small stretch only while routing around failures.
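A small hop-by-hop simulation of the blacklist idea described above: the packet carries the failed links it has encountered, each node picks the neighbor closest to the destination in the graph with blacklisted links removed, and the blacklist is reset on forward progress. LOLS precomputes these decisions per failure set; the sketch below recomputes them per hop for clarity, and the topology, hop cap, and function names are assumptions.

```python
import heapq

def distances_to(adj, dst):
    """Shortest distance from every node to dst over {u: {v: w}} (symmetric weights)."""
    dist, pq = {dst: 0}, [(0, dst)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def deliver(adj, src, dst, failed, max_hops=20):
    node, blacklist, trace, best_seen = src, frozenset(), [src], float("inf")
    for _ in range(max_hops):
        if node == dst:
            return trace
        pruned = {u: {v: w for v, w in nbrs.items() if frozenset((u, v)) not in blacklist}
                  for u, nbrs in adj.items()}
        dist = distances_to(pruned, dst)
        nxt = None
        for n in sorted(adj[node], key=lambda x: dist.get(x, float("inf"))):
            link = frozenset((node, n))
            if link in failed:
                blacklist |= {link}            # record the failure in the packet header
            elif link not in blacklist and dist.get(n) is not None:
                nxt = n
                break
        if nxt is None:
            return None                        # no usable neighbor: give up
        node = nxt
        trace.append(node)
        if dist[node] < best_seen:             # forward progress: the blacklist can be reset
            best_seen, blacklist = dist[node], frozenset()
    return None

ring = {"s": {"a": 1, "b": 1}, "a": {"s": 1, "d": 1},
        "b": {"s": 1, "d": 1}, "d": {"a": 1, "b": 1}}
print(deliver(ring, "s", "d", failed={frozenset(("a", "d"))}))   # ['s', 'a', 's', 'b', 'd']
```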
Conference Paper
Handling link failures is the fundamental task of routing schemes. Routing protocols based on link state (e.g., OSPF) require a global state advertisement and re-computation when link failure happens, and will cause inevitable delivery failures. To improve the routing resilience without introducing significant extra overhead, we propose a new routing approach, Keep Forwarding (KF), to achieve k-link failure resilience using inport-aware forwarding. KF is (i) flexible to handle multiple failures (or k-failure) with only small path stretch, (ii) efficient in recovery speed by instant and local lookup, (iii) bounded on memory requirement. Besides, the proposed approach is compatible with existing Internet protocols and routing infrastructures (e.g., requires no packet labeling or state recording), and the pre-computation has a linear temporal complexity. Experimental results on real ISP and datacenter networks reveal that KF guarantees near-optimal resilience (99.9%-100% for single failure and over 99.7% for multiple failures), with the average path stretch increment less than 5%.
Article
We develop an approach for disjoint multipath routing and fast recovery in IP networks that guarantees recovery from arbitrary two link failures. We employ three link-independent trees, referred to as red, blue, and green trees, rooted at every destination. The paths from a source to the destination on the trees are mutually link-disjoint. The routing of packets is based on the destination address and the input interface over which the packet was received. We discuss different ways of employing the three link-independent trees for multipath routing and/or failure recovery. If the trees are employed exclusively for multipath routing, then no packet overhead is required. If the trees are employed for failure recovery, then the overhead bits will range from 0 to 2 bits depending on the flexibility sought in routing. We evaluate the performance of the trees in fast recovery by comparing the path lengths provided under single and dual link failures with an earlier approach based on tunneling.
Article
In this article the Cn-graphs are introduced, by which a characterization of the embeddability of a graph on either an orientable surface or a non-orientable surface is provided.
Article
With network components increasingly reliable, routing is playing an ever greater role in determining network reliability. This has spurred much activity in improving routing stability and reaction to failures and rekindled interest in centralized routing solutions, at least within a single routing domain. Centralizing decisions eliminates uncertainty and many inconsistencies and offers added flexibility in computing routes that meet different criteria. However, it also introduces new challenges, especially in reacting to failures where centralization can increase latency. This paper leverages the flexibility afforded by centralized routing to address these challenges. Specifically, we explore when and how standby backup forwarding options can be activated while waiting for an update from the centralized server after the failure of an individual component (link or node). We provide analytical insight into the feasibility of such backups as a function of network structure and quantify their computational complexity. We also develop an efficient heuristic reconciling protectability and performance, and demonstrate its effectiveness in a broad range of scenarios. The results should facilitate deployments of centralized routing solutions.
Article
This paper develops novel mechanisms for recovering from failures in IP networks with proactive backup path calculations and Internet Protocol (IP) tunneling. The primary scheme provides resilience for up to two link failures along a path. The highlight of the developed routing approach is that a node reroutes a packet around the failed link without the knowledge of the second link failure. The proposed technique requires three protection addresses for every node, in addition to the normal address. Associated with every protection address of a node is a protection graph. Each link connected to the node is removed in at least one of the protection graphs, and every protection graph is guaranteed to be two-edge-connected. The network recovers from the first failure by tunneling the packet to the next-hop node using one of the protection addresses of the next-hop node; the packet is routed over the protection graph corresponding to that protection address. We prove that it is sufficient to provide up to three protection addresses per node to tolerate any arbitrary two link failures in a three-edge-connected graph. An extension to the basic scheme provides recovery from single-node failures in the network. It involves identification of the failed node in the packet path and then routing the packet to the destination along an alternate path not containing the failed node. The effectiveness of the proposed techniques was evaluated by simulating the developed algorithms over several network topologies.
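The selection step of the scheme above (pick a protection address of the next hop whose protection graph omits the failed link, then tunnel to it) can be sketched in a few lines. The protection graphs and address names below are assumed to be precomputed and are purely illustrative; this is not the paper's construction of the graphs themselves.

```python
# Toy illustration of tunneling to a protection address: when the link to the next
# hop fails, encapsulate the packet toward a protection address of that next hop
# whose associated protection graph does not contain the failed link.

protection_graphs = {
    # next_hop -> {protection_address: links excluded from that protection graph}
    "B": {"B.p1": {frozenset(("A", "B"))},
          "B.p2": {frozenset(("B", "C"))},
          "B.p3": {frozenset(("B", "D"))}},
}

def protect(next_hop, failed_link):
    """Pick a protection address of next_hop whose graph omits the failed link.
    In the cited scheme every incident link is excluded from at least one graph,
    so a match is expected to exist for any single incident-link failure."""
    for addr, excluded in protection_graphs[next_hop].items():
        if failed_link in excluded:
            return ("encapsulate", addr)   # route over the matching protection graph
    return ("drop", None)

print(protect("B", frozenset(("A", "B"))))   # ('encapsulate', 'B.p1')
```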
Conference Paper
Failure recovery using IP fast reroute (IPFRR) has gained much attention recently. The basic idea is to find backup paths and configure the routing tables in advance. After a failure is detected, the pre-determined backup paths are used immediately to forward the affected packets. Since the calculation and configuration are performed in advance, the recovery can be completed very quickly. IPFRR is considered a promising approach to enhance the survivability of IP networks. While single-failure recovery has been extensively researched, using IPFRR for double-link failure recovery remains a great challenge. We propose a solution for this issue called Efficient SCan for Alternate Paths for double-link failure recovery (ESCAP-DL). ESCAP-DL guarantees 100% coverage from both single and double-link failures and has the advantages of low complexity and resource requirement. The scheme resumes packet forwarding immediately after failures are detected and does not require failure advertising throughout the network.
Article
The IETF currently discusses fast reroute mechanisms for IP networks (IP FRR). IP FRR accelerates the recovery in case of network element failures and avoids micro-loops during re-convergence. Several mechanisms are proposed. Loop-free alternates (LFAs) are simple but cannot cover all single link and node failures. Not-via addresses can protect against these failures but are more complex; in particular, they use tunneling techniques to divert backup traffic. In the IETF it has been proposed to combine both mechanisms to merge their advantages: simplicity and full failure coverage. This work analyzes LFAs and classifies them according to their abilities. We qualitatively compare LFAs and not-via addresses and develop a concept for their combined application to achieve 100% single failure coverage, while using simple LFAs wherever possible. The applicability of existing LFAs depends on the resilience requirements of the network. We study the backup path length and the link utilization for both IP FRR methods and quantify the decapsulation load and the increase of the routing table size caused by not-via addresses. We conclude that the combined usage of both methods has no advantage compared to the application of not-via addresses only.
Article
Multipath routing (MPR) is an effective strategy to achieve robustness, load balancing, congestion reduction, and increased throughput in computer networks. Disjoint multipath routing (DMPR) requires the multiple paths to be link- or node-disjoint. Both MPR and DMPR pose significant challenges in terms of obtaining loop-free multiple (disjoint) paths and effectively forwarding the data over the multiple paths, the latter being particularly significant in IP datagram networks. This paper develops a two-disjoint multipath routing strategy using colored trees. Two trees, red and blue, rooted at a designated node called the drain, are formed. The paths from a given source to the drain on the two trees are link- or node-disjoint. The colored tree approach requires every node to maintain only two preferred neighbors for each destination, one on each tree. This paper (1) formulates the problem of colored-tree construction as an integer linear program (ILP); and (2) develops the first distributed algorithm to construct the colored trees using only local information. We demonstrate the effectiveness of the distributed algorithm by evaluating it on grid and random topologies and comparing to the optimal obtained by solving the ILP.
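The forwarding state implied by the colored-tree approach is small: one preferred neighbor per tree per drain. The following toy Python sketch (ours, with an invented three-node example; it is not the paper's ILP or distributed construction algorithm) illustrates how a packet follows either the red or the blue tree to the drain along disjoint paths.

```python
# Conceptual sketch of red/blue colored-tree forwarding: every node keeps
# exactly one preferred neighbor per tree for the drain, and the two
# resulting paths to the drain are disjoint. Topology and names invented.
preferred = {
    # node: {"red": next hop on red tree, "blue": next hop on blue tree}
    "A": {"red": "B", "blue": "C"},
    "B": {"red": "drain", "blue": "A"},
    "C": {"red": "A", "blue": "drain"},
}

def forward(node, color):
    """Follow the chosen tree from `node` to the drain."""
    path = [node]
    while node != "drain":
        node = preferred[node][color]
        path.append(node)
    return path

print(forward("A", "red"))    # ['A', 'B', 'drain']
print(forward("A", "blue"))   # ['A', 'C', 'drain']  (disjoint from red)
```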
Conference Paper
While most topology control protocols only address limited network mobility, we propose in this paper a quasi-localized topology control algorithm that considers mobility predictions in order to construct and maintain a power efficient topology without ...
Conference Paper
This paper presents techniques that improve the efficiency and manageability of an IP Fast Reroute (IPFRR) technology: NotVia. NotVia provides the IPFRR service for all destinations in an ISP's network upon any single link or node failure, while previous proposals such as Loop-free Alternates (LFA) cannot guarantee this level of coverage. However, NotVia increases the computational and memory costs of the IPFRR service, and poses new challenges to network management, as routers are unaware of the links and nodes (hence the amount of traffic) that they actually protect. This paper introduces three techniques: NotVia aggregation, prioritized NotVia computation, and the rNotVia algorithm, which collectively reduce the overhead of NotVia and improve its manageability. We use simulations to evaluate these techniques on real ISP topologies as well as on randomly generated topologies. The results show that the computational and memory overhead of NotVia are reduced to a fraction of their previous values on various topologies, suggesting that the techniques proposed in this paper make NotVia a more efficient and easy-to-manage IPFRR solution.
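For readers unfamiliar with the not-via mechanism discussed here, the following Python sketch (our illustration using networkx and an invented topology, not the rNotVia algorithm) shows the core semantics: the address "Q not-via P" is resolved on the topology with P removed, so encapsulated traffic is guaranteed to avoid the protected node.

```python
# Sketch of the not-via semantics (not the paper's rNotVia algorithm):
# the address "Q not-via P" is routed in a topology from which P has been
# removed, so traffic encapsulated to that address avoids node P.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("S", "P"), ("P", "Q"), ("S", "X"), ("X", "Q"), ("Q", "D")])

def not_via_route(src, endpoint, protected_node):
    """Path to `endpoint` computed with `protected_node` removed."""
    H = G.copy()
    H.remove_node(protected_node)
    return nx.shortest_path(H, src, endpoint)

# S's next hop P towards D fails: S tunnels to the next-next hop Q using
# the address "Q not-via P", which every router resolves on G - {P}.
print(not_via_route("S", "Q", "P"))   # ['S', 'X', 'Q']
```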
Conference Paper
Current distributed routing paradigms (such as link-state, distance-vector, and path-vector) involve a convergence process consisting of an iterative exploration of intermediate routes triggered by certain events such as link failures. The convergence process increases router load, introduces outages and transient loops, and slows reaction to failures. We propose a new routing paradigm where the goal is not to reduce the convergence times but rather to eliminate the convergence process completely. To this end, we propose a technique called Failure-Carrying Packets (FCP) that allows data packets to autonomously discover a working path without requiring completely up-to-date state in routers. Our simulations, performed using real-world failure traces and Rocketfuel topologies, show that: (a) the overhead of FCP is very low, (b) unlike traditional link-state routing (such as OSPF), FCP can provide both a low loss rate and low control overhead, and (c) compared to prior work on backup path pre-computation, FCP provides better routing guarantees under failures despite maintaining less state at the routers.
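A minimal toy model of the FCP idea, written by us for illustration (it is not the authors' implementation and ignores map inconsistencies and header encoding): the packet carries the failed links it has encountered, and each hop recomputes the route on its local map minus those links. The example uses networkx and an invented four-node topology.

```python
# Toy model of failure-carrying packets (FCP): the packet carries a list
# of failed links, and each router recomputes the route on its (possibly
# stale) topology map minus those links before forwarding.
import networkx as nx

G = nx.Graph()   # this router's local view of the topology
G.add_edges_from([("A", "B"), ("B", "D"), ("A", "C"), ("C", "D")])

locally_detected_failures = {("A", "B")}   # adjacent failures seen locally

def fcp_forward(node, packet):
    """Return the next hop at `node`, adding newly discovered failed
    links to the packet before rerouting around them."""
    while True:
        H = G.copy()
        H.remove_edges_from(packet["failed_links"])
        nh = nx.shortest_path(H, node, packet["dst"])[1]
        if (node, nh) in locally_detected_failures:
            packet["failed_links"].append((node, nh))   # carry it along
            continue
        return nh

pkt = {"dst": "D", "failed_links": []}
print(fcp_forward("A", pkt), pkt["failed_links"])   # C [('A', 'B')]
```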
Article
IP Fast ReRoute (IPFRR) has received increasing attention as a means to effectively shorten traffic disruption under failures. A major approach to implementing IPFRR is to pre-calculate backup paths for nodes and links. However, it may not be easy to deploy such an approach in practice due to the tremendous computational overhead. Thus, a light-weight IPFRR scheme is desired to effectively provide cost-efficient routing protection. In this paper, we propose a Fast Tunnel Selection (FTS) approach to achieve tunnel-based IPFRR. The FTS approach can find an effective tunnel endpoint before the complete computation of the entire shortest-path tree (SPT) and thus effectively reduces the computation overhead. Specifically, we propose two FTS algorithms to provide protection for networks with symmetric and asymmetric link weights. We simulate FTS with topologies of different sizes. The results show that the FTS approach reduces computation overhead by more than 89% compared to existing approaches, and achieves more than a 99% average link protection rate and more than a 90% average node protection rate. Moreover, the FTS approach achieves less than 15% path stretch, which is better than the existing approaches.
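The essence of tunnel-based repair, which FTS speeds up, can be shown with a brute-force Python sketch (ours, using networkx and an invented topology; the real FTS algorithm avoids this exhaustive scan): after link (S, P) fails, pick an endpoint T whose normal paths from S and towards D both avoid the failed link, and tunnel the packet to T.

```python
# Brute-force illustration of tunnel-endpoint selection (not the FTS
# algorithm, which finds an endpoint without scanning the whole SPT):
# a valid endpoint T is one whose normal path from S and normal path to
# the destination both avoid the failed link.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("S", "P"), ("P", "D"), ("S", "A"), ("A", "B"),
                  ("B", "D"), ("B", "P")])

def find_tunnel_endpoint(src, dst, failed_link):
    failed = {failed_link, failed_link[::-1]}
    for t in G.nodes:
        if t in (src, dst):
            continue
        to_t = nx.shortest_path(G, src, t)       # normal routing S -> T
        from_t = nx.shortest_path(G, t, dst)     # normal routing T -> D
        hops = list(zip(to_t, to_t[1:])) + list(zip(from_t, from_t[1:]))
        if not failed.intersection(hops):
            return t                             # tunnel packets to T
    return None

print(find_tunnel_endpoint("S", "D", ("S", "P")))   # 'A'
```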
Article
Multipath routing allows for load balancing and fast re-routing in order to improve the reliability and the efficiency of the network. Current IP routers only support Equal Cost MultiPath (ECMP), which guarantees that the forwarding paths do not contain loops. However, ECMP provides limited path diversity. In this paper, we present an efficient algorithm that allows routers to enable more path diversity: our algorithm lets all routers compute at least the two best distinct first-hop paths towards each destination and achieves a good tradeoff between path diversity and overhead. In addition, we propose a multipath routing scheme whose goal is to combine fast re-routing and load balancing over loop-free routes. The low overhead of our scheme (no additional signaling messages and low complexity) and the nature of its loop-free rules allow it to be incrementally deployed on current IP routers. Using actual, inferred, and generated topologies, we compare our algorithm to existing solutions.
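One classic loop-free rule that such multipath schemes build on is the downstream criterion: S may also send traffic for D to any neighbor strictly closer to D than S itself, so the remaining distance decreases at every hop and loops are impossible. The sketch below (our illustration with networkx and a toy topology; the paper's own rules differ in detail) lists the next hops allowed by that criterion.

```python
# Downstream criterion for loop-free multipath forwarding: S may forward
# traffic for D to any neighbor N with dist(N, D) < dist(S, D); since the
# remaining distance strictly decreases at every hop, no loop can form.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([("S", "A", 1), ("S", "B", 2),
                           ("A", "D", 2), ("B", "D", 1)])
dist = dict(nx.all_pairs_dijkstra_path_length(G))

def loop_free_next_hops(src, dst):
    return [n for n in G.neighbors(src) if dist[n][dst] < dist[src][dst]]

print(loop_free_next_hops("S", "D"))   # ['A', 'B']: both first hops usable
```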
Article
With network components increasingly reliable, routing is playing an ever greater role in determining network reliability. This has spurred much activity in improving routing stability and reaction to failures, and rekindled interest in centralized routing solutions, at least within a single routing domain. Centralizing decisions eliminates uncertainty and many inconsistencies, and offers added flexibility in computing routes that meet different criteria. However, it also introduces new challenges, especially in reacting to failures, where centralization can increase latency. This paper leverages the flexibility afforded by centralized routing to address these challenges. Specifically, we explore when and how standby backup forwarding options can be activated while waiting for an update from the centralized server after the failure of an individual component (link or node). We provide analytical insight into the feasibility of such backups as a function of network structure, and quantify their computational complexity. We also develop an efficient heuristic reconciling protectability and performance, and demonstrate its effectiveness in a broad range of scenarios. The results should facilitate deployments of centralized routing solutions.
Article
Colored Trees (CTs) is an efficient approach to route packets along link- or node-disjoint paths in packet-switched networks. In this approach, two trees, namely red and blue, are constructed rooted at a drain such that the paths from any node to the drain are link- or node-disjoint. For applications where both trees are used simultaneously, it is critical to maintain the trees after link or node failures. To this end, this paper develops an algorithm, referred to as SimCT, that efficiently constructs and maintains colored trees under failures using only local information. Even when the entire tree needs to be recomputed, the SimCT algorithm requires 40% fewer messages than previous techniques. The convergence time of the SimCT algorithm is linear in the number of nodes. We show through extensive simulations that the average length of the disjoint paths obtained using the SimCT algorithm is shorter than with previously known techniques. The above-mentioned improvements are obtained by exploiting the relationship between DFS numbering, lowpoint values, and the potentials employed for maintaining a partial ordering of nodes. The SimCT algorithm is also extended to obtain colored trees in multi-drain networks.
Article
We describe and analyse in detail the various factors that influence the convergence time of intradomain link-state routing protocols. This convergence time reflects the time required by a network to react to the failure of a link or a router. To characterise the convergence process, we first use detailed measurements to determine the time required to perform the various operations of a link-state protocol on currently deployed routers. We then build a simulation model based on those measurements and use it to study the convergence time in large networks. Our measurements and simulations indicate that sub-second link-state IGP convergence can easily be achieved on an ISP network without any compromise on stability.
Article
As the Internet evolves into a ubiquitous communication infrastructure and supports increasingly important services, its dependability in the presence of various failures becomes critical. In this paper, we analyze IS-IS routing updates from the Sprint IP backbone network to characterize failures that affect IP connectivity. Failures are first classified based on patterns observed at the IP-layer; in some cases, it is possible to further infer their probable causes, such as maintenance activities, router-related and optical layer problems. Key temporal and spatial characteristics of each class are analyzed and, when appropriate, parameterized using well-known distributions. Our results indicate that 20% of all failures happen during a period of scheduled maintenance activities. Of the unplanned failures, almost 30% are shared by multiple links and are most likely due to router-related and optical equipment-related problems, respectively, while 70% affect a single link at a time. Our classification of failures reveals the nature and extent of failures in the Sprint IP backbone. Furthermore, our characterization of the different classes provides a probabilistic failure model, which can be used to generate realistic failure scenarios, as input to various network design and traffic engineering problems.
Article
Link failures are part of the day-to-day operation of a network due to many causes such as maintenance, faulty interfaces, and accidental fiber cuts. Commonly deployed link-state routing protocols such as OSPF react to link failures through global link-state advertisements and routing table recomputations, causing significant forwarding discontinuity after a failure. Careful tuning of various parameters to accelerate routing convergence may cause instability when the majority of failures are transient. To enhance failure resiliency without jeopardizing routing stability, we propose a local rerouting based approach called failure insensitive routing. The proposed approach prepares for failures using interface-specific forwarding, and upon a failure, suppresses the link-state advertisement and instead triggers local rerouting using a backwarding table. With this approach, when no more than one link failure notification is suppressed, a packet is guaranteed to be forwarded along a loop-free path to its destination if such a path exists. This paper demonstrates the feasibility, reliability, and stability of our approach.
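The data-plane structure behind failure insensitive routing is a forwarding table keyed by the incoming interface as well as the destination. The tiny Python sketch below (our illustration; the entries are invented and not produced by the paper's precomputation algorithm) shows how a packet arriving through an interface it would never use under normal routing can be deflected as if the suspected link had already failed.

```python
# Interface-specific forwarding sketch: entries are keyed by
# (incoming interface, destination) instead of destination alone, so the
# arrival interface itself carries implicit information about failures.
fib = {
    # (incoming interface, destination): outgoing interface
    ("east", "D"): "south",    # normal case
    ("west", "D"): "south",    # normal case
    # a packet for D normally never arrives from "south"; if it does, the
    # precomputed entry detours it instead of bouncing it back over the
    # link whose failure best explains the unusual arrival
    ("south", "D"): "east",
}

def forward(in_iface, dest):
    return fib[(in_iface, dest)]

print(forward("east", "D"))    # 'south'  (usual path)
print(forward("south", "D"))   # 'east'   (inferred-failure detour)
```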
Article
To date, realistic ISP topologies have not been accessible to the research community, leaving work that depends on topology on an uncertain footing. In this paper, we present new Internet mapping techniques that have enabled us to measure router-level ISP topologies. Our techniques reduce the number of required traces compared to a brute-force, all-to-all approach by three orders of magnitude without a significant loss in accuracy. They include the use of BGP routing tables to focus the measurements, the elimination of redundant measurements by exploiting properties of IP routing, better alias resolution, and the use of DNS to divide each map into POPs and backbone. We collect maps from ten diverse ISPs using our techniques, and find that our maps are substantially more complete than those of earlier Internet mapping efforts. We also report on properties of these maps, including the size of POPs, distribution of router outdegree, and the interdomain peering structure. As part of this work, we release our maps to the community.
Article
A key functionality in today's widely used interior gateway routing protocols such as OSPF and IS-IS involves the computation of a shortest path tree (SPT). In many existing commercial routers, the computation of an SPT is done from scratch following changes in the link states of the network. As multiple SPTs may coexist in a network with a given set of link states, such recomputation of an entire SPT is not only inefficient but also causes frequent unnecessary changes in the topology of an existing SPT and creates routing instability. This paper presents a new dynamic SPT algorithm that makes use of the structure of the previously computed SPT. Our algorithm is derived by recasting the SPT problem into an optimization problem in a dual linear programming framework, which can also be interpreted using a ball-and-string model. In this model, the increase (or decrease) of an edge weight in the tree corresponds to the lengthening (or shortening) of a string. By stretching the strings until each node is attached to a tight string, the resulting topology of the model defines one (or multiple) SPT(s). By emulating the dynamics of the ball-and-string model, we can derive an efficient algorithm that propagates changes in distances to all affected nodes in a natural order and in a most economical way. Compared with existing results, our algorithm has the best-known performance in terms of computational complexity as well as minimum changes made to the topology of an SPT. Rigorous proofs for the correctness of our algorithm and simulation results illustrating its complexity are also presented.
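The payoff of such dynamic algorithms is that only the subtree hanging below a changed tree edge needs to be touched. The following simplified Python sketch (ours; it is a plain restricted re-relaxation, not the paper's ball-and-string algorithm) updates distances after a tree-edge weight increase by reseeding the affected subtree from the unaffected frontier.

```python
# Simplified incremental SPT update for a tree-edge weight increase:
# distances outside the subtree of the edge's lower endpoint cannot
# change, so only the affected subtree is re-relaxed.
import heapq

graph = {   # undirected weighted adjacency map (toy example)
    "r": {"a": 1, "b": 4},
    "a": {"r": 1, "b": 1, "c": 2},
    "b": {"r": 4, "a": 1},
    "c": {"a": 2},
}
dist = {"r": 0, "a": 1, "b": 2, "c": 3}    # old SPT distances from root r
parent = {"a": "r", "b": "a", "c": "a"}    # old SPT parent pointers

def descendants(v):
    """Nodes in v's subtree of the old SPT (including v)."""
    out, changed = {v}, True
    while changed:
        changed = False
        for n, p in parent.items():
            if p in out and n not in out:
                out.add(n)
                changed = True
    return out

def increase_tree_edge(u, v, new_w):
    graph[u][v] = graph[v][u] = new_w
    affected = descendants(v)
    pq = []
    for n in affected:                     # seed from the unaffected frontier
        dist[n] = min((dist[m] + w for m, w in graph[n].items()
                       if m not in affected), default=float("inf"))
        heapq.heappush(pq, (dist[n], n))
    while pq:                              # relax only inside the subtree
        d, n = heapq.heappop(pq)
        if d > dist[n]:
            continue
        for m, w in graph[n].items():
            if m in affected and d + w < dist[m]:
                dist[m] = d + w
                heapq.heappush(pq, (dist[m], m))

increase_tree_edge("a", "b", 10)   # the tree edge a-b becomes expensive
print(dist)                        # b is now reached via r-b at cost 4
```

A full implementation would update the parent pointers alongside the distances; the sketch omits that bookkeeping for brevity.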
Conference Paper
Intra-domain routing protocols employed in the Internet route around failed links by having routers detect adjacent link failures, exchange link-state changes, and recompute their routing tables. Due to several delays in detection, propagation, and recomputation, it may take tens of seconds to minutes after a link failure to resume forwarding of packets to the affected destinations. This discontinuity in destination reachability adversely affects the quality of continuous-media applications such as Voice over IP. Moreover, the resulting service unavailability, even for a short duration, could be catastrophic in the world of e-commerce. Careful tuning of various parameters to accelerate routing convergence may cause routing instability when the majority of failures are transient. To improve failure resiliency without jeopardizing routing stability, we propose a local rerouting based approach called failure insensitive routing. Under this approach, upon a link failure, the adjacent router suppresses global updating and instead initiates local rerouting. All other routers infer potential link failures from a packet's incoming interface, precompute interface-dependent forwarding tables, and route around failed links without explicit link-state updates.
Bidirectional Forwarding Detection (BFD) for IPv4 and IPv6 (Single Hop)
IP Fast Reroute Framework
Basic Specification for IP Fast Reroute: Loop-Free Alternates
  • A. Atlas
  • A. Zinin
BCube: A high performance, server-centric network architecture for modular data centers
  • C. Guo