Conference Paper

On the Design of Overlay Networks for IP Links Fault Verification

DOI: 10.1109/GLOCOM.2008.ECP.468 In proceedings of: Global Telecommunications Conference, 2008. IEEE GLOBECOM 2008. IEEE
Source: IEEE Xplore

ABSTRACT Accurate fault detection and location are essential to the efficient and economical operation of ISP networks. They also affect the performance of Internet applications such as VoIP and online gaming. Fault detection algorithms typically rely on spatial correlation to produce a set of fault hypotheses, whose size grows with the number of lost and spurious symptoms and with the overlap among network paths. The network administrator is then left to locate and verify these fault scenarios accurately, which is tedious and time-consuming. In this paper, we formulate the problem of designing infrastructure overlay networks for verifying the location of IP link faults, taking into account the cost of the debugging paths and the stress on the underlying IP links. We map the problem to an integer generalized flow problem and prove that it is NP-hard. We then relax the link stress constraint and formulate the resulting problem as a minimum cost circulation problem that can be solved in polynomial time. We evaluate the fault verification and IP link coverage capabilities of overlay networks of various sizes and topologies using real-life Internet topologies. Finally, we identify some interesting research problems in this context.
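To make the relaxed formulation more concrete, the following is a minimal sketch of a generic minimum cost circulation solver based on negative-cycle canceling. It is not the paper's construction: the abstract does not give the graph mapping, so the toy graph, the node numbering, the arc costs, and the reward of -100 on the return arc are all illustrative assumptions. Cycle canceling is chosen here for brevity; it always terminates with integer data, but a polynomial running time requires a more careful cycle selection rule or a different algorithm.

```python
from typing import List


class Circulation:
    """Minimum cost circulation by negative-cycle canceling (integer data)."""

    def __init__(self, n: int) -> None:
        self.n = n
        # Each arc is [tail, head, residual capacity, cost].
        # Arcs are stored in pairs: arc i and arc i ^ 1 are mutual reverses.
        self.arcs: List[List[int]] = []

    def add_arc(self, u: int, v: int, cap: int, cost: int) -> int:
        """Add arc u -> v and its residual reverse; return the forward index."""
        self.arcs.append([u, v, cap, cost])
        self.arcs.append([v, u, 0, -cost])
        return len(self.arcs) - 2

    def flow(self, arc: int) -> int:
        """Flow on a forward arc equals the residual capacity of its reverse."""
        return self.arcs[arc ^ 1][2]

    def _negative_cycle(self) -> List[int]:
        """Return arc indices of a negative residual cycle, or [] if none."""
        dist = [0] * self.n   # all-zero start acts like a virtual source
        pred = [-1] * self.n  # arc used for the latest relaxation of a node
        last = -1
        for _ in range(self.n):
            last = -1
            for i, (u, v, cap, cost) in enumerate(self.arcs):
                if cap > 0 and dist[u] + cost < dist[v]:
                    dist[v] = dist[u] + cost
                    pred[v] = i
                    last = v
            if last == -1:
                return []  # a full pass without relaxation: no negative cycle
        # A relaxation in pass n implies a negative cycle; step back n times
        # along predecessor arcs to be sure we stand on the cycle itself.
        for _ in range(self.n):
            last = self.arcs[pred[last]][0]
        cycle, v = [], last
        while True:
            i = pred[v]
            cycle.append(i)
            v = self.arcs[i][0]
            if v == last:
                return cycle

    def solve(self) -> int:
        """Cancel negative cycles until none remain; return the total cost."""
        total = 0
        while True:
            cycle = self._negative_cycle()
            if not cycle:
                return total
            push = min(self.arcs[i][2] for i in cycle)
            for i in cycle:
                self.arcs[i][2] -= push
                self.arcs[i ^ 1][2] += push
                total += push * self.arcs[i][3]


def build_toy():
    """Hypothetical 4-node example: node 0 probes towards node 3."""
    g = Circulation(4)
    g.add_arc(0, 1, 2, 1)
    g.add_arc(0, 2, 1, 4)
    g.add_arc(1, 3, 1, 1)
    g.add_arc(1, 2, 1, 1)
    g.add_arc(2, 3, 2, 1)
    # Return arc: the negative cost rewards routing up to 2 units end to end.
    back = g.add_arc(3, 0, 2, -100)
    return g, back
```

In this toy instance the return arc carries two units, so the cost of the forward arcs alone is the circulation cost plus 200. Adding a per-link stress bound that couples several overlay paths is the kind of constraint the abstract reports as making the integer problem NP-hard, and the sketch deliberately leaves it out.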


