On the Design of Overlay Networks for IP Links
Fault Verification
M. Fraiwan and G. Manimaran
Dept. of Electrical and Computer Engineering
Iowa State University, Ames, IA 50010
Email: mfraiwan,gmani@iastate.edu
Abstract—Accurate fault detection and location is essential to the efficient and economical operation of ISP networks. In addition, it affects the performance of Internet applications such as VoIP and online gaming. Fault detection algorithms typically depend on spatial correlation to produce a set of fault hypotheses, the size of which grows with the number of lost and spurious symptoms and the overlap among network paths. The network administrator is left with the task of accurately locating and verifying these fault scenarios, which is tedious and time-consuming. In this paper, we formulate the problem of designing infrastructure overlay networks for verifying the location of IP link faults, taking into account the cost of the debugging paths and the stress on the underlying IP links. We map the problem into an integer generalized flow problem and prove its NP-hardness. We relax the link stress constraint and formulate the resulting problem as a minimum cost circulation that can be solved in polynomial time. We evaluate the fault verification and IP link coverage capabilities of various overlay network sizes and topologies using real-life Internet topologies. Finally, we identify some interesting research problems in this context.
I. INTRODUCTION
Accurate fault detection and location affects the performance of Internet applications (e.g., VoIP, video streaming, and online gaming), and ISP networks as a whole [1][2]. There is an ever-increasing need to reduce rerouting and maintenance times. In order to achieve stable operating environments for ISP networks and the kind of emerging applications they support, we need to accurately locate IP link faults.
IP link faults can be caused by many factors, such as fiber cuts, router crashes or misconfigurations, very heavy congestion, or maintenance activities causing unintentional effects. These kinds of failures occur on a daily basis [3], and they may affect packet forwarding even in the presence of backup paths, due to the overlap among network paths [4].
The process of fault monitoring goes through three steps. The first step is fault detection, which is done through IP-level management agents via management protocol messages (e.g., SNMP trap and CMIP EVENT-REPORT), or through application-level overlay monitoring [5]. These agents generate a set of alarms. After that, fault identification through alarm correlation is performed. The output of this second step is a set of possible fault scenarios. The majority of fault identification algorithms and systems [5][6] rely on spatial correlation of observed symptoms and possible fault scenarios. These systems typically generate a set of equally plausible fault locations. This set of possible faults is nontrivial due to lost and spurious symptoms, the overlap among network paths, and network heterogeneity. The final step, which traditionally has been done by network operators or administrators, is fault verification through debugging. Fault verification locates the exact faulty link(s). The focus of this research is fault verification.
The proliferation of infrastructure overlay networks allows the network administrator to deploy short-term and long-term network solutions with great versatility and flexibility. Such overlays present a tremendous opportunity for the verification of suspected faulty IP links. However, several issues arise in the design of such overlays due to the IP and overlay link sharing among overlay paths. One of the major concerns is the IP link stress (i.e., the number of packets crossing the same link). Such stress can cause measurement conflict [7], which in turn leads to congestion and packet loss, thus adding further confusion rather than verifying the faulty IP links. In addition, overlay paths may have varying properties, such as the underlying IP hop count or bandwidth capacity.
The contributions of this paper are as follows:
• We formulate the problem of overlay design for IP link fault verification as a minimum-cost, link-stress-constrained path selection problem, and construct the corresponding flow network.
• We prove the NP-hardness of the problem.
• We study the IP link coverage of various overlay sizes and topologies using real-life Internet topologies.
• We point out several interesting research problems in this
context.
The remainder of this paper proceeds as follows. Section II gives a motivational example and explores the problem space. Section III presents our network model and its assumptions. The flow network formulation of the overlay design problem, along with its complexity, is presented in Section IV. Section V presents the performance evaluation results. Section VI discusses the related work. We conclude in Section VII.
II. MOTIVATION
In this section, we expose some of the issues involved in the design of overlay networks for IP link fault verification and explore the problem space. There are two aspects to the problem: how do we obtain IP link fault information? And how do we verify such information, and what are the various performance issues that should be considered in this context?
This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE GLOBECOM 2008 proceedings. 978-1-4244-2324-8/08/$25.00 © 2008 IEEE.
Fig. 1 shows an example overlay network and the underlying IP network [8]. Let us assume that the overlay monitoring algorithm has chosen overlay paths E−C−A and B−A as a subset of paths that will cover most of the underlying IP links, or for any other consideration. The properties of the remaining overlay links can be calculated (e.g., link A−C), estimated [9], or probed explicitly (e.g., link D−C).
Let us go through a scenario wherein a fault has occurred in an IP link, which caused path E−C−A to go down, then consider the following:
• Performing spatial correlation [6] using overlay paths E−C−A and B−A, and the underlying IP links, will generate a set of equally “good” suspected faulty IP links. This set includes links E−N2, N2−C, and C−N1. Note that this set could also have been generated by a management system at the IP layer.
• A network administrator will need to debug these suspected faulty IP links, either by checking each link physically one by one (i.e., white box testing), or through end-to-end measurements (i.e., black box testing).
• Probing overlay paths D−C and C−A would be sufficient. If path D−C is faulty then so is link N2−C, and if path C−A is faulty then link C−N1 is faulty. If both paths are working, then the assumption is that link E−N2 is faulty.
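The probe-and-eliminate logic of this example can be sketched in a few lines of Python. This is a hypothetical illustration of ours, not a system described in the paper; it relies on each debugging path traversing exactly one suspected link, so a failed probe implicates its link and all-good probes implicate the one unprobed suspect:

```python
def infer_faulty_link(probe_covers, probe_ok, suspected):
    """probe_covers: path -> the single suspected link it traverses.
    probe_ok: path -> bool (did the end-to-end probe succeed?).
    suspected: the suspected-fault list from the correlation engine."""
    for path, link in probe_covers.items():
        if not probe_ok[path]:
            return link                      # a failed probe pinpoints its link
    probed = set(probe_covers.values())
    remaining = [l for l in suspected if l not in probed]
    # all probes succeeded: the fault must lie on the unprobed suspect(s)
    return remaining[0] if len(remaining) == 1 else remaining

# The Fig. 1 scenario: probe D-C (covers N2-C) and C-A (covers C-N1).
covers = {("D", "C"): ("N2", "C"), ("C", "A"): ("C", "N1")}
suspected = [("E", "N2"), ("N2", "C"), ("C", "N1")]
print(infer_faulty_link(covers, {("D", "C"): True, ("C", "A"): True}, suspected))
```

With both probes succeeding, the sketch blames the uncovered link E−N2, matching the elimination argument above.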
From this simple example, we can identify several issues involving this problem:
1) An awful choice would have been for node B to probe nodes C and D, since there is no overlay link between nodes B and C, or between nodes B and D. The paths would have to go through node A first (because of overlay routing), which would result in a higher link stress. For example, link A−N1 would have a stress of 4 probes, and link N1−C would have a stress of 2 probes. Such stress can cause measurement conflict [7], which in turn leads to congestion and packet loss, thus adding further confusion rather than verifying the faulty IP links.
2) The overlay needs to cover the underlying IP network, or the portion containing the suspected list of IP links. This coverage needs to be sufficient for fault debugging (i.e., a sufficient number of good paths).
3) Different overlay paths may have varying costs (e.g., hop count) and/or bandwidth capabilities. Such costs need to be minimized and/or the bandwidth constraints respected.
4) The number of overlay nodes involved in the measurement needs to be minimized. This has an effect on the management overhead (e.g., the overhead of aggregating the results).
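The stress accounting in issue 1 can be made concrete with a short sketch. The routing table below is our hypothetical reading of Fig. 1's IP topology (A→D assumed to route via N1 and C); the counting itself just tallies, per undirected IP link, how many probe packets cross it:

```python
from collections import Counter

def link_stress(probe_paths, ip_route):
    """probe_paths: overlay paths as node sequences (overlay routing applied).
    ip_route: maps a directed overlay hop (u, v) to the IP links it crosses."""
    stress = Counter()
    for path in probe_paths:
        for u, v in zip(path, path[1:]):            # overlay hops of the probe
            for link in ip_route[(u, v)]:
                stress[tuple(sorted(link))] += 1    # normalize: undirected link
    return stress

# Fig. 1 scenario: B probes C and D, but overlay routing goes through A first.
route = {("B", "A"): [("B", "N1"), ("N1", "A")],
         ("A", "C"): [("A", "N1"), ("N1", "C")],
         ("A", "D"): [("A", "N1"), ("N1", "C"), ("C", "D")]}
s = link_stress([["B", "A", "C"], ["B", "A", "D"]], route)
print(s[("A", "N1")], s[("C", "N1")])  # 4 2, as in issue 1 above
```

The count of 4 on A−N1 arises because both probes cross it twice: once on the B→A hop and once leaving A.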
In this paper, we address the problem of choosing the overlay paths with minimum cost subject to IP link stress constraints, assuming that we have a set of overlay nodes to choose from. The link stress constraint captures the measurement conflict, path disjointedness, and bandwidth capacities of the underlying IP links.
Fig. 1: A motivational example. (The figure shows an overlay network over nodes A–E on top of an IP network containing routers N1 and N2, with the monitoring paths and the debugging paths marked.)
III. NETWORK MODEL AND ASSUMPTIONS
Throughout this paper, we use the following model and make assumptions that are commonly used in the literature:
1) We represent the network as an undirected graph G = (V, E), where V is the set of physical nodes and E is the set of physical edges. From this graph, a set of network nodes Vo ⊆ V are available to be used in network debugging.
2) The routing paths from each node in Vo to every other node in Vo are known. This routing information is represented as a matrix R = [r_ij], where r_ij is defined by:

r_ij = 1 if path i traverses edge j, and r_ij = 0 otherwise.

Several related studies make this assumption [7][10][11].
3) We are given the set of suspected faulty IP links F = {f_1, f_2, ..., f_|F|}, where F ⊂ E. This set is generated by a fault detection and alarm correlation system. These alarms are produced by management agents via management protocol messages (e.g., SNMP trap and CMIP EVENT-REPORT) operating at the IP level, or using overlay monitoring as in Fig. 1. This suspected list will need to be debugged and verified.
4) The vector D = [d_i] specifies the bound on the link stress experienced by each link i ∈ E. The bound is in terms of the number of paths using the link, but can also be in terms of the number of probes or the bandwidth allocated for conducting measurements on each link. By definition, the stress on faulty links should be 1 for the debugging to be correct (i.e., a debugging path should pass through only one suspected faulty link at a time).
5) The set of non-faulty IP links is L = {l | l ∈ E \ F}.
6) The set of debugging paths P = {p_1, p_2, ..., p_|P|} contains those overlay paths in Vo × Vo that pass through the set of faulty links. No path in P traverses two faulty links at the same time, as that would violate the stress constraint.
7) Each source-destination path i ∈ V × V is associated with a nonnegative cost. This cost could represent the number of hops in the path, the bandwidth cost, or the operational cost of that path, depending on its geographical location or accessibility [10]. For the sake of formulation simplicity, the path costs will be expressed in terms of link costs κ_i, ∀ l_i ∈ L.
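Under assumption 2, the routing matrix R can be derived directly from the topology when shortest-path (hop count) routing is assumed. The following sketch (toy graph and node names are ours) computes, for each debugging pair in Vo, the set of edges its path traverses, i.e., the nonzero entries of the corresponding row of R:

```python
from collections import deque
from itertools import combinations

def bfs_path(adj, src, dst):
    """One shortest (hop count) path src -> dst in a connected undirected graph."""
    prev, seen, q = {}, {src}, deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in seen:
                seen.add(v); prev[v] = u; q.append(v)
    path, u = [dst], dst
    while u != src:                        # walk parents back to the source
        u = prev[u]; path.append(u)
    return path[::-1]

def routing_matrix(adj, Vo):
    """r[(s, t)] = set of undirected edges the s-t routing path traverses,
    i.e., the edges j with r_ij = 1 for path i = (s, t)."""
    r = {}
    for s, t in combinations(sorted(Vo), 2):
        p = bfs_path(adj, s, t)
        r[(s, t)] = {tuple(sorted(h)) for h in zip(p, p[1:])}
    return r

adj = {"A": ["N1"], "B": ["N1"], "N1": ["A", "B", "C"], "C": ["N1"]}
R = routing_matrix(adj, {"A", "B", "C"})
print(("A", "N1") in R[("A", "B")])  # True
```

The set-of-edges representation is equivalent to the 0/1 rows of R and is what the flow construction in Section IV consumes.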
IV. PROBLEM FORMULATION
Given the network graph G, the set of suspected faulty links F, the sets Vo and L, and the link costs, the goal is to find the set of paths that can debug the suspected faulty links at a minimum cost while respecting the link stress constraints for each link incident to the debugging paths.
We formulate this problem as an integer generalized flow problem [12] by constructing a flow network Gf = (Vf, Ef), as follows:
1) Add a node for each suspected faulty link in F, a node for each debugging path in P, and a node for each IP link whose stress we are interested in bounding. In addition, add a dummy source s and a dummy terminal t.
2) Add an edge with a (cost, upper bound, multiplier) label of (0, 1, 1) between the source s and every faulty link node f_i ∈ F.
3) Add an edge with a (0, 1, |p_j|) label between each faulty link node f_i ∈ F and the nodes p_j ∈ P representing its debugging paths, where |p_j| is the number of links in path p_j.
4) Add an edge with a (0, 1, 1) label between each debugging path node and the nodes representing the non-faulty links l_i ∈ L that are incident to the corresponding path.
5) Add an edge with a (κ_i, d_i, 1) label between each node representing links in L and the terminal node t, where d_i is the bound on the stress experienced by link l_i in terms of the number of paths sharing that link, and κ_i is the cost of link l_i.
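The five construction steps above can be transcribed almost directly into code. The sketch below (data structures are our own choice, not from the paper) emits the labeled edge list of Gf from the sets F, P, and L:

```python
def build_flow_network(F, P, L, path_links, d, kappa):
    """F: suspected faulty links; P: debugging path ids; L: non-faulty links.
    path_links[p]: the IP links of path p (exactly one of them is in F).
    d[l], kappa[l]: stress bound and cost of non-faulty link l.
    Returns edges as (tail, head, (cost, upper_bound, multiplier))."""
    E = []
    for f in F:                                    # step 2: s -> fault nodes
        E.append(("s", ("f", f), (0, 1, 1)))
    for p in P:                                    # step 3: fault -> its paths
        links = path_links[p]
        faults = [l for l in links if l in F]
        assert len(faults) == 1, "a debugging path crosses one suspect only"
        E.append((("f", faults[0]), ("p", p), (0, 1, len(links))))
        for l in links:                            # step 4: path -> good links
            if l in L:
                E.append((("p", p), ("l", l), (0, 1, 1)))
    for l in L:                                    # step 5: good links -> t
        E.append((("l", l), "t", (kappa[l], d[l], 1)))
    return E

edges = build_flow_network(F={"f1"}, P=["p1"], L={"l1"},
                           path_links={"p1": ["f1", "l1"]},
                           d={"l1": 2}, kappa={"l1": 1})
print(len(edges))  # 4
```

The assertion enforces the stress bound of 1 on faulty links stated in assumption 4.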
The flow network Gf = (Vf, Ef) corresponding to the minimum cost link stress constrained overlay design for fault verification problem is shown in Fig. 2. Finding the minimum cost maximum flow in this network corresponds to finding the least cost maximum number of debugging paths (i.e., one for each suspected faulty link), while respecting the link stress constraints. Note that the bound on the flow leaving the l nodes limits the number of departing flow units. Hence, by flow conservation, the number of incoming flow units (i.e., paths using that same link in L) is also bounded by the stress constraint. Each s−f_i edge carrying a flow means that fault f_i is verifiable by the path p_j that has a flow value of 1 on the f_i−p_j edge.
The problem of minimum cost link stress constrained overlay design for fault verification is NP-hard. The proof is omitted due to lack of space.
A. Relaxing the Stress Constraint
Since the problem is NP-hard, we look for a relaxation of the problem that will make it computationally tractable. We choose to relax the link stress constraint and reformulate the problem, with path costs λ_i, as a minimum cost circulation problem [12]. Fig. 3 shows the corresponding flow network.

Fig. 2: The flow network corresponding to the fault debugging problem. Edges are labeled (cost, upper bound, multiplier): the s−f_i edges are labeled (0, 1, 1), the f_i−p_j edges (0, 1, |p_j|), the p_j−l_i edges (0, 1, 1), and the l_i−t edges (κ_i, d_i, 1). Not all edges are labeled to keep the figure clear.

Fig. 3: Relaxing the link stress constraints allows us to model the problem as a minimum cost circulation. Edges are labeled (cost, upper bound): the s−f_i and f_i−p_j edges are labeled (0, 1), the p_j−t edges (λ_j, 1), and the t−s edge (−C, |F|).

The main differences in this flow network are the removal of the path multipliers, and the addition of an edge from t to s with capacity |F| and a cost of −C, where C is larger than any path cost. It can be shown that finding the minimum cost circulation corresponds to finding the maximum number of least cost debugging paths that can verify the largest possible number of suspected faulty links. The maximum flow (i.e., the maximum number of faults to be verified) is achieved because of the very large negative cost on the t−s edge. The more flow that is pushed on the s−f_i edges, the lower the cost will be, until we reach the upper bound of |F| (i.e., the number of suspected faults) if possible, at which point the circulation stops. Each s−f_i edge carrying a flow means that fault f_i is verifiable by the path represented by node p_j with a flow value of 1 on the f_i−p_j edge. The minimum cost is guaranteed by the elimination of all negative cost cycles (i.e., faults verifiable at less cost) in the residual network. The proof is omitted for lack of space.
Building upon this construction, we map the problem into an LP (linear program), which can be solved using common optimization tools (e.g., CPLEX [13]), as follows. Let Gf = (Vf, Ef) be the flow network in Fig. 3, let x_ij be the flow value along edge (i,j) ∈ Ef, let c_ij be the cost per unit flow along edge (i,j), and let u_ij be its upper bound, or the edge's capacity. Then the corresponding LP formulation is:

Minimize Σ_{(i,j)∈Ef} c_ij x_ij

Subject to
Σ_j x_ij = Σ_j x_ji, ∀ i ∈ Vf   (1)
x_{s,fj} ∈ {0, 1}, ∀ (s, f_j) ∈ Ef   (2)
x_{fi,pj} ∈ {0, 1}, ∀ (f_i, p_j) ∈ Ef   (3)
x_{pi,t} ∈ {0, 1}, ∀ (p_i, t) ∈ Ef   (4)
0 ≤ x_{ts} ≤ |F|   (5)

In the above formulation, constraint (1) is the flow conservation constraint, constraints (2)-(4) are binary constraints (i.e., either use the link in the flow network or do not), and constraint (5) states that we cannot verify more faults than are given.
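Since each fault-to-path and path-to-t edge has unit capacity, the relaxed circulation behaves like a minimum cost bipartite matching between faults and their candidate debugging paths. As an alternative to handing the LP to a solver such as CPLEX, the following sketch (our illustration, not the paper's implementation) solves that matching with successive cheapest augmenting paths, a standard min-cost-flow technique; fault and path identifiers are assumed distinct:

```python
def min_cost_fault_assignment(candidates, lam):
    """candidates[f]: the debugging paths that can verify fault f.
    lam[p]: cost of path p. Returns a maximum-cardinality, minimum-cost
    fault -> path assignment; each assigned fault mirrors one unit of
    circulation on the t-s edge of Fig. 3."""
    INF = float("inf")
    match_f, match_p = {}, {}              # current fault <-> path matching

    def cheapest_augmenting_path():
        # Bellman-Ford over the residual graph: free fault -> ... -> free path.
        dist = {f: (0 if f not in match_f else INF) for f in candidates}
        dist.update({p: INF for p in lam})
        parent = {}
        for _ in range(len(dist)):
            changed = False
            for f, paths in candidates.items():
                for p in paths:            # unmatched edges go fault -> path
                    if match_f.get(f) != p and dist[f] + lam[p] < dist[p]:
                        dist[p], parent[p] = dist[f] + lam[p], f
                        changed = True
            for p, f in match_p.items():   # matched edges are walked backward
                if dist[p] - lam[p] < dist[f]:
                    dist[f], parent[f] = dist[p] - lam[p], p
                    changed = True
            if not changed:
                break
        free = [p for p in lam if p not in match_p and dist[p] < INF]
        if not free:
            return None
        end = min(free, key=lambda p: dist[p])
        path = [end]
        while path[-1] in parent:
            path.append(parent[path[-1]])
        return path[::-1]                  # alternates fault, path, fault, ...

    while True:
        aug = cheapest_augmenting_path()
        if aug is None:
            return match_f
        for f, p in zip(aug[::2], aug[1::2]):  # flip edges along the path
            match_f[f], match_p[p] = p, f

cands = {"f1": ["p1", "p2"], "f2": ["p2"]}
print(sorted(min_cost_fault_assignment(cands, {"p1": 3, "p2": 1}).items()))
```

In the example, f2 can only use p2, so f1 is rerouted to the costlier p1 to keep both faults verifiable, which is exactly the behavior the negative t−s cost enforces in the circulation.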
V. PERFORMANCE EVALUATION
We study the problem on real Internet ISP topologies of two major U.S. carriers provided by Rocketfuel [18]: AT&T (AS 7018, 11800 nodes) and Sprint (AS 1239, 10332 nodes). In addition, two smaller networks are considered: Ebone (AS 1755, 300 nodes) and Above (AS 6461, 654 nodes). A shortest path algorithm, based on hop count, is used for the IP-layer routing.
The set of overlay nodes is selected uniformly at random. The number of overlay nodes (N) is varied from 16 to 128. The overlay topology is varied in two ways: a complete graph (denoted as complete), where each node maintains overlay links to every other node (e.g., RON), and a random graph (denoted as log), where each node maintains log2 N neighbors.
Coverage of IP links
The first aspect that should be considered when choosing the set of overlay nodes is how many IP links they cover. Fig. 4 shows the percentage of covered IP links against the number of overlay nodes for the two overlay topologies previously mentioned. The results show that there is not much difference, in terms of coverage, between the log overlay and the complete overlay, as both achieve comparable results. In general, the percentage is good (i.e., 25-35% for 128 overlay nodes); however, this is so because of the small size of the underlying IP network. For the larger AT&T and Sprint networks, the percentage was close to 5% for 128 overlay nodes; the figures were omitted for lack of space. The conclusion to be drawn here is that the choice of the overlay nodes (an ongoing and future research problem) should be dynamic, changing with the set of suspected faults.
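The coverage metric behind Fig. 4 is simply the fraction of IP links that lie on at least one overlay-pair routing path. A minimal sketch, on a toy topology of our own making:

```python
def ip_link_coverage(paths, all_links):
    """paths: routing path (node sequence) for each overlay node pair.
    all_links: every undirected IP link in the topology, as sorted tuples.
    Returns the fraction of IP links crossed by at least one overlay path."""
    covered = set()
    for path in paths.values():
        covered.update(tuple(sorted(h)) for h in zip(path, path[1:]))
    return len(covered & set(all_links)) / len(all_links)

links = [("A", "N1"), ("B", "N1"), ("C", "N1"), ("C", "N2"), ("E", "N2")]
paths = {("A", "B"): ["A", "N1", "B"], ("A", "C"): ["A", "N1", "C"]}
print(ip_link_coverage(paths, links))  # 3 of 5 links covered -> 0.6
```

Links that no overlay path crosses can never be debugged from the overlay, which is why low coverage on the large AT&T and Sprint topologies argues for a dynamic choice of overlay nodes.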
Debugging Faulty Links
How much path disjointedness is it possible to get from an available set of overlay paths? Remember that in order to verify any two links we need two paths such that each path contains one of the links, but not the other. We show the results for the AT&T network, as the other networks follow the same trend. We set the number of links requiring debugging (from the currently covered set of IP links) to 5, 10, 15, and 20. The cost of a path is set to its IP hop count.

Fig. 4: Percentage of covered IP links versus the number of overlay nodes (16 to 128), for the complete and log overlay topologies: (a) Ebone, (b) Above.

Fig. 5a shows the normalized average number of successfully debugged IP links versus the number of suspected IP links for a complete overlay of sizes 16, 32, and 64 nodes. Fig. 5b shows the same results for a random graph (denoted as log) with the same numbers of nodes. This figure shows that even though there is significant overlap in the overlay, we are still able to find a very good number of disjoint paths. For example, for 5 suspected links, close to 90% of the links were debugged successfully using a complete overlay of size 64 or 32; the percentage drops to 68% for the modest overlay size of 16. As for the random log graph, the numbers are slightly less impressive compared to the complete graph.
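A simple sufficient condition underlies these verifiability counts: a suspected link can be isolated when some available overlay path crosses it and no other suspect. The sketch below (the path set is illustrative, loosely following Fig. 1) counts such links:

```python
def verifiable_links(suspected, path_links):
    """suspected: candidate faulty IP links from the correlation engine.
    path_links: the IP links traversed by each available overlay path.
    Returns the suspects that some path isolates from all other suspects."""
    S = set(suspected)
    ok = set()
    for links in path_links.values():
        hits = S & set(links)
        if len(hits) == 1:            # this path crosses exactly one suspect
            ok |= hits
    return ok

suspected = [("E", "N2"), ("N2", "C"), ("C", "N1")]
paths = {("D", "C"): [("D", "C"), ("N2", "C")],
         ("C", "A"): [("C", "N1"), ("N1", "A")],
         ("E", "A"): [("E", "N2"), ("N2", "C"), ("C", "N1"), ("N1", "A")]}
print(len(verifiable_links(suspected, paths)))  # 2: no path isolates E-N2
```

The normalized counts in Fig. 5 divide such a tally by the number of suspected links.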
VI. RELATED WORK
There have been many attempts to study the occurrence of failures in backbone networks. In [1], the authors study the impact of failures in Sprint's IP backbone on VoIP services. They observed that failures occur on a daily basis and have a tangible impact on the operation of backbone networks. In another study [3], the authors classified the probable causes of such failures. They found that 49% of all failures affect a single link at a time, 21% of failures are caused by router-related problems or optical equipment, and 20% are due to planned maintenance activities.
The area of fault detection and localization has been very active in the past decade or so. An excellent survey of such algorithms has been presented by Steinder and Sethi in [5]. SCORE [6][19] is one of the most recent related studies. In SCORE, spatial correlation on a bipartite fault graph is used to identify fault hypotheses that best explain the failure signature
with the least number of candidate faults. Such systems were used to detect black holes in MPLS networks [19], and for link fault detection in ATM networks. In another paper, Tang and Al-Shaer [20] extend the bipartite fault propagation graph to a probabilistic model that can be used to handle lost and spurious symptoms. We use such spatial correlation engines as an input to our problem.
In the monitoring overlay design literature, Cantieni et al. [15] revisit the problem of monitor placement; they propose an optimal algorithm to choose which monitors to activate, and at what sampling rate, in order to achieve a given measurement accuracy, as opposed to maximizing the fraction of IP flows being monitored, as proposed by Suh et al. [10]. In our study, we use similar assumptions regarding the available IP network information. In another study, Bejerano and Rastogi [11] propose a two-phase approach to minimize the monitoring infrastructure cost and the probing overhead of link delay measurements, even in the presence of link faults. Our contribution is different in that we aim at pinpointing the faulty link(s) out of the suspected faulty ones, while theirs is to ensure that delay measurements remain possible even in the presence of failures, by bypassing suspected faulty links irrespective of their exact location.
In our previous study [8], we examined various topological features (e.g., path diversity) of various ISP networks and the properties of overlays built on top of these networks in terms of fault verification. We assumed that the overlay nodes and paths had already been chosen and that they cover all the suspected faulty links. The contribution in this paper is completely different, as we address the overlay design aspect of the problem, which is an input in our previous work [8].
Fig. 5: The average link debugging capability for various sizes and topologies of the overlay network: the normalized number of verifiable links versus the number of suspected links (5, 10, 15, and 20). (a) A complete overlay with 16, 32, and 64 nodes. (b) A random (log) overlay of the same sizes, where each node maintains log2 N neighbors.
VII. CONCLUSION
In this paper, we have introduced the problem of verifying IP link faults using overlay networks. We have shown that the problem of verifying the maximum number of faults using the least cost paths under link stress constraints is NP-hard. In addition, we have given a relaxation of the problem that ignores the link stress and finds the maximum number of faults that can be verified using the least cost paths.
Experimental results show that it is possible to achieve good verification capabilities with a small number of overlay nodes (e.g., 64 nodes) due to the path diversity provided by the overlay network. However, the number of IP links covered by the overlay is small. Hence, we need to make a dynamic selection of the overlay nodes based on the network topology and the set of suspected faulty links.
Future work will focus on developing heuristic solutions to the fault verification problem under the stress constraints. More importantly, we will look into the choice of the overlay nodes and the topology to be used for fault verification.
REFERENCES
[1] G. Iannaccone, C. Chuah, R. Mortier, S. Bhattacharyya, and C. Diot. Analysis of link failures in an IP backbone. In Proc. of IMW, 2002.
[2] C. Boutremans, G. Iannaccone, and C. Diot. Impact of link failures on
VoIP performance. In Proceedings of NOSSDAV, May 2002.
[3] A. Markopoulou, G. Iannaccone, S. Bhattacharyya, C. Chuah, and C. Diot. Characterization of failures in an IP backbone. In Proceedings of IEEE INFOCOM, 2004.
[4] S. Iyer, S. Bhattacharyya, N. Taft, and C. Diot. An approach to alleviate
link overload as observed on an IP backbone. In IEEE INFOCOM’03.
[5] M. Steinder and A. Sethi. A survey of fault localization techniques in computer networks. Elsevier Science of Computer Programming 53 (2004) 165-194.
[6] R. Kompella, J. Yates, A. Greenberg, and A. Snoeren. IP fault localization
via risk modeling. In Proceedings of NSDI, May 2005.
[7] M. Fraiwan and G. Manimaran. On the schedulability of measurement conflict overlay networks. In Proceedings of IFIP Networking, 2007.
[8] M. Fraiwan, and G. Manimaran. Localization of IP Links Faults Using
Overlay Measurements. In proceedings of IEEE ICC, 2008.
[9] Tian Bu, Francesco Presti, Nick Duffield, and Don Towsley. Network
Tomography on General Topologies. In Proc. of ACM Sigmetrics, 2002.
[10] K. Suh, Yang Guo, Jim Kurose, and Don Towsley. Locating network
monitors: complexity, heuristics, and coverage. In Infocom’05.
[11] Yigal Bejerano, and Rajeev Rastogi. Robust Monitoring of Link Delays
and Faults in IP Networks. In Proceedings of IEEE Infocom, 2003.
[12] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice Hall Inc., 1993.
[13] CPLEX optimization package. http://www.ilog.com/products/cplex/
[14] N. Spring, R. Mahajan, and D. Wetherall. Measuring ISP topologies
with Rocketfuel. In Proceedings of ACM SIGCOMM, August 2002.
[15] G. R. Cantieni, Gianluca Iannaccone, Chadi Barakat, Christophe Diot,
and Patrick Thiran. Reformulating the Monitor Placement Problem: Op
timal NetworkWide Sampling. In proceeding of 40th Annual Conference
on Information Sciences and Systems, 2006.
[16] S. Jamin, C. Jin, Y. Jin, D. Raz, Y. Shavitt, and L. Zhang. On the
Placement of Internet Instrumentation. In proc. of IEEE Infocom, 2003.
[17] Sridhar Srinivasan, and Ellen Zegura. RouteSeer: Topological Placement
of Nodes in Service Overlays. Technical report, Georgia Tech.
[18] N. Spring, R. Mahajan, and D. Wetherall. Measuring ISP topologies
with Rocketfuel. In Proceedings of ACM SIGCOMM, August 2002.
[19] Ramana Rao Kompella, Jennifer Yates, Albert Greenberg, and Alex
C. Snoeren. Detection and Localization of Network Black Holes. In
Proceedings of IEEE INFOCOM, 2007.
[20] Y. Tang, E. S. Al-Shaer, and R. Boutaba. Active Integrated Fault Localization in Communication Networks. In IEEE/IFIP Integrated Management (IM 2005), May 2005.