Network Fault Correction in Overlay Network Through Optimality

Abstract

We use end-to-end inference to diagnose and repair data-forwarding failures. Our optimization goal is to minimize the expected cost of correcting all faulty nodes that cannot properly deliver data. First checking the node that has the least checking cost does not necessarily minimize the expected cost in fault localization. We construct a potential function for identifying the candidate nodes, one of which should be checked first by an optimal strategy. We propose an efficient approach for inferring the node to be checked in large-scale networks.
ISSN 2394-3777 (Print)
ISSN 2394-3785 (Online)
Available online at www.ijartet.com
International Journal of Advanced Research Trends in Engineering and Technology (IJARTET)
Vol. 2, Issue 8, August 2015
All Rights Reserved © 2015 IJARTET
Mary Varsha Peter¹, Priya.M.², Rajalakshmi.R.³, Muthu Bharathi.R.⁴, Pramila.E.⁵, Christo Ananth⁶
U.G. Scholars, Department of ECE, Francis Xavier Engineering College, Tirunelveli ¹⁻⁵
Associate Professor, Department of ECE, Francis Xavier Engineering College, Tirunelveli ⁶
Index Terms: Network Management, Network Diagnosis and Correction, Fault Localization and Repair, Reliability
Engineering.
I. INTRODUCTION
Network components are prone to a variety
of faults such as packet loss, link cut, or node outage.
To prevent the faulty components from hindering
network applications, it is important to diagnose (i.e.,
detect and localize) the components that are the root
cause of network faults. However, it is also desirable
to repair the faulty components to enable them to
return to their operational states. Therefore, we focus
on network fault correction, by which we mean not
only to diagnose, but also to repair all faulty
components within a network. In addition, it has been
shown that a network outage can bring significant
economic loss.
As a result, we want to devise a cost-effective
network fault correction mechanism that
corrects all network faults at minimum cost. To
diagnose (but not repair) network faults, recent
approaches use all network nodes to
collaboratively achieve this. For instance, in
hop-by-hop authentication, each hop inspects packets
received from its previous hop and reports errors
when packets are found to be corrupted. While such a
distributed infrastructure can accurately pinpoint
network faults, deploying and maintaining numerous
monitoring points in a large-scale network introduces
heavy computational overhead in collecting network
statistics and involves complicated administrative
management. In particular, it is difficult to directly
monitor and access all overlay nodes in an externally
managed network, whose routing nodes are
independently operated by various administrative
domains. In this case, we can only infer the network
condition from end-to-end information.
We consider an end-to-end inference
approach which, using end-to-end measurements,
infers components that are probably faulty in
forwarding data in an application-layer overlay
network whose overlay nodes are externally managed
by independent administrative domains. We start
with a routing tree topology with a set of overlay
nodes, since a tree-based setting is typically seen in
destination-based routing, where each overlay
node builds a routing tree with itself as a root, as well
as in multicast routing, where a routing tree is built to
connect members in a multicast group. We then
monitor every root-to-leaf overlay path. If a path
exhibits any “anomalous behavior” in forwarding
data, then some “faulty” overlay node on the path
must be responsible. In practice, the precise
definition of an “anomalous behavior” depends on
specific applications. For instance, a path is said to be
anomalous if it fails to deliver a number of correct
packets within a time window. Using the path
information collected at the application endpoints
(i.e., leaf nodes), we can narrow down the space of
possibly faulty overlay nodes.
II. PROBLEM IN THE SYSTEM
Failure model [1]
Nodes can delay or drop packets to be forwarded
because of power outage, full transmission queues,
hardware errors, or route misconfiguration. Under the
fail-stop model, the failure probabilities {pi} can
then be characterized via statistical measurements of
reliability indexes. Node failures and the
corresponding failure probabilities {pi} are all
independent.
Cost model [1]
The checking costs {ci} can be characterized using the
personnel hours and wages required for
troubleshooting problems or the cost of test
equipment. The checking costs can be highly varying,
depending on the administrative domains in which
the checked nodes reside. The total checking cost is
the only cost component being considered in our
optimization problem. We assume that pi and ci can be
any values in [0, 1] and [0, 1], respectively.
III. OPTIMIZATION PROBLEM
We take an end-to-end inference approach for correcting
data-path failures.
Assume each node i has the following known a priori:
Failure probability pi: the likelihood that
node i has failed.
Checking cost ci: the cost needed to perform
sanity tests on node i.
The objective is to minimize the expected total checking
cost of correcting (i.e., diagnosing and repairing) all
faulty nodes.
Finding Good/Bad Paths
Each data path is classified as follows:
Good: the data path has no faulty node
and can deliver data.
Bad: the data path has at least one faulty
node and cannot deliver data.
Each node has the same data-forwarding behavior
across all paths upon which it lies. This implies that if a
node lies on at least one good path, it is a non-faulty
(good) node.
Fig 1. Finding Good and Bad Path.
Forming a Bad Tree
Monitor data streams from the root node 1 to each of
the leaf nodes 6, 7, 8, 9. Keep only bad paths, and
remove any nodes that are known to be good.
Fig 2. Forming Bad Path.
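The two steps above (classify each path, then discard nodes known to be good) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: only root node 1 and leaf nodes 6 to 9 come from the text. The internal topology in `parent`, the observed path outcomes, and the helper names `root_path` and `form_bad_tree` are assumptions made for the example.

```python
def root_path(leaf, parent):
    """Return the list of nodes from the root down to `leaf`."""
    path = [leaf]
    while path[-1] in parent:          # the root has no parent entry
        path.append(parent[path[-1]])
    return path[::-1]


def form_bad_tree(parent, path_is_good):
    """Keep only bad paths and drop every node that lies on a good path.

    parent:       child -> parent map of the routing tree
    path_is_good: leaf -> True if the root-to-leaf path delivered data
    Returns (good_nodes, suspects), where suspects maps each bad leaf to
    the nodes on its path that are not yet known to be good.
    """
    paths = {leaf: root_path(leaf, parent) for leaf in path_is_good}

    # A node on at least one good path is non-faulty.
    good_nodes = set()
    for leaf, ok in path_is_good.items():
        if ok:
            good_nodes.update(paths[leaf])

    suspects = {
        leaf: [n for n in paths[leaf] if n not in good_nodes]
        for leaf, ok in path_is_good.items()
        if not ok
    }
    return good_nodes, suspects


# Hypothetical topology: root 1, internal nodes 2-5, leaves 6-9.
parent = {2: 1, 3: 1, 4: 2, 5: 2, 6: 4, 7: 5, 8: 3, 9: 3}
# Hypothetical measurements: only the path to leaf 6 is good.
path_is_good = {6: True, 7: False, 8: False, 9: False}

good_nodes, suspects = form_bad_tree(parent, path_is_good)
print(sorted(good_nodes))
print(suspects)
```

With these assumed inputs, the good path clears nodes 1, 2, 4, and 6, so only nodes 3, 5, 7, 8, and 9 remain as suspects on the bad paths.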
IV. INFERENCE ALGORITHM
Our inference algorithm selects which nodes
to check. Each node i is associated with a potential
function, defined in terms of the following quantities:
pi = failure probability of node i
ci = checking cost of node i
Pr(T | Xi, Ai) = conditional probability of
having a bad tree
T = the event that the tree is a bad tree
Xi = the event that node i is bad
Ai = the event that the ancestors of
node i are good
We should first check the node with high pi and small
ci, i.e., the node with high potential. For
general cases, we do not know which candidate node
should be checked first to minimize the expected
cost.
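To make the ordering question concrete, the sketch below evaluates every checking order on a single bad path by brute force. It is an illustration under stated assumptions, not the authors' algorithm: nodes are checked one at a time, a faulty node is repaired when it is checked, the path is re-tested at no cost after each check, and checking stops once the path is good. The potential formula itself is not reproduced in the text above, so the code assumes the single-path form pi / (ci (1 - pi)); all numeric values and function names are hypothetical.

```python
from itertools import permutations
from math import prod


def expected_cost(order, p, c):
    """Expected checking cost of a fixed order, given that the path is bad."""
    p_bad = 1.0 - prod(1.0 - p[i] for i in order)
    total = 0.0
    for k, node in enumerate(order):
        # `node` is checked only if the path is still bad at step k,
        # i.e. at least one node in order[k:] is faulty.
        still_bad = 1.0 - prod(1.0 - p[j] for j in order[k:])
        total += c[node] * still_bad
    return total / p_bad


def potential(i, p, c):
    """Assumed single-path potential: high p_i and small c_i score high."""
    return p[i] / (c[i] * (1.0 - p[i]))


# Hypothetical failure probabilities and checking costs.
p = {1: 0.5, 2: 0.1, 3: 0.8}
c = {1: 1.0, 2: 0.5, 3: 5.0}
nodes = list(p)

best = min(permutations(nodes), key=lambda o: expected_cost(o, p, c))
by_potential = tuple(sorted(nodes, key=lambda i: potential(i, p, c), reverse=True))
cheapest_first = tuple(sorted(nodes, key=lambda i: c[i]))
likeliest_first = tuple(sorted(nodes, key=lambda i: p[i], reverse=True))

for name, order in [("brute-force best", best),
                    ("by potential", by_potential),
                    ("cheapest first", cheapest_first),
                    ("most likely first", likeliest_first)]:
    print(f"{name:18s} {order} {expected_cost(order, p, c):.4f}")
```

For these values the brute-force optimum is the order 1, 3, 2, which coincides with decreasing potential, while both the cheapest-first and the most-likely-first orders cost more in expectation. On a single path the term Pr(T | Xi, Ai) equals 1, which is why it does not appear in the sketch; in a general tree it has to be taken into account.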
V. PROPOSED APPROACH
We propose several efficient heuristics for
inferring the best node to be checked in large-scale
networks. By extensive simulation, we show that we
can infer the best node at least 95% of the time, and
that first checking the candidate nodes rather than the
most likely faulty nodes can decrease the checking
cost of correcting all faulty nodes. As a result, we
want to devise a cost-effective network fault
correction mechanism that corrects all network faults
at minimum cost. To diagnose (but not repair)
network faults, recent approaches use all network
nodes to collaboratively achieve this. For instance, in
hop-by-hop authentication, each hop inspects packets
received from its previous hop and reports errors
when packets are found to be corrupted. While such a
distributed infrastructure can accurately pinpoint
network faults, deploying and maintaining numerous
monitoring points in a large-scale network introduces
heavy computational overhead in collecting network
statistics and involves complicated administrative
management.
We present the optimality results for an end-
to-end inference approach to correct (i.e., diagnose
and repair) probabilistic network faults at minimum
expected cost. One motivating application of using
this end-to-end inference approach is an externally
managed overlay network, where we cannot directly
access and monitor nodes that are independently
operated by different administrative domains, but
instead we must infer failures via end-to-end
measurements. We show that first checking the node
that is most likely faulty or has the least checking
cost does not necessarily minimize the expected cost
of correcting all faulty nodes.
VI. FAULT DETECTION AND CORRECTION
Implementation is the stage of the project when
the theoretical design is turned into a working
system. It can therefore be considered the most
critical stage in achieving a successful new system
and in giving the user confidence that the new
system will work and be effective. The
implementation stage involves careful planning,
investigation of the existing system and its
constraints on implementation, design of methods
to achieve changeover, and evaluation of changeover
methods.
Fig 3. Data Flow for Detection and Correction.
A. Transmitter Module
The transmitter sends a packet to the
receiver and waits for its acknowledgment. Based
on error-detection results, the receiver generates
either a negative acknowledgment (NACK) or a
positive acknowledgment (ACK) for each received
packet and sends it over a feedback channel. If an
ACK is received, the transmitter sends out the next
packet; otherwise, if a NACK is received,
retransmission of the same packet is scheduled
immediately, and this process continues until the
packet is positively acknowledged.
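The ACK/NACK loop described above can be sketched as a small simulation. This is a minimal model, assuming an error-free feedback channel and independent packet corruption with probability `error_prob`; the function name and its parameters are hypothetical and are not taken from the paper.

```python
import random


def stop_and_wait(packets, error_prob, rng):
    """Send packets one at a time, retransmitting each until it is ACKed.

    Returns (delivered, transmissions): the packets in delivery order and
    the total number of transmission attempts, including retransmissions.
    """
    if not 0.0 <= error_prob < 1.0:
        raise ValueError("error_prob must be in [0, 1)")

    delivered = []
    transmissions = 0
    for packet in packets:
        while True:
            transmissions += 1
            corrupted = rng.random() < error_prob  # receiver's error check
            if corrupted:
                continue                 # NACK: retransmit the same packet
            delivered.append(packet)     # ACK: move on to the next packet
            break
    return delivered, transmissions


rng = random.Random(0)  # fixed seed so that runs are repeatable
delivered, transmissions = stop_and_wait(list(range(5)), 0.3, rng)
print(delivered, transmissions)
```

Because each packet is retransmitted until it is acknowledged, delivery order always matches sending order in this model; only the number of attempts varies with the error rate.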
B. Fault Node Diagnosis And Correction
We consider an end-to-end approach of
inferring probabilistic data-forwarding failures in an
externally managed overlay network, where overlay
nodes are independently operated by various
administrative domains. Our optimization goal is to
minimize the expected cost of correcting (i.e.,
diagnosing and repairing) all faulty overlay nodes
that cannot properly deliver data. Instead of first
checking the most likely faulty nodes as in
conventional fault localization problems, we prove
that an optimal strategy should start with checking
one of the candidate nodes, which are identified
based on a potential function that we develop. We
propose several efficient heuristics for inferring the
best node to be checked in large-scale networks. By
extensive simulation, we show that we can infer the
best node at least 95% of the time, and that first
checking the candidate nodes rather than the most
likely faulty nodes can decrease the checking cost of
correcting all faulty nodes.
C. Receiver Module
Each data packet in the system is identified
by a unique integer number, referred to as the node
number. The transmitter has a buffer, referred to as
the transmission queue, to store packets waiting
for transmission or retransmission. The transmission
queue is assumed to have an infinite supply of
packets, referred to as the heavy-traffic condition in
related studies. The transmitter sends
packets to the receiver continuously and receives
acknowledgments as well. To preserve the original
arriving order of packets at the receiver, the system
has a buffer, referred to as the nodes buffer, to store
the correctly received packets that have not been
released.
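A buffer that holds correctly received packets and releases them in their original order, as described above, might look like the following sketch. The class and method names are hypothetical, and the integer that the text calls the node number is assumed here to increase by one per packet.

```python
class InOrderBuffer:
    """Hold out-of-order packets and release them in numbering order."""

    def __init__(self, first_number=0):
        self.next_number = first_number  # next packet number to release
        self.pending = {}                # number -> payload, not yet released

    def receive(self, number, payload):
        """Store one correctly received packet; return the packets released."""
        if number < self.next_number or number in self.pending:
            return []                    # duplicate of a packet already seen
        self.pending[number] = payload

        released = []
        # Release the longest run of consecutive packets now available.
        while self.next_number in self.pending:
            released.append(self.pending.pop(self.next_number))
            self.next_number += 1
        return released


buf = InOrderBuffer()
print(buf.receive(1, "b"))  # packet 0 is still missing, nothing is released
print(buf.receive(0, "a"))  # packets 0 and 1 are released together
print(buf.receive(2, "c"))
```

A dictionary keyed by packet number keeps the sketch short; a fixed-size window would be the natural choice if the buffer size had to be bounded.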
VII. CONCLUSION
We presented optimality results for
diagnosing and repairing all data-path failures, with
an objective to minimize the expected total checking
cost. We also constructed a potential function to
identify candidate nodes, one of which must be
checked first to minimize the expected total checking
cost. Our proposed approach reduces the cost of
correcting all faulty nodes.
REFERENCES
[1]. Toward Optimal Network Fault Correction in Externally
Managed Overlay Networks, Patrick P. C. Lee, Vishal
Misra, and Dan Rubenstein.
[2]. Toward Optimal Network Fault Correction via End-to-End
Inference, Patrick P. C. Lee, Vishal Misra, and Dan
Rubenstein.
[3]. Multicast Topology Inference from Measured End-to-End
Loss, N. G. Duffield, J. Horowitz, F. Lo Presti, and D.
Towsley.
[4]. Multicast-Based Inference of Network-Internal Delay
Distributions, Francesco Lo Presti, N. G. Duffield,
Joe Horowitz, and Don Towsley.