Fig. 2
Examples of latency distributions on a wired network. Starting at the upper left, the graphs show the latency distribution on a local Ethernet link for servers under no, low, medium, and high load, respectively.
Source publication
We present a distributed adaptive fault-handling algorithm applied in networked systems. The probabilistic approach that we use makes the proposed method capable of adaptively detecting and localizing network faults using simple end-to-end test transactions. Our method operates in a fully distributed manner, such that each network element detect...
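As an illustration of such end-to-end test transactions, the sketch below (a hypothetical Python example using TCP connect timing rather than the paper's probe mechanism; the host and probe count are placeholders) measures per-probe response delay and an empirical drop rate:

    import socket
    import time

    def probe(host, port=80, timeout=1.0):
        """Return the connect delay in seconds, or None if the probe is dropped."""
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return time.monotonic() - start
        except OSError:
            return None  # a timeout or refusal is counted as a dropped probe

    # Hypothetical usage: 10 probes against a placeholder host.
    results = [probe("example.org") for _ in range(10)]
    delays = [d for d in results if d is not None]
    drop_rate = 1 - len(delays) / len(results)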
Contexts in source publication
Context 1
... on empirical probe testing and latency measurements on Ethernet links in different types of real-world networks (fig. 2 and 3), we assume that link latencies are Gamma distributed. Similar conclusions about network traffic matching Gamma, Weibull, or other exponential distributions have been made in a number of other papers, e.g. ...
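To illustrate the Gamma assumption, the following minimal sketch fits a Gamma PDF to latency samples with SciPy; the synthetic samples stand in for measured probe delays, and the parameter values are placeholders:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    latencies_ms = rng.gamma(shape=3.0, scale=0.2, size=500)  # stand-in for measurements

    # Fix the location at zero so the delay is modelled as a pure Gamma variate.
    shape, loc, scale = stats.gamma.fit(latencies_ms, floc=0)
    print(f"fitted shape={shape:.2f}, scale={scale:.3f}, mean={shape * scale:.3f} ms")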
Context 2
... different values of both τ and ψ, we see that the localization rate with respect to the number of generated fault events decreases with increasing drop rate and failure rate (fig. 12, 13). Further, we see that very high values of τ produce lower localization rates compared to the rest of the surface (fig. 12a, 12b, 13a, 13b), which matches the results from previous sections (section 4.1 and 4.2). Moreover, we see that different values of ψ have insignificant effects on the overall localization performance (fig. 12c, ...
Context 3
... different values of both τ and ψ, we see that the localization rate with respect to the number of generated fault events decreases with increasing drop rate and failure rate (fig. 12, 13). Further, we see that very high values of τ produce lower localization rates compared to the rest of the surface (fig. 12a, 12b, 13a, 13b), which matches the results from previous sections (section 4.1 and 4.2). Moreover, we see that different values of ψ have insignificant effects on the overall localization performance (fig. 12c, 12d, 13c, 13d). In all cases, we see that around 70% of the node failures and 95% of the link faults can be correctly localized for certain ...
Context 4
... rate (fig. 12, 13). Further, we see that very high values of τ produce lower localization rates compared to the rest of the surface (fig. 12a, 12b, 13a, 13b), which matches the results from previous sections (section 4.1 and 4.2). Moreover, we see that different values of ψ have insignificant effects on the overall localization performance (fig. 12c, 12d, 13c, 13d). In all cases, we see that around 70% of the node failures and 95% of the link faults can be correctly localized for certain parameter settings (fig. 12, 13). The reduced localization rates in both the cases of increased drop rate and failure rate (fig. 12, 13) are essentially caused by increasingly ineffective communication between ...
Context 5
... the results from previous sections (section 4.1 and 4.2). Moreover, we see that different values of ψ have insignificant effects on the overall localization performance (fig. 12c, 12d, 13c, 13d). In all cases, we see that around 70% of the node failures and 95% of the link faults can be correctly localized for certain parameter settings (fig. 12, 13). The reduced localization rates in both the cases of increased drop rate and failure rate (fig. 12, 13) are essentially caused by increasingly ineffective communication between nodes. In combination with packet drops, the absence of alternative routing paths, and an increasing number of overlapping faults, the information exchanged between nodes in ...
Context 6
... ψ have insignificant effects on the overall localization performance (fig. 12c, 12d, 13c, 13d). In all cases, we see that around 70% of the node failures and 95% of the link faults can be correctly localized for certain parameter settings (fig. 12, 13). The reduced localization rates in both the cases of increased drop rate and failure rate (fig. 12, 13) are essentially caused by increasingly ineffective communication between nodes. In combination with packet drops, the absence of alternative routing paths, and an increasing number of overlapping faults, the information exchanged between nodes in the collaborative fault-localization process becomes increasingly insufficient. This means that faults are ...
Citations
... In order to maintain critical functionality within the network, a distributed approach for autonomous anomaly detection and collaborative fault-localization was proposed. The proposed statistical method [71], [72] is based on parameter estimation of Gamma distributions obtained by measuring probe response delays via simple end-to-end transactions (performed with ICMP, etc.). The idea is to learn the expected probe response delay and packet drop rate of each connection in each node, and use them for parameter adaptation so that the manual configuration effort is minimized [64]. ...
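A minimal sketch of such per-link learning, assuming an exponentially weighted moment estimator matched to Gamma shape/scale (the class name and update rule are illustrative, not the exact method of [71], [72]):

    class LinkEstimator:
        def __init__(self, alpha=0.05):
            self.alpha = alpha   # smoothing factor for the moving moments
            self.mean = None     # EWMA of the probe response delay
            self.var = 0.0       # EWMA of the squared deviation
            self.sent = 0
            self.dropped = 0

        def observe(self, delay):
            """delay is the probe response delay, or None for a drop."""
            self.sent += 1
            if delay is None:
                self.dropped += 1
                return
            if self.mean is None:
                self.mean = delay
                return
            diff = delay - self.mean
            self.mean += self.alpha * diff
            self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)

        @property
        def gamma_params(self):
            """Moment-matched Gamma (shape, scale) for the delay model."""
            if not self.mean or self.var <= 0:
                return None
            return (self.mean ** 2 / self.var, self.var / self.mean)

        @property
        def drop_rate(self):
            return self.dropped / self.sent if self.sent else 0.0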
Autonomic network management is a promising approach to reducing the cost and the complexity of managing network infrastructures. It attempts to take the human administrator out of the network control loop, leaving the management tasks to be performed by the network itself. Due to its important implications for automating management systems, this area has attracted growing attention from both academia and industry. In this paper, we provide a holistic view of autonomic architecture proposals and the evaluation metrics existing so far. Based on this, we identify some new criteria important to autonomic architectures. Finally, we compare different existing autonomic architectures and describe the pros and cons of each one with respect to network management and performance.
... The choice of model is motivated by the assumption that the response delay is a sum of independent exponential transmission delays caused by, e.g., queueing times in processing nodes. Empirical tests indicate that the Gamma PDF matches real-world probe response delays quite well [10]. Similar conclusions about network traffic delays (on different network levels) matching Gamma, or other exponential distributions, have been made in several papers, such as [11], [12]. ...
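This assumption can be checked numerically: a sum of k independent exponential stage delays is Gamma (Erlang) distributed with shape k. A short sketch, where the stage count and mean stage delay are arbitrary:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    k, mean_stage = 4, 0.5  # four queueing stages, 0.5 ms mean delay each
    total = rng.exponential(mean_stage, size=(100_000, k)).sum(axis=1)

    # Compare against the theoretical Gamma(shape=k, scale=mean_stage):
    ks = stats.kstest(total, "gamma", args=(k, 0, mean_stage))
    print(f"KS statistic vs Gamma({k}, {mean_stage}): {ks.statistic:.4f}")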
We present a statistical probing approach to distributed fault-detection in networked systems, based on autonomous configuration of algorithm parameters. Statistical modelling is used for detection and localisation of network faults. A detected fault is isolated to a node or link by collaborative fault-localisation. From local measurements obtained through probing between nodes, probe response delay and packet drop are modelled via parameter estimation for each link. Estimated model parameters are used for autonomous configuration of algorithm parameters related to probe intervals and detection mechanisms. Expected fault-detection performance is formulated as a cost instead of specific parameter values, significantly reducing configuration efforts in a distributed system. The benefit offered by our algorithm is fault-detection with increased certainty based on local measurements, compared to other methods that do not take observed network conditions into account. We investigate the algorithm performance for varying user parameters and failure conditions. The simulation results indicate that more than 95% of the generated faults can be detected with few false alarms. At least 80% of the link faults and 65% of the node faults are correctly localised. The performance can be improved by parameter adjustments and by using alternative paths for communication of algorithm control messages.
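One way such probabilistic detection could be realised (an illustrative sketch, not the paper's exact mechanism): flag a probe response as anomalous when its upper-tail probability under the fitted per-link Gamma model falls below a cut-off. The name p_min is a hypothetical parameter, not one from the paper:

    from scipy import stats

    def is_anomalous(delay, shape, scale, p_min=1e-3):
        # P(X >= delay) under the per-link Gamma model; a tiny value means
        # the observed delay is very unlikely under normal conditions.
        return stats.gamma.sf(delay, shape, scale=scale) < p_min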
... The Netlet type 2 may include legacy/future protocols, and it is needed for exchanging information with other MCs located in different nodes. Examples of algorithms employed by Cross-Layer QoS for self-adaptation are Anomaly Detection (AD) [8] and "Not All aT Once!" (NATO!) [9]. ...
The paper proposes a preliminary design of cross-layer quality of service applied to congestion control in the future Internet. This is an alternative to QoS-aware routing for cases where the infrastructure operator cannot add new resources and/or re-routing is not possible. Dedicated software running in each node collects a list of local parameters, such as the available transfer rate and the one-way delay between all neighbors. This real-time status information is then distributed to all reachable nodes with in-network management enabled. Based on the statistics of individual link traffic, a minimal network coding scheme, triggered by cross-layer quality of service, is temporarily activated. The system provides enhanced distributed routing that preserves the performance of running services despite congestion that cannot be eliminated.
... In this report we describe an extension of the existing approach to adaptive fault-handling described in [7]. Apart from detecting communication faults, our model is here extended to include detection of shifts in observed network latencies. ...
... Based on the expected probe response delay and the expected drop rate, probe tests and probing intervals are autonomously adapted to the current network conditions on individual links. To reduce communication overhead, we use two different intervals for probe tests and individual probes, as described in [7]. For detection of communication faults, a probabilistic threshold is used to achieve reliable fault-detection with few false positives. ...
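A sketch of what such adaptation could look like (assumed rules, not necessarily those of [7]): derive the probe timeout from a high quantile of the fitted per-link Gamma delay model, and shorten the test interval on lossy links:

    from scipy import stats

    def probe_timeout(shape, scale, q=0.999):
        """Timeout set at the q-quantile of the per-link delay model."""
        return stats.gamma.ppf(q, shape, scale=scale)

    def test_interval(base_interval, drop_rate, floor=1.0):
        """Probe more often on lossy links, never below `floor` seconds."""
        return max(floor, base_interval * (1.0 - drop_rate))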
We present the extension of a distributed adaptive fault-detection algorithm applied in networked systems. In previous work, we developed an approach to probabilistic detection of communication faults based on measured probe response delays and packet drops. The algorithm is here extended to detect network latency shifts and adapt to long-term changes of the expected probe response delay. Initial performance tests indicate that detected latency shifts and communication faults can successfully be localised to links and nodes. Further, the amount of network traffic produced by the algorithm scales linearly with the network size.
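As an illustration of latency-shift detection, the sketch below uses an assumed two-rate EWMA scheme (not necessarily the paper's mechanism): a fast-moving average drifting away from a slow baseline by more than k mean deviations signals a shift in the expected delay.

    class ShiftDetector:
        def __init__(self, fast=0.2, slow=0.01, k=3.0):
            self.fast_a, self.slow_a, self.k = fast, slow, k
            self.fast = self.slow = None
            self.dev = 0.0  # EWMA of absolute deviation around the baseline

        def update(self, delay):
            """Feed one probe response delay; return True if a shift is signalled."""
            if self.fast is None:
                self.fast = self.slow = delay
                return False
            self.fast += self.fast_a * (delay - self.fast)
            self.dev += self.slow_a * (abs(delay - self.slow) - self.dev)
            shifted = abs(self.fast - self.slow) > self.k * max(self.dev, 1e-9)
            self.slow += self.slow_a * (delay - self.slow)  # baseline adapts slowly
            return shifted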