Conference Paper

How to identify and estimate the largest traffic matrix elements in a dynamic environment

DOI: 10.1145/1005686.1005698
Conference: Proceedings of the International Conference on Measurements and Modeling of Computer Systems, SIGMETRICS 2004, June 10-14, 2004, New York, NY, USA
Source: DBLP


In this paper we investigate a new idea for traffic matrix estimation that makes the basic problem less under-constrained: deliberately changing the routing to obtain additional measurements. Because these measurements are collected over disparate time intervals, we need models for each Origin-Destination (OD) pair that capture the complex behavior of Internet traffic. We model each OD pair with two components: a diurnal pattern and a fluctuation process. We provide models that incorporate these two components to estimate both the first- and second-order moments of traffic matrices, for both stationary and cyclo-stationary traffic scenarios. We formalize the problem of estimating the second-order moment in a way that is completely independent of the first-order moment. Moreover, we can estimate the second-order moment without any routing changes (i.e., without explicit changes to IGP link weights). We prove, for the first time, that such a result holds for any realistic topology under the assumption of […]. We highlight how the second-order moment helps identify the largest OD flows, which carry the most significant fraction of network traffic. We then propose a refined methodology: use our variance estimator (which requires no routing changes) to identify the largest flows, and then estimate only those flows. The benefit of this method is that it dramatically reduces the number of routing changes needed. We validate the effectiveness of our methodology, and the intuitions behind it, using real aggregated sampled NetFlow data collected from a commercial Tier-1 backbone.
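The covariance idea in the abstract can be made concrete with a small simulation. The sketch below is our own illustration, not the paper's algorithm: it assumes a toy three-link topology, a hypothetical 0/1 routing matrix A, and independent zero-mean OD fluctuations, so the link-load covariance satisfies cov(y) = A diag(var(x)) Aᵀ and per-flow variances can be recovered from link measurements alone, i.e., without routing changes.

```python
import numpy as np

# Toy example: 3 links, 4 OD flows (routing matrix is hypothetical).
# Rows = links, columns = OD flows; A[l, f] = 1 if flow f crosses link l.
A = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=float)

rng = np.random.default_rng(0)
true_var = np.array([4.0, 1.0, 9.0, 0.25])   # per-flow fluctuation variances
x = rng.normal(0.0, np.sqrt(true_var), size=(10_000, 4))  # zero-mean fluctuations
y = x @ A.T                                  # observed link-load fluctuations

# Second-moment estimation: cov(y) = A diag(var_x) A^T when flows are
# independent, so every entry of cov(y) is linear in the unknown variances.
S = np.cov(y, rowvar=False)
rows, cols = np.triu_indices(A.shape[0])
M = A[rows] * A[cols]                        # coefficient of var_x[k] in S[i, j]
est_var, *_ = np.linalg.lstsq(M, S[rows, cols], rcond=None)

print("estimated per-flow variances:", np.round(est_var, 2))
```

In this toy setup the largest recovered variances flag the "elephant" OD flows that the paper proposes to estimate individually, which is the intuition behind its reduced need for routing changes.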

    • "ISPs employ techniques such as [14] to compute their TM i.e., a matrix that specifies the traffic demand from origin nodes to destination nodes in a network. As final destinations are altered by overlay nodes via packet encapsulation as mentioned above, the IP layer is unaware of the ultimate final destination within its domain. "
    ABSTRACT: ISPs manage performance of their networks in the presence of failures or congestion by employing common traffic engineering techniques such as link weight settings, load balancing and routing policies. Overlay networks attempt to take control over routing in the hope that they might achieve better performance for such failures or high load episodes. In this paper, we examine some of the interaction dynamics between the two layers of control from an ISP's view. With the help of simple examples, we illustrate how an uncoordinated effort of the two layers to recover from failures may cause performance degradation for both overlay and non-overlay traffic. We also show how current traffic engineering techniques are inadequate to deal with emerging overlay network services.
    • "Reference [3] [4] compares NetFlow to SNMP and packet-level data collection, while [5] proposes new sampling techniques to improve the performance of NetFlow. NetFlow data has also been used to examine the accuracy of traffic matrix estimation techniques [6] and for anomaly detection [7] [8]. "
    ABSTRACT: This paper investigates the problem of deploying network traffic monitors with optimized coverage and cost in an IP network. Deploying a network-wide monitoring infrastructure in operational networks is a practical necessity. We investigate two representative solutions: a router-based solution called NetFlow and an interface-based solution called CMON. Several cost factors are associated with deploying either NetFlow or CMON in a network. We argue that enabling monitoring to cover a major portion of the traffic, instead of all of it, achieves significant cost savings while still giving operators sufficient insight into their network. Using NetFlow as an example, we develop a technique to achieve the optimal cost-coverage tradeoff. Specifically, we aim to solve the Optimal NetFlow Location Problem (ONLP) for a given coverage ratio. We analyze the various cost factors of enabling NetFlow in such a network and model the problem as an Integer Linear Program (ILP). Given the problem's NP-hard nature, we develop two greedy heuristics to cope with large-scale instances. The performance of the ILP and the heuristics is demonstrated by numerical results; the LM heuristic achieves sub-optimal solutions within 1–2% of the optimum in a mixed router environment. We observe that covering 95% instead of 100% of the network traffic yields 55% cost savings. We then extend our methodology to deploying CMON in such a network and compare the costs of deploying NetFlow and CMON. The results demonstrate that CMON, owing to its more modular nature, is more cost-effective when a small coverage ratio is desired.
    Computer Networks 09/2009; 53(14):2491-2501. DOI:10.1016/j.comnet.2009.05.004
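The cost-coverage tradeoff described in the abstract above lends itself to a simple greedy illustration. The following sketch is ours, not the paper's LM heuristic: router names, costs, and flow sets are hypothetical, flow volumes are taken as equal, and the rule is simply "pick the router that adds the most uncovered traffic per unit cost" until the target coverage ratio is met.

```python
def greedy_netflow_placement(routers, target_ratio):
    """Greedy sketch: choose monitoring locations until the selected set
    covers `target_ratio` of all flows, preferring the router that adds
    the most not-yet-covered flows per unit of enabling cost.

    `routers` maps router -> (cost, set of flow ids traversing it);
    flow volumes are treated as equal for simplicity.
    """
    all_flows = set().union(*(flows for _, flows in routers.values()))
    covered, chosen, total = set(), [], len(all_flows)
    while len(covered) / total < target_ratio:
        best = max(
            (r for r in routers if r not in chosen),
            key=lambda r: len(routers[r][1] - covered) / routers[r][0],
        )
        chosen.append(best)
        covered |= routers[best][1]
    return chosen

# Toy example (hypothetical topology, unit flow volumes):
routers = {
    "r1": (2.0, {1, 2, 3, 4}),
    "r2": (1.0, {3, 4, 5}),
    "r3": (1.5, {5, 6}),
}
print(greedy_netflow_placement(routers, target_ratio=0.95))  # ['r2', 'r1', 'r3']
```

Lowering `target_ratio` (e.g., to 0.95 of traffic rather than 1.0) is exactly where the paper's reported cost savings come from: the last few percent of coverage tend to require the most expensive additions.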
    • "A shift away from maximum likelihood methods appears in [1], a paper which has other similarities with the present work including the use of queueing models for proposing moment models. Other network applications of moment methods are in [5] [20]. Recently there has been an increase in research on bandwidth estimation focused on algorithms for end-to-end estimation [2] [8] [9] and bottleneck identification [2] [17]. "
    ABSTRACT: This paper presents a method for using end-to-end available bandwidth measurements to estimate the available bandwidth on individual internal links. The basic approach is to apply a power transform to the observed end-to-end measurements, model the result as a mixture of spatially correlated exponential random variables, carry out estimation by moment methods, then transform back to the original variables to obtain estimates and confidence intervals for the expected available bandwidth on each link. Because spatial dependence leads to certain parameter confounding, only upper bounds can be found reliably. Simulations with ns2 show that the method can work well and that the assumptions are approximately valid in the examples.
    Journal of Statistical Computation and Simulation 07/2008; 78(7):639-652. DOI:10.1080/00949650701416139
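A minimal sketch of the moment-method idea, under assumptions that simplify the model in the abstract above: we take links to be independent (the paper explicitly models spatial correlation, under which only upper bounds are identifiable), so the end-to-end minimum over exponential link variables is itself exponential with rate equal to the sum of the link rates, and per-path method-of-moments estimates (rate ≈ 1 / sample mean) can be inverted through the path-link incidence matrix. Topology and rates below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 3-link network and 3 measured paths (rows: links a path uses).
P = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)
link_rates = np.array([0.2, 0.5, 0.1])   # Exp(rate); mean available bw = 1/rate

# Simulate end-to-end measurements: with independent exponential links, the
# path minimum is exponential with rate equal to the sum of its link rates.
n = 20_000
links = rng.exponential(1.0 / link_rates, size=(n, 3))
paths = np.array([links[:, p.astype(bool)].min(axis=1) for p in P]).T

# Method of moments per path (rate = 1 / sample mean), then invert the
# linear relation P @ link_rates = path_rates to recover per-link rates.
path_rates = 1.0 / paths.mean(axis=0)
est_rates, *_ = np.linalg.lstsq(P, path_rates, rcond=None)
print("estimated mean available bandwidth per link:", np.round(1.0 / est_rates, 2))
```

Under the paper's correlated-mixture model the same moment equations confound some parameters, which is why it reports reliable upper bounds rather than point estimates; the independent-links version here is only meant to show the mechanics.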