How to Identify and Estimate the Largest
Traffic Matrix Elements in a Dynamic Environment
Augustin Soule (LIP VI, Paris, France), Antonio Nucci (Sprint Advanced Technology Laboratories, Burlingame, CA, USA), Rene Cruz (University of California San Diego, San Diego, CA, USA), Emilio Leonardi (Politecnico di Torino, Turin, Italy), Nina Taft (Sprint Advanced Technology Laboratories, Burlingame, CA, USA)
ABSTRACT
In this paper we investigate a new idea for traffic matrix estimation that makes the basic problem less underconstrained, by deliberately changing the routing to obtain additional measurements. Because all these measurements are collected over disparate time intervals, we need to establish models for each Origin-Destination (OD) pair to capture the complex behaviours of Internet traffic. We model each OD pair with two components: the diurnal pattern and the fluctuation process. We provide models that incorporate the two components above, to estimate both the first and second order moments of traffic matrices. We do this for both stationary and cyclostationary traffic scenarios. We formalize the problem of estimating the second order moment in a way that is completely independent from the first order moment. Moreover, we can estimate the second order moment without needing any routing changes (i.e., without explicit changes to IGP link weights). We prove for the first time that such a result holds for any realistic topology under the assumption of minimum cost routing and strictly positive link weights. We highlight how the second order moment helps the identification of the top largest OD flows carrying the most significant fraction of network traffic. We then propose a refined methodology consisting of using our variance estimator (without routing changes) to identify the top largest flows, and estimate only these flows. The benefit of this method is that it dramatically reduces the number of routing changes needed. We validate the effectiveness of our methodology and the intuitions behind it by using real aggregated sampled Netflow data collected from a commercial Tier-1 backbone.
Categories and Subject Descriptors
C.4 [Performance of Systems]: Modeling techniques and Design
studies.
General Terms
Performance, Theory.
Keywords
Network Tomography, Traffic Matrix Estimation.
* This work was done when Nina Taft was at Sprint ATL; she is currently affiliated with Intel Research Berkeley, CA.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGMETRICS/Performance'04, June 12–16, 2004, New York, NY, USA.
Copyright 2004 ACM 1-58113-873-3/04/0006 ...$5.00.
1. INTRODUCTION
A traffic matrix is a representation of the volume of traffic that flows between origin-destination (OD) node pairs in a communication network. In the context of the Internet, the nodes can represent Points-of-Presence (PoPs), routers or links. In current IP backbone networks, obtaining accurate estimates of traffic matrices is problematic. There are a number of important traffic engineering tasks that could be greatly improved with the knowledge provided by traffic matrices. As a result, network operators have identified a need for the development of practical methods to obtain accurate estimates of traffic matrices. Example applications of traffic matrix estimation include logical topology design, capacity planning and forecasting, routing protocol configuration, provisioning for Service Level Agreements (SLAs), load balancing, and fault diagnosis.
The most straightforward approach is to directly measure traffic volumes between network endpoints. However, such approaches (e.g., Cisco's Netflow) still face challenging engineering obstacles related to the widespread deployment of a uniform measuring infrastructure, and the collection, storage, synchronization and processing of large amounts of information. Because such approaches will not be available to carriers in the near future, researchers have turned to statistical inference techniques.
The relationship between the traffic matrix, the routing and the link counts can be described by a system of linear equations Y = AX, where Y is the vector of link counts, X is the traffic matrix organized as a vector, and A denotes a routing matrix in which element A(l, p) is equal to 1 if OD pair p traverses link l, or zero otherwise. The elements of the routing matrix can have fractional values if traffic splitting is supported. (We define our notation more thoroughly later on.) In networking environments today, Y and A are readily available; the link counts Y can be obtained through standard SNMP measurements and the routing matrix A can be obtained by examining IGP link weights together with the corresponding topological connectivity information. The problem at hand is to estimate the traffic matrix X. This is not straightforward because the matrix A does not have full rank, and hence the fundamental problem is that of a highly underconstrained, or ill-posed, system.
A first generation of techniques were proposed in [1, 2, 3]. Model parameters are estimated using either Moment Generating methods [1], Bayesian methods [2] or Maximum Likelihood estimation [3]. A common idea behind these approaches to tackle the highly underconstrained problem was to introduce additional constraints related to the second order moment of the OD pairs. Estimation is then carried out with two batches of constraints, one on the first order moment and one batch for the second order moment. However the combined set of constraints is not solvable without an assumption on the relationship between the mean and variance. For example, in [1, 2] the authors assume that the volume of traffic between a given
OD pair has a Poisson distribution. Since the mean and variance of a Poisson variable are identical, an estimate of the variance of the volume of traffic for some OD pair can be used to estimate the mean volume of traffic for the same OD pair. Cao et al. [3] assume instead that the traffic volume for OD pairs follows a Gaussian distribution, and that a power law relationship between the mean and variance exists. A comparative study of these methods [4] revealed that these methods were highly dependent upon the initial starting point, often called a prior, of the estimation procedure. Hence a second generation of techniques emerged [4, 5, 6] that proposed various methods for generating intelligent priors. Other second generation techniques focused on improving the speed of computation [7].
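To illustrate, the mean-variance assumptions used by these first-generation methods can be sketched as follows (synthetic rates; the power-law parameters phi and c are illustrative values of our own, not parameters fitted in [1, 2, 3]):

```python
import numpy as np

# Hypothetical mean rates for a handful of OD pairs (arbitrary units).
means = np.array([0.5, 2.0, 8.0, 32.0])

# Poisson assumption [1, 2]: variance equals mean, so a variance
# estimate directly yields a mean estimate for the same OD pair.
poisson_vars = means

# Power-law assumption [3]: var = phi * mean**c for fixed phi and c
# (phi and c here are illustrative, not fitted parameters).
phi, c = 0.3, 1.5
power_law_vars = phi * means ** c

# Either assumption lets second-moment constraints pin down the mean,
# e.g. by inverting the power law.
recovered_means = (power_law_vars / phi) ** (1.0 / c)
assert np.allclose(recovered_means, means)
```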
In practice, all of these statistical inference techniques for estimating traffic matrices suffer from limited accuracy. The error rates are distributed over a range; in short, they typically fall between 10% and 25%. These methods can yield outliers such that some OD pairs can have errors above 100%. It has been difficult to drive the error rates down below these values. Some carriers have indicated that they would not use traffic matrices for traffic engineering unless the inference methods could drive the average errors below the 10% barrier.
The hardship in further reducing error rates is mainly due to the fact that real traffic exhibits complex behaviors that substantially deviate from any simple traffic model. As a consequence, any a priori guess on OD flow behavior based on simple models may potentially have a significant impact on the degree of (in)accuracy in estimates. To develop realistic models, we start by studying one month's worth of sampled Netflow data from a commercial Tier-1 backbone traffic matrix. By examining properties of OD flows from direct measurement data, we observed that OD flows contain two critical components: diurnal patterns and a fluctuations process. We also test the "power-law" relationship between mean and variance of OD flows and we show how this assumption does not "fully" hold for real traffic. Thus our first goal is to establish models that do not require assumptions about the mean-variance relationship and that can incorporate the two components above.
In addition to using realistic models, we believe that further improvements in making the problem less underconstrained are needed to reduce error rates. One way to do this is to define a scheme that allows for the rank of the system to be increased. In [8], the authors, for the first time, proposed the idea of changing the link weights and then taking new SNMP measurements under this new routing image. The reason why this can potentially reduce the "underconstrainedness" of the system is as follows. If additional link counts are collected under different routing scenarios, then this yields additional equations into the linear system that would increase the rank of the system if they are linearly independent of the existing equations. In [8] the authors proposed a heuristic algorithm for computing link weight changes needed in order to obtain a full rank system. They incorporate some practical requirements and constraints that a carrier has to face in their solution. The advantage here is that with a full rank system, there is a huge potential to reduce errors. However, for such systems to be practical the number of times carriers have to change link weights needs to be kept small. Note that the problem is not yet solved by obtaining a full rank system because the additional measurements will be collected over time scales of multiple hours and thus the traffic matrix itself will change. To the best of our knowledge, this is the first paper to consider the cyclostationarity of traffic matrices.
In this work we use this "routing changes idea" and propose two new methods for traffic matrix estimation. The common idea of these two methods is to make use of the well-posed property of the new full rank system (after the sequence of weight changes has been applied). With this system, we can model the first and second order moments of OD flows separately. By coupling this approach with OD flow models that capture both the diurnal patterns and the fluctuation behaviors of real traffic, we will show that we can avoid any of the typical modeling assumptions (such as Poisson, Gaussian, or fixed mean-variance laws).
Our methods thus make use of multiple mechanisms, each of which constitutes one of many steps in an overall TM estimation procedure. The idea is to prepend basic inference techniques with some extra steps. In particular we add (1) a variance estimator, and (2) an algorithm for suggesting link weight changes that will increase the rank of the system. We derive a closed form solution to estimate the covariance of the TM based on our underlying OD models. To the best of our knowledge, this is the first time that an estimate for the covariance of a TM has been proposed. For our weight change algorithm, we rely on an algorithm such as that proposed in [8]. The last step in our method uses a basic inference step (e.g., a pseudo-inverse or Gauss-Markov estimator).
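As a rough sketch of such a final inference step, a pseudo-inverse estimate on an already full-rank stacked system might look as follows (the stacked routing matrix and flow volumes are invented for illustration; this is not the estimator derived later in the paper):

```python
import numpy as np

# Invented stacked routing matrix: 4 link-count equations (rows)
# over 3 OD flows (columns); its rank equals the number of flows.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0]])

x_true = np.array([4.0, 1.5, 2.5])   # hypothetical OD flow volumes
y = A @ x_true                       # link counts observed via SNMP

# Basic inference step: pseudo-inverse (least-squares) estimate.
x_hat = np.linalg.pinv(A) @ y
assert np.allclose(x_hat, x_true)    # full column rank => exact recovery
```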
In addition to providing an estimate for the covariance, we also prove that the variance will always be identifiable, i.e., that a unique solution exists. In particular, we show that for any general topology with minimum cost routing, and strictly positive link costs, it is always possible to estimate the covariance function without routing changes. Being able to estimate the covariance has significant consequences: (1) it can be used to improve the estimate of the mean; (2) we avoid having to make assumptions about the relationship between the mean and variance; and (3) we can use our variance estimate to define a method for identifying the top largest OD flows. Because 30% of the OD flows constitute 95% of the total traffic matrix load in the network we studied, it can be argued that estimating the largest flows alone would be sufficient for TM estimation. Others have also argued [5, 9] that carriers only care about estimating large flows because they carry almost all the traffic and because the largest errors occur only for small flows.
We are able to identify which flows are largest (i.e. uncover their identity) without estimating their mean rate. If we then modify the TM estimation problem to focus only on the large flows, we can then reduce the number of routing changes needed by methods that try to increase the rank. This is because we have reduced the number of variables to estimate. An important consequence of being able to identify the top flows is that it makes methods based on weight-change algorithms more practical since it helps limit the number of needed routing changes. In this paper we study the trade-off between accuracy of the estimates versus the number of routing changes needed to drive the linear system to full rank.
We will show that with our methods we can drive the average
error rates down into a whole new range. Our method succeeds in
driving the average error rates below the 10% target; and we are
often able to reach 4 or 5% average error rates (depending upon the
scenario). To the best of our knowledge, this is the first paper that
consistently achieves errors below this 10% barrier.
The rest of the paper is organized as follows. In Section 2 we give the formal problem statement for traffic matrix estimation. The essence of the idea of deliberately changing the routes to decrease the "underconstrainedness" of the problem is illustrated in Section 3. In Section 4 we discuss the dynamic nature of Internet traffic and we thus motivate the statistical traffic models we use. Section 5 presents the two methodologies we propose for estimating the traffic matrix, whereas the modeling aspects needed to estimate both the first and second order statistics of a traffic matrix in the context of both stationary and cyclostationary environments are presented in Section 6. In Section 7 we prove that the second order moment can always be estimated for any realistic topology under the assumptions of minimum cost routing and strictly positive link weights. In Section 8 we validate our methodologies using real and pseudo-real data, while Section 9 concludes the paper.
2. PROBLEM STATEMENT
The network traffic demand estimation problem can be formulated as follows. Consider a network represented by a collection of N nodes. Each node represents a set of co-located routers (PoP). A set of L directed links is also given. Each link represents an aggregate of transmission resources between two PoPs. (We assume there are no self directed links, i.e. there are no links of the form (n, n).) We consider a finite time horizon consisting of disjoint measurement intervals, indexed from d = 1 to D. We refer to each measurement interval as a snapshot. In each interval or snapshot we change the link weights, i.e. the paths followed for some OD pairs, and this gives a different image of OD pairs traversing the network. In all the following we assume the routing stays the same within the same snapshot, while two different snapshots are characterized by two different routing scenarios. Within each snapshot, we collect multiple consecutive readings of the link counts using the SNMP protocol at discrete times, called samples, indexed from t = 1 to T. Then, each snapshot lasts for 5T minutes because SNMP reports link counts every 5 minutes. For simplicity we assume that the measurement intervals are of equal time durations, but this can be easily relaxed.
Each node is a source of traffic which is routed through other nodes, ultimately departing the network. Consider an origin-destination (OD) pair p, and let x_p(d, t) be the amount of traffic associated with the OD pair p during measurement interval d at discrete time t. In other words, x_p(d, t) is the amount of traffic originating at the origin node of p that departs the network at the destination node of p during measurement interval d at time t. We assume that the measurement intervals are long enough so we can ignore any traffic stored in the network. Let P denote the total number of OD pairs. We order the OD pairs p and form a column vector X(d, t) whose components are x_p(d, t) in some predefined order.
Let y_l(d, t) be the total volume of traffic which crosses link l at time t of measurement interval d. Order the links l in order to form the column vector Y(d, t) whose components are y_l(d, t). Let a_lp(d) be the fraction of the traffic x_p(d, t) from OD pair p that traverses link l during measurement interval d. Thus, y_l(d, t) = sum_p a_lp(d) x_p(d, t). Forming the L x P routing matrix A(d) with elements a_lp(d), we have in matrix notation

Y(d, t) = A(d) X(d, t),  for all d in {1, ..., D} and t in {1, ..., T}.  (1)

In the literature, the Y(d, t) vector is called the link count vector, while A(d) is called the routing matrix. In IP networks, the routing matrix A(d) during each measurement interval d can be obtained by gathering topological information, as well as OSPF or ISIS link weights. Link counts Y(d, t) are obtained from SNMP data.
A general problem considered in the literature is to compute an estimate of the traffic matrix X(d, t) for each d and t, given the observed link count vectors Y(d, t) for each d and t, assuming that the routing matrix does not change in time, i.e. A(d) = A for all d in {1, ..., D}, and that X(d, t) is a sample of a stationary (vector valued) random process. Furthermore it is generally assumed in prior work that the components of X(d, t) are uncorrelated. The general problem is non-trivial since the rank of A is at most L, and L < P, i.e. for each t the system of equations (1) is underdetermined.
Our goal in this work is to estimate both the first and second order moments of the traffic matrix. To the best of our knowledge, this is the first attempt to try to estimate the second order moment. We will show how having estimates for the TM covariance enables new methodologies for estimating the first order moment used to populate a traffic matrix. In the next section we discuss traffic dynamics and further extend this goal to include such estimation when the input data is gathered over long time scales, i.e., outside the stationary regime.
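The underdeterminedness of the link-count system (1) can be made concrete with a minimal numerical sketch (the three-link, six-OD-pair routing matrix and all volumes are invented for illustration): any element of the null space of A can be added to the true traffic vector without changing the observed link counts.

```python
import numpy as np

# Toy routing matrix A: 3 links (rows) x 6 OD pairs (columns);
# A[l, p] = 1 if OD pair p traverses link l (single-path routing).
A = np.array([[1.0, 1.0, 0.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])

x_true = np.array([3.0, 1.0, 4.0, 2.0, 5.0, 0.5])  # hypothetical OD volumes
y = A @ x_true                                     # observed link counts

print(np.linalg.matrix_rank(A))   # 3 < 6: rank(A) < P, underdetermined

# A different traffic vector explains the same link counts: add any
# element of the null space of A to the true vector.
null_vec = np.array([1.0, -1.0, 1.0, 0.0, 0.0, 0.0])
assert np.allclose(A @ null_vec, 0.0)
assert np.allclose(A @ (x_true + null_vec), y)
```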
3. ROUTE CHANGES CAN HELP
We now explain, via an example, the basic idea of exploiting routing changes to increase the rank of the system. We do so because this is a key component of our overall methodology, and in order to make this paper self-sufficient. For a complete explanation of the method and an algorithm for selecting such weight changes, see [8].
[Figure: a five-node network (nodes A-E) with six weighted unidirectional links, shown under two routing configurations, Snapshot 0 and Snapshot 1.]
Figure 1: Example: Impact of Routing Changes
Consider the network shown in Fig. 1 composed of five nodes interconnected by six unidirectional links. Each link has an associated weight and the traffic from each OD pair is routed along the shortest cost path. For simplicity, we consider only five OD pairs (indicated by arrows). On the left of Fig. 1 we represent the network in its normal state, when no link weight changes have been effected. Snapshot 0 would generate the following system of linear equations.
[Snapshot 0 yields a system of six link-count equations in the five OD flows, Y(0) = A(0) X(0), relating the observed count on each link to the OD flows routed over it.]

The rank of routing matrix A(0) is four. Two of the five OD pairs can be estimated exactly because in this simple example they do not share their links with other OD pairs. On the right of Fig. 1 we show the effect of decreasing the weight of one link from 5 to 3 (snapshot 1). This perturbation in the weights causes the rerouting of one OD pair through a new path. This snapshot generates a new system of linear equations, i.e. a new routing matrix A(1), that can be appended to the previous set. One line of the new routing matrix reflects the rerouted flow, and we can see that adding it to the original system of equations adds a new linearly independent equation into the system. As a consequence, the stacked system [A(0); A(1)] is full rank and all five OD pairs can be estimated exactly.
The challenge in taking advantage of this idea is to determine a minimal set of weight changes so that the impact of such changes on network traffic is as little as possible. Moreover, network administrators may simply not allow certain changes if their impact is too
great (e.g., if a weight change increases the delay beyond allowable
levels). In [8] the authors developed an algorithm for computing a
minimal sequence of weight changes that takes into consideration
a variety of practical administrative constraints.
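The rank-increase effect of appending a snapshot can be sketched numerically (a made-up two-snapshot system, not the topology of Fig. 1):

```python
import numpy as np

# Snapshot 0: 3 link-count equations over 4 OD flows (rank 3, deficient).
A0 = np.array([[1.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 1.0],
               [1.0, 0.0, 1.0, 0.0]])

# Snapshot 1: one weight change reroutes a flow, altering the third row.
A1 = np.array([[1.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 1.0],
               [0.0, 1.0, 1.0, 0.0]])

stacked = np.vstack([A0, A1])
print(np.linalg.matrix_rank(A0), np.linalg.matrix_rank(stacked))  # 3 4

# With the full-rank stacked system (and a traffic matrix assumed
# unchanged across snapshots), least squares recovers all flows exactly.
x_true = np.array([2.0, 3.0, 1.0, 4.0])
y = stacked @ x_true
x_hat, *_ = np.linalg.lstsq(stacked, y, rcond=None)
assert np.allclose(x_hat, x_true)
```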
In this paper, we assume we have a predetermined schedule that identifies a sequence of link weight changes to perform in a given order. This schedule would be the output of an algorithm such as the one given in [8]. The number of links whose weights get changed simultaneously in a single measurement interval d is small, i.e., typically 1, 2 or at most 3 links. When executing this method, two routing matrices A(d-1) and A(d) for consecutive snapshots are usually sufficiently different so that the rank of the new routing matrix [A(d-1); A(d)] is larger than the rank of either matrix A(d-1) or A(d) alone. The schedule of link weight changes should be chosen such that after D measurement intervals, the rank of [A(1); ...; A(D)] is equal to the number of unknown coefficients needed to estimate the TM.
4. TRAFFIC DYNAMICS
In this section we discuss traffic dynamics and what they mean for the traffic matrix estimation problem. We begin by examining Netflow data collected from a commercial Tier-1 backbone. Netflow was run for all the incoming links from gateway routers to backbone routers. The version of Netflow used is called Aggregated Sampled Netflow and deterministically samples 1 out of every 250 packets. Netflow samples fine granularity flows defined by the so-called 5-tuple in IP packet headers. Using local BGP tables and topology information we were able to determine the exit link for each incoming flow. The resulting link-by-link traffic matrix is aggregated to form both a router-to-router and a PoP-to-PoP traffic matrix. We had roughly one month's worth of data at our disposal.
[Figure: six panels of sampled OD traffic rate (MB/s) over 120 hours; OD 60 and OD 63 (large), OD 28 and OD 105 (medium), OD 13 and OD 54 (small).]
Figure 2: Netflow data for representative OD flows: large on the left, medium in the middle and small on the right.
Figure 2 shows the evolution of six OD pairs in time across a week (excluding weekends). Here we show three types of behavior and provide two OD pairs per type as illustrative examples. The two OD pairs on the left (top and bottom) are large and exhibit a regular cyclostationary behavior in time. There is a strong periodicity at intervals of 24 hours that fits the expected diurnal patterns. On the right we plot two OD pairs with extremely low mean rate and see that these flows are characterized by a very noisy shape. These OD pairs have lost any cyclostationary shape and tend to send most of their traffic in small time intervals. The two OD pairs in the middle column are indeed in between these two behaviors: some may retain small diurnal patterns while others are
[Figure: spike indicator (left axis) and sampled rate in MB/s (right axis) for OD pairs sorted by rate, with the set of flows carrying 95% of the traffic marked.]
Figure 3: Spikiness and average traffic rate for each OD pair.
already becoming dominated by noisy behavior. We will use the
terms “large”, “medium” and “small” for these three types of OD
flows; these terms are not defined precisely but are used just to ease
the readability of the presentation.
Other traffic matrix studies have found high errors in small flows and have argued that this is not important because network administrators only care about the large flows [9, 5]. By using our data here, we can shed some light on why small flows pose problems for estimation, and further justify this suggestion.
Small flows are difficult to estimate accurately for multiple reasons. First, they are often very noisy and can have long periods of zero or close-to-zero transmission rates. Second, the small flows can often be two or three orders of magnitude less in volume than large flows. Hence the numbers are often so small that they cause numerical stability problems in estimation methods. Third, the small flows exhibit the largest relative spikiness of all flows. We define spikiness of a flow in terms of its time series. We difference the flow's basic time series by one measurement interval (10 minutes in our case) and divide by the mean to normalize. This helps illustrate how dramatically a flow can change from one measurement interval to the next. To have only one metric for the entire time series, we take the sum of the maximum and the absolute value of the minimum value observed along time. We plot our spikiness indicator in Figure 3. The OD pairs are sorted in order of increasing average rate. We see that this spikiness characteristic decreases as the average rate of a flow increases.
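A sketch of this spikiness indicator (the function name and the two synthetic flows are our own; the paper's data is 10-minute Netflow samples):

```python
import numpy as np

def spikiness(series):
    """Sum of the max and |min| of the mean-normalized one-step difference."""
    diff = np.diff(series) / np.mean(series)
    return diff.max() + abs(diff.min())

rng = np.random.default_rng(1)
t = np.arange(1000)

# Synthetic flows: a smooth large flow with a diurnal cycle, and a
# bursty small flow that often drops to zero.
large = 40 + 5 * np.sin(2 * np.pi * t / 144) + rng.normal(0, 0.5, t.size)
small = np.maximum(rng.normal(0.02, 0.05, t.size), 0)

# Relative spikiness is far higher for the small flow.
assert spikiness(small) > spikiness(large)
```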
The good news is that these difficult small flows constitute an insignificant portion of the total network-wide traffic. In this same Figure 3, the right hand side shows the cumulative contribution of OD flows to the total network traffic. We see that 30% of the OD pairs in the network carry 95% of the total network traffic. This justifies our approach in which we choose to focus only on medium and large flows as the target for accurate TM estimation. Moreover, the flows in this batch do not suffer from relative spikiness problems. We define the cutoff threshold for flows in the medium/large category to be those above 3 MB/s, so that we include enough OD flows to capture 95% of the total network-wide load. Sometimes we use the term "top" OD flows to refer to those in the medium/large category defined by this threshold.
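The selection of such a "top" category by cumulative load can be sketched as follows (the heavy-tailed rates are synthetic; only the 95% target comes from the text):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical heavy-tailed OD flow rates, sorted largest first (MB/s).
rates = np.sort(rng.pareto(1.2, 120) + 0.01)[::-1]

# Smallest prefix of flows whose cumulative share reaches 95% of load.
share = np.cumsum(rates) / rates.sum()
n_top = int(np.searchsorted(share, 0.95) + 1)

top_flows = rates[:n_top]
assert top_flows.sum() / rates.sum() >= 0.95
```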
Figure 2 hints that OD flows contain (at least) two sources of variability, namely diurnal patterns and a noisy fluctuations behavior. These two types of variability could show up at different time scales. We believe these two types of variability are important to be captured explicitly in a model for the following reason.
Consider the implications of the idea of changing link weights to increase the rank of the system. In practice, network operators cannot change link weights in rapid succession. Once the weights have
been changed, the routing protocol needs to compute new routes and then we need to collect multiple samples of SNMP link counts under the new routing. It was shown in [3] that it is typically advantageous to use around 10 consecutive SNMP samples for TM estimation. In that work on time-varying network tomography, it becomes clear that estimation methods need to take advantage of link correlations to make reasonable estimates. Therefore it is likely to be a few hours (at least) between weight change events.
This leaves carriers with two choices as to how to collect the data they need. One way is to collect all the data at the same hour every day, but over many days. The advantage of this method is that the traffic matrix will be stationary, but it will take many days to collect the required data. When the traffic matrix is in the stationary regime, a model is needed to capture the fluctuation behavior. Poisson or Gaussian assumptions have been used for this scenario [1, 3]. In our work, we consider a more general model for the fluctuations that does not require us to make assumptions about whether the process is stationary or cyclostationary.
The second option is for network operators to collect all the SNMP samples from all the snapshots within a few hours; but then the traffic matrix is likely to be non-stationary. It is known that link fluctuations are not stationary over multi-hour periods. Our data indicates that the traffic matrix is at least cyclostationary due to the large diurnal swings.
We leave the decision as to when to collect all the needed measurements up to carriers, as they will have other constraints regarding the management of the network that will impact this decision. In order to enable any choice they might make, and to develop a general model, we incorporate both cyclostationary behavior and noise fluctuations into our model and estimation procedure.
We can further motivate the need to explicitly model these two
sources of variability with the following observation. In Figure 2
we observed these two behaviors across different flows. In fact,
in the flows we care about (medium and large flows) both of these
sources of variability often appear within each flow. To see this,
consider the two sample OD flows plotted in the top portion of
Figure 4. The real OD pairs are plotted with continuous lines. We
used five basis functions of a Fourier series to filter out the diurnal
patterns, represented by dotted lines. This represents the first
component of an OD flow. The signal that remains after the diurnal
pattern is filtered out is shown in the bottom plots. We call this
latter component of an OD flow the fluctuations process.
We make a few observations on these two components that strongly
impact our choice of models. An OD flow should contain two distinct
components, one for the diurnal trend and one for the fluctuations
process. The diurnal trend can be viewed as deterministic
and cyclo-stationary, while the fluctuations process can be viewed
as a stationary zero-mean random process. The stationarity of the
fluctuations process is evidenced in the figure by a fairly consistent
absence of any cyclic trends over long time scales (hours and days).
We said that one of our goals is to estimate the variance of the traffic
matrix; in fact, what we will be estimating is the variance of this
fluctuations process.
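This two-component decomposition can be sketched numerically: project the series onto a handful of Fourier basis functions to obtain the diurnal trend, and keep the residual as the fluctuation process. The snippet below is our own illustration on synthetic data, not the authors' code; only the five-basis-function choice follows the text.

```python
import numpy as np

def decompose(x, period, n_basis=5):
    """Least-squares projection of a traffic series onto the first n_basis
    Fourier basis functions (diurnal trend); the residual is the fluctuation."""
    s = np.arange(len(x))
    cols = [np.ones(len(x))]                  # constant component
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sin(2 * np.pi * k * s / period))
        cols.append(np.cos(2 * np.pi * k * s / period))
        k += 1
    B = np.column_stack(cols[:n_basis])
    coef, *_ = np.linalg.lstsq(B, x, rcond=None)
    trend = B @ coef
    return trend, x - trend

# synthetic OD flow: diurnal cycle plus zero-mean noise
rng = np.random.default_rng(0)
t = np.arange(288 * 5)                        # five days of 5-minute samples
flow = 40 + 10 * np.sin(2 * np.pi * t / 288) + rng.normal(0, 2, t.size)
trend, fluct = decompose(flow, period=288, n_basis=5)
print(abs(fluct.mean()) < 1e-6)               # residual fluctuation is zero mean
```

Because the constant basis function is included in the projection, the residual is zero mean by construction, matching the model of the fluctuations process above.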
In the literature so far, it has been common to assume a fixed,
known relationship between the mean and standard deviation of an
OD flow. The most common assumption is the existence of
a power-law relationship between the two parameters. Such
assumptions are needed to make the estimation problem more
tractable. Figure 5 shows the relationship between the mean and
standard deviation for our OD pair data, sorted by mean (from
the smallest to the largest one). The points on the plot do not
approximate a straight line, as the assumed power law would entail.
Note that this is a log-log plot, hence the deviations from the
straight line can be quite large. The best linear
Figure 4: On the top is shown an example of two real large OD
pairs (dotted lines) and their diurnal patterns (continuous lines),
OD pair 28 and OD pair 60 (sampled OD traffic in MB/s). On the
bottom is shown an example of the fluctuation process for the
two OD pairs, obtained by removing the diurnal trends from the
original signals.
fitting of the data we present in the figure corresponds to a power
law, but when the power-law coefficient is computed for individual
OD pairs, we find a large range spanning from 1 to 4. Existing
methods require that a single value for this variance coefficient
applies to all flows. Clearly this is not the case. Also, different
researchers using different data sets have computed different values
for this coefficient [3, 4]. Thus one cannot generally assume a
particular value, nor a single value. It is unclear what the impact
on errors is in methods that rely on this assumption.
In our work, we need not make any such assumption. Instead,
we remove the power-law assumption and choose to estimate the
variance directly, independently of the mean. What this plot does
confirm is the hypothesis that OD flows with large variance are also
the ones with large mean. Hence there is an implication that the
order of magnitude of the standard deviation is closely related to
the order of magnitude of the mean for an OD flow. We will make
use of this observation to help identify the top flows.
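For illustration, the usual power-law check fits a straight line in log-log space between per-flow means and standard deviations. The data below is synthetic, generated with a known exponent; the point made in the text is precisely that real per-flow coefficients scatter too widely for a single fitted value to be trusted.

```python
import numpy as np

rng = np.random.default_rng(1)
means = np.logspace(-2, 1, 50)                   # 50 synthetic flows (MB/s)
# std = c * mean^0.8 with multiplicative noise, mimicking scatter on a log-log plot
stds = 0.3 * means ** 0.8 * rng.lognormal(0.0, 0.3, means.size)
# single power-law exponent fitted across all flows, in log-log space
slope, intercept = np.polyfit(np.log(means), np.log(stds), 1)
print(0.7 < slope < 0.9)                          # recovers the planted exponent
```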
5. METHODOLOGY
Before describing the details of our models and estimation procedures,
we give here an overall summary of the proposed methods.
We do so here because our methods involve combining a number
of steps that are quite different from previous solutions.
In this paper we essentially propose two new methods. Method
1 has five critical components: (1) collect all the relevant SNMP
data and estimate the variance of OD flows; to do this we provide
a closed-form solution for estimating the variance of OD flows;
(2) use an algorithm for selecting a minimal set of weight changes
(snapshots) to obtain a full rank linear system; (3) apply the route
changes and collect all the relevant SNMP data; (4) incorporate
models into the estimation procedure to capture both the diurnal
trends and the fluctuations process; (5) given this new model with
a high rank system, apply basic inference techniques (e.g.,
pseudo-inverse or Gauss-Markov estimators). The algorithm for
finding appropriate snapshots is not new. The contributions here are
Figure 5: Standard deviation of traffic fluctuations vs the mean
traffic volumes for all the OD pairs (in MBytes/s), on a log-log
scale; the annotation marks the flows carrying 95% of the traffic.
in the modeling and estimation elements of steps (1) and (4), and in
the validation of the composite method against real data (something
that has not been done until now). We use Method 1 to estimate all
of the OD pairs in the traffic matrix.
Note that step (1) can be omitted if a pseudo-inverse method is
used, since it does not require any knowledge about traffic fluctuation
statistics.
A key component in Method 2 is to identify the top largest OD
flows (in advance of estimating their average volume). As we
observed in the last section, a weak relation exists between the order
of magnitude of the standard deviation and the order of magnitude
of the mean. By selecting the flows with the largest variance, we
can be reasonably assured that we have identified the flows that are
largest in mean. We then set the estimates for the small flows to
zero and consider them known. We thus have fewer variables
to estimate, as only the large flows remain. Having a system with
fewer unknowns, we can run our algorithm for finding the key link
weight changes to increase the rank of the system. We expect to
need fewer snapshots now since there are fewer variables to estimate.
Method 2 can be summarized by the following steps: (1) collect
all the relevant SNMP data and estimate the variance of OD flows
using the same model mentioned above; (2) use a threshold policy
(defined later) and select all OD flows whose variance is above the
threshold; call this set of OD pairs the top set; (3) evaluate the route
changes needed to make the sub-problem (seeking only the OD pairs
in the top set) full rank; (4) apply the route changes and collect all
the relevant SNMP data; (5) utilize the new models for diurnal and
noise behavior; (6) estimate both the diurnal trends and the
fluctuations process for the OD pairs in the top set using basic
inference methods. In essence, Method 2 uses one extra step to
identify the top set. The last four steps of Method 2 differ from
those in Method 1 only in that they are applied to a subset of the
original system.
We evaluate these two methods in this paper. We point out that in
our derived methodology (next section) we also provide one more
component that is included in the mathematical development to
establish a complete methodology; however, it is not evaluated here
due to lack of space. This last step is a closed-form solution for
evaluating the goodness of the variance estimate.
By comparing these two methods we can evaluate an important
tradeoff. On the one hand, in Method 1 we hope to have very good
accuracy since we have upgraded the system to full rank. On the
other hand, network operators would prefer to make the smallest
number of link weight changes necessary. In Method 2 fewer
snapshots are needed, but potentially at the expense of accuracy of
the estimates. Because the small flows are set to zero, their bandwidth
will be redistributed to other flows. A comparison of these two
methods should illustrate how much additional error we might incur
by greatly reducing the number of snapshots needed.
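Step (2) of Method 2 can be sketched as follows. The cumulative-share rule used here is our own illustrative stand-in: the paper's actual threshold policy is defined later, and the use of variance as a proxy for mean follows the observation in the previous section.

```python
import numpy as np

def select_top_flows(variances, target_fraction=0.95):
    """Return indices of the largest-variance flows whose cumulative variance
    share reaches target_fraction (variance as a proxy for mean size)."""
    order = np.argsort(variances)[::-1]             # sort flows, largest first
    shares = np.cumsum(variances[order]) / variances.sum()
    cutoff = np.searchsorted(shares, target_fraction) + 1
    return order[:cutoff]

var = np.array([9.0, 0.1, 4.0, 0.05, 1.0, 0.2])     # estimated per-flow variances
top = select_top_flows(var, 0.95)
print(sorted(top.tolist()))   # → [0, 2, 4]
```

The remaining flows would then be set to zero and treated as known, shrinking the system that the snapshot-selection algorithm must make full rank.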
6. MODELS AND ESTIMATES
Before tackling the modeling aspects of the problem, we remind
the reader that we are interested in extracting two main components
for each OD pair: the diurnal pattern and the fluctuation
process. The diurnal pattern captures the evolution in time of
the mean, while the fluctuation process captures the noisy behavior
around the mean. In this context, estimating the mean corresponds
to extracting the main trend in time of the OD pair, while estimating
the variance reduces to estimating the variance of the fluctuation
process.
6.1 Stationary Traffic Scenario
In this section we assume that {x_j(t)} is a realization of a segment
of a stationary discrete-time random process. In particular, we
model each OD pair as follows:

    x_j(t) = x̄ + w_j(t),   j ∈ {1, …, K} and t ∈ {1, …, N_s},   (2)

where x̄ is a deterministic column vector representing the mean of
the OD pairs, while the w_j(t) are zero-mean column vectors, i.e.
E[w_j(t)] = 0, representing the "traffic fluctuation" of the OD
pairs at discrete time t of measurement interval j. We denote by
x_{j,n}(t) the n-th component of x_j(t).

In practice, for each measurement interval j there may be "missing"
link measurements, due to collection errors, human errors, etc.
In addition, depending on the network topology and the routing in
measurement interval j, some of the equations in (1) may be
redundant (linearly dependent). For simplicity of notation, we assume
that all the equations are kept in the system. Then we have:

    y_j(t) = A_j x_j(t),   j ∈ {1, …, K} and t ∈ {1, …, N_s},   (3)

in measurement interval j. By definition, rank(A_j) ≤ L; typically
rank(A_j) < N, so it is not possible to infer the value of x_j(t)
from y_j(t), even if there are no missing link measurements. As we
mentioned before, we assume that the routing can change during
different measurement intervals, so it is possible that A_j ≠ A_{j'}
if j ≠ j'.

Define the following matrices, using block matrix notation:
    Y = [ y_1(1)^T, …, y_1(N_s)^T, y_2(1)^T, …, y_K(N_s)^T ]^T,

    A = [ A_1^T, …, A_1^T, A_2^T, …, A_K^T ]^T,

    W = [ w_1(1)^T, …, w_1(N_s)^T, w_2(1)^T, …, w_K(N_s)^T ]^T,

    D = diag( A_1, …, A_1, A_2, …, A_K ),

where D is block diagonal.
Note that in the new A matrix, each A_j block is repeated
N_s times, since the routing is assumed to stay the same within
the same measurement interval j. In terms of dimensions, note
that Y is a (K·N_s·L)-dimensional column vector, A is a
(K·N_s·L) × N-dimensional matrix, W is a (K·N_s·N)-dimensional
column vector, and D is a (K·N_s·L) × (K·N_s·N)-dimensional
matrix. Putting (3) into matrix form, using (2), we obtain

    Y = A x̄ + D W.   (4)
Our aim is to estimate x̄ from the observations Y. We can use this
estimate of x̄ as an estimate of x_j(t) for future time intervals. We
shall denote our estimate of x̄ by x̂. Throughout the paper, we shall
assume that A is of full rank.

One possible approach for synthesizing our estimate x̂ is to
minimize the Euclidean norm of Y − A x̄, i.e.

    x̂ = argmin_{x̄} ‖ Y − A x̄ ‖.   (5)
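A toy numerical sketch of the stacked system (4) and the least-squares estimate (5): two routing snapshots that are individually rank deficient become full rank when stacked, so the mean OD vector is recovered by plain least squares. The topology and numbers below are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)
x_bar = np.array([5.0, 3.0, 2.0])             # true mean OD vector (N = 3)
# two routing snapshots over L = 2 links; neither has rank 3 alone,
# but stacked together they do
A1 = np.array([[1.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
A2 = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 1.0]])
Ns = 200                                       # samples per snapshot
Y, A_rows = [], []
for Aj in (A1, A2):
    for _ in range(Ns):
        w = rng.normal(0, 0.5, 3)             # zero-mean OD fluctuations
        Y.append(Aj @ (x_bar + w))            # observed link counts
        A_rows.append(Aj)
Y = np.concatenate(Y)
A = np.vstack(A_rows)
x_hat, *_ = np.linalg.lstsq(A, Y, rcond=None)  # pseudo-inverse estimate, cf. (5)
print(np.round(x_hat, 1))
```

Averaging over many samples per snapshot is what suppresses the fluctuation term, so the least-squares solution converges to the true mean.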
6.2 Estimating the Traffic Matrix Given the Covariance of Traffic Fluctuations
In this section we show how simple and well-known inference
techniques, i.e. pseudo-inverse and Gauss-Markov estimators, can
be applied to solve the optimization problem shown in Equation (5)
by using the full rank property of our approach.
To this end, let Q be the covariance matrix of W, i.e.

    Q = E[ W W^T ].   (6)

Let us denote by s the discrete time across all the measurement
intervals, from s = 1 to K·N_s, which indexes the total number of
samples collected across the whole experiment. Note that there
exists a one-to-one relationship between the discrete time s and the
two temporal indexes j and t, i.e. s = (j − 1)·N_s + t, where
j = ⌈s / N_s⌉ and t = s − (j − 1)·N_s. Then the covariance matrix Q
can be written as:
    Q = [ Σ(0)            Σ(1)     …   Σ(K·N_s − 1) ]
        [ Σ(1)^T          Σ(0)     …        ⋮        ]
        [   ⋮               ⋱                ⋮        ]
        [ Σ(K·N_s − 1)^T    …           Σ(0)         ]   (7)
where Σ(k) is the N × N matrix defined by:

    Σ(k) = diag( E[ w_n(s) w_n(s + k) ] ),  n = 1, …, N,   (8)

which is diagonal because different OD pairs are assumed to be
independent. With this definition, the covariance matrix of D W is
equal to D Q D^T. A basic issue is that Q is unknown. For the time
being, however, let us assume that Q is known. With this assumption,
we now consider the problem of estimating the traffic matrix x̄.
The best linear estimate x̂ of x̄ given Y, in the sense of minimizing
E[ (x̂ − x̄)^T (x̂ − x̄) ], is known as the best linear Minimum Mean
Square Error (MMSE) estimator. The best linear MMSE estimate
of x̄ can be obtained from the Gauss-Markov Theorem [10], and is
stated below.

PROPOSITION 1. The best linear MMSE estimator x̂ of x̄ given Y is

    x̂ = ( A^T (D Q D^T)^{-1} A )^{-1} A^T (D Q D^T)^{-1} Y.   (9)

Note that the estimate x̂ in (9) reduces to the pseudo-inverse
estimate when D Q D^T is proportional to the identity matrix. If W
has a Gaussian distribution, it can be verified that the estimate in
(9) is in fact the maximum likelihood estimate of x̄.
COROLLARY 1. Regardless of whether or not W is Gaussian,
the estimate in Proposition 1 is unbiased, i.e. E[x̂] = x̄, and
furthermore we have

    x̂ = x̄ + ( A^T (D Q D^T)^{-1} A )^{-1} A^T (D Q D^T)^{-1} D W,   (10)

and

    E[ (x̂ − x̄)(x̂ − x̄)^T ] = ( A^T (D Q D^T)^{-1} A )^{-1}.   (11)

Note that (11) allows us to estimate the accuracy of our estimate
of x̄ given A and Q. In particular, the n-th diagonal element of the
matrix E[ (x̂ − x̄)(x̂ − x̄)^T ] is the mean square error of our
estimate of the n-th element of the traffic matrix.
An advantage of using the pseudo-inverse estimate above is that
it does not require knowledge about the statistics of W. On the
other hand, knowledge of the statistics of W can be used to obtain
an improved estimate of x̄. In particular, from (4), it is evident that
the components of D W may have different variances. For example, if
the variance of a component of D W is small, the square of the
difference between that component and the corresponding component of
A x̄ should be heavily weighted in the determination of an estimate
of x̄.
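Proposition 1 and Corollary 1 can be checked empirically on a small synthetic system: with a known observation-noise covariance, the Gauss-Markov estimator is unbiased and its empirical error covariance matches (11). The matrices below are illustrative stand-ins; C plays the role of D Q D^T.

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.vstack([np.eye(3), [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]])  # 5 obs, 3 unknowns
x_bar = np.array([4.0, 2.0, 1.0])
C = np.diag([0.1, 0.1, 0.1, 4.0, 4.0])       # noise covariance (stands in for D Q D^T)
Ci = np.linalg.inv(C)
P = np.linalg.inv(A.T @ Ci @ A)              # predicted error covariance, cf. (11)
est = np.empty((2000, 3))
for i in range(2000):
    y = A @ x_bar + rng.multivariate_normal(np.zeros(5), C)
    est[i] = P @ A.T @ Ci @ y                # Gauss-Markov estimate, cf. (9)
print(np.allclose(est.mean(axis=0), x_bar, atol=0.05))   # unbiased, cf. Corollary 1
```

Note how the noisy observations (variance 4.0) are automatically down-weighted relative to the precise ones, which is exactly the advantage over the plain pseudo-inverse discussed above.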
6.3 Estimating the Covariance of Traffic Fluctuations
Next, we are interested in estimating the variability of traffic
between OD pairs over time, so that a better estimate of x̄ can be
obtained using the Gauss-Markov method (Equation (9)) and confidence
intervals for our estimate can be defined (Equation (11)). Moreover,
we have shown previously in Section 4 how flows with large variance
are also the ones with large mean. Having a model that allows
us to estimate the variance of each OD pair can help the identification
of the top largest flows carrying the most significant fraction
of network traffic. To this end, we will consider the problem of
estimating the covariance function of the fluctuation process w(s).
In doing so, we assume that the fluctuations of different OD pairs
can be modeled as independent random processes.
We shall present a method for obtaining such an estimate. We
highlight two nice characteristics of our model. First, it does not
require any knowledge of the first order moment statistics. Previous
approaches assumed exact knowledge of the mean to recover the
covariance, and vice versa. As a consequence, the new model no
longer suffers from the potential error propagation problems
introduced by the first order moment estimation. Second, the model
does not require any routing configuration change. It uses only a
large number of measurements of the link counts under the same
routing configuration. We prove in Section 7 that such a result
generally holds for any topology under realistic assumptions of
minimum cost routing and strictly positive link costs.
Let us consider a generic discrete time s, as presented in
Section 6.2. We can define the link correlation matrix as

    S(k) = E[ (y(s) − E[y(s)]) (y(s + k) − E[y(s + k)])^T ].

For two links l and m, each entry of this matrix can be defined as
follows:

    S_{lm}(k) = E[ y_l(s) y_m(s + k) ] − E[ y_l(s) ] E[ y_m(s + k) ]
              = Σ_n a_{l,n} a_{m,n} ( E[ x_n(s) x_n(s + k) ] − E[ x_n(s) ] E[ x_n(s + k) ] )
              = Σ_n a_{l,n} a_{m,n} E[ w_n(s) w_n(s + k) ].

Note that in the previous statement we assumed that each OD
pair is independent. By using a matrix notation, we can write:

    S(k) = A E[ (x(s) − E[x(s)]) (x(s + k) − E[x(s + k)])^T ] A^T = A Σ(k) A^T.   (12)
Then, we can estimate Σ(k) as follows:

    Σ̂(k) = argmin_{Σ(k)} ‖ Ŝ(k) − A Σ(k) A^T ‖²,

where Ŝ(k) denotes the sample estimate of S(k).
Finally, we notice that we can rewrite equation (12) as
z(k) = B σ(k), where z(k) is S(k) ordered as an L²-dimensional
vector, B is an L² × N matrix whose rows are the component-wise
products of each possible pair of rows from A, and σ(k) is an
N-dimensional vector whose elements are the σ_n(k) = E[ w_n(s) w_n(s + k) ].
As a consequence, an estimate of σ(k) can be obtained using
the pseudo-inverse matrix approach:

    σ̂(k) = ( B^T B )^{-1} B^T ẑ(k),   (13)

where ẑ(k) is the sample estimate of z(k).
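Equations (12) and (13) in miniature: form B from the component-wise products of all ordered pairs of rows of A, then recover the per-OD fluctuation variances (the k = 0 lag) from link covariances collected under a single, unchanged routing. The 3-link, 3-OD example is ours.

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])               # L = 3 links, N = 3 OD pairs
sigma2 = np.array([4.0, 1.0, 0.25])           # true fluctuation variances
X = rng.normal(0, np.sqrt(sigma2), (100000, 3))  # independent OD fluctuations
Y = X @ A.T                                   # link counts under one routing only
S = np.cov(Y.T)                               # sample link covariance, S(0)
# rows of B: component-wise products of every ordered pair of rows of A
B = np.vstack([A[l] * A[m] for l in range(3) for m in range(3)])
z_vec = S.reshape(-1)                         # S(0) ordered as an L^2 vector
sigma2_hat, *_ = np.linalg.lstsq(B, z_vec, rcond=None)   # cf. (13)
print(np.round(sigma2_hat, 2))
```

No routing change is needed here: the L² link-pair correlations give enough equations to pin down the N per-flow variances, provided B has rank N (the subject of Section 7).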
We conclude this section with some considerations on the accuracy
of the estimate σ̂(k). After some computations it results:

    E[ ( σ̂(k) − σ(k) )^T ( σ̂(k) − σ(k) ) ] ≤ ‖Γ‖² Σ_{l,m} E[ ( ẑ_{lm}(k) − z_{lm}(k) )² ],

being Γ = ( B^T B )^{-1} B^T and ẑ_{lm}(k) the sample estimate of the
entry z_{lm}(k); the right-hand side involves up to fourth-order
moments of the link counts.
PROOF. Let us consider the case k = 0; this simplifies the
notation (the other cases are similar). Let ẑ(0) be the
L²-dimensional vector whose components are the sample estimates of
E[ y_l(s) y_m(s) ] − E[ y_l(s) ] E[ y_m(s) ]. After some easy
computation we can write:

    σ̂(0) = ( B^T B )^{-1} B^T ẑ(0),   (14)

but since σ(0) = ( B^T B )^{-1} B^T z(0), it results:

    σ̂(0) − σ(0) = ( B^T B )^{-1} B^T ( ẑ(0) − z(0) ),   (15)

from which the assertion easily follows, just remembering that,
given any pair of real-valued vectors u and v and a real-valued
matrix F, |u^T F^T F v| ≤ λ_max ‖u‖ ‖v‖, where λ_max is the maximum
(in modulus) eigenvalue of the symmetric matrix F^T F. This property
can be easily checked since F^T F is a symmetric matrix and thus
self-adjoint, which implies that a complete system of orthogonal
normalized eigenvectors of F^T F exists.
We notice that Equation (15) can be used to relate the confidence
of the estimate σ̂(k) to the fourth-order moments of the link-count
statistics, which can be evaluated through standard statistical
techniques.
6.4 Cyclo-Stationary Traffic Scenario
Next, we consider a cyclo-stationary model for traffic matrices
and the associated estimation problem. As before, let a_{l,n}(j) be
the fraction of the traffic x_n(j, t) from OD pair n that traverses
link l during measurement interval j. For simplicity of notation we
hide the snapshot information by using the discrete time s defined
in Section 6.2. Thus, (1) and (3) hold as before. Rather than assume
that x(s) is a random process with constant mean, we consider a
model where x(s) is cyclo-stationary. Specifically, instead of (2),
we assume that

    x(s) = x̄(s) + w(s),   s ∈ {1, …, K·N_s}.   (16)

We assume that x(s) is cyclo-stationary with period T, in the
sense that x(s) and x(s + T) have the same marginal distribution.
More specifically, we assume that w(s) is zero mean and that x̄(s)
is a deterministic (vector-valued) sequence, periodic with period T.
In Section 4 we have shown that the "fluctuation" vectors w(s) can
be assumed to be stationary at the 5-minute time scale. Since this
claim does not hold at smaller time scales, next we introduce a more
general model based on the assumption that the "fluctuation" vector
w(s) is cyclo-stationary to second order, i.e. the covariance matrices
Σ^(s)(k) = E[ w(s) w(s + k)^T ] are such that Σ^(s + T)(k) = Σ^(s)(k)
for all s and k. Note that if w(s) is stationary to second order, the
model is still valid, but the covariance matrices are such that
Σ^(s)(k) = Σ(k) for all s.

Our aim is to estimate x̄(s) for all s (the "traffic matrix") given
the observations y(s), s = 1, …, K·N_s. We shall assume that x̄(s)
can be represented as the weighted sum of N_b given basis functions,
i.e.

    x̄(s) = Σ_{i=1}^{N_b} θ_i b_i(s),   (17)
where for each i, θ_i is an N × 1 vector (of "coefficients"), and
b_i(s) is a scalar "basis" function that is periodic with period T. In
particular, we will consider a Fourier expansion where

    b_i(s) = cos( 2π ⌊i/2⌋ s / T ), if i is odd,
    b_i(s) = sin( 2π (i/2) s / T ), if i is even,   (18)

so that b_1(s) = 1 is the constant component.
Substituting (17) into (16), and then into (3), we obtain

    y(s) = A(s) x(s) = A(s) ( Σ_{i=1}^{N_b} θ_i b_i(s) + w(s) ) = Ã(s) θ + A(s) w(s),

where we define the (N_b·N) × 1 vector θ according to

    θ = [ θ_1^T, θ_2^T, …, θ_{N_b}^T ]^T,   (19)
and Ã(s) is the L × (N_b·N) matrix defined as

    Ã(s) = [ b_1(s) A(s), b_2(s) A(s), …, b_{N_b}(s) A(s) ].   (20)
Next we redefine the matrix A to be of dimension
(K·N_s·L) × (N_b·N), as follows:

    A = [ Ã(1)^T, Ã(2)^T, …, Ã(K·N_s)^T ]^T.   (21)
The matrices D and W are defined as before. With this notation
we have an equation similar to equation (4), namely

    Y = A θ + D W.   (22)

Thus, we can use essentially the same method as before to estimate
θ, and hence estimate x̄(s) for each time instant s (see Equation (17)).
The same approaches proposed in Section 6.3 can be applied to
the problem described by Equation (22) to estimate the covariance
matrices Σ^(s)(k).
7. IDENTIFIABILITY OF SECOND ORDER MOMENT
We want to prove that it is always possible to estimate the covariance
function, without requiring any routing configuration changes,
for any topology.

THEOREM 2. For a general connected topology, the rank of B
is N under any minimum cost routing in which link costs are strictly
positive.
PROOF. We will prove the theorem by contradiction. Suppose
the rank of B to be smaller than N; then there exists a non-null
vector v that is mapped through B into the null vector (i.e., B v = 0).
Let N₁ be the number of non-null components of v, and let V₁
be the vectorial space of dimension N₁ comprising all the vectors
which have null components in correspondence to the null
components of v.

We consider the correspondence h which maps any vector u ∈ V₁
into a vector u' = h(u) by discarding the null components of u. Let
B₁ be the matrix obtained from B by discarding the columns that in
the multiplication with v would give no contribution (since they are
multiplied by null elements of v). By construction, B₁ h(u) = B u
for any u ∈ V₁. Finally, let v' = h(v) be the vector which corresponds
to v. We notice that every component of v' is non-null.

We will show that necessarily B₁ v' ≠ 0, thus leading to a
contradiction with the previous assumptions, since B₁ has a row which
necessarily contains one and only one non-null element.

To prove that a row of B₁ contains one (and only one) non-null
element, let us consider the set O₁ of OD pairs which correspond
to the elements of v', and compute the path cost (sum of the link
weights) for every OD pair in O₁. Let o_max be an OD pair in O₁
which corresponds to a maximum path cost. Consider the first and
the last link (respectively l and m) spanned by the o_max path. We
claim that the set of OD pairs in O₁ that pass through both links
necessarily comprises only o_max. As a consequence, the row of B₁
which corresponds to the considered pair of links must contain only
one element different from 0.

To complete the proof we only have to show that links l and m
are both crossed only by o_max: i) by construction, o_max crosses
both l and m; ii) no other OD pair in O₁ can cross both l and m.
Indeed, assume o' ∈ O₁ crosses both links l and m; then there are
only two possibilities: 1) l is the first and m is the last link of the
o' path; in this case, however, both the origin and the destination of
o' would result coincident with the origin and the destination of
o_max, then contradicting the fact that o' ≠ o_max; 2) either l is not
the first link or m is not the last link of the path spanned by o'; in
this case, however, the path cost of o' would result larger than the
path cost of o_max (since the sub-path of o' from l to m necessarily
has the same cost as the whole o_max path), thus contradicting the
fact that the path cost of o_max is maximum.
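Theorem 2 can be spot-checked numerically. On a star network, where min-cost paths under strictly positive weights are unique, the matrix B built from component-wise products of row pairs of A reaches full column rank N even though A itself cannot (L < N). The topology is our toy example.

```python
import numpy as np

# star network: center 0, leaves 1..3; every link cost is 1 (strictly positive),
# so each min-cost path is unique: leaf -> 0 -> leaf
nodes = [1, 2, 3]
links = [(u, 0) for u in nodes] + [(0, u) for u in nodes]           # L = 6 links
od_pairs = [(i, j) for i in [0] + nodes for j in [0] + nodes if i != j]  # N = 12

def path(i, j):
    """Unique min-cost path in the star: i -> 0 -> j."""
    hops = []
    if i != 0:
        hops.append((i, 0))
    if j != 0:
        hops.append((0, j))
    return hops

A = np.zeros((len(links), len(od_pairs)))
for col, (i, j) in enumerate(od_pairs):
    for hop in path(i, j):
        A[links.index(hop), col] = 1.0
# rows of B: component-wise products of every ordered pair of rows of A
B = np.vstack([A[l] * A[m] for l in range(len(links)) for m in range(len(links))])
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(B))   # → 6 12
```

Here each leaf-to-leaf OD pair is the only flow crossing its particular (first link, last link) pair, which is exactly the mechanism the proof exploits.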
8. RESULTS
We now evaluate our two methods using the Netflow data from
our commercial Tier-1 backbone. We focus on the scenario in
which operators want to estimate the TM within hours (as opposed
to using measurements from the same hour each day over many
days); in other words, this corresponds to the cyclo-stationary
environment. In Section 8.1 we validate our model for the mean
estimation, i.e. the diurnal pattern of a PoP-to-PoP traffic matrix.
We show that Method 1 can push error rates down into a whole
new range. In Section 8.2 we show how the second order moment
model in Method 2 helps the identification of the top largest flows
carrying the largest amount of network traffic. As a consequence
of being able to do this, we can dramatically reduce the number of
Figure 6: Method 1. Mean Estimation. Real OD flow (dotted
lines), de-noised OD flow (continuous lines), and estimated OD
flow (dashed lines) for OD pairs 56, 28, 63, 39, 105, and 60 (in
MB/s).
routing changes needed to obtain a full rank system. This in turn
is the key to driving error rates down consistently. We illustrate the
tradeoff in terms of error rates versus number of routing changes
used. For both methods, we used a pseudo-inverse estimator as our
basic inference method in the last step. We used this, rather than
the Gauss-Markov estimator, because (as will be explained later)
we did not have enough months of netflow data to do a proper
comparison using Gauss-Markov estimators.
8.1 Method 1: mean estimation
In this section we assess the accuracy of the mean estimation
when the traffic matrix is computed using Method 1. We remind
the reader that in this section our goal is to estimate all of the OD
pairs. For component (2) of this method, we used the heuristic
presented in [8] to generate a sequence of snapshots (a schedule
of link weight changes) in order to obtain a full rank A matrix.
For the network scenario considered, the algorithm determined that
24 snapshots were needed to identify all the OD pairs. Of
these 24 snapshots, 22 involve only one link weight change
at a time (i.e. single weight change), while the last two involve two
simultaneous link weight changes (i.e. double weight changes).
Before providing summary statistics for all the OD pairs, we give
results, for illustrative purposes, from six particular OD pairs. The
six pairs in Figure 6 all come from the large and medium flow
categories. For each flow, these graphs show the temporal shapes
of the real, de-noised, and estimated OD pair. The de-noised OD
pair refers to an OD pair with everything filtered out except the
first 5 basis functions of the Fourier series; put alternatively, this
illustrates how well a simple Fourier model captures the changing mean
behavior. We see that our model fits the large and medium flows
extremely well. It is interesting that the quality of the estimation
obtained decreases as the average rate drops. The two on the right
do suffer from larger errors. Note that these two OD flows (#28
and #105) were the worst performing OD flow estimates from
within the medium and large category. Our method exhibits the
same type of performance behavior as other methods in that it
estimates large flows well and has difficulty as the flows get smaller
and smaller.
The gain of our method comes in terms of the actual error rates
achieved for these top flows. To examine estimation errors in
general, we use the following two metrics. First, we examine the
difference between the first component of the Fourier series, i.e. the
Figure 7: Method 1. Relative error of the first (constant)
component of the Fourier series for top flows, with N_s = 100;
the dashed line marks 10% relative error.
Figure 8: Method 2. Cumulative distribution of the relative
L2-norm error for top flows as a function of N_b and N_s (curves
for N_b = 3, N_s = 10; N_b = 3, N_s = 100; N_b = 5, N_s = 10;
N_b = 5, N_s = 100).
continuous component, of the netflow data and the estimation
provided by our models. Since this part of our estimate would be used
to populate a traffic matrix, we compute the relative error of this
estimate. Second, since we have a temporal process, we need to
examine the errors in estimation over time. We thus use the difference
in energy between the estimate x̂_n(s) and the real data x_n(s).
In particular, we use the relative L2-norm, ‖ x̂_n − x_n ‖ / ‖ x_n ‖,
to estimate the goodness of the model fitting.
Figure 7 shows our first metric on the difference of the first
component of the Fourier series for the real data (x-axis) and the
estimated OD pairs (y-axis) for medium and large flows when
N_s = 100 samples per snapshot are used. The average error is 3.8%.
This is a large improvement upon other methods, whose average
errors typically lie somewhere between 11% and 23%. Some carriers
have indicated that they would not use traffic matrices for traffic
engineering unless the inference methods could drive the average
errors below the 10% barrier. We believe that this is the first study
to achieve this. (It is hard to compare numbers exactly because
different studies use different amounts of total load. However, we
capture more total traffic than most other studies, which typically
include 75% or 80% of the network-wide load.)
Among these flows, the worst case relative error across all OD
pairs is 28%. If we look carefully at the figure, we can see that all
the flows have less than 10% relative error, except for four outliers.
These outliers correspond to OD pairs whose average rate is less
than 5 MB/s, i.e. among the smallest OD flows within the "top" set
represented here. For 90% of these top flows (i.e., excluding these
four outliers), the average error drops below 2% and the worst case
relative error drops to 4.6%. (See the first column in Table 1 for 24
snapshots.) We point out that small OD pairs (those whose rate is
under 3 MB/s) do not appear in the figure but are considered in the
system and are thus a part of the overall computation.
Figure 8 shows the L2-norm relative error cumulatively. In most
of our calculations the number of basis functions used was N_b = 5
and the number of samples per snapshot used was N_s = 100. This
figure shows that the worst case L2-norm relative error was 56%,
corresponding to OD pair #28 (the second worst case is 44%,
corresponding to OD pair #105). We see that 85% of the flows had an
L2-norm error of less than 10% (see OD pairs #56 and #39 as an
example). Figure 6 helps us to understand the L2-norm error. The
flows with low errors are typically going to be the large and medium
OD pairs, since our model succeeds in capturing the diurnal trends.
The errors are typically going to appear in the estimates for smaller
flows; Figure 6 illustrates that even though we have trouble tracking
the dynamics, we do still track the mean behavior of these flows.
We are including the examples of the worst case behavior for
completeness of analysis and to describe how our method can perform
in estimating small OD flows.
Others have reported relative errors on their mean estimates where the means are computed over some specific interval (usually somewhat long). Our L2-norm relative error is an error rate on our estimation (or fitting) of the dynamic OD flow varying in time; put alternatively, it summarizes "instantaneous" errors in OD flow modeling. We cannot compare the value of this latter metric to other studies, because other studies have not tried to build a model capturing the temporal behavior.
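For concreteness, this metric is simply the ratio of norms taken over a flow's whole time series. A minimal sketch, using a hypothetical OD flow rather than real data:

```python
import numpy as np

def l2_relative_error(x_true, x_est):
    """L2-norm relative error of a time-varying OD flow estimate:
    ||x_est - x_true||_2 / ||x_true||_2, summarizing the
    'instantaneous' modeling errors over the measurement interval."""
    x_true = np.asarray(x_true, dtype=float)
    x_est = np.asarray(x_est, dtype=float)
    return np.linalg.norm(x_est - x_true) / np.linalg.norm(x_true)

# Hypothetical flow oscillating around 40 MB/s over one day of
# 5-minute samples, estimated with a small constant bias.
t = np.linspace(0.0, 1.0, 288)
x_true = 40 + 5 * np.sin(2 * np.pi * t)   # diurnal-looking flow
x_est = x_true + 1.0                      # biased estimate
print(l2_relative_error(x_true, x_est))   # about 0.025, i.e. 2.5%
```

Because the norm is taken over the full trajectory, a model that tracks the mean but misses fluctuations still accumulates error at every sample, which is why small, noisy flows dominate the worst cases.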
The performance of our method is influenced by the number of samples per snapshot and the number of basis functions used. In this same figure, we can examine the influence of these two parameters. Intuitively, a larger number of basis functions will lead to a better quality estimate, although at the cost of a larger number of samples. Note that the number of samples plays an important role independently of the number of basis functions implemented. The more samples collected, the more is learned about the temporal evolution of each OD pair, and a better estimation can be provided. For a fixed number of basis functions, increasing the number of samples per snapshot yields a substantial improvement. Our experimentation showed that using more than 5 basis functions yielded insignificant gains, and thus we decided to set the number of basis functions to 5 for the remainder of our evaluation.
8.2 Method 2: identification and estimation of the top OD pairs
In this section we study the performance of Method 2. We start by estimating the variance of the OD flows. This information is used to isolate the "top" flows that we want to estimate. We do this by setting the small flows to zero and hence removing any need to estimate the associated variables. As we will see, Method 2 offers an interesting tradeoff, namely that of reducing the number of snapshots required at the expense of some accuracy in estimation.
In Section 4 we saw that 95% of the network traffic is carried
by only 30% of the OD pairs, each of whose rate is greater than 3
MB/s. The first two steps of Method 2 are used in order to try to
identify as many of these large flows as possible. First we compute
the variance of the OD flows using Equation (14). Second we order
the OD flows by size of variance. By relying upon our observation
that flows that have large variance are also typically large in mean,
the task now is to set a threshold and select the “top” flows above
this threshold. There are two issues arising from these steps.
As discussed in Section 6, the nice feature about our variance
estimator is that it does not require any routing changes. However
a large number of SNMP samples are required to force the relative
error of the variance estimate to be under 5%. We selected the
5% target arbitrarily. After some experimentation we found that
roughly 25,000 samples were needed to achieve this target level of
accuracy. In a real network this implies one needs about 3 months
worth of SNMP data. In commercial networks obtaining this much
historical data is not a problem as all ISPs store their SNMP data
for multi-year periods. Although we had plenty of months (years)
of SNMP data, we did not have 3 months of Netflow data available
to us that would have been needed to do a complete validation of
this method.
We therefore decided to use pseudo-real data for this evaluation. We call this pseudo-real data because it is generated based on a model fitted to actual data (but only one month's worth). To create sample OD flows, we filter out the noise from each of our sampled OD pairs and keep only the first five components of the matched Fourier series. We generate sample OD flows, over longer periods of time, using this Fourier series model, to which we add a zero-mean Gaussian noise with a power-law variance whose coefficient is set to 1.56 (in accordance with the empirical data observed in Figure 5). By comparing Figures 6 (Netflow data) and 11 (pseudo-real data), we can see how well our pseudo-real data generator matches the actual sampled data. We route this traffic (according to the original A matrix, i.e. the original routing snapshot) and generate what would be the resulting SNMP link counts for the 3-month period under study. This last step is the same as the methodology presented in [6].
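This generation procedure can be sketched roughly as follows. The Fourier coefficients, the toy routing matrix A, and the exact power-law form assumed here (variance equal to the flow mean raised to 1.56) are illustrative stand-ins for the sketch, not the paper's fitted values:

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_mean(coeffs, t, period=288):
    """Diurnal mean of one OD flow from five Fourier components
    (a constant term plus two harmonics); t is in sample units."""
    a0, a1, b1, a2, b2 = coeffs
    w = 2 * np.pi * t / period
    return a0 + a1*np.cos(w) + b1*np.sin(w) + a2*np.cos(2*w) + b2*np.sin(2*w)

def generate_od_flows(coeffs_per_od, n_samples, c=1.56):
    """Pseudo-real OD flows: a fitted Fourier diurnal pattern plus
    zero-mean Gaussian noise whose variance follows a power law
    of the flow mean (exponent c, an assumption of this sketch)."""
    t = np.arange(n_samples)
    flows = []
    for coeffs in coeffs_per_od:
        mean = fourier_mean(coeffs, t)
        sigma = np.sqrt(np.maximum(mean, 0.0) ** c)   # power-law std
        flows.append(mean + rng.normal(0.0, 1.0, n_samples) * sigma)
    return np.array(flows)                # shape: (n_od, n_samples)

# Toy setup: 3 OD flows, a 2-link x 3-OD routing matrix, one week of
# 5-minute samples; link counts play the role of SNMP measurements.
coeffs = [(40, 5, 2, 1, 0), (10, 2, 1, 0, 0), (3, 0.5, 0, 0, 0)]
X = generate_od_flows(coeffs, n_samples=288 * 7)
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])          # illustrative routing matrix
link_counts = A @ X                      # synthetic SNMP link counts
```

Routing the synthetic flows through a fixed A matrix is what turns the per-OD model into the link-level observations that the estimator actually sees.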
Figure 9: Real and estimated standard deviation for all OD pairs, large and medium (on the top) and small (on the bottom).
We compare our estimate of the standard deviation (std) to the standard deviation of the pseudo-real data in Figure 9. The top plot is for the medium and large flows, while the bottom plot includes the comparison for small flows. The variance estimate for the medium and large OD flows is quite good in that it achieves an average estimation error of less than 5%. As expected, it is harder to estimate the variance of the smaller OD pairs, and we see the errors can span a large range. This challenge cannot be met by merely increasing the number of samples, because it is due to the difference in order of magnitude of large and small OD pairs. As a consequence, a small error in the std estimate of large OD pairs
Figure 10: Number of snapshots needed as a function of the number of OD pairs to be estimated.
will be spread across multiple OD pairs, causing large errors in the std estimate of small OD pairs.
We were not able to use Equation (15) as a metric for confidence
of our variance estimator because this involves a detailed study for
which space is lacking. Without a method for extracting exactly the
top 30% largest OD pairs from the rest, we thus rely on a simple
threshold scheme for now. We keep all OD pairs whose rates are
above a threshold set to be two orders of magnitude lower than the
largest OD pair. The OD pairs below this threshold are set to zero.
We are now ready to examine the impact of removing the small OD pairs on the number of snapshots required. Having isolated the top OD flows, our methodology next uses a heuristic algorithm to select the sequence of snapshots needed; however, this time we seek identifiability on a much reduced set of OD pairs. Figure 10 depicts the number of snapshots required as a function of the number of OD pairs set to 0. Recall that as more OD pairs get set to zero, there are fewer variables to estimate, and so we expect the number of snapshots to decrease. We start with 24 snapshots, when all OD pairs are considered by the heuristic, and end with 0 when no OD pair has to be identified. Note that only 5 snapshots are needed to disaggregate the 30% largest OD pairs carrying 95% of network traffic. Recall that when we estimated the entire traffic matrix using Method 1, our algorithm required 24 weight changes, or snapshots. Here we see that if we content ourselves with estimating 95% of the load of the traffic matrix, then we can drop the number of needed snapshots to 5, i.e., an 80% reduction! This is indeed a dramatic decrease in the number of needed snapshots. With the number of snapshots so small, ISPs are more likely to be able to execute such a method. One of our key findings here is thus that by focusing only on large flows and being able to identify them, we can enable a method (the weight change method) that appears to be able to significantly improve the error rates as compared to previous methods.
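The identifiability criterion behind this snapshot count can be illustrated with a simple rank test: stack the routing matrices of successive snapshots, restrict them to the columns of the retained (non-zeroed) OD pairs, and stop once the stacked system reaches full column rank. The tiny snapshot matrices and the greedy in-order selection below are toy assumptions for the sketch, not the paper's heuristic:

```python
import numpy as np

def snapshots_needed(snapshot_matrices, kept):
    """Add snapshots in order until the stacked routing matrix has
    full column rank over the retained OD pairs, i.e. those flows
    become identifiable from the link counts."""
    kept_idx = np.flatnonzero(kept)
    stacked = np.empty((0, len(kept_idx)))
    for n, A in enumerate(snapshot_matrices, start=1):
        stacked = np.vstack([stacked, A[:, kept_idx]])
        if np.linalg.matrix_rank(stacked) == len(kept_idx):
            return n
    return None   # not identifiable with these snapshots

# Toy example: 3 OD pairs, 2 links per snapshot; the third (small)
# flow is set to zero, so only two columns must become independent.
snaps = [np.array([[1.0, 1.0, 0.0],
                   [1.0, 1.0, 1.0]]),    # kept columns are collinear
         np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0]])]    # adds independent equations
kept = np.array([True, True, False])
print(snapshots_needed(snaps, kept))     # prints 2
```

Zeroing OD pairs shrinks `kept_idx`, so the required column rank drops and fewer routing changes are needed, which is exactly the effect seen in Figure 10.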
Finally we evaluate the impact of removing the small OD pairs on the accuracy of estimation for the remaining top OD flows. When flows are "removed" (i.e., set to zero) for the purpose of our method, the traffic they carry still appears inside the SNMP link counts. Thus if we assume some OD flow has zero rate, then its actual rate will be transferred to some other OD pair retained in the estimation process. This will increase the inaccuracy in the flows being estimated.

Figure 11 shows an example of mean estimation for 6 top OD pairs as a function of the number of snapshots implemented. Setting small OD pairs to zero does not have a significant impact on the large flows. It can have an impact on the smallest of our medium flows (again OD #28 and #105 are the worst case flows in our top set).
[Figure 11 plots: sampled OD rate (MB/s) over 8h–16h for OD pairs #60, #56, #28, #63, #39 and #105; curves show the synthetic flow and the estimates obtained with 24, 16 and 10 snapshots.]
Figure 11: Method 2. Example of mean estimation for six OD pairs within the top flows, as a function of the number of snapshots carried out.
OD [%]    Snapshot=24        Snapshot=16        Snapshot=10
          ME      WC         ME      WC         ME      WC
100%      3.79    28.04      8.96    57.31      10.88   58.40
95%       2.34    15.48      5.79    31.23      7.99    31.20
90%       1.72    9.30       4.54    22.51      6.78    24.85

Table 1: Average (ME) and Worst-Case (WC) relative errors of the mean component of the Fourier series for 100%, 95% and 90% of top OD pairs.
We provide summary statistics for our errors as a function of the number of snapshots in Table 1. Clearly both the average relative error and the worst case error grow as the number of snapshots is reduced. For example, 90% of our top flows have an average error of under 2% when 24 snapshots are used, and an average error under 5% when 16 snapshots are used. The average error climbs to 7% when only 10 snapshots are used. We point out that nearly all of the average errors reported in this table are under the 10% barrier for target error rates.
9. CONCLUSIONS
In this paper we propose a new approach for dealing with the ill-posed nature of traffic matrix estimation. In previous approaches, additional constraints on the variance of OD flows were added into the estimation procedure to provide more information than simply the constraints on first order moments. To estimate the mean, both batches of constraints are solved together along with needed assumptions relating the mean to the variance. In our work we derive an estimate for the variance that is independent of the mean, and thus doesn't require assumptions about the mean and variance relationship. Our variance estimator is based on a new model for OD flows that we establish after studying a set of directly measured OD flows. This model explicitly incorporates both the diurnal patterns and the fluctuations behavior.
Armed with a variance estimator, we can address the ill-posed nature of the problem by using our variance estimates to identify the largest flows in the traffic matrix. This is useful because our data shows it is usually less than half of the traffic matrix elements that are responsible for carrying the majority of the traffic load. The complexity of the traffic matrix estimation problem is thus reduced because the number of variables to estimate has decreased. We use an algorithm to identify a sequence of IGP weight changes that should be applied to backbone links. By collecting additional SNMP measurements under these different routing snapshots, we can add new linearly independent equations into the basic linear system, thereby increasing its rank. Increasing the rank of the system is the key to making the problem less underconstrained.
We thus make use of all of these mechanisms to propose a new methodology that incorporates the variance estimator, the algorithm for finding appropriate routing snapshots, and an inference technique as the final step. Both the variance estimator and the inference scheme are based on our OD flow model that incorporates two types of flow variability behaviors, namely a cyclostationary diurnal pattern and a stationary fluctuations process. We show that our OD flow models work well in that they accurately capture the temporal dynamics of Internet backbone traffic.
One of our key findings here is thus that by focusing only on large flows and being able to identify them, we can enable a method (the link weight change method) that is able to significantly improve the error rates as compared to previous methods. In all our test cases, the average errors stayed consistently below the 10% target carriers have suggested, and sometimes even reached as low as 1 or 2% (depending upon the scenario evaluated). We believe that this is the first proposed method to achieve average error rates in this range.
We show the tradeoff in accuracy of the estimates versus the number of snapshots used in increasing the rank. On the one hand, as more and more small flows are ignored, the errors in the estimates of the large flows increase. On the other hand, dropping more and more small flows means that fewer snapshots are needed to obtain a full rank system for the remaining flows of interest. The advantage of this is that it renders the weight-change approach more practical, since only a few such changes may really be needed in an operational environment. For the backbone network considered here, this tradeoff is not too dramatic in that we can still achieve average error rates under the 10% barrier even when a limited number of link weight changes are carried out.
10. REFERENCES
[1] Y. Vardi, "Estimating Source-Destination Traffic Intensities from Link Data", Journal of the American Statistical Association, 91(433), March 1996.
[2] C. Tebaldi and M. West, "Bayesian Inference of Network Traffic Using Link Count Data", Journal of the American Statistical Association, 93(442), June 1998.
[3] J. Cao, D. Davis, S. Vander Weil, and B. Yu, "Time-Varying Network Tomography: Router Link Data", Journal of the American Statistical Association, 95(452), 2000.
[4] A. Medina, N. Taft, K. Salamatian, S. Bhattacharyya and C. Diot, "Traffic Matrix Estimation: Existing Techniques Compared and New Directions", ACM Sigcomm, Pittsburgh, PA, August 2002.
[5] Y. Zhang, M. Roughan, N. Duffield and A. Greenberg, "Fast Accurate Computation of Large-Scale IP Traffic Matrices from Link Loads", Proceedings of ACM Sigmetrics, San Diego, CA, June 2003.
[6] Y. Zhang, M. Roughan, C. Lund and D. Donoho, "An Information-Theoretic Approach to Traffic Matrix Estimation", Proceedings of ACM Sigcomm, Karlsruhe, Germany, August 2003.
[7] G. Liang and B. Yu, "Pseudo Likelihood Estimation in Network Tomography", IEEE Infocom, San Francisco, CA, March 2003.
[8] A. Nucci, R. Cruz, N. Taft and C. Diot, "Design of IGP Link Weight Changes for Estimation of Traffic Matrices", IEEE Infocom, Hong Kong, China, March 2004.
[9] A. Feldmann, A. Greenberg, C. Lund, N. Reingold, J. Rexford and F. True, "Deriving Traffic Demands for Operational IP Networks: Methodology and Experience", IEEE/ACM Transactions on Networking, June 2001.
[10] H. Stark and J. W. Woods, "Probability, Random Processes, and Estimation Theory for Engineers", Prentice-Hall, Englewood Cliffs, New Jersey.