
Thomas Karagiannis,

Mart Molle,

and Michalis Faloutsos

University of California, Riverside

Long-Range Dependence

Ten Years of Internet Traffic Modeling

Self-similarity and scaling phenomena have dominated Internet traffic analysis for

the past decade. With the identification of long-range dependence (LRD) in network traffic, the research community has undergone a mental shift from Poisson and memory-less processes to LRD and bursty processes. Despite its widespread use, though, LRD analysis is hindered by our difficulty in actually identifying dependence and estimating its parameters unambiguously. The authors

outline LRD findings in network traffic and explore the current lack of accuracy

and robustness in LRD estimation. In addition, the authors present recent

evidence that packet arrivals appear to be in agreement with the Poisson

assumption in the Internet core.

Traffic modeling and analysis is a fundamental building block of Internet engineering and design. We can't replicate the Internet and study it as a whole, so we rely on thorough analysis of network measurements and their transformation into models to help explain the Internet's functionality and improve its performance.

About 10 years ago, the introduction of long-range dependence (LRD) and self-similarity revolutionized our understanding of network traffic. (LRD means that the behavior of a time-dependent process shows statistically significant correlations across large time scales; self-similarity describes the phenomenon in

which the behavior of a process is pre-

served irrespective of scaling in space or

time.) Prior to that, researchers consid-

ered Poisson processes (that is, the pack-

et arrival process is memory-less and

interarrival times follow the exponential

distribution) to be an adequate representation for network traffic in real systems [1]. LRD flew in the face of conventional wisdom by stating that network

traffic exhibits long-term memory (its

behavior across widely separated times is

correlated). This assertion challenged the

validity of the Poisson assumption and

shifted the community’s focus from

2 SEPTEMBER • OCTOBER 2004 Published by the IEEE Computer Society 1089-7801/04/$20.00 © 2004 IEEE IEEE INTERNET COMPUTING

Internet Measurement


assuming memory-less and smooth behavior to

long memory and bursty behavior.

In this article, we provide an overview of what

the community has learned from 10 years of LRD

research; we also identify the caveats and limita-

tions of our ability to detect LRD. In particular, we

want to raise awareness of two issues: that identifying and estimating LRD is far from straightforward, and that the large-scale aggregation of the

Internet’s core might have shifted packet-level

behavior toward being a Poisson process. Ultimate-

ly, measuring and modeling the Internet requires us

to constantly reinvent models and methods.

Self-Similarity

in Internet Traffic

Ample evidence collected over the past decade

suggests the existence of LRD, self-similarity and

heavy-tailed distributions (meaning large values

can exist with non-negligible probability) in vari-

ous aspects of network behavior.

Before we look at the major advances in LRD

research, we must first describe LRD and self-sim-

ilarity in the context of time-series analysis.

Stochastic Time Series

Let X(t) be a stochastic process. In some cases, X

can take the form of a discrete time series {Xt}, t

= 0, 1, ..., N, either through periodic sampling or

by averaging its value across a series of fixed-

length intervals. We say that X(t) is stationary if its

joint distribution across a collection of times t_1, ..., t_N is invariant to time shifting. Thus, we can characterize the dependence between the process's values at different times by evaluating the process's autocorrelation function (ACF), which is ρ(k). The ACF measures similarity between a series X_t and a shifted version of itself X_{t+k}:

\[ \rho(k) = \frac{E\left[(X_t - \mu)(X_{t+k} - \mu)\right]}{\sigma^{2}}, \qquad (1) \]

where µ and σ are the mean and standard deviation, respectively, of X.
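As a minimal sketch of Equation 1, the following estimates ρ(k) from a finite sample. The function name `sample_acf` and the choice to divide by n (the common biased estimator) are illustrative assumptions and are not taken from the article.

```python
from statistics import fmean

def sample_acf(x, k):
    """Sample estimate of rho(k) from Equation 1 for a list of numbers x."""
    n = len(x)
    mu = fmean(x)
    # Biased sample variance (divide by n), a common convention for ACF estimates.
    var = sum((v - mu) ** 2 for v in x) / n
    # Sample autocovariance at lag k, also divided by n.
    cov = sum((x[t] - mu) * (x[t + k] - mu) for t in range(n - k)) / n
    return cov / var

# A strictly alternating series is almost perfectly anti-correlated at lag 1.
series = [1.0, -1.0] * 50
print(sample_acf(series, 0))
print(sample_acf(series, 1))
```

Because the estimator divides by n rather than n - k, its values shrink slightly toward zero at large lags, which matters when judging how slowly correlations decay.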

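Also of interest is a time series' aggregated process X_k^{(m)}:

\[ X_k^{(m)} = \frac{1}{m} \sum_{i=km}^{(k+1)m-1} X_i, \qquad k = 0, 1, 2, \ldots, \left\lfloor \frac{N}{m} \right\rfloor - 1. \qquad (2) \]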

Intuitively, {X_k^{(m)}} describes the average value of the time series across "windows" of m consecutive values from the original time series. If the values {X_t} were independent and identically distributed, then Var(X^{(m)}) = σ²/m. However, if the sequence exhibits long memory, then the aggregated process's variance converges to zero at a much slower rate than 1/m [2].
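The aggregation in Equation 2 and the σ²/m behavior for independent data can be checked numerically. This is a small sketch under stated assumptions: the helper name `aggregate`, the seed, and the series length are arbitrary illustrative choices.

```python
import random
from statistics import pvariance

def aggregate(x, m):
    """Aggregated series X^(m): means of non-overlapping blocks of m values."""
    n_blocks = len(x) // m  # floor(N/m) blocks; any leftover tail is dropped
    return [sum(x[k * m:(k + 1) * m]) / m for k in range(n_blocks)]

random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(20000)]  # i.i.d. series with sigma^2 = 1

# For i.i.d. data the variance of the block means should be close to 1/m.
for m in (1, 10, 100):
    print(m, round(pvariance(aggregate(x, m)), 4))
```

For a long-memory series the same loop would show the variance shrinking noticeably more slowly than 1/m.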

Self-Similarity and LRD

A stationary process X is long-range dependent if

its autocorrelations decay to zero so slowly that

their sum doesn't converge, that is, \sum_{k=1}^{\infty} |\rho(k)| = \infty. Intuitively, memory is built into the process because the dependence among an LRD process's widely separated values is significant, even across large time shifts.

A stochastic process X is self-similar if

\[ X(at) = a^{H} X(t), \qquad a > 0, \]

where the equality refers to equality in distribu-

tions, a is a scaling factor, and the self-similarity

parameter H is called the Hurst exponent. Intu-

itively, self-similarity describes the phenomenon

in which certain process properties are preserved

irrespective of scaling in space or time.

Second-order self-similarity describes the prop-

erty that a time series’ correlation structure (ACF) is

preserved irrespective of time aggregation. Simply

put, a second-order self-similar time series’ ACF is

the same for either coarse or fine time scales. A stationary process X_t is second-order self-similar [3] if

\[ \rho(k) = \tfrac{1}{2}\left[(k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\right], \qquad 0.5 < H < 1, \qquad (3) \]

and asymptotically exactly self-similar if

\[ \lim_{k\to\infty} \rho(k) = \tfrac{1}{2}\left[(k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\right], \qquad 0.5 < H < 1. \]

Second-order self-similar processes are char-

acterized by a hyperbolically decaying ACF and

used extensively to model LRD processes. Con-

versely, quickly decaying correlations characterize

short-range dependence. From these definitions,

we can infer that LRD characterizes a time series if

0.5 < H < 1. As H → 1, the dependence is stronger.
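To see how the Hurst exponent controls the decay of the ACF in Equation 3, the formula can be evaluated directly. The function name `fgn_acf` is an illustrative assumption. The value H = 0.5 lies outside the range in Equation 3 and is included only as the uncorrelated boundary case.

```python
def fgn_acf(k, hurst):
    """Equation 3: ACF of a second-order self-similar process at lag k >= 1."""
    h2 = 2.0 * hurst
    return 0.5 * ((k + 1) ** h2 - 2.0 * k ** h2 + (k - 1) ** h2)

# Larger H gives larger correlations that persist across much longer lags.
for hurst in (0.5, 0.7, 0.9):
    print(hurst, [round(fgn_acf(k, hurst), 4) for k in (1, 10, 100)])
```

At H = 0.5 every lag evaluates to zero, while for H close to 1 the correlation at lag 100 is still clearly positive, which is the hyperbolic decay the text describes.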

For network-measurement processes, X refers

to the number of packets and bytes at consecutive

time intervals, meaning that X describes the vol-

ume of bytes/packets observed in a link every time

interval t.

Self-Similarity in Internet Traffic

Leland and colleagues' pioneering work provided the first empirical evidence of self-similar characteristics in LAN traffic [4]. They performed a rigorous statistical analysis of Ethernet traffic measurements and established its self-similar nature.

Specifically, they observed that Internet traffic

variability was invariant to the observed time scale

— that is, traffic didn’t become smooth with aggre-

gation as fast as the Poisson traffic model indicat-

ed. Subsequently, Paxson and Floyd described the

failure of using Poisson modeling in wide-area

Internet traffic [5]. They demonstrated that packet

interarrival times for Telnet and FTP traffic were

described by heavy-tailed distributions and char-

acterized by burstiness, which indicated that the

Poisson process underestimated both burstiness

and variability. In addition, they showed that large-scale correlations characterized wide-area traffic

traces, concluding, “We should abandon Poisson-

based modeling of wide-area traffic for all but user

session arrivals.”

These two landmark studies nudged researchers

away from traditional Poisson modeling and inde-

pendence assumptions, which were discarded as

unrealistic and overly simplistic. The nature of the

congestion produced from self-similar network traf-

fic models had a considerable impact on queuing

performance [6], due in large part to variability across

various time scales. Further studies proved that

Poisson-based models significantly underestimated

performance measures, showing that self-similarity resulted in performance degradation by drastically increasing queuing delay and packet loss [7].

Self-similarity’s origins in Internet traffic are

mainly attributed to heavy-tailed distributions of

file sizes [8, 9]. Several studies correlated the Hurst

exponent with heavy-tailed distributions, indicat-

ing that extremely large transfer requests could

occur with non-negligible probability.

Apart from LRD, Internet traffic presents com-

plex scaling and multifractal characteristics. Many

simulations and empirical studies illustrate how

scaling behavior and the intensity of the observed

dependence is related to the scale of observation.

Specifically, loose versus strong dependence exists

in smaller versus larger time scales, respectively.

The change point is usually associated with either

the round-trip time (RTT) or intrusive “fast” flows

with small interarrival times [10, 11].

Despite the overwhelming evidence of LRD’s

presence in Internet traffic, a few findings indi-

cate that Poisson models and independence could

still be applicable as the number of sources

increases in fast backbone links that carry vast

numbers of distinct flows, leading to large volumes of traffic multiplexing [12]. In addition, other studies [13] point out that several end-to-end network properties seem to agree with the independence assumptions in the presence of nonstationarity (that is, statistical properties vary with time).

LRD Estimation

and Its Limitations

The predominant way to quantify LRD is through

the Hurst exponent, which is a scalar, but calcu-

lating this exponent isn’t straightforward. First, it

can’t be calculated definitively, only estimated.

Second, although we can use several different

methods to estimate the Hurst exponent, they

often produce conflicting results, and it’s not clear

which provides the most accurate estimation.

We can classify Hurst exponent estimators into

two general categories: those operating in the time

domain and those operating in the frequency or

wavelet domain. Due to space constraints, we can’t

give a complete description of all available estimators, but an overview appears elsewhere [14].

Time-domain estimators investigate the

power-law relationship between a specific statis-

tical property in a time series and the time-aggre-

gation block size m: LRD exists if the specific

property versus m is a straight line when plotted

in log-log scale. The Hurst exponent is estimated from this line's slope, so time-domain estimators imply two presuppositions for LRD to exist: statistically significant evidence that the relevant points do indeed lie on a straight line, and a slope such that the resulting estimate satisfies 0.5 < H < 1. These estimators use several methodologies: R/S (rescaled range statistic), absolute value, variance, and variance of residuals.
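The variance methodology can be sketched as follows, assuming the standard scaling Var(X^(m)) ~ m^(2H - 2), so that a log-log slope β gives H = 1 + β/2. This is an illustration only and is not the SELFIS implementation; the helper names, block sizes, and seed are hypothetical choices.

```python
import math
import random
from statistics import fmean, pvariance

def aggregate(x, m):
    """Means of non-overlapping blocks of m values."""
    n_blocks = len(x) // m
    return [sum(x[k * m:(k + 1) * m]) / m for k in range(n_blocks)]

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = fmean(xs), fmean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)

def hurst_variance_method(x, block_sizes):
    """Estimate H from the log-log slope of Var(X^(m)) against m."""
    log_m = [math.log(m) for m in block_sizes]
    log_var = [math.log(pvariance(aggregate(x, m))) for m in block_sizes]
    beta = ols_slope(log_m, log_var)  # Var(X^(m)) ~ m^beta, with beta = 2H - 2
    return 1.0 + beta / 2.0

random.seed(7)
white = [random.gauss(0.0, 1.0) for _ in range(32768)]  # no memory, so H should be near 0.5
h_white = hurst_variance_method(white, [2, 4, 8, 16, 32, 64, 128])
print(round(h_white, 2))
```

The sketch checks only the slope. It doesn't test whether the points actually lie on a straight line, which is the first of the two presuppositions above.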

Naturally, frequency- and wavelet-domain

estimators operate in the frequency or wavelet

domain. Similarly to the time-domain method-

ologies, they examine if a time series’ spectrum or

energy follows power-law behavior. These esti-

mators include the Periodogram, the Whittle, and

the wavelet Abry-Veitch (AV) estimators [15].
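A periodogram-style estimate can be sketched in the same spirit, assuming the usual low-frequency behavior I(λ) ~ λ^(1 - 2H) for an LRD spectrum, so that a log-log slope s gives H = (1 - s)/2. The direct O(n · n_freqs) Fourier sum, the cut-off `n_freqs`, and the helper names are illustrative assumptions, not the method as implemented in SELFIS.

```python
import cmath
import math
import random
from statistics import fmean

def periodogram(x, j):
    """Periodogram ordinate at the Fourier frequency 2*pi*j/n."""
    n = len(x)
    lam = 2.0 * math.pi * j / n
    dft = sum(v * cmath.exp(-1j * lam * t) for t, v in enumerate(x))
    return abs(dft) ** 2 / (2.0 * math.pi * n)

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = fmean(xs), fmean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)

def hurst_periodogram(x, n_freqs=200):
    """Estimate H from the log-log slope of the periodogram at the lowest frequencies."""
    n = len(x)
    log_lam = [math.log(2.0 * math.pi * j / n) for j in range(1, n_freqs + 1)]
    log_i = [math.log(periodogram(x, j)) for j in range(1, n_freqs + 1)]
    slope = ols_slope(log_lam, log_i)  # I(lambda) ~ lambda^(1 - 2H) near zero
    return (1.0 - slope) / 2.0

random.seed(11)
white = [random.gauss(0.0, 1.0) for _ in range(2048)]  # flat spectrum, so H should be near 0.5
h_white = hurst_periodogram(white)
print(round(h_white, 2))
```

Individual periodogram ordinates are very noisy, so the estimate depends noticeably on how many low frequencies enter the regression.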

We can test these estimation methodologies’

capabilities by first examining their accuracy on

synthesized LRD series and then testing their abil-

ity to discriminate LRD behavior when applied to

non-LRD data sets. In agreement with similar

findings in earlier studies [14, 16], our findings demonstrate that no estimator is consistently robust in every case: estimators can hide LRD or report it erroneously. Furthermore, each estimator has


different strengths and limitations. We used the

software package SELFIS (publicly distributed at

our Web site, www.cs.ucr.edu/~tkarag) to perform

the experiments described next.

Estimator Accuracy

on Synthesized LRD Series

The most extensively used self-similar processes

for simulating LRD are fractional Gaussian noise

(fGn) and fractional Auto Regressive Integrated

Moving Average (ARIMA) processes. fGn is the increment process of fractional Brownian motion (fBm) (a random walk process with dependent increments);

fGn is a Gaussian process and its ACF is given by

Equation 3. The fractional ARIMA(p,d,q) model is a

fractional integration of the autoregressive moving

average, or ARMA(p,q), model. Fractional ARIMA

processes describe LRD series when 0 < d < 0.5, in which case H = d + 0.5.
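One simple way to synthesize an approximate fractional ARIMA(0, d, 0) series is to truncate its infinite moving-average expansion, whose weights satisfy ψ_0 = 1 and ψ_j = ψ_{j-1}(j - 1 + d)/j. The sketch below assumes that approach. It is not the generator used for the experiments in this article, and the function names and the truncation length `n_weights` are hypothetical.

```python
import random

def farima_weights(d, n_weights):
    """First n_weights moving-average coefficients of (1 - B)^(-d)."""
    psi = [1.0]
    for j in range(1, n_weights):
        psi.append(psi[-1] * (j - 1 + d) / j)
    return psi

def farima_0d0(n, d, n_weights=500, seed=0):
    """Approximate fractional ARIMA(0, d, 0) sample from a truncated MA expansion."""
    rng = random.Random(seed)
    psi = farima_weights(d, n_weights)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n + n_weights - 1)]
    # x[t] is a weighted sum of the current and the previous n_weights - 1 noise terms.
    return [
        sum(p * noise[t + n_weights - 1 - j] for j, p in enumerate(psi))
        for t in range(n)
    ]

hurst = 0.8
series = farima_0d0(1000, d=hurst - 0.5)  # H = d + 0.5
print(len(series), farima_weights(0.3, 3))
```

Truncation discards the slowest-decaying part of the memory, so a series generated this way carries somewhat weaker long-range dependence than the target H implies.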

We tested each estimator against two different

types of synthesized long-memory series: fractional ARIMA and fGn [17]. For each Hurst value between 0.5 and 1 (using a step of 0.1), we generated 100 fGn and 100 fractional ARIMA synthesized data sets of 64 Kbytes. Figure 1 reports the

average estimated Hurst value for these data sets

for each estimator as well as the 95 percent confi-

dence intervals of the mean (that is, the range of

values that has a high probability of containing

the mean). However, these intervals are so close to

the average that they’re barely discernible.

Although many estimators and generators exist,

we used and evaluated the most common and

widely used ones.

Figure 1 shows significant variation in the

estimated Hurst exponent value between the var-

ious methodologies, especially as the Hurst expo-

nent tends to 1, where the intensity of long-

range dependence is larger. Frequency-domain

estimators seem to be more accurate. In the case

of the fGn synthesized series, Whittle and Peri-

odogram estimators fall exactly on top of the

optimal estimation line. The Whittle estimator

has the a priori advantage of being applied to a

time series whose underlying structure matches

the assumptions under which the estimator was

derived. The wavelet AV estimator always over-

estimates the Hurst exponent’s value (usually by

0.05). Overall, time-domain estimators fail to

report the correct Hurst exponent value, under-

estimating it by more than 20 percent. (In Figure

1, lines clustered under the optimal estimation

line represent these estimators.) When we used

fractional ARIMA to synthesize the time series,

the estimations were generally closer to the opti-

mal estimation line. However, none of the esti-

mators consistently followed the optimal line

across all Hurst values.

Discrimination of LRD

Behavior in Deterministic Series

To study the estimations’ sensitivity, we examined


Figure 1. Estimating Hurst exponent values. We tested the performance of estimators on (a) fractional Gaussian noise (fGn) and (b) fractional ARIMA (Auto Regressive Integrated Moving Average) synthesized time series. The target line is the optimal estimation. In both cases, time-domain estimators, represented by the lines clustered below the target line, failed to capture the synthesized Hurst exponent value, especially as H tended to 1. Frequency-based estimators appear to be more accurate, following the target line more closely.

[Figure 1 plots: (a) estimated Hurst exponent versus Hurst exponents of fGn series; (b) estimated Hurst exponent versus Hurst exponents of fractional ARIMA series. Horizontal axes run from 0.50 to 0.95. Legend: Target, R/S, Variance, Abs, Residuals, Periodogram, Whittle, AV.]


the effects of various phenomena common to

time-series analysis, such as periodicity, noise, and

trend (where the mean of the process is steadily

increasing or decreasing). Our analysis revealed

that the presence of such processes significantly

affects estimators. Furthermore, most methodolo-

gies fail to distinguish between LRD and such phe-

nomena, and falsely report LRD in deterministic

non-LRD time series. We examined four cases and

learned that, essentially, no estimator is consis-

tently robust in every case. Each one evaluates dif-

ferent statistics to estimate the Hurst exponent,

which requires the examination of many estima-

tors to get an overall picture of the time series’

properties. Applying signal-processing techniques

and methodologies could help us overcome some

of these limitations, but networking practitioners

aren’t necessarily familiar with such practices.

Cosine plus white Gaussian noise. In our first test,

we applied the estimators to periodic data sets, which we synthesized from white Gaussian noise plus a cosine function, A cos(αx). Periodicity

can mislead the Whittle, Periodogram, R/S, and AV

methods into falsely reporting LRD. The Hurst

exponent estimation depends mainly on A, so the

estimations approach 1 as A increases. Thus, as the

amplitude increases, estimations become less reli-

able. If the amplitude is large and the period is

small, Whittle always estimates the Hurst exponent

to be 0.99. (Whittle estimates of 0.99 represent the

failure of robust estimation.)
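One plausible intuition for why periodicity can be mistaken for LRD is that a cosine's autocorrelation never dies out, so the sample ACF stays high at every multiple of the period. The sketch below synthesizes A cos(αt) plus white Gaussian noise and prints the sample ACF at a few such lags. The amplitude, period, seed, and helper names are illustrative assumptions, not the settings used in the experiments.

```python
import math
import random
from statistics import fmean

def cosine_plus_noise(n, amplitude, alpha, sigma=1.0, seed=0):
    """Deterministic cosine with additive white Gaussian noise."""
    rng = random.Random(seed)
    return [amplitude * math.cos(alpha * t) + rng.gauss(0.0, sigma) for t in range(n)]

def sample_acf(x, k):
    """Biased sample autocorrelation at lag k."""
    n = len(x)
    mu = fmean(x)
    var = sum((v - mu) ** 2 for v in x) / n
    cov = sum((x[t] - mu) * (x[t + k] - mu) for t in range(n - k)) / n
    return cov / var

period = 50
x = cosine_plus_noise(10000, amplitude=3.0, alpha=2.0 * math.pi / period)

# The correlation at multiples of the period stays large instead of decaying.
for lag in (period, 10 * period, 50 * period):
    print(lag, round(sample_acf(x, lag), 2))
```

With these settings the cosine contributes A²/2 = 4.5 of the total variance of about 5.5, so increasing A pushes the correlations at those lags toward 1, in line with the estimates approaching 1 as A increases.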

fGn series plus white Gaussian noise. We next

examined the effect of noise on LRD data. We

found that all estimators underestimate the Hurst

exponent in the presence of noise, but with the

exception of Whittle and the wavelet estimator, the

difference is negligible. Depending on the signal-

to-noise ratio and the fGn series’ Hurst exponent

value, however, these two estimators could signif-

icantly underestimate the Hurst exponent — by

more than 20 percent in some cases.

fGn series plus a cosine function. In studying the

effect of periodicity on LRD data, we found that all

estimations were affected by its presence. Depend-

ing on the cosine function’s amplitude, time-

domain estimators tend to underestimate the Hurst

exponent. On the other hand, frequency-based

methodologies overestimate the Hurst exponent.

As we increase the cosine function’s amplitude,

estimates tend toward 1.

Trend. The definition of LRD assumes stationary

time series. To study the impact of nonstationarity

on the estimators, we therefore synthesized vari-

ous series with different decaying or increasing

trends. We also examined combinations of previ-

ous categories (white Gaussian noise and cosine

functions) with trend. In every case, the Whittle

estimate was consistently 0.99; the Periodogram

method’s estimates for the Hurst exponent were

greater than 1, whereas self-similarity is only

defined for H < 1. No other methodology produced

statistically significant estimations.

Examining the Poisson

Assumption in the Backbone

We studied the Poisson assumption’s validity on

several OC48 (2.5 Gbps) backbone traces taken

from CAIDA (Cooperative Association for Internet

Data Analysis) monitors located at two different

SONET OC48 links belonging to two US tier-1

Internet service providers (ISPs).

To capture the traces, we used Linux-based

monitors with Dag 4.11 network cards and pack-

et-capture software originally developed at the

University of Waikato and currently produced by

Endace. We analyzed various backbone traces:

August 2002 (backbone 1, eight hours), January

2003 (backbone 1, one hour), April 2003 (back-

bone 1, eight hours), May 2003 (backbone 1, 48

hours; backbone 2, two hours), and January 2004

(backbone 2, one hour).

Our analysis demonstrates that backbone pack-

et arrivals appear to agree with the Poisson

assumption [12, 18], but our traces also appear to agree

with self-similarity and past LRD findings. A more

elaborate discussion of our findings as well as a

traffic characterization that reconciles these contradictory results appears elsewhere [18]; there, we argue that Internet traffic behaves as a nonstationary, time-dependent Poisson process that, when viewed across very long time scales, exhibits the observed LRD.

To test the Poisson traffic model’s validity, we

must examine two key properties: whether packet

interarrival times follow the exponential distribu-

tion, and whether packet sizes and interarrival

times appear mutually independent. Congestion in today's Internet usually appears on access links rather than in the backbone, where ISPs overprovision their networks. Traffic characteristics can differ on access links, so our findings might not apply there.
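Two quick numerical checks related to these properties can be sketched as follows: an exponential distribution has a coefficient of variation of 1, and independent quantities have a correlation near 0. The data below are synthetic stand-ins, not the CAIDA traces, and these are not necessarily the tests used in the analysis cited above; the rate, packet sizes, and helper names are hypothetical.

```python
import math
import random
from statistics import fmean, pstdev

def coefficient_of_variation(values):
    """Standard deviation divided by mean; equals 1 for an exponential distribution."""
    return pstdev(values) / fmean(values)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys))

rng = random.Random(3)
gaps = [rng.expovariate(1000.0) for _ in range(20000)]        # synthetic interarrival times (s)
sizes = [rng.choice((40, 576, 1500)) for _ in range(20000)]   # synthetic packet sizes (bytes)

cv = coefficient_of_variation(gaps)
corr = pearson(gaps, sizes)
print(round(cv, 2), round(corr, 2))
```

A correlation near zero is necessary but not sufficient for independence, and a coefficient of variation near 1 doesn't by itself establish an exponential distribution, so both checks are screening steps rather than proofs.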