Hindawi Publishing Corporation

EURASIP Journal on Advances in Signal Processing

Volume 2009, Article ID 752818, 11 pages

doi:10.1155/2009/752818

Research Article

Detecting Distributed Network Traffic Anomaly with Network-Wide Correlation Analysis

Li Zonglin, Hu Guangmin, Yao Xingmiao, and Yang Dan

Key Lab of Broadband Optical Fiber Transmission and Communication Networks,

University of Electronic Science and Technology of China (UESTC), Chengdu 610054, China

Correspondence should be addressed to Li Zonglin, lizonglin@uestc.edu.cn

Received 22 October 2007; Accepted 20 August 2008

Recommended by Rocky Chang

A distributed network traffic anomaly is abnormal traffic behavior that involves many links of a network and is caused by the same source (e.g., a DDoS attack or worm propagation). The anomaly on a single link may be unnoticeable and hard to detect, while the aggregate of the anomalous traffic over many links can be substantial and do more harm to the network. Exploiting the similar features that a distributed traffic anomaly exhibits on many links, this paper proposes a network-wide detection method that performs correlation analysis on the anomalous space of the traffic signals' instantaneous parameters. In our method, the instantaneous parameters of the traffic signals are first computed, and their network-wide anomalous space is then extracted via traffic prediction. Finally, an anomaly is detected by a global correlation coefficient of the anomalous space. Our evaluation using Abilene traffic traces demonstrates the excellent performance of this approach for distributed traffic anomaly detection.

Copyright © 2009 Li Zonglin et al. This is an open access article distributed under the Creative Commons Attribution License,

which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

A network traffic anomaly is a situation in which traffic deviates from its normal behavior, while a distributed network traffic anomaly is abnormal traffic behavior that involves multiple links of a network and is caused by the same source. Many events can cause a distributed network traffic anomaly, such as DDoS attacks, flash crowds, sudden shifts in traffic, worm propagation, network failures, and network outages. Any of these anomalies will seriously impact the performance of the network.

Usually, a distributed network traffic anomaly shows no obvious features on individual links. Compared with the background traffic of a backbone network, or even with its normal fluctuations, the anomalous traffic may be unnoticeable, so detection based on information collected from a single link is very difficult. However, the sum of the anomalous traffic on many links can be substantial. If we put multiple traffic signals together and apply network-wide anomaly detection to them, the relationship between the traffic signals helps to reveal the anomaly. Principal component analysis (PCA) is an existing statistical-analysis technique; Lakhina et al. [1, 2] applied it as a network-wide method for traffic anomaly detection. It decomposes the overall traffic into two disjoint parts, a normal space and an anomalous space, based on the correlation across links or origin-destination (OD) flows. Traffic with less correlation is assigned to the anomalous space, and the energy of the anomalous space is then compared with a threshold to diagnose anomalies.

Distributed traffic anomalies caused by the same source usually have similar features in the time or frequency domain. These similarities produce strong correlation between anomalous flows. Since PCA-based methods look for anomalies in the space that lacks correlation, they are prone to false negatives. Although the volume of each individual anomaly is small, the anomalous flows on many links exhibit inherent correlations, and this fact should be useful for detection. Drawing on the change of correlation within the network-wide anomalous space makes it possible to bypass the limitation of PCA-based methods. In this paper, we propose a method to detect distributed traffic anomalies with network-wide correlation analysis of instantaneous parameters. First, the instantaneous parameters of the traffic signals are computed; their network-wide anomalous space is then extracted via traffic prediction; finally, a global correlation coefficient, as a measure of the correlation between anomalous spaces, is calculated to reveal anomalies.

The contributions of this paper are as follows.

(i) We perform detection on the instantaneous amplitude and instantaneous frequency of the traffic signal, which can reveal anomalies through their time-domain and frequency-domain characteristics. To speed up the computation of instantaneous parameters, we propose a fast algorithm for computing them for anomaly detection.

(ii) We extract the anomalous space by comparing the actual instantaneous parameters of OD flows with their predictions, which overcomes the limitation of PCA in failing to detect anomalies with strong correlations.

(iii) Targeting the characteristics of distributed traffic anomalies, we detect them by correlation analysis of amplitude and frequency between anomalous spaces, rather than by volume, which makes it possible to detect anomalies that are small on a single link.

2. Related Work

Network traffic anomaly detection methods can be classified into single-node and multinode detection according to the number of traffic signals analyzed. Depending on whether the relationship between traffic signals is taken into account, multinode detection can be further divided into distributed detection and network-wide detection.

Distributed detection [3–10] selects some nodes in the network to construct a detection subnetwork. First, each node performs simple and fast local detection on the information it collects itself; second, the nodes exchange their detection results through a certain communication mechanism; then, the results of some or all nodes are combined to determine whether an anomaly has occurred. Some related systems and architectures have been reported, for instance, the distributed attack detection system (DAD) [4, 5] and the Cooperative Intrusion Traceback and Response Architecture (CITRA) [6, 7]. In addition, some work performs local detection by frequency-domain analysis, as shown in [11]. This collaborative distributed detection, which determines an anomaly from the detection results on many nodes, overcomes the limits of detecting with only a single node and effectively increases detection accuracy. However, its final result still depends to a great extent on the local result of each node, whereas a distributed network anomaly does not present obvious features on a single node, which makes it hard to detect at any one of them.

Unlike distributed detection, which tends to detect at different positions independently, network-wide detection analyzes all traffic signals together and exposes anomalies through the relationship between them. Diagnosing anomalies from a network-wide perspective was first reported in the works of Lakhina et al. [1, 2]; they apply PCA to analyze the relationship between the volumes of all links or OD flows in order to separate the anomalous part from the traffic. In 2005, Lakhina et al. [12] proposed an anomaly detection method that applies PCA to the feature distribution of network-wide traffic, and a DDoS attack detection method using multiway PCA [13]. Li et al. [14] introduced a method combining traffic sketches and subspaces for network-wide anomaly detection. Yuan and Mills [15] defined a weight vector and discovered congestion on many links by cross-correlation analysis. Huang et al. [16] detected network disruptions by applying PCA to network-wide routing update data.

Most existing network-wide detection methods are based on PCA. The main advantage of these methods is that they use the relationship among the overall traffic and can detect some anomalies effectively, especially an abrupt change of traffic at a local point. The basic idea of PCA is to treat highly correlated traffic as the normal space and to analyze only the remaining anomalous space. However, distributed traffic anomalies caused by the same source are highly correlated with each other, so PCA is prone to assign them to the normal space. Therefore, PCA-based methods may suffer from false negatives when detecting distributed network traffic anomalies. Furthermore, these methods still determine anomalies only by the value of traffic volume, which makes it difficult to distinguish relatively small distributed anomalous traffic from normal traffic. In this paper, we extract the anomalous space by comparing predictions of the traffic instantaneous parameters with their real values, and we use the degree of variation of the correlation between anomalous spaces, rather than volume, to infer anomalies.

Signal processing techniques have been widely used in traffic anomaly detection for a single node. Cheng et al. [17] found that the power spectral density (PSD) of normal TCP flows exhibits periodicity while the PSD of a DoS attack flow does not. Hussain et al. [18] used the difference of the PSD in the lower frequency band to classify attacks as single-source or multisource. Chen and Hwang [11] compared the PSD of normal traffic with that of attack traffic in the lower frequency band in order to detect periodic pulsing DDoS attacks. The PSD of a signal shows the proportion of every frequency component over the signal as a whole; however, it lacks local information and cannot tell at what time each frequency component is present, which matters most for nonstationary traffic signals whose frequency components vary with time. The instantaneous parameters provide information about the amplitude and frequency of a nonstationary signal at every time point and about how they change with time. Wang et al. [19] used the Hilbert-Huang transform [20] to acquire the instantaneous frequency of traffic as an outline of normal behavior for a single link. In this paper, we use both instantaneous parameters of an OD flow, namely, instantaneous frequency and instantaneous amplitude, and extract an anomalous space for each of them.

The main differences between our method and [19] are as follows. First, the method proposed in [19] is used for single-node detection: it attempts to find anomalies based on an obvious change of the traffic instantaneous frequency. However, the variation of instantaneous frequency caused by distributed anomalous traffic on an individual link is potentially small, which hampers that detection method. In contrast, we analyze network-wide OD flows and use the change of correlation caused by simultaneous alterations across multiple traffic signals to circumvent the difficulty caused by individual anomalies with small variation in instantaneous frequency. Second, the analysis in [19] considers only the traffic instantaneous frequency. Since anomalous traffic may affect the instantaneous frequency and the instantaneous amplitude of the background traffic differently, detection from instantaneous frequency or amplitude alone might produce false negatives. Instead, we use instantaneous amplitude as well as instantaneous frequency to achieve better detection performance.

3. Distributed Network Traffic Anomalies Detection

Distributed network traffic anomalies caused by the same source usually have similar features. For instance, anomalies arising from the same attack event, commonly generated by specific tools, might be similar in their start time, duration, interval, type, and frequency characteristics; likewise, distributed traffic anomalies caused by nonattack events, such as outages, might make the flows that traverse the location of the anomalous event change simultaneously. These similarities in both the time and frequency domains contribute to the strong correlation between anomalous flows.

Previous anomaly detection methods usually use the difference between an individual anomaly and the normal pattern to reach a judgment. However, they generally fail to detect anomalies that are relatively small on individual links. The alteration of a single anomalous flow goes unnoticed, while the common tendency of multiple anomalous flows to vary together in the time or frequency domain is easy to capture, and this collective tendency can overcome the difficulties resulting from small single anomalies. Therefore, the concept of correlation can be used to characterize the relationship between multiple flows when they change simultaneously.

As the correlation of anomalous flows is exhibited not only in the time domain but also in the frequency domain, it is advantageous to consider more kinds of features of anomalous flows in both domains for correlation analysis. Instantaneous parameters (i.e., both instantaneous amplitude and instantaneous frequency) are physical parameters that capture the transient characteristics of a signal and characterize it in different ways. In this sense, we perform correlation analysis on the two instantaneous parameters of anomalous flows to identify anomalies more extensively.

Besides the correlation of distributed anomalous flows, there is also correlation between normal traffic, such as the similar diurnal and weekly patterns. Accordingly, before we perform correlation analysis on anomalies, it is necessary to eliminate the correlation between normal traffic so that it does not affect the detection result; this is equivalent to extracting the anomalous space from the whole traffic signal.

The detection steps are depicted in Figure 1. First, we compute the instantaneous parameters of every OD flow to get their instantaneous amplitude and frequency; then we model the instantaneous parameters with corresponding time series models, and the difference between the actual data and the predictions is used to approximate the anomalous space, which includes the abnormal flows; finally, network-wide correlation analysis is performed on the anomalous space, and distributed traffic anomalies are detected by the degree of variation of the correlation. The computation of instantaneous parameters, the extraction of the anomalous space, and the network-wide correlation analysis are elaborated in Sections 4, 5, and 6, respectively.

Figure 1: Distributed network traffic anomaly detection steps (traffic signal → instantaneous parameters computation → anomalous space extraction → network-wide correlation analysis).

4. Instantaneous Parameters and Fast Algorithm

4.1. Instantaneous Parameters. A traffic signal is nonstationary: it varies with time, and so does its frequency content. The instantaneous characteristics of a nonstationary signal are generally captured by instantaneous parameters, namely instantaneous amplitude (IA) and instantaneous frequency (IF), which separate the amplitude and frequency information. They do not change the nature of the signal but rather reflect different aspects of it, and they tend to reveal characteristics of the signal that are hidden in the usual time-domain description. The instantaneous parameters are defined as follows. For any continuous-time signal $X(t)$, its Hilbert transform is

$$Y(t) = \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{X(\tau)}{t - \tau}\, d\tau,$$

and the analytic signal $Z(t)$ is obtained as $Z(t) = X(t) + iY(t) = a(t)e^{i\theta(t)}$, where $\theta(t) = \arctan(Y(t)/X(t))$ is the phase function of $Z(t)$. The instantaneous amplitude of $Z(t)$ is computed by

$$a(t) = \left[X(t)^2 + Y(t)^2\right]^{1/2}. \tag{1}$$

The instantaneous frequency $\omega(t)$ is defined as

$$\omega(t) = \frac{d\theta(t)}{dt}. \tag{2}$$
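As a rough illustration of definitions (1) and (2), the following Python sketch computes the analytic signal of a sampled series with a direct discrete Fourier transform and then derives the instantaneous amplitude and frequency. The function names, the DFT-based construction, the four-quadrant phase (`cmath.phase` rather than a plain arctangent), and the finite-difference approximation of the derivative are assumptions made for this example, not details given in the paper.

```python
import cmath
import math


def analytic_signal(x):
    """Return Z = X + iY for a real sequence x, using a direct O(n^2) DFT.

    Assumption: the Hilbert transform of a finite sampled series is
    obtained by removing the negative-frequency half of its spectrum.
    """
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]
    # Keep DC (and Nyquist when n is even), double the positive
    # frequencies, and drop the negative frequencies.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
    for f in range(1, (n + 1) // 2):
        gain[f] = 2.0
    return [
        sum(gain[f] * spectrum[f] * cmath.exp(2j * math.pi * f * t / n)
            for f in range(n)) / n
        for t in range(n)
    ]


def instantaneous_parameters(x):
    """Return (amplitude, frequency) of x; frequency is in radians/sample."""
    z = analytic_signal(x)
    amplitude = [abs(v) for v in z]            # a(t) = |Z(t)|, as in (1)
    phase = [cmath.phase(v) for v in z]        # theta(t)
    frequency = []
    for prev, cur in zip(phase, phase[1:]):
        step = cur - prev                      # finite difference for (2)
        # Wrap the phase step so that jumps of 2*pi are not counted.
        step -= 2 * math.pi * round(step / (2 * math.pi))
        frequency.append(step)
    return amplitude, frequency
```

For a pure cosine, the amplitude returned by this sketch is constant and the frequency equals the angular frequency of the cosine, which is a convenient sanity check before applying it to traffic series.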

4.2. Fast Algorithm for Instantaneous Parameters Computation. Anomaly detection usually has to run online. Computing the instantaneous parameters by convolution requires the whole traffic series, which cannot meet the needs of real-time operation. Accordingly, in practice a sliding window can be moved along the traffic to intercept data from it. While the window slides, the two data sets intercepted before and after a move always share a common part, and computing that part twice would produce many redundant results. It is therefore convenient to store the instantaneous parameters of the common part in advance and to compute only the new data intercepted by the window, which avoids repeated calculation and improves detection speed.

Figure 2: Fast algorithm of instantaneous parameters computation (windows S1 and S2 along the time axis t, with segments labeled (1) to (5)).

Let $S_1$ be the traffic data set intercepted by the sliding window at a certain time, and let $N$ be the length of the window. The kernel of the Hilbert transform, $1/(\pi t)$, can be considered as a filter of length $2L$, so the Hilbert transform of $S_1(k)$ can be written as

$$HS_1(k) = \sum_{i=-L}^{L} \frac{S_1(k)}{(k - i)\pi}, \quad k = 0, 1, \ldots, N. \tag{3}$$

When $k - i < 0$, namely $0 \le k \le L$, the data of $S_k$ in this section are out of range and must be processed separately; this section is at the beginning of the signal. When $k - i > N$, the data in this section are likewise out of range and must be processed separately; this section is at the end of the signal. When $L \le k \le N - L$, the data in this section undergo the normal convolution.

As it moves along the traffic data, the sliding window samples the data to get another signal $S_2$ after every time lapse of $\Delta T$, as depicted in Figure 2. It is composed as follows:

$$S_2(k) = \begin{cases} S_1(k - \Delta N), & 0 \le k \le N - \Delta N, \\ \text{new input data}, & N - \Delta N < k \le N. \end{cases} \tag{4}$$

The data of $S_1$ in the section $L + \Delta N \le k \le N - L$ are the same as the data of $S_2$ in the section $L \le k \le N - L - \Delta N$, so the instantaneous parameters $IP_1(k)$ and $IP_2(k)$ of this part are the same, as represented in Figure 2(2). The number of shared points is $M = N - 2L - \Delta N$. Therefore, as long as $N > L$, we only need to compute the instantaneous parameters of $S_2$ in the section $k \in [0, L] \cup [N - (L + \Delta N), N]$.

The fast calculation of instantaneous parameters includes 4 steps.

(i) Compute the instantaneous parameters $IP_1(k)$ of signal $S_1$, and store the $k \in [L, N - L - \Delta N]$ part of $IP_1(k)$ to be the section of $IP_2(k)$ for $k \in [L, N - L - \Delta N]$, which is represented in Figure 2(2).

(ii) According to the principle of periodic data repetition, which deals with data beyond the boundary, pick up the part $k \in [N - L, N]$ from $S_2$ and convolve it with the filter to get the section of $IP_2(k)$ for $k \in [0, L)$, as shown in Figure 2(4).

(iii) Pick up the part $k \in [0, L]$ from $S_2$ and convolve it with the filter to get the section of $IP_2(k)$ for $k \in (N - (L + \Delta N), N)$, as shown in Figure 2(5).

(iv) Combining the three steps above, we get the whole instantaneous parameters $IP_2(k)$ of $S_2$.
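The four steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the Hilbert kernel $1/(\pi i)$ is truncated to `half_len` ($L$) taps on each side, out-of-range samples are taken by periodic repetition of the window, windows are indexed from 0 to $N - 1$, and the second window starts `shift` ($\Delta N$) samples later in the series than the first. Only the instantaneous amplitude is shown, and all function names are hypothetical.

```python
import math


def hilbert_at(window, k, half_len):
    """Truncated Hilbert filter output at index k (periodic boundaries)."""
    n = len(window)
    return sum(window[(k - i) % n] / (math.pi * i)
               for i in range(-half_len, half_len + 1) if i != 0)


def amplitude_at(window, k, half_len):
    """Instantaneous amplitude sqrt(X^2 + Y^2) at index k."""
    return math.hypot(window[k], hilbert_at(window, k, half_len))


def full_amplitude(window, half_len):
    """Reference computation: evaluate every index of the window."""
    return [amplitude_at(window, k, half_len) for k in range(len(window))]


def fast_amplitude(prev_amp, window, half_len, shift):
    """Amplitude of the new window, reusing results of the previous one.

    prev_amp holds the amplitudes of the previous window, which started
    `shift` samples earlier. Interior indices never touch the periodic
    boundary, so their values can be copied instead of recomputed.
    """
    n = len(window)
    reuse_end = n - half_len - shift           # exclusive upper bound
    result = []
    for k in range(n):
        if half_len <= k < reuse_end:
            result.append(prev_amp[k + shift])  # stored shared part
        else:
            result.append(amplitude_at(window, k, half_len))  # edges only
    return result
```

With $N = 64$, $L = 8$, and $\Delta N = 4$, this sketch copies $M = N - 2L - \Delta N = 44$ stored values and recomputes only the remaining 20 edge values.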

Compared with the normal computation, the fast algorithm of instantaneous parameters based on the sliding window adds an array of length $M$ ($M = N - 2L - \Delta N$) to record the part shared by $IP_1(k)$ and $IP_2(k)$. When calculating $IP_2(k)$, the part shared with $IP_1(k)$ can be transferred directly to the result, which improves the computation speed of instantaneous parameters.

Figure 3: Anomaly in a single flow. (a) Adding one anomaly in no. 26 OD flow (between vertical dash lines). (b) No. 50 OD flow unstained. Axes: Traffic (×10^7) versus Time (5 minutes).

Figure 4: Two anomalies in two flows. (a) Adding one anomaly in no. 26 OD flow (between vertical dash lines). (b) Adding one anomaly in no. 50 OD flow (between vertical dash lines). Axes: Traffic (×10^7) versus Time (5 minutes).

Figure 5: PCA for one anomaly in two flows. Axes: Residual vector (×10^14) versus Time (5 minutes).

Figure 6: PCA for two anomalies in two flows. Axes: Residual vector (×10^14) versus Time (5 minutes).

5. Anomalous Space Extraction

The anomalous space is extracted from the traffic signal by removing the normal traffic behavior. Most network-wide traffic anomaly detection methods are PCA-based: they draw on PCA to divide traffic into a normal and an abnormal space, and traffic is assigned to the normal part when it shares a strong temporal trend among links or OD flows. This performs well in detecting an abrupt local change in a single traffic flow but may be limited in the case of distributed traffic anomalies, because anomalies with strong correlation are possibly assigned to the normal space. We illustrate this by changing the number of anomalous flows.

Figure 3 shows the traffic of OD flows no. 26 and no. 50 of the Abilene network (more detail in Section 7.1) in the 3rd week. In Figure 3(a), we inject one anomaly of five times the flow's mean into OD flow no. 26, from sample point 1000 to 1004; it corresponds to the spike and can easily be isolated visually. OD flow no. 50 is left unchanged. The anomalous space derived by PCA is depicted in Figure 5, and the abrupt change of OD flow no. 26 is correctly partitioned. In the same way, we inject another anomaly of five times the flow's mean and the same duration into OD flow no. 50, as shown in Figure 4(b). The two anomalies are similar in their start time, duration, and change of volume. The outcome of PCA for the two anomalies is shown in Figure 6. It shows that the anomalies near the 1000th sample point are not assigned to the anomalous space; instead, they are considered normal because of their strong correlation. Therefore, the PCA method cannot separate the anomalous space for distributed traffic anomalies with strong correlation.

Observation of normal OD flows shows that traffic usually consists of a normal part and a part representing random factors, which might result from the accidental behavior of users when there is no anomaly. Owing to the similar daily and weekly patterns of traffic, the normal parts must be correlated to some degree. If the behavior of normal traffic is removed, the residuals of different OD flows should not be correlated, which means that the residual traffic of different flows is independent. When an anomaly occurs, however, the anomalous flows are strongly correlated. For this reason, the correlation of normal traffic needs to be suppressed. ARIMA($p$, $d$, $q$) (autoregressive integrated moving average) models [21, 22] are adopted to forecast the instantaneous parameters of OD flows; the predictions, as an estimate of the normal pattern, are subtracted from the actual data to remove the normal behavior, and the residual, which represents the anomalous space, is used in the subsequent correlation analysis.
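A minimal sketch of the residual extraction described above is given below. As a simplification, it stands in for a general ARIMA($p$, $d$, $q$) model with an ARIMA(1, 1, 0) model without a constant term, whose single coefficient is estimated by least squares on the differenced series. The paper does not state the model orders or the fitting procedure, so these choices, the sign convention (actual value minus prediction), and the function names are assumptions.

```python
def fit_phi(series):
    """Least-squares AR(1) coefficient of the first-differenced series."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(cur * prev for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den if den else 0.0


def anomalous_space(series, phi):
    """Residuals (actual minus one-step prediction) for t = 2 .. len - 1.

    The prediction x_hat(t) = x(t-1) + phi * (x(t-1) - x(t-2)) plays the
    role of the normal pattern; what remains approximates the anomalous
    space of this instantaneous parameter series.
    """
    residuals = []
    for t in range(2, len(series)):
        predicted = series[t - 1] + phi * (series[t - 1] - series[t - 2])
        residuals.append(series[t] - predicted)
    return residuals
```

In practice the coefficient would be fitted on a period assumed to be normal and then applied to new data, so that an injected spike shows up in the residual rather than being absorbed by the model.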

For the two injected anomalies shown in Figure 4, which are strongly correlated in the time domain, we extract the anomalous space of the instantaneous amplitude with our method. The result is shown in Figures 7(a) and 7(b): the similar changing tendencies of the anomalies in instantaneous amplitude are captured accurately. This similarity contributes to the strong correlation of anomalies, which is discussed in Section 6.

6. Network-Wide Correlation Analysis

6.1. Network-Wide Correlation Analysis for Anomalous Space of OD Flows. The correlation between the anomalous spaces of two different OD flows in the time or frequency domain can be measured by the statistical correlation coefficient, which is defined as follows.

Let $X$ and $Y$ be two random variables with variances $D(X)$ and $D(Y)$, respectively. The covariance of $X$ and $Y$ is $\mathrm{Cov}(X, Y) = E\{[X - E(X)][Y - E(Y)]\}$, and the correlation coefficient of $X$ and $Y$ is computed by

$$\rho_{xy} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{D(X)D(Y)}}. \tag{5}$$

The correlation coefficient is a measure of the linear relationship between two variables. The absolute value of $\rho_{xy}$ varies between 0 and 1, with 1 indicating a perfect linear relationship and $\rho_{xy} = 0$ indicating no linear relationship.
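Equation (5) can be evaluated on two sampled residual sequences as in the following sketch, which uses population (divide-by-n) moments. The function name and the convention of returning 0 when either sequence has zero variance are assumptions for illustration.

```python
import math


def correlation_coefficient(x, y):
    """Sample version of (5): Cov(X, Y) / sqrt(D(X) * D(Y))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    if var_x == 0 or var_y == 0:
        return 0.0  # assumed convention: undefined correlation reported as 0
    return cov / math.sqrt(var_x * var_y)
```

Two sequences that rise and fall together give a value near 1, while sequences that move in opposite directions give a value near -1, which is why the absolute value is the quantity of interest.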

Because of path differences and delays in the network, distributed anomalous flows may not arise at the same time, so it is not wise to consider the correlation of two anomalous spaces only over the same period. Two sliding windows are introduced to calculate the correlation coefficient between two neighboring periods.

As shown in Figure 8, $O_i$ and $O_j$ are the anomalous spaces extracted from two different OD flows. Window $w_1$ starts at time $t$ and intercepts the data of $O_i$ with a length of $w_1$ as one vector. For the other anomalous space $O_j$, a window whose start point varies within $(t - w_2, t + w_2)$ intercepts the same length of data as the other vector. Every time the start point of the window on $O_j$ moves, a correlation coefficient can