
A NEW INFORMATION THEORETIC MEASURE OF PDF SYMMETRY

Weifeng Liu, P. P. Pokharel, Jose C. Principe

CNEL, University of Florida

ABSTRACT

In this paper, a new quantity called symmetric information

potential (SIP) is proposed to measure the reflection

symmetry and to estimate the location parameter of

probability density functions. SIP is defined as an inner

product in the probability density function space and has a

close relation to information theoretic learning. A simple

nonparametric estimator directly from data exists.

Experiments demonstrate that this concept can be very

useful in dealing with impulsive data distributions, in

particular α-stable distributions.*

Index Terms— Symmetric distributions, information

theoretic learning, robust statistics

1. INTRODUCTION

A fundamental task in statistical analysis is to characterize

the location and variability of a data set. Further

characterization of the data includes skewness and kurtosis.

In this paper, we mainly address two of these important

issues: location and skewness.

The estimation of a location parameter aims to find a

typical or central value that best describes the data. For

univariate data, the mean, median, and mode are three common

choices [9]. However, if the data is Cauchy distributed, the

mean becomes useless, because collecting more data does

not provide a more accurate estimate [9].

Another important characteristic of a data distribution is

skewness, which is a measure of symmetry. A distribution is

symmetric if it looks the same to the left and right of the

center point. For the data set

{ }N

x

as

1

sk(

i

i

x

?

?

where x and s are the mean and the standard deviation of

the data. By definition, the skewness measures symmetry

with respect to the mean. If the mean estimate is

meaningless as in the Cauchy distribution, the skewness is

also meaningless although the Cauchy distribution is

obviously symmetric. In this paper, we try to solve this

dilemma by defining a new concept of center and a new

1

i i

?, the skewness is defined

33

) /(1)

N

xNs

???

(1)

?This work was partially supported by NSF grant ECS-0601271.

symmetry measure of data distributions based on

information theoretic learning [1].
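The sample skewness in (1) is straightforward to implement; the following is a minimal numpy sketch (ours, not part of the paper):

```python
import numpy as np

def skewness(x):
    # Sample skewness, Eq. (1): sum of cubed deviations over (N-1)*s^3,
    # with s the (N-1)-normalized sample standard deviation.
    x = np.asarray(x, dtype=float)
    n = x.size
    s = x.std(ddof=1)
    return np.sum((x - x.mean()) ** 3) / ((n - 1) * s ** 3)

# A symmetric sample has skewness 0; a right-skewed one is positive.
print(skewness([-2.0, -1.0, 0.0, 1.0, 2.0]))   # 0.0 for this symmetric set
```

For heavy-tailed data, the point of the paper is precisely that this statistic becomes unreliable, as the experiments in Section 4 show.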

The organization of the paper is as follows. In section

2, a brief review is given about information theoretic

learning. Then the definition and properties of symmetric

information potential (SIP) are presented, based on which

the Euclidean symmetry measure, Cauchy-Schwartz

symmetry measure and the reflection point are defined in

section 3. Possible applications are discussed in section 4.

Finally, section 5 summarizes the main conclusions.

2. INFORMATION THEORETIC LEARNING

Given i.i.d. samples $\{x_i\}_{i=1}^{N}$ drawn from $f_X(x)$, the Parzen

estimator [2] of the PDF is

$$\hat{f}_X(x) = \frac{1}{N}\sum_{i=1}^{N} G_{\sigma}(x - x_i) \qquad (2)$$

where $G_{\sigma}(x - x_i)$ is the Gaussian kernel and $\sigma$ is the

kernel size:

$$G_{\sigma}(x - x_i) = \exp\left(-(x - x_i)^2 / 2\sigma^2\right) \big/ \sqrt{2\pi\sigma^2} \qquad (3)$$

Renyi's quadratic entropy of a random variable $X$ with

PDF $f_X(x)$ is defined by

$$H_2(X) = -\log \int f_X^{\,2}(x)\, dx \qquad (4)$$

The argument of the log function in (4) is called information

potential (IP) since the PDF estimated with Parzen kernels

can be thought to define an information potential field over

the space of the samples [3]. A non-parametric estimator of

the IP (and thus of quadratic Renyi’s entropy) directly from

samples is obtained through (2)

$$\widehat{IP}(X) = \frac{1}{N^2}\sum_{j=1}^{N}\sum_{i=1}^{N} G_{\sigma\sqrt{2}}(x_j - x_i) \qquad (5)$$

3. SYMMETRIC INFORMATION POTENTIAL

3.1. Definition

Definition: Suppose the PDF of a random variable $X$ is

$f_X(x)$. The symmetric information potential (SIP) of $X$ is

defined as

$$SIP(X) = \int f_X(x) f_X(-x)\, dx \qquad (6)$$

With i.i.d. samples $\{x_i\}_{i=1}^{N}$ drawn from $X$, a nonparametric

estimator is obtained as

1-4244-0728-1/07/$20.00 ©2007 IEEE    ICASSP 2007


$$\widehat{SIP}(X) = \frac{1}{N^2}\sum_{j=1}^{N}\sum_{i=1}^{N} G_{\sigma\sqrt{2}}(x_j + x_i) \qquad (7)$$

Comparing (7) with (5), we see that SIP is very similar to IP

and furthermore when the distribution is symmetric,

i.e. $f_X(x) = f_X(-x)$, SIP reduces to IP by definition.
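Both estimators are simple $O(N^2)$ pairwise kernel sums. The following is a minimal numpy sketch (ours, not the authors' code; the kernel size follows Silverman's rule [5], as in the experiments of Section 4):

```python
import numpy as np

def gauss(u, sigma):
    # Gaussian kernel G_sigma(u), Eq. (3)
    return np.exp(-u**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def ip(x, sigma):
    # Information potential estimate, Eq. (5): kernel of size sigma*sqrt(2)
    # applied to all pairwise differences x_j - x_i.
    return gauss(x[:, None] - x[None, :], sigma * np.sqrt(2)).mean()

def sip(x, sigma):
    # Symmetric information potential estimate, Eq. (7): pairwise sums x_j + x_i.
    return gauss(x[:, None] + x[None, :], sigma * np.sqrt(2)).mean()

rng = np.random.default_rng(0)
x = rng.laplace(size=1000)                     # symmetric about the origin
sigma = 1.06 * x.std(ddof=1) * x.size ** -0.2  # Silverman's rule [5]
print(sip(x, sigma) / ip(x, sigma))            # near 1: SIP ~ IP for a symmetric PDF
```

Note the kernel size $\sigma\sqrt{2}$: it arises because the plug-in integrals of two Parzen estimates reduce to a single Gaussian evaluated at the pairwise sum or difference.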

3.2. Properties

Let X be a random variable.

Property 1: $SIP(X) \ge 0$.

Property 2: $SIP(X) \le IP(X)$. Equality holds if and only if

$f_X(x) = f_X(-x)$.

Proof: By the Cauchy-Schwartz inequality,

$$SIP(X) = \int f_X(x) f_X(-x)\, dx \le \left(\int f_X^{\,2}(x)\, dx\right)^{1/2} \left(\int f_X^{\,2}(-x)\, dx\right)^{1/2} = IP(X) \qquad (8)$$

Equality holds when $f_X(-x) = a f_X(x)$, where $a$ is some

real positive constant. Since

$$\int f_X(x)\, dx = \int f_X(-x)\, dx = 1 \qquad (9)$$

$a$ can only be 1. This completes the proof.

Property 3: Define a new random variable $Y = X_1 + X_2$,

where $X_1$ is independent of $X_2$ but they have the same

PDF $f_X(x)$. Then $SIP(X)$ equals the probability density of

$Y$ at 0, i.e. $SIP(X) = f_Y(0)$.

Proof: Since $X_1$ and $X_2$ are independent, the PDF of Y is

$$f_Y(y) = \int f_X(x) f_X(y - x)\, dx \qquad (10)$$

Therefore,

$$SIP(X) = \int f_X(x) f_X(-x)\, dx = f_Y(0) \qquad (11)$$

This completes the proof. With only the data available and

by using (2) in (10), we have

$$\hat{f}_Y(y) = \frac{1}{N^2}\sum_{j=1}^{N}\sum_{i=1}^{N} G_{\sigma\sqrt{2}}(x_j + x_i - y) \qquad (12)$$

This is exactly the Parzen estimator of $f_Y(y)$ if $\{(x_i + x_j)\}_{i,j=1}^{N}$

are regarded as the realizations drawn from Y.
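Property 3 also carries over to the estimators: evaluating the Parzen estimate (12) of $f_Y$ at $y = 0$ reproduces the SIP estimator (7) exactly. A small numpy check (our sketch):

```python
import numpy as np

def gauss(u, sigma):
    # Gaussian kernel, Eq. (3)
    return np.exp(-u**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

rng = np.random.default_rng(1)
x = rng.standard_normal(200)
sigma = 0.5  # any fixed kernel size

# The N^2 pairwise sums are the realizations of Y = X1 + X2, Eq. (12)
pair_sums = (x[:, None] + x[None, :]).ravel()

def f_y_hat(y):
    # Parzen estimate of f_Y with kernel size sigma*sqrt(2), Eq. (12)
    return gauss(pair_sums - y, sigma * np.sqrt(2)).mean()

sip_hat = gauss(x[:, None] + x[None, :], sigma * np.sqrt(2)).mean()  # Eq. (7)
print(np.isclose(f_y_hat(0.0), sip_hat))   # True: identical by construction
```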

3.3. Reflection Symmetry Measure

It is natural to measure symmetry of a data distribution by

comparing its PDF $f_X(x)$ with its mirror image $f_X(-x)$,

since this is the fundamental definition of reflection

symmetry. If we treat $f_X(x)$ and $f_X(-x)$ as two points in

the PDF space, their Euclidean distance is

$$S_{ED} = \int \left(f_X(x) - f_X(-x)\right)^2 dx = \int f_X^{\,2}(x)\, dx + \int f_X^{\,2}(-x)\, dx - 2\int f_X(x) f_X(-x)\, dx = 2\left(IP(X) - SIP(X)\right) \qquad (13)$$

By the properties of SIP, we see that

(1) $0 \le S_{ED} \le 2\,IP(X)$;

(2) $S_{ED} = 0$ iff $f_X(x) = f_X(-x)$.

$S_{ED}$ is called the Euclidean symmetry measure. Another

way to define 'distance' in a vector space is to measure the

angle between two vectors. Therefore, the Cauchy-Schwartz

symmetry measure $S_{CS}$ is

$$S_{CS} = \frac{\int f_X(x) f_X(-x)\, dx}{\left(\int f_X^{\,2}(x)\, dx\right)^{1/2} \left(\int f_X^{\,2}(-x)\, dx\right)^{1/2}} = \frac{SIP(X)}{IP(X)} \qquad (14)$$

This is a normalized version of SIP, and it denotes the cosine

of the angle between $f_X(x)$ and $f_X(-x)$. $S_{CS}$ has the

following properties:

(1) $0 \le S_{CS} \le 1$;

(2) $S_{CS} = 1$ iff $f_X(x) = f_X(-x)$.

$S_{ED}$ and $S_{CS}$ are almost equivalent. Most importantly,

they possess nice non-parametric estimators directly from

data whereas other forms of divergence such as Kullback-

Leibler are computationally expensive. Practically, $S_{CS}$ is

more appropriate than $S_{ED}$ since it removes the effect of the

'norm' of the PDF.

3.4. Reflection Point

In the previous definitions, we set the reflection point at the

origin by default. However, it would be appealing in some

cases to have the symmetry as an internal property of the

data distribution and independent of the external coordinate

system. As defined in (1), the skewness measures symmetry

with respect to the data mean, and is therefore shift-

invariant. By analogy, the center of the data can be first

estimated with a subsequent shift to the origin. However,

this means our definition of symmetry depends on the

particular choice of which point is the center and in practice

depends on what kind of method is used to estimate it.

Intuitively, the concepts of reflection point and reflection

symmetry are co-dependent. Hence it is logically sound to

define the reflection point as the point with respect to which

maximal symmetry is achieved. The reflection point Rp can

be used to represent the ‘center’ of the data distribution.

Assume we shift the data by $t$. Then $S_{CS}$ becomes

$$S_{CS}(t) = \frac{\int f_X(x + t) f_X(-x + t)\, dx}{\left(\int f_X^{\,2}(x + t)\, dx\right)^{1/2} \left(\int f_X^{\,2}(-x + t)\, dx\right)^{1/2}} \qquad (15)$$

The denominator, i.e. the IP, is shift-invariant, and the

numerator turns out to be $\int f_X(x) f_X(2t - x)\, dx = f_Y(2t)$. As a result, the

reflection point is defined as


$$Rp = \arg\max_t \int f_X(x) f_X(2t - x)\, dx = \arg\max_t f_Y(2t) \qquad (16)$$

Consequently, finding the reflection point of $f_X(x)$ is

equivalent to finding the main mode of $f_Y(y)$. Since $Y$ is

the sum of two independent random variables, $Y$ would be

'closer' to the normal distribution than $X$ by the central limit

theorem. Thus it is advantageous to deal with $f_Y(y)$ instead

of $f_X(x)$, except for the increased computation. Finding the

main mode of $f_Y(y)$ can be accomplished by the mean

shift algorithm or by gradient methods [4].

4. APPLICATIONS
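A direct way to approximate Rp is a grid search over $t$ (our sketch; it stands in for the mean shift or gradient search suggested above, and the grid and fixed kernel size are our assumed choices):

```python
import numpy as np

def estimate_rp(x, sigma, half_width=5.0, step=0.025):
    # Eq. (16): Rp = argmax_t f_Y(2t), with f_Y the Parzen estimate (12)
    # built from the N^2 pairwise sums x_i + x_j.
    sums = (x[:, None] + x[None, :]).ravel()
    grid = np.arange(np.median(x) - half_width, np.median(x) + half_width, step)
    k2 = 2 * (sigma * np.sqrt(2)) ** 2
    # Kernel normalization dropped: it does not move the argmax.
    scores = np.array([np.exp(-(sums - 2 * t) ** 2 / k2).sum() for t in grid])
    return grid[np.argmax(scores)]

rng = np.random.default_rng(0)
x = 3.0 + rng.standard_cauchy(300)   # Cauchy centered at 3: the mean is useless
print(estimate_rp(x, sigma=1.0))     # close to 3
```

Because $f_Y$ is smoother and closer to Gaussian than $f_X$, even a coarse search over $t$ locates the center reliably in heavy-tailed data.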

4.1 Measuring Symmetry of data distributions

In the first example, we compare our method with skewness

in assessing symmetry of data drawn from Cauchy,

Laplacian and exponential distributions (Table I).

TABLE I
DISTRIBUTIONS USED IN THE FIRST EXAMPLE

Distribution        PDF
Cauchy              $f(x) = 1/\left(\pi(1 + x^2)\right)$
Laplacian           $f(x) = \exp(-|x|)/2$
Exponential         $f(x) = \exp(-x),\ x \ge 0$
2-Cauchy Mixture    $f(x) = \dfrac{1}{2\pi\left(1 + (x-2)^2\right)} + \dfrac{1}{2\pi\left(1 + (x+2)^2\right)}$

For each distribution, 1000 points are drawn to estimate the

mean, skewness, Rp and $S_{CS}$. 100 Monte Carlo realizations

are run for each distribution. After that, we calculate the

average and the standard deviation of these estimates. All

distributions and their parameters used in the simulation are

listed in Table I. The kernel size is chosen by Silverman's

rule [5]. The results are summarized in Table II.

$S_{CS}$ has a much smaller estimation variance than the

skewness. In the case of symmetric, heavy-tailed

distributions like the Cauchy and Laplacian, Rp also provides a

much more accurate estimate of the center than the mean. In

most cases, the skewness has such a large variance that its

estimate is uninformative. The reason why $S_{CS}$ is so well

suited to impulsive distributions is beyond the

scope of this paper (see [8]).

The α-stable distributions are a family of heavy-tailed

distributions widely used in financial analysis [6]. An α-stable

distribution requires four parameters for complete

description: an index of stability $\alpha \in (0, 2]$, a skewness

parameter $\beta \in [-1, 1]$, a scale parameter $\gamma > 0$ and a location

parameter $\delta$, denoted as $S_\alpha(\gamma, \beta, \delta)$. When $\alpha = 2$, the

Gaussian distribution results. The Cauchy distribution is a

special case when $\alpha = 1$ and $\beta = 0$. When $\alpha < 2$, the

variance is infinite and the tails are asymptotically

equivalent to a Pareto law. The estimation of stable-law

parameters is in general severely hampered by the lack of

known closed-form density functions. As far as we know,

simple descriptors for α-stable distributions are still lacking

due to the flat tails and asymmetry. In this example, we

show that the newly defined $S_{CS}$ can be used to characterize

the skewness of these distributions. We fix three parameters,

$\alpha = 0.8$, $\gamma = 1$, $\delta = 0$, and vary $\beta$ from 0

to 0.9. For each $\beta$, 2000 data are generated to estimate the

skewness and $S_{CS}$. 100 Monte Carlo simulations are run so

that the average and the standard deviation are calculated

for both estimates (Fig. 1). The skewness is estimated in

two ways: one with the entire data and the other with 5% of the data

trimmed off. As we see, a smooth monotonic curve is

obtained between $S_{CS}$ and $\beta$, whereas the skewness is

almost uninformative.
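The sweep can be reproduced in outline with scipy's stable-law sampler (a sketch under our assumptions: we rely on `scipy.stats.levy_stable` with its default parameterization, and substitute a robust, IQR-based kernel size for Silverman's rule, since the sample standard deviation diverges when $\alpha < 2$):

```python
import numpy as np
from scipy.stats import levy_stable

def gauss(u, sigma):
    # Gaussian kernel, Eq. (3)
    return np.exp(-u**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def s_cs(x, sigma):
    # S_CS = SIP/IP, Eq. (14), from the pairwise estimators (5) and (7)
    k = sigma * np.sqrt(2)
    return gauss(x[:, None] + x[None, :], k).mean() / \
           gauss(x[:, None] - x[None, :], k).mean()

rng = np.random.default_rng(0)
for beta in (0.0, 0.3, 0.6, 0.9):
    x = levy_stable.rvs(0.8, beta, size=2000, random_state=rng)
    # IQR-based kernel size: robust to the heavy tails (our choice)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    sigma = 0.79 * iqr * x.size ** -0.2
    print(beta, s_cs(x, sigma))   # S_CS tends to fall as beta grows
```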

Fig. 1. Estimates of $S_{CS}$ and skewness with different $\beta$ (three panels: skewness, trimmed skewness, and $S_{CS}$ versus $\beta \in [0, 1]$).

4.2 Signal detection in symmetrically distributed noises

In this example, we utilize the property of Rp in a signal

detection problem. In a digital communication system, let $S$

and $R$ be respectively the transmitted signal and the

received signal corrupted by the additive noise $N$:

$$R = S + N \qquad (17)$$

Suppose $S = s_i$, which is either 0 or 1 with equal

probability. The noise can be any symmetric distribution with

reflection point at the origin. $S_{1.2}(1, 0, 0)$ α-stable and 2-Cauchy

mixture noises (in Table I) are used in this simulation. The

signal-to-noise ratio (SNR) is simply controlled by scaling the

noise. When an interval of signal is observed, T samples are

obtained, based on which a decision is made whether the

transmission is 0 or 1. The mean square error criterion is

commonly used, while M-estimation is more effective in

impulsive noises [7]. Assume the sampled received signal is

$\{r_t\}_{t=1}^{T}$. For the Rp detection method, the following criterion

is used:


$$\text{Decide } 1 \text{ if } \sum_{t=1}^{T}\sum_{\tau=1}^{T} G_{\sigma\sqrt{2}}(r_t + r_\tau - 2) \ge \sum_{t=1}^{T}\sum_{\tau=1}^{T} G_{\sigma\sqrt{2}}(r_t + r_\tau); \text{ decide } 0 \text{ otherwise.} \qquad (18)$$

We set T = 20 and transmit $10^4$ bits in each simulation to

calculate the bit error rate (BER). The kernel size is set

according to Silverman's rule.
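The rule in (18) compares the pairwise-sum kernel sum at the two candidate reflection points: bit 1 shifts the reflection point to 1, so $f_Y$ peaks at 2, while bit 0 leaves the peak at the origin. A minimal sketch of ours (the kernel normalization is dropped since it cancels across the comparison):

```python
import numpy as np

def rp_detect(r, sigma):
    # Eq. (18): decide 1 if the pairwise-sum density is larger at y = 2
    # (reflection point 1) than at y = 0 (reflection point 0).
    s = r[:, None] + r[None, :]
    k2 = 2 * (sigma * np.sqrt(2)) ** 2
    score_one = np.exp(-(s - 2.0) ** 2 / k2).sum()
    score_zero = np.exp(-s ** 2 / k2).sum()
    return 1 if score_one >= score_zero else 0

rng = np.random.default_rng(0)
bit = 1
r = bit + 0.3 * rng.standard_cauchy(20)   # T = 20 samples in impulsive noise
print(rp_detect(r, sigma=0.5))
```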

Fig. 2. Detection performance with α-stable noise (BER versus SNR for the MSE, M-estimation, Rp, and trimmed-MSE detectors).

Fig. 3. Detection performance with 2-Cauchy mixture noise (BER versus SNR for the same detectors).

The noise power and the trimmed MSE are estimated from

5% trimmed data. As we see in Fig. 3, the mode detection

fails in the case of the 2-Cauchy mixture because there is a dip

in the noise PDF at the center.

5. CONCLUSIONS

In this paper, a new quantity, the symmetric information

potential (SIP), is proposed. Its mathematical meaning,

probabilistic interpretation and relation to the information

potential are presented. Based on this understanding, the

descriptors $S_{ED}$, $S_{CS}$ and Rp are defined to quantify the

reflection symmetry and to estimate the location parameter

of data distributions. All these methods are non-parametric

and robust, so they are very useful for characterizing impulsive

data distributions. Examples demonstrate that the newly

proposed methods outperform the conventional mean and

skewness in estimating the location parameter and

quantifying symmetry in Laplacian, α-stable and other

mixture models. Future work includes the bias and variance

analysis of the SIP, detailed parameter estimation methods

for α-stable distributions, and possible applications in

supervised learning.

6. REFERENCES

[1] D. Erdogmus and J. C. Principe, "From linear adaptive filtering

to nonlinear information processing," IEEE Signal Processing

Magazine, Nov. 2006.

[2] E. Parzen, "On estimation of a probability density function

and mode," Ann. Math. Statist., vol. 33, pp. 1065-1076, 1962.

[3] D. Erdogmus and J. C. Principe, "Generalized information

potential criterion for adaptive system training," IEEE Trans. Neural

Networks, vol. 13, no. 5, pp. 1035-1044, Sept. 2002.

[4] Y. Cheng, "Mean shift, mode seeking, and clustering," IEEE

Trans. Pattern Anal. Mach. Intell., vol. 17, no. 8, pp. 790-799,

Aug. 1995.

[5] B. W. Silverman, Density Estimation for Statistics and Data

Analysis. London: Chapman and Hall, 1986.

[6] J. P. Nolan, "Numerical calculation of stable densities and

distribution functions," Communications in Statistics - Stochastic

Models, vol. 13, pp. 759-774, 1997.

[7] W. Liu, P. P. Pokharel, and J. C. Principe, "Correntropy: a

localized similarity measure," in Proc. Int. Joint Conf. on

Neural Networks, 2006.

[8] W. Liu, P. P. Pokharel, and J. C. Principe, "Error entropy,

correntropy and M-estimation," in Proc. IEEE Int. Workshop on

Machine Learning for Signal Processing, 2006.

[9] NIST/SEMATECH e-Handbook of Statistical Methods,

http://www.itl.nist.gov/div898/handbook/, 2006.

TABLE II
ESTIMATION RESULTS FOR THE DISTRIBUTIONS IN TABLE I

                           Cauchy      Laplacian    Exponential
data mean       average    -2.7038      0.0194       0.9969
                std         7.3509      0.0534       0.0270
data skewness   average    -1.1888      0.00556      2.112
                std        24.037       0.19704      0.20936
Rp of data      average    -0.00073    -0.00448      0.48942
                std         0.04510     0.02650      0.04184
S_CS of data    average     0.99951     0.99634      0.84084
                std         0.00026     0.00245      0.01616