A NEW INFORMATION THEORETIC MEASURE OF PDF SYMMETRY
Weifeng Liu, P. P. Pokharel, Jose C. Principe
CNEL, University of Florida
ABSTRACT
In this paper, a new quantity called the symmetric information potential (SIP) is proposed to measure the reflection symmetry and to estimate the location parameter of probability density functions. SIP is defined as an inner product in the probability density function space and has a close relation to information theoretic learning. A simple nonparametric estimator computed directly from the data exists. Experiments demonstrate that this concept can be very useful in dealing with impulsive data distributions, in particular α-stable distributions.
Index Terms— Symmetric distributions, information
theoretic learning, robust statistics
1. INTRODUCTION
A fundamental task in statistical analysis is to characterize
the location and variability of a data set. Further
characterization of the data includes skewness and kurtosis.
In this paper, we mainly address two of these important
issues: location and skewness.
The estimation of a location parameter is to find a
typical or central value that best describes the data. For
univariate data, mean, median, and mode are three common
methods [9]. However, if the data is Cauchy distributed, the
mean becomes useless, because collecting more data does
not provide a more accurate estimate [9].
Another important characteristic of a data distribution is
skewness, which is a measure of symmetry. A distribution is
symmetric if it looks the same to the left and right of the
center point. For the data set
{ }N
x
as
1
sk(
i
i
x
?
?
where x and s are the mean and the standard deviation of
the data. By definition, the skewness measures symmetry
with respect to the mean. If the mean estimate is
meaningless as in the Cauchy distribution, the skewness is
also meaningless although the Cauchy distribution is
obviously symmetric. In this paper, we try to solve this
dilemma by defining a new concept of center and a new
1
i i
?, the skewness is defined
33
) /( 1)
N
xNs
???
(1)
?This work was partially supported by NSF grant ECS0601271.
symmetry measure of data distributions based on
information theoretic learning [1].
The organization of the paper is as follows. Section 2 gives a brief review of information theoretic learning. The definition and properties of the symmetric information potential (SIP) are then presented, based on which the Euclidean symmetry measure, the Cauchy-Schwartz symmetry measure and the reflection point are defined in Section 3. Possible applications are discussed in Section 4. Finally, Section 5 summarizes the main conclusions.
2. INFORMATION THEORETIC LEARNING
Given i.i.d. samples {x_i}_{i=1}^N drawn from f_X(x), the Parzen estimator [2] of the PDF is

f̂_X(x) = (1/N) Σ_{i=1}^N G_σ(x − x_i)    (2)

where G_σ(x − x_i) is the Gaussian kernel and σ is the kernel size:

G_σ(x − x_i) = exp(−(x − x_i)² / (2σ²)) / (√(2π) σ)    (3)

Renyi's quadratic entropy of a random variable X with PDF f_X(x) is defined by

H₂(X) = −log ∫ f_X(x)² dx    (4)
The argument of the log function in (4) is called the information potential (IP), since the PDF estimated with Parzen kernels can be thought of as defining an information potential field over the space of the samples [3]. A nonparametric estimator of the IP (and thus of Renyi's quadratic entropy) directly from the samples is obtained through (2):

ÎP(X) = (1/N²) Σ_{j=1}^N Σ_{i=1}^N G_{σ√2}(x_j − x_i)    (5)

3. SYMMETRIC INFORMATION POTENTIAL
3.1. Definition
Definition: Suppose the PDF of a random variable X is f_X(x). The symmetric information potential (SIP) of X is defined as

SIP(X) = ∫ f_X(x) f_X(−x) dx    (6)

With i.i.d. samples {x_i}_{i=1}^N drawn from X, a nonparametric estimator is obtained as

ŜIP(X) = (1/N²) Σ_{j=1}^N Σ_{i=1}^N G_{σ√2}(x_j + x_i)    (7)

1-4244-0728-1/07/$20.00 ©2007 IEEE, ICASSP 2007
Comparing (7) with (5), we see that SIP is very similar to IP; furthermore, when the distribution is symmetric, i.e. f_X(x) = f_X(−x), SIP reduces to IP by definition.
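As a concrete illustration, the two estimators (5) and (7) can be sketched in a few lines of Python (the helper names are ours, not from the paper; the kernel size σ√2 follows from the convolution of two Gaussian kernels):

```python
import numpy as np

def gauss(u, sigma):
    # Gaussian kernel G_sigma(u) = exp(-u^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)
    return np.exp(-u**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def ip(x, sigma):
    # Information potential estimator, eq. (5): kernel on pairwise differences
    d = x[:, None] - x[None, :]
    return gauss(d, sigma * np.sqrt(2)).mean()

def sip(x, sigma):
    # Symmetric information potential estimator, eq. (7): kernel on pairwise sums
    s = x[:, None] + x[None, :]
    return gauss(s, sigma * np.sqrt(2)).mean()
```

For data symmetric about the origin the two estimates nearly coincide, while shifting the data leaves ÎP unchanged but shrinks ŜIP.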
3.2. Properties
Let X be a random variable.
Property 1: SIP(X) ≥ 0.
Property 2: SIP(X) ≤ IP(X). Equality holds if and only if f_X(x) = f_X(−x).
Proof: By the Cauchy-Schwartz inequality,

SIP(X) = ∫ f_X(x) f_X(−x) dx ≤ (∫ f_X(x)² dx)^{1/2} (∫ f_X(−x)² dx)^{1/2} = IP(X)    (8)

Equality holds when f_X(−x) = a f_X(x), where a is some real positive constant. Since

∫ f_X(x) dx = ∫ f_X(−x) dx = 1    (9)

a can only be 1. This completes the proof.
Property 3: Define a new random variable Y = X₁ + X₂, where X₁ is independent of X₂ but they have the same PDF f_X(x). SIP(X) equals the probability density of Y at 0, f_Y(0).
Proof: Since X₁ and X₂ are independent, the PDF of Y is

f_Y(y) = ∫ f_X(x) f_X(y − x) dx    (10)

Therefore,

SIP(X) = ∫ f_X(x) f_X(−x) dx = f_Y(0)    (11)

This completes the proof. With only the data available, and by using (2) in (10), we have

f̂_Y(y) = (1/N²) Σ_{j=1}^N Σ_{i=1}^N G_{σ√2}(y − x_j − x_i)    (12)

This is exactly the Parzen estimator of f_Y(y) if the pairwise sums {x_i + x_j}_{i,j=1}^N are regarded as the realizations drawn from Y.
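Property 3 can be checked numerically: since the Gaussian kernel is symmetric, evaluating the Parzen estimate (12) at y = 0 reproduces the SIP estimate (7) exactly. A minimal Python sketch (helper names are ours):

```python
import numpy as np

def gauss(u, sigma):
    # Gaussian kernel G_sigma(u)
    return np.exp(-u**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def sip(x, sigma):
    # eq. (7): SIP estimate from pairwise sums
    return gauss(x[:, None] + x[None, :], sigma * np.sqrt(2)).mean()

def parzen_fy(x, sigma, y):
    # eq. (12): Parzen estimate of f_Y(y), with the pairwise sums
    # {x_i + x_j} treated as realizations of Y = X1 + X2
    pair_sums = (x[:, None] + x[None, :]).ravel()
    return gauss(y - pair_sums, sigma * np.sqrt(2)).mean()
```

Here parzen_fy(x, sigma, 0.0) coincides with sip(x, sigma) to machine precision, which is Property 3 at the estimator level.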
3.3. Reflection Symmetry Measure
It is natural to measure the symmetry of a data distribution by comparing its PDF f_X(x) with its mirror image f_X(−x), since this is the fundamental definition of reflection symmetry. If we treat f_X(x) and f_X(−x) as two points in the PDF space, their squared Euclidean distance is

S_ED = ∫ (f_X(x) − f_X(−x))² dx = ∫ f_X(x)² dx + ∫ f_X(−x)² dx − 2 ∫ f_X(x) f_X(−x) dx = 2(IP(X) − SIP(X))    (13)
By the properties of SIP, we see that
(1) 0 ≤ S_ED ≤ 2 IP(X);
(2) S_ED = 0 if and only if f_X(x) = f_X(−x).
S_ED is called the Euclidean symmetry measure. Another way to define 'distance' in a vector space is to measure the angle between two vectors. Therefore, the Cauchy-Schwartz symmetry measure S_CS is

S_CS = ∫ f_X(x) f_X(−x) dx / [(∫ f_X(x)² dx)^{1/2} (∫ f_X(−x)² dx)^{1/2}] = SIP(X) / IP(X)    (14)

This is a normalized version of SIP, and it denotes the cosine of the angle between f_X(x) and f_X(−x). S_CS has the following properties:
(1) 0 ≤ S_CS ≤ 1;
(2) S_CS = 1 if and only if f_X(x) = f_X(−x).
S_ED and S_CS are almost equivalent. Most importantly, they possess nice nonparametric estimators directly from data, whereas other forms of divergence such as the Kullback-Leibler divergence are computationally expensive. Practically, S_CS is more appropriate than S_ED since it removes the effect of the 'norm' of the PDF.

3.4. Reflection Point
In the previous definitions, we set the reflection point at the origin by default. However, it would be appealing in some cases to have the symmetry as an internal property of the data distribution, independent of the external coordinate system. As defined in (1), the skewness measures symmetry with respect to the data mean, and is therefore shift-invariant. By analogy, the center of the data could first be estimated, with a subsequent shift to the origin. However, this means our definition of symmetry would depend on the particular choice of which point is the center and, in practice, on what kind of method is used to estimate it. Intuitively, the concepts of reflection point and reflection symmetry are codependent. It is therefore logically sound to define the reflection point as the point with respect to which maximal symmetry is achieved. The reflection point Rp can then be used to represent the 'center' of the data distribution.

Assume we shift the data by t. Then S_CS becomes

S_CS(t) = ∫ f_X(x + t) f_X(t − x) dx / [(∫ f_X(x + t)² dx)^{1/2} (∫ f_X(t − x)² dx)^{1/2}]    (15)

The denominator, i.e. the IP, is shift-invariant, and the numerator turns out to be f_Y(2t). As a result, the reflection point is defined as

Rp = argmax_t ∫ f_X(x + t) f_X(t − x) dx = argmax_t f_Y(2t)    (16)

Consequently, finding the reflection point of f_X(x) is equivalent to finding the main mode of f_Y(y). Since Y is the sum of two independent random variables, Y is 'closer' to the normal distribution than X by the central limit theorem. Thus it is advantageous to deal with f_Y(y) instead of f_X(x), except for the increased computation. Finding the main mode of f_Y(y) can be accomplished by the mean shift algorithm or by gradient methods [4].
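Equation (16) suggests a direct, if brute-force, way to locate Rp: evaluate the Parzen estimate of f_Y(2t) over a grid of candidate shifts and take the maximizer. The sketch below is our own illustration, with a simple grid search around the sample median standing in for the mean shift or gradient methods of [4]:

```python
import numpy as np

def gauss(u, sigma):
    # Gaussian kernel G_sigma(u)
    return np.exp(-u**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def reflection_point(x, sigma, half_width=5.0, num=201):
    # eq. (16): Rp = argmax_t fhat_Y(2t), where Y = X1 + X2.
    # Candidate shifts are centered on the median, which stays finite
    # even for heavy-tailed samples.
    grid = np.median(x) + np.linspace(-half_width, half_width, num)
    pair_sums = (x[:, None] + x[None, :]).ravel()
    scores = [gauss(2 * t - pair_sums, sigma * np.sqrt(2)).mean() for t in grid]
    return grid[int(np.argmax(scores))]
```

For Cauchy data centered at c, reflection_point recovers c closely even though the sample mean is unreliable.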
4. APPLICATIONS
4.1 Measuring Symmetry of data distributions
In the first example, we compare our method with skewness
in assessing symmetry of data drawn from Cauchy,
Laplacian and exponential distributions (Table I).
TABLE I
DISTRIBUTIONS USED IN THE FIRST EXAMPLE

Distribution       PDF
Cauchy             f(x) = 1 / (π(1 + x²))
Laplacian          f(x) = exp(−|x|) / 2
Exponential        f(x) = exp(−x), x ≥ 0
2-Cauchy mixture   f(x) = 1 / (2π(1 + (x − 2)²)) + 1 / (2π(1 + (x + 2)²))
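For reference, samples from the 2-Cauchy mixture row of Table I can be drawn by picking a component at random and adding standard Cauchy noise. A sketch (assuming, as in our reading of Table I, equal-weight components at ±2):

```python
import numpy as np

def sample_two_cauchy_mixture(n, offset=2.0, rng=None):
    # Equal-weight mixture of two unit-scale Cauchy densities at +/- offset
    rng = np.random.default_rng() if rng is None else rng
    centers = rng.choice([-offset, offset], size=n)
    return centers + rng.standard_cauchy(n)
```

The resulting sample is symmetric about the origin but bimodal, which is what later makes mode-based detection fail in Fig. 3.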
For each distribution, 1000 points are drawn to estimate the mean, skewness, Rp and S_CS. 100 Monte Carlo realizations are run for each distribution, after which we calculate the average and the standard deviation of these estimates. All distributions and their parameters used in the simulation are listed in Table I. The kernel size is chosen by Silverman's rule [5]. The results are summarized in Table II. S_CS has much smaller estimation variance than the skewness. In the case of symmetric, heavy-tailed distributions like the Cauchy and Laplacian, Rp also provides a much more accurate estimate of the center than the mean. In most cases, the skewness has such a large variance that its estimate is uninformative. The reason why S_CS is very suitable for describing impulsive distributions is beyond the scope of this paper (see [8]).

The α-stable distributions are a family of heavy-tailed distributions widely used in financial analysis [6]. An α-stable distribution requires four parameters for a complete description: an index of stability α ∈ (0, 2], a skewness parameter β ∈ [−1, 1], a scale parameter γ and a location parameter μ, denoted as S_α(γ, β, μ). When α = 2, the Gaussian distribution results. The Cauchy distribution is a special case with α = 1 and β = 0. When α < 2, the variance is infinite and the tails are asymptotically equivalent to a Pareto law. The estimation of stable law parameters is in general severely hampered by the lack of known closed-form density functions. As far as we know, simple descriptors for α-stable distributions are still lacking due to the flat tails and the asymmetry. In this example, we show that the newly defined S_CS can be used to characterize the skewness of these distributions. We fix three parameters, α = 0.8, γ = 1 and μ = 0, and vary β from 0 to 0.9. For each β, 2000 data points are generated to estimate the skewness and S_CS. 100 Monte Carlo simulations are run so that the average and the standard deviation can be calculated for both estimates (Fig. 1). The skewness is estimated in two ways: one with the entire data set and the other with 5% of the data trimmed off. As we see, a smooth monotonic curve is obtained between S_CS and β, whereas the skewness is almost uninformative.
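The kernel size used throughout the experiments follows Silverman's rule of thumb [5]. One common robust form is sketched below; the choice of the IQR-based variant is our assumption, since the paper does not state which version of the rule it uses:

```python
import numpy as np

def silverman_bandwidth(x):
    # Silverman's rule of thumb for a Gaussian kernel [5]; the robust form
    # uses min(std, IQR/1.34) so heavy tails do not inflate the kernel size
    n = len(x)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    scale = min(np.std(x, ddof=1), iqr / 1.34)
    return 0.9 * scale * n ** (-1 / 5)
```

For heavy-tailed samples the IQR term dominates, keeping the bandwidth finite even when the sample standard deviation explodes.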
Fig. 1. Estimates of S_CS and skewness (full and 5% trimmed) for different β.
4.2 Signal detection in symmetrically distributed noises
In this example, we utilize the property of Rp in a signal detection problem. In a digital communication system, let S and R be respectively the transmitted signal and the received signal corrupted by the additive noise N:

R = S + N    (17)

Suppose S = s_i, which is either 0 or 1 with equal probability. The noise can be any symmetric distribution with reflection point at the origin. S_1.2(1, 0, 0) α-stable and 2-Cauchy mixture noises (in Table I) are used in this simulation. The signal-to-noise ratio (SNR) is simply controlled by scaling the noise. When an interval of signal is observed, T samples are obtained, based on which a decision is made whether the transmission is 0 or 1. The mean square error criterion is commonly used, while M-estimation is more effective in impulsive noises [7]. Assume the sampled received signal is {r_t}_{t=1}^T. For the Rp detection method, the following criterion is used:
Decide 1 if (1/T²) Σ_{t=1}^T Σ_{τ=1}^T G_{σ√2}(r_t + r_τ − 2) ≥ (1/T²) Σ_{t=1}^T Σ_{τ=1}^T G_{σ√2}(r_t + r_τ); decide 0 otherwise.    (18)

We set T = 20 and transmit 10⁴ bits in each simulation to calculate the bit error rate (BER). The kernel size is set according to Silverman's rule.
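The decision rule (18) amounts to comparing the Parzen estimate of f_Y at y = 2 (reflection point at 1, i.e. bit 1) against y = 0 (reflection point at 0, i.e. bit 0). A minimal Python sketch (helper names are ours):

```python
import numpy as np

def gauss(u, sigma):
    # Gaussian kernel G_sigma(u)
    return np.exp(-u**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def detect_bit(r, sigma):
    # eq. (18): decide 1 if fhat_Y(2) >= fhat_Y(0), i.e. if the reflection
    # point of the received samples is closer to 1 than to 0
    s = (r[:, None] + r[None, :]).ravel()
    score1 = gauss(s - 2.0, sigma * np.sqrt(2)).mean()  # fhat_Y at y = 2
    score0 = gauss(s, sigma * np.sqrt(2)).mean()        # fhat_Y at y = 0
    return 1 if score1 >= score0 else 0
```

Because the comparison uses all T² pairwise sums, a few impulsive outliers contribute only negligible kernel values and barely shift the decision.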
Fig. 2. Detection performance (BER versus SNR) with α-stable noise for the MSE, M-estimation, Rp and trimmed-MSE detectors.
Fig. 3. Detection performance (BER versus SNR) with 2-Cauchy mixture noise for the MSE, M-estimation, Rp and trimmed-MSE detectors.
The noise power and the trimmed MSE are estimated from the 5% trimmed data. As we see in Fig. 3, the mode detection fails in the case of the 2-Cauchy mixture because there is a dip in the noise PDF at the center.
5. CONCLUSIONS
In this paper, a new quantity, the symmetric information potential (SIP), is proposed. Its mathematical meaning, probabilistic interpretation and relation to the information potential are presented. Based on this understanding, the descriptors S_ED, S_CS and Rp are defined to quantify the reflection symmetry and to estimate the location parameter of data distributions. All these methods are nonparametric and robust, so they are very useful for characterizing impulsive data distributions. Examples demonstrate that the newly proposed methods outperform the conventional mean and skewness in estimating the location parameter and quantifying symmetry for Laplacian, α-stable and other mixture models. Future work includes the bias and variance analysis of the SIP, detailed parameter estimation methods for α-stable distributions, and possible applications in supervised learning.
6. REFERENCES
[1] D. Erdogmus and J. C. Principe, "From linear adaptive filtering to nonlinear information processing," IEEE Signal Processing Magazine, Nov. 2006.
[2] E. Parzen, "On the estimation of a probability density function and the mode," Ann. Math. Stat., vol. 33, p. 1065, 1962.
[3] D. Erdogmus and J. C. Principe, "Generalized information potential criterion for adaptive system training," IEEE Trans. Neural Networks, vol. 13, no. 5, pp. 1035-1044, Sept. 2002.
[4] Y. Cheng, "Mean shift, mode seeking, and clustering," IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 8, pp. 790-799, Aug. 1995.
[5] B. W. Silverman, Density Estimation for Statistics and Data Analysis. Chapman and Hall, London, 1986.
[6] J. P. Nolan, "Numerical calculation of stable densities and distribution functions," Communications in Statistics - Stochastic Models, vol. 13, pp. 759-774, 1997.
[7] W. Liu, P. P. Pokharel and J. C. Principe, "Correntropy: A localized similarity measure," in Proc. Intl. Joint Conf. on Neural Networks, 2006.
[8] W. Liu, P. P. Pokharel and J. C. Principe, "Error entropy, correntropy and M-estimation," in Proc. IEEE Int. Workshop on Machine Learning for Signal Processing, 2006.
[9] NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/, 2006.
TABLE II
ESTIMATION RESULTS, AVERAGE (STANDARD DEVIATION) OVER 100 MONTE CARLO RUNS

                  Cauchy              Laplacian            Exponential
data mean         2.7038 (7.3509)     0.0194 (0.0534)      0.9969 (0.0270)
data skewness     1.1888 (24.037)     0.00556 (0.19704)    2.112 (0.48942)
Rp of data        0.00073 (0.04510)   0.00448 (0.02650)    0.20936 (0.04184)
S_CS of data      0.99951 (0.00026)   0.99634 (0.00245)    0.84084 (0.01616)