LOG-DENSITY DECONVOLUTION BY WAVELET THRESHOLDING

Jérémie Bigot & Sébastien Van Bellegem

March 2007

Abstract

This paper proposes a new wavelet-based method for deconvolving a density. The estimator combines the ideas of nonlinear wavelet thresholding with periodised Meyer wavelets and estimation by information projection. It is guaranteed to be in the class of density functions; in particular, it is positive everywhere by construction. The theoretical optimality of the estimator is established in terms of the rate of convergence of the Kullback-Leibler discrepancy over Besov classes. Finite sample properties are investigated in detail and show the excellent practical performance of the estimator compared with other recently introduced estimators.

Keywords: Deconvolution, Wavelet thresholding, Adaptive estimation, Information projection, Kullback-Leibler divergence, Besov space

AMS classifications: Primary 62G07; secondary 42C40, 41A29

Affiliations

JÉRÉMIE BIGOT, Institut de Mathématiques de Toulouse, Laboratoire de Statistique et Probabilités,

Université Paul Sabatier, F-31062 Toulouse Cedex 9, France, Jeremie.Bigot@math.ups-tlse.fr

SÉBASTIEN VAN BELLEGEM, Institut de statistique and CORE, Université catholique de Louvain, Voie

du Roman Pays, 20, B-1348 Louvain-la-Neuve, Belgium, vanbellegem@stat.ucl.ac.be

Acknowledgements

This work was supported by the IAP research network nr P5/24 of the Belgian Government (Belgian

Science Policy). We gratefully acknowledge Yves Rozenholc for providing the Matlab code to compute

the model selection estimator, Marc Raimondo for providing the Matlab code for translation invariant

deconvolution, and Anestis Antoniadis for helpful comments and suggestions.

1 Introduction

Density deconvolution is a widely studied statistical problem that is encountered in many applied situations. This problem arises when the probability density of a random variable $X$ has to be estimated from an independent and identically distributed (iid) sample contaminated by some independent additive noise. Namely, the observations at hand, denoted by $Y_i$ for $i = 1,\ldots,n$, are such that

\[ Y_i = X_i + \epsilon_i, \qquad i = 1,\ldots,n, \]

where the $X_i$ are iid variables with unknown density $f_X$, and the added variables $\epsilon_i$ model the contamination by some noise. The number $n$ represents the sample size, and the contamination variables $\epsilon_i$ are assumed to be iid with a known density function $f_\epsilon$ and independent of the $X_i$'s. In this setting, the density function $f_Y$ of the observed sample $Y_i$ can be written as a convolution between the density of interest $f_X$ and the density of the additive noise $f_\epsilon$, i.e.

\[ f_Y(y) = f_X \star f_\epsilon(y) := \int f_X(u) f_\epsilon(y - u)\,du, \qquad y \in \mathbb{R}. \tag{1.1} \]
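As an illustration of the observation model and of the convolution formula (1.1), the following Python sketch simulates a contaminated sample and approximates $f_Y$ numerically. The Beta(2, 5) law for $X$, the Laplace noise, the midpoint rule and all function names are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def laplace_pdf(t, scale):
    """Density of a centred Laplace law (the assumed known noise density f_eps)."""
    return np.exp(-np.abs(t) / scale) / (2.0 * scale)

def simulate_sample(n, scale=0.1, seed=0):
    """Draw X_i from a hypothetical Beta(2, 5) law on [0, 1] and return (X, Y = X + eps)."""
    rng = np.random.default_rng(seed)
    x = rng.beta(2.0, 5.0, size=n)
    eps = rng.laplace(loc=0.0, scale=scale, size=n)
    return x, x + eps

def convolved_density(f_x, f_eps, y, m=2000):
    """Midpoint-rule approximation of f_Y(y) = int_0^1 f_X(u) f_eps(y - u) du.

    Assumes f_X is supported on [0, 1], so the integral reduces to that interval.
    """
    u = (np.arange(m) + 0.5) / m          # midpoints of m equal cells of [0, 1]
    return float(np.sum(f_x(u) * f_eps(y - u)) / m)
```

For instance, `convolved_density(lambda u: np.ones_like(u), lambda t: laplace_pdf(t, 0.1), 0.5)` approximates the density at 0.5 of a uniform variable on [0, 1] contaminated by Laplace noise of scale 0.1.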

In data analysis, density estimation from noisy samples plays a fundamental role.

Applications can be found in communication theory (e.g. Masry, 2003), experimental

physics (e.g. Kosarev et al., 2003) or econometrics (e.g. Postel-Vinay and Robin, 2002)

to name but a few. The problem of estimating the probability density $f_X$ relates to

classical nonparametric methods of estimation, but the indirect observation of the data

leads to different optimality properties, for instance in terms of rate of convergence.

Among the nonparametric methods of deconvolution, standard methods recently

studied in the statistical literature include estimation by model selection (e.g. Comte,

Rozenholc and Taupin, 2006b), wavelet thresholding (e.g. Fan and Koo, 2002), kernel

smoothing (e.g. Carroll and Hall, 1988), spline deconvolution (e.g. Koo, 1999) or

spectral cut-off (e.g. Carrasco and Florens, 2002). However, a problem frequently

encountered with most of these techniques is that the proposed estimator is not

everywhere positive, and therefore is not a valid probability density.

The goal of this paper is to present an estimator that is optimal in terms of asymptotic rates of convergence, and that benefits from good finite sample properties. Furthermore, the proposed estimator is automatically a valid density, in particular because it is guaranteed to be positive. The proposed solution uses wavelet thresholding combined with information projection techniques, and is computationally simple.

The advantage of wavelet methods is their ability to estimate local features of the

density, such as peaks or local discontinuities. In particular, they can estimate irregular

functions (in Besov spaces) with optimal rates of convergence. Wavelet methods for

deconvolution have received special attention in the recent literature. Optimality of

the nonlinear wavelet estimator has been established in Fan and Koo (2002), but the

given estimator is not computable since it depends on an integral in the frequency

domain that cannot be calculated in practice. The estimator we propose below, in

addition to being a valid density, is fully computable as it only involves finite sums for

any finite sample. Other recent wavelet estimators for deconvolution problems include the

work of Johnstone, Kerkyacharian, Picard and Raimondo (2004) or De Canditiis and

Pensky (2006), see also the references therein.

Our estimator combines wavelet thresholding with information projection that

guarantees the solution to be positive. This technique was studied by Barron and Sheu

(1991) for the approximation of density functions by sequences of exponential families.

An extension of this method to linear inverse problems has been studied in Koo and

Chung (1998) using expansions in Fourier series. In the special case of Poisson inverse

problems, Antoniadis and Bigot (2006) combined this technique with estimation by

wavelet expansions.
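To see why an estimate taken in an exponential family is positive by construction, the following Python sketch evaluates a density of the form $\exp(\sum_k \theta_k b_k(x))$ divided by its integral over $[0,1]$. The cosine basis $b_k$, the function name `exp_family_density` and the grid-based normalisation are illustrative assumptions; the paper itself works with periodised Meyer wavelets.

```python
import numpy as np

def exp_family_density(theta, m=1000):
    """Evaluate f_theta = exp(sum_k theta_k b_k) / normaliser on a midpoint grid of [0, 1].

    The basis b_k(x) = sqrt(2) * cos(pi * k * x) is a stand-in chosen for
    illustration; it is not the wavelet basis used in the paper.
    """
    x = (np.arange(m) + 0.5) / m
    k = np.arange(1, len(theta) + 1)
    basis = np.sqrt(2.0) * np.cos(np.pi * np.outer(k, x))   # shape (K, m)
    log_f = np.asarray(theta, dtype=float) @ basis
    unnorm = np.exp(log_f - log_f.max())    # subtract the max for numerical stability
    # The grid mean approximates the integral over [0, 1], so dividing by it normalises f.
    return x, unnorm / unnorm.mean()
```

Whatever the value of `theta`, the returned values are strictly positive and average to one on the grid, which is the property that a direct series estimate of the density does not have.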

It is well known that the difficulty of the deconvolution problem is quantified by the smoothness of the noise density $f_\epsilon$. If $f^Y_\ell$, $f^X_\ell$ and $f^\epsilon_\ell$ denote the Fourier coefficients of the densities $f_Y$, $f_X$ and $f_\epsilon$ respectively, then the convolution equation (1.1) is equivalent to $f^Y_\ell = f^X_\ell \cdot f^\epsilon_\ell$. Depending on how fast the Fourier coefficients $f^\epsilon_\ell$ tend to zero, the reconstruction of $f^X_\ell$ will be more or less accurate. This phenomenon was systematically studied by Fan (1991), who introduced the following two types of assumption on the smoothness of $f_\epsilon$.

Assumption 1.1 Ordinary smooth convolution: the Fourier coefficients of $f_\epsilon$ decay in a polynomial fashion, i.e. there exist a constant $C$ and a real $\nu \geq 0$ such that $|f^\epsilon_\ell| \sim C|\ell|^{-\nu}$.

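Assumption 1.2 Super smooth convolution: the Fourier coefficients of $f_\epsilon$ are such that

\[ d_1 |\ell|^{\nu_0} \exp(-|\ell|^{\nu}/d_0) \leq |f^\epsilon_\ell| \leq d_2 |\ell|^{\nu_1} \exp(-|\ell|^{\nu}/d_0) \quad \text{as } |\ell| \to \infty, \]

where $d_0, d_1, d_2, \nu, \nu_0, \nu_1$ are some positive constants.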
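The two regimes can be illustrated with two standard noise laws. Under the assumed convention that the $\ell$-th Fourier coefficient is $E[\exp(-2 i \pi \ell \epsilon)]$, centred Laplace noise has coefficients decaying like $|\ell|^{-2}$ (ordinary smooth), while Gaussian noise, the usual example of the super smooth case, has coefficients decaying like $\exp(-c\,\ell^2)$. The Python sketch below is only an illustration; the function names and the scale values are hypothetical.

```python
import numpy as np

def laplace_coeff(ell, scale):
    """Fourier coefficient of centred Laplace noise: its characteristic function at 2*pi*ell."""
    return 1.0 / (1.0 + (2.0 * np.pi * ell * scale) ** 2)

def gaussian_coeff(ell, sigma):
    """Fourier coefficient of centred Gaussian noise with standard deviation sigma."""
    return np.exp(-0.5 * (2.0 * np.pi * ell * sigma) ** 2)

def loglog_slope(coeff, ell_lo, ell_hi):
    """Slope of log|coeff(ell)| against log(ell) between two frequencies."""
    rise = np.log(coeff(ell_hi)) - np.log(coeff(ell_lo))
    return rise / (np.log(ell_hi) - np.log(ell_lo))
```

On a log-log scale the Laplace slope settles near -2 at high frequencies, whereas the Gaussian slope keeps decreasing, which is the qualitative difference between polynomial and exponential decay.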

In this paper, we also consider these two smoothness assumptions. The optimal rates of convergence that can be expected from a linear or a nonlinear wavelet estimator depend on these smoothness assumptions and are well studied in the literature. To summarize, we know from the work of Pensky and Vidakovic (1999) and Fan and Koo (2002) that, for ordinary smooth convolution, both linear and nonlinear wavelet estimators achieve the optimal rate of convergence. This rate is of polynomial order in the sample size $n$. However, no adaptive linear estimator is optimal, and only well-calibrated nonlinear wavelet estimators are adaptive. For the case of super smooth convolution, the optimal rate of convergence is only of logarithmic order in the sample size, and there is no difference between the rates of convergence of linear and nonlinear estimators. These results are recalled in Section 3 below. It is worth mentioning that the estimators we define in this paper achieve these optimal rates of convergence.
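One informal way to see why the decay of the noise coefficients drives these rates is to invert the relation between the Fourier coefficients of $f_Y$, $f_X$ and $f_\epsilon$ directly: dividing an empirical coefficient of $Y$ by the known noise coefficient multiplies its sampling error by $1/|f^\epsilon_\ell|$. The Python sketch below illustrates this naive inversion; it is not the estimator of the paper, and the Beta(2, 5) law for $X$, the Laplace noise and the function names are assumptions.

```python
import numpy as np

def empirical_coeff(sample, ell):
    """Empirical mean of exp(-2i*pi*ell*Z), an estimate of the ell-th Fourier coefficient."""
    return np.mean(np.exp(-2j * np.pi * ell * np.asarray(sample)))

def laplace_noise_coeff(ell, scale):
    """Exact coefficient E[exp(-2i*pi*ell*eps)] for centred Laplace noise."""
    return 1.0 / (1.0 + (2.0 * np.pi * ell * scale) ** 2)

def deconvolved_coeff(y_sample, ell, scale):
    """Estimate the coefficient of f_X by dividing out the known noise coefficient."""
    return empirical_coeff(y_sample, ell) / laplace_noise_coeff(ell, scale)

rng = np.random.default_rng(0)
n, scale = 100_000, 0.05
x = rng.beta(2.0, 5.0, size=n)              # unobserved sample, kept here for comparison
y = x + rng.laplace(0.0, scale, size=n)     # observed contaminated sample

for ell in (1, 5, 20):
    error = abs(deconvolved_coeff(y, ell, scale) - empirical_coeff(x, ell))
    amplification = 1.0 / laplace_noise_coeff(ell, scale)
    print(ell, error, amplification)
```

The amplification factor grows polynomially in $\ell$ for this ordinary smooth noise; for super smooth noise it would grow exponentially, which is consistent with the much slower rates quoted above.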

The next section recalls some general results on wavelet approximation and the definition of the Meyer wavelet used for deconvolution. Then Section 3 defines the linear and nonlinear wavelet estimators by information projection. The (optimal) rates of convergence of the proposed estimators are stated in Section 4. The loss function we consider to calculate these rates is the Kullback-Leibler divergence. Due to the aforementioned difference with the wavelet estimator of Fan and Koo (2002), their technique of proof is very different from the proof presented in this paper. Our proof is actually based on a combination of the maxiset theorem in Johnstone et al. (2004) for hard thresholding wavelet estimators and other results on Kullback-Leibler divergence by Csiszár (1975) or Barron and Sheu (1991).

The fruitful combination of wavelet thresholding and information projection is not new and was proposed in Antoniadis and Bigot (2006), who were concerned with the estimation of the intensity of a Poisson process using nonlinear wavelet Galerkin methods. The following development is different, as it deals with the estimation of log-densities and is specific to the case of deconvolution with the use of periodised Meyer wavelets. The proof techniques are also in contrast with Antoniadis and Bigot (2006), who followed the Gaussian approximation technique developed in Donoho, Johnstone, Kerkyacharian and Picard (1995). That approach is however not applicable with periodised Meyer wavelets.

Section 5 addresses the practical issues of the proposed estimation procedure. We

compare the performance of the proposed estimator with two of the most recent

techniques for density deconvolution. The first is deconvolution via cosine series

studied by Hall and Qiu (2005), and the second is the model selection approach of

Comte, Rozenholc and Taupin (2006a). While the estimator by model selection showed

significant small sample improvements against most of the standard techniques

of deconvolution, the proposed wavelet-based estimator by information projection

outperforms the results of Comte et al. (2006a).

We conclude the paper with a technical appendix which contains the proofs of the

main theorems, and where we adapt some results of Barron and Sheu (1991) to the

case of estimation by information projection using periodised Meyer wavelets.

2 Meyer wavelets for deconvolution

In this paper, we assume that the support of $f_X$ is compact and included in $[0,1]$. Of course, this is not an assumption that would hold in many practical applications, and it is mainly made for mathematical convenience, to define more easily the estimation of $f_X$ by functions in an exponential family based on a finite linear combination of basis functions (see the next section). The support of $f_\epsilon$, however, can be unbounded, so the support of $f_Y$ is in general unbounded (see footnote 1).

Wavelet systems provide unconditional bases for Besov spaces. Using wavelets, one can express whether or not $f_X$ belongs to a Besov space by a simple requirement on the absolute value of the wavelet coefficients of $f_X$. More precisely, assume that $(\phi,\psi)$ denotes some scaling and wavelet functions that have enough regularity and vanishing moments. If $\sigma = s + (1/2 - 1/p) \geq 0$, define the norm $\|\cdot\|_{s,p,q}$ by

\[ \|f_X\|^q_{s,p,q} = \sum_{j=0}^{\infty} \left[ 2^{j\sigma p} \sum_{k=0}^{2^j-1} |\langle f_X, \psi_{j,k}\rangle|^p \right]^{q/p}. \]
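As a computational reading of this definition, the following Python sketch evaluates the sequence norm from a finite list of wavelet coefficients, with $\sigma = s + 1/2 - 1/p$. Truncating the infinite sum over $j$ at the last available level, the function name `besov_sequence_norm` and the input layout are assumptions made for illustration.

```python
import numpy as np

def besov_sequence_norm(coeffs_by_level, s, p, q):
    """Truncated Besov sequence norm of wavelet coefficients.

    coeffs_by_level[j] is assumed to hold the 2**j coefficients <f, psi_{j,k}>,
    k = 0, ..., 2**j - 1, at resolution level j.
    """
    sigma = s + 0.5 - 1.0 / p
    total = 0.0
    for j, level in enumerate(coeffs_by_level):
        level_sum = np.sum(np.abs(np.asarray(level, dtype=float)) ** p)
        total += (2.0 ** (j * sigma * p) * level_sum) ** (q / p)
    # The definition gives the q-th power of the norm, hence the final root.
    return total ** (1.0 / q)
```

The weight $2^{j\sigma p}$ penalises large coefficients at fine scales, so a function has a finite norm only if its wavelet coefficients decay fast enough as $j$ grows.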

It can be shown (Meyer, 1992) that this norm is equivalent to the norm in traditional

Footnote 1: The case where the support of $f_X$ is included in $[0,T]$ is handled by adapting the Fourier transform (the corresponding exponential orthogonal system is $\exp(-i 2\pi x \ell / T)$).
