BY WAVELET THRESHOLDING
Jérémie Bigot & Sébastien Van Bellegem
This paper proposes a new wavelet-based method for deconvolving a
density. The estimator combines the ideas of nonlinear wavelet thresholding
with periodised Meyer wavelets and estimation by information projection. The estimator
is guaranteed to be in the class of density functions; in particular, it is positive
everywhere by construction. The theoretical optimality of the estimator is
established in terms of rate of convergence of the Kullback-Leibler discrepancy
over Besov classes. Finite sample properties are investigated in detail, and show
the excellent practical performance of the estimator, compared with other recently
proposed estimators.
Keywords: Deconvolution, Wavelet thresholding, Adaptive estimation, Information projection,
Kullback-Leibler divergence, Besov space
AMS classifications: Primary 62G07; secondary 42C40, 41A29
JÉRÉMIE BIGOT, Institut de Mathématiques de Toulouse, Laboratoire de Statistique et Probabilités,
Université Paul Sabatier, F-31062 Toulouse Cedex 9, France, Jeremie.Bigot@math.ups-tlse.fr
SÉBASTIEN VAN BELLEGEM, Institut de statistique and CORE, Université catholique de Louvain, Voie
du Roman Pays, 20, B-1348 Louvain-la-Neuve, Belgium, email@example.com
This work was supported by the IAP research network nr P5/24 of the Belgian Government (Belgian
Science Policy). We gratefully acknowledge Yves Rozenholc for providing the Matlab code to compute
the model selection estimator, Marc Raimondo for providing the Matlab code for translation invariant
deconvolution, and Anestis Antoniadis for helpful comments and suggestions.
1 Introduction
Density deconvolution is a widely studied statistical problem that is encountered in
many applied situations. This problem arises when the probability density of a random
variable $X$ has to be estimated from an independent and identically distributed (iid)
sample contaminated by some independent additive noise. Namely, the observations
at hand, denoted by $Y_i$ for $i = 1,\dots,n$, are such that
$$Y_i = X_i + \epsilon_i, \qquad i = 1,\dots,n,$$
where the $X_i$ are iid variables with unknown density $f^X$, and the added variables $\epsilon_i$ model
the contamination by some noise. The number $n$ represents the sample size, and the
contamination variables $\epsilon_i$ are supposed iid with a known density function $f^\epsilon$ and
independent of the $X_i$'s. In this setting, the density function $f^Y$ of the observed
sample $Y_i$ can be written as a convolution between the density of interest $f^X$ and the
density of the additive noise $f^\epsilon$, i.e.
$$f^Y(y) = f^X \star f^\epsilon(y) := \int f^X(u) f^\epsilon(y - u)\,du, \qquad y \in \mathbb{R}. \tag{1.1}$$
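To make model (1.1) concrete, the following Python sketch simulates contaminated observations and evaluates the convolution integral numerically. The Beta(2, 2) density for $X$, the Laplace noise with scale 0.1, and all function names are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

def trapezoid(values, grid):
    """Trapezoidal rule along the last axis of `values`."""
    return np.sum(0.5 * (values[..., 1:] + values[..., :-1]) * np.diff(grid), axis=-1)

def f_x(u):
    """Hypothetical density of interest: Beta(2, 2), supported on [0, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where((u >= 0) & (u <= 1), 6.0 * u * (1.0 - u), 0.0)

def f_eps(e, b=0.1):
    """Hypothetical known noise density: centred Laplace with scale b."""
    return np.exp(-np.abs(e) / b) / (2.0 * b)

def f_y(y, n_grid=2001):
    """Numerical version of (1.1): f_Y(y) = int f_X(u) f_eps(y - u) du."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    u = np.linspace(0.0, 1.0, n_grid)  # f_X vanishes outside [0, 1]
    integrand = f_x(u)[None, :] * f_eps(y[:, None] - u[None, :])
    return trapezoid(integrand, u)

def simulate(n, b=0.1, seed=0):
    """Draw Y_i = X_i + eps_i with X_i and eps_i independent."""
    rng = np.random.default_rng(seed)
    x = rng.beta(2.0, 2.0, size=n)
    eps = rng.laplace(0.0, b, size=n)
    return x + eps
```

A histogram of `simulate(n)` can be compared with `f_y` on a grid: only the $Y_i$ are observed, so any estimator of $f^X$ has to undo the smoothing visible in `f_y`.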
In data analysis, density estimation from a noisy sample plays a fundamental role.
Applications can be found in communication theory (e.g. Masry, 2003), experimental
physics (e.g. Kosarev et al., 2003) or econometrics (e.g. Postel-Vinay and Robin, 2002),
to name but a few. The problem of estimating the probability density $f^X$ relates to
classical nonparametric methods of estimation, but the indirect observation of the data
leads to different optimality properties, for instance in terms of rate of convergence.
Among the nonparametric methods of deconvolution, standard methods recently
studied in the statistical literature include estimation by model selection (e.g. Comte,
Rozenholc and Taupin, 2006b), wavelet thresholding (e.g. Fan and Koo, 2002), kernel
smoothing (e.g. Carroll and Hall, 1988), spline deconvolution (e.g. Koo, 1999) or
spectral cut-off (e.g. Carrasco and Florens, 2002). However, a problem frequently
encountered with most of these techniques is that the proposed estimator is not
everywhere positive, and is therefore not a valid probability density.
The goal of this paper is to present an estimator that is optimal in terms
of asymptotic rates of convergence, and that benefits from good finite sample
properties. Furthermore, the proposed estimator is automatically a valid density,
in particular because it is guaranteed to be positive. The proposed solution
uses wavelet thresholding combined with information projection techniques.
The advantage of wavelet methods is their ability to estimate local features of the
density, such as peaks or local discontinuities. In particular, they can estimate irregular
functions (in Besov spaces) with optimal rates of convergence. Wavelet methods for
deconvolution have received special attention in the recent literature. Optimality of
the nonlinear wavelet estimator has been established in Fan and Koo (2002), but the
given estimator is not computable since it depends on an integral in the frequency
domain that cannot be calculated in practice. The estimator we propose below, in
addition to being a valid density, is fully computable as it only involves finite sums in
finite samples. Other recent wavelet estimators for deconvolution problems include the
work of Johnstone, Kerkyacharian, Picard and Raimondo (2004) or De Canditiis and
Pensky (2006), see also the references therein.
Our estimator combines wavelet thresholding with information projection, which
guarantees that the solution is positive. This technique was studied by Barron and Sheu
(1991) for the approximation of density functions by sequences of exponential families.
An extension of this method to linear inverse problems has been studied in Koo and
Chung (1998) using expansions in Fourier series. In the special case of Poisson inverse
problems, Antoniadis and Bigot (2006) combined this technique with estimation by
wavelet thresholding.
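The idea of information projection can be sketched as follows: given target coefficients $\alpha_k$ against basis functions $\varphi_k$, one looks for a density of exponential form $f_\theta = \exp(\sum_k \theta_k \varphi_k - \psi(\theta))$ whose coefficients match the $\alpha_k$. Such a density is positive by construction. The Python sketch below is a minimal illustration under stated assumptions: it uses a cosine basis instead of periodised Meyer wavelets, a midpoint quadrature grid, and a damped Newton iteration. All names are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def cosine_basis(x, m):
    """Rows are phi_k(x) = sqrt(2) cos(pi k x), k = 1..m (orthonormal on [0, 1])."""
    k = np.arange(1, m + 1)[:, None]
    return np.sqrt(2.0) * np.cos(np.pi * k * x[None, :])

def information_projection(alpha, phi, w, n_iter=50, tol=1e-8):
    """Find theta such that int phi_k f_theta = alpha_k for all k, where
    f_theta = exp(theta . phi - psi(theta)).
    phi: (m, N) basis values on a grid; w: (N,) quadrature weights."""
    alpha = np.asarray(alpha, dtype=float)
    theta = np.zeros(alpha.shape[0])

    def density(t):
        g = np.exp(t @ phi)
        return g / np.sum(w * g)  # normalise so that f_theta integrates to one

    def objective(t):
        # psi(t) - t . alpha is convex; its gradient is E_t[phi] - alpha
        return np.log(np.sum(w * np.exp(t @ phi))) - t @ alpha

    for _ in range(n_iter):
        f = density(theta)
        mean = phi @ (w * f)
        grad = mean - alpha
        if np.linalg.norm(grad) < tol:
            break
        centred = phi - mean[:, None]
        hess = (centred * (w * f)) @ centred.T  # covariance of phi under f_theta
        step = np.linalg.solve(hess, grad)
        t, base = 1.0, objective(theta)
        # Backtracking keeps the Newton step from overshooting
        while t > 1e-4 and objective(theta - t * step) > base - 0.25 * t * (grad @ step):
            t *= 0.5
        theta = theta - t * step
    return theta, density(theta)
```

In a deconvolution setting the targets `alpha` would presumably be estimated (and possibly thresholded) coefficients rather than exact ones. In that case a solution need not exist for every input, which is part of what the theory has to control.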
It is well-known that the difficulty of the deconvolution problem is quantified
by the smoothness of the noise density $f^\epsilon$. Let $f^Y_\ell$, $f^X_\ell$ and $f^\epsilon_\ell$ denote the Fourier
coefficients of the densities $f^Y$, $f^X$ and $f^\epsilon$ respectively; then the convolution equation
(1.1) is equivalent to $f^Y_\ell = f^X_\ell f^\epsilon_\ell$. Depending on how fast the Fourier coefficients $f^\epsilon_\ell$
tend to zero, the reconstruction of $f^X_\ell$ will be more or less accurate. This phenomenon
was systematically studied by Fan (1991), who introduced the following two types of
assumption on the smoothness of $f^\epsilon$.
Assumption 1.1 Ordinary smooth convolution: the Fourier coefficients of $f^\epsilon$ decay in a
polynomial fashion, i.e. there exist a constant $C$ and a real $\nu \geq 0$ such that $|f^\epsilon_\ell| \sim C|\ell|^{-\nu}$.
Assumption 1.2 Super smooth convolution: the Fourier coefficients of $f^\epsilon$ are such that
$$d_1|\ell|^{\nu_0}\exp(-|\ell|^{\nu}/d_0) \leq |f^\epsilon_\ell| \leq d_2|\ell|^{\nu_1}\exp(-|\ell|^{\nu}/d_0) \quad \text{as } |\ell| \to \infty,$$
where $d_0, d_1, d_2, \nu, \nu_0, \nu_1$ are some positive constants.
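The two regimes can be illustrated numerically. In the Python sketch below, Laplace noise serves as an example of polynomial decay (exponent 2) and Gaussian noise as an example of exponential decay. The closed-form coefficients are the characteristic functions of these laws evaluated at frequency $2\pi\ell$. The noise laws, scales and function names are assumptions made for illustration. The last function shows the naive inversion $f^X_\ell = f^Y_\ell / f^\epsilon_\ell$, whose sampling error is inflated by the factor $1/|f^\epsilon_\ell|$.

```python
import numpy as np

def laplace_coeff(ell, b=0.1):
    """Fourier coefficient of a centred Laplace(b) density at frequency 2*pi*ell."""
    ell = np.asarray(ell, dtype=float)
    return 1.0 / (1.0 + (2.0 * np.pi * ell * b) ** 2)

def gaussian_coeff(ell, sigma=0.1):
    """Fourier coefficient of a centred N(0, sigma^2) density at frequency 2*pi*ell."""
    ell = np.asarray(ell, dtype=float)
    return np.exp(-0.5 * (2.0 * np.pi * ell * sigma) ** 2)

def local_decay_exponent(coeff, ell):
    """Slope -d log|f_ell| / d log(ell), estimated between ell and 2*ell."""
    return -np.log(coeff(2 * ell) / coeff(ell)) / np.log(2.0)

def empirical_coeffs(sample, ells):
    """Empirical Fourier coefficients (1/n) sum_j exp(-i 2 pi ell sample_j)."""
    ells = np.asarray(ells, dtype=float)
    return np.exp(-2j * np.pi * np.outer(ells, sample)).mean(axis=1)

def naive_deconvolved_coeffs(y, ells, noise_coeff):
    """Divide the empirical coefficients of Y by the known noise coefficients."""
    return empirical_coeffs(y, ells) / noise_coeff(ells)
```

For the Laplace case the local exponent settles near 2 at high frequencies, whereas for the Gaussian case it keeps growing with $\ell$. This is why dividing by $f^\epsilon_\ell$ is far more unstable under super smooth noise.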
In this paper, we also consider these two smoothness assumptions. The optimal rates
of convergence we can expect from a linear or a nonlinear wavelet estimator depend
on these smoothness assumptions and are well studied in the literature. To summarize,
we know from the work of Pensky and Vidakovic (1999) and Fan and Koo (2002) that for
ordinary smooth convolution both linear and nonlinear wavelet estimators achieve
the optimal rate of convergence. This rate is of polynomial order in the sample size $n$.
However, no adaptive linear estimator is optimal, and only well-calibrated nonlinear
wavelet estimators are adaptive. For the case of super smooth convolution, the optimal
rate of convergence is only of logarithmic order in the sample size, and there is no
difference between the rate of convergence of linear and nonlinear estimators. These
results are recalled in Section 3 below. It is worth mentioning that the estimators we
define in this paper achieve these optimal rates of convergence.
The next section recalls some general results on wavelet approximation and the
definition of the Meyer wavelet used for deconvolution. Then Section 3 defines the
linear and nonlinear wavelet estimators by information projection. The (optimal) rates
of convergence of the proposed estimators are stated in Section 4. The loss function
we consider to calculate these rates is the Kullback-Leibler divergence. Due to the
aforementioned difference with the wavelet estimator of Fan and Koo (2002), their
technique of proof is very different from the proof presented in this paper. Our proof
is actually based on a combination of the maxiset theorem in Johnstone et al. (2004) for
hard thresholding wavelet estimators and other results on Kullback-Leibler divergence
by Csiszár (1975) or Barron and Sheu (1991).
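For readers less familiar with this loss, the Kullback-Leibler divergence between two densities on $[0,1]$ is usually defined as follows. The symbols $D$, $f$ and $g$ are generic notation assumed here and need not match the notation used later in the paper.

```latex
\[
  D(f \,\|\, g) \;=\; \int_0^1 f(x)\,\log\frac{f(x)}{g(x)}\,dx .
\]
```

This quantity is finite only if $g$ is positive wherever $f$ is, which is one reason why a strictly positive estimator is convenient when the risk is measured in this way.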
The fruitful combination of wavelet thresholding and information projection is not
new and was proposed in Antoniadis and Bigot (2006), who were concerned with
the estimation of the intensity of a Poisson process using nonlinear wavelet Galerkin
methods. The following development is different, as it deals with the estimation of
log-densities and is specific to the case of deconvolution with the use of periodised
Meyer wavelets. The proof techniques also differ from those of Antoniadis and Bigot
(2006), who followed the Gaussian approximation technique developed in Donoho,
Johnstone, Kerkyacharian and Picard (1995). That approach is however not applicable
with periodised Meyer wavelets.
Section 5 addresses the practical issues of the proposed estimation procedure. We
compare the performance of the proposed estimator with two of the most recent
techniques for density deconvolution. The first is deconvolution via cosine series
studied by Hall and Qiu (2005), and the second is the model selection approach of
Comte, Rozenholc and Taupin (2006a). While the estimator by model selection showed
significant small sample improvements over most of the standard techniques
of deconvolution, the proposed wavelet-based estimator by information projection
outperforms the results of Comte et al. (2006a).
We conclude the paper with a technical appendix, which contains the proofs of the
main theorems, and where we adapt some results of Barron and Sheu (1991) to the
case of estimation by information projection using periodised Meyer wavelets.
2 Meyer wavelets for deconvolution
In this paper, we assume that the support of $f^X$ is compact and included in $[0,1]$. Of
course, this is not an assumption that would hold in many practical applications, and it
is mainly made for mathematical convenience, to define more easily the estimation of
$f^X$ by functions in an exponential family based on a finite linear combination of basis
functions (see the next section). The support of $f^\epsilon$, however, can be unbounded, so the
support of $f^Y$ is in general unbounded.¹
Wavelet systems provide unconditional bases for Besov spaces. Using wavelets,
one can express whether or not $f^X$ belongs to a Besov space by a simple requirement
on the absolute value of the wavelet coefficients of $f^X$. More precisely, assume that
$(\phi,\psi)$ denotes some scaling and wavelet functions that have enough regularity and
vanishing moments. If $\sigma = s + (1/2 - 1/p) \geq 0$, define the norm $\|\cdot\|_{s,p,q}$ by
It can be shown (Meyer, 1992) that this norm is equivalent to the norm in traditional
¹ The case where the support of $f^X$ is included in $[0,T]$ is handled by adapting the Fourier transform
(the corresponding exponential orthogonal system is $\exp(-i2\pi x\ell/T)$).