Adaptive estimation of covariance functions via wavelet thresholding
and information projection
Jérémie Bigot¹, Rolando J. Biscay Lirio³, Jean-Michel Loubes¹ & Lilian Muñiz Alvarez²
IMT, Université Paul Sabatier, Toulouse, France¹
Facultad de Matemática y Computación de la Universidad de La Habana²
Instituto de Cibernética, Matemática y Física, Cuba³
December 15, 2009
Abstract
In this paper, we study the problem of nonparametric adaptive estimation of the covariance function of a stationary Gaussian process. For this purpose, we consider a wavelet-based method which combines the ideas of wavelet approximation and estimation by information projection in order to guarantee the nonnegative definiteness of the solution. The spectral density of the process is estimated by projecting the wavelet thresholding expansion of the periodogram onto a family of exponential functions. This ensures that the spectral density estimator is a strictly positive function. Then, by Bochner's theorem, we obtain a nonnegative definite estimator of the covariance function. The theoretical behavior of the estimator is established in terms of the rate of convergence of the Kullback-Leibler discrepancy over Besov classes. We also show the excellent practical performance of the estimator in some numerical experiments.
Keywords: Covariance estimation, adaptive estimation, wavelet thresholding, sequences of exponential families, Besov spaces.
AMS classifications: Primary 62G07; secondary 42C40, 41A29.
1 Introduction
Estimating a covariance function is a fundamental problem in inference for stationary stochastic processes. Many applications in fields such as geosciences, ecology, demography and financial time series are deeply related to this issue; see for instance Journel and Huijbregts [19], Christakos [7] and Stein [28]. The purpose of this work is not only to provide an estimate but also to obtain an estimator which is itself a covariance function. In particular, we aim at preserving the property of nonnegative definiteness.
To this end, statisticians usually resort to fitting parametric models, which are numerous in the literature; see Cressie [9] for a detailed account of parametric covariance estimation. Nonparametric approaches provide more flexibility when constructing the estimator, but their main drawback comes from the difficulty of restricting to the class of nonnegative definite estimators. For example, Shapiro and Botha [27] suggest an estimator with this property on a discrete set but not on the continuum. In the nonstationary case and for spatio-temporal data, Sampson and Guttorp [25] propose an approach based on the particular covariance representation due to Schoenberg [26], which ensures that the resulting estimator is a covariance function. Hall, Fisher and Hoffman in [18] enforce
hal-00440424, version 2 - 15 Dec 2009
the nonnegative definiteness property on a kernel-type estimator of the covariance function using Bochner's theorem [3], which characterizes the class of continuous nonnegative definite functions by the behavior of their Fourier transform. However, this approach requires the precise choice of three parameters, including an optimal selection of the bandwidth of the kernel function. More recently, Elogne, Perrin and Thomas-Agnan [15] use interpolation methods for estimating smooth stationary covariances. The major drawback is that the computation of this estimator is difficult, since it involves the calculation of convolution integrals.
Bochner's theorem states that a continuous function on $\mathbb{R}^d$ is nonnegative definite if and only if it is the Fourier transform of a bounded nonnegative measure, called the spectral measure; when this measure has a density, that density is called the spectral density. Hence the estimation of the covariance function is strongly related to the estimation of the spectral density of the process.
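This connection is easy to see numerically. The sketch below (with an arbitrary nonnegative density chosen purely for illustration; nothing here is specific to the estimators of this paper) builds the covariance sequence of a nonnegative spectral density and checks that the resulting Toeplitz matrix is positive semidefinite:

```python
import numpy as np

# Sketch of Bochner's theorem in action; the density below is an arbitrary
# illustrative choice, not an estimator from this paper.
omega = np.linspace(0.0, 1.0, 2048, endpoint=False)
f = 1.0 + 0.8 * np.cos(2 * np.pi * omega) ** 2     # nonnegative by construction

def rho(h):
    """Fourier coefficient rho(h) = int_0^1 f(w) e^{-i 2 pi w h} dw (Riemann sum)."""
    return np.mean(f * np.exp(-2j * np.pi * omega * h))

# Bochner: the Toeplitz matrix [rho(t - t')]_{t,t'} is positive semidefinite,
# i.e. a valid covariance matrix for a stationary sequence.
n = 32
Sigma = np.array([[rho(a - b) for b in range(n)] for a in range(n)])
eigvals = np.linalg.eigvalsh(Sigma)
print(eigvals.min() >= -1e-10)   # numerically PSD: True
```

Any nonnegative $f$ works in place of the trigonometric polynomial above; the eigenvalue check is the discrete face of the nonnegative definiteness the paper aims to preserve.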
Actually, inference in the spectral domain uses the periodogram of the data, which provides an inconsistent estimator that must be smoothed in order to achieve consistency. For highly regular spectral densities, linear smoothing techniques such as kernel smoothing are appropriate (see Brillinger [4]). However, linear smoothing methods are not able to achieve the optimal mean-square rate of convergence for spectra whose smoothness is distributed inhomogeneously over the domain of interest. For this, nonlinear methods are needed. One nonlinear method for adaptive spectral density estimation of a stationary Gaussian sequence was proposed by Comte [8]; it is based on model selection techniques. Other nonlinear smoothing procedures are the wavelet thresholding methods, first proposed by Donoho and Johnstone [14]. In this context, different thresholding rules have been proposed by Neumann [23] and Fryzlewicz, Nason and von Sachs [16], to name but a few.
Neumann's approach [23] consists in pre-estimating the variance of the periodogram via kernel smoothing, so that it can be supplied to the wavelet estimation procedure. Kernel pre-estimation may not be appropriate in cases where the underlying spectral density is of low regularity. One way to avoid this problem is proposed in Fryzlewicz, Nason and von Sachs [16], where the empirical wavelet coefficient thresholds are built as appropriate local weighted $\ell^1$-norms of the periodogram. Their method does not produce a positive spectral density estimator, and therefore the corresponding estimator of the covariance function is not nonnegative definite.
To overcome the drawbacks of the previous estimators, in this paper we propose a new wavelet-based method for the estimation of the spectral density of a Gaussian process and of its corresponding covariance function. To ensure nonnegativity of the spectral density estimator, our method combines the ideas of wavelet thresholding and estimation by information projection. We estimate the spectral density by a projection of the nonlinear wavelet approximation of the periodogram onto a family of exponential functions. Therefore, the estimator is nonnegative by construction. Then, by Bochner's theorem, the corresponding estimator of the covariance function satisfies the nonnegative definiteness property. This technique was studied by Barron and Sheu [2] for the approximation of density functions by sequences of exponential families, by Loubes and Yan [21] for penalized maximum likelihood estimation with $\ell^1$ penalty, by Antoniadis and Bigot [1] for the study of Poisson inverse problems, and by Bigot and Van Bellegem [5] for log-density deconvolution.
The theoretical optimality of estimators of the spectral density of a stationary process is generally studied using risk bounds in the $L^2$ norm. This is the case in the papers of Neumann [23], Comte [8] and Fryzlewicz, Nason and von Sachs [16] mentioned before. In this work, the behavior of the proposed estimator is established in terms of the rate of convergence of the Kullback-Leibler discrepancy over Besov classes, which is perhaps a more natural loss function for the estimation of a spectral density than the $L^2$ norm. Moreover, the thresholding rules that we use to derive adaptive estimators differ from previous approaches based on wavelet decompositions and are quite
simple to compute. Finally, we compare the performance of our estimator with other estimators in some simulations.
The paper is organized as follows. Section 2 presents the statistical framework under which we work. We define the model, the wavelet-based exponential family and the linear and nonlinear wavelet estimators by information projection. We also recall the definition of the Kullback-Leibler divergence and some results on Besov spaces. The rates of convergence of the proposed estimators are stated in Section 3. Some numerical experiments are described in Section 4. Technical lemmas and proofs of the main theorems are gathered in the Appendix.
Throughout this paper, $C$ denotes a constant that may vary from line to line. The notation $C(\cdot)$ specifies the dependency of $C$ on some quantities.
2 Statistical framework
2.1 The model
We aim at providing a nonparametric adaptive estimation of the spectral density which satisfies the
property of being nonnegative in order to guarantee that the covariance estimator is a nonnegative
definite function. We consider a sequence $(X_t)_{t\in\mathbb{N}}$ that satisfies the following assumption:

Assumption 1 The sequence $(X_1,\dots,X_n)$ is an $n$-sample drawn from a stationary sequence of Gaussian random variables.

Let $\rho$ be the covariance function of the process, i.e. $\rho(h) = \operatorname{cov}(X_t, X_{t+h})$ with $h \in \mathbb{Z}$. The spectral density $f$ is defined as
$$ f(\omega) = \frac{1}{2\pi} \sum_{h\in\mathbb{Z}} \rho(h)\, e^{i2\pi\omega h}, \qquad \omega \in [0,1]. $$
We need the following standard assumption on $\rho$:

Assumption 2 The covariance function $\rho$ is nonnegative definite, and there exist two constants $0 < C_1, C_2 < +\infty$ such that
$$ \sum_{h\in\mathbb{Z}} |\rho(h)| = C_1 \quad\text{and}\quad \sum_{h\in\mathbb{Z}} \big| h\,\rho^2(h) \big| = C_2. $$
Assumption 2 implies in particular that the spectral density f is bounded by the constant C1.
As a consequence, it is also square integrable. As in Comte [8], the data consist of $n$ observations $X_1,\dots,X_n$ at regularly spaced points. We want to obtain a positive estimator of the spectral density $f$, without parametric assumptions, on the basis of these observations. For this, we combine the ideas of wavelet thresholding and estimation by information projection.
2.2 Estimation by information projection
2.2.1 Wavelet-based exponential family
To ensure nonnegativity of the estimator, we will look for approximations over an exponential family. For this, we construct a sieve of exponential functions defined in a wavelet basis.
Let φ(ω) and ψ (ω), respectively, be the scaling and the wavelet functions generated by an
orthonormal multiresolution decomposition of L2([0,1]), see Mallat [22] for a detailed exposition
on wavelet analysis. Throughout the paper, the functions $\phi$ and $\psi$ are supposed to be compactly supported and such that $\|\phi\|_\infty < +\infty$, $\|\psi\|_\infty < +\infty$. Then, for any integer $j_0 \ge 0$, any function $g \in L^2([0,1])$ has the following representation:
$$ g(\omega) = \sum_{k=0}^{2^{j_0}-1} \langle g, \phi_{j_0,k}\rangle\, \phi_{j_0,k}(\omega) + \sum_{j=j_0}^{+\infty} \sum_{k=0}^{2^j-1} \langle g, \psi_{j,k}\rangle\, \psi_{j,k}(\omega), $$
where $\phi_{j_0,k}(\omega) = 2^{j_0/2}\phi(2^{j_0}\omega - k)$ and $\psi_{j,k}(\omega) = 2^{j/2}\psi(2^j\omega - k)$. The main idea of this paper is to expand the spectral density $f$ onto this wavelet basis and to find an estimator of this expansion that is then modified to impose the positivity property. The scaling and wavelet coefficients of the spectral density function $f$ are denoted by $a_{j_0,k} = \langle f, \phi_{j_0,k}\rangle$ and $b_{j,k} = \langle f, \psi_{j,k}\rangle$.

To simplify the notations, we write $(\psi_{j,k})_{j=j_0-1}$ for the scaling functions $(\phi_{j,k})_{j=j_0}$. Let $j_1 \ge j_0$ and define the set
$$ \Lambda_{j_1} = \big\{ (j,k) : j_0 - 1 \le j < j_1,\ 0 \le k \le 2^j - 1 \big\}. $$
Note that $\#\Lambda_{j_1} = 2^{j_1}$, where $\#\Lambda_{j_1}$ denotes the cardinality of $\Lambda_{j_1}$. Letting $\theta$ denote a vector in $\mathbb{R}^{\#\Lambda_{j_1}}$, the wavelet-based exponential family $\mathcal{E}_{j_1}$ at scale $j_1$ is defined as the set of functions
$$ \mathcal{E}_{j_1} = \left\{ f_{j_1,\theta}(\cdot) = \exp\Big( \sum_{(j,k)\in\Lambda_{j_1}} \theta_{j,k}\psi_{j,k}(\cdot) \Big),\ \theta = (\theta_{j,k})_{(j,k)\in\Lambda_{j_1}} \in \mathbb{R}^{\#\Lambda_{j_1}} \right\}. \tag{2.1} $$
We will enforce our estimator of the spectral density to belong to the family $\mathcal{E}_{j_1}$ of exponential functions, which are positive by definition.
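As a concrete (and purely illustrative) instantiation of the family (2.1), the sketch below uses the Haar basis with $j_0 = 0$ to keep the code self-contained; the experiments of Section 4 use Symmlet 8 instead. Whatever the value of $\theta$, the resulting function is strictly positive:

```python
import numpy as np

def haar_phi(x):                         # Haar scaling function on [0, 1)
    return ((x >= 0) & (x < 1)) * 1.0

def haar_psi(x):                         # Haar mother wavelet
    return ((x >= 0) & (x < 0.5)) * 1.0 - ((x >= 0.5) & (x < 1)) * 1.0

def basis(j, k, x):
    """psi_{j,k}; by the paper's convention, j = j0 - 1 labels the scaling functions."""
    if j == -1:                          # here j0 = 0, so j = -1 means phi_{0,0}
        return haar_phi(x)
    return 2 ** (j / 2) * haar_psi(2 ** j * x - k)

def f_exp(theta, x, j1):
    """A member of E_{j1}: exp( sum_{(j,k)} theta_{j,k} psi_{j,k}(x) )."""
    s = np.zeros_like(x)
    i = 0
    for j in range(-1, j1):
        for k in range(2 ** max(j, 0)):  # one scaling function at j = -1
            s += theta[i] * basis(j, k, x)
            i += 1
    return np.exp(s)

x = np.linspace(0, 1, 512, endpoint=False)
j1 = 3
theta = 0.1 * np.ones(2 ** j1)           # #Lambda_{j1} = 2^{j1} coefficients
print(f_exp(theta, x, j1).min() > 0)     # strictly positive for any theta: True
```

The loop enumerates exactly $\#\Lambda_{j_1} = 2^{j_1}$ coefficients (one scaling function plus $2^0 + \dots + 2^{j_1-1}$ wavelets), matching the cardinality stated above.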
2.2.2 Information projection
Following Csiszár [10], it is possible to define the projection of a function $f$ onto $\mathcal{E}_{j_1}$. If this projection exists, it is defined as the function $f_{j_1,\theta^*_{j_1}}$ in the exponential family $\mathcal{E}_{j_1}$ that is the closest to the true function $f$ in the Kullback-Leibler sense, and it is characterized as the unique function in the family $\mathcal{E}_{j_1}$ for which
$$ \big\langle f_{j_1,\theta^*_{j_1}}, \psi_{j,k} \big\rangle = \langle f, \psi_{j,k}\rangle := \beta_{j,k} \quad \text{for all } (j,k) \in \Lambda_{j_1}. $$
Note that the notation $\beta_{j,k}$ is used to denote both the scaling coefficients $a_{j_0,k}$ and the wavelet coefficients $b_{j,k}$.

Let
$$ I_n(\omega) = \frac{1}{2\pi n} \sum_{t=1}^{n} \sum_{t'=1}^{n} \big(X_t - \bar{X}\big)\big(X_{t'} - \bar{X}\big)^* e^{i2\pi\omega(t-t')} $$
be the classical periodogram, where $(X_t - \bar{X})^*$ denotes the conjugate transpose of $(X_t - \bar{X})$ and $\bar{X} = \frac{1}{n}\sum_{t=1}^n X_t$. The expansion of $I_n(\omega)$ onto the wavelet basis allows one to obtain estimators of $a_{j_0,k}$ and $b_{j,k}$ given by
$$ \hat{a}_{j_0,k} = \int_0^1 I_n(\omega)\phi_{j_0,k}(\omega)\,d\omega \quad\text{and}\quad \hat{b}_{j,k} = \int_0^1 I_n(\omega)\psi_{j,k}(\omega)\,d\omega. \tag{2.2} $$
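For a real-valued sample, the double sum defining $I_n$ collapses to $I_n(\omega) = \big|\sum_t (X_t - \bar X)\, e^{i2\pi\omega t}\big|^2/(2\pi n)$, and the integrals in (2.2) can be approximated by Riemann sums. A sketch (the i.i.d. placeholder sample, the grid size and the Haar wavelet are illustrative assumptions only):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
X = rng.standard_normal(n)        # placeholder for a stationary Gaussian sample

# For real data, I_n(w) = |sum_t (X_t - Xbar) e^{i 2 pi w t}|^2 / (2 pi n).
omega = (np.arange(1024) + 0.5) / 1024
t = np.arange(1, n + 1)
dft = np.exp(2j * np.pi * np.outer(omega, t)) @ (X - X.mean())
I_n = np.abs(dft) ** 2 / (2 * np.pi * n)

def haar_psi_jk(j, k, x):         # illustrative wavelet; the paper uses Symmlet 8
    u = 2 ** j * x - k
    return 2 ** (j / 2) * (((u >= 0) & (u < 0.5)) * 1.0 - ((u >= 0.5) & (u < 1)) * 1.0)

# Empirical coefficient b_hat_{j,k} = int_0^1 I_n(w) psi_{j,k}(w) dw, by Riemann sum:
b_hat = np.mean(I_n * haar_psi_jk(2, 1, omega))
print(I_n.min() >= 0 and np.isfinite(b_hat))   # the periodogram is nonnegative: True
```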
It seems therefore natural to estimate the function $f$ by searching for some $\hat{\theta}_n \in \mathbb{R}^{\#\Lambda_{j_1}}$ such that
$$ \big\langle f_{j_1,\hat{\theta}_n}, \psi_{j,k} \big\rangle = \int_0^1 I_n(\omega)\psi_{j,k}(\omega)\,d\omega := \hat{\beta}_{j,k} \quad \text{for all } (j,k) \in \Lambda_{j_1}, \tag{2.3} $$
where $\hat{\beta}_{j,k}$ denotes both the estimated scaling coefficients $\hat{a}_{j_0,k}$ and the estimated wavelet coefficients $\hat{b}_{j,k}$. The function $f_{j_1,\hat{\theta}_n}$ is the positive linear estimator of the spectral density.

Similarly, the positive nonlinear estimator with hard thresholding is defined as the function $f^{HT}_{j_1,\hat{\theta}_n,\xi}$ (with $\hat{\theta}_n \in \mathbb{R}^{\#\Lambda_{j_1}}$) such that
$$ \big\langle f^{HT}_{j_1,\hat{\theta}_n,\xi}, \psi_{j,k} \big\rangle = \delta_\xi\big(\hat{\beta}_{j,k}\big) \quad \text{for all } (j,k) \in \Lambda_{j_1}, \tag{2.4} $$
where $\delta_\xi$ denotes the hard thresholding rule defined by
$$ \delta_\xi(x) = x\,\mathbb{1}\big(|x| \ge \xi\big) \quad \text{for } x \in \mathbb{R}, $$
where $\xi > 0$ is an appropriate threshold whose choice is discussed later on.
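As a minimal sketch, the rule $\delta_\xi$ acts coordinatewise on the empirical coefficients (we read the indicator as acting on the magnitude $|x|$, as is standard for hard thresholding):

```python
import numpy as np

def delta(x, xi):
    """Hard thresholding rule: keep a coefficient only if its magnitude reaches xi."""
    x = np.asarray(x, dtype=float)
    return x * (np.abs(x) >= xi)

print(delta([0.3, -0.7, 0.05], 0.2).tolist())  # [0.3, -0.7, 0.0]
```

Unlike soft thresholding, surviving coefficients are kept unchanged, which is why the thresholded expansion (4.2) below still tracks the peaks of the periodogram.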
The existence of these estimators is questionable. Moreover, there is no way to obtain an explicit expression for $\hat{\theta}_n$. In our simulations, we use a numerical approximation of $\hat{\theta}_n$ that is obtained via a gradient-descent algorithm with an adaptive step. Proving that such estimators exist with probability one is a difficult task. For the related problem of estimating a density from independent and identically distributed observations, it is even shown in Barron and Sheu [2] that for some exponential families (e.g. based on a spline basis), the vector $\hat{\theta}_n$ may fail to exist with a small positive probability. Thus, in the next sections, some sufficient conditions are given for the existence of $f_{j_1,\hat{\theta}_n}$ and $f^{HT}_{j_1,\hat{\theta}_n,\xi}$ with probability tending to one as $n \to +\infty$.
2.3 An appropriate loss function: the Kullback-Leibler divergence
To assess the quality of the estimators, we will measure the discrepancy between an estimator $\hat{f}$ and the true function $f$ in the sense of relative entropy (Kullback-Leibler divergence), defined by
$$ \Delta\big(f;\hat{f}\big) = \int_0^1 \Big( f \log\frac{f}{\hat{f}} - f + \hat{f} \Big)\,d\mu, $$
where $\mu$ denotes the Lebesgue measure on $[0,1]$. It can be shown that $\Delta\big(f;\hat{f}\big)$ is nonnegative and equals zero if and only if $\hat{f} = f$.
2.4 Smoothness Assumptions
It is well known that Besov spaces for periodic functions in $L^2([0,1])$ can be characterized in terms of wavelet coefficients (see e.g. Mallat [22]). Assume that $\psi$ has $m$ vanishing moments, and let $0 < s < m$ denote the usual smoothness parameter. Then, for a Besov ball $B^s_{p,q}(A)$ of radius $A > 0$ with $1 \le p, q \le \infty$, one has, with $s^* = s + 1/2 - 1/p \ge 0$,
$$ B^s_{p,q}(A) := \left\{ g \in L^2([0,1]) : \|g\|_{s,p,q} := \Big( \sum_{k=0}^{2^{j_0}-1} |a_{j_0,k}|^p \Big)^{\frac{1}{p}} + \Big( \sum_{j=j_0}^{\infty} 2^{j s^* q} \Big( \sum_{k=0}^{2^j-1} |b_{j,k}|^p \Big)^{\frac{q}{p}} \Big)^{\frac{1}{q}} \le A \right\}, $$
with the respective sums above replaced by maxima if $p = \infty$ or $q = \infty$, and where $a_{j_0,k} = \langle g, \phi_{j_0,k}\rangle$ and $b_{j,k} = \langle g, \psi_{j,k}\rangle$.

The condition $s + 1/2 - 1/p \ge 0$ is imposed to ensure that $B^s_{p,q}(A)$ is a subspace of $L^2([0,1])$, and we shall restrict ourselves to this case in this paper (although not always stated, it is clear that all our results hold for $s < m$). Besov spaces allow for more variability in local smoothness than is typical for functions in the usual Hölder or Sobolev spaces. For instance, a real function $f$ on $[0,1]$ that is piecewise continuous, but for which each piece is locally in $C^s$, can be an element of $B^s_{p,p}(A)$ with $1 \le p < 2$, despite the possibility of discontinuities at the transitions from one piece to the next (see e.g. Proposition 9.2 in Mallat [22]). Note that if $s > 1$ is not an integer, then $B^s_{2,2}(A)$ is equivalent to a Sobolev ball of order $s$. Moreover, the space $B^s_{p,q}(A)$ with $1 \le p < 2$ contains piecewise smooth functions with local irregularities such as discontinuities.

Let $M > 0$ and denote by $\mathcal{F}^s_{p,q}(M)$ the set of functions
$$ \mathcal{F}^s_{p,q}(M) = \big\{ f = \exp(g) : \|g\|_{s,p,q} \le M \big\}, $$
where $\|g\|_{s,p,q}$ denotes the norm in the Besov space $B^s_{p,q}$. Note that assuming that $f \in \mathcal{F}^s_{p,q}(M)$ implies that $f$ is strictly positive. In the next section we establish the rates of convergence of our estimators in terms of the Kullback-Leibler discrepancy over Besov classes.
3 Asymptotic behavior of the estimators
We make the following assumption on the wavelet basis, which guarantees that Assumption 2 holds uniformly over $\mathcal{F}^s_{p,q}(M)$.

Assumption 3 Let $M > 0$, $1 \le p \le 2$ and $s > 1/p$. For $f \in \mathcal{F}^s_{p,q}(M)$ and $h \in \mathbb{Z}$, let $\rho(h) = \int_0^1 f(\omega)e^{-i2\pi\omega h}\,d\omega$, $C_1(f) := \sum_{h\in\mathbb{Z}} |\rho(h)|$ and $C_2(f) := \sum_{h\in\mathbb{Z}} \big| h\,\rho^2(h) \big|$. Then, the wavelet basis is such that there exists a constant $M^*$ such that for all $f \in \mathcal{F}^s_{p,q}(M)$, $C_1(f) \le M^*$ and $C_2(f) \le M^*$.
3.1 Linear estimation
The following theorem is the general result on the linear information projection estimator of the spectral density function. Note that the choice of the coarse resolution level $j_0$ is of minor importance, and without loss of generality we take $j_0 = 0$ for the linear estimator $f_{j_1,\hat{\theta}_n}$.

Theorem 3.1 Assume that $f \in \mathcal{F}^s_{2,2}(M)$ with $s > \frac{1}{2}$, and suppose that Assumptions 1, 2 and 3 are satisfied. Define $j_1 = j_1(n)$ as the largest integer such that $2^{j_1} \le n^{\frac{1}{2s+1}}$. Then, with probability tending to one as $n \to +\infty$, the information projection estimator (2.3) exists and satisfies
$$ \Delta\big(f; f_{j_1(n),\hat{\theta}_n}\big) = O_p\Big( n^{-\frac{2s}{2s+1}} \Big). $$
Moreover, the convergence is uniform over the class $\mathcal{F}^s_{2,2}(M)$ in the sense that
$$ \lim_{K\to+\infty} \lim_{n\to+\infty} \sup_{f\in\mathcal{F}^s_{2,2}(M)} \mathbb{P}\Big( n^{\frac{2s}{2s+1}}\, \Delta\big(f; f_{j_1(n),\hat{\theta}_n}\big) > K \Big) = 0. $$
This theorem provides the existence, with probability tending to one, of a linear estimator of the spectral density $f$ given by $f_{j_1(n),\hat{\theta}_n}$. This estimator is strictly positive by construction. Therefore, the corresponding estimator $\hat{\rho}_L$ of the covariance function (which is obtained as the inverse Fourier transform of $f_{j_1(n),\hat{\theta}_n}$) is a nonnegative definite function by Bochner's theorem. Hence $\hat{\rho}_L$ is a covariance function.

In the related problem of density estimation from an i.i.d. sample, Koo [20] has shown that, for the Kullback-Leibler divergence, $n^{-\frac{2s}{2s+1}}$ is the fastest rate of convergence for the problem of estimating a density $f$ such that $\log(f)$ belongs to the space $B^s_{2,2}(M)$. For spectral densities belonging to a general Besov ball $B^s_{p,q}(M)$, Neumann [23] has also shown that $n^{-\frac{2s}{2s+1}}$ is an optimal rate of convergence for the $L^2$ risk. For the Kullback-Leibler divergence, we conjecture that $n^{-\frac{2s}{2s+1}}$ is the minimax rate of convergence for spectral densities belonging to $\mathcal{F}^s_{2,2}(M)$.

However, the result obtained in the above theorem is nonadaptive, because the selection of $j_1(n)$ depends on the unknown smoothness $s$ of $f$. Moreover, the result is only suited to smooth functions (as $\mathcal{F}^s_{2,2}(M)$ corresponds to a Sobolev space of order $s$) and does not attain an optimal rate of convergence when, for example, $g = \log(f)$ has singularities. We therefore propose in the next section an adaptive estimator derived by applying an appropriate nonlinear thresholding procedure.
3.2 Adaptive estimation
3.2.1 The bound on f is known
In adaptive estimation, we need to define an appropriate thresholding rule for the wavelet coefficients of the periodogram. This threshold is level-dependent and in this paper takes the form
$$ \xi = \xi_{j,n} = 2\sqrt{2}\,\|f\|_\infty \left( \sqrt{\frac{\delta \log n}{n}} + 2^{j/2}\|\psi\|_\infty \frac{\delta \log n}{n} \right) + \frac{C^*}{\sqrt{n}}, \tag{3.1} $$
where $\delta \ge 0$ is a tuning parameter whose choice will be discussed later on and $C^* = \sqrt{C_2 + \frac{39 C_1^2}{4\pi^2}}$. The following theorem states that the relative entropy between the true $f$ and its nonlinear estimator achieves in probability the conjectured optimal rate of convergence, up to a logarithmic factor, over a wide range of Besov balls.

Theorem 3.2 Assume that $f \in \mathcal{F}^s_{p,q}(M)$ with $s > \frac{1}{2} + \frac{1}{p}$ and $1 \le p \le 2$. Suppose also that Assumptions 1, 2 and 3 hold. For any $n > 1$, define $j_0 = j_0(n)$ to be the integer such that $2^{j_0} \ge \log n \ge 2^{j_0-1}$, and $j_1 = j_1(n)$ to be the integer such that $2^{j_1} \ge \frac{n}{\log n} \ge 2^{j_1-1}$. For $\delta \ge 6$, take the threshold $\xi_{j,n}$ as in (3.1). Then, the thresholding estimator (2.4) exists with probability tending to one as $n \to +\infty$ and satisfies
$$ \Delta\big(f; f^{HT}_{j_0(n),j_1(n),\hat{\theta}_n,\xi_{j,n}}\big) = O_p\left( \Big(\frac{n}{\log n}\Big)^{-\frac{2s}{2s+1}} \right). $$
Note that the choices of $j_0$, $j_1$ and $\xi_{j,n}$ are independent of the parameter $s$; hence the estimator $f^{HT}_{j_0(n),j_1(n),\hat{\theta}_n,\xi_{j,n}}$ is an adaptive estimator which attains in probability what we claim is the optimal rate of convergence, up to a logarithmic factor. In particular, $f^{HT}_{j_0(n),j_1(n),\hat{\theta}_n,\xi_{j,n}}$ is adaptive on $\mathcal{F}^s_{2,2}(M)$.

This theorem provides the existence, with probability tending to one, of a nonlinear estimator of the spectral density. This estimator is strictly positive by construction. Therefore, the corresponding estimator $\hat{\rho}_{NL}$ of the covariance function (which is obtained as the inverse Fourier transform of $f^{HT}_{j_0(n),j_1(n),\hat{\theta}_n,\xi_{j,n}}$) is a nonnegative definite function by Bochner's theorem. Hence $\hat{\rho}_{NL}$ is a covariance function.
3.2.2 Estimating the bound of f
Although the results of Theorem 3.2 are certainly of some theoretical interest, they are not helpful for practical applications. The (deterministic) threshold $\xi_{j,n}$ depends on the unknown quantities $\|f\|_\infty$ and $C^* := C(C_1,C_2)$, where $C_1$ and $C_2$ are unknown constants. To make the method applicable, it is necessary to find a completely data-driven rule for the threshold which works well over as wide a range of smoothness classes as possible. In this subsection, we give an extension that leads to a random threshold which no longer depends on the bound on $f$ nor on $C^*$. For this, let us consider the dyadic partition of $[0,1]$ given by $\mathcal{I}_n = \big\{ [j/2^{J_n}, (j+1)/2^{J_n}],\ j = 0,\dots,2^{J_n}-1 \big\}$. Given some positive integer $r$, we define $P_n$ as the space of piecewise polynomials of degree $r$ on the dyadic partition $\mathcal{I}_n$ of step $2^{-J_n}$. The dimension of $P_n$ depends on $n$ and is denoted by $N_n$. Note that $N_n = (r+1)2^{J_n}$. This family is regular in the sense that the partition $\mathcal{I}_n$ has equispaced knots.

An estimator of $\|f\|_\infty$ is constructed as proposed by Birgé and Massart [6] in the following way. We take the supremum norm of $\hat{f}_n$, where $\hat{f}_n$ denotes the (empirical) orthogonal projection of the periodogram $I_n$ on $P_n$. We denote by $f_n$ the $L^2$ orthogonal projection of $f$ on the same space. Then the following theorem holds.

Theorem 3.3 Assume that $f \in \mathcal{F}^s_{p,q}(M)$ with $s > \frac{1}{2} + \frac{1}{p}$ and $1 \le p \le 2$. Suppose also that Assumptions 1, 2 and 3 hold. For any $n > 1$, let $j_0 = j_0(n)$ be the integer such that $2^{j_0} \ge \log n \ge 2^{j_0-1}$, and let $j_1 = j_1(n)$ be the integer such that $2^{j_1} \ge \frac{n}{\log n} \ge 2^{j_1-1}$. Take the constants $\delta = 6$ and $b \in \big[\frac{3}{4},1\big)$, and define the threshold
$$ \hat{\xi}_{j,n} = 2\sqrt{2}\,\big\|\hat{f}_n\big\|_\infty \left( \sqrt{\frac{\delta}{(1-b)^2}\frac{\log n}{n}} + 2^{j/2}\|\psi\|_\infty \frac{\delta}{(1-b)^2}\frac{\log n}{n} \right) + \sqrt{\frac{\log n}{n}}. \tag{3.2} $$
Then, if $\|f - f_n\|_\infty \le \frac{1}{4}\|f\|_\infty$ and $N_n \le \frac{\kappa}{(r+1)^2}\frac{n}{\log n}$, where $\kappa$ is a numerical constant and $r$ is the degree of the polynomials, the thresholding estimator (2.4) exists with probability tending to one as $n \to +\infty$ and satisfies
$$ \Delta\big(f; f^{HT}_{j_0(n),j_1(n),\hat{\theta}_n,\hat{\xi}_{j,n}}\big) = O_p\left( \Big(\frac{n}{\log n}\Big)^{-\frac{2s}{2s+1}} \right). $$
Note that we finally obtain a fully tractable estimator of $f$ which reaches the optimal rate of convergence without prior knowledge of the regularity of the spectral density, and which also gives rise to a genuine covariance estimator.

We point out that in Comte [8] the condition $\|f - f_n\|_\infty \le \frac{1}{4}\|f\|_\infty$ is assumed. Under some regularity conditions on $f$, results from approximation theory entail that this condition is met. Indeed, for $f \in B^s_{p,\infty}$ with $s > \frac{1}{p}$, we know from DeVore and Lorentz [13] that
$$ \|f - f_n\|_\infty \le C(s)\,|f|_{s,p}\,N_n^{-\left(s-\frac{1}{p}\right)}, $$
with $|f|_{s,p} = \sup_{y>0} y^{-s} w_d(f,y)_p < +\infty$, where $w_d(f,y)_p$ is the modulus of smoothness and $d = [s]+1$. Therefore $\|f - f_n\|_\infty \le \frac{1}{4}\|f\|_\infty$ if $N_n \ge \Big( \frac{4C(s)|f|_{s,p}}{\|f\|_\infty} \Big)^{\frac{1}{s-1/p}} := C(f,s,p)$, where $C(f,s,p)$ is a constant depending on $f$, $s$ and $p$.
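The data-driven rule above is cheap to evaluate. A hedged sketch of the threshold computation (the constants, in particular the factor $2\sqrt{2}$, follow our reading of (3.2); `f_hat_sup` and `psi_sup` stand for $\|\hat f_n\|_\infty$ and $\|\psi\|_\infty$):

```python
import numpy as np

def threshold_hat(j, n, f_hat_sup, psi_sup, delta=6.0, b=0.75):
    """Level-dependent data-driven threshold, following our reading of (3.2)."""
    r = delta / (1.0 - b) ** 2 * np.log(n) / n
    return (2.0 * np.sqrt(2.0) * f_hat_sup
            * (np.sqrt(r) + 2.0 ** (j / 2) * psi_sup * r)
            + np.sqrt(np.log(n) / n))

# Thresholds grow with the level j and shrink with the sample size n:
print(threshold_hat(3, 1024, 1.0, 1.0) > threshold_hat(2, 1024, 1.0, 1.0))   # True
print(threshold_hat(3, 4096, 1.0, 1.0) < threshold_hat(3, 1024, 1.0, 1.0))   # True
```

The level-dependence through $2^{j/2}$ mirrors the growth of $\|\psi_{j,k}\|_\infty$, so finer scales are thresholded more aggressively.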
4 Numerical experiments
In this section we present some numerical experiments which support the claims made in the theoretical part of this paper. The programs for our simulations were implemented using the MATLAB programming environment. We simulate a time series which is a superposition of an ARMA(2,2) process and a Gaussian white noise:
$$ X_t = Y_t + c_0 Z_t, \tag{4.1} $$
where $Y_t + a_1 Y_{t-1} + a_2 Y_{t-2} = b_0\varepsilon_t + b_1\varepsilon_{t-1} + b_2\varepsilon_{t-2}$, and $\{\varepsilon_t\}$, $\{Z_t\}$ are independent Gaussian white noise processes with unit variance. The constants were chosen as $a_1 = 0.2$, $a_2 = 0.9$, $b_0 = 1$, $b_1 = 0$, $b_2 = 1$ and $c_0 = 0.5$. We generated a sample of size $n = 1024$ according to (4.1). The spectral density $f$ of $(X_t)$ is shown in Figure 1. It has two moderately sharp peaks and is smooth in the rest of the domain.
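The model (4.1) can be reproduced in a few lines (a Python sketch of the MATLAB experiment; the burn-in length, the seed, and the $1/(2\pi)$ normalization of the displayed density are our choices, the latter matching the convention of Section 2):

```python
import numpy as np

rng = np.random.default_rng(42)
n, burn = 1024, 200
a1, a2 = 0.2, 0.9                    # AR part of (4.1)
b0, b1, b2 = 1.0, 0.0, 1.0           # MA part
c0 = 0.5

eps = rng.standard_normal(n + burn)
Y = np.zeros(n + burn)
for t in range(2, n + burn):         # Y_t + a1 Y_{t-1} + a2 Y_{t-2} = MA(eps)
    Y[t] = -a1 * Y[t - 1] - a2 * Y[t - 2] + b0 * eps[t] + b1 * eps[t - 1] + b2 * eps[t - 2]
X = Y[burn:] + c0 * rng.standard_normal(n)    # superposed white noise c0 * Z_t

# ARMA transfer function gives the spectral density of X (ARMA part plus
# white-noise part), up to the normalization convention adopted here:
w = np.linspace(0, 1, 512, endpoint=False)
z = np.exp(-2j * np.pi * w)
f = (np.abs(b0 + b1 * z + b2 * z ** 2) ** 2
     / np.abs(1 + a1 * z + a2 * z ** 2) ** 2 + c0 ** 2) / (2 * np.pi)
print(len(X) == 1024 and np.all(f > 0))   # True
```

With $a_2 = 0.9$ the AR roots lie close to the unit circle, which is what produces the two moderately sharp spectral peaks mentioned above.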
Starting from the periodogram, we considered the Symmlet 8 basis, i.e. the least asymmetric, compactly supported wavelets described in Daubechies [11]. We chose $j_0$ and $j_1$ as in the hypotheses of Theorem 3.3 and left the coefficients assigned to the father wavelets unthresholded. Hard thresholding is performed using the threshold $\hat{\xi}_{j,n}$ as in (3.2) for the levels $j = j_0,\dots,j_1$, and the empirical coefficients from the higher resolution scales $j > j_1$ are set to zero. This gives the estimate
$$ f^{HT}_{j_0,j_1,\hat{\xi}_{j,n}} = \sum_{k=0}^{2^{j_0}-1} \hat{a}_{j_0,k}\phi_{j_0,k} + \sum_{j=j_0}^{j_1} \sum_{k=0}^{2^j-1} \hat{b}_{j,k}\,\mathbb{1}\Big( \big|\hat{b}_{j,k}\big| > \hat{\xi}_{j,n} \Big)\psi_{j,k}, \tag{4.2} $$
which is obtained by simply thresholding the wavelet coefficients (2.2) of the periodogram. Note that such an estimator is not guaranteed to be strictly positive on the interval $[0,1]$. However, we use it to build our strictly positive estimator $f^{HT}_{j_0,j_1,\hat{\theta}_n,\hat{\xi}_{j,n}}$ (see (2.4) to recall its definition). We want to find $\hat{\theta}_n$ such that
$$ \big\langle f^{HT}_{j_0,j_1,\hat{\theta}_n,\hat{\xi}_{j,n}}, \psi_{j,k} \big\rangle = \delta_{\hat{\xi}_{j,n}}\big(\hat{\beta}_{j,k}\big) \quad \text{for all } (j,k) \in \Lambda_{j_1}. $$
For this, we take
$$ \hat{\theta}_n = \underset{\theta\in\mathbb{R}^{\#\Lambda_{j_1}}}{\operatorname{argmin}} \sum_{(j,k)\in\Lambda_{j_1}} \Big( \langle f_{j_0,j_1,\theta}, \psi_{j,k}\rangle - \delta_{\hat{\xi}_{j,n}}\big(\hat{\beta}_{j,k}\big) \Big)^2, $$
where $f_{j_0,j_1,\theta}(\cdot) = \exp\big( \sum_{(j,k)\in\Lambda_{j_1}} \theta_{j,k}\psi_{j,k}(\cdot) \big) \in \mathcal{E}_{j_1}$ and $\mathcal{E}_{j_1}$ is the family (2.1). To solve this optimization problem we used a gradient descent method with an adaptive step, taking as initial value
$$ \theta_0 = \Big( \big\langle \log\big( \big(f^{HT}_{j_0,j_1,\hat{\xi}_{j,n}}\big)_+ \big), \psi_{j,k} \big\rangle \Big)_{(j,k)\in\Lambda_{j_1}}, $$
where $\big(f^{HT}_{j_0,j_1,\hat{\xi}_{j,n}}(\omega)\big)_+ := \max\big( f^{HT}_{j_0,j_1,\hat{\xi}_{j,n}}(\omega), \eta \big)$ for all $\omega \in [0,1]$ and $\eta > 0$ is a small constant.
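The least-squares fit of $\hat\theta_n$ by gradient descent with an adaptive (backtracking) step can be sketched as follows; the Haar design, grid size and target coefficients `d` are hypothetical stand-ins for the thresholded empirical coefficients:

```python
import numpy as np

# Haar design: rows of Psi are phi_{0,0} and psi_{j,k}, j = 0..j1-1, on a grid.
G, j1 = 512, 3
x = (np.arange(G) + 0.5) / G
rows = [np.ones(G)]                                # phi_{0,0}
for j in range(j1):
    for k in range(2 ** j):
        u = 2 ** j * x - k
        rows.append(2 ** (j / 2) * (((u >= 0) & (u < 0.5)) * 1.0
                                    - ((u >= 0.5) & (u < 1)) * 1.0))
Psi = np.array(rows)                               # shape (2^{j1}, G)

d = 0.05 * np.ones(Psi.shape[0])                   # hypothetical thresholded targets
d[0] = 0.3

def coeffs(theta):
    f = np.exp(theta @ Psi)                        # member of the exponential family
    return f, Psi @ f / G                          # <f_theta, psi_{j,k}> by Riemann sum

theta = np.zeros(Psi.shape[0])
step = 1.0
f, c = coeffs(theta)
obj = np.sum((c - d) ** 2)
for _ in range(500):                               # gradient descent, adaptive step
    grad = 2.0 * Psi @ (f * ((c - d) @ Psi)) / G   # chain rule through exp
    while True:
        f_new, c_new = coeffs(theta - step * grad)
        obj_new = np.sum((c_new - d) ** 2)
        if obj_new <= obj or step < 1e-12:
            break
        step *= 0.5                                # backtrack on failure to decrease
    theta, f, c, obj = theta - step * grad, f_new, c_new, obj_new
    step *= 1.2                                    # gently re-enlarge the step
print(obj < 1e-2 and f.min() > 0)
```

The fitted function stays strictly positive at every iterate, which is the whole point of optimizing inside the exponential family rather than over raw wavelet coefficients.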
In Figure 1 we display the unconstrained estimator $f^{HT}_{j_0,j_1,\hat{\xi}_{j,n}}$ as in (4.2), obtained by thresholding the wavelet coefficients of the periodogram, together with the estimator $f^{HT}_{j_0,j_1,\hat{\theta}_n,\hat{\xi}_{j,n}}$, which is strictly positive by construction. Note that these wavelet estimators capture the peaks well and look fairly good on the smooth part too.

Figure 1: True spectral density $f$, wavelet thresholding estimator $f^{HT}_{j_0,j_1,\hat{\xi}_{j,n}}$ and final positive estimator $f^{HT}_{j_0,j_1,\hat{\theta}_n,\hat{\xi}_{j,n}}$.
We compared our method with the spectral density estimator proposed by Comte [8], which is based on a model selection procedure. As an example, in Comte [8] the author studies the behavior of such estimators using a collection of nested models $(S_m)$, $m = 1,\dots,100$, where $S_m$ is the space of piecewise constant functions generated by a histogram basis on $[0,1]$ of dimension $m$ with equispaced knots (see Comte [8] for further details). In Figure 2 we show the result of this comparison. Note that our method better captures the peaks of the true spectral density.
Figure 2: True spectral density $f$, final positive estimator $f^{HT}_{j_0,j_1,\hat{\theta}_n,\hat{\xi}_{j,n}}$ and estimator via model selection using regular histograms.
5 Appendix
5.1 Some notations and definitions
Throughout all the proofs, C denotes a generic constant whose value may change from line to line.
First, let us introduce the following definitions.
Definition 5.1 Let $V_j$ denote the usual multiresolution space at scale $j$ spanned by the scaling functions $(\phi_{j,k})_{0\le k\le 2^j-1}$, and define $A_j < +\infty$ as the constant such that $\|v\|_\infty \le A_j \|v\|_{L^2}$ for all $v \in V_j$.

Definition 5.2 For $f \in \mathcal{F}^s_{p,q}(M)$, let $g = \log(f)$. Then for $j \ge j_0 - 1$, define $D_j = \|g - g_j\|_{L^2}$ and $\gamma_j = \|g - g_j\|_\infty$, where $g_j = \sum_{k=0}^{2^j-1} \theta_{j,k}\psi_{j,k}$, with $\theta_{j,k} = \langle g, \psi_{j,k}\rangle$.

The proof of the following lemma immediately follows from the arguments in the proof of Lemma A.5 in Antoniadis and Bigot [1].
Lemma 5.3 Let $j \in \mathbb{N}$. Then $A_j \le C 2^{j/2}$. Suppose that $f \in \mathcal{F}^s_{p,q}(M)$ with $1 \le p \le 2$ and $s > \frac{1}{p}$. Then, uniformly over $\mathcal{F}^s_{p,q}(M)$, $D_j \le C 2^{-j(s+1/2-1/p)}$ and $\gamma_j \le C 2^{-j(s-1/p)}$, where $C$ denotes constants depending only on $M$, $s$, $p$ and $q$.
5.2 Technical results on information projection
Estimation of density functions based on information projection was introduced by Barron and Sheu [2]. To apply this method in our context, we recall for completeness a set of results that are useful to prove the existence of our estimators. The proofs of the following lemmas follow immediately from results in Barron and Sheu [2] and Antoniadis and Bigot [1].
Lemma 5.4 Let $f$ and $g$ be two functions in $L^2([0,1])$ such that $\log\big(\frac{f}{g}\big)$ is bounded. Then
$$ \Delta(f;g) \le \frac{1}{2}\, e^{\left\|\log\left(\frac{f}{g}\right)\right\|_\infty} \int_0^1 f \Big( \log\frac{f}{g} \Big)^2 d\mu, $$
where $\mu$ denotes the Lebesgue measure on $[0,1]$.

Lemma 5.5 Let $\beta \in \mathbb{R}^{\#\Lambda_{j_1}}$. Assume that there exists some $\theta(\beta) \in \mathbb{R}^{\#\Lambda_{j_1}}$ such that, for all $(j,k) \in \Lambda_{j_1}$, $\theta(\beta)$ is a solution of
$$ \langle f_{j,\theta(\beta)}, \psi_{j,k}\rangle = \beta_{j,k}. $$
Then for any function $f$ such that $\langle f, \psi_{j,k}\rangle = \beta_{j,k}$ for all $(j,k) \in \Lambda_{j_1}$, and for all $\theta \in \mathbb{R}^{\#\Lambda_{j_1}}$, the following Pythagorean-like identity holds:
$$ \Delta(f; f_{j,\theta}) = \Delta\big(f; f_{j,\theta(\beta)}\big) + \Delta\big(f_{j,\theta(\beta)}; f_{j,\theta}\big). \tag{5.1} $$

The next lemma is a key result which gives sufficient conditions for the existence of the vector $\theta(\beta)$ as defined in Lemma 5.5. This lemma also relates distances between the functions in the exponential family to distances between the corresponding wavelet coefficients. Its proof relies upon a series of lemmas on bounds within exponential families for the Kullback-Leibler divergence and can be found in Barron and Sheu [2] and Antoniadis and Bigot [1].

Lemma 5.6 Let $\theta_0 \in \mathbb{R}^{\#\Lambda_{j_1}}$, let $\beta_0 = \big(\beta_{0,(j,k)}\big) \in \mathbb{R}^{\#\Lambda_{j_1}}$ be such that $\beta_{0,(j,k)} = \langle f_{j,\theta_0}, \psi_{j,k}\rangle$ for all $(j,k) \in \Lambda_{j_1}$, and let $\hat{\beta} \in \mathbb{R}^{\#\Lambda_{j_1}}$ be a given vector. Let $b = \exp\big( \|\log(f_{j,\theta_0})\|_\infty \big)$ and $e = \exp(1)$. If $\big\|\hat{\beta} - \beta_0\big\|_2 \le \frac{1}{2ebA_{j_1}}$, then the solution $\theta\big(\hat{\beta}\big)$ of
$$ \langle f_{j_1,\theta}, \psi_{j,k}\rangle = \hat{\beta}_{j,k} \quad \text{for all } (j,k) \in \Lambda_{j_1} $$
exists and satisfies
$$ \big\|\theta\big(\hat{\beta}\big) - \theta_0\big\|_2 \le 2eb\,\big\|\hat{\beta} - \beta_0\big\|_2, \qquad \left\| \log\frac{f_{j_1,\theta(\hat{\beta})}}{f_{j_1,\theta(\beta_0)}} \right\|_\infty \le 2ebA_{j_1}\big\|\hat{\beta} - \beta_0\big\|_2, $$
$$ \Delta\big( f_{j_1,\theta(\beta_0)}; f_{j_1,\theta(\hat{\beta})} \big) \le 2eb\,\big\|\hat{\beta} - \beta_0\big\|_2^2, $$
where $\|\beta\|_2$ denotes the standard Euclidean norm for $\beta \in \mathbb{R}^{\#\Lambda_{j_1}}$.

Lemma 5.7 Suppose that $f \in \mathcal{F}^s_{p,q}(M)$ with $s > \frac{1}{p}$ and $1 \le p \le 2$. Then, there exists a constant $M_1$ such that for all $f \in \mathcal{F}^s_{p,q}(M)$, $0 < M_1^{-1} \le f \le M_1 < +\infty$.
5.3Technical results for the proofs of the main results
Lemma 5.8 Let $n \ge 1$, $\beta_{j,k} := \langle f, \psi_{j,k}\rangle$ and $\widehat\beta_{j,k} := \langle I_n, \psi_{j,k}\rangle$ for $j \ge j_0 - 1$ and $0 \le k \le 2^j - 1$. Suppose that Assumptions 1, 2 and 3 hold. Then,
\[
\mathrm{Bias}^2\big(\widehat\beta_{j,k}\big) := \big(\mathbb{E}\big(\widehat\beta_{j,k}\big) - \beta_{j,k}\big)^2 \le \frac{C_*^2}{n}, \quad \text{with } C_* = \sqrt{\frac{C_2 + 39C_1^2}{4\pi^2}},
\]
and
\[
\mathrm{Var}\big(\widehat\beta_{j,k}\big) := \mathbb{E}\big(\widehat\beta_{j,k} - \mathbb{E}\big(\widehat\beta_{j,k}\big)\big)^2 \le \frac{C}{n} \quad \text{for some constant } C > 0.
\]
Moreover, there exists a constant $M_2 > 0$ such that for all $f \in F^s_{p,q}(M)$ with $s > \frac{1}{p}$ and $1 \le p \le 2$,
\[
\mathbb{E}\big(\widehat\beta_{j,k} - \beta_{j,k}\big)^2 = \mathrm{Bias}^2\big(\widehat\beta_{j,k}\big) + \mathrm{Var}\big(\widehat\beta_{j,k}\big) \le \frac{M_2}{n}.
\]

Proof. Note that $\mathrm{Bias}^2\big(\widehat\beta_{j,k}\big) \le \|f - \mathbb{E}(I_n)\|^2_{L^2}$. Using Proposition 1 in Comte [8], Assumptions 1 and 2 imply that $\|f - \mathbb{E}(I_n)\|^2_{L^2} \le \frac{C_2 + 39C_1^2}{4\pi^2 n}$, which gives the result for the bias term. To bound the variance term, remark that
\[
\mathrm{Var}\big(\widehat\beta_{j,k}\big) = \mathbb{E}\langle I_n - \mathbb{E}(I_n), \psi_{j,k}\rangle^2 \le \mathbb{E}\|I_n - \mathbb{E}(I_n)\|^2_{L^2}\,\|\psi_{j,k}\|^2_{L^2} = \int_0^1 \mathbb{E}\big|I_n(\omega) - \mathbb{E}(I_n(\omega))\big|^2\, d\omega.
\]
Then, under Assumptions 1 and 2, it follows that there exists an absolute constant $C > 0$ such that for all $\omega \in [0,1]$, $\mathbb{E}\big|I_n(\omega) - \mathbb{E}(I_n(\omega))\big|^2 \le \frac{C}{n}$. To complete the proof it remains to remark that Assumption 3 implies that these bounds for the bias and the variance hold uniformly over $F^s_{p,q}(M)$.
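The $O(1/n)$ variance bound of Lemma 5.8 is easy to see in simulation. The sketch below is illustrative, not part of the paper: it uses Gaussian white noise (so $f$ is constant), a simple cosine as a stand-in for $\psi_{j,k}$, and a midpoint grid to approximate $\langle I_n, \psi\rangle$; mean subtraction is omitted since $\mathbb{E}(X_t)=0$ by construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def coef_variance(n, reps=400, grid=1024):
    """Monte Carlo variance of <I_n, psi> for Gaussian white noise X_t."""
    w = (np.arange(grid) + 0.5) / grid           # midpoint grid on [0, 1]
    psi = np.sqrt(2) * np.cos(2 * np.pi * w)     # stand-in for psi_{j,k}
    E = np.exp(-2j * np.pi * np.outer(np.arange(1, n + 1), w))
    vals = np.empty(reps)
    for r in range(reps):
        X = rng.standard_normal(n)               # white noise: f is constant
        I_n = np.abs(X @ E) ** 2 / (2 * np.pi * n)   # periodogram on the grid
        vals[r] = np.mean(I_n * psi)             # quadrature for <I_n, psi>
    return vals.var()

v1, v2 = coef_variance(200), coef_variance(800)
assert v2 < v1                 # the variance shrinks as n grows, as in Lemma 5.8
assert 2.0 < v1 / v2 < 8.0     # consistent with Var ~ C/n (ratio near 4)
```

Quadrupling $n$ roughly quarters the variance, matching the $C/n$ rate of the lemma.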
Lemma 5.9 Let $n \ge 1$, $b_{j,k} := \langle f, \psi_{j,k}\rangle$ and $\widehat b_{j,k} := \langle I_n, \psi_{j,k}\rangle$ for $j \ge j_0$ and $0 \le k \le 2^j - 1$. Suppose that Assumptions 1 and 2 hold. Then for any $x > 0$,
\[
\mathbb{P}\bigg(\big|\widehat b_{j,k} - b_{j,k}\big| > 2\|f\|_\infty\Big(\sqrt{\frac{x}{n}} + 2^{j/2}\|\psi\|_\infty\frac{x}{n}\Big) + \frac{C_*}{\sqrt n}\bigg) \le 2e^{-x},
\]
where $C_* = \sqrt{\frac{C_2 + 39C_1^2}{4\pi^2}}$.

Proof. Note that
\[
\widehat b_{j,k} = \frac{1}{2\pi n}\sum_{t=1}^n\sum_{t'=1}^n \big(X_t - \bar X\big)\big(X_{t'} - \bar X\big)^* \int_0^1 e^{i2\pi\omega(t-t')}\psi_{j,k}(\omega)\,d\omega = \frac{1}{2\pi n} X^T T_n(\psi_{j,k})X^*,
\]
where $X = \big(X_1 - \bar X, \ldots, X_n - \bar X\big)^T$, $X^T$ denotes the transpose of $X$ and $T_n(\psi_{j,k})$ is the Toeplitz matrix with entries $[T_n(\psi_{j,k})]_{t,t'} = \int_0^1 e^{i2\pi\omega(t-t')}\psi_{j,k}(\omega)\,d\omega$, $1 \le t,t' \le n$. We can assume without loss of generality that $\mathbb{E}(X_t) = 0$, and then under Assumptions 1 and 2, $X$ is a centered Gaussian vector in $\mathbb{R}^n$ with covariance matrix $\Sigma = T_n(f)$. Using the decomposition $X = \Sigma^{1/2}\varepsilon$, where $\varepsilon \sim \mathcal{N}(0, I_n)$, it follows that $\widehat b_{j,k} = \frac{1}{2\pi n}\varepsilon^T A_{j,k}\varepsilon$, with $A_{j,k} = \Sigma^{1/2} T_n(\psi_{j,k})\Sigma^{1/2}$. Note also that $\mathbb{E}\big(\widehat b_{j,k}\big) = \frac{1}{2\pi n}\mathrm{tr}(A_{j,k})$, where $\mathrm{tr}(A)$ denotes the trace of a matrix $A$.
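The quadratic-form identity $\widehat b_{j,k} = \frac{1}{2\pi n}X^T T_n(\psi_{j,k})X^*$ is purely algebraic and can be verified numerically. In the illustrative sketch below, a simple cosine replaces $\psi_{j,k}$, so the Toeplitz entries are available in closed form ($1/2$ on the first off-diagonals, $0$ elsewhere); the inner product with the periodogram is computed by midpoint quadrature.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
X = rng.standard_normal(n)
X = X - X.mean()                       # centered sample, as in the proof
t = np.arange(1, n + 1)

# psi(w) = cos(2*pi*w), so [T_n(psi)]_{t,t'} = 1/2 when |t - t'| = 1, else 0.
T = 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))
quad_form = X @ T @ X / (2 * np.pi * n)

# <I_n, psi> by quadrature, with I_n the periodogram of the centered sample.
grid = 1024
w = (np.arange(grid) + 0.5) / grid
I_n = np.abs(X @ np.exp(-2j * np.pi * np.outer(t, w))) ** 2 / (2 * np.pi * n)
inner = np.mean(I_n * np.cos(2 * np.pi * w))

assert abs(inner - quad_form) < 1e-8   # the two expressions coincide
```

Since the integrand is a trigonometric polynomial of degree below the grid size, the midpoint rule is exact up to round-off, so the agreement is essentially to machine precision.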
Now let $s_1, \ldots, s_n$ be the eigenvalues of the Hermitian matrix $A_{j,k}$, with $s_1 \ge s_2 \ge \ldots \ge s_n$, and let $Z = 2\pi n\big(\widehat b_{j,k} - \mathbb{E}\big(\widehat b_{j,k}\big)\big) = \varepsilon^T A_{j,k}\varepsilon - \mathrm{tr}(A_{j,k})$. Then, for $0 < \lambda < (2s_1)^{-1}$ one has that
\[
\log\mathbb{E}\big(e^{\lambda Z}\big) = \sum_{i=1}^n\Big(-\lambda s_i - \frac{1}{2}\log(1 - 2\lambda s_i)\Big) = \sum_{i=1}^n\sum_{\ell=2}^{+\infty}\frac{1}{2\ell}(2 s_i\lambda)^\ell,
\]
where we have used the fact that $-\log(1-x) = \sum_{\ell=1}^{+\infty}\frac{x^\ell}{\ell}$ for $x < 1$. Then using the inequality $-u - \frac{1}{2}\log(1-2u) \le \frac{u^2}{1-2u}$ that holds for all $0 < u < \frac{1}{2}$, the above inequality implies that
\[
\log\mathbb{E}\big(e^{\lambda Z}\big) \le \sum_{i=1}^n\frac{\lambda^2 s_i^2}{1 - 2\lambda s_i} \le \frac{\lambda^2\|s\|^2}{1 - 2\lambda s_1},
\]
where $\|s\|^2 = \sum_{i=1}^n s_i^2$. Arguing as in Birg\'e and Massart [6], the above inequality implies that for any $x > 0$, $\mathbb{P}\big(Z > 2s_1 x + 2\|s\|\sqrt{x}\big) \le 2e^{-x}$, which can also be written as
\[
\mathbb{P}\Big(\big|\widehat b_{j,k} - \mathbb{E}\big(\widehat b_{j,k}\big)\big| > 2s_1\frac{x}{2\pi n} + 2\|s\|\frac{\sqrt x}{2\pi n}\Big) \le 2e^{-x},
\]
that implies
\[
\mathbb{P}\Big(\big|\widehat b_{j,k} - \mathbb{E}\big(\widehat b_{j,k}\big)\big| > 2s_1\frac{x}{n} + 2\|s\|\frac{\sqrt x}{n}\Big) \le 2e^{-x}. \tag{5.2}
\]
Let $\tau(A)$ denote the spectral radius of a matrix $A$. For the Toeplitz matrices $\Sigma = T_n(f)$ and $T_n(\psi_{j,k})$ one has that $\tau(\Sigma) \le \|f\|_\infty$ and $\tau(T_n(\psi_{j,k})) \le \|\psi_{j,k}\|_\infty = 2^{j/2}\|\psi\|_\infty$. The above inequalities imply that
\[
s_1 = \tau\big(\Sigma^{1/2} T_n(\psi_{j,k})\Sigma^{1/2}\big) \le \tau(\Sigma)\,\tau(T_n(\psi_{j,k})) \le \|f\|_\infty 2^{j/2}\|\psi\|_\infty. \tag{5.3}
\]
Let $\lambda_i$, $i = 1,\ldots,n$, be the eigenvalues of $T_n(\psi_{j,k})$. From Lemma 3.1 in Davies [12], we have that
\[
\limsup_{n\to+\infty}\frac{1}{n}\mathrm{tr}\big(T_n(\psi_{j,k})^2\big) = \limsup_{n\to+\infty}\frac{1}{n}\sum_{i=1}^n\lambda_i^2 = \int_0^1\psi_{j,k}^2(\omega)\,d\omega = 1,
\]
which implies that
\[
\|s\|^2 = \sum_{i=1}^n s_i^2 = \mathrm{tr}\big(A_{j,k}^2\big) = \mathrm{tr}\big((\Sigma T_n(\psi_{j,k}))^2\big) \le \tau(\Sigma)^2\,\mathrm{tr}\big(T_n(\psi_{j,k})^2\big) \le \|f\|_\infty^2\, n, \tag{5.4}
\]
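The two matrix bounds behind (5.3) and (5.4) can be spot-checked on small Toeplitz matrices. The sketch below is illustrative: it takes $f(\omega) = 1 + 0.9\cos(2\pi\omega)$ and $\psi(\omega) = \cos(2\pi\omega)$ as test functions, for which $T_n(f)$ and $T_n(\psi)$ are tridiagonal with closed-form entries.

```python
import numpy as np

n = 50
# Sigma = T_n(f) for f(w) = 1 + 0.9 cos(2 pi w): tridiagonal Toeplitz with
# diagonal 1 and off-diagonal 0.45, so tau(Sigma) <= ||f||_inf = 1.9.
Sigma = np.eye(n) + 0.45 * (np.eye(n, k=1) + np.eye(n, k=-1))
# T = T_n(psi) for psi(w) = cos(2 pi w): off-diagonal 1/2, tau(T) <= ||psi||_inf = 1.
T = 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))

def tau(A):
    return np.abs(np.linalg.eigvalsh(A)).max()  # spectral radius of a symmetric matrix

evals, V = np.linalg.eigh(Sigma)                # Sigma is positive definite
root = V @ np.diag(np.sqrt(evals)) @ V.T        # symmetric square root Sigma^{1/2}
A = root @ T @ root                             # analogue of A_{j,k}

assert tau(A) <= tau(Sigma) * tau(T) + 1e-9     # s_1 <= tau(Sigma) tau(T), as in (5.3)
assert tau(Sigma) <= 1.9 + 1e-9 and tau(T) <= 1.0 + 1e-9
lhs = np.trace((Sigma @ T) @ (Sigma @ T))       # tr((Sigma T)^2) = tr(A^2)
assert abs(lhs - np.trace(A @ A)) < 1e-8
assert lhs <= tau(Sigma) ** 2 * np.trace(T @ T) + 1e-9   # as in (5.4)
```

The same check passes for any positive definite $\Sigma$ and Hermitian $T$, which is exactly the generality used in the proof.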
where we have used the inequality $\mathrm{tr}\big((AB)^2\big) \le \tau(A)^2\,\mathrm{tr}\big(B^2\big)$ that holds for any pair of Hermitian matrices $A,B$. Combining (5.2), (5.3) and (5.4), we finally obtain that for any $x > 0$
\[
\mathbb{P}\bigg(\big|\widehat b_{j,k} - \mathbb{E}\big(\widehat b_{j,k}\big)\big| > 2\|f\|_\infty\Big(\sqrt{\frac{x}{n}} + 2^{j/2}\|\psi\|_\infty\frac{x}{n}\Big)\bigg) \le 2e^{-x}. \tag{5.5}
\]
Now, let $\xi_{j,n} = 2\|f\|_\infty\Big(\sqrt{\frac{x}{n}} + 2^{j/2}\|\psi\|_\infty\frac{x}{n}\Big) + \frac{C_*}{\sqrt n}$, and note that
\[
\mathbb{P}\Big(\big|\widehat b_{j,k} - b_{j,k}\big| > \xi_{j,n}\Big) \le \mathbb{P}\Big(\big|\widehat b_{j,k} - \mathbb{E}\big(\widehat b_{j,k}\big)\big| > \xi_{j,n} - \big|\mathbb{E}\big(\widehat b_{j,k}\big) - b_{j,k}\big|\Big).
\]
By Lemma 5.8, one has that $\big|\mathbb{E}\big(\widehat b_{j,k}\big) - b_{j,k}\big| \le \frac{C_*}{\sqrt n}$, and thus $\xi_{j,n} - \big|\mathbb{E}\big(\widehat b_{j,k}\big) - b_{j,k}\big| \ge \xi_{j,n} - \frac{C_*}{\sqrt n}$, which implies using (5.5) that
\[
\mathbb{P}\Big(\big|\widehat b_{j,k} - b_{j,k}\big| > \xi_{j,n}\Big) \le \mathbb{P}\Big(\big|\widehat b_{j,k} - \mathbb{E}\big(\widehat b_{j,k}\big)\big| > \xi_{j,n} - \frac{C_*}{\sqrt n}\Big) \le 2e^{-x},
\]
which completes the proof of Lemma 5.9.
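The concentration inequality (5.2) for the centered quadratic form $Z = \varepsilon^T A\varepsilon - \mathrm{tr}(A)$ can be illustrated by simulation. The sketch below is not part of the proof: it uses an arbitrary small symmetric matrix $A$, seeded Gaussian sampling, and compares the empirical two-sided tail at the Birgé-Massart threshold $2s_1 x + 2\|s\|\sqrt{x}$ with the bound $2e^{-x}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                       # arbitrary symmetric test matrix

s = np.linalg.eigvalsh(A)
s1 = np.abs(s).max()                    # largest eigenvalue in absolute value
norm_s = np.sqrt((s ** 2).sum())        # ||s||

x = 1.0
thresh = 2 * s1 * x + 2 * norm_s * np.sqrt(x)

reps = 20000
eps = rng.standard_normal((reps, n))
Z = np.einsum('ri,ij,rj->r', eps, A, eps) - np.trace(A)

emp = np.mean(np.abs(Z) > thresh)       # empirical two-sided tail probability
assert emp <= 2 * np.exp(-x)            # the bound 2e^{-x} of (5.2) holds
```

The empirical tail is typically far below $2e^{-x}$; the inequality is not tight, which is why any constant factor in front of the threshold suffices in Lemma 5.10.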
Lemma 5.10 Assume that $f \in F^s_{p,q}(M)$ with $s > \frac{1}{2} + \frac{1}{p}$ and $1 \le p \le 2$. Suppose that Assumptions 1, 2 and 3 hold. For any $n > 1$, define $j_0 = j_0(n)$ to be the integer such that $2^{j_0} > \log n \ge 2^{j_0-1}$, and $j_1 = j_1(n)$ to be the integer such that $2^{j_1} \ge \frac{n}{\log n} \ge 2^{j_1-1}$. For $\delta \ge 6$, take the threshold
\[
\xi_{j,n} = 2\sqrt{2}\,\|f\|_\infty\Big(\sqrt{\frac{\delta\log n}{n}} + 2^{j/2}\|\psi\|_\infty\frac{\delta\log n}{n}\Big) + \frac{C_*}{\sqrt n}
\]
as in (3.1), where $C_* = \sqrt{\frac{C_2 + 39C_1^2}{4\pi^2}}$. Let $\beta_{j,k} := \langle f, \psi_{j,k}\rangle$ and $\widehat\beta_{\xi_{j,n},(j,k)} := \delta_{\xi_{j,n}}\big(\widehat\beta_{j,k}\big)$ with $(j,k) \in \Lambda_{j_1}$ as in (2.4). Take $\beta = (\beta_{j,k})_{(j,k)\in\Lambda_{j_1}}$ and $\widehat\beta_{\xi_{j,n}} = \big(\widehat\beta_{\xi_{j,n},(j,k)}\big)_{(j,k)\in\Lambda_{j_1}}$. Then there exists a constant $M_3 > 0$ such that for all sufficiently large $n$:
\[
\mathbb{E}\big\|\beta - \widehat\beta_{\xi_{j,n}}\big\|_2^2 := \mathbb{E}\sum_{(j,k)\in\Lambda_{j_1}}\Big(\beta_{j,k} - \delta_{\xi_{j,n}}\big(\widehat\beta_{j,k}\big)\Big)^2 \le M_3\Big(\frac{n}{\log n}\Big)^{-\frac{2s}{2s+1}},
\]
uniformly over $F^s_{p,q}(M)$.

Proof. Taking into account that
\[
\mathbb{E}\big\|\beta - \widehat\beta_{\xi_{j,n}}\big\|_2^2 = \sum_{k=0}^{2^{j_0}-1}\mathbb{E}(a_{j_0,k} - \widehat a_{j_0,k})^2 + \sum_{j=j_0}^{j_1}\sum_{k=0}^{2^j-1}\mathbb{E}\Big(\big(b_{j,k} - \widehat b_{j,k}\big)^2 I\big(\big|\widehat b_{j,k}\big| > \xi_{j,n}\big)\Big) + \sum_{j=j_0}^{j_1}\sum_{k=0}^{2^j-1} b_{j,k}^2\,\mathbb{P}\big(\big|\widehat b_{j,k}\big| \le \xi_{j,n}\big) := T_1 + T_2 + T_3, \tag{5.6}
\]
we are interested in bounding these three terms. The bound for $T_1$ follows from Lemma 5.8 and the fact that $j_0 = \log_2(\log n) \le \frac{1}{2s+1}\log_2(n)$:
\[
T_1 = \sum_{k=0}^{2^{j_0}-1}\mathbb{E}(a_{j_0,k} - \widehat a_{j_0,k})^2 = O\Big(\frac{2^{j_0}}{n}\Big) \le O\Big(n^{-\frac{2s}{2s+1}}\Big). \tag{5.7}
\]
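The term-by-term rule $\delta_{\xi_{j,n}}$ applied to the empirical coefficients in Lemma 5.10 can be sketched as follows. This is a generic hard-threshold sketch: the paper's rule is defined in (3.1), which lies outside this excerpt, so the exact form used here is an assumption.

```python
import numpy as np

def hard_threshold(beta_hat, xi):
    """Keep a coefficient only when its magnitude exceeds the threshold xi
    (a generic hard-threshold sketch of the rule delta_{xi_{j,n}})."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    return np.where(np.abs(beta_hat) > xi, beta_hat, 0.0)

out = hard_threshold([3.0, 0.4, -1.2, 0.05], xi=1.0)
# Coefficients with |value| <= xi are set to zero; the others are kept unchanged.
```

In the decomposition (5.6), the indicator $I(|\widehat b_{j,k}| > \xi_{j,n})$ corresponds exactly to the coefficients this rule keeps, and $\mathbb{P}(|\widehat b_{j,k}| \le \xi_{j,n})$ to those it kills.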