# Adaptive estimation of covariance functions via wavelet thresholding and information projection

**ABSTRACT** In this paper, we study the problem of nonparametric adaptive estimation of the covariance function of a stationary Gaussian process. For this purpose, we consider a wavelet-based method which combines the ideas of wavelet approximation and estimation by information projection in order to guarantee the positive semidefiniteness of the solution. The spectral density of the process is estimated by projecting the wavelet thresholding expansion of the periodogram onto a family of exponential functions. This ensures that the spectral density estimator is a strictly positive function. Then, by Bochner's theorem, we obtain a positive semidefinite estimator of the covariance function. The theoretical behavior of the estimator is established in terms of the rate of convergence of the Kullback-Leibler discrepancy over Besov classes. We also show the excellent practical performance of the estimator in some numerical experiments.



Jérémie Bigot¹, Rolando J. Biscay Lirio³, Jean-Michel Loubes¹ & Lilian Muñiz Alvarez²

¹ IMT, Université Paul Sabatier, Toulouse, France

² Facultad de Matemática y Computación de la Universidad de La Habana

³ Instituto de Cibernética, Matemática y Física, Cuba

December 15, 2009

Abstract

In this paper, we study the problem of nonparametric adaptive estimation of the covariance function of a stationary Gaussian process. For this purpose, we consider a wavelet-based method which combines the ideas of wavelet approximation and estimation by information projection in order to guarantee the nonnegative definiteness of the solution. The spectral density of the process is estimated by projecting the wavelet thresholding expansion of the periodogram onto a family of exponential functions. This ensures that the spectral density estimator is a strictly positive function. Then, by Bochner's theorem, we obtain a positive semidefinite estimator of the covariance function. The theoretical behavior of the estimator is established in terms of the rate of convergence of the Kullback-Leibler discrepancy over Besov classes. We also show the excellent practical performance of the estimator in some numerical experiments.

Keywords: Covariance estimation, adaptive estimation, wavelet thresholding, sequences of exponential

families, Besov spaces.

AMS classifications: Primary 62G07; secondary 42C40, 41A29

1 Introduction

Estimating a covariance function is a fundamental problem in inference for stationary stochastic

processes. Many applications in several fields such as geosciences, ecology, demography and financial

series are deeply related to this issue, see for instance Journel and Huijbregts [19], Christakos [7]

and Stein [28]. The purpose of this work is not only to provide an estimate but also to obtain

an estimator which is a covariance function. In particular, we aim at preserving the property of

nonnegative definiteness.

For this, usually statisticians resort to fitting parametric models, which are numerous in the

literature, see Cressie [9] for a detailed account of parametric covariance estimation. Nonparametric

approaches provide more flexibility when constructing the estimator but their main drawback comes

from the difficulty of restricting to a nonnegative definite class of estimators. For example, Shapiro and

Botha [27] suggest an estimator with this property on a discrete set but not on the continuum.

In the nonstationary case and for spatio-temporal data, Sampson and Guttorp [25] propose an

approach based on the particular covariance representation due to Schoenberg [26], which ensures

that the resulting estimator is a covariance function. Hall, Fisher and Hoffman in [18] enforce

hal-00440424, version 2 - 15 Dec 2009


the nonnegative definiteness property on a kernel type estimator of the covariance function using

Bochner theorem [3], which characterizes the class of continuous nonnegative definite functions by

the behavior of their Fourier transform. However, this approach requires the precise choice of three

parameters, including an optimal selection of the bandwidth of the kernel function. More recently

Elogne, Perrin and Thomas-Agnan [15] use interpolation methods for estimating smooth stationary

covariance. The major drawback is that the computation of this estimator is difficult since it involves

the calculation of convolution integrals.

Bochner's theorem states that a continuous function on $\mathbb{R}^d$ is nonnegative definite if and only if it is the Fourier transform of a bounded nonnegative measure, called the spectral measure. When the spectral measure has a density, this density is called the spectral density. Hence, the estimation of the covariance function is strongly related to the estimation of the spectral density of the process.

Inference in the spectral domain uses the periodogram of the data, which is an inconsistent estimator and must be smoothed in order to achieve consistency. For highly regular spectral densities, linear smoothing techniques such as kernel smoothing are appropriate (see Brillinger [4]). However, linear smoothing methods are not able to achieve the optimal mean-square rate of convergence for spectra whose smoothness is distributed inhomogeneously over the domain of interest. For this, nonlinear methods are needed. One nonlinear method for adaptive spectral density estimation of a stationary Gaussian sequence was proposed by Comte [8]. It is based on model selection techniques. Other nonlinear smoothing procedures are the wavelet thresholding methods, first proposed by Donoho and Johnstone [14]. In this context, different thresholding rules have been proposed by Neumann [23] and Fryzlewicz, Nason and von Sachs [16], to name but a few.

Neumann's approach [23] consists in pre-estimating the variance of the periodogram via kernel smoothing, so that it can be supplied to the wavelet estimation procedure. Kernel pre-estimation may not be appropriate in cases where the underlying spectral density is of low regularity. One way to avoid this problem is proposed in Fryzlewicz, Nason and von Sachs [16], where the empirical wavelet coefficient thresholds are built as appropriate local weighted $\ell_1$ norms of the periodogram. Their method does not produce a positive spectral density estimator; therefore the corresponding estimator of the covariance function is not positive semidefinite.

To overcome the drawbacks of previous estimators, in this paper we propose a new wavelet-

based method for the estimation of the spectral density of a Gaussian process and its corresponding

covariance function. As a solution to ensure nonnegativeness of the spectral density estimator,

our method combines the ideas of wavelet thresholding and estimation by information projection.

We estimate the spectral density by a projection of the nonlinear wavelet approximation of the

periodogram onto a family of exponential functions. Therefore, the estimator is nonnegative by

construction. Then, by Bochner's theorem, the corresponding estimator of the covariance function

satisfies the nonnegative definiteness property. This technique was studied by Barron and Sheu [2]

for the approximation of density functions by sequences of exponential families, by Loubes and Yan

[21] for penalized maximum likelihood estimation with $\ell_1$ penalty, by Antoniadis and Bigot [1] for the

study of Poisson inverse problems, and by Bigot and Van Bellegem [5] for log-density deconvolution.

The theoretical optimality of the estimators for the spectral density of a stationary process is generally studied using risk bounds in $L_2$-norm. This is the case in the papers of Neumann [23], Comte [8] and Fryzlewicz, Nason and von Sachs [16] mentioned before. In this work, the behavior of the proposed estimator is established in terms of the rate of convergence of the Kullback-Leibler discrepancy over Besov classes, which is perhaps a more natural loss function for the estimation of a spectral density function than the $L_2$-norm. Moreover, the thresholding rules that we use to derive adaptive estimators differ from previous approaches based on wavelet decomposition and are quite simple to compute. Finally, we compare the performance of our estimator with other estimators on some simulations.

The paper is organized as follows. Section 2 presents the statistical framework under which

we work. We define the model, the wavelet-based exponential family and the linear and nonlinear

wavelet estimators by information projection. We also recall the definition of the Kullback-Leibler

divergence and some results on Besov spaces. The rates of convergence of the proposed estimators

are stated in Section 3. Some numerical experiments are described in Section 4. Technical lemmas

and proofs of the main theorems are gathered in the Appendix.

Throughout this paper C denotes a constant that may vary from line to line. The notation C(.)

specifies the dependency of C on some quantities.

2 Statistical framework

2.1 The model

We aim at providing a nonparametric adaptive estimation of the spectral density which satisfies the property of being nonnegative, in order to guarantee that the covariance estimator is a nonnegative definite function. We consider the sequence $(X_t)_{t \in \mathbb{N}}$ that satisfies the following assumptions:

Assumption 1 The sequence $(X_1, \ldots, X_n)$ is an $n$-sample drawn from a stationary sequence of Gaussian random variables.

Let $\rho$ be the covariance function of the process, i.e. $\rho(h) = \mathrm{cov}(X_t, X_{t+h})$ with $h \in \mathbb{Z}$. The spectral density $f$ is defined as:

$$f(\omega) = \frac{1}{2\pi} \sum_{h \in \mathbb{Z}} \rho(h) e^{i 2\pi \omega h}, \quad \omega \in [0,1].$$

We need the following standard assumption on $\rho$:

Assumption 2 The covariance function $\rho$ is nonnegative definite, and there exist two constants $0 < C_1, C_2 < +\infty$ such that

$$\sum_{h \in \mathbb{Z}} |\rho(h)| = C_1 \quad \text{and} \quad \sum_{h \in \mathbb{Z}} \left| h \rho^2(h) \right| = C_2.$$

Assumption 2 implies in particular that the spectral density $f$ is bounded by the constant $C_1$. As a consequence, it is also square integrable. As in Comte [8], the data consist of a number of observations $X_1, \ldots, X_n$ at regularly spaced points. We want to obtain a positive estimator for the spectral density function $f$ without parametric assumptions on the basis of these observations. For this, we combine the ideas of wavelet thresholding and estimation by information projection.

2.2 Estimation by information projection

2.2.1 Wavelet-based exponential family

To ensure nonnegativity of the estimator, we will look for approximations over an exponential family. For this, we construct a sieve of exponential functions defined in a wavelet basis.

Let $\phi(\omega)$ and $\psi(\omega)$, respectively, be the scaling and the wavelet functions generated by an orthonormal multiresolution decomposition of $L_2([0,1])$, see Mallat [22] for a detailed exposition on wavelet analysis. Throughout the paper, the functions $\phi$ and $\psi$ are supposed to be compactly supported and such that $\|\phi\|_\infty < +\infty$, $\|\psi\|_\infty < +\infty$. Then, for any integer $j_0 \ge 0$, any function $g \in L_2([0,1])$ has the following representation:

$$g(\omega) = \sum_{k=0}^{2^{j_0}-1} \langle g, \phi_{j_0,k} \rangle \phi_{j_0,k}(\omega) + \sum_{j=j_0}^{+\infty} \sum_{k=0}^{2^j-1} \langle g, \psi_{j,k} \rangle \psi_{j,k}(\omega),$$

where $\phi_{j_0,k}(\omega) = 2^{j_0/2} \phi(2^{j_0}\omega - k)$ and $\psi_{j,k}(\omega) = 2^{j/2} \psi(2^j \omega - k)$. The main idea of this paper is to expand the spectral density $f$ onto this wavelet basis and to find an estimator of this expansion that is then modified to impose the positivity property. The scaling and wavelet coefficients of the spectral density function $f$ are denoted by $a_{j_0,k} = \langle f, \phi_{j_0,k} \rangle$ and $b_{j,k} = \langle f, \psi_{j,k} \rangle$.

To simplify the notations, we write $(\psi_{j,k})_{j=j_0-1}$ for the scaling functions $(\phi_{j,k})_{j=j_0}$. Let $j_1 \ge j_0$ and define the set

$$\Lambda_{j_1} = \left\{ (j,k) : j_0 - 1 \le j < j_1,\ 0 \le k \le 2^j - 1 \right\}.$$

Note that $\#\Lambda_{j_1} = 2^{j_1}$, where $\#\Lambda_{j_1}$ denotes the cardinality of $\Lambda_{j_1}$. Let $\theta$ denote a vector in $\mathbb{R}^{\#\Lambda_{j_1}}$. The wavelet-based exponential family $E_{j_1}$ at scale $j_1$ is defined as the set of functions:

$$E_{j_1} = \left\{ f_{j_1,\theta}(\cdot) = \exp\left( \sum_{(j,k) \in \Lambda_{j_1}} \theta_{j,k} \psi_{j,k}(\cdot) \right),\ \theta = (\theta_{j,k})_{(j,k) \in \Lambda_{j_1}} \in \mathbb{R}^{\#\Lambda_{j_1}} \right\}. \tag{2.1}$$

We will enforce our estimator of the spectral density to belong to the family $E_{j_1}$ of exponential functions, which are positive by definition.
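As a minimal sketch of the family (2.1), the code below builds an exponential family on the Haar basis and checks that every member is strictly positive whatever the sign of the coefficients. The Haar basis, the choice $j_0 = 0$, the grid and the names `haar_psi`, `basis_functions` and `exp_family_member` are illustrative assumptions, not part of the paper, which works with general compactly supported wavelets.

```python
import math

def haar_psi(j, k, w):
    """Haar wavelet psi_{j,k}(w) = 2^{j/2} psi(2^j w - k) (illustrative basis choice)."""
    u = (2 ** j) * w - k
    if 0.0 <= u < 0.5:
        return 2.0 ** (j / 2)
    if 0.5 <= u < 1.0:
        return -(2.0 ** (j / 2))
    return 0.0

def basis_functions(j1):
    """Scaling function phi_{0,0} = 1 followed by psi_{j,k} for 0 <= j < j1 (j0 = 0)."""
    funcs = [lambda w: 1.0]
    for j in range(j1):
        for k in range(2 ** j):
            funcs.append(lambda w, j=j, k=k: haar_psi(j, k, w))
    return funcs

def exp_family_member(theta, funcs):
    """Return f_theta(w) = exp(sum_i theta_i * psi_i(w))."""
    return lambda w: math.exp(sum(t * psi(w) for t, psi in zip(theta, funcs)))

funcs = basis_functions(j1=3)                       # 2^3 = 8 basis functions
theta = [0.1, -2.0, 1.5, 0.0, 3.0, -1.0, 0.5, 2.0]  # arbitrary coefficients
f_theta = exp_family_member(theta, funcs)
grid = [(i + 0.5) / 64 for i in range(64)]
values = [f_theta(w) for w in grid]
print(len(funcs), min(values) > 0.0)
```

The number of basis functions matches $\#\Lambda_{j_1} = 2^{j_1}$, and positivity holds by construction because the exponential is applied pointwise.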

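2.2.2 Information projection

Following Csiszár [10], it is possible to define the projection of a function $f$ onto $E_{j_1}$. If this projection exists, it is defined as the function $f_{j_1,\theta^*_{j_1}}$ in the exponential family $E_{j_1}$ that is the closest to the true function $f$ in the Kullback-Leibler sense, and is characterized as the unique function in the family $E_{j_1}$ for which

$$\left\langle f_{j_1,\theta^*_{j_1}}, \psi_{j,k} \right\rangle = \langle f, \psi_{j,k} \rangle := \beta_{j,k} \quad \text{for all } (j,k) \in \Lambda_{j_1}.$$

Note that the notation $\beta_{j,k}$ is used to denote both the scaling coefficients $a_{j_0,k}$ and the wavelet coefficients $b_{j,k}$.

Let

$$I_n(\omega) = \frac{1}{2\pi n} \sum_{t=1}^{n} \sum_{t'=1}^{n} \left( X_t - \overline{X} \right) \left( X_{t'} - \overline{X} \right)^* e^{i 2\pi \omega (t - t')},$$

be the classical periodogram, where $\left( X_t - \overline{X} \right)^*$ denotes the conjugate transpose of $\left( X_t - \overline{X} \right)$ and $\overline{X} = \frac{1}{n} \sum_{t=1}^{n} X_t$. Expanding $I_n(\omega)$ onto the wavelet basis yields estimators of $a_{j_0,k}$ and $b_{j,k}$ given by

$$\hat{a}_{j_0,k} = \int_0^1 I_n(\omega) \phi_{j_0,k}(\omega)\, d\omega \quad \text{and} \quad \hat{b}_{j,k} = \int_0^1 I_n(\omega) \psi_{j,k}(\omega)\, d\omega. \tag{2.2}$$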
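The periodogram and the empirical coefficients (2.2) can be sketched numerically as follows. This is only an illustration under stated assumptions: the data are simulated white noise, the integrals are replaced by a midpoint rule on a regular grid, the Haar wavelet stands in for the wavelet basis, and the names `periodogram`, `haar_psi` and `empirical_coefficient` are hypothetical. For real data the double sum in the definition of $I_n$ collapses to a squared modulus, which is what the code uses.

```python
import cmath
import math
import random

def periodogram(x, w):
    """I_n(w) = |sum_t (x_t - mean) e^{i 2 pi w t}|^2 / (2 pi n) for real data."""
    n = len(x)
    mean = sum(x) / n
    s = sum((xt - mean) * cmath.exp(2j * math.pi * w * t)
            for t, xt in enumerate(x, start=1))
    return abs(s) ** 2 / (2 * math.pi * n)

def haar_psi(j, k, w):
    """Haar wavelet psi_{j,k}(w) = 2^{j/2} psi(2^j w - k) (illustrative basis choice)."""
    u = (2 ** j) * w - k
    if 0.0 <= u < 0.5:
        return 2.0 ** (j / 2)
    if 0.5 <= u < 1.0:
        return -(2.0 ** (j / 2))
    return 0.0

def empirical_coefficient(values, grid, basis):
    """Midpoint-rule approximation of int_0^1 I_n(w) basis(w) dw."""
    return sum(v * basis(w) for v, w in zip(values, grid)) / len(grid)

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(64)]   # assumed toy sample
m = 256
grid = [(i + 0.5) / m for i in range(m)]
I_vals = [periodogram(x, w) for w in grid]

a_00 = empirical_coefficient(I_vals, grid, lambda w: 1.0)               # scaling coefficient
b_00 = empirical_coefficient(I_vals, grid, lambda w: haar_psi(0, 0, w)) # first wavelet coefficient
print(a_00, b_00)
```

With this normalization the scaling coefficient at level 0 equals the empirical variance divided by $2\pi$, which gives a quick sanity check on any implementation.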


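It seems therefore natural to estimate the function $f$ by searching for some $\hat{\theta}_n \in \mathbb{R}^{\#\Lambda_{j_1}}$ such that

$$\left\langle f_{j_1,\hat{\theta}_n}, \psi_{j,k} \right\rangle = \int_0^1 I_n(\omega) \psi_{j,k}(\omega)\, d\omega := \hat{\beta}_{j,k} \quad \text{for all } (j,k) \in \Lambda_{j_1}, \tag{2.3}$$

where $\hat{\beta}_{j,k}$ denotes both the estimation of the scaling coefficients $\hat{a}_{j_0,k}$ and the wavelet coefficients $\hat{b}_{j,k}$. The function $f_{j_1,\hat{\theta}_n}$ is the spectral density positive linear estimator.

Similarly, the positive nonlinear estimator with hard thresholding is defined as the function $f^{HT}_{j_1,\hat{\theta}_n,\xi}$ (with $\hat{\theta}_n \in \mathbb{R}^{\#\Lambda_{j_1}}$) such that

$$\left\langle f^{HT}_{j_1,\hat{\theta}_n,\xi}, \psi_{j,k} \right\rangle = \delta_\xi\left( \hat{\beta}_{j,k} \right) \quad \text{for all } (j,k) \in \Lambda_{j_1}, \tag{2.4}$$

where $\delta_\xi$ denotes the hard thresholding rule defined by

$$\delta_\xi(x) = x\, I(|x| \ge \xi) \quad \text{for } x \in \mathbb{R},$$

where $\xi > 0$ is an appropriate threshold whose choice is discussed later on.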
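The hard thresholding rule is a one-liner; the sketch below applies it to a few made-up coefficients with an arbitrary threshold (both are assumptions for illustration only). Note that the rule keeps a coefficient whose absolute value equals the threshold exactly.

```python
def hard_threshold(x, xi):
    """delta_xi(x) = x * 1{|x| >= xi}: keep x unchanged or set it to zero."""
    return x if abs(x) >= xi else 0.0

coeffs = [0.9, -0.05, 0.3, -0.31, 0.0]   # hypothetical empirical coefficients
kept = [hard_threshold(c, 0.3) for c in coeffs]
print(kept)
```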

The existence of these estimators is questionable. Moreover, there is no way to obtain an explicit expression for $\hat{\theta}_n$. In our simulations, we use a numerical approximation of $\hat{\theta}_n$ that is obtained via a gradient-descent algorithm with an adaptive step. Proving that such estimators exist with probability one is a difficult task. For the related problem of estimating a density from an independent and identically distributed sample, it is even shown in Barron and Sheu [2] that for some exponential family (e.g. based on a spline basis), the vector $\hat{\theta}_n$ may not exist with a small positive probability. Thus, in the next sections, some sufficient conditions are given for the existence of $f_{j_1,\hat{\theta}_n}$ and $f^{HT}_{j_1,\hat{\theta}_n,\xi}$ with probability tending to one as $n \to +\infty$.

2.3 An appropriate loss function: the Kullback-Leibler divergence

To assess the quality of the estimators, we will measure the discrepancy between an estimator $\hat{f}$ and the true function $f$ in the sense of relative entropy (Kullback-Leibler divergence) defined by:

$$\Delta\left( f; \hat{f} \right) = \int_0^1 \left( f \log\left( \frac{f}{\hat{f}} \right) - f + \hat{f} \right) d\mu,$$

where $\mu$ denotes the Lebesgue measure on $[0,1]$. It can be shown that $\Delta\left( f; \hat{f} \right)$ is nonnegative and equals zero if and only if $\hat{f} = f$.

2.4 Smoothness Assumptions

It is well known that Besov spaces for periodic functions in $L_2([0,1])$ can be characterized in terms of wavelet coefficients (see e.g. Mallat [22]). Assume that $\psi$ has $m$ vanishing moments, and let $0 < s < m$ denote the usual smoothness parameter. Then, for a Besov ball $B^s_{p,q}(A)$ of radius $A > 0$ with $1 \le p, q \le \infty$, one has that for $s^* = s + 1/2 - 1/p \ge 0$:

$$B^s_{p,q}(A) := \left\{ g \in L_2([0,1]) : \|g\|_{s,p,q} := \left( \sum_{k=0}^{2^{j_0}-1} |a_{j_0,k}|^p \right)^{\frac{1}{p}} + \left( \sum_{j=j_0}^{\infty} 2^{j s^* q} \left( \sum_{k=0}^{2^j-1} |b_{j,k}|^p \right)^{\frac{q}{p}} \right)^{\frac{1}{q}} \le A \right\},$$

with the respective above sums replaced by maximum if $p = \infty$ or $q = \infty$ and where $a_{j_0,k} = \langle g, \phi_{j_0,k} \rangle$ and $b_{j,k} = \langle g, \psi_{j,k} \rangle$.
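To make the sequence-space norm concrete, here is a small sketch that evaluates $\|g\|_{s,p,q}$ from a finite set of coefficients. It assumes finite $p$ and $q$ and truncates the sum over $j$ at the finest level supplied, so it only approximates the norm of a function with infinitely many nonzero coefficients. The function name and the input layout (a list of scaling coefficients and a dictionary mapping each level $j$ to its wavelet coefficients) are hypothetical.

```python
def besov_sequence_norm(a, b, s, p, q):
    """Sequence-space Besov norm for finite p and q.

    a: scaling coefficients a_{j0,k}
    b: dict mapping level j >= j0 to the list of wavelet coefficients b_{j,k}
    """
    s_star = s + 0.5 - 1.0 / p
    scaling_part = sum(abs(c) ** p for c in a) ** (1.0 / p)
    wavelet_part = sum(
        2.0 ** (j * s_star * q) * sum(abs(c) ** p for c in b[j]) ** (q / p)
        for j in sorted(b)
    ) ** (1.0 / q)
    return scaling_part + wavelet_part

# Toy coefficients: one scaling coefficient and two wavelet levels.
norm = besov_sequence_norm(a=[1.0], b={0: [0.5], 1: [0.25, -0.25]}, s=1.0, p=2.0, q=2.0)
print(norm)
```

The weight $2^{j s^* q}$ grows with the level, so fine-scale coefficients must decay fast for the norm to stay bounded, which is how the norm encodes smoothness.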

The condition that $s + 1/2 - 1/p \ge 0$ is imposed to ensure that $B^s_{p,q}(A)$ is a subspace of $L_2([0,1])$, and we shall restrict ourselves to this case in this paper (although not always stated, it is clear that all our results hold for $s < m$). Besov spaces allow for more local variability in local smoothness than is typical for functions in the usual Hölder or Sobolev spaces. For instance, a real function $f$ on $[0,1]$ that is piecewise continuous, but for which each piece is locally in $C^s$, can be an element of $B^s_{p,p}(A)$ with $1 \le p < 2$, despite the possibility of discontinuities at the transition from one piece to the next (see e.g. Proposition 9.2 in Mallat [22]). Note that if $s > 1$ is not an integer, then $B^s_{2,2}(A)$ is equivalent to a Sobolev ball of order $s$. Moreover, the space $B^s_{p,q}(A)$ with $1 \le p < 2$ contains piecewise smooth functions with local irregularities such as discontinuities.

Let $M > 0$ and denote by $F^s_{p,q}(M)$ the set of functions such that

$$F^s_{p,q}(M) = \left\{ f = \exp(g) : \|g\|_{s,p,q} \le M \right\},$$

where $\|g\|_{s,p,q}$ denotes the norm in the Besov space $B^s_{p,q}$. Note that assuming that $f \in F^s_{p,q}(M)$ implies that $f$ is strictly positive. In the next section we establish the rate of convergence of our estimators in terms of the Kullback-Leibler discrepancy over Besov classes.

3 Asymptotic behavior of the estimators

We make the following assumption on the wavelet basis that guarantees that Assumption 2 holds uniformly over $F^s_{p,q}(M)$.

Assumption 3 Let $M > 0$, $1 \le p \le 2$ and $s > 1/p$. For $f \in F^s_{p,q}(M)$ and $h \in \mathbb{Z}$, let $\rho(h) = \int_0^1 f(\omega) e^{-i 2\pi \omega h}\, d\omega$, $C_1(f) := \sum_{h \in \mathbb{Z}} |\rho(h)|$ and $C_2(f) := \sum_{h \in \mathbb{Z}} \left| h \rho^2(h) \right|$. Then, the wavelet basis is such that there exists a constant $M_*$ such that for all $f \in F^s_{p,q}(M)$, $C_1(f) \le M_*$ and $C_2(f) \le M_*$.

3.1 Linear estimation

The following theorem is the general result on the linear information projection estimator of the spectral density function. Note that the choice of the coarse resolution level $j_0$ is of minor importance, and without loss of generality we take $j_0 = 0$ for the linear estimator $f_{j_1,\hat{\theta}_n}$.

Theorem 3.1 Assume that $f \in F^s_{2,2}(M)$ with $s > \frac{1}{2}$ and suppose that Assumptions 1, 2 and 3 are satisfied. Define $j_1 = j_1(n)$ as the largest integer such that $2^{j_1} \le n^{\frac{1}{2s+1}}$. Then, with probability tending to one as $n \to +\infty$, the information projection estimator (2.3) exists and satisfies:

$$\Delta\left( f; f_{j_1(n),\hat{\theta}_n} \right) = O_p\left( n^{-\frac{2s}{2s+1}} \right).$$

Moreover, the convergence is uniform over the class $F^s_{2,2}(M)$ in the sense that

$$\lim_{K \to +\infty} \lim_{n \to +\infty} \sup_{f \in F^s_{2,2}(M)} P\left( n^{\frac{2s}{2s+1}} \Delta\left( f; f_{j_1(n),\hat{\theta}_n} \right) > K \right) = 0.$$

This theorem provides the existence with probability tending to one of a linear estimator for the spectral density $f$ given by $f_{j_1(n),\hat{\theta}_n}$. This estimator is strictly positive by construction. Therefore the corresponding estimator of the covariance function $\hat{\rho}_L$ (which is obtained as the inverse Fourier transform of $f_{j_1(n),\hat{\theta}_n}$) is a positive definite function by Bochner's theorem. Hence $\hat{\rho}_L$ is a covariance function.

In the related problem of density estimation from an i.i.d. sample, Koo [20] has shown that, for the Kullback-Leibler divergence, $n^{-\frac{2s}{2s+1}}$ is the fastest rate of convergence for the problem of estimating a density $f$ such that $\log(f)$ belongs to the space $B^s_{2,2}(M)$. For spectral densities belonging to a general Besov ball $B^s_{p,q}(M)$, Neumann [23] has also shown that $n^{-\frac{2s}{2s+1}}$ is an optimal rate of convergence for the $L_2$ risk. For the Kullback-Leibler divergence, we conjecture that $n^{-\frac{2s}{2s+1}}$ is the minimax rate of convergence for spectral densities belonging to $F^s_{2,2}(M)$.

However, the result obtained in the above theorem is nonadaptive because the selection of $j_1(n)$ depends on the unknown smoothness $s$ of $f$. Moreover, the result is only suited for smooth functions (as $F^s_{2,2}(M)$ corresponds to a Sobolev space of order $s$) and does not attain an optimal rate of convergence when for example $g = \log(f)$ has singularities. We therefore propose in the next section an adaptive estimator derived by applying an appropriate nonlinear thresholding procedure.
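The passage from a positive spectral density to a valid covariance function can be checked numerically. The sketch below is an illustration under assumptions: it uses the convention $\rho(h) = \int_0^1 f(\omega) e^{-i2\pi\omega h}\,d\omega$ of Assumption 3, a made-up positive density $f(\omega) = \exp(\cos 2\pi\omega)$ that is symmetric about $1/2$ (so that $\rho$ is real), and a midpoint rule for the integral. It then verifies that quadratic forms $\sum_{s,t} c_s c_t \rho(s-t)$ are nonnegative for random vectors $c$, which is the nonnegative definiteness property.

```python
import math
import random

def covariance_from_density(f_vals, grid, h):
    """Midpoint-rule value of int_0^1 f(w) cos(2 pi w h) dw (real part of the inverse transform)."""
    return sum(f * math.cos(2 * math.pi * w * h) for f, w in zip(f_vals, grid)) / len(grid)

def quadratic_form(c, rho):
    """sum_{s,t} c_s c_t rho(|s - t|) for a real, symmetric covariance sequence."""
    size = len(c)
    return sum(c[s] * c[t] * rho[abs(s - t)] for s in range(size) for t in range(size))

m = 512
grid = [(i + 0.5) / m for i in range(m)]
f_vals = [math.exp(math.cos(2 * math.pi * w)) for w in grid]  # assumed positive density

H = 8
rho = [covariance_from_density(f_vals, grid, h) for h in range(H)]

random.seed(1)
checks = [quadratic_form([random.gauss(0.0, 1.0) for _ in range(H)], rho)
          for _ in range(200)]
print(rho[0], min(checks) >= -1e-9)
```

Each quadratic form equals the integral of $f(\omega)\,|\sum_t c_t e^{i2\pi\omega t}|^2$, so nonnegativity follows directly from $f \ge 0$; an estimator that dips below zero loses this guarantee.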

3.2 Adaptive estimation

3.2.1 The bound of f is known

In adaptive estimation, we need to define an appropriate thresholding rule for the wavelet coefficients of the periodogram. This threshold is level-dependent and in this paper will take the form

$$\xi = \xi_{j,n} = 2\left[ 2\|f\|_\infty \left( \sqrt{\frac{\delta \log n}{n}} + 2^{\frac{j}{2}} \|\psi\|_\infty \frac{\delta \log n}{n} \right) + \frac{C_*}{\sqrt{n}} \right], \tag{3.1}$$

where $\delta \ge 0$ is a tuning parameter whose choice will be discussed later on and $C_* = \sqrt{\frac{C_2 + 39 C_1^2}{4\pi^2}}$. The following theorem states that the relative entropy between the true $f$ and its nonlinear estimator achieves in probability the conjectured optimal rate of convergence up to a logarithmic factor over a wide range of Besov balls.

Theorem 3.2 Assume that $f \in F^s_{p,q}(M)$ with $s > \frac{1}{2} + \frac{1}{p}$ and $1 \le p \le 2$. Suppose also that Assumptions 1, 2, 3 hold. For any $n > 1$, define $j_0 = j_0(n)$ to be the integer such that $2^{j_0} \ge \log n \ge 2^{j_0-1}$, and $j_1 = j_1(n)$ to be the integer such that $2^{j_1} \ge \frac{n}{\log n} \ge 2^{j_1-1}$. For $\delta \ge 6$, take the threshold $\xi_{j,n}$ as in (3.1). Then, the thresholding estimator (2.4) exists with probability tending to one when $n \to +\infty$ and satisfies:

$$\Delta\left( f; f^{HT}_{j_0(n),j_1(n),\hat{\theta}_n,\xi_{j,n}} \right) = O_p\left( \left( \frac{n}{\log n} \right)^{-\frac{2s}{2s+1}} \right).$$

Note that the choices of $j_0$, $j_1$ and $\xi_{j,n}$ are independent of the parameter $s$; hence the estimator $f^{HT}_{j_0(n),j_1(n),\hat{\theta}_n,\xi_{j,n}}$ is an adaptive estimator which attains in probability what we claim is the optimal rate of convergence, up to a logarithmic factor. In particular, $f^{HT}_{j_0(n),j_1(n),\hat{\theta}_n,\xi_{j,n}}$ is adaptive on $F^s_{2,2}(M)$.

This theorem provides the existence with probability tending to one of a nonlinear estimator for the spectral density. This estimator is strictly positive by construction. Therefore the corresponding estimator of the covariance function $\hat{\rho}_{NL}$ (which is obtained as the inverse Fourier transform of $f^{HT}_{j_0(n),j_1(n),\hat{\theta}_n,\xi_{j,n}}$) is a positive definite function by Bochner's theorem. Hence $\hat{\rho}_{NL}$ is a covariance function.
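The resolution levels of Theorem 3.2 and the threshold (3.1) are straightforward to evaluate once the constants are fixed. The sketch below is an illustration only: the values passed for $\|f\|_\infty$, $\|\psi\|_\infty$ and $C_*$ are placeholders (they are unknown in practice, which is the point of the next subsection), and the function names are hypothetical.

```python
import math

def resolution_levels(n):
    """Integers j0, j1 with 2^j0 >= log n >= 2^(j0-1) and 2^j1 >= n/log n >= 2^(j1-1), for n > 1."""
    j0 = max(0, math.ceil(math.log2(math.log(n))))
    j1 = math.ceil(math.log2(n / math.log(n)))
    return j0, j1

def threshold(j, n, sup_f, sup_psi, c_star, delta=6.0):
    """Level-dependent threshold xi_{j,n} of (3.1)."""
    r = delta * math.log(n) / n
    return 2.0 * (2.0 * sup_f * (math.sqrt(r) + 2.0 ** (j / 2) * sup_psi * r)
                  + c_star / math.sqrt(n))

n = 1024
j0, j1 = resolution_levels(n)
# Placeholder constants: sup_f, sup_psi and c_star are assumptions for illustration.
xis = [threshold(j, n, sup_f=1.0, sup_psi=1.0, c_star=0.5) for j in range(j0, j1 + 1)]
print(j0, j1, xis[0], xis[-1])
```

The threshold increases with the level $j$ through the factor $2^{j/2}$, so fine-scale coefficients must be larger to survive.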

3.2.2 Estimating the bound of f

Although the results of Theorem 3.2 are certainly of some theoretical interest, they are not helpful for practical applications. The (deterministic) threshold $\xi_{j,n}$ depends on the unknown quantities $\|f\|_\infty$ and $C_* := C(C_1, C_2)$, where $C_1$ and $C_2$ are unknown constants. To make the method applicable, it is necessary to find some completely data-driven rule for the threshold, which works well over a range as wide as possible of smoothness classes. In this subsection, we give an extension that leads to consider a random threshold which no longer depends on the bound on $f$ nor on $C_*$. For this let us consider the dyadic partitions of $[0,1]$ given by $\mathcal{I}_n = \left\{ \left[ j/2^{J_n}, (j+1)/2^{J_n} \right),\ j = 0, \ldots, 2^{J_n} - 1 \right\}$. Given some positive integer $r$, we define $P_n$ as the space of piecewise polynomials of degree $r$ on the dyadic partition $\mathcal{I}_n$ of step $2^{-J_n}$. The dimension of $P_n$ depends on $n$ and is denoted by $N_n$. Note that $N_n = (r+1) 2^{J_n}$. This family is regular in the sense that the partition $\mathcal{I}_n$ has equispaced knots.

An estimator of $\|f\|_\infty$ is constructed as proposed by Birgé and Massart [6] in the following way. We take the infinite norm of $\hat{f}_n$, where $\hat{f}_n$ denotes the (empirical) orthogonal projection of the periodogram $I_n$ on $P_n$. We denote by $f_n$ the $L_2$-orthogonal projection of $f$ on the same space. Then the following theorem holds.

Theorem 3.3 Assume that $f \in F^s_{p,q}(M)$ with $s > \frac{1}{2} + \frac{1}{p}$ and $1 \le p \le 2$. Suppose also that Assumptions 1, 2 and 3 hold. For any $n > 1$, let $j_0 = j_0(n)$ be the integer such that $2^{j_0} \ge \log n \ge 2^{j_0-1}$, and let $j_1 = j_1(n)$ be the integer such that $2^{j_1} \ge \frac{n}{\log n} \ge 2^{j_1-1}$. Take the constants $\delta = 6$ and $b \in \left( \frac{3}{4}, 1 \right)$, and define the threshold

$$\hat{\xi}_{j,n} = 2\left[ 2\left\| \hat{f}_n \right\|_\infty \left( \sqrt{\frac{\delta}{(1-b)^2} \frac{\log n}{n}} + 2^{\frac{j}{2}} \|\psi\|_\infty \frac{\delta}{(1-b)^2} \frac{\log n}{n} \right) + \sqrt{\frac{\log n}{n}} \right]. \tag{3.2}$$

Then, if $\|f - f_n\|_\infty \le \frac{1}{4} \|f\|_\infty$ and $N_n \le \frac{\kappa}{(r+1)^2} \frac{n}{\log n}$, where $\kappa$ is a numerical constant and $r$ is the degree of the polynomials, the thresholding estimator (2.4) exists with probability tending to one as $n \to +\infty$ and satisfies

$$\Delta\left( f; f^{HT}_{j_0(n),j_1(n),\hat{\theta}_n,\hat{\xi}_{j,n}} \right) = O_p\left( \left( \frac{n}{\log n} \right)^{-\frac{2s}{2s+1}} \right).$$

Note that we finally obtain a fully tractable estimator of $f$ which reaches the optimal rate of convergence without prior knowledge of the regularity of the spectral density, and which also gives rise to a real covariance estimator.

We point out that, in Comte [8], the condition $\|f - f_n\|_\infty \le \frac{1}{4} \|f\|_\infty$ is assumed. Under some regularity conditions on $f$, results from approximation theory entail that this condition is met. Indeed for $f \in B^s_{p,\infty}$, with $s > \frac{1}{p}$, we know from DeVore and Lorentz [13] that

$$\|f - f_n\|_\infty \le C(s)\, |f|_{s,p}\, N_n^{-\left( s - \frac{1}{p} \right)},$$

with $|f|_{s,p} = \sup_{y > 0} y^{-s} w_d(f,y)_p < +\infty$, where $w_d(f,y)_p$ is the modulus of smoothness and $d = [s] + 1$. Therefore $\|f - f_n\|_\infty \le \frac{1}{4} \|f\|_\infty$ if $N_n \ge \left( 4 C(s) \frac{|f|_{s,p}}{\|f\|_\infty} \right)^{\frac{1}{s - \frac{1}{p}}} := C(f,s,p)$, where $C(f,s,p)$ is a constant depending on $f$, $s$ and $p$.

4 Numerical experiments

In this section we present some numerical experiments which support the claims made in the theoretical part of this paper. The programs for our simulations were implemented using the MATLAB programming environment. We simulate a time series which is a superposition of an ARMA(2,2) process and a Gaussian white noise:

$$X_t = Y_t + c_0 Z_t, \tag{4.1}$$

where $Y_t + a_1 Y_{t-1} + a_2 Y_{t-2} = b_0 \varepsilon_t + b_1 \varepsilon_{t-1} + b_2 \varepsilon_{t-2}$, and $\{\varepsilon_t\}$, $\{Z_t\}$ are independent Gaussian white noise processes with unit variance. The constants were chosen as $a_1 = 0.2$, $a_2 = 0.9$, $b_0 = 1$, $b_1 = 0$, $b_2 = 1$ and $c_0 = 0.5$. We generated a sample of size $n = 1024$ according to (4.1). The spectral density $f$ of $(X_t)$ is shown in Figure 1. It has two moderately sharp peaks and is smooth in the rest of the domain.
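A simulation of model (4.1) can be sketched as follows. The paper's own programs are in MATLAB and are not reproduced here; this Python version is an independent illustration. The burn-in length, the seed, the function names and the $1/(2\pi)$ normalization of the closed-form spectral density (chosen to match the definition of $f$ in Section 2.1) are assumptions.

```python
import cmath
import math
import random

A1, A2 = 0.2, 0.9            # AR coefficients a1, a2
B0, B1, B2 = 1.0, 0.0, 1.0   # MA coefficients b0, b1, b2
C0 = 0.5                     # weight c0 of the additive white noise

def simulate(n, burn_in=500, seed=0):
    """Draw X_t = Y_t + c0 Z_t with Y an ARMA(2,2) process; burn-in is discarded."""
    rng = random.Random(seed)
    total = n + burn_in
    eps = [rng.gauss(0.0, 1.0) for _ in range(total + 2)]
    y = [0.0, 0.0]
    for t in range(2, total + 2):
        ma = B0 * eps[t] + B1 * eps[t - 1] + B2 * eps[t - 2]
        y.append(ma - A1 * y[t - 1] - A2 * y[t - 2])
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [y_t + C0 * z_t for y_t, z_t in zip(y[-n:], z)]

def true_spectral_density(w):
    """(1 / 2 pi) * (|b(u)|^2 / |a(u)|^2 + c0^2) with u = e^{-i 2 pi w}."""
    u = cmath.exp(-2j * math.pi * w)
    ma = B0 + B1 * u + B2 * u ** 2
    ar = 1.0 + A1 * u + A2 * u ** 2
    return (abs(ma) ** 2 / abs(ar) ** 2 + C0 ** 2) / (2 * math.pi)

x = simulate(1024)
density_on_grid = [true_spectral_density((i + 0.5) / 256) for i in range(256)]
print(len(x), min(density_on_grid) > 0.0)
```

The autoregressive polynomial $1 + 0.2z + 0.9z^2$ has roots of modulus $1/\sqrt{0.9} > 1$, so the recursion is stable and a few hundred burn-in steps are enough to forget the zero initial values.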

Starting from the periodogram we considered the Symmlet 8 basis, i.e. the least asymmetric, compactly supported wavelets which are described in Daubechies [11]. We choose $j_0$ and $j_1$ as in the hypothesis of Theorem 3.3 and left the coefficients assigned to the father wavelets unthresholded. Hard thresholding is performed using the threshold $\hat{\xi}_{j,n}$ as in (3.2) for the levels $j = j_0, \ldots, j_1$, and the empirical coefficients from the higher resolution scales $j > j_1$ are set to zero. This gives the estimate

$$f^{HT}_{j_0,j_1,\hat{\xi}_{j,n}} = \sum_{k=0}^{2^{j_0}-1} \hat{a}_{j_0,k} \phi_{j_0,k} + \sum_{j=j_0}^{j_1} \sum_{k=0}^{2^j-1} \hat{b}_{j,k}\, I\left( \left| \hat{b}_{j,k} \right| > \hat{\xi}_{j,n} \right) \psi_{j,k}, \tag{4.2}$$

which is obtained by simply thresholding the wavelet coefficients (2.2) of the periodogram. Note that such an estimator is not guaranteed to be strictly positive in the interval $[0,1]$. However, we use it to build our strictly positive estimator $f^{HT}_{j_0,j_1,\hat{\theta}_n,\hat{\xi}_{j,n}}$ (see (2.4) to recall its definition). We want to find $\hat{\theta}_n$ such that

$$\left\langle f^{HT}_{j_0,j_1,\hat{\theta}_n,\hat{\xi}_{j,n}}, \psi_{j,k} \right\rangle = \delta_{\hat{\xi}_{j,n}}\left( \hat{\beta}_{j,k} \right) \quad \text{for all } (j,k) \in \Lambda_{j_1}.$$

For this, we take

$$\hat{\theta}_n = \arg\min_{\theta \in \mathbb{R}^{\#\Lambda_{j_1}}} \sum_{(j,k) \in \Lambda_{j_1}} \left( \langle f_{j_0,j_1,\theta}, \psi_{j,k} \rangle - \delta_{\hat{\xi}_{j,n}}\left( \hat{\beta}_{j,k} \right) \right)^2,$$

where $f_{j_0,j_1,\theta}(\cdot) = \exp\left( \sum_{(j,k) \in \Lambda_{j_1}} \theta_{j,k} \psi_{j,k}(\cdot) \right) \in E_{j_1}$ and $E_{j_1}$ is the family (2.1). To solve this optimization problem we used a gradient descent method with an adaptive step, taking as initial value

$$\theta_0 = \left\langle \log\left( \left( f^{HT}_{j_0,j_1,\hat{\xi}_{j,n}} \right)_+ \right), \psi_{j,k} \right\rangle,$$

where $\left( f^{HT}_{j_0,j_1,\hat{\xi}_{j,n}}(\omega) \right)_+ := \max\left( f^{HT}_{j_0,j_1,\hat{\xi}_{j,n}}(\omega), \eta \right)$ for all $\omega \in [0,1]$ and $\eta > 0$ is a small constant.
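The least-squares moment matching described above can be sketched with a plain gradient descent and a simple accept/reject step rule. This is not the authors' MATLAB code: the Haar basis replaces Symmlet 8 to keep the example dependency-free, the target coefficients come from a made-up positive step function (so that an exact solution exists), the descent starts from $\theta = 0$ rather than from the paper's $\theta_0$, and the step rule (grow by 1.5 on success, halve on failure) is one assumed form of "adaptive step". All names are hypothetical.

```python
import math

def haar_psi(j, k, w):
    """Haar wavelet psi_{j,k}(w) = 2^{j/2} psi(2^j w - k) (illustrative basis choice)."""
    u = (2 ** j) * w - k
    if 0.0 <= u < 0.5:
        return 2.0 ** (j / 2)
    if 0.5 <= u < 1.0:
        return -(2.0 ** (j / 2))
    return 0.0

def build_basis(j1, grid):
    """Rows: scaling function, then psi_{j,k} for 0 <= j < j1, sampled on the grid."""
    rows = [[1.0 for _ in grid]]
    for j in range(j1):
        for k in range(2 ** j):
            rows.append([haar_psi(j, k, w) for w in grid])
    return rows

def inner(u, v):
    """Midpoint-rule inner product on [0, 1]."""
    return sum(a * b for a, b in zip(u, v)) / len(u)

def density(theta, basis):
    """f_theta = exp(sum_i theta_i psi_i) on the grid."""
    return [math.exp(sum(t * row[i] for t, row in zip(theta, basis)))
            for i in range(len(basis[0]))]

def objective_and_gradient(theta, basis, beta):
    """Value and gradient of sum_i (<f_theta, psi_i> - beta_i)^2."""
    f = density(theta, basis)
    resid = [inner(f, row) - b for row, b in zip(basis, beta)]
    r_fun = [sum(r * row[i] for r, row in zip(resid, basis)) for i in range(len(f))]
    weighted = [a * b for a, b in zip(f, r_fun)]
    grad = [2.0 * inner(weighted, row) for row in basis]
    return sum(r * r for r in resid), grad

def project(beta, basis, theta0, step=1.0, iters=5000, tol=1e-18):
    """Gradient descent; the step grows after a decrease and shrinks otherwise."""
    theta = list(theta0)
    value, grad = objective_and_gradient(theta, basis, beta)
    for _ in range(iters):
        if value < tol:
            break
        trial = [t - step * g for t, g in zip(theta, grad)]
        trial_value, trial_grad = objective_and_gradient(trial, basis, beta)
        if trial_value < value:
            theta, value, grad = trial, trial_value, trial_grad
            step *= 1.5
        else:
            step *= 0.5
    return theta, value

grid = [(i + 0.5) / 8 for i in range(8)]
basis = build_basis(3, grid)
target = [0.6, 0.9, 1.8, 1.2, 0.7, 0.5, 1.0, 1.4]   # assumed positive step function
beta = [inner(target, row) for row in basis]
theta_hat, final_value = project(beta, basis, [0.0] * len(basis))
f_hat = density(theta_hat, basis)
print(final_value, max(abs(a - b) for a, b in zip(f_hat, target)))
```

In this toy setting the fitted function reproduces the target because the target already lies in the exponential family; with thresholded periodogram coefficients and a smoother wavelet, the minimizer is only an approximation and may fail to exist, as discussed in Section 2.2.2.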

In Figure 1 we display the unconstrained estimator $f^{HT}_{j_0,j_1,\hat{\xi}_{j,n}}$ as in (4.2), obtained by thresholding of the wavelet coefficients of the periodogram, together with the estimator $f^{HT}_{j_0,j_1,\hat{\theta}_n,\hat{\xi}_{j,n}}$, which is strictly positive by construction. Note that these wavelet estimators capture well the peaks and look fairly good on the smooth part too.

Figure 1: True spectral density $f$, wavelet thresholding estimator $f^{HT}_{j_0,j_1,\hat{\xi}_{j,n}}$ and final positive estimator $f^{HT}_{j_0,j_1,\hat{\theta}_n,\hat{\xi}_{j,n}}$.

We compared our method with the spectral density estimator proposed by Comte [8], which is

based on a model selection procedure. As an example, in Comte [8], the author studies the behavior of

such estimators using a collection of nested models (Sm), with m = 1,...,100, where Smis the space of

piecewise constant functions, generated by a histogram basis on [0,1] of dimension m with equispaced

knots (see Comte [8] for further details). In Figure 2 we show the result of this comparison. Note

that our method better captures the peaks of the true spectral density.

Figure 2: True spectral density $f$, final positive estimator $f^{HT}_{j_0,j_1,\hat{\theta}_n,\hat{\xi}_{j,n}}$ and estimator via model selection using regular histograms.

5 Appendix

5.1 Some notations and definitions.

Throughout all the proofs, C denotes a generic constant whose value may change from line to line.

First, let us introduce the following definitions.

Definition 5.1 Let $V_j$ denote the usual multiresolution space at scale $j$ spanned by the scaling functions $(\phi_{j,k})_{0 \le k \le 2^j - 1}$, and define $A_j < +\infty$ as the constant such that $\|\upsilon\|_\infty \le A_j \|\upsilon\|_{L_2}$ for all $\upsilon \in V_j$.

Definition 5.2 For $f \in F^s_{p,q}(M)$, let $g = \log(f)$. Then for $j \ge j_0 - 1$, define $D_j = \|g - g_j\|_{L_2}$ and $\gamma_j = \|g - g_j\|_\infty$, where $g_j = \sum_{k=0}^{2^j-1} \theta_{j,k} \psi_{j,k}$, with $\theta_{j,k} = \langle g, \psi_{j,k} \rangle$.

The proof of the following lemma immediately follows from the arguments in the proof of Lemma A.5 in Antoniadis and Bigot [1].

Lemma 5.3 Let $j \in \mathbb{N}$. Then $A_j \le C 2^{j/2}$. Suppose that $f \in F^s_{p,q}(M)$ with $1 \le p \le 2$ and $s > \frac{1}{p}$. Then, uniformly over $F^s_{p,q}(M)$, $D_j \le C 2^{-j(s + 1/2 - 1/p)}$ and $\gamma_j \le C 2^{-j(s - 1/p)}$, where $C$ denotes constants depending only on $M$, $s$, $p$ and $q$.

5.2 Technical results on information projection

The estimation of density function based on information projection has been introduced by Barron

and Sheu [2]. To apply this method in our context, we recall for completeness a set of results that

are useful to prove the existence of our estimators. The proofs of the following lemmas immediately

follow from results in Barron and Sheu [2] and Antoniadis and Bigot [1].

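Lemma 5.4 Let $f$ and $g$ be two functions in $L_2([0,1])$ such that $\log\left(\frac{f}{g}\right)$ is bounded. Then

$$\Delta(f;g) \le \frac{1}{2}\, e^{\left\|\log\left(\frac{f}{g}\right)\right\|_\infty} \int_0^1 f \left(\log\left(\frac{f}{g}\right)\right)^2 d\mu,$$

where $\mu$ denotes the Lebesgue measure on $[0,1]$.

Lemma 5.5 Let $\beta \in \mathbb{R}^{\#\Lambda_{j_1}}$. Assume that there exists some $\theta(\beta) \in \mathbb{R}^{\#\Lambda_{j_1}}$ such that, for all $(j,k) \in \Lambda_{j_1}$, $\theta(\beta)$ is a solution of

$$\left\langle f_{j,\theta(\beta)},\psi_{j,k}\right\rangle = \beta_{j,k}.$$

Then for any function $f$ such that $\langle f,\psi_{j,k}\rangle = \beta_{j,k}$ for all $(j,k) \in \Lambda_{j_1}$, and for all $\theta \in \mathbb{R}^{\#\Lambda_{j_1}}$, the following Pythagorean-like identity holds:

$$\Delta(f;f_{j,\theta}) = \Delta\left(f;f_{j,\theta(\beta)}\right) + \Delta\left(f_{j,\theta(\beta)};f_{j,\theta}\right). \tag{5.1}$$

The next lemma is a key result which gives sufficient conditions for the existence of the vector $\theta(\beta)$ as defined in Lemma 5.5. This lemma also relates distances between the functions in the exponential family to distances between the corresponding wavelet coefficients. Its proof relies upon a series of lemmas on bounds within exponential families for the Kullback-Leibler divergence and can be found in Barron and Sheu [2] and Antoniadis and Bigot [1].

Lemma 5.6 Let $\theta_0 \in \mathbb{R}^{\#\Lambda_{j_1}}$, $\beta_0 = \left(\beta_{0,(j,k)}\right)_{(j,k)\in\Lambda_{j_1}} \in \mathbb{R}^{\#\Lambda_{j_1}}$ such that $\beta_{0,(j,k)} = \langle f_{j,\theta_0},\psi_{j,k}\rangle$ for all $(j,k) \in \Lambda_{j_1}$, and $\tilde\beta \in \mathbb{R}^{\#\Lambda_{j_1}}$ a given vector. Let $b = \exp\left(\|\log(f_{j,\theta_0})\|_\infty\right)$ and $e = \exp(1)$. If $\|\tilde\beta - \beta_0\|_2 \le \frac{1}{2ebA_{j_1}}$, then the solution $\theta(\tilde\beta)$ of

$$\langle f_{j_1,\theta},\psi_{j,k}\rangle = \tilde\beta_{j,k} \quad \text{for all } (j,k) \in \Lambda_{j_1}$$

exists and satisfies

$$\left\|\theta(\tilde\beta) - \theta_0\right\|_2 \le 2eb\,\left\|\tilde\beta - \beta_0\right\|_2,$$

$$\left\|\log\left(\frac{f_{j_1,\theta(\beta_0)}}{f_{j_1,\theta(\tilde\beta)}}\right)\right\|_\infty \le 2ebA_{j_1}\left\|\tilde\beta - \beta_0\right\|_2,$$

$$\Delta\left(f_{j_1,\theta(\beta_0)};f_{j_1,\theta(\tilde\beta)}\right) \le 2eb\,\left\|\tilde\beta - \beta_0\right\|_2^2,$$

where $\|\beta\|_2$ denotes the standard Euclidean norm for $\beta \in \mathbb{R}^{\#\Lambda_{j_1}}$.

Lemma 5.7 Suppose that $f \in F^s_{p,q}(M)$ with $s > \frac{1}{p}$ and $1 \le p \le 2$. Then, there exists a constant $M_1$ such that for all $f \in F^s_{p,q}(M)$, $0 < M_1^{-1} \le f \le M_1 < +\infty$.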
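The inequality of Lemma 5.4 is easy to check numerically. The sketch below assumes that $\Delta(f;g)$ is the Kullback-Leibler discrepancy for positive, not necessarily normalized functions, $\Delta(f;g) = \int_0^1 f\log(f/g)\,d\mu - \int_0^1 f\,d\mu + \int_0^1 g\,d\mu$ (the definition is not restated in this appendix, so this form is an assumption), and the two test functions are arbitrary choices made for illustration.

```python
import numpy as np

def kl_discrepancy(f, g, dx):
    """Riemann-sum approximation of Delta(f; g) = int f log(f/g) - int f + int g (assumed definition)."""
    return np.sum(f * np.log(f / g) - f + g) * dx

def lemma_54_bound(f, g, dx):
    """Right-hand side of Lemma 5.4: 0.5 * exp(||log(f/g)||_inf) * int f (log(f/g))^2."""
    r = np.log(f / g)
    return 0.5 * np.exp(np.max(np.abs(r))) * np.sum(f * r ** 2) * dx

m = 10_000
w = (np.arange(m) + 0.5) / m                 # midpoint grid on [0, 1]
dx = 1.0 / m
f = 1.0 + 0.5 * np.cos(2 * np.pi * w)        # hypothetical positive function
g = np.exp(0.3 * np.sin(2 * np.pi * w))      # hypothetical member of an exponential family

lhs = kl_discrepancy(f, g, dx)
rhs = lemma_54_bound(f, g, dx)
print(lhs, rhs)
```

Under the assumed definition the inequality already holds pointwise for the integrands, so the discretized check does not depend on the grid size.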


5.3 Technical results for the proofs of the main results

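Lemma 5.8 Let $n \ge 1$, $\beta_{j,k} := \langle f,\psi_{j,k}\rangle$ and $\hat\beta_{j,k} := \langle I_n,\psi_{j,k}\rangle$ for $j \ge j_0 - 1$ and $0 \le k \le 2^j - 1$. Suppose that Assumptions 1, 2 and 3 hold. Then,

$$\mathrm{Bias}^2\left(\hat\beta_{j,k}\right) := \left(E\left(\hat\beta_{j,k}\right) - \beta_{j,k}\right)^2 \le \frac{C_*^2}{n} \quad \text{with} \quad C_* = \sqrt{\frac{C_2 + 39C_1^2}{4\pi^2}},$$

and

$$\mathrm{Var}\left(\hat\beta_{j,k}\right) := E\left(\hat\beta_{j,k} - E\left(\hat\beta_{j,k}\right)\right)^2 \le \frac{C}{n}$$

for some constant $C > 0$. Moreover, there exists a constant $M_2 > 0$ such that for all $f \in F^s_{p,q}(M)$ with $s > \frac{1}{p}$ and $1 \le p \le 2$,

$$E\left(\hat\beta_{j,k} - \beta_{j,k}\right)^2 = \mathrm{Bias}^2\left(\hat\beta_{j,k}\right) + \mathrm{Var}\left(\hat\beta_{j,k}\right) \le \frac{M_2}{n}.$$

Proof. Note that $\mathrm{Bias}^2\left(\hat\beta_{j,k}\right) \le \|f - E(I_n)\|^2_{L_2}$. Using Proposition 1 in Comte [8], Assumptions 1 and 2 imply that $\|f - E(I_n)\|^2_{L_2} \le \frac{C_2 + 39C_1^2}{4\pi^2 n}$, which gives the result for the bias term. To bound the variance term, remark that

$$\mathrm{Var}\left(\hat\beta_{j,k}\right) = E\langle I_n - E(I_n),\psi_{j,k}\rangle^2 \le E\|I_n - E(I_n)\|^2_{L_2}\,\|\psi_{j,k}\|^2_{L_2} = \int_0^1 E|I_n(\omega) - E(I_n(\omega))|^2\, d\omega.$$

Then, under Assumptions 1 and 2, it follows that there exists an absolute constant $C > 0$ such that for all $\omega \in [0,1]$, $E|I_n(\omega) - E(I_n(\omega))|^2 \le \frac{C}{n}$. To complete the proof it remains to remark that Assumption 3 implies that these bounds for the bias and the variance hold uniformly over $F^s_{p,q}(M)$.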
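The first inequality in the proof of Lemma 5.8 can be spelled out as follows. This is a short sketch that uses only the Cauchy-Schwarz inequality and the normalization $\|\psi_{j,k}\|_{L_2} = 1$, which is assumed here as for any orthonormal wavelet basis; exchanging expectation and inner product is also assumed to be licit.

```latex
\begin{align*}
\mathrm{Bias}^2\bigl(\hat\beta_{j,k}\bigr)
  &= \bigl( E\langle I_n,\psi_{j,k}\rangle - \langle f,\psi_{j,k}\rangle \bigr)^2
   = \langle E(I_n) - f,\ \psi_{j,k}\rangle^2 \\
  &\le \| E(I_n) - f \|_{L_2}^2 \, \|\psi_{j,k}\|_{L_2}^2
   = \| f - E(I_n) \|_{L_2}^2
   \le \frac{C_2 + 39 C_1^2}{4\pi^2 n} = \frac{C_*^2}{n}.
\end{align*}
```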

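Lemma 5.9 Let $n \ge 1$, $b_{j,k} := \langle f,\psi_{j,k}\rangle$ and $\hat b_{j,k} := \langle I_n,\psi_{j,k}\rangle$ for $j \ge j_0$ and $0 \le k \le 2^j - 1$. Suppose that Assumptions 1 and 2 hold. Then for any $x > 0$,

$$P\left(|\hat b_{j,k} - b_{j,k}| > 2\|f\|_\infty\left(\sqrt{\frac{x}{n}} + 2^{j/2}\|\psi\|_\infty\frac{x}{n}\right) + \frac{C_*}{\sqrt{n}}\right) \le 2e^{-x},$$

where $C_* = \sqrt{\frac{C_2 + 39C_1^2}{4\pi^2}}$.

Proof. Note that

$$\hat b_{j,k} = \frac{1}{2\pi n}\sum_{t=1}^{n}\sum_{t'=1}^{n}\left(X_t - \overline{X}\right)\left(X_{t'} - \overline{X}\right)^* \int_0^1 e^{i2\pi\omega(t-t')}\psi_{j,k}(\omega)\,d\omega = \frac{1}{2\pi n}X^T T_n(\psi_{j,k})X^*,$$

where $X = \left(X_1 - \overline{X},\ldots,X_n - \overline{X}\right)^T$, $X^T$ denotes the transpose of $X$ and $T_n(\psi_{j,k})$ is the Toeplitz matrix with entries $[T_n(\psi_{j,k})]_{t,t'} = \int_0^1 e^{i2\pi\omega(t-t')}\psi_{j,k}(\omega)\,d\omega$, $1 \le t,t' \le n$. We can assume without loss of generality that $E(X_t) = 0$, and then under Assumptions 1 and 2, $X$ is a centered Gaussian vector in $\mathbb{R}^n$ with covariance matrix $\Sigma = T_n(f)$. Using the decomposition $X = \Sigma^{1/2}\varepsilon$, where $\varepsilon \sim N(0,I_n)$, it follows that $\hat b_{j,k} = \frac{1}{2\pi n}\varepsilon^T A_{j,k}\varepsilon$, with $A_{j,k} = \Sigma^{1/2}T_n(\psi_{j,k})\Sigma^{1/2}$. Note also that $E\left(\hat b_{j,k}\right) = \frac{1}{2\pi n}\mathrm{tr}(A_{j,k})$, where $\mathrm{tr}(A)$ denotes the trace of a matrix $A$.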
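To make the Toeplitz matrices of this proof concrete, the sketch below builds $T_n(g)$ from a symbol $g$ by numerical integration and checks that it is Hermitian with eigenvalues between $\inf g$ and $\sup g$, the classical fact behind spectral bounds of the form $\tau(T_n(g)) \le \|g\|_\infty$. The helper name `toeplitz_from_symbol`, the midpoint quadrature and the example symbol are all assumptions made for illustration.

```python
import numpy as np

def toeplitz_from_symbol(g, n, m=4096):
    """Approximate T_n(g)[t, t'] = int_0^1 exp(i 2 pi w (t - t')) g(w) dw.

    The integral is replaced by a midpoint rule with m nodes (a numerical
    shortcut, exact for trigonometric polynomials of low degree).
    """
    w = (np.arange(m) + 0.5) / m
    lags = np.arange(-(n - 1), n)                       # all values of t - t'
    coeff = (np.exp(2j * np.pi * np.outer(lags, w)) * g(w)).mean(axis=1)
    t = np.arange(n)
    return coeff[(t[:, None] - t[None, :]) + (n - 1)]   # entry depends on t - t' only

def example_density(w):
    # Hypothetical spectral density with inf = 0.5 and sup = 1.5
    return 1.0 + 0.5 * np.cos(2 * np.pi * w)

n = 32
Sigma = toeplitz_from_symbol(example_density, n)
eigenvalues = np.linalg.eigvalsh(Sigma)
print(eigenvalues.min(), eigenvalues.max())
```

The same helper applied to a wavelet $\psi_{j,k}$ would give the matrix $T_n(\psi_{j,k})$ entering $A_{j,k} = \Sigma^{1/2}T_n(\psi_{j,k})\Sigma^{1/2}$.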


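Now let $s_1,\ldots,s_n$ be the eigenvalues of the Hermitian matrix $A_{j,k}$ with $|s_1| \ge |s_2| \ge \ldots \ge |s_n|$ and let $Z = 2\pi n\left(\hat b_{j,k} - E\left(\hat b_{j,k}\right)\right) = \varepsilon^T A_{j,k}\varepsilon - \mathrm{tr}(A_{j,k})$. Then, for $0 < \lambda < (2|s_1|)^{-1}$ one has that

$$\log\left(E\left(e^{\lambda Z}\right)\right) = \sum_{i=1}^{n}\left(-\lambda s_i - \frac{1}{2}\log(1 - 2\lambda s_i)\right) = \sum_{i=1}^{n}\sum_{\ell=2}^{+\infty}\frac{1}{2\ell}(2s_i\lambda)^\ell \le \sum_{i=1}^{n}\sum_{\ell=2}^{+\infty}\frac{1}{2\ell}(2|s_i|\lambda)^\ell \le \sum_{i=1}^{n}\left(-\lambda|s_i| - \frac{1}{2}\log(1 - 2\lambda|s_i|)\right),$$

where we have used the fact that $-\log(1 - x) = \sum_{\ell=1}^{+\infty}\frac{x^\ell}{\ell}$ for $|x| < 1$. Then using the inequality $-u - \frac{1}{2}\log(1 - 2u) \le \frac{u^2}{1-2u}$ that holds for all $0 < u < \frac{1}{2}$, the above inequality implies that

$$\log\left(E\left(e^{\lambda Z}\right)\right) \le \sum_{i=1}^{n}\frac{\lambda^2|s_i|^2}{1 - 2\lambda|s_i|} \le \frac{\lambda^2\|s\|^2}{1 - 2\lambda|s_1|},$$

where $\|s\|^2 = \sum_{i=1}^{n}|s_i|^2$. Arguing as in Birgé and Massart [6], the above inequality implies that for any $x > 0$, $P\left(|Z| > 2|s_1|x + 2\|s\|\sqrt{x}\right) \le 2e^{-x}$, which can also be written as

$$P\left(\left|\hat b_{j,k} - E\left(\hat b_{j,k}\right)\right| > 2|s_1|\frac{x}{2\pi n} + 2\frac{\|s\|}{2\pi n}\sqrt{x}\right) \le 2e^{-x},$$

which implies

$$P\left(\left|\hat b_{j,k} - E\left(\hat b_{j,k}\right)\right| > 2|s_1|\frac{x}{n} + 2\frac{\|s\|}{n}\sqrt{x}\right) \le 2e^{-x}. \tag{5.2}$$

Let $\tau(A)$ denote the spectral radius of a matrix $A$. For the Toeplitz matrices $\Sigma = T_n(f)$ and $T_n(\psi_{j,k})$ one has that

$$\tau(\Sigma) \le \|f\|_\infty \quad \text{and} \quad \tau(T_n(\psi_{j,k})) \le \|\psi_{j,k}\|_\infty = 2^{j/2}\|\psi\|_\infty.$$

The above inequalities imply that

$$|s_1| = \tau\left(\Sigma^{1/2}T_n(\psi_{j,k})\Sigma^{1/2}\right) \le \tau(\Sigma)\,\tau(T_n(\psi_{j,k})) \le \|f\|_\infty 2^{j/2}\|\psi\|_\infty. \tag{5.3}$$

Let $\lambda_i$, $i = 1,\ldots,n$, be the eigenvalues of $T_n(\psi_{j,k})$. From Lemma 3.1 in Davies [12], we have that

$$\limsup_{n\to+\infty}\frac{1}{n}\mathrm{tr}\left(T_n(\psi_{j,k})^2\right) = \limsup_{n\to+\infty}\frac{1}{n}\sum_{i=1}^{n}\lambda_i^2 = \int_0^1\psi_{j,k}^2(\omega)\,d\omega = 1,$$

which implies that

$$\|s\|^2 = \sum_{i=1}^{n}|s_i|^2 = \mathrm{tr}\left(A_{j,k}^2\right) = \mathrm{tr}\left((\Sigma T_n(\psi_{j,k}))^2\right) \le \tau(\Sigma)^2\,\mathrm{tr}\left(T_n(\psi_{j,k})^2\right) \le \|f\|_\infty^2\, n, \tag{5.4}$$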


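where we have used the inequality $\mathrm{tr}\left((AB)^2\right) \le \tau(A)^2\,\mathrm{tr}\left(B^2\right)$ that holds for any pair of Hermitian matrices $A$, $B$. Combining (5.2), (5.3) and (5.4), we finally obtain that for any $x > 0$

$$P\left(\left|\hat b_{j,k} - E\left(\hat b_{j,k}\right)\right| > 2\|f\|_\infty\left(\sqrt{\frac{x}{n}} + 2^{j/2}\|\psi\|_\infty\frac{x}{n}\right)\right) \le 2e^{-x}. \tag{5.5}$$

Now, let $\xi_{j,n} = 2\|f\|_\infty\left(\sqrt{\frac{x}{n}} + 2^{j/2}\|\psi\|_\infty\frac{x}{n}\right) + \frac{C_*}{\sqrt{n}}$, and note that

$$P\left(\left|\hat b_{j,k} - b_{j,k}\right| > \xi_{j,n}\right) \le P\left(\left|\hat b_{j,k} - E\left(\hat b_{j,k}\right)\right| > \xi_{j,n} - \left|E\left(\hat b_{j,k}\right) - b_{j,k}\right|\right).$$

By Lemma 5.8, one has that $\left|E\left(\hat b_{j,k}\right) - b_{j,k}\right| \le \frac{C_*}{\sqrt{n}}$, and thus $\xi_{j,n} - \left|E\left(\hat b_{j,k}\right) - b_{j,k}\right| \ge \xi_{j,n} - \frac{C_*}{\sqrt{n}}$, which implies using (5.5) that

$$P\left(\left|\hat b_{j,k} - b_{j,k}\right| > \xi_{j,n}\right) \le P\left(\left|\hat b_{j,k} - E\left(\hat b_{j,k}\right)\right| > \xi_{j,n} - \frac{C_*}{\sqrt{n}}\right) \le 2e^{-x},$$

which completes the proof of Lemma 5.9.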
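The two analytic ingredients of the proof of Lemma 5.9 can be illustrated numerically. The sketch below first checks the elementary inequality $-u - \frac{1}{2}\log(1-2u) \le \frac{u^2}{1-2u}$ on a grid of $(0,\frac{1}{2})$, then estimates by Monte Carlo the tail probability $P(|Z| > 2|s_1|x + 2\|s\|\sqrt{x})$ for $Z = \varepsilon^T A\varepsilon - \mathrm{tr}(A)$. The matrix `A` is an arbitrary random symmetric matrix used as a stand-in for $A_{j,k}$, and the sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                          # arbitrary real symmetric stand-in for A_{j,k}
s = np.linalg.eigvalsh(A)
s1 = np.max(np.abs(s))                     # |s_1|, largest eigenvalue in modulus
s_norm = np.sqrt(np.sum(s ** 2))           # ||s||, Euclidean norm of the spectrum

# Elementary inequality used to bound the log-Laplace transform of Z
u = np.linspace(1e-4, 0.5 - 1e-4, 1000)
gap = u ** 2 / (1 - 2 * u) - (-u - 0.5 * np.log1p(-2 * u))
print(gap.min())                           # smallest gap between the two sides on the grid

# Centered Gaussian quadratic form Z = eps^T A eps - tr(A), one value per simulated eps
eps = rng.standard_normal((20_000, n))
Z = np.sum((eps @ A) * eps, axis=1) - np.trace(A)

def tail_frequency(x):
    """Empirical frequency of the event |Z| > 2|s_1| x + 2 ||s|| sqrt(x)."""
    return np.mean(np.abs(Z) > 2 * s1 * x + 2 * s_norm * np.sqrt(x))

for x in (1.0, 2.0, 3.0):
    print(x, tail_frequency(x), 2 * np.exp(-x))
```

The bound is not meant to be sharp, so the empirical frequencies typically sit well below $2e^{-x}$.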

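Lemma 5.10 Assume that $f \in F^s_{p,q}(M)$ with $s > \frac{1}{2} + \frac{1}{p}$ and $1 \le p \le 2$. Suppose that Assumptions 1, 2 and 3 hold. For any $n > 1$, define $j_0 = j_0(n)$ to be the integer such that $2^{j_0} > \log n \ge 2^{j_0-1}$, and $j_1 = j_1(n)$ to be the integer such that $2^{j_1} \ge \frac{n}{\log n} \ge 2^{j_1-1}$. For $\delta \ge 6$, take the threshold

$$\xi_{j,n} = 2\left[2\|f\|_\infty\left(\sqrt{\frac{\delta\log n}{n}} + 2^{\frac{j}{2}}\|\psi\|_\infty\frac{\delta\log n}{n}\right) + \frac{C_*}{\sqrt{n}}\right]$$

as in (3.1), where $C_* = \sqrt{\frac{C_2 + 39C_1^2}{4\pi^2}}$. Let $\beta_{j,k} := \langle f,\psi_{j,k}\rangle$ and $\hat\beta_{\xi_{j,n},(j,k)} := \delta_{\xi_{j,n}}\left(\hat\beta_{j,k}\right)$ with $(j,k) \in \Lambda_{j_1}$ as in (2.4). Take $\beta = (\beta_{j,k})_{(j,k)\in\Lambda_{j_1}}$ and $\hat\beta_{\xi_{j,n}} = \left(\hat\beta_{\xi_{j,n},(j,k)}\right)_{(j,k)\in\Lambda_{j_1}}$. Then there exists a constant $M_3 > 0$ such that for all sufficiently large $n$:

$$E\left\|\beta - \hat\beta_{\xi_{j,n}}\right\|_2^2 := E\sum_{(j,k)\in\Lambda_{j_1}}\left|\beta_{j,k} - \delta_{\xi_{j,n}}\left(\hat\beta_{j,k}\right)\right|^2 \le M_3\left(\frac{n}{\log n}\right)^{-\frac{2s}{2s+1}}$$

uniformly over $F^s_{p,q}(M)$.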
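The quantities appearing in Lemma 5.10 are straightforward to compute. The following is a minimal sketch, assuming that $\delta_\xi$ is the hard-thresholding rule $x \mapsto x\,\mathbf{1}\{|x| > \xi\}$ (consistent with the indicator decomposition used in the proof) and that $\|f\|_\infty$, $\|\psi\|_\infty$ and $C_*$ are supplied by the user. The function names and the numerical values in the usage lines are hypothetical.

```python
import math

def resolution_levels(n):
    """Integers j0, j1 with 2**j0 > log n >= 2**(j0-1) and 2**j1 >= n/log n >= 2**(j1-1)."""
    if n <= 1:
        raise ValueError("n must be greater than 1")
    log_n = math.log(n)
    j0 = math.floor(math.log2(log_n)) + 1
    j1 = math.ceil(math.log2(n / log_n))
    return j0, j1

def threshold(j, n, f_sup, psi_sup, c_star, delta=6.0):
    """Level-dependent threshold xi_{j,n} of Lemma 5.10.

    f_sup, psi_sup and c_star stand for ||f||_inf, ||psi||_inf and C_*;
    they are inputs here, not quantities computed by this sketch.
    """
    r = delta * math.log(n) / n
    inner = 2.0 * f_sup * (math.sqrt(r) + 2 ** (j / 2) * psi_sup * r)
    return 2.0 * (inner + c_star / math.sqrt(n))

def hard_threshold(x, xi):
    """Assumed thresholding rule: keep x if |x| > xi, otherwise set it to zero."""
    return x if abs(x) > xi else 0.0

n = 1024
j0, j1 = resolution_levels(n)
for j in range(j0, j1 + 1):
    # Hypothetical values for ||f||_inf, ||psi||_inf and C_*
    print(j, threshold(j, n, f_sup=1.5, psi_sup=1.0, c_star=0.5))
```

The threshold grows with the level $j$ through the factor $2^{j/2}$, so fine-scale coefficients must be larger to survive thresholding.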

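Proof. Taking into account that

$$E\left\|\beta - \hat\beta_{\xi_{j,n}}\right\|_2^2 = \sum_{k=0}^{2^{j_0}-1}E\left(a_{j_0,k} - \hat a_{j_0,k}\right)^2 + \sum_{j=j_0}^{j_1}\sum_{k=0}^{2^j-1}E\left[\left(b_{j,k} - \hat b_{j,k}\right)^2 I\left(\left|\hat b_{j,k}\right| > \xi_{j,n}\right)\right] + \sum_{j=j_0}^{j_1}\sum_{k=0}^{2^j-1}b_{j,k}^2\,P\left(\left|\hat b_{j,k}\right| \le \xi_{j,n}\right) := T_1 + T_2 + T_3, \tag{5.6}$$

we are interested in bounding these three terms. The bound for $T_1$ follows from Lemma 5.8 and the fact that $j_0 = \log_2(\log n) \le \frac{1}{2s+1}\log_2(n)$:

$$T_1 = \sum_{k=0}^{2^{j_0}-1}E\left(a_{j_0,k} - \hat a_{j_0,k}\right)^2 = O\left(\frac{2^{j_0}}{n}\right) \le O\left(n^{-\frac{2s}{2s+1}}\right). \tag{5.7}$$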
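The bound (5.7) can be made explicit with a short computation. This is a sketch that assumes the bound $M_2/n$ of Lemma 5.8 applies to each scaling coefficient $a_{j_0,k}$, and that uses $2^{j_0-1} \le \log n$ from the definition of $j_0$ together with the elementary fact $\sup_{n \ge 1} n^{-a}\log n = \frac{1}{ae}$ for $a > 0$.

```latex
\begin{align*}
T_1 = \sum_{k=0}^{2^{j_0}-1} E\bigl(a_{j_0,k} - \hat a_{j_0,k}\bigr)^2
  &\le 2^{j_0}\,\frac{M_2}{n}
   \le \frac{2 M_2 \log n}{n}
   = 2 M_2\, n^{-\frac{2s}{2s+1}} \cdot n^{-\frac{1}{2s+1}} \log n \\
  &\le \frac{2 M_2 (2s+1)}{e}\; n^{-\frac{2s}{2s+1}} .
\end{align*}
```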
