
Journal of Multivariate Analysis 99 (2008) 2406–2443

www.elsevier.com/locate/jmva

Model checking in errors-in-variables regression

Weixing Song

Department of Statistics, Kansas State University, Manhattan, KS 66502, USA

Received 28 January 2007

Available online 10 March 2008

Abstract

This paper discusses a class of minimum distance tests for fitting a parametric regression model to a

class of regression functions in the errors-in-variables model. These tests are based on certain minimized

distances between a nonparametric regression function estimator and a deconvolution kernel estimator of the

conditional expectation of the parametric model being fitted. The paper establishes the asymptotic normality

of the proposed test statistics under the null hypothesis and that of the corresponding minimum distance

estimators. We also prove the consistency of the proposed tests against a fixed alternative and obtain the

asymptotic distributions for general local alternatives. Simulation studies show that the testing procedures

are quite satisfactory in preserving the finite sample level and in terms of power.

© 2008 Elsevier Inc. All rights reserved.

AMS 1991 subject classifications: primary 62G08; secondary 62G10

Keywords: Errors-in-variables model; Deconvolution kernel estimator; Minimum distance; Lack-of-fit test

1. Introduction

In the errors-in-variables regression model of interest here, one observes (Z, Y) obeying the model

    Y = µ(X) + ε,    Z = X + u,    (1.1)

where X is the unobservable d-dimensional random design variable. The random variables (X, u, ε) are assumed to be mutually independent, with u being d-dimensional and ε being one-dimensional, having E(ε) = 0, E(u) = 0. The marginal densities of X and u will be denoted by f_X and f_u, respectively. For the sake of identifiability, the density f_u is assumed to be known. This is

E-mail address: weixing@ksu.edu.

0047-259X/$ - see front matter © 2008 Elsevier Inc. All rights reserved.

doi:10.1016/j.jmva.2008.02.034


a common and standard assumption in the literature on errors-in-variables regression models. The density f_X and the distribution of ε need not be known.

Errors-in-variables regression models have received continuing attention in the statistical

literature. For some literature reviews, see [17,1,16,2,4,13–15,6,7] and the references therein.

Most of the existing literature has focused on the estimation problem. The lack-of-fit testing

problem has not been discussed thoroughly. Only some sporadic results on this topic can be found

in the literature. See [16,5] for some informal lack-of-fit tests in the linear errors-in-variables

regression model. The problem of interest in this paper is to develop tests for the hypothesis

    H0 : µ(x) = θ0⊤ r(x) for some θ0 ∈ R^q,    versus    H1 : H0 is not true,    (1.2)

in the model (1.1).

Many interesting and profound results, on the other hand, are available for the above testing problem in the absence of errors in the independent variables, that is, for ordinary regression models. For instance, [10–12,18,21,23,22], among others, give such results.

For the errors-in-variables model, in the case of r(x) = x, Zhu, Cui and Ng [25] found a necessary and sufficient condition for the linearity of the conditional expectation E(ε|Z) with respect to Z. Based on this fact, they constructed a score-type lack-of-fit test. This test is of limited use, since the normality assumptions on the design variable and the measurement errors are rather restrictive. Zhu, Song and Cui [24] and Cheng and Kukush [8] independently extended the method of Zhu, Cui and Ng to a polynomial errors-in-variables model without assuming normality. The model checking problem for general r(x) was studied by Zhu and Cui [26]. After correcting for the bias of the conditional expectation, given Z, of the least squares residuals, they constructed a score-type test based on the modified residuals, but the theoretical arguments still require the density function of X to be known up to an unknown parameter. This restriction is removed in the current developments. The procedure of Cheng and Kukush [8] does not require the density function of X to be known, but it imposes very strict restrictions on the moments of the predictor and the measurement error, and it is computationally intensive.

The paper is organized as follows. Section 2 introduces the construction of the test. Section 3 states the needed assumptions and the main results; it also contains a multidimensional extension of Lemma A.1 in [20], as well as the consistency and the asymptotic power of the test against certain fixed alternatives and a class of local alternatives. Section 4 includes some results from finite sample simulation studies. The conclusion and some further discussion of the MD test are presented in Section 5. All the technical proofs are deferred to Section 6.

2. Construction of test

The approach to constructing tests here is to first recognize that the independence of X and ε and E(ε) = 0 imply that ν(z) = E(Y|Z = z) = E(µ(X)|Z = z). Thus one can consider the new regression model Y = ν(Z) + ζ, in which the error ζ is uncorrelated with Z and has mean 0. The problem of testing H0 is thereby transformed into a test of ν(z) = νθ0(z), where νθ(z) = θ⊤ E(r(X)|Z = z).

A very important question related to the above transformation is: are the two hypotheses, H10 : µ(x) = mθ(x) for all x, and H20 : ν(z) = νθ(z) for all z, equivalent? The answer is generally negative. Note, however, that E(m1(X)|Z = z) = E(m2(X)|Z = z) is equivalent to

    ∫ m1(x) f_X(x) f_u(z − x) dx = ∫ m2(x) f_X(x) f_u(z − x) dx    for all z;

hence, if f_u(z − ·), viewed as a distribution family with parameter z ∈ R^d, forms a complete family, then these two hypotheses are indeed equivalent. This is the case, for example, for double exponential and normal distributions.

For any z for which f_Z(z) > 0, we have ν(z) = ∫ µ(x) f_X(x) f_u(z − x) dx / f_Z(z). If f_X is known, then f_Z is known and hence νθ is known except for θ. Let Q(z) = E(r(X)|Z = z). Now suppose (Y_i, Z_i), i = 1, 2, ..., n, are independent and identically distributed copies of (Y, Z) from model (1.1), h is a bandwidth depending only on n, and K_hi(z) = K((z − Z_i)/h)/h^d for any kernel function K and bandwidth h. If we define

    T̄_n(θ) = ∫_I [ (1/(n f_Z(z))) Σ_{i=1}^n K_hi(z)(Y_i − θ⊤ Q(Z_i)) ]² dG(z),    θ ∈ R^q,

where G is a σ-finite measure on R^d and I is a compact subset of R^d, then one can see that T̄_n is indeed a weighted distance between a nonparametric kernel estimator and a parametric estimator of the regression function ν(z). Then we may use θ̄_n = argmin_{θ∈R^q} T̄_n(θ) to estimate θ, and construct the test statistic through T̄_n(θ̄_n). The same method, called the minimum distance (MD) procedure, was used in the recent paper of Koul and Ni [19] (K–N) in the classical regression setup. One can see that, if f_X were known, the above test procedure would be a trivial extension of K–N.

Unfortunately, f_X is generally not known, and hence f_Z and Q are unknown. This makes the above procedure infeasible. To construct the test statistic, one needs estimators for f_Z and Q. In this connection, the deconvolution kernel density estimators are found to be useful here.

For any density L on R^d, let φ_L denote its characteristic function and define

    L_h(x) = (2π)^{−d} ∫_{R^d} exp(−i t·x) φ_L(t)/φ_u(t/h) dt,    x ∈ R^d,

where i = (−1)^{1/2}. The function

    f̂_Xh(x) = (1/(n h^d)) Σ_{i=1}^n L_h((x − Z_i)/h)    (2.1)

is called a deconvolution kernel density estimator, and it can be used to estimate f_X; see Masry [9]. Note that Q(z) = R(z)/f_Z(z), where

    R(z) = ∫ r(x) f_X(x) f_u(z − x) dx,    f_Z(z) = ∫ f_X(x) f_u(z − x) dx.    (2.2)

Then one can estimate Q(z) by Q̂_n(z) = R̂_n(z)/f̂_Zh(z), where

    R̂_n(z) = ∫ r(x) f̂_Xh(x) f_u(z − x) dx,    f̂_Zh(z) = ∫ f̂_Xh(x) f_u(z − x) dx.

At this point, it is worth mentioning that, by the definition of L_h and a direct calculation, one can show that f̂_Zh is nothing but the classical kernel estimator of f_Z with kernel L and bandwidth h; that is, f̂_Zh(z) = Σ_{i=1}^n L((z − Z_i)/h)/(n h^d). Our proposed inference procedures will be based on the analogs of T̄_n in which Q(z) is replaced by the above estimator Q̂_n, and f_Z is replaced by a classical kernel estimator in which a kernel other than L may be adopted.

It is well known that the convergence rates of deconvolution kernel density estimators are slower than those of classical kernel density estimators; see [9,20] and [14]. This creates extra difficulty when considering the asymptotic behavior of the analogs of the corresponding MD estimators and test statistics. In fact, the consistency of the corresponding MD estimator is still available, but its asymptotic normality and that of the corresponding MD test statistic may not be obtainable. We overcome this difficulty by using different bandwidths and splitting the full sample, say S, of size n into two subsamples, S1 of size n1 and S2 of size n2, then using the subsample S2 to estimate f_X, and hence Q(z), and the subsample S1 to estimate the other quantities. The sample size allocation scheme is stated in Section 3; a more detailed discussion of this scheme can be found in Section 5. Without loss of generality, we number the observations in S1 from 1 to n1, and the observations in S2 from n1 + 1 to n. Also, all integration with respect to G in the following will be over the compact subset I.
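To make the deconvolution estimator (2.1) concrete, here is a minimal numerical sketch, assuming (as in the simulations of Section 4) a standard normal φ_L and a double exponential u with variance σ²_u; under these choices the integral defining L_h has the closed form L_h(v) = φ(v)[1 − (σ²_u/2)(v² − 1)/h²], with φ the standard normal density. The sample size, bandwidth and grid below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def deconv_kernel(v, h, sigma2_u):
    # closed-form L_h for standard normal phi_L and double exponential u:
    # L_h(v) = phi(v) * [1 - (sigma2_u / 2) * (v^2 - 1) / h^2]
    phi = np.exp(-v**2 / 2) / np.sqrt(2 * np.pi)
    return phi * (1 - 0.5 * sigma2_u * (v**2 - 1) / h**2)

def fX_hat(x, Z, h, sigma2_u):
    # deconvolution kernel density estimator (2.1)
    v = (x[:, None] - Z[None, :]) / h
    return deconv_kernel(v, h, sigma2_u).mean(axis=1) / h

rng = np.random.default_rng(0)
n, s2 = 2000, 0.01
X = rng.uniform(-1, 1, n)
u = rng.laplace(0.0, np.sqrt(s2 / 2), n)   # double exponential with variance s2
Z = X + u
grid = np.linspace(-0.5, 0.5, 5)
est = fX_hat(grid, Z, h=0.3, sigma2_u=s2)  # true f_X is 0.5 on [-1, 1]
```

Away from the boundary of the support, the estimates should be close to the true uniform density value 0.5.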

To be precise, let

    f̃_Zh2(z) = (1/n1) Σ_{i=1}^{n1} K_{h2,i}(z),
    f̂_Xw(x) = (1/(n2 w^d)) Σ_{j=n1+1}^{n} L_w((x − Z_j)/w),
    R̂_n2(z) = ∫ r(x) f̂_Xw1(x) f_u(z − x) dx,    f̂_Zw2(z) = ∫ f̂_Xw2(x) f_u(z − x) dx,
    Q̂_n2(z) = R̂_n2(z)/f̂_Zw2(z),

and then define, for θ ∈ R^q,

    M_n(θ) = ∫ [ (1/(n1 f̃_Zh2(z))) Σ_{i=1}^{n1} K_{h1,i}(z)(Y_i − θ⊤ Q̂_n2(Z_i)) ]² dG(z),    (2.3)

with h1, h2 depending on n1, and w1, w2 depending on n2. One can easily see that M_n(θ) is a weighted distance between a nonparametric kernel estimator and a deconvolution kernel estimator of the regression function ν(z) under the null hypothesis. Then we may use

    θ̂_n = arg inf_{θ∈R^q} M_n(θ)    (2.4)

to estimate θ, and construct the test statistic through M_n(θ̂_n). We first prove the consistency of θ̂_n for θ, and then the asymptotic normality of √n1 (θ̂_n − θ0). Finally, let


    ζ̂_i = Y_i − θ̂_n⊤ Q̂_n2(Z_i),    dψ̂_h2(z) := dG(z)/f̃²_Zh2(z),
    Ĉ_n = n1^{−2} Σ_{i=1}^{n1} ∫ K²_{h1,i}(z) ζ̂_i² dψ̂_h2(z),
    Γ̂_n = 2 h1^d n1^{−2} Σ_{i≠j=1}^{n1} [ ∫ K_{h1,i}(z) K_{h1,j}(z) ζ̂_i ζ̂_j dψ̂_h2(z) ]².

We prove that the asymptotic null distribution of the normalized test statistic

    D̂_n = n1 h1^{d/2} Γ̂_n^{−1/2} (M_n(θ̂_n) − Ĉ_n)    (2.5)

is standard normal. Consequently, the test that rejects H0 whenever |D̂_n| > z_{α/2} is of asymptotic size α, where z_α is the 100(1 − α)% percentile of the standard normal distribution.
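The quantities above can be assembled numerically. The following schematic sketch is our illustration, not the paper's code: it takes q = d = 1, lets G be Lebesgue measure on a grid over I, uses the Epanechnikov kernel for K, and treats Q̂_n2 and f̃_Zh2 as given inputs (here, for simplicity, the no-measurement-error case Q̂_n2(Z_i) = Z_i with a uniform design).

```python
import numpy as np

def md_test_statistic(Y, Z, Qhat, fZ_tilde, theta_hat, grid, h1):
    """Schematic computation of Mn, Cn, Gamma_n and the statistic Dn of (2.5)
    for q = d = 1, with dG = Lebesgue measure on the grid over I."""
    n1 = len(Y)
    dz = grid[1] - grid[0]
    v = (grid[:, None] - Z[None, :]) / h1
    W = 0.75 * np.clip(1 - v**2, 0.0, None) / h1      # W[k, i] = K_{h1,i}(grid[k])
    zeta = Y - theta_hat * Qhat                       # residuals zeta_i
    dpsi = dz / fZ_tilde**2                           # d psi_hat = dG / f~_Zh2^2
    Mn = np.sum(((W @ zeta) / (n1 * fZ_tilde))**2 * dz)
    Cn = np.sum((W**2 @ zeta**2) * dpsi) / n1**2
    A = ((W * dpsi[:, None]).T @ W) * np.outer(zeta, zeta)  # A[i,j] = zeta_i zeta_j * integral of K_i K_j dpsi
    Gn = 2 * h1 * (np.sum(A**2) - np.sum(np.diag(A)**2)) / n1**2
    return n1 * np.sqrt(h1) * (Mn - Cn) / np.sqrt(Gn)

rng = np.random.default_rng(2)
n1 = 200
Z = rng.uniform(-1, 1, n1)
Y = Z + rng.normal(0, 0.1, n1)                 # null model with theta_0 = 1
grid = np.linspace(-0.8, 0.8, 161)
Dn = md_test_statistic(Y, Z, Qhat=Z, fZ_tilde=np.full(161, 0.5),
                       theta_hat=1.0, grid=grid, h1=0.3)
```

Under the null, the self-normalized statistic is of order one, which is what the asymptotic normality of Theorem 3.2 formalizes.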

3. Assumptions and main results

This section first states the various conditions needed in the subsequent sections. About the

errors, the underlying design and the integrating σ-finite measure G, we assume the following:


(e1) The random variables {(Z_i, Y_i) : Z_i ∈ R^d, Y_i ∈ R, i = 1, 2, ..., n} are independent and identically distributed, with the conditional expectation ν(z) = E(Y|Z = z) satisfying ∫ ν² dG < ∞, where G is a σ-finite measure on R^d.
(e2) 0 < σ²_ε = Eε² < ∞, E‖r(X)‖² < ∞, where ‖·‖ denotes the usual Euclidean norm. The function δ²(z) = E[(θ0⊤ r(X) − θ0⊤ Q(Z))² | Z = z] is a.s. (G) continuous.
(e3) E|ε|^{2+δ} < ∞, E‖r(X)‖^{2+δ} < ∞, for some δ > 0.
(e4) E|ε|⁴ < ∞, E‖r(X)‖⁴ < ∞.
(u) The density f_u is continuous and ∫ |φ_u(t)| dt < ∞.
(f1) The density f_X and all of its first and second order derivatives are continuous and bounded.
(f2) For some δ0 > 0, the density f_Z is bounded below on the compact subset I_{δ0} of R^d, where I_{δ0} = {y ∈ R^d : max_{1≤j≤d} |y_j − z_j| ≤ δ0, y = (y_1, ..., y_d)⊤, z = (z_1, ..., z_d)⊤, z ∈ I}.
(g) G has a continuous Lebesgue density g.
(q) Σ0 = ∫ Q(z) Q⊤(z) dG(z) is positive definite.

About the null model we need to assume the following:

(m1) There exist a positive continuous function J(z) and a positive number T0 such that, for all t with ‖t‖ > T0,

    ‖t‖^{−α} | ∫ (r(z − x) − r(z)) exp(−i t⊤x) f_u(x) dx / φ_u(t) | ≤ J(z)

holds for some α ≥ 0 and all z ∈ R^d, and EJ²(Z) < ∞.
(m2) E‖r(Z)‖² < ∞, EI²(Z) < ∞, where I(z) = ∫ ‖r(x)‖ f_u(z − x) dx.

About the kernel functions, we assume:

(ℓ) The kernel function L is a density, symmetric around the origin, with sup_{t∈R^d} ‖t‖^α |φ_L(t)| < ∞; moreover, ∫ ‖v‖² L(v) dv < ∞ and ∫ ‖t‖^α |φ_L(t)| dt < ∞, with α as in (m1).

About the bandwidths and sample sizes we need to assume the following:

(n) With n denoting the full sample size, let n1, n2 be two positive integers such that n = n1 + n2 and n2 = [n1^b], b > 1 + (d + 2α)/4, where α is as in (m1).
(h1) h1 ∼ n1^{−a}, where 0 < a < min(1/(2d), 4/(d(d + 4))).
(h2) h2 = c1 (log(n1)/n1)^{1/(d+4)}.
(w1) w1 = n2^{−1/(d+4+2α)}.
(w2) w2 = c2 (log(n2)/n2)^{1/(d+4)}.

Assumption (m1) is not as strict as it appears. Some commonly used regression functions, such as polynomial and exponential functions, indeed satisfy this assumption, as shown below.

Example 1. Suppose d = q, r(x) = x, and u ∼ N_d(0, Σ_u). Then

    | ∫ (r(z − x) − r(z)) exp(−i t⊤x) f_u(x) dx / φ_u(t) |
        = | ∫ x exp(−i t⊤x) f_u(x) dx | · exp(t⊤Σ_u t/2)
        = | ∂φ_u(t)/(i ∂t) | · exp(t⊤Σ_u t/2) ≤ c‖t‖,

where the constant c depends only on Σ_u. Hence (m1) holds with α = 1 and J(z) = c.


Example 2. Suppose d = q = 1, r(x) = x², and u has a double exponential distribution with mean 0 and variance σ²_u. In this case, φ_u(t) = 1/(1 + σ²_u t²/2) and

    | ∫ (r(z − x) − r(z)) exp(−i t x) f_u(x) dx / φ_u(t) |
        = | ∫ (−2zx + x²) exp(−i t x) f_u(x) dx | / |φ_u(t)|
        ≤ 2|z| |∂φ_u(t)/(i ∂t)| / |φ_u(t)| + |∂²φ_u(t)/∂t²| / |φ_u(t)|
        ≤ 2|z| σ²_u |t| / (1 + σ²_u t²/2) + σ²_u / (1 + σ²_u t²/2) + 2σ⁴_u t² / (1 + σ²_u t²/2)².

Hence, as |t| → ∞, (m1) holds with α = 0 and J(z) = 2|z| + 2. One can easily verify that a similar result holds for r(x) = x^k, where k is any positive integer, and hence for r(x) a polynomial in x.
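The bound in Example 2 can be spot-checked numerically. The sketch below evaluates the left-hand side of (m1) by direct quadrature for one (z, t) pair with σ²_u = 1; the particular values of z, t and the integration grid are illustrative assumptions.

```python
import numpy as np

# double exponential (Laplace) density with mean 0, variance 1 (scale 1/sqrt(2))
b = 1 / np.sqrt(2)
x = np.linspace(-15, 15, 300001)
dx = x[1] - x[0]
f_u = np.exp(-np.abs(x) / b) / (2 * b)

z, t = 1.0, 50.0
# LHS of (m1) with r(x) = x^2 and alpha = 0:
# |integral of (r(z-x) - r(z)) e^{-itx} f_u(x) dx| / |phi_u(t)|,
# where phi_u(t) = 1/(1 + t^2/2)
integrand = (-2 * z * x + x**2) * np.exp(-1j * t * x) * f_u
integral = np.abs(np.sum(integrand) * dx)
lhs = integral * (1 + t**2 / 2)
bound = 2 * abs(z) + 2        # J(z) from Example 2
```

For large |t|, the left-hand side is in fact far below the bound J(z), consistent with the limit computation in Example 2.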

Example 3. Suppose d = q = 1, r(x) = e^x, and u ∼ N(0, σ²_u). Then

    | ∫ (r(z − x) − r(z)) exp(−i t x) f_u(x) dx | = | ∫ (e^{z−x} − e^z) exp(−i t x) f_u(x) dx |
        ≤ e^z ( | ∫ e^x e^{i t x} f_u(x) dx | + |φ_u(t)| ) ≤ c e^z |φ_u(t)|,

where c is some positive number depending only on σ²_u. Hence (m1) holds with α = 0 and J(z) = c e^z.

Next, we give some general preliminaries needed for the proofs below. The following lemma is a multidimensional extension of a result of Stefanski and Carroll [20], which will be used frequently in the sequel.

Lemma 3.1. Suppose d ≥ 1 and (f1), (u), (m1), (h1) hold. Then, for any z ∈ R^d,

    ‖E R̂_n2(z) − R(z)‖² ≤ c w1⁴ I²(z),
    E‖R̂_n2(z) − E R̂_n2(z)‖² ≤ (c/(n2 w1^d)) (J²(z) w1^{−2α} + ‖r(z)‖²),

where R(z) is as in (2.2), I(z) is as in (m2), J(z) is as in (m1), and c is a constant not depending on z, n2 and w1.

By the usual bias–variance decomposition of the mean squared error, the following inequality is a direct consequence of Lemma 3.1:

    E‖R̂_n2(z) − R(z)‖² ≤ c w1⁴ I²(z) + (c/(n2 w1^d)) (J²(z) w1^{−2α} + ‖r(z)‖²).

If the bandwidth w1 is chosen by assumption (w1), then

    E‖R̂_n2(z) − R(z)‖² ≤ c n2^{−4/(d+2α+4)} (I²(z) + J²(z) + ‖r(z)‖²).    (3.1)
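The rate in (3.1) follows by balancing the two bounds of Lemma 3.1: the squared-bias term of order w1⁴ and the variance term of order (n2 w1^{d+2α})^{−1} are of the same order exactly when

```latex
% balancing squared bias and variance orders in Lemma 3.1:
w_1^{4} \;=\; \frac{1}{n_2\, w_1^{\,d+2\alpha}}
\;\Longleftrightarrow\; w_1 = n_2^{-1/(d+2\alpha+4)}
\;\Longrightarrow\; w_1^{4} = n_2^{-4/(d+2\alpha+4)},
```

which is exactly the bandwidth choice made in assumption (w1).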


In the following, we will write

    T(z) = I²(z) + J²(z) + ‖r(z)‖².    (3.2)

The following lemma will be used repeatedly; it, along with its proof, appears as Theorem 2.2, part (2), in [3]. We state the lemma for a sample size n and a bandwidth h; they may be replaced by n1 or n2, and h2 or w2, according to the context.

Lemma 3.2. Let f̂_Z be the kernel estimator of f_Z with a kernel K which satisfies a Lipschitz condition and has bandwidth h. If f_Z is twice continuously differentiable and the bandwidth h is chosen to be c_n (log(n)/n)^{1/(d+4)}, where c_n → c > 0, then

    (log^k n)^{−1} (n/log(n))^{2/(d+4)} sup_{z∈I} |f̂_Z(z) − f_Z(z)| → 0    a.s.

for any positive integer k and compact set I.

Recall the definitions in (2.3). Because the null model is linear in θ, the minimizer θ̂_n has an explicit form, obtained by setting the derivative of M_n(θ) with respect to θ equal to 0, which gives the equation

    ∫ [ (1/n1) Σ_{i=1}^{n1} K_{h1,i}(z) Q̂_n2(Z_i) ] · [ (1/n1) Σ_{i=1}^{n1} K_{h1,i}(z) Q̂_n2⊤(Z_i) ] dψ̂_h2(z) · θ̂_n
        = ∫ [ (1/n1) Σ_{i=1}^{n1} K_{h1,i}(z) Y_i ] · [ (1/n1) Σ_{i=1}^{n1} K_{h1,i}(z) Q̂_n2(Z_i) ] dψ̂_h2(z).    (3.3)

Adding and subtracting θ0⊤ Q̂_n2(Z_i) from Y_i, and doing some routine rearrangement, θ̂_n satisfies the following equation:

    ∫ [ (1/n1) Σ_{i=1}^{n1} K_{h1,i}(z) Q̂_n2(Z_i) ] · [ (1/n1) Σ_{i=1}^{n1} K_{h1,i}(z) Q̂_n2⊤(Z_i) ] dψ̂_h2(z) · (θ̂_n − θ0)
        = ∫ [ (1/n1) Σ_{i=1}^{n1} K_{h1,i}(z)(Y_i − θ0⊤ Q̂_n2(Z_i)) ] · [ (1/n1) Σ_{i=1}^{n1} K_{h1,i}(z) Q̂_n2(Z_i) ] dψ̂_h2(z).    (3.4)

The above explicit relation between θ̂_n − θ0 and the other quantities allows us, in contrast to K–N, to investigate the asymptotic distribution of θ̂_n without proving its consistency in advance. Most importantly, the separation of θ̂_n from R̂_n2(z) makes a conditional expectation argument in the subsequent proofs relatively easy.

The asymptotic distributions of θ̂_n and M_n(θ̂_n) under the null hypothesis are summarized in the following theorems.
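Since (3.3) is linear in θ, θ̂_n can be computed by solving a single q × q linear system once the kernel weights are tabulated on a grid over I. A minimal sketch (our illustration only: d = 1, G taken as Lebesgue measure on the grid, the Epanechnikov kernel for K, and Q̂_n2 supplied as an input matrix; the demonstration uses the no-measurement-error case Q̂_n2(Z_i) = Z_i):

```python
import numpy as np

def md_estimator(Y, Z, Qhat, fZ_tilde, grid, h1):
    """Solve the normal equations (3.3) for theta_hat; Qhat has shape (n1, q)."""
    n1, q = Qhat.shape
    v = (grid[:, None] - Z[None, :]) / h1
    W = 0.75 * np.clip(1 - v**2, 0.0, None) / h1     # K_{h1,i}(z_k), d = 1
    dpsi = (grid[1] - grid[0]) / fZ_tilde**2         # d psi_hat = dG / f~_Zh2^2
    a = W @ Qhat / n1          # a[k, :] = (1/n1) sum_i K_{h1,i}(z_k) Qhat(Z_i)
    b = W @ Y / n1             # b[k]    = (1/n1) sum_i K_{h1,i}(z_k) Y_i
    lhs = (a * dpsi[:, None]).T @ a                  # integral of a a' dpsi
    rhs = (a * dpsi[:, None]).T @ b                  # integral of a b  dpsi
    return np.linalg.solve(lhs, rhs)

rng = np.random.default_rng(3)
n1 = 400
Z = rng.uniform(-1, 1, n1)
Y = 1.0 * Z + rng.normal(0, 0.1, n1)    # null model, theta_0 = 1, no measurement error
grid = np.linspace(-0.8, 0.8, 81)
theta = md_estimator(Y, Z, Qhat=Z[:, None], fZ_tilde=np.full(81, 0.5),
                     grid=grid, h1=0.3)
```

In this simplified setting the solution should recover the true slope θ0 = 1 up to sampling error.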

Theorem 3.1. Suppose H0 and the conditions (e1), (e2), (e3), (u), (f1), (f2), (q), (m1), (m2), (ℓ), (n), (h1), (h2), (w1) and (w2) hold. Then √n1 (θ̂_n − θ0) ⇒ N_d(0, Σ0^{−1} Σ Σ0^{−1}), where Σ0 is as in condition (q),

    Σ = ∫ τ²(z) Q(z) Q⊤(z) g²(z)/f_Z(z) dz,

and τ²(z) = σ²_ε + δ²(z), where σ²_ε and δ²(z) are as in (e2).


Theorem 3.2. Suppose H0 and the conditions (e1), (e2), (e4), (u), (f1), (f2), (q), (m1), (m2), (ℓ), (n), (h1), (h2), (w1) and (w2) hold. Then D̂_n ⇒ N(0, 1), where D̂_n is as in (2.5).

We end this section with some remarks. First, the MD estimator and testing procedure depend on the choice of the integrating measure G. In the classical regression case, K–N provide some guidelines on how to choose G, and the same guidelines apply here. For example, in the one-dimensional case, the asymptotic variance of √n (θ̂_n − θ0) attains its minimum if G is chosen to be f̂_Zh2(z). As far as the MD test statistic M_n(θ̂_n) is concerned, the choice of G depends on the alternatives. In the classical regression case, K–N found that the test has high power against the selected alternatives if the density of G is chosen to be the square of the density estimator of the design variables; the same phenomenon occurs in our case. Secondly, replacing Γ̂_n in (2.5) by any other consistent estimator of Γ does not affect the validity of Theorem 3.2, where

    Γ = 2 ∫ (σ²_e(z))² g(z) dψ(z) · ∫ [ ∫ K(u) K(u + v) du ]² dv,    (3.5)

and σ²_e(z) = σ²_ε + δ²(z), with δ²(z) as in condition (e2). So we can choose some other consistent estimator of Γ, such as

    Γ̄_n = C ∫ [ (1/(n1 f̂_Zh2(z))) Σ_{i=1}^{n1} K_{h1,i}(z)(Y_i − θ̂_n⊤ Q̂_n2(Z_i))² ]² g(z) dψ̂_h2(z),    (3.6)

to make the test computationally efficient, where the constant C = 2 ∫ [ ∫ K(u) K(u + v) du ]² dv.

Finally, we present some theoretical results on the asymptotic power of the proposed tests. Let m(x) be a Borel measurable real-valued function of x ∈ R^d, and let H(z) = E(m(X)|Z = z) be such that H(z) ∈ L²(G). We will show that the MD estimator defined by (2.4) converges to some finite constant in probability; based on this result, one can show the consistency of the MD test against certain fixed alternatives. In fact, we have:

Theorem 3.3. Suppose the conditions of Theorem 3.2 hold, and the alternative hypothesis Ha : µ(x) = m(x), for all x, holds with the additional assumption that inf_θ ∫ [H(z) − θ⊤ Q(z)]² dG(z) > 0. Then, for the MD estimator θ̂_n defined in (2.4), |D̂_n| → ∞ in probability.

Now we consider the asymptotic power of the proposed MD tests against the following local alternatives:

    H_na : µ(x) = θ0⊤ r(x) + γ_n v(x),    γ_n = 1/√(n1 h1^{d/2}),    (3.7)

where v(x) is an arbitrary known continuous real-valued function with V(z) = E(v(X)|Z = z) ∈ L²(G). The following theorem gives the asymptotic distribution of the MD test statistic under the local alternative (3.7). This enables us to investigate the asymptotic local power of the MD test.

Theorem 3.4. Suppose the conditions of Theorem 3.2 hold. Then, under the local alternative (3.7), D̂_n →_d N(Γ^{−1/2} D, 1), where

    D = ∫ V²(z) dG(z) + ∫ V(z) Q⊤(z) dG(z) · ∫ Q(z) Q⊤(z) dG(z) · ∫ V(z) Q(z) dG(z),

and Γ is as in (3.5).


4. Monte Carlo simulation

This section contains the results of four simulations corresponding to the following cases. Case 1: d = q = 1 and mθ linear; the error ε is chosen to be normal and the measurement error u double exponential. Case 2: d = q = 1 and mθ linear; ε and u are both chosen to be normal. Case 3: d = 1, q = 2 and mθ a polynomial; ε is chosen to be normal and u double exponential. Case 4: d = q = 2 and mθ linear; ε is chosen to be normal and u double exponential. It is easy to check that the models being simulated below satisfy all the conditions stated in Section 3. In each case, the Monte Carlo average of θ̂_n, MSE(θ̂_n), and the empirical levels and powers of the MD test are reported. The asymptotic level is taken to be 0.05 in all cases. For any random variable W, we will use {W_{j,kj}}_{kj=1}^{nj}, j = 1, 2, to denote the jth subsample S_j from W with sample size n_j, so that the full sample is S1 ∪ S2. Finally, to make the simulation less time consuming, Γ̄_n defined in (3.6) will be used in the test statistic instead of Γ̂_n; so the value of the test statistic is calculated as D̃_n = n1 h1^{1/2} Γ̄_n^{−1/2} (M_n(θ̂_n) − Ĉ_n).

Case 1. In this case, {X_{j,kj}} are obtained as a random sample from the uniform distribution on [−1, 1], {ε_{j,kj}} from the normal distribution N(0, 0.1²), and {u_{j,kj}} from the double exponential distribution with mean 0 and variance 0.01. The parametric model is taken to be mθ(X) = θX, and the true parameter is θ0 = 1. Then (Y, Z) are generated using the model Y_{j,kj} = X_{j,kj} + ε_{j,kj}, Z_{j,kj} = X_{j,kj} + u_{j,kj}, kj = 1, 2, ..., nj, j = 1, 2. From Example 2, we know that assumption (m1) holds with α = 0. The kernel functions K and K* and the bandwidths used in all the simulations are

    K(z) = K*(z) = (3/4)(1 − z²) I(|z| ≤ 1),    h1 = a n1^{−1/3},    h2 = b n1^{−1/5} (log n1)^{1/5},    (4.1)

with some choices for a and b. For the chosen kernel function (4.1), the constant C in Γ̄_n is equal to 0.7642. The kernel function used in (2.1) is chosen to be the standard normal, so that the deconvolution kernel function with bandwidth w takes the form L_w(x) = exp(−x²/2)[1 − 0.005(x² − 1)/w²]/√(2π), and the bandwidths w1 = n2^{−1/5} and w2 = (log(n2)/n2)^{1/5} are chosen by assumptions (w1) and (w2). Correspondingly, Q̂_n2(z) = R̂_n2(z)/f̃_Zw2(z), where

    R̂_n2(z) = ∫ x f̂_Xw1(x) f_u(z − x) dx,    f̃_Zw2(z) = ∫ f̂_Xw2(x) f_u(z − x) dx.

Table 4.1 reports the Monte Carlo mean and MSE(θ̂_n) under H0 for the sample sizes (n1, n2) = (50, 134), (100, 317), (200, 753), (300, 1250), (500, 2366), each repeated 1000 times. One can see that there appears to be a small bias in θ̂_n for all chosen sample sizes and, as expected, the MSE decreases as the sample size increases.

Table 4.1
Mean and MSE of θ̂_n

(n1, n2)   (50, 134)   (100, 317)   (200, 753)   (300, 1250)   (500, 2366)
Mean       1.0103      1.0095       1.0102       1.0105        1.0098
MSE        0.0014      0.0007       0.0004       0.0003        0.0002
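The Case 1 data-generating mechanism and the sample split can be sketched as follows (the attenuation computation at the end is our own illustration of why a naive fit of Y on Z is biased, not part of the paper's procedure):

```python
import numpy as np

rng = np.random.default_rng(1)

def case1_sample(n1, n2):
    # Case 1: X ~ U[-1, 1], eps ~ N(0, 0.1^2), u double exponential with variance 0.01
    n = n1 + n2
    X = rng.uniform(-1, 1, n)
    eps = rng.normal(0.0, 0.1, n)
    u = rng.laplace(0.0, np.sqrt(0.01 / 2), n)   # Laplace(0, b) has variance 2 b^2
    Y, Z = X + eps, X + u
    return (Y[:n1], Z[:n1]), (Y[n1:], Z[n1:])    # subsamples S1 and S2

(Y1, Z1), (Y2, Z2) = case1_sample(100, 317)

# naive least squares of Y on Z is attenuated toward 0:
# slope -> Var(X)/(Var(X) + Var(u)) = (1/3)/(1/3 + 0.01) ~ 0.971
(Yb, Zb), _ = case1_sample(200000, 1)
naive = np.cov(Yb, Zb)[0, 1] / np.var(Zb)
```

Even with the small measurement-error variance 0.01 used here, the naive slope is visibly below the true θ0 = 1, which is the bias the deconvolution-based procedure avoids.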


Table 4.2
Levels and powers of the minimum distance test

Model     (a, b)       (50, 134)   (100, 317)   (200, 753)   (300, 1250)   (500, 2366)
Model 0   (0.3, 0.5)   0.003       0.008        0.009        0.020         0.041
          (0.3, 0.8)   0.008       0.014        0.017        0.031         0.053
          (0.5, 0.5)   0.010       0.011        0.020        0.030         0.049
          (1.0, 0.8)   0.024       0.028        0.026        0.039         0.050
          (1.0, 1.0)   0.028       0.037        0.030        0.048         0.054
Model 1   (0.3, 0.5)   0.407       0.865        0.987        0.997         1.000
          (0.3, 0.8)   0.491       0.888        0.990        0.998         1.000
          (0.5, 0.5)   0.704       0.975        0.999        1.000         1.000
          (1.0, 0.8)   0.921       0.999        1.000        1.000         1.000
          (1.0, 1.0)   0.926       0.997        1.000        1.000         1.000
Model 2   (0.3, 0.5)   0.898       0.972        0.999        0.999         1.000
          (0.3, 0.8)   0.919       0.976        0.999        0.999         1.000
          (0.5, 0.5)   0.985       0.999        0.999        1.000         1.000
          (1.0, 0.8)   0.999       1.000        1.000        1.000         1.000
          (1.0, 1.0)   0.999       1.000        1.000        1.000         1.000
Model 3   (0.3, 0.5)   0.774       0.959        0.993        0.998         1.000
          (0.3, 0.8)   0.807       0.964        0.993        0.998         1.000
          (0.5, 0.5)   0.933       0.966        0.999        1.000         1.000
          (1.0, 0.8)   0.992       1.000        1.000        1.000         1.000
          (1.0, 1.0)   0.988       1.000        1.000        1.000         1.000

To assess the level and power behavior of the D̃_n test, we chose the following four models for simulation:

Model 0: Y = X + ε,
Model 1: Y = X + 0.3X² + ε,
Model 2: Y = X + 1.4 exp(−0.2X²) + ε,
Model 3: Y = X I(X ≥ 0.2) + ε.

To assess the effect of the choice of (a, b) appearing in the bandwidths on the level and power, we ran the simulations for numerous choices of (a, b), ranging from 0.3 to 1. Table 4.2 reports the simulation results pertaining to D̃_n for several choices of (a, b); the simulation results for the other choices were similar to those reported here. Data from Model 0 in this table are used to study the empirical sizes, and Models 1 to 3 are used to study the empirical powers of the test. These entities are obtained by computing #{|D̃_n| ≥ 1.96}/1000.

From Table 4.2, one sees that the empirical level is sensitive to the choice of (a, b) for moderate sample sizes (n1 ≤ 200), but gets closer to the asymptotic level of 0.05 as the sample size increases, and hence is stable over the chosen values of (a, b) for large sample sizes. On the other hand, the empirical power appears to be far less sensitive to the values of (a, b) for sample sizes of 100 and more. Even though the theory of the present paper is not applicable to Model 3, it was included here to see the effect of a discontinuity in the regression function on the power of the minimum distance test. In our simulations, the discontinuity of the regression function has little effect on the power of the minimum distance test.

We also conducted a simulation in which the predictor X follows a normal distribution. The results are similar to those reported above and hence are omitted.


Table 4.3
Mean and MSE of θ̂_n

(n1, n2)   (50, 941)   (100, 3164)   (200, 10643)   (300, 21638)   (500, 52902)
Mean       1.0051      1.0078        1.0085         1.0101         1.0169
MSE        0.0013      0.0007        0.0004         0.0003         0.0004

Table 4.4
Levels and powers of the minimum distance test

Model     (50, 941)   (100, 3164)   (200, 10643)   (300, 21638)   (500, 52902)
Model 0   0.018       0.022         0.029          0.035          0.049
Model 1   0.918       0.999         1.000          1.000          1.000
Model 2   0.999       1.000         1.000          1.000          1.000
Model 3   0.993       1.000         1.000          1.000          1.000

Case 2. The measurement error u in this case has the normal distribution N(0, 0.1²), X is generated from the uniform distribution U[−1, 1], and ε ∼ N(0, 0.1²). By Example 1 in Section 3, assumption (m1) is satisfied with α = 1. Hence, by the sample allocation scheme (n), the sample sizes satisfy n2 = [n1^b], b > 7/4; in the simulation, we choose b = 7/4 + 0.0001. The bandwidths are chosen to be

    h1 = n1^{−1/3},    h2 = (log(n1)/n1)^{1/5},    w1 = n2^{−1/7},    w2 = (log(n2)/n2)^{1/5},

according to assumptions (h1), (h2), (w1) and (w2). The kernel functions K, K* are the same as in the first case, while the density L has the Fourier transform φ_L(t) = max{(1 − t²)³, 0}; the corresponding deconvolution kernel function then takes the form

    L_w(x) = (1/π) ∫_0^1 cos(tx)(1 − t²)³ exp(0.005 t²/w²) dt.
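The integral defining L_w above has no convenient closed form, but it is easy to evaluate by quadrature; a minimal sketch (the quadrature grid size and the evaluation points are illustrative assumptions):

```python
import numpy as np

def L_w(x, w, sigma2_u=0.01):
    # L_w(x) = (1/pi) * integral over [0, 1] of
    #          cos(t x) (1 - t^2)^3 exp(sigma2_u t^2 / (2 w^2)) dt,
    # evaluated with a trapezoidal rule; 0.005 = sigma2_u / 2 for sigma2_u = 0.01
    t = np.linspace(0.0, 1.0, 2001)
    dt = t[1] - t[0]
    g = np.cos(t * x) * (1 - t**2)**3 * np.exp(0.5 * sigma2_u * t**2 / w**2)
    return (g[0] / 2 + g[1:-1].sum() + g[-1] / 2) * dt / np.pi

vals = [L_w(x, w=0.5) for x in (-1.0, 0.0, 1.0)]
```

The kernel is symmetric in x (the integrand depends on x only through cos(tx)), and at x = 0 it reduces to (1/π) times the integral of (1 − t²)³ exp(0.005t²/w²), roughly 0.15 for w = 0.5.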

Table 4.3 reports the Monte Carlo mean and the MSE of the MD estimator θ̂_n under H0. One can see that there appears to be a small bias in θ̂_n for all chosen sample sizes and, as expected, the MSE decreases as the sample size increases.

To assess the level and power behavior of the D̃_n test, we chose the following four models for simulation:

Model 0: Y = X + ε,
Model 1: Y = X + 0.3X² + ε,
Model 2: Y = X + 1.4 exp(−0.2X²) + ε,
Model 3: Y = X I(X ≥ 0.2) + ε.

Table 4.4 reports the simulation results pertaining to D̃_n. Data from Model 0 in this table are used to study the empirical sizes, and Models 1 to 3 are used to study the empirical powers of the test.

Case 3. This simulation considers the case of d = 1, q = 2. Everything here is the same as in Case 1, except that the null model to be tested is mθ(X) = θ1 X + θ2 X². The true parameters are


Table 4.5
Mean and MSE of θ̂_n

(n1, n2)        (50, 134)   (100, 317)   (200, 753)   (300, 1250)   (500, 2366)
Mean of θ̂_n1   1.0169      1.0144       1.0139       1.0136        1.0128
MSE of θ̂_n1    0.0058      0.0031       0.0015       0.0011        0.0007
Mean of θ̂_n2   2.0450      2.0452       2.0463       2.0493        2.0473
MSE of θ̂_n2    0.0124      0.0076       0.0046       0.0042        0.0033

Table 4.6
Levels and powers of the minimum distance test

Model     (50, 134)   (100, 317)   (200, 753)   (300, 1250)   (500, 2366)
Model 0   0.001       0.009        0.019        0.029         0.046
Model 1   0.297       0.815        0.999        1.000         1.000
Model 2   0.528       0.965        0.999        1.000         1.000
Model 3   0.996       0.999        1.000        1.000         1.000

θ1 = 1, θ2 = 2. It is easy to see that R̂_n2(z) takes the form

    R̂_n2(z) = ( ∫ x f̂_Xw1(x) f_u(z − x) dx,  ∫ x² f̂_Xw1(x) f_u(z − x) dx )⊤.

Table 4.5 reports the Monte Carlo mean and the MSE of the MD estimator θ̂_n = (θ̂_n1, θ̂_n2) under H0. One can see that there appears to be a small bias in θ̂_n for all chosen sample sizes and, as expected, the MSE decreases as the sample size increases.

To assess the level and power behavior of the D̃_n test, we chose the following four models to simulate data from:

Model 0: Y = X + 2X² + ε,
Model 1: Y = X + 2X² + 0.3X³ + 0.1 + ε,
Model 2: Y = X + 2X² + 1.4 exp(−0.2X²) + ε,
Model 3: Y = X + 2X² sin(X) + ε.

Table 4.6 reports the simulation results pertaining to D̃_n. Data from Model 0 in this table are used to study the empirical sizes, and Models 1 to 3 are used to study the empirical powers of the test.

Case 4. This simulation considers the case of d = 2, q = 2. The null model we want to test is mθ(X) = θ1 X1 + θ2 X2. X1 and X2 are both generated from the uniform distribution U[−1, 1], ε ∼ N(0, 0.1²), and the measurement error is generated from the double exponential distribution with mean 0 and variance 0.01. The true parameters are θ1 = 1, θ2 = 2. The kernel functions K and K* and the bandwidths used in the simulation are

    K(z1, z2) = K*(z1, z2) = (9/16)(1 − z1²)(1 − z2²) I(|z1| ≤ 1, |z2| ≤ 1),    h1 = n1^{−1/5},    h2 = n1^{−1/6} (log n1)^{1/6}.    (4.2)

For the chosen kernel function (4.2), the constant C in Γ̄_n is equal to 0.292. The kernel function used in (2.1) is chosen to be the bivariate standard normal,


Table 4.7
Mean and MSE of θ̂_n

(n1, n2)        (50, 354)   (100, 1001)   (200, 2830)   (300, 5200)   (500, 11188)
Mean of θ̂_n1   1.0099      1.0120        1.0115        1.0094        1.0113
MSE of θ̂_n1    0.0042      0.0019        0.0011        0.0008        0.0005
Mean of θ̂_n2   2.0202      2.0220        2.0213        2.0225        2.0209
MSE of θ̂_n2    0.0042      0.0027        0.0014        0.0011        0.0008

Table 4.8
Levels and powers of the minimum distance test

Model     (50, 354)   (100, 1001)   (200, 2830)   (300, 5200)   (500, 11188)
Model 0   0.002       0.012         0.018         0.016         0.038
Model 1   0.908       0.998         1.000         1.000         1.000
Model 2   0.992       0.999         1.000         1.000         1.000
Model 3   0.935       0.996         1.000         1.000         1.000

so the deconvolution kernel function with bandwidth w takes the form

L_w(x) = (1/(2π)) exp(−(x_1^2 + x_2^2)/2) [1 − 0.005(x_1^2 − 1)/w^2] [1 − 0.005(x_2^2 − 1)/w^2].

Since (m1) holds for α = 0, the bandwidths w_1 = n_2^{−1/6} and w_2 = (log(n_2)/n_2)^{1/6} are chosen by assumptions (w1) and (w2). According to assumption (n) we take n_2 = n_1^{1.5001}.

Table 4.7 reports the Monte Carlo mean and the MSE of the MD estimator θ̂_n = (θ̂_{n1}, θ̂_{n2}) under H_0. One can see that there appears to be a small bias in θ̂_n for all chosen sample sizes and, as expected, the MSE decreases as the sample size increases.

To assess the level and power behavior of the D̂_n test, we chose the following four models to simulate data from.

Model 0: Y = X_1 + 2X_2 + ε,
Model 1: Y = X_1 + 2X_2 + 0.3 X_1 X_2 + 0.9 + ε,
Model 2: Y = X_1 + 2X_2 + 1.4(exp(−0.2 X_1) − exp(0.7 X_2)) + ε,
Model 3: Y = X_1 I(X_2 ≥ 0.2) + ε.

Table 4.8 reports the simulation results pertaining to D̂_n. Data from Model 0 in this table are used to study the empirical sizes, and data from Models 1 to 3 are used to study the empirical powers of the test.
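The data-generating mechanism of Case 4 (Models 0–3 with uniform design, normal regression error, and double exponential measurement error) can be reproduced directly. The following is a minimal sketch; the function name and the seeding are ours, not from the paper:

```python
import numpy as np

def simulate_case4(n, model=0, rng=None):
    # Case 4 design: X1, X2 ~ U[-1,1], eps ~ N(0, 0.1^2),
    # measurement error u ~ double exponential with mean 0, variance 0.01.
    rng = np.random.default_rng(rng)
    X = rng.uniform(-1.0, 1.0, size=(n, 2))
    eps = rng.normal(0.0, 0.1, size=n)
    # Laplace(0, b) has variance 2*b^2; choose b so the variance is 0.01.
    b = np.sqrt(0.01 / 2.0)
    U = rng.laplace(0.0, b, size=(n, 2))
    if model == 0:
        Y = X[:, 0] + 2 * X[:, 1] + eps
    elif model == 1:
        Y = X[:, 0] + 2 * X[:, 1] + 0.3 * X[:, 0] * X[:, 1] + 0.9 + eps
    elif model == 2:
        Y = X[:, 0] + 2 * X[:, 1] + 1.4 * (np.exp(-0.2 * X[:, 0]) - np.exp(0.7 * X[:, 1])) + eps
    else:
        Y = X[:, 0] * (X[:, 1] >= 0.2) + eps
    Z = X + U  # only (Z, Y) are observed
    return Z, Y
```

Model 0 generates data under the null hypothesis (empirical size); Models 1–3 generate data under the alternatives (empirical power).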

5. Conclusion and discussion

For the general linear errors-in-variables model, this paper proposes an MD test procedure, based on the minimum distance idea and exploiting the nature of the deconvolution density estimator, to check whether the regression function takes a parametric form. As a byproduct, the MD estimator of the regression parameters is also derived. The asymptotic normality of the proposed test statistics under the null hypothesis and that of the corresponding minimum distance estimators are fully discussed. We also prove the consistency of the proposed tests against a


Table 5.1
Double exponential, independent sample with n_1 = n_2

Model     (50, 50)   (100, 100)   (200, 200)   (300, 300)   (500, 500)
Model 0   0.008      0.036        0.033        0.038        0.049
Model 1   0.938      1.000        1.000        1.000        1.000
Model 2   1.000      1.000        1.000        1.000        1.000
Model 3   0.990      1.000        1.000        1.000        1.000

Table 5.2
Double exponential, same sample

Model     50      100     200     300     500
Model 0   0.015   0.024   0.036   0.043   0.047
Model 1   0.934   1.000   1.000   1.000   1.000
Model 2   0.999   1.000   1.000   1.000   1.000
Model 3   0.991   1.000   1.000   1.000   1.000

fixed alternative and obtain the asymptotic power against a class of local alternatives orthogonal to the parametric model being fitted. The significant contribution made in this research is the removal of the common assumption in the existing literature that the density function of the design variable is known, or known up to some unknown parameters. The price we paid in removing such a restrictive assumption is mainly the slow rate of the test procedure, due to the sample size allocation assumption (n).

The simulation studies show that the proposed testing procedures are quite satisfactory in the preservation of the finite sample level and in terms of a power comparison. But in the proof of the above theorems, we need the sample size allocation assumption (n) to ensure that the estimator Q̂_{n2}(z) has a faster convergence rate. The assumption (n) plays a very important role in the theoretical argument, but it loses attraction to a practitioner. For example, in the simulation Case 1, where the measurement error follows a double exponential distribution, the sample size allocation is n_2 = [n_1^b] with b = 1.2501, so n_2 in the second subsample S_2 increases at a power rate of the sample size n_1 in the first subsample. If n_1 = 500, then n_2 is at least 2365, and the sample size of the full sample is 2865, which is perhaps not easily available in practice. The situation becomes even worse when the measurement error is super-smooth or d > 1. For example, in Case 2, where the measurement error has a normal distribution, n_2 is at least 52902 if n_1 = 500; in Case 4, with d = 2, n_2 is at least 11188 if n_1 = 500.

Then an interesting question arises. What is the small sample behavior of the test procedure if (1) n_1 = n_2 and the two subsamples S_1 and S_2 are independent, or (2) n = n_1 = n_2 and we do not split the sample at all? We have no theory at this point about the asymptotic behavior of M_n(θ̂_n). For d = 1, we only conduct some Monte Carlo simulations here to see the performance of the test procedure (see Tables 5.1–5.4). The simulation results about the levels and powers of the MD test appear in the following tables, in which the measurement error follows the same double exponential and normal distributions as in the previous section, and the null and alternative models are the same as in Case 1.

To our surprise, the simulation results for the first three cases, in which d = 1, are very good. There are almost no differences between the simulation results based on our theory and the simulation results obtained by just neglecting the theory. In Case 4, with d = 2, we only conduct the simulation for S_1 = S_2; see Table 5.5. The test procedure is conservative for small sample sizes,
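The sample size allocation n_2 = [n_1^b] of assumption (n) is easy to tabulate. A minimal sketch; the rounding-up convention is our reading of the bracket notation, inferred from the (n_1, n_2) column pairs of Tables 4.7–4.8, and is not stated explicitly in the text:

```python
import math

def n2_allocation(n1, b):
    # Second-subsample size from assumption (n): n2 = [n1^b].
    # Rounding up reproduces the (n1, n2) pairs in Tables 4.7-4.8;
    # the exact bracket convention is an assumption on our part.
    return math.ceil(n1 ** b)
```

With b = 1.5001 (Case 4) this reproduces the table headers (50, 354), ..., (500, 11188); with b = 1.2501 (Case 1), n_1 = 500 gives roughly 2366, consistent with the "at least 2365" figure above.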


Table 5.3
Normal, independent sample with n_1 = n_2

Model     (50, 50)   (100, 100)   (200, 200)   (300, 300)   (500, 500)
Model 0   0.013      0.023        0.027        0.035        0.047
Model 1   0.931      0.999        1.000        1.000        1.000
Model 2   1.000      1.000        1.000        1.000        1.000
Model 3   0.984      1.000        1.000        1.000        1.000

Table 5.4
Normal, same sample

Model     50      100     200     300     500
Model 0   0.017   0.019   0.036   0.036   0.051
Model 1   0.954   0.998   1.000   1.000   1.000
Model 2   0.999   1.000   1.000   1.000   1.000
Model 3   0.992   1.000   1.000   1.000   1.000

Table 5.5
Double exponential, same sample, d = 2

Model     50      100     200     300     500
Model 0   0.000   0.004   0.010   0.018   0.041
Model 1   0.628   0.996   1.000   1.000   1.000
Model 2   0.994   0.999   1.000   1.000   1.000
Model 3   0.844   0.998   1.000   1.000   1.000

but the empirical level is close to the nominal level 0.05 when the sample size reaches 500. This phenomenon suggests that, by relaxing some conditions, such as (n), or even the assumptions on the choices of the bandwidths, Theorems 3.1 and 3.2 may still be valid.
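The empirical levels and powers in Tables 5.1–5.5 are rejection frequencies over Monte Carlo replications. A minimal sketch of that bookkeeping, with placeholder test and data-generating functions (the actual D̂_n statistic is not reproduced here):

```python
import numpy as np

def rejection_rate(test_fn, data_fn, n_rep=1000, rng=None):
    """Estimate a test's rejection probability by Monte Carlo.

    test_fn(Z, Y) -> bool (reject or not) and data_fn(rng) -> (Z, Y) are
    placeholders standing in for the MD test and a data-generating model.
    """
    rng = np.random.default_rng(rng)
    rejections = sum(bool(test_fn(*data_fn(rng))) for _ in range(n_rep))
    return rejections / n_rep
```

Feeding Model 0 data gives the empirical size; feeding Models 1–3 gives the empirical powers reported in the tables.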

If the null model is polynomial, then the test procedures proposed by Zhu et al. [24] and Cheng and Kukush [8] are more powerful than the MD test constructed in this paper, and hence should be recommended. To illustrate this point, we conduct a small simulation study to compare the performance of Zhu et al.'s score type test, the Cheng and Kukush [8] exponential weighted test, and the MD test. Because of the above-mentioned phenomenon, the simulation for the MD test is done by using the same sample, i.e. without sample splitting. The model being simulated is Y = θ_1 X + θ_2 X^2 + c X^3 + ε, Z = X + u, where X ∼ N(0, 1), ε ∼ N(0, 1) and u ∼ N(0, 0.2^2); the null model corresponds to c = 0, and the alternative models correspond to c = 0.3, 0.5, 1. The simulation results for sample size 200 are reported in Table 5.6. They show that the MD test is more conservative and less powerful than the Cheng and Kukush test and the Zhu, Song and Cui test. The Cheng and Kukush test (the quasi-optimal λ = 1.243 is used in the simulation) is the most powerful among these three tests. This phenomenon is not beyond our expectation, in that the Zhu, Song and Cui test and the Cheng and Kukush test are basically parametric tests, while the MD test is a nonparametric one.
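The comparison model of Table 5.6 is easy to reproduce. A minimal sketch, with illustrative defaults θ_1 = θ_2 = 1 (the paper does not state the θ values used in the comparison):

```python
import numpy as np

def simulate_polynomial_eiv(n, c=0.0, theta1=1.0, theta2=1.0, rng=None):
    # Comparison model of Table 5.6: Y = th1*X + th2*X^2 + c*X^3 + eps,
    # Z = X + u, with X ~ N(0,1), eps ~ N(0,1), u ~ N(0, 0.2^2).
    # theta1, theta2 defaults are illustrative assumptions, not from the paper.
    rng = np.random.default_rng(rng)
    X = rng.standard_normal(n)
    Y = theta1 * X + theta2 * X**2 + c * X**3 + rng.standard_normal(n)
    Z = X + rng.normal(0.0, 0.2, n)
    return Z, Y  # c = 0 is the null; c = 0.3, 0.5, 1 are the alternatives
```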

6. Proofs of the main results

Proof of Lemma 3.1. A direct calculation yields that for any x ∈ R^d, E f̂_{X w_1}(x) = ∫ L(v) f_X(x − v w_1) dv. By assumption (f1), there exists a vector a(x, v) such that f_X(x − v w_1)


Table 5.6
Comparison of tests

                         c = 0    c = 0.3   c = 0.5   c = 1
Cheng and Kukush test    0.042    0.862     0.992     1.000
Zhu, Song and Cui test   0.059    0.572     0.743     0.855
MD test                  0.036    0.202     0.297     0.554

has a Taylor expansion up to the second order, f_X(x − v w_1) = f_X(x) − w_1 v^T ḟ_X(x) + w_1^2 v^T f̈_X(a(x, v)) v / 2, where ḟ and f̈ are the first- and second-order derivatives of f with respect to its argument. Hence

E R̂_{n2}(z) = ∫∫ r(x) L(v) f_X(x − v w_1) f_u(z − x) dv dx
= ∫∫ r(x) L(v) f_X(x) f_u(z − x) dv dx − w_1 ∫∫ r(x) L(v) v^T ḟ_X(x) f_u(z − x) dv dx
+ (1/2) ∫∫ r(x) L(v) w_1^2 v^T f̈_X(a(v, x)) v f_u(z − x) dv dx.

Assumption (?) implies that the first term is ∫ r(x) f_X(x) f_u(z − x) dx = R(z); the second term vanishes because of ∫ v^T L(v) dv = 0, while the third term is bounded above by c w_1^2 I(z) by assumption (f1), where c is a positive constant depending only on the kernel function L. Therefore, the first claim in the lemma holds.

Note that R̂_{n2}(z) − E R̂_{n2}(z) is an average of independently and identically distributed centered random vectors. A routine calculation shows that

E ‖R̂_{n2}(z) − E R̂_{n2}(z)‖^2 ≤ (1/(n_2 w_1^{2d})) E ‖∫ r(x) L_{w_1}((x − z)/w_1) f_u(z − x) dx‖^2

by using the fact that the variance is bounded above by the second moment. Let D(t, z) = ∫ r(x) f_u(z − x) exp(−i t^T x) dx. By the definition of the deconvolution kernel L_b, it follows that

(1/w_1^{2d}) E ‖∫ r(x) L_{w_1}((x − z)/w_1) f_u(z − x) dx‖^2
= ∫∫ [D^*(t, z) D(s, z) φ_L(t w_1) φ_L(s w_1) φ_X(t + s) φ_u(t + s)] / [(2π)^{2d} φ_u(t) φ_u(s)] ds dt.

By changing the variable, D(t, z) = exp(−i t^T z) ∫ r(z − x) f_u(x) exp(i t^T x) dx. Adding and subtracting r(z) from r(z − x) in the integrand, we obtain

D(t, z) = exp(−i t^T z) φ_u(t) [ r(z) + (∫ (r(z − x) − r(z)) f_u(x) exp(i t^T x) dx) / φ_u(t) ].

From assumption (m1), ‖D(t, z)‖ is bounded above by |φ_u(t)| · [‖r(z)‖ + J(z) ‖t‖^α] for all z ∈ R^d. Hence E ‖R̂_{n2}(z) − E R̂_{n2}(z)‖^2 is bounded above by


(c ‖r(z)‖^2 / n_2) ∫∫ |φ_L(t w_1) φ_L(s w_1) φ_u(t + s)| dt ds
+ (c J(z) ‖r(z)‖ / n_2) ∫∫ (‖t‖^α + ‖s‖^α) |φ_L(t w_1) φ_L(s w_1) φ_u(t + s)| dt ds
+ (c J^2(z) / n_2) ∫∫ ‖t‖^α ‖s‖^α |φ_L(t w_1) φ_L(s w_1) φ_u(t + s)| dt ds.   (6.1)

Note that for any m, p = 0 or α, from assumption (?), we have

∫∫ ‖t‖^p ‖s‖^m |φ_L(t w_1) φ_L(s w_1) φ_u(t + s)| dt ds
≤ w_1^{−p−m−2d} ∫∫ ‖t‖^p ‖s‖^m |φ_L(t) φ_L(s) φ_u((t + s)/w_1)| dt ds
≤ c w_1^{−p−m−2d} ∫∫ ‖s‖^m |φ_L(s)| |φ_u((t + s)/w_1)| dt ds
= c w_1^{−p−m−d} ∫ ‖s‖^m |φ_L(s)| ds · ∫ |φ_u(t)| dt = c w_1^{−p−m−d}.

The second claim in the lemma follows from (6.1) by using the above inequality. □
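The step from (6.1) to the second claim can be made explicit. Under our reading of the argument (generic constants c), applying the displayed inequality with (p, m) = (0, 0), (α, 0) (and (0, α)), and (α, α) gives:

```latex
\begin{align*}
E\|\hat R_{n2}(z)-E\hat R_{n2}(z)\|^2
 &\le \frac{c}{n_2}\Big(\|r(z)\|^2\, w_1^{-d}
      + J(z)\|r(z)\|\, w_1^{-\alpha-d}
      + J^2(z)\, w_1^{-2\alpha-d}\Big)\\
 &= \frac{c}{n_2\, w_1^{2\alpha+d}}
    \Big(\|r(z)\|^2 w_1^{2\alpha} + J(z)\|r(z)\| w_1^{\alpha} + J^2(z)\Big),
\end{align*}
```

which, with the bandwidth choice for w_1, yields the stated rate for the variance term.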

Proof of Theorem 3.1. To keep the exposition concise, let

U_{n1}(z) = (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z)(Y_i − θ_0^T Q(Z_i)),   D_n(z) = (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z)(Q̂_{n2}(Z_i) − Q(Z_i)),
µ_{n1}(z) = (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) Q(Z_i),   ∆_{n1}(z) = 1/f̃_{Z h_2}^2(z) − 1/f_Z^2(z).   (6.2)

It suffices to show that the matrix before θ̂_n − θ_0 on the left-hand side of (3.4) converges to Σ_0 in probability, and that √n_1 times the right-hand side of (3.4) is asymptotically normal with mean vector 0 and covariance matrix Σ.

Consider the second claim first. Adding and subtracting θ_0^T Q(Z_i) from Y_i − θ_0^T Q̂_{n2}(Z_i) in the first factor of the integrand, adding and subtracting Q(Z_i) from Q̂_{n2}(Z_i) in the second factor of the integrand, and replacing 1/f̃_{Z h_2}^2(z) by 1/f̃_{Z h_2}^2(z) − 1/f_Z^2(z) + 1/f_Z^2(z) = ∆_{n1}(z) + 1/f_Z^2(z), √n_1 times the right-hand side of (3.4) can be written as the sum of the following eight terms:

S_{n1} = √n_1 ∫ U_{n1}(z) D_n(z) ∆_{n1}(z) dG(z),   S_{n2} = √n_1 ∫ U_{n1}(z) D_n(z) dψ(z),
S_{n3} = √n_1 ∫ U_{n1}(z) µ_{n1}(z) ∆_{n1}(z) dG(z),   S_{n4} = √n_1 ∫ U_{n1}(z) µ_{n1}(z) dψ(z),
S_{n5} = −√n_1 ∫ D_n(z) D_n^T(z) ∆_{n1}(z) dG(z) θ_0,   S_{n6} = −√n_1 ∫ D_n(z) D_n^T(z) dψ(z) θ_0,
S_{n7} = −√n_1 ∫ D_n(z) µ_{n1}^T(z) ∆_{n1}(z) dG(z) θ_0,   S_{n8} = −√n_1 ∫ D_n(z) µ_{n1}^T(z) dψ(z) θ_0.

Among these terms, S_{n4} is asymptotically normal with mean vector 0 and covariance matrix Σ. The proof uses the Lindeberg–Feller central limit theorem, and the arguments are exactly


the same as in K–N, with m_{θ_0}(X_i) and ṁ_{θ_0}(X_i) there replaced by θ_0^T Q(Z_i) and Q(Z_i) here, respectively. The proof is omitted. All the other seven terms are of the order o_p(1). Since the proofs are similar, only S_{n8} = o_p(1) will be shown below for the sake of brevity. We note that, by using a similar method as in K–N, we can show that U_{n1}(z) is O_p(1/√(n_1 h_1^d)), which is used in proving S_{nl} = o_p(1) for l = 1, 2, 3.

First, notice that the kernel function K has compact support [−1, 1]^d, so K_{h_1 i} is not 0 only if the distances between each coordinate pair of Z_i and z are no more than h_1. On the other hand, the integrating measure has compact support I, so if we define

I_{h_1} = {y ∈ R^d : |y_j − z_j| ≤ h_1, j = 1, ..., d, y = (y_1, ..., y_d)^T, z = (z_1, ..., z_d)^T, z ∈ I},

then I_{h_1} is a compact set in R^d, and K_{h_1 i} = 0 if Z_i ∉ I_{h_1}. Hence, without loss of generality, we can assume all Z_i ∈ I_{h_1}. Since f_Z is bounded from below on the compact set I_{δ_0} by assumption (f2), and I_{h_1} ⊂ I_{δ_0} for n_1 large enough, from assumption (w2) and Lemma 3.2 we obtain

sup_{z ∈ I_{h_1}} |f_Z(z)/f̂_{Z w_2}(z) − 1| = o((log_k n_2)(log n_2 / n_2)^{2/(d+4)}) a.s.,   sup_{z ∈ I_{h_1}} |f_Z(z)/f̂_{Z w_2}(z)| = O_p(1).   (6.3)

Secondly, we have the following inequality:

‖Q̂_{n2}(Z_i) − Q(Z_i)‖ ≤ (‖R̂_{n2}(Z_i) − R(Z_i)‖ / f_Z(Z_i)) · |f_Z(Z_i)/f̂_{Z w_2}(Z_i)| + |f_Z(Z_i)/f̂_{Z w_2}(Z_i) − 1| · ‖Q(Z_i)‖.   (6.4)

Recall the definition of S_{n8}. We have

‖S_{n8}‖ ≤ √n_1 ‖θ_0‖ ∫ (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) ‖Q̂_{n2}(Z_i) − Q(Z_i)‖ · (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) ‖Q(Z_i)‖ dψ(z).

From (6.3) and (6.4), this upper bound satisfies

√n_1 · O_p(1) · A_{n11} + √n_1 · o((log_k n_2)(log n_2 / n_2)^{2/(d+4)}) · A_{n12},

where

A_{n11} = ∫ (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) ‖R̂_{n2}(Z_i) − R(Z_i)‖ · (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) ‖Q(Z_i)‖ dψ(z),
A_{n12} = ∫ ((1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) ‖Q(Z_i)‖)^2 dψ(z).

By the Cauchy–Schwarz inequality, A_{n11}^2 is bounded above by

∫ ((1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) ‖R̂_{n2}(Z_i) − R(Z_i)‖)^2 dψ(z) · ∫ ((1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) ‖Q(Z_i)‖)^2 dψ(z).


Note that

E ∫ ((1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) ‖R̂_{n2}(Z_i) − R(Z_i)‖)^2 dψ(z)
= ∫ E[(1/n_1^2) Σ_{i,j=1}^{n_1} K_{h_1 i}(z) K_{h_1 j}(z) E_{S_1}(‖R̂_{n2}(Z_i) − R(Z_i)‖ ‖R̂_{n2}(Z_j) − R(Z_j)‖)] dψ(z).

By the Cauchy–Schwarz inequality again, E_{S_1}(‖R̂_{n2}(Z_i) − R(Z_i)‖ ‖R̂_{n2}(Z_j) − R(Z_j)‖) is bounded above by (E_{S_1} ‖R̂_{n2}(Z_i) − R(Z_i)‖^2)^{1/2} (E_{S_1} ‖R̂_{n2}(Z_j) − R(Z_j)‖^2)^{1/2}, which in turn, from the independence of the subsamples S_1 and S_2, the choice of bandwidth w_1, and (3.1), is bounded above by c n_2^{−4/(d+2α+4)} T^{1/2}(Z_i) T^{1/2}(Z_j), where T is defined in (3.2). So

E ∫ ((1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) ‖R̂_{n2}(Z_i) − R(Z_i)‖)^2 dψ(z) ≤ c n_2^{−4/(d+2α+4)} E ∫ ((1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) T^{1/2}(Z_i))^2 dψ(z).

Using a similar method as in K–N, together with the assumptions (m1) and (m2), we can show that

∫ ((1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) T^{1/2}(Z_i))^2 dψ(z) = O_p(1) = ∫ ((1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) ‖Q(Z_i)‖)^2 dψ(z).

Finally, from (6.4), we obtain

‖S_{n8}‖ ≤ √n_1 · O_p(n_2^{−2/(d+2α+4)}) + √n_1 · o_p((log_k n_2)(log n_2 / n_2)^{2/(d+4)}),

which is of the order o_p(1) by the assumption (n).

To finish the proof, we only need to show the matrix before θ̂_n − θ_0 on the left-hand side of (3.4) converges to Σ_0 in probability. Adding and subtracting Q(Z_i) from Q̂_{n2}(Z_i), this matrix can be written as the sum of the following eight terms:

T_{n1} = ∫ D_n(z) D_n^T(z) ∆_{n1}(z) dG(z),   T_{n2} = ∫ D_n(z) µ_{n1}^T(z) ∆_{n1}(z) dG(z),
T_{n3} = ∫ µ_{n1}(z) D_n^T(z) ∆_{n1}(z) dG(z),   T_{n4} = ∫ µ_{n1}(z) µ_{n1}^T(z) ∆_{n1}(z) dG(z),
T_{n5} = ∫ D_n(z) D_n^T(z) dψ(z),   T_{n6} = ∫ D_n(z) µ_{n1}^T(z) dψ(z),
T_{n7} = ∫ µ_{n1}(z) D_n^T(z) dψ(z),   T_{n8} = ∫ µ_{n1}(z) µ_{n1}^T(z) dψ(z).

Notice the connection between T_{n1} and S_{n5}; between T_{n2}, T_{n3} and S_{n7}; between T_{n5} and S_{n6}; and between T_{n6}, T_{n7} and S_{n8}. By using a similar argument as above, we can verify that T_{nl} = o_p(1) for l = 1, 2, 3, 5, 6, 7.


From (6.3) and the second fact in (6.5), T_{n4} is also of the order o_p(1). Finally, employing a similar method as in K–N, we can show that T_{n8} converges to Σ_0 in probability, thereby proving the theorem. □

Proof of Theorem 3.2. To state the result precisely, the following notations are needed:

ξ_i = Y_i − θ_0^T Q(Z_i),   ζ_i = Y_i − θ_0^T Q̂_{n2}(Z_i),
C̃_n = n_1^{−2} Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z) ζ_i^2 dψ(z),
M̃_n(θ_0) = ∫ (n_1^{−1} Σ_{i=1}^{n_1} K_{h_1 i}(z) ζ_i)^2 dψ(z),
Γ = 2 ∫ (τ^2(z))^2 g(z) dψ(z) · ∫ (∫ K(u) K(u + v) du)^2 dv,

where τ^2(z) is as in Theorem 3.1. The proof is facilitated by the following five lemmas.

Lemma 6.1. If H_0, (e1), (e2), (e4), (u), (f1), (f2), (m1), (m2), (?), (n), (h1), (w1) and (w2) hold, then n_1 h_1^{d/2} (M̃_n(θ_0) − C̃_n) ⟹ N(0, Γ).

Proof. Replacing ζ_i by ξ_i + θ_0^T (Q(Z_i) − Q̂_{n2}(Z_i)) in the definition of M̃_n(θ_0) and expanding the quadratic term, M̃_n(θ_0) − C̃_n can be written as the sum of the following four terms:

B_{n1} = (1/n_1^2) Σ_{i≠j}^{n_1} ∫ K_{h_1 i}(z) K_{h_1 j}(z) ξ_i ξ_j dψ(z),
B_{n2} = (1/n_1^2) Σ_{i≠j}^{n_1} ∫ K_{h_1 i}(z) K_{h_1 j}(z) ξ_i θ_0^T (Q(Z_j) − Q̂_{n2}(Z_j)) dψ(z),
B_{n3} = (1/n_1^2) Σ_{i≠j}^{n_1} ∫ K_{h_1 i}(z) K_{h_1 j}(z) ξ_j θ_0^T (Q(Z_i) − Q̂_{n2}(Z_i)) dψ(z),

and B_{n4} = n_1^{−2} Σ_{i≠j}^{n_1} ∫ K_{h_1 i}(z) K_{h_1 j}(z) θ_0^T (Q(Z_i) − Q̂_{n2}(Z_i)) θ_0^T (Q(Z_j) − Q̂_{n2}(Z_j)) dψ(z).

Using a similar method as in K–N, one can show that n_1 h_1^{d/2} B_{n1} ⟹ N(0, Γ). To prove the lemma, it is sufficient to show n_1 h_1^{d/2} B_{nl} = o_p(1) for l = 2, 3, 4. We begin with the case of l = 2. By (6.3) and the inequality (6.4), and letting C_{nij} = Σ_{i≠j}^{n_1} ∫ K_{h_1 i}(z) K_{h_1 j}(z) ξ_i dψ(z), B_{n2} is bounded above by the sum B_{n21} + B_{n22}, where

B_{n21} = O_p(1) · (1/n_1^2) Σ_{j=1}^{n_1} [‖R̂_{n2}(Z_j) − R(Z_j)‖ · |C_{nij}|],
B_{n22} = o((log_k n_2)(log n_2 / n_2)^{2/(d+4)}) · (1/n_1^2) Σ_{j=1}^{n_1} [‖Q(Z_j)‖ · |C_{nij}|].


On the one hand, by the conditional expectation argument and inequality (3.1), we have

E (1/n_1^2) Σ_{j=1}^{n_1} [‖R̂_{n2}(Z_j) − R(Z_j)‖ · |C_{nij}|] = E (1/n_1^2) Σ_{j=1}^{n_1} [E_{S_1}(‖R̂_{n2}(Z_j) − R(Z_j)‖) · |C_{nij}|]
≤ c n_2^{−2/(d+2α+4)} E (1/n_1^2) Σ_{j=1}^{n_1} [T^{1/2}(Z_j) · |C_{nij}|] = c n_2^{−2/(d+2α+4)} (1/n_1) E[T^{1/2}(Z_1) · |C_{ni1}|].

Now, consider the asymptotic behavior of E[T^{1/2}(Z_1) · |C_{ni1}|]. Instead of considering the expectation, we investigate the second moment. It is easy to see that E T(Z_1) C_{ni1}^2 equals

E T(Z_1) Σ_{i≠1} Σ_{j≠1} ∫∫ K_{h_1 i}(z) K_{h_1 1}(z) K_{h_1 j}(y) K_{h_1 1}(y) ξ_i ξ_j dψ(z) dψ(y)
= (n_1 − 1) ∫∫ E(K_{h_1 2}(z) K_{h_1 2}(y) ξ_2^2) · E(K_{h_1 1}(z) K_{h_1 1}(y) T(Z_1)) dψ(z) dψ(y).   (6.5)

The second equality is from the independence of ξ_i, i = 1, ..., n_1, and E ξ_1 = 0. But

E(K_{h_1 2}(z) K_{h_1 2}(y) ξ_2^2) = E(K_{h_1 2}(z) K_{h_1 2}(y)(σ_ε^2 + δ^2(Z_2)))
= (1/h_1^d) ∫ K(v) K((y − z)/h_1 − v)(σ_ε^2 + δ^2(z − h_1 v)) f_Z(z − h_1 v) dv.

Similarly, we can show that

E(K_{h_1 1}(z) K_{h_1 1}(y) T(Z_1)) = (1/h_1^d) ∫ K(v) K((y − z)/h_1 − v) T(z − h_1 v) f_Z(z − h_1 v) dv.

Putting back these two expectations in (6.5), and changing variables y = z + h_1 u, then by the continuity of f_Z, δ^2(z), g(z), and T(z), we obtain E T(Z_1) C_{ni1}^2 = O((n_1 − 1) h_1^{−d}). Therefore,

E (1/n_1^2) Σ_{j=1}^{n_1} [‖R̂_{n2}(Z_j) − R(Z_j)‖ · |C_{nij}|] = O(n_2^{−2/(d+2α+4)} · n_1^{−1} √(n_1 − 1) h_1^{−d/2}).

This, in turn, implies B_{n21} = O_p(n_1^{−2b/(d+2α+4) − 1/2} h_1^{−d/2}), by assumption (n). Similarly, one can show that n_1^{−2} Σ_{j=1}^{n_1} [‖Q(Z_j)‖ · |C_{nij}|] is of the order O_p(n_1^{−1/2} h_1^{−d/2}). Thus, B_{n22} = o_p((log_k n_1)(log n_1 / n_1^b)^{2/(d+4)} · n_1^{−1/2} h_1^{−d/2}). Hence

n_1 h_1^{d/2} |B_{n2}| = O_p(n_1^{1/2 − 2b/(d+2α+4)}) + O_p(n_1^{1/2 − 2b/(d+4)} log_k n_1 (log n_1)^{2/(d+4)}) = o_p(1),

since b > (d + 2α + 4)/4 by assumption (n). By exactly the same method as above, we can show that n_1 h_1^{d/2} B_{n3} = o_p(1).


It remains to show that n_1 h_1^{d/2} B_{n4} = o_p(1). Note that

|B_{n4}| ≤ (1/n_1^2) Σ_{i≠j}^{n_1} ∫ K_{h_1 i}(z) K_{h_1 j}(z) ‖θ_0‖^2 · ‖Q̂_{n2}(Z_i) − Q(Z_i)‖ · ‖Q̂_{n2}(Z_j) − Q(Z_j)‖ dψ(z).

From (6.4), the right-hand side of the above inequality is bounded above by the sum

O_p(1) · B_{n41} + o_p((log_k n_2)(log n_2 / n_2)^{2/(d+4)}) · (B_{n42} + B_{n43}) + o_p((log_k^2 n_2)(log n_2 / n_2)^{4/(d+4)}) · B_{n44},

where

B_{n41} = (1/n_1^2) Σ_{i≠j}^{n_1} ∫ K_{h_1 i}(z) K_{h_1 j}(z) · ‖R̂_{n2}(Z_i) − R(Z_i)‖ · ‖R̂_{n2}(Z_j) − R(Z_j)‖ dψ(z),
B_{n42} = (1/n_1^2) Σ_{i≠j}^{n_1} ∫ K_{h_1 i}(z) K_{h_1 j}(z) · ‖R̂_{n2}(Z_i) − R(Z_i)‖ · ‖Q(Z_j)‖ dψ(z),
B_{n43} = (1/n_1^2) Σ_{i≠j}^{n_1} ∫ K_{h_1 i}(z) K_{h_1 j}(z) · ‖R̂_{n2}(Z_j) − R(Z_j)‖ · ‖Q(Z_i)‖ dψ(z),
B_{n44} = (1/n_1^2) Σ_{i≠j}^{n_1} ∫ K_{h_1 i}(z) K_{h_1 j}(z) · ‖Q(Z_i)‖ · ‖Q(Z_j)‖ dψ(z).

By a conditional expectation argument, the Cauchy–Schwarz inequality, (2.2), and the continuity of f_Z and T(z), we obtain

E B_{n41} ≤ c n_2^{−4/(d+2α+4)} ∫ E[K_{h_1 1}(z) T^{1/2}(Z_1)]^2 dψ(z) = O(n_2^{−4/(d+2α+4)}).

This implies B_{n41} = O_P(n_2^{−4/(d+2α+4)}), so that, since b > (d + 2α + 4)/4 by assumption (n),

n_1 h_1^{d/2} · O_p(1) B_{n41} = n_1 h_1^{d/2} · O_p(1) O_P(n_1^{−4b/(d+2α+4)}) = o_p(1).

Similarly, we can show

B_{n42} = O_P(n_2^{−2/(d+2α+4)}),   B_{n43} = O_P(n_2^{−2/(d+2α+4)}),   B_{n44} = O_P(1).

Therefore, for l = 2, 3,

n_1 h_1^{d/2} · o_p((log_k n_2)(log n_2 / n_2)^{2/(d+4)}) B_{n4l} = o_p(n_1^{1 − 2b/(d+4) − 2b/(d+2α+4)} h_1^{d/2} (log_k n_1)(log n_1)^{2/(d+4)}),

which is of the order o_p(1) by assumption (n). For B_{n44}, we have

n_1 h_1^{d/2} · o_p((log_k^2 n_2)(log n_2 / n_2)^{4/(d+4)}) B_{n44} = o_p(n_1^{1 − 4b/(d+4)} h_1^{d/2} (log_k^2 n_1)(log n_1)^{4/(d+4)}),


which is also of the order o_p(1). Finally, from the above and (6.6), we obtain n_1 h_1^{d/2} B_{n4} = o_p(1), thereby proving the lemma. □

Lemma 6.2. In addition to the conditions in Lemma 6.1, suppose (h2) also holds. Then n_1 h_1^{d/2} (M_n(θ̂_n) − M_n(θ_0)) = o_p(1).

Proof. Recall the definition of M_n(θ). Adding and subtracting n_1^{−1} Σ_{i=1}^{n_1} K_{h_1 i}(z) θ_0^T Q̂_{n2}(Z_i) in the squared integrand of M_n(θ̂_n), we can write M_n(θ̂_n) − M_n(θ_0) as the sum W_{n1} + 2W_{n2}, where

W_{n1} = ∫ ((1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z)(θ_0 − θ̂_n)^T Q̂_{n2}(Z_i))^2 dψ̂_{h_2}(z),
W_{n2} = ∫ (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) ζ_i · (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z)(θ_0 − θ̂_n)^T Q̂_{n2}(Z_i) dψ̂_{h_2}(z),

and ζ_i = Y_i − θ_0^T Q̂_{n2}(Z_i). It is easy to see that

W_{n1} ≤ 2 ∫ ((1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z)(θ_0 − θ̂_n)^T (Q̂_{n2}(Z_i) − Q(Z_i)))^2 dψ̂_{h_2}(z) + 2 ∫ ((1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z)(θ_0 − θ̂_n)^T Q(Z_i))^2 dψ̂_{h_2}(z).

We write the first term on the right-hand side as W_{n11} and the second term as W_{n12}. On the one hand, note that W_{n11} is bounded above by

‖θ̂_n − θ_0‖^2 · sup_{z∈I} |f_Z(z)/f̂_{Z w_2}(z)| · ∫ ((1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) ‖Q̂_{n2}(Z_i) − Q(Z_i)‖)^2 dψ(z).

By the conditional expectation argument as we used in the previous part, we can show that the integral part is indeed of the order o_p(1). By assumption (w2), the compactness of I_{h_1}, and the asymptotic behavior of θ̂_n − θ_0 stated in Theorem 3.1, n_1 h_1^{d/2} W_{n11} = o_p(h_1^{d/2}) = o_p(1). On the other hand, W_{n12} is bounded above by

‖θ̂_n − θ_0‖^2 · sup_{z∈I} |f_Z(z)/f̂_{Z w_2}(z)| · ∫ ((1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) ‖Q(Z_i)‖)^2 dψ(z).

Since the integral part is of the order O_p(1), n_1 h_1^{d/2} W_{n12} = O_p(h_1^{d/2}) = o_p(1) is easily obtained. Therefore, n_1 h_1^{d/2} W_{n1} = o_p(1) is proved. Now rewrite W_{n2} as

W_{n2} = ∫ (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) ζ_i · (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) Q̂_{n2}^T(Z_i) dψ̂_{h_2}(z) · (θ_0 − θ̂_n).

Note that the integral part of W_{n2} is the same as the expression on the right-hand side of (3.4); thus

W_{n2} = (θ̂_n − θ_0)^T ∫ (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) Q̂_{n2}(Z_i) · (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) Q̂_{n2}^T(Z_i) dψ̂_{h_2}(z) · (θ_0 − θ̂_n).


Therefore, W_{n2} is bounded above by ‖θ̂_n − θ_0‖^2 ∫ [n_1^{−1} Σ_{i=1}^{n_1} K_{h_1 i}(z) ‖Q̂_{n2}(Z_i)‖]^2 dψ̂_{h_2}(z). Adding and subtracting Q(Z_i) from Q̂_{n2}(Z_i), it turns out that W_{n2} is further bounded above by the sum W_{n21} + W_{n22}, where

W_{n21} = 2 ‖θ̂_n − θ_0‖^2 ∫ (n_1^{−1} Σ_{i=1}^{n_1} K_{h_1 i}(z) ‖Q̂_{n2}(Z_i) − Q(Z_i)‖)^2 dψ̂_{h_2}(z),
W_{n22} = 2 ‖θ̂_n − θ_0‖^2 ∫ (n_1^{−1} Σ_{i=1}^{n_1} K_{h_1 i}(z) ‖Q(Z_i)‖)^2 dψ̂_{h_2}(z).

Arguing as in W_{n11} and W_{n12}, we can show n_1 h_1^{d/2} |W_{n21}| = o_p(1) and n_1 h_1^{d/2} |W_{n22}| = o_p(1). Therefore, n_1 h_1^{d/2} |W_{n2}| = o_p(1). Together with the result n_1 h_1^{d/2} |W_{n1}| = o_p(1), the lemma is proved. □

Lemma 6.3. If H_0, (e1), (e2), (u), (f1), (f2), (m1), (m2), (?), (n), (h1), (h2), (w1) and (w2) hold, then n_1 h_1^{d/2} (M_n(θ_0) − M̃_n(θ_0)) = o_p(1).

Proof. Recall the definition of ζ_i and U_{n1}(z). Note that

n_1 h_1^{d/2} |M_n(θ_0) − M̃_n(θ_0)| ≤ n_1 h_1^{d/2} sup_{z∈I} |f_Z^2(z)/f̃_{Z h_2}^2(z) − 1| · ∫ ((1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) ζ_i)^2 dψ(z).

Replacing ζ_i by ξ_i + θ_0^T (Q(Z_i) − Q̂_{n2}(Z_i)), the integral part of the above inequality can be bounded above by the sum

2 ∫ U_{n1}^2(z) dψ(z) + 2 ∫ ((1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) θ_0^T (Q(Z_i) − Q̂_{n2}(Z_i)))^2 dψ(z).

The first term is of the order O_p((n_1 h_1^d)^{−1/2}), which is obtained by a similar method as in K–N, while the second term, by the conditional expectation argument, has the same order as

sup_{z ∈ I_{h_1}} |f_Z^2(z)/f̂_{Z w_2}^2(z)| · O(n_2^{−4/(d+2α+4)}) + sup_{z ∈ I_{h_1}} |f_Z^2(z)/f̂_{Z w_2}^2(z) − 1|^2 · O_p(1).

Therefore, n_1 h_1^{d/2} |M_n(θ_0) − M̃_n(θ_0)| is less than or equal to

O_p(n_1 h_1^{d/2} · (n_1 h_1^d)^{−1} · log_k n_1 (log n_1/n_1)^{2/(d+4)}) + O_p(n_1 h_1^{d/2} · log_k n_1 (log n_1/n_1)^{2/(d+4)} · n_1^{−4b/(d+2α+4)}) + O_p(n_1 h_1^{d/2} · log_k^2 n_1 (log n_1)^{4/(d+4)} n_1^{−4b/(d+4)}).

All the three terms are of the order o_p(1) by the assumptions (n), (h1), (h2), (w1) and (w2). Hence the lemma. □

Lemma 6.4. If H_0, (e1), (e2), (e4), (u), (f1), (f2), (m1), (m2), (?), (n), (h1), (h2), (w1) and (w2) hold, then n_1 h_1^{d/2} (Ĉ_n − C̃_n) = o_p(1).


Proof. Recall the notation ∆_{n1}(z) in (6.2). Adding and subtracting θ_0^T Q̂_{n2}(Z_i) from Y_i in the integrand of Ĉ_n and expanding the quadratic term, Ĉ_n − C̃_n can be rewritten as the sum of C_{nl}, l = 1, 2, 3, 4, 5, where

C_{n1} = (1/n_1^2) Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z)(Y_i − θ_0^T Q̂_{n2}(Z_i))^2 ∆_{n1}(z) dψ(z),
C_{n2} = (2/n_1^2) Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z)(Y_i − θ_0^T Q̂_{n2}(Z_i))(θ_0 − θ̂_n)^T Q̂_{n2}(Z_i) ∆_{n1}(z) dψ(z),
C_{n3} = (2/n_1^2) Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z)(Y_i − θ_0^T Q̂_{n2}(Z_i))(θ_0 − θ̂_n)^T Q̂_{n2}(Z_i) dψ(z),
C_{n4} = (1/n_1^2) Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z)((θ_0 − θ̂_n)^T Q̂_{n2}(Z_i))^2 ∆_{n1}(z) dψ(z),
C_{n5} = (1/n_1^2) Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z)((θ_0 − θ̂_n)^T Q̂_{n2}(Z_i))^2 dψ(z).

To prove the lemma, it is enough to prove n_1 h_1^{d/2} C_{nl} = o_p(1) for l = 1, 2, 3, 4, 5. For the case of l = 1, first notice that

|C_{n1}| ≤ 2 sup_{z∈I} |∆_{n1}(z)| · (1/n_1^2) Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z) ξ_i^2 dψ(z) + 2 sup_{z∈I} |∆_{n1}(z)| · (1/n_1^2) Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z)(θ_0^T (Q(Z_i) − Q̂_{n2}(Z_i)))^2 dψ(z) = C_{n11} + C_{n12}.

Since n_1^{−2} Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z) ξ_i^2 dψ(z) = O_p(1/(n_1 h_1^d)) by a routine expectation argument,

n_1 h_1^{d/2} |C_{n11}| = o_p(n_1 h_1^{d/2} · (log_k n_1)(log n_1)^{2/(d+4)} n_1^{−2/(d+4)} · (n_1 h_1^d)^{−1}) = o_p(1).

Second, from the compactness of Θ, we have

(1/n_1^2) Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z)(θ_0^T (Q(Z_i) − Q̂_{n2}(Z_i)))^2 dψ(z) ≤ O(1) · (1/n_1^2) Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z) ‖Q(Z_i) − Q̂_{n2}(Z_i)‖^2 dψ(z).

Again by the conditional expectation argument, the second factor of the above expression has the same order as

O_p(n_2^{−4/(d+2α+4)}) · sup_{z ∈ I_{h_1}} |f_Z(z)/f̃_{Z w_2}(z)|^2 · (1/n_1^2) Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z) T(Z_i) dψ(z) + sup_{z ∈ I_{h_1}} |f_Z(z)/f̃_{Z w_2}(z) − 1|^2 · (1/n_1^2) Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z) ‖Q(Z_i)‖^2 dψ(z).


Because

(1/n_1^2) Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z) T(Z_i) dψ(z) = O_p(1/(n_1 h_1^d)) = (1/n_1^2) Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z) ‖Q(Z_i)‖^2 dψ(z),

then from (h2), (w2), (h1), and Lemma 3.2, we get n_1 h_1^{d/2} |C_{n12}| = o_p(1). Hence n_1 h_1^{d/2} |C_{n1}| = o_p(1). Now we will show that n_1 h_1^{d/2} |C_{n3}| = o_p(1). Once we prove this, then n_1 h_1^{d/2} |C_{n2}| = o_p(1) is a natural consequence. In fact,

C_{n3} = (2/n_1^2) Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z)(ξ_i + θ_0^T Q(Z_i) − θ_0^T Q̂_{n2}(Z_i)) · (θ_0 − θ̂_n)^T (Q̂_{n2}(Z_i) − Q(Z_i) + Q(Z_i)) dψ(z).

So |C_{n3}| is bounded above by the sum 2(C_{n31} + C_{n32} + C_{n33} + C_{n34}), where

C_{n31} = (1/n_1^2) Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z) |ξ_i| ‖θ_0 − θ̂_n‖ ‖Q̂_{n2}(Z_i) − Q(Z_i)‖ dψ(z),
C_{n32} = (1/n_1^2) Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z) |ξ_i| ‖θ_0 − θ̂_n‖ ‖Q(Z_i)‖ dψ(z),
C_{n33} = (1/n_1^2) Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z) ‖θ_0‖ ‖θ_0 − θ̂_n‖ ‖Q̂_{n2}(Z_i) − Q(Z_i)‖^2 dψ(z),
C_{n34} = (1/n_1^2) Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z) ‖θ_0‖ ‖θ_0 − θ̂_n‖ ‖Q̂_{n2}(Z_i) − Q(Z_i)‖ ‖Q(Z_i)‖ dψ(z).

It is sufficient to show that n_1 h_1^{d/2} |C_{n3l}| = o_p(1) for l = 1, 2, 3, 4. Because the proofs are similar, here we only show n_1 h_1^{d/2} |C_{n32}| = o_p(1); the others are omitted for the sake of brevity. In fact, note that n_1^{−2} Σ_{i=1}^{n_1} ∫ K_{h_1 i}^2(z) |ξ_i| ‖Q(Z_i)‖ dψ(z) = O_p(1/(n_1 h_1^d)) by an expectation argument; then from ‖θ̂_n − θ_0‖ = O_p(n_1^{−1/2}) by Theorem 3.1, we have n_1 h_1^{d/2} |C_{n32}| = ‖θ̂_n − θ_0‖ · O_p(1/(n_1 h_1^d)) · n_1 h_1^{d/2} = O_p(n_1^{−1/2} h_1^{−d/2}). Since n_1^{−1/2} h_1^{−d/2} = n_1^{−1/2 + ad/2} and a < 1/(2d) by assumption (h1), the above expression is o_p(1). Similarly, we can show that the same results hold for C_{n4} and C_{n5}. The details are left out. □

Lemma 6.5. Under the same conditions as in Lemma 6.4, Γ̂_n − Γ = o_p(1).

Proof. Recall the notation for ξ_i. Define

Γ̃_n = 2 h_1^d n_1^{−2} Σ_{i≠j}^{n_1} (∫ K_{h_1 i}(z) K_{h_1 j}(z) ξ_i ξ_j dψ̂_{h_2}(z))^2.

The lemma is proved by showing that

Γ̂_n − Γ̃_n = o_p(1),   Γ̃_n − Γ = o_p(1),   (6.6)

where Γ is as in (3.5). But the second claim can be shown using the same method as in K–N, so we only prove the first claim. Write u_n = θ̂_n − θ_0 and r_i = θ_0^T Q(Z_i) − θ̂_n^T Q̂_{n2}(Z_i). Now Γ̂_n can be


expressed as the sum of Γ̃_n and the following terms:

B_{n1} = 2 h_1^d n_1^{−2} Σ_{i≠j}^{n_1} [∫ K_{h_1 i}(z) K_{h_1 j}(z) ξ_i r_j dψ̂_{h_2}(z) + ∫ K_{h_1 i}(z) K_{h_1 j}(z) ξ_j r_i dψ̂_{h_2}(z) + ∫ K_{h_1 i}(z) K_{h_1 j}(z) r_i r_j dψ̂_{h_2}(z)]^2,
B_{n2} = 4 h_1^d n_1^{−2} Σ_{i≠j}^{n_1} [∫ K_{h_1 i}(z) K_{h_1 j}(z) ξ_i ξ_j dψ̂_{h_2}(z)] · [∫ K_{h_1 i}(z) K_{h_1 j}(z) ξ_i r_j dψ̂_{h_2}(z) + ∫ K_{h_1 i}(z) K_{h_1 j}(z) ξ_j r_i dψ̂_{h_2}(z) + ∫ K_{h_1 i}(z) K_{h_1 j}(z) r_i r_j dψ̂_{h_2}(z)],

so it suffices to show that both terms are of the order o_p(1). Applying the Cauchy–Schwarz inequality to the double sum, one can see that we only need to show the following:

h_1^d n_1^{−2} Σ_{i≠j}^{n_1} (∫ K_{h_1 i}(z) K_{h_1 j}(z) |ξ_i r_j| dψ̂_{h_2}(z))^2 = o_p(1),
h_1^d n_1^{−2} Σ_{i≠j}^{n_1} (∫ K_{h_1 i}(z) K_{h_1 j}(z) |r_i r_j| dψ̂_{h_2}(z))^2 = o_p(1),
h_1^d n_1^{−2} Σ_{i≠j}^{n_1} (∫ K_{h_1 i}(z) K_{h_1 j}(z) |ξ_i ξ_j| dψ̂_{h_2}(z))^2 = O_p(1).   (6.7)

The third claim in (6.7) can be proved by using the same argument as in K–N. Now, consider the first claim above. From Lemma 3.2, we only need to show the claim is true when dψ̂_{h_2}(z) is replaced by dψ(z). Since r_j has nothing to do with the integration variable, the left-hand side of the first claim after the replacement can be rewritten as

h_1^d n_1^{−2} Σ_{i≠j}^{n_1} |r_j|^2 (∫ K_{h_1 i}(z) K_{h_1 j}(z) |ξ_i| dψ(z))^2.   (6.8)

(6.8)

nQ(Zj) − θ?

0(ˆQn2(Zj) − Q(Zj)), so (6.8) can be

An1= 3hd

1n−2

1?un?2

n1

?

n1

?

n1

?

i?=j

?ˆQn2(Zj) − Q(Zj)?2

??

??

Kh1i(z)Kh1j(z)|ξi|dψ(z)

?2

?2

,

An2= 3hd

1n−2

1?un?2

i?=j

Kh1i(z)Kh1j(z)|ξi|?Q(Zj)?dψ(z),

An3= 3hd

1n−2

1?θ0?2

i?=j

?ˆQn2(Zj) − Q(Zj)?2

??

Kh1i(z)Kh1j(z)|ξi|dψ(z)

?2

.

An2= op(1) can be shown by the fact that un=ˆθn− θ0= op(1), and that

n1

?

hd

1n−2

1

i?=j

??

Kh1i(z)Kh1j(z)|ξi|?Q(Zj)?dψ(z)

?2

= Op(1)


which can be shown by using the same argument as in K–N. Let us consider A_{n3}. Using the inequality (6.4), Lemma 3.2 or (6.3), and the compactness of Θ, it is easy to see that A_{n3} is bounded above by the sum A_{n31} + A_{n32}, where

A_{n31} = O_p(1) · h_1^d n_1^{−2} Σ_{i≠j}^{n_1} ‖R̂_{n2}(Z_j) − R(Z_j)‖^2 (∫ K_{h_1 i}(z) K_{h_1 j}(z) |ξ_i| dψ(z))^2,
A_{n32} = o_p(1) · h_1^d n_1^{−2} Σ_{i≠j}^{n_1} (∫ K_{h_1 i}(z) K_{h_1 j}(z) |ξ_i| ‖Q(Z_j)‖ dψ(z))^2.

Applying the conditional expectation argument to the second factor in A_{n31}, using the fact (3.1) and the elementary inequality a < (1 + a)^2, we can show

E[h_1^d n_1^{−2} Σ_{i≠j}^{n_1} ‖R̂_{n2}(Z_j) − R(Z_j)‖^2 (∫ K_{h_1 i}(z) K_{h_1 j}(z) |ξ_i| dψ(z))^2]
= E[h_1^d n_1^{−2} Σ_{i≠j}^{n_1} (E_{S_1} ‖R̂_{n2}(Z_j) − R(Z_j)‖^2)(∫ K_{h_1 i}(z) K_{h_1 j}(z) |ξ_i| dψ(z))^2]
≤ c n_2^{−4/(d+2α+4)} E[h_1^d n_1^{−2} Σ_{i≠j}^{n_1} (∫ K_{h_1 i}(z) K_{h_1 j}(z) |ξ_i| (1 + T(Z_j)) dψ(z))^2].

The expectation of the right-hand side of the above inequality turns out to be O(1) by using the same argument as in K–N. So, h_1^d n_1^{−2} Σ_{i≠j}^{n_1} ‖R̂_{n2}(Z_j) − R(Z_j)‖^2 (∫ K_{h_1 i}(z) K_{h_1 j}(z) |ξ_i| dψ(z))^2 = o_p(1). This, in turn, implies that the second factor in A_{n31} is o_p(1). The same method as in K–N also leads to h_1^d n_1^{−2} Σ_{i≠j}^{n_1} (∫ K_{h_1 i}(z) K_{h_1 j}(z) |ξ_i| ‖Q(Z_j)‖ dψ(z))^2 = O_p(1); hence A_{n32} = o_p(1). Therefore, B_{n1} = o_p(1) and B_{n2} = o_p(1), thereby proving the first claim in (6.6), hence the lemma. □

Proof of Theorem 3.3. Before we prove the consistency of the MD test, let us consider the convergence of the MD estimator. Under the alternative hypothesis H_a : µ(x) = m(x), one can verify that the right-hand side of (3.3) can be written as the sum of the following two terms:

A_{n1} = ∫ (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) m(X_i) · (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) Q̂_{n2}(Z_i) dψ̂_{h_2}(z),
A_{n2} = ∫ (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) ε_i · (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) Q̂_{n2}(Z_i) dψ̂_{h_2}(z).

Adding and subtracting Q(Z_i) from Q̂_{n2}(Z_i), on the one hand, one has

A_{n1} = ∫ (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) m(X_i) · D_n(z) dψ̂_{h_2}(z) + ∫ (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) m(X_i) · (1/n_1) Σ_{i=1}^{n_1} K_{h_1 i}(z) Q(Z_i) dψ̂_{h_2}(z),

Page 29

2434

W. Song / Journal of Multivariate Analysis 99 (2008) 2406–2443

where Dn(z) is as in (6.2). The first term of An1is the order of op(1), while the second term

converges to?

An2=

n1

i=1

×1

n1

i=1

Similarly to proving the asymptotic results for Sn1+ Sn2 and Sn3+ Sn4 in the proof of

Theorem 3.1, we can show that both terms of An2are op(1). Recall thatˆθnsatisfies (3.3), indeed

we proved thatˆθn→ Σ−1

Adding and subtracting H(z) = E(m(X)|Z = z) from Yi, Mn(ˆθn) can be written as the sum

Sn1+ Sn2+ Sn3, where

??

??

??

?

n1

i=1

Define

?

?

σ2

H(z)Q(z)dG(z) in probability. On the other hand,

?

1

n1

?

n1

?

Kh1i(z)εi· Dn(z)dˆψh2(z) +

?

1

n1

n1

?

i=1

Kh1i(z)εi

Kh1i(z)Q(Zi)dˆψh2(z).

0

?

H(z)Q(z)dG(z).

Sn1=

1

n1

n1

?

n1

?

1

n1

1

i=1

Kh1i(z)(Yi− H(Zi))

?2

dˆψh2(z),

Sn2=

1

n1

i=1

Kh1i(z)(H(Zi) −ˆθ?

nˆQn2(Zi))

?

?2

dˆψh2(z),

Sn3 = 2

n1

?

n1

?

i=1

Kh1i(z)(Yi− H(Zi))

×

Kh1i(z)(H(Zi) −ˆθ?

nˆQn2(Zi))

?

dˆψh2(z).

C∗

n=

1

n2

1

n1

?

(σ2

i=1

K2

h1i(z)(Yi− H(Zi))2dˆψh2(z),

? ??

ε+ E[(m(X) − H(Z))2|Z = z] + (H(z) − θ?Q(z))2.

Similarly to the proof of Theorem 3.2, one can show that n1hd/2

distribution. Let θ = Σ−1

ˆθ?

??

??

?

n1

i=1

Γ∗= 2

∗(z) = σ2

∗(z))2g(z)dψ(z)

K(u)K(u + v)du

?2

dv,

1

(Sn1− C∗

n) → N(0,Γ∗) in

0

?

H(z)Q(z)dG(z), adding and subtracting θ?Q(Zi) from H(Zi) −

nˆQn2(Zi), then Sn2equals the sum of Sn21, Sn22and Sn23, where

Sn21=

1

n1

n1

?

i=1

Kh1i(z)(H(Zi) − θ?Q(Zi))

?2

dˆψh2(z),

?

?

Sn22 = −2

1

n1

n1

?

n1

?

Kh1i(z)(ˆθ?

i=1

Kh1i(z)(H(Zi) − θ?Q(Zi))

×

1

nˆQn2(Zi) − θ?Q(Zi))

dˆψh2(z),

Page 30

W. Song / Journal of Multivariate Analysis 99 (2008) 2406–2443

2435

Sn23= 2

??

1

n1

n1

?

i=1

Kh1i(z)(ˆθ?

nˆQn2(Zi) − θ?Q(Zi))

?2

dˆψh2(z).

Routine calculation and Lemma 3.2 show that Sn21=?[H(z) − θ?Q(z)]2dG(z) + op(1), while

??

??

which is op(1) by Theorem 3.3 and the asymptotic property of ˆQn2(Zi) discussed in the proof

of Theorem 3.1. Hence by the Cauchy–Schwarz inequality, Sn22= op(1). Sn3= op(1) can be

obtained by using the Cauchy–Schwarz inequality again.

Note that Cn= C∗

2

n2

1

i=1

1

n2

1

i=1

Both Cn1and Cn2are op(1). Hence Cn− C∗

Next, we shall show thatˆΓn= Γ∗+op(1). To this end, write ei= Yi− H(Zi), w(Zi,ˆθn) =

ˆθ?

n

?

Expanding the square of the integral, one can rewriteˆΓn=?10

An1=2hd

n2

1

i?=j

An2=2hd

n2

1

i?=j

An3=2hd

n2

1

i?=j

An4=2hd

n2

1

i?=j

An5 = −4hd

n2

1

i?=j

×

Sn23is bounded above by

2

1

n1

n1

?

i=1

Kh1i(z)(ˆθn− θ)?ˆQn2(Zi)

?2

dˆψh2(z)

+2

1

n1

n1

?

i=1

Kh1i(z)(ˆQn2(Zi) − Q(Zi))?θ

?2

dˆψh2(z),

n+ Cn1+ Cn2, where

?

?

Cn1=

n1

?

n1

?

K2

h1i(z)(Yi− H(Zi))(H(Zi) −ˆθ?

nˆQn2(Zi))dˆψh2(z),

Cn2=

K2

h1i(z)(H(Zi) −ˆθ?

nˆQn2(Zi))2dˆψh2(z).

n= op(1).

nˆQn2(Zi) − H(Zi), then

ˆΓn= 2h1n−2

1

i?=j=1

??

Kh1i(z)Kh1j(z)(ei− w(Zi,ˆθn))(ej− w(Zj,ˆθn))dˆψh2(z)

?2

.

j=1Anj, where

1

?

?

?

?

1

??

??

??

??

?

Kh1i(z)Kh1j(z)eiw(Zj,ˆθn)dˆψh2(z)

Kh1i(z)Kh1j(z)eiejdˆψh2(z)

?2

1

Kh1i(z)Kh1j(z)eiw(Zj,ˆθn)dˆψh2(z)

?2

?2

1

Kh1i(z)Kh1j(z)w(Zi,ˆθn)ejdˆψh2(z)

1

Kh1i(z)Kh1j(z)w(Zi,ˆθn)w(Zj,ˆθn)dˆψh2(z)

?2

??

Kh1i(z)Kh1j(z)eiejdˆψh2(z)

?

?

Page 31

2436

W. Song / Journal of Multivariate Analysis 99 (2008) 2406–2443

??

Kh1i(z)Kh1j(z)w(Zi,ˆθn)ejdˆψh2(z)

??

Kh1i(z)Kh1j(z)w(Zi,ˆθn)w(Zj,ˆθn)dˆψh2(z)

??

Kh1i(z)Kh1j(z)w(Zi,ˆθn)ejdˆψh2(z)

??

Kh1i(z)Kh1j(z)w(Zi,ˆθn)w(Zj,ˆθn)dˆψh2(z)

An10 = −4hd

n2

1

i?=j

×

An6 = −4hd

1

n2

?

1

1

?

i?=j

Kh1i(z)Kh1j(z)eiejdˆψh2(z)

×

4hd

n2

?

An7 =

1

?

i?=j

Kh1i(z)Kh1j(z)eiejdˆψh2(z)

×

4hd

n2

?

1

?

An8 =

1

?

i?=j

Kh1i(z)Kh1j(z)eiw(Zj,ˆθn)dˆψh2(z)

×

?

?

An9 = −4hd

1

n2

?

1

?

i?=j

Khi(z)Kh1j(z)eiw(Zj,ˆθn)dˆψh2(z)

×

?

1

?

Kh1i(z)Kh1j(z)w(Zi,ˆθn)w(Zj,ˆθn)dˆψh2(z)

??

Kh1i(z)Kh1j(z)w(Zi,ˆθn)ejdˆψh2(z)

?

?

.

By taking the expectation, using Fubini’s theorem, we obtain

hd

1n−2

1

?

?

i?=j

??

??

Kh1i(z)Kh1j(z)|ei?ej|dψ(z)

?2

= Op(1),

(6.9)

hd

1n−2

1

i?=j

Kh1i(z)Kh1j(z)|ei|dψ(z)

?2

= Op(1),

k = 0,1.

(6.10)

By Lemma 3.2, (6.9), and arguing as in the proof of Lemma 6.5, one can verify that

An1→ Γ∗

1= 2

?

σ4

e(z)g2(z)/f2

Z(z)dz

? ??

K(u + v)K(u)du

?2

dv

in probability, where σ2

for An2, writeˆθ?

by consideringˆθn− θ0, ˆQn2(Zj) − Q(Zj) as a whole, respectively, An2can be written as the

sum of

e(z) = E[(Y − H(Z))2|Z = z] = σ2

nˆQn2(Zj) as (ˆθn−θ +θ)?(ˆQn2(Zj)− Q(Zj)+ Q(Zj)) and expand the integral

ε+ E[(m(X) − H(Z))2|Z = z]. As

2hd

1n−2

1

?

i?=j

??

Kh1i(z)Kh1j(z)ei(θ?Q(Zj) − H(Zj))dˆψh2(z)

?2

,

Page 32

W. Song / Journal of Multivariate Analysis 99 (2008) 2406–2443

2437

and a remainder. The above term converges to

?

and the remainder equals op(1) by the consistency ofˆθn, and a similar conditional argument on

ˆQn2(Zj) − Q(Zj) as in the proof of Lemma 6.1, together with (6.10) with k = 1.

The same argument leads to An3→ Γ∗

Adding and subtracting θ?Q(Zi) from w(Zi,ˆθn), θ?Q(Zj) from w(Zj,ˆθn), arguing as above,

one can show that

?

in probability. Next, write An5as the sum of

An51 = −4hd

n2

1

i?=j

×

An52 = −4hd

n2

1

i?=j

×

By Lemma 3.2, one can verify that An51=˜An51+ op(1), where

˜An51= −4hd

n2

1

i?=j

×

Clearly, E˜An51 = 0. Arguing as in K–N, one can show that E(˜A2

An51 = op(1). One can also show that An52 = op(1). These results imply An5 = op(1).

Similarly, we can show that Anj= op(1) for j = 6,7,8,9,10.

Note that Γ∗= Γ∗

Finally, we get

Γ∗

2= 2

σ2

e(z)[H(z) − θ?Q(z)]2g2(z)/f2

Z(z)dz

? ??

K(u + v)K(u)du

?2

dv,

2in probability.

An4→ Γ∗

3= 2

[H(z) − θ?Q(z)]4g2(z)/f2

Z(z)dz

? ??

K(u + v)K(u)du

?2

dv

1

?

Kh1i(z)Kh1j(z)ei[θ?Q(Zj) − H(Zj)]dˆψh2(z)

?

Kh1i(z)Kh1j(z)ei[ˆθ?

??

Kh1i(z)Kh1j(z)eiejdˆψh2(z)

?

?

1

??

Kh1i(z)Kh1j(z)eiejdˆψh2(z)

?

nˆQn2(Zj) − θ?Q(Zj)]dˆψh2(z)

?

.

1

?

Kh1i(z)Kh1j(z)ei[θ?Q(Zj) − H(Zj)]dˆψh2(z)

??

Kh1i(z)Kh1j(z)eiejdˆψh2(z)

?

?

.

n51) = O(n−1

1h−d

1). Hence

1+ 2Γ∗

2+ Γ∗

3, we obtainˆΓn→ Γ∗in probability.

ˆDn= n1hd/2

and the theorem follows immediately.

1

ˆΓ−1/2

n

(Sn1− C∗

n) + n1hd/2

1

ˆΓ−1/2

n

?

[H(z) − θ?Q(z)]2dG(z) + op(n1hd/2

1

)

?
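The factor $\int (\int K(u) K(u+v)\, du)^2 dv$ appears in each of the limiting variances $\Gamma^*$, $\Gamma_1^*$, $\Gamma_2^*$ and $\Gamma_3^*$ above. For the standard normal kernel (an assumption of this sketch; the paper only requires its stated kernel conditions), the inner integral is the $N(0,2)$ density at $v$, so the constant equals $1/(2\sqrt{2\pi}) \approx 0.1995$. A short numerical check:

```python
import numpy as np

# Grid fine and wide enough that the Gaussian tails are negligible.
u = np.linspace(-10, 10, 4001)
du = u[1] - u[0]
K = np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)  # standard normal kernel

# Since K is symmetric, the correlation integral ∫K(u)K(u+v)du equals the
# convolution (K*K)(v), which np.convolve approximates on the same grid.
conv = np.convolve(K, K, mode="same") * du

const = np.sum(conv**2) * du  # ∫ (∫K(u)K(u+v)du)^2 dv
print(const)                  # ≈ 1/(2*sqrt(2*pi)) ≈ 0.1995
```

For other kernels satisfying the same conditions the constant changes but stays finite and positive, so it only rescales the limiting variance.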

Proof of Theorem 3.4. Adding and subtracting $\theta_0' \hat Q_{n2}(Z_i)$ from $Y_i$ on the right-hand side of (3.3), (3.3) becomes (3.4). Adding and subtracting $\theta_0' Q(Z_i)$ and $Q(Z_i)$ from $Y_i$ and $\hat Q_{n2}(Z_i)$, respectively, and letting $\xi_i = \varepsilon_i + \theta_0' r(X_i) - \theta_0' Q(Z_i)$, under the local alternatives (3.7) the right-hand side of (3.4) can be written as the sum of the following six terms:

$$B_{n1} = \int \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z)\, \xi_i \cdot \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z)\, Q(Z_i) \, d\hat\psi_{h_2}(z),$$

$$B_{n2} = \int \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z)\, \xi_i \cdot D_n(z) \, d\hat\psi_{h_2}(z),$$

$$B_{n3} = \gamma_n \int \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z)\, v(X_i) \cdot \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z)\, Q(Z_i) \, d\hat\psi_{h_2}(z),$$

$$B_{n4} = \gamma_n \int \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z)\, v(X_i) \cdot D_n(z) \, d\hat\psi_{h_2}(z),$$

$$B_{n5} = -\int \theta_0' D_n(z) \cdot D_n(z) \, d\hat\psi_{h_2}(z),$$

$$B_{n6} = -\int \theta_0' D_n(z) \cdot \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z)\, Q(Z_i) \, d\hat\psi_{h_2}(z).$$

Note that the $\xi_i$ are i.i.d. with mean 0 and finite second moment. Arguing as in the proof of the asymptotic normality of $S_{n4}$ in the proof of Theorem 3.1, with $Y_i - \theta_0' Q(Z_i)$ replaced by $\xi_i$, one can show that $\sqrt{n_1}\, B_{n1} \to N_d(0, \Sigma)$ in distribution as $n \to \infty$. While $\sqrt{n_1}\, B_{n2} = o_p(1)$ can be shown in a similar way to showing $S_{n1} + S_{n2} = o_p(1)$, $\sqrt{n_1}\, B_{n4} = o_p(1)$ and $\sqrt{n_1}\, B_{n6} = o_p(1)$ can be proven similarly as in proving $S_{n7} + S_{n8}$ in the proof of Theorem 3.1. Then $\sqrt{n_1}\, B_{n5} = o_p(1)$ as well.

Now let us consider $\sqrt{n_1}\, B_{n3}$. Denote

$$\eta_v(z) = E(K_{h_1}(z - Z)\, v(X)), \qquad \eta_Q(z) = E(K_{h_1}(z - Z)\, Q(Z)); \qquad (6.11)$$

then

$$\sqrt{n_1}\, B_{n3} = \gamma_n \sqrt{n_1} \int \left[ \left( \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z)\, v(X_i) - \eta_v(z) \right) + (\eta_v(z) - f_Z(z) V(z)) + f_Z(z) V(z) \right]$$
$$\times \left[ \left( \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z)\, Q(Z_i) - \eta_Q(z) \right) + \left( \eta_Q(z) - f_Z(z) Q(z) \right) + f_Z(z) Q(z) \right] d\hat\psi_{h_2}(z).$$

Expanding the product will result in nine terms. All terms can be shown to be of the order $o_p(h_1^{d/4})$ except

$$\gamma_n \sqrt{n_1} \int f_Z^2(z)\, V(z) Q(z) \, d\hat\psi_{h_2}(z) = \gamma_n \sqrt{n_1} \int V(z) Q(z) \, dG(z) + \gamma_n \sqrt{n_1} \int f_Z^2(z)\, V(z) Q(z) \left[ 1/\tilde f_{Zh_2}^2(z) - 1/f_Z^2(z) \right] dG(z).$$

The second term is $o_p(1)$ by the conditions (n), (h1) and Lemma 3.2. Hence

$$\sqrt{n_1}\, B_{n3} = \sqrt{n_1}\, \gamma_n \int V(z) Q(z) \, dG(z) + o_p(1).$$

Note that the random matrix multiplying $\hat\theta_n - \theta_0$ in (3.4) does not depend on the local alternative hypothesis, so it converges to $\Sigma_0$ in probability. Thus, we have shown that

$$\sqrt{n_1}\, (\hat\theta_n - \theta_0) - \sqrt{n_1}\, \gamma_n\, \Sigma_0^{-1} \int V(z) Q(z) \, dG(z) \to_d N_q(0, \Sigma_0^{-1} \Sigma \Sigma_0^{-1}), \qquad (6.12)$$

where $\Sigma_0$ and $\Sigma$ are the same as in Theorem 3.1.

Now, let us consider the local power of the MD test. Under the local alternative (3.7), $Y_i = \theta_0' r(X_i) + \gamma_n v(X_i) + \varepsilon_i$. Then

$$M_n(\hat\theta_n) = \int \left[ \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z) [Y_i - \hat\theta_n' \hat Q_{n2}(Z_i)] \right]^2 d\hat\psi_{h_2}(z)$$
$$= \int \left[ \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z) [\theta_0' r(X_i) + \gamma_n v(X_i) + \varepsilon_i - \theta_0' Q(Z_i) + \theta_0' Q(Z_i) - \hat\theta_n' \hat Q_{n2}(Z_i)] \right]^2 d\hat\psi_{h_2}(z).$$

Expanding the integral, $M_n(\hat\theta_n)$ can be written as the sum $\sum_{j=1}^{6} T_{nj}$, where

$$T_{n1} = \int \left[ \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z) [\varepsilon_i + \theta_0' r(X_i) - \theta_0' Q(Z_i)] \right]^2 d\hat\psi_{h_2}(z),$$

$$T_{n2} = \gamma_n^2 \int \left[ \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z)\, v(X_i) \right]^2 d\hat\psi_{h_2}(z),$$

$$T_{n3} = \int \left[ \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z) [\theta_0' Q(Z_i) - \hat\theta_n' \hat Q_{n2}(Z_i)] \right]^2 d\hat\psi_{h_2}(z),$$

$$T_{n4} = 2\gamma_n \int \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z) [\varepsilon_i + \theta_0' r(X_i) - \theta_0' Q(Z_i)] \cdot \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z)\, v(X_i) \, d\hat\psi_{h_2}(z),$$

$$T_{n5} = 2 \int \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z) [\varepsilon_i + \theta_0' r(X_i) - \theta_0' Q(Z_i)] \cdot \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z) [\theta_0' Q(Z_i) - \hat\theta_n' \hat Q_{n2}(Z_i)] \, d\hat\psi_{h_2}(z),$$

$$T_{n6} = 2\gamma_n \int \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z)\, v(X_i) \cdot \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z) [\theta_0' Q(Z_i) - \hat\theta_n' \hat Q_{n2}(Z_i)] \, d\hat\psi_{h_2}(z).$$

For $T_{n2}$,

$$n_1 h_1^{d/2}\, T_{n2} = n_1 h_1^{d/2}\, \gamma_n^2 \int \left[ \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z)\, v(X_i) \right]^2 d\hat\psi_{h_2}(z) \to \int [V(z)]^2 \, dG(z) \qquad (6.13)$$

in probability.

To deal with $T_{n3}$, note that

$$n_1 h_1^{d/2}\, T_{n3} = n_1 h_1^{d/2} \int \left[ \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z) \big[ (\hat\theta_n - \theta_0 + \theta_0)' (\hat Q_{n2}(Z_i) - Q(Z_i) + Q(Z_i)) - \theta_0' Q(Z_i) \big] \right]^2 d\hat\psi_{h_2}(z)$$

can be written as the sum of the following six terms:

$$T_{n31} = n_1 h_1^{d/2} \int \left[ \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z) (\hat\theta_n - \theta_0)' (\hat Q_{n2}(Z_i) - Q(Z_i)) \right]^2 d\hat\psi_{h_2}(z),$$

$$T_{n32} = n_1 h_1^{d/2} \int \left[ \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z) (\hat\theta_n - \theta_0)' Q(Z_i) \right]^2 d\hat\psi_{h_2}(z),$$

$$T_{n33} = n_1 h_1^{d/2} \int \left[ \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z)\, \theta_0' (\hat Q_{n2}(Z_i) - Q(Z_i)) \right]^2 d\hat\psi_{h_2}(z),$$

$$T_{n34} = 2 n_1 h_1^{d/2} \int \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z) (\hat\theta_n - \theta_0)' (\hat Q_{n2}(Z_i) - Q(Z_i)) \cdot \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z) (\hat\theta_n - \theta_0)' Q(Z_i) \, d\hat\psi_{h_2}(z),$$

$$T_{n35} = 2 n_1 h_1^{d/2} \int \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z) (\hat\theta_n - \theta_0)' (\hat Q_{n2}(Z_i) - Q(Z_i)) \cdot \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z)\, \theta_0' (\hat Q_{n2}(Z_i) - Q(Z_i)) \, d\hat\psi_{h_2}(z),$$

$$T_{n36} = 2 n_1 h_1^{d/2} \int \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z) (\hat\theta_n - \theta_0)' Q(Z_i) \cdot \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z)\, \theta_0' (\hat Q_{n2}(Z_i) - Q(Z_i)) \, d\hat\psi_{h_2}(z).$$

By (6.12), and conditional expectation arguments on $\hat Q_{n2}(Z_i)$, one can show that $T_{n3k} = o_p(1)$ for $k = 1, 3, 4, 5, 6$.

For $T_{n32}$, we have

$$T_{n32} = n_1 h_1^{d/2}\, (\hat\theta_n - \theta_0)' \int \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z)\, Q(Z_i) \cdot \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z)\, Q'(Z_i) \, d\hat\psi_{h_2}(z)\, (\hat\theta_n - \theta_0).$$

The matrix between the two factors $\hat\theta_n - \theta_0$ converges to $\Sigma_0$ in probability, and from (6.12) we see that

$$\sqrt{n_1 h_1^{d/2}}\, (\hat\theta_n - \theta_0) = \gamma_n \sqrt{n_1 h_1^{d/2}}\, \Sigma_0^{-1} \int V(z) Q(z) \, dG(z) + o_p(1).$$

Therefore, we can show that

$$n_1 h_1^{d/2}\, T_{n3} = \int V(z) Q'(z) \, dG(z) \cdot \Sigma_0^{-1} \cdot \int V(z) Q(z) \, dG(z) + o_p(1). \qquad (6.14)$$

To show $n_1 h_1^{d/2}\, T_{n4} = o_p(1)$, recall the notation $\eta_v(z)$ defined in (6.11), and write

$$T_{n4} = 2\gamma_n \int \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z) [\varepsilon_i + \theta_0' r(X_i) - \theta_0' Q(Z_i)] \cdot \eta_v(z) \, d\hat\psi_{h_2}(z)$$
$$+ 2\gamma_n \int \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z) [\varepsilon_i + \theta_0' r(X_i) - \theta_0' Q(Z_i)] \cdot \left( \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z)\, v(X_i) - \eta_v(z) \right) d\hat\psi_{h_2}(z).$$

For the first term, one has

$$\int \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1i}(z) [\varepsilon_i + \theta_0' r(X_i) - \theta_0' Q(Z_i)] \cdot E K_{h_1}(z - Z) v(X) \, d\psi(z)$$
$$= \frac{1}{n_1}\sum_{i=1}^{n_1} \int K_{h_1i}(z)\, E K_{h_1}(z - Z) v(X) \, d\psi(z)\, [\varepsilon_i + \theta_0' r(X_i) - \theta_0' Q(Z_i)] = O_p(1/\sqrt{n_1}),$$

so that

$$n_1 h_1^{d/2}\, T_{n4} = n_1 h_1^{d/2} \left[ \gamma_n O_p(1/\sqrt{n_1}) + \gamma_n O_p(1/(n_1 h_1^d)) \right] = o_p(1)$$

by assumption (h1). Similarly, one can obtain that $n_1 h_1^{d/2}\, T_{n5} = o_p(1) = n_1 h_1^{d/2}\, T_{n6}$.

Write $\xi_i = \varepsilon_i + \theta_0' r(X_i) - \theta_0' Q(Z_i)$. Then $\hat C_n$ can be written as

$$\hat C_n = \frac{1}{n_1^2} \sum_{i=1}^{n_1} \int K_{h_1i}^2(z) \left[ \xi_i + \gamma_n v(X_i) + (\theta_0 - \hat\theta_n)' \hat Q_{n2}(Z_i) + (Q(Z_i) - \hat Q_{n2}(Z_i))' \theta_0 \right]^2 d\hat\psi_{h_2}(z),$$

which, by expanding the square in the integrand, equals $\hat C_{n1} + \hat C_{n2}$, where $\hat C_{n1} = n_1^{-2} \sum_{i=1}^{n_1} \int K_{h_1i}^2(z)\, \xi_i^2 \, d\hat\psi_{h_2}(z)$ and $\hat C_{n2}$ is the remainder. By the consistency of $\hat\theta_n$ and the conditional argument on $\hat Q_{n2}(Z_i)$, one can verify that $n_1 h_1^{d/2}\, \hat C_{n2} = o_p(1)$.

To see the asymptotic behavior of $\hat\Gamma_n$ under the local alternatives, we use the same technique. Adding and subtracting $\theta_0' Q(Z_i)$ and $\theta_0' Q(Z_j)$ from $Y_i - \hat\theta_n' \hat Q_{n2}(Z_i)$ and $Y_j - \hat\theta_n' \hat Q_{n2}(Z_j)$, respectively, one obtains

$$\hat\Gamma_n = \frac{2h_1^d}{n_1^2} \sum_{i \neq j} \left( \int K_{h_1i}(z) K_{h_1j}(z)\, \xi_i \xi_j \, d\hat\psi_{h_2}(z) \right)^2 + V_n.$$

One can show that $V_n = o_p(1)$ by the consistency of $\hat\theta_n$, the Cauchy–Schwarz inequality applied to the double sum, and the facts (6.9) and (6.10), while the first term converges to $\Gamma$ in probability. Therefore,

$$\hat D_n = n_1 h_1^{d/2}\, \hat\Gamma_n^{-1/2} (T_{n1} - \hat C_{n1}) + n_1 h_1^{d/2}\, \hat\Gamma_n^{-1/2} (\hat C_{n1} - \hat C_n) + n_1 h_1^{d/2}\, \hat\Gamma_n^{-1/2} (T_{n2} + T_{n3}) + o_p(1),$$

and the theorem follows by noting that $n_1 h_1^{d/2}\, \hat\Gamma_n^{-1/2} (T_{n1} - \hat C_{n1})$ asymptotically has a standard normal distribution, together with (6.13) and (6.14). $\Box$
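The proof of Theorem 3.4 repeatedly uses that the $\xi_i = \varepsilon_i + \theta_0' r(X_i) - \theta_0' Q(Z_i)$ are i.i.d. with mean zero; with $Q(z) = E(r(X) \mid Z = z)$ this is the tower property of conditional expectation. A minimal Monte Carlo illustration follows; the normal design, $r(x) = x$, and the unit variances are assumptions of this sketch, chosen so that $E(X \mid Z) = Z/2$ is available in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
X = rng.normal(size=n)    # latent design, N(0, 1)
u = rng.normal(size=n)    # measurement error, N(0, 1)
eps = rng.normal(size=n)  # regression error, N(0, 1)
Z = X + u                 # observed covariate

# With r(x) = x and these variances, Q(z) = E(r(X) | Z = z) = z / 2,
# so xi = eps + r(X) - Q(Z) has mean zero by the tower property.
xi = eps + X - Z / 2
print(xi.mean())  # ≈ 0
```

The same cancellation is what drives the centering of $B_{n1}$ and $T_{n1}$ in the proof: only the local-alternative component $\gamma_n v(X_i)$ contributes a drift.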

Acknowledgments

The author would like to thank the Associate Editor and two referees for their helpful and

critical comments, which led to substantial improvements in the presentation of the paper.

References

[1] T.W. Anderson, Estimating linear statistical relationships, Ann. Statist. 12 (1984) 1–45.

[2] P.J. Bickel, Y. Ritov, Efficient estimation in the errors in variables model, Ann. Statist. 15 (2) (1987) 513–540.

[3] D. Bosq, Nonparametric Statistics for Stochastic Processes: Estimation and Prediction, 2nd edition, in: Springer

Lecture Notes in Statistics, vol. 110, Springer-Verlag, New York, Inc, 1998.

[4] R.J. Carroll, P. Hall, Optimal rates of convergence for deconvoluting a density, J. Amer. Statist. Assoc. 83 (1988) 1184–1186.

[5] R.J. Carroll, C.H. Spiegelman, Diagnostics for nonlinearity and heteroscedasticity in errors in variables regression,

Technometrics 34 (1992) 186–196.

[6] R.J. Carroll, D. Ruppert, L.A. Stefanski, Measurement Error in Nonlinear Models, Chapman & Hall/CRC, Boca

Raton, 1995.

[7] C. Cheng, J.W. Van Ness, Statistical Regression with Measurement Error, Arnold, London, 1999.

[8] C.L. Cheng, A.G. Kukush, A goodness-of-fit test for a polynomial errors-in-variables model, Ukrainian Math. J.

56 (2004) 527–543.

[9] E. Masry, Strong consistency and rates for deconvolution of multivariate densities of stationary processes, Stochastic

Process. Appl. 47 (1993) 53–74.

[10] R.L. Eubank, C.H. Spiegelman, Testing the goodness of fit of a linear model via nonparametric regression

techniques, J. Amer. Statist. Assoc. 85 (1990) 387–392.

[11] R.L. Eubank, J.D. Hart, Testing the goodness of fit in regression via order selection criteria, Ann. Statist. 20 (1992)

1412–1425.

[12] R.L. Eubank, J.D. Hart, Commonality of CUMSUM, von Neumann and smoothing based goodness-of-fit tests,

Biometrika 80 (1993) 89–98.

[13] J. Fan, On the optimal rates of convergence for nonparametric deconvolution problems, Ann. Statist. 19 (1991)

1257–1272.

[14] J. Fan, Asymptotic normality for deconvolution kernel density estimators, Sankhyā Ser. A 53 (1991) 97–110.

[15] J. Fan, Y.K. Truong, Nonparametric regression with errors in variables, Ann. Statist. 21 (1993) 1900–1925.

[16] W.A. Fuller, Measurement Error Models, Wiley, New York, 1987.

[17] L.J. Gleser, Estimation in a multivariate “errors in variables” regression model: Large sample results, Ann. Statist.

9 (1) (1981) 24–44.

[18] J.D. Hart, Nonparametric Smoothing and Lack-of-Fit Tests, Springer-Verlag, New York, Inc, 1997.

[19] H.L. Koul, P. Ni, Minimum distance regression model checking, J. Statist. Plann. Inference 119 (1) (2004)

109–141.

[20] L.A. Stefanski, R.J. Carroll, Deconvolution-based score tests in measurement error models, Ann. Statist. 19 (1991)

249–259.

[21] W. Stute, Nonparametric model checks for regression, Ann. Statist. 25 (1997) 613–641.

[22] W. Stute, S. Thies, L.X. Zhu, Model checks for regression: An innovation process approach, Ann. Statist. 26 (1998)

1916–1934.

[23] J.X. Zheng, A consistent test of functional form via nonparametric estimation techniques, J. Econometrics 75 (1996)
263–289.


[24] L.X. Zhu, W.X. Song, H.J. Cui, Testing lack-of-fit for a polynomial errors-in-variables model, Acta Math. Appl.

Sin. Engl. Ser. 19 (2003) 353–362.

[25] L.X. Zhu, H.J. Cui, K.W. Ng, Some properties of a lack-of-fit test for a linear errors in variables model, Acta Math.

Appl. Sinica 20 (4) (2004) 533–540.

[26] L.X. Zhu, H.J. Cui, Testing lack-of-fit for general linear errors in variables models, Statistica Sinica 15 (4) (2005)

1049–1068.