
Journal of Multivariate Analysis 99 (2008) 2406–2443

www.elsevier.com/locate/jmva

Model checking in errors-in-variables regression

Weixing Song

Department of Statistics, Kansas State University, Manhattan, KS 66502, USA

Received 28 January 2007

Available online 10 March 2008

Abstract

This paper discusses a class of minimum distance tests for fitting a parametric regression model to a class of regression functions in the errors-in-variables model. These tests are based on certain minimized distances between a nonparametric regression function estimator and a deconvolution kernel estimator of the conditional expectation of the parametric model being fitted. The paper establishes the asymptotic normality of the proposed test statistics under the null hypothesis and that of the corresponding minimum distance estimators. We also prove the consistency of the proposed tests against a fixed alternative and obtain the asymptotic distributions under general local alternatives. Simulation studies show that the testing procedures preserve the finite-sample level well and compare favorably in power.

© 2008 Elsevier Inc. All rights reserved.

AMS 1991 subject classifications: primary 62G08; secondary 62G10

Keywords: Errors-in-variables model; Deconvolution kernel estimator; Minimum distance; Lack-of-fit test

1. Introduction

In the errors-in-variables regression model of interest here, one observes (Z, Y) obeying the model

$$Y = \mu(X) + \varepsilon, \qquad Z = X + u, \tag{1.1}$$

where X is an unobservable d-dimensional random design variable. The random variables (X, u, ε) are assumed to be mutually independent, with u d-dimensional and ε one-dimensional, having E(ε) = 0, E(u) = 0. The marginal densities of X and u will be denoted by f_X and f_u, respectively. For the sake of identifiability, the density f_u is assumed to be known. This is

E-mail address: weixing@ksu.edu.

0047-259X/$ - see front matter © 2008 Elsevier Inc. All rights reserved.

doi:10.1016/j.jmva.2008.02.034


a common and standard assumption in the literature on errors-in-variables regression models. The density f_X and the distribution of ε may be unknown.

Errors-in-variables regression models have received continuing attention in the statistical

literature. For some literature reviews, see [17,1,16,2,4,13–15,6,7] and the references therein.

Most of the existing literature has focused on the estimation problem. The lack-of-fit testing

problem has not been discussed thoroughly. Only some sporadic results on this topic can be found

in the literature. See [16,5] for some informal lack-of-fit tests in the linear errors-in-variables

regression model. The problem of interest in this paper is to develop tests for the hypothesis

$$H_0 : \mu(x) = \theta_0^\top r(x) \ \text{ for some } \theta_0 \in \mathbb{R}^q, \quad \text{versus} \quad H_1 : H_0 \text{ is not true}, \tag{1.2}$$

in the model (1.1).

Many interesting and profound results, on the other hand, are available for the above testing problem in the absence of errors in the independent variables, that is, for ordinary regression models. For instance, [10–12,18,21,23,22], among others, give such results.

For the errors-in-variables model, in the case r(x) = x, Zhu, Cui and Ng [25] found a necessary and sufficient condition for the linearity of the conditional expectation E(ε|Z) with respect to Z. Based on this fact, they constructed a score-type lack-of-fit test. This test is of limited use, since the normality assumptions on the design variable and the measurement errors are rather restrictive. Zhu, Song and Cui [24] and Cheng and Kukush [8] independently extended the method of Zhu, Cui and Ng to deal with a polynomial errors-in-variables model without assuming normality. The model checking problem for general r(x) was studied by Zhu and Cui [26]. After correcting for the bias of the conditional expectation, given Z, of the least squares residuals, they constructed a score-type test based on the modified residuals, but the theoretical arguments still require the density function of X to be known up to an unknown parameter. This restriction is removed in the current developments. Cheng and Kukush [8] do not require the density function of X to be known, but their procedure places very strict restrictions on the moments of the predictor and the measurement error, and it is also computationally intensive.

The paper is organized as follows. Section 2 introduces the construction of the test. Section 3 states the needed assumptions and the main results; a multidimensional extension of Lemma A.1 in [20], the consistency of the test against certain fixed alternatives, and its asymptotic power against a class of local alternatives are also stated there. Section 4 includes some results from finite sample simulation studies. The conclusion and some further discussion of the MD test are presented in Section 5. All the technical proofs are postponed to Section 6.

2. Construction of test

The way to construct the tests here is to first recognize that the independence of X and ε together with E(ε) = 0 imply that ν(z) = E(Y|Z = z) = E(μ(X)|Z = z). Thus one can consider the new regression model Y = ν(Z) + ζ, in which the error ζ is uncorrelated with Z and has mean 0. The problem of testing H₀ is now transformed into a test of ν(z) = ν_{θ₀}(z), where ν_θ(z) = θᵀE(r(X)|Z = z).

A very important question related to the above transformation is: are the two hypotheses, H₀¹ : μ(x) = m_θ(x) for all x, and H₀² : ν(z) = ν_θ(z) for all z, equivalent? The answer is generally negative, but note that E(m₁(X)|Z = z) = E(m₂(X)|Z = z) is equivalent to

$$\int m_1(x) f_X(x) f_u(z - x)\,dx = \int m_2(x) f_X(x) f_u(z - x)\,dx \quad \text{for all } z,$$

hence, if f_u(z − ·), as a distribution family with parameter z ∈ R^d, forms a complete family, then these two


hypotheses are indeed equivalent. This is the case, for example, for double exponential and

normal distributions.
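The reduction of testing μ to testing ν can be checked numerically. The sketch below uses illustrative specifics that are not in the paper (d = 1, μ(x) = θ₀x with θ₀ = 2, X standard normal, Laplace measurement error with known scale b, and arbitrary sample size and window): it compares a Monte Carlo estimate of ν(z₀) = E(Y | Z = z₀) with the integral formula ν(z) = ∫μ(x)f_X(x)f_u(z − x)dx / f_Z(z).

```python
import numpy as np

rng = np.random.default_rng(1)

theta0, b, sig_eps = 2.0, 0.5, 0.5            # hypothetical parameter choices
n = 200_000
X = rng.normal(0.0, 1.0, n)                   # unobservable design variable
Z = X + rng.laplace(0.0, b, n)                # observed surrogate, Z = X + u
Y = theta0 * X + rng.normal(0.0, sig_eps, n)  # model (1.1) with mu(x) = theta0 * x

z0 = 0.5

# (i) Monte Carlo: local average of Y over {|Z - z0| < 0.1} estimates nu(z0) = E(Y | Z = z0)
nu_mc = Y[np.abs(Z - z0) < 0.1].mean()

# (ii) the integral formula nu(z0) = ∫ mu(x) f_X(x) f_u(z0 - x) dx / f_Z(z0),
#      evaluated by a Riemann sum (the grid spacing cancels in the ratio)
x = np.linspace(-8.0, 8.0, 4001)
f_X = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # N(0, 1) density
f_u = np.exp(-np.abs(z0 - x) / b) / (2 * b)   # Laplace(0, b) density
nu_int = (theta0 * x * f_X * f_u).sum() / (f_X * f_u).sum()
```

The two estimates agree up to Monte Carlo and smoothing error, illustrating that fitting ν on the observable (Z, Y) is a valid surrogate for fitting μ on the unobservable X.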

For any z for which f_Z(z) > 0, we have ν(z) = ∫μ(x)f_X(x)f_u(z − x)dx / f_Z(z). If f_X is known, then f_Z is known and hence ν_θ is known except for θ. Let Q(z) = E(r(X)|Z = z). Now suppose (Y_i, Z_i), i = 1, 2, …, n, are independent and identically distributed copies of (Y, Z) from model (1.1), h is a bandwidth depending only on n, and K_{hi}(z) = K((z − Z_i)/h)/h^d for any kernel function K and bandwidth h. If we define

$$\bar T_n(\theta) = \int_I \left[\frac{1}{n f_Z(z)} \sum_{i=1}^{n} K_{hi}(z)\big(Y_i - \theta^\top Q(Z_i)\big)\right]^2 dG(z), \qquad \theta \in \mathbb{R}^q,$$

where G is a σ-finite measure on R^d and I is a compact subset of R^d, then one can see that $\bar T_n$ is indeed a weighted distance between a nonparametric kernel estimator and a parametric estimator of the regression function ν(z). Then we may use $\bar\theta_n = \arg\min_{\theta \in \mathbb{R}^q} \bar T_n(\theta)$ to estimate θ, and construct the test statistic through $\bar T_n(\bar\theta_n)$. The same method, called the minimum distance (MD) procedure, was used in the recent paper of Koul and Ni [19] (K–N) in the classical regression setup. One can see that, if f_X is known, the above test procedure would be a trivial extension of K–N.

Unfortunately, f_X is generally not known, and hence f_Z and Q are unknown. This makes the above procedure infeasible. To construct the test statistic, one needs estimators of f_Z and Q. In this connection, the deconvolution kernel density estimators are found to be useful here.

For any density L on R^d, let φ_L denote its characteristic function, and define

$$L_h(x) = \frac{1}{(2\pi)^d} \int_{\mathbb{R}^d} \exp(-\mathrm{i}\,t \cdot x)\, \frac{\varphi_L(t)}{\varphi_u(t/h)}\, dt, \qquad \hat f_{Xh}(x) = \frac{1}{n h^d} \sum_{i=1}^{n} L_h\!\left(\frac{x - Z_i}{h}\right), \qquad x \in \mathbb{R}^d, \tag{2.1}$$

where i = (−1)^{1/2}. The function $\hat f_{Xh}$ is called a deconvolution kernel density estimator, and it can be used to estimate f_X; see Masry [9]. Note that Q(z) = R(z)/f_Z(z), where

$$R(z) = \int r(x) f_X(x) f_u(z - x)\,dx, \qquad f_Z(z) = \int f_X(x) f_u(z - x)\,dx. \tag{2.2}$$

Then one can estimate Q(z) by $\hat Q_n(z) = \hat R_n(z)/\hat f_{Zh}(z)$, where

$$\hat R_n(z) = \int r(x)\,\hat f_{Xh}(x)\, f_u(z - x)\,dx, \qquad \hat f_{Zh}(z) = \int \hat f_{Xh}(x)\, f_u(z - x)\,dx.$$

At this point, it is worth mentioning that, by the definition of L_h and a direct calculation, one can show that $\hat f_{Zh}$ is nothing but the classical kernel estimator of f_Z with kernel L and bandwidth h; that is, $\hat f_{Zh}(z) = \sum_{i=1}^{n} L((z - Z_i)/h)/(n h^d)$. Our proposed inference procedures will be based on the analogs of $\bar T_n$, where Q(z) is replaced by the above estimator $\hat Q_n$, and f_Z is replaced by a classical kernel estimator in which a kernel other than L may be adopted.

It is well known that the convergence rates of deconvolution kernel density estimators are slower than those of classical kernel density estimators; see [9,20] and [14]. This creates extra difficulty when considering the asymptotic behavior of the analogs of the corresponding MD estimators and test statistics. In fact, the consistency of the corresponding MD estimator is still available, but its asymptotic normality and that of the corresponding MD test statistic may
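A minimal numerical sketch of the deconvolution estimator (2.1) follows. All specifics are illustrative assumptions, not choices made in the paper: d = 1, Laplace(0, b) measurement error with known scale b, and a kernel whose Fourier transform φ_L(t) = (1 − t²)³ is compactly supported on [−1, 1] — a common choice in the deconvolution literature, though not itself a probability density, so it only approximates the kernel class assumed in Section 3.

```python
import numpy as np

rng = np.random.default_rng(0)

b = 0.4                          # scale of the (assumed known) Laplace error density f_u
n = 5000
X = rng.normal(0.0, 1.0, n)      # f_X to be recovered: N(0, 1)
Z = X + rng.laplace(0.0, b, n)   # contaminated observations

t = np.linspace(-1.0, 1.0, 801)
dt = t[1] - t[0]
phi_L = (1 - t**2) ** 3          # Fourier transform of the kernel, supported on [-1, 1]

def L_h(v, h):
    # L_h(v) = (2 pi)^{-1} ∫ exp(-i t v) phi_L(t) / phi_u(t / h) dt, as in (2.1);
    # for Laplace(0, b) errors, phi_u(s) = 1 / (1 + b^2 s^2), so 1/phi_u(t/h) is a polynomial
    integrand = phi_L * (1.0 + (b * t / h) ** 2)
    # the integrand is even in t, so only the cosine part of the inversion survives
    return (np.cos(np.outer(v, t)) * integrand).sum(axis=1) * dt / (2 * np.pi)

def f_X_hat(xs, h):
    # deconvolution kernel density estimator of f_X, as in (2.1)
    return np.array([L_h((xi - Z) / h, h).mean() / h for xi in xs])

grid = np.linspace(-4.0, 4.0, 81)
est = f_X_hat(grid, h=0.2)       # h is an arbitrary illustrative bandwidth
```

Since the integrand at t = 0 equals 1, each L_h integrates to one, so the estimate carries total mass approximately 1 even though the division by φ_u(t/h) inflates high frequencies — the source of the slower deconvolution rates discussed above.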


not be obtained. We overcome this difficulty by using different bandwidths and splitting the full sample, say S, of size n into two subsamples, S₁ of size n₁ and S₂ of size n₂, then using the subsample S₂ to estimate f_X, hence Q(z), and the subsample S₁ to estimate the other quantities. The sample size allocation scheme is stated in Section 3; a more detailed discussion of this scheme can be found in Section 5. Without loss of generality, we number the observations in S₁ from 1 to n₁, and the observations in S₂ from n₁ + 1 to n. Also, all integration with respect to G in the following will be over the compact subset I.

To be precise, let

$$\tilde f_{Zh_2}(z) = \frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_2 i}(z), \qquad \hat f_{Xw}(x) = \frac{1}{n_2 w^d}\sum_{j=n_1+1}^{n} L_w\!\left(\frac{x - Z_j}{w}\right),$$

$$\hat R_{n_2}(z) = \int r(x)\,\hat f_{Xw_1}(x)\, f_u(z - x)\,dx, \qquad \hat f_{Zw_2}(z) = \int \hat f_{Xw_2}(x)\, f_u(z - x)\,dx, \qquad \hat Q_{n_2}(z) = \frac{\hat R_{n_2}(z)}{\hat f_{Zw_2}(z)},$$

then define, for θ ∈ R^q,

$$M_n(\theta) = \int \left[\frac{1}{n_1 \tilde f_{Zh_2}(z)} \sum_{i=1}^{n_1} K_{h_1 i}(z)\big(Y_i - \theta^\top \hat Q_{n_2}(Z_i)\big)\right]^2 dG(z), \tag{2.3}$$

with h₁, h₂ depending on n₁, and w₁, w₂ depending on n₂. One can easily see that M_n(θ) is a weighted distance between a nonparametric kernel estimator and a deconvolution kernel estimator of the regression function ν(z) under the null hypothesis. Then we may use

$$\hat\theta_n = \arg\inf_{\theta \in \mathbb{R}^q} M_n(\theta) \tag{2.4}$$

to estimate θ, and construct the test statistic through $M_n(\hat\theta_n)$. We first prove the consistency of $\hat\theta_n$ for θ₀, then the asymptotic normality of $\sqrt{n_1}(\hat\theta_n - \theta_0)$. Finally, let

$$\hat\zeta_i = Y_i - \hat\theta_n^\top \hat Q_{n_2}(Z_i), \qquad \hat C_n = n_1^{-2} \sum_{i=1}^{n_1} \int K_{h_1 i}^2(z)\, \hat\zeta_i^2\, d\hat\psi_{h_2}(z),$$

$$\hat\Gamma_n = 2 h_1^d\, n_1^{-2} \sum_{i \ne j = 1}^{n_1} \left[\int K_{h_1 i}(z) K_{h_1 j}(z)\, \hat\zeta_i \hat\zeta_j\, d\hat\psi_{h_2}(z)\right]^2, \qquad d\hat\psi_{h_2}(z) := \frac{dG(z)}{\tilde f_{Zh_2}^2(z)}.$$

We prove that the asymptotic null distribution of the normalized test statistic

$$\hat D_n = n_1 h_1^{d/2}\, \hat\Gamma_n^{-1/2}\big(M_n(\hat\theta_n) - \hat C_n\big) \tag{2.5}$$

is standard normal. Consequently, the test that rejects H₀ whenever $|\hat D_n| > z_{\alpha/2}$ has asymptotic size α, where z_α is the 100(1 − α)th percentile of the standard normal distribution.
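Because Y_i − θᵀQ(Z_i) is linear in θ, the minimum distance criterion is quadratic in θ, so the minimizer has a closed form and no numerical optimization is needed. The sketch below illustrates this with the known-f_X analogue $\bar T_n$ of Section 2 (so Q is computed by direct numerical integration rather than by deconvolution and sample splitting); the model, sample size, bandwidth, and kernel are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

theta0, b, n, h = 2.0, 0.4, 4000, 0.25     # hypothetical choices
X = rng.normal(0.0, 1.0, n)
Z = X + rng.laplace(0.0, b, n)
Y = theta0 * X + rng.normal(0.0, 0.5, n)   # null model with r(x) = x, q = 1

# Q(z) = E(r(X) | Z = z) = R(z) / f_Z(z), by numerical integration (f_X known here)
xg = np.linspace(-8.0, 8.0, 2001)
f_X = np.exp(-xg**2 / 2) / np.sqrt(2 * np.pi)

def Q(z):
    w = f_X * np.exp(-np.abs(z - xg) / b) / (2 * b)   # f_X(x) * f_u(z - x) on the grid
    return (xg * w).sum() / w.sum()

QZ = np.array([Q(zi) for zi in Z])

# Nadaraya-Watson smooths on a grid over I = [-1, 1], with G = Lebesgue measure:
# the inner average of the criterion at z is a(z) - theta * b(z)
zg = np.linspace(-1.0, 1.0, 101)
K = np.exp(-0.5 * ((zg[:, None] - Z[None, :]) / h) ** 2)   # Gaussian kernel weights
a = (K * Y).sum(axis=1) / K.sum(axis=1)
bz = (K * QZ).sum(axis=1) / K.sum(axis=1)

# The criterion ∫ (a(z) - theta * b(z))^2 dG(z) is quadratic in theta: closed-form minimizer
theta_hat = (a * bz).sum() / (bz**2).sum()
```

Under H₀ the two smooths satisfy a(z) ≈ θ₀ b(z) with the same smoothing bias, so theta_hat is close to θ₀; the same closed form applies to M_n(θ) with $\hat Q_{n_2}$ in place of Q, and for q > 1 it becomes a q × q linear system.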

3. Assumptions and main results

This section first states the various conditions needed in the subsequent sections. About the

errors, the underlying design and the integrating σ-finite measure G, we assume the following:


(e1) The random variables {(Z_i, Y_i) : Z_i ∈ R^d, Y_i ∈ R, i = 1, 2, …, n} are independent and identically distributed, with the conditional expectation ν(z) = E(Y|Z = z) satisfying ∫ν² dG < ∞, where G is a σ-finite measure on R^d.

(e2) 0 < σ_ε² = Eε² < ∞ and E‖r(X)‖² < ∞, where ‖·‖ denotes the usual Euclidean norm. The function δ²(z) = E[(θ₀ᵀr(X) − θ₀ᵀQ(Z))² | Z = z] is a.s. (G) continuous.

(e3) E|ε|^{2+δ} < ∞ and E‖r(X)‖^{2+δ} < ∞, for some δ > 0.

(e4) E|ε|⁴ < ∞ and E‖r(X)‖⁴ < ∞.

(u) The density function f_u is continuous and ∫|φ_u(t)| dt < ∞.

(f1) The density f_X and all of its first and second derivatives are continuous and bounded.

(f2) For some δ₀ > 0, the density f_Z is bounded below on the compact subset I_{δ₀} of R^d, where I_{δ₀} = {y ∈ R^d : max_{1≤j≤d} |y_j − z_j| ≤ δ₀, y = (y₁, …, y_d)ᵀ, z = (z₁, …, z_d)ᵀ, z ∈ I}.

(g) G has a continuous Lebesgue density g.

(q) Σ₀ = ∫ Q(z)Qᵀ(z) dG(z) is positive definite.

About the null model we need to assume the following:

(m1) There exist a positive continuous function J(z) and a positive number T₀ such that, for all t with ‖t‖ > T₀,

$$\|t\|^{-\alpha} \left|\frac{\int \big(r(z - x) - r(z)\big) \exp(-\mathrm{i}\,t^\top x)\, f_u(x)\,dx}{\varphi_u(t)}\right| \le J(z)$$

holds for some α ≥ 0 and all z ∈ R^d, and EJ²(Z) < ∞.

(m2) E‖r(Z)‖² < ∞ and EI²(Z) < ∞, where I(z) = ∫ ‖r(x)‖ f_u(z − x) dx.

About the kernel functions, we assume:

(L) The kernel function L is a density, symmetric around the origin, with sup_{t∈R^d} ‖t‖^α |φ_L(t)| < ∞; moreover, ∫ ‖v‖² L(v) dv < ∞ and ∫ ‖t‖^α |φ_L(t)| dt < ∞, with α as in (m1).

About the bandwidths and the sample sizes we assume the following:

(n) With n denoting the sample size, let n₁, n₂ be two positive integers such that n = n₁ + n₂ and n₂ = [n₁^b], with b > 1 + (d + 2α)/4, where α is as in (m1).

(h1) h₁ ∼ n₁^{−a}, where 0 < a < min(1/(2d), 4/(d(d + 4))).

(h2) h₂ = c₁(log(n₁)/n₁)^{1/(d+4)}.

(w1) w₁ = n₂^{−1/(d+4+2α)}.

(w2) w₂ = c₂(log(n₂)/n₂)^{1/(d+4)}.

Assumption (m1) is not as strict as it appears. Some commonly used regression functions, such as polynomial and exponential functions, indeed satisfy it, as shown below.

Example 1. Suppose d = q, r(x) = x, and u ∼ N_d(0, Σ_u). Then

$$\left|\frac{\int \big(r(z - x) - r(z)\big)\exp(-\mathrm{i}\,t^\top x)\, f_u(x)\,dx}{\varphi_u(t)}\right| = \left|\int x \exp(-\mathrm{i}\,t^\top x)\, f_u(x)\,dx\right| \cdot \exp\!\left(\frac{t^\top \Sigma_u t}{2}\right) = \left|\frac{\partial \varphi_u(t)}{\mathrm{i}\,\partial t}\right| \cdot \exp\!\left(\frac{t^\top \Sigma_u t}{2}\right) \le c\|t\|,$$

where the constant c depends only on Σ_u. Hence (m1) holds with α = 1 and J(z) ≡ c.
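The display in Example 1 can be verified numerically in the special case d = 1, u ∼ N(0, σ²), where it reduces to the exact identity |∫ x e^{−itx} f_u(x) dx| · exp(t²σ²/2) = σ²|t|. The sketch below (σ and the t values are arbitrary illustrative choices) checks this identity by direct numerical integration.

```python
import numpy as np

sigma = 0.8                                   # u ~ N(0, sigma^2); sigma is arbitrary
x = np.linspace(-12.0, 12.0, 200_001)
dx = x[1] - x[0]
f_u = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

ratios = []
for t in (1.0, 2.0, 5.0):
    # |∫ x exp(-i t x) f_u(x) dx| * exp(t^2 sigma^2 / 2), which should equal sigma^2 * |t|
    lhs = abs((x * np.exp(-1j * t * x) * f_u).sum() * dx) * np.exp(t**2 * sigma**2 / 2)
    ratios.append(lhs / (sigma**2 * t))
```

All three ratios come out equal to 1 up to numerical integration error, confirming the linear growth in ‖t‖ that makes (m1) hold with α = 1.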