
CONSISTENT SPECIFICATION TESTS FOR ORDERED

DISCRETE CHOICE MODELS*

Juan Mora and Ana I. Moro-Egido**

WP-AD 2006-17

Correspondence to: Juan Mora. Departamento de Fundamentos del Análisis Económico, Universidad de

Alicante, Apartado de Correos 99, 03080 Alicante, Spain. E-mail address: juan@merlin.fae.ua.es.

Editor: Instituto Valenciano de Investigaciones Económicas, S.A.

Primera Edición Julio 2006

Depósito Legal: V-3367-2006

IVIE working papers offer in advance the results of economic research under way in order to

encourage a discussion process before sending them to scientific journals for their final

publication.

* We are grateful to M.D. Collado, C. Muller and F. Peñaranda for helpful comments. Financial support from

Spanish MEC (SEJ2005-02829/ECON) and Ivie is gratefully acknowledged.

**J. Mora: Universidad de Alicante. A.I. Moro-Egido: Universidad de Granada.


CONSISTENT SPECIFICATION TESTS FOR ORDERED DISCRETE CHOICE

MODELS

Juan Mora and Ana I. Moro-Egido

ABSTRACT

We discuss how to test consistently the specification of an ordered discrete

choice model. Two approaches are considered: tests based on conditional moment

restrictions and tests based on comparisons between parametric and

nonparametric estimations. Following these approaches, various statistics are

proposed and their asymptotic properties are discussed. The performance of the

statistics is compared by means of simulations. A variant of the standard

conditional moment statistic and a generalization of Horowitz-Spokoiny’s statistic

perform best.

JEL Codes: C25, C52, C15.

Keywords: Specification Tests; Ordered Discrete Choice Models; Statistical Simulation.


1. INTRODUCTION

Ordered discrete choice variables often appear in Statistics and Econometrics

as a dependent variable. The outcomes of an ordered discrete choice variable

Y are usually labelled as 0,1,...,J. Given certain explanatory variables X = (X1,...,Xk)′, the researcher is usually interested in analysing whether one (or

some) of the proposed explanatory variables is significant or not, and/or providing

accurate estimates of the conditional probabilities Pr(Y = j | X = x), which

may be interesting by themselves or required in a first stage to derive a two-stage

estimator. Examples of ordered discrete choice dependent variables that have been

used in applied work include: education level attained by individuals (Jiménez and

Kugler 1987); female labour participation: working full-time, working part-time, or not working (Gustafsson and Stafford 1992); level of demand for a new product or service

(Klein and Sherman 1997); and number of children in a family (Calhoun 1989).

The parametric model that is most frequently used for an ordered discrete choice variable arises when one assumes the existence of a latent continuous dependent variable Y∗ for which a linear regression model Y∗ = X′β0 + u holds; the non-observed variable Y∗ and the observed variable Y are assumed to be related as follows:

Y = j  if  µ0,j−1 ≤ Y∗ < µ0,j,  for j = 0,1,...,J,   (1)

where µ0,−1 ≡ −∞, µ0,J ≡ +∞ and µ00, µ01, ..., µ0,J−1 are threshold parameters such that µ00 ≤ µ01 ≤ ... ≤ µ0,J−1. Assuming independence between u and X, relationship (1) induces the following specification for Y:

Pr(Y = j | X) = F(µ0,j − X′β0) − F(µ0,j−1 − X′β0),  for j = 0,1,...,J,   (2)

where F(·) is the distribution function of u, usually referred to as the “link function”. In a parametric framework, for the sake of identification of the model it is usually assumed that the first threshold parameter µ00 is zero; additionally, it is assumed that F(·) is entirely known, the most typical choices being the standard normal distribution (“ordered probit model”) and the logistic distribution (“ordered logit model”). With these assumptions we obtain a full parameterization of the conditional distribution Y | X = x, with parameter vector θ0 = (β0′, µ0′)′ ∈ Θ ⊂ R^{k+J−1}.
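To make the parameterization concrete, the cell probabilities in (2) for an ordered probit model can be computed directly from the normal distribution function. The sketch below is ours (illustrative function and variable names, with µ00 = 0 imposed for identification), not code from the paper:

```python
import numpy as np
from scipy.stats import norm

def cell_probs(X, beta, mu):
    """Pr(Y = j | X) for j = 0,...,J under (2) with F = standard normal CDF.
    mu holds the free thresholds mu_01 <= ... <= mu_{0,J-1}; mu_00 = 0 is imposed."""
    xb = X @ beta                                          # latent index X'beta
    cuts = np.concatenate(([-np.inf, 0.0], mu, [np.inf]))  # mu_{0,-1}, mu_00, ..., mu_{0,J}
    cdf = norm.cdf(cuts[None, :] - xb[:, None])            # F(mu_0j - X'beta) at each cut
    return np.diff(cdf, axis=1)                            # column j is Pr(Y = j | X_i)

X = np.array([[0.5, -1.0], [1.2, 0.3]])
p = cell_probs(X, beta=np.array([1.0, 0.5]), mu=np.array([1.0, 2.0]))
```

Each row of p contains the J + 1 cell probabilities for one observation and sums to one by construction.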

The key assumptions in a parametric ordered discrete choice model are the linearity of the latent regression model, the form of the link function F(·) (specifically, its symmetry and its behaviour at the tails) and the independence between u and X in the latent regression model (which, in turn, implies that it is homoskedastic). Parameter estimates and predicted probabilities based on ordered discrete choice models are inconsistent if any of these key assumptions is not met (note that, in contrast to standard regression models, heteroskedasticity or misspecification of the distribution function of u provokes inconsistencies). On the other hand, there exist semiparametric methods (Klein and Sherman 2002) and even nonparametric methods (Matzkin 1992) that allow for consistent estimation of the conditional probabilities Pr(Y = j | X = x) under much weaker assumptions; these methods have not been much used in empirical applications due to their technical complexity, but they provide a reasonable alternative to purely parametric methods. Therefore, it is especially important to test the specification of a parametric ordered discrete choice model, since misspecification errors lead to inconsistent estimation and alternative consistent techniques are available.

In recent years, various statistics have been proposed to test one (or some) of the assumptions of a parametric ordered discrete choice model. However, most of these statistics are constructed to detect only specific departures from the null hypothesis; for example, Weiss (1997) proposes to test the null of a homoskedastic u versus some parametric heteroskedastic alternatives; Glewwe (1997) proposes to test the null that u is normal versus the alternative that it is a member of the Pearson family; Murphy (1996) proposes to test the null that F(·) is logistic versus various alternatives; and Santos Silva (2001) proposes a test statistic for two non-nested parametric specifications. By contrast, we focus here on consistent specification test statistics, i.e. test statistics that allow us to detect any deviation from the null hypothesis “the proposed parametric specification is correct”.

Roughly speaking, two main approaches may be followed in our context to derive consistent specification statistics: tests based on conditional moment (CM) restrictions, which can be constructed following the general methodology described in Newey (1985), Tauchen (1985) and Andrews (1988); and tests based on comparisons between parametric and nonparametric estimations, such as those proposed by Andrews (1997), Stute (1997) and Horowitz and Spokoiny (2001), among others. The objective of this article is to examine how these two approaches can be applied to test the specification of an ordered discrete choice model, and to compare the performance of the resulting statistics by means of simulations.

Our results highlight the importance of covariance matrix estimation in CM tests. Specifically, standard CM statistics, computed with covariance matrix estimators based on actual derivatives, usually perform worse than statistics based on comparisons between parametric and nonparametric estimations. However, exploiting the information about conditional expectations contained in the model, we derive variants of standard CM statistics that perform much better; additionally, these variants are easy to compute, since they can be obtained using artificial regressions. On the other hand, our results suggest that the methodology proposed in Horowitz and Spokoiny (2001) can be extended successfully to the context of ordered discrete choice models, and the resulting statistic outperforms other statistics based on comparisons between parametric and nonparametric methods. However, when we compare the performance of the generalization of Horowitz-Spokoiny’s statistic with the variant of the standard CM statistic that we propose, there is no clear-cut answer to the question of which one performs better: the latter outperforms the former in some cases (e.g. with heteroskedastic alternatives) and is computationally much less demanding, but the additional power obtained with the former in some other cases (e.g. with non-normal alternatives) may well justify the extra computing effort.

The rest of the paper is organized as follows. In Sections 2 and 3 we describe

the test statistics considered here. In Section 4 we report the results of various

Monte Carlo experiments. In Section 5 we present two empirical applications. In

Section 6 we conclude. Some technical details are relegated to an Appendix.

2. STATISTICS BASED ON CONDITIONAL MOMENT

RESTRICTIONS

We assume that independent and identically distributed (i.i.d.) observations (Yi, Xi′)′ are available, where, hereafter, i = 1,...,n. Additionally, the following notation will be used: Dji ≡ I(Yi = j), for j = 0,1,...,J, where I(·) is the indicator function; and, given θ ≡ (β′, µ′)′ ∈ Θ ⊂ R^{k+J−1}, p0i(θ) ≡ F(−Xi′β); pJi(θ) ≡ 1 − F(µJ−1 − Xi′β); if J ≥ 2, p1i(θ) ≡ F(µ1 − Xi′β) − F(−Xi′β); and if J ≥ 3, pji(θ) ≡ F(µj − Xi′β) − F(µj−1 − Xi′β), for j = 2,...,J−1.

Let us consider mji(θ) ≡ Dji − pji(θ). Then, it follows from (2) that

E{mji(θ0) | Xi} = 0,  for j = 0,1,...,J.   (3)

This yields J + 1 conditional moment restrictions that must be satisfied. In fact, since all the probabilities add up to one, only J moment conditions are relevant to construct a test statistic. Specifically, we disregard the moment condition for j = 0 and consider the random vector Σ_{i=1}^n mi(θ̂), where mi(θ) is the J × 1 column vector whose j-th component is mji(θ) and θ̂ is a well-behaved estimator of θ0. To derive an asymptotically valid test statistic, we must analyze the asymptotic behaviour of Σ_{i=1}^n mi(θ̂). First of all, note that using a first-order Taylor expansion of mji(θ̂) − mji(θ0), it follows that

n^{−1/2} Σ_{i=1}^n mi(θ̂) = n^{−1/2} Σ_{i=1}^n mi(θ0) + B0 · n^{1/2}(θ̂ − θ0) + op(1),   (4)

where B0 ≡ E{Bi(θ0)} and Bi(θ) denotes the J × (k+J−1) matrix whose j-th row is ∂mji(θ)/∂θ′. Thus, the asymptotic behaviour of n^{−1/2} Σ_{i=1}^n mi(θ̂) depends on the asymptotic behaviour of n^{1/2}(θ̂ − θ0). In our context, since our null hypothesis specifies the conditional distribution Yi | Xi = x, the natural way to estimate θ0 is maximum likelihood (ML). The log-likelihood of the model can be written as

ln L(θ) = Σ_{i=1}^n Σ_{j=0}^J Dji ln pji(θ).

We assume that the regularity conditions that ensure that the ML estimator is asymptotically normal are met (see e.g. Amemiya 1985, Chapter 9). In this case, this amounts to saying that the ML estimator θ̂ satisfies

n^{1/2}(θ̂ − θ0) = A0^{−1} · n^{−1/2} Σ_{i=1}^n gi(θ0) + op(1),   (5)

where gi(θ) ≡ Σ_{j=0}^J Dji ∂ ln pji(θ)/∂θ is the derivative with respect to θ of the i-th term in ln L(θ), and A0 is the limiting information matrix, i.e. A0 = E{Ai(θ0)}, for Ai(θ) ≡ −∂gi(θ)/∂θ′. From (4) and (5) it follows that:

n^{−1/2} Σ_{i=1}^n mi(θ̂) = [IJ : B0 A0^{−1}] (n^{−1/2} Σ_{i=1}^n mi(θ0)′, n^{−1/2} Σ_{i=1}^n gi(θ0)′)′ + op(1),

where IJ is the J × J identity matrix. Hence,

n^{−1/2} Σ_{i=1}^n mi(θ̂) →d N(0, V0),   (6)

where

V0 ≡ [IJ : B0 A0^{−1}] Q0 [IJ : B0 A0^{−1}]′,   (7)

Q0 ≡ E{Qi(θ0)} and Qi(θ) ≡ (mi(θ)′, gi(θ)′)′ (mi(θ)′, gi(θ)′). Finally, to derive a test statistic, a consistent estimator of V0 must be proposed. It is worthwhile discussing in detail how V0 can be estimated, since it is well known that the finite-sample performance of statistics based on conditional moment restrictions crucially depends on this.

According to (7), the natural candidate for estimating V0 is Vn,1 ≡ [IJ : Bn An^{−1}] Qn [IJ : Bn An^{−1}]′, where Qn ≡ n^{−1} Σ_{i=1}^n Qi(θ̂), Bn ≡ n^{−1} Σ_{i=1}^n Bi(θ̂) and An ≡ n^{−1} Σ_{i=1}^n Ai(θ̂). However, following the literature on artificial regressions (see e.g. MacKinnon 1992), it is possible to propose an alternative estimator of V0 that leads to a computationally simpler statistic. To derive it, note that E{mli(θ)gi(θ)′} = E{pli ∂ ln pli(θ)/∂θ′} = E{−∂mli(θ)/∂θ′}; hence E{mi(θ0)gi(θ0)′} = −B0. Moreover, the information matrix equality ensures that E{gi(θ0)gi(θ0)′} = A0. Using these equalities, it follows that V0 equals

E{mi(θ0)mi(θ0)′} − E{mi(θ0)gi(θ0)′} E{gi(θ0)gi(θ0)′}^{−1} E{gi(θ0)mi(θ0)′}.   (8)

This leads us to consider

Vn,2 ≡ n^{−1} [Σ_{i=1}^n mi(θ̂)mi(θ̂)′ − Σ_{i=1}^n mi(θ̂)gi(θ̂)′ {Σ_{i=1}^n gi(θ̂)gi(θ̂)′}^{−1} Σ_{i=1}^n gi(θ̂)mi(θ̂)′].

Vn,1 and Vn,2 are the standard choices for estimating V0. Both are obtained by simply replacing population moments by sample moments. However, in our context we can do better than that. Observe that our null hypothesis yields the specification of the conditional distribution Y | X = x. This means that we can compute the conditional expectation with respect to the independent variables and then, by the law of iterated expectations, the sample analogue of this conditional expectation is a consistent estimator of the population moment; e.g. E{mi(θ0)mi(θ0)′} can be consistently estimated with n^{−1} Σ_{i=1}^n EX{mi(θ̂)mi(θ̂)′}. Following this approach, expression (8) for V0 suggests that we can estimate this matrix with Vn,3 ≡ n^{−1} Σ_{i=1}^n Vi,3(θ̂), where

Vi,3(θ) ≡ EX{mi(θ)mi(θ)′} − EX{mi(θ)gi(θ)′} EX{gi(θ)gi(θ)′}^{−1} EX{gi(θ)mi(θ)′}.

Observe that the analytical expressions of these conditional expectations are easy to derive. On the other hand, this approach could also be followed using expression (7) rather than (8) as a starting point; but the estimator obtained in this way proves to be again Vn,3.
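As an illustration of why these conditional expectations are easy to derive: given Xi, the indicators (D1i,...,DJi) are multinomial with cell probabilities pji(θ0) under the null, so EX{mji(θ)mli(θ)} = pji(θ)1{j = l} − pji(θ)pli(θ). A minimal sketch of this block of Vi,3 (the helper name is ours):

```python
import numpy as np

def cond_mm(p):
    """E_X{m_i m_i'} for one observation: the multinomial covariance matrix
    diag(p) - p p' of the indicators (D_1i, ..., D_Ji) given X_i."""
    return np.diag(p) - np.outer(p, p)

p = np.array([0.2, 0.3])   # p_1i, p_2i (the j = 0 cell is dropped)
M = cond_mm(p)
```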

To sum up, we can consider three possible consistent estimators of V0 and, thus, we can derive three possible test statistics

C(M)_{n,l} ≡ n^{−1} {Σ_{i=1}^n mi(θ̂)′} {Vn,l}^{−1} {Σ_{i=1}^n mi(θ̂)},

for l = 1,2,3. From (6), it follows that if specification (2) is correct then C(M)_{n,l} →d χ²_J; hence, given a significance level α, an asymptotically valid critical region is {C(M)_{n,l} > χ²_{1−α;J}}, where χ²_{1−α;J} is the 1 − α quantile of a χ²_J distribution. To facilitate the computation of these three statistics, in Appendix A1 we derive the specific analytical expressions for Vn,1, Vn,2 and Vn,3 that are obtained here. As is well known in the relevant literature, C(M)_{n,2} can also be computed using an artificial regression: since Σ_{i=1}^n gi(θ̂) = (∂ ln L/∂θ)|_{θ=θ̂} = 0, the statistic C(M)_{n,2} proves to be the explained sum of squares in the artificial regression of a vector of ones on mi(θ̂) and gi(θ̂). On the other hand, using a Cholesky decomposition of EX{mi(θ)mi(θ)′}, it is also possible to derive an artificial regression whose explained sum of squares coincides with C(M)_{n,3} (see Appendix A2).
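To fix ideas, once the residual matrix with rows mi(θ̂)′ and the score matrix with rows gi(θ̂)′ have been computed, C(M)_{n,2} takes a few lines. The sketch below uses our own array conventions and placeholder data, and is not tied to the paper’s appendix formulas:

```python
import numpy as np

def cm_stat_vn2(m, g):
    """C(M)_{n,2}: quadratic form in sum_i m_i(theta_hat), with V_{n,2} built
    from the sample outer products of the residuals m_i and the scores g_i."""
    n = m.shape[0]
    Mm, Mg, Gg = m.T @ m, m.T @ g, g.T @ g
    Vn2 = (Mm - Mg @ np.linalg.solve(Gg, Mg.T)) / n   # Schur-complement form
    s = m.sum(axis=0)                                 # sum_i m_i(theta_hat)
    return s @ np.linalg.solve(Vn2, s) / n            # compare with chi2_J

rng = np.random.default_rng(0)
m = rng.standard_normal((200, 2))   # placeholder residuals (J = 2)
g = rng.standard_normal((200, 3))   # placeholder scores (k + J - 1 = 3)
c = cm_stat_vn2(m, g)
```

Because Vn,2 is a Schur complement of a positive semidefinite matrix, the statistic is nonnegative by construction.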

Still within the framework of tests based on conditional moment restrictions, more statistics can be derived following the methodology described in Andrews (1988), who proposes increasing the degrees of freedom of the statistic by partitioning the support of the regressors. Specifically, let us assume that the support of Xi is partitioned into G subsets A1,...,AG. Then we can define mjgi(θ) ≡ mji(θ)I(Xi ∈ Ag) for j = 1,...,J and g = 1,...,G, and thus consider the JG moment restrictions

E{mjgi(θ0)} = 0,  for j = 1,...,J, and g = 1,...,G.

To derive a test statistic, we now consider the random vector Σ_{i=1}^n m(P)_i(θ̂), where m(P)_i(θ) ≡ mi(θ) ⊗ Pi, and Pi is the G × 1 vector whose g-th component is I(Xi ∈ Ag). As before, it follows that

n^{−1/2} Σ_{i=1}^n m(P)_i(θ̂) →d N(0, V(P)_0),   (9)

where V(P)_0 ≡ [IJG : B(P)_0 A0^{−1}] Q(P)_0 [IJG : B(P)_0 A0^{−1}]′, B(P)_0 ≡ E{B(P)_i(θ0)}, Q(P)_0 ≡ E{Q(P)_i(θ0)}, B(P)_i(θ) ≡ Bi(θ) ⊗ Pi and Q(P)_i(θ) ≡ (m(P)_i(θ)′, gi(θ)′)′ (m(P)_i(θ)′, gi(θ)′). Now, the natural candidate for estimating V(P)_0 is V(P)_{n,1} ≡ [IJG : B(P)_n An^{−1}] Q(P)_n [IJG : B(P)_n An^{−1}]′, where B(P)_n ≡ n^{−1} Σ_{i=1}^n B(P)_i(θ̂) and Q(P)_n ≡ n^{−1} Σ_{i=1}^n Q(P)_i(θ̂). With the same reasoning as before, two other consistent estimators of V(P)_0 can be proposed: V(P)_{n,2} and V(P)_{n,3}, defined in the same way as Vn,2 and Vn,3, respectively, but replacing mi(θ̂) by m(P)_i(θ̂) (see Appendix A1). In this fashion we again obtain three different test statistics

C(MP)_{n,l} ≡ n^{−1} {Σ_{i=1}^n m(P)_i(θ̂)′} {V(P)_{n,l}}^{−1} {Σ_{i=1}^n m(P)_i(θ̂)},

for l = 1,2,3. From (9), it follows that if specification (2) is correct then C(MP)_{n,l} →d χ²_{JG}; hence, C(MP)_{n,l} can also be used as a test statistic. Additionally, both C(MP)_{n,2} and C(MP)_{n,3} can be computed using an artificial regression. Specifically, C(MP)_{n,2} coincides with the explained sum of squares in the artificial regression of a vector of ones on m(P)_i(θ̂) and gi(θ̂), whereas the artificial regression which allows us to compute C(MP)_{n,3} is described in Appendix A2.
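Constructing the partitioned moments m(P)_i(θ̂) = mi(θ̂) ⊗ Pi is mechanical once the partition A1,...,AG is chosen; the sketch below uses our own illustrative choice of partition (intervals of the first regressor) and placeholder data:

```python
import numpy as np

def partition_moments(m, X, edges):
    """Row i of the output is m_i ⊗ P_i, where P_i indicates which of the
    G = len(edges) + 1 intervals of the first regressor contains X_i."""
    g_idx = np.searchsorted(edges, X[:, 0])   # interval index in 0,...,G-1
    P = np.eye(len(edges) + 1)[g_idx]         # n x G indicator matrix
    # row-wise Kronecker product: result is n x (J*G)
    return (m[:, :, None] * P[:, None, :]).reshape(m.shape[0], -1)

m = np.array([[1.0, 2.0], [3.0, 4.0]])        # residuals m_i, J = 2
X = np.array([[-0.5, 1.0], [0.7, 0.2]])
mp = partition_moments(m, X, edges=[0.0])     # G = 2 cells: x1 <= 0, x1 > 0
```

Summing the output over the G cells recovers the original mi, since each observation falls in exactly one cell.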

Finally, when J ≥ 2 and k ≥ 2, Butler and Chatterjee (1997) propose using a test of overidentifying restrictions. Observe that (3) implies that the following Jk moment conditions hold:

E{Xli mji(θ0)} = 0,  for l = 1,...,k, j = 1,...,J,

where Xli denotes the l-th component of Xi. Since the number of parameters is k + J − 1, a test of overidentifying restrictions is possible if Jk > k + J − 1, and this condition holds if and only if J ≥ 2 and k ≥ 2. Adapting the general results on generalized method of moments tests to our framework (see e.g. Hamilton 1994, Chapter 14), the test of overidentifying restrictions can be computed as follows: i) obtain an initial estimate of θ0, say θ̄, by minimizing sn(θ)′sn(θ), where sn(θ) ≡ n^{−1} Σ_{i=1}^n mi(θ) ⊗ Xi; ii) compute Sn(θ̄), where

Sn(θ) ≡
⎡ S11,n(θ) ··· S1J,n(θ) ⎤
⎢    ⋮      ⋱      ⋮    ⎥
⎣ SJ1,n(θ) ··· SJJ,n(θ) ⎦,

Sjj,n(θ) = n^{−1} Σ_{i=1}^n Xi Xi′ pji(1 − pji) and Sjl,n(θ) = −n^{−1} Σ_{i=1}^n Xi Xi′ pji pli for j ≠ l; iii) obtain a final estimate of θ0, say θ̃, by minimizing sn(θ)′ Sn(θ̄)^{−1} sn(θ); and iv) compute the test statistic

C(BC)_n = n sn(θ̃)′ Sn(θ̄)^{−1} sn(θ̃).

If specification (2) is correct then C(BC)_n →d χ²_{Jk−(k+J−1)}; thus, C(BC)_n can also be used as a statistic to perform an asymptotically valid specification test.
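Steps i)-iv) can be sketched generically; sn and Sn below are user-supplied callables implementing the formulas above (the function names, the derivative-free optimizer and the toy linear moment function are our own illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize

def overid_test(sn, Sn, theta_start, n):
    """Two-step GMM overidentification test: the first step minimizes s_n's_n,
    the weight matrix is S_n(theta_bar)^{-1}, and the statistic is n times the
    minimized second-step criterion (compare with chi2_{Jk-(k+J-1)})."""
    step1 = minimize(lambda t: sn(t) @ sn(t), theta_start, method="Nelder-Mead")
    W = np.linalg.inv(Sn(step1.x))                 # S_n(theta_bar)^{-1}
    crit = lambda t: sn(t) @ W @ sn(t)
    step2 = minimize(crit, step1.x, method="Nelder-Mead")
    return n * crit(step2.x)

# toy overidentified linear moment function: 3 moments, 2 parameters
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.5])
stat = overid_test(lambda t: A @ t - b, lambda t: np.eye(3), np.zeros(2), n=100)
```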

3. STATISTICS BASED ON COMPARISONS BETWEEN

PARAMETRIC AND NONPARAMETRIC ESTIMATIONS

Many specification tests have recently been developed using comparisons between parametric and nonparametric estimations. In this paper we consider three of them: one based on the comparison of joint empirical distribution functions (Andrews 1997), and two others based on comparisons between regression estimations, either non-smoothed (Stute 1997) or smoothed (Horowitz and Spokoiny 2001). In fact, as we discuss below, only the first of these statistics applies directly to our framework, but the other two can be conveniently modified to cover our problem. We focus on these three statistics because they have the advantage that their performance does not depend on the choice of a bandwidth value.¹ Note as well that these three statistics require the use of a root-n-consistent estimator of θ0; as in the previous section, the ML estimator is the natural choice.

¹ The test statistics proposed by Andrews (1997) and Stute (1997) do not use any bandwidth. Horowitz and Spokoiny (2001) propose using as a test statistic a maximum from among statistics computed with different bandwidths; hence the influence of bandwidth selection is ruled out.

Andrews (1997) suggests testing a parametric specification of the conditional distribution Y | X = x by comparing the joint empirical distribution function of (Y, X′)′ and an estimate of the joint distribution function based on the parametric specification. Specifically, if F(· | x, θ0) is the assumed parametric conditional distribution function of Y | X = x and we denote

Hn(x, y) ≡ n^{−1} {Σ_{i=1}^n I(Yi ≤ y, Xi ≤ x) − Σ_{i=1}^n F(y | Xi, θ̂) I(Xi ≤ x)},

where θ̂ is a root-n-consistent estimator of θ0, Andrews (1997) proposes using the Kolmogorov-Smirnov-type test statistic

C(AN)_n ≡ max_{1≤j≤n} |n^{1/2} Hn(Xj, Yj)|.

The asymptotic null distribution of C(AN)_n cannot be tabulated but, given a significance level α, an asymptotically valid critical region is {C(AN)_n > c(AN)_{α,n}}, where c(AN)_{α,n} is a bootstrap critical value. This bootstrap critical value can be obtained as the 1 − α quantile of {C(AN)*_{n,b}, b = 1,...,B}, where C(AN)*_{n,b} is a bootstrap statistic constructed in the same way as C(AN)_n, but using as sample data the b-th bootstrap sample {(Y*_ib, X*_ib′)′}_{i=1}^n, which, in turn, is obtained as follows: X*_ib = Xi and Y*_ib is generated with distribution function F(· | Xi, θ̂).
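The bootstrap scheme just described (regressors held fixed, Y*_ib drawn from the estimated conditional distribution) can be sketched as follows; stat_fn and draw_y are hypothetical callables that recompute the test statistic on a bootstrap sample and draw Y* given X and the fitted model, and the example callables are purely illustrative:

```python
import numpy as np

def bootstrap_critical_value(stat_fn, draw_y, X, B=199, alpha=0.05, seed=0):
    """1 - alpha quantile of {C*_{n,b}, b = 1,...,B}: each bootstrap statistic
    is computed on (Y*_b, X) with X fixed and Y*_b drawn from the fitted model."""
    rng = np.random.default_rng(seed)
    stats = [stat_fn(draw_y(X, rng), X) for _ in range(B)]
    return np.quantile(stats, 1 - alpha)

# illustrative use with placeholder callables
X = np.linspace(0.0, 1.0, 50)
crit = bootstrap_critical_value(
    stat_fn=lambda y, X: float(abs(y.mean() - X.mean())),
    draw_y=lambda X, rng: X + rng.standard_normal(X.shape),
    X=X, B=99)
```

In a real application stat_fn must re-estimate θ on each bootstrap sample, since the statistic is built from θ̂.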

Stute (1997) suggests testing the specification of a regression model by comparing parametric and nonparametric estimations of the regression function. Specifically, if m(·, θ0) is the specified parametric regression function and

Rn(x) ≡ n^{−1} [Σ_{i=1}^n {Yi − m(Xi, θ̂)} I(Xi ≤ x)],

where θ̂ is a root-n-consistent estimator of θ0, then he proposes using the Cramér-von Mises-type statistic Tn ≡ Σ_{l=1}^n Rn(Xl)². This statistic cannot be directly applied to our problem since the specification that we consider is not a regression model if J > 1. However, observe that (2) holds if and only if

E(Dji | Xi) = pji(θ0),  for j = 1,...,J,   (10)

that is, our specification is equivalent to the specification of J regression models. Hence, we can derive a test statistic for our problem following Stute’s methodology as follows: first, we consider Stute’s statistic for the j-th regression model in (10), which proves to be

C(ST)_{j,n} ≡ n^{−2} Σ_{l=1}^n [Σ_{i=1}^n {Dji − pji(θ̂)} I(Xi ≤ Xl)]²;

and then we consider the overall statistic

C(ST)_n ≡ Σ_{j=1}^J C(ST)_{j,n}.

This way of defining an overall statistic ensures that any deviation in any of the J regression models considered in (10) will be detected; but other definitions of the overall statistic would also ensure this, e.g. max_{1≤j≤J} C(ST)_{j,n} or Σ_{j=1}^J {C(ST)_{j,n}}². The asymptotic null distribution of C(ST)_n is not known either, but approximate critical values can be derived using the bootstrap procedure described above. The asymptotic validity of this bootstrap procedure in this context can be proved using arguments similar to those in Stute et al. (1998).
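C(ST)_n reduces to simple matrix operations; in the sketch below (our own array layout and placeholder data) D holds the indicators Dji, P the fitted probabilities pji(θ̂), and the inequality Xi ≤ Xl is taken componentwise:

```python
import numpy as np

def stute_stat(D, P, X):
    """C(ST)_n = sum_j n^{-2} sum_l [sum_i {D_ji - p_ji} I(X_i <= X_l)]^2."""
    n = D.shape[0]
    ind = (X[:, None, :] <= X[None, :, :]).all(axis=2)  # ind[i, l] = I(X_i <= X_l)
    R = (D - P).T @ ind                                 # R[j, l] = inner sum over i
    return (R ** 2).sum() / n ** 2

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 2))
P = np.full((30, 2), 0.3)                    # fitted p_ji, J = 2
D = (rng.random((30, 2)) < P).astype(float)  # indicators D_ji
c = stute_stat(D, P, X)
```

The statistic is zero exactly when every cumulated residual sum vanishes, and positive otherwise.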

Horowitz and Spokoiny (2001) propose testing the specification of a regression model by comparing smoothed nonparametric and parametric estimations of the regression function with various bandwidths. Specifically, denote

Rn,h(x) ≡ Σ_{i=1}^n {Yi − m(Xi, θ̂)} wi,h(x),

where wi,h(x) ≡ K{(x − Xi)/h} / Σ_{l=1}^n K{(x − Xl)/h} is the Nadaraya-Watson weight, K(·) is the kernel function, h is the bandwidth and, as above, m(·, θ0) is the specified parametric regression function and θ̂ is a root-n-consistent estimator of θ0; with this notation, the statistic proposed in Horowitz and Spokoiny (2001) is Tn ≡ max_{h∈Hn} {Σ_{l=1}^n Rn,h(Xl)² − N̂h}/V̂h, where N̂h and V̂h are normalizing constants and Hn is a finite set of possible bandwidth values. As above, we can apply this methodology in our context by first computing the statistic corresponding to the j-th regression model in (10), which proves to be

C(HS)_{j,n} ≡ max_{h∈Hn} [Σ_{l=1}^n {Σ_{i=1}^n (Dji − pji(θ̂)) wi,h(Xl)}² − Σ_{i=1}^n aii,h σ̂²_{ji}] / {2 Σ_{i=1}^n Σ_{l=1}^n a²_{il,h} σ̂²_{ji} σ̂²_{jl}}^{1/2},

where ail,h ≡ Σ_{m=1}^n wi,h(Xm) wl,h(Xm) and σ̂²_{ji} ≡ pji(θ̂){1 − pji(θ̂)}; and then we consider the overall statistic

C(HS)_n ≡ Σ_{j=1}^J C(HS)_{j,n}.

Again, any deviation in any of the J regression models considered in (10) is detected in this way. The asymptotic null distribution of C(HS)_n is not known either, but approximate critical values can be derived using the bootstrap procedure described above. Observe that neither the bootstrap procedure nor the conditional variance estimators σ̂²_{ji} that we use are the ones proposed in Horowitz and Spokoiny (2001), because we exploit the fact that in our case the dependent variable is binary; in this way, a better finite-sample performance is obtained.
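For a scalar regressor and a Gaussian kernel (our illustrative choices; the text leaves K and Hn generic), C(HS)_{j,n} can be sketched with placeholder data as:

```python
import numpy as np

def hs_stat_j(Dj, pj, X, bandwidths):
    """C(HS)_{j,n}: studentized smoothed comparison for the j-th regression in
    (10), maximized over a bandwidth grid, with sigma^2_ji = p_ji(1 - p_ji)."""
    e, s2 = Dj - pj, pj * (1 - pj)
    vals = []
    for h in bandwidths:
        K = np.exp(-0.5 * ((X[:, None] - X[None, :]) / h) ** 2)
        W = K / K.sum(axis=1, keepdims=True)   # W[l, i] = w_{i,h}(X_l)
        A = W.T @ W                            # A[i, l] = a_{il,h}
        num = ((W @ e) ** 2).sum() - np.diag(A) @ s2
        den = np.sqrt(2.0 * s2 @ (A ** 2) @ s2)
        vals.append(num / den)
    return max(vals)

rng = np.random.default_rng(2)
X = rng.standard_normal(40)
pj = np.clip(rng.random(40), 0.1, 0.9)          # fitted p_ji
Dj = (rng.random(40) < pj).astype(float)        # indicators D_ji
c = hs_stat_j(Dj, pj, X, bandwidths=[0.5, 1.0, 2.0])
```

The unnormalized Gaussian kernel suffices here because the Nadaraya-Watson weights normalize over observations.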

4. SIMULATIONS

In order to check the behavior of the test statistics, we perform eight Monte

Carlo experiments. In all of them we test the null hypothesis that (2) holds with
