
Simple Estimators for Monotone Index Models

Hyungtaik Ahn

Dongguk University,

Hidehiko Ichimura

University College London,

James L. Powell

University of California, Berkeley

(powell@econ.berkeley.edu)

June 2004

Abstract

In this paper, estimation of the coefficients in a “single-index” regression model is considered under the assumption that the regression function is a smooth and strictly monotonic function of the index. The estimation method follows a “two-step” approach, where the first step uses a nonparametric regression estimator for the dependent variable, and the second step estimates the unknown index coefficients (up to scale) by an eigenvector of a matrix defined in terms of this first-step estimator. The paper gives conditions under which the proposed estimator is root-n-consistent and asymptotically normal.

JEL Classification: C24, C14, C13.

Acknowledgements

This research was supported by the National Science Foundation. Hyungtaik Ahn’s research

was supported by Dongguk Research Fund. We are grateful to Bo Honoré, Ekaterini Kyriazidou,

Robin Lumsdaine, Thomas Rothenberg, Paul Ruud, and Mark Watson for their helpful comments.


1. Introduction

Estimation of the unknown coefficients $\beta_0$ in the single-index regression model
$$E(y_i \mid x_i) = G(x_i'\beta_0), \qquad (1.1)$$
where $y_i$ and $x_i$ are observable and $G(\cdot)$ is an unknown function, has been investigated in a number

of papers in the econometric literature on semiparametric estimation. (A survey of these estimators is given in Powell (1994).) Some estimation methods, like the “average derivative” approach of Härdle and Stoker (1989) and Powell, Stock, and Stoker (1989) and the “density-weighted least squares” estimator of Ruud (1986) and Newey and Ruud (1991), exploit an assumption of smoothness (continuity and differentiability) of the unknown function $G$, but require all components of the regressor vector $x$ to be jointly continuously distributed, which rarely applies in practice. Härdle and Horowitz (1996) have extended the average derivative estimator to allow for discrete regressors, at the expense of introducing four additional nuisance parameters to be chosen by users of their estimator, in addition to the standard smoothing parameter choice required by all nonparametric estimators. Other estimation methods which assume smoothness of $G$ include the “single-index regression” estimators of Ichimura (1993a), Ichimura and Lee (1991), and, for the special case of a binary dependent variable, Klein and Spady (1993); these estimation methods permit general distributions of the regressors, but can be computationally burdensome, since they involve minimization problems with nonparametric estimators of $G$ whose solutions cannot be written in a simple closed form. Still other estimators of $\beta_0$ were proposed for the “generalized regression model” of Han (1987),

$$y_i = T(x_i'\beta_0, \varepsilon_i), \qquad (1.2)$$

where the unknown transformation $T(\cdot)$ is assumed to be monotonic in its first argument, and where the unobservable error term $\varepsilon_i$ is assumed to be independent of $x_i$. The assumed monotonicity of $T$, which implies monotonicity of $G$ in (1.1), is fundamental for the consistency of the “maximum rank correlation” estimator of Han (1987) and the related monotonicity-based estimators of Cavanagh and Sherman (1991); like the “single-index regression” estimators, computation of the “monotonicity” estimators is typically formidable, since it requires minimization of a criterion which may be discontinuous and involves a double sum over the data.


In this paper, which combines the results of Ahn (1995) and Ichimura and Powell (1996), both smoothness and monotonicity of the nuisance function $G$ are imposed; more specifically, it is assumed to be differentiable (up to a high order) and invertible in its argument. Simple “two-step” estimators are proposed under these restrictions: the first step obtains a nonparametric estimator of the conditional mean $g_i$ of $y_i$ given $x_i$ using a standard (kernel) method, while the second step extracts an estimator of $\beta_0$ from a matrix defined using this first-step estimator. One estimator of the unknown coefficients is based upon the “eigenvector” approach that was used in a different context by Ichimura (1993b), and the corresponding second-step matrix estimator was considered (again in a different context) by Ahn and Powell (1993). An alternative, closed-form estimator of $\beta_0$ is also proposed; the relation of the “eigenvector” to the “closed-form” estimation approach is analogous to the relation of limited information maximum likelihood (LIML) to two-stage least squares (2SLS) for simultaneous equations models. These estimators are computationally simple (since the second-step matrix estimator can be written in closed form) and do not require that all components of the regressor vector $x_i$ be jointly continuously distributed. And, as shown below, they are root-n consistent (where $n$ is the sample size) and asymptotically normal under regularity conditions that have been imposed elsewhere in the econometric literature on semiparametric estimation.

2. The Model and Estimator

Rewriting the single-index regression model (1.1) as

$$y_i \equiv g_i + u_i \equiv G(x_i'\beta_0) + u_i, \qquad (2.1)$$

where

$$g_i \equiv G(x_i'\beta_0) \equiv E[y_i \mid x_i] \qquad (2.2)$$

is the conditional mean of the (scalar) dependent variable $y_i$ given the $p$-dimensional vector of regressors $x_i$ (so the unobservable $u_i$ has $E[u_i \mid x_i] = 0$), the maintained assumption that $G$ is monotonic implies

$$\nu(g_i) \equiv G^{-1}(g_i) = x_i'\beta_0. \qquad (2.3)$$


That is, given the conditional mean $g_i$ of $y_i$,

$$0 = x_i'\beta_0 - \nu(g_i) \qquad (2.4)$$

for some unknown transformation $\nu(g_i)$ of $g_i$. Clearly $\beta_0$ could only be identified up to a scale normalization from this relation; given such a normalization, though, (2.4) can be used to identify the remaining components of $\beta_0$, provided the regressors $x_i$ are sufficiently variable when the conditional mean $g_i$ is held fixed. Specifically, for a pair of observations with the same conditional mean, the parameter vector $\beta_0$ must be orthogonal to the difference in regressors, i.e.,

$$0 = (x_i - x_j)'\beta_0 + \nu(g_j) - \nu(g_i) = (x_i - x_j)'\beta_0 \quad \text{if } g_i = g_j. \qquad (2.5)$$

Therefore, letting $w(x_i, x_j) \equiv w_{ij}$ be any nonnegative weighting function of the pair of regressors $x_i$ and $x_j$, the coefficient vector $\beta_0$ satisfies

$$\beta_0' \Sigma_w \beta_0 = 0, \qquad (2.6)$$

where $\Sigma_w \equiv E[\Omega_w(g_j)]$ and
$$\Omega_w(s) = E\left[\, w(x_i, x_j)\,(x_i - x_j)(x_i - x_j)' \mid g_i = s \,\right] \qquad (2.7)$$

assuming these moments exist. Provided the matrix $\Sigma_w$ has rank $p - 1$ (so that any other nontrivial linear combination $(x_i - x_j)'\beta$ of the difference of regressors, with $\beta$ not proportional to $\beta_0$, has nonzero variance conditional on $(x_i - x_j)'\beta_0 = 0$), the parameter vector is identified (up to scale) as the eigenvector corresponding to the unique zero eigenvalue of the matrix $\Sigma_w$, which depends only on the joint distribution of the observables $(y_i, x_i')$.
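To make the identification argument concrete, the following toy computation (an invented numerical example, not from the paper) builds the analogue of $\Sigma_w$ for discrete regressors with $w \equiv 1$, and recovers $\beta_0$ as the eigenvector for the zero eigenvalue:

```python
import numpy as np

# Hypothetical example: p = 2, beta0 = (1, 1), five discrete regressor
# values.  Pairs (i, j) with equal index x'beta0 (hence equal
# conditional mean g) contribute (x_i - x_j)(x_i - x_j)' to a matrix
# playing the role of Sigma_w, with weight w = 1.
beta0 = np.array([1.0, 1.0])
xs = np.array([[0, 1], [1, 0], [2, 0], [1, 1], [0, 2]], dtype=float)
index = xs @ beta0                            # x'beta0 for each point
Sigma = np.zeros((2, 2))
for i in range(len(xs)):
    for j in range(len(xs)):
        if i != j and index[i] == index[j]:   # the event g_i = g_j
            d = xs[i] - xs[j]
            Sigma += np.outer(d, d)           # (x_i - x_j)(x_i - x_j)'

# beta0 lies in the null space of Sigma, so it is recovered (up to
# scale) as the eigenvector for the eigenvalue closest to zero.
vals, vecs = np.linalg.eigh(Sigma)
b = vecs[:, np.argmin(np.abs(vals))]
```

Here `Sigma` is proportional to $[[1,-1],[-1,1]]$, whose null eigenvector is proportional to $\beta_0 = (1,1)$, illustrating the rank-$(p-1)$ identification condition in a case where it holds.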

A natural approach to transform this identification result into an estimation method for $\beta_0$ would be to first estimate the unobservable conditional expectation terms $g_i \equiv E[y_i \mid x_i]$ by some nonparametric method, then estimate a sample analogue to the matrix $\Sigma_w$ using pairs of observations with estimated values $\hat g_i$ of $g_i$ that were approximately equal. Such an estimation strategy was proposed, in a somewhat different context, by Ahn and Powell (1993); in that paper, the first-step nonparametric estimator was the familiar kernel estimator, which takes the form of a weighted average of the dependent variable,

$$\hat g_i \equiv \frac{\sum_{j=1}^{n} K_{ij}\, y_j}{\sum_{j=1}^{n} K_{ij}}, \qquad (2.8)$$


with weights $K_{ij}$ given by

$$K_{ij} \equiv K\!\left(\frac{x_i - x_j}{h_1}\right), \qquad (2.9)$$

for $K(\cdot)$ a “kernel” function which tends to zero as the magnitude of its argument increases, and $h_1 \equiv h_{1n}$ a first-step “bandwidth” which is chosen to tend to zero as the sample size $n$ increases.
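As a minimal numerical sketch of this first step (an illustration only: the Gaussian kernel, the bandwidth, and the data-generating process below are all assumptions, not the paper's choices):

```python
import numpy as np

def first_step(x, y, h1):
    """Kernel estimate g_hat[i] of E[y | x_i], as in (2.8)-(2.9).

    K is taken to be a Gaussian product kernel; the paper leaves the
    kernel and bandwidth general.
    """
    diff = (x[:, None, :] - x[None, :, :]) / h1   # (x_i - x_j) / h1
    K = np.exp(-0.5 * (diff ** 2).sum(axis=2))    # weights K_ij
    return (K @ y) / K.sum(axis=1)                # weighted average (2.8)

# invented monotone-index data: G = tanh, beta0 = (1, -1)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(500, 2))
g = np.tanh(x @ np.array([1.0, -1.0]))
y = g + 0.1 * rng.standard_normal(500)
g_hat = first_step(x, y, h1=0.25)
```

The vectorized pairwise-difference array makes the $O(n^2)$ structure of the kernel weights explicit, at the cost of $O(n^2 p)$ memory; a loop over $i$ would trade memory for time.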

Given this estimator $\hat g_i$ of the conditional mean variable $g_i$, a second-step estimator of a matrix analogous to $\Sigma_w$ was defined by Ahn and Powell (1993) as

$$\hat S \equiv \binom{n}{2}^{-1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \hat\omega_{ij}\,(x_i - x_j)(x_i - x_j)'; \qquad (2.10)$$

the weights $\hat\omega_{ij}$ took the form

$$\hat\omega_{ij} \equiv \frac{1}{h_2}\, k\!\left(\frac{\hat g_i - \hat g_j}{h_2}\right) t_i t_j, \qquad (2.11)$$

where $k(\cdot)$ is a univariate kernel analogous to $K$ above, $h_2 \equiv h_{2n}$ is a bandwidth sequence for the second-step estimator $\hat S$, and $t_i = t(x_i)$ is a “trimming” term which is chosen to equal zero for observations where $\hat g_i$ is known to be imprecise (i.e., where $x_i$ is outside some prespecified compact subset of its support). The weighting function $\hat\omega_{ij}$ in (2.11) declines to zero as $|\hat g_i - \hat g_j|$ increases relative to the bandwidth $h_2$; thus, the conditioning event “$g_i = g_j$” in the definition of $\Sigma_w$ is ultimately imposed as this bandwidth shrinks with the sample size (and the nonparametric estimator of $g_i$ converges to its true value in probability).
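Continuing the illustrative sketch (again with an assumed Gaussian kernel $k$ and trivial trimming $t_i = 1$, neither mandated by the paper), the second-step matrix can be formed directly from the pairwise sums in (2.10)-(2.11):

```python
import numpy as np

def second_step_matrix(x, g_hat, h2, t=None):
    """S_hat of (2.10)-(2.11): a weighted average, over pairs, of
    (x_i - x_j)(x_i - x_j)' with weight (1/h2) k((g_i - g_j)/h2) t_i t_j.

    k is taken Gaussian and trimming defaults to t_i = 1; both are
    assumptions of this sketch.  Note that a nonnegative kernel makes
    S_hat automatically positive semi-definite, while the paper's
    discussion allows kernels for which it need not be.
    """
    n, p = x.shape
    t = np.ones(n) if t is None else t
    S = np.zeros((p, p))
    for i in range(n - 1):
        for j in range(i + 1, n):
            k = np.exp(-0.5 * ((g_hat[i] - g_hat[j]) / h2) ** 2) / h2
            d = x[i] - x[j]
            S += k * t[i] * t[j] * np.outer(d, d)
    return S / (n * (n - 1) / 2.0)        # divide by n choose 2 pairs

# invented example, using the true conditional mean in place of g_hat
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(150, 2))
beta0 = np.array([1.0, -1.0])
S = second_step_matrix(x, np.tanh(x @ beta0), h2=0.1)
```

In this example the quadratic form $\beta_0' \hat S \beta_0$ is small relative to that in the orthogonal direction, since the kernel weight concentrates on pairs with nearly equal index.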

Adopting this estimator $\hat S$ of $\Sigma_w$ (which implies a particular definition of the weighting function $w(x_i, x_j)$ in (2.6), described below), a corresponding estimator of $\beta_0$ would exploit a sample analogue of relation (2.6) based on the eigenvectors of $\hat S$. Though the matrix $\Sigma_w$ will be positive semi-definite under the regularity conditions to be imposed, the estimator $\hat S$ need not be for any finite sample. Hence, the estimator $\hat\beta$ of $\beta_0$ is defined here as the eigenvector for the eigenvalue of $\hat S$ that is closest to zero in magnitude. That is, defining $(\hat\lambda_1, \ldots, \hat\lambda_p)$ to be the $p$ solutions to the determinantal equation

$$|\hat S - \lambda I| = 0, \qquad (2.12)$$

the estimator $\hat\beta$ is defined as an appropriately-normalized solution to

$$(\hat S - \hat\lambda I)\hat\beta = 0, \qquad (2.13)$$

where

$$\hat\lambda \equiv \operatorname*{argmin}_{j}\, |\hat\lambda_j|. \qquad (2.14)$$
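Given any estimate of $\hat S$, the extraction of $\hat\beta$ in (2.12)-(2.14) reduces to a symmetric eigendecomposition. A minimal sketch, with a unit-length normalization standing in for whatever scale normalization is adopted:

```python
import numpy as np

def eigenvector_estimator(S_hat):
    """Return the eigenvector of S_hat whose eigenvalue is closest to
    zero in magnitude, as in (2.12)-(2.14)."""
    lam, V = np.linalg.eigh(S_hat)        # S_hat is symmetric
    j = np.argmin(np.abs(lam))            # lambda_hat: closest to zero
    return V[:, j]                        # unit-length eigenvector

# deterministic check on a matrix with a known null direction:
# S = I - v v' (v of unit length) has eigenvalues {0, 1}, and the
# eigenvector for the zero eigenvalue is +/- v.
v = np.array([1.0, -1.0]) / np.sqrt(2.0)
b = eigenvector_estimator(np.eye(2) - np.outer(v, v))
```

Because an eigenvector is only determined up to sign (and scale), any comparison of $\hat\beta$ across samples should be made after imposing the chosen normalization.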
