Simple Estimators for Monotone Index Models
University College London,
James L. Powell
University of California, Berkeley
In this paper, estimation of the coefficients in a “single-index” regression model is considered
under the assumption that the regression function is a smooth and strictly monotonic function
of the index. The estimation method follows a “two-step” approach, where the first step uses a
nonparametric regression estimator for the dependent variable, and the second step estimates the
unknown index coefficients (up to scale) by an eigenvector of a matrix defined in terms of this
first-step estimator. The paper gives conditions under which the proposed estimator is root-n-consistent
and asymptotically normal.
JEL Classification: C24, C14, C13.
This research was supported by the National Science Foundation. Hyungtaik Ahn’s research
was supported by Dongguk Research Fund. We are grateful to Bo Honoré, Ekaterini Kyriazidou,
Robin Lumsdaine, Thomas Rothenberg, Paul Ruud, and Mark Watson for their helpful comments.
1. Introduction
Estimation of the unknown coefficients $\beta_0$ in the single index regression model
$$E(y_i \mid x_i) = G(x_i'\beta_0), \qquad (1.1)$$
where $y_i$ and $x_i$ are observable and $G(\cdot)$ is an unknown function, has been investigated in a number
of papers in the econometric literature on semiparametric estimation. (A survey of these estimators
is given in Powell (1994).) Some estimation methods, like the “average derivative” approach
of Härdle and Stoker (1989) and Powell, Stock, and Stoker (1989) and the “density-weighted least
squares” estimator of Ruud (1986) and Newey and Ruud (1991), exploit an assumption of smoothness
(continuity and differentiability) of the unknown function $G$, but require all components of the
regressor vector $x$ to be jointly continuously distributed, which rarely applies in practice. Härdle
and Horowitz (1996) have extended the average derivative estimator to allow for discrete regressors,
at the expense of introducing four additional nuisance parameters to be chosen by users of their
estimator, in addition to the standard smoothing parameter choice required in all nonparametric
estimators. Other estimation methods which assume smoothness of $G$ include the “single-index
regression” estimators of Ichimura (1993a), Ichimura and Lee (1991), and, for the special case of
a binary dependent variable, Klein and Spady (1993); these estimation methods permit general
distributions of the regressors, but can be computationally burdensome, since they involve
minimization problems with nonparametric estimators of $G$ whose solutions cannot be written in a
simple closed form. Still other estimators of $\beta_0$ were proposed for the “generalized regression model”
introduced by Han (1987),
$$y_i = T(x_i'\beta_0, \varepsilon_i), \qquad (1.2)$$
where the unknown transformation $T(\cdot)$ is assumed to be monotonic in its first argument, and where
the unobservable error term $\varepsilon_i$ is assumed to be independent of $x_i$. The assumed monotonicity of $T$,
which implies monotonicity of $G$ in (1.1), is fundamental for the consistency of the “maximum rank
correlation” estimator of Han (1987) and the related monotonicity-based estimators of Cavanagh
and Sherman (1991); like the “single index regression” estimators, computation of the “monotonicity”
estimators is typically formidable, since it requires minimization of a criterion which may be
discontinuous and involves a double sum over the data.
In this paper, which combines the results of Ahn (1995) and Ichimura and Powell (1996), both
“smoothness” and monotonicity of the nuisance function $G$ are imposed; more specifically, $G$ is
assumed to be differentiable (up to a high order) and invertible in its argument. Simple “two-step”
estimators are proposed under these restrictions; the first step obtains a nonparametric estimator
of the conditional mean $g_i$ of $y_i$ given $x_i$ using a standard (kernel) method, while the second step
extracts an estimator of $\beta_0$ from a matrix defined using this first-step estimator. One estimator
of the unknown coefficients is based upon the “eigenvector” approach that was used in a different
context by Ichimura (1993b), and the corresponding second-step matrix estimator was considered
(again in a different context) by Ahn and Powell (1993). An alternative, closed-form estimator
of $\beta_0$ is also proposed; the relation of the “eigenvector” to the “closed form” estimation approach
is analogous to the relation of limited information maximum likelihood (LIML) to two-stage least
squares (2SLS) for simultaneous equations models. These estimators are computationally simple
(since the second-step matrix estimator can be written in closed form), and do not require that all
components of the regressor vector $x_i$ are jointly continuously distributed. And, as shown below,
they are root-n consistent (where n is the sample size) and asymptotically normal under regularity
conditions that have been imposed elsewhere in the econometric literature on semiparametric
estimation.
2. The Model and Estimator
Rewriting the single-index regression model (1.1) as
$$y_i \equiv g_i + u_i \equiv G(x_i'\beta_0) + u_i, \qquad (2.1)$$
where
$$g_i \equiv G(x_i'\beta_0) \equiv E[y_i \mid x_i] \qquad (2.2)$$
is the conditional mean of the (scalar) dependent variable $y_i$ given the p-dimensional vector of
regressors $x_i$ (so the unobservable $u_i$ has $E[u_i \mid x_i] = 0$), the maintained assumption that $G$ is
invertible implies
$$\lambda(g_i) \equiv G^{-1}(g_i) = x_i'\beta_0. \qquad (2.3)$$
That is, given the conditional mean $g_i$ of $y_i$,
$$0 = x_i'\beta_0 - \lambda(g_i) \qquad (2.4)$$
for some unknown transformation $\lambda(g_i)$ of $g_i$. Clearly $\beta_0$ could only be identified up to a scale
normalization from this relation; given such a normalization, though, (2.4) can be used to identify
the remaining components of $\beta_0$, provided the regressors $x_i$ are sufficiently variable when the
conditional mean $g_i$ is held fixed. Specifically, for a pair of observations with the same conditional
mean $g_i$, the parameter vector $\beta_0$ must be orthogonal to the difference in regressors, i.e.,
$$0 = (x_i - x_j)'\beta_0 + \lambda(g_j) - \lambda(g_i) = (x_i - x_j)'\beta_0 \quad \text{if } g_i = g_j. \qquad (2.5)$$
Therefore, letting $w(x_i, x_j) \equiv w_{ij}$ be any nonnegative weighting function of the pair of regressors
$x_i$ and $x_j$, the coefficient vector $\beta_0$ satisfies
$$0 = \beta_0'\,\Sigma_w\,\beta_0, \qquad (2.6)$$
where $\Sigma_w \equiv E[\Sigma_w(g_j)]$ and
$$\Sigma_w(s) = E[\,w(x_i, x_j)\cdot(x_i - x_j)(x_i - x_j)' \mid g_i = s\,], \qquad (2.7)$$
assuming these moments exist. Provided the matrix $\Sigma_w$ has rank $(p - 1)$, so that any other
nontrivial linear combination $(x_i - x_j)'\alpha$ of the difference of regressors has nonzero variance
conditional on $(x_i - x_j)'\beta_0 = 0$ and $\alpha$ is not proportional to $\beta_0$, the parameter vector is identified (up
to scale) as the eigenvector corresponding to the unique zero eigenvalue of the matrix $\Sigma_w$, which
depends only on the joint distribution of the observable $(y_i, x_i')$.
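The identification argument can be illustrated numerically. The following is a minimal sketch under stated assumptions, not part of the original analysis: the coefficient vector `beta0`, the number of pairs, and the Gaussian draws are hypothetical, and the regressor differences are constructed to have exactly equal index values (so the weighting function is identically one) rather than being selected through estimated conditional means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical index coefficients, first component normalized to one.
beta0 = np.array([1.0, -0.5, 2.0])
p, n_pairs = beta0.size, 500

# Projector onto the orthogonal complement of beta0: any difference
# x_i - x_j between observations with equal index lies in this subspace.
P = np.eye(p) - np.outer(beta0, beta0) / (beta0 @ beta0)
diffs = rng.normal(size=(n_pairs, p)) @ P

# Average outer product of the differences (weights identically one).
Sigma = diffs.T @ diffs / n_pairs

# beta0 spans the null space, so it is recovered (up to scale) as the
# eigenvector belonging to the eigenvalue closest to zero.
eigvals, eigvecs = np.linalg.eigh(Sigma)
k = np.argmin(np.abs(eigvals))
b = eigvecs[:, k] / eigvecs[0, k]   # normalize the first component to one
```

Here `Sigma @ beta0` vanishes up to rounding error and `b` reproduces `beta0`, because the remaining two eigenvalues are bounded away from zero when the differences vary in every direction orthogonal to `beta0`.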
A natural approach to transform this identification result into an estimation method for $\beta_0$
would be to first estimate the unobservable conditional expectation terms $g_i \equiv E[y_i \mid x_i]$ by some
nonparametric method, then estimate a sample analogue to the matrix $\Sigma_w$ using pairs of observations
with estimated values $\hat g_i$ of $g_i$ that were approximately equal. Such an estimation strategy was
proposed, in a somewhat different context, by Ahn and Powell (1993); in that paper, the first-step
nonparametric estimator was the familiar kernel estimator, which takes the form of a weighted
average of the dependent variable,
$$\hat g_i \equiv \frac{\sum_{j} K_{ij}\, y_j}{\sum_{j} K_{ij}}, \qquad (2.8)$$
with weights $K_{ij}$ given by
$$K_{ij} \equiv K\!\left(\frac{x_i - x_j}{h_1}\right), \qquad (2.9)$$
for $K(\cdot)$ a “kernel” function which tends to zero as the magnitude of its argument increases, and
$h_1 \equiv h_{1n}$ a first-step “bandwidth” which is chosen to tend to zero as the sample size n increases.
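A first-step estimator of this weighted-average type can be sketched as follows. This is an illustration only: the function name `kernel_regression`, the Gaussian product kernel, the simulated design, and the bandwidth value are all assumptions made here (the regularity conditions in the appendix call for a higher-order kernel, which this sketch does not use), and each observation is included in its own average.

```python
import numpy as np

def kernel_regression(x, y, h):
    """Kernel-weighted average of y at each sample point x_i.

    x : (n, p) array of regressors, y : (n,) array, h : scalar bandwidth.
    Uses a Gaussian product kernel (an assumption of this sketch).
    """
    diff = (x[:, None, :] - x[None, :, :]) / h        # (n, n, p) scaled differences
    K = np.exp(-0.5 * np.sum(diff ** 2, axis=2))      # kernel weights K_ij
    return (K @ y) / K.sum(axis=1)                    # weighted average of y_j

# Hypothetical simulated data with a strictly increasing link G = exp.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = np.exp(x @ np.array([1.0, 0.5])) + 0.1 * rng.normal(size=200)
g_hat = kernel_regression(x, y, h=0.3)
```

Because the Gaussian weights are strictly positive, every fitted value is a convex combination of the observed `y` values; a higher-order kernel takes negative values somewhere and would not have this property.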
Given this estimator $\hat g_i$ of the conditional mean variable $g_i$, a second-step estimator of a matrix
analogous to $\Sigma_w$ was defined by Ahn and Powell (1993) as
$$\hat S \equiv \binom{n}{2}^{-1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \hat\omega_{ij}\,(x_i - x_j)(x_i - x_j)'; \qquad (2.10)$$
the weights $\hat\omega_{ij}$ took the form
$$\hat\omega_{ij} \equiv \frac{1}{h_2}\, k\!\left(\frac{\hat g_i - \hat g_j}{h_2}\right) t_i t_j, \qquad (2.11)$$
where $k(\cdot)$ is a univariate kernel analogous to $K$ above, $h_2 \equiv h_{2n}$ is a bandwidth sequence for
the second-step estimator $\hat S$, and $t_i = t(x_i)$ is a “trimming” term which is chosen to equal zero
for observations where $\hat g_i$ is known to be imprecise (i.e., where $x_i$ is outside some prespecified
compact subset of its support). The weighting function $\hat\omega_{ij}$ in (2.11) declines to zero as $|\hat g_i - \hat g_j|$
increases relative to the bandwidth $h_2$; thus, the conditioning event “$g_i = g_j$” in the definition of
$\Sigma_w$ is ultimately imposed as this bandwidth shrinks with the sample size (and the nonparametric
estimator of $g_i$ converges to its true value in probability).
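The pairwise-difference matrix described above can be computed in a few vectorized lines. The sketch below is hedged: the function name `second_step_matrix` is hypothetical, the average is taken over all pairs of distinct observations, and a Gaussian density stands in for the second-step kernel (an assumption; the appendix requires a compactly supported fourth-order kernel).

```python
import numpy as np

def second_step_matrix(x, g_hat, h2, t=None):
    """Kernel-weighted average of (x_i - x_j)(x_i - x_j)' over pairs i < j.

    x : (n, p) regressors, g_hat : (n,) first-step fitted values,
    h2 : second-step bandwidth, t : optional (n,) trimming weights.
    """
    n = x.shape[0]
    t = np.ones(n) if t is None else np.asarray(t, dtype=float)
    u = (g_hat[:, None] - g_hat[None, :]) / h2
    # Weights (1/h2) * k(u) * t_i * t_j with a Gaussian k (sketch assumption).
    w = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * h2) * np.outer(t, t)
    d = x[:, None, :] - x[None, :, :]                 # (n, n, p) pairwise differences
    # Summing over ordered pairs i != j and dividing by n(n-1) equals the
    # average over unordered pairs i < j, since every term is symmetric.
    return np.einsum("ij,ija,ijb->ab", w, d, d) / (n * (n - 1))

# Hypothetical inputs, only to show the call.
rng = np.random.default_rng(1)
x_demo = rng.normal(size=(6, 2))
g_demo = rng.normal(size=6)
S_demo = second_step_matrix(x_demo, g_demo, h2=0.5)
```

The diagonal terms contribute nothing because the difference of an observation with itself is zero, so no explicit exclusion of `i == j` is needed.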
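Adopting this estimator $\hat S$ of $\Sigma_w$ (which implies a particular definition of the weighting
function $w(x_i, x_j)$ in (2.6), described below), a corresponding estimator of $\beta_0$ would exploit a sample
analogue of relation (2.6) based on the eigenvectors of $\hat S$. Though the matrix $\Sigma_w$ will be positive
semi-definite under the regularity conditions to be imposed, the estimator $\hat S$ need not be for any
finite sample. Hence, the estimator $\hat\beta$ of $\beta_0$ is defined here as the eigenvector for the eigenvalue
of $\hat S$ that is closest to zero in magnitude. That is, defining $(\hat\eta_1, \ldots, \hat\eta_p)$ to be the p solutions to the
determinantal equation
$$|\hat S - \eta I| = 0, \qquad (2.12)$$
the estimator $\hat\beta$ is defined as an appropriately-normalized solution to
$$(\hat S - \hat\eta I)\hat\beta = 0, \qquad (2.13)$$
where
$$\hat\eta \equiv \arg\min_{\eta \in \{\hat\eta_1, \ldots, \hat\eta_p\}} |\eta|. \qquad (2.14)$$
A convenient normalization for $\hat\beta$ (and $\beta_0$) imposes the additional restriction that a particular
component of $\beta_0$ (say, the first) is known to be nonzero, and is normalized to unity, so that the
remaining coefficients are identified relative to that value. Specifically, writing $\hat\beta$ and $\beta_0$ as
$$\hat\beta = (1, -\hat\theta')', \qquad \beta_0 = (1, -\theta_0')', \qquad (2.15)$$
and partitioning $\hat S$ conformably as
$$\hat S \equiv \begin{bmatrix} \hat S_{11} & \hat S_{12} \\ \hat S_{21} & \hat S_{22} \end{bmatrix}, \qquad (2.16)$$
the solution to (2.13) takes the form
$$\hat\theta \equiv [\hat S_{22} - \hat\eta I]^{-1} \cdot \hat S_{21} \qquad (2.17)$$
for this normalization.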
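The eigenvector step and its partitioned form can be sketched as below. The helper names `eigenvector_estimator` and `theta_from_partition` and the test matrix `S_demo` (a projector with a known null vector) are hypothetical and serve only to illustrate the algebra; the input is assumed symmetric.

```python
import numpy as np

def eigenvector_estimator(S):
    """Eigenvector of symmetric S for the eigenvalue closest to zero,
    normalized so that its first component equals one."""
    S = 0.5 * (S + S.T)                        # remove rounding asymmetry
    eigvals, eigvecs = np.linalg.eigh(S)
    k = np.argmin(np.abs(eigvals))
    beta_hat = eigvecs[:, k] / eigvecs[0, k]   # needs a nonzero first component
    return beta_hat, eigvals[k]

def theta_from_partition(S, eta_hat):
    """Solve (S22 - eta_hat * I) theta = S21 for the free coefficients."""
    S21, S22 = S[1:, 0], S[1:, 1:]
    return np.linalg.solve(S22 - eta_hat * np.eye(S22.shape[0]), S21)

# Hypothetical matrix whose null space is spanned by beta0.
beta0 = np.array([1.0, -0.5, 2.0])
S_demo = np.eye(3) - np.outer(beta0, beta0) / (beta0 @ beta0)

beta_hat, eta_hat = eigenvector_estimator(S_demo)
theta_hat = theta_from_partition(S_demo, eta_hat)
```

With the sign convention that the normalized eigenvector is one followed by minus the free coefficients, `theta_hat` coincides with `-beta_hat[1:]`, which is the content of the partitioned solution.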
The “smallest eigenvalue” approach used here was used in an earlier paper by Ichimura (1993b)
to construct a two-step estimator for the binary response model under a conditional median
restriction, suggested as a computationally-simpler alternative to the maximum score estimator for
this model proposed by Manski (1975, 1985). For the binary response model with a continuously-
distributed, conditional-median-zero error term, the conditional mean $g_i \equiv E[y_i \mid x_i]$ of the binary
dependent variable equals one-half if and only if the underlying regression function $x_i'\beta_0$ is
zero; by the same reasoning as given for the estimator $\hat\beta$ in the present paper, Ichimura proposed
estimation of $\beta_0$ using the eigenvector for the smallest eigenvalue of a matrix of weighted averages
of the cross-products of the regressors, with kernel weights (like those above) depending upon the
deviation of the estimated conditional means $\{\hat g_i\}$ from one-half. Though this estimator was shown
to be consistent and asymptotically normal, its rate of convergence was slower than root-n,
unlike the asymptotic theory for the present estimator $\hat\beta$ derived in the next section.
An alternative “closed form” estimator $\tilde\theta$ of $\theta_0$ can be defined as
$$\tilde\theta \equiv [\hat S_{22}]^{-1} \cdot \hat S_{21}; \qquad (2.18)$$
this estimator is motivated by rewriting (2.4) as
$$x_{i1} = x_{i2}'\theta_0 + \lambda(g_i) + v_i, \qquad (2.19)$$
which is in the same form as the “selectivity bias” model treated by Ahn and Powell (1993), with
$x_i = (x_{i1}, x_{i2}')'$ and with error term $v_i$ which is identically zero for this application. In light of
the motivation for $\hat\theta$ given above, the alternative estimator $\tilde\theta$ can be viewed as exploiting the fact
(to be verified below) that $\hat\eta$ tends to zero in probability, since the smallest eigenvalue of the
probability limit $\Sigma_w$ of $\hat S$ is zero. The relation of $\hat\theta$ to $\tilde\theta$ here is analogous to the relationship
between the two classical single-equation estimators for simultaneous equations systems, namely,
limited-information maximum likelihood, which has an alternative derivation as a least-variance
ratio (LVR) estimator, and two-stage least squares (2SLS), which can be viewed as a modification
of LVR which replaces an estimated eigenvalue by its known (zero) probability limit. The analogy
to these classical estimators extends to the asymptotic distribution theory for $\hat\theta$ and $\tilde\theta$, which, under
the conditions imposed below, will be asymptotically equivalent, like LVR and 2SLS. Their relative
advantages and disadvantages are also analogous: e.g., $\tilde\theta$ is simpler to compute, while $\hat\theta$ will be
equivariant with respect to choice of which (nonzero) component of $\beta_0$ to normalize to unity. As
noted in the introduction, both estimators will be much easier to calculate than many of the existing
estimators for the single-index regression model, which typically require solution of a p-dimensional
minimization problem with a criterion involving a double-summation over the observations.
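To show how little computation the two estimators require, the sketch below runs both on simulated data. Everything specific in it is an assumption made for illustration: the design (standard normal regressors, link G = tanh, noise level), the Gaussian kernels in both steps, the bandwidths, the absence of trimming, and the helper names. It is not tuned to satisfy the regularity conditions of the appendix, and no claim is made about the accuracy of the resulting estimates.

```python
import numpy as np

def kernel_regression(x, y, h):
    # First step: kernel-weighted average of y (Gaussian kernel, an assumption).
    d = (x[:, None, :] - x[None, :, :]) / h
    K = np.exp(-0.5 * np.sum(d ** 2, axis=2))
    return (K @ y) / K.sum(axis=1)

def pairwise_matrix(x, g_hat, h2):
    # Second step: kernel-weighted average of pairwise outer products.
    n = x.shape[0]
    u = (g_hat[:, None] - g_hat[None, :]) / h2
    w = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * h2)
    d = x[:, None, :] - x[None, :, :]
    return np.einsum("ij,ija,ijb->ab", w, d, d) / (n * (n - 1))

# Hypothetical design: beta0 = (1, -theta0')' with theta0 = (1, -1)'.
rng = np.random.default_rng(0)
n, theta0 = 300, np.array([1.0, -1.0])
beta0 = np.concatenate(([1.0], -theta0))
x = rng.normal(size=(n, 3))
y = np.tanh(x @ beta0) + 0.1 * rng.normal(size=n)

g_hat = kernel_regression(x, y, h=0.4)
S = pairwise_matrix(x, g_hat, h2=0.1)
S21, S22 = S[1:, 0], S[1:, 1:]

# Closed-form estimator: the eigenvalue is replaced by zero.
theta_tilde = np.linalg.solve(S22, S21)

# Eigenvector estimator: uses the eigenvalue of S closest to zero.
eigvals = np.linalg.eigvalsh(S)
eta_hat = eigvals[np.argmin(np.abs(eigvals))]
theta_hat = np.linalg.solve(S22 - eta_hat * np.eye(2), S21)
```

Both estimators cost one linear solve once the matrix `S` is available; the eigenvector version additionally needs the spectrum of a p-by-p symmetric matrix. The dominant cost is the double sum over observations inside `pairwise_matrix`, which is evaluated once rather than at every iteration of a numerical optimizer.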
3. Large Sample Properties of the Estimator
Since the definition of the estimator $\hat\beta = (1, -\hat\theta')'$ is based on the same form of a “pairwise difference”
matrix estimator $\hat S$ analyzed in Ahn and Powell (1993), it is most convenient to impose the same
regularity conditions from that paper, and to derive the asymptotic theory for the present estimator
using the large-sample characterizations previously obtained. The appendix below lists analogues
of the eleven assumptions imposed by Ahn and Powell (1993), modified to fit the present problem
and notation. The necessity and generality of those assumptions were discussed at length in that
previous paper; here, then, those conditions are only briefly reviewed, noting any differences between
the current assumptions and their earlier counterparts.
In the assumptions in the appendix, the regression vector $x_i$ is assumed to have only discretely-
and continuously-distributed components, and high-order moments (namely, six) of $y_i$ and $x_i$ are
assumed to exist. The conditional mean $g_i = E[y_i \mid x_i] = G(x_i'\beta_0)$ is assumed to be continuously
distributed, with density function denoted by $f(\cdot)$; it is also assumed that the density $f$ and various
conditional expectations of functions of $x_i$ given $g_i = g$ are very smooth in the argument $g$, i.e.,
they have high-order derivatives which have well-behaved distributions when evaluated at $g_i = g$.
One of these functions is the conditional expectation of the trimming variable $t(x_i)$, which is
assumed to be bounded above and to decline smoothly to zero outside some compact set $X$ for
which $f(g) = f(G(x'\beta_0))$ is bounded away from zero on $X$. The functions $K(\cdot)$ and $k(\cdot)$ for the
first- and second-step estimators are assumed to be “higher-order” kernels, with the number of
vanishing moments of $K$ depending upon the number of continuous components of $x_i$, and with the
first three moments of $k$ equalling zero (i.e., $k$ is a “fourth-order kernel”). Likewise, the bandwidth
terms $h_1$ and $h_2$ are assumed to converge to zero at particular rates as the sample size n increases;
these conditions, combined with the “smoothness” and “higher-order kernel” assumptions, ensure
that the bias of various implicit nonparametric estimators is of smaller order than the inverse of the
square root of the sample size, and is therefore negligible for the first-order distribution theory.
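Under these conditions, the results of Ahn and Powell (1993) imply that the estimator $\hat S$ of
(2.10) above converges in probability to a matrix $\Sigma_0$, which is a special case of the general matrix
$\Sigma_w$ of (2.7), with the particular weighting function
$$w_0(x_i, x_j) \equiv t_i t_j (f_i f_j)^{1/2} = t_i t_j f_i, \qquad (3.1)$$
where $f_i \equiv f(g_i)$ is the density of $g_i$ and the last equality imposes the conditioning event $g_i = g_j$.
Using iterated expectations, the matrix $\Sigma_0$ can be rewritten as
$$\Sigma_0 \equiv E\{2 f_i\,[E(t_i \mid g_i) \cdot E(t_i x_i x_i' \mid g_i) - E(t_i x_i \mid g_i) \cdot E(t_i x_i' \mid g_i)]\} \qquad (3.2)$$
$$\equiv E\{2 f(g_i)\,[\tau(g_i) \cdot \Sigma_{xx}(g_i) - \mu_x(g_i) \cdot \mu_x(g_i)']\}, \qquad (3.3)$$
where $\tau(g) \equiv E(t_i \mid g_i = g)$, $\Sigma_{xx}(g) \equiv E(t_i x_i x_i' \mid g_i = g)$, and $\mu_x(g) \equiv E(t_i x_i \mid g_i = g)$.
It is easy to verify that $\Sigma_0\beta_0 = 0$; Assumption A.4 in the appendix is the identification
condition, which asserts that the null space of $\Sigma_0$ only consists of scalar multiples of $\beta_0$. And,
imposing the normalization $\beta_0 = (1, -\theta_0')'$, this requires that $\mathrm{rank}(\Sigma_0) = p - 1 = \mathrm{rank}(\Sigma_{22})$, where
$\Sigma_{22}$ is the lower-right $(p - 1)$-dimensional submatrix of $\Sigma_0$.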
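The two steps summarized above (iterated expectations, and the null-space property of the limit matrix) can be written out as a short worked derivation. It is a sketch that uses only the independence of observations $i$ and $j$ and the relation $x_i'\beta_0 = \lambda(g_i)$; the shorthand $\tau(g) = E(t_i \mid g_i = g)$, $\Sigma_{xx}(g) = E(t_i x_i x_i' \mid g_i = g)$ and $\mu_x(g) = E(t_i x_i \mid g_i = g)$ is stated here so the block can be read on its own.

```latex
% Conditional on g_i = g_j = g, with observations i and j independent:
\begin{align*}
E\big[t_i t_j (x_i - x_j)(x_i - x_j)' \,\big|\, g_i = g_j = g\big]
  &= \tau(g)\,\Sigma_{xx}(g) + \Sigma_{xx}(g)\,\tau(g)
     - \mu_x(g)\mu_x(g)' - \mu_x(g)\mu_x(g)' \\
  &= 2\big[\tau(g)\,\Sigma_{xx}(g) - \mu_x(g)\mu_x(g)'\big].
\end{align*}
% Null-space property, using x_i'\beta_0 = \lambda(g_i):
\begin{align*}
\Sigma_{xx}(g)\,\beta_0 &= E\big[t_i x_i (x_i'\beta_0) \,\big|\, g_i = g\big] = \lambda(g)\,\mu_x(g), \\
\mu_x(g)'\beta_0 &= E\big[t_i (x_i'\beta_0) \,\big|\, g_i = g\big] = \lambda(g)\,\tau(g), \\
\big[\tau(g)\,\Sigma_{xx}(g) - \mu_x(g)\mu_x(g)'\big]\beta_0
  &= \tau(g)\lambda(g)\mu_x(g) - \mu_x(g)\lambda(g)\tau(g) = 0 .
\end{align*}
```

Multiplying the bracketed matrix by the density weight and averaging over the distribution of the conditional mean preserves the zero, which is why the index coefficients lie in the null space of the limit matrix.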
Under these conditions, spelled out precisely in the appendix, obvious modifications of the
arguments for Lemma 3.1 and Theorem 3.1 of Ahn and Powell (1993) yield the following large-
sample properties of the estimator $\hat S$:
Lemma 3.1: Under Assumptions A.1 through A.11 in the appendix below,
(i) $\hat S - \Sigma_0 = o_p(1)$, and
(ii) $\sqrt{n}\,\hat S\beta_0 = \dfrac{2}{\sqrt{n}} \sum_{i=1}^{n} t_i f_i \lambda'(g_i)\,[\tau(g_i) x_i - \mu_x(g_i)] \cdot u_i + o_p(1)$,
where $u_i \equiv y_i - g_i$, $\lambda'(g) = d\lambda(g)/dg$ for $\lambda$ defined in (2.3), and the remaining terms are defined
in (3.1) - (3.3) above. ∎
Consistency of the estimator $\hat\theta$ in (2.17) above could be verified directly using result (i) and
the identification condition A.4, but it is simpler to derive the asymptotic distribution of $\hat\theta$ using
result (ii), from which consistency of $\hat\theta$ immediately follows. The asymptotic linearity expression
in (ii) above is the analogue to the result (ii) of Theorem 3.1 of Ahn and Powell (1993), exploiting
the relation $\lambda(g_i) = x_i'\beta_0$ (so $\hat S\beta_0$ is the same as “$\hat S_{z\lambda}$” of that paper, with $z_i \equiv x_i$). The terms
in the normalized average in expression (ii) have zero mean and finite variance, so the Lindeberg-
Levy central limit theorem implies that $\sqrt{n}\,\hat S\beta_0$ is asymptotically normal; however, this asymptotic
distribution will be singular, since
$$\beta_0'[\tau(g_i) x_i - \mu_x(g_i)] = [\tau(g_i)\lambda(g_i) - E(t_i \lambda(g_i) \mid g_i)] = 0, \qquad (3.4)$$
again using $\lambda(g_i) = x_i'\beta_0$. It follows that
$$\sqrt{n}\,\beta_0'\hat S\beta_0 = o_p(1), \qquad (3.5)$$
which further implies that the smallest (in magnitude) eigenvalue $\hat\eta$ converges in probability to zero
faster than the inverse of the square root of the sample size, because
$$\sqrt{n}\,|\hat\eta| = \sqrt{n}\,\min \cdots = o_p(1). \qquad (3.6)$$
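To derive the asymptotic distribution of the proposed estimator $\hat\theta$ of (2.17), the normalized
difference of $\hat\theta$ and $\theta_0$ can be decomposed as
$$\sqrt{n}(\hat\theta - \theta_0) = [\hat S_{22} - \hat\eta I]^{-1} \sqrt{n}\,[\hat S_{21} - (\hat S_{22} - \hat\eta I)\theta_0]$$
$$= [\hat S_{22} - \hat\eta I]^{-1} \sqrt{n}\,\hat s + \sqrt{n}\,\hat\eta\,[\hat S_{22} - \hat\eta I]^{-1}\theta_0, \qquad (3.7)$$
where
$$\hat s \equiv \hat S_{21} - \hat S_{22}\theta_0 \equiv [\hat S\beta_0]_2, \qquad (3.8)$$
i.e., $\hat s$ is the subvector of $\hat S\beta_0$ corresponding to the free coefficients $\theta_0$. From result (i) of Lemma 3.1
and (3.6), it follows that
$$\hat S_{22} - \hat\eta I \to_p \Sigma_{22} \qquad (3.9)$$
and
$$\sqrt{n}(\hat\theta - \theta_0) = [\Sigma_{22}]^{-1} \sqrt{n}\,\hat s + o_p(1), \qquad (3.10)$$
from which the consistency and asymptotic normality of $\hat\theta$ follow from the asymptotic normality
of $\sqrt{n}\,\hat S\beta_0$. A similar argument yields the asymptotic equivalence of Ahn’s estimator $\tilde\theta$ of (2.18) and
the estimator $\hat\theta$, since
$$\sqrt{n}(\hat\theta - \tilde\theta) = \sqrt{n}\,([\hat S_{22} - \hat\eta I]^{-1} - [\hat S_{22}]^{-1})\,\hat S_{21}$$
$$= \sqrt{n}\,\hat\eta\,[\hat S_{22} - \hat\eta I]^{-1}\,\tilde\theta = o_p(1), \qquad (3.11)$$
where the second equality uses the identity $A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}$ with $A = \hat S_{22} - \hat\eta I$ and
$B = \hat S_{22}$, and the last follows from (3.6) and (3.9).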
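The results of these calculations are summarized in the following proposition:
Theorem 3.1: Under Assumptions A.1 through A.11, the estimator $\hat\theta$ defined in (2.17) has
the asymptotic linear representation
$$\sqrt{n}(\hat\theta - \theta_0) = \Sigma_{22}^{-1}\,\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \psi_i + o_p(1),$$
where
$$\psi_i \equiv 2\,t_i f(g_i)\lambda'(g_i)\,(E[t_i \mid g_i]\,x_{i2} - E[t_i x_{i2} \mid g_i])\,(y_i - g_i),$$
and is asymptotically normal,
$$\sqrt{n}(\hat\theta - \theta_0) \to_d N(0, \Sigma_{22}^{-1}\,\Omega\,\Sigma_{22}^{-1}),$$
where $\Omega \equiv E[\psi_i\psi_i']$. Also, $\hat\theta$ and the estimator $\tilde\theta$ of (2.18), proposed by Ahn (1995), are
asymptotically equivalent:
$$\sqrt{n}(\hat\theta - \tilde\theta) = o_p(1). \ ∎$$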
A final requirement for conducting the usual large-sample normal inference procedures is a
consistent estimator of the asymptotic covariance matrix $\Sigma_{22}^{-1}\Omega\Sigma_{22}^{-1}$ of $\hat\theta$. Estimation of $\Sigma_{22}^{-1}$ is
straightforward; by result (i) of Lemma 3.1 and (3.6) above, either $[\hat S_{22} - \hat\eta I]^{-1}$ or $[\hat S_{22}]^{-1}$ will be
consistent, with the former being more natural for $\hat\theta$ and the latter for $\tilde\theta$. Consistent estimation of
the matrix $\Omega$ is less straightforward, but, as Ahn (1995) points out, one consistent estimator would
be
$$\hat\Omega \equiv \cdots\ \hat\psi_i\hat\psi_i', \qquad (3.12)$$
$$\cdots\ \frac{1}{n - 1}\ \cdots\ \hat\delta_{ij}\,(x_{i2} - x_{j2})(x_{i2} - x_{j2})'\ \cdots, \qquad (3.13)$$
$$\cdots\ \left(\frac{\hat g_i - \hat g_j}{h_2}\right)\ \cdots$$
and $k'(\cdot)$ denotes the derivative of the second-step kernel $k(\cdot)$. The argument for Ahn’s (1995)
Theorem 3.2 applies here as well, and yields consistency of $\hat\Omega$.
4. Topics for Further Research
Though the large-sample theory of the previous section used a specific (kernel) form for the first-step
nonparametric estimator $\hat g_i$ of $g_i = E[y_i \mid x_i]$, it seems likely the results of Lemma 3.1 and Theorem
3.1 could be established using other initial nonparametric estimators of this conditional mean (like
series, nearest neighbor, or locally linear regression methods) under analogous regularity conditions
for those estimators. (Indeed, Ahn’s (1995) analysis used a slightly different specification of the
kernel estimator than in the present paper.) This would be useful because the first-step kernel
estimator, while theoretically convenient, may well be problematic for practical implementation of
the procedure. In particular, the proposed estimation method exploits the monotonicity of $g_i$ in
terms of the index $x_i'\beta_0$, but the kernel estimator may require relatively large samples to accurately
reflect this monotonicity, and the second-step estimator will be sensitive to “oversmoothing” in
the first step. Consider, for example, a sample with $t_i = 1$ for all observations (so that all $x_i$ lie
in the prespecified compact set $X$). In this case, as the first-step bandwidth $h_1$ tends to infinity,
$\hat g_i$ tends to the sample average $\bar y$ of the dependent variable for all observations, and the matrix $\hat S$
tends to a constant multiple of the sample covariance of the regressors, whose smallest eigenvalue
need not tend to zero in large samples, and whose corresponding eigenvector bears no necessary
relation to $\beta_0$. This suggests that a first-step estimation method whose “oversmoothed” limit was
non-constant might have better finite-sample performance than the present kernel estimator. (For
example, $g_i$ might be estimated by the sum of a linear least-squares fit of $y_i$ on $x_i$ and a kernel
regression estimate of the conditional mean of the residuals from that fit.) Whether such alternative
first-step estimators are theoretically valid and practically useful is a good topic for further work.
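The oversmoothing limits discussed above are easy to see numerically. The sketch below is an illustration under assumptions chosen here, not a tested recommendation: the simulated design, the Gaussian kernel, the very large bandwidth `h_big`, and the helper names `kernel_regression` and `linear_plus_kernel` are all hypothetical, and the second helper is only one way to read the suggestion of adding a kernel fit of the residuals to a linear least-squares fit.

```python
import numpy as np

def kernel_regression(x, y, h):
    # Kernel-weighted average of y at each x_i (Gaussian kernel, an assumption).
    d = (x[:, None, :] - x[None, :, :]) / h
    K = np.exp(-0.5 * np.sum(d ** 2, axis=2))
    return (K @ y) / K.sum(axis=1)

def linear_plus_kernel(x, y, h):
    # Least-squares fit with intercept, plus a kernel fit of its residuals.
    X = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    return fitted + kernel_regression(x, y - fitted, h), fitted

# Hypothetical data with a strictly increasing link G = exp.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = np.exp(x @ np.array([1.0, 0.5])) + 0.1 * rng.normal(size=200)

h_big = 1e6                                   # extreme oversmoothing
g_plain = kernel_regression(x, y, h_big)      # collapses toward the mean of y
g_alt, ols_fit = linear_plus_kernel(x, y, h_big)  # collapses toward the linear fit
```

With the plain kernel estimator the fitted values become constant, so the second-step weights no longer distinguish pairs by their index values; with the combined estimator the oversmoothed limit is the least-squares fit, which still varies with the regressors.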
On a related topic, the “faster-than-root-n” convergence of the smallest eigenvalue $\hat\eta$ to zero
would be expected to fail if the single-index specification (1.1) for the conditional mean of $y_i$ is
not satisfied, which suggests that a normalized version of $\hat\eta$ might be used to test whether the
single-index specification is indeed correct. However, the derivation of the asymptotic distribution
of $\hat\eta$ is not straightforward, and is related to the “asymptotic singularity” issue that arises in the
nonparametric specification testing literature (e.g., see Aït Sahalia, Bickel, and Stoker (1994)), so
the large-sample properties of $\hat\eta$ will require additional work.
5. APPENDIX: Regularity Conditions
With modifications for the present problem and notation, conditions 3.1 through 3.11 of Ahn and
Powell (1993) are translated here as follows:
Assumption A.1 (Random Sampling and Bounded Moments): The vectors $(y_i, x_i')'$ are
independently and identically distributed across i, with all components having finite sixth-order
moments.
Assumption A.2 (Correctly-Specified Model): The data satisfy the monotone single-index
regression model described in (2.1), (2.2), and (2.3) above.
Assumption A.3 (Continuous Distribution of Index): The conditional distribution of $g_i \equiv
E[y_i \mid x_i] = G(x_i'\beta_0)$ is absolutely continuous with respect to Lebesgue measure, with (conditional)
density function $f(\cdot)$ that is continuous and bounded from above.
Assumption A.4 (Identification): The matrix $\Sigma_0$, defined in (3.2), and its lower-right
$(p - 1) \times (p - 1)$ submatrix $\Sigma_{22}$ have rank $p - 1$.
Assumption A.5 (Kernel Regularity, Second Step): The kernel function $k(\cdot)$ used to define
the weights $\hat\omega_{ij}$ in (2.11) above satisfies
(i) $k(u)$ is twice differentiable, with $|k''(u)| < k_0$ for some $k_0$;
(ii) $k(u) = k(-u)$;
(iii) $k(u) = 0$ if $|u| > l_0$ for some $l_0 > 0$; and
(iv) $\int u^l k(u)\,du = 0$ for $l = 1$, 2, and 3.
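The moment conditions on the second-step kernel can be checked for a concrete candidate. The sketch below uses one standard polynomial fourth-order kernel on the interval from minus one to one; this particular kernel is an example chosen here, not one named in the paper, and its second derivative is bounded but jumps at the endpoints of the support, so it illustrates the symmetry, compact-support, and moment conditions rather than every smoothness requirement.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Candidate kernel k(u) = (105/64) * (1 - 5u^2 + 7u^4 - 3u^6) on [-1, 1], zero elsewhere.
k_poly = (105.0 / 64.0) * Polynomial([1, 0, -5, 0, 7, 0, -3])

def k(u):
    """Evaluate the kernel, imposing the compact support |u| <= 1."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, k_poly(u), 0.0)

def moment(order):
    """Exact integral of u**order * k(u) over [-1, 1] via the antiderivative."""
    integrand = Polynomial([0] * order + [1]) * k_poly
    antiderivative = integrand.integ()
    return antiderivative(1.0) - antiderivative(-1.0)

# Moments of order 0 through 4.
moments = [moment(order) for order in range(5)]
```

The kernel integrates to one, its first three moments vanish, and its fourth moment is nonzero, which is what the label "fourth-order" refers to. Such a kernel necessarily takes negative values, which is why the second-step matrix need not be positive semi-definite in finite samples.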
Assumption A.6 (Bandwidth Rates, Second Step): The bandwidth sequence $h_2$ used to define
the weights $\hat\omega_{ij}$ of (2.11) is of the form
$$h_2 = c_n \cdot n^{-\delta},$$
where the positive sequence $c_n$ has $c_0 < c_n < c_0^{-1}$ for some $c_0 > 0$, and $\delta \in (1/8, 1/6)$.
Assumption A.7 (Smooth Density and Conditional Expectations): The conditional density
function $f(u)$ of $g_i$, the functions $\tau(g)$, $\mu_x(g)$, and $\Sigma_{xx}(g)$ (defined in (3.3) above), and the function
$\lambda(g) \equiv G^{-1}[g]$ and its derivative $\lambda'(g) \equiv d\lambda(g)/dg$ are all fourth-order continuously differentiable,
with derivatives that are bounded for all $g$ in the support of $g_i$.
Assumption A.8 (Distribution of Conditioning Variables): After an appropriate reordering,
the vector $x_i$ of regressors can be partitioned as $x_i = (x_i^{(1)\prime}, x_i^{(2)\prime})'$, where $x_i^{(1)}$ is continuously
distributed and $x_i^{(2)}$ is discrete. Furthermore, if $\phi(x^{(1)} \mid x^{(2)})$ is the conditional density function of
$x_i^{(1)}$ given $x_i^{(2)} = x^{(2)}$, then for each $x$ in some known, compact subset $X$ of the support of $x_i$, the
following conditions hold:
(i) $\phi(x^{(1)} \mid x^{(2)}) > \phi_0$ for some $\phi_0 = \phi_0(x^{(2)}) > 0$.
(ii) Defining $\xi_i \equiv \xi(x_i) \equiv (1, g_i, f_i, \tau_i, \lambda'(g_i), x_i', \mu_x(g_i)')'$, the function $\xi(x) \cdot \phi(x^{(1)} \mid x^{(2)})$ is
bounded and M-times continuously differentiable with bounded derivatives in $x^{(1)}$, for some even
integer $M > m/(1/3 - 2\delta)$, where $m = \dim(x^{(1)})$ and $\delta$ is given in Assumption A.6 above.
(iii) The functions $E[y_i^2 \mid x_i = x] \cdot \phi(x^{(1)} \mid x^{(2)})$ and $g(x) \cdot \phi(x^{(1)} \mid x^{(2)})$ are continuous on $X$.
(iv) The number of points of support of $x^{(2)}$ in $X$ is finite.
Assumption A.9 (Exogenous Trimming): The indicator variable $t_i$ is constructed so that
$t_i > 0$ only if $x_i \in X$, where the compact set $X$ satisfies the restrictions in Assumption A.8 above.
Assumption A.10 (Kernel Regularity, First Step): The kernel function $K(\cdot)$ used to define
the estimator $\hat g_i$ in (2.8) and (2.9) above is of the form
$$K(u) = \sum_{j=1}^{M/2} a_j\,\phi(u; b_j C),$$
where
(i) the even integer M satisfies the conditions of Assumption A.8 (ii);
(ii) $\phi(u; C)$ is the density function of a $N(0, C)$ random vector;
(iii) $C$ is an arbitrary positive definite matrix;
(iv) $b_1, \ldots, b_{M/2}$ are arbitrary, distinct, positive constants; and
(v) the constants $a_1, \ldots, a_{M/2}$ satisfy the linear equations
$$\sum_{j=1}^{M/2} a_j = 1, \qquad \sum_{j=1}^{M/2} a_j b_j^{q} = 0 \quad \text{for } q = 1, \ldots, M/2 - 1.$$
Assumption A.11 (Bandwidth Rates, First Step): The bandwidth sequence $h_1$ used to define
the estimator $\hat g_i$ in (2.8) and (2.9) is of the form
$$h_1 = d_n \cdot n^{-\gamma},$$
where the positive sequence $d_n$ has $d_0 < d_n < d_0^{-1}$ for some $d_0 > 0$, and $\gamma \in (1/2M, (1/6 - \delta)/m)$,
for $M$, $\delta$, and $m$ given in Assumptions A.6 and A.8 (ii).
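The bandwidth-rate restrictions interact with the required kernel order, and a small helper makes the arithmetic concrete. The function name `rate_window` and the example values are hypothetical; the sketch only evaluates the stated inequalities, with `delta` denoting the second-step rate exponent, `m` the number of continuous regressors, and the returned interval being the admissible range for the first-step rate exponent.

```python
import math

def rate_window(m, delta):
    """Smallest admissible even kernel order M and the open interval
    (lower, upper) allowed for the first-step bandwidth exponent.

    m     : number of continuously distributed regressors
    delta : second-step bandwidth exponent, required to lie in (1/8, 1/6)
    """
    if not (1.0 / 8.0 < delta < 1.0 / 6.0):
        raise ValueError("delta must lie strictly between 1/8 and 1/6")
    bound = m / (1.0 / 3.0 - 2.0 * delta)      # M must strictly exceed this
    M = 2 * (math.floor(bound / 2.0) + 1)      # smallest even integer above bound
    lower = 1.0 / (2.0 * M)
    upper = (1.0 / 6.0 - delta) / m
    return M, lower, upper

# Example: two continuous regressors and delta = 0.14.
M, lower, upper = rate_window(m=2, delta=0.14)
```

The requirement that the even integer M exceed m/(1/3 - 2 delta) is exactly what makes the interval for the first-step exponent nonempty, since it is equivalent to 1/(2M) being smaller than (1/6 - delta)/m. In the example the required order is M = 38 and the window is narrow, which illustrates how demanding the higher-order kernel conditions become as the number of continuous regressors grows.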
References
Ahn, H., 1995, “Estimation of monotonic single index models,” manuscript, Department of
Economics, Virginia Polytechnic Institute.
Ahn, H. and J.L. Powell, 1993, “Semiparametric estimation of censored selection models with
a nonparametric selection mechanism,” Journal of Econometrics, 58, 3-29.
Aït Sahalia, Y., P. Bickel, and T. Stoker, 1995, “Goodness of fit tests for regression using kernel
methods,” manuscript, Sloan School of Management, M.I.T.
Cavanagh, C. and R. Sherman, 1991, “Rank estimators for monotonic regression models,”
Han, A.K., 1987, “Non-parametric analysis of a generalized regression model: the maximum
rank correlation estimator,” Journal of Econometrics 35, 303-316.
Härdle, W. and T.M. Stoker, 1989, “Investigating smooth multiple regression by the method of
average derivatives,” Journal of the American Statistical Association, 84, 986-995.
Härdle, W. and Horowitz, 1996, “Direct Semiparametric Estimation of Single-Index Models
With Discrete Covariates,” Journal of the American Statistical Association, 91, 1632-1640.
Ichimura, H., 1993a, “Semiparametric least squares (SLS) and weighted SLS estimation of
single-index models,” Journal of Econometrics, 58, 71-120.
Ichimura, H., 1993b, “Local quantile regression estimation of binary response models with
conditional heteroskedasticity,” manuscript, Department of Economics, University of Minnesota.
Ichimura, H. and L. Lee, 1991, “Semiparametric least squares estimation of multiple index
models: single equation estimation,” in: W.A. Barnett, J.L. Powell, and G. Tauchen, eds.,
Nonparametric and semiparametric methods in econometrics and statistics, Cambridge: Cambridge
University Press.
Ichimura, H. and J.L. Powell, 1996, “A simple estimator for monotone single index models,”
manuscript, Department of Economics, Princeton University.
Klein, R.W. and R.S. Spady, 1993, “An efficient semiparametric estimator of the binary response
model,” Econometrica, 61, 387-422.
Manski, C.F., 1975, “Maximum score estimation of the stochastic utility model of choice,”
Journal of Econometrics, 3, 205-228.
Manski, C.F., 1985, “Semiparametric analysis of discrete response: asymptotic properties of
the maximum score estimator,” Journal of Econometrics, 27, 313-333.
Newey, W.K. and P. Ruud, 1991, “Density weighted least squares estimation,” manuscript,
Department of Economics, U.C. Berkeley.
Powell, J.L., 1994, “Estimation of semiparametric models,” in R.F. Engle and D.F. McFadden,
eds., Handbook of Econometrics, Volume 4, Amsterdam: North Holland.
Powell, J.L., J.H. Stock and T.M. Stoker, 1989, “Semiparametric estimation of weighted average
derivatives,” Econometrica 57, 1403-1430.
Ruud, P., 1986, “Consistent estimation of limited dependent variable models despite
misspecification of distribution,” Journal of Econometrics, 32, 157-187.