Content uploaded by JeanMarie Dufour
Author content
All content in this area was uploaded by JeanMarie Dufour on Feb 18, 2017
Content may be subject to copyright.
Finite-sample generalized confidence distributions and sign-based robust estimators in median regressions with heterogeneous dependent errors∗

Élise Coudin†
INSEE-CREST

Jean-Marie Dufour‡
McGill University

January 2017
∗The authors thank Magali Beffy, Marine Carrasco, Frédéric Jouneau, Marc Hallin, Thierry Magnac, Bill McCausland, Benoit Perron, and Alain Trognon for useful comments and constructive discussions. This work was supported by the William Dow Chair in Political Economy (McGill University), the Bank of Canada (Research Fellowship), the Toulouse School of Economics (Pierre-de-Fermat Chair of excellence), the Universidad Carlos III de Madrid (Banco Santander de Madrid Chair of excellence), a Guggenheim Fellowship, a Konrad-Adenauer Fellowship (Alexander-von-Humboldt Foundation, Germany), the Canadian Network of Centres of Excellence [program on Mathematics of Information Technology and Complex Systems (MITACS)], the Natural Sciences and Engineering Research Council of Canada, the Social Sciences and Humanities Research Council of Canada, and the Fonds de recherche sur la société et la culture (Québec).
†Centre de recherche en économie et statistique (CREST-ENSAE). Mailing address: Laboratoire de microéconométrie, CREST, Timbre J390, 15 Bd G. Péri, 92245 Malakoff Cedex, France. Email: elise.coudin@ensae.fr. TEL: 33 (0) 1 41 17 77 33; FAX: 33 (0) 1 41 17 76 34.
‡William Dow Professor of Economics, McGill University, Centre interuniversitaire de recherche en analyse des organisations (CIRANO), and Centre interuniversitaire de recherche en économie quantitative (CIREQ). Mailing address: Department of Economics, McGill University, Leacock Building, Room 414, 855 Sherbrooke Street West, Montréal, Québec H3A 2T7, Canada. TEL: (1) 514 398 4400 ext. 09156; FAX: (1) 514 398 4800; email: jean-marie.dufour@mcgill.ca. Web page: http://www.jeanmariedufour.com
ABSTRACT

We study the problem of estimating the parameters of a linear median regression without any assumption on the shape of the error distribution – including no condition on the existence of moments – allowing for heterogeneity (or heteroskedasticity) of unknown form, non-continuous distributions, and very general serial dependence (linear and nonlinear). This is done through a reverse inference approach, based on a distribution-free testing theory [Coudin and Dufour (2009, The Econometrics Journal)], from which confidence sets and point estimators are subsequently generated. The estimation problem is tackled in two complementary ways. First, we show how confidence distributions for model parameters can be applied in such a context. Such distributions – which can be interpreted as a form of fiducial inference – provide a frequency-based method for associating probabilities with subsets of the parameter space (like posterior distributions do in a Bayesian setup) without the introduction of prior distributions. We consider generalized confidence distributions applicable to multidimensional parameters, and we suggest the use of a projection technique for confidence inference on individual model parameters. Second, we propose point estimators, which have a natural association with confidence distributions. These estimators are based on maximizing test p-values and inherit robustness properties from the generating distribution-free tests. Both finite-sample and large-sample properties of the proposed estimators are established under weak regularity conditions. We show they are median unbiased (under symmetry and estimator unicity) and possess equivariance properties. Consistency and asymptotic normality are established without any moment existence assumption on the errors, allowing for non-continuous distributions, heterogeneity and serial dependence of unknown form. These conditions are considerably weaker than those used to show corresponding results for LAD estimators. In a Monte Carlo study of bias and RMSE, we show sign-based estimators perform better than LAD-type estimators in heteroskedastic settings. We present two empirical applications, which involve financial and macroeconomic data, both affected by heavy tails (non-normality) and heteroskedasticity: a trend model for the S&P index, and an equation used to study β-convergence of output levels across U.S. states.

Key words: sign-based methods; median regression; test inversion; Hodges-Lehmann estimators; confidence distributions; p-value function; least absolute deviation estimators; quantile regressions; sign test; simultaneous inference; Monte Carlo tests; projection methods; non-normality; heteroskedasticity; serial dependence; GARCH; stochastic volatility.

Journal of Economic Literature classification: C13, C12, C14, C15.
Contents

1. Introduction
2. Framework
   2.1. Model
   2.2. Quadratic sign-based tests
3. Confidence distributions
   3.1. Confidence distributions in univariate regressions
   3.2. Simultaneous and projection-based p-value functions in multivariate regression
4. Sign-based estimators
   4.1. Sign-based estimators as maximizers of a p-value function
   4.2. Sign-based estimators as solutions of a nonlinear generalized least-squares problem
   4.3. Sign-based estimators as GMM estimators
5. Finite-sample properties of sign-based estimators
6. Asymptotic properties
   6.1. Identification and consistency
   6.2. Asymptotic normality
   6.3. Asymptotic or projection-based confidence sets?
7. Simulation study
   7.1. Simulation setup
   7.2. Bias and RMSE
8. Empirical applications
   8.1. Drift estimation with heteroskedasticity
   8.2. A robust sign-based estimate of convergence across U.S. states
9. Conclusion
A. Proofs
B. Convergence data: concentrated statistics and p-values
List of Definitions, Assumptions, Propositions and Theorems

Assumption 2.1: Weak conditional mediangale
Assumption 2.2: Sign moment condition
Definition 3.1: Confidence distribution
Assumption 4.1: Invariance of the distribution function
Proposition 5.1: Invariance
Proposition 5.2: Median unbiasedness
Assumption 6.1: Mixing
Assumption 6.2: Boundedness
Assumption 6.3: Compactness
Assumption 6.4: Regularity of the density
Assumption 6.5: Point identification condition
Assumption 6.6: Uniformly positive definite weight matrix
Assumption 6.7: Locally positive definite weight matrix
Theorem 6.1: Consistency
Assumption 6.8: Uniformly bounded densities
Assumption 6.9: Mixing with r > 2
Assumption 6.10: Positive definiteness of L_n
Assumption 6.11: Positive definiteness of J_n
Theorem 6.2: Asymptotic normality
List of Tables

1. Simulated models
2. Simulated bias and RMSE
3. Constant and drift estimates
4. Summary of regression diagnostics
5. Regressions for personal income across U.S. States, 1880-1988
1. Introduction
A basic problem in statistics and econometrics consists in studying the relationship between a dependent variable and a vector of explanatory variables under weak distributional assumptions. For that purpose, the Laplace-Boscovich median regression is an attractive approach because it can yield estimators and tests which are considerably more robust to non-normality and outliers than least-squares methods; see Dodge (1997). The least absolute deviation (LAD) estimator is the reference estimation method in this context. Quantile regressions [Koenker and Bassett (1978), Koenker (2005)] can be viewed as extensions of median regression. An important reason why such methods yield more robust inference comes from the fact that hypotheses about moments are not generally testable in nonparametric setups, while hypotheses about quantiles remain testable under similar conditions [see Bahadur and Savage (1956), Dufour (2003), Dufour, Jouneau and Torrès (2008)].
The distributional theory of LAD estimators and their extensions usually postulates moment conditions on the model errors, such as the existence of moments up to a given order, as well as other regularity conditions, such as continuity, independence or identical distributions; see for instance Portnoy (1991), Knight (1998), El Bantli and Hallin (1999), and Koenker (2005). Further, this theory and the associated tests and confidence sets are typically based on asymptotic approximations. The same remark applies to work on LAD-type estimation in models involving heteroskedasticity and autocorrelation [Zhao (2001), Weiss (1990)], endogeneity [Amemiya (1982), Powell (1983), Hong and Tamer (2003)], censored models [Powell (1984, 1986)], and nonlinear functional forms [Weiss (1991)]. By contrast, provably valid tests can be derived in such models, under remarkably weaker conditions, which do not require the existence of moments and allow for very general forms of heterogeneity (or heteroskedasticity); see Coudin and Dufour (2009).
In this paper, we exploit this feature of testing theory in the context of median regression to derive more robust estimation methods. Specifically, we study the problem of estimating the parameters of a linear median regression without any assumption on the shape of the error distribution – including no condition on the existence of moments at any order – allowing for heterogeneity (or heteroskedasticity) of unknown form, non-continuous distributions, and very general serial dependence (linear and nonlinear). This is done through a reverse inference approach, which starts from a distribution-free testing theory [Coudin and Dufour (2009)], subsequently exploited to derive confidence sets and point estimators. Using the tests proposed in Coudin and Dufour (2009), the estimation problem is tackled in two complementary ways.

First, we show how confidence distributions for model parameters [Schweder and Hjort (2002), Xie and Singh (2013)] can be applied in such a context. Such distributions – which can be interpreted as a form of fiducial inference [Fisher (1930), Buehler (1983), Efron (1998)] – provide a frequency-based method for associating probabilities with subsets of the parameter space (like posterior distributions do in a Bayesian setup) without the introduction of a prior distribution. In the one-dimensional model, the confidence distribution is defined as a distribution whose quantiles span all the possible confidence intervals [Schweder and Hjort (2002)]. In this paper, we consider generalized confidence distributions applicable to multidimensional parameters, and we suggest the use of a projection technique for confidence inference on individual model parameters. The latter
are exact – in the sense that the parameters considered are covered with known probabilities (or larger) – under the mediangale assumption considered in Coudin and Dufour (2009). Further, if more general linear dependence is allowed, the proposed method remains valid asymptotically.

Second, we propose point estimates, which bear a natural association with the above confidence distributions. These Hodges-Lehmann estimators are based on maximizing test p-values and inherit several robustness properties from the distribution-free tests used to generate them [Hodges and Lehmann (1963)]. In particular, both finite-sample and large-sample properties are established under very weak regularity conditions. We show they are median unbiased (under symmetry and estimator unicity) and possess equivariance properties with respect to linear transformations of model variables. Consistency and asymptotic normality are established without any moment existence assumption on the errors, allowing non-continuous distributions, heterogeneity and general serial dependence of unknown form. These conditions are considerably weaker than those usually used to obtain corresponding results for LAD estimators.

The conjunction of sign-based tests, projection-based confidence regions, projection-based p-values and sign-based estimators thus provides a complete system of inference, which is valid for any given sample size under very weak distributional assumptions and remains asymptotically valid under weaker conditions (including allowance for general forms of linear residual dependence).
Fisher’s fiducial distributions and other fiducial inference arguments [Fisher (1930), Buehler (1983), Efron (1998), Hannig (2006)] are not commonly used in econometrics because they require the availability of pivotal test statistics with known distributions. This condition is not fulfilled in general, especially in semiparametric or nonparametric settings. However, in the context of median regression, sign-based methods provide a way to construct such pivots, and fiducial inference tools can be developed. For any given sample size, the sign transform enables one to construct test statistics with known nuisance-parameter-free distributions without additional parametric restrictions. This enables us to construct fiducial inference tools adapted to multidimensional parameters. We exploit realized p-value functions, which are constructed by testing hypotheses of the form H_0(β_0): β = β_0, where β is the vector of the regression coefficients. Specifically, we combine sign-based tests for such joint hypotheses [as given in Coudin and Dufour (2009)] with projection techniques. For each component, a projected p-value function provides a representation of the evidence for each possible value of that component.
Using the above p-values (as a function of β_0), we then derive estimators and study their properties. Hodges and Lehmann (1963) proposed a general principle to derive estimators directly from test procedures. They suggest inverting a test of H_0(β_0): β = β_0, and then choosing the value β_0 which is “least rejected” by the test procedure. First applied to Wilcoxon’s signed-rank statistic for estimating a shift or a location, this principle was adapted to regression models by deriving so-called R-estimators from rank or signed-rank statistics [Jureckova (1971), Jaeckel (1972), Koul (1971)]. In a multidimensional context, this leads one to select the value of β_0 with the highest degree of confidence, i.e. with the highest p-value.
We study the problem of estimating the parameters of the median regression by minimizing (weighted) sign-based test statistics over different null hypotheses. Since the generating sign-based tests are remarkably robust, the estimators inherit several attractive properties of the latter (e.g., robustness to non-normality and heterogeneity). We will see that these estimators can alternatively be computed by minimizing quadratic forms of the constrained signs, so they have a classical GMM form [see Hansen (1982), and Honore and Hu (2004) for GMM statistics involving signs].
Both finite-sample and large-sample properties of sign-based estimators are established under weak regularity conditions. We show they are median unbiased (under symmetry and estimator unicity) and possess equivariance properties with respect to linear transformations of model variables. Consistency and asymptotic normality are established without any moment existence assumption on the errors, allowing non-continuous distributions, heterogeneity and general serial dependence of unknown form. These conditions are considerably weaker than those usually used to obtain corresponding results for LAD estimators; see Bassett and Koenker (1978), Bloomfield and Steiger (1983), Powell (1984), Phillips (1991), Pollard (1991), Portnoy (1991), Weiss (1991), Knight (1998), El Bantli and Hallin (1999) and the references therein. In particular, asymptotic normality and consistency hold for heavy-tailed disturbances which may not have finite variances. This interesting property is induced by the sign transformation: signs of residuals always possess finite moments, so no further restriction on the disturbance moments is required. Except for Knight (1989) and Phillips (1991), who considered the case of autoregressive models, the distribution of LAD estimators in regressions where the error variances may not exist has received little attention. In general, LAD estimators and the sign-based estimators proposed here follow from different optimization rules, and they can be quite different.
The class of sign-based estimators we propose includes as special cases the sign estimators derived by Boldin, Simonova and Tyurin (1997) from locally most powerful sign tests in linear regressions with i.i.d. errors and fixed regressors. Note also that the procedures proposed by Hong and Tamer (2003) and Honore and Hu (2004) also rely on the i.i.d. assumption. In this paper, we stress that a major advantage of signs over ranks consists in dealing transparently with heteroskedastic (or heterogeneous) disturbances. Many heteroskedastic and possibly dependent schemes are covered and, in the presence of linear dependence, a HAC-type correction for heteroskedasticity and autocorrelation can be included in the criterion function.
The construction of sign-based estimators as Hodges-Lehmann estimators makes them a natural complement of the finite-sample tests used to generate them. The latter rely on the exact distribution of the corresponding sign-based test statistics, do not involve nuisance parameters, and allow one to control test levels in finite samples under heteroskedasticity and nonlinear dependence of unknown form. In Coudin and Dufour (2009), Monte Carlo test methods [Dwass (1957), Barnard (1963) and Dufour (2006)] are combined with test inversion and projection techniques [Dufour (1990, 1997), Dufour and Kiviet (1998), Abdelkhalek and Dufour (1998), Dufour and Jasiak (2001), Dufour and Taamouti (2005)] to build confidence sets and test general hypotheses.¹ There is no need to estimate the error density at zero, in contrast with tests that rely on kernel estimates of the LAD asymptotic covariance matrix.² Furthermore, when the test criteria are modified to cover linear dependence, the resulting inference is asymptotically valid. The conjunction of sign-based tests, projection-based confidence regions, and sign-based estimators thus provides a complete system of inference, which is valid for any given sample size under very weak distributional assumptions and remains asymptotically valid under even weaker conditions (including allowance for linear dependence in regression disturbances).

¹For an alternative finite-sample inference exploiting a quantile version of the same sign pivotality result, which holds if the observations are X-conditionally independent, see Chernozhukov, Hansen and Jansson (2009).
²In the i.i.d. error case, Honore and Hu (2004) observed in simulations that kernel-based estimates of the asymptotic standard error of the median-based estimator tend to be too small, so the associated tests tend to over-reject the null hypothesis. Other estimates of the LAD asymptotic covariance matrix can be obtained by bootstrap procedures [design matrix bootstrap in Buchinsky (1995, 1998), block bootstrap in Fitzenberger (1997), Bayesian bootstrap in Hahn (1997)] and resampling methods [Parzen, Wei and Ying (1994)]. But the justification of these also relies on usual asymptotic regularity conditions.
We study the performance of the proposed estimators in a Monte Carlo study that allows for various non-Gaussian and heteroskedastic setups. We find that sign-based estimators are competitive (in terms of bias and RMSE) when errors are i.i.d., while they are substantially more reliable than usual methods (LS, LAD) when heterogeneity or serial dependence is present in the error term. Finally, we present two empirical applications, which involve financial and macroeconomic data. In the first one, we study a trend model for the Standard and Poor’s Composite Price Index over the period 1928-1987, as well as the 1929 crash period (which is characterized by huge price volatility). In the second application, we consider an equation used to study β-convergence of output levels across U.S. states, based on a small sample. In both cases, the data are affected by heavy tails (non-normality) and heteroskedasticity.
The paper is organized as follows. Section 2 presents the model, the sign-based statistics and the Monte Carlo tests. Section 3 is dedicated to confidence distributions and p-value functions. In section 4, we define the proposed family of sign-based estimators. The finite-sample properties of the sign-based estimators are studied in section 5, while their asymptotic properties are considered in section 6. In section 7, we present the results of our simulation study of bias and RMSE. The empirical applications are reported in section 8. We conclude in section 9. Appendix A contains the proofs.
2. Framework
We will now summarize the general framework we study and define the test statistics on which the estimation methods we propose are based. This framework is the same as the one used in Coudin and Dufour (2009).
2.1. Model
We consider a stochastic process {(y_t, x_t′) : Ω → R^(p+1) : t = 1, 2, …} defined on a probability space (Ω, F, P), such that y_t and x_t satisfy a linear model of the form

y_t = x_t′β + u_t , t = 1, …, n, (2.1)
where y_t is a dependent variable, x_t = (x_t1, …, x_tp)′ is a p-vector of explanatory variables, and u_t is an error process. The x_t’s may be random or fixed. In the sequel, y = (y_1, …, y_n)′ ∈ R^n will denote the dependent variable vector, X = (x_1, …, x_n)′ the n×p matrix of explanatory variables, and u = (u_1, …, u_n)′ ∈ R^n the disturbance vector. Moreover, F_t(· | x_1, …, x_n) represents the distribution function of u_t conditional on X.
The traditional form of a median regression assumes that the disturbances u_1, …, u_n are i.i.d. with median zero:

Med(u_t | x_1, …, x_n) = 0, t = 1, …, n. (2.2)

Here, we relax the assumption that the u_t are i.i.d., and we consider moment conditions based on residual signs, where the sign operator s : R → {−1, 0, 1} is defined as s(a) = 1_[0,+∞)(a) − 1_(−∞,0](a), with 1_A(a) = 1 if a ∈ A and 1_A(a) = 0 if a ∉ A. For convenience, if u ∈ R^n, we write s(u) = (s(u_1), …, s(u_n))′, the n-vector of the signs of the components.
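As a quick illustration, the sign operator s(·) and its vectorized form s(u) can be sketched in Python (NumPy is used here for convenience; this sketch is ours, not code from the paper):

```python
import numpy as np

def sign(a):
    """Sign operator s: R -> {-1, 0, 1}, applied elementwise.

    Matches s(a) = 1_[0,+inf)(a) - 1_(-inf,0](a): +1 for a > 0,
    -1 for a < 0, and 0 at a = 0 (both indicators equal 1 there).
    """
    a = np.asarray(a, dtype=float)
    return (a >= 0).astype(int) - (a <= 0).astype(int)

# s(u): the n-vector of signs of the residual components
u = np.array([2.5, -0.3, 0.0, 1.2])
print(sign(u))  # -> [ 1 -1  0  1]
```

Note that this coincides with the usual sign function, including the value 0 at exact zeros, which matters for non-continuous error distributions.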
Assumption (2.2) is not sufficient to obtain a finite-sample distributional theory for sign statistics (because further restrictions on the dependence between the errors are needed). Let us consider adapted sequences S(v, F) = {v_t, F_t : t = 1, 2, …} where v_t is any measurable function of W_t = (y_t, x_t′)′, F_t is a σ-field in Ω, F_s ⊆ F_t for s < t, σ(W_1, …, W_t) ⊂ F_t, and σ(W_1, …, W_t) is the σ-algebra spanned by W_1, …, W_t. Then the weak conditional mediangale provides such a setup.
Assumption 2.1 WEAK CONDITIONAL MEDIANGALE. Let F_t = σ(u_1, …, u_t, X), for t ≥ 1. u in the adapted sequence S(u, F) is a weak mediangale conditional on X with respect to {F_t : t = 1, 2, …} iff P[u_1 < 0 | X] = P[u_1 > 0 | X] and

P[u_t < 0 | u_1, …, u_(t−1), X] = P[u_t > 0 | u_1, …, u_(t−1), X], for t > 1. (2.3)
Besides non-normality (including no condition on the existence of moments), this assumption allows for heterogeneity (or heteroskedasticity) of unknown form, non-continuous distributions, and general forms of (nonlinear) serial dependence, including GARCH-type and stochastic volatility of unknown order. It does not, however, cover “linear serial dependence” such as an ARMA process on u_t.

Clearly, Assumption 2.1 entails (2.2). When E‖x_t‖ < +∞ for all t, it also implies that s(u_t) is uncorrelated with x_t, an assumption we state for future reference.

Assumption 2.2 SIGN MOMENT CONDITION. E‖x_t‖ < +∞ and E[s(u_t) x_t] = 0, for t = 1, …, n.

This assumption allows for both linear and nonlinear serial dependence, but makes the derivation of finite-sample distributions difficult. We use it in the asymptotic results presented below.
2.2. Quadratic sign-based tests

In order to derive robust estimators, we consider tests of hypotheses of the form H_0(β_0): β = β_0 vs. H_1(β_0): β ≠ β_0 in model (2.1)-(2.2). These are based on general quadratic forms in the vector s(y − Xβ_0) of the constrained signs (i.e., the signs aligned with respect to Xβ_0):

D_S[β_0, Ω̄_n(β_0)] = s(y − Xβ_0)′ X Ω̄_n(β_0) X′ s(y − Xβ_0), (2.4)

where Ω̄_n(β_0) = Ω_n(s(y − Xβ_0), X) is a p×p positive definite weight matrix which may depend on the constrained signs. If the disturbances follow a weak mediangale (Assumption 2.1), sign-based statistics of this form constitute pivotal functions: the distribution of D_S[β_0, Ω̄_n(β_0)] conditional on X is completely determined under H_0(β_0) and can be simulated. Even though the distribution of D_S[β_0, Ω̄_n(β_0)] depends on X and Ω_n(·) under H_0(β_0), critical values can be approximated to any degree of precision by simulation. Alternatively, exact Monte Carlo tests can be built using a randomized tie correction procedure [Dufour (2006)]. So we can get an exact test of H_0(β_0). The fact that Ω_n(·) depends on the data only through s(y − Xβ_0) plays a central role in generating this feature.
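To make the construction concrete, here is a minimal Python sketch of the statistic (2.4). The weight matrix Ω_n = (X′X)⁻¹ used below (which depends on the data only through X) is just one admissible choice, assumed here for illustration; the paper allows any positive definite Ω_n(s(y − Xβ_0), X):

```python
import numpy as np

def sign(a):
    a = np.asarray(a, dtype=float)
    return (a >= 0).astype(int) - (a <= 0).astype(int)

def ds_statistic(y, X, beta0):
    """Quadratic sign-based statistic (2.4).

    Uses the (assumed) weight matrix Omega_n = (X'X)^{-1}; any positive
    definite weight matrix depending on the data only through the
    constrained signs and X is allowed by the theory.
    """
    s = sign(y - X @ beta0)          # constrained signs s(y - X beta0)
    Xs = X.T @ s                     # p-vector X' s(y - X beta0)
    omega = np.linalg.inv(X.T @ X)   # assumed weight matrix
    return float(Xs @ omega @ Xs)

# heavy-tailed errors are fine: only the signs of residuals enter
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.0, -0.5]) + rng.standard_cauchy(100)
print(ds_statistic(y, X, np.array([1.0, -0.5])))
```

Since the statistic is a function of X and the sign vector only, its null distribution conditional on X can be simulated exactly by drawing i.i.d. ±1 sign vectors under the mediangale assumption.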
Further, if linear serial dependence is allowed and the assumption that s(y − Xβ_0) and X are independent is relaxed [as described in Coudin and Dufour (2009)], this dependence can be taken into account by an appropriate choice of Ω_n(·). The test statistic D_S[β_0, Ω̄_n(β_0)] then remains asymptotically pivotal under H_0(β_0), and the finite-sample procedure just described yields a test such that the probability of rejecting H_0(β_0) converges to the nominal level of the test under any distribution compatible with H_0(β_0). In all cases, due to the sign transformation, the tests so obtained are remarkably robust to heavy-tailed distributions (and other features).
It will be useful to spell out how an exact Monte Carlo test based on a discrete test statistic like D_S[β_0, Ω̄_n(β_0)] can be obtained. Under Assumption 2.1, we can generate a vector of N independent replicates (D_S^(1)(β_0), …, D_S^(N)(β_0))′ from the distribution of D_S[β_0, Ω̄_n(β_0)] under the null hypothesis, as well as an (N+1)-vector (V^(0), …, V^(N))′ of i.i.d. uniform variables on the interval [0, 1]. Set D_S^(0)(β_0) ≡ D_S[β_0, Ω̄_n(β_0)], the observed statistic. Then a Monte Carlo test for H_0(β_0) consists in rejecting the null hypothesis whenever the empirical p-value is smaller than α, i.e. p̃_N(β_0) ≤ α, where p̃_N(β_0) ≡ p̂_N[D_S^(0)(β_0), β_0],

p̂_N(x, β_0) = [N Ĝ_N(x, β_0) + 1] / (N + 1) (2.5)

and Ĝ_N(x, β_0) = 1 − (1/N) ∑_(i=1)^N s_+(x − D_S^(i)(β_0)) + (1/N) ∑_(i=1)^N δ(D_S^(i)(β_0) − x) s_+(V^(i) − V^(0)), with s_+(x) = 1_[0,∞)(x), δ(x) = 1_{0}(x). When α(N+1) is an integer, the size of this test is equal to α for any sample size n [see Dufour (2006)]. This procedure also provides a test such that the probability of rejection converges to α.
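The randomized p-value (2.5) can be sketched as follows; `mc_pvalue` and its argument names are our illustrative choices, and the replicates `d_sim` are assumed to have been drawn from the null distribution of D_S (e.g., by simulating i.i.d. sign vectors conditional on X):

```python
import numpy as np

def mc_pvalue(d_obs, d_sim, rng):
    """Randomized Monte Carlo p-value (2.5) with tie-breaking.

    d_obs : observed statistic D_S^(0)(beta0)
    d_sim : array of N simulated replicates D_S^(1..N)(beta0) under H0(beta0)
    Ties between d_obs and the replicates are broken by independent
    uniform draws (lexicographic rule), so the resulting test has exact
    size alpha whenever alpha*(N+1) is an integer.
    """
    N = len(d_sim)
    V = rng.uniform(size=N + 1)      # V^(0) for d_obs, V^(1..N) for replicates
    greater = np.sum(d_sim > d_obs)
    ties = np.sum((d_sim == d_obs) & (V[1:] > V[0]))
    G_hat = (greater + ties) / N     # randomized survival function \hat G_N
    return (N * G_hat + 1) / (N + 1)

# e.g. with N = 99 replicates, alpha = 0.05 yields an exact test, since
# 0.05 * (99 + 1) = 5 is an integer
```

Rejecting H_0(β_0) whenever this p-value is at most α reproduces the exact Monte Carlo test described above.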
Note also that the confidence region

C_(1−α)(β) = {β_0 : p̃_N(β_0) ≥ α}, (2.6)

which contains all the values β_0 such that the empirical p-value p̃_N(β_0) is higher than α, has by construction level 1 − α for any sample size. It is then possible to derive general (and possibly nonlinear) tests and confidence sets by projection techniques. For example, conservative individual confidence intervals are obtained in such a way. Finally, if D_S is an asymptotically pivotal function, all previous results hold asymptotically. For a detailed presentation, see Coudin and Dufour (2009).
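The projection step can be sketched with a naive grid search over candidate values of β_0; `pvalue_fn`, the grid, and the toy p-value function below are assumptions made for illustration, not the paper's algorithm (practical implementations use smarter search over the parameter space):

```python
import numpy as np

def projection_interval(pvalue_fn, grid, alpha=0.05, component=0):
    """Project the joint confidence region C_{1-alpha} = {b0 : p(b0) >= alpha}
    onto one coordinate: the (conservative) projected interval runs from the
    smallest to the largest value of that coordinate among non-rejected
    grid points.

    pvalue_fn : callable returning the empirical p-value at a candidate beta0
    grid      : (m, p) array of candidate parameter vectors
    """
    kept = np.array([g for g in grid if pvalue_fn(g) >= alpha])
    if kept.size == 0:
        return None                  # empty joint confidence set
    coord = kept[:, component]
    return coord.min(), coord.max()

# toy illustration with a stand-in p-value function (not a sign-based one)
grid = np.stack([np.linspace(-2, 2, 81), np.zeros(81)], axis=1)
pval = lambda b: max(0.0, 1.0 - abs(b[0]))
print(projection_interval(pval, grid, alpha=0.05))
```

By construction, each projected interval covers the true component with probability at least 1 − α, which is why projection-based individual intervals are conservative.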
3. Confidence distributions

In the one-parameter model, statisticians have defined the notion of a confidence distribution, which summarizes a family of confidence intervals; see Schweder and Hjort (2002). By definition, the quantiles of a confidence distribution span all the possible confidence intervals of a real parameter β. The confidence distribution is a reinterpretation of Fisher’s fiducial distributions and provides, in a sense, an analogue of Bayesian posterior probabilities in a frequentist setup [see also Fisher (1930), Neyman (1941) and Efron (1998)]. This statistical notion is not commonly used in the econometric literature, for two reasons. First, it is only defined in the one-parameter case. Second, it requires that the test statistic be a pivot with a known exact distribution. Below, we extend that notion (or an equivalent one) to multidimensional parameters. The sign transformation enables one to construct statistics which are pivots with known distributions without imposing parametric restrictions on the sample. Consequently, our setup does not suffer from the second restriction. In this section, we briefly recall the initial statistical concept and apply it to an example in univariate regression. Then we address the extension to multidimensional regressions.
3.1. Confidence distributions in univariate regressions

Schweder and Hjort (2002) defined the confidence distribution for the real parameter β as a distribution, depending on the observations (y, x), whose cumulative distribution function, evaluated at the true value of β, has a uniform distribution whatever the true value of β. In a formalized way, this can be expressed as follows:

Definition 3.1 CONFIDENCE DISTRIBUTION. Any distribution with cumulative distribution function CD(β) and quantile function CD^(−1)(β), such that

P_β[β ≤ CD^(−1)(α; y, x)] = P_β[CD(β; y, x) ≤ α] = α (3.1)

for all α ∈ (0, 1) and for all probability distributions in the statistical model, is called a confidence distribution of β.
(−∞, CD^(−1)(α)] is a one-sided stochastic confidence interval with coverage probability α,³ and the realized confidence CD(β_0; y, x) is the p-value of the one-sided hypothesis H*_0(β_0): β ≤ β_0 versus H*_1(β_0): β > β_0 when the observed data are (y, x). The realized p-value when testing H_0(β_0): β = β_0 versus H_1(β_0): β ≠ β_0 is 2 min{CD(β_0), 1 − CD(β_0)}. Those relations are stated in Lemma 2 of Schweder and Hjort (2002): the confidence of the statement “β ≤ β_0” is the degree of confidence CD(β_0) for the confidence interval (−∞, CD^(−1)(CD(β_0))], and is equal to the p-value of a test of H*_0(β_0): β ≤ β_0 vs. H*_1(β_0): β > β_0. Hence, tests and confidence intervals on β are contained in the confidence distribution.
Schweder and Hjort (2002) also note that, since the cumulative function CD(β) is an invertible function of β and is uniformly distributed, CD(β) constitutes a pivot conditional on x. Reciprocally, whenever a pivot increases with β (for example, a continuous statistic T(β) with cumulative distribution function F that is independent of β and free of any nuisance parameter), F(T(β)) is uniformly distributed and satisfies the conditions for providing a confidence distribution. Let T(β) be such a continuous real statistic, increasing with β, whose distribution is free of nuisance parameters. A test of H_0: β ≤ β_0 is rejected when T_obs(β_0) is large, with p-value P_(β_0)[T(β_0) > T_obs(β_0)]. Then,

P_(β_0)[T(β_0) > T_obs(β_0)] = 1 − F_(β_0)(T_obs(β_0)) = CD(β_0), (3.2)

where F_(β_0)(·) is the sampling distribution of T(β_0) under β = β_0. Consequently, simulated sampling distributions and simulated realized p-values as presented previously yield a way to construct simulated confidence distributions.

The sampling distribution and the confidence distribution are fundamentally different theoretical notions. The sampling distribution is the probability distribution of T(β) obtained by repeated sampling, whereas the confidence distribution is an ex-post object that contains the confidence statements one can make about the value of β given y, x, and T_obs(β).

³For continuous distributions, just note that P_β[β ≤ CD^(−1)(α)] = P_β[CD(β) ≤ CD(CD^(−1)(α))] = P_β[CD(β) ≤ α] = α.
Randomized confidence distributions for discrete statistics. A last remark relates to discrete statistics. Confidence distributions based on discrete statistics cannot lead to a continuous uniform distribution, so approximations must be used. Schweder and Hjort (2002) proposed half correction: for discrete statistics, they used

CD(β0) = Pβ0[T(β0) > Tobs(β0)] + (1/2) Pβ0[T(β0) = Tobs(β0)].   (3.3)

We rather use randomization, as in Section 2. The discrete statistic T(β) is associated with an auxiliary statistic UT, which is independently, uniformly and continuously distributed over [0, 1]. A lexicographic order is used to break ties:

CD(β0) = Pβ0[T(β0) > Tobs(β0)] + P[UT(β0) > UTobs(β0)] Pβ0[T(β0) = Tobs(β0)].   (3.4)
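As a simple illustration, the randomized quantity in (3.4) can be estimated directly from simulated draws of the statistic, using the fact that P[U > Uobs] = 1 − Uobs for a uniform U. The sketch below is our own illustrative code (the toy null distribution is hypothetical):

```python
def randomized_cd(t_obs, u_obs, t_sims):
    """Monte Carlo version of equation (3.4): survival probability of a
    discrete statistic T, with ties T = Tobs broken by an auxiliary
    uniform variate U (so that P[U > Uobs] = 1 - Uobs)."""
    n = len(t_sims)
    greater = sum(t > t_obs for t in t_sims) / n
    ties = sum(t == t_obs for t in t_sims) / n
    return greater + (1.0 - u_obs) * ties

# Toy discrete null distribution: T uniform on {0, 1, 2, 3}.
cd = randomized_cd(2, 0.5, [0, 1, 2, 3])  # 0.25 + 0.5 * 0.25 = 0.375
```

The half-correction (3.3) is recovered on average, since E[1 − Uobs] = 1/2.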
Simulated confidence distributions and illustration. Let us consider a simple example to illustrate these notions. In the model yi = β xi + ui, i = 1,...,n, (ui, xi) i.i.d. ~ N(0, I2), the Student sign-based statistic

SST(β) = Σi s(yi − xi β) xi / (Σi xi²)^(1/2)

is a pivotal function and decreases with β. The simulated confidence distribution of β given the realization y, x is

ĈD(β0) = 1 − F̂β0(SST(β0)),   (3.5)
with F̂β0 a Monte Carlo estimate of the sampling distribution of SST under H0(β0): β = β0. Figure 1 presents a simulated confidence distribution cumulative function for β, given 200 realizations of (ui, xi), based on SST. The Monte Carlo estimate F̂β0 is obtained from 9999 replicates of SST under H0(β0). Testing H0*: β ≤ .1 at level 10% can be done by reading CD(.1), here .92; the test accepts H0*. Further, (−∞, .23] constitutes a one-sided confidence interval for β with level .95.
Figure 1. Simulated confidence distribution cumulative function based on SST. [Axes: beta, CD(beta).]

Realized p-value functions for discrete statistics. Another interesting object is the realized p-value function for testing point hypotheses H0(β0). The latter is a simple transformation of the CD cumulative function. The simulated realized p-value is given by

p̂SST(β0) = 2 min{ĈD_SST(β0), 1 − ĈD_SST(β0)}.   (3.6)
Consider now the statistic SF = SST². SF is a pivotal function but, contrary to SST, not a monotone function of β. Because of this lack of monotonicity, an entire confidence distribution cannot be recovered from SF. However, the p-value function can be constructed using equation (2.5). Figures 2(a) and 2(b) compare p-value functions based on SST and SF. Inverting the p-value function allows one to recover half of the confidence distribution and consequently half of the inference results, i.e. the two-sided confidence intervals. For example, in Figure 2(a), [−.17, .24] constitutes a confidence interval with level 90% for both statistics. The p-value function thus provides an interesting summary of the available inference. In particular, it gives the degree of confidence one can have in the statement β = β0. Finally, the p-value function has an important advantage over the confidence distribution: it extends straightforwardly to multidimensional parameters.
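The mapping from a confidence distribution to two-sided inference can be sketched as follows (our own toy code; the logistic cd below is a purely illustrative stand-in for a monotone simulated CD):

```python
import math

def two_sided_p(cd_value):
    # p(beta0) = 2 min{CD(beta0), 1 - CD(beta0)}, as in equation (3.6)
    return 2.0 * min(cd_value, 1.0 - cd_value)

def invert_pvalue(cd, grid, alpha):
    """Two-sided confidence interval of level 1 - alpha, obtained by
    keeping the grid points whose two-sided p-value is at least alpha."""
    kept = [b for b in grid if two_sided_p(cd(b)) >= alpha]
    return (min(kept), max(kept)) if kept else None

cd = lambda b: 1.0 / (1.0 + math.exp(-10.0 * b))   # illustrative CD
grid = [i / 100.0 for i in range(-100, 101)]
lo, hi = invert_pvalue(cd, grid, 0.10)             # 90% interval
```

With this cd, the retained points are those with CD between .05 and .95, i.e. the interval [−.29, .29] on the grid.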
The spread of the p-value function is also related to model specification and parameter identification. When the p-value function is flat, one may expect the parameter to be badly identified: either there exists a set of observationally equivalent parameters, in which case the p-values are high over a wide set of values; or there exists no value satisfying the model, in which case the p-values are small everywhere. To illustrate this point, consider another example (Example 2) where the first n1 observations satisfy yi = β1 xi + ui, i = 1,...,n1, (ui, xi) i.i.d. ~ N(0, I2), and the n2 following ones satisfy yi = β2 xi + ui, i = n1+1,...,n1+n2, (ui, xi) i.i.d. ~ N(0, I2), with β1 = −.5 and β2 = .5. The model yi = β xi + ui, i = 1,...,n1+n2, is then misspecified. In Figure 2(b), we notice that the spread of the p-value function based on SF is large: the set of observationally equivalent β is not reduced to a point.

Figure 2. Simulated p-value functions based on SST and SF: (a) Example 1, well-identified case; (b) Example 2, misspecified case. [Axes: beta, p-value.]
3.2. Simultaneous and projection-based p-value functions in multivariate regression
If p ≥ 2, the confidence distribution notion is no longer defined. However, simulated realized p-values for testing H0(β0) can easily be constructed from the SF statistic, and more generally from any sign-based statistic which satisfies equation (2.4). Simulated p-values lead to a mapping which has a 3-dimensional representation for p = 2. Consider the model: yi = β1 x1i + β2 x2i + ui, i = 1,...,n, (ui, x1i, x2i) i.i.d. ~ N(0, I3), β = (β1, β2)′ = (0, 0)′, y = (y1,...,yn)′, u = (u1,...,un)′, x1 = (x11,...,x1n)′, x2 = (x21,...,x2n)′ and X = (x1, x2). Let

DS(β, (X′X)⁻¹) = s′(y − Xβ) X (X′X)⁻¹ X′ s(y − Xβ).

In Figure 3, we compute the simulated p-value function p̃DS_N(β0) for testing H0(β0) on a grid of values of β0, using N replicates of the sign vector. p̃DS_N(β0) allows one to construct simultaneous confidence sets for β = (β1, β2)′ with any level. By construction, the confidence region C1−α(β) defined as

C1−α(β) = {β0 : p̃DS_N(β0) ≥ α}   (3.7)
Figure 3. Simulated p-value functions based on SF (n = 200, N = 9999). [Axes: beta1, beta2; height: simulated p-values.]
has level 1 − α [see Dufour (2006)]. Hence, by construction, C1−α(β) corresponds to the intersection of the horizontal plane at ordinate α with the envelope of p̃DS_N(β0).
For higher dimensions (p > 2), a complete graphical representation is no longer available. However, one can consider projection-based realized p-value functions for each individual component of the parameter of interest, in a way similar to projection-based confidence intervals. For this, we apply the general projection strategy to the complete simultaneous p-value function. The projection-based realized p-value function for the component β1 is given by:

Proj. p̃β1_N(β1_0) = max over β2_0 ∈ R of p̃DS_N[(β1_0, β2_0)].   (3.8)
Figure 4 presents projection-based confidence intervals for the individual parameters of the previous two-dimensional example. [−.22, .21] is a 95% (conservative) confidence interval for β1, and [−.38, .02] is a 95% (conservative) confidence interval for β2. The hypothesis β1 = 0 is accepted at the 5% level with p-value 1.0; β2 = 0 is accepted at 5% with p-value .06.

Figure 4. Projection-based p-values: (a) projection-based p-values for β1; (b) projection-based p-values for β2.
Controlled inference using simulated confidence distributions and realized p-values. Simulated confidence distributions and realized p-values are Monte Carlo-based tools. Hence, the derived tests control the nominal size only for α's such that α(N+1) is an integer; see Dufour (2006):

P[p̃DS_N(β0) ≤ α] = α,  for all α such that α(N+1) ∈ N.

If α(N+1) ∉ N, only bounds on the significance level are known, but they are very close to α when N is sufficiently large:

I[α(N+1) − 1]/(N+1) ≤ P[p̃DS_N(β0) ≤ α] < α,  for all α such that α(N+1) ∉ N,

where I[x] denotes the integer part of x. Contrary to tests, simulated confidence distributions and realized p-values are not evaluated at a given significance level α but rather over a range of significance levels (α1,...,αA). Hence, one must choose the number of replicates N carefully in order to control the significance level for all the αi's, i.e. choose N sufficiently large to have (N+1)αi ∈ N for all αi ∈ (α1,...,αA). In the previous illustrations, N = 9999, which ensures that all significance levels that are multiples of .0001 are controlled.
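Checking whether a number of replicates N controls a whole set of levels reduces to checking that (N+1)αi is an integer for each level. The helper below is an illustrative utility of our own (not from the paper):

```python
from fractions import Fraction

def valid_replication_counts(alphas, n_max):
    """Replication numbers N <= n_max such that (N + 1) * alpha is an
    integer for every significance level alpha (levels are given as
    decimal strings to avoid binary rounding)."""
    levels = [Fraction(a) for a in alphas]
    return [n for n in range(1, n_max + 1)
            if all((lev * (n + 1)).denominator == 1 for lev in levels)]

# For the levels .01, .05 and .10, any N with N + 1 a multiple of 100 works.
ns = valid_replication_counts(["0.01", "0.05", "0.10"], 1000)
```

For instance, N = 99, 199, ..., 999 all qualify for these three levels, while N = 100 does not.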
4. Sign-based estimators
Sign-based estimators complete the above system of inference. Intuition suggests considering the values with the highest confidence degree, i.e., with the highest p-values. Estimators obtained by this sort of test inversion constitute multidimensional extensions of the Hodges-Lehmann principle.
4.1. Sign-based estimators as maximizers of a p-value function
Hodges and Lehmann (1963) presented a general principle to derive estimators by test inversion; see also Johnson, Kotz and Read (1983). Suppose μ ∈ R and T(μ0, W) is a statistic for testing μ = μ0 against μ > μ0 based on the observations W. Suppose further that T(μ, W) is nondecreasing in the scalar μ. Given a known central value of T(μ0, W), say m(μ0) [for example E_W T(μ0, W)], the test rejects μ = μ0 whenever the observed T is larger than, say, m(μ0). If that is the case, one is inclined to prefer higher values of μ. The reverse holds when testing the opposite. If m(μ0) does not depend on μ0 [m(μ0) = m0], an intuitive estimator of μ (if it exists) is given by μ* such that T(μ*, W) equals m0 (or is very close to m0). μ* may be seen as the value of μ which is most supported by the observations.
This principle can be directly extended to multidimensional parameter setups through p-value functions. Let β ∈ Rp. Consider testing H0(β0): β = β0 versus H1(β0): β = β1 with the positive statistic T. A test based on T rejects H0(β0) when T(β0) is larger than a certain critical value that depends on the test level. The estimator of β is chosen as the value of β least rejected when the level α of the test increases. This corresponds to the highest p-value. If the associated p-value for H0(β0) is p(β0) = G(DS(β0) | β0), where G(x | β0) is the survival function of DS(β0), i.e. G(x | β0) = P[DS(β0) > x], the set

M1 = argmax over β ∈ Rp of p(β)   (4.1)

constitutes a set of Hodges-Lehmann-type estimators. HL-type estimators maximize the p-value function. There may not be a unique maximizer; in that case, any maximizer is consistent with the data.
4.2. Sign-based estimators as solutions of a nonlinear generalized least-squares problem
When the distribution of T(β0) and the corresponding p-value function do not depend on the tested value β0, maximizing the p-value is equivalent to minimizing the statistic T(β0). This point is stated in the following proposition. Let us denote by F̄(x | β0) the distribution of T(β0) when β = β0, and assume this distribution is invariant to β (Assumption 4.1).

Assumption 4.1 INVARIANCE OF THE DISTRIBUTION FUNCTION. F̄(x | β0) = F̄(x), ∀x ∈ R+, ∀β0 ∈ Rp.

Let us define

M1 = argmax over β ∈ Rp of p(β),   (4.2)
M2 = argmin over β ∈ Rp of T(β).   (4.3)

Then, the following proposition holds.

Proposition 4.1 If Assumption 4.1 holds, then M1 = M2 with probability one.

Maximizing p(β) is thus equivalent (with probability one) to minimizing T(β) if Assumption 4.1 holds.
Under the mediangale Assumption 2.1, any sign-based statistic DS satisfies Assumption 4.1. Consequently,

β̂n(Ωn) ∈ argmin over β ∈ Rp of s′(Y − Xβ) X Ωn(s(Y − Xβ), X) X′ s(Y − Xβ) = M2(Y, X, DS^Ωn)   (4.4)

equals (with probability one) a Hodges-Lehmann estimator based on DS(Ωn, β). Since DS(Ωn, β) is nonnegative, problem (4.4) always possesses at least one solution. As signs can only take 3 values, for fixed n, the quadratic function can take only a finite number of values, which entails the existence of the minimum. If the solution is not unique, one may add a choice criterion: for example, one can choose the smallest solution in terms of a norm, or use a randomization. Under conditions of point identification, any solution of (4.4) is a consistent estimator.
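In the univariate case, a problem of the type (4.4) can be solved by brute force over a grid, since the objective is piecewise constant in β. The self-contained sketch below is our own illustrative code (simulated data; the grid and the assumed true value β̄ = 1 are arbitrary choices):

```python
import random

def sf_stat(y, x, b):
    # SF(beta) = s'(y - X beta) X (X'X)^(-1) X' s(y - X beta), p = 1 case
    s_dot_x = sum((1 if yi - xi * b > 0 else -1) * xi for yi, xi in zip(y, x))
    return s_dot_x ** 2 / sum(xi * xi for xi in x)

def sign_estimator(y, x, grid):
    """Hodges-Lehmann-type sign estimator: grid minimizer of SF(beta).
    The argmin may be a set; we return the smallest minimizer."""
    values = [(sf_stat(y, x, b), b) for b in grid]
    best = min(v for v, _ in values)
    return min(b for v, b in values if v == best)

rng = random.Random(7)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [1.0 * xi + rng.gauss(0, 1) for xi in x]   # true beta = 1
grid = [i / 100.0 for i in range(-200, 201)]
b_hat = sign_estimator(y, x, grid)
```

Returning the smallest minimizer is one instance of the choice criterion mentioned above; any other rule over the argmin set would also yield a consistent estimator under point identification.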
In models with sets of observationally equivalent values of β, any inference approach relying on the consistency of a point estimator (which assumes point identification) gives misleading results, whereas a whole estimator set remains informative. The approach of Chernozhukov, Hong and Tamer (2007) can be applied here. Recall that the Monte Carlo sign-based inference method [Coudin and Dufour (2009)] does not rely on identification conditions and leads to valid results in any case.
The sign-based estimators studied by Boldin et al. (1997) are solutions of

β̂n(Ip) ∈ argmin over β ∈ Rp of s′(Y − Xβ) X X′ s(Y − Xβ) = argmin over β ∈ Rp of SB(β),   (4.5)

and

β̂n((X′X)⁻¹) ∈ argmin over β ∈ Rp of s′(Y − Xβ) X (X′X)⁻¹ X′ s(Y − Xβ) = argmin over β ∈ Rp of SF(β).   (4.6)
For heteroskedastic independent disturbances, we introduce weighted versions of sign-based estimators that can be more efficient than the basic ones defined in (4.5) or (4.6). Weighted sign-based estimators are sign-based analogues of the weighted LAD estimator [see Zhao (2001)]. The weighted LAD estimator is given by

β̂n^WLAD = argmin over β ∈ Rp of Σi di |yi − x′i β|.   (4.7)
The weighted sign-based estimators are solutions of

β̂n^DX ∈ argmin over β ∈ Rp of s′(Y − Xβ) X̃ (X̃′X̃)⁻¹ X̃′ D′ s(Y − Xβ)   (4.8)

where X̃ = diag(d1,...,dn) X and di ∈ R+*, i = 1,...,n. Weighted sign-based estimators that involve optimal estimating functions in the sense of Godambe (2001) are solutions of

β̂n^DX* ∈ argmin over β ∈ Rp of s′(Y − Xβ) X* (X*′X*)⁻¹ X*′ D′ s(Y − Xβ)   (4.9)

where X* = diag(f1(0|X),...,fn(0|X)) X and ft(0|X), t = 1,...,n, are the conditional disturbance densities evaluated at zero. The inherent problem with such a class of estimators is to provide good approximations of the fi(0|X)'s. Densities of normal distributions can be used.
4.3. Sign-based estimators as GMM estimators
Sign-based estimators have been interpreted in the literature as GMM estimators exploiting the orthogonality condition between the signs and the explanatory variables [see Honore and Hu (2004)]. In our opinion, a strictly GMM interpretation hides the link with testing theory, which is why we first introduced sign-based estimators as Hodges-Lehmann estimators. The quadratic form (4.4) refers to quite unusual moment conditions. The sign transformation eliminates the unknown parameters that affect the error distribution, which validates nonparametric finite-sample-based inference when the mediangale Assumption 2.1 holds. However, in settings where only the sign-moment condition 2.2 is satisfied, the GMM interpretation of sign-based estimators still applies and entails useful extensions.
For autocorrelated disturbances, an estimator based on a HAC sign-based statistic DS(β, Ĵn⁻¹) can be used:

β̂n(Ĵn⁻¹) ∈ argmin over β ∈ Rp of s′(Y − Xβ) X [Ĵn(s(Y − Xβ), X)]⁻¹ X′ s(Y − Xβ),   (4.10)

where Ĵn(s(Y − Xβ), X) accounts for the dependence among the signs and the explanatory variables. β appears twice: first in the constrained signs, second in the weight matrix. In practice, optimizing (4.10) requires one to invert a new matrix Ĵn for each value of β, whereas problem (4.6) only requires one inversion of X′X. This numerical problem may quickly become cumbersome, similarly to continuously updated GMM. We advocate a two-step method: first, solve (4.6) and obtain β̂n((X′X)⁻¹); then compute Ĵn⁻¹(s(Y − X β̂n((X′X)⁻¹)), X) and finally solve
β̂n^2S(Ĵn⁻¹) ∈ argmin over β ∈ Rp of s′(Y − Xβ) X [Ĵn(s(Y − X β̂n), X)]⁻¹ X′ s(Y − Xβ).   (4.11)

The two-step estimator is no longer a Hodges-Lehmann estimator. However, it is still consistent and shares some interesting finite-sample properties with classical sign-based estimators. The properties of sign-based estimators are studied in the next section.
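The two-step logic can be sketched compactly for p = 2: step 1 minimizes SF with the (X′X)⁻¹ weight, step 2 freezes Ĵn at the first-step signs and re-minimizes. The code below is our own illustrative sketch (the truncation lag `bw`, the grid, and the AR(1) error design are arbitrary choices, not the authors' implementation):

```python
import random

def sgn(z):
    return 1 if z > 0 else -1

def inv2(m):
    # inverse of a 2 x 2 matrix [[a, b], [c, d]]
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def xs_vector(y, X, b):
    # v = X' s(y - X b), a 2-vector
    v = [0.0, 0.0]
    for yi, xi in zip(y, X):
        s = sgn(yi - xi[0] * b[0] - xi[1] * b[1])
        v[0] += s * xi[0]
        v[1] += s * xi[1]
    return v

def quad(v, W):
    return sum(v[i] * W[i][j] * v[j] for i in range(2) for j in range(2))

def jn_hat(signs, X, bw):
    # truncated estimate of Jn = (1/n) sum_{|t-s| <= bw} s_t x_t x_s' s_s
    n = len(X)
    J = [[0.0, 0.0], [0.0, 0.0]]
    for t in range(n):
        for s in range(max(0, t - bw), min(n, t + bw + 1)):
            w = signs[t] * signs[s]
            for i in range(2):
                for j in range(2):
                    J[i][j] += w * X[t][i] * X[s][j]
    return [[J[i][j] / n for j in range(2)] for i in range(2)]

def two_step(y, X, grid, bw):
    # Step 1: minimize SF (weight proportional to (X'X)^(-1)), as in (4.6);
    # jn_hat with unit signs and bw = 0 gives (1/n) X'X.
    W1 = inv2(jn_hat([1] * len(X), X, 0))
    b1 = min(grid, key=lambda b: quad(xs_vector(y, X, b), W1))
    # Step 2: freeze Jn-hat at the first-step signs, re-minimize as in (4.11).
    s1 = [sgn(yi - xi[0] * b1[0] - xi[1] * b1[1]) for yi, xi in zip(y, X)]
    W2 = inv2(jn_hat(s1, X, bw))
    return min(grid, key=lambda b: quad(xs_vector(y, X, b), W2))

rng = random.Random(3)
n, rho, e, u = 150, 0.5, 0.0, []
for _ in range(n):                       # AR(1) disturbances
    e = rho * e + rng.gauss(0, 1)
    u.append(e)
X = [[1.0, rng.gauss(0, 1)] for _ in range(n)]
y = [0.5 * xi[0] - 0.5 * xi[1] + ui for xi, ui in zip(X, u)]
grid = [(i / 10.0, j / 10.0) for i in range(-15, 16) for j in range(-15, 16)]
b2s = two_step(y, X, grid, bw=5)
```

The key saving over (4.10) is visible here: Ĵn is inverted once, not once per candidate β.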
5. Finite-sample properties of sign-based estimators
In this section, the finite-sample properties of sign-based estimators are studied. Sign-based estimators share invariance properties with the LAD estimator and are median unbiased if the disturbance distribution is symmetric and some additional assumptions on the form of the solution are satisfied. The topology of the argmin set of the optimization problem (4.4) does not possess a simple structure. In some cases it reduces to a single point, like the empirical median of 2p+1 observations; in other cases, it is a set. More generally, the argmin set is a union of convex sets, but it is a priori neither convex nor connected. To see that it is a union of convex sets, just remark that the preimage of a fixed vector of n signs is convex.
holds.
Proposition 5.1 INVARIANCE. Let M(y,X)be the set of the solutions of the minimization problem
(4.4).If ˆ
β
(y,X)∈M(y,X),then the following properties hold:
λ
ˆ
β
(y,X)∈M(
λ
y,X),∀
λ
∈R,(5.1)
ˆ
β
(y,X) +
γ
∈M(y+X
γ
,X),∀
γ
∈Rp,(5.2)
A−1ˆ
β
(y,X)∈M(y,XA),for any nonsingular k ×k matrix A.(5.3)
Further, if ˆ
β
(y,X)is a uniquely determined solution of (4.4),then
ˆ
β
(
λ
y,X) =
λ
ˆ
β
(y,X),∀
λ
∈R,(5.4)
ˆ
β
(y+X
γ
,X) = ˆ
β
(y,X) +
γ
,∀
γ
∈Rp,(5.5)
ˆ
β
(y,XA) = A−1ˆ
β
(y,X),for any nonsingular k ×k matrix A.(5.6)
To prove this property, it is sufficient to write down the different optimization problems. (5.1) and (5.4) state a form of scale invariance: if y is rescaled by a certain factor, then β̂ rescaled by the same factor is a solution of the transformed problem. (5.2) and (5.5) represent location invariance, while (5.3) and (5.6) describe how the estimator behaves under a reparameterization of the design matrix. In all cases, parameter estimates change in the same way as the theoretical parameters.
If the disturbance distribution is symmetric and the optimization problem has a unique solution, then sign-based estimators are median unbiased.

Proposition 5.2 MEDIAN UNBIASEDNESS. If u ~ −u and the sign-based estimator β̂(y, X) is a uniquely determined solution of the minimization problem (4.4), then β̂ is median unbiased, i.e.

Med(β̂ − β̄) = 0,

where β̄ represents the “true value” of β.
6. Asymptotic properties
We demonstrate consistency of the proposed sign-based estimators, when the parameter is identified, under weaker assumptions than those required for the LAD estimator; this validates the use of sign-based estimators even in settings where the LAD estimator fails to converge. Finally, sign-based estimators are asymptotically normal. For reviews of the asymptotic distributional theory of LAD estimators, the reader may consult Bassett and Koenker (1978), Knight (1989), Phillips (1991), Pollard (1991), Portnoy (1991), Weiss (1991), Fitzenberger (1997), Knight (1998), El Bantli and Hallin (1999), and Koenker (2005).
6.1. Identiﬁcation and consistency
We show that the sign-based estimators (4.4) and (4.11) are consistent under the following set of assumptions. In the sequel, we denote by β̄ the “true value” of β, and by β0 any hypothesized value.

Assumption 6.1 MIXING. {Wt = (yt, x′t)}, t = 1, 2,..., is α-mixing of size −r/(r−1) with r > 1.

Assumption 6.2 BOUNDEDNESS. xt = (x1t,...,xpt)′ and E|xht|^(r+1) < Δ < ∞, h = 1,...,p, t = 1,...,n, ∀n ∈ N.
Assumption 6.3 COMPACTNESS. β̄ ∈ Int(Θ), where Θ is a compact subset of Rp.

Assumption 6.4 REGULARITY OF THE DENSITY.
1. There are positive constants fL and p1 such that, for all n ∈ N, P[ft(0|X) > fL] > p1, t = 1,...,n, a.s.
2. ft(·|X) is continuous, for all n ∈ N, for all t, a.s.
Assumption 6.5 POINT IDENTIFICATION CONDITION. ∀δ > 0, ∃τ > 0 such that

liminf as n → ∞ of (1/n) Σt P[ |x′t δ| > τ, ft(0 | x1,...,xn) > fL ] > 0.
Assumption 6.6 UNIFORMLY POSITIVE DEFINITE WEIGHT MATRIX. Ω̄n(β) is symmetric positive definite for all β in Θ.

Assumption 6.7 LOCALLY POSITIVE DEFINITE WEIGHT MATRIX. Ω̄n(β) is symmetric positive definite for all β in a neighborhood of β̄.
We can then state the consistency theorem; the assumptions are discussed just after.
Theorem 6.1 CONSISTENCY. Under model (2.1) with Assumptions 2.2 and 6.1-6.6, any sign-based estimator of the type

β̂n ∈ argmin over β0 ∈ Θ of s(y − Xβ0)′ X Ωn(s(y − Xβ0), X) X′ s(y − Xβ0)   (6.1)

or

β̂n^2S ∈ argmin over β0 ∈ Θ of s(y − Xβ0)′ X Ω̂n(s(y − X β̂), X) X′ s(y − Xβ0),   (6.2)

where β̂ stands for any (first-step) consistent estimator of β̄, is consistent. β̂n^2S defined in equation (6.2) is also consistent if Assumption 6.6 is replaced by Assumption 6.7.
It is useful to discuss Assumptions 6.1-6.7 and to compare them with those required for LAD and quantile estimator consistency. In the special case where the weight matrix Ωn(s(y − Xβ0), X) is the identity matrix, the estimators in (6.1)-(6.2) coincide with the “quantile regression estimator” (with θ = 1/2) studied by Fitzenberger (1997, Theorem 2.2). However, allowing for a weighting matrix different from the identity matrix (as we do here) turns out to be important from the viewpoint of efficiency. Stricto sensu, the sign-based estimators in (6.1)-(6.2) and Fitzenberger (1997, Theorem 2.2) are not LAD estimators, because the size of the residuals (through absolute values) does not appear in the objective function. This feature is crucial for relaxing assumptions on moments: the disturbances appear in the objective function only through their sign transforms, which possess finite moments at all orders. Consequently, no additional restriction need be imposed on the disturbance process (beyond regularity conditions on the density); only assumptions on the moments of xt are used (see Assumption 6.2). There is very little work on the properties of LAD estimators with infinite-variance errors; see Knight (1989) and Phillips (1991), who derive LAD asymptotic properties for an autoregressive model with infinite-variance errors in the domain of attraction of a stable law.
Assumption 6.1 on mixing is needed to apply a generic weak law of large numbers; see Andrews (1987) and White (2001). It was used by Fitzenberger (1997) with stationary linearly dependent processes. It covers, among other processes, stationary ARMA disturbances with continuously distributed innovations. Identification is provided by Assumptions 6.4 and 6.5. Assumption 6.5 is similar to Condition ID in Weiss (1991). Assumption 6.4 is usual in LAD estimator asymptotics.4 It is analogous to Fitzenberger's (1997) conditions (ii.b)-(ii.c) and Weiss's (1991) condition CD. It implies that there is enough variation around zero to identify the median. It restricts heteroskedasticity in the disturbance process to be “bounded”, but not in the usual (variance-based) sense. It is related to the diffusivity [2f(0)]⁻¹, a dispersion measure adapted to median-unbiased estimators. Diffusivity indicates the vertical spread of a density rather than its horizontal spread, and appears in Cramér-Rao-type efficiency bounds for median-unbiased estimators; see Sung, Stangenhaus and David (1990) and So (1994). Assumption 6.6 entails that the weight matrix Ωn is everywhere invertible, while Assumption 6.7 only requires local invertibility.
4 Assumption 6.4 can be slightly relaxed, to cover error terms with mass points, if the objective function involves randomized signs instead of usual signs.
6.2. Asymptotic normality
Sign-based estimators are asymptotically normal and are well adapted to heavy-tailed disturbances that may not possess finite variances. The assumptions we consider are the following.

Assumption 6.8 UNIFORMLY BOUNDED DENSITIES. ∃ fU < +∞ such that, ∀n ∈ N, ∀λ ∈ R, sup over t ∈ {1,...,n} of ft(λ | x1,...,xn) < fU, a.s.
Under conditions 2.2, 6.1, 6.2 and 6.8, we can define L(β), the derivative of the limiting objective function at β:

L(β) = lim as n → ∞ of (1/n) Σt E[ xt x′t ft( x′t(β − β̄) | x1,...,xn ) ] = lim as n → ∞ of Ln(β),   (6.3)

where

Ln(β) = (1/n) Σt E[ xt x′t ft( x′t(β − β̄) | x1,...,xn ) ].   (6.4)
The other assumptions are fairly standard conditions for proving asymptotic normality.

Assumption 6.9 MIXING WITH r > 2. The process {Wt = (yt, x′t) : t = 1, 2,...} is α-mixing of size −r/(r−2) with r > 2.

Assumption 6.10 DEFINITE POSITIVENESS OF Ln. Ln(β̄) is positive definite uniformly in n.

Assumption 6.11 DEFINITE POSITIVENESS OF Jn. Jn = E[(1/n) Σ over t, s of s(ut) xt x′s s(us)] is positive definite uniformly in n and converges to a positive definite symmetric matrix J as n → ∞.
Then, we have the following result.

Theorem 6.2 ASYMPTOTIC NORMALITY. Under Assumptions 2.2, 6.1 to 6.6, and 6.9 to 6.11, we have:

Sn^(−1/2) √n (β̂n − β̄) →d N(0, Ip),   (6.5)

where β̂n(Ωn) is any estimator which minimizes DS[β0, Ω̄n(β0)] in (2.4),

Sn = [Ln(β̄) Ωn Ln(β̄)]⁻¹ Ln(β̄) Ωn Jn Ωn Ln(β̄) [Ln(β̄) Ωn Ln(β̄)]⁻¹,

and

Ln(β̄) = (1/n) Σt E[ xt x′t ft(0 | x1,...,xn) ].   (6.6)

When Ω̄n(β0) = Ĵn(β0)⁻¹ and Ĵn(β0) = (1/n) Σ over t, s of s(yt − x′t β0) xt x′s s(ys − x′s β0), Sn reduces to [Ln(β̄) Ĵn⁻¹ Ln(β̄)]⁻¹ and we get:

[Ln(β̄) Ĵn⁻¹ Ln(β̄)]^(1/2) √n (β̂n(Ĵn⁻¹) − β̄) →d N(0, Ip).   (6.7)
This corresponds to the use of optimal instruments and quasi-efficient estimation. β̂n(Ĵn⁻¹) has the same asymptotic covariance matrix as the LAD estimator. Thus, performance differences between the two estimators correspond to finite-sample features. This result contradicts the generally accepted idea that sign procedures involve a heavy loss of information: there is no loss induced by the use of signs instead of absolute values.

Note again that we do not require the disturbance process variance to be finite. We only assume that the second-order moments of X are finite and that the mixing property of {Wt, t = 1,...} holds. This differs from the usual assumptions for LAD asymptotic normality.5 The difference comes from the fact that the absolute values of the disturbances are replaced in the objective function by their signs. Since signs possess finite moments at any order, one easily sees that a CLT can be applied without any further restriction. Consequently, asymptotic normality, like consistency, holds for heavy-tailed disturbances that may not possess a finite variance. This is an important theoretical advantage of sign-based over absolute-value-based estimators and, a fortiori, over least-squares estimators. Estimators whose asymptotic normality relies on a bounded asymptotic variance assumption (for example OLS) are not accurate in heavy-tailed settings, because the variance is not a measure of dispersion adapted to those settings. Estimators whose asymptotic behavior relies on other measures of dispersion, such as the diffusivity, avoid this problem.
The form of the asymptotic covariance matrix simplifies under stronger assumptions. When the signs are mutually independent conditional on X [mediangale Assumption 2.1], both β̂n((X′X)⁻¹) and β̂n(Jn⁻¹) are asymptotically normal with variance

Sn = [Ln(β̄)]⁻¹ E[(1/n) Σ from t = 1 to n of xt x′t] [Ln(β̄)]⁻¹.

If u is an i.i.d. process and is independent of X, then ft(0) = f(0) and

Sn = (1 / [4 f(0)²]) [E(xt x′t)]⁻¹.   (6.8)
In the general case, ft(0) is a nuisance parameter, even if Assumption 6.8 implies that it can be bounded. All the features known about the asymptotic behavior of the LAD estimator also apply to the SHAC estimator; see Boldin et al. (1997). For example, the asymptotic relative efficiency of the SHAC (and LAD) estimator with respect to the OLS estimator is 2/π when the errors are normally distributed N(0, σ²), but the SHAC (like the LAD) estimator can have arbitrarily large ARE with respect to OLS when the disturbance generating process is contaminated by outliers.
6.3. Asymptotic or projection-based confidence sets?
In Section 4, we introduced sign-based estimators as Hodges-Lehmann estimators associated with sign-based statistics. By linking them with GMM settings, we then derived asymptotic normality. We stressed that sign-based estimator asymptotic normality holds under weaker assumptions than those needed for the LAD estimator. Therefore, sign-based estimator asymptotic normality enables one to construct asymptotic tests and confidence intervals. Thus, we have two ways of making inference with signs: the Monte Carlo (finite-sample) based method described in Coudin and Dufour (2009), see Section 2.2, and the classical asymptotic method. Let us list the main differences between them. Monte Carlo inference relies on the pivotality of the sign-based statistic. The derived tests are valid (with controlled level) for any sample size if the mediangale Assumption 2.1 holds. When only the sign moment condition 2.2 holds, Monte Carlo inference remains asymptotically valid: asymptotic test levels are controlled. Besides, in simulations, the Monte Carlo inference method appears to perform better in small samples than classical asymptotic methods, even if its use is then only asymptotically justified [see Coudin and Dufour (2009)]. Nevertheless, that method has an important drawback: its computational complexity. On the contrary, classical asymptotic methods, which yield tests with controlled asymptotic level under the sign moment condition 2.2, may be less time consuming. The choice between the two is mainly a question of computational capacity. We point out that classical asymptotic inference relies greatly on how the asymptotic covariance matrix, which depends on unknown parameters (densities at zero), is treated. If the asymptotic covariance matrix is estimated through a simulation-based method (such as the bootstrap), then the time argument no longer holds: both methods would be of the same order of computational complexity.

5 See Fitzenberger (1997) for the derivation of the LAD asymptotics in a similar setup, and Bassett and Koenker (1978) or Weiss (1991) for a derivation of the LAD asymptotics under sign independence.
7. Simulation study
In this section, we compare the performance of the sign-based estimators with that of the OLS and LAD estimators in terms of bias and RMSE.
7.1. Simulation setup
We use estimators derived from the sign-based statistics DS(β, (X′X)⁻¹) and, when a correction is needed for linear serial dependence, DS(β, Ĵn⁻¹) (the SHAC estimator). Minimizations are solved by simulated annealing. We consider a set of general DGPs to illustrate different classical problems one may encounter in practice. We use the following linear regression model:

yt = x′t β + ut,   (7.1)

where xt = (1, x2,t, x3,t)′ and β are 3 × 1 vectors. We denote the sample size by n. The Monte Carlo studies are based on S generated random samples. Table 1 presents the cases considered.
In a first group of examples (A1-A4), we consider classical independent cases with bounded heterogeneity. In a second group (B5-B8), we look at processes involving large heteroskedasticity, so that some of the estimators we consider may not be asymptotically normal, nor even consistent. Finally, the third group (C9-C11) is dedicated to autocorrelated disturbances; there, we ask whether the two-step SHAC sign-based estimator performs better in small samples than the uncorrected one.

To sum up, cases A1 and A2 present i.i.d. normal observations without and with conditional heteroskedasticity. Case A3 involves a sort of weak nonlinear dependence in the error term. Case A4
Table 1. Simulated models.
A1: Normal HOM errors
    (x_{2,t}, x_{3,t}, u_t)' i.i.d. ~ N(0, I_3), t = 1,...,n.

A2: Normal HET errors
    (x_{2,t}, x_{3,t}, ũ_t)' i.i.d. ~ N(0, I_3),
    u_t = min{3, max[0.21, x_{2,t}]} × ũ_t, t = 1,...,n.

A3: Dep.-HET, ρ_x = .5
    x_{j,t} = ρ_x x_{j,t-1} + ν_t^j, j = 2, 3,
    u_t = min{3, max[0.21, x_{2,t}]} × ν_t^u,
    (ν_t^2, ν_t^3, ν_t^u)' i.i.d. ~ N(0, I_3), t = 2,...,n,
    ν_1^2 and ν_1^3 chosen to ensure stationarity.

A4: Unbalanced design matrix
    x_{2,t} ~ B(1, 0.3), x_{3,t} i.i.d. ~ N(0, .01²),
    u_t i.i.d. ~ N(0, 1), x_t, u_t independent, t = 1,...,n.

B5: Cauchy errors
    (x_{2,t}, x_{3,t})' ~ N(0, I_2), u_t i.i.d. ~ C,
    x_t, u_t independent, t = 1,...,n.

B6: Stochastic volatility
    (x_{2,t}, x_{3,t})' i.i.d. ~ N(0, I_2), u_t = exp(w_t/2) ε_t with
    w_t = 0.5 w_{t-1} + v_t, where ε_t i.i.d. ~ N(0, 1), v_t i.i.d. ~ χ²(3),
    x_t, u_t independent, t = 1,...,n.

B7: Nonstationary GARCH(1,1)
    (x_{2,t}, x_{3,t}, ε_t)' i.i.d. ~ N(0, I_3), t = 1,...,n,
    u_t = σ_t ε_t, σ_t² = 0.8 u_{t-1}² + 0.8 σ_{t-1}².

B8: Exponential error variance
    (x_{2,t}, x_{3,t}, ε_t)' i.i.d. ~ N(0, I_3), u_t = exp(.2t) ε_t.

C9: AR(1)-HOM, ρ_u = .5
    (x_{2,t}, x_{3,t}, ν_t^u)' ~ N(0, I_3), t = 2,...,n,
    u_t = ρ_u u_{t-1} + ν_t^u, (x_{2,1}, x_{3,1})' ~ N(0, I_2),
    ν_1^u ensures stationarity.

C10: AR(1)-HET, ρ_x = .5, ρ_u = .5
    x_{j,t} = ρ_x x_{j,t-1} + ν_t^j, j = 2, 3,
    u_t = min{3, max[0.21, x_{2,t}]} × ũ_t, ũ_t = ρ_u ũ_{t-1} + ν_t^u,
    (ν_t^2, ν_t^3, ν_t^u)' i.i.d. ~ N(0, I_3), t = 2,...,n,
    ν_1^2, ν_1^3 and ν_1^u chosen to ensure stationarity.

C11: AR(1)-HOM, ρ_u = .9
    (x_{2,t}, x_{3,t}, ν_t^u)' ~ N(0, I_3), t = 2,...,n,
    u_t = ρ_u u_{t-1} + ν_t^u, (x_{2,1}, x_{3,1})' ~ N(0, I_2),
    ν_1^u ensures stationarity.
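Two of the Table 1 designs can be sketched as follows (my reading of the formulas above; the GARCH start-up value σ_1² is not specified in the table and is set arbitrarily here):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# A2: normal errors with bounded conditional heteroskedasticity,
# u_t = min{3, max[0.21, x_{2,t}]} * u_tilde_t
x2, x3, u_tilde = rng.standard_normal((3, n))
u_a2 = np.minimum(3.0, np.maximum(0.21, x2)) * u_tilde

# B7: nonstationary GARCH(1,1), sigma_t^2 = 0.8 u_{t-1}^2 + 0.8 sigma_{t-1}^2
eps = rng.standard_normal(n)
u_b7 = np.zeros(n)
sig2 = 1.0                       # arbitrary start-up value (an assumption)
for t in range(n):
    if t > 0:
        sig2 = 0.8 * u_b7[t - 1] ** 2 + 0.8 * sig2
    u_b7[t] = np.sqrt(sig2) * eps[t]
```

Since the GARCH coefficients sum to 1.6 > 1, the variance process is explosive, which is exactly the nonstationarity the design B7 is meant to stress.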
presents a very unbalanced design matrix (a case in which the LAD estimator is known to perform badly). Cases B5, B6, B7 and B8 cover further forms of long-tailed errors, heteroskedasticity and nonlinear dependence. Cases C9 to C11 illustrate different levels of autocorrelation in the error term, with and without heteroskedasticity.
7.2. Bias and RMSE
Table 2 reports the bias and RMSE of each parameter of interest, together with the Euclidean norm of the three values, for n = 50 and S = 1000. These results are unconditional on X.
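The bias, RMSE and their Euclidean norms can be computed along the following lines (a sketch on case A1 with an OLS estimator only, and a smaller S for speed):

```python
import numpy as np

rng = np.random.default_rng(2)
n, S = 50, 200                        # S = 1000 in the paper; 200 keeps this quick
beta_true = np.zeros(3)

estimates = np.empty((S, 3))
for s in range(S):
    X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
    y = X @ beta_true + rng.standard_normal(n)           # case A1
    estimates[s] = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS estimate

bias = estimates.mean(axis=0) - beta_true
rmse = np.sqrt(((estimates - beta_true) ** 2).mean(axis=0))
bias_norm = np.linalg.norm(bias)      # the ||beta|| rows of Table 2
rmse_norm = np.linalg.norm(rmse)
```

The same loop, with the sign-based or LAD estimator substituted for `lstsq`, reproduces the corresponding columns.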
In classical cases (A1-A3), the sign-based estimators behave roughly like the LAD estimator in terms of bias and RMSE. OLS is optimal in case A1; however, there is no important efficiency loss or bias increase in using signs instead of LAD. Besides, if the LAD is not accurate in a particular setup (for example with a highly unbalanced explanatory scheme, case A4), the sign-based estimators do not suffer from the same drawback. In case A4, the RMSE of the sign-based estimator is notably smaller than those of the OLS and LAD estimators.
For setups with strong heteroskedasticity and nonstationary disturbances (B5-B8), the sign-based estimators yield better results than both the LAD and OLS estimators. Close to the (optimal) LAD in the case of Cauchy disturbances (B5), the sign-based estimators are the only ones that remain reliable under nonstationary variance (B6-B8). No assumption on the moments of the error term is needed for the consistency of sign-based estimators; all that matters is the behavior of the signs.
When the error term is autocorrelated (C9-C11), results are mixed. When a moderate linear dependence is present in the data, sign-based estimators give good results (C9, C10), but when the linear dependence is stronger (C11), that is no longer true. The SHAC sign-based estimator does not give better results than the uncorrected one in these selected examples.
To conclude, sign-based estimators are robust estimators, much less sensitive than the LAD estimator to unbalanced designs in the explanatory variables and to heteroskedasticity. They are particularly adequate when some amount of heteroskedasticity or nonlinear dependence is suspected in the error term, even if the error term fails to be stationary. Finally, the HAC correction does not seem to improve the performance of the estimator. It does, however, for tests: we show in Coudin and Dufour (2009) that using a HAC-corrected statistic ensures the asymptotic validity of the Monte Carlo inference method and improves test performance in small samples.
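For the two-step SHAC statistic, Ĵ_n must account for serial dependence. One standard possibility (an assumption here, not necessarily the exact estimator used in the paper) is a Bartlett-kernel HAC estimate of the long-run covariance of x_t s_t:

```python
import numpy as np

def hac_bartlett(Z, bandwidth):
    """Bartlett-kernel (Newey-West type) long-run covariance of the rows of Z."""
    n, k = Z.shape
    Zc = Z - Z.mean(axis=0)
    J = Zc.T @ Zc / n
    for lag in range(1, bandwidth + 1):
        w = 1.0 - lag / (bandwidth + 1.0)        # Bartlett weights
        G = Zc[lag:].T @ Zc[:-lag] / n           # lag-th autocovariance
        J += w * (G + G.T)
    return J

rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
u = np.empty(n)                                  # AR(1) errors, as in case C9
u[0] = rng.standard_normal()
for t in range(1, n):
    u[t] = 0.5 * u[t - 1] + rng.standard_normal()
s = np.sign(u)                                   # signs at the true beta
Jn = hac_bartlett(X * s[:, None], bandwidth=4)
W_shac = np.linalg.inv(Jn)                       # weight matrix of the SHAC statistic
```

The Bartlett kernel guarantees a positive semi-definite Ĵ_n, so the inverse is a valid weight matrix whenever Ĵ_n is nonsingular.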
8. Empirical applications
In this section, we return to the two illustrations presented in Coudin and Dufour (2007, 2009), where sign-based tests were derived, now with estimation in mind. The first application is dedicated to estimating a drift in the Standard and Poor's Composite Price Index (S&P), 1928-1987. In the second, we seek a robust estimate of the rate of β-convergence between output levels across U.S. States during the 1880-1988 period, using Barro and Sala-i-Martin (1991) data.
Table 2. Simulated bias and RMSE.
n = 50,           OLS              LAD              SF               SHAC
S = 1000      Bias    RMSE     Bias    RMSE     Bias    RMSE     Bias    RMSE

A1:  β0       .003    .142     .002    .179     .002    .179     .004    .178
     β1       .003    .149     .006    .184     .004    .182     .004    .182
     β2      −.002    .149    −.007    .186    −.006    .185    −.007    .183
     ‖β‖*     .004    .254     .009    .316     .007    .315     .009    .313

A2:  β0      −.003    .136     .000    .090    −.000    .089    −.000    .089
     β1      −.0135   .230    −.006    .218    −.010    .218    −.010    .218
     β2       .002    .142    −.001    .095    −.001    .092    −.001    .092
     ‖β‖      .014    .303     .007    .254     .010    .253     .010    .253

A3:  β0       .022    .167     .018    .108     .025    .107     .023    .107
     β1      −.001    .228     .005    .215     .003    .214     .002    .215
     β2       .001    .150     .005    .105     .007    .104     .007    .105
     ‖β‖      .022    .320     .019    .263     .026    .261     .024    .262

A4:  β0      −.001    .174     .007    .2102    .010    .2181    .008    .2171
     β1      −.016    .313    −.011    .375    −.021    .396    −.021    .394
     β2      −.100   14.6      .077   18.4      .014   7.41      .049   7.40
     ‖β‖      .101   14.6      .078   18.5      .027   7.42      .054   7.41

B5:  β0     16.0    505        .001    .251     .004    .248     .003    .248
     β1     −3.31   119        .015    .264     .020    .265     .020    .265
     β2     −2.191  630        .000    .256     .003    .258     .001    .258
     ‖β‖    26.0    817        .015    .445     .021    .445     .020    .445

B6:  β0      −.908   29.6    −1.02    27.4      .071   2.28      .083   2.28
     β1      2.00    37.6     3.21    68.4      .058   2.38      .069   2.39
     β2      1.64    59.3     2.59    91.8     −.101   2.30     −.089   2.29
     ‖β‖     2.73    76.2     4.25   118        .136   4.02      .139   4.02

B7:  β0    −127    3289       −.010   7.85     −.008   3.16     −.028   3.17
     β1    −81.4    237        .130  11.2      −.086   3.80     −.086   3.823
     β2    −31.0   1484       −.314  12.0      −.021   3.606    −.009   3.630
     ‖β‖    154    4312        .340  18.2       .089   6.12      .091   6.15

B8:  β0    <−10^10  >10^10   <−10^9   >10^10    .312   5.67      .307   5.67
     β1     >10^10  >10^10    >10^9   >10^10    .782   5.40      .863   5.46
     β2    <−10^10  >10^10   <−10^9   >10^10    .696   5.52      .696   5.55
     ‖β‖    >10^10  >10^10   >10^10   >10^10   1.09    9.58     1.15    9.63

C9:  β0       .005    .279     .001    .308     .003    .309     .004    .311
     β1      −.002    .163    −.005    .201    −.004    .200    −.005    .199
     β2       .001    .165    −.004    .204     .003    .198     .002    .198
     ‖β‖      .006    .363     .007    .420     .006    .418     .006    .419

C10: β0      −.013    .284    −.010    .315    −.015    .314    −.014    .314
     β1      −.009    .182    −.009    .220    −.011    .218    −.011    .219
     β2       .008    .189     .011    .222     .007    .215     .007    .215
     ‖β‖      .018    .387     .018    .444     .020    .439     .019    .439

C11: β0       .070   1.23     −.026    .308     .058   1.26      .053   1.27
     β1      −.000    .268     .005    .214    −.005    .351    −.008    .354
     β2       .001    .273    −.004    .210     .002    .361    −.001    .361
     ‖β‖      .070   1.29      .027    .430     .059   1.36      .054   1.37

* ‖β‖ stands for the Euclidean norm.
8.1. Drift estimation with heteroskedasticity
In this section, we estimate a constant and a drift in the Standard and Poor's Composite Price Index (SP), 1928-1987. This process is known to involve a large amount of heteroskedasticity and has been used by Gallant, Hsieh and Tauchen (1997) and Dufour and Valéry (2006, 2009) to fit a stochastic volatility model. Here, we are interested in robust estimation without modeling the volatility in the disturbance process. The data set consists of a series of 16,127 daily observations of SP_t, converted to price movements, y_t = 100[log(SP_t) − log(SP_{t−1})], and adjusted for systematic calendar effects. We consider a model involving a constant and a drift,

    y_t = a + bt + u_t,   t = 1,...,16127,                    (8.2)

and we allow {u_t : t = 1,...,16127} to exhibit stochastic volatility or nonlinear heteroskedasticity of unknown form. White and Breusch-Pagan tests for heteroskedasticity both reject homoskedasticity at 1%.6
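The transformation into price movements and a benchmark least-squares fit of model (8.2) can be sketched as follows (on synthetic data, since the actual S&P series is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 1000
# Toy log price path standing in for log(SP_t)
log_sp = np.cumsum(0.0002 + 0.01 * rng.standard_normal(T))
y = 100.0 * np.diff(log_sp)           # y_t = 100 [log(SP_t) - log(SP_{t-1})]
t = np.arange(1, y.size + 1, dtype=float)

Xd = np.column_stack([np.ones_like(t), t])              # constant and drift
a_hat, b_hat = np.linalg.lstsq(Xd, y, rcond=None)[0]    # OLS benchmark for (a, b)
```

The sign-based and LAD fits replace the least-squares step with the corresponding objective; the design matrix (constant, time trend) is unchanged.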
We compute both the basic SF sign-based estimator and the SHAC version obtained with the two-step method, and compare them with the LAD and OLS estimates. We then repeat a similar experiment on two subperiods: the year 1929 (291 observations) and the last 90 days of 1929 (90 observations), which roughly correspond to the last four months of 1929. Because of the financial crisis, the data may be expected to involve an extreme amount of heteroskedasticity in that period, and we ask to what extent that heteroskedasticity can bias the subsample estimates. The Wall Street crash occurred between October 24 (Black Thursday) and October 29 (Black Tuesday). Hence, the second subsample covers the period just before the crash (September), the crash itself (October) and the very beginning of the Great Depression (November and December). Heteroskedasticity tests reject homoskedasticity for both subsamples.7
In Table 3, we report the estimates and recall the 95% confidence intervals for a and b obtained by the finite-sample sign-based method (SF and SHAC)8 and by the moving block bootstrap (LAD and OLS). The entire set of sign-based estimators is reported, i.e., all the minimizers of the sign objective function.

First, the OLS estimates are reported only for comparison's sake: they estimate different quantities than the LAD/sign estimators and are highly unreliable in the presence of heteroskedasticity. Presenting the entire sets of sign-based estimators enables us to compare them with the LAD estimator. In this example, LAD and sign-based estimators yield very similar estimates. The value of the LAD estimator is indeed just at the boundary of the sets of sign-based estimators. This does not mean that the LAD estimator is included in the set of sign-based estimators; rather, there is a sign-based estimator giving the same value as the LAD estimate for a certain individual component (the second component may differ). One easy way to check this is to compare the two objective functions evaluated at the two estimates. For example, in the 90-observation sample, the sign objective function evaluated at the basic sign-estimators is 4.75 × 10−3, and at the LAD estimate 5.10 × 10−2; the LAD objective
6 See Coudin and Dufour (2009): White: 499 (p-value = .000); BP: 2781 (p-value = .000).
7 1929: White: 24.2 (p-value = .000); BP: 126 (p-value = .000). Sept.-Dec. 1929: White: 11.08 (p-value = .004); BP: 1.76 (p-value = .18).
8 See Coudin and Dufour (2009).
Table 3. Constant and drift estimates.
                               Whole sample            Subsamples
Constant parameter (a)         (16120 obs)       1929 (291 obs)    1929 (90 obs)
Set of basic sign-based           .062           (.160, .163)*     (−.091, .142)
estimators (SF)               [−.007, .105]**    [−.226, .521]     [−1.453, .491]
Set of 2-step sign-based          .062           (.160, .163)      (−.091, .142)
estimators (SHAC)             [−.007, .106]      [−.135, .443]     [−1.030, .362]
LAD                               .062               .163             −.091
                              [.008, .116]       [−.130, .456]     [−1.223, 1.040]
OLS                              −.005               .224             −.522
                              [−.056, .046]      [−.140, .588]     [−1.730, .685]

Drift parameter (b)             × 10^−5            × 10^−2           × 10^−1
Set of basic sign-based       (−.184, −.178)     (−.003, .000)     (−.097, −.044)
estimators (SF)               [−.676, .486]      [−.330, .342]     [−.240, .305]
Set of 2-step sign-based      (−.184, −.178)     (−.003, .000)     (−.097, −.044)
estimators (SHAC)             [−.699, .510]      [−.260, .268]     [−.204, .224]
LAD                              −.184               .000             −.044
                              [−.681, .313]      [−.236, .236]     [−.316, .229]
OLS                               .266              −.183              .010
                              [−.228, .761]      [−.523, .156]     [−.250, .270]

* Interval of admissible estimators (minimizers of the sign objective function).
** 95% confidence intervals.
function evaluated at the LAD estimate is 210.4, and at one of the sign-based estimates 210.5. The two are close but different.

Finally, the two-step sign-based estimators and the basic sign-based estimators yield the same estimates; only the confidence intervals differ. The two methods are indeed expected to give different results mainly in the presence of linear dependence.
8.2. A robust signbased estimate of convergence across U.S. states
One field suffering from both small numbers of observations and possibly very heterogeneous data is that of cross-sectional regional data sets. Least squares methods may be misleading there because a few outlying observations can drastically influence the estimates, so robust methods are greatly needed. Sign-based estimators are robust (in a statistical sense) and are naturally associated with finite-sample inference. In the following, we examine sign-based estimates of the rate of β-convergence between output levels across U.S. States between 1880 and 1988, using Barro and Sala-i-Martin (1991) data.
Within the neoclassical growth model, Barro and Sala-i-Martin (1991) estimated the rate of β-convergence between levels of per capita output across the U.S. States for different time periods
Table 4. Summary of regression diagnostics.
             Heteroskedasticity*  Non-normality**     Influential obs.**  Possible outliers**
Period       Basic eq.  Reg. dum. Basic eq. Reg. dum. Basic eq. Reg. dum. Basic eq.  Reg. dum.
1880−1900    yes        –         yes       –         yes       yes       no         no
1900−1920    yes        yes       yes       yes       yes       yes       yes (MT)   yes
1920−1930    –          –         –         –         yes       –         no         no
1930−1940    –          –         yes       –         yes       yes       no         no
1940−1950    –          –         –         –         yes       yes       yes (VT)   yes (VT)
1950−1960    –          –         –         yes       yes       yes       yes (MT)   yes (MT)
1960−1970    –          –         –         –         –         –         no         no
1970−1980    –          –         yes       yes       yes       yes       yes (WY)   yes (WY)
1980−1988    yes        –         –         yes       yes       yes       yes (WY)   yes (WY)

* White and Breusch-Pagan tests for heteroskedasticity are performed. If at least one test rejects homoskedasticity at 5%, "yes" is reported; "–" is reported when both tests are non-conclusive.
** Scatter plots, kernel density estimates, leverage analysis, Studentized or standardized residuals larger than 3, DFbetas and Cook's distances were examined; they lead to suspicions of non-normality, outliers or highly influential observations.
between 1880 and 1988. They used nonlinear least squares to estimate equations of the form

    (1/T) ln(y_{i,t}/y_{i,t−T}) = a − [ln(y_{i,t−T})] × [(1 − e^{−βT})/T] + x_i'δ + ε_i^{t,T},

    i = 1,...,48,   T = 8, 10 or 20,   t = 1900, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1988.
Their basic equation does not include any other variables, but they also consider a specification with regional dummies (Eq. with reg. dum.). The basic equation assumes that the 48 States share a common per capita level of personal income at steady state, while the second specification allows for regional differences in steady-state levels. Their regressions involve 48 observations and are run for each 20-year or 10-year period between 1880 and 1988. Their results suggest β-convergence at a rate somewhat above 2% a year, but their estimates are not stable across subperiods, varying greatly from .0149 to .0431 (for the basic equation). This instability is expected because of the succession of crises and growth periods during the last century. However, it may also be due to particular observations behaving like outliers and influencing the least-squares estimates. A survey of potential data problems was performed, and the regression diagnostics are summarized in Table 4. They suggest the presence of highly influential observations in all periods but one. Outliers are clearly identified in the periods 1900-1920, 1940-1950, 1950-1960, 1970-1980 and 1980-1988.
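The diagnostics summarized in Table 4 can all be derived from standard OLS quantities; here is a sketch on a toy 48-observation regression with one planted outlier (all data here are illustrative, not the Barro and Sala-i-Martin series):

```python
import numpy as np

def diagnostics(y, X):
    """Leverages, internally studentized residuals and Cook's distances for OLS."""
    n, k = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix
    h = np.diag(H)                            # leverages
    e = y - H @ y                             # OLS residuals
    s2 = e @ e / (n - k)
    r = e / np.sqrt(s2 * (1.0 - h))           # studentized residuals
    cooks = r ** 2 * h / (k * (1.0 - h))      # Cook's distances
    return h, r, cooks

rng = np.random.default_rng(5)
n = 48                                        # 48 States, as in the application
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = 0.02 - 0.01 * X[:, 1] + 0.005 * rng.standard_normal(n)
y[0] += 0.5                                   # plant one gross outlier
h, r, cooks = diagnostics(y, X)
suspects = np.flatnonzero(np.abs(r) > 3)      # the |residual| > 3 rule from Table 4
```

On real data one would also inspect the DFbetas and scatter plots mentioned above; the thresholds here simply mirror the "larger than 3" rule of the table notes.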
These two effects are probably combined. We wonder which part of that variability is really due to business cycles and which part is only due to the non-robustness of least-squares methods. Further, we would like to have a stable estimate of the rate of convergence at steady state. For this, we use robust sign-based estimation with D_S(β, (X'X)^{-1}). We consider the following linear
Table 5. Regressions for personal income across U.S. States, 1880−1988: estimates of β.

                 Basic equation                    Equation with regional dummies
Period           SIGN              NLLS***         SIGN              NLLS***
1880−1900        .0012             .0101           .0016             .0224
                 [−.0068, .0123]*  [.0058, .0532]** [−.0123, .0211]  [.0146, .0302]
1900−1920        .0184             .0218           .0163             .0209
                 [.0092, .0313]    [.0155, .0281]  [−.0088, .1063]   [.0086, .0332]
1920−1930        −.0147            −.0149          −.0002            −.0122
                 [−.0301, .0018]   [−.0249, −.0049] [−.0463, .0389]  [−.0267, .0023]
1930−1940        .0130             .0141           .0152             .0127
                 [.0043, .0234]    [.0082, .0200]  [−.0189, .0582]   [.0027, .0227]
1940−1950        .0364             .0431           .0174             .0373
                 [.0291, .0602]    [.0372, .0490]  [.0083, .0620]    [.0314, .0432]
1950−1960        .0195             .0190           .0140             .0202
                 [.0084, .0352]    [.0121, .0259]  [−.0044, .0510]   [.0100, .0304]
1960−1970        .0289             .0246           .0230             .0131
                 [.0099, .0377]    [.0170, .0322]  [−.0112, .0431]   [.0047, .0215]
1970−1980        .0181             .0198           .0172             .0119
                 [.0021, .0346]    [−.0315, .0195] [−.0131, .0739]   [−.0273, .0173]
1980−1988        −.0081            −.0060          −.0059            −.0050
                 [−.0552, .0503]   (.0130)         [−.0472, .1344]   (.0114)

* Projection-based 95% CI.
** Asymptotic 95% CI.
*** Estimates from Barro and Sala-i-Martin (1991).
equation:

    (1/T) ln(y_{i,t}/y_{i,t−T}) = a + γ [ln(y_{i,t−T})] + x_i'δ + ε_i^{t,T},          (8.3)

where the x_i contain regional dummies when included, and we compute the Hodges-Lehmann estimate of β = −(1/T) ln(γT + 1) for both specifications. We also provide 95%-level projection-based CIs, asymptotic CIs and projection-based p-value functions for the parameter of interest. Results are presented in Table 5, where the Barro and Sala-i-Martin (1991) NLLS results are also reported.
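The mapping between the linear slope γ in (8.3) and the implied convergence rate β, together with its inverse from the NLLS parametrization, is a simple reparametrization:

```python
import math

def beta_from_gamma(gamma, T):
    """Convergence rate implied by the linear slope: beta = -(1/T) ln(gamma*T + 1)."""
    return -math.log(gamma * T + 1.0) / T

def gamma_from_beta(beta, T):
    """Inverse mapping, from the NLLS form: gamma = -(1 - e^{-beta*T}) / T."""
    return -(1.0 - math.exp(-beta * T)) / T

T = 20
g = gamma_from_beta(0.02, T)   # slope implied by a 2%-a-year convergence rate
```

Since γT + 1 = e^{−βT}, the two functions are exact inverses, which is why the Hodges-Lehmann estimate of γ translates directly into an estimate of β.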
Sign estimates are more stable than the least-squares ones: they vary within [−.0147, .0364], whereas the least-squares estimates vary within [−.0149, .0431]. This suggests that at least 12% of the variability of the least-squares estimates across subperiods is due solely to the non-robustness of least-squares methods. In all cases but two, the sign-based estimates are lower (in absolute value) than the NLLS ones. Consequently, we lean toward a lower value of the stable rate of convergence.
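The "at least 12%" figure follows from comparing the spreads of the two sets of estimates:

```python
sign_range = 0.0364 - (-0.0147)         # spread of the sign-based estimates
nlls_range = 0.0431 - (-0.0149)         # spread of the NLLS estimates
share = 1.0 - sign_range / nlls_range   # share of NLLS variability not shared
# share is approximately 0.119, i.e. about 12%
```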
In graphics 6(a)-8(f) [see Appendix B], projection-based p-value functions and optimal concentrated sign-statistics are presented for each basic equation over the period 1880-1988. The optimal concentrated sign-based statistic reports the minimal value of D_S for a given β (letting a vary). The projection-based p-value function is the maximal simulated p-value for a given β over admissible values of a. These functions enable us to perform tests on β. The 95% projection-based confidence intervals for β presented in Table 5 are obtained by cutting the p-value function with the p = .05