Finite-sample generalized confidence distributions
and sign-based robust estimators in median regressions
with heterogeneous dependent errors ∗
Élise Coudin †
INSEE-CREST
Jean-Marie Dufour ‡
McGill University
January 2017
∗The authors thank Magali Beffy, Marine Carrasco, Frédéric Jouneau, Marc Hallin, Thierry Magnac, Bill McCaus-
land, Benoit Perron, and Alain Trognon for useful comments and constructive discussions. This work was supported
by the William Dow Chair in Political Economy (McGill University), the Bank of Canada (Research Fellowship), the
Toulouse School of Economics (Pierre-de-Fermat Chair of excellence), the Universidad Carlos III de Madrid (Banco
Santander de Madrid Chair of excellence), a Guggenheim Fellowship, a Konrad-Adenauer Fellowship (Alexander-von-
Humboldt Foundation, Germany), the Canadian Network of Centres of Excellence [program on Mathematics of Informa-
tion Technology and Complex Systems (MITACS)], the Natural Sciences and Engineering Research Council of Canada,
the Social Sciences and Humanities Research Council of Canada, and the Fonds de recherche sur la société et la culture
(Québec).
†Centre de recherche en économie et statistique (CREST-ENSAE). Mailing address: Laboratoire de microéconome-
trie, CREST, Timbre J390, 15 Bd G. Péri, 92245 Malakoff Cedex, France. E-mail: elise.coudin@ensae.fr. TEL: 33 (0) 1
41 17 77 33; FAX: 33 (0) 1 41 17 76 34.
‡William Dow Professor of Economics, McGill University, Centre interuniversitaire de recherche en analyse des
organisations (CIRANO), and Centre interuniversitaire de recherche en économie quantitative (CIREQ). Mailing address:
Department of Economics, McGill University, Leacock Building, Room 414, 855 Sherbrooke Street West, Montréal,
Québec H3A 2T7, Canada. TEL: (1) 514 398 4400 ext. 09156; FAX: (1) 514 398 4800; e-mail: jean-marie.dufour@mcgill.ca. Web page: http://www.jeanmariedufour.com
ABSTRACT
We study the problem of estimating the parameters of a linear median regression without any as-
sumption on the shape of the error distribution – including no condition on the existence of moments
– allowing for heterogeneity (or heteroskedasticity) of unknown form, noncontinuous distributions,
and very general serial dependence (linear and nonlinear). This is done through a reverse inference
approach, based on a distribution-free testing theory [Coudin and Dufour (2009, The Economet-
rics Journal)], from which confidence sets and point estimators are subsequently generated. The
estimation problem is tackled in two complementary ways. First, we show how confidence distri-
butions for model parameters can be applied in such a context. Such distributions – which can be
interpreted as a form of fiducial inference – provide a frequency-based method for associating prob-
abilities with subsets of the parameter space (like posterior distributions do in a Bayesian setup)
without the introduction of prior distributions. We consider generalized confidence distributions
applicable to multidimensional parameters, and we suggest the use of a projection technique for
confidence inference on individual model parameters. Second, we propose point estimators, which
have a natural association with confidence distributions. These estimators are based on maximiz-
ing test p-values and inherit robustness properties from the generating distribution-free tests. Both
finite-sample and large-sample properties of the proposed estimators are established under weak
regularity conditions. We show they are median unbiased (under symmetry and estimator unicity)
and possess equivariance properties. Consistency and asymptotic normality are established without
any moment existence assumption on the errors, allowing for noncontinuous distributions, hetero-
geneity and serial dependence of unknown form. These conditions are considerably weaker than
those used to show corresponding results for LAD estimators. In a Monte Carlo study of bias and
RMSE, we show sign-based estimators perform better than LAD-type estimators in heteroskedastic
settings. We present two empirical applications, which involve financial and macroeconomic data,
both affected by heavy tails (non-normality) and heteroskedasticity: a trend model for the S&P
index, and an equation used to study β-convergence of output levels across U.S. States.
Key words: sign-based methods; median regression; test inversion; Hodges-Lehmann estimators;
confidence distributions; p-value function; least absolute deviation estimators; quantile regres-
sions; sign test; simultaneous inference; Monte Carlo tests; projection methods; non-normality;
heteroskedasticity; serial dependence; GARCH; stochastic volatility.
Journal of Economic Literature classification: C13, C12, C14, C15.
Contents
1. Introduction 1
2. Framework 4
   2.1. Model 4
   2.2. Quadratic sign-based tests 5
3. Confidence distributions 7
   3.1. Confidence distributions in univariate regressions 7
   3.2. Simultaneous and projection-based p-value functions in multivariate regression 10
4. Sign-based estimators 13
   4.1. Sign-based estimators as maximizers of a p-value function 13
   4.2. Sign-based estimators as solutions of a nonlinear generalized least-squares problem 13
   4.3. Sign-based estimators as GMM estimators 15
5. Finite-sample properties of sign-based estimators 16
6. Asymptotic properties 17
   6.1. Identification and consistency 17
   6.2. Asymptotic normality 19
   6.3. Asymptotic or projection-based confidence sets? 20
7. Simulation study 21
   7.1. Simulation setup 21
   7.2. Bias and RMSE 23
8. Empirical applications 23
   8.1. Drift estimation with heteroskedasticity 25
   8.2. A robust sign-based estimate of convergence across U.S. states 26
9. Conclusion 29
A. Proofs 30
B. Convergence data: concentrated statistics and p-values 37
List of Definitions, Assumptions, Propositions and Theorems
Assumption 2.1 : Weak conditional mediangale 5
Assumption 2.2 : Sign moment condition 5
Definition 3.1 : Confidence distribution 7
Assumption 4.1 : Invariance of the distribution function 13
Proposition 5.1 : Invariance 16
Proposition 5.2 : Median unbiasedness 16
Assumption 6.1 : Mixing 17
Assumption 6.2 : Boundedness 17
Assumption 6.3 : Compactness 17
Assumption 6.4 : Regularity of the density 17
Assumption 6.5 : Point identification condition 17
Assumption 6.6 : Uniformly positive definite weight matrix 17
Assumption 6.7 : Locally positive definite weight matrix 17
Theorem 6.1 : Consistency 18
Assumption 6.8 : Uniformly bounded densities 19
Assumption 6.9 : Mixing with r > 2 19
Assumption 6.10 : Definite positiveness of L_n 19
Assumption 6.11 : Definite positiveness of J_n 19
Theorem 6.2 : Asymptotic normality 19
List of Tables
1 Simulated models 22
2 Simulated bias and RMSE 24
3 Constant and drift estimates 26
4 Summary of regression diagnostics 27
5 Regressions for personal income across U.S. States, 1880-1988 28
1. Introduction
A basic problem in statistics and econometrics consists in studying the relationship between a de-
pendent variable and a vector of explanatory variables under weak distributional assumptions. For
that purpose, the Laplace-Boscovich median regression is an attractive approach because it can
yield estimators and tests which are considerably more robust to non-normality and outliers than
least-squares methods; see Dodge (1997). The least absolute deviation (LAD) estimator is the refer-
ence estimation method in this context. Quantile regressions [Koenker and Bassett (1978), Koenker
(2005)] can be viewed as extensions of median regression. An important reason why such methods
yield more robust inference comes from the fact that hypotheses about moments are not generally
testable in nonparametric setups, while hypotheses about quantiles remain testable under similar
conditions [see Bahadur and Savage (1956), Dufour (2003), Dufour, Jouneau and Torrès (2008)].
The distributional theory of LAD estimators and their extensions usually postulates moment conditions on the model errors, such as the existence of moments up to a given order, as well as other regularity conditions, such as continuity, independence, or identically distributed errors; see for instance Portnoy (1991), Knight (1998), El Bantli and Hallin (1999), and Koenker (2005). Further, this theory and the associated tests and confidence sets are typically based on asymptotic approximations. The same remark applies to work on LAD-type estimation in models involving heteroskedasticity and autocorrelation [Zhao (2001), Weiss (1990)], endogeneity [Amemiya (1982), Powell (1983), Hong and Tamer (2003)], censored models [Powell (1984, 1986)], and nonlinear functional forms [Weiss (1991)]. By contrast, provably valid tests can be derived in such models, under remarkably weaker conditions, which do not require the existence of moments and allow for very general forms of heterogeneity (or heteroskedasticity); see Coudin and Dufour (2009).
In this paper, we exploit this feature of testing theory in the context of median regression to
derive more robust estimation methods. Specifically, we study the problem of estimating the pa-
rameters of a linear median regression without any assumption on the shape of the error distribution
– including no condition on the existence of moments at any order – allowing for heterogeneity
(or heteroskedasticity) of unknown form, noncontinuous distributions, and very general serial de-
pendence (linear and nonlinear). This is done through a reverse inference approach, which starts
from a distribution-free testing theory [Coudin and Dufour (2009)], subsequently exploited to derive
confidence sets and point estimators. Using the tests proposed in Coudin and Dufour (2009), the
estimation problem is tackled in two complementary ways.
First, we show how confidence distributions for model parameters [Schweder and Hjort (2002),
Xie and Singh (2013)] can be applied in such a context. Such distributions – which can be in-
terpreted as a form of fiducial inference [Fisher (1930), Buehler (1983), Efron (1998)] – provide
a frequency-based method for associating probabilities with subsets of the parameter space (like
posterior distributions do in a Bayesian setup) without the introduction of a prior distribution. In
the one-dimensional model, the confidence distribution is defined as a distribution whose quantiles
span all the possible confidence intervals [Schweder and Hjort (2002)]. In this paper, we consider
generalized confidence distributions applicable to multidimensional parameters, and we suggest the
use of a projection technique for confidence inference on individual model parameters. The latter
are exact – in the sense that the parameters considered are covered with known probabilities (or
larger) – under the mediangale assumption considered in Coudin and Dufour (2009). Further, if
more general linear dependence is allowed, the proposed method remains valid asymptotically.
Second, we propose point estimates, which bear a natural association with the above confidence
distributions. These Hodges-Lehmann estimators are based on maximizing test p-values and in-
herit several robustness properties from the distribution-free tests used to generate them [Hodges
and Lehmann (1963)]. In particular, both finite-sample and large-sample properties are established
under very weak regularity conditions. We show they are median unbiased (under symmetry and es-
timator unicity) and possess equivariance properties with respect to linear transformations of model
variables. Consistency and asymptotic normality are established without any moment existence
assumption on the errors, allowing noncontinuous distributions, heterogeneity and general serial
dependence of unknown form. These conditions are considerably weaker than those usually used to
obtain corresponding results for LAD estimators.
The conjunction of sign-based tests, projection-based confidence regions, projection-based p-
values and sign-based estimators thus provides a complete system of inference, which is valid for
any given sample size under very weak distributional assumptions and remains asymptotically valid
under weaker conditions (including allowance for general forms of linear residual dependence).
Fisher’s fiducial distributions and other fiducial inference arguments [Fisher (1930), Buehler
(1983), Efron (1998), Hannig (2006)] are not commonly used in econometrics because they require
the availability of pivotal test statistics with known distributions. This condition is not fulfilled in
general, especially in semiparametric or nonparametric settings. However, in the context of me-
dian regression, sign-based methods provide a way to construct such pivots and fiducial inference
tools can be developed. For any given sample size, the sign transform enables one to construct test
statistics with known nuisance-parameter-free distributions without additional parametric restric-
tions. This enables us to construct fiducial inference tools adapted to multidimensional parameters.
We exploit realized p-value functions, which are constructed by testing hypotheses of the form H_0(β_0): β = β_0, where β is the vector of the regression coefficients. Specifically, we combine sign-based tests for such joint hypotheses [as given in Coudin and Dufour (2009)] with projection techniques. For each component, a projected p-value function provides a representation of the evidence for each possible value of that component.

Using the above p-values (as functions of β_0), we then derive estimators and study their properties. Hodges and Lehmann (1963) proposed a general principle for deriving estimators directly from test procedures. They suggest inverting a test of H_0(β_0): β = β_0, and then choosing the value β_0 which is "least rejected" by the test procedure. First applied to the Wilcoxon signed-rank statistic for estimating a shift or a location, this principle was adapted to regression models by deriving so-called R-estimators from rank or signed-rank statistics [Jureckova (1971), Jaeckel (1972), Koul (1971)]. In a multidimensional context, this leads one to select the value of β_0 with the highest degree of confidence, i.e. with the highest p-value.
We study the problem of estimating the parameters of the median regression by minimizing (weighted) sign-based test statistics over different null hypotheses. Since the generating sign-based tests are remarkably robust, the resulting estimators inherit several attractive properties of these tests (e.g., robustness to non-normality and heterogeneity). We will see that the estimators can alternatively be computed by minimizing quadratic forms of the constrained signs, so they have a classical GMM form [Hansen (1982), and Honore and Hu (2004) for GMM statistics involving signs].
Both finite-sample and large-sample properties of sign-based estimators are established under
weak regularity conditions. We show they are median unbiased (under symmetry and estimator
unicity) and possess equivariance properties with respect to linear transformations of model vari-
ables. Consistency and asymptotic normality are established without any moment existence as-
sumption on the errors, allowing noncontinuous distributions, heterogeneity and general serial de-
pendence of unknown form. These conditions are considerably weaker than those usually used
to obtain corresponding results for LAD estimators; see Bassett and Koenker (1978), Bloomfield
and Steiger (1983), Powell (1984), Phillips (1991), Pollard (1991), Portnoy (1991), Weiss (1991),
Knight (1998), El Bantli and Hallin (1999) and the references therein. In particular, asymptotic
normality and consistency hold for heavy-tailed disturbances which may not have finite variances.
This interesting property is induced by the sign transformation. Signs of residuals always possess
finite moments, so no further restriction on the disturbance moments is required. Except for Knight
(1989) and Phillips (1991), who considered the case of autoregressive models, the distribution of
LAD estimators in regressions where the error variances may not exist has received little atten-
tion. In general, LAD estimators and the sign-based estimators proposed here follow from different
optimization rules, and they can be quite different.
The class of sign-based estimators we propose includes as special cases the sign estimators
derived by Boldin, Simonova and Tyurin (1997) from locally most powerful sign tests in linear re-
gressions with i.i.d. errors and fixed regressors. Note also that the procedures proposed by Hong and
Tamer (2003) and Honore and Hu (2004) also rely on the i.i.d. assumption. In this paper, we stress
that a major advantage of signs over ranks consists in dealing transparently with heteroskedastic (or
heterogeneous) disturbances. Many heteroskedastic and possibly dependent schemes are covered
and, in the presence of linear dependence, a HAC-type correction for heteroskedasticity and autocorre-
lation can be included in the criterion function.
The construction of sign-based estimators as Hodges-Lehmann estimators makes these a natural
complement of the finite-sample tests used to generate them. The latter rely on the exact distribution
of the corresponding sign-based test statistics, do not involve nuisance parameters, and allow one to
control test levels in finite samples under heteroskedasticity and nonlinear dependence of unknown
form. In Coudin and Dufour (2009), Monte Carlo test methods [Dwass (1957), Barnard (1963) and
Dufour (2006)] are combined with test inversion and projection techniques [Dufour (1990, 1997),
Dufour and Kiviet (1998), Abdelkhalek and Dufour (1998), Dufour and Jasiak (2001), Dufour and
Taamouti (2005)] to build confidence sets and test general hypotheses.1There is no need to estimate
the error density at zero in contrast with tests that rely on kernel estimates of the LAD asymptotic
covariance matrix.2Furthermore, when the test criteria are modified to cover linear dependence,
1For an alternative finite-sample inference exploiting a quantile version of the same sign pivotality result, which holds
if the observations are X-conditionally independent, see Chernozhukov, Hansen and Jansson (2009).
2 In the i.i.d. error case, Honore and Hu (2004) observed in simulations that kernel-based estimates of the asymptotic standard error of the median-based estimator tend to be too small, so the associated tests tend to overreject the null hypothesis. Other estimates of the LAD asymptotic covariance matrix can be obtained by bootstrap procedures [design matrix bootstrap in Buchinsky (1995, 1998), block bootstrap in Fitzenberger (1997), Bayesian bootstrap in Hahn (1997)] and resampling methods [Parzen, Wei and Ying (1994)]. But the justification of these also relies on the usual asymptotic regularity conditions.
the resulting inference is asymptotically valid. The conjunction of sign-based tests, projection-
based confidence regions, and sign-based estimators thus provides a complete system of inference,
which is valid for any given sample size under very weak distributional assumptions and remains
asymptotically valid under even weaker conditions (including allowance for linear dependence in
regression disturbances).
We study the performance of the proposed estimators in a Monte Carlo study that allows for var-
ious non-Gaussian and heteroskedastic setups. We find that sign-based estimators are competitive
(in terms of bias and RMSE) when errors are i.i.d., while they are substantially more reliable than
usual methods (LS, LAD) when heterogeneity or serial dependence is present in the error term.
Finally, we present two empirical applications, which involve financial and macroeconomic data. In the first one, we study a trend model for the Standard and Poor's Composite Price Index over the period 1928-1987, as well as the 1929 crash period (which is characterized by huge price volatilities). In the second application, we consider an equation used to study β-convergence of output levels across U.S. States, with a small sample size. In both cases, the data are affected by heavy tails (non-normality) and heteroskedasticity.
The paper is organized as follows. Section 2 presents the model, the sign-based statistics and
the Monte Carlo tests. Section 3 is dedicated to confidence distributions and p-value functions. In
section 4, we define the proposed family of sign-based estimators. The finite-sample properties of
the sign-based estimators are studied in section 5, while their asymptotic properties are considered
in section 6. In section 7, we present the results of our simulation study of bias and RMSE. The
empirical applications are reported in section 8. We conclude in section 9. Appendix A contains the
proofs.
2. Framework
We will now summarize the general framework we study and define the test statistics on which the
estimation methods we propose are based. This framework is the same as the one used in Coudin
and Dufour (2009).
2.1. Model
We consider a stochastic process {(y_t, x_t′) : Ω → R^(p+1) : t = 1, 2, ...} defined on a probability space (Ω, F, P), such that y_t and x_t satisfy a linear model of the form

y_t = x_t′ β + u_t , t = 1, ..., n, (2.1)

where y_t is a dependent variable, x_t = (x_{t1}, ..., x_{tp})′ is a p-vector of explanatory variables, and u_t is an error process. The x_t's may be random or fixed. In the sequel, y = (y_1, ..., y_n)′ ∈ R^n will denote the dependent variable vector, X = (x_1, ..., x_n)′ ∈ R^(n×p) the n × p matrix of explanatory variables, and u = (u_1, ..., u_n)′ ∈ R^n the disturbance vector. Moreover, F_t(·|x_1, ..., x_n) represents the distribution function of u_t conditional on X. This framework is also used in Coudin and Dufour (2009).

The traditional form of a median regression assumes that the disturbances u_1, ..., u_n are i.i.d. with median zero:

Med(u_t | x_1, ..., x_n) = 0, t = 1, ..., n. (2.2)

Here, we relax the assumption that the u_t are i.i.d., and we consider moment conditions based on residual signs, where the sign operator s : R → {−1, 0, 1} is defined as s(a) = 1_[0,+∞)(a) − 1_(−∞,0](a), with 1_A(a) = 1 if a ∈ A and 1_A(a) = 0 if a ∉ A. For convenience, if u ∈ R^n, we write s(u) = (s(u_1), ..., s(u_n))′, the n-vector of the signs of its components.
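To fix ideas, the sign operator and the vector of constrained signs used throughout the paper can be computed as follows. This is a minimal NumPy sketch; the function names are ours and are not part of the paper.

```python
import numpy as np

def sign(a):
    """Sign operator s: s(a) = 1 if a > 0, 0 if a = 0, -1 if a < 0 (as defined in the text)."""
    return np.sign(a)

def constrained_signs(y, X, beta0):
    """Vector s(y - X beta0) of the residual signs aligned with respect to X beta0."""
    return sign(np.asarray(y) - np.asarray(X) @ np.asarray(beta0))
```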
Assumption (2.2) is not sufficient to obtain a finite-sample distributional theory for sign statistics (because further restrictions on the dependence between the errors are needed). Let us consider adapted sequences S(v, F) = {v_t, F_t : t = 1, 2, ...}, where v_t is any measurable function of W_t = (y_t, x_t′)′, F_t is a σ-field in Ω, F_s ⊆ F_t for s < t, σ(W_1, ..., W_t) ⊂ F_t, and σ(W_1, ..., W_t) is the σ-algebra spanned by W_1, ..., W_t. The weak conditional mediangale then provides such a setup.

Assumption 2.1 WEAK CONDITIONAL MEDIANGALE. Let F_t = σ(u_1, ..., u_t, X), for t ≥ 1. u in the adapted sequence S(u, F) is a weak mediangale conditional on X with respect to {F_t : t = 1, 2, ...} iff P[u_1 < 0 | X] = P[u_1 > 0 | X] and

P[u_t < 0 | u_1, ..., u_{t−1}, X] = P[u_t > 0 | u_1, ..., u_{t−1}, X], for t > 1. (2.3)
Besides non-normality (including no condition on the existence of moments), this assumption allows for heterogeneity (or heteroskedasticity) of unknown form, noncontinuous distributions, and general forms of (nonlinear) serial dependence, including GARCH-type and stochastic volatility of unknown order. It does not, however, cover "linear serial dependence" such as an ARMA process on u_t.

Clearly, Assumption 2.1 entails (2.2). When E|x_t| < +∞ for all t, it also implies that s(u_t) is uncorrelated with x_t, an assumption we state for future reference.

Assumption 2.2 SIGN MOMENT CONDITION. E|x_t| < +∞ and E[s(u_t) x_t] = 0, for t = 1, ..., n.

This assumption allows for both linear and nonlinear serial dependence, but it makes the derivation of finite-sample distributions difficult. We use it in the asymptotic results presented below.
2.2. Quadratic sign-based tests
In order to derive robust estimators, we consider tests of H_0(β_0): β = β_0 against H_1(β_0): β ≠ β_0 in model (2.1)-(2.2). These are based on general quadratic forms in the vector s(y − Xβ_0) of the constrained signs (i.e., the signs aligned with respect to Xβ_0):

D_S[β_0, Ω̄_n(β_0)] = s(y − Xβ_0)′ X Ω_n(s(y − Xβ_0), X) X′ s(y − Xβ_0), (2.4)

where Ω̄_n(β_0) = Ω_n(s(y − Xβ_0), X) is a p × p positive definite weight matrix which may depend on the constrained signs. If the disturbances follow a weak mediangale (Assumption 2.1), sign-based statistics of this form constitute pivotal functions: the distribution of D_S[β_0, Ω̄_n(β_0)] conditional on X is completely determined under H_0(β_0) and can be simulated. Even though the distribution of D_S[β_0, Ω̄_n(β_0)] depends on X and Ω_n(·) under H_0(β_0), critical values can be approximated to any degree of precision by simulation. Alternatively, exact Monte Carlo tests can be built using a randomized tie correction procedure [Dufour (2006)]. So we can get an exact test of H_0(β_0). The fact that Ω_n(·) depends on the data only through s(y − Xβ_0) plays a central role in generating this feature.
Further, if linear serial dependence is allowed and the assumption that s(y − Xβ_0) and X are independent is relaxed [as described in Coudin and Dufour (2009)], this dependence can be taken into account by an appropriate choice of Ω_n(·). The test statistic D_S[β_0, Ω̄_n(β_0)] then remains asymptotically pivotal under H_0(β_0), and the finite-sample procedure just described yields a test such that the probability of rejecting H_0(β_0) converges to the nominal level of the test under any distribution compatible with H_0(β_0). In all cases, due to the sign transformation, the tests so obtained are remarkably robust to heavy-tailed distributions (and other features).
It will be useful to spell out how an exact Monte Carlo test based on a discrete test statistic like D_S[β_0, Ω̄_n(β_0)] can be obtained. Under Assumption 2.1, we can generate a vector of N independent replicates (D_S^(1)(β_0), ..., D_S^(N)(β_0))′ from the distribution of D_S[β_0, Ω̄_n(β_0)] under the null hypothesis, as well as (V^(0), ..., V^(N))′, an (N+1)-vector of i.i.d. uniform variables on the interval [0, 1]. Set D_S^(0)(β_0) ≡ D_S[β_0, Ω̄_n(β_0)], the observed statistic. Then, a Monte Carlo test for H_0(β_0) consists in rejecting the null hypothesis whenever the empirical p-value is smaller than α, i.e. p̃_N(β_0) ≤ α, where p̃_N(β_0) ≡ p̂_N[D_S^(0)(β_0), β_0],

p̂_N(x, β_0) = [N Ĝ_N(x, β_0) + 1] / (N + 1) (2.5)

and Ĝ_N(x, β_0) = 1 − (1/N) ∑_{i=1}^N s_+(x − D_S^(i)(β_0)) + (1/N) ∑_{i=1}^N δ(D_S^(i)(β_0) − x) s_+(V^(i) − V^(0)), with s_+(x) = 1_[0,∞)(x) and δ(x) = 1_{{0}}(x). When α(N + 1) is an integer, the size of this test is equal to α for any sample size n [see Dufour (2006)]. This procedure also provides a test such that the probability of rejection converges to α.
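The following sketch (Python/NumPy, our own illustrative code, not the authors' implementation) mimics this Monte Carlo procedure for a statistic of the form (2.4) with the fixed weight matrix Ω_n = (X′X)^(−1); under Assumption 2.1 with continuous errors, the null distribution of the signs can be simulated by drawing independent ±1 values with equal probabilities.

```python
import numpy as np

def DS(signs, X, Omega):
    """Quadratic sign-based statistic s' X Omega X' s, as in (2.4), for a fixed weight matrix."""
    v = X.T @ signs
    return float(v @ Omega @ v)

def mc_pvalue(y, X, beta0, N=999, seed=None):
    """Empirical Monte Carlo p-value (2.5) for H0: beta = beta0, with randomized tie-breaking."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Omega = np.linalg.inv(X.T @ X)                  # weight matrix of the SF-type statistic
    d_obs = DS(np.sign(y - X @ beta0), X, Omega)    # observed statistic D_S^(0)(beta0)
    # N independent replicates under H0 (signs i.i.d. uniform on {-1, +1} given X)
    d_rep = np.array([DS(rng.choice([-1.0, 1.0], size=n), X, Omega) for _ in range(N)])
    V = rng.uniform(size=N + 1)                     # auxiliary uniforms V^(0), ..., V^(N)
    count = np.sum(d_rep > d_obs) + np.sum((d_rep == d_obs) & (V[1:] >= V[0]))
    return (count + 1) / (N + 1)
```

A confidence region of the form (2.6) is then obtained by evaluating mc_pvalue over a grid of β_0 values and retaining those with p-value at least α.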
Note also that the confidence region

C_{1−α}(β) = {β_0 : p̃_N(β_0) ≥ α} (2.6)

which contains all the values β_0 such that the empirical p-value p̃_N(β_0) is higher than α, has by construction level 1 − α for any sample size. It is then possible to derive general (and possibly nonlinear) tests and confidence sets by projection techniques. For example, conservative individual confidence intervals are obtained in this way. Finally, if D_S is an asymptotically pivotal function, all previous results hold asymptotically. For a detailed presentation, see Coudin and Dufour (2009).
3. Confidence distributions
In the one-parameter model, statisticians have defined the notion of a confidence distribution, which summarizes a family of confidence intervals; see Schweder and Hjort (2002). By definition, the quantiles of a confidence distribution span all the possible confidence intervals of a real parameter β. The confidence distribution is a reinterpretation of Fisher's fiducial distributions and provides, in a sense, an analogue of Bayesian posterior probabilities in a frequentist setup [see also Fisher (1930), Neyman (1941) and Efron (1998)]. This statistical notion is not commonly used in the econometric literature, for two reasons. First, it is only defined in the one-parameter case. Second, it requires that the test statistic be a pivot with a known exact distribution. Below we extend that notion (or an equivalent one) to multidimensional parameters. The sign transformation enables one to construct statistics which are pivots with known distributions without imposing parametric restrictions on the sample. Consequently, our setup does not suffer from the second restriction. In this section, we briefly recall the initial statistical concept and apply it to an example in univariate regression. Then, we address the extension to multidimensional regressions.
3.1. Confidence distributions in univariate regressions
Schweder and Hjort (2002) defined the confidence distribution for the real parameter β as a distribution, depending on the observations (y, x), whose cumulative distribution function evaluated at the true value of β has a uniform distribution whatever the true value of β. In a formalized way, this can be expressed as follows:

Definition 3.1 CONFIDENCE DISTRIBUTION. Any distribution with cumulative distribution function CD(β) and quantile function CD^(−1)(β), such that

P_β[β ≤ CD^(−1)(α; y; x)] = P_β[CD(β; y; x) ≤ α] = α (3.1)

for all α ∈ (0, 1) and for all probability distributions in the statistical model, is called a confidence distribution of β.
(−∞, CD^(−1)(α)] is a one-sided stochastic confidence interval with coverage probability α,³ and the realized confidence CD(β_0; y; x) is the p-value of the one-sided hypothesis H*_0(β_0): β ≤ β_0 versus H*_1(β_0): β > β_0 when the observed data are y, x. The realized p-value when testing H_0(β_0): β = β_0 versus H_1(β_0): β ≠ β_0 is 2 min{CD(β_0), 1 − CD(β_0)}. These relations are stated in Lemma 2 of Schweder and Hjort (2002): the confidence of the statement "β ≤ β_0" is the degree of confidence CD(β_0) for the confidence interval (−∞, CD^(−1)(CD(β_0))], and is equal to the p-value of a test of H*_0(β_0): β ≤ β_0 vs. H*_1(β_0): β > β_0. Hence, tests and confidence intervals on β are contained in the confidence distribution.
Schweder and Hjort (2002) also note that, since the cumulative function CD(β) is an invertible function of β and is uniformly distributed, CD(β) constitutes a pivot conditional on x. Reciprocally, whenever a pivot increases with β (for example a continuous statistic T(β) with cumulative distribution function F that is independent of β and free of any nuisance parameter), F(T(β)) is uniformly distributed and satisfies the conditions for providing a confidence distribution. Let T(β) be such a continuous real statistic, increasing with β, with a nuisance-parameter-free distribution. A test of H_0: β ≤ β_0 is rejected when T_obs(β_0) is large, with p-value P_{β_0}[T(β_0) > T_obs(β_0)]. Then,

P_{β_0}[T(β_0) > T_obs(β_0)] = 1 − F_{β_0}(T_obs(β_0)) = CD(β_0), (3.2)

where F_{β_0}(·) is the sampling distribution of T(β_0) under β = β_0. Consequently, the simulated sampling distributions and simulated realized p-values presented previously yield a way to construct simulated confidence distributions.

³ For continuous distributions, just note that P_β[β ≤ CD^(−1)(α)] = P_β{CD(β) ≤ CD(CD^(−1)(α))} = P_β{CD(β) ≤ α} = α.
The sampling distribution and the confidence distribution are fundamentally different theoretical notions. The sampling distribution is the probability distribution of T(β) obtained by repeated samplings, whereas the confidence distribution is an ex-post object that contains the confidence statements one can make about the value of β given y, x, T_obs(β).
Randomized confidence distributions for discrete statistics. A last remark relates to discrete statistics. Confidence distributions based on discrete statistics cannot lead to a continuous uniform distribution, so approximations must be used. Schweder and Hjort (2002) proposed half correction: for discrete statistics, they used

CD(β_0) = P_{β_0}[T(β_0) > T_obs(β_0)] + (1/2) P_{β_0}[T(β_0) = T_obs(β_0)]. (3.3)

We rather use randomization as in Section 2. The discrete statistic T(β) is associated with an auxiliary statistic U_T, which is independently, uniformly and continuously distributed over [0, 1]; lexicographical order is used to break ties:

CD(β_0) = P_{β_0}[T(β_0) > T_obs(β_0)] + P[U_T(β_0) > U_{T_obs}(β_0)] P_{β_0}[T(β_0) = T_obs(β_0)]. (3.4)
Simulated confidence distributions and illustration. Let us consider a simple example to illustrate these notions. In the model y_i = βx_i + u_i, i = 1, ..., n, with (u_i, x_i) i.i.d. ∼ N(0, I_2), the Student-type sign-based statistic

SST(β) = ∑_i s(y_i − x_i β) x_i / (∑_i x_i²)^(1/2)

is a pivotal function and decreases with β. The simulated confidence distribution of β given the realization y, x is

ĈD(β_0) = 1 − F̂_{β_0}(SST(β_0)), (3.5)

with F̂_{β_0} a Monte Carlo estimate of the sampling distribution of SST under H_0(β_0): β = β_0. Figure 1 presents a simulated confidence distribution cumulative function for β, given 200 realizations of (u_i, x_i), based on SST. The Monte Carlo estimate F̂_{β_0} is obtained from 9999 replicates of SST under H_0(β_0). Testing H*_0: β ≤ .1 at the 10% level can be done by reading CD(.1), here .92: the test accepts H*_0. Further, (−∞, .23] constitutes a one-sided confidence interval for β with level .95.
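A minimal sketch (our own illustrative Python/NumPy code) of how the simulated confidence distribution (3.5) can be computed on a grid of β_0 values in this example:

```python
import numpy as np

def SST(y, x, beta0):
    """Student-type sign-based statistic SST(beta0); it decreases with beta0."""
    return np.sum(np.sign(y - x * beta0) * x) / np.sqrt(np.sum(x ** 2))

def simulated_CD(y, x, beta_grid, N=9999, seed=None):
    """Simulated confidence distribution (3.5): CD(beta0) = 1 - F_hat(SST(beta0)),
    with F_hat a Monte Carlo estimate of the null distribution of SST."""
    rng = np.random.default_rng(seed)
    n, scale = len(y), np.sqrt(np.sum(x ** 2))
    # Null distribution of SST: signs i.i.d. uniform on {-1, +1} given x (Assumption 2.1)
    null = np.array([np.sum(rng.choice([-1.0, 1.0], size=n) * x) / scale for _ in range(N)])
    return np.array([np.mean(null > SST(y, x, b)) for b in beta_grid])
```

Reading the resulting curve at β_0 = .1 gives the confidence attached to the statement β ≤ .1, as in the discussion above.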
Figure 1. Simulated confidence distribution cumulative function based on SST (horizontal axis: β; vertical axis: CD(β)).

Realized p-value functions for discrete statistics. Another interesting object is the realized p-value function when testing point hypotheses H_0(β_0). The latter is a simple transformation of the CD cumulative function. The simulated realized p-value is given by

p̂_SST(β_0) = 2 min{ĈD_SST(β_0), 1 − ĈD_SST(β_0)}. (3.6)
Consider now the statistic SF = SST². SF is a pivotal function, but not a monotone function of β, contrary to SST. An entire confidence distribution cannot be recovered from SF because of this lack of monotonicity. However, the p-value function can be constructed using equation (2.5). Figures 2 (a) and (b) compare p-value functions based on SST and SF. Inverting the p-value function allows one to recover half of the confidence distribution and consequently half of the inference results, i.e. the two-sided confidence intervals. For example, in Figure 2 (a), [−.17, .24] constitutes a confidence interval with level 90% for both statistics. The p-value function thus provides an interesting summary of the available inference; in particular, it gives the degree of confidence one can have in the statement β = β_0. Finally, the p-value function has an important advantage over the confidence distribution: it is straightforwardly extendable to multidimensional parameters.

The spread of the p-value function is also related to the model specification and to parameter identification. When the p-value function is flat, one may expect the parameter to be badly identified, either because there exists a set of observationally equivalent parameters, in which case the p-values are high over a wide set of values, or because there does not exist any value satisfying the model, in which case the p-values are small everywhere. To illustrate this point, let us consider another example (example 2) where the first n_1 observations satisfy y_i = β_1 x_i + u_i, i = 1, ..., n_1, with (u_i, x_i) i.i.d. ∼ N(0, I_2), and the following n_2 observations satisfy y_i = β_2 x_i + u_i, i = n_1 + 1, ..., n_1 + n_2, with (u_i, x_i) i.i.d. ∼ N(0, I_2), β_1 = −.5 and β_2 = .5. The model y_i = βx_i + u_i, i = 1, ..., n_1 + n_2, is misspecified. In Figure 2 (b), we notice that the spread of the p-value function based on SF is large: the set of observationally equivalent β is not reduced to a point.

Figure 2. Simulated p-value functions based on SST and SF: (a) Example 1, well identified case; (b) Example 2, misspecified case.
3.2. Simultaneous and projection-based p-value functions in multivariate regression
If p ≥ 2, the confidence distribution notion is not defined anymore. However, simulated realized p-values for testing H_0(β_0) can easily be constructed from the SF statistic and, more generally, from any sign-based statistic of the form (2.4). Simulated p-values lead to a mapping for which we have a 3-dimensional representation when p = 2. Consider the model y_i = β_1 x_{1i} + β_2 x_{2i} + u_i, i = 1, ..., n, with (u_i, x_{1i}, x_{2i}) i.i.d. ∼ N(0, I_3), β = (β_1, β_2)′ = (0, 0)′, y = (y_1, ..., y_n)′, u = (u_1, ..., u_n)′, x_1 = (x_{11}, ..., x_{1n})′, x_2 = (x_{21}, ..., x_{2n})′ and X = (x_1, x_2). Let D_S(β, (X′X)^(−1)) = s′(y − Xβ) X (X′X)^(−1) X′ s(y − Xβ). In Figure 3, we compute the simulated p-value function p̃_N^DS(β_0) for testing H_0(β_0) on a grid of values of β_0, using N replicates of the sign vector. p̃_N^DS(β_0) allows one to construct simultaneous confidence sets for β = (β_1, β_2)′ with any level. By construction, the confidence region C_{1−α}(β) defined as

C_{1−α}(β) = {β_0 | p̃_N^DS(β_0) ≥ α} (3.7)

has level 1 − α [see Dufour (2006)]. Hence, by construction, C_{1−α}(β) corresponds to the intersection of the horizontal plane at ordinate α with the envelope of p̃_N^DS(β_0).

Figure 3. Simulated p-value functions based on SF (n = 200, N = 9999; axes: β_1, β_2, simulated p-values).
For higher dimensions (p > 2), a complete graphical representation is not available anymore. However, one can consider projection-based realized p-value functions for each individual component of the parameter of interest, in a similar way to projection-based confidence intervals. For this, we apply the general projection strategy to the complete simultaneous p-value function. The projection-based realized p-value function for the component β_1 is given by:

Proj. p̃_N^{β_1}(β_1^0) = max_{β_2^0 ∈ R} p̃_N^DS[(β_1^0, β_2^0)]. (3.8)
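The projection step (3.8) can be approximated by maximizing a simultaneous p-value function over a grid of nuisance values, as in the following sketch (our own illustrative code; the p-value function and the grid are supplied by the user, for instance the Monte Carlo p-value sketched in Section 2.2 with y and X held fixed).

```python
import numpy as np

def projected_pvalue(beta1_0, beta2_grid, simultaneous_pvalue):
    """Projection-based p-value (3.8) for beta1 = beta1_0: maximize the simultaneous
    p-value function over a grid of values of the nuisance component beta2."""
    return max(simultaneous_pvalue(np.array([beta1_0, b2])) for b2 in beta2_grid)
```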
Figure 4 presents projection-based confidence intervals for the individual parameters of the previous 2-dimensional example: [−.22, .21] is a 95% (conservative) confidence interval for β_1, and [−.38, .02] is a 95% (conservative) confidence interval for β_2. The hypothesis β_1 = 0 is accepted at the 5% level with p-value 1.0, and β_2 = 0 is accepted at the 5% level with p-value .06.

Figure 4. Projection-based p-values: (a) projection-based p-values for β_1; (b) projection-based p-values for β_2.
Controlled inference using simulated confidence distributions and realized p-values. Simulated confidence distributions and realized p-values are Monte Carlo-based tools. Hence the derived tests control the nominal size only for α's such that α(N + 1) ∈ N; see Dufour (2006):

P[p̃_N^DS(β_0) ≤ α] = α, for all α such that α(N + 1) ∈ N.

If α(N + 1) ∉ N, only bounds on the significance level are known, but they are very close to α when N is sufficiently large:

I(α(N + 1) − 1) / (N + 1) ≤ P[p̃_N^DS(β_0) ≤ α] < α, for all α such that α(N + 1) ∉ N,

where I(x) denotes the integer part of x. Contrary to tests, simulated confidence distributions and realized p-values are not evaluated at a given significance level α but rather over a range of significance levels (α_1, ..., α_A). Hence, one must choose the number of replicates N carefully in order to control the significance level for all the α_i's, i.e. choose N sufficiently large so that (N + 1)α_i ∈ N for all α_i ∈ (α_1, ..., α_A). In the previous illustrations, N = 9999, which ensures that all significance levels that are multiples of .0001 are controlled.
4. Sign-based estimators
Sign-based estimators complete the above system of inference. Intuition suggests considering the values with the highest degree of confidence, i.e., with the highest p-values. Estimators obtained by that sort of test inversion constitute multidimensional extensions of the Hodges-Lehmann principle.
4.1. Sign-based estimators as maximizers of a p-value function
Hodges and Lehmann (1963) presented a general principle to derive estimators by test inversion; see also Johnson, Kotz and Read (1983). Suppose μ ∈ R and T(μ_0, W) is a statistic for testing μ = μ_0 against μ > μ_0 based on the observations W. Suppose further that T(μ, W) is nondecreasing in the scalar μ. Given a known central value of T(μ_0, W), say m(μ_0) [for example E_W T(μ_0, W)], the test rejects μ = μ_0 whenever the observed T is larger than, say, m(μ_0). If that is the case, one is inclined to prefer higher values of μ; the reverse holds when testing the opposite. If m(μ_0) does not depend on μ_0 [m(μ_0) = m_0], an intuitive estimator of μ (if it exists) is given by the value μ* such that T(μ*, W) equals m_0 (or is very close to m_0). μ* may be seen as the value of μ which is most supported by the observations.
observations.
This principle can be directly extended to multidimensional parameter setups through p-value
functions. Let
β
∈Rp. Consider testing H0(
β
0):
β
=
β
0versus H1(
β
0):
β
=
β
1with the positive
statistic T. A test based on Trejects H0(
β
0)when T(
β
0)is larger than a certain critical value
that depends on the test level. The estimator of
β
is chosen as the value of
β
least rejected when
the level
α
of the test increases. This corresponds to the highest p-value. If the associated p-
value for H0(
β
0)is p(
β
0) = GDS(
β
0)|
β
0, where G(x|
β
0)is the survival function of DS(
β
0),i.e.
G(x|
β
0) = P[DS(
β
0)>x], the set
M1=argmax
β
∈Rp
p(
β
)(4.1)
constitutes a set of Hodges-Lehmann-type estimators. HL-type estimators maximize the p-value
function. There may not be a unique maximizer. In that case, any maximizer is consistent with the
data.
4.2. Sign-based estimators as solutions of a nonlinear generalized least-squares
problem
When the distribution of T(β_0) and the corresponding p-value function do not depend on the tested value β_0, maximizing the p-value is equivalent to minimizing the statistic T(β_0). This point is stated in the following proposition. Let us denote by F̄(x | β_0) the distribution of T(β_0) when β = β_0, and assume this distribution is invariant to β (Assumption 4.1).

Assumption 4.1 INVARIANCE OF THE DISTRIBUTION FUNCTION. F̄(x | β_0) = F̄(x), ∀x ∈ R_+, ∀β_0 ∈ R^p.

Let us define

M_1 = argmax_{β ∈ R^p} p(β), (4.2)
M_2 = argmin_{β ∈ R^p} T(β). (4.3)

Then, the following proposition holds.

Proposition 4.1 If Assumption 4.1 holds, then M_1 = M_2 with probability one.

Maximizing p(β) is thus equivalent (in probability) to minimizing T(β) if Assumption 4.1 holds. Under the mediangale Assumption 2.1, any sign-based statistic D_S satisfies Assumption 4.1. Consequently,

β̂_n(Ω_n) ∈ argmin_{β ∈ R^p} s′(Y − Xβ) X Ω_n(s(Y − Xβ), X) X′ s(Y − Xβ) = M_2(Y, X, D_S^{Ω_n}) (4.4)

equals (with probability one) a Hodges-Lehmann estimator based on D_S(Ω_n, β). Since D_S(Ω_n, β) is non-negative, problem (4.4) always possesses at least one solution. As signs can only take three values, the quadratic function can take only a finite number of values for fixed n, which entails the existence of the minimum. If the solution is not unique, one may add a choice criterion: for example, one can choose the smallest solution in terms of a norm, or use a randomization. Under conditions of point identification, any solution of (4.4) is a consistent estimator.

In models with sets of observationally equivalent values of β, any inference approach relying on the consistency of a point estimator (which assumes point identification) gives misleading results, whereas a whole estimator set remains informative. The approach of Chernozhukov, Hong and Tamer (2007) can be applied here. Recall that the Monte Carlo sign-based inference method [Coudin and Dufour (2009)] does not rely on identification conditions and leads to valid results in any case.
The sign-based estimators studied by Boldin et al. (1997) are solutions of

β̂_n(I_p) ∈ argmin_{β ∈ R^p} s′(Y − Xβ) X X′ s(Y − Xβ) = argmin_{β ∈ R^p} SB(β), (4.5)

and

β̂_n((X′X)^(−1)) ∈ argmin_{β ∈ R^p} s′(Y − Xβ) X (X′X)^(−1) X′ s(Y − Xβ) = argmin_{β ∈ R^p} SF(β). (4.6)
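Numerically, β̂_n((X′X)^(−1)) in (4.6) can be approximated by minimizing the SF criterion directly; since the objective is a step function of β, a derivative-free search is needed. The sketch below (Python/NumPy) uses a crude random search started at OLS; the search strategy is an implementation choice of ours, not a procedure prescribed by the paper.

```python
import numpy as np

def SF(beta, y, X, XtX_inv):
    """SF criterion s'(Y - X beta) X (X'X)^{-1} X' s(Y - X beta), see (4.6)."""
    s = np.sign(y - X @ beta)
    v = X.T @ s
    return float(v @ XtX_inv @ v)

def sign_estimator(y, X, n_draws=20000, scale=1.0, seed=None):
    """Crude random-search approximation of a minimizer of SF; any derivative-free
    global optimizer could be used instead."""
    rng = np.random.default_rng(seed)
    XtX_inv = np.linalg.inv(X.T @ X)
    best = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS starting value
    best_val = SF(best, y, X, XtX_inv)
    for _ in range(n_draws):
        cand = best + scale * rng.standard_normal(X.shape[1])
        val = SF(cand, y, X, XtX_inv)
        if val < best_val:
            best, best_val = cand, val
    return best
```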
For heteroskedastic independent disturbances, we introduce weighted versions of sign-based estimators that can be more efficient than the basic ones defined in (4.5) or (4.6). Weighted sign-based estimators are sign-based analogues of the weighted LAD estimator [see Zhao (2001)]. The weighted LAD estimator is given by

β̂_n^{WLAD} = argmin_{β ∈ R^p} ∑_i d_i |y_i − x_i′β|. (4.7)

The weighted sign-based estimators are solutions of

β̂_n^{DX} ∈ argmin_{β ∈ R^p} s′(Y − Xβ) X̃ (X̃′X̃)^(−1) X̃′ s(Y − Xβ), (4.8)

where X̃ = diag(d_1, ..., d_n) X and d_i ∈ R_+*, i = 1, ..., n. Weighted sign-based estimators that involve optimal estimating functions in the sense of Godambe (2001) are solutions of

β̂_n^{DX*} ∈ argmin_{β ∈ R^p} s′(Y − Xβ) X* (X*′X*)^(−1) X*′ s(Y − Xβ), (4.9)

where X* = diag(f_1(0|X), ..., f_n(0|X)) X and f_t(0|X), t = 1, ..., n, are the conditional disturbance densities evaluated at zero. The inherent problem with such a class of estimators is to provide good approximations of the f_t(0|X)'s; densities of normal distributions can be used.
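As a small illustration of this reweighting, normal densities evaluated at zero can be used to build X* from user-supplied scale estimates of the disturbances (a hypothetical preliminary step, not specified in the paper); the resulting X* then replaces X in the criterion (4.9).

```python
import numpy as np

def optimal_weight_design(X, sigma):
    """Form X* = diag(f_1(0|X), ..., f_n(0|X)) X, approximating f_i(0|X) by the
    N(0, sigma_i^2) density at zero; `sigma` is a vector of disturbance scale estimates."""
    f0 = 1.0 / (np.sqrt(2.0 * np.pi) * np.asarray(sigma))   # normal density at zero
    return f0[:, None] * np.asarray(X)
```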
4.3. Sign-based estimators as GMM estimators
Sign-based estimators have been interpreted in the literature as GMM estimators exploiting the or-
thogonality condition between the signs and the explanatory variables [see Honore and Hu (2004)].
In our opinion, a strictly GMM interpretation hides the link with the testing theory. That is the rea-
son why we first introduced sign-based estimators as Hodges-Lehmann estimators. The quadratic
form (4.4) refers to quite unusual moment conditions. The sign transformation evacuates the un-
known parameters that affect the error distribution. It validates nonparametric finite-sample-based
inference when the mediangale Assumption 2.1 holds. However, in settings where only the sign-moment condition (Assumption 2.2) is satisfied, the GMM interpretation of sign-based estimators still applies and entails
useful extensions.
For autocorrelated disturbances, an estimator based on a HAC sign-based statistic D_S(β, Ĵ_n^(−1)) can be used:

β̂_n(Ĵ_n^(−1)) ∈ argmin_{β ∈ R^p} s′(Y − Xβ) X [Ĵ_n(s(Y − Xβ), X)]^(−1) X′ s(Y − Xβ), (4.10)

where Ĵ_n(s(Y − Xβ), X) accounts for the dependence among the signs and the explanatory variables. β appears twice: first in the constrained signs, and second in the weight matrix. In practice, optimizing (4.10) requires one to invert a new matrix Ĵ_n for each value of β, whereas problem (4.6) only requires one inversion of X′X. This numerical problem may quickly become cumbersome, similarly to continuously updated GMM. We advocate a two-step method: first, solve (4.6) and obtain β̂_n((X′X)^(−1)); then compute Ĵ_n^(−1)(s(Y − X β̂_n((X′X)^(−1))), X), and finally solve

β̂_n^{2S}(Ĵ_n^(−1)) ∈ argmin_{β ∈ R^p} s′(Y − Xβ) X [Ĵ_n(s(Y − X β̂_n), X)]^(−1) X′ s(Y − Xβ). (4.11)

The two-step estimator is not a Hodges-Lehmann estimator anymore. However, it is still consistent and shares some interesting finite-sample properties with classical sign-based estimators. The properties of sign-based estimators are studied in the next section.
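A sketch of this two-step procedure (Python/NumPy, our own illustrative code): the HAC estimator Ĵ_n and the first-step minimizer of (4.6) are passed in as functions, since their exact forms are not fixed here; the second-step search reuses the crude random search from the sketch above.

```python
import numpy as np

def two_step_sign_estimator(y, X, J_hat, first_step, n_draws=20000, seed=None):
    """Two-step estimator (4.11): compute a first-step estimate (problem (4.6)),
    fix the HAC weight matrix at the corresponding signs, and re-minimize.
    `J_hat(signs, X)` is a user-supplied HAC estimator of J_n;
    `first_step(y, X)` is a first-step minimizer of the SF criterion."""
    rng = np.random.default_rng(seed)
    beta1 = first_step(y, X)                                   # first step
    J_inv = np.linalg.inv(J_hat(np.sign(y - X @ beta1), X))    # fixed HAC weight matrix

    def crit(beta):
        s = np.sign(y - X @ beta)
        v = X.T @ s
        return float(v @ J_inv @ v)

    best, best_val = beta1, crit(beta1)
    for _ in range(n_draws):                                   # second step: random search
        cand = best + rng.standard_normal(X.shape[1])
        val = crit(cand)
        if val < best_val:
            best, best_val = cand, val
    return best
```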
5. Finite-sample properties of sign-based estimators
In this section, the finite-sample properties of sign-based estimators are studied. Sign-based estimators share invariance properties with the LAD estimator and are median unbiased if the disturbance distribution is symmetric and some additional assumptions on the form of the solution are satisfied. The topology of the argmin set of the optimization problem (4.4) does not have a simple structure. In some cases it reduces to a single point, like the empirical median of 2p + 1 observations; in other cases, it is a set. More generally, the argmin set is a union of convex sets, but it is a priori neither convex nor connected. To see that it is a union of convex sets, just remark that the reciprocal image of n fixed signs is convex.
Sign-based estimators share some attractive equivariance properties with LAD and quantile estimators [see Koenker and Bassett (1978)]. It is straightforward to see that the following proposition holds.

Proposition 5.1 INVARIANCE. Let M(y, X) be the set of solutions of the minimization problem (4.4). If β̂(y, X) ∈ M(y, X), then the following properties hold:

λ β̂(y, X) ∈ M(λy, X), ∀λ ∈ R, (5.1)
β̂(y, X) + γ ∈ M(y + Xγ, X), ∀γ ∈ R^p, (5.2)
A^(−1) β̂(y, X) ∈ M(y, XA), for any nonsingular p × p matrix A. (5.3)

Further, if β̂(y, X) is a uniquely determined solution of (4.4), then

β̂(λy, X) = λ β̂(y, X), ∀λ ∈ R, (5.4)
β̂(y + Xγ, X) = β̂(y, X) + γ, ∀γ ∈ R^p, (5.5)
β̂(y, XA) = A^(−1) β̂(y, X), for any nonsingular p × p matrix A. (5.6)
To prove this proposition, it is sufficient to write down the different optimization problems. (5.1) and (5.4) state a form of scale invariance: if y is rescaled by a certain factor, then β̂ rescaled by the same factor solves the transformed problem. (5.2) and (5.5) represent location invariance, while (5.3) and (5.6) describe how the estimator changes under a reparameterization of the design matrix. In all cases, parameter estimates change in the same way as the theoretical parameters.
If the disturbance distribution is symmetric and the optimization problem has a unique solution, then sign-based estimators are median unbiased.

Proposition 5.2 MEDIAN UNBIASEDNESS. If u ∼ −u and the sign-based estimator β̂(y, X) is a uniquely determined solution of the minimization problem (4.4), then β̂ is median unbiased, i.e.

Med(β̂ − β̄) = 0,

where β̄ represents the "true value" of β.
6. Asymptotic properties
We demonstrate the consistency of the proposed sign-based estimators, when the parameter is identified, under weaker assumptions than those required for the LAD estimator; this validates the use of sign-based estimators even in settings where the LAD estimator fails to converge. We also show that sign-based estimators are asymptotically normal. For reviews of the asymptotic distributional theory of LAD estimators, the reader may consult Bassett and Koenker (1978), Knight (1989), Phillips (1991), Pollard (1991), Portnoy (1991), Weiss (1991), Fitzenberger (1997), Knight (1998), El Bantli and Hallin (1999), and Koenker (2005).
6.1. Identification and consistency
We show that the sign-based estimators (4.4) and (4.11) are consistent under the following set of assumptions. In the sequel, we denote by β̄ the "true value" of β, and by β_0 any hypothesized value.

Assumption 6.1 MIXING. {W_t = (y_t, x_t′)}_{t=1,2,...} is α-mixing of size −r/(r − 1) with r > 1.

Assumption 6.2 BOUNDEDNESS. x_t = (x_{1t}, ..., x_{pt})′ and E|x_{ht}|^{r+1} < Δ < ∞, h = 1, ..., p, t = 1, ..., n, ∀n ∈ N.

Assumption 6.3 COMPACTNESS. β̄ ∈ Int(Θ), where Θ is a compact subset of R^p.

Assumption 6.4 REGULARITY OF THE DENSITY.
1. There are positive constants f_L and p_1 such that, for all n ∈ N, P[f_t(0|X) > f_L] > p_1, t = 1, ..., n, a.s.
2. f_t(·|X) is continuous, for all n ∈ N and for all t, a.s.

Assumption 6.5 POINT IDENTIFICATION CONDITION. ∀δ > 0, ∃τ > 0 such that

liminf_{n→∞} (1/n) ∑_t P[ |x_t′δ| > τ | f_t(0|x_1, ..., x_n) > f_L ] > 0.

Assumption 6.6 UNIFORMLY POSITIVE DEFINITE WEIGHT MATRIX. Ω̄_n(β) is symmetric positive definite for all β in Θ.

Assumption 6.7 LOCALLY POSITIVE DEFINITE WEIGHT MATRIX. Ω̄_n(β) is symmetric positive definite for all β in a neighborhood of β̄.

Then, we can state the consistency theorem; the assumptions are discussed just after.
Theorem 6.1 CONSISTENCY. Under model (2.1) with Assumptions 2.2 and 6.1-6.6, any sign-based estimator of the type

β̂_n ∈ argmin_{β_0 ∈ Θ} s(y − Xβ_0)′ X Ω_n(s(y − Xβ_0), X) X′ s(y − Xβ_0) (6.1)

or

β̂_n^{2S} ∈ argmin_{β_0 ∈ Θ} s(y − Xβ_0)′ X Ω̂_n(s(y − Xβ̂), X) X′ s(y − Xβ_0), (6.2)

where β̂ stands for any (first-step) consistent estimator of β̄, is consistent. β̂_n^{2S} defined in equation (6.2) is also consistent if Assumption 6.6 is replaced by Assumption 6.7.
It is useful to discuss Assumptions 6.1-6.7 and compare them with those required for the consistency of LAD and quantile estimators. In the special case where X Ω_n(s(y − Xβ_0), X) X′ = I_n, the identity matrix, the estimators in (6.1)-(6.2) coincide with the "quantile regression estimator" (with θ = 1/2) studied by Fitzenberger (1997, Theorem 2.2). However, allowing for a weighting matrix different from the identity matrix – as we do here – turns out to be important from the viewpoint of efficiency. Stricto sensu, the sign-based estimators in (6.1)-(6.2) and Fitzenberger (1997, Theorem 2.2) are not LAD estimators, because the sizes of the residuals (through absolute values) do not appear in the objective function. This feature is crucial for relaxing assumptions on moments. The disturbances indeed appear in the objective function only through their sign transforms, which possess finite moments at all orders. Consequently, no additional restriction needs to be imposed on the disturbance process (in addition to regularity conditions on the density). Only assumptions on the moments of x_t are used (see Assumption 6.2). There is very little work on the properties of LAD estimators with infinite-variance errors; see Knight (1989) and Phillips (1991), who derive LAD asymptotic properties for an autoregressive model with infinite-variance errors which are in the domain of attraction of a stable law.

Assumption 6.1 on mixing is needed to apply a generic weak law of large numbers; see Andrews (1987) and White (2001). It was used by Fitzenberger (1997) with stationary linearly dependent processes. It covers, among other processes, stationary ARMA disturbances with continuously distributed innovations. Identification is provided by Assumptions 6.4 and 6.5. Assumption 6.5 is similar to Condition ID in Weiss (1991). Assumption 6.4 is usual in LAD estimator asymptotics.⁴ It is analogous to Fitzenberger's (1997) conditions (ii.b)-(ii.c) and Weiss's (1991) CD condition. It implies that there is enough variation around zero to identify the median. It restricts the setup to some "bounded" heteroskedasticity in the disturbance process, but not in the usual (variance-based) way. It is related to the diffusivity 2f(0)^(−1), a dispersion measure adapted to median-unbiased estimators. Diffusivity indicates the vertical spread of a density rather than its horizontal spread, and appears in Cramér-Rao-type efficiency bounds for median-unbiased estimators; see Sung, Stangenhaus and David (1990) and So (1994). Assumption 6.6 entails that the weight matrix Ω_n is everywhere invertible, while Assumption 6.7 only requires local invertibility.

⁴ Assumption 6.4 can be slightly relaxed to cover error terms with a mass point if the objective function involves randomized signs instead of the usual signs.
6.2. Asymptotic normality
Sign-based estimators are asymptotically normal. Sign-based estimators are well adapted to deal
with heavy-tailed disturbances that may not possess finite variances. The assumptions we consider
are the following ones.
Assumption 6.8 UNIF ORM LY BOUNDED DENS ITI ES.∃fU<+∞such that ,∀n∈N,∀
λ
∈R,
sup
{t∈(1,...,n)}|ft(
λ
|x1,...,xn)|<fU,a.s.
Under the conditions 2.2, 6.1, 6.2 and 6.8, we can define $L(\beta)$, the derivative of the limiting objective function at $\beta$:

$L(\beta) = \lim_{n\to\infty} \frac{1}{n}\sum_t E\big[\,x_t x_t'\, f_t\big(x_t'(\beta - \bar\beta)\big) \,\big|\, x_1,\dots,x_n\big] = \lim_{n\to\infty} L_n(\beta),$    (6.3)

where

$L_n(\beta) = \frac{1}{n}\sum_t E\big[\,x_t x_t'\, f_t\big(x_t'(\beta - \bar\beta)\big) \,\big|\, x_1,\dots,x_n\big].$    (6.4)
The other assumptions are fairly standard conditions to prove asymptotic normality.
Assumption 6.9 MIXING WITH r > 2. The process $\{W_t = (y_t, x_t') : t = 1, 2, \dots\}$ is $\alpha$-mixing of size $-r/(r-2)$ with $r > 2$.

Assumption 6.10 DEFINITE POSITIVENESS OF $L_n$. $L_n(\bar\beta)$ is positive definite uniformly in $n$.

Assumption 6.11 DEFINITE POSITIVENESS OF $J_n$. $J_n = E\big[\frac{1}{n}\sum_{t,s=1}^{n} s(u_t)\, x_t x_s'\, s(u_s)\big]$ is positive definite uniformly in $n$ and converges to a positive definite symmetric matrix $J$ as $n \to \infty$.
Then, we have the following result.
Theorem 6.2 ASYMPTOTIC NORMALITY. Under the assumptions 2.2, 6.1 to 6.6, and 6.9 to 6.11, we have:

$S_n^{-1/2}\,\sqrt{n}\,\big(\hat\beta_n - \bar\beta\big) \;\xrightarrow{d}\; N(0, I_p)$    (6.5)

where $\hat\beta_n(\Omega_n)$ is any estimator which minimizes $D_S[\beta_0, \bar\Omega_n(\beta_0)]$ in (2.4),

$S_n = [L_n(\bar\beta)\,\Omega_n\,L_n(\bar\beta)]^{-1}\, L_n(\bar\beta)\,\Omega_n\, J_n\,\Omega_n\, L_n(\bar\beta)\, [L_n(\bar\beta)\,\Omega_n\,L_n(\bar\beta)]^{-1}$

and

$L_n(\bar\beta) = \frac{1}{n}\sum_t E\big[\,x_t x_t'\, f_t(0) \,\big|\, x_1,\dots,x_n\big].$    (6.6)
When $\bar\Omega_n(\beta_0) = \hat J_n(\beta_0)^{-1}$ with $\hat J_n(\beta_0) = \frac{1}{n}\sum_{t,s=1}^{n} s(y_t - x_t'\beta_0)\, x_t x_s'\, s(y_s - x_s'\beta_0)$, we get:

$\big[L_n(\bar\beta)\,\hat J_n^{-1}\,L_n(\bar\beta)\big]^{-1/2}\,\sqrt{n}\,\big(\hat\beta_n(\hat J_n^{-1}) - \bar\beta\big) \;\xrightarrow{d}\; N(0, I_p).$    (6.7)
This corresponds to the use of optimal instruments and quasi-efficient estimation. $\hat\beta_n(\hat J_n^{-1})$ has the same asymptotic covariance matrix as the LAD estimator. Thus, performance differences between the two estimators correspond to finite-sample features. This result contradicts the generally accepted idea that sign procedures involve a heavy loss of information: there is no loss induced by the use of signs instead of absolute values.
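For readers who want to see how $\hat J_n$ might be computed in the serially dependent case, here is a schematic HAC-type estimate of $J_n$ built from the sign moment contributions; the Bartlett kernel and the bandwidth choice are our own illustrative assumptions, not taken from the paper (which refers to Coudin and Dufour (2009) for the precise SHAC construction).

```python
import numpy as np

def sign_hac_J(y, X, beta0, bandwidth):
    """Bartlett-kernel (Newey-West type) estimate of
    J_n = E[(1/n) sum_{t,s} s(u_t) x_t x_s' s(u_s)],
    built from the sign contributions g_t = s(y_t - x_t' beta0) x_t.
    The kernel and bandwidth are illustrative choices, not the paper's."""
    g = np.sign(y - X @ beta0)[:, None] * X   # n x p matrix with rows g_t'
    n = g.shape[0]
    J = g.T @ g / n                           # lag-0 term
    for lag in range(1, bandwidth + 1):
        w = 1.0 - lag / (bandwidth + 1.0)     # Bartlett weight
        Gamma = g[lag:].T @ g[:-lag] / n      # lag-'lag' autocovariance of g_t
        J += w * (Gamma + Gamma.T)
    return J

# Its inverse can then serve as the weight matrix of the two-step (SHAC)
# estimator in (6.2), evaluated at a first-step consistent estimate.
```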
Note again that we do not require the disturbance process to have a finite variance. We only assume that the second-order moments of X are finite and that the mixing property of $\{W_t : t = 1, 2, \dots\}$ holds. This differs from the usual assumptions for LAD asymptotic normality.⁵ The difference comes from the fact that the absolute values of the disturbances are replaced in the objective function by their signs. Since signs possess finite moments at any order, a CLT can be applied without any further restriction. Consequently, asymptotic normality, like consistency, holds for heavy-tailed disturbances that may not possess a finite variance. This is an important theoretical advantage of sign-based estimators over absolute-value-based estimators and, a fortiori, over least-squares estimators. Estimators whose asymptotic normality relies on a bounded-variance assumption (for example, OLS) are not accurate in heavy-tailed settings, because the variance is not a dispersion measure adapted to such settings. Estimators whose asymptotic behavior relies on other measures of dispersion, such as the diffusivity, remain reliable in those settings.
The form of the asymptotic covariance matrix simplifies under stronger assumptions. When the signs are mutually independent conditional on X [mediangale Assumption 2.1], both $\hat\beta_n\big((X'X)^{-1}\big)$ and $\hat\beta_n\big(J_n^{-1}\big)$ are asymptotically normal with variance

$S_n = [L_n(\bar\beta)]^{-1}\, E\Big[\tfrac{1}{n}\sum_{t=1}^{n} x_t x_t'\Big]\, [L_n(\bar\beta)]^{-1}.$

If $u$ is an i.i.d. process and is independent of X, then $f_t(0) = f(0)$ and

$S_n = \frac{1}{4 f(0)^2}\, E(x_t x_t')^{-1}.$    (6.8)
In the general case, $f_t(0)$ is a nuisance parameter, even if condition 6.8 implies that it can be bounded. All the features known about the asymptotic behavior of the LAD estimator also apply to the SHAC estimator; see Boldin et al. (1997). For example, the asymptotic relative efficiency of the SHAC (and LAD) estimator with respect to the OLS estimator is $2/\pi$ if the errors are normally distributed $N(0, \sigma^2)$, but the SHAC (like the LAD) estimator can have an arbitrarily large ARE with respect to OLS when the disturbance-generating process is contaminated by outliers.
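As a quick numerical check of (6.8) and of the $2/\pi$ efficiency ratio just mentioned, the sketch below evaluates the i.i.d.-Gaussian-error case, where $f(0) = 1/(\sigma\sqrt{2\pi})$; the normalization $E(x_t x_t') = I_3$ is only an illustrative choice.

```python
import numpy as np

# f(0) for N(0, sigma^2) errors
sigma = 1.0
f0 = 1.0 / (sigma * np.sqrt(2.0 * np.pi))

# Illustrative normalization E(x_t x_t') = I_3
Exx = np.eye(3)

S_sign = np.linalg.inv(Exx) / (4.0 * f0 ** 2)   # covariance (6.8)
S_ols = sigma ** 2 * np.linalg.inv(Exx)          # OLS asymptotic covariance

# Component-by-component variance ratio: ~ 2/pi = 0.637, the classical ARE
# of the LAD/sign estimator relative to OLS under Gaussian errors.
print(np.diag(S_ols) / np.diag(S_sign))
```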
6.3. Asymptotic or projection-based confidence sets?
In section 4, we introduced sign-based estimators as Hodges-Lehmann estimators associated with
sign-based statistics. By linking them with GMM settings, we then derived asymptotic normal-
ity. We stressed that sign-based estimator asymptotic normality holds under weaker assumptions than the ones needed for the LAD estimator.
⁵ See Fitzenberger (1997) for the derivation of the LAD asymptotics in a similar setup, and Bassett and Koenker (1978) or Weiss (1991) for a derivation of the LAD asymptotics under sign independence.
Sign-based estimator asymptotic normality therefore enables one to construct asymptotic tests and confidence intervals. Thus, we have two ways of making inference with signs: the Monte Carlo (finite-sample) method described in Coudin and Dufour (2009) (see Section 2.2) and the classical asymptotic method. Let us list the main differences between them. Monte Carlo inference relies on the pivotality of the sign-based statistic. The resulting tests are valid (with controlled level) for any sample size if the mediangale Assumption 2.1 holds. When only the sign moment condition 2.2 holds, Monte Carlo inference remains asymptotically valid: asymptotic test levels are controlled. Besides, in simulations, the Monte Carlo inference method appears to perform better in small samples than classical asymptotic methods, even when its use is only asymptotically justified [see Coudin and Dufour (2009)]. Nevertheless, that method has an important drawback: its computational cost. By contrast, classical asymptotic methods, which yield tests with controlled asymptotic level under the sign moment condition 2.2, may be less time consuming. The choice between the two is mainly a question of computational capacity. We also point out that classical asymptotic inference relies heavily on how the asymptotic covariance matrix, which depends on unknown parameters (densities at zero), is treated. If the asymptotic covariance matrix is estimated by a simulation-based method (such as the bootstrap), the time argument no longer holds: both methods are then of the same order of computational complexity.
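To make the computational comparison concrete, here is a stripped-down sketch of a Monte Carlo sign test of H0: β = β0, exploiting the fact that, under the mediangale Assumption 2.1, the signs are i.i.d. ±1 conditional on X; it omits the randomized tie-breaking and the refinements of the procedure of Coudin and Dufour (2009), so it should be read as an illustration only.

```python
import numpy as np

def ds_statistic(s, X, Omega):
    """Sign statistic DS = s' X Omega X' s for a given sign vector s."""
    g = X.T @ s
    return float(g @ Omega @ g)

def mc_sign_pvalue(y, X, beta0, n_rep=999, seed=0):
    """Monte Carlo p-value for H0: beta = beta0 under the mediangale assumption.
    Conditional on X, the signs are i.i.d. +/-1 under H0, so the null
    distribution of DS can be simulated exactly (up to tie handling)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Omega = np.linalg.inv(X.T @ X)
    d_obs = ds_statistic(np.sign(y - X @ beta0), X, Omega)
    d_sim = np.array([ds_statistic(rng.choice([-1.0, 1.0], size=n), X, Omega)
                      for _ in range(n_rep)])
    # Monte Carlo p-value with the usual +1 correction
    return (1 + np.sum(d_sim >= d_obs)) / (n_rep + 1)
```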
7. Simulation study
In this section, we compare the performance of the sign-based estimators with that of the OLS and LAD estimators in terms of bias and RMSE.
7.1. Simulation setup
We use estimators derived from the sign-based statistics $D_S[\beta, (X'X)^{-1}]$ and, when a correction is needed for linear serial dependence, $D_S(\beta, \hat J_n^{-1})$ (SHAC estimator). Minimizations are solved by simulated annealing. We consider a set of general DGPs to illustrate different classical problems one may encounter in practice. We use the following linear regression model:

$y_t = x_t'\beta + u_t,$    (7.1)

where $x_t = (1, x_{2,t}, x_{3,t})'$ and $\beta$ are 3 × 1 vectors. We denote the sample size by $n$. Monte Carlo studies are based on $S$ generated random samples. Table 1 presents the cases considered.
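As an indication of how such a design can be coded, the sketch below draws one sample from case A2 of Table 1 (normal errors with bounded conditional heteroskedasticity); the choice β = 0 for the true coefficient vector is our own placeholder, and the estimation step would then minimize the sign objective, e.g. with the `sign_estimator` sketch shown in Section 6.

```python
import numpy as np

def simulate_case_A2(n, beta, seed=None):
    """One sample from case A2 of Table 1: normal regressors and
    errors u_t = min{3, max[0.21, |x_{2,t}|]} * e_t with e_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x2, x3, e = rng.normal(size=(3, n))
    X = np.column_stack([np.ones(n), x2, x3])
    u = np.clip(np.abs(x2), 0.21, 3.0) * e      # bounded heteroskedasticity
    return X @ beta + u, X

# beta = 0 is a placeholder for the (unreported) true coefficient vector.
y, X = simulate_case_A2(n=50, beta=np.zeros(3), seed=1)
```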
In a first group of examples (A1-A4), we consider classical independent cases with bounded heterogeneity. In a second group (B5-B8), we look at processes involving strong heteroskedasticity, so that some of the estimators we consider may not be asymptotically normal or even consistent. Finally, the third group (C9-C11) is dedicated to autocorrelated disturbances; there, we ask whether the two-step SHAC sign-based estimator performs better in small samples than the non-corrected one. To sum up, cases A1 and A2 present i.i.d. normal observations without and with conditional heteroskedasticity, and case A3 involves a form of weak nonlinear dependence in the error term.
Table 1. Simulated models.

A1: Normal HOM errors. (x_{2,t}, x_{3,t}, u_t)' i.i.d. ~ N(0, I_3), t = 1, ..., n.

A2: Normal HET errors. (x_{2,t}, x_{3,t}, ũ_t)' i.i.d. ~ N(0, I_3); u_t = min{3, max[0.21, |x_{2,t}|]} × ũ_t, t = 1, ..., n.

A3: Dep.-HET. x_{j,t} = ρ_x x_{j,t-1} + ν^j_t, j = 2, 3, with ρ_x = .5; u_t = min{3, max[0.21, |x_{2,t}|]} × ν^u_t; (ν^2_t, ν^3_t, ν^u_t)' i.i.d. ~ N(0, I_3), t = 2, ..., n; ν^2_1 and ν^3_1 chosen to ensure stationarity.

A4: Unbalanced design matrix. x_{2,t} ~ B(1, 0.3); x_{3,t} i.i.d. ~ N(0, (.01)^2); u_t i.i.d. ~ N(0, 1); x_t, u_t independent, t = 1, ..., n.

B5: Cauchy errors. (x_{2,t}, x_{3,t})' ~ N(0, I_2); u_t i.i.d. ~ C (Cauchy); x_t, u_t independent, t = 1, ..., n.

B6: Stochastic volatility. (x_{2,t}, x_{3,t})' i.i.d. ~ N(0, I_2); u_t = exp(w_t/2) ε_t with w_t = 0.5 w_{t-1} + v_t, where ε_t i.i.d. ~ N(0, 1) and v_t i.i.d. ~ χ²(3); x_t, u_t independent, t = 1, ..., n.

B7: Nonstationary GARCH(1,1). (x_{2,t}, x_{3,t}, ε_t)' i.i.d. ~ N(0, I_3), t = 1, ..., n; u_t = σ_t ε_t, σ²_t = 0.8 u²_{t-1} + 0.8 σ²_{t-1}.

B8: Exponential error variance. (x_{2,t}, x_{3,t}, ε_t)' i.i.d. ~ N(0, I_3); u_t = exp(.2t) ε_t.

C9: AR(1)-HOM, ρ_u = .5. (x_{2,t}, x_{3,t}, ν^u_t)' ~ N(0, I_3), t = 2, ..., n; u_t = ρ_u u_{t-1} + ν^u_t; (x_{2,1}, x_{3,1})' ~ N(0, I_2); ν^u_1 ensures stationarity.

C10: AR(1)-HET, ρ_u = .5, ρ_x = .5. x_{j,t} = ρ_x x_{j,t-1} + ν^j_t, j = 2, 3; u_t = min{3, max[0.21, |x_{2,t}|]} × ũ_t, with ũ_t = ρ_u ũ_{t-1} + ν^u_t; (ν^2_t, ν^3_t, ν^u_t)' i.i.d. ~ N(0, I_3), t = 2, ..., n; ν^2_1, ν^3_1 and ν^u_1 chosen to ensure stationarity.

C11: AR(1)-HOM, ρ_u = .9. (x_{2,t}, x_{3,t}, ν^u_t)' ~ N(0, I_3), t = 2, ..., n; u_t = ρ_u u_{t-1} + ν^u_t; (x_{2,1}, x_{3,1})' ~ N(0, I_2); ν^u_1 ensures stationarity.
Case A4 presents a very unbalanced scheme in the design matrix (a case in which the LAD estimator is known to perform badly). Cases B5, B6, B7 and B8 are other cases of long-tailed errors, heteroskedasticity, and nonlinear dependence. Cases C9 to C11 illustrate different levels of autocorrelation in the error term, with and without heteroskedasticity.
7.2. Bias and RMSE
We report the bias and RMSE of each parameter of interest in Table 2, together with a norm of these three values, for n = 50 and S = 1000. These results are unconditional on X.
In the classical cases (A1-A3), sign-based estimators behave roughly like the LAD estimator in terms of bias and RMSE. OLS is optimal in case A1, but there is no important efficiency loss or bias increase from using signs instead of LAD. Moreover, when the LAD estimator is not accurate in a particular setup (for example, with a highly unbalanced explanatory scheme, case A4), the sign-based estimators do not suffer from the same drawback: in case A4, the RMSE of the sign-based estimator is notably smaller than those of the OLS and LAD estimates.
For setups with strong heteroskedasticity and nonstationary disturbances (B5-B8), the sign-based estimators yield better results than both the LAD and OLS estimators. Not far from the (optimal) LAD in the case of Cauchy disturbances (B5), the sign estimators are the only ones that remain reliable under nonstationary variance (B6-B8). No assumption on the moments of the error term is needed for the consistency of sign-based estimators; all that matters is the behavior of their signs.
When the error term is autocorrelated (C9-C11), results are mixed. When a moderate linear
dependence is present in the data, sign-based estimators give good results (C9, C10). But when the
linear dependence is stronger (C11), that is no longer true. The SHAC sign-based estimator does
not give better results than the non-corrected one in these selected examples.
To conclude, sign-based estimators are robust estimators that are much less sensitive than the LAD estimator to unbalanced schemes in the explanatory variables and to heteroskedasticity. They are particularly adequate when some amount of heteroskedasticity or nonlinear dependence is suspected in the error term, even if the error term fails to be stationary. Finally, the HAC correction does not seem to improve the performance of the estimator. Nevertheless, it does for tests: we show in Coudin and Dufour (2009) that using a HAC-corrected statistic allows for the asymptotic validity of the Monte Carlo inference method and improves test performance in small samples.
8. Empirical applications
In this section, we return to the two illustrations presented in Coudin and Dufour (2007, 2009), where sign-based tests were derived, now with estimation in mind. The first application is dedicated to estimating a drift in the Standard and Poor's Composite Price Index (S&P), 1928-1987. In the second, we seek a robust estimate of the rate of β-convergence between output levels across U.S. States during the 1880-1988 period, using Barro and Sala-i-Martin (1991) data.
Table 2. Simulated bias and RMSE (n = 50, S = 1000).

                 OLS              LAD              SF               SHAC
                 Bias     RMSE    Bias     RMSE    Bias     RMSE    Bias     RMSE
A1:   β0         .003     .142    .002     .179    .002     .179    .004     .178
      β1         .003     .149    .006     .184    .004     .182    .004     .182
      β2        −.002     .149   −.007     .186   −.006     .185   −.007     .183
      ||β||*     .004     .254    .009     .316    .007     .315    .009     .313
A2:   β0        −.003     .136    .000     .090   −.000     .089   −.000     .089
      β1        −.0135    .230   −.006     .218   −.010     .218   −.010     .218
      β2         .002     .142   −.001     .095   −.001     .092   −.001     .092
      ||β||      .014     .303    .007     .254    .010     .253    .010     .253
A3:   β0         .022     .167    .018     .108    .025     .107    .023     .107
      β1        −1.00     .228    .005     .215    .003     .214    .002     .215
      β2         .001     .150    .005     .105    .007     .104    .007     .105
      ||β||      .022     .320    .019     .263    .026     .261    .024     .262
A4:   β0        −.001     .174    .007     .2102   .010     .2181   .008     .2171
      β1        −.016     .313   −.011     .375   −.021     .396   −.021     .394
      β2        −.100    14.6     .077    18.4     .014     7.41    .049     7.40
      ||β||      .101    14.6     .078    18.5     .027     7.42    .054     7.41
B5:   β0        16.0    505       .001     .251    .004     .248    .003     .248
      β1        −3.31   119       .015     .264    .020     .265    .020     .265
      β2        −2.191  630       .000     .256    .003     .258    .001     .258
      ||β||     26.0    817       .015     .445    .021     .445    .020     .445
B6:   β0        −.908    29.6   −1.02     27.4     .071     2.28    .083     2.28
      β1        2.00     37.6    3.21     68.4     .058     2.38    .069     2.39
      β2        1.64     59.3    2.59     91.8    −.101     2.30   −.089     2.29
      ||β||     2.73     76.2    4.25    118       .136     4.02    .139     4.02
B7:   β0       −127    3289      −.010     7.85   −.008     3.16   −.028     3.17
      β1       −81.4    237       .130    11.2    −.086     3.80   −.086     3.823
      β2       −31.0   1484      −.314    12.0    −.021     3.606  −.009     3.630
      ||β||     154    4312       .340    18.2     .089     6.12    .091     6.15
B8:   β0       <−10^10 >10^10   <−10^9   >10^10    .312     5.67    .307     5.67
      β1       >10^10  >10^10   >10^9    >10^10    .782     5.40    .863     5.46
      β2       <−10^10 >10^10   <−10^9   >10^10    .696     5.52    .696     5.55
      ||β||    >10^10  >10^10   >10^10   >10^10   1.09      9.58   1.15      9.63
C9:   β0         .005     .279    .001     .308    .003     .309    .004     .311
      β1        −.002     .163   −.005     .201   −.004     .200   −.005     .199
      β2         .001     .165   −.004     .204    .003     .198    .002     .198
      ||β||      .006     .363    .007     .420    .006     .418    .006     .419
C10:  β0        −.013     .284   −.010     .315   −.015     .314   −.014     .314
      β1        −.009     .182   −.009     .220   −.011     .218   −.011     .219
      β2         .008     .189    .011     .222    .007     .215    .007     .215
      ||β||      .018     .387    .018     .444    .020     .439    .019     .439
C11:  β0         .070    1.23    −.026     .308    .058     1.26    .053     1.27
      β1        −.000     .268    .005     .214   −.005     .351   −.008     .354
      β2         .001     .273   −.004     .210    .002     .361   −.001     .361
      ||β||      .070    1.29     .027     .430    .059     1.36    .054     1.37

* ||·|| stands for the Euclidean norm.
8.1. Drift estimation with heteroskedasticity
In this section, we estimate a constant and a drift on the Standard and Poor's Composite Price Index (S&P), 1928-1987. That process is known to involve a large amount of heteroskedasticity and has been used by Gallant, Hsieh and Tauchen (1997) and Dufour and Valéry (2006, 2009) to fit a stochastic volatility model. Here, we are interested in robust estimation without modeling the volatility in the disturbance process. The data set consists of a series of 16,127 daily observations of SP_t, converted into price movements, $y_t = 100\,[\log(SP_t) - \log(SP_{t-1})]$, and adjusted for systematic calendar effects. We consider a model involving a constant and a drift,

$y_t = a + b\,t + u_t, \quad t = 1, \dots, 16127,$    (8.2)

and we allow $\{u_t : t = 1, \dots, 16127\}$ to exhibit stochastic volatility or nonlinear heteroskedasticity of unknown form. White and Breusch-Pagan tests for heteroskedasticity both reject homoskedasticity at the 1% level.⁶
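For concreteness, the return transformation and the design of (8.2) can be set up as in the following sketch; the synthetic stand-in series is there only so the snippet runs (the actual application uses the 16,127 daily S&P index observations, adjusted for systematic calendar effects, which we do not reproduce here).

```python
import numpy as np

# Synthetic stand-in for the daily S&P Composite Price Index levels
# (generated only so the snippet runs; the application uses the real series).
rng = np.random.default_rng(0)
sp = 100.0 * np.exp(np.cumsum(0.0005 + 0.01 * rng.standard_t(3, size=16127)))

y = 100.0 * np.diff(np.log(sp))              # y_t = 100[log(SP_t) - log(SP_{t-1})]
t = np.arange(1, y.size + 1, dtype=float)
X = np.column_stack([np.ones_like(t), t])    # constant and drift, as in (8.2)
# The SF estimate of (a, b) then minimizes the basic sign objective
# D_S[(a, b), (X'X)^{-1}], e.g. with the sign_estimator sketch of Section 6.
```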
We compute both the basic SF sign-based estimator and the SHAC version based on the two-step method, and compare them with the LAD and OLS estimates. We then redo a similar experiment on two subperiods: the year 1929 (291 observations) and the last 90 days of 1929, which roughly correspond to the last four months of 1929 (90 observations). Due to the financial crisis, one may expect the data to involve an extreme amount of heteroskedasticity in that period. We ask to what extent that heteroskedasticity can bias the subsample estimates. The Wall Street crash occurred between October 24 (Black Thursday) and October 29 (Black Tuesday). Hence, the second subsample covers the period just before the crash (September), the crash period (October), and the very beginning of the Great Depression (November and December). Heteroskedasticity tests reject homoskedasticity for both subsamples.⁷
In Table 3, we report the estimates and recall the 95% confidence intervals for a and b obtained by the finite-sample sign-based method (SF and SHAC)⁸ and by moving-block bootstrap (LAD and OLS). The entire set of sign-based estimators is reported, i.e., all the minimizers of the sign objective function.
First, the OLS estimates are reported only for comparison's sake: they estimate a different quantity than the LAD/sign estimators and are highly unreliable in the presence of heteroskedasticity. Presenting the entire sets of sign-based estimators enables us to compare them with the LAD estimator. In this example, the LAD and sign-based estimators yield very similar estimates: the value of the LAD estimator is just at the limit of the sets of sign-based estimators. This does not mean that the LAD estimator is included in the set of sign-based estimators, but there is a sign-based estimator giving the same value as the LAD estimate for a certain individual component (the second component may differ). One easy way to check this is to compare the two objective functions evaluated at the two estimates.
⁶ See Coudin and Dufour (2009): White: 499 (p-value = .000); BP: 2781 (p-value = .000).
⁷ 1929: White: 24.2 (p-value: .000); BP: 126 (p-value: .000). Sept.-Dec. 1929: White: 11.08 (p-value: .004); BP: 1.76 (p-value: .18).
⁸ See Coudin and Dufour (2009).
Table 3. Constant and drift estimates.

                                     Whole sample        Subsamples
                                     (16120 obs)         1929 (291 obs)      1929 (90 obs)
Constant parameter (a)
  Set of basic sign-based             .062                (.160, .163)*       (−.091, .142)
  estimators (SF)                    [−.007, .105]**     [−.226, .521]       [−1.453, .491]
  Set of 2-step sign-based            .062                (.160, .163)        (−.091, .142)
  estimators (SHAC)                  [−.007, .106]       [−.135, .443]       [−1.030, .362]
  LAD                                 .062                 .163                −.091
                                     [.008, .116]        [−.130, .456]       [−1.223, 1.040]
  OLS                                −.005                 .224                −.522
                                     [−.056, .046]       [−.140, .588]       [−1.730, .685]
Drift parameter (b)                  ×10^−5              ×10^−2              ×10^−1
  Set of basic sign-based            (−.184, −.178)      (−.003, .000)       (−.097, −.044)
  estimators (SF)                    [−.676, .486]       [−.330, .342]       [−.240, .305]
  Set of 2-step sign-based           (−.184, −.178)      (−.003, .000)       (−.097, −.044)
  estimators (SHAC)                  [−.699, .510]       [−.260, .268]       [−.204, .224]
  LAD                                −.184                 .000                −.044
                                     [−.681, .313]       [−.236, .236]       [−.316, .229]
  OLS                                 .266                −.183                 .010
                                     [−.228, .761]       [−.523, .156]       [−.250, .270]

* Interval of admissible estimators (minimizers of the sign objective function).
** 95% confidence intervals.
For example, in the 90-observation sample, the sign objective function evaluated at the basic sign estimators is 4.75 × 10⁻³ and at the LAD estimate 5.10 × 10⁻²; the LAD objective function evaluated at the LAD estimate is 210.4 and at one of the sign-based estimates 210.5. The values are close but different.
Finally, the two-step sign-based estimators and the basic sign-based estimators yield the same estimates; only the confidence intervals differ. The two methods are indeed expected to give different results especially in the presence of linear dependence.
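The check described above is straightforward to code; the following sketch (an illustration, with placeholder names for the two estimates) evaluates the basic sign objective $D_S[\beta, (X'X)^{-1}]$ and the LAD criterion so the two can be compared at the sign-based and LAD estimates.

```python
import numpy as np

def sign_objective_basic(beta, y, X):
    """Basic sign objective D_S[beta, (X'X)^{-1}] = s' X (X'X)^{-1} X' s."""
    s = np.sign(y - X @ beta)
    g = X.T @ s
    return float(g @ np.linalg.solve(X.T @ X, g))

def lad_objective(beta, y, X):
    """LAD criterion: sum of absolute residuals."""
    return float(np.sum(np.abs(y - X @ beta)))

# Evaluating both criteria at a sign-based estimate and at the LAD estimate
# (placeholders beta_sign, beta_lad) reproduces the kind of comparison made
# above: each estimator attains the lower value of its own criterion.
```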
8.2. A robust sign-based estimate of convergence across U.S. states
One field that suffers from both small numbers of observations and potentially very heterogeneous data is that of cross-sectional regional data sets. Least-squares methods may be misleading there because a few outlying observations can drastically influence the estimates, so robust methods are greatly needed. Sign-based estimators are robust (in a statistical sense) and are naturally associated with finite-sample inference. In the following, we examine sign-based estimates of the rate of β-convergence between output levels across U.S. States between 1880 and 1988, using Barro and Sala-i-Martin (1991) data.
In the neoclassical growth model, Barro and Sala-i-Martin (1991) estimated the rate of β-convergence between levels of per capita output across the U.S. States for different time periods between 1880 and 1988.
Table 4. Summary of regression diagnostics.

             Heteroskedasticity*     Nonnormality**         Influential obs.**     Possible outliers**
Period       Basic eq.  Reg. dum.    Basic eq.  Reg. dum.   Basic eq.  Reg. dum.   Basic eq.  Reg. dum.
1880-1900    yes        -            yes        -           yes        yes         no         no
1900-1920    yes        yes          yes        yes         yes        yes         yes (MT)   yes
1920-1930    -          -            -          -           yes        -           no         no
1930-1940    -          -            yes        -           yes        yes         no         no
1940-1950    -          -            -          -           yes        yes         yes (VT)   yes (VT)
1950-1960    -          -            -          yes         yes        yes         yes (MT)   yes (MT)
1960-1970    -          -            -          -           -          -           no         no
1970-1980    -          -            yes        yes         yes        yes         yes (WY)   yes (WY)
1980-1988    yes        -            -          yes         yes        yes         yes (WY)   yes (WY)

* White and Breusch-Pagan tests for heteroskedasticity are performed. "yes" is reported if at least one test rejects homoskedasticity at 5%; "-" is reported when both tests are nonconclusive.
** Scatter plots, kernel density estimates, leverage analysis, Studentized or standardized residuals larger than 3, DFbetas and Cook's distances have been examined; "yes" indicates suspicion of nonnormality, outliers, or highly influential observations.
They used nonlinear least squares to estimate equations of the form

$(1/T)\,\ln(y_{i,t}/y_{i,t-T}) = a - [\ln(y_{i,t-T})] \times [(1 - e^{-\beta T})/T] + x_i'\delta + \varepsilon^{t,T}_i,$

$i = 1, \dots, 48, \quad T = 8, 10 \text{ or } 20, \quad t = 1900, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1988.$
Their basic equation does not include any other variables, but they also consider a specification with regional dummies (Eq. with reg. dum.). The basic equation assumes that the 48 States share a common per capita level of personal income at steady state, while the second specification allows for regional differences in steady-state levels. Their regressions involve 48 observations and are run for each 20-year or 10-year period between 1880 and 1988. Their results suggest β-convergence at a rate somewhat above 2% a year, but their estimates are not stable across subperiods and vary greatly, from −.0149 to .0431 (for the basic equation). Some instability is expected because of the succession of troubled periods and growth periods over the last century. However, it may also be due to particular observations behaving like outliers and influencing the least-squares estimates. A survey of potential data problems is performed, and the regression diagnostics are summarized in Table 4. They suggest the presence of highly influential observations in all the periods but one. Outliers are clearly identified in the periods 1900-1920, 1940-1950, 1950-1960, 1970-1980 and 1980-1988.
These two effects are probably combined. We ask which part of that variability is really due to business cycles and which part is due only to the non-robustness of least-squares methods. Further, we would like to obtain a stable estimate of the rate of convergence at steady state. For this, we use robust sign-based estimation with $D_S[\beta, (X'X)^{-1}]$.
Table 5. Regressions for personal income across U.S. States, 1880-1988: estimates of β.

             Basic equation                       Equation with regional dummies
Period       SIGN               NLLS***           SIGN                NLLS***
1880-1900     .0012              .0101             .0016               .0224
             [−.0068, .0123]*   [.0058, .0532]**  [−.0123, .0211]     [.0146, .0302]
1900-1920     .0184              .0218             .0163               .0209
             [.0092, .0313]     [.0155, .0281]    [−.0088, .1063]     [.0086, .0332]
1920-1930    −.0147             −.0149            −.0002              −.0122
             [−.0301, .0018]    [−.0249, −.0049]  [−.0463, .0389]     [−.0267, .0023]
1930-1940     .0130              .0141             .0152               .0127
             [.0043, .0234]     [.0082, .0200]    [−.0189, .0582]     [.0027, .0227]
1940-1950     .0364              .0431             .0174               .0373
             [.0291, .0602]     [.0372, .0490]    [.0083, .0620]      [.0314, .0432]
1950-1960     .0195              .0190             .0140               .0202
             [.0084, .0352]     [.0121, .0259]    [−.0044, .0510]     [.0100, .0304]
1960-1970     .0289              .0246             .0230               .0131
             [.0099, .0377]     [.0170, .0322]    [−.0112, .0431]     [.0047, .0215]
1970-1980     .0181              .0198             .0172               .0119
             [.0021, .0346]     [−.0315, .0195]   [−.0131, .0739]     [−.0273, .0173]
1980-1988    −.0081             −.0060            −.0059              −.0050
             [−.0552, .0503]    (.0130)           [−.0472, .1344]     (.0114)

* Projection-based 95% CI.
** Asymptotic 95% CI.
*** Estimates from Barro and Sala-i-Martin (1991).
We consider the following linear equation:

$(1/T)\,\ln(y_{i,t}/y_{i,t-T}) = a + \gamma\,[\ln(y_{i,t-T})] + x_i'\delta + \varepsilon^{t,T}_i,$    (8.3)
where xi’s contain regional dummies when included, and we compute Hodges-Lehmann estimate
for
β
=−(1/T)ln(
γ
T+1)for both specifications. We also provide 95%-level projection-based
CI, asymptotic CI and projection-based p-value functions for the parameter of interest. Results are
presented in Table 5 where Barro and Sala-i-Martin (1991) NLLS results are reported.
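The mapping from the estimated slope γ in (8.3) to the convergence rate β is a one-line transformation; the sketch below applies it, with a purely illustrative value of γ (not taken from Table 5).

```python
import numpy as np

def convergence_rate(gamma_hat, T):
    """Recover beta from the slope on ln(y_{i,t-T}) in (8.3):
    gamma = -(1 - exp(-beta*T)) / T  =>  beta = -(1/T) ln(gamma*T + 1)."""
    return -np.log(gamma_hat * T + 1.0) / T

# Illustrative value (not from Table 5): a slope of -0.0165 over a 20-year
# span corresponds to an annual convergence rate of about 2%.
print(convergence_rate(-0.0165, T=20))   # ~ 0.0200
```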
Sign estimates are more stable than least-squares ones. They vary between −.0147 and .0364, whereas least-squares estimates vary between −.0149 and .0431; the range of the estimates thus shrinks from .0580 to .0511, so that at least 12% of the variability of the least-squares estimates across subperiods is due solely to the non-robustness of least-squares methods. In all cases but two, sign-based estimates are lower (in absolute value) than the NLLS ones. Consequently, we lean toward a lower value of the stable rate of convergence.
In graphs 6(a)-8(f) [see Appendix B], projection-based p-value functions and optimal concentrated sign statistics are presented for each basic equation over the period 1880-1988. The optimal concentrated sign-based statistic reports the minimal value of $D_S$ for a given β (letting a vary). The projection-based p-value function is the maximal simulated p-value for a given β over admissible values of a. These functions enable us to perform tests on β. The 95% projection-based confidence intervals for β presented in Table 5 are obtained by cutting the p-value function with the p = .05