Article

Applications of James-Stein Shrinkage (II): Bias Reduction in Instrumental Variable Estimation

Authors:
Jann Spiess

Abstract

In a two-stage linear regression model with Normal noise, I consider James-Stein type shrinkage in the estimation of the first-stage instrumental variable coefficients. For at least four instrumental variables and a single endogenous regressor, I show that the standard two-stage least-squares estimator is dominated with respect to bias. I construct the dominating estimator by a variant of James-Stein shrinkage in a first-stage high-dimensional Normal-means problem followed by a control-function approach in the second stage; it preserves invariances of the structural instrumental variable equations.
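A minimal numerical sketch of the general idea, not the paper's exact dominating estimator: ordinary first-stage coefficient estimates are shrunk toward zero with a plain positive-part James-Stein factor, and a control-function regression is run in the second stage. The function name, variable names, and the specific shrinkage constant are illustrative assumptions.

```python
import numpy as np

def shrinkage_control_function(y, x, Z):
    """y: outcome (n,); x: single endogenous regressor (n,); Z: instruments (n, k), k >= 4."""
    n, k = Z.shape
    # First stage: OLS of the endogenous regressor on the instruments
    pi_hat, *_ = np.linalg.lstsq(Z, x, rcond=None)
    sigma2 = np.sum((x - Z @ pi_hat) ** 2) / (n - k)   # first-stage error variance
    # James-Stein-type shrinkage of pi_hat toward zero in the implied Normal-means problem
    quad = pi_hat @ (Z.T @ Z) @ pi_hat
    factor = max(0.0, 1.0 - (k - 2) * sigma2 / quad)   # illustrative shrinkage constant
    pi_shrunk = factor * pi_hat
    # Second stage: control-function regression of y on x and the shrunken first-stage residual
    v_hat = x - Z @ pi_shrunk
    X2 = np.column_stack([np.ones(n), x, v_hat])
    beta, *_ = np.linalg.lstsq(X2, y, rcond=None)
    return beta[1]                                     # coefficient on the endogenous regressor
```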

... In a companion paper (Spiess, 2017), I show how shrinkage in at least four instrumental variables in a canonical structural form provides a consistent bias improvement over the two-stage least-squares estimator. Together, these results suggest different roles of overfitting in control and instrumental variable coefficients, respectively: while overfitting to control variables induces variance, overfitting to instrumental variables in the first stage of a two-stage least-squares procedure induces bias. ...
Article
In a linear regression model with homoscedastic Normal noise, I consider James-Stein type shrinkage in the estimation of nuisance parameters associated with control variables. For at least three control variables and exogenous treatment, I show that the standard least-squares estimator is dominated with respect to squared-error loss in the treatment effect even among unbiased estimators and even when the target parameter is low-dimensional. I construct the dominating estimator by a variant of James-Stein shrinkage in an appropriate high-dimensional Normal-means problem; it can be understood as an invariant generalized Bayes estimator with an uninformative (improper) Jeffreys prior in the target parameter.
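An illustrative sketch of the idea described here, assuming a single exogenous treatment and a plain positive-part James-Stein factor on the control coefficients; this is not the paper's invariant generalized Bayes construction, and all names are made up for the example.

```python
import numpy as np

def shrunk_treatment_effect(y, d, W):
    """y: outcome (n,); d: exogenous treatment (n,); W: controls (n, p), p >= 3."""
    n, p = W.shape
    X = np.column_stack([d, W])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - p - 1)               # residual variance estimate
    gamma = coef[1:]                                   # nuisance coefficients on the controls
    # James-Stein-type shrinkage of the nuisance coefficients toward zero
    quad = gamma @ (W.T @ W) @ gamma
    factor = max(0.0, 1.0 - (p - 2) * sigma2 / quad)
    # Re-estimate the treatment coefficient with the shrunken control contribution removed
    y_tilde = y - W @ (factor * gamma)
    return (d @ y_tilde) / (d @ d)
```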
Article
Maasoumi (1978) proposed a Stein-like estimator for simultaneous equations and showed that his Stein shrinkage estimator has bounded finite-sample risk, unlike the 3SLS estimator. We revisit his proposal by investigating Stein-like shrinkage in the context of 2SLS estimation of a structural parameter. Our estimator follows Maasoumi (1978) in taking a weighted average of the 2SLS and OLS estimators, with the weight depending inversely on the Hausman (1978) statistic for exogeneity. Using a local-to-exogenous asymptotic theory, we derive the asymptotic distribution of the Stein estimator and calculate its asymptotic risk. We find that if the number of endogenous variables exceeds two, then the shrinkage estimator has strictly smaller risk than the 2SLS estimator, extending the classic result of James and Stein (1961). In a simple simulation experiment, we show that the shrinkage estimator has substantially reduced finite-sample median squared error relative to the standard 2SLS estimator.
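A rough sketch of the kind of combination estimator described above: a weighted average of OLS and 2SLS with the weight on OLS inversely related to a Hausman-type statistic. The truncation at one, the constant tau, and the variable names are assumptions for illustration, not the authors' exact definitions.

```python
import numpy as np

def stein_combined_2sls(y, X, Z, tau):
    """y: outcome (n,); X: endogenous regressors (n, m); Z: instruments (n, k), k >= m."""
    n = len(y)
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first-stage fitted values
    b_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
    # Hausman-type contrast between the 2SLS and OLS estimates
    e = y - X @ b_2sls
    s2 = e @ e / n
    V_diff = s2 * (np.linalg.inv(X_hat.T @ X_hat) - np.linalg.inv(X.T @ X))
    diff = b_2sls - b_ols
    H = diff @ np.linalg.solve(V_diff, diff)
    w = min(1.0, tau / H)                              # weight on OLS shrinks as H grows
    return (1.0 - w) * b_2sls + w * b_ols
```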
Conference Paper
In this paper, closed-form expressions are derived for the expectation of the logarithm and for the expectation of the n-th power of the reciprocal value (inverse moments) of a noncentral chi-square random variable with an even number of degrees of freedom. It is shown that these expectations can be expressed by a family of continuous functions g_m(·) and that these families have nice properties (monotonicity, convexity, etc.). Moreover, some tight upper and lower bounds are derived that are helpful in situations where the closed-form expression of g_m(·) is too complex for further analysis. As an example of the applicability of these results, in the second part of this paper an independent and identically distributed (IID) Gaussian multiple-input multiple-output (MIMO) fading channel with a scalar line-of-sight component is analyzed. Some new expressions are derived for the fading number that describes the asymptotic channel capacity at high signal-to-noise ratios (SNR).
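A quick numerical illustration of the quantity studied, assuming SciPy's noncentral chi-square distribution: the first inverse moment E[1/X] for an even number of degrees of freedom, computed by quadrature and by Monte Carlo. The paper's closed-form g_m(·) family is not reproduced here, and the parameter values are arbitrary.

```python
import numpy as np
from scipy import integrate, stats

df, nc = 6, 2.5                                        # even degrees of freedom, noncentrality
dist = stats.ncx2(df, nc)
by_quadrature, _ = integrate.quad(lambda t: dist.pdf(t) / t, 0.0, np.inf)
by_monte_carlo = np.mean(1.0 / dist.rvs(size=200_000, random_state=0))
print(by_quadrature, by_monte_carlo)                   # the two estimates should agree closely
```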
Article
The question of the existence of negative moments of a continuous probability density function is explored. A sufficient condition for the existence of the first negative moment is given. The condition is easy to verify, as it involves limits rather than integrals. An example is given, however, that shows that this simple condition is not necessary for the existence of the first negative moment. The delicacy of the characterization of existence is explored further with some results concerning the existence of moments surrounding the first negative moment.
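A standard pair of examples (mine, not the paper's) of how delicate existence is near the origin: the first negative moment fails to exist for the standard exponential density, yet is finite for a chi-square density with more than two degrees of freedom,

\[
\int_0^\infty \frac{e^{-x}}{x}\,dx = \infty,
\qquad
\mathbb{E}\!\left[\frac{1}{\chi^2_k}\right]
= \int_0^\infty \frac{1}{x}\,\frac{x^{k/2-1}e^{-x/2}}{2^{k/2}\Gamma(k/2)}\,dx
= \frac{1}{k-2} \quad (k>2).
\]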
Article
It has long been customary to measure the adequacy of an estimator by the smallness of its mean squared error. The least squares estimators were studied by Gauss and by other authors later in the nineteenth century. A proof that the best unbiased estimator of a linear function of the means of a set of observed random variables is the least squares estimator was given by Markov [12], a modified version of whose proof is given by David and Neyman [4]. A slightly more general theorem is given by Aitken [1]. Fisher [5] indicated that for large samples the maximum likelihood estimator approximately minimizes the mean squared error when compared with other reasonable estimators. This paper will be concerned with optimum properties or failure of optimum properties of the natural estimator in certain special problems with the risk usually measured by the mean squared error or, in the case of several parameters, by a quadratic function of the estimators. We shall first mention some recent papers on this subject and then give some results, mostly unpublished, in greater detail.
Article
This paper applies some general concepts in decision theory to a simple instrumental variables model. There are two endogenous variables linked by a single structural equation; k of the exogenous variables are excluded from this structural equation and provide the instrumental variables (IV). The reduced-form distribution of the endogenous variables conditional on the exogenous variables corresponds to independent draws from a bivariate normal distribution with linear regression functions and a known covariance matrix. A canonical form of the model has parameter vector (ρ, φ, ω), where φ is the parameter of interest and is normalized to be a point on the unit circle. The reduced-form coefficients on the instrumental variables are split into a scalar parameter ρ and a parameter vector ω, which is normalized to be a point on the (k−1)-dimensional unit sphere; ρ measures the strength of the association between the endogenous variables and the instrumental variables, and ω is a measure of direction. A prior distribution is introduced for the IV model. The parameters φ, ρ, and ω are treated as independent random variables. The distribution for φ is uniform on the unit circle; the distribution for ω is uniform on the unit sphere with dimension k−1. These choices arise from the solution of a minimax problem. The prior for ρ is left general. It turns out that given any positive value for ρ, the Bayes estimator of φ does not depend on ρ; it equals the maximum-likelihood estimator. This Bayes estimator has constant risk; because it minimizes average risk with respect to a proper prior, it is minimax.
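One way to write a reduced form consistent with this description (my reconstruction from the abstract; the paper's exact normalization may differ): with z_i the vector of k instruments and Ω a known covariance matrix,

\[
\begin{pmatrix} y_{1i} \\ y_{2i} \end{pmatrix} \Big|\, z_i
\;\sim\; N\!\left(\Pi' z_i,\; \Omega\right),
\qquad
\Pi = \rho\,\omega\,\varphi', \quad
\rho \ge 0,\ \ \omega \in \mathbb{S}^{k-1},\ \ \varphi \in \mathbb{S}^{1},
\]

so that ρ captures instrument strength, ω the direction in instrument space, and the unit-circle point φ the structural parameter of interest.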
Article
Estimation of the means of independent normal random variables is considered, using sum of squared errors as loss. An unbiased estimate of risk is obtained for an arbitrary estimate, and certain special classes of estimates are then discussed. The results are applied to smoothing by use of moving averages and to trimmed analogs of the James-Stein estimate. A suggestion is made for calculating approximate confidence sets for the mean vector centered at an arbitrary estimate.
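A small sketch of the central device applied to the plain James-Stein estimate in the unit-variance Normal-means problem: Stein's unbiased estimate of its risk, compared with the realized squared-error loss on one simulated draw. The simulation settings are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 50
theta = rng.normal(0.0, 0.5, size=p)              # true means
x = theta + rng.normal(size=p)                    # X ~ N(theta, I)

s = x @ x
js = (1.0 - (p - 2) / s) * x                      # plain James-Stein estimate
sure = p - (p - 2) ** 2 / s                       # Stein's unbiased estimate of its risk
loss = np.sum((js - theta) ** 2)                  # realized squared-error loss
print(sure, loss)
```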