Article

Recovering Latent Variables by Matching

Taylor & Francis
Journal of the American Statistical Association

Abstract

We propose an optimal-transport-based matching method to nonparametrically estimate linear models with independent latent variables. The method consists in generating pseudo-observations from the latent variables, so that the Euclidean distance between the model’s predictions and their matched counterparts in the data is minimized. We show that our nonparametric estimator is consistent, and we document that it performs well in simulated data. We apply this method to study the cyclicality of permanent and transitory income shocks in the Panel Study of Income Dynamics. We find that the dispersion of income shocks is approximately acyclical, whereas the skewness of permanent shocks is procyclical. By comparison, we find that the dispersion and skewness of shocks to hourly wages vary little with the business cycle.
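The abstract describes the estimator only verbally. Below is a minimal sketch of the matching idea in a toy one-factor model Y = X + U: simulate pseudo-observations for a candidate parameter, match them one-to-one to the data by minimizing the total squared Euclidean distance, and pick the parameter that makes the matched distance smallest. The toy model, the parameter sigma_x, and the use of scipy's assignment solver are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): estimate a latent scale parameter in
# Y = X + U by matching simulated pseudo-observations to the data.
import numpy as np
from scipy.optimize import linear_sum_assignment, minimize_scalar

rng = np.random.default_rng(0)

n = 300
true_sigma_x = 2.0
y_obs = true_sigma_x * rng.standard_normal(n) + 0.5 * rng.standard_normal(n)

def matched_distance(sigma_x):
    """Simulate pseudo-observations for a candidate parameter and return the
    minimal total squared Euclidean distance over one-to-one matchings."""
    sim_rng = np.random.default_rng(1)          # fixed draws: common random numbers
    y_sim = sigma_x * sim_rng.standard_normal(n) + 0.5 * sim_rng.standard_normal(n)
    cost = (y_obs[:, None] - y_sim[None, :]) ** 2   # n x n pairwise cost matrix
    rows, cols = linear_sum_assignment(cost)        # optimal matching (Hungarian algorithm)
    return cost[rows, cols].sum()

# Choose the parameter whose simulated output is closest to the data once matched.
res = minimize_scalar(matched_distance, bounds=(0.1, 5.0), method="bounded")
print("estimated sigma_x:", res.x)
```

In this scalar example the optimal matching could equivalently be obtained by sorting both samples; the assignment solver is shown because it extends directly to multivariate model predictions.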


... Our setting is distinct from other literatures on heterogeneous effects for matrices, such as that on vector quantiles (see for instance Carlier et al. 2016; Galichon 2016; Arellano and Bonhomme 2021; Fan and Henry 2022). These linear tools are not generally appropriate for the kinds of quadratic problems that arise in a double randomized experiment. ...
Preprint
Full-text available
We are interested in the distribution of treatment effects for an experiment where units are randomized to treatment but outcomes are measured for pairs of units. For example, we might measure risk sharing links between households enrolled in a microfinance program, employment relationships between workers and firms exposed to a trade shock, or bids from bidders to items assigned to an auction format. Such a double randomized experimental design may be appropriate when there are social interactions, market externalities, or other spillovers across units assigned to the same treatment. Or it may describe a natural or quasi-experiment given to the researcher. In this paper, we propose a new empirical strategy based on comparing the eigenvalues of the outcome matrices associated with each treatment. Our proposal is based on a new matrix analog of the Fréchet–Hoeffding bounds that play a key role in the standard theory. We first use this result to bound the distribution of treatment effects. We then propose a new matrix analog of quantile treatment effects based on the difference in the eigenvalues. We call this analog spectral treatment effects.
... Optimal transport has a long history in economics and operations research (see Kantorovitch, 1958 for an early treatment). Furthermore, optimal transport has recently witnessed renewed interest in economics and applied econometrics, including the analysis of identification of dynamic discrete-choice models (Chiong, Galichon and Shum, 2016), in vector quantile regression (Carlier, Chernozhukov and Galichon, 2016), in empirical matching models (Galichon, Kominers and Weber, 2018), and in latent variables (Arellano and Bonhomme, 2019). To the best of our knowledge, this is the first principled use of optimal transport for reduced form analysis in economics, notwithstanding matching applications. ...
Preprint
Full-text available
Black markets can reduce the effects of distortionary regulations by reallocating scarce resources toward consumers who value them most. The illegal nature of black markets, however, creates transaction costs that reduce the gains from trade. We take a partial identification approach to infer gains from trade and transaction costs in the black market for Beijing car license plates, which emerged following their recent rationing. We find that at least 11% of issued license plates are illegally traded. The estimated transaction costs suggest severe market frictions: between 61% and 82% of the realized gains from trade are lost to transaction costs.
Article
We consider a multivariate system Y_t = A X_t, where the unobserved components X_t are independent AR(1) processes and the number of sources is greater than the number of observed outputs. We show that the mixing matrix A, the AR(1) coefficients and distributions of X_t can be identified (up to scale factors of X_t), which solves the dynamic deconvolution problem. The proof is constructive and allows us to introduce simple consistent estimators of all unknown scalar and functional parameters of the model. The approach is illustrated by an estimation and identification of the dynamics of unobserved short- and long-run components in a time series. Applications to causal models with structural innovations are also discussed, such as the identification in error-in-variables models and causal mediation models.
Article
Full-text available
This paper presents identification and estimation results for a flexible state space model. Our modification of the canonical model allows the permanent component to follow a unit root process and the transitory component to follow a semiparametric model of a higher-order autoregressive-moving-average (ARMA) process. Using panel data of observed earnings, we establish identification of the nonparametric joint distributions for each of the permanent and transitory components over time. We apply the identification and estimation method to the earnings dynamics of U.S. men using the Panel Study of Income Dynamics (PSID). The results show that the marginal distributions of permanent and transitory earnings components are more dispersed, more skewed, and have fatter tails than the normal, and that earnings mobility is much lower than for the normal. We also find strong evidence for the existence of higher-order ARMA processes in the transitory component, which lead to very different estimates of the distributions of, and earnings mobility in, the permanent component, implying that misspecification of the process for transitory earnings can affect estimated distributions of the permanent component and estimated earnings dynamics of that component. Thus our flexible model implies earnings dynamics for U.S. men different from much of the prior literature.
Article
Full-text available
We study unsupervised generative modeling in terms of the optimal transport (OT) problem between the true (but unknown) data distribution P_X and the latent variable model distribution P_G. We show that the OT problem can be equivalently written in terms of probabilistic encoders, which are constrained to match the posterior and prior distributions over the latent space. When relaxed, this constrained optimization problem leads to a penalized optimal transport (POT) objective, which can be efficiently minimized using stochastic gradient descent by sampling from P_X and P_G. We show that POT for the 2-Wasserstein distance coincides with the objective heuristically employed in adversarial auto-encoders (AAE) (Makhzani et al., 2016), which provides the first theoretical justification for AAEs known to the authors. We also compare POT to other popular techniques like variational auto-encoders (VAE) (Kingma and Welling, 2014). Our theoretical results include (a) a better understanding of the commonly observed blurriness of images generated by VAEs, and (b) establishing duality between Wasserstein GAN (Arjovsky and Bottou, 2017) and POT for the 1-Wasserstein distance.
Article
Full-text available
We study the problem of nonparametric estimation under Lp-loss, p ∈ [1, ∞), in the framework of the convolution structure density model on R^d. This observation scheme is a generalization of two classical statistical models, namely density estimation under direct and indirect observations. In Part I the original pointwise selection rule from a family of "kernel-type" estimators is proposed. For the selected estimator, we prove an Lp-norm oracle inequality and several of its consequences. In Part II the problem of adaptive minimax estimation under Lp-loss over the scale of anisotropic Nikol'skii classes is addressed. We fully characterize the behavior of the minimax risk for different relationships between regularity parameters and norm indexes in the definitions of the functional class and of the risk. We prove that the selection rule proposed in Part I leads to the construction of an optimally or nearly optimally (up to a logarithmic factor) adaptive estimator. AMS 2000 subject classifications: 62G05, 62G20.
Article
Full-text available
This paper studies linear factor models that have arbitrarily dependent factors. Assuming that the coefficients are known and that their matrix representation satisfies rank conditions, we identify the nonparametric joint distribution of the unobserved factors using first- and then second-order partial derivatives of the log characteristic function of the observed variables. In conjunction with these identification strategies, the mean and variance of the vector of factors are identified. The main result provides necessary and sufficient conditions for identification of the joint distribution of the factors. In an illustrative example, we show identification of an earnings dynamics model with a subset of arbitrarily dependent income shocks. Closed-form formulas lead to estimators that converge uniformly and, despite being based on inverse Fourier transforms, have tight confidence bands around their theoretical counterparts in Monte Carlo simulations.
Article
Full-text available
In purely generative models, one can simulate data given parameters but not necessarily evaluate the likelihood. We use Wasserstein distances between empirical distributions of observed data and empirical distributions of synthetic data drawn from such models to estimate their parameters. Previous interest in the Wasserstein distance for statistical inference has been mainly theoretical, due to computational limitations. Thanks to recent advances in numerical transport, the computation of these distances has become feasible, up to controllable approximation errors. We leverage these advances to propose point estimators and quasi-Bayesian distributions for parameter inference, first for independent data. For dependent data, we extend the approach by using delay reconstruction and residual reconstruction techniques. For large data sets, we propose an alternative distance using the Hilbert space-filling curve, whose computation scales as n log n, where n is the size of the data. We provide a theoretical study of the proposed estimators, and adaptive Monte Carlo algorithms to approximate them. The approach is illustrated on four examples: a quantile g-and-k distribution, a toggle switch model from systems biology, a Lotka-Volterra model for plankton population sizes, and a Lévy-driven stochastic volatility model.
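As a concrete illustration of minimum-Wasserstein point estimation for a purely generative model, here is a minimal one-dimensional sketch: with equal sample sizes, the squared 2-Wasserstein distance between empirical distributions is the mean squared gap between order statistics. The Gumbel model, the fixed common random numbers, and the Nelder-Mead optimizer are assumptions made for this example, not choices from the paper.

```python
# Minimal sketch (assumes 1-D data and equal sample sizes): minimum-Wasserstein
# point estimation for a simulable model by comparing sorted samples.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y_obs = np.sort(rng.gumbel(loc=1.0, scale=2.0, size=500))   # observed data, sorted once

u = rng.uniform(size=500)                    # fixed draws reused for every candidate parameter

def w2_to_data(params):
    loc, log_scale = params
    sim = loc - np.exp(log_scale) * np.log(-np.log(u))      # Gumbel quantile transform
    # With equal sample sizes, the squared W2 distance between empirical
    # distributions equals the mean squared gap between order statistics.
    return np.mean((np.sort(sim) - y_obs) ** 2)

est = minimize(w2_to_data, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
print("estimated loc, scale:", est.x[0], np.exp(est.x[1]))
```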
Article
Full-text available
We investigate the data-driven choice of the cutoff parameter in density deconvolution problems with unknown error distribution. To make the target density identifiable, one has to assume that some additional information on the noise is available. We consider two different models: the framework where some additional sample of the pure noise is available, as well as the model of repeated measurements, where the contaminated random variables of interest can be observed repeatedly, with independent errors. We introduce spectral cutoff estimators and present upper risk bounds. The focus of this work lies on the optimal choice of the bandwidth by penalization strategies, leading to non-asymptotic oracle bounds.
Article
Full-text available
Politis and Romano have put forth a general subsampling methodology for the construction of large-sample confidence regions for a general unknown parameter θ associated with the probability distribution generating the stationary sequence X_1, …, X_n. The subsampling methodology hinges on approximating the large-sample distribution of a statistic T_n = T_n(X_1, …, X_n) that is consistent for θ at some known rate τ_n. Although subsampling has been shown to yield confidence regions for θ of asymptotically correct coverage under very weak assumptions, the applicability of the methodology as it has been presented so far is limited if the rate of convergence τ_n happens to be unknown or intractable in a particular setting. In this article we show how it is possible to circumvent this limitation by (a) using the subsampling methodology to derive a consistent estimator of the rate τ_n, and (b) using the estimated rate to construct asymptotically correct confidence regions for θ based on subsampling.
Article
Full-text available
In this paper, we consider a multidimensional convolution model for which we provide adaptive anisotropic kernel estimators of a signal density f measured with additive error. For this, we generalize Fan's (1991) estimators to the multidimensional setting and use a bandwidth selection device in the spirit of Goldenschluger and Lepski's (2011) proposal for density estimation without noise. We consider first the pointwise setting and then study the integrated risk. Our estimators depend on an automatically selected random bandwidth. We assume both ordinary smooth and supersmooth components for the measurement errors, which have known density. We also consider both anisotropic Hölder and Sobolev classes for f. We provide non-asymptotic risk bounds and asymptotic rates for the resulting data-driven estimator, which is proved to be adaptive. We provide an illustrative simulation study, involving the use of Fast Fourier Transform algorithms. We conclude with a proposal for extending the method to the case of unknown noise density, when a preliminary pure-noise sample is available.
Article
Full-text available
The subject of this paper is the problem of nonparametric estimation of a continuous distribution function from observations with measurement errors. We study the minimax complexity of this problem when the unknown distribution has a density belonging to a Sobolev class and the error density is ordinary smooth. We develop rate-optimal estimators based on direct inversion of the empirical characteristic function. We also derive minimax affine estimators of the distribution function, which are given by an explicit convex optimization problem. Adaptive versions of these estimators are proposed, and some numerical results demonstrating good practical behavior of the developed procedures are presented.
Article
Full-text available
Random coefficient regression models are important in representing linear models with heteroscedastic errors and in unifying the study of classical fixed effects and random effects linear models. For prediction intervals and for bootstrapping in random coefficient regressions, it is necessary to estimate the distributions of the random coefficients consistently. We show that this is often possible and provide practical representative estimators of these distributions.
Article
Full-text available
Deconvolution problems arise in a variety of situations in statistics. An interesting problem is to estimate the density f of a random variable X based on n i.i.d. observations from Y = X + ε, where ε is a measurement error with a known distribution. In this paper, the effect of errors in variables in nonparametric deconvolution is examined. Insights are gained by showing that the difficulty of deconvolution depends on the smoothness of the error distribution: the smoother, the harder. In fact, there are two types of optimal rates of convergence according to whether the error distribution is ordinary smooth or supersmooth. It is shown that optimal rates of convergence can be achieved by deconvolution kernel density estimators.
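A minimal sketch of a deconvolution kernel density estimator of the kind discussed above, assuming a known Laplace error distribution (an ordinary smooth case) and a sinc kernel; the bandwidth, grids, and simulated data are illustrative choices, not taken from the paper.

```python
# Minimal sketch (not from the paper): deconvolution kernel density estimation
# with known Laplace(0, b) measurement error, via numerical Fourier inversion.
import numpy as np

rng = np.random.default_rng(0)
n, b, h = 2000, 0.5, 0.35                     # sample size, error scale, bandwidth
x_latent = rng.normal(0.0, 1.0, n)            # latent variable whose density is targeted
y = x_latent + rng.laplace(0.0, b, n)         # contaminated observations

t = np.linspace(-1.0 / h, 1.0 / h, 801)       # sinc kernel: Fourier transform is 1{|th| <= 1}
dt = t[1] - t[0]
phi_hat = np.exp(1j * np.outer(t, y)).mean(axis=1)   # empirical characteristic function of Y
phi_err = 1.0 / (1.0 + (b * t) ** 2)          # known Laplace(0, b) characteristic function

grid = np.linspace(-4.0, 4.0, 161)
# f_hat(x) = (1 / 2pi) * sum over t of exp(-i t x) * phi_hat(t) / phi_err(t) * dt
f_hat = np.real(np.exp(-1j * np.outer(grid, t)) @ (phi_hat / phi_err)) * dt / (2 * np.pi)
print("estimated density at 0:", f_hat[np.argmin(np.abs(grid))])
```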
Article
Full-text available
Is individual labor income more risky in recessions? This is a difficult question to answer because existing panel data sets are so short. To address this problem, we develop a generalized method of moments estimator that conditions on the macroeconomic history that each member of the panel has experienced. Variation in the cross-sectional variance between households with differing macroeconomic histories allows us to incorporate business cycle information dating back to 1930, even though our data do not begin until 1968. We implement this estimator using household-level labor earnings data from the Panel Study of Income Dynamics. We estimate that idiosyncratic risk is (i) highly persistent, with an annual autocorrelation coefficient of 0.95, and (ii) strongly countercyclical, with a conditional standard deviation that increases by 75 percent (from 0.12 to 0.21) as the macroeconomy moves from peak to trough.
Article
Full-text available
This paper examines the link between income and consumption inequality. We create panel data on consumption for the Panel Study of Income Dynamics using an imputation procedure based on food demand estimates from the Consumer Expenditure Survey. We document a disjuncture between income and consumption inequality over the 1980s and show that it can be explained by changes in the persistence of income shocks. We find some partial insurance of permanent shocks, especially for the college educated and those near retirement. We find full insurance of transitory shocks except among poor households. Taxes, transfers, and family labor supply play an important role in insuring permanent shocks. (JEL D12, D31, D91, E21)
Article
We estimate the causal effect of each county in the United States on children’s incomes in adulthood. We first estimate a fixed effects model that is identified by analyzing families who move across counties with children of different ages. We then use these fixed effect estimates to (i) quantify how much places matter for intergenerational mobility, (ii) construct forecasts of the causal effect of growing up in each county that can be used to guide families seeking to move to opportunity, and (iii) characterize which types of areas produce better outcomes. For children growing up in low-income families, each year of childhood exposure to a one standard deviation (std. dev.) better county increases income in adulthood by 0.5%. There is substantial variation in counties’ causal effects even within metro areas. Counties with less concentrated poverty, less income inequality, better schools, a larger share of two-parent families, and lower crime rates tend to produce better outcomes for children in poor families. Boys’ outcomes vary more across areas than girls’ outcomes, and boys have especially negative outcomes in highly segregated areas. Areas that generate better outcomes have higher house prices on average, but our approach uncovers many “opportunity bargains”—places that generate good outcomes but are not very expensive.
Book
Optimal transport theory is used widely to solve problems in mathematics and some areas of the sciences, but it can also be used to understand a range of problems in applied economics, such as the matching between job seekers and jobs, the determinants of real estate prices, and the formation of matrimonial unions. This is the first text to develop clear applications of optimal transport to economic modeling, statistics, and econometrics. It covers the basic results of the theory as well as their relations to linear programming, network flow problems, convex analysis, and computational geometry. Emphasizing computational methods, it also includes programming examples that provide details on implementation. Applications include discrete choice models, models of differential demand, and quantile-based statistical estimation methods, as well as asset pricing models. The book also features numerous exercises throughout that help to develop mathematical agility, deepen computational skills, and strengthen economic intuition.
Article
Isotonic regression is a standard problem in shape-constrained estimation where the goal is to estimate an unknown non-decreasing regression function f from independent pairs (x_i, y_i) where E[y_i] = f(x_i), i = 1, …, n. While this problem is well understood both statistically and computationally, much less is known about its uncoupled counterpart, where one is given only the unordered sets {x_1, …, x_n} and {y_1, …, y_n}. In this work, we leverage tools from optimal transport theory to derive minimax rates under weak moment conditions on y_i and to give an efficient algorithm achieving optimal rates. Both upper and lower bounds employ moment-matching arguments that are also pertinent to learning mixtures of distributions and deconvolution.
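As a point of contrast with the procedure studied above, here is a naive uncoupled estimator obtained by quantile matching: sort both unordered sets and read the increasing rearrangement of the responses as fitted values at the ordered covariates. This baseline is only for illustration; it is not the minimax deconvolution-based estimator of the paper.

```python
# Naive quantile-matching baseline for uncoupled isotonic regression (illustrative;
# not the deconvolution-based procedure described in the abstract).
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(0, 1, n)
y = np.exp(2 * x) + 0.3 * rng.standard_normal(n)   # monotone signal plus noise

x_sorted = np.sort(x)                  # only the unordered sets are used from here on
f_hat = np.sort(y)                     # increasing rearrangement of the responses
# Because f is non-decreasing, matching order statistics pairs each x with an
# estimate of f at roughly the same quantile (biased when the noise is large).
print("estimate near x = 0.5:", f_hat[np.searchsorted(x_sorted, 0.5)])
```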
Article
Recent works have concluded that labor earnings dynamics exhibit non-Gaussian and nonlinear features. We argue in this paper that this finding is mainly due to volatility in working time. Using a non-parametric approach, we find from French data that changes in labor earnings exhibit strong asymmetry and high peakedness. However, after decomposing labor earnings growth into growth in wages and working time, deviations from Gaussianity stem from changes in working time. The nonlinearity of earnings dynamics is also mostly driven by working time dynamics at the extensive margin.
Article
Optimal transport (OT) theory can be informally described using the words of the French mathematician Gaspard Monge (1746–1818): A worker with a shovel in hand has to move a large pile of sand lying on a construction site. The goal of the worker is to erect with all that sand a target pile with a prescribed shape (for example, that of a giant sand castle). Naturally, the worker wishes to minimize her total effort, quantified for instance as the total distance or time spent carrying shovelfuls of sand. Mathematicians interested in OT cast that problem as that of comparing two probability distributions—two different piles of sand of the same volume. They consider all of the many possible ways to morph, transport or reshape the first pile into the second, and associate a “global” cost to every such transport, using the “local” consideration of how much it costs to move a grain of sand from one place to another. Mathematicians are interested in the properties of that least costly transport, as well as in its efficient computation. That smallest cost not only defines a distance between distributions, but it also entails a rich geometric structure on the space of probability distributions. That structure is canonical in the sense that it borrows key geometric properties of the underlying “ground” space on which these distributions are defined. For instance, when the underlying space is Euclidean, key concepts such as interpolation, barycenters, convexity or gradients of functions extend naturally to the space of distributions endowed with an OT geometry. OT has been (re)discovered in many settings and under different forms, giving it a rich history. While Monge’s seminal work was motivated by an engineering problem, Tolstoi in the 1920s and Hitchcock, Kantorovich and Koopmans in the 1940s established its significance to logistics and economics. Dantzig solved it numerically in 1949 within the framework of linear programming, giving OT a firm footing in optimization. OT was later revisited by analysts in the 1990s, notably Brenier, while also gaining fame in computer vision under the name of earth mover’s distances. Recent years have witnessed yet another revolution in the spread of OT, thanks to the emergence of approximate solvers that can scale to large problem dimensions. As a consequence, OT is being increasingly used to unlock various problems in imaging sciences (such as color or texture processing), graphics (for shape manipulation) or machine learning (for regression, classification and generative modeling). This paper reviews OT with a bias toward numerical methods, and covers the theoretical properties of OT that can guide the design of new algorithms. We focus in particular on the recent wave of efficient algorithms that have helped OT find relevance in data sciences. We give a prominent place to the many generalizations of OT that have been proposed in but a few years, and connect them with related approaches originating from statistical inference, kernel methods and information theory. All of the figures can be reproduced using code made available on a companion website. This website hosts the book project Computational Optimal Transport. You will also find slides and computational resources.
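For concreteness, here is a minimal sketch of the Sinkhorn iterations that underlie many of the approximate solvers mentioned in this review; the toy histograms, the regularization strength eps, and the fixed iteration count are assumptions made for illustration.

```python
# Minimal sketch of entropically regularized OT via Sinkhorn iterations
# (illustrative inputs and settings, not from the review itself).
import numpy as np

def sinkhorn(a, b, cost, eps=0.05, n_iter=500):
    """Return the regularized transport plan between histograms a and b."""
    K = np.exp(-cost / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                    # alternate scaling updates
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]       # transport plan P = diag(u) K diag(v)

# Two small histograms on a 1-D grid with squared-distance cost.
x = np.linspace(0, 1, 50)
a = np.exp(-((x - 0.3) ** 2) / 0.01); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.02); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2
P = sinkhorn(a, b, C)
print("regularized transport cost:", (P * C).sum())
```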
Article
Compactness is a widely used assumption in econometrics. In this article, we gather and review general compactness results for many commonly used parameter spaces in nonparametric estimation, and we provide several new results. We consider three kinds of functions: (1) functions with bounded domains which satisfy standard norm bounds, (2) functions with bounded domains which do not satisfy standard norm bounds, and (3) functions with unbounded domains. In all three cases, we provide two kinds of results, compact embedding and closedness, which together allow one to show that parameter spaces defined by a ||·||_s-norm bound are compact under a norm ||·||_c. We illustrate how the choice of norms affects the parameter space, the strength of the conclusions, as well as other regularity conditions in two common settings: nonparametric mean regression and nonparametric instrumental variables estimation.
Article
We study how the distribution of earnings growth evolves over the business cycle in Italy. We distinguish between two sources of annual earnings growth: changes in employment time (number of weeks of employment within a year) and changes in weekly earnings. Changes in employment time generate the tails of the earnings growth distribution, and account for its procyclical skewness. In contrast, the distribution of weekly earnings growth is close to symmetric and stable over the cycle. This suggests that studies of earnings risk should carefully model the employment margin to avoid erroneous conclusions on the nature and magnitude of risks underlying individual earnings. We show that the combination of simple employment and wage processes is enough to capture the complex features of the earnings growth distribution.
Article
We give a statistical interpretation of entropic optimal transport by showing that performing maximum-likelihood estimation for Gaussian deconvolution corresponds to calculating a projection with respect to the entropic optimal transport distance. This structural result gives theoretical support for the wide adoption of these tools in the machine learning community.
Article
This paper considers a dynamic panel model where a latent state variable follows a unit root process with nonparametric heteroskedasticity. We develop constructive nonparametric identification and estimation of the skedastic function. Applying this method to the Panel Study of Income Dynamics (PSID) in the framework of earnings dynamics, we find that workers with lower pre-recession permanent earnings had higher earnings risk during the three most recent recessions.
Article
The ability to compare two degenerate probability distributions (i.e. two probability distributions supported on two distinct low-dimensional manifolds living in a much higher-dimensional space) is a crucial problem arising in the estimation of generative models for high-dimensional observations such as those arising in computer vision or natural language. It is known that optimal transport metrics can represent a cure for this problem, since they were specifically designed as an alternative to information divergences to handle such problematic scenarios. Unfortunately, training generative machines using OT raises formidable computational and statistical challenges, because of (i) the computational burden of evaluating OT losses, (ii) the instability and lack of smoothness of these losses, and (iii) the difficulty of estimating these losses and their gradients robustly in high dimension. This paper presents the first tractable computational method to train large scale generative models using an optimal transport loss, and tackles these issues by relying on two key ideas: (a) entropic smoothing, which turns the original OT loss into one that can be computed using Sinkhorn fixed point iterations; (b) algorithmic (automatic) differentiation of these iterations. These two approximations result in a robust and differentiable approximation of the OT loss with streamlined GPU execution. The resulting computational architecture nicely complements standard deep network generative models by a stack of extra layers implementing the loss function.
Article
We develop a new quantile-based panel data framework to study the nature of income persistence and the transmission of income shocks to consumption. Log-earnings are the sum of a general Markovian persistent component and a transitory innovation. The persistence of past shocks to earnings is allowed to vary according to the size and sign of the current shock. Consumption is modeled as an age-dependent nonlinear function of assets, unobservable tastes, and the two earnings components. We establish the nonparametric identification of the nonlinear earnings process and of the consumption policy rule. Exploiting the enhanced consumption and asset data in recent waves of the Panel Study of Income Dynamics, we find that the earnings process features nonlinear persistence and conditional skewness. We confirm these results using population register data from Norway. We then show that the impact of earnings shocks varies substantially across earnings histories, and that this nonlinearity drives heterogeneous consumption responses. The framework provides new empirical measures of partial insurance in which the transmission of income shocks to consumption varies systematically with assets, the level of the shock, and the history of past shocks.
Article
Empirical Bayes methods for Gaussian and binomial compound decision problems involving longitudinal data are considered. A recent convex optimization reformulation of the nonparametric maximum likelihood estimator of Kiefer and Wolfowitz (Annals of Mathematical Statistics 1956; 27: 887-906) is employed to construct nonparametric Bayes rules for compound decisions. The methods are illustrated with an application to predict baseball batting averages, and the age profile of batting performance. An important aspect of the empirical application is the general bivariate specification of the distribution of heterogeneous location and scale effects for players that exhibits a weak positive association between location and scale attributes. Prediction of players' batting averages for 2012 based on performance in the prior decade using the proposed methods shows substantially improved performance over more naive methods with more restrictive treatment of unobserved heterogeneity. Comparisons are also made with nonparametric Bayesian methods based on Dirichlet process priors, which can be viewed as a regularized, or smoothed, version of the Kiefer-Wolfowitz method.
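The abstract refers to a convex-optimization reformulation of the Kiefer-Wolfowitz nonparametric maximum likelihood estimator. The sketch below instead uses a simple EM loop over mixing weights on a fixed grid, a common and easily coded approximation of the same NPMLE idea, applied to a Gaussian compound decision problem; the grid, simulated data, and iteration count are illustrative assumptions.

```python
# Minimal sketch (EM on a fixed grid, not the convex interior-point formulation
# cited in the abstract) of the Kiefer-Wolfowitz NPMLE for Gaussian denoising.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
theta = rng.choice([-2.0, 0.0, 3.0], size=1000, p=[0.3, 0.5, 0.2])   # latent means
x = theta + rng.standard_normal(1000)                                 # noisy observations

grid = np.linspace(x.min(), x.max(), 300)            # support points for the prior
w = np.full(grid.size, 1.0 / grid.size)              # mixing weights to estimate
lik = norm.pdf(x[:, None], loc=grid[None, :], scale=1.0)   # n x m likelihood matrix

for _ in range(200):                                  # EM updates of the weights
    resp = lik * w
    resp /= resp.sum(axis=1, keepdims=True)
    w = resp.mean(axis=0)

resp = lik * w                                        # responsibilities at the final weights
resp /= resp.sum(axis=1, keepdims=True)
posterior_mean = resp @ grid                          # empirical Bayes shrunken estimates
print("first five shrunken estimates:", np.round(posterior_mean[:5], 2))
```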
Article
The aim of this paper is to estimate the density f of a random variable X when one has access to independent observations of the sum of K ≥ 2 independent copies of X. We provide a constructive estimator based on a suitable definition of the logarithm of the empirical characteristic function. We propose a new strategy for the data-driven choice of the cut-off parameter. The adaptive estimator is proven to be minimax-optimal up to some logarithmic loss. A numerical study illustrates the performance of the method. Moreover, we discuss the fact that the definition of the estimator applies in a wider context than the one considered here.
Article
Optimal transport (OT) defines a powerful framework to compare probability distributions in a geometrically faithful way. However, the practical impact of OT is still limited because of its computational burden. We propose a new class of stochastic optimization algorithms to cope with large-scale problems routinely encountered in machine learning applications. These methods are able to manipulate arbitrary distributions (either discrete or continuous) by simply requiring the ability to draw samples from them, which is the typical setup in high-dimensional learning problems. This alleviates the need to discretize these densities, while giving access to provably convergent methods that output the correct distance without discretization error. These algorithms rely on two main ideas: (a) the dual OT problem can be recast as the maximization of an expectation; (b) entropic regularization of the primal OT problem results in a smooth dual optimization problem which can be addressed with algorithms that have a provably faster convergence. We instantiate these ideas in three different setups: (i) when comparing a discrete distribution to another, we show that incremental stochastic optimization schemes can beat Sinkhorn's algorithm, the current state-of-the-art finite-dimensional OT solver; (ii) when comparing a discrete distribution to a continuous density, a semi-discrete reformulation of the dual program is amenable to averaged stochastic gradient descent, leading to better performance than approximately solving the problem by discretization; (iii) when dealing with two continuous densities, we propose a stochastic gradient descent over a reproducing kernel Hilbert space (RKHS). This is currently the only known method to solve this problem, apart from computing OT on finite samples. We back up these claims on a set of discrete, semi-discrete and continuous benchmark problems.
Article
We propose a notion of conditional vector quantile function and a vector quantile regression. A conditional vector quantile function (CVQF) of a random vector Y, taking values in ℝ^d given covariates Z = z, taking values in ℝ^k, is a map u ↦ Q_{Y|Z}(u, z), which is monotone, in the sense of being a gradient of a convex function, and such that, given that the vector U follows a reference non-atomic distribution F_U (for instance, the uniform distribution on a unit cube in ℝ^d), the random vector Q_{Y|Z}(U, z) has the distribution of Y conditional on Z = z. Moreover, we have a strong representation Y = Q_{Y|Z}(U, Z) almost surely for some version of U. The vector quantile regression (VQR) is a linear model for the CVQF of Y given Z. Under correct specification, the notion produces a strong representation Y = β(U)ᵀ f(Z), for f(Z) denoting a known set of transformations of Z, where u ↦ β(u)ᵀ f(Z) is a monotone map, the gradient of a convex function, and the quantile regression coefficients u ↦ β(u) have interpretations analogous to those of standard scalar quantile regression. As f(Z) becomes a richer class of transformations of Z, the model becomes nonparametric, as in series modelling. A key property of VQR is the embedding of the classical Monge–Kantorovich optimal transportation problem at its core as a special case. In the classical case, where Y is scalar, VQR reduces to a version of the classical QR, and the CVQF reduces to the scalar conditional quantile function. An application to multiple Engel curve estimation is considered.
Article
An unknown prior density g(θ) has yielded realizations Θ_1, …, Θ_N. They are unobservable, but each Θ_i produces an observable value X_i according to a known probability mechanism, such as X_i ~ Po(Θ_i). We wish to estimate g(θ) from the observed sample X_1, …, X_N. Traditional asymptotic calculations are discouraging, indicating very slow nonparametric rates of convergence. In this article we show that parametric exponential family modelling of g(θ) can give useful estimates in moderate-sized samples. We illustrate the approach with a variety of real and artificial examples. Covariate information can be incorporated into the deconvolution process, leading to a more detailed theory of generalized linear mixed models.
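A minimal sketch of exponential family modelling of an unknown prior, in the spirit of the approach described above, for Poisson observations X_i ~ Po(Θ_i): the prior is discretized on a grid, log g is modeled with a low-order polynomial basis, and the basis coefficients are chosen by maximizing the marginal likelihood. The grid, basis dimension, and simulated data are assumptions made for illustration, not the article's implementation.

```python
# Illustrative sketch (not the article's software) of exponential family
# modelling of an unknown prior g(theta) from Poisson observations.
import numpy as np
from scipy.stats import poisson
from scipy.optimize import minimize

rng = np.random.default_rng(0)
theta = rng.gamma(shape=4.0, scale=1.0, size=800)     # latent rates from an unknown prior
x = rng.poisson(theta)                                # observed counts

grid = np.linspace(0.05, 15.0, 120)                   # discretized theta support
basis = np.vander(grid / grid.max(), N=5, increasing=True)[:, 1:]   # polynomial basis
P = poisson.pmf(x[:, None], grid[None, :])            # n x m sampling probabilities

def neg_loglik(alpha):
    g = np.exp(basis @ alpha)                         # g_alpha(theta) up to a constant
    g /= g.sum()
    return -np.log(P @ g + 1e-300).sum()              # marginal log-likelihood of the counts

alpha_hat = minimize(neg_loglik, x0=np.zeros(4), method="BFGS").x
g_hat = np.exp(basis @ alpha_hat); g_hat /= g_hat.sum()
print("estimated prior mean:", (grid * g_hat).sum())
```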
Article
Measurement error is widespread in statistical and/or economic data and can have substantial impact on point estimates and statistical inference in general. Accordingly, there exists a vast literature focused on addressing this problem. The present overview emphasizes the recent econometric literature on the topic and mostly centers on the author's interest in the question of identification (and consistent estimation) of general nonlinear models with measurement error without simply assuming that the distribution of the measurement error is known. This chapter is organized as follows. First, we explain the origins of measurement-error bias before describing simple approaches that rely on distributional knowledge regarding the measurement error (e.g., deconvolution or validation-data techniques). We then describe methods that secure identification via more readily available auxiliary variables (e.g., repeated measurements, measurement systems with a “factor model” structure, instrumental variables, and panel data). An overview of methods exploiting higher-order moments or bounding techniques to avoid the need for auxiliary information is presented next. Special attention is devoted to a recently introduced general method to handle a broad class of latent variable models, called Entropic Latent Variable Integration via Simulation (ELVIS). Finally, the complex but active topic of nonclassical measurement error is discussed and applications of measurement-error techniques to other fields are outlined.
Article
The ill-posedness of the inverse problem of recovering a regression function in a nonparametric instrumental variable model leads to estimators that may suffer from a very slow, logarithmic rate of convergence. In this paper, we show that restricting the problem to models with monotone regression functions and monotone instruments significantly weakens the ill-posedness of the problem. In stark contrast to the existing literature, the presence of a monotone instrument implies boundedness of our measure of ill-posedness when restricted to the space of monotone functions. Based on this result we derive a novel non-asymptotic error bound for the constrained estimator that imposes monotonicity of the regression function. For a given sample size, the bound is independent of the degree of ill-posedness as long as the regression function is not too steep. As an implication, the bound allows us to show that the constrained estimator converges at a fast, polynomial rate, independently of the degree of ill-posedness, in a large, but slowly shrinking neighborhood of constant functions. Our simulation study demonstrates significant finite-sample performance gains from imposing monotonicity even when the regression function is rather far from being a constant. We apply the constrained estimator to the problem of estimating gasoline demand functions from U.S. data.
Article
We propose a computationally feasible way of deriving the identified features of models with multiple equilibria in pure or mixed strategies. It is shown that in the case of Shapley regular normal form games, the identified set is characterized by the inclusion of the true data distribution within the core of a Choquet capacity, which is interpreted as the generalized likelihood of the model. In turn, this inclusion is characterized by a finite set of inequalities and efficient and easily implementable combinatorial methods are described to check them. In all normal form games, the identified set is characterized in terms of the value of a submodular or convex optimization program. Efficient algorithms are then given and compared to check inclusion of a parameter in this identified set. The latter are illustrated with family bargaining games and oligopoly entry games.
Article
We consider deconvolution from repeated observations with unknown error distribution. So far, this model has mostly been studied under the additional assumption that the errors are symmetric. We construct an estimator for the non-symmetric error case and study its theoretical properties and practical performance. It is interesting to note that we can improve substantially upon the rates of convergence which have so far been presented in the literature and, at the same time, dispose of most of the extremely restrictive assumptions which have been imposed so far.
Article
This note demonstrates that the conditions of Kotlarski's (1967, Pacific Journal of Mathematics 20(1), 69–76) lemma can be substantially relaxed. In particular, the condition that the characteristic functions of M, U1, and U2 are nonvanishing can be replaced with much weaker conditions: The characteristic function of U1 can be allowed to have real zeros, as long as the derivative of its characteristic function at those points is not also zero; that of U2 can have an isolated number of zeros; and that of M need satisfy no restrictions on its zeros. We also show that Kotlarski's lemma holds when the tails of U1 are no thicker than exponential, regardless of the zeros of the characteristic functions of U1, U2, or M.
Article
Suppose that the sum of two independent random variables X and Z is observed, where Z denotes measurement error and has a known distribution, and where the unknown density f of X is to be estimated. One application is the estimation of a prior density for a sequence of location parameters. A second application arises in the errors-in-variables problem for nonlinear and generalized linear models, when one attempts to model the distribution of the true but unobservable covariates. This article shows that if Z is normally distributed and f has k bounded derivatives, then the fastest attainable convergence rate of any nonparametric estimator of f is only (log n)^(-k/2). Therefore, deconvolution with normal errors may not be a practical proposition. Other error distributions are also treated. Stefanski–Carroll (1987a) estimators achieve the optimal rates. The results given have versions for multiplicative errors, where they imply that even optimal rates are exceptionally slow.
Article
This paper considers estimation of a continuous bounded probability density when observations from the density are contaminated by additive measurement errors having a known distribution. Properties of the estimator obtained by deconvolving a kernel estimator of the observed data are investigated. When the kernel used is sufficiently smooth, the deconvolved estimator is shown to be pointwise consistent and bounds on its integrated mean squared error are derived. Very weak assumptions are made on the measurement-error density, thereby permitting a comparison of the effects of different types of measurement error on the deconvolved estimator.
Article
Contents: Introduction; The Kantorovich duality; Geometry of optimal transportation; Brenier's polar factorization theorem; The Monge–Ampère equation; Displacement interpolation and displacement convexity; Geometric and Gaussian inequalities; The metric side of optimal transportation; A differential point of view on optimal transportation; Entropy production and transportation inequalities; Problems; Bibliography; Table of short statements; Index.
Article
This paper considers the consistent estimation of nonlinear errors-in-variables models. It adopts the functional modeling approach by assuming that the true but unobserved regressors are random variables but making no parametric assumption on the distribution from which the latent variables are drawn. This paper shows how the information extracted from the replicate measurements can be used to identify and consistently estimate a general nonlinear errors-in-variables model. The identification is established through characteristic functions. The estimation procedure involves nonparametric estimation of the conditional density of the latent variables given the measurements using the identification results at the first stage, and at the second stage, a semiparametric nonlinear least-squares estimator is proposed. The consistency of the proposed estimator is also established. Finite sample performance of the estimator is investigated through a Monte Carlo study.
Article
There are many environments where knowledge of a structural relationship is required to answer questions of interest. Also, nonseparability of a structural disturbance is a key feature of many models. Here, we consider nonparametric identification and estimation of a model that is monotonic in a nonseparable scalar disturbance, which disturbance is independent of instruments. This model leads to conditional quantile restrictions. We give local identification conditions for the structural equations from those quantile restrictions. We find that a modified completeness condition is sufficient for local identification. We also consider estimation via a nonparametric minimum distance estimator. The estimator minimizes the sum of squares of predicted values from a nonparametric regression of the quantile residual on the instruments. We show consistency of this estimator.
Article
This paper considers the nonparametric estimation of the densities of the latent variable and the error term in the standard measurement error model when two or more measurements are available. Using an identification result due to Kotlarski, we propose a two-step nonparametric procedure for estimating both densities based on their empirical characteristic functions. We distinguish four cases according to whether the underlying characteristic functions are ordinary smooth or supersmooth. Using the loglog law and von Mises differentials, we show that our nonparametric density estimators are uniformly convergent. We also characterize the rate of uniform convergence in each of the four cases.
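A minimal numerical sketch of the first (characteristic function) step of a Kotlarski-type procedure, assuming two measurements Y1 = X + U1 and Y2 = X + U2 with mutually independent components and E[U2] = 0; the second step, a smoothed Fourier inversion of the recovered characteristic function, is omitted. The simulated design and frequency grid are illustrative assumptions, not the paper's estimator.

```python
# Illustrative Kotlarski-type recovery of the characteristic function of the
# latent X from two noisy measurements (assumes E[U2] = 0; second step omitted).
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.gamma(2.0, 1.0, n)                   # latent variable of interest
y1 = x + rng.normal(0.0, 0.5, n)             # first noisy measurement
y2 = x + rng.normal(0.0, 0.7, n)             # second measurement, mean-zero error U2

t = np.linspace(0.0, 3.0, 301)               # frequency grid (positive half-line)
e_ity1 = np.exp(1j * np.outer(t, y1))        # e^{i t Y1} for every t and observation
num = (1j * y2[None, :] * e_ity1).mean(axis=1)   # estimates E[i Y2 e^{i t Y1}]
den = e_ity1.mean(axis=1)                        # estimates E[e^{i t Y1}]
ratio = num / den                            # equals d/dt log phi_X(t) under the assumptions

dt = t[1] - t[0]
log_phi = np.concatenate(([0.0], np.cumsum(0.5 * (ratio[1:] + ratio[:-1]) * dt)))
phi_x = np.exp(log_phi)                      # recovered characteristic function of X on [0, 3]

idx = np.argmin(np.abs(t - 1.0))             # compare with the true Gamma(2, 1) value
print("recovered |phi_X(1)|:", np.abs(phi_x[idx]), "true:", np.abs((1 - 1j) ** -2.0))
```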
Article
Often researchers find parametric models restrictive and sensitive to deviations from the parametric specifications; semi-nonparametric models are more flexible and robust, but lead to other complications such as introducing infinite-dimensional parameter spaces that may not be compact and the optimization problem may no longer be well-posed. The method of sieves provides one way to tackle such difficulties by optimizing an empirical criterion over a sequence of approximating parameter spaces (i.e., sieves); the sieves are less complex but are dense in the original space and the resulting optimization problem becomes well-posed. With different choices of criteria and sieves, the method of sieves is very flexible in estimating complicated semi-nonparametric models with (or without) endogeneity and latent heterogeneity. It can easily incorporate prior information and constraints, often derived from economic theory, such as monotonicity, convexity, additivity, multiplicity, exclusion and nonnegativity. It can simultaneously estimate the parametric and nonparametric parts in semi-nonparametric models, typically with optimal convergence rates for both parts.This chapter describes estimation of semi-nonparametric econometric models via the method of sieves. We present some general results on the large sample properties of the sieve estimates, including consistency of the sieve extremum estimates, convergence rates of the sieve M-estimates, pointwise normality of series estimates of regression functions, root-n asymptotic normality and efficiency of sieve estimates of smooth functionals of infinite-dimensional parameters. Examples are used to illustrate the general results.
Article
We study linear factor models under the assumptions that factors are mutually independent and independent of errors, and errors can be correlated to some extent. Under the factor non-Gaussianity, second-to-fourth-order moments are shown to yield full identification of the matrix of factor loadings. We develop a simple algorithm to estimate the matrix of factor loadings from these moments. We run Monte Carlo simulations and apply our methodology to data on cognitive test scores, and financial data on stock returns.
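The abstract describes an algorithm based on second- to fourth-order moments. As a hedged illustration of the same idea that non-Gaussian independent factors pin down the loadings, the sketch below recovers a mixing matrix with FastICA from scikit-learn, a related higher-order-moment technique rather than the authors' estimator; the simulated design is an assumption for the example.

```python
# Illustrative recovery of factor loadings under non-Gaussian independent factors,
# using FastICA (a related technique, not the moment-based algorithm of the paper).
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n = 5000
factors = np.column_stack([rng.laplace(size=n), rng.uniform(-1, 1, size=n)])  # non-Gaussian
A_true = np.array([[1.0, 0.5],
                   [0.2, 1.0],
                   [0.7, -0.3]])                 # three observed series, two factors
X = factors @ A_true.T + 0.05 * rng.standard_normal((n, 3))   # small idiosyncratic noise

ica = FastICA(n_components=2, random_state=0)
ica.fit(X)
print("estimated loadings (columns identified up to sign and scale):")
print(ica.mixing_)
```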
Article
Suppose k-variate data are drawn from a mixture of two distributions, each having independent components. It is desired to estimate the univariate marginal distributions in each of the products, as well as the mixing proportion. This is the setting of two-class, fully parametrized latent models that has been proposed for estimating the distributions of medical test results when disease status is unavailable. The problem is one of inference in a mixture of distributions without training data, and until now it has been tackled only in a fully parametric setting. We investigate the possibility of using nonparametric methods. Of course, when k=1 the problem is not identifiable from a nonparametric viewpoint. We show that the problem is "almost" identifiable when k=2; there, the set of all possible representations can be expressed, in terms of any one of those representations, as a two-parameter family. Furthermore, it is proved that when k≥3 the problem is nonparametrically identifiable under particularly mild regularity conditions. In this case we introduce root-n consistent nonparametric estimators of the 2k univariate marginal distributions and the mixing proportion. Finite-sample and asymptotic properties of the estimators are described.
Article
We will characterize the gamma distribution by the nature of the joint distribution of the two quotients X1/X3, X2/X3 for three identically gamma distributed random variables.
Article
This article introduces estimators defined as minimizers of Kantorovich distances between statistical models and empirical distributions. Existence, measurability and consistency of these estimators are studied. A few significant examples illustrate the applicability of the theoretical results dealt with in the paper.