Preprint

Locally Robust Policy Learning: Inequality, Inequality of Opportunity and Intergenerational Mobility


Abstract

Policy makers need to decide whether or not to treat heterogeneous individuals. The optimal treatment choice depends on the welfare function that the policy maker has in mind, and learning this choice from data is referred to as the policy learning problem. I study a general setting for policy learning with semiparametric Social Welfare Functions (SWFs) that can be estimated by locally robust/orthogonal moments based on U-statistics. This rich class of SWFs substantially expands the setting in Athey and Wager (2021) and accommodates a wider range of distributional preferences. Three main applications of the general theory motivate the paper: (i) inequality-aware SWFs, (ii) Inequality of Opportunity (IOp) aware SWFs, and (iii) intergenerational mobility SWFs. I use the Panel Study of Income Dynamics (PSID) to assess the effect of attending preschool on adult earnings and estimate optimal policy rules based on parental years of education and parental income.
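To fix ideas with one of the leading examples: Gini-type welfare is a degree-two U-statistic functional of the outcome distribution, so the policy learning problem can be sketched as below. The notation is an illustrative reconstruction, not necessarily the paper's.

```latex
% Sketch: an inequality-aware (Gini-type) SWF and its policy learning problem.
% Y_i(\pi) is the outcome of unit i under policy \pi \in \Pi, Z_i the observed
% data, \hat{\eta} first-step nuisances, \psi an orthogonal moment function.
W(\pi) \;=\; \mathbb{E}\bigl[Y_1(\pi)\bigr]
       \;-\; \tfrac{1}{2}\,\mathbb{E}\bigl|\,Y_1(\pi) - Y_2(\pi)\,\bigr|,
\qquad
\hat{\pi} \;\in\; \arg\max_{\pi \in \Pi}\;
\binom{n}{2}^{-1} \sum_{i<j} \psi\bigl(Z_i, Z_j; \pi, \hat{\eta}\bigr).
```

The first term is mean welfare, as in Athey and Wager (2021); the second is a Gini mean difference penalty that makes the criterion inequality averse and turns the sample analog into a U-statistic.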


References
Article
I provide lower and upper bound estimates of inequality of opportunity (IOp) for 32 European countries between 2005 and 2019. Lower bound estimates use machine learning methods to address sampling variability. Upper bound estimates use longitudinal data to capture all time-invariant factors. Across all years and countries, lower bound estimates of IOp account for between 6 and 60 percent of total income inequality, while upper bound estimates account for between 20 percent and almost all of income inequality. On average, upper bound IOp saw a slight decrease in the aftermath of the Great Recession, recovering and stabilizing at around 80 percent of total inequality in the second half of the 2010s. Lower bound estimates for 2005, 2011, and 2019 show a similar pattern. My findings suggest that lower and upper bound estimates complement each other, corroborating information and compensating for each other's weaknesses, which highlights the relevance of a bounded estimate of IOp.
Article
This paper characterizes and proposes a method to correct for errors-in-variables biases in the estimation of rank correlation coefficients (Spearman's ρ and Kendall's τ). We first investigate a set of sufficient conditions under which measurement errors bias the sample rank correlations toward zero. We then provide a feasible nonparametric bias-corrected estimator based on the technique of small error variance approximation. We assess its performance in simulations and an empirical application, using rich Swedish data to estimate intergenerational rank correlations in income. The method performs well in both cases, lowering the mean squared error by 50-85 percent already in moderately sized samples (n = 1,000).
Article
Second-degree dominance has become a widely accepted criterion for ordering distribution functions according to social welfare. However, it provides only a partial ordering, and it may fail to rank distributions that intersect. To rank intersecting distribution functions, we propose a general approach based on rank-dependent theory. This approach avoids making arbitrary restrictions or parametric assumptions about social welfare functions and allows researchers to identify the weakest set of assumptions needed to rank distributions according to social welfare. Our approach is based on two complementary sequences of nested dominance criteria. The first (second) sequence extends second-degree stochastic dominance by placing more emphasis on differences that occur in the lower (upper) part of the distribution. The sequences characterize two separate systems of nested subfamilies of rank-dependent social welfare functions. This allows us to identify the least restrictive rank-dependent social preferences that give an unambiguous ranking of a given set of distribution functions. We also provide an axiomatization of the sequences of dominance criteria and the corresponding subfamilies of social welfare functions. We show the usefulness of our approach in two empirical applications: the first assesses the welfare implications of changes in household income distributions over the business cycle, while the second performs a social welfare comparison of the actual and counterfactual outcome distributions from a policy experiment.
Article
In many areas, practitioners seek to use observational data to learn a treatment assignment policy that satisfies application‐specific constraints, such as budget, fairness, simplicity, or other functional form constraints. For example, policies may be restricted to take the form of decision trees based on a limited set of easily observable individual characteristics. We propose a new approach to this problem motivated by the theory of semiparametrically efficient estimation. Our method can be used to optimize either binary treatments or infinitesimal nudges to continuous treatments, and can leverage observational data where causal effects are identified using a variety of strategies, including selection on observables and instrumental variables. Given a doubly robust estimator of the causal effect of assigning everyone to treatment, we develop an algorithm for choosing whom to treat, and establish strong guarantees for the asymptotic utilitarian regret of the resulting policy.
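A minimal sketch of this pipeline under selection on observables, with a depth-one threshold policy class: the forest nuisance estimators, the variable names, and the omission of cross-fitting are simplifications of mine, not the authors' implementation.

```python
# Hedged sketch of doubly robust (AIPW) policy learning: estimate a score for
# the gain from treating each unit, then maximize its average over a policy class.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

def aipw_scores(X, T, Y):
    """T is a 0/1 array; cross-fitting (required for the theory) is omitted."""
    e = RandomForestClassifier().fit(X, T).predict_proba(X)[:, 1]   # propensity
    mu1 = RandomForestRegressor().fit(X[T == 1], Y[T == 1]).predict(X)
    mu0 = RandomForestRegressor().fit(X[T == 0], Y[T == 0]).predict(X)
    # Doubly robust (AIPW) score for the effect of treating unit i.
    return mu1 - mu0 + T * (Y - mu1) / e - (1 - T) * (Y - mu0) / (1 - e)

def threshold_policy(x, gamma):
    """Search depth-one rules 'treat iff x <= c' and 'treat iff x >= c'."""
    best_value, best_rule = -np.inf, None
    for c in np.unique(x):
        for direction, treat in (("<=", x <= c), (">=", x >= c)):
            value = np.mean(gamma * treat)   # welfare gain vs. treating no one
            if value > best_value:
                best_value, best_rule = value, (direction, c)
    return best_value, best_rule
```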
Article
We study deep neural networks and their use in semiparametric inference. We establish novel nonasymptotic high probability bounds for deep feedforward neural nets. These deliver rates of convergence that are sufficiently fast (in some cases minimax optimal) to allow us to establish valid second‐step inference after first‐step estimation with deep learning, a result also new to the literature. Our nonasymptotic high probability bounds, and the subsequent semiparametric inference, treat the current standard architecture: fully connected feedforward neural networks (multilayer perceptrons), with the now‐common rectified linear unit activation function, unbounded weights, and a depth explicitly diverging with the sample size. We discuss other architectures as well, including fixed‐width, very deep networks. We establish the nonasymptotic bounds for these deep nets for a general class of nonparametric regression‐type loss functions, which includes as special cases least squares, logistic regression, and other generalized linear models. We then apply our theory to develop semiparametric inference, focusing on causal parameters for concreteness, and demonstrate the effectiveness of deep learning with an empirical application to direct mail marketing.
Article
Estimates of the level of inequality of opportunity have traditionally been proposed as lower bounds due to the downward bias resulting from the partial observability of circumstances that affect individual outcome. We show that such estimates may also suffer from upward bias as a consequence of sampling variance. The magnitude of the latter distortion depends on both the empirical strategy used and the observed sample. We suggest that, although neglected in empirical contributions, the upward bias may be significant and challenge the interpretation of inequality of opportunity estimates as lower bounds. We propose a simple criterion to select the best specification that balances the two sources of bias. Our method is based on cross-validation and can easily be implemented with survey data. To show how this method can improve the reliability of inequality of opportunity measurement, we provide an empirical illustration based on income data from 31 European countries. Our evidence shows that estimates of inequality of opportunity are sensitive to model selection. Alternative specifications lead to significant differences in the absolute level of inequality of opportunity and to the re-ranking of a number of countries, which confirms the need for an objective criterion to select the best econometric model when measuring inequality of opportunity.
Article
Many economic and causal parameters depend on nonparametric or high dimensional first steps. We give a general construction of locally robust/orthogonal moment functions for GMM, where first steps have no effect, locally, on average moment functions. Using these orthogonal moments reduces model selection and regularization bias, as is important in many applications, especially for machine learning first steps. Also, associated standard errors are robust to misspecification when there is the same number of moment functions as parameters of interest. We use these orthogonal moments and cross‐fitting to construct debiased machine learning estimators of functions of high dimensional conditional quantiles and of dynamic discrete choice parameters with high dimensional state variables. We show that additional first steps needed for the orthogonal moment functions have no effect, globally, on average orthogonal moment functions. We give a general approach to estimating those additional first steps. We characterize double robustness and give a variety of new doubly robust moment functions. We give general and simple regularity conditions for asymptotic theory.
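For concreteness, the best-known instance of this construction is the orthogonal (doubly robust) moment for the average treatment effect; the notation below is standard rather than a quotation from the paper.

```latex
% Orthogonal moment for the ATE: \gamma is the outcome regression, e the
% propensity score, \alpha the Riesz representer that debiases \gamma.
\psi(z;\theta,\gamma,\alpha)
  \;=\; \gamma(1,x) - \gamma(0,x)
  \;+\; \alpha(d,x)\bigl(y - \gamma(d,x)\bigr) \;-\; \theta,
\qquad
\alpha(d,x) \;=\; \frac{d}{e(x)} - \frac{1-d}{1-e(x)},
```

with local robustness meaning $\partial_r\,\mathbb{E}[\psi(Z;\theta_0,\gamma_0 + r\Delta,\alpha_0)]\big|_{r=0} = 0$ for admissible deviations $\Delta$, so first-step estimation errors have no first-order effect.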
Article
There are many economic parameters that depend on nonparametric first steps. Examples include games, dynamic discrete choice, average exact consumer surplus, and treatment effects. Often estimators of these parameters are asymptotically equivalent to a sample average of an object referred to as the influence function. The influence function is useful in local policy analysis, in evaluating local sensitivity of estimators, and in constructing debiased machine learning estimators. We show that the influence function is a Gateaux derivative with respect to a smooth deviation evaluated at a point mass. This result generalizes the classic von Mises (1947) and Hampel (1974) calculation to estimators that depend on smooth nonparametric first steps. We give explicit influence functions for first steps that satisfy exogenous or endogenous orthogonality conditions. We use these results to generalize the omitted variable bias formula for regression to policy analysis of, and sensitivity to, structural changes. We apply this analysis and find no sensitivity to endogeneity of average equivalent variation estimates in a gasoline demand application.
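In symbols, the classical calculation that the paper generalizes reads as follows (my rendering):

```latex
% Influence function as a Gateaux derivative toward a point mass \delta_z:
\phi(z) \;=\; \frac{d}{d\tau}\,
  \theta\bigl((1-\tau)F_0 + \tau\,\delta_z\bigr)\Big|_{\tau = 0},
```

where for estimators with smooth nonparametric first steps the point mass $\delta_z$ is approached through smooth deviations, which is the paper's key device.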
Article
In applied problems it is common to specify a model for the conditional mean of a response given a set of regressors. A subset of the regressors may be missing for some study subjects, either by design or happenstance. In this article we propose a new class of semiparametric estimators, based on inverse probability weighted estimating equations, that are consistent for the parameter vector α0 of the conditional mean model when the data are missing at random in the sense of Rubin and the missingness probabilities are either known or can be parametrically modeled. We show that the asymptotic variance of the optimal estimator in our class attains the semiparametric variance bound for the model by first showing that our estimation problem is a special case of the general problem of parameter estimation in an arbitrary semiparametric model in which the data are missing at random and the probability of observing complete data is bounded away from 0, and then deriving a representation for the efficient score, the semiparametric variance bound, and the influence function of any regular, asymptotically linear estimator in this more general estimation problem. Because the optimal estimator depends on the unknown probability law generating the data, we propose locally and globally adaptive semiparametric efficient estimators. We compare estimators in our class with previously proposed estimators. We show that each previous estimator is asymptotically equivalent to some, usually inefficient, estimator in our class. This equivalence is a consequence of a proposition stating that every regular, asymptotically linear estimator of α0 is asymptotically equivalent to some estimator in our class. We compare various estimators in a small simulation study and offer some practical recommendations.
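A stripped-down, complete-case member of this class can be written as follows (my notation; the efficient members add an augmentation term exploiting the incompletely observed subjects):

```latex
% R_i = 1 if subject i is fully observed, \pi(V_i) = P(R_i = 1 | V_i) the
% (known or parametrically modeled) observation probability:
\sum_{i=1}^{n} \frac{R_i}{\pi(V_i)}\, h(X_i)\,
  \bigl\{ Y_i - g(X_i;\alpha) \bigr\} \;=\; 0 .
```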
Article
This paper develops asymptotic optimality theory for statistical treatment rules in smooth parametric and semiparametric models. Manski (2000, 2002, 2004) and Dehejia (2005) have argued that the problem of choosing treatments to maximize social welfare is distinct from the point estimation and hypothesis testing problems usually considered in the treatment effects literature, and advocate formal analysis of decision procedures that map empirical data into treatment choices. We develop large-sample approximations to statistical treatment assignment problems using the limits of experiments framework. We then consider some different loss functions and derive treatment assignment rules that are asymptotically optimal under average and minimax risk criteria.
Article
We consider median regression and, more generally, a possibly infinite collection of quantile regressions in high-dimensional sparse models. In these models, the number of regressors p is very large, possibly larger than the sample size n, but only at most s regressors have a nonzero impact on each conditional quantile of the response variable, where s grows more slowly than n. Since ordinary quantile regression is not consistent in this case, we consider ℓ1-penalized quantile regression (ℓ1-QR), which penalizes the ℓ1-norm of regression coefficients, as well as the post-penalized QR estimator (post-ℓ1-QR), which applies ordinary QR to the model selected by ℓ1-QR. First, we show that under general conditions ℓ1-QR is consistent at the near-oracle rate $\sqrt{s/n}\sqrt{\log(p \vee n)}$, uniformly in the compact set $\mathcal{U} \subset (0,1)$ of quantile indices. In deriving this result, we propose a partly pivotal, data-driven choice of the penalty level and show that it satisfies the requirements for achieving this rate. Second, we show that under similar conditions post-ℓ1-QR is consistent at the near-oracle rate $\sqrt{s/n}\sqrt{\log(p \vee n)}$, uniformly over $\mathcal{U}$, even if the ℓ1-QR-selected models miss some components of the true models, and the rate could be even closer to the oracle rate otherwise. Third, we characterize conditions under which ℓ1-QR contains the true model as a submodel, and derive bounds on the dimension of the selected model, uniformly over $\mathcal{U}$; we also provide conditions under which hard-thresholding selects the minimal true model, uniformly over $\mathcal{U}$.
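An easy way to experiment with ℓ1-QR and post-ℓ1-QR is scikit-learn's QuantileRegressor; this stand-in, the simulated design, and the rough penalty scaling are my choices, not the authors' implementation.

```python
# Toy l1-penalized quantile regression and its post-selection refit.
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
n, p, s = 500, 1000, 5                       # p > n, sparse truth
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:s] = 1.0
y = X @ beta + rng.standard_normal(n)

tau = 0.5
lam = np.sqrt(np.log(max(p, n)) / n)         # loosely mimics the paper's scaling
l1_qr = QuantileRegressor(quantile=tau, alpha=lam).fit(X, y)

selected = np.flatnonzero(l1_qr.coef_)       # support chosen by l1-QR
if selected.size:                            # post-l1-QR: unpenalized refit
    post_qr = QuantileRegressor(quantile=tau, alpha=0.0).fit(X[:, selected], y)
```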
Article
We obtain an improved approximation rate (in Sobolev norm) of $r^{-1/2-\alpha/(d+1)}$ for a large class of single hidden layer feedforward artificial neural networks (ANN) with r hidden units and possibly nonsigmoid activation functions when the target function satisfies certain smoothness conditions. Here, d is the dimension of the domain of the target function, and α ∈ (0, 1) is related to the smoothness of the activation function. When applying this class of ANNs to nonparametrically estimate (train) a general target function using the method of sieves, we obtain new root-mean-square convergence rates of $O_p([n/\log(n)]^{-(1+2\alpha/(d+1))/[4(1+\alpha/(d+1))]}) = o_p(n^{-1/4})$ by letting the number of hidden units $r_n$ increase appropriately with the sample size (number of training examples) n. These rates are valid for i.i.d. data as well as for uniform mixing and absolutely regular (β-mixing) stationary time series data. In addition, the rates are fast enough to deliver root-n asymptotic normality for plug-in estimates of smooth functionals using general ANN sieve estimators. As interesting applications to nonlinear time series, we establish rates for ANN sieve estimators of four different multivariate target functions: a conditional mean, a conditional quantile, a joint density, and a conditional density. We also obtain root-n asymptotic normality results for semiparametric model coefficient and average derivative estimators.
Article
We propose the use of machine learning methods to estimate inequality of opportunity and illustrate that regression trees and forests represent a substantial improvement over existing approaches: they reduce the risk of ad-hoc model selection and trade off upward and downward bias in inequality of opportunity estimates. The advantages of regression trees and forests are illustrated by an empirical application for a cross-section of 31 European countries. We show that arbitrary model selection may lead to significant biases in inequality of opportunity estimates relative to our preferred method. These biases are reflected in both point estimates and country rankings.
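A compressed sketch of the forest-based estimator this describes: regress the outcome on circumstances, then measure inequality between fitted values. The Gini index, the leaf-size choice, and all names are assumptions of mine.

```python
# Forest-based inequality of opportunity: inequality of the fitted values
# from a regression of income on circumstance variables.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def gini(y):
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    return (2 * np.arange(1, n + 1) - n - 1) @ y / (n * y.sum())

def iop_forest(circumstances, income):
    fitted = (RandomForestRegressor(min_samples_leaf=50)
              .fit(circumstances, income).predict(circumstances))
    return gini(fitted), gini(fitted) / gini(income)  # absolute, relative IOp
```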
Article
We study the problem of a decision maker who must provide the best possible treatment recommendation based on an experiment. The desirability of the outcome distribution resulting from the policy recommendation is measured through a functional capturing the distributional characteristic that the decision maker is interested in optimizing. This could be, e.g., its inherent inequality, welfare, level of poverty or its distance to a desired outcome distribution. If the functional of interest is not quasi-convex or if there are constraints, the optimal recommendation may be a mixture of treatments. This vastly expands the set of recommendations that must be considered. We characterize the difficulty of the problem by obtaining maximal expected regret lower bounds. Furthermore, we propose two (near) regret-optimal policies. The first policy is static and thus applicable irrespectively of the subjects arriving sequentially or not in the course of the experimentation phase. The second policy can utilize that subjects arrive sequentially by successively eliminating inferior treatments and thus spends the sampling effort where it is most needed.
Article
As a result of the digitization of the economy, more and more decision makers from a wide range of domains have gained the ability to target products, services, and information provision based on individual characteristics. Examples include selecting offers, prices, advertisements, or emails to send to consumers, choosing a bid to submit in contextual first-price auctions, and determining which medication to prescribe to a patient. The key to enabling this is to learn a treatment policy from historical observational data in a sample-efficient way, hence uncovering the best personalized treatment choice recommendation. In "Offline Policy Learning: Generalization and Optimization," Z. Zhou, S. Athey, and S. Wager provide a sample-optimal policy learning algorithm that is computationally efficient and learns a tree-based treatment policy from observational data. In our quest toward fully automated personalization, the work provides a theoretically sound and practically implementable approach.
Article
Empirical researchers are increasingly faced with rich data sets containing many controls or instrumental variables, making it essential to choose an appropriate approach to variable selection. In this paper, we provide results for valid inference after post- or orthogonal L2-boosting is used for variable selection. We consider treatment effects after selecting among many control variables and instrumental variable models with potentially many instruments. To achieve this, we establish new results for the rate of convergence of iterated post-L2-boosting and orthogonal L2-boosting in a high-dimensional setting similar to Lasso, i.e., under approximate sparsity without assuming the beta-min condition. These results are extended to the 2SLS framework and valid inference is provided for treatment effect analysis. We give extensive simulation results for the proposed methods and compare them with Lasso. In an empirical application, we construct efficient IVs with our proposed methods to estimate the effect of pre-merger overlap of bank branch networks in the US on the post-merger stock returns of the acquirer bank.
Article
The policy relevant treatment effect (PRTE) measures the average effect of switching from a status-quo policy to a counterfactual policy under consideration. Estimation of the PRTE involves estimation of multiple preliminary parameters, including propensity scores, conditional expectation functions of the outcome and covariates given the propensity score, and marginal treatment effects. These preliminary estimators can affect the asymptotic distribution of the PRTE estimator in complicated and intractable manners. In this light, we propose an orthogonal score for double debiased estimation of the PRTE, whereby the asymptotic distribution of the PRTE estimator is obtained without any influence of preliminary parameter estimators as far as they satisfy mild requirements of convergence rates. To our knowledge, this paper is the first to develop limit distribution theories for inference about the PRTE.
Article
Inequality of opportunity is defined as the difference in individuals' outcomes systematically correlated with morally irrelevant pre-determined circumstances, such as ethnicity, socio-economic background, or area of birth. This definition has been extensively studied by economists on the assumption that, in addition to being normatively undesirable, it can be related to low potential for growth. However, empirical estimation of inequality of opportunity requires access to rich data sources, rarely available in poorer countries. In this paper, we exploit 13 consumption household surveys to evaluate inequality of opportunity in 10 Sub-Saharan African countries. According to our results, the portion of total inequality that can be attributed to exogenous circumstances is between 40% and 56% in most countries. Our estimates are significantly higher than what has been found by previous studies. We detect a positive association between total consumption inequality and inequality of opportunity, and we study the different sources of unequal opportunities. The place of birth and the education of the father appear to exert the most relevant role in shaping inequality of opportunity in the region.
Article
We propose generalized random forests, a method for nonparametric statistical estimation based on random forests (Breiman [Mach. Learn. 45 (2001) 5–32]) that can be used to fit any quantity of interest identified as the solution to a set of local moment equations. Following the literature on local maximum likelihood estimation, our method considers a weighted set of nearby training examples; however, instead of using classical kernel weighting functions that are prone to a strong curse of dimensionality, we use an adaptive weighting function derived from a forest designed to express heterogeneity in the specified quantity of interest. We propose a flexible, computationally efficient algorithm for growing generalized random forests, develop a large sample theory for our method showing that our estimates are consistent and asymptotically Gaussian and provide an estimator for their asymptotic variance that enables valid confidence intervals. We use our approach to develop new methods for three statistical tasks: nonparametric quantile regression, conditional average partial effect estimation and heterogeneous treatment effect estimation via instrumental variables. A software implementation, grf for R and C++, is available from CRAN.
Article
One of the main objectives of empirical analysis of experiments and quasi‐experiments is to inform policy decisions that determine the allocation of treatments to individuals with different observable covariates. We study the properties and implementation of the Empirical Welfare Maximization (EWM) method, which estimates a treatment assignment policy by maximizing the sample analog of average social welfare over a class of candidate treatment policies. The EWM approach is attractive in terms of both statistical performance and practical implementation in realistic settings of policy design. Common features of these settings include: (i) feasible treatment assignment rules are constrained exogenously for ethical, legislative, or political reasons, (ii) a policy maker wants a simple treatment assignment rule based on one or more eligibility scores in order to reduce the dimensionality of individual observable characteristics, and/or (iii) the proportion of individuals who can receive the treatment is a priori limited due to a budget or a capacity constraint. We show that when the propensity score is known, the average social welfare attained by EWM rules converges at least at n^(−1/2) rate to the maximum obtainable welfare uniformly over a minimally constrained class of data distributions, and this uniform convergence rate is minimax optimal. We examine how the uniform convergence rate depends on the richness of the class of candidate decision rules, the distribution of conditional treatment effects, and the lack of knowledge of the propensity score. We offer easily implementable algorithms for computing the EWM rule and an application using experimental data from the National JTPA Study.
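With a known propensity score and a single eligibility score (feature (ii) above), the EWM sample analog fits in a few lines; this is an illustrative reimplementation with names of my choosing, not the authors' code.

```python
# Empirical welfare maximization over one-dimensional threshold rules.
import numpy as np

def ewm_threshold(score, T, Y, e):
    """Choose 'treat iff score >= c' maximizing the sample welfare analog;
    T is the 0/1 treatment, e the known propensity score."""
    # IPW welfare contribution of treating unit i rather than not.
    gain = T * Y / e - (1 - T) * Y / (1 - e)
    best_c, best_w = None, -np.inf
    for c in np.unique(score):
        w = np.mean(gain * (score >= c))
        if w > best_w:
            best_c, best_w = c, w
    return best_c, best_w
```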
Article
Finding the optimal treatment regime (or a series of sequential treatment regimes) based on individual characteristics has important applications in areas such as precision medicine, government policies and active labor market interventions. In the current literature, the optimal treatment regime is usually defined as the one that maximizes the average benefit in the potential population. This paper studies a general framework for estimating the quantile-optimal treatment regime, which is of importance in many real-world applications. Given a collection of treatment regimes, we consider robust estimation of the quantile-optimal treatment regime, which does not require the analyst to specify an outcome regression model. We propose an alternative formulation of the estimator as a solution of an optimization problem with an estimated nuisance parameter. This novel representation allows us to investigate the asymptotic theory of the estimated optimal treatment regime using empirical process techniques. We derive theory involving a nonstandard convergence rate and a non-normal limiting distribution. The same nonstandard convergence rate would also occur if the mean optimality criterion is applied, but this has not been studied. Thus, our results fill an important theoretical gap for a general class of policy search methods in the literature. The paper investigates both static and dynamic treatment regimes. In addition, doubly robust estimation and alternative optimality criterion such as that based on Gini's mean difference or weighted quantiles are investigated. Numerical simulations demonstrate the performance of the proposed estimator. A data example from a trial in HIV+ patients is used to illustrate the application.
Article
This paper concerns the problem of allocating a binary treatment among a target population based on observed covariates. The goal is (i) to maximize the mean social welfare arising from the eventual outcome distribution when a budget constraint limits what fraction of the population can be treated, and (ii) to infer the dual value, i.e., the minimum resources needed to attain a specific level of mean welfare via efficient treatment assignment. We consider a treatment allocation procedure based on sample data from randomized treatment assignment and derive asymptotic frequentist confidence intervals for the welfare generated from it. We propose choosing the conditioning covariates through cross-validation. The methodology is applied to the efficient provision of anti-malaria bed net subsidies, using data from a randomized experiment conducted in Western Kenya. We find that subsidy allocation based on wealth, presence of children, and possession of a bank account can lead to a rise in subsidy use of about 9 percentage points compared to allocation based on wealth only, and of 17 percentage points compared to a purely random allocation.
Article
We use administrative records on the incomes of more than 40 million children and their parents to describe three features of intergenerational mobility in the United States. First, we characterize the joint distribution of parent and child income at the national level. The conditional expectation of child income given parent income is linear in percentile ranks. On average, a 10 percentile increase in parent income is associated with a 3.4 percentile increase in a child’s income. Second, intergenerational mobility varies substantially across areas within the United States. For example, the probability that a child reaches the top quintile of the national income distribution starting from a family in the bottom quintile is 4.4% in Charlotte but 12.9% in San Jose. Third, we explore the factors correlated with upward mobility. High mobility areas have (i) less residential segregation, (ii) less income inequality, (iii) better primary schools, (iv) greater social capital, and (v) greater family stability. Although our descriptive analysis does not identify the causal mechanisms that determine upward mobility, the publicly available statistics on intergenerational mobility developed here can facilitate research on such mechanisms. JEL Codes: H0, J0, R0.
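The rank-rank relationship behind the first finding can be stated compactly; the rendering is mine, with the slope taken from the abstract's numbers.

```latex
% Child and parent incomes converted to percentile ranks R:
\mathbb{E}\bigl[R^{\mathrm{child}}_i \,\big|\, R^{\mathrm{parent}}_i\bigr]
  \;=\; \alpha + \beta\, R^{\mathrm{parent}}_i,
\qquad \hat{\beta} \approx 0.34,
```

so a 10 percentile increase in parent rank corresponds to roughly a 3.4 percentile increase in child rank.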
Article
We discuss the tension between 'what we can get' (identification) and 'what we want' (parameters of interest) in models of policy choice (treatment assignment). Our nonstandard empirical object of interest is the ranking of counterfactual policies. Partial identification of treatment effects maps into a partial welfare ranking of treatment assignment policies. We characterize the identified ranking and show how the identifiability of the ranking depends on identifying assumptions, the feasible policy set, and distributional preferences. An application to the Project STAR experiment illustrates this dependence. This paper connects the literatures on partial identification, robust statistics, and choice under Knightian uncertainty.
Article
We introduce a class of so-called conditional U-statistics, which generalize the Nadaraya-Watson estimate of a regression function in the same way as Hoeffding’s classical U-statistic is a generalization of the sample mean. Asymptotic normality and weak and strong consistency are proved.
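For a degree-two kernel h, the object in question is the kernel-weighted analog of a U-statistic; the notation below is a standard rendering, not a quotation.

```latex
% Conditional U-statistic estimating E[h(Y_1,Y_2) | X_1 = x_1, X_2 = x_2],
% with kernel K and bandwidth b_n:
\widehat{U}_n(x_1, x_2)
  \;=\; \frac{\sum_{i \ne j} h(Y_i, Y_j)\,
              K\!\bigl(\tfrac{x_1 - X_i}{b_n}\bigr)\,
              K\!\bigl(\tfrac{x_2 - X_j}{b_n}\bigr)}
             {\sum_{i \ne j}
              K\!\bigl(\tfrac{x_1 - X_i}{b_n}\bigr)\,
              K\!\bigl(\tfrac{x_2 - X_j}{b_n}\bigr)},
```

which collapses to the Nadaraya-Watson estimator for a degree-one kernel and to Hoeffding's U-statistic when the weights are constant.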
Article
This paper considers an individual making a treatment choice. The individual has access to data on other individuals, with values for a list of characteristics, treatment assignments, and outcomes. The individual knows his own values for the list of characteristics. The goal is to use this data set to guide his treatment choice. The paper develops the role of treatment assignment and how it affects the specification of prior distributions. The likelihood function is the same for random assignment and for selection on observables, but the prior distributions differ. A question here is whether there is value in knowing the propensity score. The propensity score does not appear in the likelihood, but it does appear in the prior distribution, so there is value to knowing the propensity score if the prior is not dominated by the data. In particular, the list of measured characteristics may be of high dimension, and the paper considers prior distributions that may be effective in this case. The paper also considers selection on unobservables and the use of instrumental variables, where the prior distribution is not dominated by the data. We make a particular suggestion, in which the undominated part of the prior shows up in the choice of a functional form, which is then combined with a maximum-likelihood approximation to obtain a decision rule. We discuss the role of extrapolation in this decision rule by making a connection with compliers, always-takers, and never-takers in the local average treatment effect framework developed by Imbens and Angrist (1994).
Article
This paper studies the problem of treatment choice between a status quo treatment with a known outcome distribution and an innovation whose outcomes are observed only in a finite sample. I evaluate statistical decision rules, which are functions that map sample outcomes into the planner’s treatment choice for the population, based on regret, which is the expected welfare loss due to assigning inferior treatments. I extend previous work started by Manski (2004) that applied the minimax regret criterion to treatment choice problems by considering decision criteria that asymmetrically treat Type I regret (due to mistakenly choosing an inferior new treatment) and Type II regret (due to mistakenly rejecting a superior innovation) and derive exact finite sample solutions to these problems for experiments with normal, Bernoulli and bounded distributions of outcomes. The paper also evaluates the properties of treatment choice and sample size selection based on classical hypothesis tests and power calculations in terms of regret.
Article
Upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt. It is assumed that the range of each summand of S is bounded or bounded above. The bounds for Pr {S – ES ≥ nt} depend only on the endpoints of the ranges of the summands and the mean, or the mean and the variance of S. These results are then used to obtain analogous inequalities for certain sums of dependent random variables such as U statistics and the sum of a random sample without replacement from a finite population.
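The best-known special case of these bounds, for independent summands with ranges $[a_i, b_i]$:

```latex
\Pr\{\, S - \mathbb{E}S \ge nt \,\}
  \;\le\; \exp\!\left( -\,\frac{2 n^2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \right),
\qquad t > 0 .
```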
Article
John Rawls's work (1971) has greatly contributed to rehabilitating equality as a basic social value, after decades of utilitarian hegemony, particularly in normative economics, but Rawls also emphasized that full equality of welfare is not an adequate goal either. This thesis was echoed in Dworkin's famous twin papers on equality (Dworkin 1981a,b), and it is now widely accepted that egalitarianism must be selective. The bulk of the debate on 'Equality of What?' thus deals with what variables ought to be submitted for selection and how this selection ought to be carried out.
Article
Let V ⊆ {0, 1}^n have Vapnik-Chervonenkis dimension d. Let M(k/n, V) denote the cardinality of the largest W ⊆ V such that any two distinct vectors in W differ on at least k indices. We show that M(k/n, V) ≤ (cn/(k + d))^d for some constant c, which improves on the previously best known bound. This new result has applications in the theory of empirical processes.
Article
Often researchers find parametric models restrictive and sensitive to deviations from the parametric specifications; semi-nonparametric models are more flexible and robust, but lead to other complications such as introducing infinite-dimensional parameter spaces that may not be compact and the optimization problem may no longer be well-posed. The method of sieves provides one way to tackle such difficulties by optimizing an empirical criterion over a sequence of approximating parameter spaces (i.e., sieves); the sieves are less complex but are dense in the original space and the resulting optimization problem becomes well-posed. With different choices of criteria and sieves, the method of sieves is very flexible in estimating complicated semi-nonparametric models with (or without) endogeneity and latent heterogeneity. It can easily incorporate prior information and constraints, often derived from economic theory, such as monotonicity, convexity, additivity, multiplicity, exclusion and nonnegativity. It can simultaneously estimate the parametric and nonparametric parts in semi-nonparametric models, typically with optimal convergence rates for both parts.This chapter describes estimation of semi-nonparametric econometric models via the method of sieves. We present some general results on the large sample properties of the sieve estimates, including consistency of the sieve extremum estimates, convergence rates of the sieve M-estimates, pointwise normality of series estimates of regression functions, root-n asymptotic normality and efficiency of sieve estimates of smooth functionals of infinite-dimensional parameters. Examples are used to illustrate the general results.
Article
This paper applies the minimax regret criterion to choice between two treatments conditional on observation of a finite sample. The analysis is based on exact small sample regret and does not use asymptotic approximations or finite-sample bounds. Core results are: (i) minimax regret treatment rules are well approximated by empirical success rules in many cases, but differ from them significantly, both in terms of how the rules look and in terms of maximal regret incurred, for small sample sizes and certain sample designs; (ii) absent prior cross-covariate restrictions on treatment outcomes, they prescribe inference that is completely separate across covariates, leading to no-data rules as the support of a covariate grows. I conclude by offering an assessment of these results.
Article
When incomes are ranked in descending order, the social-evaluation function corresponding to the Gini relative inequality index can be written as a linear function with the weights being the odd numbers in increasing order. We generalize this function by allowing the weights to be an arbitrary non-decreasing sequence of numbers. This results in a class of generalized Gini relative inequality indices and a class of generalized Gini absolute inequality indices. An axiomatic characterization of the latter class is also provided.
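Concretely, with incomes ranked in descending order $y_{[1]} \ge \dots \ge y_{[n]}$, the functions in question are (under my normalization):

```latex
% Gini social-evaluation function and its generalized-Gini extension:
W_{\mathrm{Gini}} \;=\; \frac{1}{n^2} \sum_{i=1}^{n} (2i - 1)\, y_{[i]}
  \;=\; \mu\,(1 - G),
\qquad
W_{a} \;=\; \sum_{i=1}^{n} a_i\, y_{[i]},
\quad a_1 \le a_2 \le \dots \le a_n,
```

where replacing the odd weights $1, 3, 5, \dots$ by an arbitrary non-decreasing sequence $a$ yields the generalized Gini class.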
Book
This is the online version of a book written by Michael R. Kosorok and published by Springer Science+Business Media in 2006; it contains the first 14 chapters. The aim of this book is to introduce statisticians, and researchers with a background in statistics, to empirical processes and semiparametric inference. It contains three parts. The first part provides an overview of basic concepts in both empirical processes and semiparametric inference. The second part is devoted to empirical processes, whilst the third part is devoted to semiparametric efficiency and inference. References are also provided.
Article
There is little evidence on unemployment duration and its determinants in developing countries. This study is on the duration aspect of unemployment in a developing country, Turkey. We analyze the determinants of the probability of leaving unemployment for employment or the hazard rate. The effects of the personal and household characteristics and the local labor market conditions are examined. The analyses are carried out for men and women separately. The results indicate that the nature of unemployment in Turkey exhibits similarities to the unemployment in both the developed and the developing countries.
Article
An important objective of empirical research on treatment response is to provide decision makers with information useful in choosing treatments. This paper studies minimax-regret treatment choice using the sample data generated by a classical randomized experiment. Consider a utilitarian social planner who must choose among the feasible statistical treatment rules, these being functions that map the sample data and observed covariates of population members into a treatment allocation. If the planner knew the population distribution of treatment response, the optimal treatment rule would maximize mean welfare conditional on all observed covariates. The appropriate use of covariate information is a more subtle matter when only sample data on treatment response are available. I consider the class of conditional empirical success rules; that is, rules assigning persons to treatments that yield the best experimental outcomes conditional on alternative subsets of the observed covariates. I derive a closed-form bound on the maximum regret of any such rule. Comparison of the bounds for rules that condition on smaller and larger subsets of the covariates yields sufficient sample sizes for productive use of covariate information. When the available sample size exceeds the sufficiency boundary, a planner can be certain that conditioning treatment choice on more covariates is preferable (in terms of minimax regret) to conditioning on fewer covariates.
Article
This paper evaluates a pilot program run by a company called OPOWER, previously known as Positive Energy, to mail home energy reports to residential utility consumers. The reports compare a household's energy use to that of its neighbors and provide energy conservation tips. Using data from a randomized natural field experiment with 80,000 treatment and control households in Minnesota, I estimate that the monthly program reduces energy consumption by 1.9 to 2.0 percent relative to baseline. In a treatment arm receiving reports each quarter, the effects decay in the months between letters and increase again upon receipt of the next letter. This suggests either that the energy conservation information is not useful across seasons or, perhaps more interestingly, that consumers' motivation or attention is malleable and non-durable. I show that "profiling," or using a statistical decision rule to target the program at households whose observable characteristics suggest larger treatment effects, could substantially improve cost effectiveness in future programs. The effects of this program provide additional evidence that non-price "nudges" can substantially affect consumer behavior.
Article
The Kakwani decomposition of redistributive effect into vertical and reranking terms is one of the most widely used tools in the measurement of income redistribution. This paper describes how the decomposition emerged, how its proponents expanded and upgraded it, and how extensively it has been employed in empirical research. However, arguments are presented that the decomposition suffers from certain methodological problems, and its reinterpretation is therefore called for.
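For reference, the decomposition under discussion, in standard notation: $G_X$ and $G_N$ are the pre- and post-tax Gini coefficients and $C_N$ is the concentration coefficient of post-tax income with units kept in their pre-tax rank order.

```latex
\underbrace{G_X - G_N}_{\text{redistributive effect}}
  \;=\; \underbrace{(G_X - C_N)}_{\text{vertical effect } V}
  \;-\; \underbrace{(G_N - C_N)}_{\text{reranking } R}.
```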
Article
Boosting is one of the most significant advances in machine learning for classification and regression. In its original and computationally flexible version, boosting seeks to minimize a loss function empirically in a greedy fashion. The resulting estimator takes an additive functional form and is built iteratively by applying a base estimator (or learner) to updated samples depending on the previous iterations. An unusual regularization technique, early stopping, is employed based on cross-validation or a test set.
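A toy version of componentwise L2-boosting with holdout-based early stopping, to make the loop concrete; all names are illustrative, and real implementations use cross-validation and richer base learners.

```python
# Componentwise L2-boosting: at each step, fit the single covariate that best
# explains the current residuals, take a small step, and keep the iterate
# with the lowest holdout loss (early stopping).
import numpy as np

def l2_boost(X, y, X_val, y_val, step=0.1, max_iter=500):
    n, p = X.shape
    beta = np.zeros(p)
    best_beta, best_loss = beta.copy(), np.inf
    resid = y - X @ beta
    for _ in range(max_iter):
        coefs = X.T @ resid / (X ** 2).sum(axis=0)      # per-covariate OLS fit
        sse = ((resid[:, None] - X * coefs) ** 2).sum(axis=0)
        j = int(np.argmin(sse))                          # best base learner
        beta[j] += step * coefs[j]
        resid = y - X @ beta
        val_loss = np.mean((y_val - X_val @ beta) ** 2)
        if val_loss < best_loss:
            best_loss, best_beta = val_loss, beta.copy()
    return best_beta                                     # early-stopped model
```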
Belloni, A. and V. Chernozhukov (2013): "Least squares after model selection in high-dimensional sparse models," Bernoulli, 19, 521-547.
Cui, Y. and S. Han (2024): "Policy Learning with Distributional Welfare."
Dalton, H. (1920): "The measurement of the inequality of incomes," The Economic Journal, 30, 348-361.
Donaldson, D. and J. A. Weymark (1983): "Ethically flexible Gini indices for income distributions in the continuum," Journal of Economic Theory, 29, 353-358.
Escanciano, J. C. and J. R. Terschuur (2023): "Machine Learning Inference on Inequality of Opportunity," arXiv preprint arXiv:2206.05235.
Hufe, P., A. Peichl, P. Schüle, and J. Todorović (2022): "Fairness in Europe: A Multidimensional Comparison," CESifo Forum, 23, 45-51.