John M. Abowd

  • Cornell University


Publications: 227
Reads: 26,980
Citations: 11,444
Current institution
Cornell University


Publications (227)
Preprint
Full-text available
We use place of birth information from the Social Security Administration linked to earnings data from the Longitudinal Employer-Household Dynamics Program and detailed race and ethnicity data from the 2010 Census to study how long-term earnings differentials vary by place of birth for different self-identified race and ethnicity categories. We foc...
Article
We study mixed-effects methods for estimating equations containing person and firm effects. In economics, these models are generally estimated using fixed-effects methods. Recent improvements to these fixed-effects methods include corrections for the bias in estimating the covariance matrix...
Article
Full-text available
The use of formal privacy to protect the confidentiality of responses in the 2020 Decennial Census of Population and Housing has triggered renewed interest and debate over how to measure the disclosure risks and societal benefits of the published data products. We argue that any proposal for quantifying disclosure risk should be based on prespecifi...
Preprint
Full-text available
We study mixed-effects methods for estimating equations containing person and firm effects. In economics such models are usually estimated using fixed-effects methods. Recent enhancements to those fixed-effects methods include corrections to the bias in estimating the covariance matrix of the person and firm effects, which we also consider.
Preprint
Full-text available
This chapter examines the motivations and imperatives for modernizing how statistical agencies approach statistical disclosure limitation for official data product releases. It discusses the implications for agencies' broader data governance and decision-making, and it identifies challenges that agencies will likely face along the way. In conclusio...
Article
Full-text available
This paper is part of the Global Repository of Income Dynamics (GRID) project cross‐country comparison of earnings inequality, volatility, and mobility. Using data from the U.S. Census Bureau's Longitudinal Employer‐Household Dynamics (LEHD) infrastructure files, we produce a uniform set of earnings statistics for the U.S. From 1998 to 2019, we fin...
Article
In an era where external data and computational capabilities far exceed statistical agencies’ own resources and capabilities, they face the renewed challenge of protecting the confidentiality of underlying microdata when publishing statistics in very granular form and ensuring that these granular data are used for statistical purposes only. Convent...
Article
This paper is part of a coordinated collection of papers on prime-age male earnings volatility. Each paper produces a similar set of statistics for the same reference population using a different primary data source. Our primary data source is the Census Bureau’s Longitudinal Employer-Household Dynamics (LEHD) infrastructure files. Using LEHD data...
Preprint
Full-text available
The purpose of this paper is to guide interpretation of the semantic privacy guarantees for some of the major variations of differential privacy, which include pure, approximate, Rényi, zero-concentrated, and f-differential privacy. We interpret privacy-loss accounting parameters, frequentist semantics, and Bayesian semantics (including new res...
Preprint
Full-text available
This article is an edited transcript of the session of the same name at the 38th Annual NABE Economic Policy Conference: Policy Options for Sustainable and Inclusive Growth. The panelists are experts from government and private research organizations.
Article
This article is an edited transcript of the session of the same name at the 38th Annual NABE Economic Policy Conference: Policy Options for Sustainable and Inclusive Growth. The panelists are experts from government and private research organizations.
Article
There is a large literature on earnings and income volatility in labor economics, household finance, and macroeconomics. One strand of that literature has studied whether individual earnings volatility has risen or fallen in the United States over the last several decades. There are strong disagreements in the empirical literature on this important...
Preprint
Full-text available
In an era where external data and computational capabilities far exceed statistical agencies' own resources and capabilities, they face the renewed challenge of protecting the confidentiality of underlying microdata when publishing statistics in very granular form and ensuring that these granular data are used for statistical purposes only. Convent...
Preprint
Full-text available
The Census TopDown Algorithm (TDA) is a disclosure avoidance system using differential privacy for privacy-loss accounting. The algorithm ingests the final, edited version of the 2020 Census data and the final tabulation geographic definitions. The algorithm then creates noisy versions of key queries on the data, referred to as measurements, using...
Preprint
Full-text available
The TopDown Algorithm (TDA) first produces differentially private counts at the national level and then produces counts at lower geolevels (e.g., state and county) subject to the constraint that all query answers in lower geolevels are consistent with those at previously estimated geolevels. This paper describes the three sets of definitions of these ge...
Preprint
Full-text available
There is a large literature on earnings and income volatility in labor economics, household finance, and macroeconomics. One strand of that literature has studied whether individual earnings volatility has risen or fallen in the U.S. over the last several decades. There are strong disagreements in the empirical literature on this important question...
Preprint
Full-text available
This paper is part of the Global Income Dynamics Project cross-country comparison of earnings inequality, volatility, and mobility. Using data from the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) infrastructure files we produce a uniform set of earnings statistics for the U.S. From 1998 to 2019, we find U.S. earnings inequa...
Preprint
Full-text available
Privacy-protected microdata are often the desired output of a differentially private algorithm since microdata is familiar and convenient for downstream users. However, there is a statistical price for this kind of convenience. We show that an uncertainty principle governs the trade-off between accuracy for a population of interest ("sum query") vs...
Chapter
This chapter provides an overview of the methods that have been developed and implemented to safeguard privacy, while providing researchers the means to draw valid conclusions from protected data. It focuses on the protections that pertain to the linked nature of the data. The protection mechanisms are both physical and statistical, but exist becau...
Article
We report results from the first comprehensive total quality evaluation of five major indicators in the U.S. Census Bureau’s Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): total flow-employment, beginning-of-quarter employment, full-quarter employment, average monthly earnings of full-quarter employees...
Preprint
Full-text available
This paper is one of a collection of papers on prime-age male earnings volatility. Each paper produces a similar set of statistics for the same reference population using a different primary data source. Our primary data source is the Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) infrastructure files. Using LEHD data from 1998 to...
Preprint
Full-text available
We report results from the first comprehensive total quality evaluation of five major indicators in the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): total flow-employment, beginning-of-quarter employment, full-quarter employment, average monthly earnings of full-quarter employees...
Preprint
Full-text available
With vast databases at their disposal, private tech companies can compete with public statistical agencies to provide population statistics. However, private companies face different incentives to provide high-quality statistics and to protect the privacy of the people whose data are used. When both privacy protection and statistical accuracy are p...
Article
When Google or the US Census Bureau publishes detailed statistics on browsing habits or neighborhood characteristics, some privacy is lost for everybody while supplying public information. To date, economists have not focused on the privacy loss inherent in data publication. In their stead, these issues have been advanced almost exclusively by comp...
Article
In 2020, the U.S. Census Bureau will conduct the Constitutionally mandated decennial Census of Population and Housing. Because a census involves collecting large amounts of private data under the promise of confidentiality, traditionally statistics are published only at high levels of aggregation. Published statistical tables are vulnerable to data...
Article
Statistical agencies face a dual mandate to publish accurate statistics while protecting respondent privacy. Increasing privacy protection requires decreased accuracy. Recognizing this as a resource allocation problem, we propose an economic solution: operate where the marginal cost of increasing privacy equals the marginal benefit. Our model of pr...
Article
Full-text available
As readers of this Journal know, I paid my tribute to Steve Fienberg in my 2016 Julius Shiskin Lecture: "Finally, I would like to acknowledge the role of Stephen Fienberg of Carnegie Mellon University. I'm sure almost everyone in this auditorium...
Preprint
Full-text available
When differential privacy was created more than a decade ago, the motivating example was statistics published by an official statistics agency. In attempting to transition differential privacy from the academy to practice, the U.S. Census Bureau has encountered many challenges unanticipated by differential privacy's creators. These challenges inclu...
Article
With the dramatic improvement in both computer speeds and the efficiency of SAT and other NP-hard solvers in the last decade, DRAs on statistical databases are no longer just a theoretical danger. The vast quantity of data products published by statistical agencies each year may give a determined attacker more than enough information to reconstruct...
Preprint
Full-text available
Statistical agencies face a dual mandate to publish accurate statistics while protecting respondent privacy. Increasing privacy protection requires decreased accuracy. Recognizing this as a resource allocation problem, we propose an economic solution: operate where the marginal cost of increasing privacy equals the marginal benefit. Our model of pr...
Conference Paper
The U.S. Census Bureau announced, via its Scientific Advisory Committee, that it would protect the publications of the 2018 End-to-End Census Test (E2E) using differential privacy. The E2E test is a dress rehearsal for the 2020 Census, the constitutionally mandated enumeration of the population used to reapportion the House of Representatives and r...
Article
Decomposing the year-to-year changes in the earnings distribution from 2004 to 2013, we analyze the role of the employer in explaining earnings inequality in the United States. Movements between the bottom, middle, and top involve 20.5 million workers each year. Another 19.9 million move between employment and nonemployment. There are large gains f...
Article
Full-text available
The National Science Foundation-Census Bureau Research Network (NCRN) was established in 2011 to create interdisciplinary research nodes on methodological questions of interest and significance to the broader research community and to the Federal Statistical System (FSS), particularly to the Census Bureau. The activities to date have covered both f...
Article
We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer-employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to un...
Article
Full-text available
The dual problems of respecting citizen privacy and protecting the confidentiality of their data have become hopelessly conflated in the “Big Data” era. There are orders of magnitude more data outside an agency’s firewall than inside it—compromising the integrity of traditional statistical disclosure limitation methods. And increasingly the informa...
Conference Paper
Full-text available
National statistical agencies around the world publish tabular summaries based on combined employer-employee (ER-EE) data. The privacy of both the individuals and the business establishments that feature in these data is protected by law in most countries. These data are currently released using a variety of statistical disclosure limitation (SDL) techniq...
Article
Government statistical agencies collect enormously valuable data on the nation's population and business activities. Wide access to these data enables evidence-based policy making, supports new research that improves society, facilitates training for students in data science, and provides resources for the public to better understand and participat...
Article
Full-text available
In contrast to the many public-use microdata samples available for individual and household data from many statistical agencies around the world, there are virtually no establishment or firm microdata available. In large part, this difficulty in providing access to business microdata is due to the skewed and sparse distributions that characterize b...
Article
We use the bipartite graph representation of longitudinally linked employer-employee data, and the associated projections onto the employer and employee nodes, respectively, to characterize the set of potential statistical summaries that the trusted custodian might produce. We consider noise infusion as the primary confidentiality protection method...
Article
Organizations disseminate statistical summaries of administrative data via the Web for unrestricted public use. They balance the trade-off between protection of confidentiality and quality of inference. Recent developments in disclosure avoidance techniques include the incorporation of synthetic data, which capture the essential features of underly...
Research
We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer-employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to un...
Article
We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer-employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to unm...
Article
Full-text available
This paper explores the consequences for economic research of methods used by data publishers to protect the privacy of their respondents. We review the concept of statistical disclosure limitation for an audience of economists who may be unfamiliar with these methods. We characterize what it means for statistical disclosure limitation to be ignora...
Article
Full-text available
We estimate “CEO style” wage equations using two longitudinal matched employer-employee data sets. We develop a model in which the wages of all employees in a firm are linked to those of top management, which generates specific empirical implications that have already been tested on CEOs. We provide evidence that the wages of “regular” employees in...
Article
Full-text available
We test for sorting of workers between and within industrial sectors in a directed search model with coordination frictions. We fit the model to sector-specific vacancy and output data along with publicly-available statistics that characterize the distribution of worker and employer wage heterogeneity across sectors. Our empirical method is general...
Article
We use the bipartite graph representation of longitudinally linked employer-employee data, and the associated projections onto the employer and employee nodes, respectively, to characterize the set of potential statistical summaries that the trusted custodian might produce. We consider noise infusion as the primary confidentiality protection method...
Article
Full-text available
We propose a new methodology that does not assume a prior specification of the statistical properties of the measurement errors and treats all sources as noisy measures of some underlying true value. The unobservable true value can be represented as a weighted average of all available measures, using weights that must be specified a priori unless t...
Article
Full-text available
In this paper, matched employer-employee data constructed by the United States Census Bureau are used to study the effects of co-worker ability on wages, with a multiple imputation procedure used to complete missing data on job tenure. Basic wage equations are augmented by the addition of average years of schooling and tenure within each firm. The...
Article
Full-text available
Social science researchers increasingly make use of data that is confidential because it contains linkages to the identities of people, corporations, etc. The value of this data lies in the ability to join the identifiable entities with external data, such as genome data, geospatial information, and the like. However, the confidentiality of this da...
Conference Paper
Full-text available
We develop the core of a method for solving the data archive and curation problem that confronts the custodians of restricted-access research data and the scientific users of such data. Our solution recognizes the dual protections afforded by physical security and access limitation protocols. It is based on extensible tools and can be easily incorp...
Article
The Census Bureau's Quarterly Workforce Indicators (QWI) provide detailed quarterly statistics on employment measures such as worker and job flows, tabulated by worker characteristics in various combinations. The data are released for several levels of NAICS industries and geography, the lowest aggregation of the latter being counties. Disclosure a...
Article
Full-text available
We use the Census Bureau's Quarterly Workforce Indicators and the Federal Housing Finance Agency's House Price Indices to study the effects of the housing price bubble on local labor markets. We show that the 35 MSAs in the top decile of the house price boom were most severely impacted. Their stable job employment fell much more than the national a...
Article
Full-text available
We consider a particular maximum likelihood estimator (MLE) and a computationally-intensive Bayesian method for differentially private estimation of the linear mixed-effects model (LMM) with normal random errors. The LMM is important because it is used in small area estimation and detailed industry tabulations that present significant challenges fo...
Conference Paper
Full-text available
We consider a differentially private MLE for the linear mixed-effects model with normal random errors. This model is important because it is frequently used in small area estimation and detailed industry tabulations that present significant challenges for confidentiality protection of the underlying data. The differentially private estimator perfor...
Article
Full-text available
Using gross flows of workers into and out of employment, we investigate the composition of flows in non-recessionary periods as well as in the Great Recession of 2008-2009. In particular, we use gross flows at highly detailed geographic and demographic levels to assess whether particular demographic groups are less affected by the sharp changes in...
Article
Full-text available
In this paper, we describe the sensitivity of small-cell flow statistics to coding errors in the identity of the underlying entities. Specifically, we present results based on a comparison of the U.S. Census Bureau's Quarterly Workforce Indicators (QWI) before and after correcting for such errors in SSN-based identifiers in the underlying individua...
Article
The Quarterly Workforce Indicators (QWI) are local labor market data produced and released every quarter by the United States Census Bureau. Unlike any other local labor market series produced in the U.S. or the rest of the world, QWI measure employment flows for workers (accession and separations), jobs (creations and destructions) and earnings fo...
Article
In most countries, national statistical agencies do not release establishment-level business microdata, because doing so represents too large a risk to establishments' confidentiality. One approach with the potential for overcoming these risks is to release synthetic data; that is, the released establishment data are simulated from statistical mode...
Article
Full-text available
We study the effects of endogenous mobility on the linear decomposition of log wage rates into observable, personal, and employer heterogeneity. The Abowd, Kramarz and Margolis (1999) method for estimating such models under the assumption of exogenous mobility produces residuals that can be used to produce tests for the validity of the assumption....
Article
Full-text available
When the founders of this Journal -- Cynthia Dwork, Stephen Fienberg and Alan Karr -- made its initial call for papers, they and we identified many constituencies that participate in the scientific analysis of privacy and confidentiality. Statisticians, particularly those working within national statistical offices, have developed the field of stat...
Article
This chapter provides detailed documentation of the Longitudinal Employer-Household Dynamics (LEHD) Program data sources and the methods used to construct the Quarterly Workforce Indicators (QWI). It provides a valuable reference source for users of the QWI and the LEHD. The procedures implemented in the LEHD Infrastructure Files to detect, edit,...
Chapter
This chapter uses the Longitudinal Employer-Household Dynamics (LEHD) data set to assess the human capital embodied in a firm's workforce and relate it to the performance of the firm. It observes that mass layoffs and firm failure are much more likely in firms with a large proportion of low human capital workers. Firms that do not fail generally up...
Chapter
This chapter investigates the sources of variation in two core outcomes of interest to economists in the United States—the earnings distribution and mobility patterns—and presents a brief literature review and institutional background. It also briefly discusses the new database infrastructure, and then reports some basic statistics about the struct...
Article
Parker and Van Praag (2009) showed, based on theory, that the group status of the profession ‘entrepreneurship’ shapes people’s occupational preferences and thus their choice behavior. The current study focuses on the determinants and consequences of the group status of a profession, entrepreneurship in particular. If the group status of entreprene...
Conference Paper
This short paper provides a synthesis of the statistical disclosure limitation and computer science data privacy approaches to measuring the confidentiality protections provided by fully synthetic data. Since all elements of the data records in the release file derived from fully synthetic data are sampled from an appropriate probability distributi...
Conference Paper
Full-text available
In this paper, we propose the first formal privacy analysis of a data anonymization process known as the synthetic data generation, a technique becoming popular in the statistics community. The target application for this work is a mapping program that shows the commuting patterns of the population of the United States. The source data for this app...
Chapter
Full-text available
In this article we address the econometric issues associated with analyses of these data, in particular with longitudinal linked employer-employee data. The key feature of such data is that individuals and employing firms are both identified and followed over time. Measured characteristics of the individual are collected at multiple points in time...
Article
Full-text available
In this paper we examine the earnings covariance matrix generated from a ten-year time series and estimate a variance components model that parameterizes the process generating earnings. We use our estimated variance components to test key hypotheses concerning life-cycle human capital investment and labor supply separately for men and women. Huma...
Article
Beyond the traditional methods of tabulations and public-use microdata samples, statistical agencies have developed four key alternatives for providing non-government researchers with access to confidential microdata to improve statistical modeling. The first, licensing, allows qualified researchers access to confidential microdata at their own fac...
Article
Full-text available
The U.S. economy is highly dynamic: businesses open and close, workers switch jobs and start new enterprises, and innovative technologies redefine the workplace and enhance productivity. With globalization, markets have also become more interconnected. Measuring business activity in this rapidly evolving environment increasingly requires tracking co...
Article
Full-text available
We estimate the effects of technology investments on the demand for skilled workers using longitudinally integrated employer-employee data from the U.S. Census Bureau’s Longitudinal Employer-Household Dynamics Program infrastructure files spanning two Economic Censuses (1992 and 1997). We estimate the distribution of human capital and its observabl...
