Estimating Measurement Error in Annual Job Earnings: A Comparison of Survey and Administrative Data
The Review of Economics and Statistics
Vol. XCV    December 2013    Number 5
John M. Abowd and Martha H. Stinson*
Abstract—We propose a new methodology that does not assume a prior specification of the statistical properties of the measurement errors and treats all sources as noisy measures of some underlying true value. The unobservable true value can be represented as a weighted average of all available measures, using weights that must be specified a priori unless there has been a truth audit. The Census Bureau's Survey of Income and Program Participation (SIPP) survey jobs are linked to Social Security Administration earnings data, creating two potential annual earnings observations. The reliability statistics for both sources are quite similar except for cases where the SIPP used imputations for some missing monthly earnings reports.
I. Introduction
IN the large and long literature on measurement error, most
studies begin with data that are believed to contain errors
and look for a way to quantify those errors. The goal is to
account for and remove the effects of those errors. Defining
data errors fundamentally requires some measure of truth:
an objective standard by which the accuracy of the data can
be judged. A common approach to this measurement error
problem is to find a second data source that contains the
“truth” and define errors in the first data set as the differ-
ence between the two sources. We believe, however, that the
assumption that some data contain errors while other data do
not is fundamentally flawed. While the error-generating pro-
cess may be different in the two sources, no source is likely
to be completely error free.
Received for publication May 5, 2011. Revision accepted for publication
August 28, 2012.
* Abowd: Cornell University, Census Bureau NBER, CREST/INSEE, and
IZA; Stinson: U.S. Census Bureau.
We thank Joanne Pascale, Tracy Mattingly, Gary Benedetto, Julia Lane,
George Jakubson, David Johnson, Donald Kenkel, Simon Woodcock, Kevin
McKinney, Kristin Sandusky, Lars Vilhuber, and Marc Roemer for helpful
comments and support. We also acknowledge the comments of the edi-
tor and two referees whose input improved the paper substantially. This
report is released to inform interested parties of ongoing research and to
encourage discussion of work in progress. Any views expressed on sta-
tistical, methodological, technical, or operational issues are those of the
authors and not necessarily those of the U.S. Census Bureau, Cornell Uni-
versity, or any of the project sponsors. This work was partially supported by
the National Science Foundation Grant SES-9978093 and SES-0427889 to
Cornell University, the National Institute on Aging Grant R01-AG18854,
and the Alfred P. Sloan Foundation. J.A. also acknowledges direct support
from the Census Bureau and NSF grants SES-0339191, SES-1042181, and
SES-1131848. All data used in this paper are confidential. The U.S. Census
Bureau supports external researchers’ use of some of these data through the
Research Data Center network ( For public use data
please visit and click “Access SIPP Synthetic Data.”
A supplemental appendix is available online at http://www.mitpress
In this paper, we expand the measurement error litera-
ture in two ways. Our first contribution is to extend the
methodology by showing that defining truth with respect to
an observed quantity requires a researcher to place priors
on which source of data is the most reliable. These pri-
ors define the measurement error and its properties. After
recognizing this dependence on prior beliefs, we relax the
assumption that one source of data is truth. We show how
different priors lead to different measures of truth and, sub-
sequently, different amounts of error. We then specify and
estimate a multivariate linear mixed-effects model (MLMM)
in the spirit of Abowd and Card (1989) and consider the level
of error in both the fixed effects (the relationship between the
measure and the observable characteristics of respondents)
and the random effects (relationship between the measure
and the unobservable characteristics of respondents and their
employers), showing how different priors about the truth lead
to different conclusions. Our approach includes the special
case of declaring one source to be truth and can be used with
complete or incomplete data from any number of sources.
Our second contribution is to implement our method
using job-level annual earnings from two sources: the Cen-
sus Bureau’s Survey of Income and Program Participation
(SIPP) and the Internal Revenue Service (IRS)/Social Secu-
rity Administration (SSA) W-2 forms. We use the largest
nationally representative sample to date—five SIPP panels—matched at the job level to administrative data from the
SSA's Detailed Earnings Record (DER) to study the measurement error in earnings for these important national data. Our
comprehensive data allow us to combine the early focus in
the survey measurement error literature on employer versus
employee reports, which were inherently job-level studies,
with the later focus on comprehensive samples of survey
respondents regularly used by empirical researchers, which
were conducted at the person level because the particu-
lar survey studied, the Current Population Survey, collected
complete data only at the person level.1
1When both job-level and person-level earnings data are available, as in
the SIPP, many studies use the job-level earnings measure, in combination
with the survey hours measures, to construct outcomes and control variables
of interest. See, for example, Gottschalk and Moffitt (1999), which uses the
SIPP, CPS, and PSID in exactly this way, to study the effects of employer
The Review of Economics and Statistics, December 2013, 95(5): 1451–1467
No rights reserved. This work was authored as part of the Contributor’s official duties as an Employee of the United States Government and is therefore a work of the United States Government.
In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. law.
We use the internal, confidential SIPP data that have
uncapped earnings and a carefully edited job history intended
to accurately track movement across jobs over the course of a
year. The confidential administrative data also have uncapped
earnings, measured separately for each employer. Among val-
idation studies done with survey data, our analysis uses the
most complete set of inputs to date. We analyze a range of
priors about the reliability of each data source. We report the
resulting measures of error for both the fixed and random
components of a multivariate earnings equation estimated by
residual maximum likelihood (REML).
The paper proceeds as follows. In section II, we review past
findings from studies of measurement error in survey earnings
and discuss how these relate to our study. In section III, we
lay out our methodological framework for identifying and
quantifying errors in earnings. We give a brief overview of
our data and the linking process between the two sources in
section IV. In section V, we present our results quantifying
error in earnings in the SIPP and W-2 data. We conclude in
section VI with thoughts on how both producers and users of
public use data might take account of measurement error.
II. Background
Early studies of measurement error, beginning with Fuller
(1987), defined observed quantities as the sum of unobserved
true values and error. Hence, a variable $Y_t$ is decomposed into $y_t + u_t$, the sum of the true value, $y_t$, and the measurement error, $u_t$. Assuming that $y_t$ and $u_t$ are uncorrelated, one can calculate a reliability ratio that gives the percentage of total variance that is true variance:
$$\kappa_{yy} = \frac{\mathrm{Cov}[y, Y]}{\mathrm{Var}[Y]} = \frac{\sigma_{yy}}{\sigma_{yy} + \sigma_{uu}}.$$
This statistic is important because when a second variable $A_t$ is a function of the true value $y_t$ plus some error, $A_t = \beta y_t + e_t$, but is regressed on $Y_t$, the reliability ratio defines the ratio of $\hat{\beta}$ to $\beta$. This is evident from the formula for the expected value of $\hat{\beta}$, which, when $\sigma_{yu} = 0$ and $\sigma_{ue} = 0$, is given by
$$E\big[\hat{\beta}\big] = \frac{\mathrm{Cov}[A, Y]}{\mathrm{Var}[Y]} = \frac{\beta\,\sigma_{yy}}{\sigma_{yy} + \sigma_{uu}} = \beta\,\kappa_{yy}.$$
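The attenuation implied by this algebra can be checked with a short simulation. The following is a minimal NumPy sketch; all variable names and parameter values are ours, chosen for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
sigma_yy, sigma_uu = 1.0, 0.25   # illustrative signal and error variances
beta = 2.0

y = rng.normal(0.0, np.sqrt(sigma_yy), n)   # true value y_t
u = rng.normal(0.0, np.sqrt(sigma_uu), n)   # classical error, independent of y_t
Y = y + u                                   # observed measure Y_t = y_t + u_t
A = beta * y + rng.normal(0.0, 1.0, n)      # outcome A_t = beta * y_t + e_t

kappa = sigma_yy / (sigma_yy + sigma_uu)    # reliability ratio, 0.8 here
beta_hat = np.cov(A, Y)[0, 1] / np.var(Y)   # OLS slope from regressing A on Y

# The estimated slope is attenuated toward zero by approximately kappa
print(kappa, beta_hat / beta)
```

Setting `sigma_uu` to zero returns the ratio to 1, recovering the error-free regression.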
Angrist and Krueger (1999) define a reliability ratio for first-differenced quantities, $\Delta Y_t = (Y_t - Y_{t-1}) = (y_t - y_{t-1}) + (u_t - u_{t-1})$:
$$\kappa_{\Delta y \Delta y} = \frac{\sigma_{yy}}{\sigma_{yy} + \sigma_{uu}\,\dfrac{(1-\tau)}{(1-\rho)}},$$
where $\tau$ is the autocorrelation coefficient of the measurement error and $\rho$ is the autocorrelation coefficient of $y_t$. If $\rho > \tau$, then $\frac{(1-\tau)}{(1-\rho)}$ is greater than 1 and the reliability ratio declines relative to the ratio for levels of $y_t$.
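The consequence is that differencing a persistent signal measured with weakly autocorrelated errors sharply lowers reliability. A numerical sketch, with parameter values that are purely illustrative:

```python
def reliability_levels(s_yy, s_uu):
    """Reliability ratio for a measure in levels."""
    return s_yy / (s_yy + s_uu)

def reliability_diffs(s_yy, s_uu, rho, tau):
    """Reliability ratio after first differencing: the error variance is
    effectively inflated by the factor (1 - tau) / (1 - rho)."""
    return s_yy / (s_yy + s_uu * (1 - tau) / (1 - rho))

# Persistent earnings signal (rho = 0.9), weakly correlated errors (tau = 0.2)
print(reliability_levels(1.0, 0.25))           # 0.8 in levels
print(reliability_diffs(1.0, 0.25, 0.9, 0.2))  # about 0.33 in differences
```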
Finally, if $Y_t$ is used as a dependent variable and regressed on $x_t$, and the covariance of $y_t$ and $u_t$ is not 0, then writing $u_t = \delta y_t + v_t$, where $\delta$ is called the attenuation bias, Bound et al. (1994) show that the coefficient on $x_t$ in the regression of $Y_t = (1+\delta)y_t + v_t$ on $x_t$ (where $y_t = x_t \beta + \varepsilon_t$) will be biased, since $E[\hat{\beta}] = (1+\delta)\beta$.
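The dependent-variable case can be simulated in the same spirit. This sketch assumes a mean-reverting error $u_t = \delta y_t + v_t$ with invented parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta, delta = 2.0, -0.2       # delta < 0: mean-reverting measurement error

x = rng.normal(0.0, 1.0, n)
y = beta * x + rng.normal(0.0, 1.0, n)     # true outcome y_t = x_t*beta + eps_t
u = delta * y + rng.normal(0.0, 0.3, n)    # error correlated with the truth
Y = y + u                                  # observed: (1 + delta)*y_t + v_t

beta_hat = np.cov(Y, x)[0, 1] / np.var(x)
print(beta_hat / beta)                     # close to 1 + delta = 0.8
```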
Hence, only if the variance and structure of the measure-
ment error are known can unbiased estimators of βin either
the direct or reverse regression be obtained. Those studying
measurement error have focused on estimating κyy and testing
whether the assumptions of classical measurement error were
violated. Studies that obtain a second report for the mismea-
sured variable Yin order to calculate σuu and σyy have been
called validation studies (as in, for example, Bound, Brown,
& Mathiowetz, 2001). Without exception, the second report
is treated as yt(i.e., truth) and the measurement errors are
calculated as ut=Yt(first report) yt(second report). The
properties of these errors are then investigated. Researchers
have often concluded that the assumptions of classical mea-
surement errors are violated and that the errors are correlated
with the true values, that is, $\sigma_{yu} \neq 0$. However, they acknowledge that their models are driven by the assumption that
they obtained a true measure of y. Without this assumption,
there would be no way to determine the relation between the
errors and the true values. This assumption is fundamentally
untestable and is justified solely by the authors’ knowledge of
the quality of the secondary data source. We briefly discuss
the history of this interpretation in the context of earnings
validation studies next.
Some prominent examples of validation studies are Mel-
low and Sider (1983), Duncan and Hill (1985), Bound et al.
(1994), Pischke (1995), Bound and Krueger (1991), Kapteyn
and Ypma (2007), Gottschalk and Huynh (2010), and Mei-
jer, Rohwedder, and Wansbeek (2012). The first four papers
are similar to ours in that they estimate measurement error
at the job level. Mellow and Sider (1983) used a special
supplement to the January 1977 Current Population Sur-
vey (CPS) that obtained employment information from both
workers and employers. Looking at matched pairs with
both employer and employee wage reports, they found that
employer-reported wages exceeded worker reports by 4.8%
on average. Although they did not calculate a reliability ratio
per se, they did test the sensitivity of statistical models and
concluded that wage regressions were generally not that sen-
sitive to the source of information: worker versus employer.
In particular, the returns to education and experience were
very similar across their different regression equations. This
result is consistent with a relatively high reliability ratio.
2 Bound et al. (1994) also derive a formula for the reliability ratio when $y_t$ and $u_t$ are correlated.
Like Mellow and Sider, Duncan and Hill (1985), Bound
et al. (1994), and Pischke (1995) examine two wage reports
for each job they observe. However, they do not use a rep-
resentative sample of people in the United States but rather
sample workers from a large anonymous manufacturing com-
pany. Workers at the company were interviewed using a Panel
Study of Income Dynamics (PSID) survey instrument, and
then information for these workers was obtained from com-
pany records. The authors treated the company reports of
annual earnings as measures of true earnings values and con-
sidered any differences between worker and employer reports
to be errors on the part of the workers, justifying this decision
as due to their confidence in the accuracy of the company's records. The survey was carried out in two waves, and the authors reported a ratio of noise to total variance ($\sigma_{uu}/(\sigma_{yy}+\sigma_{uu})$ in the notation above) of 0.302 for annual earnings
in 1986 and 0.151 for annual earnings in 1982. They found
evidence that errors in earnings were correlated with the true
levels of earnings and reported noise-to-variance ratios that
took account of this covariance as 0.239 in 1986 and 0.076 in
1982. They estimated the proportional attenuation bias when
using earnings as a dependent variable, $\delta$, as 0.172 for 1986 and 0.104 for 1982. Finally, the authors compared earnings
equations using the two data sources and found that relative to
the employer provided data, employee interview data over-
stated the return to education by 40% and understated the
return to tenure by 20%.
Bound and Krueger (1991) depart from the job-level
method and compare total earnings in a given year from two
sources. Like our study, they use survey and administrative
data, linking March 1977 and 1978 CPS respondents to Social
Security earnings records. They reported large negative correlations between measurement error and true earnings for both CPS reference years (1976 at −0.46 and 1977 at −0.42). They reported reliability ratios that did and did not take account of these correlations as 1.016 and 0.844, respectively, for 1976 and 0.974 and 0.819 for 1977.
Bound et al. (2001, p. 3832) summarized the general
approach of all of these studies as: “Those collecting val-
idation data usually begin with the intention of obtaining
‘true’ values against which the errors of survey reports can be
assessed; more often than not we end up with the realization
that the validation data are also imperfect. While much can
still be learned from such data, particularly if one is confident
the errors in the validation data are uncorrelated with those in
the survey reports, this means replacing one assumption (e.g.,
errors are uncorrelated with true values) with another (e.g.,
errors in survey reports uncorrelated with errors in validation data).”
Gottschalk and Huynh (2010) also studied person-level
earnings using matched SIPP and SSA data to conduct an
extensive investigation of the effects of measurement error
on earnings inequality and mobility. These are the same data
sources that we use, but they use only the 1996 SIPP panel.
Initially these authors do not label the SSA data as truth
and instead describe their study as quantifying differences
between the two sources. Nonetheless, their ability to study
the impact of SIPP error on mobility and inequality measures
hinges on a definition of error that requires them to declare
one source as truth. Using the DER as truth, the authors
estimate a traditional reliability ratio of 0.67 for the SIPP.
However, they conclude that measurement error is mean
reverting and show that in their framework, this type of error
partially offsets the bias in estimates of inequality in the SIPP.
They also conclude that measurement error is correlated over
time, and this diminishes the attenuation bias in the correla-
tion of earnings and lessens the impact of measurement error
on estimates of earnings mobility in the SIPP.
Kapteyn and Ypma (2007) use person-level linked sur-
vey and administrative data from Sweden. Like Gottschalk
and Huynh and our paper, they do not declare either source
to be true earnings, which formally is a latent variable in
their mixture of normals model. They use a prior specifica-
tion of the effects of misclassification errors and different
survey measurement errors to identify the posterior probabil-
ities associated with the administrative and survey measures
being true. Meijer et al. (2012) show that the Kapteyn and
Ypma model is a special case of a mixture factor model
with a specification very similar to ours. Their generalized
model also identifies the marginal prior probabilities, which
are treated as hyperparameters, and the conditional posterior
probabilities by making functional form assumptions about
the relation between the survey and administrative measures.
The relation between the identification strategy in these mix-
ture factor models and our multivariate linear mixed-effects
model is discussed in section IIIB.
A final related study by Roemer (2002) uses matched CPS,
SIPP, and DER data to study the distribution of annual earn-
ings. Rather than focus on reliability statistics and regression-
coefficient comparisons, Roemer compares the percentiles
of the annual earnings distributions from the three sources.
Treating the DER as truth, he concludes that both the SIPP
and CPS estimate a person’s percentile rank more accurately
than the dollar amount of earnings. In his analysis, the SIPP
displays a shortage of high-earning workers compared to the DER.
III. Statistical Model
A. Multivariate Linear Mixed-Effects Model
In this section we lay out the general statistical model
that we employ to estimate the fixed and random com-
ponents of our joint model for SIPP and DER earnings
outcomes observed on the same matched job. The underly-
ing specification is a multivariate linear mixed-effects model
(MLMM). The advantage of the MLMM framework is that it
shows with full generality how to accommodate two or more
matched observations of earnings on the same job, how to
vary the prior assumptions about which measure is “true”
systematically, and how to use external audit information,
if available, to update the posterior distribution over which
value is “true.
The outcome under study, which is the dependent variable in the model, is $y_{ist}$, a $1 \times Q$ vector of measures of log earnings for individual $i = 1, \ldots, I$ in sequential job spell $s \in \{1, \ldots, S\}$ and time period $t \in \{1, \ldots, T\}$, where $Q$ is the total number of sources of earnings reports. The indices $s$ and $t$ are always ordered sequentially, but not every $i$ has values for every level of $s$ and $t$. Define the vectors $x_{ist}$, the $1 \times K$ design of the fixed effects associated with sex, race, education, experience, and so on; $d_i$, the $1 \times I$ design of the random effects associated with person $i$; and $f_{ist}$, the $1 \times J$ design of the random effects associated with the employer of $i$ in job spell $s$ during period $t$. The full model is
$$y_{ist} = x_{ist} B + d_i \Theta + f_{ist} \Psi + \eta_{ist}, \qquad (1)$$
where $B$ is the $K \times Q$ matrix of fixed-effect coefficients; $\Theta$ is the $I \times Q$ matrix of random person effects with $i$th row $\theta_i$, the $1 \times Q$ vector of random person effects for individual $i$; $\Psi$ is the $J \times Q$ matrix of random employer effects with $j$th row $\psi_j$, the $1 \times Q$ vector of random employer effects for each employer $j$; and $\eta_{ist}$ is the $1 \times Q$ residual vector for individual $i$ in job spell $s$ for period $t$.
Stacking the observations over $i$, $s$, $t$, the model becomes
$$Y = XB + ZU + H, \qquad (2)$$
where $Y$ is the $N \times Q$ matrix of dependent variables, with $N$ equal to the total number of person, job spell, and year combinations in the data; $X$ is the $N \times K$ design matrix for all fixed effects; $Z \equiv [D\; F]$ is the $N \times (I + J)$ design matrix for the combined random effects; $U \equiv [\Theta^T\; \Psi^T]^T$ is an $(I + J) \times Q$ matrix of random effects; and $H$ is an $N \times Q$ matrix of residuals. Equation (2) is a multivariate linear mixed-effects model represented in canonical form. By construction, every column of $Y$ can be represented as a single linear mixed-effects model, also represented in canonical form, for dependent variable $Y_{(q)}$, where the subscript $(q)$ denotes selection of the indicated column from the associated matrix. For example, the $q$th column has the form
$$Y_{(q)} = X B_{(q)} + Z U_{(q)} + H_{(q)}$$
for $q = 1, \ldots, Q$.
Parameterize the stochastic structure of equation (2) as
$$\begin{aligned}
E[\eta_{ist} \mid X, Z] &= 0, \\
\mathrm{Var}[\eta_{ist} \mid X, Z] &= \Sigma_0, \quad Q \times Q, \\
\mathrm{Cov}[\eta_{ist}, \eta_{i's't'} \mid X, Z] &= 0, \quad i \neq i', \ \forall s, s', t, t', \\
\mathrm{Cov}[\eta_{ist}, \eta_{is't'} \mid X, Z] &= 0, \quad s \neq s', \ \forall t, t', \\
\mathrm{Cov}[\eta_{ist}, \eta_{ist'} \mid X, Z] &= \Sigma_{|t-t'|}, \quad t \neq t', \ Q \times Q, \\
\mathrm{Var}[\theta_i \mid X, Z] &= G^{(\theta)}, \quad Q \times Q, \\
\mathrm{Cov}[\theta_i, \theta_{i'} \mid X, Z] &= 0, \quad i \neq i', \\
\mathrm{Var}[\psi_j \mid X, Z] &= G^{(\psi)}, \quad Q \times Q, \\
\mathrm{Cov}[\psi_j, \psi_{j'} \mid X, Z] &= 0, \quad j \neq j',
\end{aligned}$$
where $\Sigma_{|t-t'|}$ is the $Q \times Q$ autocovariance matrix of $\eta_{ist}$ at lag $|t-t'|$. We use ASREML (Gilmour et al., 2009) to fit equation (2) using the residual maximum likelihood method (REML), assuming that $\mathrm{vec}(U)$ and $\mathrm{vec}(H)$ have independent joint normal distributions with zero means and the covariance structure specified above. This special-purpose software may not be as familiar to economists as it is to biostatisticians; however, all of the parameters that we specify in the MLMM are identified using conventional methods—proof that the residual likelihood function has a well-defined maximum for the given specification. The ASREML software checks these identification conditions (or estimability conditions, as they are known in biostatistics) and notes violations at the optimum, so that only estimates of identifiable parameters, for both fixed and random effects, are reported. We do not elaborate on the estimation because ASREML produces the REML estimates of all parameters and their estimated covariance matrix, taking account of the full stochastic structure of the specified model.3 Although they are identifiable and calculated by the software, we make no use of the estimated person and employer effects, $\hat{U} \equiv [\hat{\Theta}^T\; \hat{\Psi}^T]^T$, in our analysis.
To simplify the exposition of the results, we note here the exact formulas for all of the variance components for the case $Q = 2$, in which the SIPP value is listed first ($q = 1$) and the DER value is listed second ($q = 2$). Then we have
$$G^{(\theta)} = \begin{bmatrix} \sigma^2_{\theta 1} & \sigma_{\theta 12} \\ \sigma_{\theta 12} & \sigma^2_{\theta 2} \end{bmatrix}, \qquad (6)$$
$$G^{(\psi)} = \begin{bmatrix} \sigma^2_{\psi 1} & \sigma_{\psi 12} \\ \sigma_{\psi 12} & \sigma^2_{\psi 2} \end{bmatrix}. \qquad (7)$$
3 We require the assumption that $G^{(\theta)}$, $G^{(\psi)}$, and $\Sigma_{|t-t'|}$ for all $t$ are consistently estimated when either $U$ or $H$ is nonnormal. This is reasonable because the estimation is based on minimizing an objective function that depends on only the first two moments of $\mathrm{vec}(Y)$, albeit with a form that is based on the multivariate normal distribution. Departures from normality are likely to affect the covariance matrix in the asymptotic distribution of the unique parameters of $G^{(\theta)}$, $G^{(\psi)}$, and $\Sigma_{|t-t'|}$, which we have not investigated.
B. Defining True Values, Associated Measurement Error, and
Reliability Statistics for the Random Effects
Signal, measurement error, and truth audits. Let $\omega \equiv [\omega_1, \ldots, \omega_Q]^T$ be a $Q \times 1$ vector where $\iota^T \omega = 1$, $0 \leq \omega_i \leq 1$, and $\iota$ is always a conformable column vector of 1s. The elements of the vector $\omega$ correspond to the prior probabilities or weights associated with each of the elements being the correct or true value. The signal is, then, the expected true value, where the expectation is taken over the prior probabilities $\omega$:
$$\mathrm{Sig}(y_{ist}) \equiv y_{ist}\,\omega, \qquad (8)$$
and the measurement error is the deviation of each measure from the signal component:
$$\mathrm{ME}(y_{ist}) \equiv y_{ist} - \mathrm{Sig}(y_{ist})\,\iota^T = y_{ist} - y_{ist}\,\omega\,\iota^T = y_{ist}\,[I - \omega \iota^T]. \qquad (9)$$
Hence, the weight vector $\omega = (0, 1)^T$ when $Q = 2$ corresponds to declaring that the second measure (in our case, the DER) is correct. This is precisely the assumption in historical administrative record measurement error studies. We consider alternatives—in particular, $\omega = (0.5, 0.5)^T$, the case where either measure is equally likely to be the truth, and $\omega = (0.1, 0.9)^T$, the case where the DER is much more likely to be true. Neither of these definitions imposes multivariate normality on the signal or measurement error.
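For concreteness, the signal and measurement-error split in equations (8) and (9) amounts to the following computation. This NumPy sketch uses made-up log-earnings values; the function name is ours:

```python
import numpy as np

def signal_and_error(y, omega):
    """Split a 1 x Q vector of measures into signal and measurement error,
    following Sig(y) = y * omega and ME(y) = y [I - omega iota^T].

    y     : (Q,) measures of the same quantity (here: SIPP, DER log earnings)
    omega : (Q,) prior weights on each measure being true; sums to 1
    """
    sig = float(y @ omega)   # scalar signal: prior-weighted average
    me = y - sig             # deviation of each measure from the signal
    return sig, me

y = np.array([10.45, 10.60])   # made-up log annual earnings: SIPP then DER
for omega in ([0.0, 1.0], [0.5, 0.5], [0.1, 0.9]):
    sig, me = signal_and_error(y, np.array(omega))
    print(omega, round(sig, 4), me.round(4))
```

With $\omega = (0, 1)^T$ the DER error is identically zero and the SIPP error is just the difference of the two reports, reproducing the classical validation-study definition.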
In a full Bayesian analysis, one would model $\omega_{ist}$ as a latent data quality indicator whose prior expectation is the $\omega$ in the preceding paragraph. Let $g_{ist}$ represent the $1 \times L$ vector associated with the design of additional information used to determine the correctness of the measurement, and let $w_{ist}$ represent the $1 \times Q$ outcome vector from the data audit whose elements are all zero except for a single column, $q$, coded 1, which indicates that, for the earnings outcome of person $i$ at job spell $s$ in time period $t$, $q$ was the correct measurement. Then the additional equation system that models $w_{ist}$ as a function of $x_{ist}$, $d_i$, $f_{ist}$, and $g_{ist}$ provides the framework for computing a posterior estimate of $\omega_{ist}$ that would replace the prior estimate used in equations (8) and (9).
It is also worth noting that our method generalizes to the case where there are not the same number of measurements for each $(i, s, t)$ tuple. We implement this feature in our estimation, treating missing values of one or the other measure for a particular year $t$ as ignorable (Rubin, 1976).
Identification of measurement error. No truth audit was
conducted as a part of our SIPP-to-DER matching exercise.
Therefore, we have no outcome data corresponding to $w_{ist}$, even for a subsample. Hence, the prior $\omega$ must equal the posterior $\omega$, which is why we form the signal and measurement error components as we do. This assumption merely
formalizes what has been the practice for more than three
decades while also showing exactly what information would
be required to eliminate the use of purely prior information
about the correctness of one of the outcomes in measurement
error studies.
Because the Kapteyn and Ypma (2007) and Meijer et al.
(2012) models are facially similar to our model, we discuss
them now in the context of the assumptions that identify measurement error. In our model and in theirs, if any measure is declared to be true, $w_{ist q'} = 1$ for some $q'$ in our notation, then measurement errors are observed and can be calculated as $y_{ist q} - y_{ist q'}$ for all $q \neq q'$, again in our notation. In our
model, without an audit there is no measurement on truth. We
have not formalized the judgments that analysts might make
based on the observed data to infer truth in the absence of
an audit. This is the critical distinction between our approach
and theirs.
Kapteyn and Ypma (2007) explicitly model the analyst's behavior. If the survey and administrative measures
match, then the analyst declares them both true, and the
properties of the model for the administrative data and
measurement-error-free survey data are identified. The case
is “labeled” in their notation. The analyst makes two further
assumptions: (a) there is no measurement error in the admin-
istrative earnings variable, only mismatch (data matched from
the wrong individual), (b) survey measurement error, which
comes in two types, must increase the variance of the survey
measure relative to the administrative measure when there
is no mismatch. These two assumptions for the “unlabeled”
cases effectively declare the administrative measure to be the
truth whenever it is correctly matched to the survey record and
provide enough information to estimate posterior probabili-
ties of truth. Meijer et al. (2012) generalize these assumptions
and show that the posterior probabilities are sensitive to the
identifying assumptions, as expected.
Our model shares the property that if there are only two
observed measures and they are equal, then there is no
observed measurement error. But we make no use of the struc-
ture of the error components in the two measures that would
identify measurement error specifically. Moreover, during an
audit, an analyst could discover that neither measure con-
tained, for example, off-the-book payments like tips. Then
neither measure would be true, and the auditor would have
to estimate such payments or declare a latent third measure
to be true. Our model explicitly allows this outcome. Fur-
thermore, an analyst who wished to use prior information
to characterize the effects of different types of errors on the
distribution of ycould calculate the posterior distribution of
ωgiven these model components. This is what Meijer et al.
(2012) have done using the mixture factor analysis model.
Our method makes clear that in the absence of an audit,
measurement error is entirely defined a priori and not from
any observed data. Classical validation studies like the ones
discussed in section I and their more sophisticated recent
counterparts all identify the measurement error by plac-
ing very strong priors on the data generation process. Our
contribution is to generalize this identification strategy to
accommodate other priors and a broader set of potential
Reliability statistics. To compute reliability statistics, we require estimates of $\mathrm{Var}[\mathrm{Sig}(y_{ist}) \mid X, Z]$ and $\mathrm{Var}[\mathrm{ME}(y_{ist}) \mid X, Z]$. These can be computed from the stochastic structure of equation (2):
$$\mathrm{Cov}[y_{ist}, y_{ist'} \mid X, Z] = G^{(\theta)} + G^{(\psi)} + \Sigma_{|t-t'|}, \qquad (10)$$
$$\mathrm{Cov}[\mathrm{Sig}(y_{ist}), \mathrm{Sig}(y_{ist'}) \mid X, Z] = \omega^T \big( G^{(\theta)} + G^{(\psi)} + \Sigma_{|t-t'|} \big)\,\omega, \qquad (11)$$
$$\mathrm{Cov}[\mathrm{ME}(y_{ist}), \mathrm{ME}(y_{ist'}) \mid X, Z] = [I - \omega \iota^T]^T \big( G^{(\theta)} + G^{(\psi)} + \Sigma_{|t-t'|} \big) [I - \omega \iota^T]. \qquad (12)$$
The traditional measures for the case $Q = 2$ can be computed at all lags using $\omega = (0, 1)^T$.
Some care must be taken in using the formulas in equations (11) and (12) because they do not represent an orthogonal decomposition of equation (10). The conventional reliability ratio for measure $q$ is defined as the ratio of its signal variance to its total variance. With the SIPP and DER measures in positions 1 and 2, respectively, the traditional reliability ratios are
$$\mathrm{TRR}_{0,\mathrm{SIPP}} = \frac{\omega^T \big( G^{(\theta)} + G^{(\psi)} + \Sigma_0 \big)\,\omega}{\big\{ G^{(\theta)} + G^{(\psi)} + \Sigma_0 \big\}_{11}}, \qquad (13)$$
$$\mathrm{TRR}_{0,\mathrm{DER}} = \frac{\omega^T \big( G^{(\theta)} + G^{(\psi)} + \Sigma_0 \big)\,\omega}{\big\{ G^{(\theta)} + G^{(\psi)} + \Sigma_0 \big\}_{22}}, \qquad (14)$$
where the notation $\{\cdot\}_{ij}$ means to extract the $i,j$th element of the matrix in $\{\cdot\}$. The difficulty with equations (13) and (14) is that they are not bounded above by unity, because either of the two measures in isolation (SIPP or DER) can omit elements that should be measured or include elements that should not. Our measurement error model has the traditional reliability-ratio property of being bounded above by unity only when either the SIPP or the DER is true or when the two measures are exchangeable, as in conventional survey reliability estimation, where the measures are obtained by repeated application of the same survey instrument (see Groves et al., 2004).
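As a sketch of the traditional reliability ratios in equations (13) and (14), consider the following NumPy fragment. All variance-component numbers are invented for illustration and are not estimates from the paper:

```python
import numpy as np

# Hypothetical 2x2 variance components for (SIPP, DER) log earnings
G_theta = np.array([[0.30, 0.25], [0.25, 0.28]])  # random person effects
G_psi   = np.array([[0.10, 0.08], [0.08, 0.09]])  # random employer effects
Sigma0  = np.array([[0.05, 0.01], [0.01, 0.04]])  # residual covariance, lag 0

V = G_theta + G_psi + Sigma0        # total covariance, equation (10) at lag 0

def trr(V, omega, q):
    """Traditional reliability ratio for measure q: signal variance
    omega^T V omega over the measure's own total variance V[q, q]."""
    return float(omega @ V @ omega) / V[q, q]

omega = np.array([0.0, 1.0])        # declare the DER (position 2) to be true
print(trr(V, omega, 0))             # SIPP ratio
print(trr(V, omega, 1))             # DER ratio: exactly 1 under this prior
```

Note that with other priors the ratio of a measure can exceed 1, which is the unboundedness problem the text describes.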
We choose instead to define the reliability statistic by generalizing the index of inconsistency, the ratio of the measurement error variance to the total variance, and subtracting it from unity so that it has the interpretation of a reliability ratio:
$$R_{0,\mathrm{SIPP}} = 1 - \frac{\big\{ [I - \omega \iota^T]^T \big( G^{(\theta)} + G^{(\psi)} + \Sigma_0 \big) [I - \omega \iota^T] \big\}_{11}}{\big\{ G^{(\theta)} + G^{(\psi)} + \Sigma_0 \big\}_{11}}, \qquad (15)$$
$$R_{0,\mathrm{DER}} = 1 - \frac{\big\{ [I - \omega \iota^T]^T \big( G^{(\theta)} + G^{(\psi)} + \Sigma_0 \big) [I - \omega \iota^T] \big\}_{22}}{\big\{ G^{(\theta)} + G^{(\psi)} + \Sigma_0 \big\}_{22}}. \qquad (16)$$
The reliability statistics in equations (15) and (16) reproduce the conventional reliability ratios when either of the measures is true or when the two measures are exchangeable.4 Because of the serial correlation caused by the structure of the individual, employer, and time effects, we also define reliability statistics at different lags. These are given by
$$R_{|t-t'|,\mathrm{SIPP}} = 1 - \frac{\big\{ [I - \omega \iota^T]^T \big( G^{(\theta)} + G^{(\psi)} + \Sigma_{|t-t'|} \big) [I - \omega \iota^T] \big\}_{11}}{\big\{ G^{(\theta)} + G^{(\psi)} + \Sigma_{|t-t'|} \big\}_{11}}, \qquad (17)$$
$$R_{|t-t'|,\mathrm{DER}} = 1 - \frac{\big\{ [I - \omega \iota^T]^T \big( G^{(\theta)} + G^{(\psi)} + \Sigma_{|t-t'|} \big) [I - \omega \iota^T] \big\}_{22}}{\big\{ G^{(\theta)} + G^{(\psi)} + \Sigma_{|t-t'|} \big\}_{22}}. \qquad (18)$$
These definitions require consistent estimates of the variance parameters $G^{(\theta)}$, $G^{(\psi)}$, and $\Sigma_{|t-t'|}$, but they do not impose multivariate normality on the underlying data.
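The generalized reliability statistic in equations (15) and (16) can be sketched in a few lines. The total covariance below is a made-up matrix for illustration:

```python
import numpy as np

def reliability(V, omega, q):
    """One minus the index of inconsistency for measure q: the measurement
    error variance from ME(y) = y [I - omega iota^T] over total variance."""
    Q = len(omega)
    M = np.eye(Q) - np.outer(omega, np.ones(Q))   # I - omega iota^T
    me_var = (M.T @ V @ M)[q, q]                  # measurement error variance
    return 1.0 - me_var / V[q, q]

# Hypothetical total covariance of (SIPP, DER) log earnings at lag 0
V = np.array([[0.45, 0.34], [0.34, 0.41]])

for omega in ([0.0, 1.0], [0.5, 0.5], [0.1, 0.9]):
    w = np.array(omega)
    print(omega, round(reliability(V, w, 0), 3), round(reliability(V, w, 1), 3))
```

Under $\omega = (0, 1)^T$ this reproduces the conventional ratio with the DER as truth (the DER statistic equals 1); moving $\omega$ toward $(0.5, 0.5)^T$ spreads the measurement error variance across both measures.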
C. Defining True Values, Associated Measurement Error, and
Reliability Statistics for the Fixed Effects
Using the MLMM specification in equation (2) and following the method from equation (8), we define
$$\mathrm{Sig}(B) \equiv B\,\omega, \qquad (19)$$
where $B$ is the matrix of fixed effects associated with the design $X$. The true fixed effect is the $\omega$-weighted average of the SIPP and DER fixed-effect coefficients. These end points define the range of each fixed effect. The SIPP and DER measurement errors are defined as
$$\mathrm{ME}\big(B_{(q)}\big) \equiv B_{(q)} - B\,\omega, \qquad (20)$$
4It is worth noting here that it makes no sense to assume that the two
measures are exchangeable because that is equivalent to assuming that the
labels “SIPP” and “DER” are meaningless. The whole point of the analysis is
that we know that outcomes were collected based on different measurement
concepts (survey versus administrative records), so they should not have the
same joint distribution if we exchange the labels.
where $q = 1, 2$, and the generalization to arbitrary $Q$ is straightforward. We note that defining the signal and the measurement error in terms of the theoretical fixed-effect coefficients is exactly comparable to the methods discussed in section II when the weight vector takes the value $(0, 1)^T$, that is, when we assume that the DER is truth.
Applying the MLMM model estimator in ASREML produces
the estimates vec(B̂) and the estimated covariance matrix
of vec(B̂), which can be used directly to estimate the signal,
measurement error, and associated standard errors for the
fixed-effect coefficients.
These are reported in section V for the same values of ωas
we use for the random effects.5
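The mapping from estimated coefficients to signal and measurement error in equations (19) and (20) is simple linear algebra; a minimal sketch with invented coefficient values (the numbers are illustrative, not estimates from the paper):

```python
import numpy as np

# Hypothetical fixed-effect estimates: column 1 = SIPP coefficients,
# column 2 = DER coefficients (illustrative numbers only).
B = np.array([[0.083, 0.091],
              [-0.021, -0.015]])

def signal_and_me(B, omega):
    """Sig(B) = B @ omega; column q of ME holds ME_B(q) = B(q) - B @ omega."""
    omega = np.asarray(omega, dtype=float)
    sig = B @ omega                # equation (19)
    me = B - sig[:, None]          # equation (20), one column per measure
    return sig, me

# With omega = (0, 1) (DER as truth), the signal equals the DER column
# and the DER measurement error is identically zero.
sig, me = signal_and_me(B, [0.0, 1.0])
```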
We do not compute reliability statistics for the fixed effects
because the measurement error in equation (20) has tradi-
tionally been interpreted as a bias, which is the interpretation
we adopt—making point estimates and their standard errors
more appropriate than reliability statistics.
D. Person-level Models
For comparison to the literature on the assessment of sur-
vey measurement errors using administrative data, we also
estimate the MLMM for person-level outcomes. A person-level
outcome for individual i in year t is defined as the sum
of all observations y_ist over all jobs s, including those jobs that
matched between the SIPP and the DER and those that did
not. Thus, for each person, there are q outcomes per year, and
these outcomes differ across sources because of differences
in reporting at the job level and because of differences in the
number of jobs reported.
The base specification becomes
y_it = x_it B + d_i Θ + f_it Ψ + η_it, (21)

where x_it can be defined unambiguously because our
observed covariates do not vary by employer. In order to be
comparable to the literature, we do not include the design f_it
in most of the person-level specifications. But we do report
one set of results where the employer is defined as the one for
which y_2ist is a maximum over s, that is, the employer with
the greatest DER earnings during the year.6 All MLMM and
reliability statistics for person-level models are defined in a
manner that is strictly comparable to the job-level models, so
we do not repeat any of the formulas here.
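The person-level aggregation and the dominant-employer definition can be sketched with pandas on toy records (all names and values here are hypothetical):

```python
import pandas as pd

# Toy job-year records; a hypothetical stand-in for the matched job file.
jobs = pd.DataFrame({
    "person": [1, 1, 1, 2],
    "job":    ["a", "b", "a", "c"],
    "year":   [1996, 1996, 1997, 1996],
    "sipp":   [30000.0, 5000.0, 31000.0, 42000.0],
    "der":    [29500.0, 0.0, 31200.0, 41000.0],
})

# Person-level outcome: sum y_ist over all jobs s within person i and
# year t, separately for each source.
person = jobs.groupby(["person", "year"], as_index=False)[["sipp", "der"]].sum()

# Dominant employer: the job with the greatest DER earnings in the year,
# falling back to SIPP earnings when DER earnings are absent (footnote 6).
key = jobs["der"].where(jobs["der"] > 0, jobs["sipp"])
dominant = jobs.loc[key.groupby([jobs["person"], jobs["year"]]).idxmax()]
```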
IV. Data Description
The fundamental unit of observation in this paper is a job,
defined as a match between an individual and an employer.
Data on jobs come from two sources: five Survey of Income
and Program Participation (SIPP) panels conducted during
the 1990s and the Detailed Earnings Records (DER)
extracted from the Social Security Administration Master
Earnings File for the respondents in each of the five panels.7
In the SIPP, data on earnings were reported monthly, while in
the DER, earnings were reported annually. In both sources,
there were multiple records per job from repeated interviews
and annually filed W-2s. Hence, in order to compare earnings,
we first had to identify jobs and group earnings records
over time in each data source. After job records were created,
individuals in each data set were linked by Social Security
number and, then, job records from the SIPP and the DER
were matched to each other for each individual. We describe
each step of this process.

5 The estimated covariance matrix of vec(B̂) is computed assuming that vec(U) and vec(H) in equation (2) have independent joint normal distributions with zero means and covariance matrices as specified in section IIIA. This is the only use of the normality assumption beyond the specification of the objective function in REML. See Gilmour et al. (2009).

6 In cases where a job-year has reported SIPP earnings but no DER earnings, we use the SIPP earnings to determine the dominant employer.
A. Creating a SIPP Jobs Data Set
All the SIPP panels conducted in the 1990s collected
detailed labor force information from respondents every four
months, or approximately three times per year, over the
course of two and a half to four years. Respondents were
asked questions about at most two jobs held during the previ-
ous four months, where the term job was loosely defined as
working for pay. We used the longitudinal SIPP person ID,
the wave (interview) number, and an edited longitudinal job
ID that we created to combine records and create one observa-
tion per person per job. Appendix A in the online supplement,
which includes all appendixes, describes the problems we
found with the original job ID and gives a summary of how we
created our edited version. The first column of table 1 shows
the number of respondents in each SIPP panel who report
working and the total number of jobs reported, using the three
identifiers listed above to count jobs.8Once we defined a set
of jobs for each SIPP panel, we created annual earnings mea-
sures for each year covered by the survey by summing the
appropriate monthly earnings reports.
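A sketch of this annual-earnings construction, with the imputation flag carried along so job-years containing any imputed month can later be split out (toy data, not the actual SIPP processing):

```python
import pandas as pd

# Toy monthly reports for one hypothetical job, with the Census
# imputation flag carried along (1 = month imputed).
monthly = pd.DataFrame({
    "person":   [1] * 6,
    "job":      ["a"] * 6,
    "year":     [1996] * 4 + [1997] * 2,
    "month":    [9, 10, 11, 12, 1, 2],
    "earnings": [2500.0, 2500.0, 2600.0, 2600.0, 2600.0, 2700.0],
    "imputed":  [0, 0, 1, 0, 0, 0],
})

# Annual job earnings = sum of monthly reports within the year; a
# job-year is flagged if any contributing month was imputed, so the
# imputation subsamples can be split off later.
annual = (monthly.groupby(["person", "job", "year"])
                 .agg(earnings=("earnings", "sum"),
                      any_imputed=("imputed", "max"))
                 .reset_index())
```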
In order to understand the comparison to another data
source, it is important to first understand the concept of
earnings as used during the SIPP interview. The field rep-
resentative sought to ask each adult about his or her earnings,
but if an adult was not present at the time of the interview,
another adult could answer as a proxy. During the 1990–
1993 SIPP panels, respondents (or proxies) were asked to
report gross earnings from a specific employer in the follow-
ing way: “The next question is about the pay … received
from this job during the 4-month period. We need the most
accurate figures you can provide. Please remember that cer-
tain months contain 5 paydays for workers paid weekly and
3 paydays for workers paid every 2 weeks. Be sure to include
any tips, bonuses, overtime pay, or commissions. What was
the total amount of pay that … received BEFORE deductions
on this job in …?”9 The field representative read the name of
each month and separately recorded earnings for that month.

7 The five SIPP panels began in 1990, 1991, 1992, 1993, and 1996.

8 The edited SIPP job ID for the 1990–1993 panels was released by the Census Bureau as an update to the public use files and is available on the SIPP FTP website. The edited job ID is described in Stinson (2003).

Table 1.—Data Summary of SIPP and DER Jobs

                                   (1)        (2)        (3) Both            (4) Job   (5) Drop    (6) Drop    (7) Drop
SIPP Panel                         SIPP       DER        SIPP / DER          Match     Missing     Missing     Imputes
                                                                             Sample    Years       Waves
1990        People with jobs       37,291     35,032     30,993              28,313    26,615      21,776      16,642
(1989–1992) Total jobs held        66,991     96,086     55,087 / 88,324     41,885    36,168      30,003      22,862
1991        People with jobs       23,520     21,729     19,056              17,426    16,323      13,517      10,347
(1990–1993) Total jobs held        40,818     58,020     32,447 / 52,797     25,258    21,761      18,331      13,892
1992        People with jobs       33,920     31,557     27,394              25,314    24,599      19,263      13,476
(1991–1995) Total jobs held        65,278     99,524     51,650 / 90,360     39,729    37,021      29,750      20,549
1993        People with jobs       32,972     29,831     26,267              24,103    22,103      18,582      13,454
(1992–1995) Total jobs held        61,094     81,320     47,723 / 74,317     36,469    29,563      25,144      18,013
1996        People with jobs       63,116     55,894     48,542              44,626    44,203       7,654       5,037
(1995–2000) Total jobs held       121,450    192,720     97,149 / 173,623    75,110    72,805      13,553       8,546

Job counts in the SIPP are from internal Census files using the job tracking identifier created by the authors. Job counts in the DER are done using the employer tax identifier (EIN). Person counts in the SIPP are calculated using the internal longitudinal person identifier. Person counts in the DER are calculated using the SSN.
In the 1996 survey instrument, which was conducted using
a computer-assisted personal interview (CAPI) system, indi-
viduals (or proxies) could report earnings payments over a
variety of time periods and the instrument automatically cal-
culated monthly earnings. Field representatives (FRs) asked,
“Each time he/she was paid by [Name of Employer] in
[Month X], how much did he/she receive BEFORE deduc-
tions?”10 The field representative then followed up with
questions about whether there were any other payments such
as tips, bonuses, overtime pay, or commissions. Built-in con-
sistency checks flagged earnings amounts outside certain
bounds and prompted the FR to make corrections. Respon-
dents were also asked to refer to earnings records if possible
so as to give accurate responses. Thus, in the most accurate
cases, these earnings reports most likely reflected the gross
pay from monthly pay stubs.
B. Creating a DER Job-Level Data Set
The second source of data, DER, was a specialized extract
from the SSA’s Master Earnings File that contained earn-
ings histories for each SIPP respondent in the 1990, 1991,
1992, 1993, and 1996 panels with a validated SSN. The
creation of the DER was a joint project between the Cen-
sus Bureau and SSA. The Census Bureau asked each SIPP
respondent at the time of the survey to provide an SSN. SSA
then compared self-reported name, sex, race, and date of birth
to their counterparts for the matching SSN on the Numident,
an administrative database containing demographic informa-
tion collected when every SSN was issued and updated when
the individual had subsequent contacts with SSA. If a respon-
dent’s name and demographics were deemed close enough to
the name and demographics associated with the SSN in the
9 SIPP 1993 wave 1 questionnaire, page 15, available at http://www.census
10 SIPP 1996 wave 1 questionnaire, Labor Force Amount section,
available at
Numident, the SSN was declared valid.11 This list of validated
SSNs was the basis for extracting detailed earnings records
from the SSA Master Earnings File.
A W-2 history for a SIPP respondent consisted of annual
earnings, broken down by employer, from 1978 to 2000. The
primary earnings variable came from box 1 of the W-2 form:
wages, tips, and other compensation. This earnings vari-
able was uncapped and represented all earnings that were
taxable under federal income tax. For the purposes of this
earnings comparison study, jobs with an employer (i.e., non-
self-employment) held during the time period covered by the
survey questions were used.12 In the second column of table 1,
we show the number of SIPP respondents with DER records
and a count of unique person-employer matches.
Employers were identified in the DER by an IRS-assigned
employer identification number (EIN). The EIN linked
employers to the Business Register, the master list of all
businesses maintained by the Census Bureau as the sam-
pling frame for establishment-level surveys. Using this link,
we merged information from the Business Register about the
industry and name of the employer to each relevant job report
11 For respondents who answered “do not know” to the SSN question, an
attempt was made to find the missing SSN by locating the person in the
Numident based on his or her reported name and demographic characteris-
tics. When a respondent refused to provide an SSN, no attempt was made
to link this person to any administrative data, and the SSN was left missing.
12 In addition to employer reports, the DER contained reports of self-
employment earnings. The SIPP also collected information about self-
employment, but responses to these questions were treated separately from
responses to the questions about jobs with employers. Self-employment
reports from either source were not included in this study. A mismatch
in employment type is another reason that a SIPP job might fail to link
to a DER job. An investigation of the frequency of this type of error is
beyond the scope of this paper. However, to give an idea of the poten-
tial magnitude of the problem, we looked at self-employment rates in each
source for the 1996 panel. In this panel, approximately 4% of individuals
report self-employment in addition to regular employment. If all these self-
employment reports were really additional employer-based jobs, the SIPP
job count would be understated by approximately 2,400 jobs. In the DER,
12% of individuals have both regular and self-employment records. If all
of these self-employment cases were falsely reported by SIPP respondents
as regular jobs, the SIPP job count would be overstated by approximately
6,700 jobs. Using year-round, full-time workers, Roemer (2002) reports
that DER self-employment was misreported as a SIPP employer-based job
about 1.5% of the time.
in the DER data. Details about this merge can be found in
appendix B.
C. Matching SIPP and DER Jobs
After the creation of the SIPP and DER job-level data sets,
the next step was to create a common sample of people who
had job reports in both files. In the third column of table 1,
we show the number of people found in both sources and the
total jobs they held according to the SIPP and the DER.13
Here the timing of the survey plays an important role. In
every SIPP panel, the survey asked employment questions of
at least some respondents in the last few months of the year
preceding the official beginning year and in only a subset of
months in the final year of the panel.14 For DER jobs, we did
not have any subannual information about the dates the job
was held. In order to attempt to match as many SIPP and DER
jobs as possible, all DER jobs from the years either partially
or fully covered by the survey were included in the potential
match set, as appropriate for each respondent. We did this to
allow the best possible chance for a given SIPP job to match,
feeling that we did not wish to impose the requirement that
timing between the SIPP and the DER be exact. However,
this has the effect of making the SIPP and DER job counts
noncomparable. We report more comparable job counts in
appendix table C2, where we count only SIPP and DER jobs
reported in the full survey years.
After we matched by SSN, a job-to-job match was per-
formed using probabilistic record linking based primarily
on name matching.15 The primary basis for matching was
self-reported name of the employer from the SIPP and admin-
istrative name of the employer from the Business Register.
Earnings were not used in the match in order to prevent bias
in the subsequent comparison of earnings. Appendix C gives
the details of this match, including which additional matching
variables were used, how duplicate matches were handled,
and how company ownership changes affected the matching.
The fourth column of table 1 shows the number of SIPP jobs
that were successfully matched to a counterpart job in the
DER. While the percentage of SIPP jobs that match ranges
from 76% to 78% across the panels, the percentage of total
person earnings represented by these matching jobs is much
higher, ranging from 91% to 94%.
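The scoring step of a Fellegi-Sunter linker can be illustrated in a few lines; the fields and m- and u-probabilities below are invented for illustration and are not the parameters used in the actual SIPP-Business Register match:

```python
import math

# Illustrative m- and u-probabilities; NOT the parameters estimated for
# the actual SIPP/Business Register match.
FIELDS = {
    # field: (m = P(agree | true match), u = P(agree | nonmatch))
    "employer_name": (0.90, 0.05),
    "industry":      (0.80, 0.20),
    "state":         (0.95, 0.30),
}

def match_weight(agreements):
    """Fellegi-Sunter score: sum log2(m/u) over agreeing fields and
    log2((1 - m)/(1 - u)) over disagreeing ones."""
    w = 0.0
    for field, (m, u) in FIELDS.items():
        if agreements[field]:
            w += math.log2(m / u)
        else:
            w += math.log2((1 - m) / (1 - u))
    return w

# Pairs above an upper cutoff are declared links; pairs below a lower
# cutoff are declared nonlinks; the region between goes to clerical review.
strong = match_weight({"employer_name": True, "industry": True, "state": True})
weak   = match_weight({"employer_name": False, "industry": False, "state": True})
```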
Of the jobs that matched, we dropped those that did not
have at least one DER and one SIPP earnings report in one
of the full years covered by the panel, but we did not require
these reports to be in the same year. For example, a SIPP job
could have earnings reports for 1996 and 1997 but not 1998,
while the DER job could have reports for all three years.
As a result, the SIPP and DER sample sizes were slightly
different for each year. In our mixed-effects modeling, 0s
were treated as missing values and were modeled as ignorable
in the sense of Rubin (1976) given all effects in the fixed-
and random-effects design matrices, facilitating estimation
with an unbalanced panel. The decision not to require exact
matching in the earnings years was based on the fact that
earnings essentially reported as 0 in one source and positive
in another source was a type of measurement error that we
did not wish to exclude. The fifth column of table 1 shows the
decrease in our sample of jobs due to missing DER or SIPP
earnings.

13 See appendix table C1 for more details about matching at the person level, including a breakdown of reasons for not matching.

14 See appendix C for further details about the timing of the SIPP panels used in this analysis.

15 We used the Fellegi and Sunter (1969) method. Details are provided in appendix C.
Finally, there were jobs that matched and had both SIPP
and DER earnings in full survey years but the SIPP earnings
were incomplete due to respondents missing an interview in
the middle of the panel. When an entire household missed
an interview, the Census Bureau did not impute responses
for this wave of the survey, and the data were left missing.
We dropped individuals who ever had a missing wave of
SIPP data.16 In the sixth column of table 1, we show the
final total number of jobs per panel that were used in the
analysis. Combined across panels, our sample has 116,781
jobs, 80,792 people, and 70,081 unique employers.
In months where a SIPP respondent was interviewed but
failed to answer the earnings questions, responses were
imputed by the Census Bureau. Our main sample includes
both reported and imputed values. In addition, we split our
sample into person-job observations that never have imputed
monthly earnings and those that do and estimate our model
on both subsamples.17 This allows us to show the effect of
the Census Bureau’s imputation method on reliability statis-
tic calculations. The last column of table 1 shows people and
jobs that remain when these imputations are dropped.
Tables 2 and 3 describe the covariance structure of the
SIPP and DER earnings over time. Variances are shown on
the diagonals, covariances are listed below the diagonal, and
correlations are listed above. A job contributes an observation
for any year in which it has nonzero SIPP or DER earnings
or both. In the SIPP data, the correlations between adjacent
years range from 0.53 to 0.76. In the DER data, they are
higher, ranging from 0.80 to 0.83. For 1992 to 1994 and 1999,
the variance of earnings is higher in the DER than in the SIPP,
while in 1996, the SIPP has higher variance. In the remaining
years (1990–1991 and 1997–1998), the variance is quite close
between the two sources.
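The layout of tables 2 and 3, computed on pairwise-available observations, can be sketched with simulated data (the data-generating numbers are arbitrary, not calibrated to the SIPP):

```python
import numpy as np
import pandas as pd

# Simulated log earnings for three years of an unbalanced panel.
rng = np.random.default_rng(0)
n = 500
base = rng.normal(10.0, 1.2, n)                       # persistent job component
panel = pd.DataFrame({
    1996: base + rng.normal(0.0, 0.6, n),
    1997: base + rng.normal(0.0, 0.6, n),
    1998: base + rng.normal(0.0, 0.6, n),
})
panel.iloc[:100, 2] = np.nan                          # some jobs missing in 1998

cov = panel.cov()     # pairwise-complete covariances, variances on diagonal
corr = panel.corr()   # pairwise-complete correlations

# Single matrix in the layout of tables 2 and 3: variances on the
# diagonal, covariances below it, correlations above it.
combined = np.tril(cov.values) + np.triu(corr.values, k=1)
```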
16 In earlier versions of this paper, we did not make this assumption
because we wished to show the effect of these missing data on the measure-
ment of annual earnings. We changed our sample for two reasons. First,
we wanted to be more comparable to the literature, which generally drops
missing data when estimating measurement error. Second, we concluded
that most data users would first do an imputation of some kind for the miss-
ing months when calculating annual earnings. Since at the moment there
is no standard method for doing this imputation, we decided to drop these
cases from our modeling.
17 When dropping imputed values, we use the flag that indicates a monthly
earnings value was imputed and the interview flag that tells when all of a
respondent’s answers were imputed using a hot-deck method that assigns a
donor. This latter type of imputation is called Type Z by the Census Bureau
and is used when a household is interviewed but some members are not
able to be interviewed.
Table 2.—Covariance/Correlation Matrix for Natural Log of SIPP Job Annual Earnings

        1990    1991    1992    1993    1994    1996    1997    1998    1999
1990    2.038   0.61
1991    0.983   2.054   0.53
1992            0.833   1.795   0.56    0.51
1993                    0.887   1.808   0.76
1994                    0.699   1.060   1.935
1996                                            2.094   0.76    0.71    0.65
1997                                            1.074   1.946   0.76    0.70
1998                                            0.808   1.030   1.923   0.75
1999                                            0.659   0.772   0.961   1.805

Variances on diagonal; covariances below the diagonal; correlations above the diagonal. Sample is the matched SIPP/DER jobs with at least one SIPP and DER earnings report in the panel years and no missing interview waves.
Table 3.—Covariance/Correlation Matrix for Natural Log of DER Job Annual Earnings

        1990    1991    1992    1993    1994    1996    1997    1998    1999
1990    1.965   0.81
1991    1.209   2.095   0.80
1992            1.266   2.088   0.80    0.74
1993                    1.284   2.195   0.80
1994                    1.040   1.276   2.261
1996                                            1.898   0.82    0.77    0.72
1997                                            1.147   1.931   0.83    0.77
1998                                            0.968   1.154   1.919   0.83
1999                                            0.882   1.000   1.186   1.974

Variances on diagonal; covariances below the diagonal; correlations above the diagonal. Same sample as table 2.
Table 4.—Correlation Matrix for Natural Log of SIPP/DER Job Annual Earnings

Ln(DER Job          Ln(SIPP Job Annual Earnings)
Annual Earnings)    1990    1991    1992    1993    1994    1996    1997    1998    1999
1990                0.75    0.60
1991                0.59    0.76    0.64
1992                        0.59    0.78    0.73    0.69
1993                                0.62    0.87    0.72
1994                                0.55    0.71    0.88
1996                                                        0.89    0.73    0.68    0.64
1997                                                        0.72    0.87    0.71    0.65
1998                                                        0.67    0.72    0.88    0.71
1999                                                        0.64    0.66    0.72    0.86

Same sample as table 2.
Table 4 gives the cross-source correlations between each
year of DER and SIPP data. For earnings in the same year,
correlations range from 0.75 to 0.89, and for adjacent years,
from 0.55 to 0.73. In general, correlation between SIPP and
DER earnings has increased over time, with the high point
occurring in 1996. Adjacent years of DER and SIPP data
have lower cross-source correlations than the autocorrelation
of adjacent years of DER data but are mixed when compared
to the autocorrelation of adjacent years of SIPP data. For
the 1996 panel, the cross-source correlations are lower in
adjacent years than the SIPP-to-SIPP correlations. However,
for the years 1992 to 1994, the correlation of SIPP earnings
with the prior year is stronger when those earnings come from
the DER instead of the SIPP.
D. Why Administrative Data Might Not Be Truth
Before comparing earnings, we discuss three reasons
that considering administrative data to be truth might be
problematic. First, there are some definitional differences
between the two data sources. Second, there is likely to be
error in the administrative data themselves. Third, the match-
ing process between the data sources may also introduce
error. We briefly discuss each of these in turn and summa-
rize how they might affect the comparison between SIPP and
DER earnings.
Conceptual differences between SIPP and DER: Jobs and
earnings definitions. Conceptual differences between SIPP
and DER stem from different definitions of earnings and jobs.
There are at least two parts of earnings that would be reported
on an employee’s pay stub in gross earnings that are not
included in box 1 of the W-2 form: pre-tax health insurance
plan premiums paid by the employee and pre-tax elective con-
tributions made to deferred compensation arrangements such
as 401(k) retirement plans. In the latter case, these contribu-
tions are reported elsewhere on the W-2 form (e.g., box 13
in 1999) and the DER file contains reports of these deferred
earnings that can be added to box 1 earnings to approximate
gross earnings. While pre-tax health insurance plan premi-
ums are reported on the W-2 form, they are not contained in
the DER extract created for research use. This omission rep-
resents one important way in which administrative records
may differ from survey records that is not the result of error
in the survey data collection process. DER earnings will be less than
SIPP earnings if, as instructed, the respondent reported gross
earnings during the survey that included health insurance premiums.
There are other possible differences between box 1 on the
W-2 form and gross earnings reported in the survey. These
involve an employee benefit that the employee is unlikely to
consider wages and is unlikely to be reported as such on a
pay stub but that the employer is required to report as taxable
income.18 In these cases, DER earnings are likely to be higher
than SIPP earnings, because respondents, again as instructed,
do not report these benefits as gross earnings.
A final potential problem with DER employer reports is
that EINs do not necessarily remain constant over time. This
poses problems for defining an employer-employee rela-
tionship. Unlike Social Security numbers, which serve as
good longitudinal identifiers for individuals, EINs can change
for reasons that do not involve a person moving to a new
employer. Company reorganizations through mergers, acqui-
sitions, or spinoffs may result in a worker having two W-2
forms for a tax year, each with a different EIN, without hav-
ing changed employers. In such cases, the DER earnings will
be less than the SIPP earnings because a portion of the earn-
ings for the year is missing. As part of the linking process
between DER and SIPP earnings, we attempted to identify
these kinds of successor-predecessor problems and merge the
two DER jobs determined to be related to a single SIPP job
(see appendix C for details).
To summarize, the exclusion of health insurance premiums
from the DER implies DER less than SIPP; the inclusion of
employee benefits in the DER implies DER is greater than
SIPP; and the EIN changes due to firm reorganization imply
DER is less than SIPP.
Error in the administrative data. Government agencies
that collect administrative data recognize that mistakes are
made in the reporting process, although researchers commonly
do not account for this possibility. In the case of W-2 records, the
SSA has a process for employers to file amended returns,
and these are incorporated into the Master Earnings File.
Data managers at SSA generally suggest that most amended
returns are filed and processed within two years of the original
18 These include educational assistance above a certain monetary level,
business expense reimbursement above the amount treated as substantiated
by the IRS, payments made by the employer to cover the employee’s share
of Social Security and Medicare taxes, certain types of fringe benefits such
as the use of a company car, golden parachute payments, group-term life
insurance over $50,000 paid for by the employer,potentially some portion of
employer contributions to medical savings accounts, nonqualified moving
expenses, and, in some circumstances, sick pay from an insurance company
or other third-party payer.
filing. Since our MEF research extract was created in 2002,
we believe that our DER data from tax years 1990 to 1999
contain most of the relevant amended filings. However, we
are also confident that not all filing mistakes are recognized
and corrected and that some corrections first happen many
years later. While this type of error may be less common than
survey reporting error, it does exist.
Another type of error in administrative data is process-
ing error. Sometimes employers make a processing mistake
when filing returns for all their employees, such as adding
extra zeros to the ends of numerical amounts. There is some
evidence that automated read-in processes sometimes mal-
function and dollar amounts are created that are nonsensical.
This type of error process is very different from the one typ-
ically postulated for self-reported data as it is unrelated to
the actual amount or the person reporting. More research is
needed to determine the extent of this error and quantify its
specific impact.
Error in matching. Record mismatches are the final
source of data error that we consider here. While much
clerical review has convinced us that our probabilistic linking
process produced high-quality matches, some (small, we
hope) error is always introduced by this type of record matching.
In our case, if the SSN for a respondent is correct, then a
mismatch in the job means the earnings belonged to the same
person but did not come from the same employer. If the SSN
is incorrect, then the difference in SIPP and DER earnings is
likely to be larger because the source person was incorrect.
We believe it is unlikely that an incorrect SSN would have
many job-level matches due to our use of the employer name,
but it is a possibility, especially for large employers.
V. Results
A. Random Effects and Reliability Statistics
We report parameter estimates for the elements of Σ0,
Σ|tt|,G(θ), and G(ψ)from the estimation of our MLMM
model in equation (2) in table 5. We report the variance of
the SIPP and DER, the variance of the signal, the variance of
the measurement error, and the reliability statistics, accord-
ing to equations (10) to (18) in table 6. We present results for
job-level specifications using the full sample and then sepa-
rately using jobs with and without SIPP earnings imputations.
We also present results for person-level specifications where
the earnings are summed across all jobs, matched and non-
matched. Again, we use the full sample, a subsample with
only individuals with no imputed earnings, the complemen-
tary subsample with individuals with at least one month of
imputed earnings, and finally the full sample with a dominant
employer effect included in the random effects design matrix.
For each sample, we show in table 6 calculations done with
four different definitions of ω: SIPP as truth (1, 0), SIPP and
DER equally likely to be truth (0.5, 0.5), DER more likely
than SIPP to be truth (0.1, 0.9), and DER as truth (0, 1).
Table 5.—Unobservable Heterogeneity
Estimates of Variance of Random Effects
                          Job-Level Models                            Person-Level Models
                                                          Comparison to Literature              Include Dominant
                 All           No            Imputations    All           No            Imputations    Employer
                 Observations  Imputations   Only           Observations  Imputations   Only           Effect
Person effect/G(Theta)
SIPP (1) 0.2846 0.3219 0.2656 0.1456 0.1820 0.0636 0.0422
DER (2) 0.3493 0.3734 0.3587 0.2299 0.2168 0.2169 0.0805
Covariance 0.3109 0.3444 0.2962 0.1689 0.1833 0.1135 0.0524
Employer (firm) effect/G(Psi)
SIPP (1) 0.3288 0.3671 0.2496 0.1702
DER (2) 0.4796 0.5053 0.4905 0.2250
Covariance 0.3626 0.3919 0.3165 0.1773
Common variance (sigma2_c) 0.5561 0.5435 0.4722 0.6999 0.7011 0.7037 0.7054
AR1 correlation (rho_c) 0.7219 0.7875 0.4523 0.6261 0.7138 0.4837 0.5675
SIPP variance (sigma2_1) 0.1777 0.1588 0.2250 0.0967 0.0581 0.1606 0.0951
DER variance (sigma2_2) 0.2628 0.1875 0.4064 0.1458 0.1190 0.1970 0.1350
SIPP AR1 correlation (rho_1) 0.4743 0.6275 0.2830 0.1174 0.0636 0.1287 0.1295
DER AR1 correlation (rho_2) 0.5882 0.4580 0.6895 0.3938 0.3367 0.4663 0.3803
People 80,792 58,956 26,963 80,792 53,829 26,963 80,792
Jobs 116,781 83,862 32,919
People-(Job)-Years-SIPP 210,703 145,987 64,716 182,632 117,767 64,865 182,632
People-(Job)-Years-DER 211,477 146,600 64,877 184,939 119,407 65,532 184,939
Unique employers 70,081 53,448 23,439 58,358
Value of the likelihood 120,762.89 76,369.18 40,224.92 6,311.56 6,436.61 7,372.38 2,229.32
Model degrees of freedom 90 90 90 90 90 90 90
Residual degrees of freedom 422,090 292,497 129,503 367,481 237,084 130,307 367,481
Total observations 422,180 292,587 129,593 367,571 237,174 130,397 367,571
Random effects are based on the REML estimation of the mixed-effects models. The job-level sample is matched SIPP/DER jobs with at least one SIPP and DER earnings report in the panel years and no missing
interview waves. The person-level sample is created by summing earnings from all jobs for people with at least one matched job.
As shown in table 5, the person, employer, and time-period
specific variance components are uniformly higher in the
DER than the SIPP, which, using equation (10), means a
higher conditional variance for the DER overall than in the
SIPP, as shown in the third column of table 6 (Variance of Y,
DER). Person effects, employer effects, and measure-specific
random time effects all contribute to the greater conditional
variance of the DER measure as compared to SIPP. This
observation holds over all samples and subsamples and for
both job- and person-level specifications.
In the model estimated with no imputed values, the vari-
ance of the person and employer effects rises for both SIPP
and DER, and the difference between them remains similar.
However, the variance of the time-period specific components,
σ²_1 and σ²_2, falls for both SIPP and DER, and the gap
narrows. In the model estimated with only jobs that had at
least one month of imputed earnings, the DER person and
employer variance components remain similar to the other
two models, but the SIPP person and employer components
become smaller. The variance of the time-period specific
component rises for both the SIPP and DER, and the gap
between them increases. This result is reflected in table 6
where the no-imputations sample has the highest variance in
the SIPP and the imputations-only sample has the lowest SIPP
variance. For the DER, the opposite is true: the imputations-
only sample has the highest variance and the no-imputations
sample has the lowest variance. Our hypothesis is that this
ranking of overall levels of conditional variance is due to
the Census Bureau’s survey imputation methods, specifically,
that earnings non-responders are not ignorably missing given
the conditioning data used in the hot-deck imputation.
The overall conditional variance for each data source
remains constant regardless of our definition of truth. The
variance of the signal moves between the SIPP variance
and the DER variance, depending on the value of ω. When
ω=(1, 0)—SIPP is truth—the signal equals the variance
of the SIPP. When ω=(0, 1)—DER is truth—the signal
equals the variance of the DER. The variance of the measure-
ment error moves accordingly and rises for each source as we
place less weight on that source being the truth. Our reliabil-
ity statistic as defined in equations (15) and (16) is shown
in column 6 of table 6. Without additional assumptions or
data about which source is truth, there is no way to choose
one reliability statistic over the others. However, the range is
informative in that it shows how much the SIPP statistic might
change if researchers were to move away from the concept
of administrative data as truth. Depending on the assumption
about truth, the statistic ranges from 1 to 0.6 for the SIPP and
from 1 to 0.68 for the DER in the full job sample. Not surpris-
ingly, the range of both the SIPP and DER reliability statistics
is larger for the model estimated with only jobs with imputa-
tions (1 to 0.37 for the SIPP and 1 to 0.55 for the DER) and
smaller for the model using only jobs without imputations
(1 to 0.68 for the SIPP and 1 to 0.73 for the DER).
Table 6.—Reliability Statistics
Sample | Truth Model (ω_SIPP, ω_DER) | Variance of Y (SIPP, DER) | Variance of Signal | Variance of ME (SIPP, DER) | Reliability Statistic RR (SIPP, DER) | RR at t−1 (SIPP, DER) | RR at t−6 (SIPP, DER)
Sample: Jobs
All 1,0 1.347 1.648 1.347 0.000 0.536 1.000 0.675 1.000 0.759 1.000 0.882
All 0.5,0.5 1.347 1.648 1.364 0.134 0.134 0.901 0.919 0.924 0.940 0.961 0.971
All 0.1,0.9 1.347 1.648 1.570 0.434 0.005 0.678 0.997 0.754 0.998 0.874 0.999
All 0,1 1.347 1.648 1.648 0.536 0.000 0.602 1.000 0.696 1.000 0.844 1.000
No imputations 1,0 1.391 1.610 1.391 0.000 0.441 1.000 0.726 1.000 0.799 1.000 0.895
No imputations 0.5,0.5 1.391 1.610 1.390 0.110 0.110 0.921 0.931 0.942 0.950 0.968 0.974
No imputations 0.1,0.9 1.391 1.610 1.548 0.357 0.004 0.743 0.997 0.813 0.998 0.896 0.999
No imputations 0,1 1.391 1.610 1.610 0.441 0.000 0.683 1.000 0.769 1.000 0.872 1.000
Imputations only 1,0 1.212 1.728 1.212 0.000 0.770 1.000 0.554 1.000 0.641 1.000 0.796
Imputations only 0.5,0.5 1.212 1.728 1.278 0.193 0.193 0.841 0.889 0.848 0.910 0.912 0.949
Imputations only 0.1,0.9 1.212 1.728 1.607 0.624 0.008 0.485 0.996 0.507 0.996 0.715 0.998
Imputations only 0,1 1.212 1.728 1.728 0.770 0.000 0.365 1.000 0.391 1.000 0.648 1.000
Sample: People
All 1,0 0.942 1.076 0.942 0.000 0.280 1.000 0.740 1.000 0.853 1.000 0.860
All 0.5,0.5 0.942 1.076 0.939 0.070 0.070 0.926 0.935 0.955 0.963 0.949 0.965
All 0.1,0.9 0.942 1.076 1.037 0.227 0.003 0.759 0.997 0.855 0.999 0.835 0.999
All 0,1 0.942 1.076 1.076 0.280 0.000 0.703 1.000 0.821 1.000 0.796 1.000
No imputations 1,0 0.941 1.037 0.941 0.000 0.209 1.000 0.798 1.000 0.900 1.000 0.895
No imputations 0.5,0.5 0.941 1.037 0.937 0.052 0.052 0.944 0.950 0.972 0.975 0.971 0.974
No imputations 0.1,0.9 0.941 1.037 1.009 0.170 0.002 0.820 0.998 0.910 0.999 0.905 0.999
No imputations 0,1 0.941 1.037 1.037 0.209 0.000 0.778 1.000 0.889 1.000 0.882 1.000
Imputations only 1,0 0.928 1.118 0.928 0.000 0.411 1.000 0.632 1.000 0.744 1.000 0.756
Imputations only 0.5,0.5 0.928 1.118 0.920 0.103 0.103 0.889 0.908 0.902 0.936 0.809 0.939
Imputations only 0.1,0.9 0.928 1.118 1.062 0.333 0.004 0.641 0.996 0.683 0.997 0.380 0.998
Imputations only 0,1 0.928 1.118 1.118 0.411 0.000 0.557 1.000 0.609 1.000 0.234 1.000
All, employer effect 1,0 1.013 1.146 1.013 0.000 0.289 1.000 0.748 1.000 0.839 1.000 0.821
All, employer effect .5,.5 1.013 1.146 1.007 0.072 0.072 0.929 0.937 0.951 0.960 0.938 0.955
All, employer effect .1,.9 1.013 1.146 1.107 0.234 0.003 0.769 0.997 0.842 0.998 0.798 0.998
All, employer effect 0,1 1.013 1.146 1.146 0.289 0.000 0.715 1.000 0.805 1.000 0.750 1.000
Calculations made using variance components from REML mixed-effects models as reported in table 5. The label “imputations only” means that at least one SIPP year contains a hot-deck earnings imputation for at
least one month for that person-job.
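The reliability columns in table 6 can be recovered directly from the variance columns: numerically, each source's reliability equals one minus the ratio of its measurement-error variance to its total conditional variance. A short Python check (the `reliability` helper name is ours) against the full job sample:

```python
def reliability(var_y, var_me):
    """Reliability as the signal share of total variance: 1 - Var(ME)/Var(Y)."""
    return 1.0 - var_me / var_y

# Full job sample, table 6, omega = (0.5, 0.5):
# Var(Y_SIPP) = 1.347, Var(Y_DER) = 1.648, Var(ME) = 0.134 for each source.
rr_sipp = reliability(1.347, 0.134)
rr_der = reliability(1.648, 0.134)
assert round(rr_sipp, 3) == 0.901   # matches the table 6 entry
assert round(rr_der, 3) == 0.919

# omega = (0, 1): DER is truth, so its ME variance is 0 and RR_DER = 1,
# while the SIPP reliability falls to its lower bound for this sample.
assert reliability(1.648, 0.0) == 1.0
assert round(reliability(1.347, 0.536), 3) == 0.602
```

This also makes the ranking across subsamples mechanical: the imputations-only sample has the largest SIPP measurement-error variance, so it has the lowest SIPP reliability.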
The reliability statistic is predictably higher for the mod-
els estimated with person-level data and ranges from 1 to
0.7 for the SIPP and 1 to 0.74 for the DER when estimated
using the whole sample. These results are consistent with
those of Gottschalk and Huynh (2010), who found a reliabil-
ity statistic of 0.67 for the 1996 panel, which rose to 0.73 when
earnings imputations were dropped. Although we include
four additional SIPP panels, when we declare the DER to be
the truth, our SIPP reliability statistics of 0.70 and 0.78 for
the full and no-imputation samples, respectively, are close.
For the person-level specification that includes the dominant
employer effect, there is very little change in the reliability
statistics for either the SIPP or the DER. The only major
difference is the fall in the variance of the person effects,
which is also what Abowd, Kramarz, and Margolis (1999)
find in French administrative data and Woodcock (2008) finds
for American administrative data. Most of the variance due
to employers was previously attributed to individuals in the
validation studies cited in section II.
In the final two columns of table 6, we show reliabil-
ity statistics for data one year lagged and six years lagged,
calculated based on equations (17) and (18). For every defi-
nition of truth, the SIPP reliability statistic is higher for time
period t−1 than for t, a finding that is also consistent with
Gottschalk and Huynh’s result that the impact of measure-
ment error on earnings mobility estimates is mitigated by
the structure of the error, in particular its property of mean
reversion. Even when we place some probability on the SIPP
being true and the DER having error, our results support this conclusion.
An interesting feature of tables 5 and 6 is that when our
model is fit at the person level, the variance of the person and
employer effects is substantially less than when the model
is fit at the job level. This may be an important new finding
because employer effects are rarely included at the person
level, and we speculate on its causes. First, there is much
more variance in the SIPP and DER earnings measures at the
job level than at the person level. The effect of multiple job
holding over the year, either simultaneous or sequential, is to
reduce the total variance of earnings and to reduce the contri-
bution of individual differences in earnings to that variance,
particularly when we control for the effect of the dominant
employer. Second, persistent differences in the pay policies
of employers are mitigated by changing employers but not
by enough to eliminate employer effects in annual earnings
from all sources.
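A small simulation illustrates the aggregation argument (the functional form and parameter values are ours, chosen purely for illustration): when annual earnings are summed over two jobs that share a person effect but carry independent job-level noise, the variance of log earnings at the person level falls below the job-level variance, because summing averages out the job-specific component.

```python
import random, math, statistics

# Each person has a persistent person effect; each of two jobs adds
# independent job-level noise on top of it. Hypothetical variances.
rng = random.Random(42)
job_logs, person_logs = [], []
for _ in range(5000):
    person = rng.gauss(0.0, 0.5)                               # person effect
    jobs = [math.exp(person + rng.gauss(0.0, 0.7)) for _ in range(2)]
    job_logs.extend(math.log(j) for j in jobs)                 # job-level obs
    person_logs.append(math.log(sum(jobs)))                    # person-level obs
var_job = statistics.pvariance(job_logs)
var_person = statistics.pvariance(person_logs)
# Summing across jobs shrinks the job-noise contribution.
assert var_person < var_job
```

The person effect's variance passes through unchanged, so its *share* of person-level variance is larger even though total variance is smaller, consistent with the pattern discussed above.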
Table 7.—Fixed Effects, Job-Level Data, All Observations
True (Weighted Average) SIPP ME DER ME
Weight(SIPP,DER) Weight(SIPP,DER) Weight(SIPP,DER)
1,0 .5,.5 .1,.9 0,1 .5,.5 .1,.9 0,1 1,0 .5,.5 .1,.9
Male white
High school diploma Coefficient 0.302 0.302 0.302 0.302 0.000 0.000 0.000 0.000 0.000 0.000
SE 0.017 0.017 0.018 0.019 0.005 0.009 0.010 0.010 0.005 0.001
Some college Coefficient 0.306 0.309 0.312 0.313 0.004 0.006 0.007 0.007 0.004 0.001
SE 0.017 0.017 0.018 0.018 0.005 0.008 0.009 0.009 0.005 0.001
College degree Coefficient 0.823 0.841 0.856 0.860 0.018∗∗ 0.033∗∗ 0.037∗∗ 0.037 0.018 0.004
SE 0.021 0.021 0.022 0.023 0.006 0.011 0.012 0.012 0.006 0.001
Graduate degree Coefficient 0.845 0.861 0.873 0.876 0.016∗∗ 0.028∗∗ 0.031∗∗ 0.031 0.016 0.003
SE 0.020 0.021 0.022 0.022 0.006 0.010 0.011 0.011 0.006 0.001
Male nonwhite
High school diploma Coefficient 0.361 0.363 0.365 0.365 0.002 0.004 0.004 0.004 0.002 0.000
SE 0.044 0.044 0.047 0.048 0.012 0.022 0.025 0.025 0.012 0.002
Some college Coefficient 0.420 0.423 0.425 0.426 0.003 0.005 0.006 0.006 0.003 0.001
SE 0.044 0.044 0.047 0.048 0.012 0.022 0.025 0.025 0.012 0.002
College degree Coefficient 0.823 0.816 0.810 0.809 0.007 0.013 0.014 0.014 0.007 0.001
SE 0.058 0.058 0.062 0.063 0.016 0.029 0.033 0.033 0.016 0.003
Graduate degree Coefficient 0.996 0.997 0.998 0.998 0.001 0.002 0.002 0.002 0.001 0.000
SE 0.055 0.056 0.059 0.060 0.016 0.028 0.032 0.032 0.016 0.003
Intercept difference Coefficient 0.266 0.297 0.322 0.328 0.031 0.056 0.062 0.062 0.031 0.006
SE 0.084 0.083 0.088 0.090 0.026 0.047 0.052 0.052 0.026 0.005
Female white
High school diploma Coefficient 0.311 0.317 0.323 0.324 0.007 0.012 0.014 0.014 0.007 0.001
SE 0.019 0.019 0.020 0.020 0.005 0.009 0.011 0.011 0.005 0.001
Some college Coefficient 0.381 0.384 0.386 0.386 0.003 0.005 0.005 0.005 0.003 0.001
SE 0.018 0.018 0.020 0.020 0.005 0.009 0.010 0.010 0.005 0.001
College degree Coefficient 0.801 0.805 0.808 0.809 0.004 0.008 0.009 0.009 0.004 0.001
SE 0.023 0.023 0.024 0.025 0.006 0.011 0.013 0.013 0.006 0.001
Graduate degree Coefficient 0.952 0.958 0.963 0.965 0.006 0.011 0.012 0.012 0.006 0.001
SE 0.022 0.023 0.024 0.025 0.006 0.011 0.013 0.013 0.006 0.001
Intercept difference Coefficient 0.107 0.102 0.098 0.097 0.005 0.009 0.010 0.010 0.005 0.001
SE 0.041 0.040 0.043 0.044 0.013 0.023 0.025 0.025 0.013 0.003
Female nonwhite
High school diploma Coefficient 0.352 0.359 0.365 0.366 0.007 0.013 0.014 0.014 0.007 0.001
SE 0.040 0.040 0.043 0.044 0.011 0.021 0.023 0.023 0.011 0.002
Some college Coefficient 0.423 0.416 0.410 0.408 0.007 0.013 0.015 0.015 0.007 0.001
SE 0.039 0.039 0.042 0.043 0.011 0.020 0.022 0.022 0.011 0.002
College degree Coefficient 0.886 0.901 0.913 0.916 0.015 0.027 0.030 0.030 0.015 0.003
SE 0.051 0.052 0.055 0.056 0.015 0.026 0.029 0.029 0.015 0.003
Graduate degree Coefficient 1.141 1.161 1.176 1.180 0.020 0.035 0.039 0.039 0.020 0.004
SE 0.054 0.055 0.058 0.059 0.015 0.028 0.031 0.031 0.015 0.003
Intercept difference Coefficient 0.142 0.133 0.127 0.125 0.008 0.015 0.016 0.016 0.008 0.002
SE 0.076 0.075 0.079 0.081 0.024 0.042 0.047 0.047 0.024 0.005
1990 SIPP panel Coefficient 0.165 0.124 0.092 0.084 0.040∗∗∗ 0.073∗∗∗ 0.081∗∗∗ 0.081 0.040 0.008
SE 0.022 0.021 0.022 0.023 0.007 0.013 0.014 0.014 0.007 0.001
1991 SIPP panel Coefficient 0.127 0.079 0.041 0.032 0.048∗∗∗ 0.086∗∗∗ 0.095∗∗∗ 0.095 0.048 0.010
SE 0.020 0.020 0.021 0.022 0.007 0.012 0.013 0.013 0.007 0.001
1992 SIPP panel Coefficient 0.107 0.068 0.038 0.030 0.038∗∗∗ 0.069∗∗∗ 0.077∗∗∗ 0.077 0.038 0.008
SE 0.017 0.017 0.018 0.018 0.005 0.009 0.010 0.010 0.005 0.001
1993 SIPP panel Coefficient 0.041 0.005 0.024 0.032 0.036∗∗∗ 0.066∗∗∗ 0.073∗∗∗ 0.073 0.036 0.007
SE 0.016 0.016 0.017 0.018 0.005 0.009 0.010 0.010 0.005 0.001
Intercept Coefficient 6.858 6.819 6.787 6.779 0.040∗∗ 0.071∗∗ 0.079∗∗ 0.079 0.040 0.008
SE 0.037 0.036 0.038 0.039 0.012 0.022 0.024 0.024 0.012 0.002
Linear time trend Coefficient 0.038 0.037 0.036 0.036 0.001 0.001 0.001 0.001 0.001 0.000
SE 0.002 0.002 0.003 0.003 0.001 0.002 0.002 0.002 0.001 0.000
Fixed effects (coefficients on race, gender, education, panel, and time trend) from REML estimation of mixed-effects model using full job-level sample. Significant at *5%, **1%, and ***0.1%.
B. Fixed Effects
We turn next to the fixed effects from our estimation of
the MLMM model and present results from the job-level
model estimated with all the observations in table 7. Using
the definition from equation (19), in the first four columns,
we present different estimates of the vector Sig(B), calcu-
lated using the same priors for ωas we used to calculate the
reliability statistics in section VA. Since equation (19)defines
the truth to be the weighted average of the SIPP and DER
coefficients, the first and fourth columns of table 7 [ω=(1, 0)
and (0, 1)], correspond to coefficients from a SIPP earnings
equation and a DER earnings equation, respectively. These
end points define the range of each fixed effect. The SIPP and
DER measurement errors are defined according to equation
(20) and are reported in columns 5 to 10. When one source is
declared to be the truth, by definition the measurement error
for this source is 0; hence, each source has only three columns
of measurement error reported. Standard errors are reported
for both the true coefficients and the measurement error. Mea-
surement error significantly different from 0 is equivalent
to stating that there are significant differences between the
SIPP and DER estimates of a particular coefficient. Negative
measurement error for either source means that the source
coefficient was smaller than the weighted average coefficient.
By definition, the DER and SIPP measurement errors will
have opposite signs.
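The mechanics are compact: with truth defined as ω₁β_SIPP + ω₂β_DER, the SIPP error is ω₂(β_SIPP − β_DER) and the DER error is ω₁(β_DER − β_SIPP), which is why the two errors always have opposite signs. A Python sketch (the `decompose` helper name is ours) reproduces the college-degree entries for white males in table 7:

```python
def decompose(beta_sipp, beta_der, w_sipp, w_der):
    """Truth as a weighted average of the two coefficient estimates
    (as in equation (19)); each source's measurement error is its
    deviation from that truth (as in equation (20))."""
    truth = w_sipp * beta_sipp + w_der * beta_der
    me_sipp = beta_sipp - truth   # = w_der * (beta_sipp - beta_der)
    me_der = beta_der - truth     # = w_sipp * (beta_der - beta_sipp)
    return truth, me_sipp, me_der

# College-degree coefficient for white males (table 7):
# beta_SIPP = 0.823 (omega = (1,0) column), beta_DER = 0.860 (omega = (0,1)).
truth, me_s, me_d = decompose(0.823, 0.860, 0.5, 0.5)
assert abs(truth - 0.8415) < 1e-9        # table 7 reports 0.841
assert abs(me_s + me_d) < 1e-12          # equal weights: errors exactly offset
assert me_s < 0 < me_d                   # SIPP below truth, DER above

# omega = (0.1, 0.9): SIPP ME grows to 0.9 * 0.037, as in the table.
_, me_s2, _ = decompose(0.823, 0.860, 0.1, 0.9)
assert round(abs(me_s2), 3) == 0.033
```

Shifting weight toward the DER pushes the truth toward the DER coefficient, which is why the SIPP measurement error grows, and the DER error shrinks, as ω₂ rises.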
We remind readers that in mixed-effects modeling, the
fixed effects are the coefficients on the observed characteris-
tics of the individuals. These effects may vary over time and
differ from the random effects because the effect is estimated
directly instead of being inferred from its distribution. The
fixed effects included in our model are education in five lev-
els (no high school diploma (excluded case); high school
diploma; some college; college degree; graduate degree); a
piecewise linear spline in labor force experience with bend
points at 2, 5, 10, and 25 years of experience; an overall inter-
cept; SIPP panel effects (excluded case is 1996); and a linear
time trend. The education, experience, and overall intercept
coefficients are all interacted with race and gender to produce
separate estimates for white males, nonwhite males, white
females, and nonwhite females.
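The spline portion of the design can be made concrete. The sketch below (helper names and slope values are ours; the slopes are hypothetical, not the paper's estimates) builds the five segment regressors implied by bend points at 2, 5, 10, and 25 years and evaluates an experience profile of the kind reported in table 8:

```python
def spline_basis(x, knots=(2, 5, 10, 25)):
    """Piecewise-linear spline basis in experience: one regressor per
    segment, each equal to the years accumulated within that segment."""
    lower = (0,) + knots
    upper = knots + (float("inf"),)
    return [max(0.0, min(x, u) - l) for l, u in zip(lower, upper)]

def experience_effect(x, slopes):
    """Total effect of x years of experience: segment slopes times the
    years spent in each segment."""
    return sum(s * b for s, b in zip(slopes, spline_basis(x)))

# Seven years of experience: 2 in the first segment, 3 in the second,
# 2 in the third, none beyond.
assert spline_basis(7) == [2.0, 3.0, 2.0, 0.0, 0.0]
assert spline_basis(30) == [2.0, 3.0, 5.0, 15.0, 5.0]

# Hypothetical slopes, steep early and flat (here slightly negative) late:
slopes = [0.55, 0.20, 0.12, 0.03, -0.03]
assert round(experience_effect(2, slopes), 2) == 1.10
```

The effects at 2, 5, 10, 25, and 30 years in table 8 are linear combinations of the five estimated slopes of exactly this form, which is why their standard errors can be computed from the coefficient covariance matrix.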
The only fixed effects that are significantly different
between the SIPP and the DER are the college and gradu-
ate degree indicators for white males and the panel effects.
For these two education effects, SIPP measurement error esti-
mates range from 0.018 to 0.037, meaning that the SIPP
estimates of the returns to a college degree are approximately
2 to 4 log points lower than in the DER. Estimates for the
return to a graduate degree vary by approximately 1.5 to 3
log points, with the SIPP returns again being lower. The panel
effects represent average differences in annual earnings for
each SIPP panel relative to the 1996 panel. These effects are
negative in both the SIPP and DER, meaning that 1996 earn-
ings are higher on average. SIPP measurement error ranges
from 0.04 to 0.1, which means that the DER panel effects
are less negative, that is, for DER earnings, the differences
between panels are lower. Here, however, we caution against
assigning too much importance to this result. The 1996 panel
was three waves longer than the longest panel of the early
1990s and suffered much more from problems with individu-
als missing waves. Thus, as shown in table 1, our 1996 sample
size becomes very small when we drop individuals with miss-
ing waves and likely leaves us with a group of respondents
with different characteristics from those in the panels from
the early 1990s. Thus, comparisons across panels are difficult
to make. All the other coefficients in table 7 are very similar
between the SIPP and the DER with measurement error usu-
ally less than 1 percentage point and not significantly different
from 0.
In table 8, we show the effect on earnings of 2, 5, 10, 25, and
30 years of experience, calculated using the five coefficients
from the piece-wise linear spline in the main job-level model.
These effects are split by demographic group and, as with
the coefficients in table 7, we report the true effect based on
four different priors for ωand also the SIPP and DER mea-
surement error. Each effect is followed by its standard error.
For both white males and females, there are significant dif-
ferences between the SIPP and DER experience effects at 2
and 5 years, with the SIPP effect being larger by 5 to 15 log
points. At 10 and 25 years, there are no significant differences
between the SIPP and DER effects. For white men at 30 years,
the DER effect is 2 to 4 log points significantly larger, but for
white women, the SIPP and DER effects are not significantly
different at 30 years. For nonwhite males and females, the dif-
ferences between the SIPP and the DER are significant only
at 2 years. For nonwhite men, the measurement error effects
are relatively large at 5 years and 30 years, ranging from 5 to
10 log points, but these results are imprecisely estimated and
are not significant. Nonwhite females have similarly large
standard errors and hence no significant effects after 2 years,
but even the magnitude of the effects stays small after 5 years.
In this sense, the profiles of white and nonwhite women are
similar to each other. The SIPP effect is initially higher, and
then the SIPP and DER converge and are quite similar at 30
years. For both white and nonwhite men, the SIPP effect is
initially larger and then converges to the DER effect, and then
the DER effect becomes larger, although only for white males
is this pattern significant.
These results change somewhat when imputations are
dropped or when the analysis is done at the person level.
In particular, returns for women stay higher in the SIPP rela-
tive to the DER across the whole experience profile. We refer
interested readers to appendix D, where we discuss the full
set of comparable results for the job-level model estimated
using jobs without imputed earnings and jobs with at least one
month of imputed earnings and for the person-level model
using all three samples and the dominant employer specifi-
cation. This appendix also contains graphical summaries of
the experience effects.
VI. Conclusion
We used linked survey data from the Census Bureau’s
SIPP and administrative data from the SSA’s DER, matched
at both the job and person levels, to estimate and analyze
measurement error models based on a multivariate linear
mixed-effects model for the pair of SIPP and DER annual
earnings outcomes. We showed that linking survey and
administrative data at the job level is a substantially more
complicated and nuanced process than linking the same data
at the individual level. The potential for measurement error
due to mismatching is substantial, and we documented the
steps taken to control that error. In the statistical specification,
we find that the conditional variance of the DER measures,
given the factors in both the fixed- and random-effect design
matrices, is greater than that of the SIPP component by com-
ponent for the person, employer, and time effects. There
Table 8.Experience Profiles from Job-Level Data, All Observations, for Different Truth Definitions
Experience =2 years Experience =5 years Experience =10 years Experience =25 years Experience =30 years
Weight(SIPP,DER) Weight(SIPP,DER) Weight(SIPP,DER) Weight(SIPP,DER) Weight(SIPP,DER)
1,0 .5,.5 .1,.9 0,1 1,0 .5,.5 .1,.9 0,1 1,0 .5,.5 .1,.9 0,1 1,0 .5,.5 .1,.9 0,1 1,0 .5,.5 .1,.9 0,1
Male White
Effect 1.154 1.079 1.019 1.004 1.809 1.762 1.724 1.714 2.406 2.388 2.373 2.370 2.914 2.929 2.942 2.945 2.767 2.786 2.802 2.806
SE 0.036 0.035 0.037 0.038 0.033 0.033 0.035 0.036 0.031 0.031 0.033 0.034 0.031 0.031 0.032 0.033 0.030 0.030 0.032 0.032
Effect 0.000 0.075 0.135 0.150 0.000 0.047 0.085 0.095 0.000 0.018 0.032 0.036 0.000 0.015 0.027 0.030 0.000 0.020 0.035 0.039
SE 0.000 0.012 0.021 0.023 0.000 0.010 0.018 0.020 0.000 0.010 0.017 0.019 0.000 0.010 0.017 0.019 0.000 0.009 0.017 0.019
∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗
Effect 0.150 0.075 0.015 0.000 0.095 0.047 0.009 0.000 0.036 0.018 0.004 0.000 0.030 0.015 0.003 0.000 0.039 0.020 0.004 0.000
SE 0.023 0.012 0.002 0.000 0.020 0.010 0.002 0.000 0.019 0.010 0.002 0.000 0.019 0.010 0.002 0.000 0.019 0.009 0.002 0.000
Female White
Effect 1.089 1.044 1.008 0.999 1.590 1.557 1.530 1.524 2.025 2.015 2.007 2.005 2.403 2.398 2.394 2.393 2.287 2.288 2.288 2.289
SE 0.034 0.033 0.035 0.036 0.032 0.031 0.033 0.034 0.030 0.030 0.032 0.033 0.031 0.030 0.032 0.033 0.030 0.029 0.031 0.032
Effect 0.000 0.045 0.081 0.090 0.000 0.033 0.059 0.066 0.000 0.010 0.017 0.019 0.000 0.005 0.009 0.010 0.000 0.001 0.002 0.002
SE 0.000 0.011 0.020 0.022 0.000 0.010 0.017 0.019 0.000 0.009 0.017 0.019 0.000 0.009 0.017 0.019 0.000 0.009 0.016 0.018
∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗
Effect 0.090 0.045 0.009 0.000 0.066 0.033 0.007 0.000 0.019 0.010 0.002 0.000 0.010 0.005 0.001 0.000 0.002 0.001 0.000 0.000
SE 0.022 0.011 0.002 0.000 0.019 0.010 0.002 0.000 0.019 0.009 0.002 0.000 0.019 0.009 0.002 0.000 0.018 0.009 0.002 0.000
Male Nonwhite
Effect 1.461 1.388 1.330 1.315 1.873 1.826 1.788 1.779 2.412 2.425 2.436 2.439 2.792 2.831 2.862 2.869 2.728 2.776 2.814 2.824
SE 0.097 0.095 0.101 0.103 0.088 0.087 0.092 0.094 0.084 0.083 0.088 0.090 0.083 0.082 0.087 0.089 0.081 0.080 0.085 0.087
Effect 0.000 0.073 0.131 0.146 0.000 0.047 0.085 0.095 0.000 0.014 0.024 0.027 0.000 0.038 0.069 0.077 0.000 0.048 0.087 0.096
SE 0.000 0.031 0.056 0.062 0.000 0.027 0.048 0.054 0.000 0.026 0.047 0.052 0.000 0.026 0.046 0.051 0.000 0.025 0.045 0.050
Effect 0.146 0.073 0.015 0.000 0.095 0.047 0.009 0.000 0.027 0.014 0.003 0.000 0.077 0.038 0.008 0.000 0.096 0.048 0.010 0.000
SE 0.062 0.031 0.006 0.000 0.054 0.027 0.005 0.000 0.052 0.026 0.005 0.000 0.051 0.026 0.005 0.000 0.050 0.025 0.005 0.000
Female Nonwhite
Effect 1.187 1.117 1.061 1.047 1.605 1.563 1.531 1.522 1.959 1.937 1.919 1.915 2.356 2.354 2.352 2.351 2.266 2.271 2.274 2.275
SE 0.085 0.084 0.089 0.091 0.077 0.076 0.081 0.083 0.073 0.072 0.077 0.079 0.074 0.073 0.077 0.079 0.072 0.071 0.075 0.077
Effect 0.000 0.070 0.127 0.141 0.000 0.041 0.074 0.082 0.000 0.022 0.040 0.044 0.000 0.002 0.004 0.005 0.000 0.005 0.009 0.010
SE 0.000 0.028 0.050 0.056 0.000 0.024 0.043 0.048 0.000 0.023 0.041 0.046 0.000 0.023 0.041 0.046 0.000 0.022 0.040 0.045
Effect 0.141 0.070 0.014 0.000 0.082 0.041 0.008 0.000 0.044 0.022 0.004 0.000 0.005 0.002 0.000 0.000 0.010 0.005 0.001 0.000
SE 0.056 0.028 0.006 0.000 0.048 0.024 0.005 0.000 0.046 0.023 0.005 0.000 0.046 0.023 0.005 0.000 0.045 0.022 0.004 0.000
Effect on earnings of 2, 5, 10, 25, and 30 years of experience, calculated using the five coefficients from the piece-wise linear spline (0–2, 2–5, 5–10, 10–25, 25+ years) in the job-level mixed-effects model, full sample. The asterisks refer to the SIPP ME effects: significant at *5%, **1%, and ***0.1%.
is more variability in the DER job- and person-level data, even controlling for demography, education, and labor force experience.
In our model, neither the SIPP nor the DER measure was
treated as “true.” Instead, we specified a prior weight vec-
tor that was used to define “truth” as a weighted average of
SIPP and DER. Such a specification allowed us systemati-
cally to consider the implications of errors in either measure
on the resulting conclusions about conditional means (fixed
effects) and variance components (random effects). Consid-
ering the random components of the error process, we found
that the reliability statistics for SIPP and DER earnings mea-
sures were quite comparable except for the subsample of SIPP
person-jobs, where at least one year of SIPP earnings con-
tained a Census Bureau imputation. These measures were
less reliable than the DER. For the fixed effects, we found
very little statistically meaningful measurement error, with
most of the error being found in the highly educated white
male groups and the early-career experience profiles of male
and female whites.
Overall, our results point to the need to allow for mea-
surement error in both the survey and administrative data
when doing validation studies. However, there are certain
situations, particularly when the SIPP measure is based on
partially imputed data, where we find strong evidence that
the administrative measure contains less error. An impor-
tant next step is to combine our modeling procedure with an
audit study that determines the correct value of the earnings
measure as a function of variables that are measured for all
cases. Results from such work could be used by statistical
agencies to produce a measure of “true earnings” that is a
hybrid of survey and administrative data, a valuable measure
for researchers that would allow agencies to release information from administrative data while limiting confidentiality risk.
REFERENCES
Abowd, John M., and David Card, “On the Covariance Structure of Earnings
and Hours Changes,” Econometrica 57 (1989), 411–446.
Abowd, John M., Francis Kramarz, and David N. Margolis, “High
Wage Workers and High Wage Firms,” Econometrica 67 (1999), 251–333.
Angrist, Joshua, and Alan B. Krueger, “Empirical Strategies in Labor
Economics” (Vol. 3, pt. 1, pp. 1277–1366), in O. Ashenfelter and
D. Card, eds., Handbook of Labor Economics (New York: Elsevier, 1999).
Bound, John, Charles Brown, Greg J. Duncan, and Willard L. Rodgers,
“Evidence on the Validity of Cross-Sectional and Longitudinal Labor
Market Data,” Journal of Labor Economics 12 (1994), 345–368.
Bound, John, Charles Brown, and Nancy Mathiowetz, “Measurement Error
in Survey Data” (pp. 3705–3843), in J. J. Heckman and E. Leamer,
eds., Handbook of Econometrics (New York: Elsevier, 2001).
Bound, John, and Alan B. Krueger, “The Extent of Measurement Error
in Longitudinal Earnings Data: Do Two Wrongs Make a Right?”
Journal of Labor Economics 9 (1991), 1–24.
Duncan, Greg J., and Daniel H. Hill, “An Investigation of the Extent and
Consequences of Measurement Error in Labor-Economic Survey
Data,” Journal of Labor Economics 3 (1985), 508–532.
Fellegi, Ivan P., and Alan B. Sunter, “A Theory for Record Linkage,” Journal
of the American Statistical Association 64 (1969), 1183–1210.
Fuller, Wayne, Measurement Error Models (New York: Wiley, 1987).
Gilmour, Arthur R., B. J. Gogel, Robin Thompson, and Brian R. Cullis,
ASREML User Guide Release 3.0 (Hemel Hempstead, UK: VSN
International Ltd., 2009).
Gottschalk, Peter, and Minh Huynh, “Are Earnings Inequality and Mobility
Overstated? The Impact of Non-Classical Measurement Error,” this
review 92 (2010), 302–315.
Gottschalk, Peter, and Robert Moffitt, “Changes in Job Instability and Inse-
curity Using Monthly Survey Data,” Journal of Labor Economics
17 (1999), S91–S126.
Groves, Robert M., Floyd J. Fowler Jr., Mick P. Couper, James M. Lep-
kowski, Eleanor Singer,and Roger Tourangeau, Survey Methodology
(New York: Wiley, 2004).
Kapteyn, Arie, and Jelmer Y. Ypma, “Measurement Error and Misclassifi-
cation: A Comparison of Survey and Administrative Data,” Journal
of Labor Economics 25 (2007), 513–550.
Meijer, Erik, Susann Rohwedder, and Tom Wansbeek, “Measurement Error
in Earnings Data: Using a Mixture Model Approach to Combine Sur-
vey and Register Data,” Journal of Business and Economic Statistics
30 (2012): 191–201.
Mellow, Wesley, and Hal Sider, “Accuracy of Response in Labor Market
Surveys: Evidence and Implications,” Journal of Labor Economics
1 (1983): 331–344.
Pischke, Jörn-Steffen, “Measurement Error and Earnings Dynamics: Some
Estimates from the PSID Validation Study,” Journal of Business and
Economic Statistics 13 (1995), 305–314.
Roemer, Marc, “Using Administrative Earnings Records to Assess Wage
Data Quality in the March Current Population Survey and the
Survey of Income and Program Participation,” LEHD techni-
cal paper TP-2002–22 (2002),
Rubin, Donald B., “Inference and Missing Data,” Biometrika 63 (1976), 581–592.
Stinson, Martha H., “Technical Description of SIPP Job Identification Number Editing, 1990–1993 SIPP Panels,” SIPP technical paper, U.S. Census Bureau (2003), /core_notes/DescriptionSIPPJOBIDEDITING.pdf.
Woodcock, Simon, “Wage Differentials in the Presence of Unobserved
Worker, Firm and Match Heterogeneity,” Labour Economics 15
(2008), 771–793.
... 22 This difference is strongly exhibited in our six data series, as shown by the cumulative distribution functions in Figure 2, which shows the LEHD to 22 However, matches of other surveys to administrative data have found differences in both directions-survey reports typically have jobs and earnings reports that are missing from the administrative data as well as the other way around. In many cases, this seems to be because the administrative data are in error and do not, for a variety of reasons, pick up jobs and earnings that survey respondents report (Juhn and McCue, 2010;Abraham et al., 2013;Abowd and Stinson, 2013). ...
Full-text available
There is a large literature on earnings and income volatility in labor economics, household finance, and macroeconomics. One strand of that literature has studied whether individual earnings volatility has risen or fallen in the U.S. over the last several decades. There are strong disagreements in the empirical literature on this important question, with some studies showing upward trends, some showing downward trends, and some showing no trends. Some studies have suggested that the differences are the result of using flawed survey data instead of more accurate administrative data. This paper summarizes the results of a project attempting to reconcile these findings with four different data sets and six different data series--three survey and three administrative data series, including two which match survey respondent data to their administrative data. Using common specifications, measures of volatility, and other treatments of the data, four of the six data series show a lack of any significant long-term trend in male earnings volatility over the last 20-to-30+ years when differences across the data sets are properly accounted for. A fifth data series (the PSID) shows a positive net trend but small in magnitude. A sixth, administrative, data set, available only since 1998, shows no net trend 1998-2011 and only a small decline thereafter. Many of the remaining differences across data series can be explained by differences in their cross-sectional distribution of earnings, particularly differences in the size of the lower tail. We conclude that the data sets we have analyzed, which include many of the most important available, show little evidence of any significant trend in male earnings volatility since the mid-1980s.
... The effects of the correction for measurement error on low-pay incidence are covered by Breen and Moisio (2004). The assumption that administrative records can be considered error-free has been questioned in papers such as Abowd and Stinson (2013) and Kapteyn and Ypma (2007). These authors link their skepticism for this type of data to mismatching, because a value recorded in an administrative file is likely to refer to a different observation. ...
Full-text available
The paper adopts workers' administrative records to study the low-pay phenomenon in Italy between 1990 and 2017. We compute different indicators, in particular a relative measure (threshold set at 0.6 of the median of yearly and monthly labour earnings) jointly with an absolute measure based on absolute poverty thresholds for single individuals. Then, we study the determinants of low-pay with a descriptive regression framework. Finally, we verify the possible role of the labour market de-regulation reforms in shaping low-pay dynamics. The main results are the increasing trend of low-pay incidence between 1990 and 2017 and the growing role of low-pay persistence from 2000 up to 2017
Advances in agricultural data production provide ever-increasing opportunities for pushing the research frontier in agricultural economics and designing better agricultural policy. As new technologies present opportunities to create new and integrated data sources, researchers face tradeoffs in survey design that may reduce measurement error or increase coverage. In this chapter, we first review the econometric and survey methodology literatures that focus on the sources of measurement error and coverage bias in agricultural data collection. Second, we provide examples of how agricultural data structure affects testable empirical models. Finally, we review the challenges and opportunities offered by technological innovation to meet old and new data demands and address key empirical questions, focusing on the scalable data innovations of greatest potential impact for empirical methods and research.
We identify two sets of households in the Panel Study of Income Dynamics (PSID) differing dramatically in their income and consumption dynamics, although both should be equally representative. The degree of consumption insurance in each subsample is consistent with the standard incomplete-markets model’s prediction. We contrast PSID and administrative earnings data and study the patterns in international datasets modeled on the PSID. We find an important role of differential attrition based on the dynamic properties of incomes in inducing the differences and identify PSID households providing a better guide to income dynamics and consumption insurance in the U.S.
Household finance surveys are now common in many countries. However, the validity of the self-reported financial information is still understudied, especially for complex choices. Using a unique matched dataset between the Chilean Household Finance Survey and the banking system’s loan records, we find a positive effect of financial literacy on the accuracy of loan reporting. These findings are robust to the use of several proxies for financial literacy, such as the OECD INFE measure, the knowledge of the respondent’s personal pension account type, or the use of electronic means of payments. Using a nearest neighbor matching estimator, we confirm that the effect of financial literacy on the accuracy of loan reporting is causal even after controlling for several observable characteristics.
Individual records from personal interviews conducted for a survey on income in Modena during 2012, covering tax year 2011, were matched with the corresponding records in the Italian Ministry of Finance databases containing fiscal income data for tax year 2011. Analysis of the resulting data set suggested that the fiscal income data were generally more reliable than the surveyed income data. Moreover, comparing the surveyed income data with the fiscal income data made it possible to identify the factors determining over- and under-reporting, as well as measurement errors, for suitable categories of interviewees: taxpayers who are forced to respect the tax laws (the public sector) and taxpayers who have many evasion options (the private sector). The percentage of under-reporters (67.3%) was higher than the percentage of over-reporters (32.7%). Level of income, age, and education were the main regressors affecting measurement errors and the behaviour of tax evaders. Tax evasion, and the impacts of personal factors affecting evasion, were evaluated using various approaches. Average tax evasion amounted to 26.0% of fiscal income, and about 10% of the sample was made up of possible total tax evaders.
Family income questions in general purpose surveys are usually collected with either a single-question summary design or a multiple-question disaggregation design. It is unclear how well estimates from the two approaches agree with each other. The current paper takes advantage of a large-scale survey that collected family income with both methods. With data from 14,222 urban and rural families in the 2018 wave of the nationally representative China Family Panel Studies, we compare the two estimates and further evaluate factors that might contribute to the discrepancy. We find that the two estimates are loosely matched in only a third of all families, and most of the matched families have a simple income structure. Although the mean of the multiple-question estimate is larger than that of the single-question estimate, the pattern is not monotonic. At lower percentiles up to the median, the single-question estimate is larger, whereas the multiple-question estimate is larger at higher percentiles. Larger family sizes and more income sources contribute to a higher likelihood of inconsistent estimates from the two designs. Families with wage income as the main income source have the highest likelihood of giving consistent estimates compared with all other families. In contrast, families with agricultural income or property income as the main source tend to have a very high probability of larger single-question estimates. Omission of certain income components and rounding can explain over half of the inconsistencies with higher multiple-question estimates and a quarter of the inconsistencies with higher single-question estimates.
Loss functions are widely used to compare several competing forecasts. However, forecast comparisons are often based on mismeasured proxy variables for the true target. We introduce the concept of exact robustness to measurement error for loss functions and fully characterize this class of loss functions as the Bregman class. Hence, only conditional mean forecasts can be evaluated exactly robustly. For such exactly robust loss functions, forecast loss differences are on average unaffected by the use of proxy variables and, thus, inference on conditional predictive ability can be carried out as usual. Moreover, we show that more precise proxies give predictive ability tests higher power in discriminating between competing forecasts. Simulations illustrate the different behavior of exactly robust and non-robust loss functions. An empirical application to US GDP growth rates demonstrates the non-robustness of quantile forecasts. It also shows that it is easier to discriminate between mean forecasts issued at different horizons if a better proxy for GDP growth is used.
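The exact-robustness result in this abstract can be illustrated with a small simulation (a sketch under assumed Gaussian truth and mean-zero proxy noise, with two constant mean forecasts as hypothetical competitors): the average squared-error loss difference, a Bregman loss, is essentially unchanged when the noisy proxy replaces the true target, while the absolute-error loss difference shifts.

```python
import random

random.seed(0)
n = 50_000
truth = [random.gauss(2.0, 1.0) for _ in range(n)]
proxy = [y + random.gauss(0.0, 0.7) for y in truth]  # mean-zero proxy noise

f1, f2 = 2.0, 2.5  # two competing constant mean forecasts

def mean_loss_diff(target, loss):
    """Average loss difference between forecast f1 and forecast f2."""
    return sum(loss(y, f1) - loss(y, f2) for y in target) / len(target)

squared = lambda y, f: (y - f) ** 2   # a Bregman loss: exactly robust
absolute = lambda y, f: abs(y - f)    # not a Bregman loss

sq_truth = mean_loss_diff(truth, squared)
sq_proxy = mean_loss_diff(proxy, squared)
ae_truth = mean_loss_diff(truth, absolute)
ae_proxy = mean_loss_diff(proxy, absolute)

# Squared-error loss differences are nearly identical under truth and proxy;
# absolute-error loss differences move when the proxy replaces the truth.
```

The mechanism is visible in the algebra: for squared error, the cross term between forecast and proxy noise has zero mean, so the average loss difference is unaffected; no such cancellation holds for absolute error.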
Survey data on earnings tend to contain measurement error. Administrative data are superior in principle, but they are worthless in the case of a mismatch. We develop methods for prediction in mixture factor analysis models that combine both data sources to arrive at a single earnings figure. We apply the methods to a Swedish data set. Our results show that register earnings data perform poorly if there is a (small) probability of a mismatch. Survey earnings data are more reliable, despite their measurement error. Predictors that combine both sources and take conditional class probabilities into account outperform all other predictors.
Measures of inequality and mobility based on self-reported earnings reflect attributes of both the joint distribution of earnings across time and the joint distribution of measurement error and earnings. While classical measurement error would increase measures of inequality and mobility, there is substantial evidence that measurement error in earnings is not classical. In this paper, we present the analytical links between nonclassical measurement error and some summary measures of inequality and mobility. The empirical importance of nonclassical measurement error is explored using the Survey of Income and Program Participation (SIPP) matched to tax records. We find that the effects of nonclassical measurement error are large. However, these nonclassical effects are largely offsetting when estimating mobility, as measured by the intertemporal correlation in earnings. As a result, SIPP estimates of the correlation are similar to estimates based on tax records, though SIPP estimates of inequality are smaller than estimates based on tax records. © 2010 by the President and Fellows of Harvard College and the Massachusetts Institute of Technology.
When making sampling distribution inferences about the parameter of the data, θ, it is appropriate to ignore the process that causes missing data if the missing data are 'missing at random' and the observed data are 'observed at random', but these inferences are generally conditional on the observed pattern of missing data. When making direct likelihood or Bayesian inferences about θ, it is appropriate to ignore the process that causes missing data if the missing data are missing at random and the parameter of the missing data process is 'distinct' from θ. These conditions are the weakest general conditions under which ignoring the process that causes missing data always leads to correct inferences.
This article investigates error properties of survey reports of labor market variables. We use the Panel Study of Income Dynamics (PSID) Validation Study, a two-wave panel survey of workers employed by a large firm that shared its detailed payroll records. Individuals' reports of annual earnings are fairly accurate. Errors are negatively related to true earnings, reducing bias due to measurement error when earnings are used as an independent variable. Biases are moderately larger for changes in earnings. Earnings per hour are less reliably reported than annual earnings. Biases in estimating earnings functions are relatively small, but those in labor supply functions may be important.
A mathematical model is developed to provide a theoretical framework for a computer-oriented solution to the problem of recognizing those records in two files which represent identical persons, objects or events (said to be matched). A comparison is to be made between the recorded characteristics and values in two records (one from each file) and a decision made as to whether or not the members of the comparison pair represent the same person or event, or whether there is insufficient evidence to justify either of these decisions at stipulated levels of error. These three decisions are referred to as a link (A1), a non-link (A3), and a possible link (A2). The first two decisions are called positive dispositions. The two types of error are defined as the error of the decision A1 when the members of the comparison pair are in fact unmatched, and the error of the decision A3 when the members of the comparison pair are in fact matched. The probabilities of these errors are defined as μ = Σ_{γ∈Γ} u(γ)P(A1|γ) and λ = Σ_{γ∈Γ} m(γ)P(A3|γ) respectively, where u(γ), m(γ) are the probabilities of realizing γ (a comparison vector whose components are the coded agreements and disagreements on each characteristic) for unmatched and matched record pairs respectively. The summation is over the whole comparison space Γ of possible realizations. A linkage rule assigns probabilities P(A1|γ), P(A2|γ), and P(A3|γ) to each possible realization of γ ∈ Γ. An optimal linkage rule L(μ, λ, Γ) is defined for each value of (μ, λ) as the rule that minimizes P(A2) at those error levels. In other words, for fixed levels of error, the rule minimizes the probability of failing to make positive dispositions. A theorem describing the construction and properties of the optimal linkage rule, and two corollaries to the theorem which make it a practical working tool, are given.
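The three-way decision rule described in this abstract can be sketched as a log-likelihood-ratio classifier. This is a minimal illustration under the usual conditional-independence assumption across fields; the per-field m and u probabilities and the thresholds below are made-up numbers, not values from the paper.

```python
import math

def linkage_decision(gamma, m, u, upper, lower):
    """Classify a record pair by the Fellegi-Sunter likelihood ratio.

    gamma : tuple of 0/1 agreement indicators for each compared field
    m, u  : per-field probabilities of agreement for matched (m) and
            unmatched (u) pairs (conditional independence assumed)
    upper, lower : log-ratio thresholds chosen to meet the target
            error rates mu (false link) and lambda (false non-link)
    """
    # Log likelihood ratio log m(gamma)/u(gamma), summed over fields.
    w = 0.0
    for g, mi, ui in zip(gamma, m, u):
        if g:
            w += math.log(mi / ui)
        else:
            w += math.log((1 - mi) / (1 - ui))
    if w >= upper:
        return "link"        # decision A1
    if w <= lower:
        return "non-link"    # decision A3
    return "possible link"   # decision A2, sent to clerical review

# Example with three compared fields (e.g. name, birth year, street).
m = (0.95, 0.90, 0.80)   # agreement rates among true matches (assumed)
u = (0.05, 0.10, 0.20)   # agreement rates among non-matches (assumed)
print(linkage_decision((1, 1, 1), m, u, upper=4.0, lower=-4.0))  # link
print(linkage_decision((0, 0, 0), m, u, upper=4.0, lower=-4.0))  # non-link
```

Widening the band between the two thresholds lowers both error rates at the cost of a larger probability of the indeterminate decision A2, which is exactly the quantity the optimal rule minimizes.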
This chapter provides an overview of the methodological and practical issues that arise when estimating causal relationships that are of interest to labor economists. The subject matter includes identification, data collection, and measurement problems. Four identification strategies are discussed, and five empirical examples — the effects of schooling, unions, immigration, military service, and class size — illustrate the methodological points. In discussing each example, we adopt an experimentalist perspective that emphasizes the distinction between variables that have causal effects, control variables, and outcome variables. The chapter also discusses secondary datasets, primary data collection strategies, and administrative data. The section on measurement issues focuses on recent empirical examples, presents a summary of empirical findings on the reliability of key labor market data, and briefly reviews the role of survey sampling weights and the allocation of missing values in empirical research.
Parker and Van Praag (2009) showed, based on theory, that the group status of the profession ‘entrepreneurship’ shapes people’s occupational preferences and thus their choice behavior. The current study focuses on the determinants and consequences of the group status of a profession, entrepreneurship in particular. If the group status of entrepreneurship is related to individual choice behavior, it is policy relevant to better understand this relationship and the determinants of the status of the entrepreneur. For reasons outlined in the introduction, this study focuses on (800) students in the Netherlands. We find that the status of occupations is mostly determined by the required level of education, the income level to be expected and respect. Furthermore, our results imply that entrepreneurship is associated with hard work, high incomes, but little power and education. Moreover, we find evidence that individual characteristics, such as entrepreneurship experience, vary systematically with the perceived status of occupations, thereby contributing ammunition to a fundamental discussion in the literature. Finally, we find a strong association between the perceived status of the entrepreneur and the estimated likelihood and willingness to become an entrepreneur.
This article examines the properties and prevalence of measurement error in longitudinal earnings data. The analysis compares matched Current Population Survey data to administrative Social Security payroll tax records. In contrast to typically assumed properties of measurement error, the results indicate that errors are serially correlated over two years and negatively correlated with true earnings (i.e., mean reverting). In a cross section, the ratio of the variance of the signal to the total variance is 0.82 for men and 0.92 for women. These ratios fall to 0.65 and 0.81 when the data are specified in first differences. Longitudinal earnings data may be more reliable than previously believed. Copyright 1991 by University of Chicago Press.
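The drop in reliability when moving from levels to first differences, reported in this abstract, can be reproduced in a stylized simulation. This is a sketch with made-up parameters, including a mean-reverting error negatively correlated with true earnings as the abstract describes; it is not the paper's data or estimator.

```python
import random

random.seed(7)
n = 20_000
true1 = [random.gauss(10.0, 1.0) for _ in range(n)]
# Persistent true earnings: year-2 truth strongly correlated with year 1.
true2 = [0.9 * x + random.gauss(1.0, 0.45) for x in true1]

def report(x, rho=-0.3, sd=0.5):
    # Mean-reverting survey error: pulls reports toward the mean of 10.
    return x + rho * (x - 10.0) + random.gauss(0.0, sd)

obs1 = [report(x) for x in true1]
obs2 = [report(x) for x in true2]

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def reliability(true, obs):
    # Attenuation factor: cov(reported, true) / var(reported).
    return cov(true, obs) / cov(obs, obs)

rel_level = reliability(true1, obs1)
diff_true = [b - a for a, b in zip(true1, true2)]
diff_obs = [b - a for a, b in zip(obs1, obs2)]
rel_diff = reliability(diff_true, diff_obs)
# Differencing removes much of the persistent signal but none of the
# error variance, so rel_diff comes out well below rel_level.
```

The qualitative pattern matches the abstract: reliability is high in a cross section and falls substantially when the data are specified in first differences.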