
The Review of Economics and Statistics

Vol. XCV December 2013 Number 5

ESTIMATING MEASUREMENT ERROR IN ANNUAL JOB EARNINGS:

A COMPARISON OF SURVEY AND ADMINISTRATIVE DATA

John M. Abowd and Martha H. Stinson*

Abstract—We propose a new methodology that does not assume a prior specification of the statistical properties of the measurement errors and treats all sources as noisy measures of some underlying true value. The unobservable true value can be represented as a weighted average of all available measures, using weights that must be specified a priori unless there has been a truth audit. The Census Bureau's Survey of Income and Program Participation (SIPP) survey jobs are linked to Social Security Administration earnings data, creating two potential annual earnings observations. The reliability statistics for both sources are quite similar except for cases where the SIPP used imputations for some missing monthly earnings reports.

I. Introduction

IN the large and long literature on measurement error, most studies begin with data that are believed to contain errors and look for a way to quantify those errors. The goal is to account for and remove the effects of those errors. Defining data errors fundamentally requires some measure of truth: an objective standard by which the accuracy of the data can be judged. A common approach to this measurement error problem is to find a second data source that contains the "truth" and define errors in the first data set as the difference between the two sources. We believe, however, that the assumption that some data contain errors while other data do not is fundamentally flawed. While the error-generating process may be different in the two sources, no source is likely to be completely error free.

Received for publication May 5, 2011. Revision accepted for publication August 28, 2012.

* Abowd: Cornell University, Census Bureau, NBER, CREST/INSEE, and IZA; Stinson: U.S. Census Bureau.

We thank Joanne Pascale, Tracy Mattingly, Gary Benedetto, Julia Lane, George Jakubson, David Johnson, Donald Kenkel, Simon Woodcock, Kevin McKinney, Kristin Sandusky, Lars Vilhuber, and Marc Roemer for helpful comments and support. We also acknowledge the comments of the editor and two referees, whose input improved the paper substantially. This report is released to inform interested parties of ongoing research and to encourage discussion of work in progress. Any views expressed on statistical, methodological, technical, or operational issues are those of the authors and not necessarily those of the U.S. Census Bureau, Cornell University, or any of the project sponsors. This work was partially supported by National Science Foundation Grants SES-9978093 and SES-0427889 to Cornell University, National Institute on Aging Grant R01-AG18854, and the Alfred P. Sloan Foundation. J.A. also acknowledges direct support from the Census Bureau and NSF Grants SES-0339191, SES-1042181, and SES-1131848. All data used in this paper are confidential. The U.S. Census Bureau supports external researchers' use of some of these data through the Research Data Center network (www.census.gov/ces). For public use data, please visit www.census.gov/sipp/ and click "Access SIPP Synthetic Data." A supplemental appendix is available online at http://www.mitpressjournals.org/doi/suppl/10.1162/REST_a_00352.

In this paper, we expand the measurement error literature in two ways. Our first contribution is to extend the methodology by showing that defining truth with respect to an observed quantity requires a researcher to place priors on which source of data is the most reliable. These priors define the measurement error and its properties. After recognizing this dependence on prior beliefs, we relax the assumption that one source of data is truth. We show how different priors lead to different measures of truth and, subsequently, different amounts of error. We then specify and estimate a multivariate linear mixed-effects model (MLMM) in the spirit of Abowd and Card (1989) and consider the level of error in both the fixed effects (the relationship between the measure and the observable characteristics of respondents) and the random effects (the relationship between the measure and the unobservable characteristics of respondents and their employers), showing how different priors about the truth lead to different conclusions. Our approach includes the special case of declaring one source to be truth and can be used with complete or incomplete data from any number of sources.

Our second contribution is to implement our method using job-level annual earnings from two sources: the Census Bureau's Survey of Income and Program Participation (SIPP) and the Internal Revenue Service (IRS)/Social Security Administration (SSA) W-2 forms. We use the largest nationally representative sample to date—five SIPP panels—matched at the job level to administrative data from the SSA's Detailed Earnings Record (DER) to study the measurement error in earnings for these important national data. Our comprehensive data allow us to combine the early focus in the survey measurement error literature on employer versus employee reports, which were inherently job-level studies, with the later focus on comprehensive samples of survey respondents regularly used by empirical researchers, which were conducted at the person level because the particular survey studied, the Current Population Survey, collected complete data only at the person level.1

1 When both job-level and person-level earnings data are available, as in the SIPP, many studies use the job-level earnings measure, in combination with the survey hours measures, to construct outcomes and control variables of interest. See, for example, Gottschalk and Moffitt (1999), which uses the SIPP, CPS, and PSID in exactly this way to study the effects of employer transitions.

The Review of Economics and Statistics, December 2013, 95(5): 1451–1467

No rights reserved. This work was authored as part of the Contributor’s ofﬁcial duties as an Employee of the United States Government and is therefore a work of the United States Government.

In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. law.


We use the internal, confidential SIPP data that have uncapped earnings and a carefully edited job history intended to accurately track movement across jobs over the course of a year. The confidential administrative data also have uncapped earnings, measured separately for each employer. Among validation studies done with survey data, our analysis uses the most complete set of inputs to date. We analyze a range of priors about the reliability of each data source. We report the resulting measures of error for both the fixed and random components of a multivariate earnings equation estimated by residual maximum likelihood (REML).

The paper proceeds as follows. In section II, we review past findings from studies of measurement error in survey earnings and discuss how these relate to our study. In section III, we lay out our methodological framework for identifying and quantifying errors in earnings. We give a brief overview of our data and the linking process between the two sources in section IV. In section V, we present our results quantifying error in earnings in the SIPP and W-2 data. We conclude in section VI with thoughts on how both producers and users of public use data might take account of measurement error.

II. Background

Early studies of measurement error, beginning with Fuller (1987), defined observed quantities as the sum of unobserved true values and error. Hence, a variable Y_t is decomposed into y_t + u_t, the sum of the true value, y_t, and the measurement error, u_t. Assuming that y_t and u_t are uncorrelated, one can calculate a reliability ratio that gives the percentage of total variance that is true variance:

κ_yy = Cov[y, Y] / Var[Y] = σ_yy / (σ_yy + σ_uu).

This statistic is important because when a second variable A_t is a function of the true value y_t plus some error,

A_t = β y_t + e_t,

but is regressed on Y_t,

A_t = β̂ Y_t + e_t = β̂ (y_t + u_t) + e_t,

the reliability ratio defines the ratio of β̂ to β. This is evident from the formula for the expected value of β̂, which, when σ_yu = 0 and σ_ue = 0, is given by

E[β̂] = Cov[A, Y] / Var[Y] = β σ_yy / (σ_yy + σ_uu) = β κ_yy.
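The attenuation result above can be checked numerically. The sketch below uses illustrative parameter values (not estimates from the paper): it simulates classical measurement error and verifies that the OLS slope of A_t on the noisy Y_t shrinks toward β κ_yy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Classical measurement error: Y = y + u with Cov(y, u) = 0.
sigma_yy, sigma_uu = 1.0, 0.5
y = rng.normal(0.0, np.sqrt(sigma_yy), n)   # true values
u = rng.normal(0.0, np.sqrt(sigma_uu), n)   # measurement error
Y = y + u                                    # observed, error-ridden measure

# Reliability ratio: share of observed variance that is true variance.
kappa = sigma_yy / (sigma_yy + sigma_uu)     # population value: 2/3

# Second variable A = beta*y + e, regressed on the noisy Y.
beta = 2.0
A = beta * y + rng.normal(0.0, 1.0, n)
beta_hat = np.cov(A, Y)[0, 1] / np.var(Y)    # OLS slope of A on Y

# E[beta_hat] = beta * kappa: the estimate is attenuated toward zero.
print(kappa, beta_hat)
```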

Angrist and Krueger (1999) define a reliability ratio for first-differenced quantities ΔY = (Y_t − Y_{t−1}) = (y_t − y_{t−1}) + (u_t − u_{t−1}) as

κ_yy = σ_yy / (σ_yy + σ_uu (1−τ)/(1−ρ)),

where τ is the autocorrelation coefficient of the measurement error and ρ is the autocorrelation coefficient of y_t. If ρ > τ, then (1−τ)/(1−ρ) is greater than 1 and the reliability ratio declines relative to the ratio for levels of y_t.
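A minimal sketch of the Angrist–Krueger adjustment, with made-up values for the variances and autocorrelations (the names and numbers below are illustrative, not from the paper):

```python
def reliability_levels(sigma_yy, sigma_uu):
    """Classical reliability ratio for a variable measured in levels."""
    return sigma_yy / (sigma_yy + sigma_uu)

def reliability_differences(sigma_yy, sigma_uu, tau, rho):
    """Angrist-Krueger reliability ratio for first-differenced data.

    tau: autocorrelation of the measurement error u_t;
    rho: autocorrelation of the true value y_t.
    """
    return sigma_yy / (sigma_yy + sigma_uu * (1 - tau) / (1 - rho))

# Illustrative values: highly persistent true earnings (rho = 0.8),
# weakly persistent errors (tau = 0.2).
levels = reliability_levels(1.0, 0.25)
diffs = reliability_differences(1.0, 0.25, tau=0.2, rho=0.8)
print(levels, diffs)  # reliability falls after differencing when rho > tau
```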

Finally, if Y_t is used as a dependent variable and regressed on x_t and the covariance of y_t and u_t is not 0, then using u_t = δ y_t + v, where δ is called the attenuation bias, Bound et al. (1994) show that the coefficient on x_t in the regression Y = (1+δ)y + v = xβ + ε will be biased since E[β̂]/β = (1+δ).2
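The dependent-variable case can also be simulated. This sketch assumes a mean-reverting error u = δy + v with an illustrative δ = −0.15; the parameter values are placeholders, not estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# True model: y = x*beta + eps, but we observe Y = y + u with
# mean-reverting error u = delta*y + v, so Y = (1 + delta)*y + v.
beta, delta = 1.5, -0.15
x = rng.normal(size=n)
y_true = x * beta + rng.normal(size=n)           # true dependent variable
u = delta * y_true + rng.normal(0.0, 0.3, n)     # error correlated with truth
Y_obs = y_true + u

# OLS of the mismeasured dependent variable on x:
beta_hat = np.cov(Y_obs, x)[0, 1] / np.var(x)
print(beta_hat / beta)   # close to (1 + delta) = 0.85
```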

Hence, only if the variance and structure of the measurement error are known can unbiased estimators of β in either the direct or reverse regression be obtained. Those studying measurement error have focused on estimating κ_yy and testing whether the assumptions of classical measurement error were violated. Studies that obtain a second report for the mismeasured variable Y in order to calculate σ_uu and σ_yy have been called validation studies (as in, for example, Bound, Brown, & Mathiowetz, 2001). Without exception, the second report is treated as y_t (i.e., truth) and the measurement errors are calculated as u_t = Y_t (first report) − y_t (second report). The properties of these errors are then investigated. Researchers have often concluded that the assumptions of classical measurement errors are violated and that the errors are correlated with the true values, i.e., σ_yu ≠ 0. However, they acknowledge that their models are driven by the assumption that they obtained a true measure of y. Without this assumption, there would be no way to determine the relation between the errors and the true values. This assumption is fundamentally untestable and is justified solely by the authors' knowledge of the quality of the secondary data source. We briefly discuss the history of this interpretation in the context of earnings validation studies next.

Some prominent examples of validation studies are Mellow and Sider (1983), Duncan and Hill (1985), Bound et al. (1994), Pischke (1995), Bound and Krueger (1991), Kapteyn and Ypma (2007), Gottschalk and Huynh (2010), and Meijer, Rohwedder, and Wansbeek (2012). The first four papers are similar to ours in that they estimate measurement error at the job level. Mellow and Sider (1983) used a special supplement to the January 1977 Current Population Survey (CPS) that obtained employment information from both workers and employers. Looking at matched pairs with both employer and employee wage reports, they found that employer-reported wages exceeded worker reports by 4.8% on average. Although they did not calculate a reliability ratio per se, they did test the sensitivity of statistical models and concluded that wage regressions were generally not that sensitive to the source of information: worker versus employer. In particular, the returns to education and experience were very similar across their different regression equations. This result is consistent with a relatively high reliability ratio.

2 Bound et al. (1994) also derive a formula for the reliability ratio when y_t and u_t are correlated.


Like Mellow and Sider, Duncan and Hill (1985), Bound et al. (1994), and Pischke (1995) examine two wage reports for each job they observe. However, they do not use a representative sample of people in the United States but rather sample workers from a large anonymous manufacturing company. Workers at the company were interviewed using a Panel Study of Income Dynamics (PSID) survey instrument, and then information for these workers was obtained from company records. The authors treated the company reports of annual earnings as measures of true earnings values and considered any differences between worker and employer reports to be errors on the part of the workers, justifying this decision by their confidence in the accuracy of the company's records.

The survey was carried out in two waves, and the authors reported a ratio of noise to total variance (σ_uu / (σ_yy + σ_uu) in the notation above) of 0.302 for annual earnings in 1986 and 0.151 for annual earnings in 1982. They found evidence that errors in earnings were correlated with the true levels of earnings and reported noise-to-variance ratios that took account of this covariance as 0.239 in 1986 and 0.076 in 1982. They estimated the proportional attenuation bias when using earnings as a dependent variable, δ, as −0.172 for 1986 and −0.104 for 1982. Finally, the authors compared earnings equations using the two data sources and found that, relative to the employer-provided data, employee interview data overstated the return to education by 40% and understated the return to tenure by 20%.

Bound and Krueger (1991) depart from the job-level method and compare total earnings in a given year from two sources. Like our study, they use survey and administrative data, linking March 1977 and 1978 CPS respondents to Social Security earnings records. They reported large negative correlations between measurement error and true earnings for both CPS reference years (1976 at −0.46 and 1977 at −0.42). They reported reliability ratios that did and did not take account of these correlations as 1.016 and 0.844, respectively, for 1976 and 0.974 and 0.819 for 1977.

Bound et al. (2001, p. 3832) summarized the general approach of all of these studies as: "Those collecting validation data usually begin with the intention of obtaining 'true' values against which the errors of survey reports can be assessed; more often than not we end up with the realization that the validation data are also imperfect. While much can still be learned from such data, particularly if one is confident the errors in the validation data are uncorrelated with those in the survey reports, this means replacing one assumption (e.g., errors are uncorrelated with true values) with another (e.g., errors in survey reports uncorrelated with errors in validation data)."

Gottschalk and Huynh (2010) also studied person-level earnings, using matched SIPP and SSA data to conduct an extensive investigation of the effects of measurement error on earnings inequality and mobility. These are the same data sources that we use, but they use only the 1996 SIPP panel. Initially these authors do not label the SSA data as truth and instead describe their study as quantifying differences between the two sources. Nonetheless, their ability to study the impact of SIPP error on mobility and inequality measures hinges on a definition of error that requires them to declare one source as truth. Using the DER as truth, the authors estimate a traditional reliability ratio of 0.67 for the SIPP. However, they conclude that measurement error is mean reverting and show that in their framework, this type of error partially offsets the bias in estimates of inequality in the SIPP. They also conclude that measurement error is correlated over time, which diminishes the attenuation bias in the correlation of earnings and lessens the impact of measurement error on estimates of earnings mobility in the SIPP.

Kapteyn and Ypma (2007) use person-level linked survey and administrative data from Sweden. Like Gottschalk and Huynh and our paper, they do not declare either source to be true earnings, which formally is a latent variable in their mixture-of-normals model. They use a prior specification of the effects of misclassification errors and different survey measurement errors to identify the posterior probabilities associated with the administrative and survey measures being true. Meijer et al. (2012) show that the Kapteyn and Ypma model is a special case of a mixture factor model with a specification very similar to ours. Their generalized model also identifies the marginal prior probabilities, which are treated as hyperparameters, and the conditional posterior probabilities by making functional-form assumptions about the relation between the survey and administrative measures. The relation between the identification strategy in these mixture factor models and our multivariate linear mixed-effects model is discussed in section IIIB.

A final related study by Roemer (2002) uses matched CPS, SIPP, and DER data to study the distribution of annual earnings. Rather than focus on reliability statistics and regression-coefficient comparisons, Roemer compares the percentiles of the annual earnings distributions from the three sources. Treating the DER as truth, he concludes that both the SIPP and CPS estimate a person's percentile rank more accurately than the dollar amount of earnings. In his analysis, the SIPP displays a shortage of high-earning workers compared to the DER.

III. Statistical Model

A. Multivariate Linear Mixed-Effects Model

In this section we lay out the general statistical model that we employ to estimate the fixed and random components of our joint model for SIPP and DER earnings outcomes observed on the same matched job. The underlying specification is a multivariate linear mixed-effects model (MLMM). The advantage of the MLMM framework is that it shows with full generality how to accommodate two or more matched observations of earnings on the same job, how to vary the prior assumptions about which measure is "true" systematically, and how to use external audit information, if available, to update the posterior distribution over which value is "true."

The outcome under study, which is the dependent variable in the model, is y_ist, a 1×Q vector of measures of log earnings for individual i = 1, ..., I in sequential job spell s ∈ {1, ..., S} and time period t ∈ {1, ..., T}, where Q is the total number of sources of earnings reports. The indices s and t are always ordered sequentially, but not every i has values for every level of s and t. Define the vectors x_ist, the 1×K design of the fixed effects associated with sex, race, education, experience, and so on; d_i, the 1×I design of the random effects associated with person i; and f_ist, the 1×J design of the random effects associated with the employer of i in job spell s during period t. The full model is

y_ist = x_ist B + d_i Θ + f_ist Ψ + η_ist,   (1)

where B is the K×Q matrix of fixed-effect coefficients; Θ is the I×Q matrix of random person effects with ith row θ_i, the 1×Q vector of random person effects for individual i; Ψ is the J×Q matrix of random employer effects with jth row ψ_j, the 1×Q vector of random employer effects for each employer j; and η_ist is the 1×Q residual vector for individual i in job spell s for period t.

Stacking the observations over i, s, and t, the model becomes

Y = XB + ZU + H,   (2)

where Y is the N×Q matrix of dependent variables with N equal to the total number of person, job-spell, and year combinations in the data; X is the N×K design matrix for all fixed effects; Z ≡ [D F] is the N×(I+J) design matrix for the combined random effects; U ≡ [Θ^T Ψ^T]^T is an (I+J)×Q matrix of random effects; and H is an N×Q matrix of residuals. Equation (2) is a multivariate linear mixed-effects model represented in canonical form. By construction, every column of Y can be represented as a single linear mixed-effects model, also represented in canonical form, for dependent variable Y_(q), where the subscript (q) denotes selection of the indicated column from the associated matrix. For example, the qth column has the form

Y_(q) = X B_(q) + Z U_(q) + H_(q)   (3)

for q = 1, ..., Q.
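A toy numerical sketch of the canonical form in equation (2), with made-up dimensions and parameter values (the real design matrices are far larger; nothing below is an estimate from the paper):

```python
import numpy as np

# Q = 2 measures (SIPP, DER), I = 2 people, J = 2 employers,
# N = 3 person-job-year rows. All values are illustrative.
person = np.array([0, 0, 1])     # person index for each row
employer = np.array([0, 1, 1])   # employer index for each row
N, I, J = 3, 2, 2

D = np.zeros((N, I)); D[np.arange(N), person] = 1.0    # person design
F = np.zeros((N, J)); F[np.arange(N), employer] = 1.0  # employer design
Z = np.hstack([D, F])                                  # N x (I + J)

X = np.column_stack([np.ones(N), [12.0, 12.0, 16.0]])  # constant, education
B = np.array([[1.0, 1.1], [0.08, 0.07]])   # K x Q fixed-effect coefficients
U = np.array([[0.2, 0.1], [-0.1, 0.0],     # Theta: person effects (I x Q)
              [0.05, 0.05], [-0.05, 0.1]]) # Psi: employer effects (J x Q)
H = np.zeros((N, 2))                       # residuals, zeroed for clarity

Y = X @ B + Z @ U + H                      # N x Q matrix of log earnings
print(Y.shape)  # one SIPP column and one DER column per row
```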

Parameterize the stochastic structure of equation (2) as follows:

E[η_ist | X, Z] = 0,
Var[η_ist | X, Z] = Σ_0, Q×Q,
Cov[η_ist, η_i′s′t′ | X, Z] = 0, i ≠ i′, ∀ s, s′, t, t′,
Cov[η_ist, η_is′t′ | X, Z] = 0, s ≠ s′, ∀ t, t′,
Cov[η_ist, η_ist′ | X, Z] = Σ_{t−t′}, t ≠ t′, Q×Q,
Var[θ_i | X, Z] = G(θ), Q×Q,
Cov[θ_i, θ_i′ | X, Z] = 0, i ≠ i′,
Var[ψ_j | X, Z] = G(ψ), Q×Q,
Cov[ψ_j, ψ_j′ | X, Z] = 0, j ≠ j′,

where Σ_{|t−t′|} is the Q×Q autocovariance matrix of η_ist at lag |t−t′|. We use ASREML (Gilmour et al., 2009) to fit equation (2) using the residual maximum likelihood method (REML), assuming that vec(U) and vec(H) have independent joint normal distributions with zero means and the covariance structure specified above. This special-purpose software may not be as familiar to economists as it is to biostatisticians; however, all of the parameters that we specify in the MLMM are identified using conventional methods—proof that the residual likelihood function has a well-defined maximum for the given specification. The ASREML software checks these identification conditions (or estimability conditions, as they are known in biostatistics) and notes violations at the optimum, so that only estimates of identifiable parameters, for both fixed and random effects, are reported. We do not elaborate on the estimation because ASREML produces the REML estimates of all parameters and their estimated covariance matrix taking account of the full stochastic structure of the specified model.3 Although they are identifiable and calculated by the software, we make no use of the estimated person and employer effects, Û = [Θ̂^T Ψ̂^T]^T in our notation.

To simplify the exposition of the results, we note here the exact formulas for all of the variance components for the case Q = 2, in which the SIPP value is listed first (q = 1) and the DER value is listed second (q = 2). Then we have

Σ_0 = [ σ_c² + σ_1²    σ_c²
        σ_c²           σ_c² + σ_2² ],   (4)

Σ_{|t−t′|} = [ ρ_c^{|t−t′|} σ_c² + ρ_1^{|t−t′|} σ_1²    ρ_c^{|t−t′|} σ_c²
              ρ_c^{|t−t′|} σ_c²                        ρ_c^{|t−t′|} σ_c² + ρ_2^{|t−t′|} σ_2² ],   (5)

G(θ) = [ σ_θ1²            ρ_θ σ_θ1 σ_θ2
         ρ_θ σ_θ1 σ_θ2    σ_θ2² ],   (6)

G(ψ) = [ σ_ψ1²            ρ_ψ σ_ψ1 σ_ψ2
         ρ_ψ σ_ψ1 σ_ψ2    σ_ψ2² ].   (7)
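These variance components can be assembled directly from the parameters. The sketch below uses placeholder values for the variances and autocorrelations, not the estimates reported later in the paper:

```python
import numpy as np

def sigma_lag(lag, s_c, s_1, s_2, r_c, r_1, r_2):
    """Q = 2 residual autocovariance at |t - t'| = lag, equations (4)-(5).

    s_c: common variance component; s_1, s_2: SIPP- and DER-specific
    variance components; r_c, r_1, r_2: the matching autocorrelations.
    """
    common = r_c**lag * s_c
    return np.array([[common + r_1**lag * s_1, common],
                     [common, common + r_2**lag * s_2]])

def g_matrix(s1, s2, rho):
    """Random-effect covariance, equations (6)-(7): G(theta) or G(psi)."""
    return np.array([[s1**2, rho * s1 * s2],
                     [rho * s1 * s2, s2**2]])

# Illustrative parameter values only:
Sigma0 = sigma_lag(0, s_c=0.10, s_1=0.05, s_2=0.03, r_c=0.6, r_1=0.3, r_2=0.4)
Sigma1 = sigma_lag(1, s_c=0.10, s_1=0.05, s_2=0.03, r_c=0.6, r_1=0.3, r_2=0.4)
G_theta = g_matrix(0.4, 0.35, rho=0.9)
G_psi = g_matrix(0.3, 0.25, rho=0.8)
print(Sigma0)  # at lag 0 the autocorrelations drop out
```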

3 We require the assumption that G(θ), G(ψ), and Σ_{|t−t′|} for all t are consistently estimated when either U or H is nonnormal. This is reasonable because the estimation is based on minimizing an objective function that depends on only the first two moments of vec(Y), albeit with a form that is based on the multivariate normal distribution. Departures from normality are likely to affect the covariance matrix in the asymptotic distribution of the unique parameters of G(θ), G(ψ), and Σ_{|t−t′|}, which we have not investigated.


B. Defining True Values, Associated Measurement Error, and Reliability Statistics for the Random Effects

Signal, measurement error, and truth audits. Let ω ≡ (ω_1, ..., ω_Q)^T be a Q×1 vector where ι^T ω = 1, 0 ≤ ω_i ≤ 1, and ι is always a conformable column vector of 1s. The elements of the vector ω correspond to the prior probabilities or weights associated with each of the elements being the correct or true value. The signal is, then, the expected true value, where the expectation is taken over the prior probabilities ω,

Sig(y_ist) ≡ y_ist ω,   (8)

and the measurement error is the deviation of each measure from the signal component:

ME(y_ist) ≡ y_ist − Sig(y_ist) ι^T = y_ist − y_ist ω ι^T = y_ist [I − ω ι^T].   (9)

Hence, the weight vector ω = (0, 1)^T when Q = 2 corresponds to declaring that the second measure (in our case the DER) is correct. This is precisely the assumption in historical administrative-record measurement error studies. We consider alternatives—in particular, ω = (0.5, 0.5)^T, the case where either measure is equally likely to be the truth, and ω = (0.1, 0.9)^T, the case where the DER is much more likely to be true. Neither of these definitions imposes multivariate normality on the signal or measurement error.
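A small numerical sketch of equations (8) and (9) for one job-year observation, using made-up earnings reports (the numbers are illustrative only):

```python
import numpy as np

# y_ist holds the SIPP and DER log-earnings reports for the same job-year.
y_ist = np.array([[10.30, 10.45]])   # 1 x Q row vector, Q = 2
iota = np.ones((2, 1))               # conformable column vector of 1s

def signal_and_error(y, omega):
    """Return (Sig(y), ME(y)) for a Q x 1 prior weight vector omega."""
    sig = float(y @ omega)                    # expected true value, eq. (8)
    me = y @ (np.eye(2) - omega @ iota.T)     # deviation from signal, eq. (9)
    return sig, me

# DER declared true (the historical convention):
sig_der, me_der = signal_and_error(y_ist, np.array([[0.0], [1.0]]))
# Either source equally likely to be true:
sig_eq, me_eq = signal_and_error(y_ist, np.array([[0.5], [0.5]]))
print(sig_der, me_der)   # all error assigned to the SIPP report
print(sig_eq, me_eq)     # error split symmetrically between the two
```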

In a full Bayesian analysis, one would model ω_ist as a latent data-quality indicator whose prior expectation is the ω in the preceding paragraph. Let g_ist represent the 1×L vector associated with the design of additional information used to determine the correctness of the measurement, and let w_ist represent the 1×Q outcome vector from the data audit whose elements are all zero except for a single column, q, coded 1, which indicates that for the earnings outcome of person i, at job spell s, in time period t, q was the correct measurement. Then the additional equation system that models w_ist as a function of x_ist, d_i, f_ist, and g_ist provides the framework for computing a posterior estimate of ω_ist that would replace the prior estimate used in equations (8) and (9).

It is also worth noting that our method generalizes to the case where there are not the same number of measurements for each (i, s, t)-tuple. We implement this feature in our estimation, treating missing values of one or the other measure for a particular year t as ignorable (Rubin, 1976).

Identification of measurement error. No truth audit was conducted as a part of our SIPP-to-DER matching exercise. Therefore, we have no outcome data corresponding to w_ist, even for a subsample. Hence, the prior ω must equal the posterior ω, which is why we form the signal and measurement error components as we do. This assumption merely formalizes what has been the practice for more than three decades while also showing exactly what information would be required to eliminate the use of purely prior information about the correctness of one of the outcomes in measurement error studies.

Because the Kapteyn and Ypma (2007) and Meijer et al. (2012) models are facially similar to our model, we discuss them now in the context of the assumptions that identify measurement error. In our model and in theirs, if any measure is declared to be true, w_istq = 1 for some q in our notation, then measurement errors are observed and can be calculated as y_istq′ − y_istq for all q′ ≠ q, again in our notation. In our model, without an audit there is no measurement on truth. We have not formalized the judgments that analysts might make based on the observed data to infer truth in the absence of an audit. This is the critical distinction between our approach and theirs.

Kapteyn and Ypma (2007) explicitly model the analyst's behavior. If the survey and administrative measures match, then the analyst declares them both true, and the properties of the model for the administrative data and measurement-error-free survey data are identified. The case is "labeled" in their notation. The analyst makes two further assumptions: (a) there is no measurement error in the administrative earnings variable, only mismatch (data matched from the wrong individual); (b) survey measurement error, which comes in two types, must increase the variance of the survey measure relative to the administrative measure when there is no mismatch. These two assumptions for the "unlabeled" cases effectively declare the administrative measure to be the truth whenever it is correctly matched to the survey record and provide enough information to estimate posterior probabilities of truth. Meijer et al. (2012) generalize these assumptions and show that the posterior probabilities are sensitive to the identifying assumptions, as expected.

Our model shares the property that if there are only two observed measures and they are equal, then there is no observed measurement error. But we make no use of the structure of the error components in the two measures that would identify measurement error specifically. Moreover, during an audit, an analyst could discover that neither measure contained, for example, off-the-books payments like tips. Then neither measure would be true, and the auditor would have to estimate such payments or declare a latent third measure to be true. Our model explicitly allows this outcome. Furthermore, an analyst who wished to use prior information to characterize the effects of different types of errors on the distribution of y could calculate the posterior distribution of ω given these model components. This is what Meijer et al. (2012) have done using the mixture factor analysis model.

Our method makes clear that in the absence of an audit, measurement error is entirely defined a priori and not from any observed data. Classical validation studies like the ones discussed in section II and their more sophisticated recent counterparts all identify the measurement error by placing very strong priors on the data-generation process. Our contribution is to generalize this identification strategy to accommodate other priors and a broader set of potential measures.


Reliability statistics. To compute reliability statistics, we require estimates of Var[Sig(y_ist) | X, Z] and Var[ME(y_ist) | X, Z]. These can be computed from the stochastic structure of equation (2):

Cov[y_ist, y_ist′ | X, Z] = G(θ) + G(ψ) + Σ_{|t−t′|},   (10)

Cov[Sig(y_ist), Sig(y_ist′) | X, Z] = ω^T (G(θ) + G(ψ) + Σ_{|t−t′|}) ω,   (11)

Cov[ME(y_ist), ME(y_ist′) | X, Z] = [I − ω ι^T]^T (G(θ) + G(ψ) + Σ_{|t−t′|}) [I − ω ι^T].   (12)

The traditional measures for the case Q = 2 can be computed at all lags using ω = (0, 1)^T.
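Equations (10) through (12) at lag 0 reduce to simple matrix products. The component matrices below are placeholders, not estimates from the paper:

```python
import numpy as np

# Illustrative variance components for Q = 2 (SIPP first, DER second).
G_theta = np.array([[0.16, 0.126], [0.126, 0.1225]])
G_psi = np.array([[0.09, 0.06], [0.06, 0.0625]])
Sigma0 = np.array([[0.15, 0.10], [0.10, 0.13]])
V = G_theta + G_psi + Sigma0          # total covariance, equation (10)

omega = np.array([[0.0], [1.0]])      # prior: DER is true
iota = np.ones((2, 1))
M = np.eye(2) - omega @ iota.T        # the [I - omega iota^T] projector

var_signal = float(omega.T @ V @ omega)   # equation (11) at lag 0
cov_me = M.T @ V @ M                      # equation (12) at lag 0
print(var_signal, cov_me[0, 0])           # signal variance, SIPP ME variance
```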

Some care must be taken in using the formulas in equations (11) and (12) because they do not represent an orthogonal decomposition of equation (10). The conventional reliability ratio for measure q is defined as the ratio of its signal variance to its total variance. With the SIPP and DER measures in positions 1 and 2, respectively, the traditional reliability ratios are

TRR_{0,SIPP} = ω^T (G(θ) + G(ψ) + Σ_0) ω / {G(θ) + G(ψ) + Σ_0}_{11}   (13)

and

TRR_{0,DER} = ω^T (G(θ) + G(ψ) + Σ_0) ω / {G(θ) + G(ψ) + Σ_0}_{22},   (14)

where the notation { }_{ij} means to extract the i,jth element of the matrix in { }. The difficulty with equations (13) and (14) is that they are not bounded above by unity because either of the two measures in isolation (SIPP or DER) can omit elements that should be measured or include elements that should not. Our measurement error model has the traditional reliability-ratio property of being bounded above by unity only when either the SIPP or the DER is true or when the two measures are exchangeable, as in conventional survey reliability estimation where the measures are obtained by repeated application of the same survey instrument (see Groves et al., 2004).

We choose instead to define the reliability statistic by generalizing the index of inconsistency, the ratio of the measurement error variance to the total variance, and subtracting it from unity so that it has the interpretation of a reliability statistic:

RR_{0,SIPP} = 1 − {[I − ω ι^T]^T (G(θ) + G(ψ) + Σ_0) [I − ω ι^T]}_{11} / {G(θ) + G(ψ) + Σ_0}_{11}   (15)

and

RR_{0,DER} = 1 − {[I − ω ι^T]^T (G(θ) + G(ψ) + Σ_0) [I − ω ι^T]}_{22} / {G(θ) + G(ψ) + Σ_0}_{22}.   (16)

The reliability statistics in equations (15) and (16) reproduce the conventional reliability ratios when either of the measures is true or when the two measures are exchangeable.4 Because of the serial correlation caused by the structure of the individual, employer, and time effects, we also define reliability statistics at different lags. These are given by

RR_{|t−t′|,SIPP} = 1 − {[I − ω ι^T]^T (G(θ) + G(ψ) + Σ_{|t−t′|}) [I − ω ι^T]}_{11} / {G(θ) + G(ψ) + Σ_{|t−t′|}}_{11}   (17)

and

RR_{|t−t′|,DER} = 1 − {[I − ω ι^T]^T (G(θ) + G(ψ) + Σ_{|t−t′|}) [I − ω ι^T]}_{22} / {G(θ) + G(ψ) + Σ_{|t−t′|}}_{22}.   (18)

These definitions require consistent estimates of the variance parameters G(θ), G(ψ), and Σ_{|t−t′|}, but they do not impose multivariate normality on the underlying data.
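The generalized reliability statistic can be written as a short function of the total covariance and the prior weights; the matrix values below are illustrative placeholders, not estimates:

```python
import numpy as np

def reliability(V, omega, q):
    """Generalized reliability statistic, equations (15)-(18).

    V: Q x Q total covariance G(theta) + G(psi) + Sigma at the chosen lag;
    omega: Q x 1 prior weight vector; q: 0 for SIPP, 1 for DER.
    """
    Q = V.shape[0]
    M = np.eye(Q) - omega @ np.ones((1, Q))
    me_var = (M.T @ V @ M)[q, q]          # measurement-error variance
    return 1.0 - me_var / V[q, q]         # one minus index of inconsistency

# Illustrative total covariance for Q = 2 (SIPP first, DER second):
V = np.array([[0.40, 0.286], [0.286, 0.315]])

# omega = (0, 1): DER is true, so RR_DER = 1 and RR_SIPP is the
# traditional reliability ratio of the SIPP measure.
omega_der = np.array([[0.0], [1.0]])
print(reliability(V, omega_der, 0), reliability(V, omega_der, 1))

# Equal priors split the measurement error between the two sources:
omega_eq = np.array([[0.5], [0.5]])
print(reliability(V, omega_eq, 0), reliability(V, omega_eq, 1))
```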

C. Defining True Values, Associated Measurement Error, and Reliability Statistics for the Fixed Effects

Using the MLMM speciﬁcation in equation (2) and fol-

lowing the method from equation (8), we deﬁne

\[
\mathrm{Sig}(B) \equiv B\omega, \tag{19}
\]

where B is the matrix of ﬁxed effects associated with the
design X. The true ﬁxed effect is the ω-weighted average of

the SIPP and DER ﬁxed-effect coefﬁcients. These end points

deﬁne the range of each ﬁxed effect. The SIPP and DER

measurement errors are deﬁned as

\[
\mathrm{ME}_{B_{(q)}} \equiv B_{(q)} - B\omega, \tag{20}
\]

4 It is worth noting here that it makes no sense to assume that the two

measures are exchangeable because that is equivalent to assuming that the

labels “SIPP” and “DER” are meaningless. The whole point of the analysis is

that we know that outcomes were collected based on different measurement

concepts (survey versus administrative records), so they should not have the

same joint distribution if we exchange the labels.

ESTIMATING MEASUREMENT ERROR IN ANNUAL JOB EARNINGS 1457

where q = 1, 2, and the generalization to arbitrary Q is

straightforward. We note that deﬁning the signal and the

measurement error in terms of the theoretical ﬁxed-effect

coefﬁcients is exactly comparable to the methods discussed

in section II, when the weight vector takes the value (0, 1)^T,

that is, when we assume that DER is truth.
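A minimal sketch of equations (19) and (20): the signal is the ω-weighted combination of the two columns of coefficients, and the measurement error is each column's deviation from that signal. The coefficient values below are hypothetical, purely for illustration.

```python
import numpy as np

# Hypothetical fixed-effect coefficients: rows are covariates, columns are
# the SIPP (q=1) and DER (q=2) estimates. Values are made up.
B = np.array([[0.10, 0.12],
              [0.05, 0.04]])

omega = np.array([0.0, 1.0])   # DER treated as truth, as in section II

signal = B @ omega                      # Sig(B) = B*omega, eq. (19)
ME = B - np.outer(signal, np.ones(2))   # ME_B(q) = B_(q) - B*omega, eq. (20)
```

With ω = (0, 1), the DER column has zero measurement error by construction, and the SIPP column's measurement error is interpreted as a bias, as in the text.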

Applying the MLMM model estimator in ASREML pro-
duces the estimates vec(B̂) and Var[vec(B̂)], which can be
used directly to estimate the signal, measurement error, and
associated standard errors for the ﬁxed-effect coefﬁcients.
These are reported in section V for the same values of ω as
we use for the random effects.5

We do not compute reliability statistics for the ﬁxed effects

because the measurement error in equation (20) has tradi-

tionally been interpreted as a bias, which is the interpretation

we adopt—making point estimates and their standard errors

more appropriate than reliability statistics.

D. Person-level Models

For comparison to the literature on the assessment of sur-

vey measurement errors using administrative data, we also

estimate the MLMM for person-level outcomes. A person-

level outcome for individual i in year t is deﬁned as the sum
of all observations y_{ist} over all jobs s, including those jobs that
matched between the SIPP and the DER and those that did
not. Thus, for each person, there are q outcomes per year, and

these outcomes differ across sources because of differences

in reporting at the job level and because of differences in the

number of jobs reported.

The base speciﬁcation becomes

\[
y_{it} = x_{it}B + d_{i}\Theta + f_{it}\Psi + \eta_{it}, \tag{21}
\]

where x_{it} can be deﬁned unambiguously because our
observed covariates do not vary by employer. In order to be
comparable to the literature, we do not include the design f_{it}

in most of the person-level speciﬁcations. But we do report

one set of results where the employer is deﬁned as the one for

which y_{2ist} is a maximum over s, that is, the employer with
the greatest DER earnings during the year.6 All MLMM and

reliability statistics for person-level models are deﬁned in a

manner that is strictly comparable to the job-level models, so

we do not repeat any of the formulas here.

IV. Data Description

The fundamental unit of observation in this paper is a job,

deﬁned as a match between an individual and an employer.

Data on jobs come from two sources: ﬁve Survey of Income

5 Var[vec(B̂)] is computed assuming that vec(U) and vec(H) in equa-
tion (2) have independent joint normal distributions with zero means and
covariance matrices as speciﬁed in section IIIA. This is the only use of the
normality assumption beyond the speciﬁcation of the objective function in
REML. See Gilmour et al. (2009).

6 In cases where a job-year has reported SIPP earnings but no DER

earnings, we use the SIPP earnings to determine the dominant employer.

and Program Participation (SIPP) panels conducted during
the 1990s and the Detailed Earnings Records (DER)

extracted from the Social Security Administration Master

Earnings File for the respondents in each of the ﬁve panels.7

In the SIPP, data on earnings were reported monthly, while in

the DER, earnings were reported annually. In both sources,

there were multiple records per job from repeated interviews

and annually ﬁled W-2s. Hence, in order to compare earn-

ings, we ﬁrst had to identify jobs and group earnings records

over time in each data source. After job records were created,

individuals in each data set were linked by Social Security

number and, then, job records from the SIPP and the DER

were matched to each other for each individual. We describe

each step of this process.

A. Creating a SIPP Jobs Data Set

All the SIPP panels conducted in the 1990s collected

detailed labor force information from respondents every four

months, or approximately three times per year, over the

course of two and a half to four years. Respondents were

asked questions about at most two jobs held during the previ-

ous four months, where the term job was loosely deﬁned as

working for pay. We used the longitudinal SIPP person ID,

the wave (interview) number, and an edited longitudinal job

ID that we created to combine records and create one observa-

tion per person per job. Appendix A in the online supplement,

which includes all appendixes, describes the problems we

found with the original job ID and gives a summary of how we

created our edited version. The ﬁrst column of table 1 shows

the number of respondents in each SIPP panel who report

working and the total number of jobs reported, using the three

identiﬁers listed above to count jobs.8Once we deﬁned a set

of jobs for each SIPP panel, we created annual earnings mea-

sures for each year covered by the survey by summing the

appropriate monthly earnings reports.
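The aggregation from monthly earnings reports to annual job-level earnings can be sketched as follows. The record layout and dollar amounts are hypothetical, not SIPP data.

```python
from collections import defaultdict

# Hypothetical monthly SIPP earnings reports:
# (person_id, job_id, year, month, gross monthly earnings)
monthly = [
    ("p1", "j1", 1992, 1, 1500.0),
    ("p1", "j1", 1992, 2, 1500.0),
    ("p1", "j1", 1992, 3, 1800.0),  # a month with an extra payday
    ("p1", "j1", 1993, 1, 1600.0),
]

# Sum monthly reports to one annual earnings figure per person-job-year.
annual = defaultdict(float)
for person, job, year, month, amount in monthly:
    annual[(person, job, year)] += amount
```

Here the job identifier plays the role of the edited longitudinal job ID described in appendix A: without a stable job ID, monthly reports from repeated interviews could not be grouped into job-year totals.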

In order to understand the comparison to another data

source, it is important to ﬁrst understand the concept of

earnings as used during the SIPP interview. The ﬁeld rep-

resentative sought to ask each adult about his or her earnings,

but if an adult was not present at the time of the interview,

another adult could answer as a proxy. During the 1990–

1993 SIPP panels, respondents (or proxies) were asked to

report gross earnings from a speciﬁc employer in the follow-

ing way: “The next question is about the pay … received

from this job during the 4-month period. We need the most

accurate ﬁgures you can provide. Please remember that cer-

tain months contain 5 paydays for workers paid weekly and

3 paydays for workers paid every 2 weeks. Be sure to include

any tips, bonuses, overtime pay, or commissions. What was

the total amount of pay that … received BEFORE deductions

7 The ﬁve SIPP panels began in 1990, 1991, 1992, 1993, and 1996.

8 The edited SIPP job ID for the 1990–1993 panels was released by the

Census Bureau as an update to the public use ﬁles and is available on the

SIPP FTP website. The edited job ID is described in Stinson (2003).

1458 THE REVIEW OF ECONOMICS AND STATISTICS

Table 1.—Data Summary of SIPP and DER Jobs

                                   (1)      (2)        (3) Both          (4)        (5) Drop      (6) Drop    (7) Drop
SIPP Panel                        SIPP      DER      SIPP      DER    Job Match   Sample Years     Waves      Imputes

1990       People with jobs     37,291   35,032    30,993             28,313       26,615        21,776      16,642
1989–1992  Total jobs held      66,991   96,086    55,087   88,324    41,885       36,168        30,003      22,862
1991       People with jobs     23,520   21,729    19,056             17,426       16,323        13,517      10,347
1990–1993  Total jobs held      40,818   58,020    32,447   52,797    25,258       21,761        18,331      13,892
1992       People with jobs     33,920   31,557    27,394             25,314       24,599        19,263      13,476
1991–1995  Total jobs held      65,278   99,524    51,650   90,360    39,729       37,021        29,750      20,549
1993       People with jobs     32,972   29,831    26,267             24,103       22,103        18,582      13,454
1992–1995  Total jobs held      61,094   81,320    47,723   74,317    36,469       29,563        25,144      18,013
1996       People with jobs     63,116   55,894    48,542             44,626       44,203         7,654       5,037
1995–2000  Total jobs held     121,450  192,720    97,149  173,623    75,110       72,805        13,553       8,546

Job counts in the SIPP are from internal Census ﬁles using the job tracking identiﬁer created by authors. Job counts in the DER are done using the employer tax identiﬁer (EIN). Person counts in the SIPP are

calculated using the internal longitudinal person identiﬁer. Person counts in the DER are calculated using the SSN.

on this job in …?”9 The ﬁeld representative read the name of

each month and separately recorded earnings for that month.

In the 1996 survey instrument, which was conducted using

a computer-assisted personal interview (CAPI) system, indi-

viduals (or proxies) could report earnings payments over a

variety of time periods and the instrument automatically cal-

culated monthly earnings. Field representatives (FRs) asked,

“Each time he/she was paid by [Name of Employer] in

[Month X], how much did he/she receive BEFORE deductions?”10
The ﬁeld representative then followed up with

questions about whether there were any other payments such

as tips, bonuses, overtime pay, or commissions. Built-in con-

sistency checks ﬂagged earnings amounts outside certain

bounds and prompted the FR to make corrections. Respon-

dents were also asked to refer to earnings records if possible

so as to give accurate responses. Thus, in the most accurate

cases, these earnings reports most likely reﬂected the gross

pay from monthly pay stubs.

B. Creating a DER Job-Level Data Set

The second source of data, DER, was a specialized extract

from the SSA’s Master Earnings File that contained earn-

ings histories for each SIPP respondent in the 1990, 1991,

1992, 1993, and 1996 panels with a validated SSN. The

creation of the DER was a joint project between the Cen-

sus Bureau and SSA. The Census Bureau asked each SIPP

respondent at the time of the survey to provide an SSN. SSA

then compared self-reported name, sex, race, and date of birth

to their counterparts for the matching SSN on the Numident,

an administrative database containing demographic informa-

tion collected when every SSN was issued and updated when

the individual had subsequent contacts with SSA. If a respon-

dent’s name and demographics were deemed close enough to

the name and demographics associated with the SSN in the

9 SIPP 1993 wave 1 questionnaire, page 15, available at http://www.census

.gov/sipp/core_content/1993/quests/sipp93w1.pdf.

10 SIPP 1996 wave 1 questionnaire, Labor Force Amount section,

available at http://www.census.gov/sipp/core_content/1996/quests/screens

/lf_par2.html.

Numident, the SSN was declared valid.11 This list of validated

SSNs was the basis for extracting detailed earnings records

from the SSA Master Earnings File.

A W-2 history for a SIPP respondent consisted of annual

earnings, broken down by employer, from 1978 to 2000. The

primary earnings variable came from box 1 of the W-2 form:

wages, tips, and other compensation. This earnings vari-

able was uncapped and represented all earnings that were

taxable under federal income tax. For the purposes of this

earnings comparison study, jobs with an employer (i.e., non-

self-employment) held during the time period covered by the

survey questions were used.12 In the second column of table 1,

we show the number of SIPP respondents with DER records

and a count of unique person-employer matches.

Employers were identiﬁed in the DER by an IRS-assigned

employer identiﬁcation number (EIN). The EIN linked

employers to the Business Register, the master list of all

businesses maintained by the Census Bureau as the sam-

pling frame for establishment-level surveys. Using this link,

we merged information from the Business Register about the

industry and name of the employer to each relevant job report

11 For respondents who answered “do not know” to the SSN question, an

attempt was made to ﬁnd the missing SSN by locating the person in the

Numident based on his or her reported name and demographic characteris-

tics. When a respondent refused to provide an SSN, no attempt was made

to link this person to any administrative data, and the SSN was left missing.

12 In addition to employer reports, the DER contained reports of self-

employment earnings. The SIPP also collected information about self-

employment, but responses to these questions were treated separately from

responses to the questions about jobs with employers. Self-employment

reports from either source were not included in this study. A mismatch

in employment type is another reason that a SIPP job might fail to link

to a DER job. An investigation of the frequency of this type of error is

beyond the scope of this paper. However, to give an idea of the poten-

tial magnitude of the problem, we looked at self-employment rates in each

source for the 1996 panel. In this panel, approximately 4% of individuals

report self-employment in addition to regular employment. If all these self-

employment reports were really additional employer-based jobs, the SIPP

job count would be understated by approximately 2,400 jobs. In the DER,

12% of individuals have both regular and self-employment records. If all

of these self-employment cases were falsely reported by SIPP respondents

as regular jobs, the SIPP job count would be overstated by approximately

6,700 jobs. Using year-round, full-time workers, Roemer (2002) reports

that DER self-employment was misreported as a SIPP employer-based job

about 1.5% of the time.


in the DER data. Details about this merge can be found in

appendix B.

C. Matching SIPP and DER Jobs

After the creation of the SIPP and DER job-level data sets,

the next step was to create a common sample of people who

had job reports in both ﬁles. In the third column of table 1,

we show the number of people found in both sources and the

total jobs they held according to the SIPP and the DER.13

Here the timing of the survey plays an important role. In

every SIPP panel, the survey asked employment questions of

at least some respondents in the last few months of the year

preceding the ofﬁcial beginning year and in only a subset of

months in the ﬁnal year of the panel.14 For DER jobs, we did

not have any subannual information about the dates the job

was held. In order to attempt to match as many SIPP and DER

jobs as possible, all DER jobs from the years either partially

or fully covered by the survey were included in the potential

match set, as appropriate for each respondent. We did this to

allow the best possible chance for a given SIPP job to match,

because we did not wish to impose the requirement that

timing between the SIPP and the DER be exact. However,

this has the effect of making the SIPP and DER job counts

noncomparable. We report more comparable job counts in

appendix table C2, where we count only SIPP and DER jobs

reported in the full survey years.

After we matched by SSN, a job-to-job match was per-

formed using probabilistic record linking based primarily

on name matching.15 The primary basis for matching was

self-reported name of the employer from the SIPP and admin-

istrative name of the employer from the Business Register.

Earnings were not used in the match in order to prevent bias

in the subsequent comparison of earnings. Appendix C gives

the details of this match, including which additional matching

variables were used, how duplicate matches were handled,

and how company ownership changes affected the matching.

The fourth column of table 1 shows the number of SIPP jobs

that were successfully matched to a counterpart job in the

DER. While the percentage of SIPP jobs that match ranges

from 76% to 78% across the panels, the percentage of total

person earnings represented by these matching jobs is much

higher, ranging from 91% to 94%.
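The Fellegi-Sunter idea behind the probabilistic link can be illustrated with a toy scorer: each comparison field contributes a log-likelihood-ratio weight depending on whether it agrees. This is a stylized sketch, not the production matcher of appendix C; the m and u probabilities and field names are invented for illustration.

```python
import math

# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
# Values are illustrative, not the paper's estimated parameters.
M_U = {"employer_name": (0.9, 0.01), "industry": (0.8, 0.10)}

def match_weight(agreements):
    """agreements: dict field -> bool. Returns the total log2 likelihood
    ratio used to accept or reject a candidate SIPP-DER job pair."""
    w = 0.0
    for field, agrees in agreements.items():
        m, u = M_U[field]
        w += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return w

w_good = match_weight({"employer_name": True, "industry": True})
w_bad = match_weight({"employer_name": False, "industry": False})
# Agreement on both fields yields a large positive weight; disagreement
# yields a negative weight, so the pair would be rejected.
```

In the paper's application, the dominant field is the employer name (self-reported in the SIPP, administrative in the Business Register), and earnings are deliberately excluded from the comparison vector to avoid biasing the subsequent earnings analysis.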

Of the jobs that matched, we dropped those that did not

have at least one DER and one SIPP earnings report in one

of the full years covered by the panel, but we did not require

these reports to be in the same year. For example, a SIPP job

could have earnings reports for 1996 and 1997 but not 1998,

while the DER job could have reports for all three years.

As a result, the SIPP and DER sample sizes were slightly

different for each year. In our mixed-effects modeling, 0s

13 See appendix table C1 for more details about matching at the person-

level, including a breakdown of reasons for not matching.

14 See appendix C for further details about the timing of the SIPP panels

used in this analysis.

15 We used the Fellegi and Sunter (1969) method. Details are provided in

appendix C.

were treated as missing values and were modeled as ignorable

in the sense of Rubin (1976) given all effects in the ﬁxed-

and random-effects design matrices, facilitating estimation

with an unbalanced panel. The decision not to require exact

matching in the earnings years was based on the fact that
earnings reported as essentially 0 in one source and positive
in the other is a type of measurement error that we

did not wish to exclude. The ﬁfth column of table 1 shows the

decrease in our sample of jobs due to missing DER or SIPP

reports.

Finally, there were jobs that matched and had both SIPP

and DER earnings in full survey years but the SIPP earnings

were incomplete due to respondents missing an interview in

the middle of the panel. When an entire household missed

an interview, the Census Bureau did not impute responses

for this wave of the survey, and the data were left missing.

We dropped individuals who ever had a missing wave of

SIPP data.16 In the sixth column of table 1, we show the

ﬁnal total number of jobs per panel that were used in the

analysis. Combined across panels, our sample has 116,781

jobs, 80,792 people, and 70,081 unique employers.

In months where a SIPP respondent was interviewed but

failed to answer the earnings questions, responses were

imputed by the Census Bureau. Our main sample includes

both reported and imputed values. In addition, we split our

sample into person-job observations that never have imputed

monthly earnings and those that do and estimate our model

on both subsamples.17 This allows us to show the effect of

the Census Bureau’s imputation method on reliability statis-

tic calculations. The last column of table 1 shows people and

jobs that remain when these imputations are dropped.

Tables 2 and 3 describe the covariance structure of the

SIPP and DER earnings over time. Variances are shown on

the diagonals, covariances are listed below the diagonal, and

correlations are listed above. A job contributes an observation

for any year in which it has nonzero SIPP or DER earnings

or both. In the SIPP data, the correlations between adjacent

years range from 0.53 to 0.76. In the DER data, they are

higher, ranging from 0.80 to 0.83. For 1992 to 1994 and 1999,

the variance of earnings is higher in the DER than in the SIPP,

while in 1996, the SIPP has higher variance. In the remaining

years (1990–1991 and 1997–1998), the variance is quite close

between the two sources.
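The layout used in tables 2 and 3 (variances on the diagonal, covariances below it, correlations above it) can be reproduced for any data matrix. A sketch with simulated data:

```python
import numpy as np

def cov_corr_display(X):
    """Given an n-by-T matrix of log earnings (rows = jobs, columns =
    years), return a T-by-T display matrix with variances on the diagonal,
    covariances below it, and correlations above it, as in tables 2-3."""
    C = np.cov(X, rowvar=False)        # covariance matrix (ddof=1)
    R = np.corrcoef(X, rowvar=False)   # correlation matrix
    return np.tril(C) + np.triu(R, k=1)

# Simulated stand-in for a jobs-by-years earnings matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
D = cov_corr_display(X)
```

In the paper's tables, the entries are sparse because each pair of years is observed only for the panels whose survey windows cover both years, so each cell is computed on its own pairwise sample.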

16 In earlier versions of this paper, we did not make this assumption

because we wished to show the effect of these missing data on the measure-

ment of annual earnings. We changed our sample for two reasons. First,

we wanted to be more comparable to the literature, which generally drops

missing data when estimating measurement error. Second, we concluded

that most data users would ﬁrst do an imputation of some kind for the miss-

ing months when calculating annual earnings. Since at the moment there

is no standard method for doing this imputation, we decided to drop these

cases from our modeling.

17 When dropping imputed values, we use the ﬂag that indicates a monthly

earnings value was imputed and the interview ﬂag that tells when all of a

respondent’s answers were imputed using a hot-deck method that assigns a

donor. This latter type of imputation is called Type Z by the Census Bureau

and is used when a household is interviewed but some members are not

able to be interviewed.


Table 2.—Covariance/Correlation Matrix for Natural Log of SIPP Job Annual Earnings

         1990    1991    1992    1993    1994    1996    1997    1998    1999
1990    2.038   0.61
1991    0.983   2.054   0.53
1992            0.833   1.795   0.56    0.51
1993                    0.887   1.808   0.76
1994            0.699   1.060   1.935
1996                                            2.094   0.76    0.71    0.65
1997                                            1.074   1.946   0.76    0.70
1998                                            0.808   1.030   1.923   0.75
1999                                            0.659   0.772   0.961   1.805

Variances on diagonal; covariances below the diagonal; correlations above the diagonal. Sample is the matched SIPP/DER jobs with at least one SIPP and DER earnings report in the panel years and no missing

interview waves.

Table 3.—Covariance/Correlation Matrix for Natural Log of DER Job Annual Earnings

         1990    1991    1992    1993    1994    1996    1997    1998    1999
1990    1.965   0.81
1991    1.209   2.095   0.80
1992            1.266   2.088   0.80    0.74
1993                    1.284   2.195   0.80
1994            1.040   1.276   2.261
1996                                            1.898   0.82    0.77    0.72
1997                                            1.147   1.931   0.83    0.77
1998                                            0.968   1.154   1.919   0.83
1999                                            0.882   1.000   1.186   1.974

Variances on diagonal; covariances below the diagonal; correlations above the diagonal. Same sample as table 2.

Table 4.—Correlation Matrix for Natural Log of SIPP/DER Job Annual Earnings

Ln(DER Job           Ln(SIPP Job Annual Earnings)
Annual Earnings)    1990   1991   1992   1993   1994   1996   1997   1998   1999
1990                0.75   0.60
1991                0.59   0.76   0.64
1992                       0.59   0.78   0.73   0.69
1993                              0.62   0.87   0.72
1994                              0.55   0.71   0.88
1996                                                   0.89   0.73   0.68   0.64
1997                                                   0.72   0.87   0.71   0.65
1998                                                   0.67   0.72   0.88   0.71
1999                                                   0.64   0.66   0.72   0.86

Same sample as table 2.

Table 4 gives the cross-source correlations between each

year of DER and SIPP data. For earnings in the same year,

correlations range from 0.75 to 0.89, and for adjacent years,

from 0.55 to 0.73. In general, correlation between SIPP and

DER earnings has increased over time, with the high point

occurring in 1996. Adjacent years of DER and SIPP data

have lower cross-source correlations than the autocorrelation

of adjacent years of DER data but are mixed when compared

to the autocorrelation of adjacent years of SIPP data. For

the 1996 panel, the cross-source correlations are lower in

adjacent years than the SIPP-to-SIPP correlations. However,

for the years 1992 to 1994, the correlation of SIPP earnings

with the prior year is stronger when those earnings come from

the DER instead of the SIPP.

D. Why Administrative Data Might Not Be Truth

Before comparing earnings, we discuss three reasons

that considering administrative data to be truth might be

problematic. First, there are some deﬁnitional differences

between the two data sources. Second, there is likely to be

error in the administrative data themselves. Third, the match-

ing process between the data sources may also introduce

error. We brieﬂy discuss each of these in turn and summa-

rize how they might affect the comparison between SIPP and

DER earnings.

Conceptual differences between SIPP and DER: Jobs and

earnings deﬁnitions. Conceptual differences between SIPP

and DER stem from different deﬁnitions of earnings and jobs.

There are at least two parts of earnings that would be reported

on an employee’s pay stub in gross earnings that are not

included in box 1 of the W-2 form: pre-tax health insurance

plan premiums paid by the employee and pre-tax elective con-

tributions made to deferred compensation arrangements such

as 401(k) retirement plans. In the latter case, these contribu-

tions are reported elsewhere on the W-2 form (e.g., box 13

in 1999) and the DER ﬁle contains reports of these deferred


earnings that can be added to box 1 earnings to approximate

gross earnings. While pre-tax health insurance plan premi-

ums are reported on the W-2 form, they are not contained in

the DER extract created for research use. This omission rep-

resents one important way in which administrative records

may differ from survey records that is not the result of error

in the survey data collection process. DER will be less than

SIPP earnings if, as instructed, the respondent reported gross

earnings during the survey that included health insurance

premiums.

There are other possible differences between box 1 on the

W-2 form and gross earnings reported in the survey. These

involve an employee beneﬁt that the employee is unlikely to

consider wages and is unlikely to be reported as such on a

pay stub but that the employer is required to report as taxable

income.18 In these cases, DER earnings are likely to be higher

than SIPP earnings, because respondents, again as instructed,

do not report these beneﬁts as gross earnings.

A ﬁnal potential problem with DER employer reports is

that EINs do not necessarily remain constant over time. This

poses problems for deﬁning an employer-employee rela-

tionship. Unlike Social Security numbers, which serve as

good longitudinal identiﬁers for individuals, EINs can change

for reasons that do not involve a person moving to a new

employer. Company reorganizations through mergers, acqui-

sitions, or spinoffs may result in a worker having two W-2

forms for a tax year, each with a different EIN, without hav-

ing changed employers. In such cases, the DER earnings will

be less than the SIPP earnings because a portion of the earn-

ings for the year is missing. As part of the linking process

between DER and SIPP earnings, we attempted to identify

these kinds of successor-predecessor problems and merge the

two DER jobs determined to be related to a single SIPP job

(see appendix C for details).

To summarize, the exclusion of health insurance premiums

from the DER implies DER less than SIPP; the inclusion of

employee beneﬁts in the DER implies DER is greater than

SIPP; and the EIN changes due to ﬁrm reorganization imply

DER is less than SIPP.

Error in the administrative data. Government agencies

that collect administrative data recognize that mistakes are

made in the reporting process, although researchers com-
monly overlook this possibility. In the case of W-2 records, the

SSA has a process for employers to ﬁle amended returns,

and these are incorporated into the Master Earnings File.

Data managers at SSA generally suggest that most amended

returns are ﬁled and processed within two years of the original

18 These include educational assistance above a certain monetary level,

business expense reimbursement above the amount treated as substantiated

by the IRS, payments made by the employer to cover the employee’s share

of Social Security and Medicare taxes, certain types of fringe beneﬁts such

as the use of a company car, golden parachute payments, group-term life

insurance over $50,000 paid for by the employer,potentially some portion of

employer contributions to medical savings accounts, nonqualiﬁed moving

expenses, and, in some circumstances, sick pay from an insurance company

or other third-party payer.

ﬁling. Since our MEF research extract was created in 2002,

we believe that our DER data from tax years 1990 to 1999

contain most of the relevant amended ﬁlings. However, we

are also conﬁdent that not all ﬁling mistakes are recognized

and corrected and that some corrections ﬁrst happen many

years later. While this type of error may be less common than

survey reporting error, it does exist.

Another type of error in administrative data is process-

ing error. Sometimes employers make a processing mistake

when ﬁling returns for all their employees, such as adding

extra zeros to the ends of numerical amounts. There is some

evidence that automated read-in processes sometimes mal-
function, creating nonsensical dollar amounts.

This type of error process is very different from the one typ-

ically postulated for self-reported data as it is unrelated to

the actual amount or the person reporting. More research is

needed to determine the extent of this error and quantify its

speciﬁc impact.

Error in matching. Record mismatches are the ﬁnal

source of data error that we consider here. While much

clerical review has convinced us that our probabilistic link-
ing process produced high-quality matches, some (small, we
hope) error is always introduced by this type of record match-
ing. In our case, if the SSN for a respondent is correct, then a

mismatch in the job means the earnings belonged to the same

person but did not come from the same employer. If the SSN

is incorrect, then the difference in SIPP and DER earnings is

likely to be larger because the source person was incorrect.

We believe it is unlikely that an incorrect SSN would have

many job-level matches due to our use of the employer name,

but it is a possibility, especially for large employers.

V. Results

A. Random Effects and Reliability Statistics

We report parameter estimates for the elements of Σ0,
Σ|t−t′|, G(θ), and G(ψ) from the estimation of our MLMM
model in equation (2) in table 5. We report the variance of

the SIPP and DER, the variance of the signal, the variance of

the measurement error, and the reliability statistics, accord-

ing to equations (10) to (18) in table 6. We present results for

job-level speciﬁcations using the full sample and then sepa-

rately using jobs with and without SIPP earnings imputations.

We also present results for person-level speciﬁcations where

the earnings are summed across all jobs, matched and non-

matched. Again, we use the full sample, a subsample with

only individuals with no imputed earnings, the complemen-

tary subsample with individuals with at least one month of

imputed earnings, and ﬁnally the full sample with a dominant

employer effect included in the random effects design matrix.

For each sample, we show in table 6 calculations done with

four different deﬁnitions of ω: SIPP as truth (1, 0), SIPP and

DER equally likely to be truth (0.5, 0.5), DER more likely

than SIPP to be truth (0.1, 0.9), and DER as truth (0, 1).
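Under each definition of ω, the signal variance is the quadratic form ω^T{G(θ)+G(ψ)+Σ0}ω, so it interpolates between the two measures' variances. A sketch with an illustrative (not estimated) total-variance matrix:

```python
import numpy as np

# Illustrative 2x2 total-variance matrix, V = G(theta)+G(psi)+Sigma_0,
# with SIPP first and DER second. Numbers are made up for the sketch.
V = np.array([[2.0, 1.8], [1.8, 2.1]])

def signal_variance(V, omega):
    """Variance of the signal omega^T y for weight vector omega."""
    omega = np.asarray(omega, dtype=float)
    return float(omega @ V @ omega)

# The four definitions of omega used in table 6.
weights = [(1, 0), (0.5, 0.5), (0.1, 0.9), (0, 1)]
variances = [signal_variance(V, w) for w in weights]
# (1, 0) reproduces the SIPP variance; (0, 1) reproduces the DER variance.
```

This mirrors the behavior described in the text: the overall conditional variance of each source is fixed, while the signal variance moves between the SIPP and DER variances as ω shifts weight from one source to the other.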


Table 5.—Unobservable Heterogeneity

Estimates of Variance of Random Effects

Columns (left to right) — Job-Level Models: (1) All Observations; (2) No Imputations; (3) Imputations
Only. Person-Level Models, Comparison to Literature: (4) All Observations; (5) No Imputations;
(6) Imputations Only; and (7) Include Dominant Employer Effect.

Person effect/G(Theta)

SIPP (1) 0.2846 0.3219 0.2656 0.1456 0.1820 0.0636 0.0422

DER (2) 0.3493 0.3734 0.3587 0.2299 0.2168 0.2169 0.0805

Covariance 0.3109 0.3444 0.2962 0.1689 0.1833 0.1135 0.0524

Employer (ﬁrm) effect/G(Psi)

SIPP (1) 0.3288 0.3671 0.2496 0.1702

DER (2) 0.4796 0.5053 0.4905 0.2250

Covariance 0.3626 0.3919 0.3165 0.1773

Sigma

Common variance (sigma2_c) 0.5561 0.5435 0.4722 0.6999 0.7011 0.7037 0.7054

AR1 correlation (rho_c) 0.7219 0.7875 0.4523 0.6261 0.7138 0.4837 0.5675

SIPP variance (sigma2_1) 0.1777 0.1588 0.2250 0.0967 0.0581 0.1606 0.0951

DER variance (sigma2_2) 0.2628 0.1875 0.4064 0.1458 0.1190 0.1970 0.1350

SIPP AR1 correlation (rho_1) 0.4743 0.6275 0.2830 0.1174 0.0636 0.1287 0.1295

DER AR1 correlation (rho_2) 0.5882 0.4580 0.6895 0.3938 0.3367 0.4663 0.3803

Observations

People 80,792 58,956 26,963 80,792 53,829 26,963 80,792

Jobs 116,781 83,862 32,919

People-(Job)-Years-SIPP 210,703 145,987 64,716 182,632 117,767 64,865 182,632

People-(Job)-Years-DER 211,477 146,600 64,877 184,939 119,407 65,532 184,939

Unique employers 70,081 53,448 23,439 58,358

Model

Value of the likelihood −120,762.89 −76,369.18 −40,224.92 −6,311.56 6,436.61 −7,372.38 −2,229.32

Model degrees of freedom 90 90 90 90 90 90 90

Residual degrees of freedom 422,090 292,497 129,503 367,481 237,084 130,307 367,481

Total observations 422,180 292,587 129,593 367,571 237,174 130,397 367,571

Random effects are based on the REML estimation of the mixed-effects models. The job-level sample is matched SIPP/DER jobs with at least one SIPP and DER earnings report in the panel years and no missing

interview waves. The person-level sample is created by summing earnings from all jobs for people with at least one matched job.

As shown in table 5, the person, employer, and time-period-specific variance components are uniformly higher in the DER than in the SIPP, which, using equation (10), means a higher conditional variance for the DER overall than for the SIPP, as shown in the third column of table 6 (Variance of Y, DER). Person effects, employer effects, and measure-specific random time effects all contribute to the greater conditional variance of the DER measure as compared to the SIPP. This observation holds over all samples and subsamples and for both the job- and person-level specifications.

In the model estimated with no imputed values, the variance of the person and employer effects rises for both SIPP and DER, and the difference between them remains similar. However, the variance of the time-period-specific components, σ²_1 and σ²_2, falls for both SIPP and DER, and the gap narrows. In the model estimated with only jobs that had at least one month of imputed earnings, the DER person and employer variance components remain similar to the other two models, but the SIPP person and employer components become smaller. The variance of the time-period-specific component rises for both the SIPP and DER, and the gap between them increases. This result is reflected in table 6, where the no-imputations sample has the highest variance in the SIPP and the imputations-only sample has the lowest SIPP variance. For the DER, the opposite is true: the imputations-only sample has the highest variance and the no-imputations sample has the lowest variance. Our hypothesis is that this ranking of overall levels of conditional variance is due to the Census Bureau's survey imputation methods; specifically, earnings nonresponders are not ignorably missing given the conditioning data used in the hot-deck imputation.
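The hot-deck mechanism at issue here can be sketched as follows. This is a generic illustration of hot-deck imputation, not the Census Bureau's production procedure; the cell keys and earnings values are invented for the example:

```python
import random

random.seed(0)

# Toy records: (imputation cell from conditioning variables, monthly earnings).
# None marks a nonrespondent whose earnings must be imputed.
records = [
    ("male/college", 4200), ("male/college", None), ("male/college", 3900),
    ("female/hs", 2100), ("female/hs", 2300), ("female/hs", None),
]

# Build the donor pool: respondents grouped by imputation cell.
donors = {}
for cell, y in records:
    if y is not None:
        donors.setdefault(cell, []).append(y)

# Hot deck: each nonrespondent receives a donor's reported value drawn from
# the same cell. If nonresponse is related to earnings *within* cells, the
# imputed values inherit the donors' distribution, not the nonrespondents'.
imputed = [(cell, y if y is not None else random.choice(donors[cell]))
           for cell, y in records]
print(imputed)
```

Because donors are matched only on the conditioning variables, any earnings variation associated with nonresponse inside a cell is lost, which is one way imputation can alter the estimated variance components.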

The overall conditional variance for each data source remains constant regardless of our definition of truth. The variance of the signal moves between the SIPP variance and the DER variance, depending on the value of ω. When ω = (1, 0), so that the SIPP is truth, the signal equals the variance of the SIPP. When ω = (0, 1), so that the DER is truth, the signal equals the variance of the DER. The variance of the measurement error moves accordingly, rising for each source as we place less weight on that source being the truth. Our reliability statistic, as defined in equations (15) and (16), is shown in column 6 of table 6. Without additional assumptions or data about which source is truth, there is no way to choose one reliability statistic over the others. However, the range is informative: it shows how much the SIPP statistic might change if researchers were to move away from the concept of administrative data as truth. Depending on the assumption about truth, the statistic ranges from 1 to 0.60 for the SIPP and from 1 to 0.68 for the DER in the full job sample. Not surprisingly, the range of both the SIPP and DER reliability statistics is larger for the model estimated with only jobs with imputations (1 to 0.37 for the SIPP and 1 to 0.55 for the DER) and smaller for the model using only jobs without imputations (1 to 0.68 for the SIPP and 1 to 0.73 for the DER).
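The arithmetic behind these table 6 entries can be reproduced directly. The sketch below is our reading of the identities, not code from the paper: the measurement error of source j is its deviation from the ω-weighted truth, so Var(ME_SIPP) = w_DER² · Var(y_SIPP − y_DER), the signal is the variance of the weighted average, and the reliability statistic is 1 − Var(ME)/Var(Y). The SIPP-DER covariance is backed out from the ω = (0, 1) row of the full job sample:

```python
# Conditional variances of Y for the full job sample (table 6) and the
# implied SIPP-DER covariance, backed out from Var(ME_SIPP) = 0.536 at
# omega = (0, 1), since that error variance equals Var(y_SIPP - y_DER).
V_SIPP, V_DER = 1.347, 1.648
C = (V_SIPP + V_DER - 0.536) / 2

def reliability(w_sipp):
    """(RR_SIPP, RR_DER, signal variance) for truth weight omega = (w_sipp, 1 - w_sipp)."""
    w_der = 1.0 - w_sipp
    diff_var = V_SIPP + V_DER - 2 * C       # Var(y_SIPP - y_DER)
    me_sipp = w_der ** 2 * diff_var         # variance of SIPP measurement error
    me_der = w_sipp ** 2 * diff_var         # variance of DER measurement error
    signal = w_sipp**2 * V_SIPP + w_der**2 * V_DER + 2 * w_sipp * w_der * C
    return 1 - me_sipp / V_SIPP, 1 - me_der / V_DER, signal

for w in (1.0, 0.5, 0.1, 0.0):
    rr_s, rr_d, sig = reliability(w)
    print(f"omega=({w:.1f},{1 - w:.1f}): signal={sig:.3f} RR_SIPP={rr_s:.3f} RR_DER={rr_d:.3f}")
```

Run over the four priors, this reproduces the "All" rows of the job sample in table 6; for example, RR_SIPP falls from 1 to roughly 0.602 as the truth weight shifts entirely to the DER.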


Table 6.—Reliability Statistics

                      Truth Model        Variance of Y    Variance    Variance of ME    Reliability (RR)   RR t−1           RR t−6
Observations Used     Weight(SIPP,DER)   SIPP     DER     of Signal   SIPP     DER      SIPP     DER       SIPP     DER     SIPP     DER

Sample: Jobs
All                   1,0                1.347    1.648   1.347       0.000    0.536    1.000    0.675     1.000    0.759   1.000    0.882
All                   0.5,0.5            1.347    1.648   1.364       0.134    0.134    0.901    0.919     0.924    0.940   0.961    0.971
All                   0.1,0.9            1.347    1.648   1.570       0.434    0.005    0.678    0.997     0.754    0.998   0.874    0.999
All                   0,1                1.347    1.648   1.648       0.536    0.000    0.602    1.000     0.696    1.000   0.844    1.000
No imputations        1,0                1.391    1.610   1.391       0.000    0.441    1.000    0.726     1.000    0.799   1.000    0.895
No imputations        0.5,0.5            1.391    1.610   1.390       0.110    0.110    0.921    0.931     0.942    0.950   0.968    0.974
No imputations        0.1,0.9            1.391    1.610   1.548       0.357    0.004    0.743    0.997     0.813    0.998   0.896    0.999
No imputations        0,1                1.391    1.610   1.610       0.441    0.000    0.683    1.000     0.769    1.000   0.872    1.000
Imputations only      1,0                1.212    1.728   1.212       0.000    0.770    1.000    0.554     1.000    0.641   1.000    0.796
Imputations only      0.5,0.5            1.212    1.728   1.278       0.193    0.193    0.841    0.889     0.848    0.910   0.912    0.949
Imputations only      0.1,0.9            1.212    1.728   1.607       0.624    0.008    0.485    0.996     0.507    0.996   0.715    0.998
Imputations only      0,1                1.212    1.728   1.728       0.770    0.000    0.365    1.000     0.391    1.000   0.648    1.000

Sample: People
All                   1,0                0.942    1.076   0.942       0.000    0.280    1.000    0.740     1.000    0.853   1.000    0.860
All                   0.5,0.5            0.942    1.076   0.939       0.070    0.070    0.926    0.935     0.955    0.963   0.949    0.965
All                   0.1,0.9            0.942    1.076   1.037       0.227    0.003    0.759    0.997     0.855    0.999   0.835    0.999
All                   0,1                0.942    1.076   1.076       0.280    0.000    0.703    1.000     0.821    1.000   0.796    1.000
No imputations        1,0                0.941    1.037   0.941       0.000    0.209    1.000    0.798     1.000    0.900   1.000    0.895
No imputations        0.5,0.5            0.941    1.037   0.937       0.052    0.052    0.944    0.950     0.972    0.975   0.971    0.974
No imputations        0.1,0.9            0.941    1.037   1.009       0.170    0.002    0.820    0.998     0.910    0.999   0.905    0.999
No imputations        0,1                0.941    1.037   1.037       0.209    0.000    0.778    1.000     0.889    1.000   0.882    1.000
Imputations only      1,0                0.928    1.118   0.928       0.000    0.411    1.000    0.632     1.000    0.744   1.000    0.756
Imputations only      0.5,0.5            0.928    1.118   0.920       0.103    0.103    0.889    0.908     0.902    0.936   0.809    0.939
Imputations only      0.1,0.9            0.928    1.118   1.062       0.333    0.004    0.641    0.996     0.683    0.997   0.380    0.998
Imputations only      0,1                0.928    1.118   1.118       0.411    0.000    0.557    1.000     0.609    1.000   0.234    1.000
All, employer effect  1,0                1.013    1.146   1.013       0.000    0.289    1.000    0.748     1.000    0.839   1.000    0.821
All, employer effect  0.5,0.5            1.013    1.146   1.007       0.072    0.072    0.929    0.937     0.951    0.960   0.938    0.955
All, employer effect  0.1,0.9            1.013    1.146   1.107       0.234    0.003    0.769    0.997     0.842    0.998   0.798    0.998
All, employer effect  0,1                1.013    1.146   1.146       0.289    0.000    0.715    1.000     0.805    1.000   0.750    1.000

Calculations made using variance components from the REML mixed-effects models as reported in table 5. The label "imputations only" means that at least one SIPP year contains a hot-deck earnings imputation for at least one month for that person-job.

The reliability statistic is predictably higher for the models estimated with person-level data and ranges from 1 to 0.70 for the SIPP and 1 to 0.74 for the DER when estimated using the whole sample. These results are consistent with those of Gottschalk and Huynh (2010), who found a reliability statistic of 0.67 for the 1996 panel, which rose to 0.73 when earnings imputations were dropped. Although we include four additional SIPP panels, when we declare the DER to be the truth, our SIPP reliability statistics of 0.70 and 0.78 for the full and no-imputation samples, respectively, are close. For the person-level specification that includes the dominant employer effect, there is very little change in the reliability statistics for either the SIPP or the DER. The only major difference is the fall in the variance of the person effects, which is also what Abowd, Kramarz, and Margolis (1999) find in French administrative data and Woodcock (2008) finds for American administrative data. Most of the variance due to employers was previously attributed to individuals in the validation studies cited in section II.

In the final two columns of table 6, we show reliability statistics for data lagged one year and six years, calculated based on equations (17) and (18). For every definition of truth, the SIPP reliability statistic is higher for time period t−1 than for t, a finding that is also consistent with Gottschalk and Huynh's result that the impact of measurement error on earnings mobility estimates is mitigated by the structure of the error, in particular its property of mean reversion. Even when we place some probability on the SIPP being true and the DER having error, our results support this hypothesis.
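The mean-reversion argument can be checked with a small simulation, using illustrative parameter values rather than the paper's estimates or its equations (17) and (18): an AR(1) measurement error with serial correlation ρ contributes only 2(1 − ρ)σ²_e to the variance of year-over-year earnings changes, versus 2σ²_e for serially independent error, so mobility statistics built from changes are attenuated less.

```python
# Mean-reverting (AR(1)) measurement error damps its effect on earnings
# *changes*: Var(e_t - e_{t-1}) = 2 * (1 - rho) * sigma2_e, versus
# 2 * sigma2_e for serially uncorrelated error. Illustrative numbers only.
import numpy as np

rng = np.random.default_rng(0)
rho, sigma2_e, T = 0.6, 1.0, 200_000

# Simulate a stationary AR(1) error series with unconditional variance sigma2_e.
e = np.empty(T)
e[0] = rng.normal(scale=np.sqrt(sigma2_e))
innov_sd = np.sqrt(sigma2_e * (1 - rho ** 2))
for t in range(1, T):
    e[t] = rho * e[t - 1] + rng.normal(scale=innov_sd)

var_diff = np.var(np.diff(e))
print(f"Var(e_t - e_t-1): simulated {var_diff:.3f}, theory {2 * (1 - rho) * sigma2_e:.3f}")
```

With ρ = 0.6 the error variance in changes is 0.8 rather than 2.0, which is the sense in which mean reversion mitigates the impact of measurement error on mobility estimates.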

An interesting feature of tables 5 and 6 is that when our model is fit at the person level, the variance of the person and employer effects is substantially less than when the model is fit at the job level. This may be an important new finding because employer effects are rarely included at the person level, and we speculate on its causes. First, there is much more variance in the SIPP and DER earnings measures at the job level than at the person level. The effect of multiple job holding over the year, either simultaneous or sequential, is to reduce the total variance of earnings and to reduce the contribution of individual differences in earnings to that variance, particularly when we control for the effect of the dominant employer. Second, persistent differences in the pay policies of employers are mitigated by changing employers, but not by enough to eliminate employer effects in annual earnings from all sources.


Table 7.—Fixed Effects, Job-Level Data, All Observations

                                True (Weighted Average)        SIPP ME                 DER ME
Weight (SIPP, DER):             1,0   .5,.5  .1,.9  0,1        .5,.5  .1,.9  0,1       1,0   .5,.5  .1,.9

Male white

High school diploma Coefﬁcient 0.302 0.302 0.302 0.302 0.000 0.000 0.000 0.000 0.000 0.000

SE 0.017 0.017 0.018 0.019 0.005 0.009 0.010 0.010 0.005 0.001

Some college Coefﬁcient 0.306 0.309 0.312 0.313 −0.004 −0.006 −0.007 0.007 0.004 0.001

SE 0.017 0.017 0.018 0.018 0.005 0.008 0.009 0.009 0.005 0.001

College degree Coefﬁcient 0.823 0.841 0.856 0.860 −0.018∗∗ −0.033∗∗ −0.037∗∗ 0.037 0.018 0.004

SE 0.021 0.021 0.022 0.023 0.006 0.011 0.012 0.012 0.006 0.001

Graduate degree Coefficient 0.845 0.861 0.873 0.876 −0.016∗∗ −0.028∗∗ −0.031∗∗ 0.031 0.016 0.003

SE 0.020 0.021 0.022 0.022 0.006 0.010 0.011 0.011 0.006 0.001

Male nonwhite

High school diploma Coefﬁcient 0.361 0.363 0.365 0.365 −0.002 −0.004 −0.004 0.004 0.002 0.000

SE 0.044 0.044 0.047 0.048 0.012 0.022 0.025 0.025 0.012 0.002

Some college Coefﬁcient 0.420 0.423 0.425 0.426 −0.003 −0.005 −0.006 0.006 0.003 0.001

SE 0.044 0.044 0.047 0.048 0.012 0.022 0.025 0.025 0.012 0.002

College degree Coefﬁcient 0.823 0.816 0.810 0.809 0.007 0.013 0.014 −0.014 −0.007 −0.001

SE 0.058 0.058 0.062 0.063 0.016 0.029 0.033 0.033 0.016 0.003

Graduate degree Coefficient 0.996 0.997 0.998 0.998 −0.001 −0.002 −0.002 0.002 0.001 0.000

SE 0.055 0.056 0.059 0.060 0.016 0.028 0.032 0.032 0.016 0.003

Intercept difference Coefﬁcient −0.266 −0.297 −0.322 −0.328 0.031 0.056 0.062 −0.062 −0.031 −0.006

SE 0.084 0.083 0.088 0.090 0.026 0.047 0.052 0.052 0.026 0.005

Female white

High school diploma Coefﬁcient 0.311 0.317 0.323 0.324 −0.007 −0.012 −0.014 0.014 0.007 0.001

SE 0.019 0.019 0.020 0.020 0.005 0.009 0.011 0.011 0.005 0.001

Some college Coefﬁcient 0.381 0.384 0.386 0.386 −0.003 −0.005 −0.005 0.005 0.003 0.001

SE 0.018 0.018 0.020 0.020 0.005 0.009 0.010 0.010 0.005 0.001

College degree Coefﬁcient 0.801 0.805 0.808 0.809 −0.004 −0.008 −0.009 0.009 0.004 0.001

SE 0.023 0.023 0.024 0.025 0.006 0.011 0.013 0.013 0.006 0.001

Graduate degree Coefficient 0.952 0.958 0.963 0.965 −0.006 −0.011 −0.012 0.012 0.006 0.001

SE 0.022 0.023 0.024 0.025 0.006 0.011 0.013 0.013 0.006 0.001

Intercept difference Coefﬁcient −0.107 −0.102 −0.098 −0.097 −0.005 −0.009 −0.010 0.010 0.005 0.001

SE 0.041 0.040 0.043 0.044 0.013 0.023 0.025 0.025 0.013 0.003

Female nonwhite

High school diploma Coefﬁcient 0.352 0.359 0.365 0.366 −0.007 −0.013 −0.014 0.014 0.007 0.001

SE 0.040 0.040 0.043 0.044 0.011 0.021 0.023 0.023 0.011 0.002

Some college Coefﬁcient 0.423 0.416 0.410 0.408 0.007 0.013 0.015 −0.015 −0.007 −0.001

SE 0.039 0.039 0.042 0.043 0.011 0.020 0.022 0.022 0.011 0.002

College degree Coefﬁcient 0.886 0.901 0.913 0.916 −0.015 −0.027 −0.030 0.030 0.015 0.003

SE 0.051 0.052 0.055 0.056 0.015 0.026 0.029 0.029 0.015 0.003

Graduate degree Coefficient 1.141 1.161 1.176 1.180 −0.020 −0.035 −0.039 0.039 0.020 0.004

SE 0.054 0.055 0.058 0.059 0.015 0.028 0.031 0.031 0.015 0.003

Intercept difference Coefﬁcient −0.142 −0.133 −0.127 −0.125 −0.008 −0.015 −0.016 0.016 0.008 0.002

SE 0.076 0.075 0.079 0.081 0.024 0.042 0.047 0.047 0.024 0.005

1990 SIPP panel Coefﬁcient −0.165 −0.124 −0.092 −0.084 −0.040∗∗∗ −0.073∗∗∗ −0.081∗∗∗ 0.081 0.040 0.008

SE 0.022 0.021 0.022 0.023 0.007 0.013 0.014 0.014 0.007 0.001

1991 SIPP panel Coefﬁcient −0.127 −0.079 −0.041 −0.032 −0.048∗∗∗ −0.086∗∗∗ −0.095∗∗∗ 0.095 0.048 0.010

SE 0.020 0.020 0.021 0.022 0.007 0.012 0.013 0.013 0.007 0.001

1992 SIPP panel Coefﬁcient −0.107 −0.068 −0.038 −0.030 −0.038∗∗∗ −0.069∗∗∗ −0.077∗∗∗ 0.077 0.038 0.008

SE 0.017 0.017 0.018 0.018 0.005 0.009 0.010 0.010 0.005 0.001

1993 SIPP panel Coefﬁcient −0.041 −0.005 0.024 0.032 −0.036∗∗∗ −0.066∗∗∗ −0.073∗∗∗ 0.073 0.036 0.007

SE 0.016 0.016 0.017 0.018 0.005 0.009 0.010 0.010 0.005 0.001

Intercept Coefﬁcient 6.858 6.819 6.787 6.779 0.040∗∗ 0.071∗∗ 0.079∗∗ −0.079 −0.040 −0.008

SE 0.037 0.036 0.038 0.039 0.012 0.022 0.024 0.024 0.012 0.002

Linear time trend Coefﬁcient −0.038 −0.037 −0.036 −0.036 −0.001 −0.001 −0.001 0.001 0.001 0.000

SE 0.002 0.002 0.003 0.003 0.001 0.002 0.002 0.002 0.001 0.000

Fixed effects (coefficients on race, gender, education, panel, and time trend) from REML estimation of the mixed-effects model using the full job-level sample. Significant at *5%, **1%, and ***0.1%.

B. Fixed Effects

We turn next to the ﬁxed effects from our estimation of

the MLMM model and present results from the job-level

model estimated with all the observations in table 7. Using

the deﬁnition from equation (19), in the ﬁrst four columns,

we present different estimates of the vector Sig(B), calcu-

lated using the same priors for ωas we used to calculate the

reliability statistics in section VA. Since equation (19)deﬁnes

the truth to be the weighted average of the SIPP and DER

coefﬁcients, the ﬁrst and fourth columns of table 7 [ω=(1, 0)

and (0, 1)], correspond to coefﬁcients from a SIPP earnings

equation and a DER earnings equation, respectively. These

end points deﬁne the range of each ﬁxed effect. The SIPP and

DER measurement errors are deﬁned according to equation

(20)and are reported in columns 5 to 10. When one source is

ESTIMATING MEASUREMENT ERROR IN ANNUAL JOB EARNINGS 1465

declared to be the truth, by deﬁnition the measurement error

for this source is 0; hence, each source has only three columns

of measurement error reported. Standard errors are reported

for both the true coefﬁcients and the measurement error. Mea-

surement error signiﬁcantly different from 0 is equivalent

to stating that there are signiﬁcant differences between the

SIPP and DER estimates of a particular coefﬁcient. Negative

measurement error for either source means that the source

coefﬁcient was smaller than the weighted average coefﬁcient.

By deﬁnition, the DER and SIPP measurement errors will

have opposite signs.
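As a concrete check, the college-degree row for white males in table 7 follows directly from this construction; the sketch below applies equations (19) and (20) as described above, using the published point estimates:

```python
# The "true" coefficient is the omega-weighted average of the SIPP and DER
# coefficients; each source's measurement error is its deviation from that
# average. Example values: college degree, white males (table 7).
b_sipp, b_der = 0.823, 0.860

for w_sipp, w_der in [(1, 0), (0.5, 0.5), (0.1, 0.9), (0, 1)]:
    b_true = w_sipp * b_sipp + w_der * b_der
    me_sipp = b_sipp - b_true    # equals w_der * (b_sipp - b_der)
    me_der = b_der - b_true      # equals w_sipp * (b_der - b_sipp): opposite sign
    print(f"omega=({w_sipp},{w_der}): true={b_true:.3f} "
          f"ME_SIPP={me_sipp:+.3f} ME_DER={me_der:+.3f}")
```

The printed values match the table 7 row to rounding: for example, at ω = (0.5, 0.5) the true coefficient is 0.841 and the SIPP and DER errors are −0.018 and +0.018.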

We remind readers that in mixed-effects modeling, the fixed effects are the coefficients on the observed characteristics of the individuals. These effects may vary over time and differ from the random effects because each effect is estimated directly instead of being inferred from its distribution. The fixed effects included in our model are education in five levels (no high school diploma (excluded case), high school diploma, some college, college degree, graduate degree); a piecewise linear spline in labor force experience with bend points at 2, 5, 10, and 25 years of experience; an overall intercept; SIPP panel effects (excluded case is 1996); and a linear time trend. The education, experience, and overall intercept coefficients are all interacted with race and gender to produce separate estimates for white males, nonwhite males, white females, and nonwhite females.

The only fixed effects that are significantly different between the SIPP and the DER are the college and graduate degree indicators for white males and the panel effects. For these two education effects, SIPP measurement error estimates range from −0.018 to −0.037, meaning that the SIPP estimates of the returns to a college degree are approximately 2 to 4 log points lower than in the DER. Estimates for the return to a graduate degree differ by approximately 1.5 to 3 log points, with the SIPP returns again being lower. The panel effects represent average differences in annual earnings for each SIPP panel relative to the 1996 panel. These effects are negative in both the SIPP and DER, meaning that 1996 earnings are higher on average. SIPP measurement error ranges from −0.04 to −0.1, which means that the DER panel effects are less negative; that is, for DER earnings, the differences between panels are smaller. Here, however, we caution against assigning too much importance to this result. The 1996 panel was three waves longer than the longest panel of the early 1990s and suffered much more from problems with individuals missing waves. As shown in table 1, our 1996 sample size becomes very small when we drop individuals with missing waves, likely leaving us with a group of respondents whose characteristics differ from those in the early-1990s panels; comparisons across panels are therefore difficult to make. All the other coefficients in table 7 are very similar between the SIPP and the DER, with measurement error usually less than 1 percentage point and not significantly different from 0.

In table 8, we show the effect on earnings of 2, 5, 10, 25, and 30 years of experience, calculated using the five coefficients from the piecewise linear spline in the main job-level model. These effects are split by demographic group, and, as with the coefficients in table 7, we report the true effect based on four different priors for ω and also the SIPP and DER measurement error. Each effect is followed by its standard error. For both white males and females, there are significant differences between the SIPP and DER experience effects at 2 and 5 years, with the SIPP effect being larger by 5 to 15 log points. At 10 and 25 years, there are no significant differences between the SIPP and DER effects. For white men at 30 years, the DER effect is significantly larger, by 2 to 4 log points, but for white women, the SIPP and DER effects are not significantly different at 30 years. For nonwhite males and females, the differences between the SIPP and the DER are significant only at 2 years. For nonwhite men, the measurement error effects are relatively large at 5 and 30 years, ranging from 5 to 10 log points, but these results are imprecisely estimated and are not significant. Nonwhite females have similarly large standard errors and hence no significant effects after 2 years, but even the magnitude of the effects stays small after 5 years. In this sense, the profiles of white and nonwhite women are similar to each other: the SIPP effect is initially higher, and then the SIPP and DER converge and are quite similar at 30 years. For both white and nonwhite men, the SIPP effect is initially larger and then converges to the DER effect, after which the DER effect becomes larger, although only for white males is this pattern significant.
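The cumulative experience effects in table 8 can be reconstructed from the five spline slopes: the effect at x years sums each segment's slope times the years spent in that segment. The helper below is our sketch, not the estimation code; the slopes are backed out from the white-male TRUE column at ω = (1, 0), so the printed values match that row of table 8 to rounding:

```python
# Piecewise linear spline in experience with bend points at 2, 5, 10, 25 years.
KNOTS = (0, 2, 5, 10, 25)

def experience_effect(x, slopes, knots=KNOTS):
    """Cumulative effect of x years of experience under the spline."""
    bounds = list(knots) + [float("inf")]
    total = 0.0
    for slope, lo, hi in zip(slopes, bounds, bounds[1:]):
        total += slope * max(0.0, min(x, hi) - lo)  # overlap of [0, x] with segment
    return total

# Slopes implied by the white-male SIPP (omega = (1, 0)) effects in table 8.
slopes_wm = (0.577, 0.2183, 0.1194, 0.0339, -0.0294)
for x in (2, 5, 10, 25, 30):
    print(x, round(experience_effect(x, slopes_wm), 3))
```

The negative final slope is what produces the dip between 25 and 30 years (2.914 to 2.767) visible in the white-male profile.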

These results change somewhat when imputations are dropped or when the analysis is done at the person level. In particular, returns for women stay higher in the SIPP relative to the DER across the whole experience profile. We refer interested readers to appendix D, where we discuss the full set of comparable results for the job-level model estimated using jobs without imputed earnings and jobs with at least one month of imputed earnings, and for the person-level model using all three samples and the dominant-employer specification. The appendix also contains graphical summaries of the experience effects.

VI. Conclusion

We used linked survey data from the Census Bureau’s

SIPP and administrative data from the SSA’s DER, matched

at both the job and person levels, to estimate and analyze

measurement error models based on a multivariate linear

mixed-effects model for the pair of SIPP and DER annual

earnings outcomes. We showed that linking survey and

administrative data at the job level is a substantially more

complicated and nuanced process than linking the same data

at the individual level. The potential for measurement error

due to mismatching is substantial, and we documented the

steps taken to control that error. In the statistical speciﬁcation,

we ﬁnd that the conditional variance of the DER measures,

given the factors in both the ﬁxed- and random-effect design

matrices, is greater than that of the SIPP component by com-

ponent for the person, employer, and time effects. There


Table 8.—Experience Profiles from Job-Level Data, All Observations, for Different Truth Definitions

                     Experience = 2 Years      Experience = 5 Years      Experience = 10 Years     Experience = 25 Years     Experience = 30 Years
Weight (SIPP, DER):  1,0  .5,.5  .1,.9  0,1    1,0  .5,.5  .1,.9  0,1    1,0  .5,.5  .1,.9  0,1    1,0  .5,.5  .1,.9  0,1    1,0  .5,.5  .1,.9  0,1

Male White

TRUE

Effect 1.154 1.079 1.019 1.004 1.809 1.762 1.724 1.714 2.406 2.388 2.373 2.370 2.914 2.929 2.942 2.945 2.767 2.786 2.802 2.806

SE 0.036 0.035 0.037 0.038 0.033 0.033 0.035 0.036 0.031 0.031 0.033 0.034 0.031 0.031 0.032 0.033 0.030 0.030 0.032 0.032

SIPP ME

Effect 0.000 0.075 0.135 0.150 0.000 0.047 0.085 0.095 0.000 0.018 0.032 0.036 0.000 −0.015 −0.027 −0.030 0.000 −0.020 −0.035 −0.039

SE 0.000 0.012 0.021 0.023 0.000 0.010 0.018 0.020 0.000 0.010 0.017 0.019 0.000 0.010 0.017 0.019 0.000 0.009 0.017 0.019

∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗

DER ME

Effect −0.150 −0.075 −0.015 0.000 −0.095 −0.047 −0.009 0.000 −0.036 −0.018 −0.004 0.000 0.030 0.015 0.003 0.000 0.039 0.020 0.004 0.000

SE 0.023 0.012 0.002 0.000 0.020 0.010 0.002 0.000 0.019 0.010 0.002 0.000 0.019 0.010 0.002 0.000 0.019 0.009 0.002 0.000

Female White

TRUE

Effect 1.089 1.044 1.008 0.999 1.590 1.557 1.530 1.524 2.025 2.015 2.007 2.005 2.403 2.398 2.394 2.393 2.287 2.288 2.288 2.289

SE 0.034 0.033 0.035 0.036 0.032 0.031 0.033 0.034 0.030 0.030 0.032 0.033 0.031 0.030 0.032 0.033 0.030 0.029 0.031 0.032

SIPP ME

Effect 0.000 0.045 0.081 0.090 0.000 0.033 0.059 0.066 0.000 0.010 0.017 0.019 0.000 0.005 0.009 0.010 0.000 −0.001 −0.002 −0.002

SE 0.000 0.011 0.020 0.022 0.000 0.010 0.017 0.019 0.000 0.009 0.017 0.019 0.000 0.009 0.017 0.019 0.000 0.009 0.016 0.018

∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗ ∗∗∗

DER ME

Effect −0.090 −0.045 −0.009 0.000 −0.066 −0.033 −0.007 0.000 −0.019 −0.010 −0.002 0.000 −0.010 −0.005 −0.001 0.000 0.002 0.001 0.000 0.000

SE 0.022 0.011 0.002 0.000 0.019 0.010 0.002 0.000 0.019 0.009 0.002 0.000 0.019 0.009 0.002 0.000 0.018 0.009 0.002 0.000

Male Nonwhite

TRUE

Effect 1.461 1.388 1.330 1.315 1.873 1.826 1.788 1.779 2.412 2.425 2.436 2.439 2.792 2.831 2.862 2.869 2.728 2.776 2.814 2.824

SE 0.097 0.095 0.101 0.103 0.088 0.087 0.092 0.094 0.084 0.083 0.088 0.090 0.083 0.082 0.087 0.089 0.081 0.080 0.085 0.087

SIPP ME

Effect 0.000 0.073 0.131 0.146 0.000 0.047 0.085 0.095 0.000 −0.014 −0.024 −0.027 0.000 −0.038 −0.069 −0.077 0.000 −0.048 −0.087 −0.096

SE 0.000 0.031 0.056 0.062 0.000 0.027 0.048 0.054 0.000 0.026 0.047 0.052 0.000 0.026 0.046 0.051 0.000 0.025 0.045 0.050

∗∗∗

DER ME

Effect −0.146 −0.073 −0.015 0.000 −0.095 −0.047 −0.009 0.000 0.027 0.014 0.003 0.000 0.077 0.038 0.008 0.000 0.096 0.048 0.010 0.000

SE 0.062 0.031 0.006 0.000 0.054 0.027 0.005 0.000 0.052 0.026 0.005 0.000 0.051 0.026 0.005 0.000 0.050 0.025 0.005 0.000

Female Nonwhite

TRUE

Effect 1.187 1.117 1.061 1.047 1.605 1.563 1.531 1.522 1.959 1.937 1.919 1.915 2.356 2.354 2.352 2.351 2.266 2.271 2.274 2.275

SE 0.085 0.084 0.089 0.091 0.077 0.076 0.081 0.083 0.073 0.072 0.077 0.079 0.074 0.073 0.077 0.079 0.072 0.071 0.075 0.077

SIPP ME

Effect 0.000 0.070 0.127 0.141 0.000 0.041 0.074 0.082 0.000 0.022 0.040 0.044 0.000 0.002 0.004 0.005 0.000 −0.005 −0.009 −0.010

SE 0.000 0.028 0.050 0.056 0.000 0.024 0.043 0.048 0.000 0.023 0.041 0.046 0.000 0.023 0.041 0.046 0.000 0.022 0.040 0.045

∗∗∗

DER ME

Effect −0.141 −0.070 −0.014 0.000 −0.082 −0.041 −0.008 0.000 −0.044 −0.022 −0.004 0.000 −0.005 −0.002 0.000 0.000 0.010 0.005 0.001 0.000

SE 0.056 0.028 0.006 0.000 0.048 0.024 0.005 0.000 0.046 0.023 0.005 0.000 0.046 0.023 0.005 0.000 0.045 0.022 0.004 0.000

Effect on earnings of 2, 5, 10, 25, and 30 years of experience, calculated using the five coefficients from the piecewise linear spline (0-2, 2-5, 5-10, 10-25, 25+ years) in the job-level mixed-effects model, full sample. The asterisks refer to the SIPP ME effects: significant at *5%, **1%, and ***0.1%.


is more variability in the DER job- and person-level data, even controlling for demography, education, and labor force experience.

In our model, neither the SIPP nor the DER measure was treated as "true." Instead, we specified a prior weight vector that was used to define "truth" as a weighted average of SIPP and DER. Such a specification allowed us to consider systematically the implications of errors in either measure for the resulting conclusions about conditional means (fixed effects) and variance components (random effects). Considering the random components of the error process, we found that the reliability statistics for SIPP and DER earnings measures were quite comparable, except for the subsample of SIPP person-jobs where at least one year of SIPP earnings contained a Census Bureau imputation. These measures were less reliable than the DER. For the fixed effects, we found very little statistically meaningful measurement error, with most of the error found in the highly educated white male groups and the early-career experience profiles of white males and females.

Overall, our results point to the need to allow for measurement error in both the survey and the administrative data when doing validation studies. However, there are certain situations, particularly when the SIPP measure is based on partially imputed data, where we find strong evidence that the administrative measure contains less error. An important next step is to combine our modeling procedure with an audit study that determines the correct value of the earnings measure as a function of variables that are measured for all cases. Results from such work could be used by statistical agencies to produce a measure of "true earnings" that is a hybrid of survey and administrative data: a valuable measure for researchers that would allow agencies to release information from administrative data while limiting confidentiality concerns.

REFERENCES

Abowd, John M., and David Card, "On the Covariance Structure of Earnings and Hours Changes," Econometrica 57 (1989), 411–446.

Abowd, John M., Francis Kramarz, and David N. Margolis, "High Wage Workers, High Wage Firms," Econometrica 67 (1999), 251–333.

Angrist, Joshua, and Alan B. Krueger, "Empirical Strategies in Labor Economics" (Vol. 3, pt. 1, pp. 1277–1366), in O. Ashenfelter and D. Card, eds., Handbook of Labor Economics (New York: Elsevier, 1999).

Bound, John, Charles Brown, Greg J. Duncan, and Willard L. Rodgers, "Evidence on the Validity of Cross-Sectional and Longitudinal Labor Market Data," Journal of Labor Economics 12 (1994), 345–368.

Bound, John, Charles Brown, and Nancy Mathiowetz, "Measurement Error in Survey Data" (pp. 3705–3843), in J. J. Heckman and E. Leamer, eds., Handbook of Econometrics (New York: Elsevier, 2001).

Bound, John, and Alan B. Krueger, "The Extent of Measurement Error in Longitudinal Earnings Data: Do Two Wrongs Make a Right?" Journal of Labor Economics 9 (1991), 1–24.

Duncan, Greg J., and Daniel H. Hill, "An Investigation of the Extent and Consequences of Measurement Error in Labor-Economic Survey Data," Journal of Labor Economics 3 (1985), 508–532.

Fellegi, Ivan P., and Alan B. Sunter, "A Theory for Record Linkage," Journal of the American Statistical Association 64 (1969), 1183–1210.

Fuller, Wayne, Measurement Error Models (New York: Wiley, 1987).

Gilmour, Arthur R., B. J. Gogel, Robin Thompson, and Brian R. Cullis, ASREML User Guide Release 3.0 (Hemel Hempstead, UK: VSN International Ltd., 2009).

Gottschalk, Peter, and Minh Huynh, "Are Earnings Inequality and Mobility Overstated? The Impact of Non-Classical Measurement Error," this review 92 (2010), 302–315.

Gottschalk, Peter, and Robert Moffitt, "Changes in Job Instability and Insecurity Using Monthly Survey Data," Journal of Labor Economics 17 (1999), S91–S126.

Groves, Robert M., Floyd J. Fowler Jr., Mick P. Couper, James M. Lepkowski, Eleanor Singer, and Roger Tourangeau, Survey Methodology (New York: Wiley, 2004).

Kapteyn, Arie, and Jelmer Y. Ypma, "Measurement Error and Misclassification: A Comparison of Survey and Administrative Data," Journal of Labor Economics 25 (2007), 513–550.

Meijer, Erik, Susann Rohwedder, and Tom Wansbeek, "Measurement Error in Earnings Data: Using a Mixture Model Approach to Combine Survey and Register Data," Journal of Business and Economic Statistics 30 (2012), 191–201.

Mellow, Wesley, and Hal Sider, "Accuracy of Response in Labor Market Surveys: Evidence and Implications," Journal of Labor Economics 1 (1983), 331–344.

Pischke, Jörn-Steffen, "Measurement Error and Earnings Dynamics: Some Estimates from the PSID Validation Study," Journal of Business and Economic Statistics 13 (1995), 305–314.

Roemer, Marc, "Using Administrative Earnings Records to Assess Wage Data Quality in the March Current Population Survey and the Survey of Income and Program Participation," LEHD technical paper TP-2002-22 (2002), http://lehd.did.census.gov/led/library/techpapers/tp-2002-22.pdf.

Rubin, Donald B., "Inference and Missing Data," Biometrika 63 (1976), 581–592.

Stinson, Martha H., "Technical Description of SIPP Job Identification Number Editing, 1990–1993 SIPP Panels," SIPP technical paper, U.S. Census Bureau (2003), http://www.census.gov/sipp/core_content/core_notes/DescriptionSIPPJOBIDEDITING.pdf.

Woodcock, Simon, "Wage Differentials in the Presence of Unobserved Worker, Firm and Match Heterogeneity," Labour Economics 15 (2008), 771–793.