Chapter

Non sampling errors in sample surveys: the Bank of Italy's experience


Abstract

Non-sampling errors are a serious problem in household surveys. This paper exploits the Bank of Italy’s Survey on Household Income and Wealth to show how these issues can be studied and how their main effects on estimates can be accounted for. The topics examined are unit non-response, uncorrelated measurement errors and some specific cases of underreporting. Unit non-response can be overcome by weighting valid cases using external (typically demographic and geographical) information or by modelling the respondents’ propensities to participate in the survey. The effect of uncorrelated measurement errors can be evaluated using specific reliability indices constructed with the information collected over the panel component. The underreporting bias of income and wealth is estimated by combining statistical matching techniques with auxiliary information and by exploiting different response behaviours across different groups.

Article
The main sources for estimating the financial wealth of Italian households come from the Bank of Italy: the Financial Accounts, published quarterly, and the Survey on Household Income and Wealth (Indagine sui bilanci delle famiglie, IBF), conducted every two years. Sample-based estimates of financial assets have the advantage that they can be broken down by the many characteristics recorded in the survey, but at the aggregate level they are systematically lower than those based on the Financial Accounts (even after allowing for differences in definitions). The aim of this paper is to present a procedure for correcting the bias in survey-based estimates of Italian households' financial wealth caused by participants' reluctance to report both the ownership of financial instruments and the amounts actually held (under-reporting). The proposed correction procedure relies on information from a survey of a sample of households that are customers of the Unicredito group (the UCI survey), linked to data on the holdings actually held at the group's banks. The IBF data are adjusted in two successive stages. In the first, under-reporting is measured by comparing respondents' statements in the UCI survey with the data on actual holdings, as a function of the amounts declared and of the households' socio-economic characteristics. In the second stage, the relationships estimated in the first stage are extended to the IBF sample, yielding adjusted values of financial wealth for the entire Italian population of bank customers.
Average financial assets corrected for under-reporting amount to 59 thousand euros (more than double the unadjusted figure), or about 85 per cent of the Financial Accounts estimate (depending on the parameters used in the adjustment process, the value lies between 81 and 89 per cent). The largest correction concerns bonds and mutual funds; partly as a result, the share of households with risky portfolios increases markedly. The adjustment is larger for single-member households and increases with the age of the household head. It is also larger for household heads with low educational attainment and for those who are retired or unemployed. Overall, after the correction, the concentration of financial wealth decreases slightly.
Article
This paper is aimed at evaluating the incidence of measurement error in the Bank of Italy's Survey of Household Income and Wealth (SHIW). In the case of time-invariant variables, we assess the degree of inconsistency of answers given by panel households in subsequent survey waves. For quantities that vary with time, we estimate the incidence of measurement error by decomposing observed variability into true dynamics and error-induced noise. We apply the Heise model or the latent Markov model, depending on whether the data are continuous or categorical. We also present regression models that explain the error-generating process. Our results are relevant to researchers who use SHIW data for economic analysis, but also to data producers involved in similar income and wealth surveys. The methods we describe and test can be employed in a number of contexts to gain better understanding of data-related problems and plans for survey improvement.
Article
This paper aims to describe non-respondents in the Bank of Italy’s Survey of Household Income and Wealth (SHIW) and to measure the underestimation of income and wealth attributable to non-response. The evidence confirms that non-response is not random, since it is more frequent among wealthier households. Therefore exclusive use of post-stratification procedures based on demographic characteristics only, which are commonly employed, cannot properly adjust for the selection process observed in the SHIW. As to the estimates of average aggregates, the bias seems to be greater for financial assets (the adjusted estimates are from 15 to 31 per cent higher than the unadjusted) than for income (for which the adjustments vary from 5 to 14 per cent, probably owing to a greater asymmetry in the distribution of wealth).
Article
We estimate the size of Britain's black economy (defined narrowly as unreported taxable income) by using income and expenditure data drawn from the 1982 Family Expenditure Survey. Our working assumptions are that all income groups report expenditure on food correctly; employees in employment report income correctly; and that the self-employed under-report their income. We estimate food expenditure equations for all groups and then invert them to arrive at the conclusion that on average true self-employment income is 1.55 times as much as reported self-employment income. This implies that the size of the black economy is about 5.5 percent of GDP.
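The inversion step described in this abstract can be illustrated with a deliberately simplified sketch. It assumes a log-linear food equation with a self-employment dummy and ignores the variance terms that a full treatment would include; the function name, the variable names and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def underreporting_factor(log_food, log_reported_income, self_employed):
    """Simplified inversion of a food expenditure equation (illustrative only).

    Assumed model: log_food = a + b * log_reported_income + g * self_employed.
    If employees report income correctly, the implied ratio of true to
    reported self-employment income is exp(g / b).
    """
    X = np.column_stack([
        np.ones_like(log_reported_income),  # intercept
        log_reported_income,                # income elasticity of food
        self_employed,                      # shift for the self-employed
    ])
    coef, *_ = np.linalg.lstsq(X, log_food, rcond=None)
    _, b, g = coef
    return float(np.exp(g / b))

# Synthetic, noiseless data in which the self-employed report income / 1.55.
rng = np.random.default_rng(0)
n = 200
true_income = rng.uniform(10_000, 50_000, size=n)
self_employed = (np.arange(n) % 2).astype(float)
reported_income = np.where(self_employed == 1, true_income / 1.55, true_income)
log_food = 1.0 + 0.5 * np.log(true_income)  # food depends on TRUE income

k = underreporting_factor(log_food, np.log(reported_income), self_employed)
```

On these synthetic data the sketch recovers the factor built into the simulation; with real survey data the estimate would also depend on the error structure and on the controls included in the equation.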
Article
In a recent article in this journal, Lombard, Snyder-Duch, and Bracken (2002) surveyed 200 content analyses for their reporting of reliability tests, compared the virtues and drawbacks of five popular reliability measures, and proposed guidelines and standards for their use. Their discussion revealed that numerous misconceptions circulate in the content analysis literature regarding how these measures behave and can aid or deceive content analysts in their effort to ensure the reliability of their data. This article proposes three conditions for statistical measures to serve as indices of the reliability of data and examines the mathematical structure and the behavior of the five coefficients discussed by the authors, as well as two others. It compares common beliefs about these coefficients with what they actually do and concludes with alternative recommendations for testing reliability in content analysis and similar data-making efforts.
Article
Sources of Survey Error. Frames: Definitions of Frames and Frame Errors. Frames: Quantifying Frame Errors. Frames: Conducting Surveys with Imperfect Frames. Nonresponse: Background and Terminology. Nonresponse: Statistical Effects of the Problem. Nonresponse: Dealing with the Problem. Measurement: Survey Measurement and Measurement Error. Measurement: Quantifying Measurement Error. Measurement: Quantifying Measurement Error, Variability in Measurement. Total Survey Design: More General Error Models. Compendium of Nonsampling Error Terminology.
Article
The randomized response technique appears to have been an innovative and useful procedure for eliciting reliable responses from individuals on sensitive or embarrassing questions. In this paper a new and alternative method is proposed for the same problem. Through the use of supplemented block, (v, k, r, b, λ) balanced incomplete block, and spring balance weighing designs, the individual is required to give a total of the responses to k questions, sensitive or not. From these block totals it is possible to obtain estimated responses for each of the v questions used in the survey, yet not obtain individual responses to single questions. Anonymity of response for a single interviewee is thus maintained. Estimators and their variances for the estimated responses are obtained. The method allows the surveyor to obtain answers to several sensitive questions without being unduly time‐consuming.
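The idea of recovering per-question estimates from block totals can be sketched as a least-squares problem. This is an illustration under assumptions, not the paper's estimators: the incidence matrix below is a small hypothetical balanced incomplete block design with v = 3 questions in b = 3 blocks of k = 2, and the function name is made up for the example.

```python
import numpy as np

def estimate_question_means(incidence, block_mean_totals):
    """Least-squares estimate of per-question means from block totals.

    incidence[j][i] is 1 if question i belongs to block j, else 0.
    block_mean_totals[j] is the average reported total among the
    respondents assigned to block j.
    """
    N = np.asarray(incidence, dtype=float)
    y = np.asarray(block_mean_totals, dtype=float)
    estimates, *_ = np.linalg.lstsq(N, y, rcond=None)
    return estimates

# Hypothetical design: blocks {1,2}, {1,3}, {2,3}.
incidence = [[1, 1, 0],
             [1, 0, 1],
             [0, 1, 1]]
true_means = np.array([0.2, 0.5, 0.1])            # assumed for the demo
block_mean_totals = np.asarray(incidence) @ true_means
estimated_means = estimate_question_means(incidence, block_mean_totals)
```

Each respondent reveals only a total over k questions, yet the design matrix has full column rank, so the question-level means remain identifiable.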
Article
Formulas are developed for estimating the true reliability of a measure from data collected at three points in time. The procedure can be applied to a single question, and unlike traditional test-retest reliabilities, this measure is not reduced in value when changes occur during the testing interval. A related coefficient of stability also is introduced, and a procedure is presented for examining the credibility of required assumptions.
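The three-wave coefficients usually attributed to this approach can be written down compactly. The sketch below assumes the standard Heise formulas in terms of the observed wave-to-wave correlations r12, r23 and r13 (constant reliability, uncorrelated errors, first-order change); the function name and the example correlations are hypothetical.

```python
def heise_coefficients(r12, r23, r13):
    """Reliability and stability from three-wave correlations.

    Assumes constant reliability across waves, uncorrelated measurement
    errors, and a first-order (lag-1) process for true change.
    """
    if min(abs(r12), abs(r23), abs(r13)) == 0:
        raise ValueError("all three correlations must be non-zero")
    reliability = r12 * r23 / r13
    stability_12 = r13 / r23
    stability_23 = r13 / r12
    stability_13 = stability_12 * stability_23
    return {
        "reliability": reliability,
        "stability_12": stability_12,
        "stability_23": stability_23,
        "stability_13": stability_13,
    }

# Example with made-up correlations.
result = heise_coefficients(r12=0.6, r23=0.6, r13=0.45)
```

With these inputs the reliability is 0.8 and each adjacent-wave stability is 0.75, so the observed correlation of 0.6 is the product of the two, which shows why a plain test-retest correlation understates reliability when true change occurs.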
Article
Preface. Chapter 1. The Evolution of Survey Process Quality. 1.1 The Concept of a Survey. 1.2 Types of Surveys. 1.3 Brief History of Survey Methodology. 1.4 The Quality Revolution. 1.5 Definitions of Quality and Quality in Statistical Organizations. 1.6 Measuring Quality. 1.7 Improving Quality. 1.8 Quality in a Nutshell. Chapter 2. The Survey Process and Data Quality. 2.1 Overview of the Survey Process. 2.2 Data Quality and Total Survey Error. 2.3 Decomposing Nonsampling Error into Its Component Parts. 2.4 Gauging the Magnitude of Total Survey Error. 2.5 Mean Squared Error. 2.6 An Illustration of the Concepts. Chapter 3. Coverage and Nonresponse Error. 3.1 Coverage Error. 3.2 Measures of Coverage Bias. 3.3 Reducing Coverage Bias. 3.4 Unit Nonresponse Error. 3.5 Calculating Response Rates. 3.6 Reducing Nonresponse Bias. Chapter 4. The Measurement Process and Its Implications for Questionnaire Design. 4.1 Components of Measurement Error. 4.2 Errors Arising from the Questionnaire Design. 4.3 Understanding the Response Process. Chapter 5. Errors Due to Interviewers and Interviewing. 5.1 Role of the Interviewer. 5.2 Interviewer Variability. 5.3 Design Factors that Influence Interviewer Effects. 5.4 Evaluation of Interviewer Performance. Chapter 6. Data Collection Modes and Associated Errors. 6.1 Modes of Data Collection. 6.2 Decision Regarding Mode. 6.3 Some Examples of Mode Effects. Chapter 7. Data Processing: Errors and Their Control. 7.1 Overview of Data Processing Steps. 7.2 Nature of Data Processing Error. 7.3 Data Capture Errors. 7.4 Post-Data Capture Editing. 7.5 Coding. 7.6 File Preparation. 7.7 Applications of Continuous Quality Improvement: The Case of Coding. 7.8 Integration Activities. Chapter 8. Overview of Survey Error Evaluation Methods. 8.1 Purposes of Survey Error Evaluation. 8.2 Evaluation Methods for Designing and Pretesting Surveys. 8.3 Methods for Monitoring and Controlling Data Quality. 8.4 Postsurvey Evaluations. 8.5 Summary of Evaluation Methods.
Chapter 9. Sampling Error. 9.1 Brief History of Sampling. 9.2 Nonrandom Sampling Methods. 9.3 Simple Random Sampling. 9.4 Statistical Inference in the Presence of Nonsampling Errors. 9.5 Other Methods of Random Sampling. 9.6 Concluding Remarks. Chapter 10. Practical Survey Design for Minimizing Total Survey Error. 10.1 Balance Between Cost, Survey Error, and Other Quality Features. 10.2 Planning a Survey for Optimal Quality. 10.3 Documenting Survey Quality. 10.4 Organizational Issues Related to Survey Quality. References. Index.
Article
This paper analyses respondents' behaviour when reporting their income sources in sample surveys and presents a method to deal with response error. Survey data relating to the number of earning recipients and to amounts received are validated using external information from administrative and statistical sources. Our findings suggest that the response bias on household income is about 12 per cent of reported figures. Misreporting is particularly severe for income from self-employment, financial assets and rents, as well as from secondary jobs. As to the distribution of response error, about 15 per cent of respondents show a high probability of misreporting. Misreporting is more widespread among men, older respondents, the self-employed and respondents at the higher end of the earnings distribution.
Book
The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically, different aspects of data mining have been addressed independently by different disciplines. This is the first truly interdisciplinary text on data mining, blending the contributions of information science, computer science, and statistics. The book consists of three sections. The first, foundations, provides a tutorial overview of the principles underlying data mining algorithms and their application. The presentation emphasizes intuition rather than rigor. The second section, data mining algorithms, shows how algorithms are constructed to solve specific problems in a principled manner. The algorithms covered include trees and rules for classification and regression, association rules, belief networks, classical statistical models, nonlinear models such as neural networks, and local "memory-based" models. The third section shows how all of the preceding analysis fits together when applied to real-world data mining problems. Topics include the role of metadata, how to handle missing data, and data preprocessing.
Article
Weighting adjustments are commonly applied in surveys to compensate for nonresponse and noncoverage, and to make weighted sample estimates conform to external values. Recent years have seen theoretical developments and increased use of methods that take account of substantial amounts of auxiliary information in making these adjustments. The article uses a simple example to describe such methods as cell weighting, raking, generalised regression estimation, logistic regression weighting, mixtures of methods, and methods for restricting the range of the resultant adjustments. It also discusses how auxiliary variables may be chosen for use in the adjustments and describes some applications.
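Of the methods listed, raking is easy to show in a few lines. The following is a minimal sketch of iterative proportional fitting on two margins, assuming every category has at least one respondent and that the two sets of control totals sum to the same population size; the function name, the categories and the totals are invented for illustration.

```python
import numpy as np

def rake(weights, groups_a, groups_b, targets_a, targets_b,
         max_iter=100, tol=1e-10):
    """Rake weights so that weighted totals match two sets of margins.

    targets_a / targets_b map each category to its external control total.
    """
    w = np.asarray(weights, dtype=float).copy()
    groups_a = np.asarray(groups_a)
    groups_b = np.asarray(groups_b)
    for _ in range(max_iter):
        for groups, targets in ((groups_a, targets_a), (groups_b, targets_b)):
            for level, total in targets.items():
                mask = groups == level
                w[mask] *= total / w[mask].sum()  # scale category to its total
        # The second margin now matches exactly; check the first one.
        gap = max(abs(w[groups_a == level].sum() - total)
                  for level, total in targets_a.items())
        if gap < tol:
            break
    return w

# Hypothetical sample of four respondents with unit starting weights.
sex = ["m", "m", "f", "f"]
area = ["north", "south", "north", "south"]
raked = rake([1, 1, 1, 1], sex, area,
             targets_a={"m": 60, "f": 40},
             targets_b={"north": 70, "south": 30})
```

In this toy case the raked weights are 42, 18, 28 and 12, which reproduce both margins. Real applications typically also bound or trim the resulting weights, as the abstract notes.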
Article
For various reasons individuals in a sample survey may prefer not to confide to the interviewer the correct answers to certain questions. In such cases the individuals may elect not to reply at all or to reply with incorrect answers. The resulting evasive answer bias is ordinarily difficult to assess. In this paper it is argued that such bias is potentially removable through allowing the interviewee to maintain privacy through the device of randomizing his response. A randomized response method for estimating a population proportion is presented as an example. Unbiased maximum likelihood estimates are obtained and their mean square errors are compared with the mean square errors of conventional estimates under various assumptions about the underlying population.
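The randomized response estimator for a proportion can be sketched as follows. The sketch assumes the classic design in which a randomizing device directs each respondent, with known probability p, to the statement "I belong to group A" and otherwise to its complement; the function name and the example counts are hypothetical.

```python
def warner_estimate(n_yes, n, p):
    """Estimate the proportion in the sensitive group from yes/no answers.

    p is the known probability that the device selects the statement
    about belonging to the sensitive group; it must differ from 0.5.
    Returns the estimate and a plug-in estimate of its variance.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if abs(p - 0.5) < 1e-12:
        raise ValueError("p = 0.5 makes the proportion unidentifiable")
    lam = n_yes / n                                  # observed share of 'yes'
    pi_hat = (lam - (1 - p)) / (2 * p - 1)           # invert P(yes)
    var_hat = lam * (1 - lam) / (n * (2 * p - 1) ** 2)
    return pi_hat, var_hat

# Example with made-up counts: 400 'yes' answers out of 1000, p = 0.7.
pi_hat, var_hat = warner_estimate(n_yes=400, n=1000, p=0.7)
```

Here the estimate is 0.25 with a variance of about 0.0015, larger than under direct questioning with the same sample size; that loss of precision is the price paid for respondent privacy.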
Article
Nonresponse weighting is a common method for handling unit nonresponse in surveys. A widespread view is that the weighting method is aimed at reducing nonresponse bias, at the expense of an increase in variance. Hence, the efficacy of weighting adjustments becomes a bias-variance trade-off. This note suggests that this view is an oversimplification: nonresponse weighting can in fact lead to a reduction in variance as well as bias. A covariate for a weighting adjustment must have two characteristics to reduce nonresponse bias: it needs to be related to the probability of response, and it needs to be related to the survey outcome. If the latter is true, then weighting can reduce, not increase, sampling variance. A detailed analysis of bias and variance is provided in the setting of weighting for an estimate of a survey mean based on adjustment cells. The analysis suggests that the most important feature of variables for inclusion in weighting adjustments is that they are predictive of survey outcomes; prediction of the propensity to respond is a secondary, though useful, goal. Empirical estimates of root mean squared error for assessing when weighting is effective are proposed and evaluated in a simulation study.
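A weighting adjustment based on adjustment cells can be sketched in a few lines. This is a generic weighting-class adjustment under the assumption that response is random within each cell, not the note's own analysis; the function name, cells, weights and incomes are invented for the example.

```python
from collections import defaultdict

def cell_adjusted_weights(cells, design_weights, responded):
    """Inflate respondents' weights by the inverse weighted response
    rate of their adjustment cell; nonrespondents get weight zero.

    A cell with no respondents simply loses its weight here, so in
    practice such cells would be collapsed with neighbouring ones.
    """
    sampled_total = defaultdict(float)
    respondent_total = defaultdict(float)
    for cell, w, r in zip(cells, design_weights, responded):
        sampled_total[cell] += w
        if r:
            respondent_total[cell] += w
    return [
        w * sampled_total[cell] / respondent_total[cell] if r else 0.0
        for cell, w, r in zip(cells, design_weights, responded)
    ]

# Hypothetical sample: the 'high' cell responds less often.
cells = ["low", "low", "high", "high", "high"]
design_weights = [10.0] * 5
responded = [True, True, True, False, False]
income = [20.0, 30.0, 90.0, 0.0, 0.0]   # unobserved for nonrespondents

adjusted = cell_adjusted_weights(cells, design_weights, responded)
weighted_mean = sum(w * y for w, y in zip(adjusted, income)) / sum(adjusted)
```

In this toy sample the unadjusted respondent mean is about 46.7, while the cell-adjusted mean is 64. The cell variable is related both to response and to the outcome, which is the situation in which the note argues that weighting is most useful.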
Blom, A.: Measuring interviewer effects across countries and surveys. Paper presented at The Fourth Conference of the European Survey Research Association, Lausanne (2011)
Cannari, L., D'Alessio, G.: Housing assets in the Bank of Italy's survey of household income and wealth. In: C. Dagum, M. Zenga (eds.) Income and Wealth Distribution, Inequality and Poverty - Proceedings, Springer Verlag, Pavia, pp. 326-334 (1990)
De Luca, G., Peracchi, F.: Estimating Engel curves under unit and item nonresponse. Journal of Applied Econometrics, doi: 10.1002/jae.1232 (2011)
Krippendorff, K.: Computing Krippendorff's Alpha Reliability. Departmental Papers 43, Annenberg School for Communication, University of Pennsylvania (2007)