Article

Efficient bootstrap for business surveys


Abstract

The Australian Bureau of Statistics has recently developed a generalized estimation system for processing its large-scale annual and sub-annual business surveys. Designs for these surveys have a large number of strata, use simple random sampling within strata, have non-negligible sampling fractions, overlap in consecutive periods, and are subject to frame changes. A significant challenge was to choose a variance estimation method that would best meet the following requirements: valid for a wide range of estimators (e.g., ratio and generalized regression), requires limited computation time, can be easily adapted to different designs and estimators, and has good theoretical properties measured in terms of bias and variance. This paper describes the Without Replacement Scaled Bootstrap (WOSB) that was implemented at the ABS and shows that it is appreciably more efficient than Rao and Wu's (1988) With Replacement Scaled Bootstrap (WSB). The main advantages of the bootstrap over alternative replicate variance estimators are its efficiency (i.e., accuracy per unit of storage space) and the relative simplicity with which it can be specified in a system. This paper describes the WOSB variance estimator for point-in-time and movement estimates that can be expressed as a function of finite population means. Simulation results obtained as part of the evaluation process show that the WOSB was more efficient than the WSB, especially when stratum sample sizes are small (as small as 5 in some strata).
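The half-sample rescaling that underlies the WOSB can be sketched as follows. This is an illustrative NumPy implementation for a weighted total under stratified SRSWOR, not the ABS production system; the weight-rescaling factor is the commonly cited half-sample form, and all names are illustrative.

```python
import numpy as np

def wosb_variance(y, w, strata, f, n_reps=1000, seed=0):
    """Half-sample without-replacement scaled bootstrap for the variance
    of a weighted total sum(w * y), assuming SRSWOR within strata.
    `f` maps stratum label -> sampling fraction n_h / N_h.  Sketch only."""
    rng = np.random.default_rng(seed)
    y, w, strata = np.asarray(y, float), np.asarray(w, float), np.asarray(strata)
    reps = np.empty(n_reps)
    for b in range(n_reps):
        w_star = w.copy()
        for h in np.unique(strata):
            idx = np.flatnonzero(strata == h)
            n_h = idx.size
            if n_h < 2:          # singleton strata contribute no variability here
                continue
            n_sub = n_h // 2     # half-sample subsample size
            lam = np.sqrt(n_sub * (1.0 - f[h]) / (n_h - n_sub))
            chosen = rng.choice(idx, size=n_sub, replace=False)
            delta = np.isin(idx, chosen)
            # rescale: unselected units keep a shrunken weight, selected
            # units an inflated one, so the replicate total stays unbiased
            w_star[idx] = w[idx] * (1.0 - lam + lam * (n_h / n_sub) * delta)
        reps[b] = np.sum(w_star * y)
    return reps.var(ddof=1)
```

For a linear statistic this rescaling reproduces the usual stratified SRSWOR variance estimator in expectation, which is why the replicate variance can be read off directly from the spread of the replicate totals.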


... d surveys due to the inability of the bootstrap to appropriately account for clustering in area-based sample designs. Rao & Wu (1988) established the rescaled bootstrap, which Chipperfield & Preston (2007) extended to the Without Replacement Scaled Bootstrap variance estimator (WOSB), now largely adopted as the standard in the context of generalised regression estimation (or generalised raking) within NSOs; they showed that the WOSB reduces replication error and therefore yields more precise variance estimators under stratified random sampling. Time series in official statistics largely revolves around the dissemination of seasonally adjusted figures, given their intuitive representation of economic phenomena. Shiskin & Eisenpress (1957) co ...
Thesis
Decomposition models are often utilised by official statistical agencies in order to provide an explicit breakdown of time series data into trend, seasonal and irregular components. This thesis seeks to explore the volatility in these estimates, incorporating the additional variation contributed by the repeated sample surveys underlying the series. The finite-population-adjusted bootstrap replication procedure is explored as an option for quantifying such volatility through application to the Australian Bureau of Statistics' Quarterly Business Indicators Survey. The methods implemented are critically analysed with respect to their robustness and their sensitivity to sample design parameters, both from a traditional time series viewpoint and in terms of their implications within the context of official statistics. As a secondary discussion, the aforementioned variance estimation methods are postulated as useful tools in sample design, which would seek to incorporate the impact of variation in time series components on the allocation of sample into strata. Such methods are specified but not necessarily applied in this thesis. Key words: time series, repeated surveys, official statistics, bootstrap, sample design
... Since the microcensus is a sample drawn without replacement from a finite population, the "naïve" bootstrap procedure described above cannot be applied in exactly this form. Instead, the "rescaled" bootstrap procedure introduced by Rao and Wu (1988), with the adjustment of rescaling weights instead of rescaling survey data values (see Rao, Wu, and Yue 1992), is used with the additional modification of selecting bootstrap samples without replacement (see Chipperfield and Preston 2007; Preston 2009), also incorporating the stratification by region r (see Section 3.2). To be more specific, instead of drawing c bootstrap samples with replacement of the same size m_r as the original sample, subsamples without replacement of size m'_r = m_r / 2 are drawn. ...
Article
Full-text available
The Austrian microcensus is the biggest sample survey of the Austrian population; it is a regionally stratified cluster sample with a rotational pattern. The sampling fractions differ significantly between the regions, so that the sample sizes of the regions are quite homogeneous. The primary sampling unit is the household, and within each household all persons are surveyed. The design weights are the input for the calibration on population counts and household forecasts, which is performed by iterative proportional fitting. Until the third quarter of 2014, only demographic, regional and household information was used in the weighting procedure. From the fourth quarter of 2014 onwards, the weighting process was improved by adding an additional dimension to the calibration, namely a labour status generated from administrative data and available for the whole population. Apart from that, some further minor changes were introduced. This paper describes the methodological and practical issues of the microcensus weighting process and the variance estimation applied from 2015 onwards. The new procedure was used for the first time for the fourth quarter of 2014, published at the end of March 2015. At the same time, all previous microcensus surveys back to 2004 were reweighted according to the new approach.
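The calibration step the abstract mentions, iterative proportional fitting (raking), cycles through the calibration dimensions and scales weights so the weighted counts hit each margin in turn. A generic sketch with illustrative names, not the Austrian microcensus production system:

```python
import numpy as np

def rake(weights, memberships, margins, n_iter=50):
    """Iterative proportional fitting: for each calibration dimension,
    scale the weights of each group so its weighted count matches the
    target margin; repeat until the margins are (approximately) met."""
    w = np.asarray(weights, float).copy()
    for _ in range(n_iter):
        for dim, targets in margins.items():
            groups = memberships[dim]
            for g, total in targets.items():
                mask = groups == g
                cur = w[mask].sum()
                if cur > 0:
                    w[mask] *= total / cur
    return w
```

For consistent margins (all dimensions summing to the same grand total) the cycle converges; inconsistent margins would leave the weights oscillating, which is why calibration systems check the control totals first.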
Chapter
In this chapter, the authors extend the results of S. M. Tam and J. K. Kim to continuous variables, again without making any missing‐at‐random assumptions on the inclusion mechanism for the nonprobability sample. They also extend data integration methods to address measurement errors in the data source (henceforth denoted as B), the simple random sample (denoted by A), and nonresponse biases in A. The authors describe the methods Tam and Kim used, and show how the two data sources, B and A, can be combined to address undercoverage bias in B and improve the efficiency in estimating the population total of the target population using A. They discuss the estimation of the population total when measurement errors occur in data source B or in the probability sample A. The authors present simulation results to illustrate the methods, and discuss two applications of the methods in official statistics with certain limitations.
Article
As in 2011, a register-based census will again be conducted in Germany in 2021. The required information is compiled as far as possible from population registers and other administrative data and supplemented with further information from primary data collections. One of these collections is the household sample, whose most important purpose is the correction of the registers for outdated entries ("Karteileichen") and missing records ("Fehlbestände") in order to estimate the population count. In addition, the household sample is used to derive a large number of further census results with fine regional and subject-matter detail, for example for regionally and demographically differentiated population cohorts.
Article
Record linkage is the act of bringing together records from two files that are believed to belong to the same unit (e.g., a person or business). It is a low‐cost way of increasing the set of variables available for analysis. Errors may arise in the linking process if an error‐free unit identifier is not available. Two types of linking errors include an incorrect link (records belonging to two different units are linked) and a missed record (an unlinked record for which a correct link exists). Naively ignoring linkage errors may mean that analysis of the linked file is biased. This paper outlines a “weighting approach” to making correct inference about regression coefficients and population totals in the presence of such linkage errors. This approach is designed for analysts who do not have the expertise or time to use specialist software required by other approaches but who are comfortable using weights in inference. The performance of the estimator is demonstrated in a simulation study.
Article
Resampling methods are a common measure to estimate the variance of a statistic of interest when data consist of nonresponse and imputation is used as compensation. Applying resampling methods usually means that subsamples are drawn from the original sample and that variance estimates are computed based on point estimators of several subsamples. However, newer resampling methods such as the rescaling bootstrap of Chipperfield and Preston [Efficient bootstrap for business surveys. Surv Methodol. 2007;33:167–172] include all elements of the original sample in the computation of its point estimator. Thus, procedures to consider imputation in resampling methods cannot be applied in the ordinary way. For such methods, modifications are necessary. This paper presents an approach applying newer resampling methods for imputed data. The Monte Carlo simulation study conducted in the paper shows that the proposed approach leads to reliable variance estimates in contrast to other modifications.
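A generic shape of such a modification, re-applying the imputation within each replicate so that the variance estimate reflects imputation uncertainty, might look as follows. This is a sketch using simple weighted-mean imputation, not the authors' exact procedure:

```python
import numpy as np

def replicate_estimate(y, w_star, missing):
    """One replicate under a 'modified' scheme: re-impute the missing
    values using the replicate weights (here: the replicate-weighted
    respondent mean), then compute the weighted mean of the completed
    data.  Generic illustration of the idea only."""
    y_imp = np.array(y, float)
    resp = ~np.asarray(missing)
    w_star = np.asarray(w_star, float)
    y_imp[~resp] = np.average(y_imp[resp], weights=w_star[resp])
    return np.average(y_imp, weights=w_star)
```

Repeating this for every set of replicate weights and taking the variance of the replicate estimates propagates the imputation step through the bootstrap, rather than treating imputed values as observed.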
Chapter
We first review bootstrap variance estimation for estimators of finite population quantities such as population totals or means. In this context, the bootstrap is typically implemented by producing a set of bootstrap design weights that account for the variability due to sample selection. Sometimes, survey analysts are interested in making inferences about model parameters. We then describe how to modify bootstrap design weights so as to account for the variability resulting from the analyst’s model. Finally, we discuss bootstrap tests of hypotheses for survey data.
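The idea of reusing replicate design weights for model parameters can be sketched as fitting the analyst's model once per set of replicate weights and taking the variance of the resulting coefficients. An illustrative weighted-least-squares example, not the chapter's exact formulation:

```python
import numpy as np

def replicate_slopes(x, y, rep_weights):
    """Fit a weighted least-squares line once per column of replicate
    weights and return the slopes; their variance (ddof=1) then serves
    as a bootstrap variance estimate for the slope.  Sketch only."""
    X = np.column_stack([np.ones_like(x), x])
    slopes = []
    for w in np.asarray(rep_weights, float).T:
        sw = np.sqrt(w)[:, None]          # weight via sqrt(w)-scaled system
        beta, *_ = np.linalg.lstsq(X * sw, y * sw.ravel(), rcond=None)
        slopes.append(beta[1])
    return np.asarray(slopes)
```

The same loop works for any estimator that accepts weights, which is what makes replicate-weight files convenient for secondary analysts.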
Article
The Producer Price Index (PPI) collects price data from domestic producers of commodities and publishes monthly indexes on average price changes received by those producers at all stages of processing. PPI samples employ a two-stage design where establishments are selected in the first stage and unique items are selected in the second stage. In this paper we review the research results from the PPI variance estimation study. The objective of the study was to determine the best method of variance estimation appropriate for PPI data. Historical data from eleven NAICS industries were used to create simulation frames, from which simulation samples were drawn and estimated variances calculated. The replication methods compared were the Balanced Repeated Replication (BRR), Fay's BRR, Jackknife and the Bootstrap. The Bootstrap method was recommended for the PPI program by the study.
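As a point of comparison among the replication methods named above, a delete-one-PSU jackknife for a single stratum can be sketched in a few lines (an illustrative with-replacement approximation, not the PPI study's implementation):

```python
import numpy as np

def jackknife_variance(psu_totals):
    """Delete-one-PSU jackknife for a total estimated as the sum of
    weighted first-stage (establishment) contributions, single stratum,
    with-replacement approximation.  Sketch only."""
    t = np.asarray(psu_totals, float)
    n = t.size
    # leave-one-out estimates: drop PSU i, reweight the rest by n/(n-1)
    loo = (t.sum() - t) * n / (n - 1)
    return (n - 1) / n * np.sum((loo - loo.mean()) ** 2)
```

For a total this reduces to the usual with-replacement estimator n * s^2 of the PSU contributions, which is one reason the jackknife is a common baseline in studies like the one described.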
Article
A number of covariance results in finite population sampling are brought together in this article. Instructive methods are given to derive these results.