Article

Efficient bootstrap for business surveys

Abstract

The Australian Bureau of Statistics has recently developed a generalized estimation system for processing its large-scale annual and sub-annual business surveys. Designs for these surveys have a large number of strata, use simple random sampling within strata, have non-negligible sampling fractions, are overlapping in consecutive periods, and are subject to frame changes. A significant challenge was to choose a variance estimation method that would best meet the following requirements: valid for a wide range of estimators (e.g., ratio and generalized regression), requires limited computation time, can be easily adapted to different designs and estimators, and has good theoretical properties measured in terms of bias and variance. This paper describes the Without Replacement Scaled Bootstrap (WOSB) that was implemented at the ABS and shows that it is appreciably more efficient than the With Replacement Scaled Bootstrap (WSB) of Rao and Wu (1988). The main advantages of the bootstrap over alternative replicate variance estimators are its efficiency (i.e., accuracy per unit of storage space) and the relative simplicity with which it can be specified in a system. This paper describes the WOSB variance estimator for point-in-time and movement estimates that can be expressed as a function of finite population means. Simulation results obtained as part of the evaluation process show that the WOSB was more efficient than the WSB, especially for stratum sample sizes as small as 5.
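A minimal sketch of the without-replacement scaled bootstrap idea for stratified simple random sampling without replacement. It assumes the commonly cited rescaling-weight form (half-sample subsamples drawn without replacement within each stratum); the exact ABS specification may differ, and all names here are illustrative:

```python
import numpy as np

def wosb_variance(y, strata, N_h, B=1000, rng=None):
    """Without-replacement scaled bootstrap variance of an estimated
    population total under stratified SRSWOR.

    Uses the rescaling weight w* = w * (1 - lam + lam * (n/n') * delta)
    with lam = sqrt(n'(1 - f)/(n - n')), subsample size n' = n // 2 and
    delta indicating selection into the without-replacement subsample
    (an illustrative form of the rescaling bootstrap weight)."""
    if rng is None:
        rng = np.random.default_rng(0)
    y = np.asarray(y, float)
    strata = np.asarray(strata)
    labels = np.unique(strata)
    w = np.empty_like(y)
    for h in labels:
        idx = strata == h
        w[idx] = N_h[h] / idx.sum()          # design weight N_h / n_h
    theta_hat = np.sum(w * y)                # Horvitz-Thompson total
    thetas = np.empty(B)
    for b in range(B):
        wb = w.copy()
        for h in labels:
            idx = np.flatnonzero(strata == h)
            n = idx.size
            npr = n // 2                     # subsample size n'
            f = n / N_h[h]                   # stratum sampling fraction
            lam = np.sqrt(npr * (1 - f) / (n - npr))
            delta = np.zeros(n)
            delta[rng.choice(n, size=npr, replace=False)] = 1.0
            wb[idx] = w[idx] * (1 - lam + lam * (n / npr) * delta)
        thetas[b] = np.sum(wb * y)
    return theta_hat, thetas.var(ddof=1)
```

Because every sample unit keeps a (rescaled) weight in each replicate, only the bootstrap weights need to be stored, which is the storage-efficiency advantage the abstract refers to.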

... Thus, in some situations, it is important that the variance estimator includes all stages of the sampling design. A resampling method that takes all the sampling stages into account in the variance estimation is the multistage rescaling bootstrap proposed by Preston (2009), which is an extension of the rescaling bootstrap of Chipperfield and Preston (2007) for multistage sampling designs. For this resampling method, subsamples are drawn independently without replacement at each stage of the sampling design. ...
... It is also possible to use the rescaling bootstrap of Chipperfield and Preston (2007) with the extension to multistage sampling designs proposed by Preston (2009) and to make use of subsampling process of the rescaling bootstrap to derive a variance estimator for V s (θ I |z). The important concern is how to apply the reimputation for the rescaling bootstrap. ...
Article
Full-text available
In this paper, we propose a method that estimates the variance of an imputed estimator in a multistage sampling design. The method is based on the rescaling bootstrap for multistage sampling introduced by Preston (Surv Methodol 35(2):227–234, 2009). In his original version, this resampling method requires that the dataset includes only complete cases and no missing values. Thus, we propose two modifications for applying this method to nonresponse and imputation. These modifications are compared to other modifications in a Monte Carlo simulation study. The results of our simulation study show that our two proposed approaches are superior to the other modifications of the rescaling bootstrap and, in many situations, produce valid estimators for the variance of the imputed estimator in multistage sampling designs.
... Furthermore, these methods assume an overall negligible sampling fraction and provide ad hoc procedures, if any, to incorporate finite population correction (FPC) adjustments into the variance estimator. There are several bootstrap resampling algorithms that incorporate FPC adjustments with moderately trivial adjustments to the bootstrap weights that provide asymptotically unbiased design-based variance estimates under complete response or adjustment cell weighting (see Sitter 1992; Rao and Wu 1988; Chipperfield and Preston 2007; Mashreghi, Haziza, and Leger 2016). However, these bootstrap methods do not provide unbiased design-based variance estimates under imputation. ...
Article
The U.S. Census Bureau has historically used nearest neighbor (NN) or random hot deck (RHD) imputation to handle missing data for many types of survey data. Using these methods removes the need to parametrically model values in imputation models. With strong auxiliary information, NN imputation is preferred because it produces more precise estimates than RHD. In addition, NN imputation is robust against a misspecified response mechanism if missingness depends on the auxiliary variable, in contrast to RHD which ignores the auxiliary information. A compromise between these two methods is k-NN imputation, which identifies a set of the k closest neighbors (“donor pool”) and randomly selects a single donor from this set. Recently these methods have been used for multiple imputation (MI), enabling variance estimation via the so-called Rubin’s Combining Rules. The Approximate Bayesian Bootstrap (ABB) is a simple-to-implement algorithm that makes the RHD “proper” for MI. In concept, ABB should work to propagate uncertainty for NN MI; bootstrapping respondents means each nonrespondent’s one “nearest” donor will not be available for every imputation. However, we demonstrate through simulation that NN MI using ABB leads to variance underestimation. This underestimation is somewhat but not entirely attenuated with k-NN imputation. An alternative approach to variance estimation after MI, bootstrapped MI, eliminates the underestimation with NN imputation, but we show that it suffers from overestimation of variance with nonnegligible sampling fractions under both equal and unequal probability sampling designs. We propose a modification to bootstrapped MI to account for nonnegligible sampling fractions. We compare the performance of RHD and the various NN MI methods under a variety of sampling designs, sampling fractions, distribution shapes, and missingness mechanisms.
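The ABB step the abstract describes is simple to sketch for random hot deck MI: each imputed dataset first resamples the respondents with replacement, then draws donors from that resampled pool. A minimal illustration (function name and interface are hypothetical, not from the paper):

```python
import numpy as np

def abb_impute(y, m=5, rng=None):
    """Approximate Bayesian Bootstrap multiple imputation for a random
    hot deck: for each of m imputations, resample the respondents with
    replacement to form the donor pool (the ABB step), then fill each
    missing value with a random draw from that pool."""
    if rng is None:
        rng = np.random.default_rng()
    y = np.asarray(y, float)
    miss = np.isnan(y)                      # missing-value indicator
    donors = y[~miss]                       # observed respondents
    imputations = []
    for _ in range(m):
        # ABB step: bootstrap the respondents before donor selection
        pool = rng.choice(donors, size=donors.size, replace=True)
        yi = y.copy()
        # random hot deck: draw each imputed value from the pool
        yi[miss] = rng.choice(pool, size=miss.sum(), replace=True)
        imputations.append(yi)
    return imputations
```

Because the pool itself varies between imputations, between-imputation variability reflects donor uncertainty, which is what makes the hot deck "proper" for Rubin's combining rules.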
... Since the microcensus is a sample without replacement drawn from a finite population, the "naïve" bootstrap procedure described above cannot be applied in exactly this form. Instead, the "rescaled" bootstrap procedure introduced by Rao and Wu (1988) with the adjustment of using rescaled weights instead of rescaled survey data values (see Rao, Wu, and Yue 1992) is used with the additional modification of selecting bootstrap samples without replacement (see Chipperfield and Preston 2007; Preston 2009), also incorporating the stratification by region r (see Section 3.2). To be more specific, instead of drawing c bootstrap samples with replacement of the same size m_r as the original sample, subsamples without replacement of size m′_r = m_r/2 are drawn. ...
Article
Full-text available
The Austrian microcensus is the biggest sample survey of the Austrian population; it is a regionally stratified cluster sample with a rotational pattern. The sampling fractions differ significantly between the regions, so the sample sizes of the regions are quite homogeneous. The primary sampling unit is the household; within each household all persons are surveyed. The design weights are the input for the calibration on population counts and household forecasts, which is performed by iterative proportional fitting. Until the third quarter of 2014, only demographic, regional and household information was used in the weighting procedure. From the fourth quarter of 2014 onwards, the weighting process was improved by adding an additional dimension to the calibration, namely a labour status generated from administrative data and available for the whole population. Apart from that, some further minor changes were introduced. This paper describes the methodological and practical issues of the microcensus weighting process and the variance estimation applied from 2015 onwards. The new procedure was used for the first time for the fourth quarter of 2014, published at the end of March 2015. At the same time, all previous microcensus surveys back to 2004 were reweighted according to the new approach.
... d surveys due to the inability of the bootstrap to appropriately account for clustering in area-based sample design. Rao & Wu (1984) then established the Without Replacement Scaled Bootstrap variance estimator (WOSB) which is now largely adopted as the standard in the context of generalised regression estimation (or generalised raking) within NSOs. Chipperfield & Preston (2007) have shown that the WOSB reduces replication error and therefore yields more precise variance estimators under stratified random sampling. Time series in official statistics largely revolves around the dissemination of seasonally adjusted figures, given their intuitive representation of economic phenomena. Shiskin & Eisenpress (1957) co ...
Thesis
Decomposition models are often utilised by official statistical agencies in order to provide an explicit breakdown of time series data into trend, seasonal and irregular components. This paper seeks to explore the volatility in these estimates, incorporating the additional variation contributed by the repeated sample surveys underlying the series. The finite population-adjusted bootstrap replication procedure is explored as an option for quantifying such volatility through application to the Australian Bureau of Statistics’ Quarterly Business Indicators Survey. The methods implemented are critically analysed with respect to their robustness and their sensitivity to sample design parameters, both from a traditional time series viewpoint and in terms of their implications within the context of official statistics. As a secondary discussion, the aforementioned variance estimation methods are postulated as useful tools in sample design. Such a design would seek to incorporate the impact of variation in time series components on the allocation of sample into strata. Methods are specified but not necessarily applied in this thesis. Key words: time series, repeated surveys, official statistics, bootstrap, sample design
Article
Bootstrap is a useful computational tool for statistical inference, but it may lead to erroneous analysis under complex survey sampling. In this paper, we propose a unified bootstrap method for stratified multi‐stage cluster sampling, Poisson sampling, simple random sampling without replacement and probability proportional to size sampling with replacement. In the proposed bootstrap method, we first generate bootstrap finite populations, apply the same sampling design to each bootstrap population to get a bootstrap sample, and then apply studentization. The second‐order accuracy of the proposed bootstrap method is established by the Edgeworth expansion. Simulation studies confirm that the proposed bootstrap method outperforms the commonly used Wald‐type method in terms of coverage, especially when the sample size is not large.
Article
Full-text available
In a dual frame (DF) survey, a set of two frames is used instead of the traditional single frame of sampling units from the target population. Dual frame surveys are applicable in situations where one frame covers the entire population but is very expensive to sample, while an alternative frame is available that does not cover the entire population but is inexpensive to sample. As Hartley (1962) noted, variance estimation can be more complicated for dual frame surveys than for a single-frame survey, and an unbiased variance estimator of the parameter of interest is tedious to obtain for dual frame estimators. In this article, we propose two rescaling bootstrap variance estimation techniques for dual frame surveys: the Stratified Rescaling Bootstrap Without Replacement (SRBWO) and the Post-stratified Rescaling Bootstrap Without Replacement (PRBWO) methods. The statistical properties of the proposed methods are compared through a simulation study. Simulation results suggest that the proposed SRBWO and PRBWO methods give an unbiased estimate of the variance of the dual frame estimator of the population total, and that the SRBWO method performs better than the PRBWO method.
Article
Surveys with a rotating panel design are a prominent tool for producing more efficient estimates for indicators regarding trends or net changes over time. Variance estimation for net changes, however, becomes more complicated due to a possibly high correlation between the panel waves, so these estimates are quite burdensome to produce with traditional means. With the R package surveysd, we present a tool which supports a straightforward way of producing estimates and corresponding standard errors for complex surveys with a rotating panel design. The package uses bootstrap techniques which incorporate the panel design and thus makes it easy to estimate standard errors. In addition, the package supports a method for producing more efficient estimates by cumulating multiple consecutive sample waves. This method can lead to a significant decrease in variance, assuming that structural patterns for the indicator in question remain fairly robust over time. The usability of the package and the variance improvement from this bootstrap methodology are demonstrated on data from the user database (UDB) for the EU Statistics on Income and Living Conditions of selected countries with various sampling designs.
Chapter
In this chapter, the authors extend the results of S. M. Tam and J. K. Kim to continuous variables, again without making any missing‐at‐random assumptions on the inclusion mechanism for the nonprobability sample. They also extend data integration methods to address measurement errors in the data source (henceforth denoted as B), the simple random sample (denoted by A), and nonresponse biases in A. The authors describe the methods Tam and Kim used, and show how the two data sources, B and A, can be combined to address undercoverage bias in B and improve the efficiency in estimating the population total of the target population using A. They discuss the estimation of the population total when measurement errors occur in data source B or in the probability sample A. The authors present simulation results to illustrate the methods, and discuss two applications of the methods in official statistics with certain limitations.
Article
As in 2011, a register-based census will again be conducted in Germany in 2021. The required information is compiled as far as possible from population registers and other administrative data and supplemented with further information from primary surveys. One of those surveys is the household sample, whose most important purpose is to correct the registers for over-coverage ("Karteileichen") and under-coverage ("Fehlbestände") in order to estimate the population size. In addition, the household sample is used to produce a variety of further census results with fine regional and subject-matter detail, for example for regionally and demographically differentiated population cohorts.
Article
Record linkage is the act of bringing together records from two files that are believed to belong to the same unit (e.g., a person or business). It is a low‐cost way of increasing the set of variables available for analysis. Errors may arise in the linking process if an error‐free unit identifier is not available. Two types of linking errors include an incorrect link (records belonging to two different units are linked) and a missed record (an unlinked record for which a correct link exists). Naively ignoring linkage errors may mean that analysis of the linked file is biased. This paper outlines a “weighting approach” to making correct inference about regression coefficients and population totals in the presence of such linkage errors. This approach is designed for analysts who do not have the expertise or time to use specialist software required by other approaches but who are comfortable using weights in inference. The performance of the estimator is demonstrated in a simulation study.
Article
Resampling methods are a common measure to estimate the variance of a statistic of interest when data consist of nonresponse and imputation is used as compensation. Applying resampling methods usually means that subsamples are drawn from the original sample and that variance estimates are computed based on point estimators of several subsamples. However, newer resampling methods such as the rescaling bootstrap of Chipperfield and Preston [Efficient bootstrap for business surveys. Surv Methodol. 2007;33:167–172] include all elements of the original sample in the computation of its point estimator. Thus, procedures to consider imputation in resampling methods cannot be applied in the ordinary way. For such methods, modifications are necessary. This paper presents an approach applying newer resampling methods for imputed data. The Monte Carlo simulation study conducted in the paper shows that the proposed approach leads to reliable variance estimates in contrast to other modifications.
Chapter
We first review bootstrap variance estimation for estimators of finite population quantities such as population totals or means. In this context, the bootstrap is typically implemented by producing a set of bootstrap design weights that account for the variability due to sample selection. Sometimes, survey analysts are interested in making inferences about model parameters. We then describe how to modify bootstrap design weights so as to account for the variability resulting from the analyst’s model. Finally, we discuss bootstrap tests of hypotheses for survey data.
Article
We study the generalized bootstrap technique under general sampling designs. We focus mainly on bootstrap variance estimation but we also investigate the empirical properties of bootstrap confidence intervals obtained using the percentile method. Generalized bootstrap consists of randomly generating bootstrap weights so that the first two (or more) design moments of the sampling error are tracked by the corresponding bootstrap moments. Most bootstrap methods in the literature can be viewed as special cases. We discuss issues such as the choice of the distribution used to generate bootstrap weights, the choice of the number of bootstrap replicates, and the potential occurrence of negative bootstrap weights. We first describe the generalized bootstrap for the linear Horvitz-Thompson estimator and then consider non-linear estimators such as those defined through estimating equations. We also develop two ways of bootstrapping the generalized regression estimator of a population total. We study in greater depth the case of Poisson sampling, which is often used to select samples in Price Index surveys conducted by national statistical agencies around the world. For Poisson sampling, we consider a pseudo-population approach and show that the resulting bootstrap weights capture the first three design moments of the sampling error. A simulation study and an example with real survey data are used to illustrate the theory. © 2012 The Authors. International Statistical Review
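For the Poisson-sampling case highlighted in the abstract above, the moment-matching idea can be sketched directly. A minimal illustration, assuming one standard choice of bootstrap weight law, a_i = 1 + sqrt(1 − π_i)·z_i with z_i i.i.d. mean zero and unit variance, so that the bootstrap variance of the Horvitz-Thompson total tracks its design-unbiased estimator (function and variable names are illustrative):

```python
import numpy as np

def gen_bootstrap_var_poisson(y, pi, B=2000, rng=None):
    """Generalized bootstrap for the Horvitz-Thompson total under Poisson
    sampling. Each design weight 1/pi_i is perturbed by a random factor
    a_i = 1 + sqrt(1 - pi_i) * z_i with z_i i.i.d. standard normal, so
    Var*(T*) = sum((1 - pi_i) * (y_i / pi_i)**2), the usual unbiased
    variance estimator of the HT total under Poisson sampling."""
    if rng is None:
        rng = np.random.default_rng()
    y = np.asarray(y, float)
    pi = np.asarray(pi, float)
    t_hat = np.sum(y / pi)                       # HT estimate of the total
    z = rng.standard_normal((B, y.size))         # B replicates of z_i
    a = 1.0 + np.sqrt(1.0 - pi) * z              # bootstrap weight factors
    t_star = (a * (y / pi)).sum(axis=1)          # bootstrap totals
    return t_hat, t_star.var(ddof=1)
```

Normal z_i match only the first two design moments; the abstract's pseudo-population approach goes further and captures the first three, at the cost of a more involved construction. Note that a_i, and hence the bootstrap weights, can be negative with this simple choice, one of the issues the paper discusses.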
Article
The Producer Price Index (PPI) collects price data from domestic producers of commodities and publishes monthly indexes on average price changes received by those producers at all stages of processing. PPI samples employ a two-stage design where establishments are selected in the first stage and unique items are selected in the second stage. In this paper we review the research results from the PPI variance estimation study. The objective of the study was to determine the best method of variance estimation appropriate for PPI data. Historical data from eleven NAICS industries were used to create simulation frames, from which simulation samples were drawn and estimated variances calculated. The replication methods compared were the Balanced Repeated Replication (BRR), Fay's BRR, Jackknife and the Bootstrap. The Bootstrap method was recommended for the PPI program by the study.
Article
A number of covariance results in finite population sampling are brought together in this article. Instructive methods are given to derive these results.