Article
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

When sampling from a continuous population (or distribution), we often want a rather small sample due to some cost attached to processing the sample or to collecting information in the field. Moreover, a probability sample that allows for design‐based statistical inference is often desired. Given these requirements, we want to reduce the sampling variance of the Horvitz–Thompson estimator as much as possible. To achieve this, we introduce different approaches to using the local pivotal method for selecting well‐spread samples from multidimensional continuous populations. The results of a simulation study clearly indicate that we succeed in selecting spatially balanced samples and improve the efficiency of the Horvitz–Thompson estimator.
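As a rough illustration of the idea in this abstract, the sketch below applies a local-pivotal-style selection to a finite set of candidate points drawn uniformly from the unit square and then computes a Horvitz–Thompson estimate. The abstract does not say how the authors adapt the method to continuous populations, so the candidate-set step is an assumption, and the function name `local_pivotal_sample`, the test surface `y`, and all sizes are hypothetical.

```python
import numpy as np

def local_pivotal_sample(coords, probs, rng):
    """Select units so that nearby units rarely end up in the sample together.

    coords: (N, d) array of positions; probs: inclusion probabilities.
    Returns the indices of the selected units.
    """
    p = np.asarray(probs, dtype=float).copy()
    eps = 1e-9  # tolerance for treating a probability as decided (0 or 1)
    undecided = np.flatnonzero((p > eps) & (p < 1 - eps))
    while undecided.size > 1:
        i = rng.choice(undecided)                    # random undecided unit
        others = undecided[undecided != i]
        d2 = np.sum((coords[others] - coords[i]) ** 2, axis=1)
        j = others[np.argmin(d2)]                    # its nearest undecided neighbour
        s = p[i] + p[j]
        if s < 1:
            # One unit of the pair is excluded, the other takes all the mass.
            if rng.random() < p[j] / s:
                p[i], p[j] = 0.0, s
            else:
                p[i], p[j] = s, 0.0
        else:
            # One unit of the pair is included, the other keeps the remainder.
            if rng.random() < (1 - p[j]) / (2 - s):
                p[i], p[j] = 1.0, s - 1
            else:
                p[i], p[j] = s - 1, 1.0
        undecided = np.flatnonzero((p > eps) & (p < 1 - eps))
    if undecided.size == 1:
        # Only reached when the probabilities do not sum to an integer.
        k = undecided[0]
        p[k] = 1.0 if rng.random() < p[k] else 0.0
    return np.flatnonzero(p > 0.5)

rng = np.random.default_rng(1)
N, n = 400, 20                        # candidate points and sample size (assumed)
coords = rng.random((N, 2))           # assumption: uniform candidates on the unit square
y = coords[:, 0] + coords[:, 1]       # hypothetical target variable with a spatial trend
probs = np.full(N, n / N)             # equal inclusion probabilities

sample = local_pivotal_sample(coords, probs, rng)
ht_total = np.sum(y[sample] / probs[sample])   # Horvitz–Thompson estimate of the total
ht_mean = ht_total / N
print(len(sample), ht_mean, y.mean())
```

Because each pairwise update keeps the sum of the probabilities unchanged, the sample size is fixed at `n` when the probabilities sum to an integer. The nearest-neighbour pairing is what spreads the sample across space.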

... The approximate variance estimator for the LPM, proposed by Grafström and Schelin (2014) and Grafström and Matei (2018), is similar to the local difference estimators used with systematic sampling. In previous studies, the approximate variance estimator has worked well (Grafström and Schelin 2014; Grafström and Matei 2018). However, the study areas have been small, and it has not been shown that the results generalize to NFI estimation, where results are reported at the regional level. ...
... Several studies have observed improved estimation efficiency when the LPM has used auxiliary data correlated with the target variables in addition to the geographic coordinates (e.g. Grafström and Ringvall 2013; Roberge et al. 2017; Grafström and Matei 2018; Räty et al. 2018). As in Grafström and Ringvall (2013), the correlations we used to explore the explanatory power of the auxiliary variables were strong in our data between the mean-volume-related target and auxiliary data (Tables 4-5). ...
Article
Background The local pivotal method (LPM) utilizing auxiliary data in sample selection has recently been proposed as a sampling method for national forest inventories (NFIs). Compared with simple random sampling (SRS) and LPM with geographical coordinates, it has produced promising results in simulation studies. In this simulation study we compared all these sampling methods to systematic sampling. The LPM samples were selected solely using the coordinates (LPMxy) or, in addition to that, auxiliary remote sensing-based forest variables (RS variables). We utilized field measurement data (NFI-field) and Multi-Source NFI (MS-NFI) maps as target data, and independent MS-NFI maps as auxiliary data. The designs were compared using relative efficiency (RE), the ratio of the mean squared error of the reference sampling design to that of the studied design. Applying a method in an NFI also requires a proven estimator for the variance. Therefore, three different variance estimators were evaluated against the empirical variance of replications: 1) an estimator corresponding to SRS; 2) a Grafström-Schelin estimator repurposed for LPM; and 3) a Matérn estimator applied in the Finnish NFI for systematic sampling design.
Results The LPMxy was nearly comparable with the systematic design for most target variables. The REs of the LPM designs utilizing auxiliary data compared to the systematic design varied between 0.74 and 1.18, depending on the target variable. As expected, the SRS estimator for variance was the most biased and conservative estimator. Similarly, the Grafström-Schelin estimator gave overestimates in the case of LPMxy. When the RS variables were utilized as auxiliary data, the Grafström-Schelin estimates tended to underestimate the empirical variance. In systematic sampling the Matérn and Grafström-Schelin estimators performed equally for practical purposes.
Conclusions LPM optimized for a specific variable tended to be more efficient than systematic sampling, but all of the considered LPM designs were less efficient than the systematic sampling design for some target variables. The Grafström-Schelin estimator could be used as such with LPMxy or instead of the Matérn estimator in systematic sampling. Further studies of the variance estimators are needed if other auxiliary variables are to be used in LPM.
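The relative efficiency used in this abstract can be sketched in a few lines of Python. The sketch assumes the reading given in the text, namely the mean squared error of the reference design divided by that of the studied design, so values above 1 favour the studied design. The function names and the replicate estimates are made up for illustration and are not taken from the study.

```python
import numpy as np

def mse(estimates, true_value):
    """Mean squared error of replicated estimates around the known true value."""
    e = np.asarray(estimates, dtype=float)
    return float(np.mean((e - true_value) ** 2))

def relative_efficiency(ref_estimates, studied_estimates, true_value):
    """MSE of the reference design divided by MSE of the studied design."""
    return mse(ref_estimates, true_value) / mse(studied_estimates, true_value)

true_value = 100.0
ref = [98.0, 102.0, 101.0, 99.0]        # hypothetical replicates, reference design
studied = [99.0, 101.0, 100.0, 102.0]   # hypothetical replicates, studied design

re = relative_efficiency(ref, studied, true_value)
print(round(re, 3))  # MSEs are 2.5 and 1.5, so RE is about 1.667
```

In a simulation study the replicate estimates would come from repeated sample selection under each design, with the true value computed from the full target data.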
... [2], and soil surveys, e.g. [3], and has been used for sampling from continuous populations [7,8]. ...
... When variance reduction techniques are applied, the estimation of the variance is usually more complicated. For LPM, we suggest estimating the variance using a local mean variance estimator as in [19,20] and [12], see also [7]. More precisely, we suggest estimating the variance of our LPM-estimator of E(y(X)) by the local mean variance estimator ...
Preprint
Full-text available
The local pivotal method (LPM) is a successful sampling method for taking well-spread samples from discrete populations. We show how the LPM can be utilized to sample from arbitrary continuous distributions and thereby give powerful variance reduction in general cases. The method creates an "automatic stratification" on any continuous distribution, of any dimension, and selects a "thin" well-spread sample. We demonstrate the simplicity, generality and effectiveness of the LPM with various examples, including Monte Carlo estimation of integrals, option pricing and stability estimation in non-linear dynamical systems. Additionally, we show how the LPM can be combined with other variance reduction techniques, such as importance sampling, to achieve even greater variance reduction. To facilitate the implementation of the LPM, we provide a quick start guide to using LPM in MATLAB and R, which includes sample code demonstrating how to achieve variance reduction with just a few lines of code.
... For the running means of eight 10-year periods, adding $\Delta\mathrm{CMS}_t$ to the pool of auxiliary variables consistently resulted in values of $\Delta V_M$ larger than those obtained by EPS models not using this auxiliary variable. Differences between GL-EPS and GL-EPS-CMS were largest at 6.7%, i.e., 38.3% vs. 45.0%, and differences between GREG-EPS and GREG-EPS-CMS and TREE-EPS and TREE-EPS-CMS were of small magnitude < 1%, i.e., 36.1% vs. 36 ...
... In particular, the estimation of $V(\mathrm{AGB}_{t_1})$ and $V(\mathrm{AGB}_{t_2})$ are instances of the same problem, and the same unknowns and potential solutions apply to them. Methods to provide spatially balanced samples with a time overlap have been proposed [36] and could be used to estimate $2\,\mathrm{Cov}(\mathrm{AGB}_{t_2}, \mathrm{AGB}_{t_1})$. ...
Article
Full-text available
Quantifying above-ground biomass changes, ΔAGB, is key for understanding carbon dynamics. National Forest Inventories, NFIs, aim to provide precise estimates of ΔAGB, relying on model-assisted estimators that incorporate auxiliary information to reduce uncertainty. Poststratification estimators, PS, are commonly used for this task. Recently proposed endogenous poststratification, EPS, methods have the potential to improve the precision of PS estimates of ΔAGB. Using the state of Oregon, USA, as a testing area, we developed a formal comparison between three EPS methods, traditional PS estimators used in the region, and the Horvitz-Thompson, HT, estimator. Results showed that gains in performance with respect to the HT estimator were 9.71% to 19.22% larger for EPS than for PS. Furthermore, EPS methods easily accommodated a large number of auxiliary variables, and the inclusion of independent predictions of ΔAGB as an additional auxiliary variable resulted in further gains in performance.
... Generalised random tessellation stratified (GRTS, Stevens & Olsen 2004) was among the first, and several designs using similar methodologies have been proposed (Theobald et al. 2007; Dickson & Tillé 2016; Chauvet & Le Gleut 2020). Another group of methods apply a local repulsion strategy to promote sample spread (Grafström 2011; Grafström, Lundström & Schelin 2012; Grafström & Tillé 2013; Grafström & Matei 2018; Jauslin & Tillé 2020). and Robertson, Price & Reale (2024) maximise the average spatial balance over a set of candidate samples, and maximise a within-sample distance measure. ...
Article
One of the most critical design features for sampling spatial populations is being able to draw spatially balanced samples. A substantial body of literature on sampling methodology has shown that spatially balanced samples can improve the precision of commonly used design‐based estimators in various settings. Spatially balanced master samples offer several practical advantages for practitioners, including adjusting the sample size to match budgetary constraints, intensifying a previous sample or defining a panel design for surveying over time. These designs are of practical importance and should be easy to generate with reliable and efficient software. The spbal R package provides explicit functionality for spatially balanced master sampling designs from point and areal resources. Stratified and panel designs are also possible with spbal. In this article, we demonstrate the flexibility of spbal with several example designs using spatial populations from New Zealand.
... Another limitation could be stated regarding the largely unbalanced sample sizes for the abused and the nonabused groups, which may have potentially led to less reliable comparisons between groups. Hopefully, the abused may represent a small percentage of the general population, making it challenging to ensure balanced samples in statistics (e.g., 91,92). Also, Wave 1 was conducted at the start of the COVID-19 pandemic, and Waves 2 and 3 occurred after its conclusion. ...
Article
Full-text available
Background While the impact of parenting styles on adolescents’ mental health is well documented, no study has used latent person-oriented methods to analyze the effects of parenting style trajectories, experienced by physically abused and nonabused adolescents from early to middle adolescence, on mental health outcomes. Method In this longitudinal study, we used latent transition analysis (LTA) to detect parenting patterns and their trajectories among 1,709 adolescents from 44 high schools in Switzerland across three data waves (2021-2023) by applying a multigroup comparison between physically nonabused and abused adolescents. Using multinomial regression, we tested the effects of the detected parenting patterns on adolescents’ mental health. Results Along with the two known patterns, termed “supportive” and “negative” parenting, two new parenting patterns which we termed “absent” (low levels on all tested parenting styles) and “ambiguous” (middle to high levels on all tested parenting styles) emerged as playing a key role in the perceptions of adolescents with and without parental abuse experience longitudinally. These four patterns developed in diverse ways: Supportive parenting decreased for abused adolescents over time but remained stable for the nonabused adolescents. The absent parenting level was stable over time among abused adolescents when compared to the outcomes experienced by adolescents subjected to the negative parenting pattern. Furthermore, we found a remarkable decline in the number of nonabused adolescents in the absence pattern from Wave 1 to Wave 3. Further, we also found that abused adolescents reported more negative parenting than nonabused adolescents. Additionally, we found that supportive parenting was beneficial for adolescents’ mental health whereas negative, ambiguous, and absent parenting all had detrimental effects. 
Conclusions These findings highlight the beneficial association of supportive parenting and the detrimental effects of negative, ambiguous, and absent parenting. This also suggests that we must consider a more complex approach that involves examining a blend of different parenting styles when analyzing adolescent mental health.
... Continuous forest inventory (CFI) programs are motivated by and must contend with variation in forest structure across space and time. Numerous sampling designs have been developed to spread and balance CFI sample points spatially across a region so as to improve estimation accuracy (e.g., Grafström and Matei 2018), but the distribution of measurements over time generally follows one of two strategies. In the case of a periodic CFI, the complete set of sample points is measured in a single year (or short span of consecutive years), with recurrent full remeasurement planned on some longer-term basis. ...
Article
Full-text available
The mixed estimator (ME) for annual forest inventory proposed by van Deusen (1999; Can. J. For. Res. 29: 1824–1828) is reformulated as a linear mixed model. This equivalent structure admits an interpretation of the ME as a polynomial regression on year with correlated year-specific random effects. It also uncovers the necessary criterion for maximum likelihood (ML) estimation. The improved performance of the ME under ML estimation is illustrated through simulations and application to inventory data from Montana, USA. Limitations of the ME relating to model-misspecification are also discussed.
... This balanced selection of tracts enhances estimation precision, meaning that the estimates are closer to the true values than those from traditional sampling methods. The design is also flexible, allowing scalability and the inclusion of uncommon and common phenomena within the same framework through different sampling densities, multistage sampling, and the exclusion of tracts (Grafström & Matei, 2018). The potential of a balanced sampling design has been increasingly recognized for large-scale environmental monitoring and forest cover change assessment, as demonstrated in studies by Kermorvant This study employed a multi-level sampling approach to estimate miombo woodland canopy cover in Tanzania. ...
Preprint
Full-text available
Long-term monitoring is essential to understand the impacts of land use and climate change on miombo woodlands. This study introduces an innovative monitoring design for miombo woodlands with a two-stage sampling utilizing spatially balanced techniques to estimate the area and canopy cover of miombo woodland across the Tabora, Sikonge, Mlele, and Tanganyika districts. The first step involved the selection of 68 tracts, each comprising an average of 1025 plots, with the aid of spatially balanced sampling. Each of the 69,716 plots was classified into closed (canopy cover > 70%), open (40% ≤ canopy cover ≤ 70%), very open (10% ≤ canopy cover < 40%), and non-miombo (canopy cover < 10%) based on woodland cover derived from Sentinel 2 images, followed by the second step consisting of stratified random sampling and inventorying of 2,690 plots within 68 tracts. Using PlanetScope images, we determined the canopy cover for the 2,690 plots selected in the second step and reclassified them accordingly. Employing the Horvitz–Thompson estimator, our results showed that miombo woodlands in these districts cover 37,359 ± 4,618 km² with an average canopy cover of 55% ± 5%. Closed miombo woodland (canopy cover > 70%) was the dominating woodland type, covering 29,546 ± 4,382 km² of the study area with an average canopy cover of 84% ± 7%. The study's innovative sampling design provides reliable estimates of the area of miombo woodlands and average canopy cover, with relative standard errors consistently below 25%, offering a robust foundation for monitoring different miombo types.
... Benedetti and Piersimoni (2017) proposed a flexible class of methods that draw an SBS using a within-sample distance or similarity measure. Another group of designs apply a local repulsion strategy to ensure samples that are well-spread are drawn (Grafström 2011; Grafström et al. 2012; Grafström and Tillé 2013; Grafström and Matei 2018). Designs based on quasi-random number sequences, for example the Halton sequence (Halton 1960), have also been proposed (Robertson et al. 2013, 2017; van Dam-Bates et al. 2018; Robertson et al. 2018, 2021). ...
Article
Full-text available
In this paper, we construct three types of augmented samples, which are samples generated from two separate randomization events. The first type combines a simple random sample (SRS) with a spatially balanced sample (SBS) selected from the same finite population. The second type combines an SBS with an SRS. The third type combines two spatially balanced samples. The simple random sample is constructed without replacement and does not contain any ties. The spatially balanced samples are constructed using the properties of the Halton sequence. We provide the first and second order inclusion probabilities for the augmented samples. Next, using the inclusion probabilities of the augmented samples, we construct estimators for the mean and total of a finite population. The efficiency of the augmented samples varies between the efficiency of SRS and SBS samples. If the number of SRS observations in the augmented sample is large, the efficiency is closer to the efficiency of SRS. Otherwise, it is closer to the efficiency of SBS. We also provide estimators for the variances of the estimators of the population total for augmented samples. The stability of these variance estimators depends on the proportion of SRS observations in the augmented samples. A larger number of SRS observations leads to more stable variance estimators.
... Walvoort, Brus, and De Gruijter (2010) used compact geographical strata to perform stratified sampling; this approach is available in the spcosa R package (Walvoort, Brus, and De Gruijter 2022). Grafström, Lundström, and Schelin (2012) used a local pivot method for finite populations and Grafström and Matei (2018) generalized this approach to infinite populations; these approaches are available in the BalancedSampling R package (Grafström and Lisic 2019). Grafström (2012) used a spatially correlated Poisson approach, also available in BalancedSampling. ...
Article
Full-text available
spsurvey is an R package for design-based statistical inference, with a focus on spatial data. spsurvey provides the generalized random-tessellation stratified (GRTS) algorithm to select spatially balanced samples via the grts() function. The grts() function flexibly accommodates several sampling design features, including stratification, varying inclusion probabilities, legacy (or historical) sites, minimum distances between sites, and two options for replacement sites. spsurvey also provides a suite of data analysis options, including categorical variable analysis (cat_analysis()), continuous variable analysis (cont_analysis()), relative risk analysis (relrisk_analysis()), attributable risk analysis (attrisk_analysis()), difference in risk analysis (diffrisk_analysis()), change analysis (change_analysis()), and trend analysis (trend_analysis()). In this manuscript, we first provide background for the GRTS algorithm and the analysis approaches and then show how to implement them in spsurvey. We find that the spatially balanced GRTS algorithm yields more precise parameter estimates than simple random sampling, which ignores spatial information.
... After determining the sample size corresponding to each stratum of the ecological region, LPM is used to select the sample pixels. Grafström et al. [38,39] proposed LPM for spatially balanced sampling. During the experimental operation, the spatial position of the pixel is used as auxiliary information to achieve spatially balanced sampling. ...
Article
Full-text available
In recent years, the availability of multi-temporal global land-cover datasets has meant that they have become a key data source for evaluating land cover in many applications. Due to the high data volume of the multi-temporal land-cover datasets, probability sampling is an efficient method for validating multi-temporal global urban land-cover maps. However, the current accuracy assessment methods often work for a single-epoch dataset, and they are not suitable for multi-temporal data products. Limitations such as repeated sampling and inappropriate sample allocation can lead to inaccurate evaluation results. In this study, we propose the use of spatio-temporal stratified sampling to assess thematic mappings with respect to the temporal changes and spatial clustering. The total number of samples in the two stages, i.e., map and pixel, was obtained by using a probability sampling model. Since the proportion of the area labeled as no change is large while that of the area labeled as change is small, an optimization algorithm for determining the sample sizes of the different strata is proposed by minimizing the sum of variance of the user’s accuracy, producer’s accuracy, and proportion of area for all strata. The experimental results show that the allocation of sample size by the proposed method results in the smallest bias in the estimated accuracy, compared with the conventional sample allocation, i.e., equal allocation and proportional allocation. The proposed method was applied to multi-temporal global urban land-cover maps from 2000 to 2010, with a time interval of 5 years. Due to the spatial aggregation characteristics, the local pivotal method (LPM) is adopted to realize spatially balanced sampling, leading to more representative samples for each stratum in the spatial domain. The main contribution of our research is the proposed spatio-temporal sampling approach and the accuracy assessment conducted for the multi-temporal global urban land-cover product.
... Benedetti and Piersimoni (2017) proposed a flexible class of designs that draw SB samples based on a within-sample distance or similarity measure. Another group of methods apply a local repulsion strategy to ensure well-spread samples are drawn (Grafström 2011; Grafström et al. 2012; Grafström and Tillé 2013; Grafström and Matei 2018). SB designs based on the Halton sequence (Halton 1960) have been proposed (Robertson et al. 2013, 2017; van Dam-Bates et al. 2018; Robertson et al. 2018. ...
Article
Full-text available
A spatial sampling design determines where sample locations are placed in a study area so that population parameters can be estimated with good precision. Spatially balanced designs draw samples with good spatial spread and provide precise results for commonly used estimators when surveying natural resources. In this article, we propose a new sampling strategy that incorporates ranking information from nearby locations into a spatially balanced sample. If the population exhibits spatial trends, our simple local ranking strategy can improve the precision of commonly used estimators. Numerical results on several test populations with different spatial structures show that local ranking can improve the performance of a spatially balanced design. To show that local ranking is simple and effective in practice, we provide an example application for the health and productivity assessment of a Shiraz vineyard in South Australia. Supplementary materials accompanying this paper appear online.
... The first spatially balanced sampling algorithm to see widespread use was the generalized random tessellation stratified (GRTS) algorithm (Stevens and Olsen, 2004). After the GRTS algorithm was developed, several other spatially balanced sampling algorithms emerged, including stratified sampling with compact geographical strata (Walvoort et al., 2010), the local pivotal method (Grafström and Matei, 2018), spatially correlated Poisson sampling (Grafström, 2012), balanced acceptance sampling (Robertson et al., 2013), within-sample-distance sampling , and Halton iterative partitioning sampling (Robertson et al., 2018). In this manuscript, we select spatially balanced samples using the GRTS algorithm because it is readily available in the spsurvey R package (Dumelle et al., 2022b) and naturally accommodates finite and infinite sampling frames, unequal inclusion probabilities, and replacement units. ...
Article
Full-text available
The design‐based and model‐based approaches to frequentist statistical inference rest on fundamentally different foundations. In the design‐based approach, inference relies on random sampling. In the model‐based approach, inference relies on distributional assumptions. We compare the approaches in a finite population spatial context. We provide relevant background for the design‐based and model‐based approaches and then study their performance using simulated data and real data. The real data are from the United States Environmental Protection Agency's 2012 National Lakes Assessment. A variety of sample sizes, location layouts, dependence structures, and response types are considered. The population mean is the parameter of interest, and performance is measured using statistics like bias, squared error and interval coverage. When studying the simulated and real data, we found that regardless of the strength of spatial dependence in the data, the generalized random tessellation stratified (GRTS) algorithm, which explicitly incorporates spatial locations into sampling, tends to outperform the simple random sampling (SRS) algorithm, which does not explicitly incorporate spatial locations into sampling. We also found that model‐based inference tends to outperform design‐based inference, even for skewed data where the model‐based distributional assumptions are violated. The performance gap between design‐based inference and model‐based inference is small when GRTS samples are used but large when SRS samples are used, suggesting that the sampling choice (whether to use GRTS or SRS) is most important when performing design‐based inference. There are many benefits and drawbacks to the design‐based and model‐based approaches for finite population spatial sampling and inference that practitioners must consider when choosing between them. 
We provide relevant background contextualizing each approach and study their properties in a variety of scenarios, making recommendations for use based on the practitioner's goals.
... Power analyses can, for now, be calculated only for simple random sampling designs and so are not very relevant when using a more advanced sampling design (i.e., a spatially balanced sampling design). Spatially balanced sampling designs have proved to be more efficient than simple random sampling (Robertson et al., 2013; Brown et al., 2015; Grafström and Matei, 2018; Kermorvant et al., 2019b) and so need fewer samples to detect the same level of change between two survey seasons. We believe that overestimation of sample size is probable in such cases. ...
Article
Problems in defining sampling procedures have led the results of numerous studies to be biased and controversial. Choosing a relevant sampling design and number of samples is a difficult task when setting up or optimizing a survey. The choice of survey design is very important for avoiding bias and increasing the survey's cost-efficiency. It can have a strong effect on the sample size needed to achieve a targeted accuracy and on the final cost of the procedure. The sequential process we present here combines design-based and model-based sampling theories. Its objective is to help practitioners define a sampling design and a number of samples for their survey when inference to the whole population is wanted. The main idea is to mathematically reconstruct the distribution of the surveyed population, then assess and compare the cost-effectiveness of various sampling designs on this population. This process allows predetermined level(s) of accuracy to be set for the targeted estimates and previous relevant data to be taken into account. The results are an optimal sampling design and an associated optimal sample size for a desired accuracy. This accuracy is thus achieved without excess sampling. A strength of this process is that it is based on simulations, which allows a large number of combinations of sampling design, sample size and desired level of accuracy to be tried. Sampling design performances can thus be compared. The user can decide which combination is best for their survey and apply it in practice. We discuss how to use available data to improve the survey, from the case where several historical data sets are provided to the case where no data are available.
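The simulation step described in this abstract might look roughly like the following Python sketch: sample repeatedly from a reconstructed population at several candidate sample sizes and keep the smallest size that meets a target accuracy. Everything here is an assumption made for illustration, including the gamma-distributed stand-in population, simple random sampling as the only design, relative root mean squared error as the accuracy measure, and the function name `smallest_sample_size`.

```python
import numpy as np

def smallest_sample_size(population, sizes, target_rel_rmse, reps, rng):
    """Return the smallest candidate size whose simulated relative RMSE of the
    sample mean is at or below the target, or None if no candidate qualifies."""
    true_mean = population.mean()
    for n in sorted(sizes):
        # Replicate simple random sampling without replacement.
        estimates = np.array([
            rng.choice(population, size=n, replace=False).mean()
            for _ in range(reps)
        ])
        rel_rmse = np.sqrt(np.mean((estimates - true_mean) ** 2)) / true_mean
        if rel_rmse <= target_rel_rmse:
            return n
    return None

rng = np.random.default_rng(0)
# Hypothetical reconstructed population; a real study would fit this to prior data.
population = rng.gamma(shape=2.0, scale=5.0, size=2000)
sizes = [10, 25, 50, 100, 200, 400]

chosen = smallest_sample_size(population, sizes, target_rel_rmse=0.05, reps=200, rng=rng)
print(chosen)
```

To compare designs, the inner sampling call could be swapped for another selection method and the loop run once per design, which would give one accuracy curve per design over the same candidate sizes.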
... Different from BAS (HIP), the selections are based on distances. By applying LPM and SCPS, we can spread the samples in the auxiliary variables even if they do not constitute dimensions of the population (Grafström & Matei, 2018a). ...
Article
Full-text available
A new sampling strategy for design‐based environmental monitoring is proposed. It has the potential to produce superior estimators for totals of environmental variables and their changes over time. In the strategy, we combine two concepts known as spatially balanced sampling and coordination of samples. Spatially balanced sampling can provide superior estimators of totals, while coordination of samples over time is often used to improve estimators of change. Compared with reference strategies, we show that the new strategy can improve the precision of the estimators of totals and their change simultaneously. A forest inventory application is used to illustrate the new strategy and the results can be summarized as (i) using auxiliary information to spread the sample can improve the estimators of totals; (ii) a positive coordination of the samples reduced the variance of the estimator of change by more than 37% compared with independent samples; and (iii) a high overlap between successive samples does not guarantee a good estimator of change. The presented strategy can be used to develop more efficient environmental monitoring programs.
... There have been many studies showing the effectiveness of spatially balanced sampling with this estimator on a variety of populations with different spatial structures (cf. Stevens and Olsen 2004; Grafström et al. 2012; Grafström and Lundström 2013; Robertson et al. 2013, 2018; Grafström & Matei 2018). If the spatial trend is weak or if there is no trend at all, there is no statistical advantage in choosing spatially balanced designs over other probabilistic designs (Robertson et al. 2013). ...
Article
Full-text available
Some environmental studies use non-probabilistic sampling designs to draw samples from spatially distributed populations. Unfortunately, these samples can be difficult to analyse statistically and can give biased estimates of population characteristics. Spatially balanced sampling designs are probabilistic designs that spread the sampling effort evenly over the resource. These designs are particularly useful for environmental sampling because they produce good sample coverage over the resource, have precise design-based estimators and they can potentially reduce the sampling cost. The most popular spatially balanced design is generalized random tessellation stratified (GRTS), which has many desirable features including a spatially balanced sample, design-based estimators and the ability to select spatially balanced oversamples. This article considers the popularity of spatially balanced sampling, reviews several spatially balanced sampling designs and shows how these designs can be implemented in the statistical programming language R. We hope to increase the visibility of spatially balanced sampling and encourage environmental scientists to use these designs.
... This was compensated for by adding an auxiliary variable that describes the amount of within-cluster variation, i.e., the variance of total growing stock volume, but the variation could also originate from the distribution of land use or tree species composition as well as from other site conditions such as altitude, which were not included as auxiliaries in our study. Instead of mean values, different kinds of metrics to describe the distances between the clusters in the multi-dimensional auxiliary space or the variation and distribution of auxiliary variables within the clusters, as well as variance estimators (Grafström and Schelin 2014; Grafström et al. 2017a; Grafström and Matei 2018), could also be studied further. ...
Article
Key message: Using spatially balanced sampling utilizing auxiliary information in the design phase can enhance the design efficiency of national forest inventory. These gains decreased with increasing proportion of permanent plots in the sample. Using semi-permanent plots, changing every nth inventory round, instead of permanent plots, reduced this phenomenon. Further studies for accounting the permanent sample when selecting temporary sample are needed. Context: National forest inventories (NFIs) produce national- and regional-level statistics for sustainability assessment and decision-making. Using an interpreted satellite image as auxiliary information in the design phase improved the relative efficiency (RE). Spatially balanced sampling through the local pivotal method (LPM), used for selection of clusters of sample plots, is designed for temporary samples; thus, the method was tested in an NFI design with both permanent and temporary clusters. Aims: We evaluated the LPM and stratified sampling for an NFI designed for successive occasions, where the clusters are permanent, semi-permanent, or temporary, being replaced never, every nth, and every inventory round, respectively. Methods: REs of sampling designs against systematic sampling were studied with simulations of inventory sampling. Results: The larger the proportion of permanent clusters, the smaller the benefits gained with LPM. REs of stratified sampling did not depend on the proportion of permanent clusters. The semi-permanent sampling with LPM removed the previously described decrease and resulted in the largest REs. Conclusion: Sampling strategies with semi-permanent clusters were the most efficient, yet not necessarily optimal for all inventory variables. Further development of a method to simultaneously take into account the distribution of the permanent sample when selecting the temporary or semi-temporary sample is desired since it could increase the design efficiency.
Article
The French National Forest Inventory (NFI) employs a two-stage two-phase sampling scheme summarized by the following key steps: first, the territory is divided into a spatial grid, and cells are randomly selected from this grid. Within the selected cells, additional random sampling of points is conducted. Subsequently, classification of the selected points is performed using auxiliary information from photo-interpretation. This information is used to draw a sub-sample that leads to field measurements. We evaluate the efficiency of the French NFI’s sampling design when the Horvitz–Thompson and post-stratified estimators for the total are used in the first and second phases, respectively. Given the complexity of the French NFI’s sampling design, a new theoretical framework is introduced for two-stage two-phase sampling schemes to facilitate design-based inference, combining inference methods for both finite and continuous populations. Horvitz–Thompson type estimators for the total and post-stratified estimators are proposed alongside variance estimators. Their performances are assessed through a simulation study, comparing the French NFI’s sampling design using alternative methods. The results indicate that the strategy formed by the French NFI’s sampling design and proposed estimators may be effective in practice. The proposed framework is general and can be applied to other forest and environmental surveys.
Article
In environmental and ecological surveys, well spread samples can be easily obtained via widely adopted tessellation schemes, which yield equal first‐order inclusion probabilities in the case of finite populations of areas or constant inclusion density functions in the case of continuous populations. In the literature, many alternative schemes that are explicitly tailored to select well spread samples have been proposed, but owing to their complexity, their use should be preferred only if they allow us to achieve a valuable gain in precision with respect to the tessellation schemes. Therefore, by means of an extensive simulation study, the performances of tessellation schemes and several specifically tailored schemes are compared under constant first‐order inclusion probabilities or constant inclusion density functions.
Article
A spatial sampling design determines where sample locations are placed in a study area so that population parameters can be estimated with relatively high precision. If the response variable has spatial trends, spatially balanced or well-spread designs give precise results for commonly used estimators. This article proposes a new method that draws well-spread samples over arbitrary auxiliary spaces and can be used for master sampling applications. All we require is a measure of the distance between population units. Numerical results show that the method generates well-spread samples and compares favorably with existing designs. We provide an example application using several auxiliary variables to estimate total aboveground biomass over a large study area in Eastern Amazonia, Brazil. Multipurpose surveys are also considered, where the totals of aboveground biomass, primary production, and clay content (3 responses) are estimated from a single well-spread sample over the auxiliary space.
Article
When the sampled population belongs to a metric space, the selection of neighboring units will often imply similarities in the collected data due to their geographical proximity. In order to estimate parameters such as means or totals, it is therefore more efficient to select samples that are well distributed in space. Often, the interest lies not only in estimating a parameter at one point in time, but rather in estimating it at several points and studying its evolution. Because of the temporal autocorrelation of successive values from the same unit, a system of temporal rotation of the units in the samples must be provided. In other words, this type of problem forces us to consider two types of autocorrelation: spatial and temporal. In this article, we propose two new spatiotemporal sampling methods for equal or unequal inclusion probabilities. Systematic sampling is used to promote a rotation of the selection of the same unit over time, and thus address temporal spread. Both methods select samples that are well distributed in space at each sampling time. They differ in that these samples are of random size for the first method, while for the second, more complex method, their sizes are controlled. Thus, the first method is called spatiotemporal sampling with random sample sizes (SPAR) and the second, spatiotemporal sampling with fixed sample sizes (SPAF). Simulations show that our methods outperform and generalize existing methods.
Article
A new sampling strategy for forest inventories is presented. The most important difference from the traditional sampling strategies is that auxiliary variables from remote sensing are incorporated into the sampling design. The sample is selected to match population distributions of the auxiliary variables as well as possible. This is achieved by a double sampling approach, where auxiliary variables are extracted for a large first-phase sample. The second selection is done by the local pivotal method and produces an even thinning of the first-phase sample. Thus, we make sure that the selected second-phase sample becomes much more representative of the population than what is possible by the use of traditional designs. The potential of implementing the new strategy for the temporary clusters within the Swedish national forest inventory is evaluated with five auxiliary variables: the geographical coordinates, elevation, predicted tree height, and predicted basal area. The increased representativity that we achieve with the new strategy induces up to 95% reduction of the variance of the sample means of the remote sensing auxiliary variables compared with traditional designs. For this reason, we conclude that the new strategy that will be implemented in the forthcoming Swedish national forest inventory has a great potential to achieve large improvements in estimation of many important forest attributes.
Article
When sampling from a finite population there is often auxiliary information available on unit level. Such information can be used to improve the estimation of the target parameter. We show that probability samples that are well spread in the auxiliary space are balanced, or approximately balanced, on the auxiliary variables. A consequence of this balancing effect is that the Horvitz–Thompson estimator will be a very good estimator for any target variable that can be well approximated by a Lipschitz continuous function of the auxiliary variables. Hence we give a theoretical motivation for use of well spread probability samples. Our conclusions imply that well spread samples, combined with the Horvitz–Thompson estimator, form a good strategy in a variety of situations.
Article
A very general class of sampling methods without replacement and with unequal probabilities is proposed. It consists of splitting the inclusion probability vector into several new inclusion probability vectors. One of these vectors is chosen randomly; thus, the initial problem is reduced to another sampling problem with unequal probabilities. This splitting is then repeated on these new vectors of inclusion probabilities; at each step, the sampling problem is reduced to a simpler problem. The simplicity of this technique allows one to generate easily new sampling procedures with unequal probabilities. The splitting method also generalises well-known methods such as the Midzuno method, the elimination procedure and the Chao procedure. Next, a sufficient condition is given in order that a splitting method satisfies the Sen-Yates-Grundy condition. Finally, it is shown that the elimination procedure satisfies the Gabler sufficient condition.
Article
The spatial distribution of a natural resource is an important consideration in designing an efficient survey or monitoring program for the resource. We review a unified strategy for designing probability samples of discrete, finite resource populations, such as lakes within some geographical region; linear populations, such as a stream network in a drainage basin, and continuous, two-dimensional populations, such as forests. The strategy can be viewed as a generalization of spatial stratification. In this article, we develop a local neighborhood variance estimator based on that perspective, and examine its behavior via simulation. The simulations indicate that the local neighborhood estimator is unbiased and stable. The Horvitz–Thompson variance estimator based on assuming independent random sampling (IRS) may be two times the magnitude of the local neighborhood estimate. An example using data from a generalized random-tessellation stratified design on the Oahe Reservoir resulted in local variance estimates being 22 to 58 percent smaller than Horvitz–Thompson IRS variance estimates. Variables with stronger spatial patterns had greater reductions in variance, as expected. Copyright © 2003 John Wiley & Sons, Ltd.
Article
A simple method to select a spatially balanced sample using equal or unequal inclusion probabilities is presented. For populations with spatial trends in the variables of interest, the estimation can be much improved by selecting samples that are well spread over the population. The method can be used for any number of dimensions and can hence also select spatially balanced samples in a space spanned by several auxiliary variables. Analysis and examples indicate that the suggested method achieves a high degree of spatial balance and is therefore efficient for populations with trends.
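As an illustration only, and assuming the method meant here is the local pivotal method named in other entries on this page, the following Python sketch shows one way a pivotal update between nearest neighbours can be written. The function name `local_pivotal_sample`, the tolerance `eps`, and the brute-force neighbour search are hypothetical choices made for this sketch; it is not the code of any published package.

```python
import random


def local_pivotal_sample(coords, probs, seed=None, eps=1e-9):
    """Sketch of a local pivotal selection (hypothetical helper, not a package API).

    coords: list of coordinate tuples, one per unit.
    probs:  target inclusion probabilities, one per unit.
    Returns the indices of the selected units.
    """
    rng = random.Random(seed)
    p = list(probs)
    # Units whose inclusion is not yet decided (probability strictly between 0 and 1).
    undecided = [i for i, v in enumerate(p) if eps < v < 1 - eps]
    while len(undecided) > 1:
        i = rng.choice(undecided)
        # Nearest undecided neighbour of i (squared Euclidean distance, brute force).
        j = min(
            (k for k in undecided if k != i),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(coords[i], coords[k])),
        )
        s = p[i] + p[j]
        if s < 1:
            # One unit is rejected and the other keeps the combined probability.
            if rng.random() < p[j] / s:
                p[i], p[j] = 0.0, s
            else:
                p[i], p[j] = s, 0.0
        else:
            # One unit is selected and the other keeps the remainder.
            if rng.random() < (1 - p[j]) / (2 - s):
                p[i], p[j] = 1.0, s - 1
            else:
                p[i], p[j] = s - 1, 1.0
        undecided = [k for k in undecided if eps < p[k] < 1 - eps]
    if undecided:
        # Leftover unit when the probabilities do not sum to an integer.
        k = undecided[0]
        p[k] = 1.0 if rng.random() < p[k] else 0.0
    return [i for i, v in enumerate(p) if v > 1 - eps]


# Usage: 4 units out of a 4 x 4 grid with equal inclusion probabilities.
grid = [(x, y) for x in range(4) for y in range(4)]
sample = local_pivotal_sample(grid, [0.25] * 16, seed=3)
```

Because each update decides the fate of at least one of two neighbouring units, nearby units are unlikely to be selected together, which is what spreads the sample. Each update also preserves the expected value of both probabilities, so the target inclusion probabilities are respected.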
Article
We give a formal definition of a representative sample, but roughly speaking, it is a scaled-down version of the population, capturing its characteristics. New methods for selecting representative probability samples in the presence of auxiliary variables are introduced. Representative samples are needed for multipurpose surveys, when several target variables are of interest. Such samples also enable estimation of parameters in subspaces and improved estimation of target variable distributions. We describe how two recently proposed sampling designs can be used to produce representative samples. Both designs use distance between population units when producing a sample. We propose a distance function that can calculate distances between units in general auxiliary spaces. We also propose a variance estimator for the commonly used Horvitz–Thompson estimator. Real data as well as illustrative examples show that representative samples are obtained and that the variance of the Horvitz–Thompson estimator is reduced compared with simple random sampling.
Article
A new method for sampling from a finite population that is spread in one, two or more dimensions is presented. Weights are used to create strong negative correlations between the inclusion indicators of nearby units. The method can be used to produce unequal probability samples that are well spread over the population in every dimension, without any spatial stratification. Since the method is very general there are numerous possible applications, especially in sampling of natural resources where spatially balanced sampling has proven to be efficient. Two examples show that the method gives better estimates than other commonly used designs.
Article
To design an efficient survey or monitoring program for a natural resource it is important to consider the spatial distribution of the resource. Generally, sample designs that are spatially balanced are more efficient than designs which are not. A spatially balanced design selects a sample that is evenly distributed over the extent of the resource. In this article we present a new spatially balanced design that can be used to select a sample from discrete and continuous populations in multi-dimensional space. The design, which we call balanced acceptance sampling, utilizes the Halton sequence to assure spatial diversity of selected locations. Targeted inclusion probabilities are achieved by acceptance sampling. The BAS design is conceptually simpler than competing spatially balanced designs, executes faster, and achieves better spatial balance as measured by a number of quantities. The algorithm has been programmed in an R package freely available for download.
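The combination of a Halton sequence with acceptance sampling described above can be sketched roughly as follows. This is a simplified illustration for equal inclusion probabilities in the unit square, not the implementation in the R package the abstract mentions; the names `halton`, `bas_sample`, `inside` and `max_tries`, and the way the random start is drawn, are assumptions made for the example.

```python
import random


def halton(index, base):
    """Radical inverse of a non-negative integer index in the given base."""
    result, f = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * f
        f /= base
    return result


def bas_sample(n, inside, seed=None, max_tries=100_000):
    """Accept random-start Halton points in the unit square that fall inside a region.

    inside: function taking an (x, y) tuple and returning True if the point
            belongs to the study region (hypothetical interface for this sketch).
    """
    rng = random.Random(seed)
    # Independent random starting indices for the two coordinate sequences.
    starts = (rng.randrange(10**6), rng.randrange(10**6))
    points = []
    for k in range(max_tries):
        point = (halton(starts[0] + k, 2), halton(starts[1] + k, 3))
        if inside(point):  # acceptance step
            points.append(point)
            if len(points) == n:
                return points
    raise RuntimeError("region too small: not enough points accepted")


# Usage: 20 points inside a disc of radius 0.5 centred in the unit square.
disc = lambda p: (p[0] - 0.5) ** 2 + (p[1] - 0.5) ** 2 <= 0.25
sample = bas_sample(20, disc, seed=1)
```

Consecutive Halton points tend to fall far apart from one another, so the accepted points cover the region evenly. Unequal inclusion probabilities would need an extra acceptance dimension, which this sketch leaves out.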
Article
This paper presents a general technique for the treatment of samples drawn without replacement from finite universes when unequal selection probabilities are used. Two sampling schemes are discussed in connection with the problem of determining optimum selection probabilities according to the information available in a supplementary variable. Admittedly, these two schemes have limited application. They should prove useful, however, for the first stage of sampling with multi-stage designs, since both permit unbiased estimation of the sampling variance without resorting to additional assumptions.
* Journal Paper No. J2139 of the Iowa Agricultural Experiment Station, Ames, Iowa, Project 1005. Presented to the Institute of Mathematical Statistics, March 17, 1951.
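The estimator of a population total usually associated with this setting, referred to elsewhere on this page as the Horvitz–Thompson estimator, weights each sampled value by the inverse of its inclusion probability. A minimal sketch, with a hypothetical function name and plain Python lists as assumed inputs:

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total from a sample drawn with unequal probabilities.

    values:          observed values y_i for the sampled units.
    inclusion_probs: first-order inclusion probabilities pi_i of those units.
    """
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have the same length")
    total = 0.0
    for y, pi in zip(values, inclusion_probs):
        if not 0 < pi <= 1:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        total += y / pi  # each unit stands in for 1 / pi units of the population
    return total


# Usage: two sampled units with inclusion probabilities 0.5 and 0.25.
estimate = horvitz_thompson_total([10.0, 20.0], [0.5, 0.25])  # 20 + 80 = 100.0
```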
Chapter
The Koksma–Hlawka inequality is a tight error bound on the approximation of an integral by the sample average of integrand values. The integration error is bounded by a product of two terms, the discrepancy of the sample points and the variation of the integrand, V(g). These two quantities measure the quality of the sample points and the roughness of the integrand, respectively. The Koksma–Hlawka inequality plays a key role in the development of quasi–Monte Carlo methods. Such methods replace simple random sample points by low discrepancy points. The Koksma–Hlawka inequality has also influenced the study of experimental design and led to the creation of uniform designs.
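In its classical form, the bound described above can be written as follows. The unit-cube domain, the point set P = {x_1, ..., x_n}, and the symbol D(P) for the discrepancy are notational assumptions made here; the chapter itself may use a more general formulation.

```latex
\left| \int_{[0,1]^d} g(x)\,\mathrm{d}x \;-\; \frac{1}{n}\sum_{i=1}^{n} g(x_i) \right|
\;\le\; D(P)\, V(g)
```

The left-hand side is the integration error of the sample average. On the right, D(P) depends only on the sample points and V(g) only on the integrand, which is why lowering the discrepancy of the points tightens the bound for every integrand of bounded variation.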
Article
When a continuous population is sampled, the spatial mean is often the target parameter if the design-based approach is assumed. In this case, auxiliary information may be suitably used to increase the accuracy of the spatial mean estimators. To this end, regression models are usually considered at the estimation stage in order to implement regression estimators. Since the spatial mean may be obviously represented as a bivariate integral, the strategies for placing the sampling locations are actually Monte Carlo integration methods. Hence, the regression-based estimation is equivalent to the control-variate integration method. In this setting, we suggest more refined Monte Carlo integration strategies which may drastically increase the regression estimator accuracy. Copyright © 2005 John Wiley & Sons, Ltd.
Article
A theory of estimation for sampling from a continuous universe is developed. The results are similar in form to the Horvitz–Thompson theorem from finite population sampling.
Article
The spatial distribution of a natural resource is an important consideration in designing an efficient survey or monitoring program for the resource. Generally, sample sites that are spatially balanced, that is, more or less evenly dispersed over the extent of the resource, are more efficient than simple random sampling. We review a unified strategy for selecting spatially balanced probability samples of natural resources. The technique is based on creating a function that maps two-dimensional space into one-dimensional space, thereby defining an ordered spatial address. We use a restricted randomization to randomly order the addresses, so that systematic sampling along the randomly ordered linear structure results in a spatially well-balanced random sample. Variable inclusion probability, proportional to an arbitrary positive ancillary variable, is easily accommodated. The basic technique selects points in a two-dimensional continuum, but is also applicable to sampling finite populations or one-dimensional continua embedded in two-dimensional space. An extension of the basic technique gives a way to order the sample points so that any set of consecutively numbered points is in itself a spatially well-balanced sample. This latter property is extremely useful in adjusting the sample for the frame imperfections common in environmental sampling.
McDonald, T. (2016). SDraw: Spatially balanced sample draws for spatial objects. R package version 2.1.3.
Grafström, A. & Lisic, J. (2016). BalancedSampling: Balanced and spatially balanced sampling. R package version 1.5.2.