ABSTRACT: The block correlation is the correlation between the block kriging prediction of a variable and the true spatial mean which it estimates, computed for a particular sampling configuration and block size over the stochastic model which underlies the kriging prediction. This correlation can be computed if the variogram and disposition of sample points are known. It is also possible to compute the concordance correlation, a modified correlation which measures the extent to which the block kriging prediction and true block spatial mean conform to the 1:1 line, and so is sensitive to the tendency of the kriging predictor to over-smooth. It is proposed that block concordance correlation has two particular advantages over kriging variance for communicating uncertainty in predicted values. First, as a measure on a bounded scale it is more intuitively understood by the non-specialist data user, particularly one who is interested in a synoptic overview of soil variation across a region. Second, because it accounts for the variability of the spatial means and their kriged estimates, as well as the uncertainty of the latter, it can be more readily compared between blocks of different sizes than can a kriging variance.
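The concordance correlation described in the abstract is, in its classical sample form, Lin's concordance correlation coefficient. The minimal sketch below computes it from paired values; the function name and data are illustrative, not from the paper, and the abstract's version is computed over a stochastic model rather than a sample.

```python
# Sketch: Lin's concordance correlation coefficient (CCC), the sample
# analogue of the block concordance correlation in the abstract.
def concordance_correlation(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sx = sum((v - mx) ** 2 for v in x) / n
    sy = sum((v - my) ** 2 for v in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sx + sy + (mx - my) ** 2)

# An over-smoothed predictor keeps the ranking but compresses the range,
# so its CCC falls below 1 even though its Pearson correlation is 1:
truth = [1.0, 2.0, 3.0, 4.0, 5.0]
smoothed = [2.0, 2.5, 3.0, 3.5, 4.0]
print(concordance_correlation(truth, truth))     # 1.0
print(concordance_correlation(truth, smoothed))  # 0.8
```

This illustrates why the abstract calls the measure sensitive to over-smoothing: departure from the 1:1 line is penalized even when the linear association is perfect.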
ABSTRACT: Seabed sediment texture can be mapped by geostatistical prediction from limited direct observations such as grab-samples. A geostatistical model can provide local estimates of the probability of each texture class so the most probable sediment class can be identified at any unsampled location, and the uncertainty of this prediction can be quantified. In this paper we show, in a case study off the northeast coast of England, how swath bathymetry and backscatter can be incorporated into a geostatistical linear mixed model (LMM) as fixed effects (covariates).
Parameters of the LMM were estimated by maximum likelihood, which allowed us to show that both covariates provided useful information. In a cross-validation, each observation was predicted from the rest using the LMMs with (i) no covariates, or (ii) bathymetry and backscatter as covariates. The proportion of cases in which the most probable class according to the prediction corresponded to the observed class was increased (from 58% to 65% of cases) by including the covariates, which also increased the information content of the predictions, measured by the entropy of the class probabilities. A qualitative assessment of the geostatistical results shows that the model correctly predicts, for example, the occurrence of coarser sediment over discrete glacial sediment landforms, and muddier sediment in relatively quiescent, localized deep-water environments. This demonstrates the potential for assimilating geophysical data with direct observations by the LMM, and could offer a basis for a routine mapping procedure which incorporates these and other ancillary information, such as manually interpreted geological and geomorphological maps.
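The information content of a set of class probabilities, as used above to compare the two models, is conventionally measured by Shannon entropy. A minimal sketch, with illustrative probabilities rather than the paper's values:

```python
import math

# Sketch: Shannon entropy of predicted class probabilities. Lower
# entropy means the prediction is more concentrated on one texture
# class, i.e. more informative.
def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

# A model with useful covariates sharpens the class probabilities:
without_covariates = [0.4, 0.3, 0.3]
with_covariates = [0.8, 0.1, 0.1]
print(entropy(without_covariates) > entropy(with_covariates))  # True
```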
ABSTRACT: The loss function expresses the costs to an organization that result from decisions made using erroneous information. In closely constrained circumstances, such as remediation of soil on contaminated land prior to development, it has proved possible to compute loss functions and to use these to guide rational decision making on the amount of resource to spend on sampling to collect soil information. In many circumstances it may not be possible to define loss functions prior to decision making on soil sampling. This may be the case when multiple decisions may be based on the soil information and the costs of errors are hard to predict. We propose the implicit loss function as a tool to aid decision making in these circumstances. Conditional on a logistical model which expresses costs of soil sampling as a function of effort, and statistical information from which the error of estimates can be modelled as a function of effort, the implicit loss function is the loss function which makes a particular decision on effort rational. After defining the implicit loss function we compute it for a number of arbitrary decisions on sampling effort for a hypothetical soil monitoring problem. This is based on a logistical model of sampling cost parameterized from a recent survey of soil in County Donegal, Ireland and on statistical parameters estimated with the aid of a process model for change in soil organic carbon. We show how the implicit loss function might provide a basis for reflection on a particular choice of sampling regime, specifically the simple random sample size, by comparing it with the values attributed to soil properties and functions. In a recent study rules were agreed to deal with uncertainty in soil carbon stocks for purposes of carbon trading by treating a percentile of the estimation distribution as the estimated value.
We show that this is equivalent to setting a parameter of the implicit loss function, its asymmetry. We then discuss scope for further research to develop and apply the implicit loss function to help decision making by policy makers and regulators.
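The equivalence claimed above — that reporting a percentile of the estimation distribution amounts to choosing the asymmetry of a loss function — can be illustrated with the asymmetric linear ("pinball") loss, whose minimizer is a quantile. A minimal sketch with an illustrative stand-in distribution; the paper's implicit loss function is more general than this:

```python
# Sketch: minimizing an asymmetric linear loss over an estimation
# distribution selects the quantile set by the asymmetry parameter tau.
def asymmetric_loss(estimate, outcomes, tau):
    # tau in (0, 1) weights under-estimation against over-estimation
    return sum(tau * (y - estimate) if y >= estimate
               else (1 - tau) * (estimate - y) for y in outcomes)

outcomes = list(range(1, 101))   # stand-in estimation distribution
tau = 0.25                       # asymmetry parameter
best = min(outcomes, key=lambda e: asymmetric_loss(e, outcomes, tau))
print(best)  # 25, the 25th percentile of the outcomes
```

Choosing a conservative (lower) percentile for traded carbon stocks thus corresponds to a loss function that penalizes over-estimation more heavily than under-estimation.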
ABSTRACT: It is generally accepted that geological line work, such as mapped boundaries, is uncertain for various reasons. It is difficult to quantify this uncertainty directly, because the investigation of error in a boundary at a single location may be costly and time consuming, and many such observations are needed to estimate an uncertainty model with confidence. However, it is recognized across many disciplines that experts generally have a tacit model of the uncertainty of information that they produce (interpretations, diagnoses, etc.) and formal methods exist to extract this model in usable form by elicitation. In this paper we report a trial in which uncertainty models for geological boundaries mapped by geologists of the British Geological Survey (BGS) in six geological scenarios were elicited from a group of five experienced BGS geologists. In five cases a consensus distribution was obtained, which reflected both the initial individually elicited distribution and a structured process of group discussion in which individuals revised their opinions. In a sixth case a consensus was not reached. This concerned a boundary between superficial deposits where the geometry of the contact is hard to visualize. The trial showed that the geologists' tacit model of uncertainty in mapped boundaries reflects factors in addition to the cartographic error usually treated by buffering line work or in written guidance on its application. It suggests that further application of elicitation, to scenarios at an appropriate level of generalization, could be useful to provide working error models for the application and interpretation of line work.
ABSTRACT: Laboratory-based aggregate stability (AS) tests should be applied to material wetted to a moisture content comparable with that of a field soil. We have improved our original laser granulometer (LG)-based AS test published in this journal by including a pre-wetting stage. Our method estimates disaggregation reduction (DR; µm) for a soil sample (1–2-mm diameter aggregates). Soils with more stable aggregates have larger DR values. We apply the new technique to soils from 60 cultivated sites across eastern England, with ten samples from each of six different parent material (PM) types encompassing a wide range of soil organic carbon (SOC) concentrations (1.2–7.0%). There are large differences between the median DR values (rescaled to < 500 µm) for soils over the PM types, which when used as a predictor (in combination with SOC concentration) accounted for 53% of the variation in DR. There was no evidence for including an interaction term between PM class and SOC concentration for the prediction of DR. After applying the aggregate stability tests with the 60 regional soil samples, they were stored for 9 months and the tests were repeated, resulting in a small but statistically significant increase in DR for samples from some, but not all, PM types. We show how a palaeosol excavated from a site in southern England can be used as an aggregate reference material (RM) to monitor the reproducibility of our technique. It has been suggested that soil quality, measured by critical soil physical properties, may decline if the organic carbon concentration is less than a critical threshold. Our results show that, for aggregate stability, any such thresholds are specific to the PM.
European Journal of Soil Science 04/2015; 66(3). DOI:10.1111/ejss.12250
ABSTRACT: We conducted a designed experiment to quantify sources of uncertainty in geologists' interpretations of a geological cross section. A group of 28 geologists participated in the experiment. Each interpreted borehole record included up to three Palaeogene bedrock units, including the target unit for the experiment: the London Clay. The set of boreholes was divided into batches from which validation boreholes had been withheld; as a result, we obtained 129 point comparisons between the interpreted elevation of the base of the London Clay and its observed elevation in a borehole not used for that particular interpretation. Analysis of the results showed good general agreement between the observed and interpreted elevations, with no evidence of systematic bias. Between-site variation of the interpretation error was spatially correlated, and the variance appeared to be stationary. The between-geologist component of variance was smaller overall, and depended on the distance to the nearest borehole. There was also evidence that the between-geologist variance depends on the degree of experience of the individual. We used the statistical model of interpretation error to compute confidence intervals for any one interpretation of the base of the London Clay on the cross section, and to provide uncertainty measures for decision support in a hypothetical route-planning process. The statistical model could also be used to quantify error propagation in a full 3-D geological model produced from interpreted cross sections.
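A confidence interval of the kind described above combines independent variance components additively before taking the square root. The sketch below shows the arithmetic; the component values, the interpreted elevation, and the variable names are illustrative, not the paper's estimates.

```python
import math

# Sketch: a 95% confidence interval for one interpreted elevation,
# from between-site and between-geologist variance components.
# All numbers below are hypothetical.
var_between_site = 4.0        # m^2, spatially correlated error
var_between_geologist = 1.0   # m^2, varies with borehole distance
total_sd = math.sqrt(var_between_site + var_between_geologist)

interpreted_elevation = -12.5  # m, hypothetical base of London Clay
half_width = 1.96 * total_sd
interval = (interpreted_elevation - half_width,
            interpreted_elevation + half_width)
print(tuple(round(v, 2) for v in interval))  # (-16.88, -8.12)
```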
ABSTRACT: Spatial predictions of soil properties are needed for various purposes. However, the costs associated with soil sampling and laboratory analysis are substantial. One way to improve efficiencies is to combine measurement of soil properties with collection of cheaper-to-measure ancillary data. There are two possible approaches. The first is the formation of classes from ancillary data. A second is the use of a simple predictive linear model of the target soil property on the ancillary variables. Here, results are presented and compared where proximally sensed gamma-ray (γ-ray) spectrometry and electromagnetic induction (EMI) data are used to predict the variation in topsoil properties (e.g. clay content and pH). In the first instance, the proximal data are numerically clustered using a fuzzy k-means (FKM) clustering algorithm, to identify contiguous classes. The resultant digital soil maps (i.e. k = 2–10 classes) are consistent with a soil series map generated using traditional soil profile description, classification and mapping methods at a highly variable site near the township of Shelford, Nottinghamshire, UK. In terms of prediction, the calculated expected value of mean squared prediction error (i.e. σ²p,C) indicated that values of k = 7 and 8 were ideal for predicting clay and pH. Secondly, a linear mixed model (LMM) is fitted in which the proximal data are fixed effects but the residuals are treated as a combination of a spatially correlated random effect and an independent and identically distributed error. In terms of prediction, the expected value of the mean squared prediction error from a regression (σ²p,R) suggested that the regression models were able to predict clay content better than FKM clustering. The reverse was true with respect to pH, however. We conclude that both methods have merit.
In the case of clustering, the approach is able to account for soil properties which have non-linear relationships with the ancillary data (i.e. pH), whereas the LMM approach is best when there is a strong linear relationship (i.e. clay).
Geoderma 11/2014; 232–234:69–80. DOI:10.1016/j.geoderma.2014.04.031
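The core step of the FKM clustering named in the abstract above is the fuzzy membership calculation: each observation receives a degree of membership in every class, inversely related to its distance from each class centre. A one-dimensional sketch with a fixed fuzziness exponent m = 2; the centroid values are illustrative, not from the paper:

```python
# Sketch: fuzzy-k-means membership of one point in k classes, the
# calculation iterated (with centroid updates) in FKM clustering.
def fuzzy_memberships(point, centroids, m=2.0):
    dists = [abs(point - c) for c in centroids]
    if 0.0 in dists:                       # point sits on a centroid
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    exp = 2.0 / (m - 1.0)
    inv = [(1.0 / d) ** exp for d in dists]
    total = sum(inv)
    return [v / total for v in inv]

centroids = [10.0, 20.0]        # hypothetical clay-content class centres
u = fuzzy_memberships(12.0, centroids)
print([round(v, 2) for v in u])  # [0.94, 0.06]
```

The memberships always sum to 1, and the soft assignment is what lets FKM accommodate properties that vary non-linearly with the ancillary data.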
ABSTRACT: Soil bulk density (BD) is measured during soil monitoring. Because it is spatially variable, an appropriate sampling protocol is required. This paper shows how information on short-range variability can be used to quantify uncertainty of estimates of mean BD and soil organic carbon on a volumetric basis (SOCv) at a sampling site with different sampling intensities. We report results from two contrasting study areas, with mineral soil and with peat. More sites should be investigated to develop robust protocols for national-scale monitoring, but these results illustrate the methodology. A 20 × 20-m² monitoring site was considered and sampling protocols were evaluated under geostatistical models of our two study areas. At sites with local soil variability comparable to our mineral soil, sampling at 16 points (4 × 4 square grid of interval 5 m) would achieve a root mean square error (RMSE) of the sample mean value of both BD and SOCv of less than 5% of the mean (topsoil and subsoil). Pedotransfer functions (PTFs) gave predictions of mean soil BD at a sample site, comparable to our study area on mineral soil, with similar precision to a single direct measurement of BD. On peat soils comparable to our second study area, the mean BD for the monitoring site at depth 0–50 cm would be estimated with an RMSE of less than 5% of the mean with a sample of 16 cores, but at greater depths this criterion cannot be achieved with 25 cores or fewer.
European Journal of Soil Science 10/2014; 65(6). DOI:10.1111/ejss.12178
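The role of sampling intensity in the abstract above can be sketched with the first-order result that the RMSE of a sample mean falls as 1/√n for uncorrelated samples; the paper's geostatistical models adjust this for spatial correlation, which the sketch ignores. The mean and standard deviation below are illustrative, not the study's values.

```python
import math

# Sketch: RMSE of the sample mean as a percentage of the mean, for
# uncorrelated samples. Hypothetical bulk-density parameters.
mean_bd = 1.3   # g/cm^3, hypothetical site mean
sd_bd = 0.2     # g/cm^3, hypothetical local standard deviation

for n in (4, 9, 16, 25):
    rmse = sd_bd / math.sqrt(n)
    print(n, round(100 * rmse / mean_bd, 1), "% of mean")
```

Under these illustrative numbers, 16 cores bring the RMSE below the 5%-of-mean criterion, while 4 cores do not.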
ABSTRACT: We briefly describe three methods of seabed characterization which are ‘fit for purpose’, in that each approach is well suited to distinct objectives, e.g. characterizing glacial geomorphology and shallow glacial geology vs. rapid prediction of seabed sediment distribution via geostatistics. The methods vary from manual ‘expert’ interpretation to increasingly automated and mathematically based models, each with their own attributes and limitations. We note, however, that increasing automation and mathematical sophistication does not necessarily equate to improved map outputs, or reduce the time required to produce them. Judgements must be made to select the methodologies which are most appropriate to the variables mapped, and according to the extent and presentation scale of the final maps.
http://www.earthdoc.org/publication/publicationdetails/?publication=77789
ABSTRACT: Soil in a changing world is subject to both anthropogenic and environmental stresses. Soil monitoring is essential to assess the magnitude of changes in soil variables and how they affect ecosystem processes and human livelihoods. However, we cannot always be sure which sampling design is best for a given monitoring task. We employed a rotational stratified simple random sampling (rotStRS) for the estimation of temporal changes in the spatial mean of saturated hydraulic conductivity (Ks) at three sites in central Panama in 2009, 2010 and 2011. To assess this design's efficiency we compared the resulting estimates of the spatial mean and variance for 2009 with those gained from stratified simple random sampling (StRS), which was effectively the data obtained on the first sampling time, and with an equivalent unexecuted simple random sampling (SRS). The poor performance of geometrical stratification and the weak predictive relationship between measurements of successive years yielded no advantage of sampling designs more complex than SRS. The failure of stratification may be attributed to the small large-scale variability of Ks. Revisiting previously sampled locations was not beneficial because of the large small-scale variability in combination with destructive sampling, resulting in poor consistency between revisited samples. We conclude that for our Ks monitoring scheme, repeated SRS is as effective as rotStRS. Some problems of small-scale variability might be overcome by collecting several samples at close range to reduce the effect of small-scale variation. Finally, we give recommendations on the key factors to consider when deciding whether to use stratification and rotation in a soil monitoring scheme.
European Journal of Soil Science 09/2014; 65(6). DOI:10.1111/ejss.12174
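The finding above — that stratification conferred no advantage — follows from the standard variance formulas: a stratified mean beats simple random sampling only when between-stratum variation is substantial. A minimal sketch with toy populations (illustrative, not the Panama Ks data), using equal stratum sizes and equal allocation for simplicity:

```python
# Sketch: variance of the mean under simple random sampling (SRS)
# versus stratified sampling. Stratification pays off only when the
# stratum means differ, i.e. when large-scale variability is present.
def var_of_mean_srs(pop, n):
    mu = sum(pop) / len(pop)
    var = sum((x - mu) ** 2 for x in pop) / len(pop)
    return var / n

def var_of_mean_stratified(strata, n_per_stratum):
    w = 1.0 / len(strata)          # equal stratum weights
    total = 0.0
    for s in strata:
        mu = sum(s) / len(s)
        var = sum((x - mu) ** 2 for x in s) / len(s)
        total += w ** 2 * var / n_per_stratum
    return total

# Strata with distinct means: stratified variance is far smaller.
contrasting = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]]
pop = contrasting[0] + contrasting[1]
print(var_of_mean_stratified(contrasting, 2) < var_of_mean_srs(pop, 4))  # True
```

When the strata have similar means, as with Ks dominated by small-scale variation, the two variances coincide and the extra design complexity buys nothing.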
ABSTRACT: Deficiency or excess of certain trace elements in the soil causes problems for agriculture, including disorders of grazing ruminants. Geostatistics has been used to map the probability that trace element concentrations in soil exceed or fall below particular thresholds. However, deficiency or toxicity problems may depend on interactions between elements in the soil. Here we show how cokriging from a regional survey of topsoil geochemistry can be used to map the risk of deficiency, and the best management intervention, where both depend on the interaction between two elements. Our case study is on cobalt. Farmers and their advisors in Ireland use index values for the concentration of total soil cobalt and manganese to identify where grazing sheep are at risk of cobalt deficiency. We use topsoil data from a regional geochemical survey across six counties of Ireland to form local cokriging predictions of cobalt and manganese concentrations with an attendant distribution which reflects the joint uncertainty of these predictions. From this distribution we then compute conditional probabilities for different combinations of cobalt and manganese index values, and so for the corresponding inferred risk to sheep of cobalt deficiency and the appropriateness of different management interventions. We represent these results as maps, using a verbal scale for the communication of uncertain information. This scale is based on one used by the Intergovernmental Panel on Climate Change, modified in light of some recent research on its effectiveness.
Geoderma 08/2014; 226–227(1):64–78. DOI:10.1016/j.geoderma.2014.03.002
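The probability of an index combination under a joint prediction distribution, as computed above from the cokriging output, can be sketched by Monte Carlo sampling from a bivariate normal. All means, standard deviations, the correlation and the index thresholds below are illustrative, not the survey's values.

```python
import random

# Sketch: Monte Carlo estimate of the probability that a location
# falls in a risk class defined jointly on Co and Mn (here: low Co
# together with high Mn), under a bivariate normal prediction
# distribution. Hypothetical parameters throughout.
random.seed(42)

def joint_risk_probability(mu_co, sd_co, mu_mn, sd_mn, rho,
                           co_threshold, mn_threshold, n=100_000):
    hits = 0
    for _ in range(n):
        z1 = random.gauss(0, 1)
        z2 = rho * z1 + (1 - rho ** 2) ** 0.5 * random.gauss(0, 1)
        co = mu_co + sd_co * z1
        mn = mu_mn + sd_mn * z2
        if co < co_threshold and mn > mn_threshold:
            hits += 1
    return hits / n

p = joint_risk_probability(5.0, 2.0, 500.0, 150.0, -0.3, 4.0, 600.0)
print(round(p, 2))
```

The point of working with the joint distribution, rather than two marginal probabilities, is that the correlation between the elements changes the probability of the combination.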
ABSTRACT: A Confidence Index is proposed that expresses the confidence of experts in the quality of a 3-D model as a representation of the subsurface at particular locations. The Confidence Index is based on the notion that the variation of the height of a particular geological surface represents general geological variability and local variability. The general variability comprises simple trends which allow the modeller to project surface structure at locations remote from direct observations. The local variability limits the extent to which borehole observations constrain inferences which the modeller can make concerning local fluctuations around the broad trends. The general and local geological variability of particular contacts are modelled in terms of simple trend surfaces and variogram models. These are then used to extend measures of confidence that reflect expert opinion so as to assign a confidence value to any location where a particular contact is represented in a model. The index is illustrated with an example from the East Midlands region of the United Kingdom.
Proceedings of the Geologists' Association 07/2014; 125(3). DOI:10.1016/j.pgeola.2014.05.002
ABSTRACT: The multivariate cumulants characterize aspects of the spatial variability of a regionalized variable. A centred multivariate Gaussian random variable, for example, has zero third-order cumulants. In this paper it is shown how the third-order cumulants can be used to test the plausibility of the assumption of multivariate normality for the porosity of an important formation, the Bunter Sandstone in the North Sea. The results suggest that the spatial variability of this variable deviates from multivariate normality, and that this assumption may lead to misleading inferences about, for example, the uncertainty attached to kriging predictions.
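For a centred variable the third-order cumulant coincides with the third central moment, which is why a sample value far from zero casts doubt on Gaussianity. A minimal sketch; the symmetric and skewed samples are illustrative, not the Bunter Sandstone data, and the paper's spatial (cross-location) cumulants generalize this single-variable case.

```python
# Sketch: sample third central moment, zero for symmetric (e.g.
# Gaussian) data and non-zero for skewed data.
def third_central_moment(data):
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 3 for x in data) / n

symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
skewed = [0.0, 0.0, 0.0, 1.0, 9.0]
print(third_central_moment(symmetric))      # 0.0
print(third_central_moment(skewed) > 0)     # True
```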
ABSTRACT: In this paper we develop a model for the spatial variability of apparent electrical conductivity, ECa, of soil formed in relict patterned ground. The model is based on the continuous local trend (CLT) random processes introduced by Lark (2012b) (Geoderma, 189-190, 661-670). These models are non-Gaussian and so their parameters cannot be estimated just by fitting a variogram model. We show how a plausible CLT model, and parameters for this model, can be found by the structured use of soil knowledge about the pedogenic processes in the particular environment and the physical properties of the soil material, along with some limited descriptive statistics on the target variable. This approach is attractive to soil scientists in that it makes the geostatistical analysis of soil properties an explicitly pedological procedure, and not simply a numerical exercise. We use this approach to develop a CLT model for ECa at our target site. We then develop a test statistic which measures the extent to which soils on this site with small values of ECa, which are coarser and so more permeable, tend to be spatially connected in the landscape. When we apply this statistic to our data we get results which indicate that the CLT model is more appropriate for the variable than is a Gaussian model, even after the transformation of the data. The CLT model could be used to generate training images of soil processes to be used for computing conditional distributions of variables at unsampled sites by multiple point geostatistical algorithms.
ABSTRACT: Angular data are commonly encountered in the earth sciences, and statistical descriptions and inferences about such data are necessary in structural geology. In this paper we compare two statistical distributions appropriate for complex angular data sets: the mixture of von Mises distributions and the projected normal distribution. We show how the number of components in a mixture of von Mises distributions may be chosen, and how one may choose between the projected normal distribution and the mixture of von Mises for a particular data set. We illustrate these methods with some structural geological data, showing how the fitted models can complement geological interpretation and permit statistical inference. One of our data sets suggests a special case of the projected normal distribution, which we discuss briefly.
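Before fitting either distribution named above, angular data are usually summarized by the circular mean and the mean resultant length. The sketch below computes both; the angles are illustrative, in radians, and the collapse of the resultant length for bimodal data is one symptom that a single von Mises component is inadequate.

```python
import math

# Sketch: circular mean direction and mean resultant length, the
# basic summaries of angular data.
def circular_summary(angles):
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    mean_direction = math.atan2(s, c)
    resultant_length = math.hypot(c, s)   # near 1: tightly clustered
    return mean_direction, resultant_length

# Two tight clusters 180 degrees apart: the resultant length vanishes,
# even though the data are far from uniformly scattered.
bimodal = [0.0, 0.1, -0.1, math.pi, math.pi + 0.1, math.pi - 0.1]
_, r = circular_summary(bimodal)
print(round(r, 3))  # 0.0
```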
ABSTRACT: It is important to understand how and where pollution and other anthropogenic processes compromise the ability of urban soil to serve as a component of the natural infrastructure. An extensive survey of the topsoil of the Greater London Area (GLA) in the United Kingdom has recently been completed by a non-probability systematic sampling scheme. We studied data on lead content from this survey. We examined an overall hypothesis that land use, as recorded at the time of sampling, is an important source of the variation of soil lead content, and we examined specific orthogonal contrasts to test particular hypotheses about land use effects. The assumption that the residuals from land use effects are independent random variables cannot be sustained because of the non-probability sampling. For this reason model-based analyses were used to test the hypotheses. One particular contrast, between the lead content in the soil of domestic gardens and that in the soil under parkland or recreational land, was modelled as a spatially dependent random variable, predicted optimally by cokriging.
We found that land use is an important source of variation in lead content of topsoil. Industrial sites had the largest mean lead content, followed by domestic gardens. Detailed contrasts between land uses are reported. For example, the lead content in soil of parkland did not differ significantly from that of recreational land, but the soil in these two land uses, considered together, had significantly less lead than did the soil of domestic gardens. Local cokriging predictions of this contrast varied substantially, and were larger in outer parts of the GLA, particularly in the south west.
Geoderma 11/2013; 209–210:65–74. DOI:10.1016/j.geoderma.2013.06.004
ABSTRACT: As soils come under increasing pressure to maintain a range of ecosystem services, there is interest in how soils change over time in response to factors such as change in land-use. Many studies examining long- and short-term soil change have focused on soils with relatively high mineral and fertility status. Therefore, the aims of this study are to explore regional change on a marginal sandy soil formed over the Sherwood Sandstone outcrop in Nottinghamshire, U.K. (750 km²) and to assess changes in soil fertility as a function of the natural weathering process and land use change. The study uses data from three sources to examine differences between soil fertility properties under two major land-uses through the depth of the soil/mobile regolith (~ 1.6 m) and into the saprolite. It is proposed that the differences reflect in part the result of historical change in land-use. From old maps we identify the land-use changes back to 1781. This allowed us to compare soils that have been under woodland cover at least since 1781 with those that were converted to arable use in major deforestation between 1781 and 1881. Soils now under woodland have low concentrations of base cations, an acid pH and a mean organic carbon concentration (0–15 cm) of 2.7%. In contrast, soils now under arable use have large concentrations of base cations, pH close to neutral and a mean organic carbon concentration (0–15 cm) of 1.7%. There is evidence in the arable soils of leaching to depth of materials from applied fertilisers and lime. These results show the rapid change in properties of soil formed in bedrock, with small concentrations of nutrients and weatherable minerals, which can result from land-use change.
Geoderma 10/2013; 207–208(1):35–48. DOI:10.1016/j.geoderma.2013.05.004
ABSTRACT: We analyzed data on nitrous oxide emissions and on soil properties that were collected on a 7.5-km transect across an agricultural landscape in eastern England using the discrete wavelet packet transform. We identified a wavelet packet "best basis" for the emission data. Wavelet packet basis functions are used to decompose the data into a set of coefficients that represent the variation in the data at different spatial frequencies and locations. The "best basis" for a set of data is adapted to the variability in the data by ensuring that the spatial resolution of local features is good at those spatial frequencies where variation is particularly intermittent. The best basis was shown to be adapted to represent such intermittent variation, most markedly at wavelengths of 100 m or less. Variation at these wavelengths was shown to be correlated particularly with chemical properties of the soil, such as nitrate content. Variation at larger wavelengths showed less evidence of intermittency and was found to be correlated with soil chemical and physical constraints on emission rates. In addition to frequency-dependent intermittent variation, it was found that the variance of emission rates at some wavelengths changed at particular locations along the transect. One factor causing this appeared to be contrasts in parent material. The complex variation in emission rates identified by these analyses has implications for how emission rates are estimated.
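The decomposition into coefficients at different frequencies and locations, described above, can be sketched with one level of the Haar transform, the simplest wavelet. A wavelet packet analysis recurses on both the smooth and detail outputs and uses smoother wavelets than Haar; the data below are illustrative.

```python
import math

# Sketch: one level of a Haar wavelet transform. Pairwise averages
# capture smooth variation; pairwise differences capture local detail.
def haar_step(signal):
    smooth = [(signal[i] + signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal), 2)]
    return smooth, detail

emissions = [2.0, 2.0, 8.0, 8.0]        # a step change along a transect
smooth, detail = haar_step(emissions)
print(detail)  # [0.0, 0.0]: no fine-scale detail within flat segments
```

The transform is orthonormal, so the total variance ("energy") of the signal is preserved and can be partitioned by frequency and location, which is what underlies the intermittency analysis.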
ABSTRACT: We develop an algorithm for optimizing the design of multi-phase soil remediation surveys. The locations of observations in later phases are selected to minimize the expected loss incurred from misclassification of the local contamination status of the soil. Unlike in existing multi-phase design methods, the locations of multiple observations can be optimized simultaneously and the reduction in the expected loss can be forecast. Hence rational decisions can be made regarding the resources which should be allocated to further sampling. The geostatistical analysis uses a copula-based spatial model which can represent general types of variation, including distributions with extreme values. The algorithm is used to design a hypothetical second phase of a survey of soil lead contamination in Glebe, Sydney. Observations for this phase are generally dispersed on the boundaries between areas which, according to the first phase, either require or do not require remediation. The algorithm is initially used to make remediation decisions at the point scale, but we demonstrate how it can be used to inform decisions over blocks.