ISRIC - World Soil Information
  • Wageningen, Netherlands
Recent publications
Soils are the largest terrestrial reservoir of organic carbon, yet they are easily degraded. Consistent and accurate monitoring of changes in soil organic carbon stocks and net greenhouse gas emissions, reporting, and their verification is key to facilitating investment in sustainable land use practices that maintain or increase soil organic carbon stocks, as well as to incorporating soil organic carbon sequestration in national greenhouse gas emission reduction targets. Building on an initial review of monitoring, reporting and verification (MRV) schemes with a focus on croplands, grasslands, and forestlands, we develop a framework for a modular, scalable MRV system. We then provide an inventory and classification of selected MRV methodologies and subsequently “score” them against a list of key characteristics. It appears that the main challenge in developing a unified MRV system concerns the monitoring component. Finally, we present a conceptual workflow that shows how a prototype for an operational, modular multi-ecosystem MRV tool could be systematically built.
In digital soil mapping, modelling soil thickness poses a challenge due to the prevalent issue of right‐censored data. This means that the true soil thickness exceeds the depth of sampling, and neglecting to account for the censored nature of the data can lead to poor model performance and underestimation of the true soil thickness. Survival analysis is a well‐established domain of statistical modelling that can deal with censored data. The random survival forest is a notable example of a survival‐related machine learning approach used to address right‐censored soil property data in digital soil mapping. Previous studies that employed this model either focused on mapping the probability of soil thickness exceeding certain depths, and thereby not mapping soil thickness itself, or dismissed it due to perceived poor performance. In this study, we propose an alternative survival model to map soil thickness that is based on the inverse probability of censoring weighting. In this approach, calibration data are weighted by the inverse of the probability that soil thickness exceeds a certain depth, that is, a survival probability. These weights can then be used with most machine learning models. We used the weights with a regular random forest, and compared it with a random survival forest, and other strategies for handling right‐censored data, through a comprehensive synthetic simulation study and two real‐world case studies. The results suggest that the weighted random forest model produces competitive predictions, establishing it as a viable option for mapping right‐censored soil property data.
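As a rough illustration of the weighting idea in the abstract above, the sketch below computes inverse-probability-of-censoring weights in plain Python. It assumes one common IPCW formulation: the survival function G of the censoring depth is estimated with a Kaplan-Meier estimator, each fully observed thickness at depth t gets weight 1/G(t⁻), and right-censored profiles get weight 0. The function names and the toy profile data are hypothetical, ties are handled naively, and the paper's exact weighting scheme may differ.

```python
def censoring_survival(depths, observed):
    """Kaplan-Meier estimate of G(t) = P(censoring depth > t).

    depths   : recorded depth of each profile (cm)
    observed : True if soil thickness was fully observed,
               False if the profile is right-censored
    Returns a list of (depth, G at that depth) steps in increasing depth order.
    """
    steps = []
    g = 1.0
    for t in sorted(set(depths)):
        at_risk = sum(1 for d in depths if d >= t)
        # For the censoring distribution, the "events" are the censored profiles.
        censored_here = sum(
            1 for d, o in zip(depths, observed) if d == t and not o
        )
        g *= 1.0 - censored_here / at_risk
        steps.append((t, g))
    return steps


def ipcw_weights(depths, observed):
    """Weight 1 / G(t-) for observed profiles, 0 for censored ones."""
    steps = censoring_survival(depths, observed)
    weights = []
    for d, o in zip(depths, observed):
        if not o:
            weights.append(0.0)
            continue
        g_before = 1.0  # left limit of G at depth d
        for t, g in steps:
            if t < d:
                g_before = g
            else:
                break
        weights.append(1.0 / g_before)
    return weights


# Hypothetical toy data: six profiles, two of them right-censored.
toy_depths = [40, 60, 70, 80, 100, 120]
toy_observed = [True, False, True, True, False, True]
toy_weights = ipcw_weights(toy_depths, toy_observed)
print(toy_weights)
```

Deeper observed profiles receive larger weights because they stand in for profiles that were censored at shallower depths. The resulting list could then be passed as per-sample weights to any learner that accepts them.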
Maize (Zea mays) is an important staple crop for food security in Sub-Saharan Africa. However, there is a need to increase production to feed a growing population. In Ghana, this is mainly done by increasing acreage with adverse environmental consequences, rather than yield increment per unit area. Accurate prediction of maize yields and nutrient use efficiency in production is critical to making informed decisions toward economic and ecological sustainability. We trained the random forest machine learning algorithm to predict maize yield and agronomic efficiency in Ghana using soil, climate, environment, and management factors, including fertilizer application. We calibrated and evaluated the performance of the random forest machine learning algorithm using a 5 × 10-fold nested cross-validation approach. Data from 482 maize field trials consisting of 3136 georeferenced treatment plots conducted in Ghana from 1991 to 2020 were used to train the algorithm, identify important predictor variables, and quantify the uncertainties associated with the random forest predictions. The mean error, root mean squared error, model efficiency coefficient and 90% prediction interval coverage probability were calculated. The results obtained on test data demonstrate good prediction performance for yield (MEC = 0.81) and moderate performance for agronomic efficiency (MEC = 0.63, 0.55 and 0.54 for AE-N, AE-P and AE-K, respectively). We found that climatic variables were less important predictors than soil variables for yield prediction, but temperature was of key importance to yield prediction and rainfall to agronomic efficiency. The developed random forest models provided a better understanding of the drivers of maize yield and agronomic efficiency in a tropical climate and an insight towards improving fertilizer recommendations for sustainable maize production and food security in Sub-Saharan Africa.
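The four evaluation statistics named in the abstract above can be sketched in a few lines of Python. This sketch uses the usual textbook definitions, which are assumptions rather than details taken from the paper: mean error as the average of prediction minus observation, the model efficiency coefficient as one minus the ratio of the residual sum of squares to the total sum of squares, and the prediction interval coverage probability as the share of observations that fall inside their interval. The toy numbers are invented for illustration.

```python
import math


def mean_error(obs, pred):
    """Average of (prediction - observation); the sign convention is an assumption."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mec(obs, pred):
    """Model efficiency coefficient: 1 - SSE / SST."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def picp(obs, lower, upper):
    """Fraction of observations inside their prediction interval."""
    inside = sum(1 for o, lo, hi in zip(obs, lower, upper) if lo <= o <= hi)
    return inside / len(obs)


# Hypothetical yields (t/ha) with predictions and interval limits.
toy_obs = [2.0, 3.0, 4.0, 5.0]
toy_pred = [2.5, 2.5, 4.5, 4.5]
toy_lower = [1.5, 2.0, 4.2, 4.0]
toy_upper = [3.0, 3.5, 5.0, 5.5]

print(mean_error(toy_obs, toy_pred), rmse(toy_obs, toy_pred))
print(mec(toy_obs, toy_pred), picp(toy_obs, toy_lower, toy_upper))
```

For a well-calibrated 90% prediction interval, the coverage computed on independent test data should be close to 0.90.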
Soursop (Annona muricata L.) is considered to be a neglected or underutilized plant. Recently, the plant has been found to have great medicinal potential and could significantly improve food security due to its high nutritional and calorific value. Soursop is reported to grow extensively in tropical environments. However, there is a lack of information on the most suitable soil conditions for its growth and development. This work aims to identify and establish edaphic requirements of A. muricata through an extensive bibliographic study to contribute information necessary to improve soil management vis-à-vis optimal production. Optimal climatic conditions are as follows: mean annual temperature (22−25 °C), mean annual rainfall (2,000−2,500 mm), relative humidity (70%−80%), soil temperature at 0−50 cm soil depth (25−35 °C), soil moisture regime (udic), altitude (200–300 m a.s.l.). Highly suitable soil physical characteristics include well-drained and deep (> 180 cm) soils with a light texture (loam, sandy loam, loamy sand, sandy clay loam). As concerns soil chemical characteristics, pH should range from 5.5 to 6.5, organic matter content should be high, base saturation should exceed 60%, and Al saturation should be below 20%. These results could be refined with data from field trials conducted in different environments.
Accurate modeling of site-specific crop yield response is key to providing farmers with accurate site-specific economically optimal input rate (EOIR) recommendations. Many studies have demonstrated that machine learning models can accurately predict yield. These models have also been used to analyze the effect of fertilizer application rates on yield and derive EOIRs. But models with accurate yield prediction can still provide highly inaccurate input application recommendations. This study quantified the uncertainty generated when using machine learning methods to model the effect of fertilizer application on site-specific crop yield response. The study uses real on-farm precision experimental data to evaluate the influence of the choice of machine learning algorithms and covariate selection on yield and EOIR prediction. The crop is winter wheat, and the inputs considered are a slow-release basal fertilizer NPK 25–6–4 and a top-dressed fertilizer NPK 17–0–17. Random forest, XGBoost, support vector regression, and artificial neural network algorithms were trained with 255 sets of covariates derived from combining eight different soil properties. Results indicate that both the predicted EOIRs and associated gained profits are highly sensitive to the choice of machine learning algorithm and covariate selection. The coefficients of variation of EOIRs derived from all possible combinations of covariate selection ranged from 13.3 to 31.5% for basal fertilization and from 14.2 to 30.5% for top-dressing. These findings indicate that while machine learning can be useful for predicting site-specific crop yield levels, it must be used with caution in making fertilizer application rate recommendations.
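The figure of 255 covariate sets in the abstract above matches the number of non-empty subsets of eight items (2⁸ − 1). The short sketch below enumerates them and shows how a coefficient of variation across candidate EOIRs could be computed. The property names and EOIR values are hypothetical placeholders, and using the sample standard deviation is an assumption.

```python
from itertools import combinations
from statistics import mean, stdev

# Placeholder names; the abstract does not list the eight soil properties.
soil_properties = [f"soil_property_{i}" for i in range(1, 9)]

# Every non-empty subset of the eight properties is one candidate covariate set.
covariate_sets = [
    subset
    for size in range(1, len(soil_properties) + 1)
    for subset in combinations(soil_properties, size)
]


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical EOIRs (kg/ha) predicted with three different covariate sets.
toy_eoirs = [80.0, 100.0, 120.0]

print(len(covariate_sets))
print(coefficient_of_variation(toy_eoirs))
```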
For restoring soil health and mitigating climate change, information on soil organic matter is needed across space, depth and time. Here we developed a statistical modelling platform in three-dimensional space and time as a new paradigm for soil organic matter monitoring. Based on 869,094 soil organic matter observations from 339,231 point locations and the novel use of environmental covariates variable in three-dimensional space and time, we predicted soil organic matter and its uncertainty annually at 25 m resolution for 0–2 m depth from 1953 to 2022 in the Netherlands. We predicted soil organic matter decreases of more than 25% in peatlands and 0.1–0.3% in cropland mineral soils, but increases of 10–25% on reclaimed land due to land subsidence. Our analysis quantifies the substantial variations of soil organic matter in space, depth, and time, highlighting the inadequacy of evaluating soil organic matter dynamics at point scale or static mapping at a single depth for policymaking.
Site-specific estimation of lime requirement requires high-resolution maps of soil organic carbon (SOC), clay and pH. These maps can be generated with digital soil mapping models fitted on covariates observed by proximal soil sensors. However, the quality of the derived maps depends on the applied methodology. We assessed the effects of (i) training sample size (5–100); (ii) sampling design (simple random sampling (SRS), conditioned Latin hypercube sampling (cLHS) and k-means sampling (KM)); and (iii) prediction model (multiple linear regression (MLR) and random forest (RF)) on the prediction performance for the three above-mentioned soil properties. The case study is based on conditional geostatistical simulations using 250 soil samples from a 51 ha field in Eastern Germany. Lin’s concordance correlation coefficient (CCC) and root-mean-square error (RMSE) were used to evaluate model performances. Results show that with increasing training sample sizes, relative improvements of RMSE and CCC decreased exponentially. We found the lowest median RMSE values with 100 training observations, i.e. 1.73%, 0.21% and 0.3 for clay, SOC and pH, respectively. However, already with a sample size of 10, models of moderate quality (CCC > 0.65) were obtained for all three soil properties. cLHS and KM performed significantly better than SRS. MLR showed lower median RMSE values than RF for SOC and pH for smaller sample sizes, but RF outperformed MLR if at least 25–30 or 75–100 soil samples were used for SOC or pH, respectively. For clay, the median RMSE was lower with RF, regardless of sample size.
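For readers unfamiliar with Lin's concordance correlation coefficient mentioned above, a minimal Python sketch follows. It uses the standard definition with 1/n moments, CCC = 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²). The toy values are invented and are not taken from the study.

```python
def concordance_correlation(obs, pred):
    """Lin's CCC: agreement of predictions with the 1:1 line."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


# Hypothetical pH observations and two sets of predictions.
toy_obs = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]
shifted = [2.0, 3.0, 4.0, 5.0]  # perfectly correlated but biased by +1

print(concordance_correlation(toy_obs, perfect))
print(concordance_correlation(toy_obs, shifted))
```

Unlike the Pearson correlation, the CCC drops below 1 for the shifted predictions, because it penalises systematic bias as well as scatter.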
In 2019, the Government of India launched the National Clean Air Program (NCAP) to address the pervasive problem of poor air quality and its adverse effect on public health. Coordinated efforts to prevent agricultural burning of crop residues in Northwestern IGP (Indo-Gangetic Plain) have been implemented, but the practice is rapidly expanding into the populous Eastern IGP states, including Bihar, with uncertain consequences for regional air quality. This research has three objectives: (1) characterize historical rice residue burning trends since 2002 over space and time in Bihar State, (2) project future burning trajectories to 2050 under ‘business as usual’ and alternative scenarios of change, and (3) simulate air quality outcomes under each scenario to describe implications for public health. Six future burning scenarios were defined as maintenance of the ‘status quo’ fire extent, area expansion of burning at ‘business as usual’ rates, and a Northwest IGP analogue, with both current rice yields and plausible yield intensification considered for each case. The Community Earth System Model (CESM v2.1.0) was used to characterize the mid-century air quality impacts under each scenario. These analyses suggest that contemporary Bihar State burning levels contribute a small daily average proportion (8.1%) of the fine particle pollution load (i.e., PM2.5, particles ≤ 2.5 μm) during the burning months, but up to as much as 62% on the worst of winter days in Bihar’s capital region. With a projected 142% ‘business as usual’ increase in burned area extent anticipated for 2050, Bihar’s capital region may experience the equivalent of 30 additional PM2.5 exceedance days, according to the WHO standard (24-hour; exceedance level: 15 µg/m³), due to rice residue burning alone in the October to December period.
If historical burning trends intensify and Bihar resembles the Northwest States of Punjab and Haryana by 2050, 46 days would exceed the WHO standard for PM2.5 in Bihar’s capital region.
Soil organic carbon (SOC) is essential for most soil functions. Changes in land use from natural land to cropland disrupt long-established SOC balances and reduce SOC levels. The intensive use of chemical fertilisers in modern agriculture accelerates the rate of SOC depletion. Domestic organic residues (DOR) are a valuable source of SOC replenishment with high carbon content. However, there is still a lack of knowledge and data regarding whether and to what extent DOR can contribute to replenishing SOC. This paper aims to unpack the potential of DOR as a SOC source. Total SOC demand and annual SOC loss are defined and calculated. The carbon flow within different DOR management systems is investigated in three countries (China, Australia, and The Netherlands). The results show that the total SOC demand is too large to be fulfilled by DOR in a short time. However, DOR still has a high potential as a source of SOC as it can mitigate the annual SOC loss by up to 100%. Achieving this 100% mitigation requires a shift to more circular management of DOR, in particular, more composting, and direct land application instead of landfilling and incineration (Australia and China), or a higher rate of source separation of DOR (The Netherlands). These findings form the basis for future research on DOR recycling as a SOC source.
Global, continental and regional maps of concentrations, stocks and fluxes of natural resources provide baseline data to assess how ecosystems respond to human disturbance and global warming. They are also used as input to numerous modelling efforts. But these maps suffer from multiple error sources and hence it is good practice to report estimates of the associated map uncertainty, so that users can evaluate their fitness for use. We explain why quantification of uncertainty of spatial aggregates is more complex than uncertainty quantification at point support, because it must account for spatial autocorrelation of the map errors. Unfortunately this is not done in a number of recent high-profile studies. We describe how spatial autocorrelation of map errors can be accounted for with block kriging, a method that requires geostatistical expertise. Next, we propose a new, model-based approach that avoids the numerical complexity of block kriging and is feasible for large-scale studies where maps are typically made using machine learning. Our approach relies on Monte Carlo integration to derive the uncertainty of the spatial average or total from point support prediction errors. We account for spatial autocorrelation of the map error by geostatistical modelling of the standardized map error. We show that the uncertainty strongly depends on the spatial autocorrelation of the map errors. In a first case study, we used block kriging to show that the uncertainty of the predicted topsoil organic carbon in France decreases when the support increases. In a second case study, we estimated the uncertainty of spatial aggregates of a machine learning map of the aboveground biomass in Western Africa using Monte Carlo integration. We found that this uncertainty was small because of the weak spatial autocorrelation of the standardized map errors. We present a tool to get realistic estimates of the uncertainty of spatial averages and totals of natural resources maps. 
The method presented in this paper is essential for parties that need to evaluate whether differences in aggregated environmental variables or natural resources between regions or over time are statistically significant.
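The role of spatial autocorrelation described in the abstract above can be illustrated with a small Monte Carlo integration sketch. The variance of the error in a spatial mean is the average, over all pairs of locations in the region, of the covariance between the point-support errors. Here that double integral is approximated by sampling random location pairs. The square region, the exponential correlogram of the standardized error, the constant prediction standard deviation and all function names are assumptions made for illustration, not the authors' tool.

```python
import math
import random


def spatial_mean_error_sd(sigma, side, corr_range, n_pairs=20000, seed=1):
    """Monte Carlo estimate of the standard deviation of the spatial-mean error.

    sigma      : function (x, y) -> point-support prediction error sd
    side       : side length of the (assumed) square region
    corr_range : range parameter of an assumed exponential correlogram
                 exp(-h / corr_range) for the standardized map error
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        x1, y1 = rng.uniform(0, side), rng.uniform(0, side)
        x2, y2 = rng.uniform(0, side), rng.uniform(0, side)
        h = math.hypot(x1 - x2, y1 - y2)
        # Covariance of the map errors at the two sampled locations.
        total += sigma(x1, y1) * sigma(x2, y2) * math.exp(-h / corr_range)
    return math.sqrt(total / n_pairs)


def constant_sd(x, y):
    """Hypothetical map with the same prediction error sd everywhere."""
    return 10.0


# Weak versus strong spatial autocorrelation on a 100 x 100 region.
sd_weak = spatial_mean_error_sd(constant_sd, side=100.0, corr_range=1.0)
sd_strong = spatial_mean_error_sd(constant_sd, side=100.0, corr_range=1000.0)
print(sd_weak, sd_strong)
```

With a short correlation range, the errors largely cancel out in the average and the uncertainty of the spatial mean is small. With a long range, it approaches the point-support standard deviation.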
Pedometricians have spent a lot of effort on mapping soil types and basic soil properties. However, end‐users typically need a more elaborate soil quality index for land management. Soil quality indices are typically derived from multiple individual soil properties by evaluating whether specific criteria are met. If this is based on individually mapped soil properties, then an important consequence is that cross‐correlations between soil properties are ignored. This makes it impossible to quantify the uncertainties associated with the mapped indices. The objective of this study was to map a soil potential multifunctionality index for agriculture (Agri‐SPMI) over a 12,125 km² study region located along the French Mediterranean coast to help urban planners preserve soils of the highest quality. The index considered the ability of soils to fulfil four functions under five land use scenarios. A binary map represented each soil function fulfilment for a given scenario. The final soil quality index map was the sum of the 20 binary maps. A regression cokriging model was developed that first mapped the basic soil properties individually from legacy soil data and spatial soil covariates using a random forest algorithm, and then interpolated the residuals using cokriging and the linear model of coregionalisation. The mapping uncertainties of soil properties were propagated by calculating the soil quality index over 300 stochastic simulations of soil properties derived from the linear models of coregionalisation. Results showed a poor prediction accuracy of the quality index, mainly because some soil properties were poorly predicted (notably available water capacity and coarse fragments) and used in combination with extreme thresholds that defined land suitability.
Overall, the uncertainty was correctly quantified because the stochastic simulations reproduced the width of the observed distribution well, but the shapes of the distributions differed considerably from those of the observations. We envisage some ways for improvement, such as creating probability maps instead of the mean from simulations, and changing the prediction support from point to area. Highlights: The study mapped a soil quality index (Agri‐SPMI) to help preserve soils of highest quality along the French Mediterranean coast. The Agri‐SPMI considered the ability of soils to fulfil four functions under five land use scenarios, derived from multiple individual soil properties. A regression cokriging model was developed to map the basic soil properties, followed by interpolating the residuals using cokriging and the linear model of coregionalisation. The study accurately quantified mapping uncertainties using stochastic simulations, but the soil quality index prediction accuracy was poor, with suggestions for improvement.
Using fertilisers is indispensable for closing yield gaps in Sub-Saharan Africa. Current fertiliser recommendations, however, are often blanket recommendations which do not take spatial variation in soil conditions within a region or country into account. Soil maps can potentially support fertiliser recommendations at a higher spatial resolution. The QUantitative Evaluation of the Fertility of Tropical Soils (QUEFTS) model is a decision support tool that predicts crop yields as an indicator of soil fertility and can be used to evaluate yield responses to fertilisers. It was designed for field level output and runs on field-specific soil information. The aim of this study was to compare two methods for developing maps of QUEFTS output, i.e. maize yield and the yield-limiting nutrient, with Rwanda as a case study. We used a database containing soil analysis results of 999 samples collected across Rwanda. Transfer functions were applied to predict the required P-Olsen and Exchangeable K input for QUEFTS based on the soil data. For the “Calculate-then-Interpolate” (CI) method, transfer functions and QUEFTS were applied to point data, and the final output was then interpolated using random forest modelling. For the “Interpolate-then-Calculate” (IC) method, maps of the soil parameters were developed first, before applying calculations. Implications of the chosen method (i.e. CI or IC) on QUEFTS predictions on a national scale were evaluated using set-aside locations. Results showed low precision and accuracy of QUEFTS maize yield predictions across Rwanda. The CI method performed better in predicting QUEFTS yield and yield-limiting nutrient than the IC method. Correlations between mapped yield predictions and predictions on set-aside evaluation locations were similar for the CI (r = 0.444) and IC (r = 0.439) methods.
The poorer performance of the IC method was mostly due to overestimation of yields, which was most likely caused by the effect of smoothing on the soil maps used as input for QUEFTS. We conclude that the CI method is the preferred method for spatial application of QUEFTS.
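The smoothing effect that the abstract above gives as the most likely cause of overestimation under the IC method can be illustrated with a toy calculation. The sketch below is not QUEFTS and does not use random forest interpolation. It assumes a hypothetical saturating yield response and uses a plain average of neighbouring points as a crude stand-in for interpolation. All names and numbers are invented.

```python
import math


def toy_yield(nutrient, y_max=8.0, k=0.05):
    """Hypothetical saturating yield response (t/ha); not the QUEFTS model."""
    return y_max * (1.0 - math.exp(-k * nutrient))


def smooth(values):
    """Plain average, standing in for interpolation to an unsampled location."""
    return sum(values) / len(values)


# Hypothetical soil nutrient values at three neighbouring sample points.
neighbours = [5.0, 10.0, 60.0]

# Calculate-then-Interpolate: run the model at the points, then smooth the yields.
ci_yield = smooth([toy_yield(v) for v in neighbours])

# Interpolate-then-Calculate: smooth the soil values, then run the model once.
ic_yield = toy_yield(smooth(neighbours))

print(ci_yield, ic_yield)
```

Because this toy response curve is concave, running the model on smoothed inputs gives a higher yield than smoothing the model outputs. This is one mechanism by which IC could overestimate yields. Whether it explains the Rwanda results depends on the actual model and maps.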
Soil classification is based on both the properties of the soil material and the pedogenetic pathways responsible for those properties. Because soil properties are linked to soil function and potential, information on soil classification has formed the basis for empirical interpretations of mapping units in terms of limitations or suitabilities for a wide range of land uses. In this way, a soil type acts as an accessible “carrier of information” presenting “the story of..”. Though valuable for broad land-use assessments, these empirical interpretations of soil functionality are inadequate to answer modern interdisciplinary questions focused on sustainable development. Four case studies are presented showing various quantitative approaches focusing on soil functions contributing to ecosystem services in line with the United Nations Sustainable Development Goals (SDGs) and the European Green Deal, demonstrating that: (i) the use of soil surveys and associated databases feeding soil–water-atmosphere-plant simulation models can contribute to defining soil functions and ecosystem services; (ii) hydropedological characterization of soil types can allow a strong reduction in the number of landscape units to be considered, improving practical applicability; (iii) pedotransfer functions can successfully link soil data to modeling parameters; (iv) functionality requires expression of soil management effects on properties of a given soil type, to be expressed by phenoforms; (v) only models can be applied to explore important future effects of climate change by running IPCC scenarios; and (vi) the most effective level of soil classification—acting as carriers of information when defining soil functionality—will differ depending on the spatial scale being considered, whether local, regional or higher.
Geostatistics and machine learning have been extensively applied for modelling and predicting the spatial distribution of continuous soil variables. In addition to providing predictions, both techniques quantify the uncertainty associated with the predictions, although geostatistics is more developed in this respect. Despite the increased use of these techniques, most algorithms ignore that the soil measurements are not error-free. Recently, concern has also arisen about the extrapolation risk of these techniques, be it in geographic space, feature space, or both. In this paper, regression kriging (RK) and random forest (RF) were compared with respect to their ability to deliver accurate predictions and quantify prediction uncertainties, while accounting for measurement errors in the soil data. The sensitivity of results of both models to soil measurement errors was also evaluated, as well as their spatial extrapolation potential. This was done for a case study in Cameroon where soil pH, clay and organic carbon were mapped from measurements obtained using both conventional and proximal soil sensing methods. The results showed that both models produced comparable ranges and maps of predicted values for the soil properties of interest. RK outperformed RF, with a generally higher Model Efficiency Coefficient (MEC), lower Root Mean Squared Error (RMSE) values and better extrapolation performance. The improvement in RMSE was about 10, 12 and 2% while the improvement in MEC was on average 5, 22 and 1% for pH, clay and SOC, respectively. Overestimation of the local uncertainty observed for RK was larger than that of RF as shown by accuracy plots, indicating that prediction uncertainties were better quantified by the RF model.
RK also extrapolated better: it gave better predictions than RF at unsampled locations, as shown by cross-validation metrics and scatter plots, particularly when both models were used for spatial extrapolation. The effects of incorporating measurement errors were not significant, either for the predictions or for the prediction uncertainties, because most calibration data had the same measurement error variance. Model comparison should go beyond common validation metrics that only evaluate prediction accuracy and must also account for the ability to quantify prediction uncertainty at unsampled locations.
Understanding the spatial-temporal dynamics of crop nitrogen (N) use efficiency (NUE) and the relationship with explanatory environmental variables can support land-use management and policymaking. Nevertheless, the application of statistical models for evaluating the explanatory variables of space-time variation in crop NUE is still under-researched. In this study, stepwise multiple linear regression (SMLR) and random forest (RF) were used to evaluate the spatial and temporal variation of NUE indicators (i.e., partial factor productivity of N (PFPN); partial nutrient balance of N (PNBN)) at county scale in Northeast China (Heilongjiang, Liaoning and Jilin provinces) from 1990 to 2015. Explanatory variables included agricultural management practices, topography, climate, economy, soil and crop types. Results revealed that the PFPN was higher in the northern parts and lower in the center of Northeast China and PNBN increased from southern to northern parts during the 1990–2015 period. The NUE indicators decreased with time in most counties during the study period. The model efficiency coefficients of the SMLR and RF models were 0.44 and 0.84 for PFPN, and 0.67 and 0.89 for PNBN, respectively. The RF model had higher relative importance of soil and climatic covariates and lower relative importance of crop covariates compared to the SMLR model. The planting area index of vegetables and beans, soil clay content, saturated water content, enhanced vegetation index in November & December, soil bulk density, and annual minimum temperature were the main explanatory variables for both NUE indicators. This is the first study to show the quantitative relative importance of explanatory variables for NUE at a county level in Northeast China using RF and SMLR.
This novel study gives reference measurements to improve crop NUE, which is one of the most effective means of managing N for sustainable development, ensuring food security, alleviating environmental degradation and increasing farmers' profitability.
Present global maps of soil water retention (SWR) are mostly derived from pedotransfer functions (PTFs) applied to maps of other basic soil properties. As an alternative, ‘point-based’ mapping of soil water content can improve global soil data availability and quality. We developed point-based global maps with estimated uncertainty of the volumetric SWR at 100, 330 and 15 000 cm suction using measured SWR data extracted from the WoSIS Soil Profile Database together with data estimated by a random forest PTF (PTF-RF). The point data were combined with around 200 environmental covariates describing vegetation, terrain morphology, climate, geology, and hydrology using digital soil mapping (DSM). In total, we used 7292, 33 192 and 42 016 SWR point observations at 100, 330 and 15 000 cm, respectively, and complemented the dataset with 436 108 estimated values at each suction. Tenfold cross-validation yielded a Root Mean Square Error (RMSE) of 6.380, 7.112 and 6.485 × 10⁻² cm³ cm⁻³, and a Model Efficiency Coefficient (MEC) of 0.430, 0.386, and 0.471, respectively, for 100, 330 and 15 000 cm. The results were also compared to three published global maps of SWR to evaluate differences between point-based and map-based mapping approaches. Point-based mapping performed better than the three map-based mapping approaches for 330 and 15 000 cm, while for 100 cm results were similar, possibly due to the limited number of SWR observations for 100 cm. Major sources of uncertainty identified included the geographical clustering of the data and the limitation of the covariates to represent the naturally high variation of SWR.
Information
Address
Wageningen, Netherlands
Head of institution
Rik van den Bosch