European Journal of Soil Science, 2014 doi: 10.1111/ejss.12199
Predicting Scottish topsoil organic matter content
from colour and environmental factors
M. J. A, D. D, L. S, D. G. M, M. C. C
&H.I.J.B
The James Hutton Institute, Craigiebuckler, Aberdeen, AB15 8QH, UK
Summary
Assessment of soil organic matter content using laboratory analysis can be costly and time consuming, so limiting
how often land managers assess this important property. This work demonstrates an ability to estimate topsoil
organic matter content from field observations alone and provides a method by which rapid and cost-effective
assessments of soil organic matter status may be made. Models using environmental factors from the National Soil
Inventory of Scotland (NSIS) dataset as inputs to a neural network model were used to predict loss on ignition
(LOI). Two models, one for all soils and one for soils with small organic matter contents (LOI < 20%), were
developed. It was found that the model developed for all soils produced reasonable predictive results across the
entire LOI range (R² = 0.877), although it was not as effective at predicting small LOI values (R² = 0.354) as the
small organic matter content model (R² = 0.674). Both models were tested with imagery and data from samples
outwith the NSIS dataset to validate the approach. Predictive results were less accurate than when using NSIS
data. A discussion of possible improvements to make the model useful for field observations of soils is given.
Introduction
Soil organic matter (SOM) controls a host of soil functions and
ecosystem services, and the development of effective policies and
monitoring tools for ensuring that SOM contents are maintained
or increased is a high priority (Orr et al., 2008). Much current
European soil policy-relevant research is focused on assessing and
improving SOM content (Glenk & Colombo, 2011), and future
policy objectives are likely to be even more concerned with this,
and with encouraging farmers to manage their land in a manner that
will enhance ecosystem services. This is a complex and politically
sensitive topic that has received a great deal of attention in
recent years (Lal, 2009; Robbins, 2011). There are well-understood
agricultural management strategies such as no-till, set-aside or
cover crops that would improve soil agricultural productivity and
increase other ecosystem service provisions (Kassam et al., 2009;
Lal, 2010). However, to provide incentives for these strategies
through farmer payment schemes requires careful auditing not only
of management strategies, but also of their impact on the soil.
One vital component of any auditing system will be accurate,
cost-effective and rapid monitoring of ecosystem service indicators
across managed land.
Correspondence: M. J. Aitkenhead. E-mail: matt.aitkenhead@hutton.ac.uk
Received 18 June 2013; revised version accepted 15 September 2014
Kibblewhite et al. (2008) argue that measurement of individual
soil properties does not provide an accurate indication of soil
health, because of the complexity and integrative nature of process
interactions within the soil. However, it is possible for specific
properties to be used as indicators of specific ecosystem services
(Haines-Young & Potschin, 2009; Maes et al., 2013) and SOM is a
particularly useful example. The measurement of SOM is relevant
to the determination not only of how much carbon is being stored,
but also how much farmers could be paid for keeping it stored.
Saby et al. (2008) and Aalders et al. (2009) emphasize the need for
effective national soil monitoring networks (SMNs) that are able to
monitor changes in SOM content at the regional and national scales.
Challenges to the measurement of SOM include: spatial variability
(Conant et al., 2011), which implies a need for a high spatial
density of measurements; the influence of many different factors such as
land use and soil type (Martin et al., 2011; Van Wesemael et al.,
2011), which can be addressed with a stratified sampling approach;
measuring changes in SOM content over time (Chapman et al.,
2013); and the need to obtain measurements rapidly and cheaply
in order to make the monitoring cost-effective. This last challenge
has received a great deal of attention in recent years (see McHenry,
2009), and there have been some breakthroughs in monitoring
soil spectral properties, for example with remote sensing (Croft
et al., 2012) or field spectroscopy techniques (Stevens et al., 2008;
Bellon-Maurel & McBratney, 2011).
© 2014 British Society of Soil Science
Relationships between SOM and soil spectral characteristics have
been known to exist for some time (Barouchas & Moustakas, 2004).
Soil spectroscopy often uses wavelengths outside those visible to
humans, although visible wavelengths provide useful information
(La et al., 2008; Aitkenhead et al., 2012). While spectroscopic
tools (Vasques et al., 2010) may make it possible to measure
SOM more accurately than basic colour descriptors such as
Munsell (Munsell Color Company, 1954) or RGB (the red, green
and blue values assigned to computer display pixels), their equipment
and staff costs make it difficult to use them routinely in soil surveys.
The aim of this work is to demonstrate an approach that relies on a suite
of easily measured image colour properties to provide rapid and
accurate SOM assessment for land managers.
Improvements to SOM estimation can be made by including
information such as soil class or texture (Suuster et al., 2012),
topographic characteristics (Chaplot et al., 2001), climate or veg-
etation (Zhang et al., 2011). Here we use a neural network model
to integrate a number of ancillary properties to make predictions
of SOM content. The resulting system, which has been imple-
mented within a mobile phone application, is cost-effective and
rapid in allowing land managers to assess one of the key indica-
tors of soil health, and requires little or no expert knowledge to use.
Neural networks are particularly effective in capturing the relation-
ships between environmental factors and SOM content, as long as
the model parameterization, architecture and training approach are
selected appropriately (Li et al., 2013).
Our aim was to develop and demonstrate a method for estimating
soil organic matter (SOM) values from information on soil colour
and site characteristics. While we acknowledge that from a green-
house gas or carbon budget perspective, soil organic carbon (SOC)
information is more useful than SOM, we have used SOM rather
than SOC in this work. The reasons for this are that (i) farmers and
other land managers are usually more familiar with the concept of
organic matter content and (ii) SOM is accommodated on a scale
between none (0%) and all (100%) within the soil, making it eas-
ier to understand where on this universal scale a particular soil lies.
The SOC upper limits in soil are harder to pin down, making com-
prehension of the position of a soil in relation to others less easy to
describe. Work by De Vos et al. (2005) indicates that it is possible
to estimate SOC from loss on ignition values, implying that in situ-
ations where only LOI is known, it is still possible to estimate SOC
values if they are preferred.
Materials and methods
NSIS data preparation
A national grid-based survey of Scotland’s soils (National Soil
Inventory of Scotland, NSIS1) was first carried out from 1978
to 1988 by the Soil Survey of Scotland. During this period, the
Macaulay Institute for Soil Research (now The James Hutton
Institute) was engaged in a programme to map the soils of Scotland
at 1:250 000 scale and NSIS1 was designed to create a dataset that
would hold environmental, morphological and analytical data on a
systematic grid basis and assist in ground-truthing for the mapping.
Later survey work to repeat some of this has been designated NSIS2,
but only the NSIS1 data have been used here. We refer to this
throughout as NSIS data.
Between 1978 and 1988, sample sites were located at every 5-km
intersection of the Ordnance Survey national grid (some sites were
not visited because of problems of access). Soil information was
recorded with a standard proforma to capture site and environmental
data. At each 10-km intersection, a full survey pit was also dug, and
the soil classified and described (Lilly et al., 2010). Samples were
taken from each morphological horizon using standard protocols
(Lilly et al., 2010) and returned to the laboratory for analysis. Each
sample was prepared by air drying and sieving (plus milling for
some analyses), then subjected to a standard list of physical, chemi-
cal and biological analyses. The following information derived from
this sampling and analysis was used in this work.
(1) Site and environmental data, including elevation, slope, vegetation
and climate (mean monthly temperature and rainfall
interpolated from UK Meteorological Office data). These data
were extracted from spatial layers by using the known location
of the samples.
(2) Horizon depth and colour (middle of horizon taken as depth,
and Munsell colour in the field).
(3) Loss on ignition (LOI; sample weighed, then dried at 105 °C
for a minimum of 2 hours, cooled and weighed, then heated at
900 °C for 2 hours, cooled again and re-weighed to determine
the loss on ignition).
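The LOI determination in item (3) reduces to a single ratio. The sketch below assumes, as is conventional but not stated explicitly here, that LOI is expressed as a percentage of the 105 °C oven-dry mass and that container masses have already been subtracted; the function name is hypothetical.

```python
def loss_on_ignition(dried_mass_g: float, ignited_mass_g: float) -> float:
    """LOI (%) relative to the oven-dry (105 °C) mass.

    Assumes both masses are net of the crucible or container.
    """
    if dried_mass_g <= 0:
        raise ValueError("oven-dry mass must be positive")
    if not 0 <= ignited_mass_g <= dried_mass_g:
        raise ValueError("ignited mass must lie between 0 and the oven-dry mass")
    return 100.0 * (dried_mass_g - ignited_mass_g) / dried_mass_g


# Illustrative masses only (not NSIS data): 10.00 g oven-dry, 9.25 g after ignition.
print(loss_on_ignition(10.0, 9.25))  # 7.5
```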
Figure 1 shows the distribution of the NSIS sample points
across Scotland and the location and distribution of the Hartwood
sample points (see section ‘Additional testing and incorporation of
photoimagery data’ for further explanation), and Table 1 lists the
input variables used for development of the neural network model,
with the possible values or ranges of values associated with each.
Many of the variables in Table 1 were used directly as inputs
for the neural network models described below, with normalization
carried out to fit them into a standard range. These included the
colour properties (which were derived from original soil Munsell
colour using a conversion table developed specifically for this
purpose), temperature and rainfall. The conversion of Munsell
codes to RGB is not appropriate at certain large Value and Chroma
numbers as the range of RGB values is in effect not large enough
to cover all of the possible Munsell values, but within the range of
colours found within Scottish soils this was not an issue. The same
issue does not exist for the CIELab colour coordinate system (CIE,
1932), which was considered as an alternative to RGB but offers only
minor improvements to the prediction of soil characteristics and has
additional processing costs (Aitkenhead et al., 2013). Most soils in
Scotland occupy the smaller value and chroma ranges of the hues Y
and YR, with a smaller number of samples being of hue R, G, GY,
G and BG. Soils in other parts of the world occupy a wider range
of Munsell colour codes than those in Scotland, but very few would
fall outside the range of colours possible using the RGB system,
so the methodology developed here would still be appropriate.
Problems with converting Munsell colour codes to the RGB system
occur generally with large value numbers such as ‘extremely pale’,
which are relatively rare although not unseen.
Figure 1 Map showing the distribution of points in the National Soil
Inventory of Scotland and of the sample points used in the Hartwood dataset
(points in yellow are the 32 used from the overall distribution of samples in
Hartwood).
Conversion tables from Munsell colour codes to RGB were developed through a
combination of internet-based sources, many of which are relatively
old and contained only partial look-up tables. One of the most useful
sites found was that at http://ccc.orgfree.com/ (Boronkay, 2012),
which supplies a Microsoft Excel spreadsheet containing a number
of conversion utilities. It is possible to find other packages to convert
between Munsell and RGB values, for example within the statistical
package R. However, we did not have sufficient experience in the
use of these packages and so developed our own sub-routine, which
made use of the above look-up table.
Other variables, by their nature or range of values, required
manipulation into a form more suitable for model input. These
included the following.
(1) Elevation: the majority of values were at relatively low elevation,
and it was thought that using a linear normalization would
result in the higher sampling points masking the effects of small
changes at low elevations. Therefore, the square root of the elevation
in metres was taken to reduce this effect.
(2) Slope: as for elevation, most slope values were relatively small
and the square root of each slope value in degrees was therefore
used.
(3) Aspect: the values for aspect were originally given as degrees
measured in an anticlockwise manner from the north. Using
a linear normalization therefore results in a discontinuity
between values that are slightly east and slightly west of
north. To correct for this, aspect has been given as two values,
absolute degrees from north and absolute degrees from east.
This removes any discontinuity, but makes it necessary to have
two values in order to identify properly the original aspect
value.
(4) Slope form: this is given as a descriptive term in the original
survey, with four possible terms described using four dummy
variable inputs (A, flat; B, concave; C, convex; D, straight).
(5) Slope type: for the same reason as for slope form, this property
was expressed as multiple inputs (three in this case). The same
reasoning was applied to site drainage, soil drainage, vegetation
and soil type.
Table 1 Input characteristics used for development of the neural network
model
Characteristic            Range/values
Topsoil R (red)           0–255
Topsoil G (green)         0–255
Topsoil B (blue)          0–255
Subsoil R (red)           0–255
Subsoil G (green)         0–255
Subsoil B (blue)          0–255
Elevation / m             1–1200
Slope / °                 0–90
Aspect (north)            0–180
Aspect (east)             0–180
Slope form                Flat, Concave, Convex, Straight
Slope type                Flat, Complex, Simple
Site drainage             Normal, Receiving, Shedding
Mean temperature / °C     0–20
Mean rainfall / mm        0–4000
Soil drainage             Excessive, Free, Moderate, Imperfect, Poor, Very poor
Vegetation                Deciduous, Coniferous, Arable (crop), Grassland (improved),
                          Grassland (rough), Heath, Bog
Soil type                 Alluvial, Calcareous, Brown earth, Gley, Peat, Podzol,
                          Ranker, Regosol
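As an illustration of transformations (1) to (5), the following sketch encodes a few site variables for the input layer. The scaling constants come from the ranges in Table 1, but the exact normalization used in the original models is not reported, so these constants, the function names and the placement of east at 270° on the anticlockwise aspect scale are assumptions.

```python
import math
from typing import Sequence

# Assumed scaling constants, taken from the ranges listed in Table 1.
MAX_ELEVATION_M = 1200.0
MAX_SLOPE_DEG = 90.0
SLOPE_FORMS = ("flat", "concave", "convex", "straight")


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute angle between two bearings, in the range [0, 180]."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def encode_aspect(aspect_deg: float, east_deg: float = 270.0) -> tuple[float, float]:
    """Aspect as (degrees from north, degrees from east), each scaled to [0, 1].

    east_deg = 270 assumes aspect is measured anticlockwise from north, so east
    lies at 270 degrees on that scale; use 90 for clockwise bearings.
    """
    return (angular_distance(aspect_deg, 0.0) / 180.0,
            angular_distance(aspect_deg, east_deg) / 180.0)


def encode_sqrt(value: float, max_value: float) -> float:
    """Square-root transform, scaled so that max_value maps to 1."""
    return math.sqrt(max(value, 0.0)) / math.sqrt(max_value)


def one_hot(category: str, categories: Sequence[str]) -> list[float]:
    """Dummy-variable encoding: one input node per category, 1.0 for the observed one."""
    if category not in categories:
        raise ValueError(f"unknown category: {category!r}")
    return [1.0 if c == category else 0.0 for c in categories]


def encode_site(elevation_m: float, slope_deg: float, aspect_deg: float,
                slope_form: str) -> list[float]:
    """Partial input vector covering elevation, slope, aspect and slope form."""
    return [
        encode_sqrt(elevation_m, MAX_ELEVATION_M),
        encode_sqrt(slope_deg, MAX_SLOPE_DEG),
        *encode_aspect(aspect_deg),
        *one_hot(slope_form, SLOPE_FORMS),
    ]


# Example site: 300 m, 4 degree slope, aspect 10 degrees anticlockwise from north, convex.
print(encode_site(300.0, 4.0, 10.0, "convex"))
```

Because the two aspect values are angular distances, bearings of 1° and 359° both map to nearly zero distance from north, which is the discontinuity the text sets out to remove.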
An examination of the coefficient of determination between
individual predictors was carried out, to determine whether it
was possible to simplify the inputs meaningfully. Results showed
that when using an R² value of 0.9 as a threshold, there were
strong relationships between: elevation and some temperature
monthly variables; elevation and some rainfall monthly variables;
temperature monthly variables ‘near’ one another in time (for
example between mean monthly temperatures for May and June,
but not between May and December); and also rainfall monthly
variables ‘near’ one another.
Figure 2 Conceptual diagram of a feed-forward
fully-connected neural network as used in this work.
The number of input nodes equals the number of input
variables, while the number of nodes in the hidden layer
equals twice that number.
Because climate variables showed
strong coefficients of determination with one another in many
cases, there was an argument to be made for reducing the number
of variables used. However, uncertainty about whether or not
seasonality of climate was important in affecting SOM led to
the decision to retain all monthly climate variables. A further
investigation to find out if SOM values were correlated strongly
with any input variables showed that for the full dataset with
LOI range 0–100%, the majority of the input variables had R²
values of less than 0.1. The exceptions to this were bog vegetation
(R² = 0.161) and the presence or absence of peat soil (R² = 0.171).
Similarly, R² values were small for the dataset with the LOI range
of 0–20%. We therefore concluded that no single input variable
could be used to predict SOM content.
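The pairwise screening described above can be sketched as follows, treating the coefficient of determination between two predictors as the squared Pearson correlation (an assumption; the text does not state how R² was computed). The synthetic elevation, temperature and rainfall values are invented purely to exercise the function and do not come from the NSIS data.

```python
import itertools

import numpy as np


def collinear_pairs(X, names, threshold: float = 0.9):
    """Return (name_i, name_j, R²) for predictor pairs whose squared correlation meets threshold."""
    r = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)  # columns are variables
    pairs = []
    for i, j in itertools.combinations(range(len(names)), 2):
        r2 = float(r[i, j] ** 2)
        if r2 >= threshold:
            pairs.append((names[i], names[j], r2))
    return pairs


# Synthetic illustration only: temperature is made to decrease with elevation.
rng = np.random.default_rng(0)
elevation = rng.uniform(0.0, 1000.0, 500)
temperature = 10.0 - 0.0065 * elevation + rng.normal(0.0, 0.2, 500)
rainfall = rng.uniform(500.0, 3000.0, 500)
X = np.column_stack([elevation, temperature, rainfall])
pairs = collinear_pairs(X, ["elevation", "temperature", "rainfall"])
print(pairs)
```

The same squared-correlation calculation, applied between each input column and LOI, gives the single-predictor R² values quoted above.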
Neural network architecture and training
The neural network model used was a feed-forward
back-propagation network (Bishop, 1995; Goh, 1995) with one
hidden layer. The training algorithm for this network uses incremental
changes in the connection weights over many (usually
several thousand) training cycles to minimize the error between
actual and target outputs at output nodes (and the nodes in the
hidden layer between input and output layers; see Figure 2, which
gives a schematic of the architecture and connectivity of a standard
artificial neural network). Input nodes for a neural network of this
kind accept values in the range [0, 1], meaning that variables need
to be adjusted to fit. For continuous variables such as elevation or
RGB colour codes, the relevant values need only be normalized
within the possible range. For variables that have a number of
different categories, however, such as soil or land-cover type, we
have used dummy variables in the same manner as for slope form
described earlier.
The node response function used to determine the activation level
of all hidden and output nodes was that given in Equation (1):
$$y = \frac{1}{1 + e^{-\beta x}}, \qquad (1)$$
where y is the output activation and lies in the range [0, 1], x is
the input activation [−∞, +∞], e is Euler’s number (approximately
2.71828) and 𝛽 is the node response variable [0, +∞]. The number
of nodes in the hidden layer (86) was equal to twice the number
of input nodes (43), in accordance with Kolmogorov’s theory on
neural network architecture (Bishop, 1995). In order to optimize
the training rate 𝛼 (which controls the rate at which connection
weights are adjusted) and node response variable 𝛽 (which controls
the sensitivity of each node’s activation level to input values), all
combinations of the values (𝛼 = 0.0001, 0.001, 0.01, 0.02, 0.05, 0.1
and 𝛽 = 0.1, 0.2, 0.5, 1.0, 2.0, 5.0) were used in training a network
for 10⁵ steps, which took approximately 1 hour on a standard
desktop PC using Microsoft VB6.0 to implement the NN models.
From the results of comparing R², RMSE (root mean square error)
and MAE (mean absolute error) using the cross-validation approach
described below, the values of 𝛼 = 0.02 and 𝛽 = 1 were used.
The MAE values varied only slightly across the different parameter
combinations, while very slightly larger values of R² were obtained
with larger values of 𝛼, and a similar pattern was found for RMSE with
smaller values of 𝛽. The combination of values used was therefore considered
to provide the best balance across the statistical evaluation measures.
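A minimal NumPy sketch of the architecture and training scheme described above follows: Equation (1) as the node response function, a hidden layer twice the size of the input layer, and incremental (per-sample) weight updates scaled by the training rate 𝛼. The original models were written in Microsoft VB6.0; the bias terms, weight initialization and squared-error loss here are assumptions, the class name is hypothetical, and the 𝛼 and 𝛽 grid is listed but not searched. The demonstration data are synthetic.

```python
import itertools

import numpy as np

# Candidate values from the text; each combination would be trained and
# compared by cross-validation (not done here).
ALPHAS = [0.0001, 0.001, 0.01, 0.02, 0.05, 0.1]
BETAS = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
GRID = list(itertools.product(ALPHAS, BETAS))


def node_response(x, beta: float):
    """Equation (1): y = 1 / (1 + exp(-beta * x))."""
    return 1.0 / (1.0 + np.exp(-beta * x))


class FeedForwardNet:
    """One-hidden-layer feed-forward network trained by incremental back-propagation."""

    def __init__(self, n_inputs: int, alpha: float = 0.02, beta: float = 1.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        n_hidden = 2 * n_inputs  # e.g. 43 inputs -> 86 hidden nodes
        self.alpha, self.beta = alpha, beta
        # Assumed initialization: small random weights, zero biases.
        self.W1 = rng.normal(0.0, 0.05, (n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.05, n_hidden)
        self.b2 = 0.0

    def _forward(self, x):
        h = node_response(self.W1 @ x + self.b1, self.beta)
        y = float(node_response(self.w2 @ h + self.b2, self.beta))
        return h, y

    def predict(self, x) -> float:
        return self._forward(x)[1]

    def train_step(self, x, target: float) -> None:
        """Single incremental update that reduces 0.5 * (y - target) ** 2."""
        h, y = self._forward(x)
        # Derivative of Equation (1): dy/dx = beta * y * (1 - y).
        delta_out = (y - target) * self.beta * y * (1.0 - y)
        delta_hidden = delta_out * self.w2 * self.beta * h * (1.0 - h)  # uses w2 before update
        self.w2 -= self.alpha * delta_out * h
        self.b2 -= self.alpha * delta_out
        self.W1 -= self.alpha * np.outer(delta_hidden, x)
        self.b1 -= self.alpha * delta_hidden

    def mse(self, X, targets) -> float:
        return float(np.mean([(self.predict(x) - t) ** 2 for x, t in zip(X, targets)]))


# Toy demonstration on synthetic inputs already scaled to [0, 1] (not NSIS data).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, (200, 43))
targets = 0.1 + 0.2 * X[:, :3].mean(axis=1)  # stays inside [0.1, 0.9]
net = FeedForwardNet(n_inputs=43, alpha=0.02, beta=1.0)
mse_before = net.mse(X, targets)
for step in range(2000):
    i = step % len(X)
    net.train_step(X[i], targets[i])
mse_after = net.mse(X, targets)
print(f"MSE before: {mse_before:.4f}, after: {mse_after:.4f}")
```

Larger 𝛽 steepens Equation (1) and so also scales the gradients, which is one reason 𝛼 and 𝛽 are tuned jointly rather than separately.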
Training was carried out by splitting the dataset described in
section ‘NSIS data preparation’ (2614 data points) into 10 subsets of
approximately equal size, by assigning each data point to a subset
at random. We then used tenfold cross-validation training, in which
10 models were each trained using nine out of the 10 subsets, with
the final subset in each case used for testing of that model. Each
model was tested using a different subset, to allow robust ‘blind’
validation of the models while at the same time making full use of
all available data points. In order to avoid the problems caused by
attempting to train the NN to give values of 0 and 1 for the smallest
and largest output values (which according to Equation (1) would
require inputs of −∞ and +∞, respectively), the output range was
adjusted to lie within the range [0.1, 0.9] by normalizing along a
linear scale within this range. The consequence of this is that output
values of the trained network tend to fall within the same range,
and they were converted back to the range [0, 1] after output. After the
final training with the parameter values given earlier, the network
was evaluated using the test dataset. Values of RMSE, R², MAE
and mean error (ME) for the actual and predicted values were calculated.
When given for the cross-validation training, these values are calculated
across the full dataset rather than for one of the validation subsets.
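The random assignment to 10 subsets and the linear mapping of the target onto [0.1, 0.9] can be sketched as below. Assigning balanced fold labels and then shuffling them is one way (assumed here) to obtain subsets of approximately equal size; for the small organic matter content model the upper bound would be 20% rather than 100%. Function names are hypothetical.

```python
import numpy as np


def assign_folds(n_points: int, k: int = 10, seed: int = 0) -> np.ndarray:
    """Randomly assign each data point to one of k subsets of near-equal size."""
    rng = np.random.default_rng(seed)
    folds = np.arange(n_points) % k  # balanced labels 0..k-1
    rng.shuffle(folds)               # random assignment of points to subsets
    return folds


def scale_target(loi_percent, upper: float = 100.0):
    """Map LOI linearly from [0, upper] % onto the network output range [0.1, 0.9]."""
    return 0.1 + 0.8 * np.asarray(loi_percent, dtype=float) / upper


def unscale_output(y, upper: float = 100.0):
    """Invert scale_target; outputs outside [0.1, 0.9] map outside [0, upper] %."""
    return (np.asarray(y, dtype=float) - 0.1) / 0.8 * upper


folds = assign_folds(2614, k=10)
test_fold = 0
train_idx = np.flatnonzero(folds != test_fold)  # nine subsets used for training
test_idx = np.flatnonzero(folds == test_fold)   # held-out subset used for testing
print(len(train_idx), len(test_idx), np.bincount(folds).tolist())
```

Repeating the split with `test_fold` running from 0 to 9 gives the ten models, each tested on a different subset.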
In addition to training a neural network model using the full
dataset containing LOI values between 0 and 100%, a secondary
dataset that contained LOI values between 0 and 20% (1665 data
points) was generated. This was done in order to determine whether
or not a model restricted to small LOI values would be more
accurate within this range than the model trained on the full range
of values. As a priority of this work is to produce a model that could
be used by agricultural land managers, it is important to have a
model that operates best within the small organic matter range most
commonly found on agricultural soils in Scotland. The secondary
dataset output values were adjusted to fit in the range [0.1, 0.9]
as above, and training was once again carried out using tenfold
cross-validation. The same statistical evaluation as described earlier
was carried out on the network trained and tested using these
secondary datasets.
Additional testing and incorporation of photoimagery data
To carry out further validation of the two NN models (full range
and small organic matter content), a soil organic matter dataset was
used from field experiments at the Hartwood Research Station in
Lanarkshire, Scotland. This is an upland farming area of 350 ha,
between 150 and 300 m a.s.l. None of the data used for testing
from this area was used in the training of the neural network
models described above. Thirty-two observations were selected
from a total of 319 made at the site (these were the only ones for
which imagery was available; see later), with the selected points
distributed spatially over the whole study area and including all
of the possible soil and land-cover types. Of the 32 sample points
used, 20 had topsoil LOI values less than 20% and were used
for testing the ‘mineral soil’ network model. The GPS locations
for each observation allowed the relevant environmental variables
given in Table 1 to be determined from existing spatial datasets.
A Nikon E5000 (Nikon, Tokyo, Japan) mid-range compact camera
had been used to obtain digital photographs of the soil at each
observation site, allowing RGB values to be estimated directly
from the images.
Figure 3 Example image taken at Hartwood of a soil core used for colour
evaluation.
This estimate of colour was carried out after
adjustment of the colour values in each image with an automated
colour-correction method designed to balance the RGB values
of a white sheet of paper shown in each image. Optimal colour
correction using a standardized ‘colour card’ was not possible
because the imagery was obtained before the decision to use it for
organic matter content estimation, and so no standardized colour
correction card was present in the images. However, visual
analysis of features in the corrected imagery such as the auger, grass
or the clipboard used indicated that the image colour, and therefore
presumably the natural soil colour, had been correctly restored by
the automated colour correction process. Figure 3 gives an example
of the imagery acquired during field sampling at Hartwood, from
which the relevant area of the image (the soil in the auger) was
cropped and the RGB values averaged over a window of size
10 × 10 pixels for both the topsoil and subsoil.
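The colour-correction method is not specified beyond balancing the RGB values of the white sheet of paper. One simple reading, assumed in the sketch below, is a per-channel gain (white-patch) correction that maps the mean colour of the paper to (255, 255, 255), followed by averaging RGB over a 10 × 10 pixel window. The synthetic image stands in for a photograph, and the function names are hypothetical.

```python
import numpy as np


def white_patch_correct(image: np.ndarray, white_region: tuple[slice, slice]) -> np.ndarray:
    """Scale each RGB channel so that the mean colour of the white reference becomes 255."""
    img = image.astype(float)
    white = img[white_region].reshape(-1, 3).mean(axis=0)
    gains = 255.0 / np.maximum(white, 1e-6)  # guard against division by zero
    return np.clip(img * gains, 0.0, 255.0)


def window_mean_rgb(image: np.ndarray, row: int, col: int, size: int = 10) -> np.ndarray:
    """Mean R, G and B over a size x size window with top-left corner at (row, col)."""
    window = image[row:row + size, col:col + size]
    return window.reshape(-1, 3).mean(axis=0)


# Synthetic 40 x 40 image: paper on the left with a bluish cast, soil on the right.
image = np.zeros((40, 40, 3))
image[:, :20] = (200.0, 220.0, 250.0)
image[:, 20:] = (100.0, 80.0, 60.0)
corrected = white_patch_correct(image, (slice(0, 40), slice(0, 20)))
topsoil_rgb = window_mean_rgb(corrected, row=5, col=25)
print(topsoil_rgb)
```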
Results
Full organic matter content range model
The neural network model trained with all 2614 data points with
organic matter values ranging from 0.81 to 98.7% predicted SOM
with an R² value of 0.877. The best-fitted straight line produced for
comparing real and predicted values had a gradient of 0.730 and a
y-axis intercept of 5.34 (Figure 4). The RMSE value for the
entire test dataset was 11.13%, the mean error (ME) was +2.15% and the MAE
was 5.87%. However, Figure 5 shows that when the RMSE values
were plotted against LOI values grouped within 1% LOI intervals
across the test data (as shown by Martin et al., 2011), this RMSE
value was not consistent across all LOI values. This curve can be
partially explained by the proportionally infrequent occurrence of
LOI values in the range between 20 and 90%, implying that the
network is less well trained on ‘organo-mineral’ soils than it is on
mineral or organic soils. For values greater than 93%, the RMSE
dropped once more as the number of samples increased.
Figure 4 Actual plotted against predicted values of LOI for the testing
dataset and using all proportions of organic matter between 0 and 100%.
Some predicted values lie outwith the range [0, 100%] because of the
re-adjustment of NN output values caused by initial normalization of the
training values to make them lie in the range [0.1, 0.9].
Figure 5 RMSE values plotted against LOI for all LOI values, grouped
within intervals of 1% LOI.
Some of the predicted SOM values in Figure 4 were less than 0 or more
than 100% because normalization was used during training to fit the
neural network output values between 0.1 and 0.9. If an output of less
than 0.1 is given, for example, converting the outputs back to the range
[0, 100%] will result in negative values. In practice, values outwith the
range of possible values (less than 0 or more than 100%) should be rounded
to the nearest ‘possible’ value of 0 or 100%, respectively.
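The rounding rule above, together with the RMSE-by-interval analysis behind Figure 5, might be implemented as follows. Grouping by the actual (rather than predicted) LOI value is an assumption, the function names are hypothetical, and the example values are made up to be easy to check by hand.

```python
import numpy as np


def clip_loi(pred_percent):
    """Round predictions outside the possible range to 0 or 100 %."""
    return np.clip(np.asarray(pred_percent, dtype=float), 0.0, 100.0)


def binned_rmse(actual, predicted, bin_width: float = 1.0) -> dict[float, float]:
    """RMSE within bins of actual LOI, keyed by the lower edge of each bin."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    edges = np.floor(actual / bin_width) * bin_width
    result = {}
    for edge in np.unique(edges):
        in_bin = edges == edge
        result[float(edge)] = float(np.sqrt(np.mean((predicted[in_bin] - actual[in_bin]) ** 2)))
    return result


print(clip_loi([-3.2, 50.0, 104.1]))
print(binned_rmse([0.5, 0.75, 1.25, 1.5], [1.5, 0.75, 1.25, 2.5]))
```

Bins containing few samples, such as the 20–90% range discussed above, will give noisier RMSE values, which is worth keeping in mind when reading curves like Figure 5.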
The integration of colour and site descriptor data as inputs to the
model was assumed to provide a better model than using colour
or site descriptors alone. In order to test this assumption, the LOI
full-range dataset was used to develop models with (A) colour only
and (B) site descriptors only. The R² value obtained with colour
alone was 0.424, while that obtained using only site descriptors
was 0.605. Compared with the R² of 0.877 obtained with all data,
this gives a clear indication that using both types of information
produces a better model.
Figure 6 Actual plotted against predicted values of LOI for the testing
dataset with all proportions of organic matter between 0 and 20%. Some
predicted values lie outwith the range [0, 1] due to the readjustment of NN
output values caused by initial normalization of the training values to make
them lie in the range [0.1, 0.9].
Small organic matter content model
When tested on data containing LOI values of less than 20% alone,
the NN model trained on the full range of LOI values gave an
R² value of 0.354 and an RMSE of 7.31%. In addition, the MAE
was 4.10% and the ME was +1.84%. The RMSE, MAE and ME
values were improvements on those given for the dataset with the
full range of LOI values. However, this performance was still not
good enough to be applied in the field, based on knowledge of (1)
ranges of organic matter content in agricultural soil, (2) the impacts
of land management on the organic matter content of these soils, and
(3) the estimated accuracies available from traditional laboratory-based
LOI measurements. Nearly all cultivated land has LOI values
of less than 20%, so the first NN model would not only provide
poor predictions for farmers but would also not be focused on the necessary
range of values. The second network, trained only on LOI values of
less than 20%, gave an R² value of 0.674, an RMSE of 1.842%,
a mean absolute error of 1.327% and a mean error of 0.938%.
These values are an improvement on the ‘full-range’ model, and
demonstrate the effectiveness of the approach. Figure 6 shows the
relationship between predicted and target values of LOI for the
points with values less than 20% from the small organic matter
content model, while Figure 7 shows the relationship between LOI
and RMSE for this model, as found by Martin et al. (2011). Figure 7
shows that the RMSE increases from a minimum near zero organic
matter content to between 2 and 2.5% at 20% LOI. This matches
what is seen in Figure 5 for the same LOI range, but with much
smaller RMSE values. With these levels of accuracy, predictions of
SOM are much more useful.
Figure 7 RMSE values plotted against LOI for all LOI values up to 20%,
grouped within intervals of 1% LOI.
Additional testing with Hartwood soils
The sample data from the Hartwood field station provided a further
test of the method, using field data obtained outwith the sampling
protocols of the NSIS survey, and with digital imagery used to
derive soil colour instead of Munsell estimates converted to RGB
values. Table 2 shows the R², RMSE, mean absolute error and
mean error values given by the two neural network models for
the Hartwood soils. For the ‘full range’ model, 32 sample values
were used, while for the ‘mineral soil’ model, 20 sample values
were used. As can be seen, the models gave R² values comparable
to those obtained with the NSIS test data (0.844 against 0.877, and
0.626 against 0.674), although RMSE values were larger (13.92 against
11.13%, and 3.04 against 1.842%), showing that the models can be applied
to the prediction of soil LOI values for new sites, albeit with some
loss of accuracy.
Table 2 Statistical evaluation of the predictions made by the two neural
network LOI prediction models against actual LOI values for Hartwood soils
Statistic                  Full range (N = 32)   Small LOI (N = 20)
Regression gradient        0.715                 0.488
Regression intercept       5.78                  2.71
R²                         0.844                 0.626
RMSE / %                   13.92                 3.04
Mean absolute error / %    6.31                  1.97
Mean error / %             2.01                  1.22
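The statistics in Table 2 can be computed from paired actual and predicted values roughly as follows. The text does not say which variable was regressed on which, how R² was computed or which sign convention the mean error uses, so regressing predicted on actual, using the squared Pearson correlation and taking errors as predicted minus actual are all assumptions here; the function name and example values are hypothetical.

```python
import numpy as np


def evaluate(actual, predicted) -> dict[str, float]:
    """Regression gradient and intercept, R², RMSE, MAE and mean error for paired LOI values."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = p - a  # assumed sign convention: positive means over-prediction
    gradient, intercept = np.polyfit(a, p, 1)  # assumed: predicted regressed on actual
    r = np.corrcoef(a, p)[0, 1]
    return {
        "gradient": float(gradient),
        "intercept": float(intercept),
        "r2": float(r ** 2),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "me": float(np.mean(err)),
    }


# Made-up values chosen so the results are easy to check by hand.
stats = evaluate([2.0, 4.0, 6.0, 8.0], [3.0, 4.0, 5.0, 6.0])
print(stats)
```

A gradient below 1 with a positive intercept, as in Table 2, is the pattern expected when predictions are compressed towards the mean of the training data.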
Discussion
We have demonstrated the applicability of a neural network mod-
elling approach that can be used to predict soil LOI content from
observable environmental variables and soil colour. This approach
has been used to develop two models, one for soils with small
organic matter content and one for all soils. The small organic matter
content model is more accurate within its target range of LOI
values (0–20%), while the ‘full range’ model is more
accurate at small (mineral soil) LOI values than in the intermediate
‘organo-mineral’ range. Cultivated soils in the UK commonly
have small LOI values in comparison to forest or moorland
soils (although extensively grazed moorland soils can have organic
matter-rich layers), and the small-LOI neural network model predicts
LOI values with a degree of accuracy that allows soil organic
matter content to be rapidly estimated in the field. This therefore
indicates that the approach used here will be more applicable for
small LOI soils such as those under agriculture. While this approach
is not as accurate as laboratory analysis, it provides an assessment
of organic matter content that is potentially useful for mineral soils.
Although estimates of variation within LOI measurements vary for
control samples, normal gures quoted for accuracy are between 0.3
and 0.5% (Hoskins, 2002; Jason Owen, personal communication).
Recent work by Nocita et al. (2014) using Vis-NIR (visible–near
infrared) spectroscopy reports accuracy figures of between 0.36 and
1.19% for soil organic carbon (SOC) in mineral soils at the European
scale. If these figures are multiplied by a factor of between 1.5 and
2 to convert them to SOM, and SOM is assumed to be similar to LOI,
this suggests that spectroscopy may provide slightly better input data
than RGB values alone. However, spectroscopy relies on laboratory
measurements that are neither as rapid nor as cost-effective as the
field-based assessment possible with colour alone. De Vos et al. (2005)
showed that LOI could be used to estimate total organic carbon (TOC) in
soils with an R² of 0.98, even for soils with small organic matter
contents, and that the traditional multiplication factor of 0.58
matched the relationship well. If TOC and SOC are assumed to be
equivalent, an estimate of LOI can therefore be used to produce an
estimate of SOC that is useful to land managers and scientists alike.
Some error propagation in the conversion from LOI to SOC will, however,
reduce the accuracy of the prediction.
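The unit conversions in this paragraph can be written out as a short calculation. This is a sketch under stated assumptions, not part of the published method: it takes SOC ≈ 0.58 × LOI (the traditional factor discussed by De Vos et al., 2005, with TOC treated as equal to SOC), applies the 1.5–2 SOC-to-SOM factor to the Nocita et al. (2014) accuracy range (giving roughly 0.54–2.38% on an SOM scale), and propagates error to first order assuming independent errors. The function names, the example LOI of 10% and the relative uncertainty assigned to the conversion factor are hypothetical.

```python
import math

# Assumption: SOC ~= 0.58 x LOI (traditional factor; De Vos et al., 2005), with TOC taken as SOC.
SOC_PER_LOI = 0.58


def soc_error_to_som_range(soc_err_low, soc_err_high, factor_low=1.5, factor_high=2.0):
    """Scale an SOC error range (% by mass) onto an SOM scale using a 1.5-2 conversion factor."""
    return soc_err_low * factor_low, soc_err_high * factor_high


def loi_to_soc(loi_pct, loi_err_pct, factor=SOC_PER_LOI, factor_rel_err=0.0):
    """Convert an LOI estimate and its error (both %) to SOC and a propagated error.

    First-order propagation for SOC = f * LOI with independent errors:
    sigma_SOC = sqrt((f * sigma_LOI)**2 + (LOI * sigma_f)**2).
    factor_rel_err is a hypothetical relative uncertainty in f.
    """
    soc = factor * loi_pct
    sigma_f = factor * factor_rel_err
    sigma_soc = math.hypot(factor * loi_err_pct, loi_pct * sigma_f)
    return soc, sigma_soc


if __name__ == "__main__":
    # Nocita et al. (2014) SOC accuracy range expressed on an SOM scale (about 0.54-2.38%).
    print(soc_error_to_som_range(0.36, 1.19))
    # Hypothetical LOI of 10%, with the 1.8% all-soils test RMSE treated as an LOI error.
    print(loi_to_soc(10.0, 1.8))
    print(loi_to_soc(10.0, 1.8, factor_rel_err=0.1))
```

With no uncertainty in the factor, the propagated SOC error is simply 0.58 times the LOI error; a non-zero `factor_rel_err` adds a term that grows with the LOI value itself, which is one way the error propagation mentioned above could be quantified.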
We have shown that it is possible to assess LOI rapidly and cheaply
to within an acceptable degree of error, with the loss of accuracy
balanced against gains in speed and cost. If implemented within a
software tool, this approach could be useful for land managers in
assessing soil fertility and health. Recent work at the James Hutton
Institute has resulted in an application ('app') for Android and Apple
mobile phones (SOCiT) that uses this model for Scotland (Donnelly et
al., 2013). This app is potentially useful for assessing soil carbon
stocks and budgets over time, and provides a novel method of rapidly
monitoring the distribution of soil organic matter at small spatial
scales.
One potential issue with the use of the LOI models described here is
knowing which model to use. The NN model for predicting small (<20%)
LOI values worked better than the 'full range' model for mineral
soils, but unless it is already known that a soil has a relatively
poor organic matter status, it is not possible to decide when that
model should be applied. Land managers will usually know the
approximate organic matter content of their soils, and so should be
able to make that decision successfully, while a surveyor unfamiliar
with a specific site should be able to judge whether a soil has a
'small', 'medium' or 'large' organic matter status from the relative
colour, texture and structure of topsoil and subsoil. However, this
judgement is likely to be error-prone, particularly for soils with LOI
values approaching 20%, and this is an acknowledged weakness of the
system. Once the judgement has been made, it could be used to select
the model in the hypothetical software tool mentioned above. An
example of such a tool (applicable only to mineral soils on
agricultural and forested land in Scotland) is the SOCiT app
mentioned above (Donnelly et al.,
© 2014 British Society of Soil Science, European Journal of Soil Science
2013). Existing online information, such as SIFSS (soil information
for Scottish soils), can be used to indicate the range of values for
the soil series present at a specific location, and can also provide
information about the indicative soil type. Recent work at the James
Hutton Institute has produced an iPhone 'app' implementation of the
SIFSS web application, allowing it to be used in the field.
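The model-selection step described in this paragraph could be implemented as a simple dispatch on the judged organic matter status. The sketch below is hypothetical throughout: `predict_small_loi` and `predict_full_range` are placeholder stand-ins for the two trained networks (they return fixed values), mapping 'medium' and 'large' to the full-range model is an assumption, the 2% warning margin is invented for illustration, and none of this is the logic used in SOCiT. It rejects incomplete input, because the models cannot operate with missing values, and flags a 'small' judgement when the small-LOI estimate approaches the 20% limit of that model's range.

```python
from enum import Enum
from typing import Callable, Dict, Optional, Sequence, Tuple


class OMStatus(Enum):
    """Organic matter status judged in the field from topsoil/subsoil colour, texture and structure."""
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"


SMALL_LOI_UPPER = 20.0  # % LOI; upper limit of the small-LOI model's target range
FLAG_MARGIN = 2.0       # hypothetical margin below the limit at which to warn the user

Inputs = Sequence[Optional[float]]


def predict_small_loi(inputs: Inputs) -> float:
    """Placeholder for the small-LOI (<20%) neural network; returns a fixed value."""
    return 6.0


def predict_full_range(inputs: Inputs) -> float:
    """Placeholder for the 'full range' neural network; returns a fixed value."""
    return 35.0


# Hypothetical mapping from judged status to model.
MODELS: Dict[OMStatus, Callable[[Inputs], float]] = {
    OMStatus.SMALL: predict_small_loi,
    OMStatus.MEDIUM: predict_full_range,
    OMStatus.LARGE: predict_full_range,
}


def estimate_loi(status: OMStatus, inputs: Inputs) -> Tuple[float, bool]:
    """Return (LOI estimate in %, doubtful), where doubtful marks a questionable 'small' judgement."""
    if any(v is None for v in inputs):
        # The models cannot operate with missing inputs, so refuse rather than guess.
        raise ValueError("all model inputs must be supplied")
    loi = MODELS[status](inputs)
    # A small-LOI estimate within FLAG_MARGIN of the 20% limit suggests the judgement may be wrong.
    doubtful = status is OMStatus.SMALL and loi >= SMALL_LOI_UPPER - FLAG_MARGIN
    return loi, doubtful


if __name__ == "__main__":
    print(estimate_loi(OMStatus.SMALL, [0.42, 0.31, 0.22]))  # hypothetical inputs; prints (6.0, False)
```

Keeping the mapping in a dictionary means a tool could swap in retrained models, or add further organic matter classes, without changing the selection logic.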
We have not evaluated the neural network models to determine the
relative or absolute sensitivity of organic matter predictions to
individual inputs. Some of the input variables will be more influential
than others, and it would be useful in future work to determine
whether some inputs could be dropped from the model without reducing
the overall accuracy of the system. The model used here will not
operate if any of the input values are missing, and as some variables
are harder to measure in the field than others, eliminating some of
them could make the model easier to implement and use. Reducing the
number of input variables might also affect the accuracy of the
system by eliminating some sources of error, as each dataset used
will have some degree of error associated with it. There will also be
natural variation in the system that is not accounted for and that is
caused by other environmental factors not considered here. There are
doubtless also errors caused by additional factors, such as the effect
of soil moisture on colour and the natural variation of soil colour
caused by mineralogy. These sources of error would be difficult to
eliminate without detailed analysis of the soil, which would remove
the usefulness of the approach for rapid field-based soil assessment.
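The sensitivity analysis proposed here could be approached in several ways. One common, model-agnostic option, swapped in for illustration rather than taken from this study, is permutation importance: each input column is shuffled in turn and the resulting increase in prediction RMSE is recorded. Inputs whose shuffling barely changes the error would be candidates for removal, subject to retraining and re-testing. In the sketch, `toy_model` and the synthetic data are hypothetical stand-ins for a trained network and the NSIS inputs; only the Python standard library is used.

```python
import math
import random
from typing import Callable, List, Sequence

Row = List[float]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict: Callable[[List[Row]], List[float]],
                           X: List[Row], y: Sequence[float],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean increase in RMSE when each input column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = rmse(y, predict(X))
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with only column j permuted; all other inputs are unchanged.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, predict(X_perm)) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


# Hypothetical model: output depends strongly on input 0, weakly on input 1, not on input 2.
def toy_model(X: List[Row]) -> List[float]:
    return [10.0 + 4.0 * r[0] + 0.5 * r[1] for r in X]


data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
y = toy_model(X)  # synthetic targets, so the baseline error is zero
print(permutation_importance(toy_model, X, y))
```

In this synthetic setup the third input scores exactly zero because the toy model ignores it. With a real network, scores near zero would only suggest, not prove, that an input is dispensable, since correlated inputs can share importance.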
The models demonstrated here have been applied solely to the
prediction of soil organic matter content. However, recent work has
shown that the concept could also be applied to carbon budgets, with
the loss or gain of organic matter in a soil being predicted (Liles et
al., 2013) under different management and environmental conditions.
This is arguably a more useful application of the soil organic matter
model concept, as it would provide information about changes in SOC.
However, it is harder to obtain information about the rate of change
of organic matter in a soil than about its current organic matter
status, and acquiring sufficient data to train a model that could be
applied accurately across a whole country would require additional
investment in long-term monitoring networks.
A comparison with the work of Liles et al. (2013) is useful, as this
also aimed to predict soil carbon across a range of soil types and
environmental conditions. In that study the samples were prepared in
the laboratory (air-dried and sieved) and illuminated under
controlled conditions for colorimetry, and the statistical analysis
was carried out after grouping the samples by either soil type or
parent material. This sample preparation resulted in smaller RMSE
values for Liles et al. (0.35–0.8% for soils with <4% carbon, and
1.2% for soils with >4% carbon) than for the neural network model
trained across all soil types with field data (1.8% for testing
within the standardized dataset used). This difference in accuracy is
to be expected given the variation in lighting, soil moisture and
other conditions, and the fact that the Munsell colours provided for
the NSIS data were evaluated by eye. However, it does give an
indication of the level of accuracy that could be aimed for in the
future.
Conclusions
While we have developed an approach that is potentially useful for
assessing soil organic matter contents rapidly in the field, the
models developed here require improvement before they can be used to
detect changes in soil organic matter content caused by
land-management activities or other environmental drivers. Their
estimation accuracy must also be improved to make them more effective
for soils with very little organic matter. Improvements could be
sought in three different
ways: (i) improving the modelling approach, through the use of a
more sophisticated neural network training algorithm (or another
modelling method entirely, if it is demonstrably superior); (ii)
improving the colour sensor information, using colorimetric sensors
with better spectral resolution or accuracy or by adding available
multispectral remote sensing data; or (iii) increasing the amount
of information available from site characterization. This could
include additional topographic features, more detailed geological
information or information from more detailed soil maps than the
one used.
We have shown that neural network modelling can be used
to predict soil LOI content based on easily obtained, in situ
observations including soil colour determined by imagery. We
have also demonstrated that using colour or site character alone
produces less accurate models with this neural network method.
The approach has been used to develop two models, one that can
be applied to soils with any organic matter content and one that
can be applied to soils with small LOI values. This has potential
for a number of applications, including rapid soil organic matter
estimation and, if the accuracy is improved, monitoring changes in
soil C and the efficacy of management to enhance C sequestration.
Acknowledgements
The authors would like to thank QMS (Quality Meat Scotland)
for providing the co-funding for this work as a grant in aid award
matched to funding from The Scottish Government’s Rural and
Environment Science and Analytical Services Division (RESAS).
We would also like to thank Dr Keith Matthews, Dr Allan Lilly and
Dr Steve Chapman of the James Hutton Institute for information
and assistance provided.
References
Aalders, I., Hough, R.L., Towers, W., Black, H.I.J., Ball, B.C., Griffiths, B.S. et al. 2009. Considerations for Scottish soil monitoring in the European context. European Journal of Soil Science, 60, 833–843.
Aitkenhead, M.J., Coull, M.C., Towers, W., Hudson, G. & Black, H.I.J. 2012. Predicting soil chemical composition and other soil parameters from field observations using a neural network. Computers & Electronics in Agriculture, 82, 108–116.
Aitkenhead, M.J., Coull, M., Towers, W., Hudson, G. & Black, H.I.J. 2013. Prediction of soil characteristics and colour using data from the National Soils Inventory of Scotland. Geoderma, 200–201, 99–107.
Barouchas, P.E. & Moustakas, N.K. 2004. Soil colour and spectral analysis employing linear regression models. I. Effect of organic matter. International Agrophysics, 18, 1–10.
Bellon-Maurel, V. & McBratney, A. 2011. Near-Infrared (NIR) and Mid-Infrared (MIR) spectroscopic techniques for assessing the amount of carbon stock in soils – critical review and research perspectives. Soil Biology & Biochemistry, 43, 1398–1410.
Bishop, C.M. 1995. Neural Networks for Pattern Recognition. Oxford University Press, Oxford.
Boronkay, G. 2012. Colour Conversion Centre [WWW document]. URL http://ccc.orgfree.com/ [accessed on 21 August 2014].
Chaplot, V., Bernoux, M., Walter, C., Curmi, P. & Herpin, U. 2001. Soil carbon storage prediction in temperate hydromorphic soils using a morphologic index and digital elevation model. Soil Science, 166, 48–60.
Chapman, S.J., Bell, J.S., Campbell, C.D., Hudson, G., Lilly, A., Nolan, A.J. et al. 2013. Comparison of soil carbon stocks in Scottish soils between 1978 and 2009. European Journal of Soil Science, 64, 455–465.
CIE 1932. Commission Internationale de l'Éclairage proceedings. Cambridge University Press, Cambridge.
Conant, R.T., Ogle, S.M., Paul, E.A. & Paustian, K. 2011. Measuring and monitoring soil organic carbon stocks in agricultural lands for climate mitigation. Frontiers in Ecology & the Environment, 9, 169–173.
Croft, H., Kuhn, N.J. & Anderson, K. 2012. On the use of remote sensing techniques for monitoring spatio-temporal soil organic carbon dynamics in agricultural systems. Catena, 94, 64–74.
De Vos, B., Vandecasteele, B., Deckers, J. & Muys, B. 2005. Capability of loss-on-ignition as a predictor of total organic carbon in non-calcareous forest soils. Communications in Soil Science & Plant Analysis, 36, 2899–2921.
Donnelly, D., Aitkenhead, M.J. & Coull, M.C. 2013. SOCiT Soil Carbon App for iPhone/Android [WWW document]. URL http://www.hutton.ac.uk/research/groups/information-and-computational-sciences/esmart [accessed on 29 January 2013].
Glenk, K. & Colombo, S. 2011. Designing policies to mitigate the agricultural contribution to climate change: an assessment of soil based carbon sequestration and its ancillary effects. Climatic Change, 105, 43–66.
Goh, A.T.C. 1995. Back-propagation neural networks for modeling complex systems. Artificial Intelligence in Engineering, 9, 143–151.
Haines-Young, R.H. & Potschin, M.B. 2009. Methodologies for Defining and Assessing Ecosystem Services. Final Report, JNCC, Project Code C08-0170-0062. The University of Nottingham, Nottingham.
Hoskins, B. 2002. Organic Matter by Loss on Ignition [WWW document]. URL http://www.naptprogram.org/files/napt/publications/method-papers/2002-organic-matter-by-loss-on-ignition.pdf [accessed on 21 August 2014].
Kassam, A., Friedrich, T., Shaxson, F. & Pretty, J. 2009. The spread of conservation agriculture: justification, sustainability and uptake. International Journal of Agricultural Sustainability, 7, 292–320.
Kibblewhite, M.G., Ritz, K. & Swift, M.J. 2008. Soil health in agricultural systems. Philosophical Transactions of the Royal Society B: Biological Sciences, 363, 685–701.
La, W.J., Sudduth, K.A., Chung, S.-O. & Kim, H.-J. 2008. Spectral reflectance estimates of surface soil physical and chemical properties. American Society of Agricultural & Biological Engineers Annual International Meeting, 2008, 4159–4172.
Lal, R. 2009. Soils and food sufficiency. A review. Agronomy for Sustainable Development, 29, 113–133.
Lal, R. 2010. Beyond Copenhagen: mitigating climate change and achieving food security through soil carbon sequestration. Food Security, 2, 169–177.
Li, Q.Q., Yue, T.X., Wang, C.Q., Zhang, W.J., Yu, Y., Li, B. et al. 2013. Spatially distributed modeling of soil organic matter across China: an application of artificial neural network approach. Catena, 104, 210–218.
Liles, G.C., Beaudette, D.E., O'Geen, A.T. & Horwath, W.R. 2013. Developing predictive soil C models for soils using quantitative color measurements. Soil Science Society of America Journal, 77, 2173–2181.
Lilly, A., Bell, J.S., Hudson, G., Nolan, A.J. & Towers, W. (Compilers) 2010. National Soil Inventory of Scotland 1 (NSIS_1): Site Location, Sampling and Profile Description Protocols. (1975–1988). Technical Bulletin. Macaulay Institute, Aberdeen.
Maes, J., Hauck, J., Paracchini, M.L., Ratamaki, O., Hutchins, M., Termanen, M. et al. 2013. Mainstreaming ecosystem services in EU policy. Current Opinion in Environmental Sustainability, 5, 128–134.
Martin, M.P., Wattenbach, M., Smith, P., Meersmans, J., Jolivet, C., Boulonne, L. et al. 2011. Spatial distribution of soil organic carbon stocks in France. Biogeosciences, 8, 1053–1065.
McHenry, M.P. 2009. Farm soil carbon monitoring developments and land use change: unearthing relationships between paddock carbon stocks, monitoring technology and new market options in Western Australia. Mitigation & Adaptation Strategies for Global Change, 14, 497–512.
Munsell Color Company 1954. Soil Color Charts. Munsell Color Company Inc., Baltimore, MD.
Nocita, M., Stevens, A., Toth, G., Panagos, P., van Wesemael, B. & Montanarella, L. 2014. Prediction of soil organic carbon content by diffuse reflectance spectroscopy using a local partial least square regression approach. Soil Biology & Biochemistry, 68, 337–347.
Orr, H.G., Wilby, R.L., Hedger, M.M. & Brown, I. 2008. Climate change in the uplands: a UK perspective on safeguarding regulatory ecosystem services. Climate Research, 37, 77–98.
Robbins, M. 2011. Crops and Carbon: Paying Farmers to Combat Climate Change. Routledge, Taylor & Francis, Abingdon.
Saby, N.P.A., Bellamy, P.H., Morvan, X., Arrouays, D., Jones, R.J.A., Verheijen, F.G.A. et al. 2008. Will European soil-monitoring networks be able to detect changes in topsoil organic carbon content? Global Change Biology, 14, 2432–2442.
Stevens, A., van Wesemael, B., Bartholomeus, H., Rosillon, D., Tychon, B. & Ben-Dor, E. 2008. Laboratory, field and airborne spectroscopy for monitoring organic carbon content in agricultural soils. Geoderma, 144, 395–404.
Suuster, E., Ritz, C., Roostalu, H., Kolli, R. & Astover, A. 2012. Modelling soil organic carbon concentration of mineral soils in arable land using legacy soil data. European Journal of Soil Science, 63, 351–359.
Van Wesemael, B., Paustian, K., Andren, O., Cerri, C.E.P., Dodd, M., Etchevers, J. et al. 2011. How can soil monitoring networks be used to improve predictions of organic carbon pool dynamics and CO₂ fluxes in agricultural soils? Plant & Soil, 338, 247–259.
Vasques, G.M., Grunwald, S. & Harris, W.G. 2010. Spectroscopic models of soil organic carbon in Florida, USA. Journal of Environmental Quality, 39, 923–934.
Zhang, C.S., Tang, Y., Xu, X.L. & Kiely, G. 2011. Towards spatial geochemical modelling: use of geographically weighted regression for mapping soil organic carbon contents in Ireland. Applied Geochemistry, 26, 1239–1248.