
Comparing Continuous Valued Raster Data: A Cross Disciplinary Literature Scan

Text: Alex Hagen-Zanker, RIKS bv
Layout: RIKS bv
Illustrations: RIKS bv
Published by: RIKS bv
© RIKS bv
June 2006
This is a publication of the Research Institute for Knowledge Systems (RIKS bv),
Abtstraat 2a, P.O. Box 463, 6200 AL Maastricht, The Netherlands,
http://www.riks.nl, e-mail: info@riks.nl, Tel. +31(43)388.33.22, Fax. +31(43)325.31.55.
Product information
This report presents research that has been conducted with a view to further development of the
MAP COMPARISON KIT. So far, the focus of the product has been on the comparison of multinomial maps;
future extensions will incorporate the methods for continuous data discussed in this report.
The MAP COMPARISON KIT has been developed as part of a series of projects carried out for the Netherlands
Environmental Assessment Agency (MNP), P.O. Box 1, 3720 BA Bilthoven, The Netherlands, the National Institute
for Coastal and Marine Management (RIKZ), P.O. Box 20907, 2500 EX Den Haag, The Netherlands, and, the
European Commission, Directorate General Joint Research Centre, Institute for Environment and Sustainability,
Land Management Unit, Urban and Regional Development Sector, Ispra, Italy.
For more information you are kindly requested to contact RIKS bv.
The latest information regarding the MAP COMPARISON KIT, including further plans for development, new
versions of the software and/or documentation, will be made available from the project web-site:
http://www.riks.nl/mck.
Submitted to:
Netherlands Environmental Assessment Agency
Postbus 303
3720 AH Bilthoven
CONTENTS
INTRODUCTION
1 BACKGROUND
1.1 Image analysis
1.2 Meteorology
1.3 Spatial Statistics
1.4 Other fields
2 METHODS
2.1 State-of-the-practice versus state-of-the-art
2.2 Fuzzy Numerical
2.3 An intensity-scale approach
2.4 Wavelets and field forecast verification
2.5 Image quality assessment
2.6 Information weighted comparison
2.7 Clustering of model errors
2.8 Bivariate spatial association
2.9 Image warping
3 DATA
3.1 Synthetic dataset
3.2 Practical dataset
4 RESULTS
4.1 Cell-per-cell difference
4.2 Fuzzy Numerical
4.3 Image Quality Assessment
4.4 Wavelet verification
4.5 Image Warping
4.6 Bivariate Spatial Association
5 CONCLUSIONS AND RECOMMENDATIONS
6 REFERENCES
ANNEX A: DETAILED RESULTS IMAGE QUALITY ASSESSMENT
INTRODUCTION
The main task of the Netherlands Environmental Assessment Agency is to advise the Dutch
government on a wide variety of environmental issues from a scientific base. Naturally, the
advice is often based on spatial analysis and modelling. Map comparison is a recurring task
and is necessary for quantifying, visualizing and understanding analysis results, as well as for going
through the modelling process of verification, validation and calibration. In cooperation with the
Research Institute for Knowledge Systems a software tool has been developed that supports this type of
analysis: the Map Comparison Kit (Visser et al. 2004).
The Map Comparison Kit (MCK) supports spatial modellers and analysts with a number of
methodologies for quantifying differences between raster maps. The purpose is to provide
insight in the extent, nature and spatial distribution of differences and similarities in pairs of
maps. Although the tool presents state-of-the-art techniques for the comparison of categorical
raster maps, the methodologies for numerical maps are only rudimentary. This report offers a
cross-disciplinary literature scan as a preliminary step to extending the functionality of the
MCK with advanced numerical map comparison methods.
A major challenge of map comparison is that the information contained in a map is more than
the sum of the information present in its individual elements (pixels or cells in a raster map),
since essential information is captured in their spatial relationships (e.g. clustering,
proximity, connectivity, etc.). It is surprising, then, that the state of the practice in comparison
methods is still the cell-by-cell comparison. Luckily, this discrepancy is recognized and in
recent years considerable research has been directed at quantifications of map similarity that
account for spatial structure.
One strategy for involving spatial structure in the comparison is the recognition of features in the
landscape and basing the comparison on characteristics of those features. In categorical maps
the simplest, and possibly most meaningful, features are patches: groups of
contiguous cells taken in by a single category. Several of the comparison methods in the MCK
are based on the comparison of patch characteristics.
Another strategy for involving spatial structure in the comparison is multi-scale analysis. Here, the
main idea is that a single map contains information on several scales and that pairs of maps may
be similar at some but not all scales. One interpretation is to equate scale to the resolution of the
raster; coarser scales are then found by aggregating cells. Alternatively, a moving window can be
applied; in that case, the resolution of the coarse scale map is equal to that of the fine scale map,
but values at the coarse scale are found as the aggregate (e.g. mean or distance weighted mean)
of the fine scale cells within a window. The scale of the map is then determined by the size of
the moving window and the distance decay weights.
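The moving-window reading of scale can be sketched as follows; the unweighted square window and the function name are illustrative simplifications of the distance-weighted means mentioned above.

```python
import numpy as np

def moving_window_mean(grid, radius):
    """Coarse-scale view of `grid`: each cell becomes the mean of the
    square window of the given radius around it (at the edges, only
    the cells that fall inside the map contribute)."""
    n_rows, n_cols = grid.shape
    out = np.empty_like(grid, dtype=float)
    for r in range(n_rows):
        for c in range(n_cols):
            r0, r1 = max(0, r - radius), min(n_rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(n_cols, c + radius + 1)
            out[r, c] = grid[r0:r1, c0:c1].mean()
    return out

fine = np.array([[1.0, 1.0, 5.0],
                 [1.0, 1.0, 5.0],
                 [5.0, 5.0, 5.0]])
coarse = moving_window_mean(fine, radius=1)
```

A larger radius (combined, in the MCK's case, with distance decay weights) yields a coarser-scale view of the same raster without changing its resolution.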
An assumption in writing this report was that similar strategies can be followed for numerical
maps as well. Rather than an exhaustive overview, the report is intended to provide a cross-
section of available methods and demonstrate their relative merits on two test datasets. Raster
data is also found outside the field of environmental modelling and analysis; relevant
methodological contributions were found in disciplines such as image analysis,
geographical information science, hydrology, meteorology and biometrics.
It is recognized that there are many purposes for comparing maps, ranging from assessing
historical trends to detecting patterns in large collections of spatial data. This report, however, is
written from the point of view of model validation and assumes that the compared maps are a
pair of one observed and one forecasted map for the same moment in time, covering the same
area on an identical raster. Other applications are not excluded and the reader is kindly invited
to think 'out of the box'.
The report is structured as follows. Chapter 1 gives background information on the different
disciplines from which comparison methodologies have been investigated. Pointers to relevant
literature are given and crucial particularities are discussed. Chapter 2 follows by highlighting eight
of the methods that were found in the literature. Their rationale is explained and the methods are
discussed in the light of their general applicability. A selection of five of the methods is evaluated
in Chapter 4 on the basis of two test cases that are introduced in Chapter 3. Conclusions and
recommendations are given in Chapter 5.
1 BACKGROUND
Researchers from different fields face similar problems when evaluating spatial data
(Boots & Csillag 2006). In recognition of this fact, methodological contributions from different
disciplines have been sought. In particular, the disciplines of image analysis, meteorology and
geographical information science have been investigated. Although these disciplines consider
similar problems, there are also significant differences, and this first section is intended to
highlight some particularities of the different fields.
1.1 Image analysis
Image analysis is concerned with abstracting information (measurements) from digital images.
The field of image analysis is vast and for the current purpose only those methodological
contributions are considered that compare greyscale images. Of course, digital
images and maps are not the same; nevertheless, greyscale images can be seen as continuous
valued maps, where the mapped property is luminance. Two main purposes are served by the
comparison of greyscale images.
The first main purpose is to quantify the effect of distortions, such as noise or other artefacts,
by comparing the original image to a distorted version of it. This type of measurement is for
instance applied to evaluate the performance of image compression algorithms. The overview
paper by Eskicioglu & Fisher (1995) discusses and compares a number of such methods.
The second main purpose is content based image retrieval. Here the key is to find the image(s)
most similar to a target image in a database. This has many practical purposes, for instance in
biometrics (e.g. fingerprint and iris scan recognition) and data mining. An overview paper on
content based image retrieval was published by Smeulders et al. (2000) and presents over 200
references.
A particularity of greyscale images is that a single pixel has practically no meaning at all. The
grey level of a pixel is only a very indirect measurement of what is being observed, not
least because images are in general 2-D projections of 3-D objects. As an effect, one and the
same grey level in a picture of a human face may be present in the hair, eyes, skin or mouth,
depending on how the light falls. As a consequence, the meaning of a pixel should be based on
its context, and the relation between the pixel and its surroundings may be highly complex.
Another particularity, which may be less of a concern, is that greyscale images only contain positive
values, whereas in models of natural systems negative values may also occur (e.g. net rainfall =
rainfall – evaporation). Also, in practice greyscale images are not truly continuous; typically a
greyscale image only uses 256 grey levels. It is therefore quite feasible, and common, for
methodologies in image analysis to tabulate all possible greyscale values in a histogram. This
poses a problem of transferability when such a method is applied to a map with truly continuous
values. A straightforward solution is the creation of bins (classes defined by upper and lower
boundaries), although the choice of the boundaries introduces an additional degree of
subjectivity.
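The binning step can be sketched with NumPy; the values and bin boundaries below are arbitrary examples, chosen only to illustrate how continuous data would be tabulated into a histogram.

```python
import numpy as np

# Hypothetical continuous map values, flattened.
values = np.array([-0.7, 0.02, 0.3, 1.4, 2.8])
# Subjectively chosen bin boundaries.
edges = np.array([0.0, 1.0, 2.0])

# Bin index per cell: 0 = below 0.0, ..., 3 = 2.0 and above.
bins = np.digitize(values, edges)
# Histogram over the four resulting classes.
counts = np.bincount(bins, minlength=len(edges) + 1)
```

Any histogram-based image comparison method can then be applied to `counts`, at the cost of the subjectivity introduced by `edges`.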
1.2 Meteorology
Interest in the weather is of all times; weather has an impact on our lives, ranging from
agricultural production to our sheer mood. It is therefore not surprising that there is a long
tradition of weather forecasting and that some of the comparison methods used for evaluating
weather forecasts date back to the late 19th century. An excellent overview of verification
methods for spatial weather forecasts is maintained on the Internet (Ebert 2005).
As the field has evolved over such a long time, methods have been refined and are in some cases highly
adjusted to a particular kind of weather forecast, with limited general applicability. An
example of such a specialized approach is given by Ebert & McBride (2000). This elaborate
method aims at the verification of precipitation in weather systems. It recognizes Contiguous
Rain Areas on the basis of rain intensities and heuristic thresholds, and evaluates the cells within
these areas on errors attributed to location and quantity (intensity).
A common characteristic of many weather models is that they consider well-localized
phenomena that follow a trajectory over space and time. For these systems it is therefore helpful
to attribute error to location, timing and magnitude. Another particularity is that weather
changes fast and has been measured throughout the world for a long time already. This
implies that comparisons between observed and forecasted maps can be made for large samples
of applications, which opens opportunities for analysing the distribution of errors and for
generalizing about the performance of different models. Exploring the temporal aspect of forecast
similarity is beyond the scope of this report.
Climate modelling is different from weather forecasting as it deals with large-scale processes
over large timescales. Climate models are politically sensitive and results based on climate
models are heavily scrutinized. It is therefore not surprising that the evaluation of climate
models has focused much more on statistically underpinning straightforward cell-by-cell
based methods than on introducing new "wild" structure based comparison methods.
Examples of such statistically rigorous papers are Wigley & Santer (1990) and Santer & Wigley
(1993).
1.3 Spatial Statistics
Spatial statistics is a discipline within the field of Geographical Information Science. It
recognizes that the application of regular statistical approaches to geographical data is often
problematic. O'Sullivan & Unwin (2003) provide an overview of the properties of geographical
data that cause these problems:
- spatial autocorrelation;
- the modifiable areal unit problem, which holds that the conclusions of statistical analysis
may strongly depend on the subjective definition of area units;
- scale and edge effects: relations between spatial variables may be different across
scales.
The problem of spatial autocorrelation is of particular interest. Positive spatial autocorrelation
means that the values at a given location are similar to those found in the neighbourhood of the
location. This violates the common assumption in statistics that samples are mutually
independent and in effect means that the effective sample size is overestimated. Practically all maps
display spatial autocorrelation; therefore dedicated statistics are required for the analysis of
geographical data. Statistics taking this spatial correlation into account make use of a spatial lag,
in analogy to the time lag in time series analysis. Moran's I and Geary's C are
examples of statistics that measure the degree of autocorrelation as a function of the spatial lag.
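As a sketch, Moran's I at lag 1 can be computed for a raster using rook (4-neighbour) adjacency as the spatial weights; the function name and the binary weighting scheme are illustrative choices, not prescribed by the statistic.

```python
import numpy as np

def morans_i(grid):
    """Moran's I for a raster, with binary rook (4-neighbour)
    adjacency as the spatial weights at lag 1."""
    z = grid - grid.mean()
    num = 0.0    # sum over pairs of w_ij * z_i * z_j
    w_sum = 0.0  # total weight W
    n_rows, n_cols = grid.shape
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    num += z[r, c] * z[rr, cc]
                    w_sum += 1.0
    n = grid.size
    return (n / w_sum) * num / (z ** 2).sum()

# A clustered map: strong positive autocorrelation.
clustered = np.array([[0.0, 0.0, 1.0, 1.0]] * 4)
# A checkerboard: perfect negative autocorrelation (I = -1).
checker = (np.indices((4, 4)).sum(axis=0) % 2).astype(float)
```

Positive values indicate that neighbouring cells tend to be similar; values near zero indicate no spatial structure; negative values indicate alternation.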
1.4 Other fields
It is evident that continuous spatial data are relevant in other disciplines besides the three
mentioned in this section. Three examples of large domains that have been practically ignored
in this report are remote sensing, landscape ecology and hydrology.
Remote sensing concerns the observation of the earth with remote equipment, such as satellites
and aeroplanes. The results of remote sensing exercises are typically raster maps, often with
continuous values. The accuracy assessment of remote sensing is often based on map
comparison. However, remote sensing is very much pixel/cell oriented and the evaluation of
results is typically a cell-by-cell evaluation (Foody, 2002); therefore it has not received much
attention in this report.
Landscape ecology emphasizes the interaction between spatial pattern and ecological process,
and the field has contributed much to the analysis of spatial structure (Turner et al. 2001). The field
has been left largely out of consideration in this report because of its focus on categorical
(multinomial) data.
An overview of comparison methods in the field of hydrology has been presented by Wealands
et al. (2005). The message of that paper is that the state of the art in map comparison for the
evaluation of hydrological models is largely limited to cell-by-cell mean squared error
calculations and the authors seek their inspiration in other disciplines.
2 METHODS
2.1 State-of-the-practice versus state-of-the-art
A discrepancy can be observed between the state of the art in map comparison on the one hand and
the state of the practice on the other. This means that although methods are becoming available to
compare maps while accounting for the spatial structures present in the data, the most practiced
procedures rely on cell-by-cell evaluations. This is noted by Wealands et al. (2005) and
confirmed in many model application papers, such as Ahrens et al. (1998),
Bishop et al. (2005), Garen & Marks (2005), Liu et al. (1997), Strasser & Mauser (2001),
Viscarra Rossel & Walter (2004) and Zhou & Liu (2004).
As a consequence, the innovative methods have hardly established themselves and it is hard to
qualify their merits on the basis of practical applications. A contributing factor is probably that
spatially explicit comparisons tend to be rather complicated, not only from a conceptual point of
view but also in their technical implementation in a Geographical Information System or other
software.
The availability of tools such as the Map Comparison Kit can lead to a wider dissemination and
uptake of promising methods in actual research practice. The introduction of the Fuzzy
Kappa methodology by Hagen (2003) was followed by an introduction of the software (Visser & de
Nijs 2005), and ultimately researchers are applying the method as part of the validation process
of their models (Prasad et al. 2006, Ménard & Marceau 2006). Another example of
dissemination via software implementation is the Idrisi GIS package, which offers, amongst others,
the validation methodology of Pontius et al. (2004).
This chapter describes a selection of methods from the vast body of work in which comparison
methodologies are introduced. The selection of papers is largely subjective; those papers
were chosen that are considered 'promising', 'original' or 'highly cited'. It is also attempted to
attain a degree of diversity by including methods that apply different strategies and originate from
various disciplines. Not all methods are intended as generally applicable metrics; where
necessary, comments are given regarding general applicability.
2.2 Fuzzy Numerical
The point of the Fuzzy Numerical statistic is that the similarity at one location is set by the
degree to which a cell in one map is similar to its counterpart in the other map, or, discounted
for distance, to the cells found in the direct neighbourhood of the counterpart in the
other map.
The Fuzzy Numerical method did not come forth out of the literature research. This method was
developed by the Research Institute for Knowledge Systems and implemented in the Map
Comparison Kit as part of earlier work for the Netherlands Environmental Assessment Agency.
The method follows the rationale of the Fuzzy Kappa (Hagen 2003, Hagen-Zanker et al. 2005)
but is adjusted to work with continuous instead of categorical data. The method is documented
in the MCK User manual (Hagen-Zanker et al. 2006).
Equations 1-4 describe the metric. It should be noted that equations 1-3 are generic, as they are
open to the actual similarity function f(a,b). Equation 4 is the particular similarity function that
is applied. Naturally, the outcome of the comparison will depend on the particular distance
weight function w(d) that is applied and its respective parameters. This introduces a degree of
subjectivity to the comparison.
$s_i(A,B) = \max_{j=1}^{N} \left( f(A_i, B_j) \cdot w(d_{ij}) \right)$   1.

$S_i(A,B) = \min\left( s_i(A,B),\, s_i(B,A) \right)$   2.

$S(A,B) = \frac{1}{n} \sum_{i=1}^{n} S_i(A,B)$   3.

$f(a,b) = 1 - \frac{\left| a - b \right|}{\max\left( \left| a \right|, \left| b \right| \right)}$   4.
where $s_i(A,B)$ is the one-way similarity between map A and map B at cell i. Index j iterates
through all N cells in the neighbourhood of cell i. $S_i(A,B)$ combines the two one-way
similarities into an overall similarity at cell i. $S(A,B)$ is the overall map similarity; it is
the mean over all n locations. The function f(a,b) determines the similarity of
two values. The function w(d) gives the weight pertaining to the distance.
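A minimal sketch of equations 1-4 follows; the exponential distance decay w(d) = 2^(-d/h) and the finite search radius are assumed parameter choices for illustration, since the method leaves the distance weight function open.

```python
import numpy as np

def fuzzy_numerical(map_a, map_b, halving_distance=2.0, radius=4):
    """Sketch of equations 1-4. The distance decay
    w(d) = 2**(-d / halving_distance) and the neighbourhood radius
    are assumed parameter choices, not fixed by the method."""

    def f(a, b):  # equation 4: similarity of two values
        m = max(abs(a), abs(b))
        return 1.0 if m == 0.0 else 1.0 - abs(a - b) / m

    n_rows, n_cols = map_a.shape

    def one_way(src, dst, r, c):  # equation 1: best distance-discounted match
        best = 0.0
        for rr in range(max(0, r - radius), min(n_rows, r + radius + 1)):
            for cc in range(max(0, c - radius), min(n_cols, c + radius + 1)):
                w = 2.0 ** (-np.hypot(rr - r, cc - c) / halving_distance)
                best = max(best, f(src[r, c], dst[rr, cc]) * w)
        return best

    total = 0.0
    for r in range(n_rows):
        for c in range(n_cols):
            # equation 2: take the stricter of the two one-way similarities
            total += min(one_way(map_a, map_b, r, c),
                         one_way(map_b, map_a, r, c))
    return total / map_a.size  # equation 3: mean over all locations

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[10.0, 2.0], [3.0, 4.0]])
```

Identical maps score 1; a map pair that differs in one cell scores below 1, with the penalty softened if similar values occur nearby.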
2.3 An intensity-scale approach
Casati, B., Ross, G., & Stephenson, D. B. (2004). A new intensity-scale approach for the
verification of spatial precipitation forecasts. Meteorological Applications, 11(2), 141-
154.
The intensity scale approach (Casati et al. 2004) attributes errors to different scales and value
ranges (intensity). To separate errors over different scales, wavelet analysis is applied. Different
ranges are found by reclassification.
This comparison method operates in two phases. The first phase is pre-processing, in which
the input maps are manipulated to the effect that the maps will be compared on the basis of the
relative distribution of values over the map, rather than the absolute values. The second is the
intensity-scale verification, where degrees of similarity are found related to scale (wavelet
decomposition level) and intensity (threshold value).
The first step of the pre-processing phase is dithering. In this step both the observed and the
forecasted map have some noise added. The amplitude of the noise is half of the minimum
non-zero difference in values between locations and serves to 'compensate for discretization
effects caused by finite precision storage of the precipitation rate values', or in other words to
eliminate the occurrence of same-valued cells.
The second step of the pre-processing phase is 'normalization', in which every cell in the map is
subjected to a base-2 logarithmic transformation. A value equal to the amplitude of the dithering
noise transforms to 0.
The third step of the pre-processing phase is called recalibration: each value in the forecast
map is replaced by the value with the same empirical cumulative probability in the observed
map, which in effect means the same rank number when all values in the map are sorted from
high to low. The difference between the calibrated and the original forecast map can be seen as
a measure of differences in quantity (frequency distribution, or histogram).
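The recalibration step amounts to a rank-based histogram match, which can be sketched as follows; `recalibrate` is an illustrative name, and breaking ties by array order stands in for the dithering of the first step.

```python
import numpy as np

def recalibrate(forecast, observed):
    """Replace each forecast value by the observed value with the same
    rank (empirical cumulative probability). Ties in the forecast are
    broken by array order, playing the role of the dithering step."""
    ranks = np.argsort(np.argsort(forecast.ravel()))  # rank of each cell
    return np.sort(observed.ravel())[ranks].reshape(forecast.shape)

forecast = np.array([[3.0, 1.0], [2.0, 9.0]])
observed = np.array([[10.0, 40.0], [20.0, 30.0]])
calibrated = recalibrate(forecast, observed)
```

After recalibration the forecast map has exactly the histogram of the observed map, so any remaining disagreement is attributable to location rather than quantity.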
In the intensity-scale verification phase, first a binary map is created for both the observed and the
pre-processed forecast map. The binary map has value 1 for all cells above a threshold value
and 0 for all others. As an effect of the pre-processing, the numbers of cells with value 0 resp. 1 are
identical in both maps.
The binary maps are compared cell-by-cell, yielding a comparison map with three possible
values:
- value -1 indicates present in observed but not in forecast;
- value 1 indicates present in forecast but not in observed;
- value 0 indicates present in both or in neither.
The second step of the intensity-scale verification phase is the decomposition of the squared
error over different scales, using wavelet decomposition on the basis of the Haar wavelet. Due to
the orthogonal and orthonormal properties of wavelets, the squared error at each location as well
as the mean squared error (MSE) can be decomposed into errors pertaining to each scale (4, 16,
64, 256, etc. cell aggregates). Also, following the assumption that random error is equally
partitioned over all scales, the Heidke Skill Score (a.k.a. Kappa statistic) can be
decomposed.
The first and second step of the intensity-scale verification are repeated for all possible
values of the observed map (before pre-processing), finally yielding plots of MSE as a function
of scale and threshold.
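The scale decomposition of the squared error can be sketched with successive block means, the mean-based counterpart of a Haar decomposition; this assumes square maps whose side is a power of two, and `scale_components` is an illustrative name.

```python
import numpy as np

def scale_components(z):
    """Decompose a square field z (side a power of two) into orthogonal
    components, one per scale, via successive 2x2, 4x4, ... block means.
    The components sum back to z and, by orthogonality, mean(z**2)
    equals the sum of the components' mean squares, so the MSE splits
    cleanly over scales."""
    comps = []
    smooth = z.astype(float)
    n = smooth.shape[0]
    block = 2
    while block <= n:
        coarse = smooth.reshape(n // block, block, n // block, block).mean(axis=(1, 3))
        coarser = np.repeat(np.repeat(coarse, block, axis=0), block, axis=1)
        comps.append(smooth - coarser)  # detail lost at this aggregation level
        smooth = coarser
        block *= 2
    comps.append(smooth)  # the overall mean field
    return comps

# z stands in for the {-1, 0, 1} binary-comparison map of the method.
z = np.array([[1, -1, 0, 0],
              [0,  0, 0, 0],
              [0,  0, 1, 1],
              [0,  0, 1, 1]], dtype=float)
comps = scale_components(z)
mse_parts = [float(np.mean(c ** 2)) for c in comps]
```

In this example the paired +1/-1 errors in the top-left corner are charged to the finest scale, while the displaced block of 1s contributes mainly at coarser scales.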
Some notes
The use of wavelets allows attributing errors to different scales. This is of course a beautiful
characteristic that is also exploited by Briggs & Levine (1997); therefore section 2.4 includes a
comparison on main lines between the methods.
The first step, dithering, is strictly a pragmatic one and serves its purpose. For transparency it
should be noted that the same effect could be achieved by randomly ordering same-valued cells
in the forecast map when sorting them in the third step of pre-processing.
The second step of the first phase introduces a redundancy to the procedure. The logarithmic
transformation does not alter the ranking of the values on the map, only the relative differences
in values, so it has no effect on the recalibration step. Furthermore, the binary reclassification
step can be performed on transformed values just as well as on untransformed values and is not
helped by the normalization either.
It seems that the authors have realized this redundancy, because all results are expressed in units
relating to the non-transformed values (i.e. mm/h, not log(mm/h)). Somehow this part of
the procedure has not been edited out of the paper. It is our recommendation to simply leave out
the logarithmic transformation.
The idea of recalibration as a technique for separating errors in quantity and in location is
compelling and may very well be applied in combinations with other comparison methods too.
It certainly warrants further investigation.
2.4 Wavelets and field forecast verification
Briggs, W. M., & Levine, R. A. (1997). Wavelets and field forecast verification. Monthly
Weather Review, 125(6), 1329-1341.
The main idea of the comparison operation proposed in this paper is to decompose the input
maps into a pile of maps at different scales, using a discrete wavelet transform. The maps at the
different scales are then compared against each other on the basis of two measures: Root Mean
Squared Error (RMSE) and the Anomaly Correlation Coefficient (ACC); the latter requires a
climate field and can otherwise be replaced by the correlation r.
The wavelet transformation serves two purposes. The first is removal of noise from the data.
The second is to attribute differences between forecast and observation to different scales.
The steps that are recognized are:
Step 1. Perform a wavelet transformation of the observed/forecast map (it is not clear which) for a
number (library) of wavelets. Of the transformations, pick the one with the lowest
Shannon entropy.
Step 2. Take the selected wavelet and apply a soft threshold function to remove
noise. The threshold value should be based on the Shannon entropy for each layer.
Step 3. Compare the different scales with respect to RMSE (quantity), r
(pattern) and ER (energy), a distinction similar to that made by Wang et al. (2004), which is also
discussed in this chapter.
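The per-scale comparison of Step 3 can be sketched as follows, with block-mean aggregates standing in for a full wavelet library; the noise removal of Step 2 is omitted, square maps with power-of-two sides are assumed, and `per_scale_scores` is an illustrative name.

```python
import numpy as np

def per_scale_scores(observed, forecast):
    """Compare two maps at successive aggregation levels (1x1, 2x2,
    4x4, ... block means) using RMSE (quantity) and Pearson's r
    (pattern). A sketch only: block-mean aggregation replaces a
    proper wavelet decomposition."""
    scores = []
    n = observed.shape[0]
    block = 1
    while block <= n:
        m = n // block
        o = observed.reshape(m, block, m, block).mean(axis=(1, 3))
        f = forecast.reshape(m, block, m, block).mean(axis=(1, 3))
        rmse = float(np.sqrt(np.mean((o - f) ** 2)))
        if o.size > 1 and o.std() > 0 and f.std() > 0:
            r = float(np.corrcoef(o.ravel(), f.ravel())[0, 1])
        else:
            r = float("nan")  # correlation undefined for constant fields
        scores.append((block, rmse, r))
        block *= 2
    return scores

obs = np.arange(16, dtype=float).reshape(4, 4)
fct = obs + 1.0  # a uniform bias: perfect pattern, constant error
scores = per_scale_scores(obs, fct)
```

A uniformly biased forecast keeps r = 1 while showing the same RMSE at every scale, illustrating how the two measures separate pattern from quantity.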
Some notes
Step 2 of the comparison, noise removal, is based on assumptions about what defines noise. In
general purpose applications such assumptions cannot easily be made, and it may be advisable
not to filter the data for 'noise'.
The method is mathematically thorough and the results presented in the paper are convincing. It
must be kept in mind, however, that the decomposition of the maps into different scales is strictly
based on information theory and may have limited meaning in a physical sense. Also, applying
discrete wavelets means that the maps are aggregated (in wavelet coordinates) to coarser
resolutions (2x2, 4x4, 8x8, 16x16, etc.), which sets the borders between aggregated cells more or
less arbitrarily. Theoretically, offsetting both maps by a single pixel in the same direction may result
in completely different conclusions. This may be irrelevant in the case of image processing,
where wavelets are used to store the information contained in an image more efficiently, but it
may be problematic in the current case, since we are in fact interested in the information
contained at different scales.
Comparison between Briggs & Levine (1997) and Casati et al. (2004)
Although the two papers present similar approaches and motivations, there are some differences
between them.
Table 2-1 Two wavelet based comparison methods
Briggs & Levine (1997): decomposes the input maps, then compares; removes noise by applying
a soft filter; separates quantity, pattern and energy.
Casati et al. (2004): first compares, then decomposes the comparison map; classifies the
numerical data to binary, removing outliers; separates quantity from location, but does not
quantify the location error.
Although it would be preferable to compare both methods in a practical setting, Chapter 4
will focus on implementing Briggs & Levine (1997) as a representative case of comparing maps at
multiple scales using wavelet decomposition. The reason is that this method is the most consistent
in its approach (information based, retaining the continuous character of the data) and therefore
most demonstrative of its potential.
2.5 Image quality assessment
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality
assessment: From error visibility to structural similarity. IEEE Transactions on Image
Processing, 13(4), 600-612.
This method comes from the field of image processing. The objective of the method is to
quantify the difference between an image and a distorted version of it. The main idea is that
similarity is a composite measure of luminance, contrast and structure each of which is
measured in a distance weighted moving window.
The approach is a multivariate one in which the difference at each location is specified as a
function of three aspects of similarity: S(x,y) = f(l(x, y), c(x, y), s(x, y)), where S is the overall
similarity, l is the similarity of the luminance (i.e. mean), c compares local contrast (i.e.
variance) and s compares local structure (i.e. covariance). The whole is calculated by means of a
distance weighted moving window, for which the distance decay function is Gaussian. There is
a strong conceptual relation to the work of Briggs & Levine (1997), who at different scales
consider root mean squared error, correlation and energy level.
The main variables for the comparison metric are the mean, variance and covariance as defined in
equations 5-7. These main variables are not calculated for the whole map, but instead for each
cell on the basis of a distance weighted moving window.
\mu_x = \sum_{i=1}^{N} w_i x_i \qquad (5)

\sigma_x = \left( \sum_{i=1}^{N} w_i (x_i - \mu_x)^2 \right)^{1/2} \qquad (6)

\sigma_{xy} = \sum_{i=1}^{N} w_i (x_i - \mu_x)(y_i - \mu_y) \qquad (7)

where w_i is a distance decay weight based on a sum-normalized Gaussian function with
standard deviation 1.5. Index i iterates through the N cells in the window; x
refers to a window in the first map and y to one in the second. The procedure is
symmetric, meaning that S(x,y) = S(y,x).
The summary statistics are combined into the three similarity values according to equations 8-
10.
l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \qquad (8)

c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \qquad (9)

s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3} \qquad (10)

where C_1, C_2 and C_3 are constants that are introduced for stability in situations
where the variability or the mean is close to zero. The values are related to the range of
the pixel values (called L) via the constants K_1 and K_2, which have been
established heuristically by the authors: C_1 = (K_1 L)^2, C_2 = (K_2 L)^2, C_3 = 0.5 C_2,
K_1 = 0.01, K_2 = 0.03.
The components of similarity are combined into an overall measure of structural similarity
SSIM(x,y) by weighted multiplication. The overall structural similarity MSSIM(X,Y) is
calculated as the mean similarity over all locations
SSIM(x,y) = l(x,y)^{\alpha} \, c(x,y)^{\beta} \, s(x,y)^{\gamma} \qquad (11)

MSSIM(X,Y) = \frac{1}{M} \sum_{j=1}^{M} SSIM(x_j, y_j) \qquad (12)

where X and Y are the maps, each representing M windows x_j resp. y_j; α, β and γ are
parameters for which no preferred value has been found yet, therefore they take the
neutral value of 1.
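As an illustration, equations 5-11 can be sketched in a few lines of Python for a single pair of windows; the full MSSIM additionally slides this window over the map and averages (equation 12). This is a sketch rather than the authors' implementation, and the function names are ours:

```python
import numpy as np

def gaussian_window(radius, sigma=1.5):
    """Sum-normalized Gaussian distance decay weights (the w_i of equations 5-7)."""
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    w = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return w / w.sum()

def ssim_window(x, y, w, L, K1=0.01, K2=0.03, alpha=1, beta=1, gamma=1):
    """Structural similarity of two equally sized windows (equations 5-11)."""
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    C3 = 0.5 * C2
    mu_x, mu_y = np.sum(w * x), np.sum(w * y)                  # eq. 5
    s_x = np.sqrt(np.sum(w * (x - mu_x) ** 2))                 # eq. 6
    s_y = np.sqrt(np.sum(w * (y - mu_y) ** 2))
    s_xy = np.sum(w * (x - mu_x) * (y - mu_y))                 # eq. 7
    l = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)  # eq. 8
    c = (2 * s_x * s_y + C2) / (s_x ** 2 + s_y ** 2 + C2)      # eq. 9
    s = (s_xy + C3) / (s_x * s_y + C3)                         # eq. 10
    return l ** alpha * c ** beta * s ** gamma                 # eq. 11
```

For two identical windows all three components equal 1, so the similarity is 1; a contrast-reversed window keeps l and c high but drives the structure term s negative.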
Some notes
The approach is clear and robust. The method has conceptual links to the moving windows
based structure comparisons already present in the Map Comparison Kit.
The paper briefly discusses the occurrence of ‘blocking artefacts’. In this context, artefacts arise
because moving window based approaches only recognize pairs of mitigating errors (i.e. an over-
prediction close to an under-prediction) when they occur in the centre of the window, but not
when they occur at the edge of the window (because only the error cell and not its mitigating
counterpart is found within the window). As a result, small spatial errors are registered as large
errors at one window distance away from their actual location.
The distance weighted neighbourhood proposed in the paper reduces the effect of blocking
artefacts by applying a distance decay weight, causing errors at the edge of the window to be
small by definition. The fuzzy weighted neighbourhood that has been applied in the calculation
of the fuzzy kappa statistic (Hagen 2003, Hagen-Zanker et al. 2005) may fully eliminate the
blocking artefacts.
2.6 Information weighted comparison
Tompa, D., Morton, J., & Jernigan, E. (2000). Perceptually based image comparison.
Paper presented at the 2000 International Conference on Image Processing,
Vancouver, Canada.
This approach (referenced by Wealands et al. 2005) offers a straightforward take on
perception based image comparison. The main idea is that changes that occur within value ranges
that are common on the map are weighted less than those that lie within uncommon ranges.
Thus, an information weighted mean squared error (IMSE) is introduced. The degree to which a
value is common is expressed by the Shannon information content. The IMSE is calculated
according to equations 13-16:
P_M(z) = \frac{n_{M,z}}{N_M} \qquad (13)

I_M(z) = \log\left(\frac{1}{P_M(z)}\right) \qquad (14)

IMSE_x(A,B) = \big(A_x I_A(A_x) - B_x I_B(B_x)\big)^2 \qquad (15)

IMSE(A,B) = \frac{1}{N} \sum_{x=1}^{N} IMSE_x(A,B) \qquad (16)

where P_M(z) is the probability (frequency) of finding value z in map M, with n_{M,z} the
number of cells of value z and N_M the total number of cells in map M. I_M(z) is the
information contained in a cell with value z in map M. Index x iterates over all N
locations on the map and A_x resp. B_x is the value at location x in map A resp. B.
Some notes
The limited range of values (256 grey levels) in the images considered in the paper
allows assessing information content on the basis of the occurrence of unique values. In
applications where higher resolution continuous data is used, a classification into bins is required.
The comparison method is not strictly a weighted MSE. This is best illustrated by an example:
consider a location that has value 5 on both maps; in the first map 10% of all cells have value 5
and in the second map only 1%. This means that the information content of the two cells is
different, which results in an IMSE that is unequal to zero even though the cells have the same
value:
IMSE_x(A,B) = \left(5 \log\frac{1}{0.1} - 5 \log\frac{1}{0.01}\right)^2 = 47.7 \qquad (17)

with A_x = B_x = 5, P_A(5) = 0.1 and P_B(5) = 0.01.
This illustrates the fact that the ‘information weighted’ MSE is not strictly speaking a weighted
MSE, which may appear inappropriate from some perspectives. It must be realized, however, that
in image processing the meaning of a pixel can only be considered in relation to its context (in
this case all other pixels); it is then reasonable that identically valued cells found within a different
context should be considered dissimilar.
The image quality assessment by Wang et al. (2004) considers the context more thoroughly and
is therefore selected (instead of Tompa et al. 2000) to be further explored in Chapter 4.
2.7 Clustering of model errors
Zhang, L. J., & Gove, J. H. (2005). Spatial assessment of model errors from four
regression techniques. Forest Science, 51(4), 334-346.
Zhang & Gove (2005) present a methodology for assessing the spatial heterogeneity of model
performance. The authors do not consider direct mapping of the error sufficient, because it
does not identify significant clusters of positive or negative model errors. To obtain insight into
the clustering of the errors, the authors make use of a local indicator of spatial association
(LISA), the local Moran coefficient. A Moran value for each location is calculated according
to Anselin (1995) (equation 18).
M_i = (e_i - \bar{e}) \sum_{j=1}^{n} c_{ij}(h) \, (e_j - \bar{e}) \qquad (18)

where e_i and e_j denote the model errors at locations i and j respectively, \bar{e} is the mean
model error over the whole map, and c_{ij}(h) is the binary spatial weight matrix as a
function of bandwidth h: it takes the value 1 for all combinations of i and j where the
distance between i and j is smaller than the bandwidth, and 0 otherwise.
A positive value indicates a clustering of same-valued errors, relative to the mean error. A
negative value indicates a cluster of opposite-valued errors relative to the mean.
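A brute-force Python sketch of equation 18 may clarify the mechanics; the binary circular weight matrix c_ij(h) is realized by a distance test, and the function name is ours:

```python
import numpy as np

def local_moran(err, h):
    """Local Moran value of the model error at every cell (equation 18)."""
    rows, cols = err.shape
    d = err - err.mean()          # e_i minus the mean model error e-bar
    out = np.zeros((rows, cols))
    r = int(np.ceil(h))
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    # c_ij(h) = 1 within bandwidth h (excluding the cell itself)
                    if (di, dj) != (0, 0) and 0 <= ii < rows and 0 <= jj < cols \
                            and di * di + dj * dj <= h * h:
                        acc += d[ii, jj]
            out[i, j] = d[i, j] * acc
    return out
```

A cluster of same-signed errors yields positive values, while an over-prediction directly next to an under-prediction yields negative values, which is exactly the interpretation issue raised in the notes that follow.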
Some notes
It should be noted that a small location error will lead to negative local Moran values,
because it signifies an under-prediction in one location and an over-prediction nearby. On the
other hand, for a cluster of cells in which both compared maps have identical values, a positive
local Moran value will be found, because the deviations from the mean error of the cell and its
neighbours are same-valued. This makes the interpretation of the local Moran statistic quite
complex. The authors seem to realize this, as all conclusions in the paper are based on the
overall degree of clustering of errors.
2.8 Bivariate spatial association
Lee, S. I. (2001). Developing a bivariate spatial association measure: An integration of
Pearson's r and Moran's I. Journal of Geographical Systems, 3(4), 369-385.
Lee (2001) offers an approach to calculating bivariate spatial association that reconciles Pearson’s
r as an aspatial measure of bivariate association and Moran’s I as a univariate measure
of spatial association. The resulting measure is given in equations 19-21. It is a conflation of the
bivariate spatial smoothing scalar (BSSS) and the correlation between the smoothed spatial
fields.
SSS_X = \frac{\sum_i \left(\tilde{x}_i - \bar{x}\right)^2}{\sum_i \left(x_i - \bar{x}\right)^2} \qquad (19)

BSSS_{X,Y} = \sqrt{SSS_X \cdot SSS_Y} \qquad (20)

L_{X,Y} = BSSS_{X,Y} \cdot r_{\tilde{X},\tilde{Y}} \qquad (21)

where \tilde{x}_i indicates the mean of the neighbourhood (all cells within the spatial lag) at
location i, and r_{\tilde{X},\tilde{Y}} is the correlation over the mean fields of X and Y.
Some notes
Essentially it can be said that the correlation found between the mean fields is corrected for the
degree to which X and Y are spatially autocorrelated (SSS_X and SSS_Y are indications of
autocorrelation). L_{X,Y} measures the extent to which both map 1 and map 2 are spatially
autocorrelated and their neighbourhood mean fields are correlated as well. It does not become
clear from the paper why this is a good measure of bivariate spatial association.
Possibly the Cross-Moran statistic (Wartenberg, 1985) that the paper intends to improve upon
may give insight into a more relevant question: does the presence of a variable at a location in
one map coincide with the presence of another variable at exactly the same location, or in the
direct neighbourhood, in the other map?
2.9 Image warping
Reilly, C., Price, P., Gelman, A., & Sandgathe, S. A. (2004). Using image and curve
registration for measuring the goodness of fit of spatial and temporal predictions.
Biometrics, 60(4), 954-964.
The approach by Reilly et al. (2004) is similar to those by Hoffman et al. (1997) and Nehrkorn et
al. (2003). The paper recognizes that errors are not limited to cell-to-cell differences, which are
called vertical differences, but that there are also location differences, which are called horizontal
differences. This notion is not different from the other methods discussed in this chapter. The
special trait of this method is how it quantifies horizontal errors. For this, a transformation is
sought, consisting of stretching and compressing the forecasted map along the horizontal plane
until an optimum fit with the observed map is found. Then, the degree of stretching and
compressing is considered the horizontal error and the cell-by-cell error after the deformation is
the vertical error. Optimizing the fit requires a weighting (trade-off) of errors of both kinds.
The optimal deformation is chosen according to equation 22.
I_\lambda(y, \hat{y}) \equiv \min_{f \in D} \left\{ \int_A G\big(y(x), \hat{y}(f(x))\big)\,dx + \lambda \int_A F\big(x, f(x)\big)\,dx \right\} \qquad (22)

where G is the discrepancy metric (to be defined later) between the objective map y and the
deformed map \hat{y}(f(x)), and F measures the amount of deformation. f(x) is the
deformation function; if the deformation is zero then f(x) = x and by definition
F(x,x) = 0.
In other words G expresses the vertical error and F expresses the horizontal error, the trade off
between the two is controlled by parameter λ.
For G a straightforward squared difference function is applied, as in equation 23.
G\big(y(x), \hat{y}(f(x))\big) \equiv \big[y(x) - \hat{y}(f(x))\big]^2 \qquad (23)
For F, since f is a vector function, the deviation of its derivatives from those of the identity
function f(x) = x is chosen:

F\big(x, f(x)\big) \equiv \left(\frac{\partial f_1}{\partial x_1} - 1\right)^2 + \left(\frac{\partial f_2}{\partial x_2} - 1\right)^2 \qquad (24)
Some notes
Using morphological deformations as a map comparison method seems an elegant solution to
separating errors due to location and quantity. Even though the algorithm to find the solution is
highly complicated, the results can be interpreted well. It is adaptive to the data and thus
overcomes several problems that are associated with other methods that also intend to achieve a
balanced judgement of location errors and quantity errors. These are:
Moving window based methods:
o Possibility of multiple compensation: in a moving window compensating
errors may be found, i.e. within the window overestimation balances
underestimation. It is well possible that underestimation at one site balances
overestimation (or vice versa) at several other sites.
o Homogeneity and isotropy of the window: in all applications that I am
aware of, the window size is the same for all locations. Also, the window is
symmetric, favouring all directions equally (or, in the case of square
windows, having an unwarranted bias in some directions).
o Blocking artefact, as discussed in Section 2.5
Aggregation based methods:
o Modifiable areal unit problem: the results of aggregation based methods
can depend to a major extent on a trivial decision, i.e. how the coarser grid
is positioned over the finer grid. A small deviation may lead to completely
different results.
o Homogeneity of the analysis unit is a problem here as well
The strong adaptation to the morphology of the data comes at a cost, however. Firstly, the
method requires long computation times: maps that normally take less than a second to compare
may consume many minutes or even hours with this method. Moreover, the numerical analysis
does not in all cases lead to a solution. The method requires smooth maps that have a reasonable
similarity in pattern. In other words, the method is not as robust as the moving window and
aggregation based methods.
Another characteristic of the method is that in the transformation the map is distorted in such a
way that the area weighted mean (or the integral over the whole area) is not preserved. This may
be considered problematic, especially in the case of maps that represent ‘stock’ variables, such
as population or mass.
3 DATA
The data used to evaluate a selection of the methods discussed in the previous sections have
been submitted by the Netherlands Environmental Assessment Agency. The first dataset is
synthetic and was specifically created to represent errors at different spatial scales. The second
dataset consists of three maps: two of these are results of metamodels and the third is the
‘ground truth’ created by the original model that the other two approximate.
3.1 Synthetic dataset
The first dataset consists of the following maps:
1a 1b
2a 2b 2c
3a 3b 3c
Figure 3-1 The synthetic dataset
The differences between the maps are known and can be summarized as follows:
- The underlying gradient of maps 3a, 3b and 3c is higher than that of 2a, 2b and
2c, whereas maps 1a and 1b have no underlying gradient at all.
- All differences found between maps 1a and 1b are attributed to the relative location
of the spots.
- All differences found between maps 2a and 2c, as well as maps 3a and 3c, are
attributed to the relative location of the spots.
- All differences between 2a and 2b, as well as between 3a and 3b, are attributed to
the reverse direction of the gradient from southwest to northeast.
- The differences between 2b and 2c, as well as between 3b and 3c, are the result of
both the reversed gradient and differences in the location of the spots.
3.2 Practical dataset
The second dataset consists of the following maps:
GeoPearl
MetaPearl Index
Figure 3-2 The practical dataset
The map GeoPearl is output of the GeoPearl model, whereas Index and MetaPearl are the
outputs of metamodels that approximate GeoPearl.
4 RESULTS
4.1 Cell-per-cell difference
4.1.1 Synthetic dataset
The cell-per-cell differences in the synthetic dataset are depicted in Figure 4-1. The synthetic
nature of the dataset is manifest as the differences are exactly as described in Chapter 3.
1a-1b
2a-2b 2a-2c 2b-2c
3a-3b 3a-3c 3b-3c
Figure 4-1 Cell by cell differences in the synthetic dataset
The correlation can be calculated on the basis of a cell-by-cell evaluation as well (Table 4-1).
The correlation clearly picks up on the reverse trend at the coarse scale that is present in pairs 2a-
2b, 2b-2c, 3a-3b and 3b-3c. The effect of these reverse spatial trends is a negative correlation
value.
Table 4-1 Correlation (Pearson)
Pair R
1a-1b 0.403
2a-2b 0.106
2a-2c 0.645
2b-2c -0.217
3a-3b -0.408
3a-3c 0.806
3b-3c -0.577
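The cell-by-cell statistics used above require only a few lines of code; for instance, in Python (a sketch, with names of our choosing):

```python
import numpy as np

def cell_by_cell(A, B):
    """Per-cell difference map and overall Pearson correlation of two rasters."""
    diff = B - A
    r = np.corrcoef(A.ravel(), B.ravel())[0, 1]
    return diff, r
```

A perfectly reversed map, e.g. B = c − A, gives r = −1; the pairs with opposing gradients in the table have negative but less extreme correlations because the spots are superimposed on the gradient.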
4.1.2 Practical dataset
The cell-by-cell differences in the practical dataset are given in Figure 4-3. It appears that the
model errors are strongly clustered. Furthermore it is suggested that the clustering is related to
the structure of the soil map of the Netherlands (Figure 4-2). Exploring the relation between soil
type and model error is not within the scope of this report. It is however strongly recommended
to investigate the nature of this degree of clustering.¹ The correlation figures indicate that the
output maps of the two metamodels are less similar to each other than to the original GeoPearl
results.
Figure 4-2 Soil map of the Netherlands (source: www.bodems.nl)
¹ The GeoPearl webpage, www.alterra-research.nl/pls/portal30/docs/folder/pearl/pearl/geopearl.htm,
indicates that GeoPearl is based on runs for 6405 unique combinations of basic model inputs (soil type,
climate district, land-use type, groundwater depth class, etc.). It is recommended to compare the model and
metamodels at the level of these combinations rather than the raster map visualization.
Index – MetaPearl: R = 0.834
MetaPearl-GeoPearl R = 0.913 Index – GeoPearl R = 0.945
Figure 4-3 Cell-by-cell differences in the practical dataset
4.2 Fuzzy Numerical
4.2.1 Synthetic Dataset
The fuzzy numerical approach (Figure 4-4) is clearly able to discern small spatial errors from
large spatial errors in map pair 1a-1b. Large spatial errors are found in the northwest, where dots
are found in one map but not in the other; in the other parts of the map there are also differences in
the spots, but these are attributed to shifts in location and are thus minor differences. In map pairs
2a-2c and 3a-3c the structure of the spots is identical, but due to the gradient ‘mitigating’ values
are found in the neighbourhood, and the locations that in map pair 1a-1b are considered
major differences are now seen as minor differences. As the gradient is stronger in pair 3a-3c, the
mitigating effect is also stronger. As a result, map pair 3a-3c is considered most similar.
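The mechanism can be sketched in Python as follows. This is our reading of a fuzzy numerical comparison (cf. Hagen 2003); the exact MAP COMPARISON KIT implementation may differ, for instance in the distance decay function or the two-directional combination, and all names are ours. Each cell in one map is compared against the best-matching cell within a radius in the other map, discounted by an exponential distance decay with the given halving distance:

```python
import numpy as np

def value_similarity(a, b):
    """Similarity of two cell values: 1 for identical values, 0 for maximal contrast."""
    m = max(abs(a), abs(b))
    return 1.0 if m == 0.0 else 1.0 - abs(a - b) / m

def fuzzy_numerical_one_way(A, B, radius=15, halving=3):
    """One-directional fuzzy similarity of map A against map B (a sketch)."""
    rows, cols = A.shape
    S = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            best = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ii, jj = i + di, j + dj
                    dist = np.hypot(di, dj)
                    if 0 <= ii < rows and 0 <= jj < cols and dist <= radius:
                        # exponential decay: halves every `halving` cells
                        decay = 2.0 ** (-dist / halving)
                        best = max(best, decay * value_similarity(A[i, j], B[ii, jj]))
            S[i, j] = best
    return S
```

A symmetric comparison would take, per cell, the minimum of the two one-way similarities, and the overall S would be their mean over all cells.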
1a-1b: S = 0.675
2a-2b: S = 0.596 2a-2c: S = 0.845 2b-2c: S = 0.572
3a-3b: S = 0.572 3a-3c: S = 0.886 3b-3c: S = 0.563
Figure 4-4 Fuzzy similarity, R = 15 halving = 3
4.2.2 Practical dataset
For the practical dataset the Fuzzy Numerical approach indicates that differences in both maps
are mainly minor ones. The stronger differences are found in the north for Index and the
northeast for MetaPearl.
GeoPearl-Index
S = 0.827
GeoPearl-MetaPearl
S = 0.813
Figure 4-5 Fuzzy similarity, R = 15 halving = 3
4.3 Image Quality Assessment
4.3.1 Synthetic dataset
Figure 4-6 displays the main outcomes for the first dataset on the basis of the default parameters
given in the paper (radius = 11, deviation = 1.5).
1a-1b: SSIM = 0.243
2a-2b: SSIM = 0.759 2a-2c: SSIM = 0.391 2b-2c: SSIM = 0.253
3a-3b: SSIM = 0.684 3a-3c: SSIM = 0.441 3b-3c: SSIM = 0.239
Figure 4-6 Main image quality assessment results
As expected, the similarity of 2b and 2c is lower than that of either 2a and 2b or 2a and 2c,
because it entails the differences found in both (likewise for the combination 3a, 3b and 3c). It
may be surprising that both in pattern and absolute value the difference between maps 1a and 1b
is more akin to 2b and 2c than to 2a and 2c (likewise for the combination 3a, 3b and 3c). This
can be explained by the similarity in contrast as a result of the gradient, which is found in the pairs
2a-2c and 3a-3c, but not in 1a-1b.
The results as a function of scale illustrate how the indicators respond differently to an
increasing scale. Both the similarity in luminance and that in contrast increase along with the
scale of the analysis. This is expected, as the increase in scale signifies an increase in tolerance
for location error. In order to learn from the scale-similarity plots it is necessary to look beyond
the trend, which is practically always positive, and consider the steepness instead.
Structure, on the other hand, does not have a direct relation with scale. Since it calculates a
correlation within the window, increasing the window does not imply that differences are
smoothed away. As the window expands it may pick up on large scale similarities (such as the
similarity in gradient, recognized in Figure 4-9 and Figure 4-12) or dissimilarities (e.g. the
misplaced spots in Figure 4-7, or the reversed gradient in Figure 4-8 and Figure 4-11).
Figure 4-7 Differences in map pair 1a-1b
Figure 4-8 Differences in map pair 2a-2b
Figure 4-9 Differences in map pair 2a-2c
Figure 4-10 Differences in map pair 2b-2c
Figure 4-11 Differences in map pair 3a-3b
Figure 4-12 Differences in map pair 3a-3c
Figure 4-13 Differences in map pair 3b-3c
(Each plot shows the similarity of luminance, contrast and structure on the vertical axis, from 0 to 1.2, against sigma on the horizontal axis, from 0 to 12, with window radius = 5 × sigma.)
4.3.2 Practical dataset
The results for the practical dataset (Table 4-2) indicate that Index resembles the GeoPearl
reference map better than MetaPearl does. The difference is mainly due to the difference in
structure, and hardly, if at all, due to luminance and contrast. This implies that Index and
MetaPearl achieve similar results within a small moving window, but Index is better capable of
predicting the local peaks and troughs.
It is striking that the difference between the two models is larger than the discrepancy with the
reference map in both situations. It suggests that the models complement each other, i.e. they
both get something right that the other does not. As a blunt approach to exploiting this fact, a
fourth map has been created, which is the mean of the two model maps, and indeed this
‘improved model’ outperforms the other two on all counts (Table 4-2).
Figure 4-14 details the spatial distribution of the error. The similarity in structure to the soil
map that was recognized in Section 4.1 is obscured here, because the direction of the error
(over- or under-estimation) is not reflected in the map. Still, it is recommended to consider the
spatial distribution of the error in light of the soil map.
Table 4-2 Image quality assessment results, for the practical dataset
SSIM Luminance Contrast Structure
Deviation = 1
Index - GeoPearl 0.87 0.98 0.95 0.93
MetaPearl - GeoPearl 0.77 0.98 0.94 0.83
Index - MetaPearl 0.61 0.96 0.91 0.69
Deviation = 4
Index - GeoPearl 0.91 0.99 0.98 0.93
MetaPearl - GeoPearl 0.83 0.99 0.98 0.85
Index - MetaPearl 0.68 0.98 0.97 0.72
Deviation = 1
Mean - GeoPearl 0.92 0.99 0.97 0.96
GeoPearl-Index GeoPearl-MetaPearl
Figure 4-14 Spatial distribution of structural difference SSIM in the practical dataset.
4.4 Wavelet verification
4.4.1 Synthetic dataset
The results of the wavelet verification are in line with expectations. Those map pairs that have
opposing gradients find a large error at the second to coarsest scale (the coarsest scale simply
compares the means of the two maps). The pairs with identical or no gradient only find errors at
the finer scales, mainly at the 2X2 and 4X4 aggregations. The maps that contain both types of
errors do indeed display two peaks in the mean squared error (4X4 and 32X32). The only
downside of the wavelet verification approach is that not all of the coarse scale errors are
registered as such; the graphs for the pairs 2a-2b and 3a-3b do not unambiguously make clear
that the errors are coarse scale only; instead they suggest that the error is mainly coarse scale but
to a lesser extent also fine scale. This may be related to the fact that the type of wavelet applied
is a discrete Haar wavelet, which is not sufficient to capture a smooth trend at a coarse scale.
The correlation results (Figure 4-16) are in line with those of the mean squared error. It can be
considered disappointing that the procedure does not fully recognize the inverse relation at the
coarsest scale that is present in pairs 2a-2b, 2b-2c, 3a-3b and 3b-3c. In the ideal situation the
comparison method would recognize the perfect negative correlation and return the value -1.
The explanation may be sought in the inclusion of NoData values in the comparison (as value 0).
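The scale attribution used here can be reproduced in outline with a simple Haar-style aggregation in Python. This is a sketch of the principle, not the exact implementation of Briggs & Levine (1997) or Casati et al. (2004), and the function names are ours:

```python
import numpy as np

def upsample(a):
    """Repeat each cell 2x2, back to the finer grid."""
    return np.kron(a, np.ones((2, 2)))

def scale_mse(A, B):
    """Attribute the mean squared error between two square power-of-two maps
    to dyadic scales by successive 2x2 mean aggregation (Haar-style)."""
    mse = {}
    cell = 1
    while A.size > 1:
        A2 = A.reshape(A.shape[0] // 2, 2, A.shape[1] // 2, 2).mean(axis=(1, 3))
        B2 = B.reshape(B.shape[0] // 2, 2, B.shape[1] // 2, 2).mean(axis=(1, 3))
        # detail removed by this aggregation step = this scale's contribution
        mse[cell] = float(np.mean(((A - upsample(A2)) - (B - upsample(B2))) ** 2))
        A, B, cell = A2, B2, cell * 2
    mse[cell] = float((A.item() - B.item()) ** 2)  # coarsest: difference of the map means
    return mse
```

Maps that differ only by a constant (a pure quantity error) register all of their error at the coarsest scale; swapping two adjacent cells registers at the finest.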
Figure 4-15 Wavelet verification results for the synthetic dataset: mean squared error
attributed to different scales. (One panel per map pair: 1a-1b, 2a-2b, 2a-2c, 2b-2c,
3a-3b, 3a-3c, 3b-3c; horizontal axis: scale (cell size) from 1 to 64; vertical axis: mean
squared error from 0 to 15.)
Figure 4-16 Wavelet verification results for the synthetic dataset: correlation depicted
for different scales. (One panel per map pair: 1a-1b, 2a-2b, 2a-2c, 2b-2c, 3a-3b,
3a-3c, 3b-3c; horizontal axis: scale (cell size) from 1 to 32; vertical axis: correlation.)
Note that the correlation for the coarsest scale of 64X64 cells is not calculated; at that
scale all cells in the map take the same value, which renders the correlation
meaningless.
4.4.2 Practical dataset
The results for the practical dataset indicate that for both maps the strongest differences are
related to the finest scale; however, up to very large scales (512X512 cells, a quarter of the map)
structural differences are found. Based on the strong spatial clustering of errors (Section 4.1) a
more pronounced error at larger scales was expected. The length of the ‘tail’ indicates that Index
has errors at a coarser scale than MetaPearl. The correlation and the mean squared error provide
approximately the same information.
Figure 4-17 Comparison of the two metamodels (Index – MetaPearl): mean squared error
and correlation as a function of scale. (Horizontal axis: scale (cell size) from 1 to
1024; mean squared error panel: 0 to 0.0012; correlation panel: 0.85 to 1.)
Figure 4-18 Wavelet verification results for the practical dataset: mean squared error
and correlation attributed to different scales, for the pairs Index – GeoPearl and
MetaPearl – GeoPearl. (Horizontal axis: scale (cell size) from 1 to 1024; mean
squared error panels: 0 to 0.0012; correlation panels: 0.85 to 1.)
4.5 Image Warping
The method based on morphological transformations is strained to its limits by the test case. In
the practical case it appears impossible to find a solution, whereas for the synthetic case
solutions are only found for some values of λ, the parameter that sets the weight of the
horizontal error relative to the vertical error. The synthetic maps have been compared at the
lowest value of λ that yields a solution in all cases. The motivation for this choice is to
maximize the tolerance for location errors while still being able to compare all maps according
to the same standard.
In short, the results are disappointing; only in the pair 1a-1b does the methodology find a
solution that actually has a lower RMISE after the transformation.
Table 4-3 Root mean squared error before and after transformation on the basis of
interpolated maps
RMISE before RMISE after Deformation penalty
1a-1b 0.13 0.11 0.080
1b-1a 0.13 0.12 0.086
2a-2b 0.14 0.14 0.048
2a-2c 0.076 0.078 0.050
2b-2c 0.15 0.15 0.058
3a-3b 0.18 0.18 0.066
3a-3c 0.058 0.074 0.047
3b-3c 0.18 0.19 0.16
4.6 Bivariate Spatial Association
4.6.1 Synthetic dataset
Positive Lee’s L values are found for those map pairs where the differences are exclusively the
effect of the locations of the spots. Negative Lee’s L values are found for those where the
gradients are reversed. The effect of the dissimilarity of the gradient is best recognized with a
spatial lag of 10 cell widths, whereas the similarity of the spots is best recognized at a spatial lag
of 2 or 3.
The maps indicate that the strongest impact on the L statistic in map pair 1a-1b stems from the
spots and not from the surrounding ‘flatness’. In the maps with reversed gradients the spots do
have a positive contribution to the L statistic, but are outweighed by the negative impact of the
gradient.
1a-1b
2a-2b 2a-2c 2b-2c
3a-3b 3a-3c 3b-3c
Figure 4-19 Local Lee’s L values for the synthetic dataset (spatial lag 1.5)
Figure 4-20 Lee’s L values as a function of the spatial lag (radius).
4.6.2 Practical dataset
The spatialized results indicate that Lee’s L statistic for the practical dataset is almost fully the
effect of positive local correlation. This means that there are many locations (in blue, Figure 4-
21) where the two local means stick out from the global mean in the same direction, but it
happens only at very few locations (in red) that a local mean in one map stands out negatively
and in the other positively. Besides this observation, neither the spatial results nor the relation
between spatial lag and L provide much insight into the nature of the differences.
Index-GeoPearl MetaPearl-GeoPearl
Figure 4-21 Local Lee’s L values for the practical dataset (spatial lag 1.5)
Figure 4-22 Lee’s L values as a function of the spatial lag (radius).
5 CONCLUSIONS AND RECOMMENDATIONS
The most general conclusion of this study is: Yes, there are raster similarity metrics
available and their application yields rich information on the nature, extent and spatial
distribution of differences and similarities in pairs of numerical maps.
Similarity metrics for pairs of raster datasets are used in different fields of science and
engineering. As a consequence, much methodological research has been done that is not
necessarily known from one discipline to another. This may in part be ascribed to network
effects, but also to differences in terminology. One of the objectives of this report is to look
beyond disciplinary barriers and provide a cross-section of comparison methodologies. A gap is
recognized between practice and theory; although advanced metrics (those taking into account
spatial relations between cells) are becoming available, the common practice is to perform cell-
by-cell comparisons. This leaves the literature fairly fragmented, offering individual methods
rather than an evolving theory. In effect the report has the character of a sampler of some
recently developed methods.
Eight methods have been introduced and briefly discussed. Of these, five have been applied on
two test cases: Fuzzy Numerical, Image Quality Assessment, Wavelet Field Verification,
Bivariate Spatial Association and Image Warping.
The only method that performed inadequately is Image Warping, based on the paper by Reilly et
al. (2004). This method applies numerical optimization to find a morphological transformation
that balances horizontal and vertical errors. The optimization only finds trivial solutions for the
synthetic dataset and does not find a solution at all for the more complicated practical dataset. It
is not recommended, however, to simply discard this approach; in theory it can solve some
problems associated with moving-window approaches (such as Image Quality Assessment and
Bivariate Spatial Association) and aggregation-based methods (such as Wavelet Field
Verification). Future research into this type of comparison should not only focus on
numerical improvements, but should preferably also apply morphological transformations that
conserve the 'volume' of the map. This implies morphological operators that do not move
points in space, but instead move 'volume' from one cell to another. This would be especially
relevant when the raster represents a stock variable.
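The volume-moving idea can be illustrated with the one-dimensional earth mover's (Wasserstein) distance, which measures the minimum amount of 'volume' times distance that must be transported to turn one profile into another. This is not the warping method of Reilly et al. (2004) itself, only a minimal sketch of the volume-conserving alternative recommended above, assuming the raster rows represent a stock variable:

```python
import numpy as np

def earth_mover_1d(a, b, cell_size=1.0):
    """Minimum 'volume times distance' to turn profile a into profile b.

    Both profiles are first normalized so they hold the same total volume;
    for equal-mass 1-D histograms, the optimal transport cost is the area
    between the two cumulative curves.
    """
    a = np.asarray(a, dtype=float) / np.sum(a)
    b = np.asarray(b, dtype=float) / np.sum(b)
    return np.sum(np.abs(np.cumsum(a) - np.cumsum(b))) * cell_size

# A spot shifted by two cells: cost equals volume (1) times distance (2)
print(earth_mover_1d([0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0]))  # → 2.0
```

Unlike a cell-by-cell metric, which would score the shifted spot as two unrelated errors, this cost grows smoothly with the displacement distance.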
The method of Briggs & Levine (1997) applies wavelets to obtain indices of map differences at
different scales. Of the four methods that were applied successfully, it is best capable of
differentiating between the large-scale and small-scale errors present in the synthetic datasets.
The only downside of this method is that it attributes the coarse-scale errors not only to the
coarsest scales but, to a lesser extent, also to the finer scales. This is an effect of the discrete
nature of the wavelets. It is recommended to investigate whether the application of continuous
wavelets can reduce this effect. Another recommendation is to apply the wavelet approach to
multi-scale analysis, for instance of structure metrics such as patch size, diversity and edge
density. The negative correlation at coarse scales that is present in some of the test maps was
not properly recognized by the comparison method. The likely explanation is that NoData
values are interpreted as the value 0.
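The scale decomposition can be sketched with plain Haar-style aggregation. This is a simplification of the wavelet decomposition of Briggs & Levine (1997), not their exact procedure: because 2×2 block means are orthogonal to the within-block residuals, the mean squared error of a difference map partitions exactly across dyadic scales (map dimensions are assumed here to be powers of two):

```python
import numpy as np

def haar_mse_by_scale(diff):
    """Partition the mean squared error of a difference map across
    dyadic scales via repeated 2x2 aggregation (Haar-style sketch).

    Returns per-scale contributions, finest scale first; the last entry
    is the squared overall mean (the bias term). The contributions sum
    exactly to np.mean(diff**2).
    """
    field = np.asarray(diff, dtype=float)
    contributions = []
    while min(field.shape) > 1:
        rows, cols = field.shape
        coarse = field.reshape(rows // 2, 2, cols // 2, 2).mean(axis=(1, 3))
        # Within-block residual is orthogonal to the block means, so its
        # energy is exactly this scale's share of the total MSE.
        detail = field - np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)
        contributions.append(float(np.mean(detail ** 2)))
        field = coarse
    contributions.append(float(field[0, 0] ** 2))  # overall bias term
    return contributions
```

A checkerboard of 4×4 blocks, for instance, registers all of its error at the coarse scales and none at the finest, which is exactly the behaviour the report relies on to separate large-scale from small-scale errors.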
The results of the Bivariate Spatial Association are hard to interpret, in particular since the
spatial association between two identical maps is not equal to 1. Nevertheless, the spatialized
version of this metric seems very much in line with human observation. In particular, the
method is the only one that identifies partly overlapping spots as being highly similar. The
calculation of expected similarities and variance that the author introduced in a later paper (Lee,
2004) is not considered here, because for medium-sized maps it requires prohibitively large
matrix operations. A recommendation for taking distributions of errors into account is to use
Monte Carlo simulation. The advantage of Monte Carlo simulation is that it can be applied to
any similarity metric, and different stochastic null models can be applied.
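The recommended Monte Carlo approach can be sketched as follows. The permutation null model used here is only the simplest possible choice and destroys all spatial structure; any stochastic null model and any similarity metric (Pearson's r is used purely as an example) can be substituted:

```python
import numpy as np

def monte_carlo_pvalue(map_a, map_b, metric, n_sims=999, seed=None):
    """One-sided Monte Carlo significance for any similarity metric.

    The null model permutes the cells of map_b; spatially structured
    null models (neutral landscapes) can replace the permutation step.
    """
    rng = np.random.default_rng(seed)
    observed = metric(map_a, map_b)
    flat = map_b.ravel()
    hits = 0
    for _ in range(n_sims):
        null = rng.permutation(flat).reshape(map_b.shape)
        if metric(map_a, null) >= observed:
            hits += 1
    # +1 in numerator and denominator: the observed map counts as one draw
    return (hits + 1) / (n_sims + 1)

pearson = lambda a, b: np.corrcoef(a.ravel(), b.ravel())[0, 1]
```

With 999 simulations the smallest attainable p-value is 1/1000, so the number of simulations should be chosen to match the desired significance level.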
Fuzzy Numerical is the method that seems best able to distinguish areas of minor spatial errors
from areas of major spatial errors in the synthetic dataset. In the maps with a background
gradient this quality became obscured, because in more diverse neighbourhoods more
mitigating cells can also be found. Further decomposition of the error, along the lines of the
Image Quality Assessment, may be a solution.
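A minimal sketch of a Fuzzy Numerical comparison is given below. The local similarity function, the exponential distance decay and the parameter values are illustrative assumptions, not necessarily the exact choices of the original method; the essential features are that each cell may be 'mitigated' by similar values in its neighbourhood, and that the two-way minimum makes the result symmetric:

```python
import numpy as np

def fuzzy_numerical(map_a, map_b, radius=2, halving=1.0):
    """Per-cell fuzzy similarity between two rasters (sketch).

    Each cell of map_a is compared with all cells of map_b within
    `radius`, discounted by an exponential distance decay with the
    given halving distance; the best discounted match is kept.
    """
    def one_way(a, b):
        rows, cols = a.shape
        out = np.zeros((rows, cols))
        for i in range(rows):
            for j in range(cols):
                best = 0.0
                for di in range(-radius, radius + 1):
                    for dj in range(-radius, radius + 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < rows and 0 <= jj < cols:
                            d = np.hypot(di, dj)
                            if d <= radius:
                                denom = max(abs(a[i, j]), abs(b[ii, jj]))
                                local = 1.0 if denom == 0 else \
                                    1.0 - abs(a[i, j] - b[ii, jj]) / denom
                                best = max(best, local * 0.5 ** (d / halving))
                out[i, j] = best
        return out
    # two-way minimum: a cell is only as similar as its worse direction
    return np.minimum(one_way(map_a, map_b), one_way(map_b, map_a))
```

Identical maps score 1 everywhere; a spot displaced by one cell still scores well because the neighbourhood search finds the matching value nearby, at a distance-decayed discount.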
Image Quality Assessment yields results that are in line with expectations and allows a
decomposition of the error into different sources. This method gave the clearest feedback on the
practical dataset. A disadvantage of this method is the occurrence of blocking artefacts. These
cause only a minor distortion thanks to the distance-decay weights, but a full solution may be
offered by a fuzzy weighting system along the lines of Hagen (2003).
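The decomposition into luminance, contrast and structure follows Wang et al. (2004). For a single pair of local windows the three terms can be computed as below; the stabilizing constants c1 and c2 are chosen here arbitrarily (Wang et al. derive them from the dynamic range of the data, with c3 = c2/2):

```python
import numpy as np

def ssim_components(x, y, c1=1e-4, c2=1e-3):
    """Luminance, contrast and structure terms of Wang et al. (2004)
    for one pair of windows x and y."""
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - mx) * (y - my)).mean()   # covariance of the two windows
    c3 = c2 / 2
    luminance = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    contrast = (2 * sx * sy + c2) / (sx ** 2 + sy ** 2 + c2)
    structure = (sxy + c3) / (sx * sy + c3)
    return luminance, contrast, structure
```

Each term equals 1 for identical windows; a pure level shift lowers only the luminance term, which is what makes the per-source error maps in Annex A possible.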
It is recommended to investigate the possibilities of a hybrid method that takes elements from
both Image Quality Assessment and Fuzzy Numerical: from IQA the different types of
neighbourhood comparison (luminance, contrast, structure), and from the Fuzzy Numerical and
Fuzzy Kappa methods the fuzzy weighting.
With regard to application in the Map Comparison Kit, it is stressed that the gap between theory
and practice will narrow as methods become available in user-friendly software. It is
therefore recommended to make the methods discussed in this report available in the software.
An issue to resolve is how to deal with NoData values and non-rectangular maps, in particular
for Wavelet Field Verification and Image Warping. One of the advantages of the MCK is
that it is equipped with tools to perform structured analysis by multiple comparisons, including
a Monte Carlo approach to significance on the basis of neutral models. It is recommended to
extend the MCK with neutral models of continuous valued landscapes, because these are
currently not supported.
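One simple neutral model for continuous valued landscapes, offered here only as an assumed illustration rather than a proposal for the MCK, is Gaussian-filtered white noise; the length scale controls the spatial autocorrelation of the generated map:

```python
import numpy as np

def neutral_landscape(shape, length_scale, seed=None):
    """Neutral model for a continuous valued landscape: white noise
    smoothed with a Gaussian spectral filter, then standardized.
    Larger length_scale gives smoother, more autocorrelated maps."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(shape)
    ky = np.fft.fftfreq(shape[0])[:, None]
    kx = np.fft.fftfreq(shape[1])[None, :]
    # Gaussian low-pass filter in the frequency domain
    gauss = np.exp(-2 * (np.pi * length_scale) ** 2 * (kx ** 2 + ky ** 2))
    field = np.fft.ifft2(np.fft.fft2(noise) * gauss).real
    return (field - field.mean()) / field.std()
```

Fields drawn from such a model can serve as the stochastic null maps in the Monte Carlo significance testing recommended above, replacing structureless cell permutations.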
6 REFERENCES
Ahrens, B., Karstens, U., Rockel, B., & Stuhlmann, R. (1998). On the validation of the
atmospheric model REMO with ISCCP data and precipitation measurements using simple
statistics. Meteorology and Atmospheric Physics, 68(3-4), 127-142.
Anselin, L. (1995). Local indicators of spatial association - LISA. Geographical Analysis, 27(2),
93-115.
Bishop, G. D., Church, M. R., Aber, J. D., Neilson, R. P., Ollinger, S. V., & Daly, C. (1998). A
comparison of mapped estimates of long-term runoff in the northeast United States. Journal
of Hydrology, 206(3-4), 176-190.
Bogena, H., Kunkel, R., Schobel, T., Schrey, H. P., & Wendland, E. (2005). Distributed
modeling of groundwater recharge at the macroscale. Ecological Modelling, 187(1), 15-26.
Boots, B., & Csillag, F. (2006). Categorical maps, comparisons, and confidence. Journal of
Geographical Systems, 8(2), 109-118.
Briggs, W. M., & Levine, R. A. (1997). Wavelets and field forecast verification. Monthly
Weather Review, 125(6), 1329-1341.
Brooks, H. E., & Doswell, C. A. (1996). A comparison of measures-oriented and distributions-
oriented approaches to forecast verification. Weather and Forecasting, 11(3), 288-303.
Casati, B., Ross, G., & Stephenson, D. B. (2004). A new intensity-scale approach for the
verification of spatial precipitation forecasts. Meteorological Applications, 11(2), 141-154.
Domingues, M. O., Mendes, O., & da Costa, A. M. (2005). On wavelet techniques in
atmospheric sciences. Fundamentals of Space Environment Science, 35(5), 831-842.
Ebert, E. E. (2005). Forecast Verification - Issues, Methods and FAQ, [Internet]. Available:
http://www.bom.gov.au/bmrc/wefor/staff/eee/verif/verif_web_page.html [2006, 8 May].
Ebert, E. E., & McBride, J. L. (2000). Verification of precipitation in weather systems:
determination of systematic errors. Journal of Hydrology, 239(1-4), 179-202.
Eskicioglu, A. M., & Fisher, P. S. (1995). Image quality measures and their performance.
IEEE Transactions on Communications, 43(12), 2959-2965.
Foody, G. M. (2002). Status of land cover classification accuracy assessment. Remote Sensing
of Environment, 80(1), 185-201.
Garen, D. C., & Marks, D. (2005). Spatially distributed energy balance snowmelt modelling in a
mountainous river basin: estimation of meteorological inputs and verification of model
results. Journal of Hydrology, 315(1-4), 126-153.
Goovaerts, P., Jacquez, G. M., & Greiling, D. (2005). Exploring scale-dependent correlations
between cancer mortality rates using factorial kriging and population-weighted
semivariograms. Geographical Analysis, 37(2), 152-182.
Hagen, A. (2003). Fuzzy set approach to assessing similarity of categorical maps. International
Journal of Geographical Information Science, 17(3), 235-249.
Hagen-Zanker, A., Engelen, G., Hurkens, J., Vanhout, R., & Uljee, I. (2006). Map Comparison
Kit 3: User Manual. Maastricht: Research Institute for Knowledge Systems.
Hagen-Zanker, A., Straatman, B., & Uljee, I. (2005). Further developments of a fuzzy set map
comparison approach. International Journal of Geographical Information Science, 19(7),
769-785.
Hoffman, R. N., Liu, Z., Louis, J.-F., & Grassoti, C. (1995). Distortion Representation of
Forecast Errors. Monthly Weather Review, 123(9), 2758-2770.
Lee, S. I. (2004). A generalized significance testing method for global measures of spatial
association: an extension of the Mantel test. Environment and Planning A, 36(9), 1687-1703.
Lee, S.-I. (2001). Developing a bivariate spatial association measure: An integration of
Pearson's r and Moran's I. Journal of Geographical Systems, 3(4), 369-385.
Liu, J., Chen, J. M., Cihlar, J., & Park, W. M. (1997). A process-based boreal ecosystem
productivity simulator using remote sensing inputs. Remote Sensing of Environment, 62(2),
158-175.
Menard, A., & Marceau, D. J. Simulating the impact of forest management scenarios in an
agricultural landscape of southern Quebec, Canada, using a geographic cellular automata.
Landscape and Urban Planning, In Press, Corrected Proof.
Miliaresis, G. C., & Paraschou, C. V. E. (2005). Vertical accuracy of the SRTM DTED level 1
of Crete. International Journal of Applied Earth Observation and Geoinformation, 7(1), 49-
59.
Ostrem, G., & Haakensen, N. (1999). Map Comparison or Traditional Mass-balance
Measurements: Which Method is Better? Geografiska Annaler, Series A: Physical
Geography, 81(4), 703-711.
O'Sullivan, D., & Unwin, D. (2002). Geographic information analysis. Hoboken, N.J.: Wiley.
Pal, N. R., & Pal, S. K. (1993). A review on image segmentation techniques. Pattern
Recognition, 26(9), 1277-1294.
Pontius Jr., R. G., Huffaker, D., & Denman, K. (2004). Useful techniques of
validation for spatially explicit land-change models. Ecological Modelling, 179(4), 445-461.
Prasad, A. M., Iverson, L. R., & Liaw, A. (2006). Newer classification and regression tree
techniques: Bagging and random forests for ecological prediction. Ecosystems, 9(2), 181-
199.
Reilly, C., Price, P., Gelman, A., & Sandgathe, S. A. (2004). Using image and curve registration
for measuring the goodness of fit of spatial and temporal predictions. Biometrics, 60(4), 954-
964.
Santer, B. D., Wigley, T. M. L., & Jones, P. D. (1993). Correlation Methods in Fingerprint
Detection Studies. Climate Dynamics, 8(6), 265-276.
Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A., & Jain, R. (2000). Content-based
image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 22(12), 1349-1380.
Strasser, U., & Mauser, W. (2001). Modelling the spatial and temporal variations of the water
balance for the Weser catchment 1965-1994. Journal of Hydrology, 254(1-4), 199-214.
Tompa, D., Morton, J., & Jernigan, E. (2000). Perceptually based image comparison. Paper
presented at the 2000 International Conference on Image Processing, Vancouver, Canada.
Tustison, B., Harris, D., & Foufoula-Georgiou, E. (2001). Scale issues in verification of
precipitation forecasts. Journal of Geophysical Research-Atmospheres, 106(D11), 11775-
11784.
Viscarra Rossel, R. A., & Walter, C. (2004). Rapid, quantitative and spatial field measurements
of soil pH using an Ion Sensitive Field Effect Transistor. Geoderma, 119(1-2), 9-20.
Visser, H., & de Nijs, T. (2006). The Map Comparison Kit. Environmental Modelling &
Software, 21(3), 346-358.
Visser, H., Hagen, A., Nijs (de), T., Klein Goldewijk, C. G. M., Borsboom - van Beurden, J. A.
M., & Niet (de), R. (2004). The Map Comparison Kit: Method, software and applications
(report 550002005). Bilthoven: RIVM.
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment:
From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4),
600-612.
Wartenberg, D. (1985). Multivariate spatial correlation: a method for exploratory geographical
analysis. Geographical Analysis, 17, 263-283.
Wealands, S. R., Grayson, R. B., & Walker, J. P. (2005). Quantitative comparison of spatial
fields for hydrological model assessment - some promising approaches. Advances in Water
Resources, 28(1), 15-32.
Wigley, T. M. L., & Santer, B. D. (1990). Statistical Comparison of Spatial Fields in Model
Validation, Perturbation, and Predictability Experiments. Journal of Geophysical Research-
Atmospheres, 95(D1), 851-865.
Zepeda-Arce, J., Foufoula-Georgiou, E., & Droegemeier, K. K. (2000). Space-time rainfall
organization and its role in validating quantitative precipitation forecasts. Journal of
Geophysical Research-Atmospheres, 105(D8), 10129-10146.
Zhang, L. J., & Gove, J. H. (2005). Spatial assessment of model errors from four regression
techniques. Forest Science, 51(4), 334-346.
Zhou, Q. M., & Liu, X. J. (2004). Analysis of errors of derived slope and aspect related to DEM
data properties. Computers & Geosciences, 30(4), 369-378.
Annex A: DETAILED RESULTS IMAGE QUALITY ASSESSMENT
Figure A-1 Differences in pair 2a-2b as a function of variance and radius. Panels: Luminance (mean), Contrast (variance) and Structure (covariance), at Sigma = 1 (R = 5), Sigma = 4 (R = 20) and Sigma = 10 (R = 50).
Figure A-2 Differences in pair 1a-1b as a function of variance and radius. Panels: Luminance (mean), Contrast (variance) and Structure (covariance), at Sigma = 1 (R = 5), Sigma = 2 (R = 10), Sigma = 4 (R = 20) and Sigma = 10 (R = 50).
Figure A-3 Differences in pair 2b-2c as a function of variance and radius. Panels: Luminance (mean), Contrast (variance) and Structure (covariance), at Sigma = 1 (R = 5), Sigma = 4 (R = 20) and Sigma = 10 (R = 50).
Figure A-4 Differences in pair 2a-2c as a function of variance and radius. Panels: Luminance (mean), Contrast (variance) and Structure (covariance), at Sigma = 1 (R = 5), Sigma = 4 (R = 20) and Sigma = 10 (R = 50).
Figure A-5 Differences in pair 3a-3b as a function of variance and radius. Panels: Luminance (mean), Contrast (variance) and Structure (covariance), at Sigma = 1 (R = 5), Sigma = 4 (R = 20), Sigma = 8 (R = 40) and Sigma = 10 (R = 50).
Figure A-6 Differences in pair 3b-3c as a function of variance and radius. Panels: Luminance (mean), Contrast (variance) and Structure (covariance), at Sigma = 1 (R = 5), Sigma = 4 (R = 20) and Sigma = 10 (R = 50).
Figure 6-1 Differences in pair 3a-3c as a function of variance and radius. Panels: Luminance (mean), Contrast (variance) and Structure (covariance), at Sigma = 1 (R = 5), Sigma = 4 (R = 20) and Sigma = 10 (R = 50).
Figure A-7 Differences in pair Index-GeoPearl. Panels: Overall (Deviation 1), Overall (Deviation 4), Luminance (Deviation 1), Contrast (Deviation 1) and Structure (Deviation 1).
Figure A-8 Differences in pair MetaPearl-GeoPearl. Panels: Overall (Deviation 1), Overall (Deviation 4), Luminance (Deviation 1), Contrast (Deviation 1) and Structure (Deviation 1).
Figure A-9 Differences in pair MetaPearl-Index. Panels: Overall (Deviation 1), Overall (Deviation 4), Luminance (Deviation 1), Contrast (Deviation 1) and Structure (Deviation 1).
... We compared the hard and soft recreation potential maps of the whole Cairngorms National Park and the areas seen from the two trails using the Fuzzy Numerical (FN) approach (Hagen-Zanker, 2006). This method compares output values at each pixel of two maps, while also accounting for the values of neighbouring pixels that may mitigate deviations between the focal pixels (Hagen-Zanker, 2006), and provides spatially explicit results. ...
... We compared the hard and soft recreation potential maps of the whole Cairngorms National Park and the areas seen from the two trails using the Fuzzy Numerical (FN) approach (Hagen-Zanker, 2006). This method compares output values at each pixel of two maps, while also accounting for the values of neighbouring pixels that may mitigate deviations between the focal pixels (Hagen-Zanker, 2006), and provides spatially explicit results. The FN approach generates a statistic ranging from 0 (completely different) to 1 (identical) between the two maps. ...
... The FN approach generates a statistic ranging from 0 (completely different) to 1 (identical) between the two maps. The influence of neighbouring pixels on a locations' FN statistic depends on the distance weight function, introducing an element of subjectivity (Hagen-Zanker, 2006;Visser & de Nijs, 2006). We used an exponential decay function with a 200 m halving distance and 400 m radius. ...
Article
There is often pressure on protected area managers to enhance wild landscapes and protect biodiversity while simultaneously promoting goals such as assuring ongoing provision of ecosystem services and providing for tourist and recreational needs. In the current study, we investigate two tools that may provide different types of management-relevant knowledge on recreation ecosystem services to aid management decisions in a protected area (PA), particularly where such tradeoffs must be considered. We constructed a spatial recreational model of the whole Cairngorms National Park, Scotland (ESTIMAP-Recreation), parameterised by hard and soft recreationalists (providing “where and what” types of knowledge). We then conducted 33 interviews with managers, residents, and visitors, asking if and how they considered the resulting maps useful. In parallel, we conducted focused walking interviews in two contrasting park sites, favoured by hard and soft recreationalist to determine if it was possible to gain further insight into other aspects of the recreational ecosystem services (the “why”). These walking interviews were conducted in a woodland adjacent to an urban area (30 interviews) and in a wild landscape mountain area (22 interviews). The complimentary nature of these approaches highlights the diverse sources of knowledge PA managers may exploit to assist in managing the many conflicting objectives imposed upon them.
... We calculated the difference maps from the pixel value of our forest AGB estimated map minus the corresponding pixel value of the other forest AGB maps. We derived the FN value, which represented the spatial distribution similarity of two raster maps-that is, the higher the FN value, the greater the spatial similarity of the two maps [56]. Calculating the FN value, we should choose the calculated window [56], and in order to directly compare the spatial similarity between the two maps, the window in this article was 1 × 1. ...
... We derived the FN value, which represented the spatial distribution similarity of two raster maps-that is, the higher the FN value, the greater the spatial similarity of the two maps [56]. Calculating the FN value, we should choose the calculated window [56], and in order to directly compare the spatial similarity between the two maps, the window in this article was 1 × 1. ...
Article
Full-text available
Global aboveground forest biomass (AGB) is very important in quantifying carbon stock.It is necessary to estimate forest AGB accurately. Many studies have obtained reliable AGB estimates by using Light Detection and Ranging (LiDAR) data. However, LiDAR data frequently are not available at a regional over a long time. Although many studies have integrated multi-source data to estimate biomass to compensate for these deficiencies, few methods can be applied to produce global time series of high-resolution AGB due to the complexity of the method, data source limitations, and large uncertainty. This study developed a new method to produce a global forest AGB map using multiple data sources—including LiDAR-derived biomass products, a suite of high-level satellite products, forest inventory data, and other auxiliary datasets—to train estimated models for five different forest types. We explored three machine learning methods (artificial neural network, multivariate adaptive regression splines, and gradient boosting regression tree (GBRT)) to build the estimated models. The GBRT method was the optimal algorithm for generating a global forest AGB map at a spatial resolution of 1 km. The independent validation result showed good accuracy with an R2 value of 0.90 and a RMSE value of 35.87 Mg/ha. Moreover, we compared the generated global forest AGB map with several other forest AGB maps and found the results to be highly consistent. An important feature of this new method is its ability to produce time series of high-resolution global forest AGB maps because it heavily relies on high-level satellite products.
... These methods were fundamentally first proposed by Hagen (2003) for comparing categorical geodata. First, we introduce fuzziness in space and preserve continuous numeric values of raster datasets, which is referred to as fuzzy numerical map comparison (Hagen, 2006). Second, we implement fuzzy sets and kappa κ statistics to establish a method for comparing categorical (i.e., categorized numerical values) raster maps. ...
... The sensitivity of the plume-dispersal model projections to different input parameters was tested with Fuzzy Kappa statistic for assessing the similarity of numerical maps. In general terms, the fuzzy numerical map statistic goes beyond a traditional cell-by-cell comparison and takes the neighbouring cells into account to compensate for spatial offsets in correlation analyses through fuzziness of location (Hagen, 2003;Hagen, 2006). We adopted an exponential distance decay membership function considering a neighbourhood radius of 4 cells and a halving distance of 2 cells. ...
Article
Full-text available
It is increasingly recognised that deep-sea mining of seafloor massive sulphides (SMS) could become an important source of mineral resources. These operations will remove the targeted substrate and produce potentially toxic plumes from in situ seabed excavation and from the return water pumped back down to the seafloor. However, the spatial extent of the impact of deep-sea mining is still uncertain because few field experiments and models of plume dispersion have been conducted. In this study, we used three-dimensional hydrodynamic models of the Azores region together with a theoretical commercial mining operation of polymetallic SMS to simulate the potential dispersal of plumes originating from different phases of mining operations, and to assess the magnitude of potential impacts. Although the model simulations presented here were subject to many caveats, they did reveal some important patterns. The model projected marked differences among sites making generalisations about plume-dispersal patterns in mid-ocean ridges difficult. Nevertheless, the models predicted large horizontal and vertical plume-dispersals above the thresholds adopted. Persistent plumes (temporal frequency >50%, i.e., 6 months out of 12 months) were projected to disperse an average linear distance of 10 to 20 km, cover an area of 17 to 150 km2, and extend more than 800 m in the water column. In fact, the model projected that plumes may disperse beyond the licensed mining areas, reach the flanks and summits of nearby topographic features, and extend into the bathypelagic, mesopelagic, and epipelagic environments. Modelled plume-dispersal overlaps with the predicted distribution of cold-water corals and with existing fishing activities. These potential impacts would be of particular concern in regions such as the Azores, where local populations are highly dependent on the sea for their livelihoods. 
The findings of this study are an important initial step towards understanding the nature and magnitude of deep-sea mining impacts in space and time.
... MCS is the map comparison statistic, and a and b correspond to the normalized values of the assessed indicators (nitrogen budget, nitrate leaching potential and denitrification potential). The statistical analysis aims to identify the average difference between each pair of compared datasets(Hagen- Zanker 2006;Schulp et al. 2014;Ma et al. 2019). ...
Article
Full-text available
In this study, the nutrient regulation ecosystem service (ES) demand was quantified and mapped in an agriculturally-dominated landscape in the federal German state of North Rhine-Westphalia. The demand was assessed in a case study area on an individual field scale. As an indicator for the nutrient regulation ecosystem service demand, nitrogen budgets were calculated. The assessment includes a comparison of an agriculturally calculated nitrogen budget to an ecologically calculated nitrogen budget. The agricultural calculation is based on legal regulations and considers volatile nitrogen losses from fertilizers, whereas the ecological calculation incorporates the total amount of nitrogen and includes also the atmospheric nitrogen deposition. Furthermore, the positive effects of additional agricultural practices on the nutrient regulation ES demand were identified. The spatial distribution of the nutrient regulation ES demand was compared to the distribution of the nitrate leaching and denitrification potential in order to analyse the relative vulnerability of individual fields to nutrient oversupply. The findings of this study, which highlight in particular the suitability of enlarged crop rotation systems, can be used to support sustainable agricultural practices and land management strategies on the local sale.EDITED BY Christine Fürst
... We also assigned three levels of model reliability (MR) [21]. To do so, we evaluated and combined five model performance variables into a single rating: (1) a pseudo-R 2 obtained from the RandomForest (RF) model; (2) a Fuzzy Kappa comparing the imputed RF map to the FIA-derived map [22]; (3) a tree skill statistic of the imputed RF, after removing records with very high coefficient of variables (CV); (4) the deviance of the CV among 30 regression trees via bagging [23,24]; and (5) the stability of the top five variables from 30 regression trees [25]. ...
Article
Full-text available
We modeled and combined outputs for 125 tree species for the eastern United States, using habitat suitability and colonization potential models along with an evaluation of adaptation traits. These outputs allowed, for the first time, the compilation of tree species’ current and future potential for each unit of 55 national forests and grasslands and 469 1 × 1 degree grids across the eastern United States. A habitat suitability model, a migration simulation model, and an assessment based on biological and disturbance factors were used with United States Forest Service Forest Inventory and Analysis data to evaluate species potential to migrate or infill naturally into suitable habitats over the next 100 years. We describe a suite of variables, by species, for each unique geographic unit, packaged as summary tables describing current abundance, potential future change in suitable habitat, adaptability, and capability to cope with the changing climate, and colonization likelihood over 100 years. This resulting synthesis and summation effort, culminating over two decades of work, provides a detailed data set that incorporates habitat quality, land cover, and dispersal potential, spatially constrained, for nearly all the tree species of the eastern United States. These tables and maps provide an estimate of potential species trends out 100 years, intended to deliver managers and publics with practical tools to reduce the vast set of decisions before them as they proactively manage tree species in the face of climate change.
... We created a model reliability (ModRel) score from a series of five metrics obtained from the performance statistics of each of 125 species. These included (1) a pseudo R 2 obtained from the RF model (RF R 2 ); (2) a Fuzzy Kappa (FK) metric which compares outputs of the imputed RF-predicted map to the FIA-derived map [72]; (3) the deviance of the CV (CVdev) among 30 regression trees via bagging [41]; and (4) the stability of the top five variables (Top5) from 30 regression trees, and (5) a true skill statistic (TSS) of the imputed RF. The first four were used previously, described in Iverson et al. [42]. ...
Article
Full-text available
Forests across the globe are faced with a rapidly changing climate and an enhanced understanding of how these changing conditions may impact these vital resources is needed. Our approach is to use DISTRIB-II, an updated version of the Random Forest DISTRIB model, to model 125 tree species individually from the eastern United States to quantify potential current and future habitat responses under two Representative Concentration Pathways (RCP 8.5 -high emissions which is our current trajectory and RCP 4.5 -lower emissions by implementing energy conservation) and three climate models. Climate change could have large impacts on suitable habitat for tree species in the eastern United States, especially under a high emissions trajectory. On average, of the 125 species, approximately 88 species would gain and 26 species would lose at least 10% of their suitable habitat. The projected change in the center of gravity for each species distribution (i.e., mean center) between current and future habitat moves generally northeast, with 81 species habitat centers potentially moving over 100 km under RCP 8.5. Collectively, our results suggest that many species will experience less pressure in tracking their suitable habitats under a path of lower greenhouse gas emissions.
... Our SSI i , based on Hangen-Zanker [22], measures the similarity of modeled production reported in pixel i estimated using one of our alternative methodological-cum-data choices, a i, relative to modeled production for that same pixel taken from the original SPAM2005 data set, b i, . By construction, SSI i ranges from 0 (entirely distinct) to 1 (identical), and is calculated between each pair of corresponding pixels (a i, and b i, ) using the following similarity function: ...
Article
Full-text available
Worldwide, crop production is intrinsically intertwined with biological, environmental and economic systems, all of which involve complex, inter-related and spatially-sensitive phenomena. Thus knowing the location of agriculture matters much for a host of reasons. There are several widely cited attempts to model the spatial pattern of crop production worldwide, not least by pixilating crop production statistics originally reported on an areal (administrative boundary) basis. However, these modeled measures have had little scrutiny regarding the robustness of their results to alternative data and modeling choices. Our research casts a critical eye over the nature and empirical plausibility of these types of datasets. To do so, we determine the sensitivity of the 2005 variant of the spatial production allocation model data series (SPAM2005) to eight methodological-cum-data choices in nine agriculturally-large and developmentally-variable countries: Brazil, China, Ethiopia, France, India, Indonesia, Nigeria, Turkey and the United States. We compare the original published estimates with those obtained from a series of robustness tests using various aggregations of the pixelized spatial production indicators (specifically, commodity-specific harvested area, production quantity and yield). Spatial similarity is empirically assessed using a pixel-level spatial similarity index (SSI). We find that the SPAM2005 estimates are most dependent on the degree of disaggregation of the underlying national and subnational production statistics. The results are also somewhat sensitive to the use of a simple spatial allocation method based solely on cropland proportions versus a cross-entropy allocation method, as well as the set of crops or crop aggregates being modeled, and are least sensitive to the inclusion of crude economic elements. 
Finally, we assess the spatial concordance between the SPAM2005 estimates of the area harvested of major crops in the United States and pixelated measures derived from remote-sensed data.
Article
Full-text available
Airborne-pests can be introduced into Korea from overseas areas by wind, which can cause considerable damage to major crops. Meteorological models have been used to estimate the wind trajectories of airborne insects. The objective of this study is to analyze the effect of input settings on the prediction of areas where airborne pests arrive by wind. The wind trajectories were predicted using the HYbrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) model. The HYSPLIT model was used to track the wind dispersal path of particles under the assumption that brown plant hopper (Nilaparvata lugens) was introduced into Korea from sites where the pest was reported in China. Meteorological input data including instantaneous and average wind speed were generated using meso-scale numerical weather model outputs for the domain where China, Korea, and Japan were included. In addition, the calculation time intervals were set to 1, 30, and 60 minutes for the wind trajectory calculation during early June in 2019 and 2020. It was found that the use of instantaneous and average wind speed data resulted in a considerably large difference between the arrival areas of airborne pests. In contrast, the spatial distribution of arrival areas had a relatively high degree of similarity when the time intervals were set to be 1 minute. Furthermore, these dispersal patterns predicted using the instantaneous wind speed were similar to the regions where the given pest was observed in Korea. These results suggest that the impact assessment of input settings on wind trajectory prediction would be needed to improve the reliability of an approach to predict regions where airborne-pest could be introduced.
Article
Numerical modeling represents a state‐of‐the‐art technique to simulate hydro‐morphodynamic processes in river ecosystems. Numerical models are often validated based on observed topographic change in the form of pixel information on net erosion or deposition over a simulation period. When model validation is performed by a pixel‐by‐pixel comparison of exactly superimposed simulated and observed pixels, zero or negative correlation coefficients are often calculated, suggesting poor model performance. Thus, a pixel‐by‐pixel approach penalizes quantitative simulation errors, even if a model conceptually works well. To distinguish between reasonably well‐performing and non‐representative models, this study introduces and tests fuzzy map comparison methods. First, we use a fuzzy numerical map comparison to compensate for spatial offset errors in correlation analyses. Second, we add a level of fuzziness with a fuzzy kappa map comparison to additionally address quantitative inaccuracy in modeled topographic change by categorizing data. Sample datasets from a physical lab model and datasets from a 6.9 km long gravel–cobble bed river reach enable the verification of the relevance of fuzzy map comparison methods. The results indicate that a fuzzy numerical map comparison is a viable technique to compensate for model errors stemming from spatial offset. In addition, fuzzy kappa map comparisons are suitable for objectively expressing subjectively perceived correlation between two maps, provided that a small number of categories is used. The methods tested and the resulting spatially explicit comparison maps represent a significant opportunity to improve the evaluation and potential calibration of numerical models of river ecosystems in the future.
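The core idea of the fuzzy numerical comparison described in this abstract can be illustrated with a short sketch. The version below is a simplified one-directional pass under two assumptions that are commonly reported in the map-comparison literature but are not taken from this paper: a per-cell similarity of 1 − |a − b| / max(|a|, |b|), and an exponential distance decay over the neighborhood. The published method combines both comparison directions and treats map edges more carefully.

```python
import numpy as np

def cell_similarity(u, v):
    """Similarity of two cell values on a 0..1 scale; 1 for equal values."""
    m = max(abs(u), abs(v))
    if m == 0.0:
        return 1.0  # both cells zero: perfect agreement
    return max(0.0, 1.0 - abs(u - v) / m)

def fuzzy_numerical(a, b, halving_distance=2.0, radius=2):
    """One-directional fuzzy numerical similarity of raster `a` against `b`.
    For each cell of `a`, the best distance-decayed match within a square
    neighborhood of `b` is taken, and the per-cell scores are averaged."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rows, cols = a.shape
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            best = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        # exponential decay: similarity halves every
                        # `halving_distance` cells of displacement
                        decay = 2.0 ** (-np.hypot(di, dj) / halving_distance)
                        best = max(best, decay * cell_similarity(a[i, j], b[ii, jj]))
            total += best
    return total / (rows * cols)
```

Averaging `fuzzy_numerical(a, b)` and `fuzzy_numerical(b, a)` gives a symmetric score; identical rasters score 1, and spatially offset but otherwise similar patterns are penalized only gradually, which is exactly the property the study above exploits for validating morphodynamic models.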
Article
Objective methods for assessing perceptual image quality have traditionally attempted to quantify the visibility of errors between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a Structural Similarity Index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MatLab implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
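The Structural Similarity Index introduced above has a compact closed form. The sketch below computes a single global SSIM value from the standard formula; the published method instead applies the same formula within a sliding (typically Gaussian-weighted) window and averages the resulting map. The constants `k1`, `k2` and the 8-bit data range are the commonly reported defaults, assumed here for illustration.

```python
import numpy as np

def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized images/rasters."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (k1 * data_range) ** 2  # stabilizers for near-zero denominators
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    # luminance/contrast/structure terms combined into the standard form
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

Identical inputs score exactly 1; a uniform brightness shift lowers the luminance term while leaving the structure term intact, which is what distinguishes SSIM from plain RMSE.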
Article
This paper describes a boreal ecosystems productivity simulator (BEPS) recently developed at the Canada Centre for Remote Sensing to assist in natural resources management and to estimate the carbon budget over the Canadian landmass (10⁶-10⁷ km²). BEPS uses principles of FOREST biogeochemical cycles (FOREST-BGC) (Running and Coughlan, 1988) for quantifying the biophysical processes governing ecosystems productivity, but the original model is modified to better represent canopy radiation processes. A numerical scheme is developed to integrate different data types: remote sensing data at 1-km resolution in Lambert conformal conic projection, daily meteorological data in Gaussian or longitude-latitude gridded systems, and soil data grouped in polygons. The processed remote sensing data required in the model are leaf area index (LAI) and land-cover type. The daily meteorological data include air temperature, incoming shortwave radiation, precipitation, and humidity. The soil-data input is the available water-holding capacity. The major outputs of BEPS include spatial fields of net primary productivity (NPP) and evapotranspiration. The NPP calculated by BEPS has been tested against biomass data obtained in Quebec, Canada. A time series of LAI over the growing season of 1993 in Quebec was derived by using 10-day composite normalized difference vegetation index images acquired by the advanced very high resolution radiometer at 1-km resolution (resampled). Soil polygon data were mosaicked, georeferenced, and rasterized in a geographic information system (ARC/INFO). With the use of the process-based model incorporating all major environmental variables affecting plant growth and development, detailed spatial distributions of NPP (annual and four seasons) in Quebec are shown in this paper. The accuracy of NPP calculation is estimated to be 60% for single pixels and 75% for 3x3 pixel areas (9 km²).
The modeled NPP ranges from 0.6 kg C/m²/year at the southern border to 0.01 kg C/m²/year at the northern limit of the province. The total annual NPP in Quebec is estimated to be 0.24 Gt C in 1993, which is about 0.3-0.4% of the global NPP. (C) Elsevier Science Inc., 1997.
Article
Precipitation forecasts from numerical weather prediction models are often compared to rain gauge observations to make inferences as to model performance and the “best” resolution needed to accurately capture the structure of observed precipitation. A common approach to quantitative precipitation forecast (QPF) verification is to interpolate the model-predicted areal averages (typically assigned to the center point of the model grid boxes) to the observation sites and compare observed and predicted point values using statistical scores such as bias and RMSE. In such an approach, the fact that the interpolated values and their uncertainty depend on the scale (model resolution) of the values from which the interpolation was done is typically ignored. This interpolation error, which comes from scale effects, is referred to here as the “representativeness error.” It is a nonzero scale-dependent error even for the case of a perfect model and thus can be seen as independent of model performance. The scale dependency of the representativeness error can have a significant effect on model verification, especially when model performance is judged as a function of grid resolution. An alternative method is to upscale the gauge observations to areal averages and compare at the scale of the model output. Issues of scale arise here too, with a different scale dependency in the representativeness error. This paper examines the merits and limitations of both verification methods (area-to-point and point-to-area) in view of the pronounced spatial variability of precipitation fields and the inherent scale dependency of the representativeness error in each of the verification procedures. A composite method combining the two procedures is introduced and shown to diminish the scale dependency of the representativeness error.
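The "point-to-area" direction described above (upscaling gauge observations to the model's grid-box averages) is straightforward to sketch. The block below is a minimal illustration, assuming gauges on a plane with coordinates in the same units as the grid spacing; boxes containing no gauge are left as NaN so they can be excluded from bias/RMSE scores. Function and parameter names are hypothetical.

```python
import numpy as np

def upscale_gauges(gauge_xy, gauge_vals, grid_size, nx, ny):
    """Average point gauge observations into model grid-box means.

    gauge_xy   : iterable of (x, y) gauge coordinates
    gauge_vals : observed values at the gauges
    grid_size  : edge length of a square model grid box
    nx, ny     : number of grid boxes in x and y
    Boxes without any gauge are returned as NaN.
    """
    sums = np.zeros((ny, nx))
    counts = np.zeros((ny, nx))
    for (x, y), v in zip(gauge_xy, gauge_vals):
        # locate the grid box containing the gauge
        i = min(int(y // grid_size), ny - 1)
        j = min(int(x // grid_size), nx - 1)
        sums[i, j] += v
        counts[i, j] += 1
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
```

Comparing these box means against the model's areal averages avoids the interpolation step of the area-to-point approach, though, as the paper stresses, the representativeness error does not vanish: a handful of gauges is still an imperfect estimate of a true areal mean.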
Article
Forecast error is decomposed into three components, termed displacement error, amplitude error, and residual error, respectively. Displacement error measures how much of the forecast error can be accounted for by moving the forecast to best fit the analysis. Amplitude error measures how much of the forecast error can be accounted for by changing the amplitude of the displaced forecast to best fit the analysis. The combination of a displacement and an amplification is called a distortion. The part of the forecast error unaccounted for by the distortion is called the residual error. The distortion must be large scale, in line with the basic premise that forecast errors are best described by reference to large-scale meteorological features. A general mathematical formalism for defining distortions and decomposing forecast errors into distortion and residual errors is formulated. The distortion representation of forecast errors should prove useful for describing forecast skill and for representing the statistics of the background errors in objective data analysis. Examples using nonstandard satellite data (SSM/I precipitable water and ERS-1 backscatter) demonstrate the detection and characterization of analysis errors in terms of position and amplitude errors. In addition, a 48-h forecast of Northern Hemisphere 500-hPa geopotential height is decomposed. For this case a large-scale distortion is capable of representing the larger part of the forecast error field and the displacement error is predominant over the amplification error. These examples indicate the feasibility of implementing the proposed method in an operational setting.
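A drastically simplified, toy version of the displacement/amplitude/residual split can make the decomposition concrete. The sketch below searches uniform integer grid shifts (with periodic wrap-around, via `np.roll`) for the best displacement and then fits a single least-squares amplitude factor; the published method instead uses smooth, spatially varying, large-scale displacement fields, so this is an illustration of the concept, not of the authors' algorithm.

```python
import numpy as np

def distortion_decomposition(forecast, analysis, max_shift=3):
    """Toy distortion decomposition of forecast error.

    1. Displacement: the integer shift of `forecast` minimizing the
       squared misfit to `analysis` (periodic boundaries).
    2. Amplitude: a single scaling factor fitted by least squares.
    3. Residual: whatever error the distortion cannot explain.
    """
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(forecast, (dy, dx), axis=(0, 1))
            err = np.sum((shifted - analysis) ** 2)
            if best is None or err < best[0]:
                best = (err, dy, dx, shifted)
    _, dy, dx, shifted = best
    denom = np.sum(shifted ** 2)
    alpha = np.sum(shifted * analysis) / denom if denom > 0 else 1.0
    residual = analysis - alpha * shifted
    return (dy, dx), alpha, residual
```

When the analysis is exactly a shifted copy of the forecast, the recovered displacement reproduces the shift, the amplitude factor is 1, and the residual vanishes, mirroring the paper's finding that displacement error often dominates amplitude error.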
Article
The comparison of spatial fields of meteorological variables is an essential component of model validation studies and is central in assessing the significance of any change between a perturbed and control run of a general circulation model. Comparisons may be made of statistics which define the time-mean state, the temporal variability about this state, and/or spatial variability. Comparisons may also be made of the two time-mean spatial patterns, or of the temporal evolutions of spatial patterns. We consider here a suite of univariate and multivariate statistics which may be used to make these comparisons. Some of these statistics have been used previously, while others are either new or have not previously been used in the present context. The use of these statistics, their differences and similarities, and their relative performances are illustrated by considering mean sea level pressure changes between the decades 1951-1960 and 1971-1980 over an area covering North America, the North Atlantic Ocean, and Europe. Significance levels are assessed using the pool-permutation procedure of Preisendorfer and Barnett (1983) (henceforth P+B). This overcomes problems arising from nonideal behavior of the data (particularly spatial autocorrelation), unknown sampling distributions, and multiplicity in the case of univariate statistics. A subset of statistics is identified as most useful. For tests of differences in means these are the grid point by grid point t-test, a test comparing the overall means, and P+B's SITES statistic. For the tests of differences in temporal variability they are the grid point by grid point F-test, and SPRET1 (the ratio of the spatial means of the time variances). SPRET1 is a modification of P+B's SPRED statistic designed to identify the direction of any variance difference.
As a test of spatial variability differences, we identify SPREX1 (the ratio of the time means of the spatial variances), and for comparing spatial patterns the best statistic is the (spatial) correlation coefficient between the time-mean fields. For comparing the temporal evolution of spatial patterns, we recommend using the time-mean anomaly field correlation which is a more easily interpreted equivalent to P+B's SHAPE statistic.
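Two of the statistics singled out above are simple enough to sketch directly from their verbal definitions: SPRET1 (the ratio of the spatial means of the grid-point temporal variances) and the spatial correlation between time-mean fields. The implementations below follow those definitions on `(time, y, x)` arrays; significance assessment via the pool-permutation procedure is a separate step not shown here.

```python
import numpy as np

def spret1(field_a, field_b):
    """SPRET1-style statistic: ratio of the spatial means of the
    grid-point temporal variances of two (time, y, x) fields.
    Values above/below 1 indicate the direction of the difference."""
    return (field_a.var(axis=0, ddof=1).mean()
            / field_b.var(axis=0, ddof=1).mean())

def pattern_correlation(field_a, field_b):
    """Spatial correlation coefficient between the two time-mean fields,
    the statistic recommended above for comparing spatial patterns."""
    a = field_a.mean(axis=0).ravel()
    b = field_b.mean(axis=0).ravel()
    return np.corrcoef(a, b)[0, 1]
```

For identical inputs SPRET1 is exactly 1 and the pattern correlation is 1; a perturbed-minus-control experiment would report how far each statistic departs from those values, with significance judged against a permutation distribution.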
Article
Forest remnants are vital for the overall heterogeneity and health of rural landscapes. However, deforestation is a significant process afflicting large numbers of agroforested regions of the world. The Maskoutains regional county municipality (RCM) in southern Quebec, Canada, experiences intense deforestation that has reached critical levels. The goal of this study is to develop a geographic cellular automata (GCA) to model land-use change in this region and test the influence of different management scenarios on the fate of the forested remnants. The GCA was built using a 100 m cell size, a Moore neighborhood configuration, a 3-year time step and probabilistic transition rules derived from the comparison of two land-use maps for the years 1999 and 2002. Four groups of management scenarios were tested: (1) status quo (SQ), (2) reduced deforestation (RD), (3) promotion of ligniculture (L), and (4) protection of forest connectivity (CONN). Results indicate that none of the scenarios succeed in maintaining the actual levels of forest area. However, certain scenarios (amongst the RD and CONN), significantly alter the loss of forest areas in the short to mid-term and delay the fragmentation, reduction, and isolation of forest patches.
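The probabilistic transition rules mentioned above are typically estimated by cross-tabulating the two co-registered land-use maps. A minimal sketch of that estimation step follows; it assumes integer class codes 0..n−1 and ignores the neighborhood conditioning that a full GCA would add on top of these base probabilities.

```python
import numpy as np

def transition_matrix(map_t0, map_t1, n_classes):
    """Row-normalized land-use transition probabilities estimated by
    cross-tabulating two co-registered categorical rasters.
    Entry [i, j] is P(class j at t1 | class i at t0)."""
    counts = np.zeros((n_classes, n_classes))
    for a, b in zip(map_t0.ravel(), map_t1.ravel()):
        counts[a, b] += 1  # tally each cell's class transition
    rows = counts.sum(axis=1, keepdims=True)
    # rows with no cells of a class stay all-zero instead of dividing by 0
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)
```

Scenario testing then amounts to perturbing selected rows of this matrix (e.g. lowering forest-to-agriculture probabilities for a reduced-deforestation scenario) and iterating the automaton forward.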
Article
We evaluated the relative accuracy of four methods of producing maps of long-term runoff for part of the northeast United States: MAN, a manual procedure that incorporates expert opinion in contour placement; RPRIS, an automated procedure based on water balance considerations; PnET-II, a physiologically based model of carbon/water balance in forests; and MAPSS (Mapped Atmosphere-Plant Soil System), a rule/process-based vegetation distribution/water balance model. Our goal was to confirm the accuracy of the modeling and mapping procedures, and to see if any improvements to the models and methods might be suggested. In our analyses, we compared contour maps derived from the four methods both qualitatively (visual inspection) and quantitatively (raster overlay and uncertainty analysis). The manual and automated (RPRIS) methods gave the best results. Our analyses suggest that methods directly integrating gaged runoff data (i.e. MAN and RPRIS) provide the best results under current climatic conditions. For predicting runoff under altered conditions, e.g. climate change, the existing models studied here (i.e. PnET-II and MAPSS) hold significant promise.
Article
The scope of this paper is to introduce a suite of new multiscale statistical measures which can be used, in addition to traditional measures, to compare observed and model-predicted patterns for model validation. Recent research on analysis of observed precipitation patterns at a multitude of scales has revealed interesting spatial and spatiotemporal organizations which have often been related to physical properties of the storm environment. Testing whether this multiscale statistical organization is also reproduced in the model-predicted patterns, or whether there are significant biases and disagreements in such comparisons, is conjectured to hold promise for understanding model performance and guiding future model improvements. Results from application of the developed methodologies to the May 7-8, 1995, multisquall line storm over central Oklahoma are presented and discussed in light of the additional information gained by the new validation measures as compared to traditional measures.
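The simplest way to see what a multiscale comparison adds over a single-resolution score is to evaluate the same statistic on successively aggregated versions of both fields. The sketch below does this with non-overlapping 2x2 block averaging and the spatial correlation coefficient; it is an illustrative stand-in for the (more elaborate, e.g. scale-spectrum based) measures the paper itself develops.

```python
import numpy as np

def block_mean(field):
    """Aggregate a raster by non-overlapping 2x2 block averaging,
    discarding any odd trailing row/column."""
    r, c = field.shape
    trimmed = field[:r // 2 * 2, :c // 2 * 2]
    return trimmed.reshape(r // 2, 2, c // 2, 2).mean(axis=(1, 3))

def multiscale_correlation(obs, model, levels=3):
    """Spatial correlation between observed and modeled fields at
    successively coarser scales. A model with purely small-scale
    placement errors scores poorly at fine scales but recovers as
    the fields are aggregated."""
    scores = []
    for _ in range(levels):
        scores.append(np.corrcoef(obs.ravel(), model.ravel())[0, 1])
        obs, model = block_mean(obs), block_mean(model)
    return scores
```

Reading the resulting score-versus-scale curve indicates at which scales a model captures the observed organization, which is precisely the kind of diagnostic information single-scale verification statistics hide.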