COMPARING CONTINUOUS VALUED RASTER DATA
A CROSS DISCIPLINARY LITERATURE SCAN
Text: Alex Hagen-Zanker, RIKS bv
Layout: RIKS bv
Illustrations: RIKS bv
Published by: RIKS bv
© RIKS bv
June 2006
This is a publication of the Research Institute for Knowledge Systems (RIKS bv),
Abtstraat 2a, P.O. Box 463, 6200 AL Maastricht, The Netherlands,
http://www.riks.nl, e-mail: info@riks.nl, Tel. +31(43)388.33.22, Fax. +31(43)325.31.55.
Product information
This report presents research conducted with a view to the further development of the MAP COMPARISON KIT. So far, the focus of the product has been on the comparison of multinomial maps; future extensions will incorporate methods for continuous data discussed in this report.
The MAP COMPARISON KIT has been developed as part of a series of projects carried out for the Netherlands
Environmental Assessment Agency (MNP), P.O. Box 1, 3720 BA Bilthoven, The Netherlands, the National Institute
for Coastal and Marine Management (RIKZ), P.O. Box 20907, 2500 EX Den Haag, The Netherlands, and, the
European Commission, Directorate General Joint Research Centre, Institute for Environment and Sustainability,
Land Management Unit, Urban and Regional Development Sector, Ispra, Italy.
For more information you are kindly requested to contact RIKS bv.
The latest information regarding the MAP COMPARISON KIT, including further plans for development, new versions
of the software and/or documentation, will be made available from the project web-site:
http://www.riks.nl/mck.
Comparing continuous valued raster data: A cross disciplinary literature scan
Alex Hagen-Zanker
Submitted to:
Netherlands Environmental Assessment Agency
Postbus 303
3720 AH Bilthoven
© RIKS bv June 2006
Research Institute for Knowledge Systems bv
P. O. Box 463
6200 AL Maastricht
The Netherlands
www.riks.nl
CONTENTS
INTRODUCTION
1 BACKGROUND
1.1 Image analysis
1.2 Meteorology
1.3 Spatial Statistics
1.4 Other fields
2 METHODS
2.1 State-of-the-practice versus state-of-the-art
2.2 Fuzzy Numerical
2.3 An intensity-scale approach
2.4 Wavelets and field forecast verification
2.5 Image quality assessment
2.6 Information weighted comparison
2.7 Clustering of model errors
2.8 Bivariate spatial association
2.9 Image warping
3 DATA
3.1 Synthetic dataset
3.2 Practical dataset
4 RESULTS
4.1 Cell-per-cell difference
4.2 Fuzzy Numerical
4.3 Image Quality Assessment
4.4 Wavelet verification
4.5 Image Warping
4.6 Bivariate Spatial Association
5 CONCLUSIONS AND RECOMMENDATIONS
6 REFERENCES
ANNEX A: DETAILED RESULTS IMAGE QUALITY ASSESSMENT
INTRODUCTION
The main task of the Netherlands Environmental Assessment Agency is to advise the Dutch
government on a wide variety of environmental issues from a scientific base. Naturally the
advice is often based on spatial analysis and modelling. Map comparison is a recurring task and is necessary for quantifying, visualizing and understanding analysis results, as well as for the modelling process of verification, validation and calibration. In cooperation with the Research Institute for Knowledge Systems, a software tool has been developed that supports this type of analysis: the Map Comparison Kit (Visser et al. 2004).
The Map Comparison Kit (MCK) supports spatial modellers and analysts with a number of
methodologies for quantifying differences between raster maps. The purpose is to provide insight into the extent, nature and spatial distribution of differences and similarities in pairs of
maps. Although the tool presents state-of-the-art techniques for the comparison of categorical
raster maps, the methodologies for numerical maps are only rudimentary. This report offers a
cross-disciplinary literature scan as a preliminary step to extending the functionality of the
MCK with advanced numerical map comparison methods.
A major challenge of map comparison is that the information contained in a map is more than
the sum of information present in all individual elements (pixels or cells in a raster map) of the
map, since essential information is captured in their spatial relationships (e.g. clustering, proximity, connectivity). It is surprising, then, that the state of the practice in comparison methods is still the cell-by-cell comparison. Fortunately, this discrepancy is recognized and in
recent years considerable research has been directed at quantifications of map similarity that
account for spatial structure.
One strategy to involving spatial structure in the comparison is the recognition of features in the
landscape and basing the comparison on characteristics of those features. In categorical maps
the simplest, and possibly most meaningful, features are patches: groups of contiguous cells occupied by a single category. Several of the comparison methods in the MCK
are based on the comparison of patch characteristics.
Another strategy to involve spatial structure in the comparison is multi-scale analysis. Here, the
main idea is that a single map contains information on several scales and that pairs of maps may
be similar at some but not all scales. One interpretation is to equate scale to resolution of the
raster, coarser scales are then found by aggregating cells. Alternatively a moving window can be
applied; in that case, the resolution of the coarse scale map is equal to that of the fine scale map
but values at the coarse scale are found as the aggregate (e.g. mean or distance weighted mean)
of the fine scale cells within a window. The scale of the map is then determined by the size of
the moving window and the distance decay weights.
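The moving-window interpretation of scale can be sketched as follows. The square window and plain (unweighted) mean are illustrative assumptions; as noted above, a distance-weighted mean could equally be used.

```python
import numpy as np

def window_mean(raster, radius):
    """Aggregate to a coarser scale with a square moving window:
    every cell gets the mean of all cells within `radius` of it.
    The resolution is unchanged; only the scale (window size) is."""
    rows, cols = raster.shape
    out = np.empty((rows, cols), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            out[r, c] = raster[r0:r1, c0:c1].mean()
    return out
```

Calling `window_mean` repeatedly with growing radii yields the stack of coarse-scale maps that a multi-scale comparison operates on.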
An assumption in writing this report was that similar strategies can be followed for numerical
maps as well. Rather than an exhaustive overview, the report is intended to provide a cross-section of available methods and demonstrate their relative merits on two test datasets. Raster data are also found outside the field of environmental modelling and analysis. Relevant
methodological contributions were found in different disciplines such as image analysis,
geographical information science, hydrology, meteorology and biometrics.
It is recognized that there are many purposes for comparing maps, ranging from assessing
historical trends to detecting patterns in large collections of spatial data. This report, however, is
written from the point of view of model validation and assumes that the compared maps are a
pair of one observed and one forecasted map for the same moment in time, covering the same
area on an identical raster. Other applications are not excluded and the reader is kindly invited
to think ‘out of the box’.
The report is structured as follows. Chapter 1 gives background information on the different disciplines from which comparison methodologies have been investigated. Pointers to relevant literature are given and crucial particularities are discussed. Chapter 2 follows by highlighting 8 of the methods that were found in the literature. Their rationale is explained and the methods are discussed in the light of their general applicability. A selection of 5 of the methods is evaluated in Chapter 4 on the basis of 2 test cases that are introduced in Chapter 3. Conclusions and recommendations are given in Chapter 5.
1 BACKGROUND
Researchers from different fields face similar problems when evaluating spatial data (Boots & Csillag 2006). In recognition of this fact, methodological contributions from different disciplines have been sought. In particular, the disciplines of image analysis, meteorology and geographical information science have been investigated. Although these disciplines consider similar problems, there are also significant differences, and this first chapter is intended to highlight some particularities of the different fields.
1.1 Image analysis
Image analysis is concerned with abstracting information (measurements) from digital images.
The field of image analysis is vast; for the current purpose only contributions that compare greyscale images are considered. Of course digital images and maps are not the same; nevertheless greyscale images can be seen as continuous valued maps, where the mapped property is luminance. Two main purposes are served by the
comparison of greyscale images.
The first main purpose is to quantify the effect of distortions such as noise or other artefacts by comparing the original image to a distorted version of it. This type of measurement is for
instance applied to evaluate the performance of image compression algorithms. The overview
paper by Eskicioglu & Fisher (1995) discusses and compares a number of such methods.
The second main purpose is content based image retrieval. Here the key is to find the image(s)
most similar to a target image in a database. This has many practical purposes, for instance in
biometrics (e.g. fingerprint and iris scan recognition) and data mining. An overview paper on
content based image retrieval was published by Smeulders et al. (2000) and presents over 200
references.
A particularity of greyscale images is that a single pixel practically has no meaning at all. The
grey level of a pixel is only a very indirect measurement of what is being observed, not least because images are in general 2-D projections of 3-D objects. As a result, one and the same grey level in a picture of a human face may be present in the hair, eyes, skin or mouth, depending on how the light falls. Consequently, the meaning of a pixel should be based on its context, and the relation between the pixel and its surroundings may be highly complex.
Another particularity, which may be less of a concern, is that greyscale images only contain positive values, whereas in models of natural systems negative values may also occur (e.g. net rainfall = rainfall – evaporation). Also, in practice greyscale images are not truly continuous; typically a
greyscale image only uses 256 grey levels. It is therefore quite feasible, and common, for
methodologies in image analysis to tabulate all possible greyscale values in a histogram. This
poses a problem of transferability when the method is applied on a map with truly continuous
values. A straightforward solution is the creation of bins (classes defined by upper and lower
boundaries), although the choice of the boundaries introduces an additional degree of
subjectivity.
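The binning solution mentioned above can be sketched in a few lines; the values and the six equal-width classes are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical continuous cell values and six equal-width bins on [0, 3]
values = np.array([0.03, 0.47, 1.20, 2.75, 2.99])
edges = np.linspace(0.0, 3.0, 7)              # 7 boundaries -> 6 bins
hist, _ = np.histogram(values, bins=edges)    # tabulate, as with grey levels
bin_index = np.digitize(values, edges[1:-1])  # class index per cell
```

The subjectivity the text refers to sits entirely in the choice of `edges`; equal-width, equal-frequency (quantile) or domain-specific boundaries all lead to different histograms.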
1.2 Meteorology
Interest in the weather is of all times; weather has an impact on our lives ranging from agricultural production to our sheer mood. It is therefore not surprising that there is a long tradition of weather forecasting and that some of the comparison methods used for evaluating weather forecasts date back to the late 19th century. An excellent overview of verification methods for spatial weather forecasts is maintained on the Internet (Ebert 2005).
As the field has evolved over such a long time, methods have been refined and are in some cases highly adjusted to a particular kind of weather forecast, with limited general applicability. An example of such a specialized approach is given by Ebert & McBride (2000). This elaborate
method aims at the verification of precipitation in weather systems. It recognizes Contiguous
Rain Areas on the basis of rain intensities and heuristic thresholds, and evaluates the cells within
these areas on errors attributed to location and quantity (intensity).
A common characteristic of many weather models is that they consider well localized
phenomena that follow a trajectory over space and time. For these systems it is therefore helpful
to attribute error to location, timing and magnitude. Another particularity is that weather changes fast and has been measured throughout the world for a long time. This implies that comparisons between observed and forecasted maps can be made for large samples of applications, which opens opportunities for analysing the distribution of errors and for generalizing about the performance of different models. Exploring the temporal aspect of forecast
similarity is beyond the scope of this report.
Climate modelling is different from weather forecasting as it deals with large scale processes
over large timescales. Climate models are politically sensitive and results based on climate
models are heavily scrutinized. It is therefore not surprising that the evaluation of climate models has focussed much more on statistically underpinning straightforward cell-by-cell based methods, instead of introducing new “wild” structure based comparison methods.
Examples of such statistically rigorous papers are Wigley & Santer (1990) and Santer & Wigley
(1993).
1.3 Spatial Statistics
Spatial statistics is a discipline within the field of Geographical Information Science. It
recognizes that the application of regular statistical approaches on geographical data is often
problematic. O’Sullivan & Unwin (2003) provide an overview of the properties of geographical data that cause these problems:
• spatial autocorrelation
• the modifiable areal unit problem: the conclusions of statistical analysis may strongly depend on the subjective definition of area units
• scale and edge effects: relations between spatial variables may be different across scales.
The problem of spatial autocorrelation is of particular interest. Positive spatial autocorrelation
means that the values at a given location are similar to those found in the neighbourhood of the
location. This violates the common assumption in statistics that samples are mutually
independent and in effect means that the sample size is overestimated. Practically all maps display spatial autocorrelation; therefore dedicated statistics are required for the analysis of geographical data. Statistics taking this spatial correlation into account make use of a spatial lag,
in analogy to the time lag in time series analysis. Moran’s I and Geary’s C statistics are
examples of statistics that measure the degree of autocorrelation as a function of the spatial lag.
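As an illustration, Moran’s I at spatial lag 1 can be sketched for a raster with rook (4-neighbour) contiguity; the binary contiguity weights are an assumed, common choice.

```python
import numpy as np

def morans_i(raster):
    """Moran's I for a 2-D raster with rook contiguity at lag 1.
    Values near +1 indicate strong positive spatial autocorrelation,
    values near 0 none, negative values indicate dispersion."""
    x = raster - raster.mean()
    n = x.size
    num = 0.0    # sum of w_ij * x_i * x_j over neighbour pairs
    w_sum = 0.0  # total weight W
    rows, cols = raster.shape
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (0, 1), (-1, 0), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += x[r, c] * x[rr, cc]
                    w_sum += 1.0
    return (n / w_sum) * num / (x * x).sum()
```

A checkerboard raster (perfect dispersion) yields exactly -1 under this weighting, while a smooth gradient yields a clearly positive value.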
1.4 Other fields
It is evident that continuous spatial data are relevant in other disciplines besides the three discussed above. Three examples of large domains that have been practically ignored in this report are remote sensing, landscape ecology and hydrology.
Remote sensing concerns the observation of the earth from remote equipment, such as satellites
and aeroplanes. The results of remote sensing exercises are typically raster maps, often with continuous values. The accuracy assessment of remote sensing is often based on map comparison. However, remote sensing is very much pixel/cell oriented and the evaluation of results is typically a cell-by-cell evaluation (Foody 2002); therefore it has not received much attention in this report.
Landscape ecology emphasizes the interaction between spatial pattern and ecological process
and the field has contributed much to the analysis of spatial structure (Turner et al. 2001). The field
has been left largely out of consideration in this report, because of its focus on categorical
(multinomial) data.
An overview of comparison methods in the field of hydrology has been presented by Wealands
et al. (2005). The message of that paper is that the state of the art in map comparison for the
evaluation of hydrological models is largely limited to cell-by-cell mean squared error
calculations and the authors seek their inspiration in other disciplines.
2 METHODS
2.1 State-of-the-practice versus state-of-the-art
A discrepancy can be observed between the state of the art in map comparison on the one hand and the state of the practice on the other. This means that although methods are becoming available to compare maps while accounting for the spatial structures present in the data, the most practiced procedures rely on cell-by-cell evaluations. This is noted by Wealands et al. (2005) and confirmed in many model application papers, such as Ahrens et al. (1998), Bishop et al. (2005), Garen & Marks (2005), Liu et al. (1997), Strasser & Mauser (2001), Viscarra Rossel & Walter (2004), and Zhou & Liu (2004).
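The cell-by-cell evaluations that constitute this state of the practice reduce to a few lines; `cell_by_cell_stats` is a generic sketch, not the formulation of any particular paper.

```python
import numpy as np

def cell_by_cell_stats(observed, forecast):
    """State-of-the-practice comparison: per-cell residuals
    summarised by mean error (bias), MAE and RMSE."""
    d = forecast - observed
    return {"bias": d.mean(),
            "mae": np.abs(d).mean(),
            "rmse": np.sqrt((d * d).mean())}
```

Note that these summaries are blind to spatial structure: shuffling the cells of both maps with the same permutation leaves every statistic unchanged.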
As a consequence the innovative methods have hardly established themselves and it is hard to
qualify their merits on the basis of practical applications. A contributing factor is probably that spatially explicit comparisons tend to be rather complicated, not only from a conceptual point of view but also in their technical implementation in a Geographical Information System or other software.
The availability of tools such as the Map Comparison Kit can lead to a wider dissemination and adoption of promising methods in actual research practice. The introduction of the Fuzzy Kappa methodology by Hagen (2003) was followed by an introduction of the software (Visser & de Nijs 2005), and ultimately researchers are applying the method as part of the validation process of their models (Prasad et al. 2006, Ménard & Marceau 2006). Another example of dissemination via software implementation is the Idrisi GIS package, which offers amongst others the validation methodology of Pontius et al. (2004).
This chapter describes a selection of papers from the vast body of work in which comparison methodologies are introduced. The selection is largely subjective; those papers were chosen that are considered ‘promising’, ‘original’ or ‘highly cited’. It is also attempted to attain a degree of diversity, by including methods that apply different strategies and originate from various disciplines. Not all methods are intended as generally applicable metrics; where necessary, comments are given regarding their general applicability.
2.2 Fuzzy Numerical
The point of the Fuzzy Numerical statistic is that the similarity at one location is set by the degree to which a cell in one map is similar to its counterpart in the other map, or, discounted for distance, to the cells found in the direct neighbourhood of that counterpart.
The Fuzzy Numerical method did not emerge from the literature scan. This method was
developed by the Research Institute for Knowledge Systems and implemented in the Map
Comparison Kit as part of earlier work for the Netherlands Environmental Assessment Agency.
The method follows the rationale of the Fuzzy Kappa (Hagen 2003, Hagen-Zanker et al. 2005)
but is adjusted to work with continuous instead of categorical data. The method is documented
in the MCK User manual (Hagen-Zanker et al. 2006).
Equations 1-4 describe the metric. It should be noted that equations 1-3 are generic as they are
open to the actual similarity function f(a,b). Equation 4 is the particular similarity function that
is applied. Naturally, the outcome of the comparison will depend on the particular distance
weight function that is being applied and its respective parameters. This introduces a degree of
subjectivity to the comparison.
$$s_i(A,B) = \max_{j=1}^{N} \big( f(a_i, b_j) \cdot w(d_{ij}) \big) \qquad (1)$$

$$S_i(A,B) = \min\big( s_i(A,B),\; s_i(B,A) \big) \qquad (2)$$

$$S(A,B) = \frac{1}{n} \sum_{i=1}^{n} S_i(A,B) \qquad (3)$$

$$f(a,b) = 1 - \frac{|a-b|}{\max(|a|,|b|)} \qquad (4)$$
Where $s_i(A,B)$ is the one-way similarity between map A and B at cell i. Index j iterates through all N cells in the neighbourhood of cell i. $S_i(A,B)$ combines the two one-way similarities into an overall similarity. $S(A,B)$ is the overall map similarity; it is the mean over all n locations. The function $f(a,b)$ determines the similarity of two values. The function $w(d)$ gives the weight pertaining to the distance.
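Equations 1-4 can be implemented directly. In the sketch below, the exponential distance decay w(d) = 2^(-d/h) and the window radius are assumptions for illustration; the actual weight function and its parameters are user choices.

```python
import numpy as np

def f_sim(a, b):
    # Equation 4: similarity of two cell values
    m = max(abs(a), abs(b))
    return 1.0 if m == 0 else 1.0 - abs(a - b) / m

def one_way(A, B, radius=2, halving=1.0):
    """Equation 1: for each cell i of A, the best distance-discounted
    match among the cells of B in the neighbourhood of i.
    w(d) = 2**(-d/halving) is an assumed distance-decay function."""
    rows, cols = A.shape
    s = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            best = 0.0
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    d = np.hypot(rr - r, cc - c)
                    best = max(best, f_sim(A[r, c], B[rr, cc]) * 2.0 ** (-d / halving))
            s[r, c] = best
    return s

def fuzzy_numerical(A, B, radius=2, halving=1.0):
    # Equations 2-3: two-way local similarity, averaged over the map
    S = np.minimum(one_way(A, B, radius, halving), one_way(B, A, radius, halving))
    return S.mean()
```

Taking the minimum of the two one-way similarities makes the statistic symmetric, and a map compared with itself scores exactly 1.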
2.3 An intensity-scale approach
Casati, B., Ross, G., & Stephenson, D. B. (2004). A new intensity-scale approach for the
verification of spatial precipitation forecasts. Meteorological Applications, 11(2), 141-
154.
The intensity scale approach (Casati et al. 2004) attributes errors to different scales and value
ranges (intensity). To separate errors over different scales, wavelet analysis is applied. Different
ranges are found by reclassification.
This comparison method operates in two phases. The first phase is pre-processing; in this phase the input maps are manipulated so that the maps will be compared on the basis of the relative distribution of values over the map, rather than the absolute values. The second is the intensity-scale verification, where degrees of similarity are found related to scale (wavelet decomposition level) and intensity (threshold value).
The first step of the pre-processing phase is dithering. In this step both the observed and the
forecasted map have some noise added. The amplitude of the noise is half the minimum non-zero difference in values between locations and serves to ‘compensate for discretization effects caused by finite precision storage of the precipitation rate values’, or in other words to eliminate the occurrence of same valued cells.
The second step of the pre-processing phase is ‘normalization’, in which every cell in the map is subjected to a base-2 logarithmic transformation. The transformed value of 0 is the base-2 logarithm of the amplitude of the dithering function.
The third step of the pre-processing phase is called recalibration: each value in the forecast map is replaced by the value with the same empirical cumulative probability in the observed map; in effect, the value with the same rank number when all values in the map are sorted from high to low. The difference between the recalibrated and the original forecast map can be seen as a measure of differences in quantity (frequency distribution, or histogram).
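The recalibration step can be sketched with rank matching; `recalibrate` is an illustrative implementation that assumes all values are distinct (which the dithering step guarantees).

```python
import numpy as np

def recalibrate(forecast, observed):
    """Replace every forecast value by the observed value with the
    same rank (empirical cumulative probability). Afterwards the
    forecast has exactly the histogram of the observation; what
    remains to compare is the placement of the values."""
    flat = forecast.ravel()
    order = np.argsort(flat)                      # ranks of the forecast values
    result = np.empty(flat.shape, dtype=float)
    result[order] = np.sort(observed, axis=None)  # same-rank observed value
    return result.reshape(forecast.shape)
```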
In the intensity-scale verification phase, first a binary map is created for both the observed and the pre-processed forecast map. The binary map has value 1 for all cells above a threshold value and 0 for all others. As an effect of the pre-processing, the numbers of cells with value 0 and with value 1 are identical in both maps.
The binary maps are compared cell-by-cell yielding a comparison map with three possible
values:
• Value -1 indicates present in observed but not in forecast
• Value 1 indicates present in forecast but not in observed
• Value 0 indicates present in both or in neither
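The thresholding and three-valued comparison map above can be sketched directly:

```python
import numpy as np

def binary_error_map(observed, forecast, threshold):
    """Threshold both maps and code the disagreement:
    -1 = present in observed only, +1 = present in forecast only,
     0 = present in both or in neither."""
    o = (observed > threshold).astype(int)
    f = (forecast > threshold).astype(int)
    return f - o
```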
The second step of the intensity-scale verification phase is the decomposition of the squared error over different scales, using wavelet decomposition on the basis of the Haar wavelet. Due to the orthogonal and orthonormal properties of wavelets, the squared error at each location as well as the mean squared error (MSE) can be decomposed into errors pertaining to each scale (4, 16, 64, 256, etc. cell aggregates). Also, following the assumption that random error is equally partitioned over all scales, the Heidke Skill Score (a.k.a. Kappa statistic) can be decomposed.
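Because the Haar smooths at successive scales are orthogonal projections, this MSE decomposition can be sketched without a wavelet library, using 2x2 block means; a square error map with a power-of-two side is assumed.

```python
import numpy as np

def mse_by_scale(error):
    """Decompose the MSE of a square error map (side a power of two)
    over Haar scales. The block-mean projections are orthogonal, so
    the per-scale components add up exactly to the total MSE."""
    components = []
    current = error.astype(float)
    while current.shape[0] > 1:
        n = current.shape[0]
        # 2x2 block means (Haar smooth), expanded back to current size
        blocks = current.reshape(n // 2, 2, n // 2, 2).mean(axis=(1, 3))
        smooth = np.kron(blocks, np.ones((2, 2)))
        detail = current - smooth              # Haar detail at this scale
        components.append((detail ** 2).mean())
        current = blocks
    components.append((current ** 2).mean())   # overall-mean component
    return components
```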
The first and the second step of the intensity-scale verification are repeated for all possible threshold values of the observed map before pre-processing, finally yielding plots of MSE as a function of scale and threshold.
Some notes
The use of wavelets allows attributing errors to different scales. This is of course a beautiful characteristic that is also exploited by Briggs & Levine (1997); therefore section 2.4 includes a broad comparison between the two methods.
The first step, dithering, is strictly a pragmatic one and serves its purpose. For transparency it should be noted that the same effect could be achieved by randomly ordering same-valued cells in the forecast map when sorting them in the third step of pre-processing.
The second step of the first phase introduces a redundancy to the procedure. The logarithmic
transformation does not alter the ranking of the values on the map, only the relative difference
in values. So it has no effect in the recalibration phase. Furthermore, the binary reclassification
step can be performed for transformed values just as well as untransformed values and is not
helped by the normalization either.
It seems that the authors have realized this redundancy, because all results are expressed in units relating to the non-transformed values (i.e. mm/h, and not log(mm/h)). Somehow this part of the procedure has not been edited out of the paper. It is our recommendation to simply leave out the logarithmic transformation.
The idea of recalibration as a technique for separating errors in quantity from errors in location is compelling and may very well be applied in combination with other comparison methods too. It certainly warrants further investigation.
2.4 Wavelets and field forecast verification
Briggs, W. M., & Levine, R. A. (1997). Wavelets and field forecast verification. Monthly
Weather Review, 125(6), 1329-1341.
The main idea of the comparison operation proposed in this paper is to decompose the input maps into a stack of maps at different scales, using a discrete wavelet transform. The maps at the different scales are then compared against each other on the basis of two measures: the root mean squared error (RMSE) and the anomaly correlation coefficient (ACC); the latter requires a climate field and can otherwise be replaced by the correlation r.
The wavelet transformation serves two purposes. The first is removal of noise from the data.
The second is to attribute differences between forecast and observation to different scales.
The following steps are recognized:
• Step 1. Perform a wavelet transformation of the observed/forecast map (the paper is not clear which) for a number (a library) of wavelets. Of the transformations, pick the one with the lowest Shannon entropy.
• Step 2. Take the selected wavelet and apply a soft threshold function to remove noise. The threshold value should be based on the Shannon entropy for each layer.
• Step 3. Compare the different scales with respect to RMSE (quantity), r (pattern) and ER (energy), a distinction similar to that of Wang et al. (2004), which is also discussed in this chapter.
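The soft threshold function used for de-noising in Step 2 has a standard form, sketched below; the choice of the threshold value t is the part that Briggs & Levine base on the Shannon entropy.

```python
import numpy as np

def soft_threshold(coeffs, t):
    """Soft thresholding of wavelet coefficients: shrink every
    coefficient towards zero by t, and set the small ones to zero."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)
```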
Some notes
Step 2 of the comparison, noise removal, is based on assumptions about what constitutes noise. In general purpose applications such assumptions cannot easily be made, and it may be advisable not to filter the data for ‘noise’.
The method is mathematically thorough and the results presented in the paper are convincing. It
must be kept in mind however that the decomposition of the maps into different scales is strictly
based on information theory and may have limited meaning in a physical sense. Also, applying
discrete wavelets means that the maps are aggregated (in wavelet coordinates) to coarser resolutions (2×2, 4×4, 8×8, 16×16, etc.), which sets the borders between aggregated cells more or less arbitrarily. Theoretically, offsetting both maps a single pixel in the same direction may result
in completely different conclusions. This may be irrelevant in the case of image processing,
where wavelets are used to store the information contained in an image more efficiently, but it
may be problematic in the current case; since we are in fact interested in the information
contained at different scales.
Comparison between Briggs & Levine (1997) and Casati et al. (2004)
Although presenting similar approaches and motivations there are some differences between the
two.
Table 2-1 Two wavelet based comparison methods
Briggs & Levine (1997):
• Decomposes the input maps, then compares
• Removes noise by applying a soft filter
• Separates quantity, pattern and energy

Casati et al. (2004):
• First compares, then decomposes the comparison map
• Classifies numerical data to binary to remove outliers
• Separates quantity from location but does not quantify location error
Although it would be preferable to compare both methods in a practical setting, Chapter 4 will focus on implementing Briggs & Levine (1997) as a representative case of comparing maps at multiple scales using wavelet decomposition. The reason is that this method is the most consistent in its approach (information based, retaining the continuous character of the data) and therefore the most demonstrative of its potential.
2.5 Image quality assessment
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality
assessment: From error visibility to structural similarity. IEEE Transactions on Image
Processing, 13(4), 600-612.
This method comes from the field of image processing. The objective of the method is to
quantify the difference between an image and a distorted version of it. The main idea is that
similarity is a composite measure of luminance, contrast and structure each of which is
measured in a distance weighted moving window.
The approach is a multivariate one in which the difference at each location is specified as a function of three aspects of similarity: S(x,y) = f(l(x,y), c(x,y), s(x,y)), where S is the overall similarity, l compares local luminance (i.e. mean), c compares local contrast (i.e.
variance) and s compares local structure (i.e. covariance). The whole is calculated by means of a
distance weighted moving window, for which the distance decay function is Gaussian. There is
a strong conceptual relation to the work of Briggs & Levine (1997) who at different scales
consider root mean squared error, correlation and energy level.
The main variables for the comparison metric are mean, variance and co-variance as defined in
equations 5-7. These main variables are not calculated for the whole map, but instead for each
cell on the basis of a distance weighted moving window.
μ_x = Σ_{i=1..N} w_i x_i                                                5.

σ_x = ( Σ_{i=1..N} w_i (x_i − μ_x)² )^(1/2)                             6.

σ_xy = Σ_{i=1..N} w_i (x_i − μ_x)(y_i − μ_y)                            7.

where w_i is a distance decay weight based on a sum-normalized Gaussian function with
standard deviation 1.5. Index i iterates through the N cells in the window; x
refers to a window in the first map and y to one in the second. The procedure is
symmetric, meaning that S(x,y) = S(y,x).
The summary statistics are combined into the three similarity values according to equations 8-
10.
l(x,y) = (2 μ_x μ_y + C1) / (μ_x² + μ_y² + C1)                          8.

c(x,y) = (2 σ_x σ_y + C2) / (σ_x² + σ_y² + C2)                          9.

s(x,y) = (σ_xy + C3) / (σ_x σ_y + C3)                                   10.

where C1, C2 and C3 are constants that are introduced for stability in situations
where the variability or the mean are close to zero. The values are related to the
range of the pixel values (called L) via the constants K1 and K2, which have been
established heuristically by the authors: C1 = (K1 L)², C2 = (K2 L)², C3 = 0.5 C2,
K1 = 0.01, K2 = 0.03.
The components of similarity are combined into an overall measure of structural similarity
SSIM(x,y) by weighted multiplication. The overall structural similarity MSSIM(X,Y) is
calculated as the mean similarity over all locations:

SSIM(x,y) = l(x,y)^α · c(x,y)^β · s(x,y)^γ                              11.

MSSIM(X,Y) = (1/M) Σ_{j=1..M} SSIM(x_j, y_j)                            12.

where X and Y are the maps, each representing M windows x_j resp. y_j; α, β and γ are
parameters for which no preferred value has been found yet, and therefore they take the
neutral value of 1.
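To make the procedure of equations 5-12 concrete, the following is a minimal Python sketch of the calculation (my own naming and edge handling, not the authors' implementation; the default radius of 5 cells with sigma = 1.5 mirrors the 11X11 Gaussian window used in the paper):

```python
import numpy as np

def gaussian_weights(radius, sigma):
    # Sum-normalized Gaussian distance-decay weights (used in eqs. 5-7).
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    w = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return w / w.sum()

def mssim(X, Y, radius=5, sigma=1.5, K1=0.01, K2=0.03):
    X = np.asarray(X, float); Y = np.asarray(Y, float)
    L = max(X.max() - X.min(), Y.max() - Y.min()) or 1.0  # range of the values
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    C3 = 0.5 * C2
    w = gaussian_weights(radius, sigma)
    rows, cols = X.shape
    ssim_map = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            # truncate the window at the map edge and renormalize the weights
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            ww = w[r0 - r + radius:r1 - r + radius, c0 - c + radius:c1 - c + radius]
            ww = ww / ww.sum()
            x = X[r0:r1, c0:c1]; y = Y[r0:r1, c0:c1]
            mx, my = (ww * x).sum(), (ww * y).sum()          # eq. 5
            sx = np.sqrt((ww * (x - mx) ** 2).sum())         # eq. 6
            sy = np.sqrt((ww * (y - my) ** 2).sum())
            sxy = (ww * (x - mx) * (y - my)).sum()           # eq. 7
            lum = (2 * mx * my + C1) / (mx ** 2 + my ** 2 + C1)   # eq. 8
            con = (2 * sx * sy + C2) / (sx ** 2 + sy ** 2 + C2)   # eq. 9
            stru = (sxy + C3) / (sx * sy + C3)                    # eq. 10
            ssim_map[r, c] = lum * con * stru   # eq. 11, alpha = beta = gamma = 1
    return ssim_map.mean()                      # eq. 12
```

Identical maps yield MSSIM = 1, and any distortion of luminance, contrast or structure lowers the value.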
Some notes
The approach is clear and robust. The method has conceptual links to the moving windows
based structure comparisons already present in the Map Comparison Kit.
The paper briefly discusses the occurrence of ‘blocking artefacts’. In this context, artefacts arise
because moving window based approaches only recognize pairs of mitigating errors (i.e. an over-
prediction close to an under-prediction) when they occur in the centre of the window, but not
when they occur at the edge of the window (because only the error cell and not its mitigating
counterpart is found within the window). As a result, small spatial errors are registered as large
errors at one window distance away from their actual location.
The distance weighted neighbourhood proposed in the paper reduces the effect of blocking
artefacts by applying a distance decay weight, causing errors at the edge of the window to be
small by definition. The fuzzy weighted neighbourhood that has been applied in the calculation
of the fuzzy kappa statistic (Hagen 2003, Hagen-Zanker et al. 2005) may fully eliminate the
blocking artefacts.
2.6 Information weighted comparison
Tompa, D., Morton, J., & Jernigan, E. (2000). Perceptually based image comparison.
Paper presented at the 2000 International Conference on Image Processing,
Vancouver, Canada.
This approach (referenced by Wealands et al. 2005) offers a straightforward take on
perception based image comparison. The main idea is that changes that occur within value ranges
that are common on the map are weighted less than those that lie within uncommon ranges.
Thus, an information weighted mean squared error (IMSE) is introduced. The degree to which a
value is common is expressed by the Shannon information content. The IMSE is calculated
according to equations 13-16:
P_M(z) = n_{M,z} / N_M                                                  13.

I_M(z) = log( 1 / P_M(z) )                                              14.

IMSE_x(A,B) = [ A_x I_A(A_x) − B_x I_B(B_x) ]²                          15.

IMSE(A,B) = (1/N) Σ_{x=1..N} IMSE_x(A,B)                                16.

where P_M(z) is the probability (frequency) of finding value z in map M, n_{M,z} is the
number of cells with value z in map M and N_M the total number of cells. I_M(z) is the
information contained in a cell with value z in map M. Index x iterates over all N
locations on the map, and A_x resp. B_x is the value at location x in map A resp. B.
Some notes
The limited range of values (256 grey levels) in the images considered in the paper
allows assessing information content on the basis of the occurrence of unique values. In
applications where higher resolution continuous data is used, a classification into bins is required.
The comparison method is not strictly a weighted MSE. This is best illustrated by an example:
consider a location that on both maps has value 5; in the first map 10% of all cells have value 5
and in the second map only 1%. This means that the information content of the two cells is
different, which results in an IMSE that is unequal to zero even though the cells have the same
value:
Given A_x = B_x = 5, P_A(5) = 0.1 and P_B(5) = 0.01:

IMSE_x(A,B) = [ 5 log(1/0.1) − 5 log(1/0.01) ]² = 47.7                  17.
This illustrates that the ‘information weighted’ MSE is not strictly speaking a weighted
MSE, which may appear inappropriate from some perspectives. It must be realized, however, that
in image processing the meaning of a pixel can only be considered in relation to its context (in
this case, all other pixels); from that perspective it is reasonable that identically valued cells
found within different contexts are considered dissimilar.
The image quality assessment by Wang et al. (2004) considers the context more thoroughly and
is therefore selected (instead of Tompa et al. 2000) to be further explored in Chapter 4.
2.7 Clustering of model errors
Zhang, L. J., & Gove, J. H. (2005). Spatial assessment of model errors from four
regression techniques. Forest Science, 51(4), 334-346.
Zhang & Gove (2005) present a methodology for assessing the spatial heterogeneity of model
performance. The authors do not consider direct mapping of the error sufficient, because it
does not identify significant clusters of positive or negative model errors. To obtain insight into
the clustering of the errors, the authors make use of a local indicator of spatial association
(LISA), the local Moran coefficient. A Moran value for each location is calculated according
to Anselin (1995) (equation 18).
M_i = (e_i − ē) Σ_{j=1..n} c_ij(h) (e_j − ē)                            18.

where e_i and e_j denote the model errors at locations i and j, respectively, ē is the mean
model error over the whole map, and c_ij(h) is the binary spatial weight matrix as a
function of bandwidth h: it takes the value 1 for all combinations of i and j where the
distance between i and j is smaller than the bandwidth, and 0 otherwise.
A positive value indicates a clustering of same-valued errors, relative to the mean error. A
negative value indicates a cluster of opposite-valued errors relative to the mean.
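Equation 18 can be sketched as follows (a minimal illustration with binary distance-based weights; the function and argument names are my own):

```python
import numpy as np

def local_moran(errors, coords, bandwidth):
    # Local Moran coefficient of model errors (eq. 18) with binary weights c_ij(h).
    e = np.asarray(errors, float)
    p = np.asarray(coords, float)
    dev = e - e.mean()                         # deviations from the mean model error
    # pairwise distances; c_ij = 1 where 0 < distance < bandwidth
    dist = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=2)
    c = (dist > 0) & (dist < bandwidth)
    # M_i = (e_i - mean) * sum_j c_ij(h) (e_j - mean)
    return dev * (c * dev[None, :]).sum(axis=1)
```

A positive M_i signals that location i sits in a cluster of same-valued errors (relative to the mean), a negative M_i that it sits amid opposite-valued errors.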
Some notes
It should be noted that a small location error will lead to negative local Moran values,
because it signifies an under-prediction in one location and an over-prediction nearby. On the
other hand, for a cluster of cells where both compared maps have identical values, a positive
local Moran value will be found, because the deviations from the mean error of the cell and its
neighbours are same-valued. This makes the interpretation of the local Moran statistic quite
complex. The authors seem to realize this, as all conclusions in the paper are based on the
overall degree of clustering of errors.
2.8 Bivariate spatial association
Lee, S. I. (2001). Developing a bivariate spatial association measure: An integration of
Pearson's r and Moran's I. Journal of Geographical Systems, 3(4), 369-385.
Lee (2001) offers an approach to calculate bivariate spatial association, reconciling Pearson’s
r as an aspatial measure of bivariate association and Moran’s I as a univariate measure
of spatial association. The resulting figure is given in equations 19-21. It is a conflation of the
bivariate spatial smoothing scalar (BSSS) and the correlation between the smoothed spatial
fields.
SSS_X = Σ_i (x̂_i − x̄)² / Σ_i (x_i − x̄)²                                 19.

BSSS_{X,Y} = √( SSS_X · SSS_Y )                                         20.

L_{X,Y} = BSSS_{X,Y} · r_{X̂,Ŷ}                                          21.

where x̂_i indicates the mean of the neighbourhood (all cells within the spatial lag) at
location i, x̄ is the overall mean, and r_{X̂,Ŷ} is the correlation over the mean fields
of X and Y.
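The calculation of equations 19-21 can be sketched as follows (neighbour lists, which include the cell itself, and the function name are my own):

```python
import numpy as np

def lees_l(x, y, neighbors):
    # Lee's L (eqs. 19-21); neighbors[i] lists the cells within the spatial lag of i.
    x = np.asarray(x, float); y = np.asarray(y, float)
    # smoothed fields: neighbourhood means
    xs = np.array([x[n].mean() for n in neighbors])
    ys = np.array([y[n].mean() for n in neighbors])
    # spatial smoothing scalars (eq. 19)
    sss_x = ((xs - x.mean()) ** 2).sum() / ((x - x.mean()) ** 2).sum()
    sss_y = ((ys - y.mean()) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    bsss = np.sqrt(sss_x * sss_y)              # eq. 20
    r = np.corrcoef(xs, ys)[0, 1]              # correlation of the smoothed fields
    return bsss * r                            # eq. 21
```

Note that for two identical maps L equals SSS_X, which is at most 1; this explains the observation in the conclusions that the association between identical maps is not equal to 1.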
Some notes
Essentially it can be said that the correlation found between the mean fields is corrected for the
degree to which X and Y are spatially autocorrelated (SSS_X and SSS_Y are indications of
autocorrelation). L_{X,Y} measures the extent to which both map 1 and map 2 are spatially
autocorrelated and their neighbourhood mean fields are correlated as well. It does not become
clear from the paper why this is a good measure of bivariate spatial association.
Possibly the Cross-Moran statistic (Wartenberg, 1985) that the paper intends to improve upon
may give insight into a more relevant question: does the presence of a variable in one map
coincide with the presence of another variable in the other map, either at exactly the same
location or in the direct neighbourhood?
2.9 Image warping
Reilly, C., Price, P., Gelman, A., & Sandgathe, S. A. (2004). Using image and curve
registration for measuring the goodness of fit of spatial and temporal predictions.
Biometrics, 60(4), 954-964.
The approach by Reilly et al. (2004) is similar to those by Hoffman et al. (1997) and Nehrkorn
et al. (2003). The paper recognizes that errors are not limited to cell-to-cell differences, which are
called vertical differences, but that there are also location differences, which are called horizontal
differences. This notion is not different from the other methods discussed in this chapter. The
special trait of this method is how it quantifies horizontal errors. For this, a transformation is
sought, consisting of stretching and compressing the forecasted map along the horizontal plane
until an optimum fit with the observed map is found. Then, the degree of stretching and
compressing is considered the horizontal error and the cell-by-cell error after the deformation is
the vertical error. Optimizing the fit requires a weighting (trade-off) of errors of both kinds.
The optimal deformation is chosen according to equation 22.

I_λ(y, ŷ) ≡ min_{f∈D} { ∫_A G( y(x), ŷ(f(x)) ) dx + λ ∫_A F( x, f(x) ) dx }        22.

where G is the discrepancy metric (to be defined later) between the objective map y and the
deformed map ŷ, and F measures the amount of deformation. f(x) is the
deformation function; if the deformation is zero then f(x) = x and by definition
F(x,x) = 0.
In other words G expresses the vertical error and F expresses the horizontal error, the trade off
between the two is controlled by parameter λ.
For G a straightforward squared difference function is applied, as in equation 23:

G( y(x), ŷ(f(x)) ) ≡ [ y(x) − ŷ(f(x)) ]²                                23.
For F, the squared deviation of the derivatives of the (vector valued) deformation f from
those of the identity deformation f(x) = x is chosen:

F( x, f(x) ) ≡ ( ∂f_1/∂x_1 − 1 )² + ( ∂f_2/∂x_2 − 1 )²                  24.
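A discretised sketch of the objective in equations 22-24 for a given candidate deformation is given below (nearest-neighbour sampling; the names and discretisation choices are my own, and the actual method optimises over f rather than merely evaluating it):

```python
import numpy as np

def warp_objective(y_obs, y_fcst, f, lam):
    # Discretised eq. 22: vertical error after deforming the forecast with the
    # mapping f (an array of target coordinates per cell), plus lambda times the
    # deformation penalty of eq. 24. Illustration only, not the optimisation itself.
    rows, cols = y_obs.shape
    fr, fc = f[..., 0], f[..., 1]
    # G term (eq. 23): squared difference with the deformed forecast,
    # sampled at the nearest cell
    rr = np.clip(np.rint(fr).astype(int), 0, rows - 1)
    cc = np.clip(np.rint(fc).astype(int), 0, cols - 1)
    G = (y_obs - y_fcst[rr, cc]) ** 2
    # F term (eq. 24): squared deviation of the derivatives from the identity
    dfr = np.gradient(fr, axis=0)
    dfc = np.gradient(fc, axis=1)
    F = (dfr - 1.0) ** 2 + (dfc - 1.0) ** 2
    return float(G.mean() + lam * F.mean())
```

For the identity deformation applied to two identical maps both terms vanish, and any vertical discrepancy or stretching raises the objective; the optimiser trades the two off via λ.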
Some notes
Using morphological deformations as a map comparison method seems an elegant solution to
separating errors due to location and quantity. Even though the algorithm to find the solution is
highly complicated, the results can be interpreted well. The method is adaptive to the data and thus
overcomes several problems that are associated with other methods that also intend to achieve a
balanced judgement of location errors and quantity errors. These are:
• Moving window based methods:
o Possibility of multiple compensation: within a moving window, compensating
errors may be found, i.e. overestimation within the window balances
underestimation. It is well possible that underestimation at one site balances
overestimation (or vice versa) at several other sites.
o Homogeneity and isotropy of the window. In all applications that I am
aware of, the window size is the same for all locations. Also, the window is
symmetric, favouring all directions equally (or, in the case of square
windows, having an unwarranted bias in some directions).
o The blocking artefact, as discussed in Section 2.5.
• Aggregation based methods:
o Modifiable areal unit problem. This problem relates to the fact that the
results of aggregation based methods can depend to a major extent on a
trivial decision, i.e. how the major grid is positioned over
the minor grid. A small deviation may lead to completely different results.
o Homogeneity of the analysis unit is a problem here as well.
The strong adaptation to the morphology of the data comes at a cost, however. Firstly, the
method requires long computation times: maps that normally take less than a second to compare
may consume many minutes or even hours with this method. Moreover, the numerical analysis
does not in all cases lead to a solution. The method requires smooth maps that have a reasonable
similarity in pattern. In other words, the method is not as robust as the moving window and
aggregation based methods.
Another characteristic of the method is that the transformation distorts the map in such a
way that the area weighted mean (or the integral over the whole area) is not preserved. This may
be considered problematic, especially in the case of maps that represent ‘stock’ variables, such
as population or mass.
3 DATA
The data used to evaluate a selection of the methods discussed in the previous sections have
been submitted by the Netherlands Environmental Assessment Agency. The first dataset is
synthetic and was specifically created to represent errors at different spatial scales. The second
dataset consists of three maps: two are results of metamodels and the third is the
‘ground truth’ created by the original model that the other two approximate.
3.1 Synthetic dataset
The first dataset consists of the following maps:
Figure 3-1 The synthetic dataset (maps 1a, 1b; 2a, 2b, 2c; 3a, 3b, 3c)
The differences between the maps are known and can be summarized as follows:
• The underlying gradient of the 3a, 3b and 3c maps is higher than that of 2a, 2b and
2c, whereas map 1a and 1b have no underlying gradient at all.
• All differences found between maps 1a and 1b are attributed to the relative location
of the spots.
• All differences found between maps 2a and 2c as well as maps 3a and 3c are
attributed to the relative location of the spots.
• All differences between 2a and 2b, as well as between 3a and 3b are attributed to
the reverse direction of the gradient from southwest to northeast.
• The differences between 2b and 2c, as well as between 3b and 3c are the result both
of the reversed gradient and differences in the location of the spots.
3.2 Practical dataset
The second dataset consists of the following maps:
Figure 3-2 The practical dataset (maps GeoPearl, MetaPearl and Index)
The map GeoPearl is output of the GeoPearl model, whereas Index and MetaPearl are the
outputs of metamodels that approximate GeoPearl.
4 RESULTS
4.1 Cell-per-cell difference
4.1.1 Synthetic dataset
The cell-per-cell differences in the synthetic dataset are depicted in Figure 4-1. The synthetic
nature of the dataset is manifest as the differences are exactly as described in Chapter 3.
Figure 4-1 Cell by cell differences in the synthetic dataset (pairs 1a-1b; 2a-2b, 2a-2c,
2b-2c; 3a-3b, 3a-3c, 3b-3c)
The correlation can be calculated on the basis of a cell-by-cell evaluation as well (Table 4-1).
The correlation clearly picks up on the reversed trend at the coarse scale that is present in pairs
2a-2b, 2b-2c, 3a-3b and 3b-3c. The effect of these reversed spatial trends is a negative
correlation value.
Table 4-1 Correlation (Pearson)
Pair R
1a-1b 0.403
2a-2b 0.106
2a-2c 0.645
2b-2c -0.217
3a-3b -0.408
3a-3c 0.806
3b-3c -0.577
4.1.2 Practical dataset
The cell-by-cell differences in the practical dataset are given in Figure 4-3. It appears that the
model errors are strongly clustered. Furthermore, it is suggested that the clustering is related to
the structure of the soil map of the Netherlands (Figure 4-2). Exploring the relation between soil
type and model error is not within the scope of this report; it is however strongly recommended
to investigate the nature of this degree of clustering (see footnote 1). The correlation figures
indicate that the output maps of the two metamodels are less similar to each other than to the
original GeoPearl results.
Figure 4-2 Soil map of the Netherlands (source: www.bodems.nl)
Footnote 1: The GeoPearl webpage, www.alterra-research.nl/pls/portal30/docs/folder/pearl/pearl/geopearl.htm,
indicates that GeoPearl is based on runs for 6405 unique combinations of basic model inputs (soil type,
climate district, land-use type, groundwater depth class, etc.). It is recommended to compare the model and
metamodels at the level of these combinations rather than the raster map visualization.
Figure 4-3 Cell-by-cell differences in the practical dataset (Index-MetaPearl: R = 0.834;
MetaPearl-GeoPearl: R = 0.913; Index-GeoPearl: R = 0.945)
4.2 Fuzzy Numerical
4.2.1 Synthetic Dataset
The fuzzy numerical approach (Figure 4-4) is clearly able to discern small spatial errors from
large spatial errors in map pair 1a-1b. Large spatial errors are found in the northwest, where spots
occur in one map but not in the other; in the other parts of the map there are also differences in
the spots, but these are attributed to shifts in location and are thus minor differences. In map pairs
2a-2c and 3a-3c the structure of the spots is identical, but due to the gradient ‘mitigating’ values
are found in the neighbourhood, and the locations that in map pair 1a-1b are considered
major differences are now seen as minor differences. As the gradient is stronger in pair 3a-3c, the
mitigating effect is also stronger. As a result, map pair 3a-3c is considered most similar.
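The fuzzy numerical comparison itself is not reproduced in equations in this report; the sketch below illustrates the general idea only: each cell is compared with the best-matching cell within a radius, discounted by a distance decay that halves every few cells. The local similarity function, the decay form and all names are my assumptions, not the MCK implementation:

```python
import numpy as np

def fuzzy_numerical(A, B, radius=15, halving=3):
    # Two-way fuzzy numerical similarity (sketch); averaged over all cells.
    A = np.asarray(A, float); B = np.asarray(B, float)
    rows, cols = A.shape

    def one_way(P, Q):
        sim = np.empty((rows, cols))
        for r in range(rows):
            for c in range(cols):
                best = 0.0
                for dr in range(-radius, radius + 1):
                    for dc in range(-radius, radius + 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols:
                            d = np.hypot(dr, dc)
                            if d <= radius:
                                # local value similarity in [0, 1] (clamped at 0
                                # via the initial value of `best`)
                                denom = max(abs(P[r, c]), abs(Q[rr, cc]), 1e-12)
                                s = 1.0 - abs(P[r, c] - Q[rr, cc]) / denom
                                # distance decay halving every `halving` cells
                                best = max(best, s * 2.0 ** (-d / halving))
                sim[r, c] = best
        return sim

    # symmetric: per-cell minimum of both comparison directions
    return float(np.minimum(one_way(A, B), one_way(B, A)).mean())
```

Identical maps score 1; a spatially shifted pattern scores below 1 but above a plain cell-by-cell evaluation, which is the behaviour seen in Figure 4-4.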
Figure 4-4 Fuzzy similarity, R = 15, halving = 3 (1a-1b: S = 0.675; 2a-2b: S = 0.596;
2a-2c: S = 0.845; 2b-2c: S = 0.572; 3a-3b: S = 0.572; 3a-3c: S = 0.886; 3b-3c: S = 0.563)
4.2.2 Practical dataset
For the practical dataset the Fuzzy Numerical approach indicates that the differences in both
maps are mainly minor ones. The stronger differences are found in the north for Index and in the
northeast for MetaPearl.
Figure 4-5 Fuzzy similarity, R = 15, halving = 3 (GeoPearl-Index: S = 0.827;
GeoPearl-MetaPearl: S = 0.813)
4.3 Image Quality Assessment
4.3.1 Synthetic dataset
Figure 4-6 displays the main outcomes for the first dataset on the basis of the default parameters
given in the paper (radius = 11, deviation = 1.5).
Figure 4-6 Main image quality assessment results (1a-1b: SSIM = 0.243; 2a-2b: SSIM = 0.759;
2a-2c: SSIM = 0.391; 2b-2c: SSIM = 0.253; 3a-3b: SSIM = 0.684; 3a-3c: SSIM = 0.441;
3b-3c: SSIM = 0.239)
As expected, the similarity of 2b and 2c is lower than that of either 2a and 2b or 2a and 2c,
because it entails the differences found in both (likewise for the combination 3a, 3b and 3c). It
may be surprising that both in pattern and absolute value the difference between maps 1a and 1b
is more akin to 2b and 2c than to 2a and 2c (likewise for the combination 3a, 3b and 3c). This
can be explained by the similarity in contrast resulting from the gradient, which is found in the
pairs 2a-2c and 3a-3c, but not in 1a-1b.
The results as a function of scale illustrate how the indicators respond differently to an
increasing scale. Both the similarities in luminance and contrast increase along with the scale of
the analysis. This is expected, as the increase in scale signifies an increase in tolerance for location
error. In order to learn from the scale-similarity plots it is necessary to look beyond the trend,
which is practically always positive, and consider the steepness instead.
Structure, on the other hand, does not have a direct relation with scale. Since it calculates a
correlation within the window, an expanding window does not imply that differences are
smoothed away. As the window expands it may pick up on large scale similarities (such as the
similarity in gradient, recognized in Figure 4-9 and Figure 4-12) or dissimilarities (e.g. the
misplaced spots in Figure 4-7, or the reversed gradient in Figure 4-8 and Figure 4-11).
Figure 4-7 Differences in map pair 1a-1b
Figure 4-8 Differences in map pair 2a-2b
Figure 4-9 Differences in map pair 2a-2c
Figure 4-10 Differences in map pair 2b-2c
Figure 4-11 Differences in map pair 3a-3b
Figure 4-12 Differences in map pair 3a-3c
Figure 4-13 Differences in map pair 3b-3c
(Each figure plots the similarity in luminance, contrast and structure against sigma,
with radius = 5 X sigma.)
4.3.2 Practical dataset
The results for the practical dataset (Table 4-2) indicate that Index better resembles the GeoPearl
reference map than MetaPearl does. The difference is mainly due to the difference in structure,
and not, or hardly, due to luminance and contrast. This implies that Index and MetaPearl achieve
similar results within a small moving window, but Index is better capable of predicting the local
peaks and troughs.
It is striking that the difference between the two models is larger than their discrepancy with the
reference map. It suggests that the models complement each other, i.e. they both get something
right that the other does not. As a blunt approach to exploiting this fact, a fourth map has been
created as the mean of the two model maps, and indeed this ‘improved model’ outperforms the
other two on all accounts (Table 4-2).
Figure 4-14 details the spatial distribution of the error. The similarity in structure to the soil
map, which was recognized in Section 4.1, is obscured here because the direction of the error
(over- or under-estimation) is not reflected in the map. Still, it is recommended to consider the
spatial distribution of the error in light of the soil map.
Table 4-2 Image quality assessment results, for the practical dataset
SSIM Luminance Contrast Structure
Deviation = 1
Index - GeoPearl 0.87 0.98 0.95 0.93
MetaPearl - GeoPearl 0.77 0.98 0.94 0.83
Index - MetaPearl 0.61 0.96 0.91 0.69
Deviation = 4
Index - GeoPearl 0.91 0.99 0.98 0.93
MetaPearl - GeoPearl 0.83 0.99 0.98 0.85
Index - MetaPearl 0.68 0.98 0.97 0.72
Deviation = 1
Mean - GeoPearl 0.92 0.99 0.97 0.96
Figure 4-14 Spatial distribution of structural difference SSIM in the practical dataset
(GeoPearl-Index and GeoPearl-MetaPearl).
4.4 Wavelet verification
4.4.1 Synthetic dataset
The results of the wavelet verification are in line with expectations. Those map pairs that have
opposing gradients show a large error at the second-to-coarsest scale (the coarsest scale simply
compares the means of the two maps). The pairs with identical or no gradient only show errors at
the finer scales, mainly at the 2X2 and 4X4 aggregation. The maps that contain both types of
errors do indeed display two peaks in the mean squared error (4X4 and 32X32). The only
downside of the wavelet verification approach is that not all of the coarse scale errors are
registered as such; the graphs for the pairs 2a-2b and 3a-3b do not unambiguously make clear
that the errors are coarse scale only, instead they suggest that the error is mainly coarse scale but
to a lesser extent also fine scale. This may be related to the fact that the wavelet applied
is a discrete Haar wavelet, which is not sufficient to capture a smooth trend at a coarse scale.
The correlation results (Figure 4-16) are in line with those of the mean squared error. It can be
considered disappointing that the procedure does not fully recognize the inverse
relation at the coarsest scale that is present in pairs 2a-2b, 2b-2c, 3a-3b and 3b-3c. In the ideal
situation the comparison method would recognize the perfect negative correlation and return
the value -1. The explanation may be sought in the inclusion of NoData values in the comparison
(as value 0).
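The attribution of mean squared error to scales can be sketched as follows, assuming square maps whose side is a power of two (a Haar block-mean decomposition; the function name and exact bookkeeping are my own, not the Briggs & Levine implementation):

```python
import numpy as np

def haar_mse_by_scale(A, B):
    # Attribute the MSE between two 2^k x 2^k maps to scales via a discrete
    # Haar (2x2 block-mean) decomposition of the difference map.
    D = np.asarray(A, float) - np.asarray(B, float)
    contrib, size = {}, 1
    while D.size > 1:
        # next coarser approximation: 2x2 block means
        coarse = 0.25 * (D[0::2, 0::2] + D[0::2, 1::2] + D[1::2, 0::2] + D[1::2, 1::2])
        # detail lost by the aggregation; orthogonal to the upsampled coarse part,
        # so the per-scale mean squares add up exactly to the total MSE
        detail = D - np.kron(coarse, np.ones((2, 2)))
        contrib[size] = float((detail ** 2).mean())
        D, size = coarse, size * 2
    contrib[size] = float(D[0, 0] ** 2)  # coarsest scale: difference of the map means
    return contrib
```

The returned dictionary maps cell sizes (1, 2, 4, ...) to their MSE contributions, which is the decomposition plotted in Figure 4-15; summing the contributions reproduces the overall cell-by-cell MSE.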
Figure 4-15 Wavelets verification results for the synthetic dataset. Mean squared error
is attributed to different scales (cell sizes 1 to 64) for each map pair (1a-1b;
2a-2b, 2a-2c, 2b-2c; 3a-3b, 3a-3c, 3b-3c).
Figure 4-16 Wavelets verification results for the synthetic dataset. Correlation is
depicted for different scales (cell sizes 1 to 32) for each map pair. Note that the
correlation for the coarsest scale of 64X64 cells is not calculated; at that scale,
all cells in the map take the same value, which renders the correlation meaningless.
4.4.2 Practical dataset
The results for the practical dataset indicate that for both maps the strongest differences are
related to the finest scale; however, up to very large scales (512X512 cells, a quarter of the map)
structural differences are found. Based on the strong spatial clustering of errors (Section 4.1) a
more pronounced error at larger scales was expected. The length of the ‘tail’ indicates that Index
has errors at a coarser scale than MetaPearl. The correlation and the mean squared error provide
approximately the same information.
Figure 4-17 Comparison of the two metamodels (Index-MetaPearl): Mean Squared Error and
Correlation as a function of scale (cell sizes 1 to 1024)
Figure 4-18 Wavelets verification results for the practical dataset (Index-GeoPearl and
MetaPearl-GeoPearl). Mean squared error and correlation are attributed to different
scales (cell sizes 1 to 1024).
4.5 Image Warping
The method based on morphological transformations is strained to its limits by the test cases. In
the practical case it appears impossible to find a solution, whereas for the synthetic case
solutions are only found for some values of λ, the parameter that sets the weight of the
horizontal error relative to the vertical error. The synthetic maps have been compared at the
lowest value of λ that yields a solution in all cases. The motivation for this choice is to
maximize the tolerance for location errors while still comparing all maps according to the
same standard.
In short, the results are disappointing; only in the pair 1a-1b does the methodology find a
solution that actually has a lower RMISE after the transformation.
Table 4-3 Root mean squared error before and after transformation on the basis of
interpolated maps
RMISE before RMISE after Deformation penalty
1a-1b 0.13 0.11 0.080
1b-1a 0.13 0.12 0.086
2a-2b 0.14 0.14 0.048
2a-2c 0.076 0.078 0.050
2b-2c 0.15 0.15 0.058
3a-3b 0.18 0.18 0.066
3a-3c 0.058 0.074 0.047
3b-3c 0.18 0.19 0.16
4.6 Bivariate Spatial Association
4.6.1 Synthetic dataset
Positive Lee’s L values are found for those map pairs where the differences are exclusively the
effect of the locations of the spots. Negative values are found for those where the gradients are
reversed. The effect of the dissimilarity of the gradient is best recognized with a spatial lag of 10
cell widths, whereas the similarity of the spots is best recognized at a spatial lag of 2 or 3.
The maps indicate that the strongest impact on the L statistic in map pair 1a-1b stems from the
spots and not from the surrounding ‘flatness’. In the maps with reversed gradients the spots do
have a positive contribution to the L statistic, but they are outweighed by the negative impact of
the gradient.
Figure 4-19 Local Lee’s L values for the synthetic dataset, spatial lag 1.5 (pairs 1a-1b;
2a-2b, 2a-2c, 2b-2c; 3a-3b, 3a-3c, 3b-3c)
Figure 4-20 Lee’s L values as a function of the spatial lag (radius).
4.6.2 Practical dataset
The spatialized results indicate that Lee’s L statistic for the practical dataset is almost fully the
effect of positive local correlation. This means that there are many locations (in blue, Figure 4-
21) where the two local means stick out from the global mean, but it happens only at very few
locations (in red) that a local mean in one map stands out negatively and in the other positively.
Besides this observation, neither the spatial results nor the relation between spatial lag and L
provide much insight into the nature of the differences.
Figure 4-21 Local Lee’s L values for the practical dataset, spatial lag 1.5
(Index-GeoPearl and MetaPearl-GeoPearl)
Figure 4-22 Lee’s L values as a function of the spatial lag (radius).
5 CONCLUSIONS AND RECOMMENDATIONS
The most general conclusion of this study is: Yes, there are raster similarity metrics
available and their application yields rich information on the nature, extent and spatial
distribution of differences and similarities in pairs of numerical maps.
Similarity metrics for pairs of raster datasets are used in different fields of science and
engineering. As a consequence, much methodological research has been done that is not
necessarily known from one discipline to another. This may in part be ascribed to network
effects, but also to differences in terminology. One of the objectives of this report is to look
beyond disciplinary barriers and provide a cross-section of comparison methodologies. A gap is
recognized between practice and theory; although advanced metrics (those taking into account
spatial relations between cells) are becoming available, the common practice is to perform cell-
by-cell comparisons. This leaves the literature fairly fragmented, offering individual methods
rather than an evolving theory. In effect, the report has the character of a sampler of some
recently developed methods.
Eight methods have been introduced and briefly discussed. Of these, five have been applied to
two test cases: Fuzzy Numerical, Image Quality Assessment, Wavelet Field Verification,
Bivariate Spatial Association and Image Warping.
The only method that performed inadequately is Image Warping, based on the paper by Reilly et
al. (2004). This method applies numerical optimization to find a morphological transformation
that balances horizontal and vertical errors. The optimization only finds trivial solutions for the
synthetic dataset and does not find a solution at all for the more complicated practical dataset. It
is not recommended, however, to simply discard this approach; in theory it can solve some
problems associated with moving-window approaches (such as Image Quality Assessment and
Bivariate Spatial Association) and aggregation-based methods (such as Wavelet Field
Verification). Future research into this type of comparison should not only focus on numerical
improvements, but should preferably also apply morphological transformations that conserve
the 'volume' of the map. This implies morphological operators that do not move points in space,
but instead move 'volume' from one cell to another. This would be especially relevant when the
raster represents a stock variable.
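The volume-conserving idea can be made concrete in one dimension, where the minimal amount of 'volume times distance' transport needed to turn one profile into another (the one-dimensional earth mover's distance) has a closed form: the sum of the absolute cumulative differences. The sketch below illustrates the concept only; it is not one of the methods evaluated in this report.

```python
import numpy as np

def transport_cost_1d(a, b):
    """Minimal 'volume x distance' needed to redistribute profile a
    into profile b (the 1-D earth mover's distance). Both profiles
    must have the same total, so that volume is conserved.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    assert np.isclose(a.sum(), b.sum()), "totals must match"
    # each cumulative difference is the net volume that has to cross
    # the boundary to its right; cost is volume times unit distance
    return float(np.sum(np.abs(np.cumsum(a - b))))

# Moving one unit of volume two cells to the right costs 2
cost = transport_cost_1d(np.array([1.0, 0.0, 0.0]),
                         np.array([0.0, 0.0, 1.0]))
```

In two dimensions no such closed form exists and the transport plan must be found by optimization, which is where the connection with the image-warping approach lies.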
The method of Briggs & Levine (1997) applies wavelets to obtain indices of map differences at
different scales. Of the methods applied, it is the one best capable of differentiating between the
large-scale and small-scale errors present in the synthetic datasets. The only downside of this
method is that it attributes the coarse-scale errors not only to the coarsest scales but, to a lesser
extent, also to the finer scales. This is an effect of the discrete nature of the wavelets, and it is
recommended to investigate whether the application of continuous wavelets can reduce it.
Another recommendation is to apply the wavelet approach to multi-scale analysis, for instance
of structure metrics such as patch size, diversity and edge density. The negative correlation at
coarse scales that is present in some of the test maps was not properly recognized by the
comparison method; the likely explanation is that NoData values are interpreted as the value 0.
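The scale decomposition underlying the Briggs & Levine approach can be sketched with the simplest discrete wavelet, the Haar basis: the difference map is repeatedly averaged over 2x2 blocks, and the energy stripped off at each step is attributed to that scale. The following is a simplified illustration of this idea, not the published verification procedure:

```python
import numpy as np

def haar_scale_energy(diff, levels):
    """Decompose the mean squared error of a difference field by
    dyadic (Haar) scale. `diff` must be square with side 2**levels.
    Returns one energy per level, finest scale first, plus the energy
    of the coarsest residual; the parts sum to the total MSE.
    """
    field = np.asarray(diff, dtype=float)
    energies = []
    for _ in range(levels):
        n, m = field.shape
        # average each 2x2 block to obtain the next coarser approximation
        coarse = field.reshape(n // 2, 2, m // 2, 2).mean(axis=(1, 3))
        up = np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)
        # the detail removed at this step is the energy at this scale
        energies.append(float(np.mean((field - up) ** 2)))
        field = coarse
    energies.append(float(np.mean(field ** 2)))
    return energies

# Example: a +-1 checkerboard difference has all its energy at the finest scale
board = (np.indices((8, 8)).sum(axis=0) % 2) * 2.0 - 1.0
energies = haar_scale_energy(board, 3)
```

The blocky 2x2 averaging also makes visible why coarse errors leak into finer scales when they are not aligned with the dyadic block boundaries.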
The results of the Bivariate Spatial Association are hard to interpret, in particular because the
spatial association between two identical maps is not equal to 1. Nevertheless, the spatialized
version of this metric seems very much in line with human observation; in particular, the
method is the only one that identifies partly overlapping spots as highly similar. The calculation
of expected similarities and variance that the author introduced in a later paper (Lee, 2004) is
not considered here, because for medium-sized maps it requires prohibitively large matrix
operations. A recommendation for taking the distribution of errors into account is to use Monte
Carlo simulation, which has the advantage that it can be applied to any similarity metric and
under different stochastic null models.
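The Monte Carlo recommendation can be sketched as follows: compute the chosen similarity metric on the observed pair, then on many pairs in which one map is replaced by a draw from a stochastic null model, and report the rank of the observed score. The permutation null used below (random relabelling of cells) is deliberately naive; spatially structured neutral models would be more appropriate in most applications. All names and parameters are illustrative.

```python
import numpy as np

def monte_carlo_p(a, b, metric, n_sims=999, seed=0):
    """One-sided Monte Carlo p-value for any similarity metric on a
    pair of rasters. The null model is a random permutation of the
    cells of map `a` (a naive, spatially unstructured null).
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    observed = metric(a, b)
    null = np.array([metric(rng.permutation(a), b) for _ in range(n_sims)])
    # add-one correction keeps the p-value away from exactly zero
    p = (1 + np.sum(null >= observed)) / (n_sims + 1)
    return observed, float(p)

# Example with Pearson's r as the similarity metric
def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

m = np.arange(100, dtype=float).reshape(10, 10)
obs, p = monte_carlo_p(m, m, pearson, n_sims=99)
# identical maps: observed r = 1, which no random permutation attains
```

Because only the metric function is plugged in, the same harness serves the Bivariate Spatial Association, Fuzzy Numerical, or any other score discussed in this report.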
Fuzzy Numerical is the method that seems best able to distinguish areas of minor spatial errors
from areas of major spatial errors in the synthetic dataset. In the maps with a background
gradient this quality became obscured, because more diverse neighbourhoods also contain more
mitigating cells. Further decomposition of the error, along the lines of the Image Quality
Assessment, may be a solution.
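In outline, the Fuzzy Numerical comparison matches each cell against the best-fitting cell within a neighbourhood of the other map, discounted by distance decay. The sketch below follows that outline; the exact similarity function and decay used here (relative difference, exponential halving distance) are illustrative assumptions and may differ in detail from the published method (Hagen-Zanker et al., 2005).

```python
import numpy as np

def fuzzy_numerical(a, b, halving_distance=2.0, radius=4):
    """Per-cell fuzzy similarity of map `a` against map `b`: each cell
    of `a` is compared with the best-matching cell of `b` within a
    square neighbourhood, discounted by exponential distance decay.
    A symmetric index would combine a-vs-b with b-vs-a.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rows, cols = a.shape
    sim = np.zeros_like(a)
    for i in range(rows):
        for j in range(cols):
            best = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        # similarity halves for every halving_distance cells
                        decay = 2.0 ** (-np.hypot(di, dj) / halving_distance)
                        denom = max(abs(a[i, j]), abs(b[ii, jj]), 1e-12)
                        local = 1.0 - abs(a[i, j] - b[ii, jj]) / denom
                        best = max(best, local * decay)
            sim[i, j] = best
    return sim  # per-cell similarity in [0, 1]

# Identical maps score 1 everywhere, since each cell matches itself at distance 0
sim = fuzzy_numerical(np.arange(1.0, 26.0).reshape(5, 5),
                      np.arange(1.0, 26.0).reshape(5, 5))
```

The mitigation effect described above is visible in this formulation: in a diverse neighbourhood there is a greater chance that some nearby cell happens to match, which inflates `best`.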
Image Quality Assessment yields results that are in line with expectations and allows a
decomposition of the error into different sources. This method gave the clearest feedback on the
practical dataset. A disadvantage of this method is the occurrence of blocking artefacts. These
are only a minor distortion thanks to the distance-decay weights, but a full solution may be
offered by a fuzzy weighting system along the lines of Hagen (2003).
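The decomposition referred to above follows the structural similarity index of Wang et al. (2004), which factors local quality into luminance, contrast and structure terms. The following is a minimal sketch using a uniform square window; the implementation discussed in this report uses distance-decay weights instead, and the stabilising constant C below is an arbitrary small value.

```python
import numpy as np

def ssim_components(a, b, radius=2, C=1e-4):
    """Local luminance, contrast and structure terms (after Wang et
    al., 2004), computed over a square moving window. Each returned
    map is 1 everywhere when the inputs are identical.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    def win_mean(x):
        # mean over a (2*radius+1)^2 window via edge padding + shifts
        k = 2 * radius + 1
        p = np.pad(x, radius, mode='edge')
        out = np.zeros_like(x)
        for di in range(k):
            for dj in range(k):
                out += p[di:di + x.shape[0], dj:dj + x.shape[1]]
        return out / (k * k)

    mu_a, mu_b = win_mean(a), win_mean(b)
    var_a = win_mean(a * a) - mu_a ** 2
    var_b = win_mean(b * b) - mu_b ** 2
    cov = win_mean(a * b) - mu_a * mu_b
    sd = np.sqrt(np.clip(var_a, 0, None) * np.clip(var_b, 0, None))
    lum = (2 * mu_a * mu_b + C) / (mu_a ** 2 + mu_b ** 2 + C)       # mean
    con = (2 * sd + C) / (var_a + var_b + C)                        # variance
    struct = (cov + C / 2) / (sd + C / 2)                           # covariance
    return lum, con, struct

lum, con, struct = ssim_components(np.arange(36.0).reshape(6, 6),
                                   np.arange(36.0).reshape(6, 6))
```

The uniform window is exactly what produces the blocking artefacts mentioned above; replacing `win_mean` with a distance-decay (e.g. Gaussian) weighted mean softens them.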
It is recommended to investigate the possibilities of a hybrid method that combines elements of
Image Quality Assessment and Fuzzy Numerical: the different types of neighbourhood
comparison (luminance, contrast, structure) from IQA, and the fuzzy weighting of the Fuzzy
Numerical and Fuzzy Kappa methods.
With regard to application in the Map Comparison Kit, it is stressed that the gap between theory
and practice will narrow as methods become available in user-friendly software. It is therefore
recommended to make the methods discussed in this report available in the software. An issue
to resolve is how to deal with NoData values and non-rectangular maps, in particular for
Wavelet Field Verification and Image Warping. One of the advantages of the MCK is that it is
equipped with tools to perform structured analysis by multiple comparisons, including a Monte
Carlo approach to significance on the basis of neutral models. It is recommended to extend the
MCK with neutral models of continuous valued landscapes, because these are currently not
supported.
6 REFERENCES
Ahrens, B., Karstens, U., Rockel, B., & Stuhlmann, R. (1998). On the validation of the
atmospheric model REMO with ISCCP data and precipitation measurements using simple
statistics. Meteorology and Atmospheric Physics, 68(3-4), 127-142.
Anselin, L. (1995). Local Indicators of Spatial Association - Lisa. Geographical Analysis, 27(2),
93-115.
Bishop, G. D., Church, M. R., Aber, J. D., Neilson, R. P., Ollinger, S. V., & Daly, C. (1998). A
comparison of mapped estimates of long-term runoff in the northeast United States. Journal
of Hydrology, 206(3-4), 176-190.
Bogena, H., Kunkel, R., Schobel, T., Schrey, H. P., & Wendland, E. (2005). Distributed
modeling of groundwater recharge at the macroscale. Ecological Modelling, 187(1), 15-26.
Boots, B., & Csillag, F. (2006). Categorical maps, comparisons, and confidence. Journal of
Geographical Systems, 8(2), 109-118.
Briggs, W. M., & Levine, R. A. (1997). Wavelets and field forecast verification. Monthly
Weather Review, 125(6), 1329-1341.
Brooks, H. E., & Doswell, C. A. (1996). A comparison of measures-oriented and distributions-
oriented approaches to forecast verification. Weather and Forecasting, 11(3), 288-303.
Casati, B., Ross, G., & Stephenson, D. B. (2004). A new intensity-scale approach for the
verification of spatial precipitation forecasts. Meteorological Applications, 11(2), 141-154.
Domingues, M. O., Mendes, O., & da Costa, A. M. (2005). On wavelet techniques in
atmospheric sciences. Fundamentals of Space Environment Science, 35(5), 831-842.
Ebert, E. E. (2005). Forecast Verification - Issues, Methods and FAQ, [Internet]. Available:
http://www.bom.gov.au/bmrc/wefor/staff/eee/verif/verif_web_page.html [2006, 8 May].
Ebert, E. E., & McBride, J. L. (2000). Verification of precipitation in weather systems:
determination of systematic errors. Journal of Hydrology, 239(1-4), 179-202.
Eskicioglu, A. M., & Fisher, P. S. (1995). Image quality measures and their performance.
Communications, IEEE Transactions on, 43(12), 2959-2965.
Foody, G. M. (2002). Status of land cover classification accuracy assessment. Remote Sensing
of Environment, 80(1), 185-201.
Garen, D. C., & Marks, D. (2005). Spatially distributed energy balance snowmelt modelling in a
mountainous river basin: estimation of meteorological inputs and verification of model
results. Journal of Hydrology, 315(1-4), 126-153.
Goovaerts, P., Jacquez, G. M., & Greiling, D. (2005). Exploring scale-dependent correlations
between cancer mortality rates using factorial kriging and population-weighted
semivariograms. Geographical Analysis, 37(2), 152-182.
Hagen, A. (2003). Fuzzy set approach to assessing similarity of categorical maps. International
Journal of Geographical Information Science, 17(3), 235-249.
Hagen-Zanker, A., Engelen, G., Hurkens, J., Vanhout, R., & Uljee, I. (2006). Map Comparison
Kit 3: User Manual. Maastricht: Research Institute for Knowledge Systems.
Hagen-Zanker, A., Straatman, B., & Uljee, I. (2005). Further developments of a fuzzy set map
comparison approach. International Journal of Geographical Information Science, 19(7),
769-785.
Hoffman, R. N., Liu, Z., Louis, J.-F., & Grassoti, C. (1995). Distortion Representation of
Forecast Errors. Monthly Weather Review, 123(9), 2758-2770.
Lee, S. I. (2004). A generalized significance testing method for global measures of spatial
association: an extension of the Mantel test. Environment and Planning A, 36(9), 1687-1703.
Lee, S.-I. (2001). Developing a bivariate spatial association measure: An integration of
Pearson's r and Moran's I. Journal of Geographical Systems, 3(4), 369-385.
Liu, J., Chen, J. M., Cihlar, J., & Park, W. M. (1997). A process-based boreal ecosystem
productivity simulator using remote sensing inputs. Remote Sensing of Environment, 62(2),
158-175.
Menard, A., & Marceau, D. J. Simulating the impact of forest management scenarios in an
agricultural landscape of southern Quebec, Canada, using a geographic cellular automata.
Landscape and Urban Planning, In Press, Corrected Proof.
Miliaresis, G. C., & Paraschou, C. V. E. (2005). Vertical accuracy of the SRTM DTED level 1
of Crete. International Journal of Applied Earth Observation and Geoinformation, 7(1), 49-
59.
Ostrem, G., & Haakensen, N. (1999). Map Comparison or Traditional Mass-balance
Measurements: Which Method is Better? Geografiska Annaler, Series A: Physical
Geography, 81(4), 703-711.
O'Sullivan, D., & Unwin, D. (2002). Geographic information analysis. Hoboken, N.J.: Wiley.
Pal, N. R., & Pal, S. K. (1993). A review on image segmentation techniques. Pattern
Recognition, 26(9), 1277-1294.
Pontius Jr., R. G., Huffaker, D., & Denman, K. (2004). Useful techniques of validation for
spatially explicit land-change models. Ecological Modelling, 179(4), 445-461.
Prasad, A. M., Iverson, L. R., & Liaw, A. (2006). Newer classification and regression tree
techniques: Bagging and random forests for ecological prediction. Ecosystems, 9(2), 181-
199.
Reilly, C., Price, P., Gelman, A., & Sandgathe, S. A. (2004). Using image and curve registration
for measuring the goodness of fit of spatial and temporal predictions. Biometrics, 60(4), 954-
964.
Santer, B. D., Wigley, T. M. L., & Jones, P. D. (1993). Correlation Methods in Fingerprint
Detection Studies. Climate Dynamics, 8(6), 265-276.
Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A., & Jain, R. (2000). Content-based
image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 22(12), 1349-1380.
Strasser, U., & Mauser, W. (2001). Modelling the spatial and temporal variations of the water
balance for the Weser catchment 1965-1994. Journal of Hydrology, 254(1-4), 199-214.
Tompa, D., Morton, J., & Jernigan, E. (2000). Perceptually based image comparison. Paper
presented at the 2000 International Conference on Image Processing, Vancouver, Canada.
Tustison, B., Harris, D., & Foufoula-Georgiou, E. (2001). Scale issues in verification of
precipitation forecasts. Journal of Geophysical Research-Atmospheres, 106(D11), 11775-
11784.
Viscarra Rossel, R. A., & Walter, C. (2004). Rapid, quantitative and spatial field measurements
of soil pH using an Ion Sensitive Field Effect Transistor. Geoderma, 119(1-2), 9-20.
Visser, H., & de Nijs, T. (2006). The Map Comparison Kit. Environmental Modelling &
Software, 21(3), 346-358.
Visser, H., Hagen, A., de Nijs, T., Klein Goldewijk, C. G. M., Borsboom-van Beurden, J. A.
M., & de Niet, R. (2004). The Map Comparison Kit: Method, software and applications
(report 550002005). Bilthoven: RIVM.
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment:
From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4),
600-612.
Wartenberg, D. (1985). Multivariate spatial correlation: a method for exploratory geographical
analysis. Geographical Analysis, 17, 263-283.
Wealands, S. R., Grayson, R. B., & Walker, J. P. (2005). Quantitative comparison of spatial
fields for hydrological model assessment - some promising approaches. Advances in Water
Resources, 28(1), 15-32.
Wigley, T. M. L., & Santer, B. D. (1990). Statistical Comparison of Spatial Fields in Model
Validation, Perturbation, and Predictability Experiments. Journal of Geophysical Research-
Atmospheres, 95(D1), 851-865.
Zepeda-Arce, J., Foufoula-Georgiou, E., & Droegemeier, K. K. (2000). Space-time rainfall
organization and its role in validating quantitative precipitation forecasts. Journal of
Geophysical Research-Atmospheres, 105(D8), 10129-10146.
Zhang, L. J., & Gove, J. H. (2005). Spatial assessment of model errors from four regression
techniques. Forest Science, 51(4), 334-346.
Zhou, Q. M., & Liu, X. J. (2004). Analysis of errors of derived slope and aspect related to DEM
data properties. Computers & Geosciences, 30(4), 369-378.
Annex A: DETAILED RESULTS IMAGE QUALITY ASSESSMENT
[Figure: panels at sigma = 1 (R = 5), sigma = 4 (R = 20) and sigma = 10 (R = 50); rows for luminance (mean), contrast (variance) and structure (covariance)]
Figure A-1 Differences in pair 2a-2b as a function of variance and radius
[Figure: panels at sigma = 1 (R = 5), sigma = 2 (R = 10), sigma = 4 (R = 20) and sigma = 10 (R = 50); rows for luminance (mean), contrast (variance) and structure (covariance)]
Figure A-2 Differences in pair 1a-1b as a function of variance and radius
[Figure: panels at sigma = 1 (R = 5), sigma = 4 (R = 20) and sigma = 10 (R = 50); rows for luminance (mean), contrast (variance) and structure (covariance)]
Figure A-3 Differences in pair 2b-2c as a function of variance and radius
[Figure: panels at sigma = 1 (R = 5), sigma = 4 (R = 20) and sigma = 10 (R = 50); rows for luminance (mean), contrast (variance) and structure (covariance)]
Figure A-4 Differences in pair 2a-2c as a function of variance and radius
[Figure: panels at sigma = 1 (R = 5), sigma = 4 (R = 20), sigma = 8 (R = 40) and sigma = 10 (R = 50); rows for luminance (mean), contrast (variance) and structure (covariance)]
Figure A-5 Differences in pair 3a-3b as a function of variance and radius
[Figure: panels at sigma = 1 (R = 5), sigma = 4 (R = 20) and sigma = 10 (R = 50); rows for luminance (mean), contrast (variance) and structure (covariance)]
Figure A-6 Differences in pair 3b-3c as a function of variance and radius
[Figure: panels at sigma = 1 (R = 5), sigma = 4 (R = 20) and sigma = 10 (R = 50); rows for luminance (mean), contrast (variance) and structure (covariance)]
Figure 6-1 Differences in pair 3a-3c as a function of variance and radius
[Figure: panels for Overall (Deviation 1), Overall (Deviation 4), Luminance (Deviation 1), Contrast (Deviation 1) and Structure (Deviation 1)]
Figure A-7 Differences in pair Index-GeoPearl
[Figure: panels for Overall (Deviation 1), Overall (Deviation 4), Luminance (Deviation 1), Contrast (Deviation 1) and Structure (Deviation 1)]
Figure A-8 Differences in pair MetaPearl-GeoPearl
[Figure: panels for Overall (Deviation 1), Overall (Deviation 4), Luminance (Deviation 1), Contrast (Deviation 1) and Structure (Deviation 1)]
Figure A-9 Differences in pair MetaPearl-Index