Environmental data extraction from heatmaps using
the AirMerge system
Victor Epitropou¹ · Tassos Bassoukos¹ · Kostas Karatzas · Ari Karppinen² · Leo Wanner³˒⁴ ·
Stefanos Vrochidis⁵ · Ioannis Kompatsiaris⁵ · Jaakko Kukkonen²
Received: 18 August 2014 / Revised: 19 March 2015 /Accepted: 1 April 2015
© Springer Science+Business Media New York 2015
Abstract The AirMerge platform was designed and constructed to increase the availability
and improve the interoperability of heatmap-based environmental data on the Internet. This
platform allows data from multiple heterogeneous chemical weather data sources to be
continuously collected and archived in a unified repository; all the data in this repository have
a common data format and access scheme. In this paper, we address the technical structure and
applicability of the AirMerge platform. The platform facilitates personalized information
services, and can be used as an environmental information node for other web-based
information systems. The results demonstrate the feasibility of this approach and its potential for
being applied also in other areas, in which image-based environmental information retrieval
will be needed.
Multimed Tools Appl
DOI 10.1007/s11042-015-2604-7
* Victor Epitropou
vepitrop@isag.meng.auth.gr; ve5822@ee.duth.gr
Tassos Bassoukos
abas@isag.meng.auth.gr
Kostas Karatzas
kkara@eng.auth.gr
Ari Karppinen
ari.karppinen@fmi.fi
Leo Wanner
leo.wanner@upf.edu
Stefanos Vrochidis
stefanos@iti.gr
Ioannis Kompatsiaris
ikom@iti.gr
Jaakko Kukkonen
jaakko.kukkonen@fmi.fi
1 Department of Mechanical Engineering, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
2 Finnish Meteorological Institute, Helsinki, Finland
3 Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona,
Catalonia, Spain
4 Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Catalonia, Spain
5 Centre for Research and Technology Hellas - Information Technologies Institute, Thessaloniki, Greece
Keywords Heatmaps · Data retrieval · Air quality · Image processing · Web services · GIS
1 Introduction
There are many types of environment-related images available on-line, broadly belonging to
two categories. These are (a) captured images that are generated as the result of a monitoring
activity (either in situ or remotely), and (b) synthetic (i.e., modeled) images that visualize the
result of an environmental computation process. In the latter category, heatmaps may be
considered as a representative type of synthetic images, and are commonly produced with
the aid of models.
A special area of application regarding the processing of heatmaps is air quality forecasting
(AQF). If we address in particular the regional and continental scales of AQF, the term
Chemical Weather (CW) is commonly used. In Chemical Weather Forecasting (CWF), there
has been an ever-growing number of forecast providers. These forecasts can cover some
regions of Europe multiple-fold [11,16]. However, the advancements in the number and
quality of AQF forecasts have not been associated with similar advancements in publishing
those results on-line, or making them available for added value services [15] in interoperable
forms.
On-line data publishing and dissemination is, in most cases, performed by the use of simple
heatmaps, while the numerical data used for the construction of the heatmaps is commonly
either not available or its access is severely restricted due to legal or technical constraints.
There is no harmonization regarding the (on-line) publishing of heatmaps [19]. Each CWF
data provider has therefore chosen to adopt their own heatmap format and publishing
parameters.
A relatively recent solution for publishing maps, geographical coverages and associated
data and metadata has materialized in the form of the various Open Geospatial Consortium
(OGC) standards for the publishing and visualization of geospatial information. However,
adoption has been limited by the available implementations and their somewhat inflexible
client-server tiered architecture, as well as by limited support for time- and elevation-based
data ordering [19]. Map visualization (WMS) is the most popular of the OGC services, while
the more complex and data-oriented ones (e.g., WFS, WCS, WPS) lag behind in terms of
acceptance, both because of their relatively niche use (compared to map viewing) and, in the
case of WPS, the relative difficulty of implementation.
It can therefore be concluded that CWF data publishing via heatmaps is a field which has,
so far, eluded web service integration and convergence of data interoperability, thus not
providing fertile technical conditions for the development of personalized or other added value
services.
In this paper, an integrated platform is presented, named AirMerge, which has been
designed to address this problem. It uses heatmaps as the starting point for automatically
collecting data from different CWF providers, converting them to a commonly interoperable
format, storing and recalling them from a centralized database and making them available to
other value-added services through a common web API. Elements of AirMerge's technology
were first presented in [8], while ongoing developments, applications and extensions of
the system have been presented in [6,17].
Compared to existing systems with some similar functionality in the same AQF and
CWF domain, like MyAir [1], AirMerge provides the unique functionality of image-based
environmental information retrieval [7]. This is accompanied by functions for
image noise removal and feature extraction, map projection transformations, data fusion and
mathematical analysis of extracted information, as well as by external connectivity
and component reuse in third-party projects such as PESCaDO [6].
AirMerge is thus an information hub for spatially defined environmental data, which
provides a single access point to various CWF data. Although AirMerge is currently used
only for chemical weather data, it can deal with any spatially distributed environmental
information, provided that a suitable retrieval and parsing subsystem is implemented in the
form of an extension or plug-in to the system. Via AirMerge, numerical environmental data
can be extracted from heatmaps and made available via a proper web-based API to be used by
other services. Even though AirMerge's focus is on heatmaps, other sources of information can
be used as well, such as formatted text data, offline databases, data exchange files, remote data
feeds and so on. These secondary data sources typically offer a much reduced areal coverage
compared to heatmaps and require more interim interfacing to be integrated, but they may be
more precise on a local scale (for example, if they are generated by real-time sensor readings)
and can be used for fine-tuning or verification purposes.
The aims of this article are (i) to present the processing of CWF heatmaps in the AirMerge
system in greater detail than previous publications, and (ii) to discuss AirMerge's external
connectivity and interaction with other systems. In addition, it is intended to present
comprehensively all of AirMerge's components and sub-systems, which have
so far only been presented separately in application-specific settings.
2 Materials and design requirements
2.1 Air quality heatmaps as input material
CWF models predict spatial and temporal concentration data [25]. Such data can be encoded
as digital images in the form of visually-interpretable heatmaps. Heatmaps are defined as 2D
images with a discrete number of color levels representing different pollutant concentration
value ranges over a geographical domain. They are accompanied by auxiliary information
concerning the color scale used and their geographical context, as well as descriptions of the
pollutants and units being used. The accompanying information may even contain secondary
data elements such as line charts, auxiliary heatmaps or tables, which add further levels of
complexity and information richness to CWF heatmaps.
A heatmap, by design, is intended as an intuitive, human-readable, one-way communica-
tion medium, conveying information to various groups of stakeholders. It is a one-way
medium primarily because it is not normally meant to be exported, modified or processed
by the intended target audience.
Generally, heatmaps are not intended as a data interchange format. However, they are
widely available, and they contain a large amount of usable data (relative to most other
potential data sources available online under the same terms). As an intuitive example, a
heatmap of a CWF representing a geographical area on a fixed grid, with grid dimensions of
300×200 elements, with a grid resolution of 20×20 km, covers an area of 6000×4000 km
(roughly a pan-European coverage), and contains 60,000 data points [16].
Considering that a forecast provider will typically offer at least 24-h coverage (same-day
predictions), and that each data point evaluates to a real-valued number (after heatmap
interpretation), it is obvious that the potential volume of recoverable environmental data is
rather high. In addition, heatmaps carry their own geolocation information and have a large
continuous coverage area, offering a combination of high area coverage, relatively high spatial
resolution and a fair temporal resolution.
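The volume estimate in the example above can be checked with a few lines of arithmetic. The sketch below uses the grid figures quoted in the text; the assumption that 24-h coverage means 24 hourly frames per pollutant is ours and is not stated in the source.

```python
# Grid figures taken from the example in the text.
GRID_COLUMNS = 300
GRID_ROWS = 200
CELL_SIZE_KM = 20

# Assumption (not from the source): 24-h coverage is delivered as 24 hourly frames.
HOURLY_FRAMES = 24

points_per_heatmap = GRID_COLUMNS * GRID_ROWS
extent_km = (GRID_COLUMNS * CELL_SIZE_KM, GRID_ROWS * CELL_SIZE_KM)
values_per_day = points_per_heatmap * HOURLY_FRAMES

print(points_per_heatmap)  # 60000
print(extent_km)           # (6000, 4000)
print(values_per_day)      # 1440000
```

Under that assumption, a single pollutant from a single provider would already yield about 1.4 million values per day.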
Similar amounts and types of data can be extracted from remote sensing imagery (RSI)
[2,26], but with the added complication of having to perform more advanced image
processing and feature extraction before yielding usable results.
Techniques relying on harvesting information from other sources like social media have to
deal with high noise-to-signal content, low accuracy, and complications caused by bringing
factors such as language semantics and ontologies into play [22]. In the case of heatmaps, the
problem lies almost entirely within the signal processing domain, allowing for a more direct
approach.
2.2 Availability and accessibility of CWF data
Initially, heatmaps were studied in terms of their availability, publishing patterns, data
semantics as well as their structural characteristics. Specifically, we addressed heatmaps
that were within the European Open-Access Chemical Weather Portal [3,14]. This preliminary
analysis led to some important conclusions regarding the state of the art in CWF
publishing. An important conclusion was that the existence of multiple CWF providers on
the internet can lead to the simultaneous existence of multiple contradictory forecasts, even
if they refer to the same geographical areas, time spans and pollutant types [11,13]. This
means that the average user wishing to consult more than one CW forecast will have to
evaluate their accuracy and reliability for themselves, by exploring different providers'
websites, and with very limited options when it comes to simultaneous display and
comparison of different CWF sources.
To date, there have been only a few efforts among data providers to standardize the
output format of their models and spatial, temporal and qualitative coverages [10,12]. In
addition, access remains mostly non-numerical. The adoption of Open Geospatial Con-
sortium standards for presenting maps and datasets, such as WMS and WPS, has also
remained very limited in the domain of CWF. The MyAir project [9] does offer a direct
data WCS (Web Coverage Service) access option [10], but this is the exception, rather than
the norm.
This has resulted in the current situation, where there is a significant quantity of new CWF
data published on-line every day in Europe, but access to this data in its original resolution and
precision is limited. In addition, there is no unified repository of regional CWF data focusing
on the collection of recently published data.
2.3 Operational premises
The platform was created around the following premises, which are based on the extensive
background research conducted (according to Subsection 2.2):
• New heatmaps representing CWF forecasts are published daily or at least fairly regularly
with a predictable pattern and at predictable web URLs, to make them worthwhile
harvesting, both for practical and for informational gain reasons.
• Heatmap formats and publishing patterns such as update hours, frequency, etc. may
change without warning. It is the platform maintainers' responsibility to keep up with
them.
• It is possible to reconvert heatmaps (or, in general, any sufficiently processed remote
sensing or synthetic image [2,26]) back into numerical data, albeit with limitations, and it
is possible to archive and post-process any data recovered by the reconversion process [7].
2.4 Heatmap characteristics
A typical example of a CWF heatmap image is presented in Fig. 1. The characteristics and
isolated elements highlighted there are also encoded in AirMerge's parsing configuration,
where each characteristic is associated with a specific keyword.
The parsing subsystem utilizes XML-based configuration files which contain descriptive
fields for all of the above elements. Some of those elements are fixed for a given provider's
heatmaps, while others indicate variable/mutable characteristics, such as the type of pollutant
used. An example of such a configuration file is given in Fig. 2, which contains the instructions
for parsing heatmaps with the structure of the one in Fig. 1.
2.5 Map region
Heatmaps contain at least one rasterized map region, indicated with the <region> tag in
AirMerge's XML parsing subsystem. Regions contain the color-coded data of interest, with a
specific raster height and width. One pixel of this raster map corresponds to one data or grid
point, though a single data point doesn't necessarily map to a unique geographic
coordinate, due to map projection considerations. The map projection is described in the
<projection> node.
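As an illustration of how such a region description might be consumed, the following sketch reads the crop rectangle from a <region> node shaped like the one in Fig. 2. It is not AirMerge's parser: the function name `read_crop_box` is hypothetical, and treating the NW and SE corners as inclusive pixel coordinates is an assumption.

```python
import xml.etree.ElementTree as ET

# Fragment shaped like the <region> node of Fig. 2 (projection and pinning omitted).
REGION_XML = """
<region name="Europe" legendcy="446">
  <cropregion>
    <NW x="98" y="110" />
    <SE x="500" y="401" />
  </cropregion>
</region>
"""


def read_crop_box(region_xml: str) -> dict:
    """Return the crop rectangle of a <region> node as plain integers."""
    region = ET.fromstring(region_xml)
    nw = region.find("cropregion/NW")
    se = region.find("cropregion/SE")
    left, top = int(nw.get("x")), int(nw.get("y"))
    right, bottom = int(se.get("x")), int(se.get("y"))
    return {
        "name": region.get("name"),
        "left": left,
        "top": top,
        "right": right,
        "bottom": bottom,
        # Assumption: both corner pixels belong to the map region.
        "width": right - left + 1,
        "height": bottom - top + 1,
    }


box = read_crop_box(REGION_XML)
print(box["name"], box["width"], box["height"])  # Europe 403 292
```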
2.6 Color legends and pollutants
Each CWF forecast provider usually publishes a series of pollutants using identical map
layouts, but different color scales and value ranges. Hierarchically, color scales are considered
a sub-feature associated with each pollutant.
Color legends, color scales or simply “legends” are color look-up tables (LUTs) or palettes
placed beside the map region, which indicate the relationships between the colors used in the
heatmap and the numerical pollutant concentration value ranges that they represent.
In the scripting subsystem, the totality of the pollutants offered by a CWF provider
are grouped under the (unique) <pollutants> XML tag, while individual pollutants
are found under the (multiple) <pollutant> XML tags, as shown in Fig. 3, which
shows the description employed by AirMerge for parsing the CO pollutant from the
heatmap in Fig. 1. The position of the color legend as well as the value ranges are
entered manually, but the actual color sampling points are determined automatically.
The order of color parsing is left-to-right for horizontal color scales, and bottom-to-top
for vertical ones.
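The relationship between legend cells and value ranges can be sketched as follows. The XML fragment is an abbreviated version of Fig. 3 (three of the eleven ranges). The source does not describe how the sampling points are determined automatically, so the even spacing used in `sample_points` is purely an assumption, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the <pollutant> node in Fig. 3 (only three ranges kept).
POLLUTANT_XML = """
<pollutant name="acid/000/CO_gas" type="CO" unit="ugr/m3" scale="1e-06">
  <scales>
    <scale>
      <autopoints xMin="116" xMax="483" />
      <ranges>
        <range min="0" max="30" />
        <range min="30" max="60" />
        <range min="300" max="1200" />
      </ranges>
    </scale>
  </scales>
</pollutant>
"""


def read_ranges(pollutant_xml: str) -> list:
    """Return the (min, max) value range of each legend cell, in parsing order."""
    pollutant = ET.fromstring(pollutant_xml)
    return [(float(r.get("min")), float(r.get("max"))) for r in pollutant.iter("range")]


def sample_points(x_min: float, x_max: float, count: int) -> list:
    """Assumed layout: one sampling point at the centre of each equally wide cell."""
    cell_width = (x_max - x_min) / count
    return [x_min + cell_width * (i + 0.5) for i in range(count)]


ranges = read_ranges(POLLUTANT_XML)
points = sample_points(116, 483, len(ranges))  # left-to-right for a horizontal scale
for index, (x, (low, high)) in enumerate(zip(points, ranges)):
    print(index, round(x, 1), low, high)
```

The index printed in the first column is the value that a classified pixel would carry, so the list of ranges doubles as the look-up table from class index back to concentration range.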
2.7 Determining map geometry and bounding box
AirMerge utilizes the following simple method for determining the relationship between
image pixels and geographical coordinates: two points $p_1 = \{x_1, y_1\}$ and
$p_2 = \{x_2, y_2\}$ are selected on the map region itself, and their geographical coordinates
$g_1 = \{\lambda_1, \varphi_1\}$, $g_2 = \{\lambda_2, \varphi_2\}$ are determined by using the
(usually present) geographical grid of the image. The two selected points should be as far
apart in terms of longitude and latitude as the map allows, in order to minimize discretization
and distortion errors. These points are specified in the <pixelpinning> and <geopinning>
tags in AirMerge's XML configuration scripts, as shown in Fig. 2. Then, the pixel/longitude
ratio $r_\lambda$ and pixel/latitude ratio $r_\varphi$ can be reconstructed with the formulas
in Eq. (1).
[Fig. 1 image: annotated heatmap. Labeled elements: title text, auxiliary text, map region,
non-data pixels, top-left boundary pixel, bottom-right boundary pixel, pixel origin, lon/lat
origin, longitude references, latitude references, color-value ranges legend, secondary heatmaps]
Fig. 1 An example of a heatmap with its important areas and structures highlighted. This particular example also
contains secondary heatmaps and text areas, which are normally not used
<provider>
  <name>FMI</name>
  <models>
    <model name="SILAM" code="5.2" />
  </models>
  <urlbase>http://silam.fmi.fi/AQ/operational/europe</urlbase>
  <regions>
    <region name="Europe" legendcy="446">
      <cropregion>
        <NW x="98" y="110" />
        <SE x="500" y="401" />
      </cropregion>
      <projections>
        <projection>
          <type>equirectangular</type>
          <use>input</use>
          <params>
            <param name="lam_0" value="0.0" />
          </params>
        </projection>
        <projection>
          <type>equirectangular</type>
          <use>output</use>
          <params>
            <param name="lam_0" value="0.0" />
          </params>
        </projection>
      </projections>
      <pixelpinning>
        <pixel x="98" y="401" />
        <pixel x="500" y="110" />
      </pixelpinning>
      <geopinning>
        <lonlat x="-25.0" y="30.0" />
        <lonlat x="45.0" y="75.0" />
      </geopinning>
    </region>
  </regions>
</provider>
Fig. 2 Part of the XML-based configuration script used by AirMerge in order to parse heatmaps with a specific
structure. In this case, instructions on how to isolate the map region from heatmaps of the type used in Fig. 1 are
given
<pollutants>
  <pollutant name="acid/000/CO_gas" type="CO" unit="ugr/m3"
             scale="1e-06">
    <scales>
      <scale>
        <autopoints xMin="116" xMax="483" />
        <ranges>
          <range min="0" max="30" />
          <range min="30" max="60" />
          <range min="60" max="90" />
          <range min="90" max="120" />
          <range min="120" max="150" />
          <range min="150" max="180" />
          <range min="180" max="210" />
          <range min="210" max="240" />
          <range min="240" max="270" />
          <range min="270" max="300" />
          <range min="300" max="1200" />
        </ranges>
      </scale>
    </scales>
  </pollutant>
  ...
</pollutants>
Fig. 3 Part of the XML-based configuration script used by AirMerge in order to parse the CO gas pollutant from
heatmaps of the type used in Fig. 1
$$r_\lambda = \frac{x_2 - x_1}{\lambda_2 - \lambda_1}, \qquad r_\varphi = \frac{y_2 - y_1}{\varphi_2 - \varphi_1} \tag{1}$$
The formulas can be applied verbatim under the following conditions:
• $p_1$ is located South-West (SW) of $p_2$
• $g_1$ and $g_2$ are both located in the northern hemisphere
($0^\circ \le \varphi_{1,2} \le 90^\circ$)
• Distances are always computed positive, moving from $p_1$ to $p_2$ in the North-East (NE)
direction. If $p_2$ is more than 180° East of $p_1$, then the large arc is considered
($0^\circ \le \lambda_{1,2} \le 360^\circ$).
After these ratios have been computed, it is trivial to determine the geographical offsets of
the map (position of the SW point) as well as its maximum geographical extension, and
therefore construct a complete, four-point geographical bounding box.
Selecting appropriate points is done manually by the AirMerge operator, though it is
possible to partially automate the process [18,27]. If no usable geographical reference grid is
supplied with the heatmap, it is still possible to use the known geographical coordinates of two
easily identifiable landmarks.
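The two-point method of this subsection can be sketched in a few lines, using the pinning values from Fig. 2. This is a minimal illustration for an equirectangular map that satisfies the conditions listed above; the function names are hypothetical, image rows are assumed to grow downward (so the SW point has the larger pixel y), and wrap-around across the 180° meridian is not handled.

```python
def pixel_ratios(p1, p2, g1, g2):
    """Pixels per degree of longitude and latitude, as in Eq. (1).

    Distances are taken as positive, following the conditions in the text.
    """
    (x1, y1), (x2, y2) = p1, p2
    (lon1, lat1), (lon2, lat2) = g1, g2
    r_lon = abs(x2 - x1) / abs(lon2 - lon1)
    r_lat = abs(y2 - y1) / abs(lat2 - lat1)
    return r_lon, r_lat


def pixel_to_lonlat(x, y, p1, g1, r_lon, r_lat):
    """Map a pixel to (lon, lat); p1/g1 is the SW reference point.

    Assumption: pixel y grows downward, so latitude decreases as y increases.
    """
    lon = g1[0] + (x - p1[0]) / r_lon
    lat = g1[1] + (p1[1] - y) / r_lat
    return lon, lat


def bounding_box(nw_pixel, se_pixel, p1, g1, r_lon, r_lat):
    """Return (west, south, east, north) for a crop rectangle given in pixels."""
    west, north = pixel_to_lonlat(*nw_pixel, p1, g1, r_lon, r_lat)
    east, south = pixel_to_lonlat(*se_pixel, p1, g1, r_lon, r_lat)
    return west, south, east, north


# Pinning points from Fig. 2: SW pixel/geo pair first, NE pair second.
P1, P2 = (98, 401), (500, 110)
G1, G2 = (-25.0, 30.0), (45.0, 75.0)

r_lon, r_lat = pixel_ratios(P1, P2, G1, G2)
print(round(r_lon, 3), round(r_lat, 3))  # 5.743 6.467
print(bounding_box((98, 110), (500, 401), P1, G1, r_lon, r_lat))
```

In this configuration the pinning points coincide with the corners of the crop region, so the bounding box is, up to floating-point rounding, the rectangle from 25°W, 30°N to 45°E, 75°N.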
2.8 Non-data elements
Most heatmaps contain visual elements that do not represent pollutant
concentrations, but instead mark boundaries, form longitude/latitude reference lines, signify
land-water interfaces, indicate urban areas or simply represent geographical areas which, even if
physically present on the map, are not part of the model's output. Such areas are called “void
areas”, and can for example be seen as the white, uncovered map areas in Fig. 4.
Fig. 4 An example of a heatmap with both numerous geomarkers and extended void regions
Some of those elements may be useful during configuration phases (for determining the
bounding box and map projection parameters, for instance) but in general they are undesirable
in processed data, and AirMerge uses several techniques to minimize their presence in the data
it stores in its database.
AirMerge automatically classifies any pixel with a color not among those described
in the legend as a non-data element, and considers its position as noise in the data,
forming a gap. Data gaps formed by boundary lines and geomarkers are usually thin
(one or two pixels), and are dealt with by using simple interpolation and noise-
removal algorithms [7], resulting in seamless, continuous images from which to
recover data.
2.9 Handling of borderline cases
Ideally, heatmaps should contain only the colors appearing in their associated legends,
and any extraneous color should clearly indicate a geomarker element to remove. Also,
geomarkers and void/uncovered geographical areas in the map should use different
colors than those used in areas containing valid data, and the heatmaps themselves
should only be delivered in lossless image formats. However, in practice the following
problems do arise:
• Though none of the CWF providers represented in AirMerge delivers their
heatmaps in a lossy image format (e.g., JPEG), some heatmaps show signs of
having been subjected to a lossy process at some point. This may create unwanted noise,
visible patterns and color artifacts which, if treated indiscriminately like noise, would result
in excessive data loss.
• Colors that differ slightly from those defined in the legend may appear in the map region,
or there might be more shades and hues than those implied by the legend.
• The legends themselves may contain noise or off-key colors which may differ slightly from
those appearing in the map region, making their use as absolute color references
problematic.
• Usually, geomarkers and void areas use different colors, and it is easy to distinguish
between the two. However, sometimes the same color is used for both, making it
impossible to distinguish them based on color/hue alone.
To counter these occurrences, the parsing subsystem has a built-in configurable tolerance
factor when parsing colors. It also allows gap filling to be turned on and off, the colors to be
treated as void to be specified, and a special gap filling mode to be used which takes into
account the existence of ambiguous noise and void in the same heatmap. An example of how
this subsystem is configured in AirMerge's XML-based configuration script, in the
<colorscalespecs> node, is given in Fig. 5.
2.10 The AirMerge system
AirMerge has been designed and implemented to process heatmaps by adopting a
results-oriented approach. The AirMerge system was developed to include the following
components, visually illustrated in Fig. 6:
• A script-driven CWF heatmap fetching sub-system, which can be configured to fetch all
heatmaps from a given set of CWF providers. This subsystem makes use of tags to
describe all heatmap features that are required for fetching, processing and archiving.
• A scheduler subsystem, which initiates the fetching scripts daily at prefixed intervals and
also handles networking errors such as connection failures, missing resources and script
execution failures by the fetcher subsystem, notifying the system's administrator in case
non-automatic intervention is necessary.
• A heatmap-to-data conversion subsystem, which performs all the necessary image to data
conversions, map projection transformations and image cleanup.
• A database back-end, which stores both the raw and processed data from the heatmap
processing subsystem, according to a schema which allows searching and retrieval by
several fine-grained criteria.
[Fig. 6 diagram. Components: config script, fetcher, scheduler, heatmap processing, database,
database connector, external API, visualization module. External actors and data: CWF
providers (fetching from CWF providers), processed heatmaps, users, third-party services]
Fig. 6 Structure of the AirMerge platform
<colorscalespecs>
  <colorspace>
    <type>rgb</type>
    <tolerant>true</tolerant>
    <params>
      <param name="tolerance" value="10" />
    </params>
  </colorspace>
  <stretch>0</stretch>
  <cleanup>true</cleanup>
</colorscalespecs>
Fig. 5 Part of the XML-based configuration script used by AirMerge in order to configure the color scale and
map region color-based parsing subsystem
• A RESTful API, which allows accessing the data stored in the database using simpler
commands than accessing the database directly.
• Several post-processing modules designed to operate on processed data, either on a
particular coordinate or on an area. These modules offer various statistical and
geoprocessing functions such as computing concentration value averages, producing
ensemble (composite) forecasts or comparing and combining with observations, even
though the intention is to enable external services to implement such functionality through
the use of the AirMerge APIs.
• Third party extensions or special linkage modules which allow accessing those modules
through a simplified interface, such as the ones used for interconnecting with the
PESCaDO project's framework [6].
• A visualization module offering direct in-browser user interaction.
All items except the post-processing modules and third-party extensions form part of
AirMerge's core functionality, and are designed to be as generic and data-agnostic as possible,
thus being applicable to any heatmap image processing task. By using all of these
functionalities together, AirMerge constitutes an environmental data collection repository
that can be extended and used for the creation of third-party services, which can then extract
environmental knowledge from AirMerge's processed data.
2.11 CWF gathering workflow and sub-system
Since collecting and processing CWF heatmaps is the primary goal of AirMerge, the first step
in its workflow is to gather the heatmaps themselves. In order for a particular provider's
heatmaps to be successfully parsed and classified by AirMerge, their URLs must follow a
regular pattern, with a fixed base form and variable parts in their names which should be
indicative of time, pollutant and other relevant parameters. In other words, it is a necessary
precondition that the heatmap URLs themselves carry clearly structured classification
metadata. Being able to uniquely identify heatmaps and infer some of their variable aspects via
their URL patterns is the key to AirMerge's functionality. An example of a URL and its structure
can be seen in Fig. 7.
The URL pattern, as well as its constituent elements, is defined in the XML code snippet
shown in Fig. 8. The URL's structure is encoded in the <formatString> node, while its
variable parts such as the pollutant, elevation layer etc. are indicated by tokens surrounded by
hashes. Some tokens have different names from the XML nodes they are iterated over, e.g.,
#prognosis# takes values from the <forecasts> node, while others are more
self-explanatory.
http://silam.fmi.fi/AQ/operational/europe/acid/000/CO_gas_srf_009.png
URL base form: http://silam.fmi.fi/AQ/operational/europe
Pollutant: acid/000/CO_gas
Elevation: srf
Hour of day: 009
Fig. 7 An example of the structural parts of a heatmap URL
It has been shown that automating the extraction of some of a heatmap's metadata and
structural information is possible [6,27] by using OCR and text/image processing techniques,
but the efficacy of such techniques is limited by the fact that individual heatmaps do not
always contain all of the necessary information, and operating on individual heatmaps makes it
hard to detect the existence of generalized/common schemas between a series of similar
heatmaps produced by the same CWF provider.
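The token substitution described in this subsection can be sketched as follows. This is an illustration only, not AirMerge's URL sequencer: the function name `expand_urls` is hypothetical, the template is written with a leading hash on the urlbase token as an assumption, and the iteration order is simply that of `itertools.product` rather than the order given by the <priority> node, whose exact semantics are not described here.

```python
from itertools import product

# Assumed template form: every token, including urlbase, is wrapped in hashes.
TEMPLATE = "#urlbase#/#pollutant#_#layer#_#prognosis#.png"


def expand_urls(template, urlbase, pollutants, layers, prognoses):
    """Build one URL for every combination of pollutant, layer and prognosis."""
    urls = []
    for pollutant, layer, prognosis in product(pollutants, layers, prognoses):
        url = (
            template.replace("#urlbase#", urlbase)
            .replace("#pollutant#", pollutant)
            .replace("#layer#", layer)
            .replace("#prognosis#", prognosis)
        )
        urls.append(url)
    return urls


urls = expand_urls(
    TEMPLATE,
    urlbase="http://silam.fmi.fi/AQ/operational/europe",
    pollutants=["acid/000/CO_gas"],
    layers=["srf", "500m"],
    prognoses=["000", "009"],
)
for url in urls:
    print(url)
```

With these inputs the list contains four URLs, one of which is the example URL of Fig. 7.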
2.12 Image processing module
After images have been fetched, they are transferred to the image processing module, whose
task is to convert raw bitmap data into numerical data, by taking into account each heatmap's
format and characteristics. The map region portion of the heatmap is cropped, and each of its
pixels is scanned individually. Depending on its color and on how closely it matches one of the
colors already present in the legend, it is assigned to a specific classification bin, according to
the following (simplified) pseudocode:
Inputs:
• an [m × n] image I containing RGB color triplets
• a color legend C containing k unique RGB color triplets
Outputs:
• an [m × n] array Q containing integer values.

for all pixels p ∈ I
    for all colors c ∈ C
        if p ≈ c then Q[p] = index(C, c)
    end for
    if p ∉ C then Q[p] = gap_marker
    if isTransparent(C, p) then Q[p] = void_marker
end for
<formatString>urlbase#/#pollutant#_#layer#_#prognosis#.png
</formatString>
<layers>
  <layer name="srf" meters="0" />
  <layer name="500m" meters="500" />
  <layer name="1000m" meters="1000" />
  <layer name="3000m" meters="3000" />
</layers>
<stats>
  <!-- This URL encodes no stat, so a dummy one is defined -->
  <stat name="dummy" type="AvgC1" />
</stats>
<forecasts>
  <forecast value="000" unit="hour" hours="0" />
  ...
  <forecast value="096" unit="hour" hours="96" />
</forecasts>
<priority>
  <order>prognosis</order>
  <order>layer</order>
  <order>pollutant</order>
  <order>urlbase</order>
</priority>
<dateFormatString>yy/MM/dd</dateFormatString>
Fig. 8 Part of the XML-based configuration script used by AirMerge in order to configure the URL sequencer
where index(C,c) is an integer-valued function which returns the zero-based index of a color
c ∈ C. Pixels that cannot be classified as one of the colors existing in the legend C
are assigned the special gap_marker value, which means that they are considered as invalid
data/undesirable noise. An exception to that is if they meet the criteria of the Boolean-valued
function isTransparent(C,p), which determines whether a pixel is to be classified as
transparent or “void”, according to the setup of the color legend C. Those “void” pixels are not
considered as either valid or invalid data and will be ignored during any successive
computations.
The classification condition, indicated with “≈” (almost equal), represents the fact
that color classification is performed by using a tolerance threshold function. Most heatmaps
are parsed by using the three-dimensional RGB (Red, Green, Blue) color space for
classification. In this color space, two colors $c_1 = \{c_{1,r}, c_{1,g}, c_{1,b}\}$ and
$c_2 = \{c_{2,r}, c_{2,g}, c_{2,b}\}$ are considered equal for the purpose of classification when
their Euclidean distance $d_{rgb}$ is less than a set threshold $\varepsilon$:
$$d_{rgb}(c_1, c_2) = \lVert c_1 - c_2 \rVert = \sqrt{(c_{1,r} - c_{2,r})^2 + (c_{1,g} - c_{2,g})^2 + (c_{1,b} - c_{2,b})^2} \tag{2}$$
Usually this threshold ε is set to 10, which corresponds to a sphere of radius 10 in the RGB
color space (assuming 8 bits, or an intensity range of [0,255], per color channel). This proved
to be a good all-around empirical value after extensive testing. Using a threshold rather than
an exact match adds robustness against off-color and noisy graphics. It is also possible to use
alternate color spaces such as HSV (Hue, Saturation, Value), which make it easier to remove
certain kinds of semi-transparent geomarkers without further data losses, but make it more
difficult to discern between certain hues, so using them depends on the characteristics of the
heatmaps to process.
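A minimal Python sketch of the classification step with an RGB tolerance threshold is given below. It follows the pseudocode and Eq. (2) in spirit but is not AirMerge's implementation: the marker values, the function names and the explicit list of void colors (standing in for isTransparent) are assumptions, and when several legend colors fall within the threshold the nearest one is chosen, which the source does not specify.

```python
import math

GAP_MARKER = -1   # assumed marker values; the source only names the markers
VOID_MARKER = -2


def classify_pixel(pixel, legend, void_colors=(), tolerance=10.0):
    """Return the legend index of an RGB pixel, or a gap/void marker."""
    # Stand-in for isTransparent(C, p): void colors take precedence.
    for void in void_colors:
        if math.dist(pixel, void) < tolerance:
            return VOID_MARKER
    best_index, best_distance = GAP_MARKER, tolerance
    for index, color in enumerate(legend):
        distance = math.dist(pixel, color)  # Euclidean distance, Eq. (2)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


def classify_image(image, legend, void_colors=(), tolerance=10.0):
    """Classify every pixel of an image given as rows of RGB tuples."""
    return [
        [classify_pixel(p, legend, void_colors, tolerance) for p in row]
        for row in image
    ]


legend = [(0, 0, 255), (0, 255, 0), (255, 0, 0)]
image = [
    [(0, 0, 250), (3, 252, 4), (0, 0, 0)],
    [(255, 255, 255), (250, 5, 5), (128, 128, 128)],
]
indexed = classify_image(image, legend, void_colors=[(255, 255, 255)])
print(indexed)  # [[0, 1, -1], [-2, 2, -1]]
```

The black and grey pixels match no legend color within the threshold and become gaps, while the white pixel is treated as void.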
2.13 Cleanup module
Before being stored in the database or subjected to further processing, the indexed data
generated by the image processing step undergoes a gap-filling or cleanup procedure.
Its goal is to substitute all pixels marked as “gaps” with valid values taken from the legend C.
The algorithms used to do this have been detailed in [8]. The quality of the data recoverable
from heatmaps at the end of the cleanup procedure, compared to the original model data
before web publishing by data providers, has been explored in [7].
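To make the idea of gap filling concrete, the following Python sketch replaces each gap with the value of the closest valid pixel. This is not the method of [8]; it is a deliberately simple nearest-neighbour stand-in, and the `GAP` and `VOID` markers as well as the function name are assumptions made for illustration.

```python
from collections import deque

GAP = -1   # assumed marker: pixel that matched no legend color
VOID = -2  # assumed marker: transparent pixel, ignored entirely


def fill_gaps(grid):
    """Replace every GAP cell with the index of the closest valid cell.

    Closeness is measured in 4-connected steps using a multi-source
    breadth-first search. VOID cells are never filled, and the search
    does not pass through them.
    """
    rows, cols = len(grid), len(grid[0])
    result = [row[:] for row in grid]
    # Start the search from every valid (non-negative) cell at once.
    queue = deque(
        (r, c) for r in range(rows) for c in range(cols) if grid[r][c] >= 0
    )
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and result[nr][nc] == GAP:
                result[nr][nc] = result[r][c]
                queue.append((nr, nc))
    return result
```

Because the fill only ever copies existing legend indices, the output stays within the set of values defined by the legend C, which is the property the text requires.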
2.14 Database persistence module
After parsing and gap filling, the now cleaned-up retrieved forecast data is stored in the form of
an indexed image, along with region, projection and color scale information which are derived
from the scripting-configuring system, according to the UML database schema shown in
Fig. 9.
This schema associates each harvested forecast with a unique “layerimage” entity, which is
the top entity in AirMerge's database hierarchy, while other associated information such as
geographical region, legend, pollutant, etc. can be reused and shared between multiple
layerimage entities through the “regionlayer” entity.
As an example, the regionlayer entity models all possible parameter combinations and
variants of a forecast that are available through a provider. These variants can be
numerous, but they are always finite in number and eventually repeat, so they are not
unique. The layerimage entities, on the other hand, always have unique timestamps, as
they represent actual unique instances of a forecast issued at a particular moment in time.
Since the schema exposes geographical coverage information for each regionlayer entity, it
is possible to select multiple layerimage entities covering the same or overlapping geographical
areas. It is also possible to filter layerimages by time coverage and by recentness/relevance
criteria. For example, for a given time of day, a forecast issued on that same day will contain
more reliable and up-to-date information than one issued 24 or 48 h before, even if it nominally
refers to the same time and day (the “represented time” attribute); hence, the most recent
forecast will also be the most relevant. This way, cases of multiple forecasts covering the
same time and space can be resolved.
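The recentness rule can be sketched in Python as follows. The attribute names are taken from the layerimage entity in Fig. 9, but their exact semantics, the `LayerImage` class and the `most_relevant` function are assumptions made for this illustration rather than AirMerge's actual query logic.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LayerImage:
    """Subset of the layerimage attributes used in this illustration."""
    imageid: int
    regionlayerid: int
    datepublished: datetime
    daterepresentstart: datetime
    daterepresentend: datetime


def most_relevant(images, moment):
    """For each regionlayer, keep the latest-published image covering `moment`."""
    best = {}
    for image in images:
        # Skip forecasts whose represented time span does not include the moment.
        if not (image.daterepresentstart <= moment <= image.daterepresentend):
            continue
        current = best.get(image.regionlayerid)
        if current is None or image.datepublished > current.datepublished:
            best[image.regionlayerid] = image
    return best
```

Grouping by `regionlayerid` keeps one winner per forecast variant, so forecasts from different providers covering the same area remain available for later ensembling.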
2.15 Geographical transformation modules
To avoid unnecessary data loss, AirMerge stores all forecasts in their original raster resolution
and map projection, without any permanent alterations. However, it is often necessary to
transform a forecast from one type of geographical projection to another, for example during a
computation or a visualization in a map projection other than its native one. For this reason,
AirMerge contains a variety of map projection and transformation modules, which can be
chained together to form a complete deprojection and reprojection workflow for any stored
forecast.
Fig. 9 An exemplified version of AirMerge's database schema, indicating how forecasts are stored and how
their metadata and auxiliary information can be retrieved. Primary keys are indicated with the PK note, foreign
keys with the FKn note. Entities and attributes shown in the figure:
layerimage: imageid (PK, FK2), dateretrieved, originalurl, lstmodifiedheader, originalmd5, etagheader, datepublished, daterepresentend, daterepresentstart, regionlayerid (FK1)
imagedata: imageid (PK), imagedata
regionlayer: regionlayerid (PK), elevation, offsethours, legendid (FK1), pollutantid (FK2), predictiontypeid (FK3), regionid (FK4)
region: regionid (PK), deprojectstring, height, name, width, south, north, east, west, providerid (FK1)
legend: legendid (PK), name, providerid (FK2), legendrowid (FK1)
legendrow: legendrowid (PK), index, color, lowmark, highmark, legendid (FK1)
pollutant: pollutantid (PK), name, fullname, units, categoryid (FK1)
pollutantcategory: categoryid (PK), description, name
predictiontype: predictiontypeid (PK), description, hours, name
provider: providerid (PK), groupname, selng, swlng, nelng, url, nelat, selat, modelname, nwlng, description, name, nwlat, swlat
providerfetcher: fetcherid (PK), isactive, xmlparsingspec, dailystarttime, refetchinterval, lastexecution, dailyavailabilitytime, providerid (FK1)

In general, a forecast is associated with up to four different projection rules:
• Input Pixel Projection: a projection from the forecast's raster space (pixels) to the
associated transformed (linear) coordinate space.
• Input Geographical Projection: a projection from the transformed coordinate space to
a common geographical coordinate space.
• Output Geographical Projection: a projection from the forecast's geographical coordinate
space to a transformed linear coordinate space.
• Output Pixel Projection: a projection from the forecast's transformed coordinate space
into an image's or data array's raster space (pixels).
This apparent redundancy and complexity is required because there are three coordinate
spaces to consider:
• The m,n coordinate space of the forecast's raster itself: $(m,n) \in \mathbb{N}^2$, also called
“pixel space”.
• The transformed x,y coordinate space of the forecast: $(x,y) \in \mathbb{R}^2$, also called “linear space”.
• The geographical λ,φ (longitude, latitude) coordinate space of the forecast:
$(\lambda,\varphi) \in [-\pi,\pi]^2$, also called “geo space”.
The linear space is an intermediate 2D real-valued space which appears when using non-linear
projections, e.g., conical, polar stereographic or spherical [24]. Its relationship with the
pixel space will generally be linear (hence the name) and straightforward, but its relationship
with actual geographical coordinates can be quite complex. An example of such a space is the
UTM coordinate system, whose coordinates are expressed in km from a set point of origin [23]
and cannot be converted to latitude and longitude through a simple linear relationship.
In practice, pixel and geo projections are used in pairs, masking the existence of the
transformed (linear) space, which therefore remains hidden and is not used directly in
computations or queries. As a result, all computations and queries deal only with pixels and
absolute geographical coordinates.
In Fig. 10, the typical transformation workflow is illustrated. During input, pixel or raster
data is deprojected to the geographical coordinate space. The input/deprojection function is
actually performed in two steps, but as far as AirMerge is concerned, the net result is a
conversion from pixels to geographical coordinates. This allows queries about single geographical
locations or areas to be expressed in intuitive, broadly supported and understood
geographical coordinates, while the deprojection and reprojection mechanisms take care of the
complex transformations required. In Fig. 11, an example of a cleaned-up and projection-corrected
heatmap is shown.
Each type of projection function used in AirMerge is also implemented to be fully
reversible, so it is also possible to convert geographical coordinates back into an array of
pixels or other types of rasterized data. This array may be of the same type and size as the
original input (in which case there would be a closed-loop processing), or of a different type
(for example, visualizing several different types of input heatmaps in a common map
projection).
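The pairing of a pixel projection with a geo projection, and the reversibility of the chain, can be sketched in Python as follows. The class and function names are hypothetical, m is assumed to be the column index and n the row index, and a spherical Mercator projection on a unit sphere stands in for whichever non-linear projection a provider actually uses. This is an illustration under those assumptions, not AirMerge's own implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelProjection:
    """Affine map between pixel indices (m, n) and linear coordinates (x, y)."""
    x0: float  # x of the left edge of column 0
    y0: float  # y of the top edge of row 0
    dx: float  # width of one pixel in linear units
    dy: float  # height of one pixel in linear units

    def to_linear(self, m, n):
        # Use pixel centers; rows grow downward while y grows upward.
        return self.x0 + (m + 0.5) * self.dx, self.y0 - (n + 0.5) * self.dy

    def to_pixel(self, x, y):
        return (math.floor((x - self.x0) / self.dx),
                math.floor((self.y0 - y) / self.dy))


class MercatorProjection:
    """Spherical Mercator on a unit sphere (angles in radians)."""

    @staticmethod
    def to_geo(x, y):
        return x, 2.0 * math.atan(math.exp(y)) - math.pi / 2.0

    @staticmethod
    def to_linear(lam, phi):
        return lam, math.log(math.tan(math.pi / 4.0 + phi / 2.0))


def deproject(pixel_proj, geo_proj, m, n):
    """Pixel space -> linear space -> geo space."""
    return geo_proj.to_geo(*pixel_proj.to_linear(m, n))


def reproject(pixel_proj, geo_proj, lam, phi):
    """Geo space -> linear space -> pixel space."""
    return pixel_proj.to_pixel(*geo_proj.to_linear(lam, phi))
```

Callers of `deproject` and `reproject` only ever see pixels and geographical coordinates; the linear space stays internal to the pair. Using a different `PixelProjection` or geo projection in `reproject` than in `deproject` gives the case where several inputs are drawn in one common map projection.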
The implementation of map projections in AirMerge is done by custom code, which
allowed for a more lightweight codebase, direct control and less dependence on external
frameworks. Nevertheless, it is still possible to use adapters to external map transformation
engines, such as the one in GeoTools [20], in order to expedite the integration of new map
projections.
2.16 Forecast ensembling module
In addition to retrieving, parsing and storing heatmap data, AirMerge can also perform
several mathematical and statistical operations on groups of two or more layers, retrieved
from different forecasts. Those operations are very common in the domain of ensemble
forecasting [21], and find applications in forecast modeling refinement and big data
processing.
In order to perform such operations, multiple heterogeneous layers must be translated to a
common coordinate system and reference grid. This ability of AirMerge is displayed in
Fig. 12, where three forecasts from different providers are averaged into one composite
forecast, and differences in scale and resolution are also reconciled. AirMerge can perform
these operations either on a point-to-point basis (for a specific geographical coordinate),
or on a geographically bounded area basis.
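Area-based averaging on a common reference grid can be sketched in Python as follows. It assumes that each layer has already been deprojected to a regular longitude/latitude raster and uses nearest-neighbour sampling. The `Layer` class and `ensemble_mean` function are hypothetical names, and plain averaging is only one of the ensemble operations described in [21].

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Layer:
    """A regular lon/lat raster standing in for a deprojected forecast."""
    west: float
    south: float
    east: float
    north: float
    values: list  # rows ordered north to south; None marks missing data

    def sample(self, lon, lat):
        """Nearest-neighbour lookup; returns None outside the covered area."""
        if not (self.west <= lon < self.east and self.south < lat <= self.north):
            return None
        rows, cols = len(self.values), len(self.values[0])
        col = int((lon - self.west) / (self.east - self.west) * cols)
        row = int((self.north - lat) / (self.north - self.south) * rows)
        # Guard against floating-point rounding at the far edges.
        return self.values[min(row, rows - 1)][min(col, cols - 1)]


def ensemble_mean(layers, west, south, east, north, rows, cols):
    """Average all layers on a common rows x cols grid over the given box."""
    grid = []
    for r in range(rows):
        lat = north - (r + 0.5) * (north - south) / rows
        line = []
        for c in range(cols):
            lon = west + (c + 0.5) * (east - west) / cols
            samples = [layer.sample(lon, lat) for layer in layers]
            samples = [v for v in samples if v is not None]
            line.append(fmean(samples) if samples else None)
        grid.append(line)
    return grid
```

Sampling every layer at the cell centers of the target grid is what reconciles the differences in resolution: a coarse layer simply contributes the same value to several target cells, and a layer that does not cover a cell is left out of that cell's average.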
Fig. 10 Representation of de- and re-projection workflow. Deprojection: Image → Pixel (m,n) → Linear (x,y) →
Geo (λ,φ) → Data array, through a Pixel Projection followed by a Geo Projection. Processing operates on the data
array. Reprojection: Geo (λ,φ) → Linear (x,y) → Pixel (m,n) → Output, through a Geo Projection followed by a
Pixel Projection
Fig. 11 Cleanup and reprojection of a heatmap using a conical map projection
2.17 AirMerge public API
In order to make AirMerge's harvested heatmaps and data available to third-party services and
researchers, a public API was designed, available as a RESTful web service that responds to
HTTP GET methods [4]. Currently the API is in a testing phase, and is unauthenticated and
publicly accessible.
Its original purpose was to allow interfacing with the PESCaDO node orchestration service,
allowing it to request chemical weather data for specific geographical coordinates (point
queries) from one or multiple layers, and to composite multiple source layers into a single
result according to the principles of ensemble forecasting [21].
2.18 AirMerge visualization module
The visualization module is not considered an essential AirMerge component, as the system can
continue functioning even without it. It is, however, a convenient way of accessing the currently
stored layers of CWF data and visualizing them in a harmonized way, on a common Google
Maps background. This module was used during development and is currently not actively
maintained, as its replacement with an OGC WMS-based solution is scheduled in the future.
2.19 Use of AirMerge in the PESCaDO project
The PESCaDO [28] service system has been developed with a purpose closely related to that
of AirMerge: it is oriented towards discovering new environmental data sources on
the Web and integrating them in a centralized repository.
In contrast to AirMerge, however, emphasis has been placed on the automatic discovery,
retrieval and classification of informational nodes, including elements of Machine Learning
and ontological data organization, while allowing for extensions through auxiliary external
functionality. In this context, AirMerge has been interfaced to PESCaDO in two distinct
roles: as an environmental data node, and as a provider of forecast ensembling and fusion.
Fig. 12 Heatmap retrieval (A), cleanup (B), reprojection and combination (C) (ensembling) workflow
In PESCaDO, the concept of environmental data node encompasses every kind of usable
online data source, adopting a philosophy similar to AirMerge, but more focused on sources
broader than heatmaps (i.e., websites providing weather, air quality and pollen forecasts and
historical data).
Normally, PESCaDO includes data discovering and fetching mechanisms designed to
operate on readable text contents, data feeds, air quality bulletins, and, more in general, on
textual web resources and websites, inferring context and contents by the use of semantic-
ontological text analysis techniques.
However, in the early design phases of the PESCaDO system, it was realized that important
environmental information is also contained in non-textual data (e.g., heatmaps), and that
having access to AirMerge's mechanisms and its already-harvested data would therefore be
advantageous.
For non-textual sources such as heatmaps, PESCaDO can make use of specialized
connectors that allow it either to access pre-organized external databases or to call specialized
computation and processing modules, thus adding non-textual resources to its knowledge
database.
AirMerge has found use within the PESCaDO project in three separate ways. First, access
to its already harvested data was granted through the AirMerge API, whose initial develop-
ment was stimulated and shaped precisely by the needs of PESCaDO. Using the AirMerge
API, the PESCaDO service was capable of performing point-based queries all over Europe,
receiving precise numerical responses.
Second, AirMerge performs a type of localized, point-based ensembling [21] when a
particular geographic location is covered by more than one CWF provider, and also
produces an uncertainty metric for the final reported result. This allows PESCaDO not only
to supply its users with numerical, rather than qualitative, information, but also to provide an
estimate of the underlying data's reliability and precision. This is achieved by keeping the
AirMerge platform running as usual, with the PESCaDO platform performing its queries
remotely through the API, without any implementation details of the two platforms being
exposed [5].
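A point-based ensemble with an accompanying uncertainty estimate can be sketched in Python as follows. The text does not specify which uncertainty metric AirMerge reports, so the sample standard deviation used here is an assumption, as is the function name `point_ensemble`.

```python
from statistics import fmean, stdev


def point_ensemble(values):
    """Combine per-provider values at one location into (mean, spread, count).

    `values` holds one entry per provider, with None where a provider does
    not cover the location. The spread is the sample standard deviation and
    is None when fewer than two providers contribute.
    """
    valid = [v for v in values if v is not None]
    if not valid:
        return None, None, 0
    mean = fmean(valid)
    spread = stdev(valid) if len(valid) > 1 else None
    return mean, spread, len(valid)
```

Returning the number of contributing providers alongside the spread lets a client distinguish a tight agreement among many forecasts from a single, unconfirmed value.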
A third and more direct involvement of AirMerge in the PESCaDO platform was achieved
through the almost direct reuse of AirMerge's heatmap parsing component by PESCaDO. This
component can be used autonomously, and even off-line, provided a suitable configuration
script is supplied, describing how to parse a specific heatmap. The configuration script can be
generated either manually or automatically using a dedicated annotation tool [17], and is
similar to the examples already shown.
The component utilized by PESCaDO is a stripped-down version of the parsing subsystem.
It utilizes the same configuration scripting as the full AirMerge system, minus features such as
the URL sequencer and harvester, support for provider-specific configurations and multiple map
regions, and in general any feature meant to process sequences of similar heatmaps.
The scripting language is instead reduced to describing how to parse a single specific heatmap,
rather than an entire class of heatmaps bound by some common characteristics.
While this may seem restrictive at first, it is actually a customization for fitting with
PESCaDO's semi-automatic heatmap configuration subsystem, which attempts to automatically
determine the characteristics of a heatmap, such as resolution, geographical bounding, color
scale and values, based on OCR and text processing techniques. This information is then
validated and further edited by an administrative user with the aid of a dedicated Annotation
Tool, and is finally used to generate an AirMerge heatmap parsing XML script,
which is fed to the AirMerge parsing component according to the scheme in Fig. 13, where
the AirMerge component is indicated as the “Heatmap Processing” block [17].
In the future, this same subsystem can be made publicly available via WPS, thus eliminat-
ing the need to provide an implementation library to use in situ.
2.20 Conclusions and future developments
In order to design an information system that uses heatmaps as its input and produces high
quality environmental information as its output, precise knowledge of the heatmaps' structure
and characteristics is required, as well as a streamlined and coordinated process for
data retrieval, handling, information extraction and system operation.
AirMerge was designed from the ground up according to this knowledge, adopting a
bottom-up approach and following a results-oriented development strategy, in order to deal
with any sub-problems encountered when treating heatmaps. The AirMerge system is not meant
to be static: it evolves and is upgraded along with changes in the CWF publishing scene
where it is currently applied. New providers, heatmap formats and map projections are being
added as necessary, while changes in current providers' publishing patterns are being followed.
AirMerge focuses on daily updates starting from published heatmaps, rather than one-off
exchanges of historical model or station data, though AirMerge can also function as a historical
repository. AirMerge can be considered as filling a niche between long-term historical and
statistical presentation of regional air quality data, and short term CWF, allowing for its
database to grow as new CWF are published.
AirMerge has been evaluated in the roles of a CWF data repository and of a supplier of
specialized chemical weather processing services, both as a standalone research tool and as
a component of a third-party value-added service (PESCaDO).
To make AirMerge more interoperable and more readily accessible to third parties, as well as
more readily usable as a base for building CWF-related services [28], the implementation of
Open Geospatial Consortium (OGC) standards is being considered, to work alongside or even
entirely supersede the custom AirMerge API for most tasks.
In particular, visualization of harvested heatmaps could be performed through the OGC
Web Map Service, while downloadable numerical data could be supplied through on-the-fly
generation of NetCDF files or other suitable formats by an OGC Web Coverage Service. In
addition, certain extra processing functions offered through the AirMerge API could be better
exposed as OGC WPS (Web Processing Service) processlets.
In general, future efforts will be directed at making AirMerge more accessible to third
parties through the use of well-established GIS standards, rather than providing custom access
and visualization interfaces, in order to turn AirMerge into an attractive, standards-compliant
and solid foundation for the development of CWF-related web services.
Fig. 13 Overall heatmap content distillation architecture in PESCaDO
Acknowledgments AirMerge was developed in the frame of COST Action ES0602, and was financially
supported by the FMI during the years 2010-2012 and co-funded by the PESCaDO project during
2012-2013. This publication was supported by the “IKY Fellowships of Excellence for Postgraduate
Studies in Greece - Siemens Program” at the time of writing.
References
1. Aalto A. (2012) Scalability of Complex Event Processing as a part of a distributed Enterprise Service Bus.
[Internet]. Espoo, Finland [cited 2014 Jul 31]. Available from: http://www.cleen.fi/en/SitePages/Public%
20deliverables.aspx?fileId=780&webpartid=g_4859b5f8_884d_4432_8aab_2e4c3e4f17dd
2. Armenakis C, Savopol F (2014) Image processing and GIS tools for feature and change extraction. In: Proc.
of the XXth ISPRS Congress. Istanbul, p. 605-610
3. Balk T, Kukkonen J, Karatzas K, Bassoukos A, Epitropou V (2011) A European open access chemical
weather forecasting portal. Atmos Environ 45:6917-6922
4. Bassoukos A (2013) AirMerge Remote API Overview. [Internet]. [cited 2014 Jul 31]. Available from: https://
docs.google.com/document/d/10z_B-Vxd1YJbKADVdsM30OuSoXszpfyRnuVNf4-qUio/edit?usp=sharing
5. Epitropou V, Johansson L, Karatzas K, Bassoukos A, Karppinen A, Kukkonen J, Haakana M. (2012) Fusion
Of Environmental Information For The Delivery Of Orchestrated Services For The Atmospheric
Environment In The PESCaDO Project. [Internet]. Leipzig, Germany [cited 2014 Aug 4]. Available from:
http://www.iemss.org/sites/iemss2012//proceedings/D2_1012_Johansson_et_al.pdf
6. Epitropou V, Karatzas K, Karppinen A, Kukkonen J, Bassoukos A (2012) Orchestration services for
chemical weather forecasting models in the frame of the PESCaDO project. In: 8th International
Conference on Air Quality - Science and Application; Athens, 19-23
7. Epitropou V, Karatzas K, Kukkonen J, Vira J (2012) Evaluation of the accuracy of an inverse image-based
reconstruction method for chemical weather data. Int J Artif Intell 9(12):152-171
8. Epitropou V, Karatzas K, Bassoukos A (2010) A method for the inverse reconstruction of environmental data
applicable at the Chemical Weather portal. In: Proceedings of the GI-Forum Symposium and exhibit on
applied Geoinformatics; p. 58-68
9. European Earth Observation Programme (2012) PASODOBLE project. MyAir PASODOBLE project
homepage. [Internet]. [cited 2014 Aug 4]. Available from: http://www.myair.eu/airsheds/
10. European Environment Agency (2014) AirBase - The European air quality database. [Internet]. [cited 2014
Aug 4]. Available from: http://www.eea.europa.eu/data-and-maps/data/airbase-the-european-air-quality-
database-7
11. Galmarini S, Kioutsoukis I, Solazzo E (2013) E pluribus unum: ensemble air quality predictions. Atmos
Chem Phys 29:7153-7182
12. Horálek J, Tarrasón L, de Smet P, Malherbe L, Schneider P, Ung A, Corbet L, Denby B (2013) Evaluation of
copernicus MACC-II ensemble products in the ETC./ACM spatial air quality mapping. Technical Paper
2013/9. European Topic Centre on Air Pollution and Climate Change Mitigation
13. Karatzas K, Kukkonen J (2009) COST Action ES0602: Quality of life information services towards a
sustainable society for the atmospheric environment. Sofia Publishers, Thessaloniki
14. Karatzas K, Kukkonen J, Bassoukos A, Epitropou V, Balk T (2011) A European chemical weather
forecasting portal. In: Steyn GD, Trini SC (eds). Air pollution modeling and its applications XXI.
Springer, NATO Science for Peace and Security Series C: Environmental Security: p. 239-243
15. Khan FH, Javed MY, Bashir S, Khan A, Sikandar M, Khiyal H (2010) QoS based dynamic web services
composition & execution. Int J Comput Sci Inf Secur
16. Kukkonen J, Olsson T, Schultz DM, Baklanov A, Klein T, Miranda AI, Monteiro A, Hirtl M, Tarvainen V,
Boy M et al (2012) A review of operational, regional-scale, chemical weather forecasting models in Europe.
Atmos Chem Phys 12:1-87
17. Moumtzidou A, Epitropou V, Vrochidis S, Karatzas K, Voth S, Bassoukos A, Mossgraber J, Karppinen A,
Kukkonen J, Kompatsiaris I (2014) A model for environmental data extraction from multimedia and its
evaluation against various chemical weather forecasting datasets. J Ecol Inf 23:69-82
18. Moumtzidou A, Epitropou V, Vrochidis S, Voth S, Bassoukos A, Karatzas K, Moßgraber J, Kompatsiaris I,
Karppinen A, Kukkonen J (2012) Environmental data extraction from multimedia resources. In: Proceedings
of the 1st ACM international workshop on Multimedia analysis for ecological data (MAED 2012); Nara,
Japan. p. 13-18
19. Open Geospatial Consortium (2013) OGC best practice for using web map services (WMS) with time-
dependent or elevation-dependent data [Internet]. [cited 2014 Jul 30]. Available from: http://external.
opengeospatial.org/twiki_public/pub/MetOceanDWG/MetOceanWMSBPOnGoingDrafts/12-111r1_Best_
Practices_for_WMS_with_Time_or_Elevation_dependent_data.pdf
20. OSGeo Foundation (2014) GeoTools Infosheet. [Internet]. [cited 2014 Jul 31]. Available from: http://www.
osgeo.org/geotools
21. Potempski S, Galmarini S (2009) Est modus in rebus: analytical properties of multi-model ensembles. Atmos
Chem Phys 9(24):9471-9489
22. Riga M, Karatzas K (2014) Investigating the relationship between social media content and real-time
observations for urban air quality and public health. In: Proceedings of the 4th International Conference
on Web Intelligence, Mining and Semantics (WIMS '14); New York, USA. p. 59:1-7
23. Snyder J (1987) Map Projections - A Working Manual. Professional Paper: 1395. USGS Publications
Warehouse
24. Snyder J (1989) An album of map projections. Professional Paper: 1453. USGS Publications Warehouse
25. Sofiev M, Siljamo P, Valkama I, Ilvonen M, Kukkonen J (2006) A dispersion modelling system SILAM and
its evaluation against ETEX data. Atmos Environ 44(4):674-685
26. Verstraete MM, Pinty B (2013) Environmental information extraction from satellite remote sensing data. In:
Kasibhatla P (ed) Inverse methods in global biogeochemical cycles, vol 1., pp 125-137
27. Vrochidis S, Epitropou V, Bassoukos A, Voth S, Karatzas K, Moumtzidou A, Moßgraber J, Kompatsiaris I,
Karppinen A, Kukkonen J (2012) Extraction of environmental data from on-line environmental information
sources. In: IFIP Advances in Information and Communication Technology; p. 361-370
28. Wanner L, Vrochidis S, Rospocher M, Mossgraber J, Bosch H, Karppinen A, Myllynen M, Tonelli S,
Bouayad-Agha N, Bugel U, et al (2012) Personalized environmental service orchestration for quality of life
improvement. In: artificial intelligence applications and innovations, IFIP advances in information and
communication technology, 3rd Intelligent Systems for Quality of Life information Services (ISQL 2012);
Halkidiki, Greece. p. 351-360
Victor Epitropou has received his MEng degree in Electrical Engineering and Computer Science from the
Democritus University of Thrace (DUTh) in 2007 with a thesis on the parallelization of algorithms in an object-
oriented context, and his MSc degree in Digital Image and Signal processing from the DUTh in 2011, after
serving in the Hellenic Army as a reserve officer of the Research & IT corps from 2007 to 2009. He has worked
at the Informatics Systems & Applications Group of the Aristotle University of Thessaloniki since late 2009 on
several European research projects in the domain of air quality and personalized web services, and is currently a
PhD candidate at the Aristotle University of Thessaloniki. His research and work interests also include embedded
software development, neural networks, wireless sensor networks, and desktop applications development.
Tassos Bassoukos is a Senior Software Engineer with the Informatics Systems & Applications Group of the
Aristotle University of Thessaloniki, handling the group's Software Engineering needs. He has been carrying the
main software development burden of the group for several years. He has been working with Web technologies
since HTML 2.0 and Java web applications since 1999. He has participated in several European research projects
and has seen them to successful completion. Additional areas of professional interest include Open Source
frameworks, Web-based Content management systems, inter-language integration, web mapping systems,
information representation and Human-Computer Interfaces.
Kostas Karatzas is an Asst. Professor for Informatics Systems & Applications at the Dept. of Mechanical
Engineering, Aristotle University of Thessaloniki (AUTh), Greece, where he leads the Informatics Applications
and Systems Group (ISAG). The Group is specialized in Environmental Informatics, and is conducting data
oriented analysis and modelling at a raw, processed, and e-service level, for citizens, authorities and industry,
employing Computational Intelligence. ISAG has a long expertise in web-based applications, web portals and
services, as well as in applications for mobile devices and smart phones, with an emphasis on participatory
environmental sensing. Kostas Karatzas is the author and co-author of more than 150 scientific publications, and
is serving as a scientific committee member for international conferences and as an advisory board member of int.
publications, with emphasis on Environmental Informatics and Computational Intelligence.
Dr. Ari Karppinen has worked as a research scientist at the Finnish Meteorological Institute since 1984. His
expertise is on mathematical modeling, atmospheric physics and chemistry; particularly evaluation of urban air
quality, the dispersion of pollution from traffic. His MSc thesis (1987) dealt with the description and application
of a system for calculating radiation doses due to long range transport of radioactive releases and his Licentiate's
thesis (1998) studied the effective choice of NOx emission control measures. His doctoral thesis (2001) dealt
with the meteorological pre-processing and atmospheric dispersion modeling of urban air quality and applications
in the Helsinki metropolitan area. Doc. Karppinen is the author of approximately 200 scientific publications;
37 of these in refereed international journals. He has given over 50 lectures and presentations at scientific
conferences.
Leo Wanner is an ICREA Research Professor in the Department of Information and Communication Technol-
ogies at Universitat Pompeu Fabra (UPF). He earned his Diploma degree in Computer Science from the
University of Karlsruhe, Germany, and his Ph.D. in Linguistics from the University of The Saarland,
Saarbrücken, Germany. Leo works in the field of computational linguistics. His research foci include automatic
multilingual report generation, automatic summarization of written material and paraphrasing, computational
lexicology and lexicography. Throughout his career, Leo has been involved in various large-scale national,
European, and transatlantic research projects. He has published five books and over 100 refereed journal and
conference articles.
Stefanos Vrochidis received the Diploma degree in electrical engineering from Aristotle University
of Thessaloniki, the MSc degree in radio frequency communication systems from University of Southampton,
and the PhD degree in electronic engineering from Queen Mary, University of London. He is a postdoctoral
researcher with ITI-CERTH. His research interests include semantic multimedia analysis, indexing and retrieval,
semantic search, multimedia search engines and human interaction, as well as environmental applications and
patent search.
Ioannis Kompatsiaris received the Diploma degree in electrical engineering and the Ph.D. degree in 3-D model
based image sequence coding from Aristotle University of Thessaloniki in 1996 and 2001, respectively. He is a
Senior Researcher with ITI-CERTH and director of its Multimedia Knowledge Laboratory. His research interests
include multimedia content processing, multimodal techniques, multimedia and Semantic Web, multimedia
ontologies, knowledge-based analysis, and context aware inference for semantic multimedia analysis,
personalisation and retrieval.
Jaakko Kukkonen received the Ph.D. degree in physics from the University of Helsinki in 1990. He is currently
Research Professor at the Finnish Meteorological Institute. He is also Docent (Adj. Prof.) of Physics at the
University of Helsinki and Visiting Professor at the University of Hertfordshire (U.K.). He has worked on
atmospheric physics and chemistry, including especially the development, evaluation and applications of
mathematical atmospheric models.
... As a consequence of encoding information and knowledge implicitly in higher-level data products, human agents need to manually extract information from articles, for example, to perform a meta-analysis. Another practice is to attempt to algorithmically extract the characteristics of spatial features by processing image pixels (e.g., Epitropou et al., 2015;Stocker et al., 2015a). ...
Article
Traditionally, temperature-salinity (T-S) relationship was analysed to indicate the characteristic of water mass, and prediction models based on regression may be built to estimate the salinity in earlier researches. Temperature-salinity characteristic however might change dynamically with respect to the geographic location, season, or water layer, and is quite sensitive to the depth for the same location. It is therefore of interest whether including depth into the regression model could help to improve the prediction accuracy. In this paper, multivariate nonlinear regression is investigated to predict the salinity according to both temperature and depth. Experimental results show that depth is very effective for improving the prediction accuracy, and season-dependent model may achieve better performance than season-independent model. In addition, when the analysis was conducted for 5-year range, it is found the prediction accuracy is significantly higher than the result for all years, which indicates there might exist long-term variation on the characteristics of the water masses. Furthermore, 3D model and visualization scheme were proposed to explore the effect of depth on the temperature-salinity-depth characteristic, and a visualization system was built accordingly. This system may present the T-S curve and 3D Model according to the assigned criteria of season or multi-year range, and allows the user to view the similarity map for the given T-S-D data so as to conduct comparative study of water masses for a wide area of ocean.
Article
There are limitations to traditional visualization solutions regarding real-time 3D visualization of time-varying and large-volume 3D gridded oceanographic data in a web environment. We adopted the open-source visualization technologies to implement a browser-based 3D visualization framework. The developed 3D visualization interfaces provide users 3DGIS experiences on a virtual globe and simultaneously provide efficient 3D volume rendering and enriched interactive volume analysis. Our experiments suggest that the well-designed Cesium and Plotly.js API allow researchers to easily establish 3D visualization applications while avoiding the requirements of intensive programming and computations. The case study conducted shows that the proposed methods is a feasible alternative web-based 3D visualization solution, which provides a faster rendering speed, high visual effects and on-the-fly 3D visualization of oceanographic data. Due to its open-source architecture and the simplicity of the adopted technologies, the visualization framework can be easily customized to visualize other scientific data with few modifications.
Book
The chemical composition of the atmosphere has numerous impacts on the quality of human life. Some prominent examples of these are the adverse health effects of fine particulate matter and ozone, the irritation and cough caused by some air pollutants and the sneezing associated with aeroallergens, the sense of smell associated with the changes of the seasons as well as with the exposure to unpleasant odors. The COST Action ES0602: Towards a European Network on Chemical Weather Forecasting and Information Systems (www.chemicalweather.eu) organized a workshop in Thessaloniki, Greece, in May 2008, devoted to Quality of life information services towards a caring and sustainable society for the atmospheric environment. The main purpose of the workshop was to present and discuss existing Chemical Weather Forecasting and Information Systems, developed both by the action participants and by related important organizations, such as the European Environment Agency and the U.S. Environmental Protection Agency. The focus of the workshop was specifically on the dissemination and the wider use of chemical weather forecasting information. This has been the main topic of Working Group 3 of this COST action. The workshop included a number of key presentations from invited experts from Europe and the United States. These represent regional, national and continental solutions and services that provide information on the quality of the atmospheric environment and chemical weather forecasts, via web portals, mobile devices, and other ICT communication channels. These presentations are included in this publication, and we wish to thank all the authors for providing these contributions. In addition, this publication includes three brief papers that present the objectives, content, interaction and achievements of the working groups that are active within the COST ES0602 Action.
We also wish to acknowledge the substantial contributions towards the successful organization of this workshop by all of the participants of this COST action. Last but not least, this publication includes an inventory of AQ information systems in Europe, on the basis of input received by members of the Action. We hope that based on these proceedings and the referenced information, the readers will be able to have direct access to the current state of the art, the latest developments and the future plans concerning Quality of Life Information Services for the Atmospheric Environment (This publication is supported by COST).
Article
The spatially interpolated European air quality maps used at the EEA are produced by its ETC/ACM on an annual basis by combining observations from AirBase stations as the primary data source with European-wide dispersion modelling information taken from the EMEP model as a supplementary data source. The EMEP model is a reference European chemical transport model with a spatial resolution of 50 × 50 km². The quality and resolution of the EMEP model influence the quality of the spatially interpolated maps. The recent availability of a series of Copernicus Atmospheric Service (MACC-II) modelling products opens up a possibility for potentially improving the interpolated mapping products of the ETC/ACM, using one of these modelling products instead of the EMEP model. This paper reports comparisons of the ETC/ACM mapping results when using either the EMEP model or the MACC-II modelling products as the auxiliary variable in the geostatistical interpolation. The paper describes the mapping methods of ETC/ACM, the comparison approaches executed and the suite of input data selected for these comparisons. It describes and discusses the comparison results, leading to conclusions on limitations and suitability of the use of MACC-II products. It includes recommendations in general and specifically on requirements of products from the MACC-II project.
Article
The PESCaDO project (http://www.pescado-project.eu/) aims at providing tailored environmental information to EU citizens. For this purpose, PESCaDO delivers personalized environmental information, based on coordinating the data flow from multiple sources. After the necessary discovery, indexing and parsing of those sources, the harmonization and retrieval of data is achieved through Node Orchestration, and unified and accurate responses to user queries are created by the Fusion service, which assimilates input data into a coherent data block according to their imprecision and relevance with respect to the user-defined query. Environmental nodes are selected from open-access web resources of various types, and from the direct usage of data from monitoring stations. Model forecasts are made available through the synergy with the AirMerge image parsing engine and its chemical weather database. In the presented paper, elements of the general architecture of AirMerge and of the Fusion service of PESCaDO are presented as an example of the modus operandi of environmental information fusion for the atmospheric environment.
Conference Paper
The rapid rise of Web 2.0 technologies and the popularity of social media, together with the broad use of low cost smart devices, changed dramatically the way users receive information, but also gave them the ability to become significant contributors of disseminated data. The challenge now is to benefit from large volumes of data and collective intelligence, so as to detect what people think or discuss in virtual communities, at the time that an event happens or the information is spread. Our domain of interest is the Urban Air Quality (UAQ) and public health. We want to promote the potential use of social media as a real-time source of "sensing" the environmental load or the existing environmental condition that affects directly humans' quality of life. With the use of the Self-Organizing Map (SOM), we analyze posts gathered from Twitter and we identify existing UAQ conditions, based on users' reports. Clusters of tweets with similar topics of discussion are formed. We additionally investigate the relations between citizens' reports and the corresponding, in time and location, actual observations of specific environmental characteristics. With a thorough investigation of SOM visualizations, we conclude that there is a positive correlation between personal observations and official data, highlighting thus the agreement among soft sensors' (users) and hard sensors' (monitoring sites) measurements.
Article
Remote sensing is a particular instance of the more general measurement process, which is intrinsically an inverse problem. Some of the key issues of inverse theory in the context of the interpretation of remote sensing over terrestrial surfaces are outlined. Earth Observation from space provides a unique opportunity to acquire data on the state and evolution of the environment globally, repetitively, and at spatial resolutions suitable for the simulation of dynamic processes in the relevant geophysical media. Various approaches can be used to address this inverse problem, including the formal inversion of physically-based or empirical surface reflectance models against remote sensing measurements, or more empirical approaches such as the exploitation of correlations between the variables of interest and spectral indices derived from the measurements. This field of study will evolve considerably in the next few years as a result of the introduction of a new generation of space sensors with greatly increased performances and capabilities. The higher quality data generated by these sensors will permit the evaluation of the theoretical bases of existing models, and the latter will foster a much better exploitation of the data. However, scientific and technological advances will need to be coordinated to take full advantage of the opportunities, as the availability of reliable models and high performance computers and networks will rival as bottlenecks to the full exploitation of Earth Observation data.
Article
Environmental data analysis and information provision are considered of great importance for people, since environmental conditions are strongly related to health issues and directly affect a variety of everyday activities. Nowadays, there are several free web-based services that provide environmental information in several formats, with map images being the most commonly used to present air quality and pollen forecasts. This format, despite being intuitive for humans, complicates the extraction and processing of the underlying data. Typical examples of this case are the chemical weather forecasts, which are usually encoded as heatmaps (i.e. graphical representations of matrix data with colors), while the forecasted numerical pollutant concentrations are commonly unavailable. This work presents a model for the semi-automatic extraction of such information based on a template configuration tool, on methodologies for data reconstruction from images, as well as on text processing and Optical Character Recognition (OCR). The aforementioned modules are integrated in a standalone framework, which is extensively evaluated by comparing data extracted from a variety of chemical weather heatmaps against the real numerical values produced by chemical weather forecasting models. The results demonstrate a satisfactory performance in terms of data recovery and positional accuracy.
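The core idea of recovering numbers from a heatmap can be sketched as a nearest-color lookup against the map's legend. The sketch below is only an illustration under stated assumptions: the legend is assumed to be already known as a mapping from RGB colors to concentration values, the distance threshold is arbitrary, and the function names are hypothetical rather than part of the described framework.

```python
from typing import Dict, List, Optional, Sequence, Tuple

RGB = Tuple[int, int, int]


def nearest_legend_value(
    pixel: RGB,
    legend: Dict[RGB, float],
    max_dist_sq: int = 3 * 40 ** 2,  # assumed tolerance for compression noise
) -> Optional[float]:
    """Return the value of the closest legend color, or None if none is close."""
    best_value: Optional[float] = None
    best_dist: Optional[int] = None
    for color, value in legend.items():
        # Squared Euclidean distance in RGB space.
        dist = sum((p - c) ** 2 for p, c in zip(pixel, color))
        if best_dist is None or dist < best_dist:
            best_dist, best_value = dist, value
    if best_dist is None or best_dist > max_dist_sq:
        return None  # e.g. coastline, label or background pixel
    return best_value


def heatmap_to_grid(
    pixels: Sequence[Sequence[RGB]], legend: Dict[RGB, float]
) -> List[List[Optional[float]]]:
    """Convert a 2D array of pixels into a 2D grid of values (None = no data)."""
    return [[nearest_legend_value(px, legend) for px in row] for row in pixels]


# Hypothetical three-class legend and a 2x2 image with slightly noisy colors.
legend = {(0, 0, 255): 10.0, (0, 255, 0): 30.0, (255, 0, 0): 60.0}
pixels = [[(0, 0, 250), (5, 250, 5)], [(250, 10, 10), (128, 128, 128)]]
grid = heatmap_to_grid(pixels, legend)
```

Pixels that match no legend color, such as the grey pixel in this example, are returned as `None`, leaving gaps that a later reconstruction step would have to fill.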
Conference Paper
The PESCaDO project aims at providing personalized environmentally-derived information to European citizens using plain, human-language queries and using them to infer the appropriate environmental, ontological, quality-of-life and spatial-temporal context in which to focus. This information is in turn used to provide information about pollution concentration and health risks associated with the action inferred by the query within the areas covered by the service's knowledge base. In order to provide this information, PESCaDO automatically orchestrates and combines several sources of environmental information, among which are Chemical Weather forecasts, which provide high-detail, high-volume pollutant concentration data over most of Europe. Harmonized and uniform access to such Chemical Weather (CW) data is achieved through PESCaDO's integration with the AirMerge CW image parsing engine, which also provides innovative services such as the ability to automatically generate ensemble forecasts and automatically rank CW model providers for reliability and accuracy of results. In the current paper we report on the orchestration of services based on CW forecasting models, as achieved via the integration with AirMerge.
Article
In this paper we investigate some basic properties of multi-model ensemble systems, which can be deduced from general characteristics of the statistical distributions of the ensemble members with the help of mathematical tools. In particular, we show how to find the optimal linear combination of model results, which minimizes the mean square error in the case of both uncorrelated and correlated models. By proving basic estimates, we try to deduce general properties describing multi-model ensemble systems. We also show how mathematical formalism can be used to investigate the characteristics of such systems.
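For the uncorrelated case, a standard textbook result is that the minimum-variance combination of unbiased models, with weights constrained to sum to one, weights each model inversely to its error variance. The sketch below illustrates that result only; it assumes unbiased, uncorrelated models with known error variances and is not taken from the paper's own derivation. The function names are hypothetical.

```python
from typing import List, Sequence


def inverse_variance_weights(error_variances: Sequence[float]) -> List[float]:
    """Weights w_i = (1/var_i) / sum_j (1/var_j); they sum to one."""
    if not error_variances or any(v <= 0 for v in error_variances):
        raise ValueError("error variances must be positive and non-empty")
    inverse = [1.0 / v for v in error_variances]
    total = sum(inverse)
    return [x / total for x in inverse]


def combine(forecasts: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted linear combination of the individual model forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts))


def combined_variance(error_variances: Sequence[float]) -> float:
    """Error variance of the optimally weighted ensemble: 1 / sum_j (1/var_j)."""
    return 1.0 / sum(1.0 / v for v in error_variances)


# Two hypothetical models with error variances 1.0 and 4.0.
weights = inverse_variance_weights([1.0, 4.0])
ensemble = combine([10.0, 20.0], weights)
```

The combined variance is never larger than that of the best single model, which is the usual motivation for such ensembles. For correlated models the weights depend on the full error covariance matrix rather than on the variances alone.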
Article
In this study we present a novel approach for improving air quality predictions using an ensemble of air quality models generated in the context of AQMEII (Air Quality Model Evaluation International Initiative). The development of the forecasting method makes use of modeled and observed time series (either spatially aggregated or relative to single monitoring stations) of ozone concentrations over different areas of Europe and North America. The technique considers the underlying forcing mechanisms on ozone by means of spectrally decomposed previsions. With the use of diverse applications we demonstrate how the approach screens the ensemble members, extracts the best components and generates bias-free forecasts with improved accuracy over the candidate models. Compared to more traditional forecasting methods such as the ensemble median, the approach reduces the forecast error and at the same time clearly improves the modelled variance. Furthermore, the result is not a mere statistical outcome dependent on the quality of the selected members. The few individual cases with degraded performance are also identified and analyzed. Finally, we show the extensions of the approach to other pollutants, specifically particulate matter and nitrogen dioxide, and provide a framework for its operational implementation.