Environmental data extraction from heatmaps using
the AirMerge system
Received: 18 August 2014 / Revised: 19 March 2015 / Accepted: 1 April 2015
© Springer Science+Business Media New York 2015
Abstract The AirMerge platform was designed and constructed to increase the availability
and improve the interoperability of heatmap-based environmental data on the Internet. This
platform allows data from multiple heterogeneous chemical weather data sources to be
continuously collected and archived in a unified repository; all the data in this repository have
a common data format and access scheme. In this paper, we address the technical structure and
applicability of the AirMerge platform. The platform facilitates personalized information
services, and can be used as an environmental information node for other web-based information
systems. The results demonstrate the feasibility of this approach and its potential for
application in other areas in which image-based environmental information retrieval
is needed.
Multimed Tools Appl
Department of Mechanical Engineering, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
Finnish Meteorological Institute, Helsinki, Finland
Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona,
Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Catalonia, Spain
Centre for Research and Technology Hellas - Information Technologies Institute, Thessaloniki, Greece
Keywords Heatmaps · Data retrieval · Air quality · Image processing · Web services · GIS
1 Introduction
There are many types of environment-related images available on-line, broadly belonging to
two categories. These are (a) captured images that are generated as the result of a monitoring
activity (either in situ or remotely), and (b) synthetic (i.e., modeled) images that visualize the
result of an environmental computation process. In the latter category, heatmaps may be
considered as a representative type of synthetic images, and are commonly produced with
the aid of models.
A special area of application regarding the processing of heatmaps is air quality forecasting
(AQF). If we address in particular the regional and continental scales of AQF, the term
Chemical Weather (CW) is commonly used. In Chemical Weather Forecasting (CWF), there
has been an ever-growing number of forecast providers. These forecasts can cover some
regions of Europe multiple-fold [11,16]. However, the advancements in the number and
quality of AQF forecasts have not been associated with similar advancements in publishing
those results on-line, or making them available for added value services in interoperable
formats.
On-line data publishing and dissemination is, in most cases, performed by the use of simple
heatmaps, while the numerical data used for the construction of the heatmaps is commonly
either not available or its access is severely restricted due to legal or technical constraints.
There is no harmonization regarding the (on-line) publishing of heatmaps. Each CWF
data provider has therefore chosen to adopt their own heatmap format and publishing
conventions.
A relatively recent solution for publishing maps, geographical coverages and associated
data and metadata has materialized in the form of the various Open Geospatial Consortium
(OGC) standards for the publishing and visualization of geospatial information. However, the
available implementations and their somewhat inflexible client-server tiered architecture, as
well as limited support for time- and elevation-based data ordering, have limited their
adoption. Map visualization (WMS) is the most popular of the OGC services, while the
more complex and data-oriented ones (e.g., WFS, WCS, WPS) are lagging behind in terms of
acceptance, both because of their relatively niche use (compared to map viewing) and, in the
case of WPS, their relative difficulty of implementation.
It can therefore be concluded that CWF data publishing via heatmaps is a field which has,
so far, eluded web service integration and convergence of data interoperability, thus not
providing fertile technical conditions for the development of personalized or other added value
services.
In this paper, an integrated platform is presented, named AirMerge, which has been
designed to address this problem. It uses heatmaps as the starting point for automatically
collecting data from different CWF providers, converting them to a commonly interoperable
format, storing and recalling them from a centralized database and making them available to
other value-added services through a common web API. Elements of AirMerge’s technology
have first been presented in , while ongoing developments, applications and extensions of
the system have been presented in [6,17].
Compared to existing systems with some similar functionality in the same AQF and
CWF domain, such as MyAir, AirMerge provides the unique functionality of image-based
environmental information retrieval, accompanied by functions for
image noise removal and feature extraction, map projection transformations, data fusion and
mathematical analysis of extracted information, as well as external connectivity
and component reuse in third-party projects such as PESCaDO.
AirMerge is thus an information hub for spatially defined environmental data, which
provides a single access point to various CWF data. Although AirMerge is currently used
only for chemical weather data, it can deal with any spatially distributed environmental
information, provided that a suitable retrieval and parsing subsystem is implemented in the
form of an extension or plug-in to the system. Via AirMerge, numerical environmental data
can be extracted from heatmaps and made available via a proper web-based API to be used by
other services. Even though AirMerge’s focus is on heatmaps, other sources of information can
be used as well, such as formatted text data, offline databases, data exchange files, remote data
feeds and so on. These secondary data sources typically offer a much reduced areal coverage
compared to heatmaps and require more interim interfacing to be integrated, but they may be
more precise on a local scale (for example, if they are generated by real-time sensor readings)
and can be used for fine-tuning or verification purposes.
The aims of this article are (i) to present the processing of CWF heatmaps in the AirMerge
system in greater detail than previous publications, and (ii) to discuss AirMerge’s external
connectivity and interaction with other systems. In addition, it is intended as a way to
comprehensively present the totality of AirMerge’s components and sub-systems, which have
so far only been presented separately in application-specific settings.
2 Materials and design requirements
2.1 Air quality heatmaps as input material
CWF models predict spatial and temporal concentration data. Such data can be encoded
as digital images in the form of visually-interpretable heatmaps. Heatmaps are defined as 2D
images with a discrete number of color levels representing different pollutant concentration
value ranges over a geographical domain. They are accompanied by auxiliary information
concerning the color scale used and their geographical context, as well as descriptions of the
pollutants and units being used. The accompanying information may even contain secondary
data elements such as line charts, auxiliary heatmaps or tables, which add further levels of
complexity and information richness to CWF heatmaps.
A heatmap, by design, is intended as an intuitive, human-readable, one-way communica-
tion medium, conveying information to various groups of stakeholders. It is a one-way
medium primarily because it is not normally meant to be exported, modified or processed
by the intended target audience.
Generally, heatmaps are not intended as a data interchange format. However, they are
widely available, and they contain a large amount of usable data (relative to most other
potential data sources available online under the same terms). As an intuitive example, a
heatmap of a CWF representing a geographical area on a fixed grid, with grid dimensions of
300×200 elements, with a grid resolution of 20×20 km, covers an area of 6000×4000 km
(roughly a pan-European coverage), and contains 60,000 data points.
Considering that a forecast provider will typically offer at least 24-h coverage (same-day
predictions), and that each data point evaluates to a real-valued number (after heatmap
interpretation), it is obvious that the potential volume of recoverable environmental data is
rather high. In addition, heatmaps carry their own geolocation information and have a large
continuous coverage area, offering a combination of high area coverage, relatively high spatial
resolution and a fair temporal resolution.
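The arithmetic in the example above can be checked with a short sketch. The grid size and resolution come from the text; the hourly time step (one heatmap per forecast hour) is an assumption made only for illustration, since the text states just the 24-h coverage.

```python
# Grid size and resolution are taken from the example in the text.
GRID_WIDTH, GRID_HEIGHT = 300, 200      # grid elements
CELL_KM = 20                            # grid resolution in km

# Assumption (not from the source): one heatmap per forecast hour.
FRAMES_PER_DAY = 24


def coverage_km(width, height, cell_km):
    """Return the (east-west, north-south) extent of the grid in km."""
    return width * cell_km, height * cell_km


def values_per_day(width, height, frames):
    """Return the number of data points recoverable per pollutant per day."""
    return width * height * frames


extent = coverage_km(GRID_WIDTH, GRID_HEIGHT, CELL_KM)
points_per_map = GRID_WIDTH * GRID_HEIGHT
daily_values = values_per_day(GRID_WIDTH, GRID_HEIGHT, FRAMES_PER_DAY)
print(extent, points_per_map, daily_values)
```

Under the hourly assumption, a single pollutant from a single provider would already yield more than a million values per day.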
Similar amounts and types of data can be extracted from remote sensing imagery (RSI)
[2,26], but with the added complication of having to perform more advanced image processing
and feature extraction before yielding usable results.
Techniques relying on harvesting information from other sources like social media have to
deal with a high noise-to-signal ratio, low accuracy, and complications caused by bringing
factors such as language semantics and ontologies into play. In the case of heatmaps, the
problem lies almost entirely within the signal processing domain, allowing for a more direct
approach.
2.2 Availability and accessibility of CWF data
Initially, heatmaps were studied in terms of their availability, publishing patterns, data
semantics as well as their structural characteristics. Specifically, we addressed heatmaps
that were within the European Open-Access Chemical Weather Portal [3,14]. This preliminary
analysis led to some important conclusions regarding the state of the art in CWF
publishing. An important conclusion was that the existence of multiple CWF providers on
the internet can lead to the simultaneous existence of multiple contradictory forecasts, even
if they refer to the same geographical areas, time spans and pollutant types [11,13]. This
means that the average user wishing to consult more than one CW forecast will have to
evaluate their accuracy and reliability for themselves, by exploring different providers’
websites, and with very limited options when it comes to simultaneous display and comparison
of different CWF sources.
To date, there have been only a few efforts between data providers to standardize the
output format of their models and spatial, temporal and qualitative coverages [10,12]. In
addition, access remains mostly non-numerical. The adoption of Open Geospatial Consortium
standards for presenting maps and datasets, such as WMS and WPS, has also
remained very limited in the domain of CWF. The MyAir project does offer a direct
data WCS (Web Coverage Service) access option, but this is the exception rather than
the rule.
This has resulted in the current situation, where there is a significant quantity of new CWF
data published on-line every day in Europe, but access to this data in its original resolution and
precision is limited. In addition, there is no unified repository of regional CWF data focusing
on the collection of recently published data.
2.3 Operational premises
The platform was created around the following premises, which are based on the extensive
background research conducted (according to Subsection 2.2):
• New heatmaps representing CWF forecasts are published daily or at least fairly regularly
with a predictable pattern and at predictable web URLs, to make them worthwhile
harvesting, both for practical and for informational gain reasons.
• Heatmap formats and publishing patterns such as update hours, frequency, etc. may
change without warning. It is the platform maintainers’ responsibility to keep up with
such changes.
• It is possible to reconvert heatmaps (or, in general, any sufficiently processed remote
sensing or synthetic image [2,26]) back into numerical data, albeit with limitations, and it
is possible to archive and post-process any data recovered by the reconversion process.
2.4 Heatmap characteristics
A typical example of a CWF heatmap image is presented in Fig. 1. Those characteristics and
isolated elements are also coded in AirMerge’s parsing configuration with specific keywords,
which are associated with each characteristic.
The parsing subsystem utilizes XML-based configuration files which contain descriptive
fields for all of the above elements. Some of those elements are fixed for a given provider’s
heatmaps, while others indicate variable/mutable characteristics, such as the type of pollutant
used. An example of such a configuration file is given in Fig. 2, which contains the instructions
for parsing heatmaps with the structure of the one in Fig. 1.
2.5 Map region
Heatmaps contain at least one rasterized map region, indicated with the <region> tag in
AirMerge’s XML parsing subsystem. Regions contain the color-coded data of interest, with a
specific raster height and width. One pixel of this raster map corresponds to one data or grid
point, though a single data point doesn’t necessarily uniquely map to a single geographic
coordinate, due to map projection considerations. The map projection is described in the
<projection> node.
2.6 Color legends and pollutants
Each CWF forecast provider usually publishes a series of pollutants using identical map
layouts, but different color scales and value ranges. Hierarchically, color scales are considered
a sub-feature associated with each pollutant.
Color legends, color scales or simply “legends” are color look-up tables (LUTs) or palettes
placed beside the map region, which indicate the relationships between the colors used in the heatmap
and the numerical pollutant concentration value ranges that they represent.
In the scripting subsystem, all pollutants offered by a CWF provider
are grouped under the (unique) <pollutants> XML tag, while individual pollutants
are found under the (multiple) <pollutant> XML tags, as shown in Fig. 3, which
shows the description employed by AirMerge for parsing the CO pollutant from the
heatmap in Fig. 1. The position of the color legend as well as the value ranges are
entered manually, but the actual color sampling points are determined automatically.
The order of color parsing is left-to-right for horizontal color scales, and bottom-to-top
for vertical ones.
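As an illustration of how such a pollutant description could be consumed, the following is a minimal sketch rather than AirMerge's actual parser. The XML fragment is a hypothetical, well-formed reduction of Fig. 3, and the rule used for placing the colour sampling points (the centre of equally wide legend segments between xMin and xMax) is an assumption; the text only states that the points are determined automatically.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment modelled on Fig. 3; the real script has
# more attributes and nodes than are reproduced here.
CONFIG = """
<pollutants>
  <pollutant name="acid/000/CO_gas" type="CO" unit="ugr/m3">
    <autopoints xMin="116" xMax="483" />
    <range min="0" max="30" />
    <range min="30" max="60" />
    <range min="60" max="90" />
  </pollutant>
</pollutants>
"""


def load_legend(xml_text, pollutant_type):
    """Return the ordered list of (min, max) value ranges for one pollutant."""
    root = ET.fromstring(xml_text)
    for pollutant in root.iter("pollutant"):
        if pollutant.get("type") == pollutant_type:
            return [(float(r.get("min")), float(r.get("max")))
                    for r in pollutant.findall("range")]
    raise KeyError(pollutant_type)


def sample_positions(x_min, x_max, count):
    """Assumed sampling rule: centre of each of `count` equal legend segments."""
    step = (x_max - x_min) / count
    return [x_min + step * (i + 0.5) for i in range(count)]


ranges = load_legend(CONFIG, "CO")
xs = sample_positions(116, 483, len(ranges))
print(ranges)
print(xs)
```

The list position of each range doubles as the class index that a pixel receives once its colour has been matched against the legend.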
2.7 Determining map geometry and bounding box
AirMerge utilizes the following simple method for determining the relationship between
image pixels and geographical coordinates: two points $p_1 = \{x_1, y_1\}$ and
$p_2 = \{x_2, y_2\}$ are selected on the map region itself, and their geographical coordinates
$g_1 = \{\lambda_1, \varphi_1\}$ and $g_2 = \{\lambda_2, \varphi_2\}$ are determined by using the (usually present) geographical
grid of the image. The two selected points should be as far apart in terms of
longitude and latitude as the map allows, in order to minimize discretization and
distortion errors. These points are specified in the <pixelpinning> and <geopinning>
tags in AirMerge’s XML configuration scripts, as shown in Fig. 2. Then, the pixel/longitude
ratio $r_\lambda$ and pixel/latitude ratio $r_\varphi$ are easy to reconstruct with the following
formulas:
$$ r_\lambda = \frac{|x_2 - x_1|}{\lambda_2 - \lambda_1}, \qquad r_\varphi = \frac{|y_2 - y_1|}{\varphi_2 - \varphi_1} $$
Fig. 1 An example of a heatmap with its important areas and structures highlighted. This particular example also
contains secondary heatmaps and text areas, which are normally not used
<model name="SILAM" code="5.2" />
<region name="Europe" legendcy="446">
<NW x="98" y="110" />
<SE x="500" y="401" />
<param name="lam_0" value="0.0" />
<param name="lam_0" value="0.0" />
<pixel x="98" y="401" />
<pixel x="500" y="110" />
<lonlat x="-25.0" y="30.0" />
<lonlat x="45.0" y="75.0" />
Fig. 2 Part of the XML-based configuration script used by AirMerge in order to parse heatmaps with a specific
structure. In this case, instructions on how to isolate the map region from heatmaps of the type used in Fig. 1 are
shown
<pollutant name="acid/000/CO_gas" type="CO" unit="ugr/m3"
<autopoints xMin="116" xMax="483" />
<range min="0" max="30" />
<range min="30" max="60" />
<range min="60" max="90" />
<range min="90" max="120" />
<range min="120" max="150" />
<range min="150" max="180" />
<range min="180" max="210" />
<range min="210" max="240" />
<range min="240" max="270" />
<range min="270" max="300" />
<range min="300" max="1200" />
Fig. 3 Part of the XML-based configuration script used by AirMerge in order to parse the CO gas pollutant from
heatmaps of the type used in Fig. 1
The formulas can be applied verbatim under the following conditions:
• $p_1$ is located South-West (SW) of $p_2$.
• $p_1$ and $p_2$ are both located in the northern hemisphere ($0^\circ \le \varphi_1 < \varphi_2 \le 90^\circ$).
• Distances are always computed as positive, moving from $p_1$ to $p_2$ in the North-East (NE)
direction. If $p_2$ is more than 180° East of $p_1$, then the large arc is considered.
After these ratios have been computed, it is trivial to determine the geographical offsets of
the map (position of the SW point) as well as its maximum geographical extension, and
therefore construct a complete, four-point geographical bounding box.
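The procedure can be illustrated with the pinning points and region corners listed in Fig. 2. This is a minimal sketch that assumes a map in which longitude and latitude vary linearly with pixel position; the function names are hypothetical and are not part of AirMerge.

```python
# Pinning points taken from the <pixel> and <lonlat> entries in Fig. 2.
P1, G1 = (98, 401), (-25.0, 30.0)    # south-west point: pixel, (lon, lat)
P2, G2 = (500, 110), (45.0, 75.0)    # north-east point: pixel, (lon, lat)


def pixel_ratios(p1, g1, p2, g2):
    """Return (pixels per degree of longitude, pixels per degree of latitude)."""
    r_lon = abs(p2[0] - p1[0]) / (g2[0] - g1[0])
    r_lat = abs(p2[1] - p1[1]) / (g2[1] - g1[1])
    return r_lon, r_lat


def pixel_to_lonlat(x, y, p1, g1, r_lon, r_lat):
    """Map a pixel to (lon, lat); image rows grow southwards, hence p1[1] - y."""
    return g1[0] + (x - p1[0]) / r_lon, g1[1] + (p1[1] - y) / r_lat


def bounding_box(nw, se, p1, g1, r_lon, r_lat):
    """Return (west, south, east, north) for a map region given by NW/SE pixels."""
    west, north = pixel_to_lonlat(nw[0], nw[1], p1, g1, r_lon, r_lat)
    east, south = pixel_to_lonlat(se[0], se[1], p1, g1, r_lon, r_lat)
    return west, south, east, north


r_lon, r_lat = pixel_ratios(P1, G1, P2, G2)
# Region corners from the <NW> and <SE> entries in Fig. 2.
bbox = bounding_box((98, 110), (500, 401), P1, G1, r_lon, r_lat)
print(round(r_lon, 3), round(r_lat, 3), bbox)
```

In this particular configuration the pinning points coincide with the corners of the map region, so the bounding box reproduces the pinned coordinates (up to floating-point rounding).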
Selecting appropriate points is done manually by AirMerge’s operator, though it is
possible to partially automate the process [18,27]. If no usable geographical reference grid is
supplied with the heatmap, it is still possible to use the known geographical coordinates of two
easily identifiable landmarks.
2.8 Non-data elements
In most heatmaps, there are visual elements that do not represent pollutant
concentrations, but instead mark boundaries, form longitude/latitude reference lines, signify
land-water interfaces, indicate urban areas or simply represent geographical areas which, even if
physically present on the map, are not part of the model’s output. Such areas are called “void
areas”, and can for example be seen as the white, uncovered map areas in Fig. 4.
Fig. 4 An example of a heatmap with both numerous geomarkers and extended void regions
Some of those elements may be useful during configuration phases (for determining the
bounding box and map projection parameters, for instance) but in general they are undesirable
in processed data, and AirMerge uses several techniques to minimize their presence in the data
it stores in its database.
AirMerge automatically classifies any pixel with a color not among those described
in the legend as a non-data element, and considers its position as noise in the data,
forming a gap. Data gaps formed by boundary lines and geomarkers are usually thin
(one or two pixels), and are dealt with by using simple interpolation and noise-removal
algorithms, resulting in seamless, continuous images from which to
extract data.
2.9 Handling of borderline cases
Ideally, heatmaps should contain only the colors appearing in their associated legends,
and any extraneous color should clearly indicate a geomarker element to remove. Also,
geomarkers and void/uncovered geographical areas in the map should use different
colors than those used in areas containing valid data, and the heatmaps themselves
should only be delivered in lossless image formats. However, in practice the following
problems do arise:
• Though none of the CWF providers represented in AirMerge delivers their
heatmaps in a lossy image format (e.g., JPEG), some heatmaps show signs of
having been subjected to a lossy process at some point. This may create unwanted noise,
visible patterns and color artifacts which, if treated indiscriminately as noise, would result
in excessive data loss.
• Colors that differ slightly from those defined in the legend may appear in the map region,
or there might be more shades and hues than those implied by the legend.
• The legends themselves may contain noise or off-key colors which may differ slightly from
those appearing in the map region, making their use as absolute color references
problematic.
• Usually, geomarkers and void areas use different colors, and it is easy to distinguish
between the two. However, sometimes the same color is used for both, making it
impossible to distinguish them based on color/hue alone.
To counter these occurrences, the parsing subsystem has a built-in configurable tolerance
factor when parsing colors. It also allows gap filling to be turned on and off, the colors
to be treated as void to be specified, and a special gap filling mode to be used which takes into account the
existence of ambiguous noise and void in the same heatmap. An example of how this
subsystem is configured in AirMerge’s XML-based configuration script, in the
<colorscalespecs> node, is given in Fig. 5.
2.10 The AirMerge system
AirMerge has been designed and implemented to process heatmaps by adopting a results-oriented
approach. The AirMerge system was developed to include the following components,
visually illustrated in Fig. 6:
• A script-driven CWF heatmap fetching subsystem, which can be configured to fetch all
heatmaps from a given set of CWF providers. This subsystem makes use of tags to
describe all heatmap features that are required for fetching, processing and archiving.
• A scheduler subsystem, which initiates the fetching scripts daily at preset intervals and
also handles networking errors such as connection failures, missing resources and script
execution failures by the fetcher subsystem, notifying the system’s administrator in case
manual intervention is necessary.
• A heatmap-to-data conversion subsystem, which performs all the necessary image-to-data
conversions, map projection transformations and image cleanup.
• A database back-end, which stores both the raw and processed data from the heatmap
processing subsystem, according to a schema which allows searching and retrieval by
several fine-grained criteria.
Fig. 6 Structure of the AirMerge platform
<param name="tolerance" value="10" />
Fig. 5 Part of the XML-based configuration script used by AirMerge in order to configure the color scale and
map region color-based parsing subsystem
• A RESTful API, which allows accessing the data stored in the database using simpler
commands than accessing the database directly.
• Several post-processing modules designed to operate on processed data, either on a
particular coordinate or on an area. These modules offer various statistical and
geoprocessing functions such as computing concentration value averages, producing
ensemble (composite) forecasts or comparing and combining with observations, even
though the intention is to enable external services to implement such functionality through
the API.
• Third-party extensions or special linkage modules which allow accessing those modules
through a simplified interface, such as the ones used for interconnecting with the
PESCaDO project’s framework.
• A visualization module offering direct in-browser user interaction.
All items except the post-processing modules and third-party extensions form part of
AirMerge’s core functionality, and are designed to be as generic and data-agnostic as possible,
thus being applicable to any heatmap image processing task. By using all of these function-
alities together, AirMerge constitutes an environmental data collection repository, which can
be extended and used for the creation of third-party services, which can then extract environ-
mental knowledge from AirMerge’s processed data.
2.11 CWF gathering workflow and sub-system
Since collecting and processing CWF heatmaps is the primary goal of AirMerge, the first step
in its workflow is to gather the heatmaps themselves. In order for a particular provider’s
heatmaps to be successfully parsed and classified by AirMerge, their URLs must follow a
regular pattern, with a fixed base form and variable parts in their names which should be
indicative of time, pollutant and other relevant parameters. In other words, it is a necessary
precondition that the heatmap URLs themselves carry clearly structured classification metadata.
Being able to uniquely identify heatmaps and infer some of their variable aspects via their
URL patterns is the key to AirMerge’s functionality. An example of a URL and its structure
can be seen in Fig. 7.
The URL pattern, as well as its constituent elements, is defined in the XML code snippet
shown in Fig. 8. The URL’s structure is encoded in the <formatString> node, while its
variable parts such as the pollutant, elevation layer etc. are indicated by tokens surrounded by
hashes. Some tokens like #prognosis# have different names from the XML nodes they are
iterated over, e.g., #prognosis# takes values from the <forecasts> node, while others are more
straightforward.
It has been shown that automating the extraction of some of a heatmap’s metadata and
structural information is possible [6,27], by using OCR and text/image processing techniques,
but the efficacy of such techniques is limited by the fact that individual heatmaps do not
always contain all of the necessary information, and operating on individual heatmaps makes it
hard to detect the existence of generalized/common schemas between a series of similar
heatmaps produced by the same CWF provider.
Fig. 7 An example of the structural parts of a heatmap URL (labelled parts: URL base form, pollutant, elevation, forecast hour)
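The URL sequencing described in this subsection, in which hash-delimited tokens in a format string are replaced by every combination of configured values, can be sketched as follows. The host, path and format string are placeholders invented for illustration; only the #token# convention and the layer and forecast values (Fig. 8) come from the text, and the function name is hypothetical.

```python
from itertools import product

# Hypothetical format string: the host and path are placeholders, only the
# #token# convention comes from the text.
FORMAT_STRING = "http://example.org/maps/#pollutant#_#layer#_#prognosis#.png"

# Token values loosely modelled on Fig. 8 and Fig. 3.
TOKEN_VALUES = {
    "pollutant": ["CO_gas"],
    "layer": ["srf", "500m", "1000m", "3000m"],
    "prognosis": ["000", "096"],
}


def expand_urls(format_string, token_values):
    """Yield (url, metadata) for every combination of token values."""
    names = list(token_values)
    for combo in product(*(token_values[n] for n in names)):
        url = format_string
        for name, value in zip(names, combo):
            url = url.replace(f"#{name}#", value)
        yield url, dict(zip(names, combo))


urls = list(expand_urls(FORMAT_STRING, TOKEN_VALUES))
print(len(urls))
print(urls[0])
```

Because each generated URL is paired with the token values that produced it, the classification metadata (pollutant, layer, forecast hour) is known before the image is even downloaded.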
2.12 Image processing module
After images have been fetched, they are transferred to the image processing module, whose
task is to convert raw bitmap data into numerical data, by taking into account each heatmap’s
format and characteristics. The map region portion of the heatmap is cropped, and each of its
pixels is scanned individually. Depending on its color and on how closely it matches one of the
colors already present in the legend, it is assigned to a specific classification bin, according to
the following (simplified) pseudocode:
Input:
• an [m×n] image I containing RGB color triples
• a color legend C containing k unique RGB color triples
Output:
• an [m×n] array Q containing integer values.
for all pixels p ∈ I
    for all colors c ∈ C
        if p ≅ c then Q[p] = index(C, c)
    if p ∉ C then Q[p] = gap_marker
    if isTransparent(C, p) then Q[p] = void_marker
<layer name="srf" meters="0" />
<layer name="500m" meters="500" />
<layer name="1000m" meters="1000" />
<layer name="3000m" meters="3000" />
<!-- This URL encodes no stat, so a dummy one is defined -->
<stat name="dummy" type="AvgC1" />
<forecast value="000" unit="hour" hours="0" />
<forecast value="096" unit="hour" hours="96" />
Fig. 8 Part of the XML-based configuration script used by AirMerge in order to configure the URL sequencer
where index(C,c) is an integer-valued function which returns the zero-based index of a color
c∈C. Pixels that cannot be classified as one of the colors existing in the legend C
are assigned the special gap_marker value, which means that they are considered as invalid
data/undesirable noise. An exception to that is if they meet the criteria of the Boolean-valued
function isTransparent(C,p), which determines whether a pixel is to be classified as transparent
or “void”, according to the setup of the color legend C. Those “void” pixels are not
considered as either valid or invalid data and will be ignored during any successive
processing.
The classification condition, indicated with “≅” (almost equal), is used to represent the fact
that color classification is performed by using a tolerance threshold function. Most heatmaps
are parsed by using the three-dimensional RGB (Red, Green, Blue) color space for classification.
In this color space, two colors $c_1 = \{R_1, G_1, B_1\}$ and $c_2 = \{R_2, G_2, B_2\}$ are considered
equal for the purpose of classification when their Euclidean distance $d$ is less than a set
threshold $\varepsilon$:
$$ d = \sqrt{(R_1 - R_2)^2 + (G_1 - G_2)^2 + (B_1 - B_2)^2} < \varepsilon $$
Usually this threshold ε is set to 10, corresponding to a sphere of radius 10 in the RGB color space (assuming
8 bits or an intensity range of [0,255] per color channel), which proved to be a good all-around
empirical value after extensive testing. Using a threshold rather than an exact match
adds robustness against off-color and noisy graphics. It is also possible to use alternate
color spaces such as HSV (Hue, Saturation, Value), which make it easier to remove certain
kinds of semi-transparent geomarkers without further data losses, but make it more difficult
to discern between certain hues, so using them depends on the characteristics of the
heatmaps to process.
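The classification step and its tolerance threshold can be sketched in a few lines. This is an illustrative reading of the pseudocode, not AirMerge's implementation: the marker values, the explicit list of void colours (standing in for isTransparent) and the choice of the nearest legend colour when several lie within the tolerance are all assumptions.

```python
import math

GAP_MARKER = -1    # pixel matches no legend colour: treated as noise
VOID_MARKER = -2   # pixel belongs to an area not covered by the model


def classify_pixel(pixel, legend, void_colors=(), tolerance=10.0):
    """Return the legend index of `pixel`, or a gap/void marker."""
    # Void colours are checked first so that they are never counted as gaps.
    if any(math.dist(pixel, v) < tolerance for v in void_colors):
        return VOID_MARKER
    # Pick the closest legend colour and accept it only within the tolerance.
    index, distance = min(
        ((i, math.dist(pixel, c)) for i, c in enumerate(legend)),
        key=lambda item: item[1],
    )
    return index if distance < tolerance else GAP_MARKER


def classify_image(image, legend, void_colors=(), tolerance=10.0):
    """Classify every pixel of a row-major list of RGB rows."""
    return [[classify_pixel(p, legend, void_colors, tolerance) for p in row]
            for row in image]


legend = [(0, 0, 255), (0, 255, 0), (255, 0, 0)]
image = [
    [(0, 0, 255), (3, 250, 4), (0, 0, 0)],          # exact, near match, border line
    [(255, 255, 255), (250, 5, 5), (128, 128, 0)],  # void, near match, off-key
]
result = classify_image(image, legend, void_colors=[(255, 255, 255)])
print(result)  # [[0, 1, -1], [-2, 2, -1]]
```

The two near-match pixels lie at distances of about 7.1 and 8.7 from their legend colours, so they fall inside the radius-10 sphere, while the black border pixel and the off-key olive pixel become gaps for the cleanup step.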
2.13 Cleanup module
Before being stored in the database or subjected to further processing, the indexed data
generated from the image processing step is subjected to the gap-filling or cleanup procedure.
Its goal is to substitute all pixels marked as “gaps” with valid values, taken from the legend C.
The algorithms used to do this have been detailed in . The quality of the data recoverable
from heatmaps at the end of the cleanup procedure, compared to the original model’s data
before web publishing by data providers, has been explored in .
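To give an idea of what a gap-filling pass can look like, the sketch below replaces each gap with the most common valid class among its eight neighbours and repeats until no gaps can be filled. This is a generic stand-in chosen for illustration, not the algorithm used by AirMerge, whose details are given in the cited work; the marker values are likewise assumptions.

```python
from collections import Counter

GAP_MARKER = -1
VOID_MARKER = -2


def fill_gaps(grid, max_passes=10):
    """Replace gap cells by the most common valid value among their 8 neighbours."""
    rows, cols = len(grid), len(grid[0])
    grid = [row[:] for row in grid]            # work on a copy
    for _ in range(max_passes):
        updates = {}
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] != GAP_MARKER:
                    continue
                votes = Counter(
                    grid[rr][cc]
                    for rr in range(max(r - 1, 0), min(r + 2, rows))
                    for cc in range(max(c - 1, 0), min(c + 2, cols))
                    if grid[rr][cc] >= 0       # ignore gaps and void cells
                )
                if votes:
                    updates[(r, c)] = votes.most_common(1)[0][0]
        if not updates:                        # nothing left to fill
            break
        for (r, c), value in updates.items():
            grid[r][c] = value
    return grid


# A one-pixel-wide border line (gaps) crossing two concentration classes.
indexed = [
    [0, 0, -1, 1, 1],
    [0, 0, -1, 1, 1],
    [0, 0, -1, 1, -2],
]
filled = fill_gaps(indexed)
print(filled)
```

Void cells are neither filled nor used as votes, which matches the requirement that they count as neither valid nor invalid data. Ties between neighbouring classes are broken arbitrarily in this sketch.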
2.14 Database persistence module
After parsing and gap filling, the cleaned-up forecast data is stored in the form of
an indexed image, along with region, projection and color scale information which are derived
from the scripting-configuring system, according to the UML database schema shown in
Fig. 9.
This schema associates each harvested forecast with a unique “layerimage” entity, which is
the top entity in AirMerge’s database hierarchy, while other associated information such as
geographical region, legend, pollutant etc., can be reused and shared between multiple
layerimage entities through the “regionlayer” entity.
To illustrate, the regionlayer entity models all possible
parameter combinations and variants of a forecast that are available through a provider.
These variants can be numerous, but they are always finite in number and eventually tend
to repeat, and so are not unique. The layerimage entities, on the other hand, always have
unique timestamps, as they represent actual unique instances of a forecast issued in a
particular moment in time.
Since the schema exposes geographical coverage information for each regionlayer entity, it
is possible to select multiple layerimage entities covering the same or overlapping geographical
areas. It is also possible to filter layerimages according to time coverage as well as
recentness/relevance criteria. For example, for a given time of the day a forecast issued on that
same day will contain more reliable and up-to-date information than one issued 24 or 48 h
before, even if it nominally refers to the same time and day (the “represented time” attribute),
hence the most recent forecast will also be more relevant. This way, conflicts between multiple
forecasts covering the same time and space can be resolved.
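The recentness rule can be expressed compactly. The sketch below uses a hypothetical, heavily reduced stand-in for the layerimage entity (the field names are assumptions, not the actual schema of Fig. 9) and keeps, for each regionlayer, the forecast that was issued last for a given represented time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LayerImage:
    """Hypothetical, reduced view of a stored forecast instance."""
    regionlayer_id: int          # shared region/pollutant/legend description
    issued: datetime             # when the forecast was published
    represented_time: datetime   # the moment in time the forecast refers to


def most_relevant(images, represented_time):
    """Return the latest-issued forecast per regionlayer for a represented time."""
    best = {}
    for img in images:
        if img.represented_time != represented_time:
            continue
        current = best.get(img.regionlayer_id)
        if current is None or img.issued > current.issued:
            best[img.regionlayer_id] = img
    return best


target = datetime(2015, 4, 1, 12)
images = [
    LayerImage(1, datetime(2015, 3, 30, 0), target),   # issued two days earlier
    LayerImage(1, datetime(2015, 3, 31, 0), target),   # issued the day before
    LayerImage(1, datetime(2015, 4, 1, 0), target),    # issued on the same day
    LayerImage(2, datetime(2015, 3, 31, 0), target),   # a different regionlayer
]
selected = most_relevant(images, target)
print(selected[1].issued, selected[2].issued)
```

In a real deployment this selection would more naturally be a database query; the in-memory version is only meant to make the rule explicit.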
2.15 Geographical transformation modules
To avoid unnecessary data loss, AirMerge stores all forecasts in their original raster resolution
and map projection, without any permanent alterations. However, it is often necessary to
transform a forecast from one type of geographical projection to another, for example, during a
computation or visualization in a map projection other than its native one. For this reason,
AirMerge contains a variety of map projection and transformation modules, which can be
chained together to form a complete deprojection and reprojection workflow for any stored
forecast.
In general, a forecast is associated with up to four different projection rules:
• Input Pixel Projection: A projection from the forecast’s raster space (pixels) to the
associated transformed (linear) coordinates’ space.
• Input Geographical Projection: A projection from the transformed coordinates’ space to
a common geographical coordinates’ space.
• Output Geographical Projection: A projection from the forecast’s geographical coordinates’
space to a transformed linear coordinates’ space.
• Output Pixel Projection: A projection from the forecast’s transformed coordinates’ space
into an image’s or data array’s raster space (pixels).
Fig. 9 An exemplified version of AirMerge’s database schema, indicating how forecasts are stored and how
their metadata and auxiliary information can be retrieved. Primary keys are indicated with the PK note, foreign
keys with the FKn note
This apparent redundancy and complexity is required because there are three coordinate
spaces to consider:
• The m,n coordinate space of the forecast’s raster itself: $(m,n) \in \mathbb{N}^2$, also called “pixel
space”.
• The transformed x,y coordinate space of the forecast: $(x,y) \in \mathbb{R}^2$, also called “linear space”.
• The geographical λ,φ (longitude, latitude) coordinate space of the forecast: $(\lambda,\varphi) \in [-\pi,\pi] \times [-\pi/2,\pi/2]$,
also called “geo space”.
The linear space is an intermediate 2D real-valued space which appears when using non-
linear projections, e.g., conical, polar stereographic or spherical. Its relationship with the
pixel space is generally linear (hence the name) and straightforward, but its relationship
with actual geographical coordinates can be quite complex. An example of such a space is that of
UTM coordinates, which are expressed in km from a set point of origin, and which cannot be
converted to latitude and longitude with a simple linear relationship.
In practice, pixel and geo projections are used in pairs, masking the existence of the
transformed space, which therefore remains hidden and is not used directly in computations or
queries. This way, all computations and queries deal only in terms of pixels and
absolute geographical coordinates.
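The pairing of a pixel projection with a geographical projection can be sketched as follows. This is a minimal illustration under stated assumptions, not AirMerge's implementation: the class and function names are hypothetical, the non-linear step is a spherical Mercator projection chosen only because its formulas are compact, and the Earth radius of 6371 km is an assumed constant.

```python
import math
from dataclasses import dataclass

R = 6371.0  # assumed spherical Earth radius, in km


@dataclass(frozen=True)
class PixelProjection:
    """Linear map between pixel space (m, n) and linear space (x, y)."""
    x0: float  # x of the centre of pixel column 0
    y0: float  # y of the centre of pixel row 0
    dx: float  # x step per column
    dy: float  # y step per row (negative when rows run north to south)

    def to_linear(self, m, n):
        return self.x0 + m * self.dx, self.y0 + n * self.dy

    def to_pixel(self, x, y):
        return round((x - self.x0) / self.dx), round((y - self.y0) / self.dy)


class MercatorProjection:
    """Spherical Mercator between linear (x, y) in km and geo (lon, lat) in radians."""

    def to_geo(self, x, y):
        lon = x / R
        lat = 2.0 * math.atan(math.exp(y / R)) - math.pi / 2.0
        return lon, lat

    def to_linear(self, lon, lat):
        x = R * lon
        y = R * math.log(math.tan(math.pi / 4.0 + lat / 2.0))
        return x, y


def pixel_to_geo(pixel_proj, geo_proj, m, n):
    """Input direction: pixel space -> (hidden) linear space -> geo space."""
    return geo_proj.to_geo(*pixel_proj.to_linear(m, n))


def geo_to_pixel(pixel_proj, geo_proj, lon, lat):
    """Output direction: geo space -> (hidden) linear space -> pixel space."""
    return pixel_proj.to_pixel(*geo_proj.to_linear(lon, lat))
```

Callers of `pixel_to_geo` and `geo_to_pixel` never handle the intermediate x, y values, which mirrors the way the linear space stays hidden behind the paired projections.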
In Fig. 10, the typical transformation workflow is illustrated. During input, pixel or raster
data is deprojected to the geographical coordinate space. The input/deprojection function is
actually performed in two steps, but as far as AirMerge is concerned, the net result is a
conversion from pixels to geographical coordinates. This allows queries about single geo-
graphical locations or areas to be expressed in intuitive, broadly supported
geographical coordinates, while the deprojection and reprojection mechanisms take care of the
complex transformations required. In Fig. 11, an example of a cleaned-up and projection-
corrected heatmap is shown.
Each type of projection function used in AirMerge is also implemented to be fully
reversible, so geographical coordinates can be converted back into an array of
pixels or other types of rasterized data. This array may be of the same type and size as the
original input (in which case the processing forms a closed loop), or of a different type
(for example, visualizing several different types of input heatmaps in a common map
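The conversion of a raster into a different output raster by way of geographical coordinates can be sketched as below. This is only an illustration under simplifying assumptions: both grids are regular longitude/latitude grids (so the projection step is trivial), sampling is nearest-neighbour, and all names (`make_grid`, `reproject`, the dictionary keys) are hypothetical.

```python
import math


def make_grid(lon0, lat0, step, width, height):
    """Describe a regular lon/lat grid: top-left pixel centre, step in degrees, size."""
    return {"lon0": lon0, "lat0": lat0, "step": step, "width": width, "height": height}


def pixel_to_geo(grid, m, n):
    # Rows are assumed to run from north to south.
    return grid["lon0"] + m * grid["step"], grid["lat0"] - n * grid["step"]


def geo_to_pixel(grid, lon, lat):
    """Nearest pixel of the grid for a geo coordinate, or None when outside it."""
    m = math.floor((lon - grid["lon0"]) / grid["step"] + 0.5)
    n = math.floor((grid["lat0"] - lat) / grid["step"] + 0.5)
    if 0 <= m < grid["width"] and 0 <= n < grid["height"]:
        return m, n
    return None


def reproject(values, src, dst, nodata=None):
    """Resample a row-major raster from grid src into grid dst (nearest neighbour)."""
    out = []
    for n in range(dst["height"]):
        row = []
        for m in range(dst["width"]):
            # Output pixel -> geo coordinate -> input pixel.
            hit = geo_to_pixel(src, *pixel_to_geo(dst, m, n))
            row.append(values[hit[1]][hit[0]] if hit else nodata)
        out.append(row)
    return out
```

Passing the same grid as `src` and `dst` reproduces the input unchanged, which corresponds to the closed-loop case; a different `dst` grid yields a raster of another size or extent, with `nodata` where the source has no coverage.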
The map projections in AirMerge are implemented in custom code, which
allows for a more lightweight codebase, direct control and less dependence on external
frameworks. Nevertheless, it is still possible to use adapters to external map transformation
engines, such as the one in GeoTools, in order to expedite the integration of new map
2.16 Forecast ensembling module
In addition to retrieving, parsing and storing heatmap data, AirMerge can also perform
several mathematical and statistical operations on groups of two or more layers, retrieved
from different forecasts. Those operations are very common in the domain of ensemble
forecasting, and find applications in forecast modeling refinement and big data
In order to perform such operations, multiple heterogeneous layers must first
be translated to a common coordinate system and reference grid. This ability of AirMerge is
displayed in Fig. 12, where three forecasts from different providers are averaged into one
composite forecast, with differences in scale and resolution leveled in the process.
AirMerge can perform these operations either on a point-to-point basis (for a specific geo-
graphical coordinate), or on a geographically bounded area basis.
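The area-based averaging of layers with different resolutions can be sketched as follows. This is a simplified illustration, not the AirMerge algorithm: the layer dictionaries and function names are hypothetical, the layers are assumed to already share units and to lie on regular longitude/latitude grids, and each cell of the common grid simply takes the mean of the layers that cover its centre.

```python
def sample(layer, lon, lat):
    """Value of the layer cell containing (lon, lat), or None outside its coverage.

    A layer is a dict with the west edge 'lon0', the north edge 'lat0',
    the cell size 'step' (degrees) and row-major 'values' (north to south).
    """
    m = int((lon - layer["lon0"]) // layer["step"])
    n = int((layer["lat0"] - lat) // layer["step"])
    rows = layer["values"]
    if 0 <= n < len(rows) and 0 <= m < len(rows[n]):
        return rows[n][m]
    return None


def ensemble_mean(layers, west, north, step, width, height):
    """Average all layers on a common grid; uncovered cells become None."""
    grid = []
    for n in range(height):
        row = []
        for m in range(width):
            lon = west + (m + 0.5) * step   # centre of the common-grid cell
            lat = north - (n + 0.5) * step
            found = [v for v in (sample(layer, lon, lat) for layer in layers)
                     if v is not None]
            row.append(sum(found) / len(found) if found else None)
        grid.append(row)
    return grid
```

Calling `ensemble_mean` with a 1 × 1 common grid amounts to a point query, so the same sketch covers both the point-to-point and the bounded-area case.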
Fig. 10 Representation of de- and re-projection workflow
Fig. 11 Cleanup and reprojection of a heatmap using a conical map projection
2.17 AirMerge public API
In order to make AirMerge’s harvested heatmaps and data available to third-party services and
researchers, a public API was designed, available as a RESTful web service that responds to
HTTP GET methods. Currently the API is in a testing phase, and is unauthenticated and
Its original purpose was to allow interfacing with the PESCaDO node orchestration service,
allowing it to request chemical weather data for specific geographical coordinates (point
queries) from one or multiple layers, and to composite multiple source layers
into a single result, according to the principles of ensemble forecasting.
2.18 AirMerge visualization module
The visualization module is not considered an essential AirMerge component, as the system can
continue functioning even without it. It is, however, a convenient way of accessing the currently
stored layers of CWF data and visualizing them in a harmonized way, on a common Google
Maps background. This module was used during development and is currently not actively
maintained, as its replacement with an OGC WMS-based solution is planned.
2.19 Use of AirMerge in the PESCaDO project
The PESCaDO service system has been developed with a purpose closely related
to that of AirMerge: discovering new environmental data sources on
the Web and integrating them in a centralized repository.
In contrast to AirMerge, however, emphasis has been placed on the automatic discovery,
retrieval and classification of informational nodes, including elements of Machine Learning
and ontological data organization, while allowing for extensions through auxiliary external
functionality. In this context, AirMerge has been interfaced to PESCaDO in two distinct
roles: as an environmental data node, and as a provider of forecast
ensembling and fusion.
Fig. 12 Heatmap retrieval (A), cleanup (B), reprojection and combination (C) (ensembling) workflow
In PESCaDO, the concept of an environmental data node encompasses every kind of usable
online data source, adopting a philosophy similar to that of AirMerge, but focused on sources
broader than heatmaps (i.e., websites providing weather, air quality and pollen forecasts and
Normally, PESCaDO includes data discovery and fetching mechanisms designed to
operate on readable text contents, data feeds, air quality bulletins and, more generally, on
textual web resources and websites, inferring context and contents by the use of semantic-
ontological text analysis techniques.
However, in the early design phases of the PESCaDO system, it was realized that important
environmental information is also included in non-textual data (e.g., heatmaps), and therefore
having access to AirMerge’s mechanisms and its already-harvested data would be
For non-textual sources, such as heatmaps, PESCaDO can make use of specialized
connectors that allow it either to access pre-organized external databases, or to invoke specialized
computation and processing modules, thus adding non-textual resources
to its knowledge database.
AirMerge has found use within the PESCaDO project in three separate ways. First, access
to its already harvested data was granted through the AirMerge API, whose initial develop-
ment was stimulated and shaped precisely by the needs of PESCaDO. Using the AirMerge
API, the PESCaDO service was capable of performing point-based queries all over Europe,
receiving precise numerical responses.
Second, AirMerge performs a type of localized, point-based ensembling when a
particular geographic location is covered by more than one CWF provider, and also
produces an uncertainty metric for the final reported result. This allows PESCaDO not only
to supply its users with numerical, rather than qualitative information, but also to provide an
estimate of the underlying data’s reliability and precision. This is achieved by keeping the
AirMerge platform running as usual, with the PESCaDO platform performing its queries
remotely through the API, without any implementation details of the two platforms being
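The point-based ensembling with an accompanying uncertainty estimate can be illustrated with a short sketch. The text does not specify which uncertainty metric AirMerge reports; the sample standard deviation across providers is used here purely as an assumption, and the function name `point_ensemble` is hypothetical.

```python
import statistics


def point_ensemble(values):
    """Combine per-provider values at one location into (estimate, spread).

    None entries stand for providers that do not cover the location.
    The spread is the sample standard deviation (an assumed metric),
    or None when fewer than two providers contribute.
    """
    covered = [v for v in values if v is not None]
    if not covered:
        return None, None
    estimate = statistics.fmean(covered)
    spread = statistics.stdev(covered) if len(covered) > 1 else None
    return estimate, spread
```

For three providers reporting 10.0, 14.0 and 12.0 at a location, and a fourth not covering it, this sketch returns an estimate of 12.0 with a spread of 2.0.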
A third and more direct involvement of AirMerge in the PESCaDO platform was achieved
by the almost direct reuse of AirMerge’s heatmap parsing component by part of PESCaDO. This
component can be used autonomously, and even off-line, provided a suitable configuration
script is supplied, describing how to parse a specific heatmap. The configuration script can be
generated either manually or automatically using a dedicated annotation tool, and is
similar to the examples already shown.
The component utilized by PESCaDO is a stripped-down version of the parsing subsystem.
It utilizes the same configuration scripting as the AirMerge system, minus some features such as a
URL sequencer and harvester, support for provider-specific configurations and multiple map
regions, and in general without any features meant to process sequences of similar heatmaps.
The scripting language is instead reduced to describing how to parse a single specific heatmap,
rather than an entire class of heatmaps bound by some common characteristics.
While this may seem restrictive at first, it is actually a customization that fits
PESCaDO’s semi-automatic heatmap configuration subsystem, which attempts to automati-
cally determine the characteristics of a heatmap (resolution, geographical bounds, color
scale and values, etc.) using OCR and text processing techniques. This information is then
validated and further edited by an administrative user with the aid of a dedicated Annotation
Tool, and is used to generate an AirMerge heatmap parsing XML script,
which is fed to the AirMerge parsing component according to the scheme in Fig. 13, where
the AirMerge component is indicated as the “Heatmap Processing” block.
In the future, this same subsystem can be made publicly available via WPS, thus eliminat-
ing the need to provide an implementation library to use in situ.
2.20 Conclusions and future developments
In order to design an information system that uses heatmaps as its input and produces high
quality environmental information as its output, precise knowledge of the heatmaps’ structure
and characteristics is required, as well as a streamlined and coordinated process for
data retrieval, handling, information extraction and system operation.
AirMerge was designed from the ground up according to this knowledge, adopting a
bottom-up approach and following a results-oriented development strategy, in order to deal
with any sub-problems encountered when treating heatmaps. The AirMerge system is not meant
to be static; it evolves and is upgraded along with changes in the CWF publishing scene
where it is currently applied. New providers, heatmap formats and map projections are
added as necessary, while changes in current providers’ publishing patterns are followed.
AirMerge focuses on daily updates starting from published heatmaps, rather than one-off
exchanges of historical model or station data, though AirMerge can also function as a historical
repository. AirMerge can be considered as filling a niche between long-term historical and
statistical presentation of regional air quality data, and short term CWF, allowing for its
database to grow as new CWF are published.
AirMerge has been evaluated in the roles of a CWF data repository and of a supplier of
specialized chemical weather processing services, both as a standalone research tool and as
a component of a third-party value-added service (PESCaDO).
To make AirMerge more interoperable and more readily accessible to
third parties, as well as more readily usable as a base for building CWF-related
services, the implementation of Open Geospatial Consortium (OGC) standards is being considered, to
work alongside or even entirely supersede the custom AirMerge API for most tasks.
In particular, visualization of harvested heatmaps could be performed through the OGC
Web Map Service, while downloadable numerical data could be supplied through on-the-fly
generation of NetCDF files or other suitable formats by an OGC Web Coverage Service. In
addition, certain extra processing functions offered through the AirMerge API could be better
exposed as OGC WPS (Web Processing Service) processlets.
In general, future efforts will be directed at making AirMerge more accessible to third
parties through the use of well-established GIS standards, rather than through custom access
and visualization interfaces, in order to turn AirMerge into an attractive, standards-compliant
and solid foundation for the development of CWF-related web services.
Fig. 13 Overall heatmap content distillation architecture in PESCaDO
Acknowledgments AirMerge was developed in the frame of COST Action ES0602, and was financially
supported by the FMI during the years 2010–2012 and co-funded by the PESCaDO project during 2012–
2013. This publication was supported by the “IKY Fellowships of Excellence for Postgraduate Studies in
Greece - Siemens Program” at the time of writing.
1. Aalto A. (2012) Scalability of Complex Event Processing as a part of a distributed Enterprise Service Bus.
[Internet]. Espoo, Finland [cited 2014 Jul 31]. Available from: http://www.cleen.fi/en/SitePages/Public%
2. Armenakis C, Savopol F (2014) Image processing and GIS tools for feature and change extraction. In: Proc.
of the XXth ISPRS Congress. Istanbul, p. 605–610
3. Balk T, Kukkonen J, Karatzas K, Bassoukos A, Epitropou V (2011) A European open access chemical
weather forecasting portal. Atmos Environ 45:6917–6922
4. Bassoukos A (2013) AirMerge Remote API Overview. [Internet]. [cited 2014 Jul 31]. Available from: https://
5. Epitropou V, Johansson L, Karatzas K, Bassoukos A, Karppinen A, Kukkonen J, Haakana M. (2012) Fusion
Of Environmental Information For The Delivery Of Orchestrated Services For The Atmospheric
Environment In The PESCaDO Project. [Internet]. Leipzig, Germany [cited 2014 Aug 4]. Available from:
6. Epitropou V, Karatzas K, Karppinen A, Kukkonen J, Bassoukos A (2012) Orchestration services for
chemical weather forecasting models in the frame of the PESCaDO project. In: 8th International
Conference on Air Quality—Science and Application; Athens, 19–23
7. Epitropou V, Karatzas K, Kukkonen J, Vira J (2012) Evaluation of the accuracy of an inverse image-based
reconstruction method for chemical weather data. Int J Artif Intell 9(12):152–171
8. Epitropou V, Karatzas K, Bassoukos A (2010) A method for the inverse reconstruction of environmental data
applicable at the Chemical Weather portal. In: Proceedings of the GI-Forum Symposium and exhibit on
applied Geoinformatics; p. 58–68
9. European Earth Observation Programme (2012) PASODOBLE project. MyAir PASODOBLE project
homepage. [Internet]. [cited 2014 Aug 4]. Available from: http://www.myair.eu/airsheds/
10. European Environment Agency (2014) AirBase - The European air quality database. [Internet]. [cited 2014
Aug 4]. Available from: http://www.eea.europa.eu/data-and-maps/data/airbase-the-european-air-quality-
11. Galmarini S, Kioutsoukis I, Solazzo E (2013) E pluribus unum: ensemble air quality predictions. Atmos
Chem Phys 29:7153–7182
12. Horálek J, Tarrasón L, de Smet P, Malherbe L, Schneider P, Ung A, Corbet L, Denby B (2013) Evaluation of
copernicus MACC-II ensemble products in the ETC./ACM spatial air quality mapping. Technical Paper
2013/9. European Topic Centre on Air Pollution and Climate Change Mitigation
13. Karatzas K, Kukkonen J (2009) COST Action ES0602: Quality of life information services towards a
sustainable society for the atmospheric environment. Sofia Publishers, Thessaloniki
14. Karatzas K, Kukkonen J, Bassoukos A, Epitropou V, Balk T (2011) A European chemical weather
forecasting portal. In: Steyn GD, Trini SC (eds). Air pollution modeling and its applications XXI.
Springer, NATO Science for Peace and Security Series C: Environmental Security: p. 239–243
15. Khan FH, Javed MY, Bashir S, Khan A, Sikandar M, Khiyal H (2010) QoS based dynamic web services
composition & execution. Int J Comput Sci Inf Secur
16. Kukkonen J, Olsson T, Schultz DM, Baklanov A, Klein T, Miranda AI, Monteiro A, Hirtl M, Tarvainen V,
Boy M et al (2012) A review of operational, regional-scale, chemical weather forecasting models in Europe.
17. Moumtzidou A, Epitropou V, Vrochidis S, Karatzas K, Voth S, Bassoukos A, Mossgraber J, Karppinen A,
Kukkonen J, Kompatsiaris I (2014) A model for environmental data extraction from multimedia and its
evaluation against various chemical weather forecasting datasets. J Ecol Inf 23:69–82
18. Moumtzidou A, Epitropou V, Vrochidis S, Voth S, Bassoukos A, Karatzas K, Moßgraber J, Kompatsiaris I,
Karppinen A, Kukkonen J (2012) Environmental data extraction from multimedia resources. In: Proceedings
of the 1st ACM international workshop on Multimedia analysis for ecological data (MAED 2012); Nara,
Japan. p. 13–18
19. Open Geospatial Consortium (2013) OGC best practice for using web map services (WMS) with time-
dependent or elevation-dependent data [Internet]. [cited 2014 Jul 30]. Available from: http://external.
20. OSGeo Foundation (2014) GeoTools Infosheet. [Internet]. [cited 2014 Jul 31]. Available from: http://www.
21. Potempski S, Galmarini S (2009) Est modus in rebus: analytical properties of multi-model ensembles. Atmos
Chem Phys 9(24):9471–9489
22. Riga M, Karatzas K (2014) Investigating the relationship between social media content and real-time
observations for urban air quality and public health. In: Proceedings of the 4th International Conference
on Web Intelligence, Mining and Semantics (WIMS ‘14); New York, USA. p. 59:1–7
23. Snyder J (1987) Map Projections—A Working Manual. Professional Paper: 1395. USGS Publications
24. Snyder J (1989) An album of map projections. Professional Paper: 1453. USGS Publications Warehouse
25. Sofiev M, Siljamo P, Valkama I, Ilvonen M, Kukkonen J (2006) A dispersion modelling system SILAM and
its evaluation against ETEX data. Atmos Environ 44(4):674–685
26. Verstraete MM, Pinty B (2013) Environmental information extraction from satellite remote sensing data. In:
Kasibhatla P (ed) Inverse methods in global biogeochemical cycles, vol 1., pp 125–137
27. Vrochidis S, Epitropou V, Bassoukos A, Voth S, Karatzas K, Moumtzidou A, Moßgraber J, Kompatsiaris I,
Karppinen A, Kukkonen J (2012) Extraction of environmental data from on-line environmental information
sources. In: IFIP Advances in Information and Communication Technology; p. 361–370
28. Wanner L, Vrochidis S, Rospocher M, Mossgraber J, Bosch H, Karppinen A, Myllynen M, Tonelli S,
Bouayad-Agha N, Bugel U, et al (2012) Personalized environmental service orchestration for quality of life
improvement. In: artificial intelligence applications and innovations, IFIP advances in information and
communication technology, 3rd Intelligent Systems for Quality of Life information Services (ISQL 2012);
Halkidiki, Greece. p. 351–360
Victor Epitropou has received his MEng degree in Electrical Engineering and Computer Science from the
Democritus University of Thrace (DUTh) in 2007 with a thesis on the parallelization of algorithms in an object-
oriented context, and his MSc degree in Digital Image and Signal processing from the DUTh in 2011, after
serving in the Hellenic Army as a reserve officer of the Research & IT corps from 2007 to 2009. He has worked
at the Informatics Systems & Applications Group of the Aristotle University of Thessaloniki since late 2009 on
several European research projects in the domain of air quality and personalized web services, and is currently a
PhD candidate at the Aristotle University of Thessaloniki. His research and work interests also include embedded
software development, neural networks, wireless sensor networks, and desktop applications development.
Tassos Bassoukos is a Senior Software Engineer with the Informatics Systems & Applications Group of the
Aristotle University of Thessaloniki, handling the group’s Software Engineering needs. He has been carrying the
main software development burden of the group for several years. He has been working with Web technologies
since HTML 2.0 and Java web applications since 1999. He has participated in several European research projects
and has seen them to successful completion. Additional areas of professional interest include Open Source
frameworks, Web-based Content management systems, inter-language integration, web mapping systems,
information representation and Human-Computer Interfaces.
Kostas Karatzas is an Asst. Professor for Informatics Systems & Applications at the Dept. of Mechanical
Engineering, Aristotle University of Thessaloniki (AUTh), Greece, where he leads the Informatics Applications
and Systems Group (ISAG). The Group is specialized in Environmental Informatics, and is conducting
data-oriented analysis and modelling at a raw, processed, and e-service level, for citizens, authorities and industry,
employing Computational Intelligence. ISAG has a long expertise in web-based applications, web portals and
services, as well as in applications for mobile devices and smart phones, with an emphasis on participatory
environmental sensing. Kostas Karatzas is the author and co-author of more than 150 scientific publications, and
is serving as a scientific committee member for international conferences and as an advisory board member of
international publications, with emphasis on Environmental Informatics and Computational Intelligence.
Dr. Ari Karppinen has worked as a research scientist at the Finnish Meteorological Institute since 1984. His
expertise is in mathematical modeling and atmospheric physics and chemistry, particularly the evaluation of urban air
quality and the dispersion of pollution from traffic. His MSc thesis (1987) dealt with the description and application
of a system for calculating radiation doses due to long range transport of radioactive releases, and his Licentiate’s
thesis (1998) studied the effective choice of NOx emission control measures. His doctoral thesis (2001) dealt
with the meteorological pre-processing and atmospheric dispersion modeling of urban air quality and applica-
tions in the Helsinki metropolitan area. Doc. Karppinen is the author of approximately 200 scientific publications;
37 of these in refereed international journals. He has given over 50 lectures and presentations at scientific
Leo Wanner is an ICREA Research Professor in the Department of Information and Communication Technol-
ogies at Universitat Pompeu Fabra (UPF). He earned his Diploma degree in Computer Science from the
University of Karlsruhe, Germany, and his Ph.D. in Linguistics from the University of The Saarland,
Saarbrücken, Germany. Leo works in the field of computational linguistics. His research foci include automatic
multilingual report generation, automatic summarization of written material and paraphrasing, computational
lexicology and lexicography. Throughout his career, Leo has been involved in various large-scale national,
European, and transatlantic research projects. He has published five books and over 100 refereed journal and
Stefanos Vrochidis received the Diploma degree in electrical engineering from Aristotle University
of Thessaloniki, the MSc degree in radio frequency communication systems from University of Southampton,
and the PhD degree in electronic engineering from Queen Mary, University of London. He is a postdoctoral
researcher with ITI-CERTH. His research interests include semantic multimedia analysis, indexing and retrieval,
semantic search, multimedia search engines and human interaction, as well as environmental applications and
Ioannis Kompatsiaris received the Diploma degree in electrical engineering and the Ph.D. degree in 3-D model
based image sequence coding from Aristotle University of Thessaloniki in 1996 and 2001, respectively. He is a
Senior Researcher with ITI-CERTH and director of its Multimedia Knowledge Laboratory. His research interests
include multimedia content processing, multimodal techniques, multimedia and Semantic Web, multimedia
ontologies, knowledge-based analysis, and context aware inference for semantic multimedia analysis,
personalisation and retrieval.
Jaakko Kukkonen received the Ph.D. degree in physics from the University of Helsinki in 1990. He is currently
Research Professor at the Finnish Meteorological Institute. He is also Docent (Adj. Prof.) of Physics at the
University of Helsinki and Visiting Professor at the University of Hertfordshire (U.K.). He has worked on
atmospheric physics and chemistry, including especially the development, evaluation and applications of
mathematical atmospheric models.