
Extraction of Environmental Data from On-Line Environmental Information Sources


L. Iliadis et al. (Eds.): AIAI 2012 Workshops, IFIP AICT 382, pp. 361–370, 2012.
© IFIP International Federation for Information Processing 2012
Stefanos Vrochidis1, Victor Epitropou2, Anastasios Bassoukos2, Sascha Voth3,
Kostas Karatzas2, Anastasia Moumtzidou1, Jürgen Moßgraber3,
Ioannis Kompatsiaris1, Ari Karppinen4, and Jaakko Kukkonen4
1 Centre for Research and Technology Hellas, Informatics and Telematics Institute
2 Informatics Systems and Applications Group, Aristotle University of Thessaloniki
3 Fraunhofer Institute of Optronics, System Technologies and Image Exploitation
4 Finnish Meteorological Institute, Helsinki
{stefanos,moumtzid,ikom}@iti.gr,
{vepitrop,abas}@isag.meng.auth.gr,
{sascha.voth,juergen.mossgraber}@iosb.fraunhofer.de,
{ari.karppinen,jaakko.kukkonen}@fmi.fi, kkara@eng.auth.gr
Abstract. Analysis of environmental information is considered of utmost
importance for humans, since environmental conditions are strongly related to
health issues and to a variety of everyday activities. Although many free
on-line services already provide environmental information, in several cases
the presentation format complicates the extraction and processing of such data.
A characteristic example is air quality forecasts, which are usually encoded in
image maps of heterogeneous formats, while the underlying numerical pollutant
concentrations, calculated and predicted by a relevant model, remain unavailable.
This work addresses the semi-automatic extraction of such information based on a
template configuration tool, on methodologies for data reconstruction from
images, and on Optical Character Recognition (OCR) techniques. The framework is
tested with a number of air quality forecast heatmaps, demonstrating
satisfactory results.
Keywords: Environmental, air quality, heatmap, image processing, OCR, data
reconstruction, template configuration.
1 Introduction
Analysis of environmental information is considered of utmost importance for the human
population, as it is strongly related to health issues (e.g. cardiovascular diseases), as
well as to a variety of important activities (e.g. agriculture). In everyday life, the
conditions of the atmospheric environment, in terms of air quality, weather, and pollen
measurements and forecasts, are also of particular interest for outdoor activities
(e.g. trip planning), and therefore they strongly affect the quality of life.
Nowadays, the main sources of such information for the everyday user are web
portals and sites. To support people in planning everyday activities with respect to
environmental conditions, we need to provide them with services that combine
complementary environmental information from several sources, in order to
generate more reliable environmental measurements. The first step in this
direction is the extraction of data from environmental resources. In practice, only a
few data providers offer some means of access to their actual (numerical) forecast
data. In this context, this paper addresses the semi-automatic extraction of air
quality forecasts from heatmap images.
After studying a number of on-line chemical weather forecasts by various providers
[1], we observed that air quality information is most often presented in the form of
images representing forecast pollutant concentrations over a geographically bounded
region, typically as maximum or average air pollution concentration values for
the reference time scale, which is usually an hour or a day [2], [3], [4], [5], [6]. These
providers present their air quality forecasts almost exclusively as preprocessed
images with a color index scale indicating pollutant concentrations.
In addition, each provider arbitrarily chooses the image resolution, the color
scale and color depth used to visualize pollution loadings, the covered region, and
the geographical map projection. The actual mode of presentation varies from
simple web images to more elaborate AJAX, Java or Adobe Flash viewers [7]. While
this representation is informative for the casual user (e.g. compared to a table of
numerical values), its drawback is that the data are presented in a wide range
of highly heterogeneous forms, which makes it very complicated to extract and compare
the results. To make matters worse, some of the images are permanently marked with visible
watermarks, text, lines etc., which make the extraction phase even more challenging.
To address this challenge, we propose a semi-automatic framework for
extracting air quality information from such images and storing it in a numerical
format. The proposed system is based on an annotation tool, which helps an
administrative user generate a configuration template for each heatmap, and on
Optical Character Recognition (OCR) techniques for text information extraction. The
basic functionality of the system (i.e. the information extraction from heatmaps) is
based on AirMerge [4], [6], [8], [9], a system that allows for the automatic harvesting,
annotation, harmonization and reverse engineering of heatmaps, in order to produce
easily deployable numerical values of chemical weather forecasts.
The contribution of this paper is the methodology and the framework for user-
assisted air quality information extraction from heatmaps, which extends previous
works (i.e. AirMerge) by further adding OCR techniques, as well as allowing user
configuration with the aid of a dedicated graphical user interface. More specifically,
we propose a framework, which is based on a novel heatmap Annotation Tool (AnT),
on the application and optimization of OCR techniques for textual information
extraction from heatmap images and on the AirMerge tool [4] for image processing.
This paper is structured as follows: section 2 presents the related work, while
section 3 describes the framework architecture. Section 4 presents the Annotation
Tool, section 5 the OCR techniques and section 6 the AirMerge system. The results
are presented in section 7 and finally, section 8 concludes the paper.
2 Related Work
Existing maps can be grouped into map types based on the placement and
presentation of their information. Discriminating factors between map types can be
found in their scale, colorization, quality, accuracy, topology and many other aspects.
In the case of air quality (or chemical weather) maps, there are mainly two types of
information covered by the map data: a) geographical information: points and lines
describing country borders or other well-known points of interest or structures (e.g.
sea, land, capitals) in a given coordinate system; b) color information: measured data
of any kind (e.g. average temperature), which are encoded via a color scale representing
the measured values. Single values are referenced geographically by a color value at
the corresponding geographical point. Chemical weather maps often use this type of
map, called a raster map or heatmap, to represent measured data. There are
several approaches to extract and digitize this image information automatically.
First, the authors in [10] describe the vectorization of digital image
data. Here, the geographical information in the form of lines is extracted and
converted to storable digital vector data; only the lines are processed. The work in
[11] exploits the known colorization scheme of USGS maps to segment these maps
automatically based on their semantic contents (e.g. roads, rivers). In [12], the
segmentation quality of text and graphics in color map images is improved in order
to enhance the results of subsequent analysis processes (e.g. OCR): black or dark
pixels are selected from color maps and cleaned of possible errors or known unwanted
structures (e.g. dashed lines), yielding cleaner text structures.
Although research work has been conducted towards the automatic extraction of
information from maps, very few works address the automatic extraction of information
from chemical weather maps. In [4], [6], [8], a method to reconstruct
environmental data from chemical weather images is proposed. In a first step, the
relevant map section is scraped from the chemical weather image. After that,
disturbances (e.g. country borders) are removed and a color classification is employed
to classify every single data point (pixel) in order to recover the measured data. With
the aid of the known geographical boundaries, given by the coordinate axes and the map
projection type, the geographical position of each measured data point can be retrieved.
In the case of missing data points, a special interpolation algorithm is used to fill these
gaps.
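The gap-filling step can be illustrated with a deliberately simple stand-in. This is not the interpolation algorithm of [4], [6], [8]. The sketch below assumes that the scraped map is a 2D grid of concentration values in which missing pixels are marked `None`. Each pass fills every gap cell that touches a known value with the mean of its valid 4-neighbours. `fill_gaps` and `max_passes` are hypothetical names.

```python
from typing import List, Optional

Grid = List[List[Optional[float]]]


def fill_gaps(grid: Grid, max_passes: int = 100) -> Grid:
    """Fill None cells with the mean of their valid 4-neighbours.

    Gaps are filled from their borders inwards: each pass fills every gap
    cell that touches at least one known value, and passes repeat until
    nothing changes or max_passes is reached. The input is not modified.
    """
    out = [row[:] for row in grid]
    rows = len(out)
    cols = len(out[0]) if rows else 0
    for _ in range(max_passes):
        updates = {}
        for r in range(rows):
            for c in range(cols):
                if out[r][c] is not None:
                    continue
                known = [
                    out[nr][nc]
                    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= nr < rows and 0 <= nc < cols and out[nr][nc] is not None
                ]
                if known:
                    updates[(r, c)] = sum(known) / len(known)
        if not updates:
            break
        # Apply the whole pass at once so results do not depend on scan order.
        for (r, c), value in updates.items():
            out[r][c] = value
    return out


# A 3x3 patch whose centre pixel was hidden by, e.g., a border line.
patch = [[10.0, 12.0, 14.0],
         [11.0, None, 15.0],
         [12.0, 14.0, 16.0]]
print(fill_gaps(patch)[1][1])  # mean of 12.0, 14.0, 11.0, 15.0 -> 13.0
```

Updating cells in whole passes, rather than in place, keeps the result independent of the scan direction. The cost is a few extra iterations for wide gaps.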
The proposed work goes one step beyond the aforementioned heatmap extraction
methods, since it introduces a configurable, user-assisted environment, which
facilitates the application of the framework to new heatmaps without requiring
programming skills or low-level configuration on the user’s side.
3 Framework Architecture
The architecture of the proposed framework is illustrated in Figure 1 and includes two
main components: the Annotation Tool and the data extraction service.
The first phase, called “Template Configuration” (123), includes the manual
annotation of an image with the AnT and the generation of a configuration file. This
process is controlled by an administrative user with the aid of AnT. The second phase,
“Data extraction” (1+345), uses the configuration file to extract data from the
specific heatmap. During this phase, the parts of each image are
analyzed using image and text processing techniques. Specifically, the heatmap is
processed with the AirMerge system, while the text information located in the image
is extracted and processed using OCR techniques and text processing.
Fig. 1. Air quality data extraction framework
4 Annotation Tool
The Annotation Tool (AnT) is used to annotate heatmaps interactively; it was
developed to make the annotation process easier for the user.
To make the tool platform independent, the Qt Framework1 was used. The
implementation follows the MVC (Model/View/Controller) pattern to ensure
its extensibility. To allow for different interaction possibilities, two views were
implemented: first, a simple Tree View, which represents the XML structure and its
entries as a traversable tree, and second, a window, which represents the selected tree
data graphically. Regions of Interest (ROIs) and Points of Interest (POIs) are drawn onto
this window.
Figure 2 depicts the AnT tool after a heatmap from the GEMS project2 site has been loaded.
The air quality heatmaps on this site are typical examples of images used for
representing chemical weather forecasts. The left part of the tool contains the heatmap
as well as the ROIs, which are the following: a) the map itself, b) the x and y axes
related to the heatmap, c) the color scale, d) the numbers corresponding to the color
scale and e) the title of the heatmap. The ROIs are depicted as red bounding boxes
and are defined by the user. Finally, their values are recorded in the right part of the
AnT, inside the XML template under the corresponding nodes.
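The paper does not specify the schema of the XML template, so the following is only a sketch under an assumed layout. It uses one hypothetical `<roi>` element per region of interest, with pixel-based `x`, `y`, `width` and `height` attributes. It shows how a downstream step could read the five kinds of ROI listed above (map, axes, color scale, scale numbers, title) with Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict

# Hypothetical template; element and attribute names are assumptions,
# not the actual AnT schema.
TEMPLATE = """
<heatmap provider="example">
  <roi name="map"          x="40"  y="60"  width="500" height="400"/>
  <roi name="x_axis"       x="40"  y="462" width="500" height="20"/>
  <roi name="y_axis"       x="5"   y="60"  width="33"  height="400"/>
  <roi name="color_scale"  x="560" y="60"  width="20"  height="400"/>
  <roi name="scale_values" x="584" y="60"  width="40"  height="400"/>
  <roi name="title"        x="40"  y="10"  width="500" height="30"/>
</heatmap>
"""


@dataclass(frozen=True)
class ROI:
    x: int       # left edge in pixels
    y: int       # top edge in pixels
    width: int
    height: int

    def box(self):
        """(left, upper, right, lower), the box order expected by Pillow's Image.crop."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


def load_rois(xml_text: str) -> Dict[str, ROI]:
    """Parse every <roi> element into an ROI keyed by its name attribute."""
    root = ET.fromstring(xml_text.strip())
    return {
        node.get("name"): ROI(*(int(node.get(k)) for k in ("x", "y", "width", "height")))
        for node in root.iter("roi")
    }


rois = load_rois(TEMPLATE)
print(sorted(rois))
print(rois["map"].box())  # (40, 60, 540, 460)
```

Keeping the template as plain data means that supporting a new provider only requires annotating a new file, which is the point of the configuration phase.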
1 http://qt.nokia.com/products/
2 http://gems.ecmwf.int/d/products/raq/
Fig. 2. Screenshot of Image Annotation tool
5 OCR Techniques
The OCR module uses the information in the configuration file to extract textual data
from images and improves the results using text processing based on heuristic rules.
The first processing step of this module applies OCR to specific parts of the input
image, such as the title, the color scale, and the map x and y axes. The OCR software
used is ABBYY FineReader3.
In the second step, we apply text processing based on heuristic rules in order to
correct, extract and understand the semantic information encoded in the
aforementioned locations. Each of these locations is treated differently.
The title, if it exists, usually contains the name of the measured aspect (e.g. the
pollutant), the measurement units and the date/time. The measurement units are usually
standard for the measured aspect, so we do not need to extract them. The date/time is
considered the most difficult element to extract, given that many different formats
exist. To correct possible mistakes in the textual format of the month, day or
aspect, we exploited the Levenshtein distance. More specifically, three English ground
truth sets were created for these three elements and compared to the corresponding
OCR result. We then selected the element from the ground truth set with the smallest
Levenshtein distance from the text generated by the OCR.
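The Levenshtein-based correction can be sketched as follows. The month and day lists are ordinary English names. The pollutant list is a hypothetical stand-in for the paper's aspect ground-truth set, whose contents are not given. Case-insensitive matching is our assumption.

```python
from typing import Iterable

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
        "Saturday", "Sunday"]
# Hypothetical aspect list; the paper's actual ground-truth set is not given.
ASPECTS = ["Ozone", "Nitrogen dioxide", "Sulphur dioxide", "Carbon monoxide", "PM10"]


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def correct_token(ocr_text: str, ground_truth: Iterable[str]) -> str:
    """Return the ground-truth entry with the smallest distance to the OCR text."""
    text = ocr_text.strip().lower()
    return min(ground_truth, key=lambda ref: levenshtein(text, ref.lower()))


print(correct_token("0ctober", MONTHS))            # October
print(correct_token("Wednesdav", DAYS))            # Wednesday
print(correct_token("Nitrogen d1oxide", ASPECTS))  # Nitrogen dioxide
```

Because the ground-truth sets are small, comparing the OCR output against every entry costs very little compared with the OCR step itself.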
The color scale contains the values to which each color of the map corresponds. The
processing and extraction of information from the color scale element can be divided
into two main parts. In the first, we check and correct the OCR results for the
scale, while in the second we correlate values to colors. To correct the OCR
results, we find the most common interval between consecutive scale values and adjust
the mistaken values accordingly. The correlation of values to colors is achieved by
pointing at the middle of each color, using the coordinates of the values.
3 http://www.abbyy.com/
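A minimal sketch of both parts follows. It rests on two assumptions that the paper does not state. First, the scale labels are equally spaced. Second, on a vertical scale the labels sit at the boundaries between color boxes. Unreadable labels are passed as `None`. `correct_scale` and `color_sample_points` are hypothetical helper names.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def correct_scale(values: Sequence[Optional[float]]) -> List[float]:
    """Repair OCR'd colour-scale labels, assuming an equally spaced scale."""
    # 1. The most common difference between consecutive readable labels is
    #    taken as the scale step.
    diffs = [round(b - a, 6) for a, b in zip(values, values[1:])
             if a is not None and b is not None]
    if not diffs:
        raise ValueError("need two adjacent readable labels")
    step = Counter(diffs).most_common(1)[0][0]
    # 2. Every readable label votes for the value of the first label;
    #    a single misread label is outvoted by the consistent ones.
    starts = Counter(round(v - i * step, 6)
                     for i, v in enumerate(values) if v is not None)
    start = starts.most_common(1)[0][0]
    # 3. Regenerate the whole scale from the agreed start and step.
    return [round(start + i * step, 6) for i in range(len(values))]


def color_sample_points(label_ys: Sequence[int], x: int) -> List[Tuple[int, int]]:
    """Pixel positions in the middle of each colour box of a vertical scale,
    assuming labels mark the boundaries between boxes."""
    return [(x, (y0 + y1) // 2) for y0, y1 in zip(label_ys, label_ys[1:])]


# "30" misread as "80" and one label unreadable.
print(correct_scale([0, 10, 20, 80, None, 50]))     # [0, 10, 20, 30, 40, 50]
print(color_sample_points([460, 380, 300], x=570))  # [(570, 420), (570, 340)]
```

Voting over the start value, rather than trusting the first label, lets one misread label anywhere in the scale be outvoted, including the first one.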
The last two elements analyzed with OCR are the x and y axes. In the case
of heatmaps, the two axes contain similar information, and thus we apply similar
processing techniques to them. The information that can be obtained from each axis is
the geographical coordinates of the points of the map. To obtain it, we have to
successfully identify at least two points (x, y coordinates) on the map axes and
define their position in relation to the map. To identify these points, we first
correct most of the errors produced by OCR and then use the coordinates of the
elements, as returned by OCR, to specify the position of those points.
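The sketch below illustrates this step under stated assumptions. OCR'd tick labels are assumed to look like `10°W` or `45N`. A small hypothetical table of character confusions (`O`→`0`, `l`/`I`→`1`) stands in for the actual OCR error rules. The axis is assumed to be linear in degrees, which holds for equirectangular maps but not for conical ones. With two or more corrected ticks and their pixel positions from the OCR output, a least-squares line gives the coordinate step per pixel.

```python
import re
from typing import Optional, Sequence, Tuple

# Hypothetical confusion table; real OCR error patterns depend on the fonts.
OCR_FIXES = str.maketrans({"O": "0", "l": "1", "I": "1"})
TICK = re.compile(r"(-?\d+(?:\.\d+)?)\s*[°º]?\s*([NSEW]?)")


def parse_tick(label: str) -> Optional[float]:
    """Convert an OCR'd tick label to signed degrees (S and W are negative)."""
    text = label.strip().translate(OCR_FIXES).upper()
    match = TICK.fullmatch(text)
    if match is None:
        return None
    value = float(match.group(1))
    return -value if match.group(2) in ("S", "W") else value


def fit_axis(ticks: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of degree = offset + step * pixel.

    ticks holds (pixel_position, degree) pairs. Returns (offset, step),
    where step is the coordinate step per pixel.
    """
    n = len(ticks)
    if n < 2:
        raise ValueError("need at least two ticks")
    mean_p = sum(p for p, _ in ticks) / n
    mean_d = sum(d for _, d in ticks) / n
    sxx = sum((p - mean_p) ** 2 for p, _ in ticks)
    if sxx == 0:
        raise ValueError("ticks must lie at different pixel positions")
    sxy = sum((p - mean_p) * (d - mean_d) for p, d in ticks)
    step = sxy / sxx
    return mean_d - step * mean_p, step


# Hypothetical OCR output: (label text, pixel column of the label centre).
labels = [("10°W", 40), ("0°", 165), ("l0°E", 290), ("2O°E", 415)]
ticks = [(px, parse_tick(text)) for text, px in labels]
offset, step = fit_axis(ticks)
print(round(step, 4), round(offset, 2))  # 0.08 -13.2
print(round(step * 125, 4))  # step between marks 125 px apart: 10.0
```

Using every recognized tick, rather than exactly two, dampens the effect of a slightly misplaced OCR bounding box on the estimated step.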
6 AirMerge
AirMerge is a web-based system that supports harvesting Chemical Weather forecast
images and converting them to numerical data. A derivative of its image processing
engine is used in the Data Extraction phase of the proposed toolchain, and it is already
used for creating harmonized, numerical Chemical Weather data4.
The AirMerge system combines elements of screen scraping and innovative image
processing algorithms [4], [6], [8] in order to produce uniform, indexed data. These
data are then stored in a back-end database and may be recalled for further processing
such as numerical applications, model ensembles, visualization, transformation etc.
The main task of AirMerge is the extraction of data from heatmaps. This is
achieved by using a processing chain that consists of a “screen scraping” phase,
where raw RGB pixel data are extracted from heatmaps, a mapping phase, where
RGB values are classified to a color scale and mapped to ranges of numerical values
and a linear deprojection phase, where the images’ raster is interpreted as a
geographical grid in a specified geographical projection, centered on key points.
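The mapping phase can be pictured as a nearest-colour lookup. The sketch assumes plain Euclidean distance in RGB space and a hypothetical tolerance `max_dist`. A pixel beyond that tolerance is treated as invalid (e.g. a border line or text) rather than forced onto the scale. The distance metric and threshold that AirMerge actually uses are not given here.

```python
from typing import List, Optional, Sequence, Tuple

RGB = Tuple[int, int, int]
Range = Tuple[float, float]


def classify_pixel(rgb: RGB, scale: Sequence[RGB], max_dist: float = 30.0) -> Optional[int]:
    """Index of the nearest colour-scale entry, or None if none is within max_dist."""
    dists = [sum((a - b) ** 2 for a, b in zip(rgb, ref)) for ref in scale]
    best = min(range(len(scale)), key=dists.__getitem__)
    return best if dists[best] <= max_dist ** 2 else None


def classify_raster(pixels: Sequence[Sequence[RGB]], scale: Sequence[RGB],
                    ranges: Sequence[Range], max_dist: float = 30.0
                    ) -> List[List[Optional[Range]]]:
    """Map each pixel to the (low, high) range of its scale colour; None marks invalid pixels."""
    out = []
    for row in pixels:
        out_row = []
        for rgb in row:
            idx = classify_pixel(rgb, scale, max_dist)
            out_row.append(None if idx is None else ranges[idx])
        out.append(out_row)
    return out


# Hypothetical three-colour scale and its concentration ranges.
scale = [(0, 0, 255), (0, 255, 0), (255, 0, 0)]
ranges = [(0.0, 40.0), (40.0, 80.0), (80.0, 120.0)]
raster = [[(3, 250, 8), (250, 4, 2)],
          [(0, 0, 0), (10, 10, 240)]]  # (0, 0, 0): a black border pixel
print(classify_raster(raster, scale, ranges))
# [[(40.0, 80.0), (80.0, 120.0)], [None, (0.0, 40.0)]]
```

A tolerance is needed because lossy image compression shifts colours slightly. Without one, every stray pixel would be snapped to some scale colour, and lines and text could no longer be detected as invalid data.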
Screen scraping procedure: This step handles the cropping of the original image to a
region of interest and its parsing into a 2D data array directly mapped to the original
image’s pixels. It also associates each color with the minimum/maximum value range of
the air pollutant concentration levels, which is often implied by the color scale
associated with the original images. It should be noted that the information about
where to crop, where each color on the legend is, to which index it should correspond,
etc. is provided by the configuration template of the AnT in the proposed system. In
this phase, the mapping of the image’s raster to a specific geographical grid is also
performed, since the images themselves represent a geographical region. The
configuration system allows choosing between the most commonly encountered
geographical projections (equirectangular, conical, polar stereographic etc.) and
choosing keypoints in the image to allow for precise pixel-coordinate mapping.
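For the equirectangular case, the keypoint-based pixel-coordinate mapping reduces to two independent linear maps, one per axis. The sketch below assumes two keypoints whose pixel positions and geographical coordinates are both known, and which differ in both row and column. Conical or polar stereographic maps would instead need the inverse of the respective projection, which is not shown. `Keypoint` and `equirectangular_mapper` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Pair = Tuple[float, float]


@dataclass(frozen=True)
class Keypoint:
    x: float    # pixel column
    y: float    # pixel row (grows downwards)
    lon: float  # degrees east
    lat: float  # degrees north


def equirectangular_mapper(k1: Keypoint, k2: Keypoint
                           ) -> Tuple[Callable[[float, float], Pair],
                                      Callable[[float, float], Pair]]:
    """Return (pixel -> (lon, lat), (lon, lat) -> pixel) for an equirectangular raster."""
    if k1.x == k2.x or k1.y == k2.y:
        raise ValueError("keypoints must differ in both x and y")
    lon_per_px = (k2.lon - k1.lon) / (k2.x - k1.x)
    lat_per_px = (k2.lat - k1.lat) / (k2.y - k1.y)  # negative when north is up

    def to_geo(x: float, y: float) -> Pair:
        return k1.lon + (x - k1.x) * lon_per_px, k1.lat + (y - k1.y) * lat_per_px

    def to_pixel(lon: float, lat: float) -> Pair:
        return k1.x + (lon - k1.lon) / lon_per_px, k1.y + (lat - k1.lat) / lat_per_px

    return to_geo, to_pixel


# Hypothetical keypoints at the NW and SE corners of a cropped map.
to_geo, to_pixel = equirectangular_mapper(Keypoint(0, 0, -10.0, 70.0),
                                          Keypoint(400, 300, 30.0, 40.0))
lon, lat = to_geo(200, 150)
print(round(lon, 6), round(lat, 6))  # 10.0 55.0
```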
“Reconstruction of missing values and data gaps” procedure: This step is introduced
to deal with unwanted elements such as legends, text, geomarkings and watermarks,
as well as regions that are not part of the forecast area, which might be present after
the screen scraping phase. The image pixels are classified into three main categories:
valid data (with colors that satisfy the color scale’s classification), invalid data (with
colors not present in the color scale), and regions containing colors that are explicitly
marked for exclusion, which are considered void during processing. Such marked
regions are not considered part of the forecast, and thus do not undergo data
correction. However, regions containing unmarked invalid data are considered
regions with correctable errors or “data gaps”, which can be filled in. This distinction
is due to their different appearance patterns: void regions are usually extended and
continuous (e.g. sea regions not covered by the forecast, but present on the map),
while invalid data regions are usually smaller but more noticeable (e.g. lines, text,
watermarks etc.) and have more noise-like patterns, and thus it is more compelling to
remove them by using gap-filling techniques. These techniques include traditional
grid-based and pattern-based interpolation techniques using neural networks.
It should be noted that the AirMerge system functionality is also provided via an
API, which is available as a REST service [9]. Therefore, AirMerge can serve any
request related to the heatmaps of many chemical weather models (e.g. everyday
processing), thus making it suitable for environmental service-oriented applications.
4 http://projects.isag.meng.auth.gr/airmerge/
7 Results
In this section, we present the results of the framework when applied to air
quality heatmaps from three different providers. Since an evaluation of AirMerge is
already provided in [8], we evaluate the results of the OCR and the total system output.
Regarding the OCR, we focus on the recognition of the x and y axes, since this is the
most important information for correctly mapping the air quality index onto the
right coordinates. The following providers are considered for the evaluation: GEMS5,
the Laboratory of Atmospheric Physics of the Aristotle University of Thessaloniki6, and
the Atmospheric and Oceanic Physics Group7.
7.1 GEMS Website
Figures 3 and 4 depict the original image and the image reconstructed by the AirMerge
system, which are almost identical; any noise (e.g. black lines) was removed [4], [8].
Fig. 3. Original Image
Fig. 4. Image Reconstructed from AirMerge
5 http://gems.ecmwf.int/d/products/raq/
6 http://lap.physics.auth.gr/forecasting/fore_images/
7 http://www.fisica.unige.it/atmosfera/bolchem/MAPS/
Table 1. Results for the GEMS website
Resolution steps       | Correct value | Estimated value | Absolute difference | Error
Longitude step 0.0791  | 5             | 5.0634          | 0.0634              | 1.25%
Latitude step 0.0777   | 5             | 4.9776          | 0.0224              | 0.45%
During this process, an error is usually introduced, mostly due to the inability of
OCR to perfectly identify the position of each coordinate on the map axes. In table 1
we report the longitude and latitude steps (i.e. the coordinate step for each pixel), the
correct step value between two subsequent coordinate marks (e.g. when the marks are
0°, 5°, 10°, etc. the step value is 5), the estimated value, the absolute difference and
finally the introduced error. In both cases the error is very low and acceptable (in
general, we assume that an error is acceptable when it is less than 3%).
7.2 Laboratory of Atmospheric Physics of the AUTH Site
In a similar way, we present the initial and the reconstructed image of this website in
Figures 5 and 6. The results are reported in table 2, and the error is again very small.
Fig. 5. Original Image
Fig. 6. Image Reconstructed from AirMerge
Table 2. Results for the Laboratory of Atmospheric Physics of the AUTH website
Resolution steps       | Correct value | Estimated value | Absolute difference | Error
Longitude step 0.0311  | 2             | 1.9924          | 0.0076              | 0.4%
Latitude step 0.0276   | 1             | 1.0236          | 0.0236              | 2.3%
7.3 Atmospheric and Oceanic Physics Group Site
Finally, in table 3 we present results for the last provider, reporting an average error of
0.35%. The initial and the reconstructed map are illustrated in figures 7 and 8. It
should be noted that the white region in figure 7 is treated as “void space” in figure 8,
and is considered a distinct case from national border lines, which are instead treated
as unwanted noise and filled in.
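The sketch below computes the absolute difference and the relative error for one table row. It assumes, though the paper does not state it explicitly, that the error is the absolute difference divided by the correct step value. It then applies the 3% acceptance threshold mentioned above. `evaluate_step` is a hypothetical helper.

```python
from typing import NamedTuple


class StepCheck(NamedTuple):
    absolute_difference: float
    error_percent: float
    acceptable: bool


def evaluate_step(correct: float, estimated: float, threshold_percent: float = 3.0) -> StepCheck:
    """Compare an estimated coordinate-mark step with the correct one."""
    diff = abs(estimated - correct)
    error = 100.0 * diff / correct  # assumed definition: relative to the correct step
    return StepCheck(diff, error, error < threshold_percent)


# Latitude row of Table 1: correct step 5, estimated 4.9776.
check = evaluate_step(5, 4.9776)
print(round(check.absolute_difference, 4), round(check.error_percent, 2), check.acceptable)
# 0.0224 0.45 True
```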
Fig. 7. Original Image
Fig. 8. Image Reconstructed from AirMerge
Table 3. Results for the Atmospheric and Oceanic Physics Group website
Resolution steps       | Correct value | Estimated value | Absolute difference | Error
Longitude step 0.0289  | 2             | 1.9937          | 0.0063              | 0.3%
Latitude step 0.0249   | 1             | 0.9958          | 0.0042              | 0.4%
8 Conclusions
Although the current presentation format of air quality forecasts might be
ideal for casual users, it is not easily accessible by automatic services, which would
expect a structured, numerical format of the forecast data. In this context, we have
proposed a framework for air quality information extraction from heatmaps,
combining existing (AirMerge) as well as new (AnT and OCR) components. This
framework could serve as a basis for supporting environmental systems that provide
either air quality information from several providers for comparison or orchestration
purposes, or high-level suggestions on everyday issues (e.g. travel planning) based on
advanced decision support [13], which could improve the quality of life. The
proposed work overcomes the limitation of not having access to the raw data, since it
only considers information that is publicly available on the Internet, thus also
respecting data access policies. Although the system has been tested with air quality
forecast heatmaps, it could also deal with observed pollutant and pollen concentrations
represented in the same way. Future work includes extensive evaluation with more
images in different projections (e.g. conical) and addressing pollen heatmaps.
Acknowledgments. This work was supported by the PESCaDO project (FP7-248594).
References
1. Balk, T., Kukkonen, J., Karatzas, K., Bassoukos, A., Epitropou, V.: A European open
access chemical weather forecasting portal. Atmospheric Environment 45, 6917–6922
(2011), doi:10.1016/j.atmosenv.2010.09.058
2. Karatzas, K.: Internet-based management of environmental simulation tasks. In: Farago, I.,
Georgiev, K., Havasi, A. (eds.) Advances in Air Pollution Modelling for Environmental
Security, Hardcover, NATO Reference EST.ARW980503, 406 p., pp. 253–262. Springer
(2005) ISBN: 1-4020-3349-4
3. San José, R., Baklanov, A., Sokhi, R.S., Karatzas, K., Pérez, J.L.: Computational Air
Quality Modelling. In: Jakeman, A.J., Voinov, A.A., Rizzoli, A.E., Chen, S.H. (eds.)
Developments in Integrated Environmental Assessment. Environmental Modelling,
Software and Decision Support, vol. 3 (2008) ISBN: 9780080568867
4. Epitropou, V., Karatzas, K., Bassoukos, A.: A method for the inverse reconstruction of
environmental data applicable at the Chemical Weather portal. In: Geospatial Crossroads
@GI_Forum 2010, Proceedings of the GeoInformatics Forum Salzburg, pp. 58–68.
Wichmann Verlag, Berlin (2010) ISBN 978-3-87907-496-9
5. Karatzas, K., Kukkonen, J., Bassoukos, A., Epitropou, V., Balk, T.: A European Chemical
Weather forecasting Portal. In: 31st ITM - NATO/SPS International Technical Meeting on
Air Pollution Modelling and its Application, Torino, September 28 (2010); Published in
Steyn, D.G., Trini Castelli, S. (eds.) Air Pollution Modeling and its Applications XXI, 1st
edn., Hardcover. NATO Science for Peace and Security Series C: Environmental Security,
pp. 239–243. Springer (2011) ISBN 978-94-007-1358-1
6. Epitropou, V., Karatzas, K.D., Bassoukos, A., Kukkonen, J., Balk, T.: A new
environmental image processing method for chemical weather forecasts in Europe. In:
Proceedings of the 5th International Symposium on Information Technologies in
Environmental Engineering, Poznan, July 6-8 (2011)
7. Kukkonen, J., Klein, T., Karatzas, K., Torseth, K., Fahre Vik, A., San José, R., Balk, T.,
Sofiev, M.: COST ES0602: Towards a European network on chemical weather forecasting
and information systems. Advances in Science and Research Journal 1, 1–7 (2009)
8. Epitropou, V., Karatzas, K., Kukkonen, J., Vira, J.: Evaluation of the accuracy of an
inverse image-based reconstruction method for chemical weather data. International
Journal of Artificial Intelligence 9/A12, 152–171 (2012)
9. Epitropou, V., Johansson, L., Karatzas, K., Bassoukos, A., Karppinen, A., Kukkonen, J.,
Haakana, M.: Fusion of Environmental Information for the Delivery of Orchestrated
Services for the Atmospheric Environment in the PESCaDO Project. In: Seppelt, R.,
Voinov, A.A., Lange, S., Bankamp, D. (eds.) 2012 International Congress on
Environmental Modelling and Software, Managing Resources of a Limited Planet,
Leipzig, Germany. International Environmental Modelling and Software Society (iEMSs)
(in press, 2012)
10. Musavi, M.T., Shirvaikar, M.V., Ramanathan, E., Nekovei, A.R.: Map processing
methods: an automated alternative. In: Proceedings of the Twentieth Southeastern
Symposium on System Theory, pp. 300–303. IEEE Computer Society (1988)
11. Henderson, T.C., Linton, T.: Raster Map Image Analysis. In: Proceedings of the 2009 10th
International Conference on Document Analysis and Recognition (ICDAR 2009), pp. 376–
380. IEEE Computer Society, Washington, DC (2009)
12. Cao, R., Tan, C.-L.: Text/Graphics Separation in Maps. In: Blostein, D., Kwon, Y.-B.
(eds.) GREC 2001. LNCS, vol. 2390, pp. 167–177. Springer, Heidelberg (2002)
13. Wanner, L., Vrochidis, S., Tonelli, S., Moßgraber, J., Bosch, H., Karppinen, A., Myllynen,
M., Rospocher, M., Bouayad-Agha, N., Bügel, U., Casamayor, G., Ertl, T., Kompatsiaris,
I., Koskentalo, T., Mille, S., Moumtzidou, A., Pianta, E., Saggion, H., Serafini, L.,
Tarvainen, V.: Building an Environmental Information System for Personalized Content
Delivery. In: Hřebíček, J., Schimak, G., Denzer, R. (eds.) ISESS 2011. IFIP AICT,
vol. 359, pp. 169–176. Springer, Heidelberg (2011)
... After these ratios have been computed, it is trivial to determine the geographical offsets of the map (position of the SW point) as well as its maximum geographical extension, and therefore construct a complete, four-point geographical bounding box. Selecting appropriate points is done manually by the AirMerge's operator, though it is possible to partially automate the process[18,27]. If no usable geographical reference grid is supplied with the heatmap, it is still possible to use the known geographical coordinates of two easily identifiable landmarks. ...
... Some tokens like #prognosis# have different names from the XML nodes they are iterated over, e.g., #prognosis# takes values from the < forecasts > node, while others are more self-explanatory. It has been shown that automating the extraction of some of a heatmap's metadata and structural information is possible[6,27], by using OCR and text/image processing techniques, URL base form Pollutant Elevation Hour of daybut the efficacy of such techniques is limited by the fact that individual heatmaps do not always contain all of the necessary information, and operating on individual heatmaps makes it hard to detect the existence of generalized/common schemas between a series of similar heatmaps produced by the same CWF provider. ...
Article
Full-text available
The AirMerge platform was designed and constructed for increasing the availability and improving the interoperability of heatmap-based environmental data on the Internet. This platform allows data from multiple heterogeneous chemical weather data sources to be continuously collected and archived in a unified repository; all the data in this repository have a common data format and access scheme. In this paper, we address the technical structure and applicability of the AirMerge platform. The platform facilitates personalized information services, and can be used as an environmental information node for other web-based information systems. The results demonstrate the feasibility of this approach and its potential for being applied also in other areas, in which image-based environmental information retrieval will be needed.
... Methods to recreate numerical data from heatmaps have been proposed and applied in [3] and [7], but they are limited by the lossy process used to create the heatmaps themselves, where a significant part of the initial information ends up being discarded. The data used to produce these heatmaps are commonly the outputs of numerical dispersion models simulating the variation of pollutant concentrations in time and space, such as the SILAM [6] integrated modelling for atmospheric composition, created and managed by the Finnish Meteorological Institute. ...
... Due to the nature of the quantization process applied to AQ forecasts, the resulting heatmaps can be regarded as a particular class of non-uniformly sampled signals [2,5]. In addition, due to the specifics of CW forecasting itself, it is possible to make certain assumptions regarding the distribution of values in a CW forecast heatmap [3,7], if the quantization process and the value ranges of the quantization levels employed in the heatmap itself are known. An example of a CW forecast heatmap and its colour scale is given in Figure 1. ...
Conference Paper
Full-text available
Chemical Weather (CW) and other geospatial environmental information produced by numerical models are often published on-line in the form of heatmaps which have undergone several lossy processing and transformation steps, resulting in a relatively low visualization and recoverable data, compared to the model data which generated them,. In this paper, a method which is fine-tuned to the partial reconstruction of such chemical weather data starting from discrete-level heatmaps is presented, which relies on the augmentation of ordinary interpolation methods through the use of peak-limiting functions and other constraining methods
... These modules are integrated in a standalone, user-based interface that allows for template-based customization of heatmaps and thus assists in handling several formats of heatmaps. This paper substantially extends the works presented in Moumtzidou et al. (2012a) and Vrochidis et al. (2012), which have demonstrated the initial results of this framework, by providing an extensive evaluation, which includes a comparative study of the proposed framework against the manually configured AirMerge system and real numerical data provided by forecast models for a variety of providers. ...
... In the case of missing data points, a special interpolation algorithm (based on a novel Artificial Neural Network algorithm developed by the authors) is used to close these gaps. The authors in Moumtzidou et al. (2012a) and Vrochidis et al. (2012) propose a framework that integrates the system of Epitropou et al. (i.e. AirMerge system) and aims at automating and thus facilitating its use. ...
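Gap filling of missing data points can be illustrated with a much simpler stand-in. The sketch below does not reproduce the authors' Artificial Neural Network algorithm; it substitutes plain iterative neighbour averaging, and assumes that missing cells (for example pixels hidden under borders or text) are marked with None.

```python
from typing import List, Optional

Grid = List[List[Optional[float]]]


def fill_gaps(grid: Grid) -> Grid:
    """Fill None cells layer by layer with the mean of their known 4-neighbours.

    Stand-in for illustration only; the cited system uses an ANN-based interpolator.
    """
    g = [row[:] for row in grid]  # work on a copy; the input is left untouched
    rows = len(g)
    cols = len(g[0]) if rows else 0
    while True:
        updates = {}
        for r in range(rows):
            for c in range(cols):
                if g[r][c] is not None:
                    continue
                known = [g[rr][cc]
                         for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                         if 0 <= rr < rows and 0 <= cc < cols and g[rr][cc] is not None]
                if known:
                    updates[(r, c)] = sum(known) / len(known)
        if not updates:  # nothing fillable left (done, or no known cells at all)
            return g
        # Apply a whole layer at once so fill order does not bias the result.
        for (r, c), v in updates.items():
            g[r][c] = v


print(fill_gaps([[10.0, None, 30.0], [None, None, None]]))
# → [[10.0, 20.0, 30.0], [10.0, 20.0, 30.0]]
```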
Article
Environmental data analysis and information provision are considered of great importance for people, since environmental conditions are strongly related to health issues and directly affect a variety of everyday activities. Nowadays, there are several free web-based services that provide environmental information in several formats, with map images being the most commonly used to present air quality and pollen forecasts. This format, despite being intuitive for humans, complicates the extraction and processing of the underlying data. Typical examples are chemical weather forecasts, which are usually encoded as heatmaps (i.e. graphical representations of matrix data with colors), while the forecasted numerical pollutant concentrations are commonly unavailable. This work presents a model for the semi-automatic extraction of such information based on a template configuration tool, on methodologies for data reconstruction from images, and on text processing and Optical Character Recognition (OCR). These modules are integrated in a standalone framework, which is extensively evaluated by comparing data extracted from a variety of chemical weather heatmaps against the real numerical values produced by chemical weather forecasting models. The results demonstrate satisfactory performance in terms of data recovery and positional accuracy.
Article
Environmental and meteorological conditions are of utmost importance for the population, as they are strongly related to the quality of life. Citizens are increasingly aware of this importance, which results in an increasing demand for environmental information tailored to their specific needs and background. We present an environmental information platform that supports the submission of user queries related to environmental conditions and orchestrates results from complementary services to generate personalized suggestions. The system discovers and processes reliable data on the Web in order to convert them into knowledge. At runtime, this information is transferred into an ontology-structured knowledge base, from which the information relevant to the specific user is then deduced and communicated in the language of their preference. The platform is demonstrated with real-world use cases in southern Finland, showing the impact it can have on the quality of everyday life.
Conference Paper
An automatic method for extracting information from heatmaps, based on OCR, image processing and image recognition techniques, is proposed. It consists of a sequence of steps. First, the heatmap area is separated from the other elements of the heatmap image. Next, the key (legend) and the axes are recognized. To produce quick answers to user queries, the heatmap is stored in the form of a tree. The method was tested on several diverse heatmaps, and the results are promising.
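The first step, separating the heatmap area from the rest of the image, could be approached as below. This is a hypothetical sketch, not the method of the cited paper. It assumes the legend palette is already known, that the coloured field forms one 4-connected region, and that legend swatches, which share the palette colours, are smaller than the map itself. Borders drawn across the field would split it into several regions and would need to be bridged first.

```python
from collections import deque
from typing import List, Optional, Set, Tuple

RGB = Tuple[int, int, int]


def heatmap_bbox(image: List[List[RGB]], palette: Set[RGB]) -> Optional[Tuple[int, int, int, int]]:
    """(top, left, bottom, right), inclusive, of the largest 4-connected region of
    palette-coloured pixels, or None if no pixel matches the palette."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    best, best_size = None, 0
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or image[r0][c0] not in palette:
                continue
            # Breadth-first flood fill of one connected colour region.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            size, top, left, bottom, right = 0, r0, c0, r0, c0
            while queue:
                r, c = queue.popleft()
                size += 1
                top, bottom = min(top, r), max(bottom, r)
                left, right = min(left, c), max(right, c)
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= rr < rows and 0 <= cc < cols
                            and not seen[rr][cc] and image[rr][cc] in palette):
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            # Keep the largest region: legend swatches are small, the map field is large.
            if size > best_size:
                best, best_size = (top, left, bottom, right), size
    return best


W, R, B = (255, 255, 255), (255, 0, 0), (0, 0, 255)
img = [[W] * 8 for _ in range(6)]
for r in range(1, 4):
    for c in range(1, 5):
        img[r][c] = R if c < 3 else B   # 3x4 "map" field
img[5][6], img[5][7] = R, B             # two legend swatches
print(heatmap_bbox(img, {R, B}))        # → (1, 1, 3, 4)
```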
Article
The PESCaDO project (http://www.pescado-project.eu/) aims at providing tailored environmental information to EU citizens. For this purpose, PESCaDO delivers personalized environmental information based on coordinating the data flow from multiple sources. After the necessary discovery, indexing and parsing of those sources, the harmonization and retrieval of data is achieved through Node Orchestration, and unified and accurate responses to user queries are created by the Fusion service, which assimilates input data into a coherent data block according to their imprecision and relevance with respect to the user-defined query. Environmental nodes are selected from open-access web resources of various types, and from the direct use of data from monitoring stations. Model forecasts are made available through the synergy with the AirMerge image parsing engine and its chemical weather database. In this paper, elements of the general architecture of AirMerge and of the PESCaDO Fusion service are presented as an example of the modus operandi of environmental information fusion for the atmospheric environment.
Chapter
The European Chemical Weather Forecasting Portal (ECWFP) has been developed within the COST (European Cooperation in Science and Technology) ES0602 action, “Towards a European Network on Chemical Weather Forecasting and Information Systems”. The portal provides access to the predictions of a substantial number of chemical weather forecasting systems and may be used to find out which services are available for specific (1) areas, (2) time periods and (3) pollutants. The portal serves as a “one stop shop” of chemical weather modeling services and associated information, and is currently expanding its functionalities to allow for a harmonized presentation and inter-comparison of the various available forecasts, as well as for the computation of model ensemble predictions.
Chapter
It is common practice to present environmental information of a spatial nature (like atmospheric quality patterns) in the form of pre-processed images. The current paper deals with the harmonization, comparison and reuse of Chemical Weather (CW) forecasts in the form of pre-processed images of varying quality and informational content, without access to the original data. In order to compare, combine and reuse such environmental data, an innovative method for the inverse reconstruction of environmental data from images was developed. The method is based on a new, neural adaptive data interpolation algorithm, and is tested on CW images coming from various European providers. Results indicate very good performance, which renders this method appropriate for use in various image-processing problems that require data reconstruction, retrieval and reuse.
Conference Paper
It is common practice to disseminate Chemical Weather (air quality and meteorology) forecasts to the general public via the Internet in the form of pre-processed images, which differ in format, quality and presentation, without other forms of access to the original data. As the number of Chemical Weather (CW) forecasts available on-line increases, many geographical areas are covered by different models, but their data cannot be combined, compared, or used in any synergetic way by the end user, due to the aforementioned heterogeneity. This paper describes a series of methods for extracting and reconstructing data from heterogeneous air quality forecast images coming from different data providers, to allow for their unified harvesting, processing, transformation, storage and presentation in the Chemical Weather portal.
Article
It is common practice to publish environmental information via the Internet. In the case of geographical coverage information, such as pollutant concentration charts and maps in chemical weather forecasts, such data are published as web-resolution images. These forecasts are commonly presented with an associated value-range pseudocolor scale, and represent a simplified version of the original data obtained through dispersion models and related post-processing methods. In this paper, the numerical and signal processing performance of a method to reconstruct numerical data from the published coverage images is evaluated by comparing the reconstructed data with the original forecast data.
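Comparing reconstructed and original forecast data usually comes down to a few summary statistics over the cells both fields define. The helper below is an assumed, generic choice (RMSE, mean bias and Pearson correlation), not the exact set of measures used in the cited evaluation; cells marked None in either field are skipped.

```python
import math
from typing import Dict, Optional, Sequence


def compare(reconstructed: Sequence[Optional[float]],
            original: Sequence[Optional[float]]) -> Dict[str, float]:
    """RMSE, mean bias and Pearson correlation over cells present in both fields."""
    pairs = [(a, b) for a, b in zip(reconstructed, original)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two paired values")
    errors = [a - b for a, b in pairs]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    bias = sum(errors) / n  # > 0: the reconstruction overestimates on average
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    # Correlation is undefined when either field is constant.
    r = cov / math.sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else math.nan
    return {"rmse": rmse, "bias": bias, "r": r}


print(compare([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]))
# → {'rmse': 1.0, 'bias': 1.0, 'r': 1.0}
```

The example shows why several measures are worth reporting together: a constant offset leaves the correlation perfect while RMSE and bias both expose it.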
Article
Progress in computer capabilities has substantially influenced research in air quality modelling, a very complex and multidisciplinary area. The field covers remote sensing, land use impacts, initial and boundary conditions, data assimilation techniques, chemical schemes, comparison between measured and modelled data, computer efficiency, parallel computing, coupling with meteorology, the impact of long-range transport on local air pollution, new satellite data assimilation techniques, real-time applications and forecasting, and sensitivity analysis. This contribution focuses on providing a general overview of the state of the art in air quality modelling from the point of view of the “user community,” which includes policy makers, urban planners and environmental managers. It also tries to bring key questions into the discussion, such as where the greatest uncertainties in emission inventories and meteorological fields lie, how well air quality models simulate urban aerosols, and what next-generation model developments are needed to answer new scientific and management questions.
Article
This paper presents a European chemical weather forecasting portal that has been developed within the COST (European Cooperation in Science and Technology) ES0602 action, “Towards a European Network on Chemical Weather Forecasting and Information Systems”. The portal provides access to a substantial number (currently 21) of chemical weather forecasting systems and their numerical forecasts; these cover in total more than 30 regions in Europe. The portal can be used, e.g., to find out which services are available for a specific domain, for specific source categories or for specific pollutants. It is currently expanding its functionalities to allow for a harmonized presentation and inter-comparison of the various available forecasts, as well as for the computation of model ensemble predictions. It provides functions for obtaining relevant supplementary information, e.g., using the Model Documentation System of the European Environment Agency. The new portal is an open access system, through which chemical weather forecasts can be added to the system, and the predictions can be accessed, analysed and inter-compared. Such a single point of reference for European chemical weather forecasting information has not previously been in operation. We present the characteristics of the new portal, and discuss how this activity complements the GEMS and PROMOTE air quality forecasting portals.
Chapter
Urban air quality information originates either from observations or from mathematical tools — models and estimations. While the former correspond to the current status of air quality, and may be directly interpreted in terms of human health risk and eco-system degradation potential or effect, the latter provide forecasting information in advance, thus offering decision makers the opportunity to take preventive measures that would “smooth” or alter the results of a forecasted “episode” or even “crisis”. Both “information categories” have a strong spatial/temporal dimension, and are an ideal application domain for World Wide Web (Web)-based information dissemination methods. In the present paper, the use of web-based technologies for the management of environmental simulation tasks within the air quality domain is discussed.
Conference Paper
Raster map images (e.g., USGS) provide much information in digital form; however, the color assignments and pixel labels leave many serious ambiguities. A color histogram classification scheme is described, followed by the application of a tensor voting method to classify linear features in the map as well as intersections in linear feature networks. The major result is an excellent segmentation of roads, and road intersections are detected with about 93% recall and 66% precision.
Conference Paper
The separation of overlapping text and graphics is a challenging problem in document image analysis. This paper proposes a specific method of detecting and extracting characters that are touching graphics. It is based on the observation that the constituent strokes of characters are usually short segments in comparison with those of graphics. It combines line continuation with the feature line width to decompose and reconstruct segments underlying the region of intersection. Experimental results showed that the proposed method improved the percentage of correctly detected text as well as the accuracy of character recognition significantly.