Environmental Data Extraction from Multimedia Resources
Anastasia Moumtzidou1, Victor Epitropou2, Stefanos Vrochidis1, Sascha Voth3, Anastasios Bassoukos2, Kostas Karatzas2, Jürgen Moßgraber3, Ioannis Kompatsiaris1, Ari Karppinen4 and Jaakko Kukkonen4
1Information Technologies Institute, 6th Klm Charilaou-Thermi Road, Thessaloniki, Greece
{moumtzid, stefanos, ikom}
2Informatics Systems and Applications Group, Aristotle University of Thessaloniki, Thessaloniki, Greece
3Fraunhofer Institute of Optronics, System Technologies and Image Exploitation, Karlsruhe, Germany
4Finnish Meteorological Institute, Helsinki, Finland
{ari.karppinen, jaakko.kukkonen}
Abstract
Extraction and analysis of environmental information is very
important, since it strongly affects everyday life. Nowadays there
are already many free services providing environmental
information in several formats including multimedia (e.g. map
images). Although such presentation formats might be very
informative for humans, they complicate the automatic extraction
and processing of the underlying data. A characteristic example is
the air quality and pollen forecasts, which are usually encoded in
image maps, while the initial (numerical) pollutant concentrations
remain unavailable. This work proposes a framework for the
semi-automatic extraction of such information based on a
template configuration tool, on Optical Character Recognition
(OCR) techniques and on methodologies for data reconstruction
from images. The system is tested with different air quality and pollen forecast heatmaps, demonstrating promising results.
Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis
and Indexing – Indexing methods.
Keywords
Environmental, multimedia, images, heatmaps, OCR, data reconstruction, template, configuration, pollen, air quality.
1. Introduction
Environmental conditions are of particular interest for people,
since they affect everyday life. Thus, meteorological conditions,
air quality and pollen (i.e. weather, chemical weather and
biological weather) are strongly related to health issues (e.g.
allergies, asthma, cardiovascular diseases) and of course they play
an important role in everyday outdoor activities such as sports and
commuting. With a view to offering personalized decision support
services for people based on environmental information regarding
their everyday activities [1], there is a need to extract and combine
complementary and competing environmental information from
several resources. One of the main steps towards this goal is the
environmental information extraction from multimedia resources.
Environmental observations are automatically performed by
specialized instruments, “hosted” in stations established by
environmental organizations, and the data collected are usually
made available to the public through web portals. In addition to
the observations, forecasts are used to foretell the levels of
pollution in areas of interest, and these are usually published on-
line in the form of images, while only a few of the data providers
make available some means of access to their actual (numerical)
forecast data. It should be mentioned that the presentation format
adopted is human oriented and in most of the cases doesn’t allow
for an automatic (or at least semi-automatic) extraction of
information. A characteristic example is the air quality and pollen
forecasts, which are usually encoded in image maps (heatmaps) of
heterogeneous formats, while the initial (numerical) pollutant
concentrations remain unavailable. In this context, we propose a
semi-automatic framework for extracting environmental data from
air quality and pollen concentrations, which are presented as
heatmap images. The framework consists of three main
components: an annotation tool for user intervention, an Optical
Character Recognition (OCR) and text processing module, as well
as the AirMerge heatmaps image processing system [2], [3].
The contribution of this paper is the integration of existing tools
for environmental quality forecast data extraction (i.e. AirMerge)
with text processing and OCR techniques tailored for heatmap
analysis, under a configurable semi-automatic framework for
processing air quality and pollen forecast heatmaps, which offers
a graphical user interface for template-based customization.
This paper is structured as follows: section 2 presents the relevant
work and section 3 describes the problem. Section 4 introduces
the proposed framework, the annotation tool, the text processing
component and the image processing module. The evaluation is
presented in section 5 and finally, section 6 concludes the paper.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MAED’12, November 2, 2012, Nara, Japan.
Copyright 2012 978-1-4503-1588-3/12/11...$15.00.

2. Related Work
The task of map analysis strongly depends on the map type and the information we need to extract. Depending on the application, a straightforward requirement would be to extract meaningful
segments (e.g. rivers, forests, etc.), while in the case of heatmaps
it is to transform color into numerical data. In general, the information contained in maps can be distinguished by its scale, colorization, quality, etc. In the case of air quality and pollen forecast maps, two types of information are mainly covered:
1. Geographical information: points and lines describing
country frontiers or other well-known points of interests or
structures (e.g. sea, land) in a given coordinate system.
2. Feature information: measured or forecasted parameters of any kind (e.g. average temperature), which are coded via a color scale representing the (measured or forecasted) values.
Single values are referenced geographically by a color point
at the corresponding geographical point.
Chemical weather maps often use raster map images to represent
forecasted data or spatially interpolated measured data. There are
several approaches to extract and digitize this image information. The authors in [4] describe the process of vectorization of digital image data, whereby the geographical information, in the form of lines, is extracted and converted to storable digital vector data.
In [5] the authors use the specific knowledge of the known
colorization in USGS maps, to automatically segment these maps
based on their semantic contents (e.g. roads, rivers). Finally, [6]
improves the segmentation quality of text and graphics in color
map images, to enhance the results of the following analysis
processes (e.g. OCR), by selecting black or dark pixels from color
maps, cleaning them up from possible errors or known unwanted
structures (e.g. dashed lines), to get cleaner text structures.
Although research work has been conducted towards the
automatic extraction of information in maps, to the best of our
knowledge only the work performed for the AirMerge system
addresses the extraction of information from chemical weather
maps with image processing. In such works [7], [2], [3], a method
to reconstruct environmental data out of chemical weather images
is described. First, the relevant map section is scraped from the
chemical weather image. Then, disturbances are removed and a
color classification is used to classify every single data point
(pixel), to recover the measured data. With the aid of the known
geographical boundaries, given by the coordinate axis and the
map projection type, the geographical position of the measured
data point can be retrieved. In case of missing data points, a
special interpolation algorithm is used to close these gaps.
This work proposes a framework that integrates AirMerge, extends its application to pollen forecasts and in
addition facilitates the procedure of information extraction from
heatmaps using OCR and visual annotation techniques.
3. Problem Statement
In order to clearly state the problem, we have conducted an empirical study on more than 60 environmental websites (dealing with weather, air quality and pollen) and, based on previous works [8], we reached the conclusion that a considerable share of environmental content, almost 60%, is encoded in images.
Specifically, pollen and air quality forecast information is always
illustrated in heatmap images. Overall, it can be said [9] that the
air quality and pollen information is usually presented in the form
of images representing pollutant or pollen concentrations over a
geographically bounded region, typically in terms of maximum or
average concentration values for the time scale of reference,
which is usually the hour or day [7]. These providers present their
air quality forecasts almost exclusively in the form of
preprocessed images with a color index scale indicating the
concentration of pollutants. In addition, they arbitrarily choose the
image resolution and the color scale employed for visualizing
pollution loadings, the covered region, as well as the geographical
map projection. The mode of presentation varies from simple web
images to AJAX, Java or Adobe Flash viewers [10].
The heatmaps that contain environmental information are static
bitmap images, which represent the coverage data (e.g.
concentrations) in terms of a color-coded scale over a
geographical map. An example of such heatmap is depicted in
Figure 1 obtained from the GEMS project1 website.
Figure 1. Typical example of an air quality forecast image.
After observing the image, we conclude that besides the
geographical information and concentrations, additional
information is also provided, which is the type of environmental
“feature” (e.g. ozone, birch pollen), the date/ time information of
the meaningful information and a color scale. Therefore, the main
parts of information that need to be extracted from the image are:
- Heatmap: map depicting a geographical region with colors representing the environmental aspect value.
- Color scale: range indicating the correspondence between feature value and color.
- Coordinate axes (x, y): indicate the geographical longitude and latitude of every map point for a specific geographic projection.
- Title: contains information such as the type of aspect measured, and the time and date of the forecast.
- Additional information: watermarks, wind fields superimposed on concentration maps and any information that can be categorized as “noise” in terms of influencing the information content and representation value of the specific heatmap.
4. Proposed Framework
The proposed architecture draws upon the requirements that were set in the previous section. The idea is to employ image analysis and processing techniques to map the color variations in the images onto specific categories that can be ranges of values. Optical
character recognition techniques need to be used for recognizing
text encoded in image format such as image titles, dates,
environmental information and coordinates.
Due to the fact that there is a large variation of images and many
different representations, there is a need for optimizing and
configuring the algorithms involved. Specifically, the intervention
of an administrative user is required in order to annotate and
manually segment different parts of a new image type (like data,
legend, etc.), which need to be processed by the content extraction
algorithms. The system workflow and the involved modules are
depicted in Figure 2.
In order to facilitate this configuration through a graphical user
interface we have implemented the “Annotation Tool” (AnT),
which is tailored for dealing with heat maps. The output of this
tool is a configuration file that holds the static information of the
image. The second module is the “Text Processing”, which uses
the information of the configuration file to extract data from the
corresponding image. More specifically, it retrieves and analyzes
the information captured in text format using text processing
techniques including OCR. The third module is the “Image
Processing”, which uses information both from the output of the
“Text processing” module and the configuration file to process the
heatmap found inside the image.
Figure 2. Image content distillation architecture
The input of the framework is a heatmap image and the output is
an XML file, in which each geographical coordinate of the initial
heatmap image is associated with a value (e.g. air quality index).
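As an illustration, such an output could be serialized as follows; the element and attribute names in this sketch are assumptions for illustration, not the system's actual schema:

```python
# Illustrative sketch of the framework's XML output, associating each
# geographical coordinate with an extracted value. Names are hypothetical.
import xml.etree.ElementTree as ET

def to_xml(points):
    """Serialize (longitude, latitude, value) triples to an XML string."""
    root = ET.Element("heatmap")
    for lon, lat, value in points:
        ET.SubElement(root, "point",
                      lon=str(lon), lat=str(lat), value=str(value))
    return ET.tostring(root, encoding="unicode")

doc = to_xml([(-10.0, 65.0, 140), (-9.75, 65.0, 160)])
```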
4.1 Annotation Tool
The results of the empirical study indicated that heatmap images share common characteristics which, however, in most cases do not match exactly (e.g. in spatial arrangement). Therefore, it is necessary to
manually identify the interesting parts in the images in order to
automatically extract the important information. This can be
achieved by a semi-automatic approach, which involves an
administrative user, who would annotate parts of the image.
The information provided by this user includes mainly the
position and the dimension of several pre-specified elements of
the image, such as the position and size of the heatmap, the color
scale, the x and y axis, and the title, which are saved in a template
file. To do this, the user has to define regions of interest (ROI) and/or points of interest (POI) inside these images.
Within the context of this work, a tool has been developed to
make the annotation process easier and more user-friendly. The
tool called Annotation Tool (“AnT”) can load images and
predefined templates, and let the user interactively annotate them.
The tool receives as input an image and produces an XML
configuration file as output.
The basic annotation structure of the images is preloaded based on
the image type, in the form of an XML template file, which
includes all necessary elements to describe the images (e.g. for
heat maps at least the Region Of Interest (ROI) of the map, the
ROI of the coordinates, etc.). The Annotation tool (Figure 3)
provides two data views on the annotation data: a tree view and a
graphical view. The tree view can be used to traverse through all
the elements and edit the values of the elements by hand. In case
of ROIs or Points of Interest (POI) these elements are drawn in
the graphical view onto the loaded image as overlay, to verify the
parameters (e.g. size and position of an ROI). By using the mouse
the displayed ROIs and POIs can be modified. In case of ROIs the
four corner points of the drawn rectangle can be used to resize the
actual element. By clicking in the centre of the rectangle, the
object can be moved. POIs can be moved to another position. Any
change of position and size is synchronized between both views.
Figure 3. The user interface of Annotation Tool
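As an illustration, a template produced by AnT might resemble the fragment below; the element names, attributes and coordinate values are hypothetical, since the paper does not publish the actual schema:

```xml
<!-- Hypothetical AnT template: element names and values are illustrative. -->
<template type="heatmap">
  <roi name="map"   x="90"  y="120" width="640" height="1100"/>
  <roi name="scale" x="760" y="120" width="60"  height="700"/>
  <roi name="title" x="0"   y="0"   width="820" height="60"/>
  <poi name="reference" x="98" y="125"/>
</template>
```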
4.2 Text Processing Module
This module is driven by the configuration file and focuses on
retrieving the textual information captured in the image using
OCR and text processing.
The first step includes the application of OCR on several parts of
the input image. We used the ABBYY FineReader2 OCR software,
which is applied separately on the following parts of the initial
image: title, color scale, map x and y axis. In the second step
empirical text processing techniques are applied on the results of
OCR in order to make corrections by combining different sources
of information. Part of the extracted information (e.g. coordinates)
is used as input for the Image Processing Module. In the following, we describe the two steps by applying them to the typical heatmap image of Figure 1 and present the results.
4.2.1 OCR on Title, Color Scale, Map Axis
Based on the empirical study, a considerable part of the meaningful
information can be extracted from the text surrounding the image.
More specifically, color scale and map axis are essential elements
that provide information about the values and the geographical
area covered. On the other hand, the title contains information
about the environmental aspect measured and the corresponding
date/ time. The location of the aforementioned image parts is
captured in the configuration template.
We apply OCR to the heatmap of Figure 1. Tables 1, 2 and 3 contain the input and output of OCR for the title, the color scale and the y axis (the results for the x axis are omitted, since they are generated in a similar way to those of the y axis). The values in
bold indicate the errors produced by OCR. It should be noted that
for the cases of color scale and x, y axis, we also retrieved the
exact position of the text, in order to relate the text position with
geographical coordinates.
Table 1. Title - Image (top) and OCR output (bottom)
Tuesday 29 November 2011 OOUTC GEMS-RAQ Forecast
t4027 VT: Wednesday 30 November 2011 03UTC
Model: EURAD-IM Height level: Surface Parameter: Ozone [
\iq m31
Table 2. Color scale – Image (left) and OCR (with position)
output (right)
Position: left, top, right, bottom – Value
Position: 0, 17,51, 39 – Value: 360
Position: 0, 128, 50, 150 – Value: 240
Position: 0, 236, 51, 258 – Value: 200
Position: 3, 345, 51, 366 – Value: 180
Position: 3, 456, 51, 477 – Value: 180
Position: 2, 563, 51, 585 – Value: 140
Position: 3, 672, 51, 693 - Value: 120
Table 3. Coordinates of y axis – Image (left) and OCR (with
position) output (right)
Position: left, top, right, bottom – Value
Position: 20, 194, 78, 213 – Value: 65°N
Position: 20, 386, 78, 405 - Value: 60°N
Position: 21, 578, 77, 596 - Value: 55°N
Position: 21, 770, 78, 789 - Value: 50°N
Position: 20, 959, 47, 978 - Value: 45
Position: 51, 960, 77, 977 - Value: °N
Position: 21, 1155, 47, 1172 - Value: 40
Position: 51, 1155, 77, 1172 - Value: °N
4.2.2 Text processing on OCR Results
Then, we apply text processing to extract, correct and understand the semantic information encoded in the aforementioned parts. Each of these segments was treated in a different way, since the type of the semantic information included is different.

Title
The title (if it exists) usually contains the name of the aspect, the
measurement units and the date/ time. Moreover, in case that the
data included in the map are forecast data, they contain two dates.
Regarding the measurement units, these are usually standard
depending on the measured environmental aspect and therefore
we will not attempt to extract them. The date/time is considered as
the most complex element given that it is presented in several
different formats. In the current implementation, we focus on
formats similar to the GEMs site. In order to correct possible
errors in the textual format of the month, day and aspect we apply
the Levenshtein distance and compare with three English ground
truth sets. Then we correct the initial OCR result by considering
the word from the ground truth dataset that has the minimum
distance from it. For simplicity, in the current version we
considered having only one date/time in the title.
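The correction step can be sketched with a small Levenshtein implementation; the month list below is a real ground-truth set, but the misread token is an illustrative example rather than actual OCR output from the paper:

```python
# OCR correction sketch: each OCR token is replaced by the ground-truth word
# with the minimum Levenshtein (edit) distance.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def correct(token: str, ground_truth: list[str]) -> str:
    """Return the ground-truth word closest to the OCR token."""
    return min(ground_truth, key=lambda w: levenshtein(token.lower(), w.lower()))

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

print(correct("Novenber", MONTHS))  # prints "November"
```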
In the specific example, no corrections were required. Thus, the
information we obtained from the title is the following:
Date/time: 2011-11-29 00:00:00, Aspect: Ozone

Color Scale
The color scale shows the mapping between color variations in the map and environmental aspect values. The extraction of information from the color scale is a two-step procedure.
The first step corrects OCR results, while the second correlates
values to colors. In order to correct the OCR results, the most
common difference among the scale values is calculated and then
the error values are adapted accordingly. The correlation of values to colors is achieved by taking the top-bottom or left-right coordinates of the color scale values (depending on the color scale orientation) and mapping them to the closest color.
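A minimal sketch of the scale correction, assuming an arithmetic scale with a constant step; the input values are illustrative, with 700 standing in for a misread 100:

```python
# Correct OCR'd color-scale values using the dominant step between entries.
# Values that break the step are re-derived from their predecessor; if the
# first entry itself is wrong, this simple sketch would propagate the error.
from collections import Counter

def correct_scale(values):
    diffs = [a - b for a, b in zip(values, values[1:])]
    step = Counter(diffs).most_common(1)[0][0]   # most common interval
    fixed = list(values)
    for i in range(1, len(fixed)):
        if fixed[i - 1] - fixed[i] != step:
            fixed[i] = fixed[i - 1] - step       # re-derive the outlier
    return fixed

print(correct_scale([160, 140, 120, 700, 80, 60]))  # the misread 700 becomes 100
```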
In the specific example, the most common interval among the
values in the scale is 20 and error values are corrected based on
that. Then, values are mapped onto coordinates and thus colors.
For example, 140-160 is mapped onto the color found at coordinates (719, 224) of the initial image.

X and Y Axis
Regarding the x and y axes, similar processing techniques are applied, since they both represent the geographical coordinates of the map. Specifically, at least two points of the map, as well as their positions with respect to the map, need to be resolved in order to successfully identify the coordinates of all points. The procedure followed again includes two steps: a) correction of the errors produced by OCR and b) use of the elements' positions to resolve the coordinates.
For the specific example, after correcting OCR results, we
associated the geographical coordinates (-10, 65) and (-5, 60) to
the image map pixels (98, 125) and (162,189) respectively.
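With the two reference points above, the pixel-to-coordinate mapping reduces to a linear fit per axis; the sketch below assumes an equirectangular projection, where such a linear transform is exact:

```python
# Fit a linear transform per axis through two reference points recovered from
# the axis labels; the pairs below are the ones from the example in the text.

def linear_map(p0, v0, p1, v1):
    """Return f(pixel) -> coordinate for a linear (equirectangular) axis."""
    slope = (v1 - v0) / (p1 - p0)
    return lambda p: v0 + slope * (p - p0)

lon = linear_map(98, -10.0, 162, -5.0)   # x-pixel -> longitude
lat = linear_map(125, 65.0, 189, 60.0)   # y-pixel -> latitude

print(lon(162), lat(189))  # -5.0 60.0
```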
4.3 Image Processing Module
In this section we present the image processing module that extracts data from different models and coordinate systems. The tool integrated into the system is the AirMerge engine, which performs various tasks concerning the analysis, reverse engineering and reuse of heatmaps such as chemical weather forecasts.
The AirMerge engine combines elements of screen scraping,
image processing and geographical coordinate transformations, in
order to produce uniform, indexed data using a unified format and
geographical projection. The engine is already in use as part of a
more complex production environment [9], [3], it is available as a
REST service via an API, and also in the proposed tool-chain as
the final processing step, as indicated in Figure 2.
4.3.1 Screen Scraping
This step handles the cropping of the original image to a region of
interest (the heatmap) and parsing of it into a 2D data array
directly mapped to the original images’ pixels. Also, it associates
each color to minimum/maximum value ranges of the air pollutant
concentration levels, which is often implied by the color scale
associated with the original images. It should be noted that the
information about where to crop, where each color on the legend
is, etc. are provided by the configuration template of the AnT in
the proposed system. In this phase, the mapping of the images’
raster to a specific geographical grid is performed, since the
images themselves represent a geographical region. The
configuration system allows choosing between the most
commonly encountered geographical projections (equirectangular,
conical, etc.) and choosing keypoints in the image to allow for
precise pixel-coordinate mapping. These functions are semi-
automated in the autonomous AirMerge system.
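The color-classification core of this step can be sketched as follows, with a hypothetical three-color legend mapping RGB triples to concentration ranges; nearest-color matching in RGB space is a simplification of the engine's actual color handling:

```python
# Crop an RGB image (nested lists) to a region of interest and classify each
# pixel by its nearest legend color, yielding a 2D grid of value ranges.

def classify_pixels(image, roi, legend):
    x0, y0, x1, y1 = roi
    def nearest(rgb):
        # squared Euclidean distance in RGB space
        return min(legend, key=lambda c: sum((a - b) ** 2 for a, b in zip(rgb, c)))
    return [[legend[nearest(image[y][x])] for x in range(x0, x1)]
            for y in range(y0, y1)]

# Hypothetical legend: color -> (min, max) concentration range
legend = {(0, 0, 255): (0, 20), (0, 255, 0): (20, 40), (255, 0, 0): (40, 60)}
img = [[(0, 0, 250), (250, 5, 5)],
       [(10, 240, 10), (0, 0, 255)]]
grid = classify_pixels(img, (0, 0, 2, 2), legend)
print(grid)  # [[(0, 20), (40, 60)], [(20, 40), (0, 20)]]
```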
4.3.2 Reconstruction of Missing Values and Data
This step deals with unwanted elements such as legends, text,
geomarkings and watermarks, as well as regions that are not part
of the forecast area. The image’s pixels are classified into three
main categories: valid data (with colors that satisfy the color
scale’s classification), invalid data (with colors not present in the
color scale), and regions containing colors that are explicitly
marked for exclusion, and which are considered void for all
further presentation and processing. However, regions containing
unmarked invalid data are considered as regions with correctable
errors or “data gaps” which can be filled-in. This distinction is due
to their different appearance patterns: void regions are usually
extended and continuous (e.g. sea regions not covered by the
forecast, but present on the map), while invalid data regions are
usually smaller but more noticeable (e.g. lines, text, watermarks
“buried” in valid data regions) and with more noise-like patterns,
and thus it is more compelling to remove them by using gap-
filling techniques. These techniques include traditional grid
interpolation as well as pattern-based interpolation techniques
using neural networks, which are described in detail in [8].
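The gap-filling idea can be sketched with plain grid interpolation (the pattern-based neural-network variant is beyond this illustration): invalid pixels, marked None, are filled with the mean of their valid 4-neighbours, sweeping until no gaps remain:

```python
# Fill "data gaps" (None cells) in a 2D grid with the mean of valid
# 4-neighbours, sweeping repeatedly so gaps shrink inwards from their edges.

def fill_gaps(grid):
    h, w = len(grid), len(grid[0])
    grid = [row[:] for row in grid]
    while any(v is None for row in grid for v in row):
        progress = False
        for y in range(h):
            for x in range(w):
                if grid[y][x] is None:
                    nbrs = [grid[j][i]
                            for j, i in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                            if 0 <= j < h and 0 <= i < w and grid[j][i] is not None]
                    if nbrs:
                        grid[y][x] = sum(nbrs) / len(nbrs)
                        progress = True
        if not progress:   # nothing valid to anchor on (fully void region)
            break
    return grid

print(fill_gaps([[1.0, None, 3.0]]))  # [[1.0, 2.0, 3.0]]
```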
5. Evaluation
The evaluation of the framework is carried out in two steps with
different focus. The first step deals with evaluating the OCR and
providing a visual assessment of the output, while the second
evaluates the final system result after running a series of tests on
several images and comparing with ground truth. We omit the
presentation of the final XML output of the system (i.e. mapping
of geographic coordinates to forecast values), since its visual
presentation is not that informative, and instead we present the
reconstructed image, which derives from this representation and is
more appropriate for visual inspection.
5.1 OCR Performance and Visual Results
The tests in this step focus on the recognition of the x and y axis
and evaluate the assignment of pixels to geographical coordinates.
Given the fact that we were not aware of the initial heatmap values, we can only assess the results of AirMerge by visual
comparison of the original image and the one produced by
AirMerge. However, a more detailed assessment and evaluation of
the AirMerge system can be found in [7], [2] and [3].
The images tested during the first step of the evaluation were
extracted from the following sites of the Finnish Meteorological
Institute (FMI): the Pollen FMI site and the SILAM model FMI site.

Pollen FMI website
Figures 4 and 5 depict the original image and the reconstructed
image produced by the proposed system after visualizing the XML output. The reconstructed figure is almost identical to the original and, in addition, any noise (e.g. black lines) was removed.
In Table 4 we report the error introduced by the OCR, termed the “absolute error”, which is calculated as the difference obtained when subtracting the OCR estimation (e.g. 4.98775 in the first line) from the initial degrees range (e.g. 5 in the first line). In both
cases the absolute error is very low (around 0.3%) and acceptable.
Figure 4: Original Image
Figure 5: Reconstructed Image
Table 4. OCR error in Pollen website
                  Degrees   Estimation   Absolute Error
Longitude step       5        4.98775       0.01225
Latitude step        5        4.98404       0.01596
SILAM model FMI website
Figure 6: Original Image
Figure 7: Reconstructed Image
Table 5. OCR error in SILAM website
                  Degrees   Estimation   Absolute Error
Longitude step       5        4.97516       0.02484
Latitude step        5        4.96523       0.03477
In the case of the SILAM site, based on visual assessment, the reconstructed image (Figure 7) is almost identical to the initial one
(Figure 6). The absolute geo-coordinate error is very low (around
0.6%) and thus the error introduced by OCR is not significant.
5.2 System Evaluation
In this step we focus on evaluating the system output with
heatmaps from different providers. The evaluation is realized by
comparing the results of the AirMerge system based on manual configuration, which are considered as ground truth, with the results of the proposed system involving AnT, OCR and AirMerge.
The tests are performed on a set of 60 images, extracted from the
following sites and locations for different times/dates:
GEMS site,
Pollen FMI site,
SILAM site,
Atmospheric and Oceanic Physics Group site,
In Table 6 we report the following results for every site: a) the
number of images, b) the number of different colors in the color
scale, c) the absolute latitude and longitude errors, which indicate
the error introduced by the proposed framework for 5° in each axis, d) the average percentage of pixels with correct values (i.e. compared with the values provided by the manually configured AirMerge) and e) the average error introduced in each pixel due to OCR and thus misalignment of the coordinates. The error ε is calculated as

ε = (1/N) · Σᵢ₌₁ᴺ |vᵢ − ṽᵢ|

where N is the total number of pixels, vᵢ is the value of pixel i using AirMerge with manual configuration and ṽᵢ is the value estimated by the system.
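A sketch of the per-pixel error computation, assuming the metric is the mean absolute difference between the manually configured AirMerge values and the system's estimates (the exact normalization used in the paper is not fully recoverable from the text):

```python
# Mean absolute per-pixel difference between ground-truth values (manual
# AirMerge configuration) and the proposed system's estimates.

def avg_pixel_error(manual, estimated):
    assert len(manual) == len(estimated)
    return sum(abs(m - e) for m, e in zip(manual, estimated)) / len(manual)

err = avg_pixel_error([1.0, 2.0, 4.0], [1.0, 2.0, 3.0])
```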
Based on Table 6, it is evident that both the latitude and longitude
errors are quite low for all sites and the percentage of pixels with
correct values is satisfactory. The error introduced in each pixel
value is in general quite low (around 6%); only in the case of the Pollen FMI site is the error higher (around 12%). This is probably due to the fact that the values of sequential pixels varied more strongly than in the other sites.
6. Conclusions
In this paper, we propose a framework for environmental
information extraction from air quality and pollen forecast
heatmaps, combining image processing, template configuration,
as well as textual recognition components. This framework could
serve as a basis for supporting environmental systems that provide
either air quality information from several providers for direct
comparison or orchestration purposes or decision support [1] on
everyday issues (e.g. travel planning). The proposed work overcomes the limitation of not having access to the raw data, since it relies only on information that is publicly available on the Internet. Future work includes extensive evaluation with more
images in different projections (e.g. conical), recognition of
additional elements with OCR, as well as employment of Linked
Open Data to enrich the semantics of the extracted information.
Acknowledgments
This work was supported by the FP7 project PESCaDO.
References
[1] Wanner, L., Rospocher, M., Vrochidis, S., Bosch, H.,
Bouayad-Agha, N., Bugel, U., Casamayor, G., Ertl, T.,
Hilbring, D., Karppinen, A., Kompatsiaris, I., Koskentalo, T.,
Mille, S., Moßgraber, J., Moumtzidou, A., Myllynen, M.,
Pianta, E., Saggion, H., Serafini, L., Tarvainen, V., and
Tonelli, S. 2012. Personalized Environmental Service
Configuration and Delivery Orchestration: The PESCaDO
Demonstrator. In Proceedings of the 9th Extended Semantic
Web Conference (ESWC 2012), Heraclion, Crete, Greece.
[2] Epitropou V., Karatzas K. and Bassoukos A. 2010. A method
for the inverse reconstruction of environmental data
applicable at the Chemical Weather portal. In Geospatial
Crossroads @GI_Forum’10, In Proceedings of the
GeoInformatics Forum Salzburg, 58-68, Wichmann Verlag,
Berlin, ISBN 978-3-87907-496-9.
[3] Epitropou V. Karatzas K., Kukkonen J. and Vira J. 2012.
Evaluation of the accuracy of an inverse image-based
reconstruction method for chemical weather data,
International Journal of Artificial Intelligence, in press.
[4] Musavi, M.T., Shirvaikar, M.V., Ramanathan, E. and
Nekovei, A.R. 1988. Map processing methods: an automated
alternative. In Proceedings of the Twentieth Southeastern
Symposium on System Theory, 300-303.
[5] Henderson, T. C. and Linton, T. 2009. Raster Map Image
Analysis. In Proceedings of the 2009 10th International
Conference on Document Analysis and Recognition (ICDAR
'09). Washington DC, USA, 376-380.
[6] Cao, R. and Tan, C. 2002. Text/graphics separation in maps.
In Fourth IAPR Workshop on Graphics Recognition, 2390,
167–177, Springer, Berlin.
[7] Epitropou, V., Karatzas, K.D., Bassoukos, A., Kukkonen, J.
and Balk, T. 2011. A new environmental image processing
method for chemical weather forecasts in Europe. In
Proceedings of the 5th International Symposium on
Information Technologies in Environmental Engineering,
Poznan, (Golinska, Paulina; Fertsch, Marek; Marx-Gómez,
Jorge, eds.), ISBN: 978-3-642-19535-8, Springer Series:
Environmental Science and Engineering, 781-791.
[8] Karatzas K. 2009. Informing the public about atmospheric
quality: air pollution and pollen, Allergo Journal, 18, Issue
3/09, 212-217.
[9] Balk T., Kukkonen J., Karatzas K., Bassoukos A. and
Epitropou V. 2011. A European open access chemical
weather forecasting portal, Atmospheric Environment, 45,
6917-6922, doi:10.1016/j.atmosenv.2010.09.058.
[10] Kukkonen, J., Klein, T., Karatzas, K., Torseth, K., Fahre Vik,
A., San Jose, R., Balk, T. and Sofiev, M. 2009. COST
ES0602: Towards a European network on chemical weather
forecasting and information systems, Advances in Science
and Research Journal, 1, 1–7.
Table 6. Results comparing the proposed system output with the manually configured AirMerge.

                                               Pollen FMI   Atmospheric and Oceanic Physics Group site
Number of images                               15           15           15           15
Number of colors                               9            11           12           12
Latitude error (in °)                          2.85·10⁻⁴    8.54·10⁻⁴    2.42·10⁻⁴    6.38·10⁻⁴
Longitude error (in °)                         0.00174      3.88·10⁻⁴    0.00164      1.39·10⁻⁴
Mean percentage of pixels with correct value   97.424 %     90.957 %     77.23 %      77.3 %
Average error per pixel                        0.1265       0.0365       0.0619       0.0504
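The last two rows of Table 6 can be computed directly once a reconstructed grid and a reference grid are available. The sketch below is illustrative only; the function name and the list-of-rows grid representation are assumptions rather than part of the paper, and each pixel value is treated as a numeric pollutant class:

```python
def evaluate_reconstruction(reconstructed, reference):
    """Compare a reconstructed concentration grid against the reference grid.

    Both grids are equally sized lists of rows of numeric pixel values.
    Returns the two per-image metrics reported in Table 6: the percentage
    of pixels whose reconstructed value matches the reference exactly,
    and the mean absolute error per pixel.
    """
    # Flatten both grids into aligned (reconstructed, reference) pixel pairs.
    pairs = [(r, g)
             for rec_row, ref_row in zip(reconstructed, reference)
             for r, g in zip(rec_row, ref_row)]
    n = len(pairs)
    correct_pct = 100.0 * sum(1 for r, g in pairs if r == g) / n
    mean_abs_error = sum(abs(r - g) for r, g in pairs) / n
    return correct_pct, mean_abs_error
```

The per-provider figures in the table would then be averages of these two values over the 15 test images of each source.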