Abstract

There is a large amount of meteorological and air quality data available online. Often, different sources provide deviating and even contradicting data for the same geographical area and time. This implies that users need to evaluate the relative reliability of the information and then trust one of the sources. We present a novel data fusion method that merges the data from different sources for a given area and time, ensuring the best data quality. The method is a unique combination of land-use regression techniques, statistical air quality modelling and a well-known data fusion algorithm. We show experiments where a fused temperature forecast outperforms individual temperature forecasts from several providers. Also, we demonstrate that the local hourly NO2 concentration can be estimated accurately with our fusion method while a more conventional extrapolation method falls short. The method forms part of the prototype web-based service PESCaDO, designed to cater personalized environmental information to users.
The Extraction and Fusion of Meteorological and Air
Quality Information for Orchestrated Services
Lasse Johansson
The Finnish Meteorological Institute,
Dept. of Atmospheric composition
Erik Palmenin aukio 1
00101, Helsinki, Finland
Victor Epitropou
and Kostas Karatzas
Aristotle University of Thessaloniki,
Dept. of Mechanical Engineering,
54124 Thessaloniki, Greece
Leo Wanner
Catalan Institute for Research and
Advanced Studies,
Dept. of Information and
Communication Technologies,
Pompeu Fabra University, Barcelona,
Ari Karppinen
and Jaakko Kukkonen
The Finnish Meteorological Institute,
Dept. of Atmospheric composition
Stefanos Vrochidis
and Ioannis Kompatsiaris
Information Technologies Institute, Centre for Research
and Technology Hellas, Thessaloniki, Greece
The PESCaDO system (Personal Environmental Service
Configuration and Delivery Orchestration) aims at providing
accurate and timely information about local air quality and
weather conditions in Europe. The system receives environment
related queries from end users, discovers reliable environmental
multimedia data in the web from different providers and
processes these data in order to convert them into information and
knowledge. Finally, the system uses the produced information to
provide the end user a personalized response. In this paper, we
present the general architecture of the above-mentioned system,
focusing on the extraction and fusion of multimedia
environmental data. The main research contribution of the
proposed system is a novel information fusion method based on
statistical regression modelling that uses as input data land use
and population density masks, historic track-record of data
providers as well as an array of atmospheric measurements at
various locations. An implementation of this fusion model has
been successfully tested against two selected datasets on air
pollutant concentrations and ambient air temperatures.
1. Introduction

Recently, the emergence of social media, personalized web
services and the increased public awareness of environmental
conditions that impact the quality of life have resulted in the
demand for easier access to environmental information tailored to
personal requirements. In particular, in the case of the atmospheric
environment, there is a need for an integrated assessment of the
impact of air pollution, allergens and extreme meteorological
conditions on public health [9], [8]. In addition, this information
has to be disseminated to citizens in an easily accessible form [7].
Getting a direct answer to a seemingly simple question such as
“How will the air quality be tomorrow in Glasgow?” involves
extensive manual search and expert interpretation of the often
contradictory and heterogeneous information found on various
web sites. Furthermore, a significant portion of air quality and
meteorological information is published on the Internet only in
the form of colour-mapped, geo-referenced images [1]. Also, the
available information might vary significantly in reliability and
relevance with respect to the queried location and time. On the
other hand, even biased and inaccurate information about air
quality could be utilized effectively by data fusion methods in
order to provide reliable information. The success of fusing
multiple model results is evident in the case of models with no
major deviation of forecasting performance, and has been
demonstrated in many related studies [22].
In this context, an approach has been presented in [5] to
provide air quality information for any location within a large
geographical domain, by fusing air quality data from multiple
sources using a statistical air pollution model (RIO). A
review of land use regression (LUR) models has stated
that LUR models have been very successful in predicting annual
mean concentrations of NO2 and PM2.5 in urban environments [4].
However, these state-of-the-art LUR models are difficult to utilize
for the accurate prediction of hourly concentrations of air
pollutants; a more dynamic approach is needed. Another
complication is the extremely heterogeneous nature of input data
which may contain model forecasts and observations, both with
varying reliability, time of validity and location. Spatial and
temporal gaps are also a matter of concern; there are only a finite
number of measurement stations, and forecasting models also
have a finite spatial and temporal resolution. These considerations
lead to the need to use some form of data interpolation either in
space or time, or both.
In this paper, we aim to describe the general architecture of the
PESCaDO system, focusing especially on the fusion of extracted
information [20], [21]. First, we discover environmental nodes
(i.e. web resources that include environmental measurements),
which are relevant to the area of interest. Then, a specific service
called AirMerge is presented, which is capable of performing
extraction and fusion of information from a wide range of online
Chemical Weather (CW) forecasting systems. The online fusion
service is then presented; this is a general method for the fusion of
processed meteorological and air quality data, and is also the
main topic of this paper.

Copyright © by the paper’s authors. Copying permitted only for private
and academic purposes.
In: S. Vrochidis, K. Karatzas, A. Karppinen, A. Joly (eds.): Proceedings of
the International Workshop on Environmental Multimedia Retrieval
(EMR2014), Glasgow, UK, April 1, 2014.

There are many definitions of data
fusion, as it is a method that is applied to various scientific
domains, such as remote sensing, meteorological forecasting,
sensor networks, etc. [19]. We use the term “fusion” to describe
the process of integration of multiple data and knowledge into a
consistent, accurate, and useful representation. An evaluation of
the performance of this fusion system is presented for two
selected cases: i) the fusion of atmospheric temperature forecasts
and (ii) the fusion of measured NO2 concentrations.
2. The PESCaDO system

We present here an overview of the general architecture of the
PESCaDO system. For a more detailed description, the reader is
referred to [20], [21].
2.1 An overview of the PESCaDO system
The purpose of the PESCaDO system is to address the need for
timely personalized environmental information. It first processes
user queries, based on personal information about the user,
formulated in terms of a user profile. For instance, health
conditions such as asthma may affect the displayed warnings and
recommendations, while the user group (e.g. citizen or
administrative expert) affects the level of detail and technicality
of the response.
Figure 1a-b: A simplified schematic diagram of the PESCaDO
system, starting from the user defined query and ending at
the delivery of response (a). An example response for the user
is presented in figure (b).
The queries are formulated in terms of PESCaDO’s Problem
Description Language via an interactive web interface. First, the
system discovers environmental nodes that contain measurements
for the areas of interest. Then, for each query, (i) relevant
environmental data sources are orchestrated, (ii) data from textual
and image formats in the sources are identified, extracted, fused
and reasoned over to assess the relevance of the data for the user,
and (iii) his query and the outcome are presented in terms of a
bulletin in the language of the preference of the user.
Figure 1a illustrates the information flow of PESCaDO from
the viewpoint of the Fusion Service, which is the backbone of the
system. The system includes two uncoupled process chains, referred
to here as pipelines, that operate in offline and online modes. In the
offline pipeline, environmental websites that cover the region
targeted by the user are searched for in the web and data are
extracted from the identified sites and fed into the database of the
system. We use the term ‘offline’ here since at the time of user
query the data used by the pipeline has already been retrieved,
processed and stored into a local database. In the online pipeline
user queries are processed and answered. The online pipeline
starts from the specification of personal information and query by
the user. With this information, the system first determines which
aspects of environmental and contextual knowledge (e.g.
temperature, CO2 concentration, etc.) are relevant to the user and
his query (cf. Fig. 1, Relevant aspects determination). Next, the
Fusion Service (FS) is given a request to produce fused
information about the identified relevant aspects. At this stage,
the system retrieves information from the database and starts to
process it. The ‘relevant aspects’ could be, for instance, “NO2
concentration and ambient air temperature, tomorrow between
12:00 and 18:00 in a specified region in Helsinki, given the
reported traffic density”. Furthermore, the user profile
(administration personnel vs. citizen; healthy individual vs.
allergic, etc.) affects the way the response is ultimately presented
to the user (relevant aspects determination).
The Data Retrieval Service (DRS) serves as an interface,
through which other PESCaDO services can retrieve information
(i.e. environmental measurements) from the database. The Fusion
Service queries the DRS to receive environmental data available
for the requested geographic areas and time periods for all related
environmental aspects. After the FS fuses the data retrieved from
the DRS, these are inserted into the PESCaDO Knowledge Base (KB).
The KB contains, manages and provides
information represented with the PESCaDO ontology to other
services [13]. This KB also provides the Fusion Service with
supporting information needed in the fusion process. This
includes source identification and fixed coordinates if available,
and source reliability. Furthermore, the PESCaDO ontology helps
to translate verbal ratings into numeric form if needed. For
instance, the expression “heavy rain” can be converted into mm/h
numeric value with the help of the concept definitions in the
ontology. More specifically, the KB is queried about the upper
and lower limit for “heavy rain” in the specified region and then
the average value of the returned limits can be taken to represent
the input in numeric format - an approach related to the use of
fuzzy logic methods in air quality problems [6].
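As an illustration, the limit-midpoint conversion described above can be sketched as follows; the limit table and function name are illustrative assumptions, not the actual PESCaDO ontology values:

```python
# Hypothetical sketch: converting a verbal rating such as "heavy rain" into a
# numeric value by averaging the lower and upper limits returned by the KB.

def verbal_to_numeric(rating, kb_limits):
    """Return the midpoint of the [lower, upper] interval for a verbal rating."""
    lower, upper = kb_limits[rating]
    return (lower + upper) / 2.0

# Illustrative limit table (mm/h); a real system would query the KB/ontology.
limits = {"light rain": (0.0, 2.5),
          "moderate rain": (2.5, 7.6),
          "heavy rain": (7.6, 50.0)}
value = verbal_to_numeric("heavy rain", limits)  # 28.8 mm/h
```

In practice the limits would also be region-specific, as the KB query described above takes the specified region into account.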
Once all input data are in numeric form, the FS fuses the data
by one variable (e.g. temperature, wind speed, NO2 or O3) at a
time, utilizing available uncertainty metrics for each information
source given by the Uncertainty Metrics tool (UMT). Fused data
are stored in the KB and then the tasks, including the selection,
structuring and presentation of the information resulting from the
fusion to the user can be carried on. In parallel, the retrieved
information, which can be used for performance evaluation later
on, is passed to UMT and stored. Using this stored information,
UMT evaluates measured values against forecasts autonomously
and produces updated source node uncertainty metrics.
2.2 Discovery of environmental nodes
As described in the previous section, the first step realized by
the PESCaDO framework is the discovery of environmental
nodes. The huge number of the nodes, their diversity both in
purpose and content, as well as their widely varying and a priori
unknown quality, set several challenges for the discovery and the
orchestration of these services [21].
The PESCaDO discovery framework combines the two main
methodologies of internet domain specific search: (a) the use of
existing search engines for the submission of domain-specific
automatically generated queries, and (b) focused crawling of
predetermined websites [23]. To support domain-specific search
using a general-purpose search engine [12], two types of domain-specific
queries are formulated: basic and extended.
Basic queries are produced by combining environmental related
keywords (e.g. weather, temperature) with geographical data (e.g.
city names). Extended queries are generated by enhancing the
basic queries with additional domain-specific keywords, which
are produced using the keyword spice technique [14]. Both types
of queries are then submitted to the Yahoo BOSS search API.
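A minimal sketch of the two query types, assuming simple string concatenation; the keyword-spice terms produced by the technique of [14] are replaced here by a placeholder list:

```python
# Sketch of basic vs. extended query formulation. The keyword and spice lists
# are illustrative, not the vocabularies actually used by the system.

def basic_queries(env_keywords, locations):
    """Combine environmental keywords with geographical data."""
    return [f"{kw} {loc}" for kw in env_keywords for loc in locations]

def extended_queries(basic, spices):
    """Enhance basic queries with additional domain-specific keywords."""
    return [f"{q} {s}" for q in basic for s in spices]

basic = basic_queries(["weather", "air quality"], ["Helsinki"])
extended = extended_queries(basic, ["forecast"])
```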
In parallel, a focused crawler is employed, built upon the
Apache Nutch crawler and is based on [18]. This implementation
attempts to classify sites by using hyperlink and text information
(i.e. anchor text and text around the link) with the aid of a
supervised classifier. This approach is new in comparison to a
previously presented method for web-based information
identification and retrieval with the aid of a domain vocabulary
and web-crawling tools [2].
The output of both techniques is post-processed in order to
improve the precision of the results by separating relevant from
irrelevant nodes and categorizing and further filtering the relevant
nodes with respect to the types of environmental data they
provide (air quality, pollen, weather, etc.). The determination of
the relevance of the nodes and their categorization is done using a
supervised classification method based on Support Vector
Machines (SVM). The SVM classifiers are trained with manually
annotated websites and textual and visual features extracted from
the environmental nodes. The textual features are key phrases and
concepts extracted from the metadata and content of the
webpages using KX [15] and the vector representation is based on
the bag of words model. The visual features (MPEG-7, [17]) are
extracted from the images included in the discovered websites in
order to identify heatmaps that are usually present in air quality
forecast websites.
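For illustration, the bag-of-words representation of the textual features can be sketched as follows; the vocabulary and document below are invented, whereas the real system uses key phrases and concepts extracted with KX:

```python
# Sketch of a bag-of-words feature vector: count how often each vocabulary
# term occurs in the document text. A classifier (e.g. an SVM) would then be
# trained on such vectors.
from collections import Counter

def bow_vector(text, vocabulary):
    """Return term counts for the given vocabulary (order preserved)."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocabulary]

vocab = ["air", "quality", "forecast", "pollen"]
vec = bow_vector("Air quality forecast for the air quality index", vocab)
```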
2.3 Orchestration of environmental nodes
and data extraction
Once the environmental nodes have been detected and indexed,
they are available as data sources or as active data consuming
services (if they require external data and are accessible via a web
service API).
To distil data from text, advanced natural language parsing
techniques are applied, while to transform semi-structured web
content into structured data, regular expressions and HTML trees
are used. Data extraction from images is focused on heatmap
analysis using the AirMerge system, described in the following
section.

2.4 AirMerge subsystem
A significant portion of Air Quality (AQ) related information
(in particular, Chemical Weather forecasts) is published on the
Internet only in the form of colour-mapped geo-referenced
images. Such image-based information cannot be parsed
via the usual text-mining and screen-scraping techniques used in
web mash-up-like services. It was thus important to provide
PESCaDO with a specialized service that allows accessing and
using CW forecast images as another source of data to use during
the Orchestration and Fusion phases. Such a system, called
AirMerge, has already been developed and described in [3], [1].
AirMerge is an open-access system which currently covers
the whole European continent (the coverage of other territories is
also possible). It accesses a wide number of environmental nodes
containing CW information and can automatically extract data
from various data sources. These
images commonly have geographical spatial resolutions ranging
from 1x1 km to 20x20 km, and temporal resolutions from a
minimum of one hour to an entire day [10]. The reported values
usually are maximum or average air pollution concentration
values for the selected integration time.
Figure 2: Example of the conversion process for a PM2.5 forecast
(produced by MACC) using AirMerge. Bitmap data (a) is
transformed into numerical form by using the colour scale (c).
The heatmap (a) has been reproduced in (b) using the
converted numeric grid.
In the context of PESCaDO, apart from performing image
extraction, AirMerge acts as an autonomous web-crawling, parsing
and database-storage mechanism for CW forecasts, using its own
means and processes which are distinct from those of PESCaDO,
having been developed independently. The harvested data cover
most of Europe for a time period going back to August 2010
when it first became operational. Time resolutions range from one
hour to a day, depending on the capabilities of the sources used.
A typical set of CW models and the resulting images can be
found in the European Open-access Chemical Weather
Forecasting Portal described in [1], which has been developed in
the frame of COST Action ES0602.
AirMerge is able to convert such image-based concentration maps
into numerical, geographically referenced data, accounting for
geographical projections, missing data, noise and the differences
in publishing formats between different model providers. The
result is the effective conversion of image data back into
numerical data, which is then made directly available for a
number of numerical processing applications.
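The core of such a conversion can be sketched as a nearest-colour lookup against the legend's colour scale; the scale below is illustrative, and the real AirMerge pipeline additionally handles projections, missing data and noise:

```python
# Sketch of colour-scale inversion: map each pixel colour to the value of the
# nearest legend entry. The colour scale here is an invented three-step legend.

def nearest_value(pixel, colour_scale):
    """pixel: (r, g, b); colour_scale: list of ((r, g, b), value) entries."""
    def dist2(c1, c2):
        return sum((a - b) ** 2 for a, b in zip(c1, c2))
    return min(colour_scale, key=lambda entry: dist2(pixel, entry[0]))[1]

# Illustrative legend: blue -> 5, green -> 20, red -> 80 (ug/m3).
scale = [((0, 0, 255), 5.0), ((0, 255, 0), 20.0), ((255, 0, 0), 80.0)]
value = nearest_value((250, 10, 5), scale)  # closest to red -> 80.0
```

Applying this lookup to every pixel of the heatmap, together with the geographical projection of the image, yields the numerical grid illustrated in Figure 2.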
It should be clarified that in the proposed system AirMerge has
two roles: a) it performs image data extraction and b) it is an
additional environmental node that provides environmental data
encoded in images.
3. The fusion of environmental data

The fusion of information in an orchestrated service such as
PESCaDO offers several advantages to the user. First, the output
of the system includes only one set of values instead of an
extensive collection of pieces of information that may not agree
with each other. Secondly, the fusion result will be of a better
quality with respect to the individual sources. Third, small
geographic and temporal gaps in the input data can be filled.
The above-mentioned services for environmental node
discovery and data retrieval guarantee a large amount of relevant
input data which need to be fused with respect to the user defined
query. However, individual competing pieces of information from
different nodes can seldom be regarded as equally relevant and
thus a general measure for information relevance and quality is
needed for data fusion.
In the fusion process, all pieces of meteorological and air
quality data correspond to a certain time and place. These pieces
of information can be regarded as statistical estimators E_i(d, τ),
or E_i in short, in which d is distance and τ is time, for the
conditions governing the area and time of interest for the user:

    E_i = x(p_0, t_0) + ε_i    (1)

where p_0 / p_i is the coordinate vector for the location of interest /
location associated with the estimator, t_0 / t_i is the time of
interest / estimator time and ε_i is the estimator error. For sensors
the estimator time is simply the time of measurement. The
algorithm that is used in calculating the fused value requires
information about the statistical properties of ε_i, namely the
expected variance of E_i. Thus, a detailed description of the
evaluation of Var(E_i) is given. The fusion service estimates an
aggregate statistical variance measure for each E_i and these
variance measures are then used for the assignment of averaging
weights to each E_i. Essentially, a large estimated aggregate
variance causes the assigned weight to decrease, while the data
from the more accurate and relevant sources are assigned larger
weights and gain more emphasis in the fusion.
3.1 Variance estimation
The variance of E_i, Var(E_i), is affected by the information
source’s capability to properly assess the phenomenon of interest.
In addition, information about air pollutant concentrations and
weather conditions loses accuracy rapidly as a function of the
temporal interval between the measurement time and the time of
interest defined by the user. Furthermore, a data point near p_0
should always get a larger weight in the fusion in contrast to other
data points that describe the conditions in more remote locations.
Thus, we assume that the variance related to E_i is the sum of these
three individual (independent and thus summable) components,
given by

    Var(E_i) = σ_base,i² + σ_d²(d) + σ_τ²(τ)    (2)

where σ_d²(d) is the variance component as a function of d and
σ_τ²(τ) is the temporal variance component as a function of τ,
in which

    d = |p_0 − p_i|    (3a)
    τ = |t_0 − t_i|    (3b)

σ_base,i² in Eq. 2 describes the information source’s
inherent quality in terms of variance, i.e., the capability to
estimate x(p_0, t_0) at point-blank range when d and τ are equal to
zero. For the evaluation of σ_base,i², stored information
about the source’s prediction accuracy in the past can be used,
evaluated by the Uncertainty Metrics Tool (see Fig. 1). More
specifically, measurements and model forecasts are paired
together if they represent the same time and location and the
statistical variance is then calculated for the population of
evaluation pairs.
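Read as code, Eq. 2 is a plain sum of three components; the component models below are illustrative placeholders for the fitted models discussed later in this section:

```python
# Direct reading of Eq. 2: the aggregate variance of an estimator is the sum
# of a source-specific base variance and distance/time components. The
# component models passed in here are illustrative stand-ins.

def aggregate_variance(base_var, d_km, tau_h, var_d, var_tau):
    """Var(E_i) = sigma_base^2 + sigma_d^2(d) + sigma_tau^2(tau)."""
    return base_var + var_d(d_km) + var_tau(tau_h)

var_d = lambda d: 0.5 * d                   # illustrative linear distance model
var_tau = lambda t: 0.02 * t**2 + 0.1 * t   # illustrative 2nd-order time model
v = aggregate_variance(1.0, 10.0, 6.0, var_d, var_tau)
```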
In the presented PESCaDO framework, the location p_i for the
estimator E_i may not have been defined exactly; this is
usually the case, for instance, with extracted weather forecasts for
cities. In these cases p_i actually pinpoints the center of the city
while the information represents the conditions throughout the city.
In such cases the coordinates are flagged as approximations and the
distance is set as d = max(0, |p_0 − p_i| − r), where r is the radius
of the city.
The variance models σ_d²(d) and σ_τ²(τ) can be formulated with
statistical methods. In the fusion service these have been
formulated individually for each air pollutant species using
regression analysis with historical measurement data. For the pilot
application of the method, these data represent 6 to 43
measurement stations across Finland, depending on the measured
variable. More specifically, the following simple regression models
are employed:

    σ_τ²(τ) = a·τ² + b·τ    (4a)
    σ_d²(d) = c·d    (4b)

where the parameters a, b and c are defined with statistical
regression techniques. More complex regression models were also
studied but the added benefit of using more natural, logarithmic
regression models was negligible; the achieved correlation of the
2nd-order polynomial models is generally very high for the temporal
domain of interest (τ < 36 h). In the formulation of σ_d²(d), the
measurement station’s capability to predict the measured
phenomenon at a distance of d (covariance of the two time series)
is evaluated.
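As a sketch of the parameter fitting, assuming for illustration a linear distance model σ_d²(d) = c·d, the slope c can be obtained by least squares through the origin; the data points are invented:

```python
# Least-squares estimate of the slope c in an assumed linear distance-variance
# model sigma_d^2(d) = c * d, fitted through the origin.

def fit_linear_through_origin(ds, variances):
    """Least-squares slope c minimising sum((v - c*d)^2)."""
    return sum(d * v for d, v in zip(ds, variances)) / sum(d * d for d in ds)

# Illustrative (distance km, observed error variance) calibration points.
ds = [5.0, 10.0, 20.0, 40.0]
vs = [2.4, 5.1, 9.8, 20.3]
c = fit_linear_through_origin(ds, vs)
```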
3.2 Optimal weight calculation
Assuming all data sources to be independent and the estimators
to be non-biased (E[ε_i] = 0), an optimal fused value x_fused
can be calculated according to [16], given by

    x_fused = Σ_i w_i · E_i    (5)

where the individual weight w_i is given by

    w_i = (1/Var(E_i)) / Σ_j (1/Var(E_j))    (6)

To assure the statistical independence of E_1 … E_n, only the most
relevant estimator E_i per data source is selected for the fused
value calculation in Eq. 5. If a collection of estimators
{E_1, …, E_m} is available from the same source, the estimator
selected to represent the source is simply the one with the
lowest Var(E_i) in the collection. In the particular case of
extracted time series from measurement stations, the estimator
which has the smallest τ is selected to represent the source, as
d and the base variance are the same for all E_1 … E_m.
Theoretically, it can be shown that the fused value x_fused is
the optimal estimator in terms of mean squared error and that the
prediction accuracy increases as the number of independent
data sources (n) is increased [16]. More importantly, x_fused
does not suffer from low-quality input data, as long as Var(E_i) in
Eq. 2 has been estimated reasonably well.
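The weighting scheme of Eqs. 5 and 6 is standard inverse-variance averaging and can be sketched as follows; the values are illustrative:

```python
# Inverse-variance weighting: each estimate is weighted by the reciprocal of
# its aggregate variance, with the weights normalised to sum to one.

def fuse(estimates, variances):
    """Return the inverse-variance weighted mean and the weights used."""
    inv = [1.0 / v for v in variances]
    total = sum(inv)
    weights = [i / total for i in inv]
    fused = sum(w * e for w, e in zip(weights, estimates))
    return fused, weights

# Three independent estimates with increasingly large variances: the accurate
# source dominates the fused value without the poor ones being discarded.
fused, weights = fuse([20.0, 24.0, 30.0], [1.0, 4.0, 16.0])
```

Note how the high-variance estimate (30.0) still contributes, but with a small weight; this is the sense in which the fusion does not suffer from low-quality input data.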
3.3 Bias correction
In the algorithm presented in section 3.2, it was assumed that
each E_i is an unbiased estimator for the conditions at p_0 at the
time t_0. Local air quality measurements from a different
environment, however, are usually significantly biased estimators
for the conditions in other nearby environments. Moreover, the
hour of day may even contribute to the bias (consider a
measurement station near a busy road during the morning traffic).
Thus, in order to use Eq. 5 effectively, the fusion service utilizes a
geographic profiling feature to detect and automatically remove
this kind of structural bias from the estimators. The fusion service
was incorporated with high-resolution land use and population
density masks for Finland (the selected domain for the PESCaDO
prototype). For land use, a dataset from CORINE with a
resolution of 50 m x 50 m is used. For population density
data (for 2010), the fusion service has the prototype domain
covered with a resolution of 250 m x 250 m. These two data
sources are used for profiling and comparing the differences
between the environments at p_0 and p_i and ultimately, E_i is
polished into an unbiased estimator for x(p_0, t_0). The profiling
is done as follows:
- The surrounding land use (with an evaluation radius of
200 m) and population density (a wider evaluation
radius of 6 km) for both p_0 and p_i is evaluated.
- The evaluated environment is expressed as a collection
of selected land-use frequencies and the population density.
This collection is referred to as a profile in this paper
(Fig 3).
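Profile evaluation can be sketched on a rasterised land-use grid as counting class frequencies within a given radius of a point; the grid, class names and cell size below are illustrative, not CORINE data:

```python
# Sketch of land-use profile evaluation: relative frequency of each land-use
# class within `radius` metres of a grid point (cx, cy). The grid and the
# 50 m cell size are illustrative.
from collections import Counter

def landuse_profile(grid, cx, cy, radius, cell_size=50.0):
    """Return {class: relative frequency} within the evaluation radius."""
    counts = Counter()
    for y, row in enumerate(grid):
        for x, cls in enumerate(row):
            dx, dy = (x - cx) * cell_size, (y - cy) * cell_size
            if dx * dx + dy * dy <= radius * radius:
                counts[cls] += 1
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

grid = [["urban", "urban", "road"],
        ["urban", "road", "park"],
        ["park", "park", "park"]]
profile = landuse_profile(grid, 1, 1, 75.0)
```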
After the evaluation of the profiles, the difference between the
expected values is evaluated. Let P_E be the estimator profile
and P_0 be the evaluated profile corresponding to the user
defined location and time. Then, a bias-corrected estimator
E_i' is given by

    E_i' = E_i − C(P_E, t_i) + C(P_0, t_0)    (7)

where C(P_E, t_i) is the expected hourly concentration of the
pollutant at the estimator’s location at time t_i and C(P_0, t_0) is
the expected pollutant concentration in the user defined location
at the time t_0.
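Eq. 7 itself is a simple additive correction; in the sketch below, the expected-concentration values stand in for the land-use regression model C described in the text, and the numbers are illustrative:

```python
# Eq. 7 as code: remove the concentration expected for the estimator's own
# environment and add the expectation for the user-defined environment.

def bias_correct(estimate, expected_at_estimator, expected_at_target):
    """E_i' = E_i - C(P_E, t_i) + C(P_0, t_0)."""
    return estimate - expected_at_estimator + expected_at_target

# Illustration: a roadside station reads 40 ug/m3; roadside profiles are
# expected to run at 35 ug/m3 at that hour, the user's suburban target at 15.
corrected = bias_correct(40.0, 35.0, 15.0)  # 20.0 ug/m3
```

The correction thus preserves the anomaly carried by the measurement (5 ug/m3 above its own expectation) while translating it to the target environment.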
The evaluation of Eq. 7 requires yet another statistical model
(for each pollutant) to calculate the expected concentration as a
function of time and key land-use frequencies. Such a set of
statistical models has been implemented with the fusion service,
using the archived measurement time series in Finland as
calibration data: the environments around the stations were
evaluated and multi-variable regression was applied. The
regression was repeated with several different land-use and
population evaluation radii; the best correlation was achieved
with the abovementioned values (land-use with a 200m radius,
population density with a 6km radius). Nevertheless, this
mathematically intensive regression procedure is not discussed
further in this paper, although for the NO2 pollutant, a demonstration
of the profiling method and its capability to predict the expected
hourly concentration is presented in section 4.1.
Figure 3: Profile evaluation with land use and population
density maps. The larger circle represents the area for local
population determination and the smaller red circle
represents the area for land use determination. Satellite image
provided by Google Earth.
As discussed at the beginning of section 2.1, the fusion
service stores measurements as evaluation material for individual
service providers and models. Thus, for a completely
different region than Finland, the regression parameters for
profiling can be set without a fixed set of calibration material; the
stored measurements that have flowed through the PESCaDO
system can be further exploited by setting up the regression
parameters for profiling automatically as the number of
measurements builds up over time. In this sense the profiling
feature within the Fusion Service is adaptive.
The presented bias correction method offers yet another
advantage: episodes that affect air quality on a major scale, such
as forest fires, are automatically accounted for if the input data
contains some measurements from the episode-driven locations.
For instance, if a background station has measured an
exceptionally high concentration of NO2, then the expected NO2
concentration at a nearby urban environment will reflect the
episode-affected background concentration.
4. Evaluation of the fusion method

The performance of the presented environmental information
fusion method was evaluated using temperature forecasts
provided by four well-known weather service providers (SPs): FMI,
SMHI, Met Norway and Weather Underground. For 43 locations
around Finland, weather forecasts were extracted from the respective
online sites and stored during several months in 2012. Uncertainty
metrics in terms of σ_base,i² for the individual SPs were
evaluated by comparing measured temperature values against the
individual stored forecasts for each SP; a total of 2500 forecasted
versus measured temperature pairs per SP were gathered in
order to get statistically meaningful σ_base,i² estimates as
a function of forecast period length. Then, fused forecasts
(temperature of the next 3 days) for the locations in August 2012
were produced on a daily basis for each of these locations using
the stored forecasts.
In Figure 4, the mean absolute error of temperature forecasts
and the fused forecast is presented. According to the figure, fused
temperature forecasts have the lowest mean error with just four
different SPs providing forecasts simultaneously. This result
shows that the well-known benefits of forecast fusion can be
exploited within web services such as PESCaDO when the
performance of forecast providers is being monitored.
Figure 4: Mean absolute error of temperature forecasts (°C)
and the fused forecast for different forecast time spans.
Forecasted and measured data for 43 different locations and
time periods in August 2012 were used.
4.1 Performance of the environmental
profiling feature
The environmental profiling feature of the fusion service was
calibrated using measurement time series from Finland during
2010. To test the performance of this novel feature, 8 different
NO2 measurement stations with varying environments were
selected in 2011, and the observed hourly concentrations were
compared against the values predicted with the aid of the profiling
feature. The profiling feature differentiates between working days
and weekends; for this test, the working days were selected.
It can be seen from Figures 5a-h that the profiling feature is
able to predict the expected average NO2 concentration well in
various different environments. Background areas, urban and
rural, fare better in the comparison while the traffic-intense
environments are more difficult to predict. This is to be expected
as the actual traffic volumes have to be derived using only the
local population and road intensity. As a consequence, the
profiling feature inevitably underestimates the expected
concentration near large motorways that have a small surrounding
population.
4.2 Comparison of measured and predicted
NO2 time series
The performance of the fusion of air quality measurements
with the presented methodology was tested with NO2
measurements in Southern Finland. Measurement time series for
February 2011 from the available stations (n = 20) were used as
input data and fused NO2 concentrations were calculated for a
remote location for which a comparison time series was readily
available. The domain for the test can be seen from Figure 6
which illustrates the fused concentration of NO2 at one of the
hours of interest.
Figure 5a-h: Predicted and observed hourly average
concentration of NO2 during working days (Monday to
Friday) in several measurement sites. Predicted values have
been obtained by evaluating the station’s environment with
the aid of the profiling feature.
Figure 6: Fused NO2 concentration in Southern Finland in
2011 at 07:00.
The highest concentrations are found in the centre of Helsinki, which resides in the bottom-right corner of the figure. The remote test area is a small city centre (Lohja), located approximately 70 kilometres from Helsinki and 50 kilometres from the nearest measurement station. The fused values were compared against the on-site measurements in the test area; the results are shown in Figure 7.
The comparison between the fused and measured NO2 concentrations at the test site (Figure 7) shows that the pollutant concentration is estimated fairly accurately with the presented method.
During the study period, the mean absolute error between the predicted and measured hourly NO2 concentration was of the order of 7 µg/m3 (observed mean = 12 µg/m3, variance = 107 (µg/m3)2). This error is significantly smaller than the mean error achieved with a conventional geographical extrapolation method: using inverse distance weighting (IDW) [11], the resulting mean absolute error would be 14 µg/m3.
Figure 7: The observed and predicted NO2 concentration
during February 2011 at the test site, the centre of Lohja city.
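The IDW baseline used for comparison above is the standard inverse-distance-weighted estimator reviewed in [11]. A minimal sketch, with made-up station coordinates and concentrations, illustrates both the estimator and the mean-absolute-error metric used throughout this section:

```python
import math

def idw(target, stations, power=2.0):
    """Inverse distance weighting: estimate the value at `target` (x, y)
    from a list of (x, y, value) station tuples."""
    num, den = 0.0, 0.0
    for x, y, value in stations:
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return value          # target coincides with a station
        w = d ** -power           # closer stations get larger weights
        num += w * value
        den += w
    return num / den

def mean_absolute_error(predicted, observed):
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)

# Toy example: three NO2 stations (km coordinates, ug/m3) and a remote target.
stations = [(0.0, 0.0, 30.0), (10.0, 0.0, 22.0), (0.0, 10.0, 12.0)]
estimate = idw((20.0, 20.0), stations)
```

Note that the estimate is always a distance-weighted average of the inputs; IDW has no way to correct for a systematic difference between the target environment and the surrounding stations, which is exactly the bias the fusion method's profiling feature addresses.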
Figure 8 presents a collection of mean absolute prediction errors from calculations similar to the one presented in Figure 7. One by one, the measurement stations were removed from the input data, and each removed time series was compared against the fused time series produced from the remaining data.
According to Figure 8, if the nearby measurement locations represent an environment similar to that of the extrapolation target (the Laune and Tikkurila stations in Figure 8), then IDW extrapolation may predict the hourly concentration fairly well. Otherwise, the IDW method, which lacks bias-correction capabilities, generally produces poor estimates in terms of mean absolute error, whereas the fusion service performs well regardless of the collection of estimators used as input. The Luukki station, a rural NO2 background measurement station, is an example of this: there are several urban measurement stations nearby, and thus the hourly NO2 concentration at Luukki cannot be extrapolated with conventional methods.
Figure 8: Comparison of IDW extrapolation and the presented fusion method in terms of standard deviation. The observed average describes the average hourly NO2 concentration at each measurement site.
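The leave-one-out evaluation behind Figure 8 can be sketched as follows. The `nearest_station` estimator is a toy stand-in, not the paper's fusion or IDW implementation, and all station data are invented for illustration:

```python
import math

def leave_one_out_errors(stations_series, estimator):
    """For each station, withhold its time series, estimate it from the
    remaining stations, and return the mean absolute error per station.

    stations_series: dict name -> ((x, y), [hourly values])
    estimator: callable(target_xy, remaining_dict) -> [hourly estimates]
    """
    errors = {}
    for name, (xy, observed) in stations_series.items():
        remaining = {n: s for n, s in stations_series.items() if n != name}
        predicted = estimator(xy, remaining)
        errors[name] = sum(abs(p - o)
                           for p, o in zip(predicted, observed)) / len(observed)
    return errors

def nearest_station(target_xy, remaining):
    """Toy stand-in estimator: copy the series of the closest remaining station."""
    _, (_xy, series) = min(
        remaining.items(),
        key=lambda kv: math.hypot(kv[1][0][0] - target_xy[0],
                                  kv[1][0][1] - target_xy[1]))
    return series

# Toy network: two similar nearby stations (A, B) and a dissimilar one (C).
data = {
    "A": ((0.0, 0.0), [10.0, 12.0]),
    "B": ((1.0, 0.0), [11.0, 13.0]),
    "C": ((9.0, 9.0), [30.0, 34.0]),
}
errors = leave_one_out_errors(data, nearest_station)
```

In this toy setup the error is small where a similar station is nearby (A, B) and large for the dissimilar station C, mirroring the Luukki example: a purely geographical estimator cannot recover a background site surrounded by urban stations.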
To provide timely meteorological and air quality related information to citizens and administrative users alike, a prototype service, PESCaDO, was developed. By combining the data discovery, extraction and fusion methods described in this paper, it is possible to produce accurate and personalized information for the users. Unlike with many search engines, the user is not confused by the sheer amount of presented data and suggestions; instead, the user is provided with a single, understandable yet precise answer. This is also what separates PESCaDO from a conventional, generic search engine. The self-maintaining design of the PESCaDO system facilitates the discovery and indexing of new information sources. Each source provider's performance can be evaluated and stored on a continuous basis, and the stored performance data can be used to guide the fusion of information. Furthermore, the measured air quality and meteorological data that flow through the system can be used to calibrate the fusion service's various statistical models, effectively allowing the system to adapt to different regions.
The fusion method offers several advantages for the PESCaDO system. For instance, it is not necessary to discard any extracted information, as the algorithm ensures that irrelevant input is not over-emphasized. In this paper, a demonstration of the fusion of temperature forecasts was given. It was shown that the fused temperature forecast in fact had the lowest margin of error, which demonstrates the benefits of information fusion even when the number of service providers is small.
It was shown that the presented profiling feature of the fusion service is able to predict hourly concentrations of NO2 in different environments quite well. As a consequence, the fusion method was able to outperform a conventional extrapolation method (IDW). However, NO2 is strongly affected by urbanization and road traffic and is thus an ideal phenomenon to be handled with the proposed fusion method. Other pollutants, such as ozone and carbon monoxide, are more difficult to handle with the presented profiling feature. In fact, the static, environment-based bias removal needs to be made more dynamic in the future. This could be achieved by introducing meteorology into the fusion process. For instance, the profile could be analysed as a function of wind direction. Furthermore, the expected concentration could be a function of several meteorological parameters, such as rain, sky conditions and wind speed. As a result, the PESCaDO system would be orchestrated at yet another level, where the extracted meteorological data would be subject to fusion and then used again in the fusion of air quality pollutants.
This work was supported by the European Commission under
the contract FP7-ICT-248594 (PESCaDO).
[1] Balk, T., Kukkonen J., Karatzas, K., Bassoukos, A., and
Epitropou, V., European Open Access Chemical Weather
Forecasting Portal, Atmospheric Environment, 45(38),
6917-6922, 2011.
[2] Bassoukos, A., Karatzas, K., and Kelemis, A., Environmental information portals, services, and retrieval systems, Proceedings of "Informatics for Environmental Protection - Networking Environmental Information", 19th International EnviroInfo Conference, Brno, Czech Republic, pp. 151-155, 2005.
[3] Epitropou, V., Karatzas, K., Bassoukos, A., Kukkonen, J.
and Balk, T., A new environmental image processing
method for chemical weather forecasts in Europe,
Proceedings of the 5th International Symposium on
Information Technologies in Environmental Engineering.
Poznan: Springer Series: Environmental Science and
Engineering, pp. 781-791, 2011.
[4] Hoek, G., Beelen, R., de Hoogh, K., Vienneau, D., Gulliver, J., Fischer, P., and Briggs, D., A review of land-use regression models to assess spatial variation of outdoor air pollution, Atmospheric Environment, 42, 7561-7578, doi:10.1016/j.atmosenv.2008.05.057, 2008.
[5] Janssen, S., Dumont, G., Fierens, F., and Mensink, C., Spatial interpolation of air pollution measurements using CORINE land cover data, Atmospheric Environment, 42(20), pp. 4884-4903, 2008.
[6] Karatzas, K., A fuzzy logic approach in Urban Air Quality Management and Information Systems (UAQMIS), Proceedings of the 4th International Conference on Urban Air Quality Measurement, Modelling and Management (R. Sokhi and J. Brechler, eds), Charles University, Prague, Czech Republic, 25-27 March 2003, pp. 274-276, 2003.
[7] Karatzas, K., Informing the public about atmospheric quality: air pollution and pollen, Allergo Journal, 18(3), pp. 212-217, 2009.
[8] Karatzas, K. and Kukkonen, J., COST Action ES0602:
Quality of life information services towards a sustainable
society for the atmospheric environment, ISBN: 978-960-
6706-20-2, Thessaloniki: Sofia Publishers, 2009.
[9] Klein, Th., Kukkonen, J., Dahl, Å., Bossioli, E., Baklanov, A., Fahre Vik, A., Agnew, P., Karatzas, K., and Sofiev, M., Interactions of physical, chemical and biological weather calling for an integrated assessment, forecasting and communication of air quality, AMBIO, 41(8), pp. 851-864, 2012.
[10] Kukkonen, J., Olsson, T., Schultz, D.M., Baklanov, A.,
Klein, T., Miranda, A. I., Monteiro, A., Hirtl, M., Tarvainen,
V., Boy, M., Peuch, V.-H., Poupkou, A., Kioutsioukis, I.,
Finardi, S., Sofiev, M., Sokhi, R., Lehtinen, K. E. J.,
Karatzas, K., San José, R., Astitha, M., Kallos, G., Schaap,
M., Reimer, E., Jakobs, H., and Eben, K., A review of
operational, regional-scale, chemical weather forecasting
operational, regional-scale, chemical weather forecasting models in Europe, Atmos. Chem. Phys., 12, 1-87,
doi:10.5194/acp-12-1-2012, 2012.
[11] Li, J. and Heap, A.D., 2008. A Review of Spatial
Interpolation Methods for Environmental Scientists.
Geoscience Australia, Record 2008/23, 137 pp, ISBN 978 1
921498 30 5.
[12] Moumtzidou, A., Vrochidis, S., Tonelli, S., Kompatsiaris, I., and Pianta, E., Discovery of Environmental Nodes in the Web, Proceedings of the 5th IRF Conference, Vienna, Austria, 2012.
[13] Moßgraber, J., and Rospocher, M., Ontology Management in a Service-oriented Architecture: Architecture of a Knowledge Base Access Service, Proceedings of the 23rd International Workshop on Database and Expert Systems Applications, 2012.
[14] Oyama, S., Kokubo, T., and Ishida, T., Domain-specific web search with keyword spices, IEEE Transactions on Knowledge and Data Engineering, 16(1), pp. 17-24, 2004.
[15] Pianta, E., & Tonelli, S. KX: A Flexible System for
Keyphrase Extraction. Proceedings of SemEval, 2010.
[16] Potempski, S., and Galmarini, S., Est modus in rebus: analytical properties of multi-model ensembles, Atmos. Chem. Phys., 9, 9471-9489, doi:10.5194/acp-9-9471-2009, 2009.
[17] Sikora, T. The MPEG-7 visual standard for content
description-an overview. IEEE Transactions on Circuits and
Systems for Video Technology, 11(6), pp. 696-702, 2001
[18] Tang, T. T., Hawking, D., Craswell, N., and Sankaranarayana, R. S., Focused crawling in depression portal search: A feasibility study, Proceedings of the 9th Australasian Document Computing Symposium, Melbourne, Australia, 2004.
[19] Wald, L., Some terms of reference in data fusion, IEEE Transactions on Geoscience and Remote Sensing, 37(3), pp. 1190-1193, 1999.
[20] Wanner, L., Vrochidis, S., Tonelli, S., Mossgraber, J.,
Bosch, H., Karppinen, A., Myllynen, M., Rospocher, M.,
Bouayad-Agha, N., Bügel, U., Casamayor, G., Ertl, T.,
Kompatsiaris, I., Koskentalo, T., Mille, S., Moumtzidou, A.,
Pianta, E., Saggion, H., Serafini, L., and Tarvainen, V.,
Building an Environmental Information System for
Personalized Content Delivery. In (Hrebícek J., Schimak G.,
Denzer R. eds.): Environmental Software Systems.
Frameworks of eEnvironment - 9th IFIP WG 5.11
International Symposium, Proceedings. IFIP Publications
359, Springer, ISBN 978-3-642-22284-9, pp. 169-176, 2011.
[21] Wanner L., Vrochidis S., Rospocher M., Moßgraber J.,
Bosch H., Karppinen A., Myllynen M., Tonelli S., Bouayad-
Agha N., Casamayor G., Ertl Th., Hilbring D., Johansson L.,
Karatzas K., Kompatsiaris I., Koskentalo T., Mille S.,
Moumtzidou A., Pianta E., Serafini L. and Tarvainen V.
Personalized Environmental Service Orchestration for
Quality Life Improvement, 8th IFIP WG 12.5 International
Conference, AIAI 2012 Workshops, IFIP AICT 382 (L.
Iliadis et al., eds), Proceedings, Springer, pp. 351-360, 2012.
[22] Weigel, A. P., Liniger, M. A., and Appenzeller, C., Can multi-model combination really enhance the prediction skill of probabilistic ensemble forecasts?, Q. J. R. Meteorol. Soc., 134, pp. 241-260, 2008.
[23] Wöber, K. Domain Specific Search Engines, In: Fesenmaier,
D. R., Werthner, H., Wöber, K. (eds.) Travel Destination
Recommendation Systems: Behavioral Foundations and
Applications, pp. 205-226. Cambridge, MA: CAB
International, 2006.
... A support vector machine (SVM) is used in [72] for fusion conducted at Level 1 of the JDL model, which is used as a reference (although the details of how this method is implemented and its ultimate goal are omitted). Data are fused using regression models in two papers, for example [73], which employs multiple linear regression techniques to fuse environmental sensor data with other environmental factors for precise sensor calibration using the regression equation, and [74], employing land use regression (LUR), an algorithm often used to analyse pollution, particularly in densely populated areas, in order to fuse data from different environmental sources and assure the quality of the final result by minimizing the negative effects, like deviations and contradiction 11 J o u r n a l P r e -p r o o f between sources. Finally, we found papers describing algorithms that work on time series, like [75], where a data fusion mechanism is used to generate correct data to replace WSN outlier data. ...
... In this respect, the most commonly used statistics in the papers retrieved in this survey are the central tendency, particularly, the different forms of mean: arithmetic mean [85] [60] [86] [83] [87], weighted mean for fast multi-exposure image fusion [88], and a more sophisticated mean like the exponential weighted moving average (EWMA), a type of weighted mean for fusing time series that attaches more importance to recent than old observations [85]. Measures of dispersion, like the minimum [89], the maximum [80] [63] [89] and the variance [74] have also been used for image fusion and the fusion of meteorological and air quality data extracted from the web for personalized environmental information services, respectively. On the other hand, we found two papers related to relational databases proposing fusion functions or operators which are actually based on statistical operators defined as part of SQL (Structured Query Language). ...
... Linear [73] LUR [74] Time Series Time Warping [75] Association and Clustering [76] Optimization Genetic algorithms [77] Cuckoo search [72] Lagrange Contourlet [111] Others: Gram-Schmidt and Modified intensity-huesaturation: [62]; Belief propagation: [112]; Pansharpening: [113]; Semantics-ontology: [114]; Data cleansing: [115]; Gaussian and Laplacian pyramids: [116]; ...
The information fusion field has recently been attracting a lot of interest within the scientific community, as it provides, through the combination of different sources of heterogeneous information, a fuller and/or more precise understanding of the real world than can be gained considering the above sources separately. One of the fundamental aims of computer systems, and especially decision support systems, is to assure that the quality of the information they process is high. There are many different approaches for this purpose, including information fusion. Information fusion is currently one of the most promising methods. It is particularly useful under circumstances where quality might be compromised, for example, either intrinsically due to imperfect information (vagueness, uncertainty, …) or because of limited resources (energy, time, …). In response to this goal, a wide range of research has been undertaken over recent years. To date, the literature reviews in this field have focused on problem-specific issues and have been circumscribed to certain system types. Therefore, there is no holistic and systematic knowledge of the state of the art to help establish the steps to be taken in the future. In particular, aspects like what impact different information fusion methods have on information quality, how information quality is characterised, measured and evaluated in different application domains depending on the problem data type or whether fusion is designed as a flexible process capable of adapting to changing system circumstances and their intrinsically limited resources have not been addressed. This paper aims precisely to review the literature on research into the use of information fusion techniques specifically to improve information quality, analysing the above issues in order to identify a series of challenges and research directions, which are presented in this paper.
... The new AQI can be implemented and updated in applications using information of current AQI. For example, the local AQI information is embedded in the GreenPaths application (Poom et al., 2020) and ENFUSER air quality model (Johansson et al., 2015) in the HMA. When using the GreenPaths application, individuals can choose routes to their wished destination with the least exposure to air pollution based on the AQI while ENFUSER spatial model is currently adopted in several governmental bodies to tell the public the AQI information as a way of communication. ...
Full-text available
To convey the severity of ambient air pollution level to the public, air quality index (AQI) is used as a communication tool to reflect the concentrations of individual pollutants on a common scale. However, due to the enhanced air pollution control in recent years, air quality has improved, and the roles of some air pollutant species included in the existing AQI as urban air pollutants have diminished. In this study, we suggest the current AQI should be revised in a way that new air pollution indicators would be considered so that it would better represent the health effects caused by local combustion processes from traffic and residential burning. Based on the air quality data of 2017-2019 in three different sites in Helsinki metropolitan area, we assumed the statistical distributions of the current indicators (NO2 and PM2.5) and the proposed particulate indicators (BC, LDSA and PNC) were related as they have similar sources in urban regions despite the varying correlations between the current and proposed indicators (NO2: r = 0.5-0.85, PM2.5: r = 0.28-0.72). By fitting the data to an optimal distribution function, together with expert opinions, we improved the current Finnish AQI and determined the AQI breakpoints for the proposed indicators where this robust statistical approach is transferrable to other cities. The addition of the three proposed indicators to the current AQI would decrease the number of good air quality hours in all three environments (largest decrease in urban traffic site, ~22 %). The deterioration of air quality class appeared more severe during peak hours in the urban traffic site due to vehicular emission and evenings in the detached housing site where domestic wood combustion often takes place. The introduction of the AQI breakpoints of the three new indicators serve as a first step of improving the current AQI before further air quality guideline levels are updated.
... For instance, Liu et al. (2021) developed a correlation analysis of PM10, SO 2 , NO 2 , and O 3 and showed improvement in prediction data efficiency up to 86% by applying regression models with reference-grade data and other covariate data, such as meteorology using artificial neural network. SmartAirQ uses such ML methods across UAQM lifecycle, namely, random forest in improving LCS calibration (Zimmerman et al., 2018), data fusion (Johansson et al., 2015;Lau et al., 2019), parametrization , bias correction (Haupt et al., 2021;Xu et al., 2021), extracting information from -ftp, API, internet, WiFi, Bluetooth, cellular network -Smart city ICT centre (command and control centre) -Data wrangling with data classification -Data calibration using ML methods -Uniform gridding using land use regression methods -Temporal classification (24 h average/8 h average) -Units synchronization Data processing and management -Real-time traffic data from sensors and CCTV images are accessed and tagged with date, time, geospatial parameters -Data fusion service citizens -The data are dynamically classified with image processing for classifying 2, 3, 4W, buses, trucks, number plate recognition using ML methods. The data also include crowdsourced congestion images shared by citizens from smartphones and extracted from social media posts -Using the gridded monitoring and traffic data, real-time traffic emission estimates are prepared. ...
Full-text available
Rapid urbanization across the world has put an enormous burden on our environment. Cities from developing countries, in particular, are experiencing high air pollution levels. To address this challenge, the new WHO global air quality guidelines and various nations are mandating cities to implement clean air measures. However, these implementations are largely hindered by limited observations, siloed city operations, absence of standard processes, inadequate outreach, and absence of collaborative urban air quality management (UAQM) governance. The world is experiencing transformative changes in the way we live. The 4th industrial revolution technologies of artificial intelligence, Internet of Things, big data, and cloud computing bridge gaps between physical, natural, and personal entities. Globally, smart cities are being promulgated on the premise that technologies and data aid in improving urban services. However, in many instances, the smart city programs and UAQM services may not be aligned, thereby constraining the cumulative advantage in building urban resilience. Considering the potential of these technologies as enablers of environmental sustainability, a conceptual urban computing framework “SmartAirQ” for UAQM is designed. This interdisciplinary study outlines the SmartAirQ components: 1) data acquisition, 2) communication and aggregation, 3) data processing and management, 4) intelligence, 5) application service, 6) high-performance computing- (HPC-) cloud, and 7) security. The framework has integrated science cloud and urban services aiding in translating scientific data into operations. It is a step toward collaborative, data-driven, and sustainable smart cities.
... Modeling of the atmospheric chemical composition has advanced steadily over the past number of years [1,2] with the aim of producing actionable Air Quality (AQ) information. Data about AQ are routinely collected from official (ground-based) monitoring stations. ...
Full-text available
Deployment of an air quality low-cost sensor network (AQLCSN), with proper calibration of low-cost sensors (LCS), offers the potential to substantially increase the ability to monitor air pollution. However, to leverage this potential, several drawbacks must be ameliorated, thus the calibration of such sensors is becoming an essential component in their use. Commonly, calibration takes place in a laboratory environment using gasses of known composition to measure the response and a linear calibration is often reached. On site calibration is a promising complementary technique where an LCS and a reference instrument are collocated with the former being calibrated to match the measurements of the latter. In a scenario where an AQLCSN is already operational, both calibration approaches are resource and time demanding procedures to be implemented as frequently repeated actions. Furthermore, sensors are sensitive to the local meteorology and adaptation is a slow process making relocation a complex and expensive option. We concentrate our efforts in keeping the LCS positions fixed and propose to blend a genetic algorithm (GA) with a hybrid stacking (HS) ensemble into the GAHS framework. GAHS employs a combination of batch machine learning algorithms and regularly updated online machine learning calibration function(s) for the whole network when a small number of reference instruments are present. Furthermore, we introduce the concept of spatial online learning to achieve better spatial generalization. The frameworks are tested for the case of Thessaloniki where a total of 33 devices are installed. The AQLCSN is calibrated on the basis of on-site matching with high quality observations from three reference station measurements. The O3 LCS are successfully calibrated for 8–10 months and the PM10 LCS calibration is evaluated for 13–24 months showing a strong seasonal dependence on their ability to correctly capture the pollution levels.
... In a different setup, we may assume the similarity of the same type of environment and utilise the measurements as a replacement. Furthermore, this continuous LDSA estimation could be useful in updating some of the current air quality applications, for instance, the ENFUSER air quality model, which provides accurate spatiotemporal estimation for air pollutants in Helsinki (Johansson et al., 2015). Data availability. ...
Full-text available
Lung-deposited surface area (LDSA) has been considered to be a better metric to explain nanoparticle toxicity instead of the commonly used particulate mass concentration. LDSA concentrations can be obtained either by direct measurements or by calculation based on the empirical lung deposition model and measurements of particle size distribution. However, the LDSA or size distribution measurements are neither compulsory nor regulated by the government. As a result, LDSA data are often scarce spatially and temporally. In light of this, we developed a novel statistical model, named the input-adaptive mixed-effects (IAME) model, to estimate LDSA based on other already existing measurements of air pollutant variables and meteorological conditions. During the measurement period in 2017–2018, we retrieved LDSA data measured by Pegasor AQ Urban and other variables at a street canyon (SC, average LDSA = 19.7 ± 11.3 µm2 cm−3) site and an urban background (UB, average LDSA = 11.2 ± 7.1 µm2 cm−3) site in Helsinki, Finland. For the continuous estimation of LDSA, the IAME model was automatised to select the best combination of input variables, including a maximum of three fixed effect variables and three time indictors as random effect variables. Altogether, 696 submodels were generated and ranked by the coefficient of determination (R2), mean absolute error (MAE) and centred root-mean-square difference (cRMSD) in order. At the SC site, the LDSA concentrations were best estimated by mass concentration of particle of diameters smaller than 2.5 µm (PM2.5), total particle number concentration (PNC) and black carbon (BC), all of which are closely connected with the vehicular emissions. At the UB site, the LDSA concentrations were found to be correlated with PM2.5, BC and carbon monoxide (CO). 
The accuracy of the overall model was better at the SC site (R2=0.80, MAE = 3.7 µm2 cm−3) than at the UB site (R2=0.77, MAE = 2.3 µm2 cm−3), plausibly because the LDSA source was more tightly controlled by the close-by vehicular emission source. The results also demonstrated that the additional adjustment by taking random effects into account improved the sensitivity and the accuracy of the fixed effect model. Due to its adaptive input selection and inclusion of random effects, IAME could fill up missing data or even serve as a network of virtual sensors to complement the measurements at reference stations.
... This analysis leans on the tradition of environmental equity [52], yet the approach is improved by using micro-level population data and modelled environmental exposure at very granular resolution, in contrast to traditional approach of self-reported survey-data or proxies, such as distances to roads. The key data sources for improvements are high-quality FMI-ENFUSER air quality data [40,41], and micro-level population data from Statistics Finland. Descriptive and visual spatial analyses of the population and environmental quality are done. ...
Full-text available
Background Air pollution is one of the major environmental challenges cities worldwide face today. Planning healthy environments for all future populations, whilst considering the ongoing demand for urbanisation and provisions needed to combat climate change, remains a difficult task. Objective To combine artificial intelligence (AI), atmospheric and social sciences to provide urban planning solutions that optimise local air quality by applying novel methods and taking into consideration population structures and traffic flows. Methods We will use high-resolution spatial data and linked electronic population cohort for Helsinki Metropolitan Area (Finland) to model (a) population dynamics and urban inequality related to air pollution; (b) detailed aerosol dynamics, aerosol and gas-phase chemistry together with detailed flow characteristics; (c) high-resolution traffic flow addressing dynamical changes at the city environment, such as accidents, construction work and unexpected congestion. Finally, we will fuse the information resulting from these models into an optimal city planning model balancing air quality, comfort, accessibility and travelling efficiency.
... By forcing enterprises to improve production processes and promoting enterprises to accelerate innovation and upgrade through replacing traditional resource-intensive products with technology-intensive ones, energy consumption can be reduced in the production process. From the public participation perspective, environmental protection is closely related to everyone's life [38]. The popularization of Internet technology provides new ways and opportunities for the public to obtain environmental information, form environmental awareness, and participate in environmental protection [39]. ...
Full-text available
Based on the data of the 283 prefecture-level cities in China from 2003 to 2018, this paper examines the impact of Internet development on environmental quality. The results show that China’s urban PM2.5 has a significant spatial spillover effect. In general, the Internet has a significant negative direct effect on urban environmental pollution, which means that the development of the Internet can improve urban environmental quality. This result remains robust under different methods. As the Internet has evolved over the years, its influence on environmental quality has increased and became more and more significant. In terms of regions, the spatial spillover effect of PM2.5 shows a pattern of eastern region < central region < western region < northeast region, where the eastern region is the only region with a statistically significant negative value for the coefficient, which indicates the direct effects of Internet development on the environmental quality. In addition, the statistic testing on mediating effect shows that the Internet’s effect on urban environment quality is mainly transmitted through the upgrading of industrial structure. With the industrial structure being used as the threshold variable, the influence of Internet development on environmental quality could be divided into two stages.
Full-text available
This review provides a community's perspective on air quality research focusing mainly on developments over the past decade. The article provides perspectives on current and future challenges as well as research needs for selected key topics. While this paper is not an exhaustive review of all research areas in the field of air quality, we have selected key topics that we feel are important from air quality research and policy perspectives. After providing a short historical overview, this review focuses on improvements in characterizing sources and emissions of air pollution, new air quality observations and instrumentation, advances in air quality prediction and forecasting, understanding interactions of air quality with meteorology and climate, exposure and health assessment, and air quality management and policy. In conducting the review, specific objectives were (i) to address current developments that push the boundaries of air quality research forward, (ii) to highlight the emerging prominent gaps of knowledge in air quality research, and (iii) to make recommendations to guide the direction for future research within the wider community. This review also identifies areas of particular importance for air quality policy. The original concept of this review was borne at the International Conference on Air Quality 2020 (held online due to the COVID 19 restrictions during 18–26 May 2020), but the article incorporates a wider landscape of research literature within the field of air quality science. On air pollution emissions the review highlights, in particular, the need to reduce uncertainties in emissions from diffuse sources, particulate matter chemical components, shipping emissions, and the importance of considering both indoor and outdoor sources. There is a growing need to have integrated air pollution and related observations from both ground-based and remote sensing instruments, including in particular those on satellites. 
The research should also capitalize on the growing area of low-cost sensors, while ensuring a quality of the measurements which are regulated by guidelines. Connecting various physical scales in air quality modelling is still a continual issue, with cities being affected by air pollution gradients at local scales and by long-range transport. At the same time, one should allow for the impacts from climate change on a longer timescale. Earth system modelling offers considerable potential by providing a consistent framework for treating scales and processes, especially where there are significant feedbacks, such as those related to aerosols, chemistry, and meteorology. Assessment of exposure to air pollution should consider the impacts of both indoor and outdoor emissions, as well as application of more sophisticated, dynamic modelling approaches to predict concentrations of air pollutants in both environments. With particulate matter being one of the most important pollutants for health, research is indicating the urgent need to understand, in particular, the role of particle number and chemical components in terms of health impact, which in turn requires improved emission inventories and models for predicting high-resolution distributions of these metrics over cities. The review also examines how air pollution management needs to adapt to the above-mentioned new challenges and briefly considers the implications from the COVID-19 pandemic for air quality. Finally, we provide recommendations for air quality research and support for policy.
The digital economy is of great significance for carbon emission reduction. To estimate the impact of the digital economy on carbon emissions, this paper conducts a nonlinear analysis combining a spatial Durbin model (SDM) with a panel threshold model (PTM). Moreover, this paper decomposes the carbon emission reduction effect into a direct part and a spatial spillover part, and further analyzes the mechanism of action from the perspectives of technological progress, energy use and industrial structure. The empirical results indicate that the digital economy and carbon emissions have an inverted U-shaped relationship. Similarly, the spatial spillover effect of the digital economy on carbon emissions is also inverted U-shaped: the digital economy first increases and then reduces carbon emissions. Robustness tests and endogeneity analysis both support these conclusions. Compared with the indirect effect, the direct effect of the digital economy on carbon emissions is significantly greater. Moreover, the long-term effect of the digital economy on carbon emissions is greater than the short-term effect. The impact of the digital economy on carbon emissions exhibits regional heterogeneity: unlike in the eastern and central regions, the digital economy in the western region increases carbon emissions monotonically. The impact of the digital economy on carbon emissions is subject to significant thresholds in resource endowment, urban scale and innovation capability. Increased energy use and non-green technological progress are the main paths influencing local carbon emissions in the short term, while green technological progress and industrial structure upgrading become the dominant paths in the long term. The demonstration effect of industrial structure upgrading and technology spillovers drive the spillover effect of the digital economy on carbon emissions in the long term. 
This paper suggests that achieving carbon peaking and carbon neutrality requires strengthening the digital economy and promoting regional cooperation in environmental governance. The green integration of the digital economy with traditional industries is of great significance for carbon emission reduction.
With the continuous development of cloud computing and Internet of Things technology, the integration of the internet with haze governance has broad prospects and great potential. Through internet platforms and technologies, environmental monitoring can be conducted online, environmental management can be made more intelligent, the early-warning capability for environmental pollution emergencies can be improved, and the public's in-depth participation in environmental supervision can be facilitated. The dynamic spatial Durbin model and the quantile regression model are employed to analyze the effect and mechanism of internet development on China's haze pollution, on the basis of provincial panel data for China from 2006 to 2017. The results indicate that there is an inverted U-shaped curve between internet development and haze pollution in China, and this conclusion remains valid after a series of robustness tests. There is significant heterogeneity between the direct and indirect spillover effects. Meanwhile, from the perspective of different regions of China (such as the east-central and western regions), the inverted U-shaped curve between internet development and air pollution still holds. Besides, the quantile regression results show that the suppressive effect of the internet on haze pollution becomes stronger as haze concentration increases. The regression results for the mediation effect indicate that internet development mainly affects haze pollution by improving technological innovation and the efficiency of environmental governance.
The chemical composition of the atmosphere has numerous impacts on the quality of human life. Some prominent examples are the adverse health effects of fine particulate matter and ozone, the irritation and cough caused by some air pollutants, the sneezing associated with aeroallergens, and the sense of smell associated with the changes of the seasons as well as with exposure to unpleasant odors. The COST Action ES0602: Towards a European Network on Chemical Weather Forecasting and Information Systems organized a workshop in May 2008 in Thessaloniki, Greece, devoted to quality-of-life information services towards a caring and sustainable society for the atmospheric environment. The main purpose of the workshop was to present and discuss existing chemical weather forecasting and information systems, developed both by the Action participants and by related important organizations, such as the European Environment Agency and the U.S. Environmental Protection Agency. The focus of the workshop was specifically on the dissemination and wider use of chemical weather forecasting information, which has been the main topic of Working Group 3 of this COST Action. The workshop included a number of key presentations by invited experts from Europe and the United States. These represent regional, national and continental solutions and services that provide information on the quality of the atmospheric environment and chemical weather forecasts via web portals, mobile devices, and other ICT communication channels. These presentations are included in this publication, and we wish to thank all the authors for providing these contributions. In addition, this publication includes three brief papers that present the objectives, content, interaction and achievements of the working groups that are active within the COST ES0602 Action. 
We also wish to acknowledge the substantial contributions towards the successful organization of this workshop by all of the participants of this COST action. Last but not least, this publication includes an inventory of AQ information systems in Europe, on the basis of input received by members of the Action. We hope that based on these proceedings and the referenced information, the readers will be able to have direct access to the current state of the art, the latest developments and the future plans concerning Quality of Life Information Services for the Atmospheric Environment (This publication is supported by COST).
Environmental data analysis and information provision are considered of great importance for people, since environmental conditions are strongly related to health issues and directly affect a variety of everyday activities. Nowadays, there are several free web-based services that provide environmental information in several formats, with map images being the most commonly used to present air quality and pollen forecasts. This format, despite being intuitive for humans, complicates the extraction and processing of the underlying data. Typical examples of this case are chemical weather forecasts, which are usually encoded as heatmaps (i.e. graphical representations of matrix data with colors), while the forecasted numerical pollutant concentrations are commonly unavailable. This work presents a model for the semi-automatic extraction of such information based on a template configuration tool, on methodologies for data reconstruction from images, as well as on text processing and Optical Character Recognition (OCR). The aforementioned modules are integrated in a standalone framework, which is extensively evaluated by comparing data extracted from a variety of chemical weather heatmaps against the real numerical values produced by chemical weather forecasting models. The results demonstrate a satisfactory performance in terms of data recovery and positional accuracy.
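The data-reconstruction step described above can be illustrated with a minimal sketch: each heatmap pixel is assigned the concentration of the nearest colour in a legend palette. The palette, the concentration values and the tiny "image" below are hypothetical; the actual framework additionally relies on template configuration, text processing and OCR.

```python
import numpy as np

# Hypothetical legend: RGB colour -> pollutant concentration (ug/m3).
LEGEND = {
    (0, 0, 255): 10.0,    # blue   -> low
    (0, 255, 0): 30.0,    # green  -> moderate
    (255, 255, 0): 60.0,  # yellow -> high
    (255, 0, 0): 100.0,   # red    -> very high
}

def reconstruct(pixels):
    """Map each heatmap pixel to the concentration of the nearest
    legend colour (Euclidean distance in RGB space)."""
    colors = np.array(list(LEGEND.keys()), dtype=float)   # (K, 3)
    values = np.array(list(LEGEND.values()))              # (K,)
    px = np.asarray(pixels, dtype=float)                  # (H, W, 3)
    d = np.linalg.norm(px[:, :, None, :] - colors[None, None, :, :], axis=3)
    return values[d.argmin(axis=2)]                       # (H, W)

# A 1x2 "image" with slightly off-palette pixels (e.g. JPEG noise)
img = [[(10, 5, 240), (250, 20, 10)]]
conc = reconstruct(img)
```

Nearest-colour matching, rather than exact lookup, is what makes the reconstruction robust to compression artifacts and anti-aliasing in the published map images.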
In this paper we investigate some basic properties of multi-model ensemble systems, which can be deduced from the general characteristics of the statistical distributions of the ensemble members with the help of mathematical tools. In particular, we show how to find the optimal linear combination of model results that minimizes the mean square error, both in the case of uncorrelated and in the case of correlated models. By proving basic estimates, we deduce general properties describing multi-model ensemble systems. We also show how this mathematical formalism can be used to investigate the characteristics of such systems.
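One standard form of this result, assuming unbiased ensemble members with error covariance matrix Sigma and weights constrained to sum to one, gives the MSE-minimizing weights w = Sigma^{-1} 1 / (1^T Sigma^{-1} 1). A small numerical sketch (the covariance matrix and simulated forecasts are hypothetical; the paper derives the general result) shows the weighted combination beating the best single member:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a scalar "truth" and three unbiased forecast
# models whose errors are correlated with covariance matrix Sigma.
n = 5000
truth = rng.normal(20.0, 5.0, n)
Sigma = np.array([[1.0, 0.6, 0.2],
                  [0.6, 2.0, 0.3],
                  [0.2, 0.3, 4.0]])
errors = rng.multivariate_normal(np.zeros(3), Sigma, n)
forecasts = truth[:, None] + errors           # (n, 3) ensemble members

# MSE-minimizing weights under the sum-to-one constraint:
#   w = Sigma^{-1} 1 / (1^T Sigma^{-1} 1)
ones = np.ones(3)
w = np.linalg.solve(Sigma, ones)
w /= w.sum()

combined = forecasts @ w
mse_members = ((forecasts - truth[:, None]) ** 2).mean(axis=0)
mse_combined = ((combined - truth) ** 2).mean()
```

Because the off-diagonal covariances are accounted for, the optimal weights generally differ from the simple inverse-variance weights that would be optimal for uncorrelated members.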
Conference Paper
Environmental data are considered of utmost importance for human life, since weather conditions, air quality and pollen are strongly related to health issues and affect everyday activities. This paper addresses the problem of discovering air quality and pollen forecast Web resources, which are usually presented in the form of heatmaps (i.e. graphical representations of matrix data with colors). Towards the solution of this problem, we propose a discovery methodology that builds upon a general-purpose search engine and a novel post-processing heatmap recognition layer. The first step involves the generation of domain-specific queries, which are submitted to the search engine, while the second involves an image classification step based on low-level visual features to identify Web sites that include heatmaps. Experimental results comparing various combinations of visual features show that relevant environmental sites can be efficiently recognized and retrieved.
Conference Paper
Focussed crawlers enable the automatic discovery of Web resources about a given topic by automatically navigating the Web link structure and selecting the hyperlinks to follow by estimating their relevance to the topic based on evidence obtained from the already downloaded pages. This work proposes a classifier-guided focussed crawling approach that estimates the relevance of a hyperlink to an unvisited Web resource based on the combination of textual evidence representing its local context, namely the textual content appearing in its vicinity in the parent page, with visual evidence associated with its global context, namely the presence of images relevant to the topic within the parent page. The proposed focussed crawling approach is applied towards the discovery of environmental Web resources that provide air quality measurements and forecasts, since such measurements (and particularly the forecasts) are not only provided in textual form, but are also commonly encoded as multimedia, mainly in the form of heatmaps. Our evaluation experiments indicate the effectiveness of incorporating visual evidence in the link selection process applied by the focussed crawler over the use of textual features alone, particularly in conjunction with hyperlink exploration strategies that allow for the discovery of highly relevant pages that lie behind apparently irrelevant ones.
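The link-selection idea, combining local textual context with global visual evidence from the parent page, can be sketched as a weighted score feeding a priority-queue frontier. The scoring functions, the mixing weight `alpha` and the example URLs below are hypothetical stand-ins for the trained classifiers used in the paper:

```python
import heapq

def text_score(anchor_context: str, topic_terms: set[str]) -> float:
    """Fraction of topic terms appearing in the hyperlink's local
    textual context (a stand-in for a trained text classifier)."""
    words = set(anchor_context.lower().split())
    return len(words & topic_terms) / max(len(topic_terms), 1)

def combined_score(anchor_context, parent_has_heatmap, topic_terms, alpha=0.7):
    """Blend local textual evidence with global visual evidence
    (presence of topic-relevant images, e.g. heatmaps, in the parent
    page); alpha is a hypothetical mixing weight."""
    visual = 1.0 if parent_has_heatmap else 0.0
    return alpha * text_score(anchor_context, topic_terms) + (1 - alpha) * visual

# Frontier as a max-priority queue (scores negated for heapq's min-heap)
topic = {"air", "quality", "forecast", "pollution"}
frontier = []
links = [
    ("http://example.org/aq", "daily air quality forecast maps", True),
    ("http://example.org/news", "local news and sports", False),
]
for url, context, has_heatmap in links:
    heapq.heappush(frontier, (-combined_score(context, has_heatmap, topic), url))

best = heapq.heappop(frontier)[1]  # highest-scoring link is crawled first
```

The visual term is what lets the crawler tunnel through apparently irrelevant anchor text when the parent page already displays topic-relevant imagery.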
Conference Paper
An increasing number of information systems integrate semantic data stores for managing ontologies. To access these knowledge bases, most of the available implementations provide application programming interfaces (APIs). The implementations of these APIs normally do not support any kind of network protocol or service interface. This works fine as long as a monolithic system is developed. If the need arises to integrate such a knowledge base into a service-oriented architecture, a different approach is needed. In this paper we propose an architecture to address this issue. A first demonstrator was fully implemented in the European project PESCaDO. Several services access and work on a central knowledge base access service, which supports multi-threaded access for ontologies instantiated in parallel.
Spatially continuous data of environmental variables are often required for environmental sciences and management. However, information on environmental variables is usually collected by point sampling, particularly in mountainous regions and deep ocean areas. Thus, methods generating such spatially continuous data from point samples become essential tools. Spatial interpolation methods (SIMs) are, however, often data-specific or even variable-specific. Many factors affect the predictive performance of the methods, and previous studies have shown that their effects are not consistent. Hence it is difficult to select an appropriate method for a given dataset. This review aims to provide guidelines and suggestions regarding the application of SIMs to environmental data by comparing the features of the commonly applied methods, which fall into three categories, namely: non-geostatistical interpolation methods, geostatistical interpolation methods and combined methods. Factors affecting the performance, including sampling design, sample spatial distribution, data quality, correlation between primary and secondary variables, and interaction among factors, are discussed. A total of 25 commonly applied methods are then classified based on their features to provide an overview of the relationships among them. These features are quantified and then clustered to show similarities among the 25 methods. An easy-to-use decision tree for selecting an appropriate method from these 25 methods is developed based on data availability, data nature, expected estimation, and features of the method. Finally, a list of software packages for spatial interpolation is provided.
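As a concrete example of the non-geostatistical category discussed in the review, inverse distance weighting (IDW) estimates the value at an unsampled location as a distance-weighted average of the point samples. A minimal sketch (sample coordinates, values and the power parameter are hypothetical):

```python
import numpy as np

def idw(xy_samples, values, xy_targets, power=2.0, eps=1e-12):
    """Inverse-distance-weighted interpolation: weight each sample by
    1 / distance^power; eps keeps weights finite at sample locations."""
    d = np.linalg.norm(xy_targets[:, None, :] - xy_samples[None, :, :], axis=2)
    w = 1.0 / (d ** power + eps)          # (n_targets, n_samples)
    return (w @ values) / w.sum(axis=1)   # (n_targets,)

# Point samples of some environmental variable at four corner locations
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
vals = np.array([10.0, 20.0, 30.0, 40.0])

# Targets: the cell centre (equidistant from all samples) and a sample point
grid = np.array([[0.5, 0.5], [0.0, 0.0]])
est = idw(pts, vals, grid)
```

IDW is an exact interpolator (it honours the sample values at sample locations) and is among the simplest methods in the review's decision tree; geostatistical methods such as kriging additionally model spatial correlation structure.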