Efficient prediction of total column ozone based on support vector regression algorithms, numerical models and Suomi-satellite data

Atmósfera 30(1), 1-10 (2017)
doi: 10.20937/ATM.2017.30.01.01
© 2017 Universidad Nacional Autónoma de México, Centro de Ciencias de la Atmósfera.
This is an open access article under the CC BY-NC-ND License (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Leo CARRO-CALVO,a Carlos CASANOVA-MATEO,b Julia SANZ-JUSTO,b
José Luis CASANOVA-ROQUEb and Sancho SALCEDO-SANZa*
a Departamento de la Teoría de la Señal y Comunicaciones, Universidad de Alcalá, carretera Madrid-Barcelona, km 33.6, 28805 Alcalá de Henares, Madrid, España
b LATUV, Laboratorio de Teledetección, Universidad de Valladolid, Edificio I+D, Paseo de Belén 11, 47011 Valladolid, España
* Corresponding author: sancho.salcedo@uah.es
Received: January 25, 2016; accepted: November 10, 2016
SUMMARY
A new forecasting method for total column ozone (TCO) is proposed, based on the combination of support vector regression (SVR) algorithms and predictive variables from the Suomi National Polar-orbiting Partnership satellite, as well as from the Global Forecasting System (GFS) numerical model and direct measurements. The satellite data include temperature and humidity profiles at different heights and TCO measurements made on the days before the forecast. The GFS model provides temperature and humidity data for the forecast day. The system also considers alternative in situ measurements, e.g., aerosol optical depth at different wavelengths. The SVR methodology yields an accurate TCO forecast from these predictive variables, with better results than those obtained with other regression methods, e.g., neural networks. An analysis of the best subset of features for TCO forecasting is also carried out. The experimental part of the study consists of applying SVR to direct observation data obtained at the radiometric laboratory of Madrid, Spain, where ozone measurements acquired with a Brewer spectrophotometer are available, which makes it possible to train the system and evaluate its results.
ABSTRACT
This paper proposes a novel prediction method for Total Column Ozone (TCO), based on the combination of Support Vector Regression (SVR) algorithms and different predictive variables coming from satellite data (Suomi National Polar-orbiting Partnership satellite), numerical models (Global Forecasting System model, GFS) and direct measurements. The satellite data consist of temperature and humidity profiles at different heights, and TCO measurements from the days before the prediction. The GFS model provides predictions of temperature and humidity for the day of prediction. Alternative data measured in situ, such as aerosol optical depth at different wavelengths, are also considered in the system. The SVR methodology obtains an accurate TCO prediction from these predictive variables, outperforming other regression methodologies such as neural networks. An analysis of the best subset of features for TCO prediction is also carried out. The experimental part of the paper consists of the application of the SVR to real data collected at the radiometric observatory of Madrid, Spain, where ozone measurements obtained with a Brewer spectrophotometer are available and allow the training of the system and the evaluation of its performance.
Keywords: Total column ozone, daily forecasting, satellite data, numerical models, support vector regression.
1. Introduction
Ozone is a gas naturally present in the Earth's atmosphere. In the upper atmosphere, ozone absorbs part of the harmful ultraviolet radiation coming from the Sun, thus creating a protective cover for our planet. In the troposphere, ozone is formed through chemical reactions between volatile organic compounds, nitrogen oxides and sunlight. In the lower atmosphere, it is a harmful pollutant that may cause respiratory problems in humans and damage plants and other living systems. Because of this twofold behavior, ozone variability and prediction have been a major research topic in recent decades (Anton et al., 2011a, b; Varotsos et al., 2004). Interest in modeling ozone variability started in the early 1970s, when changes in stratospheric ozone were attributed to catalytic reactions in the stratosphere that caused losses in the total amount of ozone (Crutzen, 1970, 1971).
Other studies on this topic focused on the role of chlorine (Stolarski and Cicerone, 1974) and chlorofluorocarbons (CFCs) (Molina and Rowland, 1974) in stratospheric ozone losses. These hypotheses were confirmed in the mid-1980s, when several polar bases in Antarctica observed a sharp decrease in stratospheric ozone levels over the continent at the start of the southern spring (Farman et al., 1985).
Following these first studies, the analysis of Total Column Ozone (TCO), defined as the amount of ozone contained in a vertical column of base 1 cm² at standard pressure and temperature, became a problem of primary importance in atmospheric physics (Savastiouk and McElroy, 2005; Silva, 2007), in connection with atmospheric circulation and its dynamics (Khokhlov and Romanova, 2011), climate change (Krzyscin and Borkowski, 2008), greenhouse gas concentrations (Bronnimann et al., 2000; Steinbrecht et al., 2003) and, of course, pollutant concentrations in different zones of the Earth (Rajab et al., 2013). TCO variability has also been studied using remote sensing techniques, mainly satellite data. Examples include Silva (2007), which reviews the use of satellite measurements in the study of TCO over Brazil in recent decades; Latha and Badarinath (2003), where satellite measurements are used together with ground measurements in the study of TCO content in the atmosphere; Jin et al. (2008), where TCO is calculated from geostationary satellite data; Christakos et al. (2004), where remote sensing data and empirical models are combined with existing databases for TCO mapping; Anton et al. (2008), where satellite data from the Global Ozone Monitoring Experiment (GOME) are used to study TCO variability over the Iberian Peninsula; Rajab et al. (2013), where satellite measurements of different atmospheric variables are used in ozone prediction over Malaysia; and Pinedo et al. (2014), where Total Ozone Mapping Spectrometer (TOMS) and Ozone Monitoring Instrument (OMI) satellite data are used to analyze TCO over Mexico in the period 1978-2013.
Regarding TCO prediction, different systems and approaches have been proposed, using both numerical models and classical statistical methods such as autoregressive approaches (Chattopadhyay, 2009a). In general, TCO prediction with numerical models tends to be more accurate than statistical prediction, but statistical procedures can also obtain good predictions in a fraction of the time required by numerical models, and with a smaller infrastructure. In the last few years, computational intelligence algorithms have been proposed that provide accurate TCO predictions.
Among other approaches, neural networks have been used intensively in TCO estimation problems (Monge and Medrano, 2004; Chattopadhyay, 2007, 2009b; Salcedo et al., 2010). In Monge and Medrano (2004), a multi-layer perceptron (MLP) neural network (Hagan and Menhaj, 1994) is applied to the prediction of TCO series in Arosa (Switzerland), Lisbon (Portugal) and Vigna di Valle (Italy), showing good performance on TCO data from 1967 to 1973. In a more recent work, Chattopadhyay and Bandyopadhyay (2007) successfully apply a neural network trained with the back-propagation algorithm to the TCO series of Arosa between 1932 and 1970. In Salcedo et al. (2011), a bank of neural networks is applied to TCO prediction in the Iberian Peninsula, with good results. Martínez et al. (2011) describe a methodology based on association rules for TCO prediction, which improves the interpretability of the predictions in terms of the predictive variables. More recently, Rajab et al. (2013) apply multiple regression techniques and principal component analysis (PCA) to TCO prediction in Peninsular Malaysia using satellite data.
In this paper we propose a novel system for TCO prediction with a daily time horizon (24 h) that combines a powerful regression methodology (support vector regression, SVR) (Salcedo et al., 2014) with different predictive variables coming from satellite data (Suomi National Polar-orbiting Partnership [NPP] satellite), numerical models (Global Forecasting System [GFS] model) and in situ measurements. To our knowledge, there are no previous works dealing with the SVR methodology in TCO prediction. The complete system provides an accurate TCO prediction within a 24-h time horizon by combining the prediction capabilities of SVR with satellite data and profile predictions from numerical models. The objective variable (TCO) used to train the system is obtained by means of a Brewer spectrophotometer. Different experiments to evaluate the performance of the system have been carried out at the radiometric station of Madrid, including a comparison with artificial neural systems. A further analysis of the subsets of features that provide the best results in terms of TCO prediction is also included in the experimental part of the paper.
The structure of the paper is as follows: section 2 presents the data available to tackle this daily TCO prediction problem; section 2.1 describes the observational data available from satellite measurements; section 2.2 describes the additional predicted variables obtained from the GFS, and section 2.3 describes the TCO measurements used to train the algorithm and to evaluate the predicted TCO. Section 3 reviews the main concepts of the SVR algorithm. Section 4 presents the experimental part of the paper, where the performance of the proposed system is shown in different experiments at the radiometric station of Madrid. Finally, section 5 gives some concluding remarks.
2. Data available for this study
A predictive model is proposed that considers satellite data, aerosol optical depth (AOD) from a ground-installed sunphotometer, and numerical model information. All the data sources used are reviewed in the following subsections.
2.1 Satellite-based and ground data
Regarding satellite data, the following information is used:
a. Temperature and humidity profiles (100 pressure levels) obtained from the Advanced Technology Microwave Sounder (ATMS) by means of the CSPP-CIMSS software (http://cimss.ssec.wisc.edu/cspp/).
b. Total column ozone derived from the Ozone Mapping Profiler Suite (OMPS).
The satellite used in this work is the Suomi NPP polar satellite, the first of the new series of American satellites forming the Joint Polar Satellite System (JPSS), which will replace the historical NOAA satellites. Suomi NPP is the result of a joint venture of NOAA and NASA, and it has been designed as the prototype of the future JPSS satellite series. Suomi NPP carries five instruments on board with the aim of testing several key technologies of the JPSS mission. It is one of the first satellites to meet the challenge of performing a wide range of measurements over land, ocean and atmosphere that may aid in the understanding of climate, while also serving the operational needs of weather forecasting and continuing key data records that are essential for the study of global change, i.e., it meets the objectives of both the NOAA and EOS satellites.
The instruments on board Suomi NPP are the following:
– Advanced Technology Microwave Sounder (ATMS), a scanner with 22 channels providing vertical soundings of temperature and humidity for weather forecasting.
– Visible Infrared Imaging Radiometer Suite (VIIRS), a radiometer that measures 26 VIS and IR channels with multiple applications for the study of aerosols, clouds, ocean color, surface temperature, fires, albedo, etc. Its data can improve the understanding of climate change. It is considered the successor to MODIS.
– Cross-track Infrared Sounder (CrIS), a Fourier transform spectrometer with 1305 channels that provides vertical profiles of temperature, pressure and humidity at a very high resolution (100 levels). These measurements will help short- and medium-term weather forecasting.
– Ozone Mapping Profiler Suite (OMPS), two hyper-spectral instruments that measure the ozone profile with a very high vertical resolution. Due to their high resolution, they provide insights into the state of the ozone layer and a better understanding of the chemical phenomena that lead to the destruction of ozone near the troposphere.
– Clouds and the Earth's Radiant Energy System (CERES), a three-channel spectrometer that measures solar radiation reflected and emitted by the Earth. It also analyzes cloud properties such as thickness, height, particle size and cloud phase.
These instruments fulfill the objectives of JPSS, contributing to the study of climate change and providing critical data series for understanding climate dynamics.
Because aerosols can absorb solar energy (Wang et al., 2009), aerosol optical depth (AOD) was also included in our model as an additional input parameter. The daily mean AOD product can be obtained from the measurements of a sunphotometer, which makes direct sun measurements at wavelengths of 340, 380, 440, 500, 670, 870 and 1020 nm with a field of view of 1.2º. A Cimel CE318 sunphotometer is installed at the radiometric observatory of Madrid. This instrument is part of the NASA Aerosol Robotic Network (AERONET) (Holben et al., 1998).
2.2 Model predicted variables
Regarding numerical model information, daily mean predicted temperature and humidity profiles obtained from the GFS numerical weather prediction model (Kanamitsu et al., 1991) were used. Although its horizontal resolution is quite coarse, the GFS model has the advantage that its data are freely available on the Internet. In this case, the variables were taken at the grid point closest to the region of interest.
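As an illustration of taking variables at the closest grid point, the following minimal sketch snaps a site to the nearest node of a regular latitude/longitude grid. The function name `nearest_grid_point` and the 0.5-degree spacing are assumptions made for this example only; the paper does not state the grid resolution used. The station coordinates are those quoted for the Madrid radiometric station in section 2.3.

```python
def nearest_grid_point(lat, lon, resolution_deg):
    """Snap a site to the closest node of a regular latitude/longitude grid.

    Assumes grid nodes lie at integer multiples of resolution_deg.
    """
    grid_lat = round(lat / resolution_deg) * resolution_deg
    grid_lon = round(lon / resolution_deg) * resolution_deg
    return grid_lat, grid_lon


# Station coordinates quoted in section 2.3 (west longitude taken as negative).
site_lat, site_lon = 40.8, -4.01

# Hypothetical 0.5-degree spacing, chosen only for illustration.
grid_lat, grid_lon = nearest_grid_point(site_lat, site_lon, 0.5)
print(grid_lat, grid_lon)  # 41.0 -4.0
```

On a regular grid, rounding each coordinate independently selects the node that is closest in both latitude and longitude, which is sufficient for a single-site study such as this one.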
2.3 Target variable: TCO control measurements
Currently, the World Meteorological Organization's Global Atmosphere Watch (WMO/GAW) program suggests that the most relevant instrument for measuring column ozone from the ground is the Brewer spectrophotometer. This instrument derives the total ozone amount from the ratios of measured sunlight intensities at five wavelengths between 306 and 320 nm with a resolution of 0.6 nm, where the absorption by ozone presents large spectral structures (Anton et al., 2008). Accordingly, in this study we used the daily mean ground-based total ozone amount derived from the Brewer spectrophotometer in Madrid as the objective variable to be predicted from the predictive variables described above. The Agencia Estatal de Meteorología (Meteorological State Agency, AEMET) of Spain operates a national Brewer spectrophotometer network, with one of its instruments located at the radiometric station of Madrid (40.8º N, 4.01º W). This Brewer instrument is part of the WMO/GAW Global Ozone Monitoring Network. Total ozone data cover the period from March 1, 2013 to February 28, 2014, which represents one year of daily measurements. Note that both the Brewer and Cimel networks are managed under a quality management system certified to ISO 9001:2008, which guarantees their accuracy and ensures the compliance of the measurements with international standards on ozone and aerosol optical depth measurements, particularly those stated by WMO. Table I summarizes all the predictive (input) and objective (target) variables considered in this paper.
3. Support vector regression algorithms
SVR (Smola and Scholkopf, 2004) is one of the state-of-the-art algorithms for regression and function approximation, and it has yielded good results in many different regression problems. SVR algorithms are adequate for a large variety of regression problems, since they take into account not only the error estimates of the data, but also the generalization of the regression model (the capability of the model to predict accurately when a new dataset is evaluated). Although there are several versions of SVR, this work considers the classical ε-SVR model described in detail by Smola and Scholkopf (2004), which has been used in a large number of applications in science and engineering (Salcedo et al., 2014).

Table I. Input variables used for this study on TCO prediction.
Variable                                      | Source                   | Previous day | Target day | Units  | Spatial coverage
Temperature profile                           | ATMS                     | X            |            | K      | 100 pressure levels
Humidity profile                              | ATMS                     | X            |            | %      | 100 pressure levels
Total ozone                                   | OMPS                     | X            |            | Dobson | Atmospheric column
Aerosol optical depth                         | Cimel sunphotometer      | X            |            | -      | Atmospheric column
Temperature profile forecast                  | GFS                      |              | X          | K      | 11 pressure levels
Humidity profile forecast                     | GFS                      |              | X          | %      | 11 pressure levels
Total ozone (target to verify the prediction) | Brewer spectrophotometer |              | X          | Dobson | Atmospheric column
The SVR method uses a given set of training vectors $\mathbb{T} = \{(x_i, o_i),\ i = 1, \dots, l\}$, where $x_i$ stands for the inputs and $o_i$ stands for the TCO value to be predicted. A regression model of the form $o(x) = f(x) + b = w^T \phi(x) + b$ is obtained by minimizing a general risk function:
$$R[f] = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} L\big(o_i, f(x_i)\big) \tag{1}$$
where $C$ is a hyper-parameter of the model, the norm of $w$ controls the smoothness of the regression model, $\phi(x)$ is a projection function from the input space to the feature space, $b$ is a bias parameter of the model, $x_i$ is a feature vector of the input space with dimension $N$ (either a training vector or a new input vector), $o_i$ is the output value to be estimated and $L(o_i, f(x_i))$ is the selected loss function (Smola and Scholkopf, 2004). In this paper, we use the L1-SVR (L1 support vector regression), characterized by an ε-insensitive loss function (Smola and Scholkopf, 2004):
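$$L\big(o_i, f(x_i)\big) = \begin{cases} 0 & \text{if } |o_i - f(x_i)| \le \varepsilon \\ |o_i - f(x_i)| - \varepsilon & \text{otherwise} \end{cases} \tag{2}$$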
Figure 1 shows an example of an SVR-process
in a two-dimensional regression problem, with an
ε-insensitive loss function.
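The ε-insensitive loss of Eq. (2) can be sketched in a few lines. The function name and the numerical values (TCO in Dobson units, a tube half-width of 5) are hypothetical and only illustrate the behavior of the loss: errors inside the tube cost nothing, and errors outside it are charged only for the part that exceeds ε.

```python
def eps_insensitive_loss(o, f, eps):
    """Return 0 when |o - f| <= eps, otherwise the excess |o - f| - eps."""
    return max(0.0, abs(o - f) - eps)


# Hypothetical measured and predicted TCO values (Dobson units), eps = 5.
print(eps_insensitive_loss(310.0, 307.0, 5.0))  # 0.0, error inside the tube
print(eps_insensitive_loss(310.0, 298.0, 5.0))  # 7.0, error of 12 minus eps
```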
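In order to train the model presented above, it is necessary to solve the following optimization problem (Smola and Scholkopf, 2004):
$$\min \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \tag{3}$$
subject to
$$o_i - w^T \phi(x_i) - b \le \varepsilon + \xi_i, \quad i = 1, \dots, l \tag{4}$$
$$-o_i + w^T \phi(x_i) + b \le \varepsilon + \xi_i^*, \quad i = 1, \dots, l \tag{5}$$
$$\xi_i, \xi_i^* \ge 0, \quad i = 1, \dots, l \tag{6}$$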
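The dual form of this optimization problem is usually obtained through the minimization of the Lagrange function, constructed from the objective function and the problem constraints. In this case, the dual form of the optimization problem is the following:
$$\max \; -\frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j) - \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{l} o_i (\alpha_i - \alpha_i^*) \tag{7}$$
subject to
$$\sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0 \tag{8}$$
$$\alpha_i, \alpha_i^* \in [0, C] \tag{9}$$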
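For readers who want the intermediate step between the primal problem (3)-(6) and the dual problem (7)-(9), the Lagrange function can be written as below. This follows the standard derivation in Smola and Scholkopf (2004); the multipliers $\eta_i, \eta_i^*$ attached to the positivity constraints (6) are notation introduced here and do not appear elsewhere in this paper.

```latex
\begin{aligned}
\mathcal{L} ={}& \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}(\xi_i + \xi_i^*)
  - \sum_{i=1}^{l}(\eta_i \xi_i + \eta_i^* \xi_i^*) \\
 & - \sum_{i=1}^{l}\alpha_i\big(\varepsilon + \xi_i - o_i + w^T\phi(x_i) + b\big)
   - \sum_{i=1}^{l}\alpha_i^*\big(\varepsilon + \xi_i^* + o_i - w^T\phi(x_i) - b\big) \\[4pt]
\frac{\partial \mathcal{L}}{\partial w} = 0 \;&\Rightarrow\; w = \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)\,\phi(x_i) \\
\frac{\partial \mathcal{L}}{\partial b} = 0 \;&\Rightarrow\; \sum_{i=1}^{l}(\alpha_i - \alpha_i^*) = 0 \\
\frac{\partial \mathcal{L}}{\partial \xi_i} = 0 \;&\Rightarrow\; C - \alpha_i - \eta_i = 0,
\qquad
\frac{\partial \mathcal{L}}{\partial \xi_i^*} = 0 \;\Rightarrow\; C - \alpha_i^* - \eta_i^* = 0
\end{aligned}
```

Since $\eta_i, \eta_i^* \ge 0$, the last two conditions give the box constraints (9), the condition on $b$ gives (8), and substituting the expression for $w$ into $\mathcal{L}$, with $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$, yields the objective (7).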
In addition to these constraints, the Karush-Kuhn-Tucker conditions must be fulfilled, and the bias variable $b$ must be obtained. The interested reader can consult Smola and Scholkopf (2004) for details. In the dual formulation of the problem, $K(x_i, x_j)$ is the kernel matrix, formed by evaluating a kernel function that is equivalent to the dot product $\langle \phi(x_i), \phi(x_j) \rangle$. A common choice for this kernel function is the Gaussian function given in Eq. (10).
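[Figure 1: diagram of the mapping ϕ from the input space to the kernel space, with the slack variables ξ marked on the points outside the ε-tube, and a plot of the loss L(e), which is zero for errors e between –ε and +ε.]
Fig. 1. Example of an SVR process in a two-dimensional regression problem, with an ε-insensitive loss function.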
$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right) \tag{10}$$
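As a small illustration of Eq. (10), the sketch below evaluates the Gaussian kernel on every pair of a few toy vectors to build the kernel matrix. The sample vectors and the value of γ are hypothetical; the function names are chosen for this example only.

```python
import math


def gaussian_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def kernel_matrix(samples, gamma):
    """Evaluate the kernel on every pair of samples."""
    return [[gaussian_kernel(xi, xj, gamma) for xj in samples] for xi in samples]


# Three hypothetical two-dimensional input vectors.
samples = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
K = kernel_matrix(samples, gamma=0.5)
for row in K:
    print([round(v, 4) for v in row])
```

The matrix is symmetric with ones on the diagonal, and its entries decay towards zero as the distance between two samples grows; γ controls how fast that decay is.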
The final form of the function $f(x)$ depends on the Lagrange multipliers $\alpha_i, \alpha_i^*$, as follows:
$$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) K(x_i, x) \tag{11}$$
In this way, an SVR model can be obtained by solving a quadratic programming problem for given hyper-parameters $C$, $\varepsilon$ and $\gamma$. One of the most widely used free SVR codes is the C implementation of the algorithm described in Chang and Lin (2011), available at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/.
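Once the quadratic problem has been solved, evaluating the model only requires the training vectors with non-zero coefficients, the differences $\alpha_i - \alpha_i^*$ and the bias $b$. The following sketch evaluates $o(x) = f(x) + b$ with $f(x)$ as in Eq. (11) and the Gaussian kernel of Eq. (10). It does not solve the training problem, and the support vectors, coefficients, bias and γ below are made-up values, not results from this study.

```python
import math


def gaussian_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Evaluate o(x) = sum_i (alpha_i - alpha_i^*) K(x_i, x) + b.

    dual_coefs[i] holds the difference alpha_i - alpha_i^* for support_vectors[i].
    """
    f = sum(c * gaussian_kernel(sv, x, gamma)
            for sv, c in zip(support_vectors, dual_coefs))
    return f + b


# Hypothetical one-dimensional model: two support vectors, bias of 300 Dobson units.
support_vectors = [[0.0], [2.0]]
dual_coefs = [10.0, -4.0]
prediction = svr_predict([0.0], support_vectors, dual_coefs, b=300.0, gamma=1.0)
print(prediction)
```

Only the training vectors with $\alpha_i - \alpha_i^* \neq 0$ contribute to the sum, which is why the cost of a prediction depends on the number of support vectors rather than on the size of the training set.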
4. Experiments and results
This section presents the experimental part of the paper. First, it is shown how the initial data are preprocessed to keep a reduced number of predictive variables for the SVR. The methodology followed to evaluate the SVR performance is also described in the next subsection. After this, the results obtained by the SVR are presented, together with a comparison with an MLP.
4.1 Data preprocessing and methodology
The input data set is large, including 100 levels of humidity and temperature from the satellite, TCO measurements (from the days before the one to be predicted), aerosol optical depths at seven different wavelengths, and humidity and temperature forecasts from the GFS model (11 different pressure levels: 925, 850, 700, 500, 400, 300, 250, 200, 150, 100 and 50 hPa). A first preprocessing step is needed in order to reduce the size of the data set. This is done by means of a feature extraction process using PCA, a technique that has been used before in ozone analysis (Rajab et al., 2013). After this preprocessing step, the principal components that together account for 99.5% of the variance are kept, which results in a reduced number of variables, as described in Table II.
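The variance-based selection described above can be sketched with a singular value decomposition. This is a minimal illustration, not the authors' code: the function name `pca_reduce`, the use of NumPy and the synthetic 100-level profile matrix (270 hypothetical days driven by three latent factors plus small noise) are assumptions made for the example.

```python
import numpy as np


def pca_reduce(X, variance_kept=0.995):
    """Project X onto the fewest principal components reaching the variance target."""
    Xc = X - X.mean(axis=0)                           # center each variable
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)                   # variance fraction per component
    k = int(np.searchsorted(np.cumsum(explained), variance_kept)) + 1
    k = min(k, len(s))                                # guard against round-off at 100%
    return Xc @ Vt[:k].T, k


# Synthetic stand-in for a 100-level profile data set (hypothetical values).
rng = np.random.default_rng(0)
latent = rng.normal(size=(270, 3))
mixing = rng.normal(size=(3, 100))
X = latent @ mixing + 0.01 * rng.normal(size=(270, 100))

Z, k = pca_reduce(X)
print(Z.shape, k)
```

Because neighboring pressure levels are strongly correlated, a handful of components can carry almost all of the variance, which is consistent with the reduction from 100 to 3 or 7 variables reported in Table II.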
Since only one year of data is available (see section 2.3), a single partition of the data into training and test sets (as is usually done) could lead to misleading results. Instead, a 20-fold cross-validation procedure is used, i.e., the available data are split into 20 subsets (with 13 or 14 days per subset), and the performance of the SVR is measured by the average result obtained from training the SVR on 19 subsets and testing on the remaining one.
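The cross-validation loop can be sketched as follows. The paper does not say whether the 20 subsets are contiguous in time or drawn at random, nor the exact number of days; the sketch assumes contiguous blocks and a hypothetical total of 270 days, and it uses a trivial mean predictor in place of the SVR so that the example stays self-contained.

```python
def contiguous_folds(n_samples, n_folds):
    """Split indices 0..n_samples-1 into n_folds consecutive, near-equal blocks."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_mae(X, o, n_folds, fit_predict):
    """Average over folds of the MAE obtained on each held-out fold."""
    n = len(o)
    fold_maes = []
    for test_idx in contiguous_folds(n, n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx],
                            [o[i] for i in train_idx],
                            [X[i] for i in test_idx])
        abs_errors = [abs(p - o[i]) for p, i in zip(preds, test_idx)]
        fold_maes.append(sum(abs_errors) / len(abs_errors))
    return sum(fold_maes) / n_folds


def mean_baseline(X_train, o_train, X_test):
    """Placeholder model: always predicts the training mean."""
    mean = sum(o_train) / len(o_train)
    return [mean] * len(X_test)


n_days = 270  # hypothetical number of valid days
X = [[float(d)] for d in range(n_days)]
o = [300.0 + 0.1 * d for d in range(n_days)]

sizes = [len(f) for f in contiguous_folds(n_days, 20)]
print(sorted(set(sizes)))  # [13, 14]
print(cross_validated_mae(X, o, 20, mean_baseline))
```

Any regressor with the same `fit_predict(X_train, o_train, X_test)` shape, for instance a wrapper around an SVR library, can be passed in place of `mean_baseline`.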
For comparison purposes, an MLP trained with the Levenberg-Marquardt algorithm (Hagan and Menhaj, 1994) is used. MLPs have been previously applied to TCO prediction, and are considered the state of the art in this field.
4.2 Results
First, the performance of the proposed SVR was tested against the MLP approach using all the variables described in Table II. In addition, to establish the most important features in TCO prediction, both approaches were evaluated using each group of predictive variables separately. Results are shown in Table III. As can be seen, SVR outperforms the MLP in all cases, with improvements of roughly 7 to 12%. TCO prediction by means of the SVR, considering all the variables, is accurate, with a mean absolute error (MAE) of 28.86 Dobson units. TCO prediction with the input data taken separately reveals that the accurate prediction of temperatures given by the GFS (10 variables after the PCA preprocessing) is crucial to obtain good TCO predictions. In contrast, neither aerosols and water content (in situ measurements) nor humidity given by satellite measurements contribute to improving the TCO prediction. It is also interesting that the TCO measurement of the previous day is not a very good input variable for predicting TCO for the following day.

Table II. Input variables considered for TCO prediction after a first data extraction preprocessing step.
Variable                               | # initial variables | # final variables | Method
(HS) Humidity (Suomi)                  | 100                 | 3                 | PCA (99.5%)
(TS) Temperature (Suomi)               | 100                 | 7                 | PCA (99.5%)
(AW) Aerosol + water content (Cimel)   | 7+1                 | 2                 | PCA (99.5%)
(TCO) TCO measurements (Suomi)         | -                   | 3                 | t-1, t-2, t-3
(HG) Humidity prediction (GFS)         | 11                  | 9                 | PCA (99.5%)
(TG) Temperature prediction (GFS)      | 11                  | 10                | PCA (99.5%)
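The error measure and the improvement figures can be reproduced with a few lines of arithmetic. Table III does not state how the improvement column is computed; the sketch below assumes it is the relative reduction in MAE with respect to the MLP, an assumption that agrees with the reported values to within about 0.01 percentage points. The MAE values are copied from Table III; the two-point MAE example uses made-up numbers.

```python
def mean_absolute_error(measured, predicted):
    """Average of |measured - predicted| over all days."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def improvement_percent(mae_svr, mae_mlp):
    """Relative MAE reduction of the SVR with respect to the MLP, in percent."""
    return 100.0 * (mae_mlp - mae_svr) / mae_mlp


# (SVR MAE, MLP MAE, reported improvement) as listed in Table III.
table_iii = {
    "all": (28.86, 31.18, 7.44),
    "HS": (50.99, 56.74, 10.13),
    "TS": (36.69, 41.27, 11.09),
    "AW": (60.86, 65.89, 7.63),
    "TCO": (41.22, 46.71, 11.75),
    "HG": (44.42, 49.33, 9.95),
    "TG": (30.93, 34.57, 10.52),
}

for name, (svr, mlp, reported) in table_iii.items():
    print(name, round(improvement_percent(svr, mlp), 2), reported)

# Made-up two-day example of the MAE itself (Dobson units).
print(mean_absolute_error([300.0, 320.0], [310.0, 315.0]))  # 7.5
```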
The next issue is whether a subset of the data can provide a more accurate TCO prediction than the complete set. Table IV shows the results of using different subsets of predictive variables in TCO prediction. Four subsets are investigated in this case, and compared to the case where all variables are considered. The first subset analyzed is TS + TCO + TG (temperature profiles [Suomi] + TCO measurements [Suomi] + temperature prediction [GFS], 20 predictive variables in all). The second, third and fourth cases are subsets considering combinations of two of these groups of variables. As can be seen in Table IV, TCO prediction using the TS + TCO + TG variables and SVR is the best obtained in all the experiments carried out, with a MAE of 25.59 Dobson units. Subsets of two of these groups with the SVR show different behavior: the TCO + TG case (13 predictive variables) also gives good results, only slightly inferior to the case with three groups. The third-best case is TS + TG, which is still better than the TCO prediction obtained considering all variables. Note that the last case (TS + TCO, 10 predictive variables) leads to much poorer results in terms of TCO prediction, which highlights the importance of the TG variables to obtain a good TCO prediction with a daily time horizon.
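The number of predictive variables quoted for each subset follows directly from the final variable counts in Table II. The short sketch below makes that bookkeeping explicit and also returns the column indices of a subset in a concatenated feature matrix. The column ordering (HS, TS, AW, TCO, HG, TG) is an assumption made for this example; the paper does not specify how the features are stored.

```python
# Final number of variables per group, as listed in Table II.
final_counts = {"HS": 3, "TS": 7, "AW": 2, "TCO": 3, "HG": 9, "TG": 10}


def column_slices(counts):
    """Map each group to its column range in the concatenated feature matrix."""
    slices, start = {}, 0
    for name, width in counts.items():
        slices[name] = range(start, start + width)
        start += width
    return slices


def subset_columns(groups, counts=final_counts):
    """Column indices of the selected groups, in the order given."""
    slices = column_slices(counts)
    return [col for group in groups for col in slices[group]]


for groups in (["TS", "TCO", "TG"], ["TCO", "TG"], ["TS", "TG"], ["TCO", "TS"]):
    print("+".join(groups), len(subset_columns(groups)))
# TS+TCO+TG 20
# TCO+TG 13
# TS+TG 17
# TCO+TS 10
```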
These results can be better visualized by plotting the TCO predictions. Figures 2, 3, 4 and 5 show the TCO prediction obtained with the SVR approach (temporal prediction and scatter plot) for the predictive variables TS + TCO + TG, TCO + TG, TG + TS and TCO + TS, respectively. Note the good prediction obtained by using SVR with TS + TCO + TG, which follows the TCO peaks and provides an accurate prediction in all the cases considered. In contrast, the input variables TCO + TS provide a worse TCO prediction, in which the TCO peaks are not completely resolved. This shows the importance of the temperature prediction variables (TG) in TCO prediction, and how the rest of the satellite variables make the prediction slightly more accurate. Note also that humidity variables (either the satellite profile from the day before the prediction or the humidity prediction by GFS) do not seem to be relevant for obtaining accurate daily TCO predictions.

Table III. Results in TCO prediction (mean absolute error, in Dobson units) obtained with the different input variables considered.
Variables | SVR   | MLP   | Improvement (%)
all       | 28.86 | 31.18 | 7.44
HS        | 50.99 | 56.74 | 10.13
TS        | 36.69 | 41.27 | 11.09
AW        | 60.86 | 65.89 | 7.63
TCO       | 41.22 | 46.71 | 11.75
HG        | 44.42 | 49.33 | 9.95
TG        | 30.93 | 34.57 | 10.52

Table IV. Results in TCO prediction (mean absolute error, in Dobson units) obtained with selected subsets of the input variables considered.
Variables | SVR   | MLP   | Improvement (%)
all       | 28.86 | 31.18 | 7.44
TS+TCO+TG | 25.59 | 28.37 | 9.79
TCO+TG    | 26.92 | 30.02 | 10.32
TS+TG     | 27.48 | 29.93 | 8.18
TCO+TS    | 37.85 | 40.24 | 5.23

Fig. 2. Prediction (scatter plot and temporal prediction) with the SVR using the TS + TCO + TG predictive variables (20 variables). (a) Scatter plot of predicted versus measured TCO (Dobson units); (b) temporal prediction of TCO (Dobson units) versus days, TCO measured (blue) and predicted (red).
5. Conclusions
The prediction of total column ozone (TCO) is a difficult problem with important environmental applications. In this paper, a novel and efficient prediction method for TCO has been proposed, in which a high-performance regression approach (SVR) is applied to a set of predictive variables from heterogeneous sources, such as satellite data (Suomi NPP polar satellite), numerical models (GFS) and direct measurements using devices such as sunphotometers. Data from satellite instruments consist of temperature and humidity profiles at different heights, and TCO measurements from the days before the prediction. The GFS model provides predictions of temperature and humidity for the day of prediction. Alternative measurement data such as aerosol optical depth at different wavelengths are also considered in the system.
This work shows the good performance of the proposed SVR algorithm applied to daily TCO prediction, outperforming alternative algorithms such as neural networks.
An analysis of the most suitable input data for TCO prediction has also been carried out in this study. The results show that temperature prediction by a numerical model is the most important variable to be considered in TCO prediction. We have shown that the SVR methodology provides better results in daily TCO prediction than the previously considered neural network algorithms. The improvement obtained with SVR over the neural network methodology ranges from about 5 to 12% in all the cases evaluated. We have also shown the importance of a good temperature prediction by numerical models in obtaining accurate TCO predictions, which can be complemented with satellite measurements to further improve the accuracy of the prediction results.

Fig. 3. Prediction (scatter plot and temporal prediction) with SVR using the TCO + TG predictive variables (13 variables). (a) Scatter plot of predicted versus measured TCO (Dobson units); (b) temporal prediction of TCO (Dobson units) versus days, TCO measured (blue) and predicted (red).

Fig. 4. Prediction (scatter plot and temporal prediction) with SVR using the TS + TG predictive variables (17 variables). (a) Scatter plot of predicted versus measured TCO (Dobson units); (b) temporal prediction of TCO (Dobson units) versus days, TCO measured (blue) and predicted (red).
Acknowledgments
This work has been partially supported by the project TIN2014-54583-C2-2-R of the Comisión Interministerial de Ciencia y Tecnología (CICYT) of Spain.
References
Antón M., D. Loyola, B. Navascues and P. Valks, 2008. Comparison of GOME total ozone data with ground data from the Spanish Brewer spectroradiometers. Ann. Geophys. 26, 401-412, doi:10.5194/angeo-26-401-2008.
Antón M., D. Bortoli, M. J. Costa, P. S. Kulkarni, A. F. Domingues, D. Barriopedro, A. Serrano and A. M. Silva, 2011a. Temporal and spatial variabilities of total ozone column over Portugal. Remote Sens. Environ. 115, 855-863, doi:10.1016/j.rse.2010.11.013.
Antón M., D. Bortoli, P. S. Kulkarni, M. J. Costa, A. F. Domingues, D. Loyola, A. M. Silva and L. Alados-Arboledas, 2011b. Long-term trends of total ozone column over the Iberian Peninsula for the period 1979-2008. Atmos. Environ. 45, 6283-6290, doi:10.1016/j.atmosenv.2011.08.058.
Bronnimann S., J. Luterbacher, C. Schmutz, H. Wanner and J. Staehelin, 2000. Variability of total ozone at Arosa, Switzerland, since 1931 related to atmospheric circulation indices. Geophys. Res. Lett. 27, 2213-2216, doi:10.1029/1999GL011057.
Chang C. C. and C. J. Lin, 2011. LIBSVM: A library for support vector machines. ACM Trans. Intel. Syst. Tech. 2, 1-27, doi:10.1145/1961189.1961199.
Chattopadhyay S. and G. Bandyopadhyay, 2007. Artificial neural network with back propagation learning to predict mean monthly total ozone in Arosa, Switzerland. Int. J. Remote Sens. 28, 4471-4482, doi:10.1080/01431160701250440.
Chattopadhyay G. and S. Chattopadhyay, 2009a. Autoregressive forecast of monthly total ozone concentration: a neurocomputing approach. Comp. Geosci. 35, 1925-1932, doi:10.1016/j.cageo.2008.11.007.
Chattopadhyay G. and S. Chattopadhyay, 2009b. Predicting daily total ozone over Kolkata, India: skill assessment of different neural network models. Meteorol. Appl. 16, 179-190, doi:10.1002/met.97.
Christakos G., A. Kolovos, M. L. Serre and F. Vukovich, 2004. Total ozone mapping by integrating databases from remote sensing instruments and empirical models. IEEE Trans. Geosci. Remote Sens. 42, 991-1008, doi:10.1109/TGRS.2003.822751.
Crutzen P. J., 1970. The influence of nitrogen oxide on the atmospheric ozone content. Q. J. Roy. Meteor. Soc. 96, 320-327, doi:10.1002/qj.49709640815.
Crutzen P. J., 1971. Ozone production rates in an oxygen-hydrogen-nitrogen oxide atmosphere. J. Geophys. Res. 76, 7311-7327, doi:10.1029/JC076i030p07311.
Farman J. C., B. Gardiner and J. D. Shanklin, 1985. Large losses of total ozone in Antarctica reveal seasonal ClOx/NOx interaction. Nature 315, 207-210, doi:10.1038/315207a0.
Hagan M. T. and M. B. Menhaj, 1994. Training feed forward network with the Marquardt algorithm. IEEE Trans. Neural Net. 5, 989-993, doi:10.1109/72.329697.
Holben B. N., T. F. Eck, I. Slutsker, D. Tanre, J. P. Buis, A. Setzer, et al., 1998. AERONET-A federated instrument network and data archive for aerosol characterization. Remote Sens. Environ. 66, 1-16, doi:10.1016/S0034-4257(98)00031-5.
Fig. 5. Prediction (scatter plot and temporal prediction)
with the SVR using TCO+TS predictive variables (10
variables). (a) Scatter plot: TOC predicted vs. TOC measured
(Dobson units); (b) temporal prediction: TOC (Dobson units)
vs. days (March 2014 - February 2014), TCO measured (blue)
and predicted (red).
Jin X., J. Li, C. C. Schmidt, T. J. Schmit and J. Li, 2008. Retrieval of total column ozone from images onboard geostationary satellites. IEEE Trans. Geosci. Remote Sens. 46, 479-488, doi:10.1109/TGRS.2007.910222.
Kanamitsu M., J. C. Alpert, K. A. Campana, P. M. Caplan, D. G. Deaven, M. Iredell, et al., 1991. Recent changes implemented into the Global Forecast System at NMC. Weather Forecast. 6, 425-436, doi:10.1175/1520-0434(1991)006<0425:RCIITG>2.0.CO;2.
Khokhlov V. N. and A. V. Romanova, 2011. NAO-induced spatial variations of total ozone column over Europe at near-synoptic time scale. Atmos. Environ. 45, 3360-3365, doi:10.1016/j.atmosenv.2011.03.056.
Krzyscin J. W. and J. L. Borkowski, 2008. Variability of the total ozone trend over Europe for the period 1950-2004 derived from reconstructed data. Atmos. Chem. Phys. 8, 2847-2857, doi:10.5194/acp-8-2847-2008.
Latha K. M. and K. V. Badarinath, 2003. Impact of aerosols on total columnar ozone measurements. A case study using satellite and ground-based instruments. Atmos. Res. 66, 307-313, doi:10.1016/S0169-8095(03)00026-7.
Martínez-Ballesteros M., S. Salcedo-Sanz, J. C. Riquelme, C. Casanova-Mateo and J. L. Camacho, 2011. Evolutionary association rules for total ozone content modeling from satellite observations. Chemom. Intel. Lab. Syst. 109, 217-227, doi:10.1016/j.chemolab.2011.09.011.
Molina M. J. and F. S. Rowland, 1974. Stratospheric sink for chlorofluoromethanes: Chlorine atom catalyzed destruction of ozone. Nature 249, 810-812, doi:10.1038/249810a0.
Monge-Sanz B. and N. Medrano-Marques, 2004. Total ozone time series analysis: a neural network model approach. Non-lin. Proc. Geophys. 11, 683-689, doi:10.5194/npg-11-683-2004.
Pinedo-Vega J. L., C. Ríos-Martínez, F. Mireles-García, V. M. García-Saldíar, J. I. Dávila-Rangel and A. R. Salazar-Román, 2014. Trend of total column ozone over Mexico from TOMS and OMI data (1978-2013). Atmósfera 27, 251-260, doi:10.1016/S0187-6236(14)71114-2.
Rajab J. M., M. Z. MatJafri and H. S. Lim, 2013. Combining multiple regression and principal component analysis for accurate predictions for column ozone in Peninsular Malaysia. Atmos. Environ. 71, 36-43, doi:10.1016/j.atmosenv.2013.01.019.
Salcedo-Sanz S., J. L. Camacho, A. M. Pérez-Bellido and E. Hernández-Martín, 2010. Novel deseasonalizing models for improving the prediction of total ozone in column using evolutionary programming and neural networks. J. Atmos. Solar-Terr. Phys. 72, 1333-1340, doi:10.1016/j.jastp.2010.09.021.
Salcedo-Sanz S., J. L. Camacho, A. M. Pérez-Bellido, E. Ortiz-García, A. Portilla-Figueras and E. Hernández-Martín, 2011. Improving the prediction of average total ozone in column over the Iberian Peninsula using neural networks banks. Neurocomp. 74, 1492-1496, doi:10.1016/j.neucom.2011.01.003.
Salcedo-Sanz S., J. L. Rojo, M. Martínez-Ramón and G. Camps-Valls, 2014. Support vector machines in engineering: an overview. WIREs Data-Min. Knowl. Discover. 4, 234-267, doi:10.1002/widm.1125.
Savastiouk V. and C. T. McElroy, 2005. Brewer spectrophotometer total ozone measurements made during the 1998 middle atmosphere nitrogen trend assessment (MANTRA) Campaign. Atmos. Ocean. 43, 315-324, doi:10.3137/ao.430403.
Silva A. A., 2007. A quarter century of TOMS total column ozone measurements over Brazil. J. Atmos. Solar-Terr. Phys. 69, 1447-1458, doi:10.1016/j.jastp.2007.05.006.
Smola A. J. and B. Scholkopf, 2004. A tutorial on support vector regression. Stat. Comput. 14, 199-222, doi:10.1023/B:STCO.0000035301.49549.88.
Steinbrecht W., B. Hassler, H. Claude, P. Winkler and R. S. Stolarski, 2003. Global distribution of total ozone and lower stratospheric temperature variations. Atmos. Chem. Phys. 3, 1421-1438, doi:10.5194/acp-3-1421-2003.
Stolarski R. S. and R. J. Cicerone, 1974. Stratospheric chlorine: a possible sink for ozone. Canadian J. Chem. 52, 1610-1615, doi:10.1139/v74-233.
Varotsos C., C. Cartalis, A. Vlamakis, C. Tzanis and I. Keramitsoglou, 2004. The long-term coupling between column ozone and tropopause properties. J. Clim. 17, 3843-3854, doi:10.1175/1520-0442(2004)017<3843:TLCBCO>2.0.CO;2.
Wang C., G. R. Jeong and N. Mahowald, 2009. Particulate absorption of solar radiation: anthropogenic aerosols vs. dust. Atmos. Chem. Phys. 9, 3935-3945, doi:10.5194/acp-9-3935-2009.
... One of the first works on exploiting data fusion from multi-sensor sources with ML was [66], where a problem of ionograms inversion is tackled using data fusion techniques with neural networks. More recently, there have been alternative specific applications, such as [67] where a problem of total ozone in column prediction is tackled with SVMs and information fusion from different data sources, such as numerical models, ground stations and satellite data. In [68] a problem of eddy detection from different satellite sensors images was faced, using a deep learning approach for multi-scale feature fusion, followed by a SVM algorithm. ...
... Kalman filter (KF) [57] Other applications Ionograms inversion [66], Total ozone atmospheric content [67], Eddies detection at sea [68], Forest Biomass estimation [69], Seagrass presence [70], Environmental sound recognition [71] Neural networks [66], Support Vector Machines [67,68] 3.2. In-situ observations: ground, atmosphere, and ocean ...
Preprint
Full-text available
This paper reviews the most important information fusion data-driven algorithms based on Machine Learning (ML) techniques for problems in Earth observation. Nowadays we observe and model the Earth with a wealth of observations, from a plethora of different sensors, measuring states, fluxes, processes and variables, at unprecedented spatial and temporal resolutions. Earth observation is well equipped with remote sensing systems, mounted on satellites and airborne platforms, but it also involves in-situ observations, numerical models and social media data streams, among other data sources. Data-driven approaches, and ML techniques in particular, are the natural choice to extract significant information from this data deluge. This paper produces a thorough review of the latest work on information fusion for Earth observation, with a practical intention, not only focusing on describing the most relevant previous works in the field, but also the most important Earth observation applications where ML information fusion has obtained significant results. We also review some of the most currently used data sets, models and sources for Earth observation problems, describing their importance and how to obtain the data when needed. Finally, we illustrate the application of ML data fusion with a representative set of case studies, as well as we discuss and outlook the near future of the field.
... Kalman filter (KF) [57] Other applications Ionograms inversion [66] , Total ozone atmospheric content [67] , Eddies detection at sea [68] , Forest Biomass estimation [69] , Seagrass presence [70] , Environmental sound recognition [71] Neural networks [66] , Support Vector Machines [67,68] , Random Forest [69] , Convolutional Neural Networks [71] [ [66][67][68][69][70][71] spatio-temporal fusion models for remote sensing, by means of a comparison among the most important existing ones. Also within the framework of remote sensing, Gómez-Chova et al. [9] has presented a review on multi-modal classification techniques for remote sensing images, and Schmitt and Zhu [10] presented a review specifically focused on data fusion techniques and algorithms for remote sensing. ...
Article
This paper reviews the most important information fusion data-driven algorithms based on Machine Learning (ML) techniques for problems in Earth observation. Nowadays we observe and model the Earth with a wealth of observations, from a plethora of different sensors, measuring states, fluxes, processes and variables, at unprecedented spatial and temporal resolutions. Earth observation is well equipped with remote sensing systems, mounted on satellites and airborne platforms, but it also involves in-situ observations, numerical models and social media data streams, among other data sources. Data-driven approaches, and ML techniques in particular, are the natural choice to extract significant information from this data deluge. This paper produces a thorough review of the latest work on information fusion for Earth observation, with a practical intention, not only focusing on describing the most relevant previous works in the field, but also the most important Earth observation applications where ML information fusion has obtained significant results. We also review some of the most currently used data sets, models and sources for Earth observation problems, describing their importance and how to obtain the data when needed. Finally, we illustrate the application of ML data fusion with a representative set of case studies, as well as we discuss and outlook the near future of the field.
... The successful application of SVR with accurate prediction was presented for the prediction of SO2 hourly in Wanliu of Beijing in China [47]. Additionally, the SVR was applied for modeling the total Column Ozone using input variables of daily temperature and humidity [48]. It was concluded that SVR outperformed prediction algorithms compared to ANNs. ...
Article
Full-text available
Modeling air quality in city centers is essential due to environmental and health-related issues. In this study, machine learning (ML) approaches were used to approximate the impact of air pollutants and meteorological parameters on SO2 quality levels. The parameters, NO, NO2, O3, PM10, RH, HyC, T, and P are significant factors affecting air pollution in Jeddah city. These factors were considered as the input parameters of the ANNs, MARS, SVR, and Hybrid model to determine the effect of those factors on the SO2 quality level. Hence, ANN was employed to approximate the nonlinear relation between SO2 and input parameters. The MARS approach has successful applications in air pollution predictions as an ML tool, employed in this study. The SVR approach was used as a nonlinear modeling tool to predict the SO2 quality level. Furthermore, the MARS and SVR approaches were integrated to develop a novel hybrid modeling scheme for providing a nonlinear approximation of SO2 concentration. The main innovation of this hybrid approach applied for predicting the SO2 quality levels is to develop an efficient approach and reduce the time-consuming calibration processes. Four comparative statistical considerations, MAE, RMSE, NSE, and d, were applied to measure the accuracy and tendency. The hybrid SVR model outperforms the other models with the lowest RMSE and MAE, and the highest d and NSE in testing and training processes.
... Ozone formation is a different process in terms of time and according to the circumstances of an area. Ozone is produced in various chemical reactions, but the first mechanism is the absorption of ultraviolet radiation energy (UV) from the sun and displacement in the atmosphere (Carro-Calvo et al. 2017). Ozone (O3) will form when the oxygen gas (O2) acts to absorb sunlight (UV) at a wave distance of 242 nanometers and is broken down in the removal of the chemical-light response from the sun by a wave distance of more than 290 nanometers. ...
Article
The growth of air pollution as a serious concern throughout the country is influenced by several causes sourcing from transportation, combustion, and industrialization. These factors reduced the air quality which harms society, animals, and plants’ well-being. Surface ozone is one of the main causes of air pollution in developing areas around the world. Analysis of surface ozone by area and season variations increased during summer were associated with meteorological parameters such as atmospheric temperature and humidity, meanwhile, the concentration of surface ozone is low during the northeast monsoon season. In addition, the concentration of surface ozone also increases when there are other environmental influences such as the release of pollutants from anthropogenic activity. Therefore, a study will be carried out to identify the variation of surface ozone concentrations by region and monsoon season in peninsular Malaysia, especially in the northern, eastern, and western zones, to identify the highest duration for daily ozone concentration (O3) and to study the relationship between the monthly and annual surface ozone concentrations with meteorological parameters. The results in this study can be used to identify air quality and address the problem of air pollution so that human health and the environment are preserved by using the method of boxplot and line graph obtained from descriptive statistics.
... The combination of support vector regression algorithms and numerical models were studied in (Carro-Calvo et al., 2017). In the work of Lu et al. (Lu and Wang, 2014), the authors explained the limitations of both ANN and support vector machines (SVM) in the field of the ground-level O3 prediction. ...
Article
Full-text available
Surface ozone (O3) is considered a hazard to human health, affecting vegetation crops and ecosystems. Accurate time and location O3 forecasting can help to protect citizens from unhealthy exposures when high levels are expected. Usually, forecasting models use numerous O3 precursors as predictors, limiting the reproducibility of these models to the availability of such information from data providers. This study introduces a 24 h-ahead hourly O3 concentrations forecasting methodology based on bagging and ensemble learning, using just two predictors with lagged O3 concentrations. This methodology was applied on ten-year time series (2006–2015) from three major urban areas of Andalusia (Spain). Its forecasting performance was contrasted with an algorithm especially designed to forecast time series exhibiting temporal patterns. The proposed methodology outperforms the contrast algorithm and yields comparable results to others existing in literature. Its use is encouraged due to its forecasting performance and wide applicability, but also as benchmark methodology.
Article
Full-text available
Particulate matter (PM) pollution causes significant environmental problems. The adverse effects of PM pollution have become a widespread problem because of the risks they pose to the health of living beings. Given all these adverse effects and the complex interactions of PM in the atmosphere, it is important that it be studied further. In particular, studies on monitoring and predicting PM pollution are important. In recent years, studies that predict PM pollution by taking meteorological factors into account have increased. Studies predicting PM pollution with machine learning methods in particular have gained momentum. In this study, PM10 pollution was predicted with various machine learning algorithms, taking meteorological factors into account. The meteorological data used in the study were obtained from the Ankara Regional station of the Meteoroloji Genel Müdürlüğü (Turkish State Meteorological Service) (latitude: 39.9727, longitude: 32.8637, altitude: 891 m). The PM10 pollution data were obtained from the Ankara Keçiören-Sanatoryum air quality station of the Ministry of Environment, Urbanization and Climate Change (latitude: 39.999, longitude: 32.856, altitude: 1009 m). In the machine learning stage, temperature, dew point temperature, precipitation, relative humidity, wind speed, pressure, cloud cover and the previous day's PM10 measurements were taken into account; different machine learning algorithms (decision tree regression, support vector regression, lasso regression and artificial neural network) were each applied separately, and the consistency of these algorithms was compared. Various statistical metrics were used to examine their consistency. As a result, on the test set, the artificial neural network algorithm obtained a coefficient of determination of ~0.6, a root mean square error of ~18 and a mean absolute error of ~12, and it gave better results than the other algorithms.
Article
The present study attempts to detect anomalies and evaluate statistical climatological trends of recorded total columnar ozone (RTCO) over the Indian sites. The 30–54 years of RTCO data recorded by the Dobson Spectrophotometer obtained from the India Meteorological Department (IMD) is used. TCO Anomalies are detected using predicted TCO (PTCO) from a Long Short-Term Memory (LSTM) based neural network model. The percentage of anomalies detected by the current model are 1.20% (2%), 0.76% (1.76%), 0.97% (0.72%) and 1.07% (1.60%) for Dobson Spectrophotometer (Satellite measured TCO) over New Delhi, Kodaikanal, Pune and Varanasi respectively. After removing anomalies, the PTCO by the neural network (NN) model correlates a minimum of 83% (New Delhi) and a maximum of 94% (Pune) with RTCO measurements, which demonstrates the accuracy of the present model in predicting the TCO. Using the anomaly removed long-term RTCO measurements, statistical climatological trends are estimated using Mann–Kendall (MK) test-based Sen’s slope to evaluate the significance of the linear fit. Results of linear regression (MK test) based linear fit reported an increasing TCO trend over New Delhi and decreasing TCO trend over Varanasi with slope of 0.22 (0.21) DU year⁻¹ and -0.40 (-0.46) DU year⁻¹ respectively. However, the MK-based statistical test shows no trend over Kodaikanal and Pune.
Article
Full-text available
Using satellite measurements from the Total Ozone Mapping Spectrometer (TOMS) and Ozone Monitoring Instrument (OMI) version 8, this work presents the total column ozone (TCO) trends over Mexico and, in particular, over the state of Zacatecas. Interannual variations and their statistical dispersion show a surprisingly systematic behavior. Yearly low values occur during December and January, while high values between April and May. A significant depletion of about 2.5% in TCO between 1978 and 1994 is derived from their statistical analysis, which also shows stabilization from 1996 to 2013. Although the depletion is merely significant, it is a sign that the studied regions, crossed by the Tropic of Cancer, have not escaped to the depletion of the ozone layer. The characterization described herein is important in terms of the correlation of TCO and ultraviolet radiation levels.
Article
Full-text available
The observational data of the vertical temperature distribution and column ozone, obtained from 10 main stations in the Northern Hemisphere, are analyzed in order to explore the tropopause variations in conjunction with the dynamical variability in column ozone. From the analysis presented, it is evident that the summer distribution of the frequency of occurrence of the tropopause over Greece, apart from its main maximum (around 12 km), is also characterized by a secondary one around 16 km. It is proposed that this elevated maximum possibly originates from the height variation of the tropopause from 12 to 16 km depending on whether the Athens station is located below the cyclonic shear side or below the anticyclonic shear side of the subtropical jet stream. It is also suggested that the transport in the upper troposphere and lower stratosphere that originated in the equatorial region forces the appearance of the multiple tropopauses above Greece. Furthermore, the observational analysis of the vertical ozone distribution above Greece shows that the upward movement of the ozone profile is accompanied by an increase in the annually averaged tropopause height, which leads to an excessive column ozone trend around 0.5%-1.0% decade⁻¹. Additionally, the linear regression analyses of the deseasonalized monthly mean column ozone and tropopause height indicate that the tropopause variations might be responsible for about a quarter of the observed total ozone content (TOC) trend over Greece, the same magnitude of midlatitude ozone depletion that 2D dynamical and chemical models cannot reproduce. This part of the trend is only due to the variations in the upper-troposphere/lower-stratosphere region and not attributable to all dynamical changes and forcing on TOC.
Finally, the inverse relationship between column ozone and tropopause height at various geographical sites shows a longitudinal and latitudinal variability, with the strongest signal observed in the eastern midlatitudes of the Northern Hemisphere. At these geographical sites, changes in both the column ozone and lower-stratospheric temperature are roughly 10 Dobson unit (DU) K⁻¹.
Article
Full-text available
Present study deals with the mean monthly total ozone time series over Arosa, Switzerland. The study period is 1932-1971. First of all, the total ozone time series has been identified as a complex system and then Artificial Neural Networks models in the form of Multilayer Perceptron with back propagation learning have been developed. The models are Single-hidden-layer and Two-hidden-layer Perceptrons with sigmoid activation function. After sequential learning with learning rate 0.9 the peak total ozone period (February-May) concentrations of mean monthly total ozone have been predicted by the two neural net models. After training and validation, both of the models are found skillful. But, Two-hidden-layer Perceptron is found to be more adroit in predicting the mean monthly total ozone concentrations over the aforesaid period.
Article
Long-term variability of total ozone over Europe is discussed using results of a flexible trend model applied to the reconstructed total ozone data for the period 1950–2004. The data base used was built within the objectives of the COST action 726 "Long-term changes and climatology of UV radiation over Europe". The trend pattern, which comprises both anthropogenic and "natural" component, is not a priori assumed but it is a result of a smooth curve fit to the zonal monthly means and monthly grid values. The trend values in 5-year and 10-year intervals in cold (October-next year April) and warm (May–September) seasons are calculated as the differences between the smooth curve values at the end and beginning of selected time intervals divided by length of the intervals. The confidence intervals for the trend values are calculated by the block bootstrapping. The statistically significant negative trends are found almost over whole Europe only in the period 1985–1994. Negative trends up to −3% per decade appeared over small areas in earlier periods when the anthropogenic forcing on the ozone layer was weak. The statistically positive trends are found only during warm seasons 1995–2004 over Svalbard archipelago. The reduction of ozone level in 2004 relative to that before the satellite era is not dramatic, i.e., up to ~−5% and ~−3.5% in the cold and warm subperiod, respectively. Present ozone level is still depleted over many popular resorts in southern Europe and northern Africa. For high latitude regions the trend overturning could be inferred in last decade (1995–2004) as the ozone depleted areas are not found there in 2004 in spite of substantial ozone depletion in the period 1985–1994.
Article
This paper provides an overview of the support vector machine (SVM) methodology and its applicability to real-world engineering problems. Specifically, the aim of this study is to review the current state of the SVM technique, and to show some of its latest successful results in real-world problems present in different engineering fields. The paper starts by reviewing the main basic concepts of SVMs and kernel methods. Kernel theory, SVMs, support vector regression (SVR), and SVM in signal processing and hybridization of SVMs with meta-heuristics are fully described in the first part of this paper. The adoption of SVMs in engineering is nowadays a fact. As we illustrate in this paper, SVMs can handle high-dimensional, heterogeneous and scarcely labeled datasets very efficiently, and it can be also successfully tailored to particular applications. The second part of this review is devoted to different case studies in engineering problems, where the application of the SVM methodology has led to excellent results. First, we discuss the application of SVR algorithms in two renewable energy problems: the wind speed prediction from measurements in neighbor stations and the wind speed reconstruction using synoptic-pressure data. The application of SVMs in noninvasive cardiac indices estimation is described next, and results obtained there are presented. The application of SVMs in problems of functional magnetic resonance imaging (fMRI) data processing is further discussed in the paper: brain decoding and mental disorder characterization. The following application deals with antenna array processing, namely SVMs for spatial nonlinear beamforming, and the SVM application in a problem of arrival angle detection. Finally, the application of SVMs to remote sensing image classification and target detection problems closes this review.
Article
In this paper we propose an evolutionary method of association rules discovery (EQAR, Evolutionary Quantitative Association Rules) that extends a recently published algorithm by the authors and we describe its application to a problem of Total Ozone Content (TOC) modeling in the Iberian Peninsula. We use TOC data from the Total Ozone Mapping Spectrometer (TOMS) on board the NASA Nimbus-7 satellite measured at three locations (Lisbon, Madrid and Murcia) of the Iberian Peninsula. As prediction variables for the association rules we consider several meteorological variables, such as Outgoing Long-wave Radiation (OLR), Temperature at 50 hPa level, Tropopause height, and wind vertical velocity component at 200 hPa. We show that the best association rules obtained by EQAR are able to accurately model the TOC data in the three locations considered, providing results which agree with previous works in the literature.
Article
In this paper we present a novel method for deseasonalizing TOC data using non-linear models, with evolutionary computation techniques, and its performance with a neural network as regression approach. Specifically, the proposed deseasonalization method uses an evolutionary programming (EP) approach to carry out a curve fitting problem, where a given function model is optimized to be as similar as possible to an objective curve (a real TOC measurement in this case). Different non-linear models are proposed to be optimized with the EP algorithm. In addition, we test the possibility of deseasonalizing the TOC measurement and also the meteorological input data. The deseasonalized series is then used to train a neural network (multi-layer perceptron). We test the proposed models in the prediction of several TOC series in the Iberian Peninsula, where we carry out a comparison against a reference deseasonalizing model previously proposed in the literature. The results obtained show the good performance of some of the deseasonalizing models proposed in this paper.
Article
This study proposes that the oxides of chlorine, ClOx, may constitute an important sink for stratospheric ozone. A photochemical scheme is devised which includes two catalytic cycles through which ClOx destroys odd oxygen. The individual ClX constituents (HCl, Cl, ClO, and OClO) perform analogously to the respective constituents (HNO3, NO, NO2, and NO3) in the NOx catalytic cycles, but the ozone destruction efficiency is higher for ClOx. Our photochemical scheme predicts that ClO is the dominant chlorine constituent in the lower and middle stratosphere and HCl dominates in the upper stratosphere. Sample calculations are performed for several ClX altitude profiles: an assumed 1 p.p.b. volume mixing ratio, a ground level source, and direct injection by volcanic explosions. Finally we discuss certain limitations of the present model: uncertainty in stratospheric OH concentrations, the possibility that ClOO exists, the need to couple ClOx cycles with NOx and HOx cycles, and possible heterogeneous reactions.