ORIGINAL PAPER
Predicting the Olea pollen concentration with a machine learning
algorithm ensemble
José María Cordero¹ · J. Rojo² · A. Montserrat Gutiérrez-Bustillo³ · Adolfo Narros¹ · Rafael Borge¹
Received: 11 May 2020 / Revised: 14 September 2020 / Accepted: 31 October 2020
© ISB 2020
Abstract
Air pollution in large cities produces numerous diseases and even millions of deaths annually according to the World Health
Organization. Pollen exposure is related to allergic diseases, which makes its prediction a valuable tool to assess the risk level to
aeroallergens. However, airborne pollen concentrations are difficult to predict due to the inherent complexity of the relationships
among both biotic and environmental variables. In this work, a stochastic approach based on supervised machine learning
algorithms was performed to forecast the daily Olea pollen concentrations in the Community of Madrid, central Spain, from
1993 to 2018. Firstly, individual Light Gradient Boosting Machine (LightGBM) and artificial neural network (ANN) models
were applied to predict the day of the year (DOY) when the peak of the pollen season occurs. The estimated average peak
dates were 149.1 ± 9.3 and 150.1 ± 10.8 DOY for LightGBM and ANN, respectively, close to the observed value (148.8 ± 9.8).
Secondly, the daily pollen concentrations during the entire pollen season have been calculated using an ensemble of two-step
GAM followed by LightGBM and ANN. The results of the prediction of daily pollen concentrations showed a coefficient of
determination (r²) above 0.75 (goodness of the model following cross-validation). The predictors included in the ensemble
models were meteorological variables, phenological metrics, specific site-characteristics, and preceding pollen concentrations.
The models are state-of-the-art in machine learning, and they have shown their potential to be deployed to understand and
predict the pollen risk levels during the main olive pollen season.
Keywords Air quality · Pollen exposure · Pollen prediction · Neural networks · Boosted trees
Introduction
Poor air quality is associated with mortality and morbidity from respiratory and cardiovascular diseases and lung
cancer (Burnett et al. 2018; Cole-Hunter et al. 2018). According to the World Health Organization (WHO), 4.2 million
premature deaths worldwide every year can be attributed to air pollution (“World Health Organization”, 2019).
Moreover, the contribution of air pollution to premature mortality could double by 2050 (Lelieveld et al. 2015).
Over 500,000 premature annual deaths are attributed to population exposure to PM2.5, NO₂, and O₃ in Europe
(EEA 2019a). There is therefore a clear need to improve urban air quality and foster climate co-benefits
(Balbus et al. 2014; Xie et al. 2018) in agreement with the air quality standards of the Air Quality Directive (AQD)
(Directive 2008/50/EC of the European Parliament and of the Council of 21 May 2008 on ambient air quality and
cleaner air for Europe, 2008; EEA 2019b).
In addition to abiotic air pollutants (particulate inorganic matter, NO₂, SO₂, O₃, etc.), the air quality field is
increasingly interested in biological particles, which pose an important risk to human health (Lake et al. 2017).
According to the World Allergy Organization, allergies are considered one of the most relevant public health
problems of this century (D’Amato et al. 2015; Pawankar et al. 2013). Accordingly, numerous medical studies have
quantified hospital admissions as a consequence of allergic reactions to high pollen levels in the preceding days
(Diaz et al. 2007; Galan et al. 2010; Thien et al. 2018). Against this background, interest in the monitoring of
airborne biological particles has increased during the last decades, not only from an allergological point of view
but also for agricultural and ecological purposes (Beggs et al. 2017; Šantl-Temkiv et al. 2019).

* José María Cordero
jm.cordero@upm.es
1 Universidad Politécnica de Madrid (UPM), ETSII-UPM, José Gutiérrez Abascal 2, 28006 Madrid, Spain
2 University of Castilla-La Mancha, Institute of Environmental Sciences (Botany), Avda. Carlos III s/n, E-45071 Toledo, Spain
3 Department of Pharmacology, Pharmacognosy and Botany, Complutense University of Madrid, Ciudad Universitaria, 28040 Madrid, Spain
International Journal of Biometeorology
https://doi.org/10.1007/s00484-020-02047-z

The prediction of Olea pollen concentrations during the pollen season is a valuable tool to assess the risk level
and to warn people who are allergic to pollen.
Short-term predictive modelling of olive pollen is very complex because (i) the yearly flower production of olive
trees depends on the environmental conditions during the current and previous year, as well as on the internal
hormonal and biochemical balance of the plant (Rojo et al. 2015); and (ii) local features such as topography or
altitude introduce variability in the succession of olive flowering (Aguilera and Ruiz Valenzuela 2012;
Oteros et al. 2013; Rojo and Perez-Badia 2014). In addition, a strong variability in the maximum pollen
concentration among years, and also between consecutive days, has been reported (Fernandez-Rodriguez et al. 2016a;
Perez-Badia et al. 2013). Furthermore, the temperature and precipitation of the previous days have autoregressive
effects on pollen concentration (Silva-Palacios et al. 2016).
Modelling approaches for the main pollen season (MPS) (Galán et al. 2017) of Olea and other pollen types have
often addressed this issue by estimating phenological metrics (start and peak dates and length of the pollen
season) separately from pollen intensity parameters (annual amounts and daily concentrations)
(Fernandez-Rodriguez et al. 2016b). A number of works analysed the importance of accumulated temperature and
precipitation prior to the flowering stage (Rojo and Perez-Badia 2015), as well as thermal requirements based on
the chilling and forcing accumulated periods of the olive tree (Aguilera et al. 2014; Chuine et al. 1999;
Fraga et al. 2019; Orlandi et al. 2013). In addition, predictive models for forecasting daily pollen
concentrations were developed taking into account the meteorological variables and pollen concentration of the
preceding days, using statistical procedures ranging from linear regression models (Cotos-Yanez et al. 2004) to
neural network models (Iglesias-Otero et al. 2015). Also, historical time series of pollen data have been studied
to predict future pollen levels based on the inherent seasonal behaviour of the pollen season in conjunction with
current meteorological conditions (Rojo et al. 2017).
In this work, we use models that have not been implemented before in pollen prediction, as they are
state-of-the-art in machine learning technology. LightGBM (light gradient boosting machine) and ANNs (artificial
neural networks) are able to capture non-linearities and to learn complex relationships among the predictor
variables. Firstly, LightGBM and ANNs are applied to predict the peak of the MPS. Secondly, a supervised machine
learning ensembling process that involves two steps of generalized additive models (GAMs), followed by LightGBM on
one side and ANNs on the other, is proposed to predict the Olea pollen concentration. The phenology of the MPS was
accounted for using the day of the year (DOY) as numeric input to the models (start and peak dates). The
accumulated heat requirements necessary for olive flowering were addressed by introducing a forcing requirements
variable (state of forcing, Sf). The effect of the time series was accounted for by introducing lags of 3 and
5 days of Olea pollen concentrations. The measurement locations and their meteorology were also used as features.
The main objective of this work was to develop predictive models for the daily concentrations and the timing of
the pollen season of Olea pollen. The results of this forecasting model would allow the health authorities to
anticipate adequate measures to minimize the impact on the population.
Materials and methods
Pollen data
The PALINOCAM Network, which belongs to the Community of Madrid, has been measuring pollen concentrations since
1993, building an extensive database of daily measurements for 25 morphological pollen types. Among these pollen
types, Poaceae (= Gramineae) (grass), Olea (olive), Cupressaceae/Taxaceae (cypress/Arizonica), and Platanus
(London plane) are of special interest in the Mediterranean region due to their impact on human health and their
relative abundance in the air (D’Amato et al. 2007; Perez-Badia et al. 2010). Olive pollen is one of the most
important aeroallergens monitored by the PALINOCAM Network (Community of Madrid, central Spain).
The study includes the analysis of 22 years of data. The period 1997–2017 was employed to train the model, and the
remaining year, 2018, was used for external validation. It includes up to 10 pollen sampling points where Olea
pollen was measured (Fig. 1). These locations belong to the aerobiological network of Madrid (“PALINOCAM”, 2019).
Pollen was sampled using 10 Hirst-type volumetric spore traps according to the guidelines of the international
standardized methodological procedure agreed by the main scientific aerobiological organizations in Europe
(Galan et al. 2014). Daily pollen concentrations are expressed as the number of pollen grains per cubic meter of
air (pollen grains/m³).
The variables Olea_3_day and Olea_5_day were calculated as the daily pollen concentration 3 and 5 days before,
respectively. Taking them as lag predictors allows an autoregressive effect to be included in the time series.
Furthermore, to account for the seasonality component of the time series, the day of the year (DOY) was also
included as a feature (yearday), with 1 January as day one. This variable tells the model the moment of the year,
which is directly related to the strong seasonality of the pollen season. In this way, all components of classical
time series analysis (seasonality and trend) are accounted for in the fitted models. Note that the objective of
this work was to predict, some days in advance, whether a pollen peak is going to occur. The Olea pollen
concentration is available daily, so the concentrations of the preceding days can be accessed online and used in
operational predictive models.
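The construction of the lag and seasonality features described above can be sketched as follows. This is only a minimal illustration in plain Python, not the authors' code: the function name `build_lag_features`, the dictionary layout, and the sample values are assumptions made for the example, while the feature names Olea_3_day, Olea_5_day, and yearday come from the text.

```python
from datetime import date, timedelta


def build_lag_features(daily, lags=(3, 5)):
    """Build one feature row per day from a {date: concentration} mapping.

    Hypothetical helper: `daily` holds Olea concentrations in grains/m^3.
    """
    rows = []
    for day in sorted(daily):
        row = {
            "date": day,
            "yearday": day.timetuple().tm_yday,  # 1 January is day 1
            "Olea": daily[day],
        }
        for lag in lags:
            # None marks days for which the lagged observation is missing.
            row[f"Olea_{lag}_day"] = daily.get(day - timedelta(days=lag))
        rows.append(row)
    return rows


# Made-up concentrations for seven consecutive days starting on 1 May 2018.
start = date(2018, 5, 1)
series = {start + timedelta(days=i): float(v)
          for i, v in enumerate([0, 2, 5, 12, 30, 55, 80])}
features = build_lag_features(series)
```

Rows whose lagged value is `None` (the first days of a record, or gaps in sampling) would have to be dropped or imputed before fitting; the source does not say how such gaps were handled.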
Meteorological data
The meteorological data were obtained from the Open Data API of AEMET (“Open Data”, 2019). The meteorological
variables considered were maximum temperature (tmax), minimum temperature (tmin), average temperature (tmed),
wind speed (Ws), wind direction (Wd), and sunshine hours (Sun).
Phenological variables and thermal requirements
Several references define precisely the key dates of the MPS (Bastl et al. 2018; Galán et al. 2017;
Pfaar et al. 2018). The present study used an adaptation of Pfaar et al. (2018): the start_PS was taken as the
first DOY when the Olea pollen concentration was above 20 grains/m³, whereas the other phenological variables
(high_PS and peak_PS) were taken exactly as indicated in that reference. The start_PS was introduced in the models
as a binary variable yielding 1 once the MPS starts. The high_PS variable was also introduced as binary, yielding 1
when the daily pollen concentration was above 100 grains/m³. The same was done for peak_PS, corresponding to the
DOY when the maximum Olea pollen concentration is reached.
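The binary coding of the phenological variables can be illustrated with a short sketch. It is a simplified reading of the paragraph above, assuming one season of daily values per station; the helper name `phenology_flags` and the sample season are hypothetical, and the full definitions of Pfaar et al. (2018) may include conditions not reproduced here.

```python
def phenology_flags(conc, start_threshold=20.0, high_threshold=100.0):
    """Return start_PS, high_PS and peak_PS flags for one season of daily values.

    conc: daily Olea concentrations (grains/m^3), ordered by day.
    Thresholds follow the text (20 and 100 grains/m^3).
    """
    peak_index = max(range(len(conc)), key=lambda i: conc[i])
    started = False
    rows = []
    for i, value in enumerate(conc):
        # start_PS stays at 1 from the first day above the start threshold.
        started = started or value > start_threshold
        rows.append({
            "start_PS": int(started),
            "high_PS": int(value > high_threshold),
            "peak_PS": int(i == peak_index),
        })
    return rows


# Made-up season used only to exercise the function.
season = [0, 5, 25, 10, 120, 300, 90, 15]
flags = phenology_flags(season)
```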
Fig. 1 Location of the monitoring sites in the Community of Madrid (extent highlighted in red). Green triangles for the PALINOCAM pollen stations,
blue squares for meteorological stations

Modelling of the phenological variables was conducted using a thermal-based approach built on the state of forcing
heat (Sf). The Sf necessary to initiate bud growth has been widely applied to determine the start of the pollen
season (Chuine et al. 1998; Osborne et al. 2000; Picornell et al. 2019). Sf is expressed in forcing units, which
are non-dimensional, and can be understood as a mathematical transformation of heat (Chuine et al. 1999). The
following equations have been commonly used for calculating the state of forcing heat (Sf):

$$Sf = \begin{cases} 0, & T_{avg} \le T_b \\ \sum_{i=1}^{n} \left(T_{avg} - T_b\right), & T_{avg} > T_b \end{cases} \qquad (1)$$

$$Sf = \sum_{i=1}^{n} \frac{1}{1 + e^{d\,(T_{avg} - c)}} \qquad (2)$$

where n is the number of days of the year.
Here, d is a numeric parameter with negative values, T_avg is the daily average temperature, T_b is a base
temperature, and c is the temperature threshold. According to the equations above, a day when T_avg is below the
temperature threshold c makes a nearly null contribution to the sum of Sf. Conversely, if T_avg is above the
temperature threshold, that day makes a high contribution to Sf. The base temperature can be arbitrarily adjusted
(Chuine et al. 1998), i.e. taken as a fixed parameter; only the constants c and d would differ for different
values of T_b. Another method is to consider it as the temperature above which heat forcing starts affecting
pollen production (Osborne et al. 2000).
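The two forcing formulations can be written compactly in code. The sketch below is illustrative only: the function names and the temperature list are made up, and the parameter values (T_b = 18.4 °C, c = 18.89, d = −6.12) are the ones reported in this section. The thermal sum accumulates degrees above the base temperature, whereas the sigmoid form gives each day a contribution between 0 and 1.

```python
import math


def sf_thermal_sum(daily_tavg, t_base):
    """Eq. (1): accumulate the degrees by which T_avg exceeds the base temperature."""
    return sum(t - t_base for t in daily_tavg if t > t_base)


def sf_sigmoid(daily_tavg, c, d):
    """Eq. (2): each day contributes 1 / (1 + exp(d * (T_avg - c))) forcing units."""
    return sum(1.0 / (1.0 + math.exp(d * (t - c))) for t in daily_tavg)


# Made-up daily average temperatures (degrees C) for five days.
temps = [12.0, 16.0, 18.89, 21.0, 25.0]
sf1 = sf_thermal_sum(temps, t_base=18.4)
sf2 = sf_sigmoid(temps, c=18.89, d=-6.12)
```

With d negative, days well below c add almost nothing, a day exactly at c adds 0.5, and days well above c add close to one unit each.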
To fit the parameters T_b, d, and c, the following methodology was applied. Firstly, T_b was computed as the
average temperature during the week before the start of the pollen season (start_PS), over all locations and
years of the time series excluding the validation data (year 2018). The resulting T_b of 18.4 °C was a good proxy
for the range of temperatures in the week prior to flowering and anther development. Then, the initial Sf was
calculated according to Eq. (1).
In the next step, a non-linear least squares algorithm based on Gauss-Newton (Hartley 1961) was fitted to the data,
resulting in values for the parameters c and d of 18.89 and −6.12 °C, respectively. The mean absolute error (MAE)
between the Sf calculated with the fitted non-linear model (Eq. 2) and predictions (Eq. 1) was computed. Once the
Sf was computed, the mean Sf that marked the start_PS could be calculated for all stations and years. We will
refer to this Sf as the critic Sf, since by itself it can be used to estimate the day of the year (DOY) when the
pollen season starts (start_PS), and hence to predict the start of the pollen season at a given site from its past
conditions, taking into account only the daily average temperatures. In our case, as an average among locations
and years, it was 3.3 forcing units. The ability of Sf to mark the start of the pollen season provides valuable
information for the predictive models.
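As an illustration of how the critic Sf could be used on its own, the sketch below accumulates the sigmoid forcing of Eq. (2) day by day and returns the first DOY at which the reported threshold of 3.3 units is reached. The function name, the assumption that accumulation starts on 1 January, and the temperature series are hypothetical; the parameter values are those reported above.

```python
import math


def predict_start_doy(daily_tavg, critic_sf=3.3, c=18.89, d=-6.12):
    """Return the first DOY (1-based) at which accumulated Sf reaches critic_sf.

    daily_tavg: daily average temperatures from 1 January onwards (assumption).
    Returns None if the threshold is never reached.
    """
    total = 0.0
    for doy, t in enumerate(daily_tavg, start=1):
        total += 1.0 / (1.0 + math.exp(d * (t - c)))
        if total >= critic_sf:
            return doy
    return None


# Made-up year: 120 cold days followed by 30 warm days.
temps = [10.0] * 120 + [22.0] * 30
start = predict_start_doy(temps)
```

In this toy series each warm day adds almost exactly one unit, so the threshold is crossed on the fourth warm day.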
Site-dependent variable (locations)
We introduced a location feature as a categorical variable that takes 10 possible values (one for each pollen
station). It was then transformed into 10 binary dummy variables. This last step is necessary for models that
require numeric inputs (such as ANNs) but could be skipped for tree-based models, which can split directly on
categorical values. Although the models have been applied to the Community of Madrid, they can be easily
transferred to other locations and other pollen types: the user only has to code the new locations as dummy
variables and fit the proposed algorithm.
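The dummy coding of the location variable is a standard one-hot encoding. A minimal sketch in plain Python follows; the helper name `one_hot` is hypothetical, and the three station codes are taken from those mentioned later in the paper purely as sample values.

```python
def one_hot(values, categories=None):
    """Encode categorical values as lists of 0/1 dummy variables.

    Returns the ordered category list and one dummy row per input value.
    """
    categories = sorted(set(values)) if categories is None else list(categories)
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one dummy is set per observation
        encoded.append(row)
    return categories, encoded


# Sample station codes; a real dataset would carry one code per observation.
stations = ["alc", "get", "roz", "get"]
cats, dummies = one_hot(stations)
```

Passing an explicit `categories` list keeps the column order fixed between training and prediction data, which matters when a new dataset does not contain every station.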
Models
In this work, two separate lines of modelling algorithms were developed: (1) in the first (line 1), a LightGBM and
an ANN were used to predict the peak_PS DOYs; (2) in the second (line 2), an ensemble combining two preceding GAM
steps with a LightGBM and an ANN was employed to predict the Olea pollen concentration during the whole MPS. In
each case, using a separate LightGBM and ANN in parallel allowed the results of both models to be compared (Fig. 2).
The models shown in Fig. 2 are briefly described in the following paragraphs. The GAMs (Hastie 2018;
James et al. 2013) used as first layers in the calculation of the Olea pollen concentrations can be easily
approximated by polynomials. They are non-parametric, so no assumptions about the normality of the data have to be
made. In addition, they capture time trends well, and the user can fit a different function for each feature,
linear or non-linear, which makes them highly flexible. GAMs thus allow a more general model to be fitted than
linear models, and they have been found useful in other areas of atmospheric research (Borge et al. 2019). Splines
were chosen as the functions since they are non-linear smoothers.
The number of degrees of freedom is a hyper-parameter of spline-based GAMs. Therefore, a grid search was carried
out to optimize the number of degrees of freedom of the splines between 1 and 20, using the root mean squared
error (RMSE) as the cross-validation error score (with 10 folds). The R package caret was used for fitting GAMs
(Kuhn 2017). The resulting repeated cross-validation RMSE is shown in Fig. 3.
The optimum was not reached within the 20 degrees of freedom tested, and beyond that point the model complexity
becomes intractable. An intermediate value of 10 degrees of freedom was therefore set to avoid overfitting.
The LightGBM algorithm is by itself an ensemble method, since it combines several boosted trees to reach its
predictions. With this technique, as with ANNs, the focus was on improving the predictions at the expense of the
interpretability of the results, because both behave as black boxes.
For the ANNs, several architectures were analysed, and the one that performed best was chosen. This architecture
consisted of an input layer of 20 units, a hidden layer with 50 units, and one output layer. The activation
functions were, respectively, sigmoid, sigmoid, and linear, resulting from a grid search testing diverse
hyper-parameters. This algorithm was described several decades ago (Hopfield 1982), and its implementation using
TensorFlow 2.0 (released in September 2019) constitutes a novelty in a rapidly evolving technology.
LightGBM and ANNs are state-of-the-art machine learning models and are currently among the best and most used
algorithms (Kaggle 2019). Their application to the case of Olea pollen, in contrast to the more traditional
algorithms mentioned in the introduction, is another novelty of this work.
Data pre-processing and the calculation of the estimated variables related to the pollen season were performed
using R in RStudio (RStudio Team 2015) and the specific “AeRobiology” R package for aerobiological tasks
(Rojo et al. 2019). The LightGBM (Ke et al. 2017) was coded in Python in the Spyder IDE using the library of the
same name, whereas the ANNs were implemented in Python 3.7 using TensorFlow 2.0 (Abadi et al. 2016) with the Keras
API (Chollet 2015) in the cloud environment provided by Google Colaboratory running on GPU
(“Google Collaboratory”, 2019).
Models for the timing of pollen season in DOYs (line 1)
In this first step, only separate LightGBM and ANN models
were used (line 1). They take as features all the variables
shown in Fig. 2, both station and computed data.
Pollen data from the different locations were merged into one pooled dataset. Developing models for each specific
place would be expected to be more accurate, but less generally applicable, so a regional approach was preferred
even though the particular characteristics of each site were neglected, e.g. the wind direction may influence the
pollen at a given site if olive groves are located upwind.
Once the models were fitted, they were used to assess the peak_PS. The observed peak values were also computed
taking into consideration the observed Olea pollen maximum concentrations. The results were visualized by means of
box and whiskers plots and the differences between predicted and observed DOYs assessed by the RMSE, the mean
absolute error (MAE), and r² metrics. The results are shown in the “Predicting the peak dates and values (line 1)”
section.

[Figure 2: flow diagram. Input groups: Meteorology (tmed, tmin, tmax, Wd, Ws, Sun); Palynology/Phenology (Sf,
High_PS, Start_PS); Time series (Olea_3_day, Olea_5_day); Location (10 places). Line 1, peak_PS prediction models:
LightGBM and ANNs, each giving Peak_PS, followed by a comparison. Line 2, Olea pollen concentration models: GAM1
(first predicted) and GAM2 (second predicted), followed by LightGBM and ANNs, each giving the final Olea
concentration, followed by a comparison. Box annotations, in extraction order: r² = 0.71, AIC = 12522, time of
fitting = 10 sec; r² = 0.12, time of fitting = 35 min; r² = 0.46, time of fitting = 55 min; r² = 0.78, r²* = 0.53,
AIC = 11423, time of fitting = 1 min; r² = 0.81, r²* = 0.53, AIC = 10233, time of fitting = 20 sec; r² = 0.68,
AIC = 12654, time of fitting = 1 min]
Fig. 2 Scheme of the machine learning procedure showing the two lines followed: line 1 for standalone LightGBM and
ANN for predicting the peak_PS DOYs and line 2 showing the ensemble for predicting the Olea pollen concentration.
The r² (r²* for the final models applied alone) calculated in the test set (25% of the total data) is also shown

[Figure 3: axis label grains/m³]
Fig. 3 Repeated cross-validation RMSE vs. the number of degrees of freedom for spline-based GAMs. The RMSE was
calculated for each given degrees of freedom as an average of 10 folds

[Figure panels, one per pollen station: alc, alco, ara, bar, ciu, get, leg, ret, roz, vil; x-axis from April to August]
Models for the prediction of daily pollen concentrations (grains/m³) (line 2)
The utility behind ensemble techniques is that some models cov-
er the weaknesses of others, and reinforce their strengths.
Cordero et al. (2018) used a combination of multiple linear re-
gression and artificial neural networks (ANNs) to process the
signal of Air Quality sensors elsewhere. In this work, another
approach was followed using more sophisticated and state-of-
the-art algorithms.
To reach the optimal architecture, several combinations were tested to build an ensemble algorithm. The best
combination in terms of accuracy is schematized in Fig. 2 (line 2). It combines two generalized additive models
(GAMs) with LightGBM and ANNs. A third GAM did not improve the r² sufficiently and had a high computational cost,
so introducing more GAM steps was discarded.
The first GAM again takes as inputs all the available variables shown in Fig. 2, with the Olea pollen
concentrations as response variable. The Olea concentrations predicted by the first GAM were taken as an additional
variable (metavariable) and hence as an additional input to the second GAM. Note the improvement of the r²
(Fig. 2). The essence of ensembling is precisely that the second models have information about the performance of
the first models given the same set of conditions (variables). Once both GAM predictions were obtained
(metavariables), they were fed into the LightGBM and the ANNs along with the rest of the variables.
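The flow of metavariables through the ensemble can be sketched without any modelling library. In the example below, the three "models" are hypothetical stand-in functions with arbitrary coefficients, used only to show how each stage's prediction is appended as a new column for the next stage; in the paper, the first two stages are GAMs and the final stage is a LightGBM or an ANN.

```python
def add_metavariable(rows, model):
    """Append model(row) to every feature row, returning new rows."""
    return [row + [model(row)] for row in rows]


# Hypothetical stand-ins for fitted models (coefficients are arbitrary).
def gam1(row):
    # Stage 1 sees only the original features.
    return 0.5 * row[0]


def gam2(row):
    # Stage 2 also sees the stage-1 prediction (last column).
    return 0.3 * row[0] + 0.6 * row[-1]


def final_model(row):
    # The final model sees the original features and both metavariables.
    return 0.2 * row[0] + 0.3 * row[-2] + 0.5 * row[-1]


# Two made-up observations with two original features each.
features = [[10.0, 1.0], [40.0, 0.0]]
stage1 = add_metavariable(features, gam1)
stage2 = add_metavariable(stage1, gam2)
predictions = [final_model(row) for row in stage2]
```

The point of the sketch is the data flow: every later stage receives the same original variables plus the predictions made by the earlier stages.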
Validation of the models
All the models were validated internally and externally. For internal validation, a two-step method was used:
(i) the whole dataset was randomly split into train and test sets, accounting for 75 and 25% of the data,
respectively, which is common practice, and the statistics RMSE, MAE, and r² were calculated on the test dataset
(see Fig. 2); (ii) a 10-fold cross-validation was performed, and the results were expressed with the same
statistics and the corresponding standard deviation among folds. This two-step approach allowed us to check the
robustness of the models.
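The two internal validation steps amount to index bookkeeping, which the following sketch illustrates with the standard library only. The function names, the seed, and the dataset size of 200 rows are assumptions for the example; the 75/25 split and the 10 folds are the values given in the text.

```python
import random


def train_test_split(n_rows, test_fraction=0.25, seed=0):
    """Randomly split row indices into train and test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(n_rows, k=10, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


# Hypothetical dataset of 200 rows.
train_idx, test_idx = train_test_split(200)
folds = list(k_fold_indices(200, k=10))
```

Each fold serves once as validation data; averaging a metric over the folds and taking its standard deviation gives values of the form reported in Table 2.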
The external validation was performed by computing the r², RMSE, and MAE using the year 2018 for the ten
locations, i.e. independent data that were not included in the training of the model. We used data from April to
July 2018 (inclusive) for external validation. In other words, we restricted the external validation to the Olea
pollen season since it is the important period in this work (both observed and predicted values remain around zero
outside that period). The standard deviation among locations is also shown.
The RMSE (Eq. 3) is useful for providing information about large errors, since the differences between the
estimated and the observed values are squared before taking the average. On the other hand, the MAE (Eq. 4) weights
all errors in direct proportion to their size, so its interpretation is more straightforward. The combination of
both errors gives a complementary view of the errors of the models (both in grains/m³).
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} \qquad (3)$$
Fig. 4 Average pollen concentrations of Olea pollen for the 10 locations
studied. The orange shadow of colours represents the amplitude
(maximum-minimum) of daily values for the time series 1997–2018.
Aerobiology R package was used (Rojo et al. 2019)
Fig. 5 Observed Olea concentration time series (black line) and predicted values using LightGBM (blue line) or ANN (red line). The data has been
treated as a pool of all the observation stations using daily averages. [Plot axes: date (2000 to 2015) on the x-axis; Olea, grains/m³ (0 to 800) on the y-axis; legend: Reference Olea, LightGBM, ANN]
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right| \qquad (4)$$

where
N is the number of observations
y_i is the ith observed Olea concentration in grains/m³
ŷ_i is the ith predicted Olea concentration in grains/m³
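Both error metrics are short enough to be written directly from Eqs. 3 and 4. The sketch below uses made-up observed and predicted values and hypothetical function names; it illustrates how a single large error weighs more heavily on the RMSE than on the MAE.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error (Eq. 3), in the units of the observations."""
    n = len(observed)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error (Eq. 4), in the units of the observations."""
    n = len(observed)
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / n


# Made-up concentrations in grains/m^3; the third day carries a large error.
obs = [0.0, 20.0, 150.0, 60.0]
pred = [5.0, 10.0, 100.0, 60.0]
rmse_value = rmse(obs, pred)
mae_value = mae(obs, pred)
```

Here the MAE is 16.25 grains/m³, while the RMSE is noticeably larger because the 50 grains/m³ miss on the third day is squared before averaging.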
The Akaike Information Criterion (AIC) is commonly used to compare different models. It assesses the amount of
information that would be lost if a given model were used, taking into account the trade-off between the goodness
of fit and the penalty on the number of variables. AIC can be computed using Eq. 5 (Akaike 1974):

$$\mathrm{AIC} = N \cdot LL + 2k \qquad (5)$$

where
LL is the log-likelihood for the model using the natural logarithm (e.g. the log of the MSE)
k is the number of variables
The best model will show the lowest AIC.
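A minimal sketch of Eq. 5 follows, taking LL as the natural logarithm of the MSE as suggested in the text. The function name and the sample data are hypothetical, and the form is not defined when the MSE is exactly zero.

```python
import math


def aic_from_mse(observed, predicted, k):
    """AIC = N * ln(MSE) + 2k, following Eq. 5 with LL taken as ln(MSE)."""
    n = len(observed)
    mse = sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n
    return n * math.log(mse) + 2 * k


# Made-up values: the same predictions scored with 3 and with 5 variables.
obs = [0.0, 20.0, 150.0, 60.0]
pred = [5.0, 10.0, 100.0, 60.0]
aic_small = aic_from_mse(obs, pred, k=3)
aic_large = aic_from_mse(obs, pred, k=5)
```

With identical predictions, the model using two more variables is penalized by exactly 4 AIC units, which shows the trade-off between fit and number of variables described above.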
Finally, the feature importance could be assessed by means of the LightGBM package (Keras does not currently
provide functions to evaluate it) by plotting the features in descending order of number of splits. The number of
splits measures how many times each variable was used in the model. The results are given in the “Predicting the
daily pollen concentration (line 2)” section (Fig. 9).
Results and discussion
Figure 4 shows the time series of Olea pollen concentration measured at each study site in this work. Pollen
concentrations above zero were only detected within a short period of time, coinciding with the main pollen
season. This marked seasonality of pollen emission makes it difficult to develop predictive models.
Several pollen peaks can be seen in the pollen curve, but only the maximum for each year will be referred to here
as the pollen peak. The different peaks are probably related to staggered flowering due to the different locations
of the olive orchards or site-related features like altitude or other topographical features (Chuine et al. 1998;
Rojo and Perez-Badia 2014). In addition, the multiple peaks observed in the pollen curve could also be due to
factors involved in the pollen dispersion conditions (proximity to olive groves, or urban architecture close to
the traps) or sudden changes in meteorological conditions (wind conditions or other factors).

Fig. 6 Boxplots of the maximum observed Olea concentrations (peak) (a) and of the day of the year (DOY) when the maximum peak occurs (b) for the
ten pollen stations studied. The predictions from both models LightGBM and ANN are contrasted. [Plot axes: years 2000 to 2017 on the x-axis of both panels; Max Olea, grains/m³ (0 to 800) on the y-axis of (a); Peak_PS DOY (140 to 200) on the y-axis of (b)]

Table 1 Model results from the internal and external validation of peak predictions (line 1)
Ensemble   Validation   r²            RMSE, DOY     MAE, DOY
LightGBM   Internal     0.98          11.7          9.5
ANN        Internal     0.96          12.2          8.3
LightGBM   External*    0.74 ± 0.09   15.6 ± 10.8   9.2 ± 5.8
ANN        External*    0.69 ± 0.05   11.4 ± 8.6    8.6 ± 4.7
*The error metrics and their standard deviations are computed across the 10 locations

Table 2 Model results from the cross and external validation for the pollen concentration (line 2)
Ensemble   Validation   r²            RMSE, grains/m³   MAE, grains/m³
LightGBM   Cross        0.78 ± 0.05   29.27 ± 2.73      9.45 ± 2.55
ANN        Cross        0.71 ± 0.09   25.03 ± 2.48      9.35 ± 2.24
LightGBM   External*    0.63 ± 0.17   39.06 ± 19        14.85 ± 6.34
ANN        External*    0.56 ± 0.12   49.14 ± 22.17     29.56 ± 16.29
*The error metrics and their standard deviations are computed across the 10 locations
Predicting the peak dates and values (line 1)
In line 1, for predicting the peak_PS, an r² of around 0.70 was obtained for the test set. Figure 5 shows the
observed and predicted Olea concentration time series for both models, using all the data as a pool and daily
averages. The AICs of LightGBM and ANN are similar, but slightly smaller for the former model, indicating a
better fit.
It should be noted that the LightGBM and ANN models seem to capture the timing of the peaks of the pollen season
accurately, but they tend to underpredict the observed peak values. Figure 6a shows the comparison between
observed and estimated maximum peak concentrations, and Fig. 6b shows the comparison between observed and
estimated peak dates (peak_PS) for the pollen stations studied, using both the LightGBM and ANN models.
The models perform reasonably well at predicting the peaks, both peak pollen value and peak date (peak_PS)
(Fig. 6). The MAE obtained was within the same order of magnitude as that found by other authors
(Picornell et al. 2019). Table 1 shows the results of the internal validation for the peak_PS prediction, and the
corresponding values for the external validation (see the “Predicting the daily pollen concentration (line 2)”
section).
Regarding the phenological peak date, the predicted mean peak date (considering all pollen stations and years) was
149.1 ± 9.3 DOY for LightGBM and 150.1 ± 10.8 DOY for ANN, very close to the observed mean peak date
(148.8 ± 9.8 DOY).
Additionally, the critic Sf can be used by itself for predicting the start_PS. The critic Sf that minimized the
MAE (as explained in the “Phenological variables and thermal requirements” section) was 3.3 units, with a MAE of
1.9 Sf units (Sf is dimensionless). The start_PS obtained by using the critic Sf by itself was DOY 138.2 ± 14.0,
whereas the observed start_PS was DOY 131.2 ± 10.1; the two are reasonably similar, with a difference of 7 days
in the mean.
Predicting the daily pollen concentration (line 2)
In this section, the ensemble shown in Fig. 2 was trained and the results were assessed. Table 2 shows the results
from the 10-fold cross-validation as mean values followed by their standard deviation. The results are comparable
to, and of the same order of magnitude as, those of previous studies (Iglesias-Otero et al. 2015;
Lara et al. 2019). In addition, Table 2 shows the results from the external validation as mean values followed by
their standard deviation by location, for the year 2018, which had not been included in the training time series.
By algorithm, the r² values for LightGBM were slightly better than those for ANNs (Table 2), but its RMSE and MAE
were a little higher in the cross-validation and lower in the external validation. The calculated AICs follow the
same pattern as the r² and agree with the findings of the “Predicting the peak dates and values (line 1)” section.
In addition, we compared these r² values with that obtained
when a naïve model was applied, i.e. using the Olea pollen
concentration from the previous day as the predicted concentration.
The r² was 0.48 and the AIC 18,251, indicating that the simplicity
of having only one variable does not compensate for the loss of
accuracy. The poor accuracy of such a model stems from not
including the autoregressive component (in this study, the Olea
pollen lags) and a seasonality effect (included in this study
through the yearday variable). Note also that a large part of the
variance is induced by short-term meteorological effects.
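The naïve baseline described above can be sketched in a few lines. The series below is invented, r² is computed as the squared Pearson correlation, and the AIC uses the common Gaussian least-squares form n·ln(RSS/n) + 2k; both choices are assumptions, since this section does not state the exact formulas used, and the function names are hypothetical.

```python
import math


def persistence_forecast(series):
    """Predict each day's concentration as the previous day's value."""
    observed = series[1:]
    predicted = series[:-1]
    return observed, predicted


def r_squared(obs, pred):
    """Squared Pearson correlation between observations and predictions."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


def gaussian_aic(obs, pred, n_params):
    """AIC under Gaussian errors: n * ln(RSS / n) + 2 * k (assumed form)."""
    n = len(obs)
    rss = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return n * math.log(rss / n) + 2 * n_params


# Hypothetical daily Olea concentrations (grains/m3) for one short season.
series = [0, 4, 12, 30, 85, 140, 90, 60, 110, 45, 20, 8, 3]
obs, pred = persistence_forecast(series)
r2 = r_squared(obs, pred)
aic = gaussian_aic(obs, pred, n_params=1)
```

Because the AIC penalizes each additional parameter, a one-variable model is only preferred when its residual error is not much larger than that of the richer model.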
The results show that the ensembles were able to
predict the period when the MPS begins. In addition,
peak_PS is predicted very accurately, with the exception
of locations such as alc, get, or roz, where the models seem to
predict earlier and lower peaks. However, the models have some
difficulty capturing the exact moment when the different
peaks occur at some sites such as ciu or vil (Fig. 7). This result
may be due to tree-related variables not included in the
statistical model. Olive pollen biology is a very complex field
that is beyond the aims of this work (Chuine
et al. 1998). Other sources of variability, such as intrapopulation
phenological variability, may explain why the external
validation can be so complex (Rojo and Pérez-Badia 2015).
The regional model using the pooled dataset allows more general
forecasting models to be generated and hence better represents
the overall olive crops in the Community of Madrid.
Oteros et al. (2013) argued that local particularities could
improve the predictions for particular locations, although
the applicability of the model shrinks.
The models are therefore ready to predict Olea pollen
concentrations and the peak dates of the pollen season. They will
also constitute the basis for future studies of other pollen species.
Finally, some plots were made to help visualize the overall
results. Figure 8 shows the measured Olea concentration vs.
that calculated by means of the LightGBM ensemble (A) and
the ANN ensemble (B), whereas Fig. 9 shows the LightGBM
feature importance plot.
The most important features according to Fig. 9 are the
time series lags of the daily pollen concentrations, in
agreement with Lara et al. (2019). Next comes a block of temperature
features, including the Sf. The following variables
are wind and the predictions from both GAMs, which is
consistent with the methodology of including GAMs in the
proposed ensemble. The features start_PS and peak_PS may
seem unimportant; however, they are included indirectly in
the variable yearday, which indicates that phenological data
are relevant to the model. The site features are the least
important ones, suggesting that the choice of a regional
model was appropriate.
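To make the role of the lag and yearday features concrete, the following sketch builds them from a daily series. It assumes that feature names such as Olea_3_day and Olea_5_day (as printed in Fig. 9) denote the concentration measured 3 and 5 days earlier; they could equally be multi-day averages, and the function name `build_lag_rows` and the data are hypothetical.

```python
from datetime import date, timedelta


def build_lag_rows(start_day, concentrations, lags=(3, 5)):
    """Build one feature row per day that has all requested lags available."""
    rows = []
    max_lag = max(lags)
    for i in range(max_lag, len(concentrations)):
        day = start_day + timedelta(days=i)
        row = {
            "yearday": day.timetuple().tm_yday,  # seasonality proxy (DOY)
            "target": concentrations[i],         # concentration to predict
        }
        for k in lags:
            # Assumed meaning: concentration measured k days earlier.
            row[f"Olea_{k}_day"] = concentrations[i - k]
        rows.append(row)
    return rows


# Hypothetical daily concentrations (grains/m3) starting on 1 May 2018.
rows = build_lag_rows(date(2018, 5, 1), [2, 5, 9, 20, 44, 80, 130, 95])
```

The first days of the series are dropped because their lagged values are not available, which is the usual cost of autoregressive features.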
Int J Biometeorol
[Fig. 7 panels (stations): ara, bar, alco, vil, alc, ciu, get, leg, ret, roz. x-axis: date (May to July); y-axis: pollen grains/m3 (0 to 600). R2 values printed in the panels, in extraction order: ara/bar 0.59, 0.51, 0.70, 0.66; alco/vil 0.35, 0.57, 0.76, 0.49; alc/ciu 0.24, 0.22, 0.64, 0.57; get/leg 0.71, 0.57, 0.60, 0.56; ret/roz 0.71, 0.67, 0.62, 0.57]
The results are quite accurate according to the 10-fold
cross-validation and the internal validations. It is worth
remarking that the chosen ANN architecture seems to
underpredict the peaks. Further research on this issue is
needed in order to improve the performance of the ANNs.
Nevertheless, the models developed improve the predictions of
both peak dates and Olea pollen concentrations with respect to
simpler models, such as predictions based on the previous
day's measurements or the use of the individual models from the
ensemble alone.
Conclusions
In this work, the prediction of the Olea pollen season in terms
of phenological timing (peak dates) and pollen intensity (daily
pollen concentrations) was combined in a regional prediction
for the Community of Madrid (central Spain). In a first step,
LightGBM and ANNs were applied directly to predict the
peak_PS DOYs. The predictions proved to be very accurate
for the peak date and peak values. In a second step, a more
complex ensemble involving two GAM steps prior to the
LightGBM/ANNs proved to be very effective in predicting the
entire time series of Olea pollen concentrations.
Predicting the peaks of the pollen season proved to be a
challenging issue because of the difficulty of integrating
the measured biological and environmental characteristics
at the local scale in statistical predictive models. More
studies must be conducted to develop new predictive
variables accounting for these features that can help improve
model performance. However, the high accuracies already
obtained make the models suitable to assess both the
period and the concentrations of the Olea pollen season, and
they could be extended to other allergenic species in the future.
This may help health authorities to prevent allergic diseases
using predictors such as pollen concentrations for preceding
days or phenological variables, which could be estimated in
advance from meteorological factors.
Fig. 7 Time series of observed Olea concentration (black line) and the
predictions of the ensembles LightGBM (red line) and ANN (blue line).
The predicted peak_PS from the models of line 1 are displayed as vertical
lines
Fig. 8 Prediction of Olea pollen concentration (estimated vs. observed) using LightGBM (a, r2 = 0.81), ANN (b, r2 = 0.78) and the naïve model (c, r2 = 0.48). Axes: observed Olea, grains/m3 vs. model estimate, grains/m3
Acknowledgements This study was carried out within the AIRTEC-CM
(urban air quality and climate change integral assessment) scientific pro-
gramme funded by the Directorate General for Research and Innovation
of the Greater Madrid Region (S2018/EMT-4329). The State
Meteorological Agency (AEMET) as well as the PALINOCAM
Network are acknowledged for providing meteorological and palynolog-
ical observations.
Funding This study was carried out within the AIRTEC-CM (urban air
quality and climate change integral assessment) scientific programme
funded by the Directorate General for Research and Innovation of the
Greater Madrid Region (S2018/EMT-4329).
Data availability The sources of data are publicly available on-line: me-
teorological data from AEMET Open Data and pollen data from the
PALINOCAM Network.
Compliance with ethical standards
Conflict of interest The authors declare that they have no competing
interests.
Ethics approval Not applicable.
Consent to participate Not applicable.
Consent for publication Not applicable.
Code availability The code for this work was custom made using free
Open Source libraries from R/Python.
References
Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M,
Ghemawat S, Irving G, Isard M, Kudlur M, Levenberg J, Monga
R, Moore S, Murray DG, Steiner B, Tucker P, Vasudevan V,
Warden P, Wicke M, Yu Y, Zheng X (2016) TensorFlow: a system
for large-scale machine learning, in: 12th USENIX Symposium on
Operating Systems Design and Implementation (OSDI 16). pp.
265–283
Aguilera F, Ruiz Valenzuela L (2012) Altitudinal fluctuations in the olive
pollen emission: an approximation from the olive groves of the
south-east Iberian Peninsula. Aerobiologia (Bologna) 28:403–411.
https://doi.org/10.1007/s10453-011-9244-9
Aguilera F, Ruiz L, Fornaciari M, Romano B, Galan C, Oteros J, Ben
Dhiab A, Msallem M, Orlandi F (2014) Heat accumulation period in
the Mediterranean region: phenological response of the olive in
different climate areas (Spain, Italy and Tunisia). Int J Biometeorol
58:867–876. https://doi.org/10.1007/s00484-013-0666-7
Akaike H (1974) A new look at the statistical model identification. IEEE
Trans Automat Contr 19(6):716–723
Balbus JM, Greenblatt JB, Chari R, Millstein D, Ebi KL (2014) A wedge-
based approach to estimating health co-benefits of climate change
mitigation activities in the United States. Clim Chang 127:199–210.
https://doi.org/10.1007/s10584-014-1262-5
Bastl K, Kmenta M, Berger UE (2018) Defining pollen seasons: back-
ground and recommendations. Curr Allergy Asthma Rep 18:73.
https://doi.org/10.1007/s11882-018-0829-z
Beggs PJ, Sikoparija B, Smith M (2017) Aerobiology in the International
Journal of Biometeorology, 1957-2017. Int J Biometeorol 61:S51–
S58. https://doi.org/10.1007/s00484-017-1374-5
Borge R, Requia WJ, Yague C, Jhun I, Koutrakis P (2019) Impact of
weather changes on air quality and related mortality in Spain over a
25 year period [1993-2017]. Environ Int 133:105272.
https://doi.org/10.1016/j.envint.2019.105272
Burnett R, Chen H, Szyszkowicz M, Fann N, Hubbell B, Pope CA, Apte
JS, Brauer M, Cohen A, Weichenthal S, Coggins J, Di Q,
Brunekreef B, Frostad J, Lim SS, Kan H, Walker KD, Thurston
GD, Hayes RB, Lim CC, Turner MC, Jerrett M, Krewski D,
Gapstur SM, Diver WR, Ostro B, Goldberg D, Crouse DL, Martin
RV, Peters P, Pinault L, Tjepkema M, van Donkelaar A, Villeneuve
PJ, Miller AB, Yin P, Zhou M, Wang L, Janssen NAH, Marra M,
Atkinson RW, Tsang H, Quoc Thach T, Cannon JB, Allen RT, Hart
JE, Laden F, Cesaroni G, Forastiere F, Weinmayr G, Jaensch A,
Nagel G, Concin H, Spadaro JV (2018) Global estimates of
mortality associated with long-term exposure to outdoor fine
particulate matter. Proc Natl Acad Sci 115:9592–9597.
https://doi.org/10.1073/pnas.1803222115
Fig. 9 Feature importance plot using the package LightGBM taking the number of splits as an indicator. Features as listed from top to bottom: Olea_3_day, Olea_5_day, Yearday, tmed, Sun, tmax, Sf, tmin, Wd, GAM1, GAM2, Ws, Id_ret, Month, Id_alco, High_PS, Id_get, Id_alc, Id_leg, Start_PS, Id_bar, Id_roz; x-axis: Splits
Chollet F (2015) Keras
Chuine I, Cour P, Rousseau DD (1998) Fitting models predicting dates of
flowering of temperate-zone trees using simulated annealing. Plant
Cell Environ 21:455–466. https://doi.org/10.1046/j.1365-3040.
1998.00299.x
Chuine I, Cour P, Rousseau DD (1999) Selecting models to predict the
timing of flowering of temperate trees: implications for tree phenol-
ogy modelling. Plant Cell Environ 22:1–13. https://doi.org/10.1046/
j.1365-3040.1999.00395.x
Cole-Hunter T, de Nazelle A, Donaire-Gonzalez D, Kubesch N, Carrasco-
Turigas G, Matt F, Foraster M, Martinez T, Ambros A, Cirach M,
Martinez D, Belmonte J, Nieuwenhuijsen M (2018) Estimated effects
of air pollution and space-time-activity on cardiopulmonary outcomes in
healthy adults: a repeated measures study. Environ Int 111:247–259.
https://doi.org/10.1016/j.envint.2017.11.024
Cordero JM, Borge R, Narros A (2018) Using statistical methods to carry
out in field calibrations of low cost air quality sensors. Sensors
Actuators B Chem 267:245–254. https://doi.org/10.1016/j.snb.
2018.04.021
Cotos-Yanez TR, Rodriguez-Rajo FJ, Jato MV (2004) Short-term predic-
tion of Betula airborne pollen concentration in Vigo (NW Spain)
using logistic additive models and partially linear models. Int J
Biometeorol 48:179–185. https://doi.org/10.1007/s00484-004-
0203-9
D’Amato G, Cecchi L, Bonini S, Nunes C, Annesi-Maesano I, Behrendt
H, Liccardi G, Popov T, van Cauwenberge P (2007) Allergenic
pollen and pollen allergy in Europe. Allergy 62(9):976–990.
https://doi.org/10.1111/j.1398-9995.2007.01393.x
D’Amato G, Holgate ST, Pawankar R, Ledford DK, Cecchi L, Al-Ahmad
M, Al-Enezi F, Al-Muhsen S, Ansotegui I, Baena-Cagnani CE,
Baker DJ, Bayram H, Bergmann KC, Boulet LP, Buters JTM,
D’Amato M, Dorsano S, Douwes J, Finlay SE, Garrasi D, Gómez
M, Haahtela T, Halwani R, Hassani Y, Mahboub B, Marks G,
Michelozzi P, Montagni M, Nunes C, Oh JJW, Popov TA,
Portnoy J, Ridolo E, Rosário N, Rottem M, Sánchez-Borges M,
Sibanda E, Sienra-Monge JJ, Vitale C, Annesi-Maesano I (2015)
Meteorological conditions, climate change, new emerging factors,
and asthma and related allergic disorders. A statement of the World
Allergy Organization. World Allergy Organ J 8:25. https://doi.org/
10.1186/s40413-015-0073-0
Diaz J, Linares C, Tobias A (2007) Short-term effects of pollen species on
hospital admissions in the city of Madrid in terms of specific causes
and age. Aerobiologia (Bologna). 23:231–238. https://doi.org/10.
1007/s10453-007-9067-x
Directive 2008/50/EC of the European Parliament and of the Council of
21 May 2008 on ambient air quality and cleaner air for Europe,
2008. Official Journal of the European Communities
EEA (2019a) European Environment Agency (EEA), 2018a. Air quality
in Europe - 2018 report. EEA Report No 12/2018
EEA (2019b) European Environment Agency (EEA), 2018b. Improving
Europe’s air quality - measures reported by countries, EEA
briefing. ISSN 2467–3196
Fernandez-Rodriguez S, Duran-Barroso P, Silva-Palacios I, Tormo-
Molina R, Maria Maya-Manzano J, Gonzalo-Garijo A (2016a)
Quercus long-term pollen season trends in the southwest of the
Iberian Peninsula. Process Saf Environ Prot 101:152–159. https://
doi.org/10.1016/j.psep.2015.11.008
Fernandez-Rodriguez S, Duran-Barroso P, Silva-Palacios I, Tormo-
Molina R, Maria Maya-Manzano J, Gonzalo-Garijo A (2016b)
Regional forecast model for the Olea pollen season in
Extremadura (SW Spain). Int J Biometeorol 60:1509–1517.
https://doi.org/10.1007/s00484-016-1141-z
Fraga H, Pinto JG, Santos JA (2019) Climate change projections for
chilling and heat forcing conditions in European vineyards and olive
orchards: a multi-model assessment. Clim Chang 152:179–193.
https://doi.org/10.1007/s10584-018-2337-5
Galan I, Prieto A, Rubio M, Herrero T, Cervigon P, Luis Cantero J,
Dolores Gurbindo M, Isabel Martinez M, Tobias A (2010)
Association between airborne pollen and epidemic asthma in
Madrid, Spain: a case-control study. Thorax 65:398–402. https://
doi.org/10.1136/thx.2009.118992
Galan C, Smith M, Thibaudon M, Frenguelli G, Oteros J, Gehrig R,
Berger U, Clot B, Brandao R, Grp EASQCW (2014) Pollen moni-
toring: minimum requirements and reproducibility of analysis.
Aerobiologia (Bologna) 30:385–395. https://doi.org/10.1007/
s10453-014-9335-5
Galán C, Ariatti A, Bonini M, Clot B, Crouzy B, Dahl A, Fernandez-
González D, Frenguelli G, Gehrig R, Isard S, Levetin E, Li DW,
Mandrioli P, Rogers CA, Thibaudon M, Sauliene I, Skjoth C, Smith
M, Sofiev M (2017) Recommended terminology for aerobiological
studies. Aerobiologia (Bologna) 33:293–295. https://doi.org/10.
1007/s10453-017-9496-0
Google Collaboratory [WWW Document], 2019
Hartley HO (1961) The modified Gauss-Newton method for the fitting of
non-linear regression functions by least squares. Technometrics 3:
269–280. https://doi.org/10.1080/00401706.1961.10489945
Hastie T (2018) Gam: generalized additive models
Hopfield JJ (1982) Neural networks and physical systems with
emergent collective computational abilities. Proc Natl Acad Sci
USA 79:2554–2558. https://doi.org/10.1073/pnas.79.8.2554
Iglesias-Otero MA, Astray G, Vara A, Galvez JF, Mejuto JC, Rodriguez-
Rajo FJ (2015) Forecasting OLEA airborne pollen concentration by
means of artificial intelligence. Fresenius Environ Bull 24:4574–
4580
James G, Witten D, Hastie T, Tibshirani R (2013) An introduction to
statistical learning: with applications in R. Springer New York.
http://books.google.es/books?id=qcI_AAAAQBAJ
Kaggle (2019) http://www.kaggle.com/. Accessed 2019
Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T-Y
(2017) LightGBM: a highly efficient gradient boosting decision tree.
In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R,
Vishwanathan S, Garnett R (eds) Advances in Neural Information
Processing Systems 30 (NIPS 2017)
Kuhn M (2017) Caret: classification and regression training
Lake IR, Jones NR, Agnew M, Goodess CM, Giorgi F, Hamaoui-Laguel
L, Semenov MA, Solomon F, Storkey J, Vautard R, Epstein MM
(2017) Climate change and future pollen allergy in Europe. Environ
Health Perspect 125:385–391. https://doi.org/10.1289/EHP173
Lara B, Rojo J, Fernández-González F, Pérez-Badia R (2019) Prediction
of airborne pollen concentrations for the plane tree as a tool for
evaluating allergy risk in urban green areas. Landsc Urban Plan
189:285–295. https://doi.org/10.1016/j.landurbplan.2019.05.002
Lelieveld J, Evans JS, Fnais M, Giannadaki D, Pozzer A (2015) The
contribution of outdoor air pollution sources to premature mortality
on a global scale. Nature 525, 367+. https://doi.org/10.1038/
nature15371
Open Data [WWW Document], 2019
Orlandi F, Garcia-Mozo H, Ben Dhiab A, Galan C, Msallem M, Romano
B, Abichou M, Dominguez-Vilches E, Fornaciari M (2013)
Climatic indices in the interpretation of the phenological phases of
the olive in Mediterranean areas during its biological cycle. Clim
Chang 116:263–284. https://doi.org/10.1007/s10584-012-0474-9
Osborne CP, Chuine I, Viner D, Woodward FI (2000) Olive phenology
as a sensitive indicator of future climatic warming in the
Mediterranean. Plant Cell Environ 23:701–710.
https://doi.org/10.1046/j.1365-3040.2000.00584.x
Oteros J, Garcia-Mozo H, Vazquez L, Mestre A, Dominguez-Vilches E,
Galan C (2013) Modelling olive phenological response to weather
and topography. Agric Ecosyst Environ 179:62–68. https://doi.org/
10.1016/j.agee.2013.07.008
PALINOCAM [WWW Document], 2019
Pawankar R, Canonica GW, Holgate ST, Lockey RF, Blaiss M (2013)
World Allergy Organisation (WAO) white book on allergy: update
2013. World Allergy Organization, Milwaukee
Perez-Badia R, Rapp A, Morales C, Sardinero S, Galan C, Garcia-Mozo
H (2010) Pollen spectrum and risk of pollen allergy in central Spain.
AAEM 17(1):139–151
Perez-Badia R, Bouso V, Rojo J, Vaquero C, Sabariego S (2013)
Dynamics and behaviour of airborne Quercus pollen in central
Iberian Peninsula. Aerobiologia (Bologna) 29:419–428. https://doi.
org/10.1007/s10453-013-9294-2
Pfaar O, Bastl K, Berger U, Buters J, Calderon MA, Clot B, Darsow U,
Demoly P, Durham SR, Galán C, Gehrig R, van Wijk RG,
Jacobsen L, Klimek L, Sofiev M, Thibaudon M, Bergmann KC
(2018) Defining pollen exposure times for clinical trials of allergen
immunotherapy for pollen-induced rhinoconjunctivitis - an EAACI
position paper. Allergologie 41:386–399. https://doi.org/10.5414/
ALX02053
Picornell A, Buters J, Rojo J, Traidl-Hoffmann C, Damialis A, Menzel A,
Bergmann KC, Werchan M, Schmidt-Weber C, Oteros J (2019)
Predicting the start, peak and end of the Betula pollen season in
Bavaria, Germany. Sci Total Environ 690:1299–1309. https://doi.
org/10.1016/j.scitotenv.2019.06.485
Rojo J, Perez-Badia R (2014) Effects of topography and crown-exposure
on olive tree phenology. Trees-Structure Funct 28:449–459. https://
doi.org/10.1007/s00468-013-0962-1
Rojo J, Perez-Badia R (2015) Models for forecasting the flowering of
Cornicabra olive groves. Int J Biometeorol 59:1547–1556. https://
doi.org/10.1007/s00484-015-0961-6
Rojo J, Salido P, Perez-Badia R (2015) Flower and pollen production in
the ‘Cornicabra’ olive (Olea europaea L.) cultivar and the influence
of environmental factors. Trees-Structure Funct 29:1235–1245.
https://doi.org/10.1007/s00468-015-1203-6
Rojo J, Rivero R, Romero-Morte J, Fernandez-Gonzalez F, Perez-Badia
R (2017) Modeling pollen time series using seasonal-trend decom-
position procedure based on LOESS smoothing. Int J Biometeorol
61:335–348. https://doi.org/10.1007/s00484-016-1215-y
Rojo J, Picornell A, Oteros J (2019) AeRobiology: the computational tool
for biological data in the air. Methods Ecol Evol
RStudio Team (2015) RStudio: integrated development environment for
R. http://www.rstudio.com/
Šantl-Temkiv T, Sikoparija B, Maki T, Carotenuto F, Amato P, Yao M,
Morris CE, Schnell R, Jaenicke R, Pöhlker C, DeMott PJ, Hill TCJ,
Huffman JA (2019) Bioaerosol field measurements: challenges and
perspectives in outdoor studies. Aerosol Sci Technol 54:1–41.
https://doi.org/10.1080/02786826.2019.1676395
Silva-Palacios I, Fernandez-Rodriguez S, Duran-Barroso P, Tormo-
Molina R, Maria Maya-Manzano J, Gonzalo-Garijo A (2016)
Temporal modelling and forecasting of the airborne pollen of
Cupressaceae on the southwestern Iberian Peninsula. Int J
Biometeorol 60:297–306. https://doi.org/10.1007/s00484-015-
1026-6
Thien F, Beggs PJ, Csutoros D, Darvall J, Hew M, Davies JM, Bardin
PG, Bannister T, Barnes S, Bellomo R, Byrne T, Casamento A,
Conron M, Cross A, Crosswell A, Douglass JA, Durie M, Dyett J,
Ebert E, Erbas B, French C, Gelbart B, Gillman A, Harun N-S,
Huete A, Irving L, Karalapillai D, Ku D, Lachapelle P, Langton
D, Lee J, Looker C, MacIsaac C, McCaffrey J, McDonald CF,
McGain F, Newbigin E, O’Hehir R, Pilcher D, Prasad S,
Rangamuwa K, Ruane L, Sarode V, Silver JD, Southcott AM,
Subramaniam A, Suphioglu C, Susanto NH, Sutherland MF, Taori
G, Taylor P, Torre P, Vetro J, Wigmore G, Young AC, Guest C
(2018) The Melbourne epidemic thunderstorm asthma event 2016:
an investigation of environmental triggers, effect on health services,
and patient risk factors. Lancet Planet Heal 2:e255–e263. https://doi.
org/10.1016/S2542-5196(18)30120-7
World Health Organization [WWW Document], 2019
Xie Y, Dai H, Xu X, Fujimori S, Hasegawa T, Yi K, Masui T, Kurata G
(2018) Co-benefits of climate mitigation on air quality and human
health in Asian countries. Environ Int 119:309–318. https://doi.org/
10.1016/j.envint.2018.07.008
Publisher’s note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.