
Improved Extreme Rainfall Events Forecasting Using Neural Networks and Water Vapor Measures


Matteo Sangiorgio1, Stefano Barindelli2, Riccardo Biondi3, Enrico Solazzo2, Eugenio Realini4, Giovanna Venuti2, and Giorgio Guariso1
1 Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Milan, Italy
2 Department of Civil and Environmental Engineering, Politecnico di Milano, Milan, Italy
3 Department of Geosciences, Università degli Studi di Padova, Padova, Italy
4 Geomatics Research & Development srl (GReD), Lomazzo (CO), Italy
Abstract. In the last few years, many studies have claimed that machine learning tools would soon outperform the classical conceptual models in extreme rainfall event forecasting. To investigate this claim, we implement advanced deep learning predictors, namely deep neural networks, to forecast the occurrence of extreme rainfall events. These predictors are shown to outperform simpler models such as logistic regression, which is traditionally used as a benchmark for these tasks. We also evaluate the value of the information provided by the Zenith Tropospheric Delay, and show that adding this variable to the traditional meteorological data improves the model accuracy by about 3-4 %. As a case study, we consider an area composed of the catchments of four rivers (Lambro, Seveso, Groane, and Olona) in the Lombardy region, northern Italy, just upstream of the metropolitan area of Milan. Data on convective extreme rainfall events from 2010 to 2017 (more than 600 extreme events) have been used to identify and test the predictors.
Keywords: Nowcasting, Extreme Rain Events, Deep Neural Networks, Global
Navigation Satellite System, Zenith Tropospheric Delay.
1 Introduction
Many researchers in the field of meteorology claim that machine learning techniques will soon outperform the traditional physically based models in weather forecasting.
Also, black box models seem to be well suited for real-time applications, since they require less computational effort, and are therefore faster, than the traditional meteorological nowcasting methodologies, which rely on physically based models.
In particular, extreme events are very difficult to predict with classical Numerical Weather Prediction (NWP) models because they usually affect very small and localized areas and the convection is triggered by peculiar and local conditions, requiring both high-resolution NWP and high temporal and spatial resolution observations.
In this work, we deal with the problem of forecasting the occurrence of extreme local rainfall events 30 minutes ahead.
The considered area, located in the Lombardy region, northern Italy, is composed of the hydrological basins of four torrential rivers (Lambro, Seveso, Groane, and Olona). This is a high-risk territory due to the high frequency of short, severe thunderstorms, which usually trigger flash floods. The situation is even more critical due to the presence of the metropolitan area of Milan, into which the flows coming from the four considered rivers drain, causing severe damage. In 2014, for instance, floods caused damage estimated at several million euros in the Milan municipality.
In this work, we adopt Deep Neural Networks (DNNs hereafter), advanced machine learning tools which receive as input some meteorological variables sampled inside and around the study area and return as output the prediction of the occurrence of an extreme event.
In addition to the classical meteorological variables (temperature, pressure, humidity, wind speed), we also include the Zenith Tropospheric Delay (ZTD), which seems promising since it is a proxy of water vapor in the atmosphere, a fundamental variable in the genesis of rain events [1] [2] [3] [4].
This represents a novel element of this research, since it is one of the first attempts to use the ZTD in a black box model for the prediction of severe storms [5] [6]. We quantify the impact of the ZTD by repeating the task twice: the first time without the ZTD, the second time including it among the model inputs.
Developing a black box model for this environmental problem could become an
innovative nowcasting product exploitable also by Civil Protection Agencies to face
This work is part of the Lombardy based Advanced Meteorological Predictions and
Observations (LAMPO) project (
2 Methods
2.1 Extreme Event Definition
The objective of this work is to identify machine learning models able to forecast
the occurrence of extreme rainfall events 30 minutes ahead.
We consider a rainfall event as extreme if it persists for more than 25 minutes
within the study area and if the radar reflectivity factor is greater than 50 dBZ.
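The definition above can be written as a simple labeling rule. The sketch below is only an illustration: the function name and arguments are hypothetical, and the text does not say whether the 50 dBZ threshold applies to the peak or to the sustained reflectivity, so the peak value is assumed here.

```python
def is_extreme_event(duration_minutes: float,
                     peak_reflectivity_dbz: float,
                     min_duration_minutes: float = 25.0,
                     reflectivity_threshold_dbz: float = 50.0) -> bool:
    """Label a rainfall event as extreme.

    Assumption: the reflectivity condition is checked on the peak value
    observed during the event; the paper does not specify this detail.
    Both comparisons are strict ("more than", "greater than").
    """
    lasts_long_enough = duration_minutes > min_duration_minutes
    is_intense_enough = peak_reflectivity_dbz > reflectivity_threshold_dbz
    return lasts_long_enough and is_intense_enough


# Toy events (invented values, not data from the study).
print(is_extreme_event(duration_minutes=30, peak_reflectivity_dbz=55))
print(is_extreme_event(duration_minutes=20, peak_reflectivity_dbz=55))
```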
2.2 Machine Learning Models
The task we are dealing with is a binary classification task.
As is well known, when developing machine learning tools it is important to start with simple models, which serve as a benchmark for more complex (and hopefully better-performing) ones. In this case, we adopted a logistic regression (see Fig. 1) as the baseline model, using the Python implementation provided by the Scikit-learn library [7].
The logistic regression is a linear classifier: it splits the feature space (which in this case is high-dimensional) with a hyperplane and classifies each sample according to the side of this linear decision boundary on which it falls.
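To make the linear decision boundary concrete, the following minimal sketch evaluates a logistic model by hand. It does not reproduce the Scikit-learn implementation used in the study; the weights, bias, and feature values are invented for illustration.

```python
import math
from typing import Sequence


def logistic_probability(weights: Sequence[float], bias: float,
                         features: Sequence[float]) -> float:
    """Probability of the positive class: sigmoid of a linear score."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def classify(weights: Sequence[float], bias: float,
             features: Sequence[float], cutoff: float = 0.5) -> bool:
    """Predict the positive class when the probability reaches the cutoff.

    With cutoff 0.5 this is equivalent to checking whether the linear score
    is non-negative, i.e. on which side of the hyperplane the sample lies.
    """
    return logistic_probability(weights, bias, features) >= cutoff


# Hypothetical two-feature model (not fitted to any real data).
weights, bias = [2.0, -1.0], -0.5
print(classify(weights, bias, [1.0, 0.5]))  # linear score = 1.0
print(classify(weights, bias, [0.0, 1.0]))  # linear score = -1.5
```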
Given the complexity of almost all the real-world applications, it is unlikely that
the decision boundary is actually a linear one. For this reason, we introduced a more
advanced machine learning model which can efficiently deal with problems where
classes are not linearly separable: a DNN [8] (see Fig. 1).
The deep neural network here considered has a traditional fully connected structure
[9] and has been implemented in Keras [10] with TensorFlow backend.
Fig. 1. Representation of the considered model's architectures.
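As a rough illustration of what a fully connected network computes, the sketch below runs a forward pass in plain Python. It is not the Keras model used in the study. The layout (three hidden layers of ten neurons) follows the configuration reported in the Results section, while the number of inputs, the ReLU hidden activation, and the random weights are assumptions made only for this example.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights (n_out rows of n_in values) and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(inputs, weights, biases):
    """Affine map of a fully connected layer."""
    return [b + sum(w * x for w, x in zip(row, inputs))
            for row, b in zip(weights, biases)]


def build_network(n_inputs, hidden_sizes, rng):
    """One (weights, biases) pair per layer, ending with a single output unit."""
    sizes = [n_inputs, *hidden_sizes, 1]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def predict_probability(network, features):
    """ReLU on the hidden layers (assumed), sigmoid on the output unit."""
    activations = list(features)
    for weights, biases in network[:-1]:
        activations = [max(0.0, v) for v in dense(activations, weights, biases)]
    weights, biases = network[-1]
    score = dense(activations, weights, biases)[0]
    return 1.0 / (1.0 + math.exp(-score))


def count_parameters(network):
    """Total number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in network)


rng = random.Random(0)
# n_inputs=5 is a hypothetical input size, not the one used in the paper.
network = build_network(n_inputs=5, hidden_sizes=[10, 10, 10], rng=rng)
probability = predict_probability(network, [0.1, -0.3, 0.7, 0.0, 1.2])
print(count_parameters(network), probability)
```

Even such a small network has a few hundred trainable parameters, which is why the regularization described later in this section matters.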
To find the best combination of hyper-parameter values (learning rate, batch size, regularization rate, shape of the activation functions, number of hidden layers, number of neurons for each layer, class weights), we implemented a traditional grid search approach.
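A grid search simply evaluates every combination of candidate values and keeps the best one. The sketch below shows the idea; the grid values and the scoring function are placeholders invented for the example, whereas in practice the score would be the validation accuracy of a model trained with those hyper-parameters.

```python
from itertools import product


def grid_search(grid, evaluate):
    """Evaluate every combination in `grid` and return the best one.

    `grid` maps each hyper-parameter name to a list of candidate values;
    `evaluate` returns a score to maximize (e.g. validation accuracy).
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid: these candidate values are not taken from the paper.
grid = {
    "learning_rate": [1e-3, 1e-2],
    "hidden_layers": [1, 2, 3],
    "neurons": [5, 10],
}


def toy_score(params):
    """Placeholder for 'train the model and return validation accuracy'."""
    return -(abs(params["hidden_layers"] - 3)
             + abs(params["neurons"] - 10)
             + abs(params["learning_rate"] - 1e-2))


best_params, best_score = grid_search(grid, toy_score)
print(best_params)
```

The cost grows with the product of the list lengths (12 combinations here), which is the main practical limit of this approach.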
The dataset used to identify the classifiers has been split into training (70 % of the
samples), validation (15 %) and test (15 %) sets, as it is common practice in the neu-
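A 70/15/15 split can be sketched as follows. The text does not state whether the samples were shuffled or split chronologically; the random shuffle and the fixed seed used here are assumptions.

```python
import random


def split_dataset(samples, train_fraction=0.70, validation_fraction=0.15, seed=0):
    """Shuffle the samples and split them into training, validation and test sets.

    Assumption: a random split; a chronological split would skip the shuffle.
    The test set receives whatever remains after the first two sets.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    n_validation = round(validation_fraction * len(shuffled))
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]
    return train, validation, test


# Toy dataset of 1000 sample indices.
train, validation, test = split_dataset(range(1000))
print(len(train), len(validation), len(test))
```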
Since we are dealing with a classification task, we considered the binary cross-entropy as the loss function and the overall classification accuracy as the validation metric.
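For reference, both quantities can be computed in a few lines. This is a generic sketch of the standard definitions; the 0.5 cutoff used to turn probabilities into class labels and the toy values are assumptions.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def accuracy(y_true, y_prob, cutoff=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum(int(p >= cutoff) == y for y, p in zip(y_true, y_prob))
    return correct / len(y_true)


# Toy labels and predicted probabilities (invented values).
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]
print(binary_cross_entropy(y_true, y_prob), accuracy(y_true, y_prob))
```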
Early stopping and L2-norm weight regularization have been used to avoid overfitting the training data. The performance, in terms of overall accuracy and confusion matrix, is then evaluated on the test set.
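The two regularization techniques can be sketched as follows. The patience value, the loss values, and the function names are hypothetical; the paper does not report the settings actually used.

```python
def l2_penalty(weights, rate):
    """L2-norm penalty added to the loss: rate times the sum of squared weights."""
    return rate * sum(w * w for w in weights)


def early_stopping_epoch(validation_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops at the first epoch at which the validation loss has not
    improved for `patience` consecutive epochs; the weights of `best_epoch`
    would then be restored.
    """
    best_loss, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(validation_losses) - 1, best_epoch


# Toy validation-loss curve (invented values).
losses = [0.90, 0.70, 0.60, 0.65, 0.62, 0.61, 0.70]
print(early_stopping_epoch(losses, patience=3))
```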
2.3 Meteorological Variables
Several classical meteorological variables are measured every 10 minutes: temperature, air pressure, wind speed, and relative humidity. In addition, another variable has been considered: the ZTD derived from the Global Navigation Satellite System (GNSS), estimated from the observations of the permanent geodetic station of Como. The ZTD represents the zenithal delay in the transmission of the GNSS signal from the satellite to the ground receiver caused by the troposphere [11]. It is the sum of a delay caused by the tropospheric gases in hydrostatic equilibrium, called Zenith Hydrostatic Delay (ZHD), and a delay caused by the presence of water vapor, called Zenith Wet Delay (ZWD). Since the temporal variations of the first term are very small, the ZTD can be considered a proxy of the presence of water vapor in the atmosphere [12], which is a fundamental variable in the genesis of rain events.
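The proxy argument can be written compactly. With the notation of this section, and taking the smallness of the hydrostatic variations as the stated assumption:

```latex
\mathrm{ZTD}(t) = \mathrm{ZHD}(t) + \mathrm{ZWD}(t),
\qquad
\Delta\mathrm{ZTD} = \Delta\mathrm{ZHD} + \Delta\mathrm{ZWD} \approx \Delta\mathrm{ZWD}
\quad \text{when } |\Delta\mathrm{ZHD}| \ll |\Delta\mathrm{ZWD}| .
```

In other words, over short time scales a change in the ZTD mostly reflects a change in the wet component, and hence in the water vapor content.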
Each sample in the dataset is thus formed by an input vector, whose elements are the meteorological variables, and by an output value, a boolean variable which represents the occurrence (or not) of the extreme rainfall event.
The dataset considered in this work covers the period from 2010 to 2017 and contains 656 extreme events (together with thousands of cases where extreme events did not occur).
3 Results
The baseline configuration (i.e., logistic regression with traditional meteorological variables only) achieves an overall classification accuracy of 72.5 %. The corresponding confusion matrix is reported in Fig. 2.
Fig. 2. Confusion matrix obtained with the logistic regression considering traditional meteorological variables only.
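For readers unfamiliar with the confusion matrix, the sketch below shows how its four entries and the overall accuracy are obtained from true and predicted labels. The labels are invented for the example and are unrelated to the counts shown in the figures.

```python
def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual and predicted:
            counts["tp"] += 1  # extreme event correctly forecast
        elif predicted:
            counts["fp"] += 1  # false alarm
        elif actual:
            counts["fn"] += 1  # missed extreme event
        else:
            counts["tn"] += 1  # non-event correctly forecast
    return counts


def overall_accuracy(counts):
    """Share of correctly classified samples (diagonal of the matrix)."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


# Toy labels: 1 = extreme event, 0 = no extreme event (invented values).
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
counts = confusion_matrix(y_true, y_pred)
print(counts, overall_accuracy(counts))
```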
As already stated in the previous section, given the complexity and the nonlinear
nature of the processes which occur in the atmosphere, it is very unlikely that a simple
model such as the logistic regression would turn out to be the best approach to deal
with the considered problem.
This idea is confirmed by the performance obtained with a more complex model, a DNN with three hidden layers of ten neurons each: the overall accuracy rises to 79.0 % (see Fig. 3 for the confusion matrix).
Fig. 3. Confusion matrix obtained with the DNN considering traditional meteorological variables only.
To evaluate the importance of including ZTD estimates, we repeated the identification of the two models with the new set of input variables.
Figs. 4 and 5 show the confusion matrices computed with the logistic regression and the DNN, respectively. The comparison between the models shows almost the same trend whether or not the ZTD is included in the inputs: adopting a complex model like the DNN increases the overall accuracy in forecasting extreme events by 6.5 and 8.5 percentage points for the cases without and with the ZTD, respectively (see Table 1).
Fig. 4. Confusion matrix obtained with the logistic regression, including the ZTD in the input variable set.
Fig. 5. Confusion matrix obtained with the DNN, including the ZTD in the input variable set.
The overall accuracy values reported in Table 1 allow quantifying the value of the information provided by the ZTD measured at Como. For the logistic regression, including the ZTD in the input set increases the accuracy from 72.5 % to 74.0 % (+1.5 percentage points). The advantage is even more evident when adopting a DNN: the overall accuracy grows from 79.0 % to 82.5 % (+3.5 percentage points).
Table 1. Overall accuracy of the models identified in the study.
Model                              Overall accuracy
Logistic regression without ZTD    72.5 %
DNN without ZTD                    79.0 %
Logistic regression with ZTD       74.0 %
DNN with ZTD                       82.5 %
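The gains discussed in this section follow directly from the four values in Table 1. The short computation below reproduces them; the variable names are illustrative only.

```python
# Overall accuracy (%) from Table 1, keyed by (model, ZTD included).
accuracy = {
    ("logistic regression", False): 72.5,
    ("DNN", False): 79.0,
    ("logistic regression", True): 74.0,
    ("DNN", True): 82.5,
}

# Gain from adding the ZTD, for each model (percentage points).
ztd_gain = {
    model: accuracy[(model, True)] - accuracy[(model, False)]
    for model in ("logistic regression", "DNN")
}

# Gain from replacing the logistic regression with the DNN,
# without and with the ZTD (percentage points).
dnn_gain = {
    with_ztd: accuracy[("DNN", with_ztd)] - accuracy[("logistic regression", with_ztd)]
    for with_ztd in (False, True)
}

print(ztd_gain)
print(dnn_gain)
```

These differences are percentage points of overall accuracy, not relative improvements.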
4 Conclusion
In this paper, we showed how machine learning techniques can be effectively used to forecast extreme rainfall events. In particular, the results demonstrate that complex nonlinear models, such as DNNs, outperform the logistic regression, which has been used as a benchmark. For the considered case study, this advantage can be quantified in the range of 5-10 %.
In addition, we confirm the results recently obtained in [5] and [6]: including the ZTD in the input set leads to an increase in model accuracy, of the order of 3-4 % when adopting a DNN.
This result is interesting because the ZTD station, located in Como, is on the border of our study area. We would expect even better performance if the station where the ZTD is measured were located closer to the center of the study area, or if additional stations were available inside and/or outside the considered boundary.
References
1. Barindelli, S., Realini, E., Venuti, G., Fermi, A., Gatti, A.: Detection of water vapor time variations associated with heavy rain in northern Italy by geodetic and low-cost GNSS receivers. Earth Planets Space 70, 28 (2018).
2. De Haan, S.: Assimilation of GNSS ZTD and radar radial velocity for the benefit of very-short-range regional weather forecasts. Q. J. R. Meteorol. Soc. 139, 2097-2107 (2013).
3. Dousa, J., Vaclavovic, P.: Real-time zenith tropospheric delays in support of numerical weather prediction applications. Adv. Space Res. 53, 1347-1358 (2014).
4. Benevides, P., Catalão, J., Miranda, P.M.A.: On the inclusion of GPS precipitable water vapour in the Nowcasting of rainfall. Nat. Hazards Earth Syst. Sci. 15, 2605-2616 (2015).
5. Benevides, P., Catalão, J., Nico, G., Miranda, P.: Evaluation of rainfall forecasts combining GNSS precipitable water vapor with ground and remote sensing meteorological variables in a neural network approach. In: Remote Sensing of Clouds and the Atmosphere XXIII. International Society for Optics and Photonics, p. 1078607 (2018).
6. Benevides, P., Catalão, J., Nico, G.: Neural Network Approach to Forecast Hourly Intense Rainfall Using GNSS Precipitable Water Vapor and Meteorological Sensors. Remote Sensing 11(8), 966 (2019).
7. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al.: Scikit-learn: Machine Learning in Python. J Mach Learn Res (2012).
8. Bengio, Y.: Learning Deep Architectures for AI. Found Trends® Mach Learn. 2, 1-127
9. Goodfellow, I., Bengio, Y., Courville, A.: Convolutional Networks. In: Thomas Dietterich, editor. Deep Learning. Cambridge, Massachusetts; London, England: MIT Press. 321-359
10. Chollet, F.: Keras Documentation. (web: (2015).
11. Kleijer, F.: Troposphere modeling and filtering for precise GPS leveling (2004).
12. Bevis, M., Businger, S., Herring, T.A., Rocken, C., Anthes, R.A., Ware, R.H.: GPS meteorology: Remote sensing of atmospheric water vapor using the Global Positioning System. Journal of Geophysical Research: Atmospheres, 97(D14), 15787-15801 (1992).
... Manandhar et al. (2019) proposed a data-driven method using seven variables including solar radiation collected over a 4-year period in Singapore, and the correct detection rate and FAR score were 80.4 % and 20.3 %, respectively. Sangiorgio et al. (2019) adopted the advanced Deep Neural Network (DNN) to evaluate the performance of extreme precipitation detection using GNSS-ZTD and other traditional meteorological data as the model's inputs, and results showed the model's accuracy was improved by 3-4 %-82.5 %. In addition, Liu et al. (2019) also proposed a model based on an improved BP-NN algorithm using multiple types of meteorological data including GNSS-PWV, and its correct detection rate was over 96 % but the FAR score was about 40 %. ...
... Although the BP-NN algorithm is widely accepted and applied in various fields, it often looks like a "black box" in current research and provides no insight into its physical process, as stated by McCormack (1991). Despite the fact that there are several interpretation techniques available with machine learning methods (McGovern et al., 2019), only few researches have used GNSS derived tropospheric products in the construction of a BP-NN-based model for precipitation detection, and either GNSS-PWV (Liu et al., 2019) or GNSS-ZTD (Sangiorgio et al., 2019) was used separately as the model's inputs in previous researches. However, according to Zhao et al. (2020a), the combination of ZTD and PWV was used in a threshold-based model, which was proved to be more effective. ...
... Based on our datasets and those previous researches using a NNbased model for precipitation detection (Benevides et al., 2019;Sangiorgio et al., 2019), a 7-dimensional variable vector consisting of the seasonal and diurnal variables of DOY and HOD; the commonly used meteorological data of T, P and RH; and GNSS-ZTD and GNSS-PWV was formulated for an exploratory data analysis in this section. The Pearson correlation coefficient (PCC), which is one of the most widely used coefficients to evaluate the dependence of two variables, was used to analyze the correlations among the seven variables, and it can be expressed as (Benesty et al., 2009): ...
Recent years have witnessed a growing interest in using GNSS observations to detect heavy precipitation. In this study, a neural network-based (NN-based) approach taking seven meteorological variables as input data was developed based on the back propagation (BP) algorithm for detecting heavy precipitation. Apart from the surface meteorological variables of temperature, pressure and relative humidity, the model has also adopted other information such as day-of-year, hour-of-day and GNSS-derived zenith total delay and precipitable water vapor (PWV) as input variables. The feasibility of using these variables for developing the BP-NN-based model was elaborated by conducting the feature analysis of the seven input variables. In addition, the criterion for selecting a proper size of training sample was also briefly investigated by studying the impact and sensibility of the sample lengths in the model. The proposed model was developed using a sample size of an 8-year (2010–2017) period in the summer at a pair of co-located GNSS/weather stations−HKSC-KP in Hong Kong. The use of a long-term data is to “reliably” capture the characteristics of the selected variables. The detection results for the summer months in 2018 and 2019 were then compared against corresponding precipitation records to valid the effectiveness of the newly proposed model. Results of the correct detection and false alarm rates were 94.5 % and 20.8 %, respectively, which were significant improvements compared with the existing models.
... The LSTM shows excellent potential in modelling the dynamics of natural systems, as already noted in the Introduction. The applications of LSTM networks range over several fields, including geology [42], energy [43], air quality modelling [44], economy [45], and meteorology [46]. LSTM neural networks can store information about past data, avoiding excessively increasing the dimension of the network itself. ...
... Although the Lura and Seveso Rivers belong to the same hydrological system, the Lambro-Olona basin, using the ANN, the author estimates a value of 0.83 for F1, compared with the 0.97 computed in this study as the flood detection performance for the Lura River. The same is true for another study, conducted on the hydrological basin of the Lambro, Seveso, and Olona Rivers [46], that developed a multivariate DNN to predict water level 30 min ahead. ...
Full-text available
Accurate flow forecasting may support responsible institutions in managing river systems and limiting damages due to high water levels. Machine-learning models are known to describe many nonlinear hydrological phenomena, but up to now, they have mainly provided a single future value with a fixed information structure. This study trains and tests multi-step deep neural networks with different inputs to forecast the water stage of two sub-alpine urbanized catchments. They prove effective for one hour ahead flood stage values and occurrences. Convolutional neural networks (CNNs) perform better when only past information on the water stage is used. Long short-term memory nets (LSTMs) are more suited to exploit the data coming from the rain gauges. Predicting a set of water stages over the following hour rather than just a single future value may help concerned agencies take the most urgent actions. The paper also shows that the architecture developed for one catchment can be adapted to similar ones maintaining high accuracy.
... While the ZWD is most significant for the GNSS accuracy, it is also necessary for weather forecasting, as Hassanli and Rahimzadegan [40], Sangiorgio, Barindelli [41], and Zhao, Liu [42] demonstrated. Different approaches have been attempted to forecast the ZWD such as Katsougiannopoulos and Pikridas [43] who utilized MLP (multi-layer perceptron) to predict ZTD for 6 EUREF stations and discovered that the predicted the ZTD might be as accurate as 3 cm. ...
... This discovery outperforms the 72e82% overall accuracy achieved by Sangiorgio, Barindelli [41]. ...
Full-text available
In recent years, the focus of tropospheric studies has evolved to GNSS meteorology and weather forecasting. The Zenith Wet Delay (ZWD), which might be assembled to the Integrated Water Vapour (IWV), is an essential component of the tropospheric delay. Acquiring predicted the ZWD with the required level of accuracy is crucial for weather forecasting. The scope of this study is to use the adaptive neural fuzzy inference system (ANFIS) to predict the ZWD for the following 6-h epoch based exclusively on the present the ZWD value. It was developed and verified using 505 geographically and internationally distributed stations which were used for training and testing from 2008 to 2019. It was assessed based on two criteria. First, the correlation coefficient (R) values were found to be more than 0.8 in 98% of the stations, including those with highest and lowest latitudes, and the remaining 2% of stations located in coastal areas. Second, the Root Mean Square Error (RMSE) values of the differences between the predicted and the actual ZWD which were considered to be the more important finding of the study. That is, 99.21% of the 505 stations had the RMSE values equal to or less than 3 cm, with only 4 stations having the RMSE values higher (0.2 mm) than 3 cm. Since the results of this study achieved the required degree of accuracy from the predicted ZWD to be utilized in weather forecasting, they may also be beneficial for GNSS meteorology.
... The effect of adding an input parameter, on the extreme rainfall event multiclass classification, was inspected by Sangiorgio et al. [17] They compared the performance of logistic regression and DNN with weather parameters as inputs and found that with the addition of an input parameter selected by them, there was an improvement in the accuracy of the classification. Many researchers have used parameters derived from the GNSS and radar for analysis [18], classification [8], and nowcasting [6] of the rainfall, storms, thunderstorms. ...
... Many researchers have used parameters derived from the GNSS and radar for analysis [18], classification [8], and nowcasting [6] of the rainfall, storms, thunderstorms. The parameters derived from GNSS include zenith tropospheric delay (ZTD) [8], [17], precipitable water vapor [11], [19], [20], IWV [21], and IWV with vertical profiles of wet refractivity [6], to name a few. Most of these are related to multiclass classification and utilize several different features for this purpose. ...
Full-text available
Summer monsoon rainfall contributes more than 75% of the annual rainfall in India. For the state of Maharashtra, India, this is more than 80% for almost all regions of the state. The high variability of rainfall during this period necessitates the classification of rainy and non-rainy days. While there are various approaches to rainfall classification, this paper proposes rainfall classification based on weather variables. This paper explores the use of support vector machine (SVM) and artificial neural network (ANN) algorithms for the binary classification of summer monsoon rainfall using common weather variables such as relative humidity, temperature, pressure. The daily data, for the summer monsoon months, for nineteen years, was collected for the Shivajinagar station of Pune in the state of Maharashtra, India. Classification accuracy of 82.1 and 82.8%, respectively, was achieved with SVM and ANN algorithms, for an imbalanced dataset. While performance parameters such as misclassification rate, F1 score indicate that better results were achieved with ANN, model parameter selection for SVM was less involved than ANN. Domain adaptation technique was used for rainfall classification at the other two stations of Maharashtra with the network trained for the Shivajinagar station. Satisfactory results for these two stations were obtained only after changing the training method for SVM and ANN.
... A classical example are the meteorological processes, whose nonlinearity often generates chaotic trajectories. In such context, the machine learning algorithms proved to outperform the traditional methodologies, mainly relying on linear modelling techniques [1,2]. ...
... One-step predictors are optimized by minimizing MSE y,ŷ (1) . Conversely, a multistep predictors can be directly trained on the loss function computed on the entire h-step horizon: ...
Full-text available
The prediction of chaotic dynamical systems’ future evolution is widely debated and represents a hot topic in the context of nonlinear time series analysis. Recent advances in the field proved that machine learning techniques, and in particular artificial neural networks, are well suited to deal with this problem. The current state-of-the-art primarily focuses on noise-free time series, an ideal situation that never occurs in real-world applications. This chapter provides a comprehensive analysis that aims at bridging the gap between the deterministic dynamics generated by archetypal chaotic systems, and the real-world time series. We also deeply explore the importance of different typologies of noise, namely observation and structural noise. Artificial intelligence techniques turned out to provide robust predictions, and potentially represent an effective and flexible alternative to the traditional physically-based approach for real-world applications. Besides the accuracy of the forecasting, the domain-adaptation analysis attested the high generalization capability of the neural predictors across a relatively heterogeneous spatial domain.
... In addition to the above mentioned two categories, nowadays, increasing attention is also paid to applying the machine learning technique to detecting precipitation events [35]− [38]. Various neural network-based (NN-based) models using GNSS tropospheric products together with other meteorological variables as the input parameters of the model have been developed and the performances resulting from these models are promising [39]− [42]. ...
In recent years, tropospheric products obtained from ground-based Global Navigation Satellite Systems (GNSS) measurements, especially the zenith total delay (ZTD) and precipitable water vapor (PWV) estimates, have advanced their usages in meteorological applications such as the detection of precipitation events. Generally, a cumulative anomaly time series of any atmospheric variable, which represents the long-term departure of the variable from its “normal” cycle, is widely used for quantitatively estimating the variable’s variations in response to a weather event. In this study, a new cumulative anomaly-based model (NCAM) containing 14 variables, including not only PWV and ZTD values, but their respective six types of derivatives, for detecting heavy precipitation was developed. The 6-h cumulative anomaly time series of the variables were calculated based on the data of hourly precipitation records and time series of ZTD and PWV collected at the co-located HKSC-KP stations over the 8-year period 2010-2017. The model was evaluated using the 14 variables’ cumulative anomaly time series to detect heavy precipitation events happened in the summer months over the period 2018-2019, and precipitation records in the same period were used as the reference. Results demonstrated that 99.1% of heavy precipitation were correctly detected by the NCAM with a lead time of 2.87 h, and the FAR score resulting from the model was reduced to 22.4%. In addition, two case studies were also conducted to verify the effectiveness of the NCAM. These results all provide a promising direction for the application of using cumulative anomaly time series of GNSS tropospheric products to the detection of heavy precipitation events.
... Moreover, experimental results obtained by feeding NWP models and black box models with ZTD values retrieved by GNSS will be shown. Some of the presented results and methodologies are published in the co-authored articles [95], [94], [57] and [56]. ...
... e simulation and experiment results showed that the retrieving profiles based on the proposed method were better than those obtained by Lowry's method and the Hopfield model [15]. Sangiorgio et al. [16] implemented advanced deep learning predictors, such as the deep neural nets (DNNs), for the forecasting of the occurrence of extreme rainfalls. ...
Full-text available
Real-time modeling of regional troposphere has attracted considerable research attention in the current GNSS field, and its modeling products play an important role in global navigation satellite system (GNSS) real-time precise positioning and real-time inversion of atmospheric water vapor. Multicore support vector machine (MS) based on genetic optimization algorithm, single-core support vector machine (SVM), four-parameter method (FP), neural network method (BP), and root mean square fusion method (SUM) are used for real-time and final zenith tropospheric delay (ZTD) modeling of Hong Kong CORS network in this study. Real-time ZTD modeling experiment results for five consecutive days showed that the average deviation (bias) and root mean square (RMS) of FP, BP, SVM, and SUM reduced by 48.25%, 54.46%, 41.82%, and 51.82% and 43.16%, 48.46%, 30.09%, and 33.86%, respectively, compared with MS. The final ZTD modeling experiment results showed that the bias and RMS of FP, BP, SVM, and SUM reduced by 3.80%, 49.78%, 25.71%, and 49.35% and 43.16%, 48.46%, 30.09%, and 33.86%, respectively, compared with MS. Accuracy of the five methods generally reaches millimeter level in most of the time periods. MS demonstrates higher precision and stability in the modeling of stations with an elevation at the average level of the survey area and higher elevation than that of other models. MS, SVM, and SUM exhibit higher precision and stability in the modeling of the station with an elevation at the average level of the survey area than FP. Meanwhile, real-time modeling error distribution of the five methods is significantly better than the final modeling. Standard deviation and average real-time modeling improved by 43.19% and 24.04%, respectively.
... In our previous study [17], a new model including five predictors derived from GNSS-PWV timeseries was developed, and the model's probability of detection (POD) and FAR were 95.5% and 28.9%, respectively. Apart from these threshold-based models, for which a set of predefined thresholds for the predictors adopted in the model were used to make predictions, the neural network (NN) technique has also been applied to precipitation prediction by incorporating PWV with other atmospheric parameters as input variables of the model [18,19]. Benevides et al. [20] presented a nonlinear autoregressive exogenous neural network model developed based on the integration of GNSS and meteorological data for the short-term prediction of intense precipitation events. ...
Full-text available
Nowadays, precipitable water vapor (PWV) retrieved from ground-based Global Navigation Satellite Systems (GNSS) tracking stations has heralded a new era of GNSS meteorological applications, especially for severe weather prediction. Among the existing models that use PWV timeseries to predict heavy precipitation, the “threshold-based” models, which are based on a set of predefined thresholds for the predictors used in the model for predictions, are effective in heavy precipitation nowcasting. In previous studies, monthly thresholds have been widely accepted due to the monthly patterns of different predictors being fully considered. However, the primary weakness of this type of thresholds lies in their poor prediction results in the transitional periods between two consecutive months. Therefore, in this study, a new method for the determination of an optimal set of diurnal thresholds by adopting a 31-day sliding window was first proposed. Both the monthly and diurnal variation characteristics of the predictors were taken into consideration in the new method. Then, on the strength of the new method, an improved PWV-based model for heavy precipitation prediction was developed using the optimal set of diurnal thresholds determined based on the hourly PWV and precipitation records for the summer over the period 2010–2017 at the co-located HKSC–KP (King’s Park) stations in Hong Kong. The new model was evaluated by comparing its prediction results against the hourly precipitation records for the summer in 2018 and 2019. It is shown that 96.9% of heavy precipitation events were correctly predicted with a lead time of 4.86 h, and the false alarms resulting from the new model were reduced to 25.3%. These results suggest that the inclusion of the diurnal thresholds can significantly improve the prediction performance of the model.
Spatiotemporal analysis has recently drawn considerable attention in industrial time series prediction. Most of the existing methods cannot consider the production semantics or recognize the time-order characteristics of different processes in the spatial dimension. In this study, a novel multi-layer spatiotemporal network based on granular modeling and a long short-term memory (LSTM) network is proposed. For energy data that contain semantic characteristics, a number of information granules are first partitioned based on the process or phase semantics. To represent the spatiotemporal dependencies among the granular sequences, a spatiotemporal topology network is established, where the units are defined specifically by the spatial layers in which they are located. In order to reduce the network complexity, a new gating mechanism is further put forward within the units, with which the network can understand the time-order characteristics of spatial granules at each time step and adjust the impact of input data on the updating procedure accordingly. The performance of the proposed method is validated on a benchmark dataset and on practical energy data with different fluctuation characteristics, using state-of-the-art comparative algorithms. The results indicate that the proposed method can recognize the spatiotemporal relationship effectively and outperforms the comparative ones in prediction accuracy.
This work presents a methodology for the short-term forecast of intense rainfall based on a neural network and the integration of Global Navigation Satellite System (GNSS) and meteorological data. Precipitable water vapor (PWV) derived from GNSS is combined with surface pressure, surface temperature and relative humidity obtained continuously from a ground-based meteorological station. Five years of GNSS data from one station in Lisbon, Portugal, are processed. Data for precipitation forecast are also collected from the meteorological station. Spaceborne Spinning Enhanced Visible and Infrared Imager (SEVIRI) data of cloud top measurements are also gathered, providing collocated information on an hourly basis. In previous studies it was found that the time-varying PWV is correlated with rainfall and can be used to detect heavy rain. However, a significant number of false positives were found, meaning that the evolution of PWV does not contain enough information to infer future rain. In this work, a nonlinear autoregressive exogenous neural network model (NARX) is used to process the GNSS and meteorological data to forecast the hourly precipitation. The proposed methodology improves the detection of intense rainfall events and reduces the number of false positives, with a good classification score varying from 63% up to 72% and a false positive rate of 36% down to 21%, for the tested years in the dataset. A score of 64% for intense rain events classification with 22% false positive rate is obtained for the most recent years. The method also achieves an almost 100% hit rate for the rain vs no rain detection, with close to no false alarms.
GNSS atmospheric water vapor monitoring is not yet routinely performed in Italy, particularly at the regional scale. However, in order to support the activities of regional environmental protection agencies, there is a widespread need to improve forecasting of heavy rainfall events. Localized convective rain forecasts are often misplaced in space and/or time, causing inefficiencies in risk mitigation activities. Water vapor information can be used to improve these forecasts. In collaboration with the environmental protection agencies of the Lombardy and Piedmont regions in northern Italy, we have collected and processed GNSS and weather station datasets for two heavy rain events: one which was spatially widespread, and another which was limited to a few square kilometers. The time variations in water vapor derived from a regional GNSS network with inter-station distances on the order of 50 km were analyzed, and the relationship between the time variations and the evolution of the rain events was evaluated. Results showed a signature associated with the passage of the widespread rain front over each GNSS station within the area of interest. There was a peak in the precipitable water vapor value when the heavier precipitation area surrounded the station, followed by a steep decrease (5–10 mm in about 1 h) as the rainclouds moved past the station. The smaller-scale event, a convective storm a few kilometers in extent, was not detected by the regional GNSS network, but strong fluctuations in water vapor were detected by a low-cost station located near the area of interest.
The temporal behaviour of precipitable water vapour (PWV) retrieved from GPS delay data is analysed in a number of case studies of intense precipitation in the Lisbon area, in the period 2010–2012 and in a continuous annual cycle of 2012 observations. Such behaviour is found to correlate positively with the probability of precipitation, especially in cases of severe rainfall. The evolution of the GPS PWV in a few stations is analysed by a least-squares fitting of a broken line tendency, made by a temporal sequence of ascents and descents over the data. It is found that most severe rainfall events occur in descending trends after a long ascending period and that the most intense events occur after steep ascents in PWV. A simple algorithm, forecasting rain in the 6 h after a steep ascent of the GPS PWV in a single station, is found to produce reasonable forecasts of the occurrence of precipitation in the nearby region, without significant misses in what concerns larger rain events, but with a substantial amount of false alarms. It is suggested that this method could be improved by the analysis of 2-D or 3-D time-varying GPS PWV fields or by its joint use with other meteorological data relevant to nowcast precipitation.
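The single-station rule described above (issue a rain forecast for the 6 h that follow a steep PWV ascent) can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it replaces the least-squares broken-line fit with a plain difference over a fixed window, and the window length and rise threshold (`window_h`, `min_rise_mm`) are hypothetical values chosen only for the example.

```python
def steep_ascent_alerts(pwv_mm, window_h=3, min_rise_mm=4.0, lead_h=6):
    """Flag the hours covered by an alert issued after a steep PWV rise.

    pwv_mm      : hourly PWV samples in millimetres
    window_h    : hours over which the rise is measured (assumed value)
    min_rise_mm : rise that counts as "steep" (assumed value)
    lead_h      : hours covered by each alert (6 h, as in the abstract)
    """
    n = len(pwv_mm)
    alerts = [False] * n
    for t in range(window_h, n):
        rise = pwv_mm[t] - pwv_mm[t - window_h]
        if rise >= min_rise_mm:
            # The alert covers the hours *after* the ascent was observed.
            for k in range(t + 1, min(t + 1 + lead_h, n)):
                alerts[k] = True
    return alerts


# Toy series: flat, then a 5 mm jump between hour 3 and hour 4.
series = [20.0, 20.0, 20.0, 20.0, 25.0, 25.0]
print(steep_ascent_alerts(series))
```

A rule of this kind fires on every steep rise, whether or not rain follows, which is consistent with the substantial number of false alarms reported in the abstract.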
Theoretical results strongly suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one needs deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-formulae. Searching the parameter space of deep architectures is a difficult optimization task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas. This paper discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.
Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from
Precise height differences (5--10 mm standard deviation) are of interest for applications such as maintenance of the Amsterdam Ordnance Datum and deformation analysis. For these applications the Global Positioning System (GPS) is a cost-effective alternative for classic leveling techniques. However, the constituents in the troposphere, of which water vapor is both spatially and temporally the most variable component, cause the GPS signals to be delayed. Several types of troposphere modeling are investigated for GPS leveling because the geometry causes the height component to be strongly affected by the signal delays. The dissertation describes physical, functional, and stochastic aspects of this modeling and gives recursive filtering techniques that can be used in the data processing. Static networks or baselines with geodetic receivers collecting two-frequency phase observables were assumed. The most important models and filters are implemented in simulation software with which the sensitivities of the height difference are analysed. By frequent estimation of zenith delays using mapping functions, biases in the filtered height differences can be largely avoided, but the use of spatiotemporal constraints with that turns out to have hardly a positive contribution and can even have a precision-deteriorating effect. A stochastic model for slant delays based on Kolmogorov turbulence shows to be potentially precision improving (10--30% of the standard deviation), but this model is to be validated. Furthermore, the simulations show that, even for observation times longer than three hours, correctly resolving the GPS phase ambiguities has a precision improving effect of 15--20%.
We present a new approach to remote sensing of water vapor based on the Global Positioning System (GPS). Geodesists and geophysicists have devised methods for estimating the extent to which signals propagating from GPS satellites to ground-based GPS receivers are delayed by atmospheric water vapor. This delay is parameterized in terms of a time-varying zenith wet delay (ZWD) which is retrieved by stochastic filtering of the GPS data. Given surface temperature and pressure readings at the GPS receiver, the retrieved ZWD can be transformed with very little additional uncertainty into an estimate of the integrated water vapor (IWV) overlying that receiver. Networks of continuously operating GPS receivers are being constructed by geodesists, geophysicists, and government and military agencies, in order to implement a wide range of positioning capabilities. These emerging GPS networks offer the possibility of observing the horizontal distribution of IWV or, equivalently, precipitable water with unprecedented coverage and a temporal resolution of the order of 10 min. These measurements could be utilized in operational weather forecasting and in fundamental research into atmospheric storm systems, the hydrologic cycle, atmospheric chemistry, and global climate change.
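The ZWD-to-water-vapor transformation mentioned above is commonly written as PWV = Π · ZWD, where the dimensionless factor Π depends on a weighted mean temperature of the atmosphere. The sketch below is a minimal illustration of that conversion. The refractivity constants and the linear surface-to-mean-temperature relation are commonly quoted literature values and are assumptions here, since the abstract does not list them; the function names are hypothetical.

```python
# Assumed physical constants (commonly quoted values, not given in the abstract).
RHO_WATER = 1000.0   # density of liquid water, kg m^-3
R_V = 461.5          # specific gas constant of water vapor, J kg^-1 K^-1
K2_PRIME = 0.221     # K Pa^-1  (22.1 K hPa^-1)
K3 = 3739.0          # K^2 Pa^-1 (3.739e5 K^2 hPa^-1)


def mean_temperature_k(surface_temp_k):
    """Weighted mean temperature from surface temperature (assumed linear fit)."""
    return 70.2 + 0.72 * surface_temp_k


def zwd_to_pwv_mm(zwd_mm, surface_temp_k):
    """Convert zenith wet delay (mm) to precipitable water vapor (mm)."""
    tm = mean_temperature_k(surface_temp_k)
    # Dimensionless conversion factor, typically close to 0.15.
    pi_factor = 1.0e6 / (RHO_WATER * R_V * (K3 / tm + K2_PRIME))
    return pi_factor * zwd_mm


# Example: 100 mm of wet delay at a surface temperature of 15 degrees C.
print(round(zwd_to_pwv_mm(100.0, 288.15), 2))
```

Because Π varies only slowly with temperature, roughly 6 to 7 mm of wet delay corresponds to 1 mm of precipitable water.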
The Geodetic Observatory Pecný (GOP) has routinely estimated near real-time zenith total delays (ZTD) from GPS permanent stations for assimilation in numerical weather prediction (NWP) models for more than 12 years. Besides European regional, global, and GPS and GLONASS solutions, we have recently developed real-time estimates aimed at supporting NWP nowcasting or severe weather event monitoring. While all previous solutions are based on data batch processing in a network mode, the real-time solution exploits real-time global orbits and clocks from the International GNSS Service (IGS) and the Precise Point Positioning (PPP) processing strategy. A new application, G-Nut/Tefnut, has been developed, and real-time ZTDs were continuously processed in a nine-month demonstration campaign (February–October 2013) for 36 selected European and global stations. The resulting ZTDs can be characterized by mean standard deviations of 6–10 mm, but large biases of up to 20 mm remain due to missing precise models in the software. These results fulfilled the threshold requirements for operational NWP nowcasting (i.e. 30 mm in ZTD). Since the remaining ZTD biases can be effectively eliminated using the bias-reduction procedure prior to the assimilation, results are approaching the target requirements in terms of relative accuracy (i.e. 6 mm in ZTD). The real-time strategy and software are under development, and we foresee further improvements in reducing biases and in optimizing the accuracy within the required timeliness. The real-time products from the International GNSS Service were found to be accurate and stable for supporting PPP-based tropospheric estimates for NWP nowcasting.
Wind, humidity and temperature observations from aircraft and radiosondes are generally used to find the best initial state of the atmosphere for numerical weather prediction (NWP). To be of use for very-short-range numerical weather forecasting (or numerical nowcasting), these observations need to be available within several minutes after observation time. Radiosondes typically have an observation latency of over 30 min and arrive too late for numerical nowcasting. Zenith Total Delay (ZTD) observations obtained from a ground-based network of Global Navigation Satellite System (GNSS) receivers can fill this gap in rapidly available humidity information. ZTD contains information on the total amount of water vapour. Other rapidly available observations, such as radial wind estimates from Doppler weather radars, can also be exploited. Both observations are available with a delay of less than 5 min with adequate spatial resolution. In this article, the impact of assimilation of these humidity and wind observations in a very-short-range regional forecast model is assessed over a four-month summer period and a six-week winter period. As a reference for the impact, GNSS observations are also assimilated in a three-hourly NWP scheme with longer observation cut-off times. The quality of the forecasts is evaluated against radiosonde observations, radar radial wind and hourly precipitation observations. Assimilation of both GNSS ZTD and radar radial winds resulted in a positive impact on humidity, rainfall and wind forecasts.
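To illustrate why ZTD carries water vapour information, the sketch below separates it into a hydrostatic part, which is well predicted by surface pressure, and a wet remainder. It assumes the widely used Saastamoinen-type expression for the zenith hydrostatic delay; this model and the function names are assumptions for illustration and are not described in the abstract above.

```python
import math


def saastamoinen_zhd_m(pressure_hpa, latitude_deg, height_km):
    """Zenith hydrostatic delay in metres (assumed Saastamoinen-type model)."""
    # Small correction for the variation of gravity with latitude and height.
    gravity_term = (1.0
                    - 0.00266 * math.cos(2.0 * math.radians(latitude_deg))
                    - 0.00028 * height_km)
    return 0.0022768 * pressure_hpa / gravity_term


def zenith_wet_delay_m(ztd_m, pressure_hpa, latitude_deg, height_km):
    """Wet part of the delay: what remains of ZTD after removing the ZHD."""
    return ztd_m - saastamoinen_zhd_m(pressure_hpa, latitude_deg, height_km)


# Example: sea-level station at 45 degrees N with standard pressure.
zhd = saastamoinen_zhd_m(1013.25, 45.0, 0.0)
zwd = zenith_wet_delay_m(2.45, 1013.25, 45.0, 0.0)
print(f"ZHD={zhd:.4f} m, ZWD={zwd:.4f} m")
```

The hydrostatic part is about 2.3 m at sea level and changes slowly, so most of the short-term variability in ZTD comes from the much smaller wet delay.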