
Runoff prediction using a multi-scale two-phase processing hybrid model

Abstract

Accurate and timely runoff prediction is essential for effective water resource management and controlling floods and droughts. However, the stochasticity of runoff due to environmental changes and human activities poses a significant challenge in achieving reliable predictions. This paper presents a multi-scale two-phase processing strategy to develop a hybrid model for runoff prediction. In the first phase of model design, the improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) is utilised to identify significant frequencies in the non-stationary target data series. The inputs to the model are decomposed into intrinsic modal functions during this stage. In the second phase, the swarm decomposition (SWD) is used to decompose high-frequency components with consistently high values of time-shift multi-scale weighted permutation entropy (TSMWPE) into sub-sequences. This permits further identification and establishment of data attributes that are incorporated into the extreme learning machine (ELM) algorithm. The ELM then simulates the series of component data, creating a comprehensive tool for runoff prediction. The hybrid model demonstrates exceptional accuracy, achieving a Nash-Sutcliffe efficiency greater than 0.95 and a qualification rate exceeding 0.93. This model can be utilised in decision-making systems as an efficient and accurate solution for generating reliable predictions, particularly for hydrological challenges characterized by non-stationary data.
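As a rough illustration of the final modelling stage described in the abstract, the sketch below fits a basic extreme learning machine (random, fixed hidden-layer weights; output weights solved by least squares through the Moore-Penrose pseudoinverse) to lagged values of a single series and scores it with the Nash-Sutcliffe efficiency and a qualification rate. It is not the authors' implementation: the class and function names, the tanh activation, the 12-lag input, the synthetic series standing in for one decomposed component, and the 20% relative-error tolerance used for the qualification rate are all assumptions made for this example.

```python
import numpy as np


class ELMRegressor:
    """Basic ELM: hidden weights are random and never trained."""

    def __init__(self, n_hidden=30, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # tanh hidden layer plus a constant column acting as an output bias
        H = np.tanh(X @ self.W + self.b)
        return np.hstack([H, np.ones((X.shape[0], 1))])

    def fit(self, X, y):
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # Output weights: least-squares solution via the pseudoinverse
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean of obs."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def qualification_rate(obs, sim, tol=0.2):
    """Share of predictions whose relative error is within tol (assumed 20%)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.mean(np.abs(sim - obs) <= tol * np.abs(obs)))


def lagged(series, n_lags):
    """Rows hold n_lags consecutive values; the target is the next value."""
    series = np.asarray(series, float)
    n = len(series) - n_lags
    X = np.column_stack([series[i:i + n] for i in range(n_lags)])
    return X, series[n_lags:]


# Synthetic stand-in for one decomposed component (not real runoff data)
rng = np.random.default_rng(1)
t = np.arange(300)
series = 10.0 + 3.0 * np.sin(2 * np.pi * t / 12) + 0.1 * rng.normal(size=t.size)

X, y = lagged(series, n_lags=12)
mu, sd = X[:200].mean(), X[:200].std()   # scale inputs using training rows only
Xs = (X - mu) / sd

model = ELMRegressor(n_hidden=30).fit(Xs[:200], y[:200])
pred = model.predict(Xs[200:])
score = nse(y[200:], pred)
print("NSE:", round(score, 3), "QR:", round(qualification_rate(y[200:], pred), 3))
```

The decomposition steps (ICEEMDAN, SWD) and the recombination of component predictions are omitted here; the sketch only shows how a single ELM and the two reported metrics could be computed.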
ORIGINAL PAPER
Stochastic Environmental Research and Risk Assessment (2025) 39:1059–1076
https://doi.org/10.1007/s00477-025-02907-3
1 Introduction
Runoff is a crucial stochastic variable in various environmental processes, and its spatial and temporal variability directly impacts local and global water, energy and matter cycles. Accurate and reliable runoff forecasting is vital for preventing floods and droughts, as well as for the efficient and rational utilisation of water resources. This is significant for promoting sustainable development of society (Wang and Peng 2024a). However, runoff is highly stochastic and non-stationary in nature due to the influence of multiple factors, which makes accurate runoff prediction challenging (Zhao et al. 2021).
There are currently two approaches to predicting runoff: process-driven and data-driven methods (Wang et al. 2024b). Process-driven modeling is complex and has stringent data requirements, which many watersheds often cannot meet (Granata and Fabio Di Nunno 2024; Xu et al. 2021). In contrast, data-driven modeling requires less information, offers greater adaptability, and demonstrates better prediction performance. As a result, data-driven modeling is becoming increasingly popular in practical applications (Wu et al. 2023). In data-driven modeling, traditional prediction methods, such as the auto-regressive moving average model and the multiple linear regression model, are commonly used. While these methods have a fast computational speed, they often struggle to provide satisfactory predictions for nonlinear and non-stationary runoff series (AlDahoul et al. 2023).
The application of artificial intelligence has led to machine learning (ML) becoming the most widely used method for forecasting (Ditthakit et al. 2023). ML can uncover deep connections within time series data, build complex nonlinear high-dimensional models, and establish corresponding mathematical relationships to address nonlinear problems, thereby enhancing the reliability of nonlinear time series
Xuehua Zhao
zhaoxh688@126.com
1 College of Water Resources Science and Engineering,
Taiyuan University of Technology, Taiyuan 030024, China
Keywords ICEEMDAN · ELM · Runoff prediction · SWD · TSMWPE
Accepted: 3 January 2025 / Published online: 19 January 2025
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2025
Runo prediction using a multi-scale two-phase processing hybrid
model
XuehuaZhao1· HuifangWang1· QiucenGuo1· JiatongAn1
... It consists primarily of two strategies: divide and conquer and data reconstruction. Common data processing techniques include EMD, EEMD and VMD (Wang et al. 2024; Zhao et al. 2025). Tan et al. (2018) developed a hybrid method for predicting runoff using EEMD and an ANN model. ...
... Previous research applied EMD to process runoff signals and adaptively decompose data into various intrinsic mode functions (IMFs) with multiple wavelengths (Zhao et al. 2025). However, EMD exhibits modal mixing, where a single IMF may contain signals of multiple scales, or signals of the same scale may appear in different IMFs simultaneously. ...
... This study evaluates point forecasting effects (monthly and daily scales) using MAE, MAPE, RMSE, and MdAPE (Zhao et al. 2025). For interval forecasting, three evaluation indicators are considered. ...
Article
Runoff forecasting precision is critical for water resource management and watershed ecological operation. However, robust runoff prediction is difficult because of runoff series' instability and nonlinearity. To address these challenges, this study develops an innovative ensemble forecasting framework that integrates decomposition-reconstruction techniques and a weight combination strategy to enhance both point and interval forecasting. Specifically, we propose a hybrid point-interval forecasting framework that leverages commonly used distribution functions to quantify uncertainty and generate high-quality runoff prediction intervals. Compared with traditional methods, the proposed framework improves predictive accuracy by optimizing the combination of multiple forecasting models and reducing error accumulation through adaptive sequence reconstruction. To evaluate the effectiveness of the proposed approach, we conducted a comparative analysis across multiple watersheds. Outcomes indicate that: (1) The proposed point forecasting framework outperforms the benchmark models (e.g., Mean Absolute Percentage Error (MAPE) ≤ 0.1494); (2) By incorporating decomposition-reconstruction technology, our framework efficiently captures runoff characteristics, thereby enhancing forecasting performance; (3) The constructed interval forecasting framework effectively generates accurate forecasting intervals, achieving a minimum Prediction Interval Coverage Probability (PICP) value of 0.8750 on the monthly scale and 0.9676 on the daily scale. These findings highlight the effectiveness of the proposed hybrid framework as a powerful tool for water resource planning and decision-making.
Article
Streamflow forecasting holds a pivotal role in the effective management of water resources, flood control, hydropower generation, agricultural planning, and environmental conservation. This study assessed the effectiveness of a stacked Multilayer Perceptron-Random Forest (MLP-RF) ensemble model for short- to medium-term (7 to 15 days ahead) daily streamflow forecasts in the UK. The stacked model combines MLP and RF, enhancing generalization by capturing complex nonlinear relationships and robustness to noisy data. Stacking reduces bias and variance by aggregating predictions and addressing differing sources of bias and variance in MLP and RF. Furthermore, this ensemble model is computationally inexpensive. The study also examined the impact of different meta-learner algorithms, Elastic Net (EN), Isotonic Regression (IR), Pace Regression (PR), and Radial Basis Function (RBF) Neural Networks, on model performance. For 1-day ahead forecasts, all models performed well (Kling Gupta efficiency, KGE, from 0.921 to 0.985, mean absolute percentage error, MAPE, from 3.59 to 13.02%), with minimal impact from the choice of meta-learner. At 7-day ahead forecasts, satisfactory results were obtained (KGE from 0.876 to 0.963, MAPE from 11.53 to 24.55%), while at the 15-day horizon, accuracy remained reasonable (KGE from 0.82 to 0.961, MAPE from 18.31 to 34.38%). The RBF meta-learner generally led to more accurate predictions, particularly affecting low and peak flow rates. RBF consistently outperformed in predicting low flow rates, while EN excelled in predicting flood flow rates in many cases. For estimating total discharged water volume, all models exhibited low relative error (< 0.08).
Article
Accurate streamflow prediction is significant for water resources management. However, due to the impact of climate change and human activities, accurately identifying the input factors of the streamflow prediction model and achieving high-precision results presents a significant challenge. In this study, past streamflow, meteorological, and climate factors were utilized as inputs to develop a predictive scenario for the bi-decomposition of input factors and streamflow series, i.e. Scenario 3 (S3). Mutual information (MI) was applied to assess the prediction potential of the input factors. Based on the predictive potentials, factors were progressively incorporated into the kernel extreme learning machine (KELM) and hybrid kernel extreme learning machine (HKELM) models optimized by the gazelle optimization algorithm (GOA) to ascertain the optimal input configuration for each sub-series. The prediction results of S3-KELM and S3-HKELM models were obtained by reconstructing the optimal prediction results of each sub-series. The monthly streamflow of the upper Fenhe River Basin, which is in the semi-humid and semi-arid climate zone, was selected as a case study. The results indicate that in comparison to both undecomposed and singly decomposed scenarios, the input–output bi-decomposed scenario more accurately identifies the input factors and constructs high-precision prediction models. The Nash–Sutcliffe efficiency (NSE) of both the S3-KELM and S3-HKELM models exceeds 0.85. Specifically, the S3-HKELM model demonstrates superior performance, capable of handling more complex inputs, with its NSE reaching up to 0.93. Importantly, meteorological and climate factors contribute to the accuracy of streamflow predictions across different scenarios.
Article
Accurate runoff prediction is vital in optimizing reservoir scheduling, efficiently managing water resources, and ensuring the effective utilization of water resources. In this paper, a hybrid prediction model combining complete ensemble empirical mode decomposition with adaptive noise, variational mode decomposition, CABES, and long short-term memory network (CEEMDAN-VMD-CABES-LSTM) is proposed. Firstly, CEEMDAN is used to decompose the original data, and the high-frequency component obtained from the CEEMDAN decomposition is decomposed using VMD. Then, each component is input into the LSTM optimized by CABES for prediction. Finally, the results of individual component predictions are combined and reconstructed to produce the monthly runoff predictions. The hybrid model is employed to predict the monthly runoff at the Xiajiang hydrological station and the Yingluoxia hydrological station. A comprehensive comparison is conducted with other models including BP, LSTM, SSA-LSTM, bald eagle search (BES)-LSTM, CABES-LSTM, CEEMDAN-CABES-LSTM, and VMD-CABES-LSTM. The assessment of each model's prediction performance uses four evaluation indexes. Results reveal that the CEEMDAN-VMD-CABES-LSTM model showcased the highest forecast accuracy among all the models evaluated. Compared with the single LSTM, the root mean square error (RMSE) and mean absolute percentage error (MAPE) of the Xiajiang hydrological station decreased by 71.09 and 65.26%, respectively, and the RMSE and MAPE of the Yingluoxia hydrological station decreased by 65.13 and 40.42%, respectively. The R and Nash efficiency coefficient (NSEC) values obtained for both sites are near 1.
Article
Accurate runoff forecasting plays a vital role in issuing timely flood warnings. However, previous research has primarily focused on historical runoff and precipitation variability while disregarding the influence of other factors. Additionally, the prediction process of most machine learning models is opaque, resulting in low interpretability of model predictions. Hence, this study develops an ensemble deep learning model to forecast runoff from three hydrological stations. Initially, time-varying filter-based empirical mode decomposition is employed to decompose the runoff series into several intrinsic mode functions (IMFs). Subsequently, the complexity of each IMF component is evaluated by the multi-scale permutation entropy, and the IMFs are classified into high- and low-frequency portions based on entropy values. Considering that the high-frequency portions still exhibit great volatility, robust local mean decomposition is adopted to perform secondary decomposition of the high-frequency portions. Then, the meteorological variables processed by the Relief algorithm and variance inflation factor features are employed as inputs, and the individual subsequences of the secondary and preliminary decompositions as outputs, for the bidirectional gated recurrent unit and extreme learning machine models. Random forests (RF) are introduced to nonlinearly ensemble the individual sub-model predictions and obtain the final prediction results. The proposed model outperforms other models in various evaluation metrics. Meanwhile, due to the opaque nature of machine learning models, Shapley values are employed to assess the contribution of each selected meteorological variable to the long-term trend of runoff. The proposed model could serve as an essential reference for precise flood prediction and timely warning.
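The entropy-based split into high- and low-frequency components described above can be sketched as follows. This is a minimal illustration of ordinary multi-scale permutation entropy (coarse-graining followed by Bandt-Pompe ordinal-pattern entropy), not the code of the cited study; the function names, the embedding order of 3, the scales 1 to 3, the 0.6 threshold, and the two synthetic "components" are assumptions chosen for the example.

```python
import math
from collections import Counter

import numpy as np


def permutation_entropy(x, order=3, delay=1):
    """Normalised permutation entropy in [0, 1] from ordinal patterns."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (order - 1) * delay
    if n <= 0:
        raise ValueError("series too short for this order and delay")
    # Count the rank pattern of every embedded window of length `order`
    patterns = Counter(
        tuple(np.argsort(x[i:i + order * delay:delay], kind="stable"))
        for i in range(n)
    )
    probs = np.array(list(patterns.values()), dtype=float) / n
    return float(-np.sum(probs * np.log(probs)) / math.log(math.factorial(order)))


def coarse_grain(x, scale):
    """Average non-overlapping blocks of `scale` samples."""
    x = np.asarray(x, dtype=float)
    n = len(x) // scale
    return x[:n * scale].reshape(n, scale).mean(axis=1)


def multiscale_pe(x, scales=(1, 2, 3), order=3):
    """Permutation entropy of the coarse-grained series at each scale."""
    return [permutation_entropy(coarse_grain(x, s), order) for s in scales]


# Two synthetic stand-ins for decomposed components (not real IMFs)
rng = np.random.default_rng(0)
noise = rng.normal(size=1000)                      # irregular, high complexity
slow = np.sin(2 * np.pi * np.arange(1000) / 200)   # smooth, low complexity

THRESHOLD = 0.6  # hypothetical cut-off between "high" and "low" complexity
components = {"noisy": noise, "smooth": slow}
labels = {
    name: "high-frequency" if np.mean(multiscale_pe(c)) > THRESHOLD else "low-frequency"
    for name, c in components.items()
}
print(labels)
```

Components labelled high-frequency would then be passed to a second decomposition, while the others go straight to the prediction models. Weighted or time-shift variants of the entropy change how patterns are counted but follow the same overall scheme.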
Chapter
Nowadays, hydrological systems are becoming increasingly complex owing to the growing interaction between nature and humans at the local scale of river sections, lakes, reservoirs, catchments, etc., to the global scale. There is great demand for the development of models to evaluate, predict, and optimize the performance of complex hydrological systems whose behaviour is characterized by a strong nonlinearity. However, traditional approaches can hardly handle this nonlinear behaviour; moreover, the analysis of hydrological systems at large or even global scale, requires dealing with large-volume and real-time data. In recent years, artificial intelligence (AI), especially deep learning, has shown great potential to process massive data and solve large-scale nonlinear problems. AI has been successfully applied to computer vision, machine translation, bioinformatics, drug design, and climate science. AI models have produced results comparable to and even better than expert human performance. It is expected that AI can significantly contribute to hydrology research as well as development. This book presents some of the latest advances in the field of AI in hydrology. Both theoretical and experimental chapters are included, covering new and emerging AI methods and models from various challenging problems in hydrology. In Focus–a book series that showcases the latest accomplishments in water research. Each book focuses on a specialist area with papers from top experts in the field. It aims to be a vehicle for in-depth understanding and inspire further conversations in the sector.
Article
The machine learning models of multiple linear regression (MLR), support vector regression (SVR), and extreme learning machine (ELM) and the proposed ELM models of online sequential ELM (OS-ELM) and OS-ELM with forgetting mechanism (FOS-ELM) are applied in the prediction of the lime utilization ratio of dephosphorization in the basic oxygen furnace steelmaking process. The ELM model exhibits the best performance compared with the models of MLR and SVR. OS-ELM and FOS-ELM are applied for sequential learning and model updating. The optimal number of samples in the validity term of the FOS-ELM model is determined to be 1500, with the smallest population mean absolute relative error (MARE) value of 0.058226. The variable importance analysis reveals lime weight, initial P content, and hot metal weight as the most important variables for the lime utilization ratio. The lime utilization ratio increases with the decrease in lime weight and the increases in the initial P content and hot metal weight. A prediction system based on FOS-ELM is applied in actual industrial production for one month. The hit ratios of the predicted lime utilization ratio in the error ranges of ±1%, ±3%, and ±5% are 61.16%, 90.63%, and 94.11%, respectively. The coefficient of determination, MARE, and root mean square error are 0.8670, 0.06823, and 1.4265, respectively. The system exhibits desirable performance for applications in actual industrial production.
Article
This study addresses the critical need for precise Direct Normal Irradiation forecasting in concentrating solar power systems to enhance performance and manage power generation intermittency. We propose a novel hybrid model that combines Variation Mode Decomposition, Swarm Decomposition Algorithm, Random Forest for feature selection, and Deep Convolutional Neural Networks, aiming to improve the forecasting accuracy. This model covers the entire process from Direct Normal Irradiation forecasting to heliostat field optimization and electricity generation. We validated the model across four globally diverse regions, taking into account their distinct climates and meteorological conditions. The results show that our model aligns closely with actual measurements and outperforms existing forecasting methods in terms of precision and stability. The forecasting performance was assessed using normalized Root Mean Square Error, with results ranging from 0.75% to 3.4% across different regions. This demonstrates the model's potential for real-world application in concentrating solar power systems, optimizing heliostat field effectiveness, and reliably forecasting electricity production for grid management.