One of the goals of efficient water supply management is the regular supply of clean water at the pressure required by consumers. In this context, predicting water consumption in urban areas is of key importance for water supply management. This prediction is also relevant to price reviews and to the operational management of a water network. In this paper, we describe and compare a series of predictive models for forecasting water demand. The models are obtained using time series data from water consumption in an urban area of a city in south-eastern Spain. This includes highly non-linear time series data, which has conditioned the type of models we have included in our study. Namely, we have considered artificial neural networks, projection pursuit regression, multivariate adaptive regression splines, random forests and support vector regression. Apart from these models, we also propose a simple model based on the weighted demand profile resulting from our exploratory analysis of the data. In our comparative study, all predictive models were evaluated using an experimental methodology for hourly time series data that detailed water demand in a hydraulic sector of a water supply network in a city in south-eastern Spain. The accuracy of the obtained results, together with the medium size of the demand area, suggests that this was a suitable environment for making adequate management decisions.

... Moreover, changes in water demand exhibit nonlinearity due to non-linear changes in water consumption, temperature variations, and holidays impacting urban water usage [12] and cannot be accurately predicted by linear approaches [13]. In the realm of water demand forecasting, AI models explored in previous studies include artificial neural networks (ANN) [5,6], support vector machines (SVM) [13][14][15][16], an SVM method using the Fourier method [14], random forests [15], extreme learning machines (ELM) [15,17], ELM in conjunction with wavelet-based ANN [18], system dynamics modeling (SDM) [19], an ensemble wavelet-bootstrap machine-learning approach [20], singular spectrum analysis coupled with neural networks [21], and adaptive neuro-fuzzy inference systems (ANFIS) [22]. In general, these learning techniques have shown successful results and are widely applied in water demand forecasting. ...

Accurate prediction of water demand in a city is crucial for the management of urban water distribution systems. The current study aims to create adequate daily water demand forecasting models for the Canadian metropolis of London utilizing deep learning (DL)-based models. This study explores the potential of two stand-alone DL models for daily water demand modeling, a convolutional neural network (CNN) and a long short-term memory (LSTM) network, along with their hybrid CNN–LSTM model. Furthermore, a bi-directional LSTM model is combined with a CNN to predict daily water consumption in London as a CNN–BiLSTM hybrid model. Daily water consumption data for the years 2009 to 2020 are used for the development and assessment of the predictive models. These stand-alone and hybrid models have been developed with specified input lags of daily water demand and verified for daily water demand prediction. The performance of the developed hybrid models was compared with other well-established DL-based stand-alone models. The model outcomes during the training, validation, and testing phases were assessed using statistical metrics such as the mean absolute error (MAE), Nash–Sutcliffe coefficient (NSE), correlation coefficient (r), Scatter Index (SI), Mean Bias Error (MBE), and Discrepancy Ratio (DR). The stand-alone models captured the observations very well during training and testing, most clearly at 1 day ahead. Moreover, at 7 days and 15 days ahead, those models, except the stand-alone CNN, closely reproduced the pattern in daily water demand. Among the hybrid models, the best-performing CNN–BiLSTM model produced 1-day to 15-day multi-step ahead forecasts with performance metrics in the following ranges: MAE = 0.245–2.541 ML/day, NSE = 99.830–84.843% and r = 0.999–0.921 during the testing period. An uncertainty analysis supports the superiority of the CNN–BiLSTM model, whose forecast bands contain 88–90% of the observations within the 90% confidence interval (CI). Overall, the outcomes support CNN–BiLSTM as a promising deep learning procedure for accurate forecasting of urban water demand in any city globally.
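Three of the metrics named in this abstract (MAE, NSE and r) can be illustrated with a minimal sketch. It uses the standard textbook definitions, which are assumed here because the abstract does not spell them out; the function names and the toy numbers are hypothetical and are not taken from the study. Note that the abstract reports NSE as a percentage, whereas this sketch returns a fraction.

```python
import math
from typing import Sequence


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error, in the same units as the data."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean of obs."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def pearson_r(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between observations and predictions."""
    mean_o = sum(obs) / len(obs)
    mean_p = sum(pred) / len(pred)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


if __name__ == "__main__":
    # Toy daily demand values (illustrative only, not from the paper).
    observed = [10.0, 12.0, 14.0, 16.0]
    forecast = [11.0, 12.0, 13.0, 16.0]
    print(mae(observed, forecast), nse(observed, forecast), pearson_r(observed, forecast))
```

A high r does not by itself imply a high NSE: r ignores constant bias and scaling errors, which NSE penalises, so the two are usually reported together.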

... When monitoring, modelling, and engineering urban water systems, one critical hydrological variable is the volumetric flowrate or discharge. Knowledge of discharge is important in optimising future urban water systems and minimising their environmental impact [1,2], detecting and locating leaks and burst pipes [3], and estimating pollutants from combined sewer overflow events [4,5]. Flow can be measured in a number of ways, ranging from manual dilution methods (i.e., adding salt tracers and EC sensors [6]), using stage discharge relationships that may or may not include the use of hydraulic control structures (i.e., weirs [7][8][9]), to the velocity-area method. ...

... To facilitate a deeper understanding and to obtain a more accurate understanding of waterways, higher resolution monitoring has been desired so that real-time monitoring and control of our urban waterways can be achieved [3,[13][14][15]. For flowrate monitoring found when using existing sensors. ...

... Once this is known, Equations (1) and (2) can now be applied to determine the water surface velocity and line-of-sight distance to the water surface. The line-of-sight distance to the water surface, $d$, can be converted to a water depth measurement, $D$, via $D = H - d \sin\theta$ (3), where $H$ is the perpendicular distance from the radar sensor to the channel bed, which should be measured manually during the installation of the sensor. ...
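As a small worked illustration of Equation (3), the sketch below converts a line-of-sight distance into a water depth. The excerpt does not define θ, so the code assumes it is the angle between the radar beam and the horizontal water surface, in degrees; the function name and the example values are hypothetical.

```python
import math


def water_depth(sensor_height_m: float, line_of_sight_m: float, theta_deg: float) -> float:
    """Water depth D = H - d * sin(theta).

    sensor_height_m: H, perpendicular distance from the sensor to the channel bed.
    line_of_sight_m: d, distance along the radar beam to the water surface.
    theta_deg: beam angle, ASSUMED to be measured from the horizontal, in degrees.
    """
    return sensor_height_m - line_of_sight_m * math.sin(math.radians(theta_deg))


if __name__ == "__main__":
    # Sensor mounted 2.0 m above the bed, beam tilted 30 degrees below horizontal,
    # radar reports 2.0 m to the water surface (illustrative numbers only).
    print(round(water_depth(2.0, 2.0, 30.0), 3))
```

Under this assumption a beam pointing straight down (θ = 90°) reduces the formula to D = H − d, which is a quick sanity check on the angle convention.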

We designed an out-of-water radar water velocity and depth sensor, which is unique due to its low cost and low power consumption. The sensor is a first at a cost of less than USD 50, which is well suited to previously cost-prohibited high-resolution monitoring schemes. This use case is further supported by its out-of-water operation, which provides low-effort installations and longer maintenance-free intervals when compared with in-water sensors. The inclusion of both velocity and depth measurement capabilities allows the sensor to also be used as an all-in-one solution for flowrate measurement. We discuss the design of the sensor, which has been made freely available under open-hardware and open-source licenses. The design uses commonly available electronic components, and a 3D-printed casing makes the design easy to replicate and modify. Not before seen on a hydrology sensor, we include a 3D-printed radar lens in the casing, which boosts radar sensitivity by 21 dB. The velocity and depth-sensing performance were characterised in laboratory and in-field tests. The depth is accurate to within ±6% and ±7 mm and the uncertainty in the velocity measurements ranges from less than 30% to 36% in both laboratory and field conditions. Our sensor is demonstrated to be a feasible low-cost design which nears the uncertainty of current, yet more expensive, velocity sensors, especially when field performance is considered.

... With an aim to improve the accuracy of short-term water demand forecasting, much of the work has been using model-centric approaches (Lertpalangsunti et al. 1998; Herrera et al. 2010; Adamowski et al. 2012; Chen et al. 2017; Gagliardi et al. 2017; Chen & Boccelli 2018; Sardinha-Lourenço et al. 2018; Liu et al. 2022). These approaches focus on developing and adapting models to data, through various approaches including parameter optimisation, alterations to model structure and ensemble models. ...

... To answer the above questions, four forecast models are used: ARIMA, NN, RF and Prophet. ARIMA is commonly used as a benchmark model for comparative purposes (Adamowski et al. 2012; Tiwari & Adamowski 2013; Chen & Boccelli 2018; Guo et al. 2018; Sardinha-Lourenço et al. 2018). NN is a common model that has demonstrated high forecast accuracy (Maidment & Miaou 1986; Lertpalangsunti et al. 1998; Adamowski 2008; Ghiassi et al. 2008; Herrera et al. 2010; Adamowski et al. 2012; Tiwari & Adamowski 2013; Gagliardi et al. 2017; Guo et al. 2018). RF has received less attention compared to ARIMA and NN, but it has been shown to produce similar high forecast accuracy to NN (Herrera et al. 2010; Chen et al. 2017). ...

... ARIMA is commonly used as a benchmark model for comparative purposes (Adamowski et al. 2012; Tiwari & Adamowski 2013; Chen & Boccelli 2018; Guo et al. 2018; Sardinha-Lourenço et al. 2018). NN is a common model that has demonstrated high forecast accuracy (Maidment & Miaou 1986; Lertpalangsunti et al. 1998; Adamowski 2008; Ghiassi et al. 2008; Herrera et al. 2010; Adamowski et al. 2012; Tiwari & Adamowski 2013; Gagliardi et al. 2017; Guo et al. 2018). RF has received less attention compared to ARIMA and NN, but it has been shown to produce similar high forecast accuracy to NN (Herrera et al. 2010; Chen et al. 2017). Prophet is a relatively new forecasting model developed by Facebook (Taylor & Letham 2017) and it has yet to be applied to the field of short-term water demand forecasting. ...

Accurate water demand forecasting is the key to urban water management and can alleviate system pressure brought by urbanisation, water scarcity and climate change. However, existing research on water demand forecasting using machine learning is focused on model-centric approaches, where various forecasting models are tested to improve accuracy. The study undertakes a data-centric machine learning approach by analysing the impact of training data length, temporal resolution and data uncertainty on forecasting model results. The models evaluated are Autoregressive (AR) Integrated Moving Average (ARIMA), Neural Network (NN), Random Forest (RF) and Prophet. The first two are commonly used forecasting models. RF has shown similar forecast accuracy to NN but has received less attention. Prophet is a new model that has not been applied to short-term water demand forecasting, though it has had successful applications in various fields. The results obtained from four case studies show that (1) data-centric machine learning approaches offer promise for improving forecast accuracy of short-term water demands; (2) accurate forecasts are possible with short training data; (3) RF and NN models are superior at forecasting high-temporal resolution data; and (4) data quality improvement can achieve a level of accuracy increase comparable to model-centric machine learning approaches.
HIGHLIGHTS
Data-centric machine learning approaches offer promise for improving the forecast accuracy of short-term water demands.
Accurate forecasts are possible with short training data.
Random forest and neural network models are superior at forecasting high-temporal resolution data.
Data quality improvements can achieve a level of accuracy increase comparable to model-centric machine learning approaches.

... Allocating and utilizing limited water resources is an effective way to tackle the urban water supply-demand contradiction. Accurate prediction of urban water consumption is key to water resource allocation and planning (Herrera et al. 2010); hence, it is important to study urban water prediction. ...

... consumption forecasting (Haque et al. 2017; Herrera et al. 2010; Sardinha-Lourenco et al. 2018). Moreover, if the model's predicted values are consistently greater than the observed values, it indicates the presence of systematic errors in the model (Del Giudice et al. 2015), reflecting certain flaws in the model. ...

The prediction of urban water consumption is of great significance for urban planning and management, addressing water demand conflicts among various industries in a city and balancing supply and demand. The prediction of future data by data-driven models is largely based on the assumption of data consistency. However, large-scale human migration, the rapid development of economic activity, climate change and other factors affect the consistency of urban water consumption data, thus creating challenges for traditional data-driven models. In response, a combined prediction model based on the new information priority theory (CPMBNIP) is proposed to predict urban water consumption in a changing environment. To represent the linear and nonlinear characteristics of the urban water consumption system, the autoregressive moving average (ARMA) model and gray model (GM(1,1)) are selected as basic models. Based on the prediction results of the basic models, an optimization model with the corresponding weights of the two basic models as the decision variables is constructed. The optimization model is solved using the nondominated sorting genetic algorithm II (NSGA II) to obtain the set of weight combinations. Based on the principle of new information priority, the final weight combination is selected from the weight combination set according to the criterion of the best fit with the verification set. The final weight combination is incorporated into the two basic models to obtain CPMBNIP and predict the data for the test set. Based on urban water consumption sequence data from six lower-tier cities in southern China from 1965 to 2004, the urban water consumption of these six cities from 2005 to 2013 is predicted by CPMBNIP. Additionally, CPMBNIP is compared with two basic models (ARMA and GM(1,1)) and a single-objective combined prediction model (SOCPM). The percentage errors of CPMBNIP for the six cities' test sets are 4.54%, 3.88%, 6.14%, 4.34%, 3.01% and 3.43%. CPMBNIP predicts the test set better than the other models and yields the best overall prediction performance. In addition, compared with the other models, CPMBNIP can better use the information provided by the new data to improve the prediction of some nonstationary time series. This study provides support for urban water consumption prediction under changing environments.
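The weight-selection step described in this abstract, choosing the combination of two base forecasts that best fits a verification set, can be sketched in a simplified form. This is not the authors' method: the NSGA II search is replaced by a plain list of candidate weights, the two weights are assumed to sum to 1, and the fit criterion is assumed to be the mean absolute percentage error. All names and numbers are hypothetical.

```python
from typing import List, Sequence


def combine(pred_a: Sequence[float], pred_b: Sequence[float], w: float) -> List[float]:
    """Weighted combination of two base forecasts; w applies to model A (assumed w_a + w_b = 1)."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


def mape(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (obs must be non-zero)."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def best_weight(obs_val: Sequence[float], pred_a_val: Sequence[float],
                pred_b_val: Sequence[float], steps: int = 100) -> float:
    """Pick the candidate weight with the lowest error on the verification set.

    The evenly spaced candidates stand in for the weight set that the paper
    obtains from NSGA II; this substitution is an assumption of the sketch.
    """
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: mape(obs_val, combine(pred_a_val, pred_b_val, w)))


if __name__ == "__main__":
    # Toy verification data: model A under-predicts, model B over-predicts.
    observed = [100.0, 110.0, 120.0]
    model_a = [90.0, 100.0, 110.0]
    model_b = [110.0, 120.0, 130.0]
    w = best_weight(observed, model_a, model_b)
    print(w, combine(model_a, model_b, w))
```

Selecting the weight on the most recent verification data, rather than on the full training history, is one simple way to give newer observations priority when the series is nonstationary.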

... It is known that these series exhibit stochastic and nonlinear components [Bonissone 1997], making water demand forecasting a complex topic. In this regard, soft computing (SC) methods such as fuzzy logic [Altunkaynak et al. 2005], [Firat et al. 2009a], [Nasseri et al. 2011], support vector machines [Herrera et al. 2010], [Shabri and Samsudin 2015] and hybrid models [Tiwari and Adamowski 2013], [Tiwari and Adamowski 2015] have achieved more accurate results for urban water demand forecasting. ...

... The most widely used architecture is the feed-forward neural network (FFNN), also known as the multi-layer perceptron (MLP) [Tiwari and Adamowski 2015], [Ghiassi et al. 2008], [Firat et al. 2009b], [Santos and Pereira Filho 2014]. References to the Generalized Regression Neural Network (GRNN) model [Di et al. 2014], [Al-Zahrani and Abo-Monasar 2015], Radial Basis Function neural network (RBF) [Firat et al. 2009b], [Di et al. 2014], Dynamic Artificial Neural Network (DAN2) [Ghiassi et al. 2008 [Herrera et al. 2010]. Other types of algorithms, such as ELM, have been applied to optimize neural-network-based forecasts [Tiwari et al. 2016]. ...

ABSTRACT The main objective of this research is to evaluate the predictive capacity and computational complexity of statistical and soft-computing models for urban water demand forecasting. Compared to soft computing approaches, the Dynamic Time Scan Forecasting (DTScanF) method is extremely fast. KEYWORDS. Statistical methods, demand forecasting, soft computing.

... To address this issue, many non-linear methods have been developed to predict water demand. The representative models include artificial neural networks (ANN) (Adamowski, 2008; Ghiassi et al., 2008; Firat et al., 2010), support vector machines (SVM) (Peña-Guzmán et al., 2016; Candelieri et al., 2019; Herrera et al., 2010) and random forests (RF) (Herrera et al., 2010; Chen et al., 2017; Duerr et al., 2018). Advanced techniques such as activation functions (Sharma et al., 2017) and kernel functions (Amari and Wu, 1999) enable these methods to describe non-linear relationships. ...

Short-term water demand forecasting provides guidance on real-time water allocation in the water supply network, which helps water utilities reduce energy costs and avoid potential accidents. Although a variety of methods have been proposed to improve forecast accuracy, it is still difficult for statistical models to learn the periodic patterns due to the chaotic nature of water demand data with high temporal resolution. To overcome this issue from the perspective of improving data predictability, we propose a hybrid Wavelet-CNN-LSTM model that takes the time-frequency decomposition of Wavelet Multi-Resolution Analysis (MRA) and feeds it into an advanced deep learning model, CNN-LSTM. Four models (ANN, Conv1D, LSTM and GRUN) are compared with Wavelet-CNN-LSTM, and the results show that Wavelet-CNN-LSTM outperforms the other models in both single-step and multi-step prediction. In addition, further mechanistic analysis revealed that MRA has a significant effect on improving model accuracy.

... Water demand forecasting (WDF) is essential for the design and optimal management of water distribution systems (WDS). During the past few decades, a wide variety of WDF models has been proposed, ranging from statistical models such as auto-regressive integrated moving average (ARIMA) and exponential smoothing models (e.g., Caiado, 2010; Arandia et al., 2015) to machine learning models such as artificial neural network, support vector regression, and random forests (Herrera et al., 2010; Romano & Kapelan, 2014; Brentan et al., 2017) to more recently developed deep learning models such as gated recurrent unit (GRU), long short-term memory (LSTM), and 1D convolution-GRU (Guo et al., 2018; Mu et al., 2020; Salloom et al., 2021; Chen et al., 2022). ...

... And further, hourly predictions of water demand are frequently used by water managers to determine optimal regulation and pumping schemes; hence water demand forecasting on an hourly basis continues to be the research hotspot of many scholars (e.g., Herrera et al., 2010; Chen & Boccelli, 2018; Ambrosio et al., 2019; Hu et al., 2021; Huang et al., 2021). ...

Water demand forecasting (WDF) is essential for the design and optimal management of water distribution systems (WDS). Historical water demand data contribute significantly to WDF. Yet the obtained water demand data contain anomalies on occasions due to failures in WDS or monitoring systems. The contaminated water demand data are invalid to describe the actual water demand, thus degrading the performance of WDF models. However, the importance of anomaly detection in water demand data is underestimated, or at least not explicitly described in many published papers. To fill the gap, we propose an unsupervised anomaly detection method based on an asymmetric encoder–decoder (asyED) model. Different from the symmetric structure of a traditional autoencoder where a signal is reproduced from itself, asyED is asymmetric where a signal is reproduced from the upstream and downstream information of the signal, while the signal itself does not participate in the reconstruction. In light of this feature, asyED is powerful in identifying anomalies. The proposed method is employed to detect anomalies in hourly water demand data which exhibit trend and seasonality. The results show the superiority of the proposed method over the other four commonly used anomaly detection methods: Z-score, isolation forest, local outlier factor, and seasonal hybrid ESD (Extreme Studentized Deviate test).

... This precision enables water utilities to minimize energy consumption costs linked to water pump operations while simultaneously fulfilling user requirements, thereby achieving a balanced equilibrium between water production and consumption [2]. Water demand also assumes a vital role in monitoring efforts, evident in its capacity to detect potential leakage instances when actual demand values significantly deviate from the projected water demand [3]. The topic of water demand forecasting has been a focal point of research since the 1960s [4], encompassing various forecasting timeframes, notably long-term, medium-term, and short-term [5]. ...

Accurate short-term water demand forecasting assumes a pivotal role in optimizing water supply control strategies, constituting a cornerstone of effective water management. In recent times, the rise of machine learning technologies has ushered in hybrid models that exhibit superior performance in this domain. Given the intrinsic non-linear fluctuations and variations in short-term water demand sequences, achieving precise forecasts presents a formidable challenge. Against this backdrop, this study introduces an innovative machine learning framework for short-term water demand prediction. The maximal information coefficient (MIC) is employed to select high-quality input features. A deep learning architecture is devised, featuring an Attention-BiLSTM network. This design leverages attention weights and the bidirectional information in historical sequences to highlight influential factors and enhance predictive capabilities. The integration of the XGBoost algorithm as a residual correction module further bolsters the model’s performance by refining predicted results through error simulation. Hyper-parameter configurations are fine-tuned using the Keras Tuner and random parameter search. Through rigorous performance comparison with benchmark models, the superiority and stability of this method are conclusively demonstrated. The attained results unequivocally establish that this approach outperforms other models in terms of predictive accuracy, stability, and generalization capabilities, with MAE, RMSE, MAPE, and NSE values of 544 m³/h, 915 m³/h, 1.00%, and 0.99, respectively. The study reveals that the incorporation of important features selected by the MIC, followed by their integration into the attention mechanism, essentially subjects these features to a secondary filtration. While this enhances model performance, the potential for improvement remains limited. Our proposed forecasting framework offers a fresh perspective and contribution to the short-term water resource scheduling in smart water management systems.

... Fundamentally, estimating and forecasting water and energy consumption is critical to infrastructure, facilities, municipal planning, and design (Surendra and Deka, 2022). High resolution consumption predictions over a short-term horizon (e.g., 24 h ahead at a 1 h resolution) have extensive applications for water and energy resource management, including understanding patterns of consumer behavior, identifying water or energy saving potentials, detecting abnormal events, and deriving efficient operation and effective designs (Frankel et al., 2021;Herrera et al., 2010;Surendra and Deka, 2022;Tanimoto et al., 2013). For instance, Shen et al. (2020) found that a nearly 12 % reduction in monthly electricity consumption could be achieved through personality energy-saving interventions, based on the prediction of household electricity consumption. ...

... Among them, DT-based algorithms need to estimate fewer parameters and are easy to apply. They are therefore highly automated but prone to overfitting the data [35]. For this reason, they are gradually being replaced by more advanced and simpler machine learning algorithms; the kernel-based support vector machine and the ensemble-tree random forest have become very effective methods in metallogenic prediction. ...

In recent years, there has been a growing emphasis on combining intelligent prospecting algorithms, such as random forest, with extensive geological and mineral data for the purpose of quantitatively predicting exploration geochemistry. This approach holds significant importance for enhancing the accuracy of target delineation. The central Kunlun area in Xinjiang possesses highly favorable ore-forming geological conditions, offering excellent prospects for mineral exploration. However, the depletion of shallow deposits coupled with a decade-long gap in geological exploration have presented increasing challenges in the quest to discover substantial metal resources. Consequently, there is now a severe shortage of reserve assets in the region, prompting an urgent need for the implementation of new theories, methods, and technologies in mineral resource investigation and evaluation efforts. The researchers used geological and regional geochemical data to construct a random forest metallogenic discriminant model for predicting the mineralization of gold polymetallic minerals in the central Kunlun area of Xinjiang and delineating the metallogenic target area. Two different sampling methods were compared to quantitatively predict gold polymetallic mineral resources. The results indicate that the selected training samples offer higher prediction accuracy and reliability by fully capturing the complex information of the original data. The random forest model using select training samples has valuable applications in metallogenic prospect prediction and potential division due to its ability to consider the actual exploration cost and identify small areas with high potential and a high proportion of ore. This study significantly improves prediction accuracy, reduces exploration risk, and expands the use of machine learning algorithms in mathematical geology in the central Kunlun area of Xinjiang.

... For example, Ghiassi et al. 2008 and Firat et al. 2009a employed ANNs. Also, Herrera et al. 2010, Brentan et al. 2017, and Peña-Guzmán et al. 2016 used the support vector machine (SVM) method to forecast water demand. In addition, Altunkaynak et al. 2005 and Firat et al. 2009b used fuzzy logic for the same task. ...

Efficient and optimal management of urban water distribution networks requires forecasting short-term water demand over a full day at hourly intervals. Water demand has a time series nature and a pattern with a complex structure, which is influenced by many factors. Deep neural networks (DNNs) can be suitable for extracting this pattern. In this study, a One-Dimensional convolutional neural network (1D CNN) is implemented for the short-term forecast of urban water demand. Next, the obtained outputs are compared with other deep learning models, including deep feedforward neural network (DFNN), simple recurrent neural network (Simple RNN), long short-term memory (LSTM), and gated recurrent unit (GRU) neural networks. The results show that the 1D CNN, with a Mean Absolute Percentage Error (MAPE) of 3.52%, is superior to the other DNNs. This accuracy, together with the short time required to train the 1D CNN compared to the other models, makes this model superior. Deep learning models were implemented in the TensorFlow software platform and Keras library in Python. In this study, the Rolling Cross-Validation technique was used to evaluate and adjust the hyperparameters of deep learning models.
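The abstract does not say which variant of rolling cross-validation was used. As a minimal sketch, the code below assumes the common expanding-window (rolling-origin) scheme, in which each fold trains on all data up to a cut-off and validates on the block that follows it, so the model never sees future values. The function name and parameters are hypothetical.

```python
from typing import Iterator, Tuple


def rolling_origin_splits(n_samples: int, initial_train: int,
                          horizon: int) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, validation_indices) for an expanding-window scheme.

    n_samples: total number of time-ordered observations.
    initial_train: size of the first training window.
    horizon: number of observations validated in each fold.
    """
    if initial_train < 1 or horizon < 1:
        raise ValueError("initial_train and horizon must be positive")
    cutoff = initial_train
    while cutoff + horizon <= n_samples:
        # Train on everything before the cut-off, validate on the next block.
        yield range(0, cutoff), range(cutoff, cutoff + horizon)
        cutoff += horizon


if __name__ == "__main__":
    # Ten hourly observations, first fold trains on four, each fold validates on two.
    for train_idx, val_idx in rolling_origin_splits(10, initial_train=4, horizon=2):
        print(list(train_idx), list(val_idx))
```

Hyperparameters would then be chosen by averaging the validation error across folds. Libraries such as scikit-learn provide a comparable splitter (`TimeSeriesSplit`), which may be preferable in practice.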

In recent years, combining intelligent prospecting algorithms such as random forest with large volumes of geological and mineral data for quantitative prediction in exploration geochemistry has become an important way to improve the accuracy of target delineation. The ore-forming geological conditions in the central Kunlun area of Xinjiang are favorable, and the area has good prospecting potential. However, owing to the exhaustion of shallow deposits and the lag in geological prospecting work over the past ten years, the expected breakthrough in the search for large and super-large metal deposits has not materialized, and there is a serious shortage of reserve resources. Using new theories, methods and technologies for mineral resource investigation and evaluation has therefore become an urgent need in current prospecting work. Based on the existing spatial database of geological and mineral resources in the central Kunlun of Xinjiang, and on the geological characteristics, genesis and metallogenic regularity of the area, this paper carries out a series of studies on gold polymetallic minerals using a geographic information system and a data science programming platform. The researchers integrated geological and regional geochemical data and constructed random forest metallogenic discriminant models based on two different sampling methods (integrated random undersampling and selection of training samples) to predict the mineralization of gold polymetallic minerals in the central Kunlun area of Xinjiang and delineate the metallogenic target area.
The quantitative predictions of gold polymetallic mineral resources in the central Kunlun area of Xinjiang from the two random forest models are compared and discussed. Known ore spots, fault structures and geochemical information are extracted, and the known gold polymetallic ore spots and geochemical data are used to form a training set and a prediction set for constructing the random forest model. The performance evaluation parameters of the training process show that the model built on selected training samples has higher prediction accuracy and is more reliable, because it can fully learn the complex information in the original data. In the metallogenic prospect prediction and potential division, the random forest model with selected training samples also has more reference value and greater significance for further exploration when actual exploration cost is considered, because its high-potential prediction area is small and its proportion of ore-bearing ground per unit area is high. The study improves prediction accuracy, reduces exploration risk, and expands the use of machine learning algorithms in mathematical geology for prospecting in the central Kunlun area of Xinjiang. The delineated metallogenic potential area offers positive guidance for gold polymetallic prospecting work in this area.

... Support vector machines (SVMs) are supervised learning models based on the concepts of Cortes and Vapnik [8], where a hyperplane is found through an optimization process that guarantees the greatest separation distance between groups of data. There is a variation of SVM based on regression [12,33], also known as support vector regression (SVR), which has been used to forecast water demand [1,7,18]. ...

This paper proposes a new hybrid SVR-ANN model for water demand forecasting. An adaptation of the methodology proposed by Zhang (Neurocomputing 50:159–175, 2003) is used to decompose the time series of 10 reservoirs that supply the Metropolitan Region of Salvador (RMS). The data used are the historical consumption from January 2017 to February 2022, obtained from the local supply company, Empresa Baiana de Águas e Saneamento, and meteorological data obtained from the National Institute of Meteorology of Brazil. The results demonstrated the feasibility of using the proposed model, compared to other traditional models such as the multilayer perceptron (MLP), support vector regression (SVR), long short-term memory (LSTM) and autoregressive integrated moving average (ARIMA).

... The water balance for developing 900 ha of land is obtained by subtracting water demand from the water availability discharge: a positive result is a surplus and a negative result is a deficit (Herrera et al., 2010). From Figure 5 below, it can be concluded that scenario 1 has a water deficit in October 1 and October 2. Scenario 2 has no water deficit month; all months have a surplus of water, meaning that the availability of water is sufficient to meet the demand. ...

In the Tabalong Regency, South Kalimantan Province, many irrigation networks do not work optimally. The Jaro Irrigation Area is one of these networks, and the majority of its available fields are planted with paddy. Evaluating the irrigation water supplied to paddy fields is very important for optimal growth and development of paddy. The objective of this research is to evaluate the water balance so that the potential area for paddy field development can be obtained by applying the optimal crop pattern. The rainfall data used in this study were analysed statistically for validation and correction. Rainfall data were obtained from the Tropical Rainfall Measuring Mission (TRMM) and the Meteorology, Climatology, and Geophysics Agency (BMKG) Jaro Station for the period 2013-2019, and statistical analysis of these data yielded a correlation coefficient and a regression equation. The regression equation is used to obtain the corrected rainfall value used in the hydrological analysis. Water requirements are analysed for several cropping pattern scenarios. Evapotranspiration is calculated using the Modified Penman method. The F.J. Mock method is used to generate the discharge values. The 80% dependable discharge is used to estimate water availability. The water balance is then evaluated for each scenario to obtain the surplus or deficit condition for each month. The analysis of water demand discharge is divided into three planting scenarios. The first scenario is high-yield paddy - high-yield paddy, according to the existing conditions at the research location. The second scenario is high-yield paddy - high-yield paddy - beans, and the third scenario is high-yield paddy - paddy.
According to the results of the water balance evaluation for the three scenarios, the potential area can be increased from 850 ha to up to 900 ha with the crop pattern chosen in scenario 2.
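The monthly water balance described above (availability minus demand, with a positive result read as a surplus and a negative one as a deficit) can be sketched as follows. The discharge values are hypothetical placeholders, not figures from the study, and the handling of an exactly zero balance is an assumption.

```python
def water_balance(availability, demand):
    """Return (month, balance, status) for each month.

    balance = available discharge - demand discharge.
    Assumption: a balance of exactly zero is reported as "balanced",
    since the source only defines the positive and negative cases.
    """
    report = []
    for month in availability:
        balance = availability[month] - demand[month]
        if balance > 0:
            status = "surplus"
        elif balance < 0:
            status = "deficit"
        else:
            status = "balanced"
        report.append((month, balance, status))
    return report


# Hypothetical discharges in m3/s, for illustration only.
availability = {"Sep": 1.20, "Oct": 0.80}
demand = {"Sep": 0.90, "Oct": 1.00}
report = water_balance(availability, demand)
for month, balance, status in report:
    print(month, round(balance, 2), status)
```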

... ML is gaining traction in water-related applications, with recent studies having developed and applied algorithms for topics including leak detection in pipes (Mounce et al. 2010; Romano et al. 2014; Carreño-Alvarado et al. 2017), water demand forecasting (Herrera et al. 2010; Xenochristou et al. 2021), wastewater treatment plant operations (Dairi et al. 2019; Mamandipoor et al. 2020; Xu et al. 2021), sewer overflow predictions (Mounce et al. 2014; Rosin et al. 2022), prediction of chlorine decay at consumers' taps (Gibbs et al. 2006), prediction of indicator microorganisms in drinking water supply (Mohammed et al. 2017), and prediction of water quality events in DWDS using sensors (Vries et al. 2016; Fellini et al. 2018; Garcia et al. 2020). The aforementioned studies generally cover a single application but collectively demonstrate the potential for ML techniques to provide value to water utility operations. ...

Water utilities collect vast amounts of data, but they are stored and utilised in silos. Machine learning (ML) techniques offer the potential to gain deeper insight from such data. We set out a Big Data framework that for the first time enables a structured approach to systematically progress through data storage, integration, analysis, and visualisation, with applications shown for drinking water quality. A novel process for the selection of the appropriate ML method, driven by the insight required and the available data, is presented. Case studies for a water utility supplying 5.5 million people validate the framework and provide examples of its use to derive actionable information from data to help ensure the delivery of safe drinking water.
HIGHLIGHTS
A four-layer Big Data framework for better water quality management is proposed.
Framework consists of data collection, integration, analysis, and visualisation.
Machine learning method selection tool driven by data availability is included.
Framework yields information for interventions to manage drinking water quality.
Two case studies demonstrate the success of the framework.

... Artificial Neural Networks are machine learning tools that cluster, classify, predict, capture, and focus on the functional relationship in historical water demand data [25,26]. Support Vector Regressions (SVR) have also been widely applied for forecasting water demand [27,28]. Several studies have applied these methods to forecast water demand for the short, medium, and long-term. ...

Drinking water demand modelling and forecasting is a crucial task for sustainable management and planning of water supply systems. Despite many short-term investigations, the medium-term problem needs better exploration, particularly the analysis and assessment of meteorological data for forecasting drinking water demand. This work proposes to analyse the suitability of ERA5-Land reanalysis data as weather input in water demand modelling. A multivariate deep learning model based on the long short-term memory architecture is used in this study over a prediction horizon ranging from seven days to two months. The performance of the model, fed by ground station data and ERA5-Land data, is compared and analysed. Close-to-operative forecasting is then presented using observed data for training and ERA5-Land dataset for testing. The results highlight the reliability of the proposed architecture fed by ERA5-Land data for different time horizons. In particular, the ERA5-Land shows promising performance as input of the multivariate machine learning forecasting model, although some meteorological biases are present, which can be improved, especially in close-to-operative application with bias correction techniques. The proposed study leads to practical implications in the use of regional climate model outputs to support drinking water forecasting for sustainable and efficient management of water distribution systems.

... In this study, the artificial neural network approach proved more efficient. Herrera et al. (2010) ...

Forecasting water demand is fundamental to a region's social and economic development. The literature contains several studies with specific applications; however, the topic still lacks a comprehensive view. Therefore, this article proposes an integrative review of the literature to obtain an overview of the subject (methods, areas of application, objectives, and other factors). Using the Methodi Ordinatio methodology, 74 articles with scientific relevance were selected for analysis, most of them published in the USA, Australia, and the United Kingdom. It was concluded that, among the methods used, artificial neural networks and regression analyses predominate. As for the application, most studies forecast residential demand.

... According to Billings & Jones [18], forecasting water consumption could supply a basis for operational, tactical, and strategic decision-making and can help improve the performance of water distribution systems by anticipating consumption values. Forecasting models generally facilitate understanding water consumption behavior [19], besides helping the development of water-saving strategies [20], energy, and adequate destination for effluents [21]. In addition, understanding and managing water, in an urban context, is considered a critical factor for achieving sustainability [22]. ...

Water has always been associated with life; hence, efficient and appropriate prediction of future water consumption is needed. In this study, the seasonal ARIMA model was used to forecast water consumption in the City of Mati using the monthly consumption in cubic meters (CBM) from January 2006 to July 2021. Specifically, the study is anchored in time-series models, as it looks at past patterns of data and attempts to predict the future based on the underlying patterns contained within those data. Upon diagnostic checking with AIC, SBC, and MAPE, the ARIMA(0, 1, 1)×(0, 0, 2)₁₂ model was found to be the best-fit model for forecasting. The result showed an increase of approximately 4.12% in the average annual water consumption. It further revealed that there will be a nearly 15.48% increase in water consumption for the year 2021 vis-à-vis year 2023.

... It is known that these series have stochastic and nonlinear components, making water demand forecasting a complex issue. In this context, Soft Computing methods such as Fuzzy Logic (Firat et al. 2009a; Ambrosio et al. 2019), Neural Computing (Firat et al. 2009b, 2010; Santos & Pereira Filho 2014; Al-Zahrani & Abo-Monasar 2015; Pacchin et al. 2019), Evolutionary Computation (Bai et al. 2014; Romano & Kapelan 2014; Leon et al. 2020; Shirkoohi et al. 2021), Support Vector Machines (Herrera et al. 2010; Brentan et al. 2016; Ambrosio et al. 2019), Random Forests (Chen et al. 2017; Ambrosio et al. 2019), Long Short-Term Memory (Boudhaouia & Wira 2021), Dual-Scale Deep Belief Network (Xu et al. 2018), Continuous Deep Belief Echo State Network (Xu et al. 2019b) and hybrid models (e.g., Nasseri et al. 2011; Adamowski et al. 2012; Campisi-Pinto et al. 2012; Odan & Reis 2012; Huang et al. 2014, 2021, 2022; Tiwari & Adamowski 2015; Guo et al. 2022; Rajballie et al. 2022) have provided more accurate results for urban water demand forecasting. In general, hybrid models are more robust for water demand forecasting compared to Feed Forward Neural Network (FFNN), Multiple Linear Regression (RLM), Multiple Nonlinear Regression (MNLR) and ARIMA models. ...

The specialized literature on water demand forecasting indicates that successful predicting models are based on soft computing approaches such as neural networks, fuzzy systems, evolutionary computing, support vector machines and hybrid models. However, soft computing models are extremely sensitive to sample size, with limitations for modeling extensive time-series. As an alternative, this work proposes the use of the dynamic time scan forecasting (DTSF) method to predict time-series for water demand in urban supply systems. Such a model scans a time-series looking for patterns similar to the values observed most recently. The values that precede the selected patterns are used to create the prediction using similarity functions. Compared with soft computing approaches, the DTSF method has very low computational complexity and is indicated for large time-series. Results presented here demonstrate that the proposed method provides similar or improved forecast values, compared with soft computing and statistical methods, but with lower computational cost. Thus, its use for online water demand forecasts is favored.
HIGHLIGHTS
Novel analog-based methodology for forecasting in univariate time-series.
A fast time-series forecasting methodology for large data sets.
The great advantage of this data-oriented method is that, given a large amount of data, in general, the performance improves.
The method has very low computational complexity, thus, its use for online water demand forecasts is favored.
There is no best model for predicting daily water demand.

... Among these ANNs, the most commonly used type is the backpropagation neural network (BPNN), where a backpropagation algorithm is used for training (Bougadis et al. 2005). Some previous studies have shown that it is possible to yield fairly accurate forecasts using BPNNs to predict short-term water demand (Adamowski & Karapataki 2010; Herrera et al. 2010). Although BPNNs perform well in some cases, they easily fall into local optimal solutions due to the randomness of the initial weights and thresholds, which results in poor generalization, especially for complex prediction problems. ...

This paper presents a backpropagation neural network (BPNN) approach based on the sparse autoencoder (SAE) for short-term water demand forecasting. In this method, the SAE is used as a feature learning method to extract useful information from hourly water demand data in an unsupervised manner. After that, the extracted information is employed to optimize the initial weights and thresholds of the BPNN. In addition, to enhance the effectiveness of the proposed method, data reconstruction is implemented to create suitable samples for the BPNN, and the early stopping method is employed to overcome the BPNN overfitting problem. Data collected from a real-world water distribution system are used to verify the effectiveness of the proposed method, and a comparison with the BPNN and other BPNN-based methods which integrate the BPNN with particle swarm optimization (PSO) and the mind evolutionary algorithm (MEA), respectively, is conducted. The results show that the proposed method can achieve fairly accurate and stable forecasts with a 2.31% mean absolute percentage error (MAPE) and 320 m³/h root mean squared error (RMSE). Compared with the BPNN, PSO–BPNN and MEA–BPNN models, the proposed method gains MAPE improvements of 5.80, 3.33 and 3.89%, respectively. In terms of the RMSE, promising improvements (i.e., 5.27, 2.73 and 3.33%, respectively) can be obtained.
HIGHLIGHTS
To enhance the performance of the BPNN, the SAE is introduced to extract useful features in an unsupervised manner.
An effective framework which integrates the BPNN with the SAE and early stopping technique is proposed for water demand forecasting.
The proposed method is verified by comparing with the BPNN and similar methods which integrate the BPNN with PSO and the MEA, respectively.

... In other words, each time a tree within a particular forest is split, the RF method chooses a random subset of the independent variables. Both the predictive accuracy and the running time are relatively improved (Herrera et al., 2010; Seo et al., 2018). We refer readers to Breiman (2001) for detailed information on how the RF is mathematically formulated. ...

The community’s well-being and economic livelihoods are heavily influenced by the water level of watersheds. Changes in water levels directly affect the circulation processes of lakes and rivers that control water mixing and bottom sediment resuspension, further affecting water quality and aquatic ecosystems. These considerations make water level monitoring essential for protecting the environment. Machine learning hybrid models are emerging as robust tools that are successfully applied to water level monitoring. Various models have been developed, and selecting the optimal model would be a lengthy procedure. A timely, detailed, and instructive overview of the models’ concepts and historical uses would help researchers avoid overlooking potential models and save significant time on the problem. Thus, recent research on water level prediction using hybrid machines is reviewed in this article to present the “state of the art” on the subject and provide some suggestions on research methodologies and models. This comprehensive study classifies hybrid models into four types: algorithm parameter optimisation-based hybrid models (OBH), pre-processing-based hybrid models (PBH), components combination-based hybrid models (CBH), and hybridisation of parameter optimisation-based with preprocessing-based hybrid models (HOPH); furthermore, it explains the pre-processing of data in detail. Finally, the most popular optimisation methods, future perspectives and conclusions are discussed.

... In recent decades, many studies have attempted to develop these tools for many tasks, for instance, developing digital twins for state estimation (Bonilla et al. 2022), algorithms for burst and leakage detection (Wu & Liu 2017), data analysis techniques for smart water metering systems (Rahim et al. 2020), modelling water demand (House-Peters & Chang 2011), intrusion detection (Mboweni et al. 2021), and many other important tasks. Among the many techniques that can be developed with these data-driven approaches, one of the most important is certainly water demand forecasting (e.g., Herrera et al. 2010; Pacchin et al. 2019; Zanfei et al. 2022a). In general, many applications of forecasting models show how such techniques can improve and optimise the management of water resources. ...

Sustainable management of water resources is a key challenge now and in the future. Water distribution systems have to ensure fresh water for all users in a scenario of increasing demand related to the long-term effects of climate change. In this context, a reliable short-term water demand forecasting model is crucial for the optimal management of water resources. This study proposes a novel deep learning model based on long short-term memory (LSTM) neural networks to forecast hourly water demand. Because LSTM has limitations when multiple input sequences have different time lengths, the proposed deep learning model is developed with two modules that process different temporal sequences of data: a first module aimed at dealing with short-term meteorological information and a second module aimed at representing the longer-term information of the water demand. The proposed dual-module structure allows a multivariate selection of the inputs with sequences of a different time length. The performance of the proposed deep learning model is compared to a conventional multi-layer perceptron (MLP) and a seasonal autoregressive integrated moving average (SARIMA) model in a real case study. The results highlight the potential of the proposed multivariate approach in short-term water demand prediction, outperforming the more conventional approaches.
HIGHLIGHTS
This study proposes a novel short-term water demand forecasting model using multivariate long short-term memory with meteorological data.
The proposed dual-module structure allows processing different temporal sequences of data.
The model is tested on a real case study, outperforming state-of-art methods.
This study highlights the importance of a multivariate approach, especially due to climate change.

... Accurate water demand prediction is the key foundation for building this intelligent system, which allows water utilities to minimize energy consumption costs associated with water pump operations while simultaneously meeting the needs of users, thereby achieving an appropriate balance between water production and consumption (Huang et al, 2021). Water demand forecasting also plays an important role in monitoring work: for example, it can help identify possible leakage when actual demand values deviate markedly from the forecasted water demand (Herrera et al, 2010). Since the 1960s, the issue of water demand forecasting has attracted a growing body of research (Sebri, 2016). ...

As a key component of water distribution management, reliable short-term water demand forecasting plays a fundamental role in the optimal control of water supply. Most reported deep learning approaches ignore the two-way information flow in historical water demand data, and their model inputs cannot automatically highlight the features most relevant to current water demand, which can affect prediction accuracy. The highly nonlinear changes and fluctuations in water demand series also make accurate forecasting a challenging task. To address these problems, this study uses the maximal information coefficient (MIC) for feature extraction analysis and develops a deep learning model based on Attention-BiLSTM networks; to reinforce performance, the model is combined with the XGBoost algorithm as a residual correction module to forecast short-term water demand. Hyper-parameter configuration is carried out for the models. Finally, the superiority of the proposed method is illustrated by comparison with other benchmark models. The results show that the proposed method outperforms the other predictive models in both accuracy and stability.

... RF models have been presented by Breiman (2001) as classical ensemble learning algorithms and have shown to be outstanding predictive models in classification tasks (Herrera et al 2010, James et al 2013). RFs are built using the same fundamental principles as decision trees and bagging (Bootstrap aggregation). ...

Water monitoring in households provides occupants and utilities with key information to support water conservation and efficiency in the residential sector. High costs, intrusiveness, and practical complexity limit appliance-level monitoring via sub-meters on every water-consuming end use in households. Non-intrusive machine learning methods have emerged as promising techniques to analyze observed data collected by a single meter at the inlet of the house and estimate the disaggregated contribution of each water end use. While fine temporal resolution data allow for more accurate end-use disaggregation, there is an inevitable increase in the amount of data that needs to be stored and analyzed. To explore this tradeoff and advance previous studies based on synthetic data, we first collected 1-second resolution indoor water use data from a residential single-point smart water metering system installed at a 4-person household, as well as ground-truth end-use labels based on a water diary recorded over a 4-week study period. Second, we trained a supervised machine learning model (random forest classifier) to classify six water end use categories across different temporal resolutions and two different model calibration scenarios. Finally, we evaluated the results based on three different performance metrics (micro, weighted, and macro F1 scores). Our findings show that data collected at 1- to 5-second intervals allow for better end-use classification (weighted F-score higher than 0.85), particularly for toilet events; however, certain water end uses (e.g., shower and washing machine events) can still be predicted with acceptable accuracy even at coarser resolutions, up to 1 minute, provided that these end use categories are well represented in the training dataset. Overall, our study provides insights for further water sustainability research and widespread deployment of smart water meters.
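The three F1 variants named in the abstract above differ only in how per-class results are aggregated. The following is a minimal sketch using the standard definitions; the function name `f1_scores` and the tiny label sequences are hypothetical and are not data from the study.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return micro, macro and weighted F1 for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)          # number of true events per class
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1                 # predicted class gains a false positive
            fn[t] += 1                 # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    return {
        # micro: pool all counts, then compute one F1
        "micro": f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        # macro: unweighted mean of per-class F1
        "macro": sum(per_class.values()) / len(labels),
        # weighted: per-class F1 weighted by class support
        "weighted": sum(per_class[c] * support[c] for c in labels) / len(y_true),
    }


# Hypothetical end-use labels, for illustration only.
y_true = ["toilet", "toilet", "shower", "faucet"]
y_pred = ["toilet", "shower", "shower", "faucet"]
scores = f1_scores(y_true, y_pred)
print(scores)
```

Macro F1 treats rare end uses the same as frequent ones, while weighted F1 is dominated by the most common categories, which is one reason to report all three.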

... Research aimed at developing new data-driven strategies for improving water management has blossomed in the last few years. Among a myriad of different applications, notable examples are the modeling of water demand (e.g., House-Peters and Chang, 2011), data analysis techniques for smart water metering systems (e.g., Rahim et al., 2020; Nguyen et al., 2018), intrusion detection methods (e.g., Mboweni et al., 2021) and water demand forecasting (e.g., Herrera et al., 2010; Brentan et al., 2017). All these research approaches have in common the aim of helping water utilities in efficient WDS management. ...

Sustainable management of water resources is a key challenge for the well-being and security of current and future society worldwide. In this regard, water utilities have to ensure fresh water for all users in a demand scenario stressed by climate change along with the increase in the size of cities. Dealing with anomalies, such as leakages and pipe bursts, represents one of the major issues for efficient water distribution system (WDS) operation and management. To this end, it is crucial to count on suitable methods and technologies to provide a quick, reliable, and accurate detection of such anomalies and supply disruption events. Therefore, this work proposes a novel WDS management framework based on the development of graph convolutional neural networks (GCN) models for bursts detection in WDSs. These methods rely on a WDS graph representation for a set of pressure and flow rates measures. Such a graph is used to design two GCN-based models to identify bursts. In addition, two conventional multi-layer perceptron models are used as the benchmarks to compare the graph-based methodologies. Finally, the proposed methodology is tested on a water utility network, showing the high potential of graph convolutional networks for anomaly detection on WDSs.

... Accuracy metrics that are often used in the water demand forecasting literature are Mean Absolute Error (MAE) (Antunes et al., 2018; Dos Santos & Pereira, 2014; Herrera et al., 2010; Kofinas et al., 2014; Shabani et al., 2016), Mean Absolute Percentage Error (MAPE) (Bai et al., 2014; Candelieri et al., 2015; Kofinas et al., 2014; Tiwari et al., 2016), Root Mean Square Error (RMSE) (Dos Santos & Pereira, 2014; Kofinas et al., 2014; Shabani et al., 2016; Tiwari et al., 2016), and the R² coefficient of determination (Babel et al., 2007; Bakker et al., 2014; Dos Santos & Pereira, 2014; Haque et al., 2014; Kofinas et al., 2014; Shabani et al., 2016; Tiwari et al., 2016). ...
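For reference, the four metrics listed above can be computed as in the following minimal sketch. It uses the standard textbook definitions; the function names and the observed and forecast values are hypothetical, and MAPE as written assumes that no observed value is zero.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent (assumes no zero in y_true)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical hourly demand values (e.g. m3/h), for illustration only.
observed = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
print(mae(observed, forecast), mape(observed, forecast),
      rmse(observed, forecast), r2(observed, forecast))
```

MAE and RMSE are expressed in the units of the demand series, whereas MAPE and R² are scale-free, which makes the latter two easier to compare across networks of different sizes.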

Analytics can support numerous aspects of water industry planning, management, and operations. Given this wide range of touchpoints and applications, it is becoming increasingly imperative that the championship and capability of broad-based analytics needs to be developed and practically integrated to address the current and transitional challenges facing the drinking water industry. Analytics will contribute substantially to future efforts to provide innovative solutions that make the water industry more sustainable and resilient.
The purpose of this book is to introduce analytics to practicing water engineers so they can deploy the covered subjects, approaches, and detailed techniques in their daily operations, management, and decision-making processes. Also, undergraduate students as well as early graduate students who are in the water concentrations will be exposed to established analytical techniques, along with many methods that are currently considered to be new or emerging/maturing.
This book covers a broad spectrum of water industry analytics topics in an easy-to-follow manner. The overall background and contexts are motivated by (and directly drawn from) actual water utility projects that the authors have worked on in recent years. The authors strongly believe that the water industry should embrace and integrate data-driven fundamentals and methods into their daily operations and decision-making process(es) to replace established “rule-of-thumb” and weak heuristic approaches, and that an analytics viewpoint, approach, and culture is key to this industry transformation.
ISBN: 9781789062373 (paperback)
ISBN: 9781789062380 (eBook)
ISBN: 9781789062397 (ePub)

... An emerging type of machine learning technique which utilizes ensembles of regressions is receiving heightened interest in many fields of knowledge (Hansen and Salamon, 1990; Steele, 2000; Sesnie et al., 2008). An ensemble learning tool called Random Forest (RF) is increasingly being applied in fields related to the environment and water resources (Herrera et al., 2010; Loos and Elsenbeer, 2011; McGinnis and Kerans, 2012; Rodriguez-Galiano et al., 2012). RF offers a new approach to the problem of vulnerability mapping, as it is relatively robust to outliers and it can overcome the "black-box" limitations of artificial neural networks by assessing the relative importance of the variables, selecting the most important variables (features) and reducing dimensionality. ...

The potential groundwater pollution zones within the bitumen-impregnated Agbabu community, an agglomerated town in Odigbo Local Government, were mapped using electrical resistivity methods with machine learning regression to produce a potential vulnerability map of the area. The electrical resistivity survey, consisting of 2D Wenner imaging and vertical electrical sounding (VES) techniques, was conducted along the established traverses. Water samples from wells within the area were subsequently collected for physicochemical studies. Twenty (20) VES data sets were acquired at different locations. Results of the depth sounding showed that KQ and HA were the dominant type-curves, constituting about 90% of the curve types obtained. Four to five geo-electric/geologic layers were delineated, with the bitumen-impregnated layer found within the third and fourth layers, with resistivities ranging from 86 to 255 Ωm. Results from the 2D Wenner imaging and geoelectric sequencing showed that the underlying bitumen-impregnated layer was overlain by a protective clay layer, which has some discontinuities and weak zones in Traverses 1 and 4, indicative of the most vulnerable zone(s) in terms of groundwater pollution, while Traverses 2 and 3 are less vulnerable. Five (5) geophysically derived independent variables, which represent intrinsic properties of the groundwater quality, were computed, mapped and fed into the Random Forest machine learning tool. These variables were permuted and the effect on the out-of-bag classification was measured. Longitudinal conductance was ranked highest, with a variable importance of 0.51. The model used 75% of the input dataset for training and 25% for testing; the model trained on this subset had a mean square error (MSE) of 0.0982. The resulting predicted values ranged from 0.7251 to 2.83836.
The predicted vulnerability map showed that the northern and central parts, trending north-east, were predicted to be the most vulnerable regions, while the southern part of the study area was predicted to be the least vulnerable.
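
The 75%/25% split, the MSE score, and the idea of ranking variables by permuting them can be illustrated with a minimal standard-library sketch. It is not the authors' Random Forest workflow: the helper names, the seed, and the use of a supplied data set instead of out-of-bag samples are all assumptions made for illustration.

```python
import random


def train_test_split(rows, train_fraction=0.75, seed=0):
    """Shuffle the rows and split them into training and test subsets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, column, seed=0):
    """Increase in MSE after shuffling one feature column.

    `predict` is any fitted model exposed as a function of one feature row
    (hypothetical interface; a Random Forest would be plugged in here).
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(x) for x in X])
    shuffled_col = [x[column] for x in X]
    rng.shuffle(shuffled_col)
    X_perm = [x[:column] + [v] + x[column + 1:] for x, v in zip(X, shuffled_col)]
    permuted = mean_squared_error(y, [predict(x) for x in X_perm])
    return permuted - baseline
```

A feature the model never uses gets an importance of exactly zero under this definition, while a feature the model depends on gets a non-negative value that grows with its influence.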

... They can learn the underlying and deep relationship between the NWD and related variables from historical data, and this relationship is usually implicit before and after modeling. Common machine learning algorithms include Artificial Neural Network (ANN), Support Vector Machine (SVM), Random Forest (RF), etc., which have been applied in water demand prediction and have good performance (Herrera et al. 2010). In addition to choosing two common machine learning algorithms, ANN and RF, this study also adopts another effective machine learning algorithm, extreme gradient boosting (XGBoost), to establish the NWD prediction model. ...

The real-time hydraulic model (RTHM) is a key assistive tool in water distribution system (WDS) management, and its performance directly affects assisted decision-making. This study develops a framework to improve the timeliness and accuracy of RTHMs, which includes the following five steps: flow data processing, establishing nodal water demand (NWD) prediction models, node grouping, data assimilation (DA) and uncertainty analysis. Based on actual network data, the performance of two data processing methods and of three machine learning algorithms is compared, and the best of each is selected for modeling. In establishing the hourly NWD prediction models, massive data are used, including flow measurements and data on all 26 input variables covering climate, time and social influencing factors. It is found that the time features are the most important model inputs. Application results for an actual network show that the flow data processing method, accurate NWD prediction, node grouping and the Kalman filter-based DA method reduce the uncertainty in the RTHM and improve its timeliness and accuracy, so as to obtain a real-time state estimate of the WDS. Accurate NWD estimation (especially in the high-demand period) and combining the RTHM with DA have a great influence on reducing the uncertainty of water pressure estimation, although uncertainty is weakened in the propagation process.
HIGHLIGHTS
A framework to improve the timeliness and accuracy of real-time hydraulic models was established, helping to accurately estimate the real-time status of the water distribution system.
Reliable nodal water demand prediction models were established, with hourly-level accuracy.
The obtained results improve the understanding of the propagation process and reduction methods of the uncertainty of the WDS hydraulic models.
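
The Kalman filter-based data assimilation step can be illustrated with the textbook scalar measurement update, in which a predicted demand (the prior) is corrected by a flow measurement. This is only a sketch under stated assumptions: the study's actual state vector, grouping, and noise covariances are not given here, and the function and argument names are hypothetical.

```python
def kalman_update(prior_mean, prior_var, measurement, meas_var):
    """Scalar Kalman measurement update.

    prior_mean, prior_var: predicted demand and its variance (assumed inputs)
    measurement, meas_var: observed flow and its measurement-noise variance
    Returns the posterior mean and variance.
    """
    gain = prior_var / (prior_var + meas_var)  # weight given to the measurement
    post_mean = prior_mean + gain * (measurement - prior_mean)
    post_var = (1.0 - gain) * prior_var  # posterior variance never exceeds the prior
    return post_mean, post_var
```

When the prior and the measurement are equally uncertain, the gain is 0.5 and the posterior mean sits halfway between them, with half the prior variance. This is the sense in which assimilation reduces uncertainty in the state estimate.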

... They evaluated the artificial neural network (ANN) and ARIMA models for predicting potential water demand and found that the ANN-based prediction model outperformed the ARIMA model. Furthermore, Manuel et al. [20] stated that it is crucial to predict the water consumption of consumers to provide an efficient water supply, and they used ANN, RF, and support vector regression (SVR) algorithms to predict the water consumption of cities in southeastern Spain. SVR had the best predictive performance among the three machine learning algorithms. ...

The importance of efficient water resource supply has been acknowledged, and it is essential to predict short-term future water consumption. Recently, it has become possible to obtain data on water consumption at the household level through smart water meters. The pattern of these data is nonlinear due to various factors related to human activities, such as holidays and weather. However, it is difficult to accurately predict household water consumption with a nonlinear pattern using the autoregressive integrated moving average (ARIMA) model, a traditional time series prediction model. Thus, this study used a deep learning-based long short-term memory (LSTM) approach to develop a water consumption prediction model for each customer. The proposed model considers several variables to learn nonlinear water consumption patterns. We developed an ARIMA model and an LSTM model on the training dataset for customers with four different water-use types (detached house, apartment, restaurant, and elementary school). The performance of the two models was evaluated using a test dataset that was not used for model learning. The LSTM model outperformed the ARIMA model in all households (correlation coefficient: mean 89%; root mean square error: mean 5.60 m³). Therefore, the proposed model is expected to predict customer-specific water consumption at the household level according to the type of use.

... Smolak et al. (2020) compared the performance of water consumption forecasting using SVR and associated machine learning techniques. Several research papers have compared the water consumption forecasting efficiency of conventional regression models and several ANN models (Adamowski and Karapataki 2010, Herrera et al. 2010, Abba et al. 2020b). Many previous studies suggested that the robust ANN technique was superior to all conventional models (Mouatadid and Adamowski 2016, Guo et al. 2018). ...

Water consumption is strongly affected by numerous factors, such as population, climatic, geographic, and socio-economic factors. Therefore, implementing a reliable predictive model of water consumption patterns is a challenging task. This study investigates the performance of predictive models based on multi-layer perceptron (MLP), multiple linear regression (MLR), and support vector regression (SVR). To understand the significant factors affecting water consumption, the stepwise regression (SW) procedure is used in MLR to obtain suitable variables. This study then also implements three predictive models based on these significant variables (SWMLR, SWMLP, and SWSVR). Annual data on water consumption in Thailand during 2006-2015 were compiled and categorized by province and distributor. Among the models using all variables, the results demonstrate that the MLP models outperformed the MLR and SVR models. Among the models using selected variables, the predictive capability of SWMLP was superior to that of SWMLR and SWSVR. The SWMLP therefore still provided satisfactory results with the minimum number of explanatory variables, which in turn reduced the computation time and other resources required for the predictive task. It can be concluded that the MLP exhibited the best result and can be used as a reliable water demand predictive model in both the all-variables and selected-variables cases. These findings provide a feasible water consumption predictive model that can be used in water resources management to produce sufficient tap water to meet the demand in each province of Thailand.

Since soft computing has gained a lot of attention in hydrological studies, this study focuses on predicting aeration efficiency (E20) using circular plunging jets employing soft computing techniques such as reduced error pruning tree (REPTree), random forest (RF), and M5P. The study undertaken required the development and validation of models, which were achieved using 63 experimental data values with input variables, such as angle of inclination of tilt channel (α), number of plunging jets (JN), discharge of each jet (Q), hydraulic radius of each jet (HR), and Froude number (Fr. No), to evaluate the aeration efficiency (E20), which served as the output variable. To evaluate the effectiveness of the developed models, three different statistical indices were used such as the coefficient of correlation (CC), root-mean-square error (RMSE), and mean absolute error (MAE), and it was found that all of the applied techniques possessed good forecasting ability since their correlation coefficient values were greater than 0.8. Upon testing, it was discovered that the M5P model outperformed other soft computing-based models in its ability to predict E20, as demonstrated by its correlation coefficient value of 0.9564 and notably low values of MAE (0.0143) and RMSE (0.0193).
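
The three statistical indices named above (CC, RMSE, MAE) have standard definitions, sketched below with the Python standard library. The function names are illustrative and the inputs are assumed to be equal-length lists of observed and predicted values.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def correlation(observed, predicted):
    """Pearson coefficient of correlation (CC) between the two series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)
```

CC alone can be misleading: predictions that are offset by a constant still correlate perfectly with the observations, which is why it is usually reported together with RMSE and MAE.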

The definition of epidemiology given in the dictionary of the Real Academia Española states that it is a science and, as such, has the elements proper to a systematized body of knowledge, among which methodology stands out: the discipline that analyzes the procedures used on the object of study.
Since the emergence of human beings, history shows how water has been used for supply, both for sustenance and health and at the industrial level. Likewise, one can trace the evolution of epidemiology in analyzing diseases that are transmitted mainly through contaminated water, and how its methodology has made advances so significant that they make it possible to predict the behavior of pathogens and mitigate the consequences of a possible contagion.
The development of epidemiology has centered mainly on medicine, where it has shown enormous progress in facing major challenges such as the spread of infectious diseases and the continual rethinking of analysis models. Nevertheless, this science can be adapted to any area of human knowledge, such as water resources management and, more specifically, the management of urban water supply networks.
Commissioning a water supply network in a city, whose dimensions and construction are generally monumental, involves a design and an operability that arise from applying mathematical and/or statistical models, which make it possible to analyze the different operating conditions before works begin. That behavior can be characterized using solution methods based on epidemiological procedures that have been widely validated, both empirically and functionally.
Every supply network has two components that are both independent and interdependent: demand management and failure management. Both involve uncertainties that generally stem from external, random variables, which make them difficult to quantify and therefore to predict. For demand management, it is important to apply accurate demand estimation models, since they make it possible to determine the capacities and loads the network can bear. Likewise, for failure management in networks, accurate estimation models are important to help mitigate the impact of contingencies caused by cascading failures and their propagation up to a possible collapse.
Demand management processes have mainly relied on time series models, extending to models that involve the SAX algorithm. In failure management processes, methods such as survival analysis and, more recently, neural networks have been applied, extending to multi-agent systems with the SIR, SIRS and SEIR models.
The development of SAX models can be seen in a case study of the city of Franca, Brazil, which combines similarity patterns between sectors with MINDIST patterns that support the predictive methods, improving their accuracy and making it easier to detect abnormal readings in flow meters and even the presence of unexpected uses or leaks.
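
To make SAX and MINDIST concrete, the sketch below converts a demand series into a short symbolic word and computes the MINDIST lower bound between two words. It follows the standard SAX definitions and is not the thesis implementation: the alphabet size of 4, the function names, and the requirement that the series length be a multiple of the word length are assumptions. A constant series (zero standard deviation) is not handled.

```python
import math
import statistics

# Breakpoints that cut N(0, 1) into 4 equiprobable regions (alphabet size 4).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]


def znormalize(series):
    """Rescale the series to zero mean and unit standard deviation."""
    mu = statistics.fmean(series)
    sigma = statistics.pstdev(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation: mean of each equal-length segment."""
    size = len(series) // segments
    return [statistics.fmean(series[i * size:(i + 1) * size]) for i in range(segments)]


def sax_word(series, segments):
    """Map each PAA value to a symbol index 0..3 according to the breakpoints."""
    return [sum(value > b for b in BREAKPOINTS) for value in paa(znormalize(series), segments)]


def mindist(word_a, word_b, original_length):
    """MINDIST between two SAX words of equal length."""
    total = 0.0
    for a, b in zip(word_a, word_b):
        if abs(a - b) > 1:  # adjacent or equal symbols contribute zero
            gap = BREAKPOINTS[max(a, b) - 1] - BREAKPOINTS[min(a, b)]
            total += gap ** 2
    return math.sqrt(original_length / len(word_a)) * math.sqrt(total)
```

Two sectors with similar daily demand shapes produce identical or neighbouring words and therefore a MINDIST of zero, whereas opposite shapes produce a large distance; an unusually large distance from a sector's usual pattern is one way abnormal readings could be flagged.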
The Agent-Based Model (ABM) can be developed with, for example, the NetLogo tool, and its application to a supply network is very effective for determining the behavior of possible cascading failures. An example is the case study of the city of Coro, Venezuela, in which moments can be established for each behavior: susceptibility, infection (failure) and recovery, thus providing an improved predictive model for this type of situation.
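
The three states named above (susceptible, failed, recovered) correspond to the compartments of a classical SIR model. The sketch below is the aggregate, discrete-time SIR recursion, not the agent-based NetLogo model of the case study; the rates `beta` and `gamma` and all names are hypothetical parameters chosen for illustration.

```python
def simulate_sir(susceptible, failed, recovered, beta, gamma, steps):
    """Discrete-time SIR recursion for network elements.

    beta:  failure-propagation rate per step (assumed, not from the source)
    gamma: repair (recovery) rate per step (assumed, not from the source)
    Returns a list of (susceptible, failed, recovered) tuples, one per step.
    """
    total = susceptible + failed + recovered
    s, i, r = susceptible, failed, recovered
    history = [(s, i, r)]
    for _ in range(steps):
        new_failures = beta * s * i / total  # susceptible elements that fail
        new_repairs = gamma * i              # failed elements that are repaired
        s -= new_failures
        i += new_failures - new_repairs
        r += new_repairs
        history.append((s, i, r))
    return history
```

An agent-based version would track each pipe or node individually and let failures spread only along network connections; the recursion above ignores topology and only shows how the three compartments exchange elements over time.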

Accurate prediction of future water demand is desirable in both the design and operation of water distribution networks (WDNs). While long-term forecasting is helpful in planning and designing the system, short- to medium-term forecasting contributes to better operation, maintenance practices and calibration of the system. Automation in WDNs has placed emphasis on more accurate prediction of daily and weekly demands for online scheduling of pump operations and valve adjustments. In this paper, a deep learning algorithm called a recurrent neural network (RNN), using a long short-term memory (LSTM) layer, has been used to train the model and forecast hourly water demand for a city in Spain. The performance of the LSTM model is compared with some statistical hybrid models, such as ensemble empirical mode decomposition (EEMD) with difference pattern sequence forecasting (DPSF) (EEMD-DPSF), and EEMD with DPSF and autoregressive integrated moving average (ARIMA) (EEMD-DPSF-ARIMA), by means of root mean squared error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). Results of this study show that the LSTM-based model makes more accurate predictions than the other models compared when dealing with data with higher time resolutions, data points with abrupt changes, and data with a relatively high uncertainty level. It is also observed that, with respect to EEMD-DPSF, LSTM-based models provide better performance in predicting multiple successive water demands.
Keywords: Water demand prediction; LSTM; Soft computing approach; Errors

Deterioration of groundwater quality is a long-term process that leads to persistent groundwater vulnerability. The present work was carried out in Murshidabad District, West Bengal, India, to assess groundwater vulnerability due to elevated arsenic (As) and other heavy metal contamination in this area. The geographic distribution of arsenic and other heavy metals, together with physicochemical parameters of groundwater (in both the pre-monsoon and post-monsoon seasons) and different physical factors, was analyzed. GIS-based machine learning models, namely support vector machine (SVM), random forest (RF) and support vector regression (SVR), were used for this study. Results revealed that the concentration of groundwater arsenic ranges from 0.093 to 0.448 mg/L in the pre-monsoon season and from 0.078 to 0.539 mg/L in the post-monsoon season throughout the district, which indicates that all water samples from Murshidabad District exceed the WHO's permissible limit (0.01 mg/L). The model outcomes give area under the curve (AUC) values for SVR, RF and SVM of 0.923, 0.901 and 0.897 (training datasets) and 0.910, 0.899 and 0.891 (validation datasets), respectively. Hence, the support vector regression model is best suited to predicting the arsenic-vulnerable zones of Murshidabad District. In addition, groundwater flow paths and arsenic transport were assessed with a three-dimensional transport model (MODPATH). The particle discharge trends clearly revealed that the Holocene-age aquifers are a greater contributor of As than the Pleistocene-age aquifers, and this may be the main cause of As vulnerability in both the northeast and southwest parts of Murshidabad District. Therefore, special attention should be paid to the predicted vulnerable areas to safeguard public health. Moreover, this study can help to build a proper framework for sustainable groundwater management.

An authentic water consumption forecast is an auxiliary tool to support the management of the water supply and demand in urban areas. Providing a highly accurate forecasting model depends a lot on the quality of the input data. Despite the advancement of technology, water consumption in some places is still recorded by operators, so its database usually has some approximate and incomplete data. For this reason, the methods used to predict the water demand should be able to handle the drawbacks caused by the uncertainty in the dataset. In this regard, a structured hybrid approach was designed to cluster the customers and predict their water demand according to the uncertainty in the dataset. First, a fuzzy-based algorithm consisting of Forward-Filling, Backward-Filling, and Mean methods was innovatively proposed to impute the missing data. Then, a multi-dimensional time series k-means clustering technique was developed to group the consumers based on their consumption behavior, for which the missing data were estimated with fuzzy numbers. Finally, one forecasting model inspired by Long Short-Term Memory (LSTM) networks was adjusted for each cluster to predict the monthly water demand using the lagged demand and the temperature. This approach was implemented on the water time series of the residential consumers in Yazd, Iran, from January 2011 to November 2020. Based on the performance evaluation in terms of the Root Mean Squared Error (RMSE), the proposed approach had an acceptable level of confidence to predict the water demand of all the clusters.
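
The three crisp fills that the imputation step builds on (forward filling, backward filling, and the mean) can be sketched as follows. How the study combines them into fuzzy numbers is not described here, so the `(low, middle, high)` triple returned by `candidate_estimates` is only an assumed illustration of the idea, and all names are hypothetical. Missing readings are represented by `None`.

```python
def forward_fill(values):
    """Replace each missing value with the last observed value before it."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def backward_fill(values):
    """Replace each missing value with the next observed value after it."""
    return forward_fill(values[::-1])[::-1]


def mean_fill(values):
    """Replace each missing value with the mean of the observed values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def candidate_estimates(values):
    """Return a (low, middle, high) triple per reading (assumed combination).

    Observed readings yield (v, v, v); gaps yield the spread of the
    available fills, with the middle taken as their average.
    """
    ff, bf, mf = forward_fill(values), backward_fill(values), mean_fill(values)
    triples = []
    for i, v in enumerate(values):
        if v is not None:
            triples.append((v, v, v))
            continue
        options = [c for c in (ff[i], bf[i], mf[i]) if c is not None]
        triples.append((min(options), sum(options) / len(options), max(options)))
    return triples
```

For a monthly record such as `[10, None, 14, None]`, the interior gap is bracketed by its neighbours, while the trailing gap has no backward fill and falls back on the forward fill and the mean.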

Effective water quality management and reliable environmental modeling depend on the availability, size, and quality of water quality (WQ) data. Observed stream water quality data are usually sparse in both time and space. Reconstruction of water quality time series using surrogate variables such as streamflow has been used to evaluate risk metrics such as reliability, resilience, vulnerability, and watershed health (WH), but only at gauged locations. Estimating these indices for ungauged watersheds has not been attempted because of the high-dimensional nature of the potential predictor space. In this study, machine learning (ML) models, namely random forest regression, AdaBoost, gradient boosting machines, and Bayesian ridge regression (along with an ensemble model), were evaluated to predict watershed health and other risk metrics at ungauged hydrologic unit code 10 (HUC-10) basins using watershed attributes, long-term climate data, soil data, land use and land cover data, fertilizer sales data, and geographic information as predictor variables. These ML models were tested over the Upper Mississippi River Basin, the Ohio River Basin, and the Maumee River Basin for water quality constituents such as suspended sediment concentration, nitrogen, and phosphorus. Random forest, AdaBoost, and gradient boosting regressors typically showed a coefficient of determination R² > 0.8 for suspended sediment concentration and nitrogen during the testing stage, while the ensemble model exhibited R² > 0.95. Watershed health values with respect to suspended sediments and nitrogen predicted by all ML models including the ensemble model were lower for areas with larger agricultural land use, moderate for areas with predominant urban land use, and higher for forested areas; the trained ML models adequately predicted WH in ungauged basins. 
However, low WH values (with respect to phosphorus) were predicted at some basins in the Upper Mississippi River Basin that had dominant forest land use. Results suggest that the proposed ML models provide robust estimates at ungauged locations when sufficient training data are available for a WQ constituent. ML models may be used as quick screening tools by decision makers and water quality monitoring agencies for identifying critical source areas or hotspots with respect to different water quality constituents, even for ungauged watersheds.

Meeting customer needs in a timely manner has a significant impact on customer satisfaction. For this reason, the planning process strongly influences the success of sales activities, and the success of the planning process in turn depends on sales forecasts. Sales forecasting estimates the quantity required to meet customer needs. It helps in determining sales targets, as campaigns, pricing, brand and product communication, and distribution channels are incorporated in the sales forecast. In this paper, we use regression and artificial neural networks to predict automobile sales in Turkey. The performance of regression is compared with that of an artificial neural network to show which method predicts better. Automobile sales in Turkey were then predicted and compared with the actual sales for 2020. The best-performing method can then be used to forecast automobile sales in Turkey.

Water resources are vital to the energy conversion process, but few efforts have been devoted to the joint optimization problem, which is fundamentally critical to the water-energy nexus for small-scale or remote energy systems (e.g., energy hubs). Traditional water and energy trading mechanisms depend on centralized authorities and cannot preserve security and privacy effectively. Also, their transaction process cannot be verified and is subject to easy tampering and frequent exposure to cyberattacks, forgery, and network failures. To that end, water-energy hubs (WEHs) offer a promising way to analyse the water-energy nexus for greater resource utilization efficiency. We propose a two-stage blockchain-based transactive management method for multiple, interconnected WEHs. Our method considers peer-to-peer (P2P) trading and demand response, and leverages blockchain to create a secure trading environment. It features auditing and resource transaction record management via system aggregators enabled by a consortium blockchain, and entails spatial-temporal distributionally robust optimization (DRO) for renewable generation and load uncertainties. A spatial-temporal ambiguity set is incorporated in DRO to characterize the spatial-temporal dependencies of the uncertainties in distributed renewable generation and load demand. We conduct a simulation-based evaluation that includes robust optimization and the moment-based DRO as benchmarks. The results reveal that our method is consistently more effective than both benchmarks. Key findings include i) our method reduces conservativeness with lower WEH trading and operation costs, and achieves important performance improvements of up to 6.1%; and ii) our method is efficient and requires 18.7% less computational time than the moment-based DRO. 
Overall, this study contributes to the extant literature by proposing a novel two-stage blockchain-based WEH transaction method, developing a realistic spatial-temporal ambiguity set to effectively hedge against the uncertainties in distributed renewable generation and load demand, and producing empirical evidence suggesting its greater effectiveness and value compared with several prevalent methods.

Water scarcity has urged the need for adequate water demand forecasting to facilitate efficient planning of municipal infrastructure. However, the development of water consumption models is challenged by the rapid environmental and socio-economic changes, particularly during unforeseen events like the COVID-19 pandemic. This study investigated the impact of COVID-19 on the efficiency of water demand prediction models, considering the lockdown measures and various exogenous features, such as previous consumption (PC) and socio-demographic (SDF), seasonal (SF), and climatic (CF) factors. Multiple ensemble models, gradient-boosting machines (GBM), extreme-gradient-boosting (XGB), light-gradient-boosting, random forest (RF), and stack regressor (STK) were examined, compared to other machine-learning techniques, multiple-linear regression (MLR), decision trees, and neural networks. The models were tested using 3-year metering records for 128,000 consumers in Dubai. The feature importance analysis indicated that PC and SDF had a significant impact on consumption rates with correlation coefficients of 0.95 and 0.74, respectively, as opposed to SF and CF, which had negligible effect. The results showed that, before COVID, RF and STK outperformed other models with a coefficient-of-determination (R²) and root-mean-squared-error (RMSE) of 0.928 and 0.039, followed by XGB at 0.923 and 0.041, respectively. However, MLR achieved the highest prediction accuracy amid COVID with R² and RMSE of 0.90 and 0.05, followed by GBM and XGB equally at 0.83 and 0.06, respectively. An ensemble-based error prediction model was applied, resulting in up to 9.2% improvement in predictions. Overall, this research emphasized the efficiency of ensemble models in handling fluctuating data with a high degree of nonlinearity.

This paper presents the development of hybrid machine learning models to forecast the natural flows of water bodies. Five models were considered under the analysis: Extreme Gradient Boosting (XGB), Extreme Learning Machines (ELM), Support Vector Regression (SVR), Elastic Net linear model (EN), and Multivariate Adaptive Regression Splines (MARS). Grey Wolf Optimization Algorithm (GWO) optimised all of the models’ internal parameters. A feature selection approach was embedded in the hybrid model to reduce the number of input variables. The hybrid model performed the forecasts considering 1, 3, 5 and 7 days ahead on data collected from Cahora Bassa dam, Mozambique. In the experiments conducted in this paper, XGB outperformed EN, ELM, MARS, and SVR, presenting lower prediction error and uncertainty. The proposed XGB model arises as an alternative to help with flow prediction, which is crucial for hydroelectric power plant activity.

Artificial intelligence (AI) and machine learning (ML) technology are bringing new opportunities in water resources engineering. ML, a subset of AI, is a significant research area contributing to the planning and execution of water resources projects. Still, ML in water resources engineering can explore new applications such as automatic scour detection, flood prediction and mitigation, etc. The challenges faced by researchers in applying ML are mainly due to the acquisition of quality data and the cost of computational resources. This chapter reviews the history of the development of AI and ML algorithms applied in water resources. This chapter also presents a scientometric review of shallow ML algorithms, viz., linear regression, logistic regression, artificial neural network, decision trees, gene expression programming, genetic programming, multigene genetic programming, support vector machines, k-nearest neighbor, k-means clustering algorithm, AdaBoost, random forest, hidden Markov model, spectral clustering, and group method of data handling. This chapter analyzes the articles related to the shallow learning algorithms mentioned above from 1989 to 2022 and their applications in various aspects of water resources engineering.

With the increasing adoption of monitoring technology for Smart Water Grid (SWG) systems, accurate prediction of SWG status is essential for water companies to effectively operate and manage water networks. Although different data-driven predictive techniques have been developed over the last two decades with varying degrees of success, predictive modeling is not widely adopted in practice. The challenges remain in (1) developing accurate and robust models for near real-time applications; and (2) selecting the training data size, model update frequency, and input data size for competent model performance. Therefore, in this paper, a versatile framework is developed by integrating data preprocessing procedures with various statistical methods, machine learning, and deep learning algorithms. It is flexible and accelerated by the latest graphics processing unit computing technology. The case study using real-world monitoring data shows that prediction accuracies of 91% and 98% have been achieved for flow and pressure, respectively.

In this study, a deep learning model is proposed to predict groundwater levels. The model is able to accurately complete the prediction task even when the data utilized are insufficient. The hybrid model that we have developed, CNN-LSTM-ML, uses a combined network structure of convolutional neural networks (CNN) and long short-term memory (LSTM) network to extract the time dependence of groundwater level on meteorological factors, and uses a meta-learning algorithm framework to ensure the network's performance under sample conditions. The study predicts groundwater levels from 66 observation wells in the middle and lower reaches of the Heihe River in arid regions and compares them with other data-driven models. Experiments show that the CNN-LSTM-ML model outperforms other models in terms of prediction accuracy in both the short term (1 month) and long term (12 months). When the training data are reduced by 50%, the MAE of the proposed model is 33.6% lower than that of LSTM. The results of ablation experiments show that the RMSE of CNN-LSTM-ML is 26.5% better than that of the original CNN-LSTM structure. The model provides an effective method for groundwater level prediction and contributes to the sustainable management of water resources in arid regions.

Short-term forecasting of water demand is a crucial process for efficiently managing water supply systems. This paper proposes a novel graph convolutional recurrent neural network (GCRNN) to predict time series of water demand for water supply systems or district metering areas that belong to the same geographical area. The aim is to build a graph-based model able to capture the dependence among the different water demand time series in both spatial and temporal terms. This model is built on a set of different graphs, and its performance is compared to two methods: a state-of-the-art deep long short-term memory (LSTM) neural network and a traditional seasonal autoregressive moving average model. Additionally, the forecasting model is tested under a sensor malfunction. The results show the ability of the GCRNN to produce accurate and reliable forecasts, especially when based on a graph built while accounting for both time-series correlation and spatial criteria. The GCRNN consistently outperforms the LSTM during the fault test, showing its ability to generate a robust prediction for days after a sensor malfunction, given the GCRNN's ability to benefit from the other time series of the graph.

Partially linear models (PLM) are regression models in which the response depends on some covariates linearly but on other covariates nonparametrically. PLMs generalize standard linear regression techniques and are special cases of additive models. This chapter covers the basic results and explains how PLMs are applied in biometric practice. More specifically, we are mainly concerned with least squares estimators of the linear parameter, while the nonparametric part is estimated by, e.g., kernel regression, spline approximation, piecewise polynomial and local polynomial techniques. When the model is heteroscedastic, the variance functions are approximated by weighted least squares estimators. Numerous examples illustrate the implementation in practice. PLMs are defined by $Y = X^{\top}\beta + g(T) + \varepsilon$ (5.1), where $X$ and $T$ are $d$-dimensional and scalar regressors, $\beta$ is a vector of unknown parameters, $g(\cdot)$ an unknown smooth function and $\varepsilon$ an error term with mean zero conditional on $X$ and $T$. The PLM is a special form of the additive regression models of Hastie and Tibshirani (1990) and Stone (1985), which allows easier interpretation of the effect of each variable and may be preferable to a completely nonparametric regression because of the well-known "curse of dimensionality". On the other hand, PLMs are more flexible than the standard linear models since they combine both parametric and nonparametric components. Several methods have been proposed to estimate PLMs. Suppose there are $n$ observations $\{X_i, T_i, Y_i\}_{i=1}^{n}$. Engle, Granger, Rice and Weiss (1986), Heckman (1986) and Rice (1986) used spline smoothing and defined estimators of $\beta$ and $g$ as the solution of $\arg\min_{\beta, g} \frac{1}{n} \sum_{i=1}^{n} \{Y_i - X_i^{\top}\beta - g(T_i)\}^2 + \lambda \int \{g''(u)\}^2 \, du$ (5.2). Speckman (1988) estimated the nonparametric component by $W\gamma$, where $W$ is an $(n \times q)$-matrix of full rank and $\gamma$ is an additional parameter. A PLM may be rewritten in matrix form as $Y = X\beta + W\gamma + \varepsilon$ (5.3). The estimator of $\beta$ based on (5.3) is $\hat{\beta}_S = \{X^{\top}(I - P_W)X\}^{-1}\{X^{\top}(I - P_W)Y\}$ (5.4), where $P_W = W(W^{\top}W)^{-1}W^{\top}$ is a projection matrix and $I$ is a $d$-order identity matrix. Green, Jennison and Seheult (1985) proposed another class of estimates, $\hat{\beta}_{GJS} = \{X^{\top}(I - W_h)X\}^{-1}\{X^{\top}(I - W_h)Y\}$, by replacing $W$ in (5.4) by another smoother operator $W_h$. Chen (1988) proposed a piecewise polynomial to approximate the nonparametric function and then derived the least squares estimator, which has the same form as (5.4). Recently Härdle, Liang and Gao (2000) have systematically summarized the different approaches to PLM estimation. No matter which regression method is used for the nonparametric part, the estimators of $\beta$ may always be written in the form $\{X^{\top}(I - W)X\}^{-1}\{X^{\top}(I - W)Y\}$, where $W$ is a projection operation. The estimators are asymptotically normal under appropriate assumptions. The next section will be concerned with several nonparametric fit methods for $g(t)$ because of their popularity, beauty and importance in nonparametric statistics. In Section 5.4, the Framingham heart study data are investigated to illustrate the theory and the proposed statistical techniques.
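
As a rough illustration of the common estimator form $\{X^{\top}(I - W)X\}^{-1}\{X^{\top}(I - W)Y\}$, the following Python sketch assumes a single linear covariate ($d = 1$) and takes $W$ to be the projection onto piecewise-constant functions of $T$ over equal-width bins, which is a degree-zero version of the piecewise polynomial idea. The function names, the binning and the synthetic data are illustrative assumptions and are not taken from the chapter.

```python
import math


def bin_residuals(values, t, n_bins):
    """Apply (I - P_W) for a piecewise-constant basis: subtract each bin's mean."""
    lo, hi = min(t), max(t)
    width = (hi - lo) / n_bins or 1.0  # guard against a degenerate range of t
    idx = [min(int((ti - lo) / width), n_bins - 1) for ti in t]
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for k, v in zip(idx, values):
        sums[k] += v
        counts[k] += 1
    return [v - sums[k] / counts[k] for k, v in zip(idx, values)]


def plm_beta(x, y, t, n_bins=4):
    """Scalar-covariate estimate of beta: x'(I - P_W)y / x'(I - P_W)x."""
    xr = bin_residuals(x, t, n_bins)
    yr = bin_residuals(y, t, n_bins)
    # I - P_W is symmetric and idempotent, so x'(I - P_W)y equals xr . yr.
    return sum(a * b for a, b in zip(xr, yr)) / sum(a * a for a in xr)


# Synthetic, noise-free data (assumed): y = 2*x + g(t) with g a step function.
t = [i / 40 for i in range(41)]
x = [math.sin(3 * i) for i in range(41)]
g = [1.5 * min(int(4 * ti), 3) for ti in t]
y = [2.0 * xi + gi for xi, gi in zip(x, g)]
beta_hat = plm_beta(x, y, t)
```

Because the step function lies exactly in the span of the bin indicators, the projection removes it completely and `beta_hat` recovers the slope used to generate the data, up to rounding.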

In modern water distribution systems, pumping accounts for a large portion of the costs; therefore, water utilities need to reduce the pumping cost. The purpose of this study was to reduce and optimize the pumping costs. The study area was composed of one filtration plant, five reservoirs and three pumping stations. Integer programming (IP) was used for the optimization, as it gives a global solution with each pump controlled as either on or off. The hourly water demand must be forecast correctly to obtain an IP solution, as both the optimized pumping schedule and the lower limit of the reservoir depend on this factor. Therefore, three methods (time index, multiple regression + time index, and Fourier series + transfer ARIMA) were compared to forecast the hourly water demand. As a result of these comparisons, the multiple regression + time index model was selected. The lower limit of the reservoir was also determined depending on the correction of the hourly water demand model. The optimization of pumping in the water distribution system was simulated for 3 months using the IP. This simulation showed that the pumping cost could be reduced by 12.2%-38.7%.

This paper evaluates a new daily demand model for East Doncaster, Victoria, Australia, that incorporates base use values calculated using temperature and rainfall thresholds. The model is based on the postulate that total water use is made up of base use and seasonal use, where base use represents mainly indoor use and is independent of climatic effects such as rainfall and temperature, while seasonal use depends on seasonal, climatic and persistence components. Using the daily data collected for the East Doncaster water supply distribution zone and the corresponding rainfall and temperature data from 1990 to 2000, reference or threshold levels at which water use is independent of temperature and rainfall were identified. The base values were correlated with the day of the week and climatic factors such as temperature and rainfall. Results revealed that these base values are climate independent but differ between weekends and weekdays. The base use values calculated using temperature and rainfall thresholds were incorporated in a total water demand model, which showed strong correlation with an R2 of 71%. The model was further validated using an independent set of data from 2000 to 2001 and yielded an R2 of 83%.

The short-term, demand-forecasting model described in this paper forms the third constituent part of the POWADIMA research project which, taken together, address the issue of real-time, near-optimal control of water-distribution networks. Since the intention is to treat water distribution as a feed-forward control system, operational decisions have to be based on the expected future demands for water, rather than just the present known requirements. Accordingly, it was necessary to develop a short-term, demand-forecasting procedure. To that end, monitoring facilities were installed to measure short-term fluctuations in demands for a small experimental network, which enabled a thorough investigation of trends and periodicities that can usually be found in this type of time-series. On the basis of these data, a short-term, demand-forecasting model was formulated. The model reproduces the periodic patterns observed at annual, weekly and daily levels prior to fine-tuning the estimated values of future demands through the inclusion of persistence effects. Having validated the model, the demand forecasts were subjected to an analysis of the sensitivity to possible errors in the various components of the model. Its application to much larger case studies is described in the following two papers.

Classification and regression trees are machine‐learning methods for constructing prediction models from data. The models are obtained by recursively partitioning the data space and fitting a simple prediction model within each partition. As a result, the partitioning can be represented graphically as a decision tree. Classification trees are designed for dependent variables that take a finite number of unordered values, with prediction error measured in terms of misclassification cost. Regression trees are for dependent variables that take continuous or ordered discrete values, with prediction error typically measured by the squared difference between the observed and predicted values. This article gives an introduction to the subject by reviewing some widely available algorithms and comparing their capabilities, strengths, and weaknesses in two examples. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 14‐23 DOI: 10.1002/widm.8
This article is categorized under: Technologies > Classification
Technologies > Machine Learning
Technologies > Prediction
Algorithmic Development > Statistics
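
The partitioning step for a regression tree can be sketched as a search for the single threshold that minimizes the summed squared error of the two resulting groups. The Python sketch below assumes one numeric predictor and a constant (mean) prediction within each partition; the function names are illustrative, and a full tree would apply the same search recursively to each side.

```python
def sum_squared_error(values):
    """Squared error of predicting every value by the group mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, sse) for the best single split of y on a scalar x."""
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        sse = sum_squared_error(left) + sum_squared_error(right)
        if sse < best[1]:
            # Place the threshold midway between the neighbouring x values.
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, sse)
    return best


threshold, sse = best_split([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5])
```

For this small data set the two groups are perfectly separated, so the chosen threshold falls midway between 3 and 10 and the remaining squared error is zero.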

This new edition updates Durbin & Koopman's important text on the state space approach to time series analysis. The distinguishing feature of state space time series models is that observations are regarded as made up of distinct components such as trend, seasonal, regression elements and disturbance terms, each of which is modelled separately. The techniques that emerge from this approach are very flexible and are capable of handling a much wider range of problems than the main analytical system currently in use for time series analysis, the Box-Jenkins ARIMA system. Additions to this second edition include the filtering of nonlinear and non-Gaussian series. Part I of the book obtains the mean and variance of the state, of a variable intended to measure the effect of an interaction and of regression coefficients, in terms of the observations. Part II extends the treatment to nonlinear and non-normal models. For these, analytical solutions are not available so methods are based on simulation.
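
To make the component view concrete, here is a minimal Python sketch of the Kalman filter for the local level model, in which each observation is a slowly varying level plus noise. The variances, the large initial variance `p0` (a stand-in for a diffuse prior) and the function name are assumptions chosen for illustration, not values from the book.

```python
def local_level_filter(ys, var_eps, var_eta, a0=0.0, p0=1e7):
    """Filtered level estimates for y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t."""
    a, p = a0, p0  # predicted state mean and variance
    filtered = []
    for y in ys:
        f = p + var_eps          # variance of the one-step prediction error
        k = p / f                # Kalman gain
        a = a + k * (y - a)      # update the level with the new observation
        p = p * (1.0 - k)        # updated state variance
        filtered.append(a)
        p = p + var_eta          # predict: the level follows a random walk
    return filtered


levels = local_level_filter([5.0] * 10, var_eps=1.0, var_eta=0.1)
```

With a constant series the filtered level settles at the observed value; a larger `var_eta` relative to `var_eps` makes the filter track new observations more closely.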

Mining frequent itemsets has been widely studied over the last decade. Past research focuses on mining frequent itemsets from static databases. In many of the new applications, data flow through the Internet or sensor networks. It is challenging to extend the mining techniques to such a dynamic environment. The main challenges include a quick response to the continuous request, a compact summary of the data stream, and a mechanism that adapts to the limited resources. In this paper, we develop a novel approach for mining frequent itemsets from data streams based on a time-sensitive sliding window model. Our approach consists of a storage structure that captures all possible frequent itemsets and a table providing approximate counts of the expired data items, whose size can be adjusted by the available storage space. Experiment results show that in our approach both the execution time and the storage space remain small under various parameter settings. In addition, our approach guarantees no false alarm or no false dismissal to the results yielded.
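
The sliding-window idea can be sketched with exact counts. The Python example below is a deliberately simple stand-in: it keeps every transaction inside the window and counts all itemsets up to a small size, whereas the approach described above uses a compact storage structure with approximate counts for expired data. The class name, the `max_size` limit and the expiry rule are assumptions made for this sketch.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowItemsets:
    """Exact itemset counts over the transactions of the last `window` time units."""

    def __init__(self, window, max_size=2):
        self.window = window
        self.max_size = max_size
        self.transactions = deque()  # (timestamp, itemsets) in arrival order
        self.counts = Counter()

    def add(self, timestamp, items):
        unique = sorted(set(items))
        itemsets = [frozenset(c)
                    for r in range(1, self.max_size + 1)
                    for c in combinations(unique, r)]
        self.transactions.append((timestamp, itemsets))
        self.counts.update(itemsets)
        self._expire(timestamp)

    def _expire(self, now):
        # Drop transactions that have fallen out of the time window.
        while self.transactions and self.transactions[0][0] <= now - self.window:
            _, old = self.transactions.popleft()
            self.counts.subtract(old)

    def frequent(self, min_support):
        n = len(self.transactions)
        return {s for s, c in self.counts.items() if n and c / n >= min_support}


stream = SlidingWindowItemsets(window=10)
stream.add(0, ["a", "b"])
stream.add(5, ["a", "c"])
stream.add(12, ["a", "b"])  # the transaction at time 0 expires here
always_present = stream.frequent(min_support=1.0)
```

Storing every transaction is only practical for small windows, which is the limitation that motivates compact summaries of the stream.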

Amidst growing concerns for the state of the world’s surface water resources, water quality modeling is assuming increasing importance. Thomann (1998) suggests that we are entering a ‘Golden Age’ of water quality modeling in which surface water models will make significant contributions towards fuller diagnosis of problems, uncovering surprises, providing a framework for decision-making and lessening future conflicts between environmental interests, managers and those who are regulated. Traditionally, there have been two main philosophical approaches to surface water quality modeling. Process based models consider the underlying physical processes directly, whereas statistical models determine relationships based on historical data sets. Of course, in reality, process based models usually require some degree of statistical calibration to historical data, whereas statistical models should be based, where possible, on relationships that have a physical basis. Recently, artificial neural networks (ANNs) have emerged as alternatives to traditional statistical models in a variety of fields, including water quality modeling.

An optimization model based on genetic algorithms is applied for the optimal calibration of water distribution systems. The model uses a neural network for function evaluation in conjunction with a rigorous mathematical simulation model. An efficient training scheme, which greatly reduces the training period compared to the popular backpropagation scheme, is employed to increase the computational efficiency. For large water distribution systems, the use of neural networks in conjunction with a genetic optimization framework can increase the computational efficiency significantly.

Because determining the parameters of the traditional least squares support vector machine (LSSVM) model by cross-validation is time-consuming, the Bayesian evidence framework was proposed to infer the LSSVM model parameters. The weight vector w and the bias term b of the LSSVM were obtained on the first level of inference. The model hyperparameters μ, ζ were determined on the second level of inference. The model comparison was performed on the third level of inference in order to automatically determine the coefficient of the kernel function. According to the periodicity and trend of the water demand series, an hourly water demand forecast model based on Bayesian LSSVM was established. Case analysis shows that the Bayesian LSSVM-based hourly water demand forecast model is faster to build and forecasts more precisely than the traditional LSSVM-based model and the BP neural network-based model.

In the history of research of the learning problem one can extract four periods that can be characterized by four bright events: (i) Constructing the first learning machines, (ii) constructing the fundamentals of the theory, (iii) constructing neural networks, (iv) constructing the alternatives to neural networks.

Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ∗∗∗, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
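
The two sources of randomness, bootstrap sampling and random feature selection, can be shown in a much reduced form. The Python sketch below is a simplification under stated assumptions: each "tree" is a one-split regression stump grown on a bootstrap sample using a single randomly chosen feature, whereas the method described above grows full trees and draws a fresh random subset of features at every node. All names and the toy data are illustrative.

```python
import random


def fit_stump(rows, ys, feature):
    """Best single split on one feature; falls back to the mean if none exists."""
    order = sorted(range(len(rows)), key=lambda i: rows[i][feature])
    mean = sum(ys) / len(ys)
    best = None
    for j in range(1, len(order)):
        lo_v, hi_v = rows[order[j - 1]][feature], rows[order[j]][feature]
        if lo_v == hi_v:
            continue
        left = [ys[i] for i in order[:j]]
        right = [ys[i] for i in order[j:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, (lo_v + hi_v) / 2, ml, mr)
    if best is None:
        return lambda row: mean
    _, thr, ml, mr = best
    return lambda row: ml if row[feature] <= thr else mr


def fit_forest(rows, ys, n_trees=25, seed=0):
    """Average of stumps, each grown on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feature = rng.randrange(n_features)            # random feature choice
        trees.append(fit_stump([rows[i] for i in sample],
                               [ys[i] for i in sample], feature))
    return lambda row: sum(tree(row) for tree in trees) / len(trees)


# Toy data (assumed): the response depends on feature 0; feature 1 is constant.
rows = [[i, 0] for i in range(20)]
ys = [0.0 if i < 10 else 10.0 for i in range(20)]
forest = fit_forest(rows, ys)
low, high = forest([0, 0]), forest([19, 0])
```

Stumps that draw the uninformative feature fall back to the sample mean, so the averaged prediction is pulled toward the middle; this is the price of decorrelating the trees and is much less pronounced with full-depth trees.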

A new method for nonparametric multiple regression is presented. The procedure models the regression surface as a sum of general smooth functions of linear combinations of the predictor variables in an iterative manner. It is more general than standard stepwise and stagewise regression procedures, does not require the definition of a metric in the predictor space, and lends itself to graphical interpretation.

A time series model of daily municipal water use is developed. The model
is termed a conditional autoregressive process and can be interpreted as
an autoregressive process with randomly varying mean. The randomly
varying mean accounts for changes in water use that result from the
complex interaction over time of "structural features" of the water use
system. These features may include the price of water, total service
area connections, plumbing code provisions, and customer income, among
many others. The modeling approach is semiparametric. The model can be
split into a component that is treated in a nonparametric framework and
a component that is treated parametrically. The random mean process,
which represents long-term trend in water use, is treated in a
nonparametric framework. Conditional on the random mean water use, the
model reduces to a Gaussian autoregressive process with a modest number
of parameters. The water use model is the core of a forecast system
which is used to schedule releases from two water supply reservoirs
which serve the Washington, D.C., Metropolitan Area. Model structure
dictates that the key step in producing a water use forecast is an
updating step in which a revised estimate of current mean water use is
computed.

A time series model of daily municipal water use as a function of
rainfall and air temperature is developed. Total water use is separated
into base use and seasonal use. The seasonal use series is detrended,
then a nonlinear heat function relating water use to air temperature
during rainless periods is employed to deseasonalize the series. The
residuals are modeled using Box-Jenkins transfer functions with
transformed rainfall and air temperature as independent variables. The
model is applied to daily data from Austin, Texas from 1975 to 1981 and
accounts for 97% of the variance of daily municipal water use over that
period. Forecasts of daily usage are made for a two-week lead time.

This paper examines the potential of the support vector machine (SVM) in long-term prediction of lake water levels. Lake Erie mean monthly water levels from 1918 to 2001 are used to predict future water levels up to 12 months ahead. The results are compared with a widely used neural network model called a multilayer perceptron (MLP) and with a conventional multiplicative seasonal autoregressive model (SAR). Overall, the SVM showed good performance and is proved to be competitive with the MLP and SAR models. For a 3- to 12-month-ahead prediction, the SVM model outperforms the two other models based on root-mean square error and correlation coefficient performance criteria. Furthermore, the SVM exhibits inherent advantages due to its use of the structural risk minimization principle in formulating cost functions and of quadratic programming during model optimization. These advantages lead to a unique optimal and global solution compared to conventional neural network models.

The effectiveness of ultrafiltration for the purification of recombinant proteins from aqueous corn endosperm and germ extracts was examined using model proteins of two different sizes, recombinant type I human collagen (rCollagen, 265kDa) and green fluorescent protein (GFP, 27kDa), to evaluate the effects of membrane pore size, transmembrane pressure (TMP), crossflow rate, and filtration pH on permeate flux and protein sieving. Using a 300kDa MWCO membrane resulted in a significant loss of rCollagen, whereas a 100kDa MWCO membrane completely retained rCollagen. Increasing the filtration crossflow rate and TMP resulted in a higher permeate flux without significantly altering the sieving of the host cell proteins (HCP) or GFP. The greatest HCP sieving was observed in the endosperm extract filtration at low pH and, compared to endosperm, the filtration of germ extracts had lower HCP sieving. GFP exhibited similar sieving as the average HCP for all filtration conditions. rCollagen purity of 89% was achieved with only diafiltration of endosperm extracts and, when preceded by precipitation, a purity of >99% was attained. Thus, ultrafiltration is a valuable method to separate and purify corn-hosted recombinant proteins >100kDa, particularly when the expression is targeted to the endosperm.

Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

Short-term water demand forecast modeling techniques are presented. The techniques considered include conventional techniques, such as regression and time series analysis, and artificial intelligence (AI) techniques, such as expert systems and artificial neural networks (ANN). The performance of each model was evaluated using two standard statistical parameters. Results of the analysis showed that the AI models outperformed the conventional models.

Transfer functions are used to model the short-term response of daily
municipal water use to rainfall and air temperature variations. Daily
water use data from nine cities are studied, three cities each from
Florida, Pennsylvania, and Texas. The dynamic response of water use to
rainfall and air temperature is similar across the cities within each
State; in addition the responses of the Texas and Florida cities are
very similar to one another while the response of the Pennsylvania
cities is more sensitive to air temperature and less to rainfall. There
is little impact of city size on the response functions. The response of
water use to rainfall depends first on the occurrence of rainfall and
second on its magnitude. The occurrence of a rainfall more than 0.05
in./day (0.13 cm/day) causes a drop in the seasonal component of water
use one day later that averages 38% for the Texas cities, 42% for the
Florida cities, and 7% for the Pennsylvania cities. In Austin, Texas, a
spatially averaged rainfall series shows a clearer relationship with
water use than does rainfall data from a single gage. There is a
nonlinear response of water use to air temperature changes: there is no
response for daily maximum air temperatures between 40° and 70°F
(4-21°C) and an increase in water use with air temperature beyond
70°F; above 85°-90°F (29°-32°C), water use increases
3-5 times more per degree than below that limit in Texas and Florida.
The model resulting from these studies can be used for daily water use
forecasting and water conservation analysis.
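
The nonlinear temperature response described above can be written as a piecewise-linear function. In the Python sketch below, the 70°F threshold comes from the abstract, while the base slope, the 85°F knee (the abstract gives a range of 85-90°F) and the multiplier of 4 (the abstract gives 3-5) are assumed values picked for illustration. Returning zero below 40°F is also an assumption, since the abstract only describes the 40-70°F range.

```python
def temperature_response(temp_f, slope=1.0, knee_f=85.0, multiplier=4.0):
    """Seasonal water use response (arbitrary units) to daily maximum temperature."""
    if temp_f <= 70.0:
        # No response between 40 and 70 F; below 40 F is assumed to be zero too.
        return 0.0
    if temp_f <= knee_f:
        return slope * (temp_f - 70.0)
    # Above the knee, each extra degree adds `multiplier` times the base slope.
    return slope * (knee_f - 70.0) + multiplier * slope * (temp_f - knee_f)


mild = temperature_response(60.0)
warm = temperature_response(80.0)
hot = temperature_response(90.0)
```

Keeping the knee and multiplier as parameters makes it easy to fit them per city, in line with the differences between states noted above.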

LIBSVM is a library for support vector machines (SVM). Its goal is to help users to easily use SVM as a tool. In this document, we present all its implementation details. For the use of LIBSVM, the README file included in the package and the LIBSVM FAQ provide the information.

Water demand forecasts are needed for the design, operation and management of urban water supply systems. In this study, the relative performance of regression, time series analysis and artificial neural network (ANN) models are investigated for short‐term peak water demand forecasting. The significance of climatic variables (rainfall and maximum air temperature, in addition to past water demand) on water demand management is also investigated.
Numerical analysis was performed on data from the city of Ottawa, Ontario, Canada. The existing water supply infrastructure will not be able to meet the demand for projected population growth; thus, a study is needed to determine the effect of peak water demand management on the sizing and staging of facilities for developing an expansion strategy. Three different ANNs and regression models and seven time‐series models have been developed and compared. The ANN models consistently outperformed the regression and time‐series models developed in this study. It has been found that water demand on a weekly basis is more significantly correlated with the rainfall amount than the occurrence of rainfall. Copyright © 2005 John Wiley & Sons, Ltd.

The efficient operation and management of an existing water supply system require short-term water demand forecasts as inputs. Conventionally, regression and time series analysis have been employed in modelling short-term water demand forecasts. The relatively new technique of artificial neural networks has been proposed as an efficient tool for modelling and forecasting in recent years. The primary objective of this study is to investigate the relatively new technique of artificial neural networks for use in forecasting short-term water demand at the Indian Institute of Technology, Kanpur. Other techniques investigated in this study include regression and time series analysis for comparison purposes. The secondary objective of this study is to investigate the validity of the following two hypotheses: 1) the short-term water demand process at the Indian Institute of Technology, Kanpur campus is a dynamic process mainly driven by the maximum air temperature and interrupted by rainfall occurrences, and 2) occurrence of rainfall is a more significant variable than the rainfall amount itself in modelling the short-term water demand forecasts. The data employed in this study consist of weekly water demand at the Indian Institute of Technology, Kanpur campus, and total weekly rainfall and weekly average maximum air temperature from the City of Kanpur, India. Six different artificial neural network models, five regression models, and two time series models have been developed and compared. The artificial neural network models consistently outperformed the regression and time series models developed in this study. An average absolute error in forecasting of 2.41% was achieved from the best artificial neural network model, which also showed the best correlation between the modelled and targeted water demands. 
It has been found that the water demand at the Indian Institute of Technology, Kanpur campus is better correlated with the occurrence of rainfall than with the amount of rainfall itself.
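
An error figure such as the 2.41% quoted above is typically an average of absolute errors expressed as a percentage of the observed demand. The abstract does not give the exact formula, so the Python sketch below assumes the usual mean absolute percentage error; the function name and sample values are illustrative.

```python
def mean_absolute_percentage_error(observed, forecast):
    """Average of |observed - forecast| / |observed|, expressed in percent."""
    if not observed or len(observed) != len(forecast):
        raise ValueError("observed and forecast must be non-empty and equal in length")
    ratios = [abs(o - f) / abs(o) for o, f in zip(observed, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


# Two hypothetical weekly demands with forecasts off by 10% and 5%.
error_pct = mean_absolute_percentage_error([100.0, 200.0], [110.0, 190.0])
```

The measure is undefined when an observed value is zero, which is rarely a concern for aggregated weekly demand.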

This paper presents the Intelligent Forecasters Construction Set (IFCS) which is a toolset for constructing forecasting applications. The toolset supports the intelligent techniques of fuzzy logic, artificial neural networks, knowledge-based and case-based reasoning. The developer can construct a forecasting application using rules, procedures and flow diagrams, which are organized into a hierarchy of workspaces. The modularity of the IFCS allows subsequent addition of other modules of intelligent techniques. The IFCS was used for developing a water demand forecasting system based on real-world data obtained from the City of Regina's water distribution system and Environment Canada. A utility demand prediction system developed with the IFCS system is useful for optimizing operation costs of water plants. Some water plants need to pay a flat rate for electricity, which is set depending on peak kilowatt demand. Hence, if the peak kilowatt demand can be reduced, the operating costs of the plant can be lessened (Jamieson RA et al. American Water Works Association Journal 1993;85:48–55). An energy management system needs a good estimate of future customer demand in order to find the optimal pumping schedules that can minimize the peak kilowatt demand. Since the IFCS supports developing multiple predictor models, modeling of data can be expedited. The benefits of using multiple modules of artificial neural networks for demand prediction are presented. The results from this approach are compared with a linear regression and a case-based reasoning program. The performance comparisons among the forecasters will be discussed.

A time series forecasting model of hourly water consumption 24 h in advance for an urban zone within the Melbourne (Australia) water supply system is developed. The model comprises two modules—daily and hourly. The daily module is formulated as a set of equations representing the effects of three factors on water use namely seasonality, climatic correlation, and autocorrelation. The hourly module is developed to disaggregate the estimated daily consumption into hourly consumption. The models were calibrated using hourly and daily data for a 6 year period, and independently validated over an additional seven month period. Over this latter period, the hourly forecast model accounted for 66% of the variance in the peak hourly water consumption with a standard error of 162 l/p/d.
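
The disaggregation step can be sketched as spreading a daily forecast over 24 hours in proportion to a typical hourly profile. The abstract does not describe how the hourly module works internally, so the proportional rule, the function name and the profile values in this Python sketch are assumptions.

```python
def disaggregate_daily(daily_total, hourly_profile):
    """Split a daily total into 24 hourly values proportional to a profile."""
    if len(hourly_profile) != 24:
        raise ValueError("hourly_profile must contain 24 weights")
    total_weight = sum(hourly_profile)
    if total_weight <= 0:
        raise ValueError("profile weights must sum to a positive value")
    return [daily_total * w / total_weight for w in hourly_profile]


# Hypothetical profile: demand in the second half of the day is three times higher.
profile = [1.0] * 12 + [3.0] * 12
hourly = disaggregate_daily(480.0, profile)
```

By construction the hourly values add up to the daily forecast, so any error in the daily module carries through unchanged to the hourly totals.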

The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described: (i) locally weighted regression (LOESS), (ii) additive models, (iii) projection pursuit regression, and (iv) recursive partitioning regression. Then, in the second and concluding part of this presentation, the indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present.

Broad-scale maps of forest characteristics are needed throughout the United States for a wide variety of forest land management applications. Inexpensive maps can be produced by modelling forest class and structure variables collected in nationwide forest inventories as functions of satellite-based information. But little work has been directed at comparing modelling techniques to determine which tools are best suited to mapping tasks given multiple objectives and logistical constraints. Consequently, five modelling techniques were compared for mapping forest characteristics in the Interior Western United States. The modelling techniques included linear models (LMs), generalized additive models (GAMs), classification and regression trees (CARTs), multivariate adaptive regression splines (MARS), and artificial neural networks (ANNs). Models were built for two discrete and four continuous forest response variables using a variety of satellite-based predictor variables within each of five ecologically different regions. All techniques proved themselves workable in an automated environment. When their potential mapping ability was explored through simulations, tremendous advantages were seen in use of MARS and ANN for prediction over LMs, GAMs, and CART. However, much smaller differences were seen when using real data. In some instances, a simple linear approach worked virtually as well as the more complex models, while small gains were seen using more complex models in other instances. In real data runs, MARS and GAMs performed (marginally) best for prediction of forest characteristics.

This paper surveys the main issues in the literature on residential water demand. Several tariff types and their objectives are analyzed. Then, the main contributions to the literature on residential water demand estimation are reviewed, with particular attention to variables, specification model, data set, and the most common econometric problems. The paper concludes with comments on future trends and a summary of the contents of the study.

In this paper, four alternative flexible nonlinear regression model approaches are reviewed and their performance evaluated based on various measures of out-of-sample forecast accuracy. The class of flexible regression model considered includes Neural Networks, Projection Pursuit models and the Random Field regression model approach recently suggested by Hamilton [Econometrica 69 (2001) 537–573]. An empirical illustration is provided, showing that linear models for the US unemployment rate and the growth rate in US industrial production cannot outperform the “best” flexible nonlinear regression models in terms of out-of-sample forecast accuracy. The results indicate a possible presence of a nonlinear component in the conditional mean function of both time series.

The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described in the first part of this presentation: (i) locally weighted regression (LOESS), (ii) additive models, (iii) projection pursuit regression, and (iv) recursive partitioning regression. In this, the second and concluding part of the presentation, the indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present.