Fig. 6
Six locations for analyzing the impact of missing data on the hydraulic model's head loss predictions (Locations 1-6) and two pipe burst locations (Locations A and B). The missing-data locations shown are one example set, because these locations are randomly regenerated for each of the 100 realizations.


Contexts in source publication

Context 1
... that are more accurate than disaggregating the system demand under an assumed spatial distribution. With AMI nodal demands, pressure conditions can be evaluated and bursts located. Here, to examine the impact of missing data, the hydraulic model's head loss predictions are investigated for six different observation locations in the study network (Fig. 6). Failure conditions (bursts) are also considered and compared to normal-condition results. Two pipe burst events are considered: at the center of the network (Burst A) and at observation Location 5 (Burst B in Fig. 6). The average burst flow is 2.1 L/s, equivalent to 2% of the total system demand. Fig. 7 shows the 24-h average percentage ...
Context 2
... data, the hydraulic model's head loss predictions are investigated for six different observation locations in the study network (Fig. 6). Failure conditions (bursts) are also considered and compared to normal-condition results. Two pipe burst events are considered: at the center of the network (Burst A) and at observation Location 5 (Burst B in Fig. 6). The average burst flow is 2.1 L/s, equivalent to 2% of the total system demand. Fig. 7 shows the 24-h average percentage of head losses for nonburst conditions when zero imputation is used. HM and DS imputation results are not shown because they are very similar to the zero scheme. As the percentage of missing data increases, the ...
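As a minimal illustration of how such head comparisons can be reproduced, the sketch below runs a 24-h EPANET simulation with the open-source wntr package and differences the simulated heads at two nodes. The file and node names ('study_network.inp', 'J_center', 'J_loc5') are hypothetical; the paper's Austin-based network and its burst/missing-data setup are not reproduced here.

```python
import wntr

# Hypothetical input file; the study network is not distributed with this page.
wn = wntr.network.WaterNetworkModel('study_network.inp')
wn.options.time.duration = 24 * 3600  # 24-h extended-period simulation

results = wntr.sim.EpanetSimulator(wn).run_sim()
head = results.node['head']  # DataFrame: rows = time steps, columns = node names

# Head loss along a flow path equals the head difference between its end nodes.
headloss_series = head['J_center'] - head['J_loc5']
print(headloss_series.describe())
```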
Context 3
... each time step's mean and standard deviation of R(t) and to confirm that the standardized residuals [z(t)] are normally distributed. Visual inspection and two statistical tests (Chi-square goodness-of-fit and Kolmogorov-Smirnov) at the 1% significance level are applied for four different times, and all show that z(t) values follow normal distributions (Fig. ...
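The residual standardization and normality checks described above can be sketched as follows, assuming the residuals R(t) for one time step are available as an array across realizations (synthetic values are used here as a stand-in for the study's flow-balance residuals); the tests use SciPy's Kolmogorov-Smirnov and chi-square goodness-of-fit routines at the 1% level.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for the residuals R(t) at one time step across 100 realizations.
rng = np.random.default_rng(0)
R_t = rng.normal(loc=0.5, scale=2.0, size=100)

# Standardize with that time step's mean and standard deviation: z(t).
z_t = (R_t - R_t.mean()) / R_t.std(ddof=1)

# Kolmogorov-Smirnov test against the standard normal.
ks_stat, ks_p = stats.kstest(z_t, 'norm')

# Chi-square goodness-of-fit on binned z(t) against expected normal bin counts.
bins = np.array([-np.inf, -1.5, -0.5, 0.5, 1.5, np.inf])
observed, _ = np.histogram(z_t, bins=bins)
expected = np.diff(stats.norm.cdf(bins)) * len(z_t)
chi_stat, chi_p = stats.chisquare(observed, f_exp=expected)

# Normality is not rejected at the 1% level if both p-values exceed 0.01.
print(f'K-S p = {ks_p:.3f}, chi-square p = {chi_p:.3f}')
```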

Citations

... The hydraulic simulation results presented in the article are generally in line with the findings of other studies. For example, a study by Jun, Jung and Lansey (2021) found that pressure and flow in a water distribution network can vary significantly throughout the day, depending on consumption demand. ...
Article
Full-text available
Currently, many water supply systems collect and monitor data daily, including reservoir levels, pressures, and consumption demands, in addition to electrical data. The data generated are transformed into information, providing the knowledge needed to guide the manager in planning actions and making decisions. The R program is a programming language widely used for statistical analysis and has recently been coupled to EPANET in several works. Thus, this work aimed to evaluate the potential of the R program interconnected with EPANET for the database of water supply systems. To this end, a theoretical water distribution network created in EPANET was simulated in R to evaluate the results of daily consumption demand. The proposed network was simulated by varying consumption demands and reservoir levels, obtaining several results over 24 hours. Consequently, it was possible to automate the statistical analysis, generating tables and graphs for the dispersion of demands, node pressures, and pipe outflows arising from each variation in consumption and reservoir levels. The results demonstrated the compatibility and practicality of the mathematical model of the water distribution project in EPANET, simulated in R and stored in the SQL SERVER database. Keywords: statistical analysis; database; EPANET; R software; distribution networks
... AMI systems provide water utilities with high-resolution flow data measured from individual households (DiCarlo and Berglund 2022). Due to their potential benefits for WDS operation tasks, many water utilities are widely installing AMIs (Jun et al. 2021). For example, the City of San Diego authorized up to $25 million to install AMIs at all households (Horn 2018), while the Madison Water Utility in Madison, Wisconsin, has installed more than 64,300 AMIs (City of Madison 2019). ...
... The WEC statistical process control rules (WEC 1956) have been widely applied for leak detection using source flows, internal measurements, and mass-balance approaches [i.e., comparing the supplied flow with total withdrawals (the sum of AMI demands)] (Romano et al. 2014; Jun et al. 2021; Hagos et al. 2016). The WEC rules attempt to distinguish between random variability of a process and anomalies. ...
... For example, a positive score that is immediately followed by a negative score (or vice versa) does not issue an alarm. However, as demonstrated in Jun et al. (2021), only positive-side errors are used for the AMI system, because negative residual values occur only if the AMI withdrawals are larger than the system flows. Because this condition is not leak related, many false alarms will occur if negative residuals occur consistently (Jun et al. 2021). ...
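A minimal sketch of one-sided (positive) WEC-style rules on standard scores is given below; the thresholds follow the textbook rules (one point beyond 3 sigma, 2 of 3 beyond 2 sigma, 4 of 5 beyond 1 sigma, 8 in a row above the mean) and are not necessarily the exact parameterization used by Jun et al. (2021).

```python
import numpy as np

def wec_positive_alarms(z):
    """One-sided (positive) Western Electric rules applied to standard scores z(t)."""
    z = np.asarray(z, dtype=float)
    alarms = np.zeros(len(z), dtype=bool)
    for t in range(len(z)):
        w3 = z[max(0, t - 2):t + 1]
        w5 = z[max(0, t - 4):t + 1]
        w8 = z[max(0, t - 7):t + 1]
        rule1 = z[t] > 3.0                       # 1 point beyond +3 sigma
        rule2 = np.sum(w3 > 2.0) >= 2            # 2 of 3 beyond +2 sigma
        rule3 = np.sum(w5 > 1.0) >= 4            # 4 of 5 beyond +1 sigma
        rule4 = len(w8) == 8 and np.all(w8 > 0)  # 8 in a row above the mean
        alarms[t] = rule1 or rule2 or rule3 or rule4
    return alarms

# Example: a burst-like upward shift after t = 20 in otherwise random scores.
rng = np.random.default_rng(1)
z = rng.normal(size=40)
z[20:] += 2.5
print(np.nonzero(wec_positive_alarms(z))[0])
```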
... Thus, the clear next step is to develop and test approaches for smart systems such as advanced metering infrastructure (AMI), in which more data and information are collected. As the cost of AMI drops, many water utilities are adopting AMI, and fully instrumented systems are becoming common (Jun et al. 2021). In addition, as pressure meters are added to AMIs, pressure-supplemented AMI systems that measure customer-meter pressures and flows are the next-generation WDN data collection system that can provide a fuller picture of the network. ...
... DP and RF evaluate leak detection effectiveness, whereas ADT measures detection efficiency. Jung and Lansey (2015) and Jun et al. (2021) provided detailed descriptions. Localization performance is evaluated by calculating the average shortest-path distance (ASPD) between the true leak site and the n_t potential leak locations identified by a threshold rule. ...
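As a sketch of the ASPD metric described above, assuming the network is represented as a graph with pipe lengths as edge weights (the node names and lengths below are hypothetical, not taken from the study):

```python
import networkx as nx

def aspd(graph, true_leak_node, candidate_nodes):
    """Average shortest-path distance between the true leak location and the
    candidate locations flagged by a threshold rule (edge weights = pipe lengths)."""
    dists = [nx.shortest_path_length(graph, true_leak_node, c, weight='length')
             for c in candidate_nodes]
    return sum(dists) / len(dists)

# Hypothetical 4-node example with pipe lengths in metres.
G = nx.Graph()
G.add_edge('J1', 'J2', length=100)
G.add_edge('J2', 'J3', length=150)
G.add_edge('J3', 'J4', length=200)
print(aspd(G, 'J2', ['J1', 'J3', 'J4']))  # (100 + 150 + 350) / 3 = 200.0
```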
... Using SIMDEUM, 24-h end-user demands were generated randomly for 6,000 households. These demands were summed over the number of households surrounding a node (Jun et al. 2021). To derive the model, an average use per capita assuming 2.44 residents/household (United States Census Bureau 2021) and Austin's average residential demand of 462 L per capita per day (LPCD) [122 gal.
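For illustration, the quoted figures imply an average household demand of roughly 2.44 x 462, or about 1,127 L/household/day; a hypothetical node serving 25 households would then carry about 0.33 L/s of base demand, as in the short sketch below (the household count is an assumption, not a value from the study).

```python
RESIDENTS_PER_HOUSEHOLD = 2.44   # United States Census Bureau (2021)
LPCD = 462                       # Austin average residential demand, L/capita/day

# Average daily demand per household implied by the figures quoted above.
household_lpd = RESIDENTS_PER_HOUSEHOLD * LPCD            # ~1,127 L/household/day

# Hypothetical node served by 25 households: aggregate base demand in L/s.
n_households = 25
node_demand_lps = n_households * household_lpd / 86_400   # ~0.33 L/s
print(round(household_lpd, 1), round(node_demand_lps, 3))
```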
... They mentioned that the missing observations were estimated through linear interpolation. Similarly, Jun et al. (2021) compared three imputation methods, namely zero, historical mean, and distribution sampling, to replace missing advanced metering infrastructure (AMI) data. They showed that the historical mean method was the most useful tool for imputing the missing data in their data set. ...
Article
Full-text available
An authentic water consumption forecast is an auxiliary tool to support the management of the water supply and demand in urban areas. Providing a highly accurate forecasting model depends a lot on the quality of the input data. Despite the advancement of technology, water consumption in some places is still recorded by operators, so its database usually has some approximate and incomplete data. For this reason, the methods used to predict the water demand should be able to handle the drawbacks caused by the uncertainty in the dataset. In this regard, a structured hybrid approach was designed to cluster the customers and predict their water demand according to the uncertainty in the dataset. First, a fuzzy-based algorithm consisting of Forward-Filling, Backward-Filling, and Mean methods was innovatively proposed to impute the missing data. Then, a multi-dimensional time series k-means clustering technique was developed to group the consumers based on their consumption behavior, for which the missing data were estimated with fuzzy numbers. Finally, one forecasting model inspired by Long Short-Term Memory (LSTM) networks was adjusted for each cluster to predict the monthly water demand using the lagged demand and the temperature. This approach was implemented on the water time series of the residential consumers in Yazd, Iran, from January 2011 to November 2020. Based on the performance evaluation in terms of the Root Mean Squared Error (RMSE), the proposed approach had an acceptable level of confidence to predict the water demand of all the clusters.
... Outliers can be simply defined as values that differ significantly from the true measured values. For example, meter malfunctions, relay failures, weather events, vandalism, and cyber-attacks are unexpected circumstances in which outliers occur [49]. Outliers are defined as the data at each measured point that fall below the 1st percentile or exceed the 99th percentile [50]. ...
... Historical mean (HM) is a statistically reasonable methodology because it uses the average of values from the same time of day as the data to be replaced [52]. HM is used in several areas of data management, including power grids, transportation, and water resources [49,53,54]. In this study, HM used the average of previous data collected at the same time to replace missing data values. ...
... This process is repeated over the entire section in which imputation is required. Jun et al. (2021) was the first example of DS being applied in a WDS; please refer to it for detailed information [49]. ...
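A minimal sketch of the three imputation schemes referenced above (zero, historical mean, and distribution sampling), assuming the AMI record is a pandas Series indexed by timestamp and treating the "history" as the non-missing values observed at the same time of day; this is illustrative only and not the authors' implementation.

```python
import numpy as np
import pandas as pd

def impute(series, method='hm', rng=None):
    """Fill missing AMI readings with 'zero', historical mean ('hm'),
    or distribution sampling ('ds') over same-time-of-day history."""
    out = series.copy()
    rng = rng or np.random.default_rng()
    for ts in out.index[out.isna()]:
        history = series[(series.index.time == ts.time()) & series.notna()]
        if method == 'zero' or history.empty:
            out[ts] = 0.0
        elif method == 'hm':
            out[ts] = history.mean()
        elif method == 'ds':                      # sample from the empirical
            out[ts] = rng.choice(history.values)  # distribution of past values
    return out

# Hypothetical hourly demands (L/s) for three days with two gaps on day 3.
idx = pd.date_range('2021-01-01', periods=72, freq='H')
demand = pd.Series(np.random.default_rng(2).gamma(2.0, 0.05, 72), index=idx)
demand.iloc[[60, 61]] = np.nan
print(impute(demand, 'hm').iloc[[60, 61]])
```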
Article
Full-text available
Recently, various detection approaches that identify anomalous events (e.g., discoloration, contamination) by analyzing data collected from smart meters (so-called structured data) have been developed for many water distribution systems (WDSs). However, although some of them have shown promising results, meters often fail to collect or transmit the data (i.e., missing data), meaning that these methods may frequently not work for anomaly identification. Thus, the clear next step is to combine structured data with another type of data, unstructured data, that has no structural format (e.g., textual content, images, and colors) and can often be expressed through various social media platforms. However, no previous work has been carried out in this regard. This study proposes a framework that combines structured and unstructured data to identify WDS water quality events by collecting turbidity data (structured data) and text data uploaded to social networking services (SNSs) (unstructured data). In the proposed framework, water quality events are identified by applying data-driven detection tools for the structured data and cosine similarity for the unstructured data. The results indicate that structured data-driven tools successfully detect accidents with large magnitudes but fail to detect small failures. When the proposed framework is used, those undetected accidents are successfully identified. Thus, combining structured and unstructured data is necessary to maximize WDS water quality event detection.
... This study aims to evaluate how the quality of the imputation process affects the forecasting of urban water demand in the case of time series with missing values, which are a common problem of metering systems (Jun et al. 2021). It is proposed to analyze a set of conventional imputation algorithms, ranging from the simpler (e.g., linear interpolation) to the more complex [e.g., k-nearest neighbor or the Kalman-seasonal autoregressive integrated moving average (SARIMA) approaches]. ...
Article
Nowadays, drinking water demand forecasting has become fundamental to efficiently manage water distribution systems. With the growth of accessible data and the increase of available computational power, the scientific community has been tackling the forecasting problem, often opting for a data-driven approach with considerable results. However, the best-performing methodologies, like deep learning, rely on the quantity and quality of the available data. In real life, the demand data are usually affected by the missing data problem. This study proposes an analysis of the role of missing data imputation in the frame of a short-term forecasting process. A set of conventional imputation algorithms were considered and applied to three test cases. Afterward, the forecasting process was performed using three state-of-the-art deep neural network models. The results showed that a good quality imputation can significantly affect the forecasting results. In particular, the results highlighted significant variation in the accuracy of the forecasting models that had past observations as inputs. On the contrary, a forecasting model that used only static variables as input was not affected by the imputation process and may be a good choice whenever a good quality imputation is not possible.
... The WEC rules were first developed by the Western Electric Company (WEC 1958), and they are widely used for WDS burst detection (Jun et al. 2021; Zhang 2022). The WEC rules identify an anomaly (or burst) when a single standard score or consecutive standard scores fall beyond the control limits. ...
... The effectiveness of burst detection methods was examined using published indicators, the detection probability (DP) and the rate of false alarms (RF), where DP is the ratio of detected burst events to the total number of abnormal events, and RF is the ratio of detected non-burst events to the total number of normal events (Jung et al. 2015; Hagos et al. 2016; Jun et al. 2021). A single detection technique can occupy up to eight sections in the PNE and MS layouts, and therefore eight parameter sets that provide the best detectability (i.e., high DP with low RF) were determined by fine-tuning the parameters for all SPC methods. ...
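The DP and RF indicators as defined above can be computed with a short sketch like the following (the event labels are hypothetical; this is not the authors' code):

```python
import numpy as np

def detection_scores(detected, is_burst):
    """DP = detected bursts / all burst events;
    RF = alarms raised on normal events / all normal events."""
    detected = np.asarray(detected, dtype=bool)
    is_burst = np.asarray(is_burst, dtype=bool)
    dp = detected[is_burst].mean() if is_burst.any() else np.nan
    rf = detected[~is_burst].mean() if (~is_burst).any() else np.nan
    return dp, rf

# Hypothetical outcome of 10 events: 4 bursts (3 detected) and 6 normal (1 false alarm).
is_burst = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
detected = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
print(detection_scores(detected, is_burst))  # (0.75, ~0.167)
```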
Article
Full-text available
Various data-driven anomaly detection methods have been developed for identifying pipe burst events in water distribution systems (WDSs); however, their detection effectiveness varies based on network characteristics (e.g., size and topology) and the magnitude or location of bursts. This study proposes an ensemble convolutional neural network (CNN) model that employs several burst detection tools with different detection mechanisms. The model converts the detection results produced by six different statistical process control (SPC) methods into a single compromise indicator and derives reliable final detection decisions using a CNN. A total of thirty-six binary detection results (i.e., detected or not) for a single event were transformed into a six-by-six grayscale heatmap by considering multiple parameter combinations for each SPC method. Three different heatmap configuration layouts were considered to identify the layout that provides the highest CNN classification accuracy. The proposed ensemble CNN pipe burst detection approach was applied to a network in Austin, TX, and improved the detection probability by approximately 2% over the best SPC method. Results presented in this paper indicate that the proposed ensemble model is more effective than traditional detection tools for WDS burst detection. These results suggest that the ensemble model can be effectively applied to many detection problems with primary binary results in WDSs and pipe burst events.
Article
Model-based leakage localisation in water distribution networks requires accurate estimates of nodal demands to correctly simulate hydraulic conditions. While digital water meters installed at household premises can be used to provide high-resolution information on water demands, questions arise regarding the necessary temporal resolution of water demand data for effective leak localisation. In addition, how do temporal and spatial data gaps affect leak localisation performance? To address these research gaps, a real-world water distribution network is first extended with the stochastic water end-use model PySIMDEUM. Then, more than 700 scenarios for leak localisation assessment characterised by different water demand sampling resolutions, data gap rates, leak size, time of day for analysis, and data imputation methods are investigated. Numerical results indicate that during periods with high/peak demand, a fine temporal resolution (e.g., 15 min or lower) is required for the successful localisation of leakages. However, regardless of the sampling frequency, leak localisation with a sensitivity analysis achieves a good performance during periods with low water demand (localisation success is on average 95%). Moreover, improvements in leakage localisation might occur depending on the data imputation method selected for data gap management, as they can mitigate random/sudden temporal and spatial fluctuations of water demands.
Article
A long short-term memory (LSTM) model is introduced to predict missing data points of dissolved oxygen (DO) in an eel (Anguilla japonica) recirculating aquaculture system. Field experiments allow periodic patterns in the DO data to be determined, corresponding to day-night cycles and a DO decrease after feeding. To improve the accuracy of DO prediction with a training-to-test data ratio of 5:1, training with data in sequential and reverse orders is performed and evaluated. The LSTM model used to predict DO levels in the fish tank has an error of approximately 3.25%. The proposed LSTM model trained on DO data has high applicability and may support water quality control in aquaculture farms.