11th International Conference on Urban Drainage, Edinburgh, Scotland, UK, 2008
Sewer Monitoring Projects: Data Collection, Data Handling and Data Quality in Arnhem
H.J. Liefting¹* and J.G. Langeveld¹,²
¹ Royal Haskoning, Barbarossastraat 35, P.O. Box 151, 6500 AD, Nijmegen, the Netherlands
² Delft University of Technology, Faculty of Civil Engineering and Geosciences, P.O. Box 5048, 2600 GA, Delft, the Netherlands
*Corresponding author, e-mail e.liefting@royalhaskoning.com
ABSTRACT
The municipality of Arnhem has set up a monitoring program to investigate water quality in
storm sewers as well as the efficiency of several end-of-pipe solutions for stormwater
treatment. Four pilot installations have been set up: a sandfilter, a lamella-separator, a soil
filter and a helophyte filter. Flow rates and water levels are measured in real time. The data
obtained are tested for reliability, accuracy and completeness by automated logical and
statistical tests. The project demonstrated the necessity of data validation, and the problems
detected proved to be diverse. Heavy noise in flow data was detected with an autocorrelation
check; this noise largely distorted the total measured volume. Multiple errors in data logger
software, causing faulty flow data, were discovered. Suspicious data were sometimes caused by
unexpected disturbances of the measured conditions. Data validation must take place regularly
from the very beginning of a monitoring project, to be alert to any such disturbances.
KEYWORDS
Case Study, Data Collection, Data Handling, Data Quality, Data Validation, Stormwater
Treatment
INTRODUCTION
In the last decade, water quantity sensors and, to a certain extent, water quality sensors have
become better and cheaper. Combined with the rapid development of information and
communication technology, this has led to a widespread application of continuous monitoring
in sewer systems. The municipality of Arnhem has set up a monitoring program to investigate
water quality in storm sewers as well as the efficiency of several end-of-pipe solutions for
stormwater treatment. Flow rates and water levels are measured in real time. An automatic
sampling system starts when runoff flow is measured, and after the runoff period the samples
are analyzed for water quality parameters.
Unfortunately, a sewer is a harsh environment for sensitive measuring equipment. As a result,
the data obtained may be of poor quality in terms of availability, reliability and accuracy, and
therefore have to be validated. Current practice of data validation in wastewater engineering
is largely manual and visual, often using spreadsheets. The application of continuous
monitoring has greatly increased the amount of data, and the Arnhem monitoring project also
produces a lot of data: for about two years, flow and water level data have been sampled every
minute at four locations. After two years of measuring, each sensor had produced over one
million data points.
There are no standard methods for data validation (Bertrand-Krajewski and Muste, 2008), and
manual analysis may take great effort, if it is possible at all (Van Bijnen and Korving,
submitted). For example, tabulating and plotting the data in MS Excel is impossible, because
the number of data points exceeds the program's maximum number of rows. Data management
in this project has therefore been largely automated, with Matlab as the calculation program
for data validation. The results of the validation show that it was indeed necessary.
METHODS
Monitoring setup
Four pilot installations have been set up: a sandfilter, a lamella-separator, a soil filter and
(later added to the project) a helophyte filter. At each location, the measured flow controls an
automatic sampling system, which starts when the flow exceeds a threshold.
Location sandfilter. The flow sensor at location sandfilter is a combined ultrasonic flow
sensor (OCM PRO) with ultrasonic velocity measurement. This sensor uses a correlation
technique to recognize reflecting particles and to determine the distance they travel over time.
The flow sensor is installed in a 700 mm diameter sewer, and its measuring range is set to
0–200 m³/h.
Location lamella filter. The flow sensor at location lamella filter is an ultrasonic flow
measuring system (Endress + Hauser PROMAG 50 W). This sensor uses a Doppler shift to
determine the velocity in the fully submerged pipe. The flow sensor is installed in a 250 mm
contraction at the end of a 700 mm diameter sewer.
Location soil passage. The flow sensor at location soil passage is of the same type as at
location lamella filter and is installed in a 400 mm contraction in an 800 mm pipe. Besides the
flow in the sewer, the water level in the soil passage is measured with pressure gauges of type
ATM/N.
Location helophyte field. At the helophyte field, the flow is derived from water level
measurements at an orifice construction. The setup of this pilot installation is shown in
figure 1. All sensors used here are pressure gauges of type ATM/N.
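The paper does not state how the flow is computed from the levels before and after the orifice. A common choice, assumed here, is the submerged-orifice equation Q = C_d · A · sqrt(2 g ΔH), with ΔH the level difference across the orifice. The sketch below is illustrative only: the discharge coefficient (0.6), the orifice diameter (0.2 m) and the function name are hypothetical, not values from the Arnhem installation.

```python
import math

G = 9.81  # gravitational acceleration (m/s2)


def orifice_flow_m3h(h_up, h_down, diameter_m=0.2, cd=0.6):
    """Flow (m3/h) through a submerged orifice from the levels on both sides.

    Assumes Q = Cd * A * sqrt(2 g |dH|); diameter_m and cd are hypothetical
    defaults. A negative result means the downstream level is higher,
    i.e. the head difference would drive reverse flow.
    """
    dh = h_up - h_down  # level difference across the orifice (m)
    area = math.pi * diameter_m ** 2 / 4  # orifice cross-section (m2)
    q = cd * area * math.sqrt(2 * G * abs(dh)) * 3600  # m3/s -> m3/h
    return q if dh >= 0 else -q


print(round(orifice_flow_m3h(0.45, 0.25), 1))  # about 134 m3/h for these assumed values
```

Returning a signed value rather than clipping at zero keeps a reversed head difference visible in the data instead of hiding it.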
Langeveld et al. (submitted) provide more information about the setup of the monitoring
program.
Figure 1. System layout of pilot installation helophyte filter. Legend: 1. Flow is derived from level measurements (H) before and after orifice; 2. Automatic sampling influent; 3. Helophyte filter with level measurements (H); 4. Effluent drain; 5. Manhole with level measurement (H); 6. Manual sampling effluent after runoff period; 7. Surface water.
Data collection and management
The sensors are connected to a Campbell Scientific CR200 logger at each pilot location
(CR2100 at location helophyte filter). These loggers are equipped with a GSM/GPRS
transmitter for data transport to a receiver at the Royal Haskoning office in Nijmegen; the
transmitter module also provides an SMS alerting service. Data transmission is managed from
the office with the Campbell software LoggerNet, and the transmitted data are appended to one
data file per location on the connected computer. Depending on the weather and any SMS
alerts, the data are plotted, analyzed and backed up between several times per week and
several times per day. After analysis of the measurement data, it is decided how any samples
taken will be analyzed.
Principles of data quality and data validation
Bertrand-Krajewski and Muste (2008) state that measurements are wrong until there are
sufficient and objective reasons to accept that the data are representative and reliable. Data
validation is the process of obtaining such reasons. In general, data quality can be judged on
six aspects (Liefting, 2006):
1. Reliability. The reliability of a measurement is influenced by systematic differences
between the actual state and the measured state (Clemens, 2001). Systematic errors can
arise from various causes.
2. Accuracy. Accuracy is the range of variation directly related to the measuring method
and the measuring equipment applied (Clemens, 2001). Note that improving accuracy is
useless as long as systematic errors are present.
3. Completeness. Completeness is the amount of data acquired compared with the amount
targeted in the measuring plan.
4. Actuality. Actuality is the present usability of data obtained in the past. Changed
conditions may make data no longer current.
5. Verifiability. Verifiability is the possibility of comparing the data with other
information, such as other measured data or metadata.
6. Practical usability. Practical usability of the data may be influenced by the data
format, whether the data are digitized, the date and time notation used, and similar
characteristics.
The data obtained in the Arnhem monitoring project are tested for reliability, accuracy and
completeness by automated logical and statistical tests in Matlab. The method of data
transmission and filing makes the raw data immediately accessible, so they need no (manual)
pre-processing before validation. Actuality and verifiability are secured by keeping logbooks
and archiving all relevant information digitally.
Logical tests for data validation
The following logical tests are used in the automated data validation:
• Exceedance of boundary values. This check detects values outside the range of the
measuring instrument as well as physically improbable values. The boundaries are set by
the analyst for each location.
• Absence of data. The completeness of the dataset is checked by calculating the intervals
between consecutive records. If the time stamps are not equidistant, data are missing.
• Events with too small a timescale. During dry weather, the flow sensors in a storm sewer
normally record zero values. During wet weather, the flow becomes significant until the
last runoff has been transported and the system returns to dry weather conditions. Such a
runoff peak lasts between a quarter of an hour and several hours. Deviations from dry
weather values that last less than 15 minutes are detected with a logical test and flagged.
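The three logical tests can be sketched in Python as follows, assuming one-minute time stamps stored as `datetime` objects, flow values as floats and zero as the dry weather value. The function names and the example data are hypothetical; the project's own tests were written in Matlab and may differ in detail. The one-minute interval and the 15-minute limit are taken from the text.

```python
from datetime import datetime, timedelta


def boundary_flags(values, lower, upper):
    """Flag values outside the analyst-set range [lower, upper]."""
    return [v < lower or v > upper for v in values]


def missing_gaps(times, interval=timedelta(minutes=1)):
    """Return (before, after) time stamp pairs that are further apart than
    the nominal logging interval, i.e. places where records are missing."""
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > interval]


def short_event_flags(values, min_samples=15, dry_value=0.0):
    """Flag runs of non-dry values lasting fewer than min_samples samples.

    With one-minute data, min_samples=15 corresponds to the 15-minute limit.
    Runs cut off by the start or end of the record are checked as they are.
    """
    flags = [False] * len(values)
    i, n = 0, len(values)
    while i < n:
        if values[i] == dry_value:
            i += 1
            continue
        j = i
        while j < n and values[j] != dry_value:  # find the end of this run
            j += 1
        if j - i < min_samples:  # too short to be real runoff
            flags[i:j] = [True] * (j - i)
        i = j
    return flags


# Illustrative, made-up data (one value per minute)
t0 = datetime(2006, 9, 7)
times = [t0 + timedelta(minutes=m) for m in (0, 1, 2, 5, 6)]
print(missing_gaps(times))  # one gap, between 00:02 and 00:05
flows = [0.0] * 5 + [10.0] * 3 + [0.0] * 5 + [20.0] * 20 + [0.0] * 3
print(sum(short_event_flags(flows)))  # 3 samples flagged
print(boundary_flags([-1.0, 50.0, 250.0], 0.0, 200.0))  # [True, False, True]
```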
Statistical data validation
Autocorrelation check. If for every time lag k at least one pair of measured data separated by
that lag exists, the estimate of each autocovariance is (Hamilton, 1994; Liefting, 2006):

$$\gamma_k = \frac{T-k}{T\,c_k}\sum_{t=1}^{T-k}(y_t-\mu)\,I_t\,(y_{t+k}-\mu)\,I_{t+k},$$

where
$\mu$ is the sample mean of the observed values;
$T$ is the total number of observations;
$c_k$ is a coefficient defined as $c_k = \sum_{t=1}^{T-k} I_t\,I_{t+k}$ (Jones, 1971);
$I_t$ is an indicator sequence defined as $I_t = 1$ if $y_t$ is observed and $I_t = 0$ if $y_t$ is missing.

Without missing data, $c_k = T-k$ and the estimator reduces to the usual $\frac{1}{T}\sum_{t=1}^{T-k}(y_t-\mu)(y_{t+k}-\mu)$.
The ratio $\gamma_k/\gamma_0$ is called the k-th sample autocorrelation. A moving window is used to
localize high and low correlations in the time series. The method ignores missing data while
still taking the observation times into account (Jones, 1971).
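A minimal Python sketch of this check, assuming missing values are stored as `None`. It follows the estimator above; the window of 30 observations, the flag criterion $\gamma_1/\gamma_0 < 0.7$ and the neglect of zero values are taken from the captions of figures 3 and 5, while centring the window on each sample is an assumption (the paper does not state how the window is aligned). The function names are hypothetical; this is not the authors' Matlab code.

```python
import math


def autocovariance(y, k):
    """Lag-k autocovariance of a series with gaps (None = missing).

    gamma_k = (T-k)/(T*c_k) * sum_{t=1}^{T-k} (y_t-mu)I_t (y_{t+k}-mu)I_{t+k},
    where c_k counts the pairs (t, t+k) in which both values are observed.
    """
    T = len(y)
    observed = [v for v in y if v is not None]
    if len(observed) < 2:
        return math.nan
    mu = sum(observed) / len(observed)  # sample mean of the observed values
    total, c_k = 0.0, 0
    for t in range(T - k):
        a, b = y[t], y[t + k]
        if a is not None and b is not None:  # I_t * I_{t+k} = 1
            total += (a - mu) * (b - mu)
            c_k += 1
    if c_k == 0:
        return math.nan  # no observed pair at this lag
    return (T - k) / (T * c_k) * total


def autocorrelation(y, k=1):
    """k-th sample autocorrelation gamma_k / gamma_0 (NaN if undefined)."""
    g0 = autocovariance(y, 0)
    if math.isnan(g0) or g0 == 0:
        return math.nan
    return autocovariance(y, k) / g0


def flag_noise(values, window=30, threshold=0.7):
    """Flag samples whose moving window has gamma_1/gamma_0 < threshold.

    Zero values are treated as missing (dry weather) and are never flagged.
    The window is centred on each sample and truncated at the series ends.
    """
    y = [None if v == 0 else v for v in values]
    half = window // 2
    flags = [False] * len(y)
    for t, v in enumerate(y):
        if v is None:
            continue
        seg = y[max(0, t - half): t - half + window]
        r1 = autocorrelation(seg, 1)
        flags[t] = (not math.isnan(r1)) and r1 < threshold
    return flags
```

A slowly varying signal passes the check, while a series that jumps back and forth every minute is flagged throughout.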
Provided that the measuring interval is small enough, successive data (k = 1) in a time series
should be significantly positively correlated. If successive data show no significant
autocorrelation, either the real process has a timescale too small for the measuring interval to
record it, or the data contain noise. In both cases, the quality of the data is low.
If periodic events (such as a daily pattern) are present, the data are also correlated at higher
lags k. In storm water sewers, no daily pattern is expected in flow and water level data.
Cross correlation check.
A cross correlation test is used to check for a relation between data from different time
series. An obvious example is the correlation between rain intensity and flow rates in the
nearby sewer system. The cross correlation function for two time series x and y of equal
length at time lag k is (Liefting, 2006):

$$\Gamma_{x,y,k} = \frac{T-k}{T\,c_{x,y,k}}\cdot\frac{\sum_{t=1}^{T-k}(x_t-\mu_x)\,I_{x,t}\,(y_{t+k}-\mu_y)\,I_{y,t+k}}{\sqrt{\gamma_{x,0}\,\gamma_{y,0}}},$$

where $c_{x,y,k} = \sum_{t=1}^{T-k} I_{x,t}\,I_{y,t+k}$; $\mu_x$ and $\mu_y$ are the sample means, $I_{x,t}$ and $I_{y,t}$ the indicator sequences, and $\gamma_{x,0}$ and $\gamma_{y,0}$ the lag-0 autocovariances of x and y.

The time lag k at which the correlation is maximal is determined by the hydraulic travel time
between the two measuring locations.
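The lag search can be sketched in Python, assuming equal-length one-minute series with `None` for missing values. It pairs $x_t$ (for example rain) with $y_{t+k}$ (for example flow), so the best lag in samples is an estimate of the travel time in minutes. The function names, `max_lag` and the synthetic series are hypothetical and are not Arnhem data.

```python
import math


def _mean_and_gamma0(s):
    """Sample mean and lag-0 autocovariance of the observed values (None = missing)."""
    obs = [v for v in s if v is not None]
    if not obs:
        raise ValueError("series has no observed values")
    mu = sum(obs) / len(obs)
    return mu, sum((v - mu) ** 2 for v in obs) / len(obs)


def cross_correlation(x, y, k):
    """Gamma_{x,y,k}: correlation between x_t and y_{t+k}, skipping missing pairs."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    T = len(x)
    mu_x, g_x0 = _mean_and_gamma0(x)
    mu_y, g_y0 = _mean_and_gamma0(y)
    total, c = 0.0, 0  # c = c_{x,y,k}, the number of usable pairs
    for t in range(T - k):
        a, b = x[t], y[t + k]
        if a is not None and b is not None:
            total += (a - mu_x) * (b - mu_y)
            c += 1
    if c == 0 or g_x0 == 0 or g_y0 == 0:
        return math.nan
    return (T - k) / (T * c) * total / math.sqrt(g_x0 * g_y0)


def best_lag(x, y, max_lag):
    """Lag in 0..max_lag (in samples) with the highest cross correlation."""
    scores = {k: cross_correlation(x, y, k) for k in range(max_lag + 1)}
    valid = {k: r for k, r in scores.items() if not math.isnan(r)}
    return max(valid, key=valid.get)


# Synthetic example: "flow" is "rain" delayed by 7 samples.
rain = [0.0] * 50
rain[10], rain[11], rain[30] = 5.0, 3.0, 4.0
flow = [0.0] * 7 + rain[:-7]
print(best_lag(rain, flow, max_lag=20))  # 7
```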
RESULTS AND DISCUSSION
The monitoring project in Arnhem demonstrated the necessity of data validation. The problems
detected proved to be diverse.
Location sandfilter
Almost immediately after the start of the monitoring project, the flow sensor produced the
signal shown in figure 2. The flow data were not correlated with the rain data, and the
timescale of the measured flow peaks is much smaller than that of storm weather runoff. The
flow exceeds 100 m³/h, which is more than half the measuring range. Remarkably, the wild
peaks stop while a recognizable runoff flow passes the sensor. If the peak flows were a real
process, they might reflect frequent voluminous discharges into the sewer system or
fluctuations caused by waves entering the sewer from the canal. More likely, however, the
pattern is non-physical noise.
Figure 2. Flow peaks at location sand filter after the start of the monitoring period with
precipitation in the same period.
Figure 3. The flow data in figure 2 tested for autocorrelation. The size of the moving window
is 30 observations. If the first autocorrelation $\gamma_1/\gamma_0 < 0.7$, the data are marked with an "x".
Zero values are ignored.
The data were tested for autocorrelation at k = 1 (figure 3). The pattern shows no significant
positive autocorrelation: a measured value is not correlated with the value measured one
minute earlier. Longer-term autocorrelation does not exist either. If the pattern were physical,
it would have a characteristic timescale smaller than 1 minute. That is practically impossible
because of the inertia of the water volume in the fully submerged 700 mm sewer, so the
pattern must be noise. This outcome of the validation was confirmed by an on-site test: the
sewer was closed off with clinker bricks so that actual flow was impossible, and the sensor
continued to produce noise during the test.
The noise largely distorts the total volume measured by the flow sensor. Between
7 September 2006 and 27 November 2006, the total volume measured at flows higher than
50 m³/h was 6,147 m³. When the noise is removed, however, the total volume is only 963 m³
(table 1).
Table 1. Flow data at location sandfilter between 7 September 2006 and 27 November 2006,
divided into "good" data and "noise", with test statistic for noise $\gamma_1/\gamma_0 < 0.7$.

                                            absolute            percentage
                                            good      noise     good    noise
Number of measurements                      95,929    20,963    82%     18%
Total measured volume (m³), Q > 50 m³/h     963       5,184     16%     84%
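The volumes in table 1 follow from summing one-minute flow values above 50 m³/h. A minimal Python sketch, assuming flows in m³/h at a fixed one-minute interval and one noise flag per sample (for instance from the autocorrelation check); the function name and the four example values are hypothetical.

```python
def split_volumes(flows_m3h, is_noise, threshold=50.0, dt_min=1.0):
    """Volume (m3) at flows above `threshold`, split into good data and noise.

    flows_m3h: flow per sample in m3/h; is_noise: one boolean per sample;
    dt_min: logging interval in minutes (one minute in this project).
    """
    good = noise = 0.0
    for q, noisy in zip(flows_m3h, is_noise):
        if q > threshold:
            volume = q * dt_min / 60.0  # m3/h x h = m3
            if noisy:
                noise += volume
            else:
                good += volume
    return good, noise


good, noise = split_volumes([60.0, 120.0, 0.0, 90.0], [False, True, False, False])
print(good, noise)  # 2.5 2.0
print(round(100 * 963 / (963 + 5184)))  # 16, the good-data share of volume in table 1
```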
The sandfilter installation and the sampling program start when the flow exceeds 50 m³/h.
Since the noise peaks regularly exceeded 50 m³/h, the installation often started needlessly. On
27 November 2006, a numerical dissipation that damps the noise was installed in the measuring
equipment. As a side effect, the damped noise is now artificially autocorrelated, so it can no
longer be detected with the autocorrelation test.
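Why damping defeats the test can be illustrated with a short Python experiment. The paper does not describe the dissipation algorithm; the sketch assumes a first-order recursive filter y_t = a·y_{t-1} + (1 − a)·x_t with a hypothetical coefficient a = 0.9, applied to synthetic white noise. For such a filter the lag-1 autocorrelation of the output is approximately a, well above the 0.7 criterion, so the filtered noise is no longer flagged.

```python
import random


def lag1_autocorrelation(y):
    """Plain lag-1 sample autocorrelation (no missing data)."""
    n = len(y)
    mu = sum(y) / n
    g0 = sum((v - mu) ** 2 for v in y) / n
    g1 = sum((y[t] - mu) * (y[t + 1] - mu) for t in range(n - 1)) / n
    return g1 / g0


def damp(x, a=0.9):
    """Assumed damping: first-order filter y_t = a*y_{t-1} + (1-a)*x_t."""
    out, prev = [], x[0]
    for v in x:
        prev = a * prev + (1 - a) * v
        out.append(prev)
    return out


rng = random.Random(1)
noise = [rng.gauss(0, 30) for _ in range(5000)]  # synthetic white noise
print(round(lag1_autocorrelation(noise), 2))        # close to 0
print(round(lag1_autocorrelation(damp(noise)), 2))  # close to a = 0.9
```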
Location lamella filter
Sensors and loggers always include software, which the equipment supplier often adjusts. At
location lamella filter, multiple errors were discovered in the data logger software, causing
faulty flow data.
Software error no. 1.
Soon after the start of the project on 3 July 2006, a fault was discovered because the
measured flows exceeded the confidence interval: the measurements reached velocities over
6 m/s. The cause was a software error (table 2), which was corrected on 7 September 2006.
Table 2. Excerpt of the logger program at location lamella filter. Six flow measurements per
minute should be averaged, but they were totalized, making the registered flow six times too
high.

Old code:
    DataTable (TS_Fast,1,-1)
      DataInterval (0,reg_tabel_1,reg_unit)
      Totalize (1,debiet_m3,False)   ' sums the six readings: flow 6x too high
    EndTable

Corrected code:
    DataTable (TS_Fast,1,-1)
      DataInterval (0,reg_tabel_1,reg_unit)
      Average (1,debiet_m3,0)        ' averages the six readings
    EndTable
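The effect of the wrong instruction can be emulated in Python. This is not CRBasic: the function only mimics the difference between totalizing and averaging six readings per minute, and the six flow readings are made-up values.

```python
def log_minute(samples, mode):
    """Emulate the logger's one-minute output for the sub-minute readings."""
    if mode == "totalize":
        return sum(samples)  # old code: readings are summed
    if mode == "average":
        return sum(samples) / len(samples)  # corrected code: readings are averaged
    raise ValueError(f"unknown mode: {mode}")


samples = [40.0, 42.0, 41.0, 39.0, 40.0, 38.0]  # six flow readings (m3/h) in one minute
print(log_minute(samples, "totalize"))  # 240.0
print(log_minute(samples, "average"))   # 40.0
```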
Software error no. 2.
The flow sensor has a 4–20 mA analogue output, which is linearly converted into a digital
value. The conversion function is described by an offset value and a direction coefficient. In
addition, all values smaller than a cutoff value are set to zero, leaving no negative values in
the data. About six months after the start of the monitoring program, a second fault in the flow
sensor software was discovered by pure coincidence: the offset value of the conversion
function was incorrect. This fault caused all data to be 10 m³/h too low. Between
8 September 2006 and 14 February 2007, the faultily calibrated flow sensor measured a total
volume of 3,130 m³. After correction of the offset value, the volume became 4,986 m³. Thus,
all measured values contained a structural error of 37% on average.
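A Python sketch of a linear 4–20 mA conversion with a cutoff shows how an offset error propagates, and checks the 37% figure from the reported volumes (relative to the corrected volume). The cutoff of 7.5 m³/h is the value given for location lamella filter; the 0–400 m³/h span behind the slope and offset is a hypothetical calibration, not the project's.

```python
def ma_to_flow(current_ma, slope, offset, cutoff=7.5):
    """Linear 4-20 mA to flow conversion (m3/h); values below cutoff become 0."""
    q = offset + slope * current_ma
    return q if q >= cutoff else 0.0


# Hypothetical calibration: 4-20 mA spans 0-400 m3/h, so slope = 400/16 = 25.
SLOPE = 25.0
GOOD_OFFSET = -100.0             # 4 mA -> 0 m3/h
BAD_OFFSET = GOOD_OFFSET - 10.0  # offset 10 m3/h too low

print(ma_to_flow(8.0, SLOPE, GOOD_OFFSET))  # 100.0
print(ma_to_flow(8.0, SLOPE, BAD_OFFSET))   # 90.0
print(ma_to_flow(4.5, SLOPE, BAD_OFFSET))   # 0.0 (2.5 m3/h falls below the cutoff)

# Reported volumes, 8 September 2006 - 14 February 2007
measured, corrected = 3130.0, 4986.0
print(round((corrected - measured) / corrected, 3))  # 0.372
```

Note that with a cutoff, a too-low offset not only shifts every value but also silences low flows entirely.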
On 8 September 2006, after the first software correction, the flow sensor was calibrated
locally by pumping a fixed flow of 230 m³/h into an upstream manhole. This method proved
insufficient to detect a 10 m³/h bias in the measurements.
Noise.
Figure 4 shows flow data at location lamella filter during dry weather. This pattern has no
significant autocorrelation at the first time step. The data values are grouped around six levels,
all multiples of 1/6 of the cutoff value of 7.5 m³/h (that is, multiples of 1.25 m³/h). Since the
data are means of six measurements per minute, the pattern is probably caused by one to six
small exceedances of the cutoff value per minute. These values are therefore noise.
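The grouping around multiples of 1.25 m³/h can be reproduced in Python, assuming the cutoff is applied to each of the six readings before they are averaged. For clarity the exceeding readings are set exactly at the cutoff; the sub-cutoff value of 3.0 m³/h is hypothetical.

```python
CUTOFF = 7.5  # m3/h, cutoff value at location lamella filter


def minute_mean(readings):
    """Mean of six readings after setting each value below the cutoff to zero."""
    clipped = [r if r >= CUTOFF else 0.0 for r in readings]
    return sum(clipped) / len(clipped)


# n of the six readings reach the cutoff; the others stay below it.
for n in range(1, 7):
    readings = [CUTOFF] * n + [3.0] * (6 - n)
    print(n, minute_mean(readings))  # n * 1.25 m3/h
```

Readings slightly above the cutoff would shift each level slightly upward, which is consistent with values grouped around, rather than exactly at, these six levels.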
Figure 4. Flow data during storm weather and dry weather at location lamella filter.
Figure 5. The flow data in figure 4 tested for autocorrelation. The size of the moving window
is 30 observations. If the first autocorrelation $\gamma_1/\gamma_0 < 0.7$, the data are marked with an "x".
Zero values are ignored.
Location helophyte filter
Bertrand-Krajewski and Muste (2008) distinguish two causes of anomalies in data:
1. the sensor response is unreliable or inaccurate under normal conditions,
2. the measurand (i.e. the phenomenon to be measured) is disturbed due to unexpected
events.
Data validation may therefore reveal that the data are not faulty, but that the measurand is
unexpectedly disturbed. At location helophyte filter, several examples of the latter occurred.
Level drift.
Based on the system layout in figure 1, no trend was expected in the water level in the
manhole (5) at the effluent side of the filter, because the surface water level should always be
below the invert level of the pipe. However, the water level data in the first part of the
monitoring period showed a trend (figure 6), even during dry weather. At first, the trend was
attributed to sensor drift. A field survey one week later revealed the real cause: the PVC pipe
from the manhole to the surface water had been closed with a lid below the surface water
level. The outlet of the filter was thus blocked, and the filter could not function hydraulically
at all. Had the suspicious data not been discovered early in the project, this problem might
have gone undetected for a long time (figure 7 illustrates how poorly visible a small outlet
below the water level is).
Figure 6. Trend of effluent water level after the start of the monitoring project.
Figure 7. The effluent pipe was closed with a lid under the surface water level. The removed
lid is marked.
Reverse flow.
In the weekend following the discovery of the closed effluent pipe, a pump failure caused the
nearby combined sewer to fill up and ultimately overflow into the pond near the helophyte
filter. Once the pond was filled, the foul water also entered the helophyte filter and the storm
sewer. Because no staff were present during the weekend, the failure was discovered only on
Monday (figure 8). From the measurement data, the likely cause was correctly inferred, and
Royal Haskoning informed the municipality about the incident.
Figure 8. Removal of PVC lid on Friday 4 January. Filling of the pond as result of a sewer
overflow in the following weekend. Discovery of overflow on Monday 7 January. On 8
January, the storm water sewer was emptied to remove the foul water.
Flow without rain.
In the same month, the measurements at the location showed a significant runoff peak
(figure 9). Automatic sampling started and an SMS alert was sent. Strangely, not a single drop
of rain was falling in the area. An immediate telephone inquiry revealed that the flow was
caused by the connection of a pond to the storm water sewer.
Figure 9. “Stormwater event” without storm weather on 25 January
CONCLUSIONS
From the results of the different locations, the following conclusions can be drawn:
• Data validation is a crucial part of any monitoring project.
• Checking for autocorrelation proved to be a good method to detect whether measured values
are noise.
• An accurate local calibration of a newly installed flow sensor is, though practically
difficult, an absolute must.
• All sensor software and all adjustments of the equipment used have to be filed as metadata
with the measured data, so that any signal conversion or other digital operation can be
checked afterwards.
• Suppliers and installers of measuring instruments should be treated with professional
distrust (in our case, the supplier was a well-known firm specialized in sensors for soil and
water measurements).
• If measuring data appear suspicious, the cause may be an unexpected disturbance of the
measurand as well as a sensor failure.
• Data validation must take place regularly from the very beginning of a monitoring project,
not afterwards, to be alert to any disturbances of the measurand.
ACKNOWLEDGEMENT
This monitoring program was initiated by the municipality of Arnhem. Water board
Rivierenland is a partner in the project. The project is supported by the EU Interreg program
"Urban Water". François Clemens (Delft University of Technology) contributed to the
development of the validation tools during the corresponding author's MSc graduation.
REFERENCES
Bertrand-Krajewski, J.-L. and Muste, M. (2008). Data validation: principles and implementation. In: Fletcher, T.D. and Deletic, A. (eds.), Data Requirements for Integrated Urban Water Management. Urban Water Series, UNESCO.
Butler, D. and Davies, J.W. (2000). Urban Drainage. Spon Press, London.
Bijnen, M. van and Korving, H. (submitted). Application and results of automatic validation of sewer monitoring data. 11th Int. Conf. on Urban Drainage.
Clemens, F.H.L.R. (2001). Hydrodynamic models in urban drainage: application and calibration. PhD thesis, Delft University of Technology.
Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press, Princeton, New Jersey.
Jones, R.H. (1971). Spectrum Estimation with Missing Observations. In: Annals of the Institute of Statistical Mathematics, vol. 23, Kluwer, Tokyo, pp. 387–398.
Langeveld, J.G., Liefting, H.J. and Velthorst, H. (submitted). Storm water sewers: pollution levels and measured removal rates of storm water treatment techniques. 11th Int. Conf. on Urban Drainage.
Liefting, H.J. (2006). Validation of Water Quality Data from In-Sewer Measurements. MSc thesis, Faculty of Civil Engineering and Geosciences, Delft University of Technology.