All content in this area was uploaded by Mario Molinara on Oct 18, 2021
A false positive reduction system for continuous
water quality monitoring
A. Bria, L. Ferrigno, L. Gerevini, C. Marrocco, M. Molinara
Dept. of Electrical and Information Engineering
University of Cassino and Southern Lazio
Cassino, Italy
{a.bria; ferrigno; luca.gerevini; c.marrocco; m.molinara}@unicas.it
P. Bruschi, M. Cicalini, G. Manfredini, A. Ria
University of Pisa
Pisa, Italy
{paolo.bruschi@; mattia.cicalini@phd.; giuseppe.manfredini@phd.; andrea.ria@ing.}unipi.it
G. Cerro
Dept. of Medicine and Health Sciences
University of Molise
Campobasso, Italy
gianni.cerro@unimol.it
R. Simmarano, G. Teolis, M. Vitelli
Sensichips srl
Aprilia, Italy
{roberto.simmarano; giovanni.teolis; michele.vitelli}@sensichips.com
Abstract—Water monitoring systems continuously working
ensure real-time pollutant detection capabilities according to
their sensitivity and specificity. These two features must be
balanced because, although being able to sense several substances
is desirable, reducing false positives is a primary goal for a
classification system: a high false positive rate makes
the system unusable. The current solution enables a 24/7 service
with a sampling rate of about 0.6 Hz. Our goal is to limit false
positives to 1 per day, thus achieving at least 99.99% accuracy. In
this paper, we add a false positive reduction module to our
pre-existing system, aiming to manage sources of false positives
such as sensor drift and signal oscillations. The results obtained
using a Multi Layer Perceptron classifier confirm the false
positive reduction while keeping high true positive rates.
Index Terms—water quality monitoring, machine learning,
false positive reduction
I. INTRODUCTION
Water quality is a key factor for human life, and its
assessment increasingly involves researchers from several
research areas, from sensors [1] to data processing [2] and
artificial intelligence [3], [4]. Human activities heavily affect
the status of water, and many kinds of pollutants can be found
in it. In this paper, we focus on wastewater, i.e. water polluted
by domestic, commercial and industrial activities, to cite just
a few. In a fast-developing paradigm such as the Smart City,
mapping the wastewater system by detecting possible pollution
sources is a required task. When performing an environmental
monitoring activity, sensing technologies and data analysis are
the main components to address. In-depth reviews of sensing
technologies are available [5], [6]: typical set-ups involve
the application of metal electrodes [7] or the adoption of
specific sensing films [8]. Data coming from the sensing
technologies are generally processed through normalization
techniques and fed to machine learning [9] and deep learning
[10] methods, used to classify pollutants. Such techniques can
be tuned to be very sensitive to various substances and capable
of discriminating even very similar inputs. When sensitivity
is enhanced to this extent, measurement problems, sensor drift
and superimposed noise can easily increase the number of false
positives, i.e. the system can detect pollutants even in a
water-only condition. Therefore, a task addressing specificity
is always needed to tune the system and reduce false
alarms. The authors start from a pre-existing system based
on a specific multi-sensing platform, SENSIPLUS, briefly
described in Section II. Furthermore, they exploit past efforts
on the creation of a smart water monitoring system [11]–[13]
to propose a low-cost solution, working as a finite state
machine able to reduce false positives, thus enhancing system
specificity while keeping good system sensitivity. In detail,
in both clean water and wastewater, what really influences the
detection capability is how promptly the system distinguishes
between the normal condition (i.e. no pollutant, background) and
pollutant conditions. During the acquisition phase, the
distinction between these conditions is not straightforward,
because the system continuously acquires and updates the
background condition it uses to recognize the pollutant case. In
[11], the authors adopted a static background update based on
a simple rule: the system acquired a certain number of
samples surely belonging to the background and updated it with
a very slow averaging operation, thus preventing sudden spikes
due to contaminants from strongly influencing the background
levels. In this paper, a novel procedure is proposed: background
estimation is a state of a finite state machine, in which the
machine stays until an empirical threshold is exceeded. In this
way, whenever a pollutant causes a variation of the acquired
level, the background acquisition is stopped and the background
is stored to be used in the classification phase. It is shown
that a strong false positive reduction can be obtained in this
way, while the classification accuracy remains high. The paper
is organized as follows: Section II details the system
architecture; the data collection mechanism is reported in
Section III; methodology and results are described in Sections
IV and V, respectively. Section VI draws concluding remarks.
II. SYSTEM ARCHITECTURE
The proposed measurement system is shown in Figure 1.
The system is based on a proprietary embedded IoT-ready
Micro-Analytical Sensing Platform, 12.2 × 15.5 mm in size and
with 1.5 mW power consumption, named Smart Cable Water (SCW).
The SCW is based on the SENSIPLUS microchip, a proprietary
technology of Sensichips s.r.l. developed
in collaboration with the University of Pisa. Its internal
and external analog ports allow the microchip to perform
measurements with multiple sensors, as in the case of the SCW.
Indeed, as shown in Figure 2, the system has been customized
to be placed on a printed circuit board endowed with six
InterDigitated Electrodes (IDEs), functionalized with six different
metals: Copper, Gold, Silver, Nickel, Palladium, Platinum. The
adopted physical principle is electrical impedance which,
for the same measurand, takes different values on differently
metallized IDEs. Figure 1 shows the following components:
• The SCW Sensichips board;
• The Espressif Micro Controller Unit (MCU) ESP8266,
managing the measurement layer and transmitting acquired
data via USB or Wi-Fi;
• The host controller (a PC running Windows or Linux,
or an Android device such as a smartphone or tablet).
Fig. 1. Measurement System.
As far as the MCU is concerned, it works as a transcoder bridge
between the SENSIPLUS chip and the host controller, managed
through bit-banging GPIO pin control. In particular,
the MCU communicates with the host via USB/Wi-Fi and
with SENSIPLUS via a one-wire proprietary protocol named
SENSIBUS.
Fig. 2. Smart Cable Water.
TABLE I
SYNTHETIC WASTE WATER CHEMICAL COMPOSITION.
Compound                                    Concentration [mg/l]
Fertilizer                                  91.74
Ammonium Chloride                           12.75
Sodium Acetate Trihydrate                   131.64
Magnesium Hydrogen Phosphate Trihydrate     29.02
Monopotassium Phosphate                     23.4
Iron (II) Sulfate Heptahydrate              5.80
Starch                                      122.00
Milk Powder                                 116.19
Yeast                                       52.24
Soy Oil                                     29.02
III. DATA COLLECTION
The system has been designed to recognize substances in
wastewater, but the measurements cannot all be acquired
directly in the sewage network, for two main reasons:
• First, from a measurement point of view, all measurements
must be acquired under the same reliable conditions; since
the background composition of the sewage is not stable
(consider, for example, rain events and domestic spills),
such conditions cannot be guaranteed;
• Second, from a human health point of view, it would
involve biological hazards due to the presence of viruses,
bacteria, parasites, and other dangers.
For these reasons, the experimental data have been acquired
using a Synthetic Waste Water (SWW) in order to simplify
the laboratory activities. In this background (SWW), the five
substances under consideration have been spilled: Sulphuric
Acid (SA), Ethanol (E), Sodium Hypochlorite (SH), Sodium
Chloride (SC) and Dish Wash detergent (DW).
The SWW has been obtained by mixing the substances
listed in Table I, so as to reproduce the pH and the
conductivity of real wastewater.
As far as data acquisition is concerned, it is divided into two
main phases:
• Warm-up phase: the first 600 samples are acquired in
SWW only, in order to let all sensors stabilize and to
build a measurement reference (a.k.a. baseline) used to
normalize the following measurements;
• Measurement phase: the given substance is spilled into
the SWW, and 1000 samples are acquired. These
samples thus represent the sensor evolution after substance
injection.
As for the acquired components, we measure ten different
quantities across the Gold, Silver, Platinum and Nickel IDEs. On
each IDE, we acquire resistance and capacitance at different
frequencies, in particular:
• Gold: Resistance and Capacitance at 200 Hz and
Resistance at 78 kHz;
• Silver: Resistance and Capacitance at 200 Hz;
• Platinum: Resistance and Capacitance at 200 Hz and
Resistance at 78 kHz;
• Nickel: Resistance and Capacitance at 200 Hz.
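The acquisition layout described above can be summarized in a short Python sketch. It only restates the numbers given in the text (ten measured quantities, 600 warm-up samples, 1000 measurement samples); the names `CHANNELS` and `split_acquisition`, and the tuple layout, are hypothetical and chosen for illustration only.

```python
# Hypothetical labels for the ten measured quantities: (metal, quantity, frequency in Hz).
# "R" stands for resistance and "C" for capacitance.
CHANNELS = [
    ("Gold", "R", 200), ("Gold", "C", 200), ("Gold", "R", 78_000),
    ("Silver", "R", 200), ("Silver", "C", 200),
    ("Platinum", "R", 200), ("Platinum", "C", 200), ("Platinum", "R", 78_000),
    ("Nickel", "R", 200), ("Nickel", "C", 200),
]

WARM_UP_SAMPLES = 600        # acquired in SWW only
MEASUREMENT_SAMPLES = 1000   # acquired after the substance is spilled


def split_acquisition(samples):
    """Split one acquisition into its warm-up and measurement parts.

    `samples` is a sequence of per-sample vectors, one value per channel.
    """
    expected = WARM_UP_SAMPLES + MEASUREMENT_SAMPLES
    if len(samples) != expected:
        raise ValueError(f"expected {expected} samples, got {len(samples)}")
    return samples[:WARM_UP_SAMPLES], samples[WARM_UP_SAMPLES:]
```

Each sample is then a vector with one entry per element of `CHANNELS`, which is the 10-dimensional signal used in the rest of the paper.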
IV. METHODOLOGY
In this section, a new approach for baseline tracking is
presented and compared to that proposed in [11].
A. The old baseline tracking
In the previous version [11], the baseline was captured
through an Exponential Moving Average (EMA) according to
the following equation:
$$
b_t =
\begin{cases}
s_t, & t = 0 \\
\alpha s_t + (1 - \alpha) \cdot s_{t-1}, & t > 0
\end{cases}
\qquad (1)
$$
with $\alpha = 1/EMA_c$, where $s_t$ is a vector of signals coming
directly from the SENSIPLUS sensors, $b_t$ is a vector of baselines
(one for each signal), and $EMA_c$ is the EMA coefficient,
empirically set to 10.000. This type of baseline (evaluated
for each sensor) behaved inadequately with respect to
expectations because:
• It is unable to follow the sensor drift during the phase in
which only SWW is present;
• It is unable to preserve a stable value during the phase
in which a substance is present;
• Finally, it works in counter-phase during wash-out (when
the substance goes away).
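A minimal sketch of an EMA baseline for a single signal is given below, to make the role of the coefficient concrete. It is not the authors' code: the function name `ema_baseline` is hypothetical, and the recursion uses the standard EMA form, in which the previous baseline value appears in the second term, whereas Eq. (1) as printed has $s_{t-1}$ there. That substitution is an assumption of this sketch.

```python
def ema_baseline(signal, ema_c):
    """Return an exponential moving average baseline of a scalar signal.

    alpha = 1 / ema_c, as in the text. Assumption: the recursion feeds back
    the previous baseline value (standard EMA form).
    """
    alpha = 1.0 / ema_c
    baseline = []
    for t, s in enumerate(signal):
        if t == 0:
            baseline.append(s)  # the baseline starts on the first sample
        else:
            baseline.append(alpha * s + (1.0 - alpha) * baseline[-1])
    return baseline


# Illustrative step signal: 100 background samples, then 100 samples
# after a hypothetical substance injection.
step = [1.0] * 100 + [2.0] * 100
slow = ema_baseline(step, ema_c=10_000)  # barely reacts to the step
fast = ema_baseline(step, ema_c=25)      # follows the step closely
```

Under this form, a large coefficient gives a baseline that hardly moves, while a small one follows the signal, including during an injection. This is the trade-off that the state-dependent tracking of the next subsection is meant to resolve.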
Figures 3 and 4 show the behaviour of the old baseline
and of the new baseline, respectively. A set of instants is
highlighted: $t_0$, when the acquisition starts; $t_1$, when the
substance is injected; $t_2$, when the substance is removed
(through dilution); and $t_3$, when
the substance “disappears” because the dilution becomes very
high. In Figure 3, the baseline for a single signal is generated
without any adaptation to the signal behaviour. It can be
seen that, in the interval $[t_0, t_1]$, the baseline is far from
the true signal. This distance increases during the injection
interval $[t_1, t_2]$ (which is desirable), but it then
vanishes because the baseline continues to track
the signal (which is undesirable), and this also generates an
anti-phase behaviour in the period after $t_3$.
Fig. 3. The behaviour of the previous EMA [11].
Fig. 4. The new baseline.
All these oscillations are detrimental to the classification,
which is performed on a feature vector $f_t$ obtained through a
“normalization” corresponding to the ratio between the sensor
signal $s_t$ and the evaluated baseline $b_t$, as reported in the
following expression:
$$ f_t = s_t / b_t \qquad (2) $$
B. The new baseline tracking
In order to evaluate the baseline in a different way, we
modified our system by introducing the Finite State Machine
(FSM) reported in Figure 5.
In Figure 5, $t$ denotes the current time sample, and we
also report the parameters $\alpha = 1/EMA_c$, with $EMA_c = 25$,
and $\tau = 0.05$, both empirically set. Furthermore, the distance
$d_t$ is evaluated as the Euclidean distance between the all-ones
point $\mathbf{1}$ and the vector $f_t$ in a 10-dimensional space
(10 being the number of sensor signals, i.e. the size of $s_t$):
$$ d_t = \lVert f_t - \mathbf{1} \rVert \qquad (3) $$
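The normalization of Eq. (2) and the distance of Eq. (3) can be sketched in a few lines of Python. The function names are hypothetical, and the element-wise reading of the ratio $s_t / b_t$ is an assumption consistent with the baseline being kept per signal.

```python
import math


def normalize(signal_vec, baseline_vec):
    """Eq. (2): element-wise ratio between the signal and its baseline."""
    return [s / b for s, b in zip(signal_vec, baseline_vec)]


def distance_from_ones(feature_vec):
    """Eq. (3): Euclidean distance between f_t and the all-ones point."""
    return math.sqrt(sum((f - 1.0) ** 2 for f in feature_vec))


# Illustrative values: when the signal sits on its baseline, d_t is zero;
# a 10% deviation on all ten channels gives d_t = sqrt(10 * 0.1**2).
on_baseline = distance_from_ones(normalize([5.0] * 10, [5.0] * 10))
deviated = distance_from_ones(normalize([1.1] * 10, [1.0] * 10))
```

With $\tau = 0.05$, the second case (about 0.32) is well above the threshold, while the first is exactly at the background point.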
The new baseline is evaluated as:
$$
b_t =
\begin{cases}
s_t, & t = 0 \\
b_{t-1}, & t > 0,\ S \in \{BS, BSP\} \\
\alpha s_t + (1 - \alpha) \cdot s_{t-1}, & t > 0,\ S \in \{WT, BA, BT\}
\end{cases}
\qquad (4)
$$
Fig. 5. A Finite State Machine (FSM) for the baseline acquisition and tracking.
where it can be seen that the EMA tracking is
active only when $t > 0$ and the state $S$ of the FSM belongs
to $\{WT, BA, BT\}$, while the value of $b_t$ remains constant
when $S \in \{BS, BSP\}$ (the FSM and its states are described
below).
The entire system is depicted in Figure 6, and the resulting
tracking behaviour is reported in Figure 4.
The final output $c_t$ represents the classification produced
by the whole system.
Fig. 6. A flow diagram of the entire system.
TABLE II
NUMBER OF SAMPLES FOR DIFFERENT CLASSES AND FOR TRAINING,
TEST AND VALIDATION SET
Substance (Class)   Total   Training   Validation   Test
E (1)               9371    5183       3174         1014
SH (2)              9122    5059       3073         990
SA (3)              9162    5079       3071         1012
DW (4)              9172    5078       3081         1013
SC (5)              9162    5078       3071         1013
SWW (6)             9165    5079       3072         1014
Total               55154   30556      18542        6056
As shown in Figure 5, when the acquisition starts, the first
state of the FSM is WT (WaiT), in which the machine remains
for $EMA_c$ samples. After $EMA_c$ samples, the next state of
the FSM is BA (Baseline Acquisition): during this state,
and as long as $E(d_t) + 3\sigma > \tau$, the machine tries to follow
the signal, the classifier is disabled, and the system is in a
kind of warm-up state (the output is $c_t = SWW$, i.e. the
background). The mean $E$ of $d_t$ is evaluated over the last
$EMA_c$ samples, and $\tau$ has been empirically set to 0.05. When
the tracking becomes stable, with $E(d_t) + 3\sigma < \tau$, the system
moves to the BT (Baseline Tracking) state, i.e. it is ready for
classification. During the BT state, the FSM generates a
predefined output class, $c_t = SWW$. When a substance is
injected, we expect $d_t$ to grow and, in particular, if
$d_t > \tau$, the system moves to the so-called BSP (Baseline
SusPended) state, where the classifier is not activated (the
output is again $c_t = SWW$). At this point two state changes
are possible: if $d_t > \tau$ holds until a counter in BSP reaches
5, the machine moves to the BS (Baseline Stopped) state, where
the classifier is called for each sample, i.e. for each feature
vector $f_t$. If instead $d_t < \tau$ before these 5 instants have
elapsed, the FSM returns to the BT state. Once in BS, the only
way to leave the state is for the classifier to classify the
sample as SWW (the background). If this happens at least
$EMA_c$ times, the FSM goes back to BA to prepare itself for a
new substance, and the output is SWW to allow the reset of the
baseline tracking system.
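The state machine described in this section can be sketched as follows. This is an illustrative reading of the text, not the authors' implementation, and it rests on several assumptions: the class and attribute names are hypothetical; the baseline update uses the standard EMA form with the previous baseline value; $\sigma$ is taken as the standard deviation of $d_t$ over the same window as the mean; the state in force at the start of a sample decides whether the baseline is updated; and the $EMA_c$ SWW classifications needed to leave BS are counted consecutively.

```python
import math
import statistics
from collections import deque

WT, BA, BT, BSP, BS = "WT", "BA", "BT", "BSP", "BS"
SWW = "SWW"


class BaselineFSM:
    def __init__(self, classifier, ema_c=25, tau=0.05, bsp_limit=5):
        self.classifier = classifier      # callable: feature vector -> label
        self.ema_c, self.tau, self.bsp_limit = ema_c, tau, bsp_limit
        self.alpha = 1.0 / ema_c
        self.state = WT
        self.baseline = None
        self.samples_seen = 0
        self.recent_d = deque(maxlen=ema_c)  # last EMA_c distances
        self.bsp_count = 0
        self.sww_count = 0

    def step(self, s):
        """Process one signal vector s and return the output class c_t."""
        if self.baseline is None:
            self.baseline = list(s)                      # t = 0
        elif self.state in (WT, BA, BT):                 # tracking states
            self.baseline = [self.alpha * x + (1.0 - self.alpha) * b
                             for x, b in zip(s, self.baseline)]
        # In BSP and BS the baseline is held constant.
        f = [x / b for x, b in zip(s, self.baseline)]    # Eq. (2)
        d = math.sqrt(sum((v - 1.0) ** 2 for v in f))    # Eq. (3)
        self.recent_d.append(d)
        self.samples_seen += 1
        output = SWW                                     # default output

        if self.state == WT:
            if self.samples_seen >= self.ema_c:
                self.state = BA
        elif self.state == BA:
            if len(self.recent_d) == self.ema_c:
                mean = statistics.fmean(self.recent_d)
                sigma = statistics.pstdev(self.recent_d)
                if mean + 3.0 * sigma < self.tau:
                    self.state = BT
        elif self.state == BT:
            if d > self.tau:
                self.state, self.bsp_count = BSP, 1
        elif self.state == BSP:
            if d > self.tau:
                self.bsp_count += 1
                if self.bsp_count >= self.bsp_limit:
                    self.state, self.sww_count = BS, 0
            else:
                self.state = BT
        elif self.state == BS:
            output = self.classifier(f)
            self.sww_count = self.sww_count + 1 if output == SWW else 0
            if self.sww_count >= self.ema_c:
                self.state = BA
        return output
```

A usage example would feed the machine one 10-dimensional sample at a time, e.g. `fsm = BaselineFSM(my_classifier)` followed by `c_t = fsm.step(s_t)`, where `my_classifier` stands for any trained model wrapped as a function of the feature vector. The key design point is that the classifier only runs in BS, so drift and small oscillations around the baseline never reach it.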
V. RESULTS
In the experiments, the previously proposed solution [11]
is compared with the new one. In
[11], results obtained with a Multi Layer Perceptron (MLP)
network combined with the Principal Component Analysis (PCA)
technique were presented. The same experiments have been
replicated in this paper in order to give a complete comparison
in terms of the improvements obtained. The main differences
between the two approaches are:
• The number of features: 10 in the new solution, 8 in the
old one;
• The preprocessing phase, i.e. the baseline acquisition
described in the previous section.
Table II describes the entire dataset.
Tables III and IV report the results obtained with
the old system [11] and the newly proposed
one, respectively. With the previously proposed approach, the
best results were obtained with an MLP characterized by 3
inputs (with PCA), 6 outputs (one for each class) and 32 hidden
neurons. The best mean accuracy was 87.32% and the maximum
accuracy on SWW was 99.70%. The new solution shows
better performance in terms of specificity and sensitivity, with
a mean accuracy of 97.20% and an accuracy on SWW equal
to 99.99% in the best case, which is obtained with an MLP with
10 inputs, 6 outputs and 16 hidden neurons.
It is worth noting that the performance obtained on SWW
is compatible with the desired accuracy of 99.99%. This
value comes from the following consideration: the samples are
acquired continuously at a rate of one every 1.6 seconds.
At this rate, 24 · 60 · 60 / 1.6 = 54000 samples
are acquired in a day. Supposing that no more than one false
positive per day is acceptable, the required accuracy becomes
at least 100 · (54000 − 1)/54000 ≈ 99.998%, i.e. 99.99% when
truncated to two decimal places.
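The arithmetic behind this requirement can be checked with a few lines of Python. The constant and function names are hypothetical; the numbers are those given in the text.

```python
SAMPLE_PERIOD_S = 1.6            # one sample every 1.6 seconds
SECONDS_PER_DAY = 24 * 60 * 60

# Sampling rate in Hz and number of samples acquired in one day.
sampling_rate_hz = 1.0 / SAMPLE_PERIOD_S
samples_per_day = round(SECONDS_PER_DAY / SAMPLE_PERIOD_S)


def required_accuracy(samples, max_false_positives=1):
    """Minimum accuracy (in percent) on background samples that keeps
    the daily false positives within the given budget."""
    return 100.0 * (samples - max_false_positives) / samples


accuracy_percent = required_accuracy(samples_per_day)
```

Here `samples_per_day` evaluates to 54000 and `sampling_rate_hz` to 0.625, which is the "about 0.6 Hz" sampling rate; `accuracy_percent` is slightly above 99.998.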
TABLE III
GLOBAL RESULTS FOR ANN FROM THE PREVIOUS METHOD. FOR EACH
EXPERIMENT HAVE BEEN REPORTED: SHL = SIZE OF HIDDEN LAYER,
PCA = WITH OR WITHOUT PCA, M = MEAN, SD = STANDARD
DEVIATION, SWW = ACCURACY ON SWW
SHL   PCA   M        SD        SWW
64    no    0.8226   0.11651   0.9512
64    yes   0.8009   0.12908   0.9444
32    no    0.7795   0.10832   0.9362
32    yes   0.8732   0.13132   0.9970
16    no    0.8264   0.10541   0.9840
16    yes   0.8366   0.12749   0.9734
TABLE IV
GLOBAL RESULTS FOR ANN FROM THE NEW METHOD. FOR EACH
EXPERIMENT HAVE BEEN REPORTED: SHL = SIZE OF HIDDEN LAYER,
PCA = WITH OR WITHOUT PCA, M = MEAN, SD = STANDARD
DEVIATION, SWW = ACCURACY ON SWW
SHL   PCA   M        SD       SWW
64    no    0.9719   0.0443   0.9989
64    yes   0.9699   0.0443   0.9987
32    no    0.9720   0.0443   0.9981
32    yes   0.9688   0.0442   0.9989
16    no    0.9720   0.0443   0.9999
16    yes   0.9693   0.0445   0.9984
VI. CONCLUSIONS
In this paper, a false positive reduction system has been
presented and its results have been compared with a previously
proposed approach that generated too many false positives for a
continuous, real-time application context. The results show
that the new approach outperforms the old one both in
global accuracy, which grew from 87.32% to 97.20%, and in false
positives, whose rate becomes compatible with 24/7 use. A first
field trial showed promising results.
VII. ACKNOWLEDGMENT
The research leading to these results has received funding
from the European Union's Horizon 2020 research and
innovation programme under grant agreement SYSTEM No.
787128. The authors are solely responsible for this work; it
does not represent the opinion of the Community, and
the Community is not responsible for any use that might be
made of information contained therein. This work was also
supported by MIUR (Ministry of Education, University and
Research, Law 232/2016, Department of Excellence).
REFERENCES
[1] J. Cleary, D. Maher, C. Slater, and D. Diamond, “In situ monitoring of
environmental water quality using an autonomous microfluidic sensor,”
in 2010 IEEE Sensors Applications Symposium (SAS), 2010, pp. 36–40.
[2] A. C. D. S. Júnior, R. Munoz, M. D. L. A. Quezada, A. V. L. Neto,
M. M. Hassan, and V. H. C. D. Albuquerque, “Internet of water things: A
remote raw water monitoring and control system,” IEEE Access, vol. 9,
pp. 35790–35800, 2021.
[3] K. S., S. T.V., M. S. Kumaraswamy, and V. Nair, “Iot based water
parameter monitoring system,” in 2020 5th International Conference on
Communication and Electronics Systems (ICCES), 2020, pp. 1299–1303.
[4] Z. Sun, N. B. Chang, C. F. Chen, C. Mostafiz, and W. Gao, “Ensemble
learning via higher order singular value decomposition for integrating
data and classifier fusion in water quality monitoring,” IEEE Journal
of Selected Topics in Applied Earth Observations and Remote Sensing,
vol. 14, pp. 3345–3360, 2021.
[5] S. Zhuiykov, “Solid-state sensors monitoring parameters of water quality
for the next generation of wireless sensor networks,” Sensors and
Actuators B: Chemical, vol. 161, no. 1, pp. 1–20, 2012.
[6] S. N. Zulkifli, H. A. Rahim, and W.-J. Lau, “Detection of contaminants
in water supply: A review on state-of-the-art monitoring technologies
and their applications,” Sensors and Actuators B: Chemical, vol. 255,
pp. 2657–2689, 2018.
[7] C. Desmet, A. Degiuli, C. Ferrari, F. S. Romolo, L. Blum, and
C. Marquette, “Electrochemical sensor for explosives precursors’ detection in
water,” Challenges, vol. 8, no. 1, 2017.
[8] J. K. Atkinson, M. Glanc, M. Prakorbjanya, M. Sophocleous, R. P.
Sion, and E. Garcia-Breijo, “Thick film screen printed environmental and
chemical sensor array reference electrodes suitable for subterranean and
subaqueous deployments,” Microelectronics International, Apr. 2013.
[9] G. Charulatha, S. Srinivasalu, O. Uma Maheswari, T. Venugopal, and
L. Giridharan, “Evaluation of ground water quality contaminants using
linear regression and artificial neural network models,” Arabian Journal
of Geosciences, vol. 10, no. 6, p. 128, Mar 2017.
[10] S. N. Dean, L. C. Shriver-Lake, D. A. Stenger, J. S. Erickson, J. P.
Golden, and S. A. Trammell, “Machine learning techniques for chemical
identification using cyclic square wave voltammetry,” Sensors, vol. 19,
no. 10, 2019.
[11] M. Ferdinandi, M. Molinara, G. Cerro, L. Ferrigno, C. Marrocco,
A. Bria, P. Di Meo, C. Bourelly, and R. Simmarano, “A novel smart
system for contaminants detection and recognition in water,” in 2019
IEEE International Conference on Smart Computing (SMARTCOMP),
June 2019, pp. 186–191.
[12] M. Molinara, M. Ferdinandi, G. Cerro, L. Ferrigno, and E. Massera, “An
end to end indoor air monitoring system based on machine learning and
sensiplus platform,” IEEE Access, vol. 8, pp. 72204–72215, 2020.
[13] G. Betta, G. Cerro, M. Ferdinandi, L. Ferrigno, and M. Molinara,
“Contaminants detection and classification through a customized iot-
based platform: A case study,” IEEE Instrumentation Measurement
Magazine, vol. 22, no. 6, pp. 35–44, Dec 2019.