All content in this area was uploaded by Mario Molinara on Mar 15, 2022
Evolutionary computation to implement an IoT-based
system for water pollution detection
Claudio De Stefano · Luigi Ferrigno · Francesco Fontanella · Luca Gerevini · Mario Molinara
Abstract The problem of detecting pollutants in water with non-invasive and low-cost sensors is an open question. In this paper, we propose a system for the detection and classification of pollutants that improves on a previous proposal based on geometric cones. The solution is based on a classifier suitable for implementation aboard the so-called Smart Cable Water (SCW) sensor, a multi-sensor based on SENSIPLUS® technology developed by Sensichips s.r.l. The SCW, endowed with six interdigitated electrodes, is a smart sensor covered by specific sensing materials that allow it to differentiate between water contaminants. By using the PCA or LDA decomposition, we obtain a data compression that makes the data suitable for the "edge computing" paradigm, with a reduction from a 10-dimensional space to a 3-dimensional space. We defined an ad-hoc classifier to distinguish contaminants represented by points in the 3-dimensional space, and we used an evolutionary algorithm to learn the classifier's parameters. Finally, we compared the performance of our system with that achieved by the old classification system based only on PCA, as well as with those achieved by other machine learning algorithms. The proposed system achieved the best accuracy, 87%, outperforming the other state-of-the-art systems compared. The novelty of the proposed system lies in the use of an evolutionary algorithm to optimize the parameters of a novel PCA-based classification algorithm for the detection of water pollutants.
Corresponding author:
Francesco Fontanella
tel.: +39 0776 2993382
E-mail: fontanella@unicas.it
Claudio De Stefano, Luigi Ferrigno, Francesco Fontanella, Luca Gerevini, Mario Molinara
Department of Electrical and Information Engineering (DIEI)
University of Cassino and Southern Lazio
Via G. Di Biasio, 43 03043
Cassino (FR), ITALY
2 Claudio De Stefano et al.
1 Introduction
Water pollution is a worldwide concern, and the WHO (World Health Organization) has estimated that about two billion people worldwide are affected by this problem [29]. This concern also includes wastewater which, if properly treated, helps protect ecosystems and may allow the recovery of energy, nutrients, and other recoverable materials.
Water quality is usually monitored by laboratory analyses, performed by experienced professionals who use sophisticated tools. Due to the time and cost of this approach, environmental disasters cannot be prevented. Prevention requires an effective and reactive water analysis through a large number of distributed measurement systems. While such systems are available and guarantee good performance in terms of accuracy and reliability, their large-scale use is limited by their high costs.
In this context, low-cost microsensors for capillary monitoring would be very useful. These microsensors should combine low cost with good measurement accuracy (even for low levels of pollution), as well as good reliability. Such a system could easily spread to developing countries as well. Furthermore, systems conceived in this way would make it possible to use the paradigms of the Internet of Things (IoT) [1,19,23,26], as well as those of edge and fog computing [18,27], to perform early analysis and detection in the field. These paradigms benefit from the application of Artificial Intelligence (AI) and Machine Learning (ML) techniques to effectively analyze and exploit the information contained in the generated data [2,5,6].
In two previous papers [10,28], we presented two IoT-ready systems for wastewater pollutant detection and classification, based on the multi-sensor microcontroller SENSIPLUS®. In both systems, input data were first projected into a 3-D space and then classified using simple geometrical models. The aim was to implement these models using a simple SENSIPLUS® microcontroller with few computational resources (or other cheap microcontrollers currently available on the market). In the first paper, we proposed a model based on cones centered on the origin of the transformed 3-D reference system. We used a one-versus-all strategy to learn the four parameters of the cones (one cone for each pollutant to be recognized). In the second paper, each contaminant was represented by a straight line passing through the origin of the transformed 3-D reference system, and each data point was assigned to the nearest line. This approach allowed us to implement a multiclass classifier in a straightforward manner, thus avoiding the labeling conflicts between cones that derive from the one-versus-all strategy. It also allowed us to simplify the classification model: in a 3-D space, a straight line through the origin is represented by three parameters.
In both papers, we used an evolutionary algorithm (EA) to find the optimal values of the model parameters (cones or lines). These systems were tested on four contaminants (acetic acid, phosphoric acid, sulphuric acid, ammonia) as well as synthetic wastewater.
Title Suppressed Due to Excessive Length 3
In this paper, we present a further development of the systems described above. In particular, we used polar coordinates to represent lines, which allowed us to represent each line using two parameters. Any system for water contaminant detection should be able to recognize as many pollutants as possible. For this reason, we tested our system on more pollutants than those tested in the two previous approaches mentioned above. We also tried two strategies to improve the performance of our system: (i) we tested linear discriminant analysis (LDA) to project the input data into the 3-D space; (ii) we used training data points to initialize the individuals in the initial population.
To summarize, the main objectives of the paper are the following:
– we propose a further development of a previously presented IoT-based system for wastewater pollutant detection;
– we propose a novel 3-D data representation based on polar coordinates;
– in order to improve the performance of our system, we compared two dimensionality reduction strategies: the first one based on the PCA algorithm used in a previous study, and the second one based on the LDA algorithm;
– we also improved the results provided by the evolutionary algorithm by using a different initialization procedure, which generates the initial population taking into account the information of the training set;
– we tested the ability of our system to cope with several contaminants by increasing the number of pollutants considered in the experiments.
The remainder of the paper is organized as follows: Section 2 discusses the related work; Section 3 details data collection; Section 4 presents the system architecture and the evolutionary algorithm we used to learn the parameters of the classification model; Section 5 shows the experimental results. Some conclusions are eventually left to Section 6.
2 Related work
The analysis of issues related to the monitoring of environmental pollution is engaging many researchers and technical communities. They are trying to propose new sensors able to reliably detect pollutants while saving money, size and energy consumption, as well as new network technologies, new communication standards and, finally, new methods for data analysis. Many researchers are exploiting the advantages offered by Artificial Intelligence and Machine Learning (ML) [3,4,13,15]. The input data of ML techniques are often preprocessed using Principal Component Analysis (PCA). PCA is a dimensionality reduction technique that aims at removing redundant features and features of poor statistical significance (with respect to the target concept).
It is worth noting that PCA is most often used as a preprocessing step for feature reduction before classification and is generally not used to develop an ad-hoc classifier, as is the case of our approach. In [21], for example, the authors faced the classification problem of gene-expression microarray data. They proposed a novel method (named PCA-BEL) based on a dual-stage approach that first applies PCA to reduce the number of features and then a Brain Emotional Learning (BEL) network, with the aim of minimizing the computational effort. BEL networks are methodologies that use simulated emotions to aid their learning process. The novelty of the proposal was the application of the BEL model to cancer classification. Using standard datasets, they benchmarked the method against standard classification methods for five types of cancer. The experimental results showed that PCA-BEL achieved improved detection for three of them, whereas worse results were obtained for the remaining two. In [17], instead, the authors compared PCA and SVM in fault classification for "complicated industrial process". They used a publicly available standard dataset, the challenging Tennessee Eastman (TE) benchmark. The experimental results on the TE process showed that PCA offers a higher classification rate for this multi-class classification case with much less computational effort, whereas SVM classification takes longer and gets less accurate results. Finally, similar approaches have also been used for face recognition [14] and intrusion detection problems [30]. In [14] the authors proposed a face recognition system based on PCA combined with SVM. The novelty of the paper was in the combination of PCA with SVM. The experimental results showed that the RBF kernel outperformed the other kernels, also obtaining the best results with respect to an MLP. In [30] the authors presented a novel approach for intrusion detection in computer networks. They developed an adaptive intrusion detection method based on PCA and SVM. The method consisted of two steps: the former is the PCA, which aims at reducing the quantity of data to be considered by the classifier; the latter is the SVM, which solves the classification problem. The use of a standard dataset showed the effectiveness of the approach in terms of accuracy and of training and testing times.
Evolutionary computation (EC) based algorithms have proven to be effective in solving many real-world problems characterized by large and non-linear search spaces [12,7,11,8]. They have also been used to improve the performance provided by the basic PCA algorithm. In [20], in a hyperspectral image classification problem, the authors proposed a new approach (called Selective Principal Component Analysis based on Genetic Algorithm with Subgroups, SPCA-GAS) to select the best feature subset to provide as input to PCA. This approach introduced the use of subgroups for combining feature selection and extraction. Each subgroup was a partition of the search space including the solutions with a number of features in the range [n_i, n_{i+1}]. Each subgroup was then separately evolved in a subpopulation to find the best number of features to be selected. Experiments were carried out on the AVIRIS dataset with 100 bands, and showed that SPCA-GAS outperformed the compared approaches in terms of average accuracy and feature reduction rate.
In [31], instead, the authors used a GA to find the optimal PCA components so that the resulting features (extracted from PET images) were able to identify people with suspected Alzheimer's disease. The proposed approach was based on eigenvector selection and weighting and achieved a test accuracy of 90% on a dataset of 210 clinical cases. Finally, in [22] the authors introduced a novel approach that used PCA and GA for human face recognition. The PCA was used for extracting features from images with the help of covariance analysis to generate eigen components of the images, whereas the GA was used for dimensionality reduction. The proposed approach was tested on the Japanese Female Facial Expression (JAFFE) dataset and achieved an accuracy of approximately 96%.

Fig. 1: Layout of the proposed system.
From the brief literature review outlined above, we can observe that EC-based approaches were mainly used to optimize the performance of PCA by suitably selecting or modifying the principal components provided by the standard algorithm. However, to the best of our knowledge, EC-based algorithms have never been used to learn the parameters of a classification model in the feature space provided by the PCA, as is the case of our approach. The only exceptions are the two papers mentioned in the Introduction [10,28], which report the results of our previous studies. As shown in the experiments (Section 5), this approach allowed our method to significantly improve the performance of wastewater pollution detection systems.
3 Data Collection
Data were collected in such a way as to simulate the conditions present in a controlled drain of a sewage network. To this aim, we used Synthetic Wastewater (SWW), whose composition is shown in Table 1. Further details about its composition can be found in [24]. Before starting each acquisition, the pH of the SWW was corrected to stay between 7.2 and 7.9. The correction was made by adding some NaOH to raise the pH value, or some HCl to decrease it. Finally, the data were collected at room temperature, in a range between 20 °C and 30 °C.
Data were collected using a Sensichips board equipped with a SENSIPLUS® micro-chip connected to six sensors, and a microcontroller unit for data communication. Data are sent to an external device, which can be a personal computer, a tablet, or even a smartphone.
3.1 Measurement procedure
The measurement procedure used to collect data from the sensors consists of two steps:
– Warm-up: to allow the sensors to stabilize, the first 600 samples are acquired in SWW only.
– Measurement: after the collection of the first 600 samples, the contaminant is injected in the SWW and 1,000 samples are acquired.

Table 1: Synthetic Wastewater chemical composition.
Compound                                     Concentration [mg/L]
Fertilizer                                    91.74
Ammonium Chloride                             12.75
Sodium Acetate Trihydrate                    131.64
Magnesium Hydrogen Phosphate Trihydrate       29.02
Monopotassium Phosphate                       23.4
Iron (II) Sulfate Heptahydrate                 5.80
Starch                                       122.00
Milk Powder                                  116.19
Yeast                                         52.24
Soy Oil                                       29.02
The 600 samples acquired in the warm-up step are also used to build a "baseline", which is used to normalize the remaining measurements. This normalization plays an important role because it allows us to "ignore" the contaminants already present in the wastewater and lets the system focus only on the injected contaminant. Contaminant concentrations and quantities are shown in Table 2. It is important to specify that, before acquiring the data for a given contaminant, preliminary tests were carried out to ensure that, given the quantity of contaminant selected, the pH was between 3.0 and 12.0 (outside this range the SCW sensors are damaged).
4 System architecture
The proposed system is made of two modules (see Fig. 1). The first module uses a dimensionality reduction algorithm to transform the input data, consisting of ten electrical measures, i.e. resistance and capacitance values measured at different frequencies and with various sensors (platinum, gold, silver, and nickel), into a 3-D space. The aim of this transformation is twofold: (i) to simplify the original data by identifying a few uncorrelated features which maximize data variability; (ii) to set up a simple classification model that can be implemented with very few hardware resources, as is the case for the simple water sensors considered in our study.

Table 2: Contaminant concentrations.
Contaminant          Initial Concentration [%]   Injected Quantity [mL]   Final Concentration [%]
Acetic Acid          80                          0.3                      0.2393
Ammonia              30                          0.3                      0.0897
Phosphoric Acid      75                          0.12                     0.0899
Hydrogen Peroxide    35                          0.4                      0.1394
Formic Acid          85                          0.1                      0.0849
Sulphuric Acid       95                          0.1                      0.0949

Fig. 2: Flowchart of the proposed system.
Once data points are projected into the 3-D space, the second module classifies them using a simple geometrical model: straight lines passing through the origin of the transformed 3-D reference system. The entire classification system is shown in Figure 2. The dimensionality reduction transformation and the classification model implemented are described in the following subsections.
4.1 Data transformation
As mentioned in the Introduction, our main goal is to build an ad-hoc classification system that can be implemented aboard low-cost sensors. For this reason, it is necessary to reduce the space dimensionality as much as possible, in order to lower the overall computational complexity of the system and the number of parameters of the model used to classify water pollutants.
For the dimensionality reduction we tested two approaches: PCA and LDA. These procedures are used to project the points (data samples) from the original ten-dimensional input space to a 3-D space. This data transformation allows us to design a simple and lightweight classification model that can be implemented with very few hardware resources.
The PCA is used to decompose a multidimensional dataset into a set of orthogonal components that maximize the variance. Basically, the PCA learns a given number of components from the training data, and new data can then be projected onto these components. In our case the PCA has been performed using the randomized truncated Singular Value Decomposition of the data, which is equivalent to the eigen-decomposition of the covariance matrix [16].
The LDA algorithm is based on a supervised procedure that generates a linear decision boundary, using Bayes' rule and fitting a Gaussian density to each class, under the assumption that all classes share the same covariance matrix. In order to reduce the dimensionality, the LDA can project the input data along the most discriminative directions.
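As a compact restatement of the PCA projection described above, the following sketch uses the textbook SVD formulation rather than the authors' exact pipeline. The symbols X (the N × 10 training matrix), μ (its column mean), V_3 and z are introduced here for illustration and do not appear in the original text.

```latex
\mu = \frac{1}{N}\sum_{k=1}^{N} x_k, \qquad
X_c = X - \mathbf{1}\mu^{\top} = U \Sigma V^{\top}, \qquad
z = V_3^{\top}\,(x - \mu) \in \mathbb{R}^{3}
```

Here V_3 collects the three right singular vectors associated with the largest singular values. Under these assumptions, projecting a new sample aboard the sensor amounts to subtracting μ and computing one 3 × 10 matrix-vector product, i.e. 30 multiplications.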
4.2 The classification model
Once the original data points are projected into the 3-D space, the training samples are passed to the multi-class classifier that models the C classes (the number of classes is equal to the number of contaminants to recognize). Each contaminant, except for the Synthetic Wastewater (SWW), is modelled by a line r passing through the origin of the reference system, as shown in Figure 3(a). According to the approach presented in [28], to classify SWW points we use a sphere instead of a line. The sphere represents a threshold on the distance from the origin of the reference system: if a given point P falls inside the sphere, we classify it as belonging to SWW (see Figure 3). We chose this solution since, according to the data normalization mechanism used (see Section 5), SWW samples are concentrated around the origin (see Figure 3(b)). All the other contaminants, instead, are modelled by a line r defined by the spherical coordinates θ, φ and ρ. However, since the ρ parameter defines the distance from the origin, we take it as a constant value, and so the classification model evolves just two parameters: θ and φ. Therefore, a line is defined by only these two parameters. A given point P is labeled as belonging to the class associated with its nearest line. With respect to our old classification model, based on C lines r defined by three parameters l, m, n, we chose to use the spherical coordinates θ and φ (with constant ρ) in order to further decrease the complexity and the memory required by our classification system. Indeed, while in the old model we have to store C × 3 coefficients, in the new classification model we have to store only C × 2 coefficients.
Fig. 3: Line-based Model 3D View. (a) Entire Model 3-D View. (b) SWW Sphere.

The problem of finding the set of parameters θ, φ that maximize the performance for a given dataset D is an optimization problem where the function to maximize is the global accuracy (see Algorithm 2). To summarize, a given point P in the 3-D space is assigned according to the following procedure:
1. check whether P is inside the SWW sphere or not. If so, assign P to SWW, otherwise go to point 2;
2. compute the distance between P and each of the C lines;
3. assign P to the class of its nearest line.
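The three-step procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the `sww_radius` parameter (the text does not say how the sphere radius is chosen) and the labels used in the usage note are all hypothetical.

```python
import math

def direction(theta, phi):
    """Unit vector (l, m, n) of a line through the origin, taking rho = 1."""
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))

def distance_to_line(point, theta, phi):
    """Distance between a 3-D point and the line through the origin (theta, phi)."""
    v = direction(theta, phi)
    t = sum(p * c for p, c in zip(point, v))  # parameter of the foot of the perpendicular
    return math.sqrt(sum((p - t * c) ** 2 for p, c in zip(point, v)))

def classify(point, lines, sww_radius, sww_label="SWW"):
    """lines: dict mapping a contaminant label to its (theta, phi) pair.

    sww_radius is an assumed, externally supplied threshold.
    """
    # Step 1: points inside the sphere centred on the origin are labelled SWW.
    if math.sqrt(sum(p * p for p in point)) < sww_radius:
        return sww_label
    # Steps 2-3: otherwise return the label of the nearest line.
    return min(lines, key=lambda label: distance_to_line(point, *lines[label]))
```

For instance, with two hypothetical lines lying along the x and y axes, `classify((2.0, 0.1, 0.0), lines, 0.1)` returns the label of the x-axis line, while a point closer than 0.1 to the origin is labelled SWW.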
In order to find the distance between a line r and a point P, we can start from the parametric equation of the line in the 3-D space:

\[ r: \begin{cases} x_r = x_0 + l\,t \\ y_r = y_0 + m\,t \\ z_r = z_0 + n\,t \end{cases} \quad \text{with } t \in \mathbb{R} \implies v_r = (l, m, n) \tag{1} \]

where (x_0, y_0, z_0) is the point through which the line r passes, here the origin of the reference system, whereas (l, m, n) are the components, with respect to the basis {i, j, k}, of a vector parallel to r. We computed the transformation from Cartesian to polar coordinates by inverting the following equations:

\[ \begin{cases} l = \rho \sin\phi \cos\theta \\ m = \rho \sin\phi \sin\theta \\ n = \rho \cos\phi \end{cases} \quad \text{with } \rho \in [0, +\infty),\ \theta \in [0, 2\pi),\ \phi \in [0, \pi] \tag{2} \]

Note that, as mentioned above, in a 3-D space a line passing through the origin is characterized by θ and φ, whereas ρ can assume any value. For this reason we set ρ = 1.
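The change of representation in Eq. 2 and its inverse can be sketched as follows. The function names are hypothetical, and the inverse uses the standard `acos`/`atan2` formulas, which the original text does not spell out.

```python
import math

def direction_from_angles(theta, phi):
    """Eq. 2 with rho = 1: unit direction (l, m, n) of a line through the origin."""
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))

def angles_from_direction(l, m, n):
    """Inverse of Eq. 2: recover (theta, phi) from any non-zero direction vector."""
    rho = math.sqrt(l * l + m * m + n * n)
    if rho == 0.0:
        raise ValueError("the zero vector does not define a line")
    phi = math.acos(n / rho)                   # polar angle in [0, pi]
    theta = math.atan2(m, l) % (2 * math.pi)   # azimuth wrapped to [0, 2*pi)
    return theta, phi
```

Because ρ is discarded, a line from the old Cartesian model, stored as three coefficients (l, m, n), maps to just two stored values (θ, φ).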
At this point, we can compute the plane α that is orthogonal to r and passes through the point P. The Cartesian equation of a plane α in the 3-D space is:

\[ \alpha: ax + by + cz + d = 0 \tag{3} \]

The plane orthogonal to the line r and passing through P is given by:

\[ l x + m y + n z - (l x_P + m y_P + n z_P) = 0 \tag{4} \]

By replacing (x, y, z) with (x_r, y_r, z_r), we can solve Eq. 4 with respect to the parameter t of Eq. 1. We can thus calculate the point H(x_H, y_H, z_H), given by the intersection between the plane of Eq. 4 and the line of Eq. 1:

\[ H: \begin{cases} x_H = x_0 + l\,t \\ y_H = y_0 + m\,t \\ z_H = z_0 + n\,t \end{cases} \tag{5} \]

Finally, we can compute the distance between P and r as:

\[ d(P, r) = d(P, H) = \sqrt{(x_P - x_H)^2 + (y_P - y_H)^2 + (z_P - z_H)^2} \tag{6} \]
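Equations 1 to 6 can be followed step by step in code. The sketch below assumes, as in the text, that the line passes through the origin, so (x_0, y_0, z_0) = (0, 0, 0). Substituting Eq. 1 into Eq. 4 then gives t = (l x_P + m y_P + n z_P) / (l² + m² + n²), and the denominator equals 1 when ρ = 1. The function name is hypothetical.

```python
import math

def foot_and_distance(p, theta, phi):
    """Return the foot H of the perpendicular from P to the line, and d(P, r)."""
    # Direction of the line from Eq. 2 with rho = 1.
    l = math.sin(phi) * math.cos(theta)
    m = math.sin(phi) * math.sin(theta)
    n = math.cos(phi)
    x_p, y_p, z_p = p
    # Eq. 4 solved for t after substituting Eq. 1 with (x0, y0, z0) = (0, 0, 0).
    t = (l * x_p + m * y_p + n * z_p) / (l * l + m * m + n * n)
    # Eq. 5: intersection between the plane and the line.
    h = (l * t, m * t, n * t)
    # Eq. 6: Euclidean distance between P and H.
    d = math.sqrt((x_p - h[0]) ** 2 + (y_p - h[1]) ** 2 + (z_p - h[2]) ** 2)
    return h, d
```

For a line along the x axis (θ = 0, φ = π/2) and P = (1, 0, 5), the foot is approximately H = (1, 0, 0) and the distance is approximately 5.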
Algorithm 1: Evolutionary algorithm
input: list of parameters (Table 3), a data set T
output: best individual found
begin
    randomly initialize a population of P individuals;
    evaluate the fitness of each individual;
    g = 0;
    while g < Ng do
        copy the best e individuals in the new population;
        for i = 0 to P/2 − e do
            select a couple of individuals;
            replicate the selected individuals;
            if flip(pc) then
                apply the crossover operator on the selected individuals;
            if flip(pm) then
                perform the mutation on the offspring;
        evaluate the fitness of each individual;
        replace the old population with the new one;
        update the best individual found so far;
        g = g + 1;
    return the best individual found;
end
The function flip(p) returns the value 1 with a probability p and the value 0 with a probability (1 − p).
4.3 The evolutionary algorithm
To find the optimal parameters for the C lines in the 3-D space representing the contaminants to be detected and distinguished, we used a generational evolutionary algorithm (EA). The algorithm starts by generating a population of P individuals, each made of 2 × C real variables.
Afterwards, the fitness of the individuals is evaluated and a new population is generated. In order to implement an elitist strategy, the best e individuals are copied into the new population without being modified by the genetic operators; this strategy ensures that the best individuals found along the evolutionary process are not lost. Then the remaining (P − e)/2 couples of individuals are selected using the tournament method. Uniform crossover is applied to each of the selected couples with probability pc. Next, the mutation operator is applied with probability pm. The value of pm has been set to 1/(2 × C), i.e. the reciprocal of the number of variables in the chromosome. This probability value allows, on average, the modification of only one chromosome element, and has been suggested in [25] as the optimal mutation rate below the error threshold of replication. Finally, these individuals are added to the new population. The process just described is repeated for Ng generations. The evolutionary algorithm implemented is described in Algorithm 1, whereas Algorithm 2 describes the fitness function used. Further details about the evolutionary algorithm can be found in [9].
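A minimal, self-contained sketch of a generational EA of this kind is given below. It is not the authors' code. The per-gene mutation that adds a uniform perturbation in [−mr, mr], the `bounds` argument used for initialization and the returned `history` list are assumptions made for illustration. The default parameter values are those of Table 3, and with C = 6 the default pm = 1/(2 × C) = 1/12 ≈ 0.083 agrees with the 0.08 reported there.

```python
import random

def flip(p, rng):
    """Return True with probability p (the flip(p) of Algorithm 1)."""
    return rng.random() < p

def tournament(pop, fits, t, rng):
    """Return a copy of the fittest among t randomly drawn individuals."""
    idx = rng.sample(range(len(pop)), t)
    return pop[max(idx, key=lambda i: fits[i])][:]

def evolve(fitness, bounds, P=100, Ng=500, e=2, t=5, pc=0.6, pm=None, mr=0.1, seed=0):
    """Maximize `fitness` over real vectors; bounds holds one (low, high) pair per gene."""
    rng = random.Random(seed)
    n = len(bounds)
    pm = 1.0 / n if pm is None else pm          # 1/(2C) when n = 2C
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(P)]
    fits = [fitness(ind) for ind in pop]
    history = [max(fits)]                       # best fitness per generation
    for _ in range(Ng):
        order = sorted(range(P), key=lambda i: fits[i], reverse=True)
        new_pop = [pop[i][:] for i in order[:e]]            # elitism: copied unchanged
        while len(new_pop) < P:
            a = tournament(pop, fits, t, rng)
            b = tournament(pop, fits, t, rng)
            if flip(pc, rng):                               # uniform crossover
                for k in range(n):
                    if flip(0.5, rng):
                        a[k], b[k] = b[k], a[k]
            for child in (a, b):
                # Assumed mutation: each gene is perturbed with probability pm.
                # No clipping is applied, since angles remain valid when wrapped.
                child = [g + rng.uniform(-mr, mr) if flip(pm, rng) else g for g in child]
                if len(new_pop) < P:
                    new_pop.append(child)
        pop = new_pop
        fits = [fitness(ind) for ind in pop]
        history.append(max(fits))
    best = max(range(P), key=lambda i: fits[i])
    return pop[best], fits[best], history
```

In the setting of the paper, `fitness` would be the accuracy of Algorithm 2 computed on the training points, and `bounds` would hold one (θ, φ) range pair per contaminant. Because the elites are copied unchanged, the best fitness in `history` never decreases from one generation to the next.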
5 Experimental results
To test the effectiveness of our system we took into account six contaminants (acetic acid, phosphoric acid, sulphuric acid, ammonia, formic acid and hydrogen peroxide) as well as SWW. The dataset contained ten data acquisitions per contaminant, each consisting of 1,600 samples. From each acquisition we removed the first 600 samples: the first half to allow sensors to stabilize, the remaining half to build the baseline. The last 1,000 samples were used to build the dataset used for the experiments detailed in the following. So, for each contaminant we used 10,000 samples. To evaluate the classification performance of our system we used the ten-fold cross-validation strategy.
Data were acquired using twenty SCW boards, equipped with different versions of the SENSIPLUS® micro-chip. Data were organized in such a way that the samples recorded using the same device were included in the same fold. This data split allowed us to assess the device independence of the performance achieved by our system.
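One way to build folds in which all the samples of a device stay together is sketched below. The paper does not describe how acquisitions were assigned to folds, so the greedy balancing rule, the function name and the `(acquisition_id, device_id)` input format are assumptions.

```python
from collections import defaultdict

def device_folds(acquisitions, n_folds=10):
    """Assign acquisitions to folds so that each device appears in exactly one fold.

    acquisitions: iterable of (acquisition_id, device_id) pairs.
    Returns a list of n_folds lists of acquisition ids.
    """
    by_device = defaultdict(list)
    for acq_id, device_id in acquisitions:
        by_device[device_id].append(acq_id)
    folds = [[] for _ in range(n_folds)]
    # Greedy balancing: place the largest device groups first,
    # each one into the fold that is currently the smallest.
    for device_id in sorted(by_device, key=lambda d: (-len(by_device[d]), str(d))):
        smallest = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[smallest].extend(by_device[device_id])
    return folds
```

With twenty devices and ten folds, this rule places two devices in each fold when every device contributes the same number of acquisitions.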
Algorithm 2: Fitness function
input: A dataset T consisting of NT points in the 3-D space, an individual I representing C lines
output: Accuracy achieved by I on T
begin
    Nc = 0;
    for i = 1 to NT do
        compute the distance between Pi ∈ T and each of the lines of I
        assign Pi to the nearest line rn
        if label(Pi) == label(rn) then
            Nc = Nc + 1;
    return Nc/NT;
end

Table 3: Evolutionary algorithm parameters.
Parameter                Symbol   Value
Population size          P        100
Number of generations    Ng       500
Elitism                  e        2
Tournament size          t        5
Crossover probability    pc       0.6
Mutation probability     pm       0.08
Mutation range           mr       0.1

Since the physical quantities measured by the on-board sensors of our system differ in nature, they have different value ranges. For this reason we normalized the data with respect to the baseline. In particular, the 1,000 samples of each acquisition used to build our dataset were divided by the respective baseline, computed on the first 600 samples, as follows:

\[ n = \frac{x_i}{b_i} - 1.0 \quad \forall i \in [1, 2, \ldots, 10] \tag{7} \]

where x_i represents the set of the last 1,000 samples of the i-th acquisition, b_i is the baseline of that acquisition, and n is the output set, consisting of the normalized samples. The subtraction of 1.0 centers the data on the origin: since the baseline b_i is computed in SWW only, all the SWW samples lie around the origin. This operation was performed over all the acquisitions used to build our dataset.
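The baseline normalization of Eq. 7 can be illustrated with the short sketch below. The text does not state how the baseline is derived from the 600 warm-up samples, so taking the per-feature mean is an assumption, as are the function name and the list-of-rows input format.

```python
def normalize_acquisition(samples, n_baseline=600):
    """Normalize one acquisition against its own warm-up baseline (Eq. 7).

    samples: list of rows, one row of feature values per sample,
             with the warm-up samples first.
    Returns the normalized rows of the samples that follow the warm-up.
    """
    warmup = samples[:n_baseline]
    n_features = len(samples[0])
    # Assumed baseline: per-feature mean over the warm-up samples.
    baseline = [sum(row[j] for row in warmup) / len(warmup) for j in range(n_features)]
    # Divide by the baseline and subtract 1.0 so that SWW-like samples sit near the origin.
    return [[row[j] / baseline[j] - 1.0 for j in range(n_features)]
            for row in samples[n_baseline:]]
```

A sample whose features equal the baseline maps to the zero vector, which is why SWW points cluster around the origin and can be captured by a sphere.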
We split the dataset built as just described into two sets: a training set containing 90% of the samples of all contaminants except SWW, and a test set consisting of the remaining samples as well as the SWW samples. Summarizing, the training set consists of 54,000 samples (9,000 for each contaminant), whereas the test set contains 7,000 samples (1,000 per contaminant plus 1,000 SWW samples). We chose to add only 1,000 SWW samples (one acquisition) at a time to the test set in order to keep the data balanced. Furthermore, in order to reduce the training time of the EA, we randomly chose 5,400 (10%) of the 54,000 training samples.
The training set was used to compute the fitness of the individuals of the EA outlined in Section 4. For each fold, we performed thirty runs and, at the end of each run, the twelve real values (two parameters per contaminant, see Eq. 2) encoded by the individual with the best fitness were stored as the solution provided by that run. To set the parameters of the EA, we performed some preliminary trials. The resulting parameters were used for the experiments described below and are shown in Table 3.
To evaluate the effectiveness of our system, we performed three sets of experiments, using the data described above. In the first, we compared the results of our system with those achieved by the system presented in [28]. In the second, we tested the two strategies taken into account to improve the system performance (LDA data transformation and initialization of the initial population). Finally, in the third set, we compared our results with those of four well-known and widely used classification algorithms. The experiments performed are described in the following subsections.
Fig. 4: Confusion matrix achieved on the test set with the PB system (overall accuracy: 0.87).
5.1 Polar coordinates testing
In the first set of experiments we tried to answer the following question: do the polar coordinates used to represent the lines (see Section 4) allow us to improve the performance of our system? To answer this question we compared our system's performance with that achieved by the approach presented in [28]. Figures 4 and 5 show the best confusion matrices obtained on the test data by our system (polar-based, PB in the following) and the previous one (Cartesian-based, CB in the following), respectively. From the figures we can see that both approaches confused few of the pollutants with SWW, confirming the effectiveness of our approach in developing a system able to detect pollutants in water. However, in terms of overall accuracy, the novel proposed approach outperformed the previous one: the best overall accuracy of PB was 0.86, whereas that of CB was 0.42.
From Figure 4, we can see that both formic and phosphoric acids were confused with acetic acid (with probabilities equal to 0.46 and 0.38, respectively). This result is "acceptable" because sensors tend to respond to acids with similar patterns. As for Figure 5, we can observe that, apart from the confusion of formic and phosphoric acid with sulphuric acid, hydrogen peroxide was confused with sulphuric acid, whereas formic acid was confused with hydrogen peroxide. This peroxide-acid confusion is much less acceptable than the acid-acid confusion, because for these two kinds of substances sensors tend to provide different measurement patterns. Together with the overall accuracy comparison, this last result confirms that polar coordinates are more effective than Cartesian ones in discriminating the pollutants analyzed in this study.

Fig. 5: Confusion matrix achieved on the test set with the CB system (overall accuracy: 0.42).
5.2 System improvement testing
In the second set of experiments we tried to answer the following question: how can we improve the performance of our system? To this aim, we tested two approaches: (i) using the Linear Discriminant Analysis (LDA in the following) algorithm to project the original input data into the 3-D space; (ii) using information from the training set to initialize the individuals in the initial population.
5.2.1 PCA vs LDA
Fig. 6: Confusion matrix achieved on the test set with LDA (overall accuracy: 0.71).

As mentioned in Section 4, the LDA algorithm is a dimensionality reduction technique that uses a supervised procedure to find a linear combination of the features in the original space that allows a better class separation. We used LDA to project the original 10-feature data points into the 3-D space. To test the effectiveness of this transformation, we compared its results with those achieved using the PCA algorithm. Figure 6 shows the confusion matrix obtained using LDA as the data transformation algorithm. Comparing this matrix with that shown in Figure 4, we can see that PCA outperformed LDA in terms of both overall accuracy (PCA: 0.87, LDA: 0.71) and pollutant confusion with SWW (see the last columns of the confusion matrices). It is worth noting that this latter performance index is much more important than inter-pollutant confusion: confusing a pollutant with another one still allows the end-user to be warned, whereas confusing a pollutant with SWW does not allow any warning. PCA confused only a small percentage of pollutants with SWW, whereas LDA confused 60% of phosphoric acid with SWW. As for the inter-pollutant confusion, we can observe that LDA produced a peroxide-acid confusion (peroxide with acetic and formic acids), which is less acceptable than acid-acid confusion (see the discussion in the previous subsection).

These results seem counterintuitive: LDA uses a supervised procedure, whereas PCA uses an unsupervised procedure that only preserves data variation. However, most probably this depends on the fact that LDA exploits most of the class label information, limiting the ability of the EC module to find the best 3-D model (the set of axes).
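The distinction drawn above between confusing a pollutant with SWW (no warning is raised) and confusing SWW with a pollutant (a false alarm) can be made explicit with two rates computed from a confusion matrix of counts. The sketch below is illustrative only: the function names, the `clean_index` parameter marking the SWW class, and the toy counts are assumptions, not values from the paper.

```python
def missed_detection_rate(counts, clean_index):
    """Share of all pollutant samples that were predicted as clean water."""
    missed = 0
    pollutant_total = 0
    for true_class, row in enumerate(counts):
        if true_class == clean_index:
            continue  # skip the clean-water row, it holds no pollutant samples
        missed += row[clean_index]
        pollutant_total += sum(row)
    return missed / pollutant_total


def false_alarm_rate(counts, clean_index):
    """Share of clean-water samples that were predicted as some pollutant."""
    clean_row = counts[clean_index]
    return (sum(clean_row) - clean_row[clean_index]) / sum(clean_row)


# Hypothetical counts: two pollutants, then clean water as the last class.
toy_counts = [
    [7, 2, 1],
    [1, 6, 3],
    [0, 1, 9],
]
toy_missed = missed_detection_rate(toy_counts, clean_index=2)
toy_false_alarms = false_alarm_rate(toy_counts, clean_index=2)
```

With this layout the missed detections are read from the last column of the pollutant rows, matching the "last columns of the confusion matrices" mentioned in the text.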
Algorithm 3: Individual initialization
input: A dataset T consisting of N_T points in the 3-D space, an individual I representing C lines to be initialized
output: I initialized
begin
    N_c = 0;
    for i = 1 to C do
        randomly select from T a sample s_i belonging to the class i
        compute θ_i and φ_i in such a way that the i-th axis of I passes through s_i
    return I;
end
5.3 Smart Initialization
As a second improvement strategy, we tested a different initialization procedure for the initial population of the evolutionary algorithm (see line #1 of Algorithm 1). This procedure initializes a fraction of the individuals in the initial population using the information provided by the training set. In practice, given an individual to be initialized, each pair of real values representing an axis a is initialized by randomly choosing a sample s from the training set belonging to the class (pollutant) of a and setting the values in such a way that a passes through s (see Algorithm 3). Since each sample of our dataset is represented by the x, y, and z coordinates, the θ_i and φ_i values can be computed according to the following equations:
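$$
\phi = \arccos\frac{z}{\rho} \quad \text{with } (x, y, z) \neq (0, 0, 0) \tag{8}
$$

$$
\theta =
\begin{cases}
\dfrac{\pi}{2} & \text{if } x = 0,\ y > 0 \\[4pt]
\dfrac{3\pi}{2} & \text{if } x = 0,\ y < 0 \\[4pt]
\text{not defined} & \text{if } x = 0,\ y = 0 \\[4pt]
\arctan\dfrac{y}{x} & \text{if } x > 0,\ y \geq 0 \\[4pt]
\arctan\dfrac{y}{x} + 2\pi & \text{if } x > 0,\ y < 0 \\[4pt]
\arctan\dfrac{y}{x} + \pi & \text{if } x < 0
\end{cases}
\tag{9}
$$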
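As a minimal sketch of this conversion, the function below returns the two angles of the line passing through the origin and a sample. It assumes that ρ is the Euclidean norm of (x, y, z) and that θ is the azimuth measured in [0, 2π); instead of an explicit case analysis it uses `math.atan2` folded into that interval. The function names are hypothetical.

```python
import math


def cartesian_to_angles(x, y, z):
    """Return (theta, phi) for the line through the origin and (x, y, z)."""
    rho = math.sqrt(x * x + y * y + z * z)  # assumed definition of rho
    if rho == 0.0:
        raise ValueError("angles are undefined at the origin")
    if x == 0.0 and y == 0.0:
        raise ValueError("theta is undefined on the z axis")
    phi = math.acos(z / rho)                     # polar angle in [0, pi]
    theta = math.atan2(y, x) % (2.0 * math.pi)   # azimuth folded into [0, 2*pi)
    return theta, phi


def angles_to_direction(theta, phi):
    """Unit vector pointing along the axis described by (theta, phi)."""
    return (
        math.sin(phi) * math.cos(theta),
        math.sin(phi) * math.sin(theta),
        math.cos(phi),
    )
```

Converting a sample to angles and back with `angles_to_direction` gives the sample rescaled to unit length, which is a quick way to check that the axis really passes through it.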
Fig. 7: Average fitness along the evolution with different initialization fractions.

We tested three values for the fraction of the initial population to be initialized according to the procedure described in Algorithm 3: 0.05, 0.10, and 0.15. To assess how this "smart initialization" affected the evolutionary process, we analysed and compared the population average fitness (training accuracy) along the evolution for the three values tested. This comparison is shown in Figure 7. The figure also shows the curve obtained without initialization (0.0). From the figure we can see that the values 0.0, 0.05 and 0.10 achieved similar performance, with the latter performing slightly better than the other two. We can also see that the value 0.15 performed much worse than the other ones. From the figure we can draw the following conclusions: (i) too high a fraction of initialized individuals limits the exploration ability of our evolutionary algorithm; (ii) low values of initialization do not allow the algorithm to improve its exploration ability; (iii) there is a value that allows a small but still significant improvement of the performance.
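The initialization scheme discussed above, in which only a fraction of the initial population is built from training samples, might be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: an individual is taken to be a list of C (θ, φ) pairs, random individuals are drawn uniformly from [0, 2π) × [0, π], no training sample lies at the origin, and all function names are hypothetical.

```python
import math
import random


def axis_through(sample):
    """(theta, phi) of the line through the origin and a 3-D sample."""
    x, y, z = sample
    rho = math.sqrt(x * x + y * y + z * z)  # assumed non-zero
    return math.atan2(y, x) % (2.0 * math.pi), math.acos(z / rho)


def random_individual(n_classes, rng):
    """One axis per class with uniformly random angles (an assumption)."""
    return [(rng.uniform(0.0, 2.0 * math.pi), rng.uniform(0.0, math.pi))
            for _ in range(n_classes)]


def smart_individual(samples_by_class, rng):
    """Each axis passes through a randomly chosen sample of its own class."""
    return [axis_through(rng.choice(samples)) for samples in samples_by_class]


def initial_population(size, fraction, samples_by_class, rng):
    """Smart-initialize round(size * fraction) individuals, randomize the rest."""
    n_smart = int(round(size * fraction))
    n_classes = len(samples_by_class)
    population = [smart_individual(samples_by_class, rng) for _ in range(n_smart)]
    population += [random_individual(n_classes, rng)
                   for _ in range(size - n_smart)]
    return population


# Hypothetical training data: one sample for each of two classes.
toy_samples = [[(1.0, 0.0, 0.0)], [(0.0, 1.0, 0.0)]]
toy_population = initial_population(20, 0.10, toy_samples, random.Random(0))
```

With a population of 20 and a fraction of 0.10, the first two individuals are built from training samples and the remaining 18 are random, so most of the population still explores the search space freely.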
To further investigate how the smart initialization affected the evolutionary process, we also plotted, for each single pollutant, the population average fitness (training accuracy) along the evolution for the values tested. The plots are shown in Figure 8. They show that, as expected, for most pollutants the value 0.15 performed worse than the others. Only for sulphuric acid did this value outperform the other ones. Most probably, the axis parameters for this acid were not easy to find and a higher initialization fraction helped the search process. The value 0.10 performed slightly better than the others, except for formic acid, for which the random initialization performed best. Most probably, this result indicates that for this acid the values of θ and φ of the axis can be easily found.
5.4 Comparison findings

Table 4: Values of the classifier parameters used in the experiments. Note that the number of hidden neurons of the NN derives from the following formula: (#features + #classes)/2.

Classifier  Parameter                    Value
DT          Confidence factor            0.25
            Minimum #instances per leaf  2
K-NN        K                            3
            Distance                     Euclidean
NN          Learning rate                0.3
            Momentum                     0.2
            Hidden neurons               8
            Epochs                       500
SVM         Kernel                       RBF
            C                            1.0
            γ                            0.5

To further test the effectiveness of our approach, we compared its results with those achieved by four well-known and widely used classification algorithms, namely Decision Tree (DT), K-Nearest Neighbors (K-NN), Neural Networks (NN), and Support Vector Machines (SVM). The values of the parameters used in the experiments are shown in Table 4. For the sake of a fair comparison, we trained and tested these classifiers (performing 30 runs) using the same procedure used for our evolutionary algorithm. Also in this case, for each run we randomly selected 10% of the training set and trained the classifier on it. Each trained classifier was then evaluated on the test set. In the following we will refer to them as ML algorithms. To statistically validate the comparison results, we performed the Wilcoxon rank-sum test (α = 0.05).
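For readers unfamiliar with the Wilcoxon rank-sum test used for this validation, the sketch below shows a two-sided version based on the normal approximation, with tied values sharing their average rank and no tie correction in the variance. It is a simplified illustration, not the statistical software used in the experiments, and the accuracy values fed to it are made up for the example.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2.0 + 1.0  # mean of positions start..end, 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_sum_test(a, b):
    """Return (z, two-sided p-value) using the normal approximation."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2.0          # expected rank sum under H0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / std
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical accuracies of two classifiers over 30 runs each.
runs_a = [0.70 + 0.001 * i for i in range(30)]
runs_b = [0.80 + 0.001 * i for i in range(30)]
z_ab, p_ab = rank_sum_test(runs_a, runs_b)
```

A p-value below the chosen α (0.05 here) would lead to rejecting the hypothesis that the two sets of runs come from the same distribution.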
Comparison results are shown in Table 5. The table shows the average accuracy and the related standard deviation computed on the 30 runs performed. The table also shows the p-value of the Wilcoxon test and the performance achieved on the best run. From the table we can see that the performance differences between our system and the ML algorithms are not statistically significant. However, if we look at the best performance achieved, our approach largely outperforms the other ones. It is worth noticing that this performance is particularly important in real-world applications, as is the case for our system. In these applications, once multiple runs have been performed, the solution from the best run is used to implement the real system.
Table 5: Comparison results with the state-of-the-art classification algorithms.

Classifier  Avg   Std dev  p     Best
Our system  0.69  0.13     -     0.87
SVM         0.73  0.01     0.16  0.74
MLP         0.73  0.01     0.11  0.74
DT          0.69  0.02     0.92  0.74
KNN         0.71  0.01     0.67  0.73
To better understand the behaviour of the classification algorithms used for the comparison, we analyzed the confusion matrices computed from their best run. These matrices are shown in Figure 9. From the figure we can see that they show similar patterns: formic (#3) and phosphoric (#5) acid are confused with other pollutants. The first was mostly confused with acetic acid (#1), whereas the second was mostly confused with formic acid. These behaviours are similar to that exhibited by our system (see Figure 4). However, for our system the percentage of pollutants confused is much lower than those achieved by the approaches used for the comparison. From the figure we can also see that SWW is correctly classified by three of the four ML algorithms used. Only DT had a significant percentage of SWW samples (about 22%) confused with acids. This value represents the percentage of false alarms (with respect to SWW samples) and makes this algorithm perform much worse than the other ones.
5.5 Discussion
Looking at the results reported above, some conclusions can be drawn:
– Polar coordinates achieve a better performance than Cartesian ones, in terms of both overall accuracy and groups of pollutants confused. This result confirms that, for the number of pollutants considered in this study (C = 6), using a shorter chromosome (2 × C instead of 3 × C) allows the EA to be more effective in searching for the best parameters of the proposed classification model.
– The use of LDA (a supervised procedure) for the dimensionality reduction did not produce any performance improvement compared to PCA (an unsupervised procedure). This result suggests that the data transformation provided by LDA, which maximizes class separation, limits the effectiveness of the EA in finding the best parameters of the classification model.
– The initialization of a fraction of the individuals in the initial population by using training set information allows the EA to find better solutions. We have found that the best value for this fraction is 0.10. This value, although quite low, allows a small but significant improvement of the performance. Together with the LDA performance discussed above, this result is a further confirmation that providing too much "a-priori information" to the EA limits its search ability.
– As mentioned at the beginning of Section 5, the performance of our system was tested on data acquired with microchips different from those used to train it. Therefore, the good results achieved by our system confirm that its performance is robust with respect to device-to-device variability. This is an important point to consider: in a real-world scenario, the parameters optimized by our system can be used to configure all the devices to be produced.
6 Conclusions and future work
Water pollution is a worldwide concern that also includes wastewater which, if not polluted, helps protect ecosystems. Therefore, it is crucial to find reliable and low-cost technologies for continuous and widespread monitoring of wastewater.

In two previous papers, we presented a system for water pollutant classification to be implemented on the multi-sensor microcontroller SENSIPLUS®. Input data were first projected into a 3-D space and then classified using simple geometrical models. The aim was to implement a pollutant classification system able to work even with the few computational resources available on cheap microcontrollers. In this paper, we presented a further development of those approaches. This development allowed us to: (i) improve the effectiveness of our IoT-based system; (ii) reduce the amount of computational resources needed. The experiments were performed on six contaminants. The obtained results proved the effectiveness of the solutions proposed to improve the performance of our system. The results also confirmed that, on its best run, our system outperforms some state-of-the-art classification algorithms.

Future work will focus on the definition of a variable-length chromosome. This should allow us to identify sub-clusters of points belonging to the same contaminant. In this context we will also implement the "dynamic labeling" strategy presented in [9]. According to this approach, the axes making up an individual will not be labeled a priori; instead, their labeling will occur after each sample in the training set has been assigned to its nearest axis.
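One possible reading of the dynamic labeling idea is sketched below: every training sample is assigned to its nearest axis, and each axis then takes the majority class of the samples it attracted. This is only an illustration under stated assumptions and not the procedure of [9]: each axis is treated as a full line through the origin described by (θ, φ), "nearest" means minimum perpendicular distance, and the function names, toy axes, and class labels are hypothetical.

```python
import math
from collections import Counter


def direction(theta, phi):
    """Unit vector of the axis described by the angles (theta, phi)."""
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))


def distance_to_axis(point, axis):
    """Perpendicular distance from a 3-D point to a line through the origin."""
    u = direction(*axis)
    dot = sum(p * c for p, c in zip(point, u))
    residual = [p - dot * c for p, c in zip(point, u)]
    return math.sqrt(sum(r * r for r in residual))


def nearest_axis(point, axes):
    """Index of the axis closest to the point."""
    return min(range(len(axes)), key=lambda k: distance_to_axis(point, axes[k]))


def label_axes(axes, samples, labels):
    """Give each axis the majority label of its samples (None if it has none)."""
    votes = [Counter() for _ in axes]
    for point, label in zip(samples, labels):
        votes[nearest_axis(point, axes)][label] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


# Hypothetical axes along x, y and z, and a few labeled samples.
toy_axes = [(0.0, math.pi / 2), (math.pi / 2, math.pi / 2), (0.0, 0.0)]
toy_samples = [(2.0, 0.1, 0.0), (3.0, -0.2, 0.1), (0.1, 1.5, 0.0), (0.0, 0.2, 4.0)]
toy_labels = ["acetic", "acetic", "ammonia", "peroxide"]
toy_axis_labels = label_axes(toy_axes, toy_samples, toy_labels)
```

Because labels are derived from the assignments, several axes may end up with the same label, which is what would let a variable-length chromosome represent sub-clusters of one contaminant.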
Acknowledgments
The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement SYSTEM No. 787128. The authors are solely responsible for this work; it does not represent the opinion of the Community, and the Community is not responsible for any use that might be made of the information contained therein.
The authors gratefully acknowledge Sensichips s.r.l. for the support during the experimental phases.
This work was also supported by MIUR (Ministry for Education, University and Research, Law 232/2016, Department of Excellence).
Conflicts of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
References
1. Atzori, L., Iera, A., Morabito, G.: The internet of things: A survey. Computer Networks 54(15), 2787–2805 (2010)
2. Bernieri, A., Ferrigno, L., Laracca, M., Molinara, M.: An SVM approach to crack shape reconstruction in eddy current testing. In: 2006 IEEE Instrumentation and Measurement Technology Conference Proceedings, pp. 2121–2126 (2006)
3. Betta, G., Cerro, G., Ferdinandi, M., Ferrigno, L., Molinara, M.: Contaminants detection and classification through a customized IoT-based platform: A case study. IEEE Instrumentation & Measurement Magazine 22(6), 35–44 (2019)
4. Bruschi, P., Cerro, G., Colace, L., De Iacovo, A., Del Cesta, S., Ferdinandi, M., Ferrigno, L., Molinara, M., Ria, A., Simmarano, R., Tortorella, F., Venettacci, C.: A novel integrated smart system for indoor air monitoring and gas recognition. In: 2018 IEEE International Conference on Smart Computing (SMARTCOMP), pp. 470–475 (2018)
5. Cerro, G., Ferdinandi, M., Ferrigno, L., Laracca, M., Molinara, M.: Metrological characterization of a novel microsensor platform for activated carbon filters monitoring. IEEE Transactions on Instrumentation and Measurement 67(10), 2504–2515 (2018)
6. Cerro, G., Ferdinandi, M., Ferrigno, L., Molinara, M.: Preliminary realization of a monitoring system of activated carbon filter RLI based on the SENSIPLUS® microsensor platform. In: 2017 IEEE International Workshop on Measurement and Networking (M&N), pp. 1–5 (2017)
7. Cilia, N., De Stefano, C., Fontanella, F., Scotto di Freca, A.: Variable-length representation for EC-based feature selection in high-dimensional data. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11454 LNCS, 325–340 (2019)
8. Cilia, N., De Stefano, C., Fontanella, F., Raimondo, S., Scotto di Freca, A.: An experimental comparison of feature-selection and classification methods for microarray datasets. Information (Switzerland) 10(3) (2019)
9. Cordella, L.P., De Stefano, C., Fontanella, F.: Evolutionary prototyping for handwriting recognition. International Journal of Pattern Recognition and Artificial Intelligence 21(01), 157–178 (2007)
10. De Stefano, C., Ferrigno, L., Fontanella, F., Gerevini, L., Scotto di Freca, A.: A novel PCA-based approach for building on-board sensor classifiers for water contaminant detection. Pattern Recognition Letters 135, 375–381 (2020). DOI https://doi.org/10.1016/j.patrec.2020.05.015
11. De Stefano, C., Fontanella, F., Folino, G., Scotto Di Freca, A.: A Bayesian approach for combining ensembles of GP classifiers. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6713 LNCS, 26–35 (2011)
12. De Stefano, C., Fontanella, F., Marrocco, C.: A GA-based feature selection algorithm for remote sensing images. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 4974 LNCS, 285–294 (2008)
13. Desmet, C., Degiuli, A., Ferrari, C., Romolo, F.S., Blum, L., Marquette, C.: Electrochemical sensor for explosives precursors' detection in water. Challenges 8(1) (2017)
14. Faruqe, M.O., Hasan, M.A.M.: Face recognition using PCA and SVM. In: 2009 3rd International Conference on Anti-counterfeiting, Security, and Identification in Communication, pp. 97–101 (2009)
15. Ferdinandi, M., Molinara, M., Cerro, G., Ferrigno, L., Marrocco, C., Bria, A., Di Meo, P., Bourelly, C., Simmarano, R.: A novel smart system for contaminants detection and recognition in water. In: 2019 IEEE International Conference on Smart Computing (SMARTCOMP), pp. 186–191 (2019)
16. Halko, N., Martinsson, P.G., Tropp, J.A.: Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review 53(2), 217–288 (2011)
17. Jing, C., Hou, J.: SVM and PCA based fault classification approaches for complicated industrial process. Neurocomputing 167, 636–642 (2015)
18. Kaur, A., Singh, P., Nayyar, A.: Fog Computing: Building a Road to IoT with Fog Analytics, pp. 59–78. Springer Singapore, Singapore (2020)
19. Krishnamurthi, R., Kumar, A., Gopinathan, D., Nayyar, A., Qureshi, B.: An overview of IoT sensor data processing, fusion, and analysis techniques. Sensors 20(21) (2020)
20. Liu Ying, Gu Yanfeng, Zhang Ye: Hyperspectral feature extraction using selective PCA based on genetic algorithm with subgroups. In: First International Conference on Innovative Computing, Information and Control - Volume I (ICICIC'06), vol. 3, pp. 652–656 (2006)
21. Lotfi, E., Keshavarz, A.: Gene expression microarray classification using PCA–BEL. Computers in Biology and Medicine 54, 180–187 (2014)
22. Mahmud, F., Haque, M.E., Zuhori, S.T., Pal, B.: Human face recognition using PCA based genetic algorithm. In: 2014 International Conference on Electrical Engineering and Information Communication Technology, pp. 1–5 (2014)
23. Nayyar, A., Puri, V.: Smart farming: IoT based smart sensors agriculture stick for live temperature and moisture monitoring using Arduino, cloud computing & solar technology. In: Proc. of the International Conference on Communication and Computing Systems (ICCCS-2016), pp. 673–680 (2016)
24. Nopens, I., Capalozza, C., Vanrolleghem, P.A.: Stability analysis of a synthetic municipal wastewater. Department of Applied Mathematics Biometrics and Process Control, University of Gent, Belgium (2001)
25. Ochoa, G.: Error thresholds in genetic algorithms. Evolutionary Computation 14(2), 157–182 (2006)
26. Rathee, D.S., Ahuja, K., Nayyar, A.: Sustainable future IoT services with touch-enabled handheld devices. Security and Privacy of Electronic Healthcare Records: Concepts, paradigms and solutions (2019)
27. Shi, W., Dustdar, S.: The promise of edge computing. Computer 49(5), 78–81 (2016)
28. Stefano, C.D., Ferrigno, L., Fontanella, F., Gerevini, L., Molinara, M.: A novel evolutionary approach for IoT-based water contaminant detection. In: P.A. Castillo, J.L.J. Laredo (eds.) Applications of Evolutionary Computation - 24th International Conference, EvoApplications 2021, Held as Part of EvoStar 2021, Virtual Event, April 7-9, 2021, Proceedings, Lecture Notes in Computer Science, vol. 12694, pp. 781–794. Springer (2021)
29. Whelton, A.J., McMillan, L., Connell, M., Kelley, K.M., Gill, J.P., White, K.D., Gupta, R., Dey, R., Novy, C.: Residential tap water contamination following the Freedom Industries chemical spill: Perceptions, water quality, and health impacts. Environmental Science & Technology 49(2), 813–823 (2015)
30. Xu, X., Wang, X.: An adaptive network intrusion detection method based on PCA and support vector machines. In: X. Li, S. Wang, Z.Y. Dong (eds.) Advanced Data Mining and Applications, pp. 696–703. Springer Berlin Heidelberg, Berlin, Heidelberg (2005)
31. Yong Xia, Wen, L., Eberl, S., Fulham, M., Feng, D.: Genetic algorithm-based PCA eigenvector selection and weighting for automated identification of dementia using FDG-PET imaging. In: 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 4812–4815 (2008)
Fig. 8: Average fitness along the evolution with different initialization fractions for the six pollutants. Panels: (a) Acetic acid, (b) Ammonia, (c) Formic acid, (d) Hydrogen peroxide, (e) Phosphoric acid, (f) Sulphuric acid.
Fig. 9: Confusion matrices achieved on the best run by the ML algorithms. Panels: (a) DT, (b) KNN, (c) NN, (d) SVM.