Evolutionary computation to implement an IoT-based
system for water pollution detection
Claudio De Stefano · Luigi Ferrigno · Francesco Fontanella · Luca Gerevini · Mario Molinara
Abstract
The problem of detecting pollutants in water with non-invasive and low-cost sensors is an open question. In this paper, we propose a system for the detection and classification of pollutants based on the improvement of a previous proposal, focused on geometric cones. The solution is based on a classifier suitable to be implemented aboard the so-called Smart Cable Water (SCW) sensor, a multi-sensor based on SENSIPLUS® technology developed by Sensichips s.r.l. The SCW endowed with six interdigitated electrodes is a smart-sensor covered by specific sensing materials that allow differentiating between different water contaminants. By using the PCA or LDA decomposition we obtain a data compression that makes data suitable for the “edge computing” paradigm with a reduction from a 10-dimensional space to a 3-dimensional space. We defined an ad-hoc classifier to distinguish contaminants represented by points in the 3-dimensional space. We used an evolutionary algorithm to learn the classifier’s parameters. Finally, we compared the performance of our system with that achieved by the old classification system based only on PCA, as well as those achieved by other machine learning algorithms. The proposed system achieved the best accuracy of 87%, outperforming the other state-of-the-art systems compared. The novelty of the system proposed lies in the usage of an evolutionary algorithm for the optimization of the parameters of a novel PCA-based classification algorithm for the detection of water pollutants.
Corresponding author:
Francesco Fontanella
tel.: +39 0776 2993382
E-mail: fontanella@unicas.it
Claudio De Stefano, Luigi Ferrigno, Francesco Fontanella, Luca Gerevini, Mario Molinara
Department of Electrical and Information Engineering (DIEI)
University of Cassino and Southern Lazio
Via G. Di Biasio, 43 03043
Cassino (FR), ITALY
2 Claudio De Stefano et al.
1 Introduction
Water pollution is a worldwide concern: the WHO (World Health Organization) has estimated that about two billion people worldwide are affected by this problem [29]. This concern also includes wastewater which, if properly treated, helps protect ecosystems and may allow the recovery of energy, nutrients, and other reusable materials.
Water quality is usually monitored by laboratory analyses, performed by experienced professionals who use sophisticated tools. Due to the time and cost of this approach, environmental disasters cannot be prevented. Prevention requires an effective and responsive water analysis through a large number of distributed measurement systems. While such systems are available and guarantee good performance in terms of accuracy and reliability, their large-scale use is limited by their high cost.
In this context, low-cost microsensors for capillary monitoring would be very useful. These microsensors should combine low cost with good measurement accuracy (even for low levels of pollution), as well as good reliability. Such a system could easily spread to developing countries as well. Furthermore, systems conceived in this way would make it possible to use the paradigms of the Internet of Things (IoT) [1,19,23,26], as well as those of edge and fog computing [18,27], to perform early analysis and detection in the field. These paradigms benefit from the application of Artificial Intelligence (AI) and Machine Learning (ML) techniques to effectively analyze and exploit the information contained in the generated data [2,5,6].
In this context, in two previous papers [10,28], we presented two IoT-ready systems for wastewater pollutant detection and classification, based on the multi-sensor microcontroller SENSIPLUS®. In both systems, input data were first projected into a 3-D space and then classified using simple geometrical models. The aim was to implement these models using a simple SENSIPLUS® microcontroller with few computational resources (or other cheap microcontrollers currently available on the market). In the first paper, we proposed a model based on cones centered on the origin of the transformed 3-D reference system. We used a one-versus-all strategy to learn the four parameters of the cones, one for each pollutant to be recognized. In the second paper, each contaminant was represented by a straight line passing through the origin of the transformed 3-D reference system, and each data point was assigned to the nearest line. This approach allowed us to implement a multiclass classifier in a straightforward manner, thus avoiding the labeling conflicts between cones that derive from the one-versus-all strategy. It also allowed us to simplify the classification model: in a 3-D space, a straight line through the origin is represented by three parameters.
In both papers, we used an evolutionary algorithm (EA) to find the optimal values of the model parameters (cones or lines). These systems were tested on four contaminants (acetic acid, phosphoric acid, sulphuric acid, ammonia) as well as synthetic wastewater.
Title Suppressed Due to Excessive Length 3
In this paper, we present a further development of the systems described above. In particular, we used polar coordinates to represent lines, which allowed us to represent each line using two parameters. Any system for water contaminant detection should be able to recognize as many pollutants as possible. For this reason, we tested our system on more pollutants than those tested in the two previous approaches mentioned above. We also tried two strategies to improve the performance of our system: (i) we tested Linear Discriminant Analysis (LDA) to project the input data into the 3-D space; (ii) we used training data points to initialize the individuals in the initial population.
To summarize, the main objectives of the paper are the following:
- we propose a further development of a previously presented IoT-based system for wastewater pollutant detection;
- we propose a novel 3-D data representation based on polar coordinates;
- in order to improve the performance of our system, we compared two dimensionality reduction strategies: the first one based on the PCA algorithm used in a previous study, and the second one based on the LDA algorithm;
- we also improved the results provided by the evolutionary algorithm by using a different initialization procedure, which generates the initial population taking into account the information of the training set;
- we tested the ability of our system to cope with several contaminants by increasing the number of pollutants considered in the experiments.
The remainder of the paper is organized as follows: Section 2 discusses the related work; Section 3 details data collection; Section 4 presents the system architecture and the evolutionary algorithm we used to learn the parameters of the classification model; Section 5 shows the experimental results. Finally, conclusions are drawn in Section 6.
2 Related work
The monitoring of environmental pollution is engaging many researchers and technical communities. They are proposing new sensors able to reliably detect pollutants while reducing cost, size and energy consumption, as well as new network technologies, new communication standards and, finally, new methods for data analysis. Many researchers are exploiting the advantages offered by Artificial Intelligence and Machine Learning (ML) [3,4,13,15]. The input data of ML techniques are often preprocessed using Principal Component Analysis (PCA), a dimensionality reduction technique that aims at removing redundant features and features of poor statistical significance (with respect to the target concept).
It is worth noting that PCA is most often used as a preprocessing step for feature reduction before classification, and generally not to develop an ad-hoc classifier, as is the case in our approach. In [21], for example, the authors addressed the classification of gene-expression microarray data. They proposed a novel method (named PCA-BEL) based on a two-stage approach that first applies PCA to reduce the number of features and then a Brain Emotional Learning (BEL) network, with the aim of minimizing the computational effort. BEL networks are methodologies that use simulated emotions to aid their learning process. The novelty of the proposal was the application of the BEL model to cancer classification. Using standard datasets, they benchmarked the method against standard classification methods for five types of cancer. The experimental results showed that PCA-BEL achieved improved detection for three of them, whereas worse results were obtained for the remaining two. In [17], instead, the authors compared PCA and SVM in fault classification for a “complicated industrial process”. They used a publicly available standard dataset, the Tennessee Eastman (TE) challenging benchmark. The experimental results on the TE process showed that PCA offers a higher classification rate for this multi-class classification case with much less computational effort, whereas SVM classification takes longer and yields less accurate results. Finally, similar approaches have also been used for face recognition [14] and intrusion detection problems [30]. In [14] the authors proposed a face recognition system based on PCA combined with SVM. The novelty of the paper was in the combination of PCA with SVM. The experimental results showed that the RBF kernel outperformed the other kernels, and also obtained better results than an MLP. In [30] the authors presented a novel approach for intrusion detection in computer networks. They developed an adaptive intrusion detection method based on PCA and SVM. The method consisted of two steps. The former is the PCA, which aims at reducing the quantity of data to be considered by the classifier. The latter is the SVM, which solves the classification problem. Experiments on a standard dataset showed the effectiveness of the approach in terms of accuracy and of training and testing times.
Evolutionary computation (EC) based algorithms have proven to be effective in solving many real-world problems characterized by large and non-linear search spaces [12,7,11,8]. They have also been used to improve the performance of the basic PCA algorithm. In [20], for a hyperspectral image classification problem, the authors proposed a new approach (called Selective Principal Component Analysis based on Genetic Algorithm with Subgroups, SPCA-GAS) to select the best feature subset to provide as input to PCA. This approach introduced the use of subgroups for combining feature selection and extraction. Each subgroup was a partition of the search space including the solutions with a number of features in the range $[n_i, n_{i+1}]$. Each subgroup was then evolved separately in a subpopulation to find the best number of features to be selected. Experiments were carried out on the AVIRIS dataset with 100 bands, and showed that SPCA-GAS outperformed the compared approaches in terms of average accuracy and feature reduction rate.
In [31], instead, the authors used a GA to find the optimal PCA components so that the resulting features (extracted from PET images) were capable of identifying people with suspected Alzheimer’s disease. The proposed approach was based on eigenvector selection and weighting and achieved a test accuracy of 90% on a dataset of 210 clinical cases. Finally, in [22] the authors introduced a novel approach that used PCA and GA for human face recognition. PCA was used for extracting features from images with the help of covariance analysis to generate the eigen components of the images, whereas the GA was used for dimensionality reduction. The proposed approach was tested on the Japanese Female Facial Expression (JAFFE) dataset and achieved an accuracy of approximately 96%.
Fig. 1: Layout of the proposed system.
From the brief literature review outlined above, we can observe that EC-based approaches were mainly used to optimize the performance of PCA by suitably selecting or modifying the principal components provided by the standard algorithm. However, to the best of our knowledge, EC-based algorithms have never been used to learn the parameters of a classification model in the feature space provided by the PCA, as is the case in our approach. The only exceptions are the two papers mentioned in the Introduction [10,28], reporting the results of our previous studies. As shown in the experiments (Section 5), this approach allowed our method to significantly improve the performance of wastewater pollution detection systems.
3 Data Collection
Data were collected so as to simulate the conditions present in a controlled drain of a sewage network. To this aim, we used Synthetic Wastewater (SWW), whose composition is shown in Table 1. Further details about it can be found in [24]. Furthermore, before starting each acquisition, the pH of the SWW was corrected to stay between 7.2 and 7.9. The correction was made by adding some NaOH to raise the pH value, or some HCl to decrease it. Finally, the data were collected at room temperature, in a range between 20 °C and 30 °C.
Data were collected using a Sensichips board equipped with a SENSIPLUS® micro-chip connected to six sensors, and a microcontroller unit for data communication. Data are sent to an external device, which can be a personal computer, a tablet, or even a smartphone.
3.1 Measurement procedure
The measurement procedure used to collect data from the sensors consists of two steps:
- Warm-up: to allow the sensors to stabilize, the first 600 samples are acquired in SWW only.
- Measurement: after the collection of the first 600 samples, the contaminant is injected into the SWW and 1000 samples are acquired.
Table 1: Synthetic Wastewater chemical composition.
Compound                                     Concentration [mg/L]
Fertilizer                                   91.74
Ammonium Chloride                            12.75
Sodium Acetate Trihydrate                    131.64
Magnesium Hydrogen Phosphate Trihydrate      29.02
Monopotassium Phosphate                      23.4
Iron (II) Sulfate Heptahydrate               5.80
Starch                                       122.00
Milk Powder                                  116.19
Yeast                                        52.24
Soy Oil                                      29.02
The 600 samples acquired in the warm-up step are also used to build a “baseline”, which is used to normalize the remaining measurements. This normalization plays an important role because it allows us to “ignore” the contaminants already present in the wastewater and lets the system focus only on the injected contaminant. Contaminant concentrations and quantities are shown in Table 2. It is important to specify that, before acquiring the data for a given contaminant, preliminary tests were carried out to ensure that, given the quantity of contaminant selected, the pH stayed between 3.0 and 12.0 (outside this range the SCW sensors are damaged).
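As a quick consistency check on the final concentrations reported in Table 2, the sketch below recomputes them with a simple dilution formula. The 100 mL sample volume is an assumption (it is not stated in the text), chosen because it reproduces the tabulated values; the function name is illustrative.

```python
def final_concentration(initial_pct, injected_ml, sample_ml=100.0):
    """Concentration [%] after injecting `injected_ml` of stock into `sample_ml` of SWW.

    `sample_ml` = 100 mL is an assumed value, not one given in the paper.
    """
    return initial_pct * injected_ml / (sample_ml + injected_ml)


# (initial concentration [%], injected quantity [mL]) as listed in Table 2
TABLE_2 = {
    "Acetic Acid": (80, 0.3),
    "Ammonia": (30, 0.3),
    "Phosphoric Acid": (75, 0.12),
    "Hydrogen Peroxide": (35, 0.4),
    "Formic Acid": (85, 0.1),
    "Sulphuric Acid": (95, 0.1),
}

if __name__ == "__main__":
    for name, (c0, volume) in TABLE_2.items():
        print(f"{name}: {final_concentration(c0, volume):.4f} %")
```

Under this assumption, all six recomputed values agree with the last column of Table 2 to four decimal places.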
4 System architecture
The proposed system is made of two modules (see Fig. 1). The first module uses a dimensionality reduction algorithm to transform the input data, consisting of ten electrical measurements, i.e. resistance and capacitance values measured at different frequencies and with various sensors (platinum, gold, silver, and nickel), into a 3-D space. The aim of this transformation is twofold: (i) to simplify the original data by identifying a few uncorrelated features which maximize data variability; (ii) to set up a simple classification model that can be implemented with very few hardware resources, as is the case for the simple water sensors considered in our study.
Table 2: Contaminant concentrations.
Contaminant          Initial Concentration [%]   Injected Quantity [mL]   Final Concentration [%]
Acetic Acid          80                          0.3                      0.2393
Ammonia              30                          0.3                      0.0897
Phosphoric Acid      75                          0.12                     0.0899
Hydrogen Peroxide    35                          0.4                      0.1394
Formic Acid          85                          0.1                      0.0849
Sulphuric Acid       95                          0.1                      0.0949
Fig. 2: Flowchart of the proposed system.
Once data points are projected into the 3-D space, the second module classifies them using a simple geometrical model: straight lines passing through the origin of the transformed 3-D reference system. The entire classification system is shown in Figure 2.
The dimensionality reduction transformation and the classification model implemented are described in the following subsections.
4.1 Data transformation
As mentioned in the Introduction, our main goal is to build an ad-hoc classification system that can be implemented aboard low-cost sensors. For this reason, it is necessary to reduce the space dimensionality as much as possible, in order to lower the overall computational complexity of the system and the number of parameters of the model used to classify water pollutants.
For the dimensionality reduction we tested two approaches: PCA and LDA. These procedures are used to project the points (data samples) from the original ten-dimensional input space to a 3-D space. This data transformation allows us to design a simple and lightweight classification model that can be implemented with very few hardware resources.
PCA decomposes a multidimensional dataset into a set of orthogonal components that maximize the variance. In practice, PCA learns a given number of components onto which new data can then be projected. In our case, PCA was performed using the randomized truncated Singular Value Decomposition of the data, which corresponds to the eigen-decomposition of the covariance matrix [16].
The LDA algorithm is based on a supervised procedure that generates a linear decision boundary by using Bayes’ rule and fitting a Gaussian density to each class, assuming that all classes share the same covariance matrix. To reduce the dimensionality, LDA projects the input data onto the most discriminative directions.
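To make the projection step concrete, the following is a minimal sketch of PCA in plain Python, under stated assumptions: it uses power iteration with deflation on the covariance matrix rather than the randomized truncated SVD mentioned above, and the function names (`covariance`, `top_components`, `project`) are illustrative, not taken from the authors' implementation.

```python
import math


def covariance(rows):
    """Return (mean vector, sample covariance matrix) of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return mean, cov


def top_components(cov, k, iters=500):
    """Leading k eigenvectors of a symmetric matrix via power iteration + deflation."""
    d = len(cov)
    work = [row[:] for row in cov]          # copy, so the caller's matrix is untouched
    components = []
    for _ in range(k):
        v = [float(i + 1) for i in range(d)]  # fixed, asymmetric starting vector
        for _ in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
        eigenvalue = sum(v[i] * sum(work[i][j] * v[j] for j in range(d))
                         for i in range(d))
        components.append(v)
        for i in range(d):                   # deflation: remove the found direction
            for j in range(d):
                work[i][j] -= eigenvalue * v[i] * v[j]
    return components


def project(row, mean, components):
    """Coordinates of one sample in the space spanned by the components."""
    return [sum((row[j] - mean[j]) * comp[j] for j in range(len(row)))
            for comp in components]
```

With ten-dimensional sensor readings, calling `top_components(cov, 3)` and then `project` on each sample would give the 3-D points used by the classifier; only the mean vector and three component vectors would need to be stored on the device.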
4.2 The classification model
Once the original data points are projected into the 3-D space, the training samples are passed to the multi-class classifier that models the $C$ classes (the number of classes is equal to the number of contaminants to recognize).
Each contaminant, except for the Synthetic Wastewater (SWW), is modelled by a line $r$ passing through the origin of the reference system, as shown in Figure 3(a). Following the approach presented in [28], to classify SWW points we use a sphere instead of a line. The sphere represents a threshold on the distance from the origin of the reference system: given a point $P$, if it falls inside the sphere, we classify it as belonging to SWW (see Figure 3).
We chose this solution since, according to the data normalization mechanism used (see Section 5), SWW samples are concentrated around the origin (see Figure 3(b)). All the other contaminants, instead, are modelled by a line $r$ defined by the spherical coordinates $\theta$, $\phi$ and $\rho$. However, since the $\rho$ parameter defines the distance from the origin, we take it as a constant value, so the classification model evolves just two parameters: $\theta$ and $\phi$. Therefore, a line is defined by only these two parameters. A given point $P$ is labeled as belonging to the class associated with its nearest line. Compared with our old classification model, based on $C$ lines $r$ each defined by the three parameters $l, m, n$, we chose to use the spherical coordinates $\theta$ and $\phi$ (with constant $\rho$) in order to further decrease the complexity and the memory required by our classification system. Indeed, while in the old model we have to store $C \times 3$ coefficients, in the new classification model we have to store only $C \times 2$ coefficients.
The problem of finding the set of parameters $\theta$, $\phi$ that maximize the performance for a given dataset $D$ is an optimization problem where the function to maximize is the global accuracy (see Algorithm 2). To summarize, a given point $P$ in the 3-D space is assigned according to the following procedure:
1. check whether $P$ is inside the SWW sphere or not. If so, assign $P$ to SWW, otherwise go to point 2;
2. compute the distance between $P$ and each of the $C$ lines;
3. assign $P$ to the class of its nearest line.
Fig. 3: Line-based model, 3-D view. (a) Entire model, 3-D view. (b) SWW sphere.
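The three-step procedure above can be sketched as follows. This is a minimal illustration, not the authors' firmware: the sphere radius `sww_radius`, the class names and the angle values are hypothetical, and the point-to-line distance is obtained by projecting the point onto the unit direction of the line, which is a compact form of the geometric construction described in the rest of this section.

```python
import math


def line_direction(theta, phi):
    """Unit vector (l, m, n) of a line through the origin (Eq. 2 with rho = 1)."""
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))


def distance_to_line(point, theta, phi):
    """Distance between `point` and the line through the origin given by (theta, phi)."""
    l, m, n = line_direction(theta, phi)
    t = l * point[0] + m * point[1] + n * point[2]   # position of the foot point H
    return math.dist(point, (l * t, m * t, n * t))


def classify(point, lines, sww_radius):
    """Steps 1-3: SWW if inside the sphere, otherwise the label of the nearest line.

    `lines` maps a class label to its (theta, phi) pair; `sww_radius` is a
    hypothetical threshold, since no numeric value is given in the text.
    """
    if math.hypot(*point) <= sww_radius:
        return "SWW"
    return min(lines, key=lambda label: distance_to_line(point, *lines[label]))


if __name__ == "__main__":
    # Made-up angles: one line along the x axis, one along the z axis.
    lines = {"ammonia": (0.0, math.pi / 2), "acetic acid": (0.0, 0.0)}
    for p in [(0.01, 0.0, 0.02), (1.0, 0.1, 0.0), (0.1, 0.0, 2.0)]:
        print(p, "->", classify(p, lines, sww_radius=0.05))
```

Only two angles per contaminant and one radius have to be stored, which matches the memory argument made above for resource-constrained microcontrollers.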
In order to find the distance between a line $r$ and a point $P$, we can start from the parametric equation of the line in the 3-D space:
$$
r:\ \begin{cases} x_r = x_0 + l\,t \\ y_r = y_0 + m\,t \\ z_r = z_0 + n\,t \end{cases}
\quad \text{with } t \in \mathbb{R},\ \vec{v}_r = (l, m, n) \tag{1}
$$
Here $(x_0, y_0, z_0)$ represents the origin of the reference system, through which the line $r$ passes, whereas $(l, m, n)$ are the components, with respect to the basis $\{i, j, k\}$, of a vector parallel to $r$. We computed the transformation from Cartesian to polar coordinates by inverting the following equations:
$$
\begin{cases} l = \rho \sin\phi \cos\theta \\ m = \rho \sin\phi \sin\theta \\ n = \rho \cos\phi \end{cases}
\quad \text{with } \rho \in [0, +\infty),\ \theta \in [0, 2\pi),\ \phi \in [0, \pi] \tag{2}
$$
Note that, as mentioned above, in a 3-D space a line passing through the origin is characterized by $\theta$ and $\phi$, whereas $\rho$ can assume any value. For this reason we set $\rho = 1$.
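A minimal sketch of the conversion in both directions is given below, assuming the angle conventions of Eq. 2. The function names are illustrative; the inverse uses `atan2` and `acos`, which is one standard way of inverting Eq. 2 and not necessarily the one used by the authors.

```python
import math


def from_angles(theta, phi, rho=1.0):
    """Eq. 2: direction components (l, m, n) from the spherical angles."""
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))


def to_angles(l, m, n):
    """Inverse of Eq. 2: (theta, phi) with theta in [0, 2*pi) and phi in [0, pi]."""
    rho = math.sqrt(l * l + m * m + n * n)
    if rho == 0.0:
        raise ValueError("the zero vector does not define a direction")
    theta = math.atan2(m, l) % (2.0 * math.pi)
    phi = math.acos(max(-1.0, min(1.0, n / rho)))   # clamp against rounding errors
    return theta, phi
```

Because $\rho$ is discarded by `to_angles`, any vector parallel to the line yields the same pair of angles, which is why two stored values per line are sufficient.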
At this point, we can compute the plane $\alpha$ orthogonal to $r$ and passing through the point $P$. The Cartesian equation of a plane $\alpha$ in the 3-D space is:
$$
\alpha:\ a x + b y + c z + d = 0 \tag{3}
$$
The plane orthogonal to the line $r$ and passing through $P$ is given by:
$$
l x + m y + n z - (l x_P + m y_P + n z_P) = 0 \tag{4}
$$
Now, by replacing $(x, y, z)$ with $(x_r, y_r, z_r)$, we can solve Eq. 4 with respect to the parameter $t$ of Eq. 1. Thus we can calculate the point $H(x_H, y_H, z_H)$ given by the intersection between the plane of Eq. 4 and the line of Eq. 1:
$$
H:\ \begin{cases} x_H = x_0 + l\,t \\ y_H = y_0 + m\,t \\ z_H = z_0 + n\,t \end{cases} \tag{5}
$$
Finally, we can compute the distance between $P$ and $r$ as:
$$
d(P, r) = d(P, H) = \sqrt{(x_P - x_H)^2 + (y_P - y_H)^2 + (z_P - z_H)^2} \tag{6}
$$
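For completeness, the value of $t$ used in Eq. 5 can be written out explicitly. The following worked step is a sketch that only substitutes Eq. 1 into Eq. 4; the simplification in the second and third lines additionally assumes $(x_0, y_0, z_0) = (0, 0, 0)$ and $\rho = 1$, as stated in the text.

```latex
\begin{align*}
t &= \frac{l\,(x_P - x_0) + m\,(y_P - y_0) + n\,(z_P - z_0)}{l^2 + m^2 + n^2} \\
t &= l\,x_P + m\,y_P + n\,z_P
   \qquad \text{(origin at } (0,0,0),\ l^2 + m^2 + n^2 = \rho^2 = 1\text{)} \\
d(P, r)^2 &= x_P^2 + y_P^2 + z_P^2 - t^2
\end{align*}
```

In this simplified form the distance of Eq. 6 requires one dot product and one square root per line, which is convenient on a microcontroller with few computational resources.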
Algorithm 1: Evolutionary algorithm
input: list of parameters (Table 3), a data set $T$
output: best individual found
begin
  randomly initialize a population of $P$ individuals;
  evaluate the fitness of each individual;
  $g = 0$;
  while $g < N_g$ do
    copy the best $e$ individuals in the new population;
    for $i = 0$ to $(P - e)/2$ do
      select a couple of individuals;
      replicate the selected individuals;
      if flip($p_c$) then
        apply the crossover operator on the selected individuals;
      if flip($p_m$) then
        perform the mutation on the offspring;
    evaluate the fitness of each individual;
    replace the old population with the new one;
    update the best individual found so far;
    $g = g + 1$;
  return the best individual found;
end
The function flip($p$) returns the value 1 with a probability $p$ and the value 0 with a probability $(1 - p)$.
4.3 The evolutionary algorithm
To find the optimal parameters for the $C$ lines in the 3-D space representing the contaminants to be detected and distinguished, we used a generational evolutionary algorithm (EA). The algorithm starts by generating a population of $P$ individuals, each made of $2 \times C$ real variables.
Afterwards, the fitness of the individuals is evaluated and a new population is generated. In order to implement an elitist strategy, the best $e$ individuals are simply copied into the new population, without being modified by the genetic operators; this strategy ensures that the best individuals found along the evolutionary process are not lost. Then $(P - e)/2$ couples of individuals are selected by using the tournament method. Uniform crossover is applied to each of the selected couples, according to a given probability $p_c$. Next, the mutation operator is applied with a probability $p_m$. The value of $p_m$ has been set to $\frac{1}{2 \times C}$, i.e. the reciprocal of the number of variables in the chromosome. This probability value allows, on average, the modification of only one chromosome element. This value has been suggested in [25] as the optimal mutation rate below the error threshold of replication. Finally, these individuals are added to the new population. The process just described is repeated for $N_g$ generations. The evolutionary algorithm implemented is described in Algorithm 1, whereas Algorithm 2 describes the fitness function used. Further details about the evolutionary algorithm can be found in [9].
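The generational scheme described in this section can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: mutation is modelled as a uniform perturbation in $[-m_r, m_r]$ applied to each gene with probability $p_m = 1/(2C)$, the initial population is drawn uniformly at random (the training-set-based initialization is not reproduced), the SWW sphere is ignored, and all function names are illustrative. Default parameter values follow Table 3.

```python
import math
import random


def distance_to_line(p, theta, phi):
    """Distance from p to the line through the origin with angles (theta, phi), rho = 1."""
    l = math.sin(phi) * math.cos(theta)
    m = math.sin(phi) * math.sin(theta)
    n = math.cos(phi)
    t = l * p[0] + m * p[1] + n * p[2]
    return math.dist(p, (l * t, m * t, n * t))


def accuracy(individual, points, labels):
    """Fitness in the spirit of Algorithm 2: share of points whose nearest line matches."""
    n_lines = len(individual) // 2
    correct = 0
    for p, label in zip(points, labels):
        dists = [distance_to_line(p, individual[2 * c], individual[2 * c + 1])
                 for c in range(n_lines)]
        if dists.index(min(dists)) == label:
            correct += 1
    return correct / len(points)


def tournament(pop, fits, size, rng):
    """Return a copy of the fittest among `size` randomly drawn individuals."""
    winner = max(rng.sample(range(len(pop)), size), key=lambda i: fits[i])
    return list(pop[winner])


def evolve(points, labels, n_classes, pop_size=100, n_gen=500, elite=2,
           t_size=5, p_c=0.6, m_range=0.1, seed=0):
    """Return (best individual, its fitness, best fitness per generation)."""
    rng = random.Random(seed)
    n_genes = 2 * n_classes                      # one (theta, phi) pair per line
    p_m = 1.0 / n_genes                          # about one mutated gene per individual
    pop = [[rng.uniform(0.0, 2.0 * math.pi) if g % 2 == 0 else rng.uniform(0.0, math.pi)
            for g in range(n_genes)] for _ in range(pop_size)]
    fits = [accuracy(ind, points, labels) for ind in pop]
    history = [max(fits)]
    for _ in range(n_gen):
        ranked = sorted(range(pop_size), key=lambda i: fits[i], reverse=True)
        new_pop = [list(pop[i]) for i in ranked[:elite]]          # elitism
        while len(new_pop) < pop_size:
            a = tournament(pop, fits, t_size, rng)
            b = tournament(pop, fits, t_size, rng)
            if rng.random() < p_c:                                # uniform crossover
                for g in range(n_genes):
                    if rng.random() < 0.5:
                        a[g], b[g] = b[g], a[g]
            for child in (a, b):                                  # assumed mutation form
                for g in range(n_genes):
                    if rng.random() < p_m:
                        child[g] += rng.uniform(-m_range, m_range)
            new_pop += [a, b]
        pop = new_pop[:pop_size]
        fits = [accuracy(ind, points, labels) for ind in pop]
        history.append(max(fits))
    best = max(range(pop_size), key=lambda i: fits[i])
    return pop[best], fits[best], history
```

With six contaminants the chromosome has twelve genes, so $p_m = 1/12 \approx 0.08$, which is the mutation probability listed in Table 3. Because the elite individuals are copied unchanged, the best fitness recorded in `history` never decreases from one generation to the next.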
5 Experimental results
To test the effectiveness of our system we took into account six contaminants (acetic acid, phosphoric acid, sulphuric acid, ammonia, formic acid and hydrogen peroxide) as well as SWW. The dataset contained ten data acquisitions per contaminant, each consisting of 1,600 samples. From each acquisition we removed the first 600 samples: the first half to allow sensors to stabilize, the remaining half to build the baseline. The last 1,000 samples were used to build the dataset used for the experiments detailed in the following. So, for each contaminant we used 10,000 samples. To evaluate the classification performance of our system we used the ten-fold cross-validation strategy.
Data were acquired using twenty SCW boards, equipped with different versions of the SENSIPLUS® micro-chip. Furthermore, data were organized in such a way that the samples recorded using the same device were included in the same fold. This data split allowed us to assess the device independence of the performance achieved by our system.
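A device-wise split of this kind can be sketched as below. The round-robin assignment of devices to folds is an assumption made for illustration (the text does not say how devices were distributed), and `device_folds` is a hypothetical helper name.

```python
from collections import defaultdict


def device_folds(device_ids, n_folds=10):
    """Group sample indices into folds so that no device is split across folds.

    `device_ids[i]` is the identifier of the board that recorded sample i.
    Devices are assigned to folds in round-robin order (an assumed policy).
    """
    devices = sorted(set(device_ids))
    fold_of_device = {dev: k % n_folds for k, dev in enumerate(devices)}
    folds = defaultdict(list)
    for index, dev in enumerate(device_ids):
        folds[fold_of_device[dev]].append(index)
    return [folds[k] for k in range(n_folds)]
```

Each fold is then used once as test data while the remaining folds form the training data, so the classifier is always evaluated on boards it has not seen during training.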
Algorithm 2: Fitness function
input: a dataset $T$ consisting of $N_T$ points in the 3-D space, an individual $I$ representing $C$ lines
output: accuracy achieved by $I$ on $T$
begin
  $N_c = 0$;
  for $i = 1$ to $N_T$ do
    compute the distance between $P_i \in T$ and each of the lines of $I$;
    assign $P_i$ to the nearest line $r_n$;
    if label($P_i$) == label($r_n$) then
      $N_c = N_c + 1$;
  return $N_c / N_T$;
end
Table 3: Evolutionary algorithm parameters.
Parameter                Symbol   Value
Population size          $P$      100
Number of generations    $N_g$    500
Elitism                  $e$      2
Tournament size          $t$      5
Crossover probability    $p_c$    0.6
Mutation probability     $p_m$    0.08
Mutation range           $m_r$    0.1
Since the physical quantities measured by the sensors aboard our system are different in nature, they have different value ranges. For this reason we performed a data normalization with respect to the baseline.
In particular, the 1,000 samples of each acquisition used to build our dataset were divided by the respective baseline, computed on the first 600 samples, as follows:
$$
n = \frac{x_i}{b_i} - 1.0 \quad \forall i \in [1, 2, \ldots, 10] \tag{7}
$$
where $x_i$ represents the set of the last 1,000 samples of the $i$-th acquisition, $b_i$ is the baseline of that acquisition, and $n$ is the output set, consisting of the normalized samples. Furthermore, to center our data at the origin, we subtract 1.0 from the obtained value. For this reason, since the baseline $b_i$ is computed in SWW only, all the SWW samples lie around the origin. This operation was performed over all the acquisitions used to build our dataset.
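The baseline normalization of Eq. 7 can be sketched as follows. The baseline is taken here as the per-feature mean of the warm-up samples, which is an assumption: the text states that the baseline is computed on the warm-up samples but not which statistic is used. The function name is illustrative.

```python
def baseline_normalize(warmup, measurements):
    """Apply Eq. 7 feature by feature: n = x / b - 1.0.

    `warmup` and `measurements` are lists of equal-length feature vectors;
    the baseline b is assumed to be the mean of the warm-up samples.
    """
    n_features = len(warmup[0])
    baseline = [sum(row[j] for row in warmup) / len(warmup) for j in range(n_features)]
    return [[x[j] / baseline[j] - 1.0 for j in range(n_features)] for x in measurements]
```

A measurement equal to the baseline maps to the origin, which is why samples acquired in SWW only cluster around $(0, 0, 0)$ after projection and can be captured by a sphere.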
We split the dataset built as just described into two sets: a training set containing 90% of the samples of all contaminants except the SWW, and a test set consisting of the remaining samples as well as the SWW samples. Summarizing, the training set consists of 54,000 samples (9,000 for each contaminant), whereas the test set contains 7,000 samples (1,000 per contaminant plus the 1,000 SWW samples). We chose to add only 1,000 SWW samples (one acquisition) at a time to the test set, in order to keep the data balanced. Furthermore, in order to reduce the training time of the EA, we randomly chose 5,400 (10%) of the 54,000 training samples.
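The sample counts quoted above follow directly from the six contaminants, the ten acquisitions per contaminant and the 1,000 retained samples per acquisition; the symbols $N_{\ast}$ below are introduced only for this recap.

```latex
\begin{align*}
N_{\text{per contaminant}} &= 10 \times 1{,}000 = 10{,}000 \\
N_{\text{train}} &= 6 \times 0.9 \times 10{,}000 = 54{,}000 \\
N_{\text{test}} &= 6 \times 1{,}000 + 1{,}000 = 7{,}000 \\
N_{\text{EA}} &= 0.1 \times 54{,}000 = 5{,}400
\end{align*}
```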
The training set was used to compute the fitness of the individuals of the EA outlined in Section 4. For each fold, we performed thirty runs and, at the end of each run, the twelve real values (two parameters per contaminant, see Eq. 2) encoded by the individual with the best fitness were stored as the solution provided by that run. To set the parameters of the EA, we performed some preliminary trials. These parameters were used for the experiments described below and are shown in Table 3.
To evaluate the effectiveness of our system, we performed three sets of experiments, using the data described above. In the first, we compared the results of our system with those achieved by the system presented in [28]. In the second, we tested the two strategies taken into account to improve the system performance (LDA data transformation and initialization of the initial population). Finally, in the third set, we compared our results with those of four well-known and widely-used classification algorithms. The experiments performed are described in the following subsections.
5.1 Polar coordinates testing
In the first set of experiments we tried to answer the following question: do the polar coordinates used to represent the lines (see Section 4) allow us to improve the performance of our system? To answer this question we compared our system’s performance with that achieved by the approach presented in [28]. Figures 4 and 5 show the best confusion matrices obtained on the test data by our system (polar-based, PB in the following) and the previous one (Cartesian-based, CB in the following), respectively. From the figures we can see that both approaches rarely confused the pollutants with SWW, confirming the effectiveness of our approach in developing a system able to detect pollutants in water. However, in terms of overall accuracy, the novel proposed approach outperformed the previous one: PB best overall accuracy was 0.86, whereas CB accuracy was 0.42.
Fig. 4: Confusion matrix achieved on the test set with the PB system (overall accuracy: 0.87).
From Figure 4, we can see that both formic and phosphoric acids were con-374
fused with acetic acid (with probabilities equal to 0.46 and 0.38, respectively).375
This result is ”acceptable” because sensors tend to respond to acids with sim-376
ilar patterns. As for Figure 5, we can observe that, apart of the confusion377
between formic and phosphoric acid with sulphuric acid, hydrogen peroxide378
was confused with sulphuric acid whereas formic acid was confused with hy-379
drogen peroxide. This peroxide-acid confusion is much less acceptable than the380
acid-acid confusion, because for these two kinds of substances sensors tend to381
Title Suppressed Due to Excessive Length 15
Fig. 5: Confusion matrices achieved on the test test with the CB system (overall
accuracy: 0.42).
provide different measurement patterns. Together with the overall accuracy382
comparison, this last result confirms that polar coordinates are more effective383
than Cartesian ones in discriminating the pollutants analyzed in this study.384
5.2 System improvement testing

In the second set of experiments, we tried to answer the following question: how can we improve the performance of our system? To this aim, we tested two approaches: (i) using the Linear Discriminant Analysis (LDA in the following) algorithm to project the original input data into the 3-D space; (ii) using information from the training set to initialize the individuals in the initial population.
5.2.1 PCA vs LDA

As mentioned in Section 4, the LDA algorithm is a dimensionality reduction technique that uses a supervised procedure to find a linear combination of the features in the original space that allows a better class separation. We used LDA to project the original 10-feature data points into the 3-D space. To test the effectiveness of this transformation, we compared its results with those achieved using the PCA algorithm. Figure 6 shows the confusion matrix obtained using LDA as the data transformation algorithm. Comparing this matrix with that shown in Figure 4, we can see that PCA outperformed LDA in terms of both overall accuracy (PCA: 0.86, LDA: 0.71) and pollutant confusion with SWW (see the last columns of the confusion matrices). It is worth noting that the latter performance index is much more important than inter-pollutant confusion: confusing a pollutant with another one still allows the end user to be warned, whereas confusing a pollutant with SWW does not allow any warning. PCA confused only a very small percentage of pollutant samples with SWW, whereas LDA confused 60% of the phosphoric acid samples with SWW. As for the inter-pollutant confusion, we can observe that LDA produced a peroxide-acid confusion (peroxide with acetic and formic acids), which is less acceptable than acid-acid confusion (see the discussion in the previous subsection).

Fig. 6: Confusion matrix achieved on the test set with LDA (overall accuracy: 0.71).
These results seem counterintuitive: LDA uses a supervised procedure, whereas PCA uses an unsupervised procedure that only preserves data variation. However, this most probably depends on the fact that LDA exploits most of the class label information, limiting the ability of the EC module to find the best 3-D model (the set of axes).
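
The two indices compared in this subsection, overall accuracy and pollutant confusion with SWW, can both be read off a confusion matrix. The sketch below assumes a matrix of raw counts with one row per true class and one column per predicted class, SWW being the last class; the function names and the 3-class `toy_cm` are made up for illustration and are not the paper's data.

```python
def overall_accuracy(cm):
    """Fraction of all samples lying on the diagonal of the count matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def missed_detection_rate(cm, clean_idx):
    """Fraction of pollutant samples predicted as the clean class (no warning)."""
    pollutant_rows = [row for i, row in enumerate(cm) if i != clean_idx]
    missed = sum(row[clean_idx] for row in pollutant_rows)
    total = sum(sum(row) for row in pollutant_rows)
    return missed / total


def false_alarm_rate(cm, clean_idx):
    """Fraction of clean samples predicted as any pollutant."""
    row = cm[clean_idx]
    return 1.0 - row[clean_idx] / sum(row)


# Toy counts (invented): two pollutants followed by the clean class.
toy_cm = [
    [80, 15, 5],
    [10, 50, 40],
    [0, 2, 98],
]
acc = overall_accuracy(toy_cm)
missed = missed_detection_rate(toy_cm, clean_idx=2)
alarms = false_alarm_rate(toy_cm, clean_idx=2)
```

Keeping the missed-detection rate separate from the overall accuracy makes explicit why two classifiers with similar accuracy can differ sharply in usefulness as a warning system.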
Algorithm 3: Individual initialization
input: a dataset T consisting of N_T points in the 3-D space, an individual I representing C lines to be initialized
output: I initialized
begin
    N_c = 0;
    for i = 1 to C do
        randomly select from T a sample s_i belonging to the class i
        compute θ_i and φ_i in such a way that the i-th axis of I passes through s_i
    return I;
end
5.3 Smart Initialization

As a second improvement strategy, we tested a different initialization procedure for the initial population of the evolutionary algorithm (see line #1 of Algorithm 1). This procedure initializes a fraction of the individuals in the initial population using the information provided by the training set. In practice, given an individual to be initialized, each pair of real values representing an axis a is initialized by randomly choosing from the training set a sample s belonging to the class (pollutant) of a and setting the values in such a way that a passes through s (see Algorithm 3). Since each sample of our dataset is represented by the x, y, and z coordinates, the θ_i and φ_i values can be computed according to the following equations:
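$$\phi = \arccos\frac{z}{\rho} \quad \text{with } (x, y, z) \neq (0, 0, 0) \tag{8}$$

$$\theta = \begin{cases}
\dfrac{\pi}{2} & \text{if } x = 0,\ y > 0 \\[4pt]
\dfrac{3\pi}{2} & \text{if } x = 0,\ y < 0 \\[4pt]
\text{not defined} & \text{if } x = 0,\ y = 0 \\[4pt]
\arctan\dfrac{y}{x} & \text{if } x > 0,\ y \geq 0 \\[4pt]
\arctan\dfrac{y}{x} + 2\pi & \text{if } x > 0,\ y < 0 \\[4pt]
\arctan\dfrac{y}{x} + \pi & \text{if } x < 0
\end{cases} \tag{9}$$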
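
As a rough illustration of the smart initialization, the sketch below computes the two angles with the standard Cartesian-to-spherical conversion that Eqs. 8 and 9 describe, using `math.atan2` folded into [0, 2π) in place of the case analysis. Everything beyond that is an assumption made for the example: the function names, the gene order (θ_i, φ_i) per class, the choice θ = 0 for samples on the z-axis, and the uniform sampling of the non-initialized individuals.

```python
import math
import random


def axis_angles(x, y, z):
    """Return (theta, phi) of the axis through the origin and (x, y, z)."""
    rho = math.sqrt(x * x + y * y + z * z)
    if rho == 0.0:
        raise ValueError("angles are not defined at the origin")
    # Clamp to guard against rounding slightly outside [-1, 1].
    phi = math.acos(max(-1.0, min(1.0, z / rho)))
    if x == 0.0 and y == 0.0:
        return 0.0, phi  # theta is undefined on the z-axis; 0.0 is an assumption
    theta = math.atan2(y, x)       # value in (-pi, pi]
    if theta < 0.0:
        theta += 2.0 * math.pi     # fold into [0, 2*pi)
    return theta, phi


def smart_individual(samples_by_class, rng):
    """Chromosome of 2*C reals: each axis passes through a random class sample."""
    genes = []
    for cls in sorted(samples_by_class):
        x, y, z = rng.choice(samples_by_class[cls])
        genes.extend(axis_angles(x, y, z))
    return genes


def init_population(size, fraction, samples_by_class, rng):
    """Smart-initialize round(size * fraction) individuals, the rest at random."""
    n_smart = round(size * fraction)
    population = [smart_individual(samples_by_class, rng) for _ in range(n_smart)]
    while len(population) < size:
        genes = []
        for _ in samples_by_class:
            genes.append(rng.uniform(0.0, 2.0 * math.pi))  # theta
            genes.append(rng.uniform(0.0, math.pi))        # phi
        population.append(genes)
    return population
```

With `fraction=0.10` and a population of 100, ten individuals start on axes that already pass through training samples, while the remaining ninety preserve the diversity of a random start.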
We tested three values for the fraction of the initial population to be initialized according to the procedure described in Algorithm 3: 0.05, 0.10, and 0.15. To assess how this “smart initialization” affected the evolutionary process, we analysed and compared the population average fitness (training accuracy) along the evolution for the three values tested. This comparison is shown in Figure 7, which also shows the curve obtained without smart initialization (0.0). From the figure we can see that the values 0.0, 0.05, and 0.10 achieved similar performance, with the last performing slightly better than the other two, and that the value 0.15 performed much worse than the others. We can draw the following conclusions: (i) too high a fraction of initialized individuals limits the exploration ability of our evolutionary algorithm; (ii) low values of the fraction do not allow the algorithm to improve its exploration ability; (iii) there is a value that allows a small but still significant improvement of the performance.

Fig. 7: Average fitness along the evolution with different fractions of initialization.
To further investigate how the smart initialization affected the evolutionary process, we also plotted the population average fitness (training accuracy) along the evolution for each single pollutant, using the same values of the fraction. The plots are shown in Figure 8. They show that, as expected, for most pollutants the value 0.15 performed worse than the others. Only for sulphuric acid did this value outperform the other ones. Most probably, the axis parameters for this acid were not easy to find and a higher initialization fraction helped the search process. The value 0.10 performed slightly better than the others, except for formic acid, for which the random initialization performed best. Most probably, this result indicates that for this acid the values of θ and φ of the axis can be easily found.
5.4 Comparison findings

To further test the effectiveness of our approach, we compared its results with those achieved by four well-known and widely used classification algorithms, namely Decision Tree (DT), K-Nearest Neighbor (K-NN), Neural Networks (NN), and Support Vector Machines (SVM). The values of the parameters used in the experiments are shown in Table 4. For the sake of a fair comparison, we trained and tested these classifiers (performing 30 runs) using the same procedure used for our evolutionary algorithm. Also in this case, for each run we randomly selected 10% of the training set and trained the classifier on it. Each trained classifier was then evaluated on the test set. In the following, we will refer to them as ML algorithms. To statistically validate the comparison results, we performed the Wilcoxon rank-sum test (α = 0.05).

Table 4: Values of the classifier parameters used in the experiments. The number of hidden neurons of the NN derives from the following formula: (#features + #classes)/2.

Classifier  Parameter                    Value
DT          Confidence factor            0.25
            Minimum #instances per leaf  2
K-NN        K                            3
            Distance                     Euclidean
NN          Learning rate                0.3
            Momentum                     0.2
            Hidden neurons               8
            Epochs                       500
SVM         Kernel                       RBF
            C                            1.0
            γ                            0.5
Comparison results are shown in Table 5. The table shows the average accuracy and the related standard deviation computed over the 30 runs performed. The table also shows the p-value of the Wilcoxon test and the performance achieved in the best run. From the table we can see that the performance differences between our system and the ML algorithms are not statistically significant. However, if we look at the best performance achieved, our approach largely outperforms the other ones. It is worth noting that this performance is particularly important in real-world applications such as ours: in these applications, once multiple runs have been performed, the solution from the best run is used to implement the real system.
Table 5: Comparison results with the state-of-the-art classification algorithms.

Classifier  Avg   Std dev  p     Best
Our system  0.69  0.13     -     0.87
SVM         0.73  0.01     0.16  0.74
MLP         0.73  0.01     0.11  0.74
DT          0.69  0.02     0.92  0.74
KNN         0.71  0.01     0.67  0.73
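
The p column of Table 5 comes from the Wilcoxon rank-sum test. As a minimal sketch of how such a test can be computed on two sets of per-run accuracies, the function below uses the normal approximation with a tie correction; `rank_sum_test` is a hypothetical name, the sample values are invented for illustration, and statistical packages may differ in details such as tie handling or exact p-values.

```python
import math


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation). Returns (z, p)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1      # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t              # contributes only when t > 1
        i = j + 1
    n1, n2 = len(a), len(b)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0.0:
        return 0.0, 1.0                     # all values identical
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided tail of the normal
    return z, p


# Invented accuracies of two hypothetical classifiers over a few runs.
runs_a = [0.61, 0.72, 0.55, 0.80, 0.67, 0.74]
runs_b = [0.70, 0.71, 0.69, 0.72, 0.70, 0.71]
z, p = rank_sum_test(runs_a, runs_b)
significant = p < 0.05
```

A rank-based test is a natural choice here because the per-run accuracies of an evolutionary algorithm need not be normally distributed.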
To further investigate the behaviour of the classification algorithms used for the comparison, we analyzed the confusion matrices computed from their best run. These matrices are shown in Figure 9. From the figure we can see that they show similar patterns: formic (#3) and phosphoric (#5) acids are confused with other pollutants. The first was mostly confused with acetic acid (#1), whereas the second was mostly confused with formic acid. These behaviours are similar to that exhibited by our system (see Figure 4). However, for our system the percentage of confused pollutant samples is much lower than those obtained by the approaches used for the comparison. From the figure we can also see that SWW is correctly classified by three of the four ML algorithms used. Only DT had a significant percentage of SWW samples (about 22%) confused with the acids. This value represents the percentage of false alarms (with respect to SWW samples) and makes this algorithm perform much worse than the other ones.
5.5 Discussion

Looking at the results reported above, some conclusions can be drawn:

- Polar coordinates achieve a better performance than Cartesian ones, in terms of both overall accuracy and groups of pollutants confused. This result confirms that, for the number of pollutants considered in this study (C = 6), using a shorter chromosome (2 × C instead of 3 × C) allows the EA to be more effective in searching for the best parameters of the proposed classification model.
- The use of LDA (a supervised procedure) for the dimensionality reduction did not produce any performance improvement compared to PCA (an unsupervised procedure). This result confirms that the data transformation provided by LDA, which maximizes class separation, limits the effectiveness of the EA in finding the best parameters of the classification model.
- The initialization of a fraction of the individuals in the initial population by using training set information allows the EA to find better solutions. We found that the best value for this fraction is 0.10. This value, although quite low, allows a small but significant improvement of the performance. Together with the LDA performance discussed above, this result is a further confirmation that providing too much “a priori information” to the EA limits its search ability.
- As mentioned at the beginning of Section 5, the performance of our system was tested on data acquired with microchips different from those used to train it. Therefore, the good results achieved by our system confirm that its performance is robust with respect to device-to-device variability. This is an important point to consider: in a real-world scenario, the parameters optimized by our system can be used to configure all the devices to be produced.
6 Conclusions and future work

Water pollution is a worldwide concern that also includes wastewater which, if not polluted, helps protect ecosystems. Therefore, it is crucial to find reliable and low-cost technologies for continuous and widespread monitoring of wastewater.

In two previous papers, we presented a system for water pollutant classification to be implemented on the multi-sensor microcontroller SENSIPLUS®. Input data were first projected into a 3-D space and then classified using simple geometrical models. The aim was to implement a pollutant classification system able to work even with the few computational resources available on cheap microcontrollers. In this paper, we presented a further development of those approaches. This development allowed us to: (i) improve the effectiveness of our IoT-based system; (ii) reduce the amount of computational resources needed. The experiments were performed on six contaminants. The results obtained proved the effectiveness of the solutions proposed to improve the performance of our system. The results also confirmed that our system outperforms, in terms of best-run accuracy, some state-of-the-art classification algorithms.

Future work will focus on the definition of a variable-length chromosome. This should allow us to identify sub-clusters of points belonging to the same contaminant. In this context, we will also implement the “dynamic labeling” strategy presented in [9]. According to this approach, the axes making up an individual will not be labeled a priori; instead, their labeling will occur after each sample in the training set has been assigned to its nearest axis.
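
One possible reading of the dynamic labeling idea is sketched below. It is not the procedure of [9]: it assumes that "nearest axis" means the smallest Euclidean distance from a sample to the line through the origin, and that an axis takes the majority label of the samples assigned to it. All function names are hypothetical.

```python
import math
from collections import Counter


def axis_direction(theta, phi):
    """Unit vector of the axis with azimuth theta and polar angle phi."""
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))


def distance_to_axis(point, direction):
    """Distance from a 3-D point to the line through the origin along direction."""
    dot = sum(p * d for p, d in zip(point, direction))
    residual = [p - dot * d for p, d in zip(point, direction)]
    return math.sqrt(sum(r * r for r in residual))


def dynamic_labels(axes, samples):
    """axes: list of (theta, phi); samples: list of ((x, y, z), label).

    Returns one label per axis, or None for an axis that attracted no sample.
    """
    directions = [axis_direction(t, p) for t, p in axes]
    votes = [Counter() for _ in axes]
    for point, label in samples:
        nearest = min(range(len(axes)),
                      key=lambda i: distance_to_axis(point, directions[i]))
        votes[nearest][label] += 1
    # Majority vote per axis (assumption made for this sketch).
    return [v.most_common(1)[0][0] if v else None for v in votes]
```

Under this reading, several axes may end up with the same label, which is what would let a variable-length chromosome represent sub-clusters of a single contaminant.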
Acknowledgments

The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement SYSTEM No. 787128. The authors are solely responsible for it; it does not represent the opinion of the Community, and the Community is not responsible for any use that might be made of the information contained therein.

The authors gratefully acknowledge Sensichips s.r.l. for the support during the experimental phases.

This work was also supported by MIUR (Ministry of Education, University and Research, Law 232/216, Department of Excellence).
Conflicts of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.
References

1. Atzori, L., Iera, A., Morabito, G.: The internet of things: A survey. Computer Networks 54(15), 2787–2805 (2010)
2. Bernieri, A., Ferrigno, L., Laracca, M., Molinara, M.: An SVM approach to crack shape reconstruction in eddy current testing. In: 2006 IEEE Instrumentation and Measurement Technology Conference Proceedings, pp. 2121–2126 (2006)
3. Betta, G., Cerro, G., Ferdinandi, M., Ferrigno, L., Molinara, M.: Contaminants detection and classification through a customized IoT-based platform: A case study. IEEE Instrumentation & Measurement Magazine 22(6), 35–44 (2019)
4. Bruschi, P., Cerro, G., Colace, L., De Iacovo, A., Del Cesta, S., Ferdinandi, M., Ferrigno, L., Molinara, M., Ria, A., Simmarano, R., Tortorella, F., Venettacci, C.: A novel integrated smart system for indoor air monitoring and gas recognition. In: 2018 IEEE International Conference on Smart Computing (SMARTCOMP), pp. 470–475 (2018)
5. Cerro, G., Ferdinandi, M., Ferrigno, L., Laracca, M., Molinara, M.: Metrological characterization of a novel microsensor platform for activated carbon filters monitoring. IEEE Transactions on Instrumentation and Measurement 67(10), 2504–2515 (2018)
6. Cerro, G., Ferdinandi, M., Ferrigno, L., Molinara, M.: Preliminary realization of a monitoring system of activated carbon filter RLI based on the SENSIPLUS® microsensor platform. In: 2017 IEEE International Workshop on Measurement and Networking (M&N), pp. 1–5 (2017)
7. Cilia, N., De Stefano, C., Fontanella, F., Scotto di Freca, A.: Variable-length representation for EC-based feature selection in high-dimensional data. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11454 LNCS, 325–340 (2019)
8. Cilia, N., De Stefano, C., Fontanella, F., Raimondo, S., Scotto di Freca, A.: An experimental comparison of feature-selection and classification methods for microarray datasets. Information (Switzerland) 10(3) (2019)
9. Cordella, L.P., De Stefano, C., Fontanella, F.: Evolutionary prototyping for handwriting recognition. International Journal of Pattern Recognition and Artificial Intelligence 21(01), 157–178 (2007)
10. De Stefano, C., Ferrigno, L., Fontanella, F., Gerevini, L., Scotto di Freca, A.: A novel PCA-based approach for building on-board sensor classifiers for water contaminant detection. Pattern Recognition Letters 135, 375–381 (2020). DOI https://doi.org/10.1016/j.patrec.2020.05.015
11. De Stefano, C., Fontanella, F., Folino, G., Scotto di Freca, A.: A Bayesian approach for combining ensembles of GP classifiers. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6713 LNCS, 26–35 (2011)
12. De Stefano, C., Fontanella, F., Marrocco, C.: A GA-based feature selection algorithm for remote sensing images. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 4974 LNCS, 285–294 (2008)
13. Desmet, C., Degiuli, A., Ferrari, C., Romolo, F.S., Blum, L., Marquette, C.: Electrochemical sensor for explosives precursors' detection in water. Challenges 8(1) (2017)
14. Faruqe, M.O., Hasan, M.A.M.: Face recognition using PCA and SVM. In: 2009 3rd International Conference on Anti-counterfeiting, Security, and Identification in Communication, pp. 97–101 (2009)
15. Ferdinandi, M., Molinara, M., Cerro, G., Ferrigno, L., Marrocco, C., Bria, A., Di Meo, P., Bourelly, C., Simmarano, R.: A novel smart system for contaminants detection and recognition in water. In: 2019 IEEE International Conference on Smart Computing (SMARTCOMP), pp. 186–191 (2019)
16. Halko, N., Martinsson, P.G., Tropp, J.A.: Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review 53(2), 217–288 (2011)
17. Jing, C., Hou, J.: SVM and PCA based fault classification approaches for complicated industrial process. Neurocomputing 167, 636–642 (2015)
18. Kaur, A., Singh, P., Nayyar, A.: Fog Computing: Building a Road to IoT with Fog Analytics, pp. 59–78. Springer Singapore, Singapore (2020)
19. Krishnamurthi, R., Kumar, A., Gopinathan, D., Nayyar, A., Qureshi, B.: An overview of IoT sensor data processing, fusion, and analysis techniques. Sensors 20(21) (2020)
20. Liu Ying, Gu Yanfeng, Zhang Ye: Hyperspectral feature extraction using selective PCA based on genetic algorithm with subgroups. In: First International Conference on Innovative Computing, Information and Control - Volume I (ICICIC'06), vol. 3, pp. 652–656 (2006)
21. Lotfi, E., Keshavarz, A.: Gene expression microarray classification using PCA–BEL. Computers in Biology and Medicine 54, 180–187 (2014)
22. Mahmud, F., Haque, M.E., Zuhori, S.T., Pal, B.: Human face recognition using PCA based genetic algorithm. In: 2014 International Conference on Electrical Engineering and Information Communication Technology, pp. 1–5 (2014)
23. Nayyar, A., Puri, V.: Smart farming: IoT based smart sensors agriculture stick for live temperature and moisture monitoring using Arduino, cloud computing & solar technology. In: Proc. of the International Conference on Communication and Computing Systems (ICCCS-2016), pp. 673–680 (2016)
24. Nopens, I., Capalozza, C., Vanrolleghem, P.A.: Stability analysis of a synthetic municipal wastewater. Department of Applied Mathematics Biometrics and Process Control, University of Gent, Belgium (2001)
25. Ochoa, G.: Error thresholds in genetic algorithms. Evolutionary Computation 14(2), 157–182 (2006)
26. Rathee, D.S., Ahuja, K., Nayyar, A.: Sustainable future IoT services with touch-enabled handheld devices. Security and Privacy of Electronic Healthcare Records: Concepts, paradigms and solutions (2019)
27. Shi, W., Dustdar, S.: The promise of edge computing. Computer 49(5), 78–81 (2016)
28. De Stefano, C., Ferrigno, L., Fontanella, F., Gerevini, L., Molinara, M.: A novel evolutionary approach for IoT-based water contaminant detection. In: P.A. Castillo, J.L.J. Laredo (eds.) Applications of Evolutionary Computation - 24th International Conference, EvoApplications 2021, Held as Part of EvoStar 2021, Virtual Event, April 7–9, 2021, Proceedings, Lecture Notes in Computer Science, vol. 12694, pp. 781–794. Springer (2021)
29. Whelton, A.J., McMillan, L., Connell, M., Kelley, K.M., Gill, J.P., White, K.D., Gupta, R., Dey, R., Novy, C.: Residential tap water contamination following the Freedom Industries chemical spill: Perceptions, water quality, and health impacts. Environmental Science & Technology 49(2), 813–823 (2015)
30. Xu, X., Wang, X.: An adaptive network intrusion detection method based on PCA and support vector machines. In: X. Li, S. Wang, Z.Y. Dong (eds.) Advanced Data Mining and Applications, pp. 696–703. Springer Berlin Heidelberg, Berlin, Heidelberg (2005)
31. Yong Xia, Wen, L., Eberl, S., Fulham, M., Feng, D.: Genetic algorithm-based PCA eigenvector selection and weighting for automated identification of dementia using FDG-PET imaging. In: 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 4812–4815 (2008)
Fig. 8: Average fitness along the evolution with different fractions of initialization for the six pollutants: (a) acetic acid, (b) ammonia, (c) formic acid, (d) hydrogen peroxide, (e) phosphoric acid, (f) sulphuric acid.
Fig. 9: Confusion matrices achieved on the best run by the ML algorithms: (a) DT, (b) KNN, (c) NN, (d) SVM.
... This significant issue is connected to social, economic, legal, and lifestyle choices . (De Stefano et al., 2022) presented the evolutionary technique of detecting water pollution based on IoT. It incorporates sensor devices and collects the data signals. ...
Article
Rapid urbanization impacts water quality because contaminants from the urban environment accumulate in the water and pollute it and because there is more rivalry for water among municipalities, businesses, and other sectors such as farming. A change in the microclimate, fluid mechanics, geomorphic, ecological, or biogeochemical conditions will impact the water's quantity and quality. There is a reduction in the groundwater because of the difficulty that water has soaked into the earth as more roads are built. When the rain washes over impervious buildings like roadways and roofs, it leaves excessive pollution in water bodies. Both people and aquatic life may be at risk from the increased water pollution. This paper uses deep learning methods such as Convolutional Neural Networks (CNN) and Long-Short Term Memory (LSTM) to classify water quality. Next, it identifies the air quality in Urban Development (Conv. LSTM). The convolutional LSTMs use convolutional layers and the recurrent connections found in LSTMs. This allows the model to capture spatial dependencies in the input data in addition to the temporal dependencies captured by the recurrent connections. We also use thorough exploratory analysis to investigate the various beach habitats and the kinds of trash discovered in multiple places. Lowering water pollution and raising air quality are both strategies that can be employed to ensure sustainable urban development. The performance metrics such as accuracy, recall, precision, and F1-score are evaluated and classify the water pollution efficiently. In the water pollution dataset, the algorithms of RNN 65%, DBN 78%, LSTM 82%, and the proposed work of Conv.LSTM 92%. Similarly, for the air pollution dataset, the algorithms of RNN 60%, DBN 75%, LSTM 80%, and the proposed work of Conv.LSTM 91%.
... This prediction would be of paramount importance in the framework of Precision Medicine to concentrate the resources for patients' managements on the cases with the worse predicted clinical outcome. In the last few years, machine learning is demonstrating to be a viable way to solve a wide spectrum of real-world problems [6][7][8]. Regarding the prediction of cognitive decline in AD patients, to date, most studies using machine learning methods are based on magnetic resonance image (MRI) and positron emission tomography (PET) data [11,13]. ...
Chapter
Alzheimer’s disease causes most of dementia cases. Although currently there is no cure for this disease, predicting the cognitive decline of people at the first stage of the disease allows clinicians to alleviate its burden. Clinicians evaluate individuals’ cognitive decline by using neuropsychological tests consisting of different sections, each devoted to testing a specific set of cognitive skills. In this paper, we present the results of a preliminary study aimed at assessing the ability of machine learning based tools to predict the cognitive decline of Alzheimer’s patients using features extracted from EEG records at resting state. We tested seven classification schemes in predicting nine scores, provided by different sections of four neuropsychological tests. The experimental results demonstrated that at least three of these scores allows EEG-based features to be effective in predicting the cognitive decline of Alzheimer’s patients by using machine learning tools.
Article
Full-text available
This article explores the impact of automation on environmental sensing, focusing on advanced technologies that revolutionize data collection analysis and monitoring. The International Union of Pure and Applied Chemistry (IUPAC) defines automation as integrating hardware and software components into modern analytical systems. Advancements in electronics, computer science, and robotics drive the evolution of automated sensing systems, overcoming traditional limitations in manual data collection. Environmental sensor networks (ESNs) address challenges in weather constraints and cost considerations, providing high-quality time-series data, although issues in interoperability, calibration, communication, and longevity persist. Unmanned Aerial Systems (UASs), particularly unmanned aerial vehicles (UAVs), play an important role in environmental monitoring due to their versatility and cost-effectiveness. Despite challenges in regulatory compliance and technical limitations, UAVs offer detailed spatial and temporal information. Pollution monitoring faces challenges related to high costs and maintenance requirements, prompting the exploration of cost-efficient alternatives. Smart agriculture encounters hurdle in data integration, interoperability, device durability in adverse weather conditions, and cybersecurity threats, necessitating privacy-preserving techniques and federated learning approaches. Financial barriers, including hardware costs and ongoing maintenance, impede the widespread adoption of smart technology in agriculture. Integrating robotics, notably underwater vehicles, proves indispensable in various environmental monitoring applications, providing accurate data in challenging conditions. This review details the significant role of transfer learning and edge computing, which are integral components of robotics and wireless monitoring frameworks. 
These advancements aid in overcoming challenges in environmental sensing, underscoring the ongoing necessity for research and innovation to enhance monitoring solutions. Some state-of-the-art frameworks and datasets are analyzed to provide a comprehensive review on the basic steps involved in the automation of environmental sensing applications.
Article
Water quality is affected by increased urbanization as pollutants produced in the urban environment settle and contaminate water, and there is an increase in competition of water among cities, industries, agriculture, etc. The quality and quantity of water are affected by alterations in the microclimate, water dynamics, geomorphology, ecology, and biogeochemistry. As more pavements get created, it becomes increasingly difficult for water to soak into the ground and this causes a decrease in the water table. Impervious structures like streets and roofs when washed with rain deposit excessive pollutants in water bodies. The overall increase in water pollution is a potential health hazard for humans and aquatic life. Hence it is necessary to take adequate measures for addressing the water pollution issue that may potentially arise due to increased urbanization. In this study, we tackle the issue using two approaches. The first approach deals with analyzing the water quality to determine its potability using fifteen different types of machine learning techniques like random forests, decision trees, support vector machines, artificial neural networks, etc. The model has been evaluated using metrics such as precision, recall, accuracy, and F-1 score. The second approach deals with identifying marine litter from beaches in many parts of the world using machine learning algorithms. We also explore the different types of beach environments and the type of litter that is found in different locations using extensive exploratory analysis. Both approaches can be used for ensuring sustainable urban development by reducing water pollution.
Article
Full-text available
In the recent era of the Internet of Things, the dominant role of sensors and the Internet provides a solution to a wide variety of real-life problems. Such applications include smart city, smart healthcare systems, smart building, smart transport and smart environment. However, the real-time IoT sensor data include several challenges, such as a deluge of unclean sensor data and a high resource-consumption cost. As such, this paper addresses how to process IoT sensor data, fusion with other data sources, and analyses to produce knowledgeable insight into hidden data patterns for rapid decision-making. This paper addresses the data processing techniques such as data denoising, data outlier detection, missing data imputation and data aggregation. Further, it elaborates on the necessity of data fusion and various data fusion methods such as direct fusion, associated feature extraction, and identity declaration data fusion. This paper also aims to address data analysis integration with emerging technologies, such as cloud computing, fog computing and edge computing, towards various challenges in IoT sensor network and sensor data analysis. In summary, this paper is the first of its kind to present a complete overview of IoT sensor data processing, fusion and analysis techniques.
Chapter
Full-text available
There is a great impact on our day-to-day life by integrating platforms of cloud computing and Internet-of-things (IoT). Also, some of the limitations exist in today’s era. Although various services of cloud are freely available and are also comparatively cheaper. But it consumes a large amount of network bandwidth. The main disadvantage of cloud computing is the distance between the data center and the data source. Fog computing offers a solution to these kinds of problems in cloud computing. It is one of the distributed service computing models. It completely utilizes the various computing functions of terminal devices. It also exhibits para-virtualized architecture. The different characteristics of cloud and fog computing platforms are explained in this chapter. Also, the detailed architecture of both platforms is introduced with a comparative analysis. On the fog server, fog analytics tool performs data localization. All the methods of application management such as resource coordination technique, distributed application deployment, and distributed data flow method are discussed. Further, research direction in using Deep Learning to Big Data is detailed as the improved formulation of data abstractions, dimensionality reduction, etc. Also, the possible solutions are presented.
Article
In the last decade, there has been a growing scientific interest in the analysis of DNA microarray datasets, which have been widely used in basic and translational cancer research. The application fields include both the identification of oncological subjects, separating them from the healthy ones, and the classification of different types of cancer. Since DNA microarray experiments typically generate a very large number of features for a limited number of patients, the classification task is very complex and typically requires the application of a feature-selection process to reduce the complexity of the feature space and to identify a subset of distinctive features. In this framework, there are no standard state-of-the-art results generally accepted by the scientific community and, therefore, it is difficult to decide which approach to use for obtaining satisfactory results in the general case. Based on these considerations, the aim of the present work is to provide a large experimental comparison for evaluating the effect of the feature-selection process applied to different classification schemes. For comparison purposes, we considered both ranking-based feature-selection techniques and state-of-the-art feature-selection methods. The experiments provide a broad overview of the results obtainable on standard microarray datasets with different characteristics in terms of both the number of features and the number of patients.
Chapter
Water pollution is a very serious issue, and it is important to be able to monitor it with non-invasive and low-cost solutions, like those offered by smart sensor technologies. In this paper, we propose an improvement of our innovative classification system, based on geometrical cones, to detect and classify pollutants, belonging to a given set of substances, spilled into waste water. The solution is based on an ad-hoc classifier that can be implemented aboard the Smart Cable Water (SCW) sensor, based on SENSIPLUS technology developed by Sensichips s.r.l. The SCW is a smart sensor endowed with six interdigitated electrodes, covered by specific sensing materials that allow differentiating between different water contaminants. To develop an algorithm suited to the “edge computing” paradigm, we first compress the input data from a 10-dimensional space to a 3-D space by using the PCA decomposition technique. Then we use an ad-hoc classifier to distinguish the different contaminants in the transformed space. To learn the classifier’s parameters, we used evolutionary algorithms. The results obtained have been compared with those of the old classification system and of other, more classical, machine learning approaches.
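The compression step described in this abstract, from a 10-dimensional measurement space to a 3-D space via PCA, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic random values rather than SCW readings, the function names `fit_pca` and `project` are hypothetical, and the cone-based classifier applied afterwards is not reproduced because the abstract does not specify it.

```python
import numpy as np

def fit_pca(X, n_components=3):
    """Return the feature mean and the top principal directions of X."""
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(X, mean, components):
    """Map samples into the low-dimensional principal-component space."""
    return (X - mean) @ components.T

# Synthetic stand-in for sensor readings: 200 samples, 10 features (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

mean, components = fit_pca(X, n_components=3)
Z = project(X, mean, components)  # one 3-D point per sample
```

On an edge device only `mean` and `components` (a 3 x 10 matrix) would need to be stored, so projecting a new sample costs one subtraction and one small matrix-vector product.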
Article
Water pollution causes an ever-increasing number of diseases and represents a worldwide concern for governments, researchers, and public opinion alike. This pollution also affects drinking water, with two billion people plagued by the problem. Therefore, it is crucial to find reliable and low-cost technologies for continuous and widespread monitoring of water. In this paper, we present a novel approach that allows the detection of water contaminants by using an ad-hoc classification system that can be implemented aboard low-cost sensors. To this aim, we first project the input data from the sensors into a 3-D space by using the PCA algorithm, then we use an ad-hoc classifier to distinguish the contaminants in the transformed space. We used an evolutionary algorithm to learn the parameters of the classifier. The experiments were performed on a large dataset containing data from four contaminants, including phosphoric and sulphuric acids. The results obtained confirm the effectiveness of the proposed approach.
Chapter
In today’s hectic schedules, many important tasks, such as servicing our devices, switching off home appliances, and purchasing essential groceries, are often dropped from our checklists. To resolve such issues, researchers and technicians have introduced the concept of the Internet of Things (IoT). Home and industry are two basic fields where IoT has embedded many new protocols and techniques to make things smarter. Everyone dreams of a smart home, where appliances communicate with each other and the owner can monitor the home from anywhere. Today, it is possible to control a refrigerator, treadmill, smart TV, or lights at home, in the office, or in industry from handheld devices. The latest gadgets are equipped with varied smart sensors and interfaces, such as accelerometers, gyroscopes, proximity sensors, GPS, barometers, magnetometers, ambient light sensors, Bluetooth, and RFID, along with long-lasting batteries, making them smart handheld devices. It may seem surprising today, but smartphones are going to drive the IoT movement in the near future. IoT refers to the expanding interconnectedness of diverse smart gadgets over the web. These gadgets include sensors and Internet connectivity, which enable them to acquire, collect, and transmit data over various connections such as Bluetooth and Wi-Fi. Therefore, handheld devices can be considered the user’s ultimate device for IoT interaction and control. Handheld devices already help customers order items online, use applications to check items, or even track how long the queue in the store is for an order and learn when to pick it up. In addition, IoT devices help users keep an eye on their fitness, track their steps, and so on. The emerging technologies in smart handheld devices suggest that MEMS can be the main candidate for achieving the IoT movement in the future.
Handheld devices are thus considered a sixth sense for today’s user, and their capabilities can be increased by integrating IoT. This chapter describes the integration of handheld devices with IoT in detail and gives a clear view of the challenges and opportunities of implementing it in real-world applications.
Article
The Internet of Things (IoT) is reaching more and more fields where monitoring and fast, reliable data communication are needed simultaneously. Within the general class of monitoring applications, those related to pollutant detection and classification are currently being addressed by many researchers and companies. Several approaches have been proposed in the literature, but many open issues and challenges must still be handled before a system widely regarded as optimal can be deployed. This contribution proposes a novel low-cost and highly flexible platform intended to tackle such challenges by adopting ad-hoc hardware and software techniques. The proposed solution is applied to air and water contaminant detection case studies. The paper provides the reader with an innovative system in the field of pollution monitoring and focuses on the limitations, challenges and possible improvements needed to obtain reliable contaminant detection and, consequently, improve quality of life.
Chapter
Feature selection is a challenging problem, especially when hundreds or thousands of features are involved. Evolutionary computation techniques, and genetic algorithms in particular, have proven effective in solving such problems because of their ability to explore large and complex search spaces. Although the binary strings used by genetic algorithms provide a natural way to represent feature subsets, several different representation schemes have been proposed to improve performance, most of which require the number of features to be set a priori. In this paper, we propose a novel variable-length representation in which feature subsets are represented by lists of integers. We also devised a crossover operator to cope with the variable-length representation. The proposed approach has been tested on several datasets and the results compared with those achieved by a standard genetic algorithm. The comparisons demonstrated the effectiveness of the proposed approach in improving the performance obtainable with a standard genetic algorithm when thousands of features are involved.
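To make the idea of a variable-length representation concrete, the sketch below encodes a feature subset as a list of integer feature indices and recombines two parents of different lengths. The abstract does not describe the authors' crossover operator, so the cut-and-splice scheme used here is a swapped-in, generic technique, and the names `dedupe` and `cut_and_splice` are hypothetical.

```python
import random

def dedupe(features):
    """Drop repeated feature indices while keeping first-seen order."""
    return list(dict.fromkeys(features))

def cut_and_splice(parent_a, parent_b, rng):
    """Recombine two variable-length feature lists (illustrative operator).

    Each parent is cut at its own random point and the pieces are swapped,
    so the children may differ in length from both parents.
    """
    cut_a = rng.randint(0, len(parent_a))
    cut_b = rng.randint(0, len(parent_b))
    child_1 = dedupe(parent_a[:cut_a] + parent_b[cut_b:])
    child_2 = dedupe(parent_b[:cut_b] + parent_a[cut_a:])
    # Assumption: an empty subset is invalid, so fall back to a parent copy.
    return child_1 or list(parent_a), child_2 or list(parent_b)

rng = random.Random(42)
parent_a = [3, 17, 42, 8]                 # subset of 4 features
parent_b = [5, 17, 99, 1, 63, 8, 20]      # subset of 7 features
child_1, child_2 = cut_and_splice(parent_a, parent_b, rng)
```

Because each individual stores only the indices of the selected features, its size scales with the subset rather than with the total number of features, which is the practical appeal of such a representation when thousands of features are available.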