Soft Computing (2020) 24:17787–17795
https://doi.org/10.1007/s00500-020-05302-y
FOCUS
Artificial neural networks training acceleration through network science strategies
Lucia Cavallaro¹ · Ovidiu Bagdasar¹ · Pasquale De Meo² · Giacomo Fiumara³ · Antonio Liotta⁴
Published online: 9 September 2020
© The Author(s) 2020
Abstract
The development of deep learning has led to a dramatic increase in the number of applications of artificial intelligence.
However, the training of deeper neural networks for stable and accurate models translates into artificial neural networks
(ANNs) that become unmanageable as the number of features increases. This work extends our earlier study where we
explored the acceleration effects obtained by enforcing, in turn, scale freeness, small worldness, and sparsity during the
ANN training process. The efficiency of that approach was confirmed by recent studies (conducted independently) where a
million-node ANN was trained on non-specialized laptops. Encouraged by those results, our study is now focused on some
tunable parameters, to pursue a further acceleration effect. We show that, although optimal parameter tuning is unfeasible, due
to the high non-linearity of ANN problems, we can actually come up with a set of useful guidelines that lead to speed-ups in
practical cases. We find that significant reductions in execution time can generally be achieved by setting the revised fraction
parameter (ζ) to relatively low values.
Keywords Network science · Artificial neural networks · Multilayer perceptron · Revise phase
1 Introduction
The effort to simulate the human brain behaviour is one of the
top scientific trends today. In particular, deep learning strate-
gies pave the way to many new applications, thanks to their
ability to manage complex architectures. Notable examples
are: speech recognition (Hinton et al. 2012), cyber-security
(Berman et al. 2019), image (Krizhevsky et al. 2017), and
signal processing (Dong and Li 2011). Other applications
gaining popularity are related to bio-medicine (Cao et al.
Communicated by Yaroslav D. Sergeyev.
Lucia Cavallaro
l.cavallaro@derby.ac.uk
Antonio Liotta
antonio.liotta@unibz.it
1University of Derby, Kedleston Road, Derby DE22 1GB, UK
2University of Messina, Polo Universitario Annunziata, 98122
Messina, Italy
3MIFT Department, University of Messina, 98166 Messina,
Italy
4Faculty of Computer Science, Free University of
Bozen-Bolzano, Bolzano, Italy
2018) and drug discovery (Chen et al. 2018; Ruano-Ordás
et al. 2019).
However, despite their success, deep learning architec-
tures suffer from important scalability issues, i.e., the actual
artificial neural networks (ANN) become unmanageable as
the number of features increases.
While most current strategies focus on using more pow-
erful hardware, the approach herein described employs
network science strategies to tackle the complexity of ANNs
iteratively, that is, at each epoch of the training process.
This work originates in our earlier publication (Mocanu
et al. 2018), a promising research avenue to speed up neural
network training. There, a new approach called sparse evolu-
tionary training (SET) was defined, in which the acceleration
effects obtained by enforcing, in turn, scale freeness, small
worldness, and sparsity, during the ANN training process,
were explored.
The SET framework firstly initializes an ANN as a sparse weighted Erdős–Rényi graph in which the graph density is fixed (20% by default), and assigns weights to edges based on a normal distribution with mean equal to zero. Secondly (i.e., during the revision step), nonzero weights iteratively replace null edges (i.e., links with weight equal to zero), with the twofold goal of reducing the loss on the
training set and of keeping the number of connections constant.
We should note that the revision step is not only rewiring
the links but also re-computing the actual weight of the new
links.
The efficiency of this approach has also been recently con-
firmed by independent researchers, who managed to train
a million-node ANN on non-specialized laptops (Liu et al.
2019).
Encouraged by those results, our research has now moved
into looking at algorithm tuning parameters to pursue a fur-
ther acceleration effect, at a negligible accuracy loss. The
focus is on the revision stage (determined by the ζ parameter) and on its impact on the training time over epochs. Noteworthy results have been achieved by conducting an in-depth investigation into the optimal tuning of ζ and by providing general guidelines on how to achieve better trade-offs between time and accuracy, as described in Sect. 5.2.
The rest of the paper is organized as follows: Sect. 2 provides the background theories employed in this work. To better position our contribution, Sect. 3 captures the state of the art. Next, Sect. 4 addresses the methodology followed and Sect. 5 shows the results obtained. Finally, Sect. 6 draws the conclusions.
2 Background
This section briefly introduces the main concepts required
for understanding this work.
Note that, for the sake of simplicity, the words 'weight' and 'link' are used interchangeably, and only weighted links have been considered. The goal is to demonstrate the effectiveness of the SET approach, aiming at lower revised fraction values, in the context of the multilayer perceptron (MLP) supervised model. An MLP is a feed-forward ANN composed of several hidden layers, forming a deep network, as shown in Fig. 1. Because links only flow from each layer to the next, an MLP can be seen as a fully connected directed graph between the input and output layers.
Supervised learning involves observing several samples
of a given dataset, which will be divided into ‘training’ and
‘test’ samples. While the former is used to train the neural net-
work, the latter works as a litmus test, as it is compared with
the ANN predictions. One can find further details on deep
learning in LeCun et al. (2015); Goodfellow et al. (2016).
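As a minimal illustration, the split can be expressed as follows; this sketch assumes scikit-learn and a synthetic matrix shaped like the Lung Cancer dataset of Sect. 4.1, and the 80/20 ratio is our own choice rather than a value taken from the experiments.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative train/test split on synthetic data shaped like the Lung Cancer
# dataset (203 instances, 3312 features, 5 output classes). The 80/20 ratio
# is an assumption made only for this example.
rng = np.random.default_rng(0)
X = rng.random((203, 3312))
y = rng.integers(0, 5, size=203)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)   # (162, 3312) (41, 3312)
```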
The construction of a fully connected graph inevitably
leads to higher computational costs, as the network grows. To
overcome this issue, the SET framework (Mocanu et al. 2018)
drew inspiration from human brain models and modelled an ANN topology as a weighted sparse Erdős–Rényi graph in which edges are randomly placed between nodes, according to a fixed probability (Erdős and Rényi 1959; Barabási and Pósfai 2016; Latora et al. 2017).
Like in Mocanu et al. (2018), the edge probability is defined as follows:

$$p\left(W^{k}_{ij}\right) = \frac{\epsilon\,(n^{k} + n^{k-1})}{n^{k}\, n^{k-1}}, \qquad (1)$$

where $W^{k} \in \mathbb{R}^{n^{k-1} \times n^{k}}$ is a sparse weight matrix between the $k$-th layer and the previous one, $\epsilon \in \mathbb{R}^{+}$ is the sparsity parameter, and $i, j$ are a pair of neurons; moreover, $n^{k}$ is the number of neurons in the $k$-th layer.
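A minimal sketch of this initialisation is given below, assuming a plain NumPy weight matrix, an illustrative sparsity level ε = 20, and a unit standard deviation for the zero-mean normal weights (the paper does not fix the latter); it is not the authors' implementation.

```python
import numpy as np

def sparse_er_weights(n_prev, n_curr, eps=20.0, seed=None):
    """Sparse Erdos-Renyi initialisation in the spirit of Eq. (1): each of the
    n_prev x n_curr possible links between two consecutive layers is kept with
    probability p = eps * (n_curr + n_prev) / (n_curr * n_prev), and kept
    links receive a zero-mean normal weight."""
    rng = np.random.default_rng(seed)
    p = eps * (n_curr + n_prev) / (n_curr * n_prev)
    mask = rng.random((n_prev, n_curr)) < p
    return rng.normal(0.0, 1.0, size=(n_prev, n_curr)) * mask

W = sparse_er_weights(3000, 3000, eps=20.0, seed=0)
print(np.count_nonzero(W) / W.size)   # density of about 2*eps/3000, i.e. ~1.3%
```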
As outlined in the previous section, this process enforces network sparsity. This stratagem is balanced by introducing the tunable revise fraction parameter ζ, which defines the fraction of weights that needs to be rewired (with a new weight assignment) during the training process.
Indeed, at the end of each epoch, there is a weight adjustment phase. It consists of removing the closest-to-zero links between layers, plus a wider revising range (i.e., ζ); this parameter verifies the correctness of the forced-to-be-zero weights. Subsequently, the framework randomly adds new weights to exactly compensate for the removed ones. Thanks to this procedure, the number of links between layers remains constant across epochs, without isolated neurons (Mocanu et al. 2018).
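A simplified sketch of this revise phase is shown below; it is a plain NumPy approximation under our own assumptions rather than the authors' implementation, and it only keeps the overall idea (the original SET procedure treats the smallest positive and the largest negative weights separately).

```python
import numpy as np

def revise_weights(W, zeta=0.01, seed=None):
    """Per-epoch revise step (simplified): remove the fraction `zeta` of
    existing links whose weights are closest to zero, then add the same number
    of randomly placed new links, so the number of connections stays constant."""
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(W)
    n_revise = int(zeta * rows.size)
    if n_revise == 0:
        return W
    # prune the n_revise links with the smallest magnitude
    order = np.argsort(np.abs(W[rows, cols]))[:n_revise]
    W[rows[order], cols[order]] = 0.0
    # regrow the same number of links at randomly chosen empty positions
    empty_rows, empty_cols = np.nonzero(W == 0)
    pick = rng.choice(empty_rows.size, size=n_revise, replace=False)
    W[empty_rows[pick], empty_cols[pick]] = rng.normal(0.0, 1.0, size=n_revise)
    return W
```

Calling, for instance, revise_weights(W, zeta=0.01) on the matrix produced by the previous sketch would rewire about 1% of its links at each epoch.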
Herein, the role of ζ is analysed, and we show how to find a good range of ζ values. Our aim is to strike a good balance between learning speed and accuracy.
3 Related literature
In recent years, ANNs have been widely applied in a broad
range of domains such as image classification (He et al.
2016), machine translation (Vaswani et al. 2017), and text
to speech (Kalchbrenner et al. 2018).
Previous work proves that the accuracy of an ANN (also
known as model quality) crucially depends on both the model
size (defined as the number of layers and neurons per layer)
and the amount of training data (Hestness et al. 2017). Due
to these reasons, the amount of resources required to train
large ANNs is often prohibitive for real-life applications.
An approach promising to achieve high accuracy even
with modest hardware resources is sparsity (Gale et al. 2019).
An ANN is referred to as sparse when only a subset (hope-
fully of small size) of the model parameters has a value
different from zero. The advantages of sparse networks are
obvious. On the one hand, sparse data structures can be used
to store matrices associated with the representation of an
ANN. On the other hand, most of the matrix multiplications
(which constitute the most time-expensive stage of neural
network computation) can be avoided. Furthermore, previ-
ous works (Ullrich et al. 2017; Mocanu et al. 2018) suggested
Fig. 1 Example of a generic
multilayer perceptron network
with more than two hidden
layers. Circles represent
neurons, and arrows describe the
links between layers
that high levels of sparsity do not severely affect the accuracy
of an ANN.
This section provides a brief overview of methods used to induce sparse ANNs, by classifying existing methods into two main categories, namely:
1. Methods derived from network science to induce sparse
ANNs,
2. Methods derived from ANN regularization to induce
sparse ANNs.
3.1 Methods derived from network science to induce
sparse ANNs
Some previous papers focus on the interplay between net-
work science and artificial networks (Stier and Granitzer
2019; Mocanu et al. 2018; Bourely et al. 2017). More specifi-
cally, they draw inspiration from biological phenomena such
as the organization of the human brain (Latora et al. 2017;
Barabási and Pósfai 2016).
Early studies in network science, in fact, pointed out that
real graphs (e.g. social networks describing social ties among
members of a community) display important features such
as power-law distribution in node degree (Barabási and Pós-
fai 2016) and the small-world property (Watts and Strogatz
1998). Many authors agree that these properties are likely
to exist in many large networked systems one can observe
in nature. For instance, in the case of biological and neuronal
networks, Hilgetag and Goulas (2016) suggested that the neu-
ronal network describing the human brain can be depicted as
a globally sparse network with a modular structure.
As a consequence, approaches based on network science
consider ANNs as sparse networks whose topological fea-
tures resemble those of many biological systems and they
take advantage from their sparseness to speed up the training
stage.
A special mention goes to recent research in Liu et al.
(2019), where the authors managed to train a million-node
ANN on non-specialized laptops, based on the SET framework that was initially introduced in Mocanu et al. (2018).
SET is a training procedure in which connections are pruned
on the basis of their magnitude, while other connections are
randomly added. The SET algorithm is actually capable of
generating ANNs that have sparsely connected layers and,
yet, achieve excellent predictive accuracy on real datasets.
Inspired by studies on rewiring in the human brain, Bellec et al. (2018) formulated the DEEPR algorithm for training
ANNs under connectivity constraints. This algorithm auto-
matically rewires an ANN during the training stage and, to
perform such a task, it combines a stochastic gradient descent
algorithm with a random walk in the space of parameters to
learn.
Bourely et al. (2017) studied to what extent the accuracy of
an ANN depends on the density of connections between two
consecutive layers. In their approach, they proposed sparse
neural network architectures, which derive from random or
structured bipartite graphs. Experimental results show that, with a properly chosen topology, sparse neural networks can equal or surpass the accuracy of a fully connected ANN with the same number of nodes and layers, with the clear advantage of handling a much smaller parameter space.
Stier and Granitzer (2019) illustrated a procedure to gener-
ate ANNs, which derive from artificial graphs. The proposed
approach generates a random directed acyclic graph G according to the Watts–Strogatz (1998) or the Barabási–Albert (2016) models. Nodes in G are then mapped onto
layers in an ANN, and some classifiers (such as support
vector machines and random forest) are trained to decide
if a Watts–Strogatz topology yields a better accuracy than a
Barabási–Albert one (or vice versa).
3.2 Methods derived from ANN regularization to
induce sparse ANNs
Methods such as L1 or L0 regularization, which gained pop-
ularity in supervised learning, have been extensively applied
to generate compact yet accurate ANNs.
For instance, Srinivas et al. (2017) introduced addi-
tional gate variables to efficiently perform model selection.
Furthermore, Louizos et al. (2017) described an L0-norm
regularization method, which forces connection weights to
become zero. Zero-weight connections are thus pruned, and this is equivalent to inducing sparse networks.
The methods above are successful in producing sparse
but accurate ANNs; however, they lack explainability. Thus,
it is hard to understand why certain architectures are more
competitive than others.
It is also interesting to point out that regularization tech-
niques can be viewed as procedures compressing an ANN by
deleting unnecessary connections (or, equivalently, selecting only a few parameters). According to Frankle and Carbin (2018), techniques to prune an ANN are effective at uncovering sub-networks within an ANN whose initialization made the training process more effective. Based on these premises, Frankle and Carbin suggested what they called the
lottery ticket hypothesis. In other words, dense and randomly
initialized ANNs contain sub-networks (called winning tick-
ets) that, when trained in isolation, are able to reach the same
(or a comparable) test accuracy as the original network, and
within a similar number of iterations.
4 Method
Herein we illustrate our research questions and strategy.
To speed up the training process, the investigation relates to the effects of ζ variations during the evolutionary weight phase, at each epoch. The analysis involves a gradual ζ reduction, with the goal of providing a guideline on how to find the best range of ζ values, to trade off between speed-up and accuracy loss in different application domains.
In Mocanu et al. (2018), the default revise fraction was set to ζ = 0.3 (i.e., a revised fraction of 30%) and no further investigation on the sensitivity to ζ was carried out. Unlike Mocanu et al. (2018)'s research, an in-depth analysis of the revised fraction is herein conducted to understand these effects, particularly how the revise step affects the training when ζ is substantially reduced. In this paper, the ζ ∈ [0, 1] and ζ ∈ [0%, 100%] notations are used interchangeably.
An obvious expectation is that smaller values of ζ will yield a shorter execution time at the cost of some accuracy loss. Nonetheless, this relationship is bound to be nonlinear; thus, it is crucial to obtain quantitative results.
4.1 Dataset and ANN descriptions
The experiments were conducted using well-known datasets, publicly available online¹:

Lung Cancer² is a biological dataset composed of lung cancer features, used to train the ANN to detect the disease.

CLL_SUB_111³ is composed of B-cell chronic lymphocytic leukaemia samples. This dataset was created to profile the five most frequent genomic aberrations (i.e., deletions affecting chromosome bands 13q14, 11q22-q23, 17p13 and 6q21, and gains of genomic material affecting chromosome band 12q13) (Haslinger et al. 2004).

COIL20⁴ is an image dataset used to train ANNs to detect 20 different objects. The images of each object were taken five degrees apart as the object is rotated on a turntable, and each object has 72 images. The size of each image is 32 × 32 pixels, with 256 grey levels per pixel. Thus, each one is represented by a 1024-dimensional vector (Cai et al. 2011, PAMI; Cai et al. 2011, VLDB).
Both Lung Cancer and CLL_SUB_111 are biological
datasets, widely used for their importance in medicine,
whereas the COIL20 dataset is a popular image dataset.
Further quantitative details are provided in Table 1.
The ANN used is composed of three hidden layers with
3,000 neurons per layer. The activation functions used by
default are ReLU for the hidden layers and sigmoid for the
output (Table 2).
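For concreteness, the dense skeleton of this configuration could be rendered as in the sketch below; PyTorch is used purely for illustration, and the SET-based models in the experiments keep these layers sparse rather than dense.

```python
import torch
import torch.nn as nn

# Hypothetical dense rendering of the ANN described in Sect. 4.1 (three hidden
# layers of 3000 neurons, ReLU activations, sigmoid output) with the
# hyper-parameters of Table 2 (MSE loss, SGD with lr=0.01, momentum=0.9,
# weight decay=0.0002).
def build_mlp(n_inputs: int, n_outputs: int, hidden: int = 3000) -> nn.Module:
    return nn.Sequential(
        nn.Linear(n_inputs, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, n_outputs), nn.Sigmoid(),
    )

model = build_mlp(n_inputs=3312, n_outputs=5)   # e.g. the Lung Cancer dataset
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=0.0002)
```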
4.2 Comparison with our previous work
In Mocanu et al. (2018), the goal was to implement the SET
algorithm and test it with numerous datasets, on several ANN
¹ http://featureselection.asu.edu/.
² https://sites.google.com/site/feipingnie/file/.
³ https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2466.
⁴ http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html.
Table 1 Dataset structures description

Name           Type         Inst. (#)   In. Feat. (#)   Out. C. (#)
Lung Cancer    Biological   203         3,312           5
CLL_SUB_111    Biological   111         11,340          3
COIL20         Face image   1440        1024            20

From left: dataset name; dataset type; number of instances; number of input features; number of output classes
Table 2 Artificial neural networks description

Loss function   Batch size (fitting)   Batch size (prediction)   Learning rate   Momentum   Weight decay
MSE             2                      1                         0.01            0.9        0.0002

It provides information about: the loss function, the batch sizes, the learning rate, the momentum, and the weight decay
types (MLPs, CNNs, RBMs) and on different types of tasks (supervised and unsupervised learning). The current study investigates the role of the revise fraction parameter ζ, rather than the algorithm itself. The aim is to provide a general guideline on finding the best range of ζ values to reduce execution time, at a negligible loss of accuracy.
In Cavallaro et al. (2020), a preliminary study on the role of ζ suggested a negligible accuracy loss, lower fluctuations, and a valuable gain in overall execution time with ζ < 0.02 on the Lung Cancer dataset. In the present paper, this intuition is analysed on a wider range of datasets to provide stronger justifications for the findings. The most important contribution of our study has been to confirm the effectiveness of the SET framework. Indeed, the random sparseness in ANNs introduced by the SET algorithm is powerful enough even without further fine-tuning of weights (i.e., the revise fraction) during the training process.
5 Results
This section compares the results obtained by varying the
parameter ζ, evaluating the training goodness in terms of the
balance between high accuracy reached and short execution
time. These topics are treated in Sects. 5.1 and 5.2, respec-
tively. Section 5.3 provides a brief comment on the preferable
ζ value, following up from the previous subsections.
For brevity, only the most important outcomes are reported
hereafter. The number of epochs was increased from the
default value of 100 up to 150 with the aim of finding the
ending point of the transient phase. By combining these two
tuning parameters (i.e., number of epochs and ζ), we have discovered that, with the datasets herein analysed, the meaningful revise range is 0 ≤ ζ ≤ 0.02.
In particular, Sect. 5.2 shows further investigations in
terms of execution time gains, conducted by replicating experiments over ten runs and averaging the obtained results.
5.1 Accuracy investigation
This section shows the results obtained from the comparative
analysis in terms of accuracy improvements over 150 epochs,
on the three datasets.
In the Lung Cancer dataset (Fig. 2a), substantial accuracy fluctuations are present, but there is no well-defined transient phase for ζ > 0.02. The benchmark value ζ = 0.3 shows an accuracy variation of more than 10% (e.g. accuracy increasing from 82% to 97% at the 60th epoch and from 85% to 95% at the 140th epoch). Note
that, since the first 10 epochs are within the settling phase,
the significant observations concern the simulation from the
11th epoch. Due to this uncertainty and due to the absence of
a transient phase, it is impossible to identify an optimal stop-
ping condition for the algorithm. For instance, at the 60th
epoch an accuracy collapse from 97% to 82% was found,
followed by an accuracy of 94% at the next epoch.
For a lower revise fraction, i.e., ζ ≤ 0.02, both improved stability (i.e., lower fluctuations) and some accuracy loss emerge, as expected. In this scenario, defining
an exit condition according to the accuracy trend over time is
easier. Indeed, despite a higher accuracy loss, the curve sta-
bility allows the identification of a gradual accuracy growth
over the epochs, with no unexpected sharp drops.
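To make the idea of such an exit condition concrete, a simple patience-based rule is sketched below; it is our own illustration, not part of the original SET code, and the patience and tolerance values are arbitrary.

```python
def should_stop(acc_history, patience=10, min_delta=0.1):
    """Illustrative early-exit rule: stop when the best accuracy of the last
    `patience` epochs does not exceed the best accuracy seen before them by at
    least `min_delta` percentage points."""
    if len(acc_history) <= patience:
        return False
    best_before = max(acc_history[:-patience])
    recent_best = max(acc_history[-patience:])
    return recent_best < best_before + min_delta

# A long plateau triggers the rule, a steadily growing curve does not.
print(should_stop([80, 85, 88, 90] + [90.2] * 12))   # True
```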
To quantify the amount of accuracy loss, refer to Table 3,
which reports both the revise fraction and the highest accu-
racy reached during the whole simulation, as a percentage.
Moreover, mean and confidence interval bounds are pro-
vided. From Table 3, it is possible to assert that, on average, the improvement achieved by using a higher revise fraction (such as the default one) is an accuracy gain of just under 3% (e.g. mean at ζ = 0% vs mean at ζ = 30%), which is a negligible improvement in most application domains. This depends on the tolerance level required. For example, if the goal is to achieve an accuracy of at least 90%, then a lower ζ is sufficiently effective. The confidence interval is rather narrow, given that the difference between the lower and upper bounds lies between 0.8 and 0.9.
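The summary statistics of Table 3 can be reproduced from an accuracy-per-epoch curve along the lines of the sketch below; the confidence interval is assumed to be a normal-approximation interval for the mean, since the exact formula used is not stated in the paper.

```python
import numpy as np

def accuracy_summary(acc_per_epoch, burn_in=10, z=1.96):
    """Drop the first `burn_in` epochs (the settling phase), then report the
    maximum, the mean, and normal-approximation confidence bounds for the mean
    of the remaining accuracy values (all in percentage points)."""
    acc = np.asarray(acc_per_epoch, dtype=float)[burn_in:]
    mean = acc.mean()
    half_width = z * acc.std(ddof=1) / np.sqrt(acc.size)
    return acc.max(), mean, mean - half_width, mean + half_width
```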
Fig. 2 Accuracy percentage over 150 epochs varying ζ among [0%, 1%, 2%], plus the benchmark value ζ = 30%. In particular, ζ = 0% is shown with circle markers, ζ = 1% with triangle markers, ζ = 2% with square markers, and ζ = 30% with cross-shaped markers. a Lung Cancer, b COIL20, c CLL_SUB_111
In the COIL20 dataset (Fig. 2b), a short transient phase emerges, with no evident improvements among the simulations with different values of ζ. Indeed, there are just small accuracy fluctuations of ±3%. These results are not surprising, since improvements achieved through ζ variations also depend on the goodness of the dataset itself, both in terms of its size and in the choice of its features. Table 3 shows that accuracy is always above 98%; thus, even with ζ = 0 the accuracy loss is negligible. The confidence interval is also lower than 0.3. As the accuracy is continuously increasing over the training epochs, defining a dynamic exit condition is easier in this application domain.
Figure 2c shows the results obtained on the CLL_SUB_111 dataset. It is evident that the worst and most unstable approaches among the ones considered are the default one (i.e., ζ = 30%) and ζ = 2%.
From Table 3, it is interesting to notice how the accuracy levels are even more stable when using a lower revise fraction (i.e., going from a mean equal to 62.23% for ζ = 30% up to 67.14% for ζ = 0%). The fluctuations are more evident than in the other two datasets, even when looking at the confidence interval; indeed, it varies from 1.06 (with ζ = 0) up to 2.18 (with ζ = 30%), which is larger than in the previously analysed datasets. Because of the significant accuracy fluctuations, a possible early exit condition should be considered only with ζ = 0, even at the cost of a slightly higher accuracy loss.
The results obtained so far suggest that there is no need to fine-tune ζ, because the sparsity introduced by the SET algorithm is sufficiently powerful, and only a few links need to be rewired (i.e., ζ ≤ 0.02). Apart from the goodness of the datasets themselves (as in COIL20), opting for a lower revise fraction has shown that, on the one hand, the accuracy loss
Table 3 Evaluating parameters varying the revise fraction on the datasets considered, in a single run with fixed seed

ζ (%)   Max Acc. (%)   Mean (%)   Lower B. (%)   Upper B. (%)

(a) Lung cancer dataset
30%     97.06%         93.13%     92.67%         93.58%
2%      95.59%         90.38%     90.08%         90.68%
1%      94.12%         90.19%     89.84%         90.55%
0%      94.12%         90.21%     89.79%         90.62%

(b) COIL20 dataset
30%     100%           98.82%     98.75%         98.90%
2%      98.96%         98.17%     98.08%         98.25%
1%      99.79%         99.09%     98.99%         99.18%
0%      99.79%         98.84%     98.70%         98.98%

(c) CLL_SUB_111 dataset
30%     72.97%         62.23%     61.14%         63.32%
2%      72.97%         65.15%     64.54%         65.76%
1%      75.67%         70.79%     70.15%         71.42%
0%      70.27%         67.14%     66.61%         67.67%

From left: the revise fraction in percentage; the highest accuracy reached during the simulation expressed in percentage; the accuracy mean during the simulation; and the confidence interval bounds. Note that these last three parameters are computed after the first 10 epochs to avoid noise
is sometimes negligible. On the other hand, as in the CLL_SUB_111 dataset, the performances can be even higher than the ones obtained with the benchmark value. This confirms the hypothesis made in Sect. 5.1 on the goodness of using a randomly sparse ANN topology.
5.2 Execution time investigation
This section shows the comparative analysis conducted
among the datasets used, in terms of execution time, over
replicated simulations. Ten runs have been averaged, using the default value ζ = 0.3 as benchmark (i.e., ζ_default). Note that only the most significant and competitive ζ value has been considered (i.e., ζ_0 = 0). Figure 3 shows the execution time (in seconds) of the same averaged simulations computed on the three datasets.
In both the Lung and CLL_SUB_111 datasets, ζ = 0 is faster than the benchmark value. In particular, in CLL_SUB_111, the execution time is almost 40% lower than with the default value, with higher accuracy performances too, as previously asserted in Sect. 5.1. The setting is less competitive in COIL20, for the same reason that emerged in the accuracy analysis: the goodness of the dataset is such that the improvements obtained by varying the revise parameter are insignificant. Furthermore, the execution time gain between ζ = 0 and ζ_default has been computed on each dataset over ten runs as follows:
$$\mathrm{Gain} = 1 - \frac{\zeta_{0}}{\zeta_{\mathrm{default}}}, \qquad (2)$$

where ζ_0 and ζ_default here denote the execution times measured with ζ = 0 and with the benchmark ζ = 0.3, respectively.
The execution time gain was equal to 0.1370 in Lung, 0.0052 in COIL20, and 0.3664 in CLL_SUB_111. This means that, except for COIL20, there is a clear improvement in algorithm performance: the algorithm becomes faster when using a lower revise fraction. This is even more evident in CLL_SUB_111, as already noticed from Fig. 3. On the other hand, the difference observed in COIL20 is almost negligible; thus, it may be concluded that, for specific types of datasets, there is neither gain nor loss in choosing a lower ζ.
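In code, Eq. (2) amounts to the one-liner below; the timings used in the example are placeholders chosen only to mirror the CLL_SUB_111 gain, not measured values.

```python
def execution_time_gain(t_low_zeta, t_default):
    """Eq. (2): relative speed-up of the low-revise-fraction run (e.g. zeta=0)
    over the benchmark run (zeta=0.3), both averaged over repeated runs."""
    return 1.0 - t_low_zeta / t_default

# Hypothetical illustration: a 100 s benchmark run reduced to 63.4 s
# corresponds to a gain of about 0.366, in line with CLL_SUB_111.
print(round(execution_time_gain(63.4, 100.0), 4))   # 0.366
```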
These results confirm the previous hypothesis that fine-tuning ζ is unnecessary, also because, on particular datasets (e.g. COIL20), an in-depth analysis of ζ is profitless. Thus, a relatively low revise fraction has been demonstrated to be good practice in most cases.
5.3 Considerations on the tuning process
In Sects. 5.1 and 5.2, we have described the effects of ζ in terms of accuracy loss and execution time, respectively. This section provides a brief summary of what emerged from those experiments. As largely discussed in the literature, it is unrealistic to try to find, a priori, a perfect value that works well in all possible deep learning scenarios. The same consideration applies to the tuning of the revise fraction. For this reason, our tests are not aimed at finding the optimal value, which depends on too many variables. Instead, it may be asserted that, in the experiments herein conducted, a relatively low ζ is always a good choice: in the datasets analysed, the best results have been obtained with 0 ≤ ζ ≤ 0.02. It is also important to highlight that, because of the high non-linearity of the problem itself, more than one ζ value could effectively work, and fine-tuning ζ may require more time than the training process itself. This is why this study aims to provide a good enough range of possible ζ values. The tests have been conducted on very different datasets to assert that, empirically speaking, 0 ≤ ζ ≤ 0.02 is sufficient to offer high accuracy with low fluctuations and, at the same time, a faster execution time.

Fig. 3 Execution time over 10 runs. From left to right, the Lung, COIL20 and CLL datasets are shown
6 Conclusions
In this paper, we moved a step forward from our earlier work (Mocanu et al. 2018). Not only did our experiments confirm
the efficiency arising from training sparse neural networks,
but they also managed to further exploit sparsity through a
better tuned algorithm, featuring increased speed at a negli-
gible accuracy loss.
The goodness of the revised fraction is independent of the application domain; thus, a relatively low ζ is always a
good practice. Of course, according to the specific scenario
considered, the performance may be higher than (or at least
equal to) the benchmark value. Yet, it is evident that net-
work science algorithms, by keeping sparsity in ANNs, are a
promising direction for accelerating their training processes.
On the one hand, acting on the revise parameter ζ positively affects both accuracy and execution time performance. On the other hand, it is unrealistic to try to define, a priori, an optimal ζ value without considering the specific application domain, because of the high non-linearity of the problem.
However, through this analysis it is possible to assert that a
relatively low ζ is generally sufficient to balance both accuracy loss and execution time. Another strategy could be to
sample the dataset in order to manage a smaller amount of data, and to conduct the tests on ζ only on that portion of the data.
This study paves the way for other works, such as
the implementation of dynamic exit conditions to further speed up the algorithm itself, the development of adaptive
algorithms that dynamically tune the parameters, and the
study of different distributions for the initial weight assign-
ments.
Acknowledgements We thank Dr Decebal Costantin Mocanu for pro-
viding constructive feedback.
Compliance with ethical standards
Conflict of interest All authors declare that they have no conflict of
interest.
Human and animal rights This article does not contain any studies with
human participants or animals performed by any of the authors.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing, adap-
tation, distribution and reproduction in any medium or format, as
long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons licence, and indi-
cate if changes were made. The images or other third party material
in this article are included in the article’s Creative Commons licence,
unless indicated otherwise in a credit line to the material. If material
is not included in the article’s Creative Commons licence and your
intended use is not permitted by statutory regulation or exceeds the
permitted use, you will need to obtain permission directly from the copy-
right holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
Barabási A-L, Pósfai M (2016) Network science. Cambridge University
Press, Cambridge UK
Bellec G, Kappel D, Maass W, Legenstein R (2018) Deep
rewiring: training very sparse deep networks. arXiv preprint
arXiv:1711.05136
Berman DS, Buczak AL, Chavis JS, Corbett CL (2019) A survey of deep
learning methods for cyber security. Information 4:122. https://doi.
org/10.3390/info10040122
Bourely A, Boueri JP, Choromonski K (2017) Sparse neural networks
topologies. arXiv preprint arXiv:1706.05683
Cai D, He X, Han J, Huang TS (2011) Graph regularized non-negative
matrix factorization for data representation. PAMI 33(8):1548–
1560
Cai D, He X, Han J (2011) Speed up kernel discriminant analysis. VLDB
J 20:21–33. https://doi.org/10.1007/s00778-010-0189-3
Cao C, Liu F, Tan H, Song D, Shu W, Li W, Zhou Y, Bo X, Xie Z
(2018) Deep learning and its applications in biomedicine. Genom
Proteomics Bioinform 16(1):17–32. https://doi.org/10.1016/j.gpb.
2017.07.003
Cavallaro L, Bagdasar O, De Meo P, Fiumara G, Liotta A (2020) Artifi-
cial neural networks training acceleration through network science
strategies. In: Sergeyev YD, Kvasov DE (eds) Numerical compu-
tations: theory and algorithms, NUMTA 2019. Lecture Notes in
Computer Science, Springer, Cham 11974:330–336. https://doi.org/10.1007/978-3-030-40616-5_27
Chen H, Engkvist O, Wang Y, Olivecrona M, Blaschke T (2018)
The rise of deep learning in drug discovery. Drug Discov Today
23(6):1241–1250. https://doi.org/10.1016/j.drudis.2018.01.039
Dong Y, Li D (2011) Deep learning and its applications to signal and
information processing [exploratory DSP]. IEEE Signal Process
Mag 1:145. https://doi.org/10.1109/MSP.2010.939038
Erdős P, Rényi A (1959) On random graphs I. Publ Math-Debr 6:290–
297
Frankle J, Carbin M (2018) The lottery ticket hypothesis: finding sparse, trainable neural networks. arXiv preprint
arXiv:1803.03635
Gale T, Elsen E, Hooker S (2019) The state of sparsity in deep neural
networks. arXiv preprint arXiv:1902.09574
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press,
Cambridge US
Haslinger C, Schweifer N, Stilgenbauer S, Döhner H, Lichter P, Kraut
N, Stratowa C, Abseher R (2004) Microarray gene expression pro-
filing of B-cell chronic lymphocytic leukemia subgroups defined
by genomic aberrations and VH mutation status. J Clin Oncol
22(19):3937–49. https://doi.org/10.1200/JCO.2004.12.133
Hestness J, Narang S, Ardalani N, Diamos GF, Jun H, Kianinejad H,
Patwary MMA, Yang Y, Zhou Y (2017) Deep learning scaling is
predictable, empirically. arXiv preprint arXiv:1712.00409
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image
recognition. In: Proceedings of the IEEE conference on computer
vision and pattern recognition, CVPR 2016, Las Vegas USA, pp
770–778
Hilgetag CC, Goulas A (2016) Is the brain really a small-world network?
Brain Struct Funct 221(4):2361–2366
Hinton G, Deng L, Yu D, Dahl GE, Mohamed A, Jaitly N, Senior A,
Vanhoucke V, Nguyen P, Sainath TN, Kingsbury B (2012) Deep
neural networks for acoustic modeling in speech recognition. IEEE
Signal Process Mag 29:82–97
Kalchbrenner N, Elsen E, Simonyan K, Noury S, Casagrande N, Lock-
hart E, Stimberg F, van den Oord A, Dieleman S, Kavukcuoglu K
(2018) Efficient neural audio synthesis. In: Proceedings of the
international conference on machine learning, ICML 2018, Stock-
holm, pp 2415–2424
Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classifica-
tion with deep convolutional neural networks. Commun ACM
60(6):84–90. https://doi.org/10.1145/3065386
Latora V, Nicosia V, Russo G (2017) Complex networks: principles,
methods and applications. Cambridge University Press, Cam-
bridge UK
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–
444. https://doi.org/10.1038/nature14539
Liu S, Mocanu DC, Matavalam ARR, Pei Y, Pechenizkiy M (2019)
Sparse evolutionary deep learning with over one million artificial neurons on commodity hardware. arXiv preprint arXiv:1901.09181
Louizos C, Welling M, Kingma DP (2017) Learning sparse
neural networks through L0 regularization. arXiv preprint
arXiv:1712.01312
Mocanu DC, Mocanu E, Stone P, Nguyen PH, Gibescu M, Liotta A
(2018) Scalable training of artificial neural networks with adaptive
sparse connectivity inspired by network science. Nat Commun
9:2383. https://doi.org/10.1038/s41467-018-04316-3
Ruano-Ordás D, Yevseyeva I, Fernandes VB, Méndez JR, Emmerich
MTM (2019) Improving the drug discovery process by using mul-
tiple classifier systems. Expert Syst Appl 121:292–303. https://
doi.org/10.1016/j.eswa.2018.12.032
Srinivas S, Subramanya A, Babu RV (2017) Training sparse neural
networks. In: Proceedings of the IEEE conference on computer
vision and pattern recognition workshops, Honolulu, pp 455–462.
https://doi.org/10.1109/CVPRW.2017.61
Stier J, Granitzer M (2019) Structural analysis of sparse neural net-
works. Procedia Comput Sci 159:107–116
Ullrich K, Meeds E, Welling M (2017) Soft weight-sharing for neural
network compression. arXiv preprint arXiv:1702.04008
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN,
Kaiser L, Polosukhin I (2017) Attention is all you need. In:
Proceedings of the annual conference on neural information pro-
cessing systems, Long Beach, USA, pp 6000–6010
Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’
networks. Nature 393:440–442
Publisher’s Note Springer Nature remains neutral with regard to juris-
dictional claims in published maps and institutional affiliations.
... Some works in the literature address this problem and aim to simplify the training phase, for instance by using Genetic Algorithms to select the model topology [5][6][7]. On the other hand, other works aim at accelerating the training by using sparse Artificial Neural Network models: the first work in this direction is by Mocanu et al. [8], who proposes an evolutionary algorithm that takes inspiration from biological neural networks, but others followed [9,10]. ...
... Supervised learning involves observing several samples of a given dataset, which will be divided into training and test samples. While the former is used to train the neural network, the latter works as a litmus test, as it is compared with the ANN predictions [1,9,18]. Indeed, thanks to the training set, the supervised learning models are able to teach models to obtain the desired output to the ANN. ...
... The size of each image is 32 × 32 pixels, with 256 grey levels per pixel. Thus, each one is represented by a 1024-dimensional vector [9]. ...
Chapter
Training Artificial Neural Networks (ANNs) is a non-trivial task. In the last years, there has been a growing interest in the academic community in understanding how those structures work and what strategies can be adopted to improve the efficiency of the trained models. Thus, the novel approach proposed in this paper is the inclusion of the entropy metric to analyse the training process. Herein, indeed, an investigation on the accuracy computation process in relation to the entropy of the intra-layers’ weights of multilayer perceptron (MLP) networks is proposed. From the analysis conducted on two well-known datasets with several configurations of the ANNs, we discovered that there is a connection between those two metrics (i.e., accuracy and entropy). These promising results can be helpful in defining, in the future, new criteria to evaluate the training process goodness in real-time by optimising it and allow faster detection of its trend.
... The family of algorithms related mostly to our work are those based on topology sparsification [15]. Methods have been proposed which prune connections between the neurons; for instance, the works [13], [24] and [6,7,21]. However, these linkage sparsification techniques do not aim at mimicking the topological structure of real neural networks, but are mainly based on eliminating close-to-zero weighted connections. ...
... The competitors include the SET algorithm [22], and also the fully-connected (FC) neural network without any pruning. Moreover, we implemented a recent algorithm inspired by the concept, proposed in [6] and a set of variations implementing the logic of the algorithm reported in [35]. We call the former algorithm as the ζparameterized -(SF2SFrand), which is a variant of SF2SFrand and it differs in the values of ζ parameter, which is responsible for the number of the close-to-zero weight links, being erased in every epoch. ...
Article
Full-text available
Multilayer neural architectures with a complete bipartite topology have very high training time and memory requirements. Solid evidence suggests that not every connection contributes to the performance; thus, network sparsification has emerged. We get inspiration from the topology of real biological neural networks which are scale-free. We depart from the usual complete bipartite topology among layers, and instead we start from structured sparse topologies known in network science, e.g., scale-free and end up again in a structured sparse topology, e.g., scale-free. Moreover, we apply smart link rewiring methods to construct these sparse topologies. Thus, the number of trainable parameters is reduced, with a direct impact on lowering training time and a direct beneficial result in reducing memory requirements. We design several variants of our concept (SF2SFrand, SF2SFba, SF2SF5, SF2SW, and SW2SW, respectively) by considering the neural network topology as a Scale-Free or Small-World one in every case. We conduct experiments by cutting and stipulating the replacing method of the 30% of the linkages on the network in every epoch. Our winning method, namely the one starting from a scale-free topology and producing a scale-free-like topology (SF2SFrand) can reduce training time without sacrificing neural network accuracy and also cutting memory requirements for the storage of the neural network.
... To ensure the success of the training, different parts of the data must be examined. The training efficiency of neural network models is dependent on the data and the learning algorithm adopted [47,48]. Advanced learning algorithms effectively transfer intelligence to the ANN from historical facts or data. ...
Article
Full-text available
Flooding is a hazardous natural calamity that causes significant damage to lives and infrastructure in the real world. Therefore, timely and accurate decision-making is essential for mitigating flood-related damages. The traditional flood prediction techniques often encounter challenges in accuracy, timeliness, complexity in handling dynamic flood patterns and leading to substandard flood management strategies. To address these challenges, there is a need for advanced machine learning models that can effectively analyze Internet of Things (IoT)-generated flood data and provide timely and accurate flood predictions. This paper proposes a novel approach-the Adaptive Momentum and Backpropagation (AM-BP) algorithm-for flood prediction and management in IoT networks. The AM-BP model combines the advantages of an adaptive momentum technique with the backpropagation algorithm to enhance flood prediction accuracy and efficiency. Real-world flood data is used for validation, demonstrating the superior performance of the AM-BP algorithm compared to traditional methods. In addition, multilayer high-end computing architecture (MLCA) is used to handle weather data such as rainfall, river water level, soil moisture, etc. The AM-BP’s real-time abilities enable proactive flood management, facilitating timely responses and effective disaster mitigation. Furthermore, the AM-BP algorithm can analyze large and complex datasets, integrating environmental and climatic factors for more accurate flood prediction. The evaluation result shows that the AM-BP algorithm outperforms traditional approaches with an accuracy rate of 96%, 96.4% F1-Measure, 97% Precision, and 95.9% Recall. The proposed AM-BP model presents a promising solution for flood prediction and management in IoT networks, contributing to more resilient and efficient flood control strategies, and ensuring the safety and well-being of communities at risk of flooding
... The confusion matrix compares the actual values with the predicted values predicted by CNN model. It represents the values for TP, TN, FP and FN [37]. ...
Article
Full-text available
Biomedical imaging is a rapidly evolving field that covers different types of imaging techniques which are used for diagnostic and therapeutic purposes. It plays a vital role in diagnosis and treating health conditions of human body. Classification of different imaging modalities plays a vital role in terms of providing better care and treatment options to the patients. Advancements in technology open up the new doors for medical professionals and this involves deep learning methods for automatic image classification. Convolutional neural network (CNN) is a special class of deep learning that is applied to visual imagery. In this paper, a novel spatial feature fusion based deep CNN is proposed for classification of microscopic peripheral blood cell images. In this work, multiple transfer learning features are extracted through four pre-trained CNN architectures namely VGG19, ResNet50, MobileNetV2 and DenseNet169. These features are fused into a generalized feature space that increases the classification accuracy. The dataset considered for the experiment contains 17902 microscopic images that are categorized into 8 distinct classes. The result shows that the proposed CNN model with fusion of multiple transfer learning features outperforms the individual pre-trained CNN model. The proposed model achieved 96.10% accuracy, 96.55% F1-score, 96.40% Precision and 96.70% Recall values.
... YSA'nın diğer bir temel öğesi olan Aktivasyon Fonksiyonu bir değişkeni farklı bir boyuta taşıyan doğrusal veya doğrusal olmayan bir fonksiyondur. YSA'da Aktivasyon Fonksiyonu yapay sinir hücresi girdi verileri üzerinde işlem yaparak buna karşılık gelen net çıktı sonuçları elde eder [13][14][15][16][17][18][19][20]. Algoritma 2. YSA Temel Adımları [21] i. ...
Article
Full-text available
Büyük veri azaltma sürecinde karşılaşılan başlıca zorluk, veri setinin homojenliğinin ve problem uzayını temsil yeteneğinin korunmasıdır. Bu durum, büyük veri setleri üzerinde yapılan modelleme çalışmalarında hesaplama karmaşıklığının yeterince azaltılamamasına, geliştirilen modelin orijinal veri setine dayalı olarak geliştirilen modele kıyasla kararlılık ve doğruluk performansının önemli ölçüde azalmasına neden olmaktadır. Bu makale çalışmasının amacı, büyük veri setleri için kararlı ve etkili bir şekilde çalışan veri azaltma algoritması geliştirmektir. Bu amaçla, yapay sinir ağları (YSA) tabanlı problem modelleme modülü ve K-ortalamalar tabanlı veri azaltma modülünden oluşan melez bir algoritma geliştirilmiştir. Problem modelleme modülü, büyük veri seti için performans eşik değerlerini tanımlamayı sağlamaktadır. Bu sayede, orijinal veri setinin ve veri azaltma işlemi uygulanmış veri setlerinin problem uzayını temsil yetenekleri ve kararlılıkları analiz edilmektedir. K-ortalamalar modülünün görevi ise, veri uzayını K-adet kümede gruplamayı ve bu grupların her biri için küme merkezini referans alarak kademeli olarak veri (gözlem) azaltma işlemini gerçekleştirmektir. Böylelikle, K-ortalamalar modülü ile veri azaltma işlemi uygulanırken, azaltılmış veri setlerinin performansı ise YSA modülü ile test edilmekte ve performans eşik değerlerini karşılama durumu analiz edilmektedir. Geliştirilen melez veri azaltma algoritmasının performansını test etmek ve doğrulamak amacıyla UCI Machine Learning uluslararası veri havuzunda yer alan üç farklı veri seti kullanılmıştır. Deneysel çalışma sonuçları istatistiksel olarak analiz edilmiştir. Analiz sonuçlarına göre büyük veri setlerinde kararlılık ve performans kaybı yaşanmadan %30-%40 oranları arasında veri azaltma işlemi başarılı bir şekilde gerçekleştirilmiştir.
... Artificial Neural networks (ANNs) are types of ML models that are characterized by a complex algorithmic process for prediction, which could in turn lead to more accurate forecasts. This algorithmic complexity leads to significantly high algorithmic training times, especially when the number of features increase [4]. As a result, ANNs lack the potential to be dynamically trained on real-time data, and thus provide dynamic forecasts. ...
Article
Full-text available
The paper develops a goal programming-based multi-criteria methodology, for assessing different machine learning (ML) regression models under accuracy and time efficiency criteria. The developed methodology provides users with high flexibility in assessing the models as it allows for a fast and computationally efficient sensitivity analysis of accuracy and time significance weights as well as accuracy and time significance threshold values. Four regression models were assessed, namely the decision tree, random forest, support vector and the neural network. The developed methodology was employed to forecast the time to failures of NASA Turbofans. The results reveal that decision tree regression (DTR) seems to be preferred for low values of accuracy weights (up to 30%) and low accuracy and time efficiency threshold values. As the accuracy weights tend to increase and for higher accuracy and time efficiency threshold values, random forest regression (RFR) seems to be the best choice. The preference for the RFR model however, seems to change towards the adoption of the neural network for accuracy weights equal to and higher than 90%.
Article
Full-text available
Goal programming (GP) is an important optimization technique for handling multiple, and often conflicting, objectives in decision making. This paper undertakes an extensive literature review to synthesize key findings on the diverse real-world applications of GP across domains, its implementation challenges, and emerging directions. The introduction sets the context and objectives of the review. This is followed by an in-depth review of literature analyzing GP applications in areas as varied as agriculture, healthcare, education, energy management, supply chain planning, and macroeconomic policy modeling. The materials and methods provide an overview of the systematic literature review methodology. Key results are presented in terms of major application areas of GP. The discussion highlights the versatility and practical utility of GP, while also identifying limitations. The conclusion outlines promising avenues for enhancing GP modeling approaches to strengthen multi-criteria decision support.
Article
Full-text available
Artificial neural networks (ANNs) have emerged as hot topics in the research community. Despite the success of ANNs, it is challenging to train and deploy modern ANNs on commodity hardware due to the ever-increasing model size and the unprecedented growth in the data volumes. Particularly for microarray data, the very high dimensionality and the small number of samples make it difficult for machine learning techniques to handle. Furthermore, specialized hardware such as graphics processing unit (GPU) is expensive. Sparse neural networks are the leading approaches to address these challenges. However, off-the-shelf sparsity-inducing techniques either operate from a pretrained model or enforce the sparse structure via binary masks. The training efficiency of sparse neural networks cannot be obtained practically. In this paper, we introduce a technique allowing us to train truly sparse neural networks with fixed parameter count throughout training. Our experimental results demonstrate that our method can be applied directly to handle high-dimensional data, while achieving higher accuracy than the traditional two-phase approaches. Moreover, we have been able to create truly sparse multilayer perceptron models with over one million neurons and to train them on a typical laptop without GPU (https://github.com/dcmocanu/sparse-evolutionary-artificial-neural-networks/tree/master/SET-MLP-Sparse-Python-Data-Structures), this being way beyond what is possible with any state-of-the-art technique.
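A minimal way to picture "truly sparse" layers, as opposed to dense layers hidden behind binary masks, is to store the weights in a sparse matrix so that only the existing connections are ever touched. The sketch below is an assumption-laden illustration (layer sizes, density and initialisation are arbitrary), not the cited implementation.

# Minimal sketch of a "truly sparse" layer: weights live in a sparse matrix
# with a fixed number of non-zeros, not in a dense matrix behind a mask.
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n_in, n_out, density = 20000, 1000, 0.01           # ~200k stored weights instead of 20M

# Erdos-Renyi style sparse initialisation: only the stored entries exist at all.
W = sparse.random(n_out, n_in, density=density, random_state=0,
                  data_rvs=lambda k: rng.normal(0.0, 0.1, k)).tocsr()
b = np.zeros(n_out)

def forward(X, W, b):
    """Forward pass of one sparse layer; the product touches only stored weights."""
    Z = np.asarray(W @ X.T).T + b                   # shape (batch, n_out)
    return np.maximum(0.0, Z)                       # ReLU activation

X = rng.random((32, n_in))                          # one synthetic mini-batch
print(forward(X, W, b).shape, "stored weights:", W.nnz)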
Article
Full-text available
Sparse Neural Networks regained attention due to their potential of mathematical and computational advantages. We give motivation to study Artificial Neural Networks (ANNs) from a network science perspective, provide a technique to embed arbitrary Directed Acyclic Graphs into ANNs and report study results on predicting the performance of image classifiers based on the structural properties of the networks’ underlying graph. Results could further progress neuroevolution and add explanations for the success of distinct architectures from a structural perspective.
Article
Full-text available
This survey paper describes a literature review of deep learning (DL) methods for cyber security applications. A short tutorial-style description of each DL method is provided, including deep autoencoders, restricted Boltzmann machines, recurrent neural networks, generative adversarial networks, and several others. Then we discuss how each of the DL methods is used for security applications. We cover a broad array of attack types including malware, spam, insider threats, network intrusions, false data injection, and malicious domain names used by botnets.
Article
Full-text available
Through the success of deep learning in various domains, artificial neural networks are currently among the most used artificial intelligence methods. Taking inspiration from the network properties of biological neural networks (e.g. sparsity, scale-freeness), we argue that (contrary to general practice) artificial neural networks, too, should not have fully-connected layers. Here we propose sparse evolutionary training of artificial neural networks, an algorithm which evolves an initial sparse topology (Erdős–Rényi random graph) of two consecutive layers of neurons into a scale-free topology, during learning. Our method replaces artificial neural networks fully-connected layers with sparse ones before training, reducing quadratically the number of parameters, with no decrease in accuracy. We demonstrate our claims on restricted Boltzmann machines, multi-layer perceptrons, and convolutional neural networks for unsupervised and supervised learning on 15 datasets. Our approach has the potential to enable artificial neural networks to scale up beyond what is currently possible.
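The prune-and-regrow cycle at the heart of this approach can be sketched in a few lines of Python: after each training epoch, a fraction ζ of the weights closest to zero is removed and the same number of connections is regrown at random. The layer sizes are arbitrary, ζ = 0.3 matches the benchmark value used in this paper, and the weight-update step itself is omitted; this is a schematic sketch, not the reference SET code.

# Schematic sketch of the sparse-evolutionary-training idea: start from a
# sparse random topology, then after each epoch prune a fraction zeta of the
# smallest-magnitude weights and regrow as many connections at random.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, n_links = 784, 300, 4000               # sparse instead of 784*300 dense

# Erdos-Renyi style initialisation: keep only a random subset of connections.
rows = rng.integers(0, n_in, n_links)
cols = rng.integers(0, n_out, n_links)
weights = rng.normal(0.0, 0.1, n_links)

def evolve_topology(rows, cols, weights, zeta=0.3):
    """Prune the zeta fraction of weights closest to zero, regrow at random."""
    n_remove = int(zeta * len(weights))
    keep = np.argsort(np.abs(weights))[n_remove:]    # surviving connections
    rows, cols, weights = rows[keep], cols[keep], weights[keep]
    new_rows = rng.integers(0, n_in, n_remove)       # regrown connections
    new_cols = rng.integers(0, n_out, n_remove)
    new_w = rng.normal(0.0, 0.1, n_remove)
    return (np.concatenate([rows, new_rows]),
            np.concatenate([cols, new_cols]),
            np.concatenate([weights, new_w]))

# One evolution step after a (not shown) weight-update epoch.
rows, cols, weights = evolve_topology(rows, cols, weights, zeta=0.3)
print(len(weights), "connections after the prune-and-regrow step")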
Article
Full-text available
Advances in biological and medical technologies have been providing us with explosive volumes of biological and physiological data, such as medical images, electroencephalography, genomic and protein sequences. Learning from these data facilitates the understanding of human health and disease. Developed from artificial neural networks, deep learning-based algorithms show great promise in extracting features and learning patterns from complex data. The aim of this paper is to provide an overview of deep learning techniques and some of the state-of-the-art applications in the biomedical field. We first introduce the development of artificial neural networks and deep learning. We then describe two main components of deep learning, i.e., deep learning architectures and model optimization. Subsequently, some examples are demonstrated for deep learning applications, including medical image classification, genomic sequence analysis, as well as protein structure classification and prediction. Finally, we offer our perspectives for the future directions in the field of deep learning.
Article
Full-text available
Sequential models achieve state-of-the-art results in audio, visual and textual domains with respect to both estimating the data distribution and generating high-quality samples. Efficient sampling for this class of models has however remained an elusive problem. With a focus on text-to-speech synthesis, we describe a set of general techniques for reducing sampling time while maintaining high output quality. We first describe a single-layer recurrent neural network, the WaveRNN, with a dual softmax layer that matches the quality of the state-of-the-art WaveNet model. The compact form of the network makes it possible to generate 24kHz 16-bit audio 4x faster than real time on a GPU. Second, we apply a weight pruning technique to reduce the number of weights in the WaveRNN. We find that, for a constant number of parameters, large sparse networks perform better than small dense networks and this relationship holds for sparsity levels beyond 96%. The small number of weights in a Sparse WaveRNN makes it possible to sample high-fidelity audio on a mobile CPU in real time. Finally, we propose a new generation scheme based on subscaling that folds a long sequence into a batch of shorter sequences and allows one to generate multiple samples at once. The Subscale WaveRNN produces 16 samples per step without loss of quality and offers an orthogonal method for increasing sampling efficiency.
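A generic magnitude-pruning step of the kind used to sparsify such networks simply zeroes out the smallest-magnitude weights until a target sparsity is reached; the sketch below uses an arbitrary tensor shape and is not the exact pruning schedule of the paper.

# Generic magnitude pruning: zero out the smallest-magnitude weights until a
# target sparsity is reached. Shape and target level are illustrative.
import numpy as np

def magnitude_prune(weights, sparsity):
    """Return a pruned copy and the binary mask of surviving weights."""
    k = int(sparsity * weights.size)                        # number of weights to drop
    threshold = np.partition(np.abs(weights), k, axis=None)[k]
    mask = (np.abs(weights) >= threshold).astype(weights.dtype)
    return weights * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512))
W_sparse, mask = magnitude_prune(W, sparsity=0.96)          # 96% zeros, in the range explored above
print("surviving weights:", int(mask.sum()), "of", W.size)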
Article
Full-text available
Over the past decade, deep learning has achieved remarkable success in various artificial intelligence research areas. Evolved from the previous research on artificial neural networks, this technology has shown superior performance to other machine learning algorithms in areas such as image and voice recognition, natural language processing, among others. The first wave of applications of deep learning in pharmaceutical research has emerged in recent years, and its utility has gone beyond bioactivity predictions and has shown promise in addressing diverse problems in drug discovery. Examples will be discussed covering bioactivity prediction, de novo molecular design, synthesis prediction and biological image analysis.
Chapter
Deep Learning opened artificial intelligence to an unprecedented number of new applications. A critical success factor is the ability to train deeper neural networks, striving for stable and accurate models. This translates into Artificial Neural Networks (ANN) that become unmanageable as the number of features increases. The novelty of our approach is to employ Network Science strategies to tackle the complexity of the actual ANNs at each epoch of the training process. The work presented herein originates in our earlier publications, where we explored the acceleration effects obtained by enforcing, in turn, scale freeness, small worldness, and sparsity during the ANN training process. The efficiency of our approach has also been recently confirmed by independent researchers, who managed to train a million-node ANN on non-specialized laptops. Encouraged by these results, we have now taken a closer look at some tunable parameters of our previous approach, to pursue a further acceleration effect. We now investigate the revise fraction parameter, to verify whether its double-check role is actually necessary. Our method is independent of specific machine learning algorithms or datasets, since it operates merely on the topology of the ANNs. We demonstrate that the revise phase can be avoided, halving the overall execution time with an almost negligible loss of quality.
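The role of the revise fraction parameter can be summarised by the following skeleton, where ζ gates the per-epoch rewiring step and ζ = 0 skips the revise phase entirely; the training and rewiring functions are placeholders, not the actual implementation.

# Skeleton showing how the revise fraction zeta gates the per-epoch rewiring:
# with zeta = 0 the revise phase is skipped, saving its cost at every epoch.
def train_one_epoch(model, data):
    ...   # placeholder for the standard weight-update pass

def revise_topology(model, zeta):
    ...   # placeholder: prune a fraction zeta of near-zero weights, regrow as many

def train(model, data, epochs=150, zeta=0.3):
    for epoch in range(epochs):
        train_one_epoch(model, data)
        if zeta > 0.0:                # zeta = 0 disables the revise phase entirely
            revise_topology(model, zeta)
    return model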
Article
Machine learning methods have become an indispensable tool for utilizing large knowledge and data repositories in science and technology. In the pharmaceutical domain, the amount of acquired knowledge about the design and synthesis of pharmaceutical agents and bioactive molecules (drugs) is enormous. The primary challenge in automatically discovering new drugs from molecular screening information is the high dimensionality of the datasets, where a wide range of features is included for each candidate drug. Thus, improved techniques to ensure an adequate manipulation and interpretation of the data become mandatory. To mitigate this problem, our tool (called D2-MCS) splits the dataset homogeneously into several groups (subsets of features) and subsequently determines the most suitable classifier for each group. Finally, the tool determines the biological activity of each molecule through a voting scheme. D2-MCS was tested on a standardized, high-quality dataset gathered from ChEMBL and outperformed well-known single classification models.
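The split-then-vote idea can be illustrated with a minimal scikit-learn sketch in which each feature subset gets its own classifier and the final label is a majority vote; the grouping, base classifiers and synthetic data are assumptions made only for illustration, not the D2-MCS implementation.

# Minimal split-by-feature-groups-and-vote sketch. Grouping, classifiers and
# data are illustrative assumptions, not the D2-MCS implementation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=12, random_state=0)
groups = [list(range(0, 4)), list(range(4, 8)), list(range(8, 12))]
base = [LogisticRegression(max_iter=1000),
        DecisionTreeClassifier(random_state=0),
        LogisticRegression(max_iter=1000)]

# Fit one classifier per feature group.
models = [clf.fit(X[:, g], y) for clf, g in zip(base, groups)]

def vote(X_new):
    """Majority vote over the per-group classifiers (binary labels assumed)."""
    votes = np.stack([m.predict(X_new[:, g]) for m, g in zip(models, groups)])
    return (votes.mean(axis=0) >= 0.5).astype(int)

print("training accuracy of the voted ensemble:", (vote(X) == y).mean())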
Article
Recent work on neural network pruning indicates that, at training time, neural networks need to be significantly larger in size than is necessary to represent the eventual functions that they learn. This paper articulates a new hypothesis to explain this phenomenon. This conjecture, which we term the "lottery ticket hypothesis," proposes that successful training depends on lucky random initialization of a smaller subcomponent of the network. Larger networks have more of these "lottery tickets," meaning they are more likely to luck out with a subcomponent initialized in a configuration amenable to successful optimization. This paper conducts a series of experiments with XOR and MNIST that support the lottery ticket hypothesis. In particular, we identify these fortuitously-initialized subcomponents by pruning low-magnitude weights from trained networks. We then demonstrate that these subcomponents can be successfully retrained in isolation so long as the subnetworks are given the same initializations as they had at the beginning of the training process. Initialized as such, these small networks reliably converge successfully, often faster than the original network at the same level of accuracy. However, when these subcomponents are randomly reinitialized or rearranged, they perform worse than the original network. In other words, large networks that train successfully contain small subnetworks with initializations conducive to optimization. The lottery ticket hypothesis and its connection to pruning are a step toward developing architectures, initializations, and training strategies that make it possible to solve the same problems with much smaller networks.
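The train-prune-reset-retrain procedure behind the lottery ticket hypothesis can be sketched as follows; the training function is a placeholder and the sizes and pruning rate are illustrative assumptions.

# Schematic of the lottery-ticket procedure: train, prune the lowest-magnitude
# weights, reset the survivors to their ORIGINAL initial values, then retrain
# the sparse subnetwork. train() is a placeholder, not real optimisation.
import numpy as np

rng = np.random.default_rng(0)
W_init = rng.normal(0.0, 0.1, size=(256, 128))      # remember the initialisation

def train(W):
    ...   # placeholder for ordinary gradient-based training
    return W + rng.normal(0.0, 0.05, W.shape)        # stand-in for learned weights

W_trained = train(W_init.copy())

# Prune: keep only the largest-magnitude trained weights.
prune_rate = 0.8
threshold = np.quantile(np.abs(W_trained), prune_rate)
mask = np.abs(W_trained) >= threshold

# Reset the winning ticket to its original initial values, then retrain it.
W_ticket = np.where(mask, W_init, 0.0)
W_retrained = train(W_ticket) * mask                 # keep the ticket sparse
print("ticket density:", mask.mean())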