IEEE TRANSACTIONS ON MOLECULAR, BIOLOGICAL, AND MULTI-SCALE COMMUNICATIONS, VOL. -, NO. -, - 2020 1
Machine Learning in Nano-Scale
Biomedical Engineering
Alexandros–Apostolos A. Boulogeorgos, Senior Member, IEEE, Stylianos E. Trevlakis, Student Member, IEEE,
Sotiris A. Tegos, Student Member, IEEE, Vasilis K. Papanikolaou, Student Member, IEEE, and
George K. Karagiannidis, Fellow, IEEE
Abstract—Machine learning (ML) empowers biomedical systems with the capability to optimize their performance by modeling the available data accurately, without relying on strong assumptions about the modeled system. Especially in nano-scale biosystems, where the generated data sets are too vast and complex to parse mentally without computational assistance, ML is instrumental in analyzing data and extracting new insights, accelerating material and structure discoveries, and designing experiments, as well as in supporting nano-scale communications and networks. However, despite these efforts, the use of ML in nano-scale biomedical engineering remains under-explored in certain areas, and research challenges are still open in fields such as structure and material design and simulation, communications and signal processing, and biomedicine applications. In this article, we review the existing research on the use of ML in nano-scale biomedical engineering. In more detail, we first identify and discuss the main challenges that can be formulated as ML problems and classify them into the three aforementioned categories. Next, we discuss the state-of-the-art ML methodologies that are used to address these challenges. For each of the presented methodologies, special emphasis is given to its principles, applications, and limitations. Finally, we conclude the article with discussions that reveal research gaps and highlight possible future research directions.
Index Terms—Biomedical engineering, Machine learning,
Molecular communications, Nano-structure design, Nano-scale
networks.
NOMENCLATURE
2D Two dimensional
3D Three dimensional
ANI Accurate neural network engine for molecular
energies
AL Active Learning
AdaBoost Adaptive Boosting
AEV Atomic Environments Vector
ANN Artificial Neural Network
ANOVA Analysis of Variance
ARES Autonomous Research System
Bagging Bootstrap Aggregating
BER Bit Error Rate
BPN Behler-Parrinello Network
The authors are with the Wireless Communications Systems Group
(WCSG), Department of Electrical and Computer Engineering, Aristotle
University of Thessaloniki, Thessaloniki, 54124 Greece. e-mails: {trevlakis,
geokarag, tegosoti, vpapanikk} @auth.gr, al.boulogeorgos@ieee.org.
Alexandros–Apostolos A. Boulogeorgos is also with the Department of
Digital Systems, University of Piraeus, Piraeus 18534, Greece.
Manuscript received -, 2020; revised -, 2020.
BSS Blind Source Separation
CG Coarse Graining
CGN Coarse Graining Network
CMOS Complementary Metal-Oxide-Semiconductor
CNN Convolution Neural Network
DCF Discrete Convolution Filter
DNN Deep Neural Network
D2NN Diffractive Deep Neural Network
DPN Deep Potential Network
DT Decision Table
DTL Decision Tree Learning
DTNB Decision Table Naive Bayes
DTNN Deep Tensor Neural Network
FS Feature Selection
FSC Feedback System Control
GAN Generative Adversarial Network
GD Gradient Descent
GRNN Generalized Regression Neural Network
ICA Independent Component Analysis
ISI Inter-Symbol Interference
KNN k-Nearest Neighbor
LDA Linear Discriminant Analysis
LR Logistic Regression
LWL Local Weighted Learning
MAN Molecular Absorption Noise
MC Molecular Communications
MIMO Multiple-Input Multiple-Output
ML Machine Learning
MLP Multi-layer Perceptron
ML-SF Machine Learning Scoring Function
MvLR Multivariate linear regression
NBTree Naive Bayes Tree
NN Neural Network
NNP Neural Network Potential
PAMAM Polyamidoamine
PCA Principal Component Analysis
PDF Probability Density Function
PES Potential Energy Surface
PSO Particle Swarm Optimization
QM Quantum Mechanics
QP Quadratic Programming
QPOP Quadratic Phenotype Optimization Platform
QSAR Quantitative Structure-Activity Relationships
ReLU Rectified Linear Unit
RForest Random Forest
RNAi Ribonucleic acid interference
arXiv:2008.02195v1 [cs.LG] 5 Aug 2020
RNN Recurrent Neural Network
SDR Standard Deviation Reduction
SF Scoring Functions
SmF Symmetry Function
SMO Sequential Minimal Optimization
SOTA State Of The Art
SVM Support Vector Machine
TEM Transmission Electron Microscope
THz Terahertz
I. INTRODUCTION
In 1959, Richard P. Feynman articulated: "It would be interesting if you could swallow the surgeon. You put the mechanical surgeon inside the blood vessel and it goes into the heart and looks around... other small machines might be permanently incorporated in the body to assist some inadequately-functioning organ." More than half a century later, this vision is still state-of-the-art (SOTA). Currently, nanotechnology is revisiting conventional therapeutic approaches: more than 100 nano-material-based drugs have already been approved or are under clinical trial [1], while the utilization of nano-scale communication networks for real-time monitoring and precision drug delivery is being discussed [2], [3]. However, these developments come with the need to analyze vast, complex, and relation-rich data sets.
Fortunately, in the last couple of decades, we have witnessed a revolutionary development of new tools from the field of machine learning (ML), which enable the analysis of large data sets through trained models. These models can be used to classify observations or make predictions, and have been applied in several engineering fields, including computer vision, speech and image recognition, and natural language processing. This frontier continues to expand into other scientific domains, such as quantum physics, chemistry, and biology, and is expected to make a significant impact on the design of novel nano-materials and structures and of nano-scale communication systems and networks, while simultaneously enabling new data-driven biomedicine applications [4].
In the field of nano-materials and structure design, experimental and computational simulation methodologies have traditionally been the two fundamental pillars in exploring and discovering the properties of novel constructions, as well as in optimizing their performance [5]. However, these methodologies are constrained by experimental conditions and by the limitations of existing theoretical knowledge. Meanwhile, as the chemical complexity of nano-scale heterogeneous structures increases, the two traditional methodologies become incapable of predicting their properties. In this context, data-driven techniques, like ML, become very attractive. Similarly, in nano-scale communications and signal processing, computational resources are limited, and the major challenge is the development of low-complexity and accurate system models and data detection techniques that do not require channel knowledge and equalization, while taking into account the environmental conditions (e.g., specific enzyme composition). Addressing these challenges calls for the development of novel ML methods [6]. Last but not least, ML can aid in devising novel, more accurate methods for disease detection and therapy development, by enabling genome classification [7] and the selection of the optimum combination of drugs [8].
Motivated by the above, the present contribution provides an interdisciplinary review of the existing research from the areas of nano-engineering, biomedical engineering, and ML. To the best of the authors' knowledge, no review in the technical literature focuses on the ML methodologies that are employed in nano-scale biomedical engineering. In more detail, the contributions of this paper are as follows:
• The main challenges in nano-scale biomedical engineering that can be tackled with ML techniques are identified and classified by discipline into three main categories: structure and material design and simulation, communications and signal processing, and biomedicine applications.
• SOTA ML methodologies used in the field of nano-scale biomedical engineering are reviewed, and their architectures are described. For each of the presented ML methods, we report its principles and building blocks. Moreover, their compelling applications in nano-scale biomedicine systems are surveyed, to help readers refine the motivation for ML in these systems, all the way from analyzing and designing new materials and structures to holistic therapy development.
• Finally, the advantages and limitations of each ML approach are highlighted, and future research directions are provided.
The rest of the paper is organized as follows: Section II identifies the nano-scale biomedical engineering problems that can be solved with ML techniques. Section III presents the most common ML approaches related to the field of nano-scale biomedical engineering. Section IV explains the advantages and limitations of the ML approaches alongside their applications and identifies future research directions. Section V concludes this paper and summarizes its contribution. The structure of this treatise is summarized at a glance in Fig. 1.
II. MACHINE LEARNING CHALLENGES IN NANO-SCALE BIOMEDICAL ENGINEERING
In this section, we report how several of the open challenges in nano-scale biomedical engineering have been, or can be, formulated as ML problems. To provide a better understanding of the nature of these challenges, we classify them into three categories, i.e., i) structure and material design and simulation, ii) communications and signal processing, and iii) biomedicine applications. Following this classification, which is illustrated in Fig. 2, the rest of this section is organized as follows: Section II-A presents the challenges in designing and simulating nano-scale structures, materials, and systems, whereas Section II-B discusses the necessity of employing ML in nano-scale communications. Finally, Section II-C emphasizes the possible applications of ML in biomedicine, such as therapy development, drug delivery, and data analysis.
Sec. I - Introduction
Sec. II - Machine Learning Challenges in Nano-scale Biomedical Engineering
Sec. II-A - Structure and Material Design and Simulation
Sec. II-B - Communications and Signal Processing
Sec. II-C - Biomedicine Applications
Sec. III - Machine Learning Methodologies in Nano-scale Biomedical Engineering
Sec. III-A - Artificial Neural Networks
Sec. III-B - Regression
Sec. III-C - Support Vector Machine
Sec. III-D - k-Nearest Neighbors
Sec. III-E - Dimensionality Reduction
Sec. III-F - Gradient Descent Method
Sec. III-G - Active Learning
Sec. III-H - Bayesian Machine Learning
Sec. III-I - Decision Tree Learning
Sec. III-J - Decision Table
Sec. III-K - Surrogate-Based Optimization
Sec. III-L - Quantitative Structure-Activity Relationships
Sec. III-M - Boltzmann Generator
Sec. III-N - Feedback System Control
Sec. III-O - Quadratic Phenotypic Optimization Platform
Sec. IV - Discussion & The Road Ahead
Sec. V - Conclusion
Fig. 1. The structure of this treatise.
A. Structure and Material Design and Simulation
One of the fundamental challenges in material science and chemistry is understanding the relationship between structure and properties [9]. The complexity of this problem grows dramatically in the case of nanomaterials because: i) they exhibit properties different from those of their bulk components; and ii) they are usually hetero-structures, consisting of multiple materials. As a result, designing and optimizing novel structures and materials, by discovering their properties and behavior through simulations and experiments, leads to multi-parameter and multi-objective problems, which in most cases are extremely difficult or impossible to solve through conventional approaches; ML can be an efficient alternative for tackling this challenge.
Fig. 2. ML challenges in nano-scale biomedical engineering: structure and material design and simulation (experimental planning and autonomous research, inverse design, biological and chemical system simulation); communications and signal processing (channel modeling, signal detection, security, routing and mobility management, event detection); biomedical applications (therapy development, disease detection).
1) Biological and chemical systems simulation: In atomic and molecular systems, there exist complex relationships between the atomistic configuration and the chemical properties, which, in general, cannot be described in explicit form. In these cases, ML aims to associate configurations with properties by acquiring knowledge from experimental data. Specifically, in order to incorporate quantum effects into molecular dynamics simulations, ML can be employed to derive potential energy surfaces (PESs) from quantum mechanical (QM) evaluations [10]–[15]. Another use of ML lies in the simulation of molecular dynamics trajectories. For example, in [16]–[18], the authors formulated ML problems for discovering the optimum reaction coordinates in molecular dynamics, whereas, in [19]–[23], the problem of estimating free energy surfaces was addressed. Furthermore, in [24]–[27], the ML problem of creating Markov state models, which take into account the molecular kinetics, was investigated. Finally, the use of ML for generating samples from equilibrium distributions that describe molecular systems was studied in [28].
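To make the PES-learning idea concrete, the following is a minimal sketch, not the method of [10]–[15]: a one-dimensional potential energy curve is learned with kernel ridge regression from a handful of reference energies. As an assumption, the expensive QM evaluations are replaced by a Morse potential with illustrative, roughly H2-like parameters, and the bond length is the only descriptor; practical PES models use many-atom descriptors and far more reference data.

```python
import numpy as np

def morse_energy(r, d_e=4.7, a=1.9, r_e=0.74):
    """Stand-in for expensive QM evaluations (assumption): Morse potential in eV, r in Angstrom.
    Parameter values are illustrative only."""
    return d_e * (1.0 - np.exp(-a * (r - r_e))) ** 2

def gaussian_kernel(x1, x2, width):
    # Gaussian kernel on the single descriptor (bond length) for all pairs of points.
    diff = x1[:, None] - x2[None, :]
    return np.exp(-diff ** 2 / (2.0 * width ** 2))

def fit_krr(x_train, y_train, width=0.15, reg=1e-8):
    """Kernel ridge regression: solve (K + reg * I) alpha = y for the dual weights."""
    k = gaussian_kernel(x_train, x_train, width)
    return np.linalg.solve(k + reg * np.eye(len(x_train)), y_train)

def predict_krr(x_new, x_train, alpha, width=0.15):
    return gaussian_kernel(x_new, x_train, width) @ alpha

# Reference ("QM") energies on a coarse grid of bond lengths.
x_train = np.linspace(0.5, 2.5, 25)
y_train = morse_energy(x_train)
alpha = fit_krr(x_train, y_train)

# Query the learned surface between the training points.
x_test = np.linspace(0.6, 2.4, 40)
max_err = np.max(np.abs(predict_krr(x_test, x_train, alpha) - morse_energy(x_test)))
print(f"max abs error on test grid: {max_err:.2e} eV")
```

Once fitted, the surrogate can be evaluated at any bond length far more cheaply than the reference method, which is what makes such learned PESs attractive inside molecular dynamics loops.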
2) Inverse design: The availability of several high-resolution lithographic techniques opened the door to devising complex structures with unprecedented properties. However, the vast design space, which arises from the large number of spatial degrees of freedom complemented by the wide choice of materials, makes it extremely difficult or even impossible for conventional inverse design methodologies to ensure the existence or uniqueness of acceptable solutions. To address this challenge, the nanoscience community turned to ML. In more detail, researchers identified three possible methods, which are based on artificial neural networks (ANNs), deep neural networks (DNNs), and generative adversarial networks (GANs). ANNs follow a trial-and-error approach in order to design multilayer nanoparticles [29]. Meanwhile, DNNs are used in metasurface design [30]. Finally, GANs can be used to design nanophotonic structures with precise user-defined spectral responses [31].
3) Experiments planning and autonomous research: ML has been widely employed to efficiently explore the vast parameter space created by different combinations of nano-materials and experimental conditions, and to reduce the number of experiments needed to optimize hetero-structures (see, e.g., [32] and references therein). Towards this direction, fully autonomous research can be conducted, in which experiments are designed based on insights extracted from data processing through ML, without a human in the loop [33].
B. Communications and Signal Processing
In biomedical applications, nano-sensors can be utilized for a variety of tasks, such as monitoring, detection, and treatment. The size of such nano-sensors ranges between 1 and 100 nm, a range that refers to both macro-molecules and bio-cells [34]. The proper selection of size and materials is critical for the system performance, and it is constrained by the target area, the sensors' purpose, and safety concerns. Such nano-networks are inspired by living organisms and, when injected into the human body, they interact with biological processes in order to collect the necessary information [35]. However, they are characterized by limited communication range and processing power, which allow only short-range transmission techniques to be used [36]. As a consequence, conventional electromagnetic-based transmission schemes may not be appropriate for communications among molecules [3], [37], since, in the latter, the information is usually encoded in the number of released particles. The simplest approach for the receiver to demodulate the symbol is to compare the number of received particles with predetermined thresholds. In the absence of inter-symbol interference (ISI), finding the optimal thresholds is a straightforward process. However, in the presence of ISI, the thresholds need to be extracted as the solution of an error probability minimization (or performance maximization) problem [38]–[40]. The aforementioned approaches require knowledge of the channel model. However, in several practical scenarios, where the molecular communications (MC) system complexity is high, this knowledge may not be available. To address this issue, ML methods can be employed to accurately model the channel or perform data sequence detection.
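The threshold-selection problem can be illustrated numerically. The sketch below assumes a hypothetical Poisson channel with one symbol of ISI, in which the received molecule count is Poisson with mean c_cur*b_k + c_isi*b_(k-1) + noise for equiprobable, independent bits, and finds the integer threshold that minimizes the average error probability by exhaustive search. The model and parameter values are illustrative assumptions; practical MC channels may need longer ISI memory, which is exactly where an explicit model becomes hard to obtain.

```python
import math

def poisson_cdf(k, lam):
    """P(N <= k) for N ~ Poisson(lam); returns 0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def error_probability(tau, c_cur, c_isi, noise):
    """Average bit error probability of the rule 'decide 1 if N >= tau'
    under the hypothetical one-tap ISI model described above."""
    p_err = 0.0
    for prev in (0, 1):
        lam0 = c_isi * prev + noise            # mean count when the current bit is 0
        lam1 = c_cur + c_isi * prev + noise    # mean count when the current bit is 1
        p_false_alarm = 1.0 - poisson_cdf(tau - 1, lam0)  # P(N >= tau | bit 0)
        p_miss = poisson_cdf(tau - 1, lam1)                # P(N < tau | bit 1)
        p_err += 0.25 * (p_false_alarm + p_miss)           # each (prev, cur) pair has prob. 1/4
    return p_err

def optimal_threshold(c_cur, c_isi, noise, max_tau=200):
    """Exhaustive search for the integer threshold with minimum error probability."""
    return min(range(1, max_tau + 1),
               key=lambda t: error_probability(t, c_cur, c_isi, noise))

tau_no_isi = optimal_threshold(c_cur=40, c_isi=0, noise=2)
tau_isi = optimal_threshold(c_cur=40, c_isi=15, noise=2)
print(tau_no_isi, tau_isi)
```

With c_isi = 0 the search returns the classic no-ISI threshold; increasing c_isi pushes the optimal threshold upward and raises the minimum achievable error probability, which is the effect described in the text.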
An alternative to MC that has been used to support nano-networks is communication in the terahertz (THz) band. For these networks, an accurate model of the THz communication between nano-sensors is imperative for their simulation and performance assessment. In addition, another problem entangled with novel nano-sensor networks is their resilience against attacks, which is of high importance, since not only the system reliability but also the safety of the patients is at stake. Thus, it is imperative that possible threats be recognized and that effective countermeasures be developed. These problems are relatively complex to solve with conventional computational methods. On the other hand, ML can provide the tools to model the space-time trajectories of nano-sensors in the complex environments of the human body, as well as to devise strategies that mitigate the security risks of the novel network architectures.
1) Channel modeling: One of the fundamental problems in MC is to accurately model the channel in different environments and conditions. Most MC models assume that a molecule is removed from the environment after hitting the receiver [41]–[45]; hence, each molecule can contribute to the received signal only once. To model this phenomenon, a first-passage process is employed. Another approach stems from the assumption that molecules can pass through the receiver [46]–[49]. In this case, a molecule contributes multiple times to the received signal. However, neither of the aforementioned approaches is capable of modeling perfectly absorbing receivers when the transmitters are reflecting spherical bodies. Interestingly, such models accommodate practical scenarios where the emitter cells do not have receptors at the emission site and cannot absorb the emitted molecules. An indicative example lies in hormonal secretion in the synapses and pancreatic β-cell islets [50]. To fill this gap, ML was employed in [51], [52] to model molecular channels in realistic scenarios with the aid of ANNs. Similarly, in THz nano-scale networks, where the in-body environment is characterized by high path loss and molecular absorption noise (MAN), ML methods can be used to accurately model MAN. This opens the road to a better understanding of the nature of MAN and to the design of new transmission schemes and waveforms.
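For reference, the first-passage model of [41]–[45] has a well-known closed form in the simplest geometry: a point transmitter at distance d from the centre of a perfectly absorbing spherical receiver of radius r, in an unbounded 3D medium with diffusion coefficient D, where the fraction of molecules absorbed by time t is (r/d)·erfc((d - r)/√(4Dt)). The sketch below evaluates it; the parameter values in the usage lines are illustrative assumptions. Note that it assumes a point, non-reflecting transmitter, which is precisely why it cannot capture the reflecting-transmitter scenarios that motivated the ANN-based models of [51], [52].

```python
import math

def fraction_absorbed(t, d, r_rx, diff_coef):
    """Cumulative fraction of emitted molecules absorbed by time t (first-passage model).

    Assumes a point transmitter at distance d (m) from the centre of a perfectly
    absorbing sphere of radius r_rx (m), free 3D diffusion with coefficient
    diff_coef (m^2/s), and an unbounded medium with no reflecting transmitter body."""
    if t <= 0:
        return 0.0
    return (r_rx / d) * math.erfc((d - r_rx) / math.sqrt(4.0 * diff_coef * t))

# Illustrative values (assumptions): receiver radius 5 um, centre distance 10 um, D = 1e-10 m^2/s.
for t in (0.1, 1.0, 10.0, 100.0):
    print(f"t = {t:6.1f} s -> absorbed fraction {fraction_absorbed(t, 10e-6, 5e-6, 1e-10):.3f}")
```

As t grows, the absorbed fraction approaches r/d rather than 1, because in unbounded 3D space some molecules diffuse away and never reach the receiver.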
2) Signal detection: To avoid channel estimation in MC, Farsad et al. proposed in [53] a sequence detection scheme based on recurrent neural networks (RNNs). Compared with previously presented ISI mitigation schemes, ML-based data sequence detection is less complex, since it requires neither channel estimation nor data equalization. Following a similar approach, in [6], the authors presented an ANN capable of achieving the same performance as conventional detection techniques that require perfect knowledge of the channel.
In THz nano-scale networks, an energy detector is usually used to estimate the received data [54]. In more detail, if the received signal power is below a predefined threshold, the detector decides that bit 0 has been sent; otherwise, it decides that bit 1 has been sent. However, the transmission of a 1 causes an increase in MAN power, which is usually capable of affecting the detection of the next symbols. To counterbalance this without increasing the symbol duration, a possible approach is to design ML algorithms that are trained to detect the next symbol while taking into account the already estimated ones. Another ML challenge in signal detection at THz nano-scale networks lies in detecting the modulation mode of the transmitted signal at the receiver when no prior synchronization between transmitter and receiver has occurred. Solving this problem would provide scalability to these networks. Motivated by this, in [55], the authors provided an ML algorithm for modulation recognition and classification.
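The idea of letting the detector account for already estimated symbols can be sketched with the simplest possible data-driven detector: instead of a single energy threshold, one threshold per previous-bit state is learned from labelled training symbols. Everything here is a hypothetical assumption for illustration (the additive energy model, the MAN increase after a 1, the noise level, and the grid search); it is neither the detector of [54] nor a neural network, but it shows why conditioning on past decisions helps.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_energies(bits, signal=1.0, man_boost=0.6, noise_std=0.25):
    """Hypothetical received-energy model: a transmitted 1 adds `signal`, and the MAN
    raised by a 1 in the previous slot adds `man_boost` to the current slot."""
    prev = np.concatenate(([0], bits[:-1]))
    energy = signal * bits + man_boost * prev + rng.normal(0.0, noise_std, bits.size)
    return energy, prev

def best_threshold(energy, bits, grid):
    """Grid threshold minimizing empirical bit errors for the rule 'decide 1 if energy > t'."""
    errors = [np.count_nonzero((energy > t) != bits) for t in grid]
    return grid[int(np.argmin(errors))]

bits = rng.integers(0, 2, 20000)
energy, prev = simulate_energies(bits)
grid = np.linspace(-0.5, 2.5, 301)

# Conventional energy detector: one fixed threshold for every symbol.
t_fixed = best_threshold(energy, bits, grid)
ber_fixed = np.mean((energy > t_fixed) != bits)

# State-aware detector: a separate threshold for each value of the previous bit.
# Training and this evaluation use the true previous bit; at run time the
# previous *decision* would be used instead.
t_by_prev = {p: best_threshold(energy[prev == p], bits[prev == p], grid) for p in (0, 1)}
decisions = np.where(prev == 1, energy > t_by_prev[1], energy > t_by_prev[0])
ber_aware = np.mean(decisions != bits)
print(f"fixed-threshold BER: {ber_fixed:.4f}, previous-bit-aware BER: {ber_aware:.4f}")
```

Because the single fixed threshold is one of the candidates searched for each state, the state-aware detector cannot do worse on the training data; at run time, decision errors can propagate into the next slot, which is one reason more elaborate learned detectors are studied.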
3) Routing and mobility management: In THz nano-scale networks, the design of routing protocols capable of proactively avoiding congestion has been identified as the next step toward their utilization [56]. These protocols need to take into account the extremely constrained computational resources, the stochastic nature of nano-node movements, and the existence of obstacles that may interrupt line-of-sight transmission. The aforementioned challenges can be faced by employing SOTA ML techniques for analyzing collected data and modeling the nano-sensors' movements, discovering neighbors that can be used as intermediate nodes, identifying possible blockers, and proactively determining the message route from the source to the final destination. In this context, in [57], the authors presented a multi-hop deflection routing algorithm based on reinforcement learning and analyzed its performance in comparison to different neural network (NN) and decision tree updating policies.
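As a generic illustration of reinforcement-learning-based routing, and not the deflection routing scheme of [57], the sketch below implements tabular Q-routing in the style of Boyan and Littman on a small, hypothetical static nano-network. Each node keeps an estimate q[x][y] of the number of hops to the destination when forwarding via neighbour y and refines it from that neighbour's own estimates; mobility, blockage, and congestion are deliberately left out.

```python
import random
from collections import deque

# Hypothetical nano-network topology (assumption): node -> neighbours within range.
GRAPH = {
    0: [1, 2], 1: [0, 3], 2: [0, 3, 4],
    3: [1, 2, 5], 4: [2, 5], 5: [3, 4],
}

def train_q_routing(graph, dest, episodes=3000, alpha=0.5, epsilon=0.3, max_hops=50, seed=1):
    """Tabular Q-routing: q[x][y] estimates the hop count to `dest` when x forwards to y."""
    rnd = random.Random(seed)
    q = {x: {y: 0.0 for y in nbrs} for x, nbrs in graph.items()}
    sources = [n for n in graph if n != dest]
    for _ in range(episodes):
        node = rnd.choice(sources)
        for _ in range(max_hops):
            if node == dest:
                break
            nbrs = graph[node]
            # Epsilon-greedy forwarding: mostly exploit, sometimes explore.
            nxt = rnd.choice(nbrs) if rnd.random() < epsilon else min(nbrs, key=q[node].get)
            # One hop costs 1; the neighbour's best estimate is 0 once dest is reached.
            remaining = 0.0 if nxt == dest else min(q[nxt].values())
            q[node][nxt] += alpha * (1.0 + remaining - q[node][nxt])
            node = nxt
    return q

def greedy_route(q, graph, src, dest):
    """Follow the neighbour with the lowest estimated hop count."""
    path = [src]
    while path[-1] != dest and len(path) <= len(graph):
        path.append(min(graph[path[-1]], key=q[path[-1]].get))
    return path

def bfs_hops(graph, src, dest):
    """True hop distance, used only to check the learned routes."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist[dest]

q = train_q_routing(GRAPH, dest=5)
print(greedy_route(q, GRAPH, 0, 5))
```

With a static topology, the learned greedy routes coincide with shortest paths; the appeal of this family of methods is that the same local updates can keep tracking route costs as links appear or disappear, without any node needing the full topology.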
4) Event detection: Nano-sensor biomedicine networks can provide continuous monitoring solutions that can be used as compact, accurate, and portable diagnostic systems. Each nano-sensor obtains a biological signal linked to a specific disease and is used for detecting physiological changes or various biological materials [58]. Successful applications in event detection include the monitoring of DNA interactions, antibody and enzymatic interactions, and cellular communication processes, as well as the detection of viruses, asthma attacks, and lung cancer [59]. For example, in [60], the authors developed a bio-transferrable graphene wireless nano-sensor that is able to sense chemical and biological compounds with extreme sensitivity, down to a single bacterium. Furthermore, in [61], a sandwich assay that combines mechanical and opto-plasmonic transduction was developed in order to detect cancer biomarkers at extremely low concentrations. Also, in [62], a molecular communication-based event detection network was proposed that is able to cope with scenarios where the molecules propagate according to anomalous diffusion instead of conventional Brownian motion.
5) Security: Although the emergence of nano-scale networks based on both electromagnetic communications and MC opened opportunities for the development of novel healthcare applications, it also generated new problems concerning the patients' safety. In particular, two types of security risks have been observed, namely blackhole and sentry attacks [63]. In the former, malicious nano-sensors emit chemicals to attract the legitimate ones and prevent them from searching for their target. In the latter, by contrast, the malicious nano-sensors repel the legitimate ones for the same purpose. Such security risks can be counterbalanced with threshold-based and Bayesian ML techniques, which have been shown to counter these threats with minimal requirements.
C. Biomedicine Applications
Timely detection and intervention are tied to successful treatment for many diseases. This so-called proactive treatment is one of the main objectives of next-generation healthcare systems, which aim to detect and predict diseases and offer treatment services seamlessly. Data analysis and nanotechnology progress simultaneously toward the realization of these systems. Recent breakthroughs in nanotechnology-enabled healthcare systems allow for the exploitation not only of the data that already exist in medical databases throughout the world, but also of the data gathered from millions of nano-sensors.
1) Disease detection: One of the most common problems in healthcare systems is genome classification, with cancer detection being the most popular application. Various classification algorithms are suitable for tackling this problem, such as naive Bayes, k-nearest neighbors, decision trees, ANNs, and support vector machines (SVMs) [64]. For example, the authors in [65] predicted the risk of cerebral infarction in patients by using demographic and cerebral infarction data. In addition, in [7], a unique coarse-to-fine learning method was applied to genome data to identify gastric cancer. Another example is the research presented in [66], where SVMs and convolution NNs (CNNs) were used to classify breast cancer subcategories by analyzing microscopic images of biopsies.
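To illustrate one of the classifiers listed above, the following sketch implements a k-nearest neighbors classifier from scratch. The two-feature "healthy" and "diseased" samples are synthetic Gaussian clusters standing in for, e.g., the expression levels of two genes; they are an assumption for illustration and do not come from [64]–[66].

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k=5):
    """Binary k-NN: majority vote among the k nearest training samples (Euclidean distance)."""
    dists = np.linalg.norm(x_query[:, None, :] - x_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]      # indices of the k closest samples
    votes = y_train[nearest]                        # their labels, shape (n_query, k)
    return (votes.sum(axis=1) > k / 2).astype(int)  # label 1 if most neighbours are 1

# Synthetic two-class data (assumption): label 0 = healthy, label 1 = diseased.
rng = np.random.default_rng(42)
n = 200
healthy = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n, 2))
diseased = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(n, 2))
x = np.vstack([healthy, diseased])
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# Random train/test split.
perm = rng.permutation(2 * n)
train, test = perm[:300], perm[300:]
pred = knn_predict(x[train], y[train], x[test], k=5)
accuracy = np.mean(pred == y[test])
print(f"test accuracy: {accuracy:.2f}")
```

An odd k avoids ties in the binary vote; on real genomic data, feature scaling and dimensionality reduction usually matter far more than the choice of k.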
2) Therapy development: Therapy development and optimization can improve the clinical efficacy of treatment for various diseases without generating unwanted outcomes. Optimization remains a challenging task, since it requires selecting the right combination of drugs, dose, and dosing frequency [67]. For instance, a quadratic phenotype optimization platform (QPOP) was proposed in [68] to determine the optimal combination from 114 drugs to treat bortezomib-resistant multiple myeloma. Since its creation, QPOP has been used to address problems related to drug design and optimization, as well as drug combinations and dosing strategies. Also, in [69], the authors presented a platform called CURATE.AI, which was validated clinically and was used to standardize the therapy of tuberculosis patients with liver transplant-related immunosuppression. Furthermore, CURATE.AI was used for treatment development and patient guidance, which resulted in halted progression of metastatic castration-resistant prostate cancer [70].
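QPOP builds on the observation that drug-combination responses can be approximated by a second-order (quadratic) function of the doses. The sketch below follows that idea in a strongly simplified form, as an assumption for illustration: two drugs, a 3 × 3 dose design, a hypothetical noise-free efficacy surface standing in for measured phenotypic responses, a least-squares quadratic fit, and a grid search for the best combination. It is not the QPOP implementation of [68].

```python
import numpy as np

def quadratic_features(x1, x2):
    # Columns: 1, x1, x2, x1^2, x2^2, x1*x2 (full second-order response surface).
    return np.column_stack([np.ones_like(x1), x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

def measured_response(x1, x2):
    """Hypothetical efficacy of a two-drug combination (normalized doses in [0, 1])."""
    return 0.2 + 0.9 * x1 + 0.7 * x2 - 0.6 * x1 ** 2 - 0.5 * x2 ** 2 - 0.3 * x1 * x2

# Small experimental design: 3 dose levels per drug, i.e., 9 combinations.
levels = np.array([0.0, 0.5, 1.0])
d1, d2 = np.meshgrid(levels, levels)
d1, d2 = d1.ravel(), d2.ravel()
coef, *_ = np.linalg.lstsq(quadratic_features(d1, d2), measured_response(d1, d2), rcond=None)

# Search the fitted surface on a fine dose grid for the most effective combination.
fine = np.linspace(0.0, 1.0, 101)
g1, g2 = np.meshgrid(fine, fine)
pred = quadratic_features(g1.ravel(), g2.ravel()) @ coef
best = int(np.argmax(pred))
best_doses = (g1.ravel()[best], g2.ravel()[best])
print("fitted coefficients:", np.round(coef, 3))
print("best doses (drug 1, drug 2):", best_doses)
```

Only 9 experiments are needed to fit the 6 coefficients of the quadratic model, which is the kind of saving that makes response-surface approaches attractive when each experiment is expensive; with noisy measurements, more design points and a proper fit-quality check would be required.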
III. MACHINE LEARNING METHODS IN NANO-SCALE BIOMEDICAL ENGINEERING
This section presents the fundamental ML methodologies that are used in nano-scale biomedical engineering. It is organized as follows: Section III-A provides a survey of the ANNs that are employed in this field, while Section III-B presents regression methodologies. Meanwhile, the applications, architecture, and building blocks of SVMs and k-nearest neighbors (KNNs) are respectively described in Sections III-C and III-D, whereas dimensionality reduction methods are given in Section III-E. Brief reviews of gradient descent (GD) and active learning (AL) methods are respectively delivered in Sections III-F and III-G. Furthermore, Bayesian ML is discussed in Section III-H, whereas decision tree learning (DTL) and decision table (DT) based algorithms are respectively reported in Sections III-I and III-J. Section III-K revisits the operating principles of surrogate-based optimization, while Section III-L describes the use of quantitative structure-activity relationships (QSARs) in ML. Finally, the Boltzmann generator is presented in Section III-M, while Sections III-N and III-O respectively discuss feedback system control (FSC) methods and the quadratic phenotypic optimization platform.
A. Artificial Neural Networks
This section focuses on the ANNs that are commonly used in nano-scale biomedical engineering. Towards this direction, Section III-A1 reports the applications of CNNs in this field, presents a typical CNN architecture, and discusses the functionalities of its building blocks. Similarly, Section III-A2 presents the operation of RNNs, while deep NNs (DNNs) are discussed in Section III-A3. Diffractive DNNs (D2NNs) and generalized regression NNs (GRNNs) are respectively described in Sections III-A4 and III-A5, while Sections III-A6 and III-A7 respectively revisit multi-layer perceptrons (MLPs) and GANs. Moreover, the applications, architecture, and limitations of Behler-Parrinello networks (BPNs) are reported in Section III-A8, whereas Sections III-A9, III-A10, and III-A11 respectively present those of deep potential networks (DPNs), deep tensor NNs (DTNNs), and SchNets. Likewise, the usability and building blocks of the accurate NN engine for molecular energies, widely known as ANI, are provided in Section III-A12. Finally, comprehensive descriptions of coarse graining networks (CGNs) and neuromorphic computing are respectively given in Sections III-A13 and III-A14.
1) Convolution Neural Networks: CNNs have been extensively used for analyzing images with some degree of spatial correlation [71]–[74]. The aim of CNNs is to extract fundamental local correlations within the data; thus, they are suitable for identifying image features that depend on these correlations. In this sense, in [75], CNNs were employed to analyze skyrmions in labeled Lorentz transmission electron microscope (TEM) images, while, in [76], CNNs were used to identify matter phases from data extracted via Monte Carlo simulations. Another application of CNNs in nano-scale biomedical systems lies in autonomous research systems (ARES) [77]. Specifically, in [77], the authors presented a learning method that determines the state of the tip in scanning tunneling microscopy.
Figure 3 depicts a typical CNN architecture, which is inspired by the connectivity patterns of neurons in the human brain. It consists of neurons that are arranged in a three-dimensional (3D) space, i.e., width, height, and depth. Each neuron receives several inputs and computes their weighted sum (a dot product between its weights and its inputs), which is usually followed by a non-linear operation. Note that, in most cases, CNN architectures are not fully connected. This means that the neurons in a layer are only connected to a small region of the previous layer. Each layer of a CNN transforms its input into a 3D output of neuron activations. In more detail, a CNN consists of the following layers:
Input: holds the raw pixel values of the image in the three color channels, namely red, green, and blue.
Convolution: evaluates the output of neurons that are connected to local regions of the input.
Rectified linear unit (RELU): applies an element-wise activation function, such as thresholding at zero.
Pooling: performs downsampling along the spatial dimensions.
Flattening: reorganizes the values of the 3D matrix into a vector.
Hidden layers: fully-connected layers that return the classification scores.
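The layer sequence listed above can be sketched in NumPy as follows. This is a minimal illustration under assumptions: the helper names (`conv2d`, `relu`, `max_pool`, `cnn_forward`), the 32×32 input size, the two convolution stages, and the random, untrained weights are all hypothetical, and, as in most deep learning libraries, the "convolution" is computed as a cross-correlation.

```python
import numpy as np

def conv2d(image, kernels, bias):
    """Valid convolution (cross-correlation). image: (H, W, C_in),
    kernels: (k, k, C_in, C_out), bias: (C_out,)."""
    k = kernels.shape[0]
    H, W, _ = image.shape
    out = np.zeros((H - k + 1, W - k + 1, kernels.shape[3]))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + k, c:c + k, :]            # local receptive field
            out[r, c, :] = np.tensordot(patch, kernels, axes=3) + bias
    return out

def relu(x):
    return np.maximum(x, 0.0)                             # thresholding at zero

def max_pool(x, size=2):
    """Non-overlapping max pooling along the two spatial dimensions."""
    H, W, C = x.shape
    x = x[: H - H % size, : W - W % size, :]               # drop rows/columns that do not fit
    return x.reshape(H // size, size, W // size, size, C).max(axis=(1, 3))

def cnn_forward(image, params):
    k1, b1, k2, b2, W_fc, b_fc = params
    x = max_pool(relu(conv2d(image, k1, b1)))              # convolution + RELU, pooling
    x = max_pool(relu(conv2d(x, k2, b2)))                  # convolution + RELU, pooling
    x = x.reshape(-1)                                      # flattening
    return W_fc @ x + b_fc                                 # fully-connected layer: class scores

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))                            # hypothetical RGB input
params = (rng.normal(size=(3, 3, 3, 8)) * 0.1, np.zeros(8),
          rng.normal(size=(3, 3, 8, 16)) * 0.1, np.zeros(16),
          rng.normal(size=(10, 6 * 6 * 16)) * 0.1, np.zeros(10))
scores = cnn_forward(image, params)
```

Training (e.g., backpropagation of a classification loss) is omitted; the sketch only shows how the spatial size shrinks (32, 30, 15, 13, 6) before the 6×6×16 volume is flattened into 576 values.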
Fig. 3. CNN architecture (inputs: red, green, and blue channels; convolution + RELU; pooling; further convolution + RELU stages; flattening; outputs).

2) Recurrent Neural Networks: Most ML networks rely on the assumption of independence among the training and test data. Thus, after processing each data point, the entire state of the network is lost. Clearly, this is not a problem if the data points are independently generated. However, if they are related in time or space, this assumption becomes unacceptable. Moreover, conventional networks usually rely on data points that can be organized in vectors of fixed length. However, in practice there exist several problems that require modeling data with temporal or sequential structure and inputs and outputs of varying length.
In order to overcome the aforementioned limitations, RNNs have been proposed in [78]. RNNs are connectionist models capable of selectively passing information across sequence steps while processing sequential data. From the nano-scale applications point of view, RNNs have been used for nano-structure design and data sequence detection in MCs. Specifically, in [79], Hedge described the role that RNNs are expected to play in the design of nano-structures, while, in [80] and [53], the authors employed an RNN in order to train a maximum likelihood detector in MC systems.
Figure 4 depicts the most successful RNN architecture, namely the long short-term memory (LSTM) network introduced by Hochreiter and Schmidhuber [81]. As the figure shows, the main difference between the LSTM and a conventional network is that the nodes of the hidden layer are replaced with memory cells that have self-connected recurrent edges of fixed weight. These recurrent edges guarantee that the gradient can pass across several steps without vanishing. The weights change slowly during training in order to create a long-term memory. Finally, RNNs support short-term memory through ephemeral activations, which pass from each node to successive nodes.
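One time step of a memory cell can be sketched in NumPy as follows. This is a minimal sketch of the widely used LSTM variant with input, forget, and output gates; the forget gate was a later addition to the original design of [81], and holding it at one recovers the fixed unit-weight self-connection described above. The weight shapes, sizes, and function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,)."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[:H])              # input gate
    f = sigmoid(z[H:2 * H])         # forget gate (f = 1 gives a unit-weight self-connection)
    o = sigmoid(z[2 * H:3 * H])     # output gate
    g = np.tanh(z[3 * H:])          # candidate input to the cell
    c = f * c_prev + i * g          # internal cell state: the self-connected recurrent path
    h = o * np.tanh(c)              # ephemeral activation passed to the next step
    return h, c

def lstm_forward(xs, W, U, b, hidden):
    h, c = np.zeros(hidden), np.zeros(hidden)
    outputs = []
    for x_t in xs:                  # works for sequences of any length
        h, c = lstm_step(x_t, h, c, W, U, b)
        outputs.append(h)
    return np.array(outputs), c

rng = np.random.default_rng(0)
D, H = 3, 2                         # two memory cells, as in Fig. 4
W, U, b = rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
hs_short, _ = lstm_forward(rng.normal(size=(5, D)), W, U, b, H)
hs_long, _ = lstm_forward(rng.normal(size=(8, D)), W, U, b, H)
```

The same weights process sequences of length 5 and 8, which addresses the fixed-length limitation discussed above.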
3) Deep Neural Networks: Deep learning was suggested in [53] as an efficient method to detect the information at the receiver of molecular communication systems. Specifically, based on the similarities between speech recognition and molecular channels, DL techniques can be utilized to train a detection algorithm from samples of transmitted and received signals. In this work, it was proposed that well-known NNs, such as an RNN, can train a detector even if the underlying system model is not known. Furthermore, the authors of [53] proposed a real-time NN-based sequence detector and showed that the suggested DL-based algorithms could eliminate the need for instantaneous CSI estimation.
Finally, in [52], an NN-based model of the molecular multiple-input multiple-output (MIMO) channel was presented. This is a remarkable contribution, since the proposed model can be used to investigate the possibility of increasing the low data rates of MCs. Specifically, in this paper, a 2×2 molecular MIMO channel was modeled through two ML-based techniques, and the developed model was used to evaluate the bit error rate (BER).
4) Diffractive Deep Neural Network: In [82], a diffractive deep NN (D2NN) framework was proposed. D2NN is an all-optical deep learning framework, where multiple layers of diffractive surfaces physically form the NN. These layers collaborate to optically perform an arbitrary function, which the network learns statistically. The learning part is performed on a computer, whereas the prediction of the physical network is all-optical.
A D2NN is formed by several transmissive and/or reflective layers, in which each point can either transmit or reflect the incoming wave. In this way, each point acts as an artificial neuron, which is connected to the neurons of the following layer through optical diffraction. Following Huygens' principle, each point on a layer acts as a secondary source of a wave, whose amplitude and phase are given by the product of the complex-valued transmission or reflection coefficient and the input wave at that point. Consequently, the input interference pattern, which is due to the earlier layers, and the local transmission/reflection coefficient at a specific point modulate the amplitude and phase of a secondary wave, through which an artificial neuron in the D2NN is connected to the neurons of the following layer.
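The all-optical inference pass of a D2NN can be sketched numerically as alternating free-space diffraction and point-wise multiplication by the complex transmission coefficients of each layer. This is a minimal sketch under stated assumptions: diffraction is modelled with the angular spectrum method (chosen for brevity, not necessarily the formulation used in [82]), the layers are phase-only transmissive masks with random values, and the wavelength, pixel pitch, layer spacing, and function names are hypothetical. The computer-based training of the coefficients is not shown.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, distance):
    """Propagate a square 2D complex field over `distance` in free space."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx, indexing="ij")
    arg = 1.0 / wavelength ** 2 - FX ** 2 - FY ** 2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    # Propagating plane-wave components acquire a phase; evanescent ones (arg <= 0) are dropped
    transfer = np.where(arg > 0, np.exp(1j * kz * distance), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

def d2nn_forward(input_field, layers, wavelength, dx, distance):
    field = input_field.astype(complex)
    for t in layers:
        field = angular_spectrum_propagate(field, wavelength, dx, distance)
        # Huygens: each point re-emits the local wave scaled by its complex transmission coefficient
        field = field * t
    field = angular_spectrum_propagate(field, wavelength, dx, distance)
    return np.abs(field) ** 2                      # detectors measure intensity

rng = np.random.default_rng(0)
n, wavelength, dx, distance = 64, 0.75e-3, 0.4e-3, 30e-3    # hypothetical values in metres
input_field = np.zeros((n, n))
input_field[24:40, 28:36] = 1.0                             # hypothetical input object
layers = [np.exp(1j * 2 * np.pi * rng.random((n, n))) for _ in range(5)]   # phase-only masks
intensity = d2nn_forward(input_field, layers, wavelength, dx, distance)
```

Because the masks are phase-only and evanescent components are discarded, the optical power reaching the detector plane cannot exceed the input power, which is a convenient sanity check for such simulations.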
5) Generalized regression neural networks: GRNN belongs to the family of instance-based learning methods, and it is a variation of radial basis NNs [83]. Instance-based learning methods, which construct hypotheses directly from the training instances, generally have a tractable computational cost compared to methods that are not instance-based, such as MLPs trained with backpropagation. A GRNN consists of an input layer, a pattern layer, and an output layer, and can be expressed by
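$$\hat{y}(\mathbf{x}) = \hat{f}(\mathbf{x}) = \frac{\sum_{k=1}^{N} y_k K(\mathbf{x},\mathbf{x}_k)}{\sum_{k=1}^{N} K(\mathbf{x},\mathbf{x}_k)}, \tag{1}$$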
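where $\hat{y}(\mathbf{x})$ is the prediction for the (N+1)-th input $\mathbf{x}$, $y_k$ is the target value of the k-th training instance, which is stored in the k-th neuron of the pattern layer, and $K(\mathbf{x},\mathbf{x}_k)$ is the radial basis function kernel, i.e., the activation of that neuron, which is a Gaussian kernel given by
$$K(\mathbf{x},\mathbf{x}_k) = e^{-d_k/(2\sigma^2)}, \quad d_k = (\mathbf{x}-\mathbf{x}_k)^T(\mathbf{x}-\mathbf{x}_k), \tag{2}$$
where $d_k$ is the squared Euclidean distance between $\mathbf{x}$ and $\mathbf{x}_k$, and $\sigma$ is a smoothing parameter. Due to the presence of $K(\mathbf{x},\mathbf{x}_k)$, the values $y_k$ of the training instances that are closer to $\mathbf{x}$, relative to $\sigma$, contribute more to the predicted value.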
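Equations (1) and (2) translate almost line by line into code. The following NumPy sketch, with the hypothetical helper name `grnn_predict` and toy one-dimensional data, computes the squared distances $d_k$, the Gaussian pattern-layer activations, and their normalized weighted sum.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=0.5):
    """Eqs. (1)-(2): Gaussian-kernel weighted average of the stored training targets."""
    diff = X_query[:, None, :] - X_train[None, :, :]
    d = np.sum(diff ** 2, axis=-1)                 # d_k = (x - x_k)^T (x - x_k)
    K = np.exp(-d / (2.0 * sigma ** 2))            # pattern-layer activations
    return (K @ y_train) / K.sum(axis=1)           # summation and output layers

X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = np.array([0.0, 0.8, 0.9, 0.1, -0.8])     # hypothetical targets
y_hat = grnn_predict(X_train, y_train, np.array([[1.5], [2.5]]), sigma=0.5)
```

A very small $\sigma$ makes the prediction collapse to the target of the nearest training instance, whereas a very large $\sigma$ returns the plain average of all targets; in practice, $\sigma$ is tuned, e.g., by cross-validation.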
GRNN is used in [84] in order to characterize psychological
wellness from survey results that measure stress, depression,
anger, and fatigue.
6) Multi-layer Perceptrons: MLP is a type of feedforward ANN that consists of at least three layers of nodes: an input layer, an output layer, and one or more hidden layers [85]. Apart from the input nodes $a^{(0)}_n$, each node is a neuron that takes as input a weighted sum of the node values of the previous layer plus a bias, and gives an output through an activation function $\sigma(\cdot)$, which is usually sigmoid. Therefore, the input of the k-th neuron in the L-th layer can be expressed as
$$z^{(L)}_k = w_{k,0}\, a^{(L-1)}_0 + \dots + w_{k,n}\, a^{(L-1)}_n + b^{(L)}_k, \tag{3}$$
where $w_{k,j}$ is the weight that connects the j-th node of the previous layer to the k-th neuron, and $b^{(L)}_k$ is the bias of the k-th node of the L-th layer. The activation of that neuron can then be written as
$$a^{(L)}_k = \sigma\big(z^{(L)}_k\big). \tag{4}$$
The number of nodes in the input layer is equal to the number of input features, whereas the number of output neurons is equal to the number of output features. A cost function C, usually the sum of squared errors between prediction and target, is calculated and propagated backward through the network in order to update the weights of each neuron via a GD algorithm and, thus, minimize the cost function. This way of updating the weights is called backpropagation [86]. More specifically, the error in an output node j for the n-th training example is $e_j(n) = y_j(n) - \hat{y}_j(n)$, where $y$ is the target value and $\hat{y}$ is the value predicted by the perceptron. The error for the n-th example over all output nodes is given by
$$C(n) = \sum_{j} e_j^2(n). \tag{5}$$
GD dictates a change in the weights proportional to the negative gradient of the cost function, $-\nabla C(\mathbf{w})$. However, computing this gradient over the entire training set at every step can be computationally expensive; methods such as stochastic GD, which estimate it from a single example or a small batch at each step, can increase efficiency.
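The forward pass (3)-(4), the cost (5), and one backpropagation and stochastic GD update can be sketched as follows. This is a minimal NumPy illustration, assuming sigmoid activations in every layer, a single training example per update, and hypothetical helper names and layer sizes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    """Eqs. (3)-(4), layer by layer: z = W a + b, a = sigma(z)."""
    activations = [x]
    for W, b in params:
        activations.append(sigmoid(W @ activations[-1] + b))
    return activations

def cost(params, x, y):
    """Eq. (5): sum of squared output errors e_j = y_j - y_hat_j."""
    return np.sum((y - forward(params, x)[-1]) ** 2)

def backprop(params, x, y):
    """Gradient of C(n) with respect to every (W, b) for one training example."""
    acts = forward(params, x)
    # dC/dz at the output layer: dC/da = -2 e and sigma'(z) = a (1 - a)
    delta = -2.0 * (y - acts[-1]) * acts[-1] * (1.0 - acts[-1])
    grads = [None] * len(params)
    for l in range(len(params) - 1, -1, -1):
        grads[l] = (np.outer(delta, acts[l]), delta)
        if l > 0:  # propagate the error backward to the previous layer
            delta = (params[l][0].T @ delta) * acts[l] * (1.0 - acts[l])
    return grads

def sgd_step(params, x, y, lr=0.5):
    """One stochastic GD step: move each parameter along the negative gradient."""
    grads = backprop(params, x, y)
    return [(W - lr * gW, b - lr * gb) for (W, b), (gW, gb) in zip(params, grads)]

rng = np.random.default_rng(0)
params = [(rng.normal(size=(3, 2)), np.zeros(3)),   # hidden layer: 2 inputs -> 3 neurons
          (rng.normal(size=(1, 3)), np.zeros(1))]   # output layer: 3 -> 1 neuron
x, y = np.array([0.0, 1.0]), np.array([1.0])        # one hypothetical training example
before = cost(params, x, y)
after = cost(sgd_step(params, x, y, lr=0.01), x, y)
```

Looping `sgd_step` over randomly shuffled training examples gives stochastic GD; averaging the gradients of a few examples before each update gives the mini-batch variant.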
MLP is used in [84] in order to characterize psychological
wellness from survey results that measure stress, depression,
anger, and fatigue.
Fig. 4. An RNN with a hidden layer consisting of two memory cells.
7) Generative Adversarial Networks: A GAN [87] is an unsupervised learning strategy, which was introduced in [88]. A GAN consists of two networks: a generator that estimates the distribution of the data, and a discriminator that evaluates each generated sample by comparing it to the available unlabeled data. This strategy can exploit specific training algorithms for many kinds of models and optimization algorithms. Specifically, MLPs can be utilized in a twofold way, i.e., the generative model generates samples by passing random noise through an MLP, while another MLP is used as the discriminative model. Both networks can then be trained using only the highly successful backpropagation and dropout algorithms, while approximate inference or Markov chains are not necessary.
The generator's distribution $p_g$ over data $\mathbf{x}$ can be learned by defining a prior on the input noise variables, $p_z(\mathbf{z})$, and representing a mapping to data space as $G(\mathbf{z};\theta_g)$, where $G$ is a differentiable function represented by an MLP with parameters $\theta_g$. A second MLP, $D(\mathbf{x};\theta_d)$, with parameters $\theta_d$ outputs a single scalar, which denotes the probability that $\mathbf{x}$ came from the data rather than from $p_g$. $D$ is trained to maximize the probability of assigning the correct label to both the training examples and the samples from $G$, while $G$ is simultaneously trained to minimize $\log(1 - D(G(\mathbf{z})))$. More specifically, $D$ and $G$ play a two-player min-max game with value function $V(D,G)$, as follows:
$$\begin{aligned} \min_G \max_D V(D,G) = {} & \mathbb{E}_{\mathbf{x}\sim p_{\text{data}}(\mathbf{x})}[\log D(\mathbf{x})] \\ & + \mathbb{E}_{\mathbf{z}\sim p_z(\mathbf{z})}[\log(1 - D(G(\mathbf{z})))], \end{aligned} \tag{6}$$
where $p_{\text{data}}$ denotes the distribution of the training data.
In practice, the game must be solved with an iterative numerical approach. Optimizing $D$ to completion in the inner loop of training is computationally prohibitive and, on finite data sets, would result in over-fitting. A better solution is to alternate between $k$ steps of optimizing $D$ and one step of optimizing $G$. In this way, $D$ is maintained near its optimal solution, as long as $G$ changes slowly enough.
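The alternating schedule above can be illustrated with a deliberately tiny, hypothetical one-dimensional example: the "data" are drawn from N(3, 1), the generator is the affine map G(z) = a z + c, and the discriminator is a single logistic unit D(x) = sigmoid(w x + b) instead of an MLP, so that all gradients of (6) can be written by hand. It is only a sketch of the k-to-1 alternation, not a practical GAN; such a discriminator is far too weak to capture a full distribution.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def train_toy_gan(iterations=500, k=5, batch=64, lr=0.05, seed=0):
    """Alternate k ascent steps on V for D with one descent step for G, as in (6)."""
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0        # discriminator D(x) = sigmoid(w x + b)
    a, c = 1.0, 0.0        # generator G(z) = a z + c, with prior z ~ N(0, 1)
    history = []
    for _ in range(iterations):
        for _ in range(k):
            x_real = rng.normal(3.0, 1.0, batch)       # samples from the hypothetical p_data
            x_fake = a * rng.normal(size=batch) + c    # samples from p_g
            s_real = sigmoid(w * x_real + b)
            s_fake = sigmoid(w * x_fake + b)
            # Gradient ascent on E[log D(x)] + E[log(1 - D(G(z)))]
            w += lr * (np.mean((1.0 - s_real) * x_real) - np.mean(s_fake * x_fake))
            b += lr * (np.mean(1.0 - s_real) - np.mean(s_fake))
        z = rng.normal(size=batch)
        s_fake = sigmoid(w * (a * z + c) + b)
        # Gradient descent on E[log(1 - D(G(z)))]; its derivative w.r.t. G(z) is -D(G(z)) w
        a -= lr * np.mean(-s_fake * w * z)
        c -= lr * np.mean(-s_fake * w)
        history.append((w, b, a, c))
    return history

history = train_toy_gan(iterations=200)
```

The original GAN paper [88] notes that $\log(1 - D(G(\mathbf{z})))$ saturates early in training, when $D$ confidently rejects the generated samples, so in practice $G$ is often trained to maximize $\log D(G(\mathbf{z}))$ instead.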
8) Behler-Parrinello Networks: BPNs are traditionally used in molecular sciences in order to learn and predict energy surfaces from QM data, by incorporating all the relevant physical symmetries and properties as well as sharing parameters between atoms. The fundamental BPN architecture is depicted in Fig. 5. For each atom i, the molecular coordinates are mapped to invariant features. A set of correlation functions, which describe the chemical environment of each atom, is employed in order to map the distances of neighboring atoms of a certain type and the angles between two neighbors of specific types. These features are fed into a dense NN, which returns the energy of atom i in its environment. The input feature functions are designed taking into account that the energy is rototranslationally invariant, while equivalent atoms share their parameters. In the final step, all the atoms of a molecule are identified and their atomic energies are summed. This guarantees permutation invariance. Parameter sharing combined with the summation principle also offers scalability, since it allows the network to grow or shrink to molecules of any size, including ones that were never seen in the training data. The main limitation of BPNs is that they cannot accurately predict the energy surfaces in complex chemical environments.
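The pipeline described above (invariant per-atom features, element-specific networks with shared parameters, and a final summation) can be sketched as follows. The sketch rests on stated assumptions: only radial Behler-Parrinello-type symmetry functions with a cosine cutoff are used (the angular terms are omitted for brevity), the η values, cutoff radius, network size, geometry, and helper names are hypothetical, and the network weights are random rather than fitted to QM data.

```python
import numpy as np

def cutoff(r, r_c):
    """Cosine cutoff f_c(r) = 0.5 [cos(pi r / r_c) + 1] for r <= r_c, and 0 beyond."""
    return np.where(r <= r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def radial_features(coords, species, elements, etas, r_s=0.0, r_c=6.0):
    """Radial symmetry functions: one block of len(etas) values per neighbour element."""
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    feats = np.zeros((n, len(elements) * len(etas)))
    for i in range(n):
        for e, elem in enumerate(elements):
            r = dist[i, (species == elem) & (np.arange(n) != i)]   # neighbours of type elem
            for m, eta in enumerate(etas):
                feats[i, e * len(etas) + m] = np.sum(np.exp(-eta * (r - r_s) ** 2) * cutoff(r, r_c))
    return feats

def bpn_energy(coords, species, nets, elements, etas):
    """Total energy = sum of atomic energies from element-specific (shared) networks."""
    total = 0.0
    for g, s in zip(radial_features(coords, species, elements, etas), species):
        W1, b1, w2, b2 = nets[s]               # atoms of the same element share parameters
        total += w2 @ np.tanh(W1 @ g + b1) + b2
    return total

rng = np.random.default_rng(1)
elements, etas = ["H", "O"], [0.5, 1.0, 2.0]
nets = {e: (rng.normal(size=(8, 6)), rng.normal(size=8), rng.normal(size=8), 0.0) for e in elements}
coords = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])   # water-like geometry
species = np.array(["O", "H", "H"])
energy = bpn_energy(coords, species, nets, elements, etas)
```

Even with untrained weights, the predicted energy is unchanged when the atoms are re-ordered, rotated, or translated, which illustrates how the architecture builds these symmetries in.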
9) Deep Potential Networks: DPNs aim at providing an end-to-end representation of PESs, which uses the atomic configuration directly as input data, without decomposing the energy into contributions of different numbers of bodies. Similarly to BPNs, the main challenge is to design a DNN that takes into account both the rotational and permutational symmetries as well as chemically equivalent atoms.
Let us consider a molecule that consists of $N_{X_i}$ atoms of type $X_i$, with $i \in \{1, 2, \dots, M\}$. As demonstrated in Fig. 6, the DPN takes as inputs the Cartesian coordinates of each atom and feeds them into $\sum_{i=1}^{M} N_{X_i}$ almost independent sub-networks. Each of them maps a different atom in the system and provides a scalar output that corresponds to its local energy contribution to the PES. Furthermore, the sub-networks are coupled only through summation in the last step of this method, when the total energy of the molecule is computed. In order to ensure the permutational symmetry of the input, in each sub-network, the atoms are fed in different groups that correspond to different atomic species. Within each group, the atoms are sorted in order of increasing distance from the origin. To further guarantee global permutation symmetry, the same parameters are assigned to all the sub-networks that correspond to the same atomic species.

Fig. 5. Behler-Parrinello network architecture (coordinates $x_1, \dots, x_n$ are mapped to the features $g^i_1, \dots, g^i_k$ of atom i, which an atom-specific neural network maps to the energy of atom i).
10) Deep Tensor Neural Networks: Recently, several researchers have exploited the capability of DTNNs to learn a multi-scale representation of the properties of molecules and materials from large-scale data in order to develop molecular and material simulators [11], [89], [90]. In more detail, a DTNN initially constructs a representation vector for each of the atoms within the chemical environment, and then employs a tensor construction algorithm that iteratively learns higher-order representations by interacting with all pairwise neighbors.
Figure 7 presents a comprehensive example of the DTNN architecture. The input, which consists of atom types and positions, is processed through several layers to produce atom-wise energies that are summed to a total energy. In the interaction layer, which is the most important one, atoms interact via continuous convolution functions. The variable $W_t$ stands for the convolution weights that are returned from a filter generator function. Continuous convolutions are generated by DNNs that operate on interatomic distances, ensuring the rototranslational invariance of the energy.
DTNNs can accurately model a general QM molecular potential when trained on a diverse set of molecular energies [89]. Their main disadvantage is that they are unable to predict energies for systems larger than those included in the training set [91].
11) SchNet: SchNets can be considered a special case of DTNNs, since both share atom embeddings, interaction refinements, and atom-wise energy contributions. Their main difference is that interactions in DTNNs are modeled by tensor layers, which provide atom representations, while parameter tensors are also used in order to combine the atom representations with inter-atomic distances [92]. On the other hand, to model the interactions, SchNet employs filter convolutions, which can be interpreted as a special case of computationally efficient low-rank factorized tensor layers [93], [94].
Conventional CNNs use discrete convolution filters (DCFs), which are designed for pixelated image processing in computer vision [95]. However, QM properties, such as energy, are highly sensitive to position ambiguity. As a consequence, the accuracy of a model that discretizes the positions of the particles on a grid is questionable. To solve this problem, in [96], the authors employed continuous convolutions in order to map the rototranslationally invariant inter-atomic distances to filter values, which are used in the convolution.
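The continuous-filter convolution can be sketched as follows: the inter-atomic distances are expanded in Gaussian radial basis functions, a small filter network maps this expansion to filter values, and each atom aggregates its neighbours' features weighted element-wise by those filters. This is a simplified sketch with hypothetical sizes, function names, and random, untrained filter weights; the cutoff, the atom embeddings, and the atom-wise layers of a full interaction block are omitted.

```python
import numpy as np

def shifted_softplus(x):
    # ssp(x) = ln(0.5 e^x + 0.5), the activation used in SchNet
    return np.logaddexp(x, 0.0) - np.log(2.0)

def rbf_expand(d, centers, gamma=10.0):
    # e_k(d) = exp(-gamma (d - mu_k)^2): a smooth expansion instead of a pixel grid
    return np.exp(-gamma * (d[..., None] - centers) ** 2)

def cfconv(features, coords, filter_params, centers):
    """x_i' = sum over j != i of x_j * W(||r_i - r_j||), with W produced by a filter network."""
    W1, b1, W2, b2 = filter_params
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    filters = shifted_softplus(rbf_expand(dist, centers) @ W1 + b1) @ W2 + b2   # (n, n, F)
    mask = 1.0 - np.eye(n)                                                      # no self-interaction
    return np.einsum("ij,ijf,jf->if", mask, filters, features)

rng = np.random.default_rng(0)
n_atoms, n_feat, n_rbf, n_hidden = 4, 5, 20, 16            # hypothetical sizes
centers = np.linspace(0.0, 5.0, n_rbf)
filter_params = (rng.normal(size=(n_rbf, n_hidden)), np.zeros(n_hidden),
                 rng.normal(size=(n_hidden, n_feat)), np.zeros(n_feat))
coords = rng.normal(size=(n_atoms, 3))
x = rng.normal(size=(n_atoms, n_feat))                     # atom-wise features (e.g., embeddings)
out = cfconv(x, coords, filter_params, centers)
```

Because the filters depend only on distances, the output is unchanged by rotations and translations of the molecule, and re-ordering the atoms simply re-orders the output rows.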
12) Accurate neural network engine for molecular energies: Accurate neural network engine for molecular energies (ANAKIN-ME), or ANI for short, refers to networks that have been developed to overcome the limitations of DTNNs. The principle behind ANI is to develop modified versions of the symmetry functions (SmFs) introduced by BPNs, in order to build NN potentials (NNPs). These functions produce single-atom atomic environment vectors (AEVs), which serve as the molecular representation that is fed to the NNPs. AEVs allow energy prediction in complex chemical environments; thus, ANI solves the transferability problem of BPNs. By employing AEVs, the problem that ANI needs to solve is simplified into sampling a statistically diverse set of molecular interactions within a predefined region of interest. To successfully solve this problem, a considerably large data set that spans the molecular conformational and configurational space is required. A trained ANI is capable of accurately predicting energies for molecules within the training set region.
As presented in Fig. 8, ANI uses the molecular coordinates and the atom types in order to compute the AEV of each atom. The AEV of atom $A_i$ (with $i = 1, \dots, N$), $G_{A_i}$, probes specific regions of $A_i$'s radial and angular chemical environment. Each $G_{A_i}$ is fed into a single NNP, which returns the energy of atom i. Finally, the total energy of a molecule is evaluated as the sum of the energies of all its atoms.
13) Coarse Graining Networks: A common approach for going beyond the time and length scales that are accessible with computationally expensive molecular dynamics simulations is coarse-graining (CG) models. Towards this direction, several research works, including [18], [97]–[105], developed CG energy functions for large molecular systems, which take into account either the macroscopic properties or the structural features of atomistic models. All the aforementioned contributions agreed on the importance of incorporating the physical constraints of the system in order to develop a successful model. The training data are usually obtained through atomistic molecular dynamics simulations. Values within physically forbidden regions are not sampled and, hence, not included in the training. As a result, the machine is unable to perform predictions far away from the training data without additional constraints.

Fig. 6. Deep potential net architecture (Cartesian coordinates feed per-atom sub-networks, each with an input layer, a hidden layer, and a local energy output; the local energies are summed into the total energy).
Fig. 7. DTNN architecture.
Fig. 8. ANI architecture (molecular coordinates and atoms are mapped by atomic environment vector generators to per-atom neural network potentials, whose atomic energies are summed into the total energy).
To counter the aforementioned problem, CG networks employ regularization methods in order to enforce the correct asymptotic behavior of the energy when a nonphysical limit is violated. Similarly to BPNs and SchNets, CG networks initially translate the Cartesian coordinates into internal coordinates and use them to predict the rototranslationally invariant energy. Next, as illustrated in Fig. 9, the network learns the difference from a simple prior energy, which has been defined to have the correct asymptotic behavior [18]. Note that, since CG networks use the available training data to correct the prior energy, its exact form is not required. Moreover, CG networks compute the gradient of the total free energy with respect to the input configuration in order to predict conservative and rotation-equivariant force fields. The CG network is trained by minimizing the force-matching loss of this prediction.
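A toy version of this scheme can be sketched in a single internal coordinate, a hypothetical bead-bead distance r: the free energy is a harmonic prior with the correct asymptotic growth plus a learned tanh-network correction, the force is its negative derivative (hence conservative), and training minimizes the force-matching loss. To keep the sketch short, the hidden layer is fixed at random and only the output weights are fitted, by linear least squares, which is possible because the predicted force is linear in them; in general, all weights would be trained by gradient-based optimization. The reference forces and all names are hypothetical.

```python
import numpy as np

def cg_energy_and_force(r, v, w, c, k=1.0, r0=1.0):
    """U(r) = harmonic prior + learned correction; F(r) = -dU/dr (conservative by construction)."""
    h = np.tanh(np.outer(r, w) + c)                      # hidden layer, shape (n_samples, n_hidden)
    energy = 0.5 * k * (r - r0) ** 2 + h @ v
    force = -(k * (r - r0) + (1.0 - h ** 2) @ (v * w))   # analytic derivative of U
    return energy, force

def fit_force_matching(r, f_ref, w, c, k=1.0, r0=1.0):
    """Minimize the force-matching loss over the output weights v (the force is linear in v)."""
    design = -(1.0 - np.tanh(np.outer(r, w) + c) ** 2) * w    # dF/dv, shape (n_samples, n_hidden)
    target = f_ref + k * (r - r0)                             # part not explained by the prior force
    v, *_ = np.linalg.lstsq(design, target, rcond=None)
    return v

rng = np.random.default_rng(0)
r = rng.uniform(0.5, 2.0, 200)                                       # hypothetical CG distances
f_ref = -(4 * r ** 3 - 6 * r) + 0.1 * rng.normal(size=r.size)        # hypothetical noisy forces
w, c = 2.0 * rng.normal(size=16), rng.normal(size=16)                # fixed random hidden layer
v = fit_force_matching(r, f_ref, w, c)
loss_prior = np.mean((cg_energy_and_force(r, np.zeros(16), w, c)[1] - f_ref) ** 2)
loss_fit = np.mean((cg_energy_and_force(r, v, w, c)[1] - f_ref) ** 2)
```

Since the forces are obtained by differentiating a single scalar energy, they are conservative whatever the learned weights are, and the prior keeps the energy growing outside the sampled range of r.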
In practice, CGNs are used to predict the thermodynamics of chemical systems that are considerably larger than what is possible to simulate with atomistic resolution. Moreover, recent works indicate that they can also be used to approximate the system kinetics, through the addition of fictitious particles [106] or by employing spectral matching to train the CGN [107].
14) Neuromorphic Computing: Neuromorphic computing [87] is an emerging field, in which the designed hardware-level system closely represents the architecture of the brain. The fundamental unit of neuromorphic computation is a memristor, which is a two-terminal device whose conductance is a function of the voltages previously applied to it. Memristors have been realized experimentally, since many nanoscale materials exhibit memristive properties through ionic motion [108]. Nanophotonic systems are also utilized for neuromorphic computing, and especially for the realization of deep learning networks [109] and adsorption-based photonic NNs [110].
Fig. 9. CG network architecture (Cartesian coordinates are featurized and fed to a network, whose output is combined with a prior energy into the free energy).
Although neuromorphic computing and memristors are promising as a scalable, practical technology, large-area uniformity, reproducibility of the components, switching speed/efficiency, and total lifetime in terms of cycles remain quite challenging aspects [111], which require either the development of novel memristive systems or improvements to existing ones. In addition, integration with existing complementary metal-oxide-semiconductor (CMOS) platforms and a competitive performance advantage over CMOS neurons must be explored. Once trained, these analog networks can be highly efficient; however, their training does not utilize digital logic and, thus, lacks flexibility [87].
B. Regression
In this section, we discuss the regression methods that are commonly used in the field of nano-scale biomedical engineering. In this sense, Section III-B1 provides a brief overview of logistic regression (LR), whereas Sections III-B2 and III-B3 respectively discuss multivariate linear regression (MvLR) and classification via regression. Finally, Sections III-B4 and III-B5 respectively report the operating principles of local weighted learning (LWL) and scoring functions (SFs).
1) Logistic regression: LR is a supervised learning classification algorithm used to predict the probability of a target variable. The target, or dependent, variable is dichotomous, which means that there are only two possible classes. LR can fit trends that are more complex than linear regression, but it still treats multiple properties as linearly related and is still a linear model. LR is named after the function used at the core of the method, the logistic function, which maps any real-valued number into a value between 0 and 1. Furthermore, LR has been used extensively in biomedical applications, such as disease detection. Specifically, in [112], the authors used LR for determining structure-activity relationships and design rules for spherical nucleic acids functioning as cancer-vaccine candidates.
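A minimal LR fit can be sketched as batch gradient descent on the cross-entropy (log) loss, with the logistic function mapping the linear score to a probability. The helper names, learning rate, number of epochs, and the two synthetic Gaussian clusters are hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))              # logistic function: maps R into (0, 1)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the average cross-entropy loss."""
    Xb = np.hstack([np.ones((len(X), 1)), X])    # prepend a column of ones for the intercept
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)                      # predicted probability of class 1
        w -= lr * Xb.T @ (p - y) / len(y)        # gradient of the average log loss
    return w

def predict_proba(w, X):
    return sigmoid(np.hstack([np.ones((len(X), 1)), X]) @ w)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (30, 2)), rng.normal(2.0, 0.5, (30, 2))])   # hypothetical features
y = np.concatenate([np.zeros(30), np.ones(30)])
w = fit_logistic(X, y)
p = predict_proba(w, X)
```

Thresholding the predicted probability at 0.5 yields the class; the decision boundary $\mathbf{w}^T[1, \mathbf{x}] = 0$ is linear in the features, which is why LR is still a linear model.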
2) Multivariate linear regression: Following the previous
analysis, when multiple correlated dependent variables are pre-
dicted, rather than a single scalar variable, the method is called
MvLR. This method is a generalization of multiple linear
regression and incorporates a number of different statistical
models, such as analysis of variance (ANOVA), t-test, F-test
and more. MvLR has been used in ML for several nano-scale
biomedical applications. Among the most successful ones is
the prediction of cytotoxicity in nanoparticles [113].
The MvLR model has the form:
$$y_{ik} = b_{0k} + \sum_{j=1}^{p} b_{jk}\, x_{ij} + e_{ik}, \tag{7}$$
where $y_{ik}$ is the k-th response for the i-th observation, $b_{0k}$ is the regression intercept for the k-th response, $b_{jk}$ is the j-th predictor's regression slope for the k-th response, $x_{ij}$ is the j-th predictor for the i-th observation, $e_{ik}$ is a Gaussian error term for the k-th response, $k \in [1, m]$, and $i \in [1, n]$.
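The coefficients of (7) are typically estimated by least squares, which can be done for all m responses in a single call. In the following sketch, the helper names, the coefficient matrix, and the data are hypothetical.

```python
import numpy as np

def fit_mvlr(X, Y):
    """Least-squares estimate of the coefficients in (7) for all m responses at once.
    Returns B of shape (p + 1, m): row 0 holds b_0k, row j holds b_jk."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # column of ones for the intercepts b_0k
    B, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return B

def predict_mvlr(B, X):
    return np.hstack([np.ones((X.shape[0], 1)), X]) @ B

rng = np.random.default_rng(0)
n, p, m = 100, 3, 2                                  # observations, predictors, responses
X = rng.normal(size=(n, p))
B_true = np.array([[1.0, -2.0], [0.5, 0.0], [0.0, 3.0], [-1.5, 0.7]])   # hypothetical coefficients
Y = predict_mvlr(B_true, X) + 0.01 * rng.normal(size=(n, m))            # small Gaussian errors e_ik
B_hat = fit_mvlr(X, Y)
```

The least-squares point estimates coincide with fitting each response separately; the multivariate formulation matters mainly for joint inference, where the correlation among the errors $e_{ik}$ of the different responses is taken into account.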
3) Classification via regression: Conventionally, when dealing with discrete classes in ML, a classification method is used, while a regression method is applied when dealing with continuous outputs. However, it is possible to perform classification through a regression method. The class is binarized and one regression model is built for each class value; a new instance is then assigned to the class whose model returns the largest output. In [114], classification via regression was among the methods evaluated for predicting the cytotoxicity of certain nanoparticles, in order to eliminate, in silico, unsuitable materials from potential human applications.
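The binarization step can be sketched as follows, assuming ordinary least squares as the base regressor (chosen for brevity; another regression method, such as a regression tree, could take its place): the labels are one-hot encoded, one linear model is fitted per class, and the class whose model returns the largest output wins. The helper names and data are hypothetical.

```python
import numpy as np

def fit_cvr(X, labels, n_classes):
    """Binarize the class and fit one least-squares regression model per class value."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    T = np.eye(n_classes)[labels]              # column c is 1 for class c and 0 otherwise
    B, *_ = np.linalg.lstsq(Xb, T, rcond=None)
    return B

def predict_cvr(B, X):
    """Assign each instance to the class whose regression model outputs the largest value."""
    return np.argmax(np.hstack([np.ones((len(X), 1)), X]) @ B, axis=1)

rng = np.random.default_rng(0)
centers = np.array([[0.0, 4.0], [-4.0, -2.0], [4.0, -2.0]])   # hypothetical class centers
labels = np.repeat(np.arange(3), 20)
X = centers[labels] + 0.5 * rng.normal(size=(60, 2))
B = fit_cvr(X, labels, 3)
```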
4) Local weighted learning: In the majority of learning methods, a global solution is reached using the entirety of the training data. LWL offers an alternative approach at a much lower cost, by creating a local model based on the neighboring data of a point of interest. In general, data points in the neighborhood of the point of interest, called the query point, are assigned a weight based on a kernel function and their respective distance from the query point. Similarly to most regression methods, the goal is to find the regression coefficients that minimize a cost function, which in this case is weighted. Due to its nature as a local approximation, LWL allows for easy addition of new training data. Depending on whether or not they store the entirety of the training data in memory, LWL-based methods can be divided into memory-based and purely incremental ones, respectively [115].
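A common instance of LWL is locally weighted linear regression, sketched below under assumptions: a Gaussian kernel with bandwidth τ, a memory-based implementation that keeps all training points and solves a small weighted least-squares problem for every query, and hypothetical helper names and data.

```python
import numpy as np

def lwl_predict(X, y, x_query, tau=0.3):
    """Memory-based LWL: a weighted least-squares line fitted around x_query only."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    d2 = np.sum((X - x_query) ** 2, axis=1)        # squared distance to the query point
    weights = np.exp(-d2 / (2.0 * tau ** 2))       # Gaussian kernel: nearby points dominate
    sw = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(Xb * sw[:, None], y * sw, rcond=None)
    return np.concatenate([[1.0], x_query]) @ beta

X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y_curve = np.sin(2 * np.pi * X[:, 0])              # hypothetical nonlinear target
y_hat = lwl_predict(X, y_curve, np.array([0.25]), tau=0.05)
```

Adding a training point only requires appending it to `X` and `y`, which reflects the easy incorporation of new data mentioned above; the price is that every prediction revisits the stored data.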
Recently, LWL was used in [114] in order to predict the cytotoxicity of nanoparticles in biological systems given an ensemble of attributes. It was found that, when the data were further validated, the LWL classifier was the only one, out of a set of classifiers, that could offer predictions with high accuracy.
5) Machine learning scoring functions: SFs are used to assess docking performance, i.e., to predict how a small molecule binds to a target, and can be applied whenever a structural model of such a target is available. However, despite the considerable research effort dedicated in recent years to improving the accuracy of SFs for structure-based binding affinity prediction, the achieved progress seems to be limited. ML-SFs have recently been proposed to fill this performance gap. These are based on ML regression models without a predetermined functional form and, thus, are able to efficiently exploit a much larger amount of experimental data [116]. The concept behind ML-SFs is that the classical approach of using linear regression with a small number of expert-selected structural features can be strongly improved by using nonlinear ML regression together with comprehensive data-driven feature selection (FS). Also, the authors of [117] investigated whether the superiority of ML-SFs over classical SFs, on average across targets, is exclusively due to the presence in the training set of proteins that are highly similar to those in the test set.

Fig. 10. Examples of classical and ML-SFs (from [116]): DOCK (force-field SF), PMF (knowledge-based SF), and X-Score (empirical SF, with terms such as vdW, HBonds, rotor, and hydrophobic) versus an ML-SF.
In Fig. 10, examples of classical and ML-SFs are depicted [116]. The first three (DOCK, PMF, and X-Score) are classical SFs, which are distinguished by the employed structural descriptors. As is evident, they all assume an additive functional form. On the other hand, ML-SFs make no assumptions about their functional form, which is inferred from the training data.
C. Support Vector Machine
NNs can be efficiently used in classification when a large amount of data is available for training. However, in many cases this method outputs a locally optimal solution instead of a globally optimal one. SVM is a supervised learning technique, which can overcome these shortcomings of NNs in classification and regression. For a brief but useful description of the SVM, please see [118] and references therein. Next, for the convenience of the reader, the SVM is summarized following [118].
The aim of SVM is to find a classification criterion that can effectively distinguish data at the testing stage. For two-class data, this criterion can be a line that has the maximum distance from each class. This linear classifier is also known as an optimal hyperplane. In Fig. 11, the linear hyperplane is described, for a set of training data $\mathbf{x}_i$, $i = 1, 2, \dots, n$, as
$$\mathbf{w}^T\mathbf{x} + b = 0, \tag{8}$$
where $\mathbf{w}$ is the weight vector normal to the hyperplane and $b$ is a bias term.
This hyperplane should satisfy two specific properties: (1) the least possible error in data separation, and (2) the maximum distance from the closest data of each class. Under these conditions, the data of each class lie on only one side of the hyperplane. Therefore, two margins can be defined to ensure the separability of data as:
$$\mathbf{w}^T\mathbf{x}_i + b \begin{cases} \geq 1, & \text{for } y_i = 1, \\ \leq -1, & \text{for } y_i = -1. \end{cases} \tag{9}$$
Fig. 11. The SVM method [118] (a hyperplane separating Class 1 from Class 2).
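For the linearly separable case, the general SVM formulation is the following dual problem, which is subject to two constraints:
$$\begin{aligned} \max_{\boldsymbol{\alpha}}\; L_d(\boldsymbol{\alpha}) &= \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{N} y_i y_j \alpha_i \alpha_j \mathbf{x}_i^T\mathbf{x}_j \\ \text{s.t.}\quad & \alpha_i \geq 0, \quad \sum_{i=1}^{N}\alpha_i y_i = 0, \end{aligned} \tag{10}$$
where $\alpha_i$ are the Lagrange multipliers.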
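Eq. (10) is used in order to find the support vectors, i.e., the training inputs with $\alpha_i > 0$. The parameter $\mathbf{w}$ of the hyperplane (decision function) can then be obtained as
$$\mathbf{w}_0 = \sum_{i=1}^{N} \alpha_i \mathbf{x}_i y_i, \tag{11}$$
and the bias parameter $b$ can be calculated by averaging over the $N_S$ support vectors $\mathbf{x}_s$ as
$$b_0 = \frac{1}{N_S}\sum_{s=1}^{N_S} \left(y_s - \mathbf{w}_0^T\mathbf{x}_s\right). \tag{12}$$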
More details about the use of the linear as well as the nonlinear SVM methods can be found in [118].
An indicative training algorithm for SVMs is sequential minimal optimization (SMO). The training of an SVM requires the solution of a large quadratic programming (QP) optimization problem. Conventionally, the QP problem is solved by complex numerical methods; however, SMO breaks the problem down into the smallest possible sub-problems and solves them analytically, thus significantly reducing the required time. At each step, SMO chooses two Lagrange multipliers to optimize, which can be done analytically, and updates the SVM accordingly. Interestingly, two is the smallest number of Lagrange multipliers that can be optimized jointly: the box constraints (e.g., $0 \leq \alpha_i \leq C$ in the soft-margin SVM) confine the two multipliers to a square, while the linear equality constraint confines them to a diagonal line, so that the minimum lies on a diagonal line segment. If only one multiplier were used, SMO would not be able to guarantee that the linear constraint is fulfilled at every step [119]. Moreover, SMO ensures convergence by means of Osuna's theorem, since it is a special case of the Osuna algorithm, which is guaranteed to converge [120]. Recently, in [114], SMO was one of the classifiers used to predict the cytotoxicity of polyamidoamine (PAMAM) dendrimers, which are well-documented nanoparticles that have been proposed as suitable carriers of various bioactive agents.
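The two-multiplier update at the heart of SMO can be sketched as follows. This is a simplified variant, assuming a linear kernel, a soft-margin bound C, and a randomly chosen second multiplier instead of the selection heuristics of the full algorithm; the helper name, tolerances, and data are hypothetical. Each update clips $\alpha_j$ to the diagonal line segment [L, H] and adjusts $\alpha_i$ so that $\sum_i \alpha_i y_i$ stays constant.

```python
import numpy as np

def simplified_smo(X, y, C=10.0, tol=1e-3, max_passes=10, max_iter=1000, seed=0):
    """Simplified SMO for a linear soft-margin SVM; y must contain +1/-1 labels."""
    rng = np.random.default_rng(seed)
    n = len(y)
    K = X @ X.T                                        # linear kernel x_i^T x_j
    alpha, b = np.zeros(n), 0.0
    passes = iters = 0
    while passes < max_passes and iters < max_iter:
        changed = 0
        for i in range(n):
            E_i = (alpha * y) @ K[:, i] + b - y[i]     # prediction error on x_i
            if (y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0):
                j = (i + rng.integers(1, n)) % n       # random second multiplier, j != i
                E_j = (alpha * y) @ K[:, j] + b - y[j]
                a_i, a_j = alpha[i], alpha[j]
                # Box constraints plus sum(alpha * y) = const restrict alpha_j to [L, H]
                if y[i] != y[j]:
                    L, H = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
                else:
                    L, H = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
                eta = 2.0 * K[i, j] - K[i, i] - K[j, j]
                if L == H or eta >= 0:
                    continue
                new_j = np.clip(a_j - y[j] * (E_i - E_j) / eta, L, H)   # analytic 1-D optimum
                if abs(new_j - a_j) < 1e-5:
                    continue
                alpha[j] = new_j
                alpha[i] = a_i + y[i] * y[j] * (a_j - new_j)            # keeps sum(alpha * y) fixed
                b1 = b - E_i - y[i] * (alpha[i] - a_i) * K[i, i] - y[j] * (new_j - a_j) * K[i, j]
                b2 = b - E_j - y[i] * (alpha[i] - a_i) * K[i, j] - y[j] * (new_j - a_j) * K[j, j]
                b = b1 if 0 < alpha[i] < C else (b2 if 0 < alpha[j] < C else 0.5 * (b1 + b2))
                changed += 1
        passes = passes + 1 if changed == 0 else 0
        iters += 1
    w = (alpha * y) @ X                                # Eq. (11)
    return w, b, alpha

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])   # hypothetical data
y = np.concatenate([-np.ones(20), np.ones(20)])
w, b, alpha = simplified_smo(X, y)
```

The weight vector is recovered from (11), while the bias is updated incrementally from the optimality conditions of the two chosen points rather than by the averaging in (12).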
SVMs have been applied in many significant applications in bioinformatics and biomedical engineering. Examples include protein classification, detection of splice sites, and analysis of gene expression, including gene selection for microarray data, where a special type of SVM, called Potential SVM, has been successfully used for the analysis of brain tumor, lymphoma, and breast cancer data sets ([121] and references therein).
Recently, SVM was considered in MCs. Specifically, in [122], the authors proposed injection velocity as a very promising modulation method in turbulent diffusion channels, which can be applied in several practical applications, such as pollution monitoring, where inferring the pollutant ejection velocity may give an indication of the rate of the underlying activities. In order to increase the reliability of inference, a time difference SVM technique was proposed to identify the initial velocities. It was shown that this can be achieved with very high accuracy.
In [123], a diffusion-based molecular communication system model was proposed with the use of a spherical transceiver and a trapezoidal container. The model was developed through SVM regression and other ML techniques, and it was shown to perform with high accuracy, especially at long distances.
D. k-Nearest Neighbors
KNN is a supervised ML classifier and regressor. It evaluates the distance between the test sample and the training samples and makes its prediction accordingly. The idea behind KNN is to classify a test sample based on its k nearest neighbors. Other names for this ML algorithm are memory-based, example-based, and case-based classification.
KNN classification consists of two stages: determining the nearest neighbors and then determining the class using those neighbors. The KNN algorithm can be briefly described as follows [124]. Let us consider a training data set $D$ consisting of training samples $\mathbf{x}_i$, $i\in[1,|D|]$. The examples are described by a set of features $F$, which are normalized in the range $[0,1]$. Each training example is labelled with a class label $y_j\in Y$. The aim is to classify an unknown example $\mathbf{q}$. To achieve this, for each $\mathbf{x}_i\in D$ we evaluate the distance between $\mathbf{q}$ and $\mathbf{x}_i$ as
$$d(\mathbf{q},\mathbf{x}_i)=\sum_{f\in F} w_f\,\delta(q_f,x_{if}).\tag{13}$$
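Here, $w_f$ is the weight of feature $f$ and $\delta(q_f,x_{if})$ is a per-feature distance. There are many choices for this distance metric. A basic choice that handles both continuous and discrete attributes is
$$\delta(q_f,x_{if})=\begin{cases}0, & f \text{ discrete and } q_f=x_{if},\\ 1, & f \text{ discrete and } q_f\neq x_{if},\\ |q_f-x_{if}|, & f \text{ continuous}.\end{cases}\tag{14}$$
For continuous attributes, this choice amounts to a weighted Manhattan (L1) distance.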
The k nearest neighbors are selected based on this distance metric. They can be used in a variety of ways to determine the class of $\mathbf{q}$. The most straightforward approach is to assign the majority class among the nearest neighbors to the query.
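Equations (13) and (14) and the majority-vote rule fit in a few lines of Python. The following is a minimal sketch. The feature weights $w_f$ and the per-feature `discrete` flags are caller-supplied assumptions, and continuous features are assumed to be already normalized to $[0,1]$, as stated above. The function names are illustrative.

```python
from collections import Counter

def delta(q_f, x_f, discrete):
    """Per-feature distance of eq. (14)."""
    if discrete:
        return 0.0 if q_f == x_f else 1.0
    return abs(q_f - x_f)

def distance(q, x, weights, discrete):
    """Weighted distance of eq. (13): sum over features of w_f * delta."""
    return sum(w * delta(qf, xf, d) for qf, xf, w, d in zip(q, x, weights, discrete))

def knn_classify(q, X, y, k, weights, discrete):
    """Assign q the majority class among its k nearest training examples."""
    order = sorted(range(len(X)), key=lambda i: distance(q, X[i], weights, discrete))
    votes = Counter(y[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

For example, with one continuous and one discrete feature, the query `(0.15, 'a')` is closest to the two training examples labelled `'A'`, so a 3-NN vote returns `'A'`.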
Fig. 12. The KNN ML method (legend: Class A, Class B) [125].
Figure 12 depicts 3-NN and 6-NN classification on a two-class problem in a two-dimensional space [125]. The red star represents the test data point, whose value is (2,1,3). The test point is surrounded by yellow and blue dots, which represent the two classes. We compute the distance from the test point to each of the dots on the graph; since there are 10 dots, we obtain 10 distances. We then determine the lowest distance and predict that the test point belongs to the same class as its nearest neighbor. For example, if a yellow dot is the closest, we predict that the test data point is also a yellow dot. In some cases, two distances can be exactly equal. In that case, we take a third data point into consideration and calculate its distance from the test data. In Fig. 12, the test data lies between a yellow and a blue dot. Considering the distance from the third data point, we predicted that the test data belongs to the blue class.
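The advantages of KNN are its simple implementation and the fact that it requires no prior assumptions about the data. Its main disadvantage is the high prediction time, since the distance between each query and every training sample has to be computed.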
E. Dimensionality Reduction
This section discusses dimensionality reduction methods. In particular, a comprehensive description of FS is provided in Section III-E1. Likewise, principal component analysis (PCA) and linear discriminant analysis (LDA) are discussed in Sections III-E2 and III-E3, respectively. Finally, Section III-E4 presents the fundamentals of independent component analysis (ICA).
1) Feature Selection: FS reduces the complexity of a problem by detecting the subset of features that contribute most to the results. FS is one of the core concepts in ML and strongly affects the achievable performance. It is important to point out that FS differs from dimensionality reduction methods such as PCA and LDA. Both approaches seek to reduce the number of attributes in the data set. However, a dimensionality reduction method does so by creating new combinations of attributes, whereas FS methods include or exclude attributes present in the data without changing them.
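To illustrate the distinction drawn above, the following minimal sketch implements a simple filter-style FS criterion. Each original feature is scored by the absolute Pearson correlation with the target, and the indices of the top-k features are kept. Correlation ranking is our illustrative choice, not the specific FS method used in [126]–[128]. Because the function returns column indices, `X[:, idx]` keeps the selected attributes unchanged, unlike PCA or LDA.

```python
import numpy as np

def select_top_k_features(X, y, k):
    """Filter-style FS: score each original feature by |Pearson correlation|
    with the target y and return the column indices of the k best ones."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf            # constant features get a score of 0
    scores = np.abs(Xc.T @ yc) / denom
    return np.argsort(scores)[::-1][:k]   # highest-scoring features first
```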
Combining ML algorithms with FS has proven very useful for disease detection [126], [127], since FS highlights the features associated with a specific target from a larger pool. For instance, in [128], a classification algorithm was used to analyze 10000 genes from 200 cancer patients, while FS was used to associate 50 of them with metastatic prostate cancer. The selected features were then used as biomarker signature criteria in an ML algorithm for classification and diagnostics. Furthermore, recent research efforts have provided evidence that combining data from multiple sources, such as transcriptomics and metabolomics, to create composite signatures can improve the accuracy of biomarker signatures and disease diagnoses [129].
2) Principal Component Analysis: PCA [87], [130]–[132] is an approach to the problem of blind source separation (BSS). BSS aims to separate a set of source signals from a set of mixed signals, with little information about the source signals or the mixing process. PCA uses the eigenvectors of the covariance matrix to determine which linear combinations of input variables contain the most information. It can also be used for feature extraction and dimensionality reduction. For cases with strong response variations, PCA provides an effective way to rapidly process, de-noise, and compress data; however, it cannot explicitly classify data.
More specifically, in PCA, the $d$-dimensional data are represented in a lower-dimensional space, which reduces the degrees of freedom as well as the space and time complexity. PCA aims to represent the data in the space that best expresses their variation in a sum-squared-error sense, and it is used to segment signals from multiple sources. As in standard clustering methods, it is useful to determine the number of components in advance. The eigenvectors $\mathbf{w}_k$ and the corresponding eigenvalues $\lambda_k$ are calculated from the covariance matrix $\mathbf{C}=\mathbf{A}\mathbf{A}^T$, where $\mathbf{A}$ denotes the (mean-centered) matrix of all experimental data points. The eigenvectors are orthogonal and are ordered so that the corresponding eigenvalues are in descending order, i.e., $\lambda_1>\lambda_2>\dots$. Consequently, the first eigenvector $\mathbf{w}_1$ contains the most information, i.e., it captures the largest variance, and the amount of information decreases in the following eigenvectors. Therefore, most of the information is contained in the first few eigenvectors, whereas the remaining ones are dominated by noise.
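The eigen-decomposition steps above can be sketched in NumPy as follows. This is a minimal sketch, assuming that $\mathbf{A}$ is stored as a $d\times N$ matrix with one column per data point. We center each variable first and divide by $N-1$; the division only rescales the eigenvalues and does not change the eigenvectors. The function name `pca` and the number of retained components `m` are illustrative.

```python
import numpy as np

def pca(A, m):
    """A: d x N data matrix (one column per data point).
    Returns the top-m eigenvectors (as columns), their eigenvalues,
    and the m x N projection of the data onto them."""
    A = A - A.mean(axis=1, keepdims=True)        # center each variable
    C = A @ A.T / (A.shape[1] - 1)               # d x d sample covariance
    eigvals, eigvecs = np.linalg.eigh(C)         # eigh: ascending order for symmetric C
    order = np.argsort(eigvals)[::-1]            # lambda_1 >= lambda_2 >= ...
    W = eigvecs[:, order[:m]]
    return W, eigvals[order[:m]], W.T @ A
```

Keeping only the first `m` columns of `W` discards the noise-dominated directions mentioned above.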
3) Linear Discriminant Analysis: LDA is another method for solving the BSS problem [87], [131]. LDA identifies the linear combinations of parameters that optimally classify the data, and its main goal is to reduce the dimension of the data. LDA has been used with a nanofluidic system to interpret gene expression data from exosomes and, thus, to classify the disease state of patients. More specifically, LDA creates a new variable that is a combination of the original predictors by maximizing the differences between the predefined groups with respect to this new variable. The predictor scores are used to form the discriminant score, which constitutes a single new composite variable. Therefore, LDA is a significant data dimension reduction technique that compresses the $p$-dimensional predictors into a one-dimensional line. Ideally, at the end of the process each class has a normal distribution of discriminant scores, with the largest possible difference in mean scores between the classes. In practice, however, some overlap between the discriminant score distributions exists. The degree of this overlap is a measure of the success of LDA. The discriminant function used to calculate
the discriminant scores can be expressed as
$$D=w_1Z_1+w_2Z_2+\dots+w_pZ_p,\tag{15}$$
where $w_k$ and $Z_k$, with $k=1,\dots,p$, denote the weights and predictors, respectively. From (15), it can be observed that the discriminant score is a weighted linear combination of the predictors. The weights are estimated so as to maximize the difference between the class mean discriminant scores. To this end, predictors whose class means differ more receive larger weights, whereas the weights decrease as the class means become more similar [133].
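For two classes, one common way to obtain the weights in (15) is Fisher's criterion, $\mathbf{w}\propto\mathbf{S}_W^{-1}(\boldsymbol{\mu}_1-\boldsymbol{\mu}_0)$, where $\mathbf{S}_W$ is the within-class scatter matrix and $\boldsymbol{\mu}_0,\boldsymbol{\mu}_1$ are the class means. The following minimal sketch assumes this two-class Fisher formulation, which is not necessarily the exact estimator used in [133], and uses illustrative function names. It returns the weights and then evaluates (15) for every sample.

```python
import numpy as np

def fisher_lda(X0, X1):
    """Two-class Fisher LDA: one weight per predictor, chosen to separate the
    class-mean discriminant scores relative to the within-class scatter."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    S_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)   # within-class scatter
    return np.linalg.solve(S_w, m1 - m0)

def discriminant_scores(X, w):
    """Eq. (15): D = w_1 Z_1 + ... + w_p Z_p for every row of X."""
    return X @ w
```

Predictors whose class means differ strongly receive the largest weights, which matches the behavior described in the text.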
4) Independent Component Analysis: ICA [87], [131], [132] was introduced in [134] and is another approach to solving the BSS problem. In ICA, the original inputs are transformed into mutually independent features, and the non-orthogonal basis vectors that correspond to the correlations of the data are identified through higher-order statistics. Higher-order statistics are needed because the components are required to be statistically independent, and not merely uncorrelated, i.e., the joint PDF of the components must equal the product of the PDFs of all components.
Let us consider $c$ independent scalar source signals $x_k(t)$, with $k=1,\dots,c$ and $t$ being a time index. The $c$ signals can be grouped into a zero-mean vector $\mathbf{x}(t)$. Assuming that there is no noise and considering the independence of the components, the joint PDF can be expressed as
$$f_{\mathbf{x}}(\mathbf{x})=\prod_{k=1}^{c} f_{x_k}(x_k).\tag{16}$$
A $d$-dimensional data vector, $\mathbf{y}(t)$, can be observed at each moment through
$$\mathbf{y}(t)=\mathbf{A}\mathbf{x}(t),\tag{17}$$
where $\mathbf{A}$ is a $d\times c$ scalar matrix with $d\geq c$.
ICA aims to recover the source signals from the sensed signals; thus, the real matrix $\mathbf{W}=\mathbf{A}^{-1}$ (or its pseudo-inverse when $d>c$) has to be determined. To this end, $\mathbf{A}$ is determined by maximum-likelihood techniques. An estimate of the density, denoted by $\hat{f}_{\mathbf{y}}(\mathbf{y};\mathbf{a})$, is used, and the parameter vector $\mathbf{a}$ that minimizes the difference between the source distribution and the estimate has to be determined. It should be highlighted that $\mathbf{a}$ is formed by the basis vectors of $\mathbf{A}$ and, thus, $\hat{f}_{\mathbf{y}}(\mathbf{y};\mathbf{a})$ is an estimate of $f_{\mathbf{y}}(\mathbf{y})$.
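As a runnable illustration of unmixing, the following NumPy code uses the FastICA fixed-point algorithm with a tanh nonlinearity. This is a deliberate substitution: FastICA is not the maximum-likelihood procedure described above. However, it estimates an unmixing matrix under the same independence assumption, and it is short enough to show in full. The sketch assumes a square mixing matrix ($d=c$), and the function name and iteration count are illustrative.

```python
import numpy as np

def fast_ica(Y, n_iter=200, seed=0):
    """Estimate c independent sources from a c x T matrix of mixtures Y
    (square mixing, d = c). Returns the source estimates and the unmixing
    matrix that maps centered observations to those estimates."""
    rng = np.random.default_rng(seed)
    Y = Y - Y.mean(axis=1, keepdims=True)                 # zero-mean observations
    evals, E = np.linalg.eigh(np.cov(Y))
    V = E @ np.diag(evals ** -0.5) @ E.T                  # whitening matrix
    Z = V @ Y                                             # whitened data, identity covariance
    c, T = Z.shape
    W = rng.standard_normal((c, c))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)                                # nonlinearity g = tanh
        W = G @ Z.T / T - np.diag((1 - G ** 2).mean(axis=1)) @ W   # fixed-point step
        s, U = np.linalg.eigh(W @ W.T)
        W = U @ np.diag(s ** -0.5) @ U.T @ W              # symmetric decorrelation
    return W @ Z, W @ V
```

ICA recovers sources only up to permutation and scaling, so the estimates should be compared with the true sources through the absolute value of their correlation coefficients.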
F. Gradient Descent Method
When a model has one or more inputs, optimizing its coefficients by iteratively minimizing the model's error on the training data becomes a very important procedure. This procedure is called GD, and it starts with random values for each coefficient. The sum of the squared errors is calculated over all pairs of input and output values. The coefficients are then updated along the negative gradient of the error, with a learning rate used as a scale factor. The process is repeated until a minimum sum of squared errors is achieved or no further improvement is possible. In practice, GD is commonly taught using a linear regression model due to its straightforward nature, and it proves to be useful for very large datasets [135].
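The procedure described above can be sketched for a linear regression model as follows. This minimal sketch minimizes the mean rather than the sum of squared errors; dividing by the number of samples only rescales the learning rate. The learning rate, iteration budget, and stopping tolerance are illustrative values, not recommendations from [135].

```python
import numpy as np

def gd_linear_regression(X, y, lr=0.1, n_iter=5000, tol=1e-10, seed=0):
    """Fit y ~ b + X w by batch gradient descent on the mean squared error."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([np.ones((len(X), 1)), X])    # prepend a column for the intercept b
    coef = rng.standard_normal(Xb.shape[1])      # random initial coefficients
    for _ in range(n_iter):
        residual = Xb @ coef - y                  # errors over all input-output pairs
        grad = 2.0 * Xb.T @ residual / len(y)     # gradient of the mean squared error
        coef -= lr * grad                          # step scaled by the learning rate
        if np.linalg.norm(grad) < tol:            # no further improvement possible
            break
    return coef
```

For noise-free data generated by $y=2+3x$ on $[0,1]$, the returned coefficients approach $(2,3)$.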
GD is one of the most popular optimization algorithms for NNs and has been extensively used in nano-scale biomedical engineering. For example, in [29], the authors proposed a method that uses ANNs to approximate light scattering by multilayer nanoparticles, and they used GD to optimize the input parameters of the NN.
G. Active Learning
In AL, also known as the optimal design of experiments, a surrogate model is created from a given data set, and the model is then used to select which data should be obtained next [136]. The selected data are added to the original data set and used to create an updated surrogate model. The process is repeated iteratively, so that the surrogate model is continuously improved. In contrast to classic ML methods, the distinguishing feature of an AL system is that it develops and tests new hypotheses as part of a continuing, interactive learning process. This method of iterative surrogate model screening has already been used in other fields, such as drug discovery and molecular property prediction [137], [138].
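The iterative loop described above can be sketched with a Gaussian-process surrogate and an uncertainty-sampling query rule. Both the surrogate (an RBF-kernel Gaussian process with a fixed length scale of 0.2) and the query rule (pick the candidate with the largest posterior variance) are our illustrative assumptions, not the specific choices of [136]–[138]. In this sketch, `experiment` stands for whatever costly measurement or simulation produces new labels.

```python
import numpy as np

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between 1-D point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Gaussian-process surrogate: posterior mean and variance at x_query."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf(x_query, x_train)
    mean = K_s @ np.linalg.solve(K, y_train)
    var = 1.0 - np.einsum("ij,ji->i", K_s, np.linalg.solve(K, K_s.T))
    return mean, np.maximum(var, 0.0)

def active_learning(experiment, pool, x_init, n_rounds):
    """Iteratively query the pool point where the surrogate is least certain."""
    x = np.asarray(x_init, dtype=float)
    y = experiment(x)
    for _ in range(n_rounds):
        _, var = gp_posterior(x, y, pool)                  # refit the surrogate
        x_next = pool[np.argmax(var)]                      # select the next experiment
        x = np.append(x, x_next)
        y = np.append(y, experiment(np.array([x_next])))   # run it and add the result
    return x, y
```

Each round adds one labelled point and lowers the surrogate's maximum uncertainty over the candidate pool.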
H. Bayesian Machine Learning
Besides being a powerful tool in statistics, Bayes' theorem is widely used in ML to develop classification models, such as the optimal Bayes classifier and Naive Bayes. The optimal Bayes classifier selects the class with the largest a posteriori probability of occurrence. It can be shown that, among all classifiers, the optimal Bayes classifier has the lowest error probability. In most real-life applications, however, the posterior distribution is unknown and can only be estimated. In this case, the Naive Bayes classifier approximates the optimal Bayes classifier by using the empirical distribution and assuming that the predictors are independent. Hence, the Naive Bayes classifier is a simple but suboptimal solution. It should be mentioned that Naive Bayes can be coupled with a variety of methods to improve its accuracy [139]. Furthermore, since it relies on closed-form expressions of the a posteriori probabilities, it takes linear time to compute, in contrast to the expensive iterative approximations commonly used in other methods.
Consider an instance represented by the observation of $n$ features, $\mathbf{x}=(x_1,\dots,x_n)$. Naive Bayes assigns a probability $p(C_k\mid\mathbf{x})$ to each possible class $C_k$ among $K$ possible outcomes. According to Bayes' theorem, the posterior probability is given by the prior times the likelihood over the evidence, i.e.,
$$p(C_k\mid\mathbf{x})=\frac{p(C_k)\,p(\mathbf{x}\mid C_k)}{p(\mathbf{x})}.\tag{18}$$
The evidence $p(\mathbf{x})$ does not depend on $C_k$, so it does not affect the classification. Naive Bayes is called naive because it assumes that all features in $\mathbf{x}$ are mutually independent conditioned on $C_k$. Therefore, it assigns a class label as
$$\hat{y}=\arg\max_{k\in\{1,\dots,K\}} p(C_k)\prod_{i=1}^{n}p(x_i\mid C_k).\tag{19}$$
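Rule (19) can be implemented once a form for $p(x_i\mid C_k)$ is chosen. The following minimal sketch assumes Gaussian per-feature likelihoods (Gaussian Naive Bayes), which is our assumption, since the text leaves the likelihood model open. It evaluates (19) in the log domain, so the product over features becomes a numerically safer sum. The small variance floor of $10^{-9}$ is an illustrative guard against division by zero.

```python
import numpy as np

class GaussianNaiveBayes:
    """Naive Bayes with Gaussian p(x_i | C_k), applying rule (19) in log form."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.log_prior = np.log([np.mean(y == c) for c in self.classes])   # log p(C_k)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) for c in self.classes]) + 1e-9
        return self

    def predict(self, X):
        # log p(x_i | C_k) for every sample, class and feature; summing over the
        # features corresponds to the product in (19).
        log_lik = -0.5 * (np.log(2 * np.pi * self.var)[None, :, :]
                          + (X[:, None, :] - self.mu[None, :, :]) ** 2 / self.var[None, :, :])
        scores = self.log_prior + log_lik.sum(axis=2)
        return self.classes[np.argmax(scores, axis=1)]
```

The evidence term $p(\mathbf{x})$ is never computed, in line with the observation above that it does not affect the decision.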
Bayesian analysis and ML play an important role in various aspects of nanotechnology and related molecular-scale research. Recently, it has been shown that combining the atomistic Green's function method with Bayesian optimization can optimize the interfacial thermal conductance of Si-Si and Si-Ge nano-structures [140]. This method was able to identify the optimal structures among 60000 candidate structures. Furthermore, more recent works have relaxed the data requirements by adapting output parameters to unsupervised learning methods, such as Bayesian statistical methods, that do not rely on an external reference [141]–[143]. In [114], Naive Bayes was applied to predict the cytotoxicity of PAMAM dendrimers. After data pre-processing, Naive Bayes showed a substantial improvement in accuracy despite its simplicity, thus outperforming the other classification methods used in [114].
I. Decision Tree Learning
DTL is a predictive modeling technique used in ML that uses a decision tree to draw conclusions about the target value based on observations. In the tree paradigm, the target values are represented as leaves, while the observations are denoted by branches. There are two types of DTL, namely classification trees and regression trees. In the former, the target variable takes values in a discrete set, while in the latter it is continuous. Furthermore, some techniques, such as boosted trees and bootstrap aggregated (bagged) decision trees, use multiple decision trees. In more detail, the boosted trees method builds an ensemble incrementally by training each new instance to emphasize the training instances that were previously mis-modeled. Bootstrap aggregated decision trees are an early ensemble method that creates multiple decision trees by resampling the training data and letting the trees vote for a consensus prediction.
DTL has been used extensively in nanomedicine to optimize material properties according to the predicted interactions with the target drug, biological fluids, immune system, vasculature, and cell membranes, all of which affect therapeutic efficacy [144]. Specifically, in [145], decision trees were used to classify sequences as effective or ineffective for ribonucleic acid interference (RNAi), in order to recognize key features in their design. In addition, several algorithms that improve the accuracy and efficiency of DTL have been developed over the years. For instance, the J48 algorithm is considered among the most accurate algorithms and has been used in various biomedical tasks, such as predicting cytotoxicity, measured as cell viability [114], [146].
Next, we present the most commonly used DTL methods. In this direction, bootstrap aggregating (bagging) is revisited in Section III-I1, while the operating principles of bagged trees are highlighted in Section III-I2. Moreover, the fundamentals of Naive Bayes trees (NBTree) are discussed in Section III-I3, whereas the adaptive boosting (AdaBoost) approach is reported in Section III-I4. Finally, the random forest (RForest) and M5P approaches are described in Sections III-I5 and III-I6, respectively.
1) Bagging: Bootstrapping methods have been used extensively to reduce the statistical errors of predictors by means of random sampling with replacement. In supervised learning, a training dataset is used to train a predictor. Bootstrap replicas of the training dataset can be employed to generate new predictors. Bagging is a meta-learning algorithm that uses this idea to develop an aggregated predictor, either by averaging the predictors over the learning sets when the output is numerical, or by voting when the output is a class label [147]. More specifically, assume a learning set $\mathcal{L}$ that consists of data $\{(y_n,\mathbf{x}_n), n=1,\dots,N\}$ and a predictor $\varphi(\mathbf{x},\mathcal{L})$, i.e., if the input is $\mathbf{x}$, $y$ is predicted by $\varphi(\mathbf{x},\mathcal{L})$. The learning set consists of $N$ observations. Since it is hard, or in many cases impossible, to obtain more observations to improve the learning set, we turn to bootstrapping. New learning sets $\mathcal{L}^{(B)}$ are created by sampling with replacement from the $N$ observations, which are treated as the population, and this leads to new predictors $\{\varphi(\mathbf{x},\mathcal{L}^{(B)})\}$. The accuracy of the aggregated predictor is determined by the stability of the procedure used to construct each predictor $\varphi$. In particular, bagging improves the accuracy of unstable procedures, where a small variation in the learning set leads to large changes in the predictor.
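The following NumPy code is a minimal sketch of bagging for a numerical output. It draws bootstrap replicas $\mathcal{L}^{(B)}$, trains one base predictor per replica, and averages the predictions. As the base predictor $\varphi$, it assumes a least-squares regression stump on a one-dimensional input, chosen because stumps are unstable in the sense described above. This is an illustrative substitute, not the REPTree learner used in [148].

```python
import numpy as np

def fit_stump(x, y):
    """Base predictor phi: least-squares regression stump on a 1-D input."""
    best = None
    for t in np.unique(x)[:-1]:                      # candidate thresholds
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    if best is None:                                 # all inputs identical: constant model
        m = y.mean()
        return lambda q: np.full(np.shape(q), m)
    _, t, lo, hi = best
    return lambda q: np.where(q <= t, lo, hi)

def bagging(x, y, n_bags=50, seed=0):
    """Aggregate stumps trained on bootstrap replicas by averaging their outputs."""
    rng = np.random.default_rng(seed)
    predictors = []
    for _ in range(n_bags):
        idx = rng.integers(0, len(x), len(x))        # sample N points with replacement
        predictors.append(fit_stump(x[idx], y[idx]))
    return lambda q: np.mean([p(q) for p in predictors], axis=0)
```

Averaging stumps whose thresholds vary from replica to replica yields a smoother, multi-step predictor than any single stump. This illustrates how bagging reduces variance.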
Recently, bagging has been used to predict possible toxic effects caused by exposure to nanomaterials in biological systems [148]. REPTree, a fast decision tree-based learning algorithm, was used as the base predictor $\varphi$. It should be mentioned that the bagging algorithm offered the highest accuracy in terms of the correlation between actual and predicted results.
2) Bagged Tree: Bagging can be applied to any kind of model. With bagged decision trees, the bias can be lowered by leaving the trees un-pruned. Base classifiers with high variance and low bias are essential for bagging, since averaging mainly reduces the variance, and the aggregate classifier can capitalize on this to increase accuracy. In [149], a bagged tree was used successfully in an ensemble classifier with particle swarm optimization (PSO) to predict heart disease.
3) Naive Bayes tree: The NBTree is a hybrid learning approach for classification tasks in which many attributes are deemed relevant but are not sufficiently independent. In practice, an NBTree is a decision tree with Naive Bayes classifiers at the leaf nodes [150]. During tree construction, each node is split on an attribute chosen according to a utility function. If the utility is not sufficiently high, the node becomes a leaf and a Naive Bayes classifier is created at that node. NBTree can deal both with discrete data, by multi-way splits for all values, and with continuous data, by using a threshold split.
In [114], NBTree was used among other learning methods to predict the cytotoxicity of nanomaterials in biological systems. When leave-one-out cross validation was performed, NBTree achieved the best performance, with an accuracy of 77.7%.
4) Adaptive boosting: AdaBoost is a learning method that uses an ensemble of classifiers to improve accuracy [151], [152]. Boosting is a technique that takes a set of weak learners, usually decision tree classifiers, and combines them into a strong one. The procedure can be summarized as follows. A set of labeled training examples $\{(\mathbf{x}_i,y_i)\}$, where
Fig. 13. Random forest diagram: an instance is passed to Tree #1, Tree #2, ..., Tree #N, and majority voting over their outputs gives the final result.
$\mathbf{x}_i$ is an observable quantity and $y_i$ is the outcome, is given to a set of classifiers, each of which is assigned a weight. After every weak classifier has made a prediction, the boosting method combines all the weak hypotheses into a single prediction. AdaBoost does not need prior knowledge of the accuracies of the weak classifiers; instead, it adapts to their errors. In essence, the weak classifiers are tweaked to better handle the data that were mishandled by previous classifiers. In some cases, AdaBoost has been shown to be less susceptible to over-fitting than other learning methods. However, it is sensitive to noisy data and outliers due to its adaptive nature.
AdaBoost was one of the methods used in [149] in an
ensemble classifier together with PSO to predict heart disease.
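The following minimal sketch shows discrete AdaBoost with decision stumps as the weak learners, written in NumPy. It assumes labels in $\{-1,+1\}$ and the standard classifier weight $\alpha=\tfrac{1}{2}\ln\frac{1-\varepsilon}{\varepsilon}$, where $\varepsilon$ is the weighted error of the stump. It is an illustration, not the configuration used in [149]. After each round, misclassified examples gain weight, so the next stump concentrates on them.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weak learner: the rule 'x_f <= t -> s, else -s' with least weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for s in (1, -1):
                pred = np.where(X[:, f] <= t, s, -s)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, f, t, s)
    return best

def adaboost(X, y, n_rounds=20):
    """Discrete AdaBoost with labels y in {-1, +1}; returns (alpha, f, t, s) tuples."""
    w = np.full(len(y), 1.0 / len(y))             # uniform initial example weights
    ensemble = []
    for _ in range(n_rounds):
        err, f, t, s = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)     # avoid log(0) for perfect stumps
        alpha = 0.5 * np.log((1 - err) / err)     # weight of this weak classifier
        pred = np.where(X[:, f] <= t, s, -s)
        w = w * np.exp(-alpha * y * pred)         # mishandled examples gain weight
        w /= w.sum()
        ensemble.append((alpha, f, t, s))
    return ensemble

def adaboost_predict(ensemble, X):
    """Weighted vote of all weak hypotheses."""
    score = sum(a * np.where(X[:, f] <= t, s, -s) for a, f, t, s in ensemble)
    return np.sign(score)
```

For example, labels that are positive only on an inner interval of a single feature cannot be separated by one stump. However, three boosting rounds already classify every training point correctly.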
5) Random Forest: RForest is one of the most widely used ML algorithms due to its simplicity and versatility, since it can be used for both classification and regression. As the name suggests, an RForest is a tree-based ensemble in which each tree depends on a collection of random variables [153]. As depicted in Fig. 13, an RForest combines multiple decision trees that have been trained on different parts of the same training set, in order to reduce the variance. The individual decision trees are trained using the bagging technique; thus, each exploits a random subset of the training data. An advantage of RForest is that, by combining uncorrelated individual trees through bagging, it decreases the variance of the model and makes it more robust to overfitting without increasing the bias. Another technique for combining individual trees is boosting. In boosting, the samples are weighted for sampling so that samples that were predicted incorrectly get a higher weight and are, therefore, sampled more often. The idea is that difficult cases should be emphasized more than easy ones during learning. Because of this difference, bagging can be easily parallelized, while boosting must be performed sequentially. Next, we briefly present the mathematical concept behind the RForest method.
We assume an unknown joint distribution $P_{XY}(X,Y)$, where $X=(X_1,\dots,X_p)^T$ is a $p$-dimensional random vector that represents the predictor variables and $Y$ is the real-valued response. The aim of the RForest algorithm is to find a prediction function $f(X)$ to predict $Y$. The prediction function is the one that minimizes the expected value of the loss function $L(Y,f(X))$, i.e.,
$$E_{XY}\big(L(Y,f(X))\big),\tag{20}$$
where the subscripts denote expectation with respect to the joint distribution of $X$ and $Y$.
Note that $L(Y,f(X))$ is a measure of how close $f(X)$ is to $Y$, and it penalizes values of $f(X)$ that are far from $Y$. Typical choices of $L$ are the squared error loss $L(Y,f(X))=(Y-f(X))^2$ for regression and the zero-one loss for classification:
$$L(Y,f(X))=I\big(Y\neq f(X)\big)=\begin{cases}0, & \text{if } Y=f(X),\\ 1, & \text{otherwise}.\end{cases}\tag{21}$$
It turns out that minimizing $E_{XY}(L(Y,f(X)))$ for the squared error loss gives the conditional expectation $f(x)=E(Y\mid X=x)$, which is known as the regression function. For classification, if the set of possible values of $Y$ is denoted by $\mathcal{Y}$, then minimizing $E_{XY}(L(Y,f(X)))$ for the zero-one loss results in
$$f(x)=\arg\max_{y\in\mathcal{Y}} P(Y=y\mid X=x),\tag{22}$$
which is the Bayes rule.
Ensembles construct $f$ in terms of the so-called "base learners" $h_1(x),\dots,h_J(x)$, which are combined to give the "ensemble predictor" $f(x)$. In regression, the base learners are averaged,
$$f(x)=\frac{1}{J}\sum_{j=1}^{J}h_j(x),\tag{23}$$
while in classification, $f(x)$ is the most frequently predicted class,
$$f(x)=\arg\max_{y\in\mathcal{Y}}\sum_{j=1}^{J}I\big(y=h_j(x)\big).\tag{24}$$
In RForests, the $j$th base learner is a tree denoted as $h_j(X,\Theta_j)$, where $\Theta_j$, $j=1,\dots,J$, is a collection of independent random variables. A deeper understanding of the RForest algorithm requires fundamental knowledge of the type of trees used as base learners.
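The following minimal sketch shows an RForest classifier built from small Gini-based trees. In this sketch, the randomness $\Theta_j$ of each tree comes from its bootstrap sample and from the random subset of `m_try` features examined at every split. The final class is obtained by the majority vote of (24). The tree depth, the number of trees, and `m_try` are illustrative assumptions, and the helpers are our own. A production implementation would add pruning controls, regression leaves for (23), and vectorized split search.

```python
import numpy as np
from collections import Counter

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def build_tree(X, y, rng, max_depth, m_try):
    """Grow a classification tree; each split inspects m_try random features."""
    if max_depth == 0 or len(np.unique(y)) == 1:
        return Counter(y).most_common(1)[0][0]                # leaf: majority label
    best = None
    for f in rng.choice(X.shape[1], m_try, replace=False):
        for t in np.unique(X[:, f])[:-1]:
            left = X[:, f] <= t
            score = left.mean() * gini(y[left]) + (~left).mean() * gini(y[~left])
            if best is None or score < best[0]:
                best = (score, f, t, left)
    if best is None:                                          # no valid split found
        return Counter(y).most_common(1)[0][0]
    _, f, t, left = best
    return (f, t, build_tree(X[left], y[left], rng, max_depth - 1, m_try),
            build_tree(X[~left], y[~left], rng, max_depth - 1, m_try))

def predict_tree(node, x):
    """Follow the splits until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, left_child, right_child = node
        node = left_child if x[f] <= t else right_child
    return node

def random_forest(X, y, n_trees=25, max_depth=3, m_try=1, seed=0):
    """Train each tree h_j on its own bootstrap sample (bagging)."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), len(y))
        trees.append(build_tree(X[idx], y[idx], rng, max_depth, m_try))
    return trees

def predict_forest(trees, x):
    """Eq. (24): the class predicted most often by the base learners."""
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]
```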
6) M5P: The M5 model tree method was introduced by Quinlan in 1992 [154]. Wang and Witten later presented an improved public-domain scheme [155], called M5P, that generates more compact and comprehensible models with slightly better accuracy. M5P combines conventional binary decision tree models with regression planes at the leaves to deal with continuous-class problems. The initial tree split is based on a standard deviation criterion, called standard deviation reduction (SDR), which is given by
$$\mathrm{SDR}=\mathrm{SD}(T)-\sum_{i}\frac{|T_i|}{|T|}\,\mathrm{SD}(T_i),\tag{25}$$
where $\mathrm{SD}(\cdot)$ denotes the standard deviation of the target values in a set, $T$ is the set of learning examples that reach the node, and $\{T_i\}$ are the subsets that result from splitting $T$ according to a chosen attribute. The attribute that maximizes SDR is chosen for the split. However, this process can lead to large tree structures that are prone to over-fitting. Therefore, pruning the tree is
necessary to improve accuracy. For every interior node of the tree, a regression model is calculated with the examples that reach that node. If the subtree error is greater than the error of the regression model at that node, the tree is pruned and the node is turned into a leaf.
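The split criterion (25) is simple to compute directly. The following minimal sketch evaluates SDR for every binary threshold split of every attribute and keeps the one that maximizes it. It covers only the initial split selection, not the leaf regression models or the pruning step of M5P. The function names are illustrative, and `np.std` uses the population standard deviation.

```python
import numpy as np

def sdr(y, subsets):
    """Eq. (25): SD(T) - sum_i |T_i| / |T| * SD(T_i)."""
    return np.std(y) - sum(len(s) / len(y) * np.std(s) for s in subsets)

def best_split(X, y):
    """Return (SDR, attribute index, threshold) of the split maximizing SDR."""
    best = (-np.inf, None, None)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f])[:-1]:
            left = X[:, f] <= t
            score = sdr(y, [y[left], y[~left]])
            if score > best[0]:
                best = (score, f, t)
    return best
```

For example, with targets `[1, 1, 10, 10]`, a split that separates the two pairs reduces both subsets to zero spread. The SDR then equals the full standard deviation, 4.5.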
Recently, M5P was used in [148] to build a simulator that can dynamically predict the mortality rate of cells in biological systems, in order to test possible toxic effects of exposure to nanomaterials. The user of the simulator can change the attribute values dynamically and obtain the predicted value of the considered metric.
J. Decision Table
A DT is a simple tabular representation of conditions and actions [156]. It is very similar to the popular decision trees. A key difference between them is that the former can include more than one “OR” condition. DTs are usually preferred when a small number of features is available, whereas decision trees can be used for more complex models.
Decision Table Naive Bayes: Combined learning models are an efficient way to improve the accuracy of stand-alone models. DT Naive Bayes (DTNB) is such a hybrid model, in which a DT classifier is combined with a naive Bayes network to produce a table with conditional probabilities. The learning process for DTNB splits the attributes into two disjoint subsets and uses one subset to train the DT and the other to train the NB [156]. The goal is to use NB on the attributes that are somewhat independent, since NB already assumes that attributes are independent. Cross validation is well suited to this hybrid model. It is efficient for the DT, because the structure of the table remains the same, and for the NB, because the frequency counts can be updated in constant time.
Assuming that $\mathbf{x}_{DT}$ is the set of attributes used in the DT and $\mathbf{x}_{NB}$ is the respective set of attributes for the NB, the probability of class $k$ can be computed as
$$P(C_k\mid\mathbf{x})=a\,\frac{P(C_k\mid\mathbf{x}_{DT})\,P(C_k\mid\mathbf{x}_{NB})}{P(C_k)},\tag{26}$$
where $a$ is a normalization constant and $P(C_k)$ is the prior probability of the class. DTNB has been shown to achieve significant gains over both DTs and NB. In [114], DTNB was used among other methods to predict the cytotoxicity of nanomaterials in biological systems.
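Equation (26) can be checked numerically with a small helper. The function below is a minimal sketch that assumes the DT and NB parts have already produced their class-probability vectors. It multiplies the two vectors, divides by the prior so that the prior is not counted twice, and lets the normalization constant $a$ make the result sum to one.

```python
import numpy as np

def dtnb_posterior(p_dt, p_nb, prior):
    """Combine P(C_k | x_DT) and P(C_k | x_NB) as in eq. (26)."""
    unnormalized = np.asarray(p_dt) * np.asarray(p_nb) / np.asarray(prior)
    return unnormalized / unnormalized.sum()      # the constant a normalizes the result
```

For example, with $P(C\mid\mathbf{x}_{DT})=(0.6,0.4)$, $P(C\mid\mathbf{x}_{NB})=(0.7,0.3)$ and equal priors, the combined posterior is $(0.84,0.24)/1.08\approx(0.78,0.22)$.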
K. Surrogate-Based Optimization
Surrogate-based optimization [157], [158] refers to a class of optimization methodologies that calculate local or global optima by means of surrogate modeling techniques. This framework uses conventional optimization algorithms, such as gradient-based or evolutionary algorithms, for sub-optimization. Surrogate modeling techniques can significantly improve the design efficiency. They also facilitate finding global optima, filtering numerical noise, carrying out parallel design optimization, and integrating simulation codes from different disciplines into a process chain.
In optimization problems, surrogate models can approximate the cost functions and the state functions. These models are constructed from sampled data obtained by randomly exploring the design space. An optimization algorithm, such as a genetic algorithm, is then applied to the surrogate models to search for a new design that is most likely to be the optimum. Estimating the optimum with a surrogate model is much cheaper than running a numerical analysis code; thus, the computational cost of the search based on the surrogate models is negligible. Since surrogate models are built from the sampled data, the choice of the sample points and the evaluation of the accuracy of the surrogate models are important issues for surrogate modeling.
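The following minimal sketch illustrates the loop described above in one dimension. The design space is sampled at random, a quadratic surrogate is fitted to the samples, the surrogate's optimum is found cheaply, and the costly model is evaluated only at that candidate. The candidate is then added to the samples. Several choices here are assumptions. `expensive_simulation` is a hypothetical stand-in for a numerical analysis code, the quadratic polynomial is just one possible surrogate, and a dense grid search replaces the genetic algorithm mentioned in the text.

```python
import numpy as np

def expensive_simulation(x):
    """Hypothetical stand-in for a costly numerical analysis code."""
    return (x - 0.3) ** 2 + 0.05 * np.sin(20 * x)

def surrogate_optimize(f, lo, hi, n_init=6, n_iter=5, seed=0):
    """Quadratic-surrogate optimization of f on [lo, hi]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, n_init)                  # random exploration of the design space
    y = f(x)
    grid = np.linspace(lo, hi, 2001)
    for _ in range(n_iter):
        coef = np.polyfit(x, y, 2)                   # quadratic surrogate of the cost
        x_new = grid[np.argmin(np.polyval(coef, grid))]   # cheap search on the surrogate
        x = np.append(x, x_new)
        y = np.append(y, f(x_new))                   # one expensive evaluation per iteration
    best = np.argmin(y)
    return x[best], y[best]
```

Only `n_init + n_iter` calls to the expensive function are made, while the surrogate is evaluated thousands of times at negligible cost.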
In [159], surrogate-based optimization is used to search the space of intermetallics for potentially selective catalysts for the CO$_2$ reduction reaction and the hydrogen evolution reaction.
L. Quantitative Structure-Activity Relationships
ML techniques have been combined with QSAR models over the past decade [160]. One of the most successful applications of such models is the faster and lower-cost development of new drugs. QSAR methods are data-driven and based on supervised learning. They capture the complex relationships between the properties of nanomaterials and their biological activity without requiring detailed knowledge of the mechanisms of interaction. In more detail, the biological activity of organic molecules is a function of their structural properties, which depend on their chemical structures. As in [160], these relationships can be expressed as
$$\text{Activity}=f\Big(\sum\text{Properties}\Big),\tag{27}$$
and
$$\text{Property}=f(\text{Structure}).\tag{28}$$
Due to the complexity of the materials, the predictivity of the applied methods must be optimized; thus, various techniques have been used in the literature. Specifically, in [161], QSAR models were developed based on sparse linear FS and regression in conjunction with a minimization algorithm. In [162]–[164], nonlinear FS was used with Bayesian regularized NNs with Gaussian or Laplacian priors. Also, ANNs have recently been employed to forecast the biological activity of compounds under investigation, while the ANN-classification model categorizes the compounds for a specific biological response [165].
M. Boltzmann Generator
The aim of statistical mechanics is to assess the average behavior of physical systems based on their microscopic constituents and interactions. This serves not only to understand the functionalities of molecules and materials, but also to provide the principles for designing drug molecules and materials with novel properties. In this direction, the statistics of the equilibrium states of many-body systems need to be evaluated. To appreciate the complexity of this task, let us try to evaluate the probability that, at a given temperature, a protein will be
Fig. 14. Boltzmann generator, including the re-weighting step.
folded. To solve this problem, we would need to examine each of the huge number of ways to place all the atoms of the protein in a predetermined space and, for each one of them, extract the corresponding probability. However, since enumerating all configurations is extremely difficult or even infeasible, [28] identified the need to sample them from their equilibrium distribution. In that work, the authors proposed the Boltzmann generator, which combines deep ML and statistical mechanics in order to learn to sample equilibrium distributions. In contrast to conventional generative learning, the Boltzmann generator is not trained to learn the probability density from data. Instead, it is trained to directly produce independent samples of low-energy structures for condensed-matter systems and protein molecules.
As presented in Fig. 14, the operation principle of the Boltzmann generator consists of two parts:
1) A generative model, $F_{z\to x}$, is trained to provide samples from a stochastic distribution, which is described by the probability density function (PDF) $f_x(x)$, when $z$ is sampled from a simple prior, such as a Gaussian distribution with PDF $f_z(z)$.
2) A re-weighting process transforms the generated distribution, $f_x(x)$, into the Boltzmann distribution and produces unbiased samples from $e^{-u(x)}$, with $u(x)$ being the dimensionless energy.
Note that both training and re-weighting require knowledge of $f_x(x)$. This can be ensured by adopting an invertible transformation $F_{z\to x}$, which allows us to transform $f_z(z)$ to $f_x(x)$.
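A minimal one-dimensional sketch of the two parts follows, under loudly stated assumptions: the trained deep invertible network is replaced by a hypothetical affine map $F_{z\to x}(z) = \mu + s z$, and the dimensionless energy is a hypothetical harmonic well $u(x) = (x-2)^2/2$, whose Boltzmann distribution is a unit-variance Gaussian centred at 2. Because the map is invertible, the density of each generated sample follows from the change of variables $f_x(x) = f_z(F_{x\to z}(x))\,|\mathrm{d}F_{x\to z}(x)/\mathrm{d}x|$, and re-weighting assigns each sample a weight $w(x) \propto e^{-u(x)}/f_x(x)$. This illustrates the principle only; it is not the implementation of [28].

```python
import numpy as np


def sample_boltzmann_generator(n, mu=1.5, scale=1.3, seed=0):
    """Draw re-weighted samples from a toy 1-D 'Boltzmann generator'.

    The hypothetical affine map F_{z->x}(z) = mu + scale * z stands in for a
    trained invertible network; u(x) = (x - 2)^2 / 2 is a hypothetical energy.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)               # z ~ f_z(z) = N(0, 1)
    x = mu + scale * z                       # x = F_{z->x}(z)

    # Change of variables via the inverse map F_{x->z}(x) = (x - mu) / scale
    z_back = (x - mu) / scale
    log_fz = -0.5 * z_back**2 - 0.5 * np.log(2.0 * np.pi)
    log_fx = log_fz - np.log(abs(scale))     # |dF_{x->z}/dx| = 1 / |scale|

    # Dimensionless energy of each generated sample
    u = 0.5 * (x - 2.0) ** 2

    # Importance weights w(x) proportional to exp(-u(x)) / f_x(x), in log space
    log_w = -u - log_fx
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                             # self-normalized weights
    return x, w


x, w = sample_boltzmann_generator(200_000)
print("generator mean:", x.mean())           # biased towards mu = 1.5
print("re-weighted mean:", np.sum(w * x))    # close to the Boltzmann mean, 2
```

Without re-weighting, the sample mean stays near the generator's centre $\mu = 1.5$; the self-normalized weighted mean recovers the Boltzmann mean. In more than one dimension, the derivative is replaced by the Jacobian determinant of $F_{x\to z}$.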
Fig. 15. Examples of classical and ML-SFs (from [166]). [Figure labels: Input, System, Output, Search algorithm, Regression analysis, Refine input; panels (a) to (e)]
N. Feedback System Control
FSC [166] is a recently proposed method for the optimization of drug combinations. FSC is a phenotypically driven optimization process, which does not require any mechanistic knowledge of the system. For this reason, FSC can be successfully applied to various complex biological systems (see [167] and references therein).
The FSC method is based on the closed-loop feedback control process outlined in Fig. 15 [166]. It mainly consists of two steps. The first step is the definition of an initial set of compounds to be tested. The second step is the generation of broad dose-response curves for each compound in a cellular bioassay, which is selected to provide a phenotypic output response; this response is used to evaluate the efficacy of the drugs and drug combinations on overall cell activity.
A schematic representation of the FSC technique is presented in Fig. 15. The five main components of the optimization process are:
(a) The input, i.e., the drug combinations with defined drug doses.
(b) The system, i.e., the selected cell type representing the disease under study.
(c) The system output, i.e., the cellular response to the defined drug combination input in the selected cell bioassay.
(d) The search algorithm that iteratively drives the system output toward the desired response.
(e) The statistical analysis used to guide drug elimination.
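The closed loop above can be illustrated with a toy sketch. The cell-based system and its readout are replaced by a hypothetical function `simulated_viability` with an arbitrarily placed optimum, and a simple stochastic hill-climbing search is swapped in for the search algorithm of component (d); the statistical analysis of component (e) is not modeled. This is not the algorithm of [166].

```python
import random


def simulated_viability(doses):
    """Hypothetical readout (fraction of viable cells, lower is better).

    Stands in for the system (b) and its output (c); the optimum at
    doses (0.6, 0.3) is an arbitrary choice for illustration.
    """
    a, b = doses
    return 0.2 + (a - 0.6) ** 2 + (b - 0.3) ** 2


def fsc_search(initial, n_iter=2000, step=0.05, seed=1):
    """Closed loop: propose doses, measure the response, keep improvements."""
    rng = random.Random(seed)
    best = list(initial)
    best_response = simulated_viability(best)
    for _ in range(n_iter):
        # (a) new input: perturb the current best combination, clipped to [0, 1]
        candidate = [min(1.0, max(0.0, d + rng.gauss(0.0, step))) for d in best]
        response = simulated_viability(candidate)    # (b)-(c) system output
        if response < best_response:                  # (d) drive output down
            best, best_response = candidate, response
    return best, best_response


doses, viability = fsc_search(initial=[0.1, 0.9])
print("optimized doses:", doses, "viability:", viability)
```

In an actual FSC campaign, each evaluation of the system is a cell-based experiment, so the number of iterations would be far smaller and the readout noisy.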
O. Quadratic Phenotypic Optimization Platform
Methods based on ML, like FSC, aim to overcome the disadvantages of traditional methods, such as high-throughput screening. Recently, a powerful AI platform called Quadratic Phenotypic Optimization Platform (QPOP) was proposed to interrogate a large pool of potential drugs and to design a novel combination therapy against multiple myeloma [68]. This platform can efficiently and iteratively output effective drug combinations and can optimize the drug doses.
The main concept of QPOP lies in mapping the relationship between inputs (e.g., drugs) and desired phenotypic outputs (e.g., cell viabilities) to a smooth, second-order quadratic surface representative of the biological system of interest. Since QPOP utilizes only the controllable inputs and measurable phenotypic outputs of the biological system, it is able to identify optimal drug combinations and doses independently of predetermined drug synergy information and pharmacokinetic properties. Furthermore, QPOP utilized ML in order to preclinically re-optimize the combination and successfully translate the multi-drug regimen through in vivo validation. Importantly, both the in vitro and preclinical re-optimization processes were able to simultaneously take into account efficacy and safety, which is a key aspect of the QPOP platform.
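The core idea of fitting a second-order surface to input/output data and then ranking dose combinations can be sketched with ordinary least squares. The two-drug setting, the 3×3 full-factorial design, and the noiseless function `hypothetical_assay` are assumptions for illustration only; they do not reproduce the experimental design or data of [68].

```python
import itertools

import numpy as np


def quadratic_features(doses):
    """Second-order terms for two drugs: 1, a, b, a^2, b^2, a*b."""
    a, b = doses[:, 0], doses[:, 1]
    return np.column_stack([np.ones_like(a), a, b, a**2, b**2, a * b])


def hypothetical_assay(doses):
    """Hypothetical, noiseless cell-viability readout (lower is better)."""
    a, b = doses[:, 0], doses[:, 1]
    return 0.9 - 0.5 * a - 0.3 * b + 0.4 * a**2 + 0.3 * b**2 - 0.2 * a * b


# Experimental design: 3 dose levels per drug (9 combinations)
levels = [0.0, 0.5, 1.0]
design = np.array(list(itertools.product(levels, levels)))
viability = hypothetical_assay(design)

# Fit the quadratic response surface by ordinary least squares
coef, *_ = np.linalg.lstsq(quadratic_features(design), viability, rcond=None)

# Rank candidate combinations on a fine dose grid using the fitted surface
grid = np.array(list(itertools.product(np.linspace(0.0, 1.0, 101), repeat=2)))
predicted = quadratic_features(grid) @ coef
best = grid[np.argmin(predicted)]
print("fitted coefficients:", np.round(coef, 3))
print("predicted optimal doses:", best)
```

Only the measured inputs and outputs enter the fit, which mirrors the point made above that no mechanistic or pharmacokinetic information is needed.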
QPOP can also be used as an actionable platform to design patient-specific regimens. This multi-parametric global optimization methodology can overcome many of the difficulties of the drug development process and can result in efficient and safe therapies. This may reshape drug development, translating into improved and more effective treatment choices. More details about the use of the QPOP platform in biomedicine applications can be found in [168] and [169] and references therein.
IV. DISCUSSION & THE ROAD AHEAD
In this section, we clarify how the ML methodologies presented in Section III can be efficiently used to solve the problems discussed in Section II, and elaborate on some major open research problems, which are of great importance for unveiling the potential benefits, advantages, and limitations of employing ML in nano-scale biomedical engineering. In this direction, Table I, which is given at the top of the next page, connects the ML challenges with the ML methodologies that have been used in nano-scale biomedical engineering.
From Table I, it becomes evident that ANNs can be employed to solve a large variety of ML problems in nano-scale biomedical engineering. CNNs, RNNs, and DNNs are capable of identifying patterns, locating and classifying target objects in an image, and detecting events [170]. As a result, they can excel in the development of ARES, which contributes to the discovery, design, and performance optimization of nano-structures and nano-materials. Furthermore, they can be used for the detection of received symbols in molecular and electromagnetic nano-networks, for the classification of observations that may provide a better understanding of biological and chemical processes, and for the identification of specific patterns. On the other hand, D2NNs can efficiently execute identification and classification tasks after being trained on large datasets. Therefore, they have been successfully used in lens imaging in the THz band, and they are expected to find application in image analysis, feature detection, and object classification. In other words, D2NNs may be employed for the discovery of heterogeneous nano-structures, channel estimation and symbol detection in nano-scale molecular and THz networks, as well as disease detection and therapy development.
By inducing the algorithm to learn complex relationships within a training dataset and to make judgments on test datasets with high fidelity, GRNNs provide a systematic methodology to map inputs to predictive outputs. As a consequence, they have been applied in several fields, including optical character recognition, pattern recognition, and manufacturing, for predicting the output classification [171], [172]. In nano-scale biomedical engineering, they have been extensively used in discovering the properties of, and designing, heterogeneous nano-structures [172], [173], as well as in analyzing the data collected from them [174]. However, their applicability to problems specific to molecular and electromagnetic nano-scale networks remains to be assessed.
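In its standard formulation, a GRNN prediction is a Gaussian-kernel-weighted average of the training targets, $\hat{y}(x) = \sum_i y_i e^{-\|x - x_i\|^2/(2\sigma^2)} / \sum_i e^{-\|x - x_i\|^2/(2\sigma^2)}$, with a single smoothing parameter $\sigma$. The sketch below implements this estimator in NumPy on a hypothetical one-dimensional data set (samples of $\sin x$); the value $\sigma = 0.2$ is an arbitrary assumption.

```python
import numpy as np


def grnn_predict(x_train, y_train, x_query, sigma=0.2):
    """GRNN estimate with a Gaussian kernel.

    Pattern layer: one Gaussian unit per stored training sample.
    Summation/output layers: kernel-weighted average of the targets.
    """
    x_train = np.asarray(x_train, dtype=float).reshape(len(x_train), -1)
    x_query = np.asarray(x_query, dtype=float).reshape(len(x_query), -1)
    # Squared Euclidean distance between every query and every training point
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=-1)
    k = np.exp(-d2 / (2.0 * sigma**2))
    return (k @ np.asarray(y_train, dtype=float)) / k.sum(axis=1)


x_train = np.linspace(0.0, 2.0 * np.pi, 50)
y_train = np.sin(x_train)
print(grnn_predict(x_train, y_train, [np.pi / 2, np.pi]))
```

Training amounts to storing the samples, so the kernel width $\sigma$ is the main design choice: small values follow the data closely, while large values smooth it.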
Based on Cybenko’s theorem [175], MLPs are proven to be universal function approximators. In other words, they can return low-complexity approximate solutions to extremely complex problems. As a result, MLPs were a popular ML method in the 1980s in several fields, including speech and image recognition (see, e.g., [176], [177] and references therein). In nano-scale biomedical engineering, MLPs have been applied for nano-structure properties discovery [178], [179] and data analysis [84]. However, they are expected to be replaced by the much simpler SVMs, which are considered their main competitors.
GANs have recently been used to inversely design metasurfaces in order to provide arbitrary patterns of the unit cell structure [180]. However, their training is highly unstable. To solve this problem, conditional deep convolutional GANs are usually employed. These networks return very stable Nash equilibrium solutions that can be used for inversely designing nanophotonic structures [181], [182]. Another application of GANs lies in the statistical characterization of psychological wellness states [84]. In general, for applications in which the data have a non-linear behavior, GANs achieve similar performance to SVMs and k-nearest neighbors (KNN), and outperform MLPs.
Classical force fields can neither easily scale to large molecules nor be transferred to different environments. To break these limitations, BPMs, DPNs, DTNNs, SchNets, and CGNs have been traditionally used to model the PESs and atomic forces in large molecules, such as proteins, and to provide transferability to different covalent and non-covalent environments. However, these approaches are incapable of reaching the required accuracy at an evaluation complexity lower than that of classical force fields. Motivated by this, symmetrized gradient-domain ML has been very recently presented as a possible solution to the aforementioned problem [14], [183]–[185]. The limitation of this ML approach is that it cannot support molecules that consist of more than 20 atoms. In other words, it lacks scalability and transferability. To counter this, researchers should turn their attention to combining BPMs, DPNs, DTNNs, SchNets, and CGNs with gradient-domain ML in order to achieve high accuracy in configuration and chemical space simulations. A plethora of new insights await as a result of such simulations.
Regression approaches have been used to extract the relationship between several independent variables and one dependent variable. Therefore, they have supported the solution of a large variety of problems that range from the area of nano-materials and nano-structure design to data-driven applications in biomedical engineering [112], [123]. Moreover, they usually require neither input-feature scaling nor hyperparameter tuning, and they are easy to regularize. However, they are incapable of solving non-linear problems. Another disadvantage of regression approaches is that they require the identification of all the important independent attributes before the data are fed into the model. Moreover, most of them return discrete outputs, i.e., they only provide categorical outcomes. Finally, they are prone to overfitting [186].

TABLE I
ML PROBLEMS AND SOLUTIONS.

| ML approach family | ML approach | Structure and material design and simulation | Communications and signal processing | Applications |
|---|---|---|---|---|
| ANNs | Convolutional neural networks | X | X | X |
| ANNs | Recurrent neural networks | X | X | X |
| ANNs | Deep neural networks | X | X | X |
| ANNs | Diffractive deep neural networks | X | - | - |
| ANNs | Generalized regression neural networks | X | - | X |
| ANNs | Multi-layer perceptron | X | X | - |
| ANNs | Generative adversarial networks | X | - | X |
| ANNs | Behler-Parrinello networks | X | - | - |
| ANNs | Deep potential networks | X | - | - |
| ANNs | Deep tensor neural networks | X | - | - |
| ANNs | SchNet | X | - | - |
| ANNs | Accurate neural network engine for molecular energies | X | - | - |
| ANNs | Coarse graining | X | - | - |
| ANNs | Neuromorphic computing | X | - | - |
| Regression | Logistic regression | X | X | X |
| Regression | Multivariate linear regression | X | - | - |
| Regression | Classification via regression | X | - | - |
| Regression | Locally weighted learning | X | - | - |
| Regression | Machine learning scoring functions | X | - | - |
| Support vector machine | Support vector machine | X | X | X |
| k-nearest neighbors | k-nearest neighbors | X | X | - |
| Dimensionality reduction | Feature selection | X | - | - |
| Dimensionality reduction | Principal component analysis | X | - | X |
| Dimensionality reduction | Linear discriminant analysis | X | - | X |
| Dimensionality reduction | Independent component analysis | X | - | X |
| Gradient descent | Gradient descent | X | - | - |
| Active learning | Active learning | X | - | - |
| Active learning | Bayesian ML | X | X | X |
| Decision tree learning | Bagging | X | X | - |
| Decision tree learning | Bagged tree | - | - | X |
| Decision tree learning | Naive Bayes tree | X | - | X |
| Decision tree learning | Adaptive boosting | - | - | X |
| Decision tree learning | Random forest | - | - | X |
| Decision tree learning | M5P | X | - | - |
| Decision table | Decision table naive Bayes | X | X | - |
| Surrogate-based optimization | Surrogate-based optimization | X | - | - |
| QSAR | QSAR | X | - | X |
| Boltzmann generator | Boltzmann generator | X | - | - |
| Feedback system control | Feedback system control | - | - | X |
| Quadratic phenotypic optimization platform | Quadratic phenotypic optimization platform | X | - | X |
Similarly to regression, SVMs are efficient methods for problems with high-dimensional spaces. Taking this into account, several researchers have adopted them in order to provide solutions to a large range of problems, from heterogeneous structure design to signal detection in molecular communication systems and data-driven applications. However, as the data set size increases, SVMs may underperform. Another limitation that should be highlighted is that they are not suitable for problems with overlapping target classes [187].
KNN has been employed in structure and material design [188], in MCs for symbol detection [6], and in disease detection [189], [190]. It is a low-complexity approach suitable for classifying data without training. However, it suffers from performance degradation when applied to large data sets, due to the increased cost of computing the distance between the new point and each of the existing points. A similar performance degradation is observed as the dimensionality of the data increases. This indicates that the application of the KNN approach in heterogeneous nano-structure design is questionable. On the other hand, it excels in data sequence detection in MC systems, where the dimension of the data is no higher than 2.
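A minimal sketch of KNN symbol detection with two-dimensional observations follows. The observation model, i.e., Poisson-distributed molecule counts in the two halves of a symbol interval with hypothetical means per bit, is an assumption for illustration and not the receiver model of [6]. The distance computation inside `knn_detect` also shows why the per-query cost grows with the size of the stored data set.

```python
import numpy as np


def knn_detect(train_x, train_bits, query_x, k=5):
    """Detect bits by majority vote among the k nearest stored observations.

    Each query computes distances to all N stored points (O(N * d) work),
    which is why KNN slows down as the data set grows.
    """
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]        # indices of k neighbours
    votes = train_bits[nearest].mean(axis=1)       # fraction voting for bit 1
    return (votes > 0.5).astype(int)


rng = np.random.default_rng(0)
# Hypothetical 2-D observations: molecule counts in the two halves of a slot
means = np.array([[4.0, 6.0], [20.0, 14.0]])       # bit 0, bit 1 (assumed)


def simulate(n):
    """Draw n random bits and their hypothetical noisy observations."""
    bits = rng.integers(0, 2, size=n)
    counts = rng.poisson(means[bits]).astype(float)
    return counts, bits


train_x, train_bits = simulate(400)                # stored, labelled data
test_x, test_bits = simulate(200)
detected = knn_detect(train_x, train_bits, test_x)
print("symbol accuracy:", (detected == test_bits).mean())
```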
Dimensionality reduction methods have been applied in nano-structure and material design [191], [192] as well as in therapy development [193]. Their objective is to remove dimensions, i.e., redundant features, in order to identify the most suitable variables for the problem under investigation. As a result, they contribute to data compression and to computation time reduction. Moreover, they are capable of transforming multi-dimensional problems into two-dimensional (2D) or 3D ones, allowing their visualization. This property has been extensively used in nano-structure properties discovery. Likewise, dimensionality reduction methods can aid in noise removal; thus, they can significantly improve the model’s performance. However, they come with some disadvantages. In particular, they cause data loss. Moreover, PCA only captures linear correlations between variables. In practice, most of the nano-structure properties have a non-linear behavior. As a result, PCA may return unrealistic results. This highlights the need for designing new dimensionality reduction methods that take into account the chemical and biological properties of the nano-structure components. Finally, dimensionality reduction methods traditionally fail in cases where the datasets cannot be fully defined by their mean and covariance.
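PCA itself can be sketched in a few lines via the singular value decomposition of the centred data matrix. The synthetic data set below (five features generated from two latent variables plus small noise) is a hypothetical example; because PCA is linear, it recovers such a structure well but would not capture the non-linear behavior discussed above.

```python
import numpy as np


def pca(X, n_components=2):
    """Project data onto its leading principal components using the SVD."""
    Xc = X - X.mean(axis=0)                        # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T              # low-dimensional coordinates
    explained = S**2 / np.sum(S**2)                # variance ratio per component
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
latent = rng.standard_normal((300, 2))             # two hidden factors
mixing = rng.standard_normal((2, 5))               # hypothetical linear mixing
X = latent @ mixing + 0.01 * rng.standard_normal((300, 5))

scores, explained = pca(X, n_components=2)
print("2-D scores shape:", scores.shape)
print("explained variance ratio:", explained)
```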
GD is an iterative ML optimization algorithm that aims at minimizing a cost function in order to make accurate predictions; therefore, it has been employed in predicting the properties of heterogeneous nano-structures. Its main disadvantage is that the solution returned by this method is not guaranteed to be a global minimum. In particular, every time the search space is expanded, due to the incorporation of an additional parameter into the objective function, the objective function may exhibit numerous locally optimal solutions. Thus, conventional GD algorithms may return a local rather than a global optimum. In this context, more sophisticated GD algorithms need to be examined. Finally, GD may be seen as an attractive optimization tool for finding Pareto-optimal solutions of multi-objective optimization problems in nano-scale networks. Such problems would aim at minimizing the outage probability and power consumption and/or maximizing the throughput, network lifetime, and other parameters that improve the network’s quality of experience.
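The local-minimum issue can be seen on a hypothetical one-dimensional objective $f(x) = x^4 - 3x^2 + x$, which has a local minimum near $x \approx 1.13$ and a global minimum near $x \approx -1.30$. Plain GD converges to whichever basin it starts in. Running it from several starting points and keeping the best result (multi-start) is one simple mitigation, shown here as an illustration rather than as one of the more sophisticated algorithms called for above.

```python
def f(x):
    """Hypothetical non-convex objective with one local and one global minimum."""
    return x**4 - 3 * x**2 + x


def grad_f(x):
    """Analytical derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, n_iter=1000):
    """Plain GD: repeatedly step against the gradient from the start point x0."""
    x = x0
    for _ in range(n_iter):
        x -= lr * grad_f(x)
    return x


x_local = gradient_descent(2.0)    # converges to about 1.131 (local minimum)
x_global = gradient_descent(-2.0)  # converges to about -1.301 (global minimum)
print(x_local, f(x_local))
print(x_global, f(x_global))

# Multi-start mitigation: run GD from several points and keep the best result
starts = [-2.0, -1.0, 0.0, 0.5, 2.0]
best = min((gradient_descent(x0) for x0 in starts), key=f)
print("multi-start result:", best)
```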
DTL algorithms are able to solve both regression and classification problems. As a result, they have been extensively used in several fields, including structure and material design and simulation, as well as in analyzing data acquired from nano-scale systems. Compared to other ML algorithms, decision tree and table learning algorithms simplify data preparation, since they demand neither data normalization nor scaling. Moreover, they perform well even with incomplete data sets, and their models are very intuitive and easy to explain. Therefore, several researchers have used them to provide a comprehensive understanding of the properties of nano-structures and their relationship with their design parameters. However, DTL algorithms are sensitive to even small changes in the data. In more detail, a small change in the data may result in a significant change in the structure of the decision tree, which in turn may cause instability. Another disadvantage of decision trees and tables is that they require more time to train the models and to perform after-training calculations. Finally, since they produce piecewise-constant outputs, they are less suited to predicting smoothly varying continuous values. These disadvantages render them unsuitable for real-time applications in the fields of communications and signal processing as well as in nano-scale networks.
QSARs are mathematical models that relate a pharmacological or biological activity to the physicochemical characteristics (termed molecular descriptors) of sets of molecules. Indicative examples of QSAR applications are the study of enzyme activity [194], the estimation of the minimum effective dose of a drug [195], and the toxicity prediction of nano-structures [196]. The main advantage of QSAR models lies in their ability to predict the activities of a large number of compounds with little to no prior experimental data. However, they are incapable of providing in-depth insights into the mechanism behind biological actions.
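A linear QSAR model of the form activity $\approx b_0 + \sum_j b_j d_j$, where $d_j$ are molecular descriptors, can be fitted by least squares. In the sketch below, the descriptors (lipophilicity, molecular weight, hydrogen-bond donors), the compounds, and the "measured" activities are all synthetic and hypothetical, generated from an assumed linear relation so that the fit can be checked against known coefficients.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 30
# Hypothetical molecular descriptors for n synthetic compounds
logp = rng.uniform(-1.0, 5.0, n)                 # lipophilicity
mw = rng.uniform(1.5, 5.0, n)                    # molecular weight / 100 Da
hbd = rng.integers(0, 6, n).astype(float)        # hydrogen-bond donors
X = np.column_stack([np.ones(n), logp, mw, hbd])

# Synthetic "measured" activity generated from an assumed linear relation
true_coef = np.array([2.0, 1.2, -0.5, 0.3])
activity = X @ true_coef + rng.normal(0.0, 0.05, n)

# Fit the QSAR model: activity ~ b0 + b1*logP + b2*MW + b3*HBD
coef, *_ = np.linalg.lstsq(X, activity, rcond=None)

# Predict the activity of an untested compound from its descriptors alone
new_compound = np.array([1.0, 2.5, 3.2, 1.0])    # leading 1.0 is the intercept
print("coefficients:", np.round(coef, 2))
print("predicted activity:", new_compound @ coef)
```

Once fitted, the model scores untested compounds from their descriptors alone, which is what allows screening without new experiments; the coefficients, however, say nothing about the underlying biological mechanism.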
Boltzmann generators have been employed to create physically realistic one-shot samples of model systems and proteins in implicit solvent [197], [198]. Scaling to large systems, such as those investigated in MCs and nano-scale networks, requires building the invariances of the energy, such as the exchange of identical molecules, into the transformation through parameter sharing. In other words, researchers need to develop equivariant networks with parameter sharing. These networks are expected to provide a better understanding of molecular channel modeling and eventually contribute to the design of new transmission schemes.
V. CONCLUSION
In summary, in this article, we have reviewed how ML algorithms bear fruit in nano-scale biomedical engineering. In more detail, we presented the main challenges and problems in this field that can be solved through ML, and classified them, based on their discipline, into three distinct categories. For each category, we have provided insightful discussions that revealed its particularities as well as existing research gaps. Moreover, we have surveyed a variety of SOTA ML methodologies and models that have been used as countermeasures to the aforementioned challenges. Special attention was paid to the architecture, operating principle, advantages, and limitations of the ML methodologies. Finally, future research directions have been provided, which highlight the need for thorough interdisciplinary research efforts for the successful realization of hitherto uncharted scenarios and applications in the nano-scale biomedical engineering field.
REFERENCES
[1] D. Bobo, K. J. Robinson, J. Islam, K. J. Thurecht, and S. R. Corrie,
“Nanoparticle-based medicines: A review of FDA-approved materials
and clinical trials to date,” Pharm. Res., vol. 33, no. 10, pp. 2373–2387,
Jun. 2016.
[2] I. Akyildiz, M. Pierobon, S. Balasubramaniam, and Y. Koucheryavy,
“The internet of bio-nano things,” IEEE Commun. Mag., vol. 53, no. 3,
pp. 32–40, Mar. 2015.
[3] N. Farsad, H. B. Yilmaz, A. Eckford, C.-B. Chae, and W. Guo, “A
comprehensive survey of recent advancements in molecular communi-
cation,” IEEE Communications Surveys & Tutorials, vol. 18, no. 3, pp.
1887–1919, 2016.
[4] T. J. Cleophas and A. H. Zwinderman, Machine Learning in Medicine
- a Complete Overview. Springer International Publishing, 2015.
[5] S. Molesky, Z. Lin, A. Y. Piggott, W. Jin, J. Vučković, and A. W.
Rodriguez, “Inverse design in nanophotonics,” Nat. Photonics, vol. 12,
no. 11, pp. 659–670, Oct. 2018.
[6] X. Qian, M. D. Renzo, and A. Eckford, “Molecular communications:
Model-based and data-driven receiver design and optimization,” IEEE
Access, vol. 7, pp. 53 555–53 565, 2019.
[7] F. Bao, Y. Deng, Y. Zhao, J. Suo, and Q. Dai, “Bosco: Boosting correc-
tions for genome-wide association studies with imbalanced samples,”
IEEE Transactions on NanoBioscience, vol. 16, no. 1, pp. 69–77, Jan.
2017.
[8] X. Duan, L. Dai, S.-C. Chen, J. P. Balthasar, and J. Qu, “Nano-scale
liquid chromatography/mass spectrometry and on-the-fly orthogonal
array optimization for quantification of therapeutic monoclonal anti-
bodies and the application in preclinical analysis,” J. Chromatogr. A,
vol. 1251, pp. 63–73, Aug. 2012.
[9] K. T. Butler, D. W. Davies, H. Cartwright, O. Isayev, and A. Walsh,
“Machine learning for molecular and materials science,” Nature, vol.
559, no. 7715, pp. 547–555, Jul. 2018.
[10] J. Behler and M. Parrinello, “Generalized neural-network representa-
tion of high-dimensional potential-energy surfaces,” Phys. Rev. Lett.,
vol. 98, no. 14, Apr. 2007.
[11] M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, “Fast
and accurate modeling of molecular atomization energies with machine
learning,” Phys. Rev. Lett., vol. 108, no. 5, Jan. 2012.
[12] F. Brockherde, L. Vogt, L. Li, M. E. Tuckerman, K. Burke, and K.-R.
Müller, “Bypassing the Kohn-Sham equations with machine learning,”
Nat. Commun., vol. 8, no. 1, Oct. 2017.
[13] T. Bereau, R. A. DiStasio, A. Tkatchenko, and O. A. von Lilienfeld,
“Non-covalent interactions across organic and biological subsets of
chemical space: Physics-based potentials parametrized from machine
learning,” The Journal of Chemical Physics, vol. 148, no. 24, p. 241706,
Jun. 2018.
[14] S. Chmiela, H. E. Sauceda, K.-R. Müller, and A. Tkatchenko,
“Towards exact molecular dynamics simulations with machine-learned
force fields,” Nat. Commun., vol. 9, no. 1, Sep. 2018.
[15] J. S. Smith, B. T. Nebgen, R. Zubatyuk, N. Lubbers,
C. Devereux, K. Barros, S. Tretiak, O. Isayev, and A. Roitberg,
“Approaching coupled cluster accuracy with a general-purpose neural
network potential through transfer learning,” ChemRxiv, Jun. 2019.
[Online]. Available: https://chemrxiv.org/articles/preprint/Outsmarting_
Quantum_Chemistry_Through_Transfer_Learning/6744440
[16] S. T. John and G. Csányi, “Many-body coarse-grained interactions
using gaussian approximation potentials,” The Journal of Physical
Chemistry B, vol. 121, no. 48, pp. 10 934–10 949, Nov. 2017.
[17] L. Zhang, J. Han, H. Wang, R. Car, and W. E, “DeePCG: Constructing
coarse-grained models via deep neural networks,” The Journal of
Chemical Physics, vol. 149, no. 3, p. 034101, Jul. 2018.
[18] J. Wang, S. Olsson, C. Wehmeyer, A. Pérez, N. E. Charron, G. de Fab-
ritiis, F. Noé, and C. Clementi, “Machine learning of coarse-grained
molecular dynamics force fields,” ACS Central Science, Apr. 2019.
[19] T. Stecher, N. Bernstein, and G. Csányi, “Free energy surface recon-
struction from umbrella samples using gaussian process regression,” J.
Chem. Theory Comput., vol. 10, no. 9, pp. 4079–4097, Aug. 2014.
[20] L. Mones, N. Bernstein, and G. Csányi, “Exploration, sampling, and
reconstruction of free energy surfaces with gaussian process regres-
sion,” J. Chem. Theory Comput., vol. 12, no. 10, pp. 5100–5110, Sep.
2016.
[21] E. Schneider, L. Dai, R. Q. Topper, C. Drechsel-Grau, and M. E.
Tuckerman, “Stochastic neural network approach for learning high-
dimensional free energy surfaces,” Phys. Rev. Lett., vol. 119, no. 15,
Oct. 2017.
[22] J. M. L. Ribeiro, P. B. Collado, Y. Wang, and P. Tiwary, “Reweighted
autoencoded variational Bayes for enhanced sampling (RAVE),” ArXiv,
Feb. 2018.
[23] J. R. Cendagorta, J. Tolpin, E. Schneider, R. Q. Topper, and M. E.
Tuckerman, “Comparison of the performance of machine learning
models in representing high-dimensional free energy surfaces and
generating observables,” The Journal of Physical Chemistry B, vol.
124, no. 18, pp. 3647–3660, Apr. 2020.
[24] B. M. Warfield and P. C. Anderson, “Molecular simulations and
markov state modeling reveal the structural diversity and dynamics
of a theophylline-binding RNA aptamer in its unbound state,” PLOS
ONE, vol. 12, no. 4, p. e0176229, Apr. 2017.
[25] A. Mardt, L. Pasquali, H. Wu, and F. Noé, “VAMPnets for deep
learning of molecular kinetics,” Nat. Commun., vol. 9, no. 1, Jan. 2018.
[26] H. Wu, A. Mardt, L. Pasquali, and F. Noe, “Deep generative markov
state models,” ArXiv, May 2018.
[27] W. Chen, H. Sidky, and A. L. Ferguson, “Nonlinear discovery of slow
molecular modes using state-free reversible VAMPnets,” The Journal
of Chemical Physics, vol. 150, no. 21, p. 214114, Jun. 2019.
[28] F. Noé, S. Olsson, J. Köhler, and H. Wu, “Boltzmann generators:
Sampling equilibrium states of many-body systems with deep learning,”
Science, vol. 365, no. 6457, p. eaaw1147, Sep. 2019.
[29] J. Peurifoy, Y. Shen, L. Jing, Y. Yang, F. Cano-Renteria, B. G. DeLacy,
J. D. Joannopoulos, M. Tegmark, and M. Soljačić, “Nanophotonic
particle simulation and inverse design using artificial neural networks,”
Science Advances, vol. 4, no. 6, p. eaar4206, Jun. 2018.
[30] D. Liu, Y. Tan, E. Khoram, and Z. Yu, “Training deep neural networks
for the inverse design of nanophotonic structures,” ACS Photonics,
vol. 5, no. 4, pp. 1365–1369, Feb. 2018.
[31] Z. Liu, D. Zhu, S. P. Rodrigues, K.-T. Lee, and W. Cai, “Generative
model for the inverse design of metasurfaces,” Nano Lett., vol. 18,
no. 10, pp. 6570–6576, Sep. 2018.
[32] B. Cao, L. A. Adutwum, A. O. Oliynyk, E. J. Luber, B. C. Olsen,
A. Mar, and J. M. Buriak, “How to optimize materials and devices
via design of experiments and machine learning: Demonstration using
organic photovoltaics,” ACS Nano, vol. 12, no. 8, pp. 7434–7444, Jul.
2018.
[33] R. D. King, K. E. Whelan, F. M. Jones, P. G. K. Reiser, C. H. Bryant,
S. H. Muggleton, D. B. Kell, and S. G. Oliver, “Functional genomic hy-
pothesis generation and experimentation by a robot scientist,” Nature,
vol. 427, no. 6971, pp. 247–252, Jan. 2004.
[34] I. F. Akyildiz and J. M. Jornet, “Electromagnetic wireless nanosensor
networks,” Nano Communication Networks, vol. 1, no. 1, pp. 3–19,
Mar. 2010.
[35] N. Agoulmine, K. Kim, S. Kim, T. Rim, J.-S. Lee, and M. Meyyappan,
“Enabling communication and cooperation in bio-nanosensor networks:
toward innovative healthcare solutions,” IEEE Wireless Communica-
tions, vol. 19, no. 5, pp. 42–51, Oct. 2012.
[36] N. A. Ali and M. Abu-Elkheir, “Internet of nano-things healthcare ap-
plications: Requirements, opportunities, and challenges,” in 2015 IEEE
11th International Conference on Wireless and Mobile Computing,
Networking and Communications (WiMob). IEEE, Oct. 2015.
[37] S. Hiyama, Y. Moritani, T. Suda, R. Egashira, A. Enomoto, M. Moore,
and T. Nakano, “Molecular communication,” Journal-Institute of Elec-
tronics Information and Communication Engineers, vol. 89, no. 2, p.
162, 2006.
[38] V. Jamali, A. Ahmadzadeh, C. Jardin, H. Sticht, and R. Schober,
“Channel estimation for diffusive molecular communications,” IEEE
Trans. Commun., pp. 1–1, 2016.
[39] S. M. Rouzegar and U. Spagnolini, “Channel estimation for diffusive
MIMO molecular communications,” in European Conference on Net-
works and Communications (EuCNC). IEEE, Jun. 2017.
[40] S. Abdallah and A. M. Darya, “Semi-blind channel estimation for
diffusive molecular communication,” IEEE Commun. Lett., pp. 1–1,
2020.
[41] K. V. Srinivas, A. W. Eckford, and R. S. Adve, “Molecular commu-
nication in fluid media: The additive inverse gaussian noise channel,”
IEEE Trans. Inf. Theory, vol. 58, no. 7, pp. 4678–4692, Jul. 2012.
[42] T. Nakano, Y. Okaie, and J.-Q. Liu, “Channel model and capacity
analysis of molecular communication with Brownian motion,” IEEE
Commun. Lett., vol. 16, no. 6, pp. 797–800, Jun. 2012.
[43] H. B. Yilmaz, A. C. Heren, T. Tugcu, and C.-B. Chae, “Three-
dimensional channel characteristics for molecular communications with
an absorbing receiver,” IEEE Commun. Lett., vol. 18, no. 6, pp. 929–
932, Jun. 2014.
[44] A. Ahmadzadeh, A. Noel, and R. Schober, “Analysis and design of
multi-hop diffusion-based molecular communication networks,” IEEE
Trans. Mol. Biol. Multi-Scale Commun., vol. 1, no. 2, pp. 144–157,
Jun. 2015.
[45] Q. Li, “The clock-free asynchronous receiver design for molecular
timing channels in diffusion-based molecular communications,” IEEE
Trans. Nanobiosci., vol. 18, no. 4, pp. 585–596, Oct. 2019.
[46] M. Pierobon and I. Akyildiz, “A physical end-to-end model for molec-
ular communication in nanonetworks,” IEEE J. Sel. Areas Commun.,
vol. 28, no. 4, pp. 602–611, May 2010.
[47] D. Kilinc and O. B. Akan, “Receiver design for molecular communi-
cation,” IEEE J. Sel. Areas Commun., vol. 31, no. 12, pp. 705–714,
Dec. 2013.
[48] A. Noel, D. Makrakis, and A. Hafid, “Channel impulse responses in
diffusive molecular communication with spherical transmitters,” arXiv:
Emerging Technologies, 2016.
[49] F. Dinc, B. C. Akdeniz, A. E. Pusane, and T. Tugcu, “Impulse response
of the molecular diffusion channel with a spherical absorbing receiver
and a spherical reflective boundary,” IEEE Trans. Mol. Biol. Multi-
Scale Commun., vol. 4, no. 2, pp. 118–122, Jun. 2018.
[50] M. S. Kuran, H. B. Yilmaz, and T. Tugcu, “A tunnel-based approach
for signal shaping in molecular communication,” in IEEE International
Conference on Communications Workshops (ICC). IEEE, Jun. 2013.
[51] H. B. Yilmaz, C. Lee, Y. J. Cho, and C.-B. Chae, “A machine learning
approach to model the received signal in molecular communications,”
in IEEE International Black Sea Conference on Communications and
Networking (BlackSeaCom). IEEE, Jun. 2017.
[52] C. Lee, H. B. Yilmaz, C. Chae, N. Farsad, and A. Goldsmith, “Machine
learning based channel modeling for molecular mimo communica-
tions,” in IEEE 18th International Workshop on Signal Processing
Advances in Wireless Communications (SPAWC), 2017, pp. 1–5.
[53] N. Farsad and A. Goldsmith, “Neural network detection of data
sequences in communication systems,” IEEE Trans. Signal Process.,
vol. 66, no. 21, pp. 5663–5678, Nov. 2018.
[54] J. M. Jornet and I. F. Akyildiz, “Femtosecond-long pulse-based modu-
lation for terahertz band communication in nanonetworks,” IEEE Trans.
Commun., vol. 62, no. 5, pp. 1742–1754, May 2014.
[55] M. O. Iqbal, M. M. U. Rahman, M. A. Imran, A. Alomainy, and Q. H.
Abbasi, “Modulation mode detection and classification for in vivo nano-
scale communication systems operating in terahertz band,” IEEE Trans.
Nanobiosci., vol. 18, no. 1, pp. 10–17, Jan. 2019.
[56] R. Zhang, K. Yang, Q. H. Abbasi, K. A. Qaraqe, and A. Alomainy,
“Analytical modelling of the effect of noise on the terahertz in-
vivo communication channel for body-centric nano-networks,” Nano
Communication Networks, vol. 15, pp. 59–68, Mar. 2018.
[57] C.-C. Wang, X. Yao, W.-L. Wang, and J. M. Jornet, “Multi-hop de-
flection routing algorithm based on reinforcement learning for energy-
harvesting nanonetworks,” IEEE Trans. Mobile Comput., pp. 1–1, 2020.
[58] T. Nakano, M. J. Moore, F. Wei, A. V. Vasilakos, and J. Shuai, “Molec-
ular communication and networking: Opportunities and challenges,”
IEEE Transactions on NanoBioscience, vol. 11, no. 2, pp. 135–148,
Jun. 2012.
[59] T. Nakano, T. Suda, Y. Okaie, M. J. Moore, and A. V. Vasilakos,
“Molecular communication among biological nanomachines: A layered
architecture and research issues,” IEEE Transactions on NanoBio-
science, vol. 13, no. 3, pp. 169–197, Sep. 2014.
[60] M. S. Mannoor, H. Tao, J. D. Clayton, A. Sengupta, D. L. Kaplan, R. R.
Naik, N. Verma, F. G. Omenetto, and M. C. McAlpine, “Graphene-
based wireless bacteria detection on tooth enamel,” Nature Communi-
cations, vol. 3, no. 1, Jan. 2012.
[61] P. M. Kosaka, V. Pini, J. J. Ruz, R. A. da Silva, M. U. González,
D. Ramos, M. Calleja, and J. Tamayo, “Detection of cancer biomarkers
in serum using a hybrid mechanical and optoplasmonic nanosensor,”
Nature Nanotechnology, vol. 9, no. 12, pp. 1047–1053, Nov. 2014.
[62] T. C. Mai, M. Egan, T. Q. Duong, and M. Di Renzo, “Event detection in
molecular communication networks with anomalous diffusion,” IEEE
Commun. Lett., vol. 21, no. 6, pp. 1249–1252, 2017.
[63] A. Giaretta, S. Balasubramaniam, and M. Conti, “Security vulnera-
bilities and countermeasures for target localization in bio-NanoThings
communication networks,” IEEE Transactions on Information
Forensics and Security, vol. 11, no. 4, pp. 665–676, Apr. 2016.
[64] A. Rizwan, A. Zoha, R. Zhang, W. Ahmad, K. Arshad, N. A. Ali,
A. Alomainy, M. A. Imran, and Q. H. Abbasi, “A review on the role of
nano-communication in future healthcare systems: A big data analytics
perspective,” IEEE Access, vol. 6, pp. 41903–41920, 2018.
[65] M. Chen, Y. Hao, K. Hwang, L. Wang, and L. Wang, “Disease
prediction by machine learning over big data from healthcare communities,”
IEEE Access, vol. 5, pp. 8869–8879, 2017.
[66] D. Bardou, K. Zhang, and S. M. Ahmad, “Classification of breast
cancer based on histology images using convolutional neural networks,”
IEEE Access, vol. 6, pp. 24680–24693, 2018.
[67] B. Wilson and G. KM, “Artificial intelligence and related technologies
enabled nanomedicine for advanced cancer treatment,” Nanomedicine,
vol. 15, no. 5, pp. 433–435, Feb. 2020.
[68] M. B. M. A. Rashid, T. B. Toh, L. Hooi, A. Silva, Y. Zhang, P. F.
Tan, A. L. Teh, N. Karnani, S. Jha, C.-M. Ho, W. J. Chng, D. Ho,
and E. K.-H. Chow, “Optimizing drug combinations against multiple
myeloma using a quadratic phenotypic optimization platform (QPOP),”
Science Translational Medicine, vol. 10, no. 453, 2018.
[69] A. Zarrinpar, D.-K. Lee, A. Silva, N. Datta, T. Kee, C. Eriksen,
K. Weigle, V. Agopian, F. Kaldas, D. Farmer, S. E. Wang, R. Busuttil,
C.-M. Ho, and D. Ho, “Individualizing liver transplant immunosup-
pression using a phenotypic personalized medicine platform,” Science
Translational Medicine, vol. 8, no. 333, p. 333ra49, Apr. 2016.
[70] A. J. Pantuck, D.-K. Lee, T. Kee, P. Wang, S. Lakhotia, M. H. Silver-
man, C. Mathis, A. Drakaki, A. S. Belldegrun, C.-M. Ho, and D. Ho,
“Modulating BET bromodomain inhibitor ZEN-3694 and enzalutamide
combination dosing in a metastatic prostate cancer patient using CU-
RATE.AI, an artificial intelligence platform,” Advanced Therapeutics,
vol. 1, no. 6, p. 1800104, Aug. 2018.
[71] L. Chua and T. Roska, “The CNN paradigm,” IEEE Trans. Circuits
Syst. I, vol. 40, no. 3, pp. 147–156, Mar. 1993.
[72] M. Egmont-Petersen, D. de Ridder, and H. Handels, “Image processing
with neural networks—a review,” Pattern Recognit., vol. 35, no. 10, pp.
2279–2301, Oct. 2002.
[73] N. Tajbakhsh, J. Y. Shin, S. R. Gurudu, R. T. Hurst, C. B. Kendall,
M. B. Gotway, and J. Liang, “Convolutional neural networks for
medical image analysis: Full training or fine tuning?” IEEE Trans.
Med. Imag., vol. 35, no. 5, pp. 1299–1312, May 2016.
[74] L. Fang, C. Wang, S. Li, H. Rabbani, X. Chen, and Z. Liu, “Attention
to lesion: Lesion-aware convolutional neural network for retinal optical
coherence tomography image classification,” IEEE Trans. Med. Imag.,
vol. 38, no. 8, pp. 1959–1970, Aug. 2019.
[75] K. Shibata, T. Tanigaki, T. Akashi, H. Shinada, K. Harada, K. Niitsu,
D. Shindo, N. Kanazawa, Y. Tokura, and T.-h. Arima,
“Current-driven motion of domain boundaries between skyrmion lattice and
helical magnetic structure,” Nano Lett., vol. 18, no. 2, pp. 929–933,
Jan. 2018.
[76] J. Carrasquilla and R. G. Melko, “Machine learning phases of matter,”
Nat. Phys., vol. 13, no. 5, pp. 431–434, Feb. 2017.
[77] M. Rashidi and R. A. Wolkow, “Autonomous scanning probe mi-
croscopy in situ tip conditioning through machine learning,” ACS Nano,
vol. 12, no. 6, pp. 5185–5189, May 2018.
[78] Z. C. Lipton, J. Berkowitz, and C. Elkan, “A critical review of recurrent
neural networks for sequence learning,” arXiv, 2015.
[79] R. S. Hegde, “Deep learning: A new tool for photonic nanostructure
design,” Nanoscale Advances, vol. 2, no. 3, pp. 1007–1023, Feb. 2020.
[80] N. Farsad, D. Pan, and A. Goldsmith, “A novel experimental platform
for in-vessel multi-chemical molecular communications,” in IEEE
Global Communications Conference, Dec. 2017.
[81] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural
Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[82] X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and
A. Ozcan, “All-optical machine learning using diffractive deep neural
networks,” Science, vol. 361, no. 6406, pp. 1004–1008, 2018.
[83] D. F. Specht, “A general regression neural network,” IEEE Transactions
on Neural Networks, vol. 2, no. 6, pp. 568–576, 1991.
[84] J. Park, K.-Y. Kim, and O. Kwon, “Comparison of machine learning
algorithms to predict psychological wellness indices for ubiquitous
healthcare system design,” in Proceedings of the 2014 International
Conference on Innovative Design and Manufacturing (ICIDM). IEEE,
Aug. 2014. [Online]. Available: https://doi.org/10.1109%2Fidam.2014.
6912705
[85] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of
Statistical Learning. Springer New York, 2009. [Online]. Available:
https://doi.org/10.1007%2F978-0-387-84858-7
[86] B. J. Wythoff, “Backpropagation neural networks: A tutorial,”
Chemometrics and Intelligent Laboratory Systems, vol. 18, no. 2, pp.
115–155, 1993. [Online]. Available: http://www.sciencedirect.com/
science/article/pii/016974399380052J
[87] K. A. Brown, S. Brittman, N. Maccaferri, D. Jariwala, and U. Celano,
“Machine learning in nanoscience: Big data at small scales,” Nano
Letters, vol. 20, no. 1, pp. 2–10, 2019.
[88] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-
Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative
adversarial nets,” in Advances in Neural Information Processing
Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D.
Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc.,
2014, pp. 2672–2680. [Online]. Available: http://papers.nips.cc/paper/
5423-generative-adversarial-nets.pdf
[89] K. T. Schütt, F. Arbabzadah, S. Chmiela, K.-R. Müller, and
A. Tkatchenko, “Quantum-chemical insights from deep tensor neural
networks,” Nat. Commun., vol. 8, no. 1, Jan. 2017.
[90] A. P. Bartók, R. Kondor, and G. Csányi, “On representing chemical
environments,” Physical Review B, vol. 87, no. 18, May 2013.
[91] J. S. Smith, O. Isayev, and A. E. Roitberg, “ANI-1: an extensible neural
network potential with DFT accuracy at force field computational cost,”
Chemical Science, vol. 8, no. 4, pp. 3192–3203, Feb. 2017.
[92] K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and
K.-R. Müller, “SchNet – a deep learning architecture for molecules
and materials,” The Journal of Chemical Physics, vol. 148, no. 24, p.
241722, Jun. 2018.
[93] K. T. Schütt, P. Kessel, M. Gastegger, K. A. Nicoli, A. Tkatchenko, and
K.-R. Müller, “SchNetPack: A deep learning toolbox for atomistic
systems,” J. Chem. Theory Comput., vol. 15, no. 1, pp. 448–455, Nov.
2018.
[94] K. T. Schütt, A. Tkatchenko, and K.-R. Müller, Learning Represen-
tations of Molecules and Materials with Atomistic Neural Networks.
Cham: Springer International Publishing, 2020, pp. 215–230. [Online].
Available: https://doi.org/10.1007/978-3-030-40245-7_11
[95] W.-K. Jeong, H. Pfister, and M. Fatica, “Medical image processing
using GPU-accelerated ITK image filters,” in GPU Computing Gems
Emerald Edition. Elsevier, 2011, pp. 737–749.
[96] K. T. Schütt, P.-J. Kindermans, H. E. Sauceda, S. Chmiela, A. Tkatchenko,
and K.-R. Müller, “SchNet: A continuous-filter convolutional neural
network for modeling quantum interactions,” Advances in Neural
Information Processing Systems, vol. 30, pp. 991–1001, Dec. 2017.
[97] A. P. Lyubartsev and A. Laaksonen, “Calculation of effective
interaction potentials from radial distribution functions: A reverse Monte
Carlo approach,” Physical Review E, vol. 52, no. 4, pp. 3730–3737,
Oct. 1995.
[98] C. Clementi, H. Nymeyer, and J. N. Onuchic, “Topological and
energetic factors: what determines the structural details of the transition
state ensemble and “en-route” intermediates for protein folding? An
investigation for small globular proteins,” J. Mol. Biol., vol. 298, no. 5,
pp. 937–953, May 2000.
[99] F. Müller-Plathe, “Coarse-graining in polymer simulation: From the
atomistic to the mesoscopic scale and back,” ChemPhysChem, vol. 3,
no. 9, pp. 754–769, Sep. 2002.
[100] S. O. Nielsen, C. F. Lopez, G. Srinivas, and M. L. Klein, “A coarse
grain model for n-alkanes parameterized from surface tension data,”
The Journal of Chemical Physics, vol. 119, no. 14, pp. 7043–7049,
Oct. 2003.
[101] S. Matysiak and C. Clementi, “Optimal combination of theory and
experiment for the characterization of the protein folding landscape of
S6: How far can a minimalist model go?” J. Mol. Biol., vol. 343, no. 1,
pp. 235–248, Oct. 2004.
[102] S. J. Marrink, A. H. de Vries, and A. E. Mark, “Coarse grained
model for semiquantitative lipid simulations,” The Journal of Physical
Chemistry B, vol. 108, no. 2, pp. 750–760, Jan. 2004.
[103] S. Matysiak and C. Clementi, “Minimalist protein model as a diagnostic
tool for misfolding and aggregation,” J. Mol. Biol., vol. 363, no. 1, pp.
297–308, Oct. 2006.
[104] Y. Wang, W. G. Noid, P. Liu, and G. A. Voth, “Effective force coarse-
graining,” Phys. Chem. Chem. Phys., vol. 11, no. 12, p. 2002, 2009.
[105] J. Chen, J. Chen, G. Pinamonti, and C. Clementi, “Learning effective
molecular models from experimental observables,” J. Chem. Theory
Comput., vol. 14, no. 7, pp. 3849–3858, May 2018.
[106] A. Davtyan, G. A. Voth, and H. C. Andersen, “Dynamic force match-
ing: Construction of dynamic coarse-grained models with realistic short
time dynamics and accurate long time dynamics,” The Journal of
Chemical Physics, vol. 145, no. 22, p. 224107, Dec. 2016.
[107] F. Nüske, L. Boninsegna, and C. Clementi, “Coarse-graining molecular
systems by spectral matching,” The Journal of Chemical Physics, vol.
151, no. 4, p. 044116, Jul. 2019.
[108] D. Strukov, G. Snider, D. Stewart, and S. Williams, “The missing
memristor found,” Nature, vol. 453, pp. 80–83, Jun. 2008.
[109] Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones,
M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and
M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat.
Photonics, vol. 11, no. 7, pp. 441–446, Jun 2017.
[110] J. K. George, A. Mehrabian, R. Amin, J. Meng, T. F. de Lima, A. N.
Tait, B. J. Shastri, T. El-Ghazawi, P. R. Prucnal, and V. J. Sorger,
“Neuromorphic photonics with electro-absorption modulators,” Opt.
Express, vol. 27, no. 4, pp. 5181–5191, Feb 2019.
[111] M. A. Zidan, J. P. Strachan, and W. D. Lu, “The future of electronics
based on memristive systems,” Nat. Electron., vol. 1, no. 1, pp. 22–29,
Jan 2018.
[112] G. Yamankurt, E. J. Berns, A. Xue, A. Lee, N. Bagheri, M. Mrksich,
and C. A. Mirkin, “Exploration of the nanomedicine-design space with
high-throughput screening and machine learning,” Nature Biomedical
Engineering, vol. 3, no. 4, pp. 318–327, Feb. 2019.
[113] C. Sayes and I. Ivanov, “Comparative study of predictive computational
models for nanoparticle-induced cytotoxicity,” Risk Analysis, vol. 30,
no. 11, pp. 1723–1734, Jun. 2010.
[114] D. E. Jones, H. Ghandehari, and J. C. Facelli, “Predicting cytotoxicity
of PAMAM dendrimers using molecular descriptors,” Beilstein Journal
of Nanotechnology, vol. 6, pp. 1886–1896, Sep. 2015.
[115] C. G. Atkeson, A. W. Moore, and S. Schaal, “Locally weighted
learning for control,” in Lazy Learning. Springer Netherlands,
1997, pp. 75–113. [Online]. Available: https://doi.org/10.1007%
2F978-94-017-2053-3_3
[116] Q. U. Ain, A. Aleksandrova, F. D. Roessler, and P. J. Ballester,
“Machine-learning scoring functions to improve structure-based bind-
ing affinity prediction and virtual screening,” WIREs Computational
Molecular Science, vol. 5, no. 6, pp. 405–424, 2015.
[117] H. Li, J. Peng, Y. Leung, K.-S. Leung, M.-H. Wong, G. Lu, and P. J.
Ballester, “The impact of protein structure and sequence similarity on
the accuracy of machine-learning scoring functions for binding affinity
prediction,” Biomolecules, vol. 8, no. 1, 2018.
[118] “Chapter 27 - support vector machine: Principles, parameters, and ap-
plications,” in Handbook of Neural Computation, P. Samui, S. Sekhar,
and V. E. Balas, Eds. Academic Press, 2017, pp. 515 – 535.
[119] J. Platt, “Sequential minimal optimization: A fast algorithm for training
support vector machines,” Advances in Kernel Methods-Support Vector
Learning, vol. 208, Jul. 1998.
[120] E. Osuna, R. Freund, and F. Girosi, “An improved training algorithm
for support vector machines,” in Neural Networks for Signal Processing
VII – Proceedings of the 1997 IEEE Workshop, pp. 276–285.
IEEE, 1997.
[121] K. A. Cyran, J. Kawulok, M. Kawulok, M. Stawarz, M. Michalak,
M. Pietrowska, P. Widłak, and J. Polańska, Support Vector Machines
in Biomedical and Biometrical Applications. Berlin, Heidelberg:
Springer Berlin Heidelberg, 2013, pp. 379–417.
[122] J. Li, W. Zhang, X. Bao, M. Abbaszadeh, and W. Guo, “Inference
in turbulent molecular information channels using support vector
machine,” IEEE Transactions on Molecular, Biological and Multi-Scale
Communications, vol. 6, no. 1, pp. 25–35, 2020.
[123] S. Mohamed, D. Jian, L. Hongwei, and Z. Decheng, “Molecular
communication via diffusion with spherical receiver and transmitter
and trapezoidal container,” Microprocessors and Microsystems, vol. 74,
p. 103017, 2020.
[124] P. Cunningham and S. J. Delany, “k-Nearest Neighbour Classifiers,”
University College Dublin, Tech. Rep., Mar. 2007.
[125] www.medium.com, accessed: 2020-08-02.
[126] K. Kourou, T. P. Exarchos, K. P. Exarchos, M. V. Karamouzis, and
D. I. Fotiadis, “Machine learning applications in cancer prognosis
and prediction,” Computational and Structural Biotechnology Journal,
vol. 13, pp. 8–17, 2015.
[127] J. Tan, M. Ung, C. Cheng, and C. S. Greene, “Unsupervised
feature construction and knowledge extraction from genome-wide assays
of breast cancer with denoising autoencoders,” in Biocomputing 2015.
World Scientific, Nov. 2014.
[128] X. Ren, Y. Wang, L. Chen, X.-S. Zhang, and Q. Jin, “ellipsoidFN: a
tool for identifying a heterogeneous set of cancer biomarkers based on
gene expressions,” Nucleic Acids Research, vol. 41, no. 4, pp. e53–e53,
Dec. 2012.
[129] M. Kim, N. Rai, V. Zorraquino, and I. Tagkopoulos, “Multi-omics
integration accurately predicts cellular state in unexplored conditions
for Escherichia coli,” Nat. Commun., vol. 7, no. 1, Oct. 2016.
[130] S. Jesse and S. V. Kalinin, “Principal component and spatial correlation
analysis of spectroscopic-imaging data in scanning probe microscopy,”
Nanotechnology, vol. 20, no. 8, p. 085714, 2009.
[131] A. Subasi and M. I. Gursoy, “EEG signal classification using PCA, ICA,
LDA and support vector machines,” Expert Systems with Applications,
vol. 37, no. 12, pp. 8659–8666, 2010.
[132] L. Cao, K. Chua, W. Chong, H. Lee, and Q. Gu, “A comparison of PCA,
KPCA and ICA for dimensionality reduction in support vector machine,”
2003.
[133] A. H. Fielding, Cluster and classification techniques for the bio-
sciences. Cambridge University Press, 2006.
[134] P. Comon, “Independent component analysis, a new concept?” Signal
Processing, vol. 36, no. 3, pp. 287–314, 1994, Higher Order
Statistics. [Online]. Available: http://www.sciencedirect.com/science/
article/pii/0165168494900299
[135] S. Ruder, “An overview of gradient descent optimization algorithms,”
arXiv preprint arXiv:1609.04747, 2016.
[136] B. Settles, “Active learning,” Synthesis Lectures on Artificial
Intelligence and Machine Learning, vol. 6, no. 1, pp. 1–114, Jun. 2012.
[137] M. K. Warmuth, J. Liao, G. Rätsch, M. Mathieson, S. Putta, and
C. Lemmen, “Active learning with support vector machines in the drug
discovery process,” Journal of Chemical Information and Computer
Sciences, vol. 43, no. 2, pp. 667–673, Feb. 2003.
[138] K. Gubaev, E. V. Podryabinkin, and A. V. Shapeev, “Machine learning
of molecular properties: Locality and active learning,” The Journal of
Chemical Physics, vol. 148, no. 24, p. 241727, Jun. 2018.
[139] I. H. Witten, E. Frank, and M. A. Hall, Eds., Data Mining: Practical
Machine Learning Tools and Techniques, 3rd ed., ser. The Morgan Kaufmann
Series in Data Management
Systems. Boston: Morgan Kaufmann, 2011. [Online]. Available: http:
//www.sciencedirect.com/science/article/pii/B9780123748560000213
[140] S. Ju, T. Shiga, L. Feng, Z. Hou, K. Tsuda, and J. Shiomi, “Designing
nanostructures for phonon transport via Bayesian optimization,” Physical
Review X, vol. 7, no. 2, May 2017.
[141] M. J. Bryan, S. A. Martin, W. Cheung, and R. P. N. Rao, “Probabilistic
co-adaptive brain–computer interfacing,” Journal of Neural Engineering,
vol. 10, no. 6, p. 066008, Oct. 2013.
[142] Y. Huang and R. P. N. Rao, “Reward optimization in the primate brain:
A probabilistic model of decision making under uncertainty,” PLoS
ONE, vol. 8, no. 1, p. e53344, Jan. 2013.
[143] R. Bauer and A. Gharabaghi, “Reinforcement learning for adaptive
threshold control of restorative brain-computer interfaces: A Bayesian
simulation,” Frontiers in Neuroscience, vol. 9, Feb. 2015.
[144] O. Adir, M. Poley, G. Chen, S. Froim, N. Krinsky, J. Shklover,
J. Shainsky-Roitman, T. Lammers, and A. Schroeder, “Integrating arti-
ficial intelligence and nanotechnology for precision cancer medicine,”
Adv. Mater., vol. 32, no. 13, p. 1901989, Jul. 2019.
[145] B. Jagla, “Sequence characteristics of functional siRNAs,” RNA,
vol. 11, no. 6, pp. 864–872, Jun. 2005.
[146] L. Horev-Azaria, G. Baldi, D. Beno, D. Bonacchi, U. Golla-Schindler,
J. C. Kirkpatrick, S. Kolle, R. Landsiedel, O. Maimon, P. N. Marche,
J. Ponti, R. Romano, F. Rossi, D. Sommer, C. Uboldi, R. E. Unger,
C. Villiers, and R. Korenstein, “Predictive toxicology of cobalt ferrite
nanoparticles: comparative in-vitro study of different cellular models
using methods of knowledge discovery from data,” Particle and Fibre
Toxicology, vol. 10, no. 1, p. 32, 2013.
[147] L. Breiman, “Bagging predictors,” Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
[148] X. Liu, Tang, Harper, J. Steevens, R. Xu, and Harper, “Predictive
modeling of nanomaterial exposure effects in biological systems,”
International Journal of Nanomedicine, p. 31, sep 2013. [Online].
Available: https://doi.org/10.2147%2Fijn.s40742
[149] I. Yekkala, S. Dixit, and M. A. Jabbar, “Prediction of heart
disease using ensemble learning and particle swarm optimization,”
in 2017 International Conference On Smart Technologies For Smart
Nation (SmartTechCon). IEEE, aug 2017. [Online]. Available:
https://doi.org/10.1109%2Fsmarttechcon.2017.8358460
[150] R. Kohavi, “Scaling up the accuracy of naive-Bayes classifiers: a
decision tree hybrid,” in Proceedings of the Second International
Conference on Knowledge Discovery and Data Mining, 1996, pp. 202–
207.
[151] Y. Freund and R. E. Schapire, “A decision-theoretic generalization
of on-line learning and an application to boosting,” in European
Conference on Computational Learning Theory. Springer, 1995, pp.
23–37.
[152] ——, “A decision-theoretic generalization of on-line learning and
an application to boosting,” Journal of Computer and System
Sciences, vol. 55, no. 1, pp. 119–139, Aug. 1997. [Online]. Available:
https://doi.org/10.1006%2Fjcss.1997.1504
[153] C. Zhang and Y. Ma, Ensemble Machine Learning: Methods and
Applications. Springer Publishing Company, Incorporated, 2012.
[154] J. Quinlan, “Learning with continuous classes,” in Proceedings of
the 5th Australian Joint Conference on Artificial Intelligence, 1992, pp.
343–348.
[155] Y. Wang and I. H. Witten, “Inducing model trees for continuous
classes,” in Proc. of the 9th European Conf. on Machine Learning
Poster Papers, 1997, pp. 128–137.
[156] M. A. Hall and E. Frank, “Combining naive Bayes and decision tables,”
in FLAIRS Conference, 2008.
[157] N. V. Queipo, R. T. Haftka, W. Shyy, T. Goel, R. Vaidyanathan, and
P. K. Tucker, “Surrogate-based analysis and optimization,” Progress in
Aerospace Sciences, vol. 41, no. 1, pp. 1–28, 2005.
[158] Z.-H. Han and K.-S. Zhang, “Surrogate-based optimization,” in
Real-World Applications of Genetic Algorithms, O. Roeva, Ed.
Rijeka: IntechOpen, 2012, ch. 17. [Online]. Available: https:
//doi.org/10.5772/36125
[159] K. Tran and Z. W. Ulissi, “Active learning across intermetallics to guide
discovery of electrocatalysts for CO2 reduction and H2 evolution,” Nat.
Catal., vol. 1, no. 9, pp. 696–703, Sep 2018.
[160] D. Winkler, F. Burden, B. Yan, R. Weissleder, C. Tassa, S. Shaw, and
V. Epa, “Modelling and predicting the biological effects of nanomate-
rials,” SAR and QSAR in Environmental Research, vol. 25, no. 2, pp.
161–172, Feb. 2014.
[161] F. Burden and D. Winkler, “Optimal sparse descriptor selection for
QSAR using Bayesian methods,” QSAR & Combinatorial Science,
vol. 28, no. 6-7, pp. 645–653, Jul. 2009.
[162] F. R. Burden and D. A. Winkler, “Robust QSAR models using Bayesian
regularized neural networks,” Journal of Medicinal Chemistry, vol. 42,
no. 16, pp. 3183–3187, Jul. 1999.
[163] D. A. Winkler and F. R. Burden, “Robust QSAR models from novel
descriptors and Bayesian regularised neural networks,” Molecular
Simulation, vol. 24, no. 4-6, pp. 243–258, Aug. 2000.
[164] F. Burden and D. Winkler, “An optimal self-pruning neural network
and nonlinear descriptor selection in QSAR,” QSAR & Combinatorial
Science, vol. 28, no. 10, pp. 1092–1097, Oct. 2009.
[165] M. K. Gupta, S. Gupta, and R. K. Rawal, “Impact of artificial neural
networks in QSAR and computational modeling,” in Artificial Neural
Network for Drug Design, Delivery and Disposition. Elsevier, 2016,
pp. 153–179.
[166] P. Nowak-Sliwinska, A. Weiss, X. Ding, P. J. Dyson, H. van den Bergh,
A. W. Griffioen, and C.-M. Ho, “Optimization of drug combinations
using feedback system control,” Nature Protocols, vol. 11, no. 2, pp.
302–315, 2016.
[167] W. Liu, Y.-L. Li, M.-T. Feng, Y.-W. Zhao, X. Ding, B. He, and
X. Liu, “Application of feedback system control optimization technique
in combined use of dual antiplatelet therapy and herbal medicines,”
Frontiers in Physiology, vol. 9, p. 491, 2018.
[168] D. Ho, P. Wang, and T. Kee, “Artificial intelligence in nanomedicine,”
Nanoscale Horiz., vol. 4, pp. 365–377, 2019.
[169] J. Khong, P. Wang, T. R. Gan, J. Ng, T. T. Lan Anh, A. Blasiak,
T. Kee, and D. Ho, “Chapter 22 - the role of artificial intelligence in
scaling nanomedicine toward broad clinical impact,” in Nanoparticles
for Biomedical Applications, ser. Micro and Nano Technologies, E. J.
Chung, L. Leon, and C. Rinaldi, Eds. Elsevier, 2020, pp. 385–407.
[170] W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, and F. E. Alsaadi, “A
survey of deep neural network architectures and their applications,”
Neurocomputing, vol. 234, pp. 11–26, Apr. 2017.
[171] K.-L. Xiang, P.-Y. Xiang, and Y.-P. Wu, “Prediction of the fatigue life
of natural rubber composites by artificial neural network approaches,”
Materials & Design, vol. 57, pp. 180–185, May 2014.
[172] H. Almakaeel, A. Albalawi, and S. Desai, “Artificial neural network
based framework for cyber nano manufacturing,” Manufacturing
Letters, vol. 15, pp. 151–154, Jan. 2018.
[173] T. Akter and S. Desai, “Developing a predictive model for nanoimprint
lithography using artificial neural networks,” Materials & Design, vol.
160, pp. 836–848, Dec. 2018.
[174] M. del Rosario Martinez-Blanco, V. H. Castañeda-Miranda, G. Ornelas-
Vargas, H. A. Guerrero-Osuna, L. O. Solis-Sanchez, R. Castañeda-
Miranda, J. M. Celaya-Padilla, C. E. Galvan-Tejada, J. I. Galvan-
Tejada, H. R. Vega-Carrillo, M. Martínez-Fierro, I. Garza-Veloz, and
J. M. Ortiz-Rodriguez, “Generalized regression neural networks with
application in neutron spectrometry,” in Artificial Neural Networks -
Models and Applications. InTech, Oct. 2016.
[175] G. Cybenko, “Approximation by superpositions of a sigmoidal func-
tion,” Mathematics of Control, Signals, and Systems, vol. 2, no. 4, pp.
303–314, Dec. 1989.
[176] Z. Kuang and A. Kuh, “A combined self-organizing feature map
and multilayer perceptron for isolated word recognition,” IEEE Trans.
Signal Process., vol. 40, no. 11, pp. 2651–2657, Nov. 1992.
[177] J. Tang, C. Deng, and G.-B. Huang, “Extreme learning machine for
multilayer perceptron,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27,
no. 4, pp. 809–821, Apr. 2016.
[178] F. M. Bayat, M. Prezioso, B. Chakrabarti, H. Nili, I. Kataeva, and
D. Strukov, “Implementation of multilayer perceptron network with
highly uniform passive memristive crossbar circuits,” Nat. Commun.,
vol. 9, no. 1, Jun. 2018.
[179] H. Guo, J. Y. Zhao, and J. H. Yin, “Random forest and multilayer per-
ceptron for predicting the dielectric loss of polyimide nanocomposite
films,” RSC Adv., vol. 7, no. 49, pp. 30999–31008, Jun. 2017.
[180] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation
learning with deep convolutional generative adversarial networks,”
arXiv, Nov. 2015.
[181] S. So and J. Rho, “Designing nanophotonic structures using conditional
deep convolutional generative adversarial networks,” Nanophotonics,
vol. 8, no. 7, pp. 1255–1261, Jun. 2019.
[182] A. Gayon-Lombardo, L. Mosser, N. P. Brandon, and S. J. Cooper,
“Pores for thought: generative adversarial networks for stochastic
reconstruction of 3D multi-phase electrode microstructures with periodic
boundaries,” npj Comput. Mater., vol. 6, no. 1, Jun. 2020.
[183] S. Chmiela, A. Tkatchenko, H. E. Sauceda, I. Poltavsky, K. T. Schütt,
and K.-R. Müller, “Machine learning of accurate energy-conserving
molecular force fields,” Sci. Adv., vol. 3, no. 5, p. e1603015, May 2017.
[184] S. Chmiela, H. E. Sauceda, I. Poltavsky, K.-R. Müller, and
A. Tkatchenko, “sGDML: Constructing accurate and data efficient
molecular force fields using machine learning,” Comput. Phys. Com-
mun., vol. 240, pp. 38–45, Jul. 2019.
[185] H. E. Sauceda, S. Chmiela, I. Poltavsky, K.-R. Müller, and
A. Tkatchenko, Construction of Machine Learned Force Fields with
Quantum Chemical Accuracy: Applications and Chemical Insights.
Springer International Publishing, 2020, pp. 277–307.
[186] S. Menard, Logistic Regression: From Introductory to Advanced
Concepts and Applications. Los Angeles: SAGE, 2010.
[187] S. Mirjalili, H. Faris, and I. Aljarah, Eds., Evolutionary Machine
Learning Techniques. Springer-Verlag GmbH, 2019. [Online].
Available: https://www.ebook.de/de/product/38294840/evolutionary_
machine_learning_techniques.html
[188] F. Nigsch, A. Bender, B. van Buuren, J. Tissen, E. Nigsch, and
J. B. O. Mitchell, “Melting point prediction employing k-nearest
neighbor algorithms and genetic parameter optimization,” J. Chem. Inf.
Model., vol. 46, no. 6, pp. 2412–2422, Sep. 2006.
[189] A. Junejo, Y. Shen, A. A. Laghari, X. Zhang, and H. Luo, “Molecular
diagnostic and using deep learning techniques for predict functional
recovery of patients treated of cardiovascular disease,” IEEE Access,
vol. 7, pp. 120315–120325, Aug. 2019.
[190] K. V. Rani and S. J. Jawhar, “Superpixel with nanoscale imaging and
boosted deep convolutional neural network concept for lung tumor
classification,” Int. J. Imaging Syst. Technol., Apr. 2020.
[191] C.-W. Chen, K.-P. Chang, C.-W. Ho, H.-P. Chang, and Y.-W. Chu,
“KStable: A computational method for predicting protein thermal
stability changes by k-star with regular-mRMR feature selection,”
Entropy, vol. 20, no. 12, p. 988, Dec. 2018.
[192] P. S. Lamoureux, T. S. Choksi, V. Streibel, and F. Abild-Pedersen,
“Artificial intelligence real-time prediction and physical interpretation
of atomic binding energies in nano-scale metal clusters,” arXiv, May
2020.
[193] J. Chen, S. Wong, J. Chang, P.-C. Chung, H. Li, U.-V. Koc, F. Prior,
and R. Newcomb, “A wake-up call for the engineering and biomedical
science communities,” IEEE Circuits Syst. Mag., vol. 9, no. 2, pp. 69–
77, 2009.
[194] Y.-X. Liu, S. Gao, T. Ye, J.-Z. Li, F. Ye, and Y. Fu, “Combined
3D-quantitative structure–activity relationships and topomer
technology-based molecular design of human 4-hydroxyphenylpyruvate
dioxygenase inhibitors,” Future Med. Chem., vol. 12, no. 9, pp. 795–811,
May 2020.
[195] D. A. Tomalia, L. S. Nixon, and D. M. Hedstrand, “Engineering critical
nanoscale design parameters (CNDPs): A strategy for developing
effective nanomedicine therapies and assessing quantitative nanoscale
structure-activity relationships (QNSARs),” in Pharmaceutical Appli-
cations of Dendrimers. Elsevier, 2020, pp. 3–47.
[196] S. P. Mukherjee, M. Davoren, and H. J. Byrne, “In vitro mammalian
cytotoxicological study of PAMAM dendrimers – towards quantitative
structure activity relationships,” Toxicol. in Vitro, vol. 24, no. 1, pp.
169–177, Feb. 2010.
[197] F. Noe, “Boltzmann Generators: Deep Learning of Thermodynamics
and Efficient Monte Carlo,” in APS March Meeting Abstracts, ser. APS
Meeting Abstracts, vol. 2019, Jan. 2019, p. B21.006.
[198] A. E. Ulanov, E. S. Tiunov, and A. I. Lvovsky, “Quantum-inspired
annealers as Boltzmann generators for machine learning and statistical
physics,” arXiv, 2019.
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
The generation of multiphase porous electrode microstructures is a critical step in the optimisation of electrochemical energy storage devices. This work implements a deep convolutional generative adversarial network (DC-GAN) for generating realistic n-phase microstructural data. The same network architecture is successfully applied to two very different three-phase microstructures: A lithium-ion battery cathode and a solid oxide fuel cell anode. A comparison between the real and synthetic data is performed in terms of the morphological properties (volume fraction, specific surface area, triple-phase boundary) and transport properties (relative diffusivity), as well as the two-point correlation function. The results show excellent agreement between datasets and they are also visually indistinguishable. By modifying the input to the generator, we show that it is possible to generate microstructure with periodic boundaries in all three directions. This has the potential to significantly reduce the simulated volume required to be considered “representative” and therefore massively reduce the computational cost of the electrochemical simulations necessary to predict the performance of a particular microstructure during optimisation.
Article
Full-text available
Inference of transmitter side information is essential to communication. In Molecular Communication (MC), whilst the Bayesian inference of mass diffusion channel parameters is well established, turbulent diffusion (TD) channels are not well-understood. Cascading vortices rapidly transform transmitted momentum (molecular information puffs) into heat, which raises the challenge of receiver inferring transmitter information. Our initial results found that in TD channels, inferring transmitted molecular concentration or timing is challenging. As such, we were motivated to infer transmitter velocity from a flexible receiver sample area. In this paper, we consider an unbounded scenario of Molecular Communication via Turbulent Diffusion (MCvTD) where a transmitter injects several molecular puffs with different velocities. We first developed a time difference concentration (TDC) method based on large-scale support vector machine (SVM) to distinguish the injection velocities. To trade-off the prediction accuracy and number of receiver spatial samples, we propose the stepwise maximum variance (SMV) algorithm to select the limited dominant receiver sampling locations. The overall performance can achieve 100% accuracy in transmitter velocity information recovery, with excellent error vs. receiver size trade-off (e.g. 5% error for 74% area reduction). The research results indicate that velocity modulation at transmitter and TDC with SVM receiver should be used in MCvTD channels.
Article
A lung tumor is a complex illness caused by irregular lung cell growth. Early tumor detection is a key factor in effective treatment planning. When assessing lung computed tomography (CT) scans, doctors face many difficulties in determining precise tumor boundaries. Computer-aided diagnosis can help by offering the radiologist a second opinion and by improving the sensitivity and accuracy of tumor detection. In this research article, the proposed Lung Tumor Detection Algorithm consists of four phases: image acquisition, preprocessing, segmentation, and classification. The Advance Target Map Superpixel-based Region Segmentation Algorithm is proposed for segmentation, and the tumor region is then measured using nanoimaging theory. Image recognition is performed with a boosted deep convolutional neural network, which achieves 97.3% precision. Comparison with current methods in the literature indicates the efficacy of the proposed approach.
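The abstract reports precision and refers to the sensitivity and accuracy of detection. The sketch below shows how these three metrics follow from binary per-case labels. It assumes 1 means "tumor" and 0 means "no tumor", and it is a generic illustration rather than the article's evaluation code.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, sensitivity (recall) and accuracy for binary labels (1 = tumor)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # tumors correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy cases flagged as tumor
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # tumors missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy cases correctly cleared
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }
```

Precision penalizes false alarms, whereas sensitivity penalizes missed tumors. A high precision figure alone therefore does not show that few tumors are missed.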
Article
In this letter, we consider the problem of channel estimation for diffusive molecular communication (MC) systems. The presence of memory in diffusive MC channels, along with channel noise from various sources, necessitates accurate channel estimators to acquire the channel impulse response (CIR). Previous works proposed pilot-based estimators based on the maximum likelihood (ML) and least squares (LS) criteria. In contrast, we propose three novel semi-blind estimators, one based on the expectation maximization (EM) framework and two based on the decision-directed (DD) estimation strategy. We also obtain the corresponding semi-blind Cramér-Rao bound (CRB). Our simulation results show that all the proposed semi-blind estimators achieve substantially lower mean-squared error than the existing pilot-based estimators. The EM estimator provides the highest accuracy and converges to the semi-blind CRB, while the DD estimators offer convenient low-complexity alternatives. Importantly, the proposed estimators allow a significant reduction in the number of transmitted pilots without compromising estimation accuracy.
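The pilot-based LS baseline mentioned above can be sketched under a simple linear model. The received signal is taken to be the convolution of a known pilot sequence x with an L-tap CIR h, plus noise: y[n] = sum_k h[k] x[n-k] + w[n]. The LS estimate then solves the normal equations (X^T X) h_hat = X^T y, where X is the pilot convolution matrix.

The function names are illustrative. The pure-Python solver is used only to keep the example dependency-free. This is not the letter's semi-blind EM or DD estimator.

```python
from typing import List


def convolution_matrix(pilots: List[float], taps: int) -> List[List[float]]:
    """Row n holds [x[n], x[n-1], ..., x[n-taps+1]], with x[m] = 0 for m < 0."""
    return [[pilots[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
            for n in range(len(pilots))]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ls_cir_estimate(pilots: List[float], received: List[float], taps: int) -> List[float]:
    """Least-squares CIR estimate h_hat = (X^T X)^{-1} X^T y."""
    X = convolution_matrix(pilots, taps)
    rows = range(len(X))
    xtx = [[sum(X[n][i] * X[n][j] for n in rows) for j in range(taps)] for i in range(taps)]
    xty = [sum(X[n][i] * received[n] for n in rows) for i in range(taps)]
    return solve(xtx, xty)
```

With noiseless observations and a pilot sequence whose convolution matrix has full column rank, the estimate recovers the CIR exactly. Noise and channel memory are what make more pilots, or the semi-blind estimators described above, necessary.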
Article
Nanonetworks are composed of interacting nano-nodes, whose size ranges from several hundred cubic nanometers to several cubic micrometers. The extremely constrained computational resources of nano-nodes, the fluctuations in their energy caused by energy harvesting, and their very limited transmission range at Terahertz (THz)-band frequencies (0.1-10 THz) make the design of routing protocols in nanonetworks very challenging. This paper proposes a multi-hop deflection routing algorithm based on reinforcement learning (MDR-RL) that dynamically and efficiently explores routing paths during packet transmission. First, new routing and deflection tables are implemented in nano-nodes, so that nano-nodes can deflect packets to other neighbors when the entries in the routing table are invalid. Second, one forward updating scheme and two feedback updating schemes (on-policy and off-policy), all based on reinforcement learning, are designed to update the tables. Finally, extensive simulations in Network Simulator 3 (ns-3) are conducted to analyze the performance of MDR-RL under different updating policies and to compare it with other machine learning routing algorithms based on neural networks and decision trees. The results show that MDR-RL increases the packet delivery ratio and the number of delivered packets, and decreases the average packet hop count.
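The distinction between off-policy and on-policy feedback updates can be illustrated with a generic table update in the style of Q-routing. In this scheme each node keeps, per destination, an estimated remaining cost (e.g., hop count) via each neighbor. This is a hypothetical sketch that assumes a cost-minimizing formulation and a fixed learning rate `alpha`. It is not the MDR-RL update rule, and the deflection table is not modeled.

```python
from typing import Dict, Tuple

# q[(node, dest)][neighbor] = estimated remaining cost to reach dest when forwarding via neighbor
QTable = Dict[Tuple[str, str], Dict[str, float]]


def off_policy_update(q: QTable, node: str, dest: str, neighbor: str,
                      cost: float, alpha: float) -> None:
    """Q-learning style: bootstrap on the neighbor's best (lowest-cost) estimate."""
    target = cost if neighbor == dest else cost + min(q[(neighbor, dest)].values())
    old = q[(node, dest)][neighbor]
    q[(node, dest)][neighbor] = old + alpha * (target - old)


def on_policy_update(q: QTable, node: str, dest: str, neighbor: str, next_hop: str,
                     cost: float, alpha: float) -> None:
    """SARSA style: bootstrap on the hop the neighbor actually chose for this packet."""
    target = cost if neighbor == dest else cost + q[(neighbor, dest)][next_hop]
    old = q[(node, dest)][neighbor]
    q[(node, dest)][neighbor] = old + alpha * (target - old)


def best_next_hop(q: QTable, node: str, dest: str) -> str:
    """Greedy forwarding decision: the neighbor with the lowest estimated cost."""
    table = q[(node, dest)]
    return min(table, key=table.get)
```

The off-policy rule learns the cost of the best path even while packets are being deflected. The on-policy rule instead learns the cost of the paths packets actually take, including deflections.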
Chapter
Highly accurate force fields are a mandatory requirement for predictive simulations. Here we present the workflow for constructing machine-learned molecular force fields. We discuss the hierarchical pathway from generating the dataset of reference calculations, to constructing the machine learning model, to validating the physics generated by the model. We use the symmetrized gradient-domain machine learning (sGDML) framework because of its ability to reconstruct complex high-dimensional potential energy surfaces (PES) with high precision even when trained on just a few hundred molecular conformations. The data efficiency of the sGDML model allows the use of reference atomic forces computed with high-level wave-function-based approaches, such as the gold-standard coupled-cluster method with single, double, and perturbative triple excitations (CCSD(T)). We demonstrate that the flexible nature of the sGDML framework captures local and non-local electronic interactions without imposing any restriction on the nature of the interatomic potentials. These include H-bonding, lone pairs, steric repulsion, changes in hybridization state (e.g., $sp^2 \rightleftharpoons sp^3$), $n \to \pi^*$ interactions, and proton transfer. The analysis of sGDML models trained for different molecular structures at different levels of theory (e.g., density functional theory and CCSD(T)) provides empirical evidence that a higher level of theory generates a smoother PES. Additionally, a careful analysis of molecular dynamics simulations yields new qualitative insights into the dynamics and vibrational spectroscopy of small molecules close to spectroscopic accuracy.
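GDML-type models do not use raw Cartesian coordinates as input. Instead, they represent a molecular conformation by its inverse pairwise interatomic distances 1/||r_i - r_j|| for i > j, which makes the representation invariant to translations and rotations. sGDML additionally symmetrizes the model over permutations of identical atoms, which this sketch does not attempt. The snippet below computes only this descriptor and is not the sGDML package API.

```python
import math
from typing import List, Sequence


def inverse_distance_descriptor(coords: Sequence[Sequence[float]]) -> List[float]:
    """Flattened lower triangle of 1/||r_i - r_j|| for i > j (GDML-style descriptor)."""
    descriptor = []
    for i in range(len(coords)):
        for j in range(i):
            descriptor.append(1.0 / math.dist(coords[i], coords[j]))
    return descriptor
```

The inverse (rather than plain) distance gives close atom pairs, which dominate the interatomic forces, the largest descriptor values.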
Article
Free energy surfaces of chemical and physical systems are often generated using a popular class of enhanced sampling methods that target a set of collective variables (CVs) chosen to distinguish the characteristic features of these surfaces. While some of these approaches are typically limited to low-dimensional ($\sim$1-3) CV subspaces, methods such as driven adiabatic free-energy dynamics/temperature-accelerated molecular dynamics have been shown to be capable of generating free energy surfaces of quite high dimension by sampling the associated marginal probability distribution via full sweeps over the CV landscape. These approaches repeatedly visit conformational basins, producing a small scattering of points within the basins on each visit. Consequently, they are particularly amenable to synergistic combination with regression machine learning methods for filling in the surfaces between the sampled points and for providing a compact and continuous (or semi-continuous) representation of the surfaces that can be easily stored and used for further computation of observable properties. Given the central role of machine learning techniques in this combined approach, it is timely to provide a detailed comparison of the performance of different machine learning strategies and models, including neural networks, kernel ridge regression, support vector machines, and weighted neighbor schemes. The comparison covers their ability to learn these high-dimensional surfaces as a function of the amount of sampled training data and, once trained, to subsequently generate accurate ensemble averages corresponding to observable properties of the systems. In this article, we perform such a comparison on a set of oligopeptides, in both gas and aqueous phases, corresponding to CV spaces of 2-10 dimensions, and assess their ability to provide a global representation of the free energy surfaces and to generate accurate ensemble averages.
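The two steps described above can be sketched in one dimension. The first step fills in a free energy surface F(s) between sampled points with kernel ridge regression, one of the compared models. The second step uses the fitted surface to compute an ensemble average <A> = ∫ A(s) exp(-beta F(s)) ds / ∫ exp(-beta F(s)) ds.

This is a toy sketch under stated assumptions. The CV is 1D (the article uses 2-10 dimensions), the kernel is Gaussian with a hand-picked width and regularization `lam`, and the integrals are evaluated by a midpoint sum on a uniform grid. None of these choices is taken from the article.

```python
import math
from typing import Callable, List


def gaussian_kernel(a: float, b: float, width: float) -> float:
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_krr(xs: List[float], ys: List[float], width: float, lam: float) -> Callable[[float], float]:
    """Kernel ridge regression: solve (K + lam*I) alpha = y, predict sum_i alpha_i k(s, x_i)."""
    n = len(xs)
    k = [[gaussian_kernel(xs[i], xs[j], width) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(k, ys)

    def model(s: float) -> float:
        return sum(a * gaussian_kernel(s, x, width) for a, x in zip(alpha, xs))

    return model


def ensemble_average(free_energy: Callable[[float], float], observable: Callable[[float], float],
                     beta: float, lo: float, hi: float, n: int = 2000) -> float:
    """Boltzmann-weighted average of `observable` over [lo, hi] (midpoint rule)."""
    step = (hi - lo) / n
    grid = [lo + (i + 0.5) * step for i in range(n)]
    weights = [math.exp(-beta * free_energy(s)) for s in grid]
    return sum(w * observable(s) for w, s in zip(weights, grid)) / sum(weights)
```

The continuous fitted model can be evaluated anywhere on the grid. The ensemble average therefore does not depend on where the sampler happened to place its points, which is the motivation for the regression step given in the abstract.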
Article
Aim: 4-Hydroxyphenylpyruvate dioxygenase (HPPD) has attracted increasing attention as an important target against tyrosinemia type I. This paper aimed to explore the structure–activity relationship of HPPD inhibitors with pyrazole scaffolds and to design novel HPPD inhibitors. Methodology & Results: The best 3D quantitative structure–activity relationship (3D-QSAR) model was established using two different strategies based on 40 pyrazole scaffold-based analogs. By screening molecular fragments with topomer technology combined with molecular docking, 14 structures with potential human HPPD inhibitory activity were identified. Molecular dynamics results demonstrated that all the obtained compounds bound to the enzyme with satisfactory binding free energies. Conclusion: The quantitative structure–activity relationship of pyrazole-scaffold HPPD inhibitors was clarified, and 14 original structures with potential human HPPD inhibitory activity were obtained.
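QSAR models are commonly judged by their leave-one-out cross-validated coefficient q^2 = 1 - PRESS / SS_total. Here PRESS is the sum of squared errors when each compound is predicted by a model trained without it. The abstract does not state which statistic was used to select the "best" model, so this is an assumption.

The sketch below uses a single hypothetical descriptor and ordinary least squares in place of the 3D field descriptors of an actual 3D-QSAR model.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def loo_q2(xs: List[float], ys: List[float]) -> float:
    """Leave-one-out q^2 = 1 - PRESS / SS_total."""
    my = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit without compound i, then predict its activity.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_total = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / ss_total
```

Unlike the fitted r^2, q^2 only rewards a model for predicting compounds it has not seen. This is why it is the usual guard against overfitting a small set such as 40 analogs.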
Book
Adequate health and health care are no longer possible without proper data supervision using modern machine learning methodologies such as cluster models, neural networks, and other data mining methods. The current book is the first publication to give a complete overview of machine learning methodologies for the medical and health sector. It was written as a training companion and as a must-read not only for physicians and students but also for anyone involved in the process and progress of health and health care. In this second edition, the authors have removed the textual errors of the first edition. Also, the improved tables of the first edition have been replaced with the original tables from the software programs as applied. Unlike the former, the latter were error-free, and readers were more familiar with them. The main purpose of the first edition was to provide stepwise analyses of the novel methods from data examples, but background information and information on clinical relevance may have been somewhat lacking. Therefore, each chapter now contains a section entitled "Background Information". Machine learning may be more informative, and may provide better test sensitivity, than traditional analytic methods. The second edition covers the use of machine learning not only for the analysis of observational clinical data but also for the analysis of controlled clinical trials. Unlike the first edition, the second edition has full-color drawings, which add a helpful extra dimension to the data analysis. Several machine learning methodologies that were not covered in the first edition but are increasingly important today have been included in this updated edition, for example, negative binomial and Poisson regressions, sparse canonical analysis, Firth's bias-adjusted logistic analysis, omics research, and eigenvalues and eigenvectors.