
IEEE TRANSACTIONS ON MOLECULAR, BIOLOGICAL, AND MULTI-SCALE COMMUNICATIONS, VOL. -, NO. -, - 2020 1

Machine Learning in Nano-Scale

Biomedical Engineering

Alexandros–Apostolos A. Boulogeorgos, Senior Member, IEEE, Stylianos E. Trevlakis, Student Member, IEEE,

Sotiris A. Tegos, Student Member, IEEE, Vasilis K. Papanikolaou, Student Member, IEEE, and

George K. Karagiannidis, Fellow, IEEE

Abstract—Machine learning (ML) empowers biomedical systems with the capability to optimize their performance by modeling the available data extremely well, without relying on strong assumptions about the modeled system. Especially in nano-

scale biosystems, where the generated data sets are too vast and complex to mentally parse without computational assistance, ML is instrumental in analyzing and extracting new insights, accelerating material and structure discoveries and experiment design, as well as supporting nano-scale communications and networks.

However, despite these efforts, the use of ML in nano-scale biomedical engineering remains under-explored in certain areas, and research challenges are still open in fields such as

structure and material design and simulations, communications

and signal processing, and bio-medicine applications. In this

article, we review the existing research regarding the use of ML

in nano-scale biomedical engineering. In more detail, we ﬁrst

identify and discuss the main challenges that can be formulated

as ML problems. These challenges are classified into three main categories: structure and material design and simulation, communications and signal processing, and biomedicine applications.

Next, we discuss the state-of-the-art ML methodologies that are used to address the aforementioned challenges. For each

of the presented methodologies, special emphasis is given to its

principles, applications and limitations. Finally, we conclude the

article with insightful discussions that reveal research gaps and

highlight possible future research directions.

Index Terms—Biomedical engineering, Machine learning,

Molecular communications, Nano-structure design, Nano-scale

networks.

NOMENCLATURE

2D Two dimensional

3D Three dimensional

ANI Accurate neural network engine for molecu-

lar energies

AL Active Learning

AdaBoost Adaptive Boosting

AEV Atomic Environments Vector

ANN Artiﬁcial Neural Network

ANOVA Analysis of Variance

ARES Autonomous Research System

Bagging Bootstrap Aggregating

BER Bit Error Rate

The authors are with the Wireless Communications Systems Group

(WCSG), Department of Electrical and Computer Engineering, Aristotle

University of Thessaloniki, Thessaloniki, 54124 Greece. e-mails: {trevlakis,

geokarag, tegosoti, vpapanikk} @auth.gr, al.boulogeorgos@ieee.org.

Alexandros–Apostolos A. Boulogeorgos is also with the Department of

Digital Systems, University of Piraeus, Piraeus 18534, Greece.

Manuscript received -, 2020; revised -, 2020.

BPN Behler-Parrinello Network

BSS Blind Source Separation

CG Coarse Graining

CGN Coarse Graining Network

CMOS Complementary Metal-Oxide-Semiconductor

CNN Convolution Neural Network

DCF Discrete Convolution Filter

DNN Deep Neural Network

D2NN Diffractive Deep Neural Network

DPN Deep Potential Network

DT Decision Table

DTL Decision Tree Learning

DTNB Decision Table Naive Bayes

DTNN Deep Tensor Neural Network

EEG Electroencephalography

FS Feature Selection

FSC Feedback System Control

GAN Generative Adversarial Network

GD Gradient Descent

GRNN Generalized Regression Neural Network

ICA Independent Component Analysis

ISI Inter-Symbol Interference

KNN k-Nearest Neighbor

LDA Linear Discriminant Analysis

LR Logistic Regression

LWL Local Weighted Learning

MAN Molecular Absorption Noise

MC Molecular Communications

MIMO Multiple-Input Multiple-Output

ML Machine Learning

MLP Multi-layer Perceptron

ML-SF Machine Learning Scoring Function

MvLR Multivariate linear regression

NBTree Naive Bayes Tree

NN Neural Network

NNP Neural Network Potential

NP Nano-Particles

PAMAM Polyamidoamine

PCA Principal Component Analysis

PDF Probability Density Function

PES Potential Energy Surface

PSO Particle Swarm Optimization

QM Quantum Mechanic

QP Quadratic Programming

QPOP Quadratic Phenotype Optimization Platform

QSAR Quantitative Structure-activity relationships


RELU Rectified Linear Unit

RForest Random Forest

RNAi Ribonucleic acid interference

RNN Recurrent Neural Network

SDR Standard Deviation Reduction

SF Scoring Functions

SiC Silicon Carbide

SmF Symmetry Function

SMO Sequential Minimal Optimization

SOTA State Of The Art

SVM Support Vector Machine

TEM Transmission Electron Microscope

THz Terahertz

ZnO Zinc Oxide

I. INTRODUCTION

In 1959, Richard P. Feynman articulated “It would be

interesting if you could swallow the surgeon. You put the

mechanical surgeon inside the blood vessel and it goes into the

heart and looks around... other small machines might be per-

manently incorporated in the body to assist some inadequately-

functioning organ.” More than half a century later, this quote is

still state-of-the-art (SOTA). Currently, nanotechnology revisits conventional therapeutic approaches through the production of more than 100 nano-material-based drugs, which have already been approved or are under clinical trials [1], while the utilization of nano-scale communication networks for real-time monitoring and precision drug delivery is also under discussion [2], [3]. However, these developments come with the need to analyze vast, complicated, and relation-rich data sets.

Fortunately, in the last couple of decades, we have witnessed

a revolutionary development of new tools from the ﬁeld

of machine learning (ML), which enables the analysis of

large data sets through training models. These models can

be utilized for classification of observations or for predictions and

have been considered in several engineering ﬁelds, including

computer vision, speech and image recognition, natural lan-

guage processing, etc. This frontier is continuing its expan-

sion into several other scientiﬁc domains, such as quantum

physics, chemistry and biology, and is expected to make a

signiﬁcant impact on the design of novel nano-materials and

structures, nano-scale communication systems and networks,

while simultaneously presenting new data-driven biomedicine

applications [4].

In the ﬁeld of nano-materials and structure design, ex-

perimental and computational simulating methodologies have

traditionally been the two fundamental pillars in exploring

and discovering properties of novel constructions as well as

optimizing their performance [5]. However, these methodolo-

gies are constrained by experimental conditions and limitation

of the existing theoretical knowledge. Meanwhile, as the

chemical complexity of nano-scale heterogeneous structures

increases, the two traditional methodologies are rendered

incapable of predicting their properties. In this context, the

development of data-driven techniques, like ML, becomes very

attractive. Similarly, in nano-scale communications and signal

processing, the computational resources are limited and the

major challenge is the development of low-complexity and

accurate system models and data detection techniques, that

do not require channel knowledge and equalization, while

taking into account the environmental conditions (e.g., spe-

ciﬁc enzyme composition). To address these challenges the

development of novel ML methods is deemed necessary [6].

Last but not least, ML can aid in devising novel, more accurate

methods for disease detection and therapy development, by en-

abling genome classiﬁcation [7] and selection of the optimum

combination of drugs [8].

Motivated from above, the present contribution provides

an interdisciplinary review of the existing research from the

areas of nano-engineering, biomedical engineering and ML.

To the best of the authors’ knowledge, no such review exists in the technical literature that focuses on the ML-related

methodologies that are employed in nano-scale biomedical

engineering. In more detail, the contribution of this paper is

as follows:

• The main challenges and problems in nano-scale biomedi-

cal engineering, which can be tackled with ML tech-

niques, are identiﬁed and classiﬁed in three main cate-

gories, namely: structure and material design and simu-

lations, communications and signal processing, and bio-

medicine applications.

• SOTA ML methodologies, which are used in the field

of nano-scale biomedical engineering, are reviewed, and

their architectures are described. For each one of the pre-

sented ML methods, we report its principles and building

blocks. Finally, their compelling applications in nano-

scale biomedicine systems are surveyed for aiding the

readers in reﬁning the motivation of ML in these systems,

all the way from analyzing and designing new nano-

materials and structures to holistic therapy development.

• Finally, the advantages and limitations of each ML ap-

proach are highlighted, and future research directions

are provided.

The rest of the paper is organized as follows: Section II

identiﬁes the nano-scale biomedical engineering problems that

can be solved with ML techniques. Section III presents the

most common ML approaches related to the ﬁeld of nano-scale

biomedical engineering. Section IV explains the advantages

and limitations of the ML approaches alongside their applica-

tions and extracts future directions. Section V concludes this

paper and summarizes its contribution. The structure of this

treatise is summarized at a glance in Fig. 1.

II. MACHINE LEARNING CHALLENGES IN NANO-SCALE

BIOMEDICAL ENGINEERING

In this section, we report how several of the open challenges in nano-scale biomedical engineering have already been, or can be, formulated as ML problems. As mentioned in the

previous section, in order to provide a better understanding

of the nature of these challenges, we classify them into three

categories, i.e. i) structure and material design and simulation,

ii) communications and signal processing, and iii) biomedicine

applications. Following this classiﬁcation, which is illustrated

in Fig. 2, the rest of this section is organized as follows:


Sec. I - Introduction

Sec. II - Machine Learning Challenges in Nano-scale Biomedical Engineering

Sec. II-A - Structure and Material Design and Simulation

Sec. II-B - Communications and Signal Processing

Sec. II-C - Biomedicine Applications

Sec. III - Machine Learning Methodologies in Nano-scale Biomedical Engineering

Sec. III-A - Artificial Neural Networks

Sec. III-B - Regression

Sec. III-C - Support Vector Machine

Sec. III-D - k-Nearest Neighbors

Sec. III-E - Dimensionality Reduction

Sec. III-F - Gradient Descent Method

Sec. III-G - Active Learning

Sec. III-H - Bayesian Machine Learning

Sec. III-I - Decision Tree Learning

Sec. III-J - Decision Table

Sec. III-K - Surrogate-Based Optimization

Sec. III-L - Quantitative Structure-Activity Relationships

Sec. III-M - Boltzmann Generator

Sec. III-N - Feedback System Control

Sec. III-O - Quadratic Phenotypic Optimization Platform

Sec. IV - Discussion & The Road Ahead

Sec. V - Conclusion

Fig. 1. The structure of this treatise.

Section II-A focuses on presenting the challenges on designing

and simulating nano-scale structures, materials and systems,

whereas, Section II-B discusses the necessity of employing

ML in nano-scale communications. Similarly, Section II-C emphasizes possible applications of ML in several domains, such as therapy development, drug delivery and data analysis.

A. Structure and Material Design and Simulation

One of the fundamental challenges in material science and

chemistry is the understanding of the structure properties [9].

The complexity of this problem grows dramatically in the case

of nanomaterials because: i) they adopt different properties

from their bulk components; and ii) they are usually hetero-

structures, consisting of multiple materials. As a result, the

design and optimization of novel structures and materials, by

discovering their properties and behavior through simulations

and experiments, lead to multi-parameter and multi-objective

problems, which in most cases are extremely difﬁcult or

impossible to solve through conventional approaches; ML can be an efficient alternative for addressing this challenge.

1) Biological and chemical systems simulation: In atomic

and molecular systems, there exist complex relationships be-

tween the atomistic conﬁguration and the chemical properties,

which, in general, cannot be described by explicit forms. In

ML in nano-scale biomedical engineering

Structure and material design and simulation

Experimental planning and autonomous research

Inverse design

Biological and chemical system simulation

Communications and signal processing

Channel modeling

Signal detection

Security

Routing and mobility management

Event detection

Biomedical Applications

Therapy development

Disease detection

Fig. 2. ML challenges in nano-scale biomedical engineering.

these cases, ML aims at the development of such associations by means of acquiring knowledge from experimental

data. Speciﬁcally, in order to incorporate quantum effects on

molecular dynamics simulations, ML can be employed for the

derivation of potential energy surfaces (PESs) from quantum

mechanic (QM) evaluations [10]–[15]. Another use of ML

lies in the simulation of molecular dynamic trajectories. For

example, in [16]–[18], the authors formulated ML problems

for discovering the optimum reaction coordinates in molecular

dynamics, whereas, in [19]–[23], the problem of estimating

free energy surfaces was reported. Furthermore, in [24]–[27],

the ML problem of creating Markov state models, which

take into account the molecular kinetics, was investigated.

Finally, the ML use in generating samples from equilibrium

distributions, that describe molecular systems, was studied

in [28].

2) Inverse design: The availability of several high-

resolution lithographic techniques opened the door to devising

complex structures with unprecedented properties. However,

the vast choice space, which is created by the large number of spatial degrees of freedom complemented by the wide choice of materials, makes it extremely difficult or even impossible for conventional inverse design methodologies to ensure the existence or uniqueness of acceptable solutions. To address this challenge, the nanoscience community turned to ML. In more detail, several researchers identified three

possible methods, which are based on artiﬁcial neural net-

works (ANNs), deep neural networks (DNNs), and generative adversarial networks (GANs). ANNs follow a trial-and-error approach in order to design multilayer nanoparticles (NP) [29].


Meanwhile, DNNs are used in the metasurface design [30].

Finally, GANs can be used to design nanophotonics structures

with precise user-defined spectral responses [31].

3) Experiments planning and autonomous research: ML

has been widely employed, in order to efﬁciently explore

the vast parameter space created by different combinations of

nano-materials and experimental conditions and to reduce the

number of experiments needed to optimize hetero-structures

(see e.g., [32] and references therein). Towards this direction,

fully autonomous research can be conducted, in which exper-

iments can be designed based on insights extracted from data

processing through ML, without a human in the loop [33].

B. Communications and Signal Processing

In biomedical applications, nano-sensors can be utilized

for a variety of tasks such as monitoring, detection and

treatment [34], [35]. The size of such nano-sensors ranges

between 1 and 100 nm, which corresponds to both macro-molecules

and bio-cells [35]. The proper selection of size and materials is

critical for the system performance, while it is constrained by

the target area, their purpose, and safety concerns. Such nano-

networks are inspired by living organisms and, when they are

injected into the human body, they interact with biological

processes in order to collect the necessary information [36].

However, they are characterized by limited communication

range and processing power, that allow only short-range

transmission techniques to be used [37]. As a consequence,

conventional electromagnetic-based transmission schemes may

not be appropriate for communications among molecules [3],

[38], since, in molecular communications the information is

usually encoded in the number of released particles. The sim-

plest approach for the receiver to demodulate the symbol is to

compare the number of received particles with predetermined

thresholds. In the absence of inter-symbol interference (ISI),

ﬁnding the optimal thresholds is a straightforward process.

However, in the presence of ISI the threshold needs to be

extracted as a solution of the error probability minimization (or

performance maximization) problem [39]–[41]. The aforemen-

tioned approaches require knowledge of the channel model.

However, in several practical scenarios, where the molecular

communications (MC) system complexity is high, this may

not be possible. To counter this issue, ML methods

can be employed to accurately model the channel or perform

data sequence detection.
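As a toy illustration of the threshold-based demodulation described above, the following sketch maps a received particle count to a symbol by comparing it against predetermined thresholds. The counts and threshold values are purely illustrative assumptions, not taken from any cited MC system.

```python
def demodulate(count, thresholds):
    """Map a received particle count to a symbol index.

    `thresholds` is a sorted list of illustrative decision levels:
    symbol k is decided when the count reaches thresholds[k-1]
    but not thresholds[k].
    """
    symbol = 0
    for t in thresholds:
        if count >= t:
            symbol += 1
        else:
            break
    return symbol

# Binary example: an assumed threshold of 50 particles separates bit 0 from bit 1.
print(demodulate(12, [50]))        # few particles  -> symbol 0
print(demodulate(73, [50]))        # many particles -> symbol 1
print(demodulate(120, [50, 100]))  # quaternary example -> symbol 2
```

In the presence of ISI, the threshold values would instead be obtained by solving the error-probability minimization problem mentioned above, rather than being fixed a priori.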

An alternative to MCs that has been used to support nano-

networks is communications in the terahertz (THz) band. For

these networks, apart from their speciﬁcations, an accurate

model for the THz communication between nano-sensors is

imperative for their simulation and performance assessment. In

addition, another problem that is entangled with novel nano-

sensor networks is their resilience against attacks, which is

of high importance since not only the system reliability is

threatened, but also the safety of the patients is at stake.

Thus, it is imperative for any possible threats to be recognized

and for effective countermeasures to be developed. A solution

to the above problems appears to be relatively complex for

conventional computational methods. On the other hand, ML

can provide the tools to model the space-time trajectories of

nano-sensors in the complex environments of the human body

as well as to draw strategies that mitigate the security risks of

the novel network architectures.

1) Channel modeling: One of the fundamental problems

in MCs is to accurately model the channel in different en-

vironments and conditions. Most of the MC models assume

that a molecule is removed from the environment after hitting

the receiver [42]–[46]; hence, each molecule can contribute

to the received signal once. To model this phenomenon,

a ﬁrst-passage process is employed. Another approach was

created from the assumption that molecules can pass through

the receiver [47]–[50]. In this case, a molecule contributes

multiple times to the received signal. However, neither of the aforementioned approaches is capable of modeling perfectly absorbing receivers when the transmitters reflect spherical bodies. Interestingly, such models accommodate practical sce-

narios where the emitter cells do not have receptors at the

emission site and they cannot absorb the emitted molecules.

An indicative example lies in hormonal secretion in the

synapses and pancreatic β-cell islets [51]. To fill this gap,

ML was employed in [52], [53] to model molecular channels

in realistic scenarios, with the aid of ANNs. Similarly, in

THz nano-scale networks, where the in-body environment is

characterized by high path-loss and molecular absorption noise

(MAN), ML methods can be used in order to accurately model

MAN. This opens the road to a better understanding of the

MAN’s nature and the design of new transmission schemes

and waveforms.
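The first-passage behavior described above, in which an absorbed molecule is removed and therefore contributes to the received signal only once, can be sketched with a simple Monte Carlo experiment. The one-dimensional random-walk model and all parameter values below are illustrative assumptions, not a model from the cited works.

```python
import random

def first_passage_fraction(steps, sigma, r_rx, d, trials=2000, seed=1):
    """Monte Carlo sketch of a 1D first-passage process.

    A molecule released at distance d from an absorbing boundary at r_rx
    performs a Gaussian random walk; on its first arrival it is absorbed
    and removed, so it is counted at most once. Returns the fraction of
    molecules absorbed within the observation window.
    """
    random.seed(seed)
    absorbed = 0
    for _ in range(trials):
        x = d
        for _ in range(steps):
            x += random.gauss(0.0, sigma)
            if x <= r_rx:   # first hit: molecule absorbed and removed
                absorbed += 1
                break
    return absorbed / trials

# Longer observation windows let more molecules reach the receiver.
print(first_passage_fraction(steps=200, sigma=1.0, r_rx=0.0, d=5.0))
```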

2) Signal detection: To avoid channel estimation in MC,

Farsad et al. proposed in [54] a sequence detection scheme based on recurrent neural networks (RNNs). Compared with previously presented ISI mitigation schemes, ML-based data sequence detection is less complex, since it does not require channel estimation and data equalization. Following a

similar approach, in [6], the authors presented an ANN capable

of achieving the same performance as conventional detection

techniques, that require perfect knowledge of the channel.

In THz nano-scale networks, an energy detector is usually

used to estimate the received data [55]. In more detail, if the

received signal power is below a predefined threshold, the detector decides that bit 0 has been sent; otherwise, it decides that bit 1 has been sent. However, the transmission of a 1 causes a MAN

power increase, usually capable of affecting the detection of

the next symbols. To counterbalance this, without increasing

the symbol duration, a possible approach is to design ML

algorithms that are trained to detect the next symbol and

take into account the already estimated ones. Another ML

challenge in signal detection at THz nano-scale networks,

lies with detecting the modulation mode of the transmission

signal by a receiver, when no prior synchronization between

transmitter and receiver has occurred. The solution to this

problem will provide scalability to these networks. Motivated

by this, in [56], the authors provided a ML algorithm for

modulation recognition and classiﬁcation.
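The energy-detection rule discussed above can be sketched as follows, together with a simple ISI-aware variant that raises the threshold after a detected 1 to mimic the MAN power increase; the energy values and the offset are illustrative assumptions, not a scheme from the cited works.

```python
def detect(energies, base_thr, isi_offset=0.0):
    """Decide bit 1 when received energy exceeds a threshold.

    With isi_offset > 0, the threshold is raised for the symbol that
    follows a detected 1, crudely accounting for the residual MAN power.
    """
    bits, prev = [], 0
    for e in energies:
        thr = base_thr + (isi_offset if prev == 1 else 0.0)
        prev = 1 if e > thr else 0
        bits.append(prev)
    return bits

rx = [0.2, 1.4, 1.1, 0.3, 1.6]            # illustrative received energies
print(detect(rx, 1.0))                    # plain detector -> [0, 1, 1, 0, 1]
print(detect(rx, 1.0, isi_offset=0.3))    # ISI-aware      -> [0, 1, 0, 0, 1]
```

The ML approach described above would, in effect, learn such a state-dependent decision rule from data instead of relying on a hand-picked offset.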

3) Routing and mobility management: In THz nano-scale

networks, the design of routing protocols capable of proac-

tively counteracting congestion has been identified as the


next step for their utilization [57]. These protocols need to

take into account the extremely constrained computational

resources, the stochastic nature of nano-nodes movements

as well as the existence of obstacles that may interrupt the

line-of-sight transmission. The aforementioned challenges can

be faced by employing SOTA ML techniques for analyzing

collected data and modeling the nano-sensors’ movements,

discovering neighbors that can be used as intermediate nodes,

identifying possible blockers, and proactively determining

the message route from the source to the final destination.

In this context, in [58], the authors presented a multi-hop

deﬂection routing algorithm based on reinforcement learning

and analyzed its performance in comparison to different neural

networks (NNs) and decision tree updating policies.
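In the spirit of the reinforcement-learning-based routing cited above, the following toy tabular Q-learning sketch lets a node learn which neighbor to forward packets to on a four-node line topology; the topology, rewards, and hyper-parameters are all illustrative assumptions.

```python
import random

def train_routing(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a toy line topology 0-1-2-3.

    Node 3 is the destination; each hop costs -0.1 and delivery earns +1,
    so the learned Q-values favor forwarding toward the destination.
    """
    random.seed(seed)
    neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: []}
    q = {(s, a): 0.0 for s in neighbors for a in neighbors[s]}
    for _ in range(episodes):
        s = 0
        while s != 3:
            a = random.choice(neighbors[s])      # uniform exploration
            r = 1.0 if a == 3 else -0.1          # reward reaching the sink
            future = max((q[(a, b)] for b in neighbors[a]), default=0.0)
            q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
            s = a
    return q

q = train_routing()
# At node 2, forwarding toward node 3 should dominate going back to node 1.
print(q[(2, 3)] > q[(2, 1)])
```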

4) Event detection: Nano-sensor biomedicine networks can

provide continuous monitoring solutions, that can be used

as compact, accurate, and portable diagnostic systems. Each

nano-sensor obtains a biological signal linked to a speciﬁc

disease and is used for detecting physiological change or

various biological materials [59]. Successful applications in

event detection include monitoring of DNA interactions, an-

tibody, and enzymatic interactions, or cellular communication

processes, and are able to detect viruses, asthma attacks and

lung cancer [60]. For example, in [61], the authors developed a

bio-transferrable graphene wireless nano-sensor that is able to sense chemicals and biological compounds with extreme sensitivity, down to a single bacterium. Furthermore, in [62], a Sandwich

Assay was developed that combines mechanical and opto-

plasmonic transduction in order to detect cancer biomarkers

at extremely low concentrations. Also, in [63], a molecular

communication-based event detection network was proposed,

that is able to cope with scenarios where the molecules

propagate according to anomalous diffusion instead of the

conventional Brownian motion.

5) Security: Although the emergence of nano-scale net-

works based on both electromagnetic and MCs opened oppor-

tunities for the development of novel healthcare applications,

it also generated new problems concerning the patients’ safety.

In particular, two types of security risks have been observed,

namely blackhole and sentry attacks [64]. In the former,

malicious nano-sensors emit chemicals to attract the legitimate

ones and prevent them from searching for their target. On the

contrary, in the latter, the malicious nano-sensors repel the

legitimate ones for the same reason. Such security risks can be

counterbalanced with the use of threshold-based and Bayesian

ML techniques that have been proven to counter the threats

with minimal requirements.

C. Biomedicine Applications

Timely detection and intervention are tied with successful

treatment for many diseases. This is the so-called proactive

treatment and is one of the main objectives of the next-

generation healthcare systems, in order to detect and pre-

dict diseases and offer treatment services seamlessly. Data

analysis and nanotechnology progress simultaneously toward

the realization of these systems. Recent breakthroughs in

nanotechnology-enabled healthcare systems allow for the ex-

ploitation of not only the data that already exist in medical

Fig. 3. ML methodologies for nano-scale biomedical engineering.

databases throughout the world, but also of the data gathered

from millions of nano-sensors.

1) Disease detection: One of the most common problems

in healthcare systems is genome classiﬁcation, with cancer

detection being the most popular. Various classiﬁcation algo-

rithms are suitable for tackling this problem, such as Naive

Bayes, k-Nearest Neighbors, Decision tree, ANNs and support

vector machine (SVM) [65]. For example, the authors in [66],

predicted the risk of cerebral infarction in patients by using

demographic and cerebral infarction data. In addition, in [7] a

unique coarse-to-ﬁne learning method was applied on genome

data to identify gastric cancer. Another example is the research

presented in [67], where SVM and convolution NNs (CNNs)

were used to classify breast cancer subcategory by performing

analysis on microscopic images of biopsy.
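As a minimal illustration of the classifiers listed above, the following sketch implements k-nearest-neighbor classification on entirely synthetic two-feature samples; the features, labels, and values are assumptions for illustration and do not correspond to any real genomic or biopsy data.

```python
def knn_predict(train, x, k=3):
    """Classify x by majority vote among its k nearest training samples.

    `train` is a list of ((feature, ...), label) pairs; distance is the
    squared Euclidean distance in feature space.
    """
    ranked = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)

train = [((0.1, 0.2), "benign"), ((0.2, 0.1), "benign"),
         ((0.9, 0.8), "malignant"), ((0.8, 0.9), "malignant"),
         ((0.85, 0.85), "malignant")]
print(knn_predict(train, (0.15, 0.15)))  # -> "benign"
print(knn_predict(train, (0.9, 0.9)))    # -> "malignant"
```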

2) Therapy development: Therapy development and opti-

mization can improve clinical efﬁcacy of treatment for various

diseases, without generating unwanted outcomes. Optimization

still remains a challenging task, due to its requirement for

selecting the right combination of drugs, dose and dosing fre-

quency [68]. For instance, a quadratic phenotype optimization

platform (QPOP) was proposed in [69] to determine the opti-

mal combination from 114 drugs to treat bortezomib-resistant

multiple myeloma. Since its creation, QPOP has been used to

overcome the problems related to drug design and optimiza-

tion, as well as drug combinations and dosing strategies. Also,

in [70], the authors presented a platform called CURATE.AI,

which was validated clinically and was used to standardize

therapy of tuberculosis patients with liver transplant-related

immunosuppression. Furthermore, CURATE.AI was used for

treatment development and patient guidance that resulted in

halted progression of metastatic castration resistant prostate

cancer [71].


III. MACHINE LEARNING METHODS IN NANO-SCALE

BIOMEDICAL ENGINEERING

This section presents the fundamental ML methodologies

that are used in nano-scale biomedical engineering. As illus-

trated in Fig. 3, in nano-scale biomedical engineering, depend-

ing on how training data are used, we can identify two groups

of ML methodologies, namely supervised, and unsupervised

learning.

Supervised learning methodologies require a certain amount

of labeled data for training [72]. Their objective is to create a

function that maps the input data to the output labels relying on

the initial training. In more detail, supervised learning returns a mapping function g(x) that maximizes the scoring function f(x_n, y_n) for each n ∈ [1, N], with x_n being the n-th sample of the input training data, y_n representing the label of x_n, and N being the size of the training set. Of note, in most realistic scenarios, the aforementioned samples are independent and identically distributed.
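A minimal sketch of the supervised-learning objective above: from a small hypothesis set, pick the mapping g that maximizes the total score f(x_n, y_n), here realized as training accuracy. The toy data and hypothesis set are assumptions for illustration.

```python
def accuracy(g, data):
    """Fraction of labeled samples (x_n, y_n) for which g(x_n) == y_n."""
    return sum(1 for x, y in data if g(x) == y) / len(data)

# Toy labeled training set: negative inputs -> 0, positive inputs -> 1.
train = [(-2, 0), (-1, 0), (1, 1), (3, 1)]

# Tiny hypothesis set: two constant mappings and one thresholding rule.
hypotheses = [lambda x: 0, lambda x: 1, lambda x: int(x > 0)]

# "Training" = selecting the hypothesis with the best score on the data.
best = max(hypotheses, key=lambda g: accuracy(g, train))
print(accuracy(best, train))  # the thresholding rule fits perfectly -> 1.0
```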

On the other hand, unsupervised learning methodologies

aim at exploring the hidden features or structure of data

without relying on training sets [73]. Therefore, they have

extensively been used for chemical and biological properties

discovery in nano-scale structures and materials. The disadvantage of unsupervised learning methodologies lies in the fact that there is no standard method for evaluating the accuracy of their output, due to the lack of labeled training data sets.
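As a minimal sketch of such unsupervised structure discovery, the following one-dimensional k-means routine groups unlabeled measurements into two clusters without any training labels; the data values are synthetic.

```python
def kmeans_1d(data, iters=20):
    """Two-cluster k-means on scalar data: alternately assign each point
    to its nearest center and recompute the centers as cluster means."""
    c0, c1 = min(data), max(data)   # simple deterministic initialization
    for _ in range(iters):
        g0 = [x for x in data if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in data if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return c0, c1

# Unlabeled "property" measurements with two natural groups near 1 and 5.
data = [1.0, 1.2, 0.9, 5.1, 4.8, 5.3]
print(kmeans_1d(data))
```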

The rest of this section is organized as follows: Section III-A

provides a survey of the ANNs, which are employed in this

ﬁeld, while Section III-B presents regression methodologies.

Meanwhile, the applications, architecture and building blocks

of SVMs and k-nearest neighbors (KNNs) are respectively described in Sections III-C and III-D, whereas dimensionality reduction methods are given in Section III-E. A brief review

of gradient descent (GD) and active learning (AL) methods

are respectively delivered in Sections III-F and III-G. Further-

more, Bayesian ML is discussed in Section III-H, whereas

decision tree learning (DTL) and decision table (DT) based

algorithms are respectively reported in Sections III-I and III-J.

Section III-K revisits the operating principles of surrogate-

based optimization, while Section III-L describes the use of

quantitative structure-activity relationships (QSARs) in ML.

Finally, the Boltzmann generator is presented in Section III-M,

while Sections III-N and III-O respectively discuss feedback

system control (FSC) methods and the quadratic phenotypic

optimization platform. The organization of this section is

summarized at a glance in Fig. 4.

A. Artiﬁcial Neural Networks

ANNs can be used for both classiﬁcation and regression.

Their operation principle is based on the linear and/or non-

linear manipulation of the input-data in several intermediate

(hidden) layers. The output of each layer is passed through a non-linear function, namely the activation function. This

can be formulated as

y_k = g(v_k + c_k),    (1)

Sec. III - Machine Learning Methodologies in Nano-scale Biomedical Engineering

Sec. III-A - Artificial Neural Networks

Sec. III-B - Regression

Sec. III-C - Support Vector Machine

Sec. III-D - k-Nearest Neighbors

Sec. III-E - Dimensionality Reduction

Sec. III-F - Gradient Descent Method

Sec. III-G - Active Learning

Sec. III-H - Bayesian Machine Learning

Sec. III-I - Decision Tree Learning

Sec. III-J - Decision Table

Sec. III-K - Surrogate-Based Optimization

Sec. III-L - Quantitative Structure-Activity Relationships

Sec. III-M - Boltzmann Generator

Sec. III-N - Feedback System Control

Sec. III-O - Quadratic Phenotypic Optimization Platform

Sec. III-A.1 - Convolution Neural Networks

Sec. III-A.2 - Recurrent Neural Networks

Sec. III-A.3 - Deep Neural Networks

Sec. III-A.4 - Diffractive Deep Neural Networks

Sec. III-A.5 - Generalized Regression Neural Networks

Sec. III-A.6 - Multi-layer Perceptrons

Sec. III-A.7 - Generative Adversarial Networks

Sec. III-A.8 - Behler-Parrinello Networks

Sec. III-A.9 - Deep Potential Networks

Sec. III-A.10 - Deep Tensor Neural Networks

Sec. III-A.11 - SchNet

Sec. III-A.12 - Accurate Neural Network Engine for Molecular Energies

Sec. III-A.13 - Coarse Graining Networks

Sec. III-A.14 - Neuromorphic Computing

Sec. III-B.1 - Logistic Regression

Sec. III-B.2 - Multivariate Linear Regression

Sec. III-B.3 - Classification via Regression

Sec. III-B.4 - Local Weighted Learning

Sec. III-B.5 - Machine Learning Scoring Functions

Sec. III-E.1 - Feature Selection

Sec. III-E.2 - Principal Component Analysis

Sec. III-E.3 - Linear Discriminant Analysis

Sec. III-E.4 - Independent Component Analysis

Sec. III-I.1 - Bagging

Sec. III-I.2 - Bagged Tree

Sec. III-I.3 - Naive Bayes Tree

Sec. III-I.4 - Adaptive Boosting

Sec. III-I.5 - Random Forest

Sec. III-I.6 - M5P

Fig. 4. The organization of Section III.

where

v_k = \sum_{i=1}^{m} w_{ki} x_i,   (2)

with x_i and y_k respectively being the input and the output
signals of the k-th layer, while w_{ki} and c_k respectively
stand for the associated weights and bias. Finally, g(·)
stands for the activation function. This process allows us to
model complex relationships of the processed data.
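As an illustration, Eqs. (1)-(2) amount to a matrix-vector product followed by an element-wise activation. The following NumPy sketch (all names and numerical values are ours, for illustration only) computes the output of one layer:

```python
import numpy as np

def ann_layer(x, w, c, g=np.tanh):
    """One ANN layer, Eqs. (1)-(2): v_k = sum_i w_ki x_i, y_k = g(v_k + c_k)."""
    v = w @ x           # Eq. (2): weighted sum for every neuron k
    return g(v + c)     # Eq. (1): add bias, apply activation element-wise

# toy layer: 2 neurons, 3 inputs (illustrative values)
x = np.array([1.0, 0.5, -0.5])
w = np.array([[0.2, 0.4, 0.1],
              [-0.3, 0.8, 0.5]])
c = np.array([0.0, 0.1])
y = ann_layer(x, w, c)
```

Stacking several such layers, each feeding its output to the next, yields the hidden-layer manipulation described above.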

The remainder of this section is focused on presenting

the ANNs that are commonly used in nano-scale biomedi-

cal engineering and is organized as follows: Section III-A1

reports the applications of CNNs in this ﬁeld, presents a

typical CNN architecture and discusses its building blocks

functionalities. Similarly, Section III-A2 presents the oper-

ation of RNNs, while deep NNs (DNNs) are discussed in

Section III-A3. Diffractive DNNs (D2NN) and generalized

regression NNs (GRNNs) are respectively described in Sec-

tion III-A4 and III-A5, while Sections III-A6 and III-A7

respectively revisit the multi-layer perceptrons (MLPs) and

GANs. Moreover, the applications, architecture and limitations

IEEE TRANSACTIONS ON MOLECULAR, BIOLOGICAL, AND MULTI-SCALE COMMUNICATIONS, VOL. -, NO. -, - 2020 7

of Behler-Parrinello networks (BPNs) are reported in Section III-A8, whereas Sections III-A9, III-A10, and III-A11

respectively present the ones of deep potential networks

(DPNs), deep tensor NNs (DTNNs), and SchNets. Likewise,

the usability and building blocks of accurate NN engine for

molecular energies, or as it is widely known, ANI, are provided

in Section III-A12. Finally, comprehensive descriptions of

coarse graining networks (CGNs) and neuromorphic comput-

ing are respectively given in Sections III-A13 and III-A14.

Table I summarizes some of the typical applications of ANNs

in nano-scale biomedical engineering.

1) Convolution Neural Networks: CNNs have been ex-

tensively used for analyzing images with some degrees of

spatial correlation [94]–[97]. The aim of CNNs is to extract

fundamental local correlations within the data, and thus, they

are suitable for identifying image features that depend on

these correlations. In this sense, in [74], the author employed

CNNs to analyze skyrmions in labeled Lorentz transmission

electron microscope (TEM) images, while, in [75], CNNs were

used to identify matter phases from data extracted via Monte

Carlo simulations. Another application of CNNs in nano-scale

biomedical systems lies in the utilization of autonomous re-

search systems (ARES) [76]. Speciﬁcally, in [76], the authors

presented a learning method that determines the state-of-the-

tip in scanning tunneling microscopy.

Figure 5 depicts a typical CNN architecture, which mimics

the neurons’ connectivity patterns in the human brain. It

consists of neurons, which are arranged in a three dimensional

(3D) space, i.e., width, height, and depth. Each neuron receives

several inputs and performs an element-wise multiplication,

which is usually followed by a non-linear operation. Note that,

in most cases, CNN architectures are not fully-connected. This

means that the neurons in a layer will only be connected to

a small region of the previous layer. Each layer of a CNN

transforms its input to a 3D output of neuron activations. In

more detail, it consists of the following layers:

•Input: This layer represents the input image into the CNN.

Input layer holds the raw pixels of the image in the three

color channels, namely red, green, and blue.

•Convolution: these layers are the pillars of the CNN. They
contain the weights that are used to extract the distinguishing

features of the images. As illustrated in Fig. 5, they

evaluate the output of neurons, which are connected to

local regions in the input.

•Rectiﬁed linear unit (RELU): applies an element-wise

activation function, such as thresholding at zero. This

allows the generation of non-linear decision boundaries.

•Pooling: conducts downsampling along the spatial dimen-

sions.

•Flattening: reorganizes the values of the 3D matrix into

a vector.

•Hidden layers: return the classiﬁcation scores.
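The forward pass through these building blocks can be sketched in NumPy as follows (a minimal single-channel illustration; the kernel and image values are toy assumptions, not taken from the cited works):

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (strictly, cross-correlation, as in most CNNs)."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    """Element-wise thresholding at zero."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Downsampling along the spatial dimensions."""
    H, W = x.shape
    return x[:H - H % size, :W - W % size] \
        .reshape(H // size, size, W // size, size).max(axis=(1, 3))

# toy single-channel 4x4 "image" and a 1x2 edge-detecting kernel
img = np.arange(16, dtype=float).reshape(4, 4)
feat = max_pool(relu(conv2d(img, np.array([[1.0, -1.0]]))))
flat = feat.ravel()   # flattening: reorganize the activations into a vector
```

In a real CNN the convolution weights are learned, and the flattened vector feeds the final layers that return the classification scores.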

2) Recurrent Neural Networks: Most ML networks rely on

the assumption of independence among the training and test

data. Thus, after processing each data point, the entire state of

the network is lost. Apparently, this is not a problem, if the

data points are independently generated. However, if they are
related in time or space, the aforementioned assumption be-

comes unacceptable. Moreover, conventional networks usually

Fig. 5. CNN architecture.

rely on data points, which can be organized in vectors of ﬁxed

length. However, in practice, there exist several problems,

which require modeling data with temporal or sequential

structure and varying length inputs and outputs.

In order to overcome the aforementioned limitations, RNNs

have been proposed in [98]. RNNs are connectionist models

capable of selectively passing information across sequence

steps, while processing sequential data. From the nano-scale

applications point of view, RNNs have been used for nano-

structure design and data sequence detection in MCs. Speciﬁ-

cally, in [77], Hedge described the role that RNNs are expected

to play in the design of nano-structures, while, in [78] and

in [54], the authors employed a RNN in order to train a

maximum likelihood detector in MCs systems.

Figure 6 depicts the most successful RNN architecture,


TABLE I
ANN APPLICATIONS IN NANO-SCALE BIOMEDICAL ENGINEERING.

Paper  Application                                   Method          Description
[18]   Chemical properties discovery                 CGN             Prediction of the rototranslationally invariant energy in QM
[31]   Nano-material inverse design                  GAN             Metasurface inverse design
[53]   Channel modelling                             DNN             MIMO channel modeling in MC
[54]   Sequence detection                            RNN             Data sequence detection in MC
[74]   Image analysis                                CNN             Skyrmion analysis in labeled Lorentz TEM images
[75]   Image analysis                                CNN             Matter phase identification
[76]   ARES                                          CNN             State-of-the-tip identification in scanning tunneling microscopy
[77]   Nano-structure design                         RNN             Nano-structure design
[78]   Sequence detection                            RNN             Data sequence detection in MC
[79]   Feature detection and object classification   D2NN            Classification of images and creation of an imaging lens at the THz spectrum
[80]   Data analysis                                 GRNN, MLP, BPN  Characterization of psychological wellness from survey results
[81]   Nano-structure properties discovery           GRNN            Study of the impact of ZnO NP suspensions in diesel and Mahua biodiesel blended fuel
[82]   Nano-structure properties discovery           GRNN            Prediction of the pool boiling heat transfer coefficient of refrigerant-based nano-fluids
[83]   Nano-structure analysis                       MLP             Analysis of the crystalline structure of magnesium oxide films grown over 6H SiC substrates
[84]   Nano-structure design                         GAN             Nano-photonic structure design
[85]   Chemical properties discovery                 BPN             Energy surface prediction from QM data
[86]   Complex structure simulation                  BPN             Self-learning Monte Carlo creation for many-body interactions
[87]   Complex structure simulation                  BPN             Atomic energy prediction
[88]   Chemical properties discovery                 DPN             PES prediction that uses the atomic configuration directly as input data
[89]   Molecule and nano-material properties discovery  DTNN         General QM molecular potential modeling
[90]   Chemical properties discovery                 SchNet          PES prediction that takes into account rototranslationally invariant inter-atomic distances
[91]   Chemical properties modeling                  ANI             Prediction of molecule energies in complex nano-structures
[92]   Chemical properties modeling                  CGN             Thermodynamics prediction in chemical systems
[93]   Chemical properties modeling                  CGN             Thermodynamics prediction in chemical systems

introduced by Hochreiter and Schmidhuber [99]. From this

figure, it is evident that the only difference between the RNN
and a conventional feed-forward NN is that the hidden layers
of the former are replaced with memory cells with self-connected
recurrent fixed-weight edges. The memory cells store the internal state

weighted edges. The memory cells store the internal state

of the RNN and allow processing sequences of inputs of

varying length. Likewise, the recurrent edges guarantee that

the gradient can pass across several steps without vanishing.

The weights change during training at a slow rate in order

to create a long-term memory. Finally, RNNs support short-

term memory through ephemeral activations, which pass from

each node to successive nodes. This allows RNNs to exploit

the dynamic temporal information hidden in time sequences.
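As a minimal illustration of how a recurrent cell carries state across sequence steps, the following sketch implements a vanilla RNN step (a simplification of the LSTM memory cells of Fig. 6; all weights are random toy values):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One recurrent step: the hidden state h carries information
    across sequence steps (short-term memory)."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
W_x = rng.normal(size=(3, 2)) * 0.1   # input-to-hidden weights
W_h = rng.normal(size=(3, 3)) * 0.1   # recurrent (hidden-to-hidden) weights
b = np.zeros(3)

# the same parameters process a sequence of arbitrary length
h = np.zeros(3)
for x_t in [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]:
    h = rnn_step(x_t, h, W_x, W_h, b)
```

Because the parameters are shared across steps, the network naturally handles varying-length inputs, which is exactly what fixed-length feed-forward models cannot do.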

3) Deep Neural Networks: Deep learning was suggested

in [54] as an efﬁcient method to detect the information at the

receiver in MCs. Speciﬁcally, based on the similarities between

speech recognition and molecular channels, techniques from

DL can be utilized to train a detection algorithm from samples

of transmitted and received signals. In the same work, it

was proposed that well-known NNs such as an RNN, can

train a detector even if the underlying system model is not

known. Furthermore, a real-time NN-based sequence detector

was proposed, and it was shown that the suggested DL-based

algorithms could eliminate the need for instantaneous channel

state information estimation.

In another research work, [53], a NN-based modeling of

the molecular multiple-input multiple-output (MIMO) channel

was presented. This is a remarkable contribution, since the

proposed model can be used to investigate the possibility of

increasing the low rates in MCs. Speciﬁcally, in this paper,
a 2×2 molecular MIMO channel was modeled through two

ML-based techniques and the developed model was used to

evaluate the bit error rate (BER).

4) Diffractive Deep Neural Networks: In [79], a diffractive

deep NN (D2NN) framework was proposed. The D2NN is

an all-optical deep learning framework, where multiple layers

of diffractive surfaces physically form the NN. These layers

collaborate to optically perform an arbitrary function, which

can be learned statistically by the network. The learning part

is performed through a computer, whereas the prediction of

the physical network follows an all-optical approach.

Several transmissive and/or reﬂective layers create the

D2NN. More speciﬁcally, each point on a speciﬁc layer can

either transmit or reﬂect the incoming wave. To this end, an

artiﬁcial neuron is formed, which is connected to other neurons

of the following layers through optical diffraction. Following

Huygens’ principle, each point on a speciﬁc layer acts as a

secondary source of a wave, whose amplitude and phase are

expressed as the product of the complex valued transmission or

reﬂection coefﬁcient and the input wave at that point. Conse-

quently, the input interference pattern, due to the earlier layers

and the local transmission/reﬂection coefﬁcient at a speciﬁc

point, modulate the amplitude and phase of a secondary wave,

through which an artiﬁcial neuron in the D2NN is connected to

the neurons of the following layer. The transmission/reﬂection

coefﬁcient of each neuron can be considered as a multiplicative

bias term, which is a repetitively adjusted parameter during

the training process of the diffractive network, using an error

back-propagation method. Generally, the amplitude and the

phase of each neuron can be a learnable parameter, providing a


Fig. 6. An RNN with a hidden layer consisting of two memory cells.

complex-valued modulation at each layer and, thus, enhancing

the inference performance of the network.

5) Generalized Regression Neural Networks: GRNN be-

longs to the instance-based learning methods and it is a

variation of radial basis NNs [100]. Instance-based learning
methods, which construct hypotheses directly from the training
instances, generally have tractable computational cost compared
to non-instance-based methods, such as an MLP with backpropaga-
tion. GRNN consists of an input layer, a pattern layer, and the

output layer and can be expressed as

\hat{y}(x) = \hat{f}(x) = \frac{\sum_{k=1}^{N} y_k K(x, x_k)}{\sum_{k=1}^{N} K(x, x_k)},   (3)

where \hat{y}(x) is the prediction value for the (N+1)-th input x, y_k is
the activation of the k-th neuron of the pattern layer, and K(x, x_k)
is the radial basis function kernel, which is a Gaussian kernel
given by

K(x, x_k) = e^{-d_k/(2\sigma^2)}, \quad d_k = (x - x_k)^T (x - x_k),   (4)

where d_k is the squared Euclidean distance between x and x_k, and σ is
a smoothing parameter. Due to the presence of K(x, x_k), the values y_k
of training data instances that are closer to x, according
to the σ parameter, have a more significant contribution to the
predicted value.
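Eqs. (3)-(4) translate almost directly into code. The sketch below (with toy one-dimensional data; all names are ours) shows how the Gaussian kernel weighting concentrates the prediction on nearby training instances:

```python
import numpy as np

def grnn_predict(x, X_train, y_train, sigma=0.5):
    """GRNN prediction, Eqs. (3)-(4): a kernel-weighted average
    of the training targets."""
    d = np.sum((X_train - x) ** 2, axis=1)   # squared Euclidean distances d_k
    K = np.exp(-d / (2.0 * sigma ** 2))      # Gaussian kernel activations
    return np.sum(y_train * K) / np.sum(K)

X_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([0.0, 1.0, 4.0])
y_hat = grnn_predict(np.array([1.0]), X_train, y_train, sigma=0.1)
```

With a small σ the prediction is dominated by the nearest training instance; as σ grows, it tends toward the plain average of the training targets.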

GRNN is used in [80] in order to characterize psychological

wellness from survey results that measure stress, depression,

anger, and fatigue. Moreover, it was employed in [81] for

investigating the effect of zinc oxide (ZnO) NPs suspensions

in diesel and Mahua biodiesel blended fuel on single cylinder

diesel engine performance characteristics. Finally, in [82],

it was employed to predict the pool boiling heat transfer

coefﬁcient of refrigerant-based nano-ﬂuids.

6) Multi-layer Perceptrons: MLP is a type of feed-forward

ANN that consists of at least three layers of nodes: input layer,

output layer, and one or more hidden layers [101]. Apart from

the input nodes a_n^{(0)}, each node is a neuron that takes as input
a weighted sum of the node values of the previous layer plus a bias,
and gives an output determined by a (usually sigmoid) activation
function, σ(·). Therefore, the input of the
k-th neuron in the L-th layer can be expressed as

z_k^{(L)} = w_{k,0} a_0^{(L-1)} + \cdots + w_{k,n} a_n^{(L-1)} + b_k,   (5)

where w_{k,i} are the weights associated with each node of the
previous layer and b_k is the bias of the k-th node of the
L-th layer. The activation of that neuron can then be
written as

a_k^{(L)} = \sigma\left(z_k^{(L)}\right).   (6)

The number of nodes in the input layer is equivalent to

the number of input features, whereas the number of output

neurons corresponds to the output features. A cost function C,

which is usually the sum squared errors between prediction

and target, is calculated and it is fed in a backward fashion

in order to update the weights in each neuron via a GD

algorithm, and thus, to minimize the cost function. This

learning method of updating the weights in such manner is

called back-propagation [102]. More speciﬁcally, the degree

of error in an output node j for the n-th training example is
e_j(n) = y_j(n) - \hat{y}_j(n), where y_j is the target value and \hat{y}_j is
the value predicted by the perceptron. The error for example
n, over all output nodes, can be obtained as

C(n) = \sum_{j} e_j^2(n).   (7)

GD dictates a change in weights proportional to the negative

gradient of the cost function, −∇C(w). However, this method


with the entirety of training data can be computationally

expensive, so methods like stochastic GD for every step can

increase efﬁciency.
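The forward pass of Eqs. (5)-(6) and the back-propagation update driven by the cost of Eq. (7) can be sketched as follows (a toy two-layer network with a linear output neuron; all sizes, seeds, and the learning rate are illustrative assumptions):

```python
import numpy as np

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))   # sigmoid activation

def forward(params, x):
    """Eqs. (5)-(6): weighted sum plus bias, sigmoid hidden layer, linear output."""
    W1, b1, w2, b2 = params
    a1 = sigma(W1 @ x + b1)
    return a1, w2 @ a1 + b2

def gd_step(params, x, y, eta=0.1):
    """One back-propagation update: weights move along -grad C, with C as in Eq. (7)."""
    W1, b1, w2, b2 = params
    a1, y_hat = forward(params, x)
    e = y_hat - y                         # output error e(n)
    delta1 = 2 * e * w2 * a1 * (1 - a1)   # error back-propagated through sigma
    return [W1 - eta * np.outer(delta1, x), b1 - eta * delta1,
            w2 - eta * 2 * e * a1, b2 - eta * 2 * e]

rng = np.random.default_rng(1)
params = [rng.normal(size=(4, 2)), np.zeros(4), rng.normal(size=4), 0.0]
x_in, y_t = np.array([0.5, -0.2]), 1.0
c0 = (forward(params, x_in)[1] - y_t) ** 2   # cost before training
for _ in range(100):
    params = gd_step(params, x_in, y_t)
c1 = (forward(params, x_in)[1] - y_t) ** 2   # cost after 100 GD steps
```

Running the loop drives the cost of Eq. (7) toward zero for this single training example; stochastic GD applies the same update one (mini-batch of) example(s) at a time.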

MLP was used in [80] in order to characterize psychological

wellness from survey results that measure stress, depression,

anger, and fatigue. Likewise, in [83], MLP found an applica-

tion in analyzing the crystalline structure of magnesium oxide

ﬁlms grown over 6H silicon carbide (SiC) substrates.

7) Generative Adversarial Networks: A GAN [103] is an

unsupervised learning strategy, which was introduced in [104].

A GAN consists of two networks, a generator that estimates

the distributions of the parameters and a discriminator that

evaluates each estimation by comparing it to the available

unlabeled data. This strategy can exploit speciﬁc training

algorithms for different models and optimization algorithms.

Speciﬁcally, a MLP can be utilized in a twofold way, i.e., the

generative model generates samples by passing random noise

through it, while it is also used as the discriminative model.

Both networks can be trained using only the highly successful

backpropagation and dropout algorithms, while approximate

prediction or Markov chains are not necessary.

The generator's distribution p_g over data x can be learned
by defining a prior on input noise variables p_z(z) and
representing a mapping to data space as G(z; θ_g), where G is
a differentiable function which corresponds to an MLP with
parameters θ_g. A second MLP, D(x; θ_d), with parameters θ_d
and a single scalar number as output, denotes the probability
that x is derived from the data rather than p_g. D is
trained in order to maximize the probability that the training
examples and samples from G are labeled correctly, while G is
simultaneously trained to minimize the term log(1 − D(G(z))).

More speciﬁcally, a two-player min-max game is performed

with value function V(D, G) as follows:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))].   (8)

In practice, the game must be performed by using an iterative

numerical approach. Optimizing D in the inner loop of training
is computationally prohibitive and on finite data sets would
result in over-fitting. A better solution is to alternate between
k steps of optimizing D and one step of optimizing G. To
this end, D is maintained near its optimal solution, while G is

modified slowly enough. In nano-scale biomedical engineering,
GANs have found application in nano-photonic structure
design [84] as well as in metasurface inverse design [31].
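The alternating optimization of Eq. (8) can be sketched on a toy one-dimensional problem (the Gaussian "data", linear generator, and logistic discriminator are our simplifying assumptions; a real GAN uses back-propagation rather than the finite-difference gradients used here for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(3.0, 1.0, size=2000)   # toy "data" distribution

D = lambda x, th: 1.0 / (1.0 + np.exp(-(th[0] * x + th[1])))  # discriminator
G = lambda z, ph: ph[0] + ph[1] * z                            # generator

def V(th, ph, z):
    """Monte-Carlo estimate of the two-player value function, Eq. (8)."""
    fake = G(z, ph)
    return np.mean(np.log(D(real, th) + 1e-12)) + \
           np.mean(np.log(1.0 - D(fake, th) + 1e-12))

def grad(f, p, eps=1e-5):
    """Finite-difference gradient (keeps the sketch dependency-free)."""
    g = np.zeros_like(p)
    for i in range(len(p)):
        dp = np.zeros_like(p); dp[i] = eps
        g[i] = (f(p + dp) - f(p - dp)) / (2 * eps)
    return g

th, ph = np.array([0.1, 0.0]), np.array([0.0, 1.0])  # D and G parameters
for _ in range(300):
    z = rng.normal(size=500)
    for _ in range(3):                                # k steps of ascent on D
        th += 0.05 * grad(lambda t: V(t, ph, z), th)
    ph -= 0.05 * grad(lambda p: V(th, p, z), ph)      # one step of descent on G
```

After training, the generator's offset ph[0] has moved from 0 toward the data mean, illustrating how the k-steps-on-D, one-step-on-G schedule keeps D near its optimum while G changes slowly.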

8) Behler-Parrinello Networks: BPNs are traditionally used

in molecular sciences in order to learn and predict the energy

surfaces from QM data, by combining all the relevant physical

symmetries and properties as well as sharing parameters

between atoms [85]. Another use of BPN lies in the self-

learning Monte Carlo simulation development for many-body

interactions [86]. Speciﬁcally, in [86], the authors employed

BPNs to make trainable effective Hamiltonians that were used

to extract the potential-energy surfaces in interacting many

particle systems. Finally, in [87], BPNs were used to predict

the atomic energy for different elements.

Fig. 7. Behler-Parrinello network architecture.

The fundamental BPN architecture is depicted in Fig. 7. For
each atom i, the molecular coordinates are mapped to invariant
features. A set of correlation functions, which describe the

chemical environment of each atom, is employed in order

to map the distances of neighboring atoms of a certain type

and the angle between two neighbors of speciﬁc types. The

aforementioned features are inputted into a dense NN, which

returns the energy of atom iin its environment. Input feature

functions are designed taking into account that the energy is

rototranslationally invariant, while equivalent atoms share their

parameters. In the ﬁnal step, all the atoms of a molecule are

identified and their atomic energies are summed. This guar-

antees permutation invariance. Parameter sharing combined

with the summation principle offers also scalability, since it

allows growing or shrinking the molecules network to any size,

including ones that were never seen in the training data. The

main limitation of BPNs is that they cannot accurately predict

the energy surfaces in complex chemical environments.
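The summation principle and the parameter sharing described above can be sketched as follows (the two-dimensional distance features stand in for the actual symmetry functions, and the shared per-element networks are random toy networks):

```python
import numpy as np

rng = np.random.default_rng(2)
# one small dense network per chemical element; equivalent atoms share it
nets = {elem: (rng.normal(size=(8, 2)), rng.normal(size=8)) for elem in ("H", "O")}

def features(pos, i):
    """Toy rototranslationally invariant features of atom i: mean and min
    distance to its neighbours (stand-ins for real symmetry functions)."""
    d = np.linalg.norm(pos - pos[i], axis=1)
    d = d[d > 0]
    return np.array([d.mean(), d.min()])

def bpn_energy(pos, elems):
    """Sum of per-atom energies -> permutation-invariant total energy."""
    total = 0.0
    for i, e in enumerate(elems):
        W, w = nets[e]
        total += w @ np.tanh(W @ features(pos, i))   # atom-specific network
    return total

# toy water-like geometry (coordinates are illustrative)
pos = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
E = bpn_energy(pos, ["O", "H", "H"])
E_perm = bpn_energy(pos[[0, 2, 1]], ["O", "H", "H"])   # swap the two H atoms
E_shift = bpn_energy(pos + 5.0, ["O", "H", "H"])       # rigid translation
```

Swapping equivalent atoms or translating the whole molecule leaves the predicted energy unchanged, which is exactly the invariance the architecture is built to guarantee.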

9) Deep Potential Networks: DPNs aim at providing an

end-to-end representation of PESs, which employs the atomic
configuration directly as the input data, without decomposing
the contributions of different numbers of bodies [88]. Similarly
to BPNs, the main challenge is to design a DNN that takes
into account both the rotational and permutational symmetries
as well as the chemically equivalent atoms.

Let us consider a molecule that consists of N_{X_i} atoms
of type X_i, with i \in \{1, 2, \cdots, M\}. As demonstrated in
Fig. 8, the DPN takes as inputs the Cartesian coordinates of
each atom and feeds them into \sum_{i=1}^{M} N_{X_i} almost independent
sub-networks. Each of them provides a scalar output that

corresponds to the local energy contribution to the PES, and

maps a different atom in the system. Furthermore, they are

coupled only through summation in the last step of this

method, when the total energy of the molecule is computed.

In order to ensure the permutational symmetry of the input, in

each sub-network, the atoms are fed into different groups that

correspond to different atomic species. Within each group,


the atoms are sorted in order of increasing distance to the

origin. To further guarantee global permutation symmetry, the

same parameters are assigned to all the sub-networks.
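The input-ordering step that enforces permutational symmetry can be sketched as follows (a hypothetical helper of our own; a real DPN would feed the ordered coordinates into the per-atom sub-networks):

```python
import numpy as np

def dpn_input(pos, elems):
    """DPN input ordering: group atoms by species, then sort each group
    by increasing distance to the origin, so that any permutation of
    equivalent atoms produces the same network input."""
    order = []
    for species in sorted(set(elems)):
        idx = [i for i, e in enumerate(elems) if e == species]
        idx.sort(key=lambda i: np.linalg.norm(pos[i]))
        order.extend(idx)
    return pos[order]

pos = np.array([[2.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.5, 0.0]])
inp = dpn_input(pos, ["H", "H", "O"])
# swapping the two equivalent H atoms yields the same ordered input
inp_perm = dpn_input(pos[[1, 0, 2]], ["H", "H", "O"])
```

Because the sorted, grouped input is identical under any relabeling of equivalent atoms, the downstream sub-networks automatically inherit global permutation symmetry.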

10) Deep Tensor Neural Networks: Recently, several re-

searchers have exploited the DTNN capability to learn a multi-

scale representation of the properties of molecules and mate-

rials from large-scale data in order to develop molecular and

material simulators [11], [89], [105]. In more detail, DTNN

initially recognizes and constructs a representation vector for

each one of the atoms within the chemical environment, and

then it employs a tensor construction algorithm that iteratively

learns higher-order representations, after interacting with all

pairwise neighbors.

Figure 9 presents a comprehensive example of DTNN archi-

tecture. The input, which consists of atom types and positions,

is processed through several layers to produce atom-wise

energies that are summed to a total energy. In the interaction

layer, which is the most important one, atoms interact via

continuous convolution functions. The variable W_t stands for

convolution weights that are returned from a ﬁlter generator

function. Continuous convolutions are generated by DNNs that

operate on interatomic distances, ensuring rototranslational

invariance of the energy.

DTNNs can accurately model a general QM molecular

potential by training them in a diverse set of molecular

energies [89]. Their main disadvantage is that they are unable

to perform energy predictions for systems larger than those

included in the training set [106].

11) SchNet: SchNets can be considered as a special case

of DTNN, since they both share atom embedding, interaction

reﬁnements and atom-wise energy contribution. Their main

difference is that interactions in DTNNs are modeled by tensor

layers, which provide atom representations. Parameter tensors

are also used in order to combine the atom representations

with inter-atomic distances [107]. On the other hand, to model

the interactions, SchNet employs ﬁlter convolutions, which are

interpreted as a special case of computational-efﬁcient low-

rank factorized tensor layers [108], [109].

Conventional SchNets use discrete convolution ﬁlters

(DCFs), which are designed for pixelated image processing

in computer vision [110]. QM properties, like energy, are

highly sensitive to position ambiguity. As a consequence, the

accuracy of a model that discretizes the particle positions in

a grid is questionable. To solve this problem, in [90], the

authors employed continuous convolutions in order to map

the rototranslationally invariant inter-atomic distances to ﬁlter

values, which are used in the convolution.

12) Accurate Neural Network Engine for Molecular Ener-

gies: Accurate neural network engine for molecular energies

(ANAKIN-ME), or ANI for short, are networks that have been

developed to overcome the limitations of DTNNs. The princi-

ple behind ANI is to develop modiﬁed symmetry functions

(SmFs), which were introduced by BPNs, in order to develop

NN potentials (NNPs). NNPs output single-atom atomic
environment vectors (AEVs) as a molecular representation. AEVs

allow energy prediction in complex chemical environments;

thus, ANI solves the transferability problem of BPNs. By

employing AEVs, the problem, which needs to be solved by

ANI, is simplified into sampling a statistically diverse set of

molecular interactions within a predeﬁned region of interest.

To successfully solve this problem, a considerably large data

set that spans molecular conformational and conﬁgurational

space, is required. A trained ANI is capable of accurately

predicting energies for molecules within the training set re-

gion [91].

As presented in Fig. 10, ANI uses the molecular coordinates

and the atoms in order to compute the AEV of each atom. The

AEV of atom A_i (with i = 1, \cdots, N), G_{A_i}, scrutinizes
specific regions of A_i's radial and angular chemical environment.
Each G_{A_i} is inputted into a single NNP, which returns the energy

of atom i. Finally, the total energy of a molecule is evaluated

as the sum of the energies of each one of the atoms.

13) Coarse Graining Networks: A common approach in

order to go beyond the time and length scales accessible with
computationally expensive molecular dynamics simulations is

the coarse-graining (CG) models. Towards this direction,

several research works, including [18], [111]–[119], developed

CG energy functions for large molecular systems, which

take into account either the macroscopic properties or the

structural features of atomistic models. All the aforemen-

tioned contributions agreed on the importance of incorporating

the physical constraints of the system in order to develop

a successful model. The training data are usually obtained

through atomistic molecular dynamics simulations. Values

within physically forbidden regions are not sampled and not

included in the training. As a result, the machine is unable

to perform predictions far away the training data, without

additional constraints.

To counteract the aforementioned problem, CG net-

works employ regularization methods in order to enforce the

correct asymptotic behavior of the energy when a nonphysical

limit is violated. Similarly to BPNs and SchNets, CG networks

initially translate the Cartesian into internal coordinates, and

use them to predict the rototranslationally invariant energy.

Next, as illustrated in Fig. 11, the network learns the difference

from a simple prior energy, which has been deﬁned to have

the correct asymptotic behavior [18]. Note that due to the

fact that CG networks are capable of using available training

data in order to correct the prior energy, its exact form is

not required. Likewise, CG networks compute the gradient of

the total free energy with respect to the input conﬁguration in

order to predict the conservative and rotation-equivariant force

ﬁelds. The force-matching loss minimization of this prediction

is used as a training rule of the CG network.
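The prior-plus-correction structure and the gradient-based force prediction can be sketched as follows (the repulsive prior, the toy correction network, and the reduction to a scalar inter-particle distance are all our simplifying assumptions):

```python
import numpy as np

def prior_energy(r):
    """Simple repulsive prior with the correct asymptotic behaviour:
    it diverges as particles overlap (r -> 0) and vanishes at large r."""
    return (0.3 / r) ** 12

rng = np.random.default_rng(3)
W, w = rng.normal(size=(8, 1)) * 0.5, rng.normal(size=8) * 0.1

def cg_free_energy(r):
    """CG network output: the prior plus a learned correction (a toy net here)."""
    return prior_energy(r) + w @ np.tanh(W @ np.array([r]))

def cg_force(r, eps=1e-6):
    """Conservative force = -dE/dr, computed here via a numerical gradient."""
    return -(cg_free_energy(r + eps) - cg_free_energy(r - eps)) / (2 * eps)

E_close, E_far = cg_free_energy(0.2), cg_free_energy(5.0)
```

Even with an untrained correction, the prior guarantees strong repulsion in the nonphysical overlap region; training would adjust only the bounded correction term against the force-matching loss.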

In practice, CGNs are used to predict the thermodynamics
of chemical systems that are considerably larger than what

is possible to simulate with atomistic resolution. Moreover,

some indications have recently been presented that they
can also be used to approximate the system kinetics, through the

addition of ﬁctitious particles [92] or by employing spectral

matching to train the CGN [93].

14) Neuromorphic Computing: Neuromorphic computing

[103] is an emerging ﬁeld, where the architecture of the

brain is closely represented by the designed hardware-level

system. The fundamental unit of neuromorphic computation

is a memristor, which is a two-terminal device in which


Fig. 8. Deep potential net architecture.

Fig. 9. DTNN architecture.


Fig. 10. ANI architecture.

Fig. 11. CG network architecture.

conductance is a function of the prior voltages in the de-

vice. Memristors were realized experimentally considering

that many nanoscale materials exhibit memristive properties

through ionic motion [120]. Nanophotonic systems are also

utilized for neuromorphic computing and especially for the

realization of deep learning networks [121] and adsorption-

based photonic NNs [122].

Although neuromorphic computing and memristors tend

to be a scalable practical technology, large area uniformity,

reproducibility of the components, switching speed/efﬁciency

and total lifetime in terms of cycles remain quite challenging

aspects [123], which require either the development of novel

memristive systems or improvements to existing systems.

To this end, integration with existing complementary metal-

oxide-semiconductor (CMOS) platforms and competitive per-

formance advantage over CMOS neurons must be explored.

These analog networks, after they are trained, can be highly

efficient; however, their training does not utilize digital logic

and, thus, lacks ﬂexibility [103].

B. Regression

In this section, we discuss the regression methods that are

commonly-used in the ﬁeld of nano-scale biomedical engineer-

ing. Regression aims at characterizing the relationships among

different variables. Three types of variables are identiﬁed in

regression problems, namely predictors, objective, and distor-

tion. A predictor, xi, with i∈[1, N], is an independent vari-

able, while the objective, Y, is the dependent one. Moreover,

let dstand for the distortion parameter that model unknown

parameters of the problem under investigation and affect the

estimated value of the dependent parameter. Mathematically

speaking, the objective of regression methods is to ﬁnd the

regression function f(x1,· · · , xN, d)that satisﬁes

Y=f(x1,· · · , xN, d).(9)

An important step for regression methods is to specify the form

of the regression function. Based on the selected regression

function, different regression methods can be identiﬁed. The

rest of this section presents the regression methods that are

commonly used in nano-scale biomedical engineering. In more

detail, Section III-B1 provides a brief overview of logistic

regression (LR), whereas Sections III-B2 and III-B3 respec-

tively discuss multivariate linear regression (MvLR) and clas-

siﬁcation via regression. Finally, Sections III-B4 and III-B5

respectively report the operating principles of local weighted

learning (LWL) and scoring functions (SFs). Table II sum-

marizes the applications of regression methodologies in nano-

scale biomedical engineering.

1) Logistic Regression: LR is a supervised learning classification algorithm used to predict the probability of a target variable. The target, or dependent, variable is dichotomous, which means that there are only two possible classes. LR can fit trends that are more complex than those of linear regression, but it still treats multiple properties as linearly related and remains a linear model. LR is named after the function used at the core of the method, the logistic function, which can take any real-valued number and map it into a value between 0 and 1. To provide a better understanding of LR, let us consider the binary classification problem in which z is the dependent variable and x = [x_1, x_2, ..., x_N] are the N independent variables. Since, for a fixed x, z follows a Bernoulli distribution, the probabilities Pr(z = 1 | x) and Pr(z = 0 | x) can be respectively obtained as

$$\Pr(z=1\,|\,\mathbf{x}) = \frac{1}{1+\exp(-f(\mathbf{x}))}, \quad (10)$$

$$\Pr(z=0\,|\,\mathbf{x}) = 1-\Pr(z=1\,|\,\mathbf{x}) = \frac{1}{1+\exp(f(\mathbf{x}))}, \quad (11)$$

where

$$f(\mathbf{x}) = c_0 + \sum_{i=1}^{N} c_i x_i, \quad (12)$$

IEEE TRANSACTIONS ON MOLECULAR, BIOLOGICAL, AND MULTI-SCALE COMMUNICATIONS, VOL. -, NO. -, - 2020 14

TABLE II
REGRESSION APPLICATIONS IN NANO-SCALE BIOMEDICAL ENGINEERING.

Paper | Application | Method | Description
[124] | Nanomedicine design | LR | Structure-activity relationships and design rules for spherical nucleic acids
[125] | Treatment design | LR | Classification of clinical trials based on an unsupervised ML algorithm
[126] | Chemical properties modeling | MvLR | Comparison of predictive computational models for nanoparticle-induced cytotoxicity
[127] | Chemical properties modeling | Classification via regression | Elimination of in silico materials from potential human applications
[127] | Chemical properties modeling | LWL, SVM | Cytotoxicity prediction of NPs in biological systems
[128] | Chemical properties modeling | SF | Binding affinity and virtual screening prediction for nano-structures
[129] | Chemical properties modeling | SF | Quantification of the impact of protein structure on binding affinity

with c_0, c_1, ..., c_N being the regression coefficients. From (10), we can straightforwardly obtain f(x) as

$$f(\mathbf{x}) = \ln\frac{\Pr(z=1\,|\,\mathbf{x})}{1-\Pr(z=1\,|\,\mathbf{x})}. \quad (13)$$

For a given training set of length N, {z_i, x_{i,1}, ..., x_{i,M}} with i ∈ [1, N], the regression coefficients can be estimated by employing the maximum likelihood approach.
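To make the maximum-likelihood step concrete, the sketch below fits the coefficients c_0, ..., c_N of (12) by gradient ascent on the log-likelihood. This is a standard estimator written in NumPy, not code from the surveyed works, and the learning rate and iteration count are arbitrary illustrative choices.

```python
import numpy as np

def fit_logistic_regression(X, z, lr=0.5, n_iter=3000):
    """Estimate c_0, ..., c_N of eq. (12) by gradient ascent on the log-likelihood."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend a column of ones for c_0
    c = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ c))           # Pr(z = 1 | x), eq. (10)
        c += lr * Xb.T @ (z - p) / len(z)           # gradient of the log-likelihood
    return c

def predict_proba(c, X):
    """Evaluate Pr(z = 1 | x) of eq. (10) for each row of X."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ c))
```

For a one-dimensional toy set with labels 0 below a threshold and 1 above it, the fitted model assigns probabilities below 0.5 to the former and above 0.5 to the latter.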

LR has been used extensively in biomedical applications,

such as disease detection. Indicatively, in [124], LR was

used to determine structure-activity relationships and design

rules for spherical nucleic acids functioning as cancer-vaccine

candidates. Moreover, in [125], it has been used for nano-

medicine-based clinical trials classiﬁcation and treatment de-

velopment.

2) Multivariate Linear Regression: Following the previous analysis, when multiple correlated dependent variables are predicted rather than a single scalar variable, the method is called MvLR. This method is a generalization of multiple linear regression and incorporates a number of different statistical models, such as analysis of variance (ANOVA), the t-test, the F-test, and more. MvLR has been used in ML for several nano-scale biomedical applications. Among the most successful ones is the prediction of cytotoxicity in NPs [126].

The MvLR model can be expressed in the form

$$y_{ik} = b_{0k} + \sum_{j=1}^{p} b_{jk}\, x_{ij} + e_{ik}, \quad (14)$$

where y_{ik} is the k-th response for the i-th observation, b_{0k} is the regression intercept for the k-th response, b_{jk} is the j-th predictor's regression slope for the k-th response, x_{ij} is the j-th predictor for the i-th observation, and e_{ik} is a Gaussian error term for the k-th response, with k ∈ [1, m] and i ∈ [1, n].
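As a minimal illustration of (14), the intercepts b_{0k} and the slopes b_{jk} for all m responses can be estimated jointly by ordinary least squares; the data below are synthetic and the dimensions are arbitrary choices.

```python
import numpy as np

# n observations, p predictors, m correlated responses
rng = np.random.default_rng(0)
n, p, m = 50, 3, 2
X = rng.normal(size=(n, p))
B_true = rng.normal(size=(p + 1, m))                # intercepts b_0k plus slopes b_jk
Xb = np.hstack([np.ones((n, 1)), X])                # design matrix with intercept column
Y = Xb @ B_true + 0.01 * rng.normal(size=(n, m))    # eq. (14) with Gaussian errors e_ik

# Least-squares estimate of the full coefficient matrix for all m responses at once
B_hat, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
```

With little noise, the estimated coefficient matrix closely recovers the generating one.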

3) Classification via Regression: Conventionally, when dealing with discrete classes in ML, a classification method is used, while a regression method is applied when dealing with continuous outputs. However, it is possible to perform classification through a regression method: the class is binarized and one regression model is built for each class value. In [127], classification via regression was among the methods evaluated for predicting the cytotoxicity of certain NPs, in order to eliminate in silico materials from potential human applications.
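The binarize-and-regress idea can be sketched as follows: each class label is turned into a 0/1 target, one regression model is fitted per class value, and a new point is assigned to the class whose model returns the largest score. A plain least-squares linear regressor stands in here for whatever base regressor was actually used in [127].

```python
import numpy as np

def class_via_regression_fit(X, y, classes):
    """Fit one linear-regression model per binarized class label."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    models = {}
    for c in classes:
        t = (y == c).astype(float)                  # binarized 0/1 target for class c
        w, *_ = np.linalg.lstsq(Xb, t, rcond=None)
        models[c] = w
    return models

def class_via_regression_predict(models, X):
    """Assign each row of X to the class whose regression model scores highest."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    classes = list(models)
    scores = np.column_stack([Xb @ models[c] for c in classes])
    return np.array([classes[i] for i in scores.argmax(axis=1)])
```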

4) Local Weighted Learning: In the majority of learning

methods, a global solution can be reached using the entirety

of the training data. LWL offers an alternative approach at

a much lower cost, by creating a local model, based on the

neighboring data of a point of interest. In general, data points

in the neighborhood of the point of interest, called query

point, are assigned a weight based on a kernel function and

their respective distance from the query point. The goal of

the method is to ﬁnd the regression coefﬁcient that minimizes

a cost function, similar to most regression methods. Due to

its nature as a local approximation, LWL allows for easy

addition of new training data. Depending on whether or not LWL stores the entirety of the training data in memory, LWL-based methods can be divided into memory-based and purely incremental, respectively [130].
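A minimal LWL sketch, in which the neighboring data are weighted by their distance from the query point and a local linear model is fitted by weighted least squares; the Gaussian kernel and the bandwidth value are our illustrative choices.

```python
import numpy as np

def lwl_predict(X, y, x_query, bandwidth=1.0):
    """Fit a weighted linear model around the query point and evaluate it there."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    q = np.hstack([1.0, np.atleast_1d(x_query)])
    d2 = np.sum((X - x_query) ** 2, axis=1)         # squared distance to the query point
    w = np.exp(-d2 / (2 * bandwidth ** 2))          # Gaussian kernel weights
    W = np.diag(w)
    # Weighted least squares: minimize sum_i w_i (y_i - x_i^T beta)^2
    beta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)
    return q @ beta
```

On exactly linear data the local model reproduces the global line, while on curved data it tracks the local trend around each query.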

Recently, LWL was used in [127] in order to predict the cytotoxicity of NPs in biological systems given an ensemble of attributes. It was found that, when the data were further validated, the LWL classifier was the only one out of a set of classifiers that could offer predictions with high accuracy.

5) Machine Learning Scoring Functions: SFs can be used to assess docking performance, i.e., to predict how a small molecule binds to a target, provided that a structural model of the target is available. However, despite the notable research efforts dedicated in recent years to improving the accuracy of SFs for structure-based binding affinity prediction, the achieved progress seems to be limited. ML-SFs have recently been proposed to fill this performance gap. These are based on ML regression models without a predetermined functional form and are thus able to efficiently exploit a much larger amount of experimental data [128]. The concept behind ML-SFs is that the classical approach of using linear regression with a small number of expert-selected structural features can be strongly improved by using nonlinear ML regression together with comprehensive data-driven feature selection (FS). Also, the authors in [129] investigated whether the superiority of ML-SFs over classical SFs, on average across targets, is exclusively due to training with proteins highly similar to those in the test set.

In Fig. 12, examples of classical and ML-SFs are depicted [128]. The first three (DOCK, a force-field SF; PMF, a knowledge-based SF; and X-Score, an empirical SF) are classical SFs, which are distinguished by the employed structural descriptors. As is evident, they all assume an additive functional form. On the other hand, ML-SFs do not make assumptions about their functional form, which is inferred from the training data.

Fig. 12. Examples of classical and ML-SFs (from [128]). [Figure: the additive functional forms of the DOCK (force-field), PMF (knowledge-based), and X-Score (empirical) SFs, contrasted with an ML SF.]

Fig. 13. The SVM method [131]. [Figure: two classes (Class 1 and Class 2) separated by the optimal hyperplane.]

C. Support Vector Machine

NNs can be efficiently used in classification when a huge amount of data is available for training. However, in many cases this method outputs a locally optimal solution instead of a global one. SVM is a supervised learning technique which can overcome the shortcomings of NNs in classification and regression. For a brief but useful description of the SVM, please see [131] and references therein. Next, to assist the reader, the SVM is summarized following [131].

The aim of SVM is to find a classification criterion which can effectively distinguish data at the testing stage. For two-class data, this criterion can be a line whose distance from each class is maximum. This linear classifier is also known as an optimal hyperplane. In Fig. 13, the linear hyperplane is described, for a set of training data x_i, i = 1, 2, ..., n, as

$$\mathbf{w}^T\mathbf{x} + b = 0, \quad (15)$$

where w is an n-dimensional vector and b is a bias (error) term.

This hyperplane should satisfy two specific properties: (1) the least possible error in data separation, and (2) the distance from the closest data of each class must be the maximum one. Under these conditions, the data of each class can only lie on one side of the hyperplane. Therefore, two margins can be defined to ensure the separability of the data as

$$\mathbf{w}^T\mathbf{x}_i + b \ge 1 \;\; \text{for } y_i = 1, \qquad \mathbf{w}^T\mathbf{x}_i + b \le -1 \;\; \text{for } y_i = -1. \quad (16)$$

The general (dual) optimization problem of the SVM for a linearly separable case, subject to two constraints, can be expressed as

$$\max_{\alpha}\; L_d(\alpha) = \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{N} y_i y_j \alpha_i \alpha_j\, \mathbf{x}_i^T \mathbf{x}_j, \quad \text{s.t.}\;\; \alpha_i \ge 0, \;\; \sum_{i=1}^{N}\alpha_i y_i = 0, \quad (17)$$

where the α_i are Lagrange multipliers.

Eq. (17) is used in order to find the support vectors and their corresponding input data. The parameter w of the hyperplane (decision function) can then be obtained as

$$\mathbf{w}_0 = \sum_{i=1}^{N}\alpha_i\, y_i\, \mathbf{x}_i, \quad (18)$$

and the bias parameter b can be calculated as

$$b_0 = \frac{1}{N}\sum_{S=1}^{N}\left(y_S - \mathbf{w}^T\mathbf{x}_S\right). \quad (19)$$

More details about the use of the linear as well as the non-linear SVM methods can be found in [131].
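For illustration, a linear SVM can also be trained in its (equivalent) primal form by sub-gradient descent on the hinge loss; the sketch below is a standard construction, not a solver for the dual problem (17), and the regularization and step-size values are arbitrary choices.

```python
import numpy as np

def linear_svm_fit(X, y, lam=0.01, lr=0.1, n_iter=1000):
    """Minimize lam*||w||^2/2 + (1/N) * sum_i max(0, 1 - y_i (w^T x_i + b))."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        margins = y * (X @ w + b)
        active = margins < 1                          # margin violators drive the update
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

On two well-separated clusters with labels ±1, the learned hyperplane classifies every training point correctly.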

An indicative training algorithm for SVMs is sequential minimal optimization (SMO). The training of an SVM requires the solution of a large quadratic programming (QP) optimization problem. Conventionally, the QP problem is solved by complex numerical methods; SMO instead breaks the problem down into the smallest possible sub-problems and solves them analytically, thus significantly reducing the required time. SMO chooses two Lagrange multipliers to optimize jointly, which can be done analytically, and updates the SVM accordingly. Interestingly, the smallest number of Lagrange multipliers with which the dual problem can be solved is two, one from the box constraints and one from the linear constraint, meaning that the minimum lies on a diagonal line segment. If only one multiplier were used in SMO, it would not be able to guarantee that the linear constraint is fulfilled at every step [132]. Moreover, SMO ensures convergence by Osuna's theorem, since it is a special case of the Osuna algorithm, which is guaranteed to converge [133]. Recently,

in [127], SMO was one of the classiﬁers used to predict

cytotoxicity of Polyamidoamine (PAMAM) dendrimers, well

documented NPs that have been proposed as suitable carriers

of various bioactive agents.

SVMs have been applied in many significant applications in bioinformatics and biomedical engineering. Examples include protein classification, detection of splice sites, and analysis of gene expression, including gene selection for microarray data, where a special type of SVM, called potential SVM, has been successfully used for the analysis of a brain tumor data set, a lymphoma data set, and a breast cancer data set ([134] and references therein).

Recently, SVM was considered in MCs. Specifically, in [135], the authors proposed injection velocity as a very promising modulation method in turbulent diffusion channels, which can be applied in several practical applications, such as pollution monitoring, where inferring the pollutant ejection velocity may give an indication of the rate of underlying activities. In order to increase the reliability of inference, a time-difference SVM


Fig. 14. The KNN ML method. [Figure: two classes (Class A and Class B) in a two-dimensional space.]

technique was proposed to identify the initial velocities. It was

shown that this can be achieved with very high accuracy.

In [136], a diffused molecular communication system model was proposed with the use of a spherical transceiver and a trapezoidal container. The model was developed through SVM regression and other ML techniques, and it was shown to perform with high accuracy, especially when a long distance is assumed.

D. k−Nearest Neighbors

KNN is a supervised ML classifier and regressor. It is based on the evaluation of the distance between the test data and the input, and it gives the prediction accordingly. The concept behind KNN is the classification of a data point based on its k nearest neighbors. Other names for this ML algorithm are memory-based classification, example-based classification, and case-based classification.

KNN classification consists of two stages: the determination of the nearest neighbors, and the determination of the class using those neighbors. A brief description of the KNN algorithm is as follows [137]. Let us consider a training data set D consisting of training samples x_i, i ∈ [1, |D|]. The examples are described by a set of features F, which are normalized in the range [0, 1]. Each training example is labelled with a class label y_j ∈ Y. The aim is to classify an unknown example q. To achieve this, for each x_i ∈ D, we evaluate the distance between q and x_i as

$$d(\mathbf{q}, \mathbf{x}_i) = \sum_{f\in F} w_f\,\delta(q_f, x_{if}). \quad (20)$$

There are many choices for this distance metric; a fundamental metric, based on the Euclidean distance, for continuous and discrete attributes is

$$\delta(q_f, x_{if}) = \begin{cases} 0, & f \text{ discrete and } q_f = x_{if} \\ 1, & f \text{ discrete and } q_f \ne x_{if} \\ |q_f - x_{if}|, & f \text{ continuous.} \end{cases} \quad (21)$$

The KNNs are selected based on this distance metric. There

are a variety of ways in which the KNN can be used to

determine the class of q. The most straightforward approach

is to assign the majority class among the nearest neighbors to

the query.
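The two stages above, with the metric of (20) restricted to continuous features and unit weights, can be sketched as:

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, q, k=3):
    """Return the majority class among the k training points nearest to query q."""
    d = np.sqrt(((X_train - q) ** 2).sum(axis=1))     # Euclidean distances, cf. (20)-(21)
    nearest = np.argsort(d)[:k]                       # indices of the k nearest neighbours
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]
```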

Figure 14 depicts a 3- and a 6-nearest-neighbor example on a two-class problem in a two-dimensional space [137]. The red star represents the test data point, which is surrounded by yellow and blue dots representing the two classes. We compute the distance from the test point to each of the dots present on the graph; since there are 10 dots, we obtain 10 distances. We then determine the lowest distance and predict that the test point belongs to the same class as its nearest neighbor. If a yellow dot is the closest, we predict that our test data point is also a yellow dot. In some cases, two distances may be exactly equal; we then take into consideration a third data point and calculate its distance from the test data. In Fig. 14, the test data point lies between a yellow and a blue dot; considering the distance from the third data point, we predict that the test data point is of the blue class.

The advantages of KNN are its simple implementation and the fact that no prior assumptions about the data are needed. The disadvantage of KNN is its high prediction time.

E. Dimensionality Reduction

This section is devoted to discussing dimensionality reduction methods. Dimensionality reduction constitutes the preparatory phase of ML, because the initially acquired raw data may contain irrelevant or redundant features. Next, a comprehensive description of FS is provided in Section III-E1. Likewise, principal component analysis (PCA) and linear discriminant analysis (LDA) are respectively discussed in Sections III-E2 and III-E3. Finally, Section III-E4 presents the fundamentals of independent component analysis (ICA). Table III reports the applications of dimensionality reduction methodologies in nano-scale biomedical engineering.

1) Feature Selection: FS reduces the complexity of a problem by detecting the subset of features that contribute most to the results. FS is one of the core concepts in ML, and it hugely impacts the achievable performance. It is important to point out that FS is different from dimensionality reduction. Both methods seek to reduce the number of attributes in the data set, but a dimensionality reduction method does so by creating new combinations of attributes, whereas FS methods include and exclude attributes present in the data without changing them.
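A minimal filter-style FS sketch: each attribute is scored by its absolute correlation with the target, and the top-k attributes are kept unchanged. Correlation is only one of many possible scoring criteria and is our illustrative choice here.

```python
import numpy as np

def select_k_best(X, y, k):
    """Return the indices of the k features most correlated with the target y."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]               # indices of kept attributes, unchanged
```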

Combining ML algorithms with FS has been proven to be

very useful for disease detection [138], [139]. It highlights the

features associated with a speciﬁc target from a larger pool.

For instance, in [140], a classiﬁcation algorithm was used to

analyze 10000 genes from 200 cancer patients, while FS was

used to associate 50 of them with metastatic prostate cancer.

The selected features were then utilized as biomarker signature

criteria in a ML algorithm for classiﬁcation and diagnostics.

Furthermore, recent research efforts provided evidence that

combining data from multiple sources, such as transcrip-

tomics and metabolomics to create composite signatures can

improve the accuracy of biomarker signatures and disease

diagnoses [141].

2) Principal Component Analysis: PCA [103], [142]–[144]

is an approach to solve the problem of blind source separation

(BSS), which aims at the separation of a set of source signals

from a set of mixed signals, with little information about the

source signals or the mixing process. PCA utilizes the eigen-

vectors of the covariance matrix to determine which linear


TABLE III
DIMENSIONALITY REDUCTION APPLICATIONS IN NANO-SCALE BIOMEDICAL ENGINEERING.

Paper | Application | Method | Description
[138] | Disease detection | FS | Cancer prognosis and prediction
[139] | Disease detection | FS | Breast cancer detection
[140] | Disease detection | FS | Metastatic cancer detection
[141] | Disease detection | FS | Improved diagnoses based on composite biomarker signatures
[142] | Image analysis | PCA | Spectroscopic image analysis
[143] | Signal analysis | PCA, LDA | Classification of EEG signals

combinations of input variables contain the most information.

It can also be used for feature extraction and dimensionality

reduction. For cases with strong response variations, PCA

allows an effective approach to rapidly process, de-noise, and

compress data, however it cannot explicitly classify data.

More speciﬁcally, in PCA, the dimensional data are rep-

resented in a lower-dimensional space, reducing the degrees

of freedom, the space and time complexities. PCA aims to

represent the data in a space that best expresses the variation

in a sum-squared error sense and is utilized for segmenting

signals from multiple sources. As in standard clustering meth-

ods, it is useful if the number of the independent components

is determined. Using the covariance matrix C=AAT, where

Adenotes the matrix of all experimental data points, the

eigenvectors wkand the corresponding eigenvalues λkcan

be calculated. The eigenvectors are orthogonal and are chosen

in order for the corresponding eigenvalues to be placed in

descending order, i.e, λ1> λ2> .... To this end, the

ﬁrst eigenvector w1contains the most information and the

amount of information decreases in the following eigenvectors.

Therefore, the majority of the information is contained in

a number of eigenvectors, whereas the remaining ones are

dominated by noise.
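The procedure just described can be sketched directly in NumPy; the synthetic data set below, with one nearly redundant variable, is our own illustration (here rows are samples, and the data are centered before forming the covariance matrix).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=200)   # third variable is nearly redundant

Xc = X - X.mean(axis=0)                          # center the data
C = Xc.T @ Xc / len(Xc)                          # covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)             # eigen-decomposition of C
order = np.argsort(eigvals)[::-1]                # descending: lambda_1 > lambda_2 > ...
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

X_reduced = Xc @ eigvecs[:, :2]                  # keep the two leading components
```

Because the third variable is almost a copy of the first, nearly all of the variance is captured by the first two eigenvectors, and the last one is dominated by noise.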

3) Linear Discriminant Analysis: LDA is another method

for the solution of the BSS problem [103], [143]. In LDA,

linear combinations of parameters that optimally classify data

are identiﬁed and the main goal is to reduce the dimension

of data. LDA has been used with a nanoﬂuidic system to

interpret gene expression data from exosomes and thus, to

classify the disease state of patients. More speciﬁcally, LDA

aims to create a new variable that is a combination of the

original predictors, by maximizing the differences between

the predefined groups with respect to the new variable. The predictor scores are utilized in order to form the discriminant score, which constitutes a single new composite variable. Therefore, LDA results in a significant data dimension reduction technique that compresses the p-dimensional predictors onto a one-dimensional line. Although, at the end of the process, the desired result is that each class has a normal distribution of discriminant scores with the largest possible difference in mean scores between the classes, some overlap between the discriminant score distributions exists. The degree of this overlap represents a measure of the success of LDA. The discriminant function used to calculate the discriminant scores can be expressed as

$$D = w_1 Z_1 + w_2 Z_2 + \cdots + w_p Z_p, \quad (22)$$

where w_k and Z_k, with k = 1, ..., p, denote the weights and predictors, respectively. From (22), it can be observed that the discriminant score is a weighted linear combination of the predictors. The estimation of the weights aims to maximize the difference between the mean discriminant scores of the classes. To this end, the predictors which are not similar with respect to the class mean discriminant scores will have larger weights, whereas the weights will be reduced the more similar the class means are [145].
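Fisher's classical construction of the weights in (22), which maximizes the separation between the class mean discriminant scores, can be sketched as follows; the two-class Gaussian data are synthetic, and this standard construction is our illustrative choice rather than code from [145].

```python
import numpy as np

def fisher_lda_weights(X0, X1):
    """Weights w_1..w_p of eq. (22), chosen to separate the class mean scores."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    return np.linalg.solve(Sw, m1 - m0)

rng = np.random.default_rng(4)
X0 = rng.normal(size=(100, 3))                     # class 0 predictors Z_1..Z_p
X1 = rng.normal(size=(100, 3)) + [3.0, 0.0, 1.0]   # class 1, with shifted means
w = fisher_lda_weights(X0, X1)
D0, D1 = X0 @ w, X1 @ w                            # discriminant scores, eq. (22)
```

The p-dimensional predictors are compressed onto a single line, where the two score distributions overlap only slightly.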

4) Independent Component Analysis: ICA [103], [143], [144] was introduced in [146] and is another approach to the solution of the BSS problem. According to ICA, the original inputs are transformed into features which are mutually independent, and the non-orthogonal basis vectors that correspond to the correlations of the data are identified through higher-order statistics. The use of the latter is needed, since the components are statistically independent, i.e., the joint PDF of the components is obtained as the product of the PDFs of all components.

Let us consider c independent scalar source signals x_k(t), with k = 1, ..., c and t being a time index. The c signals can be grouped into a zero-mean vector x(t). Assuming that there is no noise and considering the independence of the components, the joint PDF can be expressed as

$$f_{\mathbf{x}}(\mathbf{x}) = \prod_{k=1}^{c} f_{x_k}(x_k). \quad (23)$$

A d-dimensional data vector, y(t), can be observed at each moment through

$$\mathbf{y}(t) = \mathbf{A}\mathbf{x}(t), \quad (24)$$

where A is a d × c scalar matrix with d ≥ c.

ICA aims to recover the source signals from the sensed signals; thus, the real matrix W = A^{-1} has to be determined. To this end, the determination of A is performed by maximum-likelihood techniques. An estimate of the density, termed f̂_y(y; a), is used, and the parameter vector a that minimizes the difference between the source distribution and the estimate has to be determined. It should be highlighted that a is the basis vector of A and, thus, f̂_y(y; a) is an estimate of f_y(y).
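A compact sketch of the recovery step, using whitening followed by a FastICA-style fixed-point iteration with the tanh non-linearity; this particular estimator is an illustrative choice rather than the maximum-likelihood procedure described above, and the two synthetic sources are our own example.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sign(np.sin(3 * t)), np.sin(5 * t)]     # c = 2 independent source signals
S -= S.mean(axis=0)
A = np.array([[1.0, 0.5], [0.4, 1.0]])               # mixing matrix (d = c = 2)
Y = S @ A.T                                          # observed mixtures, cf. eq. (24)

# Whiten the observations
Yc = Y - Y.mean(axis=0)
E, D, _ = np.linalg.svd(np.cov(Yc, rowvar=False))
Z = Yc @ E @ np.diag(1.0 / np.sqrt(D)) @ E.T

# Fixed-point iteration with symmetric decorrelation
W = rng.normal(size=(2, 2))
for _ in range(200):
    G = np.tanh(Z @ W.T)                             # g(w^T z) for each unmixing row
    W = (G.T @ Z) / len(Z) - np.diag((1 - G ** 2).mean(axis=0)) @ W
    U, _, Vt = np.linalg.svd(W)
    W = U @ Vt                                       # re-orthogonalize the rows
S_hat = Z @ W.T                                      # recovered sources (up to order/sign)
```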

F. Gradient Descent Method

When there are one or more inputs, the optimization of the coefficients by iteratively minimizing the error of the model on the training data becomes a very important procedure. This operation is called GD and is initiated with random values for each coefficient. The sum of the squared errors is calculated for each pair of input and output values. A learning rate is used as a scale factor, and the coefficients are updated in the direction that minimizes the error. The process is repeated until a minimum sum of squared errors is achieved or no further improvement is possible. In practice, GD is taught using a linear regression model due to its straightforward nature, and it proves to be useful for very large datasets [147].
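The procedure above, applied to a linear regression model, can be sketched as follows; the learning rate and iteration count are arbitrary illustrative choices, and zero initial coefficients stand in for random ones.

```python
import numpy as np

def gd_linear_regression(X, y, lr=0.5, n_iter=2000):
    """Iteratively update the coefficients against the gradient of the squared error."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # intercept plus slope coefficients
    w = np.zeros(Xb.shape[1])                        # initial coefficient values
    for _ in range(n_iter):
        err = Xb @ w - y                             # per-sample prediction error
        w -= lr * Xb.T @ err / len(y)                # learning rate scales the step
    return w
```

On noiseless data generated by a line, the iteration converges to the generating intercept and slope.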

GD is one of the most popular algorithms for optimizing NNs and has been extensively used in nano-scale biomedical engineering. For example, in [29], the authors proposed a method that uses ANNs to approximate light scattering by multi-layer NPs and used GD to optimize the input parameters of the NN.

G. Active Learning

In AL, also known as the optimal design of experiments,

a surrogate model is created from a given data set, and then

the model is used to select which data should be obtained

next [148]. The selected data are added to the original data

set and then used to create an updated surrogate model. The

process is repeated iteratively such that the surrogate model

is improved continuously. In contrast to classic ML methods, the distinguishing feature of an AL system is that it develops and tests new hypotheses as part of a continuing, interactive learning process. This method of iterative surrogate-model screening has already been used in other fields, such as drug discovery and molecular property prediction [149], [150].
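A toy sketch of the AL loop on a one-dimensional pool of candidate experiments; here the distance to the nearest labeled candidate stands in for the surrogate model's uncertainty, which is a deliberate simplification of the iterative surrogate-screening procedure described above.

```python
import numpy as np

rng = np.random.default_rng(3)
pool = np.linspace(0, 1, 200)[:, None]              # pool of candidate experiments
labeled = list(rng.choice(200, 5, replace=False))   # initial data set

for _ in range(10):                                 # iterative screening loop
    d = np.abs(pool - pool[labeled].T).min(axis=1)  # distance to nearest labeled point
    query = int(np.argmax(d))                       # select the most "uncertain" candidate
    labeled.append(query)                           # measure it, update the model, repeat
```

After a few rounds, the selected points cover the candidate space far more evenly than the initial random set.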

H. Bayesian Machine Learning

In addition to being a powerful tool in statistics, the Bayes theorem is also widely used in ML to develop models for classification, such as the optimal Bayes classifier and Naive Bayes. The optimal Bayes classifier selects the class that presents the largest a posteriori probability of occurrence.

It can be shown that, among all classifiers, the optimal Bayes classifier has the lowest error probability. In most real-life applications, the posterior distribution is unknown

but can rather be estimated. In this case, the Naive Bayes

classiﬁer approximates the optimal Bayes classiﬁer by looking

at the empirical distribution and assuming independence of

predictors. So, the Naive Bayes classiﬁer is a simple but

suboptimal solution. It should be mentioned that Naive Bayes

can be coupled with a variety of methods to improve the

accuracy [151]. Furthermore, since it relies on the computation

of closed-form expressions of a posteriori probabilities, it

takes linear time to compute, in contrast to expensive iterative

approximations that are commonly used in other methods.

Assuming an instance that is represented by the observation of n features, x = (x_1, ..., x_n), Naive Bayes assigns a probability p(C_k | x) to each possible class C_k among K possible outcomes. According to Bayes' theorem, the posterior probability is given by the prior times the likelihood over the evidence, i.e.,

$$p(C_k\,|\,\mathbf{x}) = \frac{p(C_k)\, p(\mathbf{x}\,|\,C_k)}{p(\mathbf{x})}. \quad (25)$$

The evidence does not depend on C_k, so it is of no interest. Naive Bayes is a naive classifier because it assumes that all features in x are mutually independent conditioned on C_k. Therefore, it assigns a class label as

$$\hat{y} = \underset{k\in\{1,\ldots,K\}}{\operatorname{argmax}}\; p(C_k)\prod_{i=1}^{n} p(x_i\,|\,C_k). \quad (26)$$
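Eq. (26) can be sketched with a per-feature Gaussian likelihood; the Gaussian form is an assumption of this illustration, and any factorized likelihood fits the same scheme.

```python
import numpy as np

def gaussian_nb_fit(X, y):
    """Estimate the priors p(C_k) and per-feature Gaussian likelihoods p(x_i | C_k)."""
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    stats = {c: (X[y == c].mean(axis=0), X[y == c].std(axis=0) + 1e-9) for c in classes}
    return classes, priors, stats

def gaussian_nb_predict(model, x):
    """Assign the class maximizing log p(C_k) + sum_i log p(x_i | C_k), cf. eq. (26)."""
    classes, priors, stats = model
    def log_post(c):
        mu, sd = stats[c]
        return np.log(priors[c]) + np.sum(-0.5 * ((x - mu) / sd) ** 2 - np.log(sd))
    return max(classes, key=log_post)
```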

Bayesian analysis and ML are playing an important role

in various aspects of nanotechnology and related molecular-

scale research. Recently it has been shown that an atomic

version of Green’s function and Bayesian optimization is

capable of optimizing the interfacial thermal conductance of

Si-Si and Si-Ge nano-structures [152]. This method was able

to identify the optimal structures between 60000 candidate

structures. Furthermore, more recent works have relaxed the

data requirement limitations by adapting output parameters to

unsupervised learning methods such as Bayesian statistical

methods that do not rely on an external reference [153]–

[155]. Naive Bayes has been applied to predict cytotoxicity

of PAMAM dendrimers, which are well documented NPs that

have been proposed as suitable carriers of various bioactive

agents, in [127]. By pre-processing the data, Naive Bayes

presented substantial improvement in the accuracy despite its

simplicity, thus, outperforming the classiﬁcation methods used

in [127].

I. Decision Tree Learning

DTL is a predictive modeling technique used in ML, which

uses a decision tree to draw conclusions about the target

value based on observations. In the tree paradigm, the target

values are represented as leaves, while the observations are

denoted by branches. There are two types of DTL, namely

classiﬁcation and regression trees. In the former, the target

variable belongs in a discrete set of values, while in the

latter the target variable is continuous. Furthermore, some techniques, such as boosted trees and bootstrap aggregated decision trees, use multiple decision trees. In more detail, the boosted trees method builds an ensemble incrementally, by training each new instance to emphasize the training instances that were previously mis-modeled, whereas bootstrap aggregated decision trees constitute an early ensemble method that creates multiple decision trees by resampling the training data and having the trees vote for a consensus prediction.

DTL has been used extensively in nano-medicine by op-

timizing material properties according to predicted interac-

tions with the target drug, biological ﬂuids, immune system,

vasculature, and cell membranes, all affecting therapeutic

efﬁcacy [156]. Speciﬁcally, in [157], decision trees were used

for classiﬁcation of effective and ineffective sequences for

Ribonucleic acid interference (RNAi) in order to recognize

key features in their design. In addition, several algorithms

have been developed over the years that improve the accuracy

and efﬁciency of DTL. For instance, the J48 algorithm is

considered among the best algorithms with regards to accuracy

and has been used in various biomedical tasks, such as

predicting cytotoxicity, measured as cell viability [127], [158].

Next, we present the most commonly used DTL methods. In

this direction, Bootstrap aggregating (bagging) is revisited in

Section III-I1, while the operating principles of bagged trees


are highlighted in Section III-I2. Moreover, the fundamentals

of Naive Bayes trees are discussed in Section III-I3, whereas the adaptive boosting (AdaBoost) approach is reported in Section III-I4. Finally, descriptions of the random forest (RForest) and M5P approaches are respectively delivered in Sections III-I5 and III-I6.

1) Bagging: Bootstrapping methods have been used exten-

sively to minimize statistical errors of predictors by utilizing

random sampling with replacement. In supervised learning,

a training dataset is utilized to train a predictor. Bootstrap

replicas of the training dataset can be employed to generate

new predictors. Bagging is a meta-learning algorithm that uses this idea to develop an aggregated predictor, either by averaging the predictors over the learning sets when the output is numerical, or by voting when the output is a class label [163]. More specifically, assume a learning set L consisting of data {(y_n, x_n), n = 1, ..., N} and a predictor φ(x, L), i.e., y is predicted by φ(x, L) if the input is x. The learning set consists of N observations and, since it is hard or in many cases impossible to obtain more observations to improve the learning set, we turn to bootstrapping, creating different learning sets using the sample of size N as the population, which effectively leads to new predictors {φ(x, L_B)}, where L_B denotes a bootstrap replica. The aggregated predictor's accuracy is determined by the stability of the procedure for constructing each predictor φ, i.e., the accuracy will be improved by bagging in unstable procedures, where a small variation in the learning set leads to large changes in the predictor.
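The aggregation scheme can be sketched with a deliberately unstable base predictor; the 1-nearest-neighbour regressor below is our illustrative choice of φ, not the REPTree learner used in the surveyed work.

```python
import numpy as np

def nn1_predict(Xtr, ytr, Xte):
    """Base predictor phi: 1-nearest-neighbour regression (deliberately unstable)."""
    idx = np.abs(Xte[:, None] - Xtr[None, :]).argmin(axis=1)
    return ytr[idx]

def bagged_predict(Xtr, ytr, Xte, n_boot=50, seed=0):
    """Average phi over bootstrap replicas of the learning set (numerical output)."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_boot):
        b = rng.integers(0, len(Xtr), len(Xtr))   # sample N observations with replacement
        preds.append(nn1_predict(Xtr[b], ytr[b], Xte))
    return np.mean(preds, axis=0)                 # aggregate by averaging
```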

Recently, bagging has been used to predict possible toxic effects caused by exposure to nanomaterials in biological systems [159]. As the base predictor φ, REPTree was used, which is a fast decision-tree-based learning algorithm. It should be mentioned that the bagging algorithm offered the highest accuracy, in terms of the correlation between actual and predicted results.

2) Bagged Tree: Bagging can be applied to any kind of model. By using bagged decision trees, it is possible to lower the bias by leaving the trees un-pruned. High variance and low bias are essential for bagging classifiers. The aggregate classifier can capitalize on this and provide an increase in accuracy. In [160], a bagged tree was used with great success in an ensemble classifier with particle swarm optimization (PSO) in order to predict heart disease.

3) Naive Bayes Tree: A hybrid approach to learning, when

many attributes are deemed relevant for a classiﬁcation task,

yet they are not sufﬁciently independent, is the NBTree.

NBTree consists in practice of a decision tree with Naive

Bayes classifiers at the leaf nodes [164]. First, an attribute is split in the decision-tree-building process according to a utility function. If the utility is not sufficiently high, the node becomes a leaf and a Naive Bayes classifier is created at the

node. NBTree can deal both with discrete data, by multi-way

splits for all values, and with continuous data, by using a

threshold split.

In [127], NBTree was used among other learning methods as a way to predict the cytotoxicity of nanomaterials in biological systems. When leave-one-out cross validation was performed, NBTree achieved the best performance, with an accuracy of 77.7%.

4) Adaptive Boosting: AdaBoost is a learning method that uses an ensemble of classifiers in order to improve accuracy [165], [166]. Boosting is a technique that takes a set of weak learners (usually decision tree classifiers) and combines them into a strong one. The procedure can be summarized as follows. A set of labeled training examples {(x_i, y_i)}, where x_i is an observable quantity and y_i is the outcome, is given to a set of classifiers that are each assigned a weight. After every weak classifier has reached a prediction, the boosting method combines all the weak hypotheses into a single prediction. AdaBoost does not need prior knowledge of the accuracies of the weak classifiers; instead, it adapts to their errors. In essence, the weak classifiers are tweaked to better handle data that were mishandled by previous classifiers. In some cases, AdaBoost has been shown to be less susceptible to over-fitting than other learning methods; however, it is sensitive to noisy data and outliers due to its adaptive nature.
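The weight-update loop described above can be sketched as follows; the 1-D threshold "stump" weak learner and the toy data are illustrative, not the classifiers used in the cited works:

```python
import math

def train_stump(X, y, w):
    """Pick the threshold/polarity stump minimizing the weighted error."""
    best = (float("inf"), None)
    for t in sorted(set(X)):
        for pol in (+1, -1):
            pred = [pol if x < t else -pol for x in X]
            err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
            if err < best[0]:
                best = (err, (t, pol))
    return best

def adaboost(X, y, rounds=10):
    """AdaBoost for labels in {-1, +1}: mistakes get heavier each round."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, (t, pol) = train_stump(X, y, w)
        err = max(err, 1e-10)                      # avoid log(0) on perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)    # weight of this weak classifier
        ensemble.append((alpha, t, pol))
        # Re-weight: misclassified samples gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * (pol if x < t else -pol))
             for wi, x, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    def predict(x):
        score = sum(a * (p if x < t else -p) for a, t, p in ensemble)
        return 1 if score >= 0 else -1
    return predict

X = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
y = [1, 1, 1, -1, -1, -1]
clf = adaboost(X, y)
print([clf(x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

Note how no prior knowledge of the stumps' accuracies is needed: the α weights are derived from the observed weighted errors round by round.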

AdaBoost was one of the methods used in [160] in an

ensemble classiﬁer together with PSO to predict heart disease.

Moreover, AdaBoost was used in [161] as a learning approach

for particle detection in cryo-electron micrographs. Similarly,

in [162], it was used for characterizing and analyzing unique

features and properties of nanomaterials and nanostructures.

5) Random Forest: RForest is one of the most widely used ML algorithms, due to its simplicity and diversity, since it can be used for both classification and regression. As the name suggests, an RForest is a tree-based ensemble, where each tree is connected to a collection of random variables [167]. As presented in Fig. 15, an RForest averages multiple decision trees that have been trained on different parts of the same training set, in order to reduce the variance. The different decision trees are trained based on the bagging technique; thus, they exploit random subsets of the training data. An advantage of RForest is that it combines uncorrelated individual trees with bagging, which decreases the variance of the model and makes it more robust without increasing the bias toward over-fitting. Another technique for combining individual trees is boosting, where the samples are weighted for sampling so that samples which were predicted incorrectly get a higher weight and are, therefore, sampled more often. The concept behind this is that difficult cases should be emphasized during learning, compared to easy ones. Because of this difference, bagging can be easily parallelized, while boosting is performed sequentially. Next, we briefly provide the mathematical concept behind the RForest method.

We assume an unknown joint distribution P_XY(X, Y), where X = (X_1, . . . , X_p)^T is a p-dimensional random vector, which represents the predictor variables, and Y is the real-valued response. The aim of the RForest algorithm is to find a prediction function f(X) in order to predict Y. The prediction function is the one that minimizes the expected value of the loss function L(Y, f(X)), i.e., E_XY(L(Y, f(X))), where the subscripts denote expectation with respect to the joint distribution of X and Y.

Note that L(Y, f(X)) is a measure of how close f(X) is to Y, and it penalizes values of f(X) that are far from Y. Typical choices of L are the squared error loss L(Y, f(X)) = (Y − f(X))^2

TABLE IV
DECISION TREE LEARNING APPLICATIONS IN NANO-SCALE BIOMEDICAL ENGINEERING.

Paper | Application | Method | Description
[156] | Disease treatment | DTL | Cancer treatment
[157] | Chemical properties modeling | DTL | Feature recognition in the design of RNA sequences
[158] | Chemical properties modeling | DTL | Prediction of cytotoxicity
[127] | Chemical properties modeling | DTL, NBTree | Prediction of cytotoxicity
[159] | Chemical properties modeling | Bagging, M5P | Prediction of cytotoxicity
[160] | Disease prediction | Bagged tree, AdaBoost | Heart disease prediction
[161] | Disease detection | AdaBoost | Particle detection in cryo-electron micrographs
[162] | Chemical properties modeling | AdaBoost | Characterization of nanomaterial properties

Fig. 15. Random forest diagram (an instance is passed through N trees and the final result is obtained by majority voting).

for regression and the zero-one loss for classification:

L(Y, f(X)) = I(Y ≠ f(X)) = { 0, if Y = f(X); 1, otherwise }.   (27)

It turns out that minimizing E_XY(L(Y, f(X))) for the squared error loss gives the conditional expectation f(x) = E(Y | X = x), which is known as the regression function. When classification is considered, if the set of possible values of Y is denoted by 𝒴, then minimizing E_XY(L(Y, f(X))) for the zero-one loss results in

f(x) = arg max_{y ∈ 𝒴} P(Y = y | X = x),   (28)

which is the Bayes rule.

Ensembles construct f in terms of the so-called "base learners" h_1(x), . . . , h_J(x), and these are combined to give the "ensemble predictor" f(x). In regression, the base learners are averaged, i.e.,

f(x) = (1/J) Σ_{j=1}^{J} h_j(x),   (29)

while in classification, f(x) is the most frequently predicted class, i.e.,

f(x) = arg max_{y ∈ 𝒴} Σ_{j=1}^{J} I(y = h_j(x)).   (30)
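A minimal illustration of the two combination rules, with toy base-learner outputs:

```python
from collections import Counter

def combine_regression(preds):
    """Average the base-learner outputs, as in the regression rule (29)."""
    return sum(preds) / len(preds)

def combine_classification(preds):
    """Return the most frequently predicted class, as in the voting rule (30)."""
    return Counter(preds).most_common(1)[0][0]

print(combine_regression([1.0, 2.0, 3.0]))       # 2.0
print(combine_classification(["a", "b", "a"]))   # a
```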

In RForests, the j-th base learner is a tree denoted as h_j(X, Θ_j), where Θ_j, j = 1, . . . , J, is a collection of independent random variables. To deeply understand the RForest algorithm, fundamental knowledge of the type of trees used as base learners is needed.

6) M5P: The M5 model tree method was introduced by Quinlan in 1992 [168]. Wang and Witten later presented an improved public-domain scheme [169], called M5P, that generates more compact and comprehensible models with slightly better accuracy. M5P combines conventional binary decision tree models with regression planes at the leaves, to provide a way to deal with continuous-class problems. The initial tree split is based on a standard deviation criterion, called the standard deviation reduction (SDR) and given by

SDR = SD(T) − Σ_i (|T_i| / |T|) SD(T_i),   (31)

where SD(·) denotes the standard deviation of a set, T is the set of learning examples that reach the node, and {T_i} are the subsets that result from splitting T according to a chosen attribute. The attribute that maximizes the SDR is chosen for the split. However, this process can lead to large tree structures that are prone to over-fitting. Therefore, pruning the tree is necessary to improve accuracy. For every interior node of the tree, a regression model is calculated with the examples that reach that node; if the subtree error is greater than the respective error of the regression model at that node, the tree is pruned and that particular node is turned into a leaf node.
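As a small numeric illustration, the sketch below computes the SDR of one candidate split, assuming the usual M5 form SDR = SD(T) − Σ_i (|T_i|/|T|) SD(T_i); the target values are toy data:

```python
from statistics import pstdev

def sdr(targets, subsets):
    """Standard deviation reduction of a candidate split (population stdev)."""
    total = pstdev(targets)
    return total - sum(len(s) / len(targets) * pstdev(s) for s in subsets)

T = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]          # targets reaching the node
split = ([1.0, 1.1, 0.9], [5.0, 5.2, 4.8])  # one candidate attribute split
print(round(sdr(T, split), 3))
```

A split that separates the two clusters removes nearly all of the spread, so its SDR is close to SD(T) itself; a useless split would score near zero.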

Recently, M5P was used in [159] to build a simulator that can dynamically predict the mortality rate of cells in biological systems, in order to test possible toxic effects from exposure to nano-materials. The simulator's user can change the attribute values dynamically and obtain the predicted value of the used metric.

J. Decision Table

A DT is a simple tabular representation of conditions and actions [170]. It is very similar to the popular decision trees. A key difference between them is that the former can include more than one "OR" condition. However, DTs are usually preferred when a small number of features is available, whereas decision trees can be used for more complex models.

Decision Table Naive Bayes: Combining learning models is an efficient way to improve the accuracy of stand-alone models. DT Naive Bayes (DTNB) is such a hybrid model, where a DT classifier is combined with a naive Bayes network, to produce a table with conditional probabilities. The learning process for DTNB splits the training data into two disjoint subsets and utilizes one set for training the DT and the other for training the NB [170]. The goal is to use NB on the attributes that are somewhat independent, since NB already

assumes independence of the attributes. Cross-validation methods are suitable for this hybrid model, since cross-validation is efficient both for DTs, because the structure of the table remains the same, and for NB, because the frequency counts can be updated in constant time.

Assuming that x_DT is the set of attributes used in the DT, and x_NB is the respective set of attributes for NB, the class-k probability can be computed as

P(C_k | x) = a P(C_k | x_DT) P(C_k | x_NB) / P(C_k),   (32)

where a is a normalization constant and P(C_k) is the prior probability of the class. DTNB is shown to achieve significant gains over both DTs and NB. More specifically, in [127], DTNB was used among other methods to predict cytotoxicity values of nanomaterials in biological systems.
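As a toy numeric check of the combination rule (32), with entirely hypothetical posteriors and priors:

```python
# Hypothetical class priors and the two component classifiers' posteriors.
priors = {"healthy": 0.7, "toxic": 0.3}
p_dt = {"healthy": 0.4, "toxic": 0.6}   # decision-table posterior P(Ck|x_DT)
p_nb = {"healthy": 0.2, "toxic": 0.8}   # naive-Bayes posterior P(Ck|x_NB)

# Combine per (32): P(Ck|x) = a * P(Ck|x_DT) * P(Ck|x_NB) / P(Ck).
unnorm = {k: p_dt[k] * p_nb[k] / priors[k] for k in priors}
a = 1.0 / sum(unnorm.values())          # normalization constant
posterior = {k: a * v for k, v in unnorm.items()}
print({k: round(v, 3) for k, v in posterior.items()})
```

Dividing by the prior prevents it from being counted twice, once by each component model.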

K. Surrogate-Based Optimization

Surrogate-based optimization [171], [172] refers to a class of optimization methodologies that calculate local or global optima by utilizing surrogate modeling techniques. This framework utilizes conventional optimization algorithms, such as gradient-based or evolutionary algorithms, for sub-optimization. Surrogate modeling techniques can significantly improve the design efficiency and facilitate finding global optima, filtering numerical noise, accomplishing parallel design optimization, and integrating simulation codes of different disciplines into a process chain.

In optimization problems, surrogate models can approximate the cost functions and the state functions; they are constructed from sampled data obtained by randomly exploring the design space. After this step, a new design, which is most likely to be the optimum based on the surrogate models, is searched for by applying an optimization algorithm such as a genetic algorithm. Utilizing a surrogate model for the estimation of the optimum is far more effective than repeatedly running a numerical analysis code; the computational cost of the search based on the surrogate models is negligible. Since surrogate models are built from the sampled data, the way the sample points are chosen and the way the accuracy of surrogate models is evaluated are important issues for surrogate modeling.
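A minimal 1-D sketch of the loop above: three expensive evaluations, a quadratic surrogate interpolated through them, and the surrogate's minimizer taken as the next design guess; the objective is a hypothetical stand-in for a costly simulation:

```python
def expensive_objective(x):
    """Stand-in for a costly simulation code; assumed for illustration."""
    return (x - 2.0) ** 2 + 3.0

# Step 1: sample the design space (three expensive evaluations).
xs = [0.0, 1.0, 4.0]
ys = [expensive_objective(x) for x in xs]

# Step 2: build the surrogate y = a*x^2 + b*x + c through the three samples
# (Newton divided differences for the quadratic interpolant).
(x0, x1, x2), (y0, y1, y2) = xs, ys
a = ((y2 - y0) / (x2 - x0) - (y1 - y0) / (x1 - x0)) / (x2 - x1)
b = (y1 - y0) / (x1 - x0) - a * (x0 + x1)
c = y0 - a * x0 ** 2 - b * x0

# Step 3: optimize the cheap surrogate instead of the expensive code.
x_star = -b / (2 * a)   # closed-form minimizer of the quadratic surrogate
print(round(x_star, 2))
```

In a real pipeline the surrogate would be refitted after evaluating the expensive objective at x_star, iterating until convergence.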

In [173], surrogate-based optimization is used to search the space of intermetallics for potentially selective catalysts for the CO2 reduction reaction and the hydrogen evolution reaction.

L. Quantitative Structure-Activity Relationships

ML techniques have been combined with QSAR models over the past decade [174]. One of the most successful applications of such models is the development of new drugs faster and at lower cost. QSAR methods are data-driven and based on supervised learning. They capture the complex relationships between the properties of nanomaterials without requiring detailed knowledge of the mechanisms of interaction. In more detail, every biological activity of organic molecules is a function of their structural properties, which depend on their chemical structures. These relationships can be expressed as in [174]

Activity = f_X(Properties),   (33)

and

Property = f(Structure).   (34)

Due to the complexity of the materials, the predictivity of the applied methods must be optimized; thus, various techniques have been used in the literature. Specifically, in [175], QSAR models were developed based on sparse linear FS and regression in conjunction with a minimization algorithm, while, in [176]–[178], nonlinear FS was used with Bayesian regularized NNs that used Gaussian or Laplacian priors. Also, ANNs have recently been employed to forecast the biological activity of compounds under investigation, while the ANN-classification model categorizes the compounds for a specific biological response [179].
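A minimal sketch of the Activity = f(Properties) mapping in (33): a one-descriptor linear QSAR model fitted with closed-form least squares. The descriptor and activity values below are invented for illustration; real QSAR pipelines use far richer descriptor sets and nonlinear learners:

```python
xs = [1.0, 2.0, 0.5, 3.0]   # hypothetical molecular descriptor (e.g., lipophilicity)
ys = [3.0, 5.0, 2.0, 7.0]   # toy measured activities (exactly linear here)

# Closed-form simple linear regression: Activity ≈ slope * Property + intercept.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

def predict(x):
    """Predict activity for an unseen descriptor value."""
    return slope * x + intercept

print(round(predict(1.5), 2))
```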

M. Boltzmann Generator

The aim of statistical mechanics is to assess the average behavior of physical systems based on their microscopic constituents and interactions, in order not only to understand the functionalities of molecules and materials, but also to provide the principles for devising drug molecules and materials with novel properties. In this direction, the statistics of the equilibrium states of many-body systems need to be evaluated. To conceive the complexity of this, let us try to evaluate the probability that, at a given temperature, a protein will be folded. In order to solve this problem, we need to examine each one of the huge number of ways to place all the atoms of the protein in a predetermined space and, for each one of them, extract the corresponding probability. However, since the enumeration of all configurations is extremely difficult or even infeasible, the necessity to sample them from their equilibrium distribution has been identified in [28]. In this work, the authors proposed the Boltzmann generator, which combines deep ML and statistical mechanics in order to learn to sample equilibrium distributions. In contrast to conventional generative learning, the Boltzmann generator is not trained to learn the probability density from data, but to directly produce independent samples of low-energy structures for condensed-matter systems and protein molecules.

As presented in Fig. 16, the operating principle of the Boltzmann generator consists of two parts:

1) A generative model, F_zx, is trained that is capable of providing samples from a stochastic distribution, which is described by the probability density function (PDF) f_x(x), when sampling z from a simple prior, such as a Gaussian distribution with PDF f_z(z).

2) A re-weighting process transforms the generated distribution, f_x(x), into the Boltzmann distribution and produces unbiased samples from e^{−u(x)}, with u(x) being the dimensionless energy.

Note that both training and re-weighting require knowledge of f_x(x). This can be ensured by adopting an invertible transformation F_zx, which allows us to transform f_z(z) to f_x(x).
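The two-part operation above can be sketched in miniature with a trivial invertible map standing in for the trained generative model F_zx; the energy u(x), the map parameters, and the prior below are all illustrative assumptions:

```python
import math
import random

def f_z(z):
    """Standard Gaussian prior density f_z(z)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def u(x):
    """Hypothetical dimensionless energy of a 1-D toy system."""
    return (x - 1.0) ** 2

# A trivial invertible map F_zx(z) = s*z + m stands in for a trained flow;
# by the change of variables, f_x(x) = f_z(z) / |s|.
s, m = 0.8, 0.9
rng = random.Random(0)
samples, weights = [], []
for _ in range(5000):
    z = rng.gauss(0.0, 1.0)
    x = s * z + m                  # part 1: generate a sample via F_zx
    fx = f_z(z) / abs(s)           # generated density, known analytically
    samples.append(x)
    weights.append(math.exp(-u(x)) / fx)  # part 2: Boltzmann re-weighting

# Weighted average under the Boltzmann distribution exp(-u(x)).
mean = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
print(round(mean, 2))
```

Here the target Boltzmann distribution is a Gaussian centered at x = 1, so the re-weighted mean recovers that center even though the generator itself is biased toward x = 0.9.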

Fig. 16. Boltzmann generator.

N. Feedback System Control

FSC [180] is a recently proposed method for the optimization of drug combinations. FSC is a phenotypically driven optimization process, which does not require any mechanistic knowledge of the system. This is the reason that FSC can be successfully applied to various complex biological systems (see [181] and the references therein).

The FSC method is based on the closed-loop feedback control process outlined in Fig. 17 [180]. It mainly consists of two steps: the first step is the definition of an initial set of compounds to be tested. The second step refers to the generation of broad dose-response curves for each compound in the selected cellular bioassay, which is chosen to provide a phenotypic output response that is used to evaluate the efficacy of the drugs and drug combinations on overall cell activity.

A schematic representation of the FSC technique is presented in Fig. 17. The five main components of the optimization process are depicted as:

(a) The input, i.e., the drug combinations with defined drug doses.

(b) The system, i.e., the selected cell-type representation of the disease to be studied.

(c) The system output, i.e., the cellular response to the defined drug combination input in the selected cell bioassay.

(d) The search algorithm that iteratively drives the system output toward the desired response.

(e) The statistical analysis used to guide drug elimination.

Fig. 17. The closed-loop FSC process: the input (drug combinations) is applied to the system, the output is assessed via regression analysis, and a search algorithm refines the input.
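The closed-loop idea can be sketched as a gradient-free search that iteratively refines a two-drug dose input toward the best phenotypic output; the synthetic response surface and every numeric value below are hypothetical:

```python
import random

def system_response(d1, d2):
    """Stand-in for the cellular bioassay (the phenotypic output)."""
    return -(d1 - 0.3) ** 2 - (d2 - 0.6) ** 2

def fsc_search(steps=300, seed=1):
    """Stochastic hill climbing: no mechanistic model, only measured outputs."""
    rng = random.Random(seed)
    doses = (0.9, 0.1)                       # initial drug-dose input
    best = system_response(*doses)
    for _ in range(steps):
        # Propose a nearby dose combination, clamped to valid doses [0, 1].
        cand = tuple(min(1.0, max(0.0, d + rng.gauss(0.0, 0.05)))
                     for d in doses)
        out = system_response(*cand)         # "measure" the system output
        if out > best:                       # feedback: keep improving inputs
            doses, best = cand, out
    return doses

d1, d2 = fsc_search()
print(round(d1, 2), round(d2, 2))
```

The loop treats the bioassay purely as a black box, mirroring FSC's phenotypically driven, mechanism-free character.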

O. Quadratic Phenotypic Optimization Platform

Methods based on ML, like FSC, aim to overcome the disadvantages of traditional methods, such as high-throughput screening. Recently, a powerful AI platform called the Quadratic Phenotypic Optimization Platform (QPOP) was proposed, to interrogate a large pool of potential drugs and to design a novel combination therapy against multiple myeloma [69]. This platform can efficiently and iteratively output effective drug combinations and can optimize the drug doses.

The main concept of QPOP lies in fitting the relationship between inputs (e.g., drugs) and desired phenotypic outputs (e.g., cell viabilities) to a smooth, second-order quadratic surface representative of the biological system of interest. Since QPOP utilizes only the controllable inputs and measurable phenotypic outputs of the biological system, it is able to identify optimal drug combinations and doses independently of predetermined drug synergy information and pharmacokinetic properties. Furthermore, QPOP utilized ML in order to preclinically re-optimize the combination and successfully translate the multi-drug regimen through in vivo validation. It is important to mention that both the in vitro and preclinical re-optimization processes were able to simultaneously take into account both efficacy and safety, which is an important aspect of the QPOP platform.

QPOP can also be used as an actionable platform to design patient-specific regimens. This multi-parametric global optimization methodology can overcome many of the difficulties of the drug development process, and can result in efficient and safe therapies. This will reshape drug development, translating into improved and effective treatment choices. More details about the use of the QPOP platform in biomedicine applications can be found in [182] and [183] and the references therein.

IV. DISCUSSION & THE ROAD AHEAD

In this section, we clarify how the ML methodologies

presented in Section III can be efﬁciently used to solve the

problems discussed in Section II, and we elaborate on some major open research problems, which are of great importance for unveiling the potential benefits, advantages, and limitations of employing ML in nano-scale biomedical engineering. In this direction, Table V, which is given at the top of the next page, connects the ML challenges with the ML methodologies that have been used in nano-scale biomedical engineering.

From Table V, it becomes evident that ANNs can be employed to solve a large variety of ML problems in nano-scale biomedical engineering. The ML methods CNNs, RNNs, and DNNs are capable of identifying patterns, locating and classifying target objects in an image, and detecting events [184]. As a result,

they can excel in the development of ARES, which contributes to the discovery, design, and performance optimization of nano-structures and nano-materials. Furthermore, they can be used for the detection of received symbols in molecular and electromagnetic nano-networks, for the classification of observations that may provide a better understanding of biological and chemical processes, and for the identification of specific patterns. On the other hand, D2NNs can efficiently execute identification and classification tasks, after being trained on large datasets. Therefore, they have been successfully used in lens imaging at the THz spectrum, while they are expected to find application in image analysis, feature detection, and object classification. In other words, D2NNs may be employed for heterogeneous nano-structure discovery, channel estimation and symbol detection in nano-scale molecular and THz networks, as well as disease detection and therapy development.

By inducing the algorithm to learn complex relationships within a training dataset and making judgments on test datasets with high fidelity, GRNNs are capable of providing a systematic methodology to map inputs to predictive outputs. As a consequence, they have been applied in several fields, including optical character recognition, pattern recognition, and manufacturing, for predicting the output classification [185], [186]. In nano-scale biomedical engineering, they have been extensively used for discovering the properties of, and designing, heterogeneous nano-structures [186], [187], as well as for analyzing the data collected from them [188]. However, their applicability to problems specific to molecular and electromagnetic nano-scale networks needs to be assessed.

Based on Cybenko's theorem [189], MLPs are proven to be universal function approximators. In other words, they return low-complexity approximating solutions to extremely complex problems. As a result, MLPs were a popular ML method in the 1980s in several fields, including speech and image recognition (see, e.g., [190], [191] and the references therein). In nano-scale biomedical engineering, MLPs have been applied to nano-structure properties discovery [192], [193] and data analysis [80]. However, they are expected to be replaced by the much simpler SVMs, which are considered their main competitors.

GANs have recently been used to inversely design metasurfaces in order to provide arbitrary patterns of the unit cell structure [194]. However, they experience high instability. To solve this problem, conditional deep convolutional GANs are usually employed. These networks return very stable Nash equilibrium solutions that can be used for inversely designing nanophotonic structures [84], [195]. Another application of GANs lies in the statistical characterization of psychological wellness states [80]. In general, for applications in which the data have a non-linear behavior, GANs achieve similar performance to SVMs and k-nearest neighbors, and outperform MLPs.

Classical force field theory can neither easily scale to large molecules nor be transferred to different environments. To break these limitations, BPMs, DPNs, DTNNs, SchNets, and CGNs have traditionally been used to model the PESs and atomic forces in large molecules, like proteins, and to provide transferability to different covalent and non-covalent environments. However, these approaches are incapable of reaching the required accuracy with lower than classical force field evaluation complexity. Motivated by this, symmetrized gradient-domain ML has very recently been presented as a possible solution to the aforementioned problem [14], [196]–[198]. The limitation of this ML approach is that it cannot support molecules that consist of more than 20 atoms. In other words, it lacks scalability and transferability. To counter this, researchers should turn their attention to combining BPMs, DPNs, DTNNs, SchNets, and CGNs with gradient-domain ML in order to provide high accuracy in configuration and chemical space simulations. A plethora of new insights awaits as a result of such simulations.

Regression approaches have been used to extract the relationship between several independent variables and one dependent variable. Therefore, they have supported the solution of a large variety of problems, ranging from the area of nano-materials and nano-structure design to data-driven applications in biomedical engineering [124], [136]. Moreover, they usually require no input features or tuning for scaling, and they are easy to regularize. However, they are incapable of solving non-linear problems. Another disadvantage of regression approaches is that they require the identification of all the important independent attributes before inserting the data into the machine. Moreover, most of them return discrete outputs, i.e., they only provide categorical outcomes. Finally, they are sensitive to over-fitting [199].

Similarly to regression, SVMs are efficient methods for problems with high-dimensional spaces. Taking this into account, several researchers have adopted them in order to provide solutions to a large range of problems, from heterogeneous structure design to signal detection in molecular communication systems and data-driven applications. However, as the data set size increases, SVMs may underperform. Another limitation that should be highlighted is that they are not suitable for problems with overlapping target classes [200].

KNN has been employed in structure and material design [201], in MCs for symbol detection [6], and in disease detection [202], [203]. It is a low-complexity approach suitable for classifying data without training. However, it suffers from performance degradation when applied to large data sets, due to the increased cost of computing the distance between the new point and each of the existing points. A similar performance degradation is observed as the dimensionality of the data increases. This indicates that the application of the KNN approach to heterogeneous nano-structure design is questionable. On the other hand, it excels in data sequence detection in MC systems,

TABLE V
ML PROBLEMS AND SOLUTIONS.

ML approach | Structure and material design and simulation | Communications and signal processing | Applications

ANNs:
Convolutional neural networks | X | X | X
Recurrent neural networks | X | X | X
Deep neural networks | X | X | X
Diffractive deep neural networks | X | - | -
Generalized regression neural networks | X | - | X
Multi-layer perceptron | X | X | -
Generative adversarial networks | X | - | X
Behler-Parrinello networks | X | - | -
Deep potential networks | X | - | -
Deep tensor neural networks | X | - | -
SchNet | X | - | -
Accurate neural network engine for molecular energies | X | - | -
Coarse graining | X | - | -
Neuromorphic computing | X | - | -

Regression:
Logistic regression | X | X | X
Multivariate linear regression | X | - | -
Classification via regression | X | - | -
Local weighted learning | X | - | -
Machine learning scoring functions | X | - | -

Support vector machine | X | X | X

k-nearest neighbors | X | X | -

Dimensionality reduction:
Feature selection | X | - | -
Principal component analysis | X | - | X
Linear discriminant analysis | X | - | X
Independent component analysis | X | - | X

Gradient descent | X | - | -

Active learning:
Active learning | X | - | -
Bayesian ML | X | X | X

Decision tree learning:
Bagging | X | X | -
Bagged tree | - | - | X
Naive Bayes tree | X | - | X
Adaptive boosting | - | - | X
Random forest | - | - | X
M5P | X | - | -

Decision table naive Bayes | X | X | -

Surrogate-based optimization | X | - | -

QSAR | X | - | X

Boltzmann generator | X | - | -

Feedback system control | - | - | X

Quadratic phenotypic optimization platform | X | - | X

where the dimension of the data is no higher than 2.
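A minimal KNN sketch for such a low-dimensional detection task; the received-amplitude/bit pairs below are invented for illustration:

```python
from collections import Counter

def knn_predict(train, x, k=3):
    """Classify x by majority vote among its k nearest 1-D neighbors."""
    neighbors = sorted(train, key=lambda p: (p[0] - x) ** 2)[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Hypothetical received signal amplitudes labeled with the transmitted bit.
train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.8, 1), (0.9, 1), (1.1, 1)]
print(knn_predict(train, 0.85))  # 1
```

No training phase is needed; every query pays the cost of a distance computation against all stored points, which is why the method degrades on large data sets.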

Dimensionality reduction methods have been applied in nano-structure and material design [204], [205] as well as in therapy development [206]. Their objective is to remove dimensions, i.e., redundant features, in order to identify the most suitable variables for the problem under investigation. As a result, they contribute to data compression and to computation time reduction. Moreover, they are capable of transforming multi-dimensional problems into two-dimensional (2D) or 3D ones, allowing their visualization. This property has been extensively used in nano-structure properties discovery. Likewise, dimensionality reduction methods can aid in noise removal; thus, they can significantly improve the model's performance. However, they come with some disadvantages. In particular, they cause data loss. Moreover, PCA tends to extract linear correlations between variables, whereas, in practice, most nano-structure properties have a non-linear behavior. As a result, PCA may return unrealistic results. This highlights the need for designing new dimensionality reduction methods that take into account the chemical and biological properties of the nano-structure components. Finally, dimensionality reduction methods traditionally fail in cases where the datasets cannot be fully defined by their mean and covariance.

GD is an iterative ML optimization algorithm that aims at reducing the cost function in order to make accurate predictions; therefore, it has been employed in predicting the properties of heterogeneous nano-structures. Its main disadvantage is that the solution returned by this method is not guaranteed to be a global minimum. Every time the search space is expanded, due to the incorporation of an additional parameter into the objective function, the surface of solutions may exhibit numerous locally optimal points; thus, conventional GD algorithms may return a non-global local optimum. In this context, more sophisticated GD algorithms need to be examined. Finally, GD may be seen as an attractive optimization tool for finding Pareto-optimal solutions of multi-objective optimization problems in nano-scale networks. Such problems would aim at minimizing the outage probability and power consumption and/or maximizing the throughput, network lifetime, and other parameters that improve the network's quality of experience.
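The local-minimum caveat above can be illustrated with a toy double-well cost: the sketch below is a minimal GD loop (hypothetical cost function, illustrative learning rate), where the start point decides which minimum is reached:

```python
def grad(x):
    """Derivative of the double-well cost f(x) = (x^2 - 1)^2,
    which has two local minima, at x = -1 and x = +1."""
    return 4 * x * (x ** 2 - 1)

def descend(x, lr=0.05, steps=200):
    """Plain gradient descent from start point x."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Two different start points converge to two different local minima.
print(round(descend(0.5), 3), round(descend(-0.5), 3))
```

Both runs terminate at a minimum, but only the basin containing the start point is ever explored, which is exactly why GD offers no global-optimality guarantee.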

DTL algorithms are able to solve both regression and classification problems. As a result, they have been extensively used in several fields, including structure and material design and simulation, as well as for analyzing data acquired from nano-scale systems. Compared to other ML algorithms, decision tree and table learning algorithms simplify data preparation processes, since they demand neither data normalization nor scaling. Moreover, they perform well even with incomplete data sets, and their models are very intuitive and easy to explain. Therefore, several researchers have used them to provide a comprehensive understanding of the properties of nano-structures and the relationship with their design parameters. However, DTL algorithms are sensitive to even small changes in the data. In more detail, a small change in the data may result in a significant change in the structure of the decision tree, which in turn may cause instability. Another disadvantage of decision trees and tables is that they require more time to train the models and to perform after-training calculations. Finally, they are ill-suited to regression and to predicting continuous values. These disadvantages render them unsuitable for use in real-time applications in the fields of communications and signal processing as well as in nano-scale networks.

QSARs are mathematical models which relate a pharmacological or biological activity with the physicochemical characteristics (termed molecular descriptors) of molecule sets. Indicative examples of QSAR applications are the study of enzyme activity [207], the estimation of the minimum effective dose of a drug [208], and the toxicity prediction of nano-structures [209]. The main advantage of QSAR models lies in their ability to predict the activities of a large number of compounds with little to no prior experimental data. However, they are incapable of providing in-depth insights into the mechanism behind biological actions.

Boltzmann generators have been employed to create physi-

cally realistic one-shot samples of model systems and proteins

in implicit solvent [210], [211]. Scaling to large systems,

such as those investigated in MCs and nano-scale networks,

needs to build the invariances of the energy, as the exchange

of molecules, into the transformation to include parameter

sharing. In other words, researchers need to develop equiv-

ariant networks with parameter sharing. These networks are

expected to provide a better understanding of molecular chan-

nel modeling and eventually contribute to the design of new

transmission schemes.
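
The core idea behind Boltzmann generators, namely sampling a simple latent distribution, pushing the samples through an invertible map, and reweighting with the Boltzmann factor so that reweighted averages match the target equilibrium distribution, can be sketched in a toy one-dimensional setting. Here the invertible map is a fixed affine transform rather than a trained flow, and the double-well energy u(x) = (x^2 - 1)^2 with k_B T = 1, the affine constants, and the sample count are all illustrative assumptions.

```python
# Hedged one-dimensional sketch of the Boltzmann-generator principle:
# latent Gaussian -> fixed invertible map -> Boltzmann reweighting.
import math
import random

def u(x):
    """Double-well potential energy (dimensionless, k_B T = 1)."""
    return (x * x - 1.0) ** 2

def sample_reweighted_mean_x2(n=20000, seed=0):
    """Estimate <x^2> under p(x) ~ exp(-u(x)) by importance reweighting."""
    rng = random.Random(seed)
    scale, shift = 0.9, 0.0  # fixed affine "flow": x = shift + scale * z
    xs, log_w = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = shift + scale * z
        # Density of the generated sample via the change of variables:
        # log q(x) = log N(z; 0, 1) - log|scale|
        log_q = -0.5 * z * z - 0.5 * math.log(2 * math.pi) - math.log(scale)
        # Importance weight targeting exp(-u(x)): log w = -u(x) - log q(x)
        log_w.append(-u(x) - log_q)
        xs.append(x)
    m = max(log_w)  # stabilize before exponentiating
    w = [math.exp(lw - m) for lw in log_w]
    return sum(wi * x * x for wi, x in zip(w, xs)) / sum(w)

estimate = sample_reweighted_mean_x2()
```

A trained flow would replace the fixed affine map with a learned, expressive transformation, and the equivariant, parameter-sharing architectures discussed above would make the same recipe tractable for the exchange-invariant, many-molecule energies that arise in MCs and nano-scale networks.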

V. CONCLUSION

In summary, in this article, we have reviewed how ML

algorithms bear fruit in nano-scale biomedical engineering. In

more detail, we presented the main challenges and problems

in this ﬁeld, which, due to their high complexity, require the

use of ML in order to be solved, and classiﬁed them, based

on their discipline, into three distinctive categories. For each

category, we have provided insightful discussions that revealed

its particularities as well as existing research gaps. Moreover,

we have surveyed a variety of SOTA ML methodologies and models, which have been used as countermeasures to the aforementioned challenges. Special attention was paid to the architecture, operating principles, advantages, and limitations of each ML methodology. Finally, future research directions have been provided, which highlight the need for thorough

interdisciplinary research efforts for the successful realization

of hitherto uncharted scenarios and applications in the nano-

scale biomedical engineering ﬁeld.

REFERENCES

[1] D. Bobo, K. J. Robinson, J. Islam, K. J. Thurecht, and S. R. Corrie,

“Nanoparticle-Based Medicines: A Review of FDA-Approved Mate-

rials and Clinical Trials to Date,” Pharm. Res., vol. 33, no. 10, pp.

2373–2387, Jun. 2016.

[2] I. Akyildiz, M. Pierobon, S. Balasubramaniam, and Y. Koucheryavy,

“The internet of Bio-Nano things,” IEEE Commun. Mag., vol. 53, no. 3,

pp. 32–40, Mar. 2015.

[3] N. Farsad, H. B. Yilmaz, A. Eckford, C.-B. Chae, and W. Guo, “A

comprehensive survey of recent advancements in molecular communi-

cation,” IEEE Commun. Surveys Tuts., vol. 18, no. 3, pp. 1887–1919,

Feb. 2016.


[4] T. J. Cleophas and A. H. Zwinderman, Machine Learning in Medicine

- a Complete Overview. Springer International Publishing, 2015.

[5] S. Molesky, Z. Lin, A. Y. Piggott, W. Jin, J. Vučković, and A. W.

Rodriguez, “Inverse design in nanophotonics,” Nat. Photonics, vol. 12,

no. 11, pp. 659–670, Oct. 2018.

[6] X. Qian, M. D. Renzo, and A. Eckford, “Molecular communications:

Model-based and data-driven receiver design and optimization,” IEEE

Access, vol. 7, pp. 53 555–53 565, Apr. 2019.

[7] F. Bao, Y. Deng, Y. Zhao, J. Suo, and Q. Dai, “Bosco: Boosting correc-

tions for genome-wide association studies with imbalanced samples,”

IEEE Trans. Nanobiosci., vol. 16, no. 1, pp. 69–77, Jan. 2017.

[8] X. Duan, L. Dai, S.-C. Chen, J. P. Balthasar, and J. Qu, “Nano-scale

liquid chromatography/mass spectrometry and on-the-ﬂy orthogonal

array optimization for quantiﬁcation of therapeutic monoclonal anti-

bodies and the application in preclinical analysis,” J. Chromatogr. A,

vol. 1251, pp. 63–73, Aug. 2012.

[9] K. T. Butler, D. W. Davies, H. Cartwright, O. Isayev, and A. Walsh,

“Machine learning for molecular and materials science,” Nature, vol.

559, no. 7715, pp. 547–555, Jul. 2018.

[10] J. Behler and M. Parrinello, “Generalized neural-network representa-

tion of high-dimensional potential-energy surfaces,” Phys. Rev. Lett.,

vol. 98, no. 14, p. 146401, Apr. 2007.

[11] M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, “Fast

and accurate modeling of molecular atomization energies with machine

learning,” Phys. Rev. Lett., vol. 108, no. 5, p. 058301, Jan. 2012.

[12] F. Brockherde, L. Vogt, L. Li, M. E. Tuckerman, K. Burke, and

K.-R. Müller, “Bypassing the Kohn-Sham equations with machine

learning,” Nat. Commun., vol. 8, no. 1, Oct. 2017.

[13] T. Bereau, R. A. DiStasio, A. Tkatchenko, and O. A. von Lilienfeld,

“Non-covalent interactions across organic and biological subsets of

chemical space: Physics-based potentials parametrized from machine

learning,” J. Chem. Phys., vol. 148, no. 24, p. 241706, Jun. 2018.

[14] S. Chmiela, H. E. Sauceda, K.-R. Müller, and A. Tkatchenko,

“Towards exact molecular dynamics simulations with machine-learned

force ﬁelds,” Nat. Commun., vol. 9, no. 1, Sep. 2018.

[15] J. S. Smith, B. T. Nebgen, R. Zubatyuk, N. Lubbers,

C. Devereux, K. Barros, S. Tretiak, O. Isayev, and A. Roitberg,

“Approaching coupled cluster accuracy with a general-purpose neural

network potential through transfer learning,” ChemRxiv, Jun. 2019.

[Online]. Available: https://chemrxiv.org/articles/preprint/Outsmarting_

Quantum_Chemistry_Through_Transfer_Learning/6744440

[16] S. T. John and G. Csányi, “Many-Body Coarse-Grained Interactions

Using Gaussian Approximation Potentials,” J. Chem. Phys. B, vol. 121,

no. 48, pp. 10 934–10 949, Nov. 2017.

[17] L. Zhang, J. Han, H. Wang, R. Car, and W. E, “DeePCG: Constructing

coarse-grained models via deep neural networks,” J. Chem. Phys., vol.

149, no. 3, p. 034101, Jul. 2018.

[18] J. Wang, S. Olsson, C. Wehmeyer, A. Pérez, N. E. Charron, G. de Fab-

ritiis, F. Noé, and C. Clementi, “Machine Learning of Coarse-Grained

Molecular Dynamics Force Fields,” ACS Cent. Sci., vol. 5, no. 5, pp.

755–767, Apr. 2019.

[19] T. Stecher, N. Bernstein, and G. Csányi, “Free Energy Surface Recon-

struction from Umbrella Samples Using Gaussian Process Regression,”

J. Chem. Theory Comput., vol. 10, no. 9, pp. 4079–4097, Aug. 2014.

[20] L. Mones, N. Bernstein, and G. Csányi, “Exploration, sampling, and

reconstruction of free energy surfaces with gaussian process regres-

sion,” J. Chem. Theory Comput., vol. 12, no. 10, pp. 5100–5110, Sep.

2016.

[21] E. Schneider, L. Dai, R. Q. Topper, C. Drechsel-Grau, and M. E.

Tuckerman, “Stochastic Neural Network Approach for Learning High-

Dimensional Free Energy Surfaces,” Phys. Rev. Lett., vol. 119, no. 15,

p. 150601, Oct. 2017.

[22] J. Ribeiro, P. Collado, Y. Wang, and P. Tiwary, “Reweighted Autoen-

coded Variational Bayes for Enhanced Sampling (RAVE),” J. Chem.

Phys., vol. 149, no. 7, p. 072301, Feb. 2018.

[23] J. R. Cendagorta, J. Tolpin, E. Schneider, R. Q. Topper, and M. E.

Tuckerman, “Comparison of the Performance of Machine Learning

Models in Representing High-Dimensional Free Energy Surfaces and

Generating Observables,” J. Chem. Phys. B, vol. 124, no. 18, pp. 3647–

3660, Apr. 2020.

[24] B. M. Warﬁeld and P. C. Anderson, “Molecular simulations and

markov state modeling reveal the structural diversity and dynamics

of a theophylline-binding RNA aptamer in its unbound state,” PLOS

ONE, vol. 12, no. 4, pp. 1–34, Apr. 2017.

[25] A. Mardt, L. Pasquali, H. Wu, and F. Noé, “VAMPnets for deep

learning of molecular kinetics,” Nat. Commun., vol. 9, no. 1, Jan. 2018.

[26] H. Wu, A. Mardt, L. Pasquali, and F. Noe, “Deep Generative Markov

State Models,” arXiv, May 2018.

[27] W. Chen, H. Sidky, and A. L. Ferguson, “Nonlinear discovery of

slow molecular modes using state-free reversible VAMPnets,” J. Chem.

Phys., vol. 150, no. 21, p. 214114, Jun. 2019.

[28] F. Noé, S. Olsson, J. Köhler, and H. Wu, “Boltzmann generators:

Sampling equilibrium states of many-body systems with deep learning,”

Science, vol. 365, no. 6457, p. eaaw1147, Sep. 2019.

[29] J. Peurifoy, Y. Shen, L. Jing, Y. Yang, F. Cano-Renteria, B. G. DeLacy,

J. D. Joannopoulos, M. Tegmark, and M. Soljačić, “Nanophotonic

particle simulation and inverse design using artiﬁcial neural networks,”

Sci. Adv, vol. 4, no. 6, p. eaar4206, Jun. 2018.

[30] D. Liu, Y. Tan, E. Khoram, and Z. Yu, “Training deep neural networks

for the inverse design of nanophotonic structures,” ACS Photonics,

vol. 5, no. 4, pp. 1365–1369, Feb. 2018.

[31] Z. Liu, D. Zhu, S. P. Rodrigues, K.-T. Lee, and W. Cai, “Generative

Model for the Inverse Design of Metasurfaces,” Nano Lett., vol. 18,

no. 10, pp. 6570–6576, Sep. 2018.

[32] B. Cao, L. A. Adutwum, A. O. Oliynyk, E. J. Luber, B. C. Olsen,

A. Mar, and J. M. Buriak, “How to optimize materials and devices

via design of experiments and machine learning: Demonstration using

organic photovoltaics,” ACS Nano, vol. 12, no. 8, pp. 7434–7444, Jul.

2018.

[33] R. D. King, K. E. Whelan, F. M. Jones, P. G. K. Reiser, C. H. Bryant,

S. H. Muggleton, D. B. Kell, and S. G. Oliver, “Functional genomic hy-

pothesis generation and experimentation by a robot scientist,” Nature,

vol. 427, no. 6971, pp. 247–252, Jan. 2004.

[34] A.-A. A. Boulogeorgos, S. E. Trevlakis, and N. D. Chatzidiamantis,

“Optical wireless communications for in-body and transdermal biomed-

ical applications,” arXiv, Apr. 2020.

[35] I. F. Akyildiz and J. M. Jornet, “Electromagnetic wireless nanosensor

networks,” Nano Commun. Netw., vol. 1, no. 1, pp. 3–19, Mar. 2010.

[36] N. Agoulmine, K. Kim, S. Kim, T. Rim, J.-S. Lee, and M. Meyyappan,

“Enabling communication and cooperation in bio-nanosensor networks:

toward innovative healthcare solutions,” IEEE Wireless Commun.,

vol. 19, no. 5, pp. 42–51, Oct. 2012.

[37] N. A. Ali and M. Abu-Elkheir, “Internet of nano-things healthcare ap-

plications: Requirements, opportunities, and challenges,” in 2015 IEEE

11th International Conference on Wireless and Mobile Computing,

Networking and Communications (WiMob), Abu Dhabi, United Arab

Emirates, Oct. 2015, pp. 9–14.

[38] S. Hiyama, Y. Moritani, T. Suda, R. Egashira, A. Enomoto, M. Moore,

and T. Nakano, “Molecular communication,” J. IEICE, vol. 89, no. 2,

p. 162, Feb. 2006.

[39] V. Jamali, A. Ahmadzadeh, C. Jardin, H. Sticht, and R. Schober,

“Channel estimation for diffusive molecular communications,” IEEE

Trans. Commun., pp. 4238 – 4252, Oct. 2016.

[40] S. M. R. Rouzegar and U. Spagnolini, “Diffusive MIMO Molecular

Communications: Channel Estimation, Equalization, and Detection,”

IEEE Transactions on Communications, vol. 67, no. 7, pp. 4872–4884,

Apr. 2019.

[41] S. Abdallah and A. M. Darya, “Semi-blind Channel Estimation for

Diffusive Molecular Communication,” IEEE Commun. Lett., pp. 1–1,

Jul. 2020.

[42] K. V. Srinivas, A. W. Eckford, and R. S. Adve, “Molecular commu-

nication in ﬂuid media: The additive inverse gaussian noise channel,”

IEEE Trans. Inf. Theory, vol. 58, no. 7, pp. 4678–4692, Jul. 2012.

[43] T. Nakano, Y. Okaie, and J.-Q. Liu, “Channel model and capacity

analysis of molecular communication with Brownian motion,” IEEE

Commun. Lett., vol. 16, no. 6, pp. 797–800, Jun. 2012.

[44] H. B. Yilmaz, A. C. Heren, T. Tugcu, and C.-B. Chae, “Three-

dimensional channel characteristics for molecular communications with

an absorbing receiver,” IEEE Commun. Lett., vol. 18, no. 6, pp. 929–

932, Jun. 2014.

[45] A. Ahmadzadeh, A. Noel, and R. Schober, “Analysis and design of

multi-hop diffusion-based molecular communication networks,” IEEE

Trans. Mol. Biol. Multi-Scale Commun., vol. 1, no. 2, pp. 144–157,

Jun. 2015.

[46] Q. Li, “The clock-free asynchronous receiver design for molecular

timing channels in diffusion-based molecular communications,” IEEE

Trans. Nanobiosci., vol. 18, no. 4, pp. 585–596, Oct. 2019.

[47] M. Pierobon and I. Akyildiz, “A physical end-to-end model for molec-

ular communication in nanonetworks,” IEEE J. Sel. Areas Commun.,

vol. 28, no. 4, pp. 602–611, May 2010.

[48] D. Kilinc and O. B. Akan, “Receiver design for molecular communi-

cation,” IEEE J. Sel. Areas Commun., vol. 31, no. 12, pp. 705–714,

Dec. 2013.


[49] A. Noel, D. Makrakis, and A. Haﬁd, “Channel Impulse Responses

in Diffusive Molecular Communication with Spherical Transmitters,”

arXiv: Emerging Technologies, Apr. 2016.

[50] F. Dinc, B. C. Akdeniz, A. E. Pusane, and T. Tugcu, “Impulse Response

of the Molecular Diffusion Channel With a Spherical Absorbing

Receiver and a Spherical Reﬂective Boundary,” IEEE Trans. Mol. Biol.

Multi-Scale Commun., vol. 4, no. 2, pp. 118–122, Jun. 2018.

[51] M. S. Kuran, H. B. Yilmaz, and T. Tugcu, “A tunnel-based approach

for signal shaping in molecular communication,” in IEEE International

Conference on Communications Workshops (ICC), Budapest, Hungary,

Jun. 2013, pp. 776–781.

[52] H. B. Yilmaz, C. Lee, Y. J. Cho, and C.-B. Chae, “A machine learning

approach to model the received signal in molecular communications,”

in IEEE International Black Sea Conference on Communications and

Networking (BlackSeaCom), Istanbul, Turkey, Jun. 2017, pp. 1–5.

[53] C. Lee, H. B. Yilmaz, C. Chae, N. Farsad, and A. Goldsmith, “Machine

learning based channel modeling for molecular MIMO communica-

tions,” in IEEE 18th International Workshop on Signal Processing

Advances in Wireless Communications (SPAWC), Sapporo, Japan, 2017,

pp. 1–5.

[54] N. Farsad and A. Goldsmith, “Neural Network Detection of Data

Sequences in Communication Systems,” IEEE Trans. Signal Process.,

vol. 66, no. 21, pp. 5663–5678, Nov. 2018.

[55] J. M. Jornet and I. F. Akyildiz, “Femtosecond-Long Pulse-Based

Modulation for Terahertz Band Communication in Nanonetworks,”

IEEE Trans. Commun., vol. 62, no. 5, pp. 1742–1754, May 2014.

[56] M. O. Iqbal, M. M. U. Rahman, M. A. Imran, A. Alomainy, and Q. H.

Abbasi, “Modulation Mode Detection and Classiﬁcation for In Vivo

Nano-Scale Communication Systems Operating in Terahertz Band,”

IEEE Trans. Nanobiosci., vol. 18, no. 1, pp. 10–17, Jan. 2019.

[57] R. Zhang, K. Yang, Q. H. Abbasi, K. A. Qaraqe, and A. Alomainy,

“Analytical modelling of the effect of noise on the terahertz in-

vivo communication channel for body-centric nano-networks,” Nano

Commun. Netw., vol. 15, pp. 59–68, Mar. 2018.

[58] C.-C. Wang, X. Yao, W.-L. Wang, and J. M. Jornet, “Multi-hop

Deﬂection Routing Algorithm Based on Reinforcement Learning for

Energy-Harvesting Nanonetworks,” IEEE Trans. Mobile Comput., pp.

1–1, Jul. 2020.

[59] T. Nakano, M. J. Moore, F. Wei, A. V. Vasilakos, and J. Shuai, “Molec-

ular Communication and Networking: Opportunities and Challenges,”

IEEE Trans. NanoBiosci., vol. 11, no. 2, pp. 135–148, Jun. 2012.

[60] T. Nakano, T. Suda, Y. Okaie, M. J. Moore, and A. V. Vasilakos,

“Molecular Communication Among Biological Nanomachines: A Lay-

ered Architecture and Research Issues,” IEEE Trans. NanoBiosci.,

vol. 13, no. 3, pp. 169–197, Sep. 2014.

[61] M. S. Mannoor, H. Tao, J. D. Clayton, A. Sengupta, D. L. Kaplan, R. R.

Naik, N. Verma, F. G. Omenetto, and M. C. McAlpine, “Graphene-

based wireless bacteria detection on tooth enamel,” Nat. Commun.,

vol. 3, no. 1, Jan. 2012.

[62] P. M. Kosaka, V. Pini, J. J. Ruz, R. A. da Silva, M. U. González,

D. Ramos, M. Calleja, and J. Tamayo, “Detection of cancer biomarkers

in serum using a hybrid mechanical and optoplasmonic nanosensor,”

Nat. Nanotechnol., vol. 9, no. 12, pp. 1047–1053, Nov. 2014.

[63] T. C. Mai, M. Egan, T. Q. Duong, and M. Di Renzo, “Event Detection

in Molecular Communication Networks With Anomalous Diffusion,”

IEEE Commun. Lett., vol. 21, no. 6, pp. 1249–1252, Feb. 2017.

[64] A. Giaretta, S. Balasubramaniam, and M. Conti, “Security Vul-

nerabilities and Countermeasures for Target Localization in Bio-

NanoThings Communication Networks,” IEEE Trans. Inf. Forensics

Security, vol. 11, no. 4, pp. 665–676, Apr. 2016.

[65] A. Rizwan, A. Zoha, R. Zhang, W. Ahmad, K. Arshad, N. A. Ali,

A. Alomainy, M. A. Imran, and Q. H. Abbasi, “A Review on the Role

of Nano-Communication in Future Healthcare Systems: A Big Data

Analytics Perspective,” IEEE Access, vol. 6, pp. 41 903–41 920, Jul.

2018.

[66] M. Chen, Y. Hao, K. Hwang, L. Wang, and L. Wang, “Disease

Prediction by Machine Learning Over Big Data From Healthcare

Communities,” IEEE Access, vol. 5, pp. 8869–8879, Apr. 2017.

[67] D. Bardou, K. Zhang, and S. M. Ahmad, “Classiﬁcation of Breast Can-

cer Based on Histology Images Using Convolutional Neural Networks,”

IEEE Access, vol. 6, pp. 24 680–24 693, May 2018.

[68] B. Wilson and G. KM, “Artiﬁcial intelligence and related technologies

enabled nanomedicine for advanced cancer treatment,” Nanomedicine,

vol. 15, no. 5, pp. 433–435, Feb. 2020.

[69] M. B. M. A. Rashid, T. B. Toh, L. Hooi, A. Silva, Y. Zhang, P. F.

Tan, A. L. Teh, N. Karnani, S. Jha, C.-M. Ho, W. J. Chng, D. Ho,

and E. K.-H. Chow, “Optimizing drug combinations against multiple

myeloma using a quadratic phenotypic optimization platform (qpop),”

Sci. Transl. Med., vol. 10, no. 453, Aug. 2018.

[70] A. Zarrinpar, D.-K. Lee, A. Silva, N. Datta, T. Kee, C. Eriksen, K. Wei-

gle, V. Agopian, F. Kaldas, D. Farmer, S. E. Wang, R. Busuttil, C.-M.

Ho, and D. Ho, “Individualizing liver transplant immunosuppression

using a phenotypic personalized medicine platform,” Sci. Transl. Med.,

vol. 8, no. 333, pp. 333ra49–333ra49, Apr. 2016.

[71] A. J. Pantuck, D.-K. Lee, T. Kee, P. Wang, S. Lakhotia, M. H. Silver-

man, C. Mathis, A. Drakaki, A. S. Belldegrun, C.-M. Ho, and D. Ho,

“Modulating BET bromodomain inhibitor ZEN-3694 and enzalutamide

combination dosing in a metastatic prostate cancer patient using CU-

RATE.AI, an artiﬁcial intelligence platform,” Advanced Therapeutics,

vol. 1, no. 6, p. 1800104, Aug. 2018.

[72] S. Suthaharan, Machine Learning Models and Algorithms

for Big Data Classiﬁcation. New York, USA: Springer-

Verlag GmbH, Oct. 2015. [Online]. Available: https:

//www.ebook.de/de/product/25161991/shan_suthaharan_machine_

learning_models_and_algorithms_for_big_data_classiﬁcation.html

[73] T. Hastie, The Elements of Statistical Learning : Data Mining, Infer-

ence, and Prediction. New York: Springer, 2001.

[74] K. Shibata, T. Tanigaki, T. Akashi, H. Shinada, K. Harada, K. Niitsu,

D. Shindo, N. Kanazawa, Y. Tokura, and T.-h. Arima, “Current-

driven motion of domain boundaries between skyrmion lattice and

helical magnetic structure,” Nano Lett., vol. 18, no. 2, pp. 929–933,

Jan. 2018.

[75] J. Carrasquilla and R. G. Melko, “Machine learning phases of matter,”

Nat. Phys., vol. 13, no. 5, pp. 431–434, Feb. 2017.

[76] M. Rashidi and R. A. Wolkow, “Autonomous scanning probe mi-

croscopy in situ tip conditioning through machine learning,” ACS Nano,

vol. 12, no. 6, pp. 5185–5189, May 2018.

[77] R. S. Hegde, “Deep learning: A new tool for photonic nanostructure

design,” Nanoscale Advances, vol. 2, no. 3, pp. 1007–1023, Feb. 2020.

[78] N. Farsad, D. Pan, and A. Goldsmith, “A novel experimental platform

for in-vessel multi-chemical molecular communications,” in IEEE

Global Communications Conference, Dec. 2017.

[79] X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and

A. Ozcan, “All-optical machine learning using diffractive deep neural

networks,” Science, vol. 361, no. 6406, pp. 1004–1008, Jul. 2018.

[80] J. Park, K.-Y. Kim, and O. Kwon, “Comparison of machine learning

algorithms to predict psychological wellness indices for ubiquitous

healthcare system design,” in Proceedings of the 2014 International

Conference on Innovative Design and Manufacturing (ICIDM). IEEE,

Aug. 2014. [Online]. Available: https://doi.org/10.1109%2Fidam.2014.

6912705

[81] C. R. Seela, B. Ravisankar, and B. Raju, “A GRNN based frame work

to test the inﬂuence of nano zinc additive biodiesel blends on CI engine

performance and emissions,” Egypt. J. Pet., vol. 27, no. 4, pp. 641–647,

Dec. 2018.

[82] M. J. Zarei, H. R. Ansari, P. Keshavarz, and M. M. Zerafat, “Prediction

of pool boiling heat transfer coefﬁcient for various nano-refrigerants

utilizing artiﬁcial neural networks,” J. Therm. Anal. Calorim., vol. 139,

no. 6, pp. 3757–3768, Aug. 2019.

[83] G. M. Uddin, K. Ziemer, A. Zeid, and S. Kamarthi, “Study of lattice

strain propagation in molecular beam epitaxy of nano scale magnesium

oxide thin ﬁlm on 6h-SiC substrates using neural network computer

models,” in Volume 9: Micro- and Nano-Systems Engineering and

Packaging, Parts A and B. American Society of Mechanical Engineers,

Nov. 2012.

[84] S. So and J. Rho, “Designing nanophotonic structures using conditional

deep convolutional generative adversarial networks,” Nanophotonics,

vol. 8, no. 7, pp. 1255–1261, Jun. 2019.

[85] J. Han, L. Zhang, R. Car, and W. E, “Deep potential: A general rep-

resentation of a many-body potential energy surface,” Comm. Comput.

Phys., vol. 23, no. 3, Jan. 2018.

[86] Y. Nagai, M. Okumura, and A. Tanaka, “Self-learning monte carlo

method with behler-parrinello neural networks,” Phys. Rev. B, vol. 101,

no. 11, Mar. 2020.

[87] M. Liu and J. R. Kitchin, “SingleNN: Modiﬁed behler-parrinello

neural network with shared weights for atomistic simulations with

transferability,” The Journal of Physical Chemistry C, vol. 124, no. 32,

pp. 17 811–17 818, Jul. 2020.

[88] L. Zhang, J. Han, H. Wang, R. Car, and W. E, “Deep potential

molecular dynamics: A scalable model with the accuracy of quantum

mechanics,” Phys. Rev. Lett., vol. 120, no. 14, Apr. 2018.

[89] K. T. Schutt, F. Arbabzadah, S. Chmiela, K. R. Muller, and

A. Tkatchenko, “Quantum-chemical insights from deep tensor neural

networks,” Nat. Commun., vol. 8, no. 1, Jan. 2017.


[90] K. T. Schütt, P.-J. Kindermans, H. E. Sauceda, S. Chmiela, A. Tkatchenko,

and K.-R. Müller, “SchNet: A continuous-ﬁlter convolutional neural

network for modeling quantum interactions,” Advances in Neural

Information Processing Systems, vol. 30, pp. 991–1001, Dec. 2017.

[91] X. Gao, F. Ramezanghorbani, O. Isayev, J. S. Smith, and A. E.

Roitberg, “TorchANI: A free and open source PyTorch-based deep

learning implementation of the ANI neural network potentials,” Journal

of Chemical Information and Modeling, vol. 60, no. 7, pp. 3408–3415,

Jun. 2020.

[92] A. Davtyan, G. A. Voth, and H. C. Andersen, “Dynamic force match-

ing: Construction of dynamic coarse-grained models with realistic short

time dynamics and accurate long time dynamics,” The Journal of

Chemical Physics, vol. 145, no. 22, p. 224107, Dec. 2016.

[93] F. Nüske, L. Boninsegna, and C. Clementi, “Coarse-graining molecular

systems by spectral matching,” The Journal of Chemical Physics, vol.

151, no. 4, p. 044116, Jul. 2019.

[94] L. Chua and T. Roska, “The CNN paradigm,” IEEE Trans. Circuits

Syst. I, vol. 40, no. 3, pp. 147–156, Mar. 1993.

[95] M. Egmont-Petersen, D. de Ridder, and H. Handels, “Image processing

with neural networks—a review,” Pattern Recognit., vol. 35, no. 10, pp.

2279–2301, Oct. 2002.

[96] N. Tajbakhsh, J. Y. Shin, S. R. Gurudu, R. T. Hurst, C. B. Kendall,

M. B. Gotway, and J. Liang, “Convolutional neural networks for

medical image analysis: Full training or ﬁne tuning?” IEEE Trans.

Med. Imag., vol. 35, no. 5, pp. 1299–1312, May 2016.

[97] L. Fang, C. Wang, S. Li, H. Rabbani, X. Chen, and Z. Liu, “Attention

to lesion: Lesion-aware convolutional neural network for retinal optical

coherence tomography image classiﬁcation,” IEEE Trans. Med. Imag.,

vol. 38, no. 8, pp. 1959–1970, Aug. 2019.

[98] Z. C. Lipton, J. Berkowitz, and C. Elkan, “A critical review of recurrent

neural networks for sequence learning,” ArXiV, Oct. 2015.

[99] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural

Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.

[100] D. F. Specht, “A general regression neural network,” IEEE Transactions

on Neural Networks, vol. 2, no. 6, pp. 568–576, Nov. 1991.

[101] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of

Statistical Learning. Springer New York, 2009. [Online]. Available:

https://doi.org/10.1007%2F978-0-387-84858-7

[102] B. J. Wythoff, “Backpropagation neural networks: A tutorial,”

Chemom. Intell. Lab. Syst., vol. 18, no. 2, pp. 115 – 155, Mar. 1993.

[Online]. Available: http://www.sciencedirect.com/science/article/pii/

016974399380052J

[103] K. A. Brown, S. Brittman, N. Maccaferri, D. Jariwala, and U. Celano,

“Machine learning in nanoscience: Big data at small scales,” Nano

Lett., vol. 20, no. 1, pp. 2–10, Dec. 2019.

[104] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-

Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative

adversarial nets,” in Advances in Neural Information Processing

Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D.

Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc.,

2014, pp. 2672–2680. [Online]. Available: http://papers.nips.cc/paper/

5423-generative-adversarial-nets.pdf

[105] A. P. Bartók, R. Kondor, and G. Csányi, “On representing chemical

environments,” Phys. Rev. B, vol. 87, no. 18, May 2013.

[106] J. S. Smith, O. Isayev, and A. E. Roitberg, “ANI-1: an extensible neural

network potential with DFT accuracy at force ﬁeld computational cost,”

Chem. Sci., vol. 8, no. 4, pp. 3192–3203, Feb. 2017.

[107] K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and

K.-R. Müller, “SchNet – a deep learning architecture for molecules

and materials,” The Journal of Chemical Physics, vol. 148, no. 24, p.

241722, Jun. 2018.

[108] K. T. Schütt, P. Kessel, M. Gastegger, K. A. Nicoli, A. Tkatchenko, and

K.-R. Müller, “SchNetPack: A deep learning toolbox for atomistic

systems,” J. Chem. Theory Comput., vol. 15, no. 1, pp. 448–455, Nov.

2018.

[109] K. T. Schütt, A. Tkatchenko, and K.-R. Müller, Learning Represen-

tations of Molecules and Materials with Atomistic Neural Networks.

Cham: Springer International Publishing, 2020, pp. 215–230. [Online].

Available: https://doi.org/10.1007/978-3-030-40245-7_11

[110] W.-K. Jeong, H. Pﬁster, and M. Fatica, “Medical image processing

using GPU-accelerated ITK image ﬁlters,” in GPU Computing Gems

Emerald Edition. Elsevier, 2011, pp. 737–749.

[111] A. P. Lyubartsev and A. Laaksonen, “Calculation of effective inter-

action potentials from radial distribution functions: A reverse monte

carlo approach,” Physical Review E, vol. 52, no. 4, pp. 3730–3737,

Oct. 1995.

[112] C. Clementi, H. Nymeyer, and J. N. Onuchic, “Topological and

energetic factors: what determines the structural details of the transition

state ensemble and “en-route” intermediates for protein folding? an

investigation for small globular proteins,” J. Mol. Biol., vol. 298, no. 5,

pp. 937–953, May 2000.

[113] F. Müller-Plathe, “Coarse-graining in polymer simulation: From the

atomistic to the mesoscopic scale and back,” Chem. Phys. Chem., vol. 3,

no. 9, pp. 754–769, Sep. 2002.

[114] S. O. Nielsen, C. F. Lopez, G. Srinivas, and M. L. Klein, “A coarse

grain model for n-alkanes parameterized from surface tension data,”

The Journal of Chemical Physics, vol. 119, no. 14, pp. 7043–7049,

Oct. 2003.

[115] S. Matysiak and C. Clementi, “Optimal combination of theory and

experiment for the characterization of the protein folding landscape of

s6: How far can a minimalist model go?” J. Mol. Biol., vol. 343, no. 1,

pp. 235–248, Oct. 2004.

[116] S. J. Marrink, A. H. de Vries, and A. E. Mark, “Coarse grained

model for semiquantitative lipid simulations,” The Journal of Physical

Chemistry B, vol. 108, no. 2, pp. 750–760, Jan. 2004.

[117] S. Matysiak and C. Clementi, “Minimalist protein model as a diagnostic

tool for misfolding and aggregation,” J. Mol. Biol., vol. 363, no. 1, pp.

297–308, Oct. 2006.

[118] Y. Wang, W. G. Noid, P. Liu, and G. A. Voth, “Effective force coarse-

graining,” Phys. Chem. Chem. Phys., vol. 11, no. 12, p. 2002, Feb.

2009.

[119] J. Chen, J. Chen, G. Pinamonti, and C. Clementi, “Learning effective

molecular models from experimental observables,” J. Chem. Theory

Comput., vol. 14, no. 7, pp. 3849–3858, May 2018.

[120] D. Strukov, G. Snider, D. Stewart, and S. Williams, “The missing

memristor found,” Nature, vol. 453, pp. 80–3, Jun. 2008.

[121] Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones,

M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and

M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat.

Photonics, vol. 11, no. 7, pp. 441–446, Jun. 2017.

[122] J. K. George, A. Mehrabian, R. Amin, J. Meng, T. F. de Lima, A. N.

Tait, B. J. Shastri, T. El-Ghazawi, P. R. Prucnal, and V. J. Sorger,

“Neuromorphic photonics with electro-absorption modulators,” Opt.

Express, vol. 27, no. 4, pp. 5181–5191, Feb. 2019.

[123] M. A. Zidan, J. P. Strachan, and W. D. Lu, “The future of electronics

based on memristive systems,” Nat. Electron., vol. 1, no. 1, pp. 22–29,

Jan. 2018.

[124] G. Yamankurt, E. J. Berns, A. Xue, A. Lee, N. Bagheri, M. Mrksich,

and C. A. Mirkin, “Exploration of the nanomedicine-design space with

high-throughput screening and machine learning,” Nature Biomedical

Engineering, vol. 3, no. 4, pp. 318–327, Feb. 2019.

[125] C. M. Pérez-Espinoza, N. Beltran-Robayo, T. Samaniego-Cobos,

A. Alarcón-Salvatierra, A. Rodriguez-Mendez, and P. Jaramillo-

Barreiro, “Using a machine learning logistic regression algorithm

to classify nanomedicine clinical trials in a known repository,” in

Communications in Computer and Information Science. Springer

International Publishing, 2019, pp. 98–110.

[126] C. Sayes and I. Ivanov, “Comparative study of predictive computational

models for nanoparticle-induced cytotoxicity,” Risk Anal., vol. 30,

no. 11, pp. 1723–1734, Jun. 2010.

[127] D. E. Jones, H. Ghandehari, and J. C. Facelli, “Predicting cytotoxicity

of PAMAM dendrimers using molecular descriptors,” Beilstein J.

Nanotechnol., vol. 6, pp. 1886–1896, Sep. 2015.

[128] Q. U. Ain, A. Aleksandrova, F. D. Roessler, and P. J. Ballester,

“Machine-learning scoring functions to improve structure-based bind-

ing afﬁnity prediction and virtual screening,” WIREs Computational

Molecular Science, vol. 5, no. 6, pp. 405–424, Aug. 2015.

[129] H. Li, J. Peng, Y. Leung, K.-S. Leung, M.-H. Wong, G. Lu, and P. J.

Ballester, “The impact of protein structure and sequence similarity on

the accuracy of machine-learning scoring functions for binding afﬁnity

prediction,” Biomolecules, vol. 8, no. 1, Mar. 2018.

[130] C. G. Atkeson, A. W. Moore, and S. Schaal, “Locally weighted

learning for control,” in Lazy Learning. Springer Netherlands,

1997, pp. 75–113. [Online]. Available: https://doi.org/10.1007%

2F978-94-017-2053-3_3

[131] P. Samui, S. Sekhar, and V. E. Balas, “Chapter 27 - support vector

machine: Principles, parameters, and applications,” in Handbook of

Neural Computation. Academic Press, 2017, pp. 515 – 535.

[132] J. Platt, “Sequential minimal optimization: A fast algorithm for training

support vector machines,” Advances in Kernel Methods-Support Vector

Learning, vol. 208, Jul. 1998.


[133] E. Osuna, R. Freund, and F. Girosi, “An improved training algorithm

for support vector machines,” in Neural Networks for Signal Processing

VII – Proceedings of the 1997 IEEE Workshop. IEEE, 1997, pp. 276–285.

[134] K. A. Cyran, J. Kawulok, M. Kawulok, M. Stawarz, M. Michalak,

M. Pietrowska, P. Widłak, and J. Polańska, Support Vector Machines

in Biomedical and Biometrical Applications. Berlin, Heidelberg:

Springer Berlin Heidelberg, 2013, pp. 379–417.

[135] J. Li, W. Zhang, X. Bao, M. Abbaszadeh, and W. Guo, “Inference

in turbulent molecular information channels using support vector

machine,” IEEE Trans. Mol. Biol. Multi-Scale Commun., vol. 6, no. 1,

pp. 25–35, Jun. 2020.

[136] S. Mohamed, D. Jian, L. Hongwei, and Z. Decheng, “Molecular

communication via diffusion with spherical receiver and transmitter and

trapezoidal container,” Microprocess. Microsyst., vol. 74, p. 103017,

Feb. 2020.

[137] P. Cunningham and S. J. Delany, “k-Nearest Neighbour Classiﬁers,”

University College Dublin, Tech. Rep., Mar. 2007.

[138] K. Kourou, T. P. Exarchos, K. P. Exarchos, M. V. Karamouzis, and

D. I. Fotiadis, “Machine learning applications in cancer prognosis and

prediction,” Comput. Struct. Biotechnol. J., vol. 13, pp. 8–17, Jul. 2015.

[139] J. Tan, M. Ung, C. Cheng, and C. S. Greene, “Unsupervised Feature

Construction and Knowledge Extraction from Genome-wide Assays of

Breast Cancer With Denoising Autoencoders,” in Biocomputing 2015.

World Scientiﬁc, Nov. 2014.

[140] X. Ren, Y. Wang, L. Chen, X.-S. Zhang, and Q. Jin, “ellipsoidFN: a

tool for identifying a heterogeneous set of cancer biomarkers based

on gene expressions,” Nucleic Acids Res., vol. 41, no. 4, pp. e53–e53,

Dec. 2012.

[141] M. Kim, N. Rai, V. Zorraquino, and I. Tagkopoulos, “Multi-omics

integration accurately predicts cellular state in unexplored conditions

for escherichia coli,” Nat. Commun., vol. 7, no. 1, Oct. 2016.

[142] S. Jesse and S. V. Kalinin, “Principal component and spatial correlation

analysis of spectroscopic-imaging data in scanning probe microscopy,”

Nanotechnology, vol. 20, no. 8, p. 085714, Feb. 2009.

[143] A. Subasi and M. I. Gursoy, “EEG signal classiﬁcation using PCA, ICA, LDA and support vector machines,” Expert Syst. Appl., vol. 37, no. 12,

pp. 8659–8666, Jul. 2010.

[144] L. Cao, K. Chua, W. Chong, H. Lee, and Q. Gu, “A comparison of pca,

kpca and ica for dimensionality reduction in support vector machine,”

2003.

[145] A. H. Fielding, Cluster and classiﬁcation techniques for the bio-

sciences. Cambridge: Cambridge University Press, 2006.

[146] P. Comon, “Independent component analysis, A new concept?” Signal

Processing, vol. 36, no. 3, pp. 287 – 314, 1994, higher Order

Statistics. [Online]. Available: http://www.sciencedirect.com/science/

article/pii/0165168494900299

[147] S. Ruder, “An overview of gradient descent optimization algorithms,”

arXiv preprint arXiv:1609.04747, 2016.