About
136 Publications · 96,923 Reads
725 Citations
Introduction
In addition to coding, my skills include machine learning, deep learning, feature extraction (dimensionality reduction and manifold learning), data visualization, data science, computer vision, image processing, and biomedical engineering, in both theory and application. I have an in-depth understanding of machine learning algorithms.
Publications (136)
Autonomous vehicles represent a revolutionary advancement driven by the integration of artificial intelligence within intelligent transportation systems. However, they remain vulnerable due to the absence of robust security mechanisms in the Controller Area Network (CAN) bus. In order to mitigate the security issue, many machine learning models and...
Self-supervised learning has gained significant attention in contemporary applications, particularly due to the scarcity of labeled data. While existing SSL methodologies primarily address feature variance and linear correlations, they often neglect the intricate relations between samples and the nonlinear dependencies inherent in complex data. In...
Large neural networks, such as large language models, cannot be easily implemented in edge devices and embedded systems, such as cell phones, because of limitations in memory storage and battery. Therefore, large neural networks need to be compressed. This tutorial and survey paper introduces the methods of neural network compression. Different cat...
Maximum Mean Discrepancy (MMD), also called the kernel two-sample test, is a measurement for difference of two probability distributions. It measures this difference by pulling data to the reproducing kernel Hilbert space and calculating the difference of their first moments in that space. Generative Moment Matching Network (GMMN) uses MMD between...
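As an illustration of the measure described above, here is a minimal sketch of the (biased) squared-MMD estimator with an RBF kernel; the bandwidth `gamma` and the toy Gaussian samples are arbitrary choices for this example, not taken from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF (Gaussian) kernel matrix between rows of X and Y."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between samples X and Y."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (200, 2))   # samples from N(0, I)
Y = rng.normal(1.0, 1.0, (200, 2))   # samples from N(1, I): shifted mean
print(mmd2(X, X[::-1]), mmd2(X, Y))  # near zero vs. clearly positive
```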
This is a tutorial and survey on PAC learnability and the information bottleneck in deep learning. The paper first introduces PAC learnability, agnostic PAC learnability, the VC dimension, and error bounds in neural networks. It discusses how PAC learnability does not align with the performance of deep learning in practice. Then, the...
This is a tutorial and survey paper on backpropagation and optimization in neural networks. It starts with gradient descent, line-search, momentum, and steepest descent. Then, backpropagation is introduced. Afterwards, stochastic gradient descent, mini-batch stochastic gradient descent, and their convergence rates are discussed. Adaptive learning r...
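As a small illustration of the momentum idea mentioned in this tutorial, the following sketch implements plain (heavy-ball) gradient descent with momentum on a toy quadratic; the learning rate and momentum coefficient are arbitrary example values.

```python
import numpy as np

def gd_momentum(grad, w0, lr=0.1, beta=0.9, steps=100):
    """Gradient descent with heavy-ball momentum."""
    w = np.array(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v - lr * grad(w)   # accumulate velocity from past gradients
        w = w + v                     # take a step along the velocity
    return w

# Minimize f(w) = ||w||^2 / 2, whose gradient is simply w.
print(gd_momentum(lambda w: w, w0=[5.0, -3.0]))   # approaches the origin
```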
This is a tutorial and survey paper on reinforcement learning, from fundamental reinforcement learning to deep reinforcement learning. It starts with introducing the elements of reinforcement learning. Then, the Markov decision process and policies are explained, and the Bellman equation is introduced. Then, value iteration, policy iteration, and modified policy...
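The value-iteration step mentioned above can be sketched in a few lines; the two-state, two-action MDP below is made up purely for illustration.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a finite MDP.
    P[a, s, s'] = transition probability, R[a, s] = expected reward."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Bellman optimality backup: V(s) = max_a [R(a,s) + gamma * sum_s' P(a,s,s') V(s')]
        Q = R + gamma * P @ V          # shape (n_actions, n_states)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)   # optimal values and greedy policy
        V = V_new

# Toy MDP (transition probabilities and rewards invented for this example).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
print(value_iteration(P, R))
```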
This is a tutorial paper on graph neural networks including ChebNet, graph convolutional network, graph attention network, and graph autoencoder. It starts with Laplacian of graph, graph Fourier transform, and graph convolution. Then, it is explained how Chebyshev polynomials are used in graph networks to have ChebNet. Afterwards, graph convolution...
Diffusion models are a family of generative models which work based on a Markovian process. In their forward process, they gradually add noise to the data until it becomes pure noise. In the backward process, the data are gradually generated out of noise. In this tutorial paper, the Denoising Diffusion Probabilistic Model (DDPM) is fully explaine...
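For illustration, the closed-form forward (noising) step of DDPM, which samples x_t directly from x_0, can be sketched as follows; the linear beta schedule is a common choice but an assumption here, not something stated in the abstract.

```python
import numpy as np

def ddpm_forward(x0, t, betas, rng):
    """Sample x_t directly from x_0 in the DDPM forward process:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I)."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # a common linear schedule (assumed here)
x0 = rng.normal(size=(4,))
print(ddpm_forward(x0, t=999, betas=betas, rng=rng))  # nearly pure noise at the last step
```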
This is a tutorial paper for Hidden Markov Model (HMM). First, we briefly review the background on Expectation Maximization (EM), Lagrange multiplier, factor graph, the sum-product algorithm, the max-product algorithm, and belief propagation by the forward-backward procedure. Then, we introduce probabilistic graphical models including Markov random...
This is a tutorial and survey paper on the attention mechanism, transformers, BERT, and GPT. We first explain attention mechanism, sequence-to-sequence model without and with attention, self-attention, and attention in different areas such as natural language processing and computer vision. Then, we explain transformers which do not use any recurre...
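As a minimal illustration of the attention mechanism covered in this tutorial, here is a sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, applied to random toy tensors:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 query tokens, dimension 8
K = rng.normal(size=(5, 8))   # 5 key tokens
V = rng.normal(size=(5, 8))   # 5 value tokens
print(scaled_dot_product_attention(Q, K, V).shape)   # (3, 8)
```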
Functional heavy-chain antibodies (HCAbs) are promising therapeutic agents for targeted cancer therapy. Since they do not have a light chain, HCAbs have a small antigen-binding domain that can unlock new functionalities by reaching epitopes that are inaccessible to classical antibodies. HCAbs can be obtained by immunizing camelids, but this approac...
Density estimation, which estimates the distribution of data, is an important category of probabilistic machine learning. A family of density estimators is mixture models, such as Gaussian Mixture Model (GMM) by expectation maximization. Another family of density estimators is the generative models which generate data from input latent variables. O...
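As a small example of the mixture-model family mentioned above, the following sketch fits a two-component GMM by expectation maximization; it assumes scikit-learn is available, and the 1D data are made up for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # assumes scikit-learn is installed

rng = np.random.default_rng(0)
# Toy data: a two-component 1D mixture.
X = np.concatenate([rng.normal(-2, 0.5, 300),
                    rng.normal(3, 1.0, 700)]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(X)  # EM under the hood
print(gm.means_.ravel(), gm.weights_)   # recovered component means and weights
print(gm.score_samples(X[:3]))          # log-density estimates at the first points
```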
This is a tutorial paper on Recurrent Neural Network (RNN), Long Short-Term Memory Network (LSTM), and their variants. We start with a dynamical system and backpropagation through time for RNN. Then, we discuss the problems of gradient vanishing and explosion in long-term dependencies. We explain close-to-identity weight matrix, long delays, leaky...
Principal Component Analysis (PCA) (Jolliffe, Principal component analysis. Springer, 2011) is a very well-known and fundamental linear method for subspace learning and dimensionality reduction (Friedman et al., The elements of statistical learning. vol. 2. Springer series in statistics, 2009). This method, which is also used for feature extraction...
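A bare-bones sketch of PCA as eigendecomposition of the sample covariance matrix (one of several equivalent formulations) might look like this:

```python
import numpy as np

def pca(X, k):
    """Project rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                 # center the data
    C = Xc.T @ Xc / (len(X) - 1)            # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)    # eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :k]             # top-k eigenvectors = projection matrix
    return Xc @ W, W

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z, W = pca(X, k=2)
print(Z.shape, np.round(W.T @ W, 6))   # (100, 2); W has orthonormal columns
```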
Probabilistic methods are a category of dimensionality reduction. Among the probabilistic methods, there are neighbour embedding algorithms where the probabilities of neighbourhoods are used. In these algorithms, attractive and repulsive forces are utilized for neighbour and non-neighbour points, respectively.
Linear dimensionality reduction methods project data onto the low-dimensional column space of a projection matrix. Many of these methods, such as Principal Component Analysis (PCA) (see Chap. 5) and Fisher Discriminant Analysis (FDA) (see Chap. 6), learn a projection matrix for either better representation of data or discrimination between the clas...
Multidimensional Scaling (MDS), first proposed by Torgerson, is one of the earliest dimensionality reduction methods.
Learning models can be divided into discriminative and generative models. Discriminative models discriminate the classes of data for better separation of classes while the generative models learn a latent space that generates the data points. This chapter introduces generative models.
It is not wrong to say that almost all machine learning algorithms, including the dimensionality reduction methods, reduce to optimization. Many of the optimization methods can be explained in terms of the Karush-Kuhn-Tucker (KKT) conditions (Kjeldsen, Historia Math 27:331–361, 2000), proposed in Karush (Minima of functions of several variables wit...
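For reference, the KKT conditions for a generic constrained problem can be stated as follows (standard form, not specific to this chapter):

```latex
\begin{align*}
&\text{Problem:} && \min_{x} f(x) \quad \text{s.t.} \quad g_i(x) \le 0,\ \ h_j(x) = 0,\\
&\text{Stationarity:} && \nabla f(x^*) + \sum_i \mu_i \nabla g_i(x^*) + \sum_j \lambda_j \nabla h_j(x^*) = 0,\\
&\text{Primal feasibility:} && g_i(x^*) \le 0, \quad h_j(x^*) = 0,\\
&\text{Dual feasibility:} && \mu_i \ge 0,\\
&\text{Complementary slackness:} && \mu_i \, g_i(x^*) = 0.
\end{align*}
```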
It was mentioned in Chap. 11 that metric learning can be divided into three types of learning—spectral, probabilistic and deep metric learning.
It was mentioned in Chap. 11 that metric learning can be divided into spectral, probabilistic, and deep metric learning. Chapters 11 and 13 explained that both spectral and probabilistic metric learning methods use the generalized Mahalanobis distance, i.e., Eq. (11.53) in Chap. 11, and learn the weight matrix in the metric. Deep metric learning, h...
Locally Linear Embedding (LLE) is a nonlinear spectral dimensionality reduction method that can be used for manifold embedding and feature extraction.
In functional analysis, a field of mathematics, there are various spaces of either data points or functions. For example, every Euclidean space is a Hilbert space, and every Hilbert space is a Banach space. The Hilbert space is a space of functions, and its dimensionality is often considered to be high.
Suppose there is a labeled dataset, for either regression or classification. Sufficient Dimension Reduction (SDR), first proposed by Li, is a family of methods that find a transformation of the data to a lower-dimensional space that does not change the conditional distribution of the labels given the data.
Spectral dimensionality reduction methods deal with the graph and geometry of data and usually reduce to an eigenvalue or generalized eigenvalue problem (see Chap. 2).
Various spectral methods have been proposed over the past few decades. Some of the most well-known spectral methods include Principal Component Analysis (PCA), Multidimensional Scaling (MDS), Isomap, spectral clustering, Laplacian eigenmap, diffusion map, and Locally Linear Embedding (LLE).
Stochastic Neighbour Embedding (SNE) is a manifold learning and dimensionality reduction method that can be used for feature extraction and data visualization. It takes a probabilistic approach, fitting the data locally in the embedding space in the hope of preserving the global structure of the data.
Fisher Discriminant Analysis (FDA) attempts to find a subspace that separates the classes as much as possible, while the data also become as spread as possible.
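The objective sketched above is commonly written as the Fisher criterion, a generalized Rayleigh quotient solved by a generalized eigenvalue problem:

```latex
\max_{\mathbf{w}} \; J(\mathbf{w}) = \frac{\mathbf{w}^\top \mathbf{S}_B \, \mathbf{w}}{\mathbf{w}^\top \mathbf{S}_W \, \mathbf{w}},
\qquad \text{solved by} \qquad
\mathbf{S}_B \, \mathbf{w} = \lambda \, \mathbf{S}_W \, \mathbf{w},
```

where S_B and S_W denote the between-class and within-class scatter matrices, respectively.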
Machine learning has proven effective in physics and has therefore received increasing attention in the literature. However, the reverse notion, applying physics in machine learning, has received far less attention. This work is a hybrid of physics and machine learning in which concepts from physics are used in machine learning. We propose t...
After the development of various machine learning and manifold learning algorithms, it may be a good time to put them together to make a powerful machine mind. In this work, we propose affective manifolds as components of a machine's mind. Every affective manifold models a characteristic group of the mind and contains multiple states. We define t...
A family of dimensionality reduction methods known as metric learning learns a distance metric in an embedding space to separate dissimilar points and bring together similar points. In supervised metric learning, the aim is to discriminate classes by learning an appropriate metric.
Chapter 12 explained that learning models can be divided into discriminative and generative models. The Variational Autoencoder (VAE), introduced in this chapter, is a generative model. Variational inference is a technique that finds a lower bound on the log-likelihood of the data and maximizes the lower bound rather than the log-likelihood in the...
Suppose there is a generative model that takes random noise as input and generates a data point. The aim is for the generated data point to be of good quality; therefore, there is a need to judge its quality. One way to judge it is to observe the generated sample and assess its quality visually. In this case, the judge is a human. However, it i...
In the nineteenth century, the Boltzmann distribution, also called the Gibbs distribution, was proposed. This energy-based distribution was found to be useful for statistically modelling physical systems. One of these systems was the Ising model, which modelled interacting particles with binary spins. Later, it was discovered that the Ising model could be a ne...
Locally Linear Embedding (LLE) is a nonlinear spectral dimensionality reduction and manifold learning method. It has two main steps which are linear reconstruction and linear embedding of points in the input space and embedding space, respectively. In this work, we look at the linear reconstruction step from a stochastic perspective where it is ass...
Consider a set of $n$ data points in the Euclidean space $\mathbb{R}^d$. This set is called a dataset in machine learning and data science. The manifold hypothesis states that the dataset lies on a low-dimensional submanifold with high probability. All dimensionality reduction and manifold learning methods rely on the manifold hypothesis. In t...
This is a tutorial and survey paper on metric learning. Algorithms are divided into spectral, probabilistic, and deep metric learning. We start with the definition of a distance metric, the Mahalanobis distance, and the generalized Mahalanobis distance. In spectral methods, we start with methods using scatters of data, including the first spectral met...
This is a tutorial and survey paper on Generative Adversarial Network (GAN), adversarial autoencoders, and their variants. We start with explaining adversarial learning and the vanilla GAN. Then, we explain the conditional GAN and DCGAN. The mode collapse problem is introduced and various methods, including minibatch GAN, unrolled GAN, BourGAN, mix...
This is a tutorial and survey paper on various methods for Sufficient Dimension Reduction (SDR). We cover these methods with both statistical high-dimensional regression perspective and machine learning approach for dimensionality reduction. We start with introducing inverse regression methods including Sliced Inverse Regression (SIR), Sliced Avera...
This is a tutorial and survey paper on Karush-Kuhn-Tucker (KKT) conditions, first-order and second-order numerical optimization, and distributed optimization. After a brief review of history of optimization, we start with some preliminaries on properties of sets, norms, functions, and concepts of optimization. Then, we introduce the optimization pr...
This work concentrates on optimization on Riemannian manifolds. The Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm is a commonly used quasi-Newton method for numerical optimization in Euclidean spaces. Riemannian LBFGS (RLBFGS) is an extension of this method to Riemannian manifolds. RLBFGS involves computationally expensive vecto...
Uniform Manifold Approximation and Projection (UMAP) is one of the state-of-the-art methods for dimensionality reduction and data visualization. This is a tutorial and survey paper on UMAP and its variants. We start with UMAP algorithm where we explain probabilities of neighborhood in the input and embedding spaces, optimization of cost function, t...
This is a tutorial and survey paper on the Johnson-Lindenstrauss (JL) lemma and linear and nonlinear random projections. We start with linear random projection and then justify its correctness by JL lemma and its proof. Then, sparse random projections with $\ell_1$ norm and interpolation norm are introduced. Two main applications of random projecti...
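A linear Gaussian random projection, the basic construction justified by the JL lemma, can be sketched as follows; the dimensions and the check on a few pairwise distances are arbitrary example choices.

```python
import numpy as np

def random_projection(X, k, rng):
    """Linear random projection: a Gaussian matrix scaled by 1/sqrt(k)
    approximately preserves pairwise distances (JL lemma)."""
    d = X.shape[1]
    R = rng.normal(size=(d, k)) / np.sqrt(k)
    return X @ R

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10000))        # high-dimensional toy data
Z = random_projection(X, k=1000, rng=rng)

# Compare a few pairwise distances before and after projection.
for i, j in [(0, 1), (2, 3)]:
    print(np.linalg.norm(X[i] - X[j]), np.linalg.norm(Z[i] - Z[j]))
```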
This is a tutorial and survey paper on Boltzmann Machine (BM), Restricted Boltzmann Machine (RBM), and Deep Belief Network (DBN). We start with the required background on probabilistic graphical models, Markov random field, Gibbs sampling, statistical physics, Ising model, and the Hopfield network. Then, we introduce the structures of BM and RBM. T...
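As a small illustration of sampling in the RBM described above, here is a sketch of one block-Gibbs step using the standard conditional distributions; the layer sizes and random weights are made up for the example.

```python
import numpy as np

def rbm_gibbs_step(v, W, b, c, rng):
    """One block-Gibbs step in a binary RBM with energy
    E(v, h) = -b^T v - c^T h - v^T W h."""
    p_h = 1.0 / (1.0 + np.exp(-(c + v @ W)))     # P(h_j = 1 | v)
    h = (rng.uniform(size=p_h.shape) < p_h) * 1.0
    p_v = 1.0 / (1.0 + np.exp(-(b + W @ h)))     # P(v_i = 1 | h)
    v_new = (rng.uniform(size=p_v.shape) < p_v) * 1.0
    return v_new, h

rng = np.random.default_rng(0)
n_v, n_h = 6, 3
W = rng.normal(scale=0.1, size=(n_v, n_h))       # random toy weights
b, c = np.zeros(n_v), np.zeros(n_h)              # zero biases
v = (rng.uniform(size=n_v) < 0.5) * 1.0          # random binary visible state
print(rbm_gibbs_step(v, W, b, c, rng))
```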
Data often have nonlinear patterns in machine learning. One can unfold the nonlinear manifold of a dataset for low-dimensional visualization and feature extraction. Locally Linear Embedding (LLE) is a nonlinear spectral method for dimensionality reduction and manifold unfolding. It embeds data using the same linear reconstruction weights as in the...
This is a tutorial and survey paper on unification of spectral dimensionality reduction methods, kernel learning by Semidefinite Programming (SDP), Maximum Variance Unfolding (MVU) or Semidefinite Embedding (SDE), and its variants. We first explain how the spectral dimensionality reduction methods can be unified as kernel Principal Component Analys...
This is a tutorial and survey paper on kernels, kernel methods, and related fields. We start with reviewing the history of kernels in functional analysis and machine learning. Then, Mercer kernel, Hilbert and Banach spaces, Reproducing Kernel Hilbert Space (RKHS), Mercer's theorem and its proof, frequently used kernels, kernel construction from dis...
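As a tiny illustration of Mercer's condition mentioned above, the following sketch builds an RBF Gram matrix and checks that it is positive semidefinite; the bandwidth is an arbitrary choice.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=0.5):
    """Gram matrix of the RBF kernel k(x, y) = exp(-gamma ||x - y||^2)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
K = rbf_kernel_matrix(X)
# Mercer's condition: a valid kernel yields a positive semidefinite Gram matrix.
print(np.linalg.eigvalsh(K).min() >= -1e-10)   # True (up to numerical error)
```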
This is a tutorial and survey paper for nonlinear dimensionality and feature extraction methods which are based on the Laplacian of graph of data. We first introduce adjacency matrix, definition of Laplacian matrix, and the interpretation of Laplacian. Then, we cover the cuts of graph and spectral clustering which applies clustering in a subspace o...
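A bare-bones Laplacian eigenmap, assuming a kNN graph with heat-kernel weights and the unnormalized Laplacian (one of several variants such tutorials cover), might be sketched as:

```python
import numpy as np

def laplacian_eigenmap(X, n_neighbors=5, k=2, gamma=1.0):
    """kNN graph with heat-kernel weights, then the smallest nontrivial
    eigenvectors of the graph Laplacian give the embedding."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros_like(sq)
    for i in range(len(X)):
        nn = np.argsort(sq[i])[1:n_neighbors + 1]   # skip the point itself
        W[i, nn] = np.exp(-gamma * sq[i, nn])
    W = np.maximum(W, W.T)                          # symmetrize the adjacency
    L = np.diag(W.sum(axis=1)) - W                  # unnormalized Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, 1:k + 1]                      # drop the constant eigenvector

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
print(laplacian_eigenmap(X).shape)   # (40, 2)
```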
We propose a new embedding method, named Quantile–Quantile Embedding (QQE), for distribution transformation and manifold embedding with the ability to choose the embedding distribution. QQE, which uses the concept of quantile–quantile plot from visual statistical tests, can transform the distribution of data to any theoretical desired distribution...
Histopathology image embedding is an active research area in computer vision. Most of the embedding models exclusively concentrate on a specific magnification level. However, a useful task in histopathology embedding is to train an embedding space regardless of the magnification level. Two main approaches for tackling this goal are domain adaptatio...
Locally Linear Embedding (LLE) is a nonlinear spectral dimensionality reduction and manifold learning method. It has two main steps which are linear reconstruction and linear embedding of points in the input space and embedding space, respectively. In this work, we propose two novel generative versions of LLE, named Generative LLE (GLLE), whose lin...
Raw data are usually required to be pre-processed for better representation or discrimination of classes. This pre-processing can be done by data reduction, i.e., either reduction in dimensionality or numerosity (cardinality). Dimensionality reduction can be used for feature extraction or data visualization. Numerosity reduction is useful for ranki...
The YouTube presentation of slides:
https://www.youtube.com/watch?v=bqtNjwI60dQ
The YouTube presentation of slides:
https://www.youtube.com/watch?v=S8lGRykashc
The YouTube presentation of slides:
https://www.youtube.com/watch?v=XuFBoRp4ZAM
The YouTube presentation of slides:
https://www.youtube.com/watch?v=iYwW9XdWPZw
The YouTube presentation of slides:
https://www.youtube.com/watch?v=LscooUyktz4
The YouTube presentation of slides:
https://www.youtube.com/watch?v=4Zk2UFGIrd0
The YouTube presentation of slides:
https://www.youtube.com/watch?v=3rXYXZiH-pA
Histopathology image embedding is an active research area in computer vision. Most of the embedding models exclusively concentrate on a specific magnification level. However, a useful task in histopathology embedding is to train an embedding space regardless of the magnification level. Two main approaches for tackling this goal are domain adaptatio...
Metric learning is a technique in manifold learning to find a projection subspace for increasing and decreasing the inter- and intra-class variances, respectively. Some metric learning methods are based on triplet learning with anchor-positive-negative triplets. Large margin metric learning for nearest neighbor classification is one of the fundamen...
Variants of Triplet networks are robust entities for learning a discriminative embedding subspace. There exist different triplet mining approaches for selecting the most suitable training triplets. Some of these mining methods rely on the extreme distances between instances, and some others make use of sampling. However, sampling from stochastic di...
This is a tutorial and survey paper on factor analysis, probabilistic Principal Component Analysis (PCA), variational inference, and Variational Autoencoder (VAE). These methods, which are tightly related, are dimensionality reduction and generative models. They assume that every data point is generated from or caused by a low-dimensional latent f...
This is a tutorial and survey paper on the attention mechanism, transformers, BERT, and GPT. We first explain attention mechanism, sequence-to-sequence model without and with attention, self-attention, and attention in different areas such as natural language processing and computer vision. Then, we explain transformers which do not use any recurre...
Vehicle platooning has been shown to be quite fruitful in the transportation industry, enhancing fuel economy, road throughput, and driving comfort. Model Predictive Control (MPC) is widely used in the literature for platoon control to achieve certain objectives, such as safely reducing the distance among consecutive vehicles while following the leader...
We analyze the effect of offline and online triplet mining for colorectal cancer (CRC) histopathology dataset containing 100,000 patches. We consider the extreme, i.e., farthest and nearest patches to a given anchor, both in online and offline mining. While many works focus solely on selecting the triplets online (batch-wise), we also study the eff...
This is a tutorial and survey paper for Locally Linear Embedding (LLE) and its variants. The idea of LLE is fitting the local structure of manifold in the embedding space. In this paper, we first cover LLE, kernel LLE, inverse LLE, and feature fusion with LLE. Then, we cover out-of-sample embedding using linear reconstruction, eigenfunctions, and k...
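The linear reconstruction step of LLE mentioned above has a closed form via a local Gram matrix; a minimal sketch (with an assumed regularization term for numerical stability) is:

```python
import numpy as np

def lle_weights(X, n_neighbors=5, reg=1e-3):
    """Linear reconstruction step of LLE: weights that reconstruct each point
    from its nearest neighbours and sum to one."""
    n = len(X)
    W = np.zeros((n, n))
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    for i in range(n):
        nn = np.argsort(sq[i])[1:n_neighbors + 1]       # nearest neighbours
        G = (X[nn] - X[i]) @ (X[nn] - X[i]).T           # local Gram matrix
        G += reg * np.trace(G) * np.eye(len(nn))        # regularize for stability
        w = np.linalg.solve(G, np.ones(len(nn)))
        W[i, nn] = w / w.sum()                          # enforce sum-to-one
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
print(np.allclose(lle_weights(X).sum(axis=1), 1.0))   # True
```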
This paper is a tutorial and literature review on sampling algorithms. There are two main types of sampling in statistics. The first is survey sampling, which draws samples from a set or population. The second is sampling from a probability distribution, where we have a probability density or mass function. In this paper, we cover both types of...
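As one concrete example of the second type, sampling from a probability distribution, here is a sketch of inverse-transform sampling for an exponential distribution; the rate parameter is an arbitrary choice.

```python
import numpy as np

def inverse_transform_sample(inv_cdf, n, rng):
    """Inverse-transform sampling: if U ~ Uniform(0, 1),
    then inv_cdf(U) follows the target distribution."""
    return inv_cdf(rng.uniform(size=n))

rng = np.random.default_rng(0)
# Exponential(rate=2): CDF F(x) = 1 - exp(-2x), so F^{-1}(u) = -ln(1 - u) / 2.
samples = inverse_transform_sample(lambda u: -np.log(1 - u) / 2.0, 100000, rng)
print(samples.mean())   # close to the true mean 1/rate = 0.5
```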
Metric learning is one of the techniques in manifold learning with the goal of finding a projection subspace for increasing and decreasing the inter- and intra-class variances, respectively. Some of the metric learning methods are based on triplet learning with anchor-positive-negative triplets. Large margin metric learning for nearest neighbor cla...
Stochastic Neighbor Embedding (SNE) is a manifold learning and dimensionality reduction method with a probabilistic approach. In SNE, every point is considered to be a neighbor of all other points with some probability, and the method tries to preserve these probabilities in the embedding space. SNE considers a Gaussian distribution for the probability in bo...
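For illustration, the input-space neighbour probabilities can be sketched as below; note that this uses a single fixed bandwidth, whereas SNE itself tunes a per-point bandwidth via a perplexity search.

```python
import numpy as np

def sne_probabilities(X, sigma=1.0):
    """Conditional neighbour probabilities p_{j|i} used by SNE
    (fixed bandwidth here, for simplicity)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    logits = -sq / (2.0 * sigma**2)
    np.fill_diagonal(logits, -np.inf)           # a point is not its own neighbour
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    return P / P.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
P = sne_probabilities(X)
print(np.round(P, 3), P.sum(axis=1))   # each row sums to one
```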
Multidimensional Scaling (MDS) is one of the first fundamental manifold learning methods. It can be categorized into several methods, i.e., classical MDS, kernel classical MDS, metric MDS, and non-metric MDS. Sammon mapping and Isomap can be considered as special cases of metric MDS and kernel classical MDS, respectively. In this tutorial and surve...
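A minimal sketch of classical MDS, double-centering a squared-distance matrix and taking the top eigenpairs, might look like this:

```python
import numpy as np

def classical_mds(D, k):
    """Classical MDS: embed points given a squared-distance matrix D (n x n)."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    K = -0.5 * H @ D @ H                     # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(K)
    idx = np.argsort(eigvals)[::-1][:k]      # top-k eigenpairs
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared Euclidean distances
Z = classical_mds(D, k=3)   # recovers X up to rotation and translation
print(Z.shape)
```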
We consider the problem of distributed multi-choice voting in a setting where each node can communicate with its neighbors merely by sending beep signals. Given its simplicity, the beep communication model is of practical importance in different applications such as systems biology and wireless sensor networks. Yet, the distributed majority voting ha...
Variants of Triplet networks are robust entities for learning a discriminative embedding subspace. There exist different triplet mining approaches for selecting the most suitable training triplets. Some of these mining methods rely on the extreme distances between instances, and some others make use of sampling. However, sampling from stochastic di...
We analyze the effect of offline and online triplet mining for colorectal cancer (CRC) histopathology dataset containing 100,000 patches. We consider the extreme, i.e., farthest and nearest patches with respect to a given anchor, both in online and offline mining. While many works focus solely on how to select the triplets online (batch-wise), we a...
As many algorithms depend on a suitable representation of data, learning unique features is considered a crucial task. Although supervised techniques using deep neural networks have boosted the performance of representation learning, the need for large sets of labeled data limits the application of such methods. As an example, high-quality deline...
Human action recognition is one of the important fields of computer vision and machine learning. Although various methods, some basic and some based on deep learning, have been proposed for 3D action recognition, there is still a need for basic methods based on the generalized eigenvalue problem. This need is especially felt
In this paper, we introduce the Hinduism religion and philosophy. We start with introducing the holy books in Hinduism including Vedas and Upanishads. Then, we explain the simplistic Hinduism, Brahman, gods and their incarnations, stories of apocalypse, karma, reincarnation, heavens and hells, vegetarianism, and sanctity of cows. Then, we switch to...
We propose a new embedding method, named Quantile-Quantile Embedding (QQE), for distribution transformation, manifold embedding, and image embedding with the ability to choose the embedding distribution. QQE, which uses the concept of quantile-quantile plot from visual statistical tests, can transform the distribution of data to any theoretical des...
We present a new method which generalizes subspace learning based on eigenvalue and generalized eigenvalue problems. This method, Roweis Discriminant Analysis (RDA) named after Sam Roweis, is a family of infinite number of algorithms where Principal Component Analysis (PCA), Supervised PCA (SPCA), and Fisher Discriminant Analysis (FDA) are special...
Fisher Discriminant Analysis (FDA) is a subspace learning method which minimizes the intra-class scatter and maximizes the inter-class scatter of the data. Although FDA treats all pairs of classes the same way, some classes are closer to each other than others. Weighted FDA assigns weights to the pairs of classes to address this shortcoming of FD...
Generative models and inferential autoencoders mostly make use of \(\ell _2\) norm in their optimization objectives. In order to generate perceptually better images, this short paper theoretically discusses how to use Structural Similarity Index (SSIM) in generative models and inferential autoencoders. We first review SSIM, SSIM distance metrics, a...
After the tremendous development of neural networks trained by backpropagation, it is a good time to develop other algorithms for training neural networks to gain more insights into these networks. In this paper, we propose a new algorithm for training feedforward neural networks which is notably faster than backpropagation. This method is based on projec...
Human action recognition has been one of the most active fields of research in computer vision in recent years. Two-dimensional action recognition methods face serious challenges, such as occlusion and the missing third dimension of the data. The development of depth sensors has made it feasible to track the positions of human body joints over time. This p...
As many algorithms depend on a suitable representation of data, learning unique features is considered a crucial task. Although supervised techniques using deep neural networks have boosted the performance of representation learning, the need for a large set of labeled data limits the application of such methods. As an example, high-quality delinea...
We propose a novel approach to anomaly detection called Curvature Anomaly Detection (CAD) and Kernel CAD based on the idea of polyhedron curvature. Using the nearest neighbors for a point, we consider every data point as the vertex of a polyhedron where the more anomalous point has more curvature. We also propose inverse CAD (iCAD) and Kernel iCAD...
Siamese neural network is a very powerful architecture for both feature extraction and metric learning. It usually consists of several networks that share weights. The Siamese concept is topology-agnostic and can use any neural network as its backbone. The two most popular loss functions for training these networks are the triplet and contrastive l...
After the tremendous development of neural networks trained by backpropagation, it is a good time to develop other algorithms for training neural networks to gain more insights into these networks. In this paper, we propose a new algorithm for training feedforward neural networks which is notably faster than backpropagation. This method is based on projec...
Fisher Discriminant Analysis (FDA) is a subspace learning method which minimizes the intra-class scatter and maximizes the inter-class scatter of the data. Although FDA treats all pairs of classes the same way, some classes are closer to each other than others. Weighted FDA assigns weights to the pairs of classes to address this shortcoming of FD...
Generative models and inferential autoencoders mostly make use of $\ell_2$ norm in their optimization objectives. In order to generate perceptually better images, this short paper theoretically discusses how to use Structural Similarity Index (SSIM) in generative models and inferential autoencoders. We first review SSIM, SSIM distance metrics, and...
Vehicle platooning has been shown to be quite fruitful in the transportation industry, enhancing fuel economy, road throughput, and driving comfort. Model Predictive Control (MPC) is widely used in the literature for platoon control to achieve certain objectives, such as safely reducing the distance among consecutive vehicles while following the leader...
We propose a new method, named isolation Mondrian forest (iMondrian forest), for batch and online anomaly detection. The proposed method is a novel hybrid of isolation forest and Mondrian forest which are existing methods for batch anomaly detection and online random forest, respectively. iMondrian forest takes the idea of isolation, using the dept...
This is a tutorial paper for Hidden Markov Model (HMM). First, we briefly review the background on Expectation Maximization (EM), Lagrange multiplier, factor graph, the sum-product algorithm, the max-product algorithm, and belief propagation by the forward-backward procedure. Then, we introduce probabilistic graphical models including Markov rando...
In this paper, we propose two distributed algorithms, named Distributed Voting with Beeps (DVB), for multi-choice voting in the beep model, where each node can communicate with its neighbors merely by sending beep signals. The two proposed algorithms have simple structures and can be employed in networks having severe constraints on the size of messages...