# Mark Crowley

University of Waterloo · Department of Electrical & Computer Engineering

Ph.D. Computer Science

## About

- Publications: 159
- Reads: 79,428
- Citations: 1,326

Introduction

My research area can be broadly described as Artificial Intelligence, with a particular focus on decision making under uncertainty, probabilistic graphical models, probabilistic inference, machine learning, and reinforcement learning. In particular, I study decision making under uncertainty in massive spatiotemporal domains such as ecological planning.

Additional affiliations:

- January 2015 - present
- January 2012 - August 2014
- September 2005 - October 2011

## Publications

This is a tutorial paper for Hidden Markov Model (HMM). First, we briefly review the background on Expectation Maximization (EM), Lagrange multiplier, factor graph, the sum-product algorithm, the max-product algorithm, and belief propagation by the forward-backward procedure. Then, we introduce probabilistic graphical models including Markov random...

This paper provides a simulated laboratory for making use of reinforcement learning (RL) for material design, synthesis, and discovery. Since RL is fairly data intensive, training agents ‘on-the-fly’ by taking actions in the real world is infeasible and possibly dangerous. Moreover, chemical processing and discovery involves challenges which are no...

In the last decade, there have been significant advances in multi-agent reinforcement learning (MARL), but there are still numerous challenges, such as high sample complexity and slow convergence to stable policies, that need to be overcome before widespread deployment is possible. However, many real-world environments already, in practice, deploy...

Reinforcement learning (RL) has been shown to learn sophisticated control policies for complex tasks including games, robotics, heating and cooling systems and text generation. The action-perception cycle in RL, however, generally assumes that a measurement of the state of the environment is available at each time step without a cost. In applicatio...

This paper provides a simulated laboratory for making use of Reinforcement Learning (RL) for chemical discovery. Since RL is fairly data intensive, training agents ‘on-the-fly’ by taking actions in the real world is infeasible and possibly dangerous. Moreover, chemical processing and discovery involves challenges which are not commonly found in RL...

Conventional supervised learning methods typically assume i.i.d. samples and are found to be sensitive to out-of-distribution (OOD) data. We propose Generative Causal Representation Learning (GCRL), which leverages causality to facilitate knowledge transfer under distribution shifts. While we evaluate the effectiveness of our proposed method in human...

Principal Component Analysis (PCA) (Jolliffe, Principal component analysis. Springer, 2011) is a very well-known and fundamental linear method for subspace learning and dimensionality reduction (Friedman et al., The elements of statistical learning. vol. 2. Springer series in statistics, 2009). This method, which is also used for feature extraction...
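The eigendecomposition view of PCA summarized above can be sketched in a few lines. This is a minimal NumPy illustration under standard assumptions (center the data, eigendecompose the sample covariance, keep the top-k eigenvectors); the `pca` helper and toy data are illustrative, not the book's implementation:

```python
import numpy as np

def pca(X, k):
    """Project rows of X onto the top-k principal components.

    X: (n, d) data matrix; k: target dimensionality.
    """
    Xc = X - X.mean(axis=0)          # center the data
    C = Xc.T @ Xc / (len(X) - 1)     # sample covariance matrix (d, d)
    vals, vecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
    W = vecs[:, ::-1][:, :k]         # top-k eigenvectors as columns
    return Xc @ W                    # (n, k) low-dimensional embedding

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z = pca(X, 2)
```

Because the columns of the projection matrix are eigenvectors of the covariance, the embedded coordinates are uncorrelated with each other.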

Probabilistic methods are a category of dimensionality reduction. Among the probabilistic methods, there are neighbour embedding algorithms where the probabilities of neighbourhoods are used. In these algorithms, attractive and repulsive forces are utilized for neighbour and non-neighbour points, respectively.

Linear dimensionality reduction methods project data onto the low-dimensional column space of a projection matrix. Many of these methods, such as Principal Component Analysis (PCA) (see Chap. 5) and Fisher Discriminant Analysis (FDA) (see Chap. 6), learn a projection matrix for either better representation of data or discrimination between the clas...

Multidimensional Scaling (MDS), first proposed by Torgerson, is one of the earliest dimensionality reduction methods.
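As a hedged sketch of classical (Torgerson) MDS: double-center the squared-distance matrix to recover a Gram matrix, then take its top eigenvectors. The helper name and toy data are illustrative:

```python
import numpy as np

def classical_mds(D, k):
    """Embed n points in R^k from an (n, n) pairwise-distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # Gram matrix of centered points
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]      # indices of the k largest eigenvalues
    scales = np.sqrt(np.maximum(vals[idx], 0))
    return vecs[:, idx] * scales          # (n, k) coordinates

# recover a 2-D configuration from its own Euclidean distance matrix
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))
D = np.linalg.norm(X[:, None] - X[None], axis=-1)
Y = classical_mds(D, 2)
```

For exact Euclidean input distances, classical MDS recovers the configuration up to rotation and translation, so the pairwise distances of `Y` match `D`.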

Learning models can be divided into discriminative and generative models. Discriminative models discriminate the classes of data for better separation of classes while the generative models learn a latent space that generates the data points. This chapter introduces generative models.

It is not wrong to say that almost all machine learning algorithms, including the dimensionality reduction methods, reduce to optimization. Many of the optimization methods can be explained in terms of the Karush-Kuhn-Tucker (KKT) conditions (Kjeldsen, Historia Math 27:331–361, 2000), proposed in Karush (Minima of functions of several variables wit...
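As a compact reference for the KKT conditions mentioned above (standard notation, for a problem $\min_x f(x)$ subject to $g_i(x) \le 0$ and $h_j(x) = 0$; not a reproduction of the paper's own statement):

```latex
\begin{aligned}
&\text{Stationarity:} && \nabla f(x^*) + \sum_i \mu_i \nabla g_i(x^*) + \sum_j \lambda_j \nabla h_j(x^*) = 0 \\
&\text{Primal feasibility:} && g_i(x^*) \le 0, \quad h_j(x^*) = 0 \\
&\text{Dual feasibility:} && \mu_i \ge 0 \\
&\text{Complementary slackness:} && \mu_i \, g_i(x^*) = 0
\end{aligned}
```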

It was mentioned in Chap. 11 that metric learning can be divided into three types of learning—spectral, probabilistic and deep metric learning.

It was mentioned in Chap. 11 that metric learning can be divided into spectral, probabilistic, and deep metric learning. Chapters 11 and 13 explained that both spectral and probabilistic metric learning methods use the generalized Mahalanobis distance, i.e., Eq. (11.53) in Chap. 11, and learn the weight matrix in the metric. Deep metric learning, h...

Locally Linear Embedding (LLE) is a nonlinear spectral dimensionality reduction method that can be used for manifold embedding and feature extraction.

In functional analysis—a field of mathematics—there are various spaces of either data points or functions. For example, the Euclidean space is a subset of the Hilbert space, while the Hilbert space itself is a subset of the Banach space. The Hilbert space is a space of functions and its dimensionality is often considered to be high.

Suppose there is a dataset that has labels, either for regression or classification. Sufficient Dimension Reduction (SDR), first proposed by Li, is a family of methods that find a transformation of the data to a lower dimensional space which does not change the conditional distribution of the labels given the data.

Spectral dimensionality reduction methods deal with the graph and geometry of data and usually reduce to an eigenvalue or generalized eigenvalue problem (see Chap. 2).

Various spectral methods have been proposed over the past few decades. Some of the most well-known spectral methods include Principal Component Analysis (PCA), Multidimensional Scaling (MDS), Isomap, spectral clustering, Laplacian eigenmap, diffusion map, and Locally Linear Embedding (LLE).

Stochastic Neighbour Embedding (SNE) is a manifold learning and dimensionality reduction method that can be used for feature extraction and data visualization. It takes a probabilistic approach to fit the data in the embedding space locally, hoping to preserve the global structure of the data.
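The neighbourhood probabilities at the heart of SNE can be sketched as a softmax over negative squared distances. This simplified NumPy illustration assumes a single fixed bandwidth `sigma` (actual SNE tunes a per-point bandwidth via perplexity); the helper name is hypothetical:

```python
import numpy as np

def sne_probabilities(X, sigma=1.0):
    """Conditional neighbourhood probabilities p(j|i) used in SNE."""
    sq = np.sum((X[:, None] - X[None]) ** 2, axis=-1)  # pairwise squared distances
    logits = -sq / (2 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)                  # a point is not its own neighbour
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)            # each row is a distribution

rng = np.random.default_rng(4)
X = rng.normal(size=(6, 3))
P = sne_probabilities(X)
```

Each row of `P` sums to one, and the diagonal is zero since self-neighbourhood is excluded.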

Fisher Discriminant Analysis (FDA) attempts to find a subspace that separates the classes as much as possible, while the data also become as spread as possible.
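A minimal two-class sketch of the Fisher criterion described above: maximize between-class scatter over within-class scatter, which reduces to the dominant eigenvector of $S_W^{-1} S_B$. The toy data are illustrative, not from the source:

```python
import numpy as np

rng = np.random.default_rng(2)
X0 = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
X1 = rng.normal(loc=[3, 1], scale=0.5, size=(50, 2))
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)

# within-class and between-class scatter matrices
S_W = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
S_B = np.outer(m1 - m0, m1 - m0)

# Fisher direction: dominant eigenvector of S_W^{-1} S_B
vals, vecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
w = np.real(vecs[:, np.argmax(np.real(vals))])

# 1-D projections of the two classes onto the Fisher direction
z0, z1 = X0 @ w, X1 @ w
```

Projected onto `w`, the class means are far apart relative to the within-class spread, which is exactly the criterion FDA optimizes.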

Multi-agent reinforcement learning typically suffers from the problem of sample inefficiency, where learning suitable policies involves the use of many data samples. Learning from external demonstrators is a possible solution that mitigates this problem. However, most prior approaches in this area assume the presence of a single demonstrator. Lever...

The novel method proposed in this paper comprises two Convolutional Neural Networks (CNNs) working in parallel to simultaneously classify driver behaviors and maneuvers using time-series data. We claim that the Parallel Convolutional Neural Network (PCNN) not only speeds up training time but also increases perf...

Independent reinforcement learning algorithms have no theoretical guarantees for finding the best policy in multi-agent settings. However, in practice, prior works have reported good performance with independent algorithms in some domains and bad performance in others. Moreover, a comprehensive study of the strengths and weaknesses of independent a...

Emotions can provide a natural communication modality to complement the existing multi-modal capabilities of social robots, such as text and speech, in many domains. We conducted three online studies with 112, 223, and 151 participants to investigate the benefits of using emotions as a communication modality for Search And Rescue (SAR) robots. In t...

A family of dimensionality reduction methods known as metric learning learns a distance metric in an embedding space to separate dissimilar points and bring together similar points. In supervised metric learning, the aim is to discriminate classes by learning an appropriate metric.
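The generalized Mahalanobis distance that metric learning methods optimize, $d(x, y) = \sqrt{(x-y)^\top W (x-y)}$ with a positive semidefinite weight matrix $W$, can be sketched directly. The helper name is hypothetical; here $W$ is fixed by hand rather than learned:

```python
import numpy as np

def mahalanobis(x, y, W):
    """Generalized Mahalanobis distance with a PSD weight matrix W."""
    d = x - y
    return float(np.sqrt(d @ W @ d))

x = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])

W_id = np.eye(2)                 # W = I recovers the Euclidean distance
d_euc = mahalanobis(x, y, W_id)  # sqrt((-2)^2 + 1^2) = sqrt(5)

W_an = np.diag([4.0, 1.0])       # weight the first axis more heavily
d_an = mahalanobis(x, y, W_an)   # sqrt(4*4 + 1) = sqrt(17)
```

Metric learning replaces the hand-picked `W` with a matrix learned so that similar points end up close and dissimilar points far apart under this distance.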

Chapter 12 explained that learning models can be divided into discriminative and generative models. The Variational Autoencoder (VAE), introduced in this chapter, is a generative model. Variational inference is a technique that finds a lower bound on the log-likelihood of the data and maximizes the lower bound rather than the log-likelihood in the...
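As a compact reference for the lower bound mentioned above (standard ELBO notation, not a reproduction of the chapter's derivation), variational inference maximizes

```latex
\log p(x) \;\ge\; \underbrace{\mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big]}_{\text{reconstruction}} \;-\; \underbrace{\mathrm{KL}\big(q(z \mid x) \,\|\, p(z)\big)}_{\text{regularization}}
```

where $q(z \mid x)$ is the approximate posterior (the VAE's encoder) and $p(x \mid z)$ is the likelihood (the decoder).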

Suppose there is a generative model that takes random noise as input and generates a data point. The aim is for the generated data point to be of good quality; therefore, there is a need to judge its quality. One way to judge it is to observe the generated sample and assess its quality visually. In this case, the judge is a human. However, it i...

Over a century ago, the Boltzmann distribution, also called the Gibbs distribution, was proposed. This energy-based distribution was found to be useful for statistically modelling physical systems. One of these systems was the Ising model, which modelled interacting particles with binary spins. Later, it was discovered that the Ising model could be a ne...
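The Boltzmann distribution over Ising configurations, $p(s) \propto e^{-E(s)}$, can be computed exhaustively for a tiny system. This toy two-spin example (coupling `J` and energies chosen for illustration, not from the source) shows that aligned spins, which have lower energy, are the most probable:

```python
import numpy as np
from itertools import product

# Two-spin Ising model with coupling J: E(s) = -J * s1 * s2, spins in {-1, +1}
J = 1.0
states = list(product([-1, 1], repeat=2))   # (-1,-1), (-1,+1), (+1,-1), (+1,+1)
energies = np.array([-J * s1 * s2 for s1, s2 in states])

# Boltzmann distribution: p(s) = exp(-E(s)) / Z
p = np.exp(-energies)
p /= p.sum()                                # divide by the partition function Z
```

The two aligned configurations, indices 0 and 3, share the lowest energy and hence the highest (and equal) probability.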

Multi-agent reinforcement learning algorithms have not been widely adopted in large-scale environments with many agents, as they often scale poorly with the number of agents. Using mean field theory to aggregate agents has been proposed as a solution to this problem. However, almost all previous methods in this area make a strong assumption of a cent...

Locally Linear Embedding (LLE) is a nonlinear spectral dimensionality reduction and manifold learning method. It has two main steps which are linear reconstruction and linear embedding of points in the input space and embedding space, respectively. In this work, we look at the linear reconstruction step from a stochastic perspective where it is ass...

Consider a set of $n$ data points in the Euclidean space $\mathbb{R}^d$. This set is called a dataset in machine learning and data science. The manifold hypothesis states that the dataset lies on a low-dimensional submanifold with high probability. All dimensionality reduction and manifold learning methods rely on the manifold hypothesis. In t...

This is a tutorial and survey paper on metric learning. Algorithms are divided into spectral, probabilistic, and deep metric learning. We first start with the definition of distance metric, Mahalanobis distance, and generalized Mahalanobis distance. In spectral methods, we start with methods using scatters of data, including the first spectral met...

Emotions can provide a natural communication modality to complement the existing multi-modal capabilities of social robots, such as text and speech, in many domains. We conducted three online studies with 112, 223, and 151 participants, respectively, to investigate the benefits of using emotions as a communication modality for Search And Rescue (SA...

Using a novel toy nautical navigation environment, we show that dynamic programming can be used when only incomplete information about a partially observed Markov decision process (POMDP) is known. By incorporating uncertainty into our model, we show that navigation policies can be constructed that maintain safety, outperforming the baseline perfor...

The use of reinforcement learning (RL) in scientific applications, such as materials design and automated chemistry, is increasing. A major challenge, however, lies in the fact that measuring the state of the system is often costly and time consuming in scientific applications, whereas policy learning with RL requires a measurement after each time step...

This is a tutorial and survey paper on Generative Adversarial Network (GAN), adversarial autoencoders, and their variants. We start with explaining adversarial learning and the vanilla GAN. Then, we explain the conditional GAN and DCGAN. The mode collapse problem is introduced and various methods, including minibatch GAN, unrolled GAN, BourGAN, mix...

A pathology report is one of the most significant medical documents providing interpretive insights into the visual appearance of the patient's biopsy sample. In digital pathology, high-resolution images of tissue samples are stored along with pathology reports. Despite the valuable information that pathology reports hold, they are not used in any...

This is a tutorial and survey paper on various methods for Sufficient Dimension Reduction (SDR). We cover these methods with both statistical high-dimensional regression perspective and machine learning approach for dimensionality reduction. We start with introducing inverse regression methods including Sliced Inverse Regression (SIR), Sliced Avera...

This is a tutorial and survey paper on Karush-Kuhn-Tucker (KKT) conditions, first-order and second-order numerical optimization, and distributed optimization. After a brief review of history of optimization, we start with some preliminaries on properties of sets, norms, functions, and concepts of optimization. Then, we introduce the optimization pr...

The capability of showing affective expressions is important for the design of social robots in many contexts, where the robot is designed to communicate with humans. It is reasonable to expect that, similar to all other interaction modalities, communicating with affective expressions is not without limitations. In this paper, we present two online...

This work concentrates on optimization on Riemannian manifolds. The Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm is a commonly used quasi-Newton method for numerical optimization in Euclidean spaces. Riemannian LBFGS (RLBFGS) is an extension of this method to Riemannian manifolds. RLBFGS involves computationally expensive vecto...

Uniform Manifold Approximation and Projection (UMAP) is one of the state-of-the-art methods for dimensionality reduction and data visualization. This is a tutorial and survey paper on UMAP and its variants. We start with UMAP algorithm where we explain probabilities of neighborhood in the input and embedding spaces, optimization of cost function, t...

This is a tutorial and survey paper on the Johnson-Lindenstrauss (JL) lemma and linear and nonlinear random projections. We start with linear random projection and then justify its correctness by JL lemma and its proof. Then, sparse random projections with $\ell_1$ norm and interpolation norm are introduced. Two main applications of random projecti...
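A hedged sketch of the linear random projection that the JL lemma justifies: project with a Gaussian matrix whose entries have variance $1/k$, and pairwise distances are approximately preserved when $k = O(\log n / \epsilon^2)$. The dimensions and seed below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 50, 1000, 300
X = rng.normal(size=(n, d))

# Gaussian random projection matrix with entries N(0, 1/k)
R = rng.normal(size=(d, k)) / np.sqrt(k)
Z = X @ R

# compare one pairwise distance before and after projection
d_orig = np.linalg.norm(X[0] - X[1])
d_proj = np.linalg.norm(Z[0] - Z[1])
ratio = d_proj / d_orig   # concentrates around 1 as k grows
```

No training is involved: the projection is data-independent, which is what makes the JL guarantee remarkable.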

This is a tutorial and survey paper on Boltzmann Machine (BM), Restricted Boltzmann Machine (RBM), and Deep Belief Network (DBN). We start with the required background on probabilistic graphical models, Markov random field, Gibbs sampling, statistical physics, Ising model, and the Hopfield network. Then, we introduce the structures of BM and RBM. T...

Data often have nonlinear patterns in machine learning. One can unfold the nonlinear manifold of a dataset for low-dimensional visualization and feature extraction. Locally Linear Embedding (LLE) is a nonlinear spectral method for dimensionality reduction and manifold unfolding. It embeds data using the same linear reconstruction weights as in the...

This is a tutorial and survey paper on unification of spectral dimensionality reduction methods, kernel learning by Semidefinite Programming (SDP), Maximum Variance Unfolding (MVU) or Semidefinite Embedding (SDE), and its variants. We first explain how the spectral dimensionality reduction methods can be unified as kernel Principal Component Analys...

This is a tutorial and survey paper on kernels, kernel methods, and related fields. We start with reviewing the history of kernels in functional analysis and machine learning. Then, Mercer kernel, Hilbert and Banach spaces, Reproducing Kernel Hilbert Space (RKHS), Mercer's theorem and its proof, frequently used kernels, kernel construction from dis...

This is a tutorial and survey paper for nonlinear dimensionality and feature extraction methods which are based on the Laplacian of graph of data. We first introduce adjacency matrix, definition of Laplacian matrix, and the interpretation of Laplacian. Then, we cover the cuts of graph and spectral clustering which applies clustering in a subspace o...
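The adjacency and Laplacian matrices introduced above can be illustrated on a tiny graph. A sketch for an unweighted path graph 0-1-2-3, using the unnormalized Laplacian $L = D - A$ (graph chosen for illustration):

```python
import numpy as np

# adjacency matrix of the path graph 0-1-2-3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))   # degree matrix
L = D - A                    # unnormalized graph Laplacian

vals, vecs = np.linalg.eigh(L)
# for a connected graph, the smallest eigenvalue is 0 with the
# constant vector as its eigenvector; the second-smallest is positive
```

Spectral methods such as Laplacian eigenmaps and spectral clustering embed the nodes using the eigenvectors attached to the smallest nonzero eigenvalues of `L`.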

We propose a new embedding method, named Quantile–Quantile Embedding (QQE), for distribution transformation and manifold embedding with the ability to choose the embedding distribution. QQE, which uses the concept of quantile–quantile plot from visual statistical tests, can transform the distribution of data to any theoretical desired distribution...

Histopathology image embedding is an active research area in computer vision. Most of the embedding models exclusively concentrate on a specific magnification level. However, a useful task in histopathology embedding is to train an embedding space regardless of the magnification level. Two main approaches for tackling this goal are domain adaptatio...

Locally Linear Embedding (LLE) is a nonlinear spectral dimensionality reduction and manifold learning method. It has two main steps which are linear reconstruction and linear embedding of points in the input space and embedding space, respectively. In this work, we propose two novel generative versions of LLE, named Generative LLE (GLLE), whose lin...

YouTube presentations of slides:

- https://www.youtube.com/watch?v=bqtNjwI60dQ
- https://www.youtube.com/watch?v=S8lGRykashc
- https://www.youtube.com/watch?v=XuFBoRp4ZAM
- https://www.youtube.com/watch?v=iYwW9XdWPZw
- https://www.youtube.com/watch?v=LscooUyktz4
- https://www.youtube.com/watch?v=4Zk2UFGIrd0
- https://www.youtube.com/watch?v=3rXYXZiH-pA

Metric learning is a technique in manifold learning to find a projection subspace for increasing and decreasing the inter- and intra-class variances, respectively. Some metric learning methods are based on triplet learning with anchor-positive-negative triplets. Large margin metric learning for nearest neighbor classification is one of the fundamen...

Variants of Triplet networks are robust entities for learning a discriminative embedding subspace. There exist different triplet mining approaches for selecting the most suitable training triplets. Some of these mining methods rely on the extreme distances between instances, and some others make use of sampling. However, sampling from stochastic di...

This is a tutorial and survey paper on factor analysis, probabilistic Principal Component Analysis (PCA), variational inference, and Variational Autoencoder (VAE). These methods, which are tightly related, are dimensionality reduction and generative models. They assume that every data point is generated from or caused by a low-dimensional latent f...

Traditional multi-agent reinforcement learning algorithms are not scalable to environments with more than a few agents, since these algorithms are exponential in the number of agents. Recent research has introduced successful methods to scale multi-agent reinforcement learning algorithms to many agent scenarios using mean field theory. Previous wor...

Vehicle platooning has been shown to be quite fruitful in the transportation industry for enhancing fuel economy, road throughput, and driving comfort. Model Predictive Control (MPC) is widely used in the literature for platoon control to achieve certain objectives, such as safely reducing the distance among consecutive vehicles while following the leader...

We analyze the effect of offline and online triplet mining for colorectal cancer (CRC) histopathology dataset containing 100,000 patches. We consider the extreme, i.e., farthest and nearest patches to a given anchor, both in online and offline mining. While many works focus solely on selecting the triplets online (batch-wise), we also study the eff...

This is a tutorial and survey paper for Locally Linear Embedding (LLE) and its variants. The idea of LLE is fitting the local structure of manifold in the embedding space. In this paper, we first cover LLE, kernel LLE, inverse LLE, and feature fusion with LLE. Then, we cover out-of-sample embedding using linear reconstruction, eigenfunctions, and k...
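The linear reconstruction step of LLE described above has a closed form per point: minimize $\|x - \sum_j w_j x_j\|^2$ subject to $\sum_j w_j = 1$, giving $w \propto C^{-1}\mathbf{1}$ for the local Gram matrix $C$. A hedged NumPy sketch (helper name, regularization constant, and toy neighbourhood are illustrative):

```python
import numpy as np

def lle_weights(x, neighbors, reg=1e-3):
    """Linear reconstruction weights for one point in LLE.

    Minimizes ||x - sum_j w_j * neighbors[j]||^2 subject to sum_j w_j = 1.
    """
    Z = neighbors - x                           # shift neighbours so x is the origin
    C = Z @ Z.T                                 # local Gram matrix (k, k)
    C = C + reg * np.trace(C) * np.eye(len(C))  # regularize for numerical stability
    w = np.linalg.solve(C, np.ones(len(C)))     # w proportional to C^{-1} 1
    return w / w.sum()                          # enforce the sum-to-one constraint

x = np.array([0.0, 0.0])
nbrs = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
w = lle_weights(x, nbrs)
x_hat = w @ nbrs   # reconstruction of x from its neighbours
```

The second step of LLE then reuses these same weights to place the points in the low-dimensional embedding space.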