## About

- 61 publications
- 11,810 reads


- 2,324 citations


Additional affiliations

June 2015 - present

## Publications

Publications (61)

We study the problem of agent selection in causal strategic learning under multiple decision makers and address two key challenges that come with it. Firstly, while much prior work studies a fixed pool of agents that remains static regardless of their evaluations, we consider the impact of the selection procedure by which agents are not...

We present a novel approach for explaining Gaussian processes (GPs) that can utilize the full analytical covariance structure present in GPs. Our method is based on the popular solution concept of Shapley values extended to stochastic cooperative games, resulting in explanations that are random variables. The GP explanations generated using our app...
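The method above extends Shapley values to stochastic cooperative games. As background only (this is not the paper's GP-specific method), here is a minimal sketch of classical Shapley values for a deterministic game, computed by brute-force enumeration of coalitions, which is exponential in the number of players. The function name `shapley_values` and the toy payoff are hypothetical illustrations.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values of a deterministic cooperative game.

    players: list of player labels
    value:   function mapping a frozenset of players (a coalition) to a payoff
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S that exclude i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of player i to coalition S
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

# Hypothetical toy game: a coalition's payoff is the square of its size.
players = ["a", "b", "c"]
phi = shapley_values(players, lambda s: len(s) ** 2)
print(phi)  # symmetric players, so each value is approximately 9 / 3 = 3
```

By the efficiency property, the values sum to the payoff of the grand coalition minus that of the empty coalition.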

Causality is a central concept in a wide range of research areas, yet there is still no universally agreed axiomatisation of causality. We view causality both as an extension of probability theory and as a study of *what happens when one intervenes on a system*, and argue in favour of taking Kolmogorov's measure-theoretic axiomatisation of p...

We investigate a simple objective for nonlinear instrumental variable (IV) regression based on a kernelized conditional moment restriction known as a maximum moment restriction (MMR). The MMR objective is formulated by maximizing the interaction between the residual and the instruments belonging to a unit ball in a reproducing kernel Hilbert space....
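When the test functions of the instruments range over the unit ball of an RKHS with kernel $k$, the squared maximal interaction has the closed form $\mathbb{E}[(Y - f(X))(Y' - f(X'))\,k(Z, Z')]$, where $(X', Y', Z')$ is an independent copy of $(X, Y, Z)$. The sketch below estimates this risk with a V-statistic for a fixed candidate $f$; the Gaussian kernel, its bandwidth, the names `gaussian_kernel` and `mmr_risk`, and the simulated data are all assumptions for illustration, and the minimization over $f$ is omitted.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmr_risk(residuals, Z, bandwidth=1.0):
    """V-statistic estimate of E[(Y - f(X)) (Y' - f(X')) k(Z, Z')].

    residuals: array of shape (n,) holding y_i - f(x_i)
    Z:         instruments, array of shape (n, d)
    """
    r = np.asarray(residuals, dtype=float)
    K = gaussian_kernel(Z, Z, bandwidth)
    return float(r @ K @ r) / len(r) ** 2

# Hypothetical simulated IV setting: U confounds X and Y, Z is a valid instrument.
rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 1))
U = rng.normal(size=n)
X = Z[:, 0] + U
Y = 2.0 * X + U

print(mmr_risk(Y - 2.0 * X, Z))  # residual U is independent of Z: risk near 0
print(mmr_risk(Y - 0.0 * X, Z))  # residual depends on Z: noticeably larger risk
```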

Explainability has become a central requirement for the development, deployment, and adoption of machine learning (ML) models, and we have yet to understand what explanation methods can and cannot do. Several factors such as data, model prediction, hyperparameters used in training the model, and random initialization can all influence downstream expl...

We propose a method to learn predictors that are invariant under counterfactual changes of certain covariates. This method is useful when the prediction target is causally influenced by covariates that should not affect the predictor output. For instance, an object recognition model may be influenced by position, orientation, or scale of the object...

Important problems in causal inference, economics, and, more generally, robust machine learning can be expressed as conditional moment restrictions, but estimation becomes challenging as it requires solving a continuum of unconditional moment restrictions. Previous work has addressed this problem by extending the generalized method of moments (GMM) to...

Two-sample tests are important in statistics and machine learning, both as tools for scientific discovery as well as to detect distribution shifts. This led to the development of many sophisticated test procedures going beyond the standard supervised learning frameworks, whose usage can require specialized knowledge about two-sample testing. We use...

Democratization of AI involves training and deploying machine learning models across heterogeneous and potentially massive environments. Diversity of data opens up a number of possibilities to advance AI systems, but also introduces pressing concerns such as privacy, security, and equity that require special attention. This work shows that it is th...

This paper provides some first steps in developing empirical process theory for functions taking values in a vector space. Our main results provide bounds on the entropy of classes of smooth functions taking values in a Hilbert space, by leveraging theory from differential calculus of vector-valued functions and fractal dimension theory of metric s...

In this note, I summarize my research across the "prediction-causation-regulation" spectrum in machine learning and illustrate my vision for future research. Recent breakthroughs in algorithmic predictions have not only led to widespread use of predictive models in critical domains, but also expedited scientific discoveries. Nevertheless, important...

Kernel maximum moment restriction (KMMR) has recently emerged as a popular framework for instrumental variable (IV) based conditional moment restriction (CMR) models with important applications in conditional moment (CM) testing and parameter estimation for IV regression and proximal causal learning. The effectiveness of this framework, however, depend...

We address the problem of causal effect estimation in the presence of unobserved confounding, but where proxies for the latent confounder(s) are observed. We propose two kernel-based methods for nonlinear causal effect estimation in this setting: (a) a two-stage regression approach, and (b) a maximum moment restriction approach. We focus on the pro...

We propose to analyse the conditional distributional treatment effect (CoDiTE), which, in contrast to the more common conditional average treatment effect (CATE), is designed to encode a treatment's distributional aspects beyond the mean. We first introduce a formal definition of the CoDiTE associated with a distance function between probability me...

We propose data-dependent test statistics based on a one-dimensional witness function, which we call witness two-sample tests (WiTS tests). We first optimize the witness function by maximizing an asymptotic test-power objective and then use as the test statistic the difference in means of the witness evaluated on two held-out test samples. When the...
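A minimal sketch of the split-then-test structure described above. One swap is made deliberately: instead of optimizing the witness for asymptotic test power as the paper does, it uses the plain MMD witness (the difference of empirical kernel mean embeddings) fitted on the training split. The Gaussian kernel, bandwidth, 50/50 split, Welch-style z-score, and the function names are assumptions.

```python
import numpy as np

def rbf(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between rows of A (m, d) and rows of B (n, d)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * bandwidth**2))

def witness_two_sample(X, Y, train_frac=0.5, bandwidth=1.0, seed=0):
    """Return (difference in mean witness values, approximate z-score)."""
    rng = np.random.default_rng(seed)
    X, Y = X[rng.permutation(len(X))], Y[rng.permutation(len(Y))]
    nx, ny = int(train_frac * len(X)), int(train_frac * len(Y))
    X_tr, X_te, Y_tr, Y_te = X[:nx], X[nx:], Y[:ny], Y[ny:]

    def witness(T):
        # Fitted on the training split only: difference of empirical kernel means
        return rbf(T, X_tr, bandwidth).mean(axis=1) - rbf(T, Y_tr, bandwidth).mean(axis=1)

    # Evaluate the now-fixed witness on the held-out split
    hx, hy = witness(X_te), witness(Y_te)
    stat = hx.mean() - hy.mean()
    se = np.sqrt(hx.var(ddof=1) / len(hx) + hy.var(ddof=1) / len(hy))
    return stat, stat / se

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))
Y = rng.normal(loc=1.0, size=(400, 2))
stat, z = witness_two_sample(X, Y)
print(stat, z)  # a large positive z suggests the distributions differ
```

Because the witness is fixed once the training split is used, the held-out statistic is a simple difference in means, so a normal approximation to its null distribution is reasonable for moderate sample sizes.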


We present some learning theory results on reproducing kernel Hilbert space (RKHS) regression, where the output space is an infinite-dimensional Hilbert space.

We propose a simple framework for nonlinear instrumental variable (IV) regression based on a kernelized conditional moment restriction (CMR) known as a maximum moment restriction (MMR). The MMR is formulated by maximizing the interaction between the residual and functions of IVs that belong to the unit ball of a reproducing kernel Hilbert space (RKHS)....

In recent years, substantial progress has been made on robotic grasping of household objects. Yet, human grasps are still difficult to synthesize realistically. There are several key reasons: (1) the human hand has many degrees of freedom (more than robotic manipulators); (2) the synthesized hand should conform naturally to the object surface; and...

Modern large-scale kernel-based tests such as maximum mean discrepancy (MMD) and kernelized Stein discrepancy (KSD) optimize kernel hyperparameters on a held-out sample via data splitting to obtain the most powerful test statistics. While data splitting results in a tractable null distribution, it suffers from a reduction in test power due to small...

We propose a new family of specification tests called kernel conditional moment (KCM) tests. Our tests are built on conditional moment embeddings (CMME), a novel representation of conditional moment restrictions in a reproducing kernel Hilbert space (RKHS). After transforming the conditional moment restrictions into a continuum of unconditional co...

We present a new operator-free, measure-theoretic definition of the conditional mean embedding as a random variable taking values in a reproducing kernel Hilbert space. While the kernel mean embedding of marginal distributions has been defined rigorously, the existing operator-based approach of the conditional version lacks a rigorous definition, a...
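The paper's contribution is the measure-theoretic definition itself. For orientation, a commonly used empirical estimator has the form of a kernel ridge regression: $\hat\mu_{Y\mid X=x} = \sum_i \beta_i(x)\, l(\cdot, y_i)$ with $\beta(x) = (K_X + n\lambda I)^{-1} k_X(x)$. The minimal sketch below computes these weights; the Gaussian kernel, `lam`, `bandwidth`, and the helper names are assumptions, and the test function $g(y) = y$ in the usage is only an illustration (it is not an element of a Gaussian RKHS).

```python
import numpy as np

def rbf(A, B, bandwidth):
    """Gaussian kernel matrix between rows of A (m, d) and rows of B (n, d)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * bandwidth**2))

def cme_weights(X, x_query, lam=1e-3, bandwidth=0.5):
    """Weights beta(x) such that mu_hat_{Y|X=x} = sum_i beta_i(x) l(., y_i).

    X:       conditioning inputs, shape (n, d)
    x_query: query points, shape (m, d)
    returns: array of shape (n, m), one weight vector per query point
    """
    n = X.shape[0]
    K = rbf(X, X, bandwidth)
    k_q = rbf(X, x_query, bandwidth)
    # Regularized linear solve instead of an explicit matrix inverse
    return np.linalg.solve(K + n * lam * np.eye(n), k_q)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(300, 1))
Y = X[:, 0] + 0.1 * rng.normal(size=300)
beta = cme_weights(X, np.array([[0.5]]))
# Inner product with g(Y) = Y approximates E[Y | X = 0.5], which is 0.5 here
est = float(Y @ beta[:, 0])
print(est)
```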

Transfer operators such as the Perron-Frobenius or Koopman operator play an important role in the global analysis of complex dynamical systems. The eigenfunctions of these operators can be used to detect metastable sets, to project the dynamics onto the dominant slow processes, or to separate superimposed signals. We extend transfer operator theory...

We propose a new family of specification tests called kernel conditional moment (KCM) tests. Our tests are built on a novel representation of conditional moment restrictions in a reproducing kernel Hilbert space (RKHS) called conditional moment embedding (CMME). After transforming the conditional moment restrictions into a continuum of unconditiona...

This work presents the concept of kernel mean embedding and kernel probabilistic programming in the context of stochastic systems. We propose formulations to represent, compare, and propagate uncertainties for fairly general stochastic dynamics in a distribution-free manner. The new tools enjoy sound theory rooted in functional analysis and wide ap...

The kernel mean embedding of probability distributions is commonly used in machine learning as an injective mapping from distributions to functions in an infinite-dimensional Hilbert space. It allows us, for example, to define a distance measure between probability distributions, called the maximum mean discrepancy. In this work, we propose to repr...
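The maximum mean discrepancy mentioned above is the RKHS distance between two mean embeddings, $\mathrm{MMD}^2(P, Q) = \mathbb{E}\,k(X, X') + \mathbb{E}\,k(Y, Y') - 2\,\mathbb{E}\,k(X, Y)$. A minimal sketch of the standard unbiased estimator, assuming a Gaussian kernel with a hand-picked bandwidth (both are illustrative choices, as are the function names):

```python
import numpy as np

def rbf(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between rows of A (m, d) and rows of B (n, d)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * bandwidth**2))

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased U-statistic estimate of MMD^2 between samples X (m, d) and Y (n, d)."""
    m, n = len(X), len(Y)
    Kxx = rbf(X, X, bandwidth)
    Kyy = rbf(Y, Y, bandwidth)
    Kxy = rbf(X, Y, bandwidth)
    # Exclude the diagonal terms k(x_i, x_i) and k(y_i, y_i) to remove bias
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
same = mmd2_unbiased(rng.normal(size=(200, 1)), rng.normal(size=(200, 1)))
shifted = mmd2_unbiased(rng.normal(size=(200, 1)), rng.normal(loc=2.0, size=(200, 1)))
print(same, shifted)  # near 0 for equal distributions, clearly positive for a shift
```

Being unbiased, the estimate can be slightly negative when the two distributions coincide.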


Modern implicit generative models such as generative adversarial networks (GANs) are generally known to suffer from issues such as instability, uninterpretability, and difficulty in assessing their performance. If we see these implicit models as dynamical systems, some of these issues are caused by being unable to control their behavior in a meanin...

We present a novel single-stage procedure for instrumental variable (IV) regression called DualIV which simplifies traditional two-stage regression via a dual formulation. We show that the common two-stage procedure can alternatively be solved via generalized least squares. Our formulation circumvents the first-stage regression which can be a bottl...

Bilinear pooling is capable of extracting high-order information from data, which makes it suitable for fine-grained visual understanding and information fusion. Despite their effectiveness in various applications, bilinear models with a massive number of parameters can easily suffer from the curse of dimensionality and intractable computation. In this p...
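To see where the parameter blow-up comes from, here is a minimal sketch of standard full bilinear pooling (not the paper's proposed method): local feature vectors are averaged as outer products, so a $d$-dimensional input yields a $d^2$-dimensional descriptor. The signed square root and $\ell_2$ normalization are common post-processing choices assumed here, and `bilinear_pool` is a hypothetical name.

```python
import numpy as np

def bilinear_pool(features):
    """Full bilinear pooling of local features with shape (L, d) into a d*d vector."""
    L, d = features.shape
    pooled = features.T @ features / L        # (d, d) average of outer products
    z = pooled.reshape(-1)                    # flatten to a d*d descriptor
    z = np.sign(z) * np.sqrt(np.abs(z))       # signed square root
    return z / (np.linalg.norm(z) + 1e-12)    # l2 normalization

rng = np.random.default_rng(0)
feats = rng.normal(size=(49, 64))  # e.g. a 7x7 spatial grid of 64-dim features
z = bilinear_pool(feats)
print(z.shape)  # (4096,): with 512-dim features this would already be 262,144
```

Any linear classifier on top of this descriptor needs $d^2$ weights per class, which is the cost that compact or low-rank bilinear models aim to reduce.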

The kernel mean embedding of probability distributions is commonly used in machine learning as an injective mapping from distributions to functions in an infinite dimensional Hilbert space. It allows us, for example, to define a distance measure between probability distributions, called maximum mean discrepancy (MMD). In this work, we propose to re...

The use of propensity score methods to reduce selection bias when determining causal effects is common practice for observational studies. Although such studies in econometrics, social science, and medicine often rely on sensitive data, there has been no prior work on privatising the propensity scores used to ascertain causal effects from observed...

We introduce a conditional density estimation model termed the conditional density operator. It naturally captures multivariate, multimodal output densities and is competitive with recent neural conditional density models and Gaussian processes. To derive the model, we propose a novel approach to the reconstruction of probability densities from the...

Consequential decisions are increasingly informed by sophisticated data-driven predictive models. For accurate predictive models, deterministic threshold rules have been shown to be optimal in terms of utility, even under a variety of fairness constraints. However, consistently learning accurate models requires access to ground truth data. Unfortun...

Modern implicit generative models such as generative adversarial networks (GANs) are generally known to suffer from instability and lack of interpretability as it is difficult to diagnose what aspects of the target distribution are missed by the generative model. In this work, we propose a theoretically grounded solution to these issues by augmenti...

Fine-grained temporal action parsing is important in many applications, such as daily activity understanding, human motion analysis, surgical robotics, and others requiring subtle and precise operations over long periods of time. In this paper we propose a novel bilinear pooling operation, which is used in intermediate layers of a temporal convolutional...

This paper introduces a novel Hilbert space representation of a counterfactual distribution, called the counterfactual mean embedding (CME), with applications in nonparametric causal inference. Counterfactual prediction has become a ubiquitous tool in machine learning applications, such as online advertisement, recommendation systems, and medical di...

Neural Information Processing Systems (NIPS) is a top-tier annual conference in machine learning. The 2016 edition of the conference comprised more than 2,400 paper submissions, 3,000 reviewers, and 8,000 attendees, representing a growth of nearly 40% in terms of submissions, 96% in terms of reviewers, and over 100% in terms of attendees as compare...

A Hilbert space embedding of distributions (in short, a kernel mean embedding) has recently emerged as a powerful tool for probabilistic modeling, statistical inference, machine learning, and causal discovery. The basic idea behind this framework is to map distributions into a reproducing kernel Hilbert space (RKHS) in which the whole arsenal...

A Hilbert space embedding of a distribution (in short, a kernel mean embedding) has recently emerged as a powerful tool for machine learning and statistical inference. The basic idea behind this framework is to map distributions into a reproducing kernel Hilbert space (RKHS) in which the whole arsenal of kernel methods can be extended to probability...

In this paper, we study the minimax estimation of the Bochner integral $$\mu_k(P):=\int_{\mathcal{X}} k(\cdot,x)\,dP(x),$$ also called the *kernel mean embedding*, based on random samples drawn i.i.d. from $P$, where $k:\mathcal{X}\times\mathcal{X}\rightarrow\mathbb{R}$ is a positive definite kernel. Various estimators (including the empiri...
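The most common estimator of $\mu_k(P)$ replaces $P$ by the empirical measure, giving $\hat\mu_k(t) = \frac{1}{n}\sum_{i=1}^n k(t, x_i)$. A minimal sketch for one-dimensional data, assuming a Gaussian kernel with unit bandwidth; for $P = \mathcal{N}(0, 1)$ this kernel gives the closed form $\mu_k(t) = e^{-t^2/4}/\sqrt{2}$, which the usage compares against. The helper name `empirical_kernel_mean` is hypothetical.

```python
import numpy as np

def empirical_kernel_mean(samples, bandwidth=1.0):
    """Return the function t -> (1/n) sum_i k(t, x_i) for a 1-D Gaussian kernel k."""
    x = np.asarray(samples, dtype=float)

    def mu_hat(t):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        # Rows index evaluation points t, columns index samples x_i
        return np.exp(-(t[:, None] - x[None, :]) ** 2 / (2.0 * bandwidth**2)).mean(axis=1)

    return mu_hat

rng = np.random.default_rng(0)
mu_hat = empirical_kernel_mean(rng.normal(size=5000))
t = np.array([0.0, 1.0])
exact = np.exp(-t**2 / 4.0) / np.sqrt(2.0)  # closed form for N(0, 1), bandwidth 1
print(mu_hat(t), exact)
```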

We pose causal inference as the problem of learning to classify probability
distributions. In particular, we assume access to a collection
$\{(S_i,l_i)\}_{i=1}^n$, where each $S_i$ is a sample drawn from the
probability distribution of $X_i \times Y_i$, and $l_i$ is a binary label
indicating whether "$X_i \to Y_i$" or "$X_i \leftarrow Y_i$". Given...

We describe a method to perform functional operations on probability
distributions of random variables. The method uses reproducing kernel Hilbert
space representations of probability distributions, and it is applicable to all
operations which can be applied to points drawn from the respective
distributions. We refer to our approach as *kernel...*

The dissertation presents a novel learning framework on probability measures with abundant real-world applications. In the classical setup, it is assumed that the data are points drawn independently and identically distributed (i.i.d.) from some unknown distribution. In many scenarios, however, representing data as distributions may be more prefe...

The problem of estimating the kernel mean in a reproducing kernel Hilbert
space (RKHS) is central to kernel methods in that it is used by classical
approaches (e.g., when centering a kernel PCA matrix), and it also forms the
core inference step of modern kernel methods (e.g., kernel-based non-parametric
tests) that rely on embedding probability dis...

We are interested in learning causal relationships between pairs of random
variables, purely from observational data. To effectively address this task,
the state-of-the-art relies on strong assumptions regarding the mechanisms
mapping causes to effects, such as invertibility or the existence of additive
noise, which only hold in limited situations....

A mean function in a reproducing kernel Hilbert space (RKHS), or a kernel
mean, is an important part of many algorithms ranging from kernel principal
component analysis to Hilbert-space embedding of distributions. Given a finite
sample, an empirical average has been used consistently as a standard estimator
for the true kernel mean. Despite a commo...

2014 A mean function in a reproducing kernel Hilbert space (RKHS), or a kernel mean, is an important part of many algorithms ranging from kernel principal component analysis to Hilbert-space embedding of distributions. Given a finite sample, an empirical average is the standard estimate for the true kernel mean. We show that this estimator can be i...

Determining conditional independence (CI) relationships between random variables is a challenging but important task for problems such as Bayesian network learning and causal discovery. We propose a new kernel CI test that uses a single, learned permutation to convert the CI test problem into an easier two-sample test problem. The learned permutati...

A mean function in a reproducing kernel Hilbert space, or a kernel mean, is an
important part of many applications ranging from kernel principal component
analysis to Hilbert-space embedding of distributions. Given finite samples, an
empirical average is the standard estimate for the true kernel mean. We show
that this estimator can be improved via a...

We propose one-class support measure machines (OCSMMs) for group anomaly
detection which aims at recognizing anomalous aggregate behaviors of data
points. The OCSMMs generalize well-known one-class support vector machines
(OCSVMs) to a space of probability measures. By formulating the problem as
quantile estimation on distributions, we can establis...

This paper investigates domain generalization: How to take knowledge acquired from an arbitrary number of related domains and apply it to previously unseen domains? We propose Domain-Invariant Component Analysis (DICA), a kernel-based optimization algorithm that learns an invariant transformation by minimizing the dissimilarity across domains, whil...

Let $X$ denote the feature and $Y$ the target. We consider domain adaptation under three possible scenarios: (1) the marginal $P_Y$ changes, while the conditional $P_{X|Y}$ stays the same (target shift), (2) the marginal $P_Y$ is fixed, while the conditional $P_{X|Y}$ changes with certain constraints (conditional shift), and (3) the marginal $P_Y$ changes, and the condit...

This paper proposes a Hilbert space embedding for Dirichlet Process mixture
models via a stick-breaking construction of Sethuraman. Although Bayesian
nonparametrics offers a powerful approach to construct a prior that avoids the
need to specify the model size/complexity explicitly, exact inference is
often intractable. On the other hand, frequen...

This paper presents a kernel-based discriminative learning framework on
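One simple way to compare distributions inside a kernel machine is through the inner product of their mean embeddings, $K(P_i, P_j) = \langle \mu_{P_i}, \mu_{P_j} \rangle = \mathbb{E}\,k(X, X')$ with $X \sim P_i$, $X' \sim P_j$, estimated by averaging the kernel over all cross pairs of samples. The minimal sketch below builds such a Gram matrix over bags of samples of different sizes; the linear kernel on embeddings, the Gaussian base kernel, and the helper names are assumptions, and other kernels on embeddings are possible.

```python
import numpy as np

def rbf(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between rows of A (m, d) and rows of B (n, d)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * bandwidth**2))

def mean_embedding_gram(bags, bandwidth=1.0):
    """Gram matrix G[i, j] = <mu_i, mu_j>, the mean of k over all cross pairs."""
    B = len(bags)
    G = np.empty((B, B))
    for i in range(B):
        for j in range(i, B):
            # Symmetric, so compute the upper triangle and mirror it
            G[i, j] = G[j, i] = rbf(bags[i], bags[j], bandwidth).mean()
    return G

rng = np.random.default_rng(0)
bags = [rng.normal(loc=mu, size=(size, 2)) for mu, size in [(0.0, 50), (0.0, 80), (2.0, 60)]]
G = mean_embedding_gram(bags)
print(G)
```

The resulting matrix is positive semidefinite, so it can be passed to a kernel method that accepts a precomputed Gram matrix, for example scikit-learn's `SVC(kernel="precomputed")`.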
probability measures. Rather than relying on large collections of vectorial
training examples, our framework learns using a collection of probability
distributions that have been constructed to meaningfully represent training
data. By representing these probability distribution...

There has recently been a large effort in using unlabeled data in conjunction with labeled data in machine learning. Semi-supervised
learning and active learning are two well-known techniques that exploit unlabeled data in the learning process. In this
work, active learning is used to query labels for unlabeled data on top of a semi-sup...

Graph-based semi-supervised learning has attracted much attention in recent years. Many successful methods rely on graph structure
to propagate labels from labeled data to unlabeled data. Although graph structure affects the performance of the system, only
a few works address how to construct it. In this work, the graph structure is constructed...

This paper proposes a PACS (picture archiving and communication system) to manage and transfer information in the dental field, focusing on two main applications. The first application opens patients' digital imaging and communications in medicine (DICOM) files stored in the database via a local area network (LAN) and the hypertext transfer protocol (HTTP). Se...