Conference Paper

# Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions

Conference: Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, Vancouver, British Columbia, Canada.

Source: DBLP


**ABSTRACT:** Do two data samples come from different distributions? Recent studies of this fundamental problem have focused on embedding probability distributions into sufficiently rich, characteristic reproducing kernel Hilbert spaces (RKHSs), comparing distributions by the distance between their embeddings. We show that Regularized Maximum Mean Discrepancy (RMMD), our novel measure for kernel-based hypothesis testing, yields substantial improvements even when sample sizes are small, and excels at hypothesis tests involving multiple comparisons with power control. We derive asymptotic distributions under the null and alternative hypotheses, and assess power control. Outstanding results are obtained on challenging EEG data, MNIST, the Berkeley Covertype data, and the Flare-Solar dataset.

05/2013
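The comparison described above rests on the (unregularized) Maximum Mean Discrepancy: the RKHS distance between the mean embeddings of the two samples. The RMMD variant itself is not specified in the abstract, but the underlying statistic can be sketched as follows. This is a minimal illustration assuming a Gaussian kernel with a fixed bandwidth `sigma`; the function name and parameter choices are mine, not the paper's.

```python
import numpy as np

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of the squared MMD with a Gaussian kernel.

    X: (m, d) sample from P; Y: (n, d) sample from Q.
    The MMD is the RKHS distance between the mean embeddings of P and Q.
    """
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    # Drop diagonal terms to obtain the unbiased U-statistic.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(0)
same = mmd2_unbiased(rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2)))
diff = mmd2_unbiased(rng.normal(0, 1, (200, 2)), rng.normal(1, 1, (200, 2)))
# `same` fluctuates near zero; `diff` picks up the mean shift between P and Q.
```

In a hypothesis test, the statistic above would be compared against a null threshold (e.g. from a permutation distribution); the regularization and power control of RMMD are developments beyond this sketch.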

**ABSTRACT:** We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, Maximum Mean Discrepancies (MMD), i.e., distances between embeddings of distributions into reproducing kernel Hilbert spaces (RKHSs), as established in machine learning. When the energy distance is computed with a semimetric of negative type, a positive definite kernel, termed the distance kernel, may be defined such that the MMD corresponds exactly to the energy distance. Conversely, for any positive definite kernel, we can interpret the MMD as an energy distance with respect to some negative-type semimetric. This equivalence readily extends to distance covariance using kernels on the product space. We determine the class of probability distributions for which the test statistics are consistent against all alternatives. Finally, we investigate the performance of the family of distance kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.

The Annals of Statistics 10/2013; 41(5):2263-2291.
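The equivalence stated in this abstract can be checked numerically. For the Euclidean semimetric $\rho(x,y)=\Vert x-y\Vert$ (which is of negative type on $\mathbb{R}^d$), a distance kernel centered at an arbitrary point $z_0$ is $k(x,y)=\tfrac12\bigl(\rho(x,z_0)+\rho(y,z_0)-\rho(x,y)\bigr)$, and the squared MMD computed with this kernel equals half the energy distance. The sketch below uses V-statistic (biased) estimators on both sides, for which the identity holds exactly; the helper names are mine.

```python
import numpy as np

def pdist(A, B):
    # Pairwise Euclidean distances between the rows of A and B.
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))

def energy_distance(X, Y):
    # V-statistic estimate: 2 E||X - Y|| - E||X - X'|| - E||Y - Y'||.
    return 2 * pdist(X, Y).mean() - pdist(X, X).mean() - pdist(Y, Y).mean()

def mmd2_distance_kernel(X, Y, z0):
    # Distance kernel induced by rho(x, y) = ||x - y||, centered at z0:
    #   k(x, y) = (rho(x, z0) + rho(y, z0) - rho(x, y)) / 2
    def k(A, B):
        return 0.5 * (pdist(A, z0[None]) + pdist(B, z0[None]).T - pdist(A, B))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1, (100, 3))
Y = rng.normal(0.5, 1, (100, 3))
# The z0-dependent terms cancel: MMD^2 = energy_distance / 2 for any center.
assert np.isclose(mmd2_distance_kernel(X, Y, np.zeros(3)),
                  energy_distance(X, Y) / 2)
```

The cancellation of the $z_0$ terms in the final identity is exactly why the choice of center does not matter, which is what makes "the" distance kernel well defined up to a change of center.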

**ABSTRACT:** Given random samples drawn i.i.d. from a probability measure $\mathbb{P}$ (defined on, say, $\mathbb{R}^d$), it is well-known that the empirical estimator is an optimal estimator of $\mathbb{P}$ in the weak topology, but not even a consistent estimator of its density (if it exists) in the strong topology (induced by the total variation distance). On the other hand, various popular density estimators such as kernel and wavelet density estimators are optimal in the strong topology in the sense of achieving the minimax rate over all estimators for a Sobolev ball of densities. Recently, it has been shown in a series of papers by Gin\'{e} and Nickl that these density estimators on $\mathbb{R}$ that are optimal in the strong topology are also optimal in $\Vert\cdot\Vert_\mathcal{F}$ for certain choices of $\mathcal{F}$ such that $\Vert\cdot\Vert_\mathcal{F}$ metrizes the weak topology, where $\Vert\mathbb{P}\Vert_\mathcal{F}:=\sup\{\int f\,d\mathbb{P}:f\in\mathcal{F}\}$. In this paper, we investigate this problem of optimal estimation in weak and strong topologies by choosing $\mathcal{F}$ to be a unit ball in a reproducing kernel Hilbert space (say $\mathcal{F}_H$ defined over $\mathbb{R}^d$), where this choice is both of theoretical and computational interest. Under some mild conditions on the reproducing kernel, we show that $\Vert\cdot\Vert_{\mathcal{F}_H}$ metrizes the weak topology and that the kernel density estimator (with $L^1$ optimal bandwidth) estimates $\mathbb{P}$ at the dimension-independent optimal rate of $n^{-1/2}$ in $\Vert\cdot\Vert_{\mathcal{F}_H}$, along with providing a uniform central limit theorem for the kernel density estimator.

10/2013
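The dimension-independent $n^{-1/2}$ rate in the RKHS-ball norm can be made concrete in a case where everything is closed form. For the *empirical* estimator (a simpler stand-in for the paper's kernel density estimator, measured in the same norm $\Vert\cdot\Vert_{\mathcal{F}_H}$), a standard computation gives $\mathbb{E}\Vert\mu_{\mathbb{P}_n}-\mu_{\mathbb{P}}\Vert_H^2 = \bigl(\mathbb{E}\,k(X,X) - \mathbb{E}\,k(X,X')\bigr)/n$. The sketch below evaluates this exactly for $\mathbb{P}=N(0,I_d)$ and a Gaussian RKHS kernel, choices that are mine, not the paper's, and verifies that quadrupling $n$ reduces the expected squared error by exactly a factor of 4 in every dimension.

```python
import numpy as np

def expected_sq_embedding_error(n, d, sigma=1.0):
    """Exact E||mu_{P_n} - mu_P||_H^2 for P = N(0, I_d) and the Gaussian
    kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).

    For i.i.d. samples, E||mu_{P_n} - mu_P||^2 = (E k(X, X) - E k(X, X')) / n.
    Here E k(X, X) = 1, and since X - X' ~ N(0, 2 I_d), a Gaussian integral
    gives E k(X, X') = (sigma^2 / (sigma^2 + 2))**(d / 2).
    """
    return (1.0 - (sigma**2 / (sigma**2 + 2)) ** (d / 2)) / n

# The error decays as 1/n regardless of d (only the constant depends on d),
# i.e. the root-mean-square error decays at the rate n^{-1/2}.
for d in (1, 10, 100):
    e_n = expected_sq_embedding_error(1000, d)
    e_4n = expected_sq_embedding_error(4000, d)
    assert np.isclose(e_n / e_4n, 4.0)
```

The paper's stronger claim is that the kernel *density* estimator with an $L^1$-optimal bandwidth also achieves this rate in $\Vert\cdot\Vert_{\mathcal{F}_H}$; the sketch only illustrates what the norm and the rate mean in a fully computable special case.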
