Conference Paper

A Kernel Statistical Test of Independence

Conference: Advances in Neural Information Processing Systems 20, Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 3-6, 2007
Source: DBLP


Although kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). The resulting test costs O(m²), where m is the sample size. We demonstrate that this test outperforms established contingency table and functional correlation-based tests, and that this advantage is greater for multivariate data. Finally, we show the HSIC test also applies to text (and to structured data more generally), for which no other independence test presently exists.
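The abstract describes the statistic only at a high level. As a rough illustration, the sketch below computes a biased empirical HSIC, (1/m²) tr(KHLH) for Gaussian kernel matrices K and L with H the centering matrix, and calibrates it with a simple permutation null. This is not the paper's exact procedure: the paper works with the asymptotic null distribution (approximated by a gamma distribution) rather than permutations, and the kernel bandwidths and function names here are illustrative assumptions.

    import numpy as np

    def gaussian_kernel(X, sigma=1.0):
        # Gaussian (RBF) kernel matrix from pairwise squared distances.
        sq = np.sum(X**2, axis=1, keepdims=True)
        d2 = sq + sq.T - 2.0 * X @ X.T
        return np.exp(-d2 / (2.0 * sigma**2))

    def hsic_biased(X, Y, sigma_x=1.0, sigma_y=1.0):
        # Biased empirical HSIC: (1/m^2) * trace(K H L H), computed in O(m^2)
        # by centering K and taking an elementwise product with L.
        m = X.shape[0]
        K = gaussian_kernel(X, sigma_x)
        L = gaussian_kernel(Y, sigma_y)
        Kc = K - K.mean(0, keepdims=True) - K.mean(1, keepdims=True) + K.mean()
        return np.sum(Kc * L) / m**2

    def hsic_permutation_test(X, Y, n_perm=500, alpha=0.05, seed=None):
        # Permuting the rows of Y simulates the null hypothesis of independence.
        rng = np.random.default_rng(seed)
        stat = hsic_biased(X, Y)
        null = np.array([hsic_biased(X, Y[rng.permutation(len(Y))])
                         for _ in range(n_perm)])
        p_value = (1 + np.sum(null >= stat)) / (1 + n_perm)
        return stat, p_value, p_value < alpha

Each evaluation of the statistic costs O(m²), consistent with the cost quoted in the abstract; the permutation loop multiplies that by the number of shuffles, which is one reason the gamma approximation to the null distribution used in the paper is attractive in practice.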

Excerpts from citing publications:
    • "The extension of random split to thinning may lead to improved cotraining performance, as thinning may make features from different partitions less dependent and meanwhile well preserves the classification power in a high-dimensional setting when there is sufficient redundancy among features (see Section 3.2). The optimal number of partitions can be selected by heuristics such as the kernel independence test [Bach and Jordan (2003), Gretton et al. (2007)], which we leave for future work. "
    ABSTRACT: Recent advances in tissue microarray technology have allowed immunohistochemistry to become a powerful medium-to-high throughput analysis tool, particularly for the validation of diagnostic and prognostic biomarkers. However, as study size grows, the manual evaluation of these assays becomes a prohibitive limitation; it vastly reduces throughput and greatly increases variability and expense. We propose an algorithm - Tissue Array Co-Occurrence Matrix Analysis (TACOMA) - for quantifying cellular phenotypes based on textural regularity summarized by local inter-pixel relationships. The algorithm can be easily trained for any staining pattern, is absent of sensitive tuning parameters and has the ability to report salient pixels in an image that contribute to its score. Pathologists' input via informative training patches is an important aspect of the algorithm that allows the training for any specific marker or cell type. With co-training, TACOMA can be trained with a radically small training sample (e.g., with size 30). We give theoretical insights into the success of co-training via thinning of the feature set in a high dimensional setting when there is "sufficient" redundancy among the features. TACOMA is flexible, transparent and provides a scoring process that can be evaluated with clarity and confidence. In a study based on an estrogen receptor (ER) marker, we show that TACOMA is comparable to, or outperforms, pathologists' performance in terms of accuracy and repeatability.
    • "Although choosing F = F W or F β yields consistent estimates of γ F (P, Q) for all P and Q when M = R d , the rates The distance measure γ k has appeared in a wide variety of applications. These include statistical hypothesis testing, of homogeneity (Gretton et al., 2007), independence (Gretton et al., 2008), and conditional independence (Fukumizu et al., 2008); as well as in machine learning applications including kernel independent component analysis (Bach and Jordan, 2002; Gretton et al., 2005) and kernel based dimensionality reduction for supervised learning (Fukumizu et al., 2004). In these applications, kernels offer a linear approach to deal with higher order statistics: given the problem of homogeneity testing, for example, differences in higher order moments are encoded as differences in the means of nonlinear features of the variables. "
    ABSTRACT: A Hilbert space embedding for probability measures has recently been proposed, with applications including dimensionality reduction, homogeneity testing, and independence testing. This embedding represents any probability measure as a mean element in a reproducing kernel Hilbert space (RKHS). A pseudometric on the space of probability measures can be defined as the distance between distribution embeddings: we denote this as γ_k, indexed by the kernel function k that defines the inner product in the RKHS. We present three theoretical properties of γ_k. First, we consider the question of determining the conditions on the kernel k for which γ_k is a metric: such k are denoted characteristic kernels. Unlike pseudometrics, a metric is zero only when two distributions coincide, thus ensuring the RKHS embedding maps all distributions uniquely (i.e., the embedding is injective). While previously published conditions may apply only in restricted circumstances (e.g., on compact domains), and are difficult to check, our conditions are straightforward and intuitive: integrally strictly positive definite kernels are characteristic. Alternatively, if a bounded continuous kernel is translation-invariant on ℝ^d, then it is characteristic if and only if the support of its Fourier transform is the entire ℝ^d. Second, we show that the distance between distributions under γ_k results from an interplay between the properties of the kernel and the distributions, by demonstrating that distributions are close in the embedding space when their differences occur at higher frequencies. Third, to understand the nature of the topology induced by γ_k, we relate γ_k to other popular metrics on probability measures, and present conditions on the kernel k under which γ_k metrizes the weak topology.
    Journal of Machine Learning Research 07/2009; 11:1517-1561.
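The pseudometric γ_k described in the abstract above is, for a characteristic kernel, the RKHS distance between the mean embeddings of the two distributions (the maximum mean discrepancy). As a minimal numerical sketch of the biased empirical estimate from two samples (not the cited paper's code; the Gaussian kernel, bandwidth, and function names are assumptions):

    import numpy as np

    def rbf(X, Y, sigma=1.0):
        # Gaussian kernel matrix between the rows of X and the rows of Y.
        d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
        return np.exp(-d2 / (2.0 * sigma**2))

    def gamma_k_biased(X, Y, sigma=1.0):
        # Biased empirical gamma_k: sqrt(mean k(x,x') - 2 mean k(x,y) + mean k(y,y')),
        # i.e. the distance between the two empirical mean embeddings in the RKHS.
        return np.sqrt(rbf(X, X, sigma).mean()
                       - 2.0 * rbf(X, Y, sigma).mean()
                       + rbf(Y, Y, sigma).mean())

    # Two samples from the same Gaussian give a value near zero; shifting one
    # sample apart increases the estimate.
    rng = np.random.default_rng(0)
    X, Y = rng.normal(size=(200, 2)), rng.normal(size=(200, 2))
    print(gamma_k_biased(X, Y), gamma_k_biased(X, Y + 1.0))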
    • "Since there is no obvious way to discretize the continuous data, standard tests (like χ 2 ) are not very well-suited for this method. In our implementation we used a statistical test of independence based on the Hilbert-Schmidt Independence Criterion (HSIC) (Gretton et al., 2005; Smola et al., 2007; Gretton et al., 2008 "
    ABSTRACT: We propose a method that detects the true direction of time series, by fitting an autoregressive moving average model to the data. Whenever the noise is independent of the previous samples for one ordering of the observations, but dependent for the opposite ordering, we infer the former direction to be the true one. We prove that our method works in the population case as long as the noise of the process is not normally distributed (for the latter case, the direction is not identifiable). A new and important implication of our result is that it confirms a fundamental conjecture in causal reasoning: if after regression the noise is independent of the signal for one direction and dependent for the other, then the former represents the true causal direction, in the case of time series. We test our approach on two types of data: simulated data sets conforming to our modeling assumptions, and real world EEG time series. Our method makes a decision for a significant fraction of both data sets, and these decisions are mostly correct. For real world data, our approach outperforms alternative solutions to the problem of time direction recovery.
    Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14-18, 2009; 01/2009
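The procedure summarised in the abstract above lends itself to a short sketch: fit an autoregressive model in both temporal directions and test whether the residuals are independent of the past, keeping the direction for which independence is not rejected. The sketch below is a deliberately simplified AR(1) version, not the ARMA procedure of the cited paper; it reuses hsic_permutation_test from the HSIC sketch earlier on this page, and the function names are illustrative.

    import numpy as np
    # Assumes hsic_permutation_test from the HSIC sketch above is in scope.

    def ar1_residuals(x):
        # Least-squares fit of x[t] = a*x[t-1] + b; return residuals and regressors.
        past, present = x[:-1], x[1:]
        A = np.column_stack([past, np.ones_like(past)])
        coef, *_ = np.linalg.lstsq(A, present, rcond=None)
        return (present - A @ coef).reshape(-1, 1), past.reshape(-1, 1)

    def infer_time_direction(x, alpha=0.05):
        # Residuals and past values should be independent only for the true
        # direction (when the driving noise is non-Gaussian).
        x = np.asarray(x, dtype=float)
        res_f, past_f = ar1_residuals(x)
        res_b, past_b = ar1_residuals(x[::-1])
        _, p_fwd, _ = hsic_permutation_test(res_f, past_f)
        _, p_bwd, _ = hsic_permutation_test(res_b, past_b)
        if p_fwd > alpha >= p_bwd:
            return "forward"
        if p_bwd > alpha >= p_fwd:
            return "backward"
        return "undecided"

Returning "undecided" when neither or both directions pass mirrors the abstract's observation that the method only makes a decision for a fraction of the data sets.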