Conference Paper
A Kernel Statistical Test of Independence
Conference: Advances in Neural Information Processing Systems 20, Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 3-6, 2007
Source: DBLP
ABSTRACT
Although kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). The resulting test costs O(m²), where m is the sample size. We demonstrate that this test outperforms established contingency table and functional correlation-based tests, and that this advantage is greater for multivariate data. Finally, we show the HSIC test also applies to text (and to structured data more generally), for which no other independence test presently exists.
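The empirical statistic behind the test can be sketched in a few lines. A common biased estimator is HSIC_b = (1/m²) tr(KHLH), where K and L are kernel matrices on the two samples and H is the centering matrix; the Gaussian kernel and fixed bandwidth below are illustrative assumptions, not prescribed by the paper.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    # Pairwise squared distances, then Gaussian (RBF) kernel matrix.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))

def hsic_biased(X, Y, sigma=1.0):
    # Biased empirical HSIC: (1/m^2) * trace(K H L H),
    # with centering matrix H = I - (1/m) 1 1^T.
    m = X.shape[0]
    K = gaussian_kernel(X, sigma)
    L = gaussian_kernel(Y, sigma)
    H = np.eye(m) - np.ones((m, m)) / m
    return np.trace(K @ H @ L @ H) / m**2
```

The O(m²) cost quoted in the abstract is visible here: the dominant work is forming and multiplying m-by-m kernel matrices.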

"The extension of random split to thinning may lead to improved co-training performance, as thinning may make features from different partitions less dependent and meanwhile well preserves the classification power in a high-dimensional setting when there is sufficient redundancy among features (see Section 3.2). The optimal number of partitions can be selected by heuristics such as the kernel independence test [Bach and Jordan (2003), Gretton et al. (2007)], which we leave for future work."
Article: Statistical Methods for Analyzing Tissue Microarray Images - Algorithmic Scoring and Co-training
ABSTRACT: Recent advances in tissue microarray technology have allowed immunohistochemistry to become a powerful medium-to-high-throughput analysis tool, particularly for the validation of diagnostic and prognostic biomarkers. However, as study size grows, the manual evaluation of these assays becomes a prohibitive limitation; it vastly reduces throughput and greatly increases variability and expense. We propose an algorithm, Tissue Array Co-Occurrence Matrix Analysis (TACOMA), for quantifying cellular phenotypes based on textural regularity summarized by local inter-pixel relationships. The algorithm can be easily trained for any staining pattern, has no sensitive tuning parameters, and can report the salient pixels in an image that contribute to its score. Pathologists' input via informative training patches is an important aspect of the algorithm that allows training for any specific marker or cell type. With co-training, TACOMA can be trained with a radically small training sample (e.g., of size 30). We give theoretical insights into the success of co-training via thinning of the feature set in a high-dimensional setting when there is "sufficient" redundancy among the features. TACOMA is flexible, transparent, and provides a scoring process that can be evaluated with clarity and confidence. In a study based on an estrogen receptor (ER) marker, we show that TACOMA is comparable to, or outperforms, pathologists' performance in terms of accuracy and repeatability.
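TACOMA's texture features are built from co-occurrence statistics of neighboring pixels. A minimal gray-level co-occurrence matrix for a single horizontal offset can be sketched as follows; the 8-level quantization and single offset here are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def glcm(img, levels=8):
    # Gray-level co-occurrence matrix for the horizontal offset (0, 1):
    # counts how often quantized level i sits immediately left of level j.
    # Assumes img holds intensities in [0, 1].
    q = np.clip((img * levels).astype(int), 0, levels - 1)
    M = np.zeros((levels, levels))
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(M, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)
    return M / M.sum()  # normalize to joint co-occurrence frequencies
```

Scalar texture summaries (contrast, homogeneity, and so on) are then simple functionals of this normalized matrix.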
"Although choosing F = F_W or F_β yields consistent estimates of γ_F(P, Q) for all P and Q when M = R^d, the rates [...] The distance measure γ_k has appeared in a wide variety of applications. These include statistical hypothesis testing of homogeneity (Gretton et al., 2007), independence (Gretton et al., 2008), and conditional independence (Fukumizu et al., 2008), as well as machine learning applications including kernel independent component analysis (Bach and Jordan, 2002; Gretton et al., 2005) and kernel-based dimensionality reduction for supervised learning (Fukumizu et al., 2004). In these applications, kernels offer a linear approach to dealing with higher-order statistics: given the problem of homogeneity testing, for example, differences in higher-order moments are encoded as differences in the means of nonlinear features of the variables."
ABSTRACT: A Hilbert space embedding for probability measures has recently been proposed, with applications including dimensionality reduction, homogeneity testing, and independence testing. This embedding represents any probability measure as a mean element in a reproducing kernel Hilbert space (RKHS). A pseudometric on the space of probability measures can be defined as the distance between distribution embeddings: we denote this as γ_k, indexed by the kernel function k that defines the inner product in the RKHS. We present three theoretical properties of γ_k. First, we consider the question of determining the conditions on the kernel k for which γ_k is a metric: such k are denoted characteristic kernels. Unlike pseudometrics, a metric is zero only when two distributions coincide, thus ensuring the RKHS embedding maps all distributions uniquely (i.e., the embedding is injective). While previously published conditions may apply only in restricted circumstances (e.g., on compact domains), and are difficult to check, our conditions are straightforward and intuitive: integrally strictly positive definite kernels are characteristic. Alternatively, if a bounded continuous kernel is translation-invariant on R^d, then it is characteristic if and only if the support of its Fourier transform is the entire R^d. Second, we show that the distance between distributions under γ_k results from an interplay between the properties of the kernel and the distributions, by demonstrating that distributions are close in the embedding space when their differences occur at higher frequencies. Third, to understand the nature of the topology induced by γ_k, we relate γ_k to other popular metrics on probability measures, and present conditions on the kernel k under which γ_k metrizes the weak topology.
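The distance γ_k has a simple plug-in estimate from samples, since its square expands as E k(x,x') + E k(y,y') - 2 E k(x,y) under the two distributions. A minimal sketch using a Gaussian kernel (which is characteristic in the sense above; the bandwidth is an illustrative assumption):

```python
import numpy as np

def rbf(X, Y, sigma=1.0):
    # Gaussian kernel matrix between two sample sets.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

def mmd2_biased(X, Y, sigma=1.0):
    # Biased (V-statistic) estimate of gamma_k(P, Q)^2:
    # mean k(x, x') + mean k(y, y') - 2 mean k(x, y).
    # Equals the squared RKHS distance between empirical mean embeddings,
    # so it is always nonnegative.
    return (rbf(X, X, sigma).mean() + rbf(Y, Y, sigma).mean()
            - 2 * rbf(X, Y, sigma).mean())
```

Because the Gaussian kernel is characteristic, this quantity converges to zero only when the two distributions coincide.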
"Since there is no obvious way to discretize the continuous data, standard tests (like χ²) are not very well-suited for this method. In our implementation we used a statistical test of independence based on the Hilbert-Schmidt Independence Criterion (HSIC) (Gretton et al., 2005; Smola et al., 2007; Gretton et al., 2008)."
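One common way to calibrate an HSIC test on continuous data without any discretization is a permutation null: shuffle one sample to destroy dependence while preserving both marginals, and compare the observed statistic against the shuffled ones. A self-contained sketch (the Gaussian kernel, fixed bandwidth, and permutation calibration are illustrative; the NIPS paper instead characterizes the asymptotic null distribution):

```python
import numpy as np

def _rbf(A, sigma=1.0):
    # Gaussian kernel matrix on one sample set.
    sq = np.sum(A**2, 1)
    return np.exp(-(sq[:, None] + sq[None, :] - 2 * A @ A.T) / (2 * sigma**2))

def hsic(X, Y, sigma=1.0):
    # Biased empirical HSIC: (1/m^2) * trace(K H L H).
    m = len(X)
    H = np.eye(m) - np.ones((m, m)) / m
    return np.trace(_rbf(X, sigma) @ H @ _rbf(Y, sigma) @ H) / m**2

def hsic_permutation_pvalue(X, Y, n_perm=200, sigma=1.0, seed=0):
    # Build the null by permuting Y: this breaks any dependence on X
    # while keeping the marginal distributions fixed.
    rng = np.random.default_rng(seed)
    observed = hsic(X, Y, sigma)
    null = [hsic(X, Y[rng.permutation(len(Y))], sigma) for _ in range(n_perm)]
    # Add-one correction keeps the p-value strictly positive.
    return (1 + sum(s >= observed for s in null)) / (1 + n_perm)
```

A small p-value then indicates statistically significant dependence, which is exactly the question raw kernel dependence measures leave open.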
Conference Paper: Detecting the direction of causal time series
ABSTRACT: We propose a method that detects the true direction of time series by fitting an autoregressive moving average model to the data. Whenever the noise is independent of the previous samples for one ordering of the observations, but dependent for the opposite ordering, we infer the former direction to be the true one. We prove that our method works in the population case as long as the noise of the process is not normally distributed (in the Gaussian case, the direction is not identifiable). A new and important implication of our result is that it confirms, in the case of time series, a fundamental conjecture in causal reasoning: if after regression the noise is independent of the signal for one direction and dependent for the other, then the former represents the true causal direction. We test our approach on two types of data: simulated data sets conforming to our modeling assumptions, and real-world EEG time series. Our method makes a decision for a significant fraction of both data sets, and these decisions are mostly correct. For real-world data, our approach outperforms alternative solutions to the problem of time-direction recovery.
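The decision rule above, fit a model in both temporal directions and prefer the one whose residuals look less dependent on the regressors, can be illustrated with a toy AR(1) version. The dependence score here (linear plus squared-residual correlation) is a crude stand-in for the kernel independence test used in practice, and the non-Gaussian noise in the usage below is an assumption chosen to make the direction identifiable, matching the paper's condition.

```python
import numpy as np

def ar1_residual_dependence(x):
    # Least-squares fit of x[t] = a * x[t-1] + e[t], then score how
    # dependent the residual e looks on the regressor. The squared-residual
    # term is a cheap proxy for higher-order dependence that plain
    # correlation misses.
    past, pres = x[:-1], x[1:]
    a = np.dot(past, pres) / np.dot(past, past)
    e = pres - a * past
    lin = np.corrcoef(e, past)[0, 1]
    nonlin = np.corrcoef(e**2, past)[0, 1]
    return abs(lin) + abs(nonlin)

def infer_time_direction(x):
    # The ordering whose residuals look less dependent is inferred to be
    # the true (causal) time direction.
    fwd = ar1_residual_dependence(x)
    bwd = ar1_residual_dependence(x[::-1])
    return "forward" if fwd < bwd else "backward"
```

For a Gaussian-noise AR process both directions yield independent-looking residuals, so the method abstains in exactly the non-identifiable case the abstract describes.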