A Kernel Statistical Test of Independence

Arthur Gretton
MPI for Biological Cybernetics
Tübingen, Germany

Kenji Fukumizu
Inst. of Statistical Mathematics
Tokyo, Japan

Choon Hui Teo
NICTA, ANU
Canberra, Australia

Le Song
NICTA, ANU
and University of Sydney

Bernhard Schölkopf
MPI for Biological Cybernetics
Tübingen, Germany

Alexander J. Smola
NICTA, ANU
Canberra, Australia
Abstract

Although kernel measures of independence have been widely applied in machine
learning (notably in kernel ICA), there is as yet no method to determine whether
they have detected statistically significant dependence. We provide a novel test of
the independence hypothesis for one particular kernel independence measure, the
Hilbert-Schmidt independence criterion (HSIC). The resulting test costs O(m²),
where m is the sample size. We demonstrate that this test outperforms established
contingency table and functional correlation-based tests, and that this advantage
is greater for multivariate data. Finally, we show the HSIC test also applies to
text (and to structured data more generally), for which no other independence test
presently exists.
1 Introduction

Kernel independence measures have been widely applied in recent machine learning literature, most
commonly in independent component analysis (ICA) [2, 11], but also in fitting graphical models [1]
and in feature selection [22]. One reason for their success is that these criteria have a zero expected
value if and only if the associated random variables are independent, when the kernels are universal
(in the sense of [23]). There is presently no way to tell whether the empirical estimates of these
dependence measures indicate a statistically significant dependence, however. In other words, we
are interested in the threshold an empirical kernel dependence estimate must exceed before we can
dismiss with high probability the hypothesis that the underlying variables are independent.
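To make the quantity under discussion concrete, the following sketch computes the biased empirical HSIC statistic of [9], HSIC_b = (1/m²) tr(KHLH), for paired samples X and Y using Gaussian kernels. The function names and bandwidth defaults are our own choices for illustration, not notation from the paper.

import numpy as np

def gaussian_gram(X, sigma):
    # Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic_biased(X, Y, sigma_x=1.0, sigma_y=1.0):
    # Biased empirical HSIC: (1/m^2) trace(K H L H), where H centers
    # the Gram matrices K (on X) and L (on Y).
    m = X.shape[0]
    K = gaussian_gram(X, sigma_x)
    L = gaussian_gram(Y, sigma_y)
    H = np.eye(m) - np.ones((m, m)) / m  # centering matrix
    return np.trace(K @ H @ L @ H) / m ** 2

Under independence this statistic concentrates near zero; the question the paper addresses is how large a value must be observed before independence can be rejected.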
Statistical tests of independence have been associated with a broad variety of dependence measures.
Classical tests such as Spearman's ρ and Kendall's τ are widely applied; however, they are not
guaranteed to detect all modes of dependence between the random variables. Contingency table-
based methods, and in particular the power-divergence family of test statistics [17], are the best-known
general-purpose tests, but they require a partitioning of the space in which each random variable
resides. Characteristic function-based tests [6, 13] have also been proposed, which are more general
than kernel density-based tests [19], although to our knowledge they have been used only to compare
univariate random variables.
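As a quick illustration of the first limitation (a toy example of ours, not from the paper): Spearman's ρ and Kendall's τ measure monotone association, so a dependence with no monotone trend is essentially invisible to them.

import numpy as np
from scipy.stats import kendalltau, spearmanr

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = x ** 2 + 0.1 * rng.standard_normal(1000)  # dependent, but not monotone

rho, p_rho = spearmanr(x, y)
tau, p_tau = kendalltau(x, y)
# Both statistics come out near zero, so neither test rejects
# independence, even though y is (nearly) a deterministic function of x.
print(f"Spearman rho = {rho:.3f} (p = {p_rho:.2f})")
print(f"Kendall tau  = {tau:.3f} (p = {p_tau:.2f})")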
In this paper we present three main results: first, and most importantly, we show how to test whether
statistically significant dependence is detected by a particular kernel independence measure, the
Hilbert-Schmidt independence criterion (HSIC, from [9]). That is, we provide a fast (O(m²) for
sample size m) and accurate means of obtaining a threshold which HSIC will only exceed with
small probability when the underlying variables are independent. Second, we show the distribution
of the empirical HSIC in the large sample limit under the hypothesis of independence.
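The paper obtains the rejection threshold by approximating this null distribution directly. As a generic (and slower) baseline for comparison, the same quantile can be estimated by permutation, shuffling one sample to simulate independence. A sketch, reusing hsic_biased from above:

def hsic_permutation_threshold(X, Y, alpha=0.05, n_perm=200, seed=0):
    # Estimate the 1 - alpha quantile of HSIC under independence by
    # shuffling Y: permutation breaks the pairing (and hence any
    # dependence) while preserving both marginal distributions.
    rng = np.random.default_rng(seed)
    null_stats = [
        hsic_biased(X, Y[rng.permutation(len(Y))]) for _ in range(n_perm)
    ]
    return np.quantile(null_stats, 1.0 - alpha)

# Reject the hypothesis of independence when the observed statistic
# exceeds the estimated threshold:
#   reject = hsic_biased(X, Y) > hsic_permutation_threshold(X, Y)

Each permutation costs O(m²); the sketch recomputes the Gram matrices for simplicity, though in practice they would be computed once and the permutation applied to the rows and columns of L.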
Table 1: Independence tests for cross-language dependence detection. Topics are in the first column;
the total number of 5-line extracts for each dataset is in parentheses. BOW(10) denotes a bag-of-words
kernel with sample size m = 10; Spec(50) denotes a k-spectrum kernel with m = 50. The first entry in
each cell is the null acceptance rate of the test under H0 (i.e., 1 − (Type I error); this should be near
0.95); the second entry is the null acceptance rate under H1 (the Type II error; small is better). Each
entry is an average over 300 repetitions.
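The kernels in Table 1 are standard text kernels: BOW is an inner product of word-count vectors, and Spec is the k-spectrum kernel of [16], which the paper computes with the fast suffix-array method of [24]. A naive version of the spectrum kernel, for illustration only:

from collections import Counter

def spectrum_kernel(s, t, k=3):
    # k-spectrum kernel: inner product of the k-gram count vectors of
    # the two strings.
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(cs[g] * ct[g] for g in cs if g in ct)

# Gram matrices built from such kernels can be plugged directly into
# the HSIC statistic above, e.g. to test dependence between passages
# of text and their candidate translations.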
Our experiments show that the HSIC test detects dependence between passages of text and their
translation. Another application along these lines might be in testing dependence between data of
completely different types, such as images and captions.
Acknowledgements: NICTA is funded through the Australian Government’s Backing Australia’s
Ability initiative, in part through the ARC. This work was supported in part by the IST Programme
of the European Community, under the PASCAL Network of Excellence, IST-2002-506778.
References

[1] F. R. Bach and M. I. Jordan. Tree-dependent component analysis. In UAI 18, 2002.
[2] F. R. Bach and M. I. Jordan. Kernel independent component analysis. J. Mach. Learn. Res., 3:1–48, 2002.
[3] I. Calvino. If on a winter's night a traveler. Harvest Books, Florida, 1982.
[4] J. Dauxois and G. M. Nkiet. Nonlinear canonical analysis and independence tests. The Annals of Statistics, 26(4):1254–1278, 1998.
[5] L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Number 31 in Applications of Mathematics. Springer, New York, 1996.
[6] A. Feuerverger. A consistent test for bivariate dependence. International Statistical Review, 61(3):419–433, 1993.
[7] K. Fukumizu, F. R. Bach, and M. I. Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. Journal of Machine Learning Research, 5:73–99, 2004.
[8] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola. A kernel method for the two-sample problem. In NIPS 19, pages 513–520, Cambridge, MA, 2007. MIT Press.
[9] A. Gretton, O. Bousquet, A. J. Smola, and B. Schölkopf. Measuring statistical dependence with Hilbert-Schmidt norms. In ALT, pages 63–77, 2005.
[10] A. Gretton, K. Fukumizu, C.-H. Teo, L. Song, B. Schölkopf, and A. Smola. A kernel statistical test of independence. Technical Report 168, MPI for Biological Cybernetics, 2008.
[11] A. Gretton, R. Herbrich, A. Smola, O. Bousquet, and B. Schölkopf. Kernel methods for measuring independence. J. Mach. Learn. Res., 6:2075–2129, 2005.
[12] N. L. Johnson, S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions. Volume 1 (Second Edition). John Wiley and Sons, 1994.
[13] A. Kankainen. Consistent Testing of Total Independence Based on the Empirical Characteristic Function. PhD thesis, University of Jyväskylä, 1995.
[14] J. Karvanen. A resampling test for the total independence of stationary time series: Application to the performance evaluation of ICA algorithms. Neural Processing Letters, 22(3):311–324, 2005.
[15] C.-J. Ku and T. Fine. Testing for stochastic independence: application to blind source separation. IEEE Transactions on Signal Processing, 53(5):1815–1826, 2005.
[16] C. Leslie, E. Eskin, and W. S. Noble. The spectrum kernel: A string kernel for SVM protein classification. In Pacific Symposium on Biocomputing, pages 564–575, 2002.
[17] T. Read and N. Cressie. Goodness-of-Fit Statistics for Discrete Multivariate Analysis. Springer-Verlag, New York, 1988.
[18] A. Rényi. On measures of dependence. Acta Math. Acad. Sci. Hungar., 10:441–451, 1959.
[19] M. Rosenblatt. A quadratic measure of deviation of two-dimensional density estimates and a test of independence. The Annals of Statistics, 3(1):1–14, 1975.
[20] B. Schölkopf, K. Tsuda, and J.-P. Vert. Kernel Methods in Computational Biology. MIT Press, 2004.
[21] R. Serfling. Approximation Theorems of Mathematical Statistics. Wiley, New York, 1980.
[22] L. Song, A. Smola, A. Gretton, K. Borgwardt, and J. Bedo. Supervised feature selection via dependence estimation. In Proc. Intl. Conf. Machine Learning, pages 823–830. Omnipress, 2007.
[23] I. Steinwart. The influence of the kernel on the consistency of support vector machines. Journal of Machine Learning Research, 2, 2002.
[24] C. H. Teo and S. V. N. Vishwanathan. Fast and space efficient string kernels using suffix arrays. In ICML, pages 929–936, 2006.
[25] F. J. Theis. Towards a general independent subspace analysis. In NIPS 19, 2007.