Conference Paper

Correlation Clustering for Learning Mixtures of Canonical Correlation Models.

Source: DBLP

ABSTRACT This paper addresses the task of analyzing the correlation between two related domains X and Y. Our research is motivated by an Earth Science task that studies the relationship between vegetation and precipitation. A standard statistical technique for such problems is Canonical Correlation Analysis (CCA). A critical limitation of CCA is that it can only detect linear correlation between the two domains that is globally valid throughout both data sets. Our approach addresses this limitation by constructing a mixture of local linear CCA models through a process we name correlation clustering. In correlation clustering, both data sets are clustered simultaneously according to the data's correlation structure such that, within a cluster, domain X and domain Y are linearly correlated in the same way. Each cluster is then analyzed using the traditional CCA to construct local linear correlation models. We present results on both artificial data sets and Earth Science data sets to demonstrate that the proposed approach can detect useful correlation patterns, which traditional CCA fails to discover.
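
The limitation described in the abstract can be illustrated with a small synthetic sketch. It is not the paper's correlation clustering algorithm: the two regimes are labeled by construction here, whereas the proposed method would have to discover them. The data, the mixing matrix `W`, and the helper `first_canonical_correlation` are all hypothetical and chosen only for illustration. In one regime Y follows X linearly, in the other the sign is flipped, so the global cross-covariance roughly cancels while each regime on its own stays strongly linearly correlated.

```python
import numpy as np


def first_canonical_correlation(X, Y):
    """Return the largest canonical correlation between data matrices X and Y.

    Assumes both matrices have full column rank and more rows than columns.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases of the centered column spaces; the singular values of
    # Qx^T Qy are the canonical correlations (QR-based formulation of CCA).
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    singular_values = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(singular_values[0], 1.0))


rng = np.random.default_rng(0)
n = 500  # samples per regime (illustrative choice)
W = np.array([[1.0, 0.5], [0.0, 1.0]])  # hypothetical linear map from X to Y

# Regime A: Y is approximately X W. Regime B: Y is approximately -X W.
X_a = rng.normal(size=(n, 2))
X_b = rng.normal(size=(n, 2))
Y_a = X_a @ W + 0.1 * rng.normal(size=(n, 2))
Y_b = -(X_b @ W) + 0.1 * rng.normal(size=(n, 2))

X = np.vstack([X_a, X_b])
Y = np.vstack([Y_a, Y_b])

# One CCA over all data versus one CCA per (known) regime.
global_r = first_canonical_correlation(X, Y)
local_r_a = first_canonical_correlation(X_a, Y_a)
local_r_b = first_canonical_correlation(X_b, Y_b)

print("global:", global_r)
print("regime A:", local_r_a)
print("regime B:", local_r_b)
```

In this setup the global value should come out small and the two per-regime values close to 1, which is the kind of pattern a single global CCA model cannot represent and a mixture of local models can.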

  • Source
    ABSTRACT: We study the problem of extracting statistical dependencies between multivariate signals, to be used for exploratory analysis of complicated natural phenomena. In particular, we develop generative models for extracting the dependencies, made possible by the probabilistic interpretation of canonical correlation analysis (CCA). We introduce a mixture of robust canonical correlation analyzers, using the t-distribution to make the model robust to outliers and variational Bayesian inference for learning from noisy data. We demonstrate the improvements of the new model on artificial data, and further apply it to analyzing dependencies between MEG and measurements of the autonomic nervous system to illustrate potential use scenarios.
    Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2010, Barcelona, Spain, September 20-24, 2010, Proceedings, Part III; 01/2010
  • Source
    ABSTRACT: The study of land cover change is an important problem in the Earth Science domain because of its impacts on local climate, radiation balance, biogeochemistry, hydrology, and the diversity and abundance of terrestrial species. Most well-known change detection techniques from statistics, signal processing and control theory are not well-suited for the massive high-dimensional spatio-temporal data sets from Earth Science due to limitations such as high computational complexity and the inability to take advantage of seasonality and spatio-temporal autocorrelation inherent in Earth Science data. In our work, we seek to address these challenges with new change detection techniques that are based on data mining approaches. Specifically, in this paper we have performed a case study for a new change detection technique for the land cover change detection problem. We study land cover change in the state of California, focusing on the San Francisco Bay Area and perform an extended study on the entire state. We also perform a comparative evaluation on forests in the entire state. These results demonstrate the utility of data mining techniques for the land cover change detection problem.
    Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA, August 24-27, 2008; 01/2008
