Conference Paper

Correlation Clustering for Learning Mixtures of Canonical Correlation Models.

Source: DBLP

ABSTRACT: This paper addresses the task of analyzing the correlation between two related domains X and Y. Our research is motivated by an Earth Science task that studies the relationship between vegetation and precipitation. A standard statistical technique for such problems is Canonical Correlation Analysis (CCA). A critical limitation of CCA is that it can only detect linear correlation between the two domains that is globally valid throughout both data sets. Our approach addresses this limitation by constructing a mixture of local linear CCA models through a process we name correlation clustering. In correlation clustering, both data sets are clustered simultaneously according to the data's correlation structure such that, within a cluster, domain X and domain Y are linearly correlated in the same way. Each cluster is then analyzed using traditional CCA to construct local linear correlation models. We present results on both artificial data sets and Earth Science data sets to demonstrate that the proposed approach can detect useful correlation patterns that traditional CCA fails to discover.
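The abstract's central point, that a single global CCA can miss linear relationships that hold only within subsets of the data, can be illustrated with a small NumPy sketch. It assumes paired samples stored as rows of `X` and `Y` and computes canonical correlations with standard linear CCA (whitening plus SVD). The clustering step is a hypothetical k-means-style alternation used as a stand-in for correlation clustering: fit one least-squares map from X to Y per cluster, then reassign each pair to the map that predicts it best. The reassignment criterion, the no-intercept fit, and all function names are assumptions for illustration, not the authors' procedure.

```python
import numpy as np


def canonical_correlations(X, Y, reg=1e-8):
    """Canonical correlations between paired blocks X (n x p) and Y (n x q).

    Standard linear CCA: whiten each block by its covariance; the singular
    values of the whitened cross-covariance are the canonical correlations.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])  # small ridge for stability
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n

    def inv_sqrt(C):
        # Symmetric inverse square root via eigendecomposition.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    M = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    return np.linalg.svd(M, compute_uv=False)  # descending order


def correlation_clustering(X, Y, k, n_iter=50, seed=0):
    """Hypothetical sketch (assumed criterion, not the paper's): alternate
    between fitting one linear map Y ~ X @ B_j per cluster (through the
    origin, i.e. assuming centred data) and reassigning each pair (x_i, y_i)
    to the cluster whose map predicts y_i with the smallest squared error."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    labels = rng.integers(k, size=n)  # random initial partition
    for _ in range(n_iter):
        resid = np.full((n, k), np.inf)
        for j in range(k):
            mask = labels == j
            if mask.sum() < X.shape[1]:
                continue  # too few points to fit this cluster's map
            B, *_ = np.linalg.lstsq(X[mask], Y[mask], rcond=None)
            resid[:, j] = np.sum((Y - X @ B) ** 2, axis=1)
        new_labels = resid.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # partition is stable
        labels = new_labels
    return labels


# Synthetic demo: two equal groups with opposite linear relations.
rng = np.random.default_rng(42)
x = rng.standard_normal((2000, 1))
sign = np.where(np.arange(2000) < 1000, 1.0, -1.0)[:, None]
y = sign * x + 0.1 * rng.standard_normal((2000, 1))

global_rho = canonical_correlations(x, y)[0]
labels = correlation_clustering(x, y, k=2)
local_rho = [float(canonical_correlations(x[labels == j], y[labels == j])[0])
             for j in range(2)]
print(f"global CCA: {global_rho:.2f}")
print("per-cluster CCA:", [round(r, 2) for r in local_rho])
```

In this toy data one group follows y = x and the other y = -x, so the pooled cross-covariance nearly cancels and the global canonical correlation is close to zero, while each recovered cluster shows a strong linear correlation. The demo is one-dimensional for simplicity; in higher dimensions this kind of alternation can stall in poor local optima, so random restarts would be a sensible addition.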

  • ABSTRACT: Canonical correlation analysis (CCA) is a well-known technique for extracting linearly correlated features from multiple views (i.e., sets of features) of data. Recently, a locality-preserving CCA, named LPCCA, has been developed to incorporate neighborhood information into CCA. Although LPCCA has been shown to reveal the intrinsic data structure better than CCA, its discriminative power for subsequent classification is low on high-dimensional data sets such as face databases. In this paper, we propose an alternative formulation for integrating neighborhood information into CCA and derive a new locality-preserving CCA algorithm called ALPCCA, which can better discover the local manifold structure of data and further enhance the discriminative power for high-dimensional classification. Experimental results on both synthetic and real-world data sets, including a multiple-feature data set and face databases, validate the effectiveness of the proposed method.
    Neural Processing Letters 04/2012; 37(2). · 1.24 Impact Factor
  • ABSTRACT: Although weather regimes are often used as a first step in many statistical downscaling processes, they are usually defined solely in terms of atmospheric variables and are seldom chosen to maximize their correlation with observed local meteorological phenomena. This paper compares different clustering methods for performing such a task. The correlation clustering model is introduced to define regimes that are well correlated with local-scale precipitation observed at seven French Mediterranean rain gauges. This clustering method is compared with other approaches such as the k-means and "expectation-maximization" (EM) algorithms. The latter two are applied either to the main principal components of large-scale reanalysis data (geopotential height at 500 mbar and sea level pressure) covering the Mediterranean basin, or to the large-scale canonical variates resulting from a canonical correlation analysis performed on the reanalyses and local precipitation. The weather regimes obtained by the different approaches are compared, with a focus on the "extreme content" captured within the regimes. Cost functions are then developed to quantify the errors due to misclassification in terms of local precipitation. The different clustering approaches show different misclassification rates and costs. EM applied to canonical variates appears to be a good compromise between the other approaches, with high discrimination, especially for extreme precipitation, while the precipitation costs due to misclassification remain acceptable. This paper provides tools to help users choose a clustering method according to the expected goal and the intended use of the weather regimes.
    Journal of Geophysical Research Atmospheres 01/2010; 115. · 3.44 Impact Factor


Available from
Jun 3, 2014