Conference Paper

Correlation Clustering for Learning Mixtures of Canonical Correlation Models.

Source: DBLP

ABSTRACT

This paper addresses the task of analyzing the correlation between two related domains X and Y. Our research is motivated by an Earth Science task that studies the relationship between vegetation and precipitation. A standard statistical technique for such problems is Canonical Correlation Analysis (CCA). A critical limitation of CCA is that it can only detect linear correlation between the two domains that is globally valid throughout both data sets. Our approach addresses this limitation by constructing a mixture of local linear CCA models through a process we name correlation clustering. In correlation clustering, both data sets are clustered simultaneously according to the data's correlation structure such that, within a cluster, domain X and domain Y are linearly correlated in the same way. Each cluster is then analyzed using the traditional CCA to construct local linear correlation models. We present results on both artificial data sets and Earth Science data sets to demonstrate that the proposed approach can detect useful correlation patterns, which traditional CCA fails to discover.
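The limitation described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the cluster labels are assumed to be known in advance, whereas the paper's contribution is to discover them. The helper names (`inv_sqrt`, `canonical_correlations`), the synthetic data, and the mixing matrix `A` are all hypothetical choices made for illustration. Two clusters follow opposite linear relations between X and Y, so a single global CCA sees almost no correlation while a CCA fitted per cluster recovers a strong one.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def canonical_correlations(X, Y, reg=1e-8):
    """Canonical correlations between X and Y, largest first.

    Textbook whitening + SVD formulation; `reg` is a small ridge term
    added for numerical stability (an assumption of this sketch).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    M = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    return np.linalg.svd(M, compute_uv=False)

# Synthetic data (hypothetical): two clusters with opposite linear relations.
rng = np.random.default_rng(0)
n = 500
A = np.array([[1.0, 0.5], [-0.5, 1.0]])
X0 = rng.normal(size=(n, 2))
X1 = rng.normal(size=(n, 2))
Y0 = X0 @ A + 0.1 * rng.normal(size=(n, 2))   # cluster 0: Y is roughly  X A
Y1 = -X1 @ A + 0.1 * rng.normal(size=(n, 2))  # cluster 1: Y is roughly -X A

# Global CCA pools both clusters, so the opposite relations cancel out.
global_rho = canonical_correlations(np.vstack([X0, X1]), np.vstack([Y0, Y1]))[0]

# Local CCA, one model per cluster (labels assumed known here).
local_rho = [canonical_correlations(X0, Y0)[0],
             canonical_correlations(X1, Y1)[0]]

print("global first canonical correlation:", global_rho)
print("per-cluster first canonical correlations:", local_rho)
```

In this sketch the global model reports a weak first canonical correlation while each local model reports a strong one, which is the gap a mixture of local linear CCA models is meant to close.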


Available from: Carla E Brodley
  • Source
    • "There are a number of problems in the Earth Science domain that have a data mining requirement due to the unique challenges posed by the types of data encountered. There have been several recent applications of data mining techniques to Earth Science problems [15] [28] [31] [32] using a variety of data types ranging from remote-sensing data to data obtained from climate models. The land cover change detection problem is also one where data mining techniques can have a significant impact. "
    ABSTRACT: The study of land cover change is an important problem in the Earth Science domain because of its impacts on local climate, radiation balance, biogeochemistry, hydrology, and the diversity and abundance of terrestrial species. Most well-known change detection techniques from statistics, signal processing and control theory are not well-suited for the massive high-dimensional spatio-temporal data sets from Earth Science due to limitations such as high computational complexity and the inability to take advantage of seasonality and spatio-temporal autocorrelation inherent in Earth Science data. In our work, we seek to address these challenges with new change detection techniques that are based on data mining approaches. Specifically, in this paper we have performed a case study for a new change detection technique for the land cover change detection problem. We study land cover change in the state of California, focusing on the San Francisco Bay Area and perform an extended study on the entire state. We also perform a comparative evaluation on forests in the entire state. These results demonstrate the utility of data mining techniques for the land cover change detection problem.
    Full-text · Conference Paper · Jan 2008
  • Source
    • "There are a number of problems in the Earth science domain that have a data mining requirement due to the unique challenges posed by the types of data encountered in the domain. There have been a number of recent applications of data mining techniques to Earth science problems [15] [28] [31] [32] using a variety of data types ranging from remote-sensing data to data obtained from climate models. The land cover change detection problem is also one where data mining techniques can have a significant impact. "
    ABSTRACT: The study of land cover change is an important problem in the Earth science domain because of its impacts on local climate, radiation balance, biogeochemistry, hydrology, and the diversity and abundance of terrestrial species. Data mining and knowledge discovery techniques can aid this effort by efficiently discovering patterns that capture complex interactions between ocean temperature, air pressure, surface meteorology, and terrestrial carbon flux. Most well-known change detection techniques from statistics, signal processing and control theory are not well-suited for the massive high-dimensional spatio-temporal data sets from Earth Science due to limitations such as high computational complexity and the inability to take advantage of seasonality and spatio-temporal autocorrelation inherent in Earth Science data. In our work, we seek to address these challenges with new change detection techniques that are based on data mining approaches. Specifically, in this paper we have performed a case study for a new change detection technique for the land cover change detection problem. We study land cover change in the state of California, focusing on the San Francisco Bay Area, as well as perform an extended study on the entire state. We also perform a comparative evaluation on forests in the entire state. These results demonstrate the utility of data mining techniques for the land cover change detection problem.
    Full-text · Article ·