Preprint

Incomplete Multi-view Clustering via Diffusion Contrastive Generation

Abstract

Incomplete multi-view clustering (IMVC) has garnered increasing attention in recent years because missing data is a common issue in multi-view datasets. The primary approach to this challenge is to recover the missing views before applying conventional multi-view clustering methods. Although imputation-based IMVC methods have achieved significant improvements, they still have notable limitations: 1) they rely heavily on paired data to train the data recovery module, which is impractical in real scenarios with high missing rates; 2) the generated data often lack diversity and discriminability, resulting in suboptimal clustering results. To address these shortcomings, we propose a novel IMVC method called Diffusion Contrastive Generation (DCG). Motivated by the consistency between the diffusion and clustering processes, DCG applies forward diffusion and reverse denoising processes to intra-view data, learning distribution characteristics that enhance clustering. By performing contrastive learning on a limited set of paired multi-view samples, DCG aligns the generated views with the real views, facilitating accurate recovery of views across arbitrary missing-view scenarios. Additionally, DCG integrates instance-level and category-level interactive learning to exploit the consistent and complementary information available in multi-view data, achieving robust, end-to-end clustering. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches.
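
The two generic building blocks named in the abstract, forward diffusion and contrastive alignment of generated views with real views, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the standard DDPM closed-form forward process and an InfoNCE-style loss with cosine similarity, and every function name, the linear noise schedule, and the temperature value are hypothetical choices made for illustration only.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed schedule, not taken from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I).

    `t` is a 0-based step index. Returns the noised vector and the noise used.
    """
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(generated, real, temperature=0.5):
    """InfoNCE-style loss: generated[i] and real[i] form the positive pair,
    and every real[j] with j != i acts as a negative."""
    n = len(generated)
    total = 0.0
    for i in range(n):
        logits = [cosine(generated[i], real[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(100))
    x_t, eps = forward_diffuse([1.0, -2.0, 0.5], t=50, alpha_bars=alpha_bars, rng=rng)
    print("noised sample:", x_t)
    print("loss, aligned pairs:", info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
    print("loss, swapped pairs:", info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]))
```

In this sketch the loss is lower when each generated representation points in the same direction as its paired real representation than when the pairs are swapped, which is the behavior a contrastive alignment objective relies on. The reverse denoising network and the clustering heads are omitted because the abstract does not specify them.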
