Preprint

GCFAgg: Global and Cross-view Feature Aggregation for Multi-view Clustering

Authors:

Abstract

Multi-view clustering partitions data samples into categories by learning a consensus representation in an unsupervised way, and it has received increasing attention in recent years. However, most existing deep clustering methods learn a consensus representation or view-specific representations from multiple views via view-wise aggregation, which ignores the structural relationships among all samples. In this paper, we propose a novel multi-view clustering network to address this problem, called Global and Cross-view Feature Aggregation for Multi-View Clustering (GCFAggMVC). Specifically, the consensus data representation from multiple views is obtained via cross-sample and cross-view feature aggregation, which fully explores the complementarity of similar samples. Moreover, we align the consensus representation and the view-specific representations with a structure-guided contrastive learning module, which makes the view-specific representations of different samples with a strong structural relationship similar. The proposed module is a flexible multi-view data representation module that can also be applied to incomplete multi-view data clustering by plugging it into other frameworks. Extensive experiments show that the proposed method achieves excellent performance in both complete and incomplete multi-view data clustering tasks.
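The abstract does not say how the cross-sample and cross-view aggregation is computed. As a rough illustration only, the sketch below assumes one plausible reading: view-specific features are concatenated per sample (cross-view), and a softmax-normalized sample-to-sample similarity matrix then mixes the features of similar samples (cross-sample). The function name `aggregate`, the attention-style formulation, and the random projection matrices standing in for learned weights are all assumptions, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def aggregate(views, seed=0):
    """Hypothetical cross-view + cross-sample aggregation.

    views: list of (n_samples, d_v) arrays of view-specific features.
    Returns (h, s): aggregated features (n, d) and the (n, n) weights.
    """
    rng = np.random.default_rng(seed)
    # Cross-view step: concatenate all views for each sample.
    z = np.concatenate(views, axis=1)
    d = z.shape[1]
    # Random projections stand in for weights a real model would learn.
    wq, wk, wv = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    q, k, v = z @ wq, z @ wk, z @ wv
    # Cross-sample step: each row of s weights all samples by similarity.
    s = softmax(q @ k.T / np.sqrt(d), axis=1)
    # Each output row is a weighted mix of the features of all samples.
    h = s @ v
    return h, s
```

In this reading, the matrix `s` plays the role of the structure relationship between samples; a structure-guided contrastive loss could reuse it to decide which pairs of samples to pull together, though the abstract leaves that detail open.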

Article
Multi-view clustering (MVC) has attracted more and more attention in the recent few years by making full use of complementary and consensus information between multiple views to cluster objects into different partitions. Although there have been two existing works for MVC survey, neither of them jointly takes the recent popular deep learning-based methods into consideration. Therefore, in this paper, we conduct a comprehensive survey of MVC from the perspective of representation learning. It covers a quantity of multi-view clustering methods including the deep learning-based models, providing a novel taxonomy of the MVC algorithms. Furthermore, the representation learning-based MVC methods can be mainly divided into two categories, i.e., shallow representation learning-based MVC and deep representation learning-based MVC, where the deep learning-based models are capable of handling more complex data structure as well as showing better expression. In the shallow category, according to the means of representation learning, we further split it into two groups, i.e., multi-view graph clustering and multi-view subspace clustering. To be more comprehensive, basic research materials of MVC are provided for readers, containing introductions of the commonly used multi-view datasets with the download link and the open source code library. In the end, some open problems are pointed out for further investigation and development.
Conference Paper
Multi-view subspace clustering has received widespread attention for effectively fusing multi-view information in multimedia applications. Because the cubic time complexity of most existing approaches makes them difficult to apply to realistic large-scale scenarios, some researchers have addressed this challenge by sampling anchor points to capture distributions in different views. However, the separation of the heuristic sampling and clustering processes leads to weakly discriminative anchor points. Moreover, the complementary multi-view information has not been well utilized since the graphs are constructed independently by the anchors from the corresponding views. To address these issues, we propose a Scalable Multi-view Subspace Clustering with Unified Anchors (SMVSC). To be specific, we combine anchor learning and graph construction into a unified optimization framework. Therefore, the learned anchors can represent the actual latent data distribution more accurately, leading to a more discriminative clustering structure. Most importantly, the linear time complexity of our proposed algorithm allows the multi-view subspace clustering approach to be applied to large-scale data. Then, we design a four-step alternative optimization algorithm with proven convergence. Compared with state-of-the-art multi-view subspace clustering methods and large-scale oriented methods, the experimental results on several datasets demonstrate that our SMVSC method achieves comparable or better clustering performance much more efficiently.
Article
Multi-view graph clustering has been intensively investigated during the past years. However, existing methods are still limited in two main aspects. On the one hand, most of them can not deal with data that have both attributes and graphs. Nowadays, multi-view attributed graph data are ubiquitous and the need for effective clustering methods is growing. On the other hand, many state-of-the-art algorithms are either shallow or deep models. Shallow methods may seriously restrict their capacity for modeling complex data, while deep approaches often involve large number of parameters and are expensive to train in terms of running time and space needed. In this paper, we propose a novel multi-view attributed graph clustering (MAGC) framework, which exploits both node attributes and graphs. Our novelty lies in three aspects. First, instead of deep neural networks, we apply a graph filtering technique to achieve a smooth node representation. Second, the original graph could be noisy or incomplete and is not directly applicable, thus we learn a consensus graph from data by considering the heterogeneous views. Third, high-order relations are explored in a flexible way by designing a new regularizer. Extensive experiments demonstrate the superiority of our method in terms of effectiveness and efficiency.
Article
Multi-view clustering (MVC), which aims to explore the underlying structure of data by leveraging heterogeneous information of different views, has brought along a growth of attention. Multi-view clustering algorithms based on different theories have been proposed and extended in various applications. However, most existing MVC algorithms are shallow models, which learn structure information of multi-view data by mapping multi-view data to low-dimensional representation space directly, ignoring the nonlinear structure information hidden in each view, and thus, the performance of multi-view clustering is weakened to a certain extent. In this paper, we propose a deep multi-view clustering algorithm based on multiple auto-encoder, termed MVC-MAE, to cluster multi-view data. MVC-MAE adopts auto-encoder to capture the nonlinear structure information of each view in a layer-wise manner and incorporate the local invariance within each view and consistent as well as complementary information between any two views together. Besides, we integrate the representation learning and clustering into a unified framework, such that two tasks can be jointly optimized. Extensive experiments on six real-world datasets demonstrate the promising performance of our algorithm compared with 15 baseline algorithms in terms of two evaluation metrics.
Article
Multi-view clustering, which explores complementarity and consistency among multiple distinct feature sets to boost clustering performance, is becoming more and more useful in many real-world applications. Traditional approaches usually map multiple views to a unified embedding, in which some weighting mechanisms are utilized to measure the importance of each view. The embedding, serving as a clustering-friendly representation, is then sent to extra clustering algorithms. However, a unified embedding cannot cover both complementarity and consistency among views, and a weighting scheme that measures the importance of each view as a whole ignores the differences among features in each view. Moreover, without a proper grouping structure constraint imposed on the unified embedding, the result is merely a multi-view representation that is not clustering friendly. In this paper, we propose a novel multi-view clustering method to alleviate the above problems. By dividing the embedding of a view into unified and view-specific vectors explicitly, complementarity and consistency can be reflected. Besides, an adversarial learning process is developed to force the above embeddings to be non-trivial. Then a fusion strategy is automatically learned, which will adaptively adjust weights for all the features in each view. Finally, a Kullback-Leibler (KL) divergence based objective is developed to constrain the fused embedding for clustering-friendly representation learning and to conduct clustering. Extensive experiments have been conducted on various datasets, with the method performing better than state-of-the-art clustering approaches.
Conference Paper
Multi-view clustering has attracted increasing attention in recent years by exploiting common clustering structure across multiple views. Most existing multi-view clustering algorithms use shallow and linear embedding functions to learn the common structure of multi-view data. However, these methods cannot fully utilize the non-linear property of multi-view data, which is important to reveal complex cluster structure underlying multi-view data. In this paper, we propose a novel multi-view clustering method, named Deep Adversarial Multi-view Clustering (DAMC) network, to learn the intrinsic structure embedded in multi-view data. Specifically, our model adopts deep auto-encoders to learn latent representations shared by multiple views, and meanwhile leverages adversarial training to further capture the data distribution and disentangle the latent space. Experimental results on several real-world datasets demonstrate that the proposed method outperforms the state-of-the-art methods.
Article
We present convolutional neural network (CNN) based approaches for unsupervised multimodal subspace clustering. The proposed framework consists of three main stages - multimodal encoder, self-expressive layer, and multimodal decoder. The encoder takes multimodal data as input and fuses them to a latent space representation. We investigate early, late and intermediate fusion techniques and propose three different encoders corresponding to them for spatial fusion. The self-expressive layers and multimodal decoders are essentially the same for different spatial fusion-based approaches. In addition to various spatial fusion-based methods, an affinity fusion-based network is also proposed in which the self-expressiveness layer corresponding to different modalities is enforced to be the same. Extensive experiments on three datasets show that the proposed methods significantly outperform the state-of-the-art multimodal subspace clustering methods.
Article
Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss. In this paper, we propose a novel unsupervised video hashing framework dubbed Self-Supervised Video Hashing (SSVH), that is able to capture the temporal nature of videos in an end-to-end learning-to-hash fashion. We specifically address two central problems: 1) how to design an encoder-decoder architecture to generate binary codes for videos; and 2) how to equip the binary codes with the ability of accurate video retrieval. We design a hierarchical binary autoencoder to model the temporal dependencies in videos with multiple granularities, and embed the videos into binary codes with less computations than the stacked architecture. Then, we encourage the binary codes to simultaneously reconstruct the visual content and neighborhood structure of the videos. Experiments on two real-world datasets (FCVID and YFCC) show that our SSVH method can significantly outperform the state-of-the-art methods and achieve the currently best performance on the task of unsupervised video retrieval.
Technical Report
In this note, we show that k-means clustering can be understood as a constrained matrix factorization problem. This insight will later allow us to recognize that k-means clustering is but a specific latent factor model and closely related to techniques such as non-negative matrix factorization or archetypal analysis.
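To make the matrix factorization view concrete, here is a minimal numerical check, written as a sketch under stated assumptions. It uses the standard identity that, for a data matrix X (one column per sample), a centroid matrix M, and a binary indicator matrix Z with exactly one 1 per column, the k-means objective equals the squared Frobenius norm of X - MZ. The variable names and the toy cluster assignment are chosen for illustration and are not taken from the report.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 12, 3, 4

# Data matrix with one column per sample.
X = rng.normal(size=(d, n))

# A fixed toy assignment that leaves no cluster empty.
labels = np.arange(n) % k

# Indicator matrix Z (k x n): Z[j, i] = 1 iff sample i belongs to cluster j.
Z = np.zeros((k, n))
Z[labels, np.arange(n)] = 1.0

# For fixed Z, the optimal centroids are the cluster means:
# M = X Z^T (Z Z^T)^{-1}, where Z Z^T is diagonal with the cluster sizes.
M = X @ Z.T @ np.linalg.inv(Z @ Z.T)

# Classic k-means objective: squared distance of each sample to its centroid.
kmeans_obj = sum(np.sum((X[:, i] - M[:, labels[i]]) ** 2) for i in range(n))

# Factorization objective: || X - M Z ||_F^2.
factor_obj = np.linalg.norm(X - M @ Z, "fro") ** 2

print(np.isclose(kmeans_obj, factor_obj))
```

The constraint that each column of Z contains a single 1 is what separates k-means from less constrained factorizations such as non-negative matrix factorization.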
Article
With the fast development of information technology, especially the popularization of the internet, multi-view learning has become increasingly popular in the machine learning and data mining fields. Multi-view semi-supervised learning, such as co-training and co-regularization, has gained considerable attention. Although multi-view clustering (MVC) has recently developed rapidly, there is no survey or review that summarizes and analyzes the current progress. Therefore, this paper sums up the common strategies for combining multiple views and, based on these, proposes a novel taxonomy of MVC approaches. We also discuss the relationships between MVC and multi-view representation, ensemble clustering, multi-task clustering, multi-view supervised and multi-view semi-supervised learning. Several representative real-world applications are elaborated. To promote the further development of MVC, we point out several open problems that are worth exploring in the future.
Article
Bacteria and Archaea display a variety of phenotypic traits and can adapt to diverse ecological niches. However, systematic annotation of prokaryotic phenotypes is lacking. We have therefore developed Pro-Traits, a resource containing ∼545 000 novel phenotype inferences, spanning 424 traits assigned to 3046 bacterial and archaeal species. These annotations were assigned by a computational pipeline that associates microbes with phenotypes by text-mining the scientific literature and the broader World Wide Web, while also being able to define novel concepts from unstructured text. Moreover, the Pro-Traits pipeline assigns phenotypes by drawing extensively on comparative genomics, capturing patterns in gene repertoires, codon usage biases, proteome composition and co-occurrence in metagenomes. Notably, we find that gene synteny is highly predictive of many phenotypes, and highlight examples of gene neighborhoods associated with spore-forming ability. A global analysis of trait interrelatedness outlined clusters in the microbial phenotype network, suggesting common genetic underpinnings. Our extended set of phenotype annotations allows detection of 57 088 high confidence gene-trait links, which recover many known associations involving sporulation, flagella, catalase activity, aerobicity, photosynthesis and other traits. Over 99% of the commonly occurring gene families are involved in genetic interactions conditional on at least one phenotype, suggesting that epistasis has a major role in shaping microbial gene content.
Article
In many clustering problems, we have access to multiple views of the data, each of which could be individually used for clustering. Exploiting information from multiple views, one can hope to find a clustering that is more accurate than the ones obtained using the individual views. Often these different views admit the same underlying clustering of the data, so we can approach this problem by looking for clusterings that are consistent across the views, i.e., corresponding data points in each view should have the same cluster membership. We propose a spectral clustering framework that achieves this goal by co-regularizing the clustering hypotheses, and propose two co-regularization schemes to accomplish this. Experimental comparisons with a number of baselines on two synthetic and three real-world datasets establish the efficacy of our proposed approaches.
Article
Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a Drosophila melanogaster embryo delivers the detailed spatio-temporal patterns of the gene expression. Many related biological problems such as the detection of co-expressed genes, co-regulated genes and transcription factor binding motifs rely heavily on the analysis of these image patterns. To provide text-based pattern searching that facilitates related biological studies, the images in the Berkeley Drosophila Genome Project (BDGP) study are annotated manually by domain experts with developmental stage terms and anatomical ontology terms. Due to the rapid increase in the number of such images and the inevitable bias in annotations by human curators, it is necessary to develop an automatic method to recognize the developmental stage and annotate anatomical terms. In this article, we propose a novel computational model for joint stage classification and anatomical term annotation of Drosophila gene expression patterns. We propose a novel Tri-Relational Graph (TG) model that comprises the data graph, anatomical term graph, and developmental stage term graph, and connects them by two additional graphs induced from stage or annotation label assignments. Upon the TG model, we introduce a Preferential Random Walk (PRW) method to jointly recognize developmental stage and annotate anatomical terms by utilizing the interrelations between the two tasks. The experimental results on two refined BDGP datasets demonstrate that our joint learning method can achieve better prediction results on both tasks than the state-of-the-art methods. http://ranger.uta.edu/%7eheng/Drosophila/.
Conference Paper
The Kullback-Leibler (KL) divergence is a widely used tool in statistics and pattern recognition. The KL divergence between two Gaussian mixture models (GMMs) is frequently needed in the fields of speech and image recognition. Unfortunately, the KL divergence between two GMMs is not analytically tractable, nor does any efficient computational algorithm exist. Some techniques cope with this problem by replacing the KL divergence with other functions that can be computed efficiently. We introduce two new methods, the variational approximation and the variational upper bound, and compare them to existing methods. We discuss seven different techniques in total and weigh the benefits of each one against the others. To conclude, we evaluate the performance of each one through numerical experiments.
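Since no closed form exists, a common baseline is to estimate the divergence by Monte Carlo sampling: draw samples from the first mixture and average the log-density ratio. The sketch below shows that generic estimator for one-dimensional mixtures. It is not the variational approximation or the variational upper bound the abstract introduces, and the function names and the `(weights, means, stds)` tuple layout are assumptions made for this example.

```python
import numpy as np


def gmm_logpdf(x, weights, means, stds):
    """Log-density of a 1-D Gaussian mixture at each point in x."""
    x = np.asarray(x, dtype=float)[:, None]
    # Log of each weighted component density, shape (n_points, n_components).
    log_comp = (
        np.log(weights)
        - 0.5 * np.log(2.0 * np.pi)
        - np.log(stds)
        - 0.5 * ((x - means) / stds) ** 2
    )
    # Log-sum-exp over components for numerical stability.
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True)))[:, 0]


def gmm_sample(n, weights, means, stds, rng):
    """Draw n samples: pick a component per sample, then sample from it."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[comp], stds[comp])


def kl_monte_carlo(f, g, n=100_000, seed=0):
    """Estimate KL(f || g) as the mean of log f(x) - log g(x) for x ~ f.

    f and g are (weights, means, stds) tuples of 1-D NumPy arrays.
    """
    rng = np.random.default_rng(seed)
    x = gmm_sample(n, *f, rng)
    return float(np.mean(gmm_logpdf(x, *f) - gmm_logpdf(x, *g)))
```

The estimator is unbiased and converges as the sample count grows, but its cost scales with the number of samples, which is the kind of trade-off that motivates faster approximations.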
Article
The existing deep multiview clustering (MVC) methods are mainly based on autoencoder networks, which seek common latent variables to reconstruct the original input of each view individually. However, due to the view-specific reconstruction loss, it is challenging to extract consistent latent representations over multiple views for clustering. To address this challenge, we propose adversarial MVC (AMvC) networks in this article. The proposed AMvC generates each view's samples conditioning on the fused latent representations among different views to encourage a more consistent clustering structure. Specifically, multiview encoders are used to extract latent descriptions from all the views, and the corresponding generators are used to generate the reconstructed samples. The discriminative networks and the mean squared loss are jointly utilized for training the multiview encoders and generators to balance the distinctness and consistency of each view's latent representation. Moreover, an adaptive fusion layer is developed to obtain a shared latent representation, on which a clustering loss and the $l_{1,2}$-norm constraint are further imposed to improve clustering performance and distinguish the latent space. Experimental results on video, image, and text datasets demonstrate that our AMvC is more effective than several state-of-the-art deep MVC methods.
Article
Recently, multi-view clustering has received much attention in the fields of machine learning and pattern recognition. Spectral clustering for single and multiple views has been the common solution. Despite its good clustering performance, it has a major limitation: it requires an extra step of clustering. This extra step, which could be the famous k-means clustering, depends heavily on initialization, which may affect the quality of the clustering result. To overcome this problem, a new method called Multi-view Clustering via Consensus Graph Learning and Nonnegative Embedding (MVCGE) is presented in this paper. In the proposed approach, the consensus affinity matrix (graph matrix), consensus representation and cluster index matrix (nonnegative embedding) are learned simultaneously in a unified framework. Our proposed method takes as input the different kernel matrices corresponding to the different views. The proposed learning model integrates two interesting constraints: (i) the cluster indices should be as smooth as possible over the consensus graph and (ii) the cluster indices are set to be as close as possible to the graph convolution of the consensus representation. In this approach, no post-processing such as k-means or spectral rotation is required. Our approach is tested with real and synthetic datasets. The experiments performed show that the proposed method performs well compared to many state-of-the-art approaches.
Article
Multi-view clustering, which exploits the multi-view information to partition data into their clusters, has attracted intense attention. However, most existing methods directly learn a similarity graph from original multi-view features, which inevitably contain noise and redundant information. The learned similarity graph is inaccurate and is insufficient to depict the underlying cluster structure of multi-view data. To address this issue, we propose a novel multi-view clustering method that is able to construct an essential similarity graph in a spectral embedding space instead of the original feature space. Concretely, we first obtain multiple spectral embedding matrices from the view-specific similarity graphs, and reorganize the gram matrices constructed by the inner product of the normalized spectral embedding matrices into a tensor. Then, we impose a weighted tensor nuclear norm constraint on the tensor to capture high-order consistent information among multiple views. Furthermore, we unify the spectral embedding and low-rank tensor learning into a single optimization framework to determine the spectral embedding matrices and tensor representation jointly. Finally, we obtain the consensus similarity graph from the gram matrices in an adaptive neighbor manner. An efficient optimization algorithm is designed to solve the resultant optimization problem. Extensive experiments on six benchmark datasets are conducted to verify the efficacy of the proposed method. The code is implemented using MATLAB R2018a and the MindSpore library: https://github.com/guanyuezhen/CGL.
Article
Under the assumption that data samples can be reconstructed with the dictionary formed by themselves, recent multi-view subspace clustering algorithms aim to find a consensus reconstruction matrix by exploring complementary information across multiple views. Most of them operate directly on the original data observations without pre-processing, while others operate on the corresponding kernel matrices. However, both ignore that the collected features may be designed arbitrarily and can hardly be guaranteed to be independent and non-overlapping. As a result, original data observations and kernel matrices would contain a large number of redundant details. To address this issue, we propose a multi-view subspace clustering algorithm which groups samples and removes data redundancy concurrently. Specifically, eigen-decomposition is employed to obtain a robust, low-redundancy data representation for later clustering. By integrating the two processes into a unified model, clustering results guide eigen-decomposition to generate a more discriminative data representation, which, as feedback, helps obtain better clustering results. Additionally, an alternating and convergent algorithm is designed to solve the optimization problem. Extensive experiments are conducted on eight benchmarks, and the proposed algorithm outperforms comparative ones in recent literature by a large margin, verifying its superiority. At the same time, its effectiveness, computational efficiency and robustness to noise are validated experimentally.
Article
Multiview clustering has aroused increasing attention in recent years since real-world data are always comprised of multiple features or views. Despite the existing clustering methods having achieved promising performance, there still remain some challenges to be solved: 1) most existing methods are unscalable to large-scale datasets due to the high computational burden of eigendecomposition or graph construction and 2) most methods learn latent representations and cluster structures separately. Such a two-step learning scheme neglects the correlation between the two learning stages and may obtain a suboptimal clustering result. To address these challenges, a pseudo-label guided collective matrix factorization (PLCMF) method that jointly learns latent representations and cluster structures is proposed in this article. The proposed PLCMF first performs clustering on each view separately to obtain pseudo-labels that reflect the intraview similarities of each view. Then, it adds a pseudo-label constraint on collective matrix factorization to learn unified latent representations, which preserve the intraview and interview similarities simultaneously. Finally, it intuitively incorporates latent representation learning and cluster structure learning into a joint framework to directly obtain clustering results. Besides, the weight of each view is learned adaptively according to data distribution in the joint framework. In particular, the joint learning problem can be solved with an efficient iterative updating method with linear complexity. Extensive experiments on six benchmark datasets indicate the superiority of the proposed method over state-of-the-art multiview clustering methods in both clustering accuracy and computational efficiency.
Article
Multi-view clustering is an important approach for analyzing multi-view data in an unsupervised way. Among various methods, the multi-view subspace clustering approach has gained increasing attention due to its encouraging performance. Essentially, it integrates multi-view information into graphs, which are then fed into a spectral clustering algorithm for final results. However, its performance may degrade due to noise in each individual view or inconsistencies between heterogeneous features. Orthogonal to current work, we propose to fuse multi-view information in a partition space, which enhances the robustness of multi-view clustering. Specifically, we generate multiple partitions and integrate them to find a shared partition. The proposed model unifies graph learning, generation of basic partitions, and view weight learning. These three components co-evolve towards better quality outputs. We have conducted comprehensive experiments on benchmark datasets and our empirical results verify the effectiveness and robustness of our approach.
Article
Multi-view clustering has attracted increasing attention recently by utilizing information from multiple views. However, existing multi-view clustering methods either have high computation and space complexities or lack representation capability. To address these issues, we propose deep embedded multi-view clustering with collaborative training (DEMVC) in this paper. Firstly, the embedded representations of multiple views are learned individually by deep autoencoders. Then, both the consensus and the complementarity of multiple views are taken into account and a novel collaborative training scheme is proposed. Concretely, the feature representations and cluster assignments of all views are learned collaboratively. A new consistency strategy for cluster center initialization is further developed to improve the multi-view clustering performance with collaborative training. Experimental results on several popular multi-view datasets show that DEMVC achieves significant improvements over state-of-the-art methods.
Article
Recently, an increasingly pervasive trend in real-world applications is that the data are collected from multiple sources or represented by multiple views. Owing to the powerful ability of the affinity graph to capture the structural relationships among samples, constructing a robust and meaningful affinity graph has been extensively studied, especially in spectral clustering tasks. However, conventional spectral clustering extended to multi-view scenarios cannot obtain satisfactory performance due to the presence of noise and the heterogeneity among different views. In this paper, we propose a robust affinity graph learning framework to deal with this issue. First, we employ an improved feature selection algorithm that integrates the advantages of hypergraph embedding and sparse regression to select significant features, so that more robust graph Laplacian matrices can be constructed for the various views. Second, we model hypergraph Laplacians as points on a Grassmann manifold and propose a Consistent Affinity Graph Learning (CAGL) algorithm to fuse all views. CAGL aims to learn a latent common affinity matrix shared by all Laplacian matrices by taking both the clustering quality evaluation criterion and the view consistency loss into account. We also develop an alternating descent algorithm to optimize the objective function of CAGL. Experiments on five publicly available datasets demonstrate that our proposed method obtains promising results compared with state-of-the-art methods.
Conference Paper
In recent years, incomplete multi-view clustering, which studies the challenging multi-view clustering problem on missing views, has received growing research interest. Although a series of methods have been proposed to address this issue, the following problems still exist: 1) Almost all of the existing methods are based on shallow models, which makes it difficult to obtain discriminative common representations. 2) These methods are generally sensitive to noise or outliers since the negative samples are treated equally as the important samples. In this paper, we propose a novel incomplete multi-view clustering network, called Cognitive Deep Incomplete Multi-view Clustering Network (CDIMC-net), to address these issues. Specifically, it captures the high-level features and local structure of each view by incorporating the view-specific deep encoders and graph embedding strategy into a framework. Moreover, based on human cognition, i.e., learning from easy to hard, it introduces a self-paced strategy to select the most confident samples for model training, which can reduce the negative influence of outliers. Experimental results on several incomplete datasets show that CDIMC-net outperforms the state-of-the-art incomplete multi-view clustering methods.
Article
An important underlying assumption that guides the success of the existing multiview learning algorithms is the full observation of the multiview data. However, such rigorous precondition clearly violates the common-sense knowledge in practical applications, where in most cases, only incomplete fractions of the multiview data are given. The presence of the incomplete settings generally disables the conventional multiview clustering methods. In this article, we propose a simple but effective incomplete multiview clustering (IMC) framework, which simultaneously considers the local geometric information and the unbalanced discriminating powers of these incomplete multiview observations. Specifically, a novel graph-regularized matrix factorization model, on the one hand, is developed to preserve the local geometric similarities of the learned common representations from different views. On the other hand, the semantic consistency constraint is introduced to stimulate these view-specific representations toward a unified discriminative representation. Moreover, the importance of different views is adaptively determined to reduce the negative influence of the unbalanced incomplete views. Furthermore, an efficient learning algorithm is proposed to solve the resulting optimization problem. Extensive experimental results performed on several incomplete multiview datasets demonstrate that the proposed method can achieve superior clustering performance in comparison with some state-of-the-art multiview learning methods.
Article
Although numerous multi-view spectral clustering algorithms have been developed, most of them suffer from two deficiencies: high time cost and inferior operability. To this end, in this work we provide a simple yet effective method for multi-view spectral clustering. The main idea is to learn a consistent similarity matrix with sparse structure from multiple views. We show that the proposed method is fast, straightforward to implement, and can achieve comparable or better clustering results compared to several state-of-the-art algorithms. Furthermore, the computational complexity of the proposed method is approximately equivalent to that of single-view spectral clustering. Given these advantages, it can be considered as a baseline for multi-view spectral clustering.
Conference Paper
In multiview learning, it is essential to assign a reasonable weight to each view according to its importance. Thus, for the multiview clustering task, a wise and elegant method should cluster multiview data while learning the view weights. In this paper, we address this problem by exploring a Laplacian rank constrained graph, which can be approximated as the centroid of the graphs built for each view with different confidences. We start our work with the natural thought that the weights can be learned by introducing a hyperparameter. By analyzing its weakness, we further propose a new multiview clustering method which is totally self-weighted. Furthermore, once the target graph is obtained in our models, we can directly assign the cluster label to each data point and do not need any postprocessing such as K-means in standard spectral clustering. Evaluations on two synthetic datasets demonstrate the effectiveness of our methods. Compared with several representative graph-based multiview clustering approaches on four real-world datasets, experimental results demonstrate that the proposed methods achieve better performance and that our new clustering method is more practical to use.
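The idea of a consensus graph acting as a confidence-weighted centroid of the per-view graphs can be sketched as follows. This is an illustrative sketch only: the inverse-distance weight rule used here is an assumption, the Laplacian rank constraint mentioned in the abstract is omitted entirely, and the names `self_weighted_fusion`, `weighted_centroid`, and `frobenius_distance` are hypothetical.

```python
import math


def frobenius_distance(a, b):
    """Frobenius norm of the difference between two square matrices (lists of rows)."""
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def weighted_centroid(graphs, weights):
    """Entry-wise weighted average of the per-view graphs."""
    total = sum(weights)
    n = len(graphs[0])
    return [[sum(w * g[i][j] for g, w in zip(graphs, weights)) / total
             for j in range(n)] for i in range(n)]


def self_weighted_fusion(graphs, iterations=20, eps=1e-12):
    """Alternate between updating the consensus graph and the view weights.

    Assumed weight rule: w_v = 1 / (2 * ||S - A_v||_F), so views that lie
    closer to the consensus S receive larger weights, with no hyperparameter.
    """
    weights = [1.0 / len(graphs)] * len(graphs)
    for _ in range(iterations):
        consensus = weighted_centroid(graphs, weights)
        weights = [1.0 / (2.0 * (frobenius_distance(consensus, g) + eps)) for g in graphs]
    return consensus, weights


# Hypothetical toy data: two views agree, the third is a noisy view.
graphs = [
    [[0.0, 1.0], [1.0, 0.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[0.0, 0.2], [0.2, 0.0]],
]
consensus, weights = self_weighted_fusion(graphs)
print(consensus, weights)
```

In this toy setting the two agreeing views end up with much larger weights than the noisy view, and the consensus graph moves toward them.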
Conference Paper
Recognizing visual content in unconstrained videos has become a very important problem for many applications. Existing corpora for video analysis lack scale and/or content diversity, which has limited the needed progress in this critical area. In this paper, we describe and release a new database called CCV, containing 9,317 web videos over 20 semantic categories, including events like "baseball" and "parade", scenes like "beach", and objects like "cat". The database was collected with extra care to ensure relevance to consumer interest and originality of video content without post-editing. Such videos typically have very little textual annotation and thus can benefit from the development of automatic content analysis techniques. We used the Amazon MTurk platform to perform manual annotation, and studied the behaviors and performance of human annotators on MTurk. We also compared the abilities of humans and machines in understanding consumer video content. For the latter, we implemented automatic classifiers using a state-of-the-art multi-modal approach that achieved top performance in a recent TRECVID multimedia event detection task. Results confirmed that classifiers fusing audio and video features significantly outperform single-modality solutions. We also found that humans are much better at understanding categories of nonrigid objects such as "cat", while current automatic techniques are relatively close to humans in recognizing categories that have distinctive background scenes or audio patterns.
Article
High-dimensional data can be converted to low-dimensional codes by training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors. Gradient descent can be used for fine-tuning the weights in such "autoencoder" networks, but this works well only if the initial weights are close to a good solution. We describe an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.
A simple framework for contrastive learning of visual representations
  • Ting Chen
  • Simon Kornblith
  • Mohammad Norouzi
  • Geoffrey Hinton
Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597-1607. PMLR, 2020.
Fast multi-view clustering via ensembles: Towards scalability, superiority, and simplicity
  • Dong Huang
  • Chang-Dong Wang
  • Jian-Huang Lai
Dong Huang, Chang-Dong Wang, and Jian-Huang Lai. Fast multi-view clustering via ensembles: Towards scalability, superiority, and simplicity. arXiv preprint arXiv:2203.11572, 2022.
Large-scale multi-view subspace clustering in linear time
  • Zhao Kang
  • Wangtao Zhou
  • Zhitong Zhao
  • Junming Shao
  • Meng Han
  • Zenglin Xu
Zhao Kang, Wangtao Zhou, Zhitong Zhao, Junming Shao, Meng Han, and Zenglin Xu. Large-scale multi-view subspace clustering in linear time. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 4412-4419, 2020.
Adam: A method for stochastic optimization
  • Diederik P Kingma
  • Jimmy Ba
Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
Information theory, inference and learning algorithms
  • David JC MacKay
David JC MacKay. Information theory, inference and learning algorithms. Cambridge University Press, 2003.
Multi-view contrastive graph clustering
  • Erlin Pan
  • Zhao Kang
Erlin Pan and Zhao Kang. Multi-view contrastive graph clustering. Advances in neural information processing systems, 34:2148-2159, 2021.
PyTorch: An imperative style, high-performance deep learning library
  • Adam Paszke
  • Sam Gross
  • Francisco Massa
  • Adam Lerer
  • James Bradbury
  • Gregory Chanan
  • Trevor Killeen
  • Zeming Lin
  • Natalia Gimelshein
  • Luca Antiga
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
COMIC: Multi-view clustering without parameter selection
  • Xi Peng
  • Zhenyu Huang
  • Jiancheng Lv
  • Hongyuan Zhu
  • Joey Tianyi Zhou
Xi Peng, Zhenyu Huang, Jiancheng Lv, Hongyuan Zhu, and Joey Tianyi Zhou. COMIC: Multi-view clustering without parameter selection. In International conference on machine learning, pages 5092-5101, 2019.
Deep safe incomplete multi-view clustering: Theorem and algorithm
  • Huayi Tang
  • Yong Liu
Huayi Tang and Yong Liu. Deep safe incomplete multi-view clustering: Theorem and algorithm. In Proceedings of the 39th International Conference on Machine Learning, volume 162, pages 21090-21110, 2022.
Contrastive multiview coding
  • Yonglong Tian
  • Dilip Krishnan
  • Phillip Isola
Yonglong Tian, Dilip Krishnan, and Phillip Isola. Contrastive multiview coding. In European conference on computer vision, pages 776-794. Springer, 2020.
Reconsidering representation alignment for multi-view clustering
  • Daniel J Trosten
  • Sigurd Lokse
  • Robert Jenssen
  • Michael Kampffmeyer
Daniel J Trosten, Sigurd Lokse, Robert Jenssen, and Michael Kampffmeyer. Reconsidering representation alignment for multi-view clustering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1255-1265, 2021.
Visualizing data using t-SNE
  • Laurens Van Der Maaten
  • Geoffrey Hinton
Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of machine learning research, 9(11), 2008.
Attention is all you need
  • Ashish Vaswani
  • Noam Shazeer
  • Niki Parmar
  • Jakob Uszkoreit
  • Llion Jones
  • Aidan N Gomez
  • Łukasz Kaiser
  • Illia Polosukhin
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.