Conference Paper

K-Means Clustering in Dual Space for Unsupervised Feature Partitioning in Multi-view Learning

Conference Paper
Full-text available
Although tagging simplifies resource browsing and retrieval, it suffers from several issues, among them redundancy and ambiguity. In this work we focus on the problem of resolving tag word-sense ambiguity within a typical semi-automatic tagging procedure. In that process a user proposes a tag for a resource; if the tag is found to be related to more than one context, the user is presented with two or more cues among which to choose, so as to remove the tag's ambiguity. Key phases in such a disambiguation procedure are ambiguous-tag detection and cue discovery, both of which should rely on effective word-to-context relatedness metrics. Among the most effective relatedness metrics are those defined on the basis of a feature-vector representation of the words. In this work we compare different word-to-context relatedness metrics in terms of their effectiveness within the disambiguation process. We propose a metric derived from a maximum-likelihood estimator of the Jensen-Shannon divergence among feature-count histograms, and we show that this metric performs better, in terms of output quality, than both the Jensen-Shannon and the symmetrized Kullback-Leibler divergence between histograms. We study the relative gain in quality within the task of unsupervised cue discovery using a synthetic language corpus.
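As a concrete point of reference for the metrics being compared, the plain Jensen-Shannon divergence between two feature-count histograms (not the paper's maximum-likelihood estimator) can be sketched as follows; the function name and the base-2 logarithm are illustrative choices:

```python
import numpy as np

def jensen_shannon_divergence(p_counts, q_counts):
    """JSD between two feature-count histograms (base-2 logs, so the
    result lies in [0, 1]). Counts are normalized to distributions;
    zero-probability entries are masked out of the KL terms."""
    p = np.asarray(p_counts, dtype=float)
    q = np.asarray(q_counts, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Identical histograms give 0, disjoint ones give 1, which is the range the relatedness comparison relies on.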
Conference Paper
Full-text available
Folksonomies - networks of users, resources, and tags - allow users to easily retrieve, organize and browse web contents. However, their advantages are still limited by the noisiness of user-provided tags. To overcome this problem, we propose an approach for identifying related tags in folksonomies. The approach uses tag co-occurrence statistics and Laplacian-score feature selection to create a probability distribution for each tag. Related tags are then determined according to the distance between their distributions. To this end, we propose a distance metric based on the Jensen-Shannon divergence. The new metric, named AJSD, deals with noise in the measurements due to statistical fluctuations in tag co-occurrences. We experimentally evaluated our approach using WordNet and compared it to a common tag-relatedness approach based on the cosine similarity. The results show the effectiveness of our approach and its advantage over the competing method.
Article
Full-text available
In many clustering problems, we have access to multiple views of the data, each of which could be used individually for clustering. By exploiting information from multiple views, one can hope to find a clustering that is more accurate than the ones obtained using the individual views. Often these different views admit the same underlying clustering of the data, so we can approach this problem by looking for clusterings that are consistent across the views, i.e., corresponding data points in each view should have the same cluster membership. We propose a spectral clustering framework that achieves this goal by co-regularizing the clustering hypotheses, and propose two co-regularization schemes to accomplish this. Experimental comparisons with a number of baselines on two synthetic and three real-world datasets establish the efficacy of our proposed approaches.
Article
Full-text available
This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings. We first identify several application scenarios for the resultant 'knowledge reuse' framework that we call cluster ensembles. The cluster ensemble problem is then formalized as a combinatorial optimization problem in terms of shared mutual information. In addition to a direct maximization approach, we propose three effective and efficient techniques for obtaining high-quality combiners (consensus functions). The first combiner induces a similarity measure from the partitionings and then reclusters the objects. The second combiner is based on hypergraph partitioning. The third one collapses groups of clusters into meta-clusters which then compete for each object to determine the combined clustering. Due to the low computational costs of our techniques, it is quite feasible to use a supra-consensus function that evaluates all three approaches against the objective function and picks the best solution for a given situation. We evaluate the effectiveness of cluster ensembles in three qualitatively different application scenarios: (i) where the original clusters were formed based on non-identical sets of features, (ii) where the original clustering algorithms worked on non-identical sets of objects, and (iii) where a common data-set is used and the main purpose of combining multiple clusterings is to improve the quality and robustness of the solution. Promising results are obtained in all three situations for synthetic as well as real data-sets.
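The first combiner above (induce a similarity measure from the partitionings, then recluster) can be sketched minimally as follows; thresholded connected components stand in for a proper reclustering step, and all names are illustrative:

```python
from collections import deque

import numpy as np

def consensus_by_coassociation(partitions, threshold=0.5):
    """Combine partitions without access to features: build a
    co-association matrix (fraction of base clusterings that put two
    objects in the same cluster), link pairs above a threshold, and
    return the connected components as the consensus clustering."""
    P = np.asarray(partitions)                  # (n_partitions, n_objects)
    n = P.shape[1]
    co = sum((lab[:, None] == lab[None, :]).astype(float) for lab in P) / len(P)
    labels = -np.ones(n, dtype=int)
    next_label = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        labels[i] = next_label
        queue = deque([i])
        while queue:                            # BFS over the similarity graph
            u = queue.popleft()
            for v in range(n):
                if labels[v] < 0 and co[u, v] >= threshold:
                    labels[v] = next_label
                    queue.append(v)
        next_label += 1
    return labels
```

Note that the input partitions may use arbitrary, mutually inconsistent label names; only co-assignment matters, which is what makes the 'knowledge reuse' framing possible.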
Conference Paper
Full-text available
We consider clustering problems in which the available attributes can be split into two independent subsets, such that either subset suffices for learning. Example applications of this multi-view setting include clustering of Web pages which have an intrinsic view (the pages themselves) and an extrinsic view (e.g., anchor texts of inbound hyperlinks); multi-view learning has so far been studied in the context of classification. We develop and study partitioning and agglomerative, hierarchical multi-view clustering algorithms for text data. We find empirically that the multi-view versions of k-means and EM greatly improve on their single-view counterparts. By contrast, we obtain negative results for agglomerative hierarchical multi-view clustering. Our analysis explains this surprising phenomenon.
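A minimal sketch of two-view k-means in the spirit of the algorithm studied above; initialization and empty-cluster handling are simplified, and the function name is illustrative:

```python
import numpy as np

def multiview_kmeans(X1, X2, k, n_iter=20):
    """Two-view k-means sketch: centroids are re-estimated in one view
    from the current assignments, points are reassigned in that view,
    and the views swap roles each iteration, so information flows back
    and forth between the feature sets."""
    views = [np.asarray(X1, float), np.asarray(X2, float)]
    n = views[0].shape[0]
    labels = np.arange(n) % k            # simple deterministic init
    for it in range(n_iter):
        v = views[it % 2]
        centroids = np.stack([
            v[labels == c].mean(axis=0) if np.any(labels == c) else v[c % n]
            for c in range(k)
        ])
        dists = ((v[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
    return labels
```

On data where both views agree on the cluster structure, the alternation converges to a single consistent partition.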
Article
Multi-view learning is an emerging direction in machine learning which considers learning with multiple views to improve generalization performance. It is also known as data fusion or data integration from multiple feature sets. Since the last survey of multi-view machine learning in early 2013, the field has made great progress and is facing new challenges. This overview first reviews the theoretical underpinnings needed to understand the properties and behaviors of multi-view learning. Multi-view learning methods are then described in terms of three classes to offer a neat categorization and organization. For each category, representative algorithms and newly proposed algorithms are presented. The main feature of this survey is that it provides a comprehensive introduction to the recent developments of multi-view learning methods on the basis of their coherence with early methods. We also attempt to identify promising avenues and point out some specific challenges which can hopefully promote further research in this rapidly developing field.
Article
Folksonomies - networks of users, resources, and tags - allow users to easily retrieve, organize and browse web contents. However, their advantages are still limited, mainly due to the noisiness of user-provided tags. To overcome this issue, we propose an approach for characterizing related tags in folksonomies: we use tag co-occurrence statistics and Laplacian-score-based feature selection to create an empirical co-occurrence probability distribution for each tag; we then identify related tags on the basis of the dissimilarity between their distributions. For this purpose, we introduce a variant of the Jensen-Shannon divergence which is more robust to statistical noise. We experimentally evaluate our approach using WordNet and compare it to a common tag-relatedness approach based on the cosine similarity. The results show the effectiveness of our approach and its advantage over the competing method.
Conference Paper
Folksonomies - collections of user-contributed tags - have proved effective in reducing the inherent semantic gap. However, user tags are noisy, so they need to be processed before they can be used by further applications. In this paper, we propose an approach for bootstrapping semantics from folksonomy tags. Our goal is to automatically identify semantically related tags. The approach is based on creating a probability distribution for each tag from co-occurrence statistics. The similarity between two tags is then determined by the distance between their corresponding probability distributions. For this purpose, we propose an extension of the well-known Jensen-Shannon divergence. We compared our approach to a widely used method for identifying similar tags based on the cosine measure. The evaluation shows promising results and emphasizes the advantage of our approach.
Article
Multi-view learning, or learning with multiple distinct feature sets, is a rapidly growing direction in machine learning with solid theoretical underpinnings and great practical success. This paper reviews theories developed to understand the properties and behaviors of multi-view learning and gives a taxonomy of approaches according to the machine learning mechanisms involved and the fashion in which the multiple views are exploited. This survey aims to provide an insightful organization of current developments in the field of multi-view learning, identify their limitations, and give suggestions for further research. One feature of this survey is that we attempt to point out specific open problems which can hopefully promote research on multi-view machine learning.
Article
Cluster ensembles combine multiple clusterings of a set of objects into a single consolidated clustering, often referred to as the consensus solution. Consensus clustering can be used to generate more robust and stable clustering results compared to a single clustering approach, perform distributed computing under privacy or sharing constraints, or reuse existing knowledge. This paper describes a variety of algorithms that have been proposed to address the cluster ensemble problem, organizing them in conceptual categories that bring out the common threads and lessons learnt while simultaneously highlighting unique features of individual approaches. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 305–315 DOI: 10.1002/widm.32
Conference Paper
In this paper we propose a new graph-based feature splitting algorithm, maxInd, which creates a balanced split maximizing the independence between the two feature sets. We study the performance of an RBF network in a co-training setting with natural, truly independent, random, and maxInd splits. The results show that the RBF network is successful in a co-training setting, outperforming SVM and NB. Co-training is also found to be sensitive to the trade-off between the dependence of the features within a feature set and the dependence between the feature sets.
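maxInd itself is not reproduced here, but the underlying idea - grouping strongly dependent features on the same side of a balanced split so that the two resulting sets are as independent as possible - can be sketched with a hypothetical greedy heuristic over a correlation graph:

```python
from itertools import combinations

import numpy as np

def greedy_independent_split(X):
    """Hypothetical greedy stand-in for a graph-based split: treat
    absolute Pearson correlation as feature dependence, then repeatedly
    place the most strongly dependent unassigned feature pairs on the
    same side, filling the smaller side first, so that dependence
    concentrates *within* each set and the two sets stay balanced and
    near-independent of each other."""
    X = np.asarray(X, float)
    n_feat = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    cap = (n_feat + 1) // 2
    side, counts = {}, [0, 0]

    def place(f, s):
        if counts[s] >= cap:                 # keep the split balanced
            s = 1 - s
        side[f] = s
        counts[s] += 1

    for i, j in sorted(combinations(range(n_feat), 2),
                       key=lambda p: -corr[p]):
        if i not in side and j not in side:
            place(i, int(counts[1] < counts[0]))   # start on the smaller side
            place(j, side[i])
        elif i not in side:
            place(i, side[j])
        elif j not in side:
            place(j, side[i])
    return sorted(f for f in side if side[f] == 0), \
           sorted(f for f in side if side[f] == 1)
```

On data with two internally correlated but mutually independent feature groups, the heuristic recovers the natural split.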
Article
Co-training is a multiview semi-supervised learning algorithm that learns from both labeled and unlabeled data: it iteratively uses a classifier trained on one view to teach the other view via confident predictions on unlabeled examples. However, because it does not examine the reliability of the labels provided by the classifier on either view, co-training can be problematic: even a few inaccurately labeled examples can deteriorate the performance of the learned classifiers to a large extent. In this paper, a new method named robust co-training is proposed, which integrates canonical correlation analysis (CCA) to inspect the predictions of co-training on the unlabeled training examples. CCA is applied to obtain a low-dimensional, closely correlated representation of the original multiview data. Based on this representation, the similarities between an unlabeled example and the original labeled examples are determined. Only those examples whose predicted labels are consistent with the outcome of the CCA examination are eligible to augment the original labeled data. The performance of robust co-training is evaluated on several different classification problems, where encouraging experimental results are observed.
Article
Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems, such as the classification of tumors. An important statistical question associated with tumor classification is the identification of new tumor classes using gene expression profiles. Essential aspects of this clustering problem include identifying accurate partitions of the tumor samples into clusters and assessing the confidence of cluster assignments for individual samples. Two new resampling methods, inspired by bagging in prediction, are proposed to improve and assess the accuracy of a given clustering procedure. In these ensemble methods, a partitioning clustering procedure is applied to bootstrap learning sets and the resulting multiple partitions are combined by voting or by the creation of a new dissimilarity matrix. As in prediction, the motivation behind bagging is to reduce variability in the partitioning results via averaging. The performance of the new and existing methods was compared using simulated data and gene expression data from two recently published cancer microarray studies. The bagged clustering procedures were in general at least as accurate, and often substantially more accurate, than a single application of the partitioning clustering procedure. A valuable by-product of bagged clustering is the cluster votes, which can be used to assess the confidence of cluster assignments for individual observations. For supplementary information on datasets, analyses, and software, consult http://www.stat.berkeley.edu/~sandrine and http://www.bioconductor.org.
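The voting variant described above can be sketched as follows, assuming plain k-means as the base partitioning procedure; the greedy centroid matching used for label alignment (rather than an optimal assignment) and all names are simplifications:

```python
import numpy as np

def kmeans(X, k, n_iter=15):
    """Plain Lloyd k-means used as the base partitioning procedure
    (deterministic init so the sketch is reproducible)."""
    X = np.asarray(X, float)
    labels = np.arange(len(X)) % k
    for _ in range(n_iter):
        cents = np.stack([X[labels == c].mean(axis=0) if np.any(labels == c)
                          else X[c % len(X)] for c in range(k)])
        labels = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(-1).argmin(1)
    return labels, cents

def bagged_clustering(X, k, n_boot=25, seed=0):
    """Bagged clustering by voting: fit k-means on bootstrap learning
    sets, let each bootstrap model label *all* points via its centroids,
    align bootstrap clusters to a reference partition by greedy
    nearest-centroid matching, and accumulate cluster votes. The winning
    vote share serves as an assignment-confidence score."""
    X = np.asarray(X, float)
    rng = np.random.default_rng(seed)
    _, ref_cents = kmeans(X, k)
    votes = np.zeros((len(X), k))
    for _ in range(n_boot):
        boot = X[rng.integers(0, len(X), len(X))]   # bootstrap learning set
        _, cents = kmeans(boot, k)
        mapping = ((cents[:, None, :] - ref_cents[None, :, :]) ** 2).sum(-1).argmin(1)
        lab = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(-1).argmin(1)
        for point, c in enumerate(lab):
            votes[point, mapping[c]] += 1
    return votes.argmax(axis=1), votes.max(axis=1) / n_boot
```

The returned vote shares are the confidence by-product the abstract mentions: points near a cluster boundary receive split votes across bootstrap rounds.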
Article
Over the past few years, there has been a renewed interest in the consensus clustering problem. Several new methods have been proposed for finding a consensus partition for a set of n data objects that optimally summarizes an ensemble. In this paper, we propose new consensus clustering algorithms with computational complexity linear in n. We consider clusterings generated with a random number of clusters, which we describe by categorical random variables. We introduce the idea of cumulative voting as a solution to the problem of cluster label alignment, where, unlike the common one-to-one voting scheme, a probabilistic mapping is computed. We seek a first summary of the ensemble that minimizes the average squared distance between the mapped partitions and the optimal representation of the ensemble, where the selection criterion for the reference clustering is defined by maximizing the information content as measured by the entropy. We describe cumulative vote weighting schemes and corresponding algorithms to compute an empirical probability distribution summarizing the ensemble. Given the arbitrary numbers of clusters of the input partitions, we formulate the problem of extracting the optimal consensus as that of finding a compressed summary of the estimated distribution that preserves maximum relevant information. An efficient solution is obtained using an agglomerative algorithm that minimizes the average generalized Jensen-Shannon divergence within clusters. The empirical study demonstrates significant gains in accuracy and superior performance compared to several recent consensus clustering algorithms.
Article
Both document clustering and word clustering are important and well-studied problems. Using the vector space model, a document collection may be represented as a word-document matrix. In this paper, we present the novel idea of modeling the document collection as a bipartite graph between documents and words. Under this model, we pose the clustering problem as a graph partitioning problem and give a new spectral algorithm that simultaneously yields a clustering of documents and words. This co-clustering algorithm uses the second left and right singular vectors of an appropriately scaled word-document matrix to yield good bipartitionings. In fact, it can be shown that these singular vectors give a real relaxation to the optimal solution of the graph bipartitioning problem. We present several experimental results to verify that the resulting co-clustering algorithm works well in practice and is robust in the presence of noise.
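Under the scaling described above, the bipartitioning step can be sketched as follows; a simple sign cut on the stacked singular-vector embedding stands in for the k-means step used in practice, and the function name is illustrative:

```python
import numpy as np

def spectral_cocluster(A):
    """Bipartite spectral bipartitioning sketch: normalize the
    document-word matrix by the square roots of its row and column
    sums, take the second left and right singular vectors of the scaled
    matrix, rescale them, and cut the stacked embedding by sign to
    split documents and words simultaneously."""
    A = np.asarray(A, float)
    d1 = A.sum(axis=1)                       # document degrees
    d2 = A.sum(axis=0)                       # word degrees
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, _, Vt = np.linalg.svd(An)
    z = np.concatenate([U[:, 1] / np.sqrt(d1), Vt[1, :] / np.sqrt(d2)])
    split = (z >= 0).astype(int)             # crude stand-in for k-means
    return split[:A.shape[0]], split[A.shape[0]:]
```

On a near-block-diagonal word-document matrix, documents and the words they use land on the same side of the cut.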
Article
We consider the problem of using a large unlabeled sample to boost performance of a learning algorithm when only a small set of labeled examples is available. In particular, we consider a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views. For example, the description of a web page can be partitioned into the words occurring on that page, and the words occurring in hyperlinks that point to that page. We assume that either view of the example would be sufficient for learning if we had enough labeled data, but our goal is to use both views together to allow inexpensive unlabeled data to augment a much smaller set of labeled examples. Specifically, the presence of two distinct views of each example suggests strategies in which two learning algorithms are trained separately on each view, and then each algorithm's predictions on new unlabeled examples are used to e...
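The two-view strategy sketched in this abstract - two learners, each trained on its own view, teaching one another through confident predictions - can be illustrated as follows; the nearest-centroid learner, the growth schedule, and all names are hypothetical simplifications, not those of the original algorithm:

```python
import numpy as np

def centroid_predict(Xl, yl, Xu):
    """Per-view learner for the sketch: nearest class centroid, with a
    confidence score given by the margin between the two distances.
    Binary labels {0, 1} are assumed."""
    cents = np.stack([Xl[yl == c].mean(axis=0) for c in (0, 1)])
    d = ((Xu[:, None, :] - cents[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1), np.abs(d[:, 0] - d[:, 1])

def co_train(X1, X2, y, labeled_idx, n_rounds=5, per_round=2):
    """Co-training loop sketch: each round, one classifier per view is
    fit on the labeled pool; each classifier then pseudo-labels the
    unlabeled examples it is most confident about, and those examples
    join the labeled pool for both views."""
    X1, X2, y = map(np.asarray, (X1, X2, y))
    labeled = set(labeled_idx)
    pseudo = {}                                  # index -> pseudo-label
    for _ in range(n_rounds):
        idx_u = [i for i in range(len(y)) if i not in labeled]
        if not idx_u:
            break
        idx_l = sorted(labeled)
        y_l = np.array([pseudo.get(i, y[i]) for i in idx_l])
        for X in (X1, X2):
            pred, conf = centroid_predict(X[idx_l], y_l, X[idx_u])
            for j in np.argsort(-conf)[:per_round]:
                pseudo[idx_u[j]] = int(pred[j])
        labeled |= set(pseudo)
    return pseudo
```

With one labeled example per class and two redundant views, the loop propagates correct labels to the remaining points.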