
Subspace clustering is a growing field of unsupervised learning that has gained much popularity in the computer vision community. Applications can be found in areas such as motion segmentation and face clustering. It assumes that data originate from a union of subspaces, and clusters the data according to the corresponding subspace. In practice, it is reasonable to assume that a limited number of labels can be obtained, potentially at a cost. Therefore, algorithms that can effectively and efficiently incorporate this information to improve the clustering model are desirable. In this paper, we propose an active learning framework for subspace clustering that sequentially queries informative points and updates the subspace model. The query stage of the proposed framework relies on results from the perturbation theory of principal component analysis to identify influential and potentially misclassified points. A constrained subspace clustering algorithm is proposed that monotonically decreases the objective function subject to the constraints imposed by the labelled data. We show that our proposed framework is suitable for subspace clustering algorithms including iterative methods and spectral methods. Experiments on synthetic data sets, motion segmentation data sets, and Yale Faces data sets demonstrate the advantage of our proposed active strategy over the state of the art.

... We therefore discuss an active learning strategy that is designed to query the labels of points so as to maximise the overall quality of the subspace clustering model. In particular, we draw on the work of Peng and Pavlidis (2019) to select informative points to label for subspace clustering. The label information is subsequently incorporated in a constrained clustering formulation that combines WSSR and constrained K -subspace clustering. ...

... In active learning the algorithm controls the choice of points for which external information is obtained. The majority of active learning techniques are designed for supervised methods, and little research has considered the problem of active learning for subspace clustering (Lipor and Balzano 2015, 2017; Peng and Pavlidis 2019). Lipor and Balzano (2015) propose two active strategies. ...

... However, correctly assigning these points is not guaranteed to maximally improve the accuracy of the estimated subspaces, hence the overall quality of the clustering. Peng and Pavlidis (2019) propose an active learning strategy for sequentially querying point(s) to maximise the decrease of the overall reconstruction error in (2). ...

Spectral-based subspace clustering methods have proved successful in many challenging applications such as gene sequencing, image recognition, and motion segmentation. In this work, we first propose a novel spectral-based subspace clustering algorithm that seeks to represent each point as a sparse convex combination of a few nearby points. We then extend the algorithm to a constrained clustering and active learning framework. Our motivation for developing such a framework stems from the fact that typically either a small amount of labelled data are available in advance; or it is possible to label some points at a cost. The latter scenario is typically encountered in the process of validating a cluster assignment. Extensive experiments on simulated and real datasets show that the proposed approach is effective and competitive with state-of-the-art methods.

... In the absence of any existing annotated data, a potential approach to AL is to utilize the distributional properties of the dataset with clustering-based AL methods (e.g. [10][11][12][13]). These methods rely heavily on how the samples are organized in the feature space (i.e. the choice of features) and what distance metric is used, as the methods need to use these two to cluster the data points and to prioritize the order in which cluster samples are provided for human annotators. ...

When domain experts are needed to perform data annotation for complex machine-learning tasks, reducing annotation effort is crucial in order to cut down time and expenses. For cases when there are no annotations available, one approach is to utilize the structure of the feature space for clustering-based active learning (AL) methods. However, these methods are heavily dependent on how the samples are organized in the feature space and what distance metric is used. Unsupervised methods such as contrastive predictive coding (CPC) can potentially be used to learn organized feature spaces, but these methods typically create high-dimensional features which might be challenging for estimating data density. In this paper, we combine CPC and multiple dimensionality reduction methods in search of functioning practices for clustering-based AL. Our experiments for simulating speech emotion recognition system deployment show that both the local and global topology of the feature space can be successfully used for AL, and that CPC can be used to improve clustering-based AL performance over traditional signal features. Additionally, we observe that compressing data dimensionality does not harm AL performance substantially, and that 2-D feature representations achieved similar AL performance as higher-dimensional representations when the number of annotations is not very low.

... Advances in active learning techniques have improved the ability to find the most useful data points. Unsupervised learning techniques, such as subspace clustering, have been shown to find influential points from a cluster [51]. A hybrid method that connects active learning and data programming [48] has shown improvements in the reduction of noisy data in large scale workspaces [15]. ...

Ordering the selection of training data using active learning can lead to improvements in learning efficiently from smaller corpora. We present an exploration of active learning approaches applied to three grounded language problems of varying complexity in order to analyze what methods are suitable for improving data efficiency in learning. We present a method for analyzing the complexity of data in this joint problem space, and report on how characteristics of the underlying task, along with design decisions such as feature selection and classification model, drive the results. We observe that representativeness, along with diversity, is crucial in selecting data samples.

Principal component analysis is a method of dimensionality reduction based on the eigensystem of the covariance matrix of a set of multivariate observations. Analyzing the effects of some specific observations on this eigensystem is therefore of particular importance in the sensitivity study of the results. In this framework, approximations for the perturbed eigenvalues and eigenvectors when deleting one or several observations are useful from a computational standpoint. Indeed, they allow one to evaluate the effects of these observations without having to recompute the exact perturbed eigenvalues and eigenvectors. However, it turns out that some approximations which have been suggested are based on an incorrect application of matrix perturbation theory. The aim of this short note is to provide the correct formulations which are illustrated with a numerical study.
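The generic first-order result underlying such approximations is easy to verify numerically: for a symmetric matrix A with a simple eigenpair (λ_j, v_j) and a small symmetric perturbation E, λ_j(A + E) ≈ λ_j + v_jᵀ E v_j. A minimal numpy sketch of this generic result (illustrative only; not the note's specific observation-deletion formulas):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random symmetric matrix standing in for a sample covariance matrix.
A = rng.standard_normal((5, 5))
A = (A + A.T) / 2

# Small symmetric perturbation, e.g. the effect of deleting an observation.
E = rng.standard_normal((5, 5))
E = 1e-4 * (E + E.T) / 2

vals, vecs = np.linalg.eigh(A)          # eigenpairs of the unperturbed matrix
vals_pert = np.linalg.eigvalsh(A + E)   # exact eigenvalues after perturbation

# First-order approximation: lambda_j(A+E) ~ lambda_j + v_j' E v_j.
approx = vals + np.diag(vecs.T @ E @ vecs)

print(np.max(np.abs(vals_pert - approx)))  # residual is O(||E||^2), tiny here
```

The point of such approximations is exactly what this check shows: the first-order correction is accurate to second order in the perturbation, so the effect of deleting observations can be assessed without re-solving the eigenproblem.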

Semi-supervised clustering seeks to augment traditional clustering methods by incorporating side information provided via human expertise in order to increase the semantic meaningfulness of the resulting clusters. However, most current methods are *passive* in the sense that the side information is provided beforehand and selected randomly. This may require a large number of constraints, some of which could be redundant, unnecessary, or even detrimental to the clustering results. Thus, in order to scale such semi-supervised algorithms to larger problems it is desirable to pursue an *active* clustering method, i.e. an algorithm that maximizes the effectiveness of the available human labor by only requesting human input where it will have the greatest impact. Here, we propose a novel online framework for active semi-supervised spectral clustering that selects pairwise constraints as clustering proceeds, based on the principle of uncertainty reduction. Using a first-order Taylor expansion, we decompose the expected uncertainty reduction problem into a gradient and a step-scale, computed via an application of matrix perturbation theory and cluster-assignment entropy, respectively. The resulting model is used to estimate the uncertainty reduction potential of each sample in the dataset. We then present the human user with pairwise queries with respect to only the best candidate sample. We evaluate our method using three different image datasets (faces, leaves and dogs), a set of common UCI machine learning datasets and a gene dataset. The results validate our decomposition formulation and show that our method is consistently superior to existing state-of-the-art techniques, as well as being robust to noise and to unknown numbers of clusters.
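The cluster-assignment entropy used for the step-scale has a simple concrete form: a sample whose soft assignment is spread evenly across clusters is maximally uncertain. A minimal sketch with made-up soft assignments (not the paper's full uncertainty-reduction model):

```python
import numpy as np

def assignment_entropy(p):
    """Entropy of each row of a soft cluster-assignment matrix (rows sum to 1)."""
    p = np.clip(p, 1e-12, 1.0)           # guard against log(0)
    return -(p * np.log(p)).sum(axis=1)

# Three samples over three clusters: confident, moderately uncertain, uniform.
P = np.array([[0.98, 0.01, 0.01],
              [0.70, 0.20, 0.10],
              [1/3,  1/3,  1/3]])

H = assignment_entropy(P)
print(H)                  # strictly increasing across the three rows
print(int(np.argmax(H)))  # -> 2: the uniform sample is the best query candidate
```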

We propose an algorithm called query by committee, in which a committee of students is trained on the same data set. The next query is chosen according to the principle of maximal disagreement. The algorithm is studied for two toy models: the high-low game and perceptron learning of another perceptron. As the number of queries goes to infinity, the committee algorithm yields asymptotically finite information gain. This leads to generalization error that decreases exponentially with the number of examples. This is in marked contrast to learning from randomly chosen inputs, for which the information gain approaches zero and the generalization error decreases with a relatively slow inverse power law. We suggest that asymptotically finite information gain may be an important characteristic of good query algorithms.
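The committee principle can be sketched with off-the-shelf tools: train several hypotheses on bootstrap resamples of the labelled data and query the unlabeled point on which their votes disagree most. This is a vote-entropy variant assuming scikit-learn, not the paper's perceptron setting:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Small labelled pool (two Gaussian blobs) and a larger unlabelled pool.
X_lab = np.vstack([rng.normal(-1, 1, (20, 2)), rng.normal(1, 1, (20, 2))])
y_lab = np.array([0] * 20 + [1] * 20)
X_unl = rng.normal(0, 1.5, (200, 2))

# Train a committee of students on bootstrap resamples of the same data set.
committee = []
for _ in range(7):
    idx = rng.integers(0, len(y_lab), len(y_lab))
    committee.append(LogisticRegression().fit(X_lab[idx], y_lab[idx]))

# Maximal disagreement via vote entropy: query the most contested point.
votes = np.stack([m.predict(X_unl) for m in committee])   # shape (7, 200)
frac1 = votes.mean(axis=0)                                # fraction voting class 1
p = np.clip(np.stack([1 - frac1, frac1], axis=1), 1e-12, 1)
disagreement = -(p * np.log(p)).sum(axis=1)
query = int(np.argmax(disagreement))
print(query, X_unl[query])  # a point near the committee's decision boundaries
```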

We present a framework for margin-based active learning of linear separators. We instantiate it for a few important cases, some of which have been previously considered in the literature. We analyze the effectiveness of our framework both in the realizable case and in a specific noisy setting related to the Tsybakov small noise condition.

Active learning methods rely on static strategies for sampling unlabeled point(s). These strategies range from uncertainty sampling and density estimation to multi-factor methods with learn-once-use-always model parameters. This paper proposes a dynamic approach, called DUAL, where the strategy selection parameters are adaptively updated based on estimated future residual error reduction after each actively sampled point. The objective of DUAL is to outperform static strategies over a large operating range: from very few to very many labeled points. Empirical results over six datasets demonstrate that DUAL outperforms several state-of-the-art methods on most datasets.

The technique of spectral clustering is widely used to segment a range of data from graphs to images. Our work marks a natural progression of spectral clustering from the original passive unsupervised formulation to our active semi-supervised formulation. We follow the widely used area of constrained clustering and allow supervision in the form of pairwise relations between two nodes: Must-Link and Cannot-Link. Unlike most previous constrained clustering work, our constraints are specified incrementally by querying an oracle (domain expert). Since in practice each query comes with a cost, our goal is to maximally improve the result with as few queries as possible. The advantages of our approach include: 1) it is principled, querying the constraints which maximally reduce the expected error; 2) it can incorporate both hard and soft constraints, which are prevalent in practice. We empirically show that our method significantly outperforms the baseline approach, namely constrained spectral clustering with randomly selected constraints, on UCI benchmark data sets.

Over the past few years, several methods for segmenting a scene containing multiple rigidly moving objects have been proposed. However, most existing methods have been tested on a handful of sequences only, and each method has been often tested on a different set of sequences. Therefore, the comparison of different methods has been fairly limited. In this paper, we compare four 3D motion segmentation algorithms for affine cameras on a benchmark of 155 motion sequences of checkerboard, traffic, and articulated scenes.

We prove that the set of all Lambertian reflectance functions (the mapping from surface normals to intensities) obtained with arbitrary distant light sources lies close to a 9D linear subspace. This implies that, in general, the set of images of a convex Lambertian object obtained under a wide variety of lighting conditions can be approximated accurately by a low-dimensional linear subspace, explaining prior empirical results. We also provide a simple analytic characterization of this linear space. We obtain these results by representing lighting using spherical harmonics and describing the effects of Lambertian materials as the analog of a convolution. These results allow us to construct algorithms for object recognition based on linear methods as well as algorithms that use convex optimization to enforce nonnegative lighting functions. We also show a simple way to enforce nonnegative lighting when the images of an object lie near a 4D linear space. We apply these algorithms to perform face recognition by finding the 3D model that best matches a 2D query image.

Many clustering problems in computer vision and other contexts are also classification problems, where each cluster shares a meaningful label. Subspace clustering algorithms in particular are often applied to problems that fit this description, for example with face images or handwritten digits. While it is straightforward to request human input on these datasets, our goal is to reduce this input as much as possible. We present an algorithm for active query selection that allows us to leverage the union of subspace structure assumed in subspace clustering. The central step of the algorithm is in querying points of minimum margin between estimated subspaces; analogous to classifier margin, these lie near the decision boundary. This procedure can be used after any subspace clustering algorithm that outputs an affinity matrix and is capable of driving the clustering error down more quickly than other state-of-the-art active query algorithms on datasets with subspace structure. We demonstrate the effectiveness of our algorithm on several benchmark datasets, and with a modest number of queries we see significant gains in clustering performance.

Subspace clustering has typically been approached as an unsupervised machine learning problem. However in several applications where the union of subspaces model is useful, it is also reasonable to assume you have access to a small number of labels. In this paper we investigate the benefit labeled data brings to the subspace clustering problem. We focus on incorporating labels into the k-subspaces algorithm, a simple and computationally efficient alternating estimation algorithm. We find that even a very small number of randomly selected labels can greatly improve accuracy over the unsupervised approach. We demonstrate that with enough labels, we get a significant improvement by using actively selected labels chosen for points that are nearly equidistant to more than one estimated subspace. We show this improvement on simulated data and face images.
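The "nearly equidistant" criterion can be made concrete with residual distances: compute each point's distance to every estimated subspace, take the gap between the two smallest residuals as the point's margin, and query the point with the smallest margin. A numpy sketch assuming each subspace is given by an orthonormal basis (illustrative, not the authors' code):

```python
import numpy as np

def subspace_residuals(X, bases):
    """Distance of each point (rows of X) to each subspace.

    bases: list of (d, r) matrices with orthonormal columns.
    Returns an (n_points, n_subspaces) array of residual norms.
    """
    res = []
    for U in bases:
        proj = X @ U @ U.T                    # orthogonal projection onto span(U)
        res.append(np.linalg.norm(X - proj, axis=1))
    return np.stack(res, axis=1)

rng = np.random.default_rng(0)
U1, _ = np.linalg.qr(rng.standard_normal((4, 2)))  # two random 2-D subspaces in R^4
U2, _ = np.linalg.qr(rng.standard_normal((4, 2)))

X = rng.standard_normal((100, 4))
R = subspace_residuals(X, [U1, U2])

R_sorted = np.sort(R, axis=1)
margin = R_sorted[:, 1] - R_sorted[:, 0]      # gap between the two closest subspaces
query = int(np.argmin(margin))                # most ambiguous point: ask for its label
print(query, margin[query])
```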

A large number of images with ground-truth object bounding boxes are critical for learning object detectors, which is a fundamental task in computer vision. In this paper, we study strategies to crowd-source bounding box annotations. The core challenge of building such a system is to effectively control the data quality with minimal cost. Our key observation is that drawing a bounding box is significantly more difficult and time consuming than giving answers to multiple choice questions. Thus quality control through additional verification tasks is more cost effective than consensus-based algorithms. In particular, we present a system that consists of three simple sub-tasks: a drawing task, a quality verification task, and a coverage verification task. Experimental results demonstrate that our system is scalable, accurate, and cost-effective.

In multivariate analysis, the eigensystem of a sample covariance matrix has an important place as a diagnostic tool. Deletion of one or more observations from data will give rise to a perturbation of a sample covariance matrix. In this paper we derive power series expansions for unrepeated eigenvalues and corresponding eigenvectors of a perturbed sample covariance matrix. These results are then developed to provide methods for the detection of influential observations. Also some further results on the relationship between the eigenvalues of the one-case-deleted and complete sample covariance matrices are derived.

Assuming that numerical scores are available for the performance of each of n persons on each of n jobs, the "assignment problem" is the quest for an assignment of persons to jobs so that the sum of the n scores so obtained is as large as possible. It is shown that ideas latent in the work of two Hungarian mathematicians may be exploited to yield a new method of solving this problem.
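The problem has a compact statement in code: over all one-to-one assignments of persons to jobs, maximise the total score. For tiny instances a brute-force search over permutations suffices; the Hungarian method described in the paper solves the same problem in polynomial time. The scores below are made up for illustration:

```python
from itertools import permutations

# score[i][j]: score of person i performing job j (illustrative numbers).
score = [[7, 5, 3],
         [2, 9, 4],
         [6, 1, 8]]
n = len(score)

# Enumerate all n! assignments; p[i] is the job given to person i.
best_total, best_perm = max(
    (sum(score[i][p[i]] for i in range(n)), p)
    for p in permutations(range(n))
)
print(best_total, best_perm)  # -> 24 (0, 1, 2)
```

Brute force is O(n!); the Hungarian method achieves the same optimum in polynomial time, which is the paper's contribution.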

We propose and study an algorithm, called Sparse Subspace Clustering, to cluster high-dimensional data points that lie in a union of low-dimensional subspaces. The key idea is that, among infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points that come from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data. As solving the sparse optimization program is NP-hard, we consider its convex relaxation and show that, under appropriate conditions on the arrangement of the subspaces and the distribution of the data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm does not require initialization, can be solved efficiently, and can handle data points near the intersections of subspaces. In addition, our algorithm can deal with data nuisances, such as noise, sparse outlying entries, and missing entries, directly by modifying the optimization program to incorporate the model of the data. We verify the effectiveness of the proposed algorithm through experiments on synthetic data as well as two real-world problems of motion segmentation and face clustering.
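The pipeline can be sketched in a few lines: express each point as a sparse combination of the other points (here via a plain Lasso rather than the paper's exact optimization program), symmetrise the coefficient magnitudes into an affinity matrix, and feed it to spectral clustering. A toy sketch assuming scikit-learn, not the authors' implementation:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)

# Points on two 1-D subspaces (lines through the origin) in R^3, plus noise.
d1, d2 = rng.standard_normal(3), rng.standard_normal(3)
X = np.vstack([np.outer(rng.standard_normal(30), d1),
               np.outer(rng.standard_normal(30), d2)])
X += 0.01 * rng.standard_normal(X.shape)
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Sparse self-representation: x_i ~ sum_j c_j x_j with c_i forced to zero.
n = len(X)
C = np.zeros((n, n))
for i in range(n):
    others = np.delete(np.arange(n), i)
    lasso = Lasso(alpha=0.01, max_iter=5000)
    lasso.fit(X[others].T, X[i])      # dictionary columns are the other points
    C[i, others] = lasso.coef_

# Symmetric affinity from the sparse coefficients; tiny floor keeps it connected.
W = np.abs(C) + np.abs(C).T + 1e-6
labels = SpectralClustering(n_clusters=2, affinity='precomputed',
                            random_state=0).fit_predict(W)
print(labels)
```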

In influence analysis several problems arise in the field of Principal Components when applying different sample versions. Among these are the difficulty of determining a certain correspondence between the eigenvalues before and after the deletion of observations, the choice of the sign of the eigenvectors and the computational problem derived from the resolution of a great number of eigenproblems. In this article, such problems are discussed from the joint influence point of view and a solution is proposed by using approximations. Furthermore, the influence on a new parameter of interest is introduced: the proportion of variance explained by a set of principal components.

This paper presents an active learning method that di-rectly optimizes expected future error. This is in con-trast to many other popular techniques that instead aim to reduce version space size. These methods are popu-lar because for many learning models, closed form cal-culation of the expected future error is intractable. Our approach is made feasible by taking a Monte Carlo ap-proach to estimating the expected reduction in error due to the labeling of a query. In experimental results on three real-world data sets we reach high accuracy with four times fewer labelled examples than compet-ing methods.

Supervised machine learning is a branch of artificial intelligence concerned with automatically inducing predictive models from labeled data. Such learning approaches are useful for many interesting real-world applications, but particularly shine for tasks involving the automatic organization, extraction, and retrieval of information from large collections of data (e.g., text, images, and other digital media). In traditional supervised learning, one uses “labeled” training data to induce a model. However, labeled instances for real-world applications are often difficult, expensive, or time consuming to obtain. Consider a complex task such as extracting key person and organization names from text documents. While gathering large amounts of unlabeled documents for these tasks is often relatively easy (e.g., from the World Wide Web), labeling these texts usually requires experienced human annotators with specific domain knowledge and training. There are implicit costs associated with obtaining these labels from domain experts, such as limited time and financial resources.

A new finite algorithm is proposed for clustering m given points in n-dimensional real space into k clusters by generating k planes that constitute a local solution to the nonconvex problem of minimizing the sum of squares of the 2-norm distances between each point and a nearest plane. The key to the algorithm lies in a formulation that generates a plane in n-dimensional space that minimizes the sum of the squares of the 2-norm distances to each of m1 given points in the space. The plane is generated by an eigenvector corresponding to a smallest eigenvalue of an n × n simple matrix derived from the m1 points. The algorithm was tested on the publicly available Wisconsin Breast Prognosis Cancer database to generate well-separated patient survival curves. In contrast, the k-means algorithm did not generate such well-separated survival curves.
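The core step, fitting a single plane that minimises the sum of squared 2-norm distances to a set of points, reduces to exactly the eigenproblem described: the plane's normal is the eigenvector of the centred scatter matrix with the smallest eigenvalue. A minimal numpy sketch of that step (not the full k-plane iteration):

```python
import numpy as np

def fit_plane(P):
    """Least-squares plane through the rows of P: returns (unit normal w, offset gamma).

    The plane is {x : w @ x = gamma}; w is the eigenvector of the centred
    scatter matrix corresponding to its smallest eigenvalue.
    """
    mu = P.mean(axis=0)
    Pc = P - mu
    vals, vecs = np.linalg.eigh(Pc.T @ Pc)   # eigenvalues in ascending order
    w = vecs[:, 0]                           # smallest-eigenvalue eigenvector
    return w, w @ mu

rng = np.random.default_rng(0)
# Points lying exactly on the plane x + 2y - z = 3 in R^3.
xy = rng.standard_normal((50, 2))
P = np.column_stack([xy, xy @ [1.0, 2.0] - 3.0])   # z = x + 2y - 3

w, gamma = fit_plane(P)
dist = np.abs(P @ w - gamma)   # point-to-plane distances (w has unit norm)
print(dist.max())              # ~ 0: the fit recovers the plane the data lie on
```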

This paper has been presented with the Best Paper Award. It will appear in print in Volume 52, No. 1, February 2005.

We present a framework for active learning in the multiple-instance (MI) setting. In an MI learning problem, instances are naturally organized into bags and it is the bags, instead of individual instances, that are labeled for training. MI learners assume that every instance in a bag labeled negative is actually negative, whereas at least one instance in a bag labeled positive is actually positive. We consider the particular case in which an MI learner is allowed to selectively query unlabeled instances from positive bags. This approach is well motivated in domains in which it is inexpensive to acquire bag labels and possible, but expensive, to acquire instance labels. We describe a method for learning from labels at mixed levels of granularity, and introduce two active query selection strategies motivated by the MI setting. Our experiments show that learning from instance labels can significantly improve performance of a basic MI learning algorithm in two multiple-instance domains: content-based image retrieval and text classification.

Query by Committee is an effective approach to selective sampling in which disagreement amongst an ensemble of hypotheses is used to select data for labeling. Query by Bagging and Query by Boosting are two practical implementations of this approach that use Bagging and Boosting, respectively, to build the committees. For effective active learning, it is critical that the committee be made up of consistent hypotheses that are very different from each other. DECORATE is a recently developed method that directly constructs such diverse committees using artificial training data. This paper introduces ACTIVE-DECORATE, which uses DECORATE committees to select good training examples. Extensive experimental results demonstrate that, in general, ACTIVE-DECORATE outperforms both Query by Bagging and Query by Boosting.

We propose low-rank representation (LRR) to segment data drawn from a union of multiple linear (or affine) subspaces. Given a set of data vectors, LRR seeks the lowest-rank representation among all the candidates that represent all vectors as the linear combination of the bases in a dictionary. Unlike the well-known sparse representation (SR), which computes the sparsest representation of each data vector individually, LRR aims at finding the lowest-rank representation of a collection of vectors jointly. LRR better captures the global structure of data, giving a more effective tool for robust subspace segmentation from corrupted data. Both theoretical and experimental results show that LRR is a promising tool for subspace segmentation.
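In the noiseless case the optimal low-rank representation has a known closed form: if X = UΣVᵀ is the skinny SVD of the data matrix, the minimiser of the nuclear norm of Z subject to X = XZ is Z* = VVᵀ (the shape-interaction matrix). A quick numerical check of that identity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Data matrix whose columns lie in a union of two 2-D subspaces of R^6.
B1, B2 = rng.standard_normal((6, 2)), rng.standard_normal((6, 2))
X = np.hstack([B1 @ rng.standard_normal((2, 20)),
               B2 @ rng.standard_normal((2, 20))])

# Skinny SVD: keep only components with non-negligible singular values.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = int((s > 1e-10 * s[0]).sum())      # numerical rank (4 here: two 2-D subspaces)
V = Vt[:r].T

Z = V @ V.T                            # closed-form noiseless LRR solution
print(r, np.abs(X @ Z - X).max())      # X @ Z reproduces X to machine precision
```

The check works because XVVᵀ = UΣVᵀVVᵀ = UΣVᵀ = X; the low-rank structure of Z is what makes it useful as an affinity for segmentation.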

This chapter introduces the concept of differential entropy, which is the entropy of a continuous random variable. Differential entropy is also related to the shortest description length, and is similar in many ways to the entropy of a discrete random variable. But there are some important differences, and there is need for some care in using the concept.
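A standard worked example makes one of those differences concrete: the differential entropy of a Gaussian N(μ, σ²) is ½ ln(2πeσ²) nats, and unlike discrete entropy it can be negative (e.g. for small σ). A quick check of the closed form against a Monte Carlo estimate, using only the standard library:

```python
import math
import random

sigma = 0.1
h_exact = 0.5 * math.log(2 * math.pi * math.e * sigma**2)  # closed form, in nats

# Monte Carlo estimate: h = -E[log f(X)] with X ~ N(0, sigma^2).
random.seed(0)

def logpdf(x):
    return -0.5 * math.log(2 * math.pi * sigma**2) - x**2 / (2 * sigma**2)

n = 200_000
h_mc = -sum(logpdf(random.gauss(0, sigma)) for _ in range(n)) / n

print(h_exact)              # about -0.884: negative, unlike discrete entropy
print(abs(h_exact - h_mc))  # small Monte Carlo error
```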

Based on the definitions of the generalised influence function and the generalised Cook statistic, the local influence of small perturbations on the eigenvalues and eigenvectors of a covariance matrix is studied for population and sample versions. The results based on the correlation matrix are also derived and some related topics are discussed. Finally, an example is used for illustration.

In linear regression, the theoretical influence function and the various sample versions of it have an established place as diagnostic tools. These same functions are developed here to provide methods for the detection of influential observations in principal components analysis. The perturbation theory of real symmetric matrices unifies this development. Some interesting points of contrast with the regression case are noted and explained theoretically.

Previous work has demonstrated that the image variation of many objects (human faces in particular) under variable lighting can be effectively modeled by low-dimensional linear spaces, even when there are multiple light sources and shadowing. Basis images spanning this space are usually obtained in one of three ways: A large set of images of the object under different lighting conditions is acquired, and principal component analysis (PCA) is used to estimate a subspace. Alternatively, synthetic images are rendered from a 3D model (perhaps reconstructed from images) under point sources and, again, PCA is used to estimate a subspace. Finally, images rendered from a 3D model under diffuse lighting based on spherical harmonics are directly used as basis images. In this paper, we show how to arrange physical lighting so that the acquired images of each object can be directly used as the basis vectors of a low-dimensional linear space and that this subspace is close to those acquired by the other methods. More specifically, there exist configurations of k point light source directions, with k typically ranging from 5 to 9, such that, by taking k images of an object under these single sources, the resulting subspace is an effective representation for recognition under a wide range of lighting conditions. Since the subspace is generated directly from real images, potentially complex and/or brittle intermediate steps such as 3D reconstruction can be completely avoided; nor is it necessary to acquire large numbers of training images or to physically construct complex diffuse (harmonic) light fields. We validate the use of subspaces constructed in this fashion within the context of face recognition.

The paper is concerned with two-class active learning. While the common approach for collecting data in active learning is to select samples close to the classification boundary, better performance can be achieved by taking into account the prior data distribution. The main contribution of the paper is a formal framework that incorporates clustering into active learning. The algorithm first constructs a classifier on the set of the cluster representatives, and then propagates the classification decision to the other samples via a local noise model. The proposed model makes it possible to select the most representative samples as well as to avoid repeatedly labeling samples in the same cluster. During the active learning process, the clustering is adjusted using a coarse-to-fine strategy in order to balance between the advantage of large clusters and the accuracy of the data representation. The results of experiments in image databases show a better performance of our algorithm compared to the current methods.

We analyze the "query by committee" algorithm, a method for filtering informative queries from a random stream of inputs. We show that if the two-member committee algorithm achieves information gain with positive lower bound, then the prediction error decreases exponentially with the number of queries. We show that, in particular, this exponential decrease holds for query learning of perceptrons.