Article

Partial Label Metric Learning Based on Statistical Inference

Abstract

In partial label data, the ground-truth label of a training example is concealed in a set of candidate labels associated with the instance. As the ground-truth label is inaccessible, it is difficult to train a classifier directly from the label information. Consequently, manifold structure information is adopted, under the assumption that neighboring/similar instances in the feature space have similar labels in the label space. However, real-world data may not fully satisfy this assumption. In this paper, a partial label metric learning method based on the likelihood-ratio test is proposed to make partial label data satisfy the manifold assumption. Moreover, the proposed method needs no objective function and treats the data pairs asymmetrically. Experimental results on several real-world PLL datasets indicate that the proposed method outperforms existing partial label metric learning methods in terms of classification accuracy and disambiguation accuracy while requiring less time.
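Although the paper's exact procedure is not given here, the statistical-inference idea it builds on (learning a Mahalanobis metric from a likelihood-ratio test over similar/dissimilar pairs, as in the equivalence-constraint method reviewed below) admits a compact closed form. A minimal sketch, assuming zero-mean Gaussian models for pairwise feature differences; all names and the PSD projection step are our own choices, and the paper's pair weighting and asymmetric pair treatment are not reproduced:

    import numpy as np

    def likelihood_ratio_metric(X, sim_pairs, dis_pairs, eps=1e-6):
        """Likelihood-ratio (KISSME-style) metric: M = inv(Cov_sim) - inv(Cov_dis).

        Pairwise differences x_i - x_j are modeled as zero-mean Gaussians
        under the "similar" and "dissimilar" hypotheses; the log likelihood
        ratio of the two models induces a Mahalanobis matrix, which is
        projected back onto the PSD cone so it defines a valid metric."""
        d_sim = np.array([X[i] - X[j] for i, j in sim_pairs])
        d_dis = np.array([X[i] - X[j] for i, j in dis_pairs])
        cov_s = d_sim.T @ d_sim / len(d_sim) + eps * np.eye(X.shape[1])
        cov_d = d_dis.T @ d_dis / len(d_dis) + eps * np.eye(X.shape[1])
        M = np.linalg.inv(cov_s) - np.linalg.inv(cov_d)
        # project onto the PSD cone by clipping negative eigenvalues
        w, V = np.linalg.eigh(M)
        return (V * np.clip(w, 0, None)) @ V.T

Because the metric comes from two covariance estimates rather than an iterative optimization, no explicit objective function has to be minimized, which is consistent with the abstract's claim.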


Article
Partial label learning (PLL) is an emerging framework in weakly supervised machine learning with broad application prospects. It handles the case in which each training example corresponds to a candidate label set and only one label concealed in the set is the ground-truth label. In this paper, we propose a novel taxonomy framework for PLL comprising four categories: disambiguation strategy, transformation strategy, theory-oriented strategy, and extensions. We analyze and evaluate the methods in each category and catalogue synthetic and real-world PLL datasets, all hyperlinked to their source data. Future work on PLL is discussed in depth based on the proposed taxonomy framework.
Article
Supervised learning techniques construct predictive models by learning from a large number of training examples, where each training example has a label indicating its ground-truth output. Though current techniques have achieved great success, it is noteworthy that in many tasks it is difficult to obtain strong supervision information, such as fully specified ground-truth labels, due to the high cost of the data labeling process. Thus, it is desirable for machine learning techniques to work with weak supervision. This article reviews some research progress in weakly supervised learning, focusing on three typical types of weak supervision: incomplete supervision, where only a subset of the training data are given with labels; inexact supervision, where the training data are given with only coarse-grained labels; and inaccurate supervision, where the given labels are not always ground-truth.
Conference Paper
Many machine learning and pattern recognition algorithms rely heavily on good distance metrics to achieve competitive performance. While distance metrics can be learned, the computational expense of doing so is currently infeasible on large datasets. In this paper, we propose two efficient and effective approaches for selecting the training dataset using Locality-Sensitive Hashing (LSH) with discriminative information, and with K-Means clustering inside LSH buckets, for accelerating metric learning. Our methods yield a speedup factor of (N/C)^2, where N is the training set size and C ≪ N is the user-selected compressed set size, achieving a quadratic speedup over metric learning that often amounts to an improvement of one to two or more orders of magnitude. For example, our generic filter approach enables the currently fastest Large Margin Nearest Neighbor (LMNN) implementation to learn metrics on one million samples in 6.8 minutes, down from 5.4 hours (a 48x speedup). LMNN and similar state-of-the-art methods use tree data structures to speed up nearest-neighbor queries, an advantage that degrades at higher dimensions. Our approach does not share this limitation.
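A minimal sketch of the bucket-then-cluster compression idea, assuming random-hyperplane LSH and per-bucket K-Means; the discriminative filtering variant and all parameter choices here are ours, not the paper's:

    import numpy as np
    from sklearn.cluster import KMeans

    def compress_training_set(X, n_bits=8, per_bucket=10, seed=0):
        """Hash points into LSH buckets via random hyperplanes, then keep
        K-Means centroids from each bucket as the compressed set C."""
        rng = np.random.default_rng(seed)
        H = rng.standard_normal((X.shape[1], n_bits))
        codes = (X @ H > 0) @ (1 << np.arange(n_bits))  # integer bucket ids
        reps = []
        for b in np.unique(codes):
            bucket = X[codes == b]
            k = min(per_bucket, len(bucket))
            reps.append(KMeans(n_clusters=k, n_init=5).fit(bucket).cluster_centers_)
        return np.vstack(reps)

The compressed set returned here would then be fed to the metric learner (e.g., LMNN) in place of the full training set.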
Conference Paper
In this paper, we raise important issues on scalability and the required degree of supervision of existing Mahalanobis metric learning methods. Often rather tedious optimization procedures are applied that become computationally intractable on a large scale. Further, if one considers the constantly growing amount of data it is often infeasible to specify fully supervised labels for all data points. Instead, it is easier to specify labels in form of equivalence constraints. We introduce a simple though effective strategy to learn a distance metric from equivalence constraints, based on a statistical inference perspective. In contrast to existing methods we do not rely on complex optimization problems requiring computationally expensive iterations. Hence, our method is orders of magnitudes faster than comparable methods. Results on a variety of challenging benchmarks with rather diverse nature demonstrate the power of our method. These include faces in unconstrained environments, matching before unseen object instances and person re-identification across spatially disjoint cameras. In the latter two benchmarks we clearly outperform the state-of-the-art.
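Under the zero-mean Gaussian pair-difference model this statistical test has a closed form. Writing d = x_i - x_j and letting \Sigma_S, \Sigma_D denote the covariances of differences over similar and dissimilar pairs, the log likelihood ratio reduces to a Mahalanobis form (our transcription of the standard derivation):

    \delta(d) = \log \frac{p(d \mid \text{dissimilar})}{p(d \mid \text{similar})}
              = d^\top \left( \Sigma_S^{-1} - \Sigma_D^{-1} \right) d + \text{const}

so the learned matrix is M = \Sigma_S^{-1} - \Sigma_D^{-1}, re-projected onto the positive semidefinite cone; no iterative optimization is required, which explains the claimed orders-of-magnitude speedup.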
Conference Paper
Metric learning aims at finding a distance that approximates a task-specific notion of semantic similarity. Typically, a Mahalanobis distance is learned from pairs of data labeled as being semantically similar or not. In this paper, we learn such metrics in a weakly supervised setting where "bags" of instances are labeled with "bags" of labels. We formulate the problem as a multiple instance learning (MIL) problem over pairs of bags. If two bags share at least one label, we label the pair positive, and negative otherwise. We propose to learn a metric using those labeled pairs of bags, leading to MildML, for multiple instance logistic discriminant metric learning. MildML iterates between updates of the metric and selection of putative positive pairs of examples from positive pairs of bags. To evaluate our approach, we introduce a large and challenging data set, Labeled Yahoo! News, which we have manually annotated and which contains 31147 detected faces of 5873 different people in 20071 images. We group the faces detected in an image into a bag, and group the names detected in the caption into a corresponding set of labels. When the labels come from manual annotation, we find that MildML using the bag-level annotation performs as well as fully supervised metric learning using instance-level annotation. We also consider performance in the case of automatically extracted labels for the bags, where some of the bag labels do not correspond to any example in the bag. In this case MildML works substantially better than relying on noisy instance-level annotations derived from the bag-level annotation by resolving face-name associations in images with their captions.
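The bag-pair labeling rule is simple enough to state directly; a minimal sketch (names ours):

    def label_bag_pairs(bag_labels):
        """Label every pair of bags: positive if they share at least one
        label, negative otherwise (the MIL pair construction described
        above). bag_labels is a list of label sets, one set per bag."""
        pairs = []
        for i in range(len(bag_labels)):
            for j in range(i + 1, len(bag_labels)):
                shared = bool(bag_labels[i] & bag_labels[j])
                pairs.append((i, j, 1 if shared else -1))
        return pairs

MildML then alternates between learning a metric from these bag-level pairs and picking putative positive instance pairs inside the positive bag pairs.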
Book
The metric learning problem is concerned with learning a distance function tuned to a particular task, and has been shown to be useful when used in conjunction with nearest-neighbor methods and other techniques that rely on distances or similarities. Metric Learning: A Review presents an overview of existing research in this topic, including recent progress on scaling to high-dimensional feature spaces and to data sets with an extremely large number of data points. It presents as unified a framework as possible under which existing research on metric learning can be cast. The monograph starts out by focusing on linear metric learning approaches, and mainly concentrates on the class of Mahalanobis distance learning methods. It then discusses nonlinear metric learning approaches, focusing on the connections between the non-linear and linear approaches. Finally, it discusses extensions of metric learning, as well as applications to a variety of problems in computer vision, text analysis, program analysis, and multimedia. Metric Learning: A Review is an ideal reference for anyone interested in the metric learning problem. It synthesizes much of the recent work in the area and it is hoped that it will inspire new algorithms and applications.
Conference Paper
Partial label learning aims to induce a multi-class classifier from training examples where each of them is associated with a set of candidate labels, among which only one is the ground-truth label. The common strategy to train predictive model is disambiguation, i.e. differentiating the modeling outputs of individual candidate labels so as to recover ground-truth labeling information. Recently, feature-aware disambiguation was proposed to generate different labeling confidences over candidate label set by utilizing the graph structure of feature space. However, the existence of noise and outliers in training data makes the similarity derived from original features less reliable. To this end, we proposed a novel approach for partial label learning based on adaptive graph guided disambiguation (PL-AGGD). Compared with fixed graph, adaptive graph could be more robust and accurate to reveal the intrinsic manifold structure within the data. Moreover, instead of the two-stage strategy in previous algorithms, our approach performs label disambiguation and predictive model training simultaneously. Specifically, we present a unified framework which jointly optimizes the ground-truth labeling confidences, similarity graph and model parameters to achieve strong generalization performance. Extensive experiments show that PL-AGGD performs favorably against state-of-the-art partial label learning approaches.
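The abstract does not spell out how the adaptive graph is built; one plausible reading, sketched below with invented names, is to re-estimate edge weights each round by locally reconstructing every sample from its current neighbors (as in locally linear embedding), so the graph tracks the intrinsic manifold rather than staying fixed:

    import numpy as np

    def reconstruction_weights(X, knn_idx, reg=1e-3):
        """Adaptively re-estimate graph weights: each sample is expressed
        as a normalized linear reconstruction of its neighbors (knn_idx[i]
        lists the neighbor indices of sample i, excluding i itself)."""
        n = len(X)
        W = np.zeros((n, n))
        for i in range(n):
            Z = X[knn_idx[i]] - X[i]            # neighbors centered on x_i
            G = Z @ Z.T + reg * np.eye(len(Z))  # regularized local Gram matrix
            w = np.linalg.solve(G, np.ones(len(Z)))
            W[i, knn_idx[i]] = w / w.sum()      # weights sum to one per row
        return W

In a joint scheme like the one described, such a weight update would alternate with updates to the labeling confidences and the model parameters.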
Article
Partial label learning (PLL) is a new weakly supervised learning framework that addresses classification problems where the true label of each training sample is concealed in a set of candidate labels. To learn from such weakly supervised training data, the key is to disambiguate the ambiguous labeling information. Because this is difficult to address by manipulating the label space alone, the manifold structure among training data in the feature space has in recent years been exploited simultaneously to facilitate the disambiguation process. However, the manifold structure is commonly analyzed under the assumption that samples close to each other in the feature space share identical labels in the label space, which may not hold in many real-world problems. In this paper, a geometric mean metric learning approach is employed to learn a distance metric for PLL problems that keeps the aforementioned assumption valid in as many situations as possible. This is significantly more challenging than the conventional setup of distance metric learning because it is difficult to precisely identify whether a pair of training samples belongs to the same class. We propose an alternative approach in which each training sample and its neighbors with shared candidate labels are taken as similarity pairs, and each training sample and its neighbors without shared candidate labels are taken as dissimilarity pairs. Considering that two samples with a shared candidate label do not necessarily come from the same class, a weight is placed on each similarity pair. Experimental results on twenty-four controlled UCI data sets and six real-world PLL problems show that the proposed distance metric learning approach can be used as a front end both for PLL algorithms exploiting the manifold structure among training data and for other existing distance-based PLL algorithms, significantly improving their performance.
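A minimal sketch of the described pair construction, with an invented overlap-based weight standing in for the paper's actual weighting scheme:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def build_weighted_pairs(X, cand_sets, k=5):
        """Form similarity pairs (neighbors sharing a candidate label,
        weighted, since a shared candidate does not guarantee the same
        class) and dissimilarity pairs (neighbors sharing none)."""
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
        _, idx = nn.kneighbors(X)
        sim, dis = [], []
        for i, neighbors in enumerate(idx):
            for j in neighbors[1:]:                # skip the point itself
                shared = cand_sets[i] & cand_sets[j]
                if shared:
                    # hypothetical weight: larger overlap -> more confidence
                    w = len(shared) / len(cand_sets[i] | cand_sets[j])
                    sim.append((i, int(j), w))
                else:
                    dis.append((i, int(j)))
        return sim, dis

These weighted pairs can then be handed to any pairwise metric learner, e.g., the likelihood-ratio construction sketched earlier.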
Article
In partial label learning, each training example is associated with a set of candidate labels among which only one is the ground-truth label. The common strategy to induce a predictive model is to try to disambiguate the candidate label set, i.e. differentiating the modeling outputs of individual candidate labels. Specifically, disambiguation by differentiation can be conducted either by identifying the ground-truth label iteratively or by treating each candidate label equally. Nonetheless, the disambiguation strategy is prone to being misled by false positive labels co-occurring with the ground-truth label. In this paper, a new partial label learning strategy is studied which refrains from conducting disambiguation. Specifically, by adapting error-correcting output codes (ECOC), a simple yet effective approach named PL-ECOC is proposed which utilizes the candidate label set as an entirety. During the training phase, to build a binary classifier w.r.t. each column coding, a partially labeled example is regarded as a positive or negative training example only if its candidate label set entirely falls into the coding dichotomy. During the testing phase, the class label for an unseen instance is determined via loss-based decoding, which considers the binary classifiers' empirical performance and predictive margin. Extensive experiments show that PL-ECOC performs favorably against state-of-the-art partial label learning approaches.
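A rough sketch of the entirety-based training rule under random column codings; degenerate columns (too few retained examples, or only one class) are not handled, loss-based decoding is omitted, and LinearSVC merely stands in for the base binary learner:

    import numpy as np
    from sklearn.svm import LinearSVC

    def train_plecoc_columns(X, cand_sets, labels, code_len=10, seed=0):
        """For each random +/-1 column coding over the label set, keep only
        examples whose whole candidate set falls on one side of the
        dichotomy, and train a binary classifier on them."""
        rng = np.random.default_rng(seed)
        codes = rng.integers(0, 2, (len(labels), code_len)) * 2 - 1
        models = []
        for c in range(code_len):
            pos = {l for l, b in zip(labels, codes[:, c]) if b == 1}
            keep, y = [], []
            for i, S in enumerate(cand_sets):
                if S <= pos:               # candidate set entirely positive
                    keep.append(i); y.append(1)
                elif not (S & pos):        # candidate set entirely negative
                    keep.append(i); y.append(-1)
            models.append(LinearSVC().fit(X[keep], y))
        return codes, models

At test time, each classifier votes one bit of the codeword and the class whose coding row best matches (under the decoding loss) is predicted.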
Conference Paper
Partial label learning deals with the problem where each training example is represented by a feature vector while associated with a set of candidate labels, among which only one label is valid. To learn from such ambiguous labeling information, the key is to disambiguate the candidate label sets of the partial label training examples. Existing disambiguation strategies work by either identifying the ground-truth label iteratively or treating each candidate label equally. Nonetheless, the disambiguation process is generally conducted by focusing on manipulating the label space, and thus falls short of making full use of potentially useful information from the feature space. In this paper, a novel two-stage approach is proposed for learning from partial label examples based on feature-aware disambiguation. In the first stage, the manifold structure of the feature space is utilized to generate normalized labeling confidences over the candidate label set. In the second stage, the predictive model is learned by performing regularized multi-output regression over the generated labeling confidences. Extensive experiments on artificial as well as real-world partial label data sets clearly validate the superiority of the proposed feature-aware disambiguation approach.
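A schematic of that two-stage recipe, assuming a row-normalized feature-space graph W (e.g., the reconstruction weights sketched earlier), a binary candidate matrix cand, and freely chosen parameters; the actual algorithm's propagation and regression details are not reproduced here:

    import numpy as np
    from sklearn.linear_model import Ridge

    def feature_aware_two_stage(X, W, cand, alpha=0.95, iters=50):
        """Stage 1: propagate labeling confidences over the feature-space
        graph, clamped to each example's candidate set. Stage 2: fit a
        regularized multi-output regressor on the resulting confidences."""
        F = cand / cand.sum(axis=1, keepdims=True)   # uniform over candidates
        for _ in range(iters):
            F = alpha * (W @ F) + (1 - alpha) * cand
            F = F * cand                             # zero out non-candidates
            F = F / F.sum(axis=1, keepdims=True)     # renormalize per example
        return Ridge(alpha=1.0).fit(X, F)            # stage 2

Ridge here stands in for whatever regularized multi-output regressor the second stage actually uses; prediction for an unseen instance is the argmax over its regressed confidence vector.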
Article
In the superset label learning problem (SLL), each training instance provides a set of candidate labels, of which one is the true label of the instance; the candidate label set can thus be viewed as a noisy version of the true label. In this work, we solve the problem by maximizing the likelihood of the candidate label sets of the training instances. We propose a probabilistic model, the Logistic Stick-Breaking Conditional Multinomial Model (LSB-CMM), to do the job. The LSB-CMM is derived from the logistic stick-breaking process. It first maps data points to mixture components and then assigns to each mixture component a label drawn from a component-specific multinomial distribution. The mixture components can capture underlying structure in the data, which is very useful when the model is weakly supervised. This advantage comes at little cost, since the model introduces few additional parameters. Experimental tests on several real-world problems with superset labels show results that are competitive with or superior to the state of the art. The discovered underlying structures also provide improved explanations of the classification predictions.
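The driving quantity is the likelihood of each observed candidate set; marginalizing the unknown true label over the set gives an objective of roughly this shape (our paraphrase; the LSB-CMM additionally places the stick-breaking mixture over components):

    L(\theta) = \sum_{i=1}^{n} \log \sum_{y \in S_i} p(y \mid x_i; \theta)

where S_i is the candidate label set of instance x_i.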
Conference Paper
The FG-NET aging database was released in 2004 in an attempt to support research activities related to facial aging. Since then, a number of researchers have used the database for carrying out research in various disciplines related to facial aging. Based on an analysis of published work in which the FG-NET aging database was used, we present conclusions about the types of research carried out and the impact of the dataset in shaping the research topic of facial aging. In particular, we focus our attention on age estimation, which proved to be the most popular task among users of the FG-NET aging database. Through a review of key papers in age estimation and the presentation of benchmark results, the main approaches and directions in facial aging research are outlined, and future trends, requirements and research directions are drafted.
Article
The metric learning problem is concerned with learning a distance function tuned to a particular task, and has been shown to be useful when used in conjunction with nearest-neighbor methods and other techniques that rely on distances or similarities. This survey presents an overview of existing research in metric learning, including recent progress on scaling to high-dimensional feature spaces and to data sets with an extremely large number of data points. A goal of the survey is to present as unified as possible a framework under which existing research on metric learning can be cast. The first part of the survey focuses on linear metric learning approaches, mainly concentrating on the class of Mahalanobis distance learning methods. We then discuss nonlinear metric learning approaches, focusing on the connections between the nonlinear and linear approaches. Finally, we discuss extensions of metric learning, as well as applications to a variety of problems in computer vision, text analysis, program analysis, and multimedia.
Article
Multi-instance multi-label learning (MIML) is a framework for supervised classification where the objects to be classified are bags of instances associated with multiple labels. For example, an image can be represented as a bag of segments and associated with a list of objects it contains. Prior work on MIML has focused on predicting label sets for previously unseen bags. We instead consider the problem of predicting instance labels while learning from data labeled only at the bag level. We propose Rank-Loss Support Instance Machines, which optimize a regularized rank-loss objective and can be instantiated with different aggregation models connecting instance-level predictions with bag-level predictions. The aggregation models that we consider are equivalent to defining a "support instance" for each bag, which allows efficient optimization of the rank-loss objective using primal sub-gradient descent. Experiments on artificial and real-world datasets show that the proposed methods achieve higher accuracy than other loss functions used in prior work, e.g., Hamming loss, and recent work in ambiguous label classification.
Conference Paper
Inducing a classification function from a set of examples in the form of labeled instances is a standard problem in supervised machine learning. In this paper, we are concerned with ambiguous label classification (ALC), an extension of this setting in which several candidate labels may be assigned to a single example. By extending three concrete classification methods to the ALC setting and evaluating their performance on benchmark data sets, we show that appropriately designed learning algorithms can successfully exploit the information contained in ambiguously labeled examples. Our results indicate that the fundamental idea of the extended methods, namely to disambiguate the label information by means of the inductive bias underlying (heuristic) machine learning methods, works well in practice.
Article
We address the problem of partially-labeled multiclass classification, where instead of a single label per instance, the algorithm is given a candidate set of labels, only one of which is correct. Our setting is motivated by a common scenario in many image and video collections, where only partial access to labels is available. The goal is to learn a classifier that can disambiguate the partially-labeled training instances, and generalize to unseen data. We define an intuitive property of the data distribution that sharply characterizes the ability to learn in this setting and show that effective learning is possible even when all the data is only partially labeled. Exploiting this property of the data, we propose a convex learning formulation based on minimization of a loss function appropriate for the partial label setting. We analyze the conditions under which our loss function is asymptotically consistent, as well as its generalization and transductive performance. We apply our framework to identifying faces culled from web news sources and to naming characters in TV series and movies; in particular, we annotated and experimented on a very large video data set, achieving 6% error for character naming on 16 episodes of the TV series Lost.
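The convex formulation scores the average classifier output over the candidate set against every non-candidate label; a surrogate loss of roughly this shape (our reconstruction from the description, with \Psi a convex binary loss such as the hinge):

    L_\Psi(g, x, Y) = \Psi\left( \frac{1}{|Y|} \sum_{a \in Y} g_a(x) \right) + \sum_{a \notin Y} \Psi\left( -g_a(x) \right)

Minimizing this pushes at least one candidate score up while pushing all non-candidate scores down, which is what makes disambiguation possible without knowing the true label.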
Article
Many areas of science depend on exploratory data analysis and visualization. The need to analyze large amounts of multivariate data raises the fundamental problem of dimensionality reduction: how to discover compact representations of high-dimensional data. Here, we introduce locally linear embedding (LLE), an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs. Unlike clustering methods for local dimensionality reduction, LLE maps its inputs into a single global coordinate system of lower dimensionality, and its optimizations do not involve local minima. By exploiting the local symmetries of linear reconstructions, LLE is able to learn the global structure of nonlinear manifolds, such as those generated by images of faces or documents of text.
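For orientation, the algorithm is available off the shelf; a minimal usage sketch (dataset and parameters arbitrary):

    from sklearn.datasets import load_digits
    from sklearn.manifold import LocallyLinearEmbedding

    X, _ = load_digits(return_X_y=True)
    # neighborhood-preserving 2-D embedding of the 64-D digit images
    emb = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X)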
Article
Scientists working with large volumes of high-dimensional data, such as global climate patterns, stellar spectra, or human gene distributions, regularly confront the problem of dimensionality reduction: finding meaningful low-dimensional structures hidden in their high-dimensional observations. The human brain confronts the same problem in everyday perception, extracting from its high-dimensional sensory inputs (30,000 auditory nerve fibers or 10^6 optic nerve fibers) a manageably small number of perceptually relevant features. Here we describe an approach to solving dimensionality reduction problems that uses easily measured local metric information to learn the underlying global geometry of a data set. Unlike classical techniques such as principal component analysis (PCA) and multidimensional scaling (MDS), our approach is capable of discovering the nonlinear degrees of freedom that underlie complex natural observations, such as human handwriting or images of a face under different viewing conditions. In contrast to previous algorithms for nonlinear dimensionality reduction, ours efficiently computes a globally optimal solution, and, for an important class of data manifolds, is guaranteed to converge asymptotically to the true structure.
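The analogous off-the-shelf call for this method (Isomap), which estimates geodesic distances over a neighborhood graph and then applies MDS; dataset and parameters again arbitrary:

    from sklearn.datasets import load_digits
    from sklearn.manifold import Isomap

    X, _ = load_digits(return_X_y=True)
    # geodesic-distance-preserving 2-D embedding of the 64-D digit images
    emb = Isomap(n_neighbors=10, n_components=2).fit_transform(X)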
Solving the partial label learning problem: An instance-based approach
M.L. Zhang and F. Yu, "Solving the partial label learning problem: An instance-based approach," Proc. 24th Int. Joint Conf. Artif. Intell. (IJCAI), Argentina, pp. 4048-4054, 2015.