Conference Paper

Active Learning from Positive and Unlabeled Data.

DOI: 10.1109/ICDMW.2011.20
Conference: Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on, Vancouver, BC, Canada, December 11, 2011
Source: DBLP

ABSTRACT In recent years, active learning has become a popular paradigm for using user feedback to improve the accuracy of learning algorithms. Active learning works by selecting the most informative sample among the unlabeled data and querying the user for its label. Many methods, such as uncertainty sampling and minimum risk sampling, have been used to select the most informative sample. Although many active learning algorithms have been proposed, most of them address binary or multi-class classification and therefore cannot be applied to problems in which only samples from one class and a set of unlabeled data are available. Such problems arise in many real-world situations and are known as learning from positive and unlabeled data. In this paper, we propose an active learning algorithm that works in this setting. Our method separately estimates the probability densities of the positive and unlabeled points and then computes the expected value of informativeness, which eliminates a hyper-parameter and yields a better measure of informativeness. Experiments and empirical analysis show promising results compared to other similar methods.
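
The abstract gives only the outline of the method, so the following is a rough sketch of the general idea rather than the authors' algorithm. It assumes that the hyper-parameter being averaged out is the unknown positive class prior, that densities are estimated with a fixed-bandwidth Gaussian kernel, and that informativeness is measured as closeness of the estimated posterior to 0.5. All function names, the bandwidth, and the prior grid are hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_kde(train, query, bandwidth=0.5):
    """Gaussian kernel density estimate of `train`, evaluated at each row of `query`."""
    d = train.shape[1]
    # Pairwise squared distances between query points and training points.
    sq_dist = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=2)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)).mean(axis=1) / norm

def expected_informativeness(positive, unlabeled, priors=None, bandwidth=0.5):
    """Average an uncertainty score over a grid of candidate class priors (assumption)."""
    if priors is None:
        priors = np.linspace(0.05, 0.95, 19)  # assumed uniform grid over the prior
    p_pos = gaussian_kde(positive, unlabeled, bandwidth)   # estimate of p(x | y = 1)
    p_unl = gaussian_kde(unlabeled, unlabeled, bandwidth)  # estimate of p(x)
    ratio = p_pos / np.maximum(p_unl, 1e-12)
    # Bayes' rule: p(y = 1 | x) = prior * p(x | y = 1) / p(x), clipped to [0, 1].
    posterior = np.clip(priors[:, None] * ratio[None, :], 0.0, 1.0)
    # Uncertainty is 1 when the posterior is 0.5 and 0 when it is 0 or 1.
    uncertainty = 1.0 - np.abs(2.0 * posterior - 1.0)
    return uncertainty.mean(axis=0)

def select_query(positive, unlabeled, **kwargs):
    """Index of the unlabeled point with the highest expected informativeness."""
    return int(np.argmax(expected_informativeness(positive, unlabeled, **kwargs)))

# Synthetic data: positives near the origin, unlabeled points drawn from two clusters.
rng = np.random.default_rng(0)
positive = rng.normal(0.0, 1.0, size=(30, 2))
unlabeled = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 2)),
    rng.normal(4.0, 1.0, size=(50, 2)),
])
query_index = select_query(positive, unlabeled)
```

Averaging over the grid means that no single value of the class prior has to be chosen in advance, which is one plausible reading of "computing the expected value of informativeness" in the abstract.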

  • ABSTRACT: In this paper, we propose a general transductive learning framework named generalized manifold-ranking-based image retrieval (gMRBIR) for image retrieval. Compared with an existing transductive learning method named MRBIR [12], our method works well whether or not the query image is in the database; thus, it is more applicable to real applications. Given a query image, gMRBIR first initializes a pseudo seed vector based on neighborhood relationships and then spreads its scores via manifold ranking to all the unlabeled images in the database. Furthermore, gMRBIR also uses relevance feedback and active learning to refine the retrieval result so that it converges to the query concept as fast as possible. Systematic experiments on a general-purpose image database consisting of 5,000 Corel images demonstrate the superiority of gMRBIR over state-of-the-art techniques.
    IEEE Transactions on Image Processing 11/2006; 15(10):3170-7. · 3.20 Impact Factor
  • ABSTRACT: This paper surveys the existing methods of learning from positive and unlabeled examples. We divide the existing methods into three families and review the main algorithms of each. The first family of methods takes a two-step strategy, extracting some reliable negative examples and then applying a supervised or semi-supervised learning method. The second family of methods estimates statistical queries over positive and unlabeled examples. The third family of methods reduces this problem to the problem of learning with high one-sided noise by treating the unlabeled set as noisy negative examples. Finally, we conclude and outline future work.
    Information Processing (ISIP), 2008 International Symposiums on; 06/2008
  • ABSTRACT: Most methods for learning object categories require large amounts of labeled training data. However, obtaining such data can be a difficult and time-consuming endeavor. We have developed a novel, entropy-based "active learning" approach which makes significant progress towards solving this problem. The main idea is to sequentially acquire labeled data by presenting an oracle (the user) with unlabeled images that will be particularly informative when labeled. Active learning adaptively prioritizes the order in which the training examples are acquired, which, as shown by our experiments, can significantly reduce the overall number of training examples required to reach near-optimal performance. At first glance this may seem counter-intuitive: how can the algorithm know whether a group of unlabeled images will be informative, when, by definition, there is no label directly associated with any of the images? Our approach is based on choosing an image to label that maximizes the expected amount of information we gain about the set of unlabeled images. The technique is demonstrated in several contexts, including improving the efficiency of Web image-search queries and open-world visual learning by an autonomous agent. Experiments on a large set of 140 visual object categories taken directly from text-based Web image searches show that our technique can provide large improvements (up to a 10x reduction in the number of training examples needed) over baseline techniques.
    Computer Vision and Pattern Recognition Workshops, 2008. CVPR Workshops 2008. IEEE Computer Society Conference on; 07/2008