Conference Paper

Unsupervised Learning of Invariant Features Using Video

DOI: 10.1109/CVPR.2010.5539773 Conference: Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on
Source: IEEE Xplore

ABSTRACT We present an algorithm that learns invariant features from real data in an entirely unsupervised fashion. The principal benefit of our method is that it can be applied without human intervention to a particular application or data set, learning the specific invariances necessary for excellent feature performance on that data. Our algorithm relies on the ability to track image patches over time using optical flow. With the wide availability of high-frame-rate video (e.g., on the web, from a robot), good tracking is straightforward to achieve. The algorithm then optimizes feature parameters such that patches corresponding to the same physical location have feature descriptors that are as similar as possible, while simultaneously maximizing the distinctness of descriptors for different locations. Thus, our method captures data- or application-specific invariances yet requires no manual supervision. We apply our algorithm to learn domain-optimized versions of SIFT and HOG. SIFT and HOG features are excellent and widely used, but they are general and by definition not tailored to a specific domain. Our domain-optimized versions offer a substantial performance increase on the classification and correspondence tasks we consider. Furthermore, we show that the features our method learns are close to the optimum that would be achieved by directly optimizing the test-set performance of a classifier. Finally, we demonstrate that the learning often allows fewer features to be used for some tasks, which has the potential to dramatically reduce computational cost on very large data sets.
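The optimization described above can be sketched as a contrastive objective over tracked patch pairs: pull together descriptors of patches that the optical-flow tracker says are the same physical location, and push apart descriptors of different locations. The weighted orientation histogram below is a hypothetical stand-in for a parameterized SIFT/HOG-style descriptor, and the margin loss is an illustrative choice, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def descriptor(patch, w):
    # Hypothetical parameterized descriptor: a gradient-orientation histogram
    # with learnable per-bin weights w (a stand-in for SIFT/HOG parameters).
    gx = patch[:, 1:] - patch[:, :-1]
    gy = patch[1:, :] - patch[:-1, :]
    gx, gy = gx[:-1, :], gy[:, :-1]            # crop to matching shapes
    ang = np.arctan2(gy, gx)
    mag = np.hypot(gx, gy)
    k = len(w)
    bins = np.floor((ang + np.pi) / (2 * np.pi) * k).astype(int) % k
    h = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=k) * w
    return h / (np.linalg.norm(h) + 1e-8)

def contrastive_loss(pairs_same, pairs_diff, w, margin=0.5):
    # Minimize distance for same-track pairs; keep different-location
    # pairs at least `margin` apart (hinge term).
    d = lambda a, b: np.linalg.norm(descriptor(a, w) - descriptor(b, w))
    pull = np.mean([d(a, b) for a, b in pairs_same])
    push = np.mean([max(0.0, margin - d(a, b)) for a, b in pairs_diff])
    return pull + push
```

In this sketch, minimizing `contrastive_loss` over `w` (e.g., by any gradient-free or autodiff optimizer) plays the role of the paper's feature-parameter optimization, with the tracked pairs supplying the supervision for free.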

  • Source
    ABSTRACT: Place recognition for loop-closure detection lies at the heart of every Simultaneous Localization and Mapping (SLAM) method. Recently, methods that use cameras and describe the entire image by one holistic feature vector have experienced a resurgence. Despite the success of these methods, it remains unclear how a descriptor should be constructed for this particular purpose. The problem of choosing the right descriptor becomes even more pronounced in the context of life-long mapping: the appearance of a place may vary considerably under different illumination conditions and over the course of a day, and none of the handcrafted descriptors published in the literature is particularly designed for this purpose. Herein, we propose to use a set of elementary building blocks from which millions of different descriptors can be constructed automatically. Moreover, we present an evaluation function which measures the performance of a given image descriptor for place recognition under severe lighting changes. Finally, we present an algorithm to efficiently search the space of descriptors to find the best-suited one. Evaluating the trained descriptor on a test set shows clear superiority over its handcrafted counterparts such as BRIEF and U-SURF. Finally, we show how loop closures can be reliably detected using the automatically learned descriptor. Two overlapping image sequences from two different days and times are merged into one pose graph. The resulting merged pose graph is optimized and does not contain a single false link, while at the same time all true loop closures were detected correctly. The descriptor and the place-recognizer source code are published with datasets on
    IEEE Intelligent Vehicles Symposium; 01/2013
  • Source
    ABSTRACT: This paper gives a review of the recent developments in deep learning and unsupervised feature learning for time-series problems. While these techniques have shown promise for modeling static data, such as computer vision, applying them to time-series data is gaining increasing attention. This paper overviews the particular challenges present in time-series data and provides a review of the works that have either applied time-series data to unsupervised feature learning algorithms or, alternatively, have contributed modifications of feature learning algorithms to take into account the challenges present in time-series data.
    Pattern Recognition Letters 06/2014; 42. DOI:10.1016/j.patrec.2014.01.008 · 1.06 Impact Factor
  •
    ABSTRACT: Image retrieval based on the query-by-example (QBE) principle is still not reliable enough, largely because of the likely variations in the capture conditions (e.g., light, blur, scale, occlusion) and viewpoint between the query image and the images in the collection. In this paper, we propose a framework in which this problem is explicitly addressed to improve the reliability of QBE-based image retrieval. We aim at the use scenario in which the user captures the query object with his/her mobile device and requests information augmenting the query from the database. Reliability improvement is achieved by allowing the user to submit not a single image but a short video clip as a query. Since a video clip may combine object or scene appearances captured from different viewpoints and under different conditions, the rich information contained therein can be exploited to discover the proper query representation and to improve the relevance of the retrieved results. The experimental results show that video-based image retrieval (VBIR) is significantly more reliable than retrieval using a single image as the query. Furthermore, to make the proposed framework deployable in a practical mobile image retrieval system, where real-time query response is required, we also propose a priority queue-based feature description scheme and a cache-based bi-quantization algorithm for an efficient parallel implementation of the VBIR concept.
    09/2012; 2(3). DOI:10.1007/s13735-012-0023-3
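The video-as-query idea in the last entry can be sketched minimally: compute a global descriptor per frame, aggregate the clip into one query representation, and rank database images by descriptor distance. The intensity-histogram descriptor and mean aggregation below are illustrative assumptions, not the paper's actual pipeline (which uses a priority queue-based feature description scheme and bi-quantization).

```python
import numpy as np

rng = np.random.default_rng(2)

def frame_descriptor(frame):
    # Hypothetical holistic descriptor: a normalized intensity histogram
    # over values in [0, 1].
    h, _ = np.histogram(frame, bins=16, range=(0.0, 1.0))
    h = h.astype(float)
    return h / (h.sum() + 1e-8)

def video_query(frames, database):
    # Aggregate the clip into one query vector (here: the mean descriptor),
    # pooling evidence from multiple viewpoints/conditions, then rank the
    # database images by Euclidean distance to that vector.
    q = np.mean([frame_descriptor(f) for f in frames], axis=0)
    dists = [np.linalg.norm(q - frame_descriptor(img)) for img in database]
    return int(np.argmin(dists)), dists
```

Averaging over frames is the simplest pooling choice; the benefit over single-image QBE comes from the fact that per-frame descriptor noise (blur, occlusion, exposure) is damped in the aggregate.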