TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation

Microsoft Research Ltd, Cambridge, UK; Department of Engineering, University of Cambridge
DOI: 10.1007/11744023_1

ABSTRACT This paper proposes a new approach to learning a discriminative model of object classes, incorporating appearance, shape and
context information efficiently. The learned model is used for automatic visual recognition and semantic segmentation of photographs.
Our discriminative model exploits novel features, based on textons, which jointly model shape and texture. Unary classification
and feature selection are achieved using shared boosting to give an efficient classifier which can be applied to a large number
of classes. Accurate image segmentation is achieved by incorporating these classifiers in a conditional random field. Efficient
training of the model on very large datasets is achieved by exploiting both random feature selection and piecewise training.

High classification and segmentation accuracy are demonstrated on three different databases: i) our own 21-object class database
of photographs of real objects viewed under general lighting conditions, poses and viewpoints, ii) the 7-class Corel subset
and iii) the 7-class Sowerby database used in [1]. The proposed algorithm gives competitive results for highly textured
(e.g. grass, trees), highly structured (e.g. cars, faces, bikes, aeroplanes), and articulated objects (e.g. body, cow).
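The texton features the abstract refers to can be illustrated with a minimal sketch: each pixel's filter-bank responses are quantized to the nearest of K learned cluster centres ("textons"), producing an integer texton map on which the boosted classifiers operate. The function name, filter-response layout, and use of plain nearest-centre assignment here are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def texton_map(responses, centres):
    """Assign each pixel to its nearest texton (illustrative sketch).

    responses: (H, W, D) per-pixel filter-bank responses.
    centres:   (K, D) texton centres, e.g. from k-means over training pixels.
    Returns an (H, W) integer map of texton indices.
    """
    H, W, D = responses.shape
    flat = responses.reshape(-1, D)  # one D-dim response vector per pixel
    # Squared Euclidean distance from every pixel to every centre,
    # then take the nearest centre per pixel.
    d2 = ((flat[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1).reshape(H, W)
```

Shape–texture "texton layout" features would then be computed as counts of texton indices inside offset rectangles relative to each pixel.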

  • Source
    ABSTRACT: Most of the existing approaches for RGB-D indoor scene labeling employ hand-crafted features for each modality independently and combine them in a heuristic manner. There have been some attempts at directly learning features from raw RGB-D data, but the performance is not satisfactory. In this paper, we adapt unsupervised feature learning techniques to RGB-D labeling, casting it as a multi-modality learning problem. Our learning framework performs feature learning and feature encoding simultaneously, which significantly boosts the performance. By stacking the basic learning structure, higher-level features are derived and combined with lower-level features to better represent RGB-D data. Experimental results on the benchmark NYU depth dataset show that our method achieves competitive performance compared with the state-of-the-art.
    ECCV; 09/2014
  • Source
    ABSTRACT: Most existing high-performance co-segmentation algorithms are complicated, owing to the way they co-label a set of images and the need to handle quite a few parameters for effective co-segmentation. In this paper, instead of relying on the complex process of co-labelling multiple images, we perform segmentation on individual images based on a combined saliency map obtained by fusing single-image saliency maps of a group of similar images. In particular, a new multiple-image saliency extraction method, namely the geometric mean saliency (GMS) method, is proposed to obtain the global saliency maps. In GMS, we transmit the saliency information among the images using a warping technique. Experiments show that our method outperforms state-of-the-art methods on three benchmark co-segmentation datasets.
    IEEE ICIP; 10/2014
  • Source
    N Dinesh Reddy, Prateek Singhal, K Madhava
    ABSTRACT: While the literature has been fairly dense in the areas of scene understanding and semantic labeling, there have been few works that make use of motion cues to improve semantic performance and vice versa. In this paper, we address the problem of semantic motion segmentation and show how semantic and motion priors augment performance. We propose an algorithm that jointly infers the semantic class and motion labels of an object. Integrating semantic, geometric and optical-flow-based constraints into a dense CRF model, we infer both the object class and the motion class for each pixel. We found improved performance using a fully connected CRF compared to standard clique-based CRFs. For inference, we use a mean-field approximation algorithm. Our method outperforms recently proposed motion detection algorithms and also improves semantic labeling compared to the state-of-the-art Automatic Labeling Environment algorithm on the challenging KITTI dataset, especially for object classes such as pedestrians and cars that are critical to outdoor robotic navigation scenarios.
    Indian Conference on Computer Vision, Graphics and Image Processing; 12/2014
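The fusion step named by the GMS method above can be sketched as a per-pixel geometric mean over saliency maps that have already been warped into the reference image's frame. The function name, the clipping epsilon, and the assumption of pre-aligned maps in [0, 1] are illustrative, not the authors' exact formulation.

```python
import numpy as np

def geometric_mean_saliency(maps, eps=1e-6):
    """Fuse aligned saliency maps by a per-pixel geometric mean (sketch).

    maps: (N, H, W) saliency maps warped to one reference image,
          values in [0, 1].
    Returns the (H, W) geometric-mean saliency map.
    """
    stack = np.clip(np.asarray(maps, dtype=float), eps, 1.0)
    # Geometric mean = exp(mean(log s_i)); clipping avoids log(0).
    return np.exp(np.log(stack).mean(axis=0))
```

Compared with an arithmetic mean, the geometric mean suppresses pixels that any single map scores near zero, so only regions salient in all images of the group survive the fusion.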
