TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation

Microsoft Research Ltd, Cambridge, UK; Department of Engineering, University of Cambridge
DOI: 10.1007/11744023_1

ABSTRACT This paper proposes a new approach to learning a discriminative model of object classes, incorporating appearance, shape and
context information efficiently. The learned model is used for automatic visual recognition and semantic segmentation of photographs.
Our discriminative model exploits novel features, based on textons, which jointly model shape and texture. Unary classification
and feature selection are achieved using shared boosting to give an efficient classifier which can be applied to a large number
of classes. Accurate image segmentation is achieved by incorporating these classifiers in a conditional random field. Efficient
training of the model on very large datasets is achieved by exploiting both random feature selection and piecewise training.

High classification and segmentation accuracy is demonstrated on three different databases: (i) our own 21-object class database
of photographs of real objects viewed under general lighting conditions, poses and viewpoints; (ii) the 7-class Corel subset;
and (iii) the 7-class Sowerby database used in [1]. The proposed algorithm gives competitive results for highly textured objects
(e.g. grass, trees), highly structured objects (e.g. cars, faces, bikes, aeroplanes) and articulated objects (e.g. body, cow).
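The texton features at the heart of the model are not spelled out in the abstract, but a common construction, sketched here under assumed details (a hand-rolled 3-filter bank and a tiny k-means, not the paper's exact pipeline), is to filter the image with a small filter bank and cluster the per-pixel response vectors; each pixel's cluster index is its texton:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2(img, kernel):
    """2D correlation with edge padding (odd-sized kernels)."""
    kh, kw = kernel.shape
    p = np.pad(img, ((kh // 2,) * 2, (kw // 2,) * 2), mode="edge")
    win = sliding_window_view(p, (kh, kw))
    return (win * kernel).sum(axis=(-2, -1))

# Illustrative 3-filter bank: local mean plus two gradient filters.
BANK = [
    np.ones((3, 3)) / 9.0,                                 # local mean
    np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]) / 4.0,  # horizontal gradient
    np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]]) / 4.0,  # vertical gradient
]

def kmeans_labels(X, k, iters=10, seed=0):
    """Tiny k-means; returns a cluster index per row of X."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centres[j] = members.mean(axis=0)
    return labels

def texton_map(img, k=8):
    """Per-pixel texton indices: filter responses clustered by k-means."""
    resp = np.stack([conv2(img, f) for f in BANK], axis=-1)  # (H, W, 3)
    labels = kmeans_labels(resp.reshape(-1, resp.shape[-1]), k)
    return labels.reshape(img.shape)

img = np.random.default_rng(1).random((32, 32))
tm = texton_map(img, k=8)
print(tm.shape)  # (32, 32)
```

In the full model, boosted classifiers would then be learned over counts of these texton indices inside shifted rectangular regions, jointly capturing texture and shape.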

  • ABSTRACT: Image segmentation is known to be an ambiguous problem whose solution requires integrating image and shape cues at various levels; low-level information alone is often not sufficient for a segmentation algorithm to match human capability. Two recent trends are popular in this area: (1) low-level and mid-level cues are combined together in learning-based approaches to localize segmentation boundaries; (2) high-level vision tasks such as image labeling and object recognition are performed directly to obtain object boundaries. In this paper, we present the observation that performing image segmentation in reverse, i.e., using a high-level semantic labeling approach to address the low-level segmentation problem, can be an effective solution. We perform semantic labeling on input images and derive segmentations from the labeling results. We adopt graph coloring theory to connect these two tasks and provide theoretical insights into our solution. This seemingly unusual way of doing image segmentation leads to surprisingly encouraging results, superior or comparable to those of the state-of-the-art image segmentation algorithms on multiple publicly available datasets.
    British Machine Vision Conference (BMVC), 2014; 01/2014
  • IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, Ohio; 06/2014
  • ABSTRACT: This paper proposes a simple method for selecting an optimal feature subset for urban scene understanding. An urban scene can be divided into three semantic levels: subordinate, basic and superordinate. Each semantic level contains objects of different classes, which can be described as an N-dimensional vector. The proposed algorithm uses both appearance and contextual properties for feature selection. The algorithm takes into account physical properties of objects (texture information, contrast, geometric size, etc.), contextual information, and relations between semantic levels. The training process consists of three steps. In the first step, we use a combination of a genetic algorithm and C-means clustering to find the subclasses and select descriptive features. Then classes are analyzed in pairs in order to select the features that maximize the difference between classes. Finally, contextual analysis is applied in order to find dependencies in the appearance of different classes.
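The pairwise step described in the last abstract above can be sketched as follows. The abstract does not specify the difference criterion, so a Fisher-style separation score is an assumed stand-in: for each pair of classes, rank features by squared mean difference over summed variance and keep the top scorers.

```python
import numpy as np

def pairwise_feature_scores(Xa, Xb, eps=1e-9):
    """Per-feature separation between two classes:
    (difference of class means)^2 / (sum of class variances).
    Higher score = feature better distinguishes the pair."""
    num = (Xa.mean(axis=0) - Xb.mean(axis=0)) ** 2
    den = Xa.var(axis=0) + Xb.var(axis=0) + eps
    return num / den

# Synthetic demo: 4 features, only feature 2 actually separates the classes.
rng = np.random.default_rng(0)
Xa = rng.normal(0.0, 1.0, (100, 4))
Xb = rng.normal(0.0, 1.0, (100, 4))
Xb[:, 2] += 5.0  # shift feature 2 for class b
scores = pairwise_feature_scores(Xa, Xb)
best = int(np.argmax(scores))
print(best)  # feature 2 gets the highest score
```

In the full method this scoring would be repeated for every class pair, and the union of the selected features would form the subset used downstream.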
