Image analysis by counting on a grid
ABSTRACT In recent object/scene recognition research, images or large image regions are often represented as disorganized "bags" of image features. This representation allows the direct application of models developed for word counts in text. However, image feature counts are likely to be constrained in different ways than word counts in text. As a camera pans upwards from a building entrance over its first few floors, then above the penthouse to the backdrop formed by the mountains, and then further up into the sky, some feature counts in the image drop while others rise, only to drop again, giving way to features found more often at higher elevations (Fig. 1). The space of all possible feature count combinations is constrained by the properties of the larger scene as well as by the size and location of the window into it. Accordingly, our model is based on a grid of feature counts, considerably larger than any of the modeled images, yet considerably smaller than the real estate needed to tile the images next to each other tightly. Each modeled image is assumed to have a representative window in the grid in which the sum of feature counts mimics the distribution in the image. We provide learning procedures that jointly map all images in the training set to the counting grid and estimate the appropriate local counts in it. Experimentally, we demonstrate that the resulting representation captures the space of feature count combinations more accurately than traditional models, such as latent Dirichlet allocation, even when modeling images of different scenes from the same category.
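The core relationship the abstract describes — an image's bag of feature counts should match the counts summed inside some window of a much larger grid — can be illustrated with a toy sketch. The grid sizes, feature count Z, and window positions below are made up for illustration, and this deliberately omits the paper's actual learning procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# A counting grid: at each of the E1 x E2 cells, a distribution over Z features.
E1, E2, Z = 20, 20, 5
grid = rng.random((E1, E2, Z))
grid /= grid.sum(axis=2, keepdims=True)  # normalize counts per cell

def window_histogram(grid, i, j, w1, w2):
    """Feature distribution for the window with top-left corner (i, j):
    sum the per-cell counts inside the window and renormalize."""
    window = grid[i:i + w1, j:j + w2, :]
    h = window.sum(axis=(0, 1))
    return h / h.sum()

# Two heavily overlapping windows share most of their mass, so their
# feature histograms stay close; distant windows need not be close.
h_a = window_histogram(grid, 0, 0, 5, 5)
h_b = window_histogram(grid, 1, 0, 5, 5)    # shifted by one cell
h_c = window_histogram(grid, 12, 12, 5, 5)  # far away
print(np.abs(h_a - h_b).sum(), np.abs(h_a - h_c).sum())
```

As the window pans across the grid, counts enter along one edge and leave along the other, mimicking the gradual feature turnover the abstract describes for a camera panning up a building facade.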
Available from: Julia Vogel
ABSTRACT: In this paper, we present a novel image representation that renders it possible to access natural scenes by local semantic description. Our work is motivated by the continuing effort in content-based image retrieval to extract and to model the semantic content of images. The basic idea of the semantic modeling is to classify local image regions into semantic concept classes such as water, rocks, or foliage. Images are represented through the frequency of occurrence of these local concepts. Through extensive experiments, we demonstrate that the image representation is well suited for modeling the semantic content of heterogeneous scene categories, and thus for categorization and retrieval. The image representation also allows us to rank natural scenes according to their semantic similarity relative to certain scene categories. Based on human ranking data, we learn a perceptually plausible distance measure that leads to a high correlation between the human and the automatically obtained typicality rankings. This result is especially valuable for content-based image retrieval, where the goal is to present retrieval results in descending semantic similarity from the query.
International Journal of Computer Vision 01/2007; 72:133-157.
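The representation described above — an image as the frequency of occurrence of local semantic concepts — reduces to a normalized histogram over concept labels. A minimal sketch: the concept names follow the abstract, but the region labels are invented here, and a real system would produce them with a local region classifier. The simple L1 distance below is also only a stand-in for the perceptually plausible measure the paper learns from human ranking data:

```python
from collections import Counter

CONCEPTS = ["water", "rocks", "foliage", "sky", "sand"]

def concept_histogram(region_labels):
    """Represent an image by the relative frequency of each semantic
    concept among its local regions."""
    counts = Counter(region_labels)
    n = len(region_labels)
    return {c: counts.get(c, 0) / n for c in CONCEPTS}

def l1_distance(h1, h2):
    """A simple histogram distance (the paper instead learns one)."""
    return sum(abs(h1[c] - h2[c]) for c in CONCEPTS)

# Hypothetical per-region classifier outputs for two scenes.
coast = ["water"] * 6 + ["sky"] * 3 + ["sand"] * 1
mountain = ["rocks"] * 5 + ["sky"] * 3 + ["foliage"] * 2

h_coast = concept_histogram(coast)
h_mountain = concept_histogram(mountain)
print(l1_distance(h_coast, h_mountain))  # large: different scene types
```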
Article: Epitomic location recognition.
ABSTRACT: This paper presents a novel method for location recognition, which exploits an epitomic representation to achieve both high efficiency and good generalization. A generative model based on epitomic image analysis captures the appearance and geometric structure of an environment while allowing for variations due to motion, occlusions, and non-Lambertian effects. The ability to model translation and scale invariance, together with the fusion of diverse visual features, yields enhanced generalization with economical training. Experiments on both existing and new labeled image databases result in recognition accuracy superior to the state of the art with real-time computational performance.
IEEE Transactions on Pattern Analysis and Machine Intelligence 12/2009; 31(12):2158-67.
Conference Proceeding: Epitomic analysis of appearance and shape
ABSTRACT: We present novel, simple appearance and shape models that we call epitomes. The epitome of an image is its miniature, condensed version containing the essence of the textural and shape properties of the image. As opposed to previously used simple image models, such as templates or basis functions, the size of the epitome is considerably smaller than the size of the image or object it represents, but the epitome still contains most constitutive elements needed to reconstruct the image. A collection of images often shares an epitome, e.g., when images are a few consecutive frames from a video sequence, or when they are photographs of similar objects. A particular image in a collection is defined by its epitome and a smooth mapping from the epitome to the image pixels. When the epitomic representation is used within a hierarchical generative model, appropriate inference algorithms can be derived to extract the epitome from a single image or a collection of images and at the same time perform various inference tasks, such as image segmentation, motion estimation, object removal and super-resolution.
Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV); 11/2003
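Inference in the epitome model is probabilistic, but its central relationship — every image patch is explained by some location in a much smaller epitome — can be sketched with hard nearest-neighbour assignments. The array sizes below are toy values, the epitome here is a fixed random array rather than one learned with EM, and the minimum-squared-error search is only a hard-assignment stand-in for the model's posterior over patch-to-epitome mappings:

```python
import numpy as np

rng = np.random.default_rng(1)

epitome = rng.random((8, 8))   # condensed model, much smaller than the image
image = rng.random((16, 16))
P = 4                          # patch size

def best_epitome_location(patch, epitome):
    """Map an image patch to the epitome location whose P x P window
    reconstructs it with the smallest squared error."""
    best, best_err = None, np.inf
    for i in range(epitome.shape[0] - P + 1):
        for j in range(epitome.shape[1] - P + 1):
            err = np.sum((epitome[i:i + P, j:j + P] - patch) ** 2)
            if err < best_err:
                best, best_err = (i, j), err
    return best, best_err

# Reconstruct the image patch-by-patch from the epitome.
recon = np.zeros_like(image)
for r in range(0, image.shape[0], P):
    for c in range(0, image.shape[1], P):
        (i, j), _ = best_epitome_location(image[r:r + P, c:c + P], epitome)
        recon[r:r + P, c:c + P] = epitome[i:i + P, j:j + P]

print(np.mean((image - recon) ** 2))  # reconstruction error from a tiny epitome
```

Because epitome windows overlap, a small 8x8 array offers many candidate explanations for each patch, which is why the epitome can stay far smaller than the images it summarizes.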