A sparsity-enforcing method for learning face features.
ABSTRACT In this paper, we propose a new trainable system for selecting face features from over-complete dictionaries of image measurements. The starting point is an iterative thresholding algorithm which provides sparse solutions to linear systems of equations. Although the proposed methodology is quite general and could be applied to various image classification tasks, we focus here on the case study of face and eyes detection. For our initial representation, we adopt rectangular features in order to allow straightforward comparisons with existing techniques. For computational efficiency and memory saving requirements, instead of implementing the full optimization scheme on tenths of thousands of features, we propose a three-stage architecture which consists of finding first intermediate solutions to smaller size optimization problems, then merging the obtained results, and next applying further selection procedures. The devised system requires the solution of a number of independent problems, and, hence, the necessary computations could be implemented in parallel. Experimental results obtained on both benchmark and newly acquired face and eyes images indicate that our method is a serious competitor to other feature selection schemes recently popularized in computer vision for dealing with problems of real-time object detection. A major advantage of the proposed system is that it performs well even with relatively small training sets.
- SourceAvailable from: psu.edu[show abstract] [hide abstract]
ABSTRACT: The spatial receptive fields of simple cells in mammalian striate cortex have been reasonably well described physiologically and can be characterized as being localized, oriented, and bandpass, comparable with the basis functions of wavelet transforms. Previously, we have shown that these receptive field properties may be accounted for in terms of a strategy for producing a sparse distribution of output activity in response to natural images. Here, in addition to describing this work in a more expansive fashion, we examine the neurobiological implications of sparse coding. Of particular interest is the case when the code is overcomplete--i.e., when the number of code elements is greater than the effective dimensionality of the input space. Because the basis functions are non-orthogonal and not linearly independent of each other, sparsifying the code will recruit only those basis functions necessary for representing a given input, and so the input-output function will deviate from being purely linear. These deviations from linearity provide a potential explanation for the weak forms of non-linearity observed in the response properties of cortical simple cells, and they further make predictions about the expected interactions among units in response to naturalistic stimuli.Vision Research 01/1998; 37(23):3311-25. · 2.14 Impact Factor
- [show abstract] [hide abstract]
ABSTRACT: A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing saliency. The system breaks down the complex problem of scene understanding by rapidly selecting, in a computationally efficient manner, conspicuous locations to be analyzed in detailIEEE Transactions on Pattern Analysis and Machine Intelligence 12/1998; · 4.80 Impact Factor
- [show abstract] [hide abstract]
ABSTRACT: This paper presents a general, trainable system for object detection in unconstrained, cluttered scenes. The system derives much of its power from a representation that describes an object class in terms of an overcomplete dictionary of local, oriented, multiscale intensity differences between adjacent regions, efficiently computable as a Haar wavelet transform. This example-based learning approach implicitly derives a model of an object class by training a support vector machine classifier using a large set of positive and negative examples. We present results on face, people, and car detection tasks using the same architecture. In addition, we quantify how the representation affects detection performance by considering several alternate representations including pixels and principal components. We also describe a real-time application of our person detection system as part of a driver assistance system.International Journal of Computer Vision 05/2000; 38(1):15-33. · 3.62 Impact Factor