-
ABSTRACT: An important role of visual systems is to detect nearby predators, prey, and potential mates, which may be distinguished in part by their motion. When an animal is at rest, an object moving in any direction may easily be detected by motion-sensitive visual circuits. During locomotion, however, this strategy is compromised because the observer must detect a moving object within the pattern of optic flow created by its own motion through the stationary background. However, objects that move creating back-to-front (regressive) motion may be unambiguously distinguished from stationary objects because forward locomotion creates only front-to-back (progressive) optic flow. Thus, moving animals should exhibit an enhanced sensitivity to regressively moving objects. We explicitly tested this hypothesis by constructing a simple fly-sized robot that was programmed to interact with a real fly. Our measurements indicate that whereas walking female flies freeze in response to a regressively moving object, they ignore a progressively moving one. Regressive motion salience also explains observations of behaviors exhibited by pairs of walking flies. Because the assumptions underlying the regressive motion salience hypothesis are general, we suspect that the behavior we have observed in Drosophila may be widespread among eyed, motile organisms.
Current biology: CB 06/2012; 22(14):1344-50. · 10.99 Impact Factor
-
ABSTRACT: Which one comes first: segmentation or recognition? We propose a unified framework for carrying out the two simultaneously and without supervision. The framework combines a flexible probabilistic model, for representing the shape and appearance of each segment, with the popular "bag of visual words" model for recognition. If applied to a collection of images, our framework can simultaneously discover the segments of each image, and the correspondence between such segments, without supervision. Such recurring segments may be thought of as the 'parts' of corresponding objects that appear multiple times in the image collection. Thus, the model may be used for learning new categories, detecting/classifying objects, and segmenting images, without using expensive human annotation.
IEEE Transactions on Software Engineering 12/2011; · 1.98 Impact Factor
-
ABSTRACT: Pedestrian detection is a key problem in computer vision, with several applications that have the potential to positively impact quality of life. In recent years, the number of approaches to detecting pedestrians in monocular images has grown steadily. However, multiple data sets and widely varying evaluation protocols are used, making direct comparisons difficult. To address these shortcomings, we perform an extensive evaluation of the state of the art in a unified framework. We make three primary contributions: 1) We put together a large, well-annotated, and realistic monocular pedestrian detection data set and study the statistics of the size, position, and occlusion patterns of pedestrians in urban scenes, 2) we propose a refined per-frame evaluation methodology that allows us to carry out probing and informative comparisons, including measuring performance in relation to scale and occlusion, and 3) we evaluate the performance of sixteen pretrained state-of-the-art detectors across six data sets. Our study allows us to assess the state of the art and provides a framework for gauging future efforts. Our experiments show that despite significant progress, performance still has much room for improvement. In particular, detection is disappointing at low resolutions and for partially occluded pedestrians.
IEEE Transactions on Software Engineering 07/2011; 34(4):743-61. · 1.98 Impact Factor
-
ABSTRACT: We introduce a nonparametric Bayesian model, called TAX, which can organize image collections into a tree-shaped taxonomy without supervision. The model is inspired by the Nested Chinese Restaurant Process (NCRP) and associates each image with a path through the taxonomy. Similar images share initial segments of their paths and thus share some aspects of their representation. Each internal node in the taxonomy represents information that is common to multiple images. We explore the properties of the taxonomy through experiments on a large (~10^4) image collection with a number of users trying to quickly locate a given image. We find that the main benefits are easier navigation through image collections and reduced description length. A natural question is whether a taxonomy is the optimal form of organization for natural images. Our experiments indicate that although taxonomies can organize images in a useful manner, more elaborate structures may be even better suited for this task.
IEEE Transactions on Software Engineering 04/2011; 33(11):2302-15. · 1.98 Impact Factor
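As a rough illustration of the path idea in the abstract above, the following sketch draws paths from a generic nested Chinese restaurant process prior. It is not the TAX model itself: the tree layout (a dict with `count` and `children`), the function name `sample_ncrp_path`, and the concentration parameter `gamma` are assumptions made for this example, and no image features or inference are involved.

```python
import random


def sample_ncrp_path(tree, depth, gamma, rng):
    """Draw one root-to-leaf path from a nested CRP prior (illustrative only).

    tree  : root node, a dict {"count": int, "children": [nodes]} (assumed layout)
    depth : number of levels below the root to descend
    gamma : concentration parameter; larger values open new branches more often
    rng   : a random.Random instance
    Returns the list of child indices chosen at each level.
    """
    node = tree
    path = []
    for _ in range(depth):
        children = node["children"]
        # Existing branches are weighted by how many items already chose them;
        # the final weight (gamma) corresponds to opening a new branch.
        weights = [child["count"] for child in children] + [gamma]
        idx = rng.choices(range(len(weights)), weights=weights)[0]
        if idx == len(children):
            children.append({"count": 0, "children": []})
        node = children[idx]
        node["count"] += 1
        path.append(idx)
    return path


def make_root():
    """Create an empty taxonomy root (hypothetical helper)."""
    return {"count": 0, "children": []}
```

Calling `sample_ncrp_path(make_root(), 3, 1.0, random.Random(0))` repeatedly on the same root yields paths that tend to share their first few indices, which is the sense in which similar items share initial segments of their paths.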
-
ABSTRACT: Electrical stimulation of certain hypothalamic regions in cats and rodents can elicit attack behaviour, but the exact location of relevant cells within these regions, their requirement for naturally occurring aggression and their relationship to mating circuits have not been clear. Genetic methods for neural circuit manipulation in mice provide a potentially powerful approach to this problem, but brain-stimulation-evoked aggression has never been demonstrated in this species. Here we show that optogenetic, but not electrical, stimulation of neurons in the ventromedial hypothalamus, ventrolateral subdivision (VMHvl) causes male mice to attack both females and inanimate objects, as well as males. Pharmacogenetic silencing of VMHvl reversibly inhibits inter-male aggression. Immediate early gene analysis and single unit recordings from VMHvl during social interactions reveal overlapping but distinct neuronal subpopulations involved in fighting and mating. Neurons activated during attack are inhibited during mating, suggesting a potential neural substrate for competition between these opponent social behaviours.
Nature 02/2011; 470(7333):221-6. · 36.28 Impact Factor
-
ABSTRACT: The ability to choose rapidly among multiple targets embedded in a complex perceptual environment is key to survival. Targets may differ in their reward value as well as in their low-level perceptual properties (e.g., visual saliency). Previous studies investigated separately the impact of either value or saliency on choice; thus, it is not known how the brain combines these two variables during decision making. We addressed this question with three experiments in which human subjects attempted to maximize their monetary earnings by rapidly choosing items from a brief display. Each display contained several worthless items (distractors) as well as two targets, whose value and saliency were varied systematically. We compared the behavioral data with the predictions of three computational models assuming that (i) subjects seek the most valuable item in the display, (ii) subjects seek the most easily detectable item, and (iii) subjects behave as an ideal Bayesian observer who combines both factors to maximize the expected reward within each trial. Regardless of the type of motor response used to express the choices, we find that decisions are influenced by both value and feature-contrast in a way that is consistent with the ideal Bayesian observer, even when the targets' feature-contrast is varied unpredictably between trials. This suggests that individuals are able to harvest rewards optimally and dynamically under time pressure while seeking multiple targets embedded in perceptual clutter.
Proceedings of the National Academy of Sciences 03/2010; 107(11):5232-7. · 9.68 Impact Factor
-
ABSTRACT: Arousal is fundamental to many behaviors, but whether it is unitary or whether there are different types of behavior-specific arousal has not been clear. In Drosophila, dopamine promotes sleep-wake arousal. However, there is conflicting evidence regarding its influence on environmentally stimulated arousal. Here we show that loss-of-function mutations in the D1 dopamine receptor DopR enhance repetitive startle-induced arousal while decreasing sleep-wake arousal (i.e., increasing sleep). These two types of arousal are also inversely influenced by cocaine, whose effects in each case are opposite to, and abrogated by, the DopR mutation. Selective restoration of DopR function in the central complex rescues the enhanced stimulated arousal but not the increased sleep phenotype of DopR mutants. These data provide evidence for at least two different forms of arousal, which are independently regulated by dopamine in opposite directions, via distinct neural circuits.
Neuron 11/2009; 64(4):522-36. · 14.74 Impact Factor
-
ABSTRACT: We present a camera-based method for automatically quantifying the individual and social behaviors of fruit flies, Drosophila melanogaster, interacting in a planar arena. Our system includes machine-vision algorithms that accurately track many individuals without swapping identities and classification algorithms that detect behaviors. The data may be represented as an ethogram that plots the time course of behaviors exhibited by each fly or as a vector that concisely captures the statistical properties of all behaviors displayed in a given period. We found that behavioral differences between individuals were consistent over time and were sufficient to accurately predict gender and genotype. In addition, we found that the relative positions of flies during social interactions vary according to gender, genotype and social environment. We expect that our software, which permits high-throughput screening, will complement existing molecular methods available in Drosophila, facilitating new investigations into the genetic and cellular basis of behavior.
Nature Methods 06/2009; 6(6):451-7. · 19.28 Impact Factor
-
ABSTRACT: We introduce a method based on machine vision for automatically measuring aggression and courtship in Drosophila melanogaster. The genetic and neural circuit bases of these innate social behaviors are poorly understood. High-throughput behavioral screening in this genetically tractable model organism is a potentially powerful approach, but it is currently very laborious. Our system monitors interacting pairs of flies and computes their location, orientation and wing posture. These features are used for detecting behaviors exhibited during aggression and courtship. Among these, wing threat, lunging and tussling are specific to aggression; circling, wing extension (courtship 'song') and copulation are specific to courtship; locomotion and chasing are common to both. Ethograms may be constructed automatically from these measurements, saving considerable time and effort. This technology should enable large-scale screens for genes and neural circuits controlling courtship and aggression.
Nature Methods 05/2009; 6(4):297-303. · 19.28 Impact Factor
-
ABSTRACT: How do reward outcomes affect early visual performance? Previous studies found a suboptimal influence, but they ignored the non-linearity in how subjects perceived the reward outcomes. In contrast, we find that when the non-linearity is accounted for, humans behave optimally and maximize expected reward. Our subjects were asked to detect the presence of a familiar target object in a cluttered scene. They were rewarded according to their performance. We systematically varied the target frequency and the reward/penalty policy for detecting/missing the targets. We find that 1) decreasing the target frequency will decrease the detection rates, in accordance with the literature. 2) Contrary to previous studies, increasing the target detection rewards will compensate for target rarity and restore detection performance. 3) A quantitative model based on reward maximization accurately predicts human detection behavior in all target frequency and reward conditions; thus, reward schemes can be designed to obtain desired detection rates for rare targets. 4) Subjects quickly learn the optimal decision strategy; we propose a neurally plausible model that exhibits the same properties. Potential applications include designing reward schemes to improve detection of life-critical, rare targets (e.g., cancers in medical images).
Journal of Vision 02/2009; 9(1):31.1-16. · 3.38 Impact Factor
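The trade-off between target rarity and reward described in the abstract above can be illustrated with the textbook signal-detection criterion for maximizing expected payoff. This is a minimal sketch under an equal-variance Gaussian assumption, not the paper's own model; the function names, the payoff parameters, and the sensitivity `d_prime` are assumptions introduced for illustration.

```python
import math


def optimal_beta(p_target, v_hit, c_miss, v_cr, c_fa):
    """Likelihood-ratio threshold that maximizes expected payoff.

    p_target     : prior probability that a target is present (0 < p < 1)
    v_hit, v_cr  : rewards for a hit and for a correct rejection
    c_miss, c_fa : costs of a miss and of a false alarm, as positive magnitudes
    """
    if not 0.0 < p_target < 1.0:
        raise ValueError("p_target must lie strictly between 0 and 1")
    prior_odds_absent = (1.0 - p_target) / p_target
    payoff_ratio = (v_cr + c_fa) / (v_hit + c_miss)
    return prior_odds_absent * payoff_ratio


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def hit_rate(beta, d_prime):
    """Hit rate when noise ~ N(0, 1), signal ~ N(d_prime, 1), threshold beta.

    The observer answers "present" when the likelihood ratio exceeds beta,
    which is equivalent to x > ln(beta) / d_prime + d_prime / 2.
    """
    criterion = math.log(beta) / d_prime + d_prime / 2.0
    return 1.0 - normal_cdf(criterion - d_prime)
```

Under these assumptions, lowering `p_target` raises the threshold and lowers the hit rate, while raising `v_hit` by the right amount brings the threshold, and hence the hit rate, back to its balanced value.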
-
2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 20-25 June 2009, Miami, Florida, USA; 01/2009
-
ABSTRACT: We observe that everyday images contain dozens of objects, and that humans, in describing these images, give different priority to these objects. We argue that a goal of visual recognition is, therefore, not only to detect and classify objects but also to associate with each a level of priority which we call ‘importance’. We propose a definition of importance and show how this may be estimated reliably from data harvested from human observers. We conclude by showing that a first-order estimate of importance may be computed from a number of simple image region measurements and does not require access to image meaning.
10/2008: pages 523-536;
-
ABSTRACT: Environmental and genetic factors can modulate aggressiveness, but the biological mechanisms underlying their influence are largely unknown. Social experience with conspecifics suppresses aggressiveness in both vertebrate and invertebrate species, including Drosophila. We searched for genes whose expression levels correlate with the influence of social experience on aggressiveness in Drosophila by performing microarray analysis of head tissue from socially isolated (aggressive) vs. socially experienced (nonaggressive) male flies. Among approximately 200 differentially expressed genes, only one was also present in a gene set previously identified by profiling Drosophila strains subjected to genetic selection for differences in aggressiveness [Dierick HA, Greenspan RJ (2006) Nat Genet 38:1023-1031]. This gene, Cyp6a20, encodes a cytochrome P450. Social experience increased Cyp6a20 expression and decreased aggressiveness in a reversible manner. In Cyp6a20 mutants, aggressiveness was increased in group-housed but not socially isolated flies. These data identify a common genetic target for environmental and heritable influences on aggressiveness. Cyp6a20 is expressed in a subset of nonneuronal support cells associated with pheromone-sensing olfactory sensilla, suggesting that social experience may influence aggressiveness by regulating pheromone sensitivity.
Proceedings of the National Academy of Sciences 05/2008; 105(15):5657-63. · 9.68 Impact Factor
-
ABSTRACT: Humans move their eyes while looking at scenes and pictures. Eye movements correlate with shifts in attention and are thought to be a consequence of optimal resource allocation for high-level tasks such as visual recognition. Models of attention, such as "saliency maps," are often built on the assumption that "early" features (color, contrast, orientation, motion, and so forth) drive attention directly. We explore an alternative hypothesis: Observers attend to "interesting" objects. To test this hypothesis, we measure the eye position of human observers while they inspect photographs of common natural scenes. Our observers perform different tasks: artistic evaluation, analysis of content, and search. Immediately after each presentation, our observers are asked to name objects they saw. Weighted with recall frequency, these objects predict fixations in individual images better than early saliency, irrespective of task. Also, saliency combined with object positions predicts which objects are frequently named. This suggests that early saliency has only an indirect effect on attention, acting through recognized objects. Consequently, rather than treating attention as a mere preprocessing step for object recognition, models of both need to be integrated.
Journal of Vision 02/2008; 8(14):18.1-26. · 3.38 Impact Factor
-
Computer Vision - ECCV 2008, 10th European Conference on Computer Vision, Marseille, France, October 12-18, 2008, Proceedings, Part II; 01/2008
-
ABSTRACT: What do we see when we glance at a natural scene and how does it change as the glance becomes longer? We asked naive subjects to report in a free-form format what they saw when looking at briefly presented real-life photographs. Our subjects received no specific information as to the content of each stimulus. Thus, our paradigm differs from previous studies where subjects were cued before a picture was presented and/or were probed with multiple-choice questions. In the first stage, 90 novel grayscale photographs were foveally shown to a group of 22 native-English-speaking subjects. The presentation time was chosen at random from a set of seven possible times (from 27 to 500 ms). A perceptual mask followed each photograph immediately. After each presentation, subjects reported what they had just seen as completely and truthfully as possible. In the second stage, another group of naive individuals was instructed to score each of the descriptions produced by the subjects in the first stage. Individual scores were assigned to more than a hundred different attributes. We show that within a single glance, much object- and scene-level information is perceived by human subjects. The richness of our perception, though, seems asymmetrical. Subjects tend to have a propensity toward perceiving natural scenes as being outdoor rather than indoor. The reporting of sensory- or feature-level information of a scene (such as shading and shape) consistently precedes the reporting of the semantic-level information. But once subjects recognize more semantic-level components of a scene, there is little evidence suggesting any bias toward either scene-level or object-level recognition.
Journal of Vision 02/2007; 7(1):10. · 3.38 Impact Factor
-
ABSTRACT: We present a “parts and structure” model for object category recognition that can be learnt efficiently and in a weakly-supervised manner: the model is learnt from example images containing category instances, without requiring segmentation from background clutter. The model is a sparse representation of the object, and consists of a star topology configuration of parts modeling the output of a variety of feature detectors. The optimal choice of feature types (whose repertoire includes interest points, curves and regions) is made automatically. In recognition, the model may be applied efficiently in a complete manner, bypassing the need for feature detectors, to give the globally optimal match within a query image. The approach is demonstrated on a wide variety of categories, and delivers both successful classification and localization of the object within the image.
01/2007: pages 443-461;
-
International Journal of Computer Vision. 01/2007; 71:273-303.
-
International Journal of Computer Vision. 01/2007; 71:305-336.
-
ABSTRACT: Current computational approaches to learning visual object categories require thousands of training images, are slow, cannot learn in an incremental manner and cannot incorporate prior information into the learning process. In addition, no algorithm presented in the literature has been tested on more than a handful of object categories. We present a method for learning object categories from just a few training images. It is quick and it uses prior information in a principled way. We test it on a dataset composed of images of objects belonging to 101 widely varied categories. Our proposed method is based on making use of prior information, assembled from (unrelated) object categories which were previously learnt. A generative probabilistic model is used, which represents the shape and appearance of a constellation of features belonging to the object. The parameters of the model are learnt incrementally in a Bayesian manner. Our incremental algorithm is compared experimentally to an earlier batch Bayesian algorithm, as well as to one based on maximum likelihood. The incremental and batch versions have comparable classification performance on small training sets, but incremental learning is significantly faster, making real-time learning feasible. Both Bayesian methods outperform maximum likelihood on small training sets.
Computer Vision and Image Understanding. 01/2007;