Article

An expanded model for perceptual visual single object recognition system using expectation priming following neuroscientific evidence


Abstract

Under numerous circumstances, humans recognize visual objects in their environment with remarkable speed and accuracy. Existing artificial visual object recognition systems have not yet surpassed human vision, especially in its universality of application. We argue that modeling the recognition process in an exclusively feedforward manner hinders those systems' performance. To bridge the performance gap between them and human vision, we present a brief review of neuroscientific data suggesting that recognition can be improved by considering an agent's internal influences, i.e., signals from cognitive systems that peripherally interact with visual-perceptual processes. We then propose a model for visual object recognition that uses information from these systems, such as affect, to generate expectations that prime the object recognition system, thereby reducing its execution time. An implementation of the model is then described. Finally, we present and discuss an experiment and its results.
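To make the priming idea concrete, here is a minimal sketch (not the authors' implementation) of how an expectation signal can shorten recognition: a prior over object classes, derived from context or affect, orders the stored candidate templates, and matching stops as soon as a candidate exceeds a threshold. All names (`templates`, `recognize`, the threshold value) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, dim = 50, 128
templates = rng.normal(size=(n_classes, dim))        # stored class templates
templates /= np.linalg.norm(templates, axis=1, keepdims=True)

def recognize(stimulus, prior, threshold=0.8):
    """Evaluate candidates in descending prior order; stop early on a match."""
    stimulus = stimulus / np.linalg.norm(stimulus)
    order = np.argsort(prior)[::-1]                  # expectation-primed ordering
    for steps, c in enumerate(order, start=1):
        if templates[c] @ stimulus >= threshold:     # cosine-similarity match
            return c, steps                          # steps ~ "execution time"
    return None, n_classes

true_class = 7
stimulus = templates[true_class] + 0.02 * rng.normal(size=dim)

flat_prior = np.full(n_classes, 1.0 / n_classes)     # no expectation
primed_prior = flat_prior.copy()
primed_prior[true_class] += 0.5                      # expectation favors class 7

print(recognize(stimulus, flat_prior))    # correct, but after many evaluations
print(recognize(stimulus, primed_prior))  # correct on the first evaluation
```

The point of the sketch is only that an expectation signal changes how much work recognition takes, not what is ultimately recognized.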


... In this proposal, this agent helps to remember situations of interest, and associate situations with plans. • Perception agent: In the human being, this cognitive function gives semantic meaning to information coming from different senses [54]. In our case, this agent processes information sensed by InfoCom devices to generate its semantic meaning and helps determine the context. ...
... To propose a plan, it first consults the memory agent to see whether it already knows a plan to reach the objective in the specific context; otherwise it must create one from scratch. • Perception agent: This agent processes information sensed by InfoCom devices to generate its semantic meaning [54]. In our case, it assigns the meaning of crime to a scene involving a human with a weapon approaching another human in a specific context. ...
Full-text available
Article
Pervasive service composition is useful in many scenarios, for instance, in urban planning or controlled harvest. Currently, there is no standard for developing solutions using pervasive service composition. Large companies offer frameworks for developing complex services, but those frameworks suit only specific applications, such as home automation and agriculture. On the other hand, there are several well-grounded academic proposals for pervasive service composition; however, these do not solve the problems of traditional approaches, which are appropriate only to specific areas of application and require adaptation to deal with the dynamism of the environment. This article presents a cognitive approach for pervasive service composition where InfoCom devices and the implementation of cognitive functions interact to create pervasive composite services. Our central hypothesis is that cognitive theory can help solve actual problems requiring pervasive service composition, as it addresses the above-mentioned problems. To test our approach, we present a case of urban insecurity. Specifically, in different countries, street robbery using firearms is a high-impact problem because of its frequency. This article proposes to compose a pervasive service for deterring criminals from committing their crimes. The results obtained by simulating our proposal in our case study are promising. However, more research is needed before the proposed approach can be applied to actual problems; that research ought to address various issues, some of which are discussed in this article.
... The first step in the workflow was the capture of environmental inputs through sensory system agents, such as visual inputs [22]. Then, object classification yields a scene composition within the memory agent [21], mediated by the perception agent [23]. Inside the memory agent, the new scene is compared with scenes learned in the past. ...
Article
There have been numerous works on cognitive architectures recently. Nevertheless, since no cognitive architecture is truly finished, owing to the limitations of brain studies, the activities that a cognitive architecture can perform are not well delimited. On the other hand, we argue that cognition is not a system's characteristic but rather a relationship between a system and its environment. In this article, we propose a task that a hypothetical finished cognitive architecture must be able to solve. Then, the performance that this hypothetical cognitive architecture achieves is evaluated. Our main goals are (1) to evaluate the viability of using cognitive architectures to solve these kinds of tasks, and (2) to find good ways to present information to systems that work like humans.
Full-text available
Article
Converging evidence suggests that the primate ventral visual pathway encodes increasingly complex stimulus features in downstream areas. We quantitatively show that there indeed exists an explicit gradient for feature complexity in the ventral pathway of the human brain. This was achieved by mapping thousands of stimulus features of increasing complexity across the cortical sheet using a deep neural network. Our approach also revealed a fine-grained functional specialization of downstream areas of the ventral stream. Furthermore, it allowed decoding of representations from human brain activity at an unsurpassed degree of accuracy, confirming the quality of the developed approach. Stimulus features that successfully explained neural responses indicate that population receptive fields were explicitly tuned for object categorization. This provides strong support for the hypothesis that object categorization is a guiding principle in the functional organization of the primate ventral stream.
Full-text available
Article
From only brief exposure to a face, individuals spontaneously categorize another's race. Recent behavioral evidence suggests that visual context may affect such categorizations. We used fMRI to examine the neural basis of contextual influences on the race categorization of faces. Participants categorized the race of faces that varied along a White-Asian morph continuum and were surrounded by American, neutral, or Chinese scene contexts. As expected, the context systematically influenced categorization responses and their efficiency (response times). Neuroimaging results indicated that the retrosplenial cortex (RSC) and orbitofrontal cortex (OFC) exhibited highly sensitive, graded responses to the compatibility of facial and contextual cues. These regions showed linearly increasing responses as a face became more White when in an American context, and linearly increasing responses as a face became more Asian when in a Chinese context. Further, RSC activity partially mediated the effect of this face-context compatibility on the efficiency of categorization responses. Together, the findings suggest a critical role of the RSC and OFC in driving contextual influences on face categorization, and highlight the impact of extraneous cues beyond the face in categorizing other people.
Full-text available
Article
Numerous psychophysical studies have described perceptual learning as long-lasting improvements in perceptual discrimination and detection capabilities following practice. Where and how long-term plastic changes occur in the brain is central to understanding the neural basis of perceptual learning. Here, neurophysiological research using non-human primates is reviewed to address the neural mechanisms underlying visual perceptual learning. Previous studies have shown that training either has no effect on or only weakly alters the sensitivity of neurons in early visual areas, but more recent evidence indicates that training can cause long-term changes in how sensory signals are read out in the later stages of decision making. These results are discussed in the context of learning specificity, which has been crucial in interpreting the mechanisms underlying perceptual learning. The possible mechanisms that support learning-related plasticity are also discussed.
Full-text available
Article
The neuronal mechanisms underlying perceptual grouping of discrete, similarly oriented elements are not well understood. To investigate this, we measured neural population responses using voltage-sensitive dye imaging in V1 of monkeys trained on a contour-detection task. By mapping the contour and background elements onto V1, we could study their neural processing. The population response early in time showed activation patches corresponding to the individual contour and background elements. However, later increased activity in the contour elements, along with suppressed activity in the background elements, enabled us to visualize in single trials a salient continuous contour "popping out" from a suppressed background. This modulated activity in the contour and background extended beyond the cortical representation of the individual contour or background elements. Finally, the late modulation was correlated with the behavioral performance of contour-saliency detection and with the monkeys' perceptual report. Thus, opposing responses in the contour and background may underlie perceptual grouping in V1.
Full-text available
Article
Since the original characterization of the ventral visual pathway, our knowledge of its neuroanatomy, functional properties, and extrinsic targets has grown considerably. Here we synthesize this recent evidence and propose that the ventral pathway is best understood as a recurrent occipitotemporal network containing neural representations of object quality both utilized and constrained by at least six distinct cortical and subcortical systems. Each system serves its own specialized behavioral, cognitive, or affective function, collectively providing the raison d'être for the ventral visual pathway. This expanded framework contrasts with the depiction of the ventral visual pathway as a largely serial staged hierarchy culminating in singular object representations and more parsimoniously incorporates attentional, contextual, and feedback effects.
Full-text available
Article
Hierarchical generative models, such as Bayesian networks, and belief propagation have been shown to provide a theoretical framework that can account for perceptual processes, including feedforward recognition and feedback modulation. The framework explains both psychophysical and physiological experimental data and maps well onto the hierarchical distributed cortical anatomy. However, the complexity required to model cortical processes makes inference, even using approximate methods, very computationally expensive. Thus, existing object perception models based on this approach are typically limited to tree-structured networks with no loops, use small toy examples or fail to account for certain perceptual aspects such as invariance to transformations or feedback reconstruction. In this study we develop a Bayesian network with an architecture similar to that of HMAX, a biologically-inspired hierarchical model of object recognition, and use loopy belief propagation to approximate the model operations (selectivity and invariance). Crucially, the resulting Bayesian network extends the functionality of HMAX by including top-down recursive feedback. Thus, the proposed model not only achieves successful feedforward recognition invariant to noise, occlusions, and changes in position and size, but is also able to reproduce modulatory effects such as illusory contour completion and attention. Our novel and rigorous methodology covers key aspects such as learning using a layerwise greedy algorithm, combining feedback information from multiple parents and reducing the number of operations required. Overall, this work extends an established model of object recognition to include high-level feedback modulation, based on state-of-the-art probabilistic approaches. The methodology employed, consistent with evidence from the visual cortex, can be potentially generalized to build models of hierarchical perceptual organization that include top-down and bottom-up interactions, for example, in other sensory modalities.
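As a pocket illustration of the inference machinery named here, the following toy runs sum-product loopy belief propagation on a three-node cycle; the cited paper's contribution is scaling this kind of message passing to an HMAX-like hierarchy with feedback. The graph, potentials, and iteration count below are invented for the example.

```python
import numpy as np

n_states = 3
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2), (2, 0)]                 # a cycle, hence "loopy" propagation
rng = np.random.default_rng(1)
unary = rng.random((3, n_states)) + 0.1          # node evidence (potentials)
pairwise = np.eye(n_states) + 0.2                # favors agreeing neighbors

neighbors = {i: [] for i in nodes}
for i, j in edges:
    neighbors[i].append(j)
    neighbors[j].append(i)

msgs = {(i, j): np.ones(n_states) for i in nodes for j in neighbors[i]}

for _ in range(50):                               # iterate toward a fixed point
    new = {}
    for (i, j) in msgs:
        prod = unary[i].copy()
        for k in neighbors[i]:
            if k != j:
                prod *= msgs[(k, i)]
        m = pairwise.T @ prod                     # marginalize the sender's state
        new[(i, j)] = m / m.sum()                 # normalize for stability
    msgs = new

for i in nodes:                                   # belief = evidence x messages
    b = unary[i].copy()
    for k in neighbors[i]:
        b *= msgs[(k, i)]
    print(i, np.round(b / b.sum(), 3))
```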
Full-text available
Article
Memory and perception have long been considered separate cognitive processes, and amnesia resulting from medial temporal lobe (MTL) damage is thought to reflect damage to a dedicated memory system. Recent work has questioned these views, suggesting that amnesia can result from impoverished perceptual representations in the MTL, causing an increased susceptibility to interference. Using a perceptual matching task for which fMRI implicated a specific MTL structure, the perirhinal cortex, we show that amnesics with MTL damage including the perirhinal cortex, but not those with damage limited to the hippocampus, were vulnerable to object-based perceptual interference. Importantly, when we controlled such interference, their performance recovered to normal levels. These findings challenge prevailing conceptions of amnesia, suggesting that effects of damage to specific MTL regions are better understood not in terms of damage to a dedicated declarative memory system, but in terms of impoverished representations of the stimuli those regions maintain.
Full-text available
Article
Neurophysiological evidence for invariant representations of objects and faces in the primate inferior temporal visual cortex is described. Then a computational approach to how invariant representations are formed in the brain is described that builds on the neurophysiology. A feature hierarchy model in which invariant representations can be built by self-organizing learning based on the temporal and spatial statistics of the visual input produced by objects as they transform in the world is described. VisNet can use temporal continuity in an associative synaptic learning rule with a short-term memory trace, and/or it can use spatial continuity in continuous spatial transformation learning which does not require a temporal trace. The model of visual processing in the ventral cortical stream can build representations of objects that are invariant with respect to translation, view, size, and also lighting. The model has been extended to provide an account of invariant representations in the dorsal visual system of the global motion produced by objects such as looming, rotation, and object-based movement. The model has been extended to incorporate top-down feedback connections to model the control of attention by biased competition in, for example, spatial and object search tasks. The approach has also been extended to account for how the visual system can select single objects in complex visual scenes, and how multiple objects can be represented in a scene. The approach has also been extended to provide, with an additional layer, for the development of representations of spatial scenes of the type found in the hippocampus.
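The core learning mechanism mentioned here, the trace rule, is compact enough to sketch. In one published form, the postsynaptic term is an exponentially decaying trace of recent activity, so temporally adjacent transforms of the same object strengthen the same output weights. The toy below applies it to a single output neuron over shifted versions of a binary pattern; dimensions and constants are illustrative, not VisNet's.

```python
import numpy as np

rng = np.random.default_rng(2)
dim, eta, alpha = 64, 0.8, 0.05
w = rng.random(dim) * 0.01               # feedforward weights onto one neuron
trace = 0.0

base = (rng.random(dim) > 0.7).astype(float)       # a binary "object" pattern
transforms = [np.roll(base, k) for k in range(5)]  # object shifting over time

for x in transforms:
    y = w @ x                            # postsynaptic activation
    trace = (1 - eta) * y + eta * trace  # short-term memory trace of activity
    w += alpha * trace * x               # associative update uses the trace
    w /= np.linalg.norm(w)               # weight normalization, as in VisNet

print(np.round(w[:8], 3))
```

Because the trace carries activity across the sequence of transforms, the weights come to respond to all of them, which is the mechanism for invariance described in the abstract.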
Full-text available
Article
The mechanisms of perceptual learning are analyzed theoretically, probed in an orientation-discrimination experiment involving a novel nonstationary context manipulation, and instantiated in a detailed computational model. Two hypotheses are examined: modification of early cortical representations versus task-specific selective reweighting. Representation modification seems neither functionally necessary nor implied by the available psychophysical and physiological evidence. Computer simulations and mathematical analyses demonstrate the functional and empirical adequacy of selective reweighting as a perceptual learning mechanism. The stimulus images are processed by standard orientation- and frequency-tuned representational units, divisively normalized. Learning occurs only in the "read-out" connections to a decision unit; the stimulus representations never change. An incremental Hebbian rule tracks the task-dependent predictive value of each unit, thereby improving the signal-to-noise ratio of their weighted combination. Each abrupt change in the environmental statistics induces a switch cost in the learning curves as the system temporarily works with suboptimal weights.
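A minimal sketch of the selective-reweighting idea, assuming a delta-rule style incremental update (the paper's model is considerably more detailed, with orientation- and frequency-tuned channels and divisive normalization): the representation layer is frozen, and only the read-out weights to the decision unit ever change.

```python
import numpy as np

rng = np.random.default_rng(3)
n_units, lr = 40, 0.02
encode = rng.normal(size=(n_units, 2))  # frozen representation of 2 stimulus dims
w = np.zeros(n_units)                   # only these read-out weights learn

for _ in range(2000):
    label = rng.choice([-1.0, 1.0])                  # e.g. tilt left vs right
    stim = np.array([label, 0.0]) + 0.5 * rng.normal(size=2)
    r = np.tanh(encode @ stim)                       # unit responses never change
    w += lr * (label - w @ r) * r                    # incremental reweighting

correct = 0
for _ in range(500):
    label = rng.choice([-1.0, 1.0])
    stim = np.array([label, 0.0]) + 0.5 * rng.normal(size=2)
    correct += np.sign(w @ np.tanh(encode @ stim)) == label
print("post-training accuracy:", correct / 500)
```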
Full-text available
Article
This is a report on the LIDA architecture, a work in progress that is based on IDA, an intelligent, autonomous, "conscious" software agent that does personnel work for the US Navy. IDA uses locally developed cutting edge artificial intelligence technology designed to model human cognition. IDA's task is to find jobs for sailors whose current assignments are about to end. She selects jobs to offer a sailor, taking into account the Navy's policies, the job's needs, the sailor's preferences, and her own deliberation about feasible dates. Then she negotiates with the sailor, in English via iterative emails, about job selection. We use the word "conscious" in the sense of Baars' Global Workspace Theory (Baars, 1988, 1997), upon which our architecture is based. IDA loops through a cognitive cycle in which she perceives the environments, internal and external; creates meaning, by interpreting the environment and deciding what is important; and answers the only question there is: "What do I do next?" LIDA, the learning IDA, will add three modes of learning to IDA's design: perceptual learning, episodic learning, and procedural learning. LIDA will learn from experience, which may yield several lessons over several cognitive cycles. Such lessons include newly perceived objects and their relationship to already known objects and categories, relationships among objects and between objects and actions, effects of actions on sensation, and improved perception of sensory data. The LIDA architecture incorporates six major artificial intelligence software technologies: the copycat architecture, sparse distributed memory, pandemonium theory, the schema mechanism, the behavior net model, and the subsumption architecture.
Full-text available
Article
The division of cortical visual processing into distinct dorsal and ventral streams is a key framework that has guided visual neuroscience. The characterization of the ventral stream as a 'What' pathway is relatively uncontroversial, but the nature of dorsal stream processing is less clear. Originally proposed as mediating spatial perception ('Where'), more recent accounts suggest it primarily serves non-conscious visually guided action ('How'). Here, we identify three pathways emerging from the dorsal stream that consist of projections to the prefrontal and premotor cortices, and a major projection to the medial temporal lobe that courses both directly and indirectly through the posterior cingulate and retrosplenial cortices. These three pathways support both conscious and non-conscious visuospatial processing, including spatial working memory, visually guided action and navigation, respectively.
Full-text available
Article
Object vision in human and nonhuman primates is often cited as a primary example of adult plasticity in neural information processing. It has been hypothesized that visual experience leads to single neurons in the monkey brain with strong selectivity for complex objects, and to regions in the human brain with a preference for particular categories of highly familiar objects. This view suggests that adult visual experience causes dramatic local changes in the response properties of high-level visual cortex. Here, we review the current neurophysiological and neuroimaging evidence and find that the available data support a different conclusion: adult visual experience introduces moderate, relatively distributed effects that modulate a pre-existing, rich and flexible set of neural object representations.
Full-text available
Article
We recorded the activities of neurons in the lateral surface of the posterior inferior temporal cortex (PIT) of 3 hemispheres of 3 monkeys performing a visual fixation task. We characterized the color and shape selectivities of each neuron, mapped its receptive field (RF), and studied the distributions of these response properties. Using a set of color stimuli that were systematically distributed in Commission Internationale de l'Eclairage-xy chromaticity diagram, we found numerous color-selective neurons distributed throughout the area examined. Neurons in the ventral region tended to have sharper color tuning than those in the dorsal region. We also found a crude retinotopic organization in the ventral region. Within the ventral region of PIT, neurons in the dorsal part had RFs that overlapped the foveal center; the eccentricity of RFs increased in the more ventral part, and neurons in the anterior and posterior parts had RFs that represented the lower and upper visual fields, respectively. In all 3 hemispheres, the region where sharply tuned color-selective neurons were concentrated was confined within this retinotopic map. These findings suggest that PIT is a heterogeneous area and that there is a circumscribed region within it that has crude retinotopic organization and is involved in the processing of color.
Full-text available
Article
1. The inferotemporal cortex (IT) has been thought to play an essential and specific role in visual object discrimination and recognition, because a lesion of IT in the monkey results in a specific deficit in learning tasks that require these visual functions. To understand the cellular basis of the object discrimination and recognition processes in IT, we determined the optimal stimulus of individual IT cells in anesthetized, immobilized monkeys. 2. In the posterior one-third or one-fourth of IT, most cells could be activated maximally by bars or disks just by adjusting the size, orientation, or color of the stimulus. 3. In the remaining anterior two-thirds or three-quarters of IT, most cells required more complex features for their maximal activation. 4. The critical feature for the activation of individual anterior IT cells varied from cell to cell: a complex shape in some cells and a combination of texture or color with contour-shape in other cells. 5. Cells that showed different types of complexity for the critical feature were intermingled throughout anterior IT, whereas cells recorded in single penetrations showed critical features that were related in some respects. 6. Generally speaking, the critical features of anterior IT cells were moderately complex and can be thought of as partial features common to images of several different natural objects. The selectivity to the optimal stimulus was rather sharp, although not absolute. We thus propose that, in anterior IT, images of objects are coded by combinations of active cells, each of which represents the presence of a particular partial feature in the image.
Full-text available
Article
Inferior temporal cortex neurons have generally been found to have large visual receptive fields that typically include the fovea and extend throughout much of the visual field. However, a problem of such a large receptive field is that it does not easily support object selection by subsequent processing areas, in that all objects within such a large receptive field might activate inferior temporal cortex cells. To clarify this, we recorded from inferior temporal cortex neurons while macaques searched for objects in complex natural scenes or in plain backgrounds, as normally used. Inferior temporal cortex neuron receptive fields were much smaller in natural scenes (mean radius, 11 degrees) than in plain backgrounds (39 degrees). With two objects in a scene, one of which was a target for action (a touch), the firing rates were equally high during foveation of the effective stimulus when it was the target and when it was the distractor in both the plain and the complex scenes. With a plain background and two objects present, the receptive fields were much larger (24 degrees) for the stimulus when it was the target than when it was the distractor (9 degrees). This effect of object-based attention was much less evident in the complex scene, when the receptive fields were small both when the stimulus was a distractor and when it was a target. The results show that the temporal visual cortex provides an unambiguous representation in natural scenes by responding to the object shown at or close to the fixation point.
Full-text available
Article
Visual object recognition is computationally difficult because changes in an object's position, distance, pose, or setting may cause it to produce a different retinal image on each encounter. To robustly recognize objects, the primate brain must have mechanisms to compensate for these variations. Although these mechanisms are poorly understood, it is thought that they elaborate neuronal representations in the inferotemporal cortex that are sensitive to object form but substantially invariant to other image variations. This study examines this hypothesis for image variation resulting from changes in object position. We studied the effect of small differences (±1.5 degrees) in the retinal position of small (0.6 degrees wide) visual forms on both the behavior of monkeys trained to identify those forms and the responses of 146 anterior IT (AIT) neurons collected during that behavior. Behavioral accuracy and speed were largely unaffected by these small changes in position. Consistent with previous studies, many AIT responses were highly selective for the forms. However, AIT responses showed far greater sensitivity to retinal position than predicted from their reported receptive field (RF) sizes. The median AIT neuron showed an approximately 60% response decrease between positions within ±1.5 degrees of the center of gaze, and 52% of neurons were unresponsive to one or more of these positions. Consistent with previous studies, each neuron's rank order of target preferences was largely unaffected across position changes. Although we have not yet determined the conditions necessary to observe this marked position sensitivity in AIT responses, we rule out effects of spatial-frequency content, eye movements, and failures to include the RF center. To reconcile this observation with previous studies, we hypothesize that either AIT position sensitivity strongly depends on object size or that position sensitivity is sharpened by extensive visual experience at fixed retinal positions or by the presence of flanking distractors.
Full-text available
Article
The majority of the research related to visual recognition has so far focused on bottom-up analysis, where the input is processed in a cascade of cortical regions that analyze increasingly complex information. Gradually more studies emphasize the role of top-down facilitation in cortical analysis, but it remains something of a mystery how such processing would be initiated. After all, top-down facilitation implies that high-level information is activated earlier than some relevant lower-level information. Building on previous studies, I propose a specific mechanism for the activation of top-down facilitation during visual object recognition. The gist of this hypothesis is that a partially analyzed version of the input image (i.e., a blurred image) is projected rapidly from early visual areas directly to the prefrontal cortex (PFC). This coarse representation activates in the PFC expectations about the most likely interpretations of the input image, which are then back-projected as an "initial guess" to the temporal cortex to be integrated with the bottom-up analysis. The top-down process facilitates recognition by substantially limiting the number of object representations that need to be considered. Furthermore, such a rapid mechanism may provide critical information when a quick response is necessary.
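A hypothetical implementation of this shortcut (not Bar's actual model): classify a blurred copy of the input against blurred prototypes to obtain a small candidate set, then run the expensive fine-grained match only on those candidates. All parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n_classes, size = 100, 64
prototypes = rng.normal(size=(n_classes, size))   # fine-grained object models

def blur(x, k=4):
    """Local averaging stands in for a low-spatial-frequency (blurred) image."""
    return np.convolve(x, np.ones(k) / k, mode="same")

coarse = np.array([blur(p) for p in prototypes])  # PFC-side coarse models

def recognize(x, top_k=5):
    candidates = np.argsort(coarse @ blur(x))[::-1][:top_k]  # fast "initial guess"
    scores = prototypes[candidates] @ x                      # slow matching, pruned
    return candidates[np.argmax(scores)], top_k

x = prototypes[42] + 0.3 * rng.normal(size=size)
label, evaluated = recognize(x)
print(label, f"- fine comparisons: {evaluated} instead of {n_classes}")
```

The coarse pass substantially limits the number of object representations the fine pass must consider, which is the facilitation the abstract describes.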
Full-text available
Article
Angles and junctions embedded within contours are important features to represent the shape of objects. To study the neuronal basis to extract these features, we conducted extracellular recordings while two macaque monkeys performed a fixation task. Angle stimuli were the combination of two straight half-lines larger than the size of the classical receptive fields (CRFs). Each line was drawn from the center to outside the CRFs in 1 of 12 directions, so that the stimuli passed through the CRFs and formed angles at the center of the CRFs. Of 114 neurons recorded from the superficial layer of area V2, 91 neurons showed selective responses to these angle stimuli. Of these, 41 neurons (36.0%) showed selective responses to wide angles between 60 degrees and 150 degrees that were distinct from responses to straight lines or sharp angles (30 degrees). Responses were highly selective to a particular angle in approximately one-fourth of neurons. When we tested the selectivity of the same neurons to individual half-lines, the preferred direction was more or less consistent with one or two components of the optimal angle stimuli. These results suggest that the selectivity of the neurons depends on both the combination of two components and the responses to individual components. Angle-selective V2 neurons are unlikely to be specific angle detectors, because the magnitude of their responses to the optimal angle was indistinguishable from that to the optimal half-lines. We suggest that the extraction of information of angles embedded within contour stimuli may start in area V2.
Full-text available
Article
We see the world in scenes, where visual objects occur in rich surroundings, often embedded in a typical context with other related objects. How does the human brain analyse and use these common associations? This article reviews the knowledge that is available, proposes specific mechanisms for the contextual facilitation of object recognition, and highlights important open questions. Although much has already been revealed about the cognitive and cortical mechanisms that subserve recognition of individual objects, surprisingly little is known about the neural underpinnings of contextual analysis and scene perception. Building on previous findings, we now have the means to address the question of how the brain integrates individual elements to construct the visual experience.
Full-text available
Article
It takes a fraction of a second to recognize a person or an object even when seen under strikingly different conditions. How such a robust, high-level representation is achieved by neurons in the human brain is still unclear. In monkeys, neurons in the upper stages of the ventral visual pathway respond to complex images such as faces and objects and show some degree of invariance to metric properties such as the stimulus size, position and viewing angle. We have previously shown that neurons in the human medial temporal lobe (MTL) fire selectively to images of faces, animals, objects or scenes. Here we report on a remarkable subset of MTL neurons that are selectively activated by strikingly different pictures of given individuals, landmarks or objects and in some cases even by letter strings with their names. These results suggest an invariant, sparse and explicit code, which might be important in the transformation of complex visual percepts into long-term and more abstract memories.
Full-text available
Article
While it is often assumed that objects can be recognized irrespective of where they fall on the retina, little is known about the mechanisms underlying this ability. By exposing human subjects to an altered world where some objects systematically changed identity during the transient blindness that accompanies eye movements, we induced predictable object confusions across retinal positions, effectively 'breaking' position invariance. Thus, position invariance is not a rigid property of vision but is constantly adapting to the statistics of the environment.
Full-text available
Article
The macaque inferotemporal (IT) cortex, which serves as the storehouse of visual long-term memory, consists of two distinct but mutually interconnected areas: area TE (TE) and area 36 (A36). In the present study, we tested whether memory encoding is put forward at this stage, i.e., whether association between the representations of different but semantically linked objects proceeds forward from TE to A36. To address this question, we trained monkeys in a pair-association (PA) memory task, after which single-unit activities were recorded from TE and A36 during PA trials. Neurons in both areas showed stimulus-selective cue responses (347 in TE, 76 in A36; "cue-selective neurons") that provided, at the population level, mnemonic linkage between the paired associates. The percentage of neurons in which responses to the paired associates were significantly (p < 0.01) correlated at the single-neuron level ("pair-coding neuron") dramatically increased from TE (4.9% of the cue-selective neurons) to A36 (33%). The pair-coding neurons in A36 were further separable into Type1 (68%) and Type2 (32%) on the basis of their initial transient responses after cue stimulus presentation. Type1 neurons, but not Type2 neurons, began to encode association between paired stimuli as soon as they exhibited stimulus selectivity. Thus, the representation of long-term memory encoded by Type1 neurons in A36 is likely substantiated without feedback input from other higher centers. Therefore, we conclude that association between the representations of the paired associates proceeds forward at this critical step within IT cortex, suggesting selective convergence onto a single A36 neuron from two TE neurons that encode separate visual objects.
Full-text available
Article
Understanding the brain computations leading to object recognition requires quantitative characterization of the information represented in inferior temporal (IT) cortex. We used a biologically plausible, classifier-based readout technique to investigate the neural coding of selectivity and invariance at the IT population level. The activity of small neuronal populations (approximately 100 randomly selected cells) over very short time intervals (as small as 12.5 milliseconds) contained unexpectedly accurate and robust information about both object "identity" and "category." This information generalized over a range of object positions and scales, even for novel objects. Coarse information about position and scale could also be read out from the same population.
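The readout logic here is easy to emulate on synthetic data. The sketch below simulates ~100 Poisson "IT neurons" with object-dependent mean counts and trains a linear classifier to report object identity from single-window population vectors; the tuning model and counts are invented stand-ins, not the recorded data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n_neurons, n_objects, trials = 100, 8, 60
tuning = rng.gamma(2.0, 2.0, size=(n_objects, n_neurons))  # mean counts per object

# one population vector (spike counts in a short window) per trial
X = np.array([rng.poisson(tuning[o])
              for o in range(n_objects) for _ in range(trials)])
y = np.repeat(np.arange(n_objects), trials)

clf = LogisticRegression(max_iter=2000).fit(X[::2], y[::2])   # train on half
print("decoding accuracy:", clf.score(X[1::2], y[1::2]))      # held-out half
```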
Full-text available
Article
The ability to use abstract rules or principles allows behavior to generalize from specific circumstances. We have previously shown that such rules are encoded in the lateral prefrontal cortex (PFC) and premotor cortex (PMC). Here, we extend these investigations to two other areas directly connected with the PFC and the PMC, the inferior temporal cortex (ITC) and the dorsal striatum (STR). Monkeys were trained to use two abstract rules: "same" or "different". They had to either hold or release a lever, depending on whether two successively presented pictures were the same or different, and depending on which rule was in effect. The rules and the behavioral responses were reflected most strongly and, on average, tended to be earlier in the PMC followed by the PFC and then the STR; few neurons in the ITC reflected the rules or the actions. By contrast, perceptual information (the identity of the pictures used as sample and test stimuli) was encoded more strongly and earlier in the ITC, followed by the PFC; they had weak, if any, effects on neural activity in the PMC and STR. These findings are discussed in the context of the anatomy and posited functions of these areas.
Full-text available
Article
We investigated object representation in area TE, the anterior part of monkey inferotemporal (IT) cortex, with a combination of optical and extracellular recordings in anesthetized monkeys. We found neurons that respond to visual stimuli composed of naturally distinguishable parts. These neurons were sensitive to a particular spatial arrangement of parts but less sensitive to differences in local features within individual parts. Thus these neurons were activated when arbitrary local features were arranged in a particular spatial configuration, suggesting that they may be responsible for representing the spatial configuration of object images. Previously it has been reported that many neurons in area TE respond to visual features less complex than natural objects, but it has remained unclear whether these features are related to local features of object images or to more global features. These results indicate that TE neurons represent not only local features but also global features such as the spatial relationship among object parts.
Full-text available
Article
We introduce a new general framework for the recognition of complex visual scenes, which is motivated by biology: We describe a hierarchical system that closely follows the organization of visual cortex and builds an increasingly complex and invariant feature representation by alternating between a template matching and a maximum pooling operation. We demonstrate the strength of the approach on a range of recognition tasks: From invariant single object recognition in clutter to multiclass categorization problems and complex scene understanding tasks that rely on the recognition of both shape-based as well as texture-based objects. Given the biological constraints that the system had to satisfy, the approach performs surprisingly well: It has the capability of learning from only a few training examples and competes with state-of-the-art systems. We also discuss the existence of a universal, redundant dictionary of features that could handle the recognition of most object categories. In addition to its relevance for computer vision, the success of this approach suggests a plausibility proof for a class of feedforward models of object recognition in cortex.
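The two alternating operations named here are worth seeing side by side. The sketch below implements a single S-layer (template matching via patch correlation) followed by a C-layer (local max pooling); real HMAX uses Gabor filters, multiple scales, and templates learned from images, all of which are simplified away.

```python
import numpy as np

rng = np.random.default_rng(6)
image = rng.random((32, 32))
templates = rng.random((4, 5, 5))                   # stand-ins for Gabor patches

def s_layer(img, temps):
    """Template matching: one response map per template (valid correlation)."""
    h, w = img.shape[0] - 4, img.shape[1] - 4
    out = np.empty((len(temps), h, w))
    for t, temp in enumerate(temps):
        for i in range(h):
            for j in range(w):
                out[t, i, j] = (img[i:i + 5, j:j + 5] * temp).sum()
    return out

def c_layer(maps, pool=7):
    """Max pooling over local position: invariance to small translations."""
    t, h, w = maps.shape
    return np.array([[[m[i:i + pool, j:j + pool].max()
                       for j in range(0, w - pool + 1, pool)]
                      for i in range(0, h - pool + 1, pool)]
                     for m in maps])

features = c_layer(s_layer(image, templates))
print(features.shape)   # (templates, pooled rows, pooled cols)
```

Stacking such S/C pairs yields the increasingly complex and invariant feature representation the abstract refers to.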
Article
With practice, humans tend to improve their performance on most tasks. But do such improvements then generalize to new tasks? Although early work documented primarily task-specific learning outcomes in the domain of perceptual learning [1-3], an emerging body of research has shown that significant learning generalization is possible under some training conditions [4-9]. Interestingly, however, research in this vein has focused nearly exclusively on just one possible manifestation of learning generalization, wherein training on one task produces an immediate boost to performance on the new task. For instance, it is this form of generalization that is most frequently referred to when discussing learning "transfer" [10, 11]. Essentially no work in this domain has focused on a second possible manifestation of generalization, wherein the knowledge or skills acquired via training, despite not being directly applicable to the new task, nonetheless allow the new task to be learned more efficiently [12-15]. Here, in both the visual category learning and visual perceptual learning domains, we demonstrate that sequentially training participants on tasks that share a common high-level task structure can produce faster learning of new tasks, even in cases where there is no immediate benefit to performance on the new tasks. We further show that methods commonly employed in the field may fail to detect or else conflate generalization that manifests as increased learning rate with generalization that manifests as immediate boosts to performance. These results thus lay the foundation for the various routes to learning generalization to be more thoroughly explored.
Article
Contour integration refers to the ability of the visual system to bind disjoint local elements into coherent global shapes. In cluttered images containing randomly oriented elements a contour becomes salient when its elements are coaligned with a smooth global trajectory, as described by the Gestalt law of good continuation. Abrupt changes of curvature strongly diminish contour salience. Here we show that by inserting local corner elements at points of angular discontinuity, a jagged contour becomes as salient as a straight one. We report results from detection experiments for contours with and without corner elements which indicate their psychophysical equivalence. This presents a challenge to the notion that contour integration mostly relies on local interactions between neurons tuned to single orientations, and suggests that a site where single orientations and more complex local features are combined constitutes the early basis of contour and 2D shape processing.
Conference Paper
The features of distributed systems help to solve problems in different research areas, such as fault tolerance and the use of distributed resources. The relevant cognitive architectures (CAs) use middleware (a distributed-systems concept) to test their models and propose new theories. Thanks to middleware, researchers can conceive of CAs as a whole rather than as a set of components. However, most middlewares used in present CAs are modifications of generic ones, which introduces extra processing that affects overall performance. In this research, we propose a middleware designed and developed with the requirements of CAs in mind. Our middleware allows the integration of independently developed cognitive functions, such as memory and attention, easily and incrementally. It also allows us to test the cognitive functions integrated into the CA. To test our proposal, the middleware simulates an attention-novelty handling cognitive process.
Article
We suggest that population representations of objects in inferotemporal cortex lie on a continuum between a purely structural, parts-based description and a purely holistic description. The intrinsic dimensionality of object representation is estimated to be around 100, perhaps with lower dimensionalities for object representations more toward the holistic end of the spectrum. Cognitive knowledge in the form of semantic information and task information feeds back to inferotemporal cortex from perirhinal and prefrontal cortex, respectively, providing high-level multimodal expectations that assist in the interpretation of object stimuli. Integration of object information across eye movements may also contribute to object recognition through a process of active vision.
Article
Perception is substantially facilitated by top-down influences, typically seen as predictions. Here, we outline that the process is competitive in nature, in that sensory input initially activates multiple possible interpretations, or perceptual hypotheses, of its causes. This raises the question of how the selection of the correct interpretation from among those multiple hypotheses is achieved. We first review previous findings in support of such a competitive nature of perceptual processing, and then propose which neural regions might provide a platform for raising and using expectations to resolve this competition. Specifically, we propose that it is the rapid extraction and top-down dissemination of a global context signal from the frontal cortices, particularly the orbitofrontal cortex, that affords the quick and reliable resolution of the initial competition among likely alternatives toward a singular percept.
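In Bayesian terms, the proposed resolution can be pictured as a context prior multiplying ambiguous bottom-up likelihoods; the toy numbers below are invented purely to illustrate the direction of the effect.

```python
import numpy as np

hypotheses = ["hair dryer", "drill", "stapler"]
likelihood = np.array([0.32, 0.35, 0.33])        # ambiguous bottom-up evidence
context_prior = np.array([0.70, 0.15, 0.15])     # gist signal: a bathroom scene

posterior = likelihood * context_prior           # context resolves the tie
posterior /= posterior.sum()
print(dict(zip(hypotheses, np.round(posterior, 2))))
```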
Article
When we see an object, we interpret the visual image that is projected on the retina, giving it meaning. The transformation from visual image to meaningful object takes place along the human ventral visual stream. Early in the ventral stream, brain activity represents low-level visual information, ...
Article
The parahippocampal cortex (PHC) has been associated with many cognitive processes, including visuospatial processing and episodic memory. To characterize the role of PHC in cognition, a framework is required that unifies these disparate processes. An overarching account was proposed whereby the PHC is part of a network of brain regions that processes contextual associations. Contextual associations are the principal element underlying many higher-level cognitive processes, and thus are suitable for unifying the PHC literature. Recent findings are reviewed that provide support for the contextual associations account of PHC function. In addition to reconciling a vast breadth of literature, the synthesis presented expands the implications of the proposed account and gives rise to new and general questions about context and cognition.
Article
Re-entrant or feedback pathways between cortical areas carry rich and varied information about behavioural context, including attention, expectation, perceptual tasks, working memory and motor commands. Neurons receiving such inputs effectively function as adaptive processors that are able to assume different functional states according to the task being executed. Recent data suggest that the selection of particular inputs, representing different components of an association field, enable neurons to take on different functional roles. In this Review, we discuss the various top-down influences exerted on the visual cortical pathways and highlight the dynamic nature of the receptive field, which allows neurons to carry information that is relevant to the current perceptual demands.
Article
We propose a novel approach to learn and recognize natural scene categories. Unlike previous work [9,17], it does not require experts to annotate the training set. We represent the image of a scene by a collection of local regions, denoted as codewords obtained by unsupervised learning. Each region is represented as part of a "theme". In previous work, such themes were learnt from hand-annotations of experts, while our method learns the theme distributions as well as the codewords distribution over the themes without supervision. We report satisfactory categorization performance on a large set of 13 categories of complex scenes.
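A compact modern analogue of the unsupervised "themes" idea: represent each image as a histogram of visual codewords and fit a latent topic model. The sketch below uses synthetic codeword counts and scikit-learn's LDA as a stand-in for (not identical to) the paper's Bayesian hierarchical model; all sizes are illustrative.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(9)
n_images, n_codewords, n_themes = 200, 50, 4
true_themes = rng.dirichlet(np.ones(n_codewords), size=n_themes)  # codeword dists
mix = rng.dirichlet(np.ones(n_themes) * 0.3, size=n_images)       # per-image mixes
counts = np.array([rng.multinomial(300, mix[i] @ true_themes)
                   for i in range(n_images)])                     # codeword histograms

lda = LatentDirichletAllocation(n_components=n_themes, random_state=0)
theme_mix = lda.fit_transform(counts)              # recovered theme distributions
theme_mix /= theme_mix.sum(axis=1, keepdims=True)  # normalize per image
print(np.round(theme_mix[0], 2))                   # one image's theme mixture
```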
Article
Perceptual learning refers to the phenomenon that practice or training in perceptual tasks often substantially improves perceptual performance. Often exhibiting stimulus or task specificities, perceptual learning differs from learning in the cognitive or motor domains. Research on perceptual learning reveals important plasticity in adult perceptual systems, as well as limitations in the information processing of the human observer. In this article, we review the behavioral results, mechanisms, physiological basis, computational models, and applications of visual perceptual learning.
Article
We show in a unifying computational approach that representations of spatial scenes can be formed by adding an additional self-organizing layer of processing beyond the inferior temporal visual cortex in the ventral visual stream without the introduction of new computational principles. The invariant representations of objects by neurons in the inferior temporal visual cortex can be modelled by a multilayer feature hierarchy network with feedforward convergence from stage to stage, and an associative learning rule with a short-term memory trace to capture the invariant statistical properties of objects as they transform over short time periods in the world. If an additional layer is added to this architecture, training now with whole scenes that consist of a set of objects in a given fixed spatial relation to each other results in neurons in the added layer that respond to one of the trained whole scenes but do not respond if the objects in the scene are rearranged to make a new scene from the same objects. The formation of these scene-specific representations in the added layer is related to the fact that in the inferior temporal cortex and, we show, in the VisNet model, the receptive fields of inferior temporal cortex neurons shrink and become asymmetric when multiple objects are present simultaneously in a natural scene. This reduced size and asymmetry of the receptive fields of inferior temporal cortex neurons also provides a solution to the representation of multiple objects, and their relative spatial positions, in complex natural scenes.
Article
Previous investigations of the neural code for complex object shape have focused on two-dimensional pattern representation. This may be the primary mode for object vision given its simplicity and direct relation to the retinal image. In contrast, three-dimensional shape representation requires higher-dimensional coding derived from extensive computation. We found evidence for an explicit neural code for complex three-dimensional object shape. We used an evolutionary stimulus strategy and linear/nonlinear response models to characterize three-dimensional shape responses in macaque monkey inferotemporal cortex (IT). We found widespread tuning for three-dimensional spatial configurations of surface fragments characterized by their three-dimensional orientations and joint principal curvatures. Configural representation of three-dimensional shape could provide specific knowledge of object structure to support guidance of complex physical interactions and evaluation of object functionality and utility.
Article
The perceptual recognition of objects is conceptualized to be a process in which the image of the input is segmented at regions of deep concavity into an arrangement of simple geometric components. The fundamental assumption of the proposed theory, recognition-by-components (RBC), is that a modest set of generalized-cone components, called geons, can be derived from contrasts of five readily detectable properties of edges in a two-dimensional image. The detection of these properties is generally invariant over viewing position and image quality and consequently allows robust object perception when the image is projected from a novel viewpoint or is degraded. RBC thus provides a principled account of the heretofore undecided relation between the classic principles of perceptual organization and pattern recognition. The results from experiments on the perception of briefly presented pictures by human observers provide empirical support for the theory.
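Read as a data structure, RBC says an object representation is a small set of components plus the relations between them. The sketch below is a hypothetical encoding; the geon names and relation labels are illustrative, not Biederman's full catalogue.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Geon:
    shape: str        # e.g. "cylinder", "brick", "wedge" (generalized cones)
    size: str         # coarse, viewpoint-invariant attribute

@dataclass(frozen=True)
class Relation:
    a: Geon
    b: Geon
    kind: str         # e.g. "side-attached", "on-top-of"

# a mug: a large cylinder with a small curved cylinder attached at its side
body = Geon("cylinder", "large")
handle = Geon("curved-cylinder", "small")
mug = {"geons": [body, handle],
       "relations": [Relation(handle, body, "side-attached")]}
print(len(mug["geons"]), "components,", len(mug["relations"]), "relation")
```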
Article
1. The striate cortex was studied in lightly anaesthetized macaque and spider monkeys by recording extracellularly from single units and stimulating the retinas with spots or patterns of light. Most cells can be categorized as simple, complex, or hypercomplex, with response properties very similar to those previously described in the cat. On the average, however, receptive fields are smaller, and there is a greater sensitivity to changes in stimulus orientation. A small proportion of the cells are colour coded.
Article
We present a biologically plausible model of an attentional mechanism for forming position- and scale-invariant representations of objects in the visual world. The model relies on a set of control neurons to dynamically modify the synaptic strengths of intracortical connections so that information from a windowed region of primary visual cortex (V1) is selectively routed to higher cortical areas. Local spatial relationships (i.e., topography) within the attentional window are preserved as information is routed through the cortex. This enables attended objects to be represented in higher cortical areas within an object-centered reference frame that is position and scale invariant. We hypothesize that the pulvinar may provide the control signals for routing information through the cortex. The dynamics of the control neurons are governed by simple differential equations that could be realized by neurobiologically plausible circuits. In preattentive mode, the control neurons receive their input from a low-level "saliency map" representing potentially interesting regions of a scene. During the pattern recognition phase, control neurons are driven by the interaction between top-down (memory) and bottom-up (retinal input) sources. The model respects key neurophysiological, neuroanatomical, and psychophysical data relating to attention, and it makes a variety of experimentally testable predictions.
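A one-dimensional caricature of the routing idea: a control signal selects a window of the "retinal" array and remaps it, topography intact, into a fixed object-centered array, so downstream stages see the same pattern wherever it appeared. The saliency heuristic and window mechanics below are simplifications of the model's control-neuron dynamics.

```python
import numpy as np

rng = np.random.default_rng(7)
retina = np.zeros(64)
obj = np.array([1.0, 3.0, 2.0, 4.0])
retina[37:41] = obj                               # object at an arbitrary position

def route(input_array, center, width, out_size=4):
    """Remap the attended window into a fixed, object-centered frame."""
    start = max(0, center - width // 2)
    window = input_array[start:start + width]
    idx = np.linspace(0, len(window) - 1, out_size)  # topography-preserving resample
    return np.interp(idx, np.arange(len(window)), window)

# crude "saliency map": local energy picks the window center (preattentive mode)
center = int(np.argmax(np.convolve(retina, np.ones(4), mode="same")))
print(np.round(route(retina, center, width=8), 2))   # object-centered output
```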
Article
We describe a neural model for forming size- and position-invariant representations of visual objects. The model is based on a previously proposed dynamic routing circuit that remaps selected portions of an input array into an object-centered reference frame. Here, we show how a multiscale representation may be incorporated at the input stage of the model, and we describe the control architecture and dynamics for a hierarchical, multistage routing circuit. Specific neurobiological substrates and mechanisms for the model are proposed, and a number of testable predictions are described.
Article
To explore the role of visual area V2 in shape analysis, we studied the responses of neurons in area V2 of the alert macaque using a set of 128 grating and geometric line stimuli that varied in their shape characteristics and geometric complexity. Simple stimuli included oriented bars and sinusoidal gratings; complex stimuli included angles, arcs, circles, and intersecting lines, plus hyperbolic and polar gratings. We found that most V2 cells responded well to at least some of the complex stimuli, and in many V2 cells the most effective complex stimulus elicited a significantly larger response than the most effective bar or sinusoid. Approximately one-third of the V2 cells showed significant differential responsiveness to various complex shape characteristics, and many were also selective for the orientation, size, and/or spatial frequency of the preferred shape. These results indicate that V2 cells explicitly represent complex shape information and suggest specific types of higher order visual information that V2 cells extract from visual scenes.
Article
In the search for the neural correlate of visual awareness, much controversy exists about the role of primary visual cortex. Here, the neurophysiological data from V1 recordings in awake monkeys are examined in light of two general classes of models of visual awareness. In the first model type, visual awareness is seen as being mediated either by a particular set of areas or pathways, or alternatively by a specific set of neurons. In these models, the role of V1 seems rather limited, as the mere activity of V1 cells seems insufficient to mediate awareness. In the second model type, awareness is hypothesized to be mediated by a global mechanism, i.e. a specific kind of activity not linked to a particular area or cell type. Two separate versions of global models are discussed, synchronous oscillations and spike rate modulations. It is shown that V1 synchrony does not reflect perception but rather the horizontal connections between neurons, indicating that V1 synchrony cannot be a direct neural correlate of conscious percepts. However, the rate of spike discharges of V1 neurons is strongly modulated by perceptual context, and these modulations correlate very well with aspects of perceptual organization, visual awareness, and attention. If these modulations serve as a neural correlate of visual awareness, then V1 contributes to that neural correlate. Whether V1 plays a role in the neural correlate of visual awareness thus strongly depends on the way visual awareness is hypothesized to be implemented in the brain.
Article
We describe a model of invariant visual object recognition in the brain that incorporates feedback biasing effects of top-down attentional mechanisms on a hierarchically organized set of visual cortical areas with convergent forward connectivity, reciprocal feedback connections, and local intra-area competition. The model displays space-based and object-based covert visual search by using attentional top-down feedback from either the posterior parietal or the inferior temporal cortex (IT) modules, and interactions between the two processing streams occurring in V1 and V2. The model explains the gradually increasing magnitude of the attentional modulation that is found in fMRI experiments from earlier visual areas (V1, V2) to higher ventral stream visual areas (V4, IT); how the effective size of the receptive fields of IT neurons becomes smaller in natural cluttered scenes; and makes predictions about interactions between stimuli in their receptive fields.
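The competition-plus-bias scheme reduces, in its simplest form, to a few lines of rate dynamics. The sketch below is a generic biased-competition loop (rectified leaky units with shared lateral inhibition and a top-down bias), not the paper's multi-area model; all constants are invented.

```python
import numpy as np

bottom_up = np.array([1.0, 1.05, 0.95])  # three similar stimuli in the field
top_down = np.array([0.0, 0.0, 0.3])     # attentional bias on the third unit
y = np.zeros(3)

for _ in range(200):                      # settle the rate dynamics
    inhibition = 0.8 * (y.sum() - y)      # lateral competition within the pool
    drive = bottom_up + top_down - inhibition
    y += 0.05 * (-y + np.maximum(drive, 0.0))

print(np.round(y, 2))                     # the biased unit wins the competition
```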
Article
How do we learn to recognize visual categories, such as dogs and cats? Somehow, the brain uses limited variable examples to extract the essential characteristics of new visual categories. Here, I describe an approach to category learning and recognition that is based on recent computational advances. In this approach, objects are represented by a hierarchy of fragments that are extracted during learning from observed examples. The fragments are class-specific features and are selected to deliver a high amount of information for categorization. The same fragments hierarchy is then used for general categorization, individual object recognition and object-parts identification. Recognition is also combined with object segmentation, using stored fragments, to provide a top-down process that delineates object boundaries in complex cluttered scenes. The approach is computationally effective and provides a possible framework for categorization, recognition and segmentation in human vision.
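The selection criterion described here, fragments chosen to deliver a high amount of information for categorization, can be spelled out directly. The sketch below scores synthetic binary fragment detectors by their mutual information with a class label and keeps the top five; the data-generation scheme is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(8)
n_samples, n_fragments = 1000, 20
labels = rng.integers(0, 2, n_samples)            # class C in {0, 1}
# fragments 0-4 are class-informative; the rest fire at chance
p_fire = np.where(np.arange(n_fragments) < 5,
                  0.2 + 0.6 * labels[:, None], 0.5)
detected = rng.random((n_samples, n_fragments)) < p_fire

def mutual_information(f, c):
    """I(F;C) in bits for a binary fragment indicator F and binary class C."""
    mi = 0.0
    for fv in (0, 1):
        for cv in (0, 1):
            p_fc = np.mean((f == fv) & (c == cv))
            p_f, p_c = np.mean(f == fv), np.mean(c == cv)
            if p_fc > 0:
                mi += p_fc * np.log2(p_fc / (p_f * p_c))
    return mi

scores = [mutual_information(detected[:, j], labels) for j in range(n_fragments)]
print("selected fragments:", np.argsort(scores)[::-1][:5])  # the informative five
```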