
Robert Jacobs
- PhD
- Professor (Full) at University of Rochester
About
137 Publications
40,504 Reads
18,217 Citations
Additional affiliations
September 1992 - present
Publications (137)
Human visual working memory (VWM) is a memory store people use to maintain the visual features of objects and scenes. Although it is obvious that bottom-up information influences VWM, the extent to which top-down conceptual information influences VWM is largely unknown. We report an experiment in which groups of participants were trained in one of...
Does semantic information—in particular, regularities in category membership across objects—influence visual working memory (VWM) processing? We predict that the answer is “yes”. Four experiments evaluating this prediction are reported. Experimental stimuli were images of real-world objects arranged in either one or two spatial clusters. On coheren...
Analogical reasoning, e.g. inferring that teacher is to chalk as mechanic is to wrench, plays a fundamental role in human cognition. However, whether brain activity patterns of individual words are encoded in a way that could facilitate analogical reasoning is unclear. Recent advances in computational linguistics have shown that information about a...
The vision sciences literature contains a large diversity of experimental and theoretical approaches to the study of visual attention. We argue that this diversity arises, at least in part, from the field's inability to unify differing theoretical perspectives. In particular, the field has been hindered by a lack of a principled formal framework fo...
Efficient data compression is essential for capacity-limited systems, such as biological perception and perceptual memory. We hypothesize that the need for efficient compression shapes biological systems in many of the same ways that it shapes engineered systems. If true, then the tools that engineers use to analyze and design systems, namely rate-...
The “resource-rational” approach is ambitious and worthwhile. A shortcoming of the proposed approach is that it fails to constrain what counts as a constraint. As a result, constraints used in different cognitive domains often have nothing in common. We describe an alternative framework that satisfies many of the desiderata of the resource-rational...
We describe and analyze the performance of metric learning systems, including deep neural networks (DNNs), on a new dataset of human visual object shape similarity judgments of naturalistic, part-based objects known as "Fribbles". In contrast to previous studies which asked participants to judge similarity when objects or scenes were rendered from...
The human brain is able to learn difficult categorization tasks, even ones that have linearly inseparable boundaries; however, it is currently unknown how it achieves this computational feat. We investigated this by training participants on an animal categorization task with a linearly inseparable prototype structure in a morph shape space. Partici...
Although real-world environments are often multisensory, visual scientists typically study visual learning in unisensory environments containing visual signals only. Here, we use deep or artificial neural networks to address the question, Can multisensory training aid visual learning? We examine a network's internal representations of objects based...
Human brains are finite, and thus have bounded capacity. An efficient strategy for a capacity-limited agent is to continuously adapt by dynamically reallocating capacity in a task-dependent manner. Here we study this strategy in the context of visual working memory (VWM). People use their VWM stores to remember visual information over seconds or mi...
Although deep neural networks (DNNs) are state-of-the-art artificial intelligence systems, it is unclear what insights, if any, they provide about human intelligence. We address this issue in the domain of visual perception. After briefly describing DNNs, we provide an overview of recent results comparing human visual representations and performanc...
Prior neuroimaging and neuropsychological research indicates that the left inferior parietal lobule in the human brain is a critical substrate for representing object manipulation knowledge. In the present functional MRI study we used multivoxel pattern analyses to test whether action similarity among objects can be decoded in the inferior parietal...
Despite decades of research, little is known about how people visually perceive object shape. We hypothesize that a promising approach to shape perception is provided by a "visual perception as Bayesian inference" framework which augments an emphasis on visual representation with an emphasis on the idea that shape perception is a form of statistica...
The format of high-level object representations in temporal-occipital cortex is a fundamental and as yet unresolved issue. Here we use fMRI to show that human lateral occipital cortex (LOC) encodes novel 3-D objects in a multisensory and part-based format. We show that visual and haptic exploration of objects leads to similar patterns of neural act...
We argue for the advantages of the probabilistic language of thought (pLOT), a recently emerging approach to modeling human cognition. Work using this framework demonstrates how the pLOT (a) refines the debate between symbols and statistics in cognitive modeling, (b) permits theories that draw on insights from both nativist and empiricist approache...
People learn modality-independent, conceptual representations from modality-specific sensory signals. Here, we hypothesize that any system that accomplishes this feat will include three components: a representational language for characterizing modality-independent representations, a set of sensory-specific forward models for mapping from modality-...
If a person is trained to recognize or categorize objects or events using one sensory modality, the person can often recognize or categorize those same (or similar) objects and events via a novel modality. This phenomenon is an instance of cross-modal transfer of knowledge. Here, we study the Multisensory Hypothesis which states that people extract...
This paper presents a computational model of concept learning using Bayesian inference for a grammatically structured hypothesis space, and tests the model on multisensory (visual and haptic) recognition of 3D objects. The study is performed on a set of artificially generated 3D objects known as fribbles, which are complex, multipart objects with c...
We investigate the hypothesis that multisensory representations mediate the crossmodal transfer of shape knowledge across visual and haptic modalities. In our experiment, participants rated the similarities of pairs of synthetic 3-D objects in visual, haptic, cross-modal, and multisensory settings. Our results offer two contributions. First, we pro...
Performance limitations in visual short-term memory (VSTM) tasks have traditionally been explained in terms of resource or capacity limitations. It has been claimed, for example, that VSTM possesses a limited amount of cognitive or neural "resources" that can be used to remember a visual display. In this paper, we highlight the potential importance...
A growing body of scientific evidence suggests that visual working memory and statistical learning are intrinsically linked. Although visual working memory is severely resource limited, in many cases, it makes efficient use of its available resources by adapting to statistical regularities in the visual environment. However, experimental evidence a...
Recent evidence from neuroimaging and psychophysics suggests common neural and representational substrates for visual perception and visual short-term memory (VSTM). Visual perception is adapted to a rich set of statistical regularities present in the natural visual environment. Common neural and representational substrates for visual perception an...
Mobile eye-tracking provides the fairly unique opportunity to record and elucidate cognition in action. In our research, we are searching for patterns in, and distinctions between, the visual-search performance of experts and novices in the geo-sciences. Traveling to regions resultant from various geological processes as part of an introductory fie...
Experimental evidence suggests that the content of a memory for even a simple display encoded in visual short-term memory (VSTM) can be very complex. VSTM uses organizational processes that make the representation of an item dependent on the feature values of all displayed items as well as on these items' representations. Here, we develop a probabi...
Natural outdoor conditions pose unique obstacles for researchers, above and beyond those inherent to all mobile eye-tracking research. During analyses of a large set of eye-tracking data collected on geologists examining outdoor scenes, we have found that the nature of calibration, pupil identification, fixation detection, and gaze analysis all req...
Melioration, defined as choosing a lesser, local gain over a greater longer-term gain, is a behavioral tendency that people and pigeons share. As such, the empirical occurrence of meliorating behavior has frequently been interpreted as evidence that the mechanisms of human choice violate the norms of economic rationality. In some environments, the re...
Given a set of images, or time-lapsed imagery, that is captured in an unconstrained domain, there are numerous methods to map that data into a domain that is readily displayable on basic rectilinear digital displays. However, while these mappings can be mathematically sound, they are methods that modify the spatio-temporal “scale” of the scene, and...
We study people's abilities to transfer object category knowledge across visual and haptic domains. If a person learns to categorize objects based on inputs from one sensory modality, can the person categorize these same objects when the objects are perceived through another modality? Can the person categorize novel objects from the same categories...
Limits in visual working memory (VWM) strongly constrain human performance across many tasks. However, the nature of these limits is not well understood. In this article we develop an ideal observer analysis of human VWM by deriving the expected behavior of an optimally performing but limited-capacity memory system. This analysis is framed around r...
Experience in the field is a fundamental aspect of geologic training, and its effectiveness is largely unchallenged because of anecdotal evidence of its success among expert geologists. However, there have been only a few quantitative studies based on large data collection efforts to investigate how Earth Scientists learn in the field. In a recent...
This chapter describes two research projects that evaluated whether people's judgments are predicted by those of the standard ideal observer in more complex situations. The first project, conducted by Michel and Jacobs (2008), examined how people learn to combine information from arbitrary visual features when performing a set of perceptual discrim...
This chapter examines how visual development might interact with visual learning to explain patterns of motion detection and stereoscopic vision that we see unfolding in the developing infant. A model examines more specifically the 'less-is-more' hypothesis in the domain of visual development. This is the idea that initially limiting the amount of...
How do people learn multisensory, or amodal, representations, and what consequences do these representations have for perceptual performance? We address this question by performing a rational analysis of the problem of learning multisensory representations. This analysis makes use of a Bayesian nonparametric model that acquires latent multisensory...
Visual short-term memory (VSTM) is a central component of many human activities, but remains a poorly understood process. While previous theories have posited mechanisms intended to account for observed phenomena, in the present research we develop an ideal observer framework to uncover the expected behavior of an optimally performing memory system...
We report the results of an experiment in which human subjects were trained to perform a perceptual matching task. Subjects were asked to manipulate comparison objects until they matched target objects using the fewest manipulations possible. An unusual feature of the experimental task is that efficient performance requires an understanding of the...
Human behavior in natural tasks consists of an intricately coordinated dance of cognitive, perceptual, and motor activities. Although much research has progressed in understanding the nature of cognitive, perceptual, or motor processing in isolation or in highly constrained settings, few studies have sought to examine how these systems are coordina...
Probabilistic models based on Bayes' rule are an increasingly popular approach to understanding human cognition. Bayesian models allow immense representational latitude and complexity. Because they use normative Bayesian mathematics to process those representations, they define optimal performance on a given task. This article focuses on key mechan...
We are using an Active Vision approach to learn how novices and expert geologists acquire visual information in the field. The Active Vision approach emphasizes that visual perception is an active process wherein new information is acquired about a particular environment through exploratory eye movements. Eye movements are not only influenced by ph...
Recent physiological studies show that while neurons in higher visual areas such as IT are highly experience dependent, neurons earlier in processing show much smaller changes in tuning with practice. We report here two studies examining how these differences in neural plasticity at different stages in the visual system correspond to improvements i...
When people are exposed repeatedly to a conflict in visually and haptically specified shapes, they adapt and the apparent conflict is eventually eliminated. The inter-modal adaptation literature suggests that the conflict is resolved by adapting the haptic shape estimator. Another possibility is that both estimators adapt by amounts that depend on...
Existing studies of sensory integration demonstrate how the reliabilities of perceptual cues or features influence perceptual decisions. However, these studies tell us little about the influence of feature reliability on visual learning. In this article, we study the implications of feature reliability for perceptual learning in the context of bina...
Wallach (1985) hypothesized that people will acquire a new cue to a perceptual judgment when a novel stimulus is correlated with a known cue. We tested this hypothesis in four experiments in which we introduced novel and systematic correlations between different pairs of sensory signals. The first three experiments each paired the visual motion dir...
We study the claim that multisensory environments are useful for visual learning because nonvisual percepts can be processed to produce error signals that people can use to adapt their visual systems. This hypothesis is motivated by a Bayesian network framework. The framework is useful because it ties together three observations that have appeared...
New technologies and new ways of thinking have recently led to rapid expansions in the study of perceptual learning. We describe three themes shared by many of the nine articles included in this topic on Integrated Approaches to Perceptual Learning. First, perceptual learning cannot be studied on its own because it is closely linked to other aspect...
In this paper we investigate the manner in which the human language comprehension system adapts to shifts in probability distributions over syntactic structures, given experimentally controlled experience with those structures. We replicate a classic reading experiment, and present a model of the behavioral data that implements a form of Bayesian b...
The existence of cue-invariant neural mechanisms for representing visual depth and shape has been hypothesized, though the data favoring this hypothesis is currently sparse. We have found that performance improvements by human observers on a visual slant discrimination task transferred from training conditions in which planar surfaces were defined...
Jochen Triesch, Dana Ballard, [...], Robert
Work done in the Computer Science Department, Robotics & Vision area, and published as part of the University of Rochester National Resource Laboratory for the Study of Brain and Behavior series. We study the dynamics of visual cue integration in a tracking / identification task, where subjects track a target object among distractors and identify t...
When performing a perceptual task, precision pooling occurs when an organism's decisions are based on the activities of a small set of highly informative neurons. The Adaptive Precision Pooling Hypothesis links perceptual learning and decision making by stating that improvements in performance occur when an organism starts to base its decisions on...
The computational complexities arising in motor control can be ameliorated through the use of a library of motor synergies. We present a new model, referred to as the Greedy Additive Regression (GAR) model, for learning a library of torque sequences, and for learning the coefficients of a linear combination of sequences minimizing a cost function....
Listeners are exquisitely sensitive to fine-grained acoustic detail within phonetic categories for sounds and words. Here we show that this sensitivity is optimal given the probabilistic nature of speech cues. We manipulated the probability distribution of one probabilistic cue, voice onset time (VOT), which differentiates word initial labial stops...
A number of studies have demonstrated that people often integrate information from multiple perceptual cues in a statistically optimal manner when judging properties of surfaces in a scene. For example, subjects typically weight the information based on each cue to a degree that is inversely proportional to the variance of the distribution of a sce...
In a search problem, an agent uses the membership oracle of a target concept to find a positive example of the concept. In a shaped search problem the agent is aided by a sequence of increasingly restrictive concepts leading to the target concept (analogous to behavioral shaping). The concepts are given by membership oracles, and the agent has to...
Visual scientists have shown that people are capable of perceptual learning in a large variety of circumstances. Are there constraints on such learning? We propose a new constraint on early perceptual learning, namely, that people are capable of parameter learning-they can modify their knowledge of the prior probabilities of scene variables or of t...
We examined learning at multiple levels of the visual system. Subjects were trained and tested on a same/different slant judgment task or a same/different curvature judgment task using simulated planar surfaces or curved surfaces defined by either stereo or monocular (texture and motion) cues. Taken as a whole, the results of four experiments are c...
A person learning to control a complex system needs to learn about both the dynamics and the noise of the system. We evaluated human subjects' abilities to learn to control a stochastic dynamic system under different noise conditions. These conditions were created by corrupting the forces applied to the system with noise whose magnitudes were eithe...
We consider the properties of motor components, also known as synergies, arising from a computational theory (in the sense of Marr, 1982) of optimal motor behavior. An actor's goals were formalized as cost functions, and the optimal control signals minimizing the cost functions were calculated. Optimal synergies were derived from these optimal cont...
Investigators debate the extent to which neural populations use pair-wise and higher-order statistical dependencies among neural responses to represent information about a visual stimulus. To study this issue, three statistical decoders were used to extract the information in the responses of model neurons about the binocular disparities present in...
Investigators debate the extent to which neural populations use pairwise and higher-order statistical dependencies among neural responses to represent information about a visual stimulus. To study this issue, three statistical decoders were used to extract the information in the responses of model neurons about the binocular disparities present in...
A novel modular connectionist architecture is presented in which the networks composing the architecture compete to learn the training patterns. An outcome of the competition is that different networks learn different training patterns and, thus, learn to compute different functions. The architecture performs task decomposition in the sense that it...
Contrast adaptation that was limited to a small region of the peripheral retina was induced as observers viewed a multiple depth-plane textured surface. The small region undergoing contrast adaptation was present only in one depth-plane to determine whether contrast gain-control is depth-dependent. After adaptation, observers performed a contrast-m...
Variations in blur are present in retinal images of scenes containing objects at multiple depth planes. Here we examine whether neural representations of image blur can be recalibrated as a function of depth. Participants were exposed to textured images whose blur changed with depth in a novel manner. For one group of participants, image blur incre...
We studied the hypothesis that observers can recalibrate their visual percepts when visual and haptic (touch) cues are discordant and the haptic information is judged to be reliable. Using a novel visuo-haptic virtual reality environment, we conducted a set of experiments in which subjects interacted with scenes consisting of two fronto-parallel su...
Bernstein (1967) suggested that people attempting to learn to perform a difficult motor task try to ameliorate the degrees-of-freedom problem through the use of a developmental progression. Early in training, people maintain a subset of their control parameters (e.g., joint positions) at constant settings and attempt to learn to perform the task by...
Human observers localize events in the world by using sensory signals from multiple modalities. We evaluated two theories of spatial localization that predict how visual and auditory information are weighted when these signals specify different locations in space. According to one theory (visual capture), the signal that is typically most reliable...
We consider the hypothesis that systems learning aspects of visual perception may benefit from the use of suitably designed developmental progressions during training. Four models were trained to estimate motion velocities in sequences of visual images. Three of the models were developmental models in the sense that the nature of their visual input...
This article considers the hypothesis that systems learning aspects of visual perception may benefit from the use of suitably designed developmental progressions during training. We report the results of simulations in which four models were trained to detect binocular disparities in pairs of visual images. Three of the models were developmental mo...
We consider the hypothesis that systems learning aspects of visual perception may benefit from the use of suitably designed developmental progressions during training. Four models were trained to estimate motion velocities in sequences of visual images. Three of the models were "developmental models" in the sense that the nature of their input chan...
Previous researchers developed new learning architectures for sequential data by extending conventional hidden Markov models through the use of distributed state representations. Although exact inference and parameter estimation in these architectures is computationally intractable, Ghahramani and Jordan (1997) showed that approximate inference and...
Visual environments contain many cues to properties of an observed scene. To integrate information provided by multiple cues in an efficient manner, observers must assess the degree to which each cue provides reliable versus unreliable information. Two hypotheses are reviewed regarding how observers estimate cue reliabilities, namely that the estim...
This paper investigates the use of developmental progressions in the acquisition of binocular disparity sensitivity. In an earlier paper we presented results of simulations comparing a non-developmental neural network model and two developmental neural network models trained to detect binocular disparities and concluded that the developmental model...
Relative to adults, human infants are born with limited perceptual, motor, linguistic, and cognitive abilities. There are at least two perspectives within the field of developmental psychology regarding these limitations. The older and more popular view is that these limitations are barriers which must be overcome in order for a child to...
We compared perceptual learning in 16 psychophysical studies, ranging from low-level spatial frequency and orientation discrimination tasks to high-level object and face-recognition tasks. All studies examined learning over at least four sessions and were carried out foveally or using free fixation. Comparison of learning effects across this wide r...
The integration of information from different sensors, cues, or modalities lies at the very heart of perception. We are studying adaptive phenomena in visual cue integration. To this end, we have designed a visual tracking task, where subjects track a target object among distractors and try to identify the target after an occlusion. Objects are def...
This paper considers the hypothesis that systems learning aspects of visual perception may benefit from the use of suitably designed developmental progressions during training. We report the results of simulations in which three different artificial neural network models were trained to detect binocular disparities in pairs of visual images. Two of...
Our goal was to examine the plasticity of the human visual system at mid to high levels of visual processing. It is well understood that early stages of visual processing contain cells tuned for spatial frequency and orientation. However, images of real-world objects contain a wide range of spatial frequencies and orientations. We were interested in...
We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-M...
We study the hypothesis that observers can use haptic percepts as a standard against which the relative reliabilities of visual cues can be judged, and that these reliabilities determine how observers combine depth information provided by these cues. Using a novel visuo-haptic virtual reality environment, subjects viewed and grasped virtual objects...
Improvements due to perceptual training are often specific to the trained task and do not generalize to similar perceptual tasks. Surprisingly, given this history of highly constrained, context-specific perceptual learning, we found that training on a perceptual task showed significant transfer to a motor task. This result provides evidence for a c...
Our goal was to differentiate low and mid level perceptual learning. We used a complex grating discrimination task that required observers to combine information across wide ranges of spatial frequency and orientation. Stimuli were 'wicker'-like textures containing two orthogonal signal components of 3 and 9 c/deg. Observers discriminated a 15% spa...
Previous investigators have shown that observers' visual cue combination strategies are remarkably flexible in the sense that these strategies adapt on the basis of the estimated reliabilities of the visual cues. However, these researchers have not addressed how observers acquire these estimated reliabilities. This article studies observers' abili...
We report the results of a depth-matching experiment in which subjects were asked to adjust the height of an ellipse until it matched the depth of a simulated cylinder defined by texture and motion cues. In one-third of the trials the shape of the cylinder was primarily given by motion information, in another one-third of the trials it was given by...
Three models of visual cue combination were simulated: a weak fusion model, a modified weak model, and a strong model. Their relative strengths and weaknesses are evaluated on the basis of their performances on the tasks of judging the depth and shape of an ellipse. The models differ in the amount of interaction that they permit among the cues of s...
Three hypotheses about the activity-dependent development of functionally specialized neural modules are discussed in this review. These hypotheses state that: (1) a combination of structure function correspondences plus the use of competition between neural modules leads to functional specializations; (2) parcellation is due to a combination of ne...
The roles assigned to nature and nurture in the acquisition of functional specializations have been modified in recent years due to increasing evidence that experience-dependent processes are more influential in determining a brain region's functional properties than was previously supposed. Consequently, one may study the developmental principles...
There does not exist a statistical model that shows good performance on all tasks. Consequently, the model selection problem is unavoidable; investigators must decide which model is best at summarizing the data for each task of interest. This article presents an approach to the model selection problem in hierarchical mixtures-of-experts architectur...
This article investigates the bias and variance of mixtures-of-experts (ME) architectures. The variance of an ME architecture can be expressed as the sum of two terms: the first term is related to the variances of the expert networks that comprise the architecture and the second term is related to the expert networks' covariances. One goal of this...
This paper studies the problems of inference and prediction in a class of models known as hierarchical mixtures-of-experts (HME). The statistical model underlying an HME is a mixture model in which both the mixture coefficients and the mixture components are generalized linear models. Bayesian inference regarding an HME's parameters is presented in...
Machine classification of acoustic waveforms as speech events is often difficult due to context-dependencies. A vowel recognition task with multiple speakers is studied in this paper via the use of a class of modular and hierarchical systems referred to as mixtures-of-experts and hierarchical mixtures-of-experts models. The statistical model underl...
This article reviews statistical techniques for combining multiple probability distributions. The framework is that of a decision maker who consults several experts regarding some events. The experts express their opinions in the form of probability distributions. The decision maker must aggregate the experts' distributions into a single distributi...
Computational models in psychology play an increasingly important role in characterizing theoretical distinctions, understanding empirical results, and formulating new predictions. However, the proper use of models is subject to debate and interpretation, as Cook, Früh, and Landis (1995) have demonstrated in a critique of neural network simulations...
In this article we discuss the problem of learning in modular and hierarchical systems. Modular and hierarchical systems allow complex learning problems to be solved by dividing the problem into a set of sub-problems, each of which may be simpler to solve than the original problem. Within the context of supervised learning---our focus in this article-...
An effective functional architecture facilitates interactions among subsystems that are often used together. Computer simulations showed that differences in receptive field sizes can promote such organization. When input was filtered through relatively small nonoverlapping receptive fields, artificial neural networks learned to categorize shapes r...
Task Decomposition Through Competition in a Modular Connectionist Architecture. September 1990. Robert A. Jacobs, B.A., University of Pennsylvania; M.S., University of Massachusetts; Ph.D., University of Massachusetts. Directed by: Professor Andrew G. Barto. A novel modular connectionist architecture is presented in which the networks composing the archi...
We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation -...
We present a novel statistical model for supervised learning. The model is based on the principle of divide-and-conquer, and is similar in spirit to models such as CART, ID3 and MARS. We formulate the problem of learning the parameters of the model as a maximum likelihood estimation problem and develop an Expectation-Maximization (EM) algorithm for...
We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIMs). Learning is treated as a maximum likelihood problem; in particular, we present an expectation-ma...