
George A Alvarez, Harvard University
Publications: 134
Reads: 32,148
Citations: 12,572
Publications (134)
Abstract
Recent progress in multimodal AI and ‘language-aligned’ visual representation learning has rekindled debates about the role of language in shaping the human visual system. In particular, the emergent ability of ‘language-aligned’ vision models (e.g. CLIP) – and even pure language models (e.g. BERT) – to predict image-evoked brain activity...
The rapid release of high-performing computer vision models offers new potential to study the impact of different inductive biases on the emergent brain alignment of learned representations. Here, we perform controlled comparisons among a curated set of 224 diverse models to test the impact of specific model properties on visual brain predictivity...
Modular and distributed coding theories of category selectivity along the human ventral visual stream have long existed in tension. Here, we present a reconciling framework—contrastive coding—based on a series of analyses relating category selectivity within biological and artificial neural networks. We discover that, in models trained with contras...
Modular and distributed theories of category selectivity along the ventral visual stream have long existed in tension. Here, we present a reconciling framework, based on a series of analyses relating category-selective tuning within biological and artificial neural networks. We discover that, in models trained with contrastive self-supervised objec...
How visual do our models of visual cortex actually need to be? In this work, we characterize the current state of the art in using language-only or language-aligned models to predict human visual cortical responses to natural images, using fMRI data from the Natural Scenes Dataset. We assess 4 kinds of models in their ability to predict visual acti...
Anterior regions of the ventral visual stream encode substantial information about object categories. Are top-down category-level forces critical for arriving at this representation, or can this representation be formed purely through domain-general learning of natural image structure? Here we present a fully self-supervised model which learns to r...
What we know as the self is not just one unified construct, but consists of various self-concepts that are continuously created, revised, and discarded, such as “woman”, “Thai national”, “Northwestern student”, and “true self”. These rich, variegated self-concepts help organize our endeavors throughout the different domains of our lives. How do we...
Attentional tracking and working memory tasks are often performed better when targets are divided evenly between the left and right visual hemifields, rather than contained within a single hemifield (Alvarez & Cavanagh, 2005; Delvenne, 2005). However, this bilateral field advantage does not provide conclusive evidence of hemifield-specific control...
Feature-based attention is known to enhance visual processing globally across the visual field, even at task-irrelevant locations. Here, we asked whether attention to object categories, in particular faces, shows similar location-independent tuning. Using EEG, we measured the face-selective N170 component of the EEG signal to examine neural respons...
How people process images is known to affect memory for those images, but these effects have typically been studied using explicit task instructions to vary encoding. Here, we investigate the effects of intrinsic variation in processing on subsequent memory, testing whether recognizing an ambiguous stimulus as meaningful (as a face vs. as shape blo...
While substantial work has focused on how the visual system achieves basic-level recognition, less work has asked about how it supports large-scale distinctions between objects, such as animacy and real-world size. Previous work has shown that these dimensions are reflected in our neural object representations (Konkle & Caramazza, 2013), and that o...
Visual illusions cut across academic divides and popular interests: on the one hand, illusions provide entertainment as curious tricks of the eye; on the other hand, scientific research related to illusory phenomena has given generations of scientists and artists deep insights into the brain and principles of mind and consciousness. Numerous thinke...
Cognitive training has become a billion-dollar industry with the promise that exercising a cognitive faculty (e.g., attention) on simple "brain games" will lead to improvements on any task relying on the same faculty. Although this logic seems sound, it assumes performance improves on training tasks because attention's capacity has been enhanced....
Traditionally, recognizing the objects within a scene has been treated as a prerequisite to recognizing the scene itself. However, research now suggests that the ability to rapidly recognize visual scenes could be supported by global properties of the scene itself rather than the objects within the scene. Here, we argue for a particular instantiati...
New & noteworthy:
Here, we ask which neural regions have neural response patterns that correlate with behavioral performance in a visual processing task. We found that the representational structure across all of high-level visual cortex has the requisite structure to predict behavior. Furthermore, when directly comparing different neural regions,...
Confidence in our memories is influenced by many factors, including beliefs about the perceptibility or memorability of certain kinds of objects and events, as well as knowledge about our skill sets, habits, and experiences. Notoriously, our knowledge and beliefs about memory can lead us astray, causing us to be overly confident in eyewitness testi...
Studies have shown that working memory capability is limited, even for simple visual features. However, typical studies may underestimate the amount and richness of information in memory by relying on paradigms where participants only make a single recall response. To examine this possibility, we had participants memorize five briefly presented col...
Significance
Visual working memory is the cognitive system that holds visual information in an active state, making it available for cognitive processing and protecting it against interference. Here, we demonstrate that visual working memory has a greater capacity than previously measured. In particular, we use EEG to show that, contrary to existin...
Is working memory capacity determined by an immutable limit, for example, 4 memory storage slots? The fact that performance is typically unaffected by task instructions has been taken as support for such structural models of memory. Here, we modified a standard working memory task to incentivize participants to remember more items. Participants were...
Understanding how perceptual and conceptual representations are connected is a fundamental goal of cognitive science. Here, we focus on a broad conceptual distinction that constrains how we interact with objects: real-world size. Although there appear to be clear perceptual correlates for basic-level categories (apples look like other apples, orange...
Influential slot and resource models of visual working memory make the assumption that items are stored in memory as independent units, and that there are no interactions between them. Consequently, these models predict that the number of items to be remembered (the set size) is the primary determinant of working memory performance, and therefore t...
Visual perception and awareness have strict limitations. We suggest that one source of these limitations is the representational architecture of the visual system. Under this view, the extent to which items activate the same neural channels constrains the amount of information that can be processed by the visual system and ultimately reach awarenes...
Human cognition has a limited capacity that is often attributed to the brain having finite cognitive resources, but the nature of these resources is usually not specified. Here, we show evidence that perceptual interference between items can be predicted by known receptive field properties of the visual cortex, suggesting that competition within re...
Ensemble perception, including the ability to "see the average" from a group of items, operates in numerous feature domains (size, orientation, speed, facial expression, etc.). Although the ubiquity of ensemble representations is well established, the large-scale cognitive architecture of this process remains poorly defined. We address this using a...
A central question for models of visual working memory is whether the number of objects people can remember depends on object complexity. Some influential "slot" models of working memory capacity suggest that people always represent 3-4 objects and that only the fidelity with which these objects are represented is affected by object complexity. The...
It is much easier to divide attention across the left and right visual hemifields than within the same visual hemifield. Here we investigate whether this benefit of dividing attention across separate visual fields is evident at early cortical processing stages. We measured the steady-state visual evoked potential, an oscillatory response of the vis...
The severely limited capacity of visual working memory is thought to result from a fixed storage capacity, rather than limitations at encoding or retrieval. Thus, most investigations of working memory have focused on understanding the storage system: its capacity, its flexibility, and the units over which it operates. Little work has investigated how...
The ability to extract a summary representation for a group of related objects (ensemble perception) operates across a host of visual domains: People can readily perceive the average emotion of crowds of faces, the average size of dots, and the average orientation of Gabors. Do these ensemble representations rely on a common underlying mechanism, o...
Visual memory enables a viewer to hold in mind details of objects, textures, faces, and scenes. After initial exposure to an image, however, memory rapidly degrades. To gain insight into this process, and to better understand memory maintenance, we examined whether degradation depends on the amount of information being maintained. We collected high...
Many influential models of scene perception treat objects as the basic unit of scene recognition. However, there is evidence that global properties of an image can drive scene perception before any objects can be identified (e.g., Greene & Oliva, 2009), and computational models suggest this ability could be explained by sensitivity to global patter...
It is known that focusing attention on a particular feature (e.g., the color red) facilitates the processing of all objects in the visual field containing that feature [1-7]. Here, we show that such feature-based attention not only facilitates processing but also actively inhibits processing of similar, but not identical, features globally across t...
Our ability to actively maintain information in visual memory is strikingly limited. There is considerable debate about why this is so. As with many questions in psychology, the debate is framed dichotomously: Is visual working memory limited because it is supported by only a small handful of discrete “slots” into which visual representations are p...
Significance
Human cognition is inherently limited: only a finite amount of visual information can be processed at a given instant. What determines those limits? Here, we show that more objects can be processed when they are from different stimulus categories than when they are from the same category. This behavioral benefit maps directly onto the...
A balance of mutual tonic inhibition between bi-hemispheric posterior parietal cortices is believed to play an important role in bilateral visual attention. However, experimental support for this notion has been mainly drawn from clinical models of unilateral damage. We have previously shown that low-frequency repetitive TMS (rTMS) over the intrapa...
The muscles that control the pupil are richly innervated by the autonomic nervous system. While there are central pathways that drive pupil dilations in relation to arousal, there is no anatomical evidence that cortical centers involved with visual selective attention innervate the pupil. In this study, we show that such connections must exist. Spe...
The MemToolbox is a collection of MATLAB functions for modeling visual working memory. In support of its goal to provide a full suite of data analysis tools, the toolbox includes implementations of popular models of visual working memory, real and simulated data sets, Bayesian and maximum likelihood estimation procedures for fitting models to data,...
People can only store a limited amount of information in visual working memory. What is the nature of this limit, and does its expression depend on task demands or is it fixed? Here we show evidence that participants have control over both the quantity and fidelity of items stored in working memory. Participants were briefly shown five colored circ...
Real-world size is an intrinsic property of objects: it is accessed automatically when we see an object and predicts a consistent medial-to-lateral organization in ventral temporal cortex (Konkle & Oliva, 2012). Is real-world size a purely abstract concept or a conceptual distinction reflected in perceptual differences? Since visual search performa...
Fluid intelligence is important for successful functioning in the modern world, but much evidence suggests that fluid intelligence is largely immutable after childhood. Recently, however, researchers have reported gains in fluid intelligence after multiple sessions of adaptive working memory training in adults. The current study attempted to replic...
Individual Subject Training Gains. Beginning and ending dual n-back loads/Multiple Object Tracking (MOT) speeds are presented for each participant. Beginning points represent the average performance across the first three days of training, while ending points display the average performance across the final three days of training.
Relationships Between Training Gains and Transfer Measures. A) Correlation between improvement on the dual n-back task during training and the difference between pre- and post-training Ravens Advanced Progressive Matrices (RAPM) scores. B) Correlation between dual n-back improvement and change on the Composite Span Task scores. C) Correlation betwe...
Visual long-term memory can store thousands of objects with surprising visual detail, but just how detailed are these representations, and how can one quantify this fidelity? Using the property of color as a case study, we estimated the precision of visual information in long-term memory, and compared this with the precision of the same information...
The brain has finite processing resources so that, as tasks become harder, performance degrades. Where do the limits on these resources come from? We focus on a variety of capacity-limited buffers related to attention, recognition, and memory that we claim have a two-dimensional 'map' architecture, where individual items compete for cortical real e...
Working memory is a mental storage system that keeps task-relevant information accessible for a brief span of time, and it is strikingly limited. Its limits differ substantially across people but are assumed to be fixed for a given person. Here we show that there is substantial variability in the quality of working memory representations within an...
Influential theories of visual working memory have proposed that the basic units of memory are integrated object representations. Key support for this proposal is provided by the same object benefit: It is easier to remember multiple features of a single object than the same set of features distributed across multiple objects. Here, we replicate th...
Are real-world objects represented as bound units? Although a great deal of research has examined binding between the feature dimensions of simple shapes, little work has examined whether the featural properties of real-world objects are stored in a single unitary object representation. In a first experiment, we found that information about an obje...
Observers can learn statistical regularities and use them to hold more content in working memory (Brady, Konkle & Alvarez, 2009). Here we investigated whether this memory enhancement fundamentally alters the structure of perceptual representations as measured by a perceptual grouping task.
Memory-Training Task: On each trial, 8 shapes were displaye...
Faces, scenes, objects, and bodies evoke distinct but overlapping neural activation patterns in the ventral stream when presented in isolation. If these stimulus categories are presented simultaneously in the visual field, how do they compete for perceptual resources? Is the degree of competition between stimulus categories predicted by the similar...
Working memory, the ability to retain task-relevant information in an accessible state over a brief span of time, is strikingly limited. Models of working memory explain these limitations by postulating a finite resource that is divided among stored items in a continuous or quantized manner, and they assume that the quality of memory representation...
Previous studies have shown independent attentional selection of targets in the left and right visual hemifields during attentional tracking (Alvarez & Cavanagh, 2005) but not during a visual search (Luck, Hillyard, Mangun, & Gazzaniga, 1989). Here we tested whether multifocal spatial attention is the critical process that operates independently in...
The world is composed of features and objects and this structure may influence what is stored in working memory. It is widely believed that the content of memory is object-based: Memory stores integrated objects, not independent features. We asked participants to report the color and orientation of an object and found that memory errors were largel...
The limits of visual working memory have been well established for simple colored shapes, where it is typically assumed that all stimuli compete equally for this limited memory resource. Here we examined whether different stimulus categories (faces, bodies, objects, scenes) draw equally on working memory resources or if the kind of stimuli to be re...
Observers automatically learn statistical regularities in their environment, and use these regularities to form more efficient working memory representations (Brady, Konkle, & Alvarez, 2009). For instance, when colors are more likely to appear in certain pairs (e.g., red with blue), observers learn these regularities and over the course of learning...
How does the structure of the environment shape what we store in working memory? Information in the world is bound into meaningful units (objects), and it is widely believed that the contents of visual working memory are bound object representations. This account suggests that, for sample displays containing more information than can be stored, we h...
Is visual attention required for visual consciousness? In the past decade, many researchers have claimed that awareness can arise in the absence of attention. This claim is largely based on the notion that natural scene (or "gist") perception occurs without attention. This article presents evidence against this idea. We show that when observers per...
When a set of objects changing in brightness, color, size, or shape moves across the visual field, the objects appear to stop changing [Suchow & Alvarez, 2011]. In previous work introducing this effect ("silencing"), we showed that its strength depends on speed: the faster the objects move, the less noticeable the change. (See the demos at http://b...
How efficient is visual search in real scenes? In searches for targets among arrays of randomly placed distractors, efficiency is often indexed by the slope of the reaction time (RT) × Set Size function. However, it may be impossible to define set size for real scenes. As an approximation, we hand-labeled 100 indoor scenes and used the number of la...
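The efficiency index named in the abstract above, the slope of the RT × Set Size function, can be sketched as an ordinary least-squares line fit. This is only an illustration under stated assumptions, not the authors' analysis: the function name `search_slope` and all data values are hypothetical.

```python
def search_slope(set_sizes, rts_ms):
    """Fit RT = slope * set_size + intercept by ordinary least squares.

    Returns (slope, intercept); slope is in ms per item.
    """
    n = len(set_sizes)
    if n != len(rts_ms) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(set_sizes) / n
    mean_y = sum(rts_ms) / n
    # Sum of squared deviations of set size, and the cross-product term.
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    if sxx == 0:
        raise ValueError("set sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, rts_ms))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical, made-up data: mean RT (ms) at four set sizes.
set_sizes = [4, 8, 12, 16]
rts_ms = [520.0, 600.0, 680.0, 760.0]
slope, intercept = search_slope(set_sizes, rts_ms)
print(f"slope = {slope:.1f} ms/item, intercept = {intercept:.1f} ms")
```

A shallow slope indicates efficient search, and a steep slope indicates that each added item costs more time. For real scenes, the abstract notes that set size itself may be hard to define, so the x-axis values here would have to come from some proxy measure.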
Traditional memory research has focused on identifying separate memory systems and exploring different stages of memory processing. This approach has been valuable for establishing a taxonomy of memory systems and characterizing their function but has been less informative about the nature of stored memory representations. Recent research on visual...
The visual system can only accurately represent a handful of objects at once. How do we cope with this severe capacity limitation? One possibility is to use selective attention to process only the most relevant incoming information. A complementary strategy is to represent sets of objects as a group or ensemble (e.g. represent the average size of i...
Influential models of visual working memory treat each item to be stored as an independent unit and assume that there are no interactions between items. However, real-world displays have structure that provides higher-order constraints on the items to be remembered. Even in the case of a display of simple colored circles, observers can compute stat...
Loud bangs, bright flashes, and intense shocks capture attention, but other changes, even those of similar magnitude, can go unnoticed. Demonstrations of change blindness have shown that observers fail to detect substantial alterations to a scene when distracted by an irrelevant flash, or when the alterations happen gradually [1-5]. Here, we show t...
Purpose: Research has shown that observers can track 4 or 5 out of 10 identical moving items. Pylyshyn proposed a spatial indexing theory in which a limited set of 4 or 5 spatial indices can be assigned to objects in the visual field. Yantis proposed that the tracked objects are grouped together and attended to as a single deforming object. These a...
Both multiple-object visual tracking (MVT) and inefficient visual search are held to demand visual attention. However, we have previously shown (ARVO 2000) that both tasks can be performed concurrently within a single trial with minimal performance loss on either task. How is this possible? Here we test the hypothesis that observers switch back and...
Observers can store thousands of object images in visual long-term memory with high fidelity, but the fidelity of scene representations in long-term memory is not known. Here, we probed scene-representation fidelity by varying the number of studied exemplars in different scene categories and testing memory using exemplar-level foils. Observers view...
The number of moving objects that can be tracked with attention is often reported to be 4, suggesting that there is a “structural limit” on tracking. We show that the tracking limit is not fixed, but depends systematically on the speed of the objects, such that at slow speeds observers can track 8 targets as well as a single target moving at a fast...
We intuitively believe that a sudden movement or change outside the focus of attention will attract our attention. Indeed, numerous studies of stimulus-driven attention show that sudden changes to a display capture attention, even when subjects know that they are irrelevant to the primary task. This mechanism may exist to automatically allocate vis...
Influential models of visual working memory treat each item as an independent unit and assume there are no interactions between items. However, even in displays with simple colored circles there are higher-order ensemble statistics that observers can compute quickly and accurately (e.g., Ariely, 2001). An optimal encoding strategy would take these...
Recently we demonstrated that visual long-term memory (LTM) can store thousands of objects with remarkable fidelity, but it remains unclear how the fidelity of LTM compares to the fidelity of short-term memory (STM) or online visual perception. We used color as a case study to quantify the fidelity of LTM, STM, and perception using pictures of real...
How do we estimate the number of objects in a set? One primary question is whether our estimates are based on an unbroken visual image or a segmented collection of discrete objects. We manipulated whether individual objects were isolated from each other, or grouped into pairs by irrelevant lines. If number estimation operates over an unbroken image...
Humans have a massive capacity to store detailed information in visual long-term memory. The present studies explored the fidelity of these visual long-term memory representations and examined how conceptual and perceptual features of object categories support this capacity. Observers viewed 2,800 object images with a different number of exemplars...
Purpose: The capacity limitation of human attention is best exemplified in attentive tracking of moving objects: our tracking ability declines when more objects are tracked, or when each object moves at a faster speed. Previous behavioral studies showed a trade-off between the number of objects tracked and the speed of each object, suggesting that...
In visual search tasks, the time required to find targets (reaction time, RT) is a function of the number of items in the display (set size). Targets can be found efficiently if they can be uniquely defined by the presence of one of a limited set of features. Thus, for example, in search for red targets among blue distractors, the slope of the RT x s...
Although people can remember a massive number of pictures (e.g., 10,000 in Standing, 1973), the fidelity with which human memory can represent such a large number of items has not been tested. Most researchers in visual cognition have assumed that in such studies, only the gist of images was remembered and the details were forgotten. We conducted tw...