Vision of objects happens faster and earlier for location than for identity

Highlights
• We compared the visual processing of spatial location and identity (object category)
• Visual processing started earlier for location than for identity
• Visual processing speed was higher for location than for identity
• This suggests an intrinsic preference of the visual system for processing space
Authors
Christian H. Poth, Werner X. Schneider
Correspondence
c.poth@uni-bielefeld.de
In brief
Sensory neuroscience; Cognitive
neuroscience
Poth & Schneider, 2025, iScience 28, 111702
February 21, 2025 © 2024 The Authors. Published by Elsevier Inc.
https://doi.org/10.1016/j.isci.2024.111702
Christian H. Poth (1,2,*) and Werner X. Schneider (1)
1 Neuro-Cognitive Psychology, Department of Psychology, Bielefeld University, Bielefeld, Germany
2 Lead contact
* Correspondence: c.poth@uni-bielefeld.de
SUMMARY
Visual perception of objects requires the integration of separate, independent stimulus features, such as object identity and location. We ask whether the location and the identity of an object are processed with different efficiency for being consciously recognized and reported. Participants viewed a target letter at one out of several locations; its presentation was terminated by pattern masks at all possible locations. Participants reported the location of the target and/or its letter identity. Report performance as a function of the target duration before the mask enabled us to estimate the speed of visual processing and the minimum duration for processing to start. Visual processing was faster and started earlier for spatial location than for object identity, even though the processing of the features was (stochastically) independent. Together, these findings reveal an intrinsic preference of the human visual system for the perceptual processing of space as opposed to visual features such as categorical identity.
INTRODUCTION
Human behavior is largely guided by vision. Humans visually sample the environment and visually acquire information about objects that are relevant for current needs and behavioral goals. The visual system in the human brain encodes the different features of an object, such as form and color, in separate, specialized sub-systems.[1] However, perceiving a coherent visual world, and guiding behavior accordingly, requires that the separate features of an object are integrated into one coherent representation.[2–5] Perception is assumed to happen once the features become represented as object files[6] or in visual working memory,[7,8] a capacity-limited system for retaining (and cognitively operating on) information even after it has disappeared from the environment.[9] Up until this point, the different features of all objects within an eye fixation are assumed to be processed independently,[6,7,10] in line with the distributed neural centers specializing in the processing of different features.[1,11,12] A capacity limit in terms of object processing is nevertheless assumed by influential current theories of visual object processing,[8,13,14] which implies that objects compete for processing. In part, this competition is decided by attentional prioritization: processing of an object (or feature) can be enhanced based on the physical salience and/or the current task-relevance of the features of an object.[3,8,15] Setting top-down task-relevance aside, it is still unclear, however, whether visual perception has an intrinsic bottom-up preference for processing certain features rather than others.
Some evidence suggests that visual features differ in a bottom-up fashion in terms of how they are processed in the visual system. In whole report paradigms with backward masking, visual features marking object boundaries (such as shape) seem to be reported more accurately than surface features such as color (given equal task-relevance).[10] In contrast, in paradigms based on feature changes, surface features such as color seem to be processed for conscious perception before visual motion.[16,17] In terms of intrinsic processing differences, the feature location is an especially informative case. The spatial location of an object is implicitly represented throughout the levels of the visual system in a topographic/retinotopic manner in various cortical maps,[1,18,19] and is thought to help distinguish visual inputs from different objects,[3] to enable sensorimotor action upon the objects,[5] and to modulate ongoing action fast and automatically.[20,21] In contrast to spatial location, surface features, such as color, form or shape, and object category are represented by more specialized neural channels, centers, and maps.[1,18,22–24] Thus, even though these features are ultimately bound to achieve a coherent object representation,[3] by themselves they do not receive such an omnipresent representation as space. In line with a prominent position of spatial processing, it is well-established that spatial processing can have strong modulating effects on vision in general, namely by guiding attention to prioritize the processing of visual information from specific locations in the visual field.[25–28]
Besides this functionally important role of implicit spatial processing for initial visual processing and for guiding attention,[3] it is unknown whether space itself also receives priority over other visual features for explicit visual recognition and report. In particular, since attention is often studied in visual search tasks requiring speeded manual actions,[29] a seemingly high priority of spatial processing could arise from a privileged access of space to action control based on the “fast” dorsal visual system, behind which the “slower” ventral visual system mediating conscious perception lags.[20,30,31] Thus, even though the special status of spatial processing is recognized,[27,28,32] current theories often remain neutral regarding the intrinsic efficiency with which different classes of visual features are processed[8,33] or might even question it.[34]
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Here, we ask whether the spatial location and the identity (object category) of an object are processed with different efficiency for being consciously recognized and reported. To this end, we showed observers single target letters for brief durations (terminated by pattern masks) and at different locations (Figure 1). In Experiment 1, observers reported the location as well as the letter identity of the target on a given trial. In Experiment 2, observers performed different blocks of trials in different sessions, in which they reported only the location or only the identity of the target. Based on Bundesen’s[7] Theory of Visual Attention (TVA), we modeled observers’ report performance as a psychometric function of the presentation duration of the target and estimated two key parameters of visual processing, namely the temporal threshold of visual perception, which is the presentation duration needed for visual processing to start, and the speed of visual processing in terms of objects per second. According to Bundesen,[7] these two parameters determine conscious perception.
If conscious visual perception were generally better for spatial location than for object identity, performance in reporting location should be higher than performance in reporting object identity, across the different target durations and despite the same level of task relevance. Moreover, if location and object identity were processed independently, then participants’ reports of these two object features should be stochastically independent.[10] In terms of the two TVA parameters of visual processing, we tested whether the spatial location of an object was processed for perception with higher efficiency than the identity of the object. If so, then the visual processing speed should be higher for the location than for the object identity. In addition, if the processing of the spatial location of the object started earlier than the processing of the object identity, then the temporal perception threshold should be lower for the location than for the object identity.
RESULTS
The data were analyzed using custom scripts written in R (version 4.3.1).[35] The data and analysis code can be found online (Open Science Framework: https://osf.io/jpcu4/), together with all R packages used. Statistical comparisons were conducted using repeated-measures analyses of variance and paired (or one-sample) t-tests (with Cohen’s d_z as effect size), followed up by Bayesian t-tests (with a prior scale of r = 0.707) yielding the Bayes factor in favor of the alternative hypothesis (BF10).[36]
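As an illustration of the paired comparisons described above, the following minimal Python sketch computes a paired t statistic and Cohen’s d_z, using the identity t = d_z · √n (with nine observers, t = 3 · d_z, which matches the pairs of t and d_z values reported below). The analyses in the paper were run in R; this sketch is not the authors’ code, and the threshold values are hypothetical placeholders rather than study data.

```python
import math
import statistics

def paired_t_and_dz(x, y):
    """Return (t, df, d_z) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    d_z = mean_diff / sd_diff          # Cohen's d_z for paired designs
    t = d_z * math.sqrt(n)             # same as mean_diff / (sd_diff / sqrt(n))
    return t, n - 1, d_z

# Hypothetical thresholds (in s) for nine observers; NOT the study's data.
location = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010, 0.008, 0.012, 0.011]
identity = [0.018, 0.021, 0.015, 0.020, 0.019, 0.017, 0.016, 0.022, 0.018]

t, df, d_z = paired_t_and_dz(location, identity)
print(f"t({df}) = {t:.3f}, d_z = {d_z:.3f}")
```

A negative t here simply reflects the subtraction order (location minus identity) in the made-up example.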
Experiment 1
The stochastic independence of location and letter reports was assessed as follows (see Figure 2A for the letter and location report performance).[10] We computed the predicted probabilities of reporting location or letter identity correctly or incorrectly assuming they were mutually independent, based on the observed marginal probabilities for each target duration and each observer.[10] Across observers, there was a high correlation between these predicted probabilities and the probabilities that had been observed (Figure 2B). The mean correlation was 0.99 (SD = 0.014) and was significantly larger than 0, t(8) = 207.63, p < 0.001, d_z = 69.21, BF10 > 2.91 × 10^12. This shows that location reports and letter identity reports were stochastically independent.
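The independence check can be sketched as follows: for one observer and one target duration, the four joint outcome probabilities are predicted as products of the observed marginal probabilities and then compared with the observed joint proportions. This is a minimal Python illustration with hypothetical trial data; the function names are assumptions made for this example and are not taken from the authors’ analysis code.

```python
from collections import Counter

# Joint outcomes as (location_correct, identity_correct)
OUTCOMES = [(True, True), (True, False), (False, True), (False, False)]

def observed_joint(trials):
    """Observed proportion of each joint outcome."""
    counts = Counter(trials)
    return [counts[o] / len(trials) for o in OUTCOMES]

def predicted_joint(trials):
    """Joint proportions predicted from the marginals under independence."""
    p_loc = sum(loc for loc, _ in trials) / len(trials)
    p_id = sum(ident for _, ident in trials) / len(trials)
    return [(p_loc if loc else 1 - p_loc) * (p_id if ident else 1 - p_id)
            for loc, ident in OUTCOMES]

# Hypothetical trials for one observer and one target duration (not study data).
trials = ([(True, True)] * 12 + [(True, False)] * 6
          + [(False, True)] * 4 + [(False, False)] * 2)

observed = observed_joint(trials)
predicted = predicted_joint(trials)
for outcome, o, p in zip(OUTCOMES, observed, predicted):
    print(outcome, round(o, 3), round(p, 3))
```

In the analysis described above, such predicted and observed probabilities were correlated for each observer across target durations; the toy data here were constructed so that both sets coincide.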
Location reports were 16.7% (mean of observers’ mean performance differences for each target duration) more accurate than letter identity reports (Figure 2A), and this difference was significantly larger than 0, t(8) = 10.248, p < 0.001, d_z = 3.416, BF10 = 2685.519. To investigate these performance differences more closely in terms of temporal perception threshold and visual processing speed, each individual observer’s report performance for the two report conditions was assessed as a function of target duration, and this psychometric function was modeled as an exponential approach to perfect performance[7] (Figure 2C shows the psychometric function for the aggregate observer):
$$
p(t) =
\begin{cases}
1 - \exp\bigl(-v\,(t - t_0)\bigr) + \exp\bigl(-v\,(t - t_0)\bigr)\cdot \mathrm{chance}, & \text{if } t \ge t_0 \\
\mathrm{chance}, & \text{if } t < t_0
\end{cases}
$$
where p is the probability of correct report, t is the target duration, t_0 is the temporal threshold of perception, v is the processing speed, and chance is the probability of guessing correctly (here 1/12). Psychometric functions were fit using custom code (inspired by quickpsy[38, cf. 39]).
In these psychometric functions, the TVA parameter t_0 is the temporal threshold of perception, that is, the target duration (in s) necessary for increasing performance over chance (i.e., the target duration where the curves in Figure 2C rise from chance), which represents the time needed for visual processing to start.[7] The TVA parameter v is the visual processing speed in the number of objects that can be processed per second,[7] that is, the exponential rate (i.e., the steepness) of the curves in Figure 2C.
Across observers, the temporal threshold of perception was significantly lower for reporting the location of a target compared with its letter identity (Figure 2D), t(8) = 8.667, p < 0.001, d_z = 2.889, BF10 = 929.503. Thus, the processing of location started earlier than the processing of letter identity. Likewise, visual processing speed was significantly higher for location than for letter identity (Figure 2D), t(8) = 2.868, p = 0.021, d_z = 0.956, BF10 = 3.561. Thus, the visual processing of the location not only started earlier but also proceeded faster than the processing of the letter identity.
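To make the roles of the two parameters concrete, the psychometric function described above can be sketched in Python together with a simple maximum-likelihood grid search. This is only an illustration under stated assumptions: the paper’s fits used custom R code inspired by quickpsy, whereas the grid search, the function names, and the data below are hypothetical stand-ins.

```python
import math

def p_correct(t, t0, v, chance=1 / 12):
    """Probability of a correct report after target duration t (in s)."""
    if t < t0:
        return chance  # below threshold, only guessing is possible
    decay = math.exp(-v * (t - t0))
    return 1 - decay + decay * chance

def neg_log_likelihood(params, data, chance=1 / 12):
    """Binomial negative log-likelihood; data holds (duration, n_correct, n_trials)."""
    t0, v = params
    nll = 0.0
    for t, k, n in data:
        # Clip to avoid log(0) at perfect or zero predicted performance.
        p = min(max(p_correct(t, t0, v, chance), 1e-9), 1 - 1e-9)
        nll -= k * math.log(p) + (n - k) * math.log(1 - p)
    return nll

def grid_fit(data, t0_grid, v_grid):
    """Return the (t0, v) pair on the grid with the lowest negative log-likelihood."""
    return min(((t0, v) for t0 in t0_grid for v in v_grid),
               key=lambda params: neg_log_likelihood(params, data))

# Hypothetical data (duration in s, correct reports, trials); not study data.
data = [(0.01, 8, 100), (0.02, 25, 100), (0.04, 66, 100),
        (0.08, 93, 100), (0.16, 100, 100)]
t0_grid = [0.005, 0.010, 0.015, 0.020, 0.025]
v_grid = [10, 20, 30, 40, 50, 60, 70, 80]

t0_hat, v_hat = grid_fit(data, t0_grid, v_grid)
print(f"t0 = {t0_hat:.3f} s, v = {v_hat} objects/s")
```

A grid search is used here only because it needs no external libraries; a gradient-based or simplex optimizer would be the more usual choice for real data.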
For Experiment 1, 2 (report type: location vs. letter identity) × 2 (report order: location vs. letter identity first) repeated-measures analyses of variance (ANOVAs) showed neither main effects of report order nor interactions of report order with report type on the temporal perception thresholds, Fs(1, 8) < 3.453, ps > 0.100, η²_G values < 0.021. However, the ANOVA showed a main effect of report order on visual processing speed, F(1, 8) = 6.447, p = 0.035, η²_G = 0.028 (Figure S1; again, there was no interaction, F(1, 8) = 2.313, p = 0.167, η²_G = 0.013). Holm-corrected post-hoc tests indicated that this was due to a higher visual processing speed for location than for letter identity when location had to be reported first, p = 0.042. Likewise, it was due to a higher visual processing speed for location when location had to be reported first as compared with the processing speed for letter identity when the letter identity had to be reported first, p = 0.027. This finding might suggest that the location information in working memory decayed over time depending on target duration (e.g., intermediate target durations could suffice for encoding into short-term memory but result in representations that are still vulnerable to decay) and was fully available only when it was used for report first, without an intervening letter identity report.
Conversely, for the letter identity report, an intervening location report did not seem to have any effect (Figure S1). In line with such an effect of report order, one might argue that observers strategically prioritized location over letter identity for visual processing and retention in short-term memory throughout the experiment, since both features were to be reported on every trial. Therefore, Experiment 2 asked observers to report only one of the two features in a given experimental block, so that observers could fully prioritize the target feature on a given trial. This manipulation should create conditions of equally high relevance for location and identity, ruling out top-down preferences for one feature (location) over the other (identity).
Experiment 2
In line with Experiment 1, observers’ location reports were 16.6% (mean of observers’ mean performance differences for each target duration) more accurate than their letter identity reports, t(8) = 7.281, p < 0.001, d_z = 2.427, BF10 = 321.221 (Figure S2).
Figure 3A shows the psychometric functions for location and letter identity reports for the aggregate observer. As in Experiment 1, observers’ temporal perception thresholds were significantly lower for location reports than for letter identity reports (Figure 3B), t(8) = 5.7879, p < 0.001, d_z = 1.929, BF10 = 86.019. Again, visual processing for location perception started earlier than processing for letter identity perception. Also, the visual processing speed for location perception was significantly higher than for the perception of letter identity (Figure 3B), t(8) = 3.661, p = 0.006, d_z = 1.220, BF10 = 9.089. Thus, when location and letter identity reports were blocked, location was still processed earlier and faster than letter identity.
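One way to see how the two parameters jointly shape performance is to invert the psychometric function from the Results for a criterion accuracy. The short derivation below is an illustrative sketch; the criterion symbol p^* and the example value of 0.5 are assumptions introduced here and do not appear in the original analysis.

```latex
% For t >= t_0, the psychometric function can be rewritten as
p(t) = 1 - (1 - \mathrm{chance})\, e^{-v (t - t_0)}
% Solving for the duration t^* at which a criterion accuracy p^* is reached:
t^* = t_0 + \frac{1}{v} \ln\!\left(\frac{1 - \mathrm{chance}}{1 - p^*}\right)
% Example with chance = 1/12 and p^* = 0.5:
t^* = t_0 + \frac{\ln(11/6)}{v} \approx t_0 + \frac{0.606}{v}
```

Under this reading, a lower t_0 shifts the whole curve toward shorter durations, whereas a higher v shortens the additional time needed to reach any given accuracy level.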
Figure 1. Paradigm of Experiment 1
After fixating a fixation cross, a single letter target was shown briefly at one out of 12 locations and was followed by pattern masks appearing at all 12 possible
locations. At the end of a trial, participants reported the letter identity and the location (the order of these two report types was randomized and counterbalanced
across trials).
Figure 2. Results of Experiment 1
(A) Location report performance vs. letter report performance. Points indicate observers’ mean proportion correct; error bars indicate the corresponding 95% confidence intervals for within-subject designs.[37]
(B) Observed probabilities of reporting location or letter identity correctly or incorrectly as a function of the probabilities predicted from the observed marginal probabilities under the assumption of stochastic independence (each point represents one such probability pair for one observer and target duration). The dashed diagonal indicates the identity of predicted and observed probabilities (hence stochastic independence); the regression line is shown on top of it in blue.
(C) Psychometric function of the aggregate observer for location vs. letter report performance as a function of target presentation duration. Points represent mean proportion correct across observers (with error bars indicating 95% confidence intervals[37]); smooth curves indicate the psychometric functions found by averaging the parameters of the individual observers’ fitted psychometric functions.
(D) Means of observers’ temporal perception thresholds and visual processing speed for perceiving location and letter identity, respectively. Error bars provide 95% confidence intervals.[37]
DISCUSSION
We asked whether the spatial location and the identity (object category) of an object are processed bottom-up with different efficiency for being consciously recognized and reported. Both of our experiments demonstrate that this is the case. Overall, perceptual performance was higher for the spatial location than for the object identity. Observers’ reports of the two features were stochastically independent of one another, in line with previous findings and the assumption that visual features are processed independently and in parallel in general.[7,10] Most importantly, we found that the bottom-up processing of the two features for visual perception and report differed in efficiency. Visual processing speed was higher for the spatial object location than for the object identity. Likewise, the temporal perception threshold was lower for location than for object identity. Thus, the processing of location not only proceeded faster, but also started earlier than the processing of object identity.
One could ask whether the differences between location perception and object identity perception reflected peculiarities of the task, namely, that it was merely more difficult to discriminate the letter identities than the letter locations. Arguing against this idea, both the location report and the identity report approached an asymptote near perfect performance at the highest presentation durations, showing that for both report features, there was little confusability (Figures 2C and 3A).
In contrast to reaction time measures from speeded tasks, which conflate perceptual, response, and motor processing,[40–42] our paradigm offered unlimited time for responding, so that response and motor processing could always finish. Floor or ceiling effects on performance were prevented by terminating the visual presentation using backward pattern masks, which are assumed to interrupt processing and extinguish visual sensory (iconic) memory.[43] Visual processing speed and the temporal perception threshold were assessed by studying how report performance improved with the increasing presentation duration of the target. As is often done in TVA-based paradigms,[44–47] participants viewed a single target that was terminated by a pattern mask. Crucially, the single letter was followed by pattern masks at all possible target locations. This was done because presenting a single mask would have directly delivered information about the target location even at the lowest target durations (which would have precluded the estimation of visual processing speed and the temporal perception threshold for spatial location). One might argue that perception in such a paradigm with post-masked targets might depend not only on the target and its presentation duration but also on the characteristics of the mask, which decide how well the features of the target can be temporally segregated from the mask.[22,48–51] The temporal segregation is assumed to rely on candidate object representations (proto-objects) that can be filtered by means of (object-based spatial) attention.[50] The computation of attentional priorities for candidate objects as well as their initial figure-ground segregation is assumed to happen in a first, unselective (“pre-attentive”) processing phase that should contribute to the time needed to start the visual processing of objects and object features, that is, to the temporal perception threshold.[8,40,51–53] Thus, our findings of lower temporal perception thresholds for spatial location than for object identity could suggest that the feature-specific masking strength (feature-specific similarity of mask and target) was higher for identity (i.e., alphanumeric category) than for location. However, such a masking-based view on the temporal threshold cannot explain why visual processing speed was also higher for location than for object identity, because visual processing is assumed to take place after the temporal segregation of target and mask and the computation of candidate target objects, and thus after the temporal perception threshold has been passed.
Figure 3. Results of Experiment 2
(A) Psychometric function of the aggregate observer for location vs. letter report performance as a function of target presentation duration. Points represent the mean proportion correct across observers (with error bars indicating 95% confidence intervals[37]), and smooth curves indicate the psychometric functions found by averaging the parameters of the individual observers’ fitted psychometric functions.
(B) Means of observers’ temporal perception thresholds and visual processing speed for perceiving location and letter identity, respectively. Error bars provide 95% confidence intervals.[37]
In Bundesen’s[7] Theory of Visual Attention, the higher visual processing speed for location than for identity could be due to two factors. First, the sensory evidence for location could be higher than for identity for at least two reasons. Large parts of the visual system are organized spatiotopically or retinotopically,[18,19, cf. 13] so that space is implicitly encoded ubiquitously in the visual brain and even subtle spatial input can successfully be matched against these vast representations in the recognition process. Although spatial location (and retinotopy in general) is assumed to serve as an implicit organizing feature for guiding attention, for combining different visual features into coherent object representations,[3,5,15] and for controlling sensorimotor action,[20,21,54] the present findings are the first to indicate a special status of spatial location also for conscious perception and explicit report. Second, the visual brain could have an intrinsic and fixed bias for categorizing objects as being at a certain location in the visual field as opposed to categorizing them as having any other feature such as a certain identity. The representation of object positions in space is often assumed to be the basis of attentional allocation,[15,55] so that a computation of space preceding visual feature computations is not unlikely.[56] Thus, both considerations suggest that visual processing speed would be higher for location than for other features. One may speculate that, rather than arising accidentally, prioritizing location in action control[cf. 20,21] and visual consciousness is itself functional, grounding representations for both processes in a common computational space integrating online sensorimotor action control and conscious perception for report. This would enable interactions between the two processes, which could be mediated by common “early” attentional processes[5] or by two interacting visual processing streams.[57]
Akin to the higher visual processing speed, we also found that visual processing started earlier for location than for object identity, as evident from a lower temporal perception threshold. This finding is surprising because the temporal perception threshold is assumed to be unspecific to the visual features and to apply likewise to all visual features and objects.[7,58] That is, it is assumed that the temporal perception threshold reflects “pre-attentional” processes that dissect the visual scene into preliminary representations of objects with their features,[40] on whose basis attentional priorities (object-based attentional weights) are computed that control subsequent processing for conscious visual perception.[53] The present findings cast doubt on this assumption of a feature-unspecific “pre-attentional” temporal perception threshold. Instead, they suggest that the start of visual processing for encoding into visual working memory and object recognition is feature-specific (or at least earlier for location).
In Experiment 1, participants reported both the location and the identity of the target after it had been presented. Thus, they had to adopt a task set in which both of the two response features were important for the task and thus received equal priority. One might argue that in such a situation, humans could have a top-down tendency to prioritize space over identity, which would induce the above-described perceptual bias for space at the expense of identity.[7] However, in Experiment 2, participants reported the different features in separate blocks of trials, so that here they could adopt a task set in which the respective response feature, location or identity, was the only one of importance and thus fully prioritized. Even under these conditions, the visual processing speed and the temporal perception threshold were improved for location compared with identity. Thus, the differences between location and identity should not result from different top-down perceptual biases for response features. As such, these findings argue that the differences between location and identity were more profound and could reflect more basic bottom-up characteristics of the visual system, such as higher sensory evidence for location due to a stronger and more widely distributed representation of space in the brain. This dovetails with findings that location is processed faster than the surface feature of color for modifying ongoing and speeded sensorimotor actions, which may hint at a privileged access of spatial processing to mechanisms for (speeded) action control.[20,21] In urgent situations, the most salient visual information can overpower current intentions, so that, out of two prepared motor plans, the one corresponding to the salient information is executed.[54,59,60] In light of this finding, the present results might thus point to a higher intrinsic salience of location as opposed to other object features.
In sum, the present findings reveal that the spatial location of
objects is preferred in visual processing for visual perception.
Compared with object identity, the processing of the spatial
location is more efficient, so that it starts earlier and proceeds
faster. Taken together, this argues that at least for location and
identity, visual processing is intrinsically different for different
visual features.
Limitations of the study
Performance in visual report tasks always bears some specificity with respect to the stimuli used. Therefore, the speed of visual processing per se cannot be assessed, only the speed for processing a certain stimulus. We used letter stimuli with specific, highly effective post-masks[7,58,61] and asked observers to report the location and/or identity of the letter. For our sample, we can assume that reading letters was a highly overlearned skill, so that letter identities formed distinct categories that were easy to distinguish. It therefore remains a question for future research whether the differences in visual processing speed and the temporal perception threshold between identity and location would be affected if one used visual stimuli that were less overlearned and more difficult to verbalize, and thus did not belong to such distinct categories.
RESOURCE AVAILABILITY
Lead contact
Correspondence and requests for resources should be directed to and will be
fulfilled by the Lead Contact, Christian H. Poth (c.poth@uni-bielefeld.de).
Materials availability
The computer code for running the experiments can be found here: Open Science Framework: https://osf.io/jpcu4/
Data and code availability
• The experimental data can be found here: Open Science Framework: https://osf.io/jpcu4/
• The computer code for analysis of the data can be found here: Open Science Framework: https://osf.io/jpcu4/
ACKNOWLEDGMENTS
We thank Josefine Albert for help with the laboratory administration. This study
was supported as part of the regular research of the Neuro-Cognitive Psychol-
ogy Group at Bielefeld University.
AUTHOR CONTRIBUTIONS
Conceptualization, CHP and WXS; methodology, CHP and WXS; software, CHP; formal analysis, CHP; visualization, CHP; investigation, CHP; resources, CHP; data curation, CHP; writing (original draft), CHP; writing (review and editing), CHP and WXS; supervision, WXS.
DECLARATION OF INTERESTS
The authors declare no conflicts of interest.
STAR★METHODS
Detailed methods are provided in the online version of this paper and include the following:
• KEY RESOURCES TABLE
• EXPERIMENTAL MODEL AND STUDY PARTICIPANT DETAILS
• METHOD DETAILS
  ◦ Apparatus and stimuli
  ◦ Procedure
  ◦ Design
• QUANTIFICATION AND STATISTICAL ANALYSIS
SUPPLEMENTAL INFORMATION
Supplemental information can be found online at https://doi.org/10.1016/j.isci.2024.111702.
Received: May 21, 2024
Revised: October 30, 2024
Accepted: December 24, 2024
Published: December 27, 2024
REFERENCES
1. Grill-Spector, K., and Malach, R. (2004). The human visual cortex. Annu. Rev. Neurosci. 27, 649–677. https://doi.org/10.1146/annurev.neuro.27.070203.144220.
2. Olivers, C.N.L., and Roelfsema, P.R. (2020). Attention for action in visual working memory. Cortex 131, 179–194. https://doi.org/10.1016/j.cortex.2020.07.011.
3. Treisman, A.M., and Gelade, G. (1980). A feature-integration theory of attention. Cognit. Psychol. 12, 97–136. https://doi.org/10.1016/0010-0285(80)90005-5.
4. Treisman, A. (1998). Feature binding, attention and object perception. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 353, 1295–1306. https://doi.org/10.1098/rstb.1998.0284.
5. Schneider, W.X. (1995). VAM: A neuro-cognitive model for visual attention control of segmentation, object recognition, and space-based motor action. Vis. Cognit. 2, 331–376. https://doi.org/10.1080/13506289508401737.
6. Kahneman, D., Treisman, A., and Gibbs, B.J. (1992). The reviewing of object files: Object-specific integration of information. Cognit. Psychol. 24, 175–219. https://doi.org/10.1016/0010-0285(92)90007-O.
7. Bundesen, C. (1990). A theory of visual attention. Psychol. Rev. 97, 523–547. https://doi.org/10.1037/0033-295X.97.4.523.
8. Bundesen, C., Habekost, T., and Kyllingsbæk, S. (2005). A neural theory of visual attention: Bridging cognition and neurophysiology. Psychol. Rev. 112, 291–328. https://doi.org/10.1037/0033-295X.112.2.291.
9. Luck, S.J., and Vogel, E.K. (2013). Visual working memory capacity: From psychophysics and neurobiology to individual differences. Trends Cognit. Sci. 17, 391–400. https://doi.org/10.1016/j.tics.2013.06.006.
10. Bundesen, C., Kyllingsbæk, S., and Larsen, A. (2003). Independent encoding of colors and shapes from two stimuli. Psychon. Bull. Rev. 10, 474–479. https://doi.org/10.3758/BF03196509.
11. Martin, C.B., and Barense, M.D. (2023). Perception and Memory in the Ventral Visual Stream and Medial Temporal Lobe. Annu. Rev. Vis. Sci. 9, 409–434. https://doi.org/10.1146/annurev-vision-120222-014200.
12. Wurm, M.F., and Caramazza, A. (2022). Two ‘what’ pathways for action and object recognition. Trends Cognit. Sci. 26, 103–116. https://doi.org/10.1016/j.tics.2021.10.003.
13. Desimone, R., and Duncan, J. (1995). Neural mechanisms of selective visual attention. Annu. Rev. Neurosci. 18, 193–222. https://doi.org/10.1146/annurev.ne.18.030195.001205.
14. Kastner, S., and Ungerleider, L.G. (2001). The neural basis of biased competition in human visual cortex. Neuropsychologia 39, 1263–1276. https://doi.org/10.1016/S0028-3932(01)00116-6.
15. Wolfe, J.M. (1994). Guided Search 2.0 A revised model of visual search. Psychon. Bull. Rev. 1, 202–238. https://doi.org/10.3758/BF03200774.
16. Moutoussis, K., and Zeki, S. (1997). A direct demonstration of perceptual asynchrony in vision. Proc. Biol. Sci. 264, 393–399. https://doi.org/10.1098/rspb.1997.0056.
17. Moutoussis, K., and Zeki, S. (1997). Functional segregation and temporal hierarchy of the visual perceptive systems. Proc. Biol. Sci. 264, 1407–1414. https://doi.org/10.1098/rspb.1997.0196.
18. Felleman, D.J., and Van Essen, D.C. (1991). Distributed Hierarchical Processing in the Primate Cerebral Cortex. Cerebr. Cortex 1, 1–47. https://doi.org/10.1093/cercor/1.1.1-a.
19. Golomb, J.D., and Kanwisher, N. (2012). Higher Level Visual Cortex Represents Retinotopic, Not Spatiotopic, Object Location. Cerebr. Cortex 22, 2794–2810. https://doi.org/10.1093/cercor/bhr357.
20. Pisella, L., Arzi, M., and Rossetti, Y. (1998). The timing of color and location processing in the motor context. Exp. Brain Res. 121, 270–276. https://doi.org/10.1007/s002210050460.
21. Pisella, L., Gréa, H., Tilikete, C., Vighetto, A., Desmurget, M., Rode, G.,
Boisson, D., and Rossetti, Y. (2000). An ‘automatic pilot’ for
the hand in human posterior parietal cortex: toward reinterpreting optic
ataxia. Nat. Neurosci. 3, 729–736. https://doi.org/10.1038/76694.
22. Breitmeyer, B.G. (2014). The visual (un)conscious and its (dis)contents: A
microtemporal approach (USA: Oxford University Press). https://global.
oup.com/academic/product/the-visual-unconscious-and-its-discontents-
9780198712237?cc=de&lang=en&.
23. Gegenfurtner, K.R., and Kiper, D.C. (2003). Color vision. Annu. Rev. Neu-
rosci. 26, 181–206. https://doi.org/10.1146/annurev.neuro.26.041002.
131116.
24. Logothetis, N.K., and Sheinberg, D.L. (1996). Visual Object Recognition.
Annu. Rev. Neurosci. 19, 577–621. https://doi.org/10.1146/annurev.ne.
19.030196.003045.
25. Carrasco, M. (2011). Visual attention: The past 25 years. Vis. Res. 51,
1484–1525. https://doi.org/10.1016/j.visres.2011.04.012.
26. Petersen, S.E., and Posner, M.I. (2012). The Attention System of the Hu-
man Brain: 20 Years After. Annu. Rev. Neurosci. 35, 73–89. https://doi.
org/10.1146/annurev-neuro-062111-150525.
27. Van der Heijden, A.H.C. (1993). The role of position in object selection in
vision. Psychol. Res. 56, 44–58.
28. Schneider, W.X. (1993). Space-based visual attention models and object
selection: Constraints, problems, and possible solutions. Psychol. Res.
56, 35–43.
29. Wolfe, J.M. (2020). Visual search: How do we find what we are looking for?
Annu. Rev. Vis. Sci. 6, 539–562. https://doi.org/10.1146/annurev-vision-
091718-015048.
30. Milner, D., and Goodale, M. (2006). The Visual Brain in Action (Oxford, UK:
Oxford University Press).
31. Nowak, L.G., and Bullier, J. (1997). The Timing of Information Transfer in
the Visual System. In Extrastriate Cortex in Primates. Cerebral Cortex,
12, K.S. Rockland, J.H. Kaas, and A. Peters, eds. (Springer),
pp. 205–241. https://doi.org/10.1007/978-1-4757-9625-4_5.
32. Logan, G.D. (1996). The CODE theory of visual attention: An integration of
space-based and object-based attention. Psychol. Rev. 103, 603–649.
https://doi.org/10.1037/0033-295X.103.4.603.
33. Bundesen, C., Vangkilde, S., and Petersen, A. (2015). Recent develop-
ments in a computational theory of visual attention (TVA). Vis. Res. 116,
210–218. https://doi.org/10.1016/j.visres.2014.11.005.
34. Bundesen, C. (1991). Visual selection of features and objects: Is location
special? An Interpretation of Nissen’s (1985) findings. Percept. Psycho-
phys. 50, 87–89. https://doi.org/10.3758/BF03212208.
35. R Core Team (2021). R: A Language and Environment for Statistical Computing (R Foun-
dation for Statistical Computing). https://www.R-project.org/.
36. Rouder, J.N., Speckman, P.L., Sun, D., Morey, R.D., and Iverson, G.
(2009). Bayesian t tests for accepting and rejecting the null hypothesis.
Psychon. Bull. Rev. 16, 225–237. https://doi.org/10.3758/PBR.16.2.225.
37. Morey, R.D. (2008). Confidence Intervals from Normalized Data: A correc-
tion to Cousineau (2005). Tutor. Quant. Methods Psychol. 4, 61–64.
https://doi.org/10.20982/tqmp.04.2.p061.
38. Linares, D., and López-Moliner, J. (2016). quickpsy: An R Package to Fit
Psychometric Functions for Multiple Groups. R J. 8, 122–131. https://
doi.org/10.32614/RJ-2016-008.
39. Knoblauch, K., and Maloney, L.T. (2012). Modeling Psychophysical Data in
R. https://doi.org/10.1007/978-1-4614-4475-6.
40. Bundesen, C., and Habekost, T. (2008). Principles of Visual Attention:
Linking Mind and Brain (Oxford University Press). https://doi.org/10.
1093/acprof:oso/9780198570707.001.0001.
41. Finke, K., Dodds, C.M., Bublak, P., Regenthal, R., Baumann, F., Manly, T.,
and Müller, U. (2010). Effects of modafinil and methylphenidate on visual
attention capacity: a TVA-based study. Psychopharmacology 210,
317–329. https://doi.org/10.1007/s00213-010-1823-x.
42. Foerster, R.M., Poth, C.H., Behler, C., Botsch, M., and Schneider, W.X.
(2016). Using the virtual reality device Oculus Rift for neuropsychological
assessment of visual processing capabilities. Sci. Rep. 6, 37016. https://
doi.org/10.1038/srep37016.
43. Irwin, D.E., and Thomas, L.E. (2008). Visual sensory memory. In Visual
memory, S.J. Luck and A. Hollingworth, eds. (Oxford University Press),
pp. 9–42.
44. Petersen, A., Petersen, A.H., Bundesen, C., Vangkilde, S., and Habekost,
T. (2017). The effect of phasic auditory alerting on visual perception.
Cognition 165, 73–81. https://doi.org/10.1016/j.cognition.2017.04.004.
45. Poth, C.H., and Schneider, W.X. (2018). Attentional competition across
saccadic eye movements. Acta Psychol. 190, 27–37. https://doi.org/10.
1016/j.actpsy.2018.06.011.
46. Vangkilde, S., Coull, J.T., and Bundesen, C. (2012). Great expectations:
Temporal expectation modulates perceptual processing speed. J. Exp.
Psychol. Hum. Percept. Perform. 38, 1183–1191. https://doi.org/10.
1037/a0026343.
47. Vangkilde, S., Petersen, A., and Bundesen, C. (2013). Temporal expec-
tancy in the context of a theory of visual attention. Philos. Trans. R. Soc.
Lond. B Biol. Sci. 368, 20130054.
48. Enns, J.T., and Di Lollo, V. (2000). What’s new in visual masking? Trends
Cognit. Sci. 4, 345–352. https://doi.org/10.1016/S1364-6613(00)01520-5.
49. Poth, C.H., Herwig, A., and Schneider, W.X. (2015). Breaking object corre-
spondence across saccadic eye movements deteriorates object recogni-
tion. Front. Syst. Neurosci. 9, 176. https://doi.org/10.3389/fnsys.2015.
00176.
50. Poth, C.H., and Schneider, W.X. (2016). Breaking object correspondence
across saccades impairs object recognition: The role of color and lumi-
nance. J. Vis. 16, 1. https://doi.org/10.1167/16.11.1.
51. Schneider, W.X. (2013). Selective visual processing across competition
episodes: A theory of task-driven visual attention and working memory.
Philos. Trans. R. Soc. Lond. B Biol. Sci. 368, 20130060. https://doi.org/
10.1098/rstb.2013.0060.
52. Nordfang, M., Staugaard, C., and Bundesen, C. (2018). Attentional
weights in vision as products of spatial and nonspatial components. Psy-
chon. Bull. Rev. 25, 1043–1051. https://doi.org/10.3758/s13423-017-
1337-1.
53. Vangkilde, S., Bundesen, C., and Coull, J.T. (2011). Prompt but inefficient:
nicotine differentially modulates discrete components of attention. Psy-
chopharmacology 218, 667–680. https://doi.org/10.1007/s00213-011-
2361-x.
54. Krause, A., and Poth, C.H. (2023). Maintaining eye fixation relieves pres-
sure of cognitive action control. iScience 26. https://doi.org/10.1016/j.
isci.2023.107520.
55. Treisman, A., and Souther, J. (1985). Search asymmetry: a diagnostic for
preattentive processing of separable features. J. Exp. Psychol. Gen. 114,
285–310.
56. Cox, G.E., Palmeri, T.J., Logan, G.D., Smith, P.L., and Schall, J.D. (2022).
Salience by competitive and recurrent interactions: Bridging neural spiking
and computation in visual attention. Psychol. Rev. 129, 1144–1182.
57. Rossetti, Y., Pisella, L., and McIntosh, R.D. (2017). Rise and fall of the two
visual systems theory. Ann. Phys. Rehabil. Med. 60, 130–140. https://doi.
org/10.1016/j.rehab.2017.02.002.
58. Shibuya, H., and Bundesen, C. (1988). Visual selection from multielement
displays: Measuring and modeling effects of exposure duration. J. Exp.
Psychol. Hum. Percept. Perform. 14, 591–600. https://doi.org/10.1037/
0096-1523.14.4.591.
59. Poth, C.H. (2021). Urgency forces stimulus-driven action by overcoming
cognitive control. Elife 10, e73682. https://doi.org/10.7554/eLife.73682.
60. Salinas, E., Steinberg, B.R., Sussman, L.A., Fry, S.M., Hauser, C.K., An-
derson, D.D., and Stanford, T.R. (2019). Voluntary and involuntary contri-
butions to perceptually guided saccadic choices resolved with millisecond
precision. Elife 8, e46359. https://doi.org/10.7554/eLife.46359.
61. Poth, C.H., Foerster, R.M., Behler, C., Schwanecke, U., Schneider, W.X.,
and Botsch, M. (2018). Ultrahigh temporal resolution of visual presentation
using gaming monitors and G-Sync. Behav. Res. Methods 50, 26–38.
https://doi.org/10.3758/s13428-017-1003-6.
62. Poth, C.H., and Horstmann, G. (2017). Assessing the monitor warm-up
time required before a psychological experiment can begin. Quant.
Method. Psychol. 13, 166–173. https://doi.org/10.20982/tqmp.13.3.p166.
63. Brainard, D.H. (1997). The psychophysics toolbox. Spatial Vis. 10,
433–436.
64. Pelli, D.G. (1997). The VideoToolbox software for visual psychophysics:
Transforming numbers into movies. Spatial Vis. 10, 437–442.
65. Kleiner, M., Brainard, D., and Pelli, D. (2007). What’s new in Psychtoolbox-
3? Perception 36, 1–16.
66. Cornelissen, F.W., Peters, E.M., and Palmer, J. (2002). The Eyelink
Toolbox: Eye tracking with MATLAB and the Psychophysics Toolbox. Be-
hav. Res. Methods Instrum. Comput. 34, 613–617. https://doi.org/10.
3758/BF03195489.
STAR★METHODS
KEY RESOURCES TABLE
EXPERIMENTAL MODEL AND STUDY PARTICIPANT DETAILS
N = 9 human observers (22–42 years old, MD = 25 years, 7 identifying as female, 2 as male) participated in Experiment 1 and N = 9 human observers (20–30 years old, MD = 23 years, 8 identifying as female, 1 as male) in Experiment 2. All observers had normal or corrected-to-normal visual acuity and color vision. The experiments employed within-subjects designs, so that experimental effects were assessed within observers (there were no experimental groups, which controls for between-subjects effects due to sex or gender). Observers were paid for participating and gave written informed consent beforehand. The experiments followed the ethical guidelines of the German Psychological Association (Deutsche Gesellschaft für Psychologie, DGPs) and were approved by the ethics committee at Bielefeld University.
METHOD DETAILS
Apparatus and stimuli
Observers performed the experiment in a dimly lit room, with their heads stabilized by a chin and head rest at a viewing distance of 71 cm from the computer monitor (ViewSonic, resolution of 1024 × 768 px at physical dimensions of 36 × 27 cm), which was pre-heated as specified previously.⁶² Their eyes were tracked monocularly at 1000 Hz using a video-based, desktop-mounted eye tracker (Eyelink 1000, SR Research, Ottawa, Ontario, Canada). The experiments were programmed in MATLAB (R2014b, The Mathworks, Natick, MA, USA) using the Psychophysics Toolbox⁶³⁻⁶⁵ and Eyelink Toolbox⁶⁶ extensions. Responses were collected using a QWERTZ keyboard and a computer mouse.
Stimuli were presented against a black background (<1 cd/m², measured using a Minolta LS-110, Konica Minolta, Osaka, Japan). The fixation cross was a central red “+” (RGB: [100, 0, 0]; 3 cd/m²; 0.25° × 0.25° of visual angle). Target stimuli were red letters (RGB: [100, 0, 0]) from the set [ABFGHJLMRSTX] (0.76° × 0.78°, 3 cd/m²), and the mask stimuli (100 masks per session, algorithmically created) were red circular patches of overlaid letters (see Figure 1; 0.98° × 0.98°, RGB: [200, 0, 0], 13 cd/m²). Stimuli were shown at one of twelve possible locations 9° around screen center. Response displays showed the text “Buchstabe?” (“letter”, 4.35° × 0.62°) or “Ort?” (“location”, 1.57° × 0.62°) in gray (7 cd/m²).
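The stimulus sizes above can be translated into screen pixels from the viewing distance (71 cm) and the monitor geometry (1024 px across 36 cm). The following Python sketch illustrates that conversion. It assumes that the stimulus sizes and the value 9 are degrees of visual angle, and that the twelve locations were equally spaced on an imaginary circle, which the text does not state. The function names are hypothetical and are not taken from the authors' MATLAB code.

```python
import math

VIEW_DIST_CM = 71.0                      # viewing distance reported above
SCREEN_W_CM, SCREEN_W_PX = 36.0, 1024    # monitor width reported above
PX_PER_CM = SCREEN_W_PX / SCREEN_W_CM


def size_deg_to_px(deg):
    """Extent in pixels of a stimulus subtending `deg` degrees, centered on the line of sight."""
    size_cm = 2.0 * VIEW_DIST_CM * math.tan(math.radians(deg) / 2.0)
    return size_cm * PX_PER_CM


def eccentricity_deg_to_px(deg):
    """Distance in pixels from screen center for an eccentricity of `deg` degrees."""
    return VIEW_DIST_CM * math.tan(math.radians(deg)) * PX_PER_CM


def ring_positions(n_locations=12, ecc_deg=9.0):
    """(x, y) pixel offsets from screen center.

    Assumption: locations are equally spaced on a circle; the text only
    says they lie 9 degrees around screen center.
    """
    radius = eccentricity_deg_to_px(ecc_deg)
    positions = []
    for i in range(n_locations):
        angle = 2.0 * math.pi * i / n_locations
        positions.append((radius * math.cos(angle), radius * math.sin(angle)))
    return positions


if __name__ == "__main__":
    print(f"letter width: {size_deg_to_px(0.76):.1f} px")
    print(f"ring radius:  {eccentricity_deg_to_px(9.0):.1f} px")
```

Under these assumptions the ring radius stays below half the screen height (384 px), so all twelve locations fit on the display.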
Procedure
Figure 1 of the main text illustrates the procedure of a single experimental trial in Experiment 1. At the beginning of a trial, observers fixated the fixation cross for a uniformly random interval between 694 and 1388 ms (in steps of 12 ms). Then, a single target letter (randomly drawn from the set of 12 letters) was shown at one of the twelve possible locations (randomly drawn from the set of locations) for 12, 24, 35, 47, 59, 71, 82, 106, 129, 176, or 224 ms. The target was terminated by twelve pattern masks, one appearing at each of the twelve locations for 494 ms. Next, the response displays were presented, asking observers to report the target letter that they had seen using the keyboard or to report its location by clicking on it using the computer mouse. In Experiment 1, observers always reported both the identity and the location of the target letter, with the order of the two report types randomized and counterbalanced across trials.
In Experiment 2, the time course of an experimental trial was the same as in Experiment 1, except that observers performed only one of the two report types on a trial. To this end, observers were asked to report either the location of the target letter or its identity throughout a block of trials.
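The durations listed in the procedure are consistent with whole numbers of video frames. The Python sketch below shows that rounding frame counts at an assumed refresh rate of 85 Hz reproduces the reported values. The refresh rate is not reported in this section, so it is an inference, and the frame counts, the `sample_trial` helper, and its field names are hypothetical illustrations rather than the authors' MATLAB implementation.

```python
import random

ASSUMED_REFRESH_HZ = 85.0  # assumption: inferred from the durations, not reported


def frames_to_ms(n_frames, refresh_hz=ASSUMED_REFRESH_HZ):
    """Duration of n_frames video frames, rounded to whole milliseconds."""
    return round(n_frames * 1000.0 / refresh_hz)


# Frame counts chosen so that the rounded durations match the reported ones.
TARGET_FRAMES = [1, 2, 3, 4, 5, 6, 7, 9, 11, 15, 19]
MASK_FRAMES = 42
FIXATION_FRAMES = (59, 118)  # shortest and longest fixation interval

TARGET_MS = [frames_to_ms(k) for k in TARGET_FRAMES]
MASK_MS = frames_to_ms(MASK_FRAMES)
FIXATION_MS = tuple(frames_to_ms(k) for k in FIXATION_FRAMES)

LETTERS = "ABFGHJLMRSTX"
N_LOCATIONS = 12


def sample_trial(rng):
    """Draw the random elements of one trial as described in the procedure."""
    return {
        "fixation_frames": rng.randint(*FIXATION_FRAMES),  # one-frame steps
        "target_frames": rng.choice(TARGET_FRAMES),
        "letter": rng.choice(LETTERS),
        "location": rng.randrange(N_LOCATIONS),
        "mask_frames": MASK_FRAMES,
    }


if __name__ == "__main__":
    print(TARGET_MS, MASK_MS, FIXATION_MS)
    print(sample_trial(random.Random(1)))
```

Specifying durations in frames rather than milliseconds avoids requesting presentation times that the display cannot realize.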
Design
In Experiment 1, observers performed 11 (target durations) x 2 (location vs. letter reported first) x 25 trials = 550 trials per session. In
the beginning of each session, they performed 20 practice trials. Six observers performed 4 sessions and thus 2200 trials in total. Two
observers performed 2 sessions and 1100 trials each, and one observer terminated during session 3, after 1650 trials.
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
| --- | --- | --- |
| Deposited data | | |
| Behavioral data | Authors | https://osf.io/jpcu4/ |
| Software and algorithms | | |
| Custom analysis scripts | Authors | https://osf.io/jpcu4/ |
In Experiment 2, observers performed 11 (target durations) x 50 trials = 550 trials per session. Observers performed 4 sessions, in
each of which they either reported target location or letter identity (whereby this was ordered in an ABBA or BAAB fashion, to cancel
out fatigue effects of the blocks, and counterbalanced across observers). Observers performed 2200 trials in total, except for one
observer who performed 1925 trials (due to a programming error).
The data, experiment code, and analysis code can be found online at Open Science Framework: https://osf.io/jpcu4/.
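The trial counts in the design follow from simple multiplication, which the short Python sketch below makes explicit. The totals across observers are derived here from the numbers given in the text and are not reported by the authors. The `block_order` helper is a hypothetical way of alternating ABBA and BAAB orders across observers, since the exact assignment scheme is not described.

```python
N_DURATIONS = 11

# Experiment 1: 11 durations x 2 report orders x 25 repetitions per session.
TRIALS_PER_SESSION_EXP1 = N_DURATIONS * 2 * 25
# Experiment 2: 11 durations x 50 repetitions per session.
TRIALS_PER_SESSION_EXP2 = N_DURATIONS * 50

# Completed sessions per observer in Experiment 1: six observers with 4
# sessions, two with 2, and one who stopped after 1650 trials (3 sessions).
EXP1_SESSIONS = [4] * 6 + [2] * 2 + [3]
EXP1_TRIALS = [s * TRIALS_PER_SESSION_EXP1 for s in EXP1_SESSIONS]

# Experiment 2: eight observers with 4 sessions, one observer with 1925 trials.
EXP2_TRIALS = [4 * TRIALS_PER_SESSION_EXP2] * 8 + [1925]


def block_order(observer_index):
    """Hypothetical counterbalancing: even observers ABBA, odd observers BAAB.

    A = report location, B = report letter identity (labels are assumptions).
    """
    return "ABBA" if observer_index % 2 == 0 else "BAAB"


if __name__ == "__main__":
    print(TRIALS_PER_SESSION_EXP1, sum(EXP1_TRIALS))
    print(TRIALS_PER_SESSION_EXP2, sum(EXP2_TRIALS))
    print([block_order(i) for i in range(9)])
```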
QUANTIFICATION AND STATISTICAL ANALYSIS
The data were analyzed using custom scripts written in R (4.3.1, R Core Team, 2023). The data and analysis code, including all R packages used, can be found online at the Open Science Framework (https://osf.io/jpcu4/). Statistical comparisons were conducted using repeated-measures analyses of variance and paired (or one-sample) t-tests (with Cohen’s d_z as effect size and a significance criterion of α = .05), followed up by Bayesian t-tests (with a prior scale of r = 0.707) yielding the Bayes factor in favor of the alternative hypothesis (BF10).³⁶ Sample size for the first experiment was estimated based on previous research,⁴⁵,⁶¹ and the same sample size was then used for the second experiment, which provided a replication of the first one to safeguard the reported findings against a type-I error. Within the figures, bars visualize means, and error bars visualize 95% confidence intervals.³⁷
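To make the reported statistics concrete, the following Python sketch computes Cohen's d_z and the paired t statistic, and approximates a Bayes factor for a t-test with a Cauchy prior of scale r = 0.707 by numerically integrating the expression given by Rouder et al. (2009). This is a standard-library illustration only: the authors used R, the trapezoidal integration is a stand-in for whatever routine their packages use, and the function names and example data are hypothetical.

```python
import math
import statistics


def paired_effect(x, y):
    """Return (d_z, t, df) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    d_z = statistics.mean(diffs) / statistics.stdev(diffs)  # Cohen's d_z
    t = d_z * math.sqrt(n)                                  # paired t statistic
    return d_z, t, n - 1


def jzs_bf10(t, n, r=0.707, step=0.01):
    """Approximate BF10 for a one-sample or paired t-test with n observations.

    Integrates over g on a log scale (u = log g) with the trapezoidal rule;
    g has an inverse chi-square prior with 1 df, which gives a Cauchy prior
    of scale r on the standardized effect size.
    """
    nu = n - 1
    null_like = (1 + t * t / nu) ** (-(nu + 1) / 2)

    def integrand(u):
        g = math.exp(u)
        a = 1 + n * g * r * r
        like = a ** -0.5 * (1 + t * t / (a * nu)) ** (-(nu + 1) / 2)
        # Prior density of g multiplied by dg/du = g.
        prior = math.exp(-0.5 * u - 0.5 / g) / math.sqrt(2 * math.pi)
        return like * prior

    lo, hi = -10.0, 40.0  # integrand is negligible outside this range
    n_steps = int(round((hi - lo) / step))
    total = 0.5 * (integrand(lo) + integrand(hi))
    for i in range(1, n_steps):
        total += integrand(lo + i * step)
    return total * step / null_like


if __name__ == "__main__":
    # Hypothetical per-observer estimates for two conditions (not real data).
    location = [5.0, 7.0, 9.0]
    identity = [4.0, 5.0, 6.0]
    d_z, t, df = paired_effect(location, identity)
    print(d_z, t, df, jzs_bf10(t, len(location)))
```

A Bayes factor above 1 favors the alternative hypothesis and a value below 1 favors the null, which is why BF10 can complement the significance test in both directions.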