Detecting meaning in RSVP at 13 ms per picture
Mary C. Potter & Brad Wyble & Carl Erick Hagmann & Emily S. McCourt
Published online: 28 December 2013
© Psychonomic Society, Inc. 2014
Abstract The visual system is exquisitely adapted to the task
of extracting conceptual information from visual input with
every new eye fixation, three or four times a second. Here we
assess the minimum viewing time needed for visual compre-
hension, using rapid serial visual presentation (RSVP) of a
series of six or 12 pictures presented at between 13 and 80 ms
per picture, with no interstimulus interval. Participants were to
detect a picture specified by a name (e.g., smiling couple) that
was given just before or immediately after the sequence.
Detection improved with increasing duration and was better
when the name was presented before the sequence, but per-
formance was significantly above chance at all durations,
whether the target was named before or only after the se-
quence. The results are consistent with feedforward models,
in which an initial wave of neural activity through the ventral
stream is sufficient to allow identification of a complex visual
stimulus in a single forward pass. Although we discuss other
explanations, the results suggest that neither reentrant process-
ing from higher to lower levels nor advance information about
the stimulus is necessary for the conscious detection of rapidly
presented, complex visual information.
Keywords Picture perception · Feedforward processing · Attentional set · Conscious perception · Conceptual processing
Our eyes move to take in new information three or four times a
second, and our understanding of the visual input seems to
keep pace with this information flow. Eye fixation durations
may be longer than the time required to perceive a scene,
however, because they include time to encode the scene into
memory and to plan and initiate the next saccade. Indeed, a
picture as brief as 20 ms is easy to see if it is followed by a
blank visual field (e.g., Thorpe, Fize, & Marlot, 1996). How-
ever, presenting another patterned stimulus after the target as a
mask interferes with processing, particularly if the mask is
another meaningful picture (Intraub, 1984; Loftus, Hanna, &
Lester, 1988; Loschky, Hansen, Sethi, & Pydimarri, 2010;
Potter, 1976). With rapid serial visual presentation (RSVP)
of colored photographs of diverse scenes, each picture masks
the preceding one, so only the last picture is not masked.
Nonetheless, viewers can detect a picture in such a sequence
when given only a name for the target, such as picnic or harbor with boats
(Intraub, 1981; Potter, 1975, 1976; Potter, Staub, Rado, &
O'Connor, 2002). Here, we tested the limits of viewers'
detection ability by asking them to look for or recall named
targets in sequences of six (Exp. 1) or 12 (Exp. 2) pictures that
they had never seen before, presented for durations between
13 and 80 ms per picture.
One reason for using such short durations was to investi-
gate the possibility that the visual system has been configured
by experience to process scene stimuli directly to an abstract
conceptual level, such as "a picnic." In feedforward computational models of the visual system (Serre et al., 2007a; Serre,
Oliva, & Poggio, 2007b), the units that process a visual
stimulus are hierarchically arranged: Units representing small
regions of space (receptive fields) in the retina converge to
Electronic supplementary material The online version of this article
(doi:10.3758/s13414-013-0605-z) contains supplementary material,
which is available to authorized users.
M. C. Potter · C. E. Hagmann · E. S. McCourt
Department of Brain and Cognitive Sciences, Massachusetts Institute
of Technology, Cambridge, MA, USA

B. Wyble
Department of Psychology, Pennsylvania State University,
University Park, PA, USA

M. C. Potter (*)
Department of Brain and Cognitive Sciences, 46-4125,
Massachusetts Institute of Technology, 77 Massachusetts Avenue,
Cambridge, MA 02139, USA
Atten Percept Psychophys (2014) 76:270–279
DOI 10.3758/s13414-013-0605-z
represent larger and larger receptive fields and increasingly
abstract information along a series of pathways from V1 to
inferotemporal cortex, and higher to the prefrontal cortex. A
lifetime of visual experience is thought to tune this hierarchi-
cal structure, which acts as a filter that permits the categori-
zation of objects and scenes with a single, forward pass of
processing. In this model, even a very brief, masked presen-
tation might be sufficient for understanding a picture.
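The convergence described above can be caricatured in a few lines of code. The following Python sketch is purely illustrative (it is not the Serre et al. model, and all names are invented for this example): each stage pools adjacent responses into a unit with a larger receptive field, so one forward pass carries a progressively more abstract summary of the whole input to the top, with no feedback connections.

```python
def pool(layer, width=2):
    """One feedforward stage: each unit responds to a larger 'receptive
    field' by combining adjacent responses from the level below."""
    return [max(layer[i:i + width]) for i in range(0, len(layer) - width + 1, width)]

def feedforward(retina, n_stages=3):
    """A single forward pass: responses converge stage by stage, so the
    top level summarizes the entire input without any reentrant loop."""
    layer = retina
    for _ in range(n_stages):
        layer = pool(layer)
    return layer

print(feedforward([0.1, 0.9, 0.2, 0.3, 0.8, 0.1, 0.4, 0.2]))  # → [0.9]
```

Real models use learned filters rather than a fixed max, but the architectural point is the same: information flows strictly upward.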
A widely accepted theory of vision, however, is that per-
ception results from a combination of feedforward and feed-
back connections, with initial feedforward activation generat-
ing possible interpretations that are fed back and compared
with lower levels of processing for confirmation, establishing
reentrant loops (Di Lollo, 2012; Di Lollo, Enns, & Rensink,
2000; Enns & Di Lollo, 2000; Hochstein & Ahissar, 2002;
Lamme & Roelfsema, 2000). Such loops produce sustained
activation that enhances memory. It has been proposed that we
become consciously aware of what we are seeing only when
such reentrant loops have been established (Lamme, 2006). A
related suggestion is that consciousness arises from "recurrent
long-distance interactions among distributed thalamo-cortical
regions" (Del Cul, Baillet, & Dehaene, 2007, p. 2408). This
network is ignited as reentrant loops in the visual system are
formed (Dehaene, Kerszberg, & Changeux, 1998; Dehaene &
Naccache, 2001; Dehaene, Sergent, & Changeux, 2003; see
also Tononi & Koch, 2008). It has been estimated that reen-
trant loops connecting several levels in the visual system
would take at least 50 ms to make a round trip, which would
be consistent with the stimulus onset asynchronies (SOAs) that
typically produce backward masking.
Thus, when people view stimuli for 50 ms or less with
backward pattern masking, as in some conditions in the pres-
ent study, the observer may have too little time for reentrant
loops to be established between higher and lower levels of the
visual hierarchy before earlier stages of processing are
interrupted by the subsequent mask (Kovacs, Vogels, &
Orban, 1995; Macknik & Martinez-Conde, 2007). In that
case, successful perception would primarily result from the
forward pass of neural activity from the retina through the
visual system (DiCarlo, Zoccolan, & Rust, 2012; Hung,
Kreiman, Poggio, & DiCarlo, 2005; Perrett, Hietanen, Oram,
& Benson, 1992; Thorpe & Fabre-Thorpe, 2001). In support
of the possibility of feedforward comprehension, Liu, Agam,
Madsen, and Kreiman (2009) were able to decode object
category information from human visual areas within
100 ms after stimulus presentation.
An open question is what level of understanding is
achieved in the initial forward wave of processing. One be-
havioral approach to assessing how much is understood in the
feedforward sweep is to measure the shortest time to make a
discriminative response to a stimulus. Studies by Thorpe,
Fabre-Thorpe, and their colleagues (see the reviews by Thorpe
& Fabre-Thorpe, 2001, and Fabre-Thorpe, 2011) required
participants to make a go/no-go response to the presence of
a category such as animals (or vehicles, or faces) in photo-
graphs presented for 20 ms without a mask. They found that
differential electroencephalographic activity for targets began
about 150 ms after presentation. The shortest above-chance
reaction times (which would include motor response time)
were under 300 ms. Choice saccades to a face in one of two
pictures were even faster, as short as 100 ms (Crouzet,
Kirchner, & Thorpe, 2010). In another study (Bacon-Macé,
Kirchner, Fabre-Thorpe, & Thorpe, 2007), pictures with or
without animals were followed by texture masks at SOAs
between 6 and 107 ms; animal detection was at chance at 6
ms, but above chance starting at 12 ms, with performance
approaching an asymptote at 44 ms. These times suggested to
the investigators that the observers were relying on
feedforward activity, at least for their fastest responses.
A different question from that of feedback is whether a
selective set or expectation can modify or resonate with the
feedforward process (Llinás, Ribary, Contreras, &
Pedroarena, 1998), obviating the need for reentrant processing
and enabling conscious perception with presentation durations
shorter than 50 ms. It is well known that advance information
about a target improves detection. For example, in a recent
study by Evans, Horowitz, and Wolfe (2011), participants
viewed a picture for 20 ms, preceded and followed by texture
masks, and judged whether they saw a specified target (e.g.,
animal, beach, or street scene). Accuracy was consistently
higher when the target was specified just before the stimulus
was presented, rather than just after. Numerous studies have
shown that selective attention has effects on the visual system
in advance of expected stimuli (e.g., Cukur, Nishimoto, Huth,
& Gallant, 2013). For example, in a recent study using
multivoxel pattern analysis (Peelen & Kastner, 2011), the
amount of preparatory activity in object-selective cortex that
resembled activity when viewing objects in a given category
was correlated with successful item detection.
To evaluate the effect of attentional set on target detection,
in the present experiments we compared two conditions be-
tween groups. In one group, the target was named just before
the sequence (providing a specific attentional set), and in the
other, the target was named just after the sequence (with no
advance attentional set). In the latter case, the participant had
to rely on memory to detect the target. Previous studies of
memory for pictures presented in rapid sequences have shown
that memory is poor with durations in the range of 100–200
ms per picture (Potter & Levy, 1969; Potter et al., 2002).
Given these results, and the known benefits of advance infor-
mation already mentioned, we expected that advance infor-
mation would improve performance; the question was wheth-
er it would interact with duration, such that detection of the
target would be impossible at the shorter durations without
advance information. Such a result would conflict with
the hypothesis that feedforward processing, without
reentrance and without top-down information, is sufficient
for conscious identification.
Although both the feedforward and feedback models pre-
dict that performance will improve with presentation time, the
main questions addressed here were whether knowing the
identity of the target ahead of time would be necessary for
successful detection, particularly at high rates of presentation,
and whether we would observe a sharp discontinuity in per-
formance as the duration of the images was reduced below 50
ms, as is predicted by reentrant models. A seminal earlier
study (Keysers, Xiao, Földiák, & Perrett, 2001; see also
Keysers et al., 2005) showed successful detection using RSVP
at a duration as short as 14 ms, but the target was cued by
showing the picture itself, and pictures were reused for a
single participant so that they became familiar. In the present
experiments, by specifying the target with a name and by
using multiple pictures in each sequence that the participants
had never seen before, we forced participants to identify the
target at a conceptual level rather than simply matching spe-
cific visual features.
Experiment 1
Procedure Two groups of participants viewed an RSVP se-
quence of six pictures presented for 13, 27, 53, or 80 ms per
picture and tried to detect a target specified by a written name
(see Fig. 1). The one-to-four-word name reflected the gist of the
picture, as judged by the experimenters. Examples are swan,
traffic jam, boxes of vegetables, children holding hands, boat
out of water, campfire, bear catching fish, and narrow street.
For those in the before group, each trial began with a fixation
cross for 500 ms, followed by the name of the target for 700 ms,
and then a blank screen of 200 ms and the sequence of pictures.
A blank screen of 200 ms also followed the sequence, and then
the question "Yes or No?" appeared and remained in view until
the participant pressed "Y" or "N" on the keyboard to report
whether he or she had seen the target. Those in the after group
viewed a fixation cross for 500 ms at the beginning of the trial,
followed by a blank screen of 200 ms and the sequence. At the
end of the sequence, another 200-ms blank screen appeared, and
then the name was presented simultaneously with the yes/no
question until the participant responded.
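The before-group timeline can be summarized as an event list. This Python sketch merely restates the procedure above (the function name and data layout are ours, and the experiment itself was programmed in MATLAB):

```python
def before_trial_events(n_pictures=6, pic_ms=13):
    """Event sequence for a 'before' trial: fixation 500 ms, target name
    700 ms, blank 200 ms, the RSVP pictures, blank 200 ms, then the
    yes/no prompt, which stays up until the participant responds."""
    events = [("fixation", 500), ("target name", 700), ("blank", 200)]
    events += [(f"picture {i + 1}", pic_ms) for i in range(n_pictures)]
    events += [("blank", 200), ("yes/no prompt", None)]  # None = until response
    return events

# Total timed portion of a six-picture trial at 13 ms per picture:
print(sum(d for _, d in before_trial_events() if d is not None))  # → 1678
```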
On trials in which the target had been presented, the par-
ticipant's response was followed by a two-alternative forced
choice between two pictures that matched the target name.
The participant pressed the "G" or "J" key to indicate whether
the left or right picture, respectively, had been presented. On
no-target trials, the words "No target" appeared instead of the
pair of pictures.
Participants The 32 participants (17 women, 15 men) were
volunteers 18–35 years of age who were paid for their partic-
ipation. All signed a consent form approved by the MIT
Committee on the Use of Humans as Experimental Subjects.
Participants were replaced if they made more than 50% false
"yes" responses, overall, on nontarget trials, because such a
high false alarm rate suggested that the participant was not
following instructions, but was randomly guessing. One par-
ticipant was replaced in the before group, and three in
the after group.
Materials The stimuli were color photographs of scenes. The
pictures were new to the participants, and each picture was
presented only once. For the targets, two pictures were select-
ed that matched each target name; which one appeared in the
sequence was determined randomly. The other picture was
used as a foil in the forced choice test after each target-present
trial. The pictures were taken from the Web and from other
collections of pictures available for research use. They includ-
ed a wide diversity of subject matter: indoor and outdoor, both
with and without people. The pictures were resized to 300 ×
200 pixels and were presented in the center of the monitor on a
gray background. The horizontal visual angle was 10.3º at the
normal viewing distance of 50 cm. For the forced choice, two
300 × 200 pixel pictures were presented side by side.
Design A practice block was presented at 133 ms per picture,
followed by eight experimental blocks of trials. Across blocks,
the durations were 80, 53, 27, and 13 ms per picture, repeated
in the next four blocks. Each block contained 20 trials, includ-
ing five no-target trials. The target, which was never the first
or last picture, appeared in Serial Position 2, 3, 4, or 5,
balanced over target trials within each block. Across every
eight participants, the eight blocks of trials were rotated so that
the pictures in each block of trials were seen equally often at
each duration and in each half of the experiment.
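The rotation described above amounts to a cyclic shift of block assignments across participants. A Python sketch (the function name and the exact assignment scheme are assumptions; the paper states only that blocks were rotated across every eight participants):

```python
# Duration slots for the eight experimental blocks within a session.
DURATIONS_MS = [80, 53, 27, 13, 80, 53, 27, 13]

def block_order_for(participant, n_blocks=8):
    """Cyclically shift the stimulus blocks by one position per participant,
    so across every eight participants each block is paired with every
    duration slot (and each half of the session) exactly once."""
    shift = participant % n_blocks
    return [(i + shift) % n_blocks for i in range(n_blocks)]

print(block_order_for(0))  # → [0, 1, 2, 3, 4, 5, 6, 7]
print(block_order_for(1))  # → [1, 2, 3, 4, 5, 6, 7, 0]
```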
Apparatus The experiment was programmed with MATLAB
7.5.0 and the Psychophysics Toolbox extension (Brainard,
1997) version 3, and was run on a Mac mini with a 2.4-GHz
Intel Core 2 Duo processor. The Apple 17-in. CRT monitor
was set to a 1,024 × 768 resolution with a 75-Hz refresh rate.
The room was normally illuminated. Timing errors sometimes
occur in RSVP sequences (McKeeff, Remus, & Tong, 2007).
Precision was controlled by using Wyble's Stream package for
MATLAB. We checked the actual timing on each refresh
cycle in each of the groups, and excluded trials in which a
timing error of ±12 ms (equivalent to a single refresh of the
monitor) or greater affected the target picture or the pictures
immediately before and after the target. Since the timing
errors were random, they increased noise in the data but
did not produce any systematic bias. In Experiment 1,
an average of 22% of the target trials were removed in
the name-before group, and 10% were removed in the
name-after group. In Experiment 2,timingerrorsoc-
curred on fewer than 1% of the trials.
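The exclusion rule can be made concrete with a short sketch. The Python below illustrates the criterion; it is not the authors' MATLAB code, and the function name and input format are assumptions. Given measured onset times, a trial is dropped if the target picture or either neighbor deviated from its nominal duration by about one refresh (12–13 ms at 75 Hz) or more.

```python
def trial_has_timing_error(onsets_ms, nominal_ms, target_index, tol_ms=12.0):
    """Flag a trial for exclusion if the target picture or the pictures
    immediately before or after it deviated from the nominal duration by
    tol_ms or more. onsets_ms holds successive picture onset times, with
    the final entry marking the offset of the last picture."""
    durations = [b - a for a, b in zip(onsets_ms, onsets_ms[1:])]
    check = range(max(0, target_index - 1), min(len(durations), target_index + 2))
    return any(abs(durations[i] - nominal_ms) >= tol_ms for i in check)

# A dropped frame doubles the fourth picture's duration (26.6 ms vs. 13.3 ms):
onsets = [0.0, 13.3, 26.7, 53.3, 66.7, 80.0]
print(trial_has_timing_error(onsets, 13.3, target_index=3))  # → True
```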
Analyses Repeated measures analyses of variance (ANOVAs)
were carried out on individual participants' d′ measures, as a
function of before–after group and presentation duration (80, 53,
27, or 13 ms per picture). Planned paired t tests at each duration,
separated for each group, compared d′ with chance (0.0). Serial
position effects were analyzed for the proportions of hits on
target-present trials (since there was no way to estimate false
"yes" responses as a function of serial position, we did not use d′).
Separate ANOVAs were carried out on the accuracy of the forced
choice responses on target-present trials, conditional on whether
the participant had responded "yes" (a hit) or "no" (a miss).
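For reference, the d′ measure used throughout is the z-transformed hit rate minus the z-transformed false alarm rate. A minimal Python sketch follows (the clipping of extreme rates is a common convention we add here; the paper does not report how boundary rates were handled):

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Yes-no sensitivity: z(hit rate) - z(false-alarm rate). Rates of
    exactly 0 or 1 make the z-transform infinite, so we clip them
    slightly away from the boundaries (an assumption; see lead-in)."""
    z = NormalDist().inv_cdf
    clip = lambda r: min(max(r, 0.001), 0.999)
    return z(clip(hit_rate)) - z(clip(false_alarm_rate))

# Example: 75% hits against a 20% false alarm rate.
print(round(d_prime(0.75, 0.20), 2))  # → 1.52
```

Chance performance (hit rate equal to the false alarm rate) gives d′ = 0, which is the value the planned t tests compare against.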
Results and discussion
The results are shown in Fig. 2. For the d′ ANOVA of yes–no
responses (Fig. 2A), we found main effects of name position,
F(1, 30) = 4.792, p < .05, η² = .066, and duration, F(3, 90) =
38.03, p < .001, η² = .414, as well as an interaction, F(3, 90) =
7.942, p < .001, η² = .129. As Fig. 2 shows, having the target
name presented before rather than after the sequence benefited
detection substantially at 80 ms, but not at all at 13 ms, with the
other durations falling in between. Detection improved as the
duration increased from 13 to 80 ms. Separate paired t tests, two-
tailed, showed that d′ was significantly above chance (p < .001) at
each duration in each group. For the name-before group at 13 ms,
t(15) = 4.45, p < .001, SEM = 0.162; the significance of the
difference increased at each of the other durations. For the name-
after group at 13 ms, t(15) = 7.91, p < .0001, SEM = 0.139, and
at 27 ms, t(15) = 5.60, p < .0001, SEM = 0.122; the significance
of the difference increased at the other two durations.
In an ANOVA of the effect of the serial position of the
target on the proportions of correct "yes" responses, the main
effect of serial position was significant, F(3, 90) = 4.417, p <
.01, η² = .023. The means were .71, .71, .69, and .75 for
Serial Positions 2, 3, 4, and 5, respectively, suggesting a small
recency effect. A marginal interaction with name position,
F(3, 90) = 2.702, p = .05, η² = .014, indicated that this
effect was larger when the name came after the sequence. This
was confirmed by separate analyses of serial position in the
before and after groups: Serial position was not significant in
the before group (p = .095), but was in the after group, F(3,
45) = 11.23, p < .001, η² = .073, for which the means were
.67, .69, .67, and .75.
An ANOVA of the two-alternative forced choice results on
target-present trials (Fig. 2B) showed that accuracy was high
(M = .73) when participants had reported "yes" to the target (a
hit), but at chance (M = .52) when they had reported "no" (a
miss), F(1, 30) = 57.92, p < .001, η² = .253. The main effect of
group (before/after) was significant, F(1, 30) = 6.70, p < .05,
η² = .018, and interacted with whether the response had been
"yes" or "no", F(1, 30) = 4.63, p < .05, η² = .026. When
participants reported having seen the target, forced choice
accuracy was relatively better in the before than in the after
condition, although both were above chance. When the target
was missed, both groups were close to chance. We found a main
effect of duration, F(3, 90) = 3.76, p < .05, η² = .048, and no
other significant effects.

Fig. 1 Illustration of a trial in Experiment 1. The target name appeared
either 900 ms before the first picture or 200 ms after the last picture, and
the two forced-choice pictures appeared after the participant responded
"yes" or "no"
The main findings of Experiment 1 are that viewers
can detect and retain information about named targets
that they have never seen before at an RSVP duration
as short as 13 ms, and that they can do so even when
they have no information about the target until after
presentation. Furthermore, no clear discontinuity in per-
formance emerged as duration was decreased from 80 to
13 ms. If reentrant feedback from higher to lower levels
played a necessary role in extracting conceptual infor-
mation from an image, one would expect an inability to
detect any targets at 27 and 13 ms, even in the before
condition, contrary to what we observed. If advance
information about the target resonated or interacted with
the incoming stimuli, accounting for successful perfor-
mance at 27 and 13 ms without reentrance, then perfor-
mance should have been near chance at those durations
in the after condition, again contrary to the results. A
feedforward account of detection is more consistent with
the results, suggesting that a presentation as short as 13
ms, even when masked by following pictures, is
sufficient on some trials for feedforward activation to
reach a conceptual level, without selective attention.
Experiment 2
One question about the results of Experiment 1 was
whether they would generalize to sequences longer than
six pictures. Given that targets were limited to only four
serial positions (excluding the first and last pictures),
could participants have focused on just those pictures,
maintaining one or more of them in working memory to
compare subsequently to the target name? In that case,
increasing the number of pictures to 12 (in Exp. 2)
should markedly reduce the proportion detected, at least
in the name-after condition.
The method was the same as that of Experiment 1,
except as noted.
Participants The 32 participants (22 women, 10 men) were
volunteers 18–35 years of age, most of whom were college
students; none had participated in Experiment 1. They were
paid for their participation. All signed a consent form ap-
proved by the MIT Committee on the Use of Humans as
Experimental Subjects. Participants were replaced if they
made more than 50% false "yes" responses, overall, on non-
target trials. No participant was replaced in the before group,
and four were replaced in the after group.

Fig. 2 Results of Experiment 1, in which participants detected a picture
that matched a name given before or after the sequence of six images (N =
16 in each group). Error bars depict the standard errors of the means. (A)
d′ results for yes–no responses. (B) Proportions correct on a two-
alternative forced choice between two pictures with the same name on
target-present trials, conditional on whether the participant had reported
"yes" in the detection task (labeled Hit) or "no" (Miss). Chance = .5
Design The design was like that of Experiment 1, with two
groups of participants, one with the target presented before the
sequence, the other with the target presented after. The main
difference was that trials consisted of 12 rather than six pictures.
To make the 12-picture sequences, two 6-picture sequences from
Experiment 1 were combined by randomly pairing the trials in a
given block, with the restriction that the two targets in a pair were
in the same serial position (2, 3, 4, or 5; after combination, the
two potential targets were in Serial Positions 2 and 8, or 3 and 9,
etc.). To end up with an even number of six-item sequences, we
generated two new six-picture trials per block, one with a target
and one without. Each block contained 11 trials, eight with
targets and three without. Each of the eight target serial positions
was represented once per block. Which of the two target names
was used was counterbalanced between subjects within each
group. Across participants within a group, the eight blocks of
trials were rotated so that the pictures in each block of
trials were seen equally often at each duration and in
each half of the experiment.
Results and discussion
The results are shown in Fig. 3. The main results were similar to
those of Experiment 1. In the d′ analysis of the yes–no re-
sponses, we found main effects of whether the name was given
before or after, F(1, 30) = 8.785, p < .01, η² = .083, and
duration, F(3, 90) = 28.67, p < .001, η² = .397. Detection was
more accurate when the name was given before the sequence
rather than after, and it improved as the duration increased from
13 to 80 ms. The interaction was not significant (p = .22).
Separate paired t tests, two-tailed, showed that d′ was signifi-
cantly above chance (p < .02) at each duration in each group.
For the name-before group at 13 ms, t(15) = 3.28, p < .01, SEM
= 0.152; the significance of the difference increased at each of
the other durations. For the name-after group at 13 ms, t(15) =
2.83, p < .02, SEM = 0.155; the significance of the difference
again increased at each of the other durations.
In an ANOVA of the effect of the serial position of the
target on the proportions of correct "yes" responses, the main
effect of serial position was significant, F(7, 210) = 5.20, p <
.001, η² = .032. The means were .57, .54, .66, .76, .66, .63,
.64, and .62 for Serial Positions 2, 3, 4, 5, 8, 9, 10, and 11,
respectively, suggesting a slight disadvantage for primacy, but
no recency benefit. We found no interactions.
An ANOVA of the two-alternative forced choice results on
target-present trials (Fig. 3B) showed that accuracy was fairly
high (M = .67) when participants had reported "yes" to the
target (a hit) but was near chance (M = .52) when they had
reported "no" (a miss), F(1, 30) = 20.61, p < .001.
The main effect of group (before/after) was not significant,
F(1, 30) = 2.34, p = .136, η² = .018, but a marginal
interaction did emerge with whether the response had been
"yes" or "no", F(1, 30) = 2.88, p = .10, η² = .019. As in
Experiment 1, having the name before was only better than
having the name after when the participant reported having
seen the target; when the trial was a miss, both groups were
close to chance. We found no main effect of duration, F(3,
90) = 1.35, but did find an interaction with hit/miss,
F(3, 90) = 6.43, p < .01, η² = .064: As can be seen in
Fig. 3B, the hit benefit was larger at longer durations.
Altogether, the results of Experiment 2 replicated the main
results of Experiment 1, but now with 12 pictures per se-
quence rather than six (see Fig. 4). An ANOVA compared
the d′ results of the two experiments. Performance (as d′) was
significantly higher with six-picture sequences (M = 1.33)
than with 12-picture sequences (M = 1.06), F(1, 60) =
9.83, p < .01, η² = .057. No interactions with exper-
iment were significant.
Clearly, we can reject the hypothesis that participants could
encode only two or three pictures in working memory; otherwise,
performance would have fallen more dramatically in Experiment
2, especially in the after condition, in which participants had to
retain information about the pictures for later retrieval.
The results also demonstrate that a specific attentional
expectation is not required for extracting conceptual informa-
tion from a stream of pictures: Performance remained sub-
stantially above chance at all durations when the target was
specified after the sequence. The forced choice results indi-
cate, however, that visual details were lost at the two shorter
durations with 12 pictures to retain, even when the target was
correctly reported. In the after condition, however, the forced
choice test was slightly delayed relative to the before condi-
tion, because the participants had to read the name of the target
and scan their memory of the sequence before making a yes
or noresponse. This intervening processing may account
for the somewhat reduced performance in the forced choice
task in both Experiments 1 and 2 when the target name
followed the sequence.
General discussion
The results of both experiments show that conceptual under-
standing can be achieved when a novel picture is presented as
briefly as 13 ms and masked by other pictures. Even when
participants were not given the target name until after they had
viewed the entire sequence of six or 12 pictures, their perfor-
mance was above chance even at 13 ms, indicating that a top-
down attentional set is not required in order to rapidly extract
and at least briefly retain conceptual content from an RSVP
stream. The number of pictures in the sequence and their
serial positions had little effect on performance, suggesting
that pictures were processed immediately rather than accumu-
lating in a limited-capacity memory buffer for subsequent
processing. This pattern of results supports the hypothesis that
feedforward processing is sufficient for the conceptual com-
prehension of pictures.

Footnote: Because of the relatively large number of replaced participants
in Experiment 2's after group, we also ran the main d′ analysis with the
original 16 participants. Although d′ was slightly lower with the original
group than with the replaced participants, none of the significance levels
changed, including the comparison with the before group.
As expected, detection was more accurate, the longer the
duration per picture. However, it was striking that no sharp
drop in detection was apparent at or below a duration of 50
ms, contrary to the predictions of feedback or reentrant models
of conscious perception (e.g., Del Cul et al., 2007; Lamme,
2006). Instead, performance declined gradually with shorter
durations, but remained well above chance at 13 ms. More-
over, when viewers reported that they had detected the target,
they were usually above chance in selecting it in a forced
choice between two pictures, both of which fit the target
name: That is, they remembered more about the picture than
simply the concept provided by the target name. When they
had not detected the target, their forced choice was near
chance, suggesting that the visual features of unidentified
pictures were not retained.
Although the present behavioral results cannot entirely rule
out feedback, they do present a challenge to existing reentrant
models. They also raise a further question: How can concep-
tual understanding persist long enough to be matched to a
name presented 200 ms after the offset of the final masking
picture, given that the target might have been any of the six or
12 pictures just viewed? The answer to this question may lie in
the "carwash" metaphor of visual processing (Moore & Wolfe,
2001; Wolfe, 2003), in which each stimulus is passed from
one level of processing to the next. In such a model, multiple
stimuli can be in the processing pipeline at once. At the end of
this pipeline, the stimuli, having now been processed to the
level of concept, may persist in local recurrent networks that
sustain activation for several pictures in parallel, at least
briefly. In such a model, concepts are presumably represented
in a multidimensional, sparsely populated network in which
visual masks may not be effective if they are not also concep-
tually similar to the item being masked. The finding that a
forced choice between two conceptually equivalent pictures
was above chance only if the participant correctly detected the
target is consistent with the conjecture that when feedforward
processing does not reach a conceptual level, lower levels of
representation are already masked, and no featural informa-
tion can be accessed.
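The carwash idea, in which several stimuli occupy successive processing stages at once, can be caricatured in a few lines of code. This toy sketch is purely illustrative; the stage names and the one-stage-per-step advance are assumptions, not part of the authors' account:

```python
from collections import deque

STAGES = ["features", "shape", "object", "concept"]

def run_pipeline(stimuli, steps):
    """Advance each stimulus one stage per time step; a new
    stimulus enters at every step, so several pictures are
    'in the wash' simultaneously."""
    in_flight = []            # (stimulus, stage index) pairs
    finished = []             # reached the conceptual level
    pending = deque(stimuli)
    for _ in range(steps):
        moved = []
        for stim, stage in in_flight:
            if stage + 1 == len(STAGES):
                finished.append(stim)   # exits at the concept stage
            else:
                moved.append((stim, stage + 1))
        in_flight = moved
        if pending:                     # next picture enters
            in_flight.append((pending.popleft(), 0))
    return in_flight, finished

in_flight, done = run_pipeline(["p1", "p2", "p3", "p4"], steps=4)
# After 4 steps all four pictures occupy different stages of the
# pipeline at once; none has yet cleared the conceptual level.
```

The point of the caricature is only that a strictly feedforward cascade permits overlap: each picture is one stage ahead of its successor, so brief parallel persistence at the end of the pipeline is enough to support a match to a name given after the sequence.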
[Fig. 3. Results of Experiment 2, in which participants detected a picture that matched a name given before or after a sequence of 12 images (N = 16 in each group). Error bars depict the standard errors of the means. (A) d′ results for yes–no responses. (B) Proportions correct on a two-alternative forced choice between two pictures with the same name on target-present trials, conditional on whether the participant had reported "yes" in the detection task (labeled Hit) or "no" (Miss). Chance = .5]

Atten Percept Psychophys (2014) 76:270–279

The finding that observers can perceive and comprehend
conceptual information from such brief images extends pre-
vious evidence that a purely feedforward mode of processing
is capable of decoding complex information in a novel image
(e.g., DiCarlo et al., 2012; Serre et al., 2007a; Thorpe et al.,
1996). Feedforward models are consistent with a range of
neural results. For example, in a study by Keysers et al.
(2001), recordings were made of individual neurons in the
cortex of the anterior superior temporal sulcus (STSa) of
monkeys as they viewed continuous RSVP sequences of
pictures; the monkeys' only task was to fixate the screen.
Neurons in STSa that were shown to respond selectively to a
given picture at a relatively slow presentation rate of 200 ms
per picture also responded selectively (although not as
strongly) to the same picture at presentations as short
as 14 ms per image.
The present behavioral results suggest that feedforward
processing is capable of activating the conceptual identity of
a picture, even when reentrant processing has presumably
been blocked because the picture is presented briefly and is
then masked by immediately following pictures. Since partic-
ipants were capable of reporting the presence of a target under
these conditions, the results strongly suggest that reentrant
processing is not always necessary for conscious processing.
They are consistent with the possibility, however, that reen-
trant loops facilitate processing and may be essential to
comprehending the details of an image. For example, a rapid
but coarse first pass of low-spatial-frequency information may
provide global category information that is subsequently re-
fined by reentrant processing (e.g., Bar et al., 2006). Work
with monkeys has shown that neurons that are selective for
particular faces at a latency of about 90 ms give additional
information about facial features beginning about 50 ms later
(Sugase, Yamane, Ueno, & Kawano, 1999). Reentrant pro-
cessing therefore might be involved after an initial
feedforward pass (Di Lollo, 2012).
The present findings can be contrasted with those of
masked-priming studies in which the prime is not consciously
seen, although it has an effect on the response to an immedi-
ately following stimulus. In a typical experiment, a brief
presentation of a word in the range of 25–60 ms (the prime)
is followed by a second, unmasked word, to which
the participant must respond (Dehaene et al., 2001; Forster &
Davis, 1984). If the prime is identical or related to the target
word, the response to the latter is faster and more accurate than
with no prime or an unrelated prime, even when the prime is
not consciously identified. In such studies, the participants'
focus of attention is on the final word, whose longer duration
permits it to receive full, recurrent processing that might block
awareness of the more vulnerable information from the prime
that was extracted during the feedforward sweep. In the pres-
ent experiments, in contrast, the masking stimuli were of the
same duration as the preceding target stimulus, and all were
potential targets. In these conditions, even durations as short
as 13 ms are clearly sufficient, on a significant proportion of
trials, to drive conscious detection, identification, and imme-
diate recognition memory.
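At 13 ms per picture with no interstimulus interval, each image occupies roughly a single refresh cycle of a conventional monitor. A back-of-the-envelope check (the 75-Hz refresh rate is an assumption for illustration; the apparatus details are not reproduced in this excerpt):

```python
def frame_ms(refresh_hz: float) -> float:
    """Duration of one display refresh cycle in milliseconds."""
    return 1000.0 / refresh_hz

# One frame at 75 Hz lasts ~13.3 ms, matching the shortest
# per-picture duration; the longest duration, 80 ms, would
# then span about six frames.
print(round(frame_ms(75), 1))    # 13.3
print(round(80 / frame_ms(75)))  # 6
```

On this assumption, the reported durations between 13 and 80 ms fall on whole multiples of a single 13.3-ms frame, which is why RSVP studies quote such seemingly odd values.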
Finally, perhaps our most striking finding is that perfor-
mance was surprisingly good, even when the target name was
given only after the sequence. It has long been assumed that
the detection of rapidly presented targets in an RSVP stream
(e.g., Potter, 1976) is possible only because the participants
had the opportunity to configure their attentional filters in
advance (e.g., Peelen & Kastner, 2011). Indeed, Potter
(1976) found that the ability to detect an image named in
advance was much greater than the ability to recognize pic-
tures later in a yes–no test of all of the pictures mixed with
distractors. Other research (e.g., Potter et al., 2002) has indi-
cated that the memory traces generated by RSVP are fragile
and are rapidly degraded by successive recognition testing.

[Fig. 4. Comparison of the d′ results of Experiment 1 (six pictures) and Experiment 2 (12 pictures), for target names given before or after the sequence (legend: Before – 6, Before – 12, After – 6, After – 12). Error bars depict the standard errors of the means.]
When participants are given an immediate recognition test of
just one item, however, the present results show that they are
able to detect it in their decaying memory trace at a level of
accuracy not far below that achieved when the target was
prespecified at the start of the trial. This result is consistent
with the idea that a single forward sweep as short as 13 ms is
capable of extracting a picture's conceptual meaning without
advance knowledge. Moreover, the pictures' conceptual
identities can be maintained briefly, enabling one to be matched to
a name presented after the sequence.
A possible role for such rapid visual understanding in
normal experience would be to provide nearly instantaneous
conceptual activation that enables immediate action when
necessary, without waiting to refine understanding by reen-
trant processing or by the kind of conscious reflection that
requires a stable recurrent network.
Author note This research was supported by National Institutes of
Health Grant No. MH47432. M.C.P. developed the study concept. All
of the authors contributed to the study design. The testing, data collection,
and data analysis were performed by E.S.M. and C.E.H. under the
supervision of M.C.P. and B.W. M.C.P. drafted the article, and B.W.
and C.E.H. provided critical revisions. All of the authors approved the
final version of the article for submission. We thank Chidinma Egbukichi
and Steven Yu for assistance.
References

Bacon-Macé, N., Kirchner, H., Fabre-Thorpe, M., & Thorpe, S. J. (2007). Effects of task requirements on rapid natural scene processing: From common sensory encoding to distinct decisional mechanisms. Journal of Experimental Psychology: Human Perception and Performance, 33, 1013–1026. doi:10.1037/0096-1523.33.5.1013
Bar, M., Kassam, K. S., Ghuman, A. S., Boshyan, J., Schmid, A. M., Dale, A. M., & Halgren, E. (2006). Top-down facilitation of visual recognition. Proceedings of the National Academy of Sciences, 103, 449–454. doi:10.1073/pnas.0507062103
Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10, 433–436. doi:10.1163/156856897X00357
Crouzet, S. M., Kirchner, H., & Thorpe, S. J. (2010). Fast saccades toward faces: Face detection in just 100 ms. Journal of Vision, 10(4):16, 1–17. doi:10.1167/10.4.16
Cukur, T., Nishimoto, S., Huth, A. G., & Gallant, J. L. (2013). Attention during natural vision warps semantic representation across the human brain. Nature Neuroscience, 16, 763–770.
Dehaene, S., Kerszberg, M., & Changeux, J.-P. (1998). A neuronal model of a global workspace in effortful cognitive tasks. Proceedings of the National Academy of Sciences, 95, 14529–14534.
Dehaene, S., & Naccache, L. (2001). Towards a cognitive neuroscience of consciousness: Basic evidence and a workspace framework. Cognition, 79, 1–37. doi:10.1016/S0010-0277(00)00123-2
Dehaene, S., Naccache, L., Cohen, L., Le Bihan, D., Mangin, J.-F., Poline, J.-B., & Rivière, D. (2001). Cerebral mechanisms of word masking and unconscious repetition priming. Nature Neuroscience, 4, 752–758. doi:10.1038/89551
Dehaene, S., Sergent, C., & Changeux, J.-P. (2003). A neuronal network model linking subjective reports and objective physiological data during conscious perception. Proceedings of the National Academy of Sciences, 100, 8520–8525.
Del Cul, A., Baillet, S., & Dehaene, S. (2007). Brain dynamics underlying the nonlinear threshold for access to consciousness. PLoS Biology, 5, 2408–2423.
DiCarlo, J. J., Zoccolan, D., & Rust, N. C. (2012). How does the brain solve visual object recognition? Neuron, 73, 415–434. doi:10.1016/
Di Lollo, V. (2012). The feature-binding problem is an ill-posed problem. Trends in Cognitive Sciences, 16, 317–321.
Di Lollo, V., Enns, J. T., & Rensink, R. A. (2000). Competition for consciousness among visual events: The psychophysics of reentrant visual pathways. Journal of Experimental Psychology: General, 129, 481–507. doi:10.1037/0096-3445.129.4.481
Enns, J. T., & Di Lollo, V. (2000). What's new in visual masking? Trends in Cognitive Sciences, 4, 345–352. doi:10.1016/S1364-6613(00)01520-5
Evans, K. K., Horowitz, T. S., & Wolfe, J. W. (2011). When categories collide: Accumulation of information about multiple categories in rapid scene perception. Psychological Science, 22, 739–746. doi:10.
Fabre-Thorpe, M. (2011). The characteristics and limits of rapid visual categorization. Frontiers in Psychology, 2, 243. doi:10.3389/fpsyg.
Forster, K. I., & Davis, C. (1984). Repetition priming and frequency attenuation in lexical access. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 680–698. doi:10.1037/0278-
Hochstein, S., & Ahissar, M. (2002). View from the top: Hierarchies and reverse hierarchies in the visual system. Neuron, 36, 791–804.
Hung, C. P., Kreiman, G., Poggio, T., & DiCarlo, J. J. (2005). Fast readout of object identity from macaque inferior temporal cortex. Science, 310, 863–866.
Intraub, H. (1981). Rapid conceptual identification of sequentially presented pictures. Journal of Experimental Psychology: Human Perception and Performance, 7, 604–610. doi:10.1037/0096-1523.7.3.604
Intraub, H. (1984). Conceptual masking: The effects of subsequent visual events on memory for pictures. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 115–125. doi:
Keysers, C., Xiao, D.-K., Földiák, P., & Perrett, D. I. (2001). The speed of sight. Journal of Cognitive Neuroscience, 13, 90–101.
Keysers, C., Xiao, D.-K., Földiák, P., & Perrett, D. I. (2005). Out of sight but not out of mind: The neurophysiology of iconic memory in the superior temporal sulcus. Cognitive Neuropsychology, 22, 316–332.
Kovacs, G., Vogels, R., & Orban, G. A. (1995). Cortical correlate of backward masking. Proceedings of the National Academy of Sciences, 92, 5587–5591.
Lamme, V. A. F. (2006). Towards a true neural stance on consciousness. Trends in Cognitive Sciences, 10, 494–501. doi:10.1016/j.tics.2006.
Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences, 23, 571–579. doi:10.1016/S0166-2236(00)01657-X
Liu, H., Agam, Y., Madsen, J. R., & Kreiman, G. (2009). Timing, timing, timing: Fast decoding of object information from intracranial field potentials in human visual cortex. Neuron, 62, 281–290.
Llinás, R., Ribary, U., Contreras, D., & Pedroarena, C. (1998). The neuronal basis for consciousness. Philosophical Transactions of the Royal Society B, 353, 1841–1849.
Loftus, G. R., Hanna, A. M., & Lester, L. (1988). Conceptual masking: How one picture captures attention from another picture. Cognitive Psychology, 20, 237–282. doi:10.1016/0010-0285(88)90020-5
Loschky, L. C., Hansen, B. C., Sethi, A., & Pydimarri, T. N. (2010). The role of higher order image statistics in masking scene gist recognition. Attention, Perception, & Psychophysics, 72, 427–444. doi:10.
Macknik, S. L., & Martinez-Conde, S. (2007). The role of feedback in visual masking and visual processing. Advances in Cognitive Psychology, 3, 125–152.
McKeeff, T. J., Remus, D. A., & Tong, F. (2007). Temporal limitations in object processing across the human ventral visual pathway. Journal of Neurophysiology, 98, 382–393.
Moore, C. M., & Wolfe, J. M. (2001). Getting beyond the serial/parallel debate in visual search: A hybrid approach. In K. Shapiro (Ed.), The limits of attention: Temporal constraints on human information processing (pp. 178–198). Oxford, UK: Oxford University Press.
Peelen, M. V., & Kastner, S. (2011). A neural basis for real-world visual search in human occipitotemporal cortex. Proceedings of the National Academy of Sciences, 108, 12125–12130.
Perrett, D., Hietanen, J., Oram, M., & Benson, P. (1992). Organization and functions of cells responsive to faces in the temporal cortex. Philosophical Transactions of the Royal Society B, 335, 23–30.
Potter, M. C. (1975). Meaning in visual search. Science, 187, 965–966.
Potter, M. C. (1976). Short-term conceptual memory for pictures. Journal of Experimental Psychology: Human Learning and Memory, 2, 509–522.
Potter, M. C., & Levy, E. I. (1969). Recognition memory for a rapid sequence of pictures. Journal of Experimental Psychology, 81, 10–15. doi:10.1037/h0027470
Potter, M. C., Staub, A., Rado, J., & O'Connor, D. H. (2002). Recognition memory for briefly-presented pictures: The time course of rapid forgetting. Journal of Experimental Psychology: Human Perception and Performance, 28, 1163–1175. doi:10.1037/0096-
Serre, T., Kreiman, G., Kouh, M., Cadieu, C., Knoblich, U., & Poggio, T. (2007a). A quantitative theory of immediate visual recognition. Progress in Brain Research, 165, 33–56.
Serre, T., Oliva, A., & Poggio, T. (2007b). A feedforward architecture accounts for rapid categorization. Proceedings of the National Academy of Sciences, 104, 6424–6429.
Sugase, Y., Yamane, S., Ueno, S., & Kawano, K. (1999). Global and fine information coded by single neurons in the temporal visual cortex. Nature, 400, 869–873.
Thorpe, S., & Fabre-Thorpe, M. (2001). Seeking categories in the brain. Science, 291, 260–263.
Thorpe, S., Fize, D., & Marlot, C. (1996). Speed of processing in the human visual system. Nature, 381, 520–522.
Tononi, G., & Koch, C. (2008). The neural correlates of consciousness: An update. Annals of the New York Academy of Sciences, 1124.
Wolfe, J. M. (2003). Moving towards solutions to some enduring controversies in visual search. Trends in Cognitive Sciences, 7, 70–76.
Atten Percept Psychophys (2014) 76:270279 279
Reproduced with permission of the copyright owner. Further reproduction prohibited without
... Les recherches en sciences cognitives prouvent sans équivoque que le cerveau humain traite les stimuli visuels plus rapidement que les stimuli verbaux (Childers, 1985) (Kiat & Belli, 2018) (Potter, Wyble, & McCourt, 2013) cf. (Khaneman, 2012). ...
... Traitement cognitif des signes visuels et verbauxpanneaux routiersEn effet, l'usage privilégie des signes iconiques par rapport à des messages verbaux sur les panneaux routiers. Ceci est conditionné non seulement par le fait que ces premiers n'exigent pas de décodage linguistique par les chauffeurs allophones, mais également car ils sont compris quasi immédiatement 131 par les automobilistes dans une situation ou le reflexe joue un rôle 131 Selon les chercheurs de MIT, la compréhension des stimuli visuels complexes peut être accomplie en seulement 13 ms(Potter, Wyble, & McCourt, 2013). ...
L'analyse scientifique des unités non-verbales occupe une place encore marginale au sein des études d'interprétation. Or, le contexte éminemment interculturel des interactions exolingues interprétées exige de reconnaître le caractère multimodal des énoncés-sources pour en analyser les paramètres d'influence. Interdisciplinaire, le présent travail se propose d'examiner les modalités de prise en compte de la gestualité co-verbale dans la pratique professionnelle des interprètes en service public (ISP). Sur le plan théorique, cette recherche se donne pour objectif de tracer le chemin d'évolution du paradigme d'interprète allant d'un être transparent, jusqu'au médiateur interculturel. Elle s'articule par ailleurs autour de l'analyse du non-verbal au travers du prisme des modèles de la communication et de celui des études des propriétés sémiotiques des unités de sens du système visuel. Ces opérations mènent à élaborer une typologie des gestes observables en ISP, inspirée des classements avancés par des gestualistes tels que D. McNeill, J. Cosnier et F. Poyatos. La méthode adoptée repose sur une triangulation de données, impliquant d'abord une enquête menée auprès de 60 interprètes professionnels, des entretiens individuels ensuite, et enfin un corpus multimodal. L'analyse qui en découle permet de révéler des différences fondamentales entre la production d'une part et les perspectives de la perception des gestes co-verbaux d'autre part. Le corpus audiovisuel réunit ici des interactions authentiques et d'autres semi-contrôlées, en contexte médical, social et policier, impliquant 16 langues de travail différentes. L'analyse des séquences vidéo d'une durée totale de 13015 secondes, annotées à l'aide du logiciel ELAN, permet d'établir les profils gestuels des acteurs et d'examiner les schémas et les contextes de reproduction des gestes par les interprètes, pour en déduire des récurrences. 
Les résultats de l'étude suggèrent ainsi que la gestualité co-verbale participe aux processus de co-construction et de négociation du sens, facilite la médiation interculturelle et contribue à l'élaboration des relations de confiance dans des situations d'asymétrie de pouvoir. C'est pourquoi, la sensibilisation à la place inhérente du non-verbal dans les interactions en service public devrait faire partie de la formation des interprètes dont la mission essentielle consiste à assurer une médiation efficace, non seulement entre des systèmes linguistiques différents mais aussi entre des univers culturels distincts.
... The sensorimotor rhythms (SMR) paradigm is one motor imagery technique that is defined as the imagined movements of large body parts such as the hands, feet, or tongue [4]. This imagined movement causes event-related desynchronization in the mu (8)(9)(10)(11)(12) and beta (18)(19)(20)(21)(22)(23)(24)(25)(26) bands of brain activity in the sensorimotor cortex [14]. One major limitation for SMR based paradigms is the lengthy training times required for participants to learn to modulate the specific frequency bands of brain activity [4]. ...
... Visual imagery, or the manipulation of visual information from memory [23], could be a useful BCI control paradigm that has yet been relatively untested [24]. The human brain is visual by nature: 90% of the information transmitted by the brain is visual [25], and it can process images 60,000 times faster than text [24], [26]. Visual imagery may also be a more intuitive control strategy than any of the paradigms listed above [16]. ...
Conference Paper
Full-text available
Brain-Computer Interface (BCI) technology may provide individuals with motor impairments or even the general population a new way to interact with the world around them. However, current BCI systems using electroencephalography (EEG) can be unreliable and produce large variations in performance. Most studies seek to improve performance by focusing on signal processing and classification techniques. However, it may also be beneficial to investigate different control strategies. For this reason, the main objective of this pilot study was to investigate the use of visual imagery, a control paradigm that has not been much tested for EEG BCI applications. Visual imagery may provide a more intuitive control strategy with a greater number of available classes than other popular imagery-based methods such as motor imagery. Using this paradigm, we have demonstrated above chance binary classification accuracy (59.9%, p < 0.05) during offline decoding of face and scene visual imagery. Furthermore, the participant in this study achieved significantly above chance performance during a three-class, closed-loop BCI interaction (47.2%, p = 0.05). The initial results of this pilot study demonstrate the feasibility of using visual imagery as an alternative EEG BCI control paradigm.
... There is clear evidence that our brain understands and reacts faster to information or data that is visually relayed [13], [14]. Furthermore, it is found that analyzing a single image takes the brain as little as 13 milliseconds, which is much faster than analyzing any other type of information [15]. Since minimising responsetime to a risk-prone situation is critical, a method called FiToViz was proposed as a visualisation approach the to track and identify abnormalities when a person is under a risk-prone situation. ...
Full-text available
With the advent of sophisticated machine learning (ML) techniques and the promising results they yield, especially in medical applications, where they have been investigated for different tasks to enhance the decision-making process. Since visualization is such an effective tool for human comprehension, memorization, and judgment, we have presented a first-of-its-kind estimation approach we refer to as Visualized Learning for Machine Learning (VL4ML) that not only can serve to assist physicians and clinicians in making reasoned medical decisions, but it also allows to appreciate the uncertainty visualization, which could raise incertitude in making the appropriate classification or prediction. For the proof of concept, and to demonstrate the generalized nature of this visualized estimation approach, five different case studies are examined for different types of tasks including classification, regression, and longitudinal prediction. A survey analysis with more than 100 individuals is also conducted to assess users' feedback on this visualized estimation method. The experiments and the survey demonstrate the practical merits of the VL4ML that include: (1) appreciating visually clinical/medical estimations; (2) getting closer to the patients' preferences; (3) improving doctor-patient communication, and (4) visualizing the uncertainty introduced through the black box effect of the deployed ML algorithm. All the source codes are shared via a GitHub repository.
... Tracking information flow is important for testing how disparate brain regions work together to produce visual attention. Information about the timescale of feedback can be important for other cognitive questions, such as whether conscious awareness is possible with only feed-forward information flow (see for example Maguire & Howe, 2016;Potter et al., 2014). It can also lay the groundwork for brain stimulation investigations of causality by allowing researchers to predict when feedforward, feedback, and recurrent processing dominate information flow, and so target specific processes to perturb. ...
Human behaviour is extraordinarily flexible. Task fMRI and patient studies highlight a network of frontoparietal brain regions, called “multiple-demand” regions, that serve as a hub for flexible cognition. Neurons within this network adapt to code what is relevant for the current task, across many task types. Under the attentional episodes view, prioritising immediately relevant information in this way allows us to break complex tasks into simple parts and drive neural resources toward the problem at hand. Many studies demonstrate that relevant information is preferentially encoded, such that we can read out features more accurately when they are task-relevant. Yet we do not know whether the preferential coding that we see on slow timescales and in simple tasks supports moments of narrow focus, or “temporal modules”, in more complex, multi-part tasks. This is central to the attentional episodes account: that selection of immediately relevant information, through preferential coding, dynamically shifts to give us the information that we need for each part of a task. Chapter 2 begins by asking how preferential coding emerges in a multi-part task. Using MEG data from a dual-epoch task with single object (Experiment 1) and dual-object (Experiment 2) displays, I show that what is relevant can be preferentially encoded in sequential task epochs with similar rapidity. Preferential coding in either epoch was only detected with dual-object displays, mirroring dominant theories of attention as a spatial spotlight, or a filter to reduce complexity. Chapter 3 builds on this by tracing how ventral visual and multiple-demand regions contribute to, and communicate, coding of relevant stimulus information throughout a task. I resolve MEG data from a multi-epoch visual attention task (Experiment 2) to source space, to pull apart how the stimulus coding in Chapter 2 arises in visual and domain-general regions. 
Using Granger causality, I probe the timecourse of top-down and bottom-up information flow as the relevant feature shifts. I show feedback from prefrontal to visual regions emerging in both task epochs, again highlighting how flexibly we are able to direct focus to multiple parts of a task in turn. Chapter 4 extends Chapters 2 and 3 to a situation like those we face often in daily life, where what is relevant for each part of a task can be present throughout. These distractors with some task-relevance are also common in classical tests of fluid ability, and could be preferentially attended if selection is not strictly directed to the immediate task part. I use a behavioural task with two sequential displays, each containing a relevant- and an irrelevant-coloured moving dot cloud. Despite being cued to attend to two colours in sequence, participants were not more distracted by the second target colour when it appeared as a distractor in the first task epoch. That is, we can effectively direct our focus to what is immediately relevant, even when presented with a future-relevant feature. Attending to what is currently relevant as we move through parts of a task is a central aspect of flexible behaviour. These studies probe the limits of this temporal modularity in attention. They show that we can preferentially encode distinct stimulus features as what is relevant changes; and that we can overtly respond to what is relevant in each task part, even when a feature relevant for one task part is visible throughout. Together, they emphasise our extraordinary capacity to direct our focus toward what is relevant.
... In 2011, researchers used three refresh rates to investigate how changes in the CRT (cathode ray tube) temporal stimulus affect cortical responses in tree shrew V1 (the primary visual cortex), they find that refresh rate had a large impact on firing rate and the amplitude of LFP (120 Hz > 90 Hz > 60 Hz). Since mean firing rate is positively correlated with refresh rate, V1 acts like a high-pass filter for sparse noise stimuli as a function of refresh rate (Potter et al., 2014). Furthermore, researchers found the minimum timescale for motion encoding by ganglion cells of cat retinal was 4.6 ms and depended non-linearly on temporal frequency in 2011 (Borghuis et al., 2019). ...
Full-text available
The refresh rate is one of the important parameters of visual presentation devices, and assessing the effect of the refresh rate of a device on motion perception has always been an important direction in the field of visual research. This study examined the effect of the refresh rate of a device on the motion perception response at different stimulation frequencies and provided an objective visual electrophysiological assessment method for the correct selection of display parameters in a visual perception experiment. In this study, a flicker-free steady-state motion visual stimulation with continuous scanning frequency and different forms (sinusoidal or triangular) was presented on a low-latency LCD monitor at different refresh rates. Seventeen participants were asked to observe the visual stimulation without head movement or eye movement, and the effect of the refresh rate was assessed by analyzing the changes in the intensity of their visual evoked potentials. The results demonstrated that an increased refresh rate significantly improved the intensity of motion visual evoked potentials at stimulation frequency ranges of 7–28 Hz, and there was a significant interaction between the refresh rate and motion frequency. Furthermore, the increased refresh rate also had the potential to enhance the ability to perceive similar motion. Therefore, we recommended using a refresh rate of at least 120 Hz in motion visual perception experiments to ensure a better stimulation effect. If the motion frequency or velocity is high, a refresh rate of≥240 Hz is also recommended.
... In a typical RSVP task, participants have to identify an alphanumeric target that is embedded in a stream of multiple distractors, presented at rapid rates (usually about 10 items per second). When the target's response dimension is different than that of the distractors (e.g., a target digit among letters, Figure 3A), it can be easily differentiated from the distractors, in spite of the fact that its temporal position in the stream is unpredictable (Potter, Wyble, Hagmann, & McCourt, 2014). Variations of this paradigm produce two highly reliable patterns of results. ...
Full-text available
Many models of attention assume that attentional selection takes place at a specific moment in time that demarcates the critical transition from pre-attentive to attentive processing of sensory input. We argue that this intuitively appealing standard account of attentional selectivity is not only inaccurate, but has led to substantial conceptual confusion. As an alternative, we offer a ‘diachronic’ framework that describes attentional selectivity as a process that unfolds over time. Key to this view is the concept of attentional episodes, brief periods of intense attentional amplification of sensory representations that regulate access to working memory and response-related processes. We describe how attentional episodes are linked to earlier attentional mechanisms and to recurrent processing at the neural level. We review studies that establish the existence of attentional episodes, delineate the factors that determine if and when they are triggered, and discuss the costs associated with processing multiple events within a single episode. Finally, we argue that this framework offers new solutions to old problems in attention research that have never been resolved. It can provide a unified and conceptually coherent account of the network of cognitive and neural processes that produce the goal-directed selectivity in perceptual processing that is commonly referred to as ‘attention’.
Full-text available
This volume is devoted to the emerging field of Integrated Visual Knowledge Discovery that combines advances in Artificial Intelligence/Machine Learning (AI/ML) and Visualization/Visual Analytics. Chapters included are extended versions of the selected AI and Visual Analytics papers and related symposia at the recent International Information Visualization Conferences (IV2019 and IV2020). AI/ML face a long-standing challenge of explaining models to humans. Models explanation is fundamentally human activity, not only an algorithmic one. In this chapter we aim to present challenges and future directions within the field of Visual Analytics, Visual Knowledge Discovery and AI/ML, and to discuss the role of visualization in visual AI/ML. In addition, we describe progress in emerging Full 2D ML, natural language processing, and AI/ML in multidimensional data aided by visual means.
The human brain rapidly and automatically categorizes faces vs. other visual objects. However, whether face-selective neural activity predicts the subjective experience of a face – perceptual awareness – is debated. To clarify this issue, here we use face pareidolia, i.e., the illusory perception of a face, as a proxy to relate the neural categorization of a variety of facelike objects to conscious face perception. In Experiment 1, scalp electroencephalogram (EEG) is recorded while pictures of human faces or facelike objects – in different stimulation sequences – are interleaved every second (i.e., at 1 Hz) in a rapid 6-Hz train of natural images of nonface objects. Participants do not perform any explicit face categorization task during stimulation, and report whether they perceived illusory faces post-stimulation. A robust categorization response to facelike objects is identified at 1 Hz and harmonics in the EEG frequency spectrum with a facelike occipito-temporal topography. Across all individuals, the facelike categorization response is about 20% of the response to human faces, but more strongly right-lateralized. Critically, its amplitude is much larger in participants who report having perceived illusory faces. In Experiment 2, facelike or matched nonface objects from the same categories appear at 1 Hz in sequences of nonface objects presented at variable stimulation rates (60 Hz to 12 Hz), and participants explicitly report after each sequence whether they perceived illusory faces. The facelike categorization response already emerges at the shortest stimulus duration (i.e., 17 ms at 60 Hz) and predicts the behavioral report of conscious perception. Strikingly, neural facelike selectivity emerges exclusively when participants report illusory faces. Collectively, these experiments characterize a neural signature of face pareidolia in the context of rapid categorization, supporting the view that face-selective brain activity reliably predicts the subjective experience of a face from a single glance at a variety of stimuli.
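The frequency-tagging logic described in this abstract (a periodic response at the tag frequency and its harmonics, standing out against neighboring noise bins of the EEG spectrum) can be sketched as follows. This is a minimal illustration on simulated data, not the authors' pipeline; the function name `tag_response`, the simulation parameters, and the neighbor-based SNR estimate are assumptions introduced for the example.

```python
import numpy as np

def tag_response(signal, fs, tag_hz, n_harmonics=3, n_neighbors=10):
    """Estimate a frequency-tagged response: amplitude at the tag
    frequency and its harmonics, expressed as a ratio (SNR) over the
    mean amplitude of neighboring frequency bins."""
    n = len(signal)
    amps = np.abs(np.fft.rfft(signal)) / n      # single-sided amplitude spectrum
    snrs = []
    for h in range(1, n_harmonics + 1):
        bin_idx = int(round(h * tag_hz * n / fs))  # FFT bin of the h-th harmonic
        # Neighboring bins on either side, excluding the tagged bin itself
        neigh = np.r_[amps[bin_idx - n_neighbors:bin_idx],
                      amps[bin_idx + 1:bin_idx + 1 + n_neighbors]]
        snrs.append(amps[bin_idx] / neigh.mean())
    return snrs

# Simulated 20-s recording at 256 Hz: a 1-Hz periodic response
# embedded in noise, mimicking a 1-Hz tag within a 6-Hz train.
fs, dur = 256, 20
t = np.arange(fs * dur) / fs
rng = np.random.default_rng(0)
eeg = 2.0 * np.sin(2 * np.pi * 1.0 * t) + rng.normal(0.0, 1.0, t.size)
print(tag_response(eeg, fs, tag_hz=1.0))  # SNR at 1, 2, 3 Hz
```

With a genuine 1-Hz component, the SNR at the fundamental is far above 1, while bins with no tagged signal hover near 1; in the actual experiments the response is summed over the significant harmonics and compared across participants who do or do not report illusory faces.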
Six experiments investigated repetition priming and frequency attenuation in lexical access with 164 college students. Repetition priming effects in lexical decision tasks are stronger for low-frequency words than for high-frequency words. This frequency attenuation effect creates problems for frequency-ordered search models that assume a relatively stable frequency effect. It was posited that frequency attenuation is a product of the involvement of the episodic memory system in the lexical decision process. This hypothesis was supported by the demonstration of constant repetition effects for high- and low-frequency words when the priming stimulus was masked; the masking was assumed to minimize the influence of any possible episodic trace of the prime. It was further shown that long-term repetition effects were much less reliable when the subject was not required to make a lexical decision response to the prime. When a response was required, the expected frequency attenuation effect was restored. It is concluded that normal repetition effects consist of two components: a very brief lexical effect that is independent of frequency and a long-term episodic effect that is sensitive to frequency.
The binding problem arises when visual features (colour, orientation), said to be coded in independent brain modules, are to be integrated into unitary percepts. I argue that binding is an ill-posed problem, because those modules are now known to code jointly for multiple features, rendering the feature-binding issue moot. A hierarchical reentrant system explains the emergence of coherent visual objects from primitive features. An initial feed-forward sweep activates many high-level perceptual hypotheses, which descend to lower levels, where they correlate themselves with the ongoing activity. Low correlations are discarded, whereas the hypothesis that yields the highest correlation is confirmed and leads to conscious awareness. In this system, there is no separate binding process that actively assigns features to objects.
When viewing a rapid sequence of pictures, observers momentarily understand the gist of each scene but have poor recognition memory for most of them (M. C. Potter, 1976). Is forgetting immediate, or does some information persist briefly? Sequences of 5 scenes were presented for 173 ms/picture: when yes-no testing began immediately, recognition was initially high but declined markedly during the 10-item test. With testing delays of 2 or 6 s, the decline over testing was less steep. When 10 or 20 pictures were presented, there was again a marked initial decline during testing. A 2-alternative forced-choice recognition test produced similar results. Both the passage of time and test interference (but not presentation interference) led to forgetting. The brief persistence of information may assist in building a coherent representation over several fixations.
Little is known about how attention changes the cortical representation of sensory information in humans. On the basis of neurophysiological evidence, we hypothesized that attention causes tuning changes to expand the representation of attended stimuli at the cost of unattended stimuli. To investigate this issue, we used functional magnetic resonance imaging to measure how semantic representation changed during visual search for different object categories in natural movies. We found that many voxels across occipito-temporal and fronto-parietal cortex shifted their tuning toward the attended category. These tuning shifts expanded the representation of the attended category and of semantically related, but unattended, categories, and compressed the representation of categories that were semantically dissimilar to the target. Attentional warping of semantic representation occurred even when the attended category was not present in the movie; thus, the effect was not a target-detection artifact. These results suggest that attention dynamically alters visual representation to optimize processing of behaviorally relevant objects during natural vision.
When a sequence of pictures is presented at the rapid rate of 113 msec/picture, a viewer can detect a verbally specified target more than 60% of the time. In the present experiment, sequences of pictures were presented to 96 undergraduates at rates of 258, 172, and 114 msec/picture. A target was specified by name, by superordinate category, or by "negative" category (e.g., "the picture that is not of food"). Although the probability of detection decreased as cue specificity decreased, even in the most difficult condition (negative category cue at 114 msec/picture) 35% of the targets were detected. When the scores from the three detection tasks were compared with a control group's immediate recognition memory for the targets, immediate recognition memory was invariably lower than detection. Results are consistent with the hypothesis that rapidly presented pictures may be momentarily understood at the time of viewing and then quickly forgotten.
Visual context information constrains what to expect and where to look, facilitating search for and recognition of objects embedded in complex displays. This article reviews a new paradigm called contextual cueing, which presents well-defined, novel visual contexts and aims to understand how contextual information is learned and how it guides the deployment of visual attention. In addition, the contextual cueing task is well suited to the study of the neural substrate of contextual learning. For example, amnesic patients with hippocampal damage are impaired in their learning of novel contextual information, even though learning in the contextual cueing task does not appear to rely on conscious retrieval of contextual memory traces. We argue that contextual information is important because it embodies invariant properties of the visual environment such as stable spatial layout information as well as object covariation information. Sensitivity to these statistical regularities allows us to interact more effectively with the visual world.