Psychonomic Bulletin & Review
https://doi.org/10.3758/s13423-023-02396-x
THEORETICAL/REVIEW
Need forcross‑level iterative re‑entry inmodels ofvisual processing
ThomasM.Spalek1· K.P.Unnikrishnan2· VincentDiLollo1
Accepted: 24 September 2023
© The Author(s) 2023
Abstract
Two main hypotheses regarding the directional flow of visual information processing in the brain have been proposed: feed-
forward (bottom-up) and re-entrant (top-down). Early theories espoused feed-forward principles in which processing was
said to advance from simple to increasingly complex attributes terminating at a higher area where conscious perceptions
occur. That view is disconfirmed by advances in neuroanatomy and neurophysiology, which implicate re-entrant two-way
signaling as the predominant form of communication between brain regions. With some notable exceptions, the notion of
re-entrant processing has had a relatively modest effect on computational models of perception and cognition, which continue
to be predominantly based on feed-forward or within-level re-entrant principles. In the present work we describe five sets of
empirical findings that defy interpretation in terms of feed-forward or within-level re-entrant principles. We conclude by urg-
ing the adoption of psychophysical, biological, and computational models based on cross-level iterative re-entrant principles.
Keywords Feed-forward processing · Re-entrant processing · Computational models · Biological models · Cross-level re-entry
Introduction
An issue that has generated considerable discussion in the
fields of perception and cognition is the directional flow of
information processing within the brain. Visual informa-
tion processing has been modeled as a sequence of steps
culminating in conscious awareness. Those models have
been formulated in psychophysical, biological, and compu-
tational terms. Here, we examine the success of these mod-
els in accounting for the empirical evidence. Our principal
objective was not to provide a comprehensive review of the
literature. Rather, our approach was selective: studies were
selected that were most pertinent to – and best illustrated
– the specific issue under discussion.
Abbreviated history offeed‑forward
andre‑entrant models (psychophysics
andbiology)
Early psychophysical and biological theories of visual infor-
mation processing expounded a feed-forward – also known
as “bottom-up” – sequence in which the sensory input was
said to advance from lower to higher processing levels cul-
minating in a perception. A prime example is Selfridge’s
(1959) Pandemonium model in which notional demons,
each specializing in a different cognitive function, direct
the incoming stimuli to progressively more complex higher-
level demons converging to a decision-making demon that
determines the observer’s conscious awareness.
In the 1960s and 1970s, feed-forward schemes such as
Pandemonium were generally accepted as the default model
of brain functioning. Their acceptance was supported by
Hubel and Wiesel’s (1962, 1977) discovery of the feed-
forward sequence of visual receptive fields aptly named
Simple, Complex, and Hypercomplex. A benefit of feed-
forward schemes lies in their simplicity and in their allowing
subtraction procedures to calculate the timing of different
processing stages (e.g., Donders, 1969).
Adequacy of the feed-forward scheme as a compre-
hensive theory was questioned by later advances in neu-
roanatomy and neurophysiology that revealed massive
* Thomas M. Spalek
tspalek@sfu.ca
1 Department of Psychology, Simon Fraser University, 8888 University Drive, Burnaby, British Columbia V5A 1S6, Canada
2 eNeuroLearn, Ann Arbor, MI, USA
re-entrant pathways between brain regions (e.g., Felleman
& Van Essen, 1991; Posner & Raichle, 1994; Zeki, 1993).
If region A sends signals to region B, it is invariably the
case that region B sends signals back to region A. Notably,
the descending fibers are known to outnumber the ascend-
ing fibers and to be distributed widely, including into the
spaces between the neurons at the lower level (e.g., Shipp &
Zeki, 1989). Besides mediating a classical handshake with
the units at the lower level, the widely distributed re-entrant
signals can also bias the function of the lower-level units
in preparation for the next step in the processing sequence
(e.g., Sillito etal., 1994, see below). This anticipatory role
of re-entrant processing has been incorporated into several
models of information processing (e.g., Di Lollo et al., 2000;
Hawkins & Blakeslee, 2004; Mumford, 1991, 1992).
Biological evidence notwithstanding, feed-forward prin-
ciples continue to be implemented in theories of perception
and cognition (e.g., de Waal & Ferrari, 2010).1 As noted in
the next section, most deep learning computational models
have also been based on exclusively feed-forward principles
(e.g., Sejnowski, 2018).
Abbreviated history offeed‑forward
andre‑entrant models (computational)
The historical evolution of psychophysical/biological mod-
els is paralleled by the evolution of computational models.
Early computational models employed strictly feed-forward
architectures (McCulloch & Pitts, 1943; Hebb, 1949). Some
of these models included the concept of back propagation
(Rumelhart, Hinton, & Williams, 1985; Hecht-Nielsen,
1992) which may be regarded as involving re-entrant activ-
ity. We hasten to note, however, that back propagation cannot be regarded as the type of re-entrant activity that under-
lies perceptual and cognitive processes beyond the learning
stage. This is because the re-entrant activity in back propa-
gation mediates the establishment of a neural network with
its hidden layers. Once established, however, that network
functions in an exclusively feed-forward mode. After its
establishment, a network may require updating by means
of back propagation; once updated, however, that network
continues to function in an exclusively feed-forward mode.
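To make this point concrete, the following minimal sketch (in Python with NumPy; the toy network, XOR-style data, and learning rate are illustrative choices and not drawn from any model discussed here) trains a tiny two-layer network with back propagation and then queries it. The backward error signals exist only during learning; once the weights are fixed, every prediction is a single feed-forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network: learning uses backward (error) signals,
# but inference afterwards is a single forward pass.
W1 = rng.normal(scale=0.5, size=(2, 8))      # input -> hidden
W2 = rng.normal(scale=0.5, size=(8, 1))      # hidden -> output

def forward(x):
    h = np.tanh(x @ W1)                      # hidden activations
    y = 1.0 / (1.0 + np.exp(-(h @ W2)))      # sigmoid output
    return h, y

# XOR-like training data
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])

lr = 0.5
for _ in range(5000):                        # learning phase: errors travel backwards
    h, y = forward(X)
    err = y - T                              # output error
    dW2 = h.T @ (err * y * (1 - y))          # gradient w.r.t. W2
    dh = (err * y * (1 - y)) @ W2.T          # error sent back to the hidden layer
    dW1 = X.T @ (dh * (1 - h ** 2))          # gradient w.r.t. W1
    W2 -= lr * dW2
    W1 -= lr * dW1

# Inference phase: the trained network runs purely feed-forward;
# no error signal descends through the hierarchy.
_, predictions = forward(X)
print(predictions.round(2))
```

Once training stops, the backward pass disappears from the picture, which is the sense in which networks trained by back propagation remain feed-forward at run time.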
As a coda to the discussion on back propagation, we
should note two ways in which the system may optimize the
processing of the input. Back propagation can be regarded
as a way of configuring the system in readiness for a given
input. A similar objective is achieved in the laboratory by
the instructions given to the observer. In both cases, incom-
ing stimuli are processed “off-line” within a system whose
configuration had been set before the arrival of the visual
input. This way of configuring the system has been termed
task-set reconfiguration.
In an alternative “on-line” procedure, the system’s config-
uration is altered as the input is being processed. An exam-
ple of “on-line” processing has been proposed by Lamme
and Roelfsema (2000; see below). In “on-line” processing
the configuration of the system is not fixed as in “off-line”
processing; rather, each step in the processing sequence is
said to reconfigure the system in readiness for the next step.
This sequence of automatic reconfigurations then leads
to conscious awareness of the initial input. It needs to be
emphasized that the present work deals exclusively with
“on-line” processes.2
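The distinction can be sketched as follows, with an entirely hypothetical `process` step and re-entrant gain rule chosen only for illustration: in the "off-line" case the configuration is fixed before the stimulus arrives, whereas in the "on-line" case each step's output adjusts the configuration used for the next step while the stimulus is still being processed.

```python
import numpy as np

def process(stimulus, gain):
    """One processing step; 'gain' stands in for the system's configuration."""
    return np.tanh(gain * stimulus)

stimulus = np.array([0.2, 0.9, 0.4])

# "Off-line": the configuration is set before the stimulus arrives
# (e.g., by task instructions) and stays fixed while it is processed.
offline_gain = 1.5
offline_result = process(stimulus, offline_gain)

# "On-line": each step's output is used (via a notional higher level)
# to reconfigure the system before the next step operates on it.
online_gain = 1.5
activity = stimulus
for step in range(3):
    activity = process(activity, online_gain)
    # hypothetical re-entrant rule: raise the gain when higher-level
    # activity is strong, lower it when that activity is weak
    online_gain = 1.0 + activity.mean()

print("off-line:", offline_result.round(2))
print("on-line :", activity.round(2), "final gain:", round(online_gain, 2))
```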
Returning to the discussion of models based on re-entry,
it is important to distinguish re-entry within a given level in
a multi-level system from re-entry between levels.3 Mod-
els based on within-level re-entry have been proposed by
Fernandez etal. (Recurrent Multilayer Perceptron (RMLP),
1990), by Liang and Xiaolin (Recurrent Convolutional Neu-
ral Network (RCNN), 2015) and by Alom etal. (Inception
Recurrent Convolutional Neural Network (IRCNN), 2021).
The type of re-entry advocated in these models, however, is
strictly within levels. This prevents them from accounting for
the behavioural findings – discussed below – all of which
involve re-entry between levels.
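The following sketch illustrates what "within-level" re-entry means in such models (the weights and layer sizes are arbitrary, and this is not an implementation of RMLP, RCNN, or IRCNN): a level's own activity is fed back into that same level across time steps, while nothing descends to the level below.

```python
import numpy as np

rng = np.random.default_rng(1)
W_in = rng.normal(scale=0.3, size=(4, 6))    # feed-forward weights into the level
W_rec = rng.normal(scale=0.3, size=(6, 6))   # recurrent weights *within* the level

x = rng.normal(size=4)                       # input from the level below
h = np.zeros(6)                              # activity of the level's units

# Within-level re-entry: the level's own activity is fed back to itself over
# several time steps; no signal is sent back down to the level below.
for t in range(5):
    h = np.tanh(x @ W_in + h @ W_rec)

print(h.round(2))
```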
1 Indeed, there has been at least some measure of reluctance in
accepting the concept of re-entrant processing. At a major interna-
tional conference, a speaker proffering a feed-forward model referred
to the biological evidence for re-entry as an “evolutionary mistake.”
2 We thank Roberto Dell’Acqua for pointing out the distinction
between “off-line” and “on-line” processing.
3 In biological settings, the term “level” refers to any given brain
region. Whether a level is to be considered “high” or “low” depends
on the context. Thus, a level defined as “low” in one context may be
defined as “high” in a different context. This can best be illustrated by
example. Pascual-Leone and Walsh (2001) used transcranial magnetic
stimulation to study the timing and function of feedback between a
relatively high-level region (V5) and a relatively low-level region
(V1). Similarly, Sillito et al. (1994) investigated the temporal course
of signals between a high-level region (V1) and a low-level region
(Lateral Geniculate Nucleus).
In computational settings, the term “level” has been used to refer
to distinct processes performed at given stages of information pro-
cessing. For example, McClelland and Rumelhart (1981, page 377)
remarked as follows:
…we assume that perceptual processing takes place within a system
in which there are several levels of processing, each concerned with
forming a representation of the input at a different level of abstrac-
tion. For visual word perception, we assume that there is a visual fea-
ture level, a letter level, and a word level, as well as higher levels of
processing that provide "top-down" input to the word level.
We thank an anonymous reviewer for requesting a more informative
definition of the term “level” in both biological and computational
contexts.
Models based on re-entry between levels have been pro-
posed less frequently. An early instance was the fast-learning
algorithm for deep belief nets (Hinton et al., 2006). That
model contains multiple levels. The lower levels feed infor-
mation forward to higher levels in an initial sweep but have
no further feed-forward function. Rather, they convey only
descending signals between levels. In contrast, the top lev-
els exhibit full two-way connections between levels. Hinton
etal.’s model was elaborated by Lee etal.’s (2011) Convo-
lutional Deep Belief Network (CDBN) that postulated full
two-way connections between all levels in the system. These
between-level models are consistent with the empirical evi-
dence discussed below.
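By contrast, a minimal sketch of cross-level iterative re-entry (again with arbitrary, illustrative weights rather than the architecture of Hinton et al. or Lee et al.) looks like this: the higher level sends descending signals that modulate the lower level, the lower level returns an updated ascending signal, and the loop iterates.

```python
import numpy as np

rng = np.random.default_rng(2)
W_up = rng.normal(scale=0.3, size=(6, 3))    # ascending: lower level -> higher level
W_down = rng.normal(scale=0.3, size=(3, 6))  # descending: higher level -> lower level

stimulus = rng.normal(size=6)
lower = np.tanh(stimulus)                    # initial feed-forward sweep
higher = np.tanh(lower @ W_up)

# Cross-level iterative re-entry: descending signals from the higher level
# modulate the lower level, which then sends an updated signal back up.
for cycle in range(4):
    feedback = higher @ W_down               # re-entrant signal to the lower level
    lower = np.tanh(stimulus + feedback)     # lower level re-interprets the input
    higher = np.tanh(lower @ W_up)           # higher level updates in turn

print(lower.round(2), higher.round(2))
```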
In summary, there is no question that feed-forward pro-
cesses are an essential part of perceptual and cognitive
processes if for no other reason than to provide the initial
sensory input to the system. Also, as discussed below in the
context of face processing, they may underlie a distinct mode
of information processing. But do they provide a suitable
– or even acceptable – explanatory basis for the empirical
findings? A negative answer to that question is demanded by
a range of perceptual and cognitive phenomena that cannot
be fully explained in terms of feed-forward processes or of
processes constrained to re-entry within levels. Five such
cases are reviewed below.
Phenomena thatrequire between‑levels
re‑entrant accounts
Metacontrast masking
Visual masking occurs when the perception of a target
stimulus is impaired by the presentation of a subsequent
visual stimulus (the mask). This form of masking is known
as backward masking because the mask appears to act back-
wards in time. Two types of backward masking have been
recognized, depending on the spatial relationship between
the target and the mask: pattern masking and metacontrast
masking. In pattern masking the contours of the mask are
spatially superimposed on the target; in metacontrast mask-
ing the contours of the mask are closely adjacent to – but do
not overlap – the contours of the target. Metacontrast is the
main form of masking considered in the present work. For
metacontrast masking to occur, the mask must follow the tar-
get in time. The optimal stimulus-onset asynchrony (SOA)
between the target and the mask has been estimated to be
about 100 ms in daylight viewing (Breitmeyer & Öğmen,
2006). Notably, no masking occurs when the target and the
mask are displayed simultaneously.
Theoretical accounts of metacontrast masking have been
formulated in terms of feed-forward processes. For exam-
ple, a well-known theory proposed that the fast transient
activity triggered by the onset of the mask overcomes and
suppresses the slower sustained activity triggered by the tar-
get (Breitmeyer & Ganz, 1976). Here, we claim that feed-
forward accounts of metacontrast masking are disconfirmed
by recent evidence obtained in studies that used a variety of
experimental paradigms. We begin with studies of event-
related potentials (ERPs).
Fahrenfort, Scholte, and Lamme (2007a) recorded ERPs
in a study of metacontrast masking. Observers were required
to detect the presence of a square target figure that was fol-
lowed (or not followed) by a metacontrast mask. The tar-
get was clearly visible when it was not masked but was
invisible when followed by the mask. The corresponding
ERPs revealed that the neural activity in the feed-forward
sweep was the same when the target was masked as when it
wasn’t masked. This indicates that no masking occurred in
the feed-forward sweep. In contrast, the ERP components
associated with the re-entrant activity – which were very
much in evidence when the target was not masked – were
entirely missing when the target was masked. This pattern
of results strongly suggests that metacontrast masking acts
by disrupting the re-entrant signals while leaving the feed-
forward signals intact.
Further evidence that metacontrast masking depends crit-
ically on re-entrant processes has been reported by Fahren-
fort, Scholte, and Lamme (2007b), who found that conscious
awareness of the target in a metacontrast study correlated
with re-entrant but not with feed-forward activity. More evi-
dence along these lines has been reported by Lamme, Zipser,
and Spekreijse (2002) and by Supèr, Spekreijse, and Lamme
(2001), who found that feedback from extrastriate areas was
critical for the stimuli to reach consciousness. Furthermore,
Zhaoping and Liu (2022) found that metacontrast masking
is weaker for stimuli displayed in the peripheral retina where
feedback from higher to lower brain regions is thought to
be weaker. Clearly, metacontrast masking cannot be wholly
explained in terms of feed-forward processes.
Object substitution masking
Object substitution masking (OSM) is also known as com-
mon onset masking because, unlike metacontrast masking,
the target stimulus and the mask come into view simultane-
ously. The display consists of a target item, a variable num-
ber of distractor items, and a mask (typically, four small dots
surrounding the target). No masking occurs if the entire dis-
play disappears after a brief exposure. Masking does occur,
however, if the target and the distractors are removed after a
brief exposure, and only the mask remains in view (Di Lollo,
Enns, & Rensink, 2000; Lleras & Moore, 2003; Woodman
& Luck, 2003).
On the strength of this evidence, Di Lollo etal. (2000)
concluded that OSM cannot be explained by the kind of
transient feed-forward activity that was held to account
for metacontrast masking (Breitmeyer & Ganz, 1976; see
above). This is because the simultaneous onset of the target
and the mask in OSM precludes the mask from producing
a separate onset transient that might suppress the ongoing
sustained processing of the target. The conclusion that re-
entry is involved in OSM has been corroborated by Boehler,
Schoenfeld, Heinze, and Hopf (2008), who employed mag-
netoencephalography (MEG) to show that OSM is mediated
by re-entrant activity to primary visual cortex.
Figure-ground segregation
In a series of ingenious experiments with awake monkeys,
Lamme and Roelfsema (2000) investigated a train of visual
processes that culminated in figure-ground segregation.
They recorded the activity of neurons in primary visual
cortex in response to a brief visual display. The display
consisted of a square patch of oriented line segments on a
background of line segments of the opposite orientation. The
main finding was that re-entrant signals from extrastriate
cortex altered the tuning of the neurons in V1 to perform
several different functions in successive phases of the pro-
cessing cycle. About 40 ms after stimulus onset the neurons
were tuned to line orientation (loosely speaking, they acted
as line-orientation detectors). About 40 ms later the same
neurons became tuned to the subjective boundaries of the
square patch (boundary detectors). Finally, about 40 ms after
that, the same neurons became tuned to the square figure
as distinct from the background (figure-ground detectors).
Ablation of extrastriate cortex caused the neurons to
remain sharply selective for line orientation and figure
boundaries, but the activity corresponding to figure-ground
selection was missing. These findings confirm that, within
a processing cycle, signals from higher centres re-enter the
primary visual cortex and are essential in implementing the
figure-ground selectivity of the neurons at the lower level.
The re-entrant nature of the activity from extrastriate to stri-
ate cortex makes these results not amenable to accounts in
terms of feed-forward processes.
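A toy illustration of this multiplexing idea, under our assumption (made only for exposition) that the successive phases can be caricatured as three different read-outs of the same display: the same notional units signal orientation in the first phase, subjective boundaries in the second, and figure versus ground in the third. The approximate 40-ms phase labels follow Lamme and Roelfsema's timing; everything else is schematic.

```python
import numpy as np

# Toy display: a 6x6 patch of "oriented" elements (1 = figure orientation,
# -1 = background orientation), with a 3x3 figure embedded in it.
display = -np.ones((6, 6))
display[1:4, 1:4] = 1

# Phase 1 (~40 ms): units act as orientation detectors.
orientation_response = display.copy()

# Phase 2 (~80 ms): re-entrant signals retune the same units to boundaries
# (here, locations where orientation changes between neighbours).
boundary_response = np.zeros_like(display)
boundary_response[:-1, :] += (display[:-1, :] != display[1:, :]).astype(float)
boundary_response[:, :-1] += (display[:, :-1] != display[:, 1:]).astype(float)

# Phase 3 (~120 ms): further re-entry retunes the units to the figure as a
# whole (enhanced response everywhere inside the figure region).
figure_response = (display == 1).astype(float)

print(orientation_response, boundary_response, figure_response, sep="\n\n")
```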
Enhancing theperception ofdirectional motion
Another set of results that defies a feed-forward account
has been reported by Sillito, Jones, Gerstein, and West
(1994). The study involved monitoring the activity along
the two-way pathways between lateral geniculate nucleus (LGN) and
primary visual cortex in the cat in response to moving grat-
ings. The firing threshold of LGN neurons located just ahead
in the motion path – but not yet activated by the moving grat-
ing – was significantly lowered by re-entrant signals from
primary visual cortex.
Because of the lowered threshold, the primed neurons in
LGN fired more readily and more strongly when eventually
stimulated by the moving grating. As the authors note, this
sequence of events may be regarded as the neurophysiologi-
cal correlate of an expectation about the future location of
a moving object. It goes without saying that this enhance-
ment of motion processing in LGN stems exclusively from
re-entrant signals between levels.
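A schematic sketch of this priming effect, with hypothetical units and threshold values chosen only to illustrate the logic: re-entrant signals lower the threshold of the unit next along the motion path before the feed-forward drive reaches it, so primed units respond more strongly to the same input.

```python
import numpy as np

n_units = 10                       # LGN units laid out along the motion path
threshold = np.full(n_units, 1.0)  # baseline firing thresholds
drive = 1.2                        # feed-forward drive when the grating arrives

responses = []
for position in range(n_units):
    # Hypothetical re-entrant priming: cortex lowers the threshold of the
    # next unit along the motion path before the grating reaches it.
    if position + 1 < n_units:
        threshold[position + 1] -= 0.3
    # Primed units fire more strongly for the same feed-forward drive.
    responses.append(max(0.0, drive - threshold[position]))

print(np.round(responses, 2))   # only the first, unprimed unit responds weakly
```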
Homologous conclusions have been drawn from a series
of experiments by Hupé etal. (1998), who studied the modu-
lation of motion-selective units in Regions V1, V2, and V3
of macaque monkeys by re-entrant signals from Region V5.
The main manipulation was to cool Region V5 to reduce
the strength of re-entrant signals. The main finding was that
the activity of neurons in the lower regions was reduced by
as much as 95% when the activity of neurons in the higher
region was suppressed by the reversible lesion. Clearly,
feed-forward signals are not sufficient. Rather, appropri-
ate functioning of motion-selective neurons in the lower
regions depends critically on the re-entrant signals from
higher areas. Beyond enhancing the functioning of neurons
in the lower regions, Hupé etal. (1998) note that “… feed-
back projections serve to improve the visibility of features
… in the stimulus and may thus contribute to figure–ground
segregation, breaking of camouflage, and psychophysically
demonstrated ‘pop-out’ effects” (p. 786).
Face recognition
Feed-forward models encounter significant problems in
modeling the findings in the face-recognition literature,
especially those involved in identifying individual faces or
specific facial expressions. We believe that those problems
have arisen from the omission of re-entry as a critical factor
in models of face recognition. Evidence consistent with the
critical role of re-entry comes from recordings from tempo-
ral cortex of macaque monkeys (Sugase, Yamane, Ueno, &
Kawano, 1999; Sugase-Miyamoto, Matsumoto, & Kawano,
2011). Specifically, Sugase et al. (1999) found that face recognition occurs in two distinct stages. In the words of Sugase et al. (1999, p. 869):
We found that single neurons conveyed two different
scales of facial information in their firing patterns,
starting at different latencies. Global information,
categorizing stimuli as monkey faces, human faces
or shapes, was conveyed in the earliest part of the
responses. Fine information about identity or expres-
sion was conveyed later, beginning on average 51 ms
after global information. We speculate that global
information could be used as a ‘header’ to prepare des-
tination areas for receiving more detailed information.
In agreement with Sugase etal. (1999), we suggest that
generic faces are probably detected on the feed-forward
sweep, perhaps along the dorsal pathway for the low spatial
frequency contents of the image, as proposed by Bar et al.
(2006). In contrast, identification of individual faces, or of
specific facial expressions, requires re-entrant signalling
from other cortical and subcortical brain regions. Sugase et al.'s findings should be considered in the broader context provided by Chow et al. (2022), in which different levels of
categorization are shown to follow different time courses.4
Consistent with the theme of the present work, face percep-
tion cannot be wholly explained in terms of feed-forward
processes alone.
Concluding comments
Considerable evidence has been cited in the foregoing for
phenomena that defy explanation in terms of strictly feed-forward or within-level principles. Yet, despite this evidence, accounts
of visual processing couched in feed-forward or within-level
concepts continue to be proposed. For example, models
based on essentially feed-forward principles can be found
in a recent special issue of the journal Vision Research
concerning deep neural network accounts of human vision
(Heinke, Leonardis, & Leek, 2022).
On the other hand, the idea that between-levels re-entry
is an important component of visual information processing
has been around for some time. For example, Bridgeman
(1980) anticipated the multiplexing function of re-entrant
signals that was later proposed by Lamme and Roelfsema
(2000, see above). In Bridgeman’s study monkeys performed
a visual discrimination task under conditions of metacontrast
masking. Consistent with Lamme and Roelfsema’s findings
and conclusions, Bridgeman (1980, p. 347) proposed that
“The results suggest an iterative or recurrent coding of visual
information, where the same cells participate in early, late,
and pre-response coding in different ways.”
Although most models of visual processing are couched
in terms of feed-forward or within-level re-entrant processes,
between-levels models offer a more realistic perspective.
Among the latter class of models are the ALOPEX model
of Harth, Unnikrishnan, and Pandya (1987), the ARTMAP
model by Carpenter, Grossberg, and Reynolds (1991) and
the CDBN model of Lee etal. (2009; described above).
More recently, Hawkins and colleagues have put forth a
systematic theory of brain functioning based on iterative
re-entrant processes between levels (Hawkins & Blakeslee,
2004; Hawkins, Ahmad, & Cui, 2017; Hawkins, 2021).
In summary, models based on feed-forward or within-
level re-entry principles cannot account for the empirical
findings. In contrast, models based on iterative re-entry
between levels offer a more promising perspective. To
account for the empirical findings, however, such models
need to include unique parameters tailor-made for each indi-
vidual phenomenon. This said, the major objective of the
present work was not to propose a novel model based on
between-level re-entry. Rather, it was to draw attention to
empirical findings that are beyond what can be explained
in terms of feed-forward or within-level principles alone.
Open practices statement No new experiments are reported in this
paper. Thus, no preregistration was possible.
Open Access This article is licensed under a Creative Commons Attri-
bution 4.0 International License, which permits use, sharing, adapta-
tion, distribution and reproduction in any medium or format, as long
as you give appropriate credit to the original author(s) and the source,
provide a link to the Creative Commons licence, and indicate if changes
were made. The images or other third party material in this article are
included in the article’s Creative Commons licence, unless indicated
otherwise in a credit line to the material. If material is not included in
the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder. To view a
copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
Alom, M. Z., Hasan, M., Yakopcic, C., Taha, T. M., & Asari, V. K.
(2021). Inception recurrent convolutional neural network for
object recognition. Machine Vision and Applications, 32, 1–14.
Bar, M., Kassam, K. S., Ghuman, A. S., Boshyan, J., Schmid, A. M.,
Dale, A. M., Hämäläinen, M. S., Marinkovic, K., Schacter, D.
L., Rosen, B. R., & Halgren, E. (2006). Top-down facilitation of
visual recognition. Proceedings of the National Academy of Sci-
ences, 103(2), 449–454.
Boehler, C. N., Schoenfeld, M. A., Heinze, H. J., & Hopf, J. M. (2008).
Rapid recurrent processing gates awareness in primary visual cor-
tex. Proceedings of the National Academy of Sciences, 105(25),
8742–8747.
Breitmeyer, B. G., & Ganz, L. (1976). Implications of sustained and
transient channels for theories of visual pattern masking, saccadic
suppression, and information processing. Psychological Review,
83(1), 1–36.
Breitmeyer, B., & Öğmen, H. (2006). Visual masking: Time slices
through conscious and unconscious vision. Oxford University
Press.
Bridgeman, B. (1980). Temporal response characteristics of cells in
monkey striate cortex measured with metacontrast masking and
brightness discrimination. Brain Research, 196(2), 347–364.
Carpenter, G. A., Grossberg, S., & Reynolds, J. H. (1991). ARTMAP:
Supervised real-time learning and classification of nonstationary
data by a self-organizing neural network. Neural Networks, 4(5),
565–588.
Chow, J. K., Palmeri, T. J., & Mack, M. L. (2022). Revealing a com-
petitive dynamic in rapid categorization with object substitution
masking. Attention, Perception, & Psychophysics, 84(3), 638–646.
4 We thank an anonymous reviewer for noting the connection
between the work of Sugase etal. (1999) and Chow etal. (2022).
de Waal, F. B., & Ferrari, P. F. (2010). Towards a bottom-up perspec-
tive on animal and human cognition. Trends in Cognitive Sciences,
14(5), 201–207.
Di Lollo, V., Enns, J. T., & Rensink, R. A. (2000). Competition for con-
sciousness among visual events: The psychophysics of re-entrant
visual processes. Journal of Experimental Psychology: General,
129(4), 481–507.
Donders, F. C. (1969). Over de snelheid van psychische processen [On
the speed of psychological processes]. (W. Koster, trans.). In W.
Koster (Ed.), Attention and performance: II. Amsterdam: North-
Holland (Original work published 1868).
Fahrenfort, J. J., Scholte, H. S., & Lamme, V. A. (2007a). Masking
disrupts re-entrant processing in human visual cortex. Journal of
Cognitive Neuroscience, 19(9), 1488–1497.
Fahrenfort, J. J., Scholte, H. S., & Lamme, V. A. (2007b). Percep-
tion correlates with feedback but not with feedforward activity in
human visual cortex. Journal of Vision, 7(9), 388–388.
Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical
processing in primate visual cortex. Cerebral Cortex, 1, 1–47.
Fernandez, B., Parlos, A. G., & Tsai, W. (1990). Nonlinear dynamic system identification using artificial neural networks (ANNs). In International Joint Conference on Neural Networks (pp. 133–141).
Harth, E., Unnikrishnan, K. P., & Pandya, A. S. (1987). The inversion
of sensory processing by feedback pathways: A model of visual
cognitive functions. Science, 237(4811), 184–187.
Hawkins, J. (2021). A thousand brains: A new theory of intelligence. Hachette UK.
Hawkins, J., Ahmad, S., & Cui, Y. (2017). A theory of how columns in the neocortex enable learning the structure of the world. Frontiers in Neural Circuits, 81–98.
Hawkins, J., & Blakeslee, S. (2004). On intelligence. Macmillan.
Hebb, D. O. (1949). The organization of behavior: A neuropsychological theory. Wiley.
Hecht-Nielsen, R. (1992). Theory of the backpropagation neural network. In Neural networks for perception (pp. 65–93). Academic Press.
Heinke, D., Leonardis, A., & Leek, E. C. (2022). What do deep neural
networks tell us about biological vision? Vision Research, 198,
108069–108069.
Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algo-
rithm for deep belief nets. Neural Computation, 18(7), 1527–1554.
Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular
interaction and functional architecture in the cat’s visual cortex.
Journal of Physiology, London, 160, 106–154.
Hubel, D. H., & Wiesel, T. N. (1977). Functional architecture of
macaque visual cortex. Proceedings of the Royal Society, Lon-
don (B), 198, 1–59.
Hupé, J. M., James, A. C., Payne, B. R., Lomber, S. G., Girard, P.,
& Bullier, J. (1998). Cortical feedback improves discrimination
between figure and ground by V1, V2 and V3 neurons. Nature,
394, 784–787.
Lamme, V. A., & Roelfsema, P. R. (2000). The distinct modes of vision
offered by feedforward and recurrent processing. Trends in Neu-
rosciences, 23(11), 571–579.
Lamme, V. A., Zipser, K., & Spekreijse, H. (2002). Masking interrupts
figure-ground signals in V1. Journal of Cognitive Neuroscience,
14(7), 1044–1053.
Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009, June). Convo-
lutional deep belief networks for scalable unsupervised learning
of hierarchical representations. In Proceedings of the 26th annual
international conference on machine learning (pp. 609–616).
Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2011). Unsupervised
learning of hierarchical representations with convolutional deep
belief networks. Communications of the ACM, 54(10), 95–103.
Liang, M., & Hu, X. (2015). Recurrent convolutional neural network for object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Lleras, A., & Moore, C. M. (2003). When the target becomes the mask:
Using apparent motion to isolate the object-level component of
object substitution masking. Journal of Experimental Psychology:
Human Perception and Performance, 29(1), 106–120.
McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation
model of context effects in letter perception: I. An account of basic
findings. Psychological Review, 88(5), 375–407.
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas
immanent in nervous activity. The Bulletin of Mathematical Bio-
physics, 5, 115–133.
Mumford, D. (1991). On the computational architecture of the neocor-
tex I. The role of the thalamo-cortical loop. Biological Cybernet-
ics, 65, 135–145.
Mumford, D. (1992). On the computational architecture of the neocor-
tex II. The role of cortico-cortical loops. Biological Cybernetics,
66, 241–251.
Pascual-Leone, A., & Walsh, V. (2001). Fast backprojections from the
motion to the primary visual area necessary for visual awareness.
Science, 292(5516), 510–512.
Posner, M. I., & Raichle, M. E. (1994). Images of mind. Scientific
American Library.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1985). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1: Foundations. Cambridge, MA: Bradford Books/MIT Press.
Shipp, S., & Zeki, S. (1989). The organization of connections between
areas V5 and V1 in the macaque monkey visual cortex. European
Journal of Neuroscience, 1, 309–332.
Sejnowski, T. J. (2018). The deep learning revolution. MIT Press.
Selfridge, O. (1959). Pandemonium: A paradigm for learning. In Symposium on the mechanization of thought processes. HM Stationery Office.
Sillito, A. M., Jones, H. E., Gerstein, G. L., & West, D. C. (1994). Fea-
ture-linked synchronization of thalamic relay cell firing induced
by feedback from the visual cortex. Nature, 369, 479–482.
Sugase, Y., Yamane, S., Ueno, S., & Kawano, K. (1999). Global and
fine information coded by single neurons in the temporal visual
cortex. Nature, 400(6747), 869–873.
Sugase-Miyamoto, Y., Matsumoto, N., & Kawano, K. (2011). Role of
temporal processing stages by inferior temporal neurons in facial
recognition. Frontiers in Psychology, 2, 141–149.
Supèr, H., Spekreijse, H., & Lamme, V. A. (2001). Two distinct modes
of sensory processing observed in monkey primary visual cortex
(V1). Nature Neuroscience, 4(3), 304–310.
Woodman, G. F., & Luck, S. J. (2003). Dissociations among attention,
perception, and awareness during object-substitution masking.
Psychological Science, 14(6), 605–611.
Zeki, S. (1993). A vision of the brain. Blackwell.
Zhaoping, L., & Liu, Y. (2022). The central-peripheral dichotomy and
metacontrast masking. Perception, 51(8), 549–564.
Publisher’s note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.