Article
Cascaded normalizations for spatial integration in
the primary visual cortex of primates
Graphical abstract
Highlights
• Neurons in V1 output layers have smaller RF sizes and nonlinear responses to annuli
• A model with cascaded normalizations explains the above properties in V1
• The two normalizations play key roles in spatial integration in V1
• Spatial integration and normalization are different for the lower layers in CNNs
Authors
Yang Li, Tian Wang, Yi Yang, ...,
Gang Wang, Fei Dou, Dajun Xing
Correspondence
dajun_xing@bnu.edu.cn
In brief
Li et al. find that cascaded normalizations are essential for spatial integration and laminar processing in V1 of primates. Interestingly, the features of spatial integration in convolutional neural networks (CNNs), especially in lower layers of CNNs, are different from those in V1.
Li et al., 2022, Cell Reports 40, 111221
August 16, 2022 © 2022 The Author(s).
https://doi.org/10.1016/j.celrep.2022.111221
Article
Cascaded normalizations for spatial integration in the primary visual cortex of primates
Yang Li,1,5 Tian Wang,1,2,5 Yi Yang,1 Weifeng Dai,1 Yujie Wu,1 Lianfeng Li,3 Chuanliang Han,1 Lvyan Zhong,1 Liang Li,4 Gang Wang,4 Fei Dou,1,2 and Dajun Xing1,6,*
1State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
2College of Life Sciences, Beijing Normal University, Beijing 100875, China
3China Academy of Launch Vehicle Technology, Beijing 100076, China
4Beijing Institute of Basic Medical Sciences, Beijing 100005, China
5These authors contributed equally
6Lead contact
*Correspondence: dajun_xing@bnu.edu.cn
https://doi.org/10.1016/j.celrep.2022.111221
SUMMARY
Spatial integration of visual information is an important function in the brain. However, neural computation for
spatial integration in the visual cortex remains unclear. In this study, we recorded laminar responses in V1 of
awake monkeys driven by visual stimuli with grating patches and annuli of different sizes. We find three
important response properties related to spatial integration that are significantly different between input
and output layers: neurons in output layers have stronger surround suppression, smaller receptive field
(RF), and higher sensitivity to grating annuli partially covering their RFs. These interlaminar differences
can be explained by a descriptive model composed of two global divisions (normalization) and a local
subtraction. Our results suggest suppressions with cascaded normalizations (CNs) are essential for spatial
integration and laminar processing in the visual cortex. Interestingly, the features of spatial integration in
convolutional neural networks, especially in lower layers, are different from our findings in V1.
INTRODUCTION
Integrating sensory information across space, time, features, and modalities is one of the most important functions in the brain (Angelucci et al., 2017; Calvert and Thesen, 2004; Mudrik et al., 2014; Quinlan, 2003; Solomon and Kohn, 2014). To fulfill information integration, the brain has been thought to utilize various computational operations, including addition, subtraction, and division, to combine the activity of different neurons or neural populations (Bloem and Ling, 2019; Coen-Cagli et al., 2015; Han et al., 2021c; Henry and Kohn, 2020; Lin et al., 2015). However, neural computation for information integration remains unclear (Itti and Koch, 2001; Pack and Bensmaia, 2015; Silver, 2010).
In the primary visual cortex (V1), neuronal modulation elicited by visual stimuli in regions surrounding a neuron's receptive field (RF) (Allman et al., 1985) has been viewed as a strong indication of information integration in space (spatial integration). Although many studies have found that adding stimuli outside the classical receptive field (CRF) can suppress neuronal responses (surround suppression [SS]) (Allman et al., 1985; Bair et al., 2003; Bijanzadeh et al., 2018; Henry et al., 2013; Jones et al., 2001), several studies reported that the neuronal response to an annulus stimulus with a hole over the RF is stronger than that to an equally large filled patch (Jones et al., 2001; Keller et al., 2020b; Rossi et al., 2001). SS in V1 has been interpreted either as divisive suppression (normalization) (Cavanaugh et al., 2002; Self et al., 2014; Vangeneugden et al., 2019) or as subtractive suppression (Alitto and Usrey, 2015; Keller et al., 2020b; Roberts et al., 2005; Sceniak et al., 1999, 2006). Little is known about the neural computation underlying V1 responses to annulus stimuli and its relation to SS in macaque V1.
Our study aimed to reveal the neural computation for spatial integration that governs neural responses to both large patches and annuli in different V1 layers. There are six layers with different neural connections in V1 (Callaway, 1998; Lund, 1988; Lund et al., 2003; Rockland and Pandya, 1979; Sincich and Horton, 2005; Stettler et al., 2002). Specifically, the V1 input layer (layer 4C) has local recurrent connections and feedforward projections from the lateral geniculate nucleus (LGN) (Blasdel and Lund, 1983; Callaway, 1998), and the output layers (layer 2/3 and layer 4B) not only receive feedforward drive from the V1 input layers but also have rich horizontal and feedback connections (Lund et al., 2003; Rockland and Pandya, 1979; Stettler et al., 2002). The diverse laminar structures and connection patterns suggest that information processing in a cortical region involves multiple levels of neural computation (Felleman and Van Essen, 1991; Sanchez-Giraldo et al., 2019; Wang et al., 2020; Yang et al., 2022), which is indicated by laminar-specific response properties in V1 (Bijanzadeh et al., 2018; Buffalo et al., 2011; Han et al., 2021a; Hansen and Dragoi, 2011; Henry et al., 2013; Maier et al., 2011; Smith et al., 2013). Therefore, we should understand neuronal computations for spatial integration in a laminar-specific manner.
Cell Reports 40, 111221, August 16, 2022 © 2022 The Author(s).
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
In this study, we simultaneously recorded neural responses in
all V1 layers of awake monkeys and drove the V1 population by
gratings with different spatial configurations (varying the
diameters of the patch and annulus; Figure 1D). Interestingly,
compared with the V1 input layers, we found an increase in
SS, a significant shrinkage of the RF size, and an emergence
of stronger response to the annulus in the output layers. To un-
derstand these interlaminar differences, we fitted descriptive
models to our data simultaneously recorded in the input and
output layers from the same probe placements. We found
Figure 1. Laminar profile of spatial integration within a column
(A) Methods for laminar recording. Left, neural activity was recorded with a U-Probe (Plexon, 24 channels, inter-channel spacing 100 μm). The linear array was positioned vertically throughout the full depth of V1. Right, smoothed laminar pattern of current source density (CSD) for one example probe placement (see STAR Methods). Horizontal black lines indicate the laminar boundaries.
(B) Receptive field (RF) maps for the same example probe placement in (A). There are 12 sites with good RF mappings in V1 from this probe placement. Each map was normalized by its maximum value. The vertical black line represents the mean value of the RF centers.
(C) The orientation tunings for the same example probe placement in (A). Each black tuning curve indicates the model (von Mises function) fit to the raw data (black dots), and each tuning was normalized by its maximum value. Vertical dashed lines indicate the mode of preferred orientation across all layer sites. Blue, red, and black tuning curves represent sites from superficial layers, layer 4C, and deep layers, respectively.
(D) Patch-size (left, solid line) and annulus-size (right, dashed line) tuning curves of drifting gratings for the same example placement in (A). Blue, red, and black tuning curves represent sites from superficial layers, layer 4C, and deep layers, respectively. Schematics of stimuli are shown at the bottom. Error bars indicate SEM.
suppressions with cascaded normalizations (CNs) can explain the patch-size tunings and annulus-size tunings and their laminar differences well. Many studies have suggested that early layers in convolutional neural networks (CNNs) closely resemble V1, according to population-based metrics. We also tested whether individual units in CNNs have spatial integration capabilities that rival those of macaque V1 by characterizing spatial integration in the convolutional layers (the first five layers) of five different CNNs.
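To make concrete why a low-level CNN unit can behave differently from V1, consider a minimal sketch (illustrative only, not any of the five networks tested here): a single first-layer unit modeled as a Gabor-like linear filter followed by a ReLU. Its patch-size tuning saturates once the grating covers the filter envelope but never declines, so its surround suppression index stays at zero unless later nonlinearities (e.g., normalization) add suppression. All names and parameter values below are assumptions for the sketch.

```python
import math

def grating_patch(n, diam, sf=0.15):
    """n x n image: vertical sinusoidal grating (sf cycles/pixel) windowed by
    a circular patch of the given diameter in pixels; background is 0."""
    c = n // 2
    img = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            if (x - c) ** 2 + (y - c) ** 2 <= (diam / 2) ** 2:
                img[y][x] = math.cos(2 * math.pi * sf * (x - c))
    return img

def gabor(n, sigma=6.0, sf=0.15):
    """Gabor filter matched in orientation and frequency to the grating."""
    c = n // 2
    return [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
             * math.cos(2 * math.pi * sf * (x - c))
             for x in range(n)] for y in range(n)]

def unit_response(img, filt):
    """One convolutional unit at the image center: dot product + ReLU."""
    acc = sum(i * f for irow, frow in zip(img, filt)
              for i, f in zip(irow, frow))
    return max(0.0, acc)

n = 65
filt = gabor(n)
diams = [5, 10, 20, 30, 40, 60]
tuning = [unit_response(grating_patch(n, d), filt) for d in diams]
ssi = 1 - tuning[-1] / max(tuning)  # surround suppression index
# The linear-ReLU unit's size tuning saturates but never declines, so ssi = 0.
```

Because the filter and the grating are matched, every added pixel contributes non-negatively, which is why a purely feedforward first-layer unit cannot show surround suppression on its own.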
RESULTS
We simultaneously recorded the spiking activity and local field
potential (LFP) at different cortical depths of awake macaque
V1 using a multiple-channel linear array (Figure 1A; see STAR
Methods). A total of 331 V1 sites were recorded from 28 probe
placements in three macaque monkeys (at eccentricities from 1° to 5°). We defined the borders of adjacent cortical layers
and aligned the relative cortical depth of these placements
based on the current source density (CSD) patterns of visually
evoked LFP and stimulus-driven multiunit activity (MUA) patterns
(Bijanzadeh et al., 2018; Self et al., 2013; Wang et al., 2020) (Fig-
ure 1A; see STAR Methods). All probe placements in this study
were highly perpendicular to the V1 surface, which was
confirmed by the degree of overlap for the RFs and similarities
of orientation preferences across layers (Figures 1B and 1C;
see STAR Methods for details).
We further measured neural responses to drifting gratings
(presented for 2 s at high contrast with spatial frequency at 2 cy-
cle/deg and temporal frequency at 5 cycle/sec) mainly with two
spatial configurations (grating pattern filled either in circular
patches or circular annuli) at different sizes (Figure 1D). For a re-
corded site, neural responses to grating patches at different
outer sizes formed the site’s patch-size tuning curve (Figure 1D,
left), and the site’s responses to grating annulus at different inner
sizes formed an annulus-size tuning curve (Figure 1D, right). Both
patch- and annulus-size tuning curves reflect V1 neurons’ ability
to integrate information in a wide range of visual spaces (spatial
integration). We found that these tuning curves and their laminar
variations characterized by single-unit spiking activity (SUA) or
multiunit activity (MUA) are highly consistent (Figures S1A,
S1C, and S1D). Therefore, we mainly reported results based
on the MUA responses below for a better signal-to-noise ratio
(SNR) and a finer spatial resolution (Self et al., 2013; Wang
et al., 2020).
Substantial interlaminar differences for spatial
integration in V1
In a patch-size tuning curve, the response magnitude reached its
peak at a certain patch size (around 0.7° in diameter) and then
decreased as the patch size continued to increase (defined as
SS) (Allman et al., 1985), which is true for most recording sites
across V1 layers. A significant increase of SS from input layers
to output layers can be observed in our data (Figures 1D and
2A) (Cavanaugh et al., 2002; DeAngelis et al., 1994; Sceniak et al., 2001; Shushruth et al., 2009). Quantitatively, the strength
of SS (characterized by SS index [SSI], defined as 1 minus the ra-
tio of response driven by the largest patch and a site’s peak
response, shown in left panel of Figure 2A) in V1 output layers
(L2/3 and 4B) is, on average, twice the strength of that in the input
layers (L4C) (SSI, 0.38 ±0.02 for L4C, N = 94; 0.67 ±0.02 for L2/3
and 4B, N = 125; rank-sum tests p < 10⁻¹⁰). Interestingly, we
found an unexpected shrinkage of the CRF in V1 output layers
compared with input layers (Figures 2A and 2B). The CRF
(defined as patch diameter reaching 95% peak response, shown
as the arrows in population-averaged patch-size tuning curve in
the middle panel of Figure 2A) in output layer is, on average, 33%
smaller than that in the input layer (CRF, 0.53° ± 0.03° for L4C, N = 94; 0.35° ± 0.03° for L2/3 and 4B, N = 125; rank-sum tests p < 10⁻¹⁰) (Figures 2B and S1B). To exclude the influence of ec-
centricities (Figure 2C) on RF size (Dow et al., 1981; Van Essen et al., 1984), we also scaled the patch-size tunings to the mean CRF size of neurons in input layers (CRF = 0.53° ± 0.03°) for
each recording session and fitted the patch-size tunings as func-
tions of scaled stimulus sizes (Figure 2D left panel). The laminar
differences are very similar (Figure 2D right panel, and Figure 2B
middle panel).
The annulus-size tuning curves at input layers and those in
output layers were also significantly different. The response to
the grating annulus in input layers was the largest when the inner
size of the annulus was zero (the same as a large grating patch),
and then the responses monotonically decreased when the inner
size of the annulus increased (Figure 2A, right). In contrast, neu-
rons in output layers have weak responses to the grating annulus
with either a small or large inner diameter, and they have the
largest responses when the grating annulus has a certain
nonzero inner diameter (a blank hole in Figure 1D), as if there
was a response elevation by the grating annulus with a specific
inner diameter. Our results in a later section suggested that
this phenomenon is likely due to a reduction in suppression in
the center region (please see details in the following sections),
so we defined the specific phenomenon in the annulus-size tun-
ing curve as center suppression. The degree of center suppres-
sion (defined as the center suppression index [CSI] captured by
1 minus the ratio of response driven by the smallest annulus and
a site’s peak response in the annulus-size tuning curve) in the V1
output layer was significantly larger than that in the input layer
(CSI, 0.07 ± 0.01 for L4C, N = 94; 0.39 ± 0.03 for L2/3 and 4B, N = 125; rank-sum tests p < 10⁻¹⁰) (Figures 2B and S1B).
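The three indices above can be computed directly from measured tuning curves. The sketch below (hypothetical helper functions; sizes assumed sorted ascending, and the toy response values purely illustrative) follows the verbal definitions: SSI and CSI are one minus the ratio of an endpoint response to the peak, and the CRF is the smallest patch diameter reaching 95% of the peak response.

```python
def ssi(patch_resp):
    """Surround suppression index: 1 minus the ratio of the response to the
    largest patch over the peak response (patch sizes ascending)."""
    return 1.0 - patch_resp[-1] / max(patch_resp)

def csi(annulus_resp):
    """Center suppression index: 1 minus the ratio of the response to the
    smallest-inner-diameter annulus over the peak annulus response."""
    return 1.0 - annulus_resp[0] / max(annulus_resp)

def crf(diams, patch_resp, frac=0.95):
    """Classical RF size: smallest patch diameter whose response first
    reaches 95% of the peak response."""
    thresh = frac * max(patch_resp)
    for d, r in zip(diams, patch_resp):
        if r >= thresh:
            return d

# Toy tuning curves (illustrative numbers only, not recorded data):
diams = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
patch = [0.2, 0.6, 1.0, 0.8, 0.5, 0.4]  # surround-suppressed patch tuning
ann = [0.3, 0.5, 0.8, 1.0, 0.4, 0.1]    # annulus tuning peaking at nonzero inner size
```

For these toy curves, `ssi(patch)` is 0.6, `csi(ann)` is 0.7, and `crf(diams, patch)` returns 0.4.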
By directly comparing responses simultaneously recorded
from all V1 layers, we found substantial interlaminar differences
(between input and output layers) for responses to either grating
patches or annuli (Figures 1D and 2) captured by three main fea-
tures in output layers, including the shrinkage of CRF and
increasing strength of SSI and CSI (Figures 1D and S1B and
Table S1). Interestingly, the three interlaminar differences were
highly correlated with each other: the change of CSIs from input
layers to output layers measured by the same probe strongly
correlated to the change in SSI (SS) (r = 0.72, p < 10⁻¹⁰; Figure 2E). Similarly, a strong correlation between changes in CRF and SSI was also observed (r = −0.56, p < 10⁻¹⁰; Figure 2F;
also see the correlation between changes in relative CRF and
SSI in Figure 2G). The correlation results indicate that these
interlaminar changes might share similar suppressive mecha-
nisms that govern the SS in V1 output layers. Our next question
is what neuronal computation could capture these important
Figure 2. Substantial interlaminar differences for spatial integration in V1
(A) The three main different features of spatial integration between input and output layers. (Left and right) The double-headed arrows show the strength of SS and
center suppression (input layers, N = 94, goodness of fit [GoF] ≥ 0.8; output layers, N = 125, GoF ≥ 0.8). (Middle) The arrows represent the diameters of the
classical receptive field (CRF, see STAR Methods). Error bars indicate SEM.
(B) The laminar patterns of three features. (Left) SS index (SSI). (Middle) The diameter of the CRF (N = 314, GoF ≥ 0.8; see STAR Methods). (Right) Center
suppression index (CSI). In all figures, vertical black lines indicate the mean of the distribution. Schematics of the three features are shown at the bottom. The
differences in their mean values between input and output are shown in the inset. Error bars indicate SEM. ***p < 10⁻¹⁰.
(C) The relationship between eccentricity and CRF in input (red dot, N = 82, GoF > 0.8 for data from sparse noise; r = 0.06, p = 0.56) and output layers (blue dot,
N = 119, GoF > 0.8 for data from sparse noise; r = 0.34, p = 0.0001).
(D) We scaled patch and annulus tunings according to the scaled ratio of the averaged CRF size in input layers from each session to the mean value of CRF size for
all neurons in input layers (mean = 0.53). (Left) The comparison of averaged scaled patch-size tunings between input (red line) and output (blue line) layers. (Right)
Laminar patterns of the relative CRF.
(E) There was a significant correlation between the interlaminar changes (between input and output layers) of SS and those of center suppression (N = 125, r = 0.72, p < 10⁻¹⁰).
(F) There was a significant correlation between the interlaminar changes of SS and size changes (between input and output layers) of the CRF (r = −0.56, p < 10⁻¹⁰).
(G) There was a significant correlation between the interlaminar changes (between input and output layers) of SS and of the relative CRF size (r = −0.57, p < 10⁻¹⁰).
See also Figure S1.
RF properties for spatial integration and their interlaminar
differences.
A single normalization only explains spatial integration
in V1 input layers
Neural responses to circular patches with different outer sizes
(patch-size tuning curve) and its SS from large patches have
been explained by either subtractive suppression (DoG model,
difference of Gaussian model; see model details in STAR
Methods) or divisive suppression (ratio-of-Gaussian [RoG]
model; see model details in STAR Methods). We confirmed
that either form of suppression could well explain patch-size
tuning curves across V1 layers (see fitting details in STAR
Methods) in our dataset (see Figures S2A–S2F). However,
when neural responses to stimulus annuli at different inner di-
ameters (annulus-size tuning curve) were also considered
(see STAR Methods for details), neither of the two descriptive
models could capture the characteristics of spatial integration
in V1 output layers (blue curves in Figure 3A). A single subtrac-
tive suppression (DoG model) failed to capture response pat-
terns elicited by patches and annuli in all V1 layers (Figure 3A,
first column), and although a single divisive suppression (RoG
model) fitted the patch-size and annulus-size tuning curves in
input layers well, it failed to account for tunings in the output
layers (Figure 3A, second column).
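The two single-stage models differ only in how the broader suppressive Gaussian acts on the narrower excitatory one: the DoG subtracts it, the RoG divides by it. A minimal 1D sketch (illustrative parameters, not the fitted equations in STAR Methods; each pool's drive is taken as its Gaussian mass inside the stimulus, computed with the error function) shows that both reproduce surround suppression for patches, while the RoG annulus-size tuning simply decreases with inner diameter, as in input layers but unlike output layers.

```python
import math

def gmass(d, sigma):
    """Fraction of a centered 1D Gaussian pool (std sigma) stimulated by a
    patch of diameter d (error function of the half-width in sigma units)."""
    return math.erf(d / (2.0 * math.sqrt(2.0) * sigma))

def dog_patch(d, ke=1.2, se=0.3, ki=0.5, si=0.9):
    # Difference of Gaussians: subtractive surround.
    return ke * gmass(d, se) - ki * gmass(d, si)

def rog_patch(d, ke=1.2, se=0.3, ki=1.5, si=0.9):
    # Ratio of Gaussians: divisive surround (normalization).
    return ke * gmass(d, se) / (1.0 + ki * gmass(d, si))

def rog_annulus(d_in, ke=1.2, se=0.3, ki=1.5, si=0.9):
    # Annulus with inner diameter d_in and a very large outer edge:
    # each Gaussian pool receives the complement of the patch drive.
    return ke * (1 - gmass(d_in, se)) / (1.0 + ki * (1 - gmass(d_in, si)))

diams = [0.2, 0.5, 1.0, 2.0, 5.0]
dog_t = [dog_patch(d) for d in diams]
rog_t = [rog_patch(d) for d in diams]
ann_t = [rog_annulus(d) for d in diams]
# Both patch tunings peak at an intermediate size (surround suppression),
# but the RoG annulus tuning only decreases with inner diameter -- it cannot
# produce the nonzero-peak annulus tuning seen in output layers.
```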
According to anatomical studies (Callaway, 1998;Lund, 1988;
Sincich and Horton, 2005), responses in V1 output layers should
be explained by the interaction of feedforward drives from input
layers and suppression in output layers (Rossi et al., 2020;Wang
et al., 2020). To some extent, the failure of a single subtractive or
divisive suppression to explain the spatial integration in output
layers is expected because neither the DoG model nor the
RoG model provided a feedforward excitation similar to the
excitatory drives of a neuron in output layers, which should be re-
sponses in input layers (Figure 2A). In the next section, we show
that a descriptive model (a CN model), composed of two normal-
ization stages, accounts for neuronal responses at V1 input and
output layers simultaneously recorded in a probe placement
(recording session).
CNs capture spatial integration in V1 output layers
There are two stages in the CN model (see model details in STAR
Methods) (Figure 4A). The first stage of the CN model contains a
traditional RoG model, which accounts for the spatial integration
Figure 3. A single normalization only explains spatial integration in V1 input layers
(A) The fitting performances of the DoG (left) and ratio-of-Gaussian (RoG) (right) models to a site in layer 3 (top) and layer 4Cα (bottom). A schematic of the model
structure is shown on the top of the tuning. The filled and hollow dots represent the raw data of the patch-size and annulus-size tuning curves, respectively. The
solid and dashed blue lines show the model fit tunings to the patch-size and annulus-size tuning curves, respectively. Error bars indicate SEM.
(B) Comparisons of GoF (see STAR Methods) for the DoG model and RoG model. The blue and red dots are sites from output and input layers, respectively (input,
N = 95; output, N = 134).
(C) The averaged GoF for input and output layers from DoG and RoG models. Error bars indicate SEM. See also Figure S2.
(patch-size tuning and annulus-size tuning) in the input layers re-
corded in a given probe placement. The second stage of the CN
model aimed to account for spatial integration in output layers
recorded in the same probe placement. Altogether, there are
five computational components (excitation and suppressions)
in the CN model: two components at first stage and three com-
ponents at the second stage. The two components at the first
stage (the RoG model) are an excitation (represented by exc_in)
and a divisive suppression (represented by div_in)(Figure 4A and
Equation 8); and the three components at the second stage are
an excitation (represented by ff), a divisive suppression (repre-
sented by div_out), and a subtractive suppression (represented
by sub_out)(Figure 4A and Equation 9). The divisive and subtrac-
tive suppressions at the second stage, as well as the excitation
and divisive suppression at the first stage (RoG model), were
all modeled as summated responses with Gaussian functions
for their pooling weights in space (represented by σ_exc_in, σ_div_in, σ_div_out, and σ_sub_out, respectively; see STAR Methods). In the CN
model, its first stage explained both patch- and annulus-size tun-
ing curves in input layers (Figure 3C), and, more importantly, its
second stage fitted those tunings in output layers recorded in the same probe placement very well (Fig-
ure 4B for an example site, and the inset of Figure 4B is the dis-
tribution of the goodness of fit for individual sites).
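The cascade can be sketched in a few lines. The following is a toy reconstruction of the two stages, not the fitted Equations 8 and 9: stage 1 is an RoG, and stage 2 divides the feedforward drive by a broad pool before subtracting a narrow pool. All spreads and gains below are illustrative assumptions. With such parameters, the second stage develops both stronger surround suppression and an annulus tuning that peaks at a nonzero inner diameter, the two output-layer signatures described above.

```python
import math

def gmass(d, sigma):
    """Fraction of a centered 1D Gaussian pool (std sigma) covered by a patch
    of diameter d; an annulus with inner diameter d drives the complement."""
    return math.erf(d / (2.0 * math.sqrt(2.0) * sigma))

# Illustrative spreads (deg) and gains -- NOT the fitted values of Eqs. 8-9.
S_EXC_IN, S_DIV_IN, S_DIV_OUT, S_SUB_OUT = 0.25, 0.6, 1.0, 0.3
KE, KDIV_IN, KDIV_OUT, KSUB_OUT = 10.0, 3.0, 2.0, 0.8

def response(d, annulus=False):
    """Return (input-layer, output-layer) responses to a grating patch of
    diameter d, or to an annulus with inner diameter d and a large outer edge."""
    drive = (lambda s: 1.0 - gmass(d, s)) if annulus else (lambda s: gmass(d, s))
    # Stage 1 (input layers): ratio of Gaussians.
    r_in = KE * drive(S_EXC_IN) / (1.0 + KDIV_IN * drive(S_DIV_IN))
    # Stage 2 (output layers): global division first, then local subtraction.
    r_out = max(0.0, r_in / (1.0 + KDIV_OUT * drive(S_DIV_OUT))
                - KSUB_OUT * drive(S_SUB_OUT))
    return r_in, r_out

diams = [0.0, 0.25, 0.5, 0.75, 1.0, 2.0, 4.0]
in_ann, out_ann = zip(*(response(d, annulus=True) for d in diams))
in_pat, out_pat = zip(*(response(d) for d in diams[1:]))

def ssi(t):  # surround suppression index of a patch tuning
    return 1.0 - t[-1] / max(t)
# Output layers show the nonzero-peak annulus tuning and much stronger SS.
```

The key design point mirrors the model: the second division pools over a wider region than the feedforward excitation, and the subtraction pools over a narrower one, so the subtractive term dies off faster with annulus inner size than the excitation does.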
At the second stage of the CN model, the feedforward excita-
tion was modulated by divisive suppression sequentially fol-
lowed by subtractive suppression (Equation 9). We also tried
other combinations of subtractive or divisive suppressions
(Figures 4C and S3;Equations 10,11,12,13, and 14). Our results
showed that the cascaded structure and the divisive suppres-
sion at the second stage were important to capture the spatial
integration in output layers (Figure S3). The operational
sequence for the divisive and subtractive suppression had a mi-
nor effect. We chose the best one (division followed by subtrac-
tion) to report the following results.
Our CN model suggested that the spatial integration in the
output layers could be explained by two suppressions and the
feedforward excitation (that is, the neuronal responses in the
input layers). We also wondered whether the model framework
enables us to predict spatial integration in the input layers solely
by fitting the patch- and annulus-size tunings in the output
layers. In this way, we used a "whole model" (Equation 16) to
Figure 4. CNs capture spatial integration in V1 output layers
(A) Schematic of the CN model structure. Stimuli are shown at the bottom. The tuning curves in input layers are fitted by the RoG model. The responses in output
layers depend on excitatory drive from input layers, subtractive suppression, and divisive suppression. The red arrow indicates the direction of the excitatory
input. The names of the four spatial spreads are at the upper right corner of the Gaussian curve.
(B) The solid and dashed blue lines show the model fit tunings to the raw data for the same L3 site in Figure 2A. The GoF is 0.96. (Inset) Histograms of GoF for the
CN model for 134 recording sites in output layers. The mean GoF is 0.93. Error bars indicate SEM.
(C) Comparison of GoF (black curve in the left axis) and explained variance (purple curve in the right axis; see STAR Methods) among different forms of the CN model. Operators (− or ÷) at the top represent the operations used by the corresponding model, with the lower operators executed before the upper operators (if any). Error bars indicate SEM.
(D) Comparison of spatial spreads in the CN model. exc_in and div_in represent the spatial spreads of feedforward excitation and divisive suppression in input layers (1.96σ_exc_in, 1.96σ_div_in in Equation 8), respectively; div_out and sub_out represent the spatial spreads of divisive and subtractive suppression in output layers (1.96σ_div_out, 1.96σ_sub_out in Equation 9), respectively (visual space in the left axis, cortical space in the right axis) (Dow et al., 1981). Error bars indicate SEM.
(E) The relationship between the spatial spreads of both divisive suppressions in input and output layers and local connections. FF, feedforward connection; LC,
local connection. Scale bar, 2 mm. See also Figures S3–S5.
fit the patch- and annulus-size tunings in the output layers (Fig-
ure S4A). The difference between the whole model and our orig-
inal CN model (Equations 8 and 9) was that the whole model ex-
plains responses in output layers without knowing any
information about responses in the input layers (in other words,
model parameters K1, K2, σ1, σ2 for the first stage in Equation 16 were optimized without any constraint from data in input
layers), but our original CN model explained responses in output
layers by knowing information about responses in the input
layers in the same recording session (because model parame-
ters of the first stage in the model have to explain responses in
the input layers). The whole model could also fit the patch- and
annulus-size tuning curves in the output layers well (Figure S4A).
More importantly, model parameters (K1, K2, σ1, σ2) in the whole
model corresponding to those at the first stage in our CN model
(Equation 8) could predict the responses of the input layers from
the same recording session. The predicted tuning curves, the
CRFs, and the SSIs, solely based on information in the output
layers of each recording session, were all significantly correlated
with those in the input layers (Figures S4B and S4C). This result
suggests that the CN model has good generalizability.
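The logic of fitting a size-tuning model to data and then reading out its parameters can be illustrated with a toy example (a coarse grid search on noiseless synthetic data; the paper's actual fitting procedure is described in STAR Methods, and all names and values here are assumptions). A single-stage RoG's gains are recovered by least squares:

```python
import math

def gmass(d, sigma):
    # Gaussian pooling drive for a patch of diameter d (see STAR Methods idea).
    return math.erf(d / (2.0 * math.sqrt(2.0) * sigma))

def rog(d, ke, kdiv, se=0.3, si=0.9):
    # Ratio-of-Gaussians patch-size tuning with fixed spreads.
    return ke * gmass(d, se) / (1.0 + kdiv * gmass(d, si))

# Synthetic "measured" patch-size tuning generated with known parameters.
diams = [0.2, 0.4, 0.8, 1.6, 3.2]
true_ke, true_kdiv = 8.0, 2.0
data = [rog(d, true_ke, true_kdiv) for d in diams]

# Coarse grid search minimizing the squared error over (ke, kdiv).
best, best_err = None, float("inf")
for ke10 in range(10, 151):      # ke in 1.0 .. 15.0
    for kd10 in range(0, 51):    # kdiv in 0.0 .. 5.0
        ke, kd = ke10 / 10.0, kd10 / 10.0
        err = sum((rog(d, ke, kd) - r) ** 2 for d, r in zip(diams, data))
        if err < best_err:
            best, best_err = (ke, kd), err
```

Because the data are noiseless and the true parameters lie on the grid, the search recovers them exactly; with real responses one would use a continuous optimizer and cross-validated goodness of fit instead.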
Considering the existence of spatial profiles of excitation and
suppressions in the CN model, we also wondered how different
these spatial spreads were. We defined the spatial spread for
each excitation/suppression as the spatial range that includes
95% of the pooling weights (the same as 1.96σ for the one-dimensional [1D] Gaussian functions). In the first stage of the CN model (input
layers), the spread of divisive suppression (1.96σ_div_in) is larger than the excitatory spread (1.96σ_exc_in), which is consistent
with previous studies (Sceniak et al., 2001). In the second stage
of the CN model (output layers), we found that subtractive sup-
pression occupied a local range, whereas divisive suppression
occupied a much larger spatial range (Figure 4D). The divisive
suppressions in both input layers and output layers are two
to three times larger than the feedforward excitation in input
layers (1.96s
exc_in
), indicating that both divisive suppressions
play a global role, while subtractive suppression is a rather
local computation. In addition, the spatial spreads of both
divisive suppressions in input and output layers (1.96σ_div_in and 1.96σ_div_out) are larger than the feedforward (Mooser et al.,
2004;Yabuta and Callaway, 1998) and local recurrent connec-
tions (Douglas et al., 1995; Rockland and Pandya, 1979) when
they are transformed to the cortical distances based on the func-
tion of the cortical magnification factor from Dow’s results
(Figure 4E; Dow et al., 1981).
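The 95% convention and the degrees-to-millimeters conversion are both one-liners. In the sketch below, the error function confirms that ±1.96σ captures 95% of a 1D Gaussian's mass; the cortical magnification factor used for the conversion is a placeholder value, not Dow et al.'s (1981) fitted function.

```python
import math

sigma = 0.4  # deg, an arbitrary example spread
half_width = 1.96 * sigma
# Mass of a zero-mean Gaussian with std sigma inside [-half_width, +half_width]:
mass = math.erf(half_width / (math.sqrt(2.0) * sigma))
# mass is ~0.95: the +/-1.96-sigma range holds 95% of the pooling weights.

# Converting a spread in visual degrees to cortical millimeters just scales
# by the local cortical magnification factor (placeholder value here,
# NOT Dow et al.'s fitted function, which varies with eccentricity).
cmf_mm_per_deg = 3.0
spread_deg = 2 * half_width   # full 95% range in degrees
spread_mm = spread_deg * cmf_mm_per_deg
```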
CNs are important for spatial integration in V1
To reveal why the CN model could capture the spatial integration
in output layers while previous models could not, we analyzed
the influence of multiple computational components. For
patch-size tuning, two layer-specific divisions and one subtrac-
tion successively increased the strength of SS, and both
divisions compressed the CRF (Figures S6A–S6C). For the
annulus-size tuning, the decrease of inhibitions with increased
annulus size has two effects: (1) the divisive suppression scales
annulus tuning of excitation at corresponding stage (marked as
red dashed line in Figure S6D for input layer and black dashed
line S6E for output layer) to lower amplitude more for smaller
annulus than for large annulus, which broadens the annulus
tuning of excitation; (2) the strength of subtractive suppression
decreases with annulus size much faster than that of excitation
does (Figure S6F), which means that the local subtraction is
very weak for stimulus annulus beyond medium sizes (the gray
dashed line Figure S6F). Therefore, the relative response in-
crease to the annulus at the medium size was due to the reduc-
tion of subtraction plus the weakening and widening of the
response field due to the divisions. (Figure S6F). However, a
model with a single divisive and subtractive operation added to
an excited Gaussian was not enough to capture the relative
response increase and its spatial range for both patch and
annulus tunings (the rightmost column in Figures S3A–S3C).
These results suggested that two divisions are needed. In other
words, CNs are important for spatial integration (for both stim-
ulus patches and annuli) in V1 output layers.
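The two-stage cascade described above can be sketched numerically. This is a schematic illustration with made-up parameters, not the paper's Equations 8 and 9: stage one is a ratio of Gaussians (divisive normalization), and stage two adds a local subtraction plus a second, broader division. With these illustrative parameters the output stage shows stronger surround suppression and a smaller preferred size, as in the data:

```python
import math

def drive(radius, sigma):
    """Total weight of a 1D Gaussian pool inside a patch of this radius."""
    return math.erf(radius / (sigma * math.sqrt(2)))

def stage1(radius):
    """Input layers: ratio of Gaussians (one divisive normalization)."""
    exc = 10.0 * drive(radius, 0.5)   # narrow feedforward excitation
    div = 2.0 * drive(radius, 1.5)    # broader divisive pool
    return exc / (1.0 + div)

def stage2(radius):
    """Output layers: local subtraction plus a second, global division."""
    sub = 1.0 * drive(radius, 0.7)    # local subtractive suppression
    div = 1.5 * drive(radius, 2.0)    # global divisive suppression
    return max(0.0, stage1(radius) - sub) / (1.0 + div)

radii = [0.05 * i for i in range(1, 201)]     # patch radii up to 10 deg
tun_in = [stage1(r) for r in radii]
tun_out = [stage2(r) for r in radii]

def ssi(tuning):
    """Surround suppression index: (peak - largest-size response) / peak."""
    return (max(tuning) - tuning[-1]) / max(tuning)

crf_in = radii[tun_in.index(max(tun_in))]     # preferred (peak) patch size
crf_out = radii[tun_out.index(max(tun_out))]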
Next, we investigated in what aspects the different computa-
tional components at the second stage (the feedforward
excitation from input layers and the divisive and subtractive
suppressions in output layers) contribute to the three main RF
properties (SSI, CRF, and CSI) related to spatial integration in
output layer (Figure 2A). This was done as follows, with the SSI
as an example. For a given probe placement, we started with
the CN model parameters that fit well to patch- and annulus-
size tuning curves in both input layer and output layer. The SSI
in the first stage of the CN model was defined as SSI_FF (feedforward
drive; red curves in Figure 5A), and the SSI in the second
stage was defined as SSI_Full (full model explanation for the output
layer; blue curves in Figure 5A). We then defined SSI_FF&Sub as
the SSI when the divisive component in the second stage was
removed (response in the output layer with only a subtraction; black
curves in the upper panels of Figure 5A), which represents the
SSI due to the FF input and the subtractive suppression in the
second stage; similarly, we defined SSI_FF&Div as the SSI when
the subtractive suppression was removed (response in the output
layer with only a division; black curves in the lower panels of
Figure 5A).
As our results showed, both the subtractive and divisive
suppression could strengthen the SS on the basis of the FF drive
(Figures 5A and 5B, middle). In addition, we quantitatively
estimated how much different computational components
contributed to the SSI in the output layer. We defined the contribution
of the FF drive as the ratio of SSI_FF to SSI_Full, and the contribution
of each suppression (Sub or Div) as the SSI change relative
to SSI_Full when either of the two suppressions was added (Figure 5C;
see STAR Methods). The remainder, one minus the
sum of the three computational contributions, was considered the
contribution of the interaction between subtraction and division.
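That decomposition can be illustrated numerically. All values below are hypothetical, chosen so the decomposition reproduces the percentages reported in the text (47% FF, 11% Sub, 23% Div, 19% interaction); the formula is our reading of the definitions here, not a quote of the STAR Methods:

```python
# Hypothetical SSI values; the decomposition below follows the text's
# definitions (ratios relative to the full-model SSI).
ssi_full = 0.60                         # full CN model, output layer
ssi_ff = 0.47 * ssi_full                # feedforward drive alone
ssi_ff_sub = ssi_ff + 0.11 * ssi_full   # FF plus subtraction only
ssi_ff_div = ssi_ff + 0.23 * ssi_full   # FF plus division only

c_ff = ssi_ff / ssi_full                      # contribution of the FF drive
c_sub = (ssi_ff_sub - ssi_ff) / ssi_full      # contribution of subtraction
c_div = (ssi_ff_div - ssi_ff) / ssi_full      # contribution of division
c_int = 1.0 - (c_ff + c_sub + c_div)          # interaction term (residual)
```

By construction the four contributions sum to one, which is what makes the residual interpretable as an interaction term.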
To our surprise, the FF drive played the key role in the SS of
output layers, accounting for 47%, followed by the divisive
and subtractive components (Figure 5C, left; 23% versus
11%). Although the absolute divisive suppressive contribution
to the SS was not very large, the FF’s suppressive effect origi-
nated from the divisive suppression in the first stage. In total,
divisive normalization plays the most important role in SS.
Similarly, our results suggested that the divisive suppression
contributed most to the shrinkage of CRF (Figures 5B and 5C,
middle) and that center suppression mainly depended on the
Cell Reports 40, 111221, August 16, 2022 · Article · OPEN ACCESS
Figure 5. Contributions of different computational components to the spatial integration in output layers
(A) Comparison of tuning curves among four models (feedforward drive fitted by the RoG model, feedforward and sub model, feedforward and div model, and full
model). Solid and dashed lines represent the patch-size and annulus-size tuning curves from the four models, and the black arrows represent the changes caused
by subtractive or divisive suppression.
(B) The differences in three main features (SSI, CRF, CSI) among the four models. The significances are shown at the top: ***p < 10^-10; ns, p > 0.05.
(C) The average percentage of contributions from the FF, Sub, and Div to three features (SSI, reduction of CRF, CSI). See also Figure S6.
interaction between subtractive and divisive suppression
(Figures 5B and 5C, right), which again suggests the important
functions of normalization in spatial integration in V1.
Thus far, we have shown performances and contributions of
different computational components in the CN model in 1D
space (Figures 4 and 5). We found that models with CNs in
two-dimensional (2D) space (Figure S5) can also account for
our data, and the conclusions based on the 2D models hold
similarly; they are elaborated on in the STAR Methods and
supplemental materials and discussion.
Spatial integration in CNNs is different from that in V1
Many studies have suggested that early layers in CNNs closely
resemble the early visual cortex (V1) of primates (Cichy et al.,
2016;Giraldo and Schwartz, 2019;Guclu and van Gerven,
2015;Xu and Vaziri-Pashkam, 2021;Zeiler and Fergus, 2014),
including RF structures (Gabor filter) and computational opera-
tions. However, the comparisons between CNNs and the visual
system have, thus far, mostly been assessed with population-
based metrics (such as representational similarity analysis
[RSA]), and those studies (Kriegeskorte and Kievit, 2013; Xu
and Vaziri-Pashkam, 2021) did not make claims regarding the
similarity of single-neuron properties, including spatial integration
properties. Whether CNNs have spatial integration
capabilities that rival macaque V1, or whether CNNs
apply normalization for spatial integration in a way similar to
the biological visual cortex are still unclear. To address these
questions, we compared spatial integration in macaque V1
with that of artificial neurons in the convolutional layers of
five different CNNs, including AlexNet, DenseNet-201,
GoogleNet, ResNet-18, and VGG, which were all pre-trained
with ImageNet images (Deng et al., 2009). We used grating
patches and annuli of different sizes to activate each unit in
the first five convolutional layers of these CNNs. Taking the first
layer of AlexNet as an example (Figure 6A), we formed patch-
and annulus-size tuning curves for each unit (Figure 6B). The
strengths of SS (SSI) and CSI in the first layer of AlexNet
were very weak and significantly different from those in V1
output layers of macaques (Figure 6C).
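Probing CNN units this way only requires grating patch and annulus images. A minimal sketch of such stimulus generation (image size, spatial frequency, and masking conventions here are illustrative assumptions, not the paper's exact stimuli):

```python
import numpy as np

def grating(size=128, sf=0.08, theta=0.0, phase=0.0):
    """Sinusoidal luminance grating; sf in cycles per pixel."""
    y, x = np.mgrid[0:size, 0:size] - size // 2
    return np.sin(2 * np.pi * sf * (x * np.cos(theta) + y * np.sin(theta)) + phase)

def patch_stimulus(radius, size=128):
    """Grating shown only inside a central disk; mean gray (0) elsewhere."""
    y, x = np.mgrid[0:size, 0:size] - size // 2
    return grating(size) * (x ** 2 + y ** 2 <= radius ** 2)

def annulus_stimulus(inner, outer, size=128):
    """Grating shown only inside an annulus between the two radii."""
    y, x = np.mgrid[0:size, 0:size] - size // 2
    r2 = x ** 2 + y ** 2
    return grating(size) * ((r2 >= inner ** 2) & (r2 <= outer ** 2))
```

Sweeping `radius` (or `inner`/`outer`) and recording a unit's center activation yields the patch- and annulus-size tuning curves analyzed here.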
When we evaluated the spatial integration of CNNs for deeper
layers, we found that the SS (SSI) and CSI of artificial neurons
increased as the layer deepened (Figures S7A–S7E; Table S4).
The SSI distributions most similar to those in the V1 output
layers are located in layers 3–4, while the most similar CSI distributions
are located in layers 2–4 across the five CNNs (Figures S7A–S7E). In
addition, we calculated a similarity score (for SSI and CSI) based on
the Kolmogorov-Smirnov (KS) distance between
CNNs and V1 (Marques et al., 2021; see STAR Methods). The
layer most similar to V1 for spatial integration is the middle layer
(L2–L4; Figure 6D).
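One plain reading of such a score is one minus the two-sample KS statistic. The sketch below implements that reading, which may differ from the exact definition in Marques et al. (2021):

```python
import bisect

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov distance between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    def cdf(sample, x):
        # Fraction of the sorted sample that is <= x.
        return bisect.bisect_right(sample, x) / len(sample)
    points = sorted(set(a) | set(b))
    return max(abs(cdf(a, x) - cdf(b, x)) for x in points)

def similarity(a, b):
    """Similarity score in [0, 1]; 1 means identical empirical distributions."""
    return 1.0 - ks_statistic(a, b)
```

Applied per layer to the SSI (or CSI) samples from a CNN and from V1, the layer with the highest score is the one reported as "most similar."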
In early sections, we showed that interlaminar differences of
SS (SSI) and CRF in V1 were mainly due to normalization in
output layers (Figure 5). We further evaluated the effects of
normalization on spatial integration in the CNNs by comparing
the changes in SSI or CRF before and after normalization in the
convolutional layers (see Table S5 for the CNN layers sampled).
Although the normalizations (local response normalization [LRN])
in the AlexNet and GoogleNet influence the shrinkage of CRF,
surprisingly, the operation of normalization (including both batch
normalization and LRN) in CNNs had almost no effect on the SSI
(Figure 6E).
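The LRN operation referenced above makes the lack of a spatial effect concrete: it pools only over neighboring channels at the same spatial location. A self-contained sketch of the published formula (Krizhevsky et al., 2017), with the standard constants:

```python
import numpy as np

def lrn(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """AlexNet-style local response normalization.

    a has shape (channels, height, width). Each activation is divided by a
    term pooled over up to n neighboring channels at the SAME spatial
    location; there is no pooling over space, so LRN alone cannot create
    spatial surround suppression.
    """
    channels = a.shape[0]
    out = np.empty_like(a, dtype=float)
    for i in range(channels):
        lo, hi = max(0, i - n // 2), min(channels - 1, i + n // 2)
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        out[i] = a[i] / denom
    return out

acts = np.ones((8, 4, 4))
normed = lrn(acts)
```

Because the denominator is computed independently at every pixel, perturbing one spatial location leaves the normalized output at all other locations unchanged — unlike the spatially extended divisive pools measured in V1.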
DISCUSSION
By using visual stimuli with various spatial configurations and
directly comparing responses simultaneously recorded from all
V1 layers, we have shown substantial interlaminar differences
for features of spatial integration between input and output layers.
We further demonstrated that a descriptive model (both 1D and
2D) with CNs could capture the response properties of spatial
integration in V1 and their laminar variations (Figure 7A). Interest-
ingly, we found that the spatial integration abilities in the lower
layers of CNNs were generally weak, and they were less affected
by the normalization in CNNs (Figure 7B). Our results suggested
that suppressions with CNs are essential for spatial integration
and laminar processing in the visual cortex, which might be a gen-
eral computational framework for laminar processing in the brain
that the current artificial networks might miss.
Descriptive models for V1 spatial integration
Previous studies have shown that either a subtraction (DoG
model) (Alitto and Usrey, 2015;Keller et al., 2020b;Sceniak
et al., 2006) or a division (RoG model) (Cavanaugh et al., 2002;
Self et al., 2014;Vangeneugden et al., 2019) fitted well to the
SS in V1. This is largely because the patch-size tuning curve
(responses to grating patches of different sizes) was solely
used in most studies and there were not enough constraints
for differing models (also see our confirmation in Figure S2).
However, when experimental conditions with more spatial con-
figurations were included in the model evaluations, the medium
SS in V1 input layer could only be fitted by the traditional RoG
model with a divisive computation (Figure 3), and an additional
normalization followed by a subtraction was required to explain
the strong SS in output layer (Figure 4).
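The underconstraint is easy to reproduce: both a DoG and a RoG produce a peaked patch-size tuning curve with surround suppression, so patch tuning alone cannot distinguish subtraction from division. A sketch with illustrative parameters (not fitted values from the paper):

```python
import math

def drive(radius, sigma):
    """Total weight of a 1D Gaussian pool inside a patch of this radius."""
    return math.erf(radius / (sigma * math.sqrt(2)))

def dog(r):
    """Difference of Gaussians: subtractive surround suppression."""
    return 3.0 * drive(r, 0.5) - 2.0 * drive(r, 1.5)

def rog(r):
    """Ratio of Gaussians: divisive surround suppression (normalization)."""
    return 10.0 * drive(r, 0.5) / (1.0 + 2.0 * drive(r, 1.5))

radii = [0.05 * i for i in range(1, 201)]

def ssi(model):
    tuning = [model(r) for r in radii]
    return (max(tuning) - tuning[-1]) / max(tuning)

ssi_dog, ssi_rog = ssi(dog), ssi(rog)   # both clearly positive
```

Both models yield substantial SS on patch tuning alone; it is the annulus conditions that pull the two computations apart.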
Most studies (Alitto et al., 2019;Cavanaugh et al., 2002;
DeAngelis et al., 1994;Fisher et al., 2017;Keller et al., 2020a;
Levitt and Lund, 2002;Vangeneugden et al., 2019;Yu et al.,
2022) have used 1D models (DoG or RoG) to describe the SS
in patch-size tuning, but a 2D model (Roberts et al., 2005;Sce-
niak et al., 2006) is more appropriate to describe the properties
of spatial integration because the cortical neuron at a given
V1 depth and visual stimuli used in visual space are both in
2D space. Our study also provides evaluations for different
CN models in 2D space (in the STAR Methods section). Model
parameters and performances between our 1D CN model and
the 2D Gaussian CN model (Figures S5B–S5D) suggest that 2D
Gaussian functions may not be the best for capturing the spatial
summation weights of model components. How exactly we
should describe the weights of spatial integration might be an
intriguing question for future studies. Our study further provides
a 2D modified CN model whose fitting performances
and model parameters were all consistent with the results
of the 1D CN model (Figures S5E–S5G). The advantage of a 2D
CN model is that it allows us to estimate neural responses to im-
ages with more complex patterns (Figures S5N and S5O), such
as natural scenes.
It is worth pointing out that descriptive models (either 1D or 2D)
in this study do not suggest a pure cascaded model for laminar
processing and spatial integration. Instead, our models suggest
the importance of CNs. In the evaluation of different models, we
also tried a pure cascaded model in which both the suppressions
at the second stage were calculated from the responses at the
Figure 6. Spatial integration in CNNs is different from that in V1
(A) Schematic of the main structure of AlexNet. There are four main types of operations in the artificial neural network: Conv (convolution), ReLU (rectified linear
unit), normalization, and max pooling. We extract the feature representations (the center activation of each feature map) in the five convolutional layers (layers
rendered with cubes) of the grating stimuli shown to the monkeys.
(B) The normalized and scaled patch- and annulus-size tuning curves across all neurons (channels) in AlexNet layer 1 (L1). The solid and dashed black lines
indicate the averaged tunings.
(C) Comparisons of cumulative distributions of the strength of SS and center suppression between AlexNet layer 1 and V1. Average values are indicated by
triangles.
(D) The similarity of SSI and CSI between five CNNs and V1. The horizontal axis represents different layers, and the vertical axis represents different networks.
(E) Comparisons of changes in SSI and CRF before and after normalization between five CNNs and V1 output layers. See also Figure S7. Error bars indicate SEM.
first stage (we defined this model as the 2D CN model with Div-
Input and Sub-Input; see details in the STAR Methods section
"materials availability"). However, the pure 2D cascaded model
with Div-Input and Sub-Input did not show good performances
(Figures S5H–S5J) unless we calculated the subtractive compo-
nent at the second stage from visual stimuli or pre-cortical
inputs (Figures S5K–S5M). This suggests that the mechanisms
contributing to the local subtraction may be different from those of
the global division. The local subtraction may be related to the inhibitory
signals from the input layers (bottom up), supported by experi-
mental findings that neurons at output layers can be directly in-
hibited by neurons in V1 input layers (Binzegger et al., 2004;
Dantzker and Callaway, 2000;Katzel et al., 2011;Xu et al.,
2016). Nevertheless, these results still support the conclusion
that CNs play an important role in spatial integration in V1,
because the global division, as a necessary computation for
spatial integration, is robust regardless of whether it is calculated
from the visual stimulus or from the responses in input layers.
Neural mechanisms of spatial integration in V1
Previous results from whole-cell recordings in anesthetized cats
showed that large stimulus patches, compared with the optimal
size, lead to decreases in both inhibition and excitation, with a
larger reduction for excitation, to the V1 neurons (Ozeki et al.,
2009). Similarly, a recent study with whole-cell recording in
awake macaques also showed reduced synaptic input from
larger stimuli (Li et al., 2020). Our experimental results, based
on simultaneous recordings in input and output layers from the
same V1 column, support the conclusions about the reduction
of excitation with large stimuli in the two studies. The excitatory
drive (the responses at V1 input layers) to the output layer (the
second stage in our model) indeed decreased with a large stim-
ulus size. In particular, our results (responses at the input layer
decrease with large stimulus) might provide a possible explana-
tion for the study from V1 layer 2/3 in awake monkeys (Li et al.,
2020). It is worth pointing out that our CN models predict that
division and subtractions increase with stimulus size, which is
inconsistent with the report in anesthetized cats that the inhibi-
tion was reduced by large stimuli (Ozeki et al., 2009). We realize
that the interaction between divisive and subtractive suppres-
sions in our model suggests that the size dependence of inhibi-
tion may be complex. We also speculate that species differences
might lead to such a discrepancy (Han et al., 2021b). Model re-
sults of macaques and the experimental results of mice showed
that enhanced inhibition at large size plays an essential role in
spatial integration (Adesnik et al., 2012;Schwabe et al., 2006;
Shushruth et al., 2012). A dynamic model similar to that for anes-
thetized cats (Ozeki et al., 2009) is worth considering in future
work to further explore the underlying interaction between
excitation and inhibition (Angelucci et al., 2017;Haider and
McCormick, 2009;Ozeki et al., 2009) in a laminar-specific
manner.
It has been shown that somatostatin-expressing (SOM) neu-
rons play an important role in SS in mouse V1 (Adesnik et al.,
2012). It is still an interesting question to ask whether SOM
modulates pyramidal neurons in a subtractive or divisive way un-
der large and complex spatial configurations. Studies in mice
with optogenetic stimulation and in vivo two-photon imaging
(El-Boustani and Sur, 2014;Lee et al., 2012;Wilson et al.,
2012) suggested that SOM neurons and parvalbumin-express-
ing (PV) neurons can modulate neurons in either a subtractive
or divisive way. In addition, the modulatory effect of SOM
Figure 7. Summary of the differences between CNNs and V1 in spatial integration
(A) (Left) Schematics of the computational principles of spatial integration in V1. Red shading illustrates input layers (L4C and L6) in which the division plays a key
role. Blue shading illustrates output layers (L2/3, L4B, and L5) that need CNs and a subtractive component. (Right) Schematics of the changes in the tuning curves
to the spatial integration. Red arrows indicate the origin of the excitatory inputs. Red curves indicate tunings inherited from input layers (solid line represents
patch-size tuning; dashed line represents annulus-size tuning). Blue curves indicate the output tunings modulated by the interactions among the feedforward
drive and divisive and subtractive suppressions. The black arrows represent the changes between input and output layers.
(B) (Left) Schematic of the main structure in AlexNet. (Right) Schematics of the changes by normalization in AlexNet. Red and blue curves indicate tunings in the
ReLU and normalization layer, respectively. The black arrows represent the changes between the ReLU and normalization layer.
neurons also depends on the stimulus size (El-Boustani and Sur,
2014). However, because the types of inhibitory neurons might
vary among species (Kooijmans et al., 2020), the specific cellular
mechanisms for divisive or subtractive suppression in the
macaque cortex require further study.
Neural functions of normalization in the brain
The basic idea of normalization in the biological brain is that a
neuron’s excitatory drive is divided by the summed neural
activity from neighboring neurons (Carandini and Heeger,
2012). Studies have shown that normalization could explain a
wide variety of phenomena in many brain regions, including
contrast gain control (Shapley and Victor, 1981;Shapley and
Xing, 2013;Sit et al., 2009), cross-orientation suppression
(Busse et al., 2009;Heeger, 1992;Wang et al., 2021), responses
to motion patterns in MT (Rust et al., 2006), attention modulation
(Bloem and Ling, 2019;McAdams and Maunsell, 1999;Reynolds
and Heeger, 2009;Verhoef and Maunsell, 2016), and multisen-
sory integration (Ohshiro et al., 2011;Pack and Bensmaia,
2015). Our study provides strong experimental evidence for the
existence and importance of CNs for spatial integration in V1.
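The canonical form referenced here can be sketched as follows (a textbook-style illustration of the Carandini and Heeger equation with arbitrary constants, not this paper's fitted model). Note how doubling every input less than doubles the output, i.e., contrast gain control:

```python
import numpy as np

def normalize(drives, sigma=1.0, n=2.0):
    """Canonical divisive normalization: each neuron's driven input is
    divided by the summed activity of the normalization pool plus a
    semi-saturation constant sigma (constants here are arbitrary)."""
    d = np.asarray(drives, dtype=float) ** n
    return d / (sigma ** n + d.sum())

# Contrast gain control: doubling every input less than doubles the output.
low = normalize([1.0, 1.0, 1.0])
high = normalize([2.0, 2.0, 2.0])
```

The saturating behavior comes entirely from the pooled term in the denominator, which is the shared ingredient across the phenomena listed above.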
Normalization is considered to have an important role in
information processing in the brain by removing statistical
redundancy or increasing independence between neuronal
responses (Schwartz and Simoncelli, 2001). The two normaliza-
tions found in our study seem redundant for serving such a func-
tion, but because the hierarchical feedforward connection might
generate neural correlation from one stage to another (Renart
et al., 2010), CNs might be necessary for keeping independent
information from layer to layer. CNs might also remove redun-
dant information in different feature dimensions. Normalization
in the first stage (input layers) might reduce redundancy for
non-orientation-specific information (such as luminance and
contrast information) in a large space, and normalization in the
second stage (output layers) might further remove redundancy
for orientation/motion-specific information. The cascaded
structure of normalization may provide a general framework for
other visual cortices or other sensory cortices. In addition to
the functions in information coding at the individual or population
neuronal levels, normalization has also been proposed for
neuronal functions related to perception in V2/V4/MT, such as
border ownership, visual pop-out, and crowding (Coen-Cagli
et al., 2012;Gao and Vasconcelos, 2009;Henry and Kohn,
2020;Rust et al., 2006;Zhou et al., 2000). Our study has
shown that the combination of normalization and subtraction
can generate strong nonlinearity in V1 (a relative increase in re-
sponses to medium-sized annulus). We speculated that the
CNs also help downstream cortical regions gain more nonlinear
functions related to visual perceptions. How neural normalization
in V1 is related to visual perception is an interesting topic for
future study.
Spatial integration and normalizations in CNNs
Although CNNs have been challenged for missing recurrent and
feedback connections (Huang et al., 2020;Kar et al., 2019;Xu
and Vaziri-Pashkam, 2021), they have achieved good perfor-
mance in object categorization (Kriegeskorte, 2015;Rajalingham
et al., 2018;Serre, 2019;Yamins and DiCarlo, 2016). In
particular, studies have shown that the first layer of the five
CNNs chosen by our study is similar to the primate primary visual
cortex (Cichy et al., 2016;Giraldo and Schwartz, 2019;Guclu
and van Gerven, 2015;Xu and Vaziri-Pashkam, 2021;Zeiler
and Fergus, 2014). This suggests that the feedforward connections
in V1 and in CNNs more or less share similar principles.
However, our results showed that the properties of spatial inte-
gration at the lowest layer in the five CNNs are different from
those of V1 (Figures 6B, 6C, and S7). This suggests that the
missing feedback connections might be a main reason for the
different spatial integration between the biological brain and
artificial intelligence (AI).
Regardless of the connections in CNNs, it is worth pointing out
that distributions of SSI or CSI in a specific deeper layer of the
five CNNs (layers 2–4 depending on CNN types) are more similar
to those in V1 output layers (Figures S7A–S7E). This suggests
that CNNs are able to integrate spatial information by cascaded
computations at multiple layers, but the computation is different
from that in V1. Two types of normalization have been proposed
and widely used in CNNs: LRN in AlexNet (Krizhevsky et al.,
2017) and GoogleNet; and batch normalization (BN) in ResNet-
18 and DenseNet-201 (Ioffe and Szegedy, 2015). These normal-
izations did not generate V1-like surround effects in CNNs
because neither the LRN nor the BN has the same computational
form as the normalizations in V1. Although the LRN is inspired by
lateral inhibition in the brain, it offers little effect on the SSI, and
its influence on the CRF size is weak. We speculated that the
normalization in AlexNet lacks interactions among spatial
contexts or has a different spatial range for interactions. The
activation of an artificial neuron is normalized by neighboring
neurons at the same spatial location (Krizhevsky et al., 2017).
Under the BN, on the other hand, the activation of an artificial neuron
is normalized by the entire feature map with a uniform weight in
space (Ioffe and Szegedy, 2015). In contrast, in V1 the
spatial extents of normalization are about two to four times
larger than the excitatory spatial extents (Figure 4).
Recently, some modified CNNs considering biological normali-
zation or spatial schemes have shown improvements in different
ways (Giraldo and Schwartz, 2019;Hasani et al., 2019;Iyer et al.,
2020;Ren et al., 2016), while the structures and spatial extents
are not similar to our results in V1 (Figure 4). In addition, our
results and other previous studies all show significant laminar
variances (Bijanzadeh et al., 2018;Henry et al., 2013,2020;
Wang et al., 2020;Xing et al., 2012;Yeh et al., 2009) of neural
properties, which suggests the importance of interlaminar
processing, while the modulation within layers is limited in
CNNs (Figure 6E). In summary, the precise nature and function
of normalization have not been well reflected in CNNs (Giraldo
and Schwartz, 2019;Turner et al., 2019), suggesting that consid-
ering the spatial extents and the functions of normalization in the
primate visual cortex may further improve the performance of
CNNs.
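For comparison with the LRN discussed above, inference-style batch normalization rescales every spatial position of a feature map by the same statistics, i.e., a spatially uniform weight. A minimal sketch:

```python
import numpy as np

def batch_norm(feature_map, eps=1e-5):
    """Inference-style batch normalization of a single feature map: every
    spatial position is rescaled by the SAME mean and variance, so the
    normalizer weights space uniformly (Ioffe and Szegedy, 2015)."""
    mu = feature_map.mean()
    var = feature_map.var()
    return (feature_map - mu) / np.sqrt(var + eps)

fmap = np.arange(16.0).reshape(4, 4)
normed_map = batch_norm(fmap)
```

Neither a channel-local pool (LRN) nor a spatially uniform rescaling (BN) implements the distance-dependent divisive pools measured in V1.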
Limitations of the study
Our CN model is a descriptive model that tries to capture the
properties of steady-state responses for spatial integration and
their laminar variations in V1. Such a model still has a gap to
reach the true neural mechanisms, although we succeeded in
characterizing the manner of the inhibitory effect for spatial integration
in macaque V1, including division and subtraction. We
can hardly reveal the underlying interactions between excitation
and inhibition, unlike results from intracellular recordings or dynamic
models (Angelucci et al., 2017; Haider and McCormick,
2009; Li et al., 2020; Ozeki et al., 2009; Schwabe et al., 2006;
Shushruth et al., 2012). Furthermore, there is a significant
difference in the spatial profile of the 1D and 2D Gaussian, espe-
cially when fitting small-sized patch stimuli (Figures 4B and S5B).
The spatial profile in the Gaussian form has been a widely used
function in the descriptive model; whether the spatial connection
weights among the population neurons conform to the Gaussian
shape is an interesting question for further exploration. Finally,
the accurate circuitry origins for each computation in our CN
model are still unclear. We could only speculate that division
may originate from long-range horizontal connections (Angelucci
et al., 2002b;Shushruth et al., 2012,2013) or feedback connec-
tions (Angelucci et al., 2002a;Gilbert and Wiesel, 1989;Stettler
et al., 2002;Wang et al., 2020;Yang et al., 2022), while the sub-
traction might be related to the local recurrent connections
(Douglas et al., 1995;Rockland and Pandya, 1979) based on
their spatial extents. More experimental and computational
studies are needed to further reveal the underlying biological
mechanisms.
STAR+METHODS
Detailed methods are provided in the online version of this paper
and include the following:
- KEY RESOURCES TABLE
- RESOURCE AVAILABILITY
  - Lead contact
  - Materials availability
  - Data and code availability
- EXPERIMENTAL MODEL AND SUBJECT DETAILS
- METHOD DETAILS
  - Electrophysiological recordings
- VISUAL STIMULI
- QUANTIFICATION AND STATISTICAL ANALYSIS
  - Laminar alignment
  - Characteristics of spatial integration
  - Model fitting and evaluation
  - 2D models with cascaded normalizations
  - 2D CN model could be used in more complex spatial configurations
  - Fitting goodness
  - Computational contribution
  - CNN details
  - Comparing the spatial integration between the brain and CNNs
  - Statistics
SUPPLEMENTAL INFORMATION
Supplemental information can be found online at https://doi.org/10.1016/j.celrep.2022.111221.
ACKNOWLEDGMENTS
We thank all four anonymous reviewers for their constructive comments
and valuable advice. This work was supported by a fellowship of the China
Postdoctoral Science Foundation (grant no. 2021M690435) and the National
Natural Science Foundation of China (grant no. 32100831 [T.W.] and grant
no. 32171033 [D.X.]).
AUTHOR CONTRIBUTIONS
D.J.X., Y.L., and T.W. designed the research. Y.L., T.W., Y.Y., W.F.D., Y.J.W.,
L.Y.Z., C.L.H., and D.J.X. performed the research. Y.L., T.W., G.W., L.L., and
D.J.X. analyzed the data. Y.L., T.W., L.F.L., and D.J.X. conducted the model
fitting. Y.L., T.W., F.D., and D.J.X. wrote the paper.
DECLARATION OF INTERESTS
The authors declare no competing interests.
Received: November 23, 2021
Revised: April 19, 2022
Accepted: July 25, 2022
Published: August 16, 2022
REFERENCES
Adesnik, H., Bruns, W., Taniguchi, H., Huang, Z.J., and Scanziani, M. (2012). A
neural circuit for spatial summation in visual cortex. Nature 490, 226–231.
https://doi.org/10.1038/nature11526.
Alitto, H.J., Rathbun, D.L., Fisher, T.G., Alexander, P.C., and Usrey, W.M.
(2019). Contrast gain control and retinogeniculate communication. Eur. J. Neu-
rosci. 49, 1061–1068. https://doi.org/10.1111/ejn.13904.
Alitto, H.J., and Usrey, W.M. (2015). Surround suppression and temporal pro-
cessing of visual signals. J. Neurophysiol. 113, 2605–2617. https://doi.org/10.
1152/jn.00480.2014.
Allman, J., Miezin, F., and McGuinness, E. (1985). Stimulus specific responses
from beyond the classical receptive field: neurophysiological mechanisms
for local-global comparisons in visual neurons. Annu. Rev. Neurosci. 8,
407–430. https://doi.org/10.1146/annurev.ne.08.030185.002203.
Angelucci, A., Bijanzadeh, M., Nurminen, L., Federer, F., Merlin, S., and Bressloff,
P.C. (2017). Circuits and mechanisms for surround modulation in visual
cortex. Annu. Rev. Neurosci. 40, 425–451. https://doi.org/10.1146/annurev-
neuro-072116-031418.
Angelucci, A., Levitt, J.B., and Lund, J.S. (2002a). Anatomical origins of the
classical receptive field and modulatory surround field of single neurons in ma-
caque visual cortical area V1. Prog. Brain Res. 136, 373–388. https://doi.org/
10.1016/s0079-6123(02)36031-x.
Angelucci, A., Levitt, J.B., Walton, E.J.S., Hupe, J.M., Bullier, J., and Lund, J.S.
(2002b). Circuits for local and global signal integration in primary visual cortex.
J. Neurosci. 22, 8633–8646.
Bair, W., Cavanaugh, J.R., and Movshon, J.A. (2003). Time course and time-
distance relationships for surround suppression in macaque V1 neurons.
J. Neurosci. 23, 7690–7701.
Bijanzadeh, M., Nurminen, L., Merlin, S., Clark, A.M., and Angelucci, A. (2018).
Distinct laminar processing of local and global context in primate primary vi-
sual cortex. Neuron 100, 259–274.e4. https://doi.org/10.1016/j.neuron.2018.
08.020.
Binzegger, T., Douglas, R.J., and Martin, K.A.C. (2004). A quantitative map of
the circuit of cat primary visual cortex. J. Neurosci. 24, 8441–8453. https://doi.
org/10.1523/Jneurosci.1400-04.2004.
Blasdel, G.G., and Lund, J.S. (1983). Termination of afferent axons in macaque
striate cortex. J. Neurosci. 3, 1389–1413.
Bloem, I.M., and Ling, S. (2019). Normalization governs attentional modulation
within human visual cortex. Nat. Commun. 10, 5660. https://doi.org/10.1038/
s41467-019-13597-1.
Buffalo, E.A., Fries, P., Landman, R., Buschman, T.J., and Desimone, R.
(2011). Laminar differences in gamma and alpha coherence in the ventral
stream. Proc. Natl. Acad. Sci. USA 108, 11262–11267. https://doi.org/10.
1073/pnas.1011284108.
Busse, L., Wade, A.R., and Carandini, M. (2009). Representation of Concurrent
stimuli by population activity in visual cortex. Neuron 64, 931–942. https://doi.
org/10.1016/j.neuron.2009.11.004.
Callaway, E.M. (1998). Local circuits in primary visual cortex of the macaque
monkey. Annu. Rev. Neurosci. 21, 47–74. https://doi.org/10.1146/annurev.
neuro.21.1.47.
Calvert, G.A., and Thesen, T. (2004). Multisensory integration: methodological
approaches and emerging principles in the human brain. J. Physiol. Paris 98,
191–205. https://doi.org/10.1016/j.jphysparis.2004.03.018.
Carandini, M., and Heeger, D.J. (2012). Normalization as a canonical neural
computation. Nat. Rev. Neurosci. 13, 51–62. https://doi.org/10.1038/nrn3136.
Cavanaugh, J.R., Bair, W., and Movshon, J.A. (2002). Nature and interaction of
signals from the receptive field center and surround in macaque V1 neurons.
J. Neurophysiol. 88, 2530–2546. https://doi.org/10.1152/jn.00692.2001.
Cichy, R.M., Khosla, A., Pantazis, D., Torralba, A., and Oliva, A. (2016).
Comparison of deep neural networks to spatio-temporal cortical dynamics
of human visual object recognition reveals hierarchical correspondence. Sci.
Rep. 6, 27755. https://doi.org/10.1038/srep27755.
Coen-Cagli, R., Dayan, P., and Schwartz, O. (2012). Cortical surround interac-
tions and perceptual salience via natural scene statistics. PLoS Comput. Biol.
8, e1002405. https://doi.org/10.1371/journal.pcbi.1002405.
Coen-Cagli, R., Kohn, A., and Schwartz, O. (2015). Flexible gating of contex-
tual influences in natural vision. Nat. Neurosci. 18, 1648–1655. https://doi.
org/10.1038/nn.4128.
Dantzker, J.L., and Callaway, E.M. (2000). Laminar sources of synaptic input to
cortical inhibitory interneurons and pyramidal neurons. Nat. Neurosci. 3,
701–707. https://doi.org/10.1038/76656.
DeAngelis, G.C., Freeman, R.D., and Ohzawa, I. (1994). Length and width tun-
ing of neurons in the cat’s primary visual cortex. J. Neurophysiol. 71, 347–374.
https://doi.org/10.1152/jn.1994.71.1.347.
Deng, J., Dong, W., Socher, R., Li,