Spectral receptive fields do not explain tuning for boundary curvature in V4
Timothy D. Oleskiw, Anitha Pasupathy, and Wyeth Bair
Department of Biological Structure and National Primate Research Center, University of Washington, Seattle, Washington
Submitted 31 March 2014; accepted in final form 22 July 2014
Oleskiw TD, Pasupathy A, Bair W. Spectral receptive fields do not explain tuning for boundary curvature in V4. J Neurophysiol 112: 2114–2122, 2014. First published July 23, 2014; doi:10.1152/jn.00250.2014.—
The midlevel visual cortical area V4 in the primate is thought to be
critical for the neural representation of visual shape. Several studies
agree that V4 neurons respond to contour features, e.g., convexities
and concavities along a shape boundary, that are more complex than
the oriented segments encoded by neurons in the primary visual
cortex. Here we compare two distinct approaches to modeling V4
shape selectivity: one based on a spectral receptive field (SRF) map in
the orientation and spatial frequency domain and the other based on a
map in an object-centered angular position and contour curvature
space. We test the ability of these two characterizations to account for
the responses of V4 neurons to a set of parametrically designed
two-dimensional shapes recorded previously in the awake macaque.
We report two lines of evidence suggesting that the SRF model does
not capture the contour sensitivity of V4 neurons. First, the SRF
model discards spatial phase information, which is inconsistent with
the neuronal data. Second, the amount of variance explained by the
SRF model was significantly less than that explained by the contour
curvature model. Notably, cells best fit by the curvature model were
poorly fit by the SRF model, the latter being appropriate for a subset
of V4 neurons that appear to be orientation tuned. These limitations of
the SRF model suggest that a full understanding of midlevel shape
representation requires more complicated models that preserve phase
information and perhaps deal with object segmentation.
shape processing; object recognition; ventral visual pathway; macaque monkey; computational model
VISUAL OBJECT PERCEPTION and recognition in primates are based
on sensory information processing within the ventral visual
pathway (Felleman and Van Essen 1991; Mishkin and Unger-
leider 1982). Over the last half-century, studies of the primary
visual cortex (V1) have identified local orientation and spatial
frequency as the basis dimensions of form representation at the
early stages in the ventral pathway (Campbell and Robson
1968; De Valois and De Valois 1990; Hubel and Wiesel 1959,
1965, 1968; Movshon et al. 1978; Schiller et al. 1976). At
intermediate stages, in particular area V4, the representation
has yet to be firmly established. Neurons in V4 have been
shown to be selective for bars of different length, for radial or
concentric gratings, for moderately complex shapes, and spe-
cifically for the curvature of segments of the bounding contour
of shapes (Desimone and Schein 1987; Gallant et al. 1993;
Hegdé and Van Essen 2007; Kobatake and Tanaka 1994;
Nandy et al. 2013; Pasupathy and Connor 1999, 2001). No
single model is widely accepted to account for these observa-
tions, but a common approach to explaining extrastriate re-
sponses in both the dorsal and ventral pathways is to model
them in terms of selectivity for simple combinations of the
features that are represented at earlier levels. This amounts to
using weighted combinations of V1-like channels to fit the
observed data (Cadieu et al. 2007; David et al. 2006; Rust et al.
2006; Vintch 2013; Willmore et al. 2010). Here we examine
whether an instance of this approach, known as the spectral
receptive field (SRF) model (David et al. 2006), can account
for complex curvature selectivity observed in V4 neurons.
The SRF model describes the tuning of V4 neurons in terms
of a weighting function across orientation and spatial fre-
quency bands in the power spectrum of the stimulus (David et
al. 2006). This model has the elegant simplicity of combining
V1-like signals in a manner that discards phase and thereby
produces translation invariance, a key feature of V4 responses
(Gallant et al. 1996; Pasupathy and Connor 1999, 2001; Rust
and DiCarlo 2010). It has also been argued (David et al. 2006)
that the SRF model can account for the ability of V4 neurons
to respond to complex shapes in terms of contour features at a
particular location within an object-centered reference frame
(Pasupathy and Connor 2001). For example, some neurons
may respond strongly to shapes with a sharp convexity to the
upper right, while others may respond to shapes with a con-
cavity to the left. These patterns of selectivity are well modeled
by two-dimensional (2D) Gaussian tuning functions in a space
defined by 1) the curvature of the boundary and 2) angular
positions relative to object center (Pasupathy and Connor
2001). They are also well modeled by a hierarchical contour
template model (Cadieu et al. 2007). Using the previously
recorded data set on which both of these models were based,
we examine whether the SRF model, the simplest of the three,
can account for the contour selectivity observed in V4. We find
that there are important features of the data that are not
captured by the SRF model.
MATERIALS AND METHODS
Experimental Procedures
All animal procedures for this study, including implants, surgeries,
and behavioral training, conformed to National Institutes of Health
and US Department of Agriculture guidelines and were performed
under an institutionally approved protocol. The data analyzed here are
derived from a previous study (Pasupathy and Connor 2001) and
consist of the responses of 109 single, well-isolated V4 neurons in two
rhesus monkeys (Macaca mulatta) that were recorded while the
animals fixated a 0.1° white spot on a computer monitor. After
preliminary characterization of the receptive field (RF) location and
preferred color of each cell, shape tuning was characterized with a set
of 366 stimuli (Fig. 1). Each stimulus was presented in random order
without replacement five times for most cells (91/109; 9 cells had 4
repetitions and 9 had 3 repetitions). Response rates were calculated by
counting spike occurrences during the 500-ms stimulus presentation
period. Spontaneous rates, calculated based on blank stimulus periods interspersed randomly during stimulus presentation, were subtracted from the average response rate for each stimulus.
Address for reprint requests and other correspondence: W. Bair, Dept. of Biological Structure and National Primate Research Center, Univ. of Washington, 1959 NE Pacific St., HSB G-520, UW Mailbox-357420, Seattle, WA 98195 (e-mail: wyeth0@uw.edu).
0022-3077/14 Copyright © 2014 the American Physiological Society. www.jn.org
Stimulus Design and Representation
Stimulus design is described in detail by Pasupathy and Connor
(2001). Briefly, stimuli were constructed by systematic combination
of 4 to 8 contour segments, each of which took 1 of 5 curvature values,
resulting in 51 shapes (Fig. 1). To create radial variation, each shape
was rotated by 8 increments of 45°, discarding duplications due to
rotational symmetry. Shape stimuli were presented in the center of the
RF of the cell under study and were sized such that all parts of the stimuli
were within the estimated RF of the cell. Specifically, the outermost
stimulus edges were at a distance of 3/4 of the RF radius, which was
estimated based on the reported relationship between eccentricity and
RF size (Gattass et al. 1988).
For modeling and fitting, each shape was generated as a discretized binary mask of 128 × 128 pixels and then convolved with a Gaussian filter of standard deviation 1 pixel (e.g., Fig. 2A). This image represents a 5° × 5° patch of the visual field to approximate the experimentally used resolution (Pasupathy and Connor 2001). The cutoff frequency of this representation is 12.8 cyc/° (half of the 25.6 pixels/° resolution). Because the typical stimulus size was 3° in diameter in the electrophysiology study of Pasupathy and Connor (2001), we made the largest stimulus have a diameter of 75 pixels within the 128-pixel field. Fourier transforms of stimulus images were computed with a 2D FFT algorithm. The magnitude of the complex-valued Fourier components was subjected to the transformation t(x) = log(x + 1) to attenuate the low-frequency power that is largely similar across all shapes (Fig. 2B). Because of the limited number of stimuli and trial repetitions, power spectra were downsampled to reduce the number of dimensions in the representation and thereby facilitate model fitting (see below).
Specifically, a spectral power sample (Fig. 2C) was created by summing over 7 × 7-pixel blocks within the spectrum, with the middle block centered on the DC bin, to achieve a 17 × 17 grid (the extra few pixels at the margins were ignored). This limited our frequency representation to 0–12 cyc/°, which exceeds the range used in a comparable study (David et al. 2006). Because of the even symmetry of the power spectrum, this resulted in a 17 × 9-pixel representation (as depicted in Fig. 2C), denoted P = {P_s}_{s∈S}, where the set of all shapes is denoted S (|S| = 366). Overall, the aims of this representation were 1) to approximate the methods used during the original data recording, 2) to reduce the number of parameters to be fit (17 × 9, given the symmetry in the power spectra), and 3) to represent the vast majority of the frequency range that would be available to the visual cortex at the relevant eccentricities.
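As a concrete sketch of this representation (our own illustration, not the authors' code), the pipeline above can be written in a few lines of Python. The 128 × 128 image size, log(x + 1) transform, 7 × 7 blocks, and 17 × 9 fold follow the description in the text; the exact block alignment at the margins is an assumption.

```python
import numpy as np

def spectral_sample(img):
    """Log-magnitude Fourier spectrum of a 128 x 128 image, summed over
    7 x 7 blocks on a 17 x 17 grid centered on the DC bin, then folded to
    17 x 9 using the even symmetry of the power spectrum of a real image."""
    assert img.shape == (128, 128)
    logmag = np.log(np.abs(np.fft.fftshift(np.fft.fft2(img))) + 1.0)  # t(x) = log(x + 1)
    c = 64                                   # DC bin after fftshift
    grid = np.empty((17, 17))
    for i in range(17):
        for j in range(17):
            r0 = c + 7 * (i - 8) - 3         # 7 x 7 block centered on the grid node
            c0 = c + 7 * (j - 8) - 3
            grid[i, j] = logmag[r0:r0 + 7, c0:c0 + 7].sum()
    return grid[:, 8:]                       # 17 x 9 half-plane (DC column included)
```

Because the power spectrum of a real image is unchanged by 180° image rotation, this sample is identical for a shape and its 180° rotation, which is the property examined in RESULTS.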
Models
Spectral receptive field. As proposed by David et al. (2006), an SRF model performs a linear combination of the spectral power of the stimulus in discrete bands to predict neural activity. Using the spectral power sample, P_s, of each shape and the observed neuronal responses, r_s, the SRF model seeks a set of weights, Φ_SRF, that minimizes the residual error between the model prediction PΦ_SRF and r. Finding such a template can thus be cast as a linear least-squares optimization, i.e.,

Φ_SRF = argmin_Φ ‖PΦ − r‖²,   (1)

where ‖·‖ denotes the standard Euclidean norm. For procedural convenience, stimulus power spectra are encoded as a 153-element vector representing the coefficients of the 17 × 9 sampling of spectral power. As neural responses to 366 shape stimuli are considered, P is a 366 × 153 matrix. The vectors Φ and r = {r_s}_{s∈S} have 153 × 1 and 366 × 1 elements, respectively.
Fig. 1. Set of 51 shapes used to characterize V4 neurons. Each shape was presented at each of 8 orientations, or fewer for shapes with rotational symmetry. For example, the circles (top left) were shown in just 1 conformation because all rotations are identical. Shapes are numbered left to right, top to bottom, starting with 1 at top left. Gray arrow marks shape 24, referred to in RESULTS.

Fig. 2. Shape stimuli and their spectral power representation. A: 2 rotations of shape 18 (Fig. 1) are shown at 128 × 128-pixel resolution. B: log of the Fourier power spectra of A. C: downsampled spectral power representation of B used for fitting. SF, spatial frequency.
Because of the high ratio of model parameters to stimuli and correlations among stimuli, the matrix P is ill-conditioned, making standard least squares prone to overfitting. To correct for this, we used Tikhonov regularization (Press et al. 2007), i.e.,

Φ_SRF = argmin_Φ (‖PΦ − r‖² + λ‖Φ‖²),   (2)

in place of Eq. 1, where λ denotes the regularization factor. We tested values of λ from 0.01 to 100, using 100 points that were evenly spaced on a log scale. The data were divided into 100 randomly chosen partitions of 75% training and 25% test data. Each partition was used to fit and test the model at each λ value. At each λ, we computed M_test and M_train, the average explained variance across all partitions in the testing and training data, respectively. For each cell, we defined λ′ to be the value of λ that maximized M_test and then defined the training and testing performance to be M_train(λ′) and M_test(λ′), respectively. To verify that these methods were sufficient to reveal SRF maps like those reported previously (David et al. 2006), we simulated SRFs having a variety of sizes and shapes, tested them with the same shape set used in the electrophysiology study, and confirmed that we could recover the simulated fields.
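The ridge solution of Eq. 2 has a closed form, Φ = (PᵀP + λI)⁻¹Pᵀr, and the λ search can be sketched as below. The matrix sizes and λ range match the text, but the data are synthetic and the helper names (`fit_srf`, `explained_variance`) are our own.

```python
import numpy as np

def fit_srf(P, r, lam):
    """Ridge (Tikhonov) solution of Eq. 2: minimize ||P w - r||^2 + lam ||w||^2."""
    n = P.shape[1]
    return np.linalg.solve(P.T @ P + lam * np.eye(n), P.T @ r)

def explained_variance(y, yhat):
    """Squared Pearson correlation (r^2) between data and prediction."""
    return np.corrcoef(y, yhat)[0, 1] ** 2

# Synthetic stand-in for the 366 x 153 spectral-power matrix and responses
rng = np.random.default_rng(1)
P = rng.normal(size=(366, 153))
r = P @ rng.normal(size=153) + rng.normal(scale=5.0, size=366)

lams = np.logspace(-2, 2, 100)                 # lambda from 0.01 to 100, log-spaced
idx = rng.permutation(366)
train, test = idx[:274], idx[274:]             # one 75%/25% partition
scores = [explained_variance(r[test], P[test] @ fit_srf(P[train], r[train], lam))
          for lam in lams]
lam_opt = lams[int(np.argmax(scores))]
```

In the study this search was repeated over 100 random partitions and the test-set explained variance was averaged before choosing λ′; a single partition is shown here for brevity.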
Angular position and curvature. Pasupathy and Connor (2001) proposed an angular position and curvature (APC) model that performs a nonlinear computation over stimuli represented as a set of 4 to 8 points in the 2D space of angular position, θ, and contour curvature, c. Neural responses are predicted by evaluating a 2D Gaussian energy function (von Mises in θ) at each of these points and taking the maximum. In particular, s_i = (θ_i, c_i) denotes the points defining a shape stimulus s for i ∈ (1, . . . , I_s), where I_s is the number of points. An APC model seeks the energy function parameters Φ_APC = (A, μ_θ, σ_θ, μ_c, σ_c) that minimize the error with respect to the observed neural responses r_s. The APC model is fit through nonlinear optimization, i.e.,
Φ_APC = argmin_{A, μ_θ, σ_θ, μ_c, σ_c} Σ_{s∈S} [ max_{i ≤ I_s} A e^{cos(θ_i − μ_θ)/σ_θ − (c_i − μ_c)²/(2σ_c²)} − r_s ]².   (3)
Unlike SRF modeling, a global optimum cannot be found deterministically. We estimated the optimal model parameters by performing gradient descent on the objective function. To avoid locally optimal solutions, descent was repeatedly conducted from random initializations (n = 100) sampled from a uniform distribution over the angular position and curvature parameter space. Simulations reveal that global optima are consistently well approximated after only a few repeated descents.
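A minimal sketch of the APC response rule, a von Mises factor in angular position multiplied by a Gaussian factor in curvature, with the maximum taken over boundary points, is shown below. The exact parameterization of the published model may differ; this is an illustration of the computation, not the authors' code.

```python
import numpy as np

def apc_response(shape_pts, A, mu_t, sig_t, mu_c, sig_c):
    """Predicted response to one shape: the maximum, over boundary points,
    of a von Mises (angular position) x Gaussian (curvature) tuning function.
    shape_pts: (n, 2) array of (theta_i, c_i) pairs."""
    theta, c = shape_pts[:, 0], shape_pts[:, 1]
    e = np.cos(theta - mu_t) / sig_t - (c - mu_c) ** 2 / (2.0 * sig_c ** 2)
    return A * np.exp(e).max()
```

A cell tuned for a sharp convexity at angular position 0 (object right) responds strongly to a shape containing a boundary point near (θ, c) = (0, 1) and weakly to one whose nearest point is on the opposite side of the tuning surface.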
Because responses of many V4 neurons depend on the curvature of
three adjoining contour segments centered at a specific angular posi-
tion (Pasupathy and Connor 2001), we also considered an APC model
that includes three curvature dimensions and a single angular position
dimension. We refer to this as the 4D APC model to distinguish it
from the 2D APC model described above. The 4D APC model has
nine parameters, which include the four additional parameters for the
means and SDs of the Gaussian functions describing the two adjoining
curvature dimensions. We used the same 75%/25% data partition
scheme for fitting and testing our APC models as described above for
the SRF model.
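The 4D variant can be sketched by extending the energy function with Gaussian factors for the two adjoining curvature values, giving nine parameters in all. The product-of-factors form is our assumption; the published 4D model may be parameterized differently.

```python
import numpy as np

def apc4d_response(pts, A, mu_t, sig_t, mus_c, sigs_c):
    """4D APC sketch: one angular-position dimension plus Gaussians over the
    curvatures of 3 adjoining segments (9 parameters in all).
    pts: (n, 4) rows of (theta, c_prev, c_center, c_next)."""
    e = np.cos(pts[:, 0] - mu_t) / sig_t
    for k in range(3):                       # one Gaussian per curvature dimension
        e = e - (pts[:, k + 1] - mus_c[k]) ** 2 / (2.0 * sigs_c[k] ** 2)
    return A * np.exp(e).max()
```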
RESULTS
The results are organized in two sections. We first examine
whether there is direct evidence for the SRF model by testing
a specific prediction that it makes about responses to stimuli
subject to a 180° rotation. We then compare the ability of the
curvature model and the SRF model to capture variance in the
data and examine whether the two models are equally good at
explaining tuning for boundary curvature.
Response Similarity for 180° Stimulus Rotation
The SRF model predicts responses of V4 neurons on the
basis of the spectral power coefficients of the visual stimuli;
therefore, any SRF-like neuron would naturally yield equiva-
lent responses, up to noise, to stimuli having identical power
spectra. It turns out that any stimulus rotated by 180° has the
same spectrum as the original stimulus. This follows intuitively
because any visual stimulus can be described by its Fourier
(sine and cosine) components and these components do not
change their orientation, spatial frequency, or amplitude when
rotated 180° in the spatial domain. Formally, denoting the
Fourier transform F of a 2D shape image f as

F{f(x, y)} = f̂(ω_x, ω_y),   (4)

the spectral power of a 180° rotation of f, denoted f_R, is equal to the spectral power of f, i.e.,

|F{f_R(x, y)}|² = |F{f(−x, −y)}|²
               = |f̂(−ω_x, −ω_y)|²
               = |conj[f̂(ω_x, ω_y)]|²
               = conj[f̂(ω_x, ω_y)] · f̂(ω_x, ω_y)
               = |F{f(x, y)}|².   (5)

The second step above follows from the time reversal property of the Fourier transform. The third step follows because the Fourier transform of a real-valued function is Hermitian (conj denotes the complex conjugate), and the fourth and fifth steps simply apply the definition of the squared norm as the product of a complex value and its conjugate, e.g., |y|² = y · conj(y). This prediction of the SRF model, that neurons will
respond the same to a shape and its 180° rotation, is counter-
intuitive in light of findings that many V4 neurons are tuned for
the angular position of stimulus features around the boundary
of a shape (Pasupathy and Connor 2001), the latter being a
property that is grossly changed by 180° rotation. For example,
if a neuron is tuned for a sharp convexity to the right, it would
respond strongly to a shape such as that in Fig. 2A, top, but not
to the 180° rotation of that shape (not shown).
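The identity in Eq. 5 is easy to verify numerically. The sketch below uses an arbitrary random binary image as a stand-in for a shape stimulus; reversing both axes implements the 180° rotation (up to a circular shift, which only changes phase).

```python
import numpy as np

rng = np.random.default_rng(0)
img = (rng.random((128, 128)) > 0.5).astype(float)  # arbitrary real-valued "shape"
rot = img[::-1, ::-1]                               # 180-deg rotation

P = np.abs(np.fft.fft2(img)) ** 2
P_rot = np.abs(np.fft.fft2(rot)) ** 2
assert np.allclose(P, P_rot)                        # power spectra are identical
```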
To test this prediction of the SRF model, we identified all pairs of shapes in our stimulus set that were 180° rotations of each other. For example, the shape in Fig. 2A was presented at 8 rotations and thus contributed 4 such 180° rotation pairs. We assessed the amount of correlation, r_180 (Pearson's r value), in these paired responses for each cell; data for three example cells are depicted in Fig. 3 (see legend for details). The first example cell (b1601; Fig. 3A) shows positive correlations for 180° rotations, cell a8602 (Fig. 3B) shows no correlation, and cell b2601 (Fig. 3C) shows anticorrelation. The first example would appear to be consistent with the idea that responses are similar for 180° rotations, whereas the third clearly contradicts this notion, suggesting that if a shape produces a larger than average response, its 180° rotation typically does not. However, the observed correlation must be interpreted relative to the amount of correlation between spectrally dissimilar stimuli, i.e., non-180° rotation pairs. To calculate this baseline correlation, r_baseline, we chose 4 of the 24 possible non-180° pairings at random for each shape (where 8 rotations were presented) and calculated the bootstrap distribution of r values (Fisher z) from repeated simulations (n = 100, which proved to be convergent). Figure 4A shows an example (cell a6802) in which the response correlation for 180° rotations is significantly positive (P < 0.05) but not different from the correlation of non-180° pairings (Fig. 4B). It turns out that many cells show a positive baseline correlation because they respond better to some shapes than others regardless of orientation. This can arise simply from shapes that have similar attributes repeated along their boundaries (e.g., Fig. 1, shape 24) or from sensitivity to attributes that are not changed by rotation, such as surface area.
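The two statistics can be sketched as below; `r180` and `fisher_mean_r` are hypothetical helper names, and the 4-step offset encodes the 180° rotation among 8 rotations in 45° steps.

```python
import numpy as np

def fisher_mean_r(rs):
    """Average correlation coefficients via the Fisher z-transform."""
    z = np.arctanh(np.clip(rs, -0.999999, 0.999999))
    return np.tanh(z.mean())

def r180(resp):
    """resp: (n_shapes, 8) mean responses over 8 rotations in 45-deg steps.
    Correlates each response with the response to the 180-deg rotation,
    i.e., the rotation 4 steps away."""
    x = resp.ravel()
    y = np.roll(resp, 4, axis=1).ravel()
    return np.corrcoef(x, y)[0, 1]
```

A cell whose response is fully rotation invariant (identical across all 8 rotations of each shape) yields r_180 = 1 by construction; the interesting cases are cells whose r_180 exceeds or falls below their baseline correlation.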
The population results of this analysis for the data set of 109 cells are shown in Fig. 5A, where r_180 is plotted against r_baseline. The significance level is set at 2 SD of the baseline correlation. Note that most neurons (n = 68) lie near the line of equality, e.g., a6802 (from Fig. 4; point 4 in Fig. 5A). Interestingly, some cells, e.g., b1601 (from Fig. 3A; point 1 in Fig. 5A), fall significantly above equality, indicating possible selectivity for features that are preserved across 180° rotations and are potentially consistent with an SRF model.
We compared the scatter of data in Fig. 5A to that expected from an idealized SRF model that includes realistic (Poisson) noise. We did this by setting an underlying mean firing rate (target rate) for each shape and then deriving from it a measured rate by sampling a spike count from the target rate five times with Poisson statistics (variance equal to mean). To embody the SRF model, we set the target rates equal for pairs of shapes that were 180° rotations, choosing randomly between the two experimentally observed rates. From these measured rates, we computed r_180 as described above. We repeated this process 100 times and determined the average correlation (using Fisher z). In Fig. 5B, the results of this statistical simulation are plotted together with the actual data and against the same r_baseline values. The results indicate that hypothetical SRF units show much higher values of r_180 than were observed
Fig. 3. Comparing responses to a shape and its 180° rotation. Mean responses to a shape and its 180° rotation are plotted against each other for 3 example neurons. Data points are derived from the 42 shapes for which 8 rotations were presented. Each 180° rotation pair contributes two points, (x, y) and (y, x), for a total of 336 points per neuron. The spectral receptive field (SRF) model predicts that y = x, up to noise, and thus a high positive correlation coefficient is expected. Nevertheless, our population contained neurons with positive correlation (r = 0.54; A), no correlation (r = 0.00; B), and negative correlation (r = −0.39; C). b1601, a8602, and b2601 indicate neuronal ID, where the first letter indicates the animal ID.
Fig. 4. Computing baseline correlation between non-180° shape rotations. A: mean responses of neuron a6802 to pairings of stimuli that are spectrally identical, i.e., 180° rotations (same analysis and plot format as in Fig. 3). B: similar to A, but responses are plotted for all pairings of a given shape that are not 180° rotations. There are 24 such pairings for each shape with 8 rotations, compared with only 4 pairings for 180° rotations. Correlation coefficients are similar in A and B, r = 0.46 and r = 0.47, respectively, suggesting that for this neuron average responses are higher for some shapes than for others, regardless of rotation.
Fig. 5. Comparing response correlation for 180° rotations to baseline correlation. A: for each of the 109 neurons, the correlation coefficient for responses to spectrally identical shapes (180° rotations) is plotted against the baseline correlation value (the correlation between responses to non-180° pairings; see RESULTS). Points fall on both sides of the line of equality (dashed line). Points plotted as asterisks indicate neurons for which the y-axis deviates from the x-axis by 2 SD, based on a bootstrap estimate of the baseline correlation distribution. Points numbered 1–3 correspond to the 3 example neurons in Fig. 3, point 4 corresponds to the example in Fig. 4, and point 5 corresponds to an example neuron shown below in RESULTS. B: the data from A are replotted (gray filled circles) and are compared to predictions for an idealized SRF model and an idealized angular position and curvature (APC) model for each of the 109 neurons. As expected, points for the idealized SRF prediction have much higher correlation for spectrally identical stimuli, because the SRF model predicts identical mean responses for such stimuli.
in our data. This suggests that, while a few cells (e.g., neuron b1601) show consistency with the SRF model, the vast majority of neurons from our population do not.
We performed a similar simulation using the response rates predicted by the APC model (see MATERIALS AND METHODS) for comparison to the neuronal data and to the responses of the idealized SRF model. Each cell was fit to the APC model (see Model Fitting and Performance), and the resulting predicted mean responses were used as the target rates. Observed rates were computed from the average of five Poisson samples. The result (Fig. 5B) shows that the APC model predicts a much lower r_180 value than the SRF model, and that the predicted values are approximately consistent with the range of values found for the neurons.
In summary, the SRF model makes a distinct prediction about 180° rotations that the APC model does not, and with respect to this prediction the SRF model is far less consistent with our data than the APC model.
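The idealized-SRF simulation can be sketched as below; the pair count (168) and firing-rate range are arbitrary stand-ins for illustration. Tying the two members of each 180° pair to a single target rate is what pushes r_180 toward 1 despite Poisson trial noise.

```python
import numpy as np

rng = np.random.default_rng(0)
pairs = rng.uniform(5.0, 50.0, size=(168, 2))        # observed mean rates per 180-deg pair

# idealized SRF: both members of a pair share one target rate, chosen at random
target = pairs[np.arange(168), rng.integers(0, 2, 168)]
meas = rng.poisson(target[:, None, None], size=(168, 2, 5)).mean(axis=2)  # 5 trials each
r180_sim = np.corrcoef(meas[:, 0], meas[:, 1])[0, 1]

# contrast: independent target rates per pair member (no SRF constraint)
meas_ind = rng.poisson(pairs[:, :, None], size=(168, 2, 5)).mean(axis=2)
r_ind = np.corrcoef(meas_ind[:, 0], meas_ind[:, 1])[0, 1]
```

With shared targets, r180_sim approaches 1 (reduced only by Poisson noise), whereas independent targets yield a correlation near 0, mirroring the separation between the idealized SRF points and the data in Fig. 5B.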
Model Fitting and Performance
Although the SRF model fails to predict the differences in neuronal responses to shapes and their 180° rotations, previous reports show that both the SRF and the APC models account for only part of the variance of V4 responses (Pearson's r values of 0.32 for the SRF model of David et al. 2006 and 0.57 for the 4D APC model of Pasupathy and Connor 2001). We thus wanted to establish 1) what fraction of the variance is captured by the SRF model across the entire set of shapes, and how this compares to that previously reported for the SRF and APC models, and 2) whether the cells that are well fit by the SRF model, in terms of amount of explained variance, are also the ones that are well fit by the APC model.
We performed an empirical evaluation of both the SRF and APC models by fitting to, and predicting, recorded neural responses to our stimuli. We partitioned our data into training and testing sets for cross-validation, and we measured model performance in terms of explained variance (r²) for both sets. Bootstrap validation estimates (Fig. 6A) show that although the SRF model outperforms both APC models across training data sets, it underperforms both the 2D and 4D versions of the APC model on the test data sets. This is a hallmark of overfitting: the SRF model has 30 times the number of parameters (9 × 17 = 153 spectral weights) compared with the 2D APC model (5 parameters) and 17 times that of the 4D APC model (9 parameters). When comparing only the testing validation performance across all neurons (Fig. 6B), the responses of the majority of neurons (77 of 109) are better predicted by the 4D APC model than by the SRF model, with a significantly higher average explained variance (mean difference 0.09, SD 0.13, paired t-test, P < 0.0001).
Although the performance of the SRF model was relatively weak, this does not appear to reflect particular limitations of our stimulus set, because the performance was comparable to, and in fact better than, that reported previously (David et al. 2006): our mean r value was 0.43 (n = 109), compared with their mean of 0.32.
Another important feature of the scatter in Fig. 6B is the paucity of points near the upper right corner. This implies that the neurons best explained by the APC model are not also those that are best explained by the SRF model. For example, neuron b1601 (Fig. 6B, bottom right) was among the most SRF-like cells: its responses were best fit by the SRF model (r² = 0.54) and were also among the most consistent with the prediction regarding 180° rotations examined above (Fig. 5A), but its responses were not well explained by the APC model (r² < 0.2). On the other hand, points do fall near the extreme lower left in Fig. 6B, representing neurons that are poorly fit by both models. This is expected under the simple assumption that some neurons do not respond well to the stimulus set, or have very noisy responses. Discarding the neurons that were poorly fit by either model (r² < 0.15), there was no significant correlation between the explained variance of the APC and SRF models (r = 0.17, P = 0.17, n = 65). This suggests that these distinct models do not capture the same features of the response.
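The comparison statistics above can be sketched with standard SciPy calls on synthetic explained-variance values; only the 0.09 mean difference and 0.13 SD come from the text, and the distributions below are our own stand-ins, not the recorded fits.

```python
import numpy as np
from scipy import stats

# Synthetic r^2 values for 109 cells (illustrative only)
rng = np.random.default_rng(2)
ev_apc = np.clip(rng.normal(0.35, 0.15, 109), 0.0, 1.0)
ev_srf = np.clip(ev_apc - rng.normal(0.09, 0.13, 109), 0.0, 1.0)

t, p = stats.ttest_rel(ev_apc, ev_srf)       # paired t-test across cells

# discard cells poorly fit by either model (r^2 < 0.15 for APC or for SRF)
keep = (ev_apc >= 0.15) & (ev_srf >= 0.15)
r, p_r = stats.pearsonr(ev_apc[keep], ev_srf[keep])
```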
To understand the tuning properties of neurons that were
well fit by the SRF model and compare them to those that were
well fit by the APC model, it is useful to examine the raw
responses and fit parameters for several example neurons. The
responses of the SRF-like neuron, b1601, for each of the 366
shapes are plotted in Fig. 7A, where red indicates the strongest
responses and blue the weakest. This neuron tended to respond
most strongly to shapes that were oriented horizontally, and the
strongest responses were often offset in the diagram by 4 rows,
Fig. 6. Fit performance of SRF and APC models. A: mean explained variance is plotted for the SRF model and the 2D and 4D versions of the APC model for training data (left) and testing data (right). The training and testing partitions were 75% and 25% of the data, respectively. The SRF model explained more variance in the training data than the APC models, but both APC models outperformed the SRF model on the testing data on average. Error bars show SE. B: explained variance values for the 4D APC model are plotted against those for the SRF model for all 109 neurons. Three examples are indicated: b1601, which was better fit by the SRF model; a6701, which was better fit by the APC model; and b2002, which was about equally well fit by both models.
which corresponds to 180° of stimulus rotation. We will see
below (Fig. 8A) that the SRF map for this neuron reflects this
apparent preference for horizontal orientation. A second exam-
ple neuron (Fig. 7B) that was moderately well fit by both
models (b2002 in Fig. 6B) responded strongly to stimuli that
were oriented vertically or tilting somewhat toward the right.
Here some but not all of the stimuli evoking the strongest
responses were separated by 180°, consistent with the moder-
ate fit of the SRF model. A contrasting example (Fig. 7C)
shows a neuron that did not display a clear preference for
overall orientation. In particular, the strongest responses are
not separated by 180° rotations, consistent with the poor fit of
the SRF model (a6701 in Fig. 6B). All of the shapes that evoke
strong responses from this cell include a concavity to the right
side of the shape. This type of tuning is well captured by the
APC model, as indicated by the relatively high explained
variance value (a6701 in Fig. 6B).
The SRF maps for the example neurons just described are
shown in Fig. 8. As described in MATERIALS AND METHODS,wefit
SRF maps over a broad range of regularization values,
,
computing training and test performance at each value to assess
and minimize the influences of overfitting. For neuron b1601
(Fig. 8A,top), the training performance declined with increas-
ing
while the testing performance increased to a maximum
and subsequently fell to an asymptote. This behavior is ex-
pected, and it held for all neurons (Fig. 8D shows population
average). For each neuron, SRF maps are shown (below the
performance plots) for low, optimal (highest test performance),
and high regularization values. Each map shows spectral
weights as a function of horizontal and vertical spatial fre-
quency. In this representation, frequency increases with dis-
tance from the origin, and power at a particular orientation lies
along a line radiating from the origin. At low λ (Fig. 8, A–C,
bottom), the maps have a salt-and-pepper appearance that fits
Fig. 7. Shape tuning maps for 3 example neurons. A: mean firing rate of neuron b1601 to each shape (drawn in black) is indicated by the color surrounding the
shape. Dark blue and dark red indicate the lowest and highest responses, respectively (see scale bar at bottom). This neuron responded best to shapes that had
a horizontal alignment, and 180° rotations of the same shape often gave roughly similar responses (black arrow pairs). All rotations (up to 8) of each shape are
arranged contiguously within a single column in 1 block. B: responses for example neuron b2002, which tended to prefer shapes with a vertical or right-leaning
alignment. Sometimes responses to 180° rotations were similar (e.g., black arrow pair). C: responses for example neuron a6701, which was well fit by the APC
model and poorly fit by the SRF model (Fig. 6B). Shapes associated with the strongest responses did not elicit strong responses when rotated by 180° (compare
top and bottom arrows in each arrow pair).
the training data well, but they strongly underperform on the
testing data and thus are not likely to reflect a true receptive
field. At high λ (here 16, but maps were similar over a broad
range), the training and test performance become nearly equal,
suggesting that the features remaining in the maps are those
that best generalize beyond the training set. Indeed, the λ = 16
map for neuron b1601 (Fig. 8A, bottom) has a red streak along
the vertical axis, indicating a preference for horizontal orien-
tation, which is apparent in Fig. 7A. The high-λ map for neuron
b2002 (Fig. 8B,bottom) has a red streak along the horizontal
axis that expands upward in the left quadrant, indicating a
preference for vertical to right-leaning orientation, as observed
in Fig. 7B. In contrast, the SRF map for neuron a6701 (Fig. 8C,
bottom) has red streaks at multiple orientations, and, most
notably, the performance (Fig. 8C, top) is substantially lower at
all λ compared with the first two examples.
In summary, the correspondence between the coherent struc-
ture within the SRF maps (Fig. 8) and the raw shape responses
(Fig. 7) suggests that our SRF fits provide a useful character-
ization for some neurons, but that these neurons also appear to
be ones that display sensitivity to the overall orientation of a
shape.
DISCUSSION
We examined whether the selectivity of V4 neurons for
boundary curvature can be simply explained in terms of tuning
for the spatial frequency power spectrum as quantified by the
SRF model. We found that the responses of curvature-tuned
V4 neurons are inconsistent with the SRF model on several
counts. First, the SRF model predicts identical responses to
180°-rotated stimuli, but most V4 neurons, especially those
that are curvature tuned, do not exhibit this property. Second,
compared with the curvature-based model, the SRF model
captured significantly less of the variance in V4 responses for
a set of parametrically designed 2D complex shapes. Finally,
the V4 neurons that were particularly well fit by the SRF model
were also those that could be roughly described as showing
simple orientation tuning, and were not among the best fit by
the curvature model.
A previous attempt to show that the SRF model could unify
V4 neuronal selectivity from studies using disparate stimulus
sets (David et al. 2006) was motivated by several attractive
features of the model. The SRF model describes V4 tuning in
terms of sensitivity to particular frequency bands within the
power spectrum of the visual input. Because the frequency
bands can be labeled in terms of orientation and spatial fre-
quency, the SRF model can be viewed as a simple extension of
the representation present in V1, where neurons are tuned to
stimulus orientation (Hubel and Wiesel 1968) and spatial
frequency (Albrecht et al. 1980; Campbell et al. 1969; De
Valois and De Valois 1990; Movshon et al. 1978). This has the
advantage that the circuit implementation of a V4 neuron in
terms of the SRF model would be a relatively straightforward
combination of V1 outputs. Another key feature of the SRF
model is the second-order nonlinearity inherent to the power
spectrum that discards phase information and can thereby
produce phase- and position-invariant responses, approximat-
ing similar characteristics of V4 neurons (Gallant et al. 1996;
Fig. 8. SRF maps depend on regularization. SRF maps (image panels) are shown for the 3 example cells of Fig. 7 (A–C) and for 3 levels of the regularization
parameter λ (low to high, top to bottom, respectively). Red indicates positive weights, and blue indicates negative weights (see scale bar near bottom of C). Top
row shows maps for low λ (0.15). These maps produce the best performance on the training data (black line, top panels, described below) but substantially worse
performance on the test data (red line, top panels) because of overfitting. Second row shows maps at the optimal λ (best test performance) for each neuron (λ =
1.52, 5.09, and 0.38 for A–C, respectively). Third row shows maps for high λ (16). Each map shows an example of the SRF given a random selection of
training/testing partition (75/25%). Top panels plot the average performance (across 100 random training/test partitions) on the training and test data as a function
of λ. Shaded area shows SD. D: average performance on the training and test data as a function of λ across all 109 neurons. Shaded area shows SD.
Pasupathy and Connor 1999, 2001; Rust and DiCarlo 2010).
However, the simplification of discarding phase information
before integrating across frequency bands ignores a key feature
of V4 curvature selectivity. Specifically, a V4 neuron may
respond preferentially to a sharp convexity pointing upward
relative to the object center but not to that same feature
pointing downward; the SRF model cannot reproduce this
important aspect of curvature tuning because phase-insensitive
Fourier power models predict identical responses for pairs of
stimuli that are 180° rotations of each other. We directly
examined the responses of V4 neurons to such pairs of stimuli
and found that this prediction did not hold, in contradiction to
the SRF model. We conclude that a defining characteristic of
the SRF model—that phase information is dropped before
combining spatial frequency components across the image—is
inconsistent with curvature selectivity in V4.
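The 180°-rotation prediction follows from elementary Fourier properties: rotating an image by 180° flips it about its center, which conjugates the Fourier transform of a real image (plus a pure phase ramp on a discrete grid) and therefore leaves the power spectrum untouched. A short numerical check (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.random((32, 32))            # arbitrary grayscale "stimulus"
rot180 = img[::-1, ::-1]              # the same image rotated by 180 deg

def power_spectrum(x):
    """Fourier power: the phase-discarding input to an SRF-style model."""
    return np.abs(np.fft.fft2(x)) ** 2

# Power spectra of the image and its 180-degree rotation are identical...
same_power = np.allclose(power_spectrum(img), power_spectrum(rot180))

# ...but the phases differ -- the information the SRF model discards.
phases_differ = not np.allclose(np.angle(np.fft.fft2(img)),
                                np.angle(np.fft.fft2(rot180)))
```

Any model built on the power spectrum alone must therefore respond identically to such stimulus pairs, which is the property our data contradict.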
Because all current models of V4 have limitations, it is
important to consider how the SRF model compares to alter-
natives in its ability to explain the variance of neuronal re-
sponses to the same stimulus set. We fit SRF maps to V4
responses to a set of simple shapes that parametrically explored
a space of contour curvature and angular position. Our SRF
maps were roughly consistent with those reported previously
(David et al. 2006; see their Figs. 1–3). Our maps often showed
tuning for multiple orientations, similar to theirs, and our maps
explained a larger fraction of the response variance than their
maps did. One difference was that their spatial resolution was
12 cyc/RF, whereas ours was about three times higher (12
cyc/°, with typical RF sizes of about 3°). Nevertheless, the SRF model
captured less response variance on average than our APC
model, which had far fewer parameters. Two observations are
particularly worth noting. First, none of the cells best fit by the
curvature model (20 cells for which r² > 0.4, 4D APC model)
was better fit by the SRF model. This suggests that the SRF
model does not capture the key features of curvature selectivity
that are represented in the curvature model. Second, a closer
examination of the cells best fit by the SRF model reveals that
they would be well described as orientation selective, consis-
tent with examples of David et al. (their Figs. 1b and 3a). Thus
the SRF model does not provide a sufficient framework to
understand curvature tuning in V4; nevertheless, it may serve
an important role in describing cells in V4 whose tuning is
largely in the orientation dimension. Future work will be
required to understand how these different types of tuning
operate together in V4.
Although the contour curvature model provides a good fit to
the responses of many V4 neurons, it has the limitation of
being a descriptive model and does not point to any obvious
implementation in terms of biologically plausible circuitry.
One model to derive curvature selectivity in V4 from inputs
coming from V1 and V2 would involve first coarsely defining
an object, i.e., segmenting it, and then assessing the orientation
progression along its boundary. The latter step is captured by
the model of Cadieu and colleagues (discussed below). The
former step, segmentation, is more challenging but could be
achieved by a set of “grouping cells” like those proposed by
Craft et al. (2007) as a mechanism for creating border owner-
ship signals in V2. Grouping cells group together concentric
contour segments, and a set of such cells captures the coarse
shape of an object. This is equivalent to finding the set of
largest disks that would just fit within a bounding contour, a
method proposed for computing the medial axis of a shape
(Blum 1967). Grouping cells are hypothesized in V4 and could
send lateral connections to curvature-sensitive neurons. Inputs
from the set of grouping cells would specify the centroid of the
stimulus in a graded fashion. Further experiments are needed to
explore this possibility, but preliminary results from our labo-
ratory suggest that the earliest responses in V4 encode the
overall size of the stimulus, which supports this hypothesis.
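Blum's maximal-disk construction can be illustrated numerically: the distance from each interior point to the nearest background point is the radius of the largest inscribed disk centered there, and the ridge of that distance map traces the medial axis. The following brute-force sketch is our own illustration of the construction, not code from the cited work:

```python
import numpy as np

# Binary mask of a simple elongated "object" on a small grid.
h, w = 21, 41
mask = np.zeros((h, w), dtype=bool)
mask[5:16, 3:38] = True               # an 11-row x 35-column rectangle

# Distance from each object pixel to the nearest background pixel: the
# radius of the largest disk centered there that still fits inside the
# shape (brute force; fine for a grid this small).
ys, xs = np.nonzero(~mask)
bg = np.stack([ys, xs], axis=1).astype(float)
dist = np.zeros((h, w))
for y in range(h):
    for x in range(w):
        if mask[y, x]:
            dist[y, x] = np.sqrt(((bg - [y, x]) ** 2).sum(axis=1)).min()

# Centers of maximal inscribed disks lie on the ridge of the distance
# map; for this rectangle the ridge runs along the horizontal midline.
axis_row = int(np.argmax(dist.sum(axis=1)))
```

For the rectangle above the ridge falls on row 10 (the vertical midline of rows 5–15), with a maximal disk radius of 6 pixels; grouping cells pooling such maximal disks would likewise capture an object's coarse position and size.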
Alternatives to the APC and SRF models considered here
include a set of biologically inspired hierarchical models
(Cadieu et al. 2007; Rodríguez-Sánchez and Tsotsos 2012;
Serre et al. 2005). The model of Cadieu et al. has been shown
to account for the curvature tuning of V4 neurons using the
same data set examined here—Fig. 10A of Cadieu et al. (2007)
shows that their model performed similarly to the 4D APC
model in terms of explained variance. The Cadieu model,
however, does not operate in an object-centered system and
does not explicitly represent curvature. Curvature is built up as
a combination of oriented segments, and translation invariance
is achieved in small steps of positional invariance implemented
by using the max-function. The model of Rodríguez-Sánchez
and Tsotsos (2012) explicitly represents curvature tuning at
intermediate stages in the visual hierarchy and implicitly uses
an object-centered coordinate system. These models may pro-
vide a useful foundation for testing the nature of an object-
centered representation and for developing a more complete
model that encompasses novel recent findings related to object
segmentation in V4 that have yet to be modeled (Bushnell et al.
2011).
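The max-function mechanism for small-step positional invariance can be illustrated with a one-dimensional toy: linear "simple cell" responses followed by a max over local windows of position are unchanged by small shifts of a feature but not by large ones. This sketch is our own caricature of the principle, not the Cadieu et al. model itself:

```python
import numpy as np

def filter_responses(signal, kernel):
    """Linear 'simple cell' responses at every position (valid correlation)."""
    n = len(kernel)
    return np.array([signal[i:i + n] @ kernel
                     for i in range(len(signal) - n + 1)])

def max_pool(responses, width):
    """'Complex cell' stage: max over non-overlapping windows of positions."""
    return np.array([responses[i:i + width].max()
                     for i in range(0, len(responses) - width + 1, width)])

kernel = np.array([-1.0, 2.0, -1.0])      # toy 1D "oriented" (edge) kernel
stim = np.zeros(16)
stim[7] = 1.0                             # a contour feature at position 7

pooled = max_pool(filter_responses(stim, kernel), width=4)
# A 1-sample shift keeps the peak response in the same pooling window,
# so the pooled code is unchanged (invariance in small steps)...
pooled_small = max_pool(filter_responses(np.roll(stim, 1), kernel), width=4)
# ...whereas a 4-sample shift crosses a window boundary and changes it.
pooled_large = max_pool(filter_responses(np.roll(stim, 4), kernel), width=4)
```

Stacking such pooling stages accumulates tolerance to progressively larger shifts, which is how the hierarchical models achieve translation invariance without an explicit object-centered coordinate system.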
In conclusion, it is essential to seek out the simplest models,
and the SRF model is therefore an important point of compar-
ison. However, responses of V4 neurons appear to reflect the
solutions to some of the most difficult problems in visual object
recognition, those of translation invariance and object segmen-
tation, so it may be unsurprising if simple combinations of V1
outputs do not account for V4 responses. To advance our
understanding of V4, it will be important to 1) develop a
mechanistic implementation that explains curvature responses,
2) extend such models to handle complex scenes, and 3)
conduct experiments to further characterize those V4 neurons
that are not well explained by either the APC or SRF models.
GRANTS
This work was funded by National Institutes of Health (NIH) Grant R01
EY-018839 (A. Pasupathy), National Science Foundation CRCNS Grant
IIS-1309725 (W. Bair and A. Pasupathy), and NIH Office of Research
Infrastructure Programs Grant RR-00166 (A. Pasupathy). T. D. Oleskiw was
funded by NIH (Computational Neuroscience Training Grant 5R90 DA-
033461-03) and by the Natural Sciences and Engineering Research Council of
Canada (NSERC, PGS-D).
DISCLOSURES
No conflicts of interest, financial or otherwise, are declared by the author(s).
AUTHOR CONTRIBUTIONS
Author contributions: T.D.O., A.P., and W.B. conception and design of
research; T.D.O., A.P., and W.B. analyzed data; T.D.O., A.P., and W.B.
interpreted results of experiments; T.D.O., A.P., and W.B. prepared figures;
T.D.O., A.P., and W.B. drafted manuscript; T.D.O., A.P., and W.B. edited and
revised manuscript; T.D.O., A.P., and W.B. approved final version of manu-
script.
REFERENCES
Albrecht DG, De Valois RL, Thorell LG. Visual cortical neurons: are bars or gratings the optimal stimuli? Science 207: 88–90, 1980.
Blum H. A transformation for extracting new descriptors of shape. In: Models for the Perception of Speech and Visual Form, edited by Wathen-Dunn W. Cambridge, MA: MIT Press, 1967, p. 362–380.
Burkhalter A, Van Essen DC. Processing of color, form and disparity information in visual areas VP and V2 of ventral extrastriate cortex in the macaque monkey. J Neurosci 6: 2327–2351, 1986.
Bushnell BN, Harding PJ, Kosai Y, Pasupathy A. Partial occlusion modulates contour-based shape encoding in primate area V4. J Neurosci 31: 4012–4024, 2011.
Cadieu C, Kouh M, Pasupathy A, Connor CE, Riesenhuber M, Poggio T. A model of V4 shape selectivity and invariance. J Neurophysiol 98: 1733–1750, 2007.
Campbell FW, Cooper GF, Enroth-Cugell C. The spatial selectivity of the visual cells of the cat. J Physiol 203: 223–235, 1969.
Campbell FW, Robson JG. Application of Fourier analysis to the visibility of gratings. J Physiol 197: 551–566, 1968.
Craft E, Schütze H, Niebur E, von der Heydt R. A neural model of figure-ground organization. J Neurophysiol 97: 4310–4326, 2007.
David SV, Hayden BY, Gallant JL. Spectral receptive field properties explain shape selectivity in area V4. J Neurophysiol 96: 3492–3505, 2006.
Desimone R, Schein SJ. Visual properties of neurons in area V4 of the macaque: sensitivity to stimulus form. J Neurophysiol 57: 835–868, 1987.
De Valois RL, De Valois KK. Spatial Vision. Oxford, UK: Oxford Univ. Press, 1990.
Felleman DJ, Van Essen DC. Distributed hierarchical processing in the primate cerebral cortex. Cereb Cortex 1: 1–47, 1991.
Gallant JL, Braun J, Van Essen DC. Selectivity for polar, hyperbolic, and Cartesian gratings in macaque visual cortex. Science 259: 100–103, 1993.
Gallant JL, Connor CE, Rakshit S, Lewis JW, Van Essen DC. Neural responses to polar, hyperbolic, and Cartesian gratings in area V4 of the macaque monkey. J Neurophysiol 76: 2718–2739, 1996.
Gattass R, Sousa AP, Gross CG. Visuotopic organization and extent of V3 and V4 of the macaque. J Neurosci 8: 1831–1845, 1988.
Hegdé J, Van Essen DC. A comparative study of shape representation in macaque visual areas V2 and V4. Cereb Cortex 17: 1100–1116, 2007.
Hubel DH, Wiesel TN. Receptive fields of single neurones in the cat’s striate cortex. J Physiol 148: 574–591, 1959.
Hubel DH, Wiesel TN. Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat. J Neurophysiol 28: 229–289, 1965.
Hubel DH, Wiesel TN. Receptive fields and functional architecture of monkey striate cortex. J Physiol 195: 215–243, 1968.
Kobatake E, Tanaka K. Neuronal selectivities to complex object features in the ventral visual pathway of the macaque cerebral cortex. J Neurophysiol 71: 856–867, 1994.
Mishkin M, Ungerleider LG. Contribution of striate inputs to the visuospatial functions of parieto-preoccipital cortex in monkeys. Behav Brain Res 6: 57–77, 1982.
Movshon JA, Thompson ID, Tolhurst DJ. Spatial summation in the receptive fields of simple cells in the cat’s striate cortex. J Physiol 283: 53–77, 1978.
Nandy AS, Sharpee TO, Reynolds JH, Mitchell JF. The fine structure of shape tuning in area V4. Neuron 78: 1102–1115, 2013.
Pasupathy A, Connor CE. Responses to contour features in macaque area V4. J Neurophysiol 82: 2490–2502, 1999.
Pasupathy A, Connor CE. Shape representation in area V4: position-specific tuning for boundary conformation. J Neurophysiol 86: 2505–2519, 2001.
Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical Recipes: The Art of Scientific Computing (3rd ed.). New York: Cambridge Univ. Press, 2007.
Rodríguez-Sánchez AJ, Tsotsos JK. The roles of endstopped and curvature tuned computations in a hierarchical representation of 2D shape. PLoS One 7: e42058, 2012.
Rust NC, DiCarlo JJ. Selectivity and tolerance (“invariance”) both increase as visual information propagates from cortical area V4 to IT. J Neurosci 30: 12987–12995, 2010.
Rust NC, Mante V, Simoncelli EP, Movshon JA. How MT cells analyze the motion of visual patterns. Nat Neurosci 9: 1421–1431, 2006.
Schiller PH, Finlay BL, Volman SF. Quantitative studies of single-cell properties in monkey striate cortex. I. Spatiotemporal organization of receptive fields. J Neurophysiol 39: 1288–1319, 1976.
Serre T, Kouh M, Cadieu C, Knoblich U, Kreiman G, Poggio T. A Theory of Object Recognition: Computations and Circuits in the Feedforward Path of the Ventral Stream in Primate Visual Cortex. CBCL Paper 259/AI Memo 2005-036. Cambridge, MA: MIT, 2005.
Vintch B. Structured Hierarchical Models for Neurons in the Early Visual System (PhD thesis). New York: Center for Neural Science, New York Univ., 2013.
Willmore BD, Prenger RJ, Gallant JL. Neural representation of natural images in visual area V2. J Neurosci 30: 2102–2114, 2010.
... Accordingly, shape coding in earlier ventral pathway stages has been studied and characterized for decades in terms of flat, planar shape, including boundary contours and spatial frequencies. [28][29][30][31][32][33][34][35][36][37][38][39][40][41][42][43][44][45] Figure 1. Area V4 Encodes Solid Shape Information-Introductory Example (A) First generation (Gen 1) of 40 random stimuli per lineage. ...
... The flat shapes are similar to stimuli used previously to measure contour shape tuning. [29][30][31][32][33][34][35][36][37] The evolving solid stimuli simultaneously test for responses to their 2D self-occlusion boundaries, providing an efficient way to explore both domains simultaneously. We did not evolve flat stimuli independently in these tests. ...
... Our study constitutes a break from the long history of studying V4 in terms of flat shape processing. [28][29][30][31][32][33][34][35][36][37][38][39][40][41][42][43] It also directly conflicts with previous studies concluding that neither V4 24 nor PIT (posterior inferotemporal cortex), 90 the next ventral pathway stage after V4, 9,10 encode 3D shape to any significant degree. The stimuli in those studies were a limited set of curved surfaces, with sharply cut 2D boundaries and various degrees of convex or concave surface curvature at one orientation, toward or away from the viewer. ...
Article
Full-text available
Area V4 is the first object-specific processing stage in the ventral visual pathway, just as area MT is the first motion-specific processing stage in the dorsal pathway. For almost 50 years, coding of object shape in V4 has been studied and conceived in terms of flat pattern processing, given its early position in the transformation of 2D visual images. Here, however, in awake monkey recording experiments, we found that roughly half of V4 neurons are more tuned and responsive to solid, 3D shape-in-depth, as conveyed by shading, specularity, reflection, refraction, or disparity cues in images. Using 2-photon functional microscopy, we found that flat- and solid-preferring neurons were segregated into separate modules across the surface of area V4. These findings should impact early shape-processing theories and models, which have focused on 2D pattern processing. In fact, our analyses of early object processing in AlexNet, a standard visual deep network, revealed a similar distribution of sensitivities to flat and solid shape in layer 3. Early processing of solid shape, in parallel with flat shape, could represent a computational advantage discovered by both primate brain evolution and deep-network training.
... To determine whether form selectivity in area V4 is invariant across filled and outline shapes, we recorded responses from well-isolated V4 neurons (n ϭ 43) in two awake, fixating macaques for the two stimulus sets shown in Fig. 2. The filled shapes ( Fig. 2A) consisted of unique combinations of convex and concave features and have been used in a series of studies to characterize V4 tuning for boundary curvature (Bushnell et al. 2011a;Bushnell and Pasupathy 2012;Kosai et al. 2014;Oleskiw et al. 2014;Pasupathy and Connor 2001). The outline shapes had the same boundary curvature and were rendered in the same color and luminance as the filled shapes. ...
... Because filled and outline stimuli differ in SF content, modulation of form responses could theoretically be implemented by accessing SF information. For example, the spectral receptive field model (David et al. 2006) could differentiate between filled and outline stimuli, even though it cannot capture V4 tuning for curvature (Oleskiw et al. 2014). The HMax model could create different responses to filled and outline stimuli by varying the use of inputs located in the center of the field and the relative use of inputs with different SF. ...
Article
Visual area V4 is an important midlevel cortical processing stage that subserves object recognition in primates. Studies investigating shape coding in V4 have largely probed neuronal responses with filled shapes, i.e., shapes defined by both a boundary and an interior fill. As a result, we do not know whether form-selective V4 responses are dictated by boundary features alone or if interior fill is also important. We studied 43 V4 neurons in two male macaque monkeys ( Macaca mulatta) with a set of 362 filled shapes and their corresponding outlines to determine how interior fill modulates neuronal responses in shape-selective neurons. Only a minority of neurons exhibited similar response strength and shape preferences for filled and outline stimuli. A majority responded preferentially to one stimulus category (either filled or outline shapes) and poorly to the other. Our findings are inconsistent with predictions of the hierarchical-max (HMax) V4 model that builds form selectivity from oriented boundary features and takes little account of attributes related to object surface, such as the phase of the boundary edge. We modified the V4 HMax model to include sensitivity to interior fill by either removing phase-pooling or introducing unoriented units at the V1 level; both modifications better explained our data without increasing the number of free parameters. Overall, our results suggest that boundary orientation and interior surface information are both maintained until at least the midlevel visual representation, consistent with the idea that object fill is important for recognition and perception in natural vision. NEW & NOTEWORTHY The shape of an object's boundary is critical for identification; consistent with this idea, models of object recognition predict that filled and outline versions of a shape are encoded similarly. 
We report that many neurons in a midlevel visual cortical area respond differently to filled and outline shapes and modify a biologically plausible model to account for our data. Our results suggest that representations of boundary shape and surface fill are interrelated in visual cortex.
... However, such a model cannot produce translation invariance in shape tuning, a fundamental property of many V4 neurons (El-Shamayleh & Pasupathy 2016, Gallant et al. 1993, Pasupathy & Connor 2001. The spectral RF model of Gallant and colleagues (David et al. 2006) can achieve translation invariance but not tuning for boundary curvature (Oleskiw et al. 2014). ...
Article
Area V4—the focus of this review—is a mid-level processing stage along the ventral visual pathway of the macaque monkey. V4 is extensively interconnected with other visual cortical areas along the ventral and dorsal visual streams, with frontal cortical areas, and with several subcortical structures. Thus, it is well poised to play a broad and integrative role in visual perception and recognition—the functional domain of the ventral pathway. Neurophysiological studies in monkeys engaged in passive fixation and behavioral tasks suggest that V4 responses are dictated by tuning in a high-dimensional stimulus space defined by form, texture, color, depth, and other attributes of visual stimuli. This high-dimensional tuning may underlie the development of object-based representations in the visual cortex that are critical for tracking, recognizing, and interacting with objects. Neurophysiological and lesion studies also suggest that V4 responses are important for guiding perceptual decisions and higher-order behavior. Expected final online publication date for the Annual Review of Vision Science, Volume 6 is September 15, 2020. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Article
Active object recognition, fundamental to tasks like reading and driving, relies on the ability to make time-sensitive decisions. People exhibit a flexible tradeoff between speed and accuracy, a crucial human skill. However, current computational models struggle to incorporate time. To address this gap, we present the first dataset (with 148 observers) exploring the speed–accuracy tradeoff (SAT) in ImageNet object recognition. Participants performed a 16-way ImageNet categorization task where their responses counted only if they occurred near the time of a fixed-delay beep. Each block of trials allowed one reaction time. As expected, human accuracy increases with reaction time. We compare human performance with that of dynamic neural networks that adapt their computation to the available inference time. Time is a scarce resource for human object recognition, and finding an appropriate analog in neural networks is challenging. Networks can repeat operations by using layers, recurrent cycles, or early exits. We use the repetition count as a network's analog for time. In our analysis, the number of layers, recurrent cycles, and early exits correlates strongly with floating-point operations, making them suitable time analogs. Comparing networks and humans on SAT-fit error, category-wise correlation, and SAT-curve steepness, we find cascaded dynamic neural networks most promising in modeling human speed and accuracy. Surprisingly, convolutional recurrent networks, typically favored in human object recognition modeling, perform the worst on our benchmark.
Article
Full-text available
The ventral visual pathway is crucially involved in integrating low-level visual features into complex representations for objects and scenes. At an intermediate stage of the ventral visual pathway, V4 plays a crucial role in supporting this transformation. Many V4 neurons are selective for shape segments like curves and corners, however it remains unclear whether these neurons are organized into clustered functional domains, a structural motif common across other visual cortices. Using two-photon calcium imaging in awake macaques, we confirmed and localized cortical domains selective for curves or corners in V4. Single-cell resolution imaging confirmed that curve or corner selective neurons were spatially clustered into such domains. When tested with hexagonal-segment stimuli, we find that stimulus smoothness is the cardinal difference between curve and corner selectivity in V4. Combining cortical population responses with single neuron analysis, our results reveal that curves and corners are encoded by neurons clustered into functional domains in V4. This functionally-specific population architecture bridges the gap between the early and late cortices of the ventral pathway and may serve to facilitate complex object recognition.
Article
Recognizing a myriad visual objects rapidly is a hallmark of the primate visual system. Traditional theories of object recognition have focused on how crucial form features, for example, the orientation of edges, may be extracted in early visual cortex and utilized to recognize objects. An alternative view argues that much of early and mid-level visual processing focuses on encoding surface characteristics, for example, texture. Neurophysiological evidence from primate area V4 supports a third alternative - the joint, but independent, encoding of form and texture - that would be advantageous for segmenting objects from the background in natural scenes and for object recognition that is independent of surface texture. Future studies that leverage deep convolutional network models, especially focusing on network failures to match biology and behavior, can advance our insights into how such a joint representation of form and surface properties might emerge in visual cortex.
Preprint
Full-text available
Neurons are often probed by presenting a set of stimuli that vary along one dimension (e.g. color) and quantifying how this stimulus property affect neural activity. An open question, in particular where higher-level areas are involved, is how much tuning measured with one stimulus set reveals about tuning to a new set. Here we ask this question by estimating tuning to hue in macaque V4 from a set of natural scenes and a set of simple color stimuli. We found that hue tuning was strong in each dataset but was not correlated across the datasets, a finding expected if neurons have strong mixed selectivity. We also show how such mixed selectivity may be useful for transmitting information about multiple dimensions of the world. Our finding suggest that tuning in higher visual areas measured with simple stimuli may thus not generalize to naturalistic stimuli. New & Noteworthy Visual cortex is often investigated by mapping neural tuning to variables selected by the researcher such as color. How much does this approach tell us a neuron’s general ‘role’ in vision? Here we show that for strongly hue-tuned neurons in V4, estimating hue tuning from artificial stimuli does not reveal the hue tuning in the context of natural scenes. We show how models of optimal information processing suggest that such mixed selectivity maximizes information transmission.
Article
Full-text available
Deep networks provide a potentially rich interconnection between neuroscientific and artificial approaches to understanding visual intelligence, but the relationship between artificial and neural representations of complex visual form has not been elucidated at the level of single-unit selectivity. Taking the approach of an electrophysiologist to characterizing single CNN units, we found many units exhibit translation-invariant boundary curvature selectivity approaching that of exemplar neurons in the primate mid-level visual area V4. For some V4-like units, particularly in middle layers, the natural images that drove them best were qualitatively consistent with selectivity for object boundaries. Our results identify a novel image-computable model for V4 boundary curvature selectivity and suggest that such a representation may begin to emerge within an artificial network trained for image categorization, even though boundary information was not provided during training. This raises the possibility that single-unit selectivity in CNNs will become a guide for understanding sensory cortex.
Thesis
Full-text available
Using 3D stimuli, generated in the animation program Blender, we show that V4 neurons fire selectively to shapes at a preferred angle and curvature, as well as characterizing the temporal window of this response.
Article
Full-text available
Previous studies have shown that neurons in area V4 are involved in the processing of shapes of intermediate complexity and are sensitive to curvature. These studies also suggest that curvature-tuned neurons are position invariant. We sought to examine the mechanisms that endow V4 neurons with these properties. Consistent with previous studies, we found that response rank order to the most- and least-preferred stimuli was preserved throughout the receptive field. However, a fine-grained analysis of shape tuning revealed a surprising result: V4 neurons tuned to highly curved shapes exhibit very limited translation invariance. At a fine spatial scale, these neurons exhibit local variation in orientation. In contrast, neurons that prefer straight contours exhibit spatially invariant orientation-tuning and homogenous fine-scale orientation maps. Both of these patterns are consistent with a simple orientation-pooling model, with tuning for straight or curved shapes resulting, respectively, from pooling of homogenous or heterogeneous orientation signals inherited from early visual areas.
Article
Full-text available
1. The contrast thresholds of a variety of grating patterns have been measured over a wide range of spatial frequencies.
2. Contrast thresholds for the detection of gratings whose luminance profiles are sine, square, rectangular or saw-tooth waves can be simply related using Fourier theory.
3. Over a wide range of spatial frequencies the contrast threshold of a grating is determined only by the amplitude of the fundamental Fourier component of its wave form.
4. Gratings of complex wave form cannot be distinguished from sine-wave gratings until their contrast has been raised to a level at which the higher harmonic components reach their independent threshold.
5. These findings can be explained by the existence within the nervous system of linearly operating independent mechanisms selectively sensitive to limited ranges of spatial frequencies.
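The central quantitative claim above (point 3) rests on a standard Fourier fact: the fundamental component of a unit-contrast square wave has amplitude 4/pi, so a square-wave grating should become detectable at roughly pi/4 the contrast of a matched sine wave. A minimal numerical sketch (not from the article; sampling grid and projection method are illustrative choices):

```python
import numpy as np

# One period of a unit-contrast square-wave grating, sampled densely.
x = np.linspace(0, 1, 4096, endpoint=False)
square = np.sign(np.sin(2 * np.pi * x))

# Amplitude of the fundamental, by projecting onto sin(2*pi*x).
# Analytically this is 4/pi ~ 1.273, larger than the square wave's
# own contrast of 1.0, consistent with point 3 of the abstract.
fundamental = 2 * np.mean(square * np.sin(2 * np.pi * x))
print(round(fundamental, 3))  # close to 4/pi ~ 1.273
```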
Article
Full-text available
That shape is important for perception has been known for almost a thousand years (thanks to Alhazen in 1083) and has been a subject of study ever since by scientists and philosophers (such as Descartes, Helmholtz or the Gestalt psychologists). Shapes are important object descriptors. If there was any remote doubt regarding the importance of shape, recent experiments have shown that intermediate areas of primate visual cortex such as V2, V4 and TEO are involved in analyzing shape features such as corners and curvatures. The primate brain appears to perform a wide variety of complex tasks by means of simple operations. These operations are applied across several layers of neurons, representing increasingly complex, abstract intermediate processing stages. Recently, new models have attempted to emulate the human visual system. However, the role of intermediate representations in the visual cortex and their importance have not been adequately studied in computational modeling. This paper proposes a model of shape-selective neurons whose shape selectivity is achieved through intermediate layers of visual representation not previously fully explored. We hypothesize that hypercomplex - also known as endstopped - neurons play a critical role in achieving shape selectivity and show how shape-selective neurons may be modeled by integrating endstopping and curvature computations. This model - a representational and computational system for the detection of 2-dimensional object silhouettes that we term 2DSIL - provides a highly accurate fit with neural data and replicates responses from neurons in area V4 with an average of 83% accuracy. We successfully test a biologically plausible hypothesis on how to connect early representations based on Gabor or Difference of Gaussian filters and later representations closer to object categories without the need of a learning phase as in most recent models.
Article
Full-text available
Past studies of shape coding in visual cortical area V4 have demonstrated that neurons can accurately represent isolated shapes in terms of their component contour features. However, rich natural scenes contain many partially occluded objects, which have "accidental" contours at the junction between the occluded and occluding objects. These contours do not represent the true shape of the occluded object and are known to be perceptually discounted. To discover whether V4 neurons differentially encode accidental contours, we studied the responses of single neurons in fixating monkeys to complex shapes and contextual stimuli presented either in isolation or adjoining each other to provide a percept of partial occlusion. Responses to preferred contours were suppressed when the adjoining context rendered those contours accidental. The observed suppression was reversed when the partial occlusion percept was compromised by introducing a small gap between the component stimuli. Control experiments demonstrated that these results likely depend on contour geometry at T-junctions and cannot be attributed to mechanisms based solely on local color/luminance contrast, spatial proximity of stimuli, or the spatial frequency content of images. Our findings provide novel insights into how occluded objects, which are fundamental to complex visual scenes, are encoded in area V4. They also raise the possibility that the weakened encoding of accidental contours at the junction between objects could mark the first step of image segmentation along the ventral visual pathway.
Article
Full-text available
Our ability to recognize objects despite large changes in position, size, and context is achieved through computations that are thought to increase both the shape selectivity and the tolerance ("invariance") of the visual representation at successive stages of the ventral pathway [visual cortical areas V1, V2, and V4 and inferior temporal cortex (IT)]. However, these ideas have proven difficult to test. Here, we consider how well population activity patterns at two stages of the ventral stream (V4 and IT) discriminate between, and generalize across, different images. We found that both V4 and IT encode natural images with similar fidelity, whereas the IT population is much more sensitive to controlled, statistical scrambling of those images. Scrambling sensitivity was proportional to receptive field (RF) size in both V4 and IT, suggesting that, on average, the number of visual feature conjunctions implemented by a V4 or IT neuron is directly related to its RF size. We also found that the IT population could better discriminate between objects across changes in position, scale, and context, thus directly demonstrating a V4-to-IT gain in tolerance. This tolerance gain could be accounted for by both a decrease in single-unit sensitivity to identity-preserving transformations (e.g., an increase in RF size) and an increase in the maintenance of rank-order object selectivity within the RF. These results demonstrate that, as visual information travels from V4 to IT, the population representation is reformatted to become more selective for feature conjunctions and more tolerant to identity preserving transformations, and they reveal the single-unit response properties that underlie that reformatting.
Article
In recent years, many new cortical areas have been identified in the macaque monkey. The number of identified connections between areas has increased even more dramatically. We report here on (1) a summary of the layout of cortical areas associated with vision and with other modalities, (2) a computerized database for storing and representing large amounts of information on connectivity patterns, and (3) the application of these data to the analysis of hierarchical organization of the cerebral cortex. Our analysis concentrates on the visual system, which includes 25 neocortical areas that are predominantly or exclusively visual in function, plus an additional 7 areas that we regard as visual-association areas on the basis of their extensive visual inputs. A total of 305 connections among these 32 visual and visual-association areas have been reported. This represents 31% of the possible number of pathways if each area were connected with all others. The actual degree of connectivity is likely to be closer to 40%. The great majority of pathways involve reciprocal connections between areas. There are also extensive connections with cortical areas outside the visual system proper, including the somatosensory cortex, as well as neocortical, transitional, and archicortical regions in the temporal and frontal lobes. In the somatosensory/motor system, there are 62 identified pathways linking 13 cortical areas, suggesting an overall connectivity of about 40%. Based on the laminar patterns of connections between areas, we propose a hierarchy of visual areas and of somatosensory/motor areas that is more comprehensive than those suggested in other recent studies. The current version of the visual hierarchy includes 10 levels of cortical processing. Altogether, it contains 14 levels if one includes the retina and lateral geniculate nucleus at the bottom as well as the entorhinal cortex and hippocampus at the top.
Within this hierarchy, there are multiple, intertwined processing streams, which, at a low level, are related to the compartmental organization of areas V1 and V2 and, at a high level, are related to the distinction between processing centers in the temporal and parietal lobes. However, there are some pathways and relationships (about 10% of the total) whose descriptions do not fit cleanly into this hierarchical scheme for one reason or another. In most instances, though, it is unclear whether these represent genuine exceptions to a strict hierarchy or rather inaccuracies or uncertainties in the reported assignment.