The Roles of Endstopped and Curvature Tuned
Computations in a Hierarchical Representation of 2D
Shape
Antonio J. Rodríguez-Sánchez1*, John K. Tsotsos2
1 Intelligent and Interactive Systems, University of Innsbruck, Innsbruck, Austria, 2 Centre for Vision Research and Dept. of Computer Science and Engineering, York University, Toronto, Ontario, Canada
Abstract
That shape is important for perception has been known for almost a thousand years (thanks to Alhazen in 1083) and has
been a subject of study ever since by scientists and philosophers (such as Descartes, Helmholtz or the Gestalt
psychologists). Shapes are important object descriptors. If there was any remote doubt regarding the importance of shape,
recent experiments have shown that intermediate areas of primate visual cortex such as V2, V4 and TEO are involved in
analyzing shape features such as corners and curvatures. The primate brain appears to perform a wide variety of complex
tasks by means of simple operations. These operations are applied across several layers of neurons, representing
increasingly complex, abstract intermediate processing stages. Recently, new models have attempted to emulate the
human visual system. However, the role of intermediate representations in the visual cortex and their importance have not
been adequately studied in computational modeling. This paper proposes a model of shape-selective neurons whose
shape-selectivity is achieved through intermediate layers of visual representation not previously fully explored. We
hypothesize that hypercomplex - also known as endstopped - neurons play a critical role to achieve shape selectivity and
show how shape-selective neurons may be modeled by integrating endstopping and curvature computations. This model -
a representational and computational system for the detection of 2-dimensional object silhouettes that we term 2DSIL -
provides a highly accurate fit with neural data and replicates responses from neurons in area V4 with an average of 83%
accuracy. We successfully test a biologically plausible hypothesis on how to connect early representations based on Gabor
or Difference of Gaussian filters and later representations closer to object categories without the need of a learning phase as
in most recent models.
Citation: Rodríguez-Sánchez AJ, Tsotsos JK (2012) The Roles of Endstopped and Curvature Tuned Computations in a Hierarchical Representation of 2D Shape. PLoS ONE 7(8): e42058. doi:10.1371/journal.pone.0042058
Editor: Vladimir E. Bondarenko, Georgia State University, United States of America
Received December 7, 2011; Accepted July 2, 2012; Published August 9, 2012
Copyright: © 2012 Rodríguez-Sánchez and Tsotsos. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: The authors are grateful for research support from the Natural Sciences and Engineering Research Council of Canada through a grant to JKT (4557-
2009) and the Teledyne Scientific Company, Durham, North Carolina, through a contract to JKT (BOU546385). JKT holds the Canada Research Chair in
Computational Vision which also supported this work. JKT is not an employee of Teledyne - he was PI on a contract to York University from Teledyne. The funders
had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have the following competing interest to declare. This study was partly funded by the Teledyne Scientific Company, Durham,
North Carolina. There are no patents, products in development or marketed products to declare. This does not alter the authors’ adherence to all the PLoS ONE
policies on sharing data and materials, as detailed online in the guide for authors.
* E-mail: Antonio.Rodriguez-Sanchez@uibk.ac.at
Introduction
Since the foundation of modern neuroanatomy by Ramón y
Cajal, who gave a detailed description of the nerve cell
organization in the central and peripheral nervous system [1–4],
great progress has been achieved in understanding the human
brain. At the same time, computing power and technology have
provided more sophisticated tools to study the brain and its great
complexity. Computational neuroscience has appeared as an
important methodology for formalizing and testing new hypoth-
eses on how that complex system may perform certain operations.
Over the last decades, many models inspired by advances in the
anatomy of the visual cortex have been presented, the earliest from
the late 1960s and early 1970s [5–8]. A subsequent and very
influential model is Fukushima’s Neocognitron [9]. The Neocog-
nitron is a self-organizing neural network model that achieves
position invariance and was later demonstrated to perform well on
digit recognition [10]. The network contains an input layer
followed by a cascade of S-cells (for simple cells) and C-cells
(complex cells). After unsupervised training through a self-
organization process, one of the C-cells in the last layer will
respond selectively to the input pattern used in training. Later
models, based on Fukushima’s foundation, that included back-
propagation [11] were also successful at the task of handwritten
digit recognition [12,13].
Since then, there have been several relevant works. Visnet
[14] consists of a four layer network that achieves invariant
object recognition. The most crucial part of such a method is a trace learning rule that is Hebbian based. To achieve translation invariance, the network is trained with inputs at different
positions. Riesenhuber and Poggio’s [15–19] model consists of
five hierarchical levels of S and C neurons (following Fukush-
ima’s Neocognitron [9]) that are connected through linear
operations in one layer and non-linear (MAX) in the next (the
strongest units determine the response of the system). The first
level receives input from the retina and is composed of simple
neuron receptive fields that analyze orientations. The next levels
account for more complex features (e.g. junctions). The last level
is composed of view-tuned neurons that achieve position and
scale invariance.
Amit [20,21] presents a parallel neural network for visual
selection. This network is trained to detect candidate locations
for object recognition. Objects are represented as composed of
features localized at different locations with respect to an object
centre. Simple features (edges and conjunctions) are detected in
lower levels, while higher levels carry out disjunctions over
regions. Suzuki and colleagues [22] construct a model of the
form pathway based on predictive coding [23,24]. Predictive
coding hypothesizes that feedback connections from high to
lower-order cortical areas carry predictions of lower-level neural
activities. Feedforward connections carry residual errors be-
tween predictions and the actual lower-level activities. In the
model, a fast coarse processing precedes and constrains more
detailed processing.
None of the models presented so far fully explores the
possible contributions of intermediate representations as they
are known in the brain. Common to most models is a first step
that performs some sort of edge-detection in a similar way to
some V1 neurons in the brain. Even though some of the
proposals may include hierarchies with intermediate represen-
tations (e.g. [19,25]), these representations do not include much
of the complexity now known to exist in the intermediate layers
of the visual cortex. The usual modeling of intermediate layers
to date is a simple composition of earlier features to
approximate shape without computing curvature or shape
directly. Here, we propose a more direct approach, one that
provides models of units that compute shape properties directly
using several novel neurally-based computations. Distinct from
the best of the previous approaches, we do not use simple
hierarchical composition of a common neural type but rather,
define new neural selectivities for each of several intermediate
visual computation layers.
Models to date have remained tied to the representation of contours following Marr's [26,27] primal sketch, that is, edge combinations are used to represent shapes and objects. Models have added layers of S and C cells following early systems [9] into higher levels of the hierarchy, not considering that cells in those higher levels perform quite different, more complex, operations.
There has been some progress on how hypercomplex cells, also
known as endstopped, may be defined [28–30], but except for the
work of [31–33] on figure-ground segregation, the role of
endstopping has been neglected. Here, following this past work,
we hypothesize that endstopped neurons play an important role in
encoding curvature and shape.
We present a biologically plausible model for shape represen-
tation, 2DSIL, where the focus is on 2D silhouettes. In the
following section we describe in detail each layer in the model.
Next we show the strongly positive results of testing the model
with stimuli used in previous single-cell recording studies followed
by a discussion regarding the characteristics of 2DSIL. In a
previous paper [34] we showed that even when this representa-
tion is used within a recognition system, it outperforms the
leading competing models. Material and methods are presented
at the end.
Results
In this section we explain how shape selectivity may be achieved
with a model that incorporates intermediate layers inspired by the
primate visual system. We demonstrate the performance of our
model by comparing computed responses with neurons from area
V4.
Incorporating endstopping and curvature in a model of
shape representation
Figure 1 presents a depiction of the proposed architecture,
which comprises simple, complex, endstopped, local curvature and
shape-selective cells that are described next in detail. In what follows, whenever a neuron is referred to as a model neuron/cell, it is one developed for our theory. A neuron or cell referred to without the model adjective is a biological one.
Model simple cells. Simple neurons of visual area V1 are
sensitive to bar and edge orientations as previous models also
stipulate. Common spatial response profiles to model simple
neurons in area V1 include Gabor filters [35] and Difference of
Gaussians. The latter provides a better fit to neuronal responses
[36] and accordingly gave better results in our case than the Gabor
filter formulation:
$$
G(x,y) = \frac{1}{2\pi\sigma_{x1}\sigma_y}\,e^{-\frac{1}{2}\left[\left(\frac{x'}{\sigma_{x1}}\right)^2+\left(\frac{y'}{\sigma_y}\right)^2\right]} - \frac{1}{2\pi\sigma_{x2}\sigma_y}\,e^{-\frac{1}{2}\left[\left(\frac{x'}{\sigma_{x2}}\right)^2+\left(\frac{y'}{\sigma_y}\right)^2\right]}
$$
$$
x' = x\cos(\theta) + y\sin(\theta), \qquad y' = -x\sin(\theta) + y\cos(\theta) \tag{1}
$$
where $\sigma_y$ is the height and $\sigma_{x1}$ and $\sigma_{x2}$ are the widths of each Gaussian function, and $\theta$ is their orientation. The relation between these parameters may be referred to as the aspect ratio $AR = \sigma_y/\sigma_{x1}$ and the width ratio $WR = \sigma_{x2}/\sigma_{x1}$. The size of the filters was $4\sigma_y$. As with all the model neurons within 2DSIL, these are defined at multiple scales, each scale being band-pass for a range of receptive field sizes, with the number of scales represented appropriate for the modelling task. Values assigned to these parameters are given in the methods section.
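As a concrete illustration, the sketch below builds the oriented Difference-of-Gaussians profile of Equation 1 in Python (the original implementation was in Matlab; the function and variable names here are ours, not the authors'). It assumes $\sigma_{x1} = \sigma_y/AR$ and $\sigma_{x2} = WR\,\sigma_{x1}$, following the aspect-ratio and width-ratio definitions above, and a square filter support of $4\sigma_y$ pixels.

```python
import numpy as np

def dog_simple_cell_kernel(sigma_y, aspect_ratio, width_ratio, theta):
    """Oriented Difference-of-Gaussians profile (Equation 1).

    sigma_x1 = sigma_y / AR and sigma_x2 = WR * sigma_x1; the filter
    support is 4 * sigma_y pixels on a side, as stated in the text.
    """
    sigma_x1 = sigma_y / aspect_ratio
    sigma_x2 = width_ratio * sigma_x1
    half = 2.0 * sigma_y                                  # filter size is 4 * sigma_y
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xp = x * np.cos(theta) + y * np.sin(theta)            # rotated coordinates x', y'
    yp = -x * np.sin(theta) + y * np.cos(theta)

    def gaussian(sx):
        norm = 1.0 / (2.0 * np.pi * sx * sigma_y)
        return norm * np.exp(-0.5 * ((xp / sx) ** 2 + (yp / sigma_y) ** 2))

    return gaussian(sigma_x1) - gaussian(sigma_x2)

def simple_cell_response(patch, kernel):
    """Rectified response of one model simple cell to an image patch of the
    same size as the kernel (any value less than 0 is set to 0)."""
    return max(float(np.sum(patch * kernel)), 0.0)
```

For example, `dog_simple_cell_kernel(10, 0.7, 2.5, 0.0)` would approximate the smallest simple cell used in the experiments (size 40 pixels, AR = 0.7, WR = 2.5; see Materials and Methods).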
Cells in area V1 are heterogeneous, i.e. they are not all
uniform. In the realization of the model, four different groups of simple cells were designed, varying in size and in the values of width and length. Model simple cells are organized into hypercolumns. Within a hypercolumn, cells share the same orientation but are spatially displaced and combined into model complex cells as described next (Figure 1); however, there is no input from left and right eyes, since binocular responses are not considered in this study. Model simple cells are at different
orientations and scales.
Model complex cells. Complex cells are sensitive to bar and edge orientations as well, but their receptive fields are larger than those of simple neurons. Hubel and Wiesel [37–39]
suggested that complex cells may integrate the responses of simple
cells. In addition to this, [40] showed that complex cells may be
the result of the addition of simple cells along the axis
perpendicular to their orientation. Following these studies, in
our model, a complex cell is the weighted sum of 5 laterally
displaced model simple cells within a column. The model complex
cell response is given by [30]:
$$
R_{CX} = \sum_{i=1}^{n} c_i\,\varphi(R_i) \tag{2}
$$
$R_i$ is the response of the $i$th cell and $c_i$ is its weight. Model cells are Gaussian weighted by position, with weight inversely proportional to distance to the center. $\varphi$ is a rectification function, where any value less than 0 is set to 0. Model simple cells combining into a model complex cell are laterally displaced, their displacement being proportional to the cell's size as well as the height ($\sigma_y$) and width ($\sigma_{x1}$) of the Gaussian function. Displacement is in the direction perpendicular to the preferred orientation ($\theta + \frac{\pi}{2}$, using the modulo function to keep values in the range $0 \ldots \pi$) and is given by $d_x$ (displacement along the $x$ axis) and $d_y$ (displacement along the $y$ axis) in the following equation:
$$
d_x = \frac{\mathrm{size}}{2}\,\sigma_y\,\sigma_{x1}\,\sin\!\left(\operatorname{mod}\!\left(\theta+\tfrac{\pi}{2},\,\pi\right)\right), \qquad
d_y = \frac{\mathrm{size}}{2}\,\sigma_y\,\sigma_{x1}\,\cos\!\left(\operatorname{mod}\!\left(\theta+\tfrac{\pi}{2},\,\pi\right)\right) \tag{3}
$$
Figure 1. Architecture of the representational and computational system for the detection of 2-dimensional object silhouettes (2DSIL). doi:10.1371/journal.pone.0042058.g001

The construction of a model complex neuron is depicted in Figure 2A. The orientation of its model simple neuronal components in this case is 90° (vertical), while the 5 model simple cells are organized perpendicularly (spatially displaced but overlapping) to this preferred orientation, that is, 0°. This results in slightly less sensitivity for orientations, since each model complex cell integrates five model simple cells. A model complex neuron yields a positive response for stimuli at more locations inside its receptive field, and its receptive field is larger as well. These characteristics follow [37–39], and up to this point our model simple and complex cells follow [9] and share some similarities with its followers as well [15,21,41].
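A minimal sketch of the complex cell pooling of Equations 2 and 3 follows (Python rather than the original Matlab; the unit-variance Gaussian weighting and the exact scaling in the displacement helper are our reading of the equations, not values from the authors' code).

```python
import numpy as np

def complex_cell_response(simple_responses, distances):
    """Model complex cell (Equation 2): Gaussian-weighted sum of the
    rectified responses of laterally displaced model simple cells, with
    weights falling off with distance from the column centre."""
    r = np.maximum(np.asarray(simple_responses, dtype=float), 0.0)   # rectification
    c = np.exp(-0.5 * np.asarray(distances, dtype=float) ** 2)       # assumed unit-variance weights
    return float(np.sum(c * r))

def lateral_displacement(size, sigma_y, sigma_x1, theta):
    """Displacement of a component simple cell along the axis perpendicular
    to the preferred orientation (Equation 3)."""
    phi = np.mod(theta + np.pi / 2.0, np.pi)
    dx = (size / 2.0) * sigma_y * sigma_x1 * np.sin(phi)
    dy = (size / 2.0) * sigma_y * sigma_x1 * np.cos(phi)
    return dx, dy
```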
Model endstopped cells. Endstopped - also known as
hypercomplex - neurons respond to contours, both real and
illusory [42]. A more recent study [43] has found that although V2
neurons are mainly selective for angles and corners, these neurons
also showed submaximal responses for bars. Model endstopped
cells result from the difference between a simple cell and two
displaced complex cells [44]. At this point, our model diverges strongly from the formulations in the previous works cited
above. When simple and complex cells are combined at the same
orientation we can distinguish between degrees of curvature.
Through the use of model complex cells at different orientations
with respect to the simple cell, we can obtain the sign of the
curvature. These two model neuron types are explained next.
Model cells discriminant to the degree of
curvature. This model endstopped cell is the neural conver-
gence of a model simple neuron and two displaced model complex
neurons selective for the same orientation as follows (Figure 2B):
$$
R_{ESC} = W\!\left[c_c\,\varphi(R_c) - \left(c_{d1}\,\varphi(R_{d1}) + c_{d2}\,\varphi(R_{d2})\right)\right] \tag{4}
$$
$c_c$, $c_{d1}$ and $c_{d2}$ are the gains for the center and displaced cells. $R_c$, $R_{d1}$ and $R_{d2}$ are the responses of the center and the two displaced cells. $\varphi$ is a rectification function, where any value less than 0 is set to 0. $W$ is:
$$
W = \frac{1 - e^{-R/\rho}}{1 + \frac{1}{C}\,e^{-R/\rho}} \tag{5}
$$
This sigmoidal function - whose parameter values are given in the methods section - scales responses to highly intense stimuli. Displaced cells are shifted 1/2 of their receptive field size in the direction of their preferred orientation. The center simple cell has an excitatory effect, while the two complex cells (at the top and bottom in Figure 2B), which are wider than the center cell, have an inhibitory effect, following [45,46]. This design follows the work of [28,30,47] and [44,45,48,49].
Thanks to this configuration of simple and complex cells, we obtain a coarse estimation of curvature such that different curvatures can be discriminated into classes. Figure 2C shows how this type of cell can discriminate among different degrees of curvature. The plot shows how arcs of different radius provide different responses from this type of cell depending on the size of the component simple and complex cells. The scales of the simple and complex neurons that are combined in the configuration of endstopped cells play an important role in this curvature discrimination, as shown in Figure 2C. Different neuronal sizes provide different responses to different degrees of curvature. The smallest model endstopped neuron (Figure 2C blue plot, simple cell size 40 pixels) is selective for very high curvatures, while the largest model endstopped neuron (Figure 2C black plot, simple cell size 120 pixels) is selective for very broad curvatures; in-between scales (sizes of 80 and 100 pixels) provide preferred responses to intermediate curvatures (red and green plots). Note that this configuration also has maximal responses to bars of a specified length (that of the simple cell at the center), as is the case for real endstopped cells as well. Also note that the choice of these sizes, and even the number of sizes or scales in the model overall, is at the discretion of the modeler, so that the space of visual contours addressed by the model is best fit by the scales represented.
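The following sketch shows one way to realize Equations 4 and 5 in code (Python; the order of rectification and saturation, and the default gains, are our assumptions rather than values taken from the original implementation).

```python
import numpy as np

def saturation(r, rho, C=0.01):
    """Sigmoidal saturation W of Equation 5, which compresses responses to
    highly intense stimuli; rho is set per scale (see Materials and Methods)."""
    return (1.0 - np.exp(-r / rho)) / (1.0 + (1.0 / C) * np.exp(-r / rho))

def endstopped_curvature_response(r_center, r_disp1, r_disp2,
                                  c_c=1.0, c_d1=1.0, c_d2=1.0, rho=1.0):
    """Model endstopped cell tuned to degree of curvature (Equation 4): an
    excitatory central simple cell minus two inhibitory displaced complex
    cells at the same orientation, rectified and then saturated."""
    phi = lambda v: max(v, 0.0)                     # half-wave rectification
    drive = c_c * phi(r_center) - (c_d1 * phi(r_disp1) + c_d2 * phi(r_disp2))
    return float(saturation(phi(drive), rho))
```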
Model cells selective to the sign of curvature. Apart from
the degree of curvature, an additional contour characteristic that
V2 cells seem to encode is the sign of curvature [28,50]. Through
the local information available to endstopping we may compute
the sign of curvature. Here, in contrast to the degree-of-curvature model cells, each displaced complex cell has a different orientation from the simple cell, and the two model complex cells are oriented at opposite signs (e.g. 45° and 135° for the 0° model endstopped neurons) (Figure 3). A hint regarding this concept was first proposed by [30], which is extended here to all orientations and used on curvatures.

Figure 2. Endstopping. (A) Model complex cell. (B) Structure of model endstopped cell. (C) Response of the model endstopped cells to different radii of curvature. Simple cell sizes were 40 (blue), 80 (red), 100 (green) and 120 pixels (black). $\sigma_y$ = (10, 20, 25, 30). AR (aspect ratio) = (1.15, 2, 3, 4). WR (width ratio) = 2.5 for all cells. Gain c = (0.7, 0.8, 1, 2). Responses were normalized to the range [0,1]. doi:10.1371/journal.pone.0042058.g002

Figure 3. Model endstopped cell selective for curvature sign. doi:10.1371/journal.pone.0042058.g003
For one sign of curvature, a curve excites the excitatory model simple cell at the center, but curves falling into the region of the inhibitory model complex cells reduce the response of the model endstopped cell. A similar curve of the opposite sign passes only through the excitatory region (model simple cell); the curve has no (or very little) inhibitory effect on the overall response of the model endstopped cell, since it falls only slightly, if at all, within the model complex cell receptive fields (Figure 3).
Two types of model sign cells are used. These different signs are
obtained by changing the order of the displaced subtracted
neurons.
$$
R_{+} = \varphi\!\left[c_c\,\varphi(R_c) - \left(c_{d1_{45}}\,\varphi(R_{d1_{45}}) + c_{d2_{135}}\,\varphi(R_{d2_{135}})\right)\right]
$$
$$
R_{-} = \varphi\!\left[c_c\,\varphi(R_c) - \left(c_{d1_{135}}\,\varphi(R_{d1_{135}}) + c_{d2_{45}}\,\varphi(R_{d2_{45}})\right)\right] \tag{6}
$$
where $c_c$, $c_{d1}$ and $c_{d2}$ are the gains for the center and displaced cells as before, and $R_c$, $R_{d1}$ and $R_{d2}$ are the responses of the center and displaced cells. The difference here is that the displaced cells are at orientations different from that of the preferred center simple cell: for the positive sign model endstopped neuron, the displaced model complex neuron d1 is at 45°, while the model complex component d2 is at 135°. For the negative sign model endstopped cell, the order is the opposite. For best results, these model cells required larger receptive field overlap than their degree-of-curvature endstopped model cell counterparts (see methods).
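A sketch of the opponent sign-selective pair of Equation 6 (Python; the argument names mark which displaced position and orientation each complex cell response comes from, and the gains are placeholders rather than the fitted values).

```python
def curvature_sign_responses(r_center, r_d1_45, r_d1_135, r_d2_45, r_d2_135,
                             c_c=1.0, c_d1=1.0, c_d2=1.0):
    """Sign-selective endstopped pair (Equation 6). d1 and d2 are the two
    displaced positions; the numeric suffix gives the orientation (degrees)
    of the complex cell placed there. Swapping which orientation occupies
    which position flips the preferred sign of curvature."""
    phi = lambda v: max(v, 0.0)
    r_plus = phi(c_c * phi(r_center) - (c_d1 * phi(r_d1_45) + c_d2 * phi(r_d2_135)))
    r_minus = phi(c_c * phi(r_center) - (c_d1 * phi(r_d1_135) + c_d2 * phi(r_d2_45)))
    return r_plus, r_minus
```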
Model local curvature cells. This type of cell is the result of
the combination of the responses from the two types of model
endstopped cells (degree and sign of curvature), e.g. a model
curvature cell that is selective for broad curvatures whose sign is
positive as opposed to a model cell also selective for broad
curvatures whose sign is negative. Through this neural conver-
gence of model endstopped cells discriminative to the degree of
curvature and the ones to the sign of curvature, we obtain twice
the number of curvature classes. For example, if we have four
types of model endstopped cells, through the use of the sign of
curvature of those cells we obtain eight curvature classes.
$$
R_{\theta_i, r_i, s_i} = R_{ESC_i}\,T\!\left(R_{+_i} > R_{-_i}\right), \qquad
R_{\theta_i, r_i, s_{i+n}} = R_{ESC_i}\,T\!\left(R_{-_i} > R_{+_i}\right) \tag{7}
$$
where $R_{\theta,r,s}$ denotes the response of a neuron tuned to angle $\theta$, curvature $r$ and sign $s$; $n$ is the number of model endstopped cell types, $R_{ESC_i}$ is the response of the model endstopped cell $i$, and $R_{+}$, $R_{-}$ are the responses of the model sign selective endstopped neurons. In the realization of our model i = {1, 2, 3, 4} and n = 4 (see Material and Methods). This equation is read as follows: if the value of $R_{+_i}$ is greater than $R_{-_i}$, $R_{\theta_i,r_i,s_i}$ has the same value as the model curvature endstopped cell; otherwise, $R_{\theta_i,r_i,s_{i+n}}$ contains that value and $R_{\theta_i,r_i,s_i}$ is 0. For the case where the response from endstopped cells is small, a high response from a model orientation simple cell means the contour is a straight line, so its curvature is set to 0. $R_{\theta,r,s}$ is computed at each location.
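The routing rule of Equation 7 can be written compactly as below (Python sketch; the straight-line case described in the text is omitted for brevity).

```python
def local_curvature_responses(r_esc, r_plus, r_minus):
    """Model local curvature cell (Equation 7): the degree-of-curvature
    response is routed into one of two opposite-sign curvature classes
    according to which sign-selective endstopped cell responds more."""
    if r_plus > r_minus:
        return r_esc, 0.0      # class i   (positive sign)
    return 0.0, r_esc          # class i+n (negative sign)
```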
Model shape cells. V4 cells are quite sensitive to shape and
less sensitive to spatial position [51]. Experiments in area V4 [52]
and TEO [53,54] of the macaque monkey seem to point to a
strategy of recognition of objects by parts. In the case of V4 and
TEO, those parts would be local curvatures [52,54–56]. The
response to a shape could correspond to the response of the local
curvatures of the object. In TEO, some components of local
curvatures excite the neuron, and others inhibit its response [54].
Neurons in areas V4 and TEO share similar characteristics
regarding shape analysis [54,56] and selectivity [57]. Although
similar, TEO neurons show a higher degree of complexity than V4
neurons [54]. Our model shape neurons mimic that curvature-by-parts representation of shapes and silhouettes, but are slightly more complex than just the curvature × angular position coding proposed by [56] for V4 neurons, since they are not only selective to curvatures at angular positions but also to the distance of the curvature element to the center of the shape. This conveys more information regarding the contour element: a shape would be different if the curvature is far away from the shape center or near the shape center, even though its angular position is the same. We thus make use of both components to better describe the position of the curvature element, rather than just one of them (angular position) as proposed in [56].
Our model shape cells integrate the responses from a population
of model local curvature neurons to encode a shape. The proposed
response of a model shape neuron at location $x$ is:
$$
R_{shape}(x) = \sum_{i=1}^{2n} c_i\,R_{\theta, r_i, s_i}(x), \qquad
R_{\theta, r_i, s_i} = \max_{j=1}^{m}\left(R_{\theta_j, r_i, s_i}\right), \qquad
c_i = \frac{1}{2\pi}\,e^{-(x - x_i)^2} \tag{8}
$$
where $R_{\theta,r_i,s_i}(x)$ denotes the response of a model local curvature cell tuned to angle $\theta$, curvature $r$ and sign $s$ at location $x$, and $c_i$ is a Gaussian weight centered at $x_i$ ($x$ and $x_i$ are in polar coordinates). max selects the maximum response from the local curvature cells over all angles, since the importance is on the responses to curvatures from curvature neurons, not their orientation, at this level of the architecture. A model shape neuron will respond to a shape, and depending on how close the stimulus is to its selectivity (controlled through $c_i$ - see Materials and Methods), its response will be stronger or weaker. The total response of a shape neuron is the summation over all $p$ locations:
$$
R_{shape} = \sum_{i=1}^{p} R_{shape}(x_i) \tag{9}
$$
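A minimal sketch of Equations 8 and 9 follows (Python). The dictionary-based curvature map, the bin-space distance and the unit-variance Gaussian weight are our simplifications; the actual weights $c_i$ are derived as described in Materials and Methods.

```python
import numpy as np

def shape_cell_response(curvature_map, preferred_parts):
    """Model shape cell (Equations 8 and 9).

    curvature_map:   dict {(radial_bin, angular_bin): responses}, where
                     responses[k] is the local-curvature response for class k,
                     already max-pooled over orientation (Equation 8, right).
    preferred_parts: list of (preferred_bin, curvature_class) pairs defining
                     the cell's curvature-by-parts selectivity.
    """
    total = 0.0
    for x, responses in curvature_map.items():            # Equation 9: sum over locations
        for x_i, k in preferred_parts:                     # Equation 8 at location x
            d2 = (x[0] - x_i[0]) ** 2 + (x[1] - x_i[1]) ** 2
            c_i = np.exp(-d2) / (2.0 * np.pi)              # Gaussian positional weight
            total += c_i * float(responses[k])
    return total
```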
Response of a model shape neuron in curvature space
The model shape neuron of Figure 4A has a response
depending on how close the stimulus is to its curvature-by-parts
selectivity (Figure 4A). In the figure, the model neuron is selective
to a sharp curvature at the top left. This neuron would respond
maximally when that feature is present at that specific location, but
it would respond also to a broader curvature at that location with a
lower value and would have a small response to a very broad
curvature or a straight line.
Model shape neurons exhibit band-pass tuning for curvature
information. Their responses achieve a peak at a specific
curvature, then decay providing a decreasing response for
curvature values of increasing distance. No response is provided
for curvatures very far from the optimal. The model shape
neuron in this example is then selective for those model
endstopped neurons that respond strongly to sharp curvatures at
that position. Since a model endstopped neuron with a high
response to a sharp curvature has also some response to a
slightly broader type of curvature, model shape neurons will not
provide a binary response but a range or responses depending
on the distance between curvatures in curvature space
(Figure 4B,C).
Figure 4. Shape-selective neuron. (A) Shape-selective neurons respond to different curvatures at different positions. The response is maximal
when those curvatures are present at their selective positions (red). If they are in nearby positions the neuron provides some response as well (orange
and yellow). (B) Shape-selective neuron tuning profile for location and curvature. (C) Shape neuron response to different stimuli, maximum response
is to the stimulus at the top (value 1).
doi:10.1371/journal.pone.0042058.g004
Response of a model shape neuron based on curvature
locations
Features (curvatures) comprising the model shape neuron are
weighted with respect to a factor $c_i$ (Equation 8) depending on how
close the desired curvature is to the desired position (Figure 4A).
Continuing with the example of a neuron selective for a sharp
curvature at the top left, this model neuron will have a high
response to any stimuli that contain such sharp curvature at that
position, but some response will still be elicited in a nearby
position, e.g. a sharp curvature at the top mid-left, but no response
will be obtained for a sharp curvature present at far away positions
(e.g. the sharp curvature is at the bottom) (Figure 4B).
The curvatures that fall into the preferred cell’s positions are
considered in their full value (red in Figure 4A), but if they fall
close, they are weighted in a Gaussian manner depending on how
far from the preferred position they are (orange and yellow in
Figure 4A).This is encoded using polar coordinates [52], that is,
the radial distance to the center of the model shape neuron and its
angular position.
Representational adequacy. In the words of Pasupathy and
Connor [52]: The population code for shape has to accommodate the virtual
infinity of possible objects as well as the variability of a given object’s retinal
image. Our model shape neuron has the capability of representing
that virtual infinity of objects: If we consider that our stimuli are
within 400×400 pixel images, for the bin size selection used in the
experiments below (see Material and Methods) this gives a total of
1,800 possible curvature parts inside a model shape neuron
receptive field. In the case of only 8 curvature classes, when we
consider any possible combination of curvature/location, our
model can represent a maximum of 14,400 (approximately 10 to
the power of 86400) possible configurations of stimuli. In practice,
one might take into account Gestalt properties such as continuity,
proximity and others, and that number can be reduced to reflect
only realizable configurations. The point here is that this
representation is sufficiently rich to enable coding of a wide
variety of shapes and task knowledge or learning through
developmental experience will help determine the relevant subset
for a given task domain.
Comparison with biological neurons from area V4
Here we compare the performance of the model shape neurons
with neurons in area V4 of the macaque’s visual cortex from the
same study on which our shape cells are based. For most cells in
area V4 of the macaque, shapes evoking strongest responses are
characterized by a consistent type of boundary configuration at a
specific position within the stimulus [56]. We show that this
behavior is compatible with the model shape-selective neurons
constructed as explained previously.
Pasupathy and Connor [56] recorded the responses of 109
neurons to 366 different shapes. Each cell in the sample responded
to a variety of very different shapes. No cell displayed a response
pattern that could be characterized in terms of a single type of
global shape. However, for most cells the effective stimuli showed
some degree of shape consistency at one position. In other words,
these cells were tuned for boundary configuration in one part of
the shape.
In order to demonstrate the plausibility of our shape neurons
and the hypothesis that curvature and shape may be encoded
through endstopping, we study the behavior of the model shape
neurons by comparing their responses against real neuron
responses. We compared the responses from 75 of the 109 neurons recorded by Pasupathy and Connor's group - those cells where the shape consistency was clearer (see Material and Methods). Data from real neurons to achieve this set of experiments was kindly provided by Dr. Anitha Pasupathy.

Figure 5. Comparison to Figure 2 of [56]. Cell responses are on the left (© 2001 The American Physiological Society, reproduced with permission) and their respective model responses are on the right. doi:10.1371/journal.pone.0042058.g005
We first compared the responses from our shape-selective
neurons with the four examples from [56]. We start with Figure 2
from [56] (our Figure 5). Real V4 neuron responses are on the left
(stimuli within circles), our model shape neuron equivalent
responses are on the right (stimuli within squares). Each row in both cases contains stimuli consisting of 2 shapes (one after the other) rotated in steps of 45°. This is the stimulus set used by [56].
Each stimulus is represented by a white icon drawn within a circle
(Pasupathy and Connor’s results) or within a square (model shape
neuron responses) representing the unit receptive field. The darker
the background behind the icon, the higher the response of the neuron to that shape; this applies to both Pasupathy and Connor's neuron recordings and our model shape neurons.
For the cell in Figure 5, stimuli with a sharp convex angle at the
bottom left were particularly effective (e.g. stimuli 1 and 2 in the
middle column, bottom block; these stimuli are labeled with
superscript numbers). Stimuli with a medium convex curve evoked
moderate responses (e.g., stimuli 3 and 4). Thus this cell appears to
encode information about the bottom left boundary region,
responding well to sharp convexity at this location and poorly to
broad convexity or concavity. Based on the response of this cell to
the stimuli, this neuron was selective to a sharp convexity at the
bottom left and a concavity adjacent to it (at the bottom). A first
examination shows that the responses of the model’s shape
neurons are very similar to those of real cells. Our shape-selective neurons respond strongly to a sharp convexity at the bottom left and a concavity at the bottom as well. If the curvature adjacent to the sharp convexity at the bottom left is convex, real cell responses are much weaker; our shape-selective neurons also show weaker responses, but not as weak as those of real cells. The additional weakness of the real cell responses might be due to local inhibitory mechanisms (local competition) which are not presently included in the model.
Another example provided by Pasupathy and Connor is in Figure 4 of their article (replicated in our Figure 6, right). This cell
was sensitive to boundary configuration on the right side of the
object, responding best to concave curvature at that position. This
is exemplified by stimuli 1 and 2; stimulus 1, with a concavity at
the right, evoked a stronger response. Stimulus 2 is almost
identical, but with a convexity at the right, and it evoked no
response. The cell also appears to be tuned for sharper convexities
at the counter-clockwise-adjacent position and medium convexi-
ties at the clockwise-adjacent position. Pasupathy and Connor note
that this is shown by stimulus 3 providing a strong response, while
for stimulus 4, its response is weak (opposite combination: sharp
curvature clockwise and medium curvature counter-clockwise).
The results for the model in this case are almost equal for these
stimuli as well as the other cases mentioned in [56]: compare
shapes 5 and 6, and 7 and 8. As previously, there are some small
differences, the model providing stronger responses than the real
cells for a few stimuli.
Figure 7 shows the comparison between one of our model shape
neurons with the neuron corresponding to Figure 5 from [56].
This neuron was sensitive to a sharp convexity at the top right
flanked by a concavity on one side or the other. A first
examination shows that the responses of the model’s shape
neurons are very similar to those of real cells. The same holds for Figure 8 of that same article: that cell was selective for broad convex curvature at the top. Their results are replicated here in Figure 8.
We compared the responses of 75 of our model shape neurons
with 75 V4 cells. The comparison consisted of computing the
absolute difference between the normalized responses of each
model shape neuron and that of a real V4 neuron averaged over
the 366 stimuli:
$$
dif_i = \frac{\sum_{j=1}^{366}\left|R_{shape_{i,j}} - R_{realcell_{i,j}}\right|}{366}; \qquad i = 1 \ldots 75 \tag{10}
$$
$dif_i$ is the absolute difference between each model shape neuron's response and the response from the real neuron. $R_{shape_{i,j}}$ corresponds to the response of the $i$-th model shape neuron to the $j$-th stimulus and $R_{realcell_{i,j}}$ is the response of its real neuron counterpart to the same stimulus. For each cell, mean and standard deviation were computed, and results are provided next as error percentages, that is, the mean difference between our model shape neurons and real cells.
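The comparison of Equation 10 amounts to the following (Python sketch; both response vectors are assumed to be normalized to [0, 1] over the 366 stimuli).

```python
import numpy as np

def mean_absolute_difference(model_responses, neuron_responses):
    """Equation 10: mean absolute difference between the normalized responses
    of one model shape neuron and its real V4 counterpart over all stimuli."""
    model = np.asarray(model_responses, dtype=float)
    real = np.asarray(neuron_responses, dtype=float)
    assert model.shape == real.shape       # one response per stimulus (366 here)
    return float(np.mean(np.abs(model - real)))
```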
The results for all the 75 cells considered in this study are shown
in Figure 9 for two conditions: model neuron responses using the
curvature parts with respect to the center of the neuron (blue bars)
and model shape neuron responses with respect to the centroid of the
shape (green bars). Note that the stimuli from [56] are not always
at the receptive field center. We did not find a significant
difference between using curvature parts with respect to the center
of the model neurons or the centroid of the object.
For both cases we can see that there are only a few model shape neurons with over 20% error; most of the differences between the model and real cells fall in the range 10–20%. Average
error for all model shape neurons was 16.95% for the center of the
model neuron (stdev = 12.61) and almost the same when using the
centroid of the shape (error = 16.98%, stdev = 12.25). This shows
that even for such a large number of neurons the model performs
well and the difference between the response of the model shape-
selective neurons and that of real cells is small. In direct
comparison with the only other work to compare performance
to this dataset of neural responses, our method significantly
outperforms [25].
Discussion
We have presented a model of 2D shape representation - 2DSIL
- that follows the structure and behavior of the visual cortex.
Building on past conjectures that one of the functional roles of
endstopped cells may be to aid in shape analysis [28,47,55], we set
out to define a biologically plausible computational model of shape
representation. Here, we tested this hypothesis and have shown how a hierarchy starting from basic simple neurons, which combine into complex neurons and further into endstopped neurons, provides local curvature neurons and, from these, neurons that are selective for shape stimuli.
The main element in this architecture is the model shape-selective neuron, which represents curvature parts in a curvature × position (radial and angular) domain. The possible
number of shapes that may be represented by our model shape
neurons is very large, considering the limited types of neurons at
each level of the architecture. Even though the primate visual
system and our model have the capability to represent a virtual
infinity of shapes, the way to handle the large but finite number of
shapes in our world may be achieved through learning, selecting
those configurations of curvatures relevant to recognize the shapes
around us based on our visual experiences. Since the represen-
tation has the capability to represent any shape, a new shape can
be easily incorporated into the system. The model supports a
recognition by parts strategy, in which the parts are curvature
values at different positions, as suggested also by Connor’s group
[54]. We have compared the response of our model shape neurons
with 75 real neurons from [56]. The results obtained by the model
are very similar to those of the real neurons, and this is accomplished without any learning or classifier method.

Figure 6. Comparison to Figure 4 of [56]. Cell responses are on the left (© 2001 The American Physiological Society, reproduced with permission) and their respective model responses are on the right. doi:10.1371/journal.pone.0042058.g006

Figure 7. Comparison to Figure 5 of [56]. Cell responses are on the left (© 2001 The American Physiological Society, reproduced with permission) and their respective model responses are on the right. doi:10.1371/journal.pone.0042058.g007
Our model local curvature neurons do not provide an exact
value of curvature but can discriminate among degrees of
curvature (e.g. 4 in Figure 2C). This was done using a starting
point where V1 is composed of neurons of different sizes. Through
the use of different neuronal sizes and the integration of model
simple neurons into model complex neurons we obtained model
endstopped neurons able to discriminate between degrees of
curvature, from very sharp to very broad (Figure 2C). It is
important to note as well that these neurons do not provide a
binary response for a given curve; model local curvature neurons
provide a band-pass curvature filtering, with the highest response
to the selective curvature and a decaying response that is inversely
proportional to the curvature distances in curvature space. The response of model endstopped and curvature neurons over a range of curvatures has a Gaussian shape (Figure 2C), as does that of a model shape neuron (Figure 4B). There is no maximum selection over the responses from early areas, so no information is lost when ascending the hierarchy in a feedforward direction.
However, there is a max selection computation at the last stage
of the hierarchy, the shape cells, where it no longer affects further
decisions, in keeping with Marr’s Principle of Least Commitment
[27]. We consider that any attentive selection, filtering or bias [58–
61] in such a hierarchy would occur top-down and leave that for
future work. Interestingly, our model of sign endstopped neurons
could provide a foundation to deal with the border-ownership
problem. Sign endstopped neurons could represent opponent
channels [62], and this combined with feedback modulation
through a model of attention (e.g. [58]) would further support a
model such as the one presented by [33] on border ownership.
Figure 8. Comparison to Figure 8 of [56]. Cell responses are on the left (© 2001 The American Physiological Society, reproduced with permission) and their respective model responses are on the right. doi:10.1371/journal.pone.0042058.g008

Figure 9. Difference between the model's shape-selective neurons and 75 real cell responses from area V4. doi:10.1371/journal.pone.0042058.g009
Our model may be considered as a major extension of the works
[9] and [28,30]. In a similar work, Serre, Cadieu and colleagues
construct a hierarchical representation with a first layer computing
oriented edge responses. This is followed by a maximum response
selection layer that feeds a pooling stage that groups spatial piece-
wise linear elements. This strategy - borrowed from Fukushima’s
NeoCognitron [9] - is repeated for each layer of the hierarchy.
Curved lines are thus approximated by linear pieces and there is
no direct computation of curvature of any form. Another related
model, based on excitatory connections is the one proposed by
Amit [20]. One important difference (among others) between our
model and these types of models is that we use inhibition for
curvature representation through endstopping instead of purely
excitatory components. Inhibitory flankers as proposed in our
model have been strongly supported by neurophysiological studies
[39,44–46,48,49] and since our goal is to test the computational
embodiment of these neurophysiological results, this necessarily
figures prominently in our model. It is an aspect that is considered
of great importance by neuroscientists [46], and surprisingly has
been neglected in models to date.
Given that it seems accepted that the visual system computes
increasingly abstract quantities as a signal ascends the visual
processing hierarchy, are those quantities computed by applying
the same computation and thus neural convergence alone suffices
to achieve abstraction, or, is it truly necessary to include more
sophisticated computations layer by layer? This is not easy to
answer in the general case. However, we can point to one
important instance that supports the latter position. In our
previous work, where we looked at motion processing [63], we found
that simple neural convergence did not suffice. We needed to
include a layer of neurons selective to the spatial derivative of
velocity, a much more complex construct. This is supported by
neurophysiology in monkey [64,65] and by our own fMRI human
studies [66]. Similarly, for shape representation, although our
approach is also based on a hierarchical set of computations, we
deploy different processes at each layer, not simply repetitions of
the same process. Those different processes are intended to reflect
the reality of the different neural computations in the visual cortex.
Our approach is distinct in that we perform a direct computation
of curvature and the sign of curvature. We develop that
computation using well documented neural computation types
that include not only oriented simple cells and complex cells (as the
pooling layer of others is intended to capture) but also endstopped
cells, curvature cells, and curvature sign cells. These naturally
provide a sufficient basis for the definition of shape cells, a basis
that not only mirrors neurophysiological reality of the visual cortex
better, but also provides a richer substrate for shape definition
than piecewise linear components. This is the first model of shape representation (to the best of our knowledge) to include the aforementioned cells in intermediate layers, departing from the nearly universal previous use of Fukushima's S and C types of cells.
The role of learning from examples also differs between our
work and those mentioned. Although a statistical learning
approach such as that employed by Serre, Cadieu and colleagues
for all of the layers of their processing hierarchy except for the first,
is valuable when there is no other option, we show that in the case
of the successive representations, namely those computed by
endstopped and curvature cells, there is now sufficient knowledge
to directly model these cells and to do so with a significantly high
degree of fidelity. Learning is not required if the appropriate
representations are selected in the first place.
Although this paper does not address object recognition directly,
it may provide important contributions to elements that may
advance the state-of-the-art. In a previous paper [34], we
connected the 2DSIL representation to a recognition system and
compared its performance in object recognition tasks with several
other systems including benchmark systems. Our system performed well, beating other systems in several categories while maintaining comparable performance in others. Following previ-
ous authors such as Zucker and Marr, we advocate that deeper
understanding of visual processes in humans and non-human
primates can lead to important advancements in perceptual
theories and computational systems.
With the model introduced in this paper we follow in the steps of early theories of vision [9,26,67] and propose, following the philosophy of those influential works, how to take modeling to the next stage by incorporating new intermediate layer computations. We hope future work will continue building on such hierarchies aimed at modeling the visual cortex.
Materials and Methods
We used the same stimuli created for [52,56]. In order to
construct the stimuli, a Matlab program was provided by Dr.
Anitha Pasupathy. The stimuli were constructed combining
convex and concave boundary elements to form closed shapes.
Boundary elements include sharp convex angles, and medium and
high convex and concave curvatures. The combination of these
boundary elements gave rise to 49 different stimuli. Stimuli were
composed of white edges against a black background; the inside was black as well, but it is shown in our figures (Figures 5, 6, 7, and 8) as white-filled for illustration purposes. For the experiment, stimuli were those 49 shapes rotated to 8 orientations (some only 2 or 4 due to redundancies) in 45° increments, giving a total of 366 different shapes. Stimuli are shown in Figures 5, 6, 7, and 8.
Experiments were run in Matlab on a Mac G5 PowerPC. The input to the model is a gray-value image. Images used are 400×400 pixels; a shape would span 300×300 pixels and correspond to the stimuli used in the aforementioned study. For our experiments, we used 12 orientations (0°, 15°, 30°, 45°, 60°, 75°, 90°, 105°, 120°, 135°, 150°, 165°) and 4 different sizes for model simple cells, which gives a total of 48 types. Sizes of V1 model simple neurons are 40, 60, 88 and 120 pixels; their corresponding values for AR are 0.7, 1.4, 2.15 and 3, respectively, and WR is 2.5 for all model neurons.
For the integration into model endstopped neurons, the values of gain c (Equation 4) for displaced neurons were, from the smaller to the larger cell: $c_{d1} = c_{d2}$ = {1.5, 1.25, 1, 3}, with $c_c$ = 1 for all centre cells. For the chosen parameters, cells respond (at 90% of their maximum value) to the following ranges of curvature radius: 6 to 11, 25 to 52, 48 to 77 and 140 to 301 pixels. Refer also to Figure 2C for an example of how the selection of these parameters (size, AR, WR and gain) affects neuronal curvature selectivity. The parameters for the sigmoidal saturation function (Equation 5) were C = 0.01, and $\rho$ is the maximum response of the set of neurons for a given scale divided by 8.5, a factor that provided a good normalization approximation for this function. The displacement values for model endstopped neurons selective to degrees of curvature were 1/2 the size of the simple neuron component along its preferred orientation. Displacements for the model sign endstopped neurons were, from smaller to larger: 1/5, 1/4, 1/4 and 2/5 of the size, along the orientations stated in Equation 6. The 4 types of model endstopped neurons and the curvature sign selective neurons lead to eight curvature classes. In order to obtain the aforementioned parameter values, a program designed to evaluate different parameter values was created. The target of this program was to obtain values that would provide neurons able to separate different degrees of curvature, providing a graph such as the one shown in Figure 2C.
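For reference, the parameter values listed above can be collected as follows (a Python convenience sketch; the grouping and names are ours, not from the original Matlab code).

```python
import numpy as np

# 12 preferred orientations, in radians
ORIENTATIONS = np.deg2rad(np.arange(0, 180, 15))

# One entry per model simple cell scale: size (pixels), aspect ratio AR,
# width ratio WR, and the gain of the displaced cells in Equation 4.
SIMPLE_CELL_SCALES = [
    dict(size=40,  AR=0.7,  WR=2.5, c_d=1.5),
    dict(size=60,  AR=1.4,  WR=2.5, c_d=1.25),
    dict(size=88,  AR=2.15, WR=2.5, c_d=1.0),
    dict(size=120, AR=3.0,  WR=2.5, c_d=3.0),
]
CENTER_GAIN = 1.0          # c_c for all centre cells
SIGMOID_C = 0.01           # C in Equation 5
SIGMOID_RHO_DIVISOR = 8.5  # rho = (max response at a scale) / 8.5
```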
Neuron responses were provided by Dr. Anitha Pasupathy for
the comparison with model shape neuron responses. In their
influential study [56], the results from 109 neurons are reported
for 366 different stimuli. We compared with 75 out of those 109
neurons, the reason for this as well as the detailed process are
explained next. Due to the enormous range of shape representation of the model, we needed to select (or isolate, in neurophysiological terms) a subset of model shape neurons that would correspond to the 109 V4 biological counterparts recorded in [56]. In order to do this, we created new stimulus
images and stored their model shape representation. The way
these stimuli were created was by superimposing the stimuli for
which the biological neuronal responses were on the 70%
maximum percentile (e.g. Figure 10A). This simple process gives us insight into the selectivity of the 109 biological neurons and is similar to the way [56] analyzes the selectivity of 4 neurons (Figures 2, 4, 5 and 8 in that work). That is, we consider the
stimuli that maximize the neuron responses to reach the
conclusion that a neuron is selective to some type of curvature
at a specified position, e.g. in Figure 10A it is clear that this
biological neuron is selective for a sharp curvature at the top-right,
flanked by a broad concavity that ends in a medium convexity on
the left side of the stimulus. Then, this image would be modified so as to keep only the relevant curvatures. This is the stimulus
used to isolate our model shape neurons. This would also be the
stimulus for which the model shape neuron response is maximum.
We repeated this process for the 109 biological neurons, but 34 of them failed to provide any clear insight into their selectivity using the present process (e.g. Figure 10D). On the other hand, the other 75 provided a very clear picture of their selectivities (Figure 10A–C). We then stored the representation (Figure 4A) of each shape model neuron for the stimuli created as explained above.
The weights $c_i$ (Equation 8) are derived from the responses of the eight curvature-class model neurons at their different positions. Model shape neurons' receptive fields were organized into angular-radial bins (Figure 4A) of 10 pixels for radial values and $\pi/45$ for angular values. A smaller bin size did not provide significantly better results while having a much higher computational load.
For each one of the model shape neurons isolated this way, we
recorded responses for each of the 366 stimuli in [56]. Each
response is normalized to the 0–1 range using, as the normalization factor, the maximum response for the stimulus created as explained before. These normalized responses were compared
to their biological counterparts (responses already normalized) and
the absolute value of the difference was computed for each one of
the 366 stimuli. Figure 9 shows the results of these averaged values
with their corresponding standard deviations for each neuron.
Acknowledgments
The authors would like to thank Prof. Anitha Pasupathy for providing all
the stimuli and data we needed to compare our model neuron responses
with that of real neurons. We appreciate as well the helpful comments of
Prof. Allan C. Dobbins.
Author Contributions
Conceived and designed the experiments: ARS JKT. Performed the
experiments: ARS. Analyzed the data: ARS. Contributed reagents/
materials/analysis tools: ARS JKT. Wrote the paper: ARS JKT.
References
1. Ramón y Cajal S (1888) Sobre las fibras nerviosas de la capa molecular del cerebelo. Revista Trimestral de Histología Normal y Patológica 1: 33–49.
2. Ramón y Cajal S (1894) The Croonian lecture: La fine structure des centres nerveux. Royal Society of London Proceedings Series I 55: 444–468.
3. Ramón y Cajal S (1904) Variaciones morfológicas, normales y patológicas del retículo neurofibrilar. Trabajos del Laboratorio de Investigaciones Biológicas Madrid 3: 9–15.
4. Jones E (2007) Neuroanatomy: Cajal and after Cajal. Brain Research Reviews 55: 248–255.
5. Grossberg S (1968) Some nonlinear networks capable of learning a spatial
pattern of arbitrary complexity. Proceedings of the National Academy of
Sciences of the United States of America 59: 368–72.
6. Grossberg S (1970) Neural pattern discrimination. Journal of Theoretical Biology 27: 291–337.
7. Grossberg S (1971) Pavlovian pattern learning by nonlinear neural networks.
Proceedings of the National Academy of Sciences of the United States of
America 68: 828–31.
8. Grossberg S (1975) A neural model of attention, reinforcement and
discrimination learning. International Review of Neurobiology 18: 263–327.
9. Fukushima K (1980) Neocognitron: a self organizing neural network model for a
mechanism of pattern recognition unaffected by shift in position. Biological
Cybernetics 36: 193–202.
Figure 10. How the features for isolating a Shape neuron are obtained. See text. doi:10.1371/journal.pone.0042058.g010
10. Fukushima K, Miyake S, Ito T (1983) Neocognitron: A neural network model for a mechanism of visual pattern recognition. IEEE Transactions on Systems, Man and Cybernetics 13.
11. Rumelhart D, McClelland J (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press.
12. LeCun Y, Boser B, Denker J, Henderson D, Howard R, et al. (1989) Backpropagation applied to handwritten zip code recognition. Neural Computation 1: 541–551.
13. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86: 2278–2324.
14. Wallis G, Rolls E (1997) Invariant face and object recognition in the visual
system. Progress in Neurobiology 51: 167–194.
15. Riesenhuber M, Poggio T (1999) Hierarchical models of object recognition in
cortex. Nature Neuroscience 2: 1019–1025.
16. Riesenhuber M, Poggio T (2000) Models of object recognition. Nature Neuroscience 3 Suppl: 1199–1204.
17. Riesenhuber M, Poggio T (2002) Neural mechanisms of object recognition.
Current Opinion in Neurobiology 12: 162–168.
18. Serre T, Wolf L, Poggio T (2005) Object recognition with features inspired by visual cortex. IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 994–1000.
19. Serre T, Wolf L, Bileschi S, Riesenhuber M (2007) Robust object recognition
with cortex-like mechanisms. IEEE Transactions on Pattern Analysis and
Machine Intelligence 29: 411–426.
20. Amit Y (2000) A neural network architecture for visual selection. Neural
Computation 12: 1141–1164.
21. Amit Y, Mascaro M (2003) An integrated network for invariant visual detection and recognition. Vision Research 43: 2073–2088.
22. Suzuki N, Hashimoto N, Kashimori Y, Zheng M, Kambara T (2004) A neural model of predictive recognition in form pathway of visual cortex. BioSystems 76: 33–42.
23. Rao R, Ballard D (1997) Dynamic model of visual recognition predicts neural
response properties in the visual cortex. Neural Computation 9: 721–763.
24. Rao R, Ballard D (1999) Predictive coding in the visual cortex: a functional
interpretation of some extra-classical receptive-field effects. Nature Neuroscience
2: 79–87.
25. Cadieu C, Kouh M, Connor C, Riesenhuber M, Poggio T (2007) A model of V4 shape selectivity and invariance. Journal of Neurophysiology 98: 1733–1750.
26. Marr D, Hildreth E (1980) Theory of edge detection. Proceedings of the Royal
Society of London, series B, Biological Sciences 207: 187–217.
27. Marr D (1982) Vision: A Computational Investigation into the Human
Representation and Processing of Visual Information. W.H. Freeman.
28. Dobbins A, Zucker S, Cynader M (1987) Endstopped neurons in the visual
cortex as a substrate for calculating curvature. Nature 329: 438–441.
29. Heitger F, Rosenthaler L, von der Heydt R, Peterhans E, Kubler O (1992)
Simulation of neural contour mechanisms: from simple to end-stopped cells.
Vision Research 32: 963–81.
30. Dobbins A (1992) Difference models of Visual Cortical neurons. Ph.D. thesis,
Department of Electrical Engineering. McGill University.
31. Heitger F, von der Heydt R (1993) A computational model of neural contour processing: figure-ground segregation and illusory contours. Proceedings of the IEEE International Conference on Computer Vision: 32–40.
32. Heitger F, von der Heydt R, Peterhans E, Rosenthaler L, Kübler O (1998) Simulation of neural contour mechanisms: representing anomalous contours. Image and Vision Computing 16: 409–423.
33. Craft E, Schuetze H, Niebur E, von der Heydt R (2007) A neural model of
figure-ground organization. Journal of Neurophysiology 97: 4310–4326.
34. Rodríguez-Sánchez A, Tsotsos J (2011) The importance of intermediate representations for the modeling of 2D shape detection: Endstopping and curvature tuned computations. IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 4321–4326.
35. Marcelja S (1980) Mathematical description of the responses of simple cortical cells. Journal of the Optical Society of America 70: 1297–1300.
36. Hawken M, Parker A (1987) Spatial properties of neurons in the monkey striate cortex. Proceedings of the Royal Society of London, series B, Biological Sciences 231: 251–288.
37. Hubel D, Wiesel T (1959) Receptive fields of single neurones in the cat’s striate
cortex. The Journal of Physiology 148: 574–591.
38. Hubel D, Wiesel T (1962) Receptive fields, binocular interaction and functional
architecture in the cat’s visual cortex. Journal of Physiology 160: 106–154.
39. Hubel D, Wiesel T (1968) Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology 195: 215–243.
40. Spitzer H, Hochstein S (1985) A complex-cell receptive-field model. Journal of
Neurophysiology 53: 1266–1286.
41. Geman S (2006) Invariance and selectivity in the ventral visual pathway. Journal
of Physiology Paris 100: 212–224.
42. von der Heydt R, Peterhans E, Baumgartner G (1984) Illusory contours and
cortical neuron responses. Science 224: 1260–1262.
43. Ito M, Komatsu H (2004) Representation of angles embedded within contour stimuli in area V2 of macaque monkeys. Journal of Neuroscience 24: 3313–3324.
44. Kato H, Bishop P, Orban G (1978) Hypercomplex and simple/complex cell classifications in cat striate cortex. Journal of Neurophysiology: 1071–1095.
45. Orban G, Kato H, Bishop P (1979) Dimensions and properties of end-zone
inhibitory areas of hypercomplex cells in cat striate cortex. Journal of
Neurophysiology 42: 833–849.
46. Willmore BD, Prenger RJ, Gallant JL (2010) Neural representation of natural images in visual area V2. The Journal of Neuroscience 30: 2102–2114.
47. Dobbins A, Zucker S, Cynader M (1989) Endstopping and curvature. Vision
Research 29: 1371–1387.
48. Orban G, Kato H, Bishop P (1979) End-zone region in receptive fields of
hypercomplex and other striate neurons in the cat. Journal of Neurophysiology
42: 818–832.
49. Bishop P, Kato H, Orban G (1980) Direction-selective cells in complex family in
cat striate cortex. Journal of Neurophysiology 43: 1266–1283.
50. Hegde J, Van Essen DC (2000) Selectivity for complex shapes in primate visual area V2. The Journal of Neuroscience 20: 61–66.
51. Gallant J, Braun J, Van Essen D (1993) Selectivity for polar, hyperbolic, and Cartesian gratings in macaque visual cortex. Science 259: 100–103.
52. Pasupathy A, Connor C (2002) Population coding of shape in area V4. Nature Neuroscience 5: 1332–1338.
53. Tanaka K (1996) Inferotemporal cortex and object vision. Annual Review of Neuroscience 19: 109–139.
54. Brincat S, Connor C (2004) Underlying principles of visual shape selectivity in
posterior inferotemporal cortex. Nature Neuroscience 7: 880–886.
55. Pasupathy A, Connor C (1999) Responses to contour features in macaque area V4. Journal of Neurophysiology 82: 2490–2502.
56. Pasupathy A, Connor C (2001) Shape representation in area V4: Position-specific tuning for boundary conformation. Journal of Neurophysiology 86: 2505–2519.
57. Boussaoud D, Desimone R, Ungerleider L (1991) Visual topography of area TEO in the macaque. The Journal of Comparative Neurology 306: 554–575.
58. Tsotsos J, Culhane S, Wai W, Lai Y, Davis N, et al. (1995) Modeling visual attention via selective tuning. Artificial Intelligence 78: 507–545.
59. Desimone R, Duncan J (1995) Neural mechanisms of selective visual attention. Annual Review of Neuroscience 18: 193–222.
60. Rodríguez-Sánchez A, Simine E, Tsotsos J (2007) Attention and visual search. International Journal of Neural Systems 17: 275–288.
61. Tsotsos JK (2011) A Computational Perspective on Visual Attention. The MIT
Press.
62. Zhou H, Friedman HS, von der Heydt R (2000) Coding of border ownership in
monkey visual cortex. The Journal of Neuroscience 20: 6594–6611.
63. Tsotsos J, Liu Y, Martínez-Trujillo J, Pomplun M, Simine E, et al. (2005) Attending to visual motion. Computer Vision and Image Understanding 100: 3–43.
64. Treue S, Andersen R (1996) Neural responses to velocity gradients in macaque cortical area MT. Visual Neuroscience 13: 797–804.
65. Meese T, Anderson S (2002) Spiral mechanisms are required to account for
summation of complex motion components. Vision Research 42: 1073–1080.
66. Martínez-Trujillo J, Tsotsos J, Simine E, Pomplun M, Wildes R, et al. (2005) Selectivity for speed gradients in human area MT/V5. Neuroreport 16: 435–438.
67. Zucker SW (1981) Computer vision and human perception: An essay on the discovery of constraints. Proceedings of the International Conference on Artificial Intelligence: 1102–1116.