
Invariant Visual Representation by Single Neurons in the Human Brain

  • Children's Hospital, Harvard Medical School

R. Quian Quiroga, L. Reddy, G. Kreiman, C. Koch & I. Fried
It takes a fraction of a second to recognize a person or an object
even when seen under strikingly different conditions. How such a
robust, high-level representation is achieved by neurons in the
human brain is still unclear. In monkeys, neurons in the upper
stages of the ventral visual pathway respond to complex images
such as faces and objects and show some degree of invariance to
metric properties such as the stimulus size, position and viewing
angle. We have previously shown that neurons in the human
medial temporal lobe (MTL) fire selectively to images of faces,
animals, objects or scenes. Here we report on a remarkable
subset of MTL neurons that are selectively activated by strikingly
different pictures of given individuals, landmarks or objects and
in some cases even by letter strings with their names. These results
suggest an invariant, sparse and explicit code, which might be
important in the transformation of complex visual percepts into
long-term and more abstract memories.
The subjects were eight patients with pharmacologically intractable
epilepsy who had been implanted with depth electrodes to
localize the focus of seizure onset. For each patient, the placement of
the depth electrodes, in combination with micro-wires, was determined
exclusively by clinical criteria. We analysed responses of
neurons from the hippocampus, amygdala, entorhinal cortex and
parahippocampal gyrus to images shown on a laptop computer in 21
recording sessions. Stimuli were different pictures of individuals,
animals, objects and landmark buildings presented for 1 s in pseudo-
random order, six times each. An unpublished observation in our
previous recordings was the sometimes surprising degree of
invariance inherent in the neuron’s (that is, the unit’s) firing behaviour.
For example, in one case, a unit responded only to three completely
different images of the ex-president Bill Clinton. Another unit (from
a different patient) responded only to images of The Beatles, another
one to cartoons from The Simpsons television series and another one
to pictures of the basketball player Michael Jordan. This suggested
that neurons might encode an abstract representation of an individual.
We here ask whether MTL neurons can represent high-level
information in an abstract manner characterized by invariance to the
metric characteristics of the images. By invariance we mean that a
given unit is activated mainly, albeit not necessarily uniquely, by
different pictures of a given individual, landmark or object.
To investigate further this abstract representation, we introduced
several modifications to optimize our recording and data processing
conditions (see Supplementary Information) and we designed a
paradigm to systematically search for and characterize such invariant
neurons. In a first recording session, usually done early in the
morning (screening session), a large number of images of famous
persons, landmark buildings, animals and objects were shown. This
set was complemented by images chosen after an interview with the
patient. The mean number of images in the screening session was
93.9 (range 71–114). The data were quickly analysed offline to
determine the stimuli that elicited responses in at least one unit
(see definition of response below). Subsequently, in later sessions
(testing sessions) between three and eight variants of all the stimuli
that had previously elicited a response were shown. If not enough
stimuli elicited significant responses in the screening session, we
chose those stimuli with the strongest responses. On average, 88.6
(range 70–110) different images showing distinct views of 14 individuals
or objects (range 7–23) were used in the testing sessions.
Single views of random stimuli (for example, famous and non-
famous faces, houses, animals, etc) were also included. The total
number of stimuli was determined by the time available with the
patient (about 30 min on average). Because in our clinical set-up the
recording conditions can sometimes change within a few hours, we
always tried to perform the testing sessions shortly after the screening
sessions in order to maximize the probability of recording from the
same units. Unless explicitly stated otherwise, all the data reported in
this study are from the testing sessions. To hold their attention,
patients had to perform a simple task during all sessions (indicating
with a key press whether a human face was present in the image).
Performance was close to 100%.
We recorded from a total of 993 units (343 single units and 650
multi-units), with an average of 47.3 units per session (16.3 single
units and 31.0 multi-units). Of these, 132 (14%; 64 single units and
68 multi-units) showed a statistically significant response to at least
one picture. A response was considered significant if it was larger
than the mean plus 5 standard deviations (s.d.) of the baseline and
had at least two spikes in the post-stimulus time interval considered
(300–1,000 ms). All these responses were highly selective: for the
responsive units, an average of only 2.8% of the presented pictures
(range: 0.9–22.8%) showed significant activations according to this
criterion. This high selectivity was also present in the screening
sessions, where only 3.1% of the pictures shown elicited responses
(range: 0.9–18.0%). There was no significant difference between the
relative number of responsive pictures obtained in the screening and
testing sessions (t-test, P = 0.40). Responses started around 300 ms
after stimulus onset and had mainly three non-exclusive patterns of
activation (with about one-third of the cells having each type of
response): the response disappeared with stimulus offset, 1 s after
stimulus onset; it consisted of a rapid sequence of about 6 spikes
(s.d. = 5) between 300 and 600 ms after stimulus onset; or it was
prolonged and continued up to 1 s after stimulus offset. For this
study, we calculated the responses in a time window between 300 and
1,000 ms after stimulus onset. In a few cases we also observed cells
that responded selectively only after the image was removed from
view (that is, after 1 s). These are not further analysed here.
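The response measure used throughout (spike counts in the 300–1,000 ms post-stimulus window, summarized per picture by the median across trials) can be illustrated with a minimal Python sketch. The function names, the half-open treatment of the window edges and the toy spike times are assumptions made for this example; they are not taken from the recordings.

```python
from statistics import median

# Post-stimulus analysis window from the text, in ms relative to stimulus onset.
RESPONSE_WINDOW_MS = (300.0, 1000.0)


def count_spikes(spike_times_ms, window=RESPONSE_WINDOW_MS):
    """Count spikes falling in [start, stop); edge handling is an assumption."""
    start, stop = window
    return sum(1 for t in spike_times_ms if start <= t < stop)


def picture_response(trials, window=RESPONSE_WINDOW_MS):
    """Median spike count across the trials (presentations) of one picture."""
    return median(count_spikes(trial, window) for trial in trials)


# Hypothetical spike times (ms) for six presentations of a single picture.
trials = [
    [320.0, 410.5, 455.0, 580.2],
    [305.1, 390.0, 620.0],
    [1200.0],
    [350.0, 360.0, 370.0, 900.0, 950.0],
    [-50.0, 310.0, 500.0],
    [400.0, 450.0],
]
response = picture_response(trials)
```

With these toy data the per-trial counts are 4, 3, 0, 5, 2 and 2, so the median response is 2.5 spikes.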
Computation and Neural Systems, California Institute of Technology, Pasadena, California 91125, USA.
Division of Neurosurgery and Neuropsychiatric Institute, University of
California, Los Angeles (UCLA), California 90095, USA.
Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA.
Functional Neurosurgery Unit, Tel-Aviv Medical Center and Sackler Faculty of Medicine, Tel-Aviv University, Tel-Aviv 69978, Israel. †Present address: Department of
Engineering, University of Leicester, LE1 7RH, UK.
Nature, Vol 435, 23 June 2005, doi:10.1038/nature03687
Figure 1a shows the responses of a single unit in the left posterior
hippocampus to a selection of 30 out of the 87 pictures presented to
the patient. None of the other pictures elicited a statistically significant
response. This unit fired to all pictures of the actress Jennifer
Aniston alone, but not (or only very weakly) to other famous and
non-famous faces, landmarks, animals or objects. Interestingly, the
unit did not respond to pictures of Jennifer Aniston together with the
actor Brad Pitt (but see Supplementary Fig. 2). Pictures of Jennifer
Aniston elicited an average of 4.85 spikes (s.d. = 3.59) between 300
and 600 ms after stimulus onset. Notably, this unit was nearly silent
during baseline (average of 0.02 spikes in a 700-ms pre-stimulus time
window) and during the presentation of most other pictures
(Fig. 1b). Figure 1b plots the median number of spikes (across trials)
in the 300–1,000-ms post-stimulus interval for all 87 pictures shown
to the patient. The histogram shows a marked differential response to
pictures of Jennifer Aniston (red bars).

Figure 1 | A single unit in the left posterior hippocampus activated
exclusively by different views of the actress Jennifer Aniston.
a, Responses to 30 of the 87 images are shown. There were no statistically
significant responses to the other 57 pictures. For each picture, the
corresponding raster plots (the order of trial number is from top to bottom)
and post-stimulus time histograms are given. Vertical dashed lines indicate
image onset and offset (1 s apart). Note that owing to insurmountable
copyright problems, all original images were replaced in this and all
subsequent figures by very similar ones (same subject, animal or building,
similar pose, similar colour, line drawing, and so on). b, The median
responses to all pictures. The image numbers correspond to those in a. The
two horizontal lines show the mean baseline activity (0.02 spikes) and the
mean plus 5 s.d. (0.82 spikes). Pictures of Jennifer Aniston are denoted by
red bars. c, The associated ROC curve (red trace) testing the hypothesis that
the cell responded in an invariant manner to all seven photographs of
Jennifer Aniston (hits) but not to other images (including photographs of
Jennifer Aniston and Brad Pitt together; false positives). The grey lines
correspond to the same ROC analysis for 99 surrogate sets of 7 randomly
chosen pictures (P < 0.01). The area under the red curve is 1.00.

Next, we quantified the degree of invariance using a receiver
operating characteristic (ROC) framework. We considered as the
hit rate (y axis) the relative number of responses to pictures of a
specific individual, object, animal or landmark building, and as
the false positive rate (x axis) the relative number of responses to
other pictures. The ROC curve corresponds to the performance of a
linear binary classifier for different values of a response threshold.
Decreasing the threshold increases the probability of hits but also of
false alarms. A cell responding to a large set of pictures of different
individuals will have a ROC curve close to the diagonal (with an area
under the curve of 0.5), whereas a cell that responds to all pictures of
an individual but not to others will have a convex ROC curve far from
the diagonal, with an area close to 1. In Fig. 1c we show the ROC
curve for all seven pictures of Jennifer Aniston (red trace, with an area
equal to 1). The grey lines show 99 ROC surrogate curves, testing
invariance to randomly selected groups of pictures (see Methods). As
expected, these curves are close to the diagonal, having an area of
about 0.5. None of the 99 surrogate curves had an area equal or larger
than the original ROC curve, implying that it is unlikely (P < 0.01)
that the responses to Jennifer Aniston were obtained by chance. A
responsive unit was defined to have an invariant representation if the
area under the ROC curve was larger than the area of the 99 surrogate
curves.
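As a rough illustration of the threshold sweep described above, the following Python sketch builds an ROC curve from per-picture median responses and integrates it with the trapezoid rule. The function names, the toy spike counts and the choice to score a picture as a response when its value is at or above the threshold are assumptions made for this example, not details given in the paper.

```python
def roc_curve(responses, is_target):
    """Return (false positive rate, hit rate) points for a decreasing threshold.

    responses: one response value per picture (for example, median spike count).
    is_target: one bool per picture, True for pictures of the tested individual.
    """
    n_pos = sum(is_target)
    n_neg = len(is_target) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one target and one non-target picture")
    points = [(0.0, 0.0)]  # very high threshold: no hits, no false positives
    for thr in sorted(set(responses), reverse=True):
        hits = sum(1 for r, t in zip(responses, is_target) if t and r >= thr)
        false_pos = sum(1 for r, t in zip(responses, is_target) if not t and r >= thr)
        points.append((false_pos / n_neg, hits / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical unit: strong responses to three pictures of one individual,
# weak or no responses to five other pictures.
responses = [5, 4, 6, 0, 0, 1, 0, 0]
is_target = [True, True, True, False, False, False, False, False]
selective_auc = area_under_curve(roc_curve(responses, is_target))

# A unit that responds equally to everything sits on the diagonal.
flat_auc = area_under_curve(roc_curve([1, 1, 1, 1], [True, True, False, False]))
```

In this toy case the selective unit reaches an area of 1, and the unit with identical responses to all pictures gives 0.5, matching the two limiting cases described in the text.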
Figure 2 shows another single unit located in the right anterior
hippocampus of a different patient. This unit was selectively activated
by pictures of the actress Halle Berry as well as by a drawing of
her (but not by other drawings; for example, picture no. 87). This
unit was also activated by several pictures of Halle Berry dressed as
Catwoman, her character in a recent film, but not by other images of
Catwoman that were not her (data not shown). Notably, the unit was
selectively activated by the letter string ‘Halle Berry’. Such an
invariant pattern of activation goes beyond common visual features
of the different stimuli. As with the previous unit, the responses were
mainly localized between 300 and 600 ms after stimulus onset.

Figure 2 | A single unit in the right anterior hippocampus that responds to
pictures of the actress Halle Berry (conventions as in Fig. 1).
a–c, Strikingly, this cell also responds to a drawing of her, to herself dressed
as Catwoman (a recent movie in which she played the lead role) and to the
letter string ‘Halle Berry’ (picture no. 96). Such an invariant response cannot
be attributed to common visual features of the stimuli. This unit also had a
very low baseline firing rate (0.06 spikes). The area under the red curve in c is
0.99.

Figure 2c shows the ROC curve for the pictures of Halle Berry (red
trace) and for 99 surrogates (grey lines). The area under the ROC
curve was 0.99, larger than that of the surrogates.
Figure 3 illustrates a multi-unit in the left anterior hippocampus
responding to pictures of the Sydney Opera House and the Baha’i
Temple. Because the patient identified both landmark buildings as
the Sydney Opera House, all these pictures were considered as a single
landmark building for the ROC analysis. This unit also responded to
the letter string ‘Sydney Opera’ (pictures no. 2 and 8) but not to other
letter strings, such as ‘Eiffel Tower’ (picture no. 1). More examples of
invariant responses are shown in the Supplementary Figs 2–11.
Out of the 132 responsive units, 51 (38.6%; 30 single units and 21
multi-units) showed invariance to a particular individual (38 units
responding to Jennifer Aniston, Halle Berry, Julia Roberts, Kobe
Bryant, and so on), landmark building (6 units responding to the
Tower of Pisa, the Baha’i Temple and the Sydney Opera House),
animal (5 units responding to spiders, seals and horses) or object (2
units responding to specific food items), with P < 0.01 as defined
above by means of the surrogate tests. A one-way analysis of variance
(ANOVA) yielded similar results (see Methods). Eight of these units
(two single units and six multi-units) responded to two different
individuals (or to an individual and an object). Figure 4 presents the
distribution of the areas under the ROC curves for all 51 units that
showed an invariant representation to individuals or objects. The
areas ranged from 0.76 to 1.00, with a median of 0.94. These units
were located in the hippocampus (27 out of 60 responsive units;
45%), parahippocampal gyrus (11 out of 20 responsive units; 55%),
amygdala (8 out of 30 responsive units; 27%) and entorhinal cortex
(5 out of 22 responsive units; 23%). There were no clear differences in
the latencies and firing patterns among the different areas. However,
more data are needed before making a conclusive claim about
systematic differences between the various structures of the MTL.

Figure 3 | A multi-unit in the left anterior hippocampus that responds to
photographs of the Sydney Opera House and the Baha’i Temple
(conventions as in Fig. 1).
a–c, The patient identified all pictures of both of
these buildings as the Sydney Opera, and we therefore considered them as a
single landmark. This unit also responded to the presentation of the letter
string ‘Sydney Opera’ (pictures no. 2 and 8), but not to other strings, such as
‘Eiffel Tower’ (picture no. 1). In contrast to the previous two figures, this unit
had a higher baseline firing rate (2.64 spikes). The area under the red curve
in c is 0.97.

As shown in Figs 2 and 3, one of the most extreme cases of an
abstract representation is the one given by responses to pictures of a
particular individual (or object) and to the presentation of the
corresponding letter string with its name. In 18 of the 21 testing
sessions we also tested responses to letter strings with the names of
the individuals and objects. Eight of the 132 responsive units (6.1%)
showed a selective response to an individual and its name (with no
response to other names). Six of these were in the hippocampus, one
was in the entorhinal cortex and one was in the amygdala.
These neuronal responses cannot be attributed to any particular
movement artefact, because selective responses started around
300 ms after image onset, whereas key presses occurred at 1 s or
later, and neuronal responses were very selective. About one-third of
the responsive units had a response localized between 300 and
600 ms. This interval corresponds to the latency of event-related
responses correlated with the recognition of ‘oddball’ stimuli in scalp
electroencephalogram, namely, the P300 (ref. 16). Some studies
argue for a generation of the P300 in the hippocampal formation
and amygdala, consistent with our findings.
What are the common features that activate these neurons? Given
the great diversity of distinct images of a single individual (pencil
sketches, caricatures, letter strings, coloured photographs with
different backgrounds) that these cells can selectively respond to, it
is unlikely that this degree of invariance can be explained by a simple
set of metric features common to these images. Indeed, our data are
compatible with an abstract representation of the identity of the
individual or object shown. The existence of such high-level visual
responses in medial temporal lobe structures, usually considered to
be involved in long-term memory formation and consolidation,
should not be surprising given the following: (1) the known anatomical
connections between the higher stages of the visual hierarchy
in the ventral pathway and the MTL; (2) the well-characterized
reactivity of the cortical stages feeding into the MTL to the sight of
faces, objects, or spatial scenes (as ascertained using functional
magnetic resonance imaging (fMRI) in humans and electrophysiology
in monkeys); and (3) the observation that any visual
percept that will be consciously remembered later on will have to be
represented in the hippocampal system. This is true even though
patients with bilateral loss of parts of the MTL do not, in general,
have a deficit in the perception of images. Neurons in the MTL
might have a fundamental role in learning associations between
abstract representations. Thus, our observed invariant responses
probably arise from experiencing very different pictures, words or
other visual stimuli in association with a given individual or object.
How neurons encode different percepts is one of the most intriguing
questions in neuroscience. Two extreme hypotheses are
schemes based on the explicit representations by highly selective
(cardinal, gnostic or grandmother) neurons and schemes that rely on
an implicit representation over a very broad and distributed population
of neurons. In the latter case, recognition would require the
simultaneous activation of a large number of cells and therefore we
would expect each cell to respond to many pictures with similar basic
features. This is in contrast to the sparse firing we observe, because
most MTL cells do not respond to the great majority of images seen
by the patient. Furthermore, cells signal a particular individual or
object in an explicit manner, in the sense that the presence of the
individual can, in principle, be reliably decoded from a very small
number of neurons. We do not mean to imply the existence of single
neurons coding uniquely for discrete percepts for several reasons:
first, some of these units responded to pictures of more than one
individual or object; second, given the limited duration of our
recording sessions, we can only explore a tiny portion of stimulus
space; and third, the fact that we can discover in this short time some
images (such as photographs of Jennifer Aniston) that drive the
cells suggests that each cell might represent more than one class of
images. Yet, this subset of MTL cells is selectively activated by
different views of individuals, landmarks, animals or objects. This
is quite distinct from a completely distributed population code and
suggests a sparse, explicit and invariant encoding of visual percepts in
MTL. Such an abstract representation, in contrast to the metric
representation in the early stages of the visual pathway, might be
important in the storage of long-term memories. Other factors,
including emotional responses towards some images, could conceivably
influence the neuronal activity as well. The responses of these
neurons are reminiscent of the behaviour of hippocampal place cells
in rodents that only fire if the animal moves through a particular
spatial location, with the actual place field defined independently of
sensory cues. Notably, place cells have been found recently in the
human hippocampus as well. Both classes of neurons (place cells
and the cells in the present study) have a very low baseline activity
and respond in a highly selective manner. Future research might
show that this similarity has functional implications, enabling
mammals to encode behaviourally important features of the environment
and to transition between them, either in physical space or in a
more conceptual space.
METHODS
The data in the present study come from 21 sessions in 8 patients with
pharmacologically intractable epilepsy (eight right handed; 3 male; 17–47 years
old). Extensive non-invasive monitoring did not yield concordant data corresponding
to a single resectable epileptogenic focus. Therefore, the patients were
implanted with chronic depth electrodes for 7–10 days to determine the seizure
focus for possible surgical resection. Here we report data from sites in the
hippocampus, amygdala, entorhinal cortex and parahippocampal gyrus. All
studies conformed to the guidelines of the Medical Institutional Review Board at
UCLA. The electrode locations were based exclusively on clinical criteria and
were verified by MRI or by computer tomography co-registered to preoperative
MRI. Each electrode probe had a total of nine micro-wires at its end, eight
active recording channels and one reference. The differential signal from the
micro-wires was amplified using a 64-channel Neuralynx system, filtered
between 1 and 9,000 Hz. We computed the power spectrum for every unit
after spike sorting. Units that showed evidence of line noise were excluded from
subsequent analysis. Signals were sampled at 28 kHz. Each recording session
lasted about 30 min.
Subjects lay in bed, facing a laptop computer. Each image covered about 1.5°
and was presented at the centre of the screen six times for 1 s. The order of the
pictures was randomized. Subjects had to respond, after image offset, according
to whether the picture contained a human face or something else by pressing the
‘Y’ and ‘N’ keys, respectively. This simple task, on which performance was
virtually flawless, required them to attend to the pictures. After the experiments,
patients gave feedback on whether they recognized the images or not. Pictures
included famous and unknown individuals, animals, landmarks and objects. We
tried to maximize the differences between pictures of the individuals (for
example, different clothing, size, point of view, and so on). In 18 of the 21
sessions, we also presented letter strings with names of individuals or objects.

Figure 4 | Distribution of the area under the ROC curves for the 51 units
(out of 132 responsive units) showing an invariant representation. Of
these, 43 responded to a single individual or object and 8 to two individuals
or objects. The dashed vertical line marks the median of the distribution.

The data from the screening sessions were rapidly processed to identify
responsive units and images. All pictures that elicited a response in the screening
session were included in the later testing sessions. Three to eight different views
of seven to twenty-three different individuals or objects were used in the testing
sessions with a mean of 88.6 images per session (range 70–110). Spike detection
and sorting were applied to the continuous recordings using a novel clustering
algorithm (see Supplementary Information). The response to a picture was
defined as the median number of spikes across trials between 300 and 1,000 ms
after stimulus onset. Baseline activity was the average spike count for all pictures
between 1,000 and 300 ms before stimulus onset. A unit was considered
responsive if the activity to at least one picture fulfilled two criteria: (1) the
median number of spikes was larger than the average number of spikes for the
baseline plus 5 s.d.; and (2) the median number of spikes was at least two.
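The two-part responsiveness criterion can be written compactly; the following Python sketch is one possible reading of it. The helper names, the use of the sample standard deviation over per-trial baseline counts and the toy numbers are assumptions for illustration only.

```python
from statistics import mean, median, stdev


def response_threshold(baseline_counts, n_sd=5.0):
    """Baseline mean plus n_sd standard deviations (sample s.d. is an assumption)."""
    return mean(baseline_counts) + n_sd * stdev(baseline_counts)


def responsive_pictures(counts_by_picture, baseline_counts, n_sd=5.0, min_median=2.0):
    """Names of pictures whose median count passes both criteria from the text."""
    threshold = response_threshold(baseline_counts, n_sd)
    selected = []
    for name, counts in counts_by_picture.items():
        m = median(counts)
        # Criterion (1): above baseline mean + 5 s.d.; criterion (2): at least 2 spikes.
        if m > threshold and m >= min_median:
            selected.append(name)
    return selected


# Hypothetical data: baseline spike counts pooled over trials of all pictures,
# and post-stimulus counts for six presentations of three pictures.
baseline = [0] * 48 + [1, 1]
counts = {
    "picture_A": [4, 5, 0, 6, 3, 7],  # median 4.5: passes both criteria
    "picture_B": [0, 1, 0, 0, 2, 0],  # median 0: fails both
    "picture_C": [1, 2, 1, 2, 2, 1],  # median 1.5: above threshold, but below 2 spikes
}
threshold = response_threshold(baseline)
selected = responsive_pictures(counts, baseline)
unit_is_responsive = len(selected) > 0
```

The third picture shows why the second criterion matters for units with a near-silent baseline: a median of 1.5 spikes exceeds the baseline threshold but is still rejected.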
The classification between single unit and multi-unit was done visually based
on the following: (1) the spike shape and its variance; (2) the ratio between the
spike peak value and the noise level; (3) the inter-spike interval distribution of
each cluster; and (4) the presence of a refractory period for the single units (that
is, less than 1% of spikes within less than 3 ms inter-spike interval).
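Only the fourth criterion above is stated numerically. A minimal Python sketch of that refractory-period check follows; the classification in the study was done visually, so this covers a single criterion, and the function names and toy spike trains are hypothetical.

```python
def refractory_violation_fraction(spike_times_ms, refractory_ms=3.0):
    """Fraction of inter-spike intervals shorter than refractory_ms."""
    times = sorted(spike_times_ms)
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    if not intervals:
        return 0.0
    return sum(1 for isi in intervals if isi < refractory_ms) / len(intervals)


def passes_refractory_check(spike_times_ms, max_fraction=0.01, refractory_ms=3.0):
    """True if fewer than 1% of intervals are shorter than 3 ms (criterion 4)."""
    return refractory_violation_fraction(spike_times_ms, refractory_ms) < max_fraction


# Hypothetical cluster with regular 10-ms spacing: no short intervals.
clean_cluster = [10.0 * i for i in range(201)]

# Hypothetical cluster with one extra spike 1 ms after another: 1 of 50 intervals.
mixed_cluster = [10.0 * i for i in range(50)] + [11.0]

clean_ok = passes_refractory_check(clean_cluster)
mixed_fraction = refractory_violation_fraction(mixed_cluster)
mixed_ok = passes_refractory_check(mixed_cluster)
```

Here the second cluster has 2% of its intervals below 3 ms and would therefore not meet the single-unit criterion.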
Whenever a unit had a response to a given stimulus, we further analysed the
responses to other pictures of the same individual or object by a ROC analysis.
This tested whether cells responded selectively to pictures of a given individual.
The hit rate (y axis) was defined as the number of responses to the individual
divided by the total number of pictures of this individual. The false positive rate
(x axis) was defined as the number of responses to the other pictures divided by
the total number of other pictures. The ROC curve was obtained by gradually
lowering the threshold of the responses (the median number of spikes in Figs 1b,
2b and 3b). Starting with a very high threshold (no hits, no false positives, lower
left-hand corner in the ROC diagram), if a unit responds exclusively to an image
of a particular individual or object, the ROC curve will show a steep increase
when lowering the threshold (a hit rate of 1 and no false positives). If a unit
responds to a random selection of pictures, it will have a similar relative number
of hits and false positives and the ROC curve will fall along the diagonal. In the
first case, for a highly invariant unit, the area under the ROC curve will be close
to 1, whereas in the latter case it will be about 0.5. To evaluate the statistical
significance, we created 99 surrogate curves for each responsive unit, testing the
null hypothesis that the unit responded preferentially to n randomly chosen
pictures (with n being the number of pictures of the individual for which
invariance was tested). A unit was considered invariant to a certain individual or
object if the area under the ROC curve was larger than the area of all of the 99
surrogates (that is, with a confidence of P < 0.01). Alternatively, the ROC
analysis can be done with the single trial responses instead of the median
responses across trials. Here, responses to the trials corresponding to any picture
of the individual tested are considered as hits and responses to trials to other
pictures as false positives. This trial-by-trial analysis led to very similar results,
with 55 units of all 132 responsive units showing an invariant representation. A
one-way ANOVA also yielded similar results. In particular, we tested whether the
distribution of median firing rates for all responsive units showed a dependence
on the factor identity (that is, the individual, landmark or object shown). The
different views of each individual were the repeated measures. As with the ROC
analysis, an ANOVA test was performed on all responsive units. Overall, the
results were very similar to those obtained with the ROC analysis: of 132
responsive units, 49 had a significant effect for factor identity with P < 0.01,
compared to 51 units showing an invariant representation with the ROC
analysis. The ANOVA analysis, however, does not demonstrate that the invariant
responses were very selective, whereas the ROC analysis explicitly tests the
presence of an invariant as well as sparse representation.
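The surrogate test described above can be sketched in Python as follows. For brevity the area under the ROC curve is computed with the equivalent rank-based (Mann–Whitney) formula rather than by an explicit threshold sweep. The function names, the fixed random seed and the toy responses are assumptions for this example, not the authors' implementation.

```python
import random


def roc_area(responses, target_indices):
    """Area under the ROC curve via pairwise comparison of targets and non-targets."""
    targets = set(target_indices)
    pos = [r for i, r in enumerate(responses) if i in targets]
    neg = [r for i, r in enumerate(responses) if i not in targets]
    # Each target/non-target pair scores 1 if the target is larger, 0.5 on a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def surrogate_test(responses, target_indices, n_surrogates=99, seed=0):
    """Compare the observed area with areas for random picture sets of equal size."""
    rng = random.Random(seed)
    observed = roc_area(responses, target_indices)
    n = len(target_indices)
    surrogate_areas = [
        roc_area(responses, rng.sample(range(len(responses)), n))
        for _ in range(n_surrogates)
    ]
    # Invariant only if the observed area exceeds all surrogates (P < 0.01).
    is_invariant = all(observed > s for s in surrogate_areas)
    return observed, surrogate_areas, is_invariant


# Hypothetical unit: 87 pictures, of which the first 7 show one individual.
responses = [5, 4, 6, 3, 7, 5, 4] + [0] * 80
observed, surrogate_areas, is_invariant = surrogate_test(responses, list(range(7)))
```

Because the observed area must exceed all 99 surrogates, the observed curve ranks first among 100 curves, which corresponds to the P < 0.01 level quoted in the text.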
Images were obtained from Corbis and Photorazzi, with licensed rights to
reproduce them in this paper and in the Supplementary Information.
Received 1 December 2004; accepted 3 February 2005.
1. Barlow, H. Single units and sensation: a neuron doctrine for perception.
Perception 1, 371–-394 (1972).
2. Gross, C. G., Bender, D. B. & Rocha-Miranda, C. E. Visual receptive fields of
neurons in inferotemporal cortex of the monkey. Science 166, 1303–-1306
3. Konorski, J. Integrative Activity of the Brain (Univ. Chicago Press, Chicago, 1967).
4. Logothetis, N. K. & Sheinberg, D. L. Visual object recognition. Annu. Rev.
Neurosci. 19, 577–-621 (1996).
5. Riesenhuber, M. & Poggio, T. Neural mechanisms of object recognition. Curr.
Opin. Neurobiol. 12, 162–-168 (2002).
6. Young, M. P. & Yamane, S. Sparse population coding of faces in the inferior
temporal cortex. Science 256, 1327–-1331 (1992).
7. Logothetis, N. K., Pauls, J. & Poggio, T. Shape representation in the inferior
temporal cortex of monkeys. Curr. Biol. 5, 552–-563 (1995).
8. Logothetis, N. K. & Pauls, J. Psychophysical and physiological evidence for
viewer-centered object representations in the primate. Cereb. Cortex 3,
270–-288 (1995).
9. Perrett, D., Rolls, E. & Caan, W. Visual neurons responsive to faces in the
monkey temporal cortex. Exp. Brain Res. 47, 329–-342 (1982).
10. Schwartz, E. L., Desimone, R., Albright, T. D. & Gross, C. G. Shape recognition
and inferior temporal neurons. Proc. Natl Acad. Sci. USA 80, 5776–-5778 (1983).
11. Tanaka, K. Inferotemporal cortex and object vision. Annu. Rev. Neurosci. 19,
109–-139 (1996).
12. Miyashita, Y. & Chang, H. S. Neuronal correlate of pictorial short-term memory
in the primate temporal cortex. Nature 331, 68–-71 (1988).
13. Fried, I., MacDonald, K. A. & Wilson, C. Single neuron activity in human
hippocampus and amygdale during recognition of faces and objects. Neuron 18,
753–-765 (1997).
14. Kreiman, G., Koch, C. & Fried, I. Category-specific visual responses of single
neurons in the human medial temporal lobe. Nature Neurosci. 3, 946–-953
15. Macmillan, N. A. & Creelman, C. D. Detection Theory: A User’s Guide
(Cambridge Univ. Press, New York, 1991).
16. Picton, T. The P300 wave of the human event-related potential. J. Clin.
Neurophysiol. 9, 456–-479 (1992).
17. Halgren, E., Marinkovic, K. & Chauvel, P. Generators of the late cognitive
potentials in auditory and visual oddball tasks. Electroencephalogr. Clin.
Neurophysiol. 106, 156–-164 (1998).
18. McCarthy, G., Wood, C. C., Williamson, P. D. & Spencer, D. D. Task-dependent
field potentials in human hippocampal formation. J. Neurosci. 9, 4253–-4268
19. Saleem, K. S. & Tanaka, K. Divergent projections from the anterior
inferotemporal area TE to the perirhinal and entorhinal cortices in the macaque
monkey. J. Neurosci. 16, 4757–-4775 (1996).
20. Suzuki, W. A. Neuroanatomy of the monkey entorhinal, perirhinal and
parahippocampal cortices: Organization of cortical inputs and interconnections
with amygdale and striatum. Seminar Neurosci. 8, 3–-12 (1996).
21. Kanwisher, N., McDermott, J. & Chun, M. M. The fusiform face area: A module
in human extrastriate cortex specialized for face perception. J. Neurosci. 17,
4302–4311 (1997).
22. Haxby, J. V. et al. Distributed and overlapping representations of faces and
objects in ventral temporal cortex. Science 293, 2425–2430 (2001).
23. Eichenbaum, H. A cortical-hippocampal system for declarative memory. Nature
Rev. Neurosci. 1, 41–50 (2000).
24. Hampson, R. E., Pons, P. P., Stanford, T. R. & Deadwyler, S. A. Categorization in
the monkey hippocampus: A possible mechanism for encoding information
into memory. Proc. Natl Acad. Sci. USA 101, 3184–3189 (2004).
25. Squire, L. R., Stark, C. E. L. & Clark, R. E. The medial temporal lobe. Annu. Rev.
Neurosci. 27, 279–306 (2004).
26. Miyashita, Y. Neuronal correlate of visual associative long-term memory in
the primate temporal cortex. Nature 335, 817–820 (1988).
27. Koch, C. The Quest for Consciousness: A Neurobiological Approach (Roberts,
Englewood, Colorado, 2004).
28. Wilson, M. A. & McNaughton, B. L. Dynamics of the hippocampal ensemble
code for space. Science 261, 1055–1058 (1993).
29. Ekstrom, A. D. et al. Cellular networks underlying human spatial navigation.
Nature 425, 184–187 (2003).
30. Quian Quiroga, R., Nadasdy, Z. & Ben-Shaul, Y. Unsupervised spike detection
and sorting with wavelets and super-paramagnetic clustering. Neural Comput.
16, 1661–1687 (2004).
Supplementary Information is linked to the online version of the paper at
Acknowledgements We thank all patients for their participation; P. Sinha for
drawing some faces; colleagues for providing pictures; I. Wainwright for
administrative assistance; and E. Behnke, T. Fields, E. Ho, E. Isham, A. Kraskov,
P. Steinmetz, I. Viskontas and C. Wilson for technical assistance. This work was
supported by grants from the NINDS, NIMH, NSF, DARPA, the Office of Naval
Research, the W.M. Keck Foundation Fund for Discovery in Basic Medical
Research, a Whiteman fellowship (to G.K.), the Gordon Moore Foundation, the
Sloan Foundation, and the Swartz Foundation for Computational Neuroscience.
Author Information Reprints and permissions information is available at The authors declare no competing
financial interests. Correspondence and request for materials should be
addressed to R.Q.Q. (
NATURE|Vol 435|23 June 2005 LETTERS