
Eero P. Simoncelli
New York University | NYU · Center for Neural Science (CNS)
PhD, Electrical Engineering & Computer Science, MIT
About
411 Publications · 123,592 Reads
126,001 Citations
Additional affiliations
January 2001 - present
September 1996 - present
January 1993 - August 1996
Publications (411)
Prediction is a fundamental capability of all living organisms, and has been proposed as an objective for learning sensory representations. Recent work demonstrates that in primate visual systems, prediction is facilitated by neural representations that follow straighter temporal trajectories than their initial photoreceptor encoding, which allows...
Temporal prediction is inherently uncertain, but representing the ambiguity in natural image sequences is a challenging high-dimensional probabilistic inference problem. For natural scenes, the curse of dimensionality renders explicit density estimation statistically and computationally intractable. Here, we describe an implicit regression-based fr...
Image representations (artificial or biological) are often compared in terms of their global geometry; however, representations with similar global structure can have strikingly different local geometries. Here, we propose a framework for comparing a set of image representations in terms of their local geometries. We quantify the local geometry of...
Score diffusion methods can learn probability densities from samples. The score of the noise-corrupted density is estimated using a deep neural network, which is then used to iteratively transport a Gaussian white noise density to a target density. Variants for conditional densities have been developed, but correct estimation of the corresponding s...
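For readers unfamiliar with the sampling procedure referenced here, a minimal sketch of score-based transport is shown below, assuming a two-dimensional Gaussian target so that the score of the noise-corrupted density can be written in closed form (standing in for the deep network); the noise schedule and step sizes are illustrative choices, not those of the paper.

```python
import numpy as np

# Toy target: a 2-D Gaussian N(mu, C). For a Gaussian target, the score of the
# noise-corrupted density p_sigma(x) = N(mu, C + sigma^2 I) is available in
# closed form, so it can stand in for the deep network that would normally
# estimate it. (Illustrative stand-in, not the authors' trained model.)
mu = np.array([2.0, -1.0])
C = np.array([[1.0, 0.8], [0.8, 1.0]])

def score(x, sigma):
    cov = C + (sigma ** 2) * np.eye(2)
    return -np.linalg.solve(cov, (x - mu).T).T   # grad_x log p_sigma(x)

# Annealed Langevin-style sampling: start from Gaussian white noise at a large
# sigma, take noisy gradient steps, and shrink sigma toward zero.
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 2)) * 3.0          # white-noise initialization
for sigma in np.geomspace(3.0, 0.01, 30):        # decreasing noise schedule
    step = 0.5 * sigma ** 2
    for _ in range(10):
        x += step * score(x, sigma)
        x += np.sqrt(2 * step) * rng.standard_normal(x.shape)

print("sample mean:", x.mean(axis=0))            # should approach mu
print("sample cov:\n", np.cov(x.T))              # should approach C
```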
Fixational eye movements alter the number and timing of spikes transmitted from the retina to the brain, but whether these changes enhance or degrade the retinal signal is unclear. To quantify this, we developed a Bayesian method for reconstructing natural images from the recorded spikes of hundreds of retinal ganglion cells (RGCs) in the macaque r...
The visual world is richly adorned with texture, which can serve to delineate important elements of natural scenes. In anesthetized macaque monkeys, selectivity for the statistical features of natural texture is weak in V1, but substantial in V2, suggesting that neuronal activity in V2 might directly support texture perception. To test this, we inv...
Human ability to recognize complex visual patterns arises through transformations performed by successive areas in the ventral visual cortex. Deep neural networks trained end-to-end for object recognition approach human capabilities, and offer the best descriptions to date of neural responses in the late stages of the hierarchy. But these networks...
The perception of sensory attributes is often quantified through measurements of sensitivity (the ability to detect small stimulus changes), as well as through direct judgments of appearance or intensity. Despite their ubiquity, the relationship between these two measurements remains controversial and unresolved. Here, we propose a framework in whi...
Efficient coding theory posits that sensory circuits transform natural signals into neural representations that maximize information transmission subject to resource constraints. Local interneurons are thought to play an important role in these transformations, shaping patterns of circuit activity to facilitate and direct information flow. However,...
We re-examine the problem of reconstructing a high-dimensional signal from a small set of linear measurements, in combination with an image prior from a diffusion probabilistic model. Well-established methods for optimizing such measurements include principal component analysis (PCA), independent component analysis (ICA) and compressed sensing (CS), a...
We have measured the visually evoked activity of single neurons recorded in areas V1 and V2 of awake, fixating macaque monkeys, and captured their responses with a common computational model. We used a stimulus set composed of “droplets” of localized contrast, band-limited in orientation and spatial frequency; each brief stimulus containe...
Humans and monkeys can effortlessly recognize objects in everyday scenes. This ability relies on neural computations in the ventral stream of visual cortex. The intermediate computations that lead to object selectivity are not well understood, but previous studies implicate V4 as an early site of selectivity for object shape. To explore the mechani...
Sensory-guided behavior requires reliable encoding of stimulus information in neural populations, and flexible, task-specific readout. The former has been studied extensively, but the latter remains poorly understood. We introduce a theory for adaptive sensory processing based on functionally-targeted stochastic modulation. We show that responses o...
Internal representations are not uniquely identifiable from perceptual measurements: different representations can generate identical perceptual predictions, and similar representations may predict dissimilar percepts. Here, we generalize a previous method ("Eigendistortions" - Berardino et al., 2017) to enable comparison of models based on their m...
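As background for the "Eigendistortions" comparison cited above: under a Gaussian response-noise assumption, a model's most- and least-noticeable distortions are the extremal eigenvectors of the Fisher matrix F = JᵀJ, where J is the Jacobian of the model response at the base image. The sketch below illustrates that computation on a toy linear-nonlinear model; the model, dimensions, and noise assumption are placeholders, not the models compared in the paper.

```python
import torch

# A toy differentiable "response model": a fixed random linear-nonlinear map
# standing in for a perceptual model or neural network (illustrative only).
torch.manual_seed(0)
W = torch.randn(16, 64)
model = lambda x: torch.tanh(W @ x)

x0 = torch.randn(64)  # base "image" (flattened)

# Under a Gaussian response-noise assumption, the Fisher matrix is F = J^T J,
# with J the Jacobian of the model response at x0.  Its top and bottom
# eigenvectors are the most- and least-noticeable distortions.
J = torch.autograd.functional.jacobian(model, x0)   # shape (16, 64)
F = J.T @ J

evals, evecs = torch.linalg.eigh(F)                  # ascending eigenvalues
most_noticeable = evecs[:, -1]                       # largest-eigenvalue direction
least_noticeable = evecs[:, 0]                       # smallest-eigenvalue direction
print("largest / smallest eigenvalues:", evals[-1].item(), evals[0].item())
```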
Human ability to discriminate and identify visual attributes varies across the visual field, and is generally worse in the periphery than in the fovea. This decline in performance is revealed in many kinds of tasks, from detection to recognition. A parsimonious hypothesis is that the representation of any visual feature is blurred (spatially averag...
Neurons in early sensory areas rapidly adapt to changing sensory statistics, both by normalizing the variance of their individual responses and by reducing correlations between their responses. Together, these transformations may be viewed as an adaptive form of statistical whitening. Existing mechanistic models of adaptive whitening exclusively us...
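For reference, the target transformation itself, statistical whitening, can be summarized with a short sketch: a standard ZCA computation that removes correlations and normalizes variances. This is a generic illustration, not the adaptive circuit mechanism studied in the paper.

```python
import numpy as np

# Statistical whitening: transform responses so each has unit variance and
# pairwise correlations are removed (standard ZCA / symmetric whitening).
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
x = rng.standard_normal((10000, 5)) @ A.T        # correlated "neural" inputs

C = np.cov(x, rowvar=False)                      # input covariance
evals, evecs = np.linalg.eigh(C)
W_zca = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T

y = (x - x.mean(axis=0)) @ W_zca.T               # whitened responses
print(np.round(np.cov(y, rowvar=False), 2))      # approximately the identity
```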
The retina transmits visual signals to the brain in the spiking activity of retinal ganglion cells (RGCs). This signal is necessarily imperfect: some visual information is lost in phototransduction and retinal processing. To quantify the transmitted visual signal, we developed a Bayesian method to reconstruct images from the simultaneously recorded...
Sensory systems across all modalities and species exhibit adaptation to continuously changing input statistics. Individual neurons have been shown to modulate their response gains so as to maximize information transmission in different stimulus contexts. Experimental measurements have revealed additional, nuanced sensory adaptation effects includin...
Neuroscience has long been an essential driver of progress in artificial intelligence (AI). We propose that to accelerate progress in AI, we must invest in fundamental research in NeuroAI. A core component of this is the embodied Turing test, which challenges AI animal models to interact with the sensorimotor world at skill levels akin to their liv...
Self-supervised Learning (SSL) provides a strategy for constructing useful representations of images without relying on hand-assigned labels. Many such methods aim to map distinct views of the same scene or object to nearby points in the representation space, while employing some constraint to prevent representational collapse. Here we recast the p...
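A minimal sketch of the generic recipe described here (pull embeddings of two views of the same image together, with a term that discourages representational collapse) is given below; the specific loss, encoder, and hyperparameters are illustrative assumptions and not the objective proposed in the paper.

```python
import torch

def ssl_loss(z1, z2, var_target=1.0, lam=1.0):
    """Toy self-supervised objective: pull embeddings of two views of the same
    image together, while penalizing per-dimension variance collapse.
    Illustrative of the general recipe, not the paper's specific objective."""
    invariance = ((z1 - z2) ** 2).mean()                      # views agree
    std1, std2 = z1.std(dim=0), z2.std(dim=0)
    anti_collapse = (torch.relu(var_target - std1).mean() +
                     torch.relu(var_target - std2).mean())    # keep variance up
    return invariance + lam * anti_collapse

# Example: embeddings of two augmented "views" from a shared toy encoder.
torch.manual_seed(0)
encoder = torch.nn.Linear(32, 16)
x = torch.randn(128, 32)
z1 = encoder(x + 0.1 * torch.randn_like(x))
z2 = encoder(x + 0.1 * torch.randn_like(x))
print(ssl_loss(z1, z2).item())
```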
Observer motion and continuous deformations of objects and surfaces imbue natural videos with distinct temporal structures, enabling partial prediction of future frames from past ones. Conventional methods first estimate local motion, or optic flow, and then use it to predict future frames by warping or copying content. Here, we explore a more dire...
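The conventional flow-then-warp baseline mentioned here can be sketched in a few lines, assuming the flow field is already given (a constant translation in this toy example); flow estimation itself is the difficult step that a direct prediction approach sidesteps.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_frame(frame, flow):
    """Predict the next frame by warping the current one along a flow field.
    flow[..., 0] and flow[..., 1] are per-pixel (row, col) displacements.
    The flow is assumed given; estimating it is the hard part."""
    h, w = frame.shape
    rows, cols = np.mgrid[0:h, 0:w].astype(float)
    # Sample the current frame at the locations each pixel is predicted to come from.
    coords = np.stack([rows - flow[..., 0], cols - flow[..., 1]])
    return map_coordinates(frame, coords, order=1, mode='nearest')

frame = np.random.default_rng(0).random((64, 64))
flow = np.zeros((64, 64, 2))
flow[..., 1] = 2.0                      # everything moves 2 pixels rightward
predicted = warp_frame(frame, flow)
print(predicted.shape)
```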
Statistical whitening transformations play a fundamental role in many computational systems, and may also play an important role in biological sensory systems. Individual neurons appear to rapidly and reversibly alter their input-output gains, approximately normalizing the variance of their responses. Populations of neurons appear to regulate their...
Perceptual sensitivity often improves with training, a phenomenon known as 'perceptual learning'. Another important perceptual dimension is appearance, the subjective sense of stimulus magnitude. Are training-induced improvements in sensitivity accompanied by more accurate appearance? Here, we examine this question by measuring both discrimination...
A fraction of the visual information arriving at the retina is transmitted to the brain by signals in the optic nerve, and the brain must rely solely on these signals to make inferences about the visual world. Previous work has probed the visual information contained in retinal signals by reconstructing images from retinal activity using linear reg...
Neurons in primate visual cortex (area V1) are tuned for spatial frequency, in a manner that depends on their position in the visual field. Several studies have examined this dependency using functional magnetic resonance imaging (fMRI), reporting preferred spatial frequencies (tuning curve peaks) of V1 voxels as a function of eccentricity, but the...
Denoising is a fundamental challenge in scientific imaging. Deep convolutional neural networks (CNNs) provide the current state of the art in denoising photographic images. However, their potential has been inadequately explored for scientific imaging. Denoising CNNs are typically trained on clean images corrupted with artificial noise, but in scie...
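The standard training recipe referred to above (corrupt clean images with synthetic Gaussian noise, then minimize reconstruction error) looks roughly like the sketch below; the tiny network and random "images" are placeholders.

```python
import torch
import torch.nn as nn

# Conventional supervised denoiser training: corrupt clean images with
# synthetic Gaussian noise and train a CNN to undo it (MSE loss).
torch.manual_seed(0)
denoiser = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

clean = torch.rand(32, 1, 32, 32)               # stand-in for clean images
for step in range(200):
    sigma = 0.1 + 0.2 * torch.rand(1)           # vary noise level across batches
    noisy = clean + sigma * torch.randn_like(clean)
    loss = ((denoiser(noisy) - clean) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print("final training MSE:", loss.item())
```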
Many sensory-driven behaviors rely on predictions about future states of the environment. Visual input typically evolves along complex temporal trajectories that are difficult to extrapolate. We test the hypothesis that spatial processing mechanisms in the early visual system facilitate prediction by constructing neural representations that follow...
Neurons in primate visual cortex (area V1) are tuned for spatial frequency, in a manner that depends on their position in the visual field. Several studies have examined this dependency using fMRI, reporting preferred spatial frequencies (tuning curve peaks) of V1 voxels as a function of eccentricity, but their results differ by as much a...
A deep convolutional neural network has been developed to denoise atomic-resolution transmission electron microscope image datasets of nanoparticles acquired using direct electron counting detectors, for applications where the image signal is severely limited by shot noise. The network was applied to a model system of CeO2-supported Pt nanopartic...
Sensory processing necessitates discarding some information in service of preserving and reformatting more behaviorally relevant information. Sensory neurons seem to achieve this by responding selectively to particular combinations of features in their inputs, while averaging over or ignoring irrelevant combinations. Here, we expose the perceptual...
Deep convolutional neural networks (CNNs) for image denoising are usually trained on large datasets. These models achieve the current state of the art, but they have difficulties generalizing when applied to data that deviate from the training distribution. Recent work has shown that it is possible to train denoisers on a single noisy image. These...
Significance
Humans have a remarkable ability to remember images they have seen, even after seeing thousands, each only once and for a few seconds. One important step toward understanding how the primate brain supports this remarkable form of memory involves pinpointing the neural activity patterns that enable image memory behavior. This paper pres...
Sensory-guided behavior requires reliable encoding of stimulus information in neural responses, and task-specific decoding through selective combination of these responses. The former has been the topic of intensive study, but the latter remains largely a mystery. We propose a framework in which shared stochastic modulation of task-informative neu...
The performance of objective image quality assessment (IQA) models has been evaluated primarily by comparing model predictions to human quality judgments. Perceptual datasets gathered for this purpose have provided useful benchmarks for improving IQA methods, but their heavy use creates a risk of overfitting. Here, we perform a large-scale comparis...
A deep learning-based convolutional neural network has been developed to denoise atomic-resolution in situ TEM image datasets of catalyst nanoparticles acquired on high speed, direct electron counting detectors, where the signal is severely limited by shot noise. The network was applied to a model catalyst of CeO2-supported Pt nanoparticles. We lev...
Objective measures of image quality generally operate by comparing pixels of a “degraded” image to those of the original. Relative to human observers, these measures are overly sensitive to resampling of texture regions (e.g., replacing one patch of grass with another). Here, we develop the first full-reference image quality model with explicit tol...
Deep convolutional neural networks (CNNs) currently achieve state-of-the-art performance in denoising videos. They are typically trained with supervision, minimizing the error between the network output and ground-truth clean videos. However, in many applications, such as microscopy, noiseless videos are not available. To address these cases, we bu...
Denoising is a fundamental challenge in scientific imaging. Deep convolutional neural networks (CNNs) provide the current state of the art in denoising natural images, where they produce impressive results. However, their potential has barely been explored in the context of scientific imaging. Denoising CNNs are typically trained on real natural im...
Prior probability models are a central component of many image processing problems, but density estimation is notoriously difficult for high-dimensional signals such as photographic images. Deep neural networks have provided state-of-the-art solutions for problems such as denoising, which implicitly rely on a prior probability model of natural imag...
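The connection between a least-squares denoiser and an implicit prior rests on a classical identity (often attributed to Miyasawa or Tweedie), shown below for additive Gaussian noise; the notation is generic and not taken from the paper.

```latex
% For y = x + \epsilon with \epsilon \sim \mathcal{N}(0, \sigma^2 I), the
% least-squares (MMSE) denoiser satisfies
\hat{x}(y) \;=\; \mathbb{E}[x \mid y] \;=\; y + \sigma^2 \, \nabla_{y} \log p_{\sigma}(y),
% so the denoiser residual \hat{x}(y) - y is proportional to the gradient of the
% log of the noise-smoothed image density -- the sense in which a trained
% denoiser implicitly encodes a prior over images.
```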
Memories of the images that we have seen are thought to be reflected in the reduction of neural responses in high-level visual areas such as inferotemporal (IT) cortex, a phenomenon known as repetition suppression (RS). We challenged this hypothesis with a task that required rhesus monkeys to report whether images were novel or repeated while ignor...
We develop a model for representing visual texture in a low-dimensional feature space, along with a novel self-supervised learning objective that is used to train it on an unlabeled database of texture images. Inspired by the architecture of primate visual cortex, the model uses a first stage of oriented linear filters (corresponding to cortical ar...
Neural populations do not perfectly encode the sensory world: their capacity is limited by the number of neurons, metabolic and other biophysical resources, and intrinsic noise. The brain is presumably shaped by these limitations, improving efficiency by discarding some aspects of incoming sensory streams, while preferentially preserving commonly o...
The performance of objective image quality assessment (IQA) models has been evaluated primarily by comparing model predictions to human judgments. Perceptual datasets (e.g., LIVE and TID2013) gathered for this purpose provide useful benchmarks for improving IQA methods, but their heavy use creates a risk of overfitting. Here, we perform a large-sca...
Objective measures of image quality generally operate by making local comparisons of pixels of a "degraded" image to those of the original. Relative to human observers, these measures are overly sensitive to resampling of texture regions (e.g., replacing one patch of grass with another). Here we develop the first full-reference image quality model...
Responses of sensory neurons are often modeled using a weighted combination of rectified linear subunits. Since these subunits often cannot be measured directly, a flexible method is needed to infer their properties from the responses of downstream neurons. We present a method for maximum likelihood estimation of subunits by soft-clustering spike-t...
Motion selectivity in primary visual cortex (V1) is approximately separable in orientation, spatial frequency, and temporal frequency ("frequency-separable"). Models for area MT neurons posit that their selectivity arises by combining direction-selective V1 afferents whose tuning is organized around a tilted plane in the frequency domain, specifyin...
Deep convolutional networks often append additive constant ("bias") terms to their convolution operations, enabling a richer repertoire of functional mappings. Biases are also used to facilitate training, by subtracting mean response over batches of training images (a component of "batch normalization"). Recent state-of-the-art blind denoising meth...
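A minimal sketch of the "bias-free" idea: removing every additive constant from a ReLU convolutional network makes it positively homogeneous, so its input-output behavior scales with the input. The toy network below is illustrative, not the published architecture.

```python
import torch
import torch.nn as nn

# "Bias-free" variant of a small denoising CNN: every additive constant is
# removed (bias=False on the convolutions; a full bias-free net would also drop
# the additive shift in batch normalization).  Without biases, the ReLU network
# is positively homogeneous, f(a*x) = a*f(x) for a > 0, which is one route to
# generalizing across noise levels.  Toy sketch only.
torch.manual_seed(0)
net = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1, bias=False), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1, bias=False), nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1, bias=False),
)

x = torch.randn(1, 1, 16, 16)
with torch.no_grad():
    scaled_out = net(3.0 * x)
    out_scaled = 3.0 * net(x)
print(torch.allclose(scaled_out, out_scaled, atol=1e-5))   # True: homogeneous
```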
Many behaviors rely on predictions derived from recent visual input, but the temporal evolution of those inputs is generally complex and difficult to extrapolate. We propose that the visual system transforms these inputs to follow straighter temporal trajectories. To test this ‘temporal straightening’ hypothesis, we develop a methodology for estima...
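One standard way to quantify how straight a temporal trajectory is, in the spirit of this work, is the average angle between successive displacement vectors; the short sketch below is a generic implementation of that measure, not the paper's full methodology (which must infer perceptual trajectories from human judgments).

```python
import numpy as np

def mean_curvature_deg(traj):
    """Average curvature of a trajectory: the angle between successive
    displacement vectors, in degrees.  Straighter trajectories give smaller
    values.  `traj` has shape (num_frames, dim), e.g. flattened video frames
    or their representations at some processing stage."""
    v = np.diff(traj, axis=0)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    cosines = np.clip(np.sum(v[:-1] * v[1:], axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cosines)).mean()

rng = np.random.default_rng(0)
line = np.outer(np.linspace(0, 1, 10), rng.standard_normal(50))   # straight path
wiggle = line + 0.1 * rng.standard_normal(line.shape)             # curved path
print(mean_curvature_deg(line), mean_curvature_deg(wiggle))       # ~0 vs. large
```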
The original and corrected figures are shown in the accompanying Author Correction.
Sensory-guided behavior requires reliable encoding of information (from stimuli to neural responses) and flexible decoding (from neural responses to behavior). In typical decision tasks, a small subset of cells within a large population encode task-relevant stimulus information and need to be identified by later processing stages for relevant infor...
Integration of rectified synaptic inputs is a widespread nonlinear motif in sensory neuroscience. We present a novel method for maximum likelihood estimation of nonlinear subunits by soft-clustering spike-triggered stimuli. Subunits estimated from parasol ganglion cells recorded in macaque retina partitioned the receptive field into compact regions...
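To convey the flavor of grouping spike-triggered stimuli into subunits, the toy EM-style sketch below softly assigns each spike's stimulus to candidate filters and re-estimates each filter as a weighted mean of its stimuli; it is a simplification for illustration, not the maximum-likelihood estimator described here.

```python
import numpy as np

def soft_cluster_sta(spike_stimuli, n_subunits=3, n_iters=50, temp=1.0):
    """Simplified soft-clustering of spike-triggered stimuli: each spike's
    stimulus is softly assigned to candidate subunits, and each subunit filter
    is re-estimated as the responsibility-weighted mean of its stimuli.
    A toy EM-style sketch, not the published estimator."""
    # Greedy farthest-point initialization keeps the starting filters distinct.
    centers = [0]
    dists = np.linalg.norm(spike_stimuli - spike_stimuli[0], axis=1)
    for _ in range(n_subunits - 1):
        centers.append(int(dists.argmax()))
        dists = np.minimum(dists, np.linalg.norm(
            spike_stimuli - spike_stimuli[centers[-1]], axis=1))
    filters = spike_stimuli[centers].copy()
    for _ in range(n_iters):
        # E-step: soft responsibilities from distance to each candidate filter.
        sqdist = ((spike_stimuli[:, None, :] - filters[None]) ** 2).sum(-1)
        resp = np.exp(-(sqdist - sqdist.min(axis=1, keepdims=True)) / (2 * temp))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: filters become responsibility-weighted means of the stimuli.
        filters = (resp.T @ spike_stimuli) / resp.sum(axis=0)[:, None]
    return filters

# Synthetic example: spikes triggered by one of three localized "subunit" inputs.
rng = np.random.default_rng(1)
true = np.zeros((3, 30)); true[0, :10] = 1; true[1, 10:20] = 1; true[2, 20:] = 1
stim = true[rng.integers(0, 3, 2000)] + 0.3 * rng.standard_normal((2000, 30))
print(np.round(soft_cluster_sta(stim), 1))   # rows tend to recover the three blocks
```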
Sensory neurons represent stimulus information with sequences of action potentials that differ across repeated measurements. This variability limits the information that can be extracted from momentary observations of a neuron's response. It is often assumed that integrating responses over time mitigates this limitation. However, temporal response...