Optimal decoding of stimulus velocity using a probabilistic model of ganglion cell populations in primate retina

Edmund C. Lalor,1,∗ Yashar Ahmadian,2 and Liam Paninski2

1Department of Electronic and Electrical Engineering and Institute of Neuroscience, Trinity College Dublin, College Green, Dublin 2, Ireland

2Department of Statistics, Columbia University, 1255 Amsterdam Avenue, New York, N.Y. 10027, USA

∗Corresponding author: edlalor@tcd.ie

A major open problem in systems neuroscience is to understand the relationship between behavior and the detailed spiking properties of neural populations. In this work, we assess how faithfully velocity information can be decoded from a population of spiking model retinal neurons whose spatiotemporal receptive fields and ensemble spike-train dynamics are closely matched to real data. We describe how to compute the optimal Bayesian estimate of image velocity given the population spike train response, and show that, given complete information about the displayed image, the spike train ensemble signals speed with an average relative precision of about 2% across a specific set of stimulus conditions. We further show how to compute the Bayesian velocity estimate in the case where we only have some a priori information about the (naturalistic) correlation structure of the image, but do not know the image explicitly. As expected, the performance of the Bayesian decoder is shown to be less accurate with decreasing prior image information. There turns out to be a close mathematical connection between a biologically-plausible “motion energy” method for decoding the velocity and the optimal Bayesian decoder in the case that the image is not known. Simulations using the motion energy method reveal that it results in an average relative precision of only 10% across the same set of stimulus conditions. Estimation performance is rather insensitive to the details of the precise receptive field location, correlated activity between cells, and spike timing.

© 2009 Optical Society of America

OCIS codes: 330.4060, 330.4150, 330.7310, 330.5310.


1. Introduction

The question of how different attributes of a visual stimulus are represented by populations of cells in the retina has been addressed in a number of recent studies [10,13,14,23,24,29,30,32]. This field has received a major boost with the advent of methods for obtaining large-scale simultaneous recordings from multiple retinal ganglion neurons that almost completely tile a substantial region of the visual field [20,31]. The utility of this new method for understanding the encoding of behaviorally-relevant signals was exemplified by [14], who examined how reliably visual motion was encoded in the spiking activity of a population of macaque parasol cells. These authors used a simple velocity stimulus and attempted to estimate the stimulus velocity from the resulting spike train ensemble; this analysis pointed to some important constraints on the visual system’s ability to decode image velocity given noisy spike train responses. We will explore these issues in more depth in this paper.

In parallel to these advances in retinal recording technology, significant recent advances have also been made in our ability to model the statistical properties of populations of spiking neurons. [29] recently described a statistical model of a complete population of primate parasol retinal ganglion cells (RGCs). This model was fit using data acquired by the array recording techniques mentioned above and includes spike-history effects and cross-coupling between cells of the same kind and of different kinds (i.e. ON and OFF cells). [29] demonstrated that this model accurately captures the stimulus dependence and spatio-temporal correlation structure of RGC population responses, and allows several insights to be made into the retinal neural code. One such insight concerns the role of correlated activity in preserving sensory information. Using pseudo-random binary stimuli and Bayesian inference, [29] reported that stimulus decoding based on the spiking output of the model preserved 20% more information when knowledge of the correlation structure was used than when the responses were considered independently.

At the psychophysical level, Bayesian inference has been established as an effective framework for understanding visual perception [17]; some recent notable applications to understanding visual velocity processing include [3,33,35,42,43]. In particular, [42] argued that a number of visual illusions actually arise naturally in a system that attempts to estimate local image velocity via Bayesian methods (though see also [15,39]).

Links between retinal coding and psychophysical behavior have also been recently examined using Bayesian methods; [37], for example, examine the contribution of turtle RGC responses to velocity and acceleration encoding. This study reported that the instantaneous firing rates of individual turtle RGCs contain information about speed, direction and acceleration of moving patterns. The firing rate-based Bayesian stimulus reconstruction carried out in this study involved a couple of key approximations. These included the assumptions that RGCs generate spikes according to Poisson statistics and that they do so independently of each other. The work of [29] emphasizes that these assumptions are unrealistic, but the impact of detailed spike timing and correlation information on velocity decoding remains uncertain.

The primary goal of this paper is to investigate the fidelity with which the velocity of a visual stimulus may be estimated, given the detailed spiking responses of the primate population RGC model of [29], using an optimal Bayesian decoder, with and without full prior knowledge of the image. We begin by describing the mathematical construction of this optimal decoder, and then compare the optimal estimates to those based on a “net motion signal” derived directly from the spike trains without any prior image information [14]. We derive a mathematical connection between these two decoders and investigate the decoders’ performance through a series of simulations.

2. Methods

2.A. Model

The generalized linear model (GLM) [8,21] for the spiking responses of the sensory network used in this study was described in detail in [29]. It consists of an array of ON and OFF retinal ganglion cells (RGCs) with specific baseline firing rates. Given the spatiotemporal image movie sequence, the model generates a mean firing rate for each cell, taking into account the temporal dynamics and the center-surround spatial stimulus filtering properties of the cells. Then, incorporating spike history effects and cross-coupling between cells of the same type and of the opposite type, it generates spikes for each cell as a stochastic point process.

In response to the visual stimulus I, the i-th cell in the observed population emits a spike train, which we represent by a response function

r_i(t) = ∑_α δ(t − t_{i,α}),    (1)

where each spike is represented by a delta function, and t_{i,α} is the time of the α-th spike of the i-th neuron. We use the shorthand notation r_i and r for the response function of one neuron and the collective spike train responses of all neurons, respectively. The stimulus, I, represents the spatiotemporal luminance profile, I(n,t), of a movie as a function of the pixel position, n, and time t.

In the GLM framework, the intensity functions (instantaneous firing rates) of the responses r_i are given by [25,26,29,40]

λ_i(t) ≡ f( b_i + J_i(t) + ∑_{j,β} h_{ij}(t − t_{j,β}) ),    (2)


where f(·) is a positive, strictly increasing rectifying function (in this case, f(·) = exp(·)). The b_i represents the baseline firing rate of the cell, the coupling terms h_{ij} model the within- and between-neuron spike history effects noted above, and the stimulus input, J_i(t), is obtained from I by linearly filtering the spatiotemporal luminance,

J_i(t) = ∫∫ k_i(t − τ, n) I(τ, n) d²n dτ,    (3)

where k_i(t,n) is the spatio-temporal receptive field of cell i. Given Eq. (2), we can write down the point process log-likelihood in the standard way [34]

log p(r|I) ≡ ∑_{i,α} log λ_i(t_{i,α}) − ∑_i ∫_0^T λ_i(t) dt.    (4)
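To make Eqs. (2) and (4) concrete, the sketch below evaluates the intensity and the log-likelihood on a binned time grid. This is an illustration, not the implementation used in this paper: the array shapes and function name are hypothetical, the nonlinearity is fixed to f = exp, spikes are assumed binned, and additive constants involving log dt are dropped.

```python
import numpy as np

def glm_log_likelihood(spikes, J, h, b, dt):
    """Binned point-process log-likelihood, Eqs. (2) and (4).

    spikes : (n_cells, n_bins) binned spike counts r_i(t)
    J      : (n_cells, n_bins) stimulus drive J_i(t) from Eq. (3)
    h      : (n_cells, n_cells, n_lags) coupling filters h_ij
    b      : (n_cells,) baseline inputs b_i
    dt     : time-bin width in seconds
    """
    n_cells, n_bins = spikes.shape
    n_lags = h.shape[2]
    loglik = 0.0
    for t in range(n_bins):
        # recurrent input: within- and between-cell spike-history terms
        hist = np.zeros(n_cells)
        for lag in range(1, min(t, n_lags) + 1):
            hist += h[:, :, lag - 1] @ spikes[:, t - lag]
        lam = np.exp(b + J[:, t] + hist)   # Eq. (2) with f = exp
        # spike term minus integrated intensity, as in Eq. (4)
        loglik += spikes[:, t] @ np.log(lam) - lam.sum() * dt
    return loglik
```

With the coupling filters h set to zero this reduces to a bank of independent inhomogeneous Poisson log-likelihoods, which is the approximation that [37] adopts and [29] argues against.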

For movies arising from images rigidly moving with constant velocity v we have

I(t,n) = x(n − vt),    (5)

where x(n) is the luminance profile of a fixed image. Substituting Eq. (5) into Eq. (3), and shifting the integration variable n by vτ, we obtain

J_i(t) = ∫ K_{i,v}(t; n) x(n) d²n,    (6)

where we defined

K_{i,v}(t; n) ≡ ∫ k_i(t − τ, n + vτ) dτ.    (7)

In the following we replace p(r|I) with its equivalent p(r|x,v) (since, via Eq. (5), I is given in terms of x and v), and use the short-hand matrix notation J_i = K_{i,v} · x for Eq. (6).
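On a discretized grid, the effective filter of Eq. (7) can be built by shifting each temporal slice of the receptive field by vτ and summing over τ. The one-dimensional sketch below is only meant to make this construction concrete; the function name and grids are hypothetical, the receptive field is assumed causal, and boundaries are treated as periodic (np.roll) purely for simplicity.

```python
import numpy as np

def effective_filter(k, v, dt, dx):
    """Discretized Eq. (7): K_v(t; n) = sum_tau k(t - tau, n + v*tau) * dt.

    k      : (n_t, n_x) causal spatiotemporal receptive field on a 1-D pixel grid
    v      : velocity in pixels per second
    dt, dx : temporal and spatial grid spacings
    """
    n_t, n_x = k.shape
    K = np.zeros_like(k, dtype=float)
    for t in range(n_t):
        for tau in range(t + 1):              # causality: k(s) = 0 for s < 0
            shift = v * tau * dt / dx         # spatial shift v*tau, in pixels
            lo = int(np.floor(shift))
            w = shift - lo                    # linear interpolation between pixels
            slice_ = k[t - tau]
            K[t] += ((1 - w) * np.roll(slice_, -lo)
                     + w * np.roll(slice_, -(lo + 1))) * dt
    return K
```

The stimulus drive is then J_i(t) = ∑_n K_v[t, n] x[n] dx, the discrete analog of Eq. (6) and of the matrix notation J_i = K_{i,v} · x.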

2.B. Decoding

In order to estimate the speed of the moving bar given the simulated output spike trains, r, of our RGC population, we employed three distinct methods. The first method involved a Bayesian decoder with full image information, the second method utilized a Bayesian decoder with less than full image information, while the third method involved an “energy-based” algorithm introduced by [14] which used no explicit prior knowledge of the image.

2.B.1. Bayesian Velocity Estimation

To compute the optimal Bayesian velocity decoder we need to evaluate the posterior probability for the velocity, p(v|r), conditional on the observed spike trains r. Given a prior distribution p_v(v), from Bayes’ rule we obtain

p(v|r) = p(r|v) p_v(v) / ∑_{v′} p(r|v′) p_v(v′).    (8)


If the image x (e.g. a narrow bar of nonzero contrast) is known to the decoder, then we can replace p(r|v) with the likelihood function p(r|x,v), obtaining

p(v|r,x) = p(r|x,v) p_v(v) / ∑_{v′} p(r|x,v′) p_v(v′).    (9)

p(r|x,v) is provided by the forward model Eq. (4), and therefore computation of the posterior probability is straightforward in this case.

Alternatively, if the image is not fully known, we represent the decoder’s uncertain a priori knowledge regarding x with an image prior distribution p_x(x). In this case, p(r|v) is obtained by marginalization over x:

p(r|v) = ∫ p(r,x|v) dx = ∫ p(r|x,v) p_x(x) dx.    (10)

Hence, we will refer to p(r|v) as the marginal likelihood. Given the marginal likelihood, Eq. (8) allows us to calculate Bayesian estimates for general velocity priors. The prior distribution, p_x(x), which describes the statistics of the image ensemble, can be chosen to have a naturalistic correlation structure. In our simulations in Sec. 3 we used a Gaussian image ensemble with power spectrum matched to observations in natural images [7,12].
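In practice, the posterior of Eq. (8) or Eq. (9) is evaluated on a discrete grid of candidate velocities. A minimal sketch (with hypothetical names; the per-velocity log-likelihoods would come from Eq. (4) when the image is known, or from the marginal likelihood of Eq. (10) when it is not):

```python
import numpy as np

def velocity_posterior(log_lik, log_prior):
    """Discrete Bayes' rule, Eq. (8): combine per-velocity log-likelihoods
    log p(r|v_k) with a log-prior evaluated on the same velocity grid."""
    log_post = log_lik + log_prior
    log_post -= log_post.max()        # shift by the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()          # normalized posterior p(v_k | r)

# Example: three candidate velocities under a flat prior
v_grid = np.array([-1.0, 0.0, 1.0])
post = velocity_posterior(np.array([0.0, 1.0, 0.0]), np.zeros(3))
v_hat = v_grid @ post                 # posterior-mean velocity estimate
```

Working in log space and subtracting the maximum before exponentiating avoids underflow, since point-process log-likelihoods of long spike trains are typically large negative numbers.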

In general, the calculation of the high-dimensional integral over x in Eq. (10) is a difficult task. However, when the integrand p(r,x|v) is sharply peaked around its maximum (which is the maximum a posteriori (MAP) estimate for x, since the integrand is proportional to the posterior image distribution p(x|r,v), by Bayes’ rule) the so-called “Laplace” approximation (also known as the “saddle-point” approximation) provides an accurate estimate for this integral (for applications of this approximation in the Bayesian setting, see e.g., [16]). The Laplace approximation in the context of neural decoding is further discussed in, e.g., [2,4,9,18,28]. We briefly review this approximation here.
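In one dimension the Laplace approximation replaces the integrand exp(L(x)) by a Gaussian centered at the mode x̂, giving ∫ exp(L(x)) dx ≈ exp(L(x̂)) √(2π / −L′′(x̂)); in d dimensions the second derivative becomes the Hessian of L at the mode and the square-root factor becomes √((2π)^d / det(−∇²L)). A toy sketch, which is exact for the Gaussian log-integrand used in the sanity check below:

```python
import numpy as np

def laplace_integral(L, x_hat, d2L):
    """One-dimensional Laplace approximation to the integral of exp(L(x)),
    expanded around the mode x_hat, where d2L = L''(x_hat) < 0."""
    return np.exp(L(x_hat)) * np.sqrt(2 * np.pi / -d2L)

# Sanity check on a Gaussian log-integrand, where the approximation is exact
mu, sigma = 1.5, 0.4
L = lambda x: -0.5 * ((x - mu) / sigma) ** 2
approx = laplace_integral(L, mu, -1.0 / sigma**2)
exact = sigma * np.sqrt(2 * np.pi)    # closed-form Gaussian integral
```

For the non-Gaussian integrand of Eq. (10) the approximation is no longer exact, but remains accurate as long as the posterior over images is sharply peaked around its MAP point.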

Following [7], we consider Gaussian image priors with zero mean and covariance, C_x, chosen to match the power spectrum of natural images [12]. Let us define the function

L(x,r,v) ≡ log p_x(x) + log p(r|x,v) + (1/2) log( (2π)^d |C_x| ),    (11)

where d represents the number of pixels in our simulated image, and rewrite Eq. (10) as

p(r|v) = ( (2π)^d |C_x| )^{−1/2} ∫ e^{L(x,r,v)} dx.    (12)

Using Eq. (4) and p_x(x) = N(0, C_x), we obtain the expression

L(x,r,v) = −(1/2) xᵀ C_x^{−1} x + ∑_i [ ∑_α log λ_i(t_{i,α}; x, r) − ∫ λ_i(t; x, r) dt ],    (13)